Kuali Rice Development / KULRICE-4477

Investigate how to remove the requirement to use SELECT ... FOR UPDATE when processing workflow documents

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: Backlog
    • Component/s: Development
    • Labels:
    • Similar issues:
      KULRICE-5772 When field is required that uses a select control, clicking on the control to select a value gives the required error and kicks you out
      KULRICE-9520 Document the process for setting up and using AnnotationMetadataProvider
      KULRICE-4892 Document Operation screen in workflow doesn't display XML properly if it's encrypted with a different encryption key then the standalone server uses
      KULRICE-894 Investigate the ability to have a Compensation mechinism invoked from a workflow process
      KULRICE-943 Update workgroup documentation to include Workgroup types and configuration
      KULRICE-9521 Document the process for setting up and using SpringMetadataProvider
      KULRICE-357 sync rice/kfs/workflow ddl process
      KULRICE-897 Investigate removing document content XML payload when doc goes to final
      KULRICE-10360 Investigate a reset option for dialog groups
      KULRICE-12987 KRAD AdHoc operation is resetting document status
    • Rice Module:
      KEW
    • KAI Review Status:
      Not Required
    • KTI Review Status:
      Not Required

      Description

      This is a long-standing problem: when the workflow engine begins processing a document, it issues a SELECT ... FOR UPDATE on KREW_DOC_HDR_T by document id.

      It does this to prevent optimistic lock exceptions when the same document is processed concurrently by users of the application and by background processing threads.

        Activity

        Eric Westfall added a comment -

        Here are some thoughts from Jonathan on this subject:

        So, I was thinking of how to remove that SELECT ... FOR UPDATE locking of the workflow table.

        We have a perfect hook for the call in the RouteHeaderService.lockRouteHeader() method, so we could call whatever we need from there.  My question is whether the KSB has a model for broadcasting a request to a service on the network and collecting the responses.  We would probably want it to be a synchronous call, perhaps with a timeout so that a down node would not cause too much slowdown.

        I was thinking of sending out the message to all the other KEW instances in a cluster to see if any of them have locked a document.
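The broadcast idea above can be sketched roughly as follows. This is a minimal illustration only: `ClusterLockQuery`, `PeerNode`, and `isLockedElsewhere` are hypothetical names, not KSB or KEW APIs, and the peers are plain in-process objects rather than remote services. Treating a silent or failing node as "not holding the lock" is an assumption made here to keep a down node from stalling the caller; it is not safe if that node actually holds the lock.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

/** Hypothetical sketch: ask every peer node whether it holds a lock on a document. */
final class ClusterLockQuery {

    /** Stand-in for a remote KEW node; NOT a real KSB interface. */
    interface PeerNode {
        boolean holdsLock(String documentId) throws Exception;
    }

    /**
     * Returns true if any peer reports holding the lock within the timeout.
     * A peer that throws or does not answer in time counts as "not holding",
     * which is a policy assumption of this sketch.
     */
    static boolean isLockedElsewhere(List<PeerNode> peers, String documentId, long timeoutMillis) {
        // Start all queries first so the peers are asked in parallel.
        List<CompletableFuture<Boolean>> answers = peers.stream()
            .map(peer -> CompletableFuture.supplyAsync(() -> {
                    try {
                        return peer.holdsLock(documentId);
                    } catch (Exception e) {
                        return false; // failed node: assume it holds nothing
                    }
                })
                // Substitute "false" if the peer has not answered by the deadline.
                .completeOnTimeout(false, timeoutMillis, TimeUnit.MILLISECONDS))
            .collect(Collectors.toList());

        // join() cannot block past the timeout because each future completes by then.
        return answers.stream().anyMatch(CompletableFuture::join);
    }
}
```

The call is synchronous from the caller's point of view but bounded by the timeout, which matches the "synchronous call, perhaps with a timeout" idea; how a real cluster should treat an unreachable node is left open.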

        Another option would be a lock table separate from the document header.  But the updates to and checks on that table would need to go through a non-transactional datasource, separate from the main threads, so that other processing threads and servers can see the change.  This would be a more invasive change, but it seems simpler overall.  I'm just not sure how to make the service-style approach work and perform regardless of the state of the nodes in the cluster.
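The semantics of a separate lock table can be sketched with an in-memory stand-in. Everything here is hypothetical: `DocumentLockTable`, `tryLock`, and `unlock` are illustrative names, and a `ConcurrentHashMap` takes the place of a database table keyed by document id. In a real implementation, the claim would presumably be an insert that fails on a duplicate key, committed on its own connection so other nodes see it immediately.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical in-memory stand-in for a dedicated document lock table.
 * One entry per locked document: document id -> id of the owner holding it.
 */
final class DocumentLockTable {
    private final ConcurrentMap<String, String> ownerByDocumentId = new ConcurrentHashMap<>();

    /** Atomically claims the document; mirrors an INSERT that fails on a duplicate key. */
    boolean tryLock(String documentId, String ownerId) {
        return ownerByDocumentId.putIfAbsent(documentId, ownerId) == null;
    }

    /** Releases the lock only if this owner holds it; mirrors a DELETE matching both columns. */
    boolean unlock(String documentId, String ownerId) {
        return ownerByDocumentId.remove(documentId, ownerId);
    }
}
```

Because `tryLock` returns immediately instead of blocking, a caller can decide for itself whether to retry, queue the work, or report that the document is busy, rather than waiting on a row lock.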

        Travis Schneeberger added a comment -

        Hey Eric - just for my own curiosity, what is the problem with using SELECT ... FOR UPDATE and why do we want to switch to something else?

        Eric Westfall added a comment -

        Well, I don't have specifics here unfortunately, but we've encountered deadlocking in production Rice as a result of this (though not very frequently). Additionally, I think it was CSU that encountered a lot of problems because of document locking.

        Also, because of the way it's implemented, I believe it prevents actions like "approve" from being taken against a document while it is being processed by the workflow engine. Since engine processing can take many seconds (sometimes a minute or so) for certain documents, it's undesirable to have the user's browser just "hang" for that period of time.

        Finally, because we can't guarantee the order in which locking of multiple documents will occur within transactions, there is the possibility for a deadlock (for example, tx1 locks doc A, tx2 locks doc B, tx1 attempts to lock B but gets blocked, tx2 attempts to lock A but gets blocked...deadlock). We actually had to add a "getAdditionalDocumentLockingIds" or something along those lines to the post processor to work around this and it's pretty kludgy.
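The tx1/tx2 scenario above is the classic lock-ordering deadlock. One common mitigation, shown here purely as an illustration and not as what Rice does, is to acquire locks in a single global order. It only applies when the full set of document ids is known before the first lock is taken. `OrderedDocumentLocks` and its methods are hypothetical names, and in-process `ReentrantLock`s stand in for database row locks.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: avoid A/B vs. B/A deadlocks by always locking in sorted order. */
final class OrderedDocumentLocks {
    private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Sorts and de-duplicates the ids so every caller acquires in the same order. */
    static List<String> lockOrder(Collection<String> documentIds) {
        return new ArrayList<>(new TreeSet<>(documentIds));
    }

    /** Runs the work while holding all requested document locks. */
    void withLocks(Collection<String> documentIds, Runnable work) {
        List<ReentrantLock> held = new ArrayList<>();
        try {
            for (String id : lockOrder(documentIds)) {
                ReentrantLock lock = locks.computeIfAbsent(id, k -> new ReentrantLock());
                lock.lock();
                held.add(lock);
            }
            work.run();
        } finally {
            // Release in reverse acquisition order.
            for (int i = held.size() - 1; i >= 0; i--) {
                held.get(i).unlock();
            }
        }
    }
}
```

With this scheme, a caller asking for (B, A) and a caller asking for (A, B) both take A first, so neither can hold one lock while waiting on the other.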

        Eric Westfall added a comment -

        Oh yeah, the other reason is because "My DBAs will yell at me"...and they have.


          People

          • Assignee:
            Unassigned
            Reporter:
            Eric Westfall
          • Votes:
            0
            Watchers:
            0
