Kuali Rice Development
  1. Kuali Rice Development
  2. KULRICE-4037

KFS user preference records cause batch slowness due to too many krew_usr_optn_t records in Rice 1.0.1.1

    Details

    • Type: Bug Fix Bug Fix
    • Status: Closed Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: KFS Release 3.0.1
    • Fix Version/s: 1.0.2
    • Component/s: Development
    • Labels:
      None
    • Environment:
      KFS 3.0.1 with Rice 1.0.1.1 embedded. Tomcat with Java 1.6 on AIX with Oracle 10g DB
    • Similar issues:
      KULRICE-7796getWorkflowDocumentService().documentSearch saves the search into user options but is failing to save because the value for VAL is too big.
      KULRICE-5211Implement PreferencesService (KREW_USR_OPTN_T) so that it is accessed remotely instead of via database
      KULRICE-3242Unique constraint violated on insert into KREW_USR_OPTN_T
      KULRICE-5968Allow for users to customize their email notification preferences based on the document type
      KULRICE-5011Implement an improved way to delete RELOAD_ACTION_LIST prefs for users
      KULRICE-1270CustomizableActionListEmailServiceImpl is too slow for production email batch
      KULRICE-4280RELOAD_ACTION_LIST user options not getting deleted on first load of Action List
      KULRICE-9799Document search saving search to user option even for system user, which causes KFS batch jobs to fail
      KULRICE-6326Too many entries in the namespace dropdown causes it to be unwieldy
      KULRICE-10288Convert KEW UserOption table to JPA
    • Rice Module:
      Rice Core
    • Application Requirement:
      KFS

      Description

      The pcard batch create and route steps drastically slowed down (increased from 1 sec per pcard to 45 secs per pcard) between KFS 3.0/Rice 1.0.1 and KFS 3.0.1/Rice 1.0.1.1. The pCard Create step normally runs in a few minutes, but can take over 5 hours for 600 pcards.

      CSU found that the KFS user (principal id 2), has 165,000 records in the krew_usr_optn_t table after 9 months in production. This is not causing a problem in production on Rice 1.0.1, however, is adding hours of processing time to KFS 3.0.1/Rice 1.0.1.1. We can reduce this delay by deleting the entries in the table with prsn_optn_id similar to RELOAD_ACTION_LIST124691870426661790.

      The slow down appears in DefaultNotificationService in the shouldNotify method in the following line:
      Preferences preferences = KEWServiceLocator.getPreferencesService().getPreferences(actionItem.getPrincipalId());

        Issue Links

          Activity

          Hide
          Peter Giles (Inactive) added a comment -

          BTW, I also looked into doing a conditional insert, but I couldn't find a database independent way to do so.

          Show
          Peter Giles (Inactive) added a comment - BTW, I also looked into doing a conditional insert, but I couldn't find a database independent way to do so.
          Hide
          Eric Westfall added a comment -

          I think the concurrency issue here probably has to do with optimistic lock exceptions. Say there are multiple background threads at once (possibly processing different documents through the workflow engine) which all end up updating the same user's action list. If we were to do updates to the record, or delete it we are going to have problems with optimistic locks (i believe the user option table has a version number column).

          One possible thought, could we have the code always execute a delete prior to inserting the new record? So, for example:

          DELETE from KREW_USR_OPTN_T where prsn_optn_id like 'RELOAD_ACTION_LIST%' and prncpl_id=?

          Then insert the new RELOAD_ACTION_LISTxxxxxxx option? The only concern I would have with that is that it would cause each of those background threads to block each other until each one commits.

          One other thought, considering this is a caching thing, could we instead spawn a separate thread which updates/inserts/deletes/whatever the reload option in the table? That would help us to prevent from concurrency on this table during critical workflow engine processing.

          Show
          Eric Westfall added a comment - I think the concurrency issue here probably has to do with optimistic lock exceptions. Say there are multiple background threads at once (possibly processing different documents through the workflow engine) which all end up updating the same user's action list. If we were to do updates to the record, or delete it we are going to have problems with optimistic locks (i believe the user option table has a version number column). One possible thought, could we have the code always execute a delete prior to inserting the new record? So, for example: DELETE from KREW_USR_OPTN_T where prsn_optn_id like 'RELOAD_ACTION_LIST%' and prncpl_id=? Then insert the new RELOAD_ACTION_LISTxxxxxxx option? The only concern I would have with that is that it would cause each of those background threads to block each other until each one commits. One other thought, considering this is a caching thing, could we instead spawn a separate thread which updates/inserts/deletes/whatever the reload option in the table? That would help us to prevent from concurrency on this table during critical workflow engine processing.
          Hide
          Peter Giles (Inactive) added a comment -

          All of the other methods I considered either required compromising vendor independence at the database layer, gave me a really bad feeling around db lock contention, or fell down due to the potential for optimistic locking collisions. I went with a solution that is not pretty (read: hacky), but is much simpler than the alternatives without a large price in lock contention.

          The following has beein committed:

          UserOptionsServiceImpl.java:

          • added 10% chance of doing a delete of RELOAD_ACTION_LIST prefs for the user on each saveRefreshUserOption(...) call.

          UserOptionsServiceCache.java:

          • no longer clearing out the user's cache on refreshActionList(...) calls. This isn't needed to mitigate KULRICE-4037, but clearing the cache in this circumstance is not needed.

          If this solution is sufficiently despised to require rolling it back and doing something else, my plan B would be to add methods to the UserOptionsDAO to specifically deal with RELOAD_ACTION_LIST prefs, and have the DAOs use straight JDBC for those methods. I would then use only a single user pref (no numbers tagged on), and the only operations I would perform would be to insert or delete it. Inserts would have to catch and ignore SQLException, assuming it meant constraint violation because the flag was already set. This would actually be the best performer, and is nice in that there is only ever 1 row per user, but loses points for it's JDBC-ness.

          Show
          Peter Giles (Inactive) added a comment - All of the other methods I considered either required compromising vendor independence at the database layer, gave me a really bad feeling around db lock contention, or fell down due to the potential for optimistic locking collisions. I went with a solution that is not pretty (read: hacky), but is much simpler than the alternatives without a large price in lock contention. The following has beein committed: UserOptionsServiceImpl.java: added 10% chance of doing a delete of RELOAD_ACTION_LIST prefs for the user on each saveRefreshUserOption(...) call. UserOptionsServiceCache.java: no longer clearing out the user's cache on refreshActionList(...) calls. This isn't needed to mitigate KULRICE-4037 , but clearing the cache in this circumstance is not needed. If this solution is sufficiently despised to require rolling it back and doing something else, my plan B would be to add methods to the UserOptionsDAO to specifically deal with RELOAD_ACTION_LIST prefs, and have the DAOs use straight JDBC for those methods. I would then use only a single user pref (no numbers tagged on), and the only operations I would perform would be to insert or delete it. Inserts would have to catch and ignore SQLException, assuming it meant constraint violation because the flag was already set. This would actually be the best performer, and is nice in that there is only ever 1 row per user, but loses points for it's JDBC-ness.
          Hide
          Peter Giles (Inactive) added a comment -

          I'll take the lack of boos and hisses to mean the solution was acceptable. Resolving.

          Show
          Peter Giles (Inactive) added a comment - I'll take the lack of boos and hisses to mean the solution was acceptable. Resolving.
          Hide
          Peter Giles (Inactive) added a comment -

          The work done for this issue introduced a database deadlocking bug. Please see https://wiki.kuali.org/display/KULRICE/Kuali+Rice+Performance#KualiRicePerformance-CachingBehavioranddeadlocksinKREWUSROPTNT and KULRICE-5011 for information on a fix.

          Show
          Peter Giles (Inactive) added a comment - The work done for this issue introduced a database deadlocking bug. Please see https://wiki.kuali.org/display/KULRICE/Kuali+Rice+Performance#KualiRicePerformance-CachingBehavioranddeadlocksinKREWUSROPTNT and KULRICE-5011 for information on a fix.

            People

            • Assignee:
              Peter Giles (Inactive)
              Reporter:
              John Walker (Inactive)
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Structure Helper Panel