Minimizing Lock Contention Issues
One issue that comes up a lot is dealing with locking issues. Usually it is posed by someone who is experiencing timeouts in an online environment and they want to know how to minimize them. Here is some guidance.
When you experience a timeout, it means that another process holds a lock on the data that you are trying to modify. So, it stands to reason that you should try to minimize the duration of locks that are being held in your system. There are some approaches you can take to achieve this, but for the most part, they require programming changes.
First of all, make sure that all of your batch processes -- especially any that run during the same timeframe, but really all batch process -- have a COMMIT strategy. This means that your programs should issue a COMMIT after processing "a set number of" inserts, updates, and deletes. A COMMIT will tell DB2 to make the changes permanent and releases locks.
A good approach is to set a counter that is incremented after every modification. Then, check it and when it exceeds a predefined threshold -- say 25 or 50 or 100 modications -- then issue a COMMIT. You should make the threshold an input parameter so that you can change it as the workload in your system changes. For example, make it 25 when concurrent activity is high, but ramp it up to 100 or higher when it is low. Failure to issue COMMITs will result in timeouts, as well as possibly deadlocks and lock escalation.
(Also, please note that these are just sample numbers and not necessarily the correct numbers to start with at your shop.)
A good article to read regarding COMMIT strategies is "The Woes of Commitment" by Bonnie Baker.
Next, look at all of your programs, batch and online, and try to save the data modification statements to as close to the COMMIT as possible. By saving the data modification until right before you issue a COMMIT, you reduce the overall average lock duration. This should result in reducing contention and therefore, the number of timeouts.
If you want to investigate the timeout details, be sure to examine the statistics trace class 3 and IFCID 0196 for timeouts (IFCID 0172 is for deadlocks).