Monday, November 23, 2009

Reading Things That Aren't There... and Missing Things That Are!

You can shoot yourself in the foot using DB2 if you are not careful. There are options that you can specify that may cause you to read data that is not really in the database. And, alternately, you can set things up so that you miss reading data that is actually in the database.

How, you might be asking? Well, dirty reads will take care of the first one. Specifying ISOLATION(UR) implements read-through locks, which is sometimes referred to as a dirty read. It applies to read operations only. With this isolation level data may be read that never actually exists in the database, because the transaction can read data that has been changed by another process but is not yet committed.

Read uncommitted isolation provides the highest level availability and concurrency of the isolation levels, but the worst degree of data integrity. It should be used only when data integrity problems can be tolerated. Certain types of applications, such as those using analytical queries, estimates, and averages are likely candidates for read uncommitted locking. A dirty read can cause duplicate rows to be returned where none exist or no rows may be returned when one (or more) actually exists. When choosing read uncommitted isolation the programmer and DBA must ensure that these types of problems are acceptable for the application.

OK, so what about not reading data that is in the database? DB2 V9 provides us an option to do just that. In DB2 9 it is possible for a transaction to skip over rows that are locked. This can be accomplished by means of the SKIP LOCKED DATA option within your SQL statement(s). SKIP LOCKED DATA can be specified in SELECT, SELECT INTO, and PREPARE, as well as searched UPDATE and DELETE statements. You can also use the SKIP LOCKED DATA option with the UNLOAD utility.

When you tell DB2 to skip locked data then that data is not accessed and your program will not have it available. DB2 just skip over any locked data instead of waiting for it to be unlocked. The benefit, of course, is improved performance because you will not incur any lock wait time. But it comes at the cost of not accessing the locked data at all. This means that you should only utilize this clause when your program can tolerate skipping over some data.

The SKIP LOCKED DATA option is compatible with cursor stability (CS) isolation and read stability (RS) isolation. But it cannot be used with uncommitted read (UR) or repeatable read (RR) isolation levels. DB2 will simply ignore the SKIP LOCKED DATA clause under UR and RR isolation levels.

Additionally, SKIP LOCKED DATA works only with row locks and page locks. That means that SKIP LOCKED DATA does not apply to table, partition, LOB, XML, or table space locks. And the bigger the lock size, the more data that will be skipped when a lock is encountered. With row locking you will be skipping over locked rows, but with page locking you will be skipping over all the rows on the locked page.

Use both of these features with extreme care and make sure that you know exactly what you are telling DB2 to do. Otherwise, you might be reading more... or less than you want!

Labels: , , ,

Monday, July 06, 2009

The "Dirty" Read (AKA Uncommitted Read)

Anyone accustomed to application programming when access to a database is required understands the potential for concurrency problems. When one application program attempts to read data that is in the process of being changed by another, the DBMS must forbid access until the modification is complete in order to ensure data integrity. Most DBMS products, DB2 included, use a locking mechanism for all data items being changed. Therefore, when one task is updating data on a page, another task can not access data (read or update) on that same page until the data modification is complete and committed.

Programs that read DB2 data typically access numerous rows during their execution and are thus quite susceptible to concurrency problems. Since V4, DB2 has provided read-through locks, also know as “dirty read” or “uncommitted read,” to help overcome concurrency problems. When using an uncommitted reads an application program can read data that has been changed, but is not yet committed.

Dirty read capability is implemented using a new isolation level, UR, for uncommitted read. This adds to the current isolation levels of RR (repeatable read) and CS (cursor stability). If the application program is using the UR isolation level, it will read data without taking locks. This enables the application program to read data contained in the table as it is being manipulated.

Consider the following sequence of events:

(1) At 9:00 AM, the a transaction is executed containing the following SQL to change a specific value.

UPDATE EMP
SET FIRST_NAME = “MICHELLE”
WHERE EMPNO = 10020;

The transaction is a long-running one and continues to execute without issuing a COMMIT.

(2) At 9:01 AM, a second transaction attempts to SELECT the data that was changed, but not committed.


If the UR isolation level were specified for the second transaction, it would read the changed data even though it had yet to be committed. Obviously, if the program need not wait to take a lock and merely reads the data in whatever state it happens to be at that moment, the program will execute faster than if it had to wait for locks to be taken and resources to be freed before processing.

However, the implications of reading uncommitted data must be carefully examined before being implemented. Several types of problems can occur. A dirty read can cause duplicate rows to be returned where none exist. Alternately, a dirty read can cause no rows to be returned when one (or more) actually exists. Additionally, an ORDER BY clause does not guarantee that rows will be returned in order if the UR isolation level is used. Obviously, these problems must be taken into consideration before using the UR isolation level.

UR Isolation Requirements

  • The UR isolation level applies to read-only operations: SELECT, SELECT INTO, and FETCH from a read-only result table.
  • Any application plan or package bound with an isolation level of UR will use uncommitted read functionality for any read-only SQL. Operations that are contained in the same plan or package and are not read-only will use an isolation level of CS.
  • The isolation level that is defined at the plan or package level during BIND or REBIND can be overridden as desired for each SQL statement in the program. The WITH clause can be used to specify the isolation level for any individual SQL statement. For example:


SELECT EMPNO, LAST_NAME
FROM EMP
WITH UR;

The WITH clause is used to allow an isolation level to be used on a statement-by-statement basis. The UR isolation level can be used only with read-only SQL statements. This includes read-only cursors and SELECT INTO statements.

Benefits and Drawbacks

So when is it a good idea to implement dirty reads using the UR isolation level? The general rule of thumb is to avoid dirty reads whenever the results must be 100 percent accurate. Examples of this would be when:
  • calculations that must balance are being performed on the selected data
  • data is being retrieved from one source to insert to or update another
  • production, mission-critical work is being performed that can not contain or cause data integrity problems
Truthfully, most DB2 applications are not usually candidates for dirty reads. However, there are a few specific situations in which the dirty read capability will be of major benefit. Consider the following cases in which the UR isolation level could prove to be useful:
  • Access is required to a reference, code, or look-up table that is basically static in nature. Due to the non-volatile nature of the data, a dirty read would be no different than a normal read the majority of the time. In those cases when the code data is being modified, any application reading the data would incur minimum, if any, problems.
  • Statistical processing must be performed on a large amount of data. For example, your company may wish to determine the average age of female employees within a certain pay range. The impact of an uncommitted read on an average of multiple rows will be minimal because a single value changed will not greatly impact the result.
  • Dirty read can prove invaluable in a data warehousing environment that uses DB2 as the DBMS. A data warehouse is a time sensitive, subject-oriented, store of business data that is used for on-line analytical processing. Other than periodic data propagation and/or replication, access to the data warehouse is read only. An uncommitted read can be perfect in a read only environment since it can cause little damage because the data is generally not changing. More and more data warehouse projects are being implemented in corporations worldwide and DB2 with dirty read capability may be a very wise choice for data warehouse implementation.
  • In those rare cases when a table, or set of tables, is used by a single user only, UR can make a lot of sense. If only one individual can be modifying the data, the application programs can be coded such that all (or most) reads are done using UR isolation level, and the data will still be accurate.
  • Finally, if the data being accessed is already inconsistent little harm can be done using a dirty read to access the information.
Administration

Because of the data integrity issues associated with dirty reads, it is a good idea for DBAs to keep track of the plans and packages that specify an isolation level of UR. This information can be found in the DB2 catalog. The following two queries can be used to find the applications using uncommitted reads.

Issue the following SQL for a listing of plans that were bound with ISOLATION(UR) or contain at least one statement specifying the WITH UR clause:

SELECT DISTINCT S.PLNAME
FROM SYSIBM.SYSPLAN P,
SYSIBM.SYSSTMT S
WHERE (P.NAME = S.PLNAME AND
P.ISOLATION = ‘U’
)
OR S.ISOLATION = ‘U’
ORDER BY S.PLNAME;

Issue the following SQL for a listing of packages that were bound with ISOLATION(UR) or contain at least one statement specifying the WITH UR clause:

SELECT DISTINCT P.COLLID, P.NAME, P.VERSION
FROM SYSIBM.SYSPACKAGE P,
SYSIBM.SYSPACKSTMT S
WHERE (P.LOCATION = S.LOCATION AND
P.LOCATION = ‘ ‘ AND
P.COLLID = S.COLLID AND
P.NAME = S.NAME AND
P.VERSION = S.VERSION AND
P.ISOLATION = ‘U’
)
OR S.ISOLATION = ‘U’
ORDER BY S.COLLID, S.NAME, S.VERSION;

Synopsis

The dirty read capability can provide relief to concurrency problems and deliver faster performance in very specific situations. Be certain to understand the implications of the UR isolation level and the “problems” it can cause before diving headlong into implementing it in your production applications.

Labels: , ,

Monday, June 22, 2009

Know Your ISOLATION Levels

Did you know that DB2 provides a way to change the way that a program or SQL statement acquires locks? That way is known as the isolation level and it can be set to specify the locking behavior for a transaction or statement. Standard SQL defines four isolation levels that can be set using the SET TRANSACTION ISOLATION LEVEL statement:

  • Serializable
  • Repeatable read
  • Read committed
  • Read uncommitted

The isolation level determines the mode of page or row locking implemented by the program as it runs.

DB2 supports a variation of the standard isolation levels. DB2 implements page and row locking at the program execution level, which means that all page or row locks are acquired as needed during the program run. Page and row locks are released as the program run depending on the isolation level.

In DB2 you can specify the following four isolation levels:

  • cursor stability (CS),
  • repeatable read (RR),
  • read stability (RS), and
  • uncommitted read (UR).

Using the ISOLATION parameter of the BIND command you can set the isolation level of a package or plan. You also can use the WITH parameter on a SELECT statement to set the isolation level of a single SQL statement.

Cursor stability is the DB2 implementation of the SQL standard read committed isolation level. CS is perhaps the most common DB2 isolation level in use in production applications because it offers a good tradeoff between data integrity and concurrency. When CS is specified the transaction will never read data that is not yet committed; only committed data can be read.

A higher level of integrity is provided with repeatable read. Under an RR isolation level all page locks are held until they are released by a COMMIT (or ROLLBACK), whereas with CS read-only page locks are released as soon as another page is accessed. Repeatable read is the default isolation level if none is specified at BIND time.

An RR page locking strategy is useful when an application program requires consistency in rows that may be accessed twice in one execution of the program, or when an application program requires data integrity that cannot be achieved with CS.

For example of a good reason to use RR page locking, consider a reporting program that scans a table to produce a detail report, and then scans it again to produce a summarized managerial report. If the program is bound using CS, the results of the first report might not match the results of the second.

Suppose that you are reporting the estimated completion dates for project activities. The first report lists every project and the estimated completion date. The second, managerial report lists only the projects with a completion date greater than one year.

The first report indicates that two activities are scheduled for more than one year. After the first report but before the second, however, an update occurs. A manager realizes that she underestimated the resources required for a project. She invokes a transaction to change the estimated completion date of one of her project's activities from 8 months to 14 months. The second report is produced by the same program, but reports 3 activities.

If the program used an RR isolation level rather than CS, an UPDATE that occurs after the production of the first report but before the second would not have been allowed. The program would have maintained the locks it held from the generation of the first report and the updater would be locked out until the locks were released.

How about another example? Consider a program that is looking for pertinent information about employees in the information center and software support departments who make more than $30,000 in base salary. In the DB2 sample tables department 'C01' is the information center and department 'E21' is software support.

The program opens a cursor based on the following SELECT statement:


SELECT EMPNO, FIRSTNME, LASTNAME,
WORKDEPT, SALARY
FROM DSN8710.EMP
WHERE WORKDEPT IN ('C01', 'E21')
AND SALARY > 30000;

The program then begins to FETCH employee rows. Assume further, as would probably be the case, that the statement uses the XEMP2 index on the WORKDEPT column. An update program that implements employee modifications is running concurrently. The program handles transfers by moving employees from one department to another, and implements raises by increasing the salary.

Assume that Sally Kwan, one of your employees, has just been transferred from the information center to software support. Assume further that another information center employee, Heather Nicholls, received a 10 percent raise. The update program running concurrently with the report program implements both of these modifications.

If the report program were bound with an isolation level of CS, the second program could move Sally from 'C01' to 'E21' after she was reported to be in department 'C01' but before the entire report was finished. Thus, she could be reported twice: once as an information center employee and again as a software support employee. Although this circumstance is rare, it can happen with programs that use cursor stability. If the program were bound instead with RR, this problem could not happen. The update program probably would not be allowed to run concurrently with a reporting program, however, because it would experience too many locking problems.

Now consider Heather's dilemma. The raise increases her salary 10 percent, from $28,420 to $31,262. Her salary now fits the parameters specified in the WHERE condition of the SQL statement. Will she be reported? It depends on whether the update occurs before or after the row has been retrieved by the index scan, which is clearly a tenuous situation. Once again, RR avoids this problem.

You might be wondering, "If CS has the potential to cause so many problems, why is it used so ubiquitously? Why not trade the performance and concurrency gain of CS for the integrity of RR?"

The answer is simple: the types of problems outlined are rare. The expense of using RR, however, can be substantial in terms of concurrency. So the tradeoff between the concurrency expense of RR and the efficiency of CS usually is not a sound one.

The third isolation level provided by DB2 is read stability (RS). Read stability is similar in functionality to the RR isolation level, but a little less. A retrieved row or page is locked until the end of the unit of work; no other program can modify the data until the unit of work is complete, but other processes can insert values that might be read by your application if it accesses the row a second time.

Consider using read stability over repeatable read only when your program can handle retrieving a different set of rows each time a cursor or singleton SELECT is issued. If using read stability, be sure your application is not dependent on having the same number of rows returned each time.

Finally, we come to the last, and most maligned isolation level, uncommitted read (UR). The UR isolation level provides read-through locks, also know as dirty read or read uncommitted. Using UR can help to overcome concurrency problems. When you're using an uncommitted read, an application program can read data that has been changed but is not yet committed.

UR can be a performance booster, too, because application programs bound using the UR isolation level will read data without taking locks. This way, the application program can read data contained in the table as it is being manipulated. Consider the following sequence of events:

1. To change a specific value, at 9:00 a.m. a transaction containing the following SQL is executed:


UPDATE EMP
SET FIRSTNME = "MICHELLE"
WHERE EMPNO = 10020;

The transaction is a long-running one and continues to execute without issuing a COMMIT.


2. At 9:01 a.m., a second transaction attempts to SELECT the data that was changed, but not committed.


If the UR isolation level were used for the second transaction, it would read the changed data even though it had yet to be committed. Obviously, if the program doesn't need to wait to take a lock and merely reads the data in whatever state it happens to be at that moment, the program will execute faster than if it had to wait for locks to be taken and resources to be freed before processing.

The implications of reading uncommitted data, however, must be carefully examined before being implemented. Several types of problems can occur. Using the previous example, if the long-running transaction rolled back the UPDATE to EMPNO 10020, the program using dirty reads may have picked up the wrong name ("MICHELLE") because it was never committed to the database.

Inaccurate data values are not the only problems that can be caused by using UR. A dirty read can cause duplicate rows to be returned where none exist. Alternatively, a dirty read can cause no rows to be returned when one (or more) actually exists. Additionally, an ORDER BY clause does not guarantee that rows will be returned in order if the UR isolation level is used. Obviously, these problems must be taken into consideration before using the UR isolation level.

Keep in mind, too, that the UR isolation level applies to read-only operations: SELECT, SELECT INTO, and FETCH from a read-only result table. Any application plan or package bound with an isolation level of UR will use uncommitted read functionality for any read-only SQL. Operations contained in the same plan or package and are not read-only will use an isolation level of CS.

When is it appropriate to use UR isolation? The general rule of thumb is to avoid UR whenever the results must be 100 percent accurate. Following are examples of when this would be true:

  • Calculations that must balance are being performed on the selected data

  • Data is being retrieved from one source to insert to or update another

  • Production, mission-critical work is being performed that cannot contain
    or cause data integrity problems

In general, most DB2 applications are not serious candidates for dirty reads. In a few specific situations, however, the dirty read capability will be of major benefit. Consider the following cases in which the UR isolation level could prove to be useful:

  • Access is required to a reference, code, or look-up table that basically is static in nature. Due to the non-volatile nature of the data, a dirty read would be no different than a normal read the majority of the time. In those cases when the code data is being modified, any application reading the data would incur minimum, if any, problems.

  • Statistical processing must be performed on a large amount of data. Your company, for example, might want to determine the average age of female employees within a certain pay range. The impact of an uncommitted read on an average of multiple rows will be minimal because a single value changed will not greatly impact the result.

  • Dirty reads can prove invaluable in a data warehousing environment that uses DB2 as the DBMS. A data warehouse is a time-sensitive, subject-oriented, store of business data that is used for online analytical processing. Other than periodic data propagation and/or replication, access to the data warehouse is read-only. Because the data is generally not changing, an uncommitted read is perfect in a read-only environment due to the fact that it can cause little damage. More data warehouse projects are being implemented in corporations worldwide and DB2 with dirty read capability is a very wise choice for data warehouse implementation.

  • In those rare cases when a table, or set of tables, is used by a single user only, UR can make a lot of sense. If only one individual can be modifying the data, the application programs can be coded such that all (or most) reads are done using UR isolation level, and the data will still be accurate.

  • Finally, if the data being accessed already is inconsistent, little harm can be done using a dirty read to access the information.

Although the dirty read capability can provide relief to concurrency problems and deliver faster performance in specific situations, it also can cause data integrity problems and inaccurate results. Be sure to understand the implications of the UR isolation level and the problems it can cause before diving headlong into implementing it in your production applications.

Summary

It is important for DB2 DBAs and application programmers to know the four isolation levels and their impact on SQL. Using the isolation levels is an effective way to control concurrency and locking for your DB2 applications.

Labels: , , ,