Tuesday, May 26, 2009

Ensuring Data Integrity is a Tricky Business

The term "data integrity" can mean different things to different people and at different times. But at a high level, there really are two aspects of integrity with respect to databases: database structure integrity and semantic data integrity. Keeping track of database objects and ensuring that each object is created, formatted and maintained properly is the goal of database structure integrity. Each DBMS uses its own internal format and structure to support the databases, table spaces, tables, and indexes under its control. System and application errors at times can cause faults within these internal structures. The DBA must identify and correct such faults before insurmountable problems occur. Semantic data integrity refers to the meaning of data and relationships that need to be maintained between different types of data. The DBMS provides options, controls and procedures to define and assure the semantic integrity of the data stored within its databases.

Structural database integrity and consistency is critical in the ongoing administration of databases. If the structural integrity of the database is not sound, everything else will be suspect, too. There are multiple types of structural problems that can occur. Indexing problems are one. Certain types of database maintenance can cause such problems and DBAs need to be able to recognize the problem, and rebuild the indexes to correct their structural integrity. Indexes are not the only database objects that utilize pointers. Many DBMSs use pointers to store very large objects containing text and image data. These can become corrupted.In today's modern database systems, structural integrity is rare -- much rarer than it used to be.

The more difficult and more pervasive problem is assuring the semantic integrity of the data. Getting that right requires proper design, processes that match your business requirements, good communication skills, and constant vigilance.

Perhaps the number one cause of data integrity problems is improperly designed databases. Just getting the data type and length correct for each column can go a long way to making sure the right data is stored. Think about it. If you need to store dates but the column is defined as CHAR(8) how can you enforce that only valid dates are stored? You would need to code program logic to accomplish that. But if the column is defined as DATE then the DBMS would take care of it -- and more of the data would be likely to be correct.

The DBA must also set up data relationships properly in the database. This is done using referential integrity (RI), a method for ensuring the "correctness" of data within a DBMS. People tend to over-simplify RI stating that it is merely the identification of relationships between tables. It is actually much more than this. Of course, the identification of the primary and foreign keys that constitutes a relationship between tables is a component of defining referential integrity. Basically, RI guarantees that an acceptable value is always in the foreign key column. Acceptable is defined in terms of an appropriate value as housed in the corresponding primary key (or perhaps null).

The combination of the relationship and the rules attached to that relationship is referred to as a referential constraint. The rules that accompany the RI definition are just as important as the relationship. These rules define how data is to be properly added to the databases and what happens when it is removed.

There are other mechanisms in the DBMS that DBAs can use to enforce semantic data integrity. Check constraints and rules can be applied to columns that dictate valid values. The DBMS will reject invalid data that does not conform to the constraints. More complex data relationships can be set up using database triggers.

Every DBA should take advantage of the mechanisms provided by the DBMS to ensure data integrity. When DBMS-provided methods are used, fewer data integrity problems are likely to be found. Fewer data integrity problems mean higher quality databases and more proficient end users. You have to know what integrity rules are proper for the DBMS to enforce. But once defined, many of those rules can be enforced by the DBMS.

And that is very good, indeed!

Labels: , , , , , ,

Tuesday, May 29, 2007

INSTEAD OF Triggers [DB2 9 for z/OS]

DB2 9 for z/OS introduces a new type of trigger: the INSTEAD OF trigger. The primary usage of INSTEAD OF triggers is to enable views that would not otherwise be updatable to support updates. Typically, a view that consists of multiple base tables cannot be updated. But with an INSTEAD OF trigger this problem can be surmounted. You can code an INSTEAD OF trigger to direct inserts, updates and deletes to the appropriate underlying tables of the view.


With the INSTEAD OF trigger, your application code does not have to include complex algorithms to specify which operations should be performed against views and which should be performed against base table. Instead, all actions are performed against the view and the activated trigger determines which underlying base tables are to be impacted.


So, you might choose to code an INSTEAD OF trigger on a view over a join to allow modifications on the view to go through to the underlying base tables joined in that view. Or you can encode and decode data within a view: for example, the view could contain the decryption functions while the INSTEAD OF triggers use the encryption functions to ensure security.


Only one INSTEAD OF trigger is allowed for each type of operation on a given subject view. That is, one for inserts, one for updates, and one for deletes. Therefore, you can have a grand total of three INSTEAD OF triggers per view.


DB2 executes the triggered-action instead of the insert, update, or delete operation on the subject view. Neither the WHEN clause nor the FOR EACH STATEMENT clause are allowed in INSTEAD OF triggers.


Furthermore, there are some restrictions on the view in order for an INSTEAD OF trigger to be allowed. First of all, the view must exist at the current server when the INSTEAD OF trigger is created. Additionally, none of the following are permitted for a view to have an INSTEAD OF trigger:

  • the WITH CASCADED CHECK option
  • a view on which a symmetric view has been defined
  • a view that references data encoded with different encoding schemes or CCSID values
  • a view with a ROWID, LOB, or XML column (or a distinct type that is defined as one of these types)
  • a view with a column based on an underlying column defined as an identity column, security label column, or a row change timestamp column
  • a with a column that is defined (directly or indirectly) as an expression
  • a view with a column that is based on a column of a result table that involves a set operator
  • a view with any columns that have field procedures
  • a view where all of the underlying base tables are DB2 Catalog tables or created global temporary tables
  • a view that has other views dependent on it

One way to think of INSTEAD OF triggers is that they contain the inverse of the logic in the body of the view. If the view joins tables, the trigger should break the join apart to modify the correct data. If the view decrypts columns, the INSTEAD OF trigger should encrypt the columns. Etc.


Let’s take a look at an example to better understand the INSTEAD OF trigger. First, we create a view that joins the EMP and DEPT tables:



CREATE VIEW EMP_DEPT (EMPNO, FIRSTNME, MIDINIT, LASTNAME,
PHONENO, HIREDATE, DEPTNAME)
AS SELECT EMPNO, FIRSTNME, MIDINIT, LASTNAME, PHONENO,
HIREDATE, DEPTNAME
FROM EMP, DEPT
WHERE EMP.WORKDEPT = DEPT.DEPTNO;



OK, so far, so good. But since this view is a join it is not updateable. Let’s fix this by coding up some INSTEAD OF triggers. First, we’ll take care of inserts:




CREATE TRIGGER E_D_ISRT
INSTEAD OF INSERT ON EMP_DEPT
REFERENCING NEW AS NEWEMP
FOR EACH ROW INSERT INTO EMPLOYEE
(EMPNO, FIRSTNME, MIDINIT, LASTNAME, WORKDEPT, PHONENO, HIREDATE)
VALUES
(EMPNO, FIRSTNME, MIDINIT, LASTNAME,
COALESCE
((SELECT DEPTNO FROM DEPT AS D WHERE D.DEPTNAME = NEWEMP.DEPTNAME),
RAISE_ERROR('70001', 'Unknown dept name')
),
PHONENO, HIREDATE);



An insert against the view would not be inserting a new department, so we will be inserting data into the EMP table. If the department does not exist, we’ll raise an error. Next we’ll consider updates:




CREATE TRIGGER E_D_UPD
INSTEAD OF UPDATE ON EMP_DEPT
REFERENCING NEW AS NEWEMP OLD AS OLDEMP
FOR EACH ROW
BEGIN ATOMIC
VALUES(CASE WHEN NEWEMP.EMPNO = OLDEMP.EMPNO THEN 0
ELSE RAISE_ERROR('70002', 'Must not change EMPNO') END);
UPDATE EMP AS E SET
(FIRSTNME, MIDINIT, LASTNAME, WORKDEPT, PHONENO, HIREDATE)
= (NEWEMP.FIRSTNME, NEWEMP.MIDINIT, NEWEMP.LASTNAME,
COALESCE((SELECT DEPTNO FROM DEPT AS D
WHERE D.DEPTNAME = NEWEMP.DEPTNAME),
RAISE_ERROR ('70001', 'Unknown dept name')),
NEWEMP.PHONENO, NEWEMP.HIREDATE)
WHERE NEWEMP.EMPNO = E.EMPNO;
END








Finally we take care of deletions:




CREATE TRIGGER E_D_DEL
INSTEAD OF DELETE ON EMP_DEPT
REFERENCING OLD AS OLDEMP
FOR EACH ROW
DELETE FROM EMP AS E WHERE E.EMPNO = OLDEMP.EMPNO;


Using an INSTEAD OF trigger, each requested modification operation made against the view is replaced by the trigger logic. The trigger performs the insert, update, or delete on behalf of the view. No application changes are required because the code is in the trigger which resides in the database.


If you want to read more about INSTEAD OF triggers, I recommend this quite extensive article (albeit for DB2 LUW) out on the IBM Developer Works web site.


Labels: ,