Friday, January 30, 2009

Hey DBAs! Recoverability Trumps Performance

Many DBAs reading this blog will probably think I'm wrong, at least initially. They'll claim that managing performance is the most important thing they do, but they are confusing frequency with importance. Yes, DBAs confront performance issues more often than they build backup plans – and they better be managing performance more frequently than they are actually recovering their databases or their company has big problems!

So why do I say that recoverability is at the pinnacle of the DBA task list? Well, if you cannot recover your databases after a problem then it won’t matter how fast you can access them, will it? Anybody can deliver fast access to the wrong information (or worse yet, no information at all). It is the job of the DBA to keep the information in their company’s databases accurate, secure, and accessible.

So what do we need to do to assure the integrity of our database data? First we need to understand the availability needs of our data in terms of the business. In the event of a failure how rapidly must we be able to recover from that failure? Keep in mind that the failure could be either physical, such as a failed disk drive, or logical, such as applying the wrong input to a process which corrupts the database.

Only after we know the impact to the business can we develop an appropriate backup and recovery plan. We need service level agreements (SLAs) for recovery just like we have SLAs for performance. The recovery SLA needs to be phrased as a recovery time objective (RTO) from an application perspective; for example “The amount of time to restore application availability after a failure of the order entry system cannot exceed 2 hours (or 10 minutes or whatever is appropriate for your business)”

To create effective RTOs you will need to be able to answer the question “What is the cost of not having this data available?” When we know the expectations of the business we can work to create a backup and recovery plan that matches the requirements. There are multiple techniques and methods for backing up and recovering databases. Some techniques, while more costly, can enhance availability by recovering data more rapidly.

It is imperative that the DBA team creates an appropriate recovery strategy for each database object. This requires mapping database objects to applications so we can adopt the proper strategy in accordance with the application recovery SLA. Some database objects will participate in multiple applications, and their recovery strategy will therefore be more complex.

Not all data is created equal. Some of your databases and tables contain data that is necessary for the core of your business. Other database objects contain data that is less critical or easily derived from other sources. Armed with this information, DBAs can develop RTOs such that the recovery plan matches the needs of the business.

Establishing a reasonable backup schedule requires you to balance two competing demands: the need to take image copy backups frequently to assure reasonable recovery time, while at the same time dealing with the need to take image copies infrequently so as not to interrupt daily business. All the while keeping in mind, if you make fewer image copies you will need to apply more log records during the recovery, and the recovery will take longer. The DBA must balance these competing objectives based on RTOs, usage criteria, and the capabilities of the DBMS.

When was the last time you re-evaluated and tested your backup and recovery plans? Oh, you may have looked at disaster plans, but have you examined your ability to recover locally? Do you know how long it would take to recover your most important primary customer tables, for example, if you took a hit in the middle of the day?

Regular recoverability health checking should be a standard, documented responsibility for the DBA staff; and if you can acquire software to automate the health-check process, all the better.

Labels: ,

Friday, January 02, 2009

Recovery AssuranceExpert for DB2 z/OS: Automating the IT risk management of business availability

Business availability is more than just having a reliable hardware and database platform in place. Even the best high availability environment cannot safeguard itself from logical errors. Since most companies cannot afford downtime, it is important that the enterprise data on which they depend is always available.

Well-planned recovery procedures should be able to assure a complete recovery of enterprise-critical data within a pre-defined time window that provides for minimum disruption of the business. However, within complex environments, it is nearly impossible to perform recovery tests without disrupting the production system. Therefore, even the best-planned recovery scenarios fail because of operational risks resulting from unforeseen and typically immeasurable vulnerabilities.

If you are interested in minimizing the risk associated with DB2 for z/OS availability and recoverability, read this white paper by Brenda Honeycutt to learn about the value of regular, periodic health checks to assure your recovery time objectives for DB2 recoverability.

Labels: , ,

Monday, October 20, 2008

DBA Rules of Thumb

Database administration is a very technical discipline, but it is also a discipline in which the practitioner is very visible politically within the organization. As such, DBAs should be armed with the proper attitude and knowledge before attempting to practice the discipline of database administration.

Just as important as technical acumen, though, is the ability to carry oneself properly and to embrace the job appropriately. With this in mind, I wrote a series of blog entries on DBA Rules of Thumb over at my Data Management Today blog... and I thought the information I wrote there may be helpful to my DB2 and mainframe readership here, so I'm sharing the eight rules of thumb (with links) here on my DB2 Portal blog:
  1. Document Everything!
  2. Automate Ingelligently
  3. Share
  4. Don't Panic!
  5. Focus Your Efforts
  6. Invest In Yourself
  7. Diversify
  8. Develop Business Acumen
What do you think? Did I miss anything important?

P.S. Just a reminder that I will be presenting a webinar on assuring DB2 recoverability with my colleague, Michael Figaro, this Thursday, October 23, 2008 at 10:30 Central time. If you are at all interested in the topic, be sure to register today - and attend this Thursday!

Labels: , ,

Thursday, October 09, 2008

Assuring the Recoverability of Your DB2 Databases

Availability requires much more than just having a reliable hardware and database platform. Most companies cannot afford significant downtime, and some cannot afford any! As such, it is crucial for unplanned outages to be as short as possible. But it is not just a business requirement, in many cases assuring a speedy recovery is also a legal mandate. Regulations such as SOX and Basel II dictate that any outage is resolved within a predefined period of time.

But how many of us can answer, with any degree of certainty, the question “How long will this outage last?” There are many variables that need to be considered when estimating a DB2 recovery time: backups available, quality, point-in-time requirements, amount of log processing, disk speed, tape mounts, and on and on and on...

With these thoughts in mind, Michael Figaro and I will be delivering a webinar titled Assuring the Recoverability of Your DB2 Databases, on Thursday, October 23, 2008 at 10:30 am CDT.

We will tackle issues ranging from regulations, IT complexity, and business continuity, to DSNZPARMs and backup/recovery planning. We’ll also make the case that planning for database recoverability is the most important task among the many tasks of the DBA.

As part of the webinar we will introduce and demonstrate Recovery AssuranceExpert, a new technology to help you ensure that all of your critical DB2 objects are recoverable within your recovery time objectives. Recovery AssuranceExpert is an automated solution to perform daily health checks of data availability and recoverability, as well as provide actual recovery times required for a DB2 object, a complete application, or even a whole DB2 subsystem. Join us on October 23rd to find out how you can insure that your actual recoverability times fit into your SLAs.

Labels: , , ,

Tuesday, September 04, 2007

MODIFY RECOVERY [DB2 9 for z/OS]

Another IBM DB2 utility that has received some attention under Version 9 is MODIFY RECOVERY. For those not sure what this utility is, MODIFY RECOVERY is used to remove records from the SYSIBM.SYSCOPY table in the DB2 Catalog, along with related log records from the SYSIBM.SYSLGRNX directory table and related information from the DBD.

Before the new V9 changed, when you ran MODIFY RECOVERY you had to specify the deletion criteria in one of two ways:

  • Given a specific date, delete all recovery information before that data
  • Or given an age in days, delete all recovery information older than the age

But as of V9 we can alternately specify what is to be retained instead of what is to be deleted. This way, we tell the utility what we want to keep and it will remove the rest.

So, instead of coding the DELETE parameter (with a DATE or AGE) we would instead code a RETAIN parameter. The RETAIN parameter can specify one of five different options:

LAST (integer) – Using this option informs DB2 to choose a date such that the number of full image copies kept is not less than the integer specified. So, if we specify LAST(5), then we will delete all entries that exist prior to the date on which the fifth oldest copy was taken. If more than one copy was taken on that date the result could be that more than 5 copies are retained.

LOGLIMIT – Specifying this option will delete all copies where DB2 no longer has log records to recover forward from. DB2 uses the BSDS to establish the oldest archive log timestamp.

GDGLIMIT – You can use this option to enable the number of copies to be retained to be linked to the corresponding GDG base limit.

GDGLIMIT LAST (integer) – It is possible also to combine the GDGLIMIT and LAST options. DB2 will then use the GDG base limit, if the last primary copy is a GDG, if not it uses the integer specified.

GDGLIMIT LOGLIMIT – And finally, we can combine the GDGLIMIT and LOGLIMIT options, too. If the most recent primary full copy is a GDG then the GDG limit is used. If the last copy is not a GDG then the BSDS is used to determine the date prior to which deletions will occur.

The last two options (the ones that combine two parameters) are particularly useful if you have a mix of GDGs and non-GDGs for your image copies. They are also helpful if you use the LISTDEF option when you make image copy backups.

It is important to understand that for all these options MODIFY RECOVERY will cause DB2 to determine the most recent date that satisfies the retention requirement. After ascertaining the correct date to use, all entries before that data will be deleted. In effect DB2 chooses the most recent date that would satisfy the retention criteria. So, under some circumstances there may be more than one image copy on the specific date that is established, and therefore DB2 will keep those additional copies, too.

As an example, say we take 7 image copies on a single day. Then we run MODIFY RECOVERY … RETAIN LAST (3). In this case, all 7 will be retained as they were made on the same day, even though we indicated that we wanted to retain only the last 3.

Finally, there is one more significant change made to the MODIFY RECOVERY utility. Prior to V9, information was only deleted when copies were removed from SYSIBM.SYSCOPY. But as of V9, MODIFY RECOVERY will deletes SYSIBM.SYSLGRNX entries even if no SYSIBM.SYSCOPY records are deleted.

Why is this important? Well, some shops have migrated over to running BACKUP SYSTEM instead of individual image copies. In this case, there will be no individual image copies to remove in SYSIBM.SYSCOPY, but the SYSIBM.SYSLGRNX records will continue to grow. After migrating to V9 you can use MODIFY RECOVERY to trim the size of SYSIBM.SYSLGRNX which can improve performance for processes that need to access log range information.

Labels: , ,