Disaster Recovery Plan for Information Technology Organizations

Abstract

Businesses today face not only uncertain markets but also a range of threats and disasters, both natural and artificial. Natural disasters include tsunamis, tornadoes and earthquakes, while artificial disasters include fires, power outages and even terrorist attacks. What these disasters have in common is that they strike quickly and without warning and destroy business infrastructure, including IT systems, networks and buildings. A look at recent disasters and how they affected the businesses involved is included to show why disaster recovery and business continuity plans are critical, especially to a government organisation. One look at the FEMA fiasco after Katrina should be proof enough of the consequences of not having a comprehensive and tested disaster recovery plan and business continuity plan. People died in New Orleans unnecessarily for this reason. To ensure that organisations continue to function even after a disaster has struck, a viable Disaster Recovery Plan (DRP) combined with a Business Continuity Plan (BCP) needs to be in place and tested. These plans should specify the centres from which data should be recovered, alternate sites where operations can be resumed, what data to back up, detailed procedures to be used and the team structure that would implement the DRP, along with details like who will do what, when, where and how. Such a plan will avoid loss and confusion in the face of a disaster and reduce the time to recover data. It may even save lives. This dissertation provides details of the planning and implementation of a DRP and BCP for an IT organisation, including the steps to be followed, network diagrams, templates for risk assessment and other supporting material.

Introduction

Background of the Problem

Businesses worldwide operate under conditions that are subject to change, depending on the political, economic, and natural conditions. Businesses are constantly under the threat of disasters, such as earthquakes, terrorist attacks, fire, power outages, and stock market crashes. Due to such risks, the intellectual assets, such as classified documents, source codes, and physical assets, such as infrastructure and hardware, run the risk of compromise. In such a situation, a plan must be in place to allow the business to recover its intellectual assets and continue the business operations within the shortest time possible. It is also essential to assure clients and business partners who have invested time and resources that in case of a disaster, their investments will be recovered within an acceptable time frame. To handle such situations, it is important to have an IT disaster recovery plan that is implemented to counter the effects of disasters.

Having a Disaster Recovery Plan (DRP) and a Business Continuity Plan (BCP) becomes even more important when a business expands to the overseas market. It is the aim of this paper to discuss the important elements of a DRP and BCP for any company, especially those with global operations. It is no longer enough to make sure that all data is backed up. Data backup may be the most important part of a Disaster Recovery Plan, but it is just one part of it. Business continuity depends upon having a clear plan for getting the company up and running in the event of any interruption. This kind of plan involves more than making sure all data is safe. It is a plan for ensuring that the service level and business process are not interrupted any longer than absolutely necessary and that essential business processes are up and running as soon as possible, followed by a return to normal business in a reasonably short time. Too many companies, as evidenced during the recent disaster of Hurricane Katrina in New Orleans, have not planned adequately for disaster. It is much like the old story about the leaky roof: when it is raining, one cannot fix the roof; when the sun is shining, the roof doesn’t need to be fixed. Today’s businesses, even if they are not globalised, can incur tremendous losses from downtime, and lost data can mean going out of business. Damaged assets will eventually be covered by insurance in many cases, but the Business Continuity Plan ensures that the business will resume as soon as possible in spite of material losses.

Aims of the Project

It is the intention of this paper to outline the importance of both the Disaster Recovery Plan and the Business Continuity Plan in an effort to make a convincing argument to create them or upgrade outdated plans. As part of this process, I will outline completely what the best practice should be for establishing a workable DRP and BCP.

Thesis Statement

All businesses which hire employees and work with data need both a Disaster Recovery Plan and a Business Continuity Plan, both to ensure that proper steps have been taken to protect the assets of the company and to ensure that procedures are established and documented so that business resumes as soon as possible, minimising the costs of interruptions. The larger the company, the more important these plans become.

Importance

Data is a company’s lifeblood. It includes customer records and information, employee records, supplier records, orders in progress, inventory, company financial and tax records, product information and documentation, marketing collateral, correspondence, company website and newsletters. In addition to guarding against data loss, a Business Continuity Plan will minimise losses due to downtime. When the plan is used, it will also instil customer confidence, because customers will see that the company is smart enough to have a plan and that they are important enough to warrant one. For essential services or government offices, this kind of plan is essential to the community at large.

Methodology

The methodology for this project will include a close look at current literature, a careful examination of the best practices for establishing useful DRPs and BCPs and an assessment of the risks involved in not having these plans in place. We will examine the literature for records of problems when a DRP or BCP was not in place to see what the results actually were. Wherever possible, comparisons will be made.

Literature Review

Why a DRP and a BCP are Needed

Recent events such as the 9/11 attacks, Hurricane Katrina and the tsunami in Southeast Asia showed that disasters, both natural and artificial, can strike with very little warning and totally take out infrastructure, such as buildings and essential services, or even whole towns and cities. Cabling and any IT systems that are located in any of these locations may also be damaged or destroyed, and power to get up and running again may not be available. Data backed up on-site may not be accessible. Benton (2007) defined disaster recovery as “the process, policies and procedures of restoring operations critical to the resumption of business, including regaining access to data (records, hardware, software, etc.), communications (incoming, outgoing, toll-free, fax, etc.), workspace, and other business processes after a natural or human-induced disaster”. Any disaster recovery would also involve reconstruction of buildings, relocating people, building roads, restoring power and communications and many other activities (Meade, 1993). However, the immediate need is to be able to access all data and be ready to conduct business as usual.

In the current environment, the threat from terrorists and also from nature places IT systems at high risk. Since many companies have very strict rules and regulations regarding retrieval and storage of sensitive information, data tends to get centralised. If a disaster strikes the central server room where the data is stored, then all the company’s soft assets could be lost forever. Information about customers, business strategies and records, marketing and trading information and other details would become irrecoverable. In such a scenario, a strategic plan that protects all computer-based operations necessary for the company’s day-to-day survival is vital. If a company loses sensitive data, then it not only loses its soft assets but also the confidence of its customers, and there is no acceptable excuse for such carelessness. With the increasing use of IT systems and dependence on business-critical information, the importance of protecting irreplaceable data has become a top business need. Since many companies rely on IT systems and regard them as critical infrastructure, regular backup is crucial so that even after a disaster strikes, the company can begin operating again within a short period of time. Many large companies allocate up to 4 per cent of their IT budget to disaster recovery systems. It is estimated that 43 per cent of companies that have lost data during disasters and could not replace the data went bankrupt, while 51 per cent had to shut down within two years, and only six per cent could survive in the long run (Swartz, 2004).

A good IT Disaster Recovery Plan to ensure data recovery and an alternative location, plus a Business Continuity Plan to make necessary adjustments in locations and personnel, are therefore required to ensure that a company is able to recover quickly in case of a disaster, retaining customer confidence and continuing business as soon as possible. Most companies are woefully unprepared for a real disaster and not as well prepared for a minor disaster as they think. The major reasons are that they do not really know what to do, since DRP and BCP are not a major part of business education, even now, and that those which have such plans have not tested them (Landry and Koger 2006).

Lessons from the Past Should Encourage Us to Ensure the Future

In looking at some of the more extreme recent disasters, we can see the needs, but most companies don’t really know how they will fare unless they have already been there. However, looking at case studies should help us to learn from the triumphs and the mistakes of others. Just a short list of disasters would include the 9/11 attack on the World Trade Center, four hurricanes that hit Florida in 2004, the tsunami which impacted 14 nations in December 2004, Hurricane Katrina, several typhoons and the very recent earthquake which hit China. It is not a question of if a disaster will occur, but when. When the US Federal Building in Oklahoma City was bombed, data, as well as on-site backups, were destroyed. Since data losses and system unavailability resulting from a disaster cripple the operation of an organisation, the Sarbanes-Oxley Act of 2002 requires organisations to maintain accurate and safe record-keeping, or they can be charged with negligence (Choy et al. 2000).

The East Coast blackout of 2003 resulted in estimates of direct costs from millions in lost retail sales to $6.4 billion in total losses (Anderson and Geckil 2003). Even companies that have contracts for service with very large dependable companies may find that their four-hour response deadline cannot be met if premises are not accessible or roads are closed (Landry and Koger 2006).

What disasters cost the U.S.

Sarrel (2007) says that even small businesses need Disaster Recovery and Business Continuity plans. They may be limited to remotely backed up data and an operations manual detailing how business is conducted and who does what, for and with whom etc. It should include a contact list for employees, a customer list with profiles and order history and all the company’s partners and supply chain companies. The article included a small graph of the losses from disaster in the US from 1986 to 2005. The numbers (in billions) are bigger than the operating budgets of many small countries.

Security should also be a major consideration. Vijayan (2005) stated in an October 2005 column in Computerworld that most companies do not include security risks in their DRPs. Insecure backups pose a risk even without the occurrence of a separate disaster. He quoted Palmer of Lenox saying that IT departments need an expanded perspective of what constitutes a disaster. John Pironti, the principal security consultant at Unisys Corp., mentioned that disasters like Katrina and Rita make IT departments focus on physical equipment, while most are still stuck in the mindset of the 1970s and 1980s concerning security.

Bandyopadhyay (2001) identified some special needs for HMOs (Health Maintenance Organizations). In particular, he says that there are two elements of the disaster recovery planning process which are crucial to recovery for HMOs: business impact analysis (BIA), to assess the effect a breakdown would have on the business, and testing the plan (Keehn 1993; Tetry 1995). Fonseca (2002) noted in an article for Infoworld’s Services magazine that IT managers are nervous about committing DRP and backup to an outside source, but that it might not be such a bad idea, since these companies are experts at this. Most of these companies can also help devise a workable BRP, which Yager (2002) says is even more critical since most companies ignore it.

9/11 Attacks on the World Trade Center

The World Trade Center (WTC) Towers were home to 1,200 businesses that employed over 10,000 people. Some of the largest were:

  • Morgan Stanley Dean Witter
  • Bank of America
  • Deutsche Bank
  • Oppenheimer Funds
  • Credit Suisse First Boston (Stevens 2003).

Many other large financial companies had offices in the WTC’s local vicinity and were extensively damaged or destroyed. The most severely affected were those in the building. These companies may have lost personnel and certainly lost hard assets. Those companies with appropriate Disaster Recovery and Business Continuity Plans were able to continue doing business within a reasonable time and recovered all their data. Those with less than good plans did not fare so well, even if most or all of their people survived. Others nearby had problems with access, air quality and services, so many had to operate from other locations.

TheBeast

“TheBeast develops software and platforms for real-time data distribution and online securities trading across the Web, wireless connections, and wide-area networks. On September 11, the company was relatively lucky, however. Its offices were on the 80th floor, just below where the first plane hit. All 63 of the company’s employees escaped the building safely. By the end of the week, most were working in an alternate facility in New Jersey. The company, however, wasn’t as fortunate in saving its data. TheBeast had a disaster recovery plan that entailed only weekly backups of software code. There was no backup plan implemented for other data resources, such as email.” (Bannon 2002). This company lost a week of coding, all its email and some applications housed in its data centre, which had not been backed up to a level 3 off-site server. Had it been using one of the more sophisticated email programs, such as Exchange Server, available long before this disaster, the backup could have been created without ever interrupting the flow of the program (Symoens, Jeff 2000). This company is not unique. At the time of the writing of this article, it could easily have been said that every company in the US would lose data in a disaster, and some would lose it all. With proper planning, no company needs to lose more than a few transactions: whatever can be processed in the space of just a few minutes or even seconds.

Empire Blue Cross and Blue Shield

Empire Blue Cross and Blue Shield had a better plan. On September 11, the company’s 1,900 employees occupied ten floors at 1 World Trade Center, and the network resided there with 250 servers and a web call centre. Eleven people lost their lives in the attacks. However, the quick actions of a senior server specialist at the remote backup centre saved a huge amount of work and were responsible for the smooth transfer of the business to that site. As soon as the news was heard at the remote site, just minutes after the attack, the specialist in the Albany data centre switched everything to the Albany server, including all employee profiles, before the building went down. Empire’s very thorough DRP meant that the 4.4 million people for whom it provides health insurance did not see any difference. Employees, over the next few days, could log on at the remote sites as if they were still in the building. All traffic that would normally have gone to the World Trade Center was automatically rerouted to Albany and Long Island. Within an hour after the attack, IBM Global Services helped the company place an order for 150 servers, 500 laptops and 500 workstations. I must emphasise that the key to the effectiveness of this, or any, DRP/BCP is people. If a company loses a large portion of its IT staff, or any staff for that matter, a plan for temporary, or even permanent, replacements should also be available (Bannon and Levin, 2002).

Here is an illustration of how Empire Blue Cross’s DRP worked after 9/11.

[Figure: Empire Blue Cross’s DRP after 9/11]

Merrill Lynch

Within minutes of the crash, IT was transferring control to the backup facility in New Jersey. The 60,000-square-foot facility was designed as a disaster response facility, and all personnel knew where to dial in, so transactions did not even blink. Trading operations transferred smoothly to London, Tokyo and Hong Kong. The company used a telemarketing service and its public Internet site to communicate with workers. Five thousand technical employees, many working 36-hour shifts, re-established communications with the stock exchange. On Monday morning, when the exchange opened, Merrill Lynch was there, using telephones to communicate with the floor because electronic links were down, but doing business as usual. Its research analysts had to work from home as all their data had been lost. They compiled new data and used email to send it to the traders on the floor. One thing learned was that backup facilities for mission-critical business functions should not be located where they might be vulnerable to the same disaster as the primary site. The NYSE later decided to move its backup because Brooklyn was considered too close. Oddly enough, the IT infrastructure of the WTC was not destroyed because of backups deployed elsewhere (Ballman, Janet 2001).

Phone service was another problem. All three major cellular carriers replaced their downed towers with COWs (Cells on Wheels). Verizon repaired its entire lower Manhattan infrastructure over the weekend, a truly heroic feat. Paul Lacouture, president of Verizon’s network services division, said, “I’ve gone into our buildings after the fire. I’ve restored our networks after floods and earthquakes. This was a combination of all those things times a factor of three or four.” Decentralisation of backups was responsible for minimising the loss of data. Companies with all their eggs in one basket lost everything. On-site backup is simply not sufficient. Organisations that relied heavily on paper records are rethinking their reliability. Almost no paper records survived this disaster. Many people consider paper to be more reliable than electronic storage, but this is no longer the case.

Government offices, such as the New York Port Authority, historically rely heavily on paper records. The Port Authority, whose offices were in the World Trade Center, lost nearly all of its paper records, but it was able to continue doing business using the electronic backups. The company Marsh and McLennan had just finished converting nearly 25 million documents to digital format, and these were stored off-site, so the effort paid off. This disaster showed the vulnerability of paper documents to be intolerable for good business practices. All companies should give themselves five years to convert all documentation to digitised format, which should be backed up off-site. Backup tapes on-site are almost useless in this kind of disaster and should only be used as a method for restoring compromised computer systems in a level 1 disaster (Stevens 2001).

Morgan Stanley Dean Witter

Morgan Stanley Dean Witter had about 4,000 employees in the Twin Towers. Fortunately, the employees on the high floors of Tower Two had a 20-minute warning before the second plane struck. An evacuation plan which had been put into place long before was implemented immediately, and the company wound up losing only 13 employees. Those employees in charge of operations immediately walked 22 blocks to the backup site and turned on all the computers. “One of the earliest and best decisions he made, Scott said, was turning one of Morgan Stanley’s credit card call centres in Phoenix into a toll-free emergency hotline as a first step in locating the company’s 3,700 missing employees. The number was posted on national television by 11:00 A.M. ‘We had the first national emergency number of any organisation, including the federal government,’ he said. ‘By 1:30 P.M., the centre had received more than 2,500 calls.’” (Walsh, Catherine 2001). “‘If you wait for a crisis to begin to lead, it’s too late,’ said Scott, president and chief operating officer, during a recent talk at Harvard Business School.”

Deutsche Bank

“Deutsche Bank’s New York offices have tested their disaster recovery plan, intentionally failing over all of their mission-critical applications and servers between the bank’s two data centre locations: the World Trade Center in New York and a location in neighbouring New Jersey.” (Bannon, Karen J. 2002). The company was up and running on backups within minutes of the disaster of 9/11. Sadly, it lost two employees and all of the 5,000 PCs in its offices. However, 2,000 laptops were purchased immediately, and within a week, all employees were working, either at the New Jersey backup installation or from home.

Sidley Austin Brown and Wood

When two planes slammed into the World Trade Center (WTC) buildings on September 11, 2001 (9/11), the law firm Sidley Austin Brown & Wood LLP (SAB&W) was hit directly. The firm occupied floors 54 through 59 in the North Tower, the building hit first. Its people were evacuated successfully. The bigger challenge would come during the next months, as lawyers requested but did not always receive files. The pre-9/11 refrain of “No, don’t send that off-site” became “Please tell me you sent it off-site!” Digitised documentation would certainly have helped here. Lessons learned: communication was a problem and should have been planned in more detail; some senior personnel were not assigned tasks, which was frustrating; and friends proved valuable, including colleagues, customers, suppliers and even competitors. The insurance policy, which had been updated on September 1 and doubled, was a blessing: it covered everything in the way of tangible assets, and even employee personal effects. (Their insurance company was not cheering.) (Barr, Jean 2003).

NYSA

The companies with the least disruption were reported to be those with mirrored data. The companies with the most disruption relied heavily on paper records or lost a large portion of their personnel. Streich, of NYSA (New York Shipping Association), said, “If you have a disaster plan and you don’t test it, you will not have a job” (Grygo, Eugene 2001).

The consensus among all the sources was that people were the real key to how companies recovered, but the backed-up data was vital to their roles (Gibson, Stan 2001). Communication was most often cited as a problem. (2005).

One last word about disasters: the one we need to be prepared for is the one that hasn’t happened yet. In all likelihood, at some time in the future, there will be a massive cyber attack. Most companies and government offices are not really prepared for this. Yes, there is security in place, but few organisations have plans detailing what to do if all of the security walls are breached (Carlson, Caron 2005).

Hurricane Katrina

This disaster was one of the worst possible for businesses. Those who did not lose personnel had damage to their premises, and their locations were not accessible for weeks, possibly months. Personnel who survived were probably scattered to different locations and may have had no place to live if they returned. Equipment and data backup media were either damaged or destroyed by the storm, the flood or the humid environment which followed. Lack of community services made it impossible for most businesses to return to New Orleans for an extended period, even if their people could. For companies, downtime is costly for them and everyone in the supply chain. For government and essential services, the problems simply multiply, especially as government and private enterprise are almost inextricably intertwined by virtue of that aforementioned supply chain. (Landry and Koger 2006).

Companies in the immediate area were not the only ones impacted by Katrina. Any company in the supply chain of affected companies felt an impact. In addition, problems with traffic and power spread across a wide area. Essential services in nearby communities were severely strained. The long-term power outages after Katrina extended even into Georgia and the Carolinas (Breed 2004; Brice and Langan 2004). Many companies in New Orleans had backup generators to cover electrical needs but had to shut down large water-cooled mainframe systems because city water pressure failed two days after the storm (Landry and Koger 2006). They have since begun to look at the power consumption of their data centres and make changes. Power is always a problem in a major disaster, and there is no assurance that it will not be limited in the recovery site, no matter where that is located.

Some lessons learned from Katrina concern how far recovery centres should be from the business centre. Larry Dignan (2006) of eWeek mentions in the August 28, 2006 issue that they cannot be too far away. Many businesses found themselves with both their primary site and their backup site compromised and inaccessible. Even if they had an accessible remote backup of data, they had no computers on which to run it. If they had a good emergency provider, then this problem was resolved eventually, but transporting the personnel they could find and getting replacements for those they could not find was a major problem.

The supply chain was another major problem. FEMA had major problems. They could not track supply trucks after they left the warehouse carrying disaster aid supplies. They have since added tracking devices such as the ones Wal-Mart uses in its supply chain. Tim Titus of LexisNexis suggests that backup suppliers be included in the DRP/BCP since current suppliers may also be affected by the same disaster (Britt 2005). They had the required DRP, but it was outdated and had never been tested.

One thing about hurricanes is that they do not change much. Businesses in Florida learned lessons from Ivan, so they were able to help after Katrina. Katrina taught Houston area businesses enough to get them more ready for Rita a month later (Mearian, Lucas and Weiss, Todd R. 2005). Mearian, in another article (2005), noted that Pacific Capital Bancorp decided to bring disaster recovery programs in-house rather than trust them to an outsourcer where the backup facility is thousands of miles away and there is a question of who will get priority in a shared facility. Many businesses have long considered disaster recovery plans and backup facilities to be insurance policies without a return on investment. However, now that it is becoming abundantly clear that the insurance policy is absolutely necessary, companies are looking at creating deletion policies to reduce the huge volume of data that needs to be backed up. Companies like Bally have every bit of code they ever created backed up on WORM drives and replicated to DVDs. However, much of the day-to-day data is not that valuable (Pudy, Teresa 1999). Many public businesses, such as hospitals, are required by government regulation to keep everything. Perhaps this needs to be changed. At some time in the future, this amount of data is going to become a cumbersome burden. Ken Black of Yahoo handles from 4 to 7 petabytes of data backup. He says, “We have a group called the Paranoids. They’re our security people, and they look for holes everywhere, and what’s irritating is, we’re finding them everywhere.” (Mearian, Lucas 2005).

The day before Katrina came ashore, the IT staff at Tidewater Inc. moved the company’s IBM AS/400 and Compaq servers from New Orleans to Houston. IP addresses would automatically fail over, and the servers were running in just over two days. Tidewater provides supply vessels and marine support services to the offshore energy industry. Since this is a critical function, the company has since opened a new backup facility in Dallas. Companies all across the Gulf Coast have since created new disaster recovery plans (Fisher et al. 2006).

Problems Incurred in Planning and Recovery

A number of problems were incurred in planning for disaster recovery and during the recovery process. Lundquist (2004) emphasises that DRPs need to include an up-to-date inventory. How often this should be updated depends upon the company’s purchasing schedules. The lessons learned from each of these can be applied in future plans. The range of problems covered an impossibly wide area, but certain types of problems recurred over and over again, so we will concentrate on these.

Scheduling

Many of the disaster recovery problems concerned scheduling the recovery. In fact, scheduling is such a problem that even when the same company ran repeated simulations, scheduling problems recurred, and each was different from all the others. There are simply too many variables to consider except in a dynamic fashion.

Keeton (2006) outlines some strategies for dealing with these problems and presents a number of useful tools. “The data recovery scheduling problem is challenging: with multiple workloads to recover and multiple strategies for each workload’s recovery, the number of possible schedules is large. Unfortunately, the categorical approaches administrators use to devise recovery plans don’t provide optimal solutions: their inefficiencies result in millions of dollars of extra penalties in the face of disasters.”
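
To make the scale of the scheduling problem concrete, the following is a minimal sketch and not Keeton’s method. It assumes a single recovery resource that restores one workload at a time and that each workload incurs a fixed penalty for every hour it remains down. Under those assumptions, restoring workloads in descending order of penalty rate divided by recovery time (commonly known as Smith’s rule) minimises the total penalty. The workload names, recovery times and penalty rates are hypothetical.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Workload:
    name: str
    recovery_hours: float     # hypothetical time needed to restore this workload
    penalty_per_hour: float   # hypothetical cost of each hour it stays down


def total_penalty(order):
    """Sum of penalty_per_hour * completion time, restoring one workload at a time."""
    elapsed = 0.0
    penalty = 0.0
    for w in order:
        elapsed += w.recovery_hours
        penalty += w.penalty_per_hour * elapsed
    return penalty


def schedule(workloads):
    """Order by penalty rate per recovery hour, highest first (Smith's rule)."""
    return sorted(
        workloads,
        key=lambda w: w.penalty_per_hour / w.recovery_hours,
        reverse=True,
    )


# Hypothetical workloads, for illustration only.
workloads = [
    Workload("email", 4, 500),
    Workload("order processing", 6, 3000),
    Workload("payroll", 2, 800),
    Workload("archive", 10, 100),
]

plan = schedule(workloads)

# Brute force over all 24 orderings, used here only to check the rule.
best_by_brute_force = min(permutations(workloads), key=total_penalty)

for w in plan:
    print(w.name)
print("total penalty:", total_penalty(plan))
```

With four workloads there are only 24 possible orderings, so the brute-force check is trivial. With twenty workloads there are more than 2 x 10^18 orderings, and real recoveries add alternative strategies per workload and several resources working in parallel, where a simple ordering rule no longer guarantees the best schedule.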

DRPs and BCPs are, in fact, only a framework for a professional manager to use for planning. Since no two real disasters are the same, it requires extensive contingency planning and a talented, intelligent team to deal with the differences. Nemzow (1997) spent 14 pages just outlining how to set up flexibility and contingency plans. These have had to be expanded to include the lessons learned from more recent disasters.

As noted by Griffiths and Kramolis (2007), the larger the installation, the more numerous and larger the possible problems become. Tata Consulting mentioned in an article for eWeek that a disaster recovery plan needs to extend to offshore providers, and that a contingency plan is needed in case they cannot deliver (Dignan, Larry 2006).

Conclusions From The Literature Review

From this review of the literature, which included documents detailing some of the more recent disasters and how companies responded, we know that the most important thing of all in disaster recovery and business continuity is the plan. Every company that had a plan managed to recover eventually. Having a written plan is more important than almost anything else. Digital backups are certainly extremely important, but most companies have been doing these routinely for a long time. What the companies have not done is to think out thoroughly exactly what they need to do when a disaster strikes. Even if financial constraints prevent the full implementation of a complete disaster recovery plan and business continuity plan, having the plan creates a set of guidelines that can be followed in such an event. It is a simple matter to make a note of those things that have not yet been implemented while creating these two plans. By writing the plan out, discussing it with colleagues and running it by other professionals, portions of the plan can be prioritised effectively. It was found that the business continuity plan was every bit as important as the disaster recovery plan because it is the framework by which the company will get back to business as usual.

The second most important point found in these reports was that testing the plan is nearly as important as having it. This especially applies to digital records and to controls, which need to be failed over to the backup site. It also applies to having people assigned to tasks and having the ability to communicate with all the company’s people. Communication was cited as an extremely important problem after all of the disasters covered. The most important problems in all of these reports were those that pertained to people. Many companies set up call lines that employees could use to call in and let the company know where they were.

A third very important point found was that there is no such thing as a backup site too far away. However, a backup site that is too close may become just as useless as the primary site. Many companies kept their backup site near, especially if they were backing up paper records. Paper records were found to be extremely vulnerable, and it is suggested that all records need to be digitised. There also needs to be a policy created for how long records are kept and in what state. Without this policy, companies may lose valuable records, or worse, be totally inundated in useless, outdated trash.

Suggestions for PWA

Considering all the lessons learned from the few disasters covered here, it should be abundantly clear that every government office or company that supplies critical public services must have a thorough Disaster Recovery Plan and a well thought out Business Continuity Plan. What follows are suggestions for this organisation’s disaster recovery and business continuity plans. Some general information is included to provide the broad overview needed to understand each facet of the plan. However, care has been taken to keep the topic narrowly focused on the needs of the Public Works Authority.

How to Formulate the Backup Plan

A backup plan must be tailor-made for each company. In order to know exactly what needs to be backed up, an assessment of the business process is necessary. The analysis report must contain every step in the business process, from initial contact with the customer to the final acceptance of payment and its disposition. The reason this has to be so thorough is to ensure that every step in the business process can be replicated at a remote location after a disaster. The detailed report will contain what is done by whom for the entire business process and what items are necessary for the satisfactory completion of the process. The questions to be answered by this report will include the following:

  • How does the customer know about the organisation?
  • How does the customer contact the organisation?
  • How is the initial contact handled, and by whom?
  • How is the order placed?
  • How is the order fulfilled, and by whom?
  • What are the details of the supply chain?
  • How is the order delivered?
  • How is payment made?
  • How is payment disbursed after receipt?

Once every step is detailed, then notations need to be made concerning the necessary equipment for completing each step. Also, each person who handles one of the steps needs to document how that step is accomplished. In this way, there is documentation of the business process in case of missing personnel. Replacements can then use the document to guide them in replicating that step of the business process. Please note that the business process may appear simple until it needs to be documented. It is procedural knowledge that is being documented, which is far more complicated than domain knowledge. That is, the procedures that we know seem simple until we have to write down every step for someone whom we assume knows nothing.

What to Backup

The question of what to back up is best answered by asking ‘what are the company’s soft assets?’ An IT company may regard its database structure and the source code of its software applications as crucial. For example, a company such as Microsoft would consider the source code of Windows XP, MS Office and other software applications as critical and would want to ensure that the code can be recovered at any point in time. A banking company would consider the financial records of its customers, its own receivables and credit/debit records as very important. Banks store the account details, credit card payment and receipt details, information about mortgages and loans, Forex accounts and all transaction records, and these are continuously backed up to at least one local and one remote location. A large investment and share trading company or a bank that deals in futures would consider its stock portfolio and transaction records as very important. Government defence bodies would consider details of their troop deployment, state of munitions and aircraft, status and position of different missile systems as crucial to the protection of their country. So the data to be backed up depends on what the company feels is necessary for conducting its business, and hence varies from one organisation to another (Toigo, 2005).

Another issue that comes up is the question of data formats and the type of backup. An organisation typically stores information either in encrypted form, as binary code or in the form of documents such as MS Word, XLS, PDF and image files, and these formats have to be saved according to the organisation’s needs. Many organisations, to preserve the integrity of their data systems, encrypt data using 128-bit or 256-bit encryption. At any point in time during the recovery process, the encryption key should be available to authorised personnel with the required level of clearance (Toigo, 2005). These keys need to be stored somewhere away from the business location but be instantly available to authorised personnel.

Different techniques are used for backing up data, and these include the incremental backup system, which writes only data that has changed since the last backup. Banks and large organisations have data sizes in the range of petabytes; if a full daily backup of this huge volume of data were taken, massive resources would be required, the time needed would be excessive, and the system would slow to a crawl. To get over this problem, incremental data backup is used, which ensures that only data that has changed since the last backup is written to the backup area. Also, since backup slows down the system, companies run the data backup process as a day-end process, late at night when very few users are logged in (Hiatt, 2007).
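
The incremental approach described above can be illustrated with a minimal Python sketch. It assumes a simple file-based backup in which a file counts as changed if its modification time is later than the time of the last backup. The function name `incremental_backup` and the directory layout are hypothetical, and production backup tools usually track changes in more robust ways.

```python
import shutil
from pathlib import Path


def incremental_backup(source: Path, target: Path, last_backup_time: float) -> list[Path]:
    """Copy only files modified after last_backup_time (seconds since the epoch).

    Illustrative assumption: modification time is a good enough change signal.
    Returns the relative paths of the files that were copied.
    """
    copied = []
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        if path.stat().st_mtime > last_backup_time:
            relative = path.relative_to(source)
            destination = target / relative
            destination.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, destination)  # copy2 also preserves timestamps
            copied.append(relative)
    return copied
```

Under these assumptions, a nightly job would record the time at which each run starts and pass that value as `last_backup_time` on the next run, so that unchanged files are skipped.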

It is worthwhile to remember this statement: “When it comes to back up, members of the organisation are paranoid. While some feel that every little bit of email or document that they have created (which would be probably be deleted by the recipient) has to be backed up, others tend to develop paranoia that their documents or writing would be available for everyone to see and they would not want to share it with others. The management has to step in at a certain stage and frame a policy on what is worth backing and what is best left on the PC of a warehouse assistant clerk” (Kaye, 2006).

In order to know what soft assets and intellectual material to back up, a risk assessment should be undertaken. This includes assessing the probability of the risks and the relative importance of the possible losses and their attendant costs. While doing this risk assessment, it would be a very practical step to create a document retention policy that clearly defines essential documents that must be kept and for how long. It should also include how they should be backed up and the redundancy required.

Understanding different levels of disasters

The possible disasters that an organisation could face have been divided into four levels, and the effects and the disaster recovery plan differ for each level from Level 1 to Level 4. Level 1 is the least severe, while Level 4 is regarded as a catastrophe. However, the backup remains the same because one cannot predict ahead of time which level of disaster might strike.

Disasters can be classified into (Preston, 1999):

  • Level 1 Disaster: This causes a minor outage. An example of a Level 1 disaster is a sudden short power outage or modem or router failure. Some or all business processes at a location might experience minor damage, but processes will continue to run with reduced efficiency. The full processing capability of mission critical business processes and related infrastructure and people can be restored within an hour. Recovery at an alternate site may not be required (Preston, 1999).
  • Level 2 Disaster: This causes a moderate outage. An example of a Level 2 disaster is complete LAN failure or prolonged power outage. Some or all business processes at a location might experience moderate damage. Processes may or may not continue since the equipment is below the minimum capacity to run. The full processing capability of mission critical business processes and related infrastructure and people may be restored within 2 hours. An alternate recovery site may not be required for continuing business, but alternate equipment or communication links may be required (Preston, 1999).
  • Level 3 Disaster: This is a severe disaster. An example of a Level 3 disaster is a riot or a minor fire. Infrastructure ceases to function. The full processing capability of all business processes from that location and related infrastructure may be restored within 1-2 days. Use of an alternate recovery site will be required (Preston, 1999).
  • Level 4 Disaster: This is a catastrophe, such as an earthquake, war, or a major terrorist attack. This type of disaster results in major disruption of services. Full processing capability cannot be achieved for a substantial period of time. Recovery will require the use of an alternate recovery site (Preston, 1999).

The following table gives details of these threat levels.

Type of Disaster: Description

Minor Outage (Level 1): Some or all business processes at a location experience minor damage or outage, but processes will continue on a degraded basis. The full processing capability of mission critical business processes and related infrastructure and people can be restored within 1 hour by getting the necessary infrastructure, people and data operational. Recovery at an alternate site is determined not to be required. It is assumed that the usual office premises and people are available to the business. Examples:
  1. A link between two locations is temporarily unavailable.
  2. A modem, switch or router fails.
  3. Sparks in electrical connections force a temporary shutdown of servers or routers in that area. Operations resume as soon as electrical connections are repaired.
  4. Virus or hacking attacks, or disruption due to improper behaviour of employees.

Moderate Outage (Level 2): Some or all business processes at a location experience moderate damage or outage. Processes may or may not continue on a degraded basis. The full processing capability of mission critical business processes and related infrastructure and people may be restored within 4 hours. An alternate site may not be required for continuing business, but alternate equipment or an alternate route (in the case of communication links) may be required depending on the criticality of the business process and infrastructure. It is assumed that the usual office premises and people are available to the business. Examples:
  1. Power surge damages equipment.
  2. Link failure (that can be recovered within 4 hours).
  3. LAN failure.

Disaster (Level 3): A centre has experienced a severe disaster. There is a total shutdown of infrastructure. The full processing capability of all business processes from that location and related infrastructure and people may be restored within 1-2 days. Use of an alternate recovery site will be required. It is assumed that premises and equipment are inaccessible, but people can congregate elsewhere if required. Examples:
  1. Flood, rain or snow makes the premises at one of the offices inaccessible.
  2. Riots, fire or arson at a location near one of the offices renders the office premises inaccessible.
  3. Extended power cut or reparable damage to the building.

Catastrophe (Level 4): A centre has experienced a major disaster that will likely result in a major disruption of services. Full processing capability cannot be achieved for a substantial period of time. Recovery will require the use of an alternate processing site, as well as offsite offices for employees over an extended period of time. Examples:
  1. Earthquake, severe floods or other large natural disasters.
  2. Terrorist attacks or bombing.
  3. Extended communal riots, etc.
  4. Extensive damage to the building, making it unsafe or even inaccessible.
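
The four levels can also be expressed as a small lookup structure, which may be useful when the levels need to drive automated escalation. The following Python sketch is illustrative only. It uses the four-hour restoration figure given in the table for Level 2, treats "1-2 days" as 48 hours, and classifies an incident purely by its estimated restoration time, which is a simplifying assumption. The names `DisasterLevel` and `classify_outage` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DisasterLevel:
    level: int
    name: str
    max_restore_hours: Optional[float]  # None means no fixed restoration target
    alternate_site_required: bool


# Restoration windows follow the table; "1-2 days" is taken as 48 hours (assumption).
DISASTER_LEVELS = [
    DisasterLevel(1, "Minor Outage", 1, False),
    DisasterLevel(2, "Moderate Outage", 4, False),
    DisasterLevel(3, "Disaster", 48, True),
    DisasterLevel(4, "Catastrophe", None, True),
]


def classify_outage(estimated_restore_hours: float) -> DisasterLevel:
    """Return the lowest level whose restoration window covers the estimate."""
    for level in DISASTER_LEVELS:
        if (level.max_restore_hours is not None
                and estimated_restore_hours <= level.max_restore_hours):
            return level
    return DISASTER_LEVELS[-1]  # anything longer is treated as a catastrophe
```

In practice, the level would be assigned by people who weigh the cause and the extent of the damage as well as the expected restoration time.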

A disaster may impact an organisation in the following ways (Gilchrist, 2001):

  • The organisation may not be able to operate from the affected site.
  • The organisation may lose critical resources, such as systems, documents, and people.
  • The organisation may not be able to interact and provide services to business partners, clients, brokers, vendors, and other related financial institutions.

In addition to incurring financial losses, disasters may impact the credibility of the company. In extreme cases, the company may lose many of its clients.

Objectives of the Disaster Recovery Plan

The DRP is intended to provide a framework within which companies can make decisions promptly during a business disruption. The objectives of this plan are (Kaye, 2006):

  • To identify major business risks.
  • To proactively minimise the risks to an acceptable level by taking appropriate preventive and alternative measures.
  • To effectively manage the consequences of business interruption caused by any event through contingency plans.
  • To effectively manage the process of returning to normal operations in a planned and efficient manner.

The scope of the corporate business continuity management plan document must include plans for restoring:

  • SBUs (Strategic Business Units) and all the Projects being executed by the SBUs
  • Shared services and data
  • Information Systems at all locations of the company

Framing the IT Disaster Recovery Plan

Information is the key to survival for organisations. Information could be stored either electronically or as hard copies. A Disaster Recovery Plan (DRP) is a set of procedures designed to restore information systems. A DRP mostly deals with technological issues and also recommends infrastructure that should be implemented to prevent damage when a disaster occurs. A disaster can make the business processes totally or partially unavailable. A Business Continuity Plan (BCP) focuses on sustaining the business processes of a company during and after a disaster; it is a continuation of the DRP and cannot be implemented in isolation. A BCP lists the actions to be taken, the resources to be used, and the procedures to be followed before, during, and after a disaster. An IT disaster recovery plan is implemented for an organisation in this section (Facer, 2001).

The DRP team within a company is responsible for performing the business impact analysis (a process of classifying information systems resources based on criticality) and for the development and maintenance of the DRP. Tasks that need to be covered are included in the BCP document. The DRP team should also keep the BCP document up to date. This responsibility includes periodic reviews of the document, both scheduled (time-driven) and unscheduled (event-driven). The DRP defines a Recovery Time Objective (RTO) that specifies a time frame for recovering critical business processes. The DRP meets the needs of critical business processes in the event of disruption extending beyond the time frame. Recovery capability is defined for each Strategic Business Unit (SBU), including all Projects being executed under the SBU, and for each shared service, location and Offshore Development Centre. In the event of any moderate or minor disaster, the recovery capability should ensure that the business processes work seamlessly without affecting any other dependent critical business processes. For example, if the main power grid is disrupted, there must be standby facilities such as generators to ensure that power is available (Facer, 2001).

In this paper, a DRP is implemented for an IT company called ABC Ltd. The following illustration shows how the company is organised.

Figure 1. Assets and Nodes of ABC Ltd. for DRP (adapted from Preston, 1999)

The above figure shows how the different assets and nodes of ABC Ltd are organised. The company has its headquarters in New York and a number of units and branches in areas such as Washington, Rochester, Syracuse and others. The company also has a number of offshore development centres, and these are identified as ABC Europe, ABC Japan, ABC Australia, etc. In addition, the company has a number of clients, and these are identified as Client 1, Client 2, and so on.

Defining the Organisation Chart for DRP

Before implementing a DRP, it is essential that an organisation chart be created that would identify key employees who would be members of the DRP team. The following figure illustrates the organisation chart of ABC Ltd.

Figure 2. Organisation Chart for ABC Ltd (Margaret, 2007)

Protecting Intellectual Assets with the DRP

In a business relationship, a client invests internal resources, such as personnel and funds, to set up infrastructure. In addition, clients may provide a company with resources in the form of confidential information, raw source code, initial drawings and machinery. A company serving its clients has similarly invested funds and other resources in the business engagement. These investments represent assets. Companies must take preventive actions, such as setting up a dedicated security team or formulating policies that help reduce damage when disasters occur. This must also include backing up again to another site if the backup site must be used for any length of time, and protecting the company’s assets from cyber attacks.

IT Team Security Structure

The IT Security Team of an organisation is responsible for implementing and maintaining the corporate security policy at all ODC locations and other support units. A dedicated Security Officer should be assigned to all the units. In addition, the company needs to conduct a security awareness program for all ODCs. The following figure shows a typical IT Security Team structure.

Figure 3. Structure of an IT Security Team (Brunetto, 2006)

This figure shows the structure of the IT Security Team of ABC Ltd. The figure shows the various SBUs and their locations. It also lists the responsibilities of the IT security team of the SBU and the centre.

The DRP Network Diagram

The DRP would need to cover all these units and assets. To allow quick backup and DRP procedures for the company, the following network diagram is proposed.

Figure 4. Network Diagram for DRP (adapted from Preston, 1999)

In the diagram, the connectivity is allowed through a primary ISDN Back-Up Line and a Dial-Up Line. A separate ISDN line for backup is required since the backup process consumes extra bandwidth and may slow down regular business processes.

Based on corporate security policy, all the locations with direct Internet access should be secured by deploying firewalls. A dedicated team of professionals certified in various technologies can manage the firewalls centrally. A change management procedure is also needed so that any desired change can be incorporated into the existing setup at short notice. When a disaster occurs, any backup hardware that exists can be used in the disaster recovery plan to restore services. Gateways can be protected by installing Checkpoint Firewall Modules in the organisation’s network. This enterprise-wide implementation is managed using a central management console. At each location, a De-Militarized Zone (DMZ) must be created to protect important servers. It is also necessary to ensure that the policies installed on the Checkpoint Firewall Modules are based on the corporate network security policies. Precautions must be taken against Internet hacking and vulnerabilities, which are holes or weak points in the network. The following figure shows a sample firewall installation for a location (Preston, 1999).

Figure 5. Firewall Network Diagram for DRP (adapted from Preston, 1999)

The firewall ensures that unauthorised users cannot enter the network when backup processes are running or when the DRP is being implemented during a disaster.

Steps to Implement a DRP

Developing the DRP involves the following steps (Preston, 1999):

  • Risk Assessment
  • Business Impact Analysis
  • Strategy Selection
  • Business Continuity Plan Documentation
  • Testing
  • Maintenance

The next sections provide details of these steps.

Risk Assessment

In this phase, risks to the business processes are identified, existing mitigation measures are assessed, and further mitigation measures are recommended wherever necessary. The activities in this phase help DRP administrators to determine the extent of the potential threat and the risk associated with the IT infrastructure and IT applications of the company. A threat is any circumstance or event that can potentially cause harm to the business. The risk assessment phase includes the following (Hiatt, 2007):

  • Inventory: Identifies and documents the various business processes, hardware, software, communication links, documents, and associated people using standard templates developed by the risk assessment team.
  • Threat analysis: Identifies various threats to the business processes. It also identifies the probability of a threat being executed and the potential impact a threat will have on the business in the event of its execution. This is done using a standard template developed by the risk assessment team. The risk assessment team identifies a list of over 35 possible threats to any asset. Based on this list, each location is assessed for the probability of each threat being executed and the potential impact on the business processes.
  • Vulnerability analysis: Scans critical servers and hardware devices owned by the company periodically for identifying vulnerabilities and taking corrective actions based on the audit reports. These reports should be studied for their completeness and adequacy. In addition, while arriving at the probability of a threat being executed, the existing vulnerabilities of each location must be analysed.
  • Business Risk Assessment: Includes a detailed assessment of the practices followed by the business units with respect to risk management. The risk assessment team should conduct detailed interviews using standard questionnaires with senior representatives of the business units to understand the risk management practices of the individual business units.
  • Single Point of Failure (SPOF) Analysis: Identifies the most vulnerable business process. A SPOF is the weakest link in a business process. Each SBU must identify the SPOF at their locations.
  • Risk Matrix: Analyses the identified risks, which are derived by qualitative analysis of the threats and vulnerabilities to business processes using the threat and vulnerability analyses, the business risk assessment and the SPOF analysis. The risk areas are classified as Very High-Risk Areas, High-Risk Areas, Medium-Risk Areas, and Low-Risk Areas. Mitigation measures can also be recommended for each risk area identified.
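
One possible way to turn the threat analysis into the four risk areas is sketched below in Python. The source describes a qualitative analysis, so the numeric 1-5 rating scales, the multiplication of probability by impact, and the score thresholds are all illustrative assumptions, as are the function name `risk_area` and the sample threat ratings.

```python
def risk_area(probability: int, impact: int) -> str:
    """Map probability and impact ratings (each 1-5) to a risk area.

    The 1-5 scales and the score thresholds are illustrative assumptions.
    """
    if not (1 <= probability <= 5 and 1 <= impact <= 5):
        raise ValueError("probability and impact must be between 1 and 5")
    score = probability * impact
    if score >= 20:
        return "Very High"
    if score >= 12:
        return "High"
    if score >= 6:
        return "Medium"
    return "Low"


# Hypothetical threat ratings for one location: (probability, impact).
threats = {
    "Router failure": (4, 2),
    "Prolonged power outage": (3, 4),
    "Earthquake": (1, 5),
    "Virus attack": (4, 5),
}
risk_matrix = {name: risk_area(p, i) for name, (p, i) in threats.items()}
```

A risk assessment team would calibrate the scales and thresholds per location, since the same threat can have a very different probability at different sites.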

The following figure illustrates the risk analysis for the company.

Figure 6. Risk Analysis for DRP (Hiatt, 2007)

A number of templates have to be used at this stage to gather information about a project. These would provide micro information at a project level or at a client level. Some templates that need to be used include (Ambs, 2000):

  • Template for DRP Resource Requirements: This template is used to gather data for resources that are required to prepare a DRP.
  • Template For Project: This template is used to gather data about a project and helps to create a DRP at a project level.
  • Template For Project Team Details: This template is used to gather details of the project team members. The data is used to identify key members who may need to be moved to an alternate recovery site in case of a disaster.
  • Template For Client Team Details: This template is used to gather data about the client team details. Members identified here can be contacted in case of a disaster.
  • Template For Resource Requirement at Project Locations: This template is used to gather details of resources required at the alternate recovery site.
  • Template For Project DR Alternate Site: This template is used to gather data for an alternate recovery site.
  • Template At DR Location For People And Resources: This template is useful to gather data about people and other resources required at the alternate site.
  • Template For Min Required Resources At Alternate Site: This template is used to gather data about the minimum resources required at the alternate recovery site. Details of software and hardware that would be required need to be listed.
  • Template for Project Recovery Plan: This template is used to gather data for project recovery.

A sample template is shown below:

Table 1. Sample Template for Risk Assessment (Ambs, 2000)

Project Disaster Recovery Plan – Project DR Procedures

Backup and Recovery Procedures
Indicate backup procedures and other details for each software resource (e.g. database, code under development) and paper-based resource (e.g. hard copy of a contract signed with a customer).

Backup Procedures
  Frequency of Backup: Weekly
  Location of Stored Data: CA
  File Naming Convention: 8.3
  Description:
  Responsibility for taking backup: Jane Doe

Recovery Testing Procedures
Indicate how frequently backed-up data will be tested for recovery, what the sampling methodology will be, who will test for recovery, who will approve test results, etc.
  Frequency of Recovery Testing: Monthly
  Sampling Method for Recovery Testing: Random
  Description:
  Responsibility: John Doe

Recovery Procedures
Describe the procedures that will be used to recover the resource in the event of a disaster, as a detailed step-by-step procedure to get the application or function up and running.
  Description: Install Oracle and import all data.
  Responsibility: Mike
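
The fields of the sample template could also be captured in a structured form so that overdue recovery tests can be flagged automatically. The Python sketch below is one hypothetical way of doing this. The class name `BackupProcedure`, the resource name, and the mapping of "Monthly" to 30 days are assumptions and are not part of the template.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumed mapping from the template's frequency labels to a number of days.
FREQUENCY_DAYS = {"Daily": 1, "Weekly": 7, "Monthly": 30}


@dataclass
class BackupProcedure:
    resource: str
    backup_frequency: str
    storage_location: str
    backup_owner: str
    recovery_test_frequency: str
    recovery_test_owner: str

    def recovery_test_overdue(self, last_test: date, today: date) -> bool:
        """True if more than one test interval has passed since the last test."""
        interval = timedelta(days=FREQUENCY_DAYS[self.recovery_test_frequency])
        return today - last_test > interval


procedure = BackupProcedure(
    resource="Project database",  # hypothetical resource name
    backup_frequency="Weekly",
    storage_location="CA",
    backup_owner="Jane Doe",
    recovery_test_frequency="Monthly",
    recovery_test_owner="John Doe",
)
```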

Business Impact Analysis

The overall objective in this phase of the project is to gain an understanding of the business processes and to lay the framework of a business continuity plan for the business units. A Business Impact Analysis (BIA) must be performed with the objective of (Benton, 2007):

  • Evaluating the risk to the business due to systems and process failures.
  • Identifying critical business processes and the associated computing applications.
  • Estimating the impact of disruption.
  • Defining the recovery time objectives for critical business processes.

The following figure illustrates the methodology used for BIA.

Figure 7. Business Impact Analysis (Benton, 2007)

This figure shows the business impact analysis approach. BIA is performed by interviewing business process owners using detailed questionnaires or templates. The primary areas on which the interviews should focus are (Benton, 2007):

  • Identification of critical business processes and critical resources and applications associated with critical business processes.
  • Interfaces between various business processes.
  • Identification of outage impacts of business function unavailability and maximum allowable downtimes.
  • Prioritisation of recovery processes through recovery time objectives.

The resultant BIA documented for each business process describes the following:

  • The outage impact on the business process.
  • The criticality of each business process based on the outage impact. The business processes are classified into four levels of criticality – Mission Critical, High Criticality, Medium Criticality, and Low Criticality Business Process.
  • The minimum human resource required to sustain the business process during a disaster.
  • Criticality of locations from where the business processes are executed.
  • Criticality of the IT infrastructure that supports the business processes.
  • Existing recovery times for the business processes in terms of hardware acquisition time and software installation time.
  • Recovery time objectives for the business processes depending on the criticality of the business process.
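
The BIA output described above lends itself to a simple record per business process. The following Python sketch shows one hypothetical way to order processes for recovery and to find those whose existing recovery time does not yet meet the RTO. The class name `BiaRecord`, the ordering rule (criticality first, then shortest RTO) and the sample figures are assumptions made for illustration.

```python
from dataclasses import dataclass

# Criticality levels in descending order, as listed in the BIA description.
CRITICALITY_ORDER = [
    "Mission Critical",
    "High Criticality",
    "Medium Criticality",
    "Low Criticality",
]


@dataclass
class BiaRecord:
    process: str
    criticality: str
    rto_hours: float               # recovery time objective
    current_recovery_hours: float  # hardware acquisition plus software installation

    def meets_rto(self) -> bool:
        """True if the existing recovery time already satisfies the RTO."""
        return self.current_recovery_hours <= self.rto_hours


def recovery_order(records):
    """Sort by criticality first, then by the shortest RTO."""
    return sorted(
        records,
        key=lambda r: (CRITICALITY_ORDER.index(r.criticality), r.rto_hours),
    )


# Hypothetical records for illustration only.
records = [
    BiaRecord("Payroll", "Medium Criticality", 72, 48),
    BiaRecord("Client support desk", "Mission Critical", 2, 6),
    BiaRecord("Source code repository", "Mission Critical", 4, 3),
]
ordered = [r.process for r in recovery_order(records)]
gaps = [r.process for r in records if not r.meets_rto()]
```

Processes that appear in `gaps` are the ones for which the strategy selection phase would need to find a faster recovery option.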

Strategy Selection and Implementation

Based on the risks identified in the risk analysis phase and the RTO defined in the BIA phase, strategies are identified to mitigate the risks and satisfy the RTO adequately. The strategies identified for each business process and its associated resources are (Margaret, 2007):

  • Infrastructure Strategy: Includes hardware, software, and networking redundancy.
  • Alternate Site Strategy: Defines the alternate site from where the business process will be recovered in case of disaster.
  • Equipment Strategies: Ensures availability of necessary equipment at the alternate site.
  • People Strategies: Ensures availability of critical personnel at the alternate site. For example, specialised software such as databases and operating systems needs skilled people who know what must be done to get the applications running quickly.
  • Other Strategies: Handles insurance, service level agreements, and annual maintenance contracts to transfer risks that cannot be mitigated directly.

In order to tackle the operational contingencies for a large organisation, the business continuity management plan (BCMP) outlines the BCP concept of operations. The concept of operations is based on the risk mitigation strategies identified by the BCMP and approved by the corporate centre.

DRP – BCP Structure

Based on the size, geographical spread, and complexity of the organisation structure, the DRP is divided into individual BCPs for each SBU, shared service and location. The location BCP covers the infrastructure and support functions for the location, whereas the business unit BCP covers the Software Development Life Cycle (SDLC) for all projects executed from the SBU site. The shared services BCP includes the continuity plan for support services, such as finance, accounts, and human resources. Depending on the type and extent of the BCP event, the relevant BCP is invoked. The following illustration gives the BCMP structure for a company (Pfleeger, 2002).

Figure 8. BCMP structure for ABC Ltd. ()

Crisis Management Team Structure

Each BCP identifies a Crisis Management Team (CMT) that will take charge of respective operations in the event of a disaster. The composition of the various Crisis Management Teams is depicted in the following figure (Swartz, 2004).

Figure 9. CMTP structure for a Location DRP (Swartz, 2004)

Process Flow to identify disaster and activate DRP

Communication lines should be established that follow guidelines for reporting and managing disasters. The process flow diagram in the following figure describes the various stages of reporting a disaster.

Figure 10. Process Flow diagram for reporting disasters (Kaye, 2006)

The CMT may decide to activate some BCP procedures even before the DAT reports back to the CMT with the Damage Assessment Report. This ensures that in case of a severe disaster, business processes having a low recovery time objective are activated immediately without awaiting a detailed assessment of the extent of the damage.
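
The early-activation rule can be sketched as a simple filter on recovery time objectives. In the Python example below, the function name `early_activation_list`, the process names and the four-hour threshold are hypothetical. The actual threshold would be a decision for the CMT.

```python
def early_activation_list(rto_hours: dict[str, float], threshold_hours: float) -> list[str]:
    """Return processes whose RTO is at or below the threshold, most urgent first."""
    urgent = [(rto, name) for name, rto in rto_hours.items() if rto <= threshold_hours]
    return [name for rto, name in sorted(urgent)]


# Hypothetical RTO values in hours.
rto_by_process = {
    "Payroll": 72,
    "Client support desk": 2,
    "Source code repository": 4,
}
activate_now = early_activation_list(rto_by_process, threshold_hours=4)
```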

DRP Invoking Procedures

DRP activation depends on the level of disaster. The BCP documents the following procedures to be followed during a disaster (Preston, 1999):

  1. Procedures for invoking the relevant BCPs.
  2. Procedures for communicating the disaster.

The communication procedures cover the first notification of the disaster and its further escalation to the CMT:

  • Notification of disaster to SBU heads
  • Notification of disaster to employees
  • Notification of disaster to customers
  • Notification of disaster to Media Management

The BCP also documents the procedures for emergency evacuation, including the roles and responsibilities of the personnel involved in the evacuation, and the recovery procedures for the various infrastructure items and IT applications.

Project Specific Disaster Recovery Plan

Each project should prepare a DRP, using predefined templates, before the project starts. Each project DRP identifies an alternate site from which the project will be executed if the primary location is inaccessible. The site is chosen based on the requirements of the project and the availability of infrastructure at the alternate site. This information is available from the templates used in the risk assessment (Toigo, 2005).

The plan should identify critical project team members who will be shifted to the designated alternate location in case of such an incident. Where an employee may need to travel to on-site locations during a disaster, travel and other necessary documents are kept ready.

Data backups for all projects should be stored at a predetermined location.

In case of a disaster where the primary site becomes inaccessible, each SBU at that location communicates its requirements for shifting project team members to the CMT.

The CMT facilitates the transportation of key employees to alternate locations through the Administration department.

Notification Procedures

A structure for notifying the relevant parties of a disaster should be in place. This structure is also called a call tree. A call tree to notify the occurrence of a disaster is shown in the following figure.

Figure 11. Call tree to notify disasters. (Toigo, 2005)
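A call tree can also be kept in machine-readable form so that the calling order can be generated and checked. The following is a minimal sketch only: the roles and the shape of the tree are hypothetical and are not taken from Figure 11. It walks the tree level by level, so that each person is called once by the person above them.

```python
from collections import deque

# Hypothetical call tree: each key calls the contacts listed against it.
CALL_TREE = {
    "CMT": ["SBU head A", "SBU head B", "Media management"],
    "SBU head A": ["Project manager A1", "Project manager A2"],
    "SBU head B": ["Project manager B1"],
    "Project manager A1": ["Team A1"],
}


def notification_order(tree, root):
    """Walk the call tree breadth-first from the root and return the
    (caller, callee) pairs in the order the calls should be made."""
    calls = []
    seen = {root}
    queue = deque([root])
    while queue:
        caller = queue.popleft()
        for callee in tree.get(caller, []):
            if callee in seen:
                continue  # never call the same contact twice
            seen.add(callee)
            calls.append((caller, callee))
            queue.append(callee)
    return calls


if __name__ == "__main__":
    for caller, callee in notification_order(CALL_TREE, "CMT"):
        print(f"{caller} calls {callee}")
```

A breadth-first walk is used here on the assumption that senior contacts should be reached before their teams; a real call tree would also record phone numbers and alternates for contacts who cannot be reached.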

The figure shows the structure used to notify affected parties about the disasters. Emergency Procedures for Project DRP are:

  • Transfer control to on-site, if required.
  • If recovery is required from an alternate location, acquire resources or infrastructure from CML.
  • Initiate the recovery of processes, data, and applications as per the RTO or identified priority.
  • Make arrangements for transportation of people (as identified in the Project DRP).
  • Resume operations at an alternate location.
  • Confirm that all mission-critical services are restored.
  • Use the call tree to notify affected parties that services have been restored from an alternate location.
  • Take control back offshore.

Testing

Testing helps to evaluate the ability of recovery staff to implement the plan quickly and effectively. Each element of the BCP and DRP should be tested to confirm the accuracy of individual recovery procedures and the overall effectiveness of the plan. Plan testing is designed to determine (Pfleeger, 2002):

  • Whether the recovery teams are ready to cope with a disruption.
  • Whether recovery inventories stored off-site are adequate to support recovery operations.
  • Whether the business continuity plan has been properly maintained.

Test Plan

Before conducting the test, a detailed test plan should be developed. The test plan includes (Pfleeger, 2002):

  • Scope of the Test – Defines the boundaries of the test. For example, it lists the location, area, projects, components, and data.
  • Test objectives.
  • Test Scenario – This includes:
  • Type of Test – For example, Structured Walkthrough Test, Component Test or Full Function Test.
  • Test Schedule
  • Description of the Test Scenario
  • Success Criteria For the Test – including the method used to evaluate the test results.
  • Test Participants
  • Sequence of Activities
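The test plan contents listed above could be captured in a simple template so that an incomplete plan is caught before the test is run. This is a sketch under stated assumptions: the class name, field names, and the completeness check are illustrative and are not part of the cited source.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DrpTestPlan:
    """Illustrative template mirroring the test plan items listed above."""
    scope: str = ""
    objectives: list = field(default_factory=list)
    test_type: str = ""  # e.g. "Structured Walkthrough Test"
    schedule: str = ""
    scenario_description: str = ""
    success_criteria: list = field(default_factory=list)
    participants: list = field(default_factory=list)
    sequence_of_activities: list = field(default_factory=list)


def missing_sections(plan):
    """Return the names of the sections that are still empty."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name)]


if __name__ == "__main__":
    # Hypothetical draft plan with only two sections filled in.
    draft = DrpTestPlan(
        scope="Location X, project Y",
        test_type="Structured Walkthrough Test",
    )
    print("Sections still to be completed:", missing_sections(draft))
```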

In addition, maintenance procedures should be implemented for the DRP. To prevent Level 1 incidents, such as virus and hacking attacks or incidents caused by improper employee behaviour, a security policy should also be implemented. The policy would specify rules of conduct at work and rules for email, data storage, and personal storage devices such as iPods, MP3 players, and mobile phones with cameras.

Conclusions

This is seen as the bare minimum necessary for the creation of a useful DRP/BCP for PWA. It is provided with the hope that this will help in the decision to create and implement a DRP/BCP for this organisation. Please see Appendix A for a brief comparison of the two plans.

Appendix A

A Comparison Chart for Disaster Recovery vs. Continuity Planning

References

Ambs, Ken. 2000. Optimizing Restoration Capacity in The AT&T Network. Interfaces Journal. Volume 30. Issue 1. pp: 26-40

Anderson, P.L. and Geckil, I.K. 2003. Northeast blackout likely to reduce US earnings by $6.4 billion. In Landry and Koger, 2006.

Ballman, Janette. 2001. Merrill Lynch Resumes Critical Business Functions Within Minutes of Attack. Disaster Recovery Journal. In Stevens, 2003.

Bandyopadhyay, Kakoli. 2001. The Role of Business Impact Analysis and Testing in Disaster Recovery Planning by Health Maintenance Organizations. Hospital Topics: Research and Perspectives on Healthcare. Vol. 79, no. 1.

Bannon, Karen and Levin, Carol, 2002, Building Your Safety Net, www.pcmag.com. PC MAGAZINE iBiz 1-5.

Bannon, Karen. 2002. Keeping Your Business Afloat In The Face Of Disaster Requires A Lot Of Planning. PCMag.com. PC Magazine: Let current events be your wake-up call.

Barr, Jean. 2003. A Disaster Plan in Action: How a Law Firm in the World Trade Center Survived 9/11 with Vital Records and Employees Intact. The Information Management Journal.

Benton, Dick. 2007. Disaster Recovery: A Pragmatist’s Viewpoint. Disaster Recovery Journal.

Breed, A.G. 2004. Frances leaves floods, power outages, takes aim at panhandle. Billings Gazette. In Landry and Koger, 2006. Web.

Brice, J. and Langan, H. 2004. Ivan leaves about 1.6 mln without power in U.S. south. Bloomberg. In Landry and Koger, 2006. Web.

Britt, Phillip. 2005. Taking Steps for Disaster Recovery. Information Today. Vol 22 Issue 9.

Brunetto, Guy. 2006. Disaster recovery: How will your company survive? Journal of Strategic Finance. Volume 82. Issue 9. pp: 57-62.

Carlson, Caron. 2005. Agencies Under Fire For Disaster Recovery Plans. Eweek 37.

Choy, Manhoi; Leung, Hong Va; Wong, Man Hong. 2000. Disaster Recovery Techniques for Database Systems. Communications of the ACM. pp: 272-280.

Preston, W. Curtis. 1999. UNIX Backup and Recovery. O’Reilly Media, Inc. ISBN-10: 1565926420.

Dignan, Larry. 2006. Eweek. p4. Ziff-Davis Publishing.

Dignan, Larry. 2006. Eweek. p27. Ziff-Davis Publishing.

Facer, Dave. 2001. Rethinking: Business continuity. Journal of Risk Management. Volume 46. Issue 10. pp: 17-21.

Fisher, Sharon; Havenstein, Heather and Thibodeau, Patrick. 2006. Heavy Rotation of Storms Drives IT Action In Florida, Other States Katrina, other hurricanes have far-reaching effects. Computerworld: News, Special Report.

Fonseca, Brian. 2002. Ready With Plan B: Vendors Tap Services To Offer Aid In Disaster Recovery Preparation. Services: Strategic News and Analysis. Pp41-42.

Gibson, Stan. 2001. Disaster: Terrorism Puts The Nation And Human Spirit To Extreme Test. Eweek. Volume 18. Number 36.

Gilchrist, Bruce. 2001. Coping with Catastrophe: Implications to Information Systems Design. Journal of the American Society for Information Science. pp: 271-278.

Griffiths, Karen and Kramolis, Tammie. 2007. Twelve Steps to Recovery: My Co-Dependent Relationship with Tivoli. SIGUCCS’07. Orlando, Florida, USA.

Grygo, Eugene. 2001. The Year in Review. Infoworld.

Hiatt, Charlotte J. 2007. A Primer for Disaster Recovery Planning in an IT Environment, 2nd Edition. ISBN-10: 1878289810.

Kaye, David and Graham, Julia. 2006. A Risk Management Approach to Business Continuity: Aligning Business Continuity with Corporate Governance. Rothstein Associates Inc. ISBN 1-931332-36-3.

Keeton, Kimberly; Beyer, Dirk; Brau, Ernesto; Merchant, Arif; Santos, Cipriano and Zhang, Alex. 2006. On the Road to Recovery: Restoring Data after Disasters. Hewlett-Packard Labs. Palo Alto, Ca, USA.

Landry, J.L., Koger, M. Scott. December 2006. Dispelling 10 Common Disaster Recovery Myths: Lessons Learned from Hurricane Katrina and Other Disasters, ACM Journal on Educational Resources in Computing. Vol. 6, No. 4.

Lundquist, Eric, 2004, Up Front: To Do List Beckons: Disaster Recovery Plan, Inventories Are Must Projects. eWEEK.

Margulies, Stuart. 2006. Preparation for the DRP test: (Degrees of Reading Power). 2nd Edition. Educational Design Publications. ISBN-13: 978-0876942857.

Meade, Peter. 1993. Taking the risk out of disaster recovery services. Journal of Risk Management. Volume 40. Issue 2. pp: 20-26.

Mearian, Lucas and Weiss, Todd R. 2005. Lessons Learned: IT Managers Steel for Rita; Users rush to implement disaster recovery plans, find off-site hosts ahead of latest storm. Computerworld.

Mearian, Lucas. 2005. Users Are Rethinking Disaster Recovery Plans: Bank dumps its outsourcer, brings program in-house. Computerworld.

Nemzow, Martin. 1997. Business Continuity Planning, Int. J. Network Mgmt., vol. 7, 127–136.

Pember, Margaret. 2007. Information disaster planning: An integral component of corporate risk management. ARMA Records Management Quarterly. Volume 30. Issue 2. pp: 31-39.

Pfleeger, Charles P. 2002. Security in Computing, 3rd Edition. Prentice Hall PTR. ISBN-13: 978-0130355485.

Presswire. 2008. Price Waterhouse Coopers. New survey raises serious concerns about the effectiveness of disaster recovery plans. M2 Presswire. pp: 2-3.

Pudy, Teresa. 1999. Personal interview done for Williams Communications by Mueller-Steel Inc. Interviewer Karena Andrusyshyn on web.

Sarrel, Matthew D. 2007. Your Disaster Recovery Plan: Building a business continuity plan in case of disaster is vital to the survival of your company. PC Magazine.

Seewald, Nancy and Damico, Westher. 2005. Chemical Week.

Stevens, David O. 2003. Protecting Records in the Face of Chaos, Calamity and Cataclysm. The Information Management Journal.

Swartz, Nikki. February 2004. Survey Assesses the State of Information Security Worldwide. Information Management Journal. Volume 38. Issue 1. pp: 16-20.

Symoens, Jeff. 2000. Preparing Exchange for High Availability. Infoworld.

Toigo, Jon William. 2005. Disaster Recovery Planning: For Computers and Communication Resources. Wiley Publications. ISBN-10: 0471121754.

Vijayan, Jaikumar. 2005. Data Security Risks Missing From Disaster Recovery Plans: Scope of contingency programs needs to be expanded, execs say. Computerworld.

Walsh, Catherine. 2001. Leadership on 9/11: Morgan Stanley’s Challenge. Harvard Business School. Working Knowledge for Business Leaders. Web.

Yager, Tom. 2002. Enterprise Strategies: Keep it covered: Squeeze business continuity planning in among your many other underfunded IT priorities. Services: Strategic News and Analysis.
