Disaster Recovery
Information Security Standard
Information Security Standards (ISS) are developed to support and enforce both District Administrative Regulations and the California Community College Information Security Standard.
Summary
An IT disaster recovery plan is pivotal to the overall business continuity strategy. The purpose of business continuity is to maintain a minimum level of service in the District while restoring the organization to business as usual. If the District fails to put a disaster recovery plan in place, then when disaster strikes it risks penalties and the loss of funding, certification, and/or accreditation.
- Purpose and Scope
- Disaster Recovery Strategy and Components
- Roles and Responsibilities
- Update, Testing, and Maintenance
- What to do in the Event of a Disaster
- Disaster Declaration
I. Purpose and Scope
The objective of this Disaster Recovery Plan is to outline the strategy and basic procedures to enable the Long Beach Community College District (LBCCD) to withstand the prolonged unavailability of critical information and systems and provide for the recovery of District Information Technology Services (ITS) in the event of a disaster. DR is a component of Business Continuity Planning, which is the process of ensuring that essential business functions continue to operate during and after a disaster.
This is one of a series of information security guidelines maintained by the District Information Technology Services (ITS) department and designed to protect LBCCD information systems.
1. Applicability of Assets
This Disaster Recovery plan has been designed and written to be used in the event of a disaster affecting LBCCD at the District’s central business offices in Long Beach, CA.
2. Applicability to all Employees and Volunteers
This Disaster Recovery plan applies to ITS personnel but may impact all members of the Board of Trustees, full-time and part-time employees, substitutes, short-term (temporary) staff, consultants and contractors, work-study students, student employees, and volunteers who are employed in the LBCCD for the purpose of meeting the needs of the District.
3. Applicability to External Parties
This plan may apply to external parties to the extent that hardware, software, or services provided or used by the external party are affected by the disaster.
II. Disaster Recovery Strategy and Components
This plan is structured around teams, with each team having a set of specific responsibilities.
The LBCCD Disaster Recovery strategy is based on the following elements:
- IT infrastructure is designed with redundancy and application availability as primary considerations
- The ability to leverage cloud-based or alternate site locations and facilities
- Documented and routinely tested IT Disaster Recovery tier-based procedures for each application and service
- Business Continuity plans as developed by associated business areas
This Disaster Recovery Plan describes:
- Disaster declaration
- A priority list of critical applications and services to be recovered
- Key tasks that include responsibilities and assignments for each task
- Departments and individuals who are part of the recovery process
Each critical application that has been identified in this ITS Policy & Procedure has its own Disaster Recovery Plan that can be found in the Appendices of this document.
Electronic copies of this Disaster Recovery plan and Appendices must be stored at secure and readily accessible alternate locations, which can be physical or cloud.
1. Business Continuity Plans
The Disaster Recovery Plan for a critical application is a complementary subset of departmental Business Continuity Plans (BCPs). These plans describe the actions to be taken within business areas that rely upon and use those applications.
Copies of BCPs will be documented and maintained by LBCCD business units as led and developed by the relevant Business Recovery Coordinator. The IT Disaster Recovery Coordinator will retain master copies of all LBCCD BCPs (see Section III.3 for a description of roles).
Copies of all BCPs must be stored at secure and readily accessible alternate locations, which can be physical or cloud. All plans must be reviewed annually and updated for any significant changes.
All relevant LBCCD employees must be made aware of the Business Continuity Plan and their own respective roles. Training must be provided to staff with operational business and/or recovery plan execution responsibilities.
Business Continuity Plans must be developed with requirements based on the specific risks associated with the process or system. Business Continuity Plans must include, but are not limited to, the following information:
- Executive Summary
- Key Assumptions
- Identified Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO)
- Long-term vs. Short-term Outage Considerations
- Disaster Declaration / Plan Activation Procedures (e.g., communication plan, mobilization plan)
- Key Contacts / Calling Tree(s)
- Roles / Responsibilities (e.g., Recovery Teams)
- Alternate Site / Lodging
- Asset Inventory
- Detailed Recovery Procedures
- Relevant Disaster Recovery Plan
- Event and recovery status reporting to LBCCD management, appropriate employees, third parties, and business partners
Sufficient detail must be included so that procedures can be carried out by individuals who do not normally perform these responsibilities.
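To illustrate how the Recovery Time Objective (RTO) and Recovery Point Objective (RPO) recorded in a plan might be checked against the outcome of an outage or recovery test, a minimal Python sketch follows. The class name, function name, and sample figures are hypothetical and are not taken from any LBCCD plan.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    """Hypothetical targets recorded in a Business Continuity Plan."""
    rto: timedelta  # Recovery Time Objective: maximum tolerable downtime
    rpo: timedelta  # Recovery Point Objective: maximum tolerable data-loss window


def meets_objectives(objectives, outage_start, service_restored, last_good_backup):
    """Return (rto_met, rpo_met) for a single outage or recovery test."""
    downtime = service_restored - outage_start
    data_loss_window = outage_start - last_good_backup
    return downtime <= objectives.rto, data_loss_window <= objectives.rpo


# Illustrative values only (assumed, not District targets).
targets = RecoveryObjectives(rto=timedelta(hours=24), rpo=timedelta(hours=4))
outage = datetime(2024, 3, 1, 9, 0)
result = meets_objectives(
    targets,
    outage_start=outage,
    service_restored=outage + timedelta(hours=20),
    last_good_backup=outage - timedelta(hours=6),
)
print(result)  # (True, False): restored within the RTO, but 6 hours of lost data exceeds the RPO
```

In this example the service returns within the assumed 24-hour RTO, but the most recent usable backup is six hours old, so the assumed four-hour RPO is missed.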
III. Roles and Responsibilities
1. Disaster Management Team
The Disaster Management Team is responsible for providing the overall direction of the data center recovery operations. It ascertains the extent of the damage and activates the recovery organization. Its prime role is to monitor and direct the recovery effort. It has a dual structure in that its members include Team Leaders of other teams. Responsibilities of the Disaster Management Team include:
- Evaluating the extent of the problem and potential consequences and initiating disaster recovery procedures.
- Monitoring recovery operations; managing the Recovery teams and liaising with LBCCD management and users as appropriate; notifying senior management of the disaster, recovery progress, and problems.
- Controlling and recording emergency costs and expenditures; expediting authorization of expenditures by other teams.
- Approving the results of audit tests on the applications which are processed at the standby facility shortly after they have been produced.
- Declaring that the Disaster Recovery Plan is no longer in effect when critical business systems and application processing are restored at the primary site.
The Disaster Management Team Leader is responsible for deciding whether or not the situation warrants the introduction of disaster recovery procedures. If the Team Leader decides that it does, then the organization defined in this section comes into force and, for the duration of the disaster, supersedes any current management structures.
The Disaster Management Team will operate from a Command Center (To Be Determined), or, if that is not possible, at a secondary location (To Be Determined).
2. Disaster Management Team Members are
- Vice President – Administrative Services
- Director – Business Support Services
- Chief Information Systems Officer – Information Technology Services
- Senior Director – Facilities, Maintenance, and Operations
- Director – Fiscal Services and Payroll
- Executive Director – Public Affairs and Marketing
3. Recovery Coordinators
Two coordination roles report to the Disaster Management Team:
A Disaster Recovery Coordinator is the communications focal point for the Disaster Management Team and Network Operations Team and will coordinate disaster notification, damage control, and problem correction services. The Disaster Recovery Coordinator also maintains the IT Disaster Recovery Plans and offsite copies and retains master copies of Business Recovery Plans.
- Director ITS – Network Services
- Director ITS – Applications, Development, and Support
Business Recovery Coordinators will develop and maintain Business Recovery Plans and coordinate recovery efforts and notifications in their business areas.
- Director – Admissions and Records
- Dean of Enrollment Services
- Deputy Director Finance and Accounting
- Payroll Benefits Manager
- Associate Dean of Student Support Services
- Executive Director – Classified Human Resources
- Director Business Support Services
- Director Academic Services
- Dean, Counseling and Student Support Services
- Director of Student Health and Student Life
- Dean – Student Affairs
- Accounting Supervisor
4. Network Operations & Telecommunications Infrastructure
The ITS Network Operations team is responsible for the computer environment (Data Center and other vital computer locations) and for performing tasks within those environments. This Team is responsible for restoring computer processing and for performing Data Center activities, including:
- Installing the computer hardware and setting up the latest version of the operating system at the standby facility.
- Arranging for the acquisition and/or availability of necessary computer equipment and supplies
- Establishing the processing schedule and informing user contacts
- Obtaining all appropriate historical/current data from the offsite storage vendor
- Restoring the most current application systems, software libraries, and database environments.
- Coordinating the user groups to aid the recovery of any non-recoverable (i.e., not available on the latest backup) data
- Providing the appropriate management and staffing for the standby data center, help desk, and backup library in order to meet the defined level of user requirements.
- Performing backup activities at the standby site.
- Providing ongoing technical support at the standby site.
- Working with the Networks Team to restore local and wide-area data communications services to meet the minimum processing requirements.
- Ensuring that all documentation for standards, operations, vital records maintenance, application programs, etc. is stored in a secure/safe environment and reassembled at the standby facilities, as appropriate.
- Evaluating the extent of damage to the voice and data network.
- Discussing alternate communications arrangements with telecom service providers, and ordering the voice/data communications services and equipment as required.
- Arranging new local and wide-area data communications facilities and a communications network that links the standby facility to the critical users.
- Establishing the network at the standby site, and installing a minimum voice network to enable identified critical telephone users to link to the public network.
- Defining the priorities for restoring the network in the user areas.
- Supervising the line and equipment installation for the new network.
- Providing necessary network documentation.
- Providing ongoing support of the networks at the standby facility.
- Re-establishing networks at the primary site when the post-disaster restoration is complete.
5. Facilities
The Facilities Team is responsible for the general environment including buildings, services, and environmental issues outside of the Data Center. This team has responsibility for security, health, and safety and for replacement building facilities, including:
- In conjunction with the Disaster Management Team, evaluating the damage and identifying equipment that can be salvaged.
- Arranging all transport to the standby facility.
- Arranging for all necessary office support services.
- Controlling security at the standby facility and the damaged site (physical security may need to be increased).
- Working with the Network Team to have lines ready for rapid activation.
- As soon as the standby site is occupied, cleaning up the disaster site and securing that site to prevent further damage.
- Administering the reconstruction of the original site for recovery and operation.
- Supplying information for initiating insurance claims, and ensuring that insurance arrangements are appropriate for the circumstances (i.e., any replacement equipment is immediately covered, etc.).
- Maintaining current configuration schematics of the Data Center (stored off-site). This should include:
  - Air conditioning
  - Power distribution
  - Electrical supplies and connections
  - Specifications and floor layouts
- Dealing with staff safety and welfare.
- Working with Campus police, who will contact local law enforcement if needed.
6. Communications
Public Relations and Marketing is responsible for obtaining communications directives from the Disaster Management Team, and communicating information during the disaster and restoration phases to employees, suppliers, third parties, and students. All information that is to be released must be handled through Public Relations and Marketing.
The Communications Team may be made up of the Public Information Officer and individuals from the College, Marketing, Legal, HR, and business area organizations, as appropriate. Its responsibilities include:
- Liaising with the Josh Castillano and Engagement, Disaster Recovery Coordinator, and/or Business Recovery Coordinators to obtain directives on the messages to communicate.
- Making statements to local, national, and international media.
- Informing suppliers and students of any potential delays.
- Informing employees of recovery progress and schedules using available communication methods.
- Ensuring that there are no miscommunications that could damage the image of the District.
- Any other public relations requirements.
IV. Update, Testing, and Maintenance
This Disaster Recovery plan must be kept up to date. It is the responsibility of the Disaster Recovery Coordinator to ensure that procedures are in place to keep this plan up to date. If, while using this plan, any information is found to be incorrect, missing, or unclear, please inform the Disaster Recovery Coordinator so that it may be corrected. It is important that everyone understands their role as described in this plan.
Updated versions of the plan are distributed to the authorized recipients, listed in Section 2.5.
This Administrative Regulation and the IT Disaster Recovery Plans as documented in the Appendices must be reviewed by IT and business management at least semi-annually and when significant application or infrastructure changes are made.
Plans must be tested periodically, and at least annually, and tests must include realistic simulations involving the business users and District IT staff. The results of DR tests must be documented, reviewed, and approved by appropriate management.
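The review and testing intervals above can be tracked with a simple date check. The following Python sketch is illustrative only: the function names and sample dates are hypothetical, and it covers only the calendar-based triggers (semi-annual review, annual test), not reviews prompted by significant application or infrastructure changes.

```python
import calendar
from datetime import date

REVIEW_INTERVAL_MONTHS = 6   # "at least semi-annually"
TEST_INTERVAL_MONTHS = 12    # "at least annually"


def add_months(d, months):
    """Return the date `months` later, clamping to the last day of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def overdue_items(last_review, last_test, today):
    """List which calendar-based obligations have lapsed as of `today`."""
    items = []
    if today > add_months(last_review, REVIEW_INTERVAL_MONTHS):
        items.append("review")
    if today > add_months(last_test, TEST_INTERVAL_MONTHS):
        items.append("test")
    return items


# Hypothetical dates for illustration.
status = overdue_items(
    last_review=date(2024, 1, 15),
    last_test=date(2023, 9, 1),
    today=date(2024, 8, 1),
)
print(status)  # ['review']
```

Here the review fell due on 15 July 2024 and has lapsed, while the annual test is not due until 1 September 2024.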
V. What to do in the Event of a Disaster
The most critical and complex part of disaster response is mobilizing the required personnel in an efficient manner during the invocation of the plan. Because normal processes have been disrupted, individuals are taking on new roles and responsibilities and must adapt to changing circumstances quickly.
The key is for personnel to be well-rehearsed, familiar with the Disaster Recovery Plan, and sure of their assignments.
1. Standard Emergency Plan
The first priority in a disaster situation is to ensure the safe evacuation of all personnel.
In the event of a major physical disruption, standard emergency procedures must be followed. This means immediately:
- Activating the standard alarm procedures for that section of the building to ensure that emergency authorities (fire, medical, law enforcement, etc.) are correctly alerted
- If necessary, evacuating the premises following the established evacuation procedures and assembling outside at the designated location, if it is safe to do so.
2. First Steps of Recovery Team
| ACTION | TEAM |
|---|---|
| Evaluate the damage | Disaster Management, Facilities, Network Operations |
| Identify the concerned applications | Disaster Management, Network Operations |
| Request the appropriate resources for the Standby Facility | Disaster Management |
| Obtain the appropriate backups | Network Operations |
| Restart the appropriate applications at the Standby Facility | Network Operations |
| Inform users of the new procedures | Communications |
| Order replacement equipment to replace the damaged computers/networks | Network Operations |
| Install replacement equipment and restart the applications | Network Operations |
| Inform users of normal operations | Communications |
3. The Next Steps
- The Disaster Management Team Leader decides whether to declare a disaster and activate the Disaster Recovery Plan and which recovery scenario will be followed.
- The Recovery Teams then follow the defined recovery activities and act within the responsibilities of each team, as defined in this Disaster Recovery Plan and those defined for the critical applications outlined in the District IT Business Continuity Departmental Procedures.
4. Critical Business Applications / Services
The following Tier 1 business processes are considered Mission Critical to LBCCD’s business operations:
- Tier 1 Student Information System
- Tier 1 Financial Systems
- Tier 1 Infrastructure hosting communications (Voice and Digital)
- Tier 1 Servers, File shares, Databases, Web Portals, and Application shares.
Please refer to District ITS departmental procedures to address the Tier 1 DR procedures for these services.
Critical applications (Payroll/Financial System):
- Tier 1 Payroll HR System – (Rebuild, Restore, or Hybrid and Test)
- Database/Application Servers/Web Presentation Servers
- Tier 1 Email /File Storage
- Tier 1 Fiscal System
- Tier 1 Student System
- Tier 2 Laserfiche
- Tier 2 Web Application Servers/
- Tier 2 F5 Load Balancers
- Tier 3 Third-Party Integrations
- Tier 3 Print Services
- Tier 4 Genetec Surveillance Cameras
- Tier 4 Dev Sandbox Environments
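The tier labels above imply a recovery sequence. As a minimal sketch, assuming that lower tier numbers are recovered first (this plan does not state an ordering within a tier), the list can be sorted in Python as follows. The function name is hypothetical, and only a subset of the systems listed above is shown.

```python
# (system name, tier) pairs taken from the list above, deliberately unordered.
systems = [
    ("Print Services", 3),
    ("Payroll HR System", 1),
    ("F5 Load Balancers", 2),
    ("Dev Sandbox Environments", 4),
    ("Email / File Storage", 1),
    ("Web Application Servers", 2),
]


def recovery_order(items):
    """Sort by tier, lowest first. sorted() is stable, so systems in the
    same tier keep the order in which they were listed."""
    return sorted(items, key=lambda item: item[1])


order = [name for name, _tier in recovery_order(systems)]
for position, name in enumerate(order, start=1):
    print(position, name)
```

Because the sort is stable, any preferred sequence within a tier can be expressed simply by the order in which systems are entered.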
VI. Disaster Declaration
In the event of a serious system disruption, the Disaster Management Team will determine the level of response based on the disaster classification categories below. This determination will be made within four (4) hours of the occurrence.
The classification level should be reviewed every 12 hours and re-classification of the disaster will be made as needed until recovery is complete.
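The two time limits above (a determination within four hours, then a review every 12 hours until recovery is complete) can be computed as shown in the following Python sketch. The function names and timestamps are hypothetical, and the sketch assumes the 12-hour review clock starts at the time of the initial classification, which this plan does not specify.

```python
from datetime import datetime, timedelta

DECLARATION_WINDOW = timedelta(hours=4)   # determination within four hours
REVIEW_INTERVAL = timedelta(hours=12)     # classification reviewed every 12 hours


def declaration_deadline(occurred_at):
    """Latest time by which the response level should be determined."""
    return occurred_at + DECLARATION_WINDOW


def review_times(classified_at, recovered_at):
    """Times at which the classification should be reviewed before recovery completes.

    Assumption: the review clock starts at the initial classification.
    """
    times = []
    next_review = classified_at + REVIEW_INTERVAL
    while next_review < recovered_at:
        times.append(next_review)
        next_review += REVIEW_INTERVAL
    return times


# Hypothetical incident timeline.
occurred = datetime(2024, 3, 1, 8, 0)
deadline = declaration_deadline(occurred)
reviews = review_times(
    classified_at=datetime(2024, 3, 1, 10, 0),
    recovered_at=datetime(2024, 3, 2, 16, 0),
)
print(deadline)
for review in reviews:
    print(review)
```

For this timeline the determination is due by 12:00 on 1 March, and reviews fall at 22:00 on 1 March and 10:00 on 2 March.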
Disasters at LBCCD fall into one of the following four levels.
| DISASTER CLASSIFICATION | DESCRIPTION |
|---|---|
| Level 1 (Low) | Sub-system Outage / Minor Damage: Partial loss of a component of a critical application for a period of one day to one week. This type of outage does not result in the total loss of operation for that application; however, specific functionality is reduced or impaired. In this scenario, only a part of the computer processing environment is impacted, but the communication lines and network are still up and running. The building is still available, and the users can use normal office space to wait for the restart of server or application processing. The goal of the recovery process in this case is to restore server or application functionality. |
| Level 2 (Medium) | Short Term Outage: Complete loss of a critical application for a period of one day to one week. The ability to meet business functions and mission objectives may be impacted, usually by elongated processing cycles and missed deadlines, but not to a significant extent. In this scenario, a key computer processing application is unavailable. Communication lines or portions of the network may be down. The goal of the recovery process is to restore minimum critical application functionality, which may require moving affected applications to alternate equipment. An alternate site may need to be put on Standby. |
| Level 3 (High) | Long Term / Total System Disaster: Complete loss of a critical system(s) for a period greater than two weeks. The ability to continue the business function and its mission is in jeopardy and may fail in some circumstances, such as missing critical milestones in the business cycle. In this scenario, key portions of the computer processing environment are unavailable. Communication lines or portions of the network may also be down. It may expand to entire processing environments experiencing a catastrophic disaster. The goal of the recovery process is to restore minimum critical application functionality either at the primary facility or at the Standby facility. Also included in this class are disasters that may not produce outages greater than two weeks, but involve more than one critical application; or natural disasters such as fires, floods, or other catastrophic situations. If time frames for repairs are not acceptable (e.g., will take longer than 1-2 months), an interim or new production facility may need to be acquired or leased. |