7 Cloud SLA Must-Haves for Life Sciences Organizations

In line with current technological trends across all industries, the life sciences are transitioning from a classic on-premise model to a cloud-based, virtualized computing model to host regulated applications and information.

In doing so, life sciences organizations are effectively transferring some or all of their computing responsibilities and control to third-party cloud service providers, something that some compliance leaders find difficult to come to terms with.

Maintaining a cloud-based system in a state of compliance is consequently dependent on the activities of the service provider. With this in mind, it makes sense to ensure that the roles and responsibilities of the cloud service provider and customer are understood and agreed to by both parties.

Above and beyond what common sense dictates, local legislation (e.g. EU Annex 11, reference 1) may require formalized service agreements to exist between third-party service providers and customers from a regulated environment.

In this article, we explore some of the key issues and points of concern that life sciences organizations should identify and address in their Service Level Agreements with cloud service providers.

Before We Start: What Exactly is an SLA?

An SLA (Service Level Agreement) is a section of a service contract between two parties. While the contract itself is enforceable by law, the SLA focuses on providing Key Performance Indicators (KPIs) which can be used to measure service performance and quality as agreed to by both parties.

An SLA should cover the full scope of services: which system(s) are covered, any constraints or exclusions, and the duration of the agreement. The services to be provided should be well described, including the identification of key roles responsible for delivering those services.

Tip: Be wary of escape clauses (circumstances under which the level of service promised does not apply).

7 Cloud SLA Must-Haves for Life Sciences

1. Reliability and Availability

Reliability is the ability of a system or component to perform its required functions under stated conditions for a specified period of time, while availability is the degree to which a system or component is operational and accessible to you when required (IEEE, reference 2).

Targets for both of these parameters need to be well-defined in the SLA (e.g. 99.99% uptime during work days, 99.9% uptime for nights/weekends). These expectations should be based on your fault tolerance and readiness for system downtime. What constitutes “downtime” should also be defined in the SLA: Is it only unscheduled downtime or does planned maintenance count towards downtime as well?
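To make uptime targets concrete during negotiations, it can help to translate a percentage into the downtime it actually permits. The sketch below is illustrative only: the function name and the 30-day (720-hour) month are assumptions, and your SLA may define the measurement period and "downtime" differently.

```python
def allowed_downtime_minutes(uptime_percent: float, period_hours: float) -> float:
    """Maximum downtime (in minutes) that an uptime target permits over a period."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * 60 * (1 - uptime_percent / 100)


if __name__ == "__main__":
    HOURS_PER_MONTH = 720  # assumption: a 30-day measurement period
    for target in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(target, HOURS_PER_MONTH)
        print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per month")
```

Under these assumptions, a 99.9% target allows roughly 43 minutes of downtime per month, while 99.99% allows a little over 4 minutes. Whether planned maintenance is counted in that budget is exactly the kind of detail the SLA should spell out.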

Tip: If you have a comprehensive Business Continuity Plan in place that allows you to continue operations during downtime, you may be prepared to accept more system downtime and you should use this to your advantage during contract negotiations. Service providers who guarantee you higher uptimes will want to be compensated for the effort that goes into ensuring the maximum availability of your system, so you could potentially decrease your fees if you are willing to accept lower uptimes.

2. Monitoring and Performance

The SLA should clearly identify who is responsible for monitoring performance, what kind of data will be collected, and how often it is collected. Metrics used to measure performance objectives need to be identified and calculation methods need to be defined.

Consider the relevance of the following metrics:

MTBF (Mean Time Between Failures - for repairable systems) or MTTF (Mean Time To Failure - for non-repairable systems): This is a straightforward parameter to quantify reliability.
MTTR (Mean Time to Recovery or Repair): This is the expected time to recover a system from failure and it is used to quantify availability. The longer it takes to recover a system from failure, the less available your system is (i.e. greater downtime).
Provisioning Interval: This is the mean time to bring up or drop a resource (e.g. provision a new virtual machine or add resources (CPU, storage) in IaaS; bring a new application instance online in SaaS). This is used to quantify elasticity.
Mean Response Time: This can be two-faceted. From a technology standpoint, you want to know if the application is taking too long to respond. From a human (customer service) standpoint, you want to know how long it takes the support team to answer requests. This is used to quantify responsiveness.

These metrics allow the quality of service to be quantified or benchmarked.
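Since the SLA should define how each metric is calculated, here is a minimal sketch of one common convention: MTBF as operating (up) time divided by the number of failures, MTTR as the mean outage duration, and steady-state availability as MTBF / (MTBF + MTTR). The function names and the sample outage log are hypothetical, and your service provider may use different definitions.

```python
def mean_time_to_recovery(outage_hours: list[float]) -> float:
    """MTTR: average duration of the recorded outages, in hours."""
    if not outage_hours:
        raise ValueError("at least one outage is needed to compute MTTR")
    return sum(outage_hours) / len(outage_hours)


def mean_time_between_failures(period_hours: float, outage_hours: list[float]) -> float:
    """MTBF: operating (up) time in the period divided by the number of failures."""
    if not outage_hours:
        raise ValueError("at least one outage is needed to compute MTBF")
    uptime_hours = period_hours - sum(outage_hours)
    return uptime_hours / len(outage_hours)


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability as a fraction between 0 and 1."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    outages = [0.5, 1.5, 1.0]  # hypothetical outage durations (hours) in one month
    mttr = mean_time_to_recovery(outages)
    mtbf = mean_time_between_failures(720, outages)  # assumption: 720-hour month
    print(f"MTBF={mtbf:.1f} h, MTTR={mttr:.1f} h, availability={availability(mtbf, mttr):.4%}")
```

The formula shows why recovery time matters as much as failure frequency: for a fixed MTBF, halving MTTR roughly halves the unavailable fraction of time.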

Tip: When negotiating the SLA, it may be beneficial to include penalties (read: $$$) that would be applied if the cloud service provider fails to meet its performance objectives. This will provide incentive to maintain a certain level of service and provide some financial “insurance” against lost service time.
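Penalties are often expressed as service credits tied to measured uptime. The following sketch shows how such a schedule could be evaluated; the tier thresholds and credit percentages are entirely hypothetical and are not taken from any real provider's terms.

```python
# Hypothetical schedule: (minimum measured uptime %, credit as % of the monthly fee).
# Tiers are ordered from best to worst performance.
CREDIT_TIERS = [
    (99.9, 0),   # target met: no credit
    (99.0, 10),
    (95.0, 25),
]
FLOOR_CREDIT_PERCENT = 50  # hypothetical credit when uptime falls below every tier


def service_credit(measured_uptime_percent: float, monthly_fee: float) -> float:
    """Return the credit owed to the customer for one billing period."""
    for threshold, credit_percent in CREDIT_TIERS:
        if measured_uptime_percent >= threshold:
            return monthly_fee * credit_percent / 100
    return monthly_fee * FLOOR_CREDIT_PERCENT / 100


if __name__ == "__main__":
    for uptime in (99.95, 99.5, 97.0, 90.0):
        print(f"{uptime}% uptime -> credit of {service_credit(uptime, 10_000):.2f}")
```

Writing the schedule down in this explicit form, even just on paper, makes it easier to check that both parties read the thresholds the same way.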

3. Data Security and Privacy

While defining the terms of an SLA, it is worth going through an exercise to analyze and identify the types of electronic records that will be created or maintained on the system. Depending on the intended use of the system, the electronic records may or may not be required by predicate rules or be subject to specific data security and privacy laws.

Data Encryption

In 21 CFR Part 11, the FDA defines an “Open System” as an environment in which system access is not controlled by those persons who are responsible for the content of the electronic records on the system.

Considering this definition, the cloud delivery model that you have chosen (PaaS vs IaaS vs SaaS) and the corresponding system administration responsibilities become extremely important. By choosing to operate your system on the cloud, you are surrendering control over certain aspects of the system. If the service provider manages system access, you are then dealing with an “open” system.

Tip: Find out about data encryption. Are transmissions over the internet secured (through encryption)? Is the service provider encrypting all stored and transmitted data?

Location of the Data

Depending on the location of your operations you may have an obligation to comply with data privacy laws (such as the EU Data Protection Directive or the future GDPR, reference 3).

Tip: Ask whether you can choose the geographical location where your data will be stored. This gives you a degree of control over your compliance with relevant legislation.

4. System Maintenance and Changes

The cloud service provider will inevitably make changes to the system, whether through scheduled maintenance to update systems, the addition of new services, or unplanned repairs. How will upcoming changes, patches, or planned maintenance be communicated? Moreover, who is responsible for keeping system documentation up to date following a change?

Tip: As changes have a direct impact on maintaining a system in a state of compliance, you should ensure the SLA addresses the change management process and outlines key communication timeframes and documentation.

5. Business Continuity Planning: Disaster Recovery and Data Protection

While we all hope that there will never be a catastrophic event, such events are usually outside of our control. When disaster strikes the service provider, you should activate your organization’s Business Continuity Plan to oversee continued operations while you wait for service to be restored. Two crucial parameters for business continuity planning are the Recovery Point Objective (RPO) and Recovery Time Objective (RTO).

The RPO is the maximum tolerable period (hours or days) during which data might be lost. Ask yourself how much data you can afford to lose, considering the amount of data that you generate in the system and the timeframe in which this data is generated. If you are generating lots of data in little time (e.g. a laboratory data management system or a SCADA system), you likely cannot tolerate losing more than a few hours’ worth of data. If you are only sporadically adding data to a database (e.g. adding documents to a Quality Management System), losing a day’s worth of data probably won’t be a substantial loss or present major costs to your business.

The RTO is the target time set for resumption of service after an incident or disaster. The faster you need to return to an operational state (lower RTO), the more preparation is required for disaster recovery readiness.

Tip: Both the RPO and RTO should be stated in the SLA and should take into account the amount of resilience you require. While the frequency of backups and guaranteed service resumption are key, it’s important to avoid paying for “over-resilience”: features that go unused and are technically more than you actually need.
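A quick way to sanity-check proposed terms against your objectives is to compare the backup interval with your RPO and the guaranteed restore time with your RTO. The sketch below assumes the worst-case data loss equals the backup interval (a failure just before the next backup) and that backups are complete and restorable; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTerms:
    """Hypothetical recovery terms offered by a service provider."""
    backup_interval_hours: float     # time between successive backups
    guaranteed_restore_hours: float  # provider's committed time to restore service


def check_recovery_terms(terms: RecoveryTerms, rpo_hours: float, rto_hours: float) -> dict:
    """Compare offered terms with the customer's RPO and RTO."""
    return {
        # Worst case: failure occurs just before the next backup runs.
        "rpo_met": terms.backup_interval_hours <= rpo_hours,
        "rto_met": terms.guaranteed_restore_hours <= rto_hours,
    }


if __name__ == "__main__":
    offered = RecoveryTerms(backup_interval_hours=24, guaranteed_restore_hours=8)
    # Example: a high-volume laboratory system that can tolerate only 4 hours of lost data.
    print(check_recovery_terms(offered, rpo_hours=4, rto_hours=12))
```

In this example, daily backups would not satisfy a 4-hour RPO even though the restore commitment fits the RTO, so the backup frequency is the term to negotiate. Conversely, terms that beat your objectives by a wide margin may point to the over-resilience mentioned above.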

6. Problem Reporting

Let’s face it - things will go wrong with how cloud services are delivered (e.g. unplanned downtime, unanticipated changes made by supplier). So, you need to have a clear procedure in place for reporting, handling, and escalating problems. The following points should be agreed to, understood, and documented:

Who are you calling when there is a problem? Is it sufficient to reach out to a general call center or support inbox, or do you need a direct line of contact to the appropriate administrators?
How are reported issues prioritized? What rules are used to decide severity levels?
What are the allowable response times and resolution times?
What is the dispute mediation process? Under what conditions do problems get escalated?

Tip: Make sure the SLA includes escalation contact details. If you are concerned with monitoring incidents and issues within your quality system, the SLA should also describe reporting requirements and related responsibilities.

Moreover, the SLA should describe sanctions that may be applied following an unresolved dispute and the consequences for not meeting service obligations. This may include payment credits or reimbursements, or even contract termination.

7. Access to Data

As the customer, remember that the data residing in the cloud is yours. The data is owned by you and you should have every right to access it—at any time. The cloud service provider should never hold your data ransom, even if you are leaving a contract.

Complete and accurate copies of your records (in readable format) must be available for inspection by the regulatory agencies. Will you have immediate access to your data?

If you choose to withdraw from the service contract, how will the service provider support you to ensure all your data is returned to you?

Tip: Consider your exit strategy and the ability to move your data to a different service provider. Make sure the SLA outlines expectations for data repatriation in the event your contract with the cloud service provider is terminated.

The Takeaway

It is important to understand the type of data being generated and maintained in the cloud, as this will affect the performance and quality of services that you, as the customer, will need to protect your data. Services provided through an SLA are usually directly proportional to the pricing model: the more services you expect to receive, the more you should expect to pay. To keep pricing low and matched to the level of service you actually require, performance targets should be set early, monitored, and periodically reviewed.

Carefully review the SLA and build in the assurances you need – this will ensure that your data is accessible and protected under the terms that you and the service provider mutually agreed to, and provide a strong framework for a great partnership.

By Gianna De Rubertis | November 28, 2017 at 1:51 PM | Quality, Regulations & Standards | 1 Comment
