Scaling Genomics Research

VE3 Enables High-Performance Cloud-Based Research Platform on AWS

Overview

Genomics research demands unparalleled computational resources to process vast datasets and run complex simulations. At VE3, we partnered with a leading genomics and biomedical research organization to build a high-performance, cloud-based platform on AWS. This collaboration aimed to overcome existing limitations in computational power, reduce operational costs, and enhance global collaboration among researchers.

About the Customer

Health trusts and research labs across the UK are at the forefront of genomics research, especially in the wake of the COVID-19 pandemic. These institutions require advanced digital and cloud solutions through NHS SBS to enhance their operations, improve data management, streamline workflows, and boost their research capabilities. With the significant amount of data these organizations manage, there is a pressing need to scale resources effectively to meet increasing demands while also reducing costs. Cloud solutions are essential to achieving these goals, enabling efficient healthcare delivery and accelerating research advancements in genomics and other critical areas of medical science.

Challenges

The institution was encountering several critical challenges that were impeding its research progress:

Limited Computing Power

The on-premises infrastructure was struggling to keep up with the computational demands of expanding genomic research. The hardware limitations were causing significant delays in data processing and analysis, directly impacting the pace of research.

High Maintenance Costs

The costs associated with maintaining and upgrading the on-premises high-performance computing (HPC) cluster were escalating. Resources that could have been allocated to research were instead being used to manage and maintain outdated infrastructure.

Collaboration Barriers

With researchers located across various regions, the existing system was inadequate in facilitating seamless collaboration. Sharing large datasets and coordinating analyses in real-time was challenging, slowing down collaborative efforts and data-driven discoveries.

Without addressing these issues, the institution risked delays in critical research projects, increased operational costs, and potential setbacks in their quest for scientific breakthroughs.

VE3 Professional Services Approach

VE3 collaborated closely with the institution’s IT and research teams to develop a tailored solution. The approach included:

Needs Assessment and Cloud Architecture Design

Conducted a comprehensive assessment of the institution's infrastructure and research objectives to identify enhancement areas and designed a scalable cloud architecture tailored to their needs.

Detailed Implementation Plan

Outlined a step-by-step deployment strategy to transition from on-premises to a cloud-based platform, minimizing disruptions and maximizing resource efficiency.

Continuous Engagement and Optimization

Provided ongoing support to ensure the platform met evolving computational and collaboration needs.

VE3’s Solution

A Cloud-Based Research Platform on AWS

To overcome these challenges, we at VE3 collaborated closely with the institution’s IT and research teams to design and implement a cloud-based research platform on AWS. Our team designed a scalable cloud architecture tailored to the institution’s needs, including high-performance computing, secure data management, and efficient collaboration tools for genomic research.

Solution Implementation

AWS Batch

Managed and executed large-scale batch processing jobs for genomic data analysis. Configured to handle high-volume workloads, AWS Batch dynamically scaled compute resources based on job requirements. Integration with Spot Instances optimized cost-efficiency, reducing computational costs by up to 50%. 
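As a rough illustration of the Spot integration described above, the sketch below builds the kind of request that the boto3 Batch client's create_compute_environment call accepts for a managed Spot compute environment. It only constructs the dictionary and does not call AWS. The environment name, ARNs, subnet and security group IDs, and the vCPU ceiling are placeholders assumed for illustration, not values from the project.

```python
def spot_compute_environment(name, subnets, security_groups,
                             instance_role_arn, service_role_arn,
                             max_vcpus=1024):
    """Build keyword arguments for boto3's batch.create_compute_environment.

    Every argument is a placeholder supplied by the caller; nothing here is
    taken from the actual deployment.
    """
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",              # AWS Batch provisions the instances
        "state": "ENABLED",
        "serviceRole": service_role_arn,
        "computeResources": {
            "type": "SPOT",             # run jobs on spare EC2 capacity
            "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
            "minvCpus": 0,              # scale to zero when the queue is empty
            "maxvCpus": max_vcpus,      # upper bound for scale-out
            "instanceTypes": ["optimal"],
            "subnets": list(subnets),
            "securityGroupIds": list(security_groups),
            "instanceRole": instance_role_arn,
        },
    }


# Hypothetical identifiers, for illustration only.
request = spot_compute_environment(
    name="genomics-spot-env",
    subnets=["subnet-EXAMPLE1", "subnet-EXAMPLE2"],
    security_groups=["sg-EXAMPLE"],
    instance_role_arn="arn:aws:iam::111122223333:instance-profile/ExampleBatchInstanceRole",
    service_role_arn="arn:aws:iam::111122223333:role/ExampleBatchServiceRole",
)
```

Setting minvCpus to 0 is what lets the environment shrink to nothing between analysis runs, which is one plausible way the dynamic scaling described above keeps idle cost low.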

Amazon EC2 Reserved Instances and Spot Instances

Utilized Reserved Instances for predictable, long-term workloads, ensuring cost savings and capacity assurance. Employed Spot Instances for high-performance, variable workloads, optimizing compute costs by leveraging unused EC2 capacity. This combination balanced compute power and cost efficiency.
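To make the cost reasoning concrete, here is a small worked calculation of a blended Reserved plus Spot bill compared with paying the on-demand rate for everything. The hourly rate, the hour counts, and both discount levels are hypothetical inputs chosen for illustration; they are not pricing or usage figures from the project.

```python
def compare_costs(on_demand_rate, baseline_hours, burst_hours,
                  reserved_discount, spot_discount):
    """Return (blended_cost, on_demand_cost, savings_fraction).

    baseline_hours run on Reserved Instances, burst_hours on Spot Instances.
    Discounts are fractions of the on-demand rate (0.40 means 40% cheaper).
    """
    on_demand_cost = (baseline_hours + burst_hours) * on_demand_rate
    reserved_cost = baseline_hours * on_demand_rate * (1 - reserved_discount)
    spot_cost = burst_hours * on_demand_rate * (1 - spot_discount)
    blended_cost = reserved_cost + spot_cost
    savings = 1 - blended_cost / on_demand_cost
    return blended_cost, on_demand_cost, savings


# Hypothetical inputs: 600 steady hours, 400 bursty hours, 1.00 per hour.
blended, on_demand, savings = compare_costs(
    on_demand_rate=1.00,
    baseline_hours=600,
    burst_hours=400,
    reserved_discount=0.40,
    spot_discount=0.70,
)
print(f"blended: {blended:.2f}, on-demand: {on_demand:.2f}, savings: {savings:.0%}")
```

With these made-up inputs the blended bill comes to roughly half of the all-on-demand figure. The real ratio depends on the workload split and on current Spot prices.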

Amazon S3 and Amazon S3 Glacier

Managed scalable storage for active and archival data. Amazon S3 handled frequently accessed data, while lifecycle policies transitioned older data to S3 Glacier for cost-effective long-term storage. Vault Lock policies were implemented for data immutability and compliance. 
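A lifecycle policy of the kind described above might look like the following sketch. It builds the LifecycleConfiguration structure that boto3's put_bucket_lifecycle_configuration accepts, without calling AWS. The prefix and the 90-day threshold are assumptions made for illustration; the source does not state the actual rules.

```python
def glacier_lifecycle_rule(prefix, days_until_archive):
    """Build a LifecycleConfiguration for s3.put_bucket_lifecycle_configuration.

    Objects under `prefix` move to the GLACIER storage class once they are
    `days_until_archive` days old. Both values are caller-supplied placeholders.
    """
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_until_archive, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


# Hypothetical prefix and age threshold.
lifecycle = glacier_lifecycle_rule("raw-reads/", 90)
```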

AWS Glue

Facilitated data integration and transformation for genomic datasets. Created ETL jobs to clean, normalize, and partition data, with integration to Amazon Redshift. Glue Crawlers were used to catalog and discover data efficiently. 
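The clean, normalize, and partition steps can be pictured with a few lines of plain Python. This is a stand-in for the logic only, not AWS Glue code: a real Glue job would typically express the same steps in Spark. The field names (sample_id, chromosome, position) and the choice to partition by chromosome are assumptions for illustration.

```python
from collections import defaultdict

REQUIRED_FIELDS = ("sample_id", "chromosome", "position")  # hypothetical schema


def normalize_chromosome(raw):
    """Map spellings such as 'chr1', 'Chr1' or ' 1 ' to '1', and 'chrx' to 'X'."""
    value = str(raw).strip()
    if value.lower().startswith("chr"):
        value = value[3:]
    return value.upper()


def clean_and_partition(records):
    """Drop incomplete rows, normalize fields, and group rows by partition key."""
    partitions = defaultdict(list)
    for record in records:
        # Clean: skip rows where any required field is missing or empty.
        if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
            continue
        chromosome = normalize_chromosome(record["chromosome"])
        row = {
            "sample_id": str(record["sample_id"]).strip(),
            "chromosome": chromosome,
            "position": int(record["position"]),
        }
        # Partition: Hive-style key, as used in S3 paths such as .../chromosome=1/
        partitions[f"chromosome={chromosome}"].append(row)
    return dict(partitions)


rows = [
    {"sample_id": "S1", "chromosome": "chr1", "position": "12345"},
    {"sample_id": "S2", "chromosome": "1", "position": 678},
    {"sample_id": "S3", "chromosome": "chrX", "position": "42"},
    {"sample_id": "", "chromosome": "chr2", "position": "7"},  # dropped: no sample_id
]
partitioned = clean_and_partition(rows)
```

Partitioning on a column that queries commonly filter by lets downstream tools read only the relevant files instead of scanning the whole dataset.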

Amazon Redshift

Served as a high-performance data warehouse to store processed data from AWS Glue. Enabled complex queries and analytics on large-scale genomic datasets, integrating seamlessly with data pipelines to support in-depth research analysis. 

AWS Service Catalog

Provided a central repository for managing and distributing approved AWS resources and configurations. Enabled standardized deployment of cloud resources across the organization, ensuring consistent and compliant use of AWS services. 

Amazon EKS

Hosted the web service platform on Amazon Elastic Kubernetes Service (EKS). Provided a scalable and managed Kubernetes environment to deploy, manage, and scale containerized applications, enhancing the platform's resilience and operational efficiency. 

Amazon RDS

Offered managed relational database services for metadata and structured data. Deployed Multi-AZ instances to ensure high availability and automated backups, with scaling and performance tuning to support growing data needs. 

AWS DataSync and AWS AppFlow

Utilized AWS DataSync to automate and accelerate the transfer of large genomic datasets between on-premises storage and Amazon S3, ensuring efficient and reliable data synchronization. AWS AppFlow was employed to securely integrate and transfer data between Amazon S3 and third-party applications, facilitating seamless data movement and synchronization across the research platform.

AWS Security Hub

Aggregated and prioritized security findings from across AWS services. Enabled continuous monitoring and compliance checks to ensure a secure and compliant environment for handling sensitive genomic data.

AWS IAM

Managed user access and permissions across AWS services. Implemented fine-grained access controls to ensure that only authorized personnel had access to sensitive data and critical resources. 
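As one possible shape for such fine-grained controls, the sketch below generates an IAM policy document that grants read-only access to a single project prefix in an S3 bucket. The bucket name and prefix are hypothetical, and the actual policies used on the platform are not described in the source.

```python
import json


def read_only_dataset_policy(bucket, prefix):
    """Return an IAM policy document allowing read-only access to one prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow listing, but only for keys under the given prefix.
                "Sid": "ListDatasetPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": f"{prefix}*"}},
            },
            {
                # Allow downloading objects under the prefix; no write or delete.
                "Sid": "ReadDatasetObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


# Hypothetical bucket and project prefix.
policy_json = json.dumps(
    read_only_dataset_policy("example-genomics-bucket", "project-a/"),
    indent=2,
)
```

Because IAM denies anything not explicitly allowed, a researcher holding only this policy could read project-a data but could not modify it or see other projects.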

Amazon ECR

Stored and managed Docker container images for applications running on Amazon EKS. Provided a secure, scalable repository for container images, supporting efficient deployment and management of containerized applications. 

Amazon CloudWatch

Monitored the health and performance of the cloud environment. Custom dashboards were created to track key metrics like CPU utilization, job completion times, and data transfer rates. 
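A metric such as job completion time is not emitted by default, so it would need to be published as a custom metric before a dashboard can chart it. The sketch below builds a payload in the shape that boto3's cloudwatch.put_metric_data accepts, without calling AWS. The namespace, metric name, and dimension are illustrative assumptions rather than the names used on the platform.

```python
def job_duration_metric(job_queue, duration_seconds):
    """Build keyword arguments for cloudwatch.put_metric_data.

    The namespace, metric name, and dimension name are hypothetical.
    """
    return {
        "Namespace": "GenomicsPlatform/Batch",
        "MetricData": [
            {
                "MetricName": "JobCompletionTime",
                "Dimensions": [{"Name": "JobQueue", "Value": job_queue}],
                "Value": float(duration_seconds),
                "Unit": "Seconds",
            }
        ],
    }


# Hypothetical queue name and a 25-minute job.
payload = job_duration_metric("alignment-queue", 1500)
```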

AWS Key Management Service (KMS)

Managed the encryption keys used to protect data at rest and in transit, strengthening overall data security.

Implementation and Deployment

High-Performance Computing (HPC) Environment

We set up a scalable and robust computing environment on AWS, enabling researchers to run complex simulations, data analyses, and genetic studies without being constrained by hardware limitations. The environment was designed to automatically scale based on the computational demands, ensuring optimal performance at all times.

Data Management and Security

Our team implemented stringent data management practices, including encryption, access controls, and audit trails, to ensure the integrity and security of sensitive genetic data. AWS’s compliance with various global standards further reinforced the institution’s data security posture.

Data Encryption and Secure Communication

We applied strong encryption to safeguard genetic data both at rest and in transit using AWS Key Management Service (KMS). Data in Amazon S3 and S3 Glacier was encrypted with customer-managed keys, while SSL/TLS protocols secured communications between AWS services.
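The two halves of this setup, encryption at rest with a customer-managed key and TLS-only access, could be expressed roughly as follows. The first function builds the ServerSideEncryptionConfiguration that boto3's s3.put_bucket_encryption accepts; the second builds a bucket policy that rejects requests not sent over TLS. Neither calls AWS. The key ARN and bucket name are placeholders, and this is a sketch under those assumptions rather than the configuration actually deployed.

```python
def default_kms_encryption(kms_key_arn):
    """ServerSideEncryptionConfiguration for s3.put_bucket_encryption."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",      # encrypt with AWS KMS
                    "KMSMasterKeyID": kms_key_arn,  # customer-managed key
                },
                "BucketKeyEnabled": True,           # reduces calls to KMS
            }
        ]
    }


def deny_insecure_transport(bucket):
    """Bucket policy that denies any S3 request made without TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


# Hypothetical key ARN and bucket name.
encryption = default_kms_encryption(
    "arn:aws:kms:eu-west-2:111122223333:key/EXAMPLE-KEY-ID"
)
transport_policy = deny_insecure_transport("example-genomics-bucket")
```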

Ongoing Support and Optimization

VE3 provided continuous support to ensure the platform remained optimized:

Performance Monitoring and Updates

Regularly monitored the platform's performance and applied updates to enhance efficiency and security.

Model and Data Integration Updates

Ensured the platform could integrate new data sources and adapt to the institution's evolving research requirements.

Cost Management

Continuously evaluated resource usage to maintain cost-efficiency.

Results and Benefits

The AWS-based platform delivered by VE3 Professional Services brought transformative results for the research institution:

Increased Computational Capacity

The new cloud-based platform increased the institution’s computational capacity by 300%, allowing them to process and analyze vast amounts of genetic data at unprecedented speeds. This acceleration in computational power enabled researchers to complete projects faster and take on more complex studies.

Significant Cost Savings

By utilizing Amazon EC2 Spot Instances and AWS Batch, we cut computing costs by 50%. This reduction in expenses allowed the institution to redirect funds from infrastructure maintenance to advancing their research initiatives, thus maximizing their research investments.

Enhanced Collaboration

The new platform broke down collaboration barriers, enabling researchers to work together more effectively, regardless of their physical location. This improvement in collaboration capabilities significantly enhanced the institution's research output and innovation potential.

Conclusion

The partnership between VE3 and the research institution resulted in a highly successful transformation of its computational infrastructure. By moving to a cloud-based platform on AWS, we not only solved the immediate challenges of limited computing power, high maintenance costs, and collaboration barriers, but also positioned the institution to achieve long-term success in its genomics research endeavours.

Our tailored solution enabled the institution to scale its research operations, improve collaboration across global teams, and significantly reduce operational costs. With the new AWS-powered platform, the institution is better equipped to drive scientific discoveries, develop new treatments, and continue its leadership in the field of genomics and biomedical research.

About VE3

VE3 Professional Services specializes in delivering AWS cloud solutions tailored for the research and education sectors. By leveraging AWS’s extensive cloud computing capabilities, VE3 helps institutions accelerate their research processes, optimize costs, and enhance collaboration. VE3 is committed to driving innovation and enabling scientific breakthroughs through scalable, secure, and efficient cloud platforms.