Multi-omics analysis of high-quality biospecimens using the latest advances in technology is a significant step forward in the development of new cancer treatments. The methods currently in use are highly complex and time-consuming, not least because a suitable data processing system is required. To operate more efficiently, the global oncology company Indivumed has entered into a collaboration with Lufthansa Industry Solutions (LHIND). The goal of this project was to implement new software that would both reduce costs and improve quality.
Indivumed is making strides in cancer research with its cloud-based software solutions
The Customer
Since its founding in 2002, Indivumed GmbH has been an innovative driving force in the field of oncology and precision medicine. Operating from its headquarters in Hamburg, Germany, and a subsidiary in the USA, Indivumed has established a steadily growing global clinical network of hospitals and medical facilities.
Indivumed’s goal is to better understand the complexity of cancer in order to enable the development of personalized treatments.
Tapping into the wealth of information in multi-omics tumor data is critical if we are to understand the biology of individual tumors better.
To this end, Indivumed has developed IndivuType, the world’s first and most comprehensive multi-omics database: a discovery solution that combines diverse molecular biology information on the genome, transcriptome, proteome and phosphoproteome, as well as digital histopathology data, with extensive clinical information from thousands of patients.
All cases are collected in pseudonymized form across Indivumed’s global clinical network according to globally standardized procedures, to ensure that the biological data is consistent and accurate. With thousands of cases covering multiple types of cancer from Europe, the Americas and Asia, and a sophisticated AI/ML-powered data analysis platform, IndivuType is a unique tool in the development of oncology drugs and diagnostics.
According to Indivumed, this allows the company to accelerate the drug development process, and the individually tailored use of the resulting drugs leads to improved cancer treatment.
- Formulating possible solutions as part of a one-day workshop
- On-site collaboration with the customer spanning several months
- Support in the transition from using software developed for scientific research to a contemporary, ready-for-use solution
- Definition of the computing infrastructure for the development, validation and operation of the software
- Migration of the previous software into another programming language
- Technical support during the new software’s validation phase
- Support in operating and monitoring the computing infrastructure
The Challenge
Selected partner companies carry out part of the preparatory work for IndivuType, in particular the genome, transcriptome and proteome analyses. As a result, Indivumed receives highly complex and extensive data, which is then processed further using computer-assisted techniques. Because these analyses require a high degree of computing and storage capacity, the research company’s long-term objective was to handle both more effectively and economically while making use of the latest technologies, hence the collaboration with LHIND.
The sheer volume of the whole genome sequencing data that Indivumed receives from its partner companies is a challenge in itself, as each individual data set can easily reach the three-digit gigabyte range in size.
The aim of Indivumed’s collaboration with LHIND was to develop a ready-to-use solution that could replace the research-oriented software the company had been using previously and that could process several tens of thousands of tissue samples every year. The collaboration was also intended to reduce the time and costs involved. The simultaneous improvement in scalability and cost that the company was working towards was achieved with a completely new software architecture designed to ensure precise comparability and consistent quality of the results. The new solution had to be able to scale computing resources dynamically, and all processing steps had to be automated.
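To give a sense of scale, the two figures above (data sets in the three-digit gigabyte range, tens of thousands of samples per year) can be multiplied out. The following is a rough sketch only: the sample counts and per-sample sizes are illustrative assumptions chosen from within those ranges, not figures reported by Indivumed, and the function name is hypothetical.

```python
def yearly_volume_tb(samples_per_year: int, gb_per_sample: float) -> float:
    """Return the raw data volume per year in terabytes (1 TB = 1000 GB)."""
    return samples_per_year * gb_per_sample / 1000


# Illustrative inputs only (assumptions, not reported figures).
for samples in (10_000, 30_000):
    for gb in (100, 300):
        volume = yearly_volume_tb(samples, gb)
        print(f"{samples:,} samples x {gb} GB = {volume:,.0f} TB per year")
```

Even at the lower end of these assumed values, the yearly volume lands in the petabyte range, which helps explain why storage and computing capacity were central to the project.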
The Solution
The solution Indivumed developed in collaboration with LHIND was implemented under the project name MOCCA (Multi-Omics for Cancer and Clinical Analytics).
In order to handle the enormous amounts of data and computing-intensive processes, while simultaneously creating a scalable solution, it was necessary to rent computing and storage capacity from a cloud provider. The dynamic, usage-based cloud model provides additional flexibility. Resources are no longer reserved and purchased in advance; instead, the computing network is scaled automatically, so only the capacity actually required is ever ordered. To achieve this, Indivumed is collaborating with Amazon Web Services (AWS).
A dynamically provisioned Kubernetes cluster, which is extremely efficient and cost-effective to operate, serves as the computing network. The resources used by the cluster are distributed dynamically across multiple, physically separate AWS data centers. Instead of using reserved and prepaid instances, the Kubernetes cluster only uses “Spot Instances”. This is spare AWS computing capacity that is offered at a lower price than regular on-demand capacity and that AWS can reclaim at short notice.
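The scaling principle described above can be sketched in a few lines. This is a simplified illustration, not the mechanism used in MOCCA: in a real Kubernetes cluster this decision is made by an autoscaling component, which the source does not name. The function name and all parameters (`pending_jobs`, `jobs_per_node`, `max_nodes`) are hypothetical.

```python
import math


def nodes_needed(pending_jobs: int, jobs_per_node: int, max_nodes: int) -> int:
    """Return how many worker nodes to request for the current job queue.

    Scales to zero when nothing is queued and never exceeds max_nodes.
    """
    if jobs_per_node <= 0:
        raise ValueError("jobs_per_node must be positive")
    if pending_jobs <= 0:
        return 0
    return min(max_nodes, math.ceil(pending_jobs / jobs_per_node))


# Hypothetical queue lengths: empty, small, and larger than the cap allows.
for queued in (0, 10, 10_000):
    print(queued, "queued jobs ->", nodes_needed(queued, jobs_per_node=4, max_nodes=1000), "nodes")
```

Because the node count follows the queue, no capacity is paid for while the queue is empty, which is the cost advantage over reserved instances.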
Once development was complete, the new software was tested extensively and validated to ensure consistency in data quality and comparability.
- Cloud Platform: Amazon Web Services (AWS)
- Storage: Amazon S3 (Amazon Simple Storage Service)
- Programming Languages: Python, Shell, Perl, R
- Databases: MySQL, SQLite
- Cluster Management: Amazon Elastic Kubernetes Service (Amazon EKS)
- Workflow Processing: Argo Workflows
- Monitoring Software: Prometheus, Grafana
- Continuous Integration/Continuous Delivery (CI/CD) System: Drone
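To illustrate how the components listed above might fit together, the sketch below builds a minimal Argo workflow definition for a single sample in Python. It is an assumption-laden example, not the MOCCA implementation: the function name, container image, script name and retry limit are hypothetical, and the node label used to target Spot capacity is the one that Amazon EKS managed node groups apply, which may not match the actual cluster setup.

```python
import json


def sample_workflow(sample_id: str, image: str) -> dict:
    """Build a minimal Argo Workflow manifest for one sample (illustrative only)."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        # Kubernetes object names must be lowercase, hence .lower().
        "metadata": {"generateName": f"analysis-{sample_id.lower()}-"},
        "spec": {
            "entrypoint": "analyze",
            # Assumption: Spot nodes carry the EKS managed node group label.
            "nodeSelector": {"eks.amazonaws.com/capacityType": "SPOT"},
            "templates": [
                {
                    "name": "analyze",
                    # Retry the step if a Spot node is reclaimed mid-run.
                    "retryStrategy": {"limit": 3},
                    "container": {
                        "image": image,  # hypothetical image name
                        "command": ["python", "run_analysis.py"],  # hypothetical script
                        "args": ["--sample", sample_id],
                    },
                }
            ],
        },
    }


workflow = sample_workflow("S001", "example.org/pipeline:latest")
print(json.dumps(workflow, indent=2))
```

The retry setting is the notable design choice here: since Spot capacity can be withdrawn at short notice, each processing step needs to be safe to restart.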
Benefits for the Customer
The newly established solution allows the digital data set analysis to run faster, more efficiently and more cost-effectively than before. Indeed, 1,800 tissue samples were processed within the first month of using the new software alone. At peak times, more than 800 automatically and dynamically added computing nodes process the huge amounts of data.
The use of cloud computing with AWS and dynamic scaling has enabled a significant increase in performance in all relevant categories. This provides Indivumed with a technological foundation that not only enables the company to make an important contribution to the development of new cancer treatments, but also puts it in an ideal position to tackle future challenges within the industry.
- Significant increase in performance through automatic and dynamic scaling
- Use of the latest cloud computing methods
- Analysis of 1,800 tissue samples in the first month
- Reduction in cost per patient sample and, in turn, the overall operating costs
- A technological foundation that can also be utilized for future projects
“Our data pipeline enables a highly scalable analysis of large cancer genome datasets. This is a re-implementation of the QuickNGS pipelines that Indivumed was using previously onto a modern, cloud-native technology stack, which allows tens of thousands of patient genomes to be processed each year. At the same time, we were able to significantly reduce the cost of cloud operations for genome analysis. Lufthansa Industry Solutions has been the driving force behind this effort by bringing their highly relevant cloud engineering expertise to Indivumed’s own in-house software development team.”
Dr. Peter Frommolt
Senior Director Bioinformatics