Dr. Adrien Oliva works at CSIRO’s Australian e-Health Research Centre. In this article, he discusses the power of APIs in genomic research and how they are transforming the field of genomics.
I am part of the team at the Australian e-Health Research Centre, Australia’s biggest digital health program. We have a wide range of products and services, the most important being Ontoserver, which powers Australia’s and England’s clinical terminology services. Our goal as a team is to use genomic data to improve everyday medical diagnostics and treatment.
Genomics is the study of DNA. DNA can be seen as the blueprint of any living organism: it holds all the information about the organism and what keeps it alive. Your DNA is your own storybook, the most important book ever written about you, as it contains information about your past, present, and future. It also contains information about your entire bloodline. This data is sensitive and should be protected at any cost. In the wrong hands, it can expose sensitive information, but in the right hands, it can give insight into diseases and help develop more effective treatments, or even predict and prevent some diseases. The discoveries we make depend directly on the amount of data available. With the cost of sequencing falling every year, we have more data to study than ever, and that is why we are getting better and quicker answers.
Handling this volume of sensitive data can be challenging for a researcher, and that’s where APIs and cloud solutions come in.
Analyze Data
Variant Spark is a machine learning framework for genomic data. It is used to find genetic associations with a trait or disease. In other words, it looks for mutations in the DNA that are correlated with eye color, BMI, cancer, or any other trait or disease. For this, you need two cohorts of people, one with the trait and one without. A machine learning algorithm then checks whether there is a correlation between a gene mutation and the trait. This sounds simple, but a human has 3 billion letters in their DNA, so when you are working with hundreds of thousands of people, the amount of data becomes unmanageable very quickly. Variant Spark is mainly run in the cloud. Its architecture lets us access large data sets safely and distribute them across multiple instances, so that Variant Spark executes in parallel and runs quicker than other software.
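To make the case/control idea concrete, here is a minimal sketch in Python. It is not Variant Spark’s algorithm, which the talk describes only as machine learning. Instead, it swaps in a much simpler score: the difference in allele frequency between the two cohorts. The variant names, the toy genotypes, and the function names are all assumptions made for illustration.

```python
def allele_frequency(genotypes):
    """Fraction of alternate alleles; each genotype is 0, 1, or 2 copies."""
    return sum(genotypes) / (2 * len(genotypes))


def rank_variants(cases, controls):
    """Rank variants by absolute allele-frequency difference between cohorts.

    cases / controls: dict mapping a variant id to a list of genotypes
    (0, 1, or 2), one entry per person in that cohort.
    """
    scores = {}
    for variant in cases:
        diff = allele_frequency(cases[variant]) - allele_frequency(controls[variant])
        scores[variant] = abs(diff)
    # Highest score first: the variants most associated with the trait.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: four people with the trait, four without.
cases = {
    "var_A": [2, 1, 2, 1],
    "var_B": [0, 1, 0, 1],
}
controls = {
    "var_A": [0, 0, 1, 0],
    "var_B": [1, 0, 0, 1],
}

ranking = rank_variants(cases, controls)
print(ranking)
```

In this toy data, var_A is far more common in the cohort with the trait, so it ranks first, while var_B is equally common in both cohorts and scores zero. A real analysis repeats this kind of comparison across millions of variants and many thousands of people, which is why parallel execution matters.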
Variant Spark is not the only software that runs this kind of analysis, but it is the quickest. Its runtime scales linearly, whereas other tools scale exponentially. This difference in scalability can decide whether an analysis can be run at all, and therefore whether we can find insight into a disease. All of that is possible thanks to the architecture, APIs, and cloud solutions that let us access and store those volumes of data securely.
Create Trust
As researchers, we like having lots of data, but genetic data always comes with consent. Consent is mandatory when we work with this type of data. It is essential for us to build robust data protection, both to gather sensitive genomic data safely and to create trust between researcher and patient.
Let us look at an example. When you go to a hospital or clinic, they will create a file and take a blood sample or some other sample that contains your DNA. It will be sent to a lab, which will analyze it and maybe send it on to a sequencing facility. If you go to another clinic, this will happen all over again, creating multiple copies of your DNA. At this point, you have completely lost ownership of your DNA data, and you don’t even know where it is. Alternatively, if you are part of a cohort, one central organization will control all your DNA, and this also raises a security concern. With the rise of blockchain and decentralized technology, the risk increases. So, we set out to develop a pipeline that is secure and safe for a patient to store data. The pipeline has to be transparent and auditable, and it has to keep track of all changes. To that end, we developed a protocol for managing consent, called the Dynamic Consent Protocol. The patient can give consent for specific kinds of sharing, e.g., they can share data for cancer research but not for ethnicity research.
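The requirements listed above (purpose-specific consent, auditability, and a record of every change) can be sketched as a small append-only ledger. This is only an illustration, not the Dynamic Consent Protocol itself, whose internals are not described here. The class name, the purpose labels, and the use of a SHA-256 hash chain are all assumptions.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder hash that precedes the first entry


def _digest(purpose, granted, previous_hash):
    """Hash one consent decision together with the hash of the entry before it."""
    body = {"purpose": purpose, "granted": granted, "previous_hash": previous_hash}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class ConsentLedger:
    """Append-only, hash-chained record of one patient's consent decisions."""

    def __init__(self):
        self.entries = []

    def record(self, purpose, granted):
        """Append a decision; earlier entries are never modified."""
        previous_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        self.entries.append({
            "purpose": purpose,
            "granted": granted,
            "previous_hash": previous_hash,
            "hash": _digest(purpose, granted, previous_hash),
        })

    def is_allowed(self, purpose):
        """The latest decision for a purpose wins; no decision means no consent."""
        for entry in reversed(self.entries):
            if entry["purpose"] == purpose:
                return entry["granted"]
        return False

    def verify(self):
        """Recompute every hash so that any edit to a past entry is detected."""
        previous_hash = GENESIS_HASH
        for entry in self.entries:
            expected = _digest(entry["purpose"], entry["granted"], previous_hash)
            if entry["previous_hash"] != previous_hash or entry["hash"] != expected:
                return False
            previous_hash = entry["hash"]
        return True


ledger = ConsentLedger()
ledger.record("cancer_research", True)
ledger.record("ethnicity_research", False)
print(ledger.is_allowed("cancer_research"), ledger.is_allowed("ethnicity_research"))
```

Because each entry stores the hash of the one before it, withdrawing consent is recorded as a new entry rather than an overwrite, and an auditor can call `verify()` to confirm that the history has not been altered.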
If the patient consents to share data, they get access to educational videos and a dashboard where they can see preliminary results of the research in which their data was used. The app is called “macrokey”. It is a self-sovereign identity app: individuals can create and control their identity without relying on centralized authorities. It provides server-less and password-less authentication. It thus puts control back in the hands of the individual and offers a more secure way to store, manage, and share personal data.
For all of this, we need a secure data exchange. This is where the Beacon protocol comes in. It was created by the Global Alliance for Genomics and Health (GA4GH). The Beacon API facilitates the search for genetic variants and metadata without compromising the privacy of the data set. A researcher can ask the tool specific questions, and the tool will give information about a group, never about an individual. An individual’s specific data is never shared with the researcher, which keeps the patient’s privacy intact. In this way, researchers all around the world can exchange genomic data safely and efficiently, and information can move between different institutions and organizations. One of the main features is the ability to share this data while keeping it under complete control within your own infrastructure, which can be a crucial consideration for many organizations in the field.
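The group-level answer described above can be illustrated with a toy in-memory example. This sketch does not use the real Beacon API or its request and response schema. The `ToyBeacon` class, the `Variant` fields, the patient identifiers, and the coordinates are all hypothetical, chosen only to show a query that reveals whether a variant exists in a data set without revealing who carries it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A genetic variant identified by its location and the base change."""
    chromosome: str
    position: int
    reference: str
    alternate: str


class ToyBeacon:
    """Answers yes/no questions about a data set, never about a person."""

    def __init__(self, genomes):
        # genomes: dict mapping a patient id to the set of variants they carry.
        # It stays private to this object and is never returned to the caller.
        self._genomes = genomes

    def query(self, variant):
        """Report only whether at least one person in the data set has the variant."""
        exists = any(variant in variants for variants in self._genomes.values())
        return {"exists": exists}


# Hypothetical data set held inside one institution's infrastructure.
beacon = ToyBeacon({
    "patient_1": {Variant("17", 43045712, "G", "A")},
    "patient_2": set(),
})

answer = beacon.query(Variant("17", 43045712, "G", "A"))
print(answer)
```

The researcher learns that the variant is present somewhere in the data set, which is often enough to decide whether to request formal access, while the individual records stay with the institution that holds them.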
To summarize, APIs and cloud solutions have revolutionized genomics research by letting us access and analyze an unprecedented wealth of data. This leads to the identification of crucial disease-related genes, which in turn helps us find better treatments. They also enable everyone to access data securely, placing control of the data with the patient and letting them edit what they share. Lastly, secure sharing of and access to data through APIs creates a collaborative environment that drives innovation and progress across the field of genomics.