Finding Cures Faster by Making Genomic Data Available
Miracles seem to be commonplace at St. Jude Children's Research Hospital, a belief widely accepted by most Mid-Southerners. The latest example may be St. Jude Cloud, introduced in April 2018 and continuing to dazzle as it evolves under the eye of Alex Gout, PhD, the project's scientific lead.
The Cloud's nucleus began to coalesce ten years ago, when high-level researchers at St. Jude teamed with Washington University in St. Louis to devise the Pediatric Cancer Genome Sequencing Project. Until that time, Gout explains, most cancer research, in the form of cancer genomic sequencing, was focused on adult cancers--which affected a larger number of individuals.
The project sequenced about 700 pediatric cancers, which led to numerous discoveries detailed in high-profile publications and scientific journals. Published papers, Gout points out, require substantiating references, i.e., sequencing data or other data generated with that publication.
Central repositories, however, "were cumbersome to access, cumbersome for us to upload our pediatric cancer genome sequencing data to, and cumbersome for users around the world to download this data to their own computing environment--which might take hours or even days," Gout noted. "It was a big mess."
The solution was St. Jude Cloud, a cloud-based platform developed at St. Jude to share all of the rich genomic pediatric cancer data generated over the previous ten years. From its launch in the spring of 2018 through August 2019, more than 50,000 users from across the world accessed the Cloud. Today, less than six months later, that figure has more than doubled to over 100,000 users, representing 61 institutions in 16 countries.
St. Jude Cloud, a partnership with Microsoft and DNAnexus, offers three interactive data-sharing platforms where scientists can use an array of tools to work with whole-genome, whole-exome, and RNA-sequencing data derived from pediatric cancer patients--rare data now available to doctors and researchers anywhere in the world.
"The driving force behind any scientific endeavor is discovery and advancement of knowledge," Gout reminds us. "Anyone in the pediatric cancer space is trying to discover or make advances toward better understanding of the disease and developing treatment. Therefore we want as many people looking at this data as possible, because the more brains we have on the case, the quicker we're going to come up with discoveries and solutions to treating childhood cancers."
The more that's shared, the more that's learned; increased genome sequencing and analysis are revealing more and more about the long-term negative health effects of cancer treatment. Over time, treatments such as radiation are being phased out as less-toxic interventions are developed.
Gout's fascination with pediatric cancer genomics was home-grown in Australia, where he studied genetics as an undergraduate. He followed that with a graduate degree in computer science, a PhD in medical biology/infectious disease, and postdoctoral studies in pediatric oncology genomics, which included time at the Broad Institute of MIT and Harvard in Cambridge, Massachusetts.
He returned to the U.S. in 2017 as a cancer genomics editor for Nature Communications, a prestigious scientific journal. Late in 2018, Jinghui Zhang, PhD, chair of the St. Jude Department of Computational Biology, invited him to join the fledgling St. Jude Cloud project as its scientific lead.
"It's been a fantastic experience; St. Jude is an absolutely amazing place," he adds. "The St. Jude Cloud project is really paving the way for how we share genomic data with the world, and how we analyze genomic data, as well. I'm so happy that I came!"
His efforts shepherded the project into its next phase in May 2019--making real-time clinical genomics available in the Cloud. Instead of waiting for the publication of related research (the traditional path, which can sometimes take years) before sharing the data, St. Jude now uploads unmined, de-identified patient data on a monthly basis.
"We don't want to delay other people's access to this data--and potentially finding their own discoveries," says Gout. "Instead, as soon as we sequence the data and generate it, we want to put it out into the world. Because pediatric cancer is such a rare disease, we need a lot of data to find associations and commonalities and correlations in order to make discoveries. The faster we share, the faster we allow this discovery process to happen."
The Survivorship Portal, a new feature of St. Jude Cloud, was introduced last fall, as part of a National Cancer Institute (NCI)-funded project. "This is a fascinating study," Gout marvels. "They've accumulated between 3,000 and 5,000 patients who have, once upon a time, been treated at St. Jude for cancer. Many of these survivors return to St. Jude voluntarily every year, to have more than 100 different phenotypes measured: their blood pressure, heart characteristics, eye strength, mental agility, weight, height, BMI, etc. We study their DNA, if they agree; we look at these clinical phenotypes, the cancer they had in the past, the treatment used, and consider whether or not those things have affected their lives moving forward, in 5, 10, or 15 years.
"Proper analysis of these data sets may enable us to serve preventative measures when we're treating another child with cancer," he suggests, "by comparing the results shown for patients with a similar genomic profile, who received different therapies, as recorded in the Survivorship Portal.
"The Survivorship Project is a really beautiful retrospective study whereby we can leverage the accumulated knowledge that we can get from previously treated patients, and how we develop that treatment strategy for future patients."
Today, St. Jude Cloud anticipates its newest evolution: federated data analysis across collaborating but currently incompatible genomic clouds. "Those who hold major pediatric databases use different clouds, and don't allow simultaneous access between them," says Gout.
"We have over 10,000 whole genomes sequenced within St. Jude Cloud," he reminds us. "With such a large number of samples, we need to think about new ways to house and analyze this data--new ways to develop processing and analysis software protocols that allow us to work with other large data repositories.
"This is a big paradigm shift in doing genomics research in computational biology and other medical research," he points out.
"St. Jude Cloud is the largest genetic sequencing repository in the world, but there are other very large sequencing repositories being born around the world as well. Rather than finding one place where we could combine all of this data, we're striving towards a federated data analysis approach, whereby we leave all the data in these other large pediatric cancer repositories and develop software analysis programs to analyze the data in multiple locations at the same time."
Other potential participants in the collaborative cloud project include Kids First, at the Children's Hospital of Philadelphia, and the NCI's TARGET study, housed inside NCI Data Commons.
"Collaborating with these other repositories and performing federated data analysis would allow us to leverage all of the data that each collectively has," he notes. This would allow researchers to view scattered and rare cases, comparing commonalities--or lack thereof--without knocking on multiple virtual doors in order to gather and analyze valuable research data.
"The federated data analysis project is something very cool that we'll embark on very soon," Gout believes.