Developmental Biology - Brain
Machine Learning & Diagnosis
New tool reveals molecular causes of disease, analyzing hundreds of diseases simultaneously...
Princeton University researchers are gaining new insights into the causes and characteristics of diseases by harnessing machine learning. Using a new tool available to researchers worldwide, a team of computer scientists and biologists has experimentally confirmed the previously unknown contributions of four top-weighted genes to a rare form of cancer that primarily affects babies and young children.
The team, which includes collaborators at Michigan State University and the University of Oslo, introduced the new system and demonstrated its abilities in a paper published in the Feb. 23 issue of the journal Cell Systems. While previous approaches focused on genes associated with specific disease types, this new technique uses machine learning to find unique patterns of gene activity by looking at more than 300 different diseases simultaneously, including cancers, heart disease and metabolic disorders.
The system, Unveiling RNA Sample Annotation for Human Diseases or URSA(HD), incorporates gene activity from publicly available records of almost 8,000 biopsies of healthy and diseased tissues taken from thousands of patients. In the future, researchers will be able to submit new samples to the database via the web for analysis of associations between diseases and tissue types.
"The real innovation is comparing all samples to every other sample," explains Chandra Theesfeld, one of the lead researchers.
Theesfeld likened the system to a human's ability to recognize nuanced differences between behaviors based on a wide variety of examples. Watching soccer players, for example, might reveal the characteristics of a kick, but watching soccer players and ballet dancers at the same time reveals more muscle detail in the context of each activity.
"Studying them together provides a way to distinguish their unique aspects. A viewpoint providing an unbiased way to learn new things about disease impossible to find with a 'one-disease-at-a-time' approach. A way to potentially identify new targets for therapy or discover new aspects of disease previously unappreciated."
Chandra L. Theesfeld, PhD, research scientist in the laboratory of Olga Troyanskaya, PhD (Simons Foundation, Princeton University), and team leader.
In making comparisons, the algorithm gives more weight to differences in gene activity that uniquely define distinct tissues and diseases. It de-emphasizes information about gene activity common to related diseases, many already well studied. "Our method is driven by disease information in a patient sample, so it's not biased toward the popular disease genes that always get studied," Theesfeld explains. "We can track patterns of changes in data without knowing exactly what each change means."
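The idea of weighting genes by how specifically they mark one disease can be illustrated with a toy calculation. The sketch below is not the URSA(HD) model; it swaps in a simple one-vs-rest standardized difference, and the gene names, expression values and the function name one_vs_rest_weights are all hypothetical.

```python
from statistics import mean, pstdev

def one_vs_rest_weights(profiles, labels, target):
    """Toy score: how strongly each gene separates `target` from all other samples.

    profiles: list of dicts mapping gene name -> expression value
    labels:   list of disease labels, one per profile
    Returns a dict mapping gene -> standardized mean difference.
    """
    weights = {}
    for gene in profiles[0]:
        inside = [p[gene] for p, lab in zip(profiles, labels) if lab == target]
        outside = [p[gene] for p, lab in zip(profiles, labels) if lab != target]
        # Overall spread of the gene; fall back to 1.0 if the gene never varies.
        spread = pstdev(inside + outside) or 1.0
        weights[gene] = (mean(inside) - mean(outside)) / spread
    return weights

# Hypothetical expression values, invented for illustration only.
profiles = [
    {"GENE_A": 9.0, "GENE_B": 5.0, "GENE_C": 2.0},
    {"GENE_A": 8.5, "GENE_B": 5.1, "GENE_C": 2.1},
    {"GENE_A": 2.0, "GENE_B": 5.0, "GENE_C": 2.0},
    {"GENE_A": 2.2, "GENE_B": 4.9, "GENE_C": 7.5},
    {"GENE_A": 1.8, "GENE_B": 5.2, "GENE_C": 7.9},
]
labels = ["neuroblastoma", "neuroblastoma", "glioma", "heart disease", "heart disease"]

weights = one_vs_rest_weights(profiles, labels, "neuroblastoma")
# Rank genes by the size of their weight, largest first.
ranked = sorted(weights, key=lambda g: abs(weights[g]), reverse=True)
print(ranked)
```

In this toy data, GENE_A is high only in the neuroblastoma samples and ranks first. GENE_C is low in both neuroblastoma and the related glioma sample, so it is less specific and ranks lower, while GENE_B barely changes anywhere and ranks last.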
Theesfeld noted that 90 percent of studies of genes look at just 10 percent of human genes. URSA(HD) looks at the entire human genome, creating a genome-wide signature for each disease. The new approach could be particularly powerful with rare diseases, allowing researchers to create a model from just a few tissue samples. In the case of the pediatric form of neuroblastoma, researchers found four genes that contribute to the disease. To confirm their findings, Theesfeld performed laboratory tests on human cells, manipulating the activity of these genes and observing the effects on cancer-related processes in the cells.
Rather than looking at DNA itself, URSA(HD) looks at the RNA products cells create as they transcribe DNA into molecules that go on to build and run cells and to transmit signals from cell to cell.
The system looks beyond mutations, focusing on downstream transcription products, which can become dysregulated even if the original gene is normal. Troyanskaya's lab has a long history of integrating massive collections of dissimilar datasets in order to make precise biological predictions and discoveries. Troyanskaya: "Interdisciplinary approaches that merge sophisticated data science with deep knowledge of biology are key to deciphering biomedical puzzles and realizing the promise of precision medicine."
• URSAHD integrates >8,000 clinical gene expression profiles across >300 diseases
• Identifies unique characteristics for each disease in a data-driven manner
• Enables data-driven targeted research even for rare and understudied diseases
• Tracks therapeutic drug response in expression profiles from disease samples
A key challenge for the diagnosis and treatment of complex human diseases is identifying their molecular basis. Here, we developed a unified computational framework, URSAHD (Unveiling RNA Sample Annotation for Human Diseases), that leverages machine learning and the hierarchy of anatomical relationships present among diseases to integrate thousands of clinical gene expression profiles and identify molecular characteristics specific to each of the hundreds of complex diseases. URSAHD can distinguish between closely related diseases more accurately than literature-validated genes or traditional differential-expression-based computational approaches and is applicable to any disease, including rare and understudied ones. We demonstrate the utility of URSAHD in classifying related nervous system cancers and experimentally verifying novel neuroblastoma-associated genes identified by URSAHD. We highlight the applications for potential targeted drug-repurposing and for quantitatively assessing the molecular response to clinical therapies. URSAHD is freely available for public use, including the use of underlying models, at ursahd.princeton.edu.
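One way a disease hierarchy can inform machine learning is in choosing which samples count as examples of a disease. The sketch below is an assumption for illustration, not the procedure described in the paper: the hierarchy, the sample labels and the function names are hypothetical. Samples labeled with the target disease or any of its subtypes are treated as positives, samples labeled only with a broader parent term are set aside as ambiguous, and everything else is a negative.

```python
def ancestors(term, parent_of):
    """Return every ancestor of `term` in a child -> parent mapping."""
    found = set()
    while term in parent_of:
        term = parent_of[term]
        found.add(term)
    return found

def split_examples(sample_labels, target, parent_of):
    """Split samples into positives, negatives and ambiguous (excluded) ones."""
    positives, negatives, excluded = [], [], []
    target_ancestors = ancestors(target, parent_of)
    for sample, label in sample_labels.items():
        if label == target or target in ancestors(label, parent_of):
            positives.append(sample)   # the target itself or one of its subtypes
        elif label in target_ancestors:
            excluded.append(sample)    # label too general to rule the target in or out
        else:
            negatives.append(sample)   # a different branch of the hierarchy
    return positives, negatives, excluded

# Hypothetical hierarchy (child -> parent) and sample annotations.
parent_of = {
    "neuroblastoma": "nervous system cancer",
    "glioma": "nervous system cancer",
    "nervous system cancer": "cancer",
    "cardiomyopathy": "heart disease",
}
sample_labels = {
    "s1": "neuroblastoma",
    "s2": "glioma",
    "s3": "nervous system cancer",
    "s4": "cardiomyopathy",
    "s5": "neuroblastoma",
}

pos, neg, skipped = split_examples(sample_labels, "neuroblastoma", parent_of)
print(pos, neg, skipped)
```

Under these assumptions, a closely related disease such as glioma stays in the negative set, which pushes a classifier trained on this split toward features that separate neuroblastoma from its nearest relatives.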
Young-suk Lee, Arjun Krishnan, Rose Oughtred, Kara Dolinski, Chandra L. Theesfeld and Olga G. Troyanskaya.
The authors declare no conflict of interest.
In addition to Lee, Theesfeld and Troyanskaya, who is a professor of computer science and the Lewis-Sigler Institute for Integrative Genomics and deputy director for genomics at the Flatiron Institute in New York, the researchers include Rose Oughtred, Jennifer Rust, Christie S. Chang, Joseph Ryu and Kara Dolinski of Princeton; Arjun Krishnan of Michigan State University; and Vessela N. Kristensen of the University of Oslo. Support for the project was provided in part by the National Institutes of Health.
Mar 4 2019
New research technique URSA(HD) identifies molecular basis of disease across entire human genome, comparing hundreds of diseases simultaneously. Image Credit: Troyanskaya Lab.