Scientists use math to increase the accuracy of data analysis results for biomedical research

Kyoto - Since scientists first mapped the complete human genome, attention has turned to the question of how cells use this master copy of genetic instructions. It is known that when genes are switched on, segments of the DNA sequence in the cell nucleus are copied into shorter chain-like molecules, RNAs, which carry the instructions for making the molecules essential for survival and for cell-specific functions.

Understanding the patterns of RNAs in a cell can show which genes are active and allow researchers to infer what the cell is doing. RNA sequencing, the technology for measuring RNA with massively parallel DNA sequencers, has become a standard technique over the last decade. More recently, rapid technological advances have made it possible to sequence RNA at the single-cell level from thousands of cells in parallel, accelerating progress in biomedical science. But quantifying RNAs from such small amounts of material poses great technical challenges. Even with state-of-the-art equipment, the data produced by single-cell RNA sequencing contain significant detection errors, including the so-called “dropout effect”, in which RNA that is present in a cell goes undetected. Moreover, even small errors in the measurements for a large number of genes can quickly add up, so that any useful information is lost in the noise.
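To make the dropout effect concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the study: it assumes that each RNA molecule in a cell is captured independently with a fixed probability, and the function names, the capture rate of 10% and the list of true counts are all hypothetical.

```python
import random

def simulate_capture(true_counts, capture_rate, rng):
    """Keep each RNA molecule independently with probability capture_rate.

    Assumption for illustration: detection is simple binomial sampling.
    """
    return [
        sum(1 for _ in range(n) if rng.random() < capture_rate)
        for n in true_counts
    ]

def dropout_fraction(true_counts, observed_counts):
    """Fraction of genes that are truly expressed but observed as zero."""
    expressed = [(t, o) for t, o in zip(true_counts, observed_counts) if t > 0]
    missed = sum(1 for _, o in expressed if o == 0)
    return missed / len(expressed)

rng = random.Random(0)
# Hypothetical true molecule counts for 2,000 genes in one cell.
true_counts = [rng.choice([1, 2, 5, 20, 100]) for _ in range(2000)]
observed = simulate_capture(true_counts, capture_rate=0.1, rng=rng)
print(f"dropout fraction: {dropout_fraction(true_counts, observed):.2f}")
```

Under these assumptions, genes with only a few molecules are the ones most often recorded as zero, which is why lowly expressed genes are the hardest to measure.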

Now, a team from the Kyoto University Institute for the Advanced Study of Human Biology (WPI-ASHBi) has developed a new mathematical method that can eliminate noise and thus enable the extraction of clear signals from single-cell RNA sequencing data. The new method successfully reduces random sampling noise in the data to enable an accurate and comprehensive understanding of a cell’s activity. The research was recently published in the journal Life Science Alliance.

Lead author of the paper, Yusuke Imoto of ASHBi, explains: “Each gene represents a different dimension in RNA sequencing data, which means that tens of thousands of dimensions must be collected from multiple cells and analyzed. Even the slightest noise in one dimension can have a major impact on downstream data analyses, so that potentially important signals are lost. That’s why we call it the ‘curse of dimensionality’.”
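The effect Imoto describes can be illustrated with a small simulation. The following Python sketch is not the team's analysis: it assumes two hypothetical cell types that differ in only 10 marker genes, adds independent Gaussian noise to every gene, and compares the distance between two cells of the same type with the distance between cells of different types. All names and parameter values are assumptions chosen for illustration.

```python
import math
import random

def noisy_profile(true_profile, noise_sd, rng):
    """Add independent Gaussian noise to every gene (an assumed noise model)."""
    return [x + rng.gauss(0.0, noise_sd) for x in true_profile]

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def same_vs_different(n_genes, n_marker_genes=10, shift=1.0, noise_sd=0.3, seed=0):
    """Return (same-type distance, different-type distance) for noisy cells."""
    rng = random.Random(seed)
    type_a = [0.0] * n_genes
    # Type B differs from type A only in the first n_marker_genes genes.
    type_b = [shift if g < n_marker_genes else 0.0 for g in range(n_genes)]
    cell_a1 = noisy_profile(type_a, noise_sd, rng)
    cell_a2 = noisy_profile(type_a, noise_sd, rng)
    cell_b = noisy_profile(type_b, noise_sd, rng)
    return euclidean(cell_a1, cell_a2), euclidean(cell_a1, cell_b)

for n_genes in (10, 1000, 20000):
    same, different = same_vs_different(n_genes)
    print(n_genes, round(same, 2), round(different, 2), round(different / same, 2))
```

In this toy setting the real difference between the two cell types stays fixed, while the noise contribution grows with the number of genes, so the ratio in the last column drifts toward 1 and the two cell types become hard to tell apart by distance alone.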

To break the curse of dimensionality, the Kyoto team developed a new noise reduction method, RECODE (which stands for “resolving the curse of dimensionality”), to remove random sampling noise from single-cell RNA sequencing data. RECODE applies high-dimensional statistical theories to obtain accurate results, even for genes expressed at very low levels.

First, the team tested their method on data from a widely studied cell population, human peripheral blood. They confirmed that RECODE successfully resolves the curse of dimensionality, revealing expression patterns for individual genes that are close to their expected values.

Then, when compared with other state-of-the-art analysis methods, RECODE outperformed the competition by giving much more faithful representations of gene activation. Additionally, RECODE is easier to use than other methods, because it does not rely on parameters or on machine learning to make the calculations work.

Finally, the team tested RECODE on a complex dataset from mouse embryo cells containing many different cell types with unique gene expression patterns. While other methods muddied the results, RECODE clearly resolved gene expression levels, even for rare cell types.

Imoto concludes: “The analysis of single-cell RNA sequencing data remains technically challenging and is still a developing approach, but our RECODE algorithm is a step towards being able to reveal the true behaviors of single cells. With our contribution, single-cell RNA sequencing data analysis could become a powerful research tool with massive implications in many biological fields.”

Co-first author Tomonori Nakamura, a biologist at ASHBi and Kyoto University’s Hakubi Center for Advanced Research, adds: “By unlocking the true power of single-cell RNA sequencing, RECODE will enable researchers to discover unidentified rare cell types, leading to the development and establishment of new fields of basic scientific research as well as research for clinical purposes and drug discovery.”

The RECODE programs (Python and R code, and desktop software) are available on GitHub.
