A powerful new tool for advancing genomics and disease research

Intrinsic cleavage biases affect single-cell ATAC-seq data analysis. Visualization of the effect of intrinsic cleavage bias in different cell clusters derived from scATAC-seq data for different biological samples and experimental platforms: human hematopoietic cells (a–c), mixed human cell lines (d–f), the primitive gut tube of the mouse (g–i), and 10x Single-Cell Multiome data for embryonic mouse brain (j–l), human peripheral blood mononuclear cells (PBMCs) (m–o), and human lymph node (p–r). a, d, g, j, m, p: UMAP visualization in which cells are colored by cell type/labels/clusters. b, e, h, k, n, q: The same UMAP visualization, but with cells colored by cell bias score (CBS). c, f, i, l, o, r: CBS distribution of cells from different cell types/batches/clusters. Boxes are colored by cell cluster using the same color palette as the first column. The median line, box limits, top line, and bottom line of the box plots represent the median, 25th to 75th percentile range, 25th percentile – 1.5 × interquartile range (IQR), and 75th percentile + 1.5 × IQR, respectively. Credit: Nature Communications (2022). DOI: 10.1038/s41467-022-33194-z

UVA Health researchers have developed an important new tool to help scientists sort the signal from the noise when studying the genetic causes of cancer and other diseases. In addition to advancing research and potentially speeding up new treatments, the new tool could help improve cancer diagnosis by making it easier for doctors to detect cancer cells.


Developed by UVA’s Chongzhi Zang, Ph.D., and his team and collaborators, the new tool is a mathematical model that will help ensure the integrity of “big data” about the building blocks of our chromosomes, the genetic material called chromatin. Chromatin, a combination of DNA and protein, plays an important role in controlling the activity of our genes. When chromatin goes awry, it can turn a healthy cell cancerous or contribute to other diseases.

Scientists can now probe chromatin inside individual cells using an advanced technology called “single-cell ATAC-seq,” but this generates a huge amount of data, including a lot of noise and bias. Zang’s new tool cuts through this, saving scientists from false leads and wasted effort.

Even in the best of times, large-scale single-cell genomics research is like “hunting for a needle in a haystack,” Zang says. But his new tool will make the hunt a lot easier by removing much of the bad hay.

“Using the traditional way of analyzing data, you can see some patterns that look like real signals of a certain chromatin state but are actually false, caused by the bias of the experimental technology itself. Such false signals can confuse scientists,” said Zang, a computational biologist with the UVA Center for Public Health Genomics and the UVA Health Cancer Center. “We developed a model to better capture and filter out such spurious signals, so that the real needle we’re looking for can more easily stand out from the haystack.”

About the genomics tool

Zang’s new tool adapts a model from number theory and cryptography called “simplex encoding.” He and his colleagues used it to encode DNA sequences into mathematical forms and ultimately convert the complex genome sequence into a much simpler mathematical form. They can then compare the different forms to detect bias and noise in sequence data that cannot easily be found using conventional approaches.
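
To give a feel for what simplex encoding does, here is a minimal sketch in Python. It is an illustration only, not the SELMA implementation: the vertex assignment below (each base mapped to one corner of a regular tetrahedron) is a commonly used convention that we assume here, and the function names `simplex_encode` and `squared_distance` are hypothetical.

```python
from itertools import combinations

# Assumed convention: each base is one corner of a regular tetrahedron,
# so no base is numerically "closer" to another than any other pair.
SIMPLEX = {
    "A": (1, -1, -1),
    "C": (-1, 1, -1),
    "G": (-1, -1, 1),
    "T": (1, 1, 1),
}


def simplex_encode(seq):
    """Return a flat list of 3 * len(seq) numbers for a DNA string."""
    vec = []
    for base in seq.upper():
        if base not in SIMPLEX:
            raise ValueError(f"unsupported base: {base!r}")
        vec.extend(SIMPLEX[base])
    return vec


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


if __name__ == "__main__":
    # A 4-base sequence becomes a 12-number vector.
    print(simplex_encode("ACGT"))
    # Every pair of distinct bases is the same distance apart.
    for x, y in combinations("ACGT", 2):
        print(x, y, squared_distance(SIMPLEX[x], SIMPLEX[y]))
```

Because all four bases sit at equal distances from one another, the encoding treats them symmetrically, which is one reason this kind of representation is convenient for comparing sequences numerically.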

“The complexity of DNA sequences increases exponentially as they get longer. They are difficult to model because a typical data set contains millions of sequences from thousands of cells,” said Shengen Sean Hu, Ph.D., a researcher in Zang’s lab and lead author of the paper. “But the simplex encoding model can provide an accurate estimate of sequence biases because of its superior mathematical properties.”
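
The exponential growth Hu describes is easy to see with a little arithmetic. The sketch below compares the number of distinct DNA sequences of length k (4 to the power k) with the number of values in a first-order simplex encoding of the same sequence (3 per base). This is a counting illustration under our own assumptions: the function names are hypothetical, and the published model may use additional terms beyond the first-order ones counted here.

```python
def kmer_count(k):
    """Number of distinct DNA sequences of length k (4 choices per base)."""
    return 4 ** k


def first_order_feature_count(k):
    """Values in a first-order simplex encoding: 3 numbers per base (assumed)."""
    return 3 * k


if __name__ == "__main__":
    for k in (4, 6, 8, 10):
        print(k, kmer_count(k), first_order_feature_count(k))
```

At k = 10 there are already more than a million possible sequences, while the first-order encoding needs only 30 numbers, which hints at why a compact encoding makes the modeling problem more tractable.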

Tests of the tool have shown that it improves the analysis of complex single-cell data used to characterize different cell types. This is important for both basic biological research and disease diagnosis, where doctors need to detect small numbers of diseased cells in much larger samples, ranging from tens of thousands to millions of cells.

“Anomalies were not easy to find because they were confounded with real signals and hidden in big data. This might not be a big problem if people selected only the strongest signals from a large number of cells,” said Zang, who recently co-led several other single-cell genomics studies of coronary heart disease and gut development.

“But if you look at the single-cell data, there’s no more low-hanging fruit. Signals are always weak at the individual cell level, and the effect of noise and bias can be disastrous. Bias correction is often overlooked but can be vital for single-cell data analysis.”

To make their new tool widely available, the researchers created free, open-source software and posted it online. The software can be found on GitHub.

“We hope that this tool can benefit the biomedical research community in the study of chromatin biology and genomics and ultimately help disease research,” Zang said. “It’s always exciting to see how our colleagues are using the tools we’ve developed to make important scientific discoveries in their own research.”

The researchers published their results in Nature Communications.

Additional information:
Shengen Sean Hu et al, Intrinsic bias estimation for improved analysis of bulk and single-cell chromatin accessibility profiles using SELMA, Nature Communications (2022). DOI: 10.1038/s41467-022-33194-z

Software: github.com/zang-lab/SELMA and at doi.org/10.5281/zenodo.7048767

Citation: A Powerful New Tool to Advance Genomics and Disease Research (November 22, 2022), Retrieved November 22, 2022, from https://phys.org/news/2022-11-powerful-tool-advance-genomics-disease.html
