Michigan State University researchers have developed “DANCE”, a Python library that supports deep learning models for large-scale analysis of single-cell gene expression

Source: https://www.biorxiv.org/content/biorxiv/early/2022/10/24/2022.10.19.512741.full.pdf

From single-modal profiling (RNA, protein, and open chromatin) to multimodal profiling and spatial transcriptomics, single-cell analysis technology has advanced rapidly in recent years. This rapid expansion has prompted a proliferation of computational approaches, especially those based on machine learning.

The researchers state that the variety and complexity of current approaches make it difficult to replicate the results reported in the original papers. Hyperparameter tuning, incompatibilities between programming languages, and the lack of a publicly available codebase are all significant hurdles. Because most existing work reports performance on only a limited number of datasets and compares against too few methods, a systematic benchmarking procedure is needed to fully evaluate the methods.

In a recent study, researchers from Michigan State University, University of Washington, Zhejiang University of Technology, Stanford University, and Johnson & Johnson present DANCE, a deep learning library and benchmark designed to accelerate progress in single-cell analysis.

DANCE offers a comprehensive set of tools for analyzing large-scale single-cell data, allowing developers to build their deep learning models with greater ease and efficiency. It can also be used as a benchmark to compare the performance of various computational models for single-cell analysis. DANCE currently supports 3 modules, 8 tasks, 32 models, and 21 datasets.

Currently, DANCE offers:

  1. Single modality analysis
  2. Multimodality analysis
  3. Spatial transcriptomic analysis

Deep learning frameworks such as autoencoders and graph neural networks (GNNs) are widely used and supported across all of these modules. According to the paper, DANCE is the first comprehensive benchmark platform for single-cell analysis.
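To give a feel for why graph-based models suit single-cell data, the sketch below builds a k-nearest-neighbour graph over cells and applies one round of neighbourhood averaging, which is the aggregation idea that GNNs build on. It is an illustration only, not DANCE code: the function names `knn_graph` and `mean_aggregate` and the toy expression values are assumptions, and no learned weights are involved.

```python
import math


def knn_graph(cells, k):
    """Return, for each cell, the indices of its k nearest cells (Euclidean)."""
    neighbours = []
    for i, a in enumerate(cells):
        # Distance from cell i to every other cell, closest first.
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(cells) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def mean_aggregate(cells, neighbours):
    """One aggregation step: average each cell's profile with its neighbours'."""
    smoothed = []
    for i, a in enumerate(cells):
        group = [a] + [cells[j] for j in neighbours[i]]
        smoothed.append([sum(col) / len(group) for col in zip(*group)])
    return smoothed


# Toy expression matrix: 4 cells x 2 genes (made-up values).
cells = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
neighbours = knn_graph(cells, k=1)
smoothed = mean_aggregate(cells, neighbours)
```

In a real GNN, the averaging step would be followed by a learned transformation and repeated over several layers.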

The work introduces several new components. The researchers began by compiling task-specific standard benchmark datasets and making them readily available by changing a single parameter. Classical and deep learning baselines are implemented for each task. All collected benchmark datasets are used to fine-tune the baselines until they achieve results equal to or better than those of the original studies. End users only have to run a single command line, in which all the hyperparameters are encapsulated in advance, to obtain the reported performance of the fine-tuned models.
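One way such a single-command workflow could be organized is to ship the tuned hyperparameters as command-line defaults, so that running the script with no flags reproduces the stored configuration and switching datasets takes one flag. The sketch below uses Python's standard `argparse` module; the flag names, dataset names, and values are hypothetical placeholders and are not taken from DANCE.

```python
import argparse

# Hypothetical tuned defaults; names and values are placeholders, not DANCE's.
TUNED_DEFAULTS = {
    "dataset": "example_dataset",
    "lr": 0.001,
    "hidden_dim": 64,
    "epochs": 100,
}


def build_parser():
    """Create a parser whose defaults are the stored, pre-tuned hyperparameters."""
    parser = argparse.ArgumentParser(description="Run a baseline with stored hyperparameters.")
    parser.add_argument("--dataset", default=TUNED_DEFAULTS["dataset"], help="benchmark dataset name")
    parser.add_argument("--lr", type=float, default=TUNED_DEFAULTS["lr"], help="learning rate")
    parser.add_argument("--hidden_dim", type=int, default=TUNED_DEFAULTS["hidden_dim"], help="hidden layer size")
    parser.add_argument("--epochs", type=int, default=TUNED_DEFAULTS["epochs"], help="training epochs")
    return parser


def parse_config(argv=None):
    """Parse command-line arguments into a plain dict of settings."""
    return vars(build_parser().parse_args(argv))


# No flags: the stored configuration is used as-is.
default_config = parse_config([])
# Switching benchmark datasets is a single-parameter change.
other_config = parse_config(["--dataset", "another_dataset"])
```

Keeping the tuned values in one place means a user can still override any of them explicitly without editing the code.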

The team used PyTorch Geometric (PyG) as the backbone framework. They also standardize their baselines by converting them to a fit-predict-score interface. For each task, every implemented algorithm is fine-tuned on all the collected standard benchmarks via grid search to obtain the best model. The associated hyperparameters are stored in a single command line so that users can reproduce the results.
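The fit-predict-score convention and the grid search over hyperparameters can be sketched in a few lines of plain Python. The example below is an illustration under stated assumptions rather than DANCE's actual interface: `KNNClassifier` is a toy stand-in for a baseline, `grid_search` is a hypothetical helper, and the data are made up, with one deliberately mislabeled training point so that the choice of `k` matters.

```python
from itertools import product


class KNNClassifier:
    """Toy k-nearest-neighbour classifier with a fit-predict-score interface."""

    def __init__(self, k=1):
        self.k = k

    def fit(self, X, y):
        # Memorize the training data; return self so calls can be chained.
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            # Rank training points by squared Euclidean distance to x.
            ranked = sorted(
                range(len(self.X_)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(x, self.X_[i])),
            )
            votes = [self.y_[i] for i in ranked[: self.k]]
            preds.append(max(set(votes), key=votes.count))
        return preds

    def score(self, X, y):
        # Accuracy: fraction of predictions that match the true labels.
        preds = self.predict(X)
        return sum(p == t for p, t in zip(preds, y)) / len(y)


def grid_search(model_cls, grid, train, valid):
    """Try every hyperparameter combination; keep the best validation score."""
    best_params, best_score = None, float("-inf")
    names = sorted(grid)
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = model_cls(**params).fit(*train).score(*valid)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Made-up 1-D data; the point at 0.6 is deliberately mislabeled as "B".
X_train = [[0.0], [0.6], [1.0], [10.0], [11.0], [12.0]]
y_train = ["A", "B", "A", "B", "B", "B"]
X_valid = [[0.5], [11.5]]
y_valid = ["A", "B"]

best_params, best_score = grid_search(
    KNNClassifier, {"k": [1, 3]}, (X_train, y_train), (X_valid, y_valid)
)
```

Because every model exposes the same three methods, the search loop never needs to know which algorithm it is tuning, which is what makes a shared benchmark across many methods practical.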

The team believes the DANCE platform will benefit the entire single-cell community. End users do not have to spend much time and effort implementing and fine-tuning models. Instead, all they need to do to reproduce the reported results is run a single command line. The researchers also provide graphics processing unit (GPU) support for fast training of deep learning-based models.

At present, DANCE lacks a unified set of tools for preprocessing and graph construction. The team plans to work on this in the future. They also said that DANCE will be made available as a SaaS service so that users do not have to rely solely on their own device’s processing power and storage capacity.

This article is a research summary written by Marktechpost staff based on the research paper 'DANCE: A Deep Learning Library and Benchmark for Single-Cell Analysis'. All credit for this research goes to the researchers on this project.

Tanushree Shenwai is an intern consultant at MarktechPost. She is currently pursuing her B.Tech from Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast and has a keen interest in the scope of application of artificial intelligence in various fields. She is passionate about exploring new technological advancements and applying them to real life.