Package: dsb 2.0.1

Matthew Mulè

dsb: Normalize & Denoise Droplet Single Cell Protein Data (CITE-Seq)

This lightweight R package provides a method for normalizing and denoising protein expression data from droplet based single cell experiments. Raw protein Unique Molecular Index (UMI) counts from sequencing DNA-conjugated antibody derived tags (ADT) in droplets (e.g. 'CITE-seq') have substantial measurement noise. Our experiments and computational modeling revealed two major components of this noise: 1) protein-specific noise originating from ambient, unbound antibody encapsulated in droplets that can be accurately inferred via the expected protein counts detected in empty droplets, and 2) droplet/cell-specific noise revealed via the shared variance component associated with isotype antibody controls and background protein counts in each cell. This package normalizes and removes both of these sources of noise from raw protein data derived from methods such as 'CITE-seq', 'REAP-seq', 'ASAP-seq', 'TEA-seq', 'proteogenomic' data from the Mission Bio platform, etc. See the vignette for tutorials on how to integrate dsb with 'Seurat' and 'Bioconductor' and how to use dsb in 'Python'. Please see our paper Mulè M.P., Martins A.J., and Tsang J.S. Nature Communications 2022 <https://www.nature.com/articles/s41467-022-29356-8> for more details on the method.
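The first noise component described above (ambient antibody estimated from empty droplets) can be illustrated with a minimal base R sketch. This is not the package's internal code: the count matrices are simulated, the pseudocount of 10 is an assumption, and the sketch covers only the ambient correction, not the per-cell denoising step.

```r
# Simulated raw ADT counts (proteins in rows, droplets in columns).
# All values are made up for illustration.
set.seed(1)
n_prot <- 5
prot_names <- paste0("protein_", seq_len(n_prot))
empty <- matrix(rpois(n_prot * 500, lambda = 20), nrow = n_prot,
                dimnames = list(prot_names, NULL))
cells <- matrix(rpois(n_prot * 200, lambda = 60), nrow = n_prot,
                dimnames = list(prot_names, NULL))

pseudocount <- 10  # assumed pseudocount added before the log transform

# Log transform both matrices.
empty_log <- log(empty + pseudocount)
cells_log <- log(cells + pseudocount)

# Per-protein mean and standard deviation of the ambient signal in empty droplets.
mu_empty <- rowMeans(empty_log)
sd_empty <- apply(empty_log, 1, sd)

# Rescale each protein in cells relative to its empty-droplet background.
# A vector of length nrow() is recycled down the columns, so row i uses
# mu_empty[i] and sd_empty[i].
cells_ambient_corrected <- (cells_log - mu_empty) / sd_empty
```

After this step, each value reads as the number of standard deviations a protein sits above its expected ambient level in empty droplets.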

Authors: Matthew Mulè [aut, cre], Andrew Martins [aut], John Tsang [pdr]


# Install 'dsb' in R:
install.packages('dsb', repos = c('https://niaid.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/niaid/dsb/issues

Datasets:
  • cells_citeseq_mtx - Small example CITE-seq protein dataset for 87 surface proteins in 2872 cells
  • empty_drop_citeseq_mtx - Small example CITE-seq protein dataset for 87 surface proteins in 8005 empty droplets
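A minimal usage sketch with the two example datasets listed above, assuming the package is installed. Selecting isotype controls by matching "isotype" in the row names is an assumption about how the example panel is labeled; inspect rownames(cells_citeseq_mtx) and adjust for your own data.

```r
library(dsb)

# Example matrices shipped with the package (proteins in rows, droplets in columns).
dim(cells_citeseq_mtx)
dim(empty_drop_citeseq_mtx)

# Assumption: isotype control rows can be identified by name.
isotypes <- grep("isotype", rownames(cells_citeseq_mtx),
                 ignore.case = TRUE, value = TRUE)
stopifnot(length(isotypes) > 0)

# Normalize against empty droplets and remove cell-to-cell technical noise.
adt_norm <- DSBNormalizeProtein(
  cell_protein_matrix = cells_citeseq_mtx,
  empty_drop_matrix = empty_drop_citeseq_mtx,
  denoise.counts = TRUE,
  use.isotype.control = TRUE,
  isotype.control.name.vec = isotypes
)
```

The result is a matrix with the same proteins-by-cells layout as the input, which can then be passed to downstream tools.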

cite-seq, niaid-tsang-lab

8.09 score | 71 stars | 172 scripts | 514 downloads | 3 exports | 4 dependencies

Last updated from: aab21279fe. Checks: 9 OK. Indexed: yes.

Target                  Result  Time
linux-devel-x86_64      OK      136
source / vignettes      OK      214
linux-release-x86_64    OK      137
macos-release-arm64     OK      84
macos-oldrel-arm64      OK      78
windows-devel           OK      132
windows-release         OK      85
windows-oldrel          OK      75
wasm-release            OK      144

Exports: %>%, DSBNormalizeProtein, ModelNegativeADTnorm

Dependencies: limma, magrittr, mclust, statmod

Understanding how the dsb method works
Load example package data | log transform | dsb step 1 removal of ambient noise | dsb step II Part I fitting single cell models to extract background | dsb Part II step II | How can I interpret the dsb values?

Last update: 2025-11-02
Started: 2022-03-02

Normalizing ADTs for datasets without empty droplets with the dsb function ModelNegativeADTnorm

Last update: 2025-04-02
Started: 2022-03-11

Fast normalization for large datasets with or without empty drops

Last update: 2025-04-01
Started: 2025-04-01

End-to-end CITE-seq analysis workflow using dsb for ADT normalization and Seurat for multimodal clustering
Table of Contents | Background and motivation | Installation and quick overview | Download public 10X Genomics data | Step 1 A note on alignment of ADTs | Step 2 Load RNA and ADT data and define droplet quality control metadata | Step 3 Quality control cells and background droplets | Optional step; remove proteins without staining | Step 4 Normalize protein data with the DSBNormalizeProtein Function | Integrating dsb with Seurat | Clustering cells based on dsb normalized protein using Seurat | dsb derived cluster interpretation | Weighted Nearest Neighbor multimodal clustering using dsb normalized values with Seurat | Method 1 -- Seurat WNN default with PCA on dsb normalized protein | Method 2-- Seurat WNN with dsb normalized protein directly without PCA | Some recent publications using dsb | using other alignment algorithms | A note on Cell Ranger --expect-cells

Last update: 2024-06-15
Started: 2022-03-14

Additional Topics - quantile.clipping - scale.factor - Python and Bioc - multiplexing - multi batch - FAQ
Integrating dsb with Bioconductor | Using dsb in Python | Using dsb with data lacking isotype controls | Using dsb with sample multiplexing experiments | Advanced usage - return internal stats used by dsb | outlier clipping with the quantile.clipping argument | Using a different background scaling method | Frequently Asked Questions | check for outliers in dsb normalized values

Last update: 2023-03-10
Started: 2022-03-02

Readme and manuals

Help Manual

Help page | Topics
Small example CITE-seq protein dataset for 87 surface proteins in 2872 cells | cells_citeseq_mtx
DSBNormalizeProtein R function: Normalize single cell antibody derived tag (ADT) protein data. This function corrects for both protein-specific and cell-to-cell technical noise in ADT data. For datasets without access to empty drops, use dsb::ModelNegativeADTnorm. See <https://www.nature.com/articles/s41467-022-29356-8> for details of the algorithm. | DSBNormalizeProtein
Small example CITE-seq protein dataset for 87 surface proteins in 8005 empty droplets | empty_drop_citeseq_mtx
ModelNegativeADTnorm R function: Normalize single cell antibody derived tag (ADT) protein data. This function defines the background level for each protein by fitting a 2 component Gaussian mixture after log transformation. Empty droplet ADT counts are not supplied. The fitted background mean of each protein across all cells is subtracted from the log transformed counts. Note that this is distinct from and unrelated to the 2 component mixture used in the second step of `DSBNormalizeProtein`, which is fitted to all proteins of each cell. After this background correction step, `ModelNegativeADTnorm` models and removes technical cell-to-cell variation using the same step II procedure as the `DSBNormalizeProtein` function, with identical function arguments. This is an experimental function that performs well in testing. It is motivated by our observation in Supplementary Fig 1 of the dsb paper that the fitted background mean was concordant with the mean of ambient ADTs in both empty droplets and unstained control cells. We recommend using `ModelNegativeADTnorm` if empty droplets are not available. See <https://www.nature.com/articles/s41467-022-29356-8> for details of the algorithm. | ModelNegativeADTnorm
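The per-protein background estimate described for ModelNegativeADTnorm can be sketched with mclust, one of the package dependencies. This is an illustration under stated assumptions, not the package's internal code: the data are simulated, the helper name background_mean is hypothetical, and the lower of the two fitted component means is taken as the background.

```r
library(mclust)  # listed among the dsb dependencies

set.seed(42)
# Simulated log-transformed ADT values for one protein across 1000 cells:
# a background (negative) population and a smaller positive population.
x <- c(rnorm(700, mean = 1, sd = 0.3), rnorm(300, mean = 4, sd = 0.5))

# Hypothetical helper: fit a 2-component Gaussian mixture and return the
# lower component mean as the background level.
background_mean <- function(v) {
  fit <- Mclust(v, G = 2, verbose = FALSE)
  min(fit$parameters$mean)
}

bg <- background_mean(x)

# Subtract the fitted background mean from the log-transformed values.
x_corrected <- x - bg
```

For a full matrix, the same helper could be applied to each protein row with apply(), giving one background mean per protein.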