Package 'BisqueRNA' reference manual

Title:	Decomposition of Bulk Expression with Single-Cell Sequencing
Description:	Provides tools to accurately estimate cell type abundances from heterogeneous bulk expression. A reference-based method utilizes single-cell information to generate a signature matrix and transformation of bulk expression for accurate regression based estimates. A marker-based method utilizes known cell-specific marker genes to measure relative abundances across samples. For more details, see Jew and Alvarez et al (2019) <doi:10.1101/669911>.
Authors:	Brandon Jew [aut, cre], Marcus Alvarez [aut]
Maintainer:	Brandon Jew <[email protected]>
License:	GPL-3
Version:	1.0.5
Built:	2025-03-03 04:19:45 UTC
Source:	https://github.com/cozygene/bisque

Calculate cell proportions based on single-cell data

Description

Returns proportion of each cell type out of total cells for each individual in the single-cell Expression Set

Usage

CalculateSCCellProportions(sc.eset, subject.names, cell.types)
CalculateSCCellProportions(sc.eset, subject.names, cell.types)

Arguments

`sc.eset`	Expression Set with single-cell data
`subject.names`	A character string. Name of phenoData attribute in sc.eset that indicates individual ID.
`cell.types`	A character string. Name of phenoData attribute in sc.eset that indicates cell type

Value

sc.props Matrix. Cell proportions with number of cell types rows by number of individuals columns

Correlate columns of data frame

Description

This function runs correlation between markers of a data frame or matrix, returning the values of the lower/upper triangular of the correlation matrix in a vector.

Usage

CorTri(x, method = "pearson")
CorTri(x, method = "pearson")

Arguments

`x`	Data frame or matrix. Column vectors are correlated
`method`	Character string. Name of method passed to cor. Pearson by default.

Value

cors Numeric vector. Correlation coefficients of pairs

Convert counts data in Expression Set to counts per million (CPM)

Description

Convert counts data in Expression Set to counts per million (CPM)

Usage

CountsToCPM(eset)
CountsToCPM(eset)

Arguments

eset

Expression Set containing counts assay data.

Value

eset Expression Set containing CPM assay data

Estimate cell type proportions using first PC of expression matrix

Description

Estimate cell type proportions using first PC of expression matrix

Usage

EstimatePCACellTypeProportions(x, weighted = FALSE, w = NULL)
EstimatePCACellTypeProportions(x, weighted = FALSE, w = NULL)

Arguments

`x`	A sample by gene bulk expression matrix. Genes should be marker genes
`weighted`	Boolean. If weighted=TRUE, multiply scaled gene expression by gene weights
`w`	Numeric vector. Weights of genes

Value

ret List. Attribute pcs contains matrix of PCs, where PC1 should be used as estimates for cell type abundances Attribute sdev contains eigenvalues of eigendecomposition of var-covar matrix. The 1st eigenvalue should explain most of the variance. Attribute genes contains names of genes.

Remove genes in Expression Set with zero expression in all samples

Description

Remove genes in Expression Set with zero expression in all samples

Usage

FilterUnexpressedGenes(eset, verbose = TRUE)
FilterUnexpressedGenes(eset, verbose = TRUE)

Arguments

`eset`	Expression Set
`verbose`	Boolean. Print logging info

Value

eset Expression Set with zero expression genes removed

Remove genes in Expression Set with zero variance across samples

Description

Remove genes in Expression Set with zero variance across samples

Usage

FilterZeroVarianceGenes(eset, verbose = TRUE)
FilterZeroVarianceGenes(eset, verbose = TRUE)

Arguments

`eset`	Expression Set
`verbose`	Boolean. Print logging info

Value

eset Expression Set with zero variance genes removed

Generate reference profile for cell types identified in single-cell data

Description

Averages expression within each cell type across all samples to use as reference profile.

Usage

GenerateSCReference(sc.eset, cell.types)
GenerateSCReference(sc.eset, cell.types)

Arguments

`sc.eset`	Expression Set with single-cell data
`cell.types`	A character string. Name of phenoData attribute in sc.eset that indicates cell type

Value

sc.ref Matrix. Reference profile with number of gene rows by number of cell types columns.

Return cell type proportions from bulk

Description

Calculate cell type proportions from a data frame containing bulk expression values. Uses PCA (weighted or regular) to estimate relative proportions within each cell type.

Usage

GetCTP(
  bulk,
  cell_types,
  markers,
  ct_col,
  gene_col,
  min_gene,
  max_gene,
  weighted,
  w_col,
  verbose
)
GetCTP(
  bulk,
  cell_types,
  markers,
  ct_col,
  gene_col,
  min_gene,
  max_gene,
  weighted,
  w_col,
  verbose
)

Arguments

`bulk`	Expression Set containing bulk data
`cell_types`	Character vector. Names of cell types.
`markers`	Data frame with columns specifying cluster and gene, and optionally a column for weights, typically the fold-change of the gene. Important that the genes for each cell type are row-sorted by signficance.
`ct_col`	Character string. Column name specifying cluster/cell type corresponding to each marker gene in markers.
`gene_col`	Character string. Column name specifying gene names in markers.
`min_gene`	Numeric. Min number of genes to use for each cell type.
`max_gene`	Numeric. Max number of genes to use for each cell type.
`weighted`	Boolean. Whether to use weights for gene prioritization
`w_col`	Character string. Column name for weights, such as "avg_logFC", in markers
`verbose`	Boolean. Whether to print log info during decomposition. Errors will be printed regardless.

Value

A List. Slot cors contains list of vectors with correlation coefficients. Slot ctps contains list of CTP objects returned by GetCTP

Get number of genes to use with no weighted information

Description

Get number of genes to use with no weighted information

Usage

GetNumGenes(x, min.gene = 25, max.gene = 200)
GetNumGenes(x, min.gene = 25, max.gene = 200)

Arguments

`x`	Numeric Matrix. A sample by gene expression matrix containing the marker genes.
`min.gene`	Numeric. Minimum number of genes to consider as markers.
`max.gene`	Numeric. Maximum number of genes to consider as markers.

Value

best.n Numeric. Number of genes to use

Get number of genes to use with weighted PCA

Description

Get number of genes to use with weighted PCA

Usage

GetNumGenesWeighted(x, w, min.gene = 25, max.gene = 200)
GetNumGenesWeighted(x, w, min.gene = 25, max.gene = 200)

Arguments

`x`	Numeric Matrix. A sample by gene expression matrix containing the marker genes.
`w`	Numeric Vector. The weights of the genes that correspond to the columns of x.
`min.gene`	Numeric. Minimum number of genes to consider as markers.
`max.gene`	Numeric. Maximum number of genes to consider as markers.

Value

best.n Numeric. Number of genes to use

Find overlapping genes in single-cell data, bulk data, and marker genes

Description

Find overlapping genes in single-cell data, bulk data, and marker genes

Usage

GetOverlappingGenes(sc.eset, bulk.eset, markers, verbose)
GetOverlappingGenes(sc.eset, bulk.eset, markers, verbose)

Arguments

`sc.eset`	Expression Set with single-cell data
`bulk.eset`	Expression Set with bulk data
`markers`	Character vector. List of relevant marker genes
`verbose`	Boolean. Print logging info

Value

overlapping.genes Character vector. List of genes found in markers and both datasets.

Find overlapping samples in single-cell and bulk data

Description

Find overlapping samples in single-cell and bulk data

Usage

GetOverlappingSamples(sc.eset, bulk.eset, subject.names, verbose)
GetOverlappingSamples(sc.eset, bulk.eset, subject.names, verbose)

Arguments

`sc.eset`	Expression Set with single-cell data
`bulk.eset`	Expression Set with bulk data
`subject.names`	A character string. Name of phenoData attribute in sc.eset that indicates individual ID (that would be found in bulk.eset if overlapping)
`verbose`	Boolean. Print logging info

Value

samples A list with attributes overlapping and remaining. Each attribute refers to a character vector that lists the samples found in both datasets and samples found only in bulk, respectively

Get unique markers present in only 1 cell type

Description

Given a data frame of marker genes for cell types, returns a new data frame with non-unique markers removed.

Usage

GetUniqueMarkers(x, gene_col = "gene")
GetUniqueMarkers(x, gene_col = "gene")

Arguments

`x`	Data frame. Contains column with marker gene names
`gene_col`	Character string. Name of the column that contains the marker genes

Value

x Data frame. Markers with non-unique markers removed

Performs marker-based decomposition of bulk expression using marker genes

Description

Estimates relative abundances of cell types from PCA-based decomposition. Uses a list of marker genes to subset the expression data, and returns the first PC of each sub-matrix as the cell type fraction estimates. Optionally, weights for each marker gene can be used to prioritize genes that are highly expressed in the given cell type.

Usage

MarkerBasedDecomposition(
  bulk.eset,
  markers,
  ct_col = "cluster",
  gene_col = "gene",
  min_gene = 5,
  max_gene = 200,
  weighted = FALSE,
  w_col = "avg_logFC",
  unique_markers = TRUE,
  verbose = TRUE
)
MarkerBasedDecomposition(
  bulk.eset,
  markers,
  ct_col = "cluster",
  gene_col = "gene",
  min_gene = 5,
  max_gene = 200,
  weighted = FALSE,
  w_col = "avg_logFC",
  unique_markers = TRUE,
  verbose = TRUE
)

Arguments

`bulk.eset`	Expression Set. Normalized bulk expression data.
`markers`	Data frame with columns specifying cluster and gene, and optionally a column for weights, typically the fold-change of the gene. Important that the genes for each cell type are row-sorted by signficance.
`ct_col`	Character string. Column name specifying cluster/cell type corresponding to each marker gene in markers.
`gene_col`	Character string. Column name specifying gene names in markers.
`min_gene`	Numeric. Min number of genes to use for each cell type.
`max_gene`	Numeric. Max number of genes to use for each cell type.
`weighted`	Boolean. Whether to use weights for gene prioritization
`w_col`	Character string. Column name for weights, such as "avg_logFC", in markers
`unique_markers`	Boolean. If TRUE, subset markers to include only genes that are markers for only one cell type
`verbose`	Boolean. Whether to print log info during decomposition. Errors will be printed regardless.

Details

Note that this method expects the input bulk data to be normalized, unlike the reference-based method.

Value

A List. Slot bulk.props contains estimated relative cell type abundances. Slot var.explained contains variance explained by first 20 PCs for cell type marker genes. Slot genes.used contains vector of genes used for decomposition.

Examples

library(Biobase)
sim.data <- SimulateData(n.ind=10, n.genes=100, n.cells=100,
                         cell.types=c("Neurons", "Astrocytes", "Microglia"),
                         avg.props=c(.5, .3, .2))
res <- MarkerBasedDecomposition(sim.data$bulk.eset, sim.data$markers, weighted=FALSE)
estimated.cell.proportions <- res$bulk.props

library(Biobase)
sim.data <- SimulateData(n.ind=10, n.genes=100, n.cells=100,
                         cell.types=c("Neurons", "Astrocytes", "Microglia"),
                         avg.props=c(.5, .3, .2))
res <- MarkerBasedDecomposition(sim.data$bulk.eset, sim.data$markers, weighted=FALSE)
estimated.cell.proportions <- res$bulk.props

Performs reference-based decomposition of bulk expression using single-cell data

Description

Generates a reference profile based on single-cell data. Learns a transformation of bulk expression based on observed single-cell proportions and performs NNLS regression on these transformed values to estimate cell proportions.

Usage

ReferenceBasedDecomposition(
  bulk.eset,
  sc.eset,
  markers = NULL,
  cell.types = "cellType",
  subject.names = "SubjectName",
  use.overlap = TRUE,
  verbose = TRUE,
  old.cpm = TRUE
)
ReferenceBasedDecomposition(
  bulk.eset,
  sc.eset,
  markers = NULL,
  cell.types = "cellType",
  subject.names = "SubjectName",
  use.overlap = TRUE,
  verbose = TRUE,
  old.cpm = TRUE
)

Arguments

`bulk.eset`	Expression Set containin bulk data. No PhenoData required but if overlapping option used, IDs returned by sampleNames(bulk.eset) should match those found in sc.eset phenoData individual labels.
`sc.eset`	Expression Set containing single-cell data. PhenoData of this Expression Set should contain cell type and individual labels for each cell. Names of these fields specified by arguments below.
`markers`	Structure, such as character vector, containing marker genes to be used in decomposition. 'base::unique(base::unlist(markers))' should return a simple vector containing each gene name. If no argument or NULL provided, the method will use all available genes for decomposition.
`cell.types`	Character string. Name of phenoData attribute in sc.eset indicating cell type label for each cell
`subject.names`	Character string. Name of phenoData attribute in sc.eset indicating individual label for each cell
`use.overlap`	Boolean. Whether to use and expect overlapping samples in decomposition.
`verbose`	Boolean. Whether to print log info during decomposition. Errors will be printed regardless.
`old.cpm`	Prior to version 1.0.4 (updated in July 2020), the package converted counts to CPM after subsetting the marker genes. Github user randel pointed out that the order of these operations should be switched. Thanks randel! This option is provided for replication of older BisqueRNA but should be enabled, especially for small marker gene sets. We briefly tested this change on the cortex and adipose datasets. The original and new order of operations produce estimates that have an average correlation of 0.87 for the cortex and 0.84 for the adipose within each cell type.

Details

Expects read counts for both datasets, as they will be converted to counts per million (CPM). Two options available: Use overlapping indivudals found in both single-cell and bulk datasets to learn transformation or learn transformation from single-cell alone. The overlapping option is expected to have better performance.

Value

A list. Slot bulk.props contains a matrix of cell type proportion estimates with cell types as rows and individuals as columns. Slot sc.props contains a matrix of cell type proportions estimated directly from counting single-cell data. Slot rnorm contains Euclidean norm of the residuals for each individual's proportion estimates. Slot genes.used contains vector of genes used in decomposition. Slot transformed.bulk contains the transformed bulk expression used for decomposition. These values are generated by applying a linear transformation to the CPM expression.

Examples

library(Biobase)
sim.data <- SimulateData(n.ind=10, n.genes=100, n.cells=100,
                         cell.types=c("Neurons", "Astrocytes", "Microglia"),
                         avg.props=c(.5, .3, .2))
sim.data$sc.eset <- sim.data$sc.eset[,sim.data$sc.eset$SubjectName %in% as.character(6:10)]
res <- ReferenceBasedDecomposition(sim.data$bulk.eset, sim.data$sc.eset)
estimated.cell.proportions <- res$bulk.props

library(Biobase)
sim.data <- SimulateData(n.ind=10, n.genes=100, n.cells=100,
                         cell.types=c("Neurons", "Astrocytes", "Microglia"),
                         avg.props=c(.5, .3, .2))
sim.data$sc.eset <- sim.data$sc.eset[,sim.data$sc.eset$SubjectName %in% as.character(6:10)]
res <- ReferenceBasedDecomposition(sim.data$bulk.eset, sim.data$sc.eset)
estimated.cell.proportions <- res$bulk.props

Transforms bulk expression of a gene using only single-cell data

Description

For a specific gene, this function learns a transformation of the bulk expression to match the distribution produced by the single-cell based reference and observed single-cell based cell proportions.

Usage

SemisupervisedTransformBulk(gene, Y.train, X.pred)
SemisupervisedTransformBulk(gene, Y.train, X.pred)

Arguments

`gene`	Character string. Gene name that corresponds to row in Y.train
`Y.train`	Numeric Matrix. Number of gene rows by number of overlapping individuals columns. Contains weighted sum of reference profile by single-cell based cell proportion estimates for each individual
`X.pred`	Numeric Matrix. Number of gene rows by number of remaining individuals columns. Contains observed bulk expression for each individual to be transformed.

Value

Y.pred Numeric Matrix. One row for given gene by number of remaining individuals columns. Contains transformed bulk expression for each individual.

Converts Seurat object to Expression Set

Description

'SeuratToExpressionSet()' returns an Expression Set with phenotype data indicating cell type (cellType) and individual (SubjectName) for each cell in a Seurat object. Raw counts data is used for assay data.

Usage

SeuratToExpressionSet(
  seurat.object,
  delimiter,
  position,
  version = c("v2", "v3")
)
SeuratToExpressionSet(
  seurat.object,
  delimiter,
  position,
  version = c("v2", "v3")
)

Arguments

`seurat.object`	Seurat object with attributes raw.data, ident, and cell.names
`delimiter`	Character to split cell names with to find individual ID.
`position`	Integer indicating 1-indexed position of individual ID after splitting cell name with delimiter.
`version`	Character string. Either "v2" or "v3. Seurat version used to create Seurat object.

Details

Note that the Seurat and Biobase libraries should be attached before running this function. The delimiter and position arguments are used to infer the individual ID from the cell ID. For example, a delimiter of "-" and position of "2" indicates that the individual ID for the cell ID ACTG-2 would be 2.

Value

sc.eset Expression set containing relevant phenotype and individual data, cellType and SubjectName.

Examples


  library(Seurat)
  library(Biobase)

  # We make a class to emulate a Seurat v2 object for illustration only
  setClass("testSeuratv2", representation(cell.names = "character",
                                          ident = "character",
                                          raw.data = "matrix"))
  sc.counts <- matrix(0,nrow=3,ncol=3)
  # These barcodes correspond to a delimiter of "-" and position 2 for individual id.
  test.cell.names <- c("ATCG-1", "TAGC-2", "GTCA-3")
  test.ident <- c("cell type a", "cell type b", "cell type c")
  names(test.ident) <- test.cell.names
  colnames(sc.counts) <- test.cell.names
  test.seurat.obj <- new("testSeuratv2",
                         cell.names=test.cell.names,
                         ident=test.ident,
                         raw.data=sc.counts)

  single.cell.expression.set <- SeuratToExpressionSet(test.seurat.obj, delimiter='-',
                                                      position=2, version="v2")


library(Seurat)
  library(Biobase)

  # We make a class to emulate a Seurat v2 object for illustration only
  setClass("testSeuratv2", representation(cell.names = "character",
                                          ident = "character",
                                          raw.data = "matrix"))
  sc.counts <- matrix(0,nrow=3,ncol=3)
  # These barcodes correspond to a delimiter of "-" and position 2 for individual id.
  test.cell.names <- c("ATCG-1", "TAGC-2", "GTCA-3")
  test.ident <- c("cell type a", "cell type b", "cell type c")
  names(test.ident) <- test.cell.names
  colnames(sc.counts) <- test.cell.names
  test.seurat.obj <- new("testSeuratv2",
                         cell.names=test.cell.names,
                         ident=test.ident,
                         raw.data=sc.counts)

  single.cell.expression.set <- SeuratToExpressionSet(test.seurat.obj, delimiter='-',
                                                      position=2, version="v2")

Simulate barcode for decomposition illustration

Description

Generates a nucleotide barcode similar to those generated by 10x chromium sequencing platforms for illustration purposes. Generates barcode and individual ID separated by '-' delimiter.

Usage

SimulateBarcode(index, individual, barcode.length)
SimulateBarcode(index, individual, barcode.length)

Arguments

`index`	Integer. Index of cell ID from 0 to barcode.length to the fourth power. Will generate a unique nucleotide barcode for each index.
`individual`	Character. ID of individual that the cell is from.
`barcode.length`	Integer. Length of nucleotide barcode.

Value

Simulated barcode for cell from an individual

Simulate data for decomposition illustration

Description

Simulates bulk and single-cell expression, as well as marker genes and true proportions that can be used as an example of decomposition

Usage

SimulateData(n.ind, n.genes, n.cells, cell.types, avg.props)
SimulateData(n.ind, n.genes, n.cells, cell.types, avg.props)

Arguments

`n.ind`	Integer. Number of individuals to simulate
`n.genes`	Integer. Number of genes to simulate
`n.cells`	Integer. Number of cells per individual for single-cell data
`cell.types`	Character vector. List of cell types to simulate
`avg.props`	Numeric vector. List of average proportions for given cell types. Should be same length as cell.types and sum to 1

Value

A list with simulated single-cell in slot 'sc.eset' and bulk in 'bulk.eset', as well as true proportions in 'props' and marker genes in 'markers'.

Examples

library(Biobase)
sim.data <- SimulateData(n.ind=10, n.genes=100, n.cells=100,
                         cell.types=c("Neurons", "Astrocytes", "Microglia"),
                         avg.props=c(.5, .3, .2))

library(Biobase)
sim.data <- SimulateData(n.ind=10, n.genes=100, n.cells=100,
                         cell.types=c("Neurons", "Astrocytes", "Microglia"),
                         avg.props=c(.5, .3, .2))

Transforms bulk expression of a gene given overlapping data

Description

For a specific gene, this function uses linear regression to learn a transformation of the bulk expression to match the values produced by the single-cell based reference and observed single-cell based cell proportions.

Usage

SupervisedTransformBulk(gene, Y.train, X.train, X.pred)
SupervisedTransformBulk(gene, Y.train, X.train, X.pred)

Arguments

`gene`	Character string. Gene name that corresponds to row in Y.train
`Y.train`	Numeric Matrix. Number of gene rows by number of overlapping individuals columns. Contains weighted sum of reference profile by single-cell based cell proportion estimates for each individual
`X.train`	Numeric Matrix. Number of gene rows by number of overlapping individuals columns. Contains observed bulk expression for each individual
`X.pred`	Numeric Matrix. Number of gene rows by number of remaining individuals columns. Contains observed bulk expression for each individual to be transformed.

Details

If a linear transformation cannot be learned for a gene (zero variance in observed bulk or single-cell based weighted sums), a vector of NaNs will be returned of the expected length (length of X.pred)

Value

Y.pred Numeric Matrix. One row for given gene by number of remaining individuals columns. Contains transformed bulk expression for each individual.

Package 'BisqueRNA'

Help Index

Calculate cell proportions based on single-cell data

Description

Usage

Arguments

Value

Correlate columns of data frame

Description

Usage

Arguments

Value

Convert counts data in Expression Set to counts per million (CPM)

Description

Usage

Arguments

Value

Estimate cell type proportions using first PC of expression matrix

Description

Usage

Arguments

Value

Remove genes in Expression Set with zero expression in all samples

Description

Usage

Arguments

Value

Remove genes in Expression Set with zero variance across samples

Description

Usage

Arguments

Value

Generate reference profile for cell types identified in single-cell data

Description

Usage

Arguments

Value

Return cell type proportions from bulk

Description

Usage

Arguments

Value

Get number of genes to use with no weighted information

Description

Usage

Arguments

Value

Get number of genes to use with weighted PCA

Description

Usage

Arguments

Value

Find overlapping genes in single-cell data, bulk data, and marker genes

Description

Usage

Arguments

Value

Find overlapping samples in single-cell and bulk data

Description

Usage

Arguments

Value

Get unique markers present in only 1 cell type

Description

Usage

Arguments

Value

Performs marker-based decomposition of bulk expression using marker genes

Description

Usage

Arguments

Details

Value

Examples

Performs reference-based decomposition of bulk expression using single-cell data

Description

Usage

Arguments

Details

Value