Single-Cell Scaling
Purpose
Scaling standardizes gene expression values before PCA and other downstream steps.
In Seurat, ScaleData() centers each gene to mean 0 and scales it to unit variance across cells. After scaling, highly expressed genes no longer dominate dimensionality reduction simply because of their magnitude.
Typical position:
NormalizeData() -> FindVariableFeatures() -> CellCycleScoring() (optional) -> ScaleData() -> RunPCA()
Scale Variable Features
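By default (features = NULL), ScaleData() scales only the variable features, which are the features PCA uses in the standard workflow. The call below spells out the defaults explicitly.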
seu <- Seurat::ScaleData(
  object = seu,                              # Seurat object
  features = Seurat::VariableFeatures(seu),  # use variable features
  vars.to.regress = NULL,                    # no regression; can use c("S.Score", "G2M.Score")
  scale.max = 10,                            # maximum scaled value
  do.scale = TRUE,                           # scale each feature
  do.center = TRUE,                          # center each feature
  verbose = TRUE                             # show progress messages
)
What this does:
- centers each feature to mean 0 across cells
- scales each feature to unit standard deviation
- clips scaled values above scale.max
- stores the result in scale.data
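The centering, scaling, and clipping steps above can be illustrated on a toy matrix in base R. This is a minimal sketch of the idea, not Seurat's internal implementation; the matrix values and the names expr, scaled, and scale_max are hypothetical. With only four cells, no value reaches the clipping threshold, so the clipping line is shown only for completeness.

```r
# Toy genes-by-cells matrix (hypothetical values), genes in rows
expr <- matrix(
  c(1, 2, 3, 4,
    10, 10, 10, 50,
    0, 0, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:4))
)

# scale() standardizes columns, so transpose, scale, and transpose back
scaled <- t(scale(t(expr), center = TRUE, scale = TRUE))

# Clip large values, analogous in spirit to scale.max = 10
scale_max <- 10
scaled[scaled > scale_max] <- scale_max

rowMeans(scaled)      # each gene now has mean close to 0
apply(scaled, 1, sd)  # and standard deviation close to 1
```

Note that geneB, which had much larger raw values, ends up on the same scale as geneA and geneC.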
Scale All Features
Sometimes all genes are scaled, for example when downstream plotting or heatmaps need genes outside the variable feature set.
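all_genes <- rownames(seu)  # every feature in the default assay
seu <- Seurat::ScaleData(
  object = seu,
  features = all_genes,
  verbose = TRUE
)
Scaling all genes uses more memory, because scale.data is stored as a dense matrix with one row per scaled feature.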
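A rough way to see the memory cost is to estimate the size of a dense numeric matrix at 8 bytes per value. The helper dense_gib() and the gene and cell counts below are hypothetical, chosen only for illustration; actual usage in R will be somewhat higher because of object overhead and temporary copies.

```r
# Approximate size in GiB of a dense numeric matrix (8 bytes per double)
dense_gib <- function(n_features, n_cells) {
  n_features * n_cells * 8 / 1024^3
}

# Hypothetical dataset with 10,000 cells
variable_only <- dense_gib(2000, 10000)   # 2,000 variable features
all_features  <- dense_gib(20000, 10000)  # 20,000 genes

variable_only  # about 0.15 GiB
all_features   # about 1.49 GiB
```

Under these assumptions, scaling every gene costs ten times the memory of scaling only the variable features.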
Regress Variables
ScaleData() can also regress out unwanted sources of variation.
Common examples:
- nCount_RNA
- percent.mt
- S.Score
- G2M.Score
Example:
seu <- Seurat::ScaleData(
  object = seu,
  features = Seurat::VariableFeatures(seu),
  vars.to.regress = c("nCount_RNA", "percent.mt"),  # metadata columns to regress out
  verbose = TRUE
)
Cell cycle regression (requires S.Score and G2M.Score from CellCycleScoring()):
seu <- Seurat::ScaleData(
  object = seu,
  features = Seurat::VariableFeatures(seu),
  vars.to.regress = c("S.Score", "G2M.Score"),
  verbose = TRUE
)
Do not regress variables blindly. Regression can remove biological signal if the variable is part of the question.
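To make the effect of regression concrete, the following base R sketch fits a linear model for one simulated gene against one covariate and keeps the residuals before scaling. It is a conceptual illustration under stated assumptions, not Seurat's code path: the simulated data and the names n_count, gene, and scaled_gene are hypothetical.

```r
set.seed(1)
n_cells <- 200

# Hypothetical per-cell covariate (for example, sequencing depth)
n_count <- runif(n_cells, min = 1000, max = 10000)

# Simulated expression for one gene, partly driven by the covariate
gene <- 0.0005 * n_count + rnorm(n_cells)

# Regress the covariate out, keep the residuals, then center and scale them
fit <- lm(gene ~ n_count)
scaled_gene <- as.numeric(scale(residuals(fit)))

cor(gene, n_count)         # clearly positive before regression
cor(scaled_gene, n_count)  # essentially zero after regression
```

Everything linearly explained by the covariate is gone afterwards, which is exactly why regressing a variable tied to the biology of interest removes that signal as well.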
Check Scaled Data
Extract scaled data:
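scaled_data <- Seurat::GetAssayData(
  object = seu,
  assay = "RNA",
  slot = "scale.data"  # in Seurat v5, use layer = "scale.data" (slot is deprecated)
)
Check dimensions:
dim(scaled_data)       # features x cells
scaled_data[1:5, 1:5]  # preview the first 5 features and cells
If only variable features were scaled, scale.data contains only those features, so it has fewer rows than the full count matrix.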
Note
Scaling is mainly preparation for PCA and related dimensionality reduction.
For a simple first workflow, scale variable features first. Add regression only when there is a clear reason.