vignettes/seq-workflow-visium-hd-cell.Rmd
seq-workflow-visium-hd-cell.Rmd
Authors: Yixing E. Dong1, Ellis Patrick2.
Last modified: 15 September, 2025.
In this instructor-led live demo, we analyze Visium HD data segmented
into cells, demonstrating the use of the
SpatialFeatureExperiment
class in R to import, visualize,
perform quality control, transfer labels from a single-cell reference to
annotate the data, identify spatial domains, and select marker genes.
Spatial transcriptomics (ST) data with annotated cell types can reveal
key disease characteristics through neighborhood enrichment and spatial
statistics, which will be demonstrated in the second half of the
workshop. The complete analysis provided by existing tools highlights
how easily researchers can transform raw counts from a Visium HD
experiment into biological insights using Bioconductor and CRAN.
Additional ST workflows are available at https://lmweber.org/OSTA/.
SpatialExperiment
classes and object
manipulationsf
classes and object
manipulationAdditional reading:
The relevant data by Oliveira et al. (2024) are stored on Open Science Framework, and can be downloaded as delayed format cached on your laptop through OSTA.data:
library(OSTA.data)
id <- "VisiumHD_HumanColon_Oliveira"
pa <- OSTA.data_load(id)
dir.create(td <- tempfile())
unzip(pa, exdir=td)
list.files(file.path(td, "segmented_outputs"))
#> [1] "cell_segmentations.geojson" "filtered_feature_cell_matrix.h5"
#> [3] "nucleus_segmentations.geojson" "spatial"
We prepared a reader for Visium HD segmented output at
./vignettes/utils.R
.
Bioconductor: Biobase, BiocStyle, Banksy, CARDspa, DropletUtils, GenomeInfoDb, ggspavis, OSTA.data, S4Vectors, scater, scran, scuttle, SingleCellExperiment, spacexr, SpatialExperiment, SpatialFeatureExperiment, spicyR, SpotSweeper, Statial, Voyager
CRAN: dplyr, ggplot2, grDevices, grid, here, igraph, magick, pals, patchwork, pheatmap, plotly, rlang, sf, tidyr, tidySpatialExperiment, tidyverse
The key steps of Visium HD data analysis are described in this workshop, along with diverse visualizations of the methods used. The goal is to help beginner bioinformaticians working with Visium HD data become familiar with each step involved and complete a standard analysis pipeline from start to finish.
SpatialFeatureExperiment
objectAs of June 2025, Visium HD enables direct output of H&E-segmented
cell-level data from SpaceRanger
v4. This automation removes the need for an additional segmentation
tool, such as bin2cell
. Since biologists often find
cellular units more intuitive for deriving insights, this output may
attract more interest than bin-level data. However, bins outside cell
boundaries can still provide valuable information for many research
topics in the tumor microenvironment. In this workflow, we will
demonstrate several analysis steps at the cell level using the same
dataset, subsetted to the same region, as in the Sequencing-based
platforms - Workflow: Visium HD chapter of OSTA book.
library(Banksy)
library(CARDspa)
library(dplyr)
library(DropletUtils)
library(ggplot2)
library(grDevices)
library(grid)
library(here)
library(igraph)
library(magick)
library(OSTA.data)
library(patchwork)
library(pheatmap)
library(plotly)
library(rlang)
library(S4Vectors)
library(scater)
library(scran)
library(scuttle)
library(sf)
library(SingleCellExperiment)
library(spacexr)
library(SpatialExperiment)
library(SpatialFeatureExperiment)
library(spicyR)
library(SpotSweeper)
library(Statial)
library(tidyr)
library(tidySpatialExperiment)
library(tidyverse)
library(Voyager)
We source the reader in utils.R
and specify alias for
plotting themes.
source(here::here("vignettes/utils.R"))
# ggplot theme alias
center_title <- theme(plot.title = element_text(hjust = 0.5))
no_legend <- theme(legend.position = "none")
gradient_theme <- theme(plot.title = element_text(hjust = 0.5),
legend.title = element_blank(),
legend.key.width = grid::unit(0.5, "lines"),
legend.key.height = grid::unit(1, "lines"))
thin_discrete_legend <- theme(legend.key.size = grid::unit(0.5, "lines"))
enlarge_legend_dot <- guides(col=guide_legend(override.aes=list(size=3)))
common_args_heatmap <- list(cellwidth=10, cellheight=10,
treeheight_row=5, treeheight_col=5)
Output files from SpaceRanger v4 are listed in
/segmented_outputs
folder, with the expected structure
detailed in the Background - Importing data chapter of the OSTA
book.
id <- "VisiumHD_HumanColon_Oliveira"
pa <- OSTA.data_load(id)
dir.create(td <- tempfile())
unzip(pa, exdir=td)
Read in Visium HD cell segmented data as a
SpatialFeatureExperiment
object.
vhdsfe <- readVisiumHDCellSeg(td, res = "hires")
#> Reading layer `cell_segmentations' from data source
#> `/tmp/RtmptDBoqN/file8e25ebc7d2/segmented_outputs/cell_segmentations.geojson'
#> using driver `GeoJSON'
#> Simple feature collection with 220704 features and 1 field
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: 40587.5 ymin: -28.8 xmax: 65202.7 ymax: 22701
#> Geodetic CRS: WGS 84
vhdsfe
#> # A SpatialExperiment-tibble abstraction: 220,703 × 8
#> # Features = 18132 | Cells = 220703 | Assays = counts
#> .cell Sample Barcode sample_id pxl_col_in_hires pxl_row_in_hires
#> <chr> <chr> <chr> <chr> <dbl> <dbl>
#> 1 3 /tmp/RtmptDBoqN/fi… cellid… sample01 3555. 3571.
#> 2 4 /tmp/RtmptDBoqN/fi… cellid… sample01 3551. 3569.
#> 3 5 /tmp/RtmptDBoqN/fi… cellid… sample01 3557. 3570.
#> 4 6 /tmp/RtmptDBoqN/fi… cellid… sample01 3549. 3570.
#> 5 7 /tmp/RtmptDBoqN/fi… cellid… sample01 3565. 3570.
#> 6 8 /tmp/RtmptDBoqN/fi… cellid… sample01 3553. 3571.
#> 7 9 /tmp/RtmptDBoqN/fi… cellid… sample01 3557. 3573.
#> 8 10 /tmp/RtmptDBoqN/fi… cellid… sample01 3562. 3572.
#> 9 11 /tmp/RtmptDBoqN/fi… cellid… sample01 3571. 3571.
#> 10 12 /tmp/RtmptDBoqN/fi… cellid… sample01 3546. 3570.
#> # ℹ 220,693 more rows
#> # ℹ 2 more variables: pxl_col_in_fullres <dbl>, pxl_row_in_fullres <dbl>
#>
#> unit:
#> Geometries:
#> colGeometries: centroids (POINT), cellSeg (POLYGON), updatecentroids (POINT), updatecellSeg (POLYGON)
#>
#> Graphs:
#> sample01:
To keep runtime low, our downstream analysis will be performed on a subset area of 16 µm bins (1/64 of the data coverage area). We restrict to the same region from as we have done in the OSTA book.
vhdsub <- subsetVisiumHD(vhdsfe,
xmin = 4214.937, xmax = 4465.761,
ymin = 2979.286, ymax = 3211.817)
Here, as we have restricted the analysis to only 1/64 of the entire
tissue coverage, the number of cells in the subset sf
object should also be reduced by roughly this fraction, depending on
tissue density.
Show subset region used for analysis:
getBoxRange <- function(obj){
data <- spatialCoords(obj) %>% as.data.frame() %>%
summarise(xmin = min(pxl_col_in_fullres), xmax = max(pxl_col_in_fullres),
ymin = min(pxl_row_in_fullres), ymax = max(pxl_row_in_fullres))
return(data)
}
plotBox <- function(data, col, lty, lwd){
geom_rect(data = data,
aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
color = col, fill = NA, linetype = lty, linewidth = lwd)
}
plotImage(vhdsfe) +
plotBox(getBoxRange(vhdsfe), col = "#3b3b3b", lty = 2, lwd = 1/2) +
plotBox(getBoxRange(vhdsub), col = "black", lty = 4, lwd = 2/3) +
ggtitle("Grey: data coverage; Black: subset region") +
center_title
Plot cell polygons overlay H&E image for this subset region (in black box). Here is an overview of the subset of tissue section that we will work on:
vhdsub$polygon <- 1
p0 <- plotImage(vhdsub, bbox = unlist(getBoxRange(vhdsub)))
p1 <- plotSpatialFeature(vhdsub, bbox = unlist(getBoxRange(vhdsub)),
features = "polygon", alpha = 0.1, fill = NA, color = "white",
colGeometryName = "updatecellSeg", image_id = "hires") +
no_legend
(p0 + ggtitle("H&E region") |
p1 + ggtitle("Overlayed with cell boundaries")) &
center_title
Get cell-level QC metrics for visualization with scuttle.
# Visualize library size to the polygons
sub <- list(mt = grep("^MT-", rownames(vhdsub)))
vhdsub <- addPerCellQCMetrics(vhdsub, subsets = sub)
vhdsub$log_sum <- log1p(vhdsub$sum)
CD <- colData(vhdsub) %>% as.data.frame()
head(CD %>% select(Barcode, sum, log_sum, subsets_mt_percent))
#> Barcode sum log_sum subsets_mt_percent
#> 84452 cellid_000084452-1 139 4.941642 3.597122
#> 84464 cellid_000084464-1 553 6.317165 2.169982
#> 84472 cellid_000084472-1 1218 7.105786 2.298851
#> 84476 cellid_000084476-1 1004 6.912743 4.681275
#> 84504 cellid_000084504-1 835 6.728629 5.029940
#> 84560 cellid_000084560-1 436 6.079933 2.981651
Here is an overview of the cells colored by their log library size and mitochondrial percentage.
p0 <- plotSpatialFeature(vhdsub, features = "log_sum",
colGeometryName = "updatecellSeg") +
ggtitle("log library size")
p1 <- plotSpatialFeature(vhdsub, features = "subsets_mt_percent",
colGeometryName = "updatecellSeg") +
ggtitle("% mitochondrial")
(p0 | p1) &
gradient_theme &
scale_fill_gradientn(colors = pals::jet())
SpotSweeper determines outliers based on low log-library size, few uniquely detected features, or a high mitochondrial fraction compared to their surrounding neighborhood:
vhdsub <- localOutliers(vhdsub, metric="sum", direction="lower", log=TRUE)
vhdsub <- localOutliers(vhdsub, metric="detected", direction="lower", log=TRUE)
vhdsub <- localOutliers(vhdsub, metric="subsets_mt_percent", direction="higher", log=TRUE)
vhdsub$discard <-
vhdsub$sum_outliers |
vhdsub$detected_outliers |
vhdsub$subsets_mt_percent_outliers
# tabulate number of cells retained vs. removed by any criterion
table(vhdsub$discard)
#>
#> FALSE TRUE
#> 4406 13
We can visualize the local outliers identified by
SpotSweeper
spatially. However, they are difficult to see,
as only a few small cells are flagged as outliers.
common_args_qc <- list(fill = NA, color = "grey90", linewidth = 0.1,
colGeometryName = "updatecellSeg")
p0 <- exec(plotSpatialFeature,
vhdsub, features = "sum_outliers", !!!common_args_qc) +
ggtitle("low_lib_size")
p1 <- exec(plotSpatialFeature,
vhdsub, features = "detected_outliers", !!!common_args_qc) +
ggtitle("low_n_features")
p2 <- exec(plotSpatialFeature,
vhdsub, features = "discard", !!!common_args_qc) +
ggtitle("discard")
p0 + p1 + p2 +
plot_layout(nrow=1, guides="collect") &
thin_discrete_legend & center_title &
scale_fill_manual("discard", values=c("white", "red"))
This Visium HD dataset provides matched scRNA-seq for the same
tissue, which avoids tissue and batch effects that often complicate
label transfer. The Level1
column contains broad,
interpretable populations that are well suited for deconvolution and
downstream interpretation. We will use these labels to map likely cell
identities onto our Visium HD cells.
First lets load in the data from the OSTA.data
package.
## SCE ref
id <- "Chromium_HumanColon_Oliveira"
## Download the data for OSTA.data and unzip the files
pa <- OSTA.data_load(id)
dir.create(td2 <- tempfile())
unzip(pa, exdir=td2)
# read into 'SingleCellExperiment'
fs <- list.files(td2, full.names=TRUE)
h5 <- grep("h5$", fs, value=TRUE)
sce <- read10xCounts(h5, col.names=TRUE)
# add cell metadata
csv <- grep("csv$", fs, value=TRUE)
cd <- read.csv(csv, row.names=1)
colData(sce)[names(cd)] <- cd[colnames(sce), ]
# use gene symbols as feature names
rownames(sce) <- make.unique(rowData(sce)$Symbol)
# exclude cells deemed to be of low-quality
sce <- sce[, sce$QCFilter == "Keep"]
We are wanting to use this single cell data as a reference. It has
two resolutions of cell annotations. Lets check out what cells labels
are in Level1
.
# tabulate subpopulations
table(sce$Level1)
#>
#> B cells Endothelial Fibroblast
#> 33611 7969 28653
#> Intestinal Epithelial Myeloid Neuronal
#> 22763 25105 4199
#> Smooth Muscle T cells Tumor
#> 43308 29272 65626
Spaces can make manipulating data tricky sometimes so lets replace them with underscores.
sce$Level1 <- gsub(" ", "_", sce$Level1)
For speed we down-sample to at most 1,000 cells per reference class. This preserves diversity within each class while keeping memory and runtime manageable.
There are many ways to transfer labels from single-cell data to spatial data. We use RCTD because it performs competitively in public benchmarks, models mixed signals explicitly, and assigns each spatial unit a clear class as a singlet or as one of several doublet types. RCTD was originally designed for spot-based platforms such as 10x Visium, where each spot captures RNA from multiple cells. It treats the observed counts in a spot as a mixture of reference cell-type expression profiles and uses a likelihood model to estimate non-negative weights for each candidate cell type. It then compares a singlet model against a doublet model to determine whether one cell type is sufficient to explain the counts or whether two are required. The method also learns scale factors to account for platform differences and total RNA content, which helps it transfer labels across technologies.
In our setting we apply RCTD at cellular resolution using segmented Visium HD data. Although each unit is intended to represent a single cell, there are still cases where the measured counts reflect more than one cell type. This can arise from partial volume effects at cell boundaries, segmentation errors, tightly packed regions, or local diffusion of transcripts. Running RCTD in doublet mode remains useful here because it can flag these cases and provide a principled estimate of the most likely composition. In practice we expect most units to be classified as singlets, with a minority of doublets that warrant closer inspection or additional filtering.
Unfortunately running RCTD is slow, so we will load a precompiled version.
# data("rctd_result_level1.RData", package = "EuroBioC2025_OSTA_Workshop")
load(here::here("data/rctd_result_level1.RData"))
!!! Do not run this in the workshop it will take 30 mins.
# ## Perform initial preprocessing of the visiumHD and single cell reference data
# rctd_data <- createRctd(spatial_experiment = vhdsub, # Our visium HD data
# reference_experiment = sce[, down_sample],# Down-sampled single cell reference
# cell_type_col="Level1", # We will use the Level1 resolution of cell types.
# UMI_min = 0) # To make things easier, we won't filter away cells that RCTD thinks are poor quality.
#
# ## Perform the deconvolution.
# rctd_result_level1 <- runRctd(rctd_data, max_cores = 4, rctd_mode = "doublet")
# # save(rctd_result_level1, file = here::here("data/rctd_result_level1.RData"))
Note: RCTD does have some internal QC, so lets check if anything was filtered out prior to running deconvolution.
dim(vhdsub)
#> [1] 18132 4419
dim(rctd_result_level1)
#> [1] 9 4418
setdiff(colnames(vhdsub), colnames(rctd_result_level1))
#> [1] "91874"
vhdsub$discard[colnames(vhdsub) == "91874"]
#> [1] TRUE
vhdsub$sum_outliers[colnames(vhdsub) == "91874"]
#> [1] TRUE
vhdsub$sum[colnames(vhdsub) == "91874"]
#> [1] 14
head(sort(vhdsub$sum))
#> [1] 14 18 19 19 22 23
The spot labels are saved in the colData()
as
spot_class
. This provides a quick quality check: a majority
of singlet classifications at cell resolution is expected and supports
the validity of the H&E-guided segmentation.
table(rctd_result_level1$spot_class)
#>
#> reject singlet doublet_certain doublet_uncertain
#> 115 2660 1347 296
We then synchronise the deconvolution labels back into our
vhdsub
SpatialFeatureExperiment
object by
matching cell ids to keep everything aligned.
# Subset to RCTD filtered object for convenience
vhdsub <- vhdsub[, colnames(vhdsub)]
# Here we use match to ensure that we match the cell IDs
vhdsub$cellTypeClass <- rctd_result_level1$spot_class[
match(colnames(vhdsub),colnames(rctd_result_level1))]
vhdsub$cellType <- rctd_result_level1$first_type[
match(colnames(vhdsub), colnames(rctd_result_level1))]
# Double check that the numbers of the cell types is consistent
table(vhdsub$cellType)
#>
#> B_cells Endothelial Fibroblast
#> 497 460 1010
#> Intestinal_Epithelial Myeloid Neuronal
#> 529 547 11
#> Smooth_Muscle T_cells Tumor
#> 90 262 1012
We can visualize the spatial location of singlets, doublets, or the
assigned cell types by RCTD
.
p0 <- plotSpatialFeature(vhdsub, features = "cellTypeClass",
colGeometryName = "updatecellSeg") +
scale_fill_manual(values = unname(pals::okabe()))
p1 <- plotSpatialFeature(vhdsub, features = "cellType",
colGeometryName = "updatecellSeg") +
scale_fill_manual(values = unname(pals::trubetskoy()))
(p0 | p1) & thin_discrete_legend
Previously, we found one cell was flagged as a local outlier by
SpotSweeper
, and now we can check its property.
table(vhdsub$subsets_mt_percent_outliers == TRUE)
#>
#> FALSE TRUE
#> 4418 1
vhdsub$cellType[vhdsub$subsets_mt_percent_outliers == TRUE]
#> [1] "T_cells"
vhdsub$cellTypeClass[vhdsub$subsets_mt_percent_outliers == TRUE]
#> [1] doublet_uncertain
#> Levels: reject singlet doublet_certain doublet_uncertain
Filter out low quality cells from the
SpatialFeatureExperiment
object, prior to spatial
clustering.
vhdsub <- vhdsub[, !vhdsub$discard]
dim(vhdsub)
#> [1] 18132 4406
First, we identify highly variable genes (HVGs) in the target area. Note that spatially variable genes (SVGs) could be used instead.
vhdsub <- logNormCounts(vhdsub)
dec <- modelGeneVar(vhdsub)
hvg <- getTopHVGs(dec, n=3e3)
Banksy utilizes a pair of spatial kernels to capture gene expression variation, followed by dimension reduction and graph-based clustering to identify spatial domains.
# restrict to selected features
tmp <- vhdsub[hvg, ]
# compute spatially aware 'Banksy' PCs
set.seed(1000)
tmp <- computeBanksy(tmp,
assay_name = "logcounts",
coord_names = c("array_row", "array_col"),
k_geom = 8) # consider first order neighbors
tmp <- runBanksyPCA(tmp,
lambda = 0.2, # use little spatial information,
npcs = 20)
reducedDim(vhdsub, "PCA") <- reducedDim(tmp)
set.seed(1000)
tmp <- Banksy::clusterBanksy(tmp, lambda = 0.2, npcs = 20, resolution = 0.8)
vhdsub$Banksy <- tmp$clust_M0_lam0.2_k50_res0.8
table(vhdsub$Banksy)
#>
#> 1 2 3 4 5 6 7
#> 1287 844 700 560 471 281 263
Now we can visualize clustering result spatially.
plotSpatialFeature(vhdsub, features = "Banksy",
colGeometryName = "updatecellSeg") +
scale_fill_manual(values = unname(pals::kelly())) +
thin_discrete_legend
Next, we can perform differential gene expresison (DGE) analysis to identify markers for each cluster. We also compute the cluster-wise mean expression of selected markers:
# differential gene expression analysis
mgs <- findMarkers(vhdsub, groups = vhdsub$Banksy, direction = "up")
# select for a few markers per cluster
top <- lapply(mgs, \(df) rownames(df)[df$Top <= 2])
top <- unique(unlist(top))
# average expression by clusters
pbs <- aggregateAcrossCells(vhdsub,
ids = vhdsub$Banksy, subset.row = top,
use.assay.type = "logcounts", statistics = "mean")
The marker genes in each cluster can be visualized with a heatmap:
# visualize averages z-scaled across clusters
exec(pheatmap,
mat = t(assay(pbs)), scale = "column", breaks = seq(-2, 2, length = 101),
!!!common_args_heatmap)
We can visualize some marker genes:
p0 <- plotSpatialFeature(vhdsub, features = "MMP2",
colGeometryName = "updatecellSeg") + ggtitle("MMP2")
p1 <- plotSpatialFeature(vhdsub, features = "PIGR",
colGeometryName = "updatecellSeg") + ggtitle("PIGR")
p2 <- plotSpatialFeature(vhdsub, features = "DES",
colGeometryName = "updatecellSeg") + ggtitle("DES")
(p0 | p1 | p2) &
gradient_theme &
scale_fill_gradientn(colors = rev(hcl.colors(9, "Rocket")))
We can visualize the results spatially:
p0 <- plotSpatialFeature(vhdsub, features = "cellType",
colGeometryName = "updatecellSeg") +
scale_fill_manual(values = unname(pals::trubetskoy())) +
ggtitle("RCTD")
p1 <- plotSpatialFeature(vhdsub, features = "Banksy",
colGeometryName = "updatecellSeg") +
scale_fill_manual(values = unname(pals::kelly())) +
ggtitle("Banksy")
p0 + p1 +
plot_layout(nrow=1) &
center_title & thin_discrete_legend
We observe that Banksy
delineates tissue borders very
well, while RCTD
provides direct biological insight into
the underlying clusters. Their concordance can be visualized using a
heatmap:
fq <- prop.table(table(vhdsub$Banksy, vhdsub$cellType), 1)
exec(pheatmap, fq, !!!common_args_heatmap)
We next quantify spatial proximity between annotated cell types. The
getPairwise()
function from the spicyR
package computes the K-cross function across all pairs of classes,
summarising whether cell type A tends to appear within distance
r
of cell type B more often than expected under spatial
randomness. Here we use a radius of 200 units (~50 microns) to capture
immediate neighbourhood effects at approximately one to two cell
diameters.
crossK = getPairwise(vhdsub,
r = 200,
cellType = "cellType",
imageID = "sample_id",
spatialCoords = c("pxl_col_in_fullres",
"pxl_row_in_fullres"))
We can visualise these relationships using
imageCrossPlot()
. It visualises enrichment and avoidance
across all pairs, which is handy for scanning for strong interactions
before diving deeper. To aid interpretation, follow the sign: values
above zero indicate spatial attraction at radius r (more co-occurrence
than expected), while values below zero indicate spatial repulsion.
imageCrossPlot(crossK, limits = c(-400,400))
We can visualise the spatial arrangement of our annotated cells to build intuition for these statistics. This helps confirm that any strong pairwise signals are visible in the image and not driven by a handful of cells.
vhdsub |>
ggplot(aes(pxl_col_in_fullres, pxl_row_in_fullres, colour = cellType)) +
geom_point(size=0.5) + scale_color_brewer(palette = "Set1") + theme_classic() +
enlarge_legend_dot
While pairwise proximity statistics are a sensible first step, they
can sometimes be misleading because they ignore the broader tissue
context. For example, the K-cross function from spicyR
will
tell us whether two cell types are found close together more often than
expected under a random distribution, but it does not ask conditional
questions. Biologists, however, often want to know things like:
“Do immune cells cluster more tightly with tumour than we would expect, even after accounting for the fact that immune cells already tend to occur together?”
This is where Kontextual
from the Statial
package comes in. Kontextual
extends beyond pairwise
enrichment by creating a contextual null model. Instead of comparing two
cell types against a fully random distribution, it compares them against
what we would expect given the observed mixture of other neighbouring
cell types. In practice, this means that Kontextual is testing whether a
particular association is stronger (or weaker) than would be expected
given the local cellular environment.
For this demonstration, we define an immune context composed of T cells, Myeloid cells, and B cells. Kontextual then evaluates whether, within this immune backdrop, specific pairs (for example, Myeloid–Tumour) show evidence of additional attraction or repulsion at a chosen spatial scale. The method is flexible: you can define any grouping of interest as a “parent” or context, and then interrogate the conditional relationships of one population relative to that context.
## Create a vector saying which cells are in the immune context
immune <- c("T_cells", "Myeloid", "B_cells")
## Define the parents we want to use as contexts
parentDf <- parentCombinations(
all = unique(vhdsub$cellType),
immune = immune)
## Run Kontextual with radius of 20
resultsKontextual <- Kontextual(
cells = vhdsub,
parentDf = parentDf,
imageID = "sample_id",
cellType = "cellType",
spatialCoords = c("pxl_col_in_fullres", "pxl_row_in_fullres"),
r = 200
)
Lets visualise the results.
plotKontext <- resultsKontextual |>
ggplot(aes(original, kontextual, label = test)) +
geom_point() +
geom_hline(yintercept = 0) +
geom_vline(xintercept = 0) +
theme_classic()
ggplotly(plotKontext)
We can then inspect radii-dependent curves (for example Myeloid around Tumour) to confirm at which scales the effect appears, which is often informative for processes like invasion fronts or peritumoural niches.
curves <- kontextCurve(
cells = vhdsub,
from = "Tumor",
to = "Myeloid",
parent = immune,
rs = seq(20, 500, 50),
image = "sample01",
cellType = "cellType",
imageID = "sample_id"
)
kontextPlot(curves) + theme_bw()
To aid interpretation, we can also re-plot the spatial map with the immune context greyed and the two focal types coloured. This quickly links our statistical findings to a tangible patterns in the tissue.
df <- colData(vhdsub) |>
as.data.frame() |>
cbind(spatialCoords(vhdsub))
ggplot(data = filter(df, cellType %in% c("Tumor", "Myeloid")),
mapping = aes(pxl_col_in_fullres, pxl_row_in_fullres, color = cellType))+
geom_point(data = dplyr::filter(df, cellType %in% immune),
colour = "darkgrey", size = 0.5) +
geom_point(size = 0.5) + enlarge_legend_dot + theme_classic()
Spatial proximity tells us where different cell types are relative to one another, but an equally important question is what happens to those cells when they are in close contact. Proximity alone does not guarantee functional interaction. To uncover biological relevance, we need to test whether changes in the microenvironment are associated with measurable shifts in gene expression within a given cell type.
This is the motivation behind spatioMark
, part of the
Statial
package. The idea is simple:
take a target population (for example, Myeloid cells),
quantify its local environment (how many Tumour cells, T cells, fibroblasts, etc. are nearby within a certain radius),
and ask whether the transcriptomes of the target cells vary with that environment.
In practice, spatioMark
first computes local abundances
of each cell type around every cell, giving each cell a numerical
“neighbourhood profile.” These covariates then feed into a gene-wise
regression model. For each gene expressed in the target population,
spatioMark fits a generalised linear model. As we have count data in
this case, we leverage functionality in the edgeR
package
to model the data as negative binomial. The coefficients tell us whether
gene expression increases or decreases as the abundance of a chosen
neighbour rises.
First, we need to quantify the abundance of each cell type around each cell.
vhdsub <- getAbundances(vhdsub,
r = 200,
cellType = "cellType",
imageID = "sample_id"
)
Next, we focus on whether Myeloid cells show distinct expression patterns when they are located close to Tumour cells.
stateChanges <- calcStateChanges(
cells = vhdsub,
type = "abundances",
from = "Myeloid",
to = "Tumor",
test = "nb",
imageID = "sample_id"
)
stateChanges |> head(20)
#> imageID primaryCellType otherCellType marker coef tval
#> 1 sample01 Myeloid Tumor IGHG3 -0.26986581 -18.176051
#> 2 sample01 Myeloid Tumor SPP1 0.11458632 13.630583
#> 3 sample01 Myeloid Tumor MMP11 0.05409721 7.351737
#> 4 sample01 Myeloid Tumor MMP12 0.05773036 6.921257
#> 5 sample01 Myeloid Tumor IL1RN 0.04404694 5.943101
#> 6 sample01 Myeloid Tumor IGHD -0.02380578 -5.875567
#> 7 sample01 Myeloid Tumor CTSB 0.03596732 5.741341
#> 8 sample01 Myeloid Tumor GREM1 -0.05300939 -5.724902
#> 9 sample01 Myeloid Tumor MT-CO1 0.03483848 5.647737
#> 10 sample01 Myeloid Tumor WNT5A 0.03899693 5.637293
#> 11 sample01 Myeloid Tumor PYGB 0.04308331 5.471111
#> 12 sample01 Myeloid Tumor MGP -0.04772709 -5.118931
#> 13 sample01 Myeloid Tumor APOC1 0.03665541 5.096505
#> 14 sample01 Myeloid Tumor CEACAM6 0.03919565 5.091371
#> 15 sample01 Myeloid Tumor PLAU 0.03954202 4.999526
#> 16 sample01 Myeloid Tumor IGKC -0.07047815 -4.912809
#> 17 sample01 Myeloid Tumor MT-CO3 0.03117035 4.901309
#> 18 sample01 Myeloid Tumor KRT23 0.02789161 4.848393
#> 19 sample01 Myeloid Tumor DES -0.04838446 -4.805449
#> 20 sample01 Myeloid Tumor IGHG1 -0.05020400 -4.782172
#> pval fdr
#> 1 3.994157e-74 3.392237e-70
#> 2 1.317335e-42 5.594065e-39
#> 3 9.782363e-14 2.769387e-10
#> 4 2.238265e-12 4.752396e-09
#> 5 1.398403e-09 2.375328e-06
#> 6 2.106995e-09 2.982451e-06
#> 7 4.696492e-09 5.493550e-06
#> 8 5.174661e-09 5.493550e-06
#> 9 8.128685e-09 7.335579e-06
#> 10 8.637206e-09 7.335579e-06
#> 11 2.236116e-08 1.726485e-05
#> 12 1.536358e-07 1.078256e-04
#> 13 1.729906e-07 1.078256e-04
#> 14 1.777415e-07 1.078256e-04
#> 15 2.873572e-07 1.627016e-04
#> 16 4.489043e-07 2.378050e-04
#> 17 4.760020e-07 2.378050e-04
#> 18 6.223286e-07 2.936354e-04
#> 19 7.720243e-07 3.450949e-04
#> 20 8.670557e-07 3.681952e-04
The previous output is a ranked list of genes, highlighting markers
that change as Myeloid cells are near tumour cells. One of the top genes
on this list is SPP1 which is known to be expressed in
tumour-associated macrophages. We can visualise this relationship using
the plotStateChanges()
function.
p <- plotStateChanges(
cells = vhdsub,
image = "sample01",
type = "abundances",
from = "Myeloid",
to = "Tumor",
marker = "SPP1",
imageID = "sample_id",
size = 1,
shape = 19,
interactive = FALSE,
method = "lm"
)
p[[1]]
However, there are some genes that not expressed in macrophages such as CEACAM6 which are likely lateral spillover or diffusion.
sce[, down_sample] |>
join_features(features = "CEACAM6", shape = "wide") |>
ggplot(aes(x = Level1, y = CEACAM6, fill = Level1)) +
stat_summary(fun = mean, geom = "bar") +
scale_fill_brewer(palette = "Set1") +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
thin_discrete_legend
Because we already computed RCTD
weights, we can
mitigate contamination by adding deconvolution-derived components as
covariates to our models. By storing the RCTD
cell-type
weight matrix into reducedDim(vhdsub, "RCTD")
, we can pass
"RCTD"
to the contamination argument of
calcStateChanges()
. SpatioMark will use these estimates of
contamination in its modelling to hopefully remove false positives like
CEACAM6 while preserving genuine relationships.
contam <- assay(rctd_result_level1,1) |>
t() |>
as.matrix() |>
as.data.frame()
reducedDim(vhdsub, "RCTD") <- contam[match(colnames(vhdsub), rownames(contam)),]
stateChangesCon <- calcStateChanges(
cells = vhdsub,
type = "abundances",
from = "Myeloid",
to = "Tumor",
test = "nb",
imageID = "sample_id",
contamination = "RCTD"
)
stateChangesCon |> filter(marker == "CEACAM6")
#> imageID primaryCellType otherCellType marker coef tval pval
#> 1 sample01 Myeloid Tumor CEACAM6 0.0225659 2.376978 0.008727561
#> fdr
#> 1 0.4283528
You can try using a finer resolution of cell type classified by
RCTD
.
Or you can try using a different deconvolution algorithm.
set.seed(2025)
vhd2 <- vhdsub
colnames(spatialCoords(vhd2)) <- c("x", "y")
sce2 <-sce[, down_sample]
counts(vhd2) <- as(counts(vhd2), "sparseMatrix")
counts(sce2) <- as(counts(sce2), "sparseMatrix")
CARD_obj <- CARD_deconvolution(
spe = vhd2,
sce = sce2,
sc_count = NULL,
sc_meta = NULL,
spatial_count = NULL,
spatial_location = NULL,
ct_varname = "Level2",
ct_select = NULL, # decon with all sce$Annogrp cell types
sample_varname = NULL, # use all sce as one ref sample
mincountgene = 0,
mincountspot = 0
)
# a matrix/data.frame of probabilities per cell (rows) × cell types (cols)
P <- CARD_obj$Proportion_CARD[colnames(vhdsub), ]
# cell type with the largest probability in each row
vhdsub$cellType2 <- colnames(P)[max.col(P, ties.method = "first")]
reducedDim(vhdsub, "CARD") <- as.data.frame(P)