Add sampleData argument to store multi sample's metadata information (bulk, single-cell, spatial) in se object #542

Open
lingminhao wants to merge 18 commits into devel_pre_v4 from generateColData

Collaborator

@lingminhao lingminhao commented Feb 15, 2026

This PR allows the user to provide sample-specific metadata using the sampleData argument.

Supported formats:

  • Bulk Data: a single .csv metadata file with a mandatory sampleName column is sufficient. Each row contains the metadata for one sample. Providing a separate .csv file for each sample is possible, but not necessary.
  • Single-Cell / Spatial Data: one .csv metadata file per single-cell/spatial sample, each containing a mandatory barcode column.

If a specific sample lacks metadata, users can pass an NA value at the corresponding index in the input vector (e.g., c("metadata_sample1.csv", NA, "metadata_sample3.csv")).

Users can define any additional metadata columns they need in the metadata .csv file. The metadata is stored in the colData of the se SummarizedExperiment object.
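As an illustration of the input layout described above, the following R sketch writes a bulk metadata file and two single-cell metadata files, then builds a sampleData vector with a missing entry. The temporary file paths, the sample names, and the extra columns (condition, batch, cellType) are made up for illustration; only the mandatory sampleName and barcode columns and the NA convention come from this description.

```r
# Hypothetical bulk metadata: one row per sample, sampleName is mandatory.
bulkMeta <- data.frame(
    sampleName = c("sample1", "sample2", "sample3"),
    condition  = c("control", "treated", "treated"),  # user-defined column
    batch      = c(1L, 1L, 2L),                       # user-defined column
    stringsAsFactors = FALSE
)
bulkCsv <- tempfile(fileext = ".csv")
write.csv(bulkMeta, bulkCsv, row.names = FALSE)

# Hypothetical single-cell metadata: one file per sample, barcode is mandatory.
scMeta1 <- data.frame(barcode = c("AAAC", "AAAG"),
                      cellType = c("Tcell", "Bcell"),
                      stringsAsFactors = FALSE)
scMeta3 <- data.frame(barcode = "CCCA", cellType = "NKcell",
                      stringsAsFactors = FALSE)
scCsv1 <- tempfile(fileext = ".csv")
scCsv3 <- tempfile(fileext = ".csv")
write.csv(scMeta1, scCsv1, row.names = FALSE)
write.csv(scMeta3, scCsv3, row.names = FALSE)

# One entry per sample; NA marks a sample without metadata (here sample 2).
scSampleData <- c(scCsv1, NA, scCsv3)

# Check the mandatory columns before handing the files on.
bulkRead <- read.csv(bulkCsv)
stopifnot("sampleName" %in% colnames(bulkRead))
for (f in scSampleData[!is.na(scSampleData)]) {
    stopifnot("barcode" %in% colnames(read.csv(f)))
}
```

A vector such as bulkCsv or scSampleData would then be supplied as the sampleData argument of bambu(); the remaining arguments are omitted here because they depend on the user's data.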

@lingminhao lingminhao changed the base branch from devel to devel_pre_v4 February 15, 2026 14:09
@lingminhao lingminhao requested a review from Copilot February 15, 2026 14:09
@lingminhao lingminhao added the bambu-dev Feature is implemented in development branch label Feb 15, 2026

Copilot AI left a comment

Pull request overview

Adds a sampleData argument to bambu() to allow users to attach per-sample (bulk) or per-sample/per-barcode (single-cell/spatial) metadata from CSV files into the output SummarizedExperiment’s colData.

Changes:

  • Adds sampleData to bambu() and threads it into assignReadClasstoTranscripts().
  • Reworks generateColData() to left-join user-provided CSV metadata by sampleName (bulk) or barcode (demultiplexed).
  • Changes multi-sample SE assembly to carry forward per-sample colData into the combined SE.
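The left-join behaviour summarised above can be sketched in base R as follows. This is not the PR's implementation of generateColData(); the table layouts, the condition column, and the variable names are assumptions chosen for illustration.

```r
# Base colData as it might look before metadata is attached (assumed layout).
colData <- data.frame(
    id         = c("sample1", "sample2", "sample3"),
    sampleName = c("sample1", "sample2", "sample3"),
    stringsAsFactors = FALSE
)

# User metadata, as if read from a CSV; sample2 has no row and the
# order differs from colData.
userMeta <- data.frame(
    sampleName = c("sample3", "sample1"),
    condition  = c("treated", "control"),
    stringsAsFactors = FALSE
)

# Left join on sampleName: keep every row of colData in its original order
# and fill metadata for unmatched samples with NA.
idx <- match(colData$sampleName, userMeta$sampleName)
extraCols <- setdiff(colnames(userMeta), "sampleName")
joined <- cbind(colData, userMeta[idx, extraCols, drop = FALSE])
rownames(joined) <- NULL
```

For demultiplexed data the same pattern would apply with barcode as the join key instead of sampleName. Using match() rather than merge() keeps the row order of colData, which matters because the rows have to stay aligned with the columns of the count matrix.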

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.

Show a summary per file
| File | Description |
| --- | --- |
| R/bambu.R | Adds sampleData param and passes it through quantification path; changes how combined colData is built. |
| R/bambu-assignDist.R | Extends assignReadClasstoTranscripts() signature to accept sampleData and uses new generateColData(). |
| R/bambu_utilityFunctions.R | Updates combineCountSes() to accept external colData list and rewrites generateColData() to join CSV metadata. |
| R/bambu-processReads_utilityConstructReadClasses.R | Formatting/brace cleanup in read-class construction. |
| R/bambu-processReads.R | Removes an unused warnings placeholder variable. |

@GoekeLab GoekeLab deleted a comment from Copilot AI Feb 17, 2026
Collaborator

@ch99l ch99l left a comment

After the requested code changes, code runs as expected.

@ch99l ch99l requested a review from SuiYue-2308 March 9, 2026 10:04
- mcols(readGrgList)$CB <- as.factor(mcols(readGrgList)$CB)

+ if(!isFALSE(demultiplexed)){
+ mcols(readGrgList)$CB <- as.factor(mcols(readGrgList)$CB)
Member

Should this conversion be done where CB is first defined?

sampleName = names(bam.file)[1]
)
}

Member

  • sampleName <- names(bam.file)[1] is already defined (line 176) but not used?
  • Simplify the code here? For example:
    metadata(se)$sampleData <- tibble(id = names(bam.file)[1], sampleName = id)
    if (demultiplexed) {
      metadata(se)$sampleData <- metadata(se)$sampleData %>%
        mutate(
          barcode = levels(mcols(readGrgList)$CB),
          id = paste(sampleName, barcode, sep = '_')
        )
    }
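One way to read this suggestion is as a single helper that builds the per-sample table for both the bulk and the demultiplexed case. The base R sketch below illustrates that idea under stated assumptions: buildSampleData() is a hypothetical name that does not exist in bambu, and the barcodes are passed in as a plain character vector rather than taken from mcols(readGrgList)$CB.

```r
# Hypothetical helper, not part of bambu: build the per-sample table.
buildSampleData <- function(sampleName, barcodes = NULL) {
    if (is.null(barcodes)) {
        # Bulk: one row, id equals the sample name.
        return(data.frame(id = sampleName, sampleName = sampleName,
                          stringsAsFactors = FALSE))
    }
    # Demultiplexed: one row per barcode, id is "<sampleName>_<barcode>".
    data.frame(
        id         = paste(sampleName, barcodes, sep = "_"),
        sampleName = sampleName,
        barcode    = barcodes,
        stringsAsFactors = FALSE
    )
}

bulk  <- buildSampleData("sampleA")
demux <- buildSampleData("sampleA", barcodes = c("AAAC", "CCCA"))
```

Keeping the id construction in one place means the bulk and demultiplexed branches cannot drift apart, and downstream code can rely on nrow() of the returned table for the number of columns in the count matrix.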

  j = as.numeric(names(unlist(counts.table))),
  x = unlist(counts.table),
- dims = c(nrow(eqClasses), length(metadata(readClassFile)$samples)))
+ dims = c(nrow(eqClasses), length(metadata(readClassFile)$sampleData$id)))
Member

Use nrow(metadata(readClassFile)$sampleData)? Or define sampleIds <-.... and then use length(sampleIds).

Comment on lines +314 to +320
colData <- tibble(
id = paste(metadata(readClassList)$sampleData$sampleName, metadata(readClassList)$sampleData$barcode, sep = '_'),
sampleName = metadata(readClassList)$sampleData$sampleName,
barcode = metadata(readClassList)$sampleData$barcode
)
} else{
- colData$sampleName <- samples
+ colData <- tibble(id = metadata(readClassList)$sampleData$sampleName, sampleName = metadata(readClassList)$sampleData$sampleName)
Member

Duplicated? This looks the same as the block above.

Member

@jonathangoeke jonathangoeke left a comment

See comments regarding code changes.

Labels

bambu-dev Feature is implemented in development branch

4 participants