Add sampleData argument to store multi-sample metadata (bulk, single-cell, spatial) in se object #542
lingminhao wants to merge 18 commits into devel_pre_v4 from
Pull request overview
Adds a sampleData argument to bambu() to allow users to attach per-sample (bulk) or per-sample/per-barcode (single-cell/spatial) metadata from CSV files into the output SummarizedExperiment’s colData.
Changes:
- Adds `sampleData` to `bambu()` and threads it into `assignReadClasstoTranscripts()`.
- Reworks `generateColData()` to left-join user-provided CSV metadata by `sampleName` (bulk) or `barcode` (demultiplexed).
- Changes multi-sample SE assembly to carry forward per-sample `colData` into the combined SE.
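A minimal base-R sketch of the left-join step described above. `joinSampleMetadata()` is a hypothetical helper, not bambu's `generateColData()`, and it uses `merge()` on plain data frames rather than whatever join the PR actually performs. The key column is assumed to be `sampleName` for bulk samples and `barcode` for demultiplexed samples, as stated in the overview. Rows with no matching metadata entry keep `NA`, and the original row and column order of `colData` is preserved.

```r
# Hypothetical helper (not part of bambu): left-join user metadata onto colData.
# key is assumed to be "sampleName" (bulk) or "barcode" (demultiplexed).
joinSampleMetadata <- function(colData, metadata, key) {
    if (!key %in% names(metadata)) {
        stop("metadata is missing the mandatory '", key, "' column")
    }
    if (anyDuplicated(metadata[[key]])) {
        stop("metadata has duplicated values in '", key, "'")
    }
    clash <- setdiff(intersect(names(colData), names(metadata)), key)
    if (length(clash) > 0) {
        stop("metadata columns clash with colData: ", paste(clash, collapse = ", "))
    }
    # merge() can reorder rows, so remember the original order first
    colData$.row <- seq_len(nrow(colData))
    joined <- merge(colData, metadata, by = key, all.x = TRUE, sort = FALSE)
    joined <- joined[order(joined$.row), , drop = FALSE]
    # keep the colData columns first, then the new metadata columns
    cols <- c(setdiff(names(colData), ".row"), setdiff(names(metadata), key))
    joined <- joined[, cols, drop = FALSE]
    rownames(joined) <- NULL
    joined
}

# Usage: s2 has no metadata row, so its condition stays NA
cd <- data.frame(id = c("s1", "s2", "s3"),
                 sampleName = c("s1", "s2", "s3"),
                 stringsAsFactors = FALSE)
meta <- data.frame(sampleName = c("s3", "s1"),
                   condition = c("treated", "control"),
                   stringsAsFactors = FALSE)
res <- joinSampleMetadata(cd, meta, "sampleName")
```

Rejecting duplicated keys and clashing column names up front is a design choice of this sketch: `merge()` would otherwise silently duplicate rows or rename columns with `.x`/`.y` suffixes.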
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| R/bambu.R | Adds sampleData param and passes it through quantification path; changes how combined colData is built. |
| R/bambu-assignDist.R | Extends assignReadClasstoTranscripts() signature to accept sampleData and uses new generateColData(). |
| R/bambu_utilityFunctions.R | Updates combineCountSes() to accept external colData list and rewrites generateColData() to join CSV metadata. |
| R/bambu-processReads_utilityConstructReadClasses.R | Formatting/brace cleanup in read-class construction. |
| R/bambu-processReads.R | Removes an unused warnings placeholder variable. |
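The `combineCountSes()` change has to stack per-sample `colData` tables whose metadata columns may differ between samples, for example when one sample has no metadata file. A minimal base-R sketch of that stacking step, assuming each per-sample `colData` is a plain `data.frame`. `combineColData()` is a hypothetical name and this is not the PR's implementation. Columns a sample lacks are filled with `NA`, so the union of all columns is kept.

```r
# Hypothetical helper (not part of bambu): row-bind per-sample colData tables,
# filling columns that a given sample lacks with NA.
combineColData <- function(colDataList) {
    allCols <- unique(unlist(lapply(colDataList, names)))
    padded <- lapply(colDataList, function(cd) {
        for (col in setdiff(allCols, names(cd))) cd[[col]] <- NA
        cd[, allCols, drop = FALSE]
    })
    combined <- do.call(rbind, padded)
    rownames(combined) <- NULL
    combined
}

# Usage: sample1 carries a "condition" column, sample2 only a "batch" column
cd1 <- data.frame(id = "sample1", sampleName = "sample1",
                  condition = "control", stringsAsFactors = FALSE)
cd2 <- data.frame(id = "sample2", sampleName = "sample2",
                  batch = 2L, stringsAsFactors = FALSE)
combined <- combineColData(list(cd1, cd2))
```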
65c11ec to 7d60435
ch99l left a comment
After the requested code changes, the code runs as expected.
```diff
- mcols(readGrgList)$CB <- as.factor(mcols(readGrgList)$CB)
+ if(!isFALSE(demultiplexed)){
+     mcols(readGrgList)$CB <- as.factor(mcols(readGrgList)$CB)
```
Should this be done when it is first defined?
```r
    sampleName = names(bam.file)[1]
)
}
```
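- `sampleName <- names(bam.file)[1]` is already defined but not used? (line 176)
- Simplify code here?

```r
# reuses sampleName <- names(bam.file)[1] defined earlier (line 176)
metadata(se)$sampleData <- tibble(id = sampleName, sampleName = sampleName)
if (!isFALSE(demultiplexed)) {
    # one row per barcode; sampleName is recycled across barcodes
    barcodes <- levels(mcols(readGrgList)$CB)
    metadata(se)$sampleData <- tibble(
        id = paste(sampleName, barcodes, sep = '_'),
        sampleName = sampleName,
        barcode = barcodes)
}
```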
```diff
  j = as.numeric(names(unlist(counts.table))),
  x = unlist(counts.table),
- dims = c(nrow(eqClasses), length(metadata(readClassFile)$samples)))
+ dims = c(nrow(eqClasses), length(metadata(readClassFile)$sampleData$id)))
```
Use `nrow(metadata(readClassFile)$sampleData)`? Or define `sampleIds <- ...` and then use `length(sampleIds)`.
```diff
  colData <- tibble(
      id = paste(metadata(readClassList)$sampleData$sampleName, metadata(readClassList)$sampleData$barcode, sep = '_'),
      sampleName = metadata(readClassList)$sampleData$sampleName,
      barcode = metadata(readClassList)$sampleData$barcode
  )
  } else{
-     colData$sampleName <- samples
+     colData <- tibble(id = metadata(readClassList)$sampleData$sampleName, sampleName = metadata(readClassList)$sampleData$sampleName)
```
Duplicated? This is the same as above.
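Both branches of the `colData` construction derive `id` and `sampleName` from `sampleData` and differ only in whether a `barcode` column exists. One possible way to collapse them into a single code path, sketched in base R under the assumption that `sampleData` is a data frame with a `sampleName` column and an optional `barcode` column. `buildColData()` is hypothetical and not part of the PR.

```r
# Hypothetical helper: one code path for bulk and demultiplexed samples.
# Assumes sampleData has a sampleName column and, if demultiplexed, a barcode column.
buildColData <- function(sampleData) {
    sampleName <- sampleData[["sampleName"]]
    if (!"barcode" %in% names(sampleData)) {
        # bulk: the sample name doubles as the column id
        return(data.frame(id = sampleName, sampleName = sampleName,
                          stringsAsFactors = FALSE))
    }
    barcode <- sampleData[["barcode"]]
    # demultiplexed: one column per barcode, id = sampleName_barcode
    data.frame(id = paste(sampleName, barcode, sep = "_"),
               sampleName = sampleName,
               barcode = barcode,
               stringsAsFactors = FALSE)
}

bulk <- buildColData(data.frame(sampleName = c("A", "B"), stringsAsFactors = FALSE))
sc <- buildColData(data.frame(sampleName = "A", barcode = c("AAAC", "TTTG"),
                              stringsAsFactors = FALSE))
```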
jonathangoeke left a comment
See comments regarding code changes.
This PR allows the user to provide sample-specific metadata using the `sampleData` argument.

Supported formats:
- Bulk: one `.csv` metadata file with a mandatory `sampleName` column is sufficient; each row then contains the metadata for one sample. Multiple `.csv` files (one per sample) are possible but not necessary.
- Single-cell/spatial: one `.csv` metadata file per sample, each containing a mandatory `barcode` column.

If a specific sample lacks metadata, users can simply pass `NA` at the corresponding index of the input vector (e.g., `c("metadata_sample1.csv", NA, "metadata_sample3.csv")`).

Users can define any additional metadata columns as needed in the metadata `.csv` file. The metadata is stored in the `colData` of the `se` `SummarizedExperiment` object.
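A minimal sketch of how a `sampleData` vector with `NA` entries could be read, assuming one path per input sample in the same order as the input BAM files. `readSampleData()` is a hypothetical helper, not bambu's code. `NA` entries yield `NULL`, so those samples would keep only the default `colData` columns, and each file is checked for its mandatory key column (`sampleName` for bulk, `barcode` for single-cell/spatial).

```r
# Hypothetical helper: read one metadata CSV per sample, allowing NA entries.
readSampleData <- function(paths, key = "sampleName") {
    lapply(paths, function(p) {
        if (is.na(p)) return(NULL)  # this sample has no metadata file
        md <- read.csv(p, stringsAsFactors = FALSE, check.names = FALSE)
        if (!key %in% names(md)) {
            stop("'", p, "' is missing the mandatory '", key, "' column")
        }
        md
    })
}

# Usage: temporary files stand in for metadata_sample1.csv / metadata_sample3.csv
f1 <- tempfile(fileext = ".csv")
f3 <- tempfile(fileext = ".csv")
write.csv(data.frame(sampleName = "sample1", condition = "control"), f1, row.names = FALSE)
write.csv(data.frame(sampleName = "sample3", condition = "treated"), f3, row.names = FALSE)
md <- readSampleData(c(f1, NA, f3))
```

For single-cell/spatial input the same helper would be called with `key = "barcode"`.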