
Filter Bismark coverage files in targeted sequencing mode #604

Merged
FelixKrueger merged 7 commits into nf-core:dev from FelixKrueger:fix/filter-coverage-targeted-sequencing
Apr 10, 2026

@FelixKrueger
Contributor

Summary

  • When --run_targeted_sequencing is enabled, only bedGraph files were filtered against the target BED file. The .cov.gz files from Bismark were published unfiltered and therefore still contained off-target CpGs.
  • Adds a BEDTOOLS_INTERSECT_COV process alias to filter .cov files identically to bedGraphs, so downstream tools (methylKit, bsseq, DSS) receive only on-target sites.
  • Only affects the Bismark aligner path (bwameth/MethylDackel doesn't produce .cov files).

Changes

  • subworkflows/local/targeted_sequencing/main.nf: Added BEDTOOLS_INTERSECT_COV alias, new ch_coverage input, intersect call, and coverage_filtered emit
  • workflows/methylseq/main.nf: Pass methylation_coverage from Bismark subworkflow to TARGETED_SEQUENCING
  • conf/modules/bedtools_intersect_cov.config: Config for aliased process (ext.suffix = 'targeted.cov', publishDir to methylation_coverage/)
  • conf/subworkflows/targeted_sequencing.config: Include new config

Test plan

  • Run with --aligner bismark --run_targeted_sequencing --target_regions_file <bed> and verify *.targeted.cov files appear in bismark/methylation_calls/methylation_coverage/
  • Verify all sites in filtered .cov fall within target regions (bedtools intersect -v should return 0 lines)
  • Run with --aligner bwameth --run_targeted_sequencing and verify no errors (empty coverage channel is handled gracefully)
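
The second check can be sketched in Python, assuming the filtered .cov file and the target BED have already been read into (chrom, start, end, ...) tuples. The helper name off_target_records and the example data are hypothetical. It returns the rows that bedtools intersect -v -a <filtered .cov> -b <target BED> would print, treating columns 2 and 3 as BED-style half-open coordinates, so an empty list corresponds to the expected 0 lines.

```python
from typing import List, Sequence, Tuple

# (chrom, start, end) with 0-based, half-open coordinates, as bedtools reads BED columns.
Interval = Tuple[str, int, int]


def off_target_records(records: List[Sequence], targets: List[Interval]) -> List[Sequence]:
    """Return the records that overlap no target (the rows `bedtools intersect -v` would print)."""

    def on_target(rec: Sequence) -> bool:
        chrom, start, end = rec[0], int(rec[1]), int(rec[2])
        # Half-open overlap: each interval must start before the other one ends.
        return any(chrom == t_chrom and start < t_end and t_start < end
                   for t_chrom, t_start, t_end in targets)

    return [rec for rec in records if not on_target(rec)]


# Hypothetical example: one target region and two filtered .cov-style records
# (chrom, start, end, methylation %, count methylated, count unmethylated).
targets = [("chr1", 100, 200)]
filtered = [("chr1", 150, 151, 80.0, 4, 1), ("chr1", 199, 200, 50.0, 1, 1)]
print(off_target_records(filtered, targets))  # []
```

A non-empty result lists exactly the sites that leaked through the filter, which is more useful for debugging than a bare line count.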

When run_targeted_sequencing is enabled, only bedGraph files were
filtered against the target regions BED file. The Bismark .cov.gz
files (used by downstream tools like methylKit, bsseq, and DSS)
were published unfiltered, containing off-target CpG sites.

This adds a second BEDTOOLS_INTERSECT call (via the process alias
BEDTOOLS_INTERSECT_COV) that filters .cov files the same way and
publishes them alongside the filtered bedGraphs. This only applies
to the Bismark aligner path, since bwameth/MethylDackel does not
produce .cov files.
FelixKrueger requested a review from a team as a code owner on April 8, 2026 12:31
FelixKrueger requested a review from sateeshperi on April 8, 2026 12:32
@nf-core-bot
Member

Warning

A newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.5.1.
Please update your pipeline to the latest version.

For more information on how to update your pipeline, please see the nf-core documentation and the Synchronisation documentation.

Bismark bedGraph files use 0-based, half-open coordinates for single-C
positions (e.g., chr1 10788 10789). When a CpG straddles a target
boundary (the C falls just outside the target and the G just inside),
a naive bedtools intersect drops it, because [10788, 10789) does not
overlap a target starting at 10789.
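
The boundary case can be checked with plain interval arithmetic. This minimal Python sketch uses the coordinates from the example above; the target end (10900) is a hypothetical value, and half_open_overlap is an illustrative helper, not part of the pipeline. It applies the half-open overlap rule that bedtools uses for BED intervals.

```python
def half_open_overlap(start_a: int, end_a: int, start_b: int, end_b: int) -> bool:
    """Two 0-based, half-open intervals [start, end) overlap iff each starts before the other ends."""
    return start_a < end_b and start_b < end_a


# Single-C bedGraph entry for the C of the CpG from the example.
c_start, c_end = 10788, 10789
# Hypothetical target region whose first base is the G of that CpG.
t_start, t_end = 10789, 10900

naive = half_open_overlap(c_start, c_end, t_start, t_end)         # fails: 10789 < 10789 is False
extended = half_open_overlap(c_start, c_end + 1, t_start, t_end)  # passes: 10789 < 10790 is True
print(naive, extended)  # False True
```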

Replace the generic BEDTOOLS_INTERSECT call with a local
FILTER_BEDGRAPH_TARGETS process that:
  1. Extends each bedGraph entry by 1 bp on the right (+1 to end
     coordinate) to represent the full CpG dinucleotide
  2. Runs bedtools intersect against the target regions
  3. Restores the original single-base coordinates

This ensures that the filtered bedGraph and coverage outputs contain
the same set of on-target CpGs.

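
A minimal Python sketch of the three steps, operating on in-memory tuples instead of files. The name filter_bedgraph_targets mirrors the process name but the function itself is hypothetical, and the steps are fused per record for brevity rather than run as separate awk and bedtools commands.

```python
from typing import List, Tuple

# bedGraph record: (chrom, start, end, value) with 0-based, half-open coordinates.
Record = Tuple[str, int, int, float]
# Target region: (chrom, start, end), also 0-based, half-open (BED).
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Half-open overlap on the same chromosome: each interval starts before the other ends."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def filter_bedgraph_targets(records: List[Record], targets: List[Interval]) -> List[Record]:
    """Keep bedGraph records whose CpG (the C plus the next base) touches any target."""
    kept = []
    for chrom, start, end, value in records:
        # Step 1: extend the end coordinate by 1 bp to cover the full dinucleotide.
        extended = (chrom, start, end + 1)
        # Step 2: test the extended interval against the targets.
        if any(overlaps(extended, t) for t in targets):
            # Step 3: keep the original single-base coordinates in the output.
            kept.append((chrom, start, end, value))
    return kept


# Hypothetical data: the boundary CpG from the commit message and one off-target site.
targets = [("chr1", 10789, 10900)]
records = [("chr1", 10788, 10789, 75.0), ("chr1", 10700, 10701, 20.0)]
print(filter_bedgraph_targets(records, targets))  # [('chr1', 10788, 10789, 75.0)]
```

Emitting the original tuple rather than the extended one is what keeps the published bedGraph at single-base resolution; the extension only influences which records pass.
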
Replaces BEDTOOLS_INTERSECT with FILTER_BEDGRAPH_TARGETS in all 5
targeted sequencing test snapshots to match the process rename in the
subworkflow.

The FILTER_BEDGRAPH_TARGETS process change and BEDTOOLS_INTERSECT_COV
addition alter version entries, output file lists, and content checksums.
This clears the snapshot file so that nf-test regenerates all entries from
the current pipeline output.

- Rename BEDTOOLS_INTERSECT to FILTER_BEDGRAPH_TARGETS in version entries
- Add BEDTOOLS_INTERSECT_COV version entry for Bismark tests
- Add .targeted.cov output files to Bismark test file lists
- Sort version keys alphabetically to match pipeline output order

def prefix = task.ext.prefix ?: "${meta.id}"
"""
# Extend end coordinate by 1 bp so the interval covers the full CpG dinucleotide
awk 'BEGIN{OFS="\\t"} {\$3=\$3+1; print}' ${bedgraph} > extended.bedGraph

@ryanckelly44 (Apr 9, 2026)

This produces an empty bedGraph file if a gzipped file is passed to awk.
Change to:
zcat ${bedgraph} | awk 'BEGIN{OFS="\\t"} {\$3=\$3+1; print}' > extended.bedGraph

The Bismark bedGraph files are gzipped (.bedGraph.gz), so the awk
preprocessing needs zcat to decompress them first. Also skip track
header lines to avoid corrupting the bedGraph format.
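
In Python, the same decompression and header handling might look like this sketch. It assumes that the only non-data lines in the bedGraph start with "track"; extend_bedgraph_lines and read_gzipped_bedgraph are hypothetical helpers. gzip.open in text mode plays the role of zcat here.

```python
import gzip
import io
from typing import Iterable, List


def extend_bedgraph_lines(lines: Iterable[str]) -> List[str]:
    """Return data lines with the end coordinate (column 3) extended by 1 bp.

    Lines starting with "track" are treated as headers and skipped, so the +1
    is never applied to a header line.
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("track"):
            continue
        fields = line.split("\t")
        fields[2] = str(int(fields[2]) + 1)
        out.append("\t".join(fields))
    return out


def read_gzipped_bedgraph(source) -> List[str]:
    """`source` may be a path or a binary file object; gzip.open accepts either."""
    # Text mode ("rt") decompresses and decodes in one step, like piping through zcat.
    with gzip.open(source, "rt") as handle:
        return extend_bedgraph_lines(handle)


# Example: a tiny in-memory .bedGraph.gz with a track header line.
raw = gzip.compress(b"track type=bedGraph\nchr1\t10788\t10789\t75\n")
print(read_gzipped_bedgraph(io.BytesIO(raw)))  # ['chr1\t10788\t10790\t75']
```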

Rebuild test snapshots from original with BEDTOOLS_INTERSECT renamed
to FILTER_BEDGRAPH_TARGETS and BEDTOOLS_INTERSECT_COV added for
Bismark tests. Snapshots will need regeneration via nf-test
--update-snapshot after confirming the code fix.

Reconstruct from original pre-PR snapshots with:
- BEDTOOLS_INTERSECT renamed to FILTER_BEDGRAPH_TARGETS
- BEDTOOLS_INTERSECT_COV added for Bismark tests
- Version entries ordered to match pipeline output (late-completing
  processes like BISMARK_ALIGN, FASTQC, MULTIQC placed after Workflow)
- .targeted.cov files added to Bismark file lists and checksums
  using MD5s from CI output
FelixKrueger merged commit d39c3de into nf-core:dev on Apr 10, 2026 (84 of 101 checks passed)
FelixKrueger deleted the fix/filter-coverage-targeted-sequencing branch on April 10, 2026 14:18

4 participants