
Inconsistent RUC estimates between samples with similar CIGAR-derived insertion profiles #85

Description

@antoniotescari

Hello and thank you for this useful tool!

I am investigating repeat expansion distributions in a cohort of ONT-sequenced samples and have noticed that, in multiple instances, the tool infers very different copy number values for samples whose insertion distribution profiles look nearly identical when inspected via CIGAR strings.

Example
Alignment:
minimap2 -a -x map-ont --MD -t 16 reference.fa sample.fastq -o sample.sam
Straglr:
straglr.py sample.bam hg38.fa sample --min_ins_size 5 --region regions.bed

At locus chr6:75866276, the VCFs for sample_1 and sample_2 report the same repeat unit ("GT") but different copy numbers (21.5 and 44.5, respectively):

  • Sample1: chr6 75866276 . T <CNV:TR> . PASS RUS_REF=GT;SVLEN=32;RN=1;RUS=GT;RUC=21.5;CIRUC=-2.0,1.0 GT:DP:AD 1:41:21
  • Sample2: chr6 75866276 . T <CNV:TR> . PASS RUS_REF=GT;SVLEN=32;RN=1;RUS=GT;RUC=44.5;CIRUC=-10.5,4.0 GT:DP:AD 1:32:23

When I inspect CIGAR-derived insertion lengths and repeat unit counts (GT/TG) at this locus, both samples show nearly identical distributions (see below). This is roughly consistent with Sample1's RUC=21.5 (considering the 16-copy offset from hg38), but for Sample2 a call of RUC=44.5 would imply a distribution shifted ~20–25 units higher.
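For reference, this is roughly how the CIGAR-derived insertions can be extracted. It is a minimal sketch, not what straglr does internally: it assumes plain SAM fields (POS, CIGAR, SEQ) as input, the function name `insertions_in_window` and the window bounds are hypothetical, and the `min_ins=5` default only mirrors the `--min_ins_size 5` setting used above.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")    # operations that advance the reference coordinate
QUERY_CONSUMING = set("MIS=X")  # operations that advance the position in SEQ


def insertions_in_window(pos, cigar, seq, start, end, min_ins=5):
    """Return inserted sequences anchored inside [start, end] (1-based).

    pos is the SAM POS column. An insertion is anchored at the reference
    coordinate of the base that follows it.
    """
    ref, qry = pos, 0
    found = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "I" and length >= min_ins and start <= ref <= end:
            found.append(seq[qry:qry + length])
        if op in REF_CONSUMING:
            ref += length
        if op in QUERY_CONSUMING:
            qry += length
    return found


# Toy read (not real data): 2 soft-clipped bases, 4 matches, a 6 bp insertion, 4 matches.
toy = insertions_in_window(100, "2S4M6I4M", "AAACGTGTGTGTACGT", 100, 110)
unit_counts = [ins.count("GT") for ins in toy]
```

Summing the insertion lengths per read and counting GT units in each inserted sequence gives the per-sample distributions being compared.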

Question
Is the RUC=44.5 call for Sample2 reliable, or should it be flagged as a miscall? Does it make sense to cross-check RUC against CIGAR-derived insertion lengths, or am I missing something?
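To make the cross-check concrete, here is a small sketch that converts each RUC into the net insertion length it would imply relative to hg38. It rests on an assumption I have not confirmed against the straglr documentation: that SVLEN is the reference span of the repeat in bp, so SVLEN / len(RUS_REF) = 32 / 2 = 16 reference copies. The helper names are hypothetical.

```python
def parse_info(info):
    """Split a VCF INFO string into a key -> value dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def expected_insertion_bp(info):
    """Net inserted bases implied by RUC relative to the reference.

    ASSUMPTION: SVLEN is the reference repeat span in bp.
    """
    f = parse_info(info)
    ref_copies = int(f["SVLEN"]) / len(f["RUS_REF"])
    return (float(f["RUC"]) - ref_copies) * len(f["RUS"])


sample1 = "RUS_REF=GT;SVLEN=32;RN=1;RUS=GT;RUC=21.5;CIRUC=-2.0,1.0"
sample2 = "RUS_REF=GT;SVLEN=32;RN=1;RUS=GT;RUC=44.5;CIRUC=-10.5,4.0"

print(expected_insertion_bp(sample1))  # 11.0
print(expected_insertion_bp(sample2))  # 57.0
```

Under that assumption, Sample1's call implies about 11 bp of net insertion per read and Sample2's about 57 bp, which is the gap I would expect to see in the CIGAR-derived lengths if both calls were correct.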

I have attached the straglr TSV output and the alignment files for both samples.

Thank you.

sample_1.tsv
sample_2.tsv

sample_1.txt
sample_2.txt

[Image: CIGAR-derived insertion length and repeat unit count distributions for both samples]
