Inconsistent RUC estimates between samples with similar CIGAR-derived insertion profiles

Hello and thank you for this useful tool!

I am investigating repeat expansion distributions in a cohort of ONT-sequenced samples and have noticed that, in multiple instances, the tool infers very different copy number values for samples whose insertion distribution profiles look nearly identical when inspected via CIGAR strings.

**Example** 
Alignment:
`minimap2 -a -x map-ont --MD -t 16 reference.fa sample.fastq -o sample.sam`
Straglr:
`straglr.py sample.bam hg38.fa sample --min_ins_size 5 --region regions.bed`

At locus chr6:75866276, the VCFs for sample_1 and sample_2 report the same repeat unit ("GT") but different copy numbers (21.5 and 44.5, respectively):

- Sample1: `chr6	75866276	.	T	<CNV:TR>	.	PASS	RUS_REF=GT;SVLEN=32;RN=1;RUS=GT;RUC=21.5;CIRUC=-2.0,1.0	GT:DP:AD	1:41:21`
- Sample2: `chr6	75866276	.	T	<CNV:TR>	.	PASS	RUS_REF=GT;SVLEN=32;RN=1;RUS=GT;RUC=44.5;CIRUC=-10.5,4.0	GT:DP:AD	1:32:23`
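To make the size of the discrepancy concrete, here is a minimal sketch that parses the INFO column of the two records above and converts each RUC into the net insertion it would imply. It assumes that `SVLEN=32` is the length of the reference tract, so that hg38 carries 32 / 2 = 16 GT copies; that reading of `SVLEN` is my assumption and is not confirmed by the Straglr documentation. The helper names `parse_info` and `implied_net_insertion` are made up for illustration.

```python
# The two VCF records from this issue, tab-delimited as in the original files.
SAMPLE1 = ("chr6\t75866276\t.\tT\t<CNV:TR>\t.\tPASS\t"
           "RUS_REF=GT;SVLEN=32;RN=1;RUS=GT;RUC=21.5;CIRUC=-2.0,1.0\t"
           "GT:DP:AD\t1:41:21")
SAMPLE2 = ("chr6\t75866276\t.\tT\t<CNV:TR>\t.\tPASS\t"
           "RUS_REF=GT;SVLEN=32;RN=1;RUS=GT;RUC=44.5;CIRUC=-10.5,4.0\t"
           "GT:DP:AD\t1:32:23")


def parse_info(record):
    """Return the INFO column (8th field) of a VCF record as a dict of strings."""
    fields = record.rstrip("\n").split("\t")
    info = {}
    for item in fields[7].split(";"):
        key, _, value = item.partition("=")
        info[key] = value
    return info


def implied_net_insertion(info):
    """Return (reference copies, extra copies, extra bp) implied by RUC.

    Assumption: SVLEN is the reference tract length in bp.
    """
    unit_len = len(info["RUS"])
    ref_copies = int(info["SVLEN"]) / unit_len
    extra_copies = float(info["RUC"]) - ref_copies
    return ref_copies, extra_copies, extra_copies * unit_len


for name, record in (("Sample1", SAMPLE1), ("Sample2", SAMPLE2)):
    ref_copies, extra_copies, extra_bp = implied_net_insertion(parse_info(record))
    print(f"{name}: ref={ref_copies} copies, extra={extra_copies} copies, "
          f"net insertion={extra_bp} bp")
```

Under that assumption, Sample1's call corresponds to a net insertion of 11 bp and Sample2's to 57 bp, a gap of 23 repeat units.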

When inspecting CIGAR-derived insertion lengths and repeat unit counts (GT/TG) at this locus, both samples show nearly identical distributions (see below). This is roughly consistent with Sample1's RUC=21.5 (accounting for the 16-copy offset from the hg38 reference), but for Sample2 a call of RUC=44.5 would imply a distribution shifted ~20–25 units higher.
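For reference, the CIGAR-based cross-check I describe works roughly like the sketch below. It is a simplified, standard-library-only illustration, not the exact script I ran and not Straglr's method: it sums the insertion (`I`) operations of a read that fall inside a reference window and adds them to the reference copy number. The function names, the window coordinates, and the example CIGAR strings are all hypothetical; the `min_ins` parameter only loosely mirrors `--min_ins_size`.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def inserted_bp_in_window(pos, cigar, win_start, win_end, min_ins=1):
    """Sum insertion lengths of one read that fall inside a reference window.

    pos is the 1-based leftmost mapping position (SAM POS); the window is
    1-based and inclusive. Insertions shorter than min_ins are ignored.
    """
    ref = pos  # reference coordinate of the next reference-consuming base
    total = 0
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op == "I":
            # An insertion sits immediately before reference position `ref`.
            if n >= min_ins and win_start <= ref <= win_end:
                total += n
        elif op in "M=XDN":
            ref += n  # these operations consume reference bases
        # S, H and P consume no reference bases
    return total


def estimated_copies(ref_copies, inserted_bp, unit_len):
    """Copy number implied by the reference tract plus the inserted sequence."""
    return ref_copies + inserted_bp / unit_len


# Hypothetical read: a 10 bp insertion landing at reference position 150.
ins = inserted_bp_in_window(100, "50M10I50M", 140, 160)
print(ins, estimated_copies(16, ins, 2))
```

This ignores deletions and insertions that the aligner places just outside the window, so it is only a rough consistency check against the reported RUC.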

**Question**
Is the RUC=44.5 call for Sample2 reliable, or should it be flagged as a miscall? Does it make sense to cross-check RUC against CIGAR-derived insertion lengths, or am I missing something?

I have attached the Straglr TSV output and alignment files for both samples.

Thank you.

[sample_1.tsv](https://github.com/user-attachments/files/29010396/sample_1.tsv)
[sample_2.tsv](https://github.com/user-attachments/files/29010395/sample_2.tsv)

[sample_1.txt](https://github.com/user-attachments/files/29010561/sample_1.txt)
[sample_2.txt](https://github.com/user-attachments/files/29010560/sample_2.txt)



<img width="1733" height="1551" alt="Image" src="https://github.com/user-attachments/assets/b7ddd165-4b73-4ba3-a6df-480989eec74a" />

Inconsistent RUC estimates between samples with similar CIGAR-derived insertion profiles #85
