Hi! Thanks for making this amazing tool.
I noticed two problems:
- ultraplex does not remove the barcode from the other read (when the insert is short)
- ultraplex removes the space between the read ID and the rest of the header, so when I try to trim the barcode from the other read afterwards, cutadapt cannot verify that the two reads are paired (the rest of the header is concatenated onto the read ID)
In particular:
problem 1:
For a library with barcode ATGCGCAG, the reverse reads in the output still contain the reverse complement of the barcode at the 3' end (CTGCGCAT is the reverse complement of ATGCGCAG):
zcat ultraple_demux_somthing_Rev.fastq.gz | grep -v '@' | grep CTGCGCAT
GACGTCGTGCTCTCCCCCTGCGCAT
CCCCCCGCGGGGGCGCGCCGGTTCTGCGCAT
CTCCCGGGGCTACGCCTGTCTGAGCGTCGCTATCTGCGCAT
GAAAGTCGGAGACCTGCGCAT
TCCCGGGGCTACGCCTGTCTGAGCGTCGCTTTACTGCGCAT
GGCGGCGTCCGGTGAGCTCTCGCTGGCCTTCTGCGCAT
GGCTACGCCTGTCTGAGCGTCGCTTGTCTGCGCAT
GTCCTGGGAAACGGGGCGCGGCCGGCCCTGCGCAT
TGGTGACCACGGGTGACGGGGAAGCTGCGCAT
GACCCGCCGGGCAGCTTCCGGGAAACCAAAATCTGCGCATA
GGTTCGATTCCGGTTGCGTCCACCCACTGCGCAT
These reads will be hard to map because of the leftover barcode sequence.
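As a stopgap, the leftover barcode could be stripped after demultiplexing. The following is only a rough sketch of the idea, not how ultraplex works internally: `reverse_complement`, `trim_3prime_barcode` and the `max_tail` parameter are hypothetical names, and it assumes an exact match of the reverse-complemented barcode within a couple of bases of the 3' end.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_3prime_barcode(seq: str, qual: str, barcode: str, max_tail: int = 2):
    """Remove the reverse-complemented barcode (and anything after it) from the 3' end.

    Only an exact match is considered, and only if at most `max_tail` bases
    follow it; otherwise the read is returned unchanged.
    """
    rc = reverse_complement(barcode)
    idx = seq.rfind(rc)
    if idx == -1 or len(seq) - (idx + len(rc)) > max_tail:
        return seq, qual
    # Trim sequence and quality string together so they stay the same length.
    return seq[:idx], qual[:idx]


if __name__ == "__main__":
    read = "GAAAGTCGGAGACCTGCGCAT"  # one of the reads listed above
    print(trim_3prime_barcode(read, "F" * len(read), "ATGCGCAG"))
```

The `max_tail` tolerance is there because some reads carry an extra base after the barcode (for example the one ending in CTGCGCATA); reads with mismatches in the barcode would need a proper adapter trimmer.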
problem 2:
In the output, the read ID is concatenated with the rest of the header, which causes a problem with cutadapt:
# cutadapt error
ERROR: Error in sequence file at unknown line: Reads are improperly paired. Read name 'A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT1:N:0:ATTACTCG+AGGATAGGrbc:' in file 1 does not match 'A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT2:N:0:ATTACTCG+AGGATAGGrbc:' in file 2.
The check should compare A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT == A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT, but because the space is gone, cutadapt treats the whole header as the read name and finds 'A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT1:N:0:ATTACTCG+AGGATAGGrbc:' != 'A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT2:N:0:ATTACTCG+AGGATAGGrbc:'
output from ultraplex:
(base) [hsher@tscc-login1 fastqs]$ zcat ultraplex_demux_PUM2_Fwd.fastq.gz | head
@A00475:502:HJLHHDRX2:1:2101:1199:1000:NCCCATTCAG2:N:0:ATTACTCG+AGGCTATArbc:
AGTTGGGGAAATCGCAGGGGTCAGCACATCCGGAGTGCAATG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF
Notice that the space between @A00475:502:HJLHHDRX2:1:2101:1199:1000:NCCCATTCAG and 2:N:0:ATTACTCG+AGGCTATArbc: is gone.
input to ultraplex:
(base) [hsher@tscc-login1 fastqs]$ zcat all.Tr.umi.fq2.trim.gz | head
@A00475:502:HJLHHDRX2:1:2101:1090:1000:NCACCACCTG 2:N:0:ATTACTCG+AGGCTATA
CGTAGTAAACTCTCCCCGGGGCTCCCGCCGGCTTCTCCGGG
+
FFFFFFFF:FFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFF
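Until this is fixed, the headers could be repaired before running cutadapt. Below is a minimal sketch, assuming the header ends in the usual Illumina block `<read>:<filtered Y/N>:<control number>:<index>` and that this pattern does not occur earlier in the read ID; `restore_space` and `names_match` are hypothetical helper names, not part of ultraplex or cutadapt.

```python
import re

# Start of the Illumina block that normally follows the space:
# <read number 1 or 2>:<is filtered Y/N>:<control number>:
COMMENT_START = re.compile(r"([12]:[YN]:\d+:)")


def restore_space(header: str) -> str:
    """Re-insert a space before the first '<read>:<Y/N>:<number>:' block."""
    if " " in header:
        return header  # header already has its separator
    return COMMENT_START.sub(r" \1", header, count=1)


def names_match(header1: str, header2: str) -> bool:
    """Compare two headers on the part before the first whitespace."""
    return header1.split()[0] == header2.split()[0]


if __name__ == "__main__":
    r1 = "@A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT1:N:0:ATTACTCG+AGGATAGGrbc:"
    r2 = "@A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT2:N:0:ATTACTCG+AGGATAGGrbc:"
    print(names_match(r1, r2))                                # headers as written by ultraplex
    print(names_match(restore_space(r1), restore_space(r2)))  # after repair
```

With the space restored, the part before the whitespace is identical for both mates, which is what the pairing check needs.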