Fixing Assembly Problems
Assembling paired-end reads that do not overlap
The typical way to assemble paired-end reads is de novo assembly using the align subcommand of AssemblePairs.py. However, some sequences with long CDR3 regions may fail to assemble because the overlap between the mate-pairs is insufficient or completely absent. The reference or sequential subcommands can assemble mate-pairs that do not overlap by using the ungapped V-segment reference sequences as a guide.
To handle such sequences in two separate steps, a normal align command would be performed first. The --failed argument is added so that the reads failing de novo alignment are output to separate files:
AssemblePairs.py align -1 reads-1.fastq -2 reads-2.fastq --rc tail \
--coord illumina --failed --outname align
Then the files labeled assemble-fail, along with the ungapped V-segment reference sequences (-r vref.fasta), would be input into the reference subcommand of AssemblePairs.py:
AssemblePairs.py reference -1 align-1_assemble-fail.fastq -2 align-2_assemble-fail.fastq \
--rc tail -r vref.fasta --coord illumina --outname ref
This will result in two separate assemble-pass files, one from each step. You may process them separately or concatenate them into a single file:
cat align_assemble-pass.fastq ref_assemble-pass.fastq > merged_assemble-pass.fastq
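After concatenating, it can be worth confirming that no records were lost. The following is a minimal sketch, not part of pRESTO: the helper names count_fastq_records and check_merged_count are hypothetical, and it assumes the standard four-line FASTQ record layout with unwrapped sequence lines.

```shell
# Count records in a FASTQ file.
# Assumption: four lines per record (header, sequence, '+', quality).
count_fastq_records() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Compare the merged record count against the sum of the two inputs.
# Arguments: <align pass file> <reference pass file> <merged file>
check_merged_count() {
    local n_align n_ref n_merged
    n_align=$(count_fastq_records "$1")
    n_ref=$(count_fastq_records "$2")
    n_merged=$(count_fastq_records "$3")
    if [ "$n_merged" -eq $(( n_align + n_ref )) ]; then
        echo "OK: $n_merged records"
    else
        echo "MISMATCH: $n_align + $n_ref != $n_merged" >&2
        return 1
    fi
}
```

With the file names used above, the check would be called as check_merged_count align_assemble-pass.fastq ref_assemble-pass.fastq merged_assemble-pass.fastq.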
Alternatively, if you intend to process them together, you may simplify this by performing both steps with the sequential subcommand, which attempts de novo assembly first and falls back to reference-guided assembly if de novo assembly fails:
AssemblePairs.py sequential -1 reads-1.fastq -2 reads-2.fastq --rc tail \
--coord illumina -r vref.fasta
Note
The sequences output by the reference or sequential subcommands may contain a spacer of Ns, of the appropriate length, between any mate-pairs that do not overlap. The --fill argument can be specified to force AssemblePairs.py to insert the germline sequence into the missing positions, but this should be used with caution as the inserted sequence may not be biologically correct.
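To get a rough sense of how many assembled reads carry such a spacer, the sequence lines can be scanned for runs of N. This is a sketch under stated assumptions rather than a pRESTO feature: count_n_spacer_reads is a hypothetical helper, it assumes four-line FASTQ records, and it cannot distinguish a spacer from Ns produced by ambiguous base calls, so a minimum run length is offered as an optional second argument.

```shell
# Count FASTQ records whose sequence contains a run of at least
# <min_run> consecutive N characters (default: 1).
# Arguments: <fastq file> [min_run]
count_n_spacer_reads() {
    awk -v min_run="${2:-1}" '
        NR % 4 == 2 {                      # sequence line of each record
            bases = toupper($0)
            longest = 0; run = 0
            for (i = 1; i <= length(bases); i++) {
                if (substr(bases, i, 1) == "N") {
                    run++
                    if (run > longest) longest = run
                } else {
                    run = 0
                }
            }
            if (longest >= min_run) count++
        }
        END { print count + 0 }            # print 0 when nothing matched
    ' "$1"
}
```

For example, count_n_spacer_reads ref_assemble-pass.fastq 3 would report the number of records with at least three consecutive Ns in the output of the reference step shown earlier.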