Isotype and Primer Annotations

Assigning isotype annotations from the constant region sequence

MaskPrimers.py is usually used to remove primer regions and annotate sequences with primer identifiers. However, it can be used in any case where you need to align a set of short sequences against the reads. One such alternative use is when you either do not know the C-region primer sequences or do not trust the primer region to provide an accurate isotype assignment.

If you build a FASTA file containing the reverse-complement of short sequences from the front of CH-1, then you can annotate the reads with these sequences in the same way you would with C-region-specific primers:

MaskPrimers.py align -s reads.fastq -p IGHC.fasta --maxlen 100 --maxerror 0.3 \
    --mode cut --revpr --pf C_CALL

where --revpr tells MaskPrimers.py to reverse-complement the “primer” sequences and look for them at the end of the reads, --maxlen 100 restricts the search to the last 100 bp, --maxerror 0.3 allows for up to 30% mismatches, and -p IGHC.fasta specifies the file containing the CH-1 sequences. The name of the C-region will be added to the sequence headers as the C_CALL annotation, where the field name is specified by the --pf argument. An example CH-1 sequence file would look like:

>IGHD
CTGATATGATGGGGAACACATCCGGAGCCTTGGTGGGTGC
>IGHM
AGGAGACGAGGGGGAAAAGGGTTGGGGCGGATGCACTCCC
>IGHG
AGGGYGCCAGGGGGAAGACSGATGGGCCCTTGGTGGAAGC
>IGHA
MGAGGCTCAGCGGGAAGACCTTGGGGCTGGTCGGGGATGC
>IGHE
AGCGGGTCAAGGGGAAGACGGATGGGCTCTGTGTGGAGGC
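
Several of the entries above contain IUPAC ambiguity codes (Y, S, M), so a plain A/C/G/T swap is not enough when preparing or checking such a file. The following is a minimal sketch, not part of pRESTO, of reverse-complementing sequences with ambiguity codes and formatting them as FASTA records. The helper names reverse_complement and to_fasta are hypothetical, and the only sequence used is the IGHG entry copied from the listing above.

```python
# Sketch only: these helpers are illustrative and are not part of pRESTO.

# Complement table covering the IUPAC nucleotide codes
# (R<->Y, K<->M, B<->V, D<->H; S, W and N are their own complements).
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse-complement of a DNA sequence, honoring IUPAC codes."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text, one sequence per line."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# IGHG entry as it appears in the example file above.
ighg_in_file = "AGGGYGCCAGGGGGAAGACSGATGGGCCCTTGGTGGAAGC"

# Flip the entry to the opposite strand, then flip it back as a sanity check.
ighg_flipped = reverse_complement(ighg_in_file)
assert reverse_complement(ighg_flipped) == ighg_in_file

print(to_fasta([("IGHG", ighg_in_file)]), end="")
```

Applying reverse_complement twice returns the original sequence, which is a quick way to confirm that every ambiguity code in a file is handled by the table.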

Download IGHC.fasta
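
Once the reads carry a C_CALL annotation, a quick tally of isotypes is a useful sanity check. The sketch below assumes the pRESTO header convention of an identifier followed by pipe-delimited FIELD=value pairs; the functions parse_header and count_isotypes are hypothetical helpers, and the headers are made-up examples rather than real output.

```python
# Sketch only: assumes headers of the form ID|FIELD=value|FIELD=value.
from collections import Counter


def parse_header(header):
    """Split a header line into its sequence ID and a dict of annotations."""
    fields = header.strip().lstrip("@>").split("|")
    annotations = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
    return fields[0], annotations


def count_isotypes(headers, field="C_CALL"):
    """Count reads per isotype; reads lacking the field are 'unassigned'."""
    counts = Counter()
    for header in headers:
        _, annotations = parse_header(header)
        counts[annotations.get(field, "unassigned")] += 1
    return counts


# Made-up headers for illustration; real headers come from the output reads.
example_headers = [
    "@read1|C_CALL=IGHM",
    "@read2|C_CALL=IGHG",
    "@read3|C_CALL=IGHM",
    "@read4",
]

for isotype, n in count_isotypes(example_headers).most_common():
    print(isotype, n)
```

The field argument defaults to C_CALL to match the --pf value used in the command above; pass a different name if another field was chosen.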