Isotype and Primer Annotations

Assigning isotype annotations from the constant region sequence

MaskPrimers.py is usually used to remove primer regions and annotate sequences with primer identifiers. However, it can be used for any other case where you need to align a set of short sequences against the reads. One example of an alternate use is where you either do not know the C-region primer sequences or do not trust the primer region to provide an accurate isotype assignment.

If you build a FASTA file containing the reverse-complement of short sequences from the front of CH-1, then you can annotate the reads with these sequence in the same way you would C-region specific primers:

MaskPrimers.py align -s reads.fastq -p IGHC.fasta --maxlen 100 --maxerror 0.3 \
    --mode cut --revpr --pf C_CALL

where --revpr tells MaskPrimers.py to reverse-complement the “primer” sequences and look for them at the end of the reads, --maxlen 100 restricts the search to the last 100 bp, --maxerror 0.3 allows for up to 30% mismatches, and -p IGHC.fasta specifies the file containing the CH-1 sequences. The name of the C-region will be added to the sequence headers as the C_CALL annotation, where the field name is specified by the --pf argument. An example CH-1 sequence file would look like:

>IGHD
CTGATATGATGGGGAACACATCCGGAGCCTTGGTGGGTGC
>IGHM
AGGAGACGAGGGGGAAAAGGGTTGGGGCGGATGCACTCCC
>IGHG
AGGGYGCCAGGGGGAAGACSGATGGGCCCTTGGTGGAAGC
>IGHA
MGAGGCTCAGCGGGAAGACCTTGGGGCTGGTCGGGGATGC
>IGHE
AGCGGGTCAAGGGGAAGACGGATGGGCTCTGTGTGGAGGC

Download IGHC.fasta

See also

Constant region reference sequences may be downloaded from IMGT and the sequence headers can be reformated using the imgt subcommand of ConvertHeaders.py. Note, you may need to clean-up the reference sequences a bit before running ConvertHeaders.py if you receive an error about duplicate sequence names (e.g., remove duplicate allele with different artificial splicing). To cut and reverse-complement the constant region sequences, use something like seqmagick.