pRESTO - The REpertoire Sequencing TOolkit
pRESTO is a toolkit for processing raw reads from high-throughput sequencing of lymphocyte repertoires.
Dramatic improvements in high-throughput sequencing technologies now enable large-scale characterization of lymphocyte receptor repertoires, defined as the collection of trans-membrane antigen-receptor proteins located on the surface of T and B lymphocytes. The REpertoire Sequencing TOolkit (pRESTO) is a suite of utilities that handles all stages of sequence processing prior to germline segment assignment, and it supports both single reads and paired-end reads. It includes features for quality control, primer masking, annotation of reads with sequence-embedded barcodes, generation of unique molecular identifier (UMI) consensus sequences, assembly of paired-end reads, and identification of duplicate sequences. Numerous options for sequence sorting, sampling, and conversion operations are also included.
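To make two of the stages named above concrete, the following is a minimal, self-contained Python sketch of quality control and duplicate identification. It is not pRESTO's implementation or API: the function names (`mean_phred`, `filter_by_quality`, `collapse_duplicates`), the Phred+33 quality encoding, and the threshold of 20 are all assumptions chosen for illustration.

```python
from collections import Counter


def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 is an assumption)."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def filter_by_quality(reads, min_mean_quality=20):
    """Keep (sequence, quality) pairs whose mean Phred score meets the threshold."""
    return [
        (seq, qual)
        for seq, qual in reads
        if qual and mean_phred(qual) >= min_mean_quality
    ]


def collapse_duplicates(sequences):
    """Collapse exact duplicates into (sequence, count) pairs, most frequent first."""
    return Counter(sequences).most_common()


# Hypothetical toy reads: (sequence, quality string).
reads = [
    ("ACGTACGT", "IIIIIIII"),  # every base at Phred 40
    ("ACGTACGT", "IIIIIIII"),  # exact duplicate of the first read
    ("ACGTTCGT", "########"),  # every base at Phred 2, should be dropped
    ("GGGTACGT", "55555555"),  # every base at Phred 20, kept at the boundary
]

passed = filter_by_quality(reads, min_mean_quality=20)
unique = collapse_duplicates(seq for seq, _ in passed)

for seq, count in unique:
    print(seq, count)
```

The sketch only collapses exact duplicates and ignores annotations, ambiguous bases, and UMIs, all of which a real repertoire workflow would need to account for.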
- Manipulating Annotations
- Miscellaneous Tasks
- Importing data from SRA, ENA or GenBank into pRESTO
- Removing junk sequences
- Reducing file size for submission to IMGT/HighV-QUEST
- Sampling and subsetting FASTA and FASTQ files
- Dealing with insufficient UMI diversity
- Dealing with misaligned V-segment primers and indels in UMI groups
- Assembling paired-end reads that do not overlap
- Assigning isotype annotations from the constant region sequence
- Estimating sequencing and PCR error rates with UMI data