presto.Sequence

presto.Sequence.calculateDiversity(seq_list, score_dict=getDNAScoreDict())

Determine the average pairwise error rate for a list of sequences

Parameters:

seq_list – List of SeqRecord objects to score
score_dict – Optional dictionary of alignment scores as {(char1, char2): score}

Returns:

Average pairwise error rate for the list of sequences

Return type:

float
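The averaging can be illustrated on plain aligned strings. This is a minimal sketch, not the presto implementation. The helpers `pair_error` and `average_pairwise_error` are hypothetical, and simple character identity stands in for a `score_dict` lookup.

```python
from itertools import combinations


def pair_error(seq1: str, seq2: str) -> float:
    """Fraction of aligned positions whose characters differ (identity scoring)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq1, seq2))
    return 1.0 - matches / len(seq1)


def average_pairwise_error(seqs: list) -> float:
    """Mean of pair_error over every unordered pair of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(pair_error(a, b) for a, b in pairs) / len(pairs)


print(average_pairwise_error(["ACGT", "ACGA", "ACGT"]))
```

Here two of the three pairs differ at one of four positions, so the mean error is 0.5 / 3.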

presto.Sequence.calculateSetError(seq_list, ref_seq, ignore_chars=['n', 'N'], score_dict=getDNAScoreDict())

Counts the occurrence of nucleotide mismatches from a reference in a set of sequences

Parameters:

seq_list – list of SeqRecord objects with aligned sequences.
ref_seq – SeqRecord object containing the reference sequence to match against.
ignore_chars – list of characters to exclude from mismatch counts.
score_dict – optional dictionary of alignment scores as {(char1, char2): score}.

Returns:

error rate for the set.

Return type:

float
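The sketch below shows the mismatch count on plain strings. It assumes that any position where either character is in `ignore_chars` is skipped, and that the rate is mismatches divided by compared positions. `set_error` is a hypothetical name, and presto's exact normalization may differ.

```python
def set_error(seqs: list, ref: str, ignore_chars: tuple = ("n", "N")) -> float:
    """Mismatch rate of aligned sequences against a reference sequence."""
    mismatches = compared = 0
    for seq in seqs:
        for base, ref_base in zip(seq, ref):
            # Skip positions where either sequence holds an ignored character
            if base in ignore_chars or ref_base in ignore_chars:
                continue
            compared += 1
            if base != ref_base:
                mismatches += 1
    return mismatches / compared if compared else 0.0


print(set_error(["ACGT", "ACNA"], "ACGT"))
```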

presto.Sequence.checkSeqEqual(seq1, seq2, ignore_chars={'-', '.', 'N', 'n'})

Determine if two sequences are equal, excluding missing positions

Parameters:

seq1 – SeqRecord object
seq2 – SeqRecord object
ignore_chars – Set of characters to ignore

Returns:

True if the sequences are equal

Return type:

bool
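Here is a minimal sketch of the comparison on equal-length strings. `seqs_equal` is a hypothetical helper. Positions where either sequence holds a character from `ignore_chars` are simply skipped.

```python
def seqs_equal(seq1: str, seq2: str, ignore_chars: frozenset = frozenset("-.Nn")) -> bool:
    """True if the sequences agree at every position where neither is missing."""
    return all(
        a == b
        for a, b in zip(seq1, seq2)
        if a not in ignore_chars and b not in ignore_chars
    )


print(seqs_equal("ACGTN", "AC-TA"))  # True: the gap and the N positions are skipped
print(seqs_equal("ACGT", "ACGA"))    # False: the last position differs
```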

presto.Sequence.compilePrimers(primers)

Translates IUPAC ambiguous nucleotide characters to regular expressions and compiles them

Parameters:

primers – Dictionary of sequences to translate

Returns:

Dictionary of compiled regular expressions

Return type:

dict

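The translation from IUPAC codes to regular expressions can be sketched as follows. The sketch assumes each ambiguity code becomes a character class of the bases it represents. `IUPAC_REGEX` and `compile_primers` are hypothetical names. The exact classes presto builds may differ, for example in whether `N` also matches a literal `N`.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression character classes
IUPAC_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def compile_primers(primers: dict) -> dict:
    """Translate each IUPAC primer sequence into a compiled regular expression."""
    return {
        name: re.compile("".join(IUPAC_REGEX[c] for c in seq.upper()))
        for name, seq in primers.items()
    }


regex = compile_primers({"P1": "ACRN"})
print(bool(regex["P1"].fullmatch("ACGT")))  # True: R matches G, N matches T
```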
presto.Sequence.consensusUnify(data, field, delimiter=('|', '=', ','))

Reassigns all annotations to the consensus annotation in a group

Parameters:

data – SeqData object containing sequences to process.
field – field containing annotations to collapse.
delimiter – a tuple of delimiters for (annotations, field/values, value lists).

Returns:

modified sequences.

Return type:

presto.Sequence.deleteSeqPositions(seq, positions)

Deletes a list of positions from a SeqRecord

Parameters:

seq – SeqRecord object
positions – Set of positions (indices) to delete

Returns:

Modified SeqRecord with the specified positions removed

Return type:

SeqRecord
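On a plain string, deleting positions amounts to skipping the listed indices. This minimal sketch assumes 0-based indices, and `delete_positions` is hypothetical. For a SeqRecord, any per-letter annotations such as quality scores would need the same indices removed.

```python
def delete_positions(seq: str, positions: set) -> str:
    """Return seq with the characters at the given 0-based indices removed."""
    return "".join(c for i, c in enumerate(seq) if i not in positions)


print(delete_positions("AC-GT-", {2, 5}))  # ACGT
```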

presto.Sequence.deletionUnify(data, field, delimiter=('|', '=', ','))

Removes all sequences with differing annotations in a group

Parameters:

data – SeqData object containing sequences to process.
field – field containing annotations to collapse.
delimiter – a tuple of delimiters for (annotations, field/values, value lists).

Returns:

modified sequences.

Return type:

presto.Sequence.extractAlignment(seq_record, start, length, rev_primer=False)

Extracts a subsequence from a sequence

Parameters:

seq_record – SeqRecord to process.
start – position where subsequence starts.
length – the length of the subsequence to extract.
rev_primer – if True extract relative to the tail end of the sequence.

Returns:

extraction results as an alignment object

Return type:

presto.Sequence.PrimerAlignment

presto.Sequence.filterLength(data, min_length=250, inner=True, missing_chars='-n.N')

Filters sequences by length

Parameters:

data (SeqData) – a SeqData object with a single SeqRecord to process.
min_length (int) – the minimum length allowed.
inner (bool) – if True exclude outer missing characters from calculation.
missing_chars (str) – a string of missing character values.

Returns:

SeqResult object.

Return type:

SeqResult

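The effect of `inner` on the measured length can be sketched on a plain string. The sketch assumes that with `inner=True` only leading and trailing runs of `missing_chars` are discarded, so interior missing characters still count. It also assumes that `inner=False` uses the full length. `passes_length` is a hypothetical helper.

```python
def passes_length(seq: str, min_length: int = 250, inner: bool = True,
                  missing_chars: str = "-n.N") -> bool:
    """Length check following the documented parameters (illustrative only)."""
    # Assumption: inner=True discards only leading/trailing missing characters;
    # interior missing characters still count toward the length.
    length = len(seq.strip(missing_chars)) if inner else len(seq)
    return length >= min_length


seq = "NN--ACGTNACGT..n"
print(len(seq.strip("-n.N")))  # 9
print(passes_length(seq, min_length=9))
```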
presto.Sequence.filterMissing(data, max_missing=10, inner=True, missing_chars='-n.N')

Filters sequences by number of missing nucleotides

Parameters:

data (SeqData) – SeqData object with a single SeqRecord to process.
max_missing (int) – the maximum number of allowed ambiguous characters.
inner (bool) – if True exclude outer missing characters from calculation.
missing_chars (str) – a string of missing character values.

Returns:

SeqResult object.

Return type:

SeqResult

presto.Sequence.filterQuality(data, min_qual=0, inner=True, missing_chars='-n.N')

Filters sequences by quality score

Parameters:

data (SeqData) – a SeqData object with a single SeqRecord to process.
min_qual (int) – minimum mean quality score for retained sequences.
inner (bool) – if True exclude outer missing characters from calculation.
missing_chars (str) – a string of missing character values.

Returns:

SeqResult object.

Return type:

SeqResult

presto.Sequence.filterRepeats(data, max_repeat=15, include_missing=False, inner=True, missing_chars='-n.N')

Filters sequences by the number of repeating characters

Parameters:

data (SeqData) – a SeqData object with a single SeqRecord to process.
max_repeat (int) – the maximum number of allowed repeating characters.
include_missing (bool) – if True count ambiguous character repeats; if False do not consider ambiguous character repeats.
inner (bool) – if True exclude outer missing characters from calculation.
missing_chars (str) – a string of missing character values.

Returns:

SeqResult object.

Return type:

SeqResult

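One reading of `max_repeat` is a cap on the longest run of a single repeated character (a homopolymer). The sketch below assumes that reading, and `longest_repeat` is hypothetical. Under this reading, a sequence passes when the result does not exceed `max_repeat`.

```python
from itertools import groupby


def longest_repeat(seq: str, include_missing: bool = False, inner: bool = True,
                   missing_chars: str = "-n.N") -> int:
    """Length of the longest run of one repeated character (illustrative sketch)."""
    if inner:
        # Drop leading/trailing missing characters before scanning
        seq = seq.strip(missing_chars)
    longest = 0
    for char, run in groupby(seq):
        if not include_missing and char in missing_chars:
            continue  # runs of missing characters are not considered
        longest = max(longest, sum(1 for _ in run))
    return longest


print(longest_repeat("NNAAAAGNNNNNNC"))                        # 4
print(longest_repeat("NNAAAAGNNNNNNC", include_missing=True))  # 6
```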
presto.Sequence.findGapPositions(seq_list, max_gap, gap_chars={'-', '.'})

Finds positions in a set of aligned sequences with a high number of gap characters.

Parameters:

seq_list – List of SeqRecord objects with aligned sequences
max_gap – Maximum gap frequency (float) at which a position is still considered non-gapped
gap_chars – Set of characters to consider as gaps

Returns:

Positions (indices) with gap frequency greater than max_gap

Return type:

list
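Here is a sketch of the column scan on aligned strings. `gap_positions` is a hypothetical helper. It returns the 0-based column indices whose gap frequency is strictly greater than `max_gap`, as the Returns entry describes.

```python
def gap_positions(seqs: list, max_gap: float, gap_chars: frozenset = frozenset("-.")) -> list:
    """0-based columns whose gap frequency exceeds max_gap (sequences must be aligned)."""
    n = len(seqs)
    return [
        i for i, column in enumerate(zip(*seqs))
        if sum(c in gap_chars for c in column) / n > max_gap
    ]


aln = ["AC-T", "A--T", "ACGT"]
print(gap_positions(aln, max_gap=0.5))  # [2]
```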

presto.Sequence.frequencyConsensus(seq_list, min_freq=0.6, ignore_chars={'-', '.', 'N', 'n'})

Builds a consensus sequence from a set of sequences

Parameters:

seq_list – List of SeqRecord objects
min_freq – Frequency cutoff to assign a base
ignore_chars – Set of characters to exclude when building a consensus sequence

Returns:

Consensus SeqRecord object

Return type:

SeqRecord
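Here is a minimal sketch of frequency-based consensus on aligned strings. It assumes that the frequency of the most common non-ignored character is measured against the total number of sequences. It also assumes that positions below `min_freq` become `N`. presto's exact denominator and tie-breaking may differ, and `frequency_consensus` is hypothetical.

```python
from collections import Counter


def frequency_consensus(seqs: list, min_freq: float = 0.6,
                        ignore_chars: frozenset = frozenset("-.Nn")) -> str:
    """Column-wise majority consensus with a frequency cutoff (illustrative)."""
    consensus = []
    for column in zip(*seqs):
        counts = Counter(c for c in column if c not in ignore_chars)
        if not counts:
            consensus.append("N")
            continue
        base, count = counts.most_common(1)[0]
        # Assumption: frequency is measured against all sequences in the set
        consensus.append(base if count / len(seqs) >= min_freq else "N")
    return "".join(consensus)


print(frequency_consensus(["ACGT", "ACGA", "ACTA", "AC-A"]))  # ACNA
```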

presto.Sequence.getAAScoreDict(mask_score=None, gap_score=None)

Generates a score dictionary for IUPAC Extended Protein characters

Parameters:

mask_score – Tuple of length two defining scores for all matches against an X character for (a, b), with the score for character (a) taking precedence; if None score symmetrically according to IUPAC character identity
gap_score – Tuple of length two defining scores for all matches against a gap (-, .) character for (a, b), with the score for character (a) taking precedence; if None score symmetrically according to IUPAC character identity

Returns:

Score dictionary with keys (char1, char2) mapping to scores

Return type:

dict

presto.Sequence.getDNAScoreDict(mask_score=None, gap_score=None)

Generates a score dictionary for IUPAC ambiguous nucleotide characters

Parameters:

mask_score – Tuple of length two defining scores for all matches against an N character for (a, b), with the score for character (a) taking precedence; if None score symmetrically according to IUPAC character identity
gap_score – Tuple of length two defining scores for all matches against a gap (-, .) character for (a, b), with the score for character (a) taking precedence; if None score symmetrically according to IUPAC character identity

Returns:

Score dictionary with keys (char1, char2) mapping to scores

Return type:

dict
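The precedence rules for `mask_score` and `gap_score` can be sketched as follows. This is an illustrative re-implementation, not presto's code. `IUPAC_BASES`, `score_dna` and `dna_score_dict` are hypothetical. Treating two codes as a match when the sets of bases they represent overlap is an assumption about IUPAC identity scoring.

```python
from itertools import product

# Bases represented by each IUPAC nucleotide code (U is treated like T)
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
GAP_CHARS = "-."


def score_dna(a: str, b: str, mask_score=None, gap_score=None) -> int:
    """Score one character pair; rules for (a) are checked before rules for (b)."""
    if gap_score is not None and a in GAP_CHARS:
        return gap_score[0]
    if mask_score is not None and a in "Nn":
        return mask_score[0]
    if gap_score is not None and b in GAP_CHARS:
        return gap_score[1]
    if mask_score is not None and b in "Nn":
        return mask_score[1]
    if a in GAP_CHARS or b in GAP_CHARS:
        # Symmetric default: a gap only matches another gap
        return int(a in GAP_CHARS and b in GAP_CHARS)
    # Two codes match when the sets of bases they can represent overlap
    return int(bool(set(IUPAC_BASES[a.upper()]) & set(IUPAC_BASES[b.upper()])))


def dna_score_dict(mask_score=None, gap_score=None) -> dict:
    """Score every ordered pair of gap and IUPAC nucleotide characters."""
    chars = "-.ABCDGHKMNRSTUVWY"
    return {(a, b): score_dna(a, b, mask_score, gap_score)
            for a, b in product(chars, repeat=2)}


scores = dna_score_dict(mask_score=(0, 1), gap_score=(0, 0))
print(scores[("N", "A")], scores[("A", "N")])  # 0 1
```

With `mask_score=(0, 1)`, the pair `('N', 'A')` scores 0 but `('A', 'N')` scores 1. This happens because the rule for the first character is checked first.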

presto.Sequence.indexSeqSets(seq_dict, field='BARCODE', delimiter=('|', '=', ','))

Identifies sets of sequences with the same ID field

Parameters:

seq_dict – a dictionary index of sequences returned from SeqIO.index()
field – the annotation field containing set IDs
delimiter – a tuple of delimiters for (fields, values, value lists)

Returns:

Dictionary mapping set name to a list of record names

Return type:

dict
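Set indexing relies on parsing annotations out of record names. The sketch assumes pRESTO-style headers such as `READ1|BARCODE=AATC`, split with the default delimiters. `parse_annotation` and `index_sets` are hypothetical, and a list of header strings stands in for the `SeqIO.index()` dictionary.

```python
from collections import defaultdict


def parse_annotation(header: str, delimiter=("|", "=", ",")) -> dict:
    """Split 'ID|FIELD=value|...' into {'ID': ..., 'FIELD': 'value', ...}."""
    fields = header.split(delimiter[0])
    annotation = {"ID": fields[0]}
    for item in fields[1:]:
        key, _, value = item.partition(delimiter[1])
        annotation[key] = value  # value lists (delimiter[2]) are left unsplit here
    return annotation


def index_sets(headers: list, field: str = "BARCODE",
               delimiter=("|", "=", ",")) -> dict:
    """Group record names by the value of one annotation field."""
    sets = defaultdict(list)
    for header in headers:
        sets[parse_annotation(header, delimiter)[field]].append(header)
    return dict(sets)


headers = ["R1|BARCODE=AATC", "R2|BARCODE=GGCA", "R3|BARCODE=AATC"]
print(index_sets(headers))
```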

presto.Sequence.joinAssembly(head_seq, tail_seq, gap=0, insert_seq=None)

Concatenates two sequences

Parameters:

head_seq – the head SeqRecord.
tail_seq – the tail SeqRecord.
gap – number of gap characters to insert between head and tail; ignored if insert_seq is not None.
insert_seq – a string or Bio.Seq.Seq object to insert between the head and tail; if None insert with N characters.

Returns:

assembled sequence object.

Return type:

presto.Sequence.localAlignment(seq_record, primers, primers_regex=None, max_error=0.3, max_len=1000, rev_primer=False, skip_rc=False, gap_penalty=(1, 1), score_dict=getDNAScoreDict(mask_score=(0, 1), gap_score=(0, 0)))

Performs pairwise local alignment of a list of short sequences against a long sequence

Parameters:

seq_record – a SeqRecord object to align primers against
primers – dictionary of {names: short IUPAC ambiguous sequence strings}
primers_regex – optional dictionary of {names: compiled primer regular expressions}
max_error – maximum acceptable error rate before aligning reverse complement
max_len – maximum length of sample sequence to align
rev_primer – if True align with the tail end of the sequence
skip_rc – if True do not check reverse complement sequences
gap_penalty – a tuple of positive (gap open, gap extend) penalties
score_dict – optional dictionary of alignment scores as {(char1, char2): score}

Returns:

primer alignment result object

Return type:

presto.Sequence.PrimerAlignment

presto.Sequence.maskQuality(data, min_qual=0)

Masks characters in a sequence by quality score

Parameters:

data (SeqData) – a SeqData object with a single SeqRecord to process.
min_qual (int) – minimum quality for retained characters.

Returns:

SeqResult object.

Return type:

SeqResult

presto.Sequence.maskSeq(align, mode='mask', barcode=False, barcode_field='BARCODE', primer_field='PRIMER', delimiter=('|', '=', ','))

Create an output sequence with primers masked or cut

Parameters:

align – a PrimerAlignment object.
mode – defines the action taken; one of ‘cut’, ‘mask’, ‘tag’ or ‘trim’.
barcode – if True add sequence preceding primer to description.
barcode_field – name of the output barcode annotation.
primer_field – name of the output primer annotation.
delimiter – a tuple of delimiters for (annotations, field/values, value lists).

Returns:

masked sequence.

Return type:

Bio.SeqRecord.SeqRecord

presto.Sequence.meanQuality(qual, prob=tuple(10 ** (-q / 10) for q in range(128)))

Calculate mean quality score

Parameters:

qual (list) – numeric Phred quality scores.
prob (list) – mapping of Phred score (index) to error probability values; the default holds 10^(-Q/10) for Q = 0 to 127.

Returns:

floor of the mean Phred quality score.

Return type:

int

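Because the `prob` lookup converts Phred scores to error probabilities, one plausible reading is that the mean is taken in probability space and then converted back. The sketch assumes exactly that. `PHRED_PROB` and `mean_quality` are hypothetical. `PHRED_PROB` is built from P = 10^(-Q/10) for Q = 0 to 127, which reproduces the values of the default `prob` tuple.

```python
import math

# Error probability for each Phred score Q = 0..127: P = 10 ** (-Q / 10)
PHRED_PROB = tuple(10 ** (-q / 10) for q in range(128))


def mean_quality(qual: list, prob=PHRED_PROB) -> int:
    """Floor of the Phred score of the mean error probability (assumed method)."""
    mean_prob = sum(prob[q] for q in qual) / len(qual)
    return math.floor(-10 * math.log10(mean_prob))


print(mean_quality([30, 30, 10]))  # 14
```

For [30, 30, 10] this gives 14, well below the arithmetic mean of about 23. The single low-quality position dominates the average error probability.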
presto.Sequence.overlapConsensus(head_seq, tail_seq, ignore_chars={'-', '.', 'N', 'n'})

Creates a consensus overlap sequence from two segments

Parameters:

head_seq – the overlap head SeqRecord.
tail_seq – the overlap tail SeqRecord.
ignore_chars – set of characters which do not contribute to consensus.

Returns:

A SeqRecord object with consensus characters and quality scores.

Return type:

SeqRecord

presto.Sequence.qualityConsensus(seq_list, min_qual=0, min_freq=0.6, dependent=False, ignore_chars={'-', '.', 'N', 'n'})

Builds a consensus sequence from a set of sequences

Parameters:

seq_list – List of SeqRecord objects
min_qual – Quality cutoff to assign a base
min_freq – Frequency cutoff to assign a base
dependent – If False assume sequences are independent for quality calculation
ignore_chars – Set of characters to exclude when building a consensus sequence

Returns:

Consensus SeqRecord object

Return type:

SeqRecord

presto.Sequence.referenceAssembly(head_seq, tail_seq, ref_dict, ref_db, min_ident=0.5, evalue=1e-05, max_hits=100, fill=False, aligner='usearch', aligner_exec='usearch', score_dict=getDNAScoreDict(mask_score=(1, 1), gap_score=(0, 0)))

Stitches two sequences together by aligning against a reference database

Parameters:

head_seq – the head SeqRecord.
tail_seq – the tail SeqRecord.
ref_dict – a dictionary of reference SeqRecord objects.
ref_db – the path and name of the reference database.
min_ident – the minimum identity for a valid assembly.
evalue – the E-value cut-off for ublast.
max_hits – the maxhits output limit for ublast.
fill – if False non-overlapping regions will be assigned Ns; if True non-overlapping regions will be filled with the reference sequence.
aligner – the alignment tool; one of ‘blastn’ or ‘usearch’.
aligner_exec – the path to the alignment tool executable.
score_dict – optional dictionary of character scores in the form {(char1, char2): score}.

Returns:

assembled sequence object.

Return type:

presto.Sequence.reverseComplement(seq)

Takes the reverse complement of a sequence

Parameters:

seq – a SeqRecord object, Seq object or string to reverse complement

Returns:

Object of the same type as the input with the reverse complement sequence

Return type:

SeqRecord, Seq or str (matching the input)

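For plain strings, the reverse complement can be sketched with a translation table covering the IUPAC codes. `COMPLEMENT` and `reverse_complement` are hypothetical, and gap characters pass through unchanged.

```python
# Complement of each IUPAC nucleotide code (ambiguity codes map to their complementary code)
COMPLEMENT = str.maketrans("ACGTUNRYSWKMBDHVacgtunryswkmbdhv",
                           "TGCAANYRSWMKVHDBtgcaanyrswmkvhdb")


def reverse_complement(seq: str) -> str:
    """Complement every character, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("ACGTN"))  # NACGT
```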
presto.Sequence.scoreAA(a, b, mask_score=None, gap_score=None)

Returns the score for a pair of IUPAC Extended Protein characters

Parameters:

a – First character
b – Second character
mask_score – Tuple of length two defining scores for all matches against an X character for (a, b), with the score for character (a) taking precedence; if None score symmetrically according to IUPAC character identity
gap_score – Tuple of length two defining score for all matches against a gap (-, .) character for (a, b), with the score for character (a) taking precedence; if None score symmetrically according to IUPAC character identity

Returns:

Score for the character pair

Return type:

presto.Sequence.scoreAlignment(seq_record, primers, start=0, rev_primer=False, score_dict=getDNAScoreDict(mask_score=(0, 1), gap_score=(0, 0)))

Performs a simple fixed position alignment of primers

Parameters:

seq_record – a SeqRecord object to align primers against
primers – dictionary of {names: short IUPAC ambiguous sequence strings}
start – position where primer alignment starts
rev_primer – if True align with the tail end of the sequence
score_dict – optional dictionary of {(char1, char2): score} alignment scores

Returns:

primer alignment result object

Return type:

presto.Sequence.PrimerAlignment

presto.Sequence.scoreDNA(a, b, mask_score=None, gap_score=None)

Returns the score for a pair of IUPAC ambiguous nucleotide characters

Parameters:

a – First character
b – Second character
mask_score – Tuple of length two defining scores for all matches against an N character for (a, b), with the score for character (a) taking precedence; if None score symmetrically according to IUPAC character identity
gap_score – Tuple of length two defining score for all matches against a gap (-, .) character for (a, b), with the score for character (a) taking precedence; if None score symmetrically according to IUPAC character identity

Returns:

Score for the character pair

Return type:

presto.Sequence.scoreSeqPair(seq1, seq2, ignore_chars={}, score_dict={
    ('-', '-'): 1, ('-', '.'): 1, ('-', 'A'): 0, ('-', 'B'): 0, ('-', 'C'): 0, ('-', 'D'): 0, ('-', 'G'): 0, ('-', 'H'): 0, ('-', 'K'): 0, ('-', 'M'): 0, ('-', 'N'): 0, ('-', 'R'): 0, ('-', 'S'): 0, ('-', 'T'): 0, ('-', 'U'): 0, ('-', 'V'): 0, ('-', 'W'): 0, ('-', 'Y'): 0,
    ('.', '-'): 1, ('.', '.'): 1, ('.', 'A'): 0, ('.', 'B'): 0, ('.', 'C'): 0, ('.', 'D'): 0, ('.', 'G'): 0, ('.', 'H'): 0, ('.', 'K'): 0, ('.', 'M'): 0, ('.', 'N'): 0, ('.', 'R'): 0, ('.', 'S'): 0, ('.', 'T'): 0, ('.', 'U'): 0, ('.', 'V'): 0, ('.', 'W'): 0, ('.', 'Y'): 0,
    ('A', '-'): 0, ('A', '.'): 0, ('A', 'A'): 1, ('A', 'B'): 0, ('A', 'C'): 0, ('A', 'D'): 1, ('A', 'G'): 0, ('A', 'H'): 1, ('A', 'K'): 0, ('A', 'M'): 1, ('A', 'N'): 1, ('A', 'R'): 1, ('A', 'S'): 0, ('A', 'T'): 0, ('A', 'U'): 0, ('A', 'V'): 1, ('A', 'W'): 1, ('A', 'Y'): 0,
    ('B', '-'): 0, ('B', '.'): 0, ('B', 'A'): 0, ('B', 'B'): 1, ('B', 'C'): 1, ('B', 'D'): 1, ('B', 'G'): 1, ('B', 'H'): 1, ('B', 'K'): 1, ('B', 'M'): 1, ('B', 'N'): 1, ('B', 'R'): 1, ('B', 'S'): 1, ('B', 'T'): 1, ('B', 'U'): 1, ('B', 'V'): 1, ('B', 'W'): 1, ('B', 'Y'): 1,
    ('C', '-'): 0, ('C', '.'): 0, ('C', 'A'): 0, ('C', 'B'): 1, ('C', 'C'): 1, ('C', 'D'): 0, ('C', 'G'): 0, ('C', 'H'): 1, ('C', 'K'): 0, ('C', 'M'): 1, ('C', 'N'): 1, ('C', 'R'): 0, ('C', 'S'): 1, ('C', 'T'): 0, ('C', 'U'): 0, ('C', 'V'): 1, ('C', 'W'): 0, ('C', 'Y'): 1,
    ('D', '-'): 0, ('D', '.'): 0, ('D', 'A'): 1, ('D', 'B'): 1, ('D', 'C'): 0, ('D', 'D'): 1, ('D', 'G'): 1, ('D', 'H'): 1, ('D', 'K'): 1, ('D', 'M'): 1, ('D', 'N'): 1, ('D', 'R'): 1, ('D', 'S'): 1, ('D', 'T'): 1, ('D', 'U'): 1, ('D', 'V'): 1, ('D', 'W'): 1, ('D', 'Y'): 1,
    ('G', '-'): 0, ('G', '.'): 0, ('G', 'A'): 0, ('G', 'B'): 1, ('G', 'C'): 0, ('G', 'D'): 1, ('G', 'G'): 1, ('G', 'H'): 0, ('G', 'K'): 1, ('G', 'M'): 0, ('G', 'N'): 1, ('G', 'R'): 1, ('G', 'S'): 1, ('G', 'T'): 0, ('G', 'U'): 0, ('G', 'V'): 1, ('G', 'W'): 0, ('G', 'Y'): 0,
    ('H', '-'): 0, ('H', '.'): 0, ('H', 'A'): 1, ('H', 'B'): 1, ('H', 'C'): 1, ('H', 'D'): 1, ('H', 'G'): 0, ('H', 'H'): 1, ('H', 'K'): 1, ('H', 'M'): 1, ('H', 'N'): 1, ('H', 'R'): 1, ('H', 'S'): 1, ('H', 'T'): 1, ('H', 'U'): 1, ('H', 'V'): 1, ('H', 'W'): 1, ('H', 'Y'): 1,
    ('K', '-'): 0, ('K', '.'): 0, ('K', 'A'): 0, ('K', 'B'): 1, ('K', 'C'): 0, ('K', 'D'): 1, ('K', 'G'): 1, ('K', 'H'): 1, ('K', 'K'): 1, ('K', 'M'): 0, ('K', 'N'): 1, ('K', 'R'): 1, ('K', 'S'): 1, ('K', 'T'): 1, ('K', 'U'): 1, ('K', 'V'): 1, ('K', 'W'): 1, ('K', 'Y'): 1,
    ('M', '-'): 0, ('M', '.'): 0, ('M', 'A'): 1, ('M', 'B'): 1, ('M', 'C'): 1, ('M', 'D'): 1, ('M', 'G'): 0, ('M', 'H'): 1, ('M', 'K'): 0, ('M', 'M'): 1, ('M', 'N'): 1, ('M', 'R'): 1, ('M', 'S'): 1, ('M', 'T'): 0, ('M', 'U'): 0, ('M', 'V'): 1, ('M', 'W'): 1, ('M', 'Y'): 1,
    ('N', '-'): 0, ('N', '.'): 0, ('N', 'A'): 1, ('N', 'B'): 1, ('N', 'C'): 1, ('N', 'D'): 1, ('N', 'G'): 1, ('N', 'H'): 1, ('N', 'K'): 1, ('N', 'M'): 1, ('N', 'N'): 1, ('N', 'R'): 1, ('N', 'S'): 1, ('N', 'T'): 1, ('N', 'U'): 1, ('N', 'V'): 1, ('N', 'W'): 1, ('N', 'Y'): 1,
    ('R', '-'): 0, ('R', '.'): 0, ('R', 'A'): 1, ('R', 'B'): 1, ('R', 'C'): 0, ('R', 'D'): 1, ('R', 'G'): 1, ('R', 'H'): 1, ('R', 'K'): 1, ('R', 'M'): 1, ('R', 'N'): 1, ('R', 'R'): 1, ('R', 'S'): 1, ('R', 'T'): 0, ('R', 'U'): 0, ('R', 'V'): 1, ('R', 'W'): 1, ('R', 'Y'): 0,
    ('S', '-'): 0, ('S', '.'): 0, ('S', 'A'): 0, ('S', 'B'): 1, ('S', 'C'): 1, ('S', 'D'): 1, ('S', 'G'): 1, ('S', 'H'): 1, ('S', 'K'): 1, ('S', 'M'): 1, ('S', 'N'): 1, ('S', 'R'): 1, ('S', 'S'): 1, ('S', 'T'): 0, ('S', 'U'): 0, ('S', 'V'): 1, ('S', 'W'): 0, ('S', 'Y'): 1,
    ('T', '-'): 0, ('T', '.'): 0, ('T', 'A'): 0, ('T', 'B'): 1, ('T', 'C'): 0, ('T', 'D'): 1, ('T', 'G'): 0, ('T', 'H'): 1, ('T', 'K'): 1, ('T', 'M'): 0, ('T', 'N'): 1, ('T', 'R'): 0, ('T', 'S'): 0, ('T', 'T'): 1, ('T', 'U'): 1, ('T', 'V'): 0, ('T', 'W'): 1, ('T', 'Y'): 1,
    ('U', '-'): 0, ('U', '.'): 0, ('U', 'A'): 0, ('U', 'B'): 1, ('U', 'C'): 0, ('U', 'D'): 1, ('U', 'G'): 0, ('U', 'H'): 1, ('U', 'K'): 1, ('U', 'M'): 0, ('U', 'N'): 1, ('U', 'R'): 0, ('U', 'S'): 0, ('U', 'T'): 1, ('U', 'U'): 1, ('U', 'V'): 0, ('U', 'W'): 1, ('U', 'Y'): 1,
    ('V', '-'): 0, ('V', '.'): 0, ('V', 'A'): 1, ('V', 'B'): 1, ('V', 'C'): 1, ('V', 'D'): 1, ('V', 'G'): 1, ('V', 'H'): 1, ('V', 'K'): 1, ('V', 'M'): 1, ('V', 'N'): 1, ('V', 'R'): 1, ('V', 'S'): 1, ('V', 'T'): 0, ('V', 'U'): 0, ('V', 'V'): 1, ('V', 'W'): 1, ('V', 'Y'): 1,
    ('W', '-'): 0, ('W', '.'): 0, ('W', 'A'): 1, ('W', 'B'): 1, ('W', 'C'): 0, ('W', 'D'): 1, ('W', 'G'): 0, ('W', 'H'): 1, ('W', 'K'): 1, ('W', 'M'): 1, ('W', 'N'): 1, ('W', 'R'): 1, ('W', 'S'): 0, ('W', 'T'): 1, ('W', 'U'): 1, ('W', 'V'): 1, ('W', 'W'): 1, ('W', 'Y'): 1,
    ('Y', '-'): 0, ('Y', '.'): 0, ('Y', 'A'): 0, ('Y', 'B'): 1, ('Y', 'C'): 1, ('Y', 'D'): 1, ('Y', 'G'): 0, ('Y', 'H'): 1, ('Y', 'K'): 1, ('Y', 'M'): 1, ('Y', 'N'): 1, ('Y', 'R'): 0, ('Y', 'S'): 1, ('Y', 'T'): 1, ('Y', 'U'): 1, ('Y', 'V'): 1, ('Y', 'W'): 1, ('Y', 'Y'): 1})

Determine the error rate for a pair of sequences

Parameters:

seq1 – SeqRecord object
seq2 – SeqRecord object
ignore_chars – Set of characters to ignore when scoring and counting the weight
score_dict – Optional dictionary of alignment scores as {(char1, char2): score}

Returns:

Tuple of the (score, minimum weight, error rate) for the pair of sequences

Return type:

Tuple
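As a rough illustration of how a (score, minimum weight, error rate) tuple can be derived, the standalone sketch below works on plain strings rather than SeqRecord objects. It assumes that the score is the sum of score_dict values over aligned positions where neither character is ignored, that each sequence's weight is its count of non-ignored characters, and that the error rate is 1 - score / minimum weight. The function names and the toy score table are hypothetical and are not part of presto.

```python
def weight(seq, ignore_chars=frozenset()):
    """Count the characters of seq that are not in ignore_chars."""
    return sum(1 for c in seq if c not in ignore_chars)


def toy_score_dict(alphabet="ACGTN-."):
    """Hypothetical score table: identical characters match, N matches anything."""
    return {(a, b): int(a == b or "N" in (a, b)) for a in alphabet for b in alphabet}


def score_seq_pair(seq1, seq2, ignore_chars=frozenset(), score_dict=None):
    """Return (score, minimum weight, error rate) for two aligned strings."""
    if score_dict is None:
        score_dict = toy_score_dict()
    # Sum pairwise scores, skipping positions where either character is ignored
    score = sum(score_dict[(a, b)] for a, b in zip(seq1, seq2)
                if a not in ignore_chars and b not in ignore_chars)
    # Normalize by the shorter effective length of the two sequences
    min_weight = min(weight(seq1, ignore_chars), weight(seq2, ignore_chars))
    # Assumption: an empty comparison is treated as entirely erroneous
    error = 1.0 - score / min_weight if min_weight > 0 else 1.0
    return score, min_weight, error


print(score_seq_pair("ACGT", "ACGA"))                      # (3, 4, 0.25)
print(score_seq_pair("AC-T", "ACGT", ignore_chars={"-"}))  # (3, 3, 0.0)
```

Using the minimum of the two weights means that gaps or masked characters in either sequence shrink the denominator, so they do not inflate the error rate.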

presto.Sequence.sequentialAssembly(head_seq, tail_seq, ref_dict, ref_db, alpha=1e-05, max_error=0.3, min_len=8, max_len=1000, scan_reverse=False, min_ident=0.5, evalue=1e-05, max_hits=100, fill=False, aligner='usearch', aligner_exec='usearch', assembly_stats=None, score_dict={
    ('-', '-'): 0, ('-', '.'): 0, ('-', 'A'): 0, ('-', 'B'): 0, ('-', 'C'): 0, ('-', 'D'): 0, ('-', 'G'): 0, ('-', 'H'): 0, ('-', 'K'): 0, ('-', 'M'): 0, ('-', 'N'): 0, ('-', 'R'): 0, ('-', 'S'): 0, ('-', 'T'): 0, ('-', 'U'): 0, ('-', 'V'): 0, ('-', 'W'): 0, ('-', 'Y'): 0,
    ('.', '-'): 0, ('.', '.'): 0, ('.', 'A'): 0, ('.', 'B'): 0, ('.', 'C'): 0, ('.', 'D'): 0, ('.', 'G'): 0, ('.', 'H'): 0, ('.', 'K'): 0, ('.', 'M'): 0, ('.', 'N'): 0, ('.', 'R'): 0, ('.', 'S'): 0, ('.', 'T'): 0, ('.', 'U'): 0, ('.', 'V'): 0, ('.', 'W'): 0, ('.', 'Y'): 0,
    ('A', '-'): 0, ('A', '.'): 0, ('A', 'A'): 1, ('A', 'B'): 0, ('A', 'C'): 0, ('A', 'D'): 1, ('A', 'G'): 0, ('A', 'H'): 1, ('A', 'K'): 0, ('A', 'M'): 1, ('A', 'N'): 1, ('A', 'R'): 1, ('A', 'S'): 0, ('A', 'T'): 0, ('A', 'U'): 0, ('A', 'V'): 1, ('A', 'W'): 1, ('A', 'Y'): 0,
    ('B', '-'): 0, ('B', '.'): 0, ('B', 'A'): 0, ('B', 'B'): 1, ('B', 'C'): 1, ('B', 'D'): 1, ('B', 'G'): 1, ('B', 'H'): 1, ('B', 'K'): 1, ('B', 'M'): 1, ('B', 'N'): 1, ('B', 'R'): 1, ('B', 'S'): 1, ('B', 'T'): 1, ('B', 'U'): 1, ('B', 'V'): 1, ('B', 'W'): 1, ('B', 'Y'): 1,
    ('C', '-'): 0, ('C', '.'): 0, ('C', 'A'): 0, ('C', 'B'): 1, ('C', 'C'): 1, ('C', 'D'): 0, ('C', 'G'): 0, ('C', 'H'): 1, ('C', 'K'): 0, ('C', 'M'): 1, ('C', 'N'): 1, ('C', 'R'): 0, ('C', 'S'): 1, ('C', 'T'): 0, ('C', 'U'): 0, ('C', 'V'): 1, ('C', 'W'): 0, ('C', 'Y'): 1,
    ('D', '-'): 0, ('D', '.'): 0, ('D', 'A'): 1, ('D', 'B'): 1, ('D', 'C'): 0, ('D', 'D'): 1, ('D', 'G'): 1, ('D', 'H'): 1, ('D', 'K'): 1, ('D', 'M'): 1, ('D', 'N'): 1, ('D', 'R'): 1, ('D', 'S'): 1, ('D', 'T'): 1, ('D', 'U'): 1, ('D', 'V'): 1, ('D', 'W'): 1, ('D', 'Y'): 1,
    ('G', '-'): 0, ('G', '.'): 0, ('G', 'A'): 0, ('G', 'B'): 1, ('G', 'C'): 0, ('G', 'D'): 1, ('G', 'G'): 1, ('G', 'H'): 0, ('G', 'K'): 1, ('G', 'M'): 0, ('G', 'N'): 1, ('G', 'R'): 1, ('G', 'S'): 1, ('G', 'T'): 0, ('G', 'U'): 0, ('G', 'V'): 1, ('G', 'W'): 0, ('G', 'Y'): 0,
    ('H', '-'): 0, ('H', '.'): 0, ('H', 'A'): 1, ('H', 'B'): 1, ('H', 'C'): 1, ('H', 'D'): 1, ('H', 'G'): 0, ('H', 'H'): 1, ('H', 'K'): 1, ('H', 'M'): 1, ('H', 'N'): 1, ('H', 'R'): 1, ('H', 'S'): 1, ('H', 'T'): 1, ('H', 'U'): 1, ('H', 'V'): 1, ('H', 'W'): 1, ('H', 'Y'): 1,
    ('K', '-'): 0, ('K', '.'): 0, ('K', 'A'): 0, ('K', 'B'): 1, ('K', 'C'): 0, ('K', 'D'): 1, ('K', 'G'): 1, ('K', 'H'): 1, ('K', 'K'): 1, ('K', 'M'): 0, ('K', 'N'): 1, ('K', 'R'): 1, ('K', 'S'): 1, ('K', 'T'): 1, ('K', 'U'): 1, ('K', 'V'): 1, ('K', 'W'): 1, ('K', 'Y'): 1,
    ('M', '-'): 0, ('M', '.'): 0, ('M', 'A'): 1, ('M', 'B'): 1, ('M', 'C'): 1, ('M', 'D'): 1, ('M', 'G'): 0, ('M', 'H'): 1, ('M', 'K'): 0, ('M', 'M'): 1, ('M', 'N'): 1, ('M', 'R'): 1, ('M', 'S'): 1, ('M', 'T'): 0, ('M', 'U'): 0, ('M', 'V'): 1, ('M', 'W'): 1, ('M', 'Y'): 1,
    ('N', '-'): 1, ('N', '.'): 1, ('N', 'A'): 1, ('N', 'B'): 1, ('N', 'C'): 1, ('N', 'D'): 1, ('N', 'G'): 1, ('N', 'H'): 1, ('N', 'K'): 1, ('N', 'M'): 1, ('N', 'N'): 1, ('N', 'R'): 1, ('N', 'S'): 1, ('N', 'T'): 1, ('N', 'U'): 1, ('N', 'V'): 1, ('N', 'W'): 1, ('N', 'Y'): 1,
    ('R', '-'): 0, ('R', '.'): 0, ('R', 'A'): 1, ('R', 'B'): 1, ('R', 'C'): 0, ('R', 'D'): 1, ('R', 'G'): 1, ('R', 'H'): 1, ('R', 'K'): 1, ('R', 'M'): 1, ('R', 'N'): 1, ('R', 'R'): 1, ('R', 'S'): 1, ('R', 'T'): 0, ('R', 'U'): 0, ('R', 'V'): 1, ('R', 'W'): 1, ('R', 'Y'): 0,
    ('S', '-'): 0, ('S', '.'): 0, ('S', 'A'): 0, ('S', 'B'): 1, ('S', 'C'): 1, ('S', 'D'): 1, ('S', 'G'): 1, ('S', 'H'): 1, ('S', 'K'): 1, ('S', 'M'): 1, ('S', 'N'): 1, ('S', 'R'): 1, ('S', 'S'): 1, ('S', 'T'): 0, ('S', 'U'): 0, ('S', 'V'): 1, ('S', 'W'): 0, ('S', 'Y'): 1,
    ('T', '-'): 0, ('T', '.'): 0, ('T', 'A'): 0, ('T', 'B'): 1, ('T', 'C'): 0, ('T', 'D'): 1, ('T', 'G'): 0, ('T', 'H'): 1, ('T', 'K'): 1, ('T', 'M'): 0, ('T', 'N'): 1, ('T', 'R'): 0, ('T', 'S'): 0, ('T', 'T'): 1, ('T', 'U'): 1, ('T', 'V'): 0, ('T', 'W'): 1, ('T', 'Y'): 1,
    ('U', '-'): 0, ('U', '.'): 0, ('U', 'A'): 0, ('U', 'B'): 1, ('U', 'C'): 0, ('U', 'D'): 1, ('U', 'G'): 0, ('U', 'H'): 1, ('U', 'K'): 1, ('U', 'M'): 0, ('U', 'N'): 1, ('U', 'R'): 0, ('U', 'S'): 0, ('U', 'T'): 1, ('U', 'U'): 1, ('U', 'V'): 0, ('U', 'W'): 1, ('U', 'Y'): 1,
    ('V', '-'): 0, ('V', '.'): 0, ('V', 'A'): 1, ('V', 'B'): 1, ('V', 'C'): 1, ('V', 'D'): 1, ('V', 'G'): 1, ('V', 'H'): 1, ('V', 'K'): 1, ('V', 'M'): 1, ('V', 'N'): 1, ('V', 'R'): 1, ('V', 'S'): 1, ('V', 'T'): 0, ('V', 'U'): 0, ('V', 'V'): 1, ('V', 'W'): 1, ('V', 'Y'): 1,
    ('W', '-'): 0, ('W', '.'): 0, ('W', 'A'): 1, ('W', 'B'): 1, ('W', 'C'): 0, ('W', 'D'): 1, ('W', 'G'): 0, ('W', 'H'): 1, ('W', 'K'): 1, ('W', 'M'): 1, ('W', 'N'): 1, ('W', 'R'): 1, ('W', 'S'): 0, ('W', 'T'): 1, ('W', 'U'): 1, ('W', 'V'): 1, ('W', 'W'): 1, ('W', 'Y'): 1,
    ('Y', '-'): 0, ('Y', '.'): 0, ('Y', 'A'): 0, ('Y', 'B'): 1, ('Y', 'C'): 1, ('Y', 'D'): 1, ('Y', 'G'): 0, ('Y', 'H'): 1, ('Y', 'K'): 1, ('Y', 'M'): 1, ('Y', 'N'): 1, ('Y', 'R'): 0, ('Y', 'S'): 1, ('Y', 'T'): 1, ('Y', 'U'): 1, ('Y', 'V'): 1, ('Y', 'W'): 1, ('Y', 'Y'): 1})

Stitches sequences together by first attempting de novo assembly, then falling back to reference guided assembly
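The fallback order can be sketched as control flow only. In this hypothetical version the two assembly strategies are passed in as callables returning a small Stitch record (a stand-in, not presto's result class), a de novo result is accepted when its p-value does not exceed alpha and its error rate does not exceed max_error, and a reference guided result is accepted when its identity reaches min_ident. Returning None when both fail is a choice made for this sketch only.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Stitch:
    """Hypothetical stand-in for an assembly result."""
    seq: Optional[str]
    pvalue: float = 1.0   # significance of a de novo overlap
    error: float = 1.0    # error rate within the overlap
    ident: float = 0.0    # identity to the reference in guided assembly


def sequential_assembly(head: str, tail: str,
                        de_novo: Callable[[str, str], Stitch],
                        ref_guided: Callable[[str, str], Stitch],
                        alpha: float = 1e-05, max_error: float = 0.3,
                        min_ident: float = 0.5) -> Optional[Stitch]:
    """Try de novo assembly first, then fall back to reference guided assembly."""
    stitch = de_novo(head, tail)
    # Accept the de novo result only if it is both significant and accurate enough
    if stitch.seq is not None and stitch.pvalue <= alpha and stitch.error <= max_error:
        return stitch
    # Otherwise fall back to aligning both reads against a reference
    guided = ref_guided(head, tail)
    if guided.seq is not None and guided.ident >= min_ident:
        return guided
    # Neither strategy produced a valid assembly
    return None


weak = lambda h, t: Stitch(h + t, pvalue=0.2, error=0.05)
ref = lambda h, t: Stitch(h + "NNNN" + t, ident=0.9)
print(sequential_assembly("ACGT", "TGCA", weak, ref).seq)  # ACGTNNNNTGCA
```

Trying de novo assembly first avoids an external aligner call (blastn or usearch) for read pairs whose overlap is already strong enough on its own.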

Parameters:

head_seq – the head SeqRecord
tail_seq – the tail SeqRecord
ref_dict – a dictionary of reference SeqRecord objects
ref_db – the path and name of the reference database
alpha – the significance threshold for de novo assembly; a valid de novo assembly must have a p-value no greater than alpha
max_error – the maximum error rate for a valid de novo assembly
min_len – minimum length of overlap to test for de novo assembly
max_len – maximum length of overlap to test for de novo assembly
scan_reverse – if True, allow the head sequence to overhang the end of the tail sequence in de novo assembly; if False, end the alignment scan at the end of the tail sequence or the start of the head sequence
min_ident – the minimum identity for a valid reference guided assembly
evalue – the E-value cut-off for reference guided assembly
max_hits – the maxhits output limit for reference guided assembly
fill – if False, non-overlapping regions will be assigned Ns in reference guided assembly; if True, non-overlapping regions will be filled with the reference sequence.
aligner – the alignment tool; one of ‘blastn’ or ‘usearch’
aligner_exec – the path to the alignment tool executable
assembly_stats – optional successes-by-trials numpy.array of p-values
score_dict – optional dictionary of character scores in the form {(char1, char2): score}.

Returns:

assembled sequence object.

Return type:

presto.Sequence.subsetSeqIndex(seq_dict, field, values, delimiter=('|', '=', ','))

Subsets a sequence set by annotation value

Parameters:

seq_dict – Dictionary index of sequences returned from SeqIO.index()
field – Annotation field to select keys by
values – List of annotation values that define the retained keys
delimiter – Tuple of delimiters for (annotations, field/values, value lists)

Returns:

List of keys

Return type:

list
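A minimal sketch of the key selection, assuming presto-style headers of the form ID|FIELD=value|... with the default delimiters. For simplicity the index here maps keys to header strings instead of the SeqRecord objects that SeqIO.index() returns, and a field value is compared as a whole string (the value-list delimiter is not split). The helper names are hypothetical.

```python
def parse_annotation(header, delimiter=("|", "=", ",")):
    """Split a header such as 'SEQ1|SAMPLE=S1' into {'ID': 'SEQ1', 'SAMPLE': 'S1'}."""
    fields = header.split(delimiter[0])
    annotation = {"ID": fields[0]}
    for item in fields[1:]:
        if delimiter[1] in item:
            key, value = item.split(delimiter[1], 1)
            annotation[key] = value
    return annotation


def subset_seq_index(seq_dict, field, values, delimiter=("|", "=", ",")):
    """Return the keys whose annotation `field` exactly matches one of `values`."""
    return [key for key, header in seq_dict.items()
            if parse_annotation(header, delimiter).get(field) in values]


index = {"r1": "r1|SAMPLE=S1", "r2": "r2|SAMPLE=S2", "r3": "r3|SAMPLE=S1"}
print(subset_seq_index(index, "SAMPLE", ["S1"]))  # ['r1', 'r3']
```

Returning keys rather than records keeps the lazily loaded index intact; records are only parsed when a caller later looks up a selected key.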

presto.Sequence.subsetSeqSet(seq_iter, field, values, delimiter=('|', '=', ','))

Subsets a sequence set by annotation value

Parameters:

seq_iter – Iterator or list of SeqRecord objects
field – Annotation field to select by
values – List of annotation values that define the retained sequences
delimiter – Tuple of delimiters for (annotations, field/values, value lists)

Returns:

Modified list of SeqRecord objects

Return type:

list
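The same filter can be applied to a list or a one-shot iterator, returning the retained records rather than keys. In this hypothetical sketch each record is a (header, sequence) tuple standing in for a SeqRecord, headers are assumed to follow the ID|FIELD=value layout, and field values are compared as whole strings.

```python
def subset_seq_set(records, field, values, delimiter=("|", "=", ",")):
    """Keep (header, sequence) pairs whose annotation `field` is in `values`."""
    kept = []
    for header, seq in records:  # works for lists and one-shot iterators alike
        # Skip the leading ID field, then split each 'KEY=value' item once
        pairs = (item.split(delimiter[1], 1)
                 for item in header.split(delimiter[0])[1:] if delimiter[1] in item)
        if dict(pairs).get(field) in values:
            kept.append((header, seq))
    return kept


reads = iter([("a|PRIMER=P1", "ACGT"), ("b|PRIMER=P2", "GGCC")])
print(subset_seq_set(reads, "PRIMER", {"P2"}))  # [('b|PRIMER=P2', 'GGCC')]
```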

presto.Sequence.translateAmbigDNA(key)

Translates IUPAC ambiguous nucleotide characters to or from character sets

Parameters:

key – String or regular expression match object (as returned by re.search) containing the character set to translate

Returns:

Character translation

Return type:

str
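The two translation directions can be illustrated with a small stand-in that uses the standard IUPAC nucleotide codes. This is a sketch, not presto's implementation: the function and table names are hypothetical, and only the single-letter codes listed in the table are handled. Because re.sub passes a match object to a callable replacement, the sketch accepts either a string or a match.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to the bases they represent
IUPAC_TO_SET = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
                "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}
SET_TO_IUPAC = {bases: code for code, bases in IUPAC_TO_SET.items()}


def translate_ambig_dna(key):
    """Translate 'R' -> '[AG]' or '[AG]' -> 'R'; accepts a str or a match object."""
    if not isinstance(key, str):
        key = key.group(0)  # re.sub passes a match object to its replacement callable
    if key in IUPAC_TO_SET:
        return "[" + IUPAC_TO_SET[key] + "]"
    # Normalize a character set like '[GA]' to sorted bases before the reverse lookup
    bases = "".join(sorted(set(key.strip("[]"))))
    return SET_TO_IUPAC.get(bases, key)


# Expand ambiguous primer characters into a regular expression
print(re.sub("[RYSWKMBDHVN]", translate_ambig_dna, "ACGRN"))  # ACG[AG][ACGT]
print(translate_ambig_dna("[GA]"))  # R
```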

presto.Sequence.trimQuality(data, min_qual=0, window=10, reverse=False)

Cuts sequences using a moving mean quality score
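A moving-mean cut can be sketched on a plain string with a parallel list of integer quality scores. The sketch assumes the cut falls at the start of the first window whose mean drops below min_qual, and that reverse=True is handled by scanning the reversed read so that the head is removed instead of the tail. The name trim_quality and the exact cut position are assumptions of this example, not presto's documented behavior.

```python
def trim_quality(seq, quals, min_qual=0, window=10, reverse=False):
    """Cut at the first window whose mean quality drops below min_qual."""
    if reverse:
        # Scanning the reversed read makes the cut remove the head instead of the tail
        seq, quals = seq[::-1], quals[::-1]
    cut = len(seq)
    # Reads shorter than the window are tested as a single window
    for start in range(max(len(quals) - window + 1, 1)):
        win = quals[start:start + window]
        if win and sum(win) / len(win) < min_qual:
            cut = start  # keep only the bases before the failing window
            break
    seq, quals = seq[:cut], quals[:cut]
    if reverse:
        seq, quals = seq[::-1], quals[::-1]
    return seq, quals


print(trim_quality("ACGTACGTAC", [30] * 6 + [5] * 4, min_qual=20, window=3))
# ('ACGTA', [30, 30, 30, 30, 30])
```

A wider window smooths over isolated low-quality bases at the cost of cutting a few good bases near the boundary, as the example shows with the sixth base.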

Parameters:

data (SeqData) – a SeqData object with a single SeqRecord to process.
min_qual (int) – minimum mean quality to define a cut point.
window (int) – nucleotide window size.
reverse (bool) – if True, cut the head of the sequence; if False, cut the tail of the sequence

Returns:

SeqResult object.

Return type:

presto.Sequence.weightSeq(seq, ignore_chars={})

Returns the length of a sequence, excluding ignored characters
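A one-line sketch of the count, assuming every character that is not ignored contributes a score of 1. The name weight_seq is hypothetical and the input is a plain string rather than a SeqRecord or Seq object.

```python
def weight_seq(seq, ignore_chars=frozenset()):
    """Count every character of seq except those in ignore_chars."""
    return sum(1 for c in seq if c not in ignore_chars)


print(weight_seq("AC-GN", {"-", "N"}))  # 3
```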

Parameters:

seq – SeqRecord or Seq object
ignore_chars – Set of characters to ignore when counting sequence length

Returns:

Sum of the character scores for the sequence

Return type: