presto.Sequence
Sequence processing functions
- class presto.Sequence.AssemblyRecord(seq=None)
Bases:
object
A class defining a paired-end assembly result
- property overlap
- class presto.Sequence.AssemblyStats(n)
Bases:
object
Class containing p-value and z-score matrices for scoring assemblies
- class presto.Sequence.PrimerAlignment(seq=None)
Bases:
object
A class defining a primer alignment result
- __bool__()
Boolean evaluation of the alignment
- Returns:
evaluates to the value of the valid attribute
- Return type:
- presto.Sequence.alignAssembly(head_seq, tail_seq, alpha=1e-05, max_error=0.3, min_len=8, max_len=1000, scan_reverse=False, assembly_stats=None, score_dict={('-', '-'): 0, ('-', '.'): 0, ('-', 'A'): 0, ('-', 'B'): 0, ('-', 'C'): 0, ('-', 'D'): 0, ('-', 'G'): 0, ('-', 'H'): 0, ('-', 'K'): 0, ('-', 'M'): 0, ('-', 'N'): 0, ('-', 'R'): 0, ('-', 'S'): 0, ('-', 'T'): 0, ('-', 'U'): 0, ('-', 'V'): 0, ('-', 'W'): 0, ('-', 'Y'): 0, ('.', '-'): 0, ('.', '.'): 0, ('.', 'A'): 0, ('.', 'B'): 0, ('.', 'C'): 0, ('.', 'D'): 0, ('.', 'G'): 0, ('.', 'H'): 0, ('.', 'K'): 0, ('.', 'M'): 0, ('.', 'N'): 0, ('.', 'R'): 0, ('.', 'S'): 0, ('.', 'T'): 0, ('.', 'U'): 0, ('.', 'V'): 0, ('.', 'W'): 0, ('.', 'Y'): 0, ('A', '-'): 0, ('A', '.'): 0, ('A', 'A'): 1, ('A', 'B'): 0, ('A', 'C'): 0, ('A', 'D'): 1, ('A', 'G'): 0, ('A', 'H'): 1, ('A', 'K'): 0, ('A', 'M'): 1, ('A', 'N'): 1, ('A', 'R'): 1, ('A', 'S'): 0, ('A', 'T'): 0, ('A', 'U'): 0, ('A', 'V'): 1, ('A', 'W'): 1, ('A', 'Y'): 0, ('B', '-'): 0, ('B', '.'): 0, ('B', 'A'): 0, ('B', 'B'): 1, ('B', 'C'): 1, ('B', 'D'): 1, ('B', 'G'): 1, ('B', 'H'): 1, ('B', 'K'): 1, ('B', 'M'): 1, ('B', 'N'): 1, ('B', 'R'): 1, ('B', 'S'): 1, ('B', 'T'): 1, ('B', 'U'): 1, ('B', 'V'): 1, ('B', 'W'): 1, ('B', 'Y'): 1, ('C', '-'): 0, ('C', '.'): 0, ('C', 'A'): 0, ('C', 'B'): 1, ('C', 'C'): 1, ('C', 'D'): 0, ('C', 'G'): 0, ('C', 'H'): 1, ('C', 'K'): 0, ('C', 'M'): 1, ('C', 'N'): 1, ('C', 'R'): 0, ('C', 'S'): 1, ('C', 'T'): 0, ('C', 'U'): 0, ('C', 'V'): 1, ('C', 'W'): 0, ('C', 'Y'): 1, ('D', '-'): 0, ('D', '.'): 0, ('D', 'A'): 1, ('D', 'B'): 1, ('D', 'C'): 0, ('D', 'D'): 1, ('D', 'G'): 1, ('D', 'H'): 1, ('D', 'K'): 1, ('D', 'M'): 1, ('D', 'N'): 1, ('D', 'R'): 1, ('D', 'S'): 1, ('D', 'T'): 1, ('D', 'U'): 1, ('D', 'V'): 1, ('D', 'W'): 1, ('D', 'Y'): 1, ('G', '-'): 0, ('G', '.'): 0, ('G', 'A'): 0, ('G', 'B'): 1, ('G', 'C'): 0, ('G', 'D'): 1, ('G', 'G'): 1, ('G', 'H'): 0, ('G', 'K'): 1, ('G', 'M'): 0, ('G', 'N'): 1, ('G', 'R'): 1, ('G', 'S'): 1, ('G', 'T'): 0, ('G', 'U'): 0, ('G', 'V'): 1, ('G', 'W'): 0, ('G', 'Y'): 0, ('H', '-'): 0, ('H', '.'): 0, ('H', 'A'): 1, ('H', 'B'): 1, ('H', 'C'): 1, ('H', 'D'): 1, ('H', 'G'): 0, ('H', 'H'): 1, ('H', 'K'): 1, ('H', 'M'): 1, ('H', 'N'): 1, ('H', 'R'): 1, ('H', 'S'): 1, ('H', 'T'): 1, ('H', 'U'): 1, ('H', 'V'): 1, ('H', 'W'): 1, ('H', 'Y'): 1, ('K', '-'): 0, ('K', '.'): 0, ('K', 'A'): 0, ('K', 'B'): 1, ('K', 'C'): 0, ('K', 'D'): 1, ('K', 'G'): 1, ('K', 'H'): 1, ('K', 'K'): 1, ('K', 'M'): 0, ('K', 'N'): 1, ('K', 'R'): 1, ('K', 'S'): 1, ('K', 'T'): 1, ('K', 'U'): 1, ('K', 'V'): 1, ('K', 'W'): 1, ('K', 'Y'): 1, ('M', '-'): 0, ('M', '.'): 0, ('M', 'A'): 1, ('M', 'B'): 1, ('M', 'C'): 1, ('M', 'D'): 1, ('M', 'G'): 0, ('M', 'H'): 1, ('M', 'K'): 0, ('M', 'M'): 1, ('M', 'N'): 1, ('M', 'R'): 1, ('M', 'S'): 1, ('M', 'T'): 0, ('M', 'U'): 0, ('M', 'V'): 1, ('M', 'W'): 1, ('M', 'Y'): 1, ('N', '-'): 1, ('N', '.'): 1, ('N', 'A'): 1, ('N', 'B'): 1, ('N', 'C'): 1, ('N', 'D'): 1, ('N', 'G'): 1, ('N', 'H'): 1, ('N', 'K'): 1, ('N', 'M'): 1, ('N', 'N'): 1, ('N', 'R'): 1, ('N', 'S'): 1, ('N', 'T'): 1, ('N', 'U'): 1, ('N', 'V'): 1, ('N', 'W'): 1, ('N', 'Y'): 1, ('R', '-'): 0, ('R', '.'): 0, ('R', 'A'): 1, ('R', 'B'): 1, ('R', 'C'): 0, ('R', 'D'): 1, ('R', 'G'): 1, ('R', 'H'): 1, ('R', 'K'): 1, ('R', 'M'): 1, ('R', 'N'): 1, ('R', 'R'): 1, ('R', 'S'): 1, ('R', 'T'): 0, ('R', 'U'): 0, ('R', 'V'): 1, ('R', 'W'): 1, ('R', 'Y'): 0, ('S', '-'): 0, ('S', '.'): 0, ('S', 'A'): 0, ('S', 'B'): 1, ('S', 'C'): 1, ('S', 'D'): 1, ('S', 'G'): 1, ('S', 'H'): 1, ('S', 'K'): 1, ('S', 'M'): 1, ('S', 'N'): 1, ('S', 'R'): 1, ('S', 'S'): 1, ('S', 'T'): 0, ('S', 'U'): 0, ('S', 'V'): 1, ('S', 'W'): 0, ('S', 'Y'): 1, ('T', '-'): 0, ('T', '.'): 0, ('T', 'A'): 0, ('T', 'B'): 1, ('T', 'C'): 0, ('T', 'D'): 1, ('T', 'G'): 0, ('T', 'H'): 1, ('T', 'K'): 1, ('T', 'M'): 0, ('T', 'N'): 1, ('T', 'R'): 0, ('T', 'S'): 0, ('T', 'T'): 1, ('T', 'U'): 1, ('T', 'V'): 0, ('T', 'W'): 1, ('T', 'Y'): 1, ('U', '-'): 0, ('U', '.'): 0, ('U', 'A'): 0, ('U', 'B'): 1, ('U', 'C'): 0, ('U', 'D'): 1, ('U', 'G'): 0, ('U', 'H'): 1, ('U', 'K'): 1, ('U', 'M'): 0, ('U', 'N'): 1, ('U', 'R'): 0, ('U', 'S'): 0, ('U', 'T'): 1, ('U', 'U'): 1, ('U', 'V'): 0, ('U', 'W'): 1, ('U', 'Y'): 1, ('V', '-'): 0, ('V', '.'): 0, ('V', 'A'): 1, ('V', 'B'): 1, ('V', 'C'): 1, ('V', 'D'): 1, ('V', 'G'): 1, ('V', 'H'): 1, ('V', 'K'): 1, ('V', 'M'): 1, ('V', 'N'): 1, ('V', 'R'): 1, ('V', 'S'): 1, ('V', 'T'): 0, ('V', 'U'): 0, ('V', 'V'): 1, ('V', 'W'): 1, ('V', 'Y'): 1, ('W', '-'): 0, ('W', '.'): 0, ('W', 'A'): 1, ('W', 'B'): 1, ('W', 'C'): 0, ('W', 'D'): 1, ('W', 'G'): 0, ('W', 'H'): 1, ('W', 'K'): 1, ('W', 'M'): 1, ('W', 'N'): 1, ('W', 'R'): 1, ('W', 'S'): 0, ('W', 'T'): 1, ('W', 'U'): 1, ('W', 'V'): 1, ('W', 'W'): 1, ('W', 'Y'): 1, ('Y', '-'): 0, ('Y', '.'): 0, ('Y', 'A'): 0, ('Y', 'B'): 1, ('Y', 'C'): 1, ('Y', 'D'): 1, ('Y', 'G'): 0, ('Y', 'H'): 1, ('Y', 'K'): 1, ('Y', 'M'): 1, ('Y', 'N'): 1, ('Y', 'R'): 0, ('Y', 'S'): 1, ('Y', 'T'): 1, ('Y', 'U'): 1, ('Y', 'V'): 1, ('Y', 'W'): 1, ('Y', 'Y'): 1})
Stitches two sequences together by aligning the ends
- Parameters:
head_seq – the head SeqRecord.
head_seq – the tail SeqRecord.
alpha – the minimum p-value for a valid assembly.
max_error – the maximum error rate for a valid assembly.
min_len – minimum length of overlap to test.
max_len – maximum length of overlap to test.
scan_reverse – if True allow the head sequence to overhang the end of the tail sequence if False end alignment scan at end of tail sequence or start of head sequence.
assembly_stats – optional successes by trials numpy.array of p-values.
score_dict – optional dictionary of character scores in the . form {(char1, char2): score}.
- Returns:
assembled sequence object.
- Return type:
- presto.Sequence.calculateDiversity(seq_list, score_dict=getDNAScoreDict())
Determine the average pairwise error rate for a list of sequences
- Parameters:
seq_list – List of SeqRecord objects to score
score_dict – Optional dictionary of alignment scores as {(char1, char2): score}
- Returns:
Average pairwise error rate for the list of sequences
- Return type:
- presto.Sequence.calculateSetError(seq_list, ref_seq, ignore_chars=['n', 'N'], score_dict=getDNAScoreDict())
Counts the occurrence of nucleotide mismatches from a reference in a set of sequences
- Parameters:
seq_list – list of SeqRecord objects with aligned sequences.
ref_seq – SeqRecord object containing the reference sequence to match against.
ignore_chars – list of characters to exclude from mismatch counts.
score_dict – optional dictionary of alignment scores as {(char1, char2): score}.
- Returns:
error rate for the set.
- Return type:
- presto.Sequence.checkSeqEqual(seq1, seq2, ignore_chars={'-', '.', 'N', 'n'})
Determine if two sequences are equal, excluding missing positions
- Parameters:
seq1 – SeqRecord object
seq2 – SeqRecord object
ignore_chars – Set of characters to ignore
- Returns:
True if the sequences are equal
- Return type:
- presto.Sequence.compilePrimers(primers)
Translates IUPAC Ambiguous Nucleotide characters to regular expressions and compiles them
- Parameters:
key – Dictionary of sequences to translate
- Returns:
Dictionary of compiled regular expressions
- Return type:
- presto.Sequence.consensusUnify(data, field, delimiter=('|', '=', ','))
Reassigns all annotations to the consensus annotation in group
- Parameters:
data – SeqData object contain sequences to process.
field – field containing annotations to collapse.
delimiter – a tuple of delimiters for (annotations, field/values, value lists).
- Returns:
modified sequences.
- Return type:
- presto.Sequence.deleteSeqPositions(seq, positions)
Deletes a list of positions from a SeqRecord
- Parameters:
seq – SeqRecord objects
positions – Set of positions (indices) to delete
- Returns:
Modified SeqRecord with the specified positions removed
- Return type:
SeqRecord
- presto.Sequence.deletionUnify(data, field, delimiter=('|', '=', ','))
Removes all sequences with differing annotations in a group
- Parameters:
data – SeqData object contain sequences to process.
field – field containing annotations to collapse.
delimiter – a tuple of delimiters for (annotations, field/values, value lists).
- Returns:
modified sequences.
- Return type:
- presto.Sequence.extractAlignment(seq_record, start, length, rev_primer=False)
Extracts a subsequence from sequence
- Parameters:
data – SeqRecord to process.
start – position where subsequence starts.
length – the length of the subsequence to extract.
rev_primer – if True extract relative to the tail end of the sequence.
- Returns:
extraction results as an alignment object
- Return type:
- presto.Sequence.filterLength(data, min_length=250, inner=True, missing_chars='-n.N')
Filters sequences by length
- Parameters:
- Returns:
SeqResult object.
- Return type:
- presto.Sequence.filterMissing(data, max_missing=10, inner=True, missing_chars='-n.N')
Filters sequences by number of missing nucleotides
- Parameters:
- Returns:
SeqResult object.
- Return type:
- presto.Sequence.filterQuality(data, min_qual=0, inner=True, missing_chars='-n.N')
Filters sequences by quality score
- Parameters:
- Returns:
SeqResult object.
- Return type:
- presto.Sequence.filterRepeats(data, max_repeat=15, include_missing=False, inner=True, missing_chars='-n.N')
Filters sequences by fraction of ambiguous nucleotides
- Parameters:
data (SeqData) – a SeqData object with a single SeqRecord to process.
max_repeat (int) – the maximum number of allowed repeating characters.
include_missing (int) – if True count ambiguous character repeats; if False do not consider ambiguous character repeats.
inner (int) – if True exclude outer missing characters from calculation.
missing_chars (str) – a string of missing character values.
- Returns:
SeqResult object.
- Return type:
- presto.Sequence.findGapPositions(seq_list, max_gap, gap_chars={'-', '.'})
Finds positions in a set of aligned sequences with a high number of gap characters.
- Parameters:
seq_list – List of SeqRecord objects with aligned sequences
max_gap – Float of the maximum gap frequency to consider a position as non-gapped
gap_chars – Set of characters to consider as gaps
- Returns:
Positions (indices) with gap frequency greater than max_gap
- Return type:
- presto.Sequence.frequencyConsensus(seq_list, min_freq=0.6, ignore_chars={'-', '.', 'N', 'n'})
Builds a consensus sequence from a set of sequences
- Parameters:
set_seq – List of SeqRecord objects
min_freq – Frequency cutoff to assign a base
ignore_chars – Set of characters to exclude when building a consensus sequence
- Returns:
Consensus SeqRecord object
- Return type:
SeqRecord
- presto.Sequence.getAAScoreDict(mask_score=None, gap_score=None)
Generates a score dictionary
- Parameters:
mask_score – Tuple of length two defining scores for all matches against an X character for (a, b), with the score for character (a) taking precedence; if None score symmetrically according to IUPAC character identity
gap_score – Tuple of length two defining score for all matches against a [-, .] character for (a, b), with the score for character (a) taking precedence; if None score symmetrically according to IUPAC character identity
- Returns:
Score dictionary with keys (char1, char2) mapping to scores
- Return type:
- presto.Sequence.getDNAScoreDict(mask_score=None, gap_score=None)
Generates a score dictionary
- Parameters:
mask_score – Tuple of length two defining scores for all matches against an N character for (a, b), with the score for character (a) taking precedence; if None score symmetrically according to IUPAC character identity
gap_score – Tuple of length two defining score for all matches against a [-, .] character for (a, b), with the score for character (a) taking precedence; if None score symmetrically according to IUPAC character identity
- Returns:
Score dictionary with keys (char1, char2) mapping to scores
- Return type:
- presto.Sequence.indexSeqSets(seq_dict, field='BARCODE', delimiter=('|', '=', ','))
Identifies sets of sequences with the same ID field
- Parameters:
seq_dict – a dictionary index of sequences returned from SeqIO.index()
field – the annotation field containing set IDs
delimiter – a tuple of delimiters for (fields, values, value lists)
- Returns:
Dictionary mapping set name to a list of record names
- Return type:
- presto.Sequence.joinAssembly(head_seq, tail_seq, gap=0, insert_seq=None)
Concatenates two sequences
- Parameters:
head_seq – the head SeqRecord.
tail_seq – the tail SeqRecord.
gap – number of gap characters to insert between head and tail ignored if insert_seq is not None.
insert_seq – a string or Bio.Seq.Seq object, to insert between the head and tail; if None insert with N characters.
- Returns:
assembled sequence object.
- Return type:
- presto.Sequence.localAlignment(seq_record, primers, primers_regex=None, max_error=0.3, max_len=1000, rev_primer=False, skip_rc=False, gap_penalty=(1, 1), score_dict={('-', '-'): 0, ('-', '.'): 0, ('-', 'A'): 0, ('-', 'B'): 0, ('-', 'C'): 0, ('-', 'D'): 0, ('-', 'G'): 0, ('-', 'H'): 0, ('-', 'K'): 0, ('-', 'M'): 0, ('-', 'N'): 0, ('-', 'R'): 0, ('-', 'S'): 0, ('-', 'T'): 0, ('-', 'U'): 0, ('-', 'V'): 0, ('-', 'W'): 0, ('-', 'Y'): 0, ('.', '-'): 0, ('.', '.'): 0, ('.', 'A'): 0, ('.', 'B'): 0, ('.', 'C'): 0, ('.', 'D'): 0, ('.', 'G'): 0, ('.', 'H'): 0, ('.', 'K'): 0, ('.', 'M'): 0, ('.', 'N'): 0, ('.', 'R'): 0, ('.', 'S'): 0, ('.', 'T'): 0, ('.', 'U'): 0, ('.', 'V'): 0, ('.', 'W'): 0, ('.', 'Y'): 0, ('A', '-'): 0, ('A', '.'): 0, ('A', 'A'): 1, ('A', 'B'): 0, ('A', 'C'): 0, ('A', 'D'): 1, ('A', 'G'): 0, ('A', 'H'): 1, ('A', 'K'): 0, ('A', 'M'): 1, ('A', 'N'): 1, ('A', 'R'): 1, ('A', 'S'): 0, ('A', 'T'): 0, ('A', 'U'): 0, ('A', 'V'): 1, ('A', 'W'): 1, ('A', 'Y'): 0, ('B', '-'): 0, ('B', '.'): 0, ('B', 'A'): 0, ('B', 'B'): 1, ('B', 'C'): 1, ('B', 'D'): 1, ('B', 'G'): 1, ('B', 'H'): 1, ('B', 'K'): 1, ('B', 'M'): 1, ('B', 'N'): 1, ('B', 'R'): 1, ('B', 'S'): 1, ('B', 'T'): 1, ('B', 'U'): 1, ('B', 'V'): 1, ('B', 'W'): 1, ('B', 'Y'): 1, ('C', '-'): 0, ('C', '.'): 0, ('C', 'A'): 0, ('C', 'B'): 1, ('C', 'C'): 1, ('C', 'D'): 0, ('C', 'G'): 0, ('C', 'H'): 1, ('C', 'K'): 0, ('C', 'M'): 1, ('C', 'N'): 1, ('C', 'R'): 0, ('C', 'S'): 1, ('C', 'T'): 0, ('C', 'U'): 0, ('C', 'V'): 1, ('C', 'W'): 0, ('C', 'Y'): 1, ('D', '-'): 0, ('D', '.'): 0, ('D', 'A'): 1, ('D', 'B'): 1, ('D', 'C'): 0, ('D', 'D'): 1, ('D', 'G'): 1, ('D', 'H'): 1, ('D', 'K'): 1, ('D', 'M'): 1, ('D', 'N'): 1, ('D', 'R'): 1, ('D', 'S'): 1, ('D', 'T'): 1, ('D', 'U'): 1, ('D', 'V'): 1, ('D', 'W'): 1, ('D', 'Y'): 1, ('G', '-'): 0, ('G', '.'): 0, ('G', 'A'): 0, ('G', 'B'): 1, ('G', 'C'): 0, ('G', 'D'): 1, ('G', 'G'): 1, ('G', 'H'): 0, ('G', 'K'): 1, ('G', 'M'): 0, ('G', 'N'): 1, ('G', 'R'): 1, ('G', 'S'): 1, ('G', 'T'): 0, ('G', 'U'): 0, ('G', 'V'): 1, ('G', 'W'): 0, ('G', 'Y'): 0, ('H', '-'): 0, ('H', '.'): 0, ('H', 'A'): 1, ('H', 'B'): 1, ('H', 'C'): 1, ('H', 'D'): 1, ('H', 'G'): 0, ('H', 'H'): 1, ('H', 'K'): 1, ('H', 'M'): 1, ('H', 'N'): 1, ('H', 'R'): 1, ('H', 'S'): 1, ('H', 'T'): 1, ('H', 'U'): 1, ('H', 'V'): 1, ('H', 'W'): 1, ('H', 'Y'): 1, ('K', '-'): 0, ('K', '.'): 0, ('K', 'A'): 0, ('K', 'B'): 1, ('K', 'C'): 0, ('K', 'D'): 1, ('K', 'G'): 1, ('K', 'H'): 1, ('K', 'K'): 1, ('K', 'M'): 0, ('K', 'N'): 1, ('K', 'R'): 1, ('K', 'S'): 1, ('K', 'T'): 1, ('K', 'U'): 1, ('K', 'V'): 1, ('K', 'W'): 1, ('K', 'Y'): 1, ('M', '-'): 0, ('M', '.'): 0, ('M', 'A'): 1, ('M', 'B'): 1, ('M', 'C'): 1, ('M', 'D'): 1, ('M', 'G'): 0, ('M', 'H'): 1, ('M', 'K'): 0, ('M', 'M'): 1, ('M', 'N'): 1, ('M', 'R'): 1, ('M', 'S'): 1, ('M', 'T'): 0, ('M', 'U'): 0, ('M', 'V'): 1, ('M', 'W'): 1, ('M', 'Y'): 1, ('N', '-'): 0, ('N', '.'): 0, ('N', 'A'): 0, ('N', 'B'): 0, ('N', 'C'): 0, ('N', 'D'): 0, ('N', 'G'): 0, ('N', 'H'): 0, ('N', 'K'): 0, ('N', 'M'): 0, ('N', 'N'): 0, ('N', 'R'): 0, ('N', 'S'): 0, ('N', 'T'): 0, ('N', 'U'): 0, ('N', 'V'): 0, ('N', 'W'): 0, ('N', 'Y'): 0, ('R', '-'): 0, ('R', '.'): 0, ('R', 'A'): 1, ('R', 'B'): 1, ('R', 'C'): 0, ('R', 'D'): 1, ('R', 'G'): 1, ('R', 'H'): 1, ('R', 'K'): 1, ('R', 'M'): 1, ('R', 'N'): 1, ('R', 'R'): 1, ('R', 'S'): 1, ('R', 'T'): 0, ('R', 'U'): 0, ('R', 'V'): 1, ('R', 'W'): 1, ('R', 'Y'): 0, ('S', '-'): 0, ('S', '.'): 0, ('S', 'A'): 0, ('S', 'B'): 1, ('S', 'C'): 1, ('S', 'D'): 1, ('S', 'G'): 1, ('S', 'H'): 1, ('S', 'K'): 1, ('S', 'M'): 1, ('S', 'N'): 1, ('S', 'R'): 1, ('S', 'S'): 1, ('S', 'T'): 0, ('S', 'U'): 0, ('S', 'V'): 1, ('S', 'W'): 0, ('S', 'Y'): 1, ('T', '-'): 0, ('T', '.'): 0, ('T', 'A'): 0, ('T', 'B'): 1, ('T', 'C'): 0, ('T', 'D'): 1, ('T', 'G'): 0, ('T', 'H'): 1, ('T', 'K'): 1, ('T', 'M'): 0, ('T', 'N'): 1, ('T', 'R'): 0, ('T', 'S'): 0, ('T', 'T'): 1, ('T', 'U'): 1, ('T', 'V'): 0, ('T', 'W'): 1, ('T', 'Y'): 1, ('U', '-'): 0, ('U', '.'): 0, ('U', 'A'): 0, ('U', 'B'): 1, ('U', 'C'): 0, ('U', 'D'): 1, ('U', 'G'): 0, ('U', 'H'): 1, ('U', 'K'): 1, ('U', 'M'): 0, ('U', 'N'): 1, ('U', 'R'): 0, ('U', 'S'): 0, ('U', 'T'): 1, ('U', 'U'): 1, ('U', 'V'): 0, ('U', 'W'): 1, ('U', 'Y'): 1, ('V', '-'): 0, ('V', '.'): 0, ('V', 'A'): 1, ('V', 'B'): 1, ('V', 'C'): 1, ('V', 'D'): 1, ('V', 'G'): 1, ('V', 'H'): 1, ('V', 'K'): 1, ('V', 'M'): 1, ('V', 'N'): 1, ('V', 'R'): 1, ('V', 'S'): 1, ('V', 'T'): 0, ('V', 'U'): 0, ('V', 'V'): 1, ('V', 'W'): 1, ('V', 'Y'): 1, ('W', '-'): 0, ('W', '.'): 0, ('W', 'A'): 1, ('W', 'B'): 1, ('W', 'C'): 0, ('W', 'D'): 1, ('W', 'G'): 0, ('W', 'H'): 1, ('W', 'K'): 1, ('W', 'M'): 1, ('W', 'N'): 1, ('W', 'R'): 1, ('W', 'S'): 0, ('W', 'T'): 1, ('W', 'U'): 1, ('W', 'V'): 1, ('W', 'W'): 1, ('W', 'Y'): 1, ('Y', '-'): 0, ('Y', '.'): 0, ('Y', 'A'): 0, ('Y', 'B'): 1, ('Y', 'C'): 1, ('Y', 'D'): 1, ('Y', 'G'): 0, ('Y', 'H'): 1, ('Y', 'K'): 1, ('Y', 'M'): 1, ('Y', 'N'): 1, ('Y', 'R'): 0, ('Y', 'S'): 1, ('Y', 'T'): 1, ('Y', 'U'): 1, ('Y', 'V'): 1, ('Y', 'W'): 1, ('Y', 'Y'): 1})
Performs pairwise local alignment of a list of short sequences against a long sequence
- Parameters:
seq_record – a SeqRecord object to align primers against
primers – dictionary of {names: short IUPAC ambiguous sequence strings}
primers_regex – optional dictionary of {names: compiled primer regular expressions}
max_error – maximum acceptable error rate before aligning reverse complement
max_len – maximum length of sample sequence to align
rev_primer – if True align with the tail end of the sequence
skip_rc – if True do not check reverse complement sequences
gap_penalty – a tuple of positive (gap open, gap extend) penalties
score_dict – optional dictionary of alignment scores as {(char1, char2): score}
- Returns:
primer alignment result object
- Return type:
- presto.Sequence.maskQuality(data, min_qual=0)
Masks characters by in sequence by quality score
- presto.Sequence.maskSeq(align, mode='mask', barcode=False, barcode_field='BARCODE', primer_field='PRIMER', delimiter=('|', '=', ','))
Create an output sequence with primers masked or cut
- Parameters:
align – a PrimerAlignment object.
mode – defines the action taken; one of ‘cut’, ‘mask’, ‘tag’ or ‘trim’.
barcode – if True add sequence preceding primer to description.
barcode_field – name of the output barcode annotation.
primer_field – name of the output primer annotation.
delimiter – a tuple of delimiters for (annotations, field/values, value lists).
- Returns:
masked sequence.
- Return type:
Bio.SeqRecord.SeqRecord
- presto.Sequence.meanQuality(qual, prob=(1.0, 0.7943282347242815, 0.6309573444801932, 0.5011872336272722, 0.3981071705534972, 0.31622776601683794, 0.251188643150958, 0.19952623149688797, 0.15848931924611134, 0.12589254117941673, 0.1, 0.07943282347242814, 0.06309573444801933, 0.05011872336272722, 0.039810717055349734, 0.03162277660168379, 0.025118864315095794, 0.0199526231496888, 0.015848931924611134, 0.012589254117941675, 0.01, 0.007943282347242814, 0.00630957344480193, 0.005011872336272725, 0.003981071705534973, 0.0031622776601683794, 0.0025118864315095794, 0.001995262314968879, 0.001584893192461114, 0.0012589254117941675, 0.001, 0.0007943282347242813, 0.000630957344480193, 0.0005011872336272725, 0.00039810717055349735, 0.00031622776601683794, 0.00025118864315095795, 0.00019952623149688788, 0.00015848931924611142, 0.00012589254117941674, 0.0001, 7.943282347242822e-05, 6.309573444801929e-05, 5.011872336272725e-05, 3.9810717055349695e-05, 3.1622776601683795e-05, 2.5118864315095822e-05, 1.9952623149688786e-05, 1.584893192461114e-05, 1.2589254117941661e-05, 1e-05, 7.943282347242822e-06, 6.30957344480193e-06, 5.011872336272725e-06, 3.981071705534969e-06, 3.162277660168379e-06, 2.5118864315095823e-06, 1.9952623149688787e-06, 1.584893192461114e-06, 1.2589254117941661e-06, 1e-06, 7.943282347242822e-07, 6.30957344480193e-07, 5.011872336272725e-07, 3.981071705534969e-07, 3.162277660168379e-07, 2.5118864315095823e-07, 1.9952623149688787e-07, 1.584893192461114e-07, 1.2589254117941662e-07, 1e-07, 7.943282347242822e-08, 6.30957344480193e-08, 5.011872336272725e-08, 3.981071705534969e-08, 3.162277660168379e-08, 2.511886431509582e-08, 1.9952623149688786e-08, 1.5848931924611143e-08, 1.2589254117941661e-08, 1e-08, 7.943282347242822e-09, 6.309573444801943e-09, 5.011872336272715e-09, 3.981071705534969e-09, 3.1622776601683795e-09, 2.511886431509582e-09, 1.9952623149688828e-09, 1.584893192461111e-09, 1.2589254117941663e-09, 1e-09, 7.943282347242822e-10, 6.309573444801942e-10, 5.011872336272714e-10, 3.9810717055349694e-10, 3.1622776601683795e-10, 2.511886431509582e-10, 1.9952623149688828e-10, 1.584893192461111e-10, 1.2589254117941662e-10, 1e-10, 7.943282347242822e-11, 6.309573444801942e-11, 5.011872336272715e-11, 3.9810717055349695e-11, 3.1622776601683794e-11, 2.5118864315095823e-11, 1.9952623149688828e-11, 1.5848931924611107e-11, 1.2589254117941662e-11, 1e-11, 7.943282347242821e-12, 6.309573444801943e-12, 5.011872336272715e-12, 3.9810717055349695e-12, 3.1622776601683794e-12, 2.5118864315095823e-12, 1.9952623149688827e-12, 1.584893192461111e-12, 1.258925411794166e-12, 1e-12, 7.943282347242822e-13, 6.309573444801942e-13, 5.011872336272715e-13, 3.981071705534969e-13, 3.162277660168379e-13, 2.511886431509582e-13, 1.9952623149688827e-13))
Calculate mean quality score
- presto.Sequence.overlapConsensus(head_seq, tail_seq, ignore_chars={'-', '.', 'N', 'n'})
Creates a consensus overlap sequences from two segments
- Parameters:
head_seq – the overlap head SeqRecord.
tail_seq – the overlap tail SeqRecord.
ignore_chars – list of characters which do not contribute to consensus.
- Returns:
A SeqRecord object with consensus characters and quality scores.
- Return type:
SeqRecord
- presto.Sequence.qualityConsensus(seq_list, min_qual=0, min_freq=0.6, dependent=False, ignore_chars={'-', '.', 'N', 'n'})
Builds a consensus sequence from a set of sequences
- Parameters:
seq_list – List of SeqRecord objects
min_qual – Quality cutoff to assign a base
min_freq – Frequency cutoff to assign a base
dependent – If False assume sequences are independent for quality calculation
ignore_chars – Set of characters to exclude when building a consensus sequence
- Returns:
Consensus SeqRecord object
- Return type:
SeqRecord
- presto.Sequence.referenceAssembly(head_seq, tail_seq, ref_dict, ref_db, min_ident=0.5, evalue=1e-05, max_hits=100, fill=False, aligner='usearch', aligner_exec='usearch', score_dict={('-', '-'): 0, ('-', '.'): 0, ('-', 'A'): 0, ('-', 'B'): 0, ('-', 'C'): 0, ('-', 'D'): 0, ('-', 'G'): 0, ('-', 'H'): 0, ('-', 'K'): 0, ('-', 'M'): 0, ('-', 'N'): 0, ('-', 'R'): 0, ('-', 'S'): 0, ('-', 'T'): 0, ('-', 'U'): 0, ('-', 'V'): 0, ('-', 'W'): 0, ('-', 'Y'): 0, ('.', '-'): 0, ('.', '.'): 0, ('.', 'A'): 0, ('.', 'B'): 0, ('.', 'C'): 0, ('.', 'D'): 0, ('.', 'G'): 0, ('.', 'H'): 0, ('.', 'K'): 0, ('.', 'M'): 0, ('.', 'N'): 0, ('.', 'R'): 0, ('.', 'S'): 0, ('.', 'T'): 0, ('.', 'U'): 0, ('.', 'V'): 0, ('.', 'W'): 0, ('.', 'Y'): 0, ('A', '-'): 0, ('A', '.'): 0, ('A', 'A'): 1, ('A', 'B'): 0, ('A', 'C'): 0, ('A', 'D'): 1, ('A', 'G'): 0, ('A', 'H'): 1, ('A', 'K'): 0, ('A', 'M'): 1, ('A', 'N'): 1, ('A', 'R'): 1, ('A', 'S'): 0, ('A', 'T'): 0, ('A', 'U'): 0, ('A', 'V'): 1, ('A', 'W'): 1, ('A', 'Y'): 0, ('B', '-'): 0, ('B', '.'): 0, ('B', 'A'): 0, ('B', 'B'): 1, ('B', 'C'): 1, ('B', 'D'): 1, ('B', 'G'): 1, ('B', 'H'): 1, ('B', 'K'): 1, ('B', 'M'): 1, ('B', 'N'): 1, ('B', 'R'): 1, ('B', 'S'): 1, ('B', 'T'): 1, ('B', 'U'): 1, ('B', 'V'): 1, ('B', 'W'): 1, ('B', 'Y'): 1, ('C', '-'): 0, ('C', '.'): 0, ('C', 'A'): 0, ('C', 'B'): 1, ('C', 'C'): 1, ('C', 'D'): 0, ('C', 'G'): 0, ('C', 'H'): 1, ('C', 'K'): 0, ('C', 'M'): 1, ('C', 'N'): 1, ('C', 'R'): 0, ('C', 'S'): 1, ('C', 'T'): 0, ('C', 'U'): 0, ('C', 'V'): 1, ('C', 'W'): 0, ('C', 'Y'): 1, ('D', '-'): 0, ('D', '.'): 0, ('D', 'A'): 1, ('D', 'B'): 1, ('D', 'C'): 0, ('D', 'D'): 1, ('D', 'G'): 1, ('D', 'H'): 1, ('D', 'K'): 1, ('D', 'M'): 1, ('D', 'N'): 1, ('D', 'R'): 1, ('D', 'S'): 1, ('D', 'T'): 1, ('D', 'U'): 1, ('D', 'V'): 1, ('D', 'W'): 1, ('D', 'Y'): 1, ('G', '-'): 0, ('G', '.'): 0, ('G', 'A'): 0, ('G', 'B'): 1, ('G', 'C'): 0, ('G', 'D'): 1, ('G', 'G'): 1, ('G', 'H'): 0, ('G', 'K'): 1, ('G', 'M'): 0, ('G', 'N'): 1, ('G', 'R'): 1, ('G', 'S'): 1, ('G', 'T'): 0, ('G', 'U'): 0, ('G', 'V'): 1, ('G', 'W'): 0, ('G', 'Y'): 0, ('H', '-'): 0, ('H', '.'): 0, ('H', 'A'): 1, ('H', 'B'): 1, ('H', 'C'): 1, ('H', 'D'): 1, ('H', 'G'): 0, ('H', 'H'): 1, ('H', 'K'): 1, ('H', 'M'): 1, ('H', 'N'): 1, ('H', 'R'): 1, ('H', 'S'): 1, ('H', 'T'): 1, ('H', 'U'): 1, ('H', 'V'): 1, ('H', 'W'): 1, ('H', 'Y'): 1, ('K', '-'): 0, ('K', '.'): 0, ('K', 'A'): 0, ('K', 'B'): 1, ('K', 'C'): 0, ('K', 'D'): 1, ('K', 'G'): 1, ('K', 'H'): 1, ('K', 'K'): 1, ('K', 'M'): 0, ('K', 'N'): 1, ('K', 'R'): 1, ('K', 'S'): 1, ('K', 'T'): 1, ('K', 'U'): 1, ('K', 'V'): 1, ('K', 'W'): 1, ('K', 'Y'): 1, ('M', '-'): 0, ('M', '.'): 0, ('M', 'A'): 1, ('M', 'B'): 1, ('M', 'C'): 1, ('M', 'D'): 1, ('M', 'G'): 0, ('M', 'H'): 1, ('M', 'K'): 0, ('M', 'M'): 1, ('M', 'N'): 1, ('M', 'R'): 1, ('M', 'S'): 1, ('M', 'T'): 0, ('M', 'U'): 0, ('M', 'V'): 1, ('M', 'W'): 1, ('M', 'Y'): 1, ('N', '-'): 1, ('N', '.'): 1, ('N', 'A'): 1, ('N', 'B'): 1, ('N', 'C'): 1, ('N', 'D'): 1, ('N', 'G'): 1, ('N', 'H'): 1, ('N', 'K'): 1, ('N', 'M'): 1, ('N', 'N'): 1, ('N', 'R'): 1, ('N', 'S'): 1, ('N', 'T'): 1, ('N', 'U'): 1, ('N', 'V'): 1, ('N', 'W'): 1, ('N', 'Y'): 1, ('R', '-'): 0, ('R', '.'): 0, ('R', 'A'): 1, ('R', 'B'): 1, ('R', 'C'): 0, ('R', 'D'): 1, ('R', 'G'): 1, ('R', 'H'): 1, ('R', 'K'): 1, ('R', 'M'): 1, ('R', 'N'): 1, ('R', 'R'): 1, ('R', 'S'): 1, ('R', 'T'): 0, ('R', 'U'): 0, ('R', 'V'): 1, ('R', 'W'): 1, ('R', 'Y'): 0, ('S', '-'): 0, ('S', '.'): 0, ('S', 'A'): 0, ('S', 'B'): 1, ('S', 'C'): 1, ('S', 'D'): 1, ('S', 'G'): 1, ('S', 'H'): 1, ('S', 'K'): 1, ('S', 'M'): 1, ('S', 'N'): 1, ('S', 'R'): 1, ('S', 'S'): 1, ('S', 'T'): 0, ('S', 'U'): 0, ('S', 'V'): 1, ('S', 'W'): 0, ('S', 'Y'): 1, ('T', '-'): 0, ('T', '.'): 0, ('T', 'A'): 0, ('T', 'B'): 1, ('T', 'C'): 0, ('T', 'D'): 1, ('T', 'G'): 0, ('T', 'H'): 1, ('T', 'K'): 1, ('T', 'M'): 0, ('T', 'N'): 1, ('T', 'R'): 0, ('T', 'S'): 0, ('T', 'T'): 1, ('T', 'U'): 1, ('T', 'V'): 0, ('T', 'W'): 1, ('T', 'Y'): 1, ('U', '-'): 0, ('U', '.'): 0, ('U', 'A'): 0, ('U', 'B'): 1, ('U', 'C'): 0, ('U', 'D'): 1, ('U', 'G'): 0, ('U', 'H'): 1, ('U', 'K'): 1, ('U', 'M'): 0, ('U', 'N'): 1, ('U', 'R'): 0, ('U', 'S'): 0, ('U', 'T'): 1, ('U', 'U'): 1, ('U', 'V'): 0, ('U', 'W'): 1, ('U', 'Y'): 1, ('V', '-'): 0, ('V', '.'): 0, ('V', 'A'): 1, ('V', 'B'): 1, ('V', 'C'): 1, ('V', 'D'): 1, ('V', 'G'): 1, ('V', 'H'): 1, ('V', 'K'): 1, ('V', 'M'): 1, ('V', 'N'): 1, ('V', 'R'): 1, ('V', 'S'): 1, ('V', 'T'): 0, ('V', 'U'): 0, ('V', 'V'): 1, ('V', 'W'): 1, ('V', 'Y'): 1, ('W', '-'): 0, ('W', '.'): 0, ('W', 'A'): 1, ('W', 'B'): 1, ('W', 'C'): 0, ('W', 'D'): 1, ('W', 'G'): 0, ('W', 'H'): 1, ('W', 'K'): 1, ('W', 'M'): 1, ('W', 'N'): 1, ('W', 'R'): 1, ('W', 'S'): 0, ('W', 'T'): 1, ('W', 'U'): 1, ('W', 'V'): 1, ('W', 'W'): 1, ('W', 'Y'): 1, ('Y', '-'): 0, ('Y', '.'): 0, ('Y', 'A'): 0, ('Y', 'B'): 1, ('Y', 'C'): 1, ('Y', 'D'): 1, ('Y', 'G'): 0, ('Y', 'H'): 1, ('Y', 'K'): 1, ('Y', 'M'): 1, ('Y', 'N'): 1, ('Y', 'R'): 0, ('Y', 'S'): 1, ('Y', 'T'): 1, ('Y', 'U'): 1, ('Y', 'V'): 1, ('Y', 'W'): 1, ('Y', 'Y'): 1})
Stitches two sequences together by aligning against a reference database
- Parameters:
head_seq – the head SeqRecord.
head_seq – the tail SeqRecord.
ref_dict – a dictionary of reference SeqRecord objects.
ref_db – the path and name of the reference database.
min_ident – the minimum identity for a valid assembly.
evalue – the E-value cut-off for ublast.
max_hits – the maxhits output limit for ublast.
fill – if False non-overlapping regions will be assigned Ns; if True non-overlapping regions will be filled with the reference sequence.
aligner – the alignment tool; one of ‘blastn’ or ‘usearch’.
aligner_exec – the path to the alignment tool executable.
score_dict – optional dictionary of character scores in the form {(char1, char2): score}.
- Returns:
assembled sequence object.
- Return type:
- presto.Sequence.reverseComplement(seq)
Takes the reverse complement of a sequence
- Parameters:
seq – a SeqRecord object, Seq object or string to reverse complement
- Returns:
Object of the same type as the input with the reverse complement sequence
- Return type:
Seq
- presto.Sequence.scoreAA(a, b, mask_score=None, gap_score=None)
Returns the score for a pair of IUPAC Extended Protein characters
- Parameters:
a – First character
b – Second character
mask_score – Tuple of length two defining scores for all matches against an X character for (a, b), with the score for character (a) taking precedence; if None score symmetrically according to IUPAC character identity
gap_score – Tuple of length two defining score for all matches against a gap (-, .) character for (a, b), with the score for character (a) taking precedence; if None score symmetrically according to IUPAC character identity
- Returns:
Score for the character pair
- Return type:
- presto.Sequence.scoreAlignment(seq_record, primers, start=0, rev_primer=False, score_dict={('-', '-'): 0, ('-', '.'): 0, ('-', 'A'): 0, ('-', 'B'): 0, ('-', 'C'): 0, ('-', 'D'): 0, ('-', 'G'): 0, ('-', 'H'): 0, ('-', 'K'): 0, ('-', 'M'): 0, ('-', 'N'): 0, ('-', 'R'): 0, ('-', 'S'): 0, ('-', 'T'): 0, ('-', 'U'): 0, ('-', 'V'): 0, ('-', 'W'): 0, ('-', 'Y'): 0, ('.', '-'): 0, ('.', '.'): 0, ('.', 'A'): 0, ('.', 'B'): 0, ('.', 'C'): 0, ('.', 'D'): 0, ('.', 'G'): 0, ('.', 'H'): 0, ('.', 'K'): 0, ('.', 'M'): 0, ('.', 'N'): 0, ('.', 'R'): 0, ('.', 'S'): 0, ('.', 'T'): 0, ('.', 'U'): 0, ('.', 'V'): 0, ('.', 'W'): 0, ('.', 'Y'): 0, ('A', '-'): 0, ('A', '.'): 0, ('A', 'A'): 1, ('A', 'B'): 0, ('A', 'C'): 0, ('A', 'D'): 1, ('A', 'G'): 0, ('A', 'H'): 1, ('A', 'K'): 0, ('A', 'M'): 1, ('A', 'N'): 1, ('A', 'R'): 1, ('A', 'S'): 0, ('A', 'T'): 0, ('A', 'U'): 0, ('A', 'V'): 1, ('A', 'W'): 1, ('A', 'Y'): 0, ('B', '-'): 0, ('B', '.'): 0, ('B', 'A'): 0, ('B', 'B'): 1, ('B', 'C'): 1, ('B', 'D'): 1, ('B', 'G'): 1, ('B', 'H'): 1, ('B', 'K'): 1, ('B', 'M'): 1, ('B', 'N'): 1, ('B', 'R'): 1, ('B', 'S'): 1, ('B', 'T'): 1, ('B', 'U'): 1, ('B', 'V'): 1, ('B', 'W'): 1, ('B', 'Y'): 1, ('C', '-'): 0, ('C', '.'): 0, ('C', 'A'): 0, ('C', 'B'): 1, ('C', 'C'): 1, ('C', 'D'): 0, ('C', 'G'): 0, ('C', 'H'): 1, ('C', 'K'): 0, ('C', 'M'): 1, ('C', 'N'): 1, ('C', 'R'): 0, ('C', 'S'): 1, ('C', 'T'): 0, ('C', 'U'): 0, ('C', 'V'): 1, ('C', 'W'): 0, ('C', 'Y'): 1, ('D', '-'): 0, ('D', '.'): 0, ('D', 'A'): 1, ('D', 'B'): 1, ('D', 'C'): 0, ('D', 'D'): 1, ('D', 'G'): 1, ('D', 'H'): 1, ('D', 'K'): 1, ('D', 'M'): 1, ('D', 'N'): 1, ('D', 'R'): 1, ('D', 'S'): 1, ('D', 'T'): 1, ('D', 'U'): 1, ('D', 'V'): 1, ('D', 'W'): 1, ('D', 'Y'): 1, ('G', '-'): 0, ('G', '.'): 0, ('G', 'A'): 0, ('G', 'B'): 1, ('G', 'C'): 0, ('G', 'D'): 1, ('G', 'G'): 1, ('G', 'H'): 0, ('G', 'K'): 1, ('G', 'M'): 0, ('G', 'N'): 1, ('G', 'R'): 1, ('G', 'S'): 1, ('G', 'T'): 0, ('G', 'U'): 0, ('G', 'V'): 1, ('G', 'W'): 0, ('G', 'Y'): 0, ('H', '-'): 0, ('H', '.'): 0, ('H', 'A'): 1, ('H', 'B'): 1, ('H', 'C'): 1, ('H', 'D'): 1, ('H', 'G'): 0, ('H', 'H'): 1, ('H', 'K'): 1, ('H', 'M'): 1, ('H', 'N'): 1, ('H', 'R'): 1, ('H', 'S'): 1, ('H', 'T'): 1, ('H', 'U'): 1, ('H', 'V'): 1, ('H', 'W'): 1, ('H', 'Y'): 1, ('K', '-'): 0, ('K', '.'): 0, ('K', 'A'): 0, ('K', 'B'): 1, ('K', 'C'): 0, ('K', 'D'): 1, ('K', 'G'): 1, ('K', 'H'): 1, ('K', 'K'): 1, ('K', 'M'): 0, ('K', 'N'): 1, ('K', 'R'): 1, ('K', 'S'): 1, ('K', 'T'): 1, ('K', 'U'): 1, ('K', 'V'): 1, ('K', 'W'): 1, ('K', 'Y'): 1, ('M', '-'): 0, ('M', '.'): 0, ('M', 'A'): 1, ('M', 'B'): 1, ('M', 'C'): 1, ('M', 'D'): 1, ('M', 'G'): 0, ('M', 'H'): 1, ('M', 'K'): 0, ('M', 'M'): 1, ('M', 'N'): 1, ('M', 'R'): 1, ('M', 'S'): 1, ('M', 'T'): 0, ('M', 'U'): 0, ('M', 'V'): 1, ('M', 'W'): 1, ('M', 'Y'): 1, ('N', '-'): 0, ('N', '.'): 0, ('N', 'A'): 0, ('N', 'B'): 0, ('N', 'C'): 0, ('N', 'D'): 0, ('N', 'G'): 0, ('N', 'H'): 0, ('N', 'K'): 0, ('N', 'M'): 0, ('N', 'N'): 0, ('N', 'R'): 0, ('N', 'S'): 0, ('N', 'T'): 0, ('N', 'U'): 0, ('N', 'V'): 0, ('N', 'W'): 0, ('N', 'Y'): 0, ('R', '-'): 0, ('R', '.'): 0, ('R', 'A'): 1, ('R', 'B'): 1, ('R', 'C'): 0, ('R', 'D'): 1, ('R', 'G'): 1, ('R', 'H'): 1, ('R', 'K'): 1, ('R', 'M'): 1, ('R', 'N'): 1, ('R', 'R'): 1, ('R', 'S'): 1, ('R', 'T'): 0, ('R', 'U'): 0, ('R', 'V'): 1, ('R', 'W'): 1, ('R', 'Y'): 0, ('S', '-'): 0, ('S', '.'): 0, ('S', 'A'): 0, ('S', 'B'): 1, ('S', 'C'): 1, ('S', 'D'): 1, ('S', 'G'): 1, ('S', 'H'): 1, ('S', 'K'): 1, ('S', 'M'): 1, ('S', 'N'): 1, ('S', 'R'): 1, ('S', 'S'): 1, ('S', 'T'): 0, ('S', 'U'): 0, ('S', 'V'): 1, ('S', 'W'): 0, ('S', 'Y'): 1, ('T', '-'): 0, ('T', '.'): 0, ('T', 'A'): 0, ('T', 'B'): 1, ('T', 'C'): 0, ('T', 'D'): 1, ('T', 'G'): 0, ('T', 'H'): 1, ('T', 'K'): 1, ('T', 'M'): 0, ('T', 'N'): 1, ('T', 'R'): 0, ('T', 'S'): 0, ('T', 'T'): 1, ('T', 'U'): 1, ('T', 'V'): 0, ('T', 'W'): 1, ('T', 'Y'): 1, ('U', '-'): 0, ('U', '.'): 0, ('U', 'A'): 0, ('U', 'B'): 1, ('U', 'C'): 0, ('U', 'D'): 1, ('U', 'G'): 0, ('U', 'H'): 1, ('U', 'K'): 1, ('U', 'M'): 0, ('U', 'N'): 1, ('U', 'R'): 0, ('U', 'S'): 0, ('U', 'T'): 1, ('U', 'U'): 1, ('U', 'V'): 0, ('U', 'W'): 1, ('U', 'Y'): 1, ('V', '-'): 0, ('V', '.'): 0, ('V', 'A'): 1, ('V', 'B'): 1, ('V', 'C'): 1, ('V', 'D'): 1, ('V', 'G'): 1, ('V', 'H'): 1, ('V', 'K'): 1, ('V', 'M'): 1, ('V', 'N'): 1, ('V', 'R'): 1, ('V', 'S'): 1, ('V', 'T'): 0, ('V', 'U'): 0, ('V', 'V'): 1, ('V', 'W'): 1, ('V', 'Y'): 1, ('W', '-'): 0, ('W', '.'): 0, ('W', 'A'): 1, ('W', 'B'): 1, ('W', 'C'): 0, ('W', 'D'): 1, ('W', 'G'): 0, ('W', 'H'): 1, ('W', 'K'): 1, ('W', 'M'): 1, ('W', 'N'): 1, ('W', 'R'): 1, ('W', 'S'): 0, ('W', 'T'): 1, ('W', 'U'): 1, ('W', 'V'): 1, ('W', 'W'): 1, ('W', 'Y'): 1, ('Y', '-'): 0, ('Y', '.'): 0, ('Y', 'A'): 0, ('Y', 'B'): 1, ('Y', 'C'): 1, ('Y', 'D'): 1, ('Y', 'G'): 0, ('Y', 'H'): 1, ('Y', 'K'): 1, ('Y', 'M'): 1, ('Y', 'N'): 1, ('Y', 'R'): 0, ('Y', 'S'): 1, ('Y', 'T'): 1, ('Y', 'U'): 1, ('Y', 'V'): 1, ('Y', 'W'): 1, ('Y', 'Y'): 1})
Performs a simple fixed position alignment of primers
- Parameters:
seq_record – a SeqRecord object to align primers against
primers – dictionary of {names: short IUPAC ambiguous sequence strings}
start – position where primer alignment starts
rev_primer – if True align with the tail end of the sequence
score_dict – optional dictionary of {(char1, char2): score} alignment scores
- Returns:
primer alignment result object
- Return type:
- presto.Sequence.scoreDNA(a, b, mask_score=None, gap_score=None)
Returns the score for a pair of IUPAC ambiguous nucleotide characters
- Parameters:
a – First character
b – Second character
n_score – Tuple of length two defining scores for all matches against an N character for (a, b), with the score for character (a) taking precedence; if None score symmetrically according to IUPAC character identity
gap_score – Tuple of length two defining score for all matches against a gap (-, .) character for (a, b), with the score for character (a) taking precedence; if None score symmetrically according to IUPAC character identity
- Returns:
Score for the character pair
- Return type:
- presto.Sequence.scoreSeqPair(seq1, seq2, ignore_chars={}, score_dict={('-', '-'): 1, ('-', '.'): 1, ('-', 'A'): 0, ('-', 'B'): 0, ('-', 'C'): 0, ('-', 'D'): 0, ('-', 'G'): 0, ('-', 'H'): 0, ('-', 'K'): 0, ('-', 'M'): 0, ('-', 'N'): 0, ('-', 'R'): 0, ('-', 'S'): 0, ('-', 'T'): 0, ('-', 'U'): 0, ('-', 'V'): 0, ('-', 'W'): 0, ('-', 'Y'): 0, ('.', '-'): 1, ('.', '.'): 1, ('.', 'A'): 0, ('.', 'B'): 0, ('.', 'C'): 0, ('.', 'D'): 0, ('.', 'G'): 0, ('.', 'H'): 0, ('.', 'K'): 0, ('.', 'M'): 0, ('.', 'N'): 0, ('.', 'R'): 0, ('.', 'S'): 0, ('.', 'T'): 0, ('.', 'U'): 0, ('.', 'V'): 0, ('.', 'W'): 0, ('.', 'Y'): 0, ('A', '-'): 0, ('A', '.'): 0, ('A', 'A'): 1, ('A', 'B'): 0, ('A', 'C'): 0, ('A', 'D'): 1, ('A', 'G'): 0, ('A', 'H'): 1, ('A', 'K'): 0, ('A', 'M'): 1, ('A', 'N'): 1, ('A', 'R'): 1, ('A', 'S'): 0, ('A', 'T'): 0, ('A', 'U'): 0, ('A', 'V'): 1, ('A', 'W'): 1, ('A', 'Y'): 0, ('B', '-'): 0, ('B', '.'): 0, ('B', 'A'): 0, ('B', 'B'): 1, ('B', 'C'): 1, ('B', 'D'): 1, ('B', 'G'): 1, ('B', 'H'): 1, ('B', 'K'): 1, ('B', 'M'): 1, ('B', 'N'): 1, ('B', 'R'): 1, ('B', 'S'): 1, ('B', 'T'): 1, ('B', 'U'): 1, ('B', 'V'): 1, ('B', 'W'): 1, ('B', 'Y'): 1, ('C', '-'): 0, ('C', '.'): 0, ('C', 'A'): 0, ('C', 'B'): 1, ('C', 'C'): 1, ('C', 'D'): 0, ('C', 'G'): 0, ('C', 'H'): 1, ('C', 'K'): 0, ('C', 'M'): 1, ('C', 'N'): 1, ('C', 'R'): 0, ('C', 'S'): 1, ('C', 'T'): 0, ('C', 'U'): 0, ('C', 'V'): 1, ('C', 'W'): 0, ('C', 'Y'): 1, ('D', '-'): 0, ('D', '.'): 0, ('D', 'A'): 1, ('D', 'B'): 1, ('D', 'C'): 0, ('D', 'D'): 1, ('D', 'G'): 1, ('D', 'H'): 1, ('D', 'K'): 1, ('D', 'M'): 1, ('D', 'N'): 1, ('D', 'R'): 1, ('D', 'S'): 1, ('D', 'T'): 1, ('D', 'U'): 1, ('D', 'V'): 1, ('D', 'W'): 1, ('D', 'Y'): 1, ('G', '-'): 0, ('G', '.'): 0, ('G', 'A'): 0, ('G', 'B'): 1, ('G', 'C'): 0, ('G', 'D'): 1, ('G', 'G'): 1, ('G', 'H'): 0, ('G', 'K'): 1, ('G', 'M'): 0, ('G', 'N'): 1, ('G', 'R'): 1, ('G', 'S'): 1, ('G', 'T'): 0, ('G', 'U'): 0, ('G', 'V'): 1, ('G', 'W'): 0, ('G', 'Y'): 0, ('H', '-'): 0, ('H', '.'): 0, ('H', 'A'): 1, ('H', 'B'): 1, ('H', 'C'): 1, ('H', 'D'): 1, ('H', 'G'): 0, ('H', 'H'): 1, ('H', 'K'): 1, ('H', 'M'): 1, ('H', 'N'): 1, ('H', 'R'): 1, ('H', 'S'): 1, ('H', 'T'): 1, ('H', 'U'): 1, ('H', 'V'): 1, ('H', 'W'): 1, ('H', 'Y'): 1, ('K', '-'): 0, ('K', '.'): 0, ('K', 'A'): 0, ('K', 'B'): 1, ('K', 'C'): 0, ('K', 'D'): 1, ('K', 'G'): 1, ('K', 'H'): 1, ('K', 'K'): 1, ('K', 'M'): 0, ('K', 'N'): 1, ('K', 'R'): 1, ('K', 'S'): 1, ('K', 'T'): 1, ('K', 'U'): 1, ('K', 'V'): 1, ('K', 'W'): 1, ('K', 'Y'): 1, ('M', '-'): 0, ('M', '.'): 0, ('M', 'A'): 1, ('M', 'B'): 1, ('M', 'C'): 1, ('M', 'D'): 1, ('M', 'G'): 0, ('M', 'H'): 1, ('M', 'K'): 0, ('M', 'M'): 1, ('M', 'N'): 1, ('M', 'R'): 1, ('M', 'S'): 1, ('M', 'T'): 0, ('M', 'U'): 0, ('M', 'V'): 1, ('M', 'W'): 1, ('M', 'Y'): 1, ('N', '-'): 0, ('N', '.'): 0, ('N', 'A'): 1, ('N', 'B'): 1, ('N', 'C'): 1, ('N', 'D'): 1, ('N', 'G'): 1, ('N', 'H'): 1, ('N', 'K'): 1, ('N', 'M'): 1, ('N', 'N'): 1, ('N', 'R'): 1, ('N', 'S'): 1, ('N', 'T'): 1, ('N', 'U'): 1, ('N', 'V'): 1, ('N', 'W'): 1, ('N', 'Y'): 1, ('R', '-'): 0, ('R', '.'): 0, ('R', 'A'): 1, ('R', 'B'): 1, ('R', 'C'): 0, ('R', 'D'): 1, ('R', 'G'): 1, ('R', 'H'): 1, ('R', 'K'): 1, ('R', 'M'): 1, ('R', 'N'): 1, ('R', 'R'): 1, ('R', 'S'): 1, ('R', 'T'): 0, ('R', 'U'): 0, ('R', 'V'): 1, ('R', 'W'): 1, ('R', 'Y'): 0, ('S', '-'): 0, ('S', '.'): 0, ('S', 'A'): 0, ('S', 'B'): 1, ('S', 'C'): 1, ('S', 'D'): 1, ('S', 'G'): 1, ('S', 'H'): 1, ('S', 'K'): 1, ('S', 'M'): 1, ('S', 'N'): 1, ('S', 'R'): 1, ('S', 'S'): 1, ('S', 'T'): 0, ('S', 'U'): 0, ('S', 'V'): 1, ('S', 'W'): 0, ('S', 'Y'): 1, ('T', '-'): 0, ('T', '.'): 0, ('T', 'A'): 0, ('T', 'B'): 1, ('T', 'C'): 0, ('T', 'D'): 1, ('T', 'G'): 0, ('T', 'H'): 1, ('T', 'K'): 1, ('T', 'M'): 0, ('T', 'N'): 1, ('T', 'R'): 0, ('T', 'S'): 0, ('T', 'T'): 1, ('T', 'U'): 1, ('T', 'V'): 0, ('T', 'W'): 1, ('T', 'Y'): 1, ('U', '-'): 0, ('U', '.'): 0, ('U', 'A'): 0, ('U', 'B'): 1, ('U', 'C'): 0, ('U', 'D'): 1, ('U', 'G'): 0, ('U', 'H'): 1, ('U', 'K'): 1, ('U', 'M'): 0, ('U', 'N'): 1, ('U', 'R'): 0, ('U', 'S'): 0, ('U', 'T'): 1, ('U', 'U'): 1, ('U', 'V'): 0, ('U', 'W'): 1, ('U', 'Y'): 1, ('V', '-'): 0, ('V', '.'): 0, ('V', 'A'): 1, ('V', 'B'): 1, ('V', 'C'): 1, ('V', 'D'): 1, ('V', 'G'): 1, ('V', 'H'): 1, ('V', 'K'): 1, ('V', 'M'): 1, ('V', 'N'): 1, ('V', 'R'): 1, ('V', 'S'): 1, ('V', 'T'): 0, ('V', 'U'): 0, ('V', 'V'): 1, ('V', 'W'): 1, ('V', 'Y'): 1, ('W', '-'): 0, ('W', '.'): 0, ('W', 'A'): 1, ('W', 'B'): 1, ('W', 'C'): 0, ('W', 'D'): 1, ('W', 'G'): 0, ('W', 'H'): 1, ('W', 'K'): 1, ('W', 'M'): 1, ('W', 'N'): 1, ('W', 'R'): 1, ('W', 'S'): 0, ('W', 'T'): 1, ('W', 'U'): 1, ('W', 'V'): 1, ('W', 'W'): 1, ('W', 'Y'): 1, ('Y', '-'): 0, ('Y', '.'): 0, ('Y', 'A'): 0, ('Y', 'B'): 1, ('Y', 'C'): 1, ('Y', 'D'): 1, ('Y', 'G'): 0, ('Y', 'H'): 1, ('Y', 'K'): 1, ('Y', 'M'): 1, ('Y', 'N'): 1, ('Y', 'R'): 0, ('Y', 'S'): 1, ('Y', 'T'): 1, ('Y', 'U'): 1, ('Y', 'V'): 1, ('Y', 'W'): 1, ('Y', 'Y'): 1})
Determine the error rate for a pair of sequences
- Parameters:
seq1 – SeqRecord object
seq2 – SeqRecord object
ignore_chars – Set of characters to ignore when scoring and counting the weight
score_dict – Optional dictionary of alignment scores
- Returns:
Tuple of the (score, minimum weight, error rate) for the pair of sequences
- Return type:
Tuple
- presto.Sequence.sequentialAssembly(head_seq, tail_seq, ref_dict, ref_db, alpha=1e-05, max_error=0.3, min_len=8, max_len=1000, scan_reverse=False, min_ident=0.5, evalue=1e-05, max_hits=100, fill=False, aligner='usearch', aligner_exec='usearch', assembly_stats=None, score_dict={('-', '-'): 0, ('-', '.'): 0, ('-', 'A'): 0, ('-', 'B'): 0, ('-', 'C'): 0, ('-', 'D'): 0, ('-', 'G'): 0, ('-', 'H'): 0, ('-', 'K'): 0, ('-', 'M'): 0, ('-', 'N'): 0, ('-', 'R'): 0, ('-', 'S'): 0, ('-', 'T'): 0, ('-', 'U'): 0, ('-', 'V'): 0, ('-', 'W'): 0, ('-', 'Y'): 0, ('.', '-'): 0, ('.', '.'): 0, ('.', 'A'): 0, ('.', 'B'): 0, ('.', 'C'): 0, ('.', 'D'): 0, ('.', 'G'): 0, ('.', 'H'): 0, ('.', 'K'): 0, ('.', 'M'): 0, ('.', 'N'): 0, ('.', 'R'): 0, ('.', 'S'): 0, ('.', 'T'): 0, ('.', 'U'): 0, ('.', 'V'): 0, ('.', 'W'): 0, ('.', 'Y'): 0, ('A', '-'): 0, ('A', '.'): 0, ('A', 'A'): 1, ('A', 'B'): 0, ('A', 'C'): 0, ('A', 'D'): 1, ('A', 'G'): 0, ('A', 'H'): 1, ('A', 'K'): 0, ('A', 'M'): 1, ('A', 'N'): 1, ('A', 'R'): 1, ('A', 'S'): 0, ('A', 'T'): 0, ('A', 'U'): 0, ('A', 'V'): 1, ('A', 'W'): 1, ('A', 'Y'): 0, ('B', '-'): 0, ('B', '.'): 0, ('B', 'A'): 0, ('B', 'B'): 1, ('B', 'C'): 1, ('B', 'D'): 1, ('B', 'G'): 1, ('B', 'H'): 1, ('B', 'K'): 1, ('B', 'M'): 1, ('B', 'N'): 1, ('B', 'R'): 1, ('B', 'S'): 1, ('B', 'T'): 1, ('B', 'U'): 1, ('B', 'V'): 1, ('B', 'W'): 1, ('B', 'Y'): 1, ('C', '-'): 0, ('C', '.'): 0, ('C', 'A'): 0, ('C', 'B'): 1, ('C', 'C'): 1, ('C', 'D'): 0, ('C', 'G'): 0, ('C', 'H'): 1, ('C', 'K'): 0, ('C', 'M'): 1, ('C', 'N'): 1, ('C', 'R'): 0, ('C', 'S'): 1, ('C', 'T'): 0, ('C', 'U'): 0, ('C', 'V'): 1, ('C', 'W'): 0, ('C', 'Y'): 1, ('D', '-'): 0, ('D', '.'): 0, ('D', 'A'): 1, ('D', 'B'): 1, ('D', 'C'): 0, ('D', 'D'): 1, ('D', 'G'): 1, ('D', 'H'): 1, ('D', 'K'): 1, ('D', 'M'): 1, ('D', 'N'): 1, ('D', 'R'): 1, ('D', 'S'): 1, ('D', 'T'): 1, ('D', 'U'): 1, ('D', 'V'): 1, ('D', 'W'): 1, ('D', 'Y'): 1, ('G', '-'): 0, ('G', '.'): 0, ('G', 'A'): 0, ('G', 'B'): 1, ('G', 'C'): 0, ('G', 'D'): 1, ('G', 'G'): 1, ('G', 'H'): 0, ('G', 'K'): 1, ('G', 'M'): 0, ('G', 'N'): 1, ('G', 'R'): 1, ('G', 'S'): 1, ('G', 'T'): 0, ('G', 'U'): 0, ('G', 'V'): 1, ('G', 'W'): 0, ('G', 'Y'): 0, ('H', '-'): 0, ('H', '.'): 0, ('H', 'A'): 1, ('H', 'B'): 1, ('H', 'C'): 1, ('H', 'D'): 1, ('H', 'G'): 0, ('H', 'H'): 1, ('H', 'K'): 1, ('H', 'M'): 1, ('H', 'N'): 1, ('H', 'R'): 1, ('H', 'S'): 1, ('H', 'T'): 1, ('H', 'U'): 1, ('H', 'V'): 1, ('H', 'W'): 1, ('H', 'Y'): 1, ('K', '-'): 0, ('K', '.'): 0, ('K', 'A'): 0, ('K', 'B'): 1, ('K', 'C'): 0, ('K', 'D'): 1, ('K', 'G'): 1, ('K', 'H'): 1, ('K', 'K'): 1, ('K', 'M'): 0, ('K', 'N'): 1, ('K', 'R'): 1, ('K', 'S'): 1, ('K', 'T'): 1, ('K', 'U'): 1, ('K', 'V'): 1, ('K', 'W'): 1, ('K', 'Y'): 1, ('M', '-'): 0, ('M', '.'): 0, ('M', 'A'): 1, ('M', 'B'): 1, ('M', 'C'): 1, ('M', 'D'): 1, ('M', 'G'): 0, ('M', 'H'): 1, ('M', 'K'): 0, ('M', 'M'): 1, ('M', 'N'): 1, ('M', 'R'): 1, ('M', 'S'): 1, ('M', 'T'): 0, ('M', 'U'): 0, ('M', 'V'): 1, ('M', 'W'): 1, ('M', 'Y'): 1, ('N', '-'): 1, ('N', '.'): 1, ('N', 'A'): 1, ('N', 'B'): 1, ('N', 'C'): 1, ('N', 'D'): 1, ('N', 'G'): 1, ('N', 'H'): 1, ('N', 'K'): 1, ('N', 'M'): 1, ('N', 'N'): 1, ('N', 'R'): 1, ('N', 'S'): 1, ('N', 'T'): 1, ('N', 'U'): 1, ('N', 'V'): 1, ('N', 'W'): 1, ('N', 'Y'): 1, ('R', '-'): 0, ('R', '.'): 0, ('R', 'A'): 1, ('R', 'B'): 1, ('R', 'C'): 0, ('R', 'D'): 1, ('R', 'G'): 1, ('R', 'H'): 1, ('R', 'K'): 1, ('R', 'M'): 1, ('R', 'N'): 1, ('R', 'R'): 1, ('R', 'S'): 1, ('R', 'T'): 0, ('R', 'U'): 0, ('R', 'V'): 1, ('R', 'W'): 1, ('R', 'Y'): 0, ('S', '-'): 0, ('S', '.'): 0, ('S', 'A'): 0, ('S', 'B'): 1, ('S', 'C'): 1, ('S', 'D'): 1, ('S', 'G'): 1, ('S', 'H'): 1, ('S', 'K'): 1, ('S', 'M'): 1, ('S', 'N'): 1, ('S', 'R'): 1, ('S', 'S'): 1, ('S', 'T'): 0, ('S', 'U'): 0, ('S', 'V'): 1, ('S', 'W'): 0, ('S', 'Y'): 1, ('T', '-'): 0, ('T', '.'): 0, ('T', 'A'): 0, ('T', 'B'): 1, ('T', 'C'): 0, ('T', 'D'): 1, ('T', 'G'): 0, ('T', 'H'): 1, ('T', 'K'): 1, ('T', 'M'): 0, ('T', 'N'): 1, ('T', 'R'): 0, ('T', 'S'): 0, ('T', 'T'): 1, ('T', 'U'): 1, ('T', 'V'): 0, ('T', 'W'): 1, ('T', 'Y'): 1, ('U', '-'): 0, ('U', '.'): 0, ('U', 'A'): 0, ('U', 'B'): 1, ('U', 'C'): 0, ('U', 'D'): 1, ('U', 'G'): 0, ('U', 'H'): 1, ('U', 'K'): 1, ('U', 'M'): 0, ('U', 'N'): 1, ('U', 'R'): 0, ('U', 'S'): 0, ('U', 'T'): 1, ('U', 'U'): 1, ('U', 'V'): 0, ('U', 'W'): 1, ('U', 'Y'): 1, ('V', '-'): 0, ('V', '.'): 0, ('V', 'A'): 1, ('V', 'B'): 1, ('V', 'C'): 1, ('V', 'D'): 1, ('V', 'G'): 1, ('V', 'H'): 1, ('V', 'K'): 1, ('V', 'M'): 1, ('V', 'N'): 1, ('V', 'R'): 1, ('V', 'S'): 1, ('V', 'T'): 0, ('V', 'U'): 0, ('V', 'V'): 1, ('V', 'W'): 1, ('V', 'Y'): 1, ('W', '-'): 0, ('W', '.'): 0, ('W', 'A'): 1, ('W', 'B'): 1, ('W', 'C'): 0, ('W', 'D'): 1, ('W', 'G'): 0, ('W', 'H'): 1, ('W', 'K'): 1, ('W', 'M'): 1, ('W', 'N'): 1, ('W', 'R'): 1, ('W', 'S'): 0, ('W', 'T'): 1, ('W', 'U'): 1, ('W', 'V'): 1, ('W', 'W'): 1, ('W', 'Y'): 1, ('Y', '-'): 0, ('Y', '.'): 0, ('Y', 'A'): 0, ('Y', 'B'): 1, ('Y', 'C'): 1, ('Y', 'D'): 1, ('Y', 'G'): 0, ('Y', 'H'): 1, ('Y', 'K'): 1, ('Y', 'M'): 1, ('Y', 'N'): 1, ('Y', 'R'): 0, ('Y', 'S'): 1, ('Y', 'T'): 1, ('Y', 'U'): 1, ('Y', 'V'): 1, ('Y', 'W'): 1, ('Y', 'Y'): 1})
Stitches sequences together by first attempting de novo assembly then falling back to reference guided assembly
- Parameters:
head_seq – the head SeqRecord
head_seq – the tail SeqRecord
ref_dict – a dictionary of reference SeqRecord objects
ref_db – the path and name of the reference database
alpha – the minimum p-value for a valid de novo assembly
max_error – the maximum error rate for a valid de novo assembly
min_len – minimum length of overlap to test for de novo assembly
max_len – maximum length of overlap to test for de novo assembly
scan_reverse – if True allow the head sequence to overhang the end of the tail sequence in de novo assembly if False end alignment scan at end of tail sequence or start of head sequence
min_ident – the minimum identity for a valid reference guided assembly
evalue – the E-value cut-off for reference guided assembly
max_hits – the maxhits output limit for reference guided assembly
fill – if False non-overlapping regions will be assigned Ns in reference guided assembly; if True non-overlapping regions will be filled with the reference sequence.
aligner – the alignment tool; one of ‘blastn’ or ‘usearch’
aligner_exec – the path to the alignment tool executable
assembly_stats – optional successes by trials numpy.array of p-values
score_dict – optional dictionary of character scores in the form {(char1, char2): score}.
- Returns:
assembled sequence object.
- Return type:
- presto.Sequence.subsetSeqIndex(seq_dict, field, values, delimiter=('|', '=', ','))
Subsets a sequence set by annotation value
- Parameters:
seq_dict – Dictionary index of sequences returned from SeqIO.index()
field – Annotation field to select keys by
values – List of annotation values that define the retained keys
delimiter – Tuple of delimiters for (annotations, field/values, value lists)
- Returns:
List of keys
- Return type:
- presto.Sequence.subsetSeqSet(seq_iter, field, values, delimiter=('|', '=', ','))
Subsets a sequence set by annotation value
- Parameters:
seq_iter – Iterator or list of SeqRecord objects
field – Annotation field to select by
values – List of annotation values that define the retained sequences
delimiter – Tuple of delimiters for (annotations, field/values, value lists)
- Returns:
Modified list of SeqRecord objects
- Return type:
- presto.Sequence.translateAmbigDNA(key)
Translates IUPAC ambiguous nucleotide characters to or from character sets
- Parameters:
key – String or re.search object containing the character set to translate
- Returns:
Character translation
- Return type:
- presto.Sequence.trimQuality(data, min_qual=0, window=10, reverse=False)
Cuts sequences using a moving mean quality score
- Parameters:
- Returns:
SeqResult object.
- Return type: