presto.Annotation

Annotation functions

presto.Annotation.annotationConsensus(seq_iter, field, delimiter=('|', '=', ', '))

Calculate a consensus annotation for a set of sequences

Parameters:
  • seq_iter – Iterator or list of SeqRecord objects
  • field – Annotation field to compute the consensus for
  • delimiter – Tuple of delimiters for (fields, values, value lists)
Returns:

Dictionary with the keys:
  • set – list of unique annotation values
  • count – annotation counts
  • cons – consensus annotation
  • freq – frequency of the majority annotation

Return type:

dict
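
The returned keys can be illustrated with a small stand-alone sketch. This is not the library's implementation: consensus_sketch is a hypothetical helper that works on a plain, non-empty list of values rather than SeqRecord objects, and the exact types of set and count (here a sorted list and a dict) as well as the tie-breaking rule are assumptions.

```python
from collections import Counter

def consensus_sketch(values):
    """Summarize a non-empty list of annotation values (hypothetical helper)."""
    counts = Counter(values)
    # most_common(1) yields the single most frequent value and its count
    cons, top = counts.most_common(1)[0]
    return {
        "set": sorted(counts),       # unique values (assumed: sorted list)
        "count": dict(counts),       # occurrences per value (assumed: dict)
        "cons": cons,                # majority value
        "freq": top / len(values),   # fraction of values equal to cons
    }

result = consensus_sketch(["IGHM", "IGHM", "IGHG", "IGHM"])
print(result["cons"], result["freq"])
```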

presto.Annotation.collapseAnnotation(ann_dict, action, fields=None, delimiter=('|', '=', ', '))

Collapses multiple annotations into new single annotations for each field

Parameters:
  • ann_dict – Dictionary of field/value pairs
  • action – Collapse action to take; one of {min, max, sum, first, last, set, cat}
  • fields – Subset of ann_dict to collapse; if None, collapse all fields except the ID field
  • delimiter – Tuple of delimiters for (fields, values, value lists)
Returns:

Modified field dictionary

Return type:

OrderedDict
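
As a rough illustration of the collapse actions, the sketch below applies one action to a single delimited value string. It is a minimal sketch, not the library's code: collapse_values_sketch is a hypothetical helper, the plain-comma list delimiter is chosen for the example, and the meaning assigned to each action (numeric conversion for min/max/sum, order-preserving de-duplication for set, string concatenation for cat) is an assumption based on the action names.

```python
def collapse_values_sketch(value, action, list_delim=","):
    """Collapse a delimited value string with one action (hypothetical helper)."""
    items = value.split(list_delim)
    if action in ("min", "max", "sum"):
        # Assumed: numeric actions operate on float-converted values
        numbers = [float(x) for x in items]
        return {"min": min, "max": max, "sum": sum}[action](numbers)
    if action == "first":
        return items[0]
    if action == "last":
        return items[-1]
    if action == "set":
        # Assumed: unique values, keeping first-seen order
        return list(dict.fromkeys(items))
    if action == "cat":
        # Assumed: values joined into one string with no separator
        return "".join(items)
    raise ValueError(f"unknown action: {action}")

print(collapse_values_sketch("2,5,3", "max"))
print(collapse_values_sketch("A,B,A", "set"))
```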

presto.Annotation.flattenAnnotation(ann_dict, delimiter=('|', '=', ', '))

Converts annotations from a dictionary to a FASTA/FASTQ sequence description

Parameters:
  • ann_dict – Dictionary of field/value pairs
  • delimiter – Tuple of delimiters for (fields, values, value lists)
Returns:

Formatted sequence description string

Return type:

str
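
The conversion from a dictionary to a description string can be sketched as follows. This is an illustrative re-creation under stated assumptions, not the library's implementation: flatten_sketch is a hypothetical helper, the identifier is assumed to live under an ID key and to come first in the output, and the delimiter tuple is passed explicitly with a plain comma for value lists.

```python
def flatten_sketch(ann_dict, delimiter=("|", "=", ",")):
    """Build a header string from field/value pairs (hypothetical helper)."""
    field_delim, value_delim, list_delim = delimiter
    # Assumed: the ID value leads the description without a field name
    parts = [str(ann_dict.get("ID", ""))]
    for key, value in ann_dict.items():
        if key == "ID":
            continue
        # Lists are joined with the value-list delimiter
        if isinstance(value, (list, tuple)):
            value = list_delim.join(str(v) for v in value)
        parts.append(f"{key}{value_delim}{value}")
    return field_delim.join(parts)

header = flatten_sketch({"ID": "SEQ1", "PRIMER": ["V1", "J2"], "COUNT": 3})
print(header)
```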

presto.Annotation.getAnnotationValues(seq_iter, field, unique=False, delimiter=('|', '=', ', '))

Gets the values of an annotation field across a set of sequences, optionally reduced to the unique values

Parameters:
  • seq_iter – Iterator or list of SeqRecord objects
  • field – Annotation field to retrieve values for
  • unique – If True return a list of only the unique values; if False return a list of all values
  • delimiter – Tuple of delimiters for (fields, values, value lists)
Returns:

List of values for the field

Return type:

list

presto.Annotation.getCoordKey(header, coord_type='presto', delimiter=('|', '=', ', '))

Return the coordinate identifier for a sequence description

Parameters:
  • header – Sequence header string
  • coord_type – Sequence header format; one of ['illumina', 'solexa', 'sra', '454', 'presto']; if the type is unrecognized or None, the sequence ID is returned
  • delimiter – Tuple of delimiters for (fields, values, value lists)
Returns:

Coordinate identifier as a string

Return type:

str

presto.Annotation.mergeAnnotation(ann_dict_1, ann_dict_2, prepend=False, delimiter=('|', '=', ', '))

Merges non-ID field annotations from one field dictionary into another

Parameters:
  • ann_dict_1 – Dictionary of field/value pairs to append to
  • ann_dict_2 – Dictionary of field/value pairs to merge with ann_dict_1
  • prepend – If True then add ann_dict_2 values to the front of any ann_dict_1 values that are already present, rather than the default behavior of appending ann_dict_2 values.
  • delimiter – Tuple of delimiters for (fields, values, value lists)
Returns:

Modified ann_dict_1 dictionary of field/value pairs

Return type:

OrderedDict
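
The effect of the prepend flag can be shown with a small sketch. It is not the library's code: merge_sketch is a hypothetical helper, it returns a new dictionary instead of modifying its first argument, it assumes the identifier is stored under an ID key, and it joins values with a plain comma chosen for the example.

```python
def merge_sketch(ann_1, ann_2, prepend=False, list_delim=","):
    """Merge non-ID fields of ann_2 into a copy of ann_1 (hypothetical helper)."""
    merged = dict(ann_1)
    for key, value in ann_2.items():
        if key == "ID":
            continue  # the ID of ann_1 is kept
        if key in merged:
            # prepend=True puts the ann_2 value in front of the existing one
            pair = (value, merged[key]) if prepend else (merged[key], value)
            merged[key] = list_delim.join(str(v) for v in pair)
        else:
            merged[key] = value
    return merged

first = {"ID": "SEQ1", "PRIMER": "V1"}
second = {"ID": "SEQ2", "PRIMER": "J2", "COUNT": "3"}
appended = merge_sketch(first, second)
prepended = merge_sketch(first, second, prepend=True)
print(appended["PRIMER"], prepended["PRIMER"])
```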

presto.Annotation.parseAnnotation(record, fields=None, delimiter=('|', '=', ', '))

Extracts annotations from a FASTA/FASTQ sequence description

Parameters:
  • record – Description string to extract annotations from
  • fields – List of fields to subset the return dictionary to; if None return all fields
  • delimiter – Tuple of delimiters for (fields, values, value lists)
Returns:

An OrderedDict of field/value pairs

Return type:

OrderedDict
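
The parsing step can be illustrated with a stand-alone sketch. This is a minimal approximation under stated assumptions, not the library's implementation: parse_sketch is a hypothetical helper, the first delimited token is assumed to be stored under an ID key, and the delimiter tuple is passed explicitly with a plain comma for value lists.

```python
from collections import OrderedDict

def parse_sketch(description, fields=None, delimiter=("|", "=", ",")):
    """Split a header string into field/value pairs (hypothetical helper)."""
    tokens = description.split(delimiter[0])
    # Assumed: the leading token is the sequence identifier
    parsed = OrderedDict([("ID", tokens[0])])
    for token in tokens[1:]:
        # partition splits on the first field/value delimiter only
        name, _, value = token.partition(delimiter[1])
        parsed[name] = value
    if fields is not None:
        # Keep only the requested fields, preserving header order
        parsed = OrderedDict((k, v) for k, v in parsed.items() if k in fields)
    return parsed

annotations = parse_sketch("SEQ1|PRIMER=V1,J2|COUNT=3")
subset = parse_sketch("SEQ1|PRIMER=V1,J2|COUNT=3", fields=["COUNT"])
print(dict(annotations))
```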

presto.Annotation.renameAnnotation(ann_dict, old_field, new_field, delimiter=('|', '=', ', '))

Renames an annotation and merges annotations if the new name already exists

Parameters:
  • ann_dict – Dictionary of field/value pairs
  • old_field – Old field name
  • new_field – New field name
  • delimiter – Tuple of delimiters for (fields, values, value lists)
Returns:

Modified field dictionary

Return type:

OrderedDict