presto.IO

File I/O and logging functions

presto.IO.countSeqFile(seq_file)

Counts the records in FASTA/FASTQ files

Parameters: seq_file – FASTA or FASTQ file containing sample sequences
Returns: Count of records in the sequence file
Return type: int

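
As a rough illustration of what "counting records" means for the FASTA case, the sketch below counts header lines in an in-memory file. The helper `count_fasta_records` is hypothetical and is not presto's implementation; FASTQ input would need four-line record handling, which is omitted here.

```python
import io


def count_fasta_records(handle):
    """Count FASTA records by counting header lines (hypothetical stand-in)."""
    return sum(1 for line in handle if line.startswith(">"))


# Two records; the second sequence is wrapped over two lines.
example = io.StringIO(">SEQ1\nACGT\n>SEQ2\nGGCC\nTTAA\n")
record_count = count_fasta_records(example)
print(record_count)
```
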
presto.IO.countSeqSets(seq_file, field='BARCODE', delimiter=('|', '=', ', '))

Identifies sets of sequences with the same ID field

Parameters:
  • seq_file – FASTA or FASTQ file containing sample sequences
  • field – Annotation field containing set IDs
  • delimiter – Tuple of delimiters for (fields, values, value lists)
Returns:

Count of unique set IDs in the sequence file

Return type:

int
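The counting described above can be sketched with the standard library alone. The header layout (`ID|FIELD=VALUE|...`) is an assumption inferred from the delimiter tuple, and `parse_header` and `count_sets` are hypothetical helpers, not presto functions. The third delimiter (value lists) is not exercised in this sketch.

```python
def parse_header(header, delimiter=("|", "=", ",")):
    """Split an 'ID|KEY=VALUE|...' header into a dict (assumed layout)."""
    tokens = header.split(delimiter[0])
    fields = {"ID": tokens[0]}
    for token in tokens[1:]:
        key, _, value = token.partition(delimiter[1])
        fields[key] = value
    return fields


def count_sets(headers, field="BARCODE", delimiter=("|", "=", ",")):
    """Count the distinct values of `field` across all headers."""
    values = {parse_header(h, delimiter).get(field) for h in headers}
    values.discard(None)  # headers lacking the field do not form a set
    return len(values)


headers = [
    "SEQ1|BARCODE=AAAC|PRIMER=P1",
    "SEQ2|BARCODE=AAAC|PRIMER=P2",
    "SEQ3|BARCODE=GGTT|PRIMER=P1",
]
set_count = count_sets(headers)
print(set_count)
```
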

presto.IO.getFileType(filename)

Determines the type of a file by file extension

Parameters: filename – Filename
Returns: String defining the sequence type for SeqIO operations
Return type: str

presto.IO.getOutputHandle(in_file, out_label=None, out_dir=None, out_name=None, out_type=None)

Opens an output file handle

Parameters:
  • in_file – Input filename
  • out_label – Text to be inserted before the file extension; if None do not add a label
  • out_dir – The output directory; if None use directory of input file
  • out_name – The short filename to use for the output file; if None use input file short name
  • out_type – The file extension of the output file; if None use input file extension
Returns:

File handle

Return type:

file
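The fallback rules in the parameter list can be illustrated by building the output path only. `build_output_path` is a hypothetical helper; joining the name and label with an underscore is an assumption, and unlike the documented function this sketch returns a path string rather than an open handle.

```python
import os


def build_output_path(in_file, out_label=None, out_dir=None,
                      out_name=None, out_type=None):
    """Compose an output path, falling back to parts of the input path."""
    in_dir, in_base = os.path.split(in_file)
    in_name, in_ext = os.path.splitext(in_base)

    directory = out_dir if out_dir is not None else in_dir
    name = out_name if out_name is not None else in_name
    ext = out_type if out_type is not None else in_ext.lstrip(".")

    if out_label is not None:
        name = f"{name}_{out_label}"  # assumed separator
    filename = f"{name}.{ext}" if ext else name
    return os.path.join(directory, filename)


labeled = build_output_path("data/reads.fastq", out_label="pass")
renamed = build_output_path("data/reads.fastq", out_dir="out",
                            out_name="sample", out_type="fasta")
print(labeled)
print(renamed)
```
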

presto.IO.printLog(record, handle=sys.stdout, inset=None)

Formats a dictionary into an IgPipeline log string

Parameters:
  • record – a dict or OrderedDict of field names mapping to values
  • handle – the file handle to write the log to; if None do not write to file
  • inset – minimum field name inset; if None automatically space field names
Returns:

Formatted multi-line string in IgPipeline log format

Return type:

str
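A minimal sketch of this kind of formatting follows. The exact log layout is not given here, so the right-aligned `NAME> value` lines are an assumption, and `format_log` is a hypothetical stand-in rather than the library function.

```python
import sys
from collections import OrderedDict


def format_log(record, handle=sys.stdout, inset=None):
    """Format a dict as aligned 'NAME> value' lines (assumed layout)."""
    width = max((len(str(key)) for key in record), default=0)
    if inset is not None:
        width = max(width, inset)  # inset acts as a minimum field width
    lines = [f"{str(key).rjust(width)}> {value}" for key, value in record.items()]
    text = "\n".join(lines) + "\n"
    if handle is not None:
        handle.write(text)
    return text


record = OrderedDict([("ID", "SEQ1"), ("PASS", True)])
log_text = format_log(record, handle=None)  # handle=None: format only, no write
```
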

presto.IO.printMessage(message, start_time=None, end=False, width=20)

Prints a progress message to standard out

Parameters:
  • message – Current task message
  • start_time – task start time returned by time.time(); if None do not add run time to progress
  • end – If True print final message (add newline)
  • width – Maximum number of characters for messages
Returns:

None

presto.IO.printProgress(current, total=None, step=None, start_time=None, end=False)

Prints a progress bar to standard out

Parameters:
  • current – Count of completed tasks
  • total – Total task count; if None do not print percentage
  • step – Progress increment at which to print: a float giving the fractional increment if total is defined, or an int giving the count increment if total is not defined; if None always print the progress
  • start_time – Task start time returned by time.time(); if None do not add run time to progress
  • end – if True print final log (add newline)
Returns:

None
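The two interpretations of step can be sketched as below. Both helpers are hypothetical, and the bar layout is an assumption, not the output of the documented function.

```python
def should_print(current, total=None, step=None):
    """Decide whether progress should be printed at this count."""
    if step is None:
        return True  # no step given: always print
    if total is not None:
        increment = max(int(total * step), 1)  # step is a fraction of total
    else:
        increment = max(int(step), 1)  # step is an absolute count
    return current % increment == 0


def format_progress(current, total=None, bar_width=20):
    """Render a text progress bar, or a plain count when total is unknown."""
    if not total:
        return f"{current} tasks done"
    fraction = min(current / total, 1.0)
    filled = int(fraction * bar_width)
    bar = "#" * filled + " " * (bar_width - filled)
    return f"[{bar}] {fraction:.0%}"


# With total=200 and step=0.25, progress is reported every 50 tasks.
printed = [n for n in range(1, 201) if should_print(n, total=200, step=0.25)]
print(printed)
print(format_progress(50, total=200, bar_width=8))
```
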

presto.IO.readPrimerFile(primer_file)

Processes primer sequences from file

Parameters: primer_file – Name of file containing primer sequences
Returns: Dictionary mapping primer ID to primer sequence
Return type: dict

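
To show the shape of the returned dictionary, here is a minimal sketch that assumes the primer file is FASTA-formatted; the accepted file formats are not stated here, and `read_primers` is a hypothetical stand-in.

```python
import io


def read_primers(handle):
    """Build {primer_id: sequence} from FASTA-formatted text (assumed format)."""
    primers = {}
    primer_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            primer_id = line[1:].split()[0]  # ID is the first word of the header
            primers[primer_id] = ""
        elif primer_id is not None:
            primers[primer_id] += line  # join wrapped sequence lines
    return primers


primers = read_primers(io.StringIO(">P1\nACGT\n>P2\nGGCC\nTT\n"))
print(primers)
```
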
presto.IO.readReferenceFile(ref_file)

Creates a dictionary of cleaned and ungapped reference sequences

Parameters: ref_file – Reference sequences in FASTA format
Returns: Dictionary of cleaned and ungapped reference sequences, with the sequence ID as key and a Bio.SeqRecord as value
Return type: dict

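
The "ungapped" part can be sketched on plain strings. Treating `-` and `.` as gap characters and upper-casing as the "cleaning" step are both assumptions, and `ungap` is a hypothetical helper. The documented function returns Bio.SeqRecord values rather than strings.

```python
def ungap(sequence, gap_chars="-."):
    """Remove gap characters and upper-case the sequence (assumed cleaning)."""
    return sequence.translate(str.maketrans("", "", gap_chars)).upper()


gapped = {"REF1": "ac-g..t", "REF2": "GG--CC"}
references = {seq_id: ungap(seq) for seq_id, seq in gapped.items()}
print(references)
```
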
presto.IO.readSeqFile(seq_file, index=False, key_func=None)

Reads FASTA/FASTQ files

Parameters:
  • seq_file – FASTA or FASTQ file containing sample sequences
  • index – If True return a dictionary from SeqIO.index(); if False return an iterator from SeqIO.parse()
  • key_func – the key_function argument to pass to SeqIO.index if index=True
Returns:

Tuple of (input file type, sequence record object)

Return type:

tuple