presto.Multiprocessing

Multiprocessing functions

class presto.Multiprocessing.SeqData(id, data)

Bases: object

Class defining sequence data objects for worker processes

id

unique identifier

data

single object or a list of data objects.

valid

if True, the data is suitable for processing.

__bool__()

Boolean evaluation

Returns: True if the valid attribute is True.
Return type: bool

__len__()

Length evaluation

Returns: number of objects in the data attribute.
Return type: int
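
The attributes and special methods above can be illustrated with a small stand-in class. This is a minimal sketch, not presto's implementation: the class name SeqDataSketch is hypothetical, and the rule used to set valid (data is present and non-empty) is an assumption made for illustration only.

```python
class SeqDataSketch:
    """Hypothetical stand-in mirroring the documented SeqData interface."""

    def __init__(self, id, data):
        self.id = id      # unique identifier
        self.data = data  # single object or a list of data objects
        # Assumption: data counts as valid when it is present and non-empty.
        self.valid = data is not None and data != []

    def __bool__(self):
        # Truthiness follows the valid attribute, not the length.
        return self.valid

    def __len__(self):
        # A list counts its elements; any other single object counts as one.
        if isinstance(self.data, (list, tuple)):
            return len(self.data)
        return 0 if self.data is None else 1


single = SeqDataSketch("read1", "ACGT")
group = SeqDataSketch("set1", ["ACGT", "TTGA", "CCAG"])
empty = SeqDataSketch("none", None)
```

Because __bool__ is defined, a worker can write `if data:` to skip unsuitable input without inspecting the valid attribute directly.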
class presto.Multiprocessing.SeqResult(id, data)

Bases: object

Class defining sequence result objects for collector processes

id

unique identifier

data

single unprocessed object or a list of unprocessed data objects.

results

single processed object or a list of processed data objects.

valid

if True, processing was successful.

log

dictionary containing the processing log.

__bool__()

Boolean evaluation

Returns: True if the valid attribute is True.
Return type: bool

__len__()

Length evaluation

Returns: number of objects in the results attribute.
Return type: int

data_count

Data length

Returns: number of objects in the data attribute.
Return type: int
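
The distinction between __len__ (which counts results) and data_count (which counts the unprocessed input) can be sketched as follows. This is an illustrative stand-in rather than presto's code: the class name SeqResultSketch, the initial values of results and valid, and the "ID" key in the log dictionary are all assumptions.

```python
class SeqResultSketch:
    """Hypothetical stand-in mirroring the documented SeqResult interface."""

    def __init__(self, id, data):
        self.id = id            # unique identifier
        self.data = data        # unprocessed object or list of objects
        self.results = None     # assumption: filled in later by the worker
        self.valid = False      # assumption: set to True on success
        self.log = {"ID": id}   # assumption: the log starts with the identifier

    @staticmethod
    def _count(obj):
        # Shared counting rule: None is 0, a list is its length, else 1.
        if obj is None:
            return 0
        if isinstance(obj, (list, tuple)):
            return len(obj)
        return 1

    def __bool__(self):
        return self.valid

    def __len__(self):
        # Length refers to the processed results.
        return self._count(self.results)

    @property
    def data_count(self):
        # data_count refers to the unprocessed input.
        return self._count(self.data)


result = SeqResultSketch("set1", ["ACGT", "ACGA", "TTTT"])
before = (len(result), result.data_count, bool(result))

# A worker that collapses three input reads into one consensus sequence.
result.results = "ACGN"
result.valid = True
after = (len(result), result.data_count, bool(result))
```

In this sketch a task that reduces a set of reads to one consensus reports a data_count of 3 and a length of 1, which is why the two values are exposed separately.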
presto.Multiprocessing.collectPairQueue(alive, result_queue, collect_queue, seq_file_1, seq_file_2, label, out_file=None, out_args={'delimiter': ('|', '=', ', '), 'failed': True, 'log_file': None, 'out_dir': None, 'out_name': None, 'out_type': None, 'separator': ', '})

Pulls from results queue, assembles results and manages log and file IO

Parameters:
  • alive – a multiprocessing.Value boolean controlling whether processing continues; when False, the function returns.
  • result_queue – a multiprocessing.Queue holding worker results.
  • collect_queue – a multiprocessing.Queue holding collector return values.
  • seq_file_1 – the first sequence file name.
  • seq_file_2 – the second sequence file name.
  • label – task label used to tag the output files.
  • out_file – output file name. Automatically generated from the input file if None.
  • out_args – common output argument dictionary from parseCommonArgs.
Returns:

Adds a dictionary of {log: log object, out_files: output file names} to collect_queue.

Return type:

None

presto.Multiprocessing.collectSeqQueue(alive, result_queue, collect_queue, seq_file, label, index_field=None, out_file=None, out_args={'delimiter': ('|', '=', ', '), 'failed': True, 'log_file': None, 'out_dir': None, 'out_name': None, 'out_type': None, 'separator': ', '})

Pulls from results queue, assembles results and manages log and file IO

Parameters:
  • alive – a multiprocessing.Value boolean controlling whether processing continues; when False, the function returns.
  • result_queue – a multiprocessing.Queue holding worker results.
  • collect_queue – a multiprocessing.Queue to store collector return values.
  • seq_file – sample sequence file name.
  • label – task label used to tag the output files.
  • out_file – output file name. Automatically generated from the input file if None.
  • out_args – common output argument dictionary from parseCommonArgs.
  • index_field – field defining set membership for sequence sets; if None, the data queue is assumed to contain individual records.
Returns:

Adds a dictionary to collect_queue with the keys 'log' (a log object) and 'out_files' (the output file names).

Return type:

None
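
The collector behaviour described above (drain the result queue, tally outcomes, then place one summary dictionary on collect_queue) can be sketched with the standard library alone. This is a hedged illustration, not presto's implementation: collect_sketch is a hypothetical name, queue.Queue and SimpleNamespace stand in for multiprocessing.Queue and multiprocessing.Value, the None sentinel and the result dictionaries are assumptions, and no files are actually written.

```python
import queue
from types import SimpleNamespace


def collect_sketch(alive, result_queue, collect_queue, label):
    """Tally results until a sentinel arrives, then report a summary."""
    log = {"PASS": 0, "FAIL": 0}
    while alive.value:
        result = result_queue.get()
        if result is None:  # assumption: None marks the end of the results
            break
        log["PASS" if result["valid"] else "FAIL"] += 1
    if not alive.value:
        # Processing was aborted elsewhere; return without reporting.
        return None
    # Hypothetical output names derived from the task label.
    out_files = [label + "-pass.fastq", label + "-fail.fastq"]
    collect_queue.put({"log": log, "out_files": out_files})
    return None


alive = SimpleNamespace(value=True)  # stand-in for multiprocessing.Value
result_queue = queue.Queue()         # stand-in for multiprocessing.Queue
collect_queue = queue.Queue()

for item in (
    {"id": "r1", "valid": True},
    {"id": "r2", "valid": False},
    {"id": "r3", "valid": True},
):
    result_queue.put(item)
result_queue.put(None)

returned = collect_sketch(alive, result_queue, collect_queue, label="demo")
summary = collect_queue.get()
```

As in the documented functions, the sketch itself returns None; the caller receives the log and file names only through collect_queue.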

presto.Multiprocessing.feedPairQueue(alive, data_queue, seq_file_1, seq_file_2, coord_type='presto', delimiter=('|', '=', ', '))

Feeds the data queue with sequence pairs for processQueue processes

Parameters:
  • alive – a multiprocessing.Value boolean controlling whether processing continues; when False, the function returns
  • data_queue – a multiprocessing.Queue to hold data for processing
  • seq_file_1 – the name of sequence file 1
  • seq_file_2 – the name of sequence file 2
  • coord_type – the sequence header format
  • delimiter – a tuple of delimiters for (fields, values, value lists)
Returns:

None
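
Feeding sequence pairs amounts to matching records from the two files on a shared key and queueing each matched pair. The sketch below illustrates that idea under stated assumptions and is not presto's implementation: feed_pairs_sketch and pair_key are hypothetical names, the records are in-memory (header, sequence) tuples rather than files, and the pairing key is assumed to be the first header field, split on the field delimiter. How the real function derives the key depends on coord_type.

```python
import queue


def pair_key(header, delimiter):
    """Assumption: the pairing key is the first field of the header."""
    field_delim = delimiter[0]  # delimiter = (fields, values, value lists)
    return header.split(field_delim)[0]


def feed_pairs_sketch(data_queue, records_1, records_2, delimiter):
    """Queue (key, [record_1, record_2]) for keys present in both inputs."""
    index_2 = {pair_key(header, delimiter): (header, seq)
               for header, seq in records_2}
    for header, seq in records_1:
        key = pair_key(header, delimiter)
        if key in index_2:
            data_queue.put((key, [(header, seq), index_2[key]]))


# Explicit delimiter tuple chosen for this example.
delims = ("|", "=", ",")
records_1 = [("A|PRIMER=V1", "ACGT"), ("B|PRIMER=V2", "GGCC")]
records_2 = [("B|PRIMER=J1", "TTAA"), ("C|PRIMER=J2", "CCGG")]

data_queue = queue.Queue()  # stand-in for multiprocessing.Queue
feed_pairs_sketch(data_queue, records_1, records_2, delims)

pairs = []
while not data_queue.empty():
    pairs.append(data_queue.get())
```

Only key B appears in both inputs, so a single pair is queued; unmatched records A and C are simply skipped in this sketch.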

presto.Multiprocessing.feedSeqQueue(alive, data_queue, seq_file, index_func=None, index_args={})

Feeds the data queue with SeqRecord objects

Parameters:
  • alive – multiprocessing.Value boolean controlling whether processing continues; when False, the function returns
  • data_queue – multiprocessing.Queue to hold data for processing
  • seq_file – Sequence file to read input from
  • index_func – Function used to define sequence sets; if None, sets are not indexed and individual records are fed
  • index_args – Dictionary of arguments to pass to index_func
Returns:

None
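
The effect of index_func can be shown with a small sketch: without it, every record is queued on its own; with it, records are first grouped into sets and each set is queued as one item. The names feed_sketch and index_by_prefix, the (id, sequence) tuples, and the shape of the index (a dictionary mapping a set identifier to its members) are assumptions for illustration, not presto's actual data structures.

```python
import queue
from collections import defaultdict


def index_by_prefix(records, length=2):
    """Hypothetical index function: group records by an identifier prefix."""
    sets = defaultdict(list)
    for rec_id, seq in records:
        sets[rec_id[:length]].append(seq)
    return dict(sets)


def feed_sketch(data_queue, records, index_func=None, index_args=None):
    """Queue individual records, or whole sets when index_func is given."""
    index_args = index_args or {}
    if index_func is None:
        for rec_id, seq in records:
            data_queue.put((rec_id, seq))
    else:
        for set_id, members in index_func(records, **index_args).items():
            data_queue.put((set_id, members))


def drain(q):
    """Collect everything currently in the queue into a list."""
    items = []
    while not q.empty():
        items.append(q.get())
    return items


records = [("AA1", "ACGT"), ("AA2", "ACGA"), ("BB1", "TTTT")]

individual_queue = queue.Queue()  # stand-in for multiprocessing.Queue
feed_sketch(individual_queue, records)
individual = drain(individual_queue)

set_queue = queue.Queue()
feed_sketch(set_queue, records, index_by_prefix, {"length": 2})
grouped = drain(set_queue)
```

Passing index_args as a dictionary keeps the feeder generic: it forwards the arguments to whichever index function is supplied without knowing what they mean.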

presto.Multiprocessing.manageProcesses(feed_func, work_func, collect_func, feed_args={}, work_args={}, collect_args={}, nproc=None, queue_size=None)

Manages feeder, worker and collector processes

Parameters:
  • feed_func (function) – Data Queue feeder function.
  • work_func (function) – Worker function.
  • collect_func (function) – Result Queue collector function.
  • feed_args (dict) – Dictionary of arguments to pass to feed_func.
  • work_args (dict) – Dictionary of arguments to pass to work_func.
  • collect_args (dict) – Dictionary of arguments to pass to collect_func.
  • nproc (int) – Number of processQueue processes; if None, defaults to the number of CPUs.
  • queue_size (int) – Maximum size of the argument queue; if None, defaults to 2*nproc.
Returns:

Dictionary of collector results

Return type:

dict
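
The feeder, worker and collector arrangement can be sketched as a three-stage pipeline joined by bounded queues. To stay self-contained, the sketch below deliberately swaps in threads and queue.Queue for the processes and multiprocessing queues named above, and it omits the alive flag and the feed_args, work_args and collect_args dictionaries. The name manage_sketch and the None sentinels are assumptions; only the defaults for nproc and queue_size follow the parameter descriptions.

```python
import os
import queue
import threading


def manage_sketch(feed_func, work_func, collect_func, nproc=None, queue_size=None):
    """Run one feeder, nproc workers and one collector; return the summary."""
    nproc = nproc or os.cpu_count() or 1  # default: number of CPUs
    queue_size = queue_size or 2 * nproc  # default: 2*nproc
    data_queue = queue.Queue(maxsize=queue_size)
    result_queue = queue.Queue(maxsize=queue_size)
    collect_queue = queue.Queue()

    feeder = threading.Thread(target=feed_func, args=(data_queue,))
    workers = [threading.Thread(target=work_func, args=(data_queue, result_queue))
               for _ in range(nproc)]
    collector = threading.Thread(target=collect_func,
                                 args=(result_queue, collect_queue))
    for thread in [feeder, collector] + workers:
        thread.start()

    feeder.join()
    for _ in workers:            # assumption: one None sentinel per worker
        data_queue.put(None)
    for worker in workers:
        worker.join()
    result_queue.put(None)       # tell the collector that no more results follow
    collector.join()
    return collect_queue.get()


def feed(data_queue):
    for n in range(10):
        data_queue.put(n)


def work(data_queue, result_queue):
    while True:
        item = data_queue.get()
        if item is None:
            break
        result_queue.put(item * item)


def collect(result_queue, collect_queue):
    count, total = 0, 0
    while True:
        value = result_queue.get()
        if value is None:
            break
        count, total = count + 1, total + value
    collect_queue.put({"log": {"COUNT": count, "TOTAL": total}, "out_files": []})


summary = manage_sketch(feed, work, collect, nproc=2)
```

Bounding the queues at a small multiple of the worker count keeps the feeder from reading an entire input into memory while the workers catch up.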

presto.Multiprocessing.processSeqQueue(alive, data_queue, result_queue, process_func, process_args={})

Pulls from data queue, performs calculations, and feeds results queue

Parameters:
  • alive – multiprocessing.Value boolean controlling whether processing continues; when False, the function returns
  • data_queue – multiprocessing.Queue holding data to process
  • result_queue – multiprocessing.Queue to hold processed results
  • process_func – function to use for processing sequences
  • process_args – Dictionary of arguments to pass to process_func
Returns:

None
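
A worker loop of the kind described above pulls an item, applies process_func with process_args, and queues the outcome. The following is a minimal sketch under stated assumptions rather than presto's implementation: process_queue_sketch and flag_short are hypothetical names, queue.Queue and SimpleNamespace stand in for the multiprocessing objects, and both the None sentinel and the choice to clear alive on an error are assumptions.

```python
import queue
from types import SimpleNamespace


def process_queue_sketch(alive, data_queue, result_queue,
                         process_func, process_args=None):
    """Apply process_func to each queued item until a sentinel arrives."""
    process_args = process_args or {}
    try:
        while alive.value:
            item = data_queue.get()
            if item is None:  # assumption: None marks the end of the data
                break
            result_queue.put(process_func(item, **process_args))
    except Exception:
        # Assumption: clearing the flag lets the other stages stop as well.
        alive.value = False
        raise
    return None


def flag_short(seq, min_length=4):
    """Hypothetical processing function: mark sequences that are too short."""
    return {"data": seq, "valid": len(seq) >= min_length}


alive = SimpleNamespace(value=True)  # stand-in for multiprocessing.Value
data_queue = queue.Queue()           # stand-in for multiprocessing.Queue
result_queue = queue.Queue()

for seq in ("ACGTAC", "AC"):
    data_queue.put(seq)
data_queue.put(None)

process_queue_sketch(alive, data_queue, result_queue,
                     flag_short, {"min_length": 4})

results = []
while not result_queue.empty():
    results.append(result_queue.get())
```

Keeping the task-specific logic in process_func means the same loop can serve different tools, with process_args carrying each tool's settings.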