presto.Multiprocessing
Multiprocessing functions
- class presto.Multiprocessing.SeqData(id, data)
Bases: object
Class defining sequence data objects for worker processes
- id
unique identifier
- data
single object or a list of data objects.
- valid
if True, data is suitable for processing.
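The attributes above can be pictured with a small stand-in class. This is only an illustrative sketch, not pRESTO's implementation: the name `SeqDataSketch`, the `__len__` helper, and the rule used to set `valid` (both `id` and `data` are present) are assumptions made for the example.

```python
class SeqDataSketch:
    """Hypothetical stand-in for a SeqData-like container (not pRESTO code)."""

    def __init__(self, id, data):
        self.id = id        # unique identifier
        self.data = data    # single object or a list of data objects
        # Assumption: data is usable whenever both fields are present.
        self.valid = id is not None and data is not None

    def __len__(self):
        # Count a single object as 1 and a list by its length.
        if self.data is None:
            return 0
        return len(self.data) if isinstance(self.data, list) else 1


single = SeqDataSketch("SEQ1", "ACGT")
group = SeqDataSketch("SET1", ["ACGT", "ACGA", "ACGC"])
empty = SeqDataSketch("SEQ2", None)
```

A worker could check `valid` before doing any work and skip or fail items such as `empty`.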
- class presto.Multiprocessing.SeqResult(id, data)
Bases: object
Class defining sequence result objects for collector processes
- id
unique identifier
- data
single unprocessed object or a list of unprocessed data objects.
- results
single processed object or a list of processed data objects.
- valid
if True, processing was successful.
- log
dictionary containing the processing log.
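To show how these attributes might be filled in by a worker, here is a minimal sketch. It is not pRESTO's code: `SeqResultSketch`, `reverse_complement_task`, the initial `valid = False`, and the log keys are all hypothetical choices for the example.

```python
from collections import OrderedDict


class SeqResultSketch:
    """Hypothetical stand-in for a SeqResult-like container (not pRESTO code)."""

    def __init__(self, id, data):
        self.id = id              # unique identifier
        self.data = data          # unprocessed input
        self.results = None       # processed output, set by the worker
        self.valid = False        # assumption: a result starts out as failed
        self.log = OrderedDict()  # processing log
        self.log["ID"] = id


def reverse_complement_task(seq_id, seq):
    """Example worker step: reverse-complement a DNA string."""
    result = SeqResultSketch(seq_id, seq)
    if set(seq) <= set("ACGT"):
        result.results = seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]
        result.valid = True
    else:
        # Leave results empty and record why the item failed.
        result.log["ERROR"] = "unexpected character"
    return result


ok = reverse_complement_task("SEQ1", "AACG")
bad = reverse_complement_task("SEQ2", "AANG")
```

Keeping the unprocessed `data` next to `results` lets a collector write failed inputs to a separate output without going back to the source file.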
- presto.Multiprocessing.collectPairQueue(alive, result_queue, collect_queue, seq_file_1, seq_file_2, label, out_file=None, out_args={'delimiter': ('|', '=', ','), 'failed': True, 'log_file': None, 'out_dir': None, 'out_name': None, 'out_type': None, 'separator': ','})
Pulls from results queue, assembles results and manages log and file IO
- Parameters:
alive – a multiprocessing.Value boolean controlling whether processing continues; when False, the function returns.
result_queue – a multiprocessing.Queue holding worker results.
collect_queue – a multiprocessing.Queue holding collector return values.
seq_file_1 – the first sequence file name.
seq_file_2 – the second sequence file name.
label – task label used to tag the output files.
out_file – output file name. Automatically generated from the input file if None.
out_args – common output argument dictionary from parseCommonArgs.
- Returns:
adds a dictionary of {log: log object, out_files: output file names} to collect_queue.
- Return type:
None
- presto.Multiprocessing.collectSeqQueue(alive, result_queue, collect_queue, seq_file, label, index_field=None, out_file=None, out_args={'delimiter': ('|', '=', ','), 'failed': True, 'log_file': None, 'out_dir': None, 'out_name': None, 'out_type': None, 'separator': ','})
Pulls from results queue, assembles results and manages log and file IO
- Parameters:
alive – a multiprocessing.Value boolean controlling whether processing continues; when False, the function returns.
result_queue – a multiprocessing.Queue holding worker results.
collect_queue – a multiprocessing.Queue to store collector return values.
seq_file – sample sequence file name.
label – task label used to tag the output files.
out_file – output file name. Automatically generated from the input file if None.
out_args – Common output argument dictionary from parseCommonArgs.
index_field – field defining set membership for sequence sets; if None, the data queue contained individual records.
- Returns:
adds a dictionary to collect_queue with the keys 'log', holding a log object, and 'out_files', holding the output file names.
- Return type:
None
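The collector behaviour described above (drain the result queue, write output, then place a summary dictionary on `collect_queue`) can be sketched as follows. This is an assumption-laden illustration, not pRESTO's implementation: `collect_sketch`, the `None` end-of-results sentinel, the `-pass`/`-fail` file names, the `out_dir` argument, and the use of `queue.Queue` and `SimpleNamespace` in place of `multiprocessing.Queue` and `multiprocessing.Value` are all choices made so the example runs in a single process.

```python
import os
import queue
import tempfile
from types import SimpleNamespace


def collect_sketch(alive, result_queue, collect_queue, label, out_dir):
    """Drain result_queue until a None sentinel, then report a summary."""
    pass_path = os.path.join(out_dir, f"{label}-pass.txt")
    fail_path = os.path.join(out_dir, f"{label}-fail.txt")
    log = {}
    with open(pass_path, "w") as pass_handle, open(fail_path, "w") as fail_handle:
        while alive.value:
            try:
                result = result_queue.get(timeout=0.1)
            except queue.Empty:
                continue  # nothing yet; re-check the alive flag
            if result is None:  # assumption: None marks the end of results
                break
            log[result.id] = result.log
            if result.valid:
                pass_handle.write(f"{result.id}\t{result.results}\n")
            else:
                fail_handle.write(f"{result.id}\t{result.data}\n")
        else:
            return None  # alive was cleared before the sentinel arrived
    collect_queue.put({"log": log, "out_files": [pass_path, fail_path]})
    return None


alive = SimpleNamespace(value=True)  # stand-in for a multiprocessing.Value boolean
result_queue, collect_queue = queue.Queue(), queue.Queue()
result_queue.put(SimpleNamespace(id="SEQ1", data="acgt", results="ACGT",
                                 valid=True, log={"LENGTH": 4}))
result_queue.put(SimpleNamespace(id="SEQ2", data="", results=None,
                                 valid=False, log={"LENGTH": 0}))
result_queue.put(None)
collect_sketch(alive, result_queue, collect_queue, "demo", tempfile.mkdtemp())
summary = collect_queue.get()
```

The function itself returns None; the caller reads the summary from `collect_queue`, which matches the return description above.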
- presto.Multiprocessing.feedPairQueue(alive, data_queue, seq_file_1, seq_file_2, coord_type='presto', delimiter=('|', '=', ','))
Feeds the data queue with sequence pairs for processQueue processes
- Parameters:
alive – a multiprocessing.Value boolean controlling whether processing continues; when False, the function returns
data_queue – a multiprocessing.Queue to hold data for processing
seq_file_1 – the name of sequence file 1
seq_file_2 – the name of sequence file 2
coord_type – the sequence header format
delimiter – a tuple of delimiters for (fields, values, value lists)
- Returns:
None
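To make the `delimiter` tuple concrete, the sketch below splits headers using the default `('|', '=', ',')` and pairs two header lists by a shared identifier. It is a hedged illustration only: `parse_header_sketch` and `pair_by_id` are hypothetical helpers, and the assumption that the pairing key is the text before the first field delimiter is made for the example rather than taken from pRESTO's handling of `coord_type`.

```python
def parse_header_sketch(header, delimiter=("|", "=", ",")):
    """Split a header into an ID plus fields, each holding a list of values."""
    field_delim, value_delim, list_delim = delimiter
    parts = header.split(field_delim)
    fields = {"ID": parts[0]}
    for part in parts[1:]:
        name, _, value = part.partition(value_delim)
        fields[name] = value.split(list_delim)
    return fields


def pair_by_id(headers_1, headers_2, delimiter=("|", "=", ",")):
    """Yield (id, [header_1, header_2]) for headers sharing the same ID."""
    index_2 = {parse_header_sketch(h, delimiter)["ID"]: h for h in headers_2}
    for header in headers_1:
        key = parse_header_sketch(header, delimiter)["ID"]
        if key in index_2:
            yield key, [header, index_2[key]]


parsed = parse_header_sketch("SEQ1|BARCODE=ACGT|PRIMER=V1,V2")
pairs = list(pair_by_id(
    ["SEQ1|BARCODE=ACGT|PRIMER=V1,V2", "SEQ2|BARCODE=TTTT"],
    ["SEQ2|PRIMER=C1", "SEQ1|PRIMER=C2", "SEQ3|PRIMER=C1"],
))
```

Indexing the second list first means the two inputs do not need to be in the same order, and unpaired entries such as `SEQ3` are simply skipped.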
- presto.Multiprocessing.feedSeqQueue(alive, data_queue, seq_file, index_func=None, index_args={})
Feeds the data queue with SeqRecord objects
- Parameters:
alive – a multiprocessing.Value boolean controlling whether processing continues; when False, the function returns
data_queue – a multiprocessing.Queue to hold data for processing
seq_file – sequence file to read input from
index_func – function used to define sequence sets; if None, sets are not indexed and individual records are fed
index_args – dictionary of arguments to pass to index_func
- Returns:
None
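The difference between feeding individual records and feeding indexed sets can be sketched as below. This is not pRESTO's feeder: `feed_sketch`, `index_by_prefix`, and `drain` are hypothetical, a plain dictionary stands in for a parsed sequence file, `queue.Queue` and `SimpleNamespace` stand in for `multiprocessing.Queue` and `multiprocessing.Value`, and the assumption that an index function returns `{set_id: [record ids]}` is made for the example.

```python
import queue
from types import SimpleNamespace


def feed_sketch(alive, data_queue, records, index_func=None, index_args=None):
    """Put (id, record) items or (set_id, [records]) items on data_queue."""
    if index_func is None:
        items = ((key, rec) for key, rec in records.items())
    else:
        # Assumption: index_func maps each set ID to a list of record IDs.
        index = index_func(records, **(index_args or {}))
        items = ((set_id, [records[i] for i in ids])
                 for set_id, ids in index.items())
    for item in items:
        if not alive.value:
            return None  # stop feeding once the alive flag is cleared
        data_queue.put(item)
    return None


def index_by_prefix(records, length=2):
    """Hypothetical index function: group record IDs by sequence prefix."""
    index = {}
    for key, seq in records.items():
        index.setdefault(seq[:length], []).append(key)
    return index


def drain(data_queue):
    """Collect everything currently on the queue into a list."""
    items = []
    while not data_queue.empty():
        items.append(data_queue.get())
    return items


records = {"SEQ1": "ACGT", "SEQ2": "ACTT", "SEQ3": "GGTT"}
alive = SimpleNamespace(value=True)

single_queue = queue.Queue()
feed_sketch(alive, single_queue, records)
singles = drain(single_queue)

set_queue = queue.Queue()
feed_sketch(alive, set_queue, records, index_by_prefix, {"length": 2})
sets = drain(set_queue)
```

With no index function each queue item carries one record; with one, each item carries a whole set, so a worker can process the set as a unit.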
- presto.Multiprocessing.manageProcesses(feed_func, work_func, collect_func, feed_args={}, work_args={}, collect_args={}, nproc=None, queue_size=None)
Manages feeder, worker and collector processes
- Parameters:
feed_func (function) – Data Queue feeder function.
work_func (function) – Worker function.
collect_func (function) – Result Queue collector function.
feed_args (dict) – Dictionary of arguments to pass to feed_func.
work_args (dict) – Dictionary of arguments to pass to work_func.
collect_args (dict) – Dictionary of arguments to pass to collect_func.
nproc (int) – Number of processQueue processes; if None, defaults to the number of CPUs.
queue_size (int) – Maximum size of the argument queue; if None, defaults to 2*nproc.
- Returns:
Dictionary of collector results
- Return type:
dict
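The feeder, worker, and collector arrangement can be sketched end to end as follows. This is a simplified illustration and not pRESTO's implementation: `manage_sketch` and the three demo functions are hypothetical, threads and `queue.Queue` stand in for processes and `multiprocessing.Queue` so the example runs anywhere, `SimpleNamespace` stands in for the `alive` flag, and the use of `None` sentinels to shut down workers and the collector is an assumption.

```python
import os
import queue
import threading
from types import SimpleNamespace


def manage_sketch(feed_func, work_func, collect_func, feed_args=None,
                  work_args=None, collect_args=None, nproc=None, queue_size=None):
    """Run one feeder, nproc workers, and one collector; return the summary."""
    nproc = nproc or os.cpu_count() or 1
    queue_size = queue_size or 2 * nproc
    alive = SimpleNamespace(value=True)
    data_queue = queue.Queue(maxsize=queue_size)  # bounded input queue
    result_queue, collect_queue = queue.Queue(), queue.Queue()

    feeder = threading.Thread(target=feed_func, args=(alive, data_queue),
                              kwargs=feed_args or {})
    workers = [threading.Thread(target=work_func,
                                args=(alive, data_queue, result_queue),
                                kwargs=work_args or {})
               for _ in range(nproc)]
    collector = threading.Thread(target=collect_func,
                                 args=(alive, result_queue, collect_queue),
                                 kwargs=collect_args or {})
    for thread in [feeder, *workers, collector]:
        thread.start()

    feeder.join()
    for _ in workers:
        data_queue.put(None)  # one sentinel per worker
    for worker in workers:
        worker.join()
    result_queue.put(None)    # all results are queued; stop the collector
    collector.join()
    return collect_queue.get()


def feed_numbers(alive, data_queue, values):
    for value in values:
        data_queue.put(value)


def square_worker(alive, data_queue, result_queue):
    while alive.value:
        item = data_queue.get()
        if item is None:
            break
        result_queue.put(item * item)


def sum_collector(alive, result_queue, collect_queue):
    total = 0
    while alive.value:
        item = result_queue.get()
        if item is None:
            break
        total += item
    collect_queue.put({"total": total})


result = manage_sketch(feed_numbers, square_worker, sum_collector,
                       feed_args={"values": range(10)}, nproc=3)
```

Bounding the data queue keeps the feeder from reading far ahead of the workers, which is the usual reason for a `queue_size` limit of this kind.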
- presto.Multiprocessing.processSeqQueue(alive, data_queue, result_queue, process_func, process_args={})
Pulls from data queue, performs calculations, and feeds results queue
- Parameters:
alive – a multiprocessing.Value boolean controlling whether processing continues; when False, the function returns
data_queue – a multiprocessing.Queue holding data to process
result_queue – a multiprocessing.Queue to hold processed results
process_func – function to use for processing sequences
process_args – dictionary of arguments to pass to process_func
- Returns:
None
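A worker loop of the kind described above, which pulls items, applies `process_func` with `process_args`, and pushes the results, might look like the sketch below. It is not pRESTO's code: `process_sketch` and `trim` are hypothetical, `queue.Queue` and `SimpleNamespace` stand in for `multiprocessing.Queue` and `multiprocessing.Value`, and both the `None` sentinel and the choice to clear `alive` when an error occurs are assumptions.

```python
import queue
from types import SimpleNamespace


def process_sketch(alive, data_queue, result_queue, process_func, process_args=None):
    """Apply process_func to each queued item until a None sentinel arrives."""
    process_args = process_args or {}
    while alive.value:
        try:
            data = data_queue.get(timeout=0.1)
        except queue.Empty:
            continue  # nothing yet; re-check the alive flag
        if data is None:  # assumption: None marks the end of the input
            break
        try:
            result_queue.put(process_func(data, **process_args))
        except Exception:
            alive.value = False  # assumption: tell the other stages to stop
            raise
    return None


def trim(seq, start=0, end=None):
    """Hypothetical processing function: keep seq[start:end]."""
    return seq[start:end]


alive = SimpleNamespace(value=True)
data_queue, result_queue = queue.Queue(), queue.Queue()
for item in ["ACGTACGT", "TTTTGGGG", None]:
    data_queue.put(item)
process_sketch(alive, data_queue, result_queue, trim, {"start": 2, "end": 6})
trimmed = [result_queue.get(), result_queue.get()]
```

Passing `process_args` as a dictionary keeps the loop generic, so the same worker can run any processing function without changes.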