MaskPrimers.py

Removes primers and annotates sequences with primer and barcode identifiers

usage: MaskPrimers.py [--version] [-h]  ...
--version

show program’s version number and exit

-h, --help

show this help message and exit

output files:
mask-pass

processed reads with successful primer matches.

mask-fail

raw reads failing primer identification.

output annotation fields:
SEQORIENT

the orientation of the output sequence. Either F (input) or RC (reverse complement of input).

PRIMER

name of the best primer match.

BARCODE

the sequence preceding the primer match. Only output when the --barcode flag is specified.
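
As a rough illustration of the two SEQORIENT values, the sketch below computes the reverse complement of a plain A/C/G/T/N sequence. The helper name reverse_complement and the example read are hypothetical and are not part of MaskPrimers.py; the tool's own handling of ambiguous characters may differ.

```python
# Hypothetical helper for illustration only; not part of MaskPrimers.py.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of an upper-case A/C/G/T/N sequence."""
    return seq.translate(COMPLEMENT)[::-1]

read = "ATGCCGTN"                      # made-up input read (SEQORIENT=F)
rc_read = reverse_complement(read)     # what SEQORIENT=RC would refer to
print(rc_read)
```

Applying the helper twice returns the original read, which is a quick sanity check on the mapping.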

MaskPrimers.py align

Find primer matches using pairwise local alignment.

usage: MaskPrimers.py align [--version] [-h] -s SEQ_FILES [SEQ_FILES ...]
                            [-o OUT_FILES [OUT_FILES ...]] [--outdir OUT_DIR]
                            [--outname OUT_NAME] [--log LOG_FILE] [--failed]
                            [--fasta] [--delim DELIMITER DELIMITER DELIMITER]
                            [--nproc NPROC] -p PRIMER_FILE
                            [--maxerror MAX_ERROR] [--maxlen MAX_LEN]
                            [--gap GAP_PENALTY GAP_PENALTY] [--revpr]
                            [--skiprc] [--mode {cut,mask,trim,tag}]
                            [--barcode] [--bf BARCODE_FIELD]
                            [--pf PRIMER_FIELD]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-s <seq_files>

A list of FASTA/FASTQ files containing sequences to process.

-o <out_files>

Explicit output file name(s). This argument cannot be used with the --failed, --outdir, or --outname arguments. If unspecified, the output file name is based on the input file name(s).

--outdir <out_dir>

Specify to change the output directory to the given location. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--log <log_file>

Specify to write verbose logging to a file. May not be specified with multiple input files.

--failed

If specified, create files containing records that fail processing.

--fasta

Specify to force output as FASTA rather than FASTQ.

--delim <delimiter>

A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.
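
To make the roles of the three delimiters concrete, here is a minimal sketch that splits an annotated sequence header. The delimiter values ("|", "=", ","), the function name parse_header, the example header, and the TAGS field are all assumptions chosen for illustration; check the tool's actual defaults before relying on them.

```python
def parse_header(header, delims=("|", "=", ",")):
    """Split a header into a sequence ID and a dict of annotation fields.

    delims holds, in order: the annotation block delimiter, the
    field name/value delimiter, and the delimiter between values.
    The default values here are assumed for illustration.
    """
    block_delim, field_delim, value_delim = delims
    blocks = header.split(block_delim)
    seq_id = blocks[0]
    fields = {}
    for block in blocks[1:]:
        name, _, value = block.partition(field_delim)
        fields[name] = value.split(value_delim)
    return seq_id, fields

# Made-up header; TAGS is a hypothetical multi-valued field.
seq_id, fields = parse_header("READ1|SEQORIENT=F|PRIMER=P1|TAGS=x,y")
print(seq_id, fields)
```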

--nproc <nproc>

The number of simultaneous computational processes to execute (CPU cores to utilize).

-p <primer_file>

A FASTA file containing primer sequences.

--maxerror <max_error>

Maximum allowable error rate.

--maxlen <max_len>

Length of the sequence window to scan for primers.

--gap <gap_penalty>

A list of two positive values defining the gap open and gap extension penalties for aligning the primers. Note: the error rate is calculated as the percentage of mismatches from the primer sequence, with gap penalties reducing the match count accordingly. When gaps are present in the alignment, the resulting error rate may therefore differ from the strict mismatch percentage.

--revpr

Specify to match the tail-end of the sequence against the reverse complement of the primers. This also reverses the behavior of the --maxlen argument, such that the search window begins at the tail-end of the sequence.
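
The interaction between the search window and the tail-end option can be pictured with the sketch below. It only shows which part of a read would be scanned under the description above; search_window is a hypothetical name, and the real tool may treat boundary cases differently.

```python
def search_window(seq, max_len, revpr=False):
    """Return the part of the read that would be scanned for primers.

    Assumes max_len is a positive integer. With revpr=True the window
    is taken from the tail end of the read instead of the head.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return seq[-max_len:] if revpr else seq[:max_len]

read = "AAAACCCCGGGGTTTT"                     # made-up read
print(search_window(read, 4))                 # head of the read
print(search_window(read, 4, revpr=True))     # tail of the read
```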

--skiprc

Specify to prevent checking of sample reverse complement sequences.

--mode {cut,mask,trim,tag}

Specifies the action to take with the primer sequence. The “cut” mode will remove both the primer region and the preceding sequence. The “mask” mode will replace the primer region with Ns and remove the preceding sequence. The “trim” mode will remove the region preceding the primer, but leave the primer region intact. The “tag” mode will leave the input sequence unmodified.
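
The four modes can be compared side by side with a small sketch. It assumes a forward-oriented read (no --revpr) and a primer match already located at the half-open interval [start, end); the function apply_mode and the example read are hypothetical and only mirror the descriptions above.

```python
def apply_mode(seq, start, end, mode):
    """Apply a primer-handling mode to a read.

    start and end delimit the matched primer region as a half-open
    interval; everything before start is the preceding sequence.
    """
    if mode == "cut":     # drop preceding sequence and primer
        return seq[end:]
    if mode == "mask":    # drop preceding sequence, replace primer with Ns
        return "N" * (end - start) + seq[end:]
    if mode == "trim":    # drop preceding sequence, keep primer
        return seq[start:]
    if mode == "tag":     # leave the read untouched
        return seq
    raise ValueError(f"unknown mode: {mode}")

# Made-up read: 8 nt preceding sequence, 6 nt primer, 8 nt remainder.
read = "ACGTACGT" + "GGGTTT" + "CCCCAAAA"
for mode in ("cut", "mask", "trim", "tag"):
    print(mode, apply_mode(read, 8, 14, mode))
```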

--barcode

Specify to annotate read sequences with barcode sequences (unique molecular identifiers) found preceding the primer.

--bf <barcode_field>

Name of the barcode annotation field.

--pf <primer_field>

Name of the annotation field containing the primer name.

MaskPrimers.py extract

Remove and annotate a fixed sequence region.

usage: MaskPrimers.py extract [--version] [-h] -s SEQ_FILES [SEQ_FILES ...]
                              [-o OUT_FILES [OUT_FILES ...]]
                              [--outdir OUT_DIR] [--outname OUT_NAME]
                              [--log LOG_FILE] [--failed] [--fasta]
                              [--delim DELIMITER DELIMITER DELIMITER]
                              [--nproc NPROC] [--start START] --len LENGTH
                              [--revpr] [--mode {cut,mask,trim,tag}]
                              [--barcode] [--bf BARCODE_FIELD]
                              [--pf PRIMER_FIELD]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-s <seq_files>

A list of FASTA/FASTQ files containing sequences to process.

-o <out_files>

Explicit output file name(s). This argument cannot be used with the --failed, --outdir, or --outname arguments. If unspecified, the output file name is based on the input file name(s).

--outdir <out_dir>

Specify to change the output directory to the given location. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--log <log_file>

Specify to write verbose logging to a file. May not be specified with multiple input files.

--failed

If specified, create files containing records that fail processing.

--fasta

Specify to force output as FASTA rather than FASTQ.

--delim <delimiter>

A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.

--nproc <nproc>

The number of simultaneous computational processes to execute (CPU cores to utilize).

--start <start>

The starting position of the sequence region to extract.

--len <length>

The length of the sequence to extract.

--revpr

Specify to extract from the tail end of the sequence with the start position relative to the end of the sequence.

--mode {cut,mask,trim,tag}

Specifies the action to take with the sequence region. The “cut” mode will remove the region. The “mask” mode will replace the specified region with Ns. The “trim” mode will remove the sequence preceding the specified region, but leave the region intact. The “tag” mode will leave the input sequence unmodified.

--barcode

Specify to remove the sequence preceding the extracted region and annotate the read with that sequence.
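
A minimal sketch of how a fixed region and its preceding sequence relate is shown below. It assumes a 0-based start position and a forward-oriented read, neither of which is stated above, and it uses PRIMER and BARCODE as the annotation field names; extract_region is a hypothetical helper, not the tool's implementation.

```python
def extract_region(seq, start, length, barcode=False):
    """Locate a fixed region and build the annotations for it.

    start is treated as 0-based (an assumption). When barcode is True,
    the sequence preceding the region is recorded as the barcode.
    """
    region = seq[start:start + length]
    annotations = {"PRIMER": region}
    if barcode:
        annotations["BARCODE"] = seq[:start]
    return region, annotations

read = "ACGTACGTGGGTTTCCCC"    # made-up read
region, annotations = extract_region(read, start=8, length=6, barcode=True)
print(region, annotations)
```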

--bf <barcode_field>

Name of the barcode annotation field.

--pf <primer_field>

Name of the annotation field containing the extracted sequence region.

MaskPrimers.py score

Find primer matches by scoring primers at a fixed position.

usage: MaskPrimers.py score [--version] [-h] -s SEQ_FILES [SEQ_FILES ...]
                            [-o OUT_FILES [OUT_FILES ...]] [--outdir OUT_DIR]
                            [--outname OUT_NAME] [--log LOG_FILE] [--failed]
                            [--fasta] [--delim DELIMITER DELIMITER DELIMITER]
                            [--nproc NPROC] -p PRIMER_FILE [--start START]
                            [--maxerror MAX_ERROR] [--revpr]
                            [--mode {cut,mask,trim,tag}] [--barcode]
                            [--bf BARCODE_FIELD] [--pf PRIMER_FIELD]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-s <seq_files>

A list of FASTA/FASTQ files containing sequences to process.

-o <out_files>

Explicit output file name(s). This argument cannot be used with the --failed, --outdir, or --outname arguments. If unspecified, the output file name is based on the input file name(s).

--outdir <out_dir>

Specify to change the output directory to the given location. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--log <log_file>

Specify to write verbose logging to a file. May not be specified with multiple input files.

--failed

If specified, create files containing records that fail processing.

--fasta

Specify to force output as FASTA rather than FASTQ.

--delim <delimiter>

A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.

--nproc <nproc>

The number of simultaneous computational processes to execute (CPU cores to utilize).

-p <primer_file>

A FASTA file containing primer sequences.

--start <start>

The starting position of the primer.

--maxerror <max_error>

Maximum allowable error rate.
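
Scoring at a fixed position can be approximated as counting mismatches between each primer and the read segment at the start position. The sketch below assumes exact character comparison (no ambiguous nucleotide codes), a 0-based start, and an arbitrary threshold of 0.2; the names score_primer and best_primer and the primer set are hypothetical, and the tool's actual scoring may differ.

```python
def score_primer(seq, primer, start=0):
    """Return the fraction of primer positions that mismatch the read."""
    target = seq[start:start + len(primer)]
    mismatches = sum(a != b for a, b in zip(target, primer))
    # Primer positions that run past the end of the read count as errors.
    mismatches += len(primer) - len(target)
    return mismatches / len(primer)

def best_primer(seq, primers, start=0, max_error=0.2):
    """Return (name, error) for the lowest-error primer, or (None, error)
    if even the best primer exceeds max_error."""
    scored = [(name, score_primer(seq, p, start)) for name, p in primers.items()]
    name, error = min(scored, key=lambda item: item[1])
    return (name, error) if error <= max_error else (None, error)

primers = {"P1": "GGGTTT", "P2": "GGATCC"}   # made-up primer set
read = "ACGTGGGTTACCCC"                      # made-up read
print(best_primer(read, primers, start=4))
```

In this example P1 differs from the read segment at one of six positions and P2 at three of six, so P1 is reported as long as the threshold allows an error rate of about 0.17.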

--revpr

Specify to match the tail-end of the sequence against the reverse complement of the primers. This also reverses the behavior of the --start argument, such that the start position is relative to the tail-end of the sequence.

--mode {cut,mask,trim,tag}

Specifies the action to take with the primer sequence. The “cut” mode will remove both the primer region and the preceding sequence. The “mask” mode will replace the primer region with Ns and remove the preceding sequence. The “trim” mode will remove the region preceding the primer, but leave the primer region intact. The “tag” mode will leave the input sequence unmodified.

--barcode

Specify to annotate read sequences with barcode sequences (unique molecular identifiers) found preceding the primer.

--bf <barcode_field>

Name of the barcode annotation field.

--pf <primer_field>

Name of the annotation field containing the primer name.