EstimateError

Calculates annotation set error rates.

usage: EstimateError [--version] [-h]  ...

--version: show program’s version number and exit

-h, --help: show this help message and exit

output files:

error-position: estimated error by read position.
error-quality: estimated error by the quality score assigned within the input file.
error-nucleotide: estimated error by nucleotide.
error-set: estimated error by annotation set size.
distance-set: pairwise Hamming distances by annotation set.
threshold-set: thresholds from pairwise Hamming distances for annotation sets.
distance-barcode: estimated error by pairwise Hamming distances.
threshold-barcode: thresholds from pairwise Hamming distances for clustering barcodes.

output fields:

POSITION: read position, using zero-based indexing.
Q: Phred quality score.
OBSERVED: observed nucleotide value.
REFERENCE: consensus nucleotide for the barcode read group.
SET_COUNT: barcode read group size.
REPORTED_Q: mean Phred quality score reported within the input file for the given position, quality score, nucleotide or read group.
MISMATCHES: count of observed mismatches from consensus for the given position, quality score, nucleotide or read group.
OBSERVATIONS: total count of observed values for each position, quality score, nucleotide or read group size.
ERROR: estimated error rate.
EMPIRICAL_Q: estimated error rate converted to a Phred quality score.
ALL: histogram (count) of the distribution of all pairwise distances.
DTN: histogram (count) of the distance-to-nearest distribution.
DISTANCE: length-normalized Hamming distance.
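
The relationship between the MISMATCHES, OBSERVATIONS, ERROR and EMPIRICAL_Q fields can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own code: it assumes ERROR is simply MISMATCHES divided by OBSERVATIONS, and it uses the standard Phred definition Q = -10 * log10(error). The function names and the `max_q` cap are hypothetical.

```python
import math


def error_rate(mismatches: int, observations: int) -> float:
    """Fraction of observed values that differ from the consensus.

    Assumption: ERROR = MISMATCHES / OBSERVATIONS.
    """
    if observations <= 0:
        raise ValueError("observations must be a positive count")
    return mismatches / observations


def empirical_q(error: float, max_q: float = 90.0) -> float:
    """Convert an error rate to a Phred score, Q = -10 * log10(error).

    An error rate of zero has no finite Phred score, so the result is
    capped at max_q (the cap value is an arbitrary choice for this sketch).
    """
    if error <= 0.0:
        return max_q
    return min(max_q, -10.0 * math.log10(error))


if __name__ == "__main__":
    err = error_rate(mismatches=5, observations=5000)
    print(f"ERROR={err:.4f} EMPIRICAL_Q={empirical_q(err):.1f}")
```

Comparing EMPIRICAL_Q against REPORTED_Q for the same position or quality bin indicates whether the reported quality scores overstate or understate the observed error.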

EstimateError barcode

Calculates pairwise distance metrics of barcode sequences.

usage: EstimateError barcode [--version] [-h] -s SEQ_FILES [SEQ_FILES ...]
                             [--outdir OUT_DIR] [--outname OUT_NAME]
                             [--delim DELIMITER DELIMITER DELIMITER]
                             [-f BARCODE_FIELD]

--version: show program’s version number and exit

-h, --help: show this help message and exit

-s <seq_files>: A list of FASTA/FASTQ files containing sequences to process.

--outdir <out_dir>: Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>: Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--delim <delimiter>: A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.

-f <barcode_field>: The name of the barcode field.
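
To make the roles of --delim and -f concrete, the following sketch parses a sequence header with three delimiters, extracts a barcode field, and builds the ALL and DTN histograms of length-normalized Hamming distance. It is an illustration under stated assumptions rather than the tool's implementation: the delimiters `|`, `=` and `,`, the example header, the field name `BARCODE`, the use of unique barcodes only, and every function name are hypothetical.

```python
from collections import Counter
from itertools import combinations


def parse_header(header, delim=("|", "=", ",")):
    """Split 'ID|FIELD=VALUE|FIELD=V1,V2' into (id, {field: [values]}).

    delim holds the block, field/value and within-field value separators.
    """
    blocks = header.split(delim[0])
    fields = {}
    for block in blocks[1:]:
        name, _, value = block.partition(delim[1])
        fields[name] = value.split(delim[2])
    return blocks[0], fields


def normalized_hamming(a, b):
    """Hamming distance divided by sequence length (equal lengths required)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_histograms(barcodes):
    """Return (ALL, DTN) histograms over the unique barcodes.

    ALL counts every pairwise distance; DTN counts, for each barcode,
    the distance to its nearest neighbour.
    """
    unique = sorted(set(barcodes))
    nearest = {bc: None for bc in unique}
    all_hist = Counter()
    for a, b in combinations(unique, 2):
        d = normalized_hamming(a, b)
        all_hist[d] += 1
        for bc in (a, b):
            if nearest[bc] is None or d < nearest[bc]:
                nearest[bc] = d
    dtn_hist = Counter(d for d in nearest.values() if d is not None)
    return all_hist, dtn_hist


if __name__ == "__main__":
    headers = ["SEQ1|BARCODE=AAAA", "SEQ2|BARCODE=AAAT", "SEQ3|BARCODE=TTTT"]
    barcodes = [parse_header(h)[1]["BARCODE"][0] for h in headers]
    all_hist, dtn_hist = distance_histograms(barcodes)
    print("ALL:", dict(all_hist))
    print("DTN:", dict(dtn_hist))
```

The all-pairs loop is quadratic in the number of unique barcodes, which is fine for a sketch but would need subsampling or indexing on large barcode sets.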

EstimateError set

Estimates error statistics within annotation sets.

usage: EstimateError set [--version] [-h] -s SEQ_FILES [SEQ_FILES ...]
                         [--outdir OUT_DIR] [--outname OUT_NAME]
                         [--log LOG_FILE]
                         [--delim DELIMITER DELIMITER DELIMITER]
                         [--nproc NPROC] [-f SET_FIELD] [-n MIN_COUNT]
                         [--mode {freq,qual}] [-q MIN_QUAL] [--freq MIN_FREQ]
                         [--maxdiv MAX_DIVERSITY]

--version: show program’s version number and exit

-h, --help: show this help message and exit

-s <seq_files>: A list of FASTA/FASTQ files containing sequences to process.

--outdir <out_dir>: Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>: Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--log <log_file>: Specify to write verbose logging to a file. May not be specified with multiple input files.

--delim <delimiter>: A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.

--nproc <nproc>: The number of simultaneous computational processes to execute (CPU cores to utilize).

-f <set_field>: The name of the annotation field to group sequences by.

-n <min_count>: The minimum number of sequences needed to consider a set.

--mode {freq,qual}: Specifies which method to use to determine the consensus sequence. The “freq” method will determine the consensus by nucleotide frequency at each position and assign the most common value. The “qual” method will weight values by their quality scores to determine the consensus nucleotide at each position.

-q <min_qual>: Consensus quality score cut-off under which an ambiguous character is assigned.

--freq <min_freq>: Fraction of character occurrences under which an ambiguous character is assigned.

--maxdiv <max_diversity>: Specify to calculate the nucleotide diversity of each read group (average pairwise error rate) and exclude groups that exceed the given diversity threshold.
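
The “freq” consensus, the --freq cut-off and the --maxdiv filter described above can be sketched as follows. This is a simplified illustration, not the tool's actual algorithm: it assumes equal-length reads, uses `N` as the ambiguous character, treats diversity as the mean length-normalized mismatch rate over all read pairs, and uses illustrative threshold values. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequency_consensus(seqs, min_freq=0.6, ambiguous="N"):
    """Most common character per position, or `ambiguous` when its
    fraction of occurrences falls below min_freq."""
    if not seqs:
        raise ValueError("at least one sequence is required")
    if any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("sequences must have equal length")
    consensus = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_freq else ambiguous)
    return "".join(consensus)


def nucleotide_diversity(seqs):
    """Average pairwise mismatch rate across all pairs of reads in a group."""
    pairs = list(combinations(seqs, 2))
    if not pairs or not seqs[0]:
        return 0.0
    rates = [sum(x != y for x, y in zip(a, b)) / len(a) for a, b in pairs]
    return sum(rates) / len(rates)


def filter_groups(groups, min_count=2, max_diversity=0.1):
    """Keep groups with enough reads and diversity at or below the threshold.

    groups maps an annotation value (for example a barcode) to its reads.
    """
    return {
        key: seqs
        for key, seqs in groups.items()
        if len(seqs) >= min_count and nucleotide_diversity(seqs) <= max_diversity
    }


if __name__ == "__main__":
    groups = {
        "AAAA": ["ACGT", "ACGT", "ACGT", "ACGA"],
        "CCCC": ["ACGT", "TGCA", "ACGT"],
        "GGGG": ["ACGT"],
    }
    for key, seqs in filter_groups(groups, max_diversity=0.2).items():
        print(key, frequency_consensus(seqs), round(nucleotide_diversity(seqs), 3))
```

In this sketch a group is dropped before any consensus is built, so a highly diverse read group never contributes mismatches to the error estimate.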