EstimateError

Calculates annotation set error rates

usage: EstimateError [--version] [-h]  ...
--version

show program’s version number and exit

-h, --help

show this help message and exit

output files:
error-position
estimated error by read position.
error-quality
estimated error by the quality score assigned within the input file.
error-nucleotide
estimated error by nucleotide.
error-set
estimated error by annotation set size.
distance-set
pairwise Hamming distances by annotation set.
threshold-set
thresholds from pairwise Hamming distances for annotation sets.
distance-barcode
estimated error by pairwise Hamming distances.
threshold-barcode
thresholds from pairwise Hamming distances for clustering barcodes.
output fields:
POSITION
read position, using zero-based indexing.
Q
Phred quality score.
OBSERVED
observed nucleotide value.
REFERENCE
consensus nucleotide for the barcode read group.
SET_COUNT
barcode read group size.
REPORTED_Q
mean Phred quality score reported within the input file for the given position, quality score, nucleotide or read group.
MISMATCHES
count of observed mismatches from consensus for the given position, quality score, nucleotide or read group.
OBSERVATIONS
total count of observed values for each position, quality score, nucleotide or read group size.
ERROR
estimated error rate.
EMPIRICAL_Q
estimated error rate converted to a Phred quality score.
ALL
histogram (count) of the distribution of all pairwise distances.
DTN
histogram (count) of the distribution of distances to the nearest neighbor.
DISTANCE
length-normalized Hamming distance.
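
The relationship between MISMATCHES, OBSERVATIONS, ERROR and EMPIRICAL_Q can be illustrated with a minimal sketch. The Phred transform Q = -10 * log10(p) is the standard definition; computing ERROR as the plain ratio MISMATCHES / OBSERVATIONS and capping the score at `max_q` when no mismatches are seen are assumptions made for illustration, not necessarily what the tool does. The function names are hypothetical.

```python
import math


def error_rate(mismatches: int, observations: int) -> float:
    """Fraction of observations that disagree with the consensus.

    Assumption: ERROR is the simple ratio MISMATCHES / OBSERVATIONS.
    """
    if observations <= 0:
        raise ValueError("observations must be positive")
    return mismatches / observations


def phred(error: float, max_q: float = 90.0) -> float:
    """Convert an error rate to a Phred score, Q = -10 * log10(p).

    max_q is a hypothetical cap, used here so that a zero error rate
    does not produce an infinite score.
    """
    if error <= 0:
        return max_q
    return min(max_q, -10.0 * math.log10(error))


# 5 mismatches in 5000 observations is an error rate of 0.001,
# which corresponds to a Phred score of about 30.
rate = error_rate(mismatches=5, observations=5000)
q = phred(rate)
```

Comparing a score computed this way against REPORTED_Q shows whether the quality scores in the input file over- or under-state the observed error.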

EstimateError barcode

Calculates pairwise distance metrics of barcode sequences.

usage: EstimateError barcode [--version] [-h] -s SEQ_FILES [SEQ_FILES ...]
                             [--outdir OUT_DIR] [--outname OUT_NAME]
                             [--delim DELIMITER DELIMITER DELIMITER]
                             [-f BARCODE_FIELD]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-s <seq_files>

A list of FASTA/FASTQ files containing sequences to process.

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--delim <delimiter>

A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.

-f <barcode_field>

The name of the barcode field.
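
As a rough illustration of what the barcode subcommand reports, the sketch below extracts a barcode field from sequence headers and builds the ALL and DTN histograms from length-normalized Hamming distances. The delimiter values `|`, `=` and `,`, the header layout, the use of unique barcodes only, and all function names are assumptions for this example; the actual implementation may differ.

```python
from collections import Counter
from itertools import combinations


def parse_annotations(header, delims=("|", "=", ",")):
    """Split a header into {field: value}.

    Assumption: the first block is the sequence ID and every later
    block has the form NAME<field_delim>VALUE.
    """
    block_delim, field_delim, _value_delim = delims
    blocks = header.split(block_delim)
    fields = {"ID": blocks[0]}
    for block in blocks[1:]:
        name, _, value = block.partition(field_delim)
        fields[name] = value
    return fields


def normalized_hamming(a, b):
    """Number of mismatched positions divided by the sequence length."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_histograms(barcodes):
    """Return (ALL, DTN) histograms keyed by normalized distance."""
    unique = sorted(set(barcodes))  # assumption: duplicates are collapsed
    all_counts = Counter()
    nearest = {b: None for b in unique}
    for a, b in combinations(unique, 2):
        d = normalized_hamming(a, b)
        all_counts[d] += 1
        # Track the smallest distance seen so far for both barcodes.
        for key in (a, b):
            if nearest[key] is None or d < nearest[key]:
                nearest[key] = d
    dtn_counts = Counter(d for d in nearest.values() if d is not None)
    return all_counts, dtn_counts


headers = ["r1|BARCODE=AAAA", "r2|BARCODE=AAAT",
           "r3|BARCODE=TTTT", "r4|BARCODE=AAAA"]
barcodes = [parse_annotations(h)["BARCODE"] for h in headers]
all_hist, dtn_hist = distance_histograms(barcodes)
```

In this toy input the three unique barcodes give pairwise distances of 0.25, 0.75 and 1.0, and the distance-to-nearest histogram contains 0.25 twice and 0.75 once.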

EstimateError set

Estimates error statistics within annotation sets.

usage: EstimateError set [--version] [-h] -s SEQ_FILES [SEQ_FILES ...]
                         [--outdir OUT_DIR] [--outname OUT_NAME]
                         [--log LOG_FILE]
                         [--delim DELIMITER DELIMITER DELIMITER]
                         [--nproc NPROC] [-f SET_FIELD] [-n MIN_COUNT]
                         [--mode {freq,qual}] [-q MIN_QUAL] [--freq MIN_FREQ]
                         [--maxdiv MAX_DIVERSITY]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-s <seq_files>

A list of FASTA/FASTQ files containing sequences to process.

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--log <log_file>

Specify to write verbose logging to a file. May not be specified with multiple input files.

--delim <delimiter>

A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.

--nproc <nproc>

The number of simultaneous computational processes to execute (CPU cores to utilize).

-f <set_field>

The name of the annotation field to group sequences by.

-n <min_count>

The minimum number of sequences needed to consider a set.

--mode {freq,qual}

Specifies which method to use to determine the consensus sequence. The “freq” method will determine the consensus by nucleotide frequency at each position and assign the most common value. The “qual” method will weight values by their quality scores to determine the consensus nucleotide at each position.

-q <min_qual>

Consensus quality score cut-off under which an ambiguous character is assigned.

--freq <min_freq>

Fraction of character occurrences under which an ambiguous character is assigned.

--maxdiv <max_diversity>

Specify to calculate the nucleotide diversity of each read group (average pairwise error rate) and exclude groups which exceed the given diversity threshold.
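
The "freq" consensus mode, the --freq cut-off and the --maxdiv diversity filter described above can be sketched as follows. This is an illustration under stated assumptions rather than the tool's implementation: sequences in a set are assumed to be of equal length, `N` is assumed to be the ambiguous character, the default `min_freq` of 0.6 is a placeholder, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequency_consensus(seqs, min_freq=0.6, ambiguous="N"):
    """Most common nucleotide per position, or `ambiguous` when the
    most common value falls below min_freq."""
    consensus = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_freq else ambiguous)
    return "".join(consensus)


def nucleotide_diversity(seqs):
    """Average pairwise error rate (mean normalized Hamming distance)."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    total = sum(sum(x != y for x, y in zip(a, b)) / len(a) for a, b in pairs)
    return total / len(pairs)


def count_mismatches(seqs, consensus, ambiguous="N"):
    """Count (mismatches, observations) against the consensus,
    skipping positions where the consensus is ambiguous."""
    mismatches = observations = 0
    for seq in seqs:
        for obs, ref in zip(seq, consensus):
            if ref == ambiguous:
                continue
            observations += 1
            mismatches += obs != ref
    return mismatches, observations


read_group = ["ACGT", "ACGT", "ACGA", "ACGT"]
max_diversity = 0.2  # hypothetical threshold for this example

diversity = nucleotide_diversity(read_group)
if diversity <= max_diversity:
    consensus = frequency_consensus(read_group)
    mismatches, observations = count_mismatches(read_group, consensus)
```

Here one read differs from the others at the last position, so the set has a diversity of 0.125, passes the example threshold, and contributes one mismatch out of 16 observations.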