EstimateError

Calculates annotation set error rates

usage: EstimateError [--version] [-h] -s SEQ_FILES [SEQ_FILES ...]
                         [--log LOG_FILE] [--delim DELIMITER DELIMITER DELIMITER]
                         [--nproc NPROC] [--outdir OUT_DIR] [--outname OUT_NAME]
                         [-f SET_FIELD] [-n MIN_COUNT] [--mode {freq,qual}]
                         [-q MIN_QUAL] [--freq MIN_FREQ] [--maxdiv MAX_DIVERSITY]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-s <seq_files>

A list of FASTA/FASTQ files containing sequences to process.

--log <log_file>

Specify to write verbose logging to a file. May not be specified with multiple input files.

--delim <delimiter>

A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.
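
As an illustration of how the three delimiters divide a sequence header, here is a minimal sketch. The delimiter values ("|", "=", ",") and the header layout (a sequence ID followed by annotation blocks) are assumptions made for this example, and parse_header is a hypothetical helper, not part of the tool.

```python
def parse_header(header, delims=("|", "=", ",")):
    """Split a header into a dict using three delimiters (assumed values).

    delims[0] separates annotation blocks, delims[1] separates a field
    name from its value, and delims[2] separates values within a field.
    """
    blocks = header.split(delims[0])
    # Assumption: the first block is the bare sequence identifier.
    fields = {"ID": blocks[0]}
    for block in blocks[1:]:
        name, _, value = block.partition(delims[1])
        fields[name] = value.split(delims[2])
    return fields


print(parse_header("SEQ1|BARCODE=ACGTACGT|SAMPLE=A,B"))
# {'ID': 'SEQ1', 'BARCODE': ['ACGTACGT'], 'SAMPLE': ['A', 'B']}
```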

--nproc <nproc>

The number of simultaneous computational processes to execute (CPU cores to utilize).

--outdir <out_dir>

Specify to change the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

-f <set_field>

The name of the annotation field to group sequences by.

-n <min_count>

The minimum number of sequences needed to consider a set.

--mode {freq,qual}

Specifies which method to use to determine the consensus sequence. The “freq” method will determine the consensus by nucleotide frequency at each position and assign the most common value. The “qual” method will weight values by their quality scores to determine the consensus nucleotide at each position.

-q <min_qual>

Consensus quality score cut-off under which an ambiguous character is assigned.

--freq <min_freq>

Fraction of character occurrences under which an ambiguous character is assigned.
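
The "freq" consensus and the --freq cut-off can be sketched as follows. This is a simplified illustration, not the tool's implementation: it assumes all sequences in a set have equal length, uses "N" as the ambiguous character, and freq_consensus is a hypothetical name.

```python
from collections import Counter


def freq_consensus(seqs, min_freq=0.6, ambiguous="N"):
    """Return the most common character per position.

    A position is assigned the ambiguous character when the most common
    character occurs in less than min_freq of the sequences.
    Assumption: all sequences have the same length.
    """
    consensus = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_freq else ambiguous)
    return "".join(consensus)


reads = ["ACGT", "ACGA", "ACCA"]
print(freq_consensus(reads, min_freq=0.6))  # ACGA
print(freq_consensus(reads, min_freq=0.7))  # ACNN
```

Raising the cut-off from 0.6 to 0.7 turns the last two positions ambiguous, because the most common character there appears in only two of the three reads.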

--maxdiv <max_diversity>

Specify to calculate the nucleotide diversity of each read group (average pairwise error rate) and exclude groups which exceed the given diversity threshold.
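
To make "average pairwise error rate" concrete, here is a minimal sketch of a diversity filter. It assumes equal-length reads and counts mismatches position by position. The function names and the example threshold are hypothetical, and the tool's own calculation may treat gaps or ambiguous characters differently.

```python
from itertools import combinations


def pairwise_error(a, b):
    """Fraction of positions at which two equal-length reads differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def diversity(seqs):
    """Mean pairwise error rate over all unordered pairs of reads."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0  # a single read has no pairs to compare
    return sum(pairwise_error(a, b) for a, b in pairs) / len(pairs)


groups = {
    "BC1": ["ACGT", "ACGT", "ACGA"],
    "BC2": ["ACGT", "ACGT", "ACGT"],
}
max_diversity = 0.1  # example threshold, as passed to --maxdiv
kept = {name for name, reads in groups.items() if diversity(reads) <= max_diversity}
print(round(diversity(groups["BC1"]), 3))  # 0.167
print(kept)  # {'BC2'}
```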

output files:
error-position
estimated error by read position.
error-quality
estimated error by the quality score assigned within the input file.
error-nucleotide
estimated error by nucleotide.
error-set
estimated error by barcode read group size.
output fields:
POSITION
zero-based read position.
Q
Phred quality score.
OBSERVED
observed nucleotide value.
REFERENCE
consensus nucleotide for the barcode read group.
SET_COUNT
barcode read group size.
REPORTED_Q
mean Phred quality score reported within the input file for the given position, quality score, nucleotide or read group.
MISMATCHES
count of observed mismatches from consensus for the given position, quality score, nucleotide or read group.
OBSERVATIONS
total count of observed values for each position, quality score, nucleotide or read group size.
ERROR
estimated error rate.
EMPIRICAL_Q
estimated error rate converted to a Phred quality score.
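
The relationship between the ERROR and EMPIRICAL_Q fields can be sketched with the standard Phred formula, Q = -10 * log10(error). The sketch assumes ERROR is simply MISMATCHES divided by OBSERVATIONS, which the field descriptions suggest but do not state outright. The function names are hypothetical.

```python
import math


def error_rate(mismatches, observations):
    """Assumed definition: mismatches divided by total observations."""
    return mismatches / observations


def empirical_q(error):
    """Convert an error rate to a Phred quality score."""
    if error <= 0:
        # Assumption: no observed errors maps to an unbounded score.
        return math.inf
    return -10 * math.log10(error)


err = error_rate(10, 1000)
print(err)                         # 0.01
print(round(empirical_q(err), 1))  # 20.0
```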