FilterSeq.py

Filters sequences in FASTA/FASTQ files

usage: FilterSeq.py [--version] [-h]  ...
--version

show program’s version number and exit

-h, --help

show this help message and exit

output files:
<command>-pass

reads passing filtering operation and modified accordingly, where <command> is the name of the filtering operation that was run.

<command>-fail

raw reads failing filtering criteria, where <command> is the name of the filtering operation.

output annotation fields:

None

FilterSeq.py length

Filters reads by length.

usage: FilterSeq.py length [--version] [-h] -s SEQ_FILES [SEQ_FILES ...]
                           [-o OUT_FILES [OUT_FILES ...]] [--outdir OUT_DIR]
                           [--outname OUT_NAME] [--log LOG_FILE] [--failed]
                           [--fasta] [--nproc NPROC] [-n MIN_LENGTH] [--inner]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-s <seq_files>

A list of FASTA/FASTQ files containing sequences to process.

-o <out_files>

Explicit output file name(s). Note, this argument cannot be used with the --failed, --outdir, or --outname arguments. If unspecified, then the output filename will be based on the input filename(s).

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--log <log_file>

Specify to write verbose logging to a file. May not be specified with multiple input files.

--failed

If specified, create files containing records that fail processing.

--fasta

Specify to force output as FASTA rather than FASTQ.

--nproc <nproc>

The number of simultaneous computational processes to execute (CPU cores to utilize).

-n <min_length>

Minimum sequence length to retain.

--inner

If specified, exclude consecutive missing characters at either end of the sequence.
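The effect of -n and --inner can be illustrated with a minimal sketch. This is not the tool's own code: the function name passes_length, the set of characters treated as missing (N, "-" and "."), and the inclusive comparison against the threshold are all assumptions made for illustration.

```python
# Hypothetical illustration of a length filter; not the FilterSeq.py source.
# Assumption: "missing" characters are N, '-' and '.'.
MISSING_CHARS = "Nn-."


def passes_length(seq, min_length, inner=False):
    """Return True if seq has at least min_length characters.

    With inner=True, runs of missing characters at either end of the
    sequence are ignored before the length is measured.
    """
    if inner:
        seq = seq.strip(MISSING_CHARS)
    # Assumption: a read exactly min_length long is retained.
    return len(seq) >= min_length


if __name__ == "__main__":
    print(passes_length("NNACGTNN", 6))              # full length is 8
    print(passes_length("NNACGTNN", 6, inner=True))  # inner length is 4
```

Under these assumptions, the same read can pass without --inner and fail with it, because leading and trailing missing characters no longer contribute to its length.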

FilterSeq.py maskqual

Masks low quality positions.

usage: FilterSeq.py maskqual [--version] [-h] -s SEQ_FILES [SEQ_FILES ...]
                             [-o OUT_FILES [OUT_FILES ...]] [--outdir OUT_DIR]
                             [--outname OUT_NAME] [--log LOG_FILE] [--failed]
                             [--fasta] [--nproc NPROC] [-q MIN_QUAL]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-s <seq_files>

A list of FASTA/FASTQ files containing sequences to process.

-o <out_files>

Explicit output file name(s). Note, this argument cannot be used with the --failed, --outdir, or --outname arguments. If unspecified, then the output filename will be based on the input filename(s).

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--log <log_file>

Specify to write verbose logging to a file. May not be specified with multiple input files.

--failed

If specified, create files containing records that fail processing.

--fasta

Specify to force output as FASTA rather than FASTQ.

--nproc <nproc>

The number of simultaneous computational processes to execute (CPU cores to utilize).

-q <min_qual>

Quality score threshold.
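As a rough sketch of what masking low quality positions means, the function below replaces each base whose quality score is below the threshold. The name mask_low_quality, the use of N as the mask character, and the strict "below threshold" comparison are assumptions for illustration and are not taken from the tool.

```python
# Hypothetical illustration of quality masking; not the FilterSeq.py source.
def mask_low_quality(seq, quals, min_qual, mask_char="N"):
    """Replace every base whose quality score is below min_qual.

    seq   -- nucleotide string
    quals -- list of integer Phred scores, one per base
    """
    if len(seq) != len(quals):
        raise ValueError("seq and quals must have the same length")
    # Assumption: a score equal to min_qual is kept unmasked.
    return "".join(
        base if q >= min_qual else mask_char
        for base, q in zip(seq, quals)
    )


if __name__ == "__main__":
    print(mask_low_quality("ACGT", [30, 10, 25, 5], 20))
```

The read keeps its original length; only the low quality positions change.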

FilterSeq.py missing

Filters reads by N or gap character count.

usage: FilterSeq.py missing [--version] [-h] -s SEQ_FILES [SEQ_FILES ...]
                            [-o OUT_FILES [OUT_FILES ...]] [--outdir OUT_DIR]
                            [--outname OUT_NAME] [--log LOG_FILE] [--failed]
                            [--fasta] [--nproc NPROC] [-n MAX_MISSING]
                            [--inner]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-s <seq_files>

A list of FASTA/FASTQ files containing sequences to process.

-o <out_files>

Explicit output file name(s). Note, this argument cannot be used with the --failed, --outdir, or --outname arguments. If unspecified, then the output filename will be based on the input filename(s).

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--log <log_file>

Specify to write verbose logging to a file. May not be specified with multiple input files.

--failed

If specified, create files containing records that fail processing.

--fasta

Specify to force output as FASTA rather than FASTQ.

--nproc <nproc>

The number of simultaneous computational processes to execute (CPU cores to utilize).

-n <max_missing>

Threshold for fraction of gap or N nucleotides.

--inner

If specified, exclude consecutive missing characters at either end of the sequence.
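A minimal sketch of a missing-character filter follows. The command summary speaks of a character count while the -n description speaks of a fraction; this sketch compares a raw count, which is an assumption, as are the function names and the choice of N, "-" and "." as missing characters.

```python
# Hypothetical illustration of a missing-character filter; not the
# FilterSeq.py source. Assumption: N, '-' and '.' count as missing.
MISSING_CHARS = "Nn-."


def count_missing(seq, inner=False):
    """Count missing characters, optionally ignoring runs at either end."""
    if inner:
        seq = seq.strip(MISSING_CHARS)
    return sum(1 for char in seq if char in MISSING_CHARS)


def passes_missing(seq, max_missing, inner=False):
    """Return True if the missing count does not exceed max_missing.

    Assumption: max_missing is treated as an absolute count here.
    """
    return count_missing(seq, inner=inner) <= max_missing


if __name__ == "__main__":
    print(count_missing("NNACNGT--"))
    print(count_missing("NNACNGT--", inner=True))
```

With inner set, only missing characters inside the sequence are counted, so reads padded with N or gap characters at the ends are not penalized.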

FilterSeq.py quality

Filters reads by quality score.

usage: FilterSeq.py quality [--version] [-h] -s SEQ_FILES [SEQ_FILES ...]
                            [-o OUT_FILES [OUT_FILES ...]] [--outdir OUT_DIR]
                            [--outname OUT_NAME] [--log LOG_FILE] [--failed]
                            [--fasta] [--nproc NPROC] [-q MIN_QUAL] [--inner]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-s <seq_files>

A list of FASTA/FASTQ files containing sequences to process.

-o <out_files>

Explicit output file name(s). Note, this argument cannot be used with the --failed, --outdir, or --outname arguments. If unspecified, then the output filename will be based on the input filename(s).

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--log <log_file>

Specify to write verbose logging to a file. May not be specified with multiple input files.

--failed

If specified, create files containing records that fail processing.

--fasta

Specify to force output as FASTA rather than FASTQ.

--nproc <nproc>

The number of simultaneous computational processes to execute (CPU cores to utilize).

-q <min_qual>

Quality score threshold.

--inner

If specified, exclude consecutive missing characters at either end of the sequence.
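The text above does not say how the per-read quality is summarized. The sketch below assumes the mean Phred score is compared against the threshold; the function name passes_quality, the missing-character set, and the treatment of empty reads as failures are likewise assumptions.

```python
# Hypothetical illustration of a quality filter; not the FilterSeq.py source.
# Assumption: the read-level score is the arithmetic mean of Phred scores.
MISSING_CHARS = "Nn-."


def passes_quality(seq, quals, min_qual, inner=False):
    """Return True if the mean quality of the read is at least min_qual.

    With inner=True, positions belonging to runs of missing characters
    at either end of seq are left out of the mean.
    """
    if inner:
        start = len(seq) - len(seq.lstrip(MISSING_CHARS))
        end = len(seq.rstrip(MISSING_CHARS))
        quals = quals[start:end]
    if not quals:
        # Assumption: a read with no scored positions fails the filter.
        return False
    return sum(quals) / len(quals) >= min_qual


if __name__ == "__main__":
    print(passes_quality("NACGN", [2, 30, 30, 30, 2], 20))
    print(passes_quality("NACGN", [2, 30, 30, 30, 2], 20, inner=True))
```

In this example the low scores attached to the terminal N characters pull the mean below the threshold unless inner is set.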

FilterSeq.py repeats

Filters reads by consecutive nucleotide repeats.

usage: FilterSeq.py repeats [--version] [-h] -s SEQ_FILES [SEQ_FILES ...]
                            [-o OUT_FILES [OUT_FILES ...]] [--outdir OUT_DIR]
                            [--outname OUT_NAME] [--log LOG_FILE] [--failed]
                            [--fasta] [--nproc NPROC] [-n MAX_REPEAT]
                            [--missing] [--inner]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-s <seq_files>

A list of FASTA/FASTQ files containing sequences to process.

-o <out_files>

Explicit output file name(s). Note, this argument cannot be used with the --failed, --outdir, or --outname arguments. If unspecified, then the output filename will be based on the input filename(s).

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--log <log_file>

Specify to write verbose logging to a file. May not be specified with multiple input files.

--failed

If specified, create files containing records that fail processing.

--fasta

Specify to force output as FASTA rather than FASTQ.

--nproc <nproc>

The number of simultaneous computational processes to execute (CPU cores to utilize).

-n <max_repeat>

Threshold for fraction of repeating nucleotides.

--missing

If specified, count consecutive gap and N characters in addition to {A,C,G,T}.

--inner

If specified, exclude consecutive missing characters at either end of the sequence.
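The interaction of -n, --missing and --inner can be sketched by measuring the longest run of a single repeated character. The function names, the missing-character set, and the comparison of a raw run length (rather than a fraction of the read) against the threshold are assumptions for illustration only.

```python
# Hypothetical illustration of a repeat filter; not the FilterSeq.py source.
from itertools import groupby

# Assumption: N, '-' and '.' count as missing characters.
MISSING_CHARS = "N-."


def longest_repeat(seq, count_missing=False, inner=False):
    """Return the length of the longest run of one repeated character.

    count_missing -- also consider runs of missing characters
    inner         -- ignore missing characters at either end of seq
    """
    seq = seq.upper()
    if inner:
        seq = seq.strip(MISSING_CHARS)
    longest = 0
    for char, run in groupby(seq):
        if char in MISSING_CHARS and not count_missing:
            continue
        longest = max(longest, sum(1 for _ in run))
    return longest


def passes_repeats(seq, max_repeat, count_missing=False, inner=False):
    """Return True if no run is longer than max_repeat (assumed a count)."""
    return longest_repeat(seq, count_missing, inner) <= max_repeat


if __name__ == "__main__":
    print(longest_repeat("ACGGGGTNNNNNN"))
    print(longest_repeat("ACGGGGTNNNNNN", count_missing=True))
```

In this sketch, combining count_missing with inner discards the trailing run of N characters before runs are measured, so only the G homopolymer remains.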

FilterSeq.py trimqual

Trims sequences by quality score decay.

usage: FilterSeq.py trimqual [--version] [-h] -s SEQ_FILES [SEQ_FILES ...]
                             [-o OUT_FILES [OUT_FILES ...]] [--outdir OUT_DIR]
                             [--outname OUT_NAME] [--log LOG_FILE] [--failed]
                             [--fasta] [--nproc NPROC] [-q MIN_QUAL]
                             [--win WINDOW] [--reverse]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-s <seq_files>

A list of FASTA/FASTQ files containing sequences to process.

-o <out_files>

Explicit output file name(s). Note, this argument cannot be used with the --failed, --outdir, or --outname arguments. If unspecified, then the output filename will be based on the input filename(s).

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--log <log_file>

Specify to write verbose logging to a file. May not be specified with multiple input files.

--failed

If specified, create files containing records that fail processing.

--fasta

Specify to force output as FASTA rather than FASTQ.

--nproc <nproc>

The number of simultaneous computational processes to execute (CPU cores to utilize).

-q <min_qual>

Quality score threshold.

--win <window>

Nucleotide window size for moving average calculation.

--reverse

Specify to trim the head of the sequence rather than the tail.
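One way to picture trimming by a moving average of quality is sketched below: the read is scanned in overlapping windows and cut at the start of the first window whose mean score drops below the threshold. The exact cut rule, the function name trim_by_quality, and the default window size are assumptions and may differ from what the tool does.

```python
# Hypothetical illustration of sliding-window quality trimming; not the
# FilterSeq.py source.
def trim_by_quality(seq, quals, min_qual, window=10, reverse=False):
    """Trim a read where the windowed mean quality falls below min_qual.

    seq     -- nucleotide string
    quals   -- list of integer Phred scores, one per base
    reverse -- trim the head of the read instead of the tail
    """
    if not seq:
        return seq, quals
    if reverse:
        # Scan from the tail toward the head by flipping the read.
        seq, quals = seq[::-1], quals[::-1]
    cut = len(seq)
    # Reads shorter than the window are scored as a single window.
    for start in range(max(len(seq) - window + 1, 1)):
        win = quals[start:start + window]
        if sum(win) / len(win) < min_qual:
            # Assumption: cut at the start of the first failing window.
            cut = start
            break
    seq, quals = seq[:cut], quals[:cut]
    if reverse:
        seq, quals = seq[::-1], quals[::-1]
    return seq, quals


if __name__ == "__main__":
    print(trim_by_quality("AAAACCCC", [30] * 5 + [5] * 3, 20, window=2))
```

With reverse set, the same scan runs from the tail toward the head, so low quality bases at the start of the read are removed instead.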