Cluster sequences by group

usage: ClusterSets [--version] [-h] -s SEQ_FILES [SEQ_FILES ...] [--fasta]
                       [--failed] [--log LOG_FILE]
                       [--delim DELIMITER DELIMITER DELIMITER] [--nproc NPROC]
                       [--outdir OUT_DIR] [--outname OUT_NAME] [-f BARCODE_FIELD]
                       [-k CLUSTER_FIELD] [--id IDENT] [--start SEQ_START]
                       [--end SEQ_END] [--exec CLUSTER_EXEC]

--version

show program's version number and exit

-h, --help

show this help message and exit

-s <seq_files>

A list of FASTA/FASTQ files containing sequences to process.

--fasta

Specify to force output as FASTA rather than FASTQ.

--failed

If specified, create files containing records that fail processing.

--log <log_file>

Specify to write verbose logging to a file. May not be specified with multiple input files.

--delim <delimiter>

A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.
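To illustrate how the three delimiters divide a sequence description, here is a minimal sketch. The function `parse_header`, the example header, and the delimiter values `|`, `=` and `,` are assumptions chosen for illustration; they are not taken from this page or from the tool's own code.

```python
def parse_header(header, delimiter=("|", "=", ",")):
    """Split a sequence description into an ID and its annotation fields.

    delimiter holds three separators, in order:
    between annotation blocks, between a field name and its value,
    and between multiple values within one field.
    """
    block_delim, field_delim, value_delim = delimiter
    blocks = header.split(block_delim)
    # Assumption: the first block is the bare sequence identifier.
    annotations = {"ID": blocks[0]}
    for block in blocks[1:]:
        name, _, value = block.partition(field_delim)
        annotations[name] = value.split(value_delim)
    return annotations


# Hypothetical header with two annotation fields.
fields = parse_header("SEQ1|BARCODE=ACGTACGT|PRIMER=IGHC,IGHV")
print(fields)
```

With these assumed delimiters, `BARCODE` holds a single value and `PRIMER` holds two.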

--nproc <nproc>

The number of simultaneous computational processes to execute (CPU cores to utilize).

--outdir <out_dir>

Specify to change the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

-f <barcode_field>

The annotation field containing annotations, such as a UID barcode, for sequence grouping.

-k <cluster_field>

The name of the output annotation field to add with the cluster information for each sequence.

--id <ident>

The sequence identity threshold for the uclust algorithm.
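The interaction of the grouping field, the output cluster field and the identity threshold can be pictured with a toy example. The sketch below is not the uclust algorithm: it swaps in a simple greedy centroid scheme with position-by-position identity and no alignment, purely to show how reads are first grouped by barcode and then numbered by cluster within each group. All function names, the 1-based numbering and the sample reads are assumptions.

```python
from collections import defaultdict


def identity(a, b):
    """Fraction of matching positions over the shorter sequence (no alignment)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def cluster_group(seqs, ident=0.9):
    """Greedy centroid clustering; returns one 1-based cluster number per sequence."""
    centroids = []
    labels = []
    for seq in seqs:
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= ident:
                labels.append(i + 1)
                break
        else:
            # No centroid is similar enough, so this read starts a new cluster.
            centroids.append(seq)
            labels.append(len(centroids))
    return labels


def cluster_sets(records, ident=0.9):
    """records: list of (barcode, sequence). Returns (barcode, sequence, cluster) tuples."""
    groups = defaultdict(list)
    for idx, (barcode, _) in enumerate(records):
        groups[barcode].append(idx)
    result = [None] * len(records)
    for barcode, idxs in groups.items():
        labels = cluster_group([records[i][1] for i in idxs], ident)
        for i, label in zip(idxs, labels):
            result[i] = (barcode, records[i][1], label)
    return result


records = [
    ("AAAA", "ACGTACGTAC"),
    ("AAAA", "ACGTACGTAT"),  # 9 of 10 positions match the first read
    ("AAAA", "TTTTGGGGCC"),
    ("CCCC", "ACGTACGTAC"),
]
for row in cluster_sets(records, ident=0.9):
    print(row)
```

Cluster numbers restart in every barcode group, which is why the last read is numbered 1 even though its barcode differs from the others.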

--start <seq_start>

The start of the region to be used for clustering. Together with --end, this parameter can be used to specify a subsequence of each read to use in the clustering algorithm.

--end <seq_end>

The end of the region to be used for clustering.
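As a rough picture of how a start and end position select the region used for clustering, the following sketch slices a read. It assumes Python slice semantics (0-based start, exclusive end), which this page does not state; the helper `clustering_region` is hypothetical.

```python
def clustering_region(sequence, seq_start=0, seq_end=None):
    """Return the part of a read that would be passed to the clustering step.

    Assumption: positions follow Python slicing, so seq_start is 0-based
    and seq_end is exclusive; None means "to the end of the read".
    """
    return sequence[seq_start:seq_end]


read = "ACGTACGTTTGGCCAA"
print(clustering_region(read, seq_start=4, seq_end=12))
```

Leaving both arguments at their defaults returns the whole read unchanged.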

--exec <cluster_exec>

The name or location of the usearch or vsearch executable.
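The options above can be combined into a single invocation. The sketch below only assembles and prints a command line; it does not run anything. The file name `reads.fastq`, the field names `BARCODE` and `CLUSTER`, the threshold `0.9` and the executable name `vsearch` are placeholder values, and the program name is copied from the usage line, so the script name on a given installation may differ.

```python
import shlex


def build_command(seq_file, barcode_field, cluster_field, ident, cluster_exec, nproc=1):
    """Assemble an argument list using only flags listed in the usage text."""
    return [
        "ClusterSets",
        "-s", seq_file,
        "-f", barcode_field,
        "-k", cluster_field,
        "--id", str(ident),
        "--exec", cluster_exec,
        "--nproc", str(nproc),
        "--failed",  # also write records that fail processing
    ]


# Placeholder values for illustration only.
cmd = build_command("reads.fastq", "BARCODE", "CLUSTER", 0.9, "vsearch", nproc=4)
print(shlex.join(cmd))
```

Keeping the arguments as a list means the same value can be handed to `subprocess.run` without shell quoting concerns.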

output files:
clustered reads.
raw reads failing clustering.

output annotation fields:
a numeric cluster identifier defining the within-group cluster.