SplitSeq

Sorts, samples and splits FASTA/FASTQ sequence files

usage: SplitSeq [--version] [-h]  ...
--version

show program’s version number and exit

-h, --help

show this help message and exit

output files:
part<part>
reads partitioned by count, where <part> is the partition number.
<field>-<value>
reads partitioned by annotation <field> and <value>.
under-<number>
reads partitioned by numeric threshold where the annotation value is strictly less than the threshold <number>.
atleast-<number>
reads partitioned by numeric threshold where the annotation value is greater than or equal to the threshold <number>.
sorted
reads sorted by annotation value.
sorted-part<part>
reads sorted by annotation value and partitioned by count, where <part> is the partition number.
sample<i>-n<count>
randomly sampled reads where <i> is a number specifying the sampling instance and <count> is the number of sampled reads.
selected
reads passing selection criteria.
output annotation fields:
None

SplitSeq count

Splits sequence files by number of records.

usage: SplitSeq count [--version] [-h] -s SEQ_FILES [SEQ_FILES ...] [--fasta]
                          [--outdir OUT_DIR] [--outname OUT_NAME] -n MAX_COUNT
--version

show program’s version number and exit

-h, --help

show this help message and exit

-s <seq_files>

A list of FASTA/FASTQ files containing sequences to process.

--fasta

Specify to force output as FASTA rather than FASTQ.

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

-n <max_count>

Maximum number of sequences in each new file.
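
The partitioning controlled by -n can be pictured with a short Python sketch. This is only an illustration of the idea, not SplitSeq's own code: the function name partition_by_count is hypothetical, and numbering the parts from 1 is an assumption.

```python
from itertools import islice


def partition_by_count(records, max_count):
    """Yield (part_number, chunk) pairs holding at most max_count records each."""
    if max_count < 1:
        raise ValueError("max_count must be a positive integer")
    iterator = iter(records)
    part = 1  # assumption: partitions are numbered starting from 1
    while True:
        chunk = list(islice(iterator, max_count))
        if not chunk:
            break
        yield part, chunk
        part += 1


# Seven hypothetical read identifiers split with a limit of three per part.
reads = ["read%d" % i for i in range(1, 8)]
parts = dict(partition_by_count(reads, 3))
```

Each key of parts would correspond to one part<part> output file; only the last partition may hold fewer than max_count records.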

SplitSeq group

Splits sequence files by annotation.

usage: SplitSeq group [--version] [-h] -s SEQ_FILES [SEQ_FILES ...] [--fasta]
                          [--delim DELIMITER DELIMITER DELIMITER]
                          [--outdir OUT_DIR] [--outname OUT_NAME] -f FIELD
                          [--num THRESHOLD]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-s <seq_files>

A list of FASTA/FASTQ files containing sequences to process.

--fasta

Specify to force output as FASTA rather than FASTQ.

--delim <delimiter>

A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

-f <field>

The annotation field to split sequence files by.

--num <threshold>

Defines the split field as numeric and partitions sequences by whether the annotation value is strictly less than, or greater than or equal to, the threshold specified.
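
As a rough sketch of how -f, --num and --delim interact, the Python below parses a header into fields and derives the output label a record would fall under. It is an illustration, not the tool's implementation. The default delimiters ("|", "=", ","), the header layout ID|FIELD=VALUE, the field names SAMPLE and DUPCOUNT, and the function names are all assumptions made for the example.

```python
def parse_annotation(header, delimiter=("|", "=", ",")):
    """Split a header into a {field: value} dict.

    Assumption: blocks are separated by delimiter[0], the first block is the
    sequence ID, and each later block is FIELD<delimiter[1]>VALUE.
    """
    blocks = header.split(delimiter[0])
    fields = {"ID": blocks[0]}
    for block in blocks[1:]:
        name, _, value = block.partition(delimiter[1])
        fields[name] = value
    return fields


def group_label(header, field, threshold=None):
    """Return the output label for one record."""
    value = parse_annotation(header)[field]
    if threshold is None:
        # Without --num: one group per distinct annotation value.
        return "%s-%s" % (field, value)
    # With --num: strictly below the threshold versus at or above it.
    if float(value) < threshold:
        return "under-%s" % threshold
    return "atleast-%s" % threshold


label_text = group_label("SEQ1|SAMPLE=A|DUPCOUNT=1", "SAMPLE")
label_low = group_label("SEQ1|SAMPLE=A|DUPCOUNT=1", "DUPCOUNT", threshold=2)
label_high = group_label("SEQ2|SAMPLE=B|DUPCOUNT=2", "DUPCOUNT", threshold=2)
```

A value exactly equal to the threshold lands in the atleast-<number> group, matching the output file descriptions above.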

SplitSeq sample

Randomly samples from unpaired sequence files.

usage: SplitSeq sample [--version] [-h] -s SEQ_FILES [SEQ_FILES ...] [--fasta]
                           [--delim DELIMITER DELIMITER DELIMITER]
                           [--outdir OUT_DIR] [--outname OUT_NAME] -n MAX_COUNT
                           [MAX_COUNT ...] [-f FIELD] [-u VALUES [VALUES ...]]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-s <seq_files>

A list of FASTA/FASTQ files containing sequences to process.

--fasta

Specify to force output as FASTA rather than FASTQ.

--delim <delimiter>

A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

-n <max_count>

Maximum number of sequences to sample from each file, field or annotation set. The default behavior, without the -f argument, is to sample from the complete set of sequences in the input file.

-f <field>

The annotation field for sampling criteria. If the -u argument is not also specified, then sampling will be performed for each unique annotation value in the declared field separately.

-u <values>

If specified, sampling will be restricted to sequences that contain one of the declared annotation values in the specified field. Requires the -f argument.
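
The three sampling modes described by -n, -f and -u can be sketched in Python as follows. This is one reading of the option text, not the tool's implementation: treating the -u case as a single pooled sample is an interpretation, and the header layout, the default delimiters, the pool labels "all" and "restricted", and the function names are assumptions.

```python
import random
from collections import defaultdict


def field_value(header, field, delimiter=("|", "=", ",")):
    """Return the value of field in header, or None if absent (assumed layout)."""
    for block in header.split(delimiter[0])[1:]:
        name, _, value = block.partition(delimiter[1])
        if name == field:
            return value
    return None


def sample_reads(headers, max_count, field=None, values=None, seed=None):
    """Sample up to max_count headers from each pool and return {pool: sample}."""
    rng = random.Random(seed)
    pools = defaultdict(list)
    for header in headers:
        if field is None:
            pools["all"].append(header)         # no -f: one pool per input
            continue
        value = field_value(header, field)
        if value is None:
            continue                            # record lacks the field
        if values is None:
            pools[value].append(header)         # -f only: one pool per value
        elif value in values:
            pools["restricted"].append(header)  # -f with -u: pooled matches
    return {key: rng.sample(pool, min(max_count, len(pool)))
            for key, pool in pools.items()}


headers = ["R1|SAMPLE=A", "R2|SAMPLE=A", "R3|SAMPLE=A", "R4|SAMPLE=B"]
whole = sample_reads(headers, 2, seed=1)
per_value = sample_reads(headers, 2, field="SAMPLE", seed=1)
restricted = sample_reads(headers, 2, field="SAMPLE", values={"B"}, seed=1)
```

Because -n is a maximum, a pool smaller than the requested count is returned whole, as with the single SAMPLE=B record here.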

SplitSeq samplepair

Randomly samples from paired-end sequence files.

usage: SplitSeq samplepair [--version] [-h] -1 SEQ_FILES_1 [SEQ_FILES_1 ...]
                               -2 SEQ_FILES_2 [SEQ_FILES_2 ...] [--fasta]
                               [--delim DELIMITER DELIMITER DELIMITER]
                               [--outdir OUT_DIR] [--outname OUT_NAME] -n
                               MAX_COUNT [MAX_COUNT ...] [-f FIELD]
                               [-u VALUES [VALUES ...]]
                               [--coord {illumina,solexa,sra,454,presto}]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-1 <seq_files_1>

An ordered list of FASTA/FASTQ files containing head/primary sequences.

-2 <seq_files_2>

An ordered list of FASTA/FASTQ files containing tail/secondary sequences.

--fasta

Specify to force output as FASTA rather than FASTQ.

--delim <delimiter>

A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

-n <max_count>

Maximum number of paired sequences to sample from each set of files, fields or annotations. The default behavior, without the -f argument, is to sample from the complete set of paired sequences in the input files.

-f <field>

The annotation field for sampling criteria. If the -u argument is not also specified, then sampling will be performed for each unique annotation value in the declared field separately.

-u <values>

If specified, sampling will be restricted to sequences that contain one of the declared annotation values in the specified field. Requires the -f argument.

--coord {illumina,solexa,sra,454,presto}

The format of the sequence identifier which defines shared coordinate information across paired read files.
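
Paired sampling depends on matching reads across the two files by their shared coordinate. The Python sketch below illustrates that idea under stated assumptions and is not the tool's implementation: it assumes Illumina-style headers in which the shared coordinate is the text before the first space, and the function names coord_key and sample_pairs are hypothetical.

```python
import random


def coord_key(header):
    """Return the coordinate shared by both mates.

    Assumption: Illumina-style header such as "M1:1:FC:1:1101:1000:2000 1:N:0:1",
    where everything before the first space is common to both reads.
    """
    return header.split(" ")[0]


def sample_pairs(headers_1, headers_2, max_count, seed=None):
    """Sample up to max_count pairs whose coordinate appears in both files."""
    index_1 = {coord_key(h): h for h in headers_1}
    index_2 = {coord_key(h): h for h in headers_2}
    shared = sorted(set(index_1) & set(index_2))  # only complete pairs
    rng = random.Random(seed)
    chosen = rng.sample(shared, min(max_count, len(shared)))
    return [(index_1[key], index_2[key]) for key in chosen]


head = ["M1:1:FC:1:1101:1:1 1:N:0:1", "M1:1:FC:1:1101:2:2 1:N:0:1",
        "M1:1:FC:1:1101:3:3 1:N:0:1"]
tail = ["M1:1:FC:1:1101:2:2 2:N:0:1", "M1:1:FC:1:1101:3:3 2:N:0:1",
        "M1:1:FC:1:1101:4:4 2:N:0:1"]
pairs = sample_pairs(head, tail, 5, seed=7)
```

Sampling the shared coordinates, rather than sampling each file independently, keeps every selected head read together with its mate.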

SplitSeq select

Selects sequences from sequence files by annotation.

usage: SplitSeq select [--version] [-h] -s SEQ_FILES [SEQ_FILES ...] [--fasta]
                           [--delim DELIMITER DELIMITER DELIMITER]
                           [--outdir OUT_DIR] [--outname OUT_NAME] -f FIELD
                           [-u VALUE_LIST [VALUE_LIST ...] | -t VALUE_FILE]
                           [--not]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-s <seq_files>

A list of FASTA/FASTQ files containing sequences to process.

--fasta

Specify to force output as FASTA rather than FASTQ.

--delim <delimiter>

A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

-f <field>

The annotation field for selection criteria.

-u <value_list>

A list of values to select for in the specified field. Mutually exclusive with -t.

-t <value_file>

A tab delimited file specifying values to select for in the specified field. The file must be formatted with the given field name in the header row. Values will be taken from that column. Mutually exclusive with -u.

--not

If specified, performs negative matching: sequences are selected only if they match none of the specified values.
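
The relationship between -u, -t and --not can be sketched in Python as below. This illustrates the selection logic only and is not the tool's implementation: the header layout, the default delimiters, the choice to skip records that lack the field, and the function names are assumptions.

```python
import csv
import io


def field_values(header, field, delimiter=("|", "=", ",")):
    """Return the list of values stored under field (empty if absent)."""
    for block in header.split(delimiter[0])[1:]:
        name, _, value = block.partition(delimiter[1])
        if name == field:
            return value.split(delimiter[2])
    return []


def read_value_file(handle, field):
    """Collect values from the column named field in a tab-delimited file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[field] for row in reader}


def select_reads(headers, field, values, negate=False):
    """Keep headers matching any value, or none of them when negate is True."""
    selected = []
    for header in headers:
        found = field_values(header, field)
        if not found:
            continue  # assumption: records without the field are skipped
        matched = any(value in values for value in found)
        if matched != negate:
            selected.append(header)
    return selected


# An in-memory stand-in for a -t value file with a SAMPLE column.
value_file = io.StringIO("SAMPLE\tNOTE\nA\tkeep\nC\tkeep\n")
wanted = read_value_file(value_file, "SAMPLE")
headers = ["R1|SAMPLE=A", "R2|SAMPLE=B", "R3|SAMPLE=B,C", "R4|PRIMER=X"]
kept = select_reads(headers, "SAMPLE", wanted)
dropped = select_reads(headers, "SAMPLE", wanted, negate=True)
```

Passing a literal set such as {"A", "C"} to select_reads plays the role of -u, while read_value_file plays the role of -t.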

SplitSeq sort

Sorts sequence files by annotation.

usage: SplitSeq sort [--version] [-h] -s SEQ_FILES [SEQ_FILES ...] [--fasta]
                         [--delim DELIMITER DELIMITER DELIMITER]
                         [--outdir OUT_DIR] [--outname OUT_NAME] -f FIELD
                         [-n MAX_COUNT] [--num]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-s <seq_files>

A list of FASTA/FASTQ files containing sequences to process.

--fasta

Specify to force output as FASTA rather than FASTQ.

--delim <delimiter>

A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

-f <field>

The annotation field to sort sequences by.

-n <max_count>

Maximum number of sequences in each new file.

--num

Specify to define the sort field as numeric rather than textual.
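
The difference that --num makes, and how -n combines with sorting, can be seen in a short Python sketch. It is an illustration rather than the tool's implementation: ascending order, the header layout, the default delimiters, the DUPCOUNT field name, and the function names are assumptions.

```python
def field_value(header, field, delimiter=("|", "=", ",")):
    """Return the value of field in header, or None if absent (assumed layout)."""
    for block in header.split(delimiter[0])[1:]:
        name, _, value = block.partition(delimiter[1])
        if name == field:
            return value
    return None


def sort_reads(headers, field, numeric=False, max_count=None):
    """Sort headers by an annotation and optionally partition the result.

    Assumes every record carries the field; ascending order is an assumption.
    """
    def sort_key(header):
        value = field_value(header, field)
        return float(value) if numeric else value

    ordered = sorted(headers, key=sort_key)
    if max_count is None:
        return {"sorted": ordered}
    return {"sorted-part%d" % (start // max_count + 1):
            ordered[start:start + max_count]
            for start in range(0, len(ordered), max_count)}


headers = ["R1|DUPCOUNT=10", "R2|DUPCOUNT=9", "R3|DUPCOUNT=100"]
textual = sort_reads(headers, "DUPCOUNT")
numeric = sort_reads(headers, "DUPCOUNT", numeric=True)
split = sort_reads(headers, "DUPCOUNT", numeric=True, max_count=2)
```

Textual ordering places "100" before "9" because strings are compared character by character, which is the situation --num is meant to avoid for numeric fields.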