AssemblePairs

Assembles paired-end reads into a single sequence

usage: AssemblePairs [--version] [-h]  ...
--version

show program’s version number and exit

-h, --help

show this help message and exit

output files:
assemble-pass
successfully assembled reads.
assemble-fail
raw reads failing paired-end assembly.
output annotation fields:
<user defined>
annotation fields specified by the --1f or --2f arguments.
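
As a rough illustration of how an invocation might be assembled from the arguments documented below, the following sketch builds an argument list for the align subcommand. The function name `align_command` and the input file names are hypothetical, and the executable name is taken from the usage line above; an installation may expose it under a different name.

```python
import shlex


def align_command(head_file, tail_file, coord="illumina", rc="tail",
                  nproc=1, keep_failed=True):
    """Build an argument list for `AssemblePairs align` (illustrative helper)."""
    cmd = ["AssemblePairs", "align",
           "-1", head_file, "-2", tail_file,
           "--coord", coord, "--rc", rc,
           "--nproc", str(nproc)]
    if keep_failed:
        # --failed requests files containing records that fail processing.
        cmd.append("--failed")
    return cmd


# Hypothetical input files; substitute real paired-end read files.
cmd = align_command("sample_R1.fastq", "sample_R2.fastq", nproc=4)
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the tool is available on the PATH.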

AssemblePairs align

Assemble pairs by aligning ends.

usage: AssemblePairs align [--version] [-h] -1 SEQ_FILES_1 [SEQ_FILES_1 ...]
                               -2 SEQ_FILES_2 [SEQ_FILES_2 ...] [--fasta]
                               [--failed] [--log LOG_FILE]
                               [--delim DELIMITER DELIMITER DELIMITER]
                               [--nproc NPROC] [--outdir OUT_DIR]
                               [--outname OUT_NAME]
                               [--coord {illumina,solexa,sra,454,presto}]
                               [--rc {head,tail,both}]
                               [--1f HEAD_FIELDS [HEAD_FIELDS ...]]
                               [--2f TAIL_FIELDS [TAIL_FIELDS ...]]
                               [--alpha ALPHA] [--maxerror MAX_ERROR]
                               [--minlen MIN_LEN] [--maxlen MAX_LEN] [--scanrev]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-1 <seq_files_1>

An ordered list of FASTA/FASTQ files containing head/primary sequences.

-2 <seq_files_2>

An ordered list of FASTA/FASTQ files containing tail/secondary sequences.

--fasta

Specify to force output as FASTA rather than FASTQ.

--failed

If specified, create files containing records that fail processing.

--log <log_file>

Specify to write verbose logging to a file. May not be specified with multiple input files.

--delim <delimiter>

A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.
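
The role of the three delimiters can be sketched with a small parser. This is a minimal illustration, not the tool's own code: the function name `parse_header`, the example header, and the delimiter values `|`, `=` and `,` used as defaults are assumptions made for the example.

```python
def parse_header(header, delimiter=("|", "=", ",")):
    """Split an annotated header into a dict of fields.

    The three delimiters separate (1) annotation blocks, (2) field names
    from values, and (3) multiple values within a field. The default
    values here are assumed for illustration.
    """
    block_sep, field_sep, value_sep = delimiter
    blocks = header.split(block_sep)
    # The first block is treated as the sequence identifier.
    fields = {"ID": blocks[0]}
    for block in blocks[1:]:
        name, _, value = block.partition(field_sep)
        fields[name] = value.split(value_sep)
    return fields


example = parse_header("SEQ1|PRIMER=IGHV1|BARCODE=ACGT,TTGA")
print(example)
```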

--nproc <nproc>

The number of simultaneous computational processes to execute (CPU cores to utilize).

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--coord {illumina,solexa,sra,454,presto}

The format of the sequence identifier which defines shared coordinate information across paired ends.
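
To make "shared coordinate information" concrete, the sketch below extracts the part of an Illumina-style identifier that both mates have in common. It assumes the layout in which the read number follows the first space; the function name `illumina_coord` and the example headers are hypothetical, and the tool's own parsing of each format may differ.

```python
def illumina_coord(header):
    """Return the portion of an Illumina-style header shared by both mates.

    Assumes the read number (1 or 2) appears after the first space.
    """
    return header.split(" ", 1)[0]


# Hypothetical headers for the two ends of one read pair.
head = "M00001:12:000000000-A1B2C:1:1101:15000:1400 1:N:0:1"
tail = "M00001:12:000000000-A1B2C:1:1101:15000:1400 2:N:0:1"
print(illumina_coord(head) == illumina_coord(tail))
```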

--rc {head,tail,both}

Specify to reverse complement sequences before stitching.

--1f <head_fields>

Specify annotation fields to copy from head records into the assembled record.

--2f <tail_fields>

Specify annotation fields to copy from tail records into the assembled record.

--alpha <alpha>

Significance threshold for de novo paired-end assembly.

--maxerror <max_error>

Maximum allowable error rate for de novo assembly.

--minlen <min_len>

Minimum sequence length to scan for overlap in de novo assembly.

--maxlen <max_len>

Maximum sequence length to scan for overlap in de novo assembly.

--scanrev

If specified, scan past the end of the tail sequence in de novo assembly to allow the head sequence to overhang the end of the tail sequence.
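
How the --alpha, --maxerror, --minlen and --maxlen arguments might interact can be sketched as a simple overlap scan. This is a simplified illustration under stated assumptions, not the tool's implementation: it ignores quality scores and --scanrev, treats a chance match as having probability 0.25, expects the tail to be reverse complemented already, and uses placeholder default values. All function names are hypothetical.

```python
from math import comb


def binom_tail(matches, n, p=0.25):
    """P(X >= matches) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(matches, n + 1))


def scan_overlap(head, tail, alpha=1e-5, max_error=0.3,
                 min_len=8, max_len=1000):
    """Return (overlap_length, error_rate, p_value) of the best overlap, or None."""
    best = None
    upper = min(max_len, len(head), len(tail))
    for n in range(min_len, upper + 1):
        # Compare the last n bases of head with the first n bases of tail.
        matches = sum(x == y for x, y in zip(head[-n:], tail[:n]))
        error = 1 - matches / n
        p_value = binom_tail(matches, n)
        # Keep overlaps that pass both thresholds; prefer the lowest p-value.
        if error <= max_error and p_value <= alpha:
            if best is None or p_value < best[2]:
                best = (n, error, p_value)
    return best


def assemble(head, tail, **kwargs):
    """Merge head and tail at the best overlap, or return None on failure."""
    hit = scan_overlap(head, tail, **kwargs)
    if hit is None:
        return None
    return head + tail[hit[0]:]


head = "AAAACCCCGGGGTTTTACGTACGTACGT"
tail = "ACGTACGTACGTTTGGCCAA"
print(scan_overlap(head, tail))
print(assemble(head, tail))
```

In this sketch a short perfect overlap can still be rejected: with the placeholder alpha, eight matching bases are not significant enough, so the 12-base overlap is the one accepted.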

AssemblePairs join

Assemble pairs by concatenating ends.

usage: AssemblePairs join [--version] [-h] -1 SEQ_FILES_1 [SEQ_FILES_1 ...] -2
                              SEQ_FILES_2 [SEQ_FILES_2 ...] [--fasta] [--failed]
                              [--log LOG_FILE]
                              [--delim DELIMITER DELIMITER DELIMITER]
                              [--nproc NPROC] [--outdir OUT_DIR]
                              [--outname OUT_NAME]
                              [--coord {illumina,solexa,sra,454,presto}]
                              [--rc {head,tail,both}]
                              [--1f HEAD_FIELDS [HEAD_FIELDS ...]]
                              [--2f TAIL_FIELDS [TAIL_FIELDS ...]] [--gap GAP]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-1 <seq_files_1>

An ordered list of FASTA/FASTQ files containing head/primary sequences.

-2 <seq_files_2>

An ordered list of FASTA/FASTQ files containing tail/secondary sequences.

--fasta

Specify to force output as FASTA rather than FASTQ.

--failed

If specified, create files containing records that fail processing.

--log <log_file>

Specify to write verbose logging to a file. May not be specified with multiple input files.

--delim <delimiter>

A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.

--nproc <nproc>

The number of simultaneous computational processes to execute (CPU cores to utilize).

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--coord {illumina,solexa,sra,454,presto}

The format of the sequence identifier which defines shared coordinate information across paired ends.

--rc {head,tail,both}

Specify to reverse complement sequences before stitching.

--1f <head_fields>

Specify annotation fields to copy from head records into the assembled record.

--2f <tail_fields>

Specify annotation fields to copy from tail records into the assembled record.

--gap <gap>

Number of N characters to place between ends.
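
The effect of --rc and --gap on a joined pair can be sketched as follows. This is a minimal illustration that handles sequences only (FASTQ quality scores are ignored); the function names `reverse_complement` and `join_pair` are hypothetical.

```python
# Translation table for the four bases plus N, in both cases.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def join_pair(head, tail, gap=0, rc=None):
    """Concatenate head and tail with `gap` N characters between them.

    rc mirrors the documented choices: "head", "tail", "both", or None.
    """
    if rc in ("head", "both"):
        head = reverse_complement(head)
    if rc in ("tail", "both"):
        tail = reverse_complement(tail)
    return head + "N" * gap + tail


print(join_pair("ACGTAC", "GGTTAA", gap=3, rc="tail"))  # ACGTACNNNTTAACC
```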

AssemblePairs reference

Assemble pairs by aligning reads against a reference database.

usage: AssemblePairs reference [--version] [-h] -1 SEQ_FILES_1
                                   [SEQ_FILES_1 ...] -2 SEQ_FILES_2
                                   [SEQ_FILES_2 ...] [--fasta] [--failed]
                                   [--log LOG_FILE]
                                   [--delim DELIMITER DELIMITER DELIMITER]
                                   [--nproc NPROC] [--outdir OUT_DIR]
                                   [--outname OUT_NAME]
                                   [--coord {illumina,solexa,sra,454,presto}]
                                   [--rc {head,tail,both}]
                                   [--1f HEAD_FIELDS [HEAD_FIELDS ...]]
                                   [--2f TAIL_FIELDS [TAIL_FIELDS ...]] -r
                                   REF_FILE [--minident MIN_IDENT]
                                   [--evalue EVALUE] [--maxhits MAX_HITS] [--fill]
                                   [--aligner {blastn,usearch}]
                                   [--exec ALIGNER_EXEC] [--dbexec DB_EXEC]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-1 <seq_files_1>

An ordered list of FASTA/FASTQ files containing head/primary sequences.

-2 <seq_files_2>

An ordered list of FASTA/FASTQ files containing tail/secondary sequences.

--fasta

Specify to force output as FASTA rather than FASTQ.

--failed

If specified, create files containing records that fail processing.

--log <log_file>

Specify to write verbose logging to a file. May not be specified with multiple input files.

--delim <delimiter>

A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.

--nproc <nproc>

The number of simultaneous computational processes to execute (CPU cores to utilize).

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--coord {illumina,solexa,sra,454,presto}

The format of the sequence identifier which defines shared coordinate information across paired ends.

--rc {head,tail,both}

Specify to reverse complement sequences before stitching.

--1f <head_fields>

Specify annotation fields to copy from head records into the assembled record.

--2f <tail_fields>

Specify annotation fields to copy from tail records into the assembled record.

-r <ref_file>

A FASTA file containing the reference sequence database.

--minident <min_ident>

Minimum identity of the assembled sequence required to call a valid reference guided assembly (between 0 and 1).

--evalue <evalue>

E-value threshold for the reference alignment of both the head and tail sequences.

--maxhits <max_hits>

Maximum number of hits from the reference alignment to check for matching head and tail sequence assignments.

--fill

Specify to change the behavior of inserted characters when the head and tail sequences do not overlap during reference guided assembly. If specified, this will result in insertion of the V region reference sequence instead of a sequence of Ns in the non-overlapping region. Warning: you could end up making chimeric sequences by using this option.

--aligner {blastn,usearch}

The local alignment tool to use. Must be one of blastn (blast+ nucleotide) or usearch (ublast algorithm).

--exec <aligner_exec>

The name or location of the aligner executable file (blastn or usearch). Defaults to the name specified by the --aligner argument.

--dbexec <db_exec>

The name or location of the executable file that builds the reference database. This defaults to makeblastdb when blastn is specified to the --aligner argument, and usearch when usearch is specified.
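
The difference that --fill makes for non-overlapping reads can be sketched as below. This is an illustration under simplifying assumptions: both reads are taken to align to the same reference without gaps, and the 0-based start positions are supplied by the caller (in practice they would come from the aligner). The function name `stitch_on_reference` and the example sequences are hypothetical.

```python
def stitch_on_reference(head, head_start, tail, tail_start, reference,
                        fill=False):
    """Combine two non-overlapping reads placed on a reference sequence.

    head_start and tail_start are 0-based reference positions of the
    first base of each read (hypothetical inputs for this sketch).
    """
    head_end = head_start + len(head)
    if tail_start < head_end:
        raise ValueError("reads overlap; this sketch only covers the gap case")
    gap_len = tail_start - head_end
    # With fill, copy the reference bases; otherwise pad with Ns.
    filler = reference[head_end:tail_start] if fill else "N" * gap_len
    return head + filler + tail


reference = "AAAACCCCGGGGTTTT"
print(stitch_on_reference("AAAAC", 0, "GGTTTT", 10, reference))             # AAAACNNNNNGGTTTT
print(stitch_on_reference("AAAAC", 0, "GGTTTT", 10, reference, fill=True))  # AAAACCCCGGGGTTTT
```

The filled result contains bases that were never sequenced, which is the source of the chimera warning above.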

AssemblePairs sequential

Assemble pairs by first attempting de novo assembly, then reference guided assembly.

usage: AssemblePairs sequential [--version] [-h] -1 SEQ_FILES_1
                                    [SEQ_FILES_1 ...] -2 SEQ_FILES_2
                                    [SEQ_FILES_2 ...] [--fasta] [--failed]
                                    [--log LOG_FILE]
                                    [--delim DELIMITER DELIMITER DELIMITER]
                                    [--nproc NPROC] [--outdir OUT_DIR]
                                    [--outname OUT_NAME]
                                    [--coord {illumina,solexa,sra,454,presto}]
                                    [--rc {head,tail,both}]
                                    [--1f HEAD_FIELDS [HEAD_FIELDS ...]]
                                    [--2f TAIL_FIELDS [TAIL_FIELDS ...]]
                                    [--alpha ALPHA] [--maxerror MAX_ERROR]
                                    [--minlen MIN_LEN] [--maxlen MAX_LEN]
                                    [--scanrev] -r REF_FILE [--minident MIN_IDENT]
                                    [--evalue EVALUE] [--maxhits MAX_HITS]
                                    [--fill] [--aligner {blastn,usearch}]
                                    [--exec ALIGNER_EXEC] [--dbexec DB_EXEC]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-1 <seq_files_1>

An ordered list of FASTA/FASTQ files containing head/primary sequences.

-2 <seq_files_2>

An ordered list of FASTA/FASTQ files containing tail/secondary sequences.

--fasta

Specify to force output as FASTA rather than FASTQ.

--failed

If specified, create files containing records that fail processing.

--log <log_file>

Specify to write verbose logging to a file. May not be specified with multiple input files.

--delim <delimiter>

A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.

--nproc <nproc>

The number of simultaneous computational processes to execute (CPU cores to utilize).

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--coord {illumina,solexa,sra,454,presto}

The format of the sequence identifier which defines shared coordinate information across paired ends.

--rc {head,tail,both}

Specify to reverse complement sequences before stitching.

--1f <head_fields>

Specify annotation fields to copy from head records into the assembled record.

--2f <tail_fields>

Specify annotation fields to copy from tail records into the assembled record.

--alpha <alpha>

Significance threshold for de novo paired-end assembly.

--maxerror <max_error>

Maximum allowable error rate for de novo assembly.

--minlen <min_len>

Minimum sequence length to scan for overlap in de novo assembly.

--maxlen <max_len>

Maximum sequence length to scan for overlap in de novo assembly.

--scanrev

If specified, scan past the end of the tail sequence in de novo assembly to allow the head sequence to overhang the end of the tail sequence.

-r <ref_file>

A FASTA file containing the reference sequence database.

--minident <min_ident>

Minimum identity of the assembled sequence required to call a valid reference guided assembly (between 0 and 1).

--evalue <evalue>

E-value threshold for the reference alignment of both the head and tail sequences.

--maxhits <max_hits>

Maximum number of hits from the reference alignment to check for matching head and tail sequence assignments.

--fill

Specify to change the behavior of inserted characters when the head and tail sequences do not overlap during reference guided assembly. If specified, this will result in insertion of the V region reference sequence instead of a sequence of Ns in the non-overlapping region. Warning: you could end up making chimeric sequences by using this option.

--aligner {blastn,usearch}

The local alignment tool to use. Must be one of blastn (blast+ nucleotide) or usearch (ublast algorithm).

--exec <aligner_exec>

The name or location of the aligner executable file (blastn or usearch). Defaults to the name specified by the --aligner argument.

--dbexec <db_exec>

The name or location of the executable file that builds the reference database. This defaults to makeblastdb when blastn is specified to the --aligner argument, and usearch when usearch is specified.
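
The order of operations in the sequential subcommand (de novo assembly first, then reference guided assembly) can be sketched as a simple fallback. The function `assemble_sequential` and its interface are hypothetical: the two assembly methods are passed in as functions that return an assembled sequence, or None on failure.

```python
def assemble_sequential(head, tail, de_novo, reference_guided):
    """Try de novo assembly first; fall back to reference guided assembly.

    Returns a (method_label, sequence) tuple; the sequence is None when
    both methods fail.
    """
    for label, method in (("de novo", de_novo),
                          ("reference", reference_guided)):
        result = method(head, tail)
        if result is not None:
            return label, result
    return "fail", None


# Stand-in methods for demonstration only.
def never(head, tail):
    return None


def concat(head, tail):
    return head + tail


print(assemble_sequential("ACGT", "TTAA", never, concat))  # ('reference', 'ACGTTTAA')
```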