presto.Applications
External application wrappers
-
presto.Applications.
makeBlastnDb
(ref_file, db_exec='makeblastdb') Makes a blastn database file
Parameters: - ref_file – the path to the reference database file
- db_exec – the path to the makeblastdb executable
Returns: (name and location of the database, handle of the tempfile.TemporaryDirectory)
Return type: tuple
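The directory handle in the returned tuple matters because a tempfile.TemporaryDirectory removes its contents when it is cleaned up, so the database is presumably only usable while the caller keeps that handle. The sketch below illustrates the lifetime with a hypothetical stand-in, make_db_dir, which is not part of presto and does not run makeblastdb; the database name "reference" is also made up.

```python
import os
import tempfile


def make_db_dir(prefix="blastdb_"):
    """Hypothetical stand-in that mimics the documented return shape:
    (database name and location, TemporaryDirectory handle)."""
    tmp = tempfile.TemporaryDirectory(prefix=prefix)
    # "reference" is an illustrative database name, not presto's choice.
    db_name = os.path.join(tmp.name, "reference")
    return db_name, tmp


db_name, db_handle = make_db_dir()
db_dir = os.path.dirname(db_name)

# While the handle is held, the directory backing the database exists.
exists_while_held = os.path.isdir(db_dir)

# Cleaning up the handle removes the directory and everything in it.
db_handle.cleanup()
exists_after_cleanup = os.path.isdir(db_dir)
```

In practice this suggests keeping the second tuple element in scope until every alignment against the database has finished.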
-
presto.Applications.
makeUBlastDb
(ref_file, db_exec='usearch') Makes a ublast database file
Parameters: - ref_file – path to the reference database file.
- db_exec – path to the usearch executable.
Returns: (location of the database, handle of the tempfile.NamedTemporaryFile)
Return type: tuple
-
presto.Applications.
runBlastn
(seq, database, evalue=1e-05, max_hits=100, aligner_exec='blastn') Aligns a sequence against a reference database using BLASTN
Parameters: - seq – a list of SeqRecord objects to align.
- database – the path and name of the blastn database.
- evalue – the E-value cut-off.
- max_hits – the maximum number of hits returned.
- aligner_exec – the path to the blastn executable.
Returns: Alignment results.
Return type: pandas.DataFrame
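To make the parameters concrete, here is a rough sketch of how they could map onto a blastn command line. The helper build_blastn_cmd and the file names are hypothetical, and the mapping of max_hits to -max_target_seqs and the choice of tabular output are assumptions; the source does not state how presto invokes blastn.

```python
def build_blastn_cmd(query_file, database, evalue=1e-5, max_hits=100,
                     aligner_exec="blastn"):
    """Hypothetical helper: assemble a blastn argument list.

    The flags are standard BLAST+ options, but whether presto uses
    exactly these is an assumption.
    """
    return [
        aligner_exec,
        "-query", query_file,                # FASTA file of query sequences
        "-db", database,                     # database path and name
        "-evalue", str(evalue),              # E-value cut-off
        "-max_target_seqs", str(max_hits),   # assumed mapping for max_hits
        "-outfmt", "6",                      # tab-separated output
    ]


# Illustrative paths only; nothing is executed here.
cmd = build_blastn_cmd("query.fasta", "/tmp/refdb/reference")
```

A list like this could be handed to subprocess.run once blastn and a real database are available.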
-
presto.Applications.
runCDHit
(seq_list, ident=0.9, length_ratio=0.0, seq_start=0, seq_end=None, max_memory=3000, threads=1, cluster_exec='cd-hit-est') Clusters a set of sequences using CD-HIT
Parameters: - seq_list (list) – a list of SeqRecord objects to cluster.
- ident (float) – the sequence identity cutoff to be passed to cd-hit-est.
- length_ratio (float) – cd-hit-est parameter defining the minimum short/long length ratio allowed within a cluster.
- seq_start (int) – the start position to trim sequences at before clustering.
- seq_end (int) – the end position to trim sequences at before clustering.
- max_memory (int) – the cd-hit-est memory limit in MB.
- threads (int) – number of threads for cd-hit-est.
- cluster_exec (str) – the path to the cd-hit-est executable.
Returns: {cluster id: list of sequence ids}.
Return type: dict
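The defaults seq_start=0 and seq_end=None suggest that trimming follows Python slice semantics (0-based start, exclusive end, None meaning "to the end"). That reading is an assumption, not something the source states. Under it, trimming before clustering would behave like the sketch below, where trim and the read sequences are made up for illustration.

```python
def trim(sequence, seq_start=0, seq_end=None):
    """Hypothetical helper: trim a sequence string using slice semantics."""
    return sequence[seq_start:seq_end]


# Made-up reads for illustration.
reads = {
    "read1": "ACGTACGTAC",
    "read2": "ACGTTCGTAC",
}

# Keep positions 2 through 7 (0-based) of each read.
trimmed = {name: trim(seq, seq_start=2, seq_end=8)
           for name, seq in reads.items()}

# With the defaults, the full sequence is kept.
untrimmed = trim(reads["read1"])
```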
-
presto.Applications.
runMuscle
(seq_list, aligner_exec='muscle') Performs multiple alignment of a set of sequences using MUSCLE
Parameters: - seq_list – a list of SeqRecord objects to align
- aligner_exec – the MUSCLE executable
Returns: Multiple alignment results.
Return type: Bio.Align.MultipleSeqAlignment
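Every wrapper in this module takes the name or path of an external executable, so it can help to confirm that the tool is on the PATH before calling it. The helper resolve_exec below is hypothetical and not part of presto; it relies only on shutil.which from the standard library.

```python
import shutil


def resolve_exec(name_or_path):
    """Hypothetical helper: return the full path of an executable,
    or raise FileNotFoundError if it cannot be found."""
    path = shutil.which(name_or_path)
    if path is None:
        raise FileNotFoundError(
            "executable not found on PATH: %s" % name_or_path)
    return path


# Report which of the default executables are visible on this machine.
defaults = ["muscle", "blastn", "makeblastdb", "usearch", "cd-hit-est"]
available = {name: shutil.which(name) is not None for name in defaults}
```

The resolved path could then be passed as aligner_exec, db_exec, or cluster_exec.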
-
presto.Applications.
runUBlast
(seq, database, evalue=1e-05, max_hits=100, aligner_exec='usearch') Aligns a sequence against a reference database using the usearch_local algorithm of USEARCH
Parameters: - seq – a list of SeqRecord objects to align.
- database – the path to the ublast database or a fasta file.
- evalue – the E-value cut-off.
- max_hits – the maximum number of hits returned.
- aligner_exec – the path to the usearch executable.
Returns: Alignment results.
Return type: pandas.DataFrame
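The source does not list the columns of the returned DataFrame. As a point of reference, both blastn (-outfmt 6) and usearch (-blast6out) can emit the 12-column BLAST tabular format; the sketch below parses that format with the standard library only. Whether presto's DataFrame uses these column names is an assumption, and the hit lines are invented.

```python
import csv
import io

# Standard column order of the BLAST tabular format.
BLAST6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_blast6(text):
    """Parse BLAST tabular text into a list of dicts, one per hit."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=BLAST6_FIELDS,
                            delimiter="\t")
    hits = []
    for row in reader:
        # Convert the numeric columns used for ranking hits.
        row["pident"] = float(row["pident"])
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        hits.append(row)
    return hits


# Invented hits for illustration.
example = (
    "read1\trefA\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t360\n"
    "read1\trefB\t90.0\t200\t20\t0\t1\t200\t1\t200\t1e-60\t250\n"
)
hits = parse_blast6(example)

# The best hit is the one with the smallest E-value.
best = min(hits, key=lambda h: h["evalue"])
```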
-
presto.Applications.
runUClust
(seq_list, ident=0.9, length_ratio=0.0, seq_start=0, seq_end=None, threads=1, cluster_exec='usearch') Clusters a set of sequences using the UCLUST algorithm from USEARCH
Parameters: - seq_list (list) – a list of SeqRecord objects to cluster.
- ident (float) – the sequence identity cutoff to be passed to usearch.
- length_ratio (float) – usearch parameter defining the minimum short/long length ratio allowed within a cluster.
- seq_start (int) – the start position to trim sequences at before clustering.
- seq_end (int) – the end position to trim sequences at before clustering.
- threads (int) – number of threads for usearch.
- cluster_exec (str) – the path to the usearch executable.
Returns: {cluster id: list of sequence ids}.
Return type: dict
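The returned mapping of cluster id to sequence ids is often easier to use once inverted or summarized. The following sketch works on a made-up dictionary with the documented shape; the cluster ids and read names are illustrative and do not come from a real run.

```python
# Made-up result in the documented shape: {cluster id: list of sequence ids}.
clusters = {
    1: ["read1", "read3"],
    2: ["read2"],
    3: ["read4", "read5", "read6"],
}

# Invert the mapping to look up the cluster of any sequence id.
seq_to_cluster = {seq_id: cluster_id
                  for cluster_id, seq_ids in clusters.items()
                  for seq_id in seq_ids}

# Count the members of each cluster and find the largest one.
cluster_sizes = {cluster_id: len(seq_ids)
                 for cluster_id, seq_ids in clusters.items()}
largest_cluster = max(cluster_sizes, key=cluster_sizes.get)
```

The same handling applies to the dictionary returned by runCDHit, since both functions document the same return shape.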