presto.Applications

External application wrappers

presto.Applications.makeBlastnDb(ref_file, db_exec='makeblastdb')

Makes a blastn database file

Parameters
  • ref_file – the path to the reference database file.

  • db_exec – the path to the makeblastdb executable.

Returns

(name and location of the database, handle of the tempfile.TemporaryDirectory)

Return type

tuple
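The second element of the returned tuple matters because a tempfile.TemporaryDirectory is deleted once its handle is cleaned up or garbage collected, so the caller should keep the handle referenced for as long as the database is in use. The following is a minimal sketch of that lifetime behaviour using only the standard library. The function fake_make_db is a hypothetical stand-in that mimics the documented return shape; it does not call makeblastdb and is not presto's implementation.

```python
import os
import tempfile


def fake_make_db(ref_file):
    """Hypothetical stand-in mimicking the documented (name, handle) return shape."""
    # The directory lives only as long as this handle does.
    db_handle = tempfile.TemporaryDirectory()
    # Assumed naming scheme: database files share a prefix inside the directory.
    db_name = os.path.join(db_handle.name, "reference")
    return db_name, db_handle


db_name, db_handle = fake_make_db("reference.fasta")
db_dir = os.path.dirname(db_name)

# While the handle is held, the database location is usable.
exists_while_held = os.path.isdir(db_dir)

# Explicit cleanup (or losing the last reference) removes the directory.
db_handle.cleanup()
exists_after_cleanup = os.path.isdir(db_dir)
```

In practice this suggests unpacking both elements of the tuple and keeping the handle in scope until all alignments against the database have finished.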

presto.Applications.makeUBlastDb(ref_file, db_exec='usearch')

Makes a ublast database file

Parameters
  • ref_file – path to the reference database file.

  • db_exec – path to the usearch executable.

Returns

(location of the database, handle of the tempfile.NamedTemporaryFile)

Return type

tuple

presto.Applications.runBlastn(seq, database, evalue=1e-05, max_hits=100, aligner_exec='blastn')

Aligns a sequence against a reference database using BLASTN

Parameters
  • seq – the SeqRecord object to align.

  • database – the path and name of the blastn database.

  • evalue – the E-value cut-off.

  • max_hits – the maximum number of hits returned.

  • aligner_exec – the path to the blastn executable.

Returns

Alignment results.

Return type

pandas.DataFrame
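To illustrate how the evalue and max_hits parameters shape the alignment results, the sketch below parses BLAST tabular output and applies both limits by hand. It assumes the standard twelve-column tabular layout (-outfmt 6) and uses made-up hit lines; the column names in the DataFrame that runBlastn returns may differ, and parse_hits is a hypothetical helper rather than part of presto.

```python
import csv
import io

# Standard column names of BLAST tabular output (-outfmt 6).
BLAST_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_hits(text, evalue=1e-5, max_hits=100):
    """Hypothetical helper: keep hits at or below the E-value cut-off, best first."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=BLAST_FIELDS,
                            delimiter="\t")
    hits = []
    for row in reader:
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        if row["evalue"] <= evalue:
            hits.append(row)
    # Lowest E-value first, then truncate to the requested number of hits.
    hits.sort(key=lambda r: r["evalue"])
    return hits[:max_hits]


# Illustrative (made-up) tabular output for a single query.
example = (
    "read1\tIGHV1-2\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t370\n"
    "read1\tIGHV1-3\t90.0\t200\t20\t0\t1\t200\t1\t200\t1e-60\t250\n"
    "read1\tIGHV4-4\t70.0\t50\t15\t0\t1\t50\t1\t50\t0.5\t20\n"
)

all_hits = parse_hits(example)              # third line fails the E-value cut-off
top_hit = parse_hits(example, max_hits=1)   # only the best remaining hit
```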

presto.Applications.runCDHit(seq_list, ident=0.9, length_ratio=0.0, seq_start=0, seq_end=None, max_memory=3000, threads=1, cluster_exec='cd-hit-est')

Clusters a set of sequences using CD-HIT

Parameters
  • seq_list (list) – a list of SeqRecord objects to cluster.

  • ident (float) – the sequence identity cutoff to be passed to cd-hit-est.

  • length_ratio (float) – cd-hit-est parameter defining the minimum short/long length ratio allowed within a cluster.

  • seq_start (int) – the start position to trim sequences at before clustering.

  • seq_end (int) – the end position to trim sequences at before clustering.

  • max_memory (int) – cd-hit-est maximum memory limit (MB).

  • threads (int) – number of threads for cd-hit-est.

  • cluster_exec (str) – the path to the cd-hit-est executable.

Returns

{cluster id: list of sequence ids}.

Return type

dict
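The seq_start, seq_end and length_ratio parameters can be easier to reason about with a small worked example. The sketch below assumes that trimming follows Python slice semantics (seq_end=None keeps everything to the end of the sequence) and that length_ratio compares the shorter length to the longer one. Both helpers are hypothetical illustrations; the actual clustering decision is made by cd-hit-est, not by code like this.

```python
def trim(seq, seq_start=0, seq_end=None):
    """Hypothetical helper: trim a sequence string with slice semantics."""
    return seq[seq_start:seq_end]


def passes_length_ratio(seq_a, seq_b, length_ratio=0.0):
    """Hypothetical helper: check the minimum short/long length ratio."""
    short, long_ = sorted((len(seq_a), len(seq_b)))
    if long_ == 0:
        # Two empty sequences trivially have equal length.
        return True
    return short / long_ >= length_ratio


full = "ACGTACGTAC"
window = trim(full, seq_start=2, seq_end=8)   # positions 2 through 7
tail = trim(full, seq_start=2)                # seq_end=None keeps the tail

# A 4 nt and an 8 nt sequence have a short/long ratio of 0.5.
strict = passes_length_ratio("ACGT", "ACGTACGT", length_ratio=0.9)
default = passes_length_ratio("ACGT", "ACGTACGT", length_ratio=0.0)
```

With the default length_ratio of 0.0 no pair is excluded on length alone, which is why the second check passes while the stricter one does not.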

presto.Applications.runMuscle(seq_list, aligner_exec='muscle')

Performs a multiple sequence alignment of a set of sequences using MUSCLE

Parameters
  • seq_list – a list of SeqRecord objects to align.

  • aligner_exec – the path to the MUSCLE executable.

Returns

Multiple alignment results.

Return type

Bio.Align.MultipleSeqAlignment

presto.Applications.runUBlast(seq, database, evalue=1e-05, max_hits=100, aligner_exec='usearch')

Aligns a sequence against a reference database using the usearch_local algorithm of USEARCH

Parameters
  • seq – the SeqRecord object to align.

  • database – the path to the ublast database or a fasta file.

  • evalue – the E-value cut-off.

  • max_hits – the maximum number of hits returned.

  • aligner_exec – the path to the usearch executable.

Returns

Alignment results.

Return type

pandas.DataFrame

presto.Applications.runUClust(seq_list, ident=0.9, length_ratio=0.0, seq_start=0, seq_end=None, threads=1, cluster_exec='usearch')

Clusters a set of sequences using the UCLUST algorithm from USEARCH

Parameters
  • seq_list (list) – a list of SeqRecord objects to cluster.

  • ident (float) – the sequence identity cutoff to be passed to usearch.

  • length_ratio (float) – usearch parameter defining the minimum short/long length ratio allowed within a cluster.

  • seq_start (int) – the start position to trim sequences at before clustering.

  • seq_end (int) – the end position to trim sequences at before clustering.

  • threads (int) – number of threads for usearch.

  • cluster_exec (str) – the path to the usearch executable.

Returns

{cluster id: list of sequence ids}.

Return type

dict
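The {cluster id: list of sequence ids} mapping returned by the clustering functions is often more convenient after a little reshaping. The sketch below works on a hand-written dictionary of that shape; the cluster and sequence identifiers are invented for illustration, and the exact key and value types returned by presto are an assumption here.

```python
# Made-up result in the documented {cluster id: list of sequence ids} shape.
clusters = {
    1: ["seqA", "seqB", "seqC"],
    2: ["seqD"],
    3: ["seqE", "seqF"],
}

# Invert the mapping to look up the cluster of any sequence.
seq_to_cluster = {seq_id: cid for cid, ids in clusters.items() for seq_id in ids}

# Cluster sizes, the largest cluster, and clusters with a single member.
sizes = {cid: len(ids) for cid, ids in clusters.items()}
largest = max(sizes, key=sizes.get)
singletons = [cid for cid, n in sizes.items() if n == 1]
```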