Manipulating Annotations

The ParseHeaders.py tool provides a collection of subcommands for performing simple manipulations of sequence headers that are formatted in the pRESTO annotation scheme.

For converting sequence headers into the pRESTO format, see the Importing Data documentation.

Adding a sample annotation

Annotation values are added using the add subcommand of ParseHeaders.py:

    ParseHeaders.py add -s reads.fastq -f SAMPLE -u A1

which will add the annotation SAMPLE=A1 to each sequence of the input file.
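
The effect on a single header can be illustrated with a minimal Python sketch. It is not pRESTO's implementation; it assumes the default delimiters of the annotation scheme (`|` between annotations, `=` between field and value, `,` between values). The function names and the sequence ID `SEQ1` are hypothetical.

```python
def parse_header(header):
    """Split 'ID|FIELD=v1,v2|...' into (sequence_id, {field: [values]})."""
    seq_id, *pairs = header.split("|")
    fields = {}
    for pair in pairs:
        name, value = pair.split("=", 1)
        fields[name] = value.split(",")
    return seq_id, fields


def format_header(seq_id, fields):
    """Rebuild the header string from the ID and the field dictionary."""
    parts = [seq_id]
    parts += [f"{name}={','.join(values)}" for name, values in fields.items()]
    return "|".join(parts)


def add_annotation(header, field, value):
    """Add FIELD=value to a header (illustrative helper, not a pRESTO API)."""
    seq_id, fields = parse_header(header)
    # Assumption: an existing field keeps its values and gains the new one.
    fields.setdefault(field, []).append(value)
    return format_header(seq_id, fields)


print(add_annotation("SEQ1", "SAMPLE", "A1"))  # SEQ1|SAMPLE=A1
```

Keeping each field as a list of values makes the multi-valued annotations discussed in the following sections straightforward to represent.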

Expanding and renaming annotations

By default, pRESTO will not delete annotations. If a sequence header already contains an annotation that a tool is trying to add, the tool will not overwrite that annotation. Instead, it appends the new value to the values already present, in comma-delimited form. For example, after two iterations of MaskPrimers.py with the default primer field name PRIMER, you will have an annotation in the following form (reflecting a match against primer VH3 in the first iteration and primer IGHM in the second):

    PRIMER=VH3,IGHM

Separating these values into two annotations is accomplished via the expand subcommand of ParseHeaders.py:

    ParseHeaders.py expand -s reads.fastq -f PRIMER

This results in the annotations:

    PRIMER1=VH3|PRIMER2=IGHM

These annotations may then be renamed via the rename subcommand:

    ParseHeaders.py rename -s reads_reheader.fastq -f PRIMER1 PRIMER2 \
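
The expand and rename steps can be sketched on a dictionary of fields as follows. This is an illustration of the header semantics rather than pRESTO's own code; the function names are hypothetical, and the new field names `PRIMER_A` and `PRIMER_B` are placeholders chosen for the example, not names taken from the command above.

```python
def expand_field(fields, field):
    """Split a multi-valued field into numbered single-valued fields."""
    expanded = {}
    for name, values in fields.items():
        if name == field:
            # PRIMER=VH3,IGHM becomes PRIMER1=VH3 and PRIMER2=IGHM
            for i, value in enumerate(values, start=1):
                expanded[f"{name}{i}"] = [value]
        else:
            expanded[name] = list(values)
    return expanded


def rename_fields(fields, old_names, new_names):
    """Rename fields pairwise; values are appended if the new name exists."""
    mapping = dict(zip(old_names, new_names))
    renamed = {}
    for name, values in fields.items():
        renamed.setdefault(mapping.get(name, name), []).extend(values)
    return renamed


fields = {"PRIMER": ["VH3", "IGHM"]}
expanded = expand_field(fields, "PRIMER")
print(expanded)  # {'PRIMER1': ['VH3'], 'PRIMER2': ['IGHM']}

# PRIMER_A and PRIMER_B are placeholder names for illustration only.
renamed = rename_fields(expanded, ["PRIMER1", "PRIMER2"], ["PRIMER_A", "PRIMER_B"])
print(renamed)  # {'PRIMER_A': ['VH3'], 'PRIMER_B': ['IGHM']}
```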

Copying, merging and collapsing annotations

Nested annotations can be generated using the copy or merge subcommands of ParseHeaders.py. The examples that follow will use the starting annotation:

    COUNT=10,2|UMI=ATGC|CELL=GGCC

The UMI and CELL annotations can be combined into a single INDEX annotation using the following command:

    ParseHeaders.py merge -s reads.fasta -f UMI CELL -k INDEX --delete
    # result> COUNT=10,2|INDEX=ATGC,GGCC

Without the --delete argument, the original UMI and CELL annotations would be kept in the header.
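
As a rough illustration of the merge semantics, including the effect of `--delete`, the following sketch operates on a dictionary of fields. It is an assumption-laden model of the behavior described above, not pRESTO's implementation, and `merge_fields` is a hypothetical helper name.

```python
def merge_fields(fields, sources, target, delete=False):
    """Combine the values of the source fields into one target field."""
    # Start from any values the target already holds, then append the sources.
    merged = list(fields.get(target, []))
    for name in sources:
        merged.extend(fields.get(name, []))

    result = {}
    for name, values in fields.items():
        # With delete=True the source fields are dropped from the header.
        if delete and name in sources and name != target:
            continue
        result[name] = list(values)
    result[target] = merged
    return result


fields = {"COUNT": ["10", "2"], "UMI": ["ATGC"], "CELL": ["GGCC"]}

print(merge_fields(fields, ["UMI", "CELL"], "INDEX", delete=True))
# {'COUNT': ['10', '2'], 'INDEX': ['ATGC', 'GGCC']}

print(merge_fields(fields, ["UMI", "CELL"], "INDEX"))
# {'COUNT': ['10', '2'], 'UMI': ['ATGC'], 'CELL': ['GGCC'], 'INDEX': ['ATGC', 'GGCC']}
```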

The nested annotation values can then be combined using the collapse subcommand to create various effects:

    ParseHeaders.py collapse -s reads_reheader.fasta -f INDEX --act cat
    # result> INDEX=ATGCGGCC

    ParseHeaders.py collapse -s reads_reheader.fasta -f INDEX --act first
    # result> INDEX=ATGC

    ParseHeaders.py collapse -s reads_reheader.fasta -f COUNT --act sum
    # result> COUNT=12

    ParseHeaders.py collapse -s reads_reheader.fasta -f COUNT --act min
    # result> COUNT=2

where the --act argument specifies the type of collapse action to perform.
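
The four actions shown above can be modeled with a small lookup table. This is a sketch only: it covers just the actions that appear in the examples, assumes that numeric fields such as COUNT hold integers, and uses hypothetical names (`ACTIONS`, `collapse_field`) that are not part of pRESTO.

```python
# Each action reduces a list of string values to a single string value.
ACTIONS = {
    "cat": lambda values: "".join(values),
    "first": lambda values: values[0],
    "sum": lambda values: str(sum(int(v) for v in values)),
    "min": lambda values: str(min(int(v) for v in values)),
}


def collapse_field(fields, field, action):
    """Replace a multi-valued field with the single value the action yields."""
    result = {name: list(values) for name, values in fields.items()}
    result[field] = [ACTIONS[action](fields[field])]
    return result


fields = {"COUNT": ["10", "2"], "INDEX": ["ATGC", "GGCC"]}

print(collapse_field(fields, "INDEX", "cat")["INDEX"])    # ['ATGCGGCC']
print(collapse_field(fields, "INDEX", "first")["INDEX"])  # ['ATGC']
print(collapse_field(fields, "COUNT", "sum")["COUNT"])    # ['12']
print(collapse_field(fields, "COUNT", "min")["COUNT"])    # ['2']
```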

The copy subcommand is normally used to create duplicate annotations with different names, but it has a similar effect to the merge subcommand when the target is an existing field:

    ParseHeaders.py copy -s reads.fasta -f UMI -k CELL

Both the copy and merge subcommands have an --act argument, which allows you to perform an action from the collapse subcommand in the same step as the copy or merge:

    ParseHeaders.py merge -s reads.fasta -f UMI CELL -k INDEX --delete --act cat
    # result> COUNT=10,2|INDEX=ATGCGGCC

    ParseHeaders.py copy -s reads.fasta -f UMI -k CELL --act cat

Deleting annotations

Unwanted annotations can be deleted using the delete subcommand of ParseHeaders.py:

    ParseHeaders.py delete -s reads.fastq -f PRIMER

which will remove the PRIMER field from each sequence header.