Importing Data
Importing data from SRA, ENA, or GenBank
If you have downloaded a data set from GenBank, SRA, or ENA, the format of the sequence headers differs from the raw Roche 454 and Illumina header formats. As a result, the headers may or may not be compatible with pRESTO, depending on how the sequence archive has modified them. ConvertHeaders.py converts incompatible header formats into the pRESTO format. For example, to convert SRA or ENA headers, use the sra subcommand:
ConvertHeaders.py sra -s reads.fastq
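To make the idea of header conversion concrete, the following Python sketch rewrites one SRA-style header into a pipe-delimited annotation layout. It is only an illustration, not the ConvertHeaders.py implementation: the function name `sra_to_annotated`, the field names `DESC` and `LENGTH`, and the assumed target layout `ID|FIELD=VALUE` are all assumptions made for this example, and the actual output of the tool may differ.

```python
def sra_to_annotated(header: str) -> str:
    """Rewrite an SRA-style header as 'ID|FIELD=VALUE|...' (illustrative only).

    Assumes the input looks like '@<run>.<spot> <description> length=<n>'.
    The DESC and LENGTH field names are hypothetical.
    """
    # Keep the FASTQ/FASTA record marker, if present, so it can be restored.
    marker = header[0] if header[:1] in ("@", ">") else ""
    tokens = header[len(marker):].split()
    if not tokens:
        raise ValueError("empty header")

    seq_id, extras = tokens[0], tokens[1:]
    fields = []

    # Everything that is not a length=<n> token is treated as description text.
    desc = [t for t in extras if not t.startswith("length=")]
    if desc:
        fields.append("DESC=" + "_".join(desc))

    # Carry the archive-supplied read length over as its own annotation.
    for t in extras:
        if t.startswith("length="):
            fields.append("LENGTH=" + t.split("=", 1)[1])

    return marker + "|".join([seq_id] + fields)


print(sra_to_annotated("@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36"))
# prints: @SRR001666.1|DESC=071112_SLXA-EAS1_s_7:5:1:817:345|LENGTH=36
```

The point of the sketch is that whitespace-separated archive metadata has to be folded into explicit, delimited fields before downstream tools can parse it reliably.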
ConvertHeaders.py provides the following conversion subcommands:
| Subcommand | Formats Converted |
|---|---|
| generic | Headers with an unknown annotation system |
| 454 | Roche 454 |
| genbank | NCBI GenBank and RefSeq |
| illumina | Illumina HiSeq or MiSeq |
| imgt | IMGT/GENE-DB |
| migec | Molecular Identifier Guided Error Correction |
| sra | NCBI SRA or EMBL-EBI ENA |
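When converting files from a script, it can help to validate the subcommand before calling the tool. The sketch below builds the command line shown earlier from the subcommands listed above. The helper name `build_convert_command` is hypothetical, and the only option used is `-s`, as in the example command; no other flags are assumed.

```python
import shlex
from typing import List

# Subcommands taken from the table above.
SUBCOMMANDS = {"generic", "454", "genbank", "illumina", "imgt", "migec", "sra"}


def build_convert_command(subcommand: str, fastq_path: str) -> List[str]:
    """Return the argument list for a ConvertHeaders.py call (hypothetical helper)."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unknown ConvertHeaders.py subcommand: {subcommand!r}")
    return ["ConvertHeaders.py", subcommand, "-s", fastq_path]


cmd = build_convert_command("sra", "reads.fastq")
print(shlex.join(cmd))
# prints: ConvertHeaders.py sra -s reads.fastq
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once pRESTO is installed and ConvertHeaders.py is on the `PATH`.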