Examples
Download summary tables
Starting from v0.8.0, you can restrict outputs to assembly_summary.tsv and taxonomy.tsv.
- Command
- Output
📂staphylococcus_aureus
┣ 📂logs
┃ ┣ 📂taxons
┃ ┃ ┗ 📜staphylococcus_aureus.log
┃ ┗ 📜lineage.log
┣ 📜assembly_finder.log
┣ 📜assembly_summary.tsv
┣ 📜config.yaml
┗ 📜taxonomy.tsv
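Once a run has finished, the two tables can be inspected with standard tools. The helper below is only a sketch and is not part of assembly_finder: it assumes each TSV starts with a single header line, and `count_rows` is a name chosen here for illustration.

```shell
# Print the number of data rows in a TSV file.
# Assumption: the file starts with exactly one header line.
count_rows() {
  awk 'NR > 1 { n++ } END { print n + 0 }' "$1"
}
```

For example, `count_rows staphylococcus_aureus/assembly_summary.tsv` would print the number of data rows in the summary table, given the header assumption above.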
Download genomes
Small datasets
- Staphylococcus aureus complete genomes
Note
By default, assembly_finder searches assembly levels in the following order: complete, chromosome, scaffold, and contig.
The search stops at the first assembly level where genomes are found.
This behavior was introduced in v0.9.0 so that the best available genomes are selected for each taxon.
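The fallback order described in this note can be illustrated with a short shell sketch. This is not assembly_finder's own code: `count_genomes` is a hypothetical stand-in for the real lookup, and the counts it returns are made up for the example.

```shell
# Hypothetical stand-in for a per-level lookup: prints how many genomes
# exist at a given assembly level. The numbers are invented for this example.
count_genomes() {
  case "$1" in
    complete)   echo 0 ;;
    chromosome) echo 0 ;;
    scaffold)   echo 12 ;;
    contig)     echo 340 ;;
  esac
}

# Walk the levels in the documented order and stop at the first level
# that has at least one genome.
pick_level() {
  for level in complete chromosome scaffold contig; do
    if [ "$(count_genomes "$level")" -gt 0 ]; then
      echo "$level"
      return 0
    fi
  done
  return 1  # no genomes found at any level
}

pick_level  # prints "scaffold" with the made-up counts above
```

With these counts the search skips the two empty levels and stops at scaffold, so contig assemblies are never considered.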
- All Staphylococcus aureus genomes
Note
The --all option disables the default iteration over assembly levels. When used, all genomes for the specified taxon are downloaded, regardless of their assembly level.
- Any Staphylococcus aureus complete genome
Big datasets
Warning
These examples download large datasets, so using an NCBI API key is highly recommended.
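One way to follow this recommendation is to keep the key in an environment variable and add `--api-key` only when it is set. The sketch below is a dry run that prints the command instead of executing it. `NCBI_API_KEY` is a variable name assumed here, not one the tool is known to read by itself, and the other flags are those of the RefSeq example in this section.

```shell
# Build the command line, appending --api-key only when NCBI_API_KEY is set.
# Dry run: the command is printed, not executed.
build_cmd() {
  set -- assembly_finder -i eubacteria,viruses,archaea \
    --source refseq --mag exclude -o outdir
  if [ -n "${NCBI_API_KEY:-}" ]; then
    set -- "$@" --api-key "$NCBI_API_KEY"
  fi
  printf '%s\n' "$*"
}

build_cmd
```

Printing the command first makes it easy to confirm that the key is picked up before starting a long download.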
- Download all chlamydia genomes
- Best ranking complete genome per bacteria species
- Complete bacteria, viruses, and archaea genomes from RefSeq (excluding MAGs and atypical)
assembly_finder -i eubacteria,viruses,archaea \
--api-key <api-key> \
--source refseq \
--mag exclude \
-o outdir
- Specific bioproject
Download other files (cds, proteins, gff3 ...)
assembly_finder -i staphylococcus_aureus --reference \
--include rna,protein,cds,gff3,gtf,gbff,seq-report
📂staphylococcus_aureus
┣ 📂download
┃ ┣ 📂GCF_000013425.1
┃ ┃ ┣ 📜GCF_000013425.1_ASM1342v1_genomic.fna.gz
┃ ┃ ┣ 📜cds_from_genomic.fna.gz
┃ ┃ ┣ 📜genomic.gbff.gz
┃ ┃ ┣ 📜genomic.gff.gz
┃ ┃ ┣ 📜genomic.gtf.gz
┃ ┃ ┣ 📜protein.faa.gz
┃ ┃ ┗ 📜sequence_report.jsonl
┃ ┗ 📜.snakemake_timestamp
┣ 📂logs
┃ ┣ 📂taxons
┃ ┃ ┗ 📜staphylococcus_aureus.log
┃ ┣ 📜archive.log
┃ ┣ 📜lineage.log
┃ ┣ 📜rsync.log
┃ ┗ 📜unzip.log
┣ 📜archive.zip
┣ 📜assembly_finder.log
┣ 📜assembly_summary.tsv
┣ 📜config.yaml
┗ 📜taxonomy.tsv
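After a download like the one above, it can be worth checking that the compressed files are intact. The helper below is a sketch built on the standard `find` and `gzip -t` commands. It is not a feature of assembly_finder, and `check_gz` is a name chosen here for illustration.

```shell
# Test the integrity of every .gz file under a directory.
# Returns non-zero if gzip reports a problem with any file.
check_gz() {
  find "$1" -type f -name '*.gz' -exec gzip -t {} +
}
```

For example, `check_gz staphylococcus_aureus/download` exits with status 0 when every compressed file passes the test, and gzip names any file that fails on standard error.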