
Outputs

Below are all the outputs produced when running the taxons table example:

📂taxons
 ┣ 📂download
 ┃ ┣ 📂GCF_000008865.2
 ┃ ┃ ┗ 📜GCF_000008865.2_ASM886v2_genomic.fna.gz
 ┃ ┣ 📂GCF_000013425.1
 ┃ ┃ ┗ 📜GCF_000013425.1_ASM1342v1_genomic.fna.gz
 ┃ ┣ 📂GCF_003812505.1
 ┃ ┃ ┗ 📜GCF_003812505.1_ASM381250v1_genomic.fna.gz
 ┃ ┗ 📜.snakemake_timestamp
 ┣ 📂logs
 ┃ ┣ 📂taxons
 ┃ ┃ ┣ 📜1290.log
 ┃ ┃ ┣ 📜562.log
 ┃ ┃ ┗ 📜staphylococcus_aureus.log
 ┃ ┣ 📜archive.log
 ┃ ┣ 📜lineage.log
 ┃ ┣ 📜rsync.log
 ┃ ┗ 📜unzip.log
 ┣ 📜archive.zip
 ┣ 📜assembly_finder.log
 ┣ 📜assembly_summary.tsv
 ┣ 📜config.yaml
 ┗ 📜taxonomy.tsv

Download directory

Downloaded files (genomic.fna.gz by default) are located in the download directory, with one subdirectory per accession, as shown below:

 ┣ 📂download
 ┃ ┣ 📂GCF_000008865.2
 ┃ ┃ ┗ 📜GCF_000008865.2_ASM886v2_genomic.fna.gz
 ┃ ┣ 📂GCF_000013425.1
 ┃ ┃ ┗ 📜GCF_000013425.1_ASM1342v1_genomic.fna.gz
 ┃ ┣ 📂GCF_003812505.1
 ┃ ┃ ┗ 📜GCF_003812505.1_ASM381250v1_genomic.fna.gz
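
As a minimal sketch of how this layout could be consumed, the snippet below maps each accession directory to the files it contains. The function name `list_downloads` and the `suffix` parameter are hypothetical helpers introduced for illustration; the only things taken from the listing above are the `download` directory name, the per-accession subdirectories, and the default `genomic.fna.gz` file ending.

```python
from pathlib import Path


def list_downloads(outdir, suffix="_genomic.fna.gz"):
    """Map each accession directory under <outdir>/download to its matching files.

    Hypothetical helper: it only assumes the directory layout shown above.
    """
    download_dir = Path(outdir) / "download"
    files = {}
    for accession_dir in sorted(download_dir.iterdir()):
        # Skip plain files such as .snakemake_timestamp
        if not accession_dir.is_dir():
            continue
        files[accession_dir.name] = sorted(accession_dir.glob(f"*{suffix}"))
    return files


if __name__ == "__main__":
    # "taxons" is the example output directory; adjust to your own run.
    if (Path("taxons") / "download").is_dir():
        for accession, paths in list_downloads("taxons").items():
            print(accession, *(p.name for p in paths))
```

If a different file type was requested for download, the `suffix` argument would need to be changed to match.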

Assembly summary

Table of assembly information such as assembly level, reference category, CheckM and BUSCO completeness, sequencing technology, and number of contigs.

taxon accession current_accession paired_accession source_database annotation_info.method annotation_info.name annotation_info.pipeline annotation_info.provider annotation_info.release_date annotation_info.software_version annotation_info.stats.gene_counts.non_coding annotation_info.stats.gene_counts.protein_coding annotation_info.stats.gene_counts.pseudogene annotation_info.stats.gene_counts.total assembly_level assembly_method assembly_name assembly_status assembly_type bioproject_accession biosample.accession biosample.bioprojects biosample.description.organism_name biosample.description.tax_id biosample.description.title biosample.last_updated biosample.models biosample.owner.contacts biosample.owner.name biosample.package biosample.publication_date biosample.status.status biosample.status.when biosample.submission_date paired_assembly.accession paired_assembly.annotation_name paired_assembly.status refseq_category release_date sequencing_tech submitter contig_l50 contig_n50 gc_count gc_percent genome_coverage number_of_component_sequences number_of_contigs number_of_scaffolds scaffold_l50 scaffold_n50 total_number_of_chromosomes total_sequence_length total_ungapped_length average_nucleotide_identity.best_ani_match.ani average_nucleotide_identity.best_ani_match.assembly average_nucleotide_identity.best_ani_match.assembly_coverage average_nucleotide_identity.best_ani_match.category average_nucleotide_identity.best_ani_match.organism_name average_nucleotide_identity.best_ani_match.type_assembly_coverage average_nucleotide_identity.category average_nucleotide_identity.comment average_nucleotide_identity.match_status average_nucleotide_identity.submitted_ani_match.ani average_nucleotide_identity.submitted_ani_match.assembly average_nucleotide_identity.submitted_ani_match.assembly_coverage average_nucleotide_identity.submitted_ani_match.category average_nucleotide_identity.submitted_ani_match.organism_name average_nucleotide_identity.submitted_ani_match.type_assembly_coverage average_nucleotide_identity.submitted_organism average_nucleotide_identity.submitted_species average_nucleotide_identity.taxonomy_check_status checkm_info.checkm_marker_set checkm_info.checkm_marker_set_rank checkm_info.checkm_species_tax_id checkm_info.checkm_version checkm_info.completeness checkm_info.completeness_percentile checkm_info.contamination infraspecific_names.strain organism_name tax_id path
staphylococcus_aureus GCF_000013425.1 GCF_000013425.1 GCA_000013425.1 SOURCE_DATABASE_REFSEQ na Annotation submitted by NCBI RefSeq na NCBI RefSeq 2016-08-03 na 75 2767 30 2872 Complete Genome na ASM1342v1 current haploid PRJNA237 SAMN02604235 [{'accession': 'PRJNA237'}] Staphylococcus aureus subsp. aureus NCTC 8325 93061 Sample from Staphylococcus aureus subsp. aureus NCTC 8325 2015-05-18T13:21:01.110 ['Generic'] na NCBI Generic.1.0 2014-01-30T15:13:19.920 live 2014-01-30T15:13:19.920 2014-01-30T15:13:19.920 GCA_000013425.1 Annotation submitted by University of Oklahoma Health Sciences Center current reference genome 2006-02-13 na University of Oklahoma Health Sciences Center 1 2821361 927332 33 na 1 1 1 1 2821361 1 2821361 2821361 99.94 GCA_006094915.1 96.32 type Staphylococcus aureus 97.66 category_na na species_match 99.94 GCA_006094915.1 96.32 type Staphylococcus aureus 97.66 Staphylococcus aureus subsp. aureus NCTC 8325 Staphylococcus aureus OK Staphylococcus aureus species 1280 v1.2.2 97.59 19.683367 0.39 NCTC 8325 Staphylococcus aureus subsp. aureus NCTC 8325 93061 /path/to/genome/GCF_000013425.1/GCF_000013425.1_ASM1342v1_genomic.fna.gz
562 GCF_000008865.2 GCF_000008865.2 GCA_000008865.2 SOURCE_DATABASE_REFSEQ na Annotation submitted by NCBI RefSeq na NCBI RefSeq 2021-02-12 na 126 5155 136 5417 Complete Genome na ASM886v2 current haploid PRJNA226 SAMN01911278 [{'accession': 'PRJNA226'}] Escherichia coli O157:H7 str. Sakai 386585 Bacterial, clinical or host-associated sample for Escherichia coli O157:H7 str. SAKAI (EHEC) 2019-05-23T15:25:40.989 ['Pathogen.ba-cl'] [{}] ATCC Pathogen.cl.1.0 2013-02-05T00:00:00.000 live 2014-11-20T09:44:57 2013-02-05T09:09:06.203 GCA_000008865.2 Annotation submitted by GIRC current reference genome 2018-06-08 na GIRC 1 5498578 2824389 50.5 na 3 3 3 1 5498578 3 5594605 5594605 99.97 GCA_001281725.1 94.32 claderef Escherichia coli 99.57 category_na na species_match 99.97 GCA_001281725.1 94.32 claderef Escherichia coli 99.57 Escherichia coli O157:H7 str. Sakai Escherichia coli OK Escherichia coli species 562 v1.2.2 99.51 92.85564 0.15 Sakai substr. RIMD 0509952 Escherichia coli O157:H7 str. Sakai 386585 /path/to/genome/GCF_000008865.2/GCF_000008865.2_ASM886v2_genomic.fna.gz
1290 GCF_003812505.1 GCF_003812505.1 GCA_003812505.1 SOURCE_DATABASE_REFSEQ Best-placed reference protein set; GeneMarkS-2+ GCF_003812505.1-RS_2024_03_28 NCBI Prokaryotic Genome Annotation Pipeline (PGAP) NCBI RefSeq 2024-03-28 6.7 85 2142 35 2262 Complete Genome SMRT v. 2.3.0, HGAP v. 3.0 ASM381250v1 current haploid PRJNA231221 SAMN10163251 [{'accession': 'PRJNA231221'}] Staphylococcus hominis 1290 Pathogen: clinical or host-associated sample from Staphylococcus hominis 2019-05-14T13:08:20.304 ['Pathogen.cl'] [{}] US Food and Drug Administration Pathogen.cl.1.0 2018-10-02T00:00:00.000 live 2018-10-02T12:23:11.101 2018-10-02T12:23:11.100 GCA_003812505.1 NCBI Prokaryotic Genome Annotation Pipeline (PGAP) current representative genome 2018-11-21 PacBio; Illumina US Food and Drug Administration 1 2220494 713682 31.5 19.6x 3 3 3 1 2220494 3 2257431 2257431 99.99 GCA_900458635.1 98.99 type Staphylococcus hominis 99.01 category_na na species_match 99.99 GCA_900458635.1 98.99 type Staphylococcus hominis 99.01 Staphylococcus hominis Staphylococcus hominis OK Staphylococcus hominis species 1290 v1.2.2 90.97 47.945206 2.63 FDAARGOS_575 Staphylococcus hominis 1290 /path/to/genome/GCF_003812505.1/GCF_003812505.1_ASM381250v1_genomic.fna.gz

Note

Some columns were removed for visual clarity
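
Because the summary table is wide, it can help to load only the columns of interest. The following is a rough sketch using Python's standard `csv` module. It assumes the file is tab-separated (inferred from the `.tsv` extension) and that the column names match the header shown above; `load_summary` and its default column selection are hypothetical and not part of the tool itself.

```python
import csv

# Hypothetical default selection; any header name from the table can be used.
DEFAULT_COLUMNS = (
    "taxon",
    "accession",
    "assembly_level",
    "refseq_category",
    "organism_name",
    "path",
)


def load_summary(path, columns=DEFAULT_COLUMNS):
    """Read a tab-separated assembly summary and keep only selected columns.

    Columns missing from the file are returned as None rather than raising.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [{col: row.get(col) for col in columns} for row in reader]
```

Calling `load_summary("taxons/assembly_summary.tsv")` on the example run would return one dictionary per assembly row, each restricted to the requested columns.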

Taxonomy

Table containing the full lineage, from kingdom to species, of each tax_id.

accession	tax_id	name	rank	kingdom	phylum	class	order	family	genus	species
GCF_000013425.1	93061	Staphylococcus aureus subsp. aureus NCTC 8325	strain	Bacteria	Bacillota	Bacilli	Bacillales	Staphylococcaceae	Staphylococcus	Staphylococcus aureus
GCF_000008865.2	386585	Escherichia coli O157:H7 str. Sakai	strain	Bacteria	Pseudomonadota	Gammaproteobacteria	Enterobacterales	Enterobacteriaceae	Escherichia	Escherichia coli
GCF_003812505.1	1290	Staphylococcus hominis	species	Bacteria	Bacillota	Bacilli	Bacillales	Staphylococcaceae	Staphylococcus	Staphylococcus hominis
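
One way the taxonomy table might be used is to group assemblies by a shared rank. The sketch below assumes `taxonomy.tsv` is tab-separated with the column names shown above; `group_by_rank` is a hypothetical helper written for illustration, not a function provided by the tool.

```python
import csv
from collections import defaultdict


def group_by_rank(path, rank="genus"):
    """Group accessions by the value of one lineage column (e.g. genus)."""
    groups = defaultdict(list)
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            # row[rank] is the lineage name at that rank for this accession
            groups[row[rank]].append(row["accession"])
    return dict(groups)
```

With the three example rows, grouping by `genus` would place both Staphylococcus assemblies under one key and the Escherichia assembly under another.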