Outputs
Running the taxons table example produces the following outputs:
📂taxons
┣ 📂download
┃ ┣ 📂GCF_000008865.2
┃ ┃ ┗ 📜GCF_000008865.2_ASM886v2_genomic.fna.gz
┃ ┣ 📂GCF_000013425.1
┃ ┃ ┗ 📜GCF_000013425.1_ASM1342v1_genomic.fna.gz
┃ ┣ 📂GCF_003812505.1
┃ ┃ ┗ 📜GCF_003812505.1_ASM381250v1_genomic.fna.gz
┃ ┗ 📜.snakemake_timestamp
┣ 📂logs
┃ ┣ 📂taxons
┃ ┃ ┣ 📜1290.log
┃ ┃ ┣ 📜562.log
┃ ┃ ┗ 📜staphylococcus_aureus.log
┃ ┣ 📜archive.log
┃ ┣ 📜lineage.log
┃ ┣ 📜rsync.log
┃ ┗ 📜unzip.log
┣ 📜archive.zip
┣ 📜assembly_finder.log
┣ 📜assembly_summary.tsv
┣ 📜config.yaml
┗ 📜taxonomy.tsv
Download directory
Downloaded files (genomic.fna.gz by default) are located in the download directory, as shown below:
┣ 📂download
┃ ┣ 📂GCF_000008865.2
┃ ┃ ┗ 📜GCF_000008865.2_ASM886v2_genomic.fna.gz
┃ ┣ 📂GCF_000013425.1
┃ ┃ ┗ 📜GCF_000013425.1_ASM1342v1_genomic.fna.gz
┃ ┣ 📂GCF_003812505.1
┃ ┃ ┗ 📜GCF_003812505.1_ASM381250v1_genomic.fna.gz
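The layout above (one directory per accession, each holding the downloaded files) can be walked with a few lines of Python. This is a minimal sketch, not part of the tool: the function name `list_downloads` and the `suffix` parameter are hypothetical, and it assumes the output directory follows the tree shown above.

```python
from pathlib import Path


def list_downloads(outdir, suffix="_genomic.fna.gz"):
    """Map each accession directory under <outdir>/download to its files.

    Assumes the layout shown above: download/<accession>/<file>.
    """
    download_dir = Path(outdir) / "download"
    files = {}
    for path in sorted(download_dir.glob(f"*/*{suffix}")):
        # The parent directory name is the assembly accession,
        # e.g. GCF_000008865.2
        files.setdefault(path.parent.name, []).append(path)
    return files
```

Calling `list_downloads("taxons")` on the example output would return a dictionary keyed by accession. Files sitting directly in the download directory, such as .snakemake_timestamp, are skipped because the pattern only matches files one level down.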
Assembly summary
Table with assembly information such as assembly level, reference category, CheckM and BUSCO completeness, sequencing technology, number of contigs, and so on.
taxon | accession | current_accession | paired_accession | source_database | annotation_info.method | annotation_info.name | annotation_info.pipeline | annotation_info.provider | annotation_info.release_date | annotation_info.software_version | annotation_info.stats.gene_counts.non_coding | annotation_info.stats.gene_counts.protein_coding | annotation_info.stats.gene_counts.pseudogene | annotation_info.stats.gene_counts.total | assembly_level | assembly_method | assembly_name | assembly_status | assembly_type | bioproject_accession | biosample.accession | biosample.bioprojects | biosample.description.organism_name | biosample.description.tax_id | biosample.description.title | biosample.last_updated | biosample.models | biosample.owner.contacts | biosample.owner.name | biosample.package | biosample.publication_date | biosample.status.status | biosample.status.when | biosample.submission_date | paired_assembly.accession | paired_assembly.annotation_name | paired_assembly.status | refseq_category | release_date | sequencing_tech | submitter | contig_l50 | contig_n50 | gc_count | gc_percent | genome_coverage | number_of_component_sequences | number_of_contigs | number_of_scaffolds | scaffold_l50 | scaffold_n50 | total_number_of_chromosomes | total_sequence_length | total_ungapped_length | average_nucleotide_identity.best_ani_match.ani | average_nucleotide_identity.best_ani_match.assembly | average_nucleotide_identity.best_ani_match.assembly_coverage | average_nucleotide_identity.best_ani_match.category | average_nucleotide_identity.best_ani_match.organism_name | average_nucleotide_identity.best_ani_match.type_assembly_coverage | average_nucleotide_identity.category | average_nucleotide_identity.comment | average_nucleotide_identity.match_status | average_nucleotide_identity.submitted_ani_match.ani | average_nucleotide_identity.submitted_ani_match.assembly | average_nucleotide_identity.submitted_ani_match.assembly_coverage | average_nucleotide_identity.submitted_ani_match.category | average_nucleotide_identity.submitted_ani_match.organism_name | average_nucleotide_identity.submitted_ani_match.type_assembly_coverage | average_nucleotide_identity.submitted_organism | average_nucleotide_identity.submitted_species | average_nucleotide_identity.taxonomy_check_status | checkm_info.checkm_marker_set | checkm_info.checkm_marker_set_rank | checkm_info.checkm_species_tax_id | checkm_info.checkm_version | checkm_info.completeness | checkm_info.completeness_percentile | checkm_info.contamination | infraspecific_names.strain | organism_name | tax_id | path |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
staphylococcus_aureus | GCF_000013425.1 | GCF_000013425.1 | GCA_000013425.1 | SOURCE_DATABASE_REFSEQ | na | Annotation submitted by NCBI RefSeq | na | NCBI RefSeq | 2016-08-03 | na | 75 | 2767 | 30 | 2872 | Complete Genome | na | ASM1342v1 | current | haploid | PRJNA237 | SAMN02604235 | [{'accession': 'PRJNA237'}] | Staphylococcus aureus subsp. aureus NCTC 8325 | 93061 | Sample from Staphylococcus aureus subsp. aureus NCTC 8325 | 2015-05-18T13:21:01.110 | ['Generic'] | na | NCBI | Generic.1.0 | 2014-01-30T15:13:19.920 | live | 2014-01-30T15:13:19.920 | 2014-01-30T15:13:19.920 | GCA_000013425.1 | Annotation submitted by University of Oklahoma Health Sciences Center | current | reference genome | 2006-02-13 | na | University of Oklahoma Health Sciences Center | 1 | 2821361 | 927332 | 33 | na | 1 | 1 | 1 | 1 | 2821361 | 1 | 2821361 | 2821361 | 99.94 | GCA_006094915.1 | 96.32 | type | Staphylococcus aureus | 97.66 | category_na | na | species_match | 99.94 | GCA_006094915.1 | 96.32 | type | Staphylococcus aureus | 97.66 | Staphylococcus aureus subsp. aureus NCTC 8325 | Staphylococcus aureus | OK | Staphylococcus aureus | species | 1280 | v1.2.2 | 97.59 | 19.683367 | 0.39 | NCTC 8325 | Staphylococcus aureus subsp. aureus NCTC 8325 | 93061 | /path/to/genome/GCF_000013425.1/GCF_000013425.1_ASM1342v1_genomic.fna.gz |
562 | GCF_000008865.2 | GCF_000008865.2 | GCA_000008865.2 | SOURCE_DATABASE_REFSEQ | na | Annotation submitted by NCBI RefSeq | na | NCBI RefSeq | 2021-02-12 | na | 126 | 5155 | 136 | 5417 | Complete Genome | na | ASM886v2 | current | haploid | PRJNA226 | SAMN01911278 | [{'accession': 'PRJNA226'}] | Escherichia coli O157:H7 str. Sakai | 386585 | Bacterial, clinical or host-associated sample for Escherichia coli O157:H7 str. SAKAI (EHEC) | 2019-05-23T15:25:40.989 | ['Pathogen.ba-cl'] | [{}] | ATCC | Pathogen.cl.1.0 | 2013-02-05T00:00:00.000 | live | 2014-11-20T09:44:57 | 2013-02-05T09:09:06.203 | GCA_000008865.2 | Annotation submitted by GIRC | current | reference genome | 2018-06-08 | na | GIRC | 1 | 5498578 | 2824389 | 50.5 | na | 3 | 3 | 3 | 1 | 5498578 | 3 | 5594605 | 5594605 | 99.97 | GCA_001281725.1 | 94.32 | claderef | Escherichia coli | 99.57 | category_na | na | species_match | 99.97 | GCA_001281725.1 | 94.32 | claderef | Escherichia coli | 99.57 | Escherichia coli O157:H7 str. Sakai | Escherichia coli | OK | Escherichia coli | species | 562 | v1.2.2 | 99.51 | 92.85564 | 0.15 | Sakai substr. RIMD 0509952 | Escherichia coli O157:H7 str. Sakai | 386585 | /path/to/genome/GCF_000008865.2/GCF_000008865.2_ASM886v2_genomic.fna.gz |
1290 | GCF_003812505.1 | GCF_003812505.1 | GCA_003812505.1 | SOURCE_DATABASE_REFSEQ | Best-placed reference protein set; GeneMarkS-2+ | GCF_003812505.1-RS_2024_03_28 | NCBI Prokaryotic Genome Annotation Pipeline (PGAP) | NCBI RefSeq | 2024-03-28 | 6.7 | 85 | 2142 | 35 | 2262 | Complete Genome | SMRT v. 2.3.0, HGAP v. 3.0 | ASM381250v1 | current | haploid | PRJNA231221 | SAMN10163251 | [{'accession': 'PRJNA231221'}] | Staphylococcus hominis | 1290 | Pathogen: clinical or host-associated sample from Staphylococcus hominis | 2019-05-14T13:08:20.304 | ['Pathogen.cl'] | [{}] | US Food and Drug Administration | Pathogen.cl.1.0 | 2018-10-02T00:00:00.000 | live | 2018-10-02T12:23:11.101 | 2018-10-02T12:23:11.100 | GCA_003812505.1 | NCBI Prokaryotic Genome Annotation Pipeline (PGAP) | current | representative genome | 2018-11-21 | PacBio; Illumina | US Food and Drug Administration | 1 | 2220494 | 713682 | 31.5 | 19.6x | 3 | 3 | 3 | 1 | 2220494 | 3 | 2257431 | 2257431 | 99.99 | GCA_900458635.1 | 98.99 | type | Staphylococcus hominis | 99.01 | category_na | na | species_match | 99.99 | GCA_900458635.1 | 98.99 | type | Staphylococcus hominis | 99.01 | Staphylococcus hominis | Staphylococcus hominis | OK | Staphylococcus hominis | species | 1290 | v1.2.2 | 90.97 | 47.945206 | 2.63 | FDAARGOS_575 | Staphylococcus hominis | 1290 | /path/to/genome/GCF_003812505.1/GCF_003812505.1_ASM381250v1_genomic.fna.gz |
Note
Some columns were removed for visual clarity.
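Because the summary is wide, it is often easier to read it programmatically and keep only a handful of columns. The sketch below uses only the Python standard library. It assumes that assembly_summary.tsv is tab-separated with a header row matching the column names shown above. The function name `read_summary` and its default column selection are hypothetical.

```python
import csv


def read_summary(path, columns=("taxon", "accession", "assembly_level", "refseq_category")):
    """Return one dict per assembly, keeping only the requested columns.

    Assumes a tab-separated file whose header row contains every
    name listed in `columns`.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [{col: row[col] for col in columns} for row in reader]
```

For example, `read_summary("taxons/assembly_summary.tsv")` would yield one small dictionary per row of the table. All values are returned as strings, so numeric columns such as checkm_info.completeness need an explicit conversion, and missing values appear as the literal string na.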
Taxonomy
Table containing the full lineage, from kingdom to species, of each assembly's tax_id
accession | tax_id | name | rank | kingdom | phylum | class | order | family | genus | species |
---|---|---|---|---|---|---|---|---|---|---|
GCF_000013425.1 | 93061 | Staphylococcus aureus subsp. aureus NCTC 8325 | strain | Bacteria | Bacillota | Bacilli | Bacillales | Staphylococcaceae | Staphylococcus | Staphylococcus aureus |
GCF_000008865.2 | 386585 | Escherichia coli O157:H7 str. Sakai | strain | Bacteria | Pseudomonadota | Gammaproteobacteria | Enterobacterales | Enterobacteriaceae | Escherichia | Escherichia coli |
GCF_003812505.1 | 1290 | Staphylococcus hominis | species | Bacteria | Bacillota | Bacilli | Bacillales | Staphylococcaceae | Staphylococcus | Staphylococcus hominis |
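Both tables share the accession column, so lineage ranks can be attached to the assembly summary with a simple lookup. The following is a minimal sketch under stated assumptions: both files are tab-separated with the headers shown above, and the helper names `read_tsv` and `add_lineage` are hypothetical. Filling absent ranks with na mirrors the placeholder used in the summary table and is a choice made here, not documented behaviour.

```python
import csv


def read_tsv(path):
    """Read a tab-separated file with a header row into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def add_lineage(summary_rows, taxonomy_rows, ranks=("genus", "species")):
    """Attach the selected lineage ranks to each summary row by accession."""
    lineage = {row["accession"]: row for row in taxonomy_rows}
    merged = []
    for row in summary_rows:
        tax = lineage.get(row["accession"], {})
        # Accessions or ranks missing from the taxonomy table get "na"
        merged.append({**row, **{rank: tax.get(rank, "na") for rank in ranks}})
    return merged
```

A typical call would be `add_lineage(read_tsv("taxons/assembly_summary.tsv"), read_tsv("taxons/taxonomy.tsv"))`, which leaves the summary rows untouched and returns new dictionaries with the extra rank keys.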