2) Download genomes

Using minimal_test.tsv:

minimal_test.tsv
taxon   nb  cov_sim sample
staphylococcus_aureus   1   0.1 sample1
1290    1   0.1 sample2

For each taxon or accession in the table, mess uses assembly_finder to fetch the requested number of reference genomes (the nb column).
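To make the input format concrete, here is a minimal sketch that reads the table above and lists what is being requested per sample. It assumes the file is tab-separated (as the .tsv extension suggests) and that nb is the number of genomes to fetch; the function name `requested_genomes` is hypothetical and not part of mess.

```python
import csv
import io

# Contents of minimal_test.tsv as shown above (assumed tab-separated).
MINIMAL_TEST_TSV = (
    "taxon\tnb\tcov_sim\tsample\n"
    "staphylococcus_aureus\t1\t0.1\tsample1\n"
    "1290\t1\t0.1\tsample2\n"
)


def requested_genomes(tsv_text):
    """Return (taxon, genome count, sample) tuples from an input table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [(row["taxon"], int(row["nb"]), row["sample"]) for row in reader]


requests = requested_genomes(MINIMAL_TEST_TSV)
for taxon, nb, sample in requests:
    print(f"{sample}: {nb} genome(s) requested for {taxon}")
```

Note that the taxon column is kept as a string, so a name such as staphylococcus_aureus and a numeric identifier such as 1290 are handled the same way.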

Output

The download step outputs:

  • Compressed fasta files in assembly_finder/download, one directory per assembly accession
  • Taxonomy, assembly and sequence summaries in assembly_finder

📂mess_out
 ┣ 📂assembly_finder
 ┃ ┣ 📂download
 ┃ ┃ ┣ 📂GCF_000013425.1
 ┃ ┃ ┃ ┗ 📜GCF_000013425.1_ASM1342v1_genomic.fna.gz
 ┃ ┃ ┣ 📂GCF_003812505.1
 ┃ ┃ ┃ ┗ 📜GCF_003812505.1_ASM381250v1_genomic.fna.gz
 ┃ ┣ 📜assembly_summary.tsv
 ┃ ┣ 📜config.yaml
 ┃ ┣ 📜sequence_report.tsv
 ┃ ┗ 📜taxonomy.tsv
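
Given the layout above, the downloaded genomes can be located with a short script. The following is only a sketch: it assumes the directory structure shown in the tree, and the helper names `list_downloaded_fastas` and `first_header` are hypothetical, not functions provided by mess or assembly_finder.

```python
import gzip
from pathlib import Path


def list_downloaded_fastas(out_dir):
    """Map each accession directory name to its compressed fasta files."""
    download_dir = Path(out_dir) / "assembly_finder" / "download"
    fastas = {}
    # One sub-directory per accession, each holding *_genomic.fna.gz files.
    for fasta in sorted(download_dir.glob("*/*_genomic.fna.gz")):
        fastas.setdefault(fasta.parent.name, []).append(fasta)
    return fastas


def first_header(fasta_gz):
    """Return the first fasta header line of a gzip-compressed file."""
    with gzip.open(fasta_gz, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                return line.rstrip("\n")
    return None
```

For the example output, `list_downloaded_fastas("mess_out")` would be expected to return one entry for GCF_000013425.1 and one for GCF_003812505.1.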

Assembly summary

taxon accession current_accession paired_accession source_database annotation_info.name annotation_info.provider annotation_info.release_date annotation_info.stats.gene_counts.non_coding annotation_info.stats.gene_counts.protein_coding annotation_info.stats.gene_counts.pseudogene annotation_info.stats.gene_counts.total assembly_level assembly_name assembly_status assembly_type bioproject_accession biosample.accession biosample.bioprojects biosample.description.organism_name biosample.description.tax_id biosample.description.title biosample.last_updated biosample.models biosample.owner.name biosample.package biosample.publication_date biosample.status.status biosample.status.when biosample.submission_date paired_assembly.accession paired_assembly.annotation_name paired_assembly.status refseq_category release_date submitter contig_l50 contig_n50 gc_count gc_percent number_of_component_sequences number_of_contigs number_of_scaffolds scaffold_l50 scaffold_n50 total_number_of_chromosomes total_sequence_length total_ungapped_length average_nucleotide_identity.best_ani_match.ani average_nucleotide_identity.best_ani_match.assembly average_nucleotide_identity.best_ani_match.assembly_coverage average_nucleotide_identity.best_ani_match.category average_nucleotide_identity.best_ani_match.organism_name average_nucleotide_identity.best_ani_match.type_assembly_coverage average_nucleotide_identity.category average_nucleotide_identity.comment average_nucleotide_identity.match_status average_nucleotide_identity.submitted_ani_match.ani average_nucleotide_identity.submitted_ani_match.assembly average_nucleotide_identity.submitted_ani_match.assembly_coverage average_nucleotide_identity.submitted_ani_match.category average_nucleotide_identity.submitted_ani_match.organism_name average_nucleotide_identity.submitted_ani_match.type_assembly_coverage average_nucleotide_identity.submitted_organism average_nucleotide_identity.submitted_species average_nucleotide_identity.taxonomy_check_status checkm_info.checkm_marker_set checkm_info.checkm_marker_set_rank checkm_info.checkm_species_tax_id checkm_info.checkm_version checkm_info.completeness checkm_info.completeness_percentile checkm_info.contamination infraspecific_names.strain organism_name tax_id annotation_info.method annotation_info.pipeline annotation_info.software_version assembly_method biosample.owner.contacts sequencing_tech genome_coverage path
staphylococcus_aureus GCF_000013425.1 GCF_000013425.1 GCA_000013425.1 SOURCE_DATABASE_REFSEQ Annotation submitted by NCBI RefSeq NCBI RefSeq 2016-08-03 75 2767 30 2872 Complete Genome ASM1342v1 current haploid PRJNA237 SAMN02604235 [{'accession': 'PRJNA237'}] Staphylococcus aureus subsp. aureus NCTC 8325 93061 Sample from Staphylococcus aureus subsp. aureus NCTC 8325 2015-05-18T13:21:01.110 ['Generic'] NCBI Generic.1.0 2014-01-30T15:13:19.920 live 2014-01-30T15:13:19.920 2014-01-30T15:13:19.920 GCA_000013425.1 Annotation submitted by University of Oklahoma Health Sciences Center current reference genome 2006-02-13 University of Oklahoma Health Sciences Center 1 2821361 927332 33.0 1 1 1 1 2821361 1 2821361 2821361 99.94 GCA_006094915.1 96.32 type Staphylococcus aureus 97.66 category_na na species_match 99.94 GCA_006094915.1 96.32 type Staphylococcus aureus 97.66 Staphylococcus aureus subsp. aureus NCTC 8325 Staphylococcus aureus OK Staphylococcus aureus species 1280 v1.2.2 97.59 19.567595 0.39 NCTC 8325 Staphylococcus aureus subsp. aureus NCTC 8325 93061 na na na na na na na test_bam_out/assembly_finder/download/GCF_000013425.1/GCF_000013425.1_ASM1342v1_genomic.fna.gz
1290 GCF_003812505.1 GCF_003812505.1 GCA_003812505.1 SOURCE_DATABASE_REFSEQ GCF_003812505.1-RS_2024_03_28 NCBI RefSeq 2024-03-28 85 2142 35 2262 Complete Genome ASM381250v1 current haploid PRJNA231221 SAMN10163251 [{'accession': 'PRJNA231221'}] Staphylococcus hominis 1290 Pathogen: clinical or host-associated sample from Staphylococcus hominis 2019-05-14T13:08:20.304 ['Pathogen.cl'] US Food and Drug Administration Pathogen.cl.1.0 2018-10-02T00:00:00.000 live 2018-10-02T12:23:11.101 2018-10-02T12:23:11.100 GCA_003812505.1 NCBI Prokaryotic Genome Annotation Pipeline (PGAP) current representative genome 2018-11-21 US Food and Drug Administration 1 2220494 713682 31.5 3 3 3 1 2220494 3 2257431 2257431 99.99 GCA_900458635.1 98.99 type Staphylococcus hominis 99.01 category_na na species_match 99.99 GCA_900458635.1 98.99 type Staphylococcus hominis 99.01 Staphylococcus hominis Staphylococcus hominis OK Staphylococcus hominis species 1290 v1.2.2 90.97 47.945206 2.63 FDAARGOS_575 Staphylococcus hominis 1290 Best-placed reference protein set; GeneMarkS-2+ NCBI Prokaryotic Genome Annotation Pipeline (PGAP) 6.7 SMRT v. 2.3.0, HGAP v. 3.0 [{}] PacBio; Illumina 19.6x test_bam_out/assembly_finder/download/GCF_003812505.1/GCF_003812505.1_ASM381250v1_genomic.fna.gz

Sequence report

Assembly Accession Assembly Unplaced Count Assembly-unit accession Chromosome name GC Count GC Percent GenBank seq accession Molecule type Ordering RefSeq seq accession Role Seq length UCSC style name Unlocalized Count
GCF_000013425.1 Primary Assembly chromosome 927332 CP000253.1 Chromosome NC_007795.1 assembled-molecule 2821361
GCF_003812505.1 Primary Assembly chromosome 702792 CP033732.1 Chromosome NZ_CP033732.1 assembled-molecule 2220494
GCF_003812505.1 Primary Assembly unnamed1 9555 CP033731.1 Plasmid NZ_CP033731.1 assembled-molecule 32498
GCF_003812505.1 Primary Assembly unnamed2 1335 CP033733.1 Plasmid NZ_CP033733.1 assembled-molecule 4439
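
The sequence report and the assembly summary can be cross-checked: the sequence lengths of an assembly should add up to its total_sequence_length. The sketch below does this arithmetic with values copied by hand from the two tables above; the variable and function names are illustrative only.

```python
from collections import defaultdict

# (assembly accession, RefSeq seq accession, Seq length),
# copied from the sequence report above.
SEQUENCES = [
    ("GCF_000013425.1", "NC_007795.1", 2821361),
    ("GCF_003812505.1", "NZ_CP033732.1", 2220494),
    ("GCF_003812505.1", "NZ_CP033731.1", 32498),
    ("GCF_003812505.1", "NZ_CP033733.1", 4439),
]

# total_sequence_length values copied from the assembly summary above.
EXPECTED_TOTALS = {"GCF_000013425.1": 2821361, "GCF_003812505.1": 2257431}


def total_lengths(sequences):
    """Sum sequence lengths per assembly accession."""
    totals = defaultdict(int)
    for assembly, _seq_accession, length in sequences:
        totals[assembly] += length
    return dict(totals)


totals = total_lengths(SEQUENCES)
for assembly, total in totals.items():
    status = "matches" if total == EXPECTED_TOTALS[assembly] else "differs from"
    print(f"{assembly}: {total} bp {status} the assembly summary")
```

For GCF_003812505.1 the chromosome and two plasmids give 2220494 + 32498 + 4439 = 2257431 bp, which agrees with the assembly summary.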

Taxonomy

accession tax_id name rank kingdom phylum class order family genus species
GCF_000013425.1 93061 Staphylococcus aureus subsp. aureus NCTC 8325 strain Bacteria Bacillota Bacilli Bacillales Staphylococcaceae Staphylococcus Staphylococcus aureus
GCF_003812505.1 1290 Staphylococcus hominis species Bacteria Bacillota Bacilli Bacillales Staphylococcaceae Staphylococcus Staphylococcus hominis