2) Download genomes
Using minimal_test.tsv, mess fetches n reference genome(s) for each taxon/accession with assembly_finder.
Output
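The download step writes:
- Compressed FASTA files (.fna.gz) to assembly_finder/download, one subdirectory per accession
- Taxonomy, assembly and sequence summaries (taxonomy.tsv, assembly_summary.tsv, sequence_report.tsv) to assembly_finder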
📂mess_out
┣ 📂assembly_finder
┃ ┣ 📂download
┃ ┃ ┣ 📂GCF_000013425.1
┃ ┃ ┃ ┗ 📜GCF_000013425.1_ASM1342v1_genomic.fna.gz
┃ ┃ ┣ 📂GCF_003812505.1
┃ ┃ ┃ ┗ 📜GCF_003812505.1_ASM381250v1_genomic.fna.gz
┃ ┣ 📜assembly_summary.tsv
┃ ┣ 📜config.yaml
┃ ┣ 📜sequence_report.tsv
┃ ┗ 📜taxonomy.tsv
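To pick up the downloaded genomes programmatically, for instance to pass them to another tool, one can walk the download directory. The following is a minimal sketch, assuming the layout in the tree above (one subdirectory per accession under assembly_finder/download, each holding a single `*_genomic.fna.gz` file). The function name `list_downloaded_fastas` is hypothetical and not part of mess or assembly_finder.

```python
from pathlib import Path


def list_downloaded_fastas(outdir: str) -> dict[str, Path]:
    """Map accession -> compressed FASTA under <outdir>/assembly_finder/download.

    Hypothetical helper; assumes one subdirectory per accession, each holding
    a single *_genomic.fna.gz file, as in the output tree.
    """
    download_dir = Path(outdir) / "assembly_finder" / "download"
    if not download_dir.is_dir():
        # Nothing downloaded (or a different layout): return an empty mapping
        return {}
    fastas: dict[str, Path] = {}
    for acc_dir in sorted(p for p in download_dir.iterdir() if p.is_dir()):
        # Take the first matching genome file in each accession directory
        hits = sorted(acc_dir.glob("*_genomic.fna.gz"))
        if hits:
            fastas[acc_dir.name] = hits[0]
    return fastas
```

Called as `list_downloaded_fastas("mess_out")` on the output shown above, it would map GCF_000013425.1 and GCF_003812505.1 to their respective `.fna.gz` files.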
Assembly summary
taxon | accession | current_accession | paired_accession | source_database | annotation_info.name | annotation_info.provider | annotation_info.release_date | annotation_info.stats.gene_counts.non_coding | annotation_info.stats.gene_counts.protein_coding | annotation_info.stats.gene_counts.pseudogene | annotation_info.stats.gene_counts.total | assembly_level | assembly_name | assembly_status | assembly_type | bioproject_accession | biosample.accession | biosample.bioprojects | biosample.description.organism_name | biosample.description.tax_id | biosample.description.title | biosample.last_updated | biosample.models | biosample.owner.name | biosample.package | biosample.publication_date | biosample.status.status | biosample.status.when | biosample.submission_date | paired_assembly.accession | paired_assembly.annotation_name | paired_assembly.status | refseq_category | release_date | submitter | contig_l50 | contig_n50 | gc_count | gc_percent | number_of_component_sequences | number_of_contigs | number_of_scaffolds | scaffold_l50 | scaffold_n50 | total_number_of_chromosomes | total_sequence_length | total_ungapped_length | average_nucleotide_identity.best_ani_match.ani | average_nucleotide_identity.best_ani_match.assembly | average_nucleotide_identity.best_ani_match.assembly_coverage | average_nucleotide_identity.best_ani_match.category | average_nucleotide_identity.best_ani_match.organism_name | average_nucleotide_identity.best_ani_match.type_assembly_coverage | average_nucleotide_identity.category | average_nucleotide_identity.comment | average_nucleotide_identity.match_status | average_nucleotide_identity.submitted_ani_match.ani | average_nucleotide_identity.submitted_ani_match.assembly | average_nucleotide_identity.submitted_ani_match.assembly_coverage | average_nucleotide_identity.submitted_ani_match.category | average_nucleotide_identity.submitted_ani_match.organism_name | average_nucleotide_identity.submitted_ani_match.type_assembly_coverage | 
average_nucleotide_identity.submitted_organism | average_nucleotide_identity.submitted_species | average_nucleotide_identity.taxonomy_check_status | checkm_info.checkm_marker_set | checkm_info.checkm_marker_set_rank | checkm_info.checkm_species_tax_id | checkm_info.checkm_version | checkm_info.completeness | checkm_info.completeness_percentile | checkm_info.contamination | infraspecific_names.strain | organism_name | tax_id | annotation_info.method | annotation_info.pipeline | annotation_info.software_version | assembly_method | biosample.owner.contacts | sequencing_tech | genome_coverage | path |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
staphylococcus_aureus | GCF_000013425.1 | GCF_000013425.1 | GCA_000013425.1 | SOURCE_DATABASE_REFSEQ | Annotation submitted by NCBI RefSeq | NCBI RefSeq | 2016-08-03 | 75 | 2767 | 30 | 2872 | Complete Genome | ASM1342v1 | current | haploid | PRJNA237 | SAMN02604235 | [{'accession': 'PRJNA237'}] | Staphylococcus aureus subsp. aureus NCTC 8325 | 93061 | Sample from Staphylococcus aureus subsp. aureus NCTC 8325 | 2015-05-18T13:21:01.110 | ['Generic'] | NCBI | Generic.1.0 | 2014-01-30T15:13:19.920 | live | 2014-01-30T15:13:19.920 | 2014-01-30T15:13:19.920 | GCA_000013425.1 | Annotation submitted by University of Oklahoma Health Sciences Center | current | reference genome | 2006-02-13 | University of Oklahoma Health Sciences Center | 1 | 2821361 | 927332 | 33.0 | 1 | 1 | 1 | 1 | 2821361 | 1 | 2821361 | 2821361 | 99.94 | GCA_006094915.1 | 96.32 | type | Staphylococcus aureus | 97.66 | category_na | na | species_match | 99.94 | GCA_006094915.1 | 96.32 | type | Staphylococcus aureus | 97.66 | Staphylococcus aureus subsp. aureus NCTC 8325 | Staphylococcus aureus | OK | Staphylococcus aureus | species | 1280 | v1.2.2 | 97.59 | 19.567595 | 0.39 | NCTC 8325 | Staphylococcus aureus subsp. aureus NCTC 8325 | 93061 | na | na | na | na | na | na | na | test_bam_out/assembly_finder/download/GCF_000013425.1/GCF_000013425.1_ASM1342v1_genomic.fna.gz |
1290 | GCF_003812505.1 | GCF_003812505.1 | GCA_003812505.1 | SOURCE_DATABASE_REFSEQ | GCF_003812505.1-RS_2024_03_28 | NCBI RefSeq | 2024-03-28 | 85 | 2142 | 35 | 2262 | Complete Genome | ASM381250v1 | current | haploid | PRJNA231221 | SAMN10163251 | [{'accession': 'PRJNA231221'}] | Staphylococcus hominis | 1290 | Pathogen: clinical or host-associated sample from Staphylococcus hominis | 2019-05-14T13:08:20.304 | ['Pathogen.cl'] | US Food and Drug Administration | Pathogen.cl.1.0 | 2018-10-02T00:00:00.000 | live | 2018-10-02T12:23:11.101 | 2018-10-02T12:23:11.100 | GCA_003812505.1 | NCBI Prokaryotic Genome Annotation Pipeline (PGAP) | current | representative genome | 2018-11-21 | US Food and Drug Administration | 1 | 2220494 | 713682 | 31.5 | 3 | 3 | 3 | 1 | 2220494 | 3 | 2257431 | 2257431 | 99.99 | GCA_900458635.1 | 98.99 | type | Staphylococcus hominis | 99.01 | category_na | na | species_match | 99.99 | GCA_900458635.1 | 98.99 | type | Staphylococcus hominis | 99.01 | Staphylococcus hominis | Staphylococcus hominis | OK | Staphylococcus hominis | species | 1290 | v1.2.2 | 90.97 | 47.945206 | 2.63 | FDAARGOS_575 | Staphylococcus hominis | 1290 | Best-placed reference protein set; GeneMarkS-2+ | NCBI Prokaryotic Genome Annotation Pipeline (PGAP) | 6.7 | SMRT v. 2.3.0, HGAP v. 3.0 | [{}] | PacBio; Illumina | 19.6x | test_bam_out/assembly_finder/download/GCF_003812505.1/GCF_003812505.1_ASM381250v1_genomic.fna.gz |
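The assembly summary is the natural place to collect per-genome metadata such as the assembly level, total length and FASTA location. A minimal sketch of reading it, assuming assembly_summary.tsv is tab-separated with the column names shown in the table above; the function `read_assembly_summary` and the keys of the returned dictionaries are hypothetical choices for illustration.

```python
import csv
from pathlib import Path


def read_assembly_summary(path: str) -> list[dict]:
    """Return accession, organism, level, length and FASTA path per assembly.

    Hypothetical helper; assumes a tab-separated file whose header uses the
    column names shown in the assembly summary table.
    """
    rows = []
    with open(path, newline="") as handle:
        for rec in csv.DictReader(handle, delimiter="\t"):
            rows.append(
                {
                    "accession": rec["accession"],
                    "organism": rec["organism_name"],
                    "level": rec["assembly_level"],
                    # Lengths are stored as text; convert for arithmetic
                    "length_bp": int(rec["total_sequence_length"]),
                    # The path column points at the downloaded .fna.gz file
                    "fasta": Path(rec["path"]),
                }
            )
    return rows
```

Because `csv.DictReader` looks columns up by name, the many other metadata columns are simply ignored and their order does not matter.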
Sequence report
Assembly Accession | Assembly Unplaced Count | Assembly-unit accession | Chromosome name | GC Count | GC Percent | GenBank seq accession | Molecule type | Ordering | RefSeq seq accession | Role | Seq length | UCSC style name | Unlocalized Count |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
GCF_000013425.1 |  | Primary Assembly | chromosome | 927332 |  | CP000253.1 | Chromosome |  | NC_007795.1 | assembled-molecule | 2821361 |  |  |
GCF_003812505.1 |  | Primary Assembly | chromosome | 702792 |  | CP033732.1 | Chromosome |  | NZ_CP033732.1 | assembled-molecule | 2220494 |  |  |
GCF_003812505.1 |  | Primary Assembly | unnamed1 | 9555 |  | CP033731.1 | Plasmid |  | NZ_CP033731.1 | assembled-molecule | 32498 |  |  |
GCF_003812505.1 |  | Primary Assembly | unnamed2 | 1335 |  | CP033733.1 | Plasmid |  | NZ_CP033733.1 | assembled-molecule | 4439 |  |  |
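The sequence report has one row per sequence, and its values add up to the assembly-level figures. For GCF_003812505.1, the chromosome and two plasmids give 2,220,494 + 32,498 + 4,439 = 2,257,431 bp and 702,792 + 9,555 + 1,335 = 713,682 GC bases, matching total_sequence_length and gc_count in the assembly summary. A minimal sketch of that aggregation, assuming sequence_report.tsv is tab-separated with the column names shown above and that Seq length and GC Count are always filled in; `summarise_sequence_report` is a hypothetical name.

```python
import csv
from collections import Counter, defaultdict


def summarise_sequence_report(path: str) -> dict[str, dict]:
    """Aggregate per-sequence rows of a sequence report by assembly.

    Hypothetical helper; assumes tab-separated columns named
    "Assembly Accession", "Seq length", "GC Count" and "Molecule type",
    with numeric fields always present.
    """
    totals = defaultdict(
        lambda: {"length_bp": 0, "gc_count": 0, "molecules": Counter()}
    )
    with open(path, newline="") as handle:
        for rec in csv.DictReader(handle, delimiter="\t"):
            entry = totals[rec["Assembly Accession"]]
            # Sum sequence lengths and GC bases across chromosome and plasmids
            entry["length_bp"] += int(rec["Seq length"])
            entry["gc_count"] += int(rec["GC Count"])
            # Count how many sequences of each molecule type the assembly has
            entry["molecules"][rec["Molecule type"]] += 1
    return dict(totals)
```

Comparing these sums with the assembly summary is a quick consistency check that every sequence of a multi-replicon genome was reported.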
Taxonomy
accession | tax_id | name | rank | kingdom | phylum | class | order | family | genus | species |
---|---|---|---|---|---|---|---|---|---|---|
GCF_000013425.1 | 93061 | Staphylococcus aureus subsp. aureus NCTC 8325 | strain | Bacteria | Bacillota | Bacilli | Bacillales | Staphylococcaceae | Staphylococcus | Staphylococcus aureus |
GCF_003812505.1 | 1290 | Staphylococcus hominis | species | Bacteria | Bacillota | Bacilli | Bacillales | Staphylococcaceae | Staphylococcus | Staphylococcus hominis |
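The taxonomy table gives a full lineage per accession, which makes it easy to group genomes at a chosen level. A minimal sketch, assuming taxonomy.tsv is tab-separated with the header shown above; the function `accessions_by_rank` and its `level` parameter are hypothetical.

```python
import csv
from collections import defaultdict


def accessions_by_rank(path: str, level: str = "genus") -> dict[str, list[str]]:
    """Group accessions in a taxonomy file by one lineage column.

    Hypothetical helper; assumes a tab-separated header matching the taxonomy
    table (accession, tax_id, name, rank, kingdom, ..., genus, species).
    """
    groups = defaultdict(list)
    with open(path, newline="") as handle:
        for rec in csv.DictReader(handle, delimiter="\t"):
            # Use the lineage column (e.g. "genus") as the grouping key
            groups[rec[level]].append(rec["accession"])
    return dict(groups)
```

Grouping on the species column rather than on name or tax_id matters here: GCF_000013425.1 is recorded at strain rank (tax_id 93061), so its species-level label only appears in the species column.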