File:Ncbi-prok-genomesize.svg
From Wikimedia Commons, the free media repository
Original file (SVG file, nominally 1,260 × 720 pixels, file size: 917 KB)
Summary
Description |
English: Log-log plot of the total number of annotated proteins in bacterial and archaeal genomes submitted to GenBank, as a function of genome size. Based on data from NCBI genome reports. |
Date | |
Source | Own work |
Author | Estevezj |
Source code | R code
#!/usr/bin/Rscript
# File-Name: prok-genomes-genes-graph.R
# Date: 2013-01-11
# Author: James Estevez (User:Estevezj)
# Purpose: This generates a log-log plot of protein count as a function of genome size.
# Data Used: ftp://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/prokaryotes.txt
# License: To the extent possible under law, the author(s) have
# dedicated all copyright and related and neighboring rights to this software to
# the public domain worldwide. This software is distributed without any
# warranty. You should have received a copy of the CC0 Public Domain Dedication
# along with this software. If not, see
# <https://creativecommons.org/publicdomain/zero/1.0/>.
library(grDevices)
library(ggplot2)
library(plyr)
library(taxize)
# Download the table from NCBI's FTP site (accessed Fri Jan 11 23:02:49 PST 2013),
# keep a local copy, and read that copy.
download.file("ftp://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/prokaryotes.txt",
              destfile = "ncbi-ftp-reports-prokaryotes.txt")
# quote = "" stops apostrophes in organism or strain names from being read as quotes
prok <- read.table("ncbi-ftp-reports-prokaryotes.txt", sep = "\t", comment.char = "!",
                   quote = "", header = TRUE, stringsAsFactors = FALSE)
# Drop rows with missing values (NCBI marks them with '-')
prok.cut <- prok[(prok$Size..Mb. != '-') & (prok$Proteins != '-'),]
# Set classes
prok.cut$Size..Mb. <- as.numeric(prok.cut$Size..Mb.)
prok.cut$Proteins <- as.numeric(prok.cut$Proteins)
prok.cut$Group <- as.factor(prok.cut$Group)
# From which domain of life does each genome come?
groups <- levels(prok.cut$Group)
get_domain <- function(x) {
  # Look up the NCBI taxonomy ID and keep the first classification hit
  first.hit <- classification(get_uid(x))[[1]]
  # Extract the name at the "superkingdom" rank, i.e. the domain of life
  kingdom <- as.character(first.hit[which(first.hit[, "Rank"] == "superkingdom"), 1])
  return(data.frame(Group = x, Domain = kingdom))
}
domains <- ldply(groups, get_domain)
prok.cut <- merge(prok.cut, domains, by = "Group")
# Draw our plot
p <- ggplot(prok.cut, aes(Size..Mb., Proteins, color = Domain))
# Save our plot to SVG (svg() takes width and height in inches)
svg(filename = 'ncbi-prok-genomesize.svg', width = 14, height = 8)
# An explicit print() draws the plot even when the script is source()d
print(p + geom_point(alpha = 0.5, size = 2) +
  scale_y_log10() +
  scale_x_log10() +
  ggtitle("The total genome size and the number of genes in bacteria and archaea.") +
  xlab('Genome size (Megabases)') +
  ylab("Number of protein-coding genes") +
  scale_colour_brewer(type = "qual", palette = 3))
dev.off()
|
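The cleaning and merge steps of the script can be tried without network access. The block below is a minimal base-R sketch under stated assumptions: the data frame `toy` holds invented values standing in for prokaryotes.txt (they are not NCBI data), `domain_lookup` is a hypothetical hand-written table standing in for the taxize query, and `clean_genomes` is an illustrative helper, not part of the original script.

```r
# Hypothetical toy rows standing in for prokaryotes.txt (invented values).
# As in the NCBI file, '-' marks a missing value.
toy <- data.frame(
  Group     = c("Proteobacteria", "Euryarchaeota", "Firmicutes", "Crenarchaeota"),
  Size..Mb. = c("5.0", "1.5", "-", "2.0"),
  Proteins  = c("4500", "-", "3000", "2000"),
  stringsAsFactors = FALSE
)

# Illustrative helper: drop rows where either column is '-', then make
# both columns numeric (mirrors the prok.cut steps in the script).
clean_genomes <- function(df) {
  keep <- df$Size..Mb. != "-" & df$Proteins != "-"
  out <- df[keep, , drop = FALSE]
  out$Size..Mb. <- as.numeric(out$Size..Mb.)
  out$Proteins  <- as.numeric(out$Proteins)
  out
}

# Hypothetical offline lookup standing in for get_domain()/taxize.
domain_lookup <- data.frame(
  Group  = c("Proteobacteria", "Euryarchaeota", "Firmicutes", "Crenarchaeota"),
  Domain = c("Bacteria", "Archaea", "Bacteria", "Archaea"),
  stringsAsFactors = FALSE
)

cleaned <- clean_genomes(toy)
# merge() joins on Group and, by default, sorts the result by that key
merged <- merge(cleaned, domain_lookup, by = "Group")
print(merged)
```

By default merge() performs an inner join, so any group missing from the lookup is silently dropped; passing `all.x = TRUE` keeps such rows with an `NA` domain instead.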
Licensing
I, the copyright holder of this work, hereby publish it under the following license:
This file is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license.
- You are free:
- to share – to copy, distribute and transmit the work
- to remix – to adapt the work
- Under the following conditions:
- attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- share alike – If you remix, transform, or build upon the material, you must distribute your contributions under the same or compatible license as the original.
File history
Version | Date/Time | Dimensions | User | Comment |
---|---|---|---|---|
current | 07:00, 12 January 2013 | 1,260 × 720 (917 KB) | Estevezj | User created page with UploadWizard |
File usage on Commons
There are no pages that use this file.
Metadata
Width | 1008pt |
---|---|
Height | 576pt |
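The metadata dimensions can be checked against the `svg(width = 14, height = 8)` call in the script. This worked arithmetic assumes that R's svg() device takes its size in inches and uses the PostScript convention of 72 points per inch; the last line compares the result with the nominal 1,260 × 720 pixel size reported above.

```latex
\begin{aligned}
14\,\text{in} \times 72\,\tfrac{\text{pt}}{\text{in}} &= 1008\,\text{pt} \\
8\,\text{in} \times 72\,\tfrac{\text{pt}}{\text{in}} &= 576\,\text{pt} \\
\frac{1260\,\text{px}}{1008\,\text{pt}} = \frac{720\,\text{px}}{576\,\text{pt}} &= 1.25\,\tfrac{\text{px}}{\text{pt}}
\;\Rightarrow\; 1.25 \times 72 = 90\,\tfrac{\text{px}}{\text{in}}
\end{aligned}
```

Both axes scale by the same factor, so the nominal pixel size corresponds to rendering the 14 × 8 inch plot at 90 pixels per inch.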