UniProtKB/Swiss-Prot protein knowledgebase release 2010_08 statistics
1. INTRODUCTION
Release 2010_08 of 13-Jul-10 of UniProtKB/Swiss-Prot contains 518415 sequence entries,
comprising 182829264 amino acids abstracted from 190192 references.
646 sequences have been added since release 2010_07, the sequence data of
118 existing entries has been updated and the annotations of
262208 entries have been revised.
Number of fragments: 8708
Number of additional sequences produced by alternative splicing, initiation or promoter usage, or ribosomal frameshifting: 29306
Protein existence (PE): entries %
1: Evidence at protein level 70117 13.5%
2: Evidence at transcript level 67013 12.9%
3: Inferred from homology 365349 70.5%
4: Predicted 14328 2.8%
5: Uncertain 1608 0.3%
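The percentage column can be reproduced from the entry counts. The Python sketch below is illustrative and not part of the release: the names `pe_counts` and `pe_percentages` are hypothetical, and rounding to one decimal place is an assumption that happens to match the table.

```python
# Entry counts copied from the protein existence (PE) table above.
pe_counts = {
    "1: Evidence at protein level": 70117,
    "2: Evidence at transcript level": 67013,
    "3: Inferred from homology": 365349,
    "4: Predicted": 14328,
    "5: Uncertain": 1608,
}


def pe_percentages(counts):
    """Return each PE level's share of all entries, rounded to 0.1 (assumed)."""
    total = sum(counts.values())
    return {level: round(100 * n / total, 1) for level, n in counts.items()}


if __name__ == "__main__":
    for level, pct in pe_percentages(pe_counts).items():
        print(f"{level:35s}{pe_counts[level]:>8d}{pct:>6.1f}%")
```

The five counts add up to the 518415 entries quoted in the introduction, so the percentages are shares of the whole release.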
[Figure: growth of the database.]
2. TAXONOMIC ORIGIN
Total number of species represented in this release of UniProtKB/Swiss-Prot: 12146
The twenty most represented species account for 108194 sequences, or 20.9% of the total
number of entries.
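As a cross-check, the 108194 figure is the sum of the first twenty frequencies in table 2.2. A minimal Python sketch, with the hypothetical helper `cumulative_share` and the counts copied by hand from that table:

```python
TOTAL_ENTRIES = 518415  # sequence entries in release 2010_08

# Frequencies of the 20 most represented species (rows 1-20 of table 2.2).
top20_counts = [
    20291, 16301, 9160, 7519, 6577, 5781, 4975, 4429, 4254, 4245,
    3293, 3244, 3081, 2641, 2474, 2211, 2170, 1993, 1782, 1773,
]


def cumulative_share(counts, total):
    """Return (sum of counts, that sum as a percentage of total)."""
    subtotal = sum(counts)
    return subtotal, 100 * subtotal / total


if __name__ == "__main__":
    subtotal, pct = cumulative_share(top20_counts, TOTAL_ENTRIES)
    print(f"{subtotal} sequences, {pct:.1f}% of all entries")
```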
2.1 Table of the frequency of occurrence of species
Species represented 1x: 5264
2x: 1724
3x: 898
4x: 577
5x: 426
6x: 351
7x: 245
8x: 209
9x: 187
10x: 106
11- 20x: 596
21- 50x: 372
51-100x: 173
>100x: 1018
2.2 Table of the most represented species
------ --------- --------------------------------------------
Number Frequency Species
------ --------- --------------------------------------------
1 20291 Homo sapiens (Human)
2 16301 Mus musculus (Mouse)
3 9160 Arabidopsis thaliana (Mouse-ear cress)
4 7519 Rattus norvegicus (Rat)
5 6577 Saccharomyces cerevisiae (Baker's yeast)
6 5781 Bos taurus (Bovine)
7 4975 Schizosaccharomyces pombe (Fission yeast)
8 4429 Escherichia coli (strain K12)
9 4254 Bacillus subtilis
10 4245 Dictyostelium discoideum (Slime mold)
11 3293 Caenorhabditis elegans
12 3244 Xenopus laevis (African clawed frog)
13 3081 Drosophila melanogaster (Fruit fly)
14 2641 Danio rerio (Zebrafish) (Brachydanio rerio)
15 2474 Oryza sativa subsp. japonica (Rice)
16 2211 Pongo abelii (Sumatran orangutan)
17 2170 Gallus gallus (Chicken)
18 1993 Escherichia coli O157:H7
19 1782 Methanocaldococcus jannaschii (Methanococcus jannaschii)
20 1773 Haemophilus influenzae
21 1772 Salmonella typhimurium
22 1668 Shigella flexneri
23 1667 Escherichia coli O6
24 1666 Mycobacterium tuberculosis
25 1542 Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
26 1370 Sus scrofa (Pig)
27 1342 Salmonella typhi
28 1282 Pseudomonas aeruginosa
29 1213 Mycobacterium bovis
30 1162 Macaca fascicularis (Crab eating macaque) (Cynomolgus monkey)
31 1015 Synechocystis sp. (strain PCC 6803)
32 997 Yersinia pestis
33 991 Archaeoglobus fulgidus
34 942 Vibrio cholerae
35 929 Salmonella paratyphi A
36 924 Staphylococcus aureus (strain N315)
37 923 Staphylococcus aureus (strain Mu50 / ATCC 700699)
38 913 Rhizobium meliloti (Sinorhizobium meliloti)
39 909 Acanthamoeba polyphaga mimivirus (APMV)
40 897 Staphylococcus aureus (strain COL)
41 895 Staphylococcus aureus (strain MW2)
42 889 Staphylococcus aureus (strain MSSA476)
43 886 Staphylococcus aureus (strain MRSA252)
44 882 Oryctolagus cuniculus (Rabbit)
45 881 Salmonella choleraesuis
46 879 Escherichia coli O6:K15:H31 (strain 536 / UPEC)
47 870 Shigella sonnei (strain Ss046)
48 864 Yersinia pseudotuberculosis
49 835 Escherichia coli O9:H4 (strain HS)
50 829 Escherichia coli O139:H28 (strain E24377A / ETEC)
51 825 Shigella boydii serotype 4 (strain Sb227)
52 821 Ashbya gossypii (Yeast) (Eremothecium gossypii)
53 817 Escherichia coli (strain UTI89 / UPEC)
54 814 Escherichia coli (strain ATCC 8739 / DSM 1576 / Crooks)
55 802 Shigella dysenteriae serotype 1 (strain Sd197)
56 800 Candida albicans (Yeast)
57 794 Vibrio parahaemolyticus
58 793 Kluyveromyces lactis (Yeast) (Candida sphaerica)
59 785 Escherichia coli (strain SMS-3-5 / SECEC)
60 778 Erwinia carotovora subsp. atroseptica (Pectobacterium atrosepticum)
61 777 Pasteurella multocida
62 775 Neurospora crassa
63 773 Aquifex aeolicus
64 766 Canis familiaris (Dog) (Canis lupus familiaris)
65 765 Escherichia coli (strain K12 / DH10B)
66 759 Escherichia coli O127:H6 (strain E2348/69 / EPEC)
67 759 Escherichia coli (strain K12 / MC4100 / BW2952)
68 757 Escherichia coli O17:K52:H18 (strain UMN026 / ExPEC)
69 757 Escherichia coli (strain 55989 / EAEC)
70 757 Staphylococcus epidermidis (strain ATCC 35984 / RP62A)
71 756 Escherichia coli O8 (strain IAI1)
72 756 Staphylococcus epidermidis (strain ATCC 12228)
73 751 Escherichia coli O45:K1 (strain S88 / ExPEC)
74 750 Candida glabrata (Yeast) (Torulopsis glabrata)
75 750 Escherichia coli (strain SE11)
76 750 Shigella flexneri serotype 5b (strain 8401)
77 748 Escherichia coli O7:K1 (strain IAI39 / ExPEC)
78 742 Escherichia coli O157:H7 (strain EC4115 / EHEC)
79 738 Streptomyces coelicolor
80 738 Photorhabdus luminescens subsp. laumondii
81 731 Vibrio vulnificus
82 730 Bacillus halodurans
83 726 Escherichia coli O81 (strain ED1a)
84 723 Yersinia enterocolitica serotype O:8 / biotype 1B (strain 8081)
85 722 Bacillus anthracis
86 720 Salmonella enteritidis PT4 (strain P125109)
87 715 Vibrio vulnificus (strain YJ016)
88 715 Salmonella paratyphi B (strain ATCC BAA-1250 / SPB7)
89 713 Yersinia pestis bv. Antiqua (strain Nepal516)
90 713 Salmonella paratyphi A (strain AKU_12601)
91 712 Staphylococcus aureus (strain NCTC 8325)
92 712 Yersinia pseudotuberculosis serotype O:1b (strain IP 31758)
93 711 Salmonella newport (strain SL254)
94 710 Salmonella heidelberg (strain SL476)
95 710 Salmonella agona (strain SL483)
96 709 Yersinia pestis bv. Antiqua (strain Antiqua)
97 709 Salmonella schwarzengrund (strain CVM19633)
98 706 Escherichia coli O1:K1 / APEC
99 705 Emericella nidulans (Aspergillus nidulans)
100 700 Salmonella dublin (strain CT_02021853)
101 698 Enterobacter sp. (strain 638)
102 697 Klebsiella pneumoniae subsp. pneumoniae (strain ATCC 700721 / MGH 78578)
103 697 Shigella boydii serotype 18 (strain CDC 3083-94 / BS512)
104 687 Mycoplasma pneumoniae
105 685 Pan troglodytes (Chimpanzee)
106 685 Escherichia fergusonii (strain ATCC 35469 / DSM 13698 / CDC 0568-73)
107 684 Klebsiella pneumoniae (strain 342)
108 684 Pseudomonas syringae pv. tomato
109 682 Salmonella gallinarum (strain 287/91 / NCTC 13346)
110 677 Anabaena sp. (strain PCC 7120)
111 671 Pseudomonas putida (strain KT2440)
112 666 Staphylococcus aureus (strain USA300)
113 666 Citrobacter koseri (strain ATCC BAA-895 / CDC 4225-83 / SGSC4696)
114 666 Yersinia pestis (strain Pestoides F)
115 662 Mycobacterium leprae
116 658 Rhizobium sp. (strain NGR234)
117 655 Serratia proteamaculans (strain 568)
118 653 Zea mays (Maize)
119 645 Escherichia coli
120 645 Bradyrhizobium japonicum
121 642 Staphylococcus aureus (strain bovine RF122 / ET3-1)
122 638 Bacillus cereus (strain ATCC 14579 / DSM 31)
123 637 Yersinia pseudotuberculosis serotype O:3 (strain YPIII)
124 635 Salmonella arizonae (strain ATCC BAA-731 / CDC346-86 / RSK2980)
125 633 Yersinia pseudotuberculosis serotype IB (strain PB1/+)
126 620 Shewanella oneidensis
127 617 Agrobacterium tumefaciens (strain C58 / ATCC 33970)
128 615 Treponema pallidum
129 614 Ralstonia solanacearum (Pseudomonas solanacearum)
130 608 Staphylococcus haemolyticus (strain JCSC1435)
131 608 Enterobacter sakazakii (strain ATCC BAA-894)
132 603 Rhizobium loti (Mesorhizobium loti)
133 602 Staphylococcus saprophyticus subsp. saprophyticus
134 601 Methanobacterium thermoautotrophicum
135 599 Salmonella paratyphi C (strain RKS4594)
136 598 Yersinia pestis bv. Antiqua (strain Angola)
137 597 Debaryomyces hansenii (Yeast) (Torulaspora hansenii)
138 596 Listeria monocytogenes
139 596 Yarrowia lipolytica (Candida lipolytica)
140 595 Photobacterium profundum (Photobacterium sp. (strain SS9))
141 590 Bacillus cereus (strain ATCC 10987)
142 590 Xanthomonas campestris pv. campestris
143 588 Listeria innocua
144 585 Rickettsia prowazekii
145 584 Helicobacter pylori (Campylobacter pylori)
146 584 Pectobacterium carotovorum subsp. carotovorum (strain PC1)
147 581 Lactococcus lactis subsp. lactis (Streptococcus lactis)
148 579 Neisseria meningitidis serogroup B
149 576 Brucella suis
150 572 Brucella melitensis
151 572 Buchnera aphidicola subsp. Acyrthosiphon pisum
152 570 Aspergillus fumigatus (Sartorya fumigata)
153 569 Bacillus thuringiensis subsp. konkukian
154 565 Helicobacter pylori J99 (Campylobacter pylori J99)
155 562 Buchnera aphidicola subsp. Schizaphis graminum
156 560 Bacillus cereus (strain ZK / E33L)
157 560 Pseudomonas syringae pv. syringae (strain B728a)
158 557 Pseudomonas aeruginosa (strain UCBPP-PA14)
159 556 Neisseria meningitidis serogroup A
160 555 Bacillus licheniformis (strain DSM 13 / ATCC 14580)
161 555 Xanthomonas axonopodis pv. citri (Citrus canker)
162 553 Vibrio fischeri (strain ATCC 700601 / ES114)
163 551 Pseudomonas fluorescens (strain Pf0-1)
164 549 Oceanobacillus iheyensis
165 545 Caulobacter crescentus (Caulobacter vibrioides)
166 545 Clostridium acetobutylicum
167 545 Pseudomonas fluorescens (strain Pf-5 / ATCC BAA-477)
168 538 Pseudomonas syringae pv. phaseolicola (strain 1448A / Race 6)
169 529 Listeria monocytogenes serotype 4b (strain F2365)
170 524 Erwinia tasmaniensis (strain DSM 17950 / Et1/99)
171 522 Sodalis glossinidius (strain morsitans)
172 521 Bordetella bronchiseptica (Alcaligenes bronchisepticus)
173 521 Xylella fastidiosa
174 519 Streptococcus pneumoniae
175 515 Caenorhabditis briggsae
176 512 Chromobacterium violaceum
177 512 Xylella fastidiosa (strain Temecula1 / ATCC 700964)
178 510 Thermotoga maritima
179 509 Vibrio cholerae serotype O1 (strain ATCC 39541 / Ogawa 395 / O395)
180 507 Bordetella parapertussis
181 507 Buchnera aphidicola subsp. Baizongia pistaciae (strain Bp)
182 507 Pseudomonas aeruginosa (strain PA7)
183 505 Bordetella pertussis
184 504 Staphylococcus aureus (strain Newman)
185 504 Haemophilus ducreyi
186 504 Geobacillus kaustophilus
187 500 Pseudomonas entomophila (strain L48)
188 498 Brucella abortus
189 497 Rickettsia conorii
190 497 Deinococcus radiodurans
191 496 Bacillus clausii (strain KSM-K16)
192 494 Oryza sativa subsp. indica (Rice)
193 492 Haemophilus influenzae (strain 86-028NP)
194 490 Xanthomonas campestris pv. campestris (strain 8004)
195 490 Vibrio harveyi (strain ATCC BAA-1116 / BB120)
196 490 Clostridium perfringens
197 488 Bacillus amyloliquefaciens (strain FZB42)
198 487 Burkholderia pseudomallei (Pseudomonas pseudomallei)
199 487 Shewanella sp. (strain MR-7)
200 485 Corynebacterium glutamicum (Brevibacterium flavum)
201 484 Pseudomonas aeruginosa (strain LESB58)
202 484 Staphylococcus aureus (strain Mu3 / ATCC 700698)
203 484 Shewanella sp. (strain MR-4)
204 483 Mannheimia succiniciproducens (strain MBEL55E)
205 483 Mycoplasma genitalium
206 482 Streptomyces avermitilis
207 481 Proteus mirabilis (strain HI4320)
208 479 Methanosarcina acetivorans
209 475 Synechococcus elongatus (strain PCC 7942) (Anacystis nidulans R2)
210 472 Burkholderia sp. (strain 383) (Burkholderia cepacia
211 472 Pseudomonas putida (strain F1 / ATCC 700007)
212 472 Brucella abortus (strain 2308)
213 472 Thermosynechococcus elongatus (strain BP-1)
214 469 Acinetobacter sp. (strain ADP1)
215 469 Enterococcus faecalis (Streptococcus faecalis)
216 465 Pyrococcus horikoshii
217 465 Xanthomonas campestris pv. vesicatoria (strain 85-10)
218 465 Pseudomonas putida (strain GB-1)
219 464 Rhodopseudomonas palustris
220 464 Shewanella frigidimarina (strain NCIMB 400)
221 462 Anabaena variabilis (strain ATCC 29413 / PCC 7937)
222 462 Shewanella sp. (strain ANA-3)
223 461 Burkholderia mallei (Pseudomonas mallei)
224 460 Ralstonia eutropha (Cupriavidus necator
225 459 Lactobacillus plantarum
226 457 Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
227 457 Pyrococcus abyssi
228 457 Ralstonia eutropha (strain JMP134) (Alcaligenes eutrophus)
229 457 Methanosarcina mazei (Methanosarcina frisia)
230 456 Aeromonas hydrophila subsp. hydrophila (strain ATCC 7966 / NCIB 9240)
231 455 Staphylococcus aureus (strain JH1)
232 453 Rickettsia felis (Rickettsia azadi)
233 453 Xanthomonas oryzae pv. oryzae (strain MAFF 311018)
234 452 Shewanella baltica (strain OS185)
235 452 Pseudomonas putida (strain W619)
236 452 Halobacterium salinarium (Halobacterium halobium)
237 449 Staphylococcus aureus (strain JH9)
238 449 Streptococcus mutans
239 448 Thermoanaerobacter tengcongensis
240 447 Methylococcus capsulatus
241 447 Ovis aries (Sheep)
242 447 Aeromonas salmonicida (strain A449)
243 446 Vibrio fischeri (strain MJ11)
244 446 Rhodobacter sphaeroides (strain ATCC 17023 / 2.4.1 / NCIB 8253 / DSM 158)
245 444 Pseudomonas mendocina (strain ymp)
246 443 Hahella chejuensis (strain KCTC 2396)
247 443 Dechloromonas aromatica (strain RCB)
248 441 Streptococcus pyogenes serotype M6
249 440 Pyrococcus furiosus
250 439 Mycobacterium paratuberculosis
2.3 Taxonomic distribution of the sequences
Kingdom sequences (% of the database)
Archaea 18201 ( 4%)
Bacteria 323640 ( 62%)
Eukaryota 161696 ( 31%)
Viruses 14878 ( 3%)
Within Eukaryota:
Category sequences (% of Eukaryota) (% of the complete database)
Human 20292 ( 13%) ( 4%)
Other Mammalia 44812 ( 28%) ( 9%)
Other Vertebrata 16186 ( 10%) ( 3%)
Viridiplantae 29271 ( 18%) ( 6%)
Fungi 26184 ( 16%) ( 5%)
Insecta 8032 ( 5%) ( 2%)
Nematoda 4086 ( 3%) ( 1%)
Other 12833 ( 8%) ( 2%)
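Both percentage columns of the Eukaryota table follow from the sequence counts. The sketch below is a hedged illustration in Python; `shares` is a hypothetical helper, and rounding to whole percent is an assumption that reproduces the printed values.

```python
TOTAL_ENTRIES = 518415  # sequence entries in release 2010_08

# Sequence counts copied from the "Within Eukaryota" table above.
eukaryota = {
    "Human": 20292,
    "Other Mammalia": 44812,
    "Other Vertebrata": 16186,
    "Viridiplantae": 29271,
    "Fungi": 26184,
    "Insecta": 8032,
    "Nematoda": 4086,
    "Other": 12833,
}


def shares(counts, database_total):
    """Map each category to (% of the group, % of the complete database)."""
    group_total = sum(counts.values())
    return {
        name: (round(100 * n / group_total), round(100 * n / database_total))
        for name, n in counts.items()
    }


if __name__ == "__main__":
    for name, (of_group, of_db) in shares(eukaryota, TOTAL_ENTRIES).items():
        print(f"{name:18s}{eukaryota[name]:>7d} ({of_group:>3d}%) ({of_db:>3d}%)")
```

The eight categories sum to 161696, the Eukaryota total given in the kingdom table.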
3. SEQUENCE SIZE
Repartition of the sequences by size (excluding fragments)
From To Number From To Number
1- 50 8502 1001-1100 3528
51- 100 39984 1101-1200 2450
101- 150 55879 1201-1300 1930
151- 200 55992 1301-1400 1788
201- 250 54712 1401-1500 1423
251- 300 48066 1501-1600 636
301- 350 48504 1601-1700 507
351- 400 41489 1701-1800 422
401- 450 33942 1801-1900 394
451- 500 27298 1901-2000 322
501- 550 19372 2001-2100 199
551- 600 13782 2101-2200 265
601- 650 11536 2201-2300 276
651- 700 8232 2301-2400 168
701- 750 6847 2401-2500 129
751- 800 4889 >2500 1016
801- 850 4224
851- 900 4801
901- 950 3650
951-1000 2553
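Because fragments are excluded, the bin counts should add up to the total number of entries minus the number of fragments. The Python sketch below checks this and shows one way to map a length to its bin; `bin_label` and `histogram` are hypothetical names, and the bin widths (50 up to 1000, 100 up to 2500) are read off the table.

```python
TOTAL_ENTRIES = 518415  # sequence entries in release 2010_08
FRAGMENTS = 8708        # number of fragments, from the introduction

# Counts copied from the table: 1-50 ... 951-1000, 1001-1100 ... 2401-2500, >2500.
counts = [
    8502, 39984, 55879, 55992, 54712, 48066, 48504, 41489, 33942, 27298,
    19372, 13782, 11536, 8232, 6847, 4889, 4224, 4801, 3650, 2553,
    3528, 2450, 1930, 1788, 1423, 636, 507, 422, 394, 322,
    199, 265, 276, 168, 129, 1016,
]


def bin_label(length):
    """Return the size-range label that a sequence length falls into."""
    if length > 2500:
        return ">2500"
    width = 50 if length <= 1000 else 100
    low = ((length - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"


lower_bounds = list(range(1, 1001, 50)) + list(range(1001, 2501, 100))
labels = [bin_label(low) for low in lower_bounds] + [">2500"]
histogram = dict(zip(labels, counts))

if __name__ == "__main__":
    print(sum(counts), TOTAL_ENTRIES - FRAGMENTS)
    print(bin_label(352), histogram[bin_label(352)])
```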
The average sequence length in UniProtKB/Swiss-Prot is 352 amino acids.
The shortest sequence is GWA_SEPOF (P83570): 2 amino acids.
The longest sequence is TITIN_MOUSE (A2ASS6): 35213 amino acids.
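The average length can be approximated from the totals in the introduction. In the Python sketch below, dividing the total number of amino acids by the number of entries gives roughly 352.67; the reported value of 352 is consistent with truncation rather than rounding, but that is an assumption about how the figure was produced, as is the choice to divide by all entries.

```python
TOTAL_AMINO_ACIDS = 182829264  # from the introduction
TOTAL_ENTRIES = 518415         # from the introduction

# Mean length over all entries (assumed definition of the average).
mean_length = TOTAL_AMINO_ACIDS / TOTAL_ENTRIES

# Integer division drops the fractional part.
truncated_mean = TOTAL_AMINO_ACIDS // TOTAL_ENTRIES

if __name__ == "__main__":
    print(f"mean = {mean_length:.2f}, truncated = {truncated_mean}")
```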
4. JOURNAL CITATIONS
Note: the following citation statistics reflect the number of distinct
journal citations.
Total number of journals cited in this release of UniProtKB/Swiss-Prot: 2076
4.1 Table of the frequency of journal citations
Journals cited 1x: 664
2x: 287
3x: 146
4x: 104
5x: 88
6x: 63
7x: 33
8x: 39
9x: 35
10x: 26
11- 20x: 168
21- 50x: 165
51-100x: 98
>100x: 160
4.2 List of the most cited journals in UniProtKB/Swiss-Prot
Nb Citations Journal name
-- --------- -------------------------------------------------------------
1 18017 Journal of Biological Chemistry
2 8344 Proceedings of the National Academy of Sciences of the U.S.A.
3 5033 Journal of Bacteriology
4 4530 Biochemical and Biophysical Research Communications
5 4503 Gene
6 4306 Nucleic Acids Research
7 3972 FEBS Letters
8 3876 Biochemistry
9 3755 The EMBO Journal
10 3424 Molecular and Cellular Biology
11 3236 Nature
12 3104 European Journal of Biochemistry
13 3062 Journal of Molecular Biology
14 2974 Biochimica et Biophysica Acta
15 2686 Cell
16 2479 Genomics
17 2181 Biochemical Journal
18 2136 Science
19 2030 Journal of Virology
20 1771 Molecular Microbiology
21 1577 Journal of Cell Biology
22 1498 Plant Molecular Biology
23 1369 Genes and Development
24 1348 Virology
25 1330 Plant Physiology
26 1324 Human Molecular Genetics
27 1318 Nature Genetics
28 1304 Molecular and General Genetics
29 1253 The American Journal of Human Genetics
30 1182 Oncogene
31 1170 Development
32 1161 Journal of Biochemistry
33 1088 Human Mutation
34 1020 Molecular Biology of the Cell
35 1012 Journal of Immunology
36 987 Genetics
37 889 Structure
38 882 Infection and Immunity
39 872 Journal of General Virology
40 871 The Plant Cell
41 825 Molecular Cell
42 825 Archives of Biochemistry and Biophysics
43 799 Blood
44 757 Yeast
45 754 Microbiology
46 752 The Plant Journal
47 735 Journal of Cell Science
48 728 Developmental Biology
49 676 Cancer Research
50 656 FEMS Microbiology Letters
51 651 Current Biology
52 595 Mechanisms of Development
53 592 Nature Structural Biology
54 590 Human Genetics
55 556 Acta Crystallographica, Section D
56 551 Protein Science
57 534 Applied and Environmental Microbiology
58 534 Journal of Neuroscience
59 524 Current Genetics
60 515 Toxicon
61 506 Neuron
62 503 Journal of Clinical Investigation
63 470 Mammalian Genome
64 461 American Journal of Physiology
65 449 Immunogenetics
66 447 The Journal of Experimental Medicine
67 441 Molecular Endocrinology
68 419 Molecular and Biochemical Parasitology
69 411 Journal of Neurochemistry
70 410 The Journal of Clinical Endocrinology and Metabolism
71 387 Endocrinology
72 378 Journal of Molecular Evolution
73 372 Proteins
74 367 Bioscience, Biotechnology, and Biochemistry
75 366 DNA and Cell Biology
76 357 Molecular Biology and Evolution
77 356 DNA Sequence
78 350 Journal of Medical Genetics
79 320 Tissue Antigens
80 314 Brain Research. Molecular Brain Research
81 306 Plant and Cell Physiology
82 304 Nature Cell Biology
83 298 Experimental Cell Research
84 297 Peptides
85 297 Comparative Biochemistry and Physiology
86 289 Biological Chemistry Hoppe-Seyler
87 282 Antimicrobial Agents and Chemotherapy
88 276 Journal of Investigative Dermatology
89 275 Cytogenetics and Cell Genetics
90 268 Molecular Pharmacology
91 258 Biology of Reproduction
92 248 Journal of General Microbiology
93 247 Genome Research
94 247 Developmental Cell
95 242 Neurology
96 241 Developmental Dynamics
97 239 RNA
98 232 Virus Research
99 215 Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
100 211 Planta
101 206 Molecular Plant-Microbe Interactions
102 205 DNA Research
103 204 European Journal of Immunology
104 203 Biochimie
105 202 Annals of Neurology
106 198 Genes to Cells
107 194 European Journal of Human Genetics
108 191 Eukaryotic Cell
109 190 Immunity
110 187 The New England Journal of Medicine
111 185 Journal of Human Genetics
112 179 Nature Structural and Molecular Biology
113 175 Molecular and Cellular Endocrinology
114 172 The FEBS Journal
115 169 Investigative Ophthalmology and Visual Science
116 165 Archives of Microbiology
117 165 The FASEB Journal
118 163 American Journal of Medical Genetics
119 163 Molecular Phylogenetics and Evolution
120 161 Insect Biochemistry and Molecular Biology
121 161 EMBO Reports
122 159 DNA
123 153 Molecular Immunology
124 153 Hemoglobin
125 152 Bioorganicheskaia Khimiia
126 151 Molecular Reproduction and Development
127 150 Diabetes
128 147 Archives of Virology
129 145 Glycobiology
130 144 Clinical Genetics
131 138 International Journal of Cancer
132 137 Molecular Genetics and Metabolism
133 136 General and Comparative Endocrinology
134 136 Animal Genetics
135 135 Molecular and Cellular Neuroscience
136 133 Journal of Cellular Biochemistry
137 132 Journal of the American Chemical Society
138 130 British Journal of Haematology
139 129 Biological Chemistry
140 126 BMC Genomics
141 126 American Journal of Medical Genetics. Part A
142 125 Molecular Genetics and Genomics
143 125 Nature Immunology
144 122 Journal of Lipid Research
145 122 Agricultural and Biological Chemistry
146 118 Circulation Research
147 116 Proteomics
148 115 Neuroscience Letters
149 114 Thrombosis and Haemostasis
150 114 Journal of Medicinal Chemistry
5. STATISTICS FOR SOME LINE TYPES
The following tables give, for selected UniProtKB/Swiss-Prot line types, the total number
of lines, the number of entries with at least one such line, and the average
number of lines per entry. The rank orders the subtypes by total number.
Total Number of Average
Line type / subtype number entries per entry Rank
------------------------------------ -------- --------- --------- ----
References (RL) 928298 1.79
Journal 735916 389078 1.42 1
Submitted to EMBL/GenBank/DDBJ 179701 166254 0.35 2
Submitted to other databases 10610 9198 0.02 3
Book citation 639 625 <0.01 4
Plant Gene Register 560 548 <0.01 5
Thesis 399 396 <0.01 6
Unpublished observations 297 293 <0.01 7
Patent 170 168 <0.01 8
Worm Breeder's Gazette 6 6 <0.01 9
Total number of distinct authors cited in UniProtKB/Swiss-Prot: 290543
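The "Average per entry" column appears to be the total number of lines divided by all 518415 entries in the release, not by the number of entries that carry the line. For Journal, 735916 / 518415 gives 1.42 as printed, whereas 735916 / 389078 would give about 1.89. The Python sketch below illustrates this reading; `average_per_entry` is a hypothetical helper, and the "<0.01" convention for values that round to zero is inferred from the tables.

```python
TOTAL_ENTRIES = 518415  # sequence entries in release 2010_08


def average_per_entry(total_lines, entries=TOTAL_ENTRIES):
    """Format lines-per-entry to two decimals; show '<0.01' when it rounds to 0."""
    text = f"{total_lines / entries:.2f}"
    return "<0.01" if text == "0.00" else text


# Total numbers copied from the references (RL) table above.
reference_lines = {
    "References (RL)": 928298,
    "Journal": 735916,
    "Submitted to EMBL/GenBank/DDBJ": 179701,
    "Submitted to other databases": 10610,
    "Book citation": 639,
}

if __name__ == "__main__":
    for subtype, total in reference_lines.items():
        print(f"{subtype:36s}{total:>9d}  {average_per_entry(total):>6s}")
```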
Total Number of Average
Line type / subtype number entries per entry Rank
------------------------------------ -------- --------- --------- ----
Comments (CC) 2228234 4.30
ALLERGEN 465 465 <0.01 26
ALTERNATIVE PRODUCTS 18906 18906 0.04 13
BIOPHYSICOCHEMICAL PROPERTIES 3147 3147 0.01 22
BIOTECHNOLOGY 275 273 <0.01 28
CATALYTIC ACTIVITY 224889 205307 0.43 4
CAUTION 7067 6925 0.01 19
COFACTOR 100065 91843 0.19 7
DEVELOPMENTAL STAGE 8909 8909 0.02 16
DISEASE 4276 2892 0.01 21
DISRUPTION PHENOTYPE 2776 2776 0.01 23
DOMAIN 31986 28288 0.06 11
ENZYME REGULATION 8901 8901 0.02 17
FUNCTION 387606 371523 0.75 2
INDUCTION 11923 11923 0.02 15
INTERACTION 12576 12576 0.02 14
MASS SPECTROMETRY 4431 3353 0.01 20
MISCELLANEOUS 30349 28073 0.06 12
PATHWAY 127008 116121 0.24 6
PHARMACEUTICAL 84 84 <0.01 29
POLYMORPHISM 791 756 <0.01 24
PTM 35866 29025 0.07 9
RNA EDITING 611 611 <0.01 25
SEQUENCE CAUTION 37929 37929 0.07 8
SIMILARITY 603793 493758 1.16 1
SUBCELLULAR LOCATION 299758 294677 0.58 3
SUBUNIT 221852 221852 0.43 5
TISSUE SPECIFICITY 33268 33268 0.06 10
TOXIC DOSE 426 415 <0.01 27
WEB RESOURCE 8301 6583 0.02 18
Total number of comment topics: 29
Total Number of Average
Line type / subtype number entries per entry Rank
------------------------------------ -------- --------- --------- ----
Features (FT) 3247931 6.27
ACT_SITE 129656 77502 0.25 9
BINDING 206089 58052 0.40 4
CA_BIND 3668 1491 0.01 35
CARBOHYD 100409 25469 0.19 13
CHAIN 524820 513530 1.01 1
COILED 18295 12389 0.04 26
COMPBIAS 49712 25923 0.10 18
CONFLICT 117011 41056 0.23 10
CROSSLNK 4944 3178 0.01 34
DISULFID 96416 25776 0.19 14
DNA_BIND 10944 10076 0.02 29
DOMAIN 144577 86186 0.28 6
HELIX 130424 13641 0.25 8
INIT_MET 14833 14833 0.03 27
INTRAMEM 1523 720 <0.01 37
LIPID 10573 6735 0.02 31
METAL 276005 68087 0.53 3
MOD_RES 180341 59700 0.35 5
MOTIF 32228 20774 0.06 22
MUTAGEN 31367 7466 0.06 23
NON_CONS 1826 687 <0.01 36
NON_STD 349 274 <0.01 39
NON_TER 11797 8974 0.02 28
NP_BIND 104522 68055 0.20 12
PEPTIDE 8693 5603 0.02 32
PROPEP 10573 8923 0.02 30
REGION 92790 50893 0.18 15
REPEAT 88842 13125 0.17 16
SIGNAL 34667 34657 0.07 21
SITE 37323 22145 0.07 20
STRAND 130956 12754 0.25 7
TOPO_DOM 116689 23932 0.23 11
TRANSIT 6578 6492 0.01 33
TRANSMEM 338874 69223 0.65 2
TURN 31113 10773 0.06 24
UNSURE 1201 395 <0.01 38
VAR_SEQ 39211 16833 0.08 19
VARIANT 79751 16503 0.15 17
ZN_FING 28341 12326 0.05 25
Total number of feature keys: 39
Total Number of Average
Line type / subtype number entries per entry Rank Category
------------------------------------ -------- --------- --------- ---- -------------------------------------------
Cross-references (DR) 13275589 25.61
2DBase-Ecoli 85 85 <0.01 118 2D gel databases
Aarhus/Ghent-2DPAGE 126 96 <0.01 115 2D gel databases
AGD 827 821 <0.01 94 Organism-specific databases
ANU-2DPAGE 23 23 <0.01 124 2D gel databases
ArachnoServer 460 456 <0.01 102 Organism-specific databases
ArrayExpress 58185 58185 0.11 39 Gene expression databases
Bgee 39443 39441 0.08 45 Gene expression databases
BindingDB 297 297 <0.01 110 Other
BioCyc 160692 147808 0.31 21 Enzyme and pathway databases
BRENDA 65183 62384 0.13 36 Enzyme and pathway databases
CAZy 7172 6435 0.01 68 Protein family/group databases
CGD 561 555 <0.01 99 Organism-specific databases
CleanEx 30187 29542 0.06 47 Gene expression databases
COMPLUYEAST-2DPAGE 101 100 <0.01 117 2D gel databases
ConoServer 613 587 <0.01 98 Organism-specific databases
Cornea-2DPAGE 67 67 <0.01 119 2D gel databases
CTD 64784 64232 0.12 37 Organism-specific databases
CYGD 6629 6540 0.01 71 Organism-specific databases
dictyBase 4372 4244 0.01 80 Organism-specific databases
DIP 11486 11382 0.02 62 Protein-protein interaction databases
DisProt 397 394 <0.01 105 3D structure databases
DOSAC-COBS-2DPAGE 149 147 <0.01 114 2D gel databases
DrugBank 5317 1626 0.01 73 Other
EchoBASE 4167 4163 0.01 82 Organism-specific databases
ECO2DBASE 352 300 <0.01 109 2D gel databases
EcoGene 4396 4394 0.01 78 Organism-specific databases
eggNOG 217074 217074 0.42 18 Phylogenomic databases
EMBL 856728 508471 1.65 3 Sequence databases
Ensembl 74434 57636 0.14 31 Genome annotation databases
EnsemblBacteria 97262 84186 0.19 27 Genome annotation databases
EnsemblFungi 14449 14295 0.03 58 Genome annotation databases
EnsemblMetazoa 12533 8313 0.02 60 Genome annotation databases
EnsemblPlants 12780 11345 0.02 59 Genome annotation databases
EnsemblProtists 4282 4160 0.01 81 Genome annotation databases
euHCVdb 55 44 <0.01 120 Organism-specific databases
EuPathDB 233 233 <0.01 113 Organism-specific databases
FlyBase 5647 5271 0.01 72 Organism-specific databases
Gene3D 236550 194456 0.46 17 Family and domain databases
GeneCards 20589 19824 0.04 51 Organism-specific databases
GeneDB_Spombe 4977 4932 0.01 75 Organism-specific databases
GeneFarm 2697 2682 0.01 86 Organism-specific databases
GeneID 476676 449760 0.92 6 Genome annotation databases
Genevestigator 64649 64649 0.12 38 Gene expression databases
GenoList 7039 7027 0.01 69 Organism-specific databases
GenomeReviews 377388 357322 0.73 9 Genome annotation databases
GermOnline 41903 41309 0.08 44 Gene expression databases
GlycoSuiteDB 280 280 <0.01 111 PTM databases
GO 2162252 484492 4.17 1 Ontologies
Gramene 4385 4385 0.01 79 Organism-specific databases
H-InvDB 11797 11057 0.02 61 Organism-specific databases
HAMAP 307434 307290 0.59 14 Family and domain databases
HGNC 19710 19529 0.04 53 Organism-specific databases
HOGENOM 360232 360232 0.69 11 Phylogenomic databases
HOVERGEN 74431 74431 0.14 32 Phylogenomic databases
HPA 11304 8342 0.02 63 Organism-specific databases
HSSP 29099 29099 0.06 48 3D structure databases
InParanoid 66267 66267 0.13 35 Phylogenomic databases
IntAct 22181 22180 0.04 50 Protein-protein interaction databases
InterPro 1625443 493263 3.14 2 Family and domain databases
IPI 89002 63795 0.17 29 Sequence databases
KEGG 440485 418771 0.85 8 Genome annotation databases
LegioList 760 758 <0.01 95 Organism-specific databases
Leproma 665 662 <0.01 97 Organism-specific databases
MaizeGDB 472 467 <0.01 101 Organism-specific databases
MEROPS 9931 9619 0.02 64 Protein family/group databases
MGI 16187 16138 0.03 56 Organism-specific databases
MIM 16334 12826 0.03 55 Organism-specific databases
MINT 17490 17490 0.03 54 Protein-protein interaction databases
NextBio 48795 48794 0.09 42 Other
NMPDR 130411 130407 0.25 24 Genome annotation databases
OGP 377 377 <0.01 107 2D gel databases
OMA 368433 368433 0.71 10 Phylogenomic databases
Orphanet 3810 2185 0.01 84 Organism-specific databases
OrthoDB 56359 56359 0.11 40 Phylogenomic databases
PANTHER 185411 170187 0.36 20 Family and domain databases
Pathway_Interaction_DB 4567 1665 0.01 76 Enzyme and pathway databases
PDB 68513 15886 0.13 33 3D structure databases
PDBsum 68511 15885 0.13 34 3D structure databases
PeptideAtlas 5169 5169 0.01 74 Proteomic databases
PeroxiBase 678 666 <0.01 96 Protein family/group databases
Pfam 684756 480642 1.32 4 Family and domain databases
PharmGKB 15791 15780 0.03 57 Organism-specific databases
PHCI-2DPAGE 247 247 <0.01 112 2D gel databases
PhosphoSite 20425 20425 0.04 52 PTM databases
PhosSite 352 352 <0.01 108 PTM databases
PhylomeDB 121668 121668 0.23 25 Phylogenomic databases
PIR 115415 105426 0.22 26 Sequence databases
PIRSF 84174 84174 0.16 30 Family and domain databases
PMAP-CutDB 1395 1395 <0.01 89 Other
PMMA-2DPAGE 52 52 <0.01 121 2D gel databases
PptaseDB 34 34 <0.01 122 Protein family/group databases
PRIDE 53898 53898 0.10 41 Proteomic databases
PRINTS 135570 117364 0.26 23 Family and domain databases
ProDom 27827 27498 0.05 49 Family and domain databases
ProMEX 447 447 <0.01 103 Proteomic databases
PROSITE 459849 292704 0.89 7 Family and domain databases
ProtClustDB 324375 324375 0.63 13 Phylogenomic databases
PseudoCAP 1221 1212 <0.01 91 Organism-specific databases
Rat-heart-2DPAGE 28 28 <0.01 123 2D gel databases
Reactome 7806 4424 0.02 66 Enzyme and pathway databases
REBASE 378 358 <0.01 106 Protein family/group databases
RefSeq 498693 450275 0.96 5 Sequence databases
REPRODUCTION-2DPAGE 1252 1031 <0.01 90 2D gel databases
RGD 7416 7412 0.01 67 Organism-specific databases
SGD 6641 6556 0.01 70 Organism-specific databases
Siena-2DPAGE 103 103 <0.01 116 2D gel databases
SMART 142252 109737 0.27 22 Family and domain databases
SMR 346901 346901 0.67 12 3D structure databases
STRING 203856 203813 0.39 19 Protein-protein interaction databases
SUPFAM 303394 242497 0.59 15 Family and domain databases
SWISS-2DPAGE 1183 1181 <0.01 92 2D gel databases
TAIR 9242 9131 0.02 65 Organism-specific databases
TCDB 3365 3324 0.01 85 Protein family/group databases
TIGR 34039 33272 0.07 46 Genome annotation databases
TIGRFAMs 282974 263200 0.55 16 Family and domain databases
TubercuList 1690 1654 <0.01 88 Organism-specific databases
UCD-2DPAGE 512 502 <0.01 100 2D gel databases
UCSC 48546 39568 0.09 43 Genome annotation databases
UniGene 94075 82817 0.18 28 Sequence databases
VectorBase 431 417 <0.01 104 Genome annotation databases
World-2DPAGE 915 904 <0.01 93 2D gel databases
WormBase 4521 3736 0.01 77 Organism-specific databases
Xenbase 4107 4034 0.01 83 Organism-specific databases
ZFIN 2582 2571 <0.01 87 Organism-specific databases
Total number of cross-referenced databases: 124
6. AMINO ACID COMPOSITION
6.1 Composition in percent for the complete database
Ala (A) 8.27 Gln (Q) 3.94 Leu (L) 9.67 Ser (S) 6.51
Arg (R) 5.53 Glu (E) 6.76 Lys (K) 5.85 Thr (T) 5.33
Asn (N) 4.05 Gly (G) 7.09 Met (M) 2.42 Trp (W) 1.08
Asp (D) 5.45 His (H) 2.27 Phe (F) 3.86 Tyr (Y) 2.92
Cys (C) 1.36 Ile (I) 5.99 Pro (P) 4.69 Val (V) 6.87
Asx (B) 0.000 Glx (Z) 0.000 Xaa (X) 0.00
Legend: gray = aliphatic, red = acidic, green = small hydroxy,
blue = basic, black = aromatic, white = amide, yellow = sulfur
6.2 Classification of the amino acids by their frequency
Leu, Ala, Gly, Val, Glu, Ser, Ile, Lys, Arg, Asp, Thr, Pro, Asn, Gln,
Phe, Tyr, Met, His, Cys, Trp
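The ordering in 6.2 is the composition table of 6.1 sorted by decreasing percentage. A minimal Python sketch, with the percentages copied by hand and the hypothetical helper `rank_by_frequency`:

```python
# Composition in percent, copied from section 6.1 (standard amino acids only).
composition = {
    "Ala": 8.27, "Arg": 5.53, "Asn": 4.05, "Asp": 5.45, "Cys": 1.36,
    "Gln": 3.94, "Glu": 6.76, "Gly": 7.09, "His": 2.27, "Ile": 5.99,
    "Leu": 9.67, "Lys": 5.85, "Met": 2.42, "Phe": 3.86, "Pro": 4.69,
    "Ser": 6.51, "Thr": 5.33, "Trp": 1.08, "Tyr": 2.92, "Val": 6.87,
}


def rank_by_frequency(percentages):
    """Return the amino acid codes ordered from most to least frequent."""
    return sorted(percentages, key=percentages.get, reverse=True)


if __name__ == "__main__":
    print(", ".join(rank_by_frequency(composition)))
```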
7. MISCELLANEOUS STATISTICS
4447 entries are encoded on a mitochondrion, and 3573 are encoded on a plasmid.
12175 entries are encoded on a plastid,
of which 21 are encoded on apicoplasts,
11617 on chloroplasts,
44 on organellar chromatophores,
145 on cyanelles,
149 on non-photosynthetic plastids and
199 on unspecified types of plastid.
Number of entries with at least one sequence correction: 69189