UniProt Knowledgebase |
Release notes UniProtKB release 13.0 of 26-Feb-2008 |
Related documents: UniProtKB user manual, Recent changes, Forthcoming changes.
| Introduction |
|---|
Release 13.0 of the UniProt Knowledgebase is composed of the UniProtKB/Swiss-Prot Protein Knowledgebase release 55.0 and the UniProtKB/TrEMBL Protein Database release 38.0.
More information on these databases can be found in the user manual, under "What is the UniProt Knowledgebase?".
| UniProtKB/Swiss-Prot protein knowledgebase release 55.0 statistics |
|---|
The growth of the database is summarized below.
| Release | Date (MM/YY) | Number of entries | Number of amino acids |
|---|---|---|---|
| 2.0 | 09/86 | 3'939 | 900'163 |
| 3.0 | 11/86 | 4'160 | 969'641 |
| 4.0 | 04/87 | 4'387 | 1'036'010 |
| 5.0 | 09/87 | 5'205 | 1'327'683 |
| 6.0 | 01/88 | 6'102 | 1'653'982 |
| 7.0 | 04/88 | 6'821 | 1'885'771 |
| 8.0 | 08/88 | 7'724 | 2'224'465 |
| 9.0 | 11/88 | 8'702 | 2'498'140 |
| 10.0 | 03/89 | 10'008 | 2'952'613 |
| 11.0 | 07/89 | 10'856 | 3'265'966 |
| 12.0 | 10/89 | 12'305 | 3'797'482 |
| 13.0 | 01/90 | 13'837 | 4'347'336 |
| 14.0 | 04/90 | 15'409 | 4'914'264 |
| 15.0 | 08/90 | 16'941 | 5'486'399 |
| 16.0 | 11/90 | 18'364 | 5'986'949 |
| 17.0 | 02/91 | 20'024 | 6'524'504 |
| 18.0 | 05/91 | 20'772 | 6'792'034 |
| 19.0 | 08/91 | 21'795 | 7'173'785 |
| 20.0 | 11/91 | 22'654 | 7'500'130 |
| 21.0 | 03/92 | 23'742 | 7'866'596 |
| 22.0 | 05/92 | 25'044 | 8'375'696 |
| 23.0 | 08/92 | 26'706 | 9'011'391 |
| 24.0 | 12/92 | 28'154 | 9'545'427 |
| 25.0 | 04/93 | 29'955 | 10'214'020 |
| 26.0 | 07/93 | 31'808 | 10'875'091 |
| 27.0 | 10/93 | 33'329 | 11'484'420 |
| 28.0 | 02/94 | 36'000 | 12'496'420 |
| 29.0 | 06/94 | 38'303 | 13'464'008 |
| 30.0 | 10/94 | 40'292 | 14'147'368 |
| 31.0 | 02/95 | 43'470 | 15'335'248 |
| 32.0 | 11/95 | 49'340 | 17'385'503 |
| 33.0 | 02/96 | 52'205 | 18'531'384 |
| 34.0 | 10/96 | 59'021 | 21'210'389 |
| 35.0 | 11/97 | 69'113 | 25'083'768 |
| 36.0 | 07/98 | 74'019 | 26'840'295 |
| 37.0 | 12/98 | 77'977 | 28'268'293 |
| 38.0 | 07/99 | 80'000 | 29'085'965 |
| 39.0 | 05/00 | 86'593 | 31'411'114 |
| 40.0 | 10/01 | 101'602 | 37'315'215 |
| 41.0 | 02/03 | 122'564 | 44'986'459 |
| 42.0 | 10/03 | 135'850 | 50'046'799 |
| 43.0 | 03/04 | 146'720 | 54'093'154 |
| 44.0 | 07/04 | 153'871 | 56'608'159 |
| 45.0 | 10/04 | 163'235 | 59'631'787 |
| 46.0 | 02/05 | 168'297 | 61'443'278 |
| 47.0 | 05/05 | 181'577 | 65'746'672 |
| 48.0 | 09/05 | 194'317 | 70'391'852 |
| 49.0 | 02/06 | 207'132 | 75'438'310 |
| 50.0 | 05/06 | 222'289 | 81'585'146 |
| 51.0 | 10/06 | 241'242 | 88'541'632 |
| 52.0 | 03/07 | 261'513 | 95'638'062 |
| 53.0 | 05/07 | 269'293 | 98'902'758 |
| 54.0 | 07/07 | 276'256 | 101'466'206 |
| 55.0 | 02/08 | 356'194 | 127'836'513 |
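The figures in this table use an apostrophe as the thousands separator, so they need a small conversion before any arithmetic. The following is a minimal Python sketch of that conversion, applied to the last three rows copied from the table; the helper name `parse_row` is purely illustrative and not part of any UniProt tool.

```python
# Last three rows of the growth table, copied verbatim.
rows = """
| 53.0 | 05/07 | 269'293 | 98'902'758 |
| 54.0 | 07/07 | 276'256 | 101'466'206 |
| 55.0 | 02/08 | 356'194 | 127'836'513 |
"""

def parse_row(line):
    """Split one table row and convert apostrophe-grouped numbers to int."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    release, date, entries, residues = cells
    return release, date, int(entries.replace("'", "")), int(residues.replace("'", ""))

table = [parse_row(line) for line in rows.strip().splitlines()]

# Net change in the number of entries between consecutive releases.
for (prev_rel, _, prev_n, _), (rel, _, n, _) in zip(table, table[1:]):
    delta = n - prev_n
    print(f"{prev_rel} -> {rel}: {delta:+d} entries ({100 * delta / prev_n:.1f} %)")
```

The difference computed here is a net change. It need not equal the number of sequences added in a release, since entries can also be removed.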
In rare cases, UniProtKB/Swiss-Prot entries are removed. Deleted entries are almost exclusively Open Reading Frames (ORFs) that were wrongly predicted to code for proteins. When there is enough evidence that these hypothetical proteins are not real, we remove them from UniProtKB/Swiss-Prot. The document delac_sp.txt lists all accession numbers that were previously present in UniProtKB/Swiss-Prot but have since been deleted from the database.
We have selected a number of organisms that are the target of genome sequencing and/or mapping projects and for which we intend to:
From our efforts to annotate human sequence entries as completely as possible arose the HPI project, and the bacterial model organisms became the focus of the HAMAP project. Here is the current status of the model organisms which are not covered by these two projects:
| Organism | Database cross-references | Index file | Number of sequences |
|---|---|---|---|
| A.thaliana | TAIR | arath.txt | 6'461 |
| C.albicans | None yet | calbican.txt | 679 |
| C.elegans | Wormpep | celegans.txt | 3'153 |
| D.discoideum | DictyBase | dicty.txt | 587 |
| D.melanogaster | FlyBase | fly.txt | 2'747 |
| M.musculus | MGD | mgdtosp.txt | 15'015 |
| S.cerevisiae | SGD | yeast.txt | 6'556 |
| S.pombe | GeneDB_SPombe | pombe.txt | 4'198 |
1. INTRODUCTION
Release 55.0 of 26-Feb-08 of UniProtKB/Swiss-Prot contains 356194 sequence entries,
comprising 127836513 amino acids abstracted from 165776 references.
80183 sequences have been added since release 54.0, the sequence data of
1411 existing entries has been updated and the annotations of
262009 entries have been revised.
2. AMINO ACID COMPOSITION
2.1 Composition in percent for the complete database
Ala (A) 8.09 Gln (Q) 3.95 Leu (L) 9.67 Ser (S) 6.69
Arg (R) 5.49 Glu (E) 6.72 Lys (K) 5.89 Thr (T) 5.36
Asn (N) 4.05 Gly (G) 7.02 Met (M) 2.41 Trp (W) 1.10
Asp (D) 5.40 His (H) 2.29 Phe (F) 3.89 Tyr (Y) 2.95
Cys (C) 1.43 Ile (I) 5.90 Pro (P) 4.79 Val (V) 6.81
Asx (B) 0.000 Glx (Z) 0.000 Xaa (X) 0.00
2.2 Classification of the amino acids by their frequency
Leu, Ala, Gly, Val, Glu, Ser, Ile, Lys, Arg, Asp, Thr, Pro, Asn, Gln,
Phe, Tyr, Met, His, Cys, Trp
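
The ordering in 2.2 follows from sorting the percentages in 2.1 in descending order. A short Python sketch, with the values copied from table 2.1 (the variable names are illustrative):

```python
# Percent composition from section 2.1 (ambiguity codes B, Z, X omitted).
composition = {
    "Ala": 8.09, "Gln": 3.95, "Leu": 9.67, "Ser": 6.69,
    "Arg": 5.49, "Glu": 6.72, "Lys": 5.89, "Thr": 5.36,
    "Asn": 4.05, "Gly": 7.02, "Met": 2.41, "Trp": 1.10,
    "Asp": 5.40, "His": 2.29, "Phe": 3.89, "Tyr": 2.95,
    "Cys": 1.43, "Ile": 5.90, "Pro": 4.79, "Val": 6.81,
}

# Rank the residues from most to least frequent.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))
```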
3. TAXONOMIC ORIGIN
Total number of species represented in this release of UniProtKB/Swiss-Prot: 11290
The first twenty species represent 92957 sequences: 26.1 % of the total
number of entries.
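
The share quoted above can be reproduced from the first twenty counts in table 3.2 and the total number of entries given in section 1. A minimal Python sketch of the arithmetic:

```python
# Frequencies of the twenty most represented species (table 3.2, ranks 1-20).
top20 = [
    18609, 15015, 6783, 6556, 6461, 4846, 4343, 4198, 3153, 2871,
    2747, 2559, 2153, 1973, 1931, 1922, 1782, 1774, 1678, 1603,
]
total_entries = 356194  # sequence entries in release 55.0

top20_sum = sum(top20)
share = 100 * top20_sum / total_entries
print(f"{top20_sum} sequences, {share:.1f} % of all entries")
```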
3.1 Table of the frequency of occurrence of species
Species represented 1x: 5244
2x: 1710
3x: 810
4x: 542
5x: 372
6x: 330
7x: 229
8x: 195
9x: 159
10x: 99
11- 20x: 478
21- 50x: 327
51-100x: 141
>100x: 654
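
As a consistency check, the bins of table 3.1 should add up to the total number of species stated in section 3. The Python sketch below performs that check and also derives the fraction of species represented by a single sequence; the dictionary keys are just labels chosen for this example.

```python
# Number of species per occurrence bin, copied from table 3.1.
species_bins = {
    "1x": 5244, "2x": 1710, "3x": 810, "4x": 542, "5x": 372,
    "6x": 330, "7x": 229, "8x": 195, "9x": 159, "10x": 99,
    "11-20x": 478, "21-50x": 327, "51-100x": 141, ">100x": 654,
}
total_species = 11290  # total stated in section 3

bins_sum = sum(species_bins.values())
print(bins_sum == total_species)

# Fraction of species that occur exactly once.
singleton_share = 100 * species_bins["1x"] / total_species
print(f"{singleton_share:.1f} % of species are represented by one sequence")
```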
3.2 Table of the most represented species
------ --------- --------------------------------------------
Number Frequency Species
------ --------- --------------------------------------------
1 18609 Homo sapiens (Human)
2 15015 Mus musculus (Mouse)
3 6783 Rattus norvegicus (Rat)
4 6556 Saccharomyces cerevisiae (Baker's yeast)
5 6461 Arabidopsis thaliana (Mouse-ear cress)
6 4846 Bos taurus (Bovine)
7 4343 Escherichia coli (strain K12)
8 4198 Schizosaccharomyces pombe (Fission yeast)
9 3153 Caenorhabditis elegans
10 2871 Bacillus subtilis
11 2747 Drosophila melanogaster (Fruit fly)
12 2559 Xenopus laevis (African clawed frog)
13 2153 Pongo pygmaeus (Orangutan)
14 1973 Gallus gallus (Chicken)
15 1931 Danio rerio (Zebrafish) (Brachydanio rerio)
16 1922 Escherichia coli O157:H7
17 1782 Methanocaldococcus jannaschii (Methanococcus jannaschii)
18 1774 Haemophilus influenzae
19 1678 Salmonella typhimurium
20 1603 Escherichia coli O6
21 1601 Shigella flexneri
22 1529 Oryza sativa subsp. japonica (Rice)
23 1436 Mycobacterium tuberculosis
24 1287 Sus scrofa (Pig)
25 1271 Salmonella typhi
26 1234 Pseudomonas aeruginosa
27 1177 Mycobacterium bovis
28 1048 Macaca fascicularis (Crab eating macaque) (Cynomolgus monkey)
29 1009 Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
30 979 Synechocystis sp. (strain PCC 6803)
31 974 Archaeoglobus fulgidus
32 938 Yersinia pestis
33 909 Acanthamoeba polyphaga mimivirus (APMV)
34 904 Vibrio cholerae
35 882 Rhizobium meliloti (Sinorhizobium meliloti)
36 863 Oryctolagus cuniculus (Rabbit)
37 845 Staphylococcus aureus (strain Mu50 / ATCC 700699)
38 844 Staphylococcus aureus (strain N315)
39 823 Salmonella paratyphi A
40 816 Staphylococcus aureus (strain MW2)
41 816 Staphylococcus aureus (strain COL)
42 812 Staphylococcus aureus (strain MSSA476)
43 808 Staphylococcus aureus (strain MRSA252)
44 780 Yersinia pseudotuberculosis
45 769 Salmonella choleraesuis
46 757 Aquifex aeolicus
47 753 Vibrio parahaemolyticus
48 749 Pasteurella multocida
49 745 Shigella sonnei (strain Ss046)
50 744 Escherichia coli O6:K15:H31 (strain 536 / UPEC)
51 736 Canis familiaris (Dog)
52 715 Shigella boydii serotype 4 (strain Sb227)
53 705 Shigella dysenteriae serotype 1 (strain Sd197)
54 703 Ashbya gossypii (Yeast) (Eremothecium gossypii)
55 699 Erwinia carotovora subsp. atroseptica (Pectobacterium atrosepticum)
56 696 Vibrio vulnificus
57 695 Streptomyces coelicolor
58 687 Staphylococcus epidermidis (strain ATCC 35984 / RP62A)
59 687 Mycoplasma pneumoniae
60 686 Staphylococcus epidermidis (strain ATCC 12228)
61 683 Bacillus halodurans
62 683 Neurospora crassa
63 679 Vibrio vulnificus (strain YJ016)
64 679 Candida albicans (Yeast)
65 674 Escherichia coli (strain UTI89 / UPEC)
66 672 Kluyveromyces lactis (Yeast) (Candida sphaerica)
67 668 Photorhabdus luminescens subsp. laumondii
68 664 Pan troglodytes (Chimpanzee)
69 655 Escherichia coli O9:H4 (strain HS)
70 652 Bacillus anthracis
71 650 Escherichia coli O139:H28 (strain E24377A / ETEC)
72 641 Mycobacterium leprae
73 641 Anabaena sp. (strain PCC 7120)
74 629 Pseudomonas syringae pv. tomato
75 626 Pseudomonas putida (strain KT2440)
76 625 Candida glabrata (Yeast) (Torulopsis glabrata)
77 615 Escherichia coli
78 612 Treponema pallidum
79 612 Shigella flexneri serotype 5b (strain 8401)
80 608 Bradyrhizobium japonicum
81 607 Yersinia pestis (biovar Antiqua strain Nepal516)
82 602 Yersinia pestis (biovar Antiqua strain Antiqua)
83 602 Zea mays (Maize)
84 593 Methanobacterium thermoautotrophicum
85 593 Staphylococcus aureus (strain NCTC 8325)
86 587 Dictyostelium discoideum (Slime mold)
87 587 Agrobacterium tumefaciens (strain C58 / ATCC 33970)
88 587 Yersinia enterocolitica serotype O:8 / biotype 1B (strain 8081)
89 581 Rickettsia prowazekii
90 580 Bacillus cereus (strain ATCC 14579 / DSM 31)
91 579 Ralstonia solanacearum (Pseudomonas solanacearum)
92 577 Helicobacter pylori (Campylobacter pylori)
93 572 Buchnera aphidicola subsp. Acyrthosiphon pisum
94 569 Yersinia pseudotuberculosis (serotype O:1b / strain IP 31758)
95 568 Rhizobium loti (Mesorhizobium loti)
96 567 Shewanella oneidensis
97 562 Buchnera aphidicola subsp. Schizaphis graminum
98 561 Lactococcus lactis subsp. lactis (Streptococcus lactis)
99 560 Listeria monocytogenes
100 558 Helicobacter pylori J99 (Campylobacter pylori J99)
101 558 Neisseria meningitidis serogroup B
102 557 Escherichia coli O1:K1 / APEC
103 552 Listeria innocua
104 551 Xanthomonas campestris pv. campestris
105 549 Staphylococcus aureus (strain USA300)
106 539 Staphylococcus aureus (strain bovine RF122)
107 538 Yersinia pestis (strain Pestoides F)
108 537 Photobacterium profundum (Photobacterium sp. (strain SS9))
109 537 Neisseria meningitidis serogroup A
110 532 Enterobacter sp. (strain 638)
111 527 Clostridium acetobutylicum
112 525 Staphylococcus haemolyticus (strain JCSC1435)
113 524 Caulobacter crescentus (Caulobacter vibrioides)
114 523 Staphylococcus saprophyticus subsp. saprophyticus
115 521 Brucella melitensis
116 521 Brucella suis
117 519 Bacillus cereus (strain ATCC 10987)
118 517 Xanthomonas axonopodis pv. citri
119 516 Klebsiella pneumoniae subsp. pneumoniae (strain ATCC 700721 / MGH 78578)
120 508 Citrobacter koseri (strain ATCC BAA-895 / CDC 4225-83 / SGSC4696)
121 507 Buchnera aphidicola subsp. Baizongia pistaciae
122 505 Oceanobacillus iheyensis
123 503 Streptococcus pneumoniae
124 499 Serratia proteamaculans (strain 568)
125 499 Bacillus thuringiensis subsp. konkukian
126 496 Emericella nidulans (Aspergillus nidulans)
127 495 Yarrowia lipolytica (Candida lipolytica)
128 495 Xylella fastidiosa
129 494 Thermotoga maritima
130 493 Listeria monocytogenes serotype 4b (strain F2365)
131 491 Vibrio fischeri (strain ATCC 700601 / ES114)
132 490 Rickettsia conorii
133 486 Bacillus cereus (strain ZK / E33L)
134 486 Xylella fastidiosa (strain Temecula1 / ATCC 700964)
135 486 Pseudomonas syringae pv. syringae (strain B728a)
136 483 Mycoplasma genitalium
137 478 Bordetella bronchiseptica (Alcaligenes bronchisepticus)
138 478 Pseudomonas fluorescens (strain PfO-1)
139 476 Haemophilus ducreyi
140 475 Bacillus licheniformis (strain DSM 13 / ATCC 14580)
141 475 Pseudomonas fluorescens (strain Pf-5 / ATCC BAA-477)
142 472 Deinococcus radiodurans
143 471 Debaryomyces hansenii (Yeast) (Torulaspora hansenii)
144 469 Bordetella pertussis
145 468 Bordetella parapertussis
146 468 Chromobacterium violaceum
147 468 Pseudomonas syringae pv. phaseolicola (strain 1448A / Race 6)
148 466 Clostridium perfringens
149 465 Corynebacterium glutamicum (Brevibacterium flavum)
150 462 Methanosarcina acetivorans
151 459 Pseudomonas aeruginosa (strain UCBPP-PA14)
152 459 Enterobacter sakazakii (strain ATCC BAA-894)
153 451 Pyrococcus horikoshii
154 447 Pyrococcus abyssi
155 445 Rickettsia felis (Rickettsia azadi)
156 444 Halobacterium salinarium (Halobacterium halobium)
157 443 Sodalis glossinidius (strain morsitans)
158 443 Brucella abortus
159 442 Methanosarcina mazei (Methanosarcina frisia)
160 440 Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
161 440 Haemophilus influenzae (strain 86-028NP)
162 439 Mannheimia succiniciproducens (strain MBEL55E)
163 439 Enterococcus faecalis (Streptococcus faecalis)
164 439 Streptomyces avermitilis
165 437 Chlamydia trachomatis
166 433 Lactobacillus plantarum
167 432 Rickettsia bellii (strain RML369-C)
168 431 Streptococcus mutans
169 430 Thermoanaerobacter tengcongensis
170 429 Synechococcus elongatus (Thermosynechococcus elongatus)
171 429 Pyrococcus furiosus
172 428 Burkholderia pseudomallei (Pseudomonas pseudomallei)
173 428 Bacillus clausii (strain KSM-K16)
174 426 Streptococcus pyogenes serotype M6
175 426 Xanthomonas campestris pv. campestris (strain 8004)
176 425 Borrelia burgdorferi (Lyme disease spirochete)
177 425 Ovis aries (Sheep)
178 423 Nicotiana tabacum (Common tobacco)
179 422 Geobacillus kaustophilus
180 418 Campylobacter jejuni
181 418 Pseudomonas entomophila (strain L48)
182 417 Chlamydia pneumoniae (Chlamydophila pneumoniae)
183 416 Acinetobacter sp. (strain ADP1)
184 412 Rhodopseudomonas palustris
185 408 Shewanella sp. (strain MR-7)
186 407 Chlamydia muridarum
187 407 Brucella abortus (strain 2308)
188 406 Synechococcus sp. (strain PCC 7942) (Anacystis nidulans R2)
189 406 Rhizobium sp. (strain NGR234)
190 405 Burkholderia mallei (Pseudomonas mallei)
191 404 Pseudomonas aeruginosa (strain PA7)
192 404 Streptococcus pyogenes serotype M1
193 404 Shewanella sp. (strain MR-4)
194 404 Sulfolobus solfataricus
195 403 Rickettsia typhi
196 402 Xanthomonas campestris pv. vesicatoria (strain 85-10)
197 401 Streptococcus pyogenes serotype M18
198 399 Burkholderia sp. (strain 383) (Burkholderia cepacia
199 399 Streptococcus pyogenes serotype M3
200 398 Anabaena variabilis (strain ATCC 29413 / PCC 7937)
201 395 Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
202 392 Nitrosomonas europaea
203 392 Xanthomonas oryzae pv. oryzae (strain MAFF 311018)
204 390 Methylococcus capsulatus
205 389 Ralstonia eutropha (strain JMP134) (Alcaligenes eutrophus)
206 388 Vibrio cholerae (strain ATCC 39541 / O395)
207 388 Vibrio harveyi (strain ATCC BAA-1116 / BB120)
208 386 Gloeobacter violaceus
209 385 Corynebacterium efficiens
210 382 Chlorobium tepidum
211 380 Aspergillus fumigatus (Sartorya fumigata)
212 379 Neisseria gonorrhoeae (strain ATCC 700825 / FA 1090)
213 378 Ralstonia eutropha (Cupriavidus necator
214 377 Rhodobacter sphaeroides (strain ATCC 17023 / 2.4.1 / NCIB 8253 / DSM 158)
215 376 Pseudomonas putida (strain F1 / ATCC 700007)
216 376 Pyrococcus kodakaraensis (Thermococcus kodakaraensis)
217 374 Solanum tuberosum (Potato)
218 373 Shewanella sp. (strain ANA-3)
219 371 Idiomarina loihiensis
220 368 Mycobacterium paratuberculosis
221 368 Synechococcus sp. (strain WH8102)
222 368 Shewanella frigidimarina (strain NCIMB 400)
223 367 Pseudoalteromonas haloplanktis (strain TAC 125)
224 366 Streptococcus agalactiae serotype III
225 365 Colwellia psychrerythraea (strain 34H / ATCC BAA-681) (Vibrio psychroerythus)
226 364 Methanopyrus kandleri
227 364 Synechococcus sp. (strain ATCC 27144 / PCC 6301 / SAUG 1402/1)
228 362 Streptococcus agalactiae serotype V
229 362 Oryza sativa subsp. indica (Rice)
230 358 Staphylococcus aureus (strain Newman)
231 358 Xanthomonas oryzae pv. oryzae
232 357 Hahella chejuensis (strain KCTC 2396)
233 357 Leptospira interrogans
234 356 Prochlorococcus marinus (strain MIT 9313)
235 356 Coxiella burnetii
236 355 Azoarcus sp. (strain EbN1) (Aromatoleum aromaticum (strain EbN1))
237 355 Dechloromonas aromatica (strain RCB)
238 354 Burkholderia xenovorans (strain LB400)
239 353 Aeropyrum pernix
240 352 Prochlorococcus marinus
241 351 Staphylococcus aureus (strain Mu3 / ATCC 700698)
242 350 Shewanella baltica (strain OS185)
243 349 Shewanella sp. (strain W3-18-1)
244 349 Burkholderia cenocepacia (strain AU 1054)
245 349 Pisum sativum (Garden pea)
246 348 Burkholderia thailandensis (strain E264 / ATCC 700388 / DSM 13276 / CIP 106301)
247 347 Leptospira interrogans serogroup Icterohaemorrhagiae serovar copenhageni
248 346 Haemophilus influenzae (strain PittEE)
249 346 Aeromonas hydrophila subsp. hydrophila (strain ATCC 7966 / NCIB 9240)
250 345 Geobacter sulfurreducens
251 342 Shewanella putrefaciens (strain CN-32 / ATCC BAA-453)
252 340 Burkholderia pseudomallei (strain 1710b)
253 339 Sulfolobus tokodaii
254 339 Glycine max (Soybean)
255 338 Listeria welshimeri serovar 6b (strain ATCC 35897 / DSM 20650 / SLCC5334)
256 338 Shewanella denitrificans (strain OS217 / ATCC BAA-1090 / DSM 15013)
257 338 Thermus thermophilus (strain HB8 / ATCC 27634 / DSM 579)
258 337 Aeromonas salmonicida (strain A449)
259 335 Rhizobium etli (strain CFN 42 / ATCC 51251)
260 331 Prochlorococcus marinus subsp. pastoris (strain CCMP 1378 / MED4)
261 331 Shewanella baltica (strain OS155 / ATCC BAA-1091)
262 330 Actinobacillus pleuropneumoniae serotype 5b (strain L20)
263 330 Legionella pneumophila (strain Paris)
264 330 Pseudomonas mendocina (strain ymp)
265 330 Shewanella loihica (strain BAA-1088 / PV-4)
266 330 Shewanella amazonensis (strain ATCC BAA-1098 / SB2B)
267 329 Nitrosococcus oceani (strain ATCC 19707 / NCIMB 11848)
268 329 Legionella pneumophila (strain Lens)
269 328 Macaca mulatta (Rhesus macaque)
270 327 Nocardia farcinica
271 327 Bacillus amyloliquefaciens (strain FZB42)
272 327 Haemophilus influenzae (strain PittGG)
273 327 Rhodopirellula baltica
274 326 Silicibacter pomeroyi
275 325 Desulfovibrio vulgaris (strain Hildenborough / ATCC 29579 / NCIMB 8303)
276 323 Haemophilus somnus (strain 129Pt) (Histophilus somni (strain 129Pt))
277 323 Ralstonia metallidurans (strain CH34 / ATCC 43123 / DSM 2839)
278 323 Legionella pneumophila subsp. pneumophila
279 323 Fusobacterium nucleatum subsp. nucleatum
280 321 Thiobacillus denitrificans (strain ATCC 25259)
281 321 Thermoplasma acidophilum
282 320 Rhizobium leguminosarum bv. viciae (strain 3841)
283 316 Zymomonas mobilis
284 314 Burkholderia cenocepacia (strain HI2424)
285 313 Symbiobacterium thermophilum
286 313 Wolinella succinogenes
287 312 Bacillus pumilus (strain SAFR-032)
288 312 Neisseria meningitidis serogroup C / serotype 2a (strain ATCC 700532 / FAM18)
289 312 Chromohalobacter salexigens (strain DSM 3043 / ATCC BAA-138 / NCIMB 13768)
290 311 Mycobacterium tuberculosis (strain ATCC 25177 / H37Ra)
291 311 Triticum aestivum (Wheat)
292 309 Bacteroides thetaiotaomicron
293 307 Saccharophagus degradans (strain 2-40 / ATCC 43961 / DSM 17024)
294 307 Clostridium tetani
295 307 Streptococcus agalactiae serotype Ia
296 306 Bordetella avium (strain 197N)
297 305 Thermus thermophilus (strain HB27 / ATCC BAA-163 / DSM 7039)
298 305 Lactococcus lactis subsp. cremoris (strain MG1363)
299 305 Methanococcus maripaludis
300 304 Corynebacterium diphtheriae
301 304 Caenorhabditis briggsae
302 303 Burkholderia cepacia (strain ATCC 53795 / AMMD)
303 303 Mycobacterium bovis (strain BCG / Paris 1173P2)
304 302 Geobacter metallireducens (strain GS-15 / ATCC 53774 / DSM 7210)
305 301 Campylobacter jejuni (strain RM1221)
306 300 Nitrosospira multiformis (strain ATCC 25196 / NCIMB 11849)
307 299 Clostridium perfringens (strain ATCC 13124 / NCTC 8237 / Type A)
308 299 Geobacillus thermodenitrificans (strain NG80-2)
309 298 Hordeum vulgare (Barley)
310 297 Methanosarcina barkeri (strain Fusaro / DSM 804)
311 297 Rhodopseudomonas palustris (strain HaA2)
312 296 Sulfolobus acidocaldarius
313 296 Staphylococcus aureus
314 295 Nitrobacter winogradskyi (strain Nb-255 / ATCC 25391)
315 294 Streptococcus thermophilus (strain CNRZ 1066)
316 293 Actinobacillus succinogenes (strain ATCC 55618 / 130Z)
317 293 Pseudoalteromonas atlantica (strain T6c / BAA-1087)
318 293 Rhodopseudomonas palustris (strain BisB18)
319 292 Haloarcula marismortui (Halobacterium marismortui)
320 292 Streptococcus thermophilus (strain ATCC BAA-250 / LMG 18311)
321 291 Cavia porcellus (Guinea pig)
322 291 Rhodospirillum rubrum (strain ATCC 11170 / NCIB 8255)
323 290 Pelobacter carbinolicus (strain DSM 2380 / Gra Bd 1)
324 290 Streptococcus pneumoniae serotype 2 (strain D39 / NCTC 7466)
325 289 Pseudomonas stutzeri (strain A1501)
326 289 Bacteroides fragilis
327 289 Pyrobaculum aerophilum
328 288 Psychromonas ingrahamii (strain 37)
329 288 Rhodoferax ferrireducens (strain DSM 15236 / ATCC BAA-621 / T118)
330 287 Prochlorococcus marinus (strain NATL2A)
331 287 Bacillus thuringiensis (strain Al Hakam)
332 287 Thiomicrospira crunogena (strain XCL-2)
333 285 Thermoplasma volcanium
334 284 Burkholderia vietnamiensis (strain G4 / LMG 22486) (Burkholderia cepacia
335 284 Clostridium perfringens (strain SM101 / Type A)
336 284 Methylobacillus flagellatus (strain KT / ATCC 51484 / DSM 6875)
337 284 Gluconobacter oxydans (Gluconobacter suboxydans)
338 283 Burkholderia pseudomallei (strain 1106a)
339 283 Bartonella henselae (Rochalimaea henselae)
340 281 Helicobacter hepaticus
341 280 Nitrobacter hamburgensis (strain X14 / DSM 10229)
342 280 Sinorhizobium medicae (strain WSM419) (Ensifer medicae)
343 278 Spinacia oleracea (Spinach)
344 278 Rhodopseudomonas palustris (strain BisB5)
345 277 Pseudomonas putida
346 277 Alcanivorax borkumensis (strain SK2 / ATCC 700651 / DSM 11573)
347 277 Carboxydothermus hydrogenoformans (strain Z-2901 / DSM 6008)
348 277 Prochlorococcus marinus (strain MIT 9312)
349 276 Streptococcus pyogenes serotype M28
350 276 Bartonella quintana (Rochalimaea quintana)
351 276 Mesorhizobium sp. (strain BNC1)
352 275 Wigglesworthia glossinidia brevipalpis
353 274 Cryptococcus neoformans (Filobasidiella neoformans)
354 274 Bifidobacterium longum
355 273 Psychrobacter arcticum
356 273 Desulfotalea psychrophila
357 273 Azoarcus sp. (strain BH72)
358 272 Synechococcus sp. (strain CC9902)
359 272 Burkholderia mallei (strain NCTC 10247)
360 271 Burkholderia mallei (strain NCTC 10229)
361 270 Gorilla gorilla gorilla (Lowland gorilla)
362 270 Burkholderia pseudomallei (strain 668)
363 269 Equus caballus (Horse)
364 269 Ochrobactrum anthropi (strain ATCC 49188 / DSM 6882 / NCTC 12168)
365 268 Bacteriophage T4
366 268 Lactobacillus johnsonii
367 268 Alkalilimnicola ehrlichei (strain MLHE-1)
368 267 Marinobacter aquaeolei (Marinobacter hydrocarbonoclasticus
369 267 Roseobacter denitrificans (strain ATCC 33942 / OCh 114) (Erythrobacter sp.
370 267 Porphyromonas gingivalis (Bacteroides gingivalis)
371 267 Streptococcus pyogenes serotype M5 (strain Manfredo)
372 266 Lactococcus lactis subsp. cremoris (strain SK11)
373 266 Burkholderia mallei (strain SAVP1)
374 266 Leifsonia xyli subsp. xyli
375 264 Synechococcus sp. (strain CC9605)
376 264 Streptococcus sanguinis (strain SK36)
377 264 Lactobacillus sakei subsp. sakei (strain 23K)
378 263 Ureaplasma parvum (Ureaplasma urealyticum biotype 1)
379 262 Rhodobacter capsulatus (Rhodopseudomonas capsulata)
380 262 Nitrosomonas eutropha (strain C71)
381 262 Streptococcus thermophilus (strain ATCC BAA-491 / LMD-9)
382 261 Blochmannia floridanus
383 261 Moorella thermoacetica (strain ATCC 39073)
384 258 Ustilago maydis (Smut fungus)
385 257 Chlamydophila caviae
386 257 Propionibacterium acnes
387 257 Campylobacter jejuni subsp. jejuni serotype O:23/36 (strain 81-176)
388 257 Bacteroides fragilis (strain ATCC 25285 / NCTC 9343)
389 256 Lactobacillus acidophilus
390 256 Psychrobacter cryohalolentis (strain K5)
391 255 Rhodopseudomonas palustris (strain BisA53)
392 254 Francisella tularensis subsp. tularensis
393 254 Vaccinia virus (strain Copenhagen) (VACV)
394 253 Synechococcus sp. (strain JA-2-3B'a(2-13))
395 251 Helicobacter pylori (strain HPAG1)
396 250 Legionella pneumophila (strain Corby)
397 250 Synechococcus sp. (strain JA-3-3Ab)
398 250 Silicibacter sp. (strain TM1040)
399 249 Jannaschia sp. (strain CCS1)
400 246 Brucella ovis (strain ATCC 25840 / 63/290 / NCTC 10512)
401 245 Desulfovibrio desulfuricans (strain G20)
402 244 Campylobacter jejuni subsp. jejuni serotype O:6 (strain 81116 / NCTC 11828)
403 244 Streptococcus pyogenes serotype M12 (strain MGAS9429)
404 243 Rhodobacter sphaeroides (strain ATCC 17029 / ATH 2.4.9)
405 243 Chlorobium chlorochromatii (strain CaD3)
406 242 Streptococcus pyogenes serotype M4 (strain MGAS10750)
407 241 Streptococcus pyogenes serotype M2 (strain MGAS10270)
408 240 Magnetospirillum magneticum (strain AMB-1 / ATCC 700264)
409 239 Bdellovibrio bacteriovorus
410 239 Francisella tularensis subsp. holarctica (strain LVS)
411 237 Trichodesmium erythraeum (strain IMS101)
412 237 Corynebacterium jeikeium (strain K411)
413 237 Clostridium novyi (strain NT)
414 237 Aspergillus oryzae
415 235 Bacillus stearothermophilus (Geobacillus stearothermophilus)
416 235 Halorhodospira halophila (strain DSM 244 / SL1) (Ectothiorhodospira halophila
417 234 Mycobacterium ulcerans (strain Agy99)
418 234 Polaromonas sp. (strain JS666 / ATCC BAA-500)
419 234 Bradyrhizobium sp. (strain BTAi1 / ATCC BAA-1182)
420 233 Rhodococcus sp. (strain RHA1)
421 233 Pelodictyon luteolum (strain DSM 273) (Chlorobium luteolum (strain DSM 273))
422 233 Anaeromyxobacter dehalogenans (strain 2CP-C)
423 232 Corynebacterium glutamicum (strain R)
424 232 Thermobifida fusca (strain YX)
425 230 Treponema denticola
426 230 Novosphingobium aromaticivorans (strain DSM 12444)
427 229 Lactobacillus salivarius subsp. salivarius (strain UCC118)
428 229 Acidovorax avenae subsp. citrulli (strain AAC00-1)
429 229 Chlamydomonas reinhardtii
430 229 Bradyrhizobium sp. (strain ORS278)
431 227 Blochmannia pennsylvanicus (strain BPEN)
432 227 Sulfurimonas denitrificans (Thiomicrospira denitrificans
433 226 Clostridium botulinum (strain ATCC 19397 / Type A)
434 226 Thermotoga petrophila (strain RKU-1 / ATCC BAA-488 / DSM 13995)
435 226 Streptococcus pyogenes serotype M12 (strain MGAS2096)
436 226 Prochlorococcus marinus (strain MIT 9301)
437 226 Clostridium botulinum (strain Langeland / NCTC 10281 / Type F)
438 224 Prochlorococcus marinus (strain MIT 9515)
439 224 Mycobacterium avium (strain 104)
440 223 Desulfitobacterium hafniense (strain Y51)
441 221 Myxococcus xanthus (strain DK 1622)
442 220 Prochlorococcus marinus (strain AS9601)
443 220 Francisella tularensis subsp. tularensis (strain FSC 198)
444 219 Chlamydia trachomatis (strain A/HAR-13 / ATCC VR-571B)
445 219 Desulfovibrio vulgaris subsp. vulgaris (strain DP4)
446 219 Campylobacter jejuni subsp. doylei (strain ATCC BAA-1458 / RM4099 / 269.97)
447 219 Baumannia cicadellinicola subsp. Homalodisca coagulata
448 219 Synechococcus sp. (strain CC9311)
449 217 Cricetulus griseus (Chinese hamster)
450 217 Porphyra purpurea
451 217 Natronomonas pharaonis (strain DSM 2160 / ATCC 35678)
452 217 Francisella tularensis subsp. holarctica (strain OSU18)
453 216 Janthinobacterium sp. (strain Marseille) (Minibacterium massiliensis)
454 216 Paracoccus denitrificans (strain Pd 1222)
455 216 Prochlorococcus marinus (strain NATL1A)
456 215 Protochlamydia amoebophila (strain UWE25)
457 214 Klebsiella pneumoniae
458 214 Francisella tularensis subsp. novicida (strain U112)
459 214 Clostridium thermocellum (strain ATCC 27405 / DSM 1237)
460 213 Clostridium difficile (strain 630)
461 213 Syntrophus aciditrophicus (strain SB)
462 212 Felis silvestris catus (Cat)
463 212 Mycobacterium sp. (strain MCS)
464 212 Chlamydophila abortus
465 212 Synechococcus sp. (strain WH7803)
466 212 Rhodobacter sphaeroides (strain ATCC 17025 / ATH 2.4.3)
467 211 Leptospira borgpetersenii serovar Hardjo-bovis (strain JB197)
468 211 Mycobacterium vanbaalenii (strain DSM 7251 / PYR-1)
469 210 Herminiimonas arsenicoxydans
470 210 Polaromonas naphthalenivorans (strain CJ2)
471 209 Gibberella zeae (Fusarium graminearum)
472 209 Helicobacter acinonychis (strain Sheeba)
473 209 Acidovorax sp. (strain JS42)
474 208 Prochlorococcus marinus (strain MIT 9215)
475 208 Porphyra yezoensis
476 208 Prochlorococcus marinus (strain MIT 9303)
477 207 Chlorobium phaeobacteroides (strain DSM 266)
478 206 Clostridium botulinum (strain Hall / ATCC 3502 / NCTC 13319 / Type A)
479 206 Sphingopyxis alaskensis (Sphingomonas alaskensis)
480 205 Lactobacillus casei (strain ATCC 334)
481 204 Lactobacillus delbrueckii subsp. bulgaricus (strain ATCC 11842 / DSM 20081)
482 204 Mesocricetus auratus (Golden hamster)
483 204 Mycobacterium smegmatis (strain ATCC 700084 / mc(2)155)
484 203 Pediococcus pentosaceus (strain ATCC 25745 / 183-1w)
485 203 Dehalococcoides sp. (strain CBDB1)
486 203 Francisella tularensis subsp. tularensis (strain WY96-3418)
487 202 Encephalitozoon cuniculi
488 202 Methanococcus vannielii (strain SB / ATCC 35089 / DSM 1224)
489 201 Dehalococcoides ethenogenes (strain 195)
490 201 Mycobacterium sp. (strain KMS)
491 200 Pelobacter propionicus (strain DSM 2379)
492 200 Vaccinia virus (strain Western Reserve / WR) (VACV)
3.3 Taxonomic distribution of the sequences
Kingdom sequences (% of the database)
Archaea 13942 ( 4%)
Bacteria 198528 ( 56%)
Eukaryota 131953 ( 37%)
Viruses 11771 ( 3%)
Within Eukaryota:
Category sequences (% of Eukaryota) (% of the complete database)
Human 18610 ( 14%) ( 5%)
Other Mammalia 40919 ( 31%) ( 11%)
Other Vertebrata 13069 ( 10%) ( 4%)
Viridiplantae 22036 ( 17%) ( 6%)
Fungi 20495 ( 16%) ( 6%)
Insecta 5343 ( 4%) ( 2%)
Nematoda 3702 ( 3%) ( 1%)
Other 7779 ( 6%) ( 2%)
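
The percentages in 3.3 can be recomputed from the raw counts. The Python sketch below assumes the published figures are rounded to the nearest integer, which is consistent with the values shown but is not stated in the source.

```python
# Sequence counts per kingdom, copied from section 3.3.
kingdoms = {"Archaea": 13942, "Bacteria": 198528, "Eukaryota": 131953, "Viruses": 11771}

# Breakdown within Eukaryota, copied from the same section.
eukaryota = {
    "Human": 18610, "Other Mammalia": 40919, "Other Vertebrata": 13069,
    "Viridiplantae": 22036, "Fungi": 20495, "Insecta": 5343,
    "Nematoda": 3702, "Other": 7779,
}

total = sum(kingdoms.values())

# Share of each kingdom, rounded to whole percent (rounding is an assumption).
kingdom_pct = {name: round(100 * n / total) for name, n in kingdoms.items()}
for name, pct in kingdom_pct.items():
    print(f"{name:<10} {kingdoms[name]:>7} ({pct:>3}%)")

# The eukaryotic categories should add up to the Eukaryota total.
print(sum(eukaryota.values()) == kingdoms["Eukaryota"])
```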
4. SEQUENCE SIZE
Distribution of the sequences by size (excluding fragments)
From To Number From To Number
1- 50 6132 1001-1100 2680
51- 100 26810 1101-1200 1822
101- 150 38249 1201-1300 1444
151- 200 37092 1301-1400 1316
201- 250 37148 1401-1500 1051
251- 300 32525 1501-1600 522
301- 350 31514 1601-1700 396
351- 400 27716 1701-1800 338
401- 450 23033 1801-1900 330
451- 500 18961 1901-2000 259
501- 550 13546 2001-2100 159
551- 600 9888 2101-2200 230
601- 650 8484 2201-2300 211
651- 700 5714 2301-2400 142
701- 750 4531 2401-2500 104
751- 800 3622 >2500 794
801- 850 3059
851- 900 3253
901- 950 2722
951-1000 1922
The average sequence length in UniProtKB/Swiss-Prot is 358 amino acids.
The shortest sequence is GWA_SEPOF (P83570): 2 amino acids.
The longest sequence is TITIN_MOUSE (A2ASS6): 35213 amino acids.
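
The average length can be related to the totals given in section 1. Dividing the number of amino acids by the number of entries gives a value just under 359, which matches the quoted 358 if the result is truncated rather than rounded. That truncation is an assumption made for this sketch, not something the release notes state.

```python
total_residues = 127_836_513  # amino acids in release 55.0
total_entries = 356_194       # sequence entries in release 55.0

mean_length = total_residues / total_entries
truncated_mean = total_residues // total_entries  # integer part only

print(f"mean length: {mean_length:.2f} amino acids")
print(f"truncated:   {truncated_mean} amino acids")
```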
5. JOURNAL CITATIONS
Note: the following citation statistics reflect the number of distinct
journal citations.
Total number of journals cited in this release of UniProtKB/Swiss-Prot: 1884
5.1 Table of the frequency of journal citations
Journals cited 1x: 635
2x: 252
3x: 127
4x: 105
5x: 64
6x: 56
7x: 35
8x: 41
9x: 37
10x: 16
11- 20x: 152
21- 50x: 143
51-100x: 84
>100x: 137
5.2 List of the most cited journals in UniProtKB/Swiss-Prot
Nb Citations Journal name
-- --------- -------------------------------------------------------------
1 15830 Journal of Biological Chemistry
2 7462 Proceedings of the National Academy of Sciences of the U.S.A.
3 4618 Journal of Bacteriology
4 4344 Gene
5 4134 Nucleic Acids Research
6 4064 Biochemical and Biophysical Research Communications
7 3679 FEBS Letters
8 3440 Biochemistry
9 3404 The EMBO Journal
10 3015 Molecular and Cellular Biology
11 2982 European Journal of Biochemistry
12 2912 Nature
13 2752 Biochimica et Biophysica Acta
14 2639 Journal of Molecular Biology
15 2387 Genomics
16 2360 Cell
17 1958 Biochemical Journal
18 1840 Science
19 1546 Molecular Microbiology
20 1504 Journal of Virology
21 1391 Plant Molecular Biology
22 1387 Journal of Cell Biology
23 1283 Molecular and General Genetics
24 1195 Virology
25 1165 Human Molecular Genetics
26 1158 Nature Genetics
27 1148 Genes and Development
28 1108 Journal of Biochemistry
29 1054 Oncogene
30 1045 The American Journal of Human Genetics
31 1045 Plant Physiology
32 908 Development
33 868 Human Mutation
34 868 Journal of Immunology
35 827 Genetics
36 798 Infection and Immunity
37 777 Structure
38 772 Molecular Biology of the Cell
39 743 Journal of General Virology
40 730 Archives of Biochemistry and Biophysics
41 712 Yeast
42 677 Blood
43 653 The Plant Cell
44 646 Microbiology
45 603 Molecular Cell
46 584 FEMS Microbiology Letters
47 567 Cancer Research
48 567 Developmental Biology
49 563 Journal of Cell Science
50 560 Nature Structural Biology
51 544 Human Genetics
52 544 The Plant Journal
53 505 Mechanisms of Development
54 501 Current Genetics
55 487 Current Biology
56 459 Journal of Clinical Investigation
57 458 Applied and Environmental Microbiology
58 455 Neuron
59 451 Protein Science
60 450 Acta Crystallographica, Section D
61 449 Journal of Neuroscience
62 443 Mammalian Genome
63 411 Molecular Endocrinology
64 409 Molecular and Biochemical Parasitology
65 408 The Journal of Experimental Medicine
66 389 Immunogenetics
67 374 Journal of Neurochemistry
68 373 Toxicon
69 366 American Journal of Physiology
70 360 Endocrinology
71 355 Journal of Molecular Evolution
72 347 DNA and Cell Biology
73 334 The Journal of Clinical Endocrinology and Metabolism
74 334 DNA Sequence
75 316 Molecular Biology and Evolution
76 307 Bioscience, Biotechnology, and Biochemistry
77 300 Brain Research. Molecular Brain Research
78 285 Biological Chemistry Hoppe-Seyler
79 284 Journal of Medical Genetics
80 270 Proteins
81 263 Cytogenetics and Cell Genetics
82 257 Comparative Biochemistry and Physiology
83 247 Journal of Investigative Dermatology
84 244 Journal of General Microbiology
85 243 Peptides
86 238 Antimicrobial Agents and Chemotherapy
87 238 Molecular Pharmacology
88 229 Biology of Reproduction
89 226 Nature Cell Biology
90 219 Plant and Cell Physiology
91 215 Genome Research
92 215 Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
93 208 Experimental Cell Research
94 206 Virus Research
95 193 Molecular Plant-Microbe Interactions
96 191 Neurology
97 187 DNA Research
98 184 European Journal of Immunology
99 183 RNA
100 181 Developmental Dynamics
101 172 Biochimie
102 168 Annals of Neurology
103 165 European Journal of Human Genetics
104 163 Molecular and Cellular Endocrinology
105 163 Tissue Antigens
106 159 Journal of Human Genetics
107 159 DNA
108 158 Planta
109 155 Genes to Cells
110 153 American Journal of Medical Genetics
111 152 Molecular Phylogenetics and Evolution
112 152 Hemoglobin
113 149 Developmental Cell
114 148 Immunity
115 146 Archives of Microbiology
116 145 Bioorganicheskaia Khimiia
117 139 Insect Biochemistry and Molecular Biology
118 136 The New England Journal of Medicine
119 133 Molecular Reproduction and Development
120 130 Diabetes
121 130 Animal Genetics
122 130 Investigative Ophthalmology and Visual Science
123 128 Glycobiology
124 127 Molecular Immunology
125 123 General and Comparative Endocrinology
126 121 Molecular and Cellular Neuroscience
127 118 Agricultural and Biological Chemistry
128 117 Eukaryotic cell
129 116 Archives of Virology
130 111 International Journal of Cancer
131 110 British Journal of Haematology
132 109 The FASEB Journal
133 107 Journal of Protein Chemistry
134 104 Molecular Genetics and Metabolism
135 102 Journal of Neuroscience Research
136 102 EMBO Reports
137 101 Molecular Genetics and Genomics
138 100 Biochemistry and Molecular Biology International
6. STATISTICS FOR SOME LINE TYPES
The following table summarizes the total number of some UniProtKB/Swiss-Prot lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.
Line type / subtype               Total number   Number of entries   Average per entry
--------------------------------- ------------   -----------------   -----------------
References (RL) 658462 1.85
Journal 541550 286387 1.52
Submitted to EMBL/GenBank/DDBJ 110587 101077 0.31
Submitted to other databases 4361 4022 0.01
Book citation 580 570 <0.01
Plant Gene Register 539 527 <0.01
Thesis 416 414 <0.01
Unpublished observations 284 280 <0.01
Patent 139 137 <0.01
Worm Breeder's Gazette 6 6 <0.01
Comments (CC) 1457200 4.09
SIMILARITY 406912 333079 1.14
FUNCTION 254080 244576 0.71
SUBCELLULAR LOCATION 201326 197449 0.57
CATALYTIC ACTIVITY 140827 129089 0.40
SUBUNIT 137675 137675 0.39
PATHWAY 80947 70624 0.23
COFACTOR 57092 52038 0.16
TISSUE SPECIFICITY 28336 28336 0.08
PTM 26602 21828 0.07
MISCELLANEOUS 25620 23189 0.07
DOMAIN 22053 19330 0.06
ALTERNATIVE PRODUCTS 14956 14956 0.04
SEQUENCE CAUTION 9219 9219 0.03
INTERACTION 9147 9147 0.03
INDUCTION 8458 8458 0.02
DEVELOPMENTAL STAGE 7152 7152 0.02
ENZYME REGULATION 5354 5354 0.02
WEB RESOURCE 5071 4011 0.01
CAUTION 4606 4509 0.01
DISEASE 4200 2911 0.01
MASS SPECTROMETRY 3242 2580 0.01
BIOPHYSICOCHEMICAL PROPERTIES 2026 2026 0.01
POLYMORPHISM 675 650 <0.01
RNA EDITING 517 517 <0.01
ALLERGEN 432 432 <0.01
TOXIC DOSE 367 359 <0.01
BIOTECHNOLOGY 229 227 <0.01
PHARMACEUTICAL 79 79 <0.01
Features (FT) 2193020 6.16
CHAIN 361940 352406 1.02
TRANSMEM 216487 47150 0.61
METAL 159687 39854 0.45
BINDING 107776 35320 0.30
CONFLICT 103693 35990 0.29
DOMAIN 103615 58710 0.29
TOPO_DOM 98201 20112 0.28
STRAND 92184 8655 0.26
HELIX 88577 9093 0.25
ACT_SITE 86543 50940 0.24
MOD_RES 85863 31943 0.24
CARBOHYD 83674 21422 0.23
DISULFID 82625 20876 0.23
REPEAT 64854 9936 0.18
NP_BIND 60272 42086 0.17
REGION 54380 29930 0.15
VARIANT 54045 11916 0.15
VAR_SEQ 32496 13901 0.09
COMPBIAS 31544 18673 0.09
SIGNAL 28029 28019 0.08
TURN 23669 7381 0.07
MOTIF 22815 14879 0.06
ZN_FING 22612 8858 0.06
MUTAGEN 22566 5463 0.06
SITE 22452 12757 0.06
COILED 13384 8710 0.04
INIT_MET 11919 11919 0.03
NON_TER 10874 8325 0.03
PROPEP 8895 7475 0.02
LIPID 8754 5657 0.02
DNA_BIND 8192 7572 0.02
PEPTIDE 7143 4444 0.02
TRANSIT 5139 5085 0.01
CA_BIND 2958 1258 0.01
CROSSLNK 2793 1949 0.01
NON_CONS 1405 571 <0.01
UNSURE 626 208 <0.01
NON_STD 339 265 <0.01
Cross-references (DR) 5789634 16.25
InterPro 825089 331328 2.32
EMBL 627265 347443 1.76
GO 497745 216624 1.40
Pfam 459384 322980 1.29
PROSITE 321640 201793 0.90
RefSeq 304120 279958 0.85
GeneID 294610 279522 0.83
KEGG 264948 246931 0.74
GenomeReviews 226623 209455 0.64
HAMAP 178738 178641 0.50
TIGRFAMs 161263 151501 0.45
Gene3D 144931 122545 0.41
BioCyc 144788 138347 0.41
PANTHER 127681 117494 0.36
PRINTS 116051 94354 0.33
PIR 109846 100280 0.31
ProDom 100202 97501 0.28
SMART 95398 72741 0.27
HSSP 83032 83032 0.23
UniGene 74720 68329 0.21
Ensembl 63363 62014 0.18
SMR 49846 49846 0.14
PDBsum 48670 12327 0.14
PDB 48670 12327 0.14
ArrayExpress 44118 44118 0.12
PIRSF 43200 43200 0.12
GermOnline 41993 41381 0.12
TIGR 30250 29583 0.08
CleanEx 27382 26814 0.08
HGNC 17972 17859 0.05
LinkHub 17909 17909 0.05
IntAct 16028 16028 0.04
PharmGKB 15517 15512 0.04
MGI 14890 14840 0.04
MIM 14743 11762 0.04
PhosphoSite 14155 14155 0.04
H-InvDB 11268 9573 0.03
DIP 8984 8934 0.03
MEROPS 6722 6399 0.02
RGD 6656 6651 0.02
SGD 6643 6541 0.02
CYGD 6630 6526 0.02
TAIR 6545 6431 0.02
PeptideAtlas 5132 5132 0.01
EcoGene 4331 4328 0.01
GeneDB_Spombe 4236 4196 0.01
EchoBASE 4159 4124 0.01
WormPep 3811 3150 0.01
FlyBase 3591 3463 0.01
Gramene 3537 3535 0.01
WormBase 3514 3432 0.01
Reactome 3398 2056 0.01
HPA 2986 2565 0.01
TRANSFAC 2907 2608 0.01
SubtiList 2812 2811 0.01
Orphanet 2618 1670 0.01
GeneFarm 2091 2071 0.01
ZFIN 1852 1838 0.01
DrugBank 1821 501 0.01
StyGene 1631 1627 <0.01
TubercuList 1464 1428 <0.01
SWISS-2DPAGE 1184 1182 <0.01
PseudoCAP 1173 1164 <0.01
ListiList 1113 1105 <0.01
REPRODUCTION-2DPAGE 1025 937 <0.01
AGD 709 703 <0.01
PhotoList 668 668 <0.01
LegioList 659 659 <0.01
Leproma 644 641 <0.01
dictyBase 637 587 <0.01
World-2DPAGE 492 492 <0.01
MaizeGDB 463 458 <0.01
PeroxiBase 446 435 <0.01
DisProt 397 394 <0.01
OGP 380 378 <0.01
SagaList 367 366 <0.01
REBASE 366 360 <0.01
HIV 361 351 <0.01
ECO2DBASE 351 299 <0.01
GlycoSuiteDB 282 282 <0.01
PHCI-2DPAGE 241 241 <0.01
BuruList 234 234 <0.01
MypuList 198 198 <0.01
VectorBase 192 186 <0.01
DOSAC-COBS-2DPAGE 152 150 <0.01
Aarhus/Ghent-2DPAGE 126 96 <0.01
Siena-2DPAGE 102 102 <0.01
HSC-2DPAGE 85 85 <0.01
2DBase-Ecoli 84 84 <0.01
PhosSite 70 70 <0.01
Cornea-2DPAGE 67 67 <0.01
COMPLUYEAST-2DPAGE 59 59 <0.01
euHCVdb 55 44 <0.01
PMMA-2DPAGE 52 52 <0.01
PptaseDB 31 31 <0.01
Rat-heart-2DPAGE 28 28 <0.01
ANU-2DPAGE 22 22 <0.01
Number of explicitly cross-referenced databases: 98
Number of implicitly cross-referenced databases: 25
7. MISCELLANEOUS STATISTICS
Total number of distinct authors cited in UniProtKB/Swiss-Prot: 254724
Total number of entries encoded on a Mitochondrion: 4370
Total number of entries encoded on a Plasmid: 3377
Total number of entries encoded on a Plastid: 72
Total number of entries encoded on a Plastid; Apicoplast: 14
Total number of entries encoded on a Plastid; Chloroplast: 8834
Total number of entries encoded on a Plastid; Cyanelle: 145
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 118
Number of fragments: 8475
Number of additional sequences produced by alternative splicing, initiation or promoter usage: 24213
| UniProtKB/TrEMBL protein database release 38.0 statistics |
|---|
1. INTRODUCTION
Release 38.0 of 26-Feb-2008 of UniProtKB/TrEMBL contains 5395414 sequence entries
comprising 1746448602 amino acids.
988909 sequences have been added since release 37, the sequence data of
19885 existing entries has been updated and the annotations of
3686771 entries have been revised. This represents an increase of 23%.
2. AMINO ACID COMPOSITION
2.1 Composition in percent for the complete database
Ala (A) 8.56 Gln (Q) 3.90 Leu (L) 9.85 Ser (S) 6.82
Arg (R) 5.55 Glu (E) 6.05 Lys (K) 5.21 Thr (T) 5.59
Asn (N) 4.19 Gly (G) 7.06 Met (M) 2.41 Trp (W) 1.33
Asp (D) 5.26 His (H) 2.22 Phe (F) 4.05 Tyr (Y) 3.02
Cys (C) 1.35 Ile (I) 5.92 Pro (P) 4.83 Val (V) 6.65
Asx (B) 0.000 Glx (Z) 0.000 Xaa (X) 0.07
2.2 Classification of the amino acids by their frequency
Leu, Ala, Gly, Ser, Val, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
Gln, Tyr, Met, His, Cys, Trp
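The ordering in 2.2 follows directly from the percentages in 2.1. A short Python sketch, assuming only the twenty standard residues are ranked (Asx, Glx and Xaa are left out, as in the list above); the dictionary name is illustrative.

```python
# Composition in percent for the complete database (table 2.1).
composition = {
    "Ala": 8.56, "Gln": 3.90, "Leu": 9.85, "Ser": 6.82,
    "Arg": 5.55, "Glu": 6.05, "Lys": 5.21, "Thr": 5.59,
    "Asn": 4.19, "Gly": 7.06, "Met": 2.41, "Trp": 1.33,
    "Asp": 5.26, "His": 2.22, "Phe": 4.05, "Tyr": 3.02,
    "Cys": 1.35, "Ile": 5.92, "Pro": 4.83, "Val": 6.65,
}

# Sort residue names from most to least frequent.
ranked = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranked))
```

No two residues share the same percentage, so the ranking is unambiguous.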
3. TAXONOMIC ORIGIN
Total number of species represented in this release of UniProtKB/TrEMBL: 155282
The first twenty species represent 910939 sequences: 16.9 % of the
total number of entries.
3.1 Table of the frequency of occurrence of species
Species represented 1x: 71108
2x: 28370
3x: 14802
4x: 8349
5x: 4833
6x: 3577
7x: 2706
8x: 2194
9x: 1731
10x: 2022
11- 20x: 8971
21- 50x: 3194
51-100x: 1327
>100x: 2098
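The classes in table 3.1 partition the species by number of entries, so they should sum to the total number of species represented (155282). A minimal Python sketch of the check, with counts transcribed from the table and illustrative names:

```python
# Number of species in each frequency-of-occurrence class (table 3.1).
species_per_class = {
    "1x": 71108, "2x": 28370, "3x": 14802, "4x": 8349, "5x": 4833,
    "6x": 3577, "7x": 2706, "8x": 2194, "9x": 1731, "10x": 2022,
    "11-20x": 8971, "21-50x": 3194, "51-100x": 1327, ">100x": 2098,
}

total_species = sum(species_per_class.values())

# Share of species represented by a single entry, in percent.
singleton_share = 100 * species_per_class["1x"] / total_species
print(total_species, f"{singleton_share:.1f}%")
```

The total matches the 155282 species quoted at the start of section 3.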
3.2 Table of the most represented species
------ --------- --------------------------------------------
Number Frequency Species
------ --------- --------------------------------------------
1 208379 Human immunodeficiency virus 1
2 95503 Oryza sativa subsp. japonica (Rice)
3 54297 Vitis vinifera (Grape)
4 52760 Homo sapiens (Human)
5 50189 Trichomonas vaginalis G3
6 43738 Mus musculus (Mouse)
7 43432 Arabidopsis thaliana (Mouse-ear cress)
8 40657 Hepatitis C virus
9 39808 Paramecium tetraurelia
10 39306 Oryza sativa subsp. indica (Rice)
11 35649 Physcomitrella patens subsp. patens
12 28061 Tetraodon nigroviridis (Green puffer)
13 27783 Drosophila melanogaster (Fruit fly)
14 24866 Nematostella vectensis (Starlet sea anemone)
15 24850 uncultured bacterium
16 22520 Danio rerio (Zebrafish) (Brachydanio rerio)
17 20488 Trypanosoma cruzi
18 20471 Caenorhabditis elegans
19 19334 Caenorhabditis briggsae
20 18848 Hepatitis B virus (HBV)
21 17895 Laccaria bicolor S238N-H82
22 16810 Aedes aegypti (Yellowfever mosquito)
23 16685 Tetrahymena thermophila SB210
24 16332 Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
25 15912 Phaeosphaeria nodorum (Septoria nodorum)
26 14678 Chlamydomonas reinhardtii (Chlamydomonas smithii)
27 14676 Plasmodium chabaudi
28 14359 Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold)
29 14058 Aspergillus niger
30 13923 Dictyostelium discoideum (Slime mold)
31 13523 Coprinopsis cinerea okayama7#130
32 12791 Magnaporthe grisea (Rice blast fungus) (Pyricularia grisea)
33 12773 Anopheles gambiae str. PEST
34 12475 Xenopus laevis (African clawed frog)
35 11969 Aspergillus oryzae
36 11784 Plasmodium berghei
37 11570 Brugia malayi (Filarial nematode worm)
38 10945 Chaetomium globosum (Soil fungus)
39 10457 Neurospora crassa
40 10358 Coccidioides immitis
41 10339 Bos taurus (Bovine)
42 10302 Neosartorya fischeri (Aspergillus fischerianus
43 10298 Aspergillus terreus (strain NIH 2624)
44 10009 Drosophila pseudoobscura (Fruit fly)
45 9723 Schistosoma japonicum (Blood fluke)
46 9707 Cryptococcus neoformans (Filobasidiella neoformans)
47 9676 Aspergillus fumigatus (Sartorya fumigata)
48 9473 Emericella nidulans (Aspergillus nidulans)
49 9463 Trypanosoma brucei
50 9444 Escherichia coli
51 9320 Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
52 9290 Candida albicans (Yeast)
53 9225 Ajellomyces capsulata (strain NAm1) (Histoplasma capsulatum)
54 9155 Monosiga brevicollis MX1
55 9132 Hepatitis C virus subtype 1b
56 9012 Aspergillus clavatus
57 8861 Rhodococcus sp. (strain RHA1)
58 8607 Entamoeba dispar SAW760
59 8603 Methylobacterium nodulans ORS 2060
60 8512 Stigmatella aurantiaca DW4/3-1
61 8437 Plesiocystis pacifica SIR-1
62 8427 Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz)
63 8249 Microscilla marina ATCC 23134
64 8238 Burkholderia xenovorans (strain LB400)
65 8195 Rattus norvegicus (Rat)
66 8177 Helicobacter pylori (Campylobacter pylori)
67 8172 Acaryochloris marina (strain MBIC 11017)
68 8167 Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
69 8120 Bradyrhizobium japonicum
70 8013 Leishmania infantum
71 7971 Ostreococcus tauri
72 7887 Leishmania braziliensis
73 7835 Burkholderia phymatum STM815
74 7810 Plasmodium yoelii yoelii
75 7612 Solibacter usitatus (strain Ellin6076)
76 7515 Streptomyces coelicolor
77 7461 Burkholderia cenocepacia MC0-3
78 7434 Plasmodium vivax
79 7401 Ostreococcus lucimarinus (strain CCE9901)
80 7349 Burkholderia pseudomallei 305
81 7336 Bradyrhizobium sp. (strain BTAi1 / ATCC BAA-1182)
82 7316 Burkholderia sp. (strain 383) (Burkholderia cepacia
83 7310 Burkholderia phytofirmans PsJN
84 7279 Streptomyces avermitilis
85 7274 Clostridium bolteae ATCC BAA-613
86 7142 Rhizobium loti (Mesorhizobium loti)
87 7133 Frankia sp. EAN1pec
88 7126 Burkholderia vietnamiensis (strain G4 / LMG 22486) (Burkholderia cepacia
89 7116 Leishmania major
90 7102 Plasmodium falciparum
91 7097 Myxococcus xanthus (strain DK 1622)
92 7011 Saccharopolyspora erythraea (strain NRRL 23338)
93 7010 Pseudomonas aeruginosa
94 6996 Burkholderia ambifaria MC40-6
95 6974 Methylobacterium sp. 4-46
96 6946 Rhodopirellula baltica
97 6945 Burkholderia pseudomallei (strain 668)
98 6864 Burkholderia pseudomallei (strain 1106a)
99 6808 Rhizobium leguminosarum bv. viciae (strain 3841)
100 6679 Psychroflexus torquis ATCC 700755
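The statement at the start of section 3, that the first twenty species represent 910939 sequences or 16.9 % of all entries, can be reproduced from the first twenty rows of table 3.2 and the total entry count given in the introduction (5395414). A minimal Python sketch with illustrative names:

```python
# Entry counts of the twenty most represented species (table 3.2, rows 1-20).
top_twenty = [
    208379, 95503, 54297, 52760, 50189, 43738, 43432, 40657, 39808, 39306,
    35649, 28061, 27783, 24866, 24850, 22520, 20488, 20471, 19334, 18848,
]

TOTAL_ENTRIES = 5395414  # sequence entries in UniProtKB/TrEMBL release 38.0

top_twenty_total = sum(top_twenty)
top_twenty_share = round(100 * top_twenty_total / TOTAL_ENTRIES, 1)
print(top_twenty_total, top_twenty_share)
```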
3.3 Taxonomic distribution of the sequences
Kingdom sequences (% of the database)
Archaea 112655 ( 2%)
Bacteria 2905746 ( 54%)
Eukaryota 1785622 ( 33%)
Viruses 586657 ( 11%)
Other 4733 ( <1%)
Within Eukaryota:
Category sequences (% of Eukaryota) (% of the complete database)
Human 52761 ( 3%) ( 1%)
Other Mammalia 130205 ( 7%) ( 2%)
Other Vertebrata 197451 ( 11%) ( 4%)
Viridiplantae 471101 ( 26%) ( 9%)
Fungi 327023 ( 18%) ( 6%)
Insecta 158021 ( 9%) ( 3%)
Nematoda 56038 ( 3%) ( 1%)
Other 393022 ( 22%) ( 7%)
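The percentages in 3.3 can be recomputed from the sequence counts. The sketch below is one way to do so in Python; it assumes that percentages are rounded to the nearest integer and that values rounding to zero are shown as "<1%", which is consistent with the rows above but is not stated in the release notes. The function and variable names are illustrative.

```python
TOTAL_ENTRIES = 5395414  # sequence entries in UniProtKB/TrEMBL release 38.0

kingdoms = {
    "Archaea": 112655, "Bacteria": 2905746, "Eukaryota": 1785622,
    "Viruses": 586657, "Other": 4733,
}

eukaryota = {
    "Human": 52761, "Other Mammalia": 130205, "Other Vertebrata": 197451,
    "Viridiplantae": 471101, "Fungi": 327023, "Insecta": 158021,
    "Nematoda": 56038, "Other": 393022,
}

def format_share(count, total):
    """Return count/total as a whole percentage, or '<1%' if it rounds to 0."""
    rounded = round(100 * count / total)
    return "<1%" if rounded == 0 else f"{rounded}%"

for name, count in kingdoms.items():
    print(name, count, format_share(count, TOTAL_ENTRIES))

for name, count in eukaryota.items():
    print(name, count,
          format_share(count, kingdoms["Eukaryota"]),
          format_share(count, TOTAL_ENTRIES))
```

The eukaryotic categories add up exactly to the Eukaryota row, so the second table is a full breakdown of that kingdom.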
4. SEQUENCE SIZE
Repartition of the sequences by size (excluding fragments)
From To Number From To Number
1- 50 91470 1001-1100 33181
51- 100 383677 1101-1200 23244
101- 150 468708 1201-1300 15878
151- 200 447604 1301-1400 10607
201- 250 449588 1401-1500 8618
251- 300 430199 1501-1600 6336
301- 350 398830 1601-1700 4861
351- 400 314841 1701-1800 3959
401- 450 258394 1801-1900 2977
451- 500 217852 1901-2000 2552
501- 550 156341 2001-2100 2030
551- 600 116102 2101-2200 2085
601- 650 86645 2201-2300 1640
651- 700 67840 2301-2400 1326
701- 750 59065 2401-2500 1089
751- 800 53008 >2500 9352
801- 850 38898
851- 900 34173
901- 950 24888
951-1000 19770
The average sequence length in UniProtKB/TrEMBL is 323 amino acids.
The shortest sequence is Q16047_HUMAN: 4 amino acids.
The longest sequence is Q3ASY8_CHLCH: 36805 amino acids.
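Two figures in this section can be cross-checked against other numbers in these release notes. Since the size table excludes fragments, its bins should sum to the total number of entries minus the number of fragments (1147786, listed under the miscellaneous statistics), and the average length should follow from the total number of amino acids divided by the number of entries. The Python sketch below performs both checks; reading the reported average of 323 as a truncated (not rounded) quotient is an inference from the numbers, not something the release notes state.

```python
# Size-class counts from the table above: 1-50 ... 951-1000,
# then 1001-1100 ... 2401-2500 and >2500.
size_bins = [
    91470, 383677, 468708, 447604, 449588, 430199, 398830, 314841,
    258394, 217852, 156341, 116102, 86645, 67840, 59065, 53008,
    38898, 34173, 24888, 19770,
    33181, 23244, 15878, 10607, 8618, 6336, 4861, 3959,
    2977, 2552, 2030, 2085, 1640, 1326, 1089, 9352,
]

TOTAL_ENTRIES = 5395414      # sequence entries in release 38.0
TOTAL_RESIDUES = 1746448602  # amino acids in release 38.0
FRAGMENTS = 1147786          # "Number of fragments"

histogram_total = sum(size_bins)
non_fragment_entries = TOTAL_ENTRIES - FRAGMENTS

mean_length = TOTAL_RESIDUES / TOTAL_ENTRIES
print(histogram_total, non_fragment_entries, int(mean_length))
```

The histogram total equals the number of non-fragment entries, and the integer part of the mean length equals the reported average.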
5. STATISTICS FOR SOME LINE TYPES
The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.
Line type / subtype               Total number   Number of entries   Average per entry
--------------------------------- ------------   -----------------   -----------------
References (RL) 7019903 1.30
Submitted to EMBL/GenBank/DDBJ 3757942 3171477 0.70
Journal 3147573 2771279 0.58
Thesis 6762 6709 <0.01
Submitted to other databases 4688 4671 <0.01
Book citation 4382 4337 <0.01
Other 98556 98380 0.02
Comments (CC) 3791142 0.70
SIMILARITY 1200496 1084100 0.22
CAUTION 1126073 1126073 0.21
CATALYTIC ACTIVITY 403074 342747 0.07
FUNCTION 367781 356067 0.07
SUBCELLULAR LOCATION 315018 315008 0.06
PATHWAY 130311 121801 0.02
SUBUNIT 123048 122305 0.02
COFACTOR 114551 112375 0.02
MISCELLANEOUS 5502 5502 <0.01
INTERACTION 4745 4745 <0.01
DOMAIN 543 543 <0.01
Features (FT) 2300912 0.43
NON_TER 1925207 1145607 0.36
CHAIN 224689 186970 0.04
SIGNAL 150459 150459 0.03
TRANSIT 557 557 <0.01
Cross-references (DR) 47445741 8.79
GO 9719800 3154534 1.80
InterPro 8029335 3799022 1.49
EMBL 6079047 5387662 1.13
Pfam 4861188 3591860 0.90
PROSITE 2596670 1703309 0.48
RefSeq 2269376 2189726 0.42
GeneID 2262156 2188782 0.42
KEGG 1879939 1814567 0.35
GenomeReviews 1592238 1541926 0.30
Gene3D 1382127 1183950 0.26
PRINTS 1000951 841622 0.19
SMART 915287 717020 0.17
TIGRFAMs 801155 734682 0.15
PANTHER 780419 739026 0.14
ProDom 634393 605610 0.12
SMR 499090 498972 0.09
BioCyc 305512 292732 0.06
UniGene 285368 260272 0.05
HSSP 264084 263797 0.05
TIGR 200230 192919 0.04
PIR 183544 150486 0.03
PIRSF 180767 180767 0.03
Ensembl 166331 159129 0.03
ArrayExpress 104405 104378 0.02
Gramene 70292 70292 0.01
euHCVdb 47780 47780 0.01
MGI 42265 42052 0.01
FlyBase 35850 35710 0.01
HGNC 31568 31534 0.01
VectorBase 29029 28730 0.01
MEROPS 27182 26524 0.01
WormPep 19555 19453 <0.01
WormBase 19547 19453 <0.01
TAIR 18856 18804 <0.01
ZFIN 16420 16413 <0.01
LinkHub 12490 12490 <0.01
dictyBase 12365 12363 <0.01
RGD 6137 4218 <0.01
PDBsum 5994 3421 <0.01
PDB 5994 3421 <0.01
IntAct 5604 5603 <0.01
LegioList 5244 5214 <0.01
ListiList 4702 4685 <0.01
PseudoCAP 4397 4394 <0.01
PhotoList 4012 3888 <0.01
BuruList 4006 3972 <0.01
AGD 3985 3985 <0.01
REBASE 3685 3660 <0.01
TubercuList 2525 2519 <0.01
DIP 2311 2306 <0.01
PeroxiBase 2131 2125 <0.01
PhosphoSite 1750 1750 <0.01
SagaList 1727 1633 <0.01
Leproma 963 962 <0.01
TRANSFAC 846 837 <0.01
GeneDB_Spombe 735 729 <0.01
MypuList 584 580 <0.01
PharmGKB 472 471 <0.01
World-2DPAGE 421 421 <0.01
SGD 337 337 <0.01
PeptideAtlas 249 249 <0.01
PHCI-2DPAGE 106 106 <0.01
Reactome 85 79 <0.01
ANU-2DPAGE 59 59 <0.01
SWISS-2DPAGE 29 29 <0.01
REPRODUCTION-2DPAGE 18 18 <0.01
CYGD 16 16 <0.01
PMMA-2DPAGE 3 3 <0.01
Siena-2DPAGE 2 2 <0.01
COMPLUYEAST-2DPAGE 1 1 <0.01
Number of explicitly cross-referenced databases: 98
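The "average per entry" column is the total number of lines of a given type divided by the total number of entries in the release (5395414), not by the number of entries carrying such a line. The Python sketch below reproduces a few rows; the two-decimal rounding and the "<0.01" cut-off at 0.005 are assumptions that fit the rows of the table, and the function name is illustrative.

```python
TOTAL_ENTRIES = 5395414  # sequence entries in UniProtKB/TrEMBL release 38.0

def average_per_entry(total_lines):
    """Format lines-per-entry as in the table: two decimals, or '<0.01'."""
    average = total_lines / TOTAL_ENTRIES
    return "<0.01" if average < 0.005 else f"{average:.2f}"

# Total line counts for a few rows of the table above.
examples = {
    "References (RL)": 7019903,
    "Comments (CC)": 3791142,
    "Features (FT)": 2300912,
    "Cross-references (DR)": 47445741,
    "GO": 9719800,
    "MEROPS": 27182,
    "WormPep": 19555,
}

for line_type, total in examples.items():
    print(line_type, average_per_entry(total))
```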
6. MISCELLANEOUS STATISTICS
Total number of distinct authors cited in UniProtKB/TrEMBL: 260940
Total number of entries encoded on a Mitochondrion: 195527
Total number of entries encoded on a Plasmid: 85079
Total number of entries encoded on a Plastid: 4006
Total number of entries encoded on a Plastid; Apicoplast: 182
Total number of entries encoded on a Plastid; Chloroplast: 67647
Total number of entries encoded on a Plastid; Cyanelle: 7
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 231
Number of fragments: 1147786
| Submissions and Updates |
|---|
We welcome feedback from our users. We would especially appreciate your notifying us if you find that sequences belonging to your field of expertise are missing from the database. We also would like to be notified about annotations to be updated, if, for example, the function of a protein has been clarified or if new information about post-translational modifications has become available.
Submit new sequence data, updates and corrections at http://www.uniprot.org/support/submissions.shtml
For all queries regarding submissions to UniProtKB and to submit new protein sequence data, please contact:
UniProt Knowledgebase
The EMBL Outstation - The European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom
Telephone: (+44 1223) 494 462
Telefax: (+44 1223) 494 468
E-mail:
| Download information |
|---|
The latest UniProt Knowledgebase data are available in several formats (flatfile, XML or FASTA) at http://www.uniprot.org/database/download.shtml. They are supplemented by a file containing the sequences of all additional alternative isoforms annotated in UniProtKB/Swiss-Prot. This data set is documented in the file ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/README.varsplic
For users who wish to download the UniProt Knowledgebase only occasionally, we distribute the latest major release (updated 3 times per year) in flatfile format. Previous UniProtKB/Swiss-Prot and UniProtKB/TrEMBL releases are archived under ftp://ftp.uniprot.org/pub/databases/uniprot/previous_major_releases. The UniProt Knowledgebase major release is also available on DVD from the EBI.
| Contact |
|---|
| Citation |
|---|
If you want to cite UniProt in a publication, please use the following reference:
The UniProt Consortium
"The Universal Protein Resource (UniProt)"
Nucleic Acids Res. 36:D190-D195(2008) doi:10.1093/nar/gkm895