UniProt Knowledgebase
Release notes: UniProtKB release 12.0 of 24-Jul-2007
Related documents: UniProtKB user manual, Recent changes, Forthcoming changes.
| Introduction |
|---|
Release 12.0 of the UniProt Knowledgebase is composed of the UniProtKB/Swiss-Prot Protein Knowledgebase release 54.0 and the UniProtKB/TrEMBL Protein Database release 37.0.
More information on these databases can be found in the user manual section "What is the UniProt Knowledgebase?".
| UniProtKB/Swiss-Prot protein knowledgebase release 54.0 statistics |
|---|
The growth of the database is summarized below.
| Release | Date | Number of entries | Number of amino acids |
|---|---|---|---|
| 2.0 | 09/86 | 3'939 | 900'163 |
| 3.0 | 11/86 | 4'160 | 969'641 |
| 4.0 | 04/87 | 4'387 | 1'036'010 |
| 5.0 | 09/87 | 5'205 | 1'327'683 |
| 6.0 | 01/88 | 6'102 | 1'653'982 |
| 7.0 | 04/88 | 6'821 | 1'885'771 |
| 8.0 | 08/88 | 7'724 | 2'224'465 |
| 9.0 | 11/88 | 8'702 | 2'498'140 |
| 10.0 | 03/89 | 10'008 | 2'952'613 |
| 11.0 | 07/89 | 10'856 | 3'265'966 |
| 12.0 | 10/89 | 12'305 | 3'797'482 |
| 13.0 | 01/90 | 13'837 | 4'347'336 |
| 14.0 | 04/90 | 15'409 | 4'914'264 |
| 15.0 | 08/90 | 16'941 | 5'486'399 |
| 16.0 | 11/90 | 18'364 | 5'986'949 |
| 17.0 | 02/91 | 20'024 | 6'524'504 |
| 18.0 | 05/91 | 20'772 | 6'792'034 |
| 19.0 | 08/91 | 21'795 | 7'173'785 |
| 20.0 | 11/91 | 22'654 | 7'500'130 |
| 21.0 | 03/92 | 23'742 | 7'866'596 |
| 22.0 | 05/92 | 25'044 | 8'375'696 |
| 23.0 | 08/92 | 26'706 | 9'011'391 |
| 24.0 | 12/92 | 28'154 | 9'545'427 |
| 25.0 | 04/93 | 29'955 | 10'214'020 |
| 26.0 | 07/93 | 31'808 | 10'875'091 |
| 27.0 | 10/93 | 33'329 | 11'484'420 |
| 28.0 | 02/94 | 36'000 | 12'496'420 |
| 29.0 | 06/94 | 38'303 | 13'464'008 |
| 30.0 | 10/94 | 40'292 | 14'147'368 |
| 31.0 | 02/95 | 43'470 | 15'335'248 |
| 32.0 | 11/95 | 49'340 | 17'385'503 |
| 33.0 | 02/96 | 52'205 | 18'531'384 |
| 34.0 | 10/96 | 59'021 | 21'210'389 |
| 35.0 | 11/97 | 69'113 | 25'083'768 |
| 36.0 | 07/98 | 74'019 | 26'840'295 |
| 37.0 | 12/98 | 77'977 | 28'268'293 |
| 38.0 | 07/99 | 80'000 | 29'085'965 |
| 39.0 | 05/00 | 86'593 | 31'411'114 |
| 40.0 | 10/01 | 101'602 | 37'315'215 |
| 41.0 | 02/03 | 122'564 | 44'986'459 |
| 42.0 | 10/03 | 135'850 | 50'046'799 |
| 43.0 | 03/04 | 146'720 | 54'093'154 |
| 44.0 | 07/04 | 153'871 | 56'608'159 |
| 45.0 | 10/04 | 163'235 | 59'631'787 |
| 46.0 | 02/05 | 168'297 | 61'443'278 |
| 47.0 | 05/05 | 181'577 | 65'746'672 |
| 48.0 | 09/05 | 194'317 | 70'391'852 |
| 49.0 | 02/06 | 207'132 | 75'438'310 |
| 50.0 | 05/06 | 222'289 | 81'585'146 |
| 51.0 | 10/06 | 241'242 | 88'541'632 |
| 52.0 | 03/07 | 261'513 | 95'638'062 |
| 53.0 | 05/07 | 269'293 | 98'902'758 |
| 54.0 | 07/07 | 276'256 | 101'466'206 |
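
The counts in the growth table use an apostrophe as the thousands separator (for example 276'256), which most numeric parsers reject. The following is a minimal sketch, not part of the release notes, of how such rows could be parsed and compared. The function names `parse_count` and `parse_row` are illustrative assumptions.

```python
def parse_count(text: str) -> int:
    """Convert a count such as "276'256" to an integer."""
    return int(text.replace("'", "").strip())


def parse_row(row: str) -> tuple[str, str, int, int]:
    """Split one table row into (release, date, entries, amino_acids)."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    release, date, entries, residues = cells
    return release, date, parse_count(entries), parse_count(residues)


# Two rows copied from the growth table.
prev = parse_row("| 53.0 | 05/07 | 269'293 | 98'902'758 |")
curr = parse_row("| 54.0 | 07/07 | 276'256 | 101'466'206 |")

net_new_entries = curr[2] - prev[2]   # net change in entries, 53.0 -> 54.0
mean_length = curr[3] / curr[2]       # mean residues per entry in 54.0

print(net_new_entries)
print(round(mean_length))
```

The net change computed this way (6963 entries) is lower than the 7104 additions reported for release 54.0. That would be consistent with some entries having been deleted or merged, although this interpretation is an assumption.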
In rare cases, UniProtKB/Swiss-Prot entries are removed. Deleted entries are almost exclusively open reading frames (ORFs) that were wrongly predicted to code for proteins. When there is enough evidence that these hypothetical proteins are not real, we remove them from UniProtKB/Swiss-Prot. The document delac_sp.txt lists all accession numbers that were previously present in UniProtKB/Swiss-Prot but have since been deleted from the database.
We have selected a number of organisms that are the target of genome sequencing and/or mapping projects and for which we intend to:
From our efforts to annotate human sequence entries as completely as possible arose the HPI project, and the bacterial model organisms became the focus of the HAMAP project. Here is the current status of the model organisms which are not covered by these two projects:
| Organism | Database cross-references | Index file | Number of sequences |
|---|---|---|---|
| A.thaliana | TAIR | arath.txt | 5'935 |
| C.albicans | None yet | calbican.txt | 638 |
| C.elegans | Wormpep | celegans.txt | 3'040 |
| D.discoideum | DictyBase | dicty.txt | 352 |
| D.melanogaster | FlyBase | fly.txt | 2'567 |
| M.musculus | MGD | mgdtosp.txt | 13'561 |
| S.cerevisiae | SGD | yeast.txt | 6'162 |
| S.pombe | GeneDB_SPombe | pombe.txt | 3'229 |
1. INTRODUCTION
Release 54.0 of 24-Jul-07 of UniProtKB/Swiss-Prot contains 276256 sequence entries,
comprising 101466206 amino acids abstracted from 158294 references.
7104 sequences have been added since release 53.0, the sequence data of
690 existing entries has been updated and the annotations of
269152 entries have been revised.
2. AMINO ACID COMPOSITION
2.1 Composition in percent for the complete database
Ala (A) 7.86 Gln (Q) 3.97 Leu (L) 9.66 Ser (S) 6.88
Arg (R) 5.43 Glu (E) 6.66 Lys (K) 5.92 Thr (T) 5.40
Asn (N) 4.12 Gly (G) 6.94 Met (M) 2.39 Trp (W) 1.13
Asp (D) 5.33 His (H) 2.29 Phe (F) 3.95 Tyr (Y) 3.01
Cys (C) 1.50 Ile (I) 5.88 Pro (P) 4.85 Val (V) 6.72
Asx (B) 0.000 Glx (Z) 0.000 Xaa (X) 0.00
Legend: gray = aliphatic, red = acidic, green = small hydroxy,
blue = basic, black = aromatic, white = amide, yellow = sulfur
2.2 Classification of the amino acids by their frequency
Leu, Ala, Gly, Ser, Val, Glu, Lys, Ile, Arg, Thr, Asp, Pro, Asn, Gln,
Phe, Tyr, Met, His, Cys, Trp
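
The ranking in section 2.2 follows directly from sorting the percentages in section 2.1 in descending order. A minimal sketch of that step, with the values copied from section 2.1 (the variable names are illustrative):

```python
# Percent composition copied from section 2.1 (UniProtKB/Swiss-Prot release 54.0).
composition = {
    "Ala": 7.86, "Gln": 3.97, "Leu": 9.66, "Ser": 6.88,
    "Arg": 5.43, "Glu": 6.66, "Lys": 5.92, "Thr": 5.40,
    "Asn": 4.12, "Gly": 6.94, "Met": 2.39, "Trp": 1.13,
    "Asp": 5.33, "His": 2.29, "Phe": 3.95, "Tyr": 3.01,
    "Cys": 1.50, "Ile": 5.88, "Pro": 4.85, "Val": 6.72,
}

# Sort residue names by their percentage, most frequent first.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))
```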
3. TAXONOMIC ORIGIN
Total number of species represented in this release of UniProtKB/Swiss-Prot: 10989
The first twenty species represent 85570 sequences: 31 % of the total
number of entries.
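
As a worked check of the figure above, the share can be recomputed from the first twenty rows of table 3.2 and the total of 276256 entries. This is only a sketch of the arithmetic; the variable names are illustrative.

```python
# Entry counts of the twenty most represented species (table 3.2).
top_twenty = [16890, 13561, 6229, 6162, 5935, 4930, 4179, 3229, 3040, 2856,
              2567, 2111, 1893, 1887, 1803, 1782, 1774, 1647, 1560, 1535]
total_entries = 276256  # entries in UniProtKB/Swiss-Prot release 54.0

top_twenty_sum = sum(top_twenty)
share_percent = 100 * top_twenty_sum / total_entries

print(top_twenty_sum)
print(round(share_percent))
```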
3.1 Table of the frequency of occurrence of species
Species represented 1x: 5195
2x: 1653
3x: 802
4x: 529
5x: 365
6x: 321
7x: 222
8x: 193
9x: 162
10x: 94
11- 20x: 498
21- 50x: 356
51-100x: 184
>100x: 415
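
A table of this kind can be derived from a mapping of species to entry counts by assigning each count to an occurrence bin. The sketch below assumes the bin boundaries shown in table 3.1. The function names and the toy input are hypothetical and not taken from the release.

```python
from collections import Counter


def occurrence_bin(n: int) -> str:
    """Map a per-species entry count to the bin labels used in table 3.1."""
    if n <= 10:
        return f"{n}x"
    if n <= 20:
        return "11-20x"
    if n <= 50:
        return "21-50x"
    if n <= 100:
        return "51-100x"
    return ">100x"


def bin_species(entries_per_species: dict[str, int]) -> Counter:
    """Count how many species fall into each occurrence bin."""
    return Counter(occurrence_bin(n) for n in entries_per_species.values())


# Toy input (hypothetical counts, not taken from the release).
toy = {"species A": 1, "species B": 1, "species C": 14, "species D": 250}
print(bin_species(toy))

# Consistency check on the published figures: the bins should add up to
# the total number of species reported for this release (10989).
published_bins = [5195, 1653, 802, 529, 365, 321, 222, 193, 162, 94,
                  498, 356, 184, 415]
print(sum(published_bins))
```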
3.2 Table of the most represented species
------ --------- --------------------------------------------
Number Frequency Species
------ --------- --------------------------------------------
1 16890 Homo sapiens (Human)
2 13561 Mus musculus (Mouse)
3 6229 Rattus norvegicus (Rat)
4 6162 Saccharomyces cerevisiae (Baker's yeast)
5 5935 Arabidopsis thaliana (Mouse-ear cress)
6 4930 Escherichia coli
7 4179 Bos taurus (Bovine)
8 3229 Schizosaccharomyces pombe (Fission yeast)
9 3040 Caenorhabditis elegans
10 2856 Bacillus subtilis
11 2567 Drosophila melanogaster (Fruit fly)
12 2111 Xenopus laevis (African clawed frog)
13 1893 Escherichia coli O157:H7
14 1887 Pongo pygmaeus (Orangutan)
15 1803 Gallus gallus (Chicken)
16 1782 Methanococcus jannaschii
17 1774 Haemophilus influenzae
18 1647 Salmonella typhimurium
19 1560 Escherichia coli O6
20 1535 Shigella flexneri
21 1442 Danio rerio (Zebrafish) (Brachydanio rerio)
22 1419 Mycobacterium tuberculosis
23 1282 Oryza sativa subsp. japonica (Rice)
24 1242 Salmonella typhi
25 1225 Sus scrofa (Pig)
26 1225 Pseudomonas aeruginosa
27 1160 Mycobacterium bovis
28 978 Synechocystis sp. (strain PCC 6803)
29 971 Archaeoglobus fulgidus
30 920 Macaca fascicularis (Crab eating macaque) (Cynomolgus monkey)
31 915 Yersinia pestis
32 894 Vibrio cholerae
33 884 Mimivirus
34 883 Rhizobium meliloti (Sinorhizobium meliloti)
35 838 Oryctolagus cuniculus (Rabbit)
36 810 Staphylococcus aureus (strain Mu50 / ATCC 700699)
37 808 Staphylococcus aureus (strain N315)
38 784 Staphylococcus aureus (strain COL)
39 783 Staphylococcus aureus (strain MW2)
40 779 Staphylococcus aureus (strain MSSA476)
41 773 Staphylococcus aureus (strain MRSA252)
42 756 Aquifex aeolicus
43 740 Vibrio parahaemolyticus
44 740 Pasteurella multocida
45 720 Canis familiaris (Dog)
46 688 Streptomyces coelicolor
47 687 Mycoplasma pneumoniae
48 685 Vibrio vulnificus
49 680 Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
50 676 Bacillus halodurans
51 665 Vibrio vulnificus (strain YJ016)
52 660 Staphylococcus epidermidis (strain ATCC 35984 / RP62A)
53 659 Staphylococcus epidermidis (strain ATCC 12228)
54 646 Neurospora crassa
55 644 Ashbya gossypii (Yeast) (Eremothecium gossypii)
56 639 Pan troglodytes (Chimpanzee)
57 638 Candida albicans (Yeast)
58 633 Mycobacterium leprae
59 632 Anabaena sp. (strain PCC 7120)
60 628 Yersinia pseudotuberculosis
61 622 Bacillus anthracis
62 620 Pseudomonas syringae pv. tomato
63 616 Pseudomonas putida (strain KT2440)
64 612 Treponema pallidum
65 611 Kluyveromyces lactis (Yeast) (Candida sphaerica)
66 605 Photorhabdus luminescens subsp. laumondii
67 593 Zea mays (Maize)
68 592 Salmonella paratyphi-a
69 589 Methanobacterium thermoautotrophicum
70 585 Bradyrhizobium japonicum
71 579 Rickettsia prowazekii
72 575 Agrobacterium tumefaciens (strain C58 / ATCC 33970)
73 574 Helicobacter pylori (Campylobacter pylori)
74 573 Ralstonia solanacearum (Pseudomonas solanacearum)
75 572 Buchnera aphidicola subsp. Acyrthosiphon pisum
76 568 Erwinia carotovora subsp. atroseptica (Pectobacterium atrosepticum)
77 563 Candida glabrata (Yeast) (Torulopsis glabrata)
78 562 Buchnera aphidicola subsp. Schizaphis graminum
79 561 Rhizobium loti (Mesorhizobium loti)
80 555 Lactococcus lactis subsp. lactis (Streptococcus lactis)
81 555 Helicobacter pylori J99 (Campylobacter pylori J99)
82 552 Listeria monocytogenes
83 548 Bacillus cereus (strain ATCC 14579 / DSM 31)
84 544 Listeria innocua
85 542 Xanthomonas campestris pv. campestris
86 541 Shewanella oneidensis
87 531 Neisseria meningitidis serogroup A
88 531 Neisseria meningitidis serogroup B
89 519 Caulobacter crescentus (Caulobacter vibrioides)
90 519 Clostridium acetobutylicum
91 514 Brucella melitensis
92 514 Brucella suis
93 508 Xanthomonas axonopodis pv. citri
94 508 Salmonella choleraesuis
95 507 Buchnera aphidicola subsp. Baizongia pistaciae
96 492 Streptococcus pneumoniae
97 490 Thermotoga maritima
98 487 Oceanobacillus iheyensis
99 485 Rickettsia conorii
100 484 Listeria monocytogenes serotype 4b (strain F2365)
101 483 Mycoplasma genitalium
102 481 Xylella fastidiosa
103 475 Photobacterium profundum (Photobacterium sp. (strain SS9))
104 472 Xylella fastidiosa (strain Temecula1 / ATCC 700964)
105 468 Haemophilus ducreyi
106 467 Deinococcus radiodurans
107 460 Emericella nidulans (Aspergillus nidulans)
108 458 Methanosarcina acetivorans
109 457 Bordetella bronchiseptica (Alcaligenes bronchisepticus)
110 455 Yarrowia lipolytica (Candida lipolytica)
111 454 Corynebacterium glutamicum (Brevibacterium flavum)
112 454 Clostridium perfringens
113 451 Bacillus cereus (strain ATCC 10987)
114 447 Pyrococcus horikoshii
115 444 Bordetella parapertussis
116 443 Bordetella pertussis
117 442 Pyrococcus abyssi
118 441 Halobacterium salinarium (Halobacterium halobium)
119 440 Chromobacterium violaceum
120 440 Rickettsia felis (Rickettsia azadi)
121 437 Methanosarcina mazei (Methanosarcina frisia)
122 435 Chlamydia trachomatis
123 434 Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
124 429 Debaryomyces hansenii (Yeast) (Torulaspora hansenii)
125 427 Rickettsia bellii (strain RML369-C)
126 425 Borrelia burgdorferi (Lyme disease spirochete)
127 423 Lactobacillus plantarum
128 422 Thermoanaerobacter tengcongensis
129 422 Pyrococcus furiosus
130 421 Nicotiana tabacum (Common tobacco)
131 420 Synechococcus elongatus (Thermosynechococcus elongatus)
132 418 Bacillus thuringiensis subsp. konkukian
133 417 Streptococcus pyogenes serotype M6
134 417 Ovis aries (Sheep)
135 416 Chlamydia pneumoniae (Chlamydophila pneumoniae)
136 414 Enterococcus faecalis (Streptococcus faecalis)
137 413 Streptococcus mutans
138 412 Campylobacter jejuni
139 412 Streptomyces avermitilis
140 411 Shigella sonnei (strain Ss046)
141 406 Staphylococcus haemolyticus (strain JCSC1435)
142 406 Chlamydia muridarum
143 406 Rhizobium sp. (strain NGR234)
144 397 Rickettsia typhi
145 397 Staphylococcus saprophyticus subsp. saprophyticus
146 397 Streptococcus pyogenes serotype M1
147 397 Sulfolobus solfataricus
148 394 Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
149 393 Streptococcus pyogenes serotype M18
150 391 Streptococcus pyogenes serotype M3
151 387 Shigella boydii serotype 4 (strain Sb227)
152 387 Bacillus cereus (strain ZK / E33L)
153 385 Acinetobacter sp. (strain ADP1)
154 384 Burkholderia pseudomallei (Pseudomonas pseudomallei)
155 378 Shigella dysenteriae serotype 1 (strain Sd197)
156 376 Rhodopseudomonas palustris
157 374 Vibrio fischeri (strain ATCC 700601 / ES114)
158 373 Chlorobium tepidum
159 373 Nitrosomonas europaea
160 371 Corynebacterium efficiens
161 370 Bacillus clausii (strain KSM-K16)
162 369 Pyrococcus kodakaraensis (Thermococcus kodakaraensis)
163 368 Solanum tuberosum (Potato)
164 362 Bacillus licheniformis (strain DSM 13 / ATCC 14580)
165 359 Methanopyrus kandleri
166 359 Mannheimia succiniciproducens (strain MBEL55E)
167 358 Gloeobacter violaceus
168 357 Burkholderia mallei (Pseudomonas mallei)
169 352 Dictyostelium discoideum (Slime mold)
170 352 Staphylococcus aureus (strain NCTC 8325)
171 351 Leptospira interrogans
172 350 Aeropyrum pernix
173 348 Streptococcus agalactiae serotype III
174 345 Streptococcus agalactiae serotype V
175 344 Brucella abortus
176 343 Synechococcus sp. (strain WH8102)
177 343 Pisum sativum (Garden pea)
178 341 Leptospira interrogans serogroup Icterohaemorrhagiae serovar copenhageni
179 341 Aspergillus fumigatus (Sartorya fumigata)
180 340 Methylococcus capsulatus
181 339 Geobacillus kaustophilus
182 337 Prochlorococcus marinus (strain MIT 9313)
183 334 Sulfolobus tokodaii
184 333 Prochlorococcus marinus
185 333 Glycine max (Soybean)
186 330 Pseudomonas fluorescens (strain Pf-5 / ATCC BAA-477)
187 326 Mycobacterium paratuberculosis
188 325 Staphylococcus aureus
189 324 Staphylococcus aureus (strain bovine RF122)
190 323 Idiomarina loihiensis
191 321 Pseudomonas syringae pv. syringae (strain B728a)
192 320 Macaca mulatta (Rhesus macaque)
193 320 Rhodopirellula baltica
194 317 Synechococcus sp. (strain ATCC 27144 / PCC 6301 / SAUG 1402/1)
195 317 Geobacter sulfurreducens
196 317 Thermoplasma acidophilum
197 315 Coxiella burnetii
198 314 Prochlorococcus marinus subsp. pastoris (strain CCMP 1378 / MED4)
199 313 Staphylococcus aureus (strain USA300)
200 313 Fusobacterium nucleatum subsp. nucleatum
201 311 Triticum aestivum (Wheat)
202 310 Synechococcus sp. (strain PCC 7942) (Anacystis nidulans R2)
203 304 Pseudomonas syringae pv. phaseolicola (strain 1448A / Race 6)
204 302 Azoarcus sp. (strain EbN1) (Aromatoleum aromaticum (strain EbN1))
205 300 Rhodobacter sphaeroides (strain ATCC 17023 / 2.4.1 / NCIB 8253 / DSM 158)
206 300 Thermus thermophilus (strain HB8 / ATCC 27634 / DSM 579)
207 297 Nocardia farcinica
208 295 Zymomonas mobilis
209 295 Wolinella succinogenes
210 293 Desulfovibrio vulgaris (strain Hildenborough / ATCC 29579 / NCIMB 8303)
211 293 Bacteroides thetaiotaomicron
212 293 Pseudomonas fluorescens (strain PfO-1)
213 291 Sulfolobus acidocaldarius
214 290 Hordeum vulgare (Barley)
215 290 Legionella pneumophila subsp. pneumophila
216 290 Xanthomonas oryzae pv. oryzae
217 289 Silicibacter pomeroyi
218 288 Clostridium tetani
219 288 Haemophilus influenzae (strain 86-028NP)
220 287 Legionella pneumophila (strain Paris)
221 287 Symbiobacterium thermophilum
222 287 Pseudomonas putida
223 286 Neisseria gonorrhoeae (strain ATCC 700825 / FA 1090)
224 286 Pyrobaculum aerophilum
225 285 Legionella pneumophila (strain Lens)
226 284 Cavia porcellus (Guinea pig)
227 283 Burkholderia sp. (strain 383) (Burkholderia cepacia
228 283 Ralstonia eutropha (strain JMP134) (Alcaligenes eutrophus)
229 282 Thermoplasma volcanium
230 280 Corynebacterium diphtheriae
231 277 Thermus thermophilus (strain HB27 / ATCC BAA-163 / DSM 7039)
232 277 Brucella abortus (strain 2308)
233 276 Escherichia coli (strain UTI89 / UPEC)
234 272 Spinacia oleracea (Spinach)
235 269 Gorilla gorilla gorilla (Lowland gorilla)
236 268 Bacteriophage T4
237 267 Colwellia psychrerythraea (strain 34H / ATCC BAA-681) (Vibrio psychroerythus)
238 265 Equus caballus (Horse)
239 265 Xanthomonas campestris pv. campestris (strain 8004)
240 264 Methanococcus maripaludis
241 264 Anabaena variabilis (strain ATCC 29413 / PCC 7937)
242 262 Haloarcula marismortui (Halobacterium marismortui)
243 262 Rhodobacter capsulatus (Rhodopseudomonas capsulata)
244 262 Helicobacter hepaticus
245 262 Bifidobacterium longum
246 261 Escherichia coli O6:K15:H31 (strain 536 / UPEC)
247 261 Wigglesworthia glossinidia brevipalpis
248 260 Dechloromonas aromatica (strain RCB)
249 259 Cryptococcus neoformans (Filobasidiella neoformans)
250 258 Ureaplasma parvum (Ureaplasma urealyticum biotype 1)
251 255 Leifsonia xyli subsp. xyli
252 254 Vaccinia virus (strain Copenhagen) (VACV)
253 254 Gluconobacter oxydans (Gluconobacter suboxydans)
254 252 Porphyromonas gingivalis (Bacteroides gingivalis)
255 252 Bartonella henselae (Rochalimaea henselae)
256 249 Campylobacter jejuni (strain RM1221)
257 249 Pseudoalteromonas haloplanktis (strain TAC 125)
258 248 Bacteroides fragilis
259 245 Burkholderia pseudomallei (strain 1710b)
260 244 Chlamydophila caviae
261 243 Desulfotalea psychrophila
262 242 Blochmannia floridanus
263 241 Xanthomonas campestris pv. vesicatoria (strain 85-10)
264 240 Nitrosococcus oceani (strain ATCC 19707 / NCIMB 11848)
265 240 Lactobacillus johnsonii
266 239 Propionibacterium acnes
267 239 Bartonella quintana (Rochalimaea quintana)
268 238 Ustilago maydis (Smut fungus)
269 236 Thiobacillus denitrificans (strain ATCC 25259)
270 235 Bacillus stearothermophilus (Geobacillus stearothermophilus)
271 228 Oryza sativa subsp. indica (Rice)
272 225 Francisella tularensis subsp. tularensis
273 225 Chlamydomonas reinhardtii
274 224 Streptococcus thermophilus (strain ATCC BAA-250 / LMG 18311)
275 223 Bdellovibrio bacteriovorus
276 222 Streptococcus thermophilus (strain CNRZ 1066)
277 221 Caenorhabditis briggsae
278 219 Psychrobacter arcticum
279 217 Porphyra purpurea
280 217 Nitrobacter winogradskyi (strain Nb-255 / ATCC 25391)
281 215 Hahella chejuensis (strain KCTC 2396)
282 214 Burkholderia thailandensis (strain E264 / ATCC 700388 / DSM 13276 / CIP 106301)
283 214 Pelobacter carbinolicus (strain DSM 2380 / Gra Bd 1)
284 214 Rhodospirillum rubrum (strain ATCC 11170 / NCIB 8255)
285 212 Klebsiella pneumoniae
286 211 Felis silvestris catus (Cat)
287 210 Cricetulus griseus (Chinese hamster)
288 209 Gibberella zeae (Fusarium graminearum)
289 209 Lactobacillus acidophilus
290 209 Treponema denticola
291 208 Porphyra yezoensis
292 208 Xanthomonas oryzae pv. oryzae (strain MAFF 311018)
293 207 Sodalis glossinidius (strain morsitans)
294 207 Nitrosospira multiformis (strain ATCC 25196 / NCIMB 11849)
295 206 Thiomicrospira crunogena (strain XCL-2)
296 202 Mesocricetus auratus (Golden hamster)
297 202 Geobacter metallireducens (strain GS-15 / ATCC 53774 / DSM 7210)
298 200 Encephalitozoon cuniculi
299 200 Vaccinia virus (strain Western Reserve / WR) (VACV)
3.3 Taxonomic distribution of the sequences
Kingdom sequences (% of the database)
Archaea 11165 ( 4%)
Bacteria 133655 ( 48%)
Eukaryota 120226 ( 44%)
Viruses 11210 ( 4%)
Within Eukaryota:
Category sequences (% of Eukaryota) (% of the complete database)
Human 16891 ( 14%) ( 6%)
Other Mammalia 37610 ( 31%) ( 14%)
Other Vertebrata 11423 ( 10%) ( 4%)
Viridiplantae 20710 ( 17%) ( 7%)
Fungi 17864 ( 15%) ( 6%)
Insecta 4951 ( 4%) ( 2%)
Nematoda 3502 ( 3%) ( 1%)
Other 7275 ( 6%) ( 3%)
4. SEQUENCE SIZE
Distribution of the sequences by size (excluding fragments)
| Size range (amino acids) | Number of sequences |
|---|---|
| 1-50 | 5183 |
| 51-100 | 20141 |
| 101-150 | 28762 |
| 151-200 | 27299 |
| 201-250 | 27917 |
| 251-300 | 24165 |
| 301-350 | 23861 |
| 351-400 | 22083 |
| 401-450 | 17289 |
| 451-500 | 14984 |
| 501-550 | 11221 |
| 551-600 | 7913 |
| 601-650 | 6653 |
| 651-700 | 4546 |
| 701-750 | 3828 |
| 751-800 | 3088 |
| 801-850 | 2560 |
| 851-900 | 2678 |
| 901-950 | 2046 |
| 951-1000 | 1607 |
| 1001-1100 | 2299 |
| 1101-1200 | 1564 |
| 1201-1300 | 1265 |
| 1301-1400 | 1056 |
| 1401-1500 | 866 |
| 1501-1600 | 445 |
| 1601-1700 | 345 |
| 1701-1800 | 290 |
| 1801-1900 | 270 |
| 1901-2000 | 228 |
| 2001-2100 | 137 |
| 2101-2200 | 202 |
| 2201-2300 | 189 |
| 2301-2400 | 121 |
| 2401-2500 | 94 |
| >2500 | 719 |
The average sequence length in UniProtKB/Swiss-Prot is 367 amino acids.
The shortest sequence is GWA_SEPOF (P83570): 2 amino acids.
The longest sequence is TITIN_HUMAN (Q8WZ42): 34350 amino acids.
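
Because the size table excludes fragments, its counts should add up to the total number of entries minus the number of fragments reported in section 7 (8342). The following sketch checks that relationship and finds the most populated size range. The data are copied from this section and the variable names are illustrative.

```python
# (size range, number of sequences) copied from the size table in this section.
size_bins = [
    ("1-50", 5183), ("51-100", 20141), ("101-150", 28762), ("151-200", 27299),
    ("201-250", 27917), ("251-300", 24165), ("301-350", 23861), ("351-400", 22083),
    ("401-450", 17289), ("451-500", 14984), ("501-550", 11221), ("551-600", 7913),
    ("601-650", 6653), ("651-700", 4546), ("701-750", 3828), ("751-800", 3088),
    ("801-850", 2560), ("851-900", 2678), ("901-950", 2046), ("951-1000", 1607),
    ("1001-1100", 2299), ("1101-1200", 1564), ("1201-1300", 1265),
    ("1301-1400", 1056), ("1401-1500", 866), ("1501-1600", 445),
    ("1601-1700", 345), ("1701-1800", 290), ("1801-1900", 270),
    ("1901-2000", 228), ("2001-2100", 137), ("2101-2200", 202),
    ("2201-2300", 189), ("2301-2400", 121), ("2401-2500", 94), (">2500", 719),
]
total_entries = 276256  # all entries in release 54.0
fragments = 8342        # "Number of fragments" from section 7

# Sum of all bins, to compare against the number of non-fragment entries.
binned = sum(count for _, count in size_bins)
# The size range holding the most sequences.
modal_range, modal_count = max(size_bins, key=lambda item: item[1])

print(binned, total_entries - fragments)
print(modal_range, modal_count)
```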
5. JOURNAL CITATIONS
Note: the following citation statistics reflect the number of distinct
journal citations.
Total number of journals cited in this release of UniProtKB/Swiss-Prot: 1833
5.1 Table of the frequency of journal citations
Journals cited 1x: 636
2x: 237
3x: 133
4x: 93
5x: 68
6x: 49
7x: 39
8x: 36
9x: 27
10x: 21
11- 20x: 142
21- 50x: 145
51-100x: 75
>100x: 132
5.2 List of the most cited journals in UniProtKB/Swiss-Prot
Nb Citations Journal name
-- --------- -------------------------------------------------------------
1 15088 Journal of Biological Chemistry
2 7172 Proceedings of the National Academy of Sciences of the U.S.A.
3 4530 Journal of Bacteriology
4 4262 Gene
5 4095 Nucleic Acids Research
6 3878 Biochemical and Biophysical Research Communications
7 3582 FEBS Letters
8 3333 Biochemistry
9 3292 The EMBO Journal
10 2946 European Journal of Biochemistry
11 2839 Nature
12 2837 Molecular and Cellular Biology
13 2690 Biochimica et Biophysica Acta
14 2556 Journal of Molecular Biology
15 2318 Genomics
16 2305 Cell
17 1883 Biochemical Journal
18 1781 Science
19 1504 Molecular Microbiology
20 1385 Journal of Virology
21 1363 Plant Molecular Biology
22 1318 Journal of Cell Biology
23 1274 Molecular and General Genetics
24 1137 Virology
25 1117 Human Molecular Genetics
26 1101 Nature Genetics
27 1093 Genes and Development
28 1084 Journal of Biochemistry
29 990 Oncogene
30 986 Plant Physiology
31 969 The American Journal of Human Genetics
32 838 Human Mutation
33 833 Development
34 805 Journal of Immunology
35 778 Infection and Immunity
36 774 Genetics
37 747 Structure
38 708 Molecular Biology of the Cell
39 702 Yeast
40 701 Archives of Biochemistry and Biophysics
41 683 Journal of General Virology
42 636 Microbiology
43 606 Blood
44 593 The Plant Cell
45 571 FEMS Microbiology Letters
46 548 Nature Structural Biology
47 547 Molecular Cell
48 525 Developmental Biology
49 522 Journal of Cell Science
50 517 Cancer Research
51 516 Human Genetics
52 494 Current Genetics
53 485 The Plant Journal
54 484 Mechanisms of Development
55 449 Applied and Environmental Microbiology
56 447 Current Biology
57 437 Acta Crystallographica, Section D
58 436 Protein Science
59 435 Neuron
60 432 Journal of Clinical Investigation
61 426 Mammalian Genome
62 409 Molecular and Biochemical Parasitology
63 406 Journal of Neuroscience
64 396 Molecular Endocrinology
65 386 The Journal of Experimental Medicine
66 372 Immunogenetics
67 352 Journal of Neurochemistry
68 350 Journal of Molecular Evolution
69 343 Endocrinology
70 342 DNA and Cell Biology
71 334 Toxicon
72 324 DNA Sequence
73 314 The Journal of Clinical Endocrinology and Metabolism
74 314 American Journal of Physiology
75 303 Molecular Biology and Evolution
76 293 Brain Research. Molecular Brain Research
77 286 Biological Chemistry Hoppe-Seyler
78 286 Bioscience, Biotechnology, and Biochemistry
79 261 Journal of Medical Genetics
80 252 Cytogenetics and Cell Genetics
81 250 Comparative Biochemistry and Physiology
82 248 Proteins
83 242 Journal of General Microbiology
84 231 Peptides
85 222 Antimicrobial Agents and Chemotherapy
86 222 Molecular Pharmacology
87 217 Journal of Investigative Dermatology
88 215 Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
89 210 Biology of Reproduction
90 208 Nature Cell Biology
91 204 Plant and Cell Physiology
92 199 Genome Research
93 193 Virus Research
94 188 Experimental Cell Research
95 183 Molecular Plant-Microbe Interactions
96 183 DNA Research
97 178 European Journal of Immunology
98 173 RNA
99 171 Neurology
100 167 Biochimie
101 166 Developmental Dynamics
102 160 Tissue Antigens
103 158 DNA
104 156 Molecular and Cellular Endocrinology
105 151 European Journal of Human Genetics
106 150 American Journal of Medical Genetics
107 150 Molecular Phylogenetics and Evolution
108 149 Hemoglobin
109 147 Planta
110 146 Annals of Neurology
111 145 Bioorganicheskaia Khimiia
112 145 Genes to Cells
113 144 Archives of Microbiology
114 140 Journal of Human Genetics
115 137 Insect Biochemistry and Molecular Biology
116 136 Immunity
117 130 Developmental Cell
118 127 Animal Genetics
119 124 Molecular Reproduction and Development
120 123 Diabetes
121 119 General and Comparative Endocrinology
122 118 Agricultural and Biological Chemistry
123 118 Glycobiology
124 117 The New England Journal of Medicine
125 116 Molecular Immunology
126 116 Investigative Ophthalmology and Visual Science
127 109 Molecular and Cellular Neuroscience
128 107 Eukaryotic cell
129 106 British Journal of Haematology
130 106 Journal of Protein Chemistry
131 102 International Journal of Cancer
132 102 Archives of Virology
6. STATISTICS FOR SOME LINE TYPES
The following table summarizes the total number of some UniProtKB/Swiss-Prot lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.
Total Number of Average
Line type / subtype number entries per entry
--------------------------------- -------- --------- ---------
References (RL) 550759 1.99
Journal 478204 245229 1.73
Submitted to EMBL/GenBank/DDBJ 67287 59130 0.24
Submitted to other databases 2988 2775 0.01
Unpublished observations 633 627 <0.01
Book citation 578 568 <0.01
Plant Gene Register 537 525 <0.01
Thesis 389 387 <0.01
Patent 137 135 <0.01
Worm Breeder's Gazette 6 6 <0.01
Comments (CC) 1128537 4.09
SIMILARITY 318972 253893 1.15
FUNCTION 195810 188699 0.71
SUBCELLULAR LOCATION 152959 152959 0.55
SUBUNIT 105026 105026 0.38
CATALYTIC ACTIVITY 102651 94032 0.37
PATHWAY 54987 46371 0.20
COFACTOR 41486 37238 0.15
TISSUE SPECIFICITY 26128 26128 0.09
PTM 23627 19322 0.09
MISCELLANEOUS 22651 20422 0.08
DOMAIN 18132 15735 0.07
ALTERNATIVE PRODUCTS 12699 12699 0.05
INTERACTION 8120 8120 0.03
SEQUENCE CAUTION 7633 7633 0.03
INDUCTION 7512 7512 0.03
DEVELOPMENTAL STAGE 6587 6587 0.02
ENZYME REGULATION 4821 4821 0.02
WEB RESOURCE 4281 3472 0.02
DISEASE 3888 2742 0.01
CAUTION 3659 3580 0.01
MASS SPECTROMETRY 3008 2435 0.01
BIOPHYSICOCHEMICAL PROPERTIES 1793 1793 0.01
POLYMORPHISM 611 594 <0.01
RNA EDITING 496 496 <0.01
ALLERGEN 422 422 <0.01
TOXIC DOSE 331 324 <0.01
BIOTECHNOLOGY 174 174 <0.01
PHARMACEUTICAL 73 73 <0.01
Features (FT) 1860993 6.74
CHAIN 281223 272612 1.02
TRANSMEM 179620 39486 0.65
METAL 118481 29154 0.43
CONFLICT 95800 33149 0.35
DOMAIN 93021 52258 0.34
STRAND 92321 8670 0.33
TOPO_DOM 89201 18184 0.32
HELIX 88720 9107 0.32
CARBOHYD 79407 20018 0.29
DISULFID 77002 19621 0.28
BINDING 73356 25193 0.27
MOD_RES 72933 27627 0.26
ACT_SITE 66177 38366 0.24
REPEAT 60127 9182 0.22
VARIANT 48775 10403 0.18
NP_BIND 43766 31429 0.16
REGION 42789 22999 0.15
VAR_SEQ 27872 11982 0.10
COMPBIAS 27816 16202 0.10
SIGNAL 25883 25873 0.09
TURN 23765 7394 0.09
MUTAGEN 20666 5030 0.07
ZN_FING 20432 7801 0.07
MOTIF 19329 12719 0.07
SITE 17097 9484 0.06
INIT_MET 11353 11353 0.04
COILED 10903 7114 0.04
NON_TER 10693 8198 0.04
PROPEP 8294 6993 0.03
LIPID 8074 5202 0.03
DNA_BIND 7324 6770 0.03
PEPTIDE 6823 4279 0.02
TRANSIT 4813 4762 0.02
CA_BIND 2766 1151 0.01
CROSSLNK 2299 1592 0.01
NON_CONS 1308 536 <0.01
UNSURE 502 188 <0.01
SE_CYS 262 189 <0.01
Cross-references (DR) 3933954 14.24
InterPro 629121 253999 2.28
EMBL 523989 267709 1.90
Pfam 354417 247932 1.28
GO 351705 143924 1.27
PROSITE 261031 158819 0.94
KEGG 184994 167349 0.67
GenomeReviews 159422 142494 0.58
HAMAP 111335 111212 0.40
TIGRFAMs 108864 101865 0.39
PIR 107524 98237 0.39
PRINTS 96822 76679 0.35
BioCyc 94765 87112 0.34
SMART 83386 63657 0.30
PANTHER 83290 77583 0.30
HSSP 81747 81747 0.30
ProDom 74589 72198 0.27
Gene3D 73453 62434 0.27
UniGene 67015 61114 0.24
Ensembl 55170 55140 0.20
GermOnline 42027 41411 0.15
PDB 38787 10559 0.14
ArrayExpress 37729 37729 0.14
SMR 36791 36791 0.13
RZPD-ProtExp 28789 13542 0.10
PIRSF 26598 25658 0.10
TIGR 24567 23936 0.09
LinkHub 17927 17910 0.06
HGNC 16366 16288 0.06
PharmGKB 14775 14775 0.05
IntAct 14626 14626 0.05
MGI 13440 13393 0.05
MIM 13419 10760 0.05
DIP 8837 8787 0.03
SGD 6236 6148 0.02
CYGD 6224 6134 0.02
RGD 6110 6106 0.02
TAIR 6017 5906 0.02
MEROPS 5517 5204 0.02
PeptideAtlas 5026 5026 0.02
EcoGene 4311 4308 0.02
EchoBASE 4158 4126 0.02
H-InvDB 3677 3659 0.01
WormPep 3666 3037 0.01
FlyBase 3341 3213 0.01
WormBase 3320 3238 0.01
Gramene 3294 3294 0.01
GeneDB_Spombe 3262 3227 0.01
TRANSFAC 2895 2599 0.01
SubtiList 2797 2796 0.01
Reactome 2706 1546 0.01
Orphanet 2564 1641 0.01
GeneFarm 2021 2001 0.01
DrugBank 1826 502 0.01
StyGene 1599 1595 0.01
HPA 1486 1324 0.01
TubercuList 1447 1411 0.01
ZFIN 1415 1403 0.01
SWISS-2DPAGE 1183 1183 <0.01
PseudoCAP 1165 1156 <0.01
ListiList 1097 1089 <0.01
REPRODUCTION-2DPAGE 836 836 <0.01
AGD 650 644 <0.01
Leproma 636 633 <0.01
PhotoList 605 605 <0.01
LegioList 572 572 <0.01
MaizeGDB 460 455 <0.01
DisProt 395 392 <0.01
OGP 379 378 <0.01
PeroxiBase 378 367 <0.01
REBASE 364 358 <0.01
HIV 361 351 <0.01
DictyBase 355 352 <0.01
ECO2DBASE 351 299 <0.01
SagaList 349 348 <0.01
GlycoSuiteDB 282 282 <0.01
PHCI-2DPAGE 241 241 <0.01
MypuList 193 193 <0.01
DOSAC-COBS-2DPAGE 149 147 <0.01
Aarhus/Ghent-2DPAGE 128 98 <0.01
Siena-2DPAGE 103 103 <0.01
HSC-2DPAGE 85 85 <0.01
PhosSite 70 70 <0.01
Cornea-2DPAGE 67 67 <0.01
COMPLUYEAST-2DPAGE 59 59 <0.01
euHCVdb 55 44 <0.01
PMMA-2DPAGE 52 52 <0.01
PptaseDB 29 29 <0.01
Rat-heart-2DPAGE 28 28 <0.01
ANU-2DPAGE 25 25 <0.01
BuruList 20 20 <0.01
Number of explicitly cross-referenced databases: 91
Number of implicitly cross-referenced databases: 26
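
The "Average per entry" column in the line-type table above appears to be the total number of lines divided by all 276256 entries in the release, not by the number of entries that carry such a line. This is an inference from the published figures rather than a documented definition. A minimal sketch that reproduces the column for three rows and, for comparison, computes the average over annotated entries only (the function names are illustrative):

```python
TOTAL_ENTRIES = 276256  # UniProtKB/Swiss-Prot release 54.0


def per_entry_average(total_lines: int, total_entries: int = TOTAL_ENTRIES) -> float:
    """Average number of lines per database entry, rounded to two decimals."""
    return round(total_lines / total_entries, 2)


def per_annotated_entry(total_lines: int, entries_with_line: int) -> float:
    """Average over only those entries that carry at least one such line."""
    return round(total_lines / entries_with_line, 2)


# (total number, number of entries) copied from the table.
rows = {
    "SIMILARITY": (318972, 253893),
    "TRANSMEM": (179620, 39486),
    "InterPro": (629121, 253999),
}

for name, (total, entries) in rows.items():
    print(name, per_entry_average(total), per_annotated_entry(total, entries))
```

The second figure is noticeably higher for features such as TRANSMEM, which occur many times in a comparatively small number of entries.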
7. MISCELLANEOUS STATISTICS
Total number of distinct authors cited in UniProtKB/Swiss-Prot: 244404
Total number of entries encoded on a Mitochondrion: 4312
Total number of entries encoded on a Plasmid: 3339
Total number of entries encoded on a Plastid: 28
Total number of entries encoded on a Plastid; Apicoplast: 10
Total number of entries encoded on a Plastid; Chloroplast: 8493
Total number of entries encoded on a Plastid; Cyanelle: 145
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 108
Number of fragments: 8342
Number of additional sequences produced by alternative splicing, initiation or promoter usage: 20944
| UniProtKB/TrEMBL protein database release 37.0 statistics |
|---|
1. INTRODUCTION
Release 37.0 of 24-July-2007 of UniProtKB/TrEMBL contains 4672908 sequence entries
comprising 1515982311 amino acids.
407721 sequences have been added since release 36, the sequence data of
5983 existing entries has been updated and the annotations of
4265187 entries have been revised. The newly added sequences represent an increase of about 10% in the number of entries.
2. AMINO ACID COMPOSITION
2.1 Composition in percent for the complete database
Ala (A) 8.57 Gln (Q) 3.92 Leu (L) 9.86 Ser (S) 6.79
Arg (R) 5.57 Glu (E) 6.05 Lys (K) 5.20 Thr (T) 5.58
Asn (N) 4.20 Gly (G) 7.06 Met (M) 2.40 Trp (W) 1.33
Asp (D) 5.27 His (H) 2.22 Phe (F) 4.04 Tyr (Y) 3.01
Cys (C) 1.34 Ile (I) 5.91 Pro (P) 4.83 Val (V) 6.66
Asx (B) 0.000 Glx (Z) 0.000 Xaa (X) 0.05
2.2 Classification of the amino acids by their frequency
Leu, Ala, Gly, Ser, Val, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
Gln, Tyr, Met, His, Cys, Trp
3. TAXONOMIC ORIGIN
Total number of species represented in this release of UniProtKB/TrEMBL: 137885
The first twenty species represent 832320 sequences: 17.8 % of the
total number of entries.
3.1 Table of the frequency of occurrence of species
Species represented 1x:62883
2x:25898
3x:13175
4x: 7414
5x: 4369
6x: 3205
7x: 2336
8x: 1966
9x: 1552
10x: 1686
11- 20x: 7515
21- 50x: 2861
51-100x: 1184
>100x: 1841
3.2 Table of the most represented species
------ --------- --------------------------------------------
Number Frequency Species
------ --------- --------------------------------------------
1 189486 Human immunodeficiency virus 1
2 95759 Oryza sativa subsp. japonica (Rice)
3 54604 Homo sapiens (Human)
4 50189 Trichomonas vaginalis G3
5 49337 Mus musculus (Mouse)
6 43819 Arabidopsis thaliana (Mouse-ear cress)
7 39845 Paramecium tetraurelia
8 39334 Oryza sativa subsp. indica (Rice)
9 37589 Hepatitis C virus
10 28042 Tetraodon nigroviridis (Green puffer)
11 26752 Drosophila melanogaster (Fruit fly)
12 24811 Vitis vinifera (Grape)
13 22320 Medicago truncatula (Barrel medic)
14 20567 Danio rerio (Zebrafish) (Brachydanio rerio)
15 20421 Trypanosoma cruzi
16 20391 Caenorhabditis elegans
17 19091 uncultured bacterium
18 16854 Aedes aegypti (Yellowfever mosquito)
19 16685 Tetrahymena thermophila SB210
20 16424 Phaeosphaeria nodorum (Septoria nodorum)
21 15262 Hepatitis B virus (HBV)
22 14672 Plasmodium chabaudi
23 14095 Aspergillus niger
24 13408 Anopheles gambiae str. PEST
25 13061 Dictyostelium discoideum AX4
26 13056 Caenorhabditis briggsae
27 12801 Magnaporthe grisea (Rice blast fungus) (Pyricularia grisea)
28 12597 Xenopus laevis (African clawed frog)
29 12006 Aspergillus oryzae
30 11783 Plasmodium berghei
31 10977 Chaetomium globosum (Soil fungus)
32 10630 Neurospora crassa
33 10399 Coccidioides immitis
34 10354 Neosartorya fischeri (Aspergillus fischerianus
35 10345 Aspergillus terreus (strain NIH 2624)
36 10074 Drosophila pseudoobscura (Fruit fly)
37 9727 Cryptococcus neoformans (Filobasidiella neoformans)
38 9721 Schistosoma japonicum (Blood fluke)
39 9704 Aspergillus fumigatus (Sartorya fumigata)
40 9503 Emericella nidulans (Aspergillus nidulans)
41 9461 Trypanosoma brucei
42 9318 Candida albicans (Yeast)
43 9153 Escherichia coli
44 9145 Bos taurus (Bovine)
45 9064 Aspergillus clavatus
46 8975 Rhodococcus sp. (strain RHA1)
47 8512 Stigmatella aurantiaca DW4/3-1
48 8437 Plesiocystis pacifica SIR-1
49 8422 Rattus norvegicus (Rat)
50 8413 Burkholderia xenovorans (strain LB400)
51 8249 Microscilla marina ATCC 23134
52 8126 Bradyrhizobium japonicum
53 8010 Leishmania infantum
54 7976 Ostreococcus tauri
55 7939 Helicobacter pylori (Campylobacter pylori)
56 7937 Frankia sp. EAN1pec
57 7877 Leishmania braziliensis
58 7834 Burkholderia phymatum STM815
59 7808 Plasmodium yoelii yoelii
60 7745 Solibacter usitatus (strain Ellin6076)
61 7559 Bradyrhizobium sp. (strain BTAi1 / ATCC BAA-1182)
62 7521 Streptomyces coelicolor
63 7501 Hepatitis C virus subtype 1b
64 7461 Burkholderia cenocepacia MC0-3
65 7432 Burkholderia sp. (strain 383) (Burkholderia cepacia
66 7409 Burkholderia vietnamiensis G4
67 7403 Ostreococcus lucimarinus CCE9901
68 7349 Burkholderia pseudomallei 305
69 7336 Plasmodium vivax
70 7310 Burkholderia phytofirmans PsJN
71 7305 Streptomyces avermitilis
72 7215 Burkholderia pseudomallei (strain 668)
73 7190 Myxococcus xanthus (strain DK 1622)
74 7171 Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
75 7141 Saccharopolyspora erythraea (strain NRRL 23338)
76 7138 Burkholderia pseudomallei (strain 1106a)
77 7136 Rhizobium loti (Mesorhizobium loti)
78 7109 Leishmania major
79 6996 Burkholderia ambifaria MC40-6
80 6974 Methylobacterium sp. 4-46
81 6970 Rhizobium leguminosarum bv. viciae (strain 3841)
82 6953 Rhodopirellula baltica
83 6932 Pseudomonas aeruginosa
84 6911 Agrobacterium tumefaciens (strain C58 / ATCC 33970)
85 6850 Burkholderia cenocepacia (strain HI2424)
86 6700 Bradyrhizobium sp. (strain ORS278)
87 6687 Frankia alni (strain ACN14a)
88 6679 Psychroflexus torquis ATCC 700755
89 6606 Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz)
90 6587 Mycobacterium smegmatis (strain ATCC 700084 / mc(2)155)
91 6564 Burkholderia multivorans ATCC 17616
92 6561 Burkholderia cepacia (strain ATCC 53795 / AMMD)
93 6553 Plasmodium falciparum
94 6537 Hahella chejuensis (strain KCTC 2396)
95 6482 Ralstonia eutropha (Cupriavidus necator
96 6463 Planctomyces maris DSM 8797
97 6457 Ustilago maydis (Smut fungus)
98 6412 Cyanothece sp. CCY 0110
99 6394 Giardia lamblia ATCC 50803
100 6337 Sinorhizobium medicae WSM419
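The figure quoted at the start of section 3 for the first twenty species can be reproduced from the counts above. This is a sketch under one assumption: the total number of entries is taken as the sum of the kingdom counts in section 3.3, which may differ marginally from the official entry count.

```python
# Counts for the twenty most represented species, copied from table 3.2.
top20 = [
    189486, 95759, 54604, 50189, 49337, 43819, 39845, 39334, 37589, 28042,
    26752, 24811, 22320, 20567, 20421, 20391, 19091, 16854, 16685, 16424,
]

# Assumption: total entries = sum of the kingdom counts in section 3.3.
total_entries = 99441 + 2542475 + 1507950 + 519029 + 4011

top20_total = sum(top20)
share = 100 * top20_total / total_entries
print(top20_total, f"{share:.1f}%")
```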
3.3 Taxonomic distribution of the sequences
Kingdom sequences (% of the database)
Archaea 99441 ( 2%)
Bacteria 2542475 ( 54%)
Eukaryota 1507950 ( 32%)
Viruses 519029 ( 11%)
Other 4011 ( <1%)
Within Eukaryota:
Category sequences (% of Eukaryota) (% of the complete database)
Human 54604 ( 4%) ( 1%)
Other Mammalia 129446 ( 9%) ( 3%)
Other Vertebrata 180006 ( 12%) ( 4%)
Viridiplantae 383788 ( 25%) ( 8%)
Fungi 238250 ( 16%) ( 5%)
Insecta 147593 ( 10%) ( 3%)
Nematoda 37432 ( 2%) ( 1%)
Other 336831 ( 22%) ( 7%)
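The two percentage columns of the Eukaryota breakdown use different denominators. A minimal sketch, assuming the database total is the sum of the kingdom counts and that percentages are rounded to whole numbers (the helper name `pct` is illustrative):

```python
def pct(part: int, whole: int) -> int:
    """Percentage rounded to a whole number, as printed in the tables."""
    return round(100 * part / whole)


# Counts copied from section 3.3.
kingdoms = {
    "Archaea": 99441, "Bacteria": 2542475, "Eukaryota": 1507950,
    "Viruses": 519029, "Other": 4011,
}
eukaryota = {
    "Human": 54604, "Other Mammalia": 129446, "Other Vertebrata": 180006,
    "Viridiplantae": 383788, "Fungi": 238250, "Insecta": 147593,
    "Nematoda": 37432, "Other": 336831,
}

# Assumption: the database total is the sum of the kingdom counts.
database_total = sum(kingdoms.values())
eukaryota_total = kingdoms["Eukaryota"]

for name, n in eukaryota.items():
    print(f"{name:17} {n:7} ({pct(n, eukaryota_total):3}%) ({pct(n, database_total):3}%)")
```

The eight categories add up to the Eukaryota count. A value that rounds to 0, such as the kingdom "Other", is printed as "<1%" in the table.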
4. SEQUENCE SIZE
Distribution of the sequences by size (excluding fragments)
From To Number From To Number
1- 50 68917 1001-1100 28568
51- 100 325538 1101-1200 20000
101- 150 406073 1201-1300 13794
151- 200 388378 1301-1400 9199
201- 250 389089 1401-1500 7508
251- 300 373155 1501-1600 5523
301- 350 346516 1601-1700 4222
351- 400 272777 1701-1800 3452
401- 450 224745 1801-1900 2632
451- 500 188831 1901-2000 2218
501- 550 135570 2001-2100 1760
551- 600 100144 2101-2200 1822
601- 650 75262 2201-2300 1418
651- 700 58398 2301-2400 1150
701- 750 51071 2401-2500 939
751- 800 45518 >2500 8013
801- 850 33704
851- 900 29503
901- 950 21777
951-1000 17156
The average sequence length in UniProtKB/TrEMBL is 324 amino acids.
The shortest sequence is Q96AT0_HUMAN: 4 amino acids.
The longest sequence is Q3ASY8_CHLCH: 36805 amino acids.
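The average length can be checked against the other figures in these statistics. The sketch below assumes that every entry is either counted in one of the size bins above or is a fragment (section 6), and that the average is taken over all entries, fragments included.

```python
# Non-fragment sequence counts per size bin, copied from the table in section 4.
size_bins = [
    68917, 325538, 406073, 388378, 389089, 373155, 346516, 272777, 224745,
    188831, 135570, 100144, 75262, 58398, 51071, 45518, 33704, 29503, 21777,
    17156,                      # 1-1000, in steps of 50
    28568, 20000, 13794, 9199, 7508, 5523, 4222, 3452, 2632, 2218, 1760,
    1822, 1418, 1150, 939,      # 1001-2500, in steps of 100
    8013,                       # >2500
]
fragments = 1008568             # "Number of fragments", section 6
total_amino_acids = 1515982311  # amino acid total quoted in these statistics

# Assumption: every entry is either in a size bin or is a fragment.
total_entries = sum(size_bins) + fragments
average_length = total_amino_acids / total_entries
print(total_entries, round(average_length))
```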
5. STATISTICS FOR SOME LINE TYPES
The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines. Group rows (RL, CC, FT, DR) give only the total
number and the average per entry.
Line type / subtype   Total number   Number of entries   Average per entry
--------------------------------- -------- --------- ---------
References (RL) 6509095 1.39
Submitted to EMBL/GenBank/DDBJ 3467141 2744405 0.74
Journal 2916615 2494170 0.62
Thesis 6588 6534 <0.01
Submitted to other databases 4346 4340 <0.01
Book citation 4284 4239 <0.01
Other 110121 78245 0.02
Comments (CC) 3245182 0.69
CAUTION 1035264 1035264 0.22
SIMILARITY 1028284 934241 0.22
CATALYTIC ACTIVITY 309925 287313 0.07
FUNCTION 268909 267080 0.06
SUBCELLULAR LOCATION 254968 252079 0.05
SUBUNIT 130740 130288 0.03
PATHWAY 112447 103550 0.02
COFACTOR 96274 93156 0.02
MISCELLANEOUS 5231 5231 <0.01
INTERACTION 2613 2613 <0.01
DOMAIN 523 521 <0.01
MASS SPECTROMETRY 4 4 <0.01
Features (FT) 2030625 0.43
NON_TER 1687691 1006361 0.36
CHAIN 202951 171791 0.04
SIGNAL 139437 139437 0.03
TRANSIT 546 546 <0.01
Cross-references (DR) 32363734 6.93
InterPro 6492321 3104830 1.39
GO 5725036 2029244 1.23
EMBL 5279734 4660870 1.13
Pfam 4003018 2955678 0.86
PROSITE 2123457 1385572 0.45
GenomeReviews 1140859 1096509 0.24
Gene3D 921242 810969 0.20
KEGG 867649 830073 0.19
PRINTS 842539 707055 0.18
SMART 747946 584038 0.16
TIGRFAMs 651635 598676 0.14
PANTHER 569104 543361 0.12
ProDom 523096 499091 0.11
SMR 394475 394475 0.08
BioCyc 309825 296969 0.07
HSSP 270097 269694 0.06
UniGene 238789 220780 0.05
PIR 190628 155545 0.04
TIGR 179285 172649 0.04
Ensembl 158571 158482 0.03
PIRSF 151109 144392 0.03
RZPD-ProtExp 106571 32439 0.02
ArrayExpress 93384 93299 0.02
Gramene 70587 70587 0.02
MGI 41758 41081 0.01
HGNC 36138 36100 0.01
FlyBase 34546 34449 0.01
euHCVdb 32511 32511 0.01
WormPep 19084 19003 <0.01
TAIR 18749 18701 <0.01
WormBase 18556 18477 <0.01
ZFIN 15139 15135 <0.01
LinkHub 13340 13340 <0.01
DictyBase 12907 12907 <0.01
MEROPS 11596 11163 <0.01
LegioList 5331 5301 <0.01
IntAct 5216 5216 <0.01
ListiList 4718 4701 <0.01
PseudoCAP 4405 4402 <0.01
PDB 4271 2576 <0.01
BuruList 4220 4186 <0.01
PhotoList 4075 3951 <0.01
AGD 4049 4049 <0.01
RGD 3721 3407 <0.01
REBASE 3691 3666 <0.01
TubercuList 2542 2536 <0.01
DIP 2484 2479 <0.01
SagaList 1745 1651 <0.01
GeneDB_Spombe 1729 1717 <0.01
PeroxiBase 1359 1356 <0.01
PharmGKB 1347 1346 <0.01
Leproma 971 970 <0.01
TRANSFAC 863 853 <0.01
MypuList 589 585 <0.01
PeptideAtlas 376 376 <0.01
SGD 372 371 <0.01
PHCI-2DPAGE 106 106 <0.01
CYGD 101 98 <0.01
ANU-2DPAGE 59 59 <0.01
Reactome 45 32 <0.01
SWISS-2DPAGE 35 35 <0.01
REPRODUCTION-2DPAGE 27 27 <0.01
PMMA-2DPAGE 3 3 <0.01
Siena-2DPAGE 2 2 <0.01
COMPLUYEAST-2DPAGE 1 1 <0.01
Number of explicitly cross-referenced databases: 91
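The "average per entry" column appears to be the total number of lines divided by the total number of entries in the database, not by the number of entries carrying such a line. A minimal sketch under that assumption (the function name is illustrative, and the entry total is derived rather than quoted):

```python
# Assumption: 3664340 non-fragment sequences (sum of the section 4 size table)
# plus 1008568 fragments (section 6).
TOTAL_ENTRIES = 3664340 + 1008568


def average_per_entry(total_lines: int, total_entries: int = TOTAL_ENTRIES) -> str:
    """Format lines-per-entry with two decimals, as in the table."""
    text = f"{total_lines / total_entries:.2f}"
    # Values that round to 0.00 are shown as "<0.01".
    return "<0.01" if text == "0.00" else text


for label, total in [
    ("References (RL)", 6509095),
    ("Cross-references (DR)", 32363734),
    ("MGI", 41758),
    ("Thesis", 6588),
]:
    print(f"{label:24} {total:9} {average_per_entry(total):>6}")
```

Under this reading, a row such as MGI shows 0.01 because 41758 lines over all entries rounds up to 0.01, while rows that round to 0.00 are shown as "<0.01".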
6. MISCELLANEOUS STATISTICS
Total number of distinct authors cited in UniProtKB/TrEMBL: 251813
Total number of entries encoded on a Mitochondrion: 171426
Total number of entries encoded on a Plasmid: 73689
Total number of entries encoded on a Plastid: 3656
Total number of entries encoded on a Plastid; Apicoplast: 181
Total number of entries encoded on a Plastid; Chloroplast: 59449
Total number of entries encoded on a Plastid; Cyanelle: 7
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 210
Number of fragments: 1008568
| Submissions and Updates |
|---|
We welcome feedback from our users. We would especially appreciate your notifying us if you find that sequences belonging to your field of expertise are missing from the database. We also would like to be notified about annotations to be updated, if, for example, the function of a protein has been clarified or if new information about post-translational modifications has become available.
Submit new sequence data, updates and corrections at http://www.uniprot.org/support/submissions.shtml
For all queries regarding submissions to UniProtKB and to submit new protein sequence data, please contact:
UniProt Knowledgebase
The EMBL Outstation - The European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom
Telephone: (+44 1223) 494 462
Telefax: (+44 1223) 494 468
E-mail: datasubs@ebi.ac.uk
| Download information |
|---|
The latest data of the UniProt Knowledgebase is available in various formats (flatfile, XML or FASTA) at http://www.uniprot.org/database/download.shtml. The data is further supplemented by a file containing the sequences of all additional alternative isoforms annotated in UniProtKB/Swiss-Prot. This data set is documented in the file ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/README.varsplic
For users who wish to download the UniProt Knowledgebase only occasionally, we distribute the latest major release (updated 3 times per year) in flatfile format. Previous UniProtKB/Swiss-Prot and UniProtKB/TrEMBL releases are archived under ftp://ftp.uniprot.org/pub/databases/uniprot/previous_major_releases. The UniProt Knowledgebase major release is also available on DVD from the EBI.
| Contact |
|---|
| Citation |
|---|
If you want to cite UniProt in a publication, please use the following reference:
The UniProt Consortium
"The Universal Protein Resource (UniProt)"
Nucleic Acids Res. 35:D193-D197(2007) doi:10.1093/nar/gkl929