UniProt Knowledgebase release notes: UniProtKB release 11.0 of 29-May-2007
| Content |
|---|
Related documents: UniProtKB user manual, Recent changes, Forthcoming changes.
| Introduction |
|---|
Release 11.0 of the UniProt Knowledgebase is composed of the UniProtKB/Swiss-Prot Protein Knowledgebase release 53.0 and the UniProtKB/TrEMBL Protein Database release 36.0.
More information on these databases can be found in the user manual section "What is the UniProt Knowledgebase?".
| UniProtKB/Swiss-Prot protein knowledgebase release 53.0 statistics |
|---|
The growth of the database is summarized below.
| Release | Date | Number of entries | Number of amino acids |
|---|---|---|---|
| 2.0 | 09/86 | 3'939 | 900'163 |
| 3.0 | 11/86 | 4'160 | 969'641 |
| 4.0 | 04/87 | 4'387 | 1'036'010 |
| 5.0 | 09/87 | 5'205 | 1'327'683 |
| 6.0 | 01/88 | 6'102 | 1'653'982 |
| 7.0 | 04/88 | 6'821 | 1'885'771 |
| 8.0 | 08/88 | 7'724 | 2'224'465 |
| 9.0 | 11/88 | 8'702 | 2'498'140 |
| 10.0 | 03/89 | 10'008 | 2'952'613 |
| 11.0 | 07/89 | 10'856 | 3'265'966 |
| 12.0 | 10/89 | 12'305 | 3'797'482 |
| 13.0 | 01/90 | 13'837 | 4'347'336 |
| 14.0 | 04/90 | 15'409 | 4'914'264 |
| 15.0 | 08/90 | 16'941 | 5'486'399 |
| 16.0 | 11/90 | 18'364 | 5'986'949 |
| 17.0 | 02/91 | 20'024 | 6'524'504 |
| 18.0 | 05/91 | 20'772 | 6'792'034 |
| 19.0 | 08/91 | 21'795 | 7'173'785 |
| 20.0 | 11/91 | 22'654 | 7'500'130 |
| 21.0 | 03/92 | 23'742 | 7'866'596 |
| 22.0 | 05/92 | 25'044 | 8'375'696 |
| 23.0 | 08/92 | 26'706 | 9'011'391 |
| 24.0 | 12/92 | 28'154 | 9'545'427 |
| 25.0 | 04/93 | 29'955 | 10'214'020 |
| 26.0 | 07/93 | 31'808 | 10'875'091 |
| 27.0 | 10/93 | 33'329 | 11'484'420 |
| 28.0 | 02/94 | 36'000 | 12'496'420 |
| 29.0 | 06/94 | 38'303 | 13'464'008 |
| 30.0 | 10/94 | 40'292 | 14'147'368 |
| 31.0 | 02/95 | 43'470 | 15'335'248 |
| 32.0 | 11/95 | 49'340 | 17'385'503 |
| 33.0 | 02/96 | 52'205 | 18'531'384 |
| 34.0 | 10/96 | 59'021 | 21'210'389 |
| 35.0 | 11/97 | 69'113 | 25'083'768 |
| 36.0 | 07/98 | 74'019 | 26'840'295 |
| 37.0 | 12/98 | 77'977 | 28'268'293 |
| 38.0 | 07/99 | 80'000 | 29'085'965 |
| 39.0 | 05/00 | 86'593 | 31'411'114 |
| 40.0 | 10/01 | 101'602 | 37'315'215 |
| 41.0 | 02/03 | 122'564 | 44'986'459 |
| 42.0 | 10/03 | 135'850 | 50'046'799 |
| 43.0 | 03/04 | 146'720 | 54'093'154 |
| 44.0 | 07/04 | 153'871 | 56'608'159 |
| 45.0 | 10/04 | 163'235 | 59'631'787 |
| 46.0 | 02/05 | 168'297 | 61'443'278 |
| 47.0 | 05/05 | 181'577 | 65'746'672 |
| 48.0 | 09/05 | 194'317 | 70'391'852 |
| 49.0 | 02/06 | 207'132 | 75'438'310 |
| 50.0 | 05/06 | 222'289 | 81'585'146 |
| 51.0 | 10/06 | 241'242 | 88'541'632 |
| 52.0 | 03/07 | 261'513 | 95'638'062 |
| 53.0 | 05/07 | 269'293 | 98'902'758 |
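The counts in this table use an apostrophe as the thousands separator (for example 269'293). As a minimal sketch, assuming the rows are copied as plain strings, the snippet below parses the last two rows and computes the net change between releases 52.0 and 53.0. The helper name `parse_count` is hypothetical and not part of any UniProt tool.

```python
def parse_count(text: str) -> int:
    """Convert an apostrophe-grouped number such as "269'293" to an int."""
    return int(text.replace("'", ""))

# Last two rows of the growth table: (release, entries, amino acids).
rows = [
    ("52.0", "261'513", "95'638'062"),
    ("53.0", "269'293", "98'902'758"),
]

(prev_rel, prev_entries, prev_aa), (cur_rel, cur_entries, cur_aa) = rows
net_entries = parse_count(cur_entries) - parse_count(prev_entries)
net_residues = parse_count(cur_aa) - parse_count(prev_aa)

print(f"{prev_rel} -> {cur_rel}: {net_entries:+d} entries, {net_residues:+d} amino acids")
```

The result is a difference between two totals, so it need not equal the number of newly added sequences reported in the release statistics.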
In rare cases, UniProtKB/Swiss-Prot entries are removed. Deleted entries are almost exclusively Open Reading Frames (ORFs) that were wrongly predicted to code for proteins. When there is enough evidence that these hypothetical proteins are not real, we remove them from UniProtKB/Swiss-Prot. The document delac_sp.txt lists all accession numbers that were previously present in UniProtKB/Swiss-Prot but have since been deleted from the database.
We have selected a number of organisms that are the target of genome sequencing and/or mapping projects and for which we intend to:
From our efforts to annotate human sequence entries as completely as possible arose the HPI project, and the bacterial model organisms became the focus of the HAMAP project. Here is the current status of the model organisms which are not covered by these two projects:
| Organism | Database cross-references | Index file | Number of sequences |
|---|---|---|---|
| A.thaliana | TAIR | arath.txt | 5'771 |
| C.albicans | None yet | calbican.txt | 621 |
| C.elegans | WormPep | celegans.txt | 3'113 |
| D.discoideum | DictyBase | dicty.txt | 357 |
| D.melanogaster | FlyBase | fly.txt | 2'659 |
| M.musculus | MGD | mgdtosp.txt | 13'155 |
| S.cerevisiae | SGD | yeast.txt | 6'240 |
| S.pombe | GeneDB_Spombe | pombe.txt | 3'232 |
1. INTRODUCTION
Release 53.0 of 29-May-07 of UniProtKB/Swiss-Prot contains 269293 sequence entries,
comprising 98902758 amino acids abstracted from 156204 references.
9228 sequences have been added since release 52.0, the sequence data of
734 existing entries has been updated and the annotations of
210454 entries have been revised.
2. AMINO ACID COMPOSITION
2.1 Composition in percent for the complete database
Ala (A) 7.85 Gln (Q) 3.96 Leu (L) 9.66 Ser (S) 6.87
Arg (R) 5.42 Glu (E) 6.67 Lys (K) 5.93 Thr (T) 5.40
Asn (N) 4.12 Gly (G) 6.94 Met (M) 2.39 Trp (W) 1.13
Asp (D) 5.34 His (H) 2.29 Phe (F) 3.95 Tyr (Y) 3.01
Cys (C) 1.50 Ile (I) 5.89 Pro (P) 4.84 Val (V) 6.72
Asx (B) 0.000 Glx (Z) 0.000 Xaa (X) 0.00
2.2 Classification of the amino acids by their frequency
Leu, Ala, Gly, Ser, Val, Glu, Lys, Ile, Arg, Thr, Asp, Pro, Asn, Gln,
Phe, Tyr, Met, His, Cys, Trp
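The ordering in section 2.2 follows from sorting the percentages of section 2.1 in descending order. The sketch below reproduces it, assuming the values are typed in by hand from the table; the variable names are illustrative only.

```python
# Percent composition copied from section 2.1 (ambiguous codes B, Z, X omitted).
composition = {
    "Ala": 7.85, "Gln": 3.96, "Leu": 9.66, "Ser": 6.87,
    "Arg": 5.42, "Glu": 6.67, "Lys": 5.93, "Thr": 5.40,
    "Asn": 4.12, "Gly": 6.94, "Met": 2.39, "Trp": 1.13,
    "Asp": 5.34, "His": 2.29, "Phe": 3.95, "Tyr": 3.01,
    "Cys": 1.50, "Ile": 5.89, "Pro": 4.84, "Val": 6.72,
}

# Most frequent residue first; no two residues share the same percentage here.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))
```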
3. TAXONOMIC ORIGIN
Total number of species represented in this release of UniProtKB/Swiss-Prot: 10917
The first twenty species represent 84159 sequences: 31.3 % of the total
number of entries.
3.1 Table of the frequency of occurrence of species
Species represented 1x: 5200
2x: 1661
3x: 801
4x: 529
5x: 359
6x: 332
7x: 225
8x: 194
9x: 168
10x: 89
11- 20x: 451
21- 50x: 329
51-100x: 170
>100x: 409
3.2 Table of the most represented species
------ --------- --------------------------------------------
Number Frequency Species
------ --------- --------------------------------------------
1 16602 Homo sapiens (Human)
2 13316 Mus musculus (Mouse)
3 6163 Saccharomyces cerevisiae (Baker's yeast)
4 6119 Rattus norvegicus (Rat)
5 5706 Arabidopsis thaliana (Mouse-ear cress)
6 4930 Escherichia coli
7 4025 Bos taurus (Bovine)
8 3188 Schizosaccharomyces pombe (Fission yeast)
9 3032 Caenorhabditis elegans
10 2854 Bacillus subtilis
11 2545 Drosophila melanogaster (Fruit fly)
12 2008 Xenopus laevis (African clawed frog)
13 1885 Escherichia coli O157:H7
14 1782 Methanococcus jannaschii
15 1774 Haemophilus influenzae
16 1762 Pongo pygmaeus (Orangutan)
17 1752 Gallus gallus (Chicken)
18 1636 Salmonella typhimurium
19 1552 Escherichia coli O6
20 1528 Shigella flexneri
21 1418 Mycobacterium tuberculosis
22 1332 Danio rerio (Zebrafish) (Brachydanio rerio)
23 1232 Salmonella typhi
24 1223 Pseudomonas aeruginosa
25 1195 Sus scrofa (Pig)
26 1159 Mycobacterium bovis
27 1077 Oryza sativa subsp. japonica (Rice)
28 978 Synechocystis sp. (strain PCC 6803)
29 971 Archaeoglobus fulgidus
30 906 Yersinia pestis
31 892 Vibrio cholerae
32 884 Mimivirus
33 884 Macaca fascicularis (Crab eating macaque) (Cynomolgus monkey)
34 879 Rhizobium meliloti (Sinorhizobium meliloti)
35 838 Oryctolagus cuniculus (Rabbit)
36 796 Staphylococcus aureus (strain Mu50 / ATCC 700699)
37 794 Staphylococcus aureus (strain N315)
38 770 Staphylococcus aureus (strain MW2)
39 770 Staphylococcus aureus (strain COL)
40 766 Staphylococcus aureus (strain MSSA476)
41 759 Staphylococcus aureus (strain MRSA252)
42 756 Aquifex aeolicus
43 738 Vibrio parahaemolyticus
44 738 Pasteurella multocida
45 714 Canis familiaris (Dog)
46 687 Streptomyces coelicolor
47 687 Mycoplasma pneumoniae
48 682 Vibrio vulnificus
49 674 Bacillus halodurans
50 663 Vibrio vulnificus (strain YJ016)
51 647 Staphylococcus epidermidis (strain ATCC 35984 / RP62A)
52 645 Staphylococcus epidermidis (strain ATCC 12228)
53 633 Mycobacterium leprae
54 631 Anabaena sp. (strain PCC 7120)
55 629 Neurospora crassa
56 621 Ashbya gossypii (Yeast) (Eremothecium gossypii)
57 619 Yersinia pseudotuberculosis
58 618 Bacillus anthracis
59 618 Pseudomonas syringae pv. tomato
60 617 Candida albicans (Yeast)
61 614 Pseudomonas putida (strain KT2440)
62 612 Treponema pallidum
63 611 Pan troglodytes (Chimpanzee)
64 602 Photorhabdus luminescens subsp. laumondii
65 598 Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
66 591 Zea mays (Maize)
67 588 Methanobacterium thermoautotrophicum
68 588 Kluyveromyces lactis (Yeast) (Candida sphaerica)
69 582 Bradyrhizobium japonicum
70 579 Rickettsia prowazekii
71 577 Salmonella paratyphi-a
72 574 Helicobacter pylori (Campylobacter pylori)
73 572 Buchnera aphidicola subsp. Acyrthosiphon pisum
74 572 Ralstonia solanacearum (Pseudomonas solanacearum)
75 571 Agrobacterium tumefaciens (strain C58 / ATCC 33970)
76 562 Buchnera aphidicola subsp. Schizaphis graminum
77 559 Rhizobium loti (Mesorhizobium loti)
78 559 Erwinia carotovora subsp. atroseptica (Pectobacterium atrosepticum)
79 555 Lactococcus lactis subsp. lactis (Streptococcus lactis)
80 555 Helicobacter pylori J99 (Campylobacter pylori J99)
81 550 Listeria monocytogenes
82 542 Bacillus cereus (strain ATCC 14579 / DSM 31)
83 542 Listeria innocua
84 541 Xanthomonas campestris pv. campestris
85 539 Shewanella oneidensis
86 535 Candida glabrata (Yeast) (Torulopsis glabrata)
87 530 Neisseria meningitidis serogroup A
88 530 Neisseria meningitidis serogroup B
89 519 Clostridium acetobutylicum
90 517 Caulobacter crescentus (Caulobacter vibrioides)
91 507 Buchnera aphidicola subsp. Baizongia pistaciae
92 507 Xanthomonas axonopodis pv. citri
93 494 Brucella suis
94 493 Brucella melitensis
95 492 Streptococcus pneumoniae
96 490 Salmonella choleraesuis
97 489 Thermotoga maritima
98 485 Oceanobacillus iheyensis
99 483 Mycoplasma genitalium
100 482 Listeria monocytogenes serotype 4b (strain F2365)
101 481 Rickettsia conorii
102 481 Xylella fastidiosa
103 472 Photobacterium profundum (Photobacterium sp. (strain SS9))
104 472 Xylella fastidiosa (strain Temecula1 / ATCC 700964)
105 467 Deinococcus radiodurans
106 467 Haemophilus ducreyi
107 458 Methanosarcina acetivorans
108 456 Bordetella bronchiseptica (Alcaligenes bronchisepticus)
109 454 Corynebacterium glutamicum (Brevibacterium flavum)
110 454 Clostridium perfringens
111 448 Bacillus cereus (strain ATCC 10987)
112 446 Pyrococcus horikoshii
113 443 Bordetella parapertussis
114 443 Emericella nidulans (Aspergillus nidulans)
115 442 Bordetella pertussis
116 441 Pyrococcus abyssi
117 440 Halobacterium salinarium (Halobacterium halobium)
118 438 Chromobacterium violaceum
119 437 Methanosarcina mazei (Methanosarcina frisia)
120 436 Yarrowia lipolytica (Candida lipolytica)
121 435 Chlamydia trachomatis
122 434 Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
123 432 Rickettsia felis (Rickettsia azadi)
124 425 Borrelia burgdorferi (Lyme disease spirochete)
125 423 Lactobacillus plantarum
126 422 Thermoanaerobacter tengcongensis
127 421 Nicotiana tabacum (Common tobacco)
128 421 Pyrococcus furiosus
129 419 Synechococcus elongatus (Thermosynechococcus elongatus)
130 419 Rickettsia bellii (strain RML369-C)
131 417 Streptococcus pyogenes serotype M6
132 416 Ovis aries (Sheep)
133 416 Chlamydia pneumoniae (Chlamydophila pneumoniae)
134 414 Enterococcus faecalis (Streptococcus faecalis)
135 413 Streptococcus mutans
136 413 Bacillus thuringiensis subsp. konkukian
137 412 Campylobacter jejuni
138 412 Streptomyces avermitilis
139 408 Debaryomyces hansenii (Yeast) (Torulaspora hansenii)
140 406 Chlamydia muridarum
141 406 Rhizobium sp. (strain NGR234)
142 397 Streptococcus pyogenes serotype M1
143 397 Sulfolobus solfataricus
144 394 Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
145 393 Streptococcus pyogenes serotype M18
146 391 Streptococcus pyogenes serotype M3
147 390 Rickettsia typhi
148 389 Staphylococcus haemolyticus (strain JCSC1435)
149 388 Shigella sonnei (strain Ss046)
150 383 Acinetobacter sp. (strain ADP1)
151 383 Burkholderia pseudomallei (Pseudomonas pseudomallei)
152 382 Bacillus cereus (strain ZK / E33L)
153 378 Staphylococcus saprophyticus subsp. saprophyticus
154 374 Rhodopseudomonas palustris
155 373 Chlorobium tepidum
156 372 Nitrosomonas europaea
157 370 Corynebacterium efficiens
158 369 Vibrio fischeri (strain ATCC 700601 / ES114)
159 368 Bacillus clausii (strain KSM-K16)
160 368 Pyrococcus kodakaraensis (Thermococcus kodakaraensis)
161 364 Shigella boydii serotype 4 (strain Sb227)
162 359 Methanopyrus kandleri
163 359 Bacillus licheniformis (strain DSM 13 / ATCC 14580)
164 357 Mannheimia succiniciproducens (strain MBEL55E)
165 356 Burkholderia mallei (Pseudomonas mallei)
166 354 Gloeobacter violaceus
167 351 Leptospira interrogans
168 349 Aeropyrum pernix
169 348 Shigella dysenteriae serotype 1 (strain Sd197)
170 348 Streptococcus agalactiae serotype III
171 345 Streptococcus agalactiae serotype V
172 344 Dictyostelium discoideum (Slime mold)
173 341 Leptospira interrogans serogroup Icterohaemorrhagiae serovar copenhageni
174 340 Solanum tuberosum (Potato)
175 340 Pisum sativum (Garden pea)
176 338 Methylococcus capsulatus
177 338 Synechococcus sp. (strain WH8102)
178 336 Geobacillus kaustophilus
179 334 Sulfolobus tokodaii
180 332 Prochlorococcus marinus (strain MIT 9313)
181 332 Glycine max (Soybean)
182 331 Prochlorococcus marinus
183 325 Mycobacterium paratuberculosis
184 324 Staphylococcus aureus
185 324 Aspergillus fumigatus (Sartorya fumigata)
186 323 Pseudomonas fluorescens (strain Pf-5 / ATCC BAA-477)
187 321 Brucella abortus
188 320 Idiomarina loihiensis
189 320 Rhodopirellula baltica
190 318 Macaca mulatta (Rhesus macaque)
191 317 Geobacter sulfurreducens
192 317 Staphylococcus aureus (strain NCTC 8325)
193 317 Pseudomonas syringae pv. syringae (strain B728a)
194 317 Thermoplasma acidophilum
195 315 Synechococcus sp. (strain ATCC 27144 / PCC 6301 / SAUG 1402/1)
196 314 Coxiella burnetii
197 313 Fusobacterium nucleatum subsp. nucleatum
198 312 Prochlorococcus marinus subsp. pastoris (strain CCMP 1378 / MED4)
199 310 Triticum aestivum (Wheat)
200 300 Azoarcus sp. (strain EbN1) (Aromatoleum aromaticum (strain EbN1))
201 299 Pseudomonas syringae pv. phaseolicola (strain 1448A / Race 6)
202 297 Nocardia farcinica
203 297 Rhodobacter sphaeroides (strain ATCC 17023 / 2.4.1 / NCIB 8253 / DSM 158)
204 297 Synechococcus sp. (strain PCC 7942) (Anacystis nidulans R2)
205 296 Staphylococcus aureus (strain bovine RF122)
206 295 Wolinella succinogenes
207 295 Thermus thermophilus (strain HB8 / ATCC 27634 / DSM 579)
208 293 Zymomonas mobilis
209 293 Bacteroides thetaiotaomicron
210 292 Desulfovibrio vulgaris (strain Hildenborough / ATCC 29579 / NCIMB 8303)
211 291 Sulfolobus acidocaldarius
212 288 Clostridium tetani
213 287 Symbiobacterium thermophilum
214 287 Pseudomonas putida
215 287 Haemophilus influenzae (strain 86-028NP)
216 287 Silicibacter pomeroyi
217 287 Legionella pneumophila subsp. pneumophila
218 287 Pseudomonas fluorescens (strain PfO-1)
219 286 Xanthomonas oryzae pv. oryzae
220 285 Pyrobaculum aerophilum
221 284 Neisseria gonorrhoeae (strain ATCC 700825 / FA 1090)
222 283 Cavia porcellus (Guinea pig)
223 283 Legionella pneumophila (strain Paris)
224 283 Hordeum vulgare (Barley)
225 282 Thermoplasma volcanium
226 281 Legionella pneumophila (strain Lens)
227 279 Staphylococcus aureus (strain USA300)
228 279 Ralstonia eutropha (strain JMP134) (Alcaligenes eutrophus)
229 278 Corynebacterium diphtheriae
230 276 Burkholderia sp. (strain 383) (Burkholderia cepacia
231 273 Thermus thermophilus (strain HB27 / ATCC BAA-163 / DSM 7039)
232 269 Gorilla gorilla gorilla (Lowland gorilla)
233 269 Spinacia oleracea (Spinach)
234 268 Bacteriophage T4
235 264 Equus caballus (Horse)
236 263 Methanococcus maripaludis
237 262 Rhodobacter capsulatus (Rhodopseudomonas capsulata)
238 262 Xanthomonas campestris pv. campestris (strain 8004)
239 261 Helicobacter hepaticus
240 261 Bifidobacterium longum
241 260 Colwellia psychrerythraea (strain 34H / ATCC BAA-681) (Vibrio psychroerythus)
242 260 Wigglesworthia glossinidia brevipalpis
243 259 Haloarcula marismortui (Halobacterium marismortui)
244 259 Oryza sativa (Rice)
245 258 Ureaplasma parvum (Ureaplasma urealyticum biotype 1)
246 257 Dechloromonas aromatica (strain RCB)
247 255 Anabaena variabilis (strain ATCC 29413 / PCC 7937)
248 255 Leifsonia xyli subsp. xyli
249 254 Vaccinia virus (strain Copenhagen) (VACV)
250 253 Gluconobacter oxydans (Gluconobacter suboxydans)
251 252 Porphyromonas gingivalis (Bacteroides gingivalis)
252 251 Brucella abortus (strain 2308)
253 250 Bartonella henselae (Rochalimaea henselae)
254 247 Bacteroides fragilis
255 247 Campylobacter jejuni (strain RM1221)
256 245 Cryptococcus neoformans (Filobasidiella neoformans)
257 244 Chlamydophila caviae
258 243 Desulfotalea psychrophila
259 242 Pseudoalteromonas haloplanktis (strain TAC 125)
260 241 Blochmannia floridanus
261 241 Burkholderia pseudomallei (strain 1710b)
262 240 Lactobacillus johnsonii
263 238 Propionibacterium acnes
264 237 Xanthomonas campestris pv. vesicatoria (strain 85-10)
265 237 Bartonella quintana (Rochalimaea quintana)
266 236 Nitrosococcus oceani (strain ATCC 19707 / NCIMB 11848)
267 235 Bacillus stearothermophilus (Geobacillus stearothermophilus)
268 233 Thiobacillus denitrificans (strain ATCC 25259)
269 232 Escherichia coli (strain UTI89 / UPEC)
270 227 Ustilago maydis (Smut fungus)
271 225 Chlamydomonas reinhardtii
272 224 Streptococcus thermophilus (strain ATCC BAA-250 / LMG 18311)
273 222 Francisella tularensis subsp. tularensis
274 222 Streptococcus thermophilus (strain CNRZ 1066)
275 221 Bdellovibrio bacteriovorus
276 217 Escherichia coli O6:K15:H31 (strain 536 / UPEC)
277 217 Porphyra purpurea
278 216 Psychrobacter arcticum
279 213 Caenorhabditis briggsae
280 212 Klebsiella pneumoniae
281 212 Pelobacter carbinolicus (strain DSM 2380 / Gra Bd 1)
282 212 Nitrobacter winogradskyi (strain Nb-255 / ATCC 25391)
283 211 Felis silvestris catus (Cat)
284 209 Cricetulus griseus (Chinese hamster)
285 209 Gibberella zeae (Fusarium graminearum)
286 209 Lactobacillus acidophilus
287 209 Treponema denticola
288 209 Rhodospirillum rubrum (strain ATCC 11170 / NCIB 8255)
289 207 Porphyra yezoensis
290 203 Nitrosospira multiformis (strain ATCC 25196 / NCIMB 11849)
291 202 Burkholderia thailandensis (strain E264 / ATCC 700388 / DSM 13276 / CIP 106301)
292 202 Mesocricetus auratus (Golden hamster)
293 200 Vaccinia virus (strain Western Reserve / WR) (VACV)
294 200 Thiomicrospira crunogena (strain XCL-2)
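The statement at the start of section 3 about the first twenty species can be checked against the top of this table. A minimal sketch, assuming the twenty frequencies are copied by hand and the share is taken relative to the 269293 entries of release 53.0:

```python
# Frequencies of the 20 most represented species, copied from table 3.2.
top20 = [
    16602, 13316, 6163, 6119, 5706, 4930, 4025, 3188, 3032, 2854,
    2545, 2008, 1885, 1782, 1774, 1762, 1752, 1636, 1552, 1528,
]
total_entries = 269293  # release 53.0 entry count from the introduction

top20_total = sum(top20)
share = 100 * top20_total / total_entries
print(f"{top20_total} sequences, {share:.1f} % of all entries")
```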
3.3 Taxonomic distribution of the sequences
Kingdom sequences (% of the database)
Archaea 10967 ( 4%)
Bacteria 130080 ( 48%)
Eukaryota 117170 ( 44%)
Viruses 11076 ( 4%)
Within Eukaryota:
Category sequences (% of Eukaryota) (% of the complete database)
Human 16603 ( 14%) ( 6%)
Other Mammalia 36858 ( 31%) ( 14%)
Other Vertebrata 10970 ( 9%) ( 4%)
Viridiplantae 19863 ( 17%) ( 7%)
Fungi 17340 ( 15%) ( 6%)
Insecta 4888 ( 4%) ( 2%)
Nematoda 3486 ( 3%) ( 1%)
Other 7162 ( 6%) ( 3%)
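The percentages in section 3.3 are plain ratios rounded to whole percents. The sketch below recomputes them from the sequence counts, assuming standard rounding; `percent` is a hypothetical helper introduced for illustration.

```python
# Sequence counts copied from section 3.3.
kingdoms = {"Archaea": 10967, "Bacteria": 130080, "Eukaryota": 117170, "Viruses": 11076}
eukaryota = {
    "Human": 16603, "Other Mammalia": 36858, "Other Vertebrata": 10970,
    "Viridiplantae": 19863, "Fungi": 17340, "Insecta": 4888,
    "Nematoda": 3486, "Other": 7162,
}

database_total = sum(kingdoms.values())
eukaryota_total = kingdoms["Eukaryota"]

def percent(part: int, whole: int) -> int:
    """Share of `part` in `whole`, rounded to a whole percent."""
    return round(100 * part / whole)

for name, count in kingdoms.items():
    print(f"{name:<18}{count:>8} ({percent(count, database_total):>3}%)")

for name, count in eukaryota.items():
    print(f"{name:<18}{count:>8} "
          f"({percent(count, eukaryota_total):>3}%) "
          f"({percent(count, database_total):>3}%)")
```

The four kingdom counts add up to the total number of entries, and the eight eukaryotic categories add up to the Eukaryota count.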
4. SEQUENCE SIZE
Distribution of the sequences by size (excluding fragments)
From To Number From To Number
1- 50 5095 1001-1100 2259
51- 100 19534 1101-1200 1515
101- 150 27896 1201-1300 1220
151- 200 26732 1301-1400 1003
201- 250 26947 1401-1500 838
251- 300 23343 1501-1600 417
301- 350 23199 1601-1700 330
351- 400 21643 1701-1800 284
401- 450 17008 1801-1900 255
451- 500 14711 1901-2000 222
501- 550 11081 2001-2100 135
551- 600 7644 2101-2200 187
601- 650 6459 2201-2300 178
651- 700 4429 2301-2400 116
701- 750 3737 2401-2500 94
751- 800 2996 >2500 703
801- 850 2516
851- 900 2629
901- 950 1992
951-1000 1570
The average sequence length in UniProtKB/Swiss-Prot is 367 amino acids.
The shortest sequence is GWA_SEPOF (P83570): 2 amino acids.
The longest sequence is TITIN_HUMAN (Q8WZ42): 34350 amino acids.
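Two figures in this section can be cross-checked with simple arithmetic. The sketch below assumes that the average length is the total number of amino acids divided by the total number of entries, and that the size table covers every entry that is not a fragment (8376 fragments are reported in section 7). Both assumptions are inferred from the numbers, not stated in the release notes.

```python
# Bin counts copied from the size table, in order of increasing length
# (1-50, 51-100, ..., 951-1000, then 1001-1100, ..., 2401-2500, >2500).
size_bins = [
    5095, 19534, 27896, 26732, 26947, 23343, 23199, 21643, 17008, 14711,
    11081, 7644, 6459, 4429, 3737, 2996, 2516, 2629, 1992, 1570,
    2259, 1515, 1220, 1003, 838, 417, 330, 284, 255, 222,
    135, 187, 178, 116, 94, 703,
]
total_entries = 269293
total_amino_acids = 98902758
fragments = 8376  # "Number of fragments" from section 7

non_fragment_entries = total_entries - fragments
average_length = total_amino_acids / total_entries

print(f"entries in size table: {sum(size_bins)}")
print(f"entries minus fragments: {non_fragment_entries}")
print(f"average length: {average_length:.0f} amino acids")
```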
5. JOURNAL CITATIONS
Note: the following citation statistics reflect the number of distinct
journal citations.
Total number of journals cited in this release of UniProtKB/Swiss-Prot: 1816
5.1 Table of the frequency of journal citations
Journals cited 1x: 631
2x: 231
3x: 136
4x: 93
5x: 68
6x: 49
7x: 39
8x: 31
9x: 32
10x: 16
11- 20x: 139
21- 50x: 147
51-100x: 73
>100x: 131
5.2 List of the most cited journals in UniProtKB/Swiss-Prot
Nb Citations Journal name
-- --------- -------------------------------------------------------------
1 14911 Journal of Biological Chemistry
2 7108 Proceedings of the National Academy of Sciences of the U.S.A.
3 4518 Journal of Bacteriology
4 4231 Gene
5 4078 Nucleic Acids Research
6 3837 Biochemical and Biophysical Research Communications
7 3551 FEBS Letters
8 3308 Biochemistry
9 3269 The EMBO Journal
10 2943 European Journal of Biochemistry
11 2815 Nature
12 2782 Molecular and Cellular Biology
13 2673 Biochimica et Biophysica Acta
14 2530 Journal of Molecular Biology
15 2307 Genomics
16 2279 Cell
17 1851 Biochemical Journal
18 1754 Science
19 1500 Molecular Microbiology
20 1356 Journal of Virology
21 1353 Plant Molecular Biology
22 1294 Journal of Cell Biology
23 1271 Molecular and General Genetics
24 1132 Virology
25 1103 Human Molecular Genetics
26 1076 Journal of Biochemistry
27 1073 Genes and Development
28 1072 Nature Genetics
29 973 Plant Physiology
30 970 Oncogene
31 938 The American Journal of Human Genetics
32 832 Human Mutation
33 813 Development
34 789 Journal of Immunology
35 769 Infection and Immunity
36 757 Genetics
37 741 Structure
38 696 Yeast
39 694 Archives of Biochemistry and Biophysics
40 688 Molecular Biology of the Cell
41 679 Journal of General Virology
42 627 Microbiology
43 583 Blood
44 574 The Plant Cell
45 564 FEMS Microbiology Letters
46 544 Nature Structural Biology
47 532 Molecular Cell
48 510 Journal of Cell Science
49 504 Human Genetics
50 503 Developmental Biology
51 501 Cancer Research
52 493 Current Genetics
53 474 Mechanisms of Development
54 470 The Plant Journal
55 446 Applied and Environmental Microbiology
56 438 Current Biology
57 432 Protein Science
58 430 Acta Crystallographica, Section D
59 429 Neuron
60 423 Mammalian Genome
61 421 Journal of Clinical Investigation
62 403 Molecular and Biochemical Parasitology
63 402 Journal of Neuroscience
64 392 Molecular Endocrinology
65 384 The Journal of Experimental Medicine
66 372 Immunogenetics
67 348 Journal of Neurochemistry
68 347 Journal of Molecular Evolution
69 342 DNA and Cell Biology
70 339 Endocrinology
71 324 Toxicon
72 323 DNA Sequence
73 311 The Journal of Clinical Endocrinology and Metabolism
74 309 American Journal of Physiology
75 295 Molecular Biology and Evolution
76 292 Brain Research. Molecular Brain Research
77 286 Biological Chemistry Hoppe-Seyler
78 284 Bioscience, Biotechnology, and Biochemistry
79 252 Cytogenetics and Cell Genetics
80 246 Comparative Biochemistry and Physiology
81 244 Proteins
82 242 Journal of General Microbiology
83 242 Journal of Medical Genetics
84 225 Peptides
85 221 Molecular Pharmacology
86 219 Antimicrobial Agents and Chemotherapy
87 215 Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
88 208 Journal of Investigative Dermatology
89 205 Biology of Reproduction
90 202 Nature Cell Biology
91 196 Genome Research
92 196 Plant and Cell Physiology
93 189 Virus Research
94 183 DNA Research
95 181 Molecular Plant-Microbe Interactions
96 177 Experimental Cell Research
97 176 European Journal of Immunology
98 169 RNA
99 166 Biochimie
100 166 Neurology
101 160 Developmental Dynamics
102 159 Tissue Antigens
103 158 DNA
104 152 Molecular and Cellular Endocrinology
105 149 American Journal of Medical Genetics
106 149 Molecular Phylogenetics and Evolution
107 149 Hemoglobin
108 145 Bioorganicheskaia Khimiia
109 144 European Journal of Human Genetics
110 143 Genes to Cells
111 142 Annals of Neurology
112 141 Archives of Microbiology
113 140 Planta
114 137 Journal of Human Genetics
115 135 Insect Biochemistry and Molecular Biology
116 131 Immunity
117 128 Developmental Cell
118 125 Animal Genetics
119 122 Molecular Reproduction and Development
120 120 Diabetes
121 118 Agricultural and Biological Chemistry
122 118 General and Comparative Endocrinology
123 116 Glycobiology
124 116 Investigative Ophthalmology and Visual Science
125 112 Molecular Immunology
126 110 The New England Journal of Medicine
127 106 Molecular and Cellular Neuroscience
128 106 Journal of Protein Chemistry
129 102 Eukaryotic Cell
130 102 Archives of Virology
131 101 British Journal of Haematology
6. STATISTICS FOR SOME LINE TYPES
The following table summarizes the total number of some UniProtKB/Swiss-Prot lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.
Total Number of Average
Line type / subtype number entries per entry
--------------------------------- -------- --------- ---------
References (RL) 535419 1.99
Journal 466313 240501 1.73
Submitted to EMBL/GenBank/DDBJ 64038 56191 0.24
Submitted to Swiss-Prot 2131 2080 0.01
Submitted to other databases 656 643 <0.01
Unpublished observations 634 628 <0.01
Book citation 578 568 <0.01
Plant Gene Register 537 525 <0.01
Thesis 388 386 <0.01
Patent 138 136 <0.01
Worm Breeder's Gazette 6 6 <0.01
Comments (CC) 1094051 4.06
SIMILARITY 308814 246723 1.15
FUNCTION 190370 183685 0.71
SUBCELLULAR LOCATION 148332 148332 0.55
SUBUNIT 101315 101315 0.38
CATALYTIC ACTIVITY 100585 92207 0.37
PATHWAY 53586 45379 0.20
COFACTOR 40430 36241 0.15
TISSUE SPECIFICITY 25652 25652 0.10
MISCELLANEOUS 22399 20186 0.08
PTM 21807 17661 0.08
DOMAIN 17056 14702 0.06
ALTERNATIVE PRODUCTS 12219 12219 0.05
INTERACTION 8011 8011 0.03
INDUCTION 7327 7327 0.03
SEQUENCE CAUTION 6991 6991 0.03
DEVELOPMENTAL STAGE 6431 6431 0.02
ENZYME REGULATION 4713 4713 0.02
WEB RESOURCE 4182 3404 0.02
DISEASE 3728 2674 0.01
CAUTION 3445 3371 0.01
MASS SPECTROMETRY 2868 2347 0.01
BIOPHYSICOCHEMICAL PROPERTIES 1736 1736 0.01
POLYMORPHISM 600 584 <0.01
RNA EDITING 484 484 <0.01
ALLERGEN 421 421 <0.01
TOXIC DOSE 326 321 <0.01
BIOTECHNOLOGY 150 150 <0.01
PHARMACEUTICAL 73 73 <0.01
Features (FT) 1810156 6.72
CHAIN 274122 265689 1.02
TRANSMEM 173694 38311 0.65
METAL 114900 28257 0.43
CONFLICT 94113 32564 0.35
STRAND 92328 8673 0.34
DOMAIN 89923 50410 0.33
HELIX 88733 9109 0.33
TOPO_DOM 87932 17895 0.33
CARBOHYD 77812 19639 0.29
DISULFID 75787 19361 0.28
BINDING 70187 24266 0.26
MOD_RES 66122 25262 0.25
ACT_SITE 64096 37057 0.24
REPEAT 58439 8843 0.22
VARIANT 46452 9660 0.17
NP_BIND 42707 30693 0.16
REGION 41005 21872 0.15
COMPBIAS 26848 15541 0.10
VAR_SEQ 26720 11548 0.10
SIGNAL 25239 25229 0.09
TURN 23767 7394 0.09
MUTAGEN 20170 4905 0.07
ZN_FING 19737 7617 0.07
MOTIF 18802 12363 0.07
SITE 16382 9270 0.06
INIT_MET 11205 11205 0.04
NON_TER 10727 8231 0.04
COILED 10588 6900 0.04
PROPEP 8180 6890 0.03
LIPID 7976 5129 0.03
DNA_BIND 7151 6620 0.03
PEPTIDE 6760 4226 0.03
TRANSIT 4711 4664 0.02
CA_BIND 2746 1143 0.01
CROSSLNK 2069 1416 0.01
NON_CONS 1297 534 <0.01
UNSURE 477 185 <0.01
SE_CYS 252 183 <0.01
Cross-references (DR) 4033012 14.98
InterPro 651509 247238 2.42
EMBL 511792 260808 1.90
Pfam 344308 240530 1.28
GO 342677 138336 1.27
PROSITE 254039 154834 0.94
Gene3D 236303 170230 0.88
KEGG 183185 165599 0.68
GenomeReviews 154937 138089 0.58
HAMAP 107598 107477 0.40
TIGRFAMs 106041 99200 0.39
PANTHER 103373 92186 0.38
PIR 99319 92736 0.37
PRINTS 95064 75266 0.35
HSSP 81207 81207 0.30
SMART 80636 61451 0.30
ProDom 72914 70542 0.27
BioCyc 72868 67334 0.27
UniGene 64399 59174 0.24
Ensembl 54123 54101 0.20
GermOnline 42029 41413 0.16
PDB 38670 10525 0.14
ArrayExpress 37093 37093 0.14
PIRSF 36473 33978 0.14
SMR 36314 36314 0.13
RZPD-ProtExp 28321 13310 0.11
TIGR 24163 23538 0.09
LinkHub 17851 17834 0.07
HGNC 16065 15996 0.06
IntAct 14505 14505 0.05
MGI 13185 13140 0.05
MIM 13142 10642 0.05
DIP 8831 8781 0.03
SGD 6236 6149 0.02
CYGD 6224 6135 0.02
RGD 5989 5985 0.02
TAIR 5775 5677 0.02
MEROPS 5482 5173 0.02
EcoGene 4311 4308 0.02
EchoBASE 4158 4126 0.02
H-InvDB 3677 3659 0.01
WormPep 3652 3028 0.01
FlyBase 3313 3189 0.01
WormBase 3304 3222 0.01
GeneDB_Spombe 3221 3186 0.01
Gramene 3075 3075 0.01
TRANSFAC 2884 2589 0.01
SubtiList 2795 2794 0.01
Reactome 2707 1546 0.01
Orphanet 2513 1615 0.01
GeneFarm 1854 1835 0.01
DrugBank 1826 502 0.01
StyGene 1589 1585 0.01
HPA 1486 1324 0.01
TubercuList 1446 1410 0.01
ZFIN 1303 1291 <0.01
SWISS-2DPAGE 1181 1181 <0.01
PseudoCAP 1164 1155 <0.01
ListiList 1093 1085 <0.01
REPRODUCTION-2DPAGE 834 834 <0.01
Leproma 636 633 <0.01
AGD 627 621 <0.01
PhotoList 602 602 <0.01
LegioList 564 564 <0.01
MaizeGDB 458 453 <0.01
OGP 379 378 <0.01
PeroxiBase 377 366 <0.01
REBASE 364 358 <0.01
HIV 361 351 <0.01
ECO2DBASE 351 299 <0.01
SagaList 349 348 <0.01
DictyBase 347 344 <0.01
GlycoSuiteDB 282 282 <0.01
PHCI-2DPAGE 241 241 <0.01
MypuList 193 193 <0.01
DOSAC-COBS-2DPAGE 149 147 <0.01
Aarhus/Ghent-2DPAGE 128 98 <0.01
Siena-2DPAGE 103 103 <0.01
HSC-2DPAGE 85 85 <0.01
PhosSite 70 70 <0.01
Cornea-2DPAGE 67 67 <0.01
COMPLUYEAST-2DPAGE 59 59 <0.01
euHCVdb 55 44 <0.01
PMMA-2DPAGE 52 52 <0.01
PptaseDB 29 29 <0.01
Rat-heart-2DPAGE 28 28 <0.01
ANU-2DPAGE 25 25 <0.01
BuruList 5 5 <0.01
Number of explicitly cross-referenced databases: 88
Number of implicitly cross-referenced databases: 26
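The "Average per entry" column of the table above is the total number of lines divided by the total number of entries in the release (269293), not by the number of entries that carry such a line. A minimal sketch for the four main line types and one subtype, with values copied from the table:

```python
total_entries = 269293  # release 53.0 entry count

# Total number of lines per line type, copied from the table in section 6.
line_totals = {
    "References (RL)": 535419,
    "Journal": 466313,
    "Comments (CC)": 1094051,
    "Features (FT)": 1810156,
    "Cross-references (DR)": 4033012,
}

# Format to two decimals, matching the precision used in the table.
averages = {name: f"{count / total_entries:.2f}" for name, count in line_totals.items()}

for name, value in averages.items():
    print(f"{name:<24}{value:>6}")
```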
7. MISCELLANEOUS STATISTICS
Total number of distinct authors cited in UniProtKB/Swiss-Prot: 241395
Total number of entries encoded on a Mitochondrion: 4307
Total number of entries encoded on a Plasmid: 3324
Total number of entries encoded on a Plastid: 26
Total number of entries encoded on a Plastid; Apicoplast: 6
Total number of entries encoded on a Plastid; Chloroplast: 8139
Total number of entries encoded on a Plastid; Cyanelle: 145
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 91
Number of fragments: 8376
Number of additional sequences produced by alternative splicing, initiation or promoter usage: 20101
| UniProtKB/TrEMBL protein database release 36.0 statistics |
|---|
1. INTRODUCTION
Release 36.0 of 29-May-2007 of UniProtKB/TrEMBL contains 4377315 sequence entries
comprising 1418480772 amino acids.
635321 sequences have been added since release 35, the sequence data of
6733 existing entries has been updated and the annotations of
2544163 entries have been revised. This represents an increase of 17%.
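The release notes do not say how the 17% figure is derived. One reading that reproduces it, offered here only as an assumption, is the number of added sequences relative to the entry count before those additions:

```python
current_entries = 4377315  # UniProtKB/TrEMBL release 36.0
added_sequences = 635321   # sequences added since release 35

# Assumption: the baseline is the current total minus the added sequences.
# The actual release 35 total is not given in this document.
baseline = current_entries - added_sequences
increase_percent = 100 * added_sequences / baseline

print(f"{increase_percent:.1f}% increase over {baseline} entries")
```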
2. AMINO ACID COMPOSITION
2.1 Composition in percent for the complete database
Ala (A) 8.60 Gln (Q) 3.93 Leu (L) 9.87 Ser (S) 6.80
Arg (R) 5.59 Glu (E) 6.03 Lys (K) 5.16 Thr (T) 5.60
Asn (N) 4.19 Gly (G) 7.07 Met (M) 2.40 Trp (W) 1.33
Asp (D) 5.26 His (H) 2.22 Phe (F) 4.03 Tyr (Y) 3.00
Cys (C) 1.35 Ile (I) 5.89 Pro (P) 4.86 Val (V) 6.66
Asx (B) 0.000 Glx (Z) 0.000 Xaa (X) 0.05
2.2 Classification of the amino acids by their frequency
Leu, Ala, Gly, Ser, Val, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
Gln, Tyr, Met, His, Cys, Trp
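The two composition tables in these release notes (UniProtKB/Swiss-Prot and UniProtKB/TrEMBL) can be compared residue by residue. The sketch below lists the differences in percentage points, largest first; the values are copied by hand from both tables, and the comparison itself is an illustration and not part of the release statistics.

```python
# Percent composition for UniProtKB/Swiss-Prot release 53.0.
swissprot = {
    "Ala": 7.85, "Gln": 3.96, "Leu": 9.66, "Ser": 6.87,
    "Arg": 5.42, "Glu": 6.67, "Lys": 5.93, "Thr": 5.40,
    "Asn": 4.12, "Gly": 6.94, "Met": 2.39, "Trp": 1.13,
    "Asp": 5.34, "His": 2.29, "Phe": 3.95, "Tyr": 3.01,
    "Cys": 1.50, "Ile": 5.89, "Pro": 4.84, "Val": 6.72,
}
# Percent composition for UniProtKB/TrEMBL release 36.0.
trembl = {
    "Ala": 8.60, "Gln": 3.93, "Leu": 9.87, "Ser": 6.80,
    "Arg": 5.59, "Glu": 6.03, "Lys": 5.16, "Thr": 5.60,
    "Asn": 4.19, "Gly": 7.07, "Met": 2.40, "Trp": 1.33,
    "Asp": 5.26, "His": 2.22, "Phe": 4.03, "Tyr": 3.00,
    "Cys": 1.35, "Ile": 5.89, "Pro": 4.86, "Val": 6.66,
}

# Positive values mean the residue is more frequent in TrEMBL.
difference = {aa: trembl[aa] - swissprot[aa] for aa in swissprot}
by_magnitude = sorted(difference, key=lambda aa: abs(difference[aa]), reverse=True)

for aa in by_magnitude[:5]:
    print(f"{aa}: {difference[aa]:+.2f} percentage points")
```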
3. TAXONOMIC ORIGIN
Total number of species represented in this release of UniProtKB/TrEMBL: 133843
The first twenty species represent 811024 sequences: 18.5 % of the
total number of entries.
3.1 Table of the frequency of occurrence of species
Species represented 1x:61175
2x:25028
3x:12878
4x: 7213
5x: 4220
6x: 3118
7x: 2264
8x: 1889
9x: 1517
10x: 1615
11- 20x: 7245
21- 50x: 2799
51-100x: 1134
>100x: 1748
3.2 Table of the most represented species
------ --------- --------------------------------------------
Number Frequency Species
------ --------- --------------------------------------------
1 183943 Human immunodeficiency virus 1
2 95950 Oryza sativa subsp. japonica (Rice)
3 51267 Homo sapiens (Human)
4 50189 Trichomonas vaginalis G3
5 49978 Mus musculus (Mouse)
6 44092 Arabidopsis thaliana (Mouse-ear cress)
7 39844 Paramecium tetraurelia
8 38479 Oryza sativa subsp. indica (Rice)
9 37042 Hepatitis C virus
10 28036 Tetraodon nigroviridis (Green puffer)
11 26966 Drosophila melanogaster (Fruit fly)
12 22297 Medicago truncatula (Barrel medic)
13 20231 Caenorhabditis elegans
14 20162 Trypanosoma cruzi
15 19623 Danio rerio (Zebrafish) (Brachydanio rerio)
16 18255 uncultured bacterium
17 16853 Aedes aegypti (Yellowfever mosquito)
18 16685 Tetrahymena thermophila SB210
19 16460 Phaeosphaeria nodorum (Septoria nodorum)
20 14672 Plasmodium chabaudi
21 14311 Hepatitis B virus (HBV)
22 13528 Aspergillus niger
23 13412 Anopheles gambiae str. PEST
24 13071 Dictyostelium discoideum AX4
25 13062 Caenorhabditis briggsae
26 12905 Magnaporthe grisea (Rice blast fungus) (Pyricularia grisea)
27 12570 Xenopus laevis (African clawed frog)
28 12020 Aspergillus oryzae
29 11783 Plasmodium berghei
30 10991 Chaetomium globosum (Soil fungus)
31 10645 Neurospora crassa
32 10429 Coccidioides immitis
33 10370 Neosartorya fischeri (Aspergillus fischerianus
34 10360 Aspergillus terreus (strain NIH 2624)
35 10067 Drosophila pseudoobscura (Fruit fly)
36 9747 Cryptococcus neoformans (Filobasidiella neoformans)
37 9721 Aspergillus fumigatus (Sartorya fumigata)
38 9720 Schistosoma japonicum (Blood fluke)
39 9518 Emericella nidulans (Aspergillus nidulans)
40 9453 Trypanosoma brucei
41 9332 Candida albicans (Yeast)
42 9080 Aspergillus clavatus
43 9004 Escherichia coli
44 8987 Rhodococcus sp. (strain RHA1)
45 8557 Rattus norvegicus (Rat)
46 8512 Stigmatella aurantiaca DW4/3-1
47 8424 Burkholderia xenovorans (strain LB400)
48 8285 Bos taurus (Bovine)
49 8249 Microscilla marina ATCC 23134
50 8123 Bradyrhizobium japonicum
51 8011 Leishmania infantum
52 7975 Ostreococcus tauri
53 7937 Frankia sp. EAN1pec
54 7880 Leishmania braziliensis
55 7834 Burkholderia phymatum STM815
56 7808 Plasmodium yoelii yoelii
57 7757 Solibacter usitatus (strain Ellin6076)
58 7659 Helicobacter pylori (Campylobacter pylori)
59 7522 Streptomyces coelicolor
60 7461 Burkholderia cenocepacia MC0-3
61 7439 Burkholderia sp. (strain 383) (Burkholderia cepacia
62 7432 Bradyrhizobium sp. BTAi1
63 7409 Burkholderia vietnamiensis G4
64 7403 Ostreococcus lucimarinus CCE9901
65 7349 Burkholderia pseudomallei 305
66 7310 Burkholderia phytofirmans PsJN
67 7297 Streptomyces avermitilis
68 7274 Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
69 7215 Burkholderia pseudomallei (strain 668)
70 7199 Myxococcus xanthus (strain DK 1622)
71 7161 Saccharopolyspora erythraea (strain ATCC 11635 / DSM 40517 / NRRL 2338)
72 7147 Burkholderia pseudomallei (strain 1106a)
73 7136 Rhizobium loti (Mesorhizobium loti)
74 7113 Hepatitis C virus subtype 1b
75 7113 Leishmania major
76 6996 Burkholderia ambifaria MC40-6
77 6986 Rhizobium leguminosarum bv. viciae (strain 3841)
78 6953 Rhodopirellula baltica
79 6916 Agrobacterium tumefaciens (strain C58 / ATCC 33970)
80 6870 Burkholderia cenocepacia (strain HI2424)
81 6726 Pseudomonas aeruginosa
82 6711 Bradyrhizobium sp. ORS278
83 6704 Frankia alni (strain ACN14a)
84 6679 Psychroflexus torquis ATCC 700755
85 6595 Mycobacterium smegmatis (strain ATCC 700084 / mc(2)155)
86 6581 Burkholderia cepacia (strain ATCC 53795 / AMMD)
87 6564 Burkholderia multivorans ATCC 17616
88 6553 Hahella chejuensis (strain KCTC 2396)
89 6517 Plasmodium falciparum
90 6501 Ralstonia eutropha (Cupriavidus necator
91 6468 Ustilago maydis (Smut fungus)
92 6411 Cyanothece sp. CCY0110
93 6394 Giardia lamblia ATCC 50803
94 6337 Sinorhizobium medicae WSM419
95 6302 Burkholderia cenocepacia (strain AU 1054)
96 6300 Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz)
97 6272 Stappia aggregata IAM 12614
98 6227 Oryza sativa (Rice)
99 6172 Bacillus anthracis
100 6170 Yarrowia lipolytica (Candida lipolytica)
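The figure quoted at the start of section 3 for the twenty most represented species can be rechecked from the first twenty rows of this table. The sketch below is illustrative only. It assumes that the total number of entries equals the sum of the kingdom counts listed in section 3.3, since the total is not stated separately in this part of the notes.

```python
# Sequence counts for ranks 1-20 of the table in section 3.2.
top20_counts = [
    183943, 95950, 51267, 50189, 49978, 44092, 39844, 38479, 37042, 28036,
    26966, 22297, 20231, 20162, 19623, 18255, 16853, 16685, 16460, 14672,
]

# Assumption: total entries = sum of the kingdom counts from section 3.3.
kingdom_counts = {
    "Archaea": 97220,
    "Bacteria": 2327609,
    "Eukaryota": 1446119,
    "Viruses": 502656,
    "Other": 3709,
}
total_entries = sum(kingdom_counts.values())

top20_total = sum(top20_counts)
top20_share = 100 * top20_total / total_entries

print(f"Top 20 species: {top20_total} sequences")
print(f"Share of all entries: {top20_share:.1f} %")
```

Under that assumption the sketch reproduces both quoted values: 811024 sequences and 18.5 %.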
3.3 Taxonomic distribution of the sequences
Kingdom sequences (% of the database)
Archaea 97220 ( 2%)
Bacteria 2327609 ( 53%)
Eukaryota 1446119 ( 33%)
Viruses 502656 ( 11%)
Other 3709 ( <1%)
Within Eukaryota:
Category sequences (% of Eukaryota) (% of the complete database)
Human 51267 ( 4%) ( 1%)
Other Mammalia 127002 ( 9%) ( 3%)
Other Vertebrata 174804 ( 12%) ( 4%)
Viridiplantae 355389 ( 25%) ( 8%)
Fungi 225852 ( 16%) ( 5%)
Insecta 144995 ( 10%) ( 3%)
Nematoda 37220 ( 3%) ( 1%)
Other 329590 ( 23%) ( 8%)
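The percentages in the two tables above are whole-number roundings of simple ratios. The following minimal sketch recomputes them from the sequence counts; the helper name `share` and the "<1%" convention for values that round to zero are assumptions inferred from the table, not a documented UniProt procedure.

```python
# Sequence counts per kingdom (section 3.3).
kingdoms = {
    "Archaea": 97220, "Bacteria": 2327609, "Eukaryota": 1446119,
    "Viruses": 502656, "Other": 3709,
}

# Breakdown within Eukaryota (section 3.3).
eukaryota = {
    "Human": 51267, "Other Mammalia": 127002, "Other Vertebrata": 174804,
    "Viridiplantae": 355389, "Fungi": 225852, "Insecta": 144995,
    "Nematoda": 37220, "Other": 329590,
}


def share(count, total):
    """Return count/total as a whole-number percentage string.

    Values that round to zero are shown as '<1%', as in the tables.
    """
    pct = round(100 * count / total)
    return f"{pct}%" if pct >= 1 else "<1%"


total_entries = sum(kingdoms.values())
total_eukaryota = kingdoms["Eukaryota"]

for name, count in kingdoms.items():
    print(f"{name:<18}{count:>9} ({share(count, total_entries)})")

print("Within Eukaryota:")
for name, count in eukaryota.items():
    print(f"{name:<18}{count:>9} "
          f"({share(count, total_eukaryota)}) ({share(count, total_entries)})")
```

The eukaryotic categories sum exactly to the Eukaryota count, so the "Other" row is the remainder after the named groups.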
4. SEQUENCE SIZE
Distribution of the sequences by size (excluding fragments)
From   To     Number      From   To     Number
   1-  50      60636      1001-1100      26286
  51- 100     301601      1101-1200      18518
 101- 150     378501      1201-1300      12788
 151- 200     360318      1301-1400       8482
 201- 250     361834      1401-1500       6934
 251- 300     346925      1501-1600       5095
 301- 350     322854      1601-1700       3905
 351- 400     254139      1701-1800       3210
 401- 450     208459      1801-1900       2411
 451- 500     175752      1901-2000       2043
 501- 550     126518      2001-2100       1627
 551- 600      93347      2101-2200       1698
 601- 650      70033      2201-2300       1317
 651- 700      54300      2301-2400       1093
 701- 750      47578      2401-2500        864
 751- 800      42478         >2500        7587
 801- 850      31364
 851- 900      27564
 901- 950      20246
 951-1000      15849
The average sequence length in UniProtKB/TrEMBL is 324 amino acids.
The shortest sequence is Q96AT0_HUMAN: 4 amino acids.
The longest sequence is Q3ASY8_CHLCH: 36805 amino acids.
5. STATISTICS FOR SOME LINE TYPES
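The following table summarizes the total number of selected UniProtKB/TrEMBL
line types, the number of entries with at least one such line, and the
average number of such lines per entry. For the four summary rows (RL, CC,
FT and DR), only the total and the average are given.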
Line type / subtype                 Total number   Number of entries   Average per entry
---------------------------------   ------------   -----------------   -----------------
References (RL) 6233829 1.42
Submitted to EMBL/GenBank/DDBJ 3316771 2557101 0.76
Journal 2829436 2384972 0.65
Thesis 6545 6491 <0.01
Book citation 4240 4195 <0.01
Submitted to other databases 273 267 <0.01
Other 76564 44407 0.02
Comments (CC) 1728629 0.39
CAUTION 930449 930449 0.21
SIMILARITY 281294 276006 0.06
FUNCTION 120986 113447 0.03
SUBCELLULAR LOCATION 116592 116592 0.03
CATALYTIC ACTIVITY 111221 100470 0.03
SUBUNIT 78221 78221 0.02
COFACTOR 58468 58243 0.01
PATHWAY 20159 16511 <0.01
DOMAIN 4879 4275 <0.01
MISCELLANEOUS 3656 3656 <0.01
INTERACTION 2695 2695 <0.01
ALLERGEN 5 5 <0.01
MASS SPECTROMETRY 4 4 <0.01
Features (FT) 1963260 0.45
NON_TER 1627029 970945 0.37
CHAIN 198930 168578 0.05
SIGNAL 136762 136762 0.03
TRANSIT 539 539 <0.01
Cross-references (DR) 32364872 7.39
InterPro 6595878 3152098 1.51
GO 5811332 2060966 1.33
EMBL 4964829 4369327 1.13
Pfam 4064279 3000740 0.93
PROSITE 2155488 1407184 0.49
GenomeReviews 1143708 1099331 0.26
Gene3D 933992 822246 0.21
KEGG 869694 832105 0.20
PRINTS 853606 716490 0.20
SMART 761643 594461 0.17
TIGRFAMs 666776 612627 0.15
PANTHER 578057 551858 0.13
ProDom 531615 507197 0.12
SMR 395863 395863 0.09
BioCyc 280518 265666 0.06
HSSP 270957 270554 0.06
UniGene 242685 224363 0.06
PIR 184223 149145 0.04
TIGR 171226 164627 0.04
Ensembl 157071 157070 0.04
PIRSF 154302 147467 0.04
RZPD-ProtExp 108875 33477 0.02
ArrayExpress 95585 95495 0.02
Gramene 70734 70734 0.02
MGI 42221 41755 0.01
HGNC 34930 34883 0.01
euHCVdb 32511 32511 0.01
FlyBase 24665 24629 0.01
WormPep 19286 19205 <0.01
TAIR 18958 18899 <0.01
WormBase 18790 18711 <0.01
ZFIN 15410 15406 <0.01
LinkHub 13489 13489 <0.01
DictyBase 12917 12917 <0.01
MEROPS 11642 11209 <0.01
LegioList 5339 5309 <0.01
IntAct 5322 5322 <0.01
ListiList 4722 4705 <0.01
PseudoCAP 4407 4404 <0.01
PDB 4323 2607 <0.01
BuruList 4235 4201 <0.01
PhotoList 4078 3954 <0.01
AGD 4073 4073 <0.01
RGD 3795 3466 <0.01
REBASE 3692 3667 <0.01
TubercuList 2543 2537 <0.01
DIP 2509 2504 <0.01
GeneDB_Spombe 1770 1758 <0.01
SagaList 1745 1651 <0.01
PeroxiBase 1371 1368 <0.01
Leproma 971 970 <0.01
TRANSFAC 872 862 <0.01
MypuList 589 585 <0.01
SGD 375 374 <0.01
PHCI-2DPAGE 106 106 <0.01
CYGD 101 98 <0.01
ANU-2DPAGE 60 60 <0.01
Reactome 46 33 <0.01
SWISS-2DPAGE 37 37 <0.01
REPRODUCTION-2DPAGE 30 30 <0.01
PMMA-2DPAGE 3 3 <0.01
Siena-2DPAGE 2 2 <0.01
COMPLUYEAST-2DPAGE 1 1 <0.01
Number of explicitly cross-referenced databases: 87
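The "average per entry" column appears to be the total number of lines divided by the total number of entries in the release, rounded to two decimals. The sketch below checks this reading against a few rows. It rests on two assumptions that these notes do not state explicitly: the total number of entries is taken as the sum of the kingdom counts in section 3.3, and averages that round to 0.00 are assumed to be printed as "<0.01". The function name is illustrative.

```python
# Assumption: total entries = sum of the kingdom counts from section 3.3.
TOTAL_ENTRIES = 97220 + 2327609 + 1446119 + 502656 + 3709


def average_per_entry(total_lines, entries=TOTAL_ENTRIES):
    """Format total_lines / entries the way the table appears to.

    Assumption: values that round to 0.00 are shown as '<0.01'.
    """
    text = f"{total_lines / entries:.2f}"
    return "<0.01" if text == "0.00" else text


# A few "Total number" values copied from the table in section 5.
line_totals = {
    "References (RL)": 6233829,
    "Comments (CC)": 1728629,
    "Features (FT)": 1963260,
    "Cross-references (DR)": 32364872,
    "InterPro": 6595878,
    "MGI": 42221,
    "WormPep": 19286,
}

for line_type, total in line_totals.items():
    print(f"{line_type:<24}{total:>10}  {average_per_entry(total)}")
```

For the rows sampled here, the recomputed values agree with the published column.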
6. MISCELLANEOUS STATISTICS
Total number of distinct authors cited in UniProtKB/TrEMBL: 248496
Total number of entries encoded on a Mitochondrion: 167124
Total number of entries encoded on a Plasmid: 71535
Total number of entries encoded on a Plastid: 3525
Total number of entries encoded on a Plastid; Apicoplast: 183
Total number of entries encoded on a Plastid; Chloroplast: 57807
Total number of entries encoded on a Plastid; Cyanelle: 7
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 212
Number of fragments: 973161
| Submissions and Updates |
|---|
We welcome feedback from our users. We would especially appreciate your notifying us if you find that sequences belonging to your field of expertise are missing from the database. We also would like to be notified about annotations to be updated, if, for example, the function of a protein has been clarified or if new information about post-translational modifications has become available.
Submit new sequence data, updates and corrections at http://www.uniprot.org/support/submissions.shtml
For all queries regarding submissions to UniProtKB and to submit new protein sequence data, please contact:
UniProt Knowledgebase
The EMBL Outstation - The European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom
Telephone: (+44 1223) 494 462
Telefax: (+44 1223) 494 468
E-mail:
| Download information |
|---|
The latest UniProt Knowledgebase data are available in various formats (flatfile, XML or FASTA) at http://www.uniprot.org/database/download.shtml. The data are supplemented by a file containing the sequences of all additional alternative isoforms annotated in UniProtKB/Swiss-Prot. This data set is documented in the file ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/README.varsplic
For users who wish to download the UniProt Knowledgebase only occasionally, we distribute the latest major release (updated 3 times per year) in flatfile format. Previous UniProtKB/Swiss-Prot and UniProtKB/TrEMBL releases are archived under ftp://ftp.uniprot.org/pub/databases/uniprot/previous_major_releases. The UniProt Knowledgebase major release is also available on DVD from the EBI.
| Contact |
|---|
| Citation |
|---|
If you want to cite UniProt in a publication, please use the following reference:
The UniProt Consortium
"The Universal Protein Resource (UniProt)"
Nucleic Acids Res. 35:D193-D197(2007) doi:10.1093/nar/gkl929