UniProt Knowledgebase release notes: UniProtKB release 10.0 of 06-Mar-2007
Related documents: UniProtKB user manual, Recent changes, Forthcoming changes.
| Introduction |
|---|
Release 10.0 of the UniProt Knowledgebase is composed of the UniProtKB/Swiss-Prot Protein Knowledgebase release 52.0 and the UniProtKB/TrEMBL Protein Database release 35.0.
More information on these databases can be found in the user manual section "What is the UniProt Knowledgebase?".
| UniProtKB/Swiss-Prot protein knowledgebase release 52.0 statistics |
|---|
Release 52.0 of 06-Mar-07 of UniProtKB/Swiss-Prot contains 261'513 sequence entries, comprising 95'638'062 amino acids abstracted from 153'035 references.
The growth of the database is summarized below.
| Release | Date | Number of entries | Number of amino acids |
|---|---|---|---|
| 2.0 | 09/86 | 3'939 | 900'163 |
| 3.0 | 11/86 | 4'160 | 969'641 |
| 4.0 | 04/87 | 4'387 | 1'036'010 |
| 5.0 | 09/87 | 5'205 | 1'327'683 |
| 6.0 | 01/88 | 6'102 | 1'653'982 |
| 7.0 | 04/88 | 6'821 | 1'885'771 |
| 8.0 | 08/88 | 7'724 | 2'224'465 |
| 9.0 | 11/88 | 8'702 | 2'498'140 |
| 10.0 | 03/89 | 10'008 | 2'952'613 |
| 11.0 | 07/89 | 10'856 | 3'265'966 |
| 12.0 | 10/89 | 12'305 | 3'797'482 |
| 13.0 | 01/90 | 13'837 | 4'347'336 |
| 14.0 | 04/90 | 15'409 | 4'914'264 |
| 15.0 | 08/90 | 16'941 | 5'486'399 |
| 16.0 | 11/90 | 18'364 | 5'986'949 |
| 17.0 | 02/91 | 20'024 | 6'524'504 |
| 18.0 | 05/91 | 20'772 | 6'792'034 |
| 19.0 | 08/91 | 21'795 | 7'173'785 |
| 20.0 | 11/91 | 22'654 | 7'500'130 |
| 21.0 | 03/92 | 23'742 | 7'866'596 |
| 22.0 | 05/92 | 25'044 | 8'375'696 |
| 23.0 | 08/92 | 26'706 | 9'011'391 |
| 24.0 | 12/92 | 28'154 | 9'545'427 |
| 25.0 | 04/93 | 29'955 | 10'214'020 |
| 26.0 | 07/93 | 31'808 | 10'875'091 |
| 27.0 | 10/93 | 33'329 | 11'484'420 |
| 28.0 | 02/94 | 36'000 | 12'496'420 |
| 29.0 | 06/94 | 38'303 | 13'464'008 |
| 30.0 | 10/94 | 40'292 | 14'147'368 |
| 31.0 | 02/95 | 43'470 | 15'335'248 |
| 32.0 | 11/95 | 49'340 | 17'385'503 |
| 33.0 | 02/96 | 52'205 | 18'531'384 |
| 34.0 | 10/96 | 59'021 | 21'210'389 |
| 35.0 | 11/97 | 69'113 | 25'083'768 |
| 36.0 | 07/98 | 74'019 | 26'840'295 |
| 37.0 | 12/98 | 77'977 | 28'268'293 |
| 38.0 | 07/99 | 80'000 | 29'085'965 |
| 39.0 | 05/00 | 86'593 | 31'411'114 |
| 40.0 | 10/01 | 101'602 | 37'315'215 |
| 41.0 | 02/03 | 122'564 | 44'986'459 |
| 42.0 | 10/03 | 135'850 | 50'046'799 |
| 43.0 | 03/04 | 146'720 | 54'093'154 |
| 44.0 | 07/04 | 153'871 | 56'608'159 |
| 45.0 | 10/04 | 163'235 | 59'631'787 |
| 46.0 | 02/05 | 168'297 | 61'443'278 |
| 47.0 | 05/05 | 181'577 | 65'746'672 |
| 48.0 | 09/05 | 194'317 | 70'391'852 |
| 49.0 | 02/06 | 207'132 | 75'438'310 |
| 50.0 | 05/06 | 222'289 | 81'585'146 |
| 51.0 | 10/06 | 241'242 | 88'541'632 |
| 52.0 | 03/07 | 261'513 | 95'638'062 |
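The table writes counts with an apostrophe as the thousands separator (for example 261'513). As a minimal sketch, assuming the figures are copied from the table as strings, the following Python snippet parses the two most recent rows and computes the net growth between releases 51.0 and 52.0. The helper name `parse_count` is purely illustrative and not part of any UniProt tooling.

```python
def parse_count(text: str) -> int:
    """Convert a count such as "261'513" to an integer."""
    return int(text.replace("'", ""))


# (release, entries, amino acids) copied from the last two table rows
rows = [
    ("51.0", "241'242", "88'541'632"),
    ("52.0", "261'513", "95'638'062"),
]

parsed = [(rel, parse_count(n), parse_count(aa)) for rel, n, aa in rows]
(prev_rel, prev_entries, prev_aa), (cur_rel, cur_entries, cur_aa) = parsed

# Net change in entry count and its size relative to the previous release
net_entries = cur_entries - prev_entries
growth_pct = 100 * net_entries / prev_entries
print(f"{prev_rel} -> {cur_rel}: {net_entries:+d} entries ({growth_pct:.1f}%)")
```

The net change computed this way need not equal the number of newly added sequences, since entries are occasionally removed from the database.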
In rare cases, UniProtKB/Swiss-Prot entries are removed. Deleted entries are almost exclusively Open Reading Frames (ORFs) that were wrongly predicted to code for proteins. When there is enough evidence that these hypothetical proteins are not real, we remove them from UniProtKB/Swiss-Prot. The document delac_sp.txt lists all accession numbers that were previously present in UniProtKB/Swiss-Prot but have since been deleted from the database.
We have selected a number of organisms that are the target of genome sequencing and/or mapping projects and for which we intend to:
Our efforts to annotate human sequence entries as completely as possible gave rise to the HPI project, and the bacterial model organisms became the focus of the HAMAP project. The current status of the model organisms not covered by these two projects is as follows:
| Organism | Database cross-references | Index file | Number of sequences |
|---|---|---|---|
| A.thaliana | TAIR | arath.txt | 5'065 |
| C.albicans | None yet | calbican.txt | 604 |
| C.elegans | Wormpep | celegans.txt | 3'081 |
| D.discoideum | DictyBase | dicty.txt | 350 |
| D.melanogaster | FlyBase | fly.txt | 2'588 |
| M.musculus | MGD | mgdtosp.txt | 12'408 |
| S.cerevisiae | SGD | yeast.txt | 6'239 |
| S.pombe | GeneDB_SPombe | pombe.txt | 3'217 |
1. INTRODUCTION
Release 52.0 of 06-Mar-07 of UniProtKB/Swiss-Prot contains 261513 sequence entries,
comprising 95638062 amino acids abstracted from 153035 references.
20329 sequences have been added since release 51.0, the sequence data of
11364 existing entries has been updated and the annotations of
196464 entries have been revised.
2. AMINO ACID COMPOSITION
2.1 Composition in percent for the complete database
Ala (A) 7.87 Gln (Q) 3.96 Leu (L) 9.65 Ser (S) 6.84
Arg (R) 5.42 Glu (E) 6.66 Lys (K) 5.92 Thr (T) 5.41
Asn (N) 4.13 Gly (G) 6.95 Met (M) 2.39 Trp (W) 1.13
Asp (D) 5.34 His (H) 2.29 Phe (F) 3.95 Tyr (Y) 3.02
Cys (C) 1.50 Ile (I) 5.91 Pro (P) 4.82 Val (V) 6.73
Asx (B) 0.000 Glx (Z) 0.000 Xaa (X) 0.00
2.2 Classification of the amino acids by their frequency
Leu, Ala, Gly, Ser, Val, Glu, Lys, Ile, Arg, Thr, Asp, Pro, Asn, Gln,
Phe, Tyr, Met, His, Cys, Trp
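The classification in 2.2 is simply the composition table of 2.1 sorted by decreasing percentage. A minimal Python sketch, with the values copied by hand from 2.1 and the variable names chosen only for illustration:

```python
# Percent composition copied from section 2.1 (UniProtKB/Swiss-Prot release 52.0)
composition = {
    "Ala": 7.87, "Gln": 3.96, "Leu": 9.65, "Ser": 6.84,
    "Arg": 5.42, "Glu": 6.66, "Lys": 5.92, "Thr": 5.41,
    "Asn": 4.13, "Gly": 6.95, "Met": 2.39, "Trp": 1.13,
    "Asp": 5.34, "His": 2.29, "Phe": 3.95, "Tyr": 3.02,
    "Cys": 1.50, "Ile": 5.91, "Pro": 4.82, "Val": 6.73,
}

# Residue names ordered from most to least frequent
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))
```

No two residues share the same percentage here, so the order is unambiguous.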
3. TAXONOMIC ORIGIN
Total number of species represented in this release of UniProtKB/Swiss-Prot: 10849
The first twenty species represent 80696 sequences: 30.9 % of the total
number of entries.
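The figure of 80696 sequences and its share of the database can be reproduced from the counts at ranks 1 to 20 in table 3.2. A short Python sketch, with the counts copied by hand and the variable names purely illustrative:

```python
TOTAL_ENTRIES = 261513  # UniProtKB/Swiss-Prot release 52.0

# Entry counts of the twenty most represented species (ranks 1-20, table 3.2)
top_twenty = [
    15945, 12710, 6163, 5864, 4978, 4931, 3420, 3176, 3006, 2849,
    2485, 1883, 1782, 1780, 1774, 1665, 1626, 1585, 1550, 1524,
]

covered = sum(top_twenty)
share = 100 * covered / TOTAL_ENTRIES
print(f"{covered} sequences, {share:.1f} % of all entries")
```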
3.1 Table of the frequency of occurrence of species
Species represented 1x: 5201
2x: 1664
3x: 815
4x: 532
5x: 349
6x: 318
7x: 218
8x: 183
9x: 159
10x: 82
11- 20x: 433
21- 50x: 325
51-100x: 169
>100x: 401
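As a consistency check, the bins of table 3.1 should add up to the total number of species given at the start of section 3. The Python sketch below performs that check and also derives the share of species represented by a single entry; the bin labels are copied from the table and the variable names are illustrative only.

```python
TOTAL_SPECIES = 10849  # stated at the start of section 3

# Number of species per occurrence bin, copied from table 3.1
species_bins = {
    "1x": 5201, "2x": 1664, "3x": 815, "4x": 532, "5x": 349,
    "6x": 318, "7x": 218, "8x": 183, "9x": 159, "10x": 82,
    "11-20x": 433, "21-50x": 325, "51-100x": 169, ">100x": 401,
}

bin_sum = sum(species_bins.values())
singleton_share = 100 * species_bins["1x"] / TOTAL_SPECIES
print(f"sum of bins: {bin_sum}")
print(f"species with a single entry: {singleton_share:.1f} %")
```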
3.2 Table of the most represented species
------ --------- --------------------------------------------
Number Frequency Species
------ --------- --------------------------------------------
1 15945 Homo sapiens (Human)
2 12710 Mus musculus (Mouse)
3 6163 Saccharomyces cerevisiae (Baker's yeast)
4 5864 Rattus norvegicus (Rat)
5 4978 Arabidopsis thaliana (Mouse-ear cress)
6 4931 Escherichia coli
7 3420 Bos taurus (Bovine)
8 3176 Schizosaccharomyces pombe (Fission yeast)
9 3006 Caenorhabditis elegans
10 2849 Bacillus subtilis
11 2485 Drosophila melanogaster (Fruit fly)
12 1883 Escherichia coli O157:H7
13 1782 Methanococcus jannaschii
14 1780 Xenopus laevis (African clawed frog)
15 1774 Haemophilus influenzae
16 1665 Gallus gallus (Chicken)
17 1626 Salmonella typhimurium
18 1585 Pongo pygmaeus (Orangutan)
19 1550 Escherichia coli O6
20 1524 Shigella flexneri
21 1416 Mycobacterium tuberculosis
22 1222 Salmonella typhi
23 1160 Sus scrofa (Pig)
24 1158 Mycobacterium bovis
25 1135 Brachydanio rerio (Zebrafish) (Danio rerio)
26 1125 Oryza sativa (Rice)
27 1107 Pseudomonas aeruginosa
28 976 Synechocystis sp. (strain PCC 6803)
29 971 Archaeoglobus fulgidus
30 905 Yersinia pestis
31 887 Vibrio cholerae
32 884 Mimivirus
33 876 Rhizobium meliloti (Sinorhizobium meliloti)
34 829 Oryctolagus cuniculus (Rabbit)
35 801 Macaca fascicularis (Crab eating macaque) (Cynomolgus monkey)
36 756 Aquifex aeolicus
37 748 Staphylococcus aureus (strain Mu50 / ATCC 700699)
38 746 Staphylococcus aureus (strain N315)
39 737 Pasteurella multocida
40 734 Vibrio parahaemolyticus
41 730 Staphylococcus aureus (strain MW2)
42 728 Staphylococcus aureus (strain COL)
43 726 Staphylococcus aureus (strain MSSA476)
44 724 Staphylococcus aureus (strain MRSA252)
45 687 Mycoplasma pneumoniae
46 686 Streptomyces coelicolor
47 681 Canis familiaris (Dog)
48 677 Vibrio vulnificus
49 673 Bacillus halodurans
50 658 Vibrio vulnificus (strain YJ016)
51 632 Mycobacterium leprae
52 629 Anabaena sp. (strain PCC 7120)
53 618 Staphylococcus epidermidis (strain ATCC 12228)
54 618 Pseudomonas syringae pv. tomato
55 617 Staphylococcus epidermidis (strain ATCC 35984 / RP62A)
56 617 Neurospora crassa
57 617 Yersinia pseudotuberculosis
58 612 Pseudomonas putida (strain KT2440)
59 612 Bacillus anthracis
60 611 Treponema pallidum
61 606 Candida albicans (Yeast)
62 605 Ashbya gossypii (Yeast) (Eremothecium gossypii)
63 601 Photorhabdus luminescens subsp. laumondii
64 587 Methanobacterium thermoautotrophicum
65 581 Bradyrhizobium japonicum
66 575 Rickettsia prowazekii
67 574 Helicobacter pylori (Campylobacter pylori)
68 572 Buchnera aphidicola subsp. Acyrthosiphon pisum
69 571 Kluyveromyces lactis (Yeast) (Candida sphaerica)
70 570 Ralstonia solanacearum (Pseudomonas solanacearum)
71 568 Pan troglodytes (Chimpanzee)
72 568 Agrobacterium tumefaciens (strain C58 / ATCC 33970)
73 562 Buchnera aphidicola subsp. Schizaphis graminum
74 561 Salmonella paratyphi-a
75 556 Lactococcus lactis subsp. lactis (Streptococcus lactis)
76 556 Rhizobium loti (Mesorhizobium loti)
77 556 Erwinia carotovora subsp. atroseptica (Pectobacterium atrosepticum)
78 555 Helicobacter pylori J99 (Campylobacter pylori J99)
79 554 Zea mays (Maize)
80 549 Listeria monocytogenes
81 541 Listeria innocua
82 540 Xanthomonas campestris pv. campestris
83 537 Shewanella oneidensis
84 537 Bacillus cereus (strain ATCC 14579 / DSM 31)
85 530 Neisseria meningitidis serogroup A
86 530 Neisseria meningitidis serogroup B
87 518 Candida glabrata (Yeast) (Torulopsis glabrata)
88 517 Clostridium acetobutylicum
89 516 Caulobacter crescentus (Caulobacter vibrioides)
90 507 Buchnera aphidicola subsp. Baizongia pistaciae
91 506 Xanthomonas axonopodis pv. citri
92 491 Streptococcus pneumoniae
93 488 Thermotoga maritima
94 483 Mycoplasma genitalium
95 482 Oceanobacillus iheyensis
96 481 Listeria monocytogenes serotype 4b (strain F2365)
97 481 Xylella fastidiosa
98 474 Brucella suis
99 473 Salmonella choleraesuis
100 473 Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
101 472 Brucella melitensis
102 472 Xylella fastidiosa (strain Temecula1 / ATCC 700964)
103 470 Photobacterium profundum (Photobacterium sp. (strain SS9))
104 466 Haemophilus ducreyi
105 465 Deinococcus radiodurans
106 457 Methanosarcina acetivorans
107 453 Corynebacterium glutamicum (Brevibacterium flavum)
108 453 Clostridium perfringens
109 449 Rickettsia conorii
110 446 Pyrococcus horikoshii
111 445 Bordetella bronchiseptica (Alcaligenes bronchisepticus)
112 441 Bacillus cereus (strain ATCC 10987)
113 441 Pyrococcus abyssi
114 440 Halobacterium salinarium (Halobacterium halobium)
115 440 Bordetella pertussis
116 437 Methanosarcina mazei (Methanosarcina frisia)
117 435 Chromobacterium violaceum
118 435 Chlamydia trachomatis
119 432 Bordetella parapertussis
120 432 Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
121 431 Emericella nidulans (Aspergillus nidulans)
122 424 Borrelia burgdorferi (Lyme disease spirochete)
123 424 Yarrowia lipolytica (Candida lipolytica)
124 422 Thermoanaerobacter tengcongensis
125 421 Nicotiana tabacum (Common tobacco)
126 421 Pyrococcus furiosus
127 420 Lactobacillus plantarum
128 418 Synechococcus elongatus (Thermosynechococcus elongatus)
129 416 Chlamydia pneumoniae (Chlamydophila pneumoniae)
130 415 Streptococcus pyogenes serotype M6
131 414 Ovis aries (Sheep)
132 412 Campylobacter jejuni
133 412 Enterococcus faecalis (Streptococcus faecalis)
134 411 Streptococcus mutans
135 410 Streptomyces avermitilis
136 406 Chlamydia muridarum
137 406 Rhizobium sp. (strain NGR234)
138 404 Bacillus thuringiensis subsp. konkukian
139 397 Sulfolobus solfataricus
140 396 Streptococcus pyogenes serotype M1
141 394 Debaryomyces hansenii (Yeast) (Torulaspora hansenii)
142 394 Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
143 391 Streptococcus pyogenes serotype M18
144 390 Streptococcus pyogenes serotype M3
145 381 Acinetobacter sp. (strain ADP1)
146 380 Burkholderia pseudomallei (Pseudomonas pseudomallei)
147 376 Shigella sonnei (strain Ss046)
148 373 Bacillus cereus (strain ZK / E33L)
149 372 Chlorobium tepidum
150 371 Rhodopseudomonas palustris
151 371 Nitrosomonas europaea
152 369 Corynebacterium efficiens
153 368 Pyrococcus kodakaraensis (Thermococcus kodakaraensis)
154 367 Vibrio fischeri (strain ATCC 700601 / ES114)
155 366 Bacillus clausii (strain KSM-K16)
156 360 Rickettsia bellii (strain RML369-C)
157 356 Methanopyrus kandleri
158 356 Mannheimia succiniciproducens (strain MBEL55E)
159 355 Bacillus licheniformis (strain DSM 13 / ATCC 14580)
160 354 Staphylococcus haemolyticus (strain JCSC1435)
161 354 Gloeobacter violaceus
162 353 Burkholderia mallei (Pseudomonas mallei)
163 352 Shigella boydii serotype 4 (strain Sb227)
164 351 Leptospira interrogans
165 349 Staphylococcus saprophyticus subsp. saprophyticus
166 349 Rickettsia felis (Rickettsia azadi)
167 348 Aeropyrum pernix
168 346 Streptococcus agalactiae serotype III
169 343 Streptococcus agalactiae serotype V
170 341 Leptospira interrogans serogroup Icterohaemorrhagiae serovar copenhageni
171 340 Solanum tuberosum (Potato)
172 339 Shigella dysenteriae serotype 1 (strain Sd197)
173 339 Pisum sativum (Garden pea)
174 337 Methylococcus capsulatus
175 337 Dictyostelium discoideum (Slime mold)
176 337 Synechococcus sp. (strain WH8102)
177 334 Sulfolobus tokodaii
178 332 Rickettsia typhi
179 332 Prochlorococcus marinus (strain MIT 9313)
180 332 Glycine max (Soybean)
181 331 Prochlorococcus marinus
182 331 Geobacillus kaustophilus
183 323 Mycobacterium paratuberculosis
184 322 Staphylococcus aureus
185 320 Rhodopirellula baltica
186 317 Idiomarina loihiensis
187 316 Pseudomonas fluorescens (strain Pf-5 / ATCC BAA-477)
188 316 Geobacter sulfurreducens
189 316 Thermoplasma acidophilum
190 315 Synechococcus sp. (strain ATCC 27144 / PCC 6301 / SAUG 1402/1)
191 314 Pseudomonas syringae pv. syringae (strain B728a)
192 312 Prochlorococcus marinus subsp. pastoris (strain CCMP 1378 / MED4)
193 311 Aspergillus fumigatus (Sartorya fumigata)
194 311 Fusobacterium nucleatum subsp. nucleatum
195 309 Coxiella burnetii
196 307 Triticum aestivum (Wheat)
197 303 Macaca mulatta (Rhesus macaque)
198 300 Brucella abortus
199 299 Azoarcus sp. (strain EbN1)
200 296 Nocardia farcinica
201 296 Synechococcus sp. (strain PCC 7942) (Anacystis nidulans R2)
202 296 Pseudomonas syringae pv. phaseolicola (strain 1448A / Race 6)
203 295 Thermus thermophilus (strain HB8 / ATCC 27634 / DSM 579)
204 294 Wolinella succinogenes
205 293 Zymomonas mobilis
206 292 Rhodobacter sphaeroides (strain ATCC 17023 / 2.4.1 / NCIB 8253 / DSM 158)
207 292 Bacteroides thetaiotaomicron
208 291 Desulfovibrio vulgaris (strain Hildenborough / ATCC 29579 / NCIMB 8303)
209 290 Sulfolobus acidocaldarius
210 287 Clostridium tetani
211 285 Symbiobacterium thermophilum
212 285 Pseudomonas putida
213 285 Silicibacter pomeroyi
214 285 Pyrobaculum aerophilum
215 285 Legionella pneumophila subsp. pneumophila
216 284 Haemophilus influenzae (strain 86-028NP)
217 284 Xanthomonas oryzae pv. oryzae
218 283 Hordeum vulgare (Barley)
219 282 Neisseria gonorrhoeae (strain ATCC 700825 / FA 1090)
220 282 Cavia porcellus (Guinea pig)
221 281 Legionella pneumophila (strain Paris)
222 281 Thermoplasma volcanium
223 279 Legionella pneumophila (strain Lens)
224 279 Pseudomonas fluorescens (strain PfO-1)
225 277 Corynebacterium diphtheriae
226 273 Thermus thermophilus (strain HB27 / ATCC BAA-163 / DSM 7039)
227 273 Ralstonia eutropha (strain JMP134) (Alcaligenes eutrophus)
228 269 Spinacia oleracea (Spinach)
229 268 Bacteriophage T4
230 267 Burkholderia sp. (strain 383) (Burkholderia cepacia
231 262 Rhodobacter capsulatus (Rhodopseudomonas capsulata)
232 261 Helicobacter hepaticus
233 261 Methanococcus maripaludis
234 260 Wigglesworthia glossinidia brevipalpis
235 259 Haloarcula marismortui (Halobacterium marismortui)
236 259 Equus caballus (Horse)
237 259 Bifidobacterium longum
238 258 Xanthomonas campestris pv. campestris (strain 8004)
239 257 Ureaplasma parvum (Ureaplasma urealyticum biotype 1)
240 257 Staphylococcus aureus (strain NCTC 8325)
241 256 Colwellia psychrerythraea (strain 34H / ATCC BAA-681) (Vibrio psychroerythus)
242 255 Leifsonia xyli subsp. xyli
243 254 Vaccinia virus (strain Copenhagen) (VACV)
244 253 Gluconobacter oxydans (Gluconobacter suboxydans)
245 252 Dechloromonas aromatica (strain RCB)
246 251 Anabaena variabilis (strain ATCC 29413 / PCC 7937)
247 251 Porphyromonas gingivalis (Bacteroides gingivalis)
248 249 Bartonella henselae (Rochalimaea henselae)
249 248 Staphylococcus aureus (strain bovine RF122)
250 247 Campylobacter jejuni (strain RM1221)
251 244 Chlamydophila caviae
252 243 Bacteroides fragilis
253 243 Desulfotalea psychrophila
254 240 Blochmannia floridanus
255 238 Lactobacillus johnsonii
256 237 Cryptococcus neoformans (Filobasidiella neoformans)
257 237 Propionibacterium acnes
258 236 Bartonella quintana (Rochalimaea quintana)
259 235 Bacillus stearothermophilus (Geobacillus stearothermophilus)
260 235 Burkholderia pseudomallei (strain 1710b)
261 234 Pseudoalteromonas haloplanktis (strain TAC 125)
262 232 Xanthomonas campestris pv. vesicatoria (strain 85-10)
263 232 Nitrosococcus oceani (strain ATCC 19707 / NCIMB 11848)
264 230 Thiobacillus denitrificans (strain ATCC 25259)
265 229 Brucella abortus (strain 2308)
266 225 Gorilla gorilla gorilla (Lowland gorilla)
267 224 Chlamydomonas reinhardtii
268 221 Bdellovibrio bacteriovorus
269 220 Francisella tularensis subsp. tularensis
270 220 Ustilago maydis (Smut fungus)
271 220 Streptococcus thermophilus (strain ATCC BAA-250 / LMG 18311)
272 219 Staphylococcus aureus (strain USA300)
273 218 Streptococcus thermophilus (strain CNRZ 1066)
274 217 Porphyra purpurea
275 217 Escherichia coli (strain UTI89 / UPEC)
276 213 Psychrobacter arcticum
277 212 Klebsiella pneumoniae
278 211 Felis silvestris catus (Cat)
279 210 Pelobacter carbinolicus (strain DSM 2380 / Gra Bd 1)
280 210 Nitrobacter winogradskyi (strain Nb-255 / ATCC 25391)
281 208 Cricetulus griseus (Chinese hamster)
282 208 Treponema denticola
283 207 Porphyra yezoensis
284 207 Rhodospirillum rubrum (strain ATCC 11170 / NCIB 8255)
285 206 Lactobacillus acidophilus
286 204 Caenorhabditis briggsae
287 201 Escherichia coli O6:K15:H31 (strain 536 / UPEC)
288 201 Mesocricetus auratus (Golden hamster)
289 200 Vaccinia virus (strain Western Reserve / WR) (VACV)
3.3 Taxonomic distribution of the sequences
Kingdom sequences (% of the database)
Archaea 10908 ( 4%)
Bacteria 127559 ( 49%)
Eukaryota 112139 ( 43%)
Viruses 10907 ( 4%)
Within Eukaryota:
Category sequences (% of Eukaryota) (% of the complete database)
Human 15946 ( 14%) ( 6%)
Other Mammalia 34810 ( 31%) ( 13%)
Other Vertebrata 10293 ( 9%) ( 4%)
Viridiplantae 18768 ( 17%) ( 7%)
Fungi 16974 ( 15%) ( 6%)
Insecta 4794 ( 4%) ( 2%)
Nematoda 3451 ( 3%) ( 1%)
Other 7103 ( 6%) ( 3%)
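The percentages in 3.3 follow from the raw counts by dividing by the relevant total and rounding to a whole percent. A minimal Python sketch for the kingdom-level rows, with the counts copied from the table and the names chosen for illustration:

```python
# Sequence counts per kingdom, copied from section 3.3
kingdoms = {
    "Archaea": 10908,
    "Bacteria": 127559,
    "Eukaryota": 112139,
    "Viruses": 10907,
}

total = sum(kingdoms.values())

# Share of the complete database, rounded to a whole percent
percentages = {name: round(100 * n / total) for name, n in kingdoms.items()}

for name, n in kingdoms.items():
    print(f"{name:<10} {n:>7} ({percentages[name]:>3}%)")
```

The four counts add up to the 261513 entries of the release, so every entry is assigned to exactly one of these categories.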
4. SEQUENCE SIZE
Distribution of the sequences by size (excluding fragments)
From To Number From To Number
1- 50 5020 1001-1100 2190
51- 100 19316 1101-1200 1469
101- 150 27430 1201-1300 1178
151- 200 26015 1301-1400 973
201- 250 26277 1401-1500 809
251- 300 22407 1501-1600 399
301- 350 22622 1601-1700 321
351- 400 20653 1701-1800 276
401- 450 16334 1801-1900 243
451- 500 14148 1901-2000 205
501- 550 10709 2001-2100 127
551- 600 7271 2101-2200 180
601- 650 6284 2201-2300 177
651- 700 4277 2301-2400 113
701- 750 3615 2401-2500 92
751- 800 2907 >2500 679
801- 850 2438
851- 900 2548
901- 950 1920
951-1000 1531
The average sequence length in UniProtKB/Swiss-Prot is 365 amino acids.
The shortest sequence is GWA_SEPOF (P83570): 2 amino acids.
The longest sequence is TITIN_HUMAN (Q8WZ42): 34350 amino acids.
5. JOURNAL CITATIONS
Note: the following citation statistics reflect the number of distinct
journal citations.
Total number of journals cited in this release of UniProtKB/Swiss-Prot: 1756
5.1 Table of the frequency of journal citations
Journals cited 1x: 618
2x: 236
3x: 133
4x: 89
5x: 67
6x: 47
7x: 35
8x: 35
9x: 34
10x: 16
11- 20x: 130
21- 50x: 149
51-100x: 70
>100x: 129
5.2 List of the most cited journals in UniProtKB/Swiss-Prot
Nb Citations Journal name
-- --------- -------------------------------------------------------------
1 14640 Journal of Biological Chemistry
2 7008 Proceedings of the National Academy of Sciences of the U.S.A.
3 4469 Journal of Bacteriology
4 4188 Gene
5 4053 Nucleic Acids Research
6 3779 Biochemical and Biophysical Research Communications
7 3520 FEBS Letters
8 3259 Biochemistry
9 3220 The EMBO Journal
10 2924 European Journal of Biochemistry
11 2785 Nature
12 2709 Molecular and Cellular Biology
13 2646 Biochimica et Biophysica Acta
14 2497 Journal of Molecular Biology
15 2292 Genomics
16 2256 Cell
17 1818 Biochemical Journal
18 1718 Science
19 1483 Molecular Microbiology
20 1346 Plant Molecular Biology
21 1286 Journal of Virology
22 1268 Molecular and General Genetics
23 1267 Journal of Cell Biology
24 1113 Virology
25 1088 Human Molecular Genetics
26 1064 Journal of Biochemistry
27 1051 Nature Genetics
28 1046 Genes and Development
29 942 Oncogene
30 939 Plant Physiology
31 909 The American Journal of Human Genetics
32 822 Human Mutation
33 781 Journal of Immunology
34 775 Development
35 765 Infection and Immunity
36 741 Genetics
37 731 Structure
38 691 Yeast
39 685 Archives of Biochemistry and Biophysics
40 656 Journal of General Virology
41 652 Molecular Biology of the Cell
42 619 Microbiology
43 576 Blood
44 557 FEMS Microbiology Letters
45 553 The Plant Cell
46 541 Nature Structural Biology
47 505 Molecular Cell
48 497 Human Genetics
49 492 Journal of Cell Science
50 489 Current Genetics
51 486 Cancer Research
52 475 Developmental Biology
53 452 Mechanisms of Development
54 443 The Plant Journal
55 434 Applied and Environmental Microbiology
56 426 Protein Science
57 422 Neuron
58 418 Journal of Clinical Investigation
59 417 Mammalian Genome
60 417 Acta Crystallographica, Section D
61 409 Current Biology
62 402 Molecular and Biochemical Parasitology
63 393 Journal of Neuroscience
64 390 Molecular Endocrinology
65 380 The Journal of Experimental Medicine
66 370 Immunogenetics
67 345 Journal of Molecular Evolution
68 338 DNA and Cell Biology
69 335 Journal of Neurochemistry
70 333 Endocrinology
71 317 DNA Sequence
72 315 Toxicon
73 302 The Journal of Clinical Endocrinology and Metabolism
74 300 American Journal of Physiology
75 291 Molecular Biology and Evolution
76 286 Biological Chemistry Hoppe-Seyler
77 285 Brain Research. Molecular Brain Research
78 281 Bioscience, Biotechnology, and Biochemistry
79 249 Cytogenetics and Cell Genetics
80 242 Comparative Biochemistry and Physiology
81 242 Journal of General Microbiology
82 238 Proteins
83 224 Journal of Medical Genetics
84 220 Peptides
85 218 Antimicrobial Agents and Chemotherapy
86 215 Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
87 215 Molecular Pharmacology
88 202 Journal of Investigative Dermatology
89 194 Biology of Reproduction
90 191 Genome Research
91 189 Plant and Cell Physiology
92 189 Nature Cell Biology
93 183 DNA Research
94 180 Molecular Plant-Microbe Interactions
95 180 Virus Research
96 175 European Journal of Immunology
97 172 Experimental Cell Research
98 164 RNA
99 160 Biochimie
100 158 Tissue Antigens
101 158 DNA
102 156 Neurology
103 152 Molecular and Cellular Endocrinology
104 151 Developmental Dynamics
105 149 Molecular Phylogenetics and Evolution
106 149 Hemoglobin
107 147 American Journal of Medical Genetics
108 145 Bioorganicheskaia Khimiia
109 140 Archives of Microbiology
110 140 Annals of Neurology
111 138 Genes to Cells
112 138 European Journal of Human Genetics
113 134 Insect Biochemistry and Molecular Biology
114 132 Journal of Human Genetics
115 129 Immunity
116 128 Planta
117 123 Animal Genetics
118 123 Developmental Cell
119 121 Molecular Reproduction and Development
120 118 Agricultural and Biological Chemistry
121 118 General and Comparative Endocrinology
122 117 Diabetes
123 111 Molecular Immunology
124 109 Glycobiology
125 109 Investigative Ophthalmology and Visual Science
126 107 The New England Journal of Medicine
127 106 Journal of Protein Chemistry
128 102 Molecular and Cellular Neuroscience
129 101 Archives of Virology
130 100 British Journal of Haematology
6. STATISTICS FOR SOME LINE TYPES
The following table summarizes the total number of some UniProtKB/Swiss-Prot lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.
Line type / subtype                Total number  Number of entries  Average per entry
--------------------------------- -------- --------- ---------
References (RL) 514754 1.97
Journal 448577 234379 1.72
Submitted to EMBL/GenBank/DDBJ 62118 54649 0.24
Submitted to Swiss-Prot 1146 1128 <0.01
Submitted to other databases 640 626 <0.01
Unpublished observations 637 631 <0.01
Book citation 578 566 <0.01
Plant Gene Register 537 525 <0.01
Thesis 380 378 <0.01
Patent 135 133 <0.01
Worm Breeder's Gazette 6 6 <0.01
Comments (CC) 1058365 4.05
SIMILARITY 297431 239032 1.14
FUNCTION 184967 178423 0.71
SUBCELLULAR LOCATION 143499 143499 0.55
SUBUNIT 98812 98812 0.38
CATALYTIC ACTIVITY 98305 90234 0.38
PATHWAY 52267 44460 0.20
COFACTOR 39248 35113 0.15
TISSUE SPECIFICITY 24970 24970 0.10
MISCELLANEOUS 22037 19810 0.08
PTM 21043 17067 0.08
DOMAIN 15822 13637 0.06
ALTERNATIVE PRODUCTS 11342 11342 0.04
CAUTION 10296 9071 0.04
INTERACTION 7093 7093 0.03
INDUCTION 7087 7087 0.03
DEVELOPMENTAL STAGE 6212 6212 0.02
ENZYME REGULATION 4520 4520 0.02
DISEASE 3600 2604 0.01
WEB RESOURCE 3448 2894 0.01
MASS SPECTROMETRY 2693 2242 0.01
BIOPHYSICOCHEMICAL PROPERTIES 1653 1653 0.01
POLYMORPHISM 586 570 <0.01
RNA EDITING 477 477 <0.01
ALLERGEN 419 419 <0.01
TOXIC DOSE 324 319 <0.01
BIOTECHNOLOGY 141 141 <0.01
PHARMACEUTICAL 73 73 <0.01
Features (FT) 1744500 6.67
CHAIN 266207 257979 1.02
TRANSMEM 167633 36823 0.64
METAL 111155 27262 0.43
STRAND 92324 8678 0.35
CONFLICT 90312 31283 0.35
HELIX 88592 9112 0.34
DOMAIN 86017 47759 0.33
TOPO_DOM 85194 17293 0.33
CARBOHYD 76138 19104 0.29
DISULFID 74123 18930 0.28
BINDING 64883 22914 0.25
ACT_SITE 62217 35914 0.24
MOD_RES 58057 23003 0.22
REPEAT 54932 8200 0.21
VARIANT 44600 9160 0.17
NP_BIND 40877 29321 0.16
REGION 38954 20753 0.15
COMPBIAS 25530 14709 0.10
VAR_SEQ 24590 10681 0.09
SIGNAL 24413 24403 0.09
TURN 23826 7391 0.09
MUTAGEN 19304 4726 0.07
ZN_FING 18962 7399 0.07
MOTIF 18018 11888 0.07
SITE 15906 9034 0.06
INIT_MET 10975 10975 0.04
NON_TER 10704 8215 0.04
COILED 9800 6360 0.04
PROPEP 7982 6739 0.03
LIPID 7548 4882 0.03
DNA_BIND 6921 6424 0.03
PEPTIDE 6644 4151 0.03
TRANSIT 4550 4504 0.02
CA_BIND 2693 1111 0.01
CROSSLNK 2024 1381 0.01
NON_CONS 1175 523 <0.01
UNSURE 469 183 <0.01
SE_CYS 251 182 <0.01
Cross-references (DR) 3650964 13.96
InterPro 617838 238799 2.36
EMBL 492611 253089 1.88
GO 339578 137455 1.30
Pfam 325839 231011 1.25
PROSITE 246106 150243 0.94
KEGG 180404 162878 0.69
GenomeReviews 150533 134388 0.58
HAMAP 103797 103678 0.40
TIGRFAMs 101433 95037 0.39
PIR 98446 91898 0.38
PRINTS 93227 73725 0.36
HSSP 80523 80523 0.31
SMART 76697 58494 0.29
BioCyc 72324 66863 0.28
ProDom 70565 68226 0.27
UniGene 59630 54887 0.23
Gene3D 59342 52618 0.23
Ensembl 51212 51212 0.20
GermOnline 42029 41413 0.16
PANTHER 41377 41043 0.16
PDB 38554 10498 0.15
SMR 35989 35989 0.14
ArrayExpress 35710 35710 0.14
RZPD-ProtExp 27256 12772 0.10
TIGR 23892 23273 0.09
PIRSF 20894 20636 0.08
LinkHub 17639 17639 0.07
HGNC 15422 15355 0.06
IntAct 13399 13399 0.05
MIM 12802 10405 0.05
MGI 12583 12537 0.05
DIP 8824 8774 0.03
SGD 6235 6148 0.02
CYGD 6223 6134 0.02
RGD 5692 5689 0.02
MEROPS 5364 5058 0.02
TAIR 5039 4947 0.02
EcoGene 4311 4308 0.02
EchoBASE 4158 4126 0.02
H-InvDB 3677 3659 0.01
WormPep 3617 3002 0.01
WormBase 3270 3188 0.01
FlyBase 3234 3110 0.01
GeneDB_Spombe 3209 3174 0.01
TRANSFAC 2878 2584 0.01
SubtiList 2790 2789 0.01
Gramene 2789 2789 0.01
Reactome 2706 1545 0.01
GeneFarm 1831 1812 0.01
DrugBank 1826 502 0.01
StyGene 1579 1575 0.01
HPA 1486 1324 0.01
TubercuList 1444 1408 0.01
SWISS-2DPAGE 1179 1179 <0.01
ZFIN 1120 1108 <0.01
ListiList 1091 1083 <0.01
REPRODUCTION-2DPAGE 829 829 <0.01
Leproma 635 632 <0.01
AGD 611 605 <0.01
PhotoList 601 601 <0.01
LegioList 560 560 <0.01
MaizeGDB 442 437 <0.01
OGP 377 376 <0.01
REBASE 364 358 <0.01
HIV 361 351 <0.01
PeroxiBase 361 350 <0.01
ECO2DBASE 351 299 <0.01
SagaList 347 346 <0.01
DictyBase 340 337 <0.01
GlycoSuiteDB 282 282 <0.01
PHCI-2DPAGE 241 241 <0.01
MypuList 192 192 <0.01
DOSAC-COBS-2DPAGE 149 147 <0.01
Aarhus/Ghent-2DPAGE 128 98 <0.01
Siena-2DPAGE 103 103 <0.01
HSC-2DPAGE 85 85 <0.01
PhosSite 70 70 <0.01
Cornea-2DPAGE 67 67 <0.01
COMPLUYEAST-2DPAGE 59 59 <0.01
euHCVdb 55 44 <0.01
PMMA-2DPAGE 52 52 <0.01
PptaseDB 29 29 <0.01
Rat-heart-2DPAGE 28 28 <0.01
ANU-2DPAGE 22 22 <0.01
Number of explicitly cross-referenced databases: 85
Number of implicitly cross-referenced databases: 26
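The "average per entry" column is the total number of lines of a given type divided by the 261513 entries of the release. The Python sketch below reproduces a few values. The rule that an average rounding to 0.00 is displayed as "<0.01" is an assumption inferred from the table, not a documented convention, and the function name `per_entry` is illustrative.

```python
TOTAL_ENTRIES = 261513  # UniProtKB/Swiss-Prot release 52.0


def per_entry(total_lines: int, entries: int = TOTAL_ENTRIES) -> str:
    """Format the average number of lines per entry as in the table."""
    text = f"{total_lines / entries:.2f}"
    # Assumption: values that round to 0.00 are shown as "<0.01"
    return "<0.01" if text == "0.00" else text


# Totals copied from the table
line_totals = {
    "References (RL)": 514754,
    "Comments (CC)": 1058365,
    "Features (FT)": 1744500,
    "Cross-references (DR)": 3650964,
    "SWISS-2DPAGE": 1179,
}

for line_type, total in line_totals.items():
    print(f"{line_type:<24} {total:>8} {per_entry(total):>6}")
```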
7. MISCELLANEOUS STATISTICS
Total number of distinct authors cited in UniProtKB/Swiss-Prot: 237138
Total number of entries encoded on a Mitochondrion: 4306
Total number of entries encoded on a Plasmid: 3295
Total number of entries encoded on a Plastid: 26
Total number of entries encoded on a Plastid; Apicoplast: 6
Total number of entries encoded on a Plastid; Chloroplast: 8037
Total number of entries encoded on a Plastid; Cyanelle: 145
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 91
Number of fragments: 8360
Number of additional sequences produced by alternative splicing, initiation or promoter usage: 18319
| UniProtKB/TrEMBL protein database release 35.0 statistics |
|---|
1. INTRODUCTION
Release 35.0 of 06-Mar-2007 of UniProtKB/TrEMBL contains 3874166 sequence entries
comprising 1260291226 amino acids.
696753 sequences have been added since release 34, the sequence data of
10513 existing entries has been updated and the annotations of
2124731 entries have been revised. This represents an increase of 23%.
2. AMINO ACID COMPOSITION
2.1 Composition in percent for the complete database
Ala (A) 8.37 Gln (Q) 4.00 Leu (L) 9.83 Ser (S) 6.86
Arg (R) 5.52 Glu (E) 6.04 Lys (K) 5.28 Thr (T) 5.62
Asn (N) 4.30 Gly (G) 6.98 Met (M) 2.39 Trp (W) 1.33
Asp (D) 5.23 His (H) 2.23 Phe (F) 4.06 Tyr (Y) 3.04
Cys (C) 1.37 Ile (I) 5.96 Pro (P) 4.85 Val (V) 6.60
Asx (B) 0.000 Glx (Z) 0.000 Xaa (X) 0.05
2.2 Classification of the amino acids by their frequency
Leu, Ala, Gly, Ser, Val, Glu, Ile, Thr, Arg, Lys, Asp, Pro, Asn, Phe,
Gln, Tyr, Met, His, Cys, Trp
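The two databases differ slightly in composition, which is why the frequency ranking above is not identical to the one given for UniProtKB/Swiss-Prot. As a rough sketch, the following Python snippet finds the three residues whose percentages differ most between the two composition tables. The values are copied by hand from both tables and keyed by one-letter code; the variable names are illustrative.

```python
# Percent composition, UniProtKB/Swiss-Prot release 52.0
swiss_prot = {
    "A": 7.87, "Q": 3.96, "L": 9.65, "S": 6.84, "R": 5.42,
    "E": 6.66, "K": 5.92, "T": 5.41, "N": 4.13, "G": 6.95,
    "M": 2.39, "W": 1.13, "D": 5.34, "H": 2.29, "F": 3.95,
    "Y": 3.02, "C": 1.50, "I": 5.91, "P": 4.82, "V": 6.73,
}

# Percent composition, UniProtKB/TrEMBL release 35.0
trembl = {
    "A": 8.37, "Q": 4.00, "L": 9.83, "S": 6.86, "R": 5.52,
    "E": 6.04, "K": 5.28, "T": 5.62, "N": 4.30, "G": 6.98,
    "M": 2.39, "W": 1.33, "D": 5.23, "H": 2.23, "F": 4.06,
    "Y": 3.04, "C": 1.37, "I": 5.96, "P": 4.85, "V": 6.60,
}

# Difference in percentage points (TrEMBL minus Swiss-Prot)
diff = {aa: trembl[aa] - swiss_prot[aa] for aa in swiss_prot}

# Three residues with the largest absolute difference
largest = sorted(diff, key=lambda aa: abs(diff[aa]), reverse=True)[:3]
for aa in largest:
    print(f"{aa}: {diff[aa]:+.2f}")
```

With these inputs the largest shifts are for Lys and Glu, which are less frequent in UniProtKB/TrEMBL, and Ala, which is more frequent.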
3. TAXONOMIC ORIGIN
Total number of species represented in this release of UniProtKB/TrEMBL: 127380
The first twenty species represent 763609 sequences: 19.7 % of the
total number of entries.
3.1 Table of the frequency of occurrence of species
Species represented 1x: 58527
2x: 23921
3x: 12255
4x: 6910
5x: 4051
6x: 2999
7x: 2175
8x: 1781
9x: 1381
10x: 1525
11- 20x: 6496
21- 50x: 2672
51-100x: 1081
>100x: 1606
3.2 Table of the most represented species
------ --------- --------------------------------------------
Number Frequency Species
------ --------- --------------------------------------------
1 177135 Human immunodeficiency virus 1
2 71810 Oryza sativa (japonica cultivar-group)
3 53146 Homo sapiens (Human)
4 52403 Mus musculus (Mouse)
5 50189 Trichomonas vaginalis G3
6 45187 Arabidopsis thaliana (Mouse-ear cress)
7 39844 Paramecium tetraurelia
8 35448 Hepatitis C virus
9 28040 Tetraodon nigroviridis (Green puffer)
10 27313 Tetrahymena thermophila SB210
11 26501 Drosophila melanogaster (Fruit fly)
12 20214 Caenorhabditis elegans
13 20166 Trypanosoma cruzi
14 18711 Medicago truncatula (Barrel medic)
15 18430 Brachydanio rerio (Zebrafish) (Danio rerio)
16 17188 uncultured bacterium
17 16864 Aedes aegypti (Yellowfever mosquito)
18 16432 Phaeosphaeria nodorum SN15
19 14666 Plasmodium chabaudi
20 13922 Hepatitis B virus (HBV)
21 13557 Aspergillus niger
22 13415 Anopheles gambiae str. PEST
23 13082 Dictyostelium discoideum AX4
24 13074 Caenorhabditis briggsae
25 12674 Xenopus laevis (African clawed frog)
26 12032 Aspergillus oryzae
27 11780 Plasmodium berghei
28 11650 Gibberella zeae (Fusarium graminearum)
29 10980 Chaetomium globosum CBS 148.51
30 10662 Neurospora crassa
31 10403 Neosartorya fischeri (Aspergillus fischerianus
32 10393 Aspergillus terreus NIH2624
33 10278 Coccidioides immitis RS
34 10084 Drosophila pseudoobscura (Fruit fly)
35 10006 Aspergillus fumigatus (Sartorya fumigata)
36 9719 Schistosoma japonicum (Blood fluke)
37 9640 Emericella nidulans (Aspergillus nidulans)
38 9446 Trypanosoma brucei
39 9343 Candida albicans (Yeast)
40 9232 Rattus norvegicus (Rat)
41 9113 Aspergillus clavatus NRRL 1
42 9089 Entamoeba histolytica HM-1:IMSS
43 8994 Rhodococcus sp. (strain RHA1)
44 8811 Escherichia coli
45 8512 Stigmatella aurantiaca DW4/3-1
46 8436 Burkholderia xenovorans (strain LB400)
47 8249 Microscilla marina ATCC 23134
48 8244 Bos taurus (Bovine)
49 8097 Bradyrhizobium japonicum
50 7975 Ostreococcus tauri
51 7937 Frankia sp. EAN1pec
52 7834 Burkholderia phymatum STM815
53 7808 Plasmodium yoelii yoelii
54 7761 Solibacter usitatus (strain Ellin6076)
55 7663 Burkholderia vietnamiensis G4
56 7524 Streptomyces coelicolor
57 7490 Helicobacter pylori (Campylobacter pylori)
58 7461 Burkholderia cenocepacia MC0-3
59 7449 Burkholderia sp. (strain 383) (Burkholderia cepacia
60 7432 Bradyrhizobium sp. BTAi1
61 7310 Burkholderia phytofirmans PsJN
62 7300 Streptomyces avermitilis
63 7207 Myxococcus xanthus (strain DK 1622)
64 7139 Rhizobium loti (Mesorhizobium loti)
65 7113 Leishmania major
66 7042 Hepatitis C virus subtype 1b
67 6996 Burkholderia ambifaria MC40-6
68 6994 Rhizobium leguminosarum bv. viciae (strain 3841)
69 6952 Rhodopirellula baltica
70 6921 Agrobacterium tumefaciens (strain C58 / ATCC 33970)
71 6882 Burkholderia cenocepacia (strain HI2424)
72 6792 Pseudomonas aeruginosa
73 6708 Frankia alni (strain ACN14a)
74 6679 Psychroflexus torquis ATCC 700755
75 6597 Mycobacterium smegmatis (strain ATCC 700084 / mc(2)155)
76 6592 Burkholderia cepacia (strain ATCC 53795 / AMMD)
77 6566 Hahella chejuensis (strain KCTC 2396)
78 6564 Burkholderia multivorans ATCC 17616
79 6511 Ralstonia eutropha (Cupriavidus necator
80 6488 Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
81 6471 Ustilago maydis (Smut fungus)
82 6420 Plasmodium falciparum
83 6398 Cryptococcus neoformans (Filobasidiella neoformans)
84 6394 Giardia lamblia ATCC 50803
85 6363 Cryptococcus neoformans var. neoformans B-3501A
86 6337 Sinorhizobium medicae WSM419
87 6313 Burkholderia cenocepacia (strain AU 1054)
88 6272 Stappia aggregata IAM 12614
89 6269 Oryza sativa (Rice)
90 6267 Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz)
91 6186 Yarrowia lipolytica (Candida lipolytica)
92 6181 Bacillus anthracis
93 6176 Ralstonia eutropha (strain JMP134) (Alcaligenes eutrophus)
94 6154 Ralstonia metallidurans (strain CH34 / ATCC 43123 / DSM 2839)
95 6129 Bacillus thuringiensis serovar israelensis ATCC 35646
96 6110 Lyngbya sp. PCC 8106
97 6095 Burkholderia pseudomallei (strain 1710b)
98 6003 Delftia acidovorans SPH-1
99 5962 Debaryomyces hansenii (Yeast) (Torulaspora hansenii)
100 5909 Mycobacterium vanbaalenii (strain DSM 7251 / PYR-1)
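The figure quoted for the first twenty species can be checked against the table above. The following is a minimal sketch: the frequencies are copied from rows 1 to 20, while the total entry count is not stated in this section and is an assumption, taken here as the non-fragment sequences counted in the size table of section 4 plus the number of fragments given in section 6.

```python
# Frequencies of the first twenty species, copied from table 3.2.
top20 = [
    177135, 71810, 53146, 52403, 50189, 45187, 39844, 35448, 28040, 27313,
    26501, 20214, 20166, 18711, 18430, 17188, 16864, 16432, 14666, 13922,
]

# Assumption: total entries = non-fragment sequences (sum of the size bins
# in section 4) + number of fragments (section 6).
total_entries = 2945127 + 929039

top20_total = sum(top20)
share = 100 * top20_total / total_entries

print(f"{top20_total} sequences, {share:.1f}% of {total_entries} entries")
```

Under that assumption the sum and the percentage agree with the statement at the start of this section.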
3.3 Taxonomic distribution of the sequences
| Kingdom | Sequences | % of the database |
|---|---|---|
| Archaea | 85628 | 2% |
| Bacteria | 1953096 | 50% |
| Eukaryota | 1353357 | 35% |
| Viruses | 438444 | 12% |
| Other | 3639 | <1% |
Within Eukaryota:
| Category | Sequences | % of Eukaryota | % of the complete database |
|---|---|---|---|
| Human | 53146 | 4% | 1% |
| Other Mammalia | 128207 | 9% | 3% |
| Other Vertebrata | 166359 | 12% | 4% |
| Viridiplantae | 277650 | 21% | 7% |
| Fungi | 221421 | 16% | 6% |
| Insecta | 140469 | 10% | 4% |
| Nematoda | 36997 | 3% | 1% |
| Other | 329108 | 24% | 8% |
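As an illustration of how the "% of Eukaryota" column can be derived, the sketch below recomputes it from the sequence counts in the table. The dictionary name and the use of rounding to the nearest whole percent are assumptions made for this example; the release notes do not state the rounding rule.

```python
# Sequence counts per category, copied from the "Within Eukaryota" table.
eukaryota = {
    "Human": 53146,
    "Other Mammalia": 128207,
    "Other Vertebrata": 166359,
    "Viridiplantae": 277650,
    "Fungi": 221421,
    "Insecta": 140469,
    "Nematoda": 36997,
    "Other": 329108,
}

# The categories add up to the Eukaryota row of the kingdom table.
eukaryota_total = sum(eukaryota.values())

# Assumed rule: round each share to the nearest whole percent.
percent_of_eukaryota = {
    name: round(100 * count / eukaryota_total)
    for name, count in eukaryota.items()
}

for name, pct in percent_of_eukaryota.items():
    print(f"{name:<18}{eukaryota[name]:>8} ({pct:>3}%)")
```

Because each share is rounded independently, the rounded percentages need not add up to exactly 100.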
4. SEQUENCE SIZE
Distribution of the sequences by size (excluding fragments)
| From - To | Number | From - To | Number |
|---|---|---|---|
| 1-50 | 49629 | 1001-1100 | 23412 |
| 51-100 | 256397 | 1101-1200 | 16751 |
| 101-150 | 325180 | 1201-1300 | 11671 |
| 151-200 | 309661 | 1301-1400 | 7811 |
| 201-250 | 311786 | 1401-1500 | 6409 |
| 251-300 | 298194 | 1501-1600 | 4703 |
| 301-350 | 279147 | 1601-1700 | 3695 |
| 351-400 | 220660 | 1701-1800 | 3063 |
| 401-450 | 179845 | 1801-1900 | 2267 |
| 451-500 | 153062 | 1901-2000 | 1925 |
| 501-550 | 111613 | 2001-2100 | 1566 |
| 551-600 | 81746 | 2101-2200 | 1583 |
| 601-650 | 61473 | 2201-2300 | 1247 |
| 651-700 | 47710 | 2301-2400 | 1057 |
| 701-750 | 41866 | 2401-2500 | 846 |
| 751-800 | 37368 | >2500 | 7206 |
| 801-850 | 27923 | | |
| 851-900 | 24629 | | |
| 901-950 | 17965 | | |
| 951-1000 | 14061 | | |
The average sequence length in UniProtKB/TrEMBL is 325 amino acids.
The shortest sequence is Q96AT0_HUMAN: 4 amino acids.
The longest sequence is Q3ASY8_CHLCH: 36805 amino acids.
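The size table can be summarized with a few lines of arithmetic. The sketch below, using variable names chosen for this example, adds up the bins to obtain the number of non-fragment sequences, computes the share of sequences of at most 500 amino acids, and combines the result with the number of fragments reported in section 6 to estimate the total number of entries. Treating that sum as the database total is an assumption, not something these notes state.

```python
# Counts for the 50-residue bins 1-50, 51-100, ..., 951-1000.
counts_50 = [
    49629, 256397, 325180, 309661, 311786, 298194, 279147, 220660,
    179845, 153062, 111613, 81746, 61473, 47710, 41866, 37368,
    27923, 24629, 17965, 14061,
]
# Counts for the 100-residue bins 1001-1100, ..., 2401-2500.
counts_100 = [
    23412, 16751, 11671, 7811, 6409, 4703, 3695, 3063,
    2267, 1925, 1566, 1583, 1247, 1057, 846,
]
over_2500 = 7206

non_fragments = sum(counts_50) + sum(counts_100) + over_2500

# The first ten 50-residue bins cover lengths 1 to 500.
up_to_500 = sum(counts_50[:10])
share_up_to_500 = 100 * up_to_500 / non_fragments

# Assumption: every entry is either a fragment or counted in the table.
fragments = 929039  # "Number of fragments" from section 6
total_entries = non_fragments + fragments

print(f"non-fragment sequences: {non_fragments}")
print(f"at most 500 residues:   {share_up_to_500:.1f}%")
print(f"estimated total:        {total_entries}")
```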
5. STATISTICS FOR SOME LINE TYPES
The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.
| Line type / subtype | Total number | Number of entries | Average per entry |
|---|---|---|---|
| References (RL) | 5698366 | | 1.47 |
| Submitted to EMBL/GenBank/DDBJ | 3001879 | 2155019 | 0.77 |
| Journal | 2633713 | 2185824 | 0.68 |
| Thesis | 6111 | 6059 | <0.01 |
| Book citation | 4173 | 4128 | <0.01 |
| Submitted to other databases | 281 | 275 | <0.01 |
| Other | 52209 | 34730 | 0.01 |
| Comments (CC) | 1596760 | | 0.41 |
| CAUTION | 764048 | 764048 | 0.20 |
| SIMILARITY | 288557 | 283163 | 0.07 |
| SUBCELLULAR LOCATION | 134969 | 134969 | 0.03 |
| FUNCTION | 121188 | 115775 | 0.03 |
| CATALYTIC ACTIVITY | 110804 | 100343 | 0.03 |
| SUBUNIT | 83645 | 83645 | 0.02 |
| COFACTOR | 59953 | 59716 | 0.02 |
| PATHWAY | 19869 | 16688 | 0.01 |
| DOMAIN | 5538 | 5051 | <0.01 |
| INTERACTION | 4527 | 4527 | <0.01 |
| MISCELLANEOUS | 3652 | 3652 | <0.01 |
| ALLERGEN | 6 | 6 | <0.01 |
| MASS SPECTROMETRY | 4 | 4 | <0.01 |
| Features (FT) | 1871141 | | 0.48 |
| NON_TER | 1551259 | 926827 | 0.40 |
| CHAIN | 190187 | 160684 | 0.05 |
| SIGNAL | 129160 | 129160 | 0.03 |
| TRANSIT | 535 | 535 | <0.01 |
| Cross-references (DR) | 27737959 | | 7.16 |
| GO | 6176751 | 2117700 | 1.59 |
| InterPro | 5183167 | 2359043 | 1.34 |
| EMBL | 4421091 | 3866114 | 1.14 |
| Pfam | 2961494 | 2202542 | 0.76 |
| PROSITE | 1630918 | 1054799 | 0.42 |
| GenomeReviews | 1149944 | 1105731 | 0.30 |
| KEGG | 874421 | 836997 | 0.23 |
| Gene3D | 758975 | 651836 | 0.20 |
| PRINTS | 661334 | 551431 | 0.17 |
| SMART | 562318 | 439627 | 0.15 |
| TIGRFAMs | 417608 | 385499 | 0.11 |
| SMR | 398503 | 398493 | 0.10 |
| ProDom | 390620 | 372224 | 0.10 |
| BioCyc | 281152 | 266236 | 0.07 |
| HSSP | 272626 | 272224 | 0.07 |
| UniGene | 253722 | 234448 | 0.07 |
| PANTHER | 239476 | 237162 | 0.06 |
| PIR | 185394 | 150287 | 0.05 |
| TIGR | 153089 | 146665 | 0.04 |
| RZPD-ProtExp | 114853 | 36208 | 0.03 |
| ArrayExpress | 101817 | 101712 | 0.03 |
| Ensembl | 93999 | 93997 | 0.02 |
| PIRSF | 87590 | 86698 | 0.02 |
| Gramene | 71013 | 71013 | 0.02 |
| MGI | 44956 | 43553 | 0.01 |
| HGNC | 38055 | 38004 | 0.01 |
| euHCVdb | 30120 | 30120 | 0.01 |
| FlyBase | 24842 | 24806 | 0.01 |
| TAIR | 19580 | 19520 | 0.01 |
| WormPep | 19308 | 19223 | <0.01 |
| WormBase | 19065 | 18982 | <0.01 |
| LinkHub | 13923 | 13923 | <0.01 |
| ZFIN | 12974 | 12972 | <0.01 |
| DictyBase | 12926 | 12926 | <0.01 |
| MEROPS | 11947 | 11509 | <0.01 |
| LegioList | 5345 | 5315 | <0.01 |
| IntAct | 5246 | 5246 | <0.01 |
| ListiList | 4724 | 4707 | <0.01 |
| PDB | 4407 | 2648 | <0.01 |
| AGD | 4096 | 4096 | <0.01 |
| PhotoList | 4081 | 3957 | <0.01 |
| RGD | 4044 | 3711 | <0.01 |
| REBASE | 3697 | 3672 | <0.01 |
| TubercuList | 2545 | 2539 | <0.01 |
| DIP | 2487 | 2482 | <0.01 |
| GeneDB_Spombe | 1779 | 1766 | <0.01 |
| SagaList | 1749 | 1655 | <0.01 |
| Leproma | 972 | 971 | <0.01 |
| PeroxiBase | 902 | 901 | <0.01 |
| TRANSFAC | 881 | 870 | <0.01 |
| MypuList | 590 | 586 | <0.01 |
| SGD | 407 | 406 | <0.01 |
| CYGD | 133 | 130 | <0.01 |
| PHCI-2DPAGE | 106 | 106 | <0.01 |
| ANU-2DPAGE | 63 | 63 | <0.01 |
| Reactome | 49 | 36 | <0.01 |
| REPRODUCTION-2DPAGE | 40 | 40 | <0.01 |
| SWISS-2DPAGE | 39 | 39 | <0.01 |
| PMMA-2DPAGE | 3 | 3 | <0.01 |
| Siena-2DPAGE | 2 | 2 | <0.01 |
| COMPLUYEAST-2DPAGE | 1 | 1 | <0.01 |
Number of explicitly cross-referenced databases: 85
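The "average per entry" figures appear to be the total number of lines divided by the total number of entries in the database. The sketch below reproduces a few of them under two assumptions that these notes do not state explicitly: the total entry count is taken as 3874166 (non-fragment sequences from section 4 plus the fragments from section 6), and values that round to 0.00 are displayed as "<0.01". The function name is hypothetical.

```python
# Assumption: total entries = 2945127 non-fragment sequences + 929039 fragments.
TOTAL_ENTRIES = 2945127 + 929039


def average_per_entry(total_lines: int, total_entries: int = TOTAL_ENTRIES) -> str:
    """Format lines-per-entry to two decimals, using '<0.01' for tiny values."""
    text = f"{total_lines / total_entries:.2f}"
    # Assumed display rule: anything that rounds to 0.00 is shown as '<0.01'.
    return "<0.01" if text == "0.00" else text


# Total line counts copied from the table above.
examples = {
    "References (RL)": 5698366,
    "Cross-references (DR)": 27737959,
    "GO": 6176751,
    "TAIR": 19580,
    "WormPep": 19308,
    "PDB": 4407,
}

for line_type, total in examples.items():
    print(f"{line_type:<24}{total:>10}  {average_per_entry(total)}")
```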
6. MISCELLANEOUS STATISTICS
Total number of distinct authors cited in UniProtKB/TrEMBL: 245473
Total number of entries encoded on a Mitochondrion: 156079
Total number of entries encoded on a Plasmid: 62697
Total number of entries encoded on a Plastid: 3559
Total number of entries encoded on a Plastid; Apicoplast: 179
Total number of entries encoded on a Plastid; Chloroplast: 53902
Total number of entries encoded on a Plastid; Cyanelle: 7
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 181
Number of fragments: 929039
| Submissions and Updates |
|---|
We welcome feedback from our users. We would especially appreciate being notified if sequences belonging to your field of expertise are missing from the database. We would also like to hear about annotations that need updating, for example when the function of a protein has been clarified or new information about post-translational modifications has become available.
Submit new sequence data, updates and corrections at http://www.uniprot.org/support/submissions.shtml
For all queries regarding submissions to UniProtKB and to submit new protein sequence data, please contact:
UniProt Knowledgebase
The EMBL Outstation - The European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom
Telephone: (+44 1223) 494 462
Telefax: (+44 1223) 494 468
E-mail: datasubs@ebi.ac.uk
| Download information |
|---|
The latest data of the UniProt Knowledgebase is available in various formats (flatfile, XML or FASTA) at http://www.uniprot.org/database/download.shtml. The data is further supplemented by a file containing the sequences of all additional alternative isoforms annotated in UniProtKB/Swiss-Prot. This data set is documented in the file ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/README.varsplic
For users who wish to download the UniProt Knowledgebase only occasionally, we distribute the latest major release (updated 3 times per year) in flatfile format. Previous UniProtKB/Swiss-Prot and UniProtKB/TrEMBL releases are archived under ftp://ftp.uniprot.org/pub/databases/uniprot/previous_major_releases. The UniProt Knowledgebase major release is also available on DVD from the EBI.
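Once a flatfile has been downloaded, counts like those in the cross-reference statistics (total DR lines per database and number of entries with at least one such line) can be tallied with a short script. The following is a minimal sketch rather than the procedure used to produce these release notes: the function name is hypothetical, the two toy entries are invented for illustration, and the only format details relied on are that each line starts with a two-letter line code, that DR lines name the database before the first semicolon, and that `//` terminates an entry.

```python
from collections import Counter


def tally_cross_references(lines):
    """Return (DR lines per database, entries with at least one such line)."""
    total_lines = Counter()
    entries_with = Counter()
    seen_in_entry = set()
    for line in lines:
        if line.startswith("//"):
            # End of entry: credit each database seen in it exactly once.
            entries_with.update(seen_in_entry)
            seen_in_entry = set()
        elif line.startswith("DR   "):
            database = line[5:].split(";", 1)[0].strip()
            total_lines[database] += 1
            seen_in_entry.add(database)
    return total_lines, entries_with


# Invented toy entries (not real UniProtKB records), for illustration only.
sample = """\
ID   TOY1_EXAMPLE
DR   EMBL; X00001; CAA00001.1; -; Genomic_DNA.
DR   InterPro; IPR000001; Toy_domain_1.
DR   InterPro; IPR000002; Toy_domain_2.
//
ID   TOY2_EXAMPLE
DR   EMBL; X00002; CAA00002.1; -; Genomic_DNA.
//
""".splitlines()

total_lines, entries_with = tally_cross_references(sample)
for database in sorted(total_lines):
    print(database, total_lines[database], entries_with[database])
```

For a real download, the list of lines would be replaced by an open file handle, which can be iterated line by line without loading the whole file into memory.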
| Contact |
|---|
| Citation |
|---|
If you want to cite UniProt in a publication, please use the following reference:
The UniProt Consortium
"The Universal Protein Resource (UniProt)"
Nucleic Acids Res. 35:D193-D197(2007) doi:10.1093/nar/gkl929