UniProt Knowledgebase Release notes: UniProtKB release 9.0 of 31-Oct-2006
| Content |
|---|
Related documents: UniProtKB user manual, Recent changes, Forthcoming changes.
| Introduction |
|---|
Release 9.0 of the UniProt Knowledgebase is composed of the UniProtKB/Swiss-Prot Protein Knowledgebase release 51.0 and the UniProtKB/TrEMBL Protein Database release 34.0.
More information on these databases can be found in the user manual section "What is the UniProt Knowledgebase?".
| UniProtKB/Swiss-Prot protein knowledgebase release 51.0 statistics |
|---|
Release 51.0 of 31-Oct-06 of UniProtKB/Swiss-Prot contains 241'242 sequence entries, comprising 88'541'632 amino acids abstracted from 148'048 references.
The growth of the database is summarized below.
| Release | Date | Number of entries | Number of amino acids |
|---|---|---|---|
| 2.0 | 09/86 | 3'939 | 900'163 |
| 3.0 | 11/86 | 4'160 | 969'641 |
| 4.0 | 04/87 | 4'387 | 1'036'010 |
| 5.0 | 09/87 | 5'205 | 1'327'683 |
| 6.0 | 01/88 | 6'102 | 1'653'982 |
| 7.0 | 04/88 | 6'821 | 1'885'771 |
| 8.0 | 08/88 | 7'724 | 2'224'465 |
| 9.0 | 11/88 | 8'702 | 2'498'140 |
| 10.0 | 03/89 | 10'008 | 2'952'613 |
| 11.0 | 07/89 | 10'856 | 3'265'966 |
| 12.0 | 10/89 | 12'305 | 3'797'482 |
| 13.0 | 01/90 | 13'837 | 4'347'336 |
| 14.0 | 04/90 | 15'409 | 4'914'264 |
| 15.0 | 08/90 | 16'941 | 5'486'399 |
| 16.0 | 11/90 | 18'364 | 5'986'949 |
| 17.0 | 02/91 | 20'024 | 6'524'504 |
| 18.0 | 05/91 | 20'772 | 6'792'034 |
| 19.0 | 08/91 | 21'795 | 7'173'785 |
| 20.0 | 11/91 | 22'654 | 7'500'130 |
| 21.0 | 03/92 | 23'742 | 7'866'596 |
| 22.0 | 05/92 | 25'044 | 8'375'696 |
| 23.0 | 08/92 | 26'706 | 9'011'391 |
| 24.0 | 12/92 | 28'154 | 9'545'427 |
| 25.0 | 04/93 | 29'955 | 10'214'020 |
| 26.0 | 07/93 | 31'808 | 10'875'091 |
| 27.0 | 10/93 | 33'329 | 11'484'420 |
| 28.0 | 02/94 | 36'000 | 12'496'420 |
| 29.0 | 06/94 | 38'303 | 13'464'008 |
| 30.0 | 10/94 | 40'292 | 14'147'368 |
| 31.0 | 02/95 | 43'470 | 15'335'248 |
| 32.0 | 11/95 | 49'340 | 17'385'503 |
| 33.0 | 02/96 | 52'205 | 18'531'384 |
| 34.0 | 10/96 | 59'021 | 21'210'389 |
| 35.0 | 11/97 | 69'113 | 25'083'768 |
| 36.0 | 07/98 | 74'019 | 26'840'295 |
| 37.0 | 12/98 | 77'977 | 28'268'293 |
| 38.0 | 07/99 | 80'000 | 29'085'965 |
| 39.0 | 05/00 | 86'593 | 31'411'114 |
| 40.0 | 10/01 | 101'602 | 37'315'215 |
| 41.0 | 02/03 | 122'564 | 44'986'459 |
| 42.0 | 10/03 | 135'850 | 50'046'799 |
| 43.0 | 03/04 | 146'720 | 54'093'154 |
| 44.0 | 07/04 | 153'871 | 56'608'159 |
| 45.0 | 10/04 | 163'235 | 59'631'787 |
| 46.0 | 02/05 | 168'297 | 61'443'278 |
| 47.0 | 05/05 | 181'577 | 65'746'672 |
| 48.0 | 09/05 | 194'317 | 70'391'852 |
| 49.0 | 02/06 | 207'132 | 75'438'310 |
| 50.0 | 05/06 | 222'289 | 81'585'146 |
| 51.0 | 10/06 | 241'242 | 88'541'632 |
In rare cases, UniProtKB/Swiss-Prot entries are removed. Deleted entries are almost exclusively Open Reading Frames (ORFs) that were wrongly predicted to code for proteins. When there is enough evidence that such hypothetical proteins do not exist, we remove them from UniProtKB/Swiss-Prot. The document delac_sp.txt lists all accession numbers that were previously present in UniProtKB/Swiss-Prot but have since been deleted from the database.
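The following is a minimal Python sketch, not an official tool, of how a local copy of delac_sp.txt could be used to flag accessions in one's own data set that have been deleted. It assumes only that the file is plain text containing the accession numbers. Rather than depending on a particular layout, it collects every token that matches the six-character UniProtKB accession format (e.g. P83570). The function names and the sample text are hypothetical.

```python
import re
from pathlib import Path

# Six-character UniProtKB accession format, e.g. P83570 or Q8WZ42.
ACCESSION_RE = re.compile(
    r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9][A-Z][A-Z0-9]{2}[0-9])\b"
)


def extract_accessions(text: str) -> set[str]:
    """Return every accession-like token found in the text."""
    return set(ACCESSION_RE.findall(text))


def load_deleted(path: str) -> set[str]:
    """Read a local copy of delac_sp.txt (hypothetical helper; layout not checked)."""
    return extract_accessions(Path(path).read_text(errors="replace"))


def find_deleted(accessions: list[str], deleted: set[str]) -> list[str]:
    """Return the accessions that appear in the deleted set, in input order."""
    return [acc for acc in accessions if acc in deleted]


# Usage, with made-up text standing in for the real file:
sample = "Deleted accession numbers\nO00000\nQ00000\n"
print(find_deleted(["O00000", "P83570"], extract_accessions(sample)))  # ['O00000']
```

Matching the accession pattern instead of parsing columns keeps the sketch independent of any header or footer lines the file may contain. The trade-off is that any unrelated token of the same shape would also be picked up.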
We have selected a number of organisms that are the target of genome sequencing and/or mapping projects and for which we intend to:
Our efforts to annotate human sequence entries as completely as possible gave rise to the HPI project, while the bacterial model organisms became the focus of the HAMAP project. The current status of the model organisms not covered by these two projects is as follows:
| Organism | Database cross-references | Index file | Number of sequences |
|---|---|---|---|
| A.thaliana | TAIR | arath.txt | 4'551 |
| C.albicans | None yet | calbican.txt | 572 |
| C.elegans | Wormpep | celegans.txt | 2'966 |
| D.discoideum | DictyBase | dicty.txt | 332 |
| D.melanogaster | FlyBase | fly.txt | 2'436 |
| M.musculus | MGD | mgdtosp.txt | 11'897 |
| S.cerevisiae | SGD | yeast.txt | 5'916 |
| S.pombe | GeneDB_SPombe | pombe.txt | 3'082 |
1. INTRODUCTION
Release 51.0 of 31-Oct-06 of UniProtKB/Swiss-Prot contains 241242 sequence entries,
comprising 88541632 amino acids abstracted from 148048 references.
19061 sequences have been added since release 50.0, the sequence data of
1336 existing entries has been updated and the annotations of
222181 entries have been revised.
The growth of the database is summarized in the table above.
2. AMINO ACID COMPOSITION
2.1 Composition in percent for the complete database
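| Amino acid | % | Amino acid | % | Amino acid | % | Amino acid | % |
|---|---|---|---|---|---|---|---|
| Ala (A) | 7.89 | Gln (Q) | 3.95 | Leu (L) | 9.65 | Ser (S) | 6.82 |
| Arg (R) | 5.40 | Glu (E) | 6.67 | Lys (K) | 5.92 | Thr (T) | 5.41 |
| Asn (N) | 4.13 | Gly (G) | 6.96 | Met (M) | 2.38 | Trp (W) | 1.13 |
| Asp (D) | 5.35 | His (H) | 2.29 | Phe (F) | 3.96 | Tyr (Y) | 3.03 |
| Cys (C) | 1.50 | Ile (I) | 5.90 | Pro (P) | 4.83 | Val (V) | 6.73 |
| Asx (B) | 0.000 | Glx (Z) | 0.000 | Xaa (X) | 0.00 | | |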
2.2 Classification of the amino acids by their frequency
Leu, Ala, Gly, Ser, Val, Glu, Lys, Ile, Thr, Arg, Asp, Pro, Asn, Phe,
Gln, Tyr, Met, His, Cys, Trp
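Section 2.2 is simply the residues of section 2.1 sorted by decreasing percentage. The Python sketch below illustrates both steps. It assumes the sequences are available as one-letter strings, and the function names are illustrative. Percentages are taken over every symbol counted, so any B, Z or X residues would be included in the denominator.

```python
from collections import Counter

# One-letter to three-letter codes for the 20 standard amino acids.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def composition(sequences):
    """Percent of each residue symbol, pooled over all (non-empty) input sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq)  # counts each character of the string
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.items()}


def rank_by_frequency(percentages):
    """Three-letter codes of the standard residues, most frequent first."""
    standard = [aa for aa in percentages if aa in THREE_LETTER]
    standard.sort(key=lambda aa: percentages[aa], reverse=True)
    return [THREE_LETTER[aa] for aa in standard]


# Usage: the Swiss-Prot percentages from section 2.1 reproduce section 2.2.
swissprot = {
    "A": 7.89, "R": 5.40, "N": 4.13, "D": 5.35, "C": 1.50,
    "Q": 3.95, "E": 6.67, "G": 6.96, "H": 2.29, "I": 5.90,
    "L": 9.65, "K": 5.92, "M": 2.38, "F": 3.96, "P": 4.83,
    "S": 6.82, "T": 5.41, "W": 1.13, "Y": 3.03, "V": 6.73,
}
print(", ".join(rank_by_frequency(swissprot)))
```

Feeding in the UniProtKB/TrEMBL percentages from its own section 2.1 likewise reproduces the ordering given in its section 2.2.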
3. TAXONOMIC ORIGIN
Total number of species represented in this release of UniProtKB/Swiss-Prot: 10671
The first twenty species represent 76403 sequences: 31.7 % of the total
number of entries.
3.1 Table of the frequency of occurrence of species
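| Entries per species | Number of species |
|---|---|
| 1x | 5274 |
| 2x | 1616 |
| 3x | 788 |
| 4x | 486 |
| 5x | 340 |
| 6x | 295 |
| 7x | 202 |
| 8x | 177 |
| 9x | 156 |
| 10x | 80 |
| 11-20x | 416 |
| 21-50x | 340 |
| 51-100x | 133 |
| >100x | 368 |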
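Table 3.1 is a histogram of species by the number of entries each one has. It uses single counts up to 10 and then the wider classes 11-20, 21-50, 51-100 and >100. The minimal Python sketch below assumes a hypothetical input of one species name per entry, and the function names are illustrative.

```python
from collections import Counter


def frequency_class(n):
    """Row label of table 3.1 for a species with n entries (n >= 1)."""
    if n <= 10:
        return f"{n}x"
    if n <= 20:
        return "11-20x"
    if n <= 50:
        return "21-50x"
    if n <= 100:
        return "51-100x"
    return ">100x"


def species_frequency_table(species_of_each_entry):
    """Number of species in each frequency class."""
    entries_per_species = Counter(species_of_each_entry)
    return Counter(frequency_class(n) for n in entries_per_species.values())


# Usage with a toy list: one species with 150 entries, two with 2, one with 1.
toy = ["Homo sapiens"] * 150 + ["Mus musculus"] * 2 + ["Bos taurus"] * 2 + ["Ovis aries"]
print(species_frequency_table(toy))
```

The same classes are used for the journal citation counts in table 5.1, so the binning function applies there unchanged.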
3.2 Table of the most represented species
------ --------- --------------------------------------------
Number Frequency Species
------ --------- --------------------------------------------
1 14987 Homo sapiens (Human)
2 11897 Mus musculus (Mouse)
3 5916 Saccharomyces cerevisiae (Baker's yeast)
4 5528 Rattus norvegicus (Rat)
5 4877 Escherichia coli
6 4551 Arabidopsis thaliana (Mouse-ear cress)
7 3082 Schizosaccharomyces pombe (Fission yeast)
8 2966 Caenorhabditis elegans
9 2872 Bos taurus (Bovine)
10 2842 Bacillus subtilis
11 2436 Drosophila melanogaster (Fruit fly)
12 1837 Escherichia coli O157:H7
13 1782 Methanococcus jannaschii
14 1774 Haemophilus influenzae
15 1587 Salmonella typhimurium
16 1556 Gallus gallus (Chicken)
17 1509 Escherichia coli O6
18 1508 Xenopus laevis (African clawed frog)
19 1486 Shigella flexneri
20 1410 Mycobacterium tuberculosis
21 1347 Pongo pygmaeus (Orangutan)
22 1182 Salmonella typhi
23 1153 Mycobacterium bovis
24 1105 Sus scrofa (Pig)
25 1089 Pseudomonas aeruginosa
26 1014 Oryza sativa (Rice)
27 971 Archaeoglobus fulgidus
28 970 Synechocystis sp. (strain PCC 6803)
29 930 Brachydanio rerio (Zebrafish) (Danio rerio)
30 884 Mimivirus
31 866 Yersinia pestis
32 863 Vibrio cholerae
33 857 Rhizobium meliloti (Sinorhizobium meliloti)
34 807 Oryctolagus cuniculus (Rabbit)
35 754 Aquifex aeolicus
36 723 Pasteurella multocida
37 707 Vibrio parahaemolyticus
38 690 Staphylococcus aureus (strain Mu50 / ATCC 700699)
39 688 Staphylococcus aureus (strain N315)
40 687 Mycoplasma pneumoniae
41 677 Streptomyces coelicolor
42 672 Staphylococcus aureus (strain MW2)
43 670 Staphylococcus aureus (strain COL)
44 669 Staphylococcus aureus (strain MRSA252)
45 668 Staphylococcus aureus (strain MSSA476)
46 660 Bacillus halodurans
47 659 Canis familiaris (Dog)
48 655 Macaca fascicularis (Crab eating macaque) (Cynomolgus monkey)
49 650 Vibrio vulnificus
50 631 Vibrio vulnificus (strain YJ016)
51 630 Mycobacterium leprae
52 612 Anabaena sp. (strain PCC 7120)
53 608 Treponema pallidum
54 589 Pseudomonas putida (strain KT2440)
55 589 Pseudomonas syringae pv. tomato
56 587 Bacillus anthracis
57 587 Methanobacterium thermoautotrophicum
58 581 Neurospora crassa
59 577 Staphylococcus epidermidis (strain ATCC 35984 / RP62A)
60 577 Staphylococcus epidermidis (strain ATCC 12228)
61 572 Buchnera aphidicola subsp. Acyrthosiphon pisum
62 572 Candida albicans (Yeast)
63 570 Helicobacter pylori (Campylobacter pylori)
64 569 Ashbya gossypii (Yeast) (Eremothecium gossypii)
65 568 Photorhabdus luminescens subsp. laumondii
66 565 Bradyrhizobium japonicum
67 562 Pan troglodytes (Chimpanzee)
68 562 Buchnera aphidicola subsp. Schizaphis graminum
69 561 Yersinia pseudotuberculosis
70 551 Helicobacter pylori J99 (Campylobacter pylori J99)
71 551 Ralstonia solanacearum (Pseudomonas solanacearum)
72 549 Rickettsia prowazekii
73 548 Zea mays (Maize)
74 548 Agrobacterium tumefaciens (strain C58 / ATCC 33970)
75 543 Lactococcus lactis subsp. lactis (Streptococcus lactis)
76 540 Rhizobium loti (Mesorhizobium loti)
77 539 Listeria monocytogenes
78 535 Kluyveromyces lactis (Yeast) (Candida sphaerica)
79 531 Listeria innocua
80 528 Xanthomonas campestris pv. campestris
81 518 Neisseria meningitidis serogroup A
82 517 Neisseria meningitidis serogroup B
83 516 Shewanella oneidensis
84 512 Bacillus cereus (strain ATCC 14579 / DSM 31)
85 507 Buchnera aphidicola subsp. Baizongia pistaciae
86 507 Clostridium acetobutylicum
87 505 Caulobacter crescentus (Caulobacter vibrioides)
88 501 Erwinia carotovora subsp. atroseptica (Pectobacterium atrosepticum)
89 491 Xanthomonas axonopodis pv. citri
90 484 Candida glabrata (Yeast) (Torulopsis glabrata)
91 483 Mycoplasma genitalium
92 483 Thermotoga maritima
93 483 Salmonella paratyphi-a
94 478 Streptococcus pneumoniae
95 471 Xylella fastidiosa
96 470 Listeria monocytogenes serotype 4b (strain F2365)
97 462 Xylella fastidiosa (strain Temecula1 / ATCC 700964)
98 461 Deinococcus radiodurans
99 460 Brucella melitensis
100 460 Oceanobacillus iheyensis
101 460 Brucella suis
102 452 Haemophilus ducreyi
103 448 Methanosarcina acetivorans
104 446 Pyrococcus horikoshii
105 443 Corynebacterium glutamicum (Brevibacterium flavum)
106 441 Pyrococcus abyssi
107 441 Clostridium perfringens
108 439 Halobacterium salinarium (Halobacterium halobium)
109 435 Chlamydia trachomatis
110 429 Methanosarcina mazei (Methanosarcina frisia)
111 426 Bordetella bronchiseptica (Alcaligenes bronchisepticus)
112 421 Borrelia burgdorferi (Lyme disease spirochete)
113 420 Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
114 420 Photobacterium profundum (Photobacterium sp. (strain SS9))
115 417 Nicotiana tabacum (Common tobacco)
116 416 Pyrococcus furiosus
117 415 Chlamydia pneumoniae (Chlamydophila pneumoniae)
118 414 Chromobacterium violaceum
119 413 Bordetella parapertussis
120 413 Bordetella pertussis
121 411 Thermoanaerobacter tengcongensis
122 410 Bacillus cereus (strain ATCC 10987)
123 410 Lactobacillus plantarum
124 409 Synechococcus elongatus (Thermosynechococcus elongatus)
125 406 Chlamydia muridarum
126 405 Emericella nidulans (Aspergillus nidulans)
127 405 Rhizobium sp. (strain NGR234)
128 404 Campylobacter jejuni
129 401 Streptococcus pyogenes serotype M6
130 401 Streptococcus mutans
131 401 Ovis aries (Sheep)
132 400 Enterococcus faecalis (Streptococcus faecalis)
133 395 Sulfolobus solfataricus
134 395 Streptomyces avermitilis
135 395 Salmonella choleraesuis
136 393 Yarrowia lipolytica (Candida lipolytica)
137 389 Streptococcus pyogenes serotype M1
138 384 Streptococcus pyogenes serotype M18
139 383 Streptococcus pyogenes serotype M3
140 380 Rickettsia conorii
141 374 Bacillus thuringiensis subsp. konkukian
142 365 Chlorobium tepidum
143 361 Pyrococcus kodakaraensis (Thermococcus kodakaraensis)
144 360 Debaryomyces hansenii (Yeast) (Torulaspora hansenii)
145 360 Corynebacterium efficiens
146 356 Rhodopseudomonas palustris
147 356 Nitrosomonas europaea
148 354 Acinetobacter sp. (strain ADP1)
149 350 Methanopyrus kandleri
150 348 Aeropyrum pernix
151 347 Leptospira interrogans
152 342 Gloeobacter violaceus
153 341 Burkholderia pseudomallei (Pseudomonas pseudomallei)
154 341 Bacillus cereus (strain ZK / E33L)
155 339 Pisum sativum (Garden pea)
156 337 Leptospira interrogans serogroup Icterohaemorrhagiae serovar copenhageni
157 332 Dictyostelium discoideum (Slime mold)
158 332 Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
159 331 Streptococcus agalactiae serotype III
160 331 Bacillus clausii (strain KSM-K16)
161 329 Streptococcus agalactiae serotype V
162 328 Synechococcus sp. (strain WH8102)
163 328 Sulfolobus tokodaii
164 326 Mannheimia succiniciproducens (strain MBEL55E)
165 321 Prochlorococcus marinus (strain MIT 9313)
166 321 Prochlorococcus marinus
167 319 Burkholderia mallei (Pseudomonas mallei)
168 318 Bacillus licheniformis (strain DSM 13 / ATCC 14580)
169 313 Methylococcus capsulatus
170 313 Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
171 312 Vibrio fischeri (strain ATCC 700601 / ES114)
172 311 Thermoplasma acidophilum
173 309 Staphylococcus aureus
174 308 Rhodopirellula baltica
175 305 Triticum aestivum (Wheat)
176 302 Fusobacterium nucleatum subsp. nucleatum
177 300 Mycobacterium paratuberculosis
178 300 Prochlorococcus marinus subsp. pastoris (strain CCMP 1378 / MED4)
179 300 Geobacillus kaustophilus
180 298 Synechococcus sp. (strain ATCC 27144 / PCC 6301 / SAUG 1402/1)
181 297 Coxiella burnetii
182 297 Staphylococcus haemolyticus (strain JCSC1435)
183 297 Macaca mulatta (Rhesus macaque)
184 297 Geobacter sulfurreducens
185 292 Glycine max (Soybean)
186 291 Staphylococcus saprophyticus subsp. saprophyticus
187 290 Aspergillus fumigatus (Sartorya fumigata)
188 287 Sulfolobus acidocaldarius
189 286 Idiomarina loihiensis
190 286 Solanum tuberosum (Potato)
191 286 Thermus thermophilus (strain HB8 / ATCC 27634 / DSM 579)
192 284 Pseudomonas putida
193 283 Bacteroides thetaiotaomicron
194 279 Wolinella succinogenes
195 279 Pyrobaculum aerophilum
196 278 Cavia porcellus (Guinea pig)
197 278 Nocardia farcinica
198 278 Hordeum vulgare (Barley)
199 277 Zymomonas mobilis
200 277 Clostridium tetani
201 275 Thermoplasma volcanium
202 274 Desulfovibrio vulgaris (strain Hildenborough / ATCC 29579 / NCIMB 8303)
203 269 Synechococcus sp. (strain PCC 7942) (Anacystis nidulans R2)
204 268 Bacteriophage T4
205 267 Symbiobacterium thermophilum
206 267 Spinacia oleracea (Spinach)
207 266 Corynebacterium diphtheriae
208 266 Shigella sonnei (strain Ss046)
209 261 Thermus thermophilus (strain HB27 / ATCC BAA-163 / DSM 7039)
210 261 Rhodobacter capsulatus (Rhodopseudomonas capsulata)
211 259 Azoarcus sp. (strain EbN1)
212 259 Brucella abortus
213 256 Legionella pneumophila subsp. pneumophila
214 255 Silicibacter pomeroyi
215 255 Pseudomonas fluorescens (strain Pf-5 / ATCC BAA-477)
216 255 Ureaplasma parvum (Ureaplasma urealyticum biotype 1)
217 254 Rhodobacter sphaeroides (strain ATCC 17023 / 2.4.1 / NCIB 8253 / DSM 158)
218 254 Vaccinia virus (strain Copenhagen) (VACV)
219 254 Wigglesworthia glossinidia brevipalpis
220 251 Haloarcula marismortui (Halobacterium marismortui)
221 251 Legionella pneumophila (strain Paris)
222 251 Helicobacter hepaticus
223 251 Methanococcus maripaludis
224 250 Xanthomonas oryzae pv. oryzae
225 249 Equus caballus (Horse)
226 249 Shigella boydii serotype 4 (strain Sb227)
227 249 Legionella pneumophila (strain Lens)
228 248 Bifidobacterium longum
229 247 Neisseria gonorrhoeae (strain ATCC 700825 / FA 1090)
230 245 Pseudomonas syringae pv. syringae (strain B728a)
231 242 Porphyromonas gingivalis (Bacteroides gingivalis)
232 241 Shigella dysenteriae serotype 1 (strain Sd197)
233 240 Chlamydophila caviae
234 240 Leifsonia xyli subsp. xyli
235 236 Haemophilus influenzae (strain 86-028NP)
236 235 Bacillus stearothermophilus (Geobacillus stearothermophilus)
237 232 Bacteroides fragilis
238 231 Blochmannia floridanus
239 229 Pseudomonas syringae pv. phaseolicola (strain 1448A / Race 6)
240 228 Gluconobacter oxydans (Gluconobacter suboxydans)
241 226 Campylobacter jejuni (strain RM1221)
242 225 Lactobacillus johnsonii
243 224 Propionibacterium acnes
244 223 Bartonella henselae (Rochalimaea henselae)
245 223 Desulfotalea psychrophila
246 222 Ralstonia eutropha (strain JMP134) (Alcaligenes eutrophus)
247 220 Porphyra purpurea
248 220 Chlamydomonas reinhardtii
249 216 Gorilla gorilla gorilla (Lowland gorilla)
250 213 Cryptococcus neoformans (Filobasidiella neoformans)
251 212 Bartonella quintana (Rochalimaea quintana)
252 212 Pseudomonas fluorescens (strain PfO-1)
253 211 Klebsiella pneumoniae
254 210 Xanthomonas campestris pv. campestris (strain 8004)
255 207 Cricetulus griseus (Chinese hamster)
256 206 Burkholderia sp. (strain 383) (Burkholderia cepacia)
257 206 Anabaena variabilis (strain ATCC 29413 / PCC 7937)
258 205 Colwellia psychrerythraea (strain 34H / ATCC BAA-681) (Vibrio psychroerythus)
259 203 Bdellovibrio bacteriovorus
260 201 Felis silvestris catus (Cat)
261 200 Vaccinia virus (strain Western Reserve / WR) (VACV)
262 200 Streptococcus thermophilus (strain ATCC BAA-250 / LMG 18311)
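Section 3 states that the first twenty species represent 76403 sequences, or 31.7 % of the entries. This figure can be recomputed from the frequency column of table 3.2. The helper name in the Python sketch below is hypothetical.

```python
def top_n_share(frequencies, total_entries, n=20):
    """Entries held by the n most represented species, and their percentage of all entries."""
    covered = sum(sorted(frequencies, reverse=True)[:n])
    return covered, 100.0 * covered / total_entries


# Frequencies of the first 20 species in table 3.2.
top20 = [14987, 11897, 5916, 5528, 4877, 4551, 3082, 2966, 2872, 2842,
         2436, 1837, 1782, 1774, 1587, 1556, 1509, 1508, 1486, 1410]
covered, share = top_n_share(top20, total_entries=241242)
print(f"{covered} sequences: {share:.1f} % of the total")  # 76403 sequences: 31.7 % of the total
```

Applied to the first twenty rows of the UniProtKB/TrEMBL species table, the same calculation gives 673630 sequences. That is 20.3 % of its 3313264 entries, matching the figure reported there.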
3.3 Taxonomic distribution of the sequences
| Kingdom | Sequences | % of the database |
|---|---|---|
| Archaea | 10690 | 4% |
| Bacteria | 116347 | 48% |
| Eukaryota | 103579 | 43% |
| Viruses | 10626 | 4% |
Within Eukaryota:
| Category | Sequences | % of Eukaryota | % of the complete database |
|---|---|---|---|
| Human | 14988 | 14% | 6% |
| Other Mammalia | 32159 | 31% | 13% |
| Other Vertebrata | 9382 | 9% | 4% |
| Viridiplantae | 16436 | 16% | 7% |
| Fungi | 16083 | 16% | 7% |
| Insecta | 4691 | 5% | 2% |
| Nematoda | 3376 | 3% | 1% |
| Other | 6464 | 6% | 3% |
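The percentages in section 3.3 appear to be each count divided by the relevant total and rounded to a whole number. For the kingdom table the total is all entries, and for the eukaryotic categories it is either the Eukaryota count or the whole database. The four kingdom counts add up to the 241242 entries of the release, and the eight eukaryotic categories add up to the 103579 Eukaryota entries. The Python sketch below assumes that reading, and the function name is illustrative. It does not special-case very small shares, which the TrEMBL table prints as "<1%".

```python
def percent_shares(counts, total=None):
    """Whole-number percentage of each category; total defaults to the sum of counts."""
    if total is None:
        total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


kingdoms = {"Archaea": 10690, "Bacteria": 116347, "Eukaryota": 103579, "Viruses": 10626}
eukaryota = {
    "Human": 14988, "Other Mammalia": 32159, "Other Vertebrata": 9382,
    "Viridiplantae": 16436, "Fungi": 16083, "Insecta": 4691,
    "Nematoda": 3376, "Other": 6464,
}
print(percent_shares(kingdoms))                 # % of the database
print(percent_shares(eukaryota))                # % of Eukaryota
print(percent_shares(eukaryota, total=241242))  # % of the complete database
```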
4. SEQUENCE SIZE
Distribution of the sequences by size (excluding fragments)
| Length (amino acids) | Number | Length (amino acids) | Number |
|---|---|---|---|
| 1-50 | 4507 | 1001-1100 | 2086 |
| 51-100 | 16972 | 1101-1200 | 1370 |
| 101-150 | 25049 | 1201-1300 | 1062 |
| 151-200 | 23742 | 1301-1400 | 887 |
| 201-250 | 24479 | 1401-1500 | 740 |
| 251-300 | 20223 | 1501-1600 | 376 |
| 301-350 | 21179 | 1601-1700 | 275 |
| 351-400 | 19510 | 1701-1800 | 219 |
| 401-450 | 15270 | 1801-1900 | 211 |
| 451-500 | 13104 | 1901-2000 | 172 |
| 501-550 | 9799 | 2001-2100 | 118 |
| 551-600 | 6807 | 2101-2200 | 177 |
| 601-650 | 5760 | 2201-2300 | 160 |
| 651-700 | 3886 | 2301-2400 | 106 |
| 701-750 | 3213 | 2401-2500 | 90 |
| 751-800 | 2663 | >2500 | 627 |
| 801-850 | 2287 | | |
| 851-900 | 2428 | | |
| 901-950 | 1811 | | |
| 951-1000 | 1433 | | |
The average sequence length in UniProtKB/Swiss-Prot is 367 amino acids.
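The size classes are 50 residues wide up to 1000 and 100 residues wide from 1001 to 2500, with a final open class above 2500. The average length equals the total number of amino acids divided by the number of entries (88541632 / 241242 ≈ 367.02), so it covers all entries, whereas the size table excludes fragments. The Python sketch below assigns a length to its class and recomputes the average. Its function names are illustrative.

```python
def size_class(length):
    """Row label of the size table for a complete sequence of `length` residues (>= 1)."""
    if length > 2500:
        return ">2500"
    width = 50 if length <= 1000 else 100  # 50-residue classes up to 1000, then 100
    start = (length - 1) // width * width + 1
    return f"{start}-{start + width - 1}"


def average_length(total_amino_acids, n_entries):
    """Mean sequence length over all entries."""
    return total_amino_acids / n_entries


print(size_class(2), size_class(34350))         # 1-50 >2500
print(round(average_length(88541632, 241242)))  # 367
```

As a consistency check, the 36 size classes add up to 232798 complete sequences. Adding the 8444 fragments reported in section 7 gives the 241242 entries of the release.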
The shortest sequence is GWA_SEPOF (P83570): 2 amino acids.
The longest sequence is TITIN_HUMAN (Q8WZ42): 34350 amino acids.
5. JOURNAL CITATIONS
Note: the following citation statistics reflect the number of distinct
journal citations.
Total number of journals cited in this release of UniProtKB/Swiss-Prot: 1756
5.1 Table of the frequency of journal citations
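| Times cited | Number of journals |
|---|---|
| 1x | 615 |
| 2x | 231 |
| 3x | 130 |
| 4x | 89 |
| 5x | 64 |
| 6x | 48 |
| 7x | 38 |
| 8x | 29 |
| 9x | 31 |
| 10x | 21 |
| 11-20x | 120 |
| 21-50x | 151 |
| 51-100x | 62 |
| >100x | 127 |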
5.2 List of the most cited journals in UniProtKB/Swiss-Prot
Nb Citations Journal name
-- --------- -------------------------------------------------------------
1 14192 Journal of Biological Chemistry
2 6814 Proceedings of the National Academy of Sciences of the U.S.A.
3 4398 Journal of Bacteriology
4 4128 Gene
5 4011 Nucleic Acids Research
6 3687 Biochemical and Biophysical Research Communications
7 3459 FEBS Letters
8 3183 Biochemistry
9 3108 The EMBO Journal
10 2902 European Journal of Biochemistry
11 2727 Nature
12 2601 Biochimica et Biophysica Acta
13 2570 Molecular and Cellular Biology
14 2424 Journal of Molecular Biology
15 2250 Genomics
16 2195 Cell
17 1772 Biochemical Journal
18 1666 Science
19 1443 Molecular Microbiology
20 1329 Plant Molecular Biology
21 1265 Molecular and General Genetics
22 1192 Journal of Cell Biology
23 1177 Journal of Virology
24 1081 Virology
25 1062 Human Molecular Genetics
26 1059 Journal of Biochemistry
27 1010 Nature Genetics
28 1004 Genes and Development
29 906 Plant Physiology
30 904 Oncogene
31 873 The American Journal of Human Genetics
32 802 Human Mutation
33 763 Journal of Immunology
34 737 Infection and Immunity
35 726 Development
36 703 Structure
37 699 Genetics
38 681 Yeast
39 675 Archives of Biochemistry and Biophysics
40 641 Journal of General Virology
41 603 Microbiology
42 585 Molecular Biology of the Cell
43 551 FEMS Microbiology Letters
44 544 Blood
45 536 Nature Structural Biology
46 525 The Plant Cell
47 487 Human Genetics
48 475 Current Genetics
49 468 Cancer Research
50 467 Journal of Cell Science
51 465 Molecular Cell
52 444 Developmental Biology
53 429 Applied and Environmental Microbiology
54 426 Mechanisms of Development
55 426 The Plant Journal
56 413 Journal of Clinical Investigation
57 413 Protein Science
58 409 Neuron
59 406 Mammalian Genome
60 406 Acta Crystallographica, Section D
61 400 Molecular and Biochemical Parasitology
62 383 Molecular Endocrinology
63 376 Journal of Neuroscience
64 372 The Journal of Experimental Medicine
65 370 Current Biology
66 364 Immunogenetics
67 341 Journal of Molecular Evolution
68 333 Endocrinology
69 333 DNA and Cell Biology
70 322 Journal of Neurochemistry
71 307 DNA Sequence
72 291 The Journal of Clinical Endocrinology and Metabolism
73 291 American Journal of Physiology
74 285 Biological Chemistry Hoppe-Seyler
75 282 Toxicon
76 281 Molecular Biology and Evolution
77 274 Bioscience, Biotechnology, and Biochemistry
78 273 Brain Research. Molecular Brain Research
79 247 Cytogenetics and Cell Genetics
80 242 Journal of General Microbiology
81 231 Comparative Biochemistry and Physiology
82 229 Proteins
83 215 Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
84 215 Antimicrobial Agents and Chemotherapy
85 212 Journal of Medical Genetics
86 210 Molecular Pharmacology
87 205 Peptides
88 193 Journal of Investigative Dermatology
89 186 Biology of Reproduction
90 181 Plant and Cell Physiology
91 181 DNA Research
92 180 Genome Research
93 178 Molecular Plant-Microbe Interactions
94 171 Nature Cell Biology
95 171 European Journal of Immunology
96 169 Virus Research
97 158 Experimental Cell Research
98 158 Tissue Antigens
99 158 DNA
100 157 Biochimie
101 150 RNA
102 146 Molecular and Cellular Endocrinology
103 146 Molecular Phylogenetics and Evolution
104 145 Hemoglobin
105 144 Bioorganicheskaia Khimiia
106 143 American Journal of Medical Genetics
107 137 Archives of Microbiology
108 134 Neurology
109 133 Annals of Neurology
110 132 Developmental Dynamics
111 131 European Journal of Human Genetics
112 129 Insect Biochemistry and Molecular Biology
113 126 Journal of Human Genetics
114 124 Genes to Cells
115 123 Immunity
116 118 Agricultural and Biological Chemistry
117 117 Molecular Reproduction and Development
118 116 General and Comparative Endocrinology
119 116 Animal Genetics
120 115 Planta
121 112 Diabetes
122 110 Molecular Immunology
123 108 Glycobiology
124 107 Developmental Cell
125 106 Investigative Ophthalmology and Visual Science
126 103 Journal of Protein Chemistry
127 101 The New England Journal of Medicine
6. STATISTICS FOR SOME LINE TYPES
The following table summarizes, for some UniProtKB/Swiss-Prot line types, the total number of lines, the number of entries with at least one such line, and the average number of such lines per entry.
Line type / subtype   Total number   Number of entries   Average per entry
---------------------------------   ------------   -----------------   -----------------
References (RL) 475775 1.97
Journal 417892 219561 1.73
Submitted to EMBL/GenBank/DDBJ 54275 47414 0.22
Submitted to Swiss-Prot 788 784 <0.01
Unpublished observations 629 623 <0.01
Submitted to other databases 579 566 <0.01
Book citation 566 554 <0.01
Plant Gene Register 531 519 <0.01
Thesis 378 376 <0.01
Patent 131 129 <0.01
Worm Breeder's Gazette 6 6 <0.01
Comments (CC) 967004 4.01
SIMILARITY 271184 219367 1.12
FUNCTION 169309 163414 0.70
SUBCELLULAR LOCATION 130620 130620 0.54
CATALYTIC ACTIVITY 91194 84076 0.38
SUBUNIT 87937 87937 0.36
PATHWAY 48385 41451 0.20
COFACTOR 36561 32578 0.15
TISSUE SPECIFICITY 23913 23913 0.10
MISCELLANEOUS 20879 18880 0.09
PTM 19119 15678 0.08
DOMAIN 14102 12187 0.06
ALTERNATIVE PRODUCTS 10235 10235 0.04
CAUTION 8915 8197 0.04
INDUCTION 6672 6672 0.03
INTERACTION 5931 5931 0.02
DEVELOPMENTAL STAGE 5926 5926 0.02
ENZYME REGULATION 3541 3541 0.01
DISEASE 3457 2516 0.01
WEB RESOURCE 3060 2533 0.01
MASS SPECTROMETRY 2556 2135 0.01
BIOPHYSICOCHEMICAL PROPERTIES 1564 1564 0.01
POLYMORPHISM 562 549 <0.01
RNA EDITING 457 457 <0.01
ALLERGEN 413 413 <0.01
TOXIC DOSE 307 304 <0.01
BIOTECHNOLOGY 136 136 <0.01
PHARMACEUTICAL 69 69 <0.01
Features (FT) 1694975 7.03
CHAIN 245076 237799 1.02
TRANSMEM 152755 33630 0.63
TURN 117303 9164 0.49
METAL 104372 25294 0.43
STRAND 90666 8491 0.38
HELIX 86054 8898 0.36
CONFLICT 85085 29528 0.35
TOPO_DOM 80831 16416 0.34
DOMAIN 76562 41407 0.32
CARBOHYD 72770 18269 0.30
DISULFID 71207 18191 0.30
ACT_SITE 55572 32657 0.23
REPEAT 51933 7667 0.22
BINDING 47911 19347 0.20
MOD_RES 42295 18871 0.18
VARIANT 41927 8508 0.17
NP_BIND 35175 25159 0.15
REGION 35073 18260 0.15
COMPBIAS 23586 13408 0.10
SIGNAL 23081 23071 0.10
VAR_SEQ 22047 9609 0.09
MUTAGEN 17867 4388 0.07
MOTIF 17158 11356 0.07
ZN_FING 16881 6588 0.07
SITE 14257 8162 0.06
NON_TER 10836 8297 0.04
INIT_MET 10172 10172 0.04
COILED 8808 5634 0.04
PROPEP 7394 6204 0.03
LIPID 6883 4535 0.03
DNA_BIND 6558 6123 0.03
PEPTIDE 6429 3966 0.03
TRANSIT 4212 4175 0.02
CA_BIND 2640 1086 0.01
CROSSLNK 1743 1175 0.01
NON_CONS 1150 519 <0.01
UNSURE 457 178 <0.01
SE_CYS 249 180 <0.01
Cross-references (DR) 3117026 12.92
InterPro 579873 222611 2.40
EMBL 456684 232945 1.89
Pfam 304763 215339 1.26
PROSITE 224649 137580 0.93
GO 212471 91220 0.88
GenomeReviews 138640 122934 0.57
KEGG 113059 102134 0.47
PIR 97026 90613 0.40
TIGRFAMs 94134 88121 0.39
HAMAP 92615 92497 0.38
PRINTS 89163 70134 0.37
HSSP 78938 78938 0.33
SMART 71541 54152 0.30
BioCyc 70591 65335 0.29
ProDom 57638 55724 0.24
Ensembl 42511 42498 0.18
UniGene 38777 36129 0.16
PANTHER 38091 37880 0.16
PDB 36752 10060 0.15
SMR 34082 34082 0.14
ArrayExpress 33838 33838 0.14
RZPD-ProtExp 25639 12023 0.11
TIGR 22645 22052 0.09
PIRSF 19888 19634 0.08
LinkHub 17389 17388 0.07
HGNC 14412 14352 0.06
MIM 12287 10033 0.05
MGI 11746 11700 0.05
IntAct 10997 10997 0.05
SGD 5974 5906 0.02
MEROPS 5241 4936 0.02
RGD 5225 5222 0.02
GermOnline 4925 4879 0.02
TAIR 4609 4521 0.02
EcoGene 4259 4256 0.02
EchoBASE 4160 4128 0.02
H-InvDB 3677 3659 0.02
WormPep 3566 2963 0.01
WormBase 3195 3114 0.01
FlyBase 3164 3040 0.01
GeneDB_Spombe 3115 3080 0.01
TRANSFAC 2862 2569 0.01
SubtiList 2784 2783 0.01
Gramene 2675 2675 0.01
GeneFarm 1761 1742 0.01
StyGene 1543 1539 0.01
HPA 1480 1320 0.01
TubercuList 1438 1402 0.01
SWISS-2DPAGE 1170 1170 <0.01
ListiList 1071 1063 <0.01
Reactome 1003 1003 <0.01
ZFIN 917 907 <0.01
Leproma 633 630 <0.01
AGD 575 569 <0.01
PhotoList 568 568 <0.01
LegioList 500 500 <0.01
MaizeDB 439 434 <0.01
OGP 375 374 <0.01
HIV 361 356 <0.01
REBASE 353 349 <0.01
ECO2DBASE 351 299 <0.01
DictyBase 334 331 <0.01
SagaList 332 331 <0.01
GlycoSuiteDB 282 282 <0.01
PeroxiBase 265 258 <0.01
PHCI-2DPAGE 241 241 <0.01
MypuList 189 189 <0.01
Aarhus/Ghent-2DPAGE 128 98 <0.01
Siena-2DPAGE 103 103 <0.01
HSC-2DPAGE 85 85 <0.01
PhosSite 70 70 <0.01
COMPLUYEAST-2DPAGE 59 59 <0.01
PMMA-2DPAGE 52 52 <0.01
PptaseDB 29 29 <0.01
Rat-heart-2DPAGE 28 28 <0.01
ANU-2DPAGE 21 21 <0.01
Number of explicitly cross-referenced databases: 78
Number of implicitly cross-referenced databases: 27
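The "Average per entry" column appears to be the total number of lines divided by all 241242 entries of the release, not by the number of entries carrying the line. For example, 475775 RL lines / 241242 ≈ 1.97. The value is printed with two decimals and shown as <0.01 when it would round to 0.00. The Python sketch below reproduces the column under that assumption, and its function name is illustrative.

```python
def average_per_entry(total_lines, n_entries=241242):
    """Average lines per entry, formatted like the table (default: entries in release 51.0)."""
    text = f"{total_lines / n_entries:.2f}"
    return "<0.01" if text == "0.00" else text


for name, total in [("References (RL)", 475775), ("Cross-references (DR)", 3117026),
                    ("BIOPHYSICOCHEMICAL PROPERTIES", 1564), ("Submitted to other databases", 579)]:
    print(name, average_per_entry(total))
```

These rows print 1.97, 12.92, 0.01 and <0.01, as in the table.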
7. MISCELLANEOUS STATISTICS
Total number of distinct authors cited in UniProtKB/Swiss-Prot: 230300
Total number of entries encoded on a Mitochondrion: 4085
Total number of entries encoded on a Plasmid: 3160
Total number of entries encoded on a Plastid: 26
Total number of entries encoded on a Plastid; Apicoplast: 6
Total number of entries encoded on a Plastid; Chloroplast: 5862
Total number of entries encoded on a Plastid; Cyanelle: 145
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 90
Number of fragments: 8444
Number of additional sequences produced by alternative splicing, initiation or promoter usage: 16655
| UniProtKB/TrEMBL protein database release 34.0 statistics |
|---|
1. INTRODUCTION
Release 34.0 of 31-Oct-2006 of UniProtKB/TrEMBL contains 3313264 sequence entries,
comprising 1073273937 amino acids.
497407 sequences have been added since release 33.0, the sequence data of
2732 existing entries has been updated and the annotations of
2815857 entries have been revised. This represents an 18% increase in the number of entries.
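The 18% figure can be checked under one assumption: that it is measured against the entries present before the new sequences were added. That base is 3313264 - 497407 = 2815857, which is also the number of revised entries. Dividing by the current total instead would give about 15%.

$$
\frac{497\,407}{3\,313\,264 - 497\,407} = \frac{497\,407}{2\,815\,857} \approx 17.7\,\% \approx 18\,\%
$$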
2. AMINO ACID COMPOSITION
2.1 Composition in percent for the complete database
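| Amino acid | % | Amino acid | % | Amino acid | % | Amino acid | % |
|---|---|---|---|---|---|---|---|
| Ala (A) | 8.30 | Gln (Q) | 3.93 | Leu (L) | 9.81 | Ser (S) | 6.92 |
| Arg (R) | 5.52 | Glu (E) | 6.04 | Lys (K) | 5.27 | Thr (T) | 5.65 |
| Asn (N) | 4.32 | Gly (G) | 7.01 | Met (M) | 2.40 | Trp (W) | 1.34 |
| Asp (D) | 5.21 | His (H) | 2.24 | Phe (F) | 4.05 | Tyr (Y) | 3.04 |
| Cys (C) | 1.40 | Ile (I) | 5.94 | Pro (P) | 4.88 | Val (V) | 6.59 |
| Asx (B) | 0.000 | Glx (Z) | 0.000 | Xaa (X) | 0.05 | | |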
2.2 Classification of the amino acids by their frequency
Leu, Ala, Gly, Ser, Val, Glu, Ile, Thr, Arg, Lys, Asp, Pro, Asn, Phe,
Gln, Tyr, Met, His, Cys, Trp
3. TAXONOMIC ORIGIN
Total number of species represented in this release of UniProtKB/TrEMBL: 119998
The first twenty species represent 673630 sequences: 20.3 % of the
total number of entries.
3.1 Table of the frequency of occurrence of species
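| Entries per species | Number of species |
|---|---|
| 1x | 55767 |
| 2x | 22572 |
| 3x | 11680 |
| 4x | 6456 |
| 5x | 3574 |
| 6x | 2775 |
| 7x | 1989 |
| 8x | 1662 |
| 9x | 1274 |
| 10x | 1259 |
| 11-20x | 6025 |
| 21-50x | 2493 |
| 51-100x | 1023 |
| >100x | 1449 |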
3.2 Table of the most represented species
------ --------- --------------------------------------------
Number Frequency Species
------ --------- --------------------------------------------
1 162793 Human immunodeficiency virus 1
2 71887 Oryza sativa (japonica cultivar-group)
3 55035 Homo sapiens (Human)
4 47627 Mus musculus (Mouse)
5 44945 Arabidopsis thaliana (Mouse-ear cress)
6 32207 Hepatitis C virus
7 28028 Tetraodon nigroviridis (Green puffer)
8 27313 Tetrahymena thermophila SB210
9 24948 Drosophila melanogaster (Fruit fly)
10 20246 Caenorhabditis elegans
11 20134 Trypanosoma cruzi
12 17387 Medicago truncatula (Barrel medic)
13 16934 Brachydanio rerio (Zebrafish) (Danio rerio)
14 16817 Aedes aegypti (Yellowfever mosquito)
15 16450 Phaeosphaeria nodorum SN15
16 15078 Anopheles gambiae str. PEST
17 14942 uncultured bacterium
18 14666 Plasmodium chabaudi
19 13103 Caenorhabditis briggsae
20 13090 Dictyostelium discoideum AX4
21 12866 Hepatitis B virus (HBV)
22 12285 Xenopus laevis (African clawed frog)
23 12042 Aspergillus oryzae
24 11773 Plasmodium berghei
25 11656 Gibberella zeae (Fusarium graminearum)
26 11001 Chaetomium globosum CBS 148.51
27 10779 Neurospora crassa
28 10404 Aspergillus terreus NIH2624
29 10299 Coccidioides immitis RS
30 10060 Drosophila pseudoobscura (Fruit fly)
31 10030 Aspergillus fumigatus (Sartorya fumigata)
32 9704 Schistosoma japonicum (Blood fluke)
33 9671 Emericella nidulans (Aspergillus nidulans)
34 9449 Trypanosoma brucei
35 9386 Candida albicans (Yeast)
36 9325 Rattus norvegicus (Rat)
37 9089 Entamoeba histolytica HM-1:IMSS
38 9042 Rhodococcus sp. (strain RHA1)
39 9000 Escherichia coli
40 8513 Burkholderia xenovorans (strain LB400)
41 8512 Stigmatella aurantiaca DW4/3-1
42 8217 Bos taurus (Bovine)
43 8109 Bradyrhizobium japonicum
44 8063 Solibacter usitatus Ellin6076
45 7937 Frankia sp. EAN1pec
46 7809 Plasmodium yoelii yoelii
47 7663 Burkholderia vietnamiensis G4
48 7533 Streptomyces coelicolor
49 7509 Burkholderia sp. (strain 383) (Burkholderia cepacia)
50 7432 Bradyrhizobium sp. BTAi1
51 7314 Streptomyces avermitilis
52 7262 Myxococcus xanthus (strain DK 1622)
53 7152 Rhizobium loti (Mesorhizobium loti)
54 7106 Leishmania major
55 7062 Rhizobium leguminosarum bv. viciae (strain 3841)
56 7049 Burkholderia cenocepacia HI2424
57 6963 Rhodopirellula baltica
58 6951 Agrobacterium tumefaciens (strain C58 / ATCC 33970)
59 6776 Pseudomonas aeruginosa
60 6711 Frankia alni ACN14a
61 6679 Psychroflexus torquis ATCC 700755
62 6629 Hahella chejuensis (strain KCTC 2396)
63 6607 Burkholderia cepacia AMMD
64 6545 Ustilago maydis (Smut fungus)
65 6419 Cryptococcus neoformans (Filobasidiella neoformans)
66 6394 Giardia lamblia ATCC 50803
67 6393 Burkholderia cenocepacia (strain AU 1054)
68 6383 Cryptococcus neoformans var. neoformans B-3501A
69 6337 Sinorhizobium medicae WSM419
70 6280 Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
71 6225 Ralstonia eutropha (strain JMP134) (Alcaligenes eutrophus)
72 6219 Ralstonia metallidurans (strain CH34 / ATCC 43123 / DSM 2839)
73 6217 Yarrowia lipolytica (Candida lipolytica)
74 6204 Bacillus anthracis
75 6201 Ralstonia eutropha H16
76 6153 Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz)
77 6150 Burkholderia pseudomallei (strain 1710b)
78 6129 Bacillus thuringiensis serovar israelensis ATCC 35646
79 6025 Plasmodium falciparum
80 5989 Debaryomyces hansenii (Yeast) (Torulaspora hansenii)
81 5979 Mycobacterium vanbaalenii PYR-1
82 5936 Yersinia pestis
83 5904 Bacillus cereus G9241
84 5896 Rhizobium meliloti (Sinorhizobium meliloti)
85 5881 Pseudomonas fluorescens (strain Pf-5 / ATCC BAA-477)
86 5852 Mycobacterium sp. KMS
87 5811 Rhizobium etli (strain CFN 42 / ATCC 51251)
88 5696 Crocosphaera watsonii
89 5689 Bacillus sp. NRRL B-14911
90 5687 Mycobacterium sp. JLS
91 5665 Nocardia farcinica
92 5599 Burkholderia pseudomallei (Pseudomonas pseudomallei)
93 5590 Mycobacterium sp. (strain MCS)
94 5589 Helicobacter pylori (Campylobacter pylori)
95 5553 Gallus gallus (Chicken)
96 5538 Photobacterium profundum 3TCK
97 5534 Anabaena sp. (strain PCC 7120)
98 5523 Bacillus weihenstephanensis KBAB4
99 5516 Pseudomonas fluorescens (strain PfO-1)
100 5513 Mycobacterium flavescens PYR-GCK
3.3 Taxonomic distribution of the sequences
| Kingdom | Sequences | % of the database |
|---|---|---|
| Archaea | 74858 | 2% |
| Bacteria | 1612809 | 49% |
| Eukaryota | 1184862 | 36% |
| Viruses | 437391 | 13% |
| Other | 3342 | <1% |
Within Eukaryota:
| Category | Sequences | % of Eukaryota | % of the complete database |
|---|---|---|---|
| Human | 55035 | 5% | 2% |
| Other Mammalia | 119094 | 10% | 4% |
| Other Vertebrata | 157328 | 13% | 5% |
| Viridiplantae | 259197 | 22% | 8% |
| Fungi | 187904 | 16% | 6% |
| Insecta | 134424 | 11% | 4% |
| Nematoda | 36759 | 3% | 1% |
| Other | 235121 | 20% | 7% |
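The percentage columns in section 3.3 are ratios of the listed counts, rounded to whole percents. A minimal Python sketch reproducing them, assuming that the five kingdom rows together cover the whole database (their sum is 3'313'262 sequences) and that shares below 1% are printed as "<1%"; the names `KINGDOMS`, `EUKARYOTA` and `percent_label` are illustrative only:

```python
# Kingdom counts copied from section 3.3 of these release notes.
KINGDOMS = {
    "Archaea": 74858,
    "Bacteria": 1612809,
    "Eukaryota": 1184862,
    "Viruses": 437391,
    "Other": 3342,
}

# Eukaryotic subcategories, also copied from section 3.3.
EUKARYOTA = {
    "Human": 55035,
    "Other Mammalia": 119094,
    "Other Vertebrata": 157328,
    "Viridiplantae": 259197,
    "Fungi": 187904,
    "Insecta": 134424,
    "Nematoda": 36759,
    "Other": 235121,
}


def percent_label(count: int, total: int) -> str:
    """Format count/total as a whole-number percentage, like the tables above."""
    share = 100 * count / total
    if share < 1:
        return "<1%"
    # None of the shares here falls exactly on .5, so Python's
    # round-half-to-even behaviour does not affect the result.
    return f"{round(share)}%"


def main() -> None:
    total = sum(KINGDOMS.values())  # assumed to equal the database size
    eukaryota = KINGDOMS["Eukaryota"]
    print(f"Total sequences: {total}")
    for name, count in KINGDOMS.items():
        print(f"{name:<12}{count:>9} ({percent_label(count, total)})")
    for name, count in EUKARYOTA.items():
        print(f"{name:<18}{count:>8} ({percent_label(count, eukaryota)})"
              f" ({percent_label(count, total)})")


if __name__ == "__main__":
    main()
```

The eukaryotic subcategories also sum exactly to the Eukaryota row, which is why both percentage columns can be derived from the same counts.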
4. SEQUENCE SIZE
Distribution of the sequences by length (excluding fragments)
| Length (amino acids) | Number of sequences |
|---|---|
| 1-50 | 42142 |
| 51-100 | 217858 |
| 101-150 | 273464 |
| 151-200 | 258617 |
| 201-250 | 259851 |
| 251-300 | 246860 |
| 301-350 | 231466 |
| 351-400 | 183758 |
| 401-450 | 148439 |
| 451-500 | 127714 |
| 501-550 | 93656 |
| 551-600 | 68771 |
| 601-650 | 51687 |
| 651-700 | 40198 |
| 701-750 | 35649 |
| 751-800 | 31903 |
| 801-850 | 23601 |
| 851-900 | 20923 |
| 901-950 | 15229 |
| 951-1000 | 11919 |
| 1001-1100 | 19892 |
| 1101-1200 | 14274 |
| 1201-1300 | 10158 |
| 1301-1400 | 6712 |
| 1401-1500 | 5505 |
| 1501-1600 | 3963 |
| 1601-1700 | 3126 |
| 1701-1800 | 2686 |
| 1801-1900 | 1991 |
| 1901-2000 | 1673 |
| 2001-2100 | 1295 |
| 2101-2200 | 1321 |
| 2201-2300 | 1094 |
| 2301-2400 | 887 |
| 2401-2500 | 672 |
| >2500 | 6094 |
The average sequence length in UniProtKB/TrEMBL is 323 amino acids.
The shortest sequence is Q96AT0_HUMAN: 4 amino acids.
The longest sequence is Q3ASY8_CHLCH: 36805 amino acids.
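The length classes in section 4 are 50-residue bins up to 1000 amino acids, 100-residue bins from 1001 to 2500, and one open-ended class above 2500. A minimal Python sketch of that binning, assuming inclusive bounds exactly as printed in the table and a plain list of lengths from which fragments have already been removed; `size_class` and `size_distribution` are hypothetical helpers, not part of any UniProt tool:

```python
from collections import Counter


def size_class(length: int) -> str:
    """Return the length-class label used in the section 4 table."""
    if length < 1:
        raise ValueError("sequence length must be positive")
    if length <= 1000:
        # 50-residue bins: 1-50, 51-100, ..., 951-1000
        upper = -(-length // 50) * 50  # round up to a multiple of 50
        return f"{upper - 49}-{upper}"
    if length <= 2500:
        # 100-residue bins: 1001-1100, ..., 2401-2500
        upper = -(-length // 100) * 100  # round up to a multiple of 100
        return f"{upper - 99}-{upper}"
    return ">2500"


def size_distribution(lengths):
    """Count sequences per length class and return the mean length."""
    counts = Counter(size_class(n) for n in lengths)
    mean = sum(lengths) / len(lengths) if lengths else 0.0
    return counts, mean


if __name__ == "__main__":
    # Toy input: the shortest and longest lengths quoted above plus a few others.
    counts, mean = size_distribution([4, 50, 51, 323, 1001, 36805])
    for label, n in counts.items():
        print(label, n)
    print(f"mean length: {mean:.1f}")
```

Feeding it the lengths of all non-fragment entries would give counts comparable to the table, although whether the quoted average of 323 amino acids includes fragments is not stated here.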
5. STATISTICS FOR SOME LINE TYPES
The following table gives, for selected UniProtKB/TrEMBL line types and subtypes, the total number of lines, the number of entries with at least one such line, and the average number of lines per entry.
| Line type / subtype | Total number | Number of entries | Average per entry |
|---|---|---|---|
| References (RL), total | 4959113 | | 1.50 |
| Submitted to EMBL/GenBank/DDBJ | 2564765 | 1763042 | 0.77 |
| Journal | 2340853 | 1911015 | 0.71 |
| Thesis | 5927 | 5875 | <0.01 |
| Book citation | 4222 | 4177 | <0.01 |
| Submitted to other databases | 390 | 382 | <0.01 |
| Other | 42956 | 27184 | 0.01 |
| Comments (CC), total | 1594288 | | 0.48 |
| CAUTION | 660733 | 660733 | 0.20 |
| SIMILARITY | 339625 | 332287 | 0.10 |
| SUBCELLULAR LOCATION | 146413 | 146413 | 0.04 |
| FUNCTION | 143112 | 137402 | 0.04 |
| CATALYTIC ACTIVITY | 111155 | 106722 | 0.03 |
| SUBUNIT | 81452 | 81452 | 0.02 |
| COFACTOR | 69193 | 68827 | 0.02 |
| PATHWAY | 28469 | 24415 | 0.01 |
| DOMAIN | 7826 | 7061 | <0.01 |
| MISCELLANEOUS | 3690 | 3690 | <0.01 |
| INTERACTION | 2586 | 2586 | <0.01 |
| MASS SPECTROMETRY | 28 | 20 | <0.01 |
| ALLERGEN | 6 | 6 | <0.01 |
| Features (FT), total | 1584760 | | 0.48 |
| NON_TER | 1415744 | 846140 | 0.43 |
| SIGNAL | 117681 | 113596 | 0.04 |
| CHAIN | 50795 | 29813 | 0.02 |
| TRANSIT | 540 | 536 | <0.01 |
| Cross-references (DR), total | 26048671 | | 7.86 |
| GO | 6905142 | 1966760 | 2.08 |
| InterPro | 4978587 | 2263523 | 1.50 |
| EMBL | 3795886 | 3304821 | 1.15 |
| Pfam | 2836849 | 2111526 | 0.86 |
| PROSITE | 1568909 | 1014438 | 0.47 |
| KEGG | 886563 | 848900 | 0.27 |
| GenomeReviews | 847386 | 805667 | 0.26 |
| PRINTS | 640221 | 533328 | 0.19 |
| SMART | 543869 | 423745 | 0.16 |
| TIGRFAMs | 404484 | 373646 | 0.12 |
| SMR | 383447 | 383385 | 0.12 |
| ProDom | 370442 | 352631 | 0.11 |
| BioCyc | 286378 | 271096 | 0.09 |
| HSSP | 275921 | 275518 | 0.08 |
| PANTHER | 249322 | 246987 | 0.08 |
| PIR | 190563 | 155148 | 0.06 |
| TIGR | 136495 | 130204 | 0.04 |
| UniGene | 111140 | 106824 | 0.03 |
| Ensembl | 99717 | 99715 | 0.03 |
| ArrayExpress | 91421 | 91404 | 0.03 |
| RZPD-ProtExp | 81191 | 32808 | 0.02 |
| PIRSF | 80345 | 79566 | 0.02 |
| Gramene | 71161 | 71161 | 0.02 |
| MGI | 44511 | 43786 | 0.01 |
| FlyBase | 25700 | 25663 | 0.01 |
| TAIR | 19951 | 19890 | 0.01 |
| WormPep | 19324 | 19239 | 0.01 |
| WormBase | 19271 | 19188 | 0.01 |
| LinkHub | 14660 | 14660 | <0.01 |
| MEROPS | 12421 | 11979 | <0.01 |
| ZFIN | 12302 | 12300 | <0.01 |
| LegioList | 5403 | 5373 | <0.01 |
| IntAct | 5209 | 5209 | <0.01 |
| ListiList | 4744 | 4727 | <0.01 |
| AGD | 4141 | 4141 | <0.01 |
| PDB | 4137 | 2465 | <0.01 |
| PhotoList | 4112 | 3988 | <0.01 |
| HGNC | 3152 | 3152 | <0.01 |
| TubercuList | 2551 | 2545 | <0.01 |
| DictyBase | 1967 | 1967 | <0.01 |
| RGD | 1902 | 1896 | <0.01 |
| GeneDB_Spombe | 1872 | 1859 | <0.01 |
| SagaList | 1762 | 1668 | <0.01 |
| Leproma | 974 | 973 | <0.01 |
| TRANSFAC | 897 | 886 | <0.01 |
| SGD | 688 | 671 | <0.01 |
| PeroxiBase | 633 | 627 | <0.01 |
| MypuList | 593 | 589 | <0.01 |
| REBASE | 124 | 119 | <0.01 |
| PHCI-2DPAGE | 106 | 106 | <0.01 |
| ANU-2DPAGE | 64 | 64 | <0.01 |
| SWISS-2DPAGE | 48 | 48 | <0.01 |
| Reactome | 7 | 7 | <0.01 |
| PMMA-2DPAGE | 3 | 3 | <0.01 |
| Siena-2DPAGE | 2 | 2 | <0.01 |
| COMPLUYEAST-2DPAGE | 1 | 1 | <0.01 |
Number of explicitly cross-referenced databases: 78
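Each row of the line-type table combines three figures: the total number of lines, the number of entries with at least one such line, and the total divided by the number of entries in the database. The averages are consistent with a denominator of 3'313'262 entries (the sum of the kingdom counts in section 3.3); for example, 4'959'113 RL lines over 3'313'262 entries gives 1.50 per entry. A minimal Python sketch of how such figures could be tallied for DR lines, assuming the usual flatfile layout (a two-letter line code followed by three spaces, the database name before the first `;` on a DR line, and `//` closing each entry). The function and the sample entries are hypothetical placeholders, not the pipeline that produced these statistics:

```python
from collections import Counter


def dr_statistics(flatfile_text: str):
    """Tally DR (cross-reference) lines per database in flatfile text.

    Returns (number_of_entries, totals, entries_with): totals[db] counts every
    DR line for db, entries_with[db] counts entries with at least one such line.
    An entry is only counted once its closing '//' line has been read.
    """
    totals: Counter = Counter()
    entries_with: Counter = Counter()
    seen_in_entry: set = set()
    n_entries = 0
    for line in flatfile_text.splitlines():
        if line.startswith("//"):
            # End of an entry: credit each database at most once for it.
            entries_with.update(seen_in_entry)
            seen_in_entry = set()
            n_entries += 1
        elif line.startswith("DR   "):
            database = line[5:].split(";", 1)[0].strip()
            totals[database] += 1
            seen_in_entry.add(database)
    return n_entries, totals, entries_with


# Two placeholder entries; all identifiers are dummies and other line types are omitted.
SAMPLE = """DR   EMBL; AB000000; BAA00000.1; -; mRNA.
DR   EMBL; AB000001; BAA00001.1; -; mRNA.
DR   InterPro; IPR000000; Placeholder.
//
DR   GO; GO:0005737; C:cytoplasm; IEA:InterPro.
//
"""

if __name__ == "__main__":
    n, totals, entries_with = dr_statistics(SAMPLE)
    for db in sorted(totals):
        # Columns: database, total lines, entries with >= 1 line, average per entry
        print(db, totals[db], entries_with[db], f"{totals[db] / n:.2f}")
```

The same pattern extends to RL, CC or FT lines by changing the line-code test and the key extracted from each line.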
6. MISCELLANEOUS STATISTICS
Total number of distinct authors cited in UniProtKB/TrEMBL: 234955
Total number of entries encoded on a Mitochondrion: 144724
Total number of entries encoded on a Plasmid: 55874
Total number of entries encoded on a Plastid: 3169
Total number of entries encoded on a Plastid; Apicoplast: 179
Total number of entries encoded on a Plastid; Chloroplast: 51775
Total number of entries encoded on a Plastid; Cyanelle: 7
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 166
Number of fragments: 848216
| Submissions and Updates |
|---|
We welcome feedback from our users. We would especially appreciate your notifying us if you find that sequences from your field of expertise are missing from the database. We would also like to be notified of annotations that need updating, for example if the function of a protein has been clarified or if new information about post-translational modifications has become available.
Submit new sequence data, updates and corrections at http://www.uniprot.org/support/submissions.shtml
For all queries regarding submissions to UniProtKB and to submit new protein sequence data, please contact:
UniProt Knowledgebase
The EMBL Outstation - The European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom
Telephone: (+44 1223) 494 462
Telefax: (+44 1223) 494 468
E-mail: datasubs@ebi.ac.uk
| Download information |
|---|
The latest UniProt Knowledgebase data is available in several formats (flatfile, XML or FASTA) at http://www.uniprot.org/database/download.shtml. The data is further supplemented by a file containing the sequences of all additional alternative isoforms annotated in UniProtKB/Swiss-Prot. This data set is documented in the file ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/README.varsplic
For users who wish to download the UniProt Knowledgebase only occasionally, we distribute the latest major release (updated 3 times per year) in flatfile format. Previous UniProtKB/Swiss-Prot and UniProtKB/TrEMBL releases are archived under ftp://ftp.uniprot.org/pub/databases/uniprot/previous_major_releases. The UniProt Knowledgebase major release is also available on CD-ROM from the EBI.
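The FASTA distribution is a convenient starting point for length statistics such as those in section 4. A minimal Python sketch that streams a FASTA file and yields one (header, length) pair per record, assuming each record is a `>` header line followed by one or more sequence lines; gzip-compressed downloads are opened with the standard `gzip` module. The helper names are illustrative, and the demo uses placeholder records rather than real UniProt entries:

```python
import gzip
import io
from typing import Iterator, TextIO, Tuple


def fasta_lengths(handle: TextIO) -> Iterator[Tuple[str, int]]:
    """Yield (header, sequence length) for each record in a FASTA stream."""
    header = None
    length = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, length
            header, length = line[1:], 0
        else:
            length += len(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, length


def open_fasta(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTA file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


if __name__ == "__main__":
    # Placeholder records; a real run would use open_fasta() on a downloaded file.
    demo = io.StringIO(">demo1\nMKT\nAAV\n>demo2\nMA\n")
    for header, n in fasta_lengths(demo):
        print(header, n)
```

For a downloaded release, something like `open_fasta("uniprot_trembl.fasta.gz")` (file name assumed) can be passed to `fasta_lengths`, and the resulting lengths fed into any binning or averaging step; streaming avoids holding millions of sequences in memory.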
| Contact |
|---|
| Citation |
|---|
If you want to cite UniProt in a publication, please use the following reference:
Wu C.H., Apweiler R., Bairoch A., Natale D.A., Barker W.C., Boeckmann B., Ferro S., Gasteiger E., Huang H., Lopez R., Magrane M., Martin M.J., Mazumder R., O'Donovan C., Redaschi N., Suzek B. The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res. 34: D187-D191 (2006).