UniProt Knowledgebase
Swiss-Prot Protein Knowledgebase
TrEMBL Protein Database

Release notes
UniProtKB release 12.0 of 24-Jul-2007

Content

  Introduction
  UniProtKB/Swiss-Prot Protein Knowledgebase release statistics
  UniProtKB/TrEMBL Protein Database release statistics

  Submissions and Updates
  Download information
  Contact
  Citation

  Related documents: UniProtKB user manual, Recent changes, Forthcoming changes.

Introduction

Release 12.0 of the UniProt Knowledgebase is composed of the UniProtKB/Swiss-Prot Protein Knowledgebase release 54.0 and the UniProtKB/TrEMBL Protein Database release 37.0.

More information on these databases can be found in the user manual, under "What is the UniProt Knowledgebase?".


UniProtKB/Swiss-Prot Protein Knowledgebase release 54.0 statistics

Release 54.0 of 24-Jul-07 of UniProtKB/Swiss-Prot contains 276'256 sequence entries, comprising 101'466'206 amino acids abstracted from 158'294 references.

The growth of the database is summarized below.

Release  Date (MM/YY)  Number of entries  Number of amino acids
2.0 09/86 3'939 900'163
3.0 11/86 4'160 969'641
4.0 04/87 4'387 1'036'010
5.0 09/87 5'205 1'327'683
6.0 01/88 6'102 1'653'982
7.0 04/88 6'821 1'885'771
8.0 08/88 7'724 2'224'465
9.0 11/88 8'702 2'498'140
10.0 03/89 10'008 2'952'613
11.0 07/89 10'856 3'265'966
12.0 10/89 12'305 3'797'482
13.0 01/90 13'837 4'347'336
14.0 04/90 15'409 4'914'264
15.0 08/90 16'941 5'486'399
16.0 11/90 18'364 5'986'949
17.0 02/91 20'024 6'524'504
18.0 05/91 20'772 6'792'034
19.0 08/91 21'795 7'173'785
20.0 11/91 22'654 7'500'130
21.0 03/92 23'742 7'866'596
22.0 05/92 25'044 8'375'696
23.0 08/92 26'706 9'011'391
24.0 12/92 28'154 9'545'427
25.0 04/93 29'955 10'214'020
26.0 07/93 31'808 10'875'091
27.0 10/93 33'329 11'484'420
28.0 02/94 36'000 12'496'420
29.0 06/94 38'303 13'464'008
30.0 10/94 40'292 14'147'368
31.0 02/95 43'470 15'335'248
32.0 11/95 49'340 17'385'503
33.0 02/96 52'205 18'531'384
34.0 10/96 59'021 21'210'389
35.0 11/97 69'113 25'083'768
36.0 07/98 74'019 26'840'295
37.0 12/98 77'977 28'268'293
38.0 07/99 80'000 29'085'965
39.0 05/00 86'593 31'411'114
40.0 10/01 101'602 37'315'215
41.0 02/03 122'564 44'986'459
42.0 10/03 135'850 50'046'799
43.0 03/04 146'720 54'093'154
44.0 07/04 153'871 56'608'159
45.0 10/04 163'235 59'631'787
46.0 02/05 168'297 61'443'278
47.0 05/05 181'577 65'746'672
48.0 09/05 194'317 70'391'852
49.0 02/06 207'132 75'438'310
50.0 05/06 222'289 81'585'146
51.0 10/06 241'242 88'541'632
52.0 03/07 261'513 95'638'062
53.0 05/07 269'293 98'902'758
54.0 07/07 276'256 101'466'206
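
The apostrophe-grouped figures in the table above can be read programmatically. The following is a minimal Python sketch, not part of any official release tooling; the function names `parse_count` and `parse_row` are hypothetical, and only the last three rows of the table are reproduced.

```python
def parse_count(text: str) -> int:
    """Convert an apostrophe-grouped figure such as 276'256 to an int."""
    return int(text.replace("'", ""))


def parse_row(line: str) -> tuple[str, str, int, int]:
    """Split one table row into (release, date, entries, amino_acids)."""
    release, date, entries, residues = line.split()
    return release, date, parse_count(entries), parse_count(residues)


# Three rows copied from the growth table above.
rows = [
    "52.0 03/07 261'513 95'638'062",
    "53.0 05/07 269'293 98'902'758",
    "54.0 07/07 276'256 101'466'206",
]
parsed = [parse_row(r) for r in rows]

# Net change in the number of entries between consecutive releases.
for (_, _, prev, _), (rel, date, cur, _) in zip(parsed, parsed[1:]):
    print(f"release {rel} ({date}): {cur - prev:+d} entries")
```

The release and date columns are kept as strings so that values such as "07/07" are not misread as numbers.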

In rare cases, UniProtKB/Swiss-Prot entries are removed. Deleted entries are almost exclusively Open Reading Frames (ORFs) that were wrongly predicted to code for proteins. When there is enough evidence that these hypothetical proteins are not real, we remove them from UniProtKB/Swiss-Prot. The document delac_sp.txt lists all accession numbers that were previously present in UniProtKB/Swiss-Prot but have since been deleted from the database.


Status of the model organisms

We have selected a number of organisms that are the target of genome sequencing and/or mapping projects and for which we intend to:

Our efforts to annotate human sequence entries as completely as possible gave rise to the HPI project, and the bacterial model organisms became the focus of the HAMAP project. The current status of the model organisms not covered by these two projects is as follows:

Organism        Database cross-references  Index file    Number of sequences
A.thaliana      TAIR                       arath.txt                   5'935
C.albicans      None yet                   calbican.txt                  638
C.elegans       Wormpep                    celegans.txt                3'040
D.discoideum    DictyBase                  dicty.txt                     352
D.melanogaster  FlyBase                    fly.txt                     2'567
M.musculus      MGD                        mgdtosp.txt                13'561
S.cerevisiae    SGD                        yeast.txt                   6'162
S.pombe         GeneDB_SPombe              pombe.txt                   3'229

UniProtKB/Swiss-Prot release statistics
1.  INTRODUCTION

Release 54.0 of 24-Jul-07 of UniProtKB/Swiss-Prot contains 276256 sequence entries,
comprising 101466206 amino acids abstracted from 158294 references. 

7104 sequences have been added since release 53.0, the sequence data of
690 existing entries has been updated and the annotations of
269152 entries have been revised.
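
The figure of 7104 added sequences is larger than the net growth implied by the release totals (release 53.0 held 269'293 entries according to the growth table). The short sketch below works through the arithmetic. Attributing the difference to entries that were deleted or merged is an assumption; these notes do not state it.

```python
entries_53 = 269_293   # release 53.0 total, from the growth table
entries_54 = 276_256   # release 54.0 total
added = 7_104          # sequences added since release 53.0
revised = 269_152      # entries whose annotations were revised

net_growth = entries_54 - entries_53
# Assumption: the gap consists of entries that left the database
# (deleted, or merged into other entries).
unaccounted = added - net_growth

print(net_growth, unaccounted)
# The added and revised entries together cover the whole release.
print(added + revised == entries_54)
```

Under that assumption, every entry carried over from release 53.0 had its annotation revised, since the added and revised counts sum to the release total.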


2.  AMINO ACID COMPOSITION

   2.1  Composition in percent for the complete database

   Ala (A) 7.86   Gln (Q) 3.97   Leu (L) 9.66   Ser (S) 6.88
   Arg (R) 5.43   Glu (E) 6.66   Lys (K) 5.92   Thr (T) 5.40
   Asn (N) 4.12   Gly (G) 6.94   Met (M) 2.39   Trp (W) 1.13
   Asp (D) 5.33   His (H) 2.29   Phe (F) 3.95   Tyr (Y) 3.01
   Cys (C) 1.50   Ile (I) 5.88   Pro (P) 4.85   Val (V) 6.72

   Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.00



   2.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Ser, Val, Glu, Lys, Ile, Arg, Thr, Asp, Pro, Asn, Gln,
   Phe, Tyr, Met, His, Cys, Trp
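
The ordering in section 2.2 follows directly from the percentages in section 2.1. As a minimal sketch (the variable names are illustrative), sorting the composition table by value reproduces the list above:

```python
# Percent composition copied from section 2.1.
composition = {
    "Ala": 7.86, "Gln": 3.97, "Leu": 9.66, "Ser": 6.88,
    "Arg": 5.43, "Glu": 6.66, "Lys": 5.92, "Thr": 5.40,
    "Asn": 4.12, "Gly": 6.94, "Met": 2.39, "Trp": 1.13,
    "Asp": 5.33, "His": 2.29, "Phe": 3.95, "Tyr": 3.01,
    "Cys": 1.50, "Ile": 5.88, "Pro": 4.85, "Val": 6.72,
}

# Most frequent residue first; no two residues share a value.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))

# The twenty standard residues account for nearly all of the database;
# the remainder is rounding plus the ambiguous codes B, Z and X.
total_percent = sum(composition.values())
print(f"{total_percent:.2f}")
```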


3.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/Swiss-Prot: 10989

   The first twenty species represent 85570 sequences, or 31% of the total
   number of entries.
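
The two figures above can be re-derived from the first twenty rows of table 3.2 and the release total of 276256 entries. A short sketch, with illustrative variable names:

```python
# Frequencies of the twenty most represented species (table 3.2).
top20 = [16890, 13561, 6229, 6162, 5935, 4930, 4179, 3229, 3040, 2856,
         2567, 2111, 1893, 1887, 1803, 1782, 1774, 1647, 1560, 1535]
total_entries = 276_256

top20_sequences = sum(top20)
share = 100 * top20_sequences / total_entries
print(top20_sequences, f"{share:.1f}%")
```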


   3.1 Table of the frequency of occurrence of species

        Species represented 1x: 5195
                            2x: 1653
                            3x:  802
                            4x:  529
                            5x:  365
                            6x:  321
                            7x:  222
                            8x:  193
                            9x:  162
                           10x:   94
                       11- 20x:  498
                       21- 50x:  356
                       51-100x:  184
                         >100x:  415


   3.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1      16890  Homo sapiens (Human)
       2      13561  Mus musculus (Mouse)
       3       6229  Rattus norvegicus (Rat)
       4       6162  Saccharomyces cerevisiae (Baker's yeast)
       5       5935  Arabidopsis thaliana (Mouse-ear cress)
       6       4930  Escherichia coli
       7       4179  Bos taurus (Bovine)
       8       3229  Schizosaccharomyces pombe (Fission yeast)
       9       3040  Caenorhabditis elegans
      10       2856  Bacillus subtilis
      11       2567  Drosophila melanogaster (Fruit fly)
      12       2111  Xenopus laevis (African clawed frog)
      13       1893  Escherichia coli O157:H7
      14       1887  Pongo pygmaeus (Orangutan)
      15       1803  Gallus gallus (Chicken)
      16       1782  Methanococcus jannaschii
      17       1774  Haemophilus influenzae
      18       1647  Salmonella typhimurium
      19       1560  Escherichia coli O6
      20       1535  Shigella flexneri
      21       1442  Danio rerio (Zebrafish) (Brachydanio rerio)
      22       1419  Mycobacterium tuberculosis
      23       1282  Oryza sativa subsp. japonica (Rice)
      24       1242  Salmonella typhi
      25       1225  Sus scrofa (Pig)
      26       1225  Pseudomonas aeruginosa
      27       1160  Mycobacterium bovis
      28        978  Synechocystis sp. (strain PCC 6803)
      29        971  Archaeoglobus fulgidus
      30        920  Macaca fascicularis (Crab eating macaque) (Cynomolgus monkey)
      31        915  Yersinia pestis
      32        894  Vibrio cholerae
      33        884  Mimivirus
      34        883  Rhizobium meliloti (Sinorhizobium meliloti)
      35        838  Oryctolagus cuniculus (Rabbit)
      36        810  Staphylococcus aureus (strain Mu50 / ATCC 700699)
      37        808  Staphylococcus aureus (strain N315)
      38        784  Staphylococcus aureus (strain COL)
      39        783  Staphylococcus aureus (strain MW2)
      40        779  Staphylococcus aureus (strain MSSA476)
      41        773  Staphylococcus aureus (strain MRSA252)
      42        756  Aquifex aeolicus
      43        740  Vibrio parahaemolyticus
      44        740  Pasteurella multocida
      45        720  Canis familiaris (Dog)
      46        688  Streptomyces coelicolor
      47        687  Mycoplasma pneumoniae
      48        685  Vibrio vulnificus
      49        680  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      50        676  Bacillus halodurans
      51        665  Vibrio vulnificus (strain YJ016)
      52        660  Staphylococcus epidermidis (strain ATCC 35984 / RP62A)
      53        659  Staphylococcus epidermidis (strain ATCC 12228)
      54        646  Neurospora crassa
      55        644  Ashbya gossypii (Yeast) (Eremothecium gossypii)
      56        639  Pan troglodytes (Chimpanzee)
      57        638  Candida albicans (Yeast)
      58        633  Mycobacterium leprae
      59        632  Anabaena sp. (strain PCC 7120)
      60        628  Yersinia pseudotuberculosis
      61        622  Bacillus anthracis
      62        620  Pseudomonas syringae pv. tomato
      63        616  Pseudomonas putida (strain KT2440)
      64        612  Treponema pallidum
      65        611  Kluyveromyces lactis (Yeast) (Candida sphaerica)
      66        605  Photorhabdus luminescens subsp. laumondii
      67        593  Zea mays (Maize)
      68        592  Salmonella paratyphi-a
      69        589  Methanobacterium thermoautotrophicum
      70        585  Bradyrhizobium japonicum
      71        579  Rickettsia prowazekii
      72        575  Agrobacterium tumefaciens (strain C58 / ATCC 33970)
      73        574  Helicobacter pylori (Campylobacter pylori)
      74        573  Ralstonia solanacearum (Pseudomonas solanacearum)
      75        572  Buchnera aphidicola subsp. Acyrthosiphon pisum 
      76        568  Erwinia carotovora subsp. atroseptica (Pectobacterium atrosepticum)
      77        563  Candida glabrata (Yeast) (Torulopsis glabrata)
      78        562  Buchnera aphidicola subsp. Schizaphis graminum
      79        561  Rhizobium loti (Mesorhizobium loti)
      80        555  Lactococcus lactis subsp. lactis (Streptococcus lactis)
      81        555  Helicobacter pylori J99 (Campylobacter pylori J99)
      82        552  Listeria monocytogenes
      83        548  Bacillus cereus (strain ATCC 14579 / DSM 31)
      84        544  Listeria innocua
      85        542  Xanthomonas campestris pv. campestris
      86        541  Shewanella oneidensis
      87        531  Neisseria meningitidis serogroup A
      88        531  Neisseria meningitidis serogroup B
      89        519  Caulobacter crescentus (Caulobacter vibrioides)
      90        519  Clostridium acetobutylicum
      91        514  Brucella melitensis
      92        514  Brucella suis
      93        508  Xanthomonas axonopodis pv. citri
      94        508  Salmonella choleraesuis
      95        507  Buchnera aphidicola subsp. Baizongia pistaciae
      96        492  Streptococcus pneumoniae
      97        490  Thermotoga maritima
      98        487  Oceanobacillus iheyensis
      99        485  Rickettsia conorii
     100        484  Listeria monocytogenes serotype 4b (strain F2365)
     101        483  Mycoplasma genitalium
     102        481  Xylella fastidiosa
     103        475  Photobacterium profundum (Photobacterium sp. (strain SS9))
     104        472  Xylella fastidiosa (strain Temecula1 / ATCC 700964)
     105        468  Haemophilus ducreyi
     106        467  Deinococcus radiodurans
     107        460  Emericella nidulans (Aspergillus nidulans)
     108        458  Methanosarcina acetivorans
     109        457  Bordetella bronchiseptica (Alcaligenes bronchisepticus)
     110        455  Yarrowia lipolytica (Candida lipolytica)
     111        454  Corynebacterium glutamicum (Brevibacterium flavum)
     112        454  Clostridium perfringens
     113        451  Bacillus cereus (strain ATCC 10987)
     114        447  Pyrococcus horikoshii
     115        444  Bordetella parapertussis
     116        443  Bordetella pertussis
     117        442  Pyrococcus abyssi
     118        441  Halobacterium salinarium (Halobacterium halobium)
     119        440  Chromobacterium violaceum
     120        440  Rickettsia felis (Rickettsia azadi)
     121        437  Methanosarcina mazei (Methanosarcina frisia)
     122        435  Chlamydia trachomatis
     123        434  Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
     124        429  Debaryomyces hansenii (Yeast) (Torulaspora hansenii)
     125        427  Rickettsia bellii (strain RML369-C)
     126        425  Borrelia burgdorferi (Lyme disease spirochete)
     127        423  Lactobacillus plantarum
     128        422  Thermoanaerobacter tengcongensis
     129        422  Pyrococcus furiosus
     130        421  Nicotiana tabacum (Common tobacco)
     131        420  Synechococcus elongatus (Thermosynechococcus elongatus)
     132        418  Bacillus thuringiensis subsp. konkukian
     133        417  Streptococcus pyogenes serotype M6
     134        417  Ovis aries (Sheep)
     135        416  Chlamydia pneumoniae (Chlamydophila pneumoniae)
     136        414  Enterococcus faecalis (Streptococcus faecalis)
     137        413  Streptococcus mutans
     138        412  Campylobacter jejuni
     139        412  Streptomyces avermitilis
     140        411  Shigella sonnei (strain Ss046)
     141        406  Staphylococcus haemolyticus (strain JCSC1435)
     142        406  Chlamydia muridarum
     143        406  Rhizobium sp. (strain NGR234)
     144        397  Rickettsia typhi
     145        397  Staphylococcus saprophyticus subsp. saprophyticus 
     146        397  Streptococcus pyogenes serotype M1
     147        397  Sulfolobus solfataricus
     148        394  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
     149        393  Streptococcus pyogenes serotype M18
     150        391  Streptococcus pyogenes serotype M3
     151        387  Shigella boydii serotype 4 (strain Sb227)
     152        387  Bacillus cereus (strain ZK / E33L)
     153        385  Acinetobacter sp. (strain ADP1)
     154        384  Burkholderia pseudomallei (Pseudomonas pseudomallei)
     155        378  Shigella dysenteriae serotype 1 (strain Sd197)
     156        376  Rhodopseudomonas palustris
     157        374  Vibrio fischeri (strain ATCC 700601 / ES114)
     158        373  Chlorobium tepidum
     159        373  Nitrosomonas europaea
     160        371  Corynebacterium efficiens
     161        370  Bacillus clausii (strain KSM-K16)
     162        369  Pyrococcus kodakaraensis (Thermococcus kodakaraensis)
     163        368  Solanum tuberosum (Potato)
     164        362  Bacillus licheniformis (strain DSM 13 / ATCC 14580)
     165        359  Methanopyrus kandleri
     166        359  Mannheimia succiniciproducens (strain MBEL55E)
     167        358  Gloeobacter violaceus
     168        357  Burkholderia mallei (Pseudomonas mallei)
     169        352  Dictyostelium discoideum (Slime mold)
     170        352  Staphylococcus aureus (strain NCTC 8325)
     171        351  Leptospira interrogans
     172        350  Aeropyrum pernix
     173        348  Streptococcus agalactiae serotype III
     174        345  Streptococcus agalactiae serotype V
     175        344  Brucella abortus
     176        343  Synechococcus sp. (strain WH8102)
     177        343  Pisum sativum (Garden pea)
     178        341  Leptospira interrogans serogroup Icterohaemorrhagiae serovar copenhageni
     179        341  Aspergillus fumigatus (Sartorya fumigata)
     180        340  Methylococcus capsulatus
     181        339  Geobacillus kaustophilus
     182        337  Prochlorococcus marinus (strain MIT 9313)
     183        334  Sulfolobus tokodaii
     184        333  Prochlorococcus marinus
     185        333  Glycine max (Soybean)
     186        330  Pseudomonas fluorescens (strain Pf-5 / ATCC BAA-477)
     187        326  Mycobacterium paratuberculosis
     188        325  Staphylococcus aureus
     189        324  Staphylococcus aureus (strain bovine RF122)
     190        323  Idiomarina loihiensis
     191        321  Pseudomonas syringae pv. syringae (strain B728a)
     192        320  Macaca mulatta (Rhesus macaque)
     193        320  Rhodopirellula baltica
     194        317  Synechococcus sp. (strain ATCC 27144 / PCC 6301 / SAUG 1402/1) 
     195        317  Geobacter sulfurreducens
     196        317  Thermoplasma acidophilum
     197        315  Coxiella burnetii
     198        314  Prochlorococcus marinus subsp. pastoris (strain CCMP 1378 / MED4)
     199        313  Staphylococcus aureus (strain USA300)
     200        313  Fusobacterium nucleatum subsp. nucleatum
     201        311  Triticum aestivum (Wheat)
     202        310  Synechococcus sp. (strain PCC 7942) (Anacystis nidulans R2)
     203        304  Pseudomonas syringae pv. phaseolicola (strain 1448A / Race 6)
     204        302  Azoarcus sp. (strain EbN1) (Aromatoleum aromaticum (strain EbN1))
     205        300  Rhodobacter sphaeroides (strain ATCC 17023 / 2.4.1 / NCIB 8253 / DSM 158)
     206        300  Thermus thermophilus (strain HB8 / ATCC 27634 / DSM 579)
     207        297  Nocardia farcinica
     208        295  Zymomonas mobilis
     209        295  Wolinella succinogenes
     210        293  Desulfovibrio vulgaris (strain Hildenborough / ATCC 29579 / NCIMB 8303)
     211        293  Bacteroides thetaiotaomicron
     212        293  Pseudomonas fluorescens (strain PfO-1)
     213        291  Sulfolobus acidocaldarius
     214        290  Hordeum vulgare (Barley)
     215        290  Legionella pneumophila subsp. pneumophila 
     216        290  Xanthomonas oryzae pv. oryzae
     217        289  Silicibacter pomeroyi
     218        288  Clostridium tetani
     219        288  Haemophilus influenzae (strain 86-028NP)
     220        287  Legionella pneumophila (strain Paris)
     221        287  Symbiobacterium thermophilum
     222        287  Pseudomonas putida
     223        286  Neisseria gonorrhoeae (strain ATCC 700825 / FA 1090)
     224        286  Pyrobaculum aerophilum
     225        285  Legionella pneumophila (strain Lens)
     226        284  Cavia porcellus (Guinea pig)
     227        283  Burkholderia sp. (strain 383) (Burkholderia cepacia 
     228        283  Ralstonia eutropha (strain JMP134) (Alcaligenes eutrophus)
     229        282  Thermoplasma volcanium
     230        280  Corynebacterium diphtheriae
     231        277  Thermus thermophilus (strain HB27 / ATCC BAA-163 / DSM 7039)
     232        277  Brucella abortus (strain 2308)
     233        276  Escherichia coli (strain UTI89 / UPEC)
     234        272  Spinacia oleracea (Spinach)
     235        269  Gorilla gorilla gorilla (Lowland gorilla)
     236        268  Bacteriophage T4
     237        267  Colwellia psychrerythraea (strain 34H / ATCC BAA-681) (Vibrio psychroerythus)
     238        265  Equus caballus (Horse)
     239        265  Xanthomonas campestris pv. campestris (strain 8004)
     240        264  Methanococcus maripaludis
     241        264  Anabaena variabilis (strain ATCC 29413 / PCC 7937)
     242        262  Haloarcula marismortui (Halobacterium marismortui)
     243        262  Rhodobacter capsulatus (Rhodopseudomonas capsulata)
     244        262  Helicobacter hepaticus
     245        262  Bifidobacterium longum
     246        261  Escherichia coli O6:K15:H31 (strain 536 / UPEC)
     247        261  Wigglesworthia glossinidia brevipalpis
     248        260  Dechloromonas aromatica (strain RCB)
     249        259  Cryptococcus neoformans (Filobasidiella neoformans)
     250        258  Ureaplasma parvum (Ureaplasma urealyticum biotype 1)
     251        255  Leifsonia xyli subsp. xyli
     252        254  Vaccinia virus (strain Copenhagen) (VACV)
     253        254  Gluconobacter oxydans (Gluconobacter suboxydans)
     254        252  Porphyromonas gingivalis (Bacteroides gingivalis)
     255        252  Bartonella henselae (Rochalimaea henselae)
     256        249  Campylobacter jejuni (strain RM1221)
     257        249  Pseudoalteromonas haloplanktis (strain TAC 125)
     258        248  Bacteroides fragilis
     259        245  Burkholderia pseudomallei (strain 1710b)
     260        244  Chlamydophila caviae
     261        243  Desulfotalea psychrophila
     262        242  Blochmannia floridanus
     263        241  Xanthomonas campestris pv. vesicatoria (strain 85-10)
     264        240  Nitrosococcus oceani (strain ATCC 19707 / NCIMB 11848)
     265        240  Lactobacillus johnsonii
     266        239  Propionibacterium acnes
     267        239  Bartonella quintana (Rochalimaea quintana)
     268        238  Ustilago maydis (Smut fungus)
     269        236  Thiobacillus denitrificans (strain ATCC 25259)
     270        235  Bacillus stearothermophilus (Geobacillus stearothermophilus)
     271        228  Oryza sativa subsp. indica (Rice)
     272        225  Francisella tularensis subsp. tularensis
     273        225  Chlamydomonas reinhardtii
     274        224  Streptococcus thermophilus (strain ATCC BAA-250 / LMG 18311)
     275        223  Bdellovibrio bacteriovorus
     276        222  Streptococcus thermophilus (strain CNRZ 1066)
     277        221  Caenorhabditis briggsae
     278        219  Psychrobacter arcticum
     279        217  Porphyra purpurea
     280        217  Nitrobacter winogradskyi (strain Nb-255 / ATCC 25391)
     281        215  Hahella chejuensis (strain KCTC 2396)
     282        214  Burkholderia thailandensis (strain E264 / ATCC 700388 / DSM 13276 / CIP 106301)
     283        214  Pelobacter carbinolicus (strain DSM 2380 / Gra Bd 1)
     284        214  Rhodospirillum rubrum (strain ATCC 11170 / NCIB 8255)
     285        212  Klebsiella pneumoniae
     286        211  Felis silvestris catus (Cat)
     287        210  Cricetulus griseus (Chinese hamster)
     288        209  Gibberella zeae (Fusarium graminearum)
     289        209  Lactobacillus acidophilus
     290        209  Treponema denticola
     291        208  Porphyra yezoensis
     292        208  Xanthomonas oryzae pv. oryzae (strain MAFF 311018)
     293        207  Sodalis glossinidius (strain morsitans)
     294        207  Nitrosospira multiformis (strain ATCC 25196 / NCIMB 11849)
     295        206  Thiomicrospira crunogena (strain XCL-2)
     296        202  Mesocricetus auratus (Golden hamster)
     297        202  Geobacter metallireducens (strain GS-15 / ATCC 53774 / DSM 7210)
     298        200  Encephalitozoon cuniculi
     299        200  Vaccinia virus (strain Western Reserve / WR) (VACV)


   
   3.3  Taxonomic distribution of the sequences

   Kingdom        sequences (% of the database)
    Archaea           11165 (  4%)
    Bacteria         133655 ( 48%)
    Eukaryota        120226 ( 44%)
    Viruses           11210 (  4%)


   Within Eukaryota:

    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                  16891 ( 14%)           (  6%)
     Other Mammalia         37610 ( 31%)           ( 14%)
     Other Vertebrata       11423 ( 10%)           (  4%)
     Viridiplantae          20710 ( 17%)           (  7%)
     Fungi                  17864 ( 15%)           (  6%)
     Insecta                 4951 (  4%)           (  2%)
     Nematoda                3502 (  3%)           (  1%)
     Other                   7275 (  6%)           (  3%)
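
The percentages in section 3.3 are whole-number roundings of each count divided by the relevant total. The sketch below recomputes the kingdom-level figures and checks that the Eukaryota categories add up to the Eukaryota total; the dictionary names are illustrative.

```python
kingdoms = {
    "Archaea": 11165,
    "Bacteria": 133655,
    "Eukaryota": 120226,
    "Viruses": 11210,
}
eukaryota = {
    "Human": 16891, "Other Mammalia": 37610, "Other Vertebrata": 11423,
    "Viridiplantae": 20710, "Fungi": 17864, "Insecta": 4951,
    "Nematoda": 3502, "Other": 7275,
}

total = sum(kingdoms.values())
# Share of each kingdom in the complete database, rounded as in the table.
percent = {name: round(100 * n / total) for name, n in kingdoms.items()}
print(total, percent)

# The categories within Eukaryota partition the Eukaryota count.
print(sum(eukaryota.values()) == kingdoms["Eukaryota"])
```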


4.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50    5183             1001-1100     2299
                 51- 100   20141             1101-1200     1564
                101- 150   28762             1201-1300     1265
                151- 200   27299             1301-1400     1056
                201- 250   27917             1401-1500      866
                251- 300   24165             1501-1600      445
                301- 350   23861             1601-1700      345
                351- 400   22083             1701-1800      290
                401- 450   17289             1801-1900      270
                451- 500   14984             1901-2000      228
                501- 550   11221             2001-2100      137
                551- 600    7913             2101-2200      202
                601- 650    6653             2201-2300      189
                651- 700    4546             2301-2400      121
                701- 750    3828             2401-2500       94
                751- 800    3088             >2500          719
                801- 850    2560
                851- 900    2678
                901- 950    2046
                951-1000    1607


   The average sequence length in UniProtKB/Swiss-Prot is 367 amino acids.

   The shortest sequence is   GWA_SEPOF (P83570):     2 amino acids.
   The longest sequence is  TITIN_HUMAN (Q8WZ42): 34350 amino acids.
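
The average quoted above is consistent with dividing the total number of amino acids by the total number of entries given in the introduction. This is an assumption about how the figure is derived; the notes do not state the formula.

```python
total_amino_acids = 101_466_206   # from the release 54.0 summary
total_entries = 276_256

average_length = total_amino_acids / total_entries
print(f"{average_length:.2f}")

# Ratio between the longest sequence (TITIN_HUMAN, 34350 residues)
# and the average, to give a sense of the spread.
longest = 34_350
print(f"{longest / average_length:.0f}x the average")
```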


5.  JOURNAL CITATIONS

   Note: the following citation statistics reflect the number of distinct
         journal citations.

   Total number of journals cited in this release of UniProtKB/Swiss-Prot: 1833


   5.1 Table of the frequency of journal citations

        Journals cited 1x:  636
                       2x:  237
                       3x:  133
                       4x:   93
                       5x:   68
                       6x:   49
                       7x:   39
                       8x:   36
                       9x:   27
                      10x:   21
                  11- 20x:  142
                  21- 50x:  145
                  51-100x:   75
                    >100x:  132


   5.2  List of the most cited journals in UniProtKB/Swiss-Prot

   Nb    Citations   Journal name
   --    ---------   -------------------------------------------------------------
    1        15088   Journal of Biological Chemistry
    2         7172   Proceedings of the National Academy of Sciences of the U.S.A.
    3         4530   Journal of Bacteriology
    4         4262   Gene
    5         4095   Nucleic Acids Research
    6         3878   Biochemical and Biophysical Research Communications
    7         3582   FEBS Letters
    8         3333   Biochemistry
    9         3292   The EMBO Journal
   10         2946   European Journal of Biochemistry
   11         2839   Nature
   12         2837   Molecular and Cellular Biology
   13         2690   Biochimica et Biophysica Acta
   14         2556   Journal of Molecular Biology
   15         2318   Genomics
   16         2305   Cell
   17         1883   Biochemical Journal
   18         1781   Science
   19         1504   Molecular Microbiology
   20         1385   Journal of Virology
   21         1363   Plant Molecular Biology
   22         1318   Journal of Cell Biology
   23         1274   Molecular and General Genetics
   24         1137   Virology
   25         1117   Human Molecular Genetics
   26         1101   Nature Genetics
   27         1093   Genes and Development
   28         1084   Journal of Biochemistry
   29          990   Oncogene
   30          986   Plant Physiology
   31          969   The American Journal of Human Genetics
   32          838   Human Mutation
   33          833   Development
   34          805   Journal of Immunology
   35          778   Infection and Immunity
   36          774   Genetics
   37          747   Structure
   38          708   Molecular Biology of the Cell
   39          702   Yeast
   40          701   Archives of Biochemistry and Biophysics
   41          683   Journal of General Virology
   42          636   Microbiology
   43          606   Blood
   44          593   The Plant Cell
   45          571   FEMS Microbiology Letters
   46          548   Nature Structural Biology
   47          547   Molecular Cell
   48          525   Developmental Biology
   49          522   Journal of Cell Science
   50          517   Cancer Research
   51          516   Human Genetics
   52          494   Current Genetics
   53          485   The Plant Journal
   54          484   Mechanisms of Development
   55          449   Applied and Environmental Microbiology
   56          447   Current Biology
   57          437   Acta Crystallographica, Section D
   58          436   Protein Science
   59          435   Neuron
   60          432   Journal of Clinical Investigation
   61          426   Mammalian Genome
   62          409   Molecular and Biochemical Parasitology
   63          406   Journal of Neuroscience
   64          396   Molecular Endocrinology
   65          386   The Journal of Experimental Medicine
   66          372   Immunogenetics
   67          352   Journal of Neurochemistry
   68          350   Journal of Molecular Evolution
   69          343   Endocrinology
   70          342   DNA and Cell Biology
   71          334   Toxicon
   72          324   DNA Sequence
   73          314   The Journal of Clinical Endocrinology and Metabolism
   74          314   American Journal of Physiology
   75          303   Molecular Biology and Evolution
   76          293   Brain Research. Molecular Brain Research
   77          286   Biological Chemistry Hoppe-Seyler
   78          286   Bioscience, Biotechnology, and Biochemistry
   79          261   Journal of Medical Genetics
   80          252   Cytogenetics and Cell Genetics
   81          250   Comparative Biochemistry and Physiology
   82          248   Proteins
   83          242   Journal of General Microbiology
   84          231   Peptides
   85          222   Antimicrobial Agents and Chemotherapy
   86          222   Molecular Pharmacology
   87          217   Journal of Investigative Dermatology
   88          215   Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
   89          210   Biology of Reproduction
   90          208   Nature Cell Biology
   91          204   Plant and Cell Physiology
   92          199   Genome Research
   93          193   Virus Research
   94          188   Experimental Cell Research
   95          183   Molecular Plant-Microbe Interactions
   96          183   DNA Research
   97          178   European Journal of Immunology
   98          173   RNA
   99          171   Neurology
  100          167   Biochimie
  101          166   Developmental Dynamics
  102          160   Tissue Antigens
  103          158   DNA
  104          156   Molecular and Cellular Endocrinology
  105          151   European Journal of Human Genetics
  106          150   American Journal of Medical Genetics
  107          150   Molecular Phylogenetics and Evolution
  108          149   Hemoglobin
  109          147   Planta
  110          146   Annals of Neurology
  111          145   Bioorganicheskaia Khimiia
  112          145   Genes to Cells
  113          144   Archives of Microbiology
  114          140   Journal of Human Genetics
  115          137   Insect Biochemistry and Molecular Biology
  116          136   Immunity
  117          130   Developmental Cell
  118          127   Animal Genetics
  119          124   Molecular Reproduction and Development
  120          123   Diabetes
  121          119   General and Comparative Endocrinology
  122          118   Agricultural and Biological Chemistry
  123          118   Glycobiology
  124          117   The New England Journal of Medicine
  125          116   Molecular Immunology
  126          116   Investigative Ophthalmology and Visual Science
  127          109   Molecular and Cellular Neuroscience
  128          107   Eukaryotic Cell
  129          106   British Journal of Haematology
  130          106   Journal of Protein Chemistry
  131          102   International Journal of Cancer
  132          102   Archives of Virology


6.  STATISTICS FOR SOME LINE TYPES

The following table gives, for selected UniProtKB/Swiss-Prot line types, the
total number of lines, the number of entries with at least one such line, and
the average number of such lines per entry.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                     550759              1.99
   Journal                          478204    245229    1.73
   Submitted to EMBL/GenBank/DDBJ    67287     59130    0.24
   Submitted to other databases       2988      2775    0.01
   Unpublished observations            633       627   <0.01
   Book citation                       578       568   <0.01
   Plant Gene Register                 537       525   <0.01
   Thesis                              389       387   <0.01
   Patent                              137       135   <0.01
   Worm Breeder's Gazette                6         6   <0.01

Comments (CC)                      1128537              4.09
   SIMILARITY                       318972    253893    1.15
   FUNCTION                         195810    188699    0.71
   SUBCELLULAR LOCATION             152959    152959    0.55
   SUBUNIT                          105026    105026    0.38
   CATALYTIC ACTIVITY               102651     94032    0.37
   PATHWAY                           54987     46371    0.20
   COFACTOR                          41486     37238    0.15
   TISSUE SPECIFICITY                26128     26128    0.09
   PTM                               23627     19322    0.09
   MISCELLANEOUS                     22651     20422    0.08
   DOMAIN                            18132     15735    0.07
   ALTERNATIVE PRODUCTS              12699     12699    0.05
   INTERACTION                        8120      8120    0.03
   SEQUENCE CAUTION                   7633      7633    0.03
   INDUCTION                          7512      7512    0.03
   DEVELOPMENTAL STAGE                6587      6587    0.02
   ENZYME REGULATION                  4821      4821    0.02
   WEB RESOURCE                       4281      3472    0.02
   DISEASE                            3888      2742    0.01
   CAUTION                            3659      3580    0.01
   MASS SPECTROMETRY                  3008      2435    0.01
   BIOPHYSICOCHEMICAL PROPERTIES      1793      1793    0.01
   POLYMORPHISM                        611       594   <0.01
   RNA EDITING                         496       496   <0.01
   ALLERGEN                            422       422   <0.01
   TOXIC DOSE                          331       324   <0.01
   BIOTECHNOLOGY                       174       174   <0.01
   PHARMACEUTICAL                       73        73   <0.01

Features (FT)                      1860993              6.74
   CHAIN                            281223    272612    1.02
   TRANSMEM                         179620     39486    0.65
   METAL                            118481     29154    0.43
   CONFLICT                          95800     33149    0.35
   DOMAIN                            93021     52258    0.34
   STRAND                            92321      8670    0.33
   TOPO_DOM                          89201     18184    0.32
   HELIX                             88720      9107    0.32
   CARBOHYD                          79407     20018    0.29
   DISULFID                          77002     19621    0.28
   BINDING                           73356     25193    0.27
   MOD_RES                           72933     27627    0.26
   ACT_SITE                          66177     38366    0.24
   REPEAT                            60127      9182    0.22
   VARIANT                           48775     10403    0.18
   NP_BIND                           43766     31429    0.16
   REGION                            42789     22999    0.15
   VAR_SEQ                           27872     11982    0.10
   COMPBIAS                          27816     16202    0.10
   SIGNAL                            25883     25873    0.09
   TURN                              23765      7394    0.09
   MUTAGEN                           20666      5030    0.07
   ZN_FING                           20432      7801    0.07
   MOTIF                             19329     12719    0.07
   SITE                              17097      9484    0.06
   INIT_MET                          11353     11353    0.04
   COILED                            10903      7114    0.04
   NON_TER                           10693      8198    0.04
   PROPEP                             8294      6993    0.03
   LIPID                              8074      5202    0.03
   DNA_BIND                           7324      6770    0.03
   PEPTIDE                            6823      4279    0.02
   TRANSIT                            4813      4762    0.02
   CA_BIND                            2766      1151    0.01
   CROSSLNK                           2299      1592    0.01
   NON_CONS                           1308       536   <0.01
   UNSURE                              502       188   <0.01
   SE_CYS                              262       189   <0.01

Cross-references (DR)              3933954             14.24
   InterPro                         629121    253999    2.28
   EMBL                             523989    267709    1.90
   Pfam                             354417    247932    1.28
   GO                               351705    143924    1.27
   PROSITE                          261031    158819    0.94
   KEGG                             184994    167349    0.67
   GenomeReviews                    159422    142494    0.58
   HAMAP                            111335    111212    0.40
   TIGRFAMs                         108864    101865    0.39
   PIR                              107524     98237    0.39
   PRINTS                            96822     76679    0.35
   BioCyc                            94765     87112    0.34
   SMART                             83386     63657    0.30
   PANTHER                           83290     77583    0.30
   HSSP                              81747     81747    0.30
   ProDom                            74589     72198    0.27
   Gene3D                            73453     62434    0.27
   UniGene                           67015     61114    0.24
   Ensembl                           55170     55140    0.20
   GermOnline                        42027     41411    0.15
   PDB                               38787     10559    0.14
   ArrayExpress                      37729     37729    0.14
   SMR                               36791     36791    0.13
   RZPD-ProtExp                      28789     13542    0.10
   PIRSF                             26598     25658    0.10
   TIGR                              24567     23936    0.09
   LinkHub                           17927     17910    0.06
   HGNC                              16366     16288    0.06
   PharmGKB                          14775     14775    0.05
   IntAct                            14626     14626    0.05
   MGI                               13440     13393    0.05
   MIM                               13419     10760    0.05
   DIP                                8837      8787    0.03
   SGD                                6236      6148    0.02
   CYGD                               6224      6134    0.02
   RGD                                6110      6106    0.02
   TAIR                               6017      5906    0.02
   MEROPS                             5517      5204    0.02
   PeptideAtlas                       5026      5026    0.02
   EcoGene                            4311      4308    0.02
   EchoBASE                           4158      4126    0.02
   H-InvDB                            3677      3659    0.01
   WormPep                            3666      3037    0.01
   FlyBase                            3341      3213    0.01
   WormBase                           3320      3238    0.01
   Gramene                            3294      3294    0.01
   GeneDB_Spombe                      3262      3227    0.01
   TRANSFAC                           2895      2599    0.01
   SubtiList                          2797      2796    0.01
   Reactome                           2706      1546    0.01
   Orphanet                           2564      1641    0.01
   GeneFarm                           2021      2001    0.01
   DrugBank                           1826       502    0.01
   StyGene                            1599      1595    0.01
   HPA                                1486      1324    0.01
   TubercuList                        1447      1411    0.01
   ZFIN                               1415      1403    0.01
   SWISS-2DPAGE                       1183      1183   <0.01
   PseudoCAP                          1165      1156   <0.01
   ListiList                          1097      1089   <0.01
   REPRODUCTION-2DPAGE                 836       836   <0.01
   AGD                                 650       644   <0.01
   Leproma                             636       633   <0.01
   PhotoList                           605       605   <0.01
   LegioList                           572       572   <0.01
   MaizeGDB                            460       455   <0.01
   DisProt                             395       392   <0.01
   OGP                                 379       378   <0.01
   PeroxiBase                          378       367   <0.01
   REBASE                              364       358   <0.01
   HIV                                 361       351   <0.01
   DictyBase                           355       352   <0.01
   ECO2DBASE                           351       299   <0.01
   SagaList                            349       348   <0.01
   GlycoSuiteDB                        282       282   <0.01
   PHCI-2DPAGE                         241       241   <0.01
   MypuList                            193       193   <0.01
   DOSAC-COBS-2DPAGE                   149       147   <0.01
   Aarhus/Ghent-2DPAGE                 128        98   <0.01
   Siena-2DPAGE                        103       103   <0.01
   HSC-2DPAGE                           85        85   <0.01
   PhosSite                             70        70   <0.01
   Cornea-2DPAGE                        67        67   <0.01
   COMPLUYEAST-2DPAGE                   59        59   <0.01
   euHCVdb                              55        44   <0.01
   PMMA-2DPAGE                          52        52   <0.01
   PptaseDB                             29        29   <0.01
   Rat-heart-2DPAGE                     28        28   <0.01
   ANU-2DPAGE                           25        25   <0.01
   BuruList                             20        20   <0.01

Number of explicitly cross-referenced databases: 91
Number of implicitly cross-referenced databases: 26
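
As an illustration of how counts like those in the table above could be
derived, the following minimal Python sketch tallies line types from flat-file
text. It relies on the two-letter line codes (RL, CC, FT, DR) and the "//"
entry terminator of the flat-file format. The sample entries, the accession
numbers and the function name count_line_types are hypothetical, and it is an
assumption that comments are counted per "-!-" topic and features per feature
key; the official statistics may be computed differently.

```python
from collections import Counter

# Hypothetical two-entry flat file; names, accessions and text are made up.
SAMPLE = """\
ID   EXAMPLE1_HUMAN          Reviewed;         120 AA.
AC   P00000;
RL   Hypothetical J. 1:1-10(2007).
CC   -!- FUNCTION: Made-up function text that runs over
CC       two lines.
CC   -!- SIMILARITY: Belongs to a made-up family.
DR   EMBL; X00000; CAA00000.1; -; mRNA.
DR   InterPro; IPR000000; Example_domain.
FT   CHAIN         1    120       Example protein.
FT                                /FTId=PRO_0000000000.
//
ID   EXAMPLE2_MOUSE          Reviewed;          80 AA.
AC   P00001;
RL   Submitted (JUL-2007) to the EMBL/GenBank/DDBJ databases.
DR   EMBL; X00001; CAA00001.1; -; mRNA.
//
"""


def count_line_types(text):
    """Return (totals, entries_with, n_entries) for RL, CC, FT and DR lines."""
    totals = Counter()        # total number of counted lines per line type
    entries_with = Counter()  # number of entries with at least one such line
    seen = set()              # line types seen in the current entry
    n_entries = 0
    for line in text.splitlines():
        if line.startswith("//"):      # end of the current entry
            n_entries += 1
            entries_with.update(seen)
            seen = set()
            continue
        code = line[:2]
        if code in ("RL", "DR"):
            # Simplification: every physical RL or DR line is counted once.
            counted = True
        elif code == "CC":
            # Count one comment per topic; continuation lines lack "-!-".
            counted = line[5:].startswith("-!-")
        elif code == "FT":
            # Count one feature per key; continuation lines have a blank key.
            counted = bool(line[5:13].strip())
        else:
            counted = False
        if counted:
            totals[code] += 1
            seen.add(code)
    return totals, entries_with, n_entries


totals, entries_with, n_entries = count_line_types(SAMPLE)
for code in ("RL", "CC", "FT", "DR"):
    average = totals[code] / n_entries
    print(code, totals[code], entries_with[code], f"{average:.2f}")
```

The three printed columns correspond to the "Total number", "Number of
entries" and "Average per entry" columns of the table.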


7.  MISCELLANEOUS STATISTICS

Total number of distinct authors cited in UniProtKB/Swiss-Prot: 244404

Total number of entries encoded on a Mitochondrion: 4312
Total number of entries encoded on a Plasmid: 3339
Total number of entries encoded on a Plastid: 28
Total number of entries encoded on a Plastid; Apicoplast: 10
Total number of entries encoded on a Plastid; Chloroplast: 8493
Total number of entries encoded on a Plastid; Cyanelle: 145
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 108

Number of fragments: 8342 
Number of additional sequences produced by alternative splicing, initiation or promoter usage: 20944



UniProtKB/TrEMBL protein database release 37.0 statistics

1.  INTRODUCTION

Release 37.0 of 24-July-2007 of UniProtKB/TrEMBL contains 4672908 sequence entries
comprising 1515982311 amino acids.

407721 sequences have been added since release 36, the sequence data of
5983 existing entries has been updated and the annotations of
4265187 entries have been revised. The added sequences represent an increase
of about 10% in the number of entries.
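
The quoted increase can be reproduced from the figures in this section. The
short Python sketch below assumes that the percentage refers to the number of
entries and that no entries were removed between releases; the variable names
are illustrative only.

```python
current_entries = 4672908   # entries in release 37.0
added = 407721              # sequences added since release 36

# Size of the previous release, assuming no entries were removed.
previous_entries = current_entries - added

growth = 100 * added / previous_entries
print(f"previous release: {previous_entries} entries")
print(f"growth: {growth:.1f}%")
```

Under these assumptions the previous release size equals the 4265187 entries
whose annotations were revised, and the growth rounds to the 10% stated above.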



2.  AMINO ACID COMPOSITION

   2.1  Composition in percent for the complete database

   Ala (A) 8.57   Gln (Q) 3.92   Leu (L) 9.86   Ser (S) 6.79
   Arg (R) 5.57   Glu (E) 6.05   Lys (K) 5.20   Thr (T) 5.58
   Asn (N) 4.20   Gly (G) 7.06   Met (M) 2.40   Trp (W) 1.33
   Asp (D) 5.27   His (H) 2.22   Phe (F) 4.04   Tyr (Y) 3.01
   Cys (C) 1.34   Ile (I) 5.91   Pro (P) 4.83   Val (V) 6.66

   Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.05


   2.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Ser, Val, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Cys, Trp
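
   The ranking in 2.2 follows directly from the percentages in 2.1. A minimal
   Python sketch (variable names are illustrative) that sorts the twenty
   standard amino acids by decreasing frequency:

```python
# Composition in percent, copied from section 2.1 (B, Z and X omitted).
composition = {
    "Ala": 8.57, "Gln": 3.92, "Leu": 9.86, "Ser": 6.79,
    "Arg": 5.57, "Glu": 6.05, "Lys": 5.20, "Thr": 5.58,
    "Asn": 4.20, "Gly": 7.06, "Met": 2.40, "Trp": 1.33,
    "Asp": 5.27, "His": 2.22, "Phe": 4.04, "Tyr": 3.01,
    "Cys": 1.34, "Ile": 5.91, "Pro": 4.83, "Val": 6.66,
}

# Sort the residue names by their percentage, most frequent first.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))
```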


3.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 137885

   The first twenty species represent 832320 sequences, or 17.8% of the
   total number of entries.


   3.1 Table of the frequency of occurrence of species

        Species represented 1x:62883
                            2x:25898
                            3x:13175
                            4x: 7414
                            5x: 4369
                            6x: 3205
                            7x: 2336
                            8x: 1966
                            9x: 1552
                           10x: 1686
                       11- 20x: 7515
                       21- 50x: 2861
                       51-100x: 1184
                         >100x: 1841
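
   A table of this kind can be built by binning the number of entries per
   species. The Python sketch below is one possible way to do it; the function
   name occurrence_table and the example counts are hypothetical. It also
   checks that the published classes add up to the species total given above.

```python
from collections import Counter


def occurrence_table(entries_per_species):
    """Bin per-species entry counts into the classes used in table 3.1."""
    table = Counter()
    for n in entries_per_species:
        if n <= 10:
            label = f"{n}x"        # 1x to 10x each get their own class
        elif n <= 20:
            label = "11-20x"
        elif n <= 50:
            label = "21-50x"
        elif n <= 100:
            label = "51-100x"
        else:
            label = ">100x"
        table[label] += 1
    return table


# Hypothetical entry counts for ten species.
example = occurrence_table([1, 1, 2, 10, 15, 20, 21, 100, 101, 189486])
print(dict(example))

# Class sizes copied from table 3.1, in the order 1x ... 10x, 11-20x,
# 21-50x, 51-100x, >100x.
published = [62883, 25898, 13175, 7414, 4369, 3205, 2336, 1966, 1552, 1686,
             7515, 2861, 1184, 1841]
print(sum(published))  # number of species covered by the table
```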



   3.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     189486  Human immunodeficiency virus 1
       2      95759  Oryza sativa subsp. japonica (Rice)
       3      54604  Homo sapiens (Human)
       4      50189  Trichomonas vaginalis G3
       5      49337  Mus musculus (Mouse)
       6      43819  Arabidopsis thaliana (Mouse-ear cress)
       7      39845  Paramecium tetraurelia
       8      39334  Oryza sativa subsp. indica (Rice)
       9      37589  Hepatitis C virus
      10      28042  Tetraodon nigroviridis (Green puffer)
      11      26752  Drosophila melanogaster (Fruit fly)
      12      24811  Vitis vinifera (Grape)
      13      22320  Medicago truncatula (Barrel medic)
      14      20567  Danio rerio (Zebrafish) (Brachydanio rerio)
      15      20421  Trypanosoma cruzi
      16      20391  Caenorhabditis elegans
      17      19091  uncultured bacterium
      18      16854  Aedes aegypti (Yellowfever mosquito)
      19      16685  Tetrahymena thermophila SB210
      20      16424  Phaeosphaeria nodorum (Septoria nodorum)
      21      15262  Hepatitis B virus (HBV)
      22      14672  Plasmodium chabaudi
      23      14095  Aspergillus niger
      24      13408  Anopheles gambiae str. PEST
      25      13061  Dictyostelium discoideum AX4
      26      13056  Caenorhabditis briggsae
      27      12801  Magnaporthe grisea (Rice blast fungus) (Pyricularia grisea)
      28      12597  Xenopus laevis (African clawed frog)
      29      12006  Aspergillus oryzae
      30      11783  Plasmodium berghei
      31      10977  Chaetomium globosum (Soil fungus)
      32      10630  Neurospora crassa
      33      10399  Coccidioides immitis
      34      10354  Neosartorya fischeri  (Aspergillus fischerianus 
      35      10345  Aspergillus terreus (strain NIH 2624)
      36      10074  Drosophila pseudoobscura (Fruit fly)
      37       9727  Cryptococcus neoformans (Filobasidiella neoformans)
      38       9721  Schistosoma japonicum (Blood fluke)
      39       9704  Aspergillus fumigatus (Sartorya fumigata)
      40       9503  Emericella nidulans (Aspergillus nidulans)
      41       9461  Trypanosoma brucei
      42       9318  Candida albicans (Yeast)
      43       9153  Escherichia coli
      44       9145  Bos taurus (Bovine)
      45       9064  Aspergillus clavatus
      46       8975  Rhodococcus sp. (strain RHA1)
      47       8512  Stigmatella aurantiaca DW4/3-1
      48       8437  Plesiocystis pacifica SIR-1
      49       8422  Rattus norvegicus (Rat)
      50       8413  Burkholderia xenovorans (strain LB400)
      51       8249  Microscilla marina ATCC 23134
      52       8126  Bradyrhizobium japonicum
      53       8010  Leishmania infantum
      54       7976  Ostreococcus tauri
      55       7939  Helicobacter pylori (Campylobacter pylori)
      56       7937  Frankia sp. EAN1pec
      57       7877  Leishmania braziliensis
      58       7834  Burkholderia phymatum STM815
      59       7808  Plasmodium yoelii yoelii
      60       7745  Solibacter usitatus (strain Ellin6076)
      61       7559  Bradyrhizobium sp. (strain BTAi1 / ATCC BAA-1182)
      62       7521  Streptomyces coelicolor
      63       7501  Hepatitis C virus subtype 1b
      64       7461  Burkholderia cenocepacia MC0-3
      65       7432  Burkholderia sp. (strain 383) (Burkholderia cepacia 
      66       7409  Burkholderia vietnamiensis G4
      67       7403  Ostreococcus lucimarinus CCE9901
      68       7349  Burkholderia pseudomallei 305
      69       7336  Plasmodium vivax
      70       7310  Burkholderia phytofirmans PsJN
      71       7305  Streptomyces avermitilis
      72       7215  Burkholderia pseudomallei (strain 668)
      73       7190  Myxococcus xanthus (strain DK 1622)
      74       7171  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      75       7141  Saccharopolyspora erythraea (strain NRRL 23338)
      76       7138  Burkholderia pseudomallei (strain 1106a)
      77       7136  Rhizobium loti (Mesorhizobium loti)
      78       7109  Leishmania major
      79       6996  Burkholderia ambifaria MC40-6
      80       6974  Methylobacterium sp. 4-46
      81       6970  Rhizobium leguminosarum bv. viciae (strain 3841)
      82       6953  Rhodopirellula baltica
      83       6932  Pseudomonas aeruginosa
      84       6911  Agrobacterium tumefaciens (strain C58 / ATCC 33970)
      85       6850  Burkholderia cenocepacia (strain HI2424)
      86       6700  Bradyrhizobium sp. (strain ORS278)
      87       6687  Frankia alni (strain ACN14a)
      88       6679  Psychroflexus torquis ATCC 700755
      89       6606  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz) 
      90       6587  Mycobacterium smegmatis (strain ATCC 700084 / mc(2)155)
      91       6564  Burkholderia multivorans ATCC 17616
      92       6561  Burkholderia cepacia (strain ATCC 53795 / AMMD)
      93       6553  Plasmodium falciparum
      94       6537  Hahella chejuensis (strain KCTC 2396)
      95       6482  Ralstonia eutropha  (Cupriavidus necator 
      96       6463  Planctomyces maris DSM 8797
      97       6457  Ustilago maydis (Smut fungus)
      98       6412  Cyanothece sp. CCY 0110
      99       6394  Giardia lamblia ATCC 50803
     100       6337  Sinorhizobium medicae WSM419
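
   The summary figure for the first twenty species given at the start of this
   section can be recomputed from the first twenty rows of the table above.
   A short Python sketch, with illustrative variable names:

```python
# Frequencies of the twenty most represented species, copied from table 3.2.
top20 = [
    189486, 95759, 54604, 50189, 49337, 43819, 39845, 39334, 37589, 28042,
    26752, 24811, 22320, 20567, 20421, 20391, 19091, 16854, 16685, 16424,
]
total_entries = 4672908  # entries in UniProtKB/TrEMBL release 37.0

subtotal = sum(top20)
share = 100 * subtotal / total_entries
print(f"{subtotal} sequences, {share:.1f}% of all entries")
```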


  
   3.3  Taxonomic distribution of the sequences

   Kingdom        sequences (% of the database)
    Archaea           99441 (  2%)
    Bacteria        2542475 ( 54%)
    Eukaryota       1507950 ( 32%)
    Viruses          519029 ( 11%)
    Other              4011 ( <1%)



   Within Eukaryota:

    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                  54604 (  4%)           (  1%)
     Other Mammalia        129446 (  9%)           (  3%)
     Other Vertebrata      180006 ( 12%)           (  4%)
     Viridiplantae         383788 ( 25%)           (  8%)
     Fungi                 238250 ( 16%)           (  5%)
     Insecta               147593 ( 10%)           (  3%)
     Nematoda               37432 (  2%)           (  1%)
     Other                 336831 ( 22%)           (  7%)
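
   The percentages in the tables above are whole numbers. The Python sketch
   below shows one way to reproduce them; the rule that a share rounding to
   zero is printed as "<1%" is an assumption inferred from the tables, and the
   function name percent_label is hypothetical.

```python
def percent_label(count, total):
    """Format a share as a whole percentage, or '<1%' if it rounds to zero."""
    rounded = round(100 * count / total)
    return "<1%" if rounded == 0 else f"{rounded}%"


total_entries = 4672908
kingdoms = {
    "Archaea": 99441,
    "Bacteria": 2542475,
    "Eukaryota": 1507950,
    "Viruses": 519029,
    "Other": 4011,
}

for name, count in kingdoms.items():
    print(f"{name:<10} {count:>8} ({percent_label(count, total_entries)})")

# The same helper gives both columns of the "Within Eukaryota" table,
# shown here for the Nematoda row.
nematoda = 37432
print(percent_label(nematoda, kingdoms["Eukaryota"]),
      percent_label(nematoda, total_entries))
```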



4.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50   68917             1001-1100    28568
                 51- 100  325538             1101-1200    20000
                101- 150  406073             1201-1300    13794
                151- 200  388378             1301-1400     9199
                201- 250  389089             1401-1500     7508
                251- 300  373155             1501-1600     5523
                301- 350  346516             1601-1700     4222
                351- 400  272777             1701-1800     3452
                401- 450  224745             1801-1900     2632
                451- 500  188831             1901-2000     2218
                501- 550  135570             2001-2100     1760
                551- 600  100144             2101-2200     1822
                601- 650   75262             2201-2300     1418
                651- 700   58398             2301-2400     1150
                701- 750   51071             2401-2500      939
                751- 800   45518             >2500         8013
                801- 850   33704
                851- 900   29503
                901- 950   21777
                951-1000   17156

   The average sequence length in UniProtKB/TrEMBL is 324 amino acids.

   The shortest sequence is Q96AT0_HUMAN:     4 amino acids.
   The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
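
   Two consistency checks on the figures above can be sketched in Python.
   The first compares the size classes with the number of non-fragment
   entries, using the fragment count from the miscellaneous statistics of
   this release; the second derives the average length from the release
   totals. Variable names are illustrative, and it is an assumption that the
   average is taken over all entries, fragments included.

```python
# Number of sequences per size class, copied from the table above.
size_classes = [
    68917, 325538, 406073, 388378, 389089,    # 1-250
    373155, 346516, 272777, 224745, 188831,   # 251-500
    135570, 100144, 75262, 58398, 51071,      # 501-750
    45518, 33704, 29503, 21777, 17156,        # 751-1000
    28568, 20000, 13794, 9199, 7508,          # 1001-1500
    5523, 4222, 3452, 2632, 2218,             # 1501-2000
    1760, 1822, 1418, 1150, 939,              # 2001-2500
    8013,                                     # >2500
]

total_entries = 4672908         # sequence entries in release 37.0
total_amino_acids = 1515982311  # amino acids in release 37.0
fragments = 1008568             # "Number of fragments" for this release

non_fragments = sum(size_classes)
print(non_fragments, total_entries - fragments)

average_length = total_amino_acids / total_entries
print(round(average_length))
```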



5.  STATISTICS FOR SOME LINE TYPES

The following table gives, for selected UniProtKB/TrEMBL line types, the
total number of lines, the number of entries with at least one such line, and
the average number of such lines per entry.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    6509095              1.39
   Submitted to EMBL/GenBank/DDBJ  3467141   2744405    0.74
   Journal                         2916615   2494170    0.62
   Thesis                             6588      6534   <0.01
   Submitted to other databases       4346      4340   <0.01
   Book citation                      4284      4239   <0.01
   Other                            110121     78245    0.02

Comments (CC)                      3245182              0.69
   CAUTION                         1035264   1035264    0.22
   SIMILARITY                      1028284    934241    0.22
   CATALYTIC ACTIVITY               309925    287313    0.07
   FUNCTION                         268909    267080    0.06
   SUBCELLULAR LOCATION             254968    252079    0.05
   SUBUNIT                          130740    130288    0.03
   PATHWAY                          112447    103550    0.02
   COFACTOR                          96274     93156    0.02
   MISCELLANEOUS                      5231      5231   <0.01
   INTERACTION                        2613      2613   <0.01
   DOMAIN                              523       521   <0.01
   MASS SPECTROMETRY                     4         4   <0.01

Features (FT)                      2030625              0.43
   NON_TER                         1687691   1006361    0.36
   CHAIN                            202951    171791    0.04
   SIGNAL                           139437    139437    0.03
   TRANSIT                             546       546   <0.01

Cross-references (DR)             32363734              6.93
   InterPro                        6492321   3104830    1.39
   GO                              5725036   2029244    1.23
   EMBL                            5279734   4660870    1.13
   Pfam                            4003018   2955678    0.86
   PROSITE                         2123457   1385572    0.45
   GenomeReviews                   1140859   1096509    0.24
   Gene3D                           921242    810969    0.20
   KEGG                             867649    830073    0.19
   PRINTS                           842539    707055    0.18
   SMART                            747946    584038    0.16
   TIGRFAMs                         651635    598676    0.14
   PANTHER                          569104    543361    0.12
   ProDom                           523096    499091    0.11
   SMR                              394475    394475    0.08
   BioCyc                           309825    296969    0.07
   HSSP                             270097    269694    0.06
   UniGene                          238789    220780    0.05
   PIR                              190628    155545    0.04
   TIGR                             179285    172649    0.04
   Ensembl                          158571    158482    0.03
   PIRSF                            151109    144392    0.03
   RZPD-ProtExp                     106571     32439    0.02
   ArrayExpress                      93384     93299    0.02
   Gramene                           70587     70587    0.02
   MGI                               41758     41081    0.01
   HGNC                              36138     36100    0.01
   FlyBase                           34546     34449    0.01
   euHCVdb                           32511     32511    0.01
   WormPep                           19084     19003   <0.01
   TAIR                              18749     18701   <0.01
   WormBase                          18556     18477   <0.01
   ZFIN                              15139     15135   <0.01
   LinkHub                           13340     13340   <0.01
   DictyBase                         12907     12907   <0.01
   MEROPS                            11596     11163   <0.01
   LegioList                          5331      5301   <0.01
   IntAct                             5216      5216   <0.01
   ListiList                          4718      4701   <0.01
   PseudoCAP                          4405      4402   <0.01
   PDB                                4271      2576   <0.01
   BuruList                           4220      4186   <0.01
   PhotoList                          4075      3951   <0.01
   AGD                                4049      4049   <0.01
   RGD                                3721      3407   <0.01
   REBASE                             3691      3666   <0.01
   TubercuList                        2542      2536   <0.01
   DIP                                2484      2479   <0.01
   SagaList                           1745      1651   <0.01
   GeneDB_Spombe                      1729      1717   <0.01
   PeroxiBase                         1359      1356   <0.01
   PharmGKB                           1347      1346   <0.01
   Leproma                             971       970   <0.01
   TRANSFAC                            863       853   <0.01
   MypuList                            589       585   <0.01
   PeptideAtlas                        376       376   <0.01
   SGD                                 372       371   <0.01
   PHCI-2DPAGE                         106       106   <0.01
   CYGD                                101        98   <0.01
   ANU-2DPAGE                           59        59   <0.01
   Reactome                             45        32   <0.01
   SWISS-2DPAGE                         35        35   <0.01
   REPRODUCTION-2DPAGE                  27        27   <0.01
   PMMA-2DPAGE                           3         3   <0.01
   Siena-2DPAGE                          2         2   <0.01
   COMPLUYEAST-2DPAGE                    1         1   <0.01

Number of explicitly cross-referenced databases: 91
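
The "Average per entry" column in the table above is the total number of
lines divided by the number of entries in the release. A minimal Python
sketch follows; the function name average_label is hypothetical, and the rule
that values rounding to 0.00 are shown as "<0.01" is inferred from the table.

```python
total_entries = 4672908  # entries in UniProtKB/TrEMBL release 37.0

# Totals per line type, copied from the table above.
line_totals = {"RL": 6509095, "CC": 3245182, "FT": 2030625, "DR": 32363734}


def average_label(total_lines, entries):
    """Format lines per entry with two decimals, or '<0.01' if negligible."""
    label = f"{total_lines / entries:.2f}"
    return "<0.01" if label == "0.00" else label


for code, total_lines in line_totals.items():
    print(code, average_label(total_lines, total_entries))

# A rare subtype, here the 6588 thesis references.
print("Thesis", average_label(6588, total_entries))
```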


6.  MISCELLANEOUS STATISTICS

Total number of distinct authors cited in UniProtKB/TrEMBL: 251813

Total number of entries encoded on a Mitochondrion: 171426
Total number of entries encoded on a Plasmid: 73689
Total number of entries encoded on a Plastid: 3656
Total number of entries encoded on a Plastid; Apicoplast: 181
Total number of entries encoded on a Plastid; Chloroplast: 59449
Total number of entries encoded on a Plastid; Cyanelle: 7
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 210

Number of fragments: 1008568


Submissions and Updates

We welcome feedback from our users. We would especially appreciate being notified if sequences belonging to your field of expertise are missing from the database. We would also like to be notified about annotations that need updating, for example if the function of a protein has been clarified or if new information about post-translational modifications has become available.

Submit new sequence data, updates and corrections at http://www.uniprot.org/support/submissions.shtml

For all queries regarding submissions to UniProtKB and to submit new protein sequence data, please contact:

UniProt Knowledgebase
The EMBL Outstation - The European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

Telephone: (+44 1223) 494 462
Telefax: (+44 1223) 494 468
E-mail: datasubs@ebi.ac.uk


Download information

Bi-Weekly releases

The latest data of the UniProt Knowledgebase is available in various formats (flatfile, XML or FASTA) at http://www.uniprot.org/database/download.shtml. The data is further supplemented by a file containing the sequences of all additional alternative isoforms annotated in UniProtKB/Swiss-Prot. This data set is documented in the file ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/README.varsplic

Major releases

For users who wish to download the UniProt Knowledgebase only occasionally, we distribute the latest major release (updated three times per year) in flatfile format. Previous releases of UniProtKB/Swiss-Prot and UniProtKB/TrEMBL are archived under ftp://ftp.uniprot.org/pub/databases/uniprot/previous_major_releases. The UniProt Knowledgebase major release is also available on DVD from the EBI.


Contact

EMBL Outstation
European Bioinformatics Institute (EBI)
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

Telephone: (+44 1223) 494 444
Fax: (+44 1223) 494 468
Electronic mail address: datalib@ebi.ac.uk / swissprot@ebi.ac.uk
WWW server: http://www.ebi.ac.uk/


Swiss Institute of Bioinformatics (SIB)
Centre Medical Universitaire
1, rue Michel Servet
1211 Geneva 4
Switzerland

Telephone: (+41 22) 379 50 50
Fax: (+41 22) 379 58 58
Electronic mail address: swiss-prot@expasy.org
WWW server: http://www.expasy.org/


Protein Information Resource (PIR)
Georgetown University Medical Center
3300 Whitehaven St., Suite 1200
Washington, DC 20008
United States of America

Telephone: (+1 202) 687 1039
Fax: (+1 202) 687 0057
Electronic mail address: pirmail@georgetown.edu
WWW server: http://pir.georgetown.edu

Citation

If you want to cite UniProt in a publication, please use the following reference:

The UniProt Consortium
"The Universal Protein Resource (UniProt)"
Nucleic Acids Res. 35:D193-D197(2007) doi:10.1093/nar/gkl929