UniProt Knowledgebase
Swiss-Prot Protein Knowledgebase
TrEMBL Protein Database

Release notes
UniProtKB release 11.0 of 29-May-2007

Content

  Introduction
  UniProtKB/Swiss-Prot Protein Knowledgebase release statistics
  UniProtKB/TrEMBL Protein Database release statistics

  Submissions and Updates
  Download information
  Contact
  Citation

  Related documents: UniProtKB user manual, Recent changes, Forthcoming changes.

Introduction

Release 11.0 of the UniProt Knowledgebase is composed of the UniProtKB/Swiss-Prot Protein Knowledgebase release 53.0 and the UniProtKB/TrEMBL Protein Database release 36.0.

More information on these databases can be found in the user manual, in the section "What is the UniProt Knowledgebase?".


UniProtKB/Swiss-Prot Protein Knowledgebase release 53.0 statistics

Release 53.0 of 29-May-07 of UniProtKB/Swiss-Prot contains 269293 sequence entries, comprising 98902758 amino acids abstracted from 156204 references.

The growth of the database is summarized below.

Release  Date   Number of entries  Number of amino acids
-------  -----  -----------------  ---------------------
2.0      09/86              3'939                900'163
3.0      11/86              4'160                969'641
4.0      04/87              4'387              1'036'010
5.0      09/87              5'205              1'327'683
6.0      01/88              6'102              1'653'982
7.0      04/88              6'821              1'885'771
8.0      08/88              7'724              2'224'465
9.0      11/88              8'702              2'498'140
10.0     03/89             10'008              2'952'613
11.0     07/89             10'856              3'265'966
12.0     10/89             12'305              3'797'482
13.0     01/90             13'837              4'347'336
14.0     04/90             15'409              4'914'264
15.0     08/90             16'941              5'486'399
16.0     11/90             18'364              5'986'949
17.0     02/91             20'024              6'524'504
18.0     05/91             20'772              6'792'034
19.0     08/91             21'795              7'173'785
20.0     11/91             22'654              7'500'130
21.0     03/92             23'742              7'866'596
22.0     05/92             25'044              8'375'696
23.0     08/92             26'706              9'011'391
24.0     12/92             28'154              9'545'427
25.0     04/93             29'955             10'214'020
26.0     07/93             31'808             10'875'091
27.0     10/93             33'329             11'484'420
28.0     02/94             36'000             12'496'420
29.0     06/94             38'303             13'464'008
30.0     10/94             40'292             14'147'368
31.0     02/95             43'470             15'335'248
32.0     11/95             49'340             17'385'503
33.0     02/96             52'205             18'531'384
34.0     10/96             59'021             21'210'389
35.0     11/97             69'113             25'083'768
36.0     07/98             74'019             26'840'295
37.0     12/98             77'977             28'268'293
38.0     07/99             80'000             29'085'965
39.0     05/00             86'593             31'411'114
40.0     10/01            101'602             37'315'215
41.0     02/03            122'564             44'986'459
42.0     10/03            135'850             50'046'799
43.0     03/04            146'720             54'093'154
44.0     07/04            153'871             56'608'159
45.0     10/04            163'235             59'631'787
46.0     02/05            168'297             61'443'278
47.0     05/05            181'577             65'746'672
48.0     09/05            194'317             70'391'852
49.0     02/06            207'132             75'438'310
50.0     05/06            222'289             81'585'146
51.0     10/06            241'242             88'541'632
52.0     03/07            261'513             95'638'062
53.0     05/07            269'293             98'902'758
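
The growth table groups digits with apostrophes, so its rows need a small cleanup step before the numbers can be compared. The following is a minimal Python sketch, not part of any UniProt tooling; the function name parse_growth_row and the dictionary keys are hypothetical, and the two rows are copied from the table above.

```python
def parse_growth_row(line):
    """Split one row of the growth table into typed fields."""
    release, date, entries, residues = line.split()
    # Digits are grouped with apostrophes (e.g. 269'293); strip them before converting.
    return {
        "release": release,
        "date": date,  # MM/YY, kept as printed in the table
        "entries": int(entries.replace("'", "")),
        "amino_acids": int(residues.replace("'", "")),
    }


previous = parse_growth_row("52.0 03/07 261'513 95'638'062")
current = parse_growth_row("53.0 05/07 269'293 98'902'758")

# Net change between the two most recent releases.
net_entries = current["entries"] - previous["entries"]
net_residues = current["amino_acids"] - previous["amino_acids"]
print(net_entries, net_residues)
```

Because the row is split on any run of whitespace, the same function works whether the columns are separated by single spaces or padded for alignment.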

In rare cases, UniProtKB/Swiss-Prot entries are removed. Deleted entries are almost exclusively Open Reading Frames (ORFs) that were wrongly predicted to code for proteins. When there is enough evidence that these hypothetical proteins are not real, we remove them from UniProtKB/Swiss-Prot. The document delac_sp.txt lists all accession numbers that were previously present in UniProtKB/Swiss-Prot but have since been deleted from the database.


Status of the model organisms

We have selected a number of organisms that are the target of genome sequencing and/or mapping projects and for which we intend to:

From our efforts to annotate human sequence entries as completely as possible arose the HPI project, and the bacterial model organisms became the focus of the HAMAP project. Here is the current status of the model organisms which are not covered by these two projects:

Organism        Database cross-references  Index file    Number of sequences
--------------  -------------------------  ------------  -------------------
A.thaliana      TAIR                       arath.txt                   5'771
C.albicans      None yet                   calbican.txt                  621
C.elegans       Wormpep                    celegans.txt                3'113
D.discoideum    DictyBase                  dicty.txt                     357
D.melanogaster  FlyBase                    fly.txt                     2'659
M.musculus      MGD                        mgdtosp.txt                13'155
S.cerevisiae    SGD                        yeast.txt                   6'240
S.pombe         GeneDB_SPombe              pombe.txt                   3'232

UniProtKB/Swiss-Prot release statistics

1.  INTRODUCTION

Release 53.0 of 29-May-07 of UniProtKB/Swiss-Prot contains 269293 sequence entries,
comprising 98902758 amino acids abstracted from 156204 references. 

9228 sequences have been added since release 52.0, the sequence data of
734 existing entries has been updated and the annotations of
210454 entries have been revised.


2.  AMINO ACID COMPOSITION

   2.1  Composition in percent for the complete database

   Ala (A) 7.85   Gln (Q) 3.96   Leu (L) 9.66   Ser (S) 6.87
   Arg (R) 5.42   Glu (E) 6.67   Lys (K) 5.93   Thr (T) 5.40
   Asn (N) 4.12   Gly (G) 6.94   Met (M) 2.39   Trp (W) 1.13
   Asp (D) 5.34   His (H) 2.29   Phe (F) 3.95   Tyr (Y) 3.01
   Cys (C) 1.50   Ile (I) 5.89   Pro (P) 4.84   Val (V) 6.72

   Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.00



   2.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Ser, Val, Glu, Lys, Ile, Arg, Thr, Asp, Pro, Asn, Gln,
   Phe, Tyr, Met, His, Cys, Trp
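
The ordering in 2.2 follows directly from the percentages in 2.1. As an illustration only, the short Python sketch below sorts the twenty standard residues by their reported share; the names COMPOSITION and rank_by_frequency are hypothetical, and the values are copied from the table in 2.1.

```python
# Percent composition copied from section 2.1 (three-letter codes).
COMPOSITION = {
    "Ala": 7.85, "Gln": 3.96, "Leu": 9.66, "Ser": 6.87,
    "Arg": 5.42, "Glu": 6.67, "Lys": 5.93, "Thr": 5.40,
    "Asn": 4.12, "Gly": 6.94, "Met": 2.39, "Trp": 1.13,
    "Asp": 5.34, "His": 2.29, "Phe": 3.95, "Tyr": 3.01,
    "Cys": 1.50, "Ile": 5.89, "Pro": 4.84, "Val": 6.72,
}


def rank_by_frequency(composition):
    """Return residue names ordered from most to least frequent."""
    return sorted(composition, key=composition.get, reverse=True)


ranking = rank_by_frequency(COMPOSITION)
print(", ".join(ranking))
```

No two residues share the same percentage in this release, so the sort order is unambiguous.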


3.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/Swiss-Prot: 10917

   The first twenty species represent 84159 sequences:  31.3 % of the total
   number of entries.
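
The share quoted for the first twenty species can be reproduced from the counts in table 3.2 and the total number of entries given in the introduction. The Python sketch below is illustrative; the variable names are hypothetical and the figures are copied from this document.

```python
# Total number of entries in release 53.0 (section 1).
TOTAL_ENTRIES = 269293

# Frequencies of the twenty most represented species, copied from table 3.2.
TOP_TWENTY = [
    16602, 13316, 6163, 6119, 5706, 4930, 4025, 3188, 3032, 2854,
    2545, 2008, 1885, 1782, 1774, 1762, 1752, 1636, 1552, 1528,
]

top_twenty_total = sum(TOP_TWENTY)
share = 100 * top_twenty_total / TOTAL_ENTRIES
print(f"{top_twenty_total} sequences: {share:.1f} % of the total")
```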


   3.1 Table of the frequency of occurrence of species

        Species represented 1x: 5200
                            2x: 1661
                            3x:  801
                            4x:  529
                            5x:  359
                            6x:  332
                            7x:  225
                            8x:  194
                            9x:  168
                           10x:   89
                       11- 20x:  451
                       21- 50x:  329
                       51-100x:  170
                         >100x:  409
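
As a consistency check, the bins of the frequency table should add up to the total number of species given at the start of section 3. The Python sketch below is illustrative only; SPECIES_BINS is a hypothetical name and the counts are copied from the table above.

```python
# Number of species per occurrence class, copied from table 3.1.
SPECIES_BINS = {
    "1x": 5200, "2x": 1661, "3x": 801, "4x": 529, "5x": 359,
    "6x": 332, "7x": 225, "8x": 194, "9x": 168, "10x": 89,
    "11-20x": 451, "21-50x": 329, "51-100x": 170, ">100x": 409,
}

# Total number of species reported for this release.
TOTAL_SPECIES = 10917

bin_total = sum(SPECIES_BINS.values())
print(bin_total, bin_total == TOTAL_SPECIES)
```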


   3.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1      16602  Homo sapiens (Human)
       2      13316  Mus musculus (Mouse)
       3       6163  Saccharomyces cerevisiae (Baker's yeast)
       4       6119  Rattus norvegicus (Rat)
       5       5706  Arabidopsis thaliana (Mouse-ear cress)
       6       4930  Escherichia coli
       7       4025  Bos taurus (Bovine)
       8       3188  Schizosaccharomyces pombe (Fission yeast)
       9       3032  Caenorhabditis elegans
      10       2854  Bacillus subtilis
      11       2545  Drosophila melanogaster (Fruit fly)
      12       2008  Xenopus laevis (African clawed frog)
      13       1885  Escherichia coli O157:H7
      14       1782  Methanococcus jannaschii
      15       1774  Haemophilus influenzae
      16       1762  Pongo pygmaeus (Orangutan)
      17       1752  Gallus gallus (Chicken)
      18       1636  Salmonella typhimurium
      19       1552  Escherichia coli O6
      20       1528  Shigella flexneri
      21       1418  Mycobacterium tuberculosis
      22       1332  Danio rerio (Zebrafish) (Brachydanio rerio)
      23       1232  Salmonella typhi
      24       1223  Pseudomonas aeruginosa
      25       1195  Sus scrofa (Pig)
      26       1159  Mycobacterium bovis
      27       1077  Oryza sativa subsp. japonica (Rice)
      28        978  Synechocystis sp. (strain PCC 6803)
      29        971  Archaeoglobus fulgidus
      30        906  Yersinia pestis
      31        892  Vibrio cholerae
      32        884  Mimivirus
      33        884  Macaca fascicularis (Crab eating macaque) (Cynomolgus monkey)
      34        879  Rhizobium meliloti (Sinorhizobium meliloti)
      35        838  Oryctolagus cuniculus (Rabbit)
      36        796  Staphylococcus aureus (strain Mu50 / ATCC 700699)
      37        794  Staphylococcus aureus (strain N315)
      38        770  Staphylococcus aureus (strain MW2)
      39        770  Staphylococcus aureus (strain COL)
      40        766  Staphylococcus aureus (strain MSSA476)
      41        759  Staphylococcus aureus (strain MRSA252)
      42        756  Aquifex aeolicus
      43        738  Vibrio parahaemolyticus
      44        738  Pasteurella multocida
      45        714  Canis familiaris (Dog)
      46        687  Streptomyces coelicolor
      47        687  Mycoplasma pneumoniae
      48        682  Vibrio vulnificus
      49        674  Bacillus halodurans
      50        663  Vibrio vulnificus (strain YJ016)
      51        647  Staphylococcus epidermidis (strain ATCC 35984 / RP62A)
      52        645  Staphylococcus epidermidis (strain ATCC 12228)
      53        633  Mycobacterium leprae
      54        631  Anabaena sp. (strain PCC 7120)
      55        629  Neurospora crassa
      56        621  Ashbya gossypii (Yeast) (Eremothecium gossypii)
      57        619  Yersinia pseudotuberculosis
      58        618  Bacillus anthracis
      59        618  Pseudomonas syringae pv. tomato
      60        617  Candida albicans (Yeast)
      61        614  Pseudomonas putida (strain KT2440)
      62        612  Treponema pallidum
      63        611  Pan troglodytes (Chimpanzee)
      64        602  Photorhabdus luminescens subsp. laumondii
      65        598  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      66        591  Zea mays (Maize)
      67        588  Methanobacterium thermoautotrophicum
      68        588  Kluyveromyces lactis (Yeast) (Candida sphaerica)
      69        582  Bradyrhizobium japonicum
      70        579  Rickettsia prowazekii
      71        577  Salmonella paratyphi-a
      72        574  Helicobacter pylori (Campylobacter pylori)
      73        572  Buchnera aphidicola subsp. Acyrthosiphon pisum 
      74        572  Ralstonia solanacearum (Pseudomonas solanacearum)
      75        571  Agrobacterium tumefaciens (strain C58 / ATCC 33970)
      76        562  Buchnera aphidicola subsp. Schizaphis graminum
      77        559  Rhizobium loti (Mesorhizobium loti)
      78        559  Erwinia carotovora subsp. atroseptica (Pectobacterium atrosepticum)
      79        555  Lactococcus lactis subsp. lactis (Streptococcus lactis)
      80        555  Helicobacter pylori J99 (Campylobacter pylori J99)
      81        550  Listeria monocytogenes
      82        542  Bacillus cereus (strain ATCC 14579 / DSM 31)
      83        542  Listeria innocua
      84        541  Xanthomonas campestris pv. campestris
      85        539  Shewanella oneidensis
      86        535  Candida glabrata (Yeast) (Torulopsis glabrata)
      87        530  Neisseria meningitidis serogroup A
      88        530  Neisseria meningitidis serogroup B
      89        519  Clostridium acetobutylicum
      90        517  Caulobacter crescentus (Caulobacter vibrioides)
      91        507  Buchnera aphidicola subsp. Baizongia pistaciae
      92        507  Xanthomonas axonopodis pv. citri
      93        494  Brucella suis
      94        493  Brucella melitensis
      95        492  Streptococcus pneumoniae
      96        490  Salmonella choleraesuis
      97        489  Thermotoga maritima
      98        485  Oceanobacillus iheyensis
      99        483  Mycoplasma genitalium
     100        482  Listeria monocytogenes serotype 4b (strain F2365)
     101        481  Rickettsia conorii
     102        481  Xylella fastidiosa
     103        472  Photobacterium profundum (Photobacterium sp. (strain SS9))
     104        472  Xylella fastidiosa (strain Temecula1 / ATCC 700964)
     105        467  Deinococcus radiodurans
     106        467  Haemophilus ducreyi
     107        458  Methanosarcina acetivorans
     108        456  Bordetella bronchiseptica (Alcaligenes bronchisepticus)
     109        454  Corynebacterium glutamicum (Brevibacterium flavum)
     110        454  Clostridium perfringens
     111        448  Bacillus cereus (strain ATCC 10987)
     112        446  Pyrococcus horikoshii
     113        443  Bordetella parapertussis
     114        443  Emericella nidulans (Aspergillus nidulans)
     115        442  Bordetella pertussis
     116        441  Pyrococcus abyssi
     117        440  Halobacterium salinarium (Halobacterium halobium)
     118        438  Chromobacterium violaceum
     119        437  Methanosarcina mazei (Methanosarcina frisia)
     120        436  Yarrowia lipolytica (Candida lipolytica)
     121        435  Chlamydia trachomatis
     122        434  Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
     123        432  Rickettsia felis (Rickettsia azadi)
     124        425  Borrelia burgdorferi (Lyme disease spirochete)
     125        423  Lactobacillus plantarum
     126        422  Thermoanaerobacter tengcongensis
     127        421  Nicotiana tabacum (Common tobacco)
     128        421  Pyrococcus furiosus
     129        419  Synechococcus elongatus (Thermosynechococcus elongatus)
     130        419  Rickettsia bellii (strain RML369-C)
     131        417  Streptococcus pyogenes serotype M6
     132        416  Ovis aries (Sheep)
     133        416  Chlamydia pneumoniae (Chlamydophila pneumoniae)
     134        414  Enterococcus faecalis (Streptococcus faecalis)
     135        413  Streptococcus mutans
     136        413  Bacillus thuringiensis subsp. konkukian
     137        412  Campylobacter jejuni
     138        412  Streptomyces avermitilis
     139        408  Debaryomyces hansenii (Yeast) (Torulaspora hansenii)
     140        406  Chlamydia muridarum
     141        406  Rhizobium sp. (strain NGR234)
     142        397  Streptococcus pyogenes serotype M1
     143        397  Sulfolobus solfataricus
     144        394  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
     145        393  Streptococcus pyogenes serotype M18
     146        391  Streptococcus pyogenes serotype M3
     147        390  Rickettsia typhi
     148        389  Staphylococcus haemolyticus (strain JCSC1435)
     149        388  Shigella sonnei (strain Ss046)
     150        383  Acinetobacter sp. (strain ADP1)
     151        383  Burkholderia pseudomallei (Pseudomonas pseudomallei)
     152        382  Bacillus cereus (strain ZK / E33L)
     153        378  Staphylococcus saprophyticus subsp. saprophyticus 
     154        374  Rhodopseudomonas palustris
     155        373  Chlorobium tepidum
     156        372  Nitrosomonas europaea
     157        370  Corynebacterium efficiens
     158        369  Vibrio fischeri (strain ATCC 700601 / ES114)
     159        368  Bacillus clausii (strain KSM-K16)
     160        368  Pyrococcus kodakaraensis (Thermococcus kodakaraensis)
     161        364  Shigella boydii serotype 4 (strain Sb227)
     162        359  Methanopyrus kandleri
     163        359  Bacillus licheniformis (strain DSM 13 / ATCC 14580)
     164        357  Mannheimia succiniciproducens (strain MBEL55E)
     165        356  Burkholderia mallei (Pseudomonas mallei)
     166        354  Gloeobacter violaceus
     167        351  Leptospira interrogans
     168        349  Aeropyrum pernix
     169        348  Shigella dysenteriae serotype 1 (strain Sd197)
     170        348  Streptococcus agalactiae serotype III
     171        345  Streptococcus agalactiae serotype V
     172        344  Dictyostelium discoideum (Slime mold)
     173        341  Leptospira interrogans serogroup Icterohaemorrhagiae serovar copenhageni
     174        340  Solanum tuberosum (Potato)
     175        340  Pisum sativum (Garden pea)
     176        338  Methylococcus capsulatus
     177        338  Synechococcus sp. (strain WH8102)
     178        336  Geobacillus kaustophilus
     179        334  Sulfolobus tokodaii
     180        332  Prochlorococcus marinus (strain MIT 9313)
     181        332  Glycine max (Soybean)
     182        331  Prochlorococcus marinus
     183        325  Mycobacterium paratuberculosis
     184        324  Staphylococcus aureus
     185        324  Aspergillus fumigatus (Sartorya fumigata)
     186        323  Pseudomonas fluorescens (strain Pf-5 / ATCC BAA-477)
     187        321  Brucella abortus
     188        320  Idiomarina loihiensis
     189        320  Rhodopirellula baltica
     190        318  Macaca mulatta (Rhesus macaque)
     191        317  Geobacter sulfurreducens
     192        317  Staphylococcus aureus (strain NCTC 8325)
     193        317  Pseudomonas syringae pv. syringae (strain B728a)
     194        317  Thermoplasma acidophilum
     195        315  Synechococcus sp. (strain ATCC 27144 / PCC 6301 / SAUG 1402/1) 
     196        314  Coxiella burnetii
     197        313  Fusobacterium nucleatum subsp. nucleatum
     198        312  Prochlorococcus marinus subsp. pastoris (strain CCMP 1378 / MED4)
     199        310  Triticum aestivum (Wheat)
     200        300  Azoarcus sp. (strain EbN1) (Aromatoleum aromaticum (strain EbN1))
     201        299  Pseudomonas syringae pv. phaseolicola (strain 1448A / Race 6)
     202        297  Nocardia farcinica
     203        297  Rhodobacter sphaeroides (strain ATCC 17023 / 2.4.1 / NCIB 8253 / DSM 158)
     204        297  Synechococcus sp. (strain PCC 7942) (Anacystis nidulans R2)
     205        296  Staphylococcus aureus (strain bovine RF122)
     206        295  Wolinella succinogenes
     207        295  Thermus thermophilus (strain HB8 / ATCC 27634 / DSM 579)
     208        293  Zymomonas mobilis
     209        293  Bacteroides thetaiotaomicron
     210        292  Desulfovibrio vulgaris (strain Hildenborough / ATCC 29579 / NCIMB 8303)
     211        291  Sulfolobus acidocaldarius
     212        288  Clostridium tetani
     213        287  Symbiobacterium thermophilum
     214        287  Pseudomonas putida
     215        287  Haemophilus influenzae (strain 86-028NP)
     216        287  Silicibacter pomeroyi
     217        287  Legionella pneumophila subsp. pneumophila 
     218        287  Pseudomonas fluorescens (strain PfO-1)
     219        286  Xanthomonas oryzae pv. oryzae
     220        285  Pyrobaculum aerophilum
     221        284  Neisseria gonorrhoeae (strain ATCC 700825 / FA 1090)
     222        283  Cavia porcellus (Guinea pig)
     223        283  Legionella pneumophila (strain Paris)
     224        283  Hordeum vulgare (Barley)
     225        282  Thermoplasma volcanium
     226        281  Legionella pneumophila (strain Lens)
     227        279  Staphylococcus aureus (strain USA300)
     228        279  Ralstonia eutropha (strain JMP134) (Alcaligenes eutrophus)
     229        278  Corynebacterium diphtheriae
     230        276  Burkholderia sp. (strain 383) (Burkholderia cepacia 
     231        273  Thermus thermophilus (strain HB27 / ATCC BAA-163 / DSM 7039)
     232        269  Gorilla gorilla gorilla (Lowland gorilla)
     233        269  Spinacia oleracea (Spinach)
     234        268  Bacteriophage T4
     235        264  Equus caballus (Horse)
     236        263  Methanococcus maripaludis
     237        262  Rhodobacter capsulatus (Rhodopseudomonas capsulata)
     238        262  Xanthomonas campestris pv. campestris (strain 8004)
     239        261  Helicobacter hepaticus
     240        261  Bifidobacterium longum
     241        260  Colwellia psychrerythraea (strain 34H / ATCC BAA-681) (Vibrio psychroerythus)
     242        260  Wigglesworthia glossinidia brevipalpis
     243        259  Haloarcula marismortui (Halobacterium marismortui)
     244        259  Oryza sativa (Rice)
     245        258  Ureaplasma parvum (Ureaplasma urealyticum biotype 1)
     246        257  Dechloromonas aromatica (strain RCB)
     247        255  Anabaena variabilis (strain ATCC 29413 / PCC 7937)
     248        255  Leifsonia xyli subsp. xyli
     249        254  Vaccinia virus (strain Copenhagen) (VACV)
     250        253  Gluconobacter oxydans (Gluconobacter suboxydans)
     251        252  Porphyromonas gingivalis (Bacteroides gingivalis)
     252        251  Brucella abortus (strain 2308)
     253        250  Bartonella henselae (Rochalimaea henselae)
     254        247  Bacteroides fragilis
     255        247  Campylobacter jejuni (strain RM1221)
     256        245  Cryptococcus neoformans (Filobasidiella neoformans)
     257        244  Chlamydophila caviae
     258        243  Desulfotalea psychrophila
     259        242  Pseudoalteromonas haloplanktis (strain TAC 125)
     260        241  Blochmannia floridanus
     261        241  Burkholderia pseudomallei (strain 1710b)
     262        240  Lactobacillus johnsonii
     263        238  Propionibacterium acnes
     264        237  Xanthomonas campestris pv. vesicatoria (strain 85-10)
     265        237  Bartonella quintana (Rochalimaea quintana)
     266        236  Nitrosococcus oceani (strain ATCC 19707 / NCIMB 11848)
     267        235  Bacillus stearothermophilus (Geobacillus stearothermophilus)
     268        233  Thiobacillus denitrificans (strain ATCC 25259)
     269        232  Escherichia coli (strain UTI89 / UPEC)
     270        227  Ustilago maydis (Smut fungus)
     271        225  Chlamydomonas reinhardtii
     272        224  Streptococcus thermophilus (strain ATCC BAA-250 / LMG 18311)
     273        222  Francisella tularensis subsp. tularensis
     274        222  Streptococcus thermophilus (strain CNRZ 1066)
     275        221  Bdellovibrio bacteriovorus
     276        217  Escherichia coli O6:K15:H31 (strain 536 / UPEC)
     277        217  Porphyra purpurea
     278        216  Psychrobacter arcticum
     279        213  Caenorhabditis briggsae
     280        212  Klebsiella pneumoniae
     281        212  Pelobacter carbinolicus (strain DSM 2380 / Gra Bd 1)
     282        212  Nitrobacter winogradskyi (strain Nb-255 / ATCC 25391)
     283        211  Felis silvestris catus (Cat)
     284        209  Cricetulus griseus (Chinese hamster)
     285        209  Gibberella zeae (Fusarium graminearum)
     286        209  Lactobacillus acidophilus
     287        209  Treponema denticola
     288        209  Rhodospirillum rubrum (strain ATCC 11170 / NCIB 8255)
     289        207  Porphyra yezoensis
     290        203  Nitrosospira multiformis (strain ATCC 25196 / NCIMB 11849)
     291        202  Burkholderia thailandensis (strain E264 / ATCC 700388 / DSM 13276 / CIP 106301)
     292        202  Mesocricetus auratus (Golden hamster)
     293        200  Vaccinia virus (strain Western Reserve / WR) (VACV)
     294        200  Thiomicrospira crunogena (strain XCL-2)


   
   3.3  Taxonomic distribution of the sequences


   Kingdom        sequences (% of the database)
    Archaea           10967 (  4%)
    Bacteria         130080 ( 48%)
    Eukaryota        117170 ( 44%)
    Viruses           11076 (  4%)


   Within Eukaryota:


    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                  16603 ( 14%)           (  6%)
     Other Mammalia         36858 ( 31%)           ( 14%)
     Other Vertebrata       10970 (  9%)           (  4%)
     Viridiplantae          19863 ( 17%)           (  7%)
     Fungi                  17340 ( 15%)           (  6%)
     Insecta                 4888 (  4%)           (  2%)
     Nematoda                3486 (  3%)           (  1%)
     Other                   7162 (  6%)           (  3%)
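
The percentages in both tables are whole-number roundings of simple ratios: each kingdom against the complete database, and each eukaryotic category against both the Eukaryota total and the complete database. The Python sketch below recomputes them as an illustration; the names KINGDOMS, EUKARYOTA and percent are hypothetical, and the counts are copied from the tables above.

```python
TOTAL_ENTRIES = 269293

KINGDOMS = {
    "Archaea": 10967,
    "Bacteria": 130080,
    "Eukaryota": 117170,
    "Viruses": 11076,
}

EUKARYOTA = {
    "Human": 16603,
    "Other Mammalia": 36858,
    "Other Vertebrata": 10970,
    "Viridiplantae": 19863,
    "Fungi": 17340,
    "Insecta": 4888,
    "Nematoda": 3486,
    "Other": 7162,
}


def percent(count, total):
    """Whole-number percentage, as printed in the tables."""
    return round(100 * count / total)


for name, count in KINGDOMS.items():
    print(f"{name:<18}{count:>7} ({percent(count, TOTAL_ENTRIES):>3}%)")

eukaryota_total = KINGDOMS["Eukaryota"]
for name, count in EUKARYOTA.items():
    within = percent(count, eukaryota_total)
    overall = percent(count, TOTAL_ENTRIES)
    print(f"{name:<18}{count:>7} ({within:>3}%) ({overall:>3}%)")
```

The kingdom counts add up to the total number of entries, and the eukaryotic categories add up to the Eukaryota count.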


4.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50    5095             1001-1100     2259
                 51- 100   19534             1101-1200     1515
                101- 150   27896             1201-1300     1220
                151- 200   26732             1301-1400     1003
                201- 250   26947             1401-1500      838
                251- 300   23343             1501-1600      417
                301- 350   23199             1601-1700      330
                351- 400   21643             1701-1800      284
                401- 450   17008             1801-1900      255
                451- 500   14711             1901-2000      222
                501- 550   11081             2001-2100      135
                551- 600    7644             2101-2200      187
                601- 650    6459             2201-2300      178
                651- 700    4429             2301-2400      116
                701- 750    3737             2401-2500       94
                751- 800    2996             >2500          703
                801- 850    2516
                851- 900    2629
                901- 950    1992
                951-1000    1570


   The average sequence length in UniProtKB/Swiss-Prot is 367 amino acids.

   The shortest sequence is   GWA_SEPOF (P83570):     2 amino acids.
   The longest sequence is  TITIN_HUMAN (Q8WZ42): 34350 amino acids.
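
The average length is consistent with dividing the total number of amino acids by the total number of entries, both given in the introduction. The Python sketch below assumes that this is how the figure is derived; the source does not state the method, and the variable names are hypothetical.

```python
# Totals for release 53.0, copied from the introduction.
TOTAL_AMINO_ACIDS = 98902758
TOTAL_ENTRIES = 269293

# Assumption: the average is taken over all entries in the release.
average_length = TOTAL_AMINO_ACIDS / TOTAL_ENTRIES
print(round(average_length))
```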


5.  JOURNAL CITATIONS

   Note: the following citation statistics reflect the number of distinct
         journal citations.

   Total number of journals cited in this release of UniProtKB/Swiss-Prot: 1816


   5.1 Table of the frequency of journal citations

        Journals cited 1x:  631
                       2x:  231
                       3x:  136
                       4x:   93
                       5x:   68
                       6x:   49
                       7x:   39
                       8x:   31
                       9x:   32
                      10x:   16
                  11- 20x:  139
                  21- 50x:  147
                  51-100x:   73
                    >100x:  131


   5.2  List of the most cited journals in UniProtKB/Swiss-Prot

   Nb    Citations   Journal name
   --    ---------   -------------------------------------------------------------
    1        14911   Journal of Biological Chemistry
    2         7108   Proceedings of the National Academy of Sciences of the U.S.A.
    3         4518   Journal of Bacteriology
    4         4231   Gene
    5         4078   Nucleic Acids Research
    6         3837   Biochemical and Biophysical Research Communications
    7         3551   FEBS Letters
    8         3308   Biochemistry
    9         3269   The EMBO Journal
   10         2943   European Journal of Biochemistry
   11         2815   Nature
   12         2782   Molecular and Cellular Biology
   13         2673   Biochimica et Biophysica Acta
   14         2530   Journal of Molecular Biology
   15         2307   Genomics
   16         2279   Cell
   17         1851   Biochemical Journal
   18         1754   Science
   19         1500   Molecular Microbiology
   20         1356   Journal of Virology
   21         1353   Plant Molecular Biology
   22         1294   Journal of Cell Biology
   23         1271   Molecular and General Genetics
   24         1132   Virology
   25         1103   Human Molecular Genetics
   26         1076   Journal of Biochemistry
   27         1073   Genes and Development
   28         1072   Nature Genetics
   29          973   Plant Physiology
   30          970   Oncogene
   31          938   The American Journal of Human Genetics
   32          832   Human Mutation
   33          813   Development
   34          789   Journal of Immunology
   35          769   Infection and Immunity
   36          757   Genetics
   37          741   Structure
   38          696   Yeast
   39          694   Archives of Biochemistry and Biophysics
   40          688   Molecular Biology of the Cell
   41          679   Journal of General Virology
   42          627   Microbiology
   43          583   Blood
   44          574   The Plant Cell
   45          564   FEMS Microbiology Letters
   46          544   Nature Structural Biology
   47          532   Molecular Cell
   48          510   Journal of Cell Science
   49          504   Human Genetics
   50          503   Developmental Biology
   51          501   Cancer Research
   52          493   Current Genetics
   53          474   Mechanisms of Development
   54          470   The Plant Journal
   55          446   Applied and Environmental Microbiology
   56          438   Current Biology
   57          432   Protein Science
   58          430   Acta Crystallographica, Section D
   59          429   Neuron
   60          423   Mammalian Genome
   61          421   Journal of Clinical Investigation
   62          403   Molecular and Biochemical Parasitology
   63          402   Journal of Neuroscience
   64          392   Molecular Endocrinology
   65          384   The Journal of Experimental Medicine
   66          372   Immunogenetics
   67          348   Journal of Neurochemistry
   68          347   Journal of Molecular Evolution
   69          342   DNA and Cell Biology
   70          339   Endocrinology
   71          324   Toxicon
   72          323   DNA Sequence
   73          311   The Journal of Clinical Endocrinology and Metabolism
   74          309   American Journal of Physiology
   75          295   Molecular Biology and Evolution
   76          292   Brain Research. Molecular Brain Research
   77          286   Biological Chemistry Hoppe-Seyler
   78          284   Bioscience, Biotechnology, and Biochemistry
   79          252   Cytogenetics and Cell Genetics
   80          246   Comparative Biochemistry and Physiology
   81          244   Proteins
   82          242   Journal of General Microbiology
   83          242   Journal of Medical Genetics
   84          225   Peptides
   85          221   Molecular Pharmacology
   86          219   Antimicrobial Agents and Chemotherapy
   87          215   Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
   88          208   Journal of Investigative Dermatology
   89          205   Biology of Reproduction
   90          202   Nature Cell Biology
   91          196   Genome Research
   92          196   Plant and Cell Physiology
   93          189   Virus Research
   94          183   DNA Research
   95          181   Molecular Plant-Microbe Interactions
   96          177   Experimental Cell Research
   97          176   European Journal of Immunology
   98          169   RNA
   99          166   Biochimie
  100          166   Neurology
  101          160   Developmental Dynamics
  102          159   Tissue Antigens
  103          158   DNA
  104          152   Molecular and Cellular Endocrinology
  105          149   American Journal of Medical Genetics
  106          149   Molecular Phylogenetics and Evolution
  107          149   Hemoglobin
  108          145   Bioorganicheskaia Khimiia
  109          144   European Journal of Human Genetics
  110          143   Genes to Cells
  111          142   Annals of Neurology
  112          141   Archives of Microbiology
  113          140   Planta
  114          137   Journal of Human Genetics
  115          135   Insect Biochemistry and Molecular Biology
  116          131   Immunity
  117          128   Developmental Cell
  118          125   Animal Genetics
  119          122   Molecular Reproduction and Development
  120          120   Diabetes
  121          118   Agricultural and Biological Chemistry
  122          118   General and Comparative Endocrinology
  123          116   Glycobiology
  124          116   Investigative Ophthalmology and Visual Science
  125          112   Molecular Immunology
  126          110   The New England Journal of Medicine
  127          106   Molecular and Cellular Neuroscience
  128          106   Journal of Protein Chemistry
  129          102   Eukaryotic Cell
  130          102   Archives of Virology
  131          101   British Journal of Haematology


6.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/Swiss-Prot lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.
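
The "Average per entry" column appears to be the total number of lines of a given type divided by the total number of entries in the release (269293), not by the number of entries that contain such a line. The Python sketch below works under that assumption, which reproduces the printed values for the rows checked here; the function name average_per_entry is hypothetical.

```python
TOTAL_ENTRIES = 269293


def average_per_entry(total_lines, total_entries=TOTAL_ENTRIES):
    """Format the average as in the table: two decimals, or '<0.01' when it rounds to zero."""
    text = f"{total_lines / total_entries:.2f}"
    return "<0.01" if text == "0.00" else text


# Line totals copied from the table in this section.
for label, total in [
    ("References (RL)", 535419),
    ("Journal", 466313),
    ("Thesis", 388),
    ("Comments (CC)", 1094051),
]:
    print(f"{label:<20}{total:>9}  {average_per_entry(total):>6}")
```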

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                     535419              1.99
   Journal                          466313    240501    1.73
   Submitted to EMBL/GenBank/DDBJ    64038     56191    0.24
   Submitted to Swiss-Prot            2131      2080    0.01
   Submitted to other databases        656       643   <0.01
   Unpublished observations            634       628   <0.01
   Book citation                       578       568   <0.01
   Plant Gene Register                 537       525   <0.01
   Thesis                              388       386   <0.01
   Patent                              138       136   <0.01
   Worm Breeder's Gazette                6         6   <0.01

Comments (CC)                      1094051              4.06
   SIMILARITY                       308814    246723    1.15
   FUNCTION                         190370    183685    0.71
   SUBCELLULAR LOCATION             148332    148332    0.55
   SUBUNIT                          101315    101315    0.38
   CATALYTIC ACTIVITY               100585     92207    0.37
   PATHWAY                           53586     45379    0.20
   COFACTOR                          40430     36241    0.15
   TISSUE SPECIFICITY                25652     25652    0.10
   MISCELLANEOUS                     22399     20186    0.08
   PTM                               21807     17661    0.08
   DOMAIN                            17056     14702    0.06
   ALTERNATIVE PRODUCTS              12219     12219    0.05
   INTERACTION                        8011      8011    0.03
   INDUCTION                          7327      7327    0.03
   SEQUENCE CAUTION                   6991      6991    0.03
   DEVELOPMENTAL STAGE                6431      6431    0.02
   ENZYME REGULATION                  4713      4713    0.02
   WEB RESOURCE                       4182      3404    0.02
   DISEASE                            3728      2674    0.01
   CAUTION                            3445      3371    0.01
   MASS SPECTROMETRY                  2868      2347    0.01
   BIOPHYSICOCHEMICAL PROPERTIES      1736      1736    0.01
   POLYMORPHISM                        600       584   <0.01
   RNA EDITING                         484       484   <0.01
   ALLERGEN                            421       421   <0.01
   TOXIC DOSE                          326       321   <0.01
   BIOTECHNOLOGY                       150       150   <0.01
   PHARMACEUTICAL                       73        73   <0.01

Features (FT)                      1810156              6.72
   CHAIN                            274122    265689    1.02
   TRANSMEM                         173694     38311    0.65
   METAL                            114900     28257    0.43
   CONFLICT                          94113     32564    0.35
   STRAND                            92328      8673    0.34
   DOMAIN                            89923     50410    0.33
   HELIX                             88733      9109    0.33
   TOPO_DOM                          87932     17895    0.33
   CARBOHYD                          77812     19639    0.29
   DISULFID                          75787     19361    0.28
   BINDING                           70187     24266    0.26
   MOD_RES                           66122     25262    0.25
   ACT_SITE                          64096     37057    0.24
   REPEAT                            58439      8843    0.22
   VARIANT                           46452      9660    0.17
   NP_BIND                           42707     30693    0.16
   REGION                            41005     21872    0.15
   COMPBIAS                          26848     15541    0.10
   VAR_SEQ                           26720     11548    0.10
   SIGNAL                            25239     25229    0.09
   TURN                              23767      7394    0.09
   MUTAGEN                           20170      4905    0.07
   ZN_FING                           19737      7617    0.07
   MOTIF                             18802     12363    0.07
   SITE                              16382      9270    0.06
   INIT_MET                          11205     11205    0.04
   NON_TER                           10727      8231    0.04
   COILED                            10588      6900    0.04
   PROPEP                             8180      6890    0.03
   LIPID                              7976      5129    0.03
   DNA_BIND                           7151      6620    0.03
   PEPTIDE                            6760      4226    0.03
   TRANSIT                            4711      4664    0.02
   CA_BIND                            2746      1143    0.01
   CROSSLNK                           2069      1416    0.01
   NON_CONS                           1297       534   <0.01
   UNSURE                              477       185   <0.01
   SE_CYS                              252       183   <0.01

Cross-references (DR)              4033012             14.98
   InterPro                         651509    247238    2.42
   EMBL                             511792    260808    1.90
   Pfam                             344308    240530    1.28
   GO                               342677    138336    1.27
   PROSITE                          254039    154834    0.94
   Gene3D                           236303    170230    0.88
   KEGG                             183185    165599    0.68
   GenomeReviews                    154937    138089    0.58
   HAMAP                            107598    107477    0.40
   TIGRFAMs                         106041     99200    0.39
   PANTHER                          103373     92186    0.38
   PIR                               99319     92736    0.37
   PRINTS                            95064     75266    0.35
   HSSP                              81207     81207    0.30
   SMART                             80636     61451    0.30
   ProDom                            72914     70542    0.27
   BioCyc                            72868     67334    0.27
   UniGene                           64399     59174    0.24
   Ensembl                           54123     54101    0.20
   GermOnline                        42029     41413    0.16
   PDB                               38670     10525    0.14
   ArrayExpress                      37093     37093    0.14
   PIRSF                             36473     33978    0.14
   SMR                               36314     36314    0.13
   RZPD-ProtExp                      28321     13310    0.11
   TIGR                              24163     23538    0.09
   LinkHub                           17851     17834    0.07
   HGNC                              16065     15996    0.06
   IntAct                            14505     14505    0.05
   MGI                               13185     13140    0.05
   MIM                               13142     10642    0.05
   DIP                                8831      8781    0.03
   SGD                                6236      6149    0.02
   CYGD                               6224      6135    0.02
   RGD                                5989      5985    0.02
   TAIR                               5775      5677    0.02
   MEROPS                             5482      5173    0.02
   EcoGene                            4311      4308    0.02
   EchoBASE                           4158      4126    0.02
   H-InvDB                            3677      3659    0.01
   WormPep                            3652      3028    0.01
   FlyBase                            3313      3189    0.01
   WormBase                           3304      3222    0.01
   GeneDB_Spombe                      3221      3186    0.01
   Gramene                            3075      3075    0.01
   TRANSFAC                           2884      2589    0.01
   SubtiList                          2795      2794    0.01
   Reactome                           2707      1546    0.01
   Orphanet                           2513      1615    0.01
   GeneFarm                           1854      1835    0.01
   DrugBank                           1826       502    0.01
   StyGene                            1589      1585    0.01
   HPA                                1486      1324    0.01
   TubercuList                        1446      1410    0.01
   ZFIN                               1303      1291   <0.01
   SWISS-2DPAGE                       1181      1181   <0.01
   PseudoCAP                          1164      1155   <0.01
   ListiList                          1093      1085   <0.01
   REPRODUCTION-2DPAGE                 834       834   <0.01
   Leproma                             636       633   <0.01
   AGD                                 627       621   <0.01
   PhotoList                           602       602   <0.01
   LegioList                           564       564   <0.01
   MaizeGDB                            458       453   <0.01
   OGP                                 379       378   <0.01
   PeroxiBase                          377       366   <0.01
   REBASE                              364       358   <0.01
   HIV                                 361       351   <0.01
   ECO2DBASE                           351       299   <0.01
   SagaList                            349       348   <0.01
   DictyBase                           347       344   <0.01
   GlycoSuiteDB                        282       282   <0.01
   PHCI-2DPAGE                         241       241   <0.01
   MypuList                            193       193   <0.01
   DOSAC-COBS-2DPAGE                   149       147   <0.01
   Aarhus/Ghent-2DPAGE                 128        98   <0.01
   Siena-2DPAGE                        103       103   <0.01
   HSC-2DPAGE                           85        85   <0.01
   PhosSite                             70        70   <0.01
   Cornea-2DPAGE                        67        67   <0.01
   COMPLUYEAST-2DPAGE                   59        59   <0.01
   euHCVdb                              55        44   <0.01
   PMMA-2DPAGE                          52        52   <0.01
   PptaseDB                             29        29   <0.01
   Rat-heart-2DPAGE                     28        28   <0.01
   ANU-2DPAGE                           25        25   <0.01
   BuruList                              5         5   <0.01

Number of explicitly cross-referenced databases: 88
Number of implicitly cross-referenced databases: 26
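
The "Average per entry" column above appears to be the total number of lines
divided by the number of entries in the release (269293 for UniProtKB/Swiss-Prot
release 53.0). The Python sketch below reproduces the four top-level averages
under that assumption. The names SWISS_PROT_ENTRIES, line_totals and
average_per_entry are illustrative only and are not part of any UniProt tool.

```python
# Entry count for UniProtKB/Swiss-Prot release 53.0, as stated in these notes.
SWISS_PROT_ENTRIES = 269293

# Total line counts copied from the top-level rows of the table above.
line_totals = {
    "References (RL)": 535419,
    "Comments (CC)": 1094051,
    "Features (FT)": 1810156,
    "Cross-references (DR)": 4033012,
}


def average_per_entry(total_lines, entries=SWISS_PROT_ENTRIES):
    """Return the mean number of lines per entry, rounded to two decimals.

    Assumption: the published average is simply total / number of entries.
    """
    return round(total_lines / entries, 2)


averages = {name: average_per_entry(total) for name, total in line_totals.items()}

for name, value in averages.items():
    print(f"{name:<24}{value:>6.2f}")
```

Under this assumption the computed values agree with the published column
(1.99, 4.06, 6.72 and 14.98).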


7.  MISCELLANEOUS STATISTICS

Total number of distinct authors cited in UniProtKB/Swiss-Prot: 241395

Total number of entries encoded on a Mitochondrion: 4307
Total number of entries encoded on a Plasmid: 3324
Total number of entries encoded on a Plastid: 26
Total number of entries encoded on a Plastid; Apicoplast: 6
Total number of entries encoded on a Plastid; Chloroplast: 8139
Total number of entries encoded on a Plastid; Cyanelle: 145
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 91

Number of fragments: 8376 
Number of additional sequences produced by alternative splicing, initiation or promoter usage: 20101


UniProtKB/TrEMBL protein database release 36.0 statistics


1.  INTRODUCTION

Release 36.0 of 29-May-2007 of UniProtKB/TrEMBL contains 4377315 sequence entries
comprising 1418480772 amino acids.

635321 sequences have been added since release 35; the sequence data of
6733 existing entries has been updated, and the annotations of
2544163 entries have been revised. This represents an increase of 17%.
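
The 17% figure is consistent with dividing the number of added sequences by the
approximate size of the previous release. The Python sketch below shows that
arithmetic. It assumes the previous release size can be estimated as the current
total minus the added sequences, which ignores any entries deleted or merged
between releases; the variable names are illustrative.

```python
current_entries = 4377315  # entries in UniProtKB/TrEMBL release 36.0
added_entries = 635321     # sequences added since release 35

# Assumption: previous size is approximated as current minus added,
# ignoring entries that were deleted or merged between the two releases.
previous_entries = current_entries - added_entries

growth_percent = 100 * added_entries / previous_entries
print(f"Approximate growth: {growth_percent:.1f}%")
```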



2.  AMINO ACID COMPOSITION

   2.1  Composition in percent for the complete database

   Ala (A) 8.60   Gln (Q) 3.93   Leu (L) 9.87   Ser (S) 6.80
   Arg (R) 5.59   Glu (E) 6.03   Lys (K) 5.16   Thr (T) 5.60
   Asn (N) 4.19   Gly (G) 7.07   Met (M) 2.40   Trp (W) 1.33
   Asp (D) 5.26   His (H) 2.22   Phe (F) 4.03   Tyr (Y) 3.00
   Cys (C) 1.35   Ile (I) 5.89   Pro (P) 4.86   Val (V) 6.66

   Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.05


   2.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Ser, Val, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Cys, Trp
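
The ranking in 2.2 follows directly from the percentages in 2.1. As a minimal
sketch, the Python snippet below sorts the twenty standard amino acids by their
listed percentage, in descending order. The dictionary name composition is
illustrative; the values are copied from section 2.1.

```python
# Percent composition copied from section 2.1 (standard amino acids only).
composition = {
    "Ala": 8.60, "Gln": 3.93, "Leu": 9.87, "Ser": 6.80,
    "Arg": 5.59, "Glu": 6.03, "Lys": 5.16, "Thr": 5.60,
    "Asn": 4.19, "Gly": 7.07, "Met": 2.40, "Trp": 1.33,
    "Asp": 5.26, "His": 2.22, "Phe": 4.03, "Tyr": 3.00,
    "Cys": 1.35, "Ile": 5.89, "Pro": 4.86, "Val": 6.66,
}

# Sort residue names from most to least frequent.
ranking = sorted(composition, key=composition.get, reverse=True)

print(", ".join(ranking))
```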


3.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 133843

   The first twenty species represent  811024 sequences:  18.5 % of the
   total number of entries.


   3.1 Table of the frequency of occurrence of species

        Species represented 1x:61175
                            2x:25028
                            3x:12878
                            4x: 7213
                            5x: 4220
                            6x: 3118
                            7x: 2264
                            8x: 1889
                            9x: 1517
                           10x: 1615
                       11- 20x: 7245
                       21- 50x: 2799
                       51-100x: 1134
                         >100x: 1748

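   As a consistency check on table 3.1, the counts in the individual frequency
   classes should add up to the total number of species given at the start of
   this section (133843). The Python sketch below performs that sum and also
   derives the share of species represented by a single sequence. The names
   used are illustrative.

```python
TOTAL_SPECIES = 133843  # total stated at the start of section 3

# Number of species per frequency class, copied from table 3.1.
species_per_class = {
    "1x": 61175, "2x": 25028, "3x": 12878, "4x": 7213, "5x": 4220,
    "6x": 3118, "7x": 2264, "8x": 1889, "9x": 1517, "10x": 1615,
    "11-20x": 7245, "21-50x": 2799, "51-100x": 1134, ">100x": 1748,
}

class_sum = sum(species_per_class.values())
singleton_share = 100 * species_per_class["1x"] / TOTAL_SPECIES

print(f"Sum of classes: {class_sum}")
print(f"Species with a single sequence: {singleton_share:.1f}%")
```
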

   3.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     183943  Human immunodeficiency virus 1
       2      95950  Oryza sativa subsp. japonica (Rice)
       3      51267  Homo sapiens (Human)
       4      50189  Trichomonas vaginalis G3
       5      49978  Mus musculus (Mouse)
       6      44092  Arabidopsis thaliana (Mouse-ear cress)
       7      39844  Paramecium tetraurelia
       8      38479  Oryza sativa subsp. indica (Rice)
       9      37042  Hepatitis C virus
      10      28036  Tetraodon nigroviridis (Green puffer)
      11      26966  Drosophila melanogaster (Fruit fly)
      12      22297  Medicago truncatula (Barrel medic)
      13      20231  Caenorhabditis elegans
      14      20162  Trypanosoma cruzi
      15      19623  Danio rerio (Zebrafish) (Brachydanio rerio)
      16      18255  uncultured bacterium
      17      16853  Aedes aegypti (Yellowfever mosquito)
      18      16685  Tetrahymena thermophila SB210
      19      16460  Phaeosphaeria nodorum (Septoria nodorum)
      20      14672  Plasmodium chabaudi
      21      14311  Hepatitis B virus (HBV)
      22      13528  Aspergillus niger
      23      13412  Anopheles gambiae str. PEST
      24      13071  Dictyostelium discoideum AX4
      25      13062  Caenorhabditis briggsae
      26      12905  Magnaporthe grisea (Rice blast fungus) (Pyricularia grisea)
      27      12570  Xenopus laevis (African clawed frog)
      28      12020  Aspergillus oryzae
      29      11783  Plasmodium berghei
      30      10991  Chaetomium globosum (Soil fungus)
      31      10645  Neurospora crassa
      32      10429  Coccidioides immitis
      33      10370  Neosartorya fischeri  (Aspergillus fischerianus 
      34      10360  Aspergillus terreus (strain NIH 2624)
      35      10067  Drosophila pseudoobscura (Fruit fly)
      36       9747  Cryptococcus neoformans (Filobasidiella neoformans)
      37       9721  Aspergillus fumigatus (Sartorya fumigata)
      38       9720  Schistosoma japonicum (Blood fluke)
      39       9518  Emericella nidulans (Aspergillus nidulans)
      40       9453  Trypanosoma brucei
      41       9332  Candida albicans (Yeast)
      42       9080  Aspergillus clavatus
      43       9004  Escherichia coli
      44       8987  Rhodococcus sp. (strain RHA1)
      45       8557  Rattus norvegicus (Rat)
      46       8512  Stigmatella aurantiaca DW4/3-1
      47       8424  Burkholderia xenovorans (strain LB400)
      48       8285  Bos taurus (Bovine)
      49       8249  Microscilla marina ATCC 23134
      50       8123  Bradyrhizobium japonicum
      51       8011  Leishmania infantum
      52       7975  Ostreococcus tauri
      53       7937  Frankia sp. EAN1pec
      54       7880  Leishmania braziliensis
      55       7834  Burkholderia phymatum STM815
      56       7808  Plasmodium yoelii yoelii
      57       7757  Solibacter usitatus (strain Ellin6076)
      58       7659  Helicobacter pylori (Campylobacter pylori)
      59       7522  Streptomyces coelicolor
      60       7461  Burkholderia cenocepacia MC0-3
      61       7439  Burkholderia sp. (strain 383) (Burkholderia cepacia 
      62       7432  Bradyrhizobium sp. BTAi1
      63       7409  Burkholderia vietnamiensis G4
      64       7403  Ostreococcus lucimarinus CCE9901
      65       7349  Burkholderia pseudomallei 305
      66       7310  Burkholderia phytofirmans PsJN
      67       7297  Streptomyces avermitilis
      68       7274  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      69       7215  Burkholderia pseudomallei (strain 668)
      70       7199  Myxococcus xanthus (strain DK 1622)
      71       7161  Saccharopolyspora erythraea (strain ATCC 11635 / DSM 40517 / NRRL 2338)
      72       7147  Burkholderia pseudomallei (strain 1106a)
      73       7136  Rhizobium loti (Mesorhizobium loti)
      74       7113  Hepatitis C virus subtype 1b
      75       7113  Leishmania major
      76       6996  Burkholderia ambifaria MC40-6
      77       6986  Rhizobium leguminosarum bv. viciae (strain 3841)
      78       6953  Rhodopirellula baltica
      79       6916  Agrobacterium tumefaciens (strain C58 / ATCC 33970)
      80       6870  Burkholderia cenocepacia (strain HI2424)
      81       6726  Pseudomonas aeruginosa
      82       6711  Bradyrhizobium sp. ORS278
      83       6704  Frankia alni (strain ACN14a)
      84       6679  Psychroflexus torquis ATCC 700755
      85       6595  Mycobacterium smegmatis (strain ATCC 700084 / mc(2)155)
      86       6581  Burkholderia cepacia (strain ATCC 53795 / AMMD)
      87       6564  Burkholderia multivorans ATCC 17616
      88       6553  Hahella chejuensis (strain KCTC 2396)
      89       6517  Plasmodium falciparum
      90       6501  Ralstonia eutropha  (Cupriavidus necator 
      91       6468  Ustilago maydis (Smut fungus)
      92       6411  Cyanothece sp. CCY0110
      93       6394  Giardia lamblia ATCC 50803
      94       6337  Sinorhizobium medicae WSM419
      95       6302  Burkholderia cenocepacia (strain AU 1054)
      96       6300  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz) 
      97       6272  Stappia aggregata IAM 12614
      98       6227  Oryza sativa (Rice)
      99       6172  Bacillus anthracis
     100       6170  Yarrowia lipolytica (Candida lipolytica)
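
   The statement at the start of this section that the first twenty species
   account for 811024 sequences (18.5 % of all entries) can be reproduced from
   the first twenty rows of the table above. The Python sketch below does so;
   the list name top20_counts is illustrative.

```python
TOTAL_ENTRIES = 4377315  # UniProtKB/TrEMBL release 36.0

# Frequencies of the twenty most represented species (rows 1 to 20 above).
top20_counts = [
    183943, 95950, 51267, 50189, 49978, 44092, 39844, 38479, 37042, 28036,
    26966, 22297, 20231, 20162, 19623, 18255, 16853, 16685, 16460, 14672,
]

top20_total = sum(top20_counts)
top20_share = 100 * top20_total / TOTAL_ENTRIES

print(f"{top20_total} sequences, {top20_share:.1f} % of all entries")
```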


   
   3.3  Taxonomic distribution of the sequences

   Kingdom        sequences (% of the database)
    Archaea           97220 (  2%)
    Bacteria        2327609 ( 53%)
    Eukaryota       1446119 ( 33%)
    Viruses          502656 ( 11%)
    Other              3709 ( <1%)



   Within Eukaryota:

    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                  51267 (  4%)           (  1%)
     Other Mammalia        127002 (  9%)           (  3%)
     Other Vertebrata      174804 ( 12%)           (  4%)
     Viridiplantae         355389 ( 25%)           (  8%)
     Fungi                 225852 ( 16%)           (  5%)
     Insecta               144995 ( 10%)           (  3%)
     Nematoda               37220 (  3%)           (  1%)
     Other                 329590 ( 23%)           (  8%)
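
   The two percentage columns above can be derived from the sequence counts
   alone. The Python sketch below assumes that each percentage is the count
   divided by the Eukaryota total (or by the complete database size) and rounded
   to the nearest whole number. The names are illustrative.

```python
TOTAL_ENTRIES = 4377315  # UniProtKB/TrEMBL release 36.0

# Sequence counts per category within Eukaryota, copied from the table above.
eukaryota = {
    "Human": 51267,
    "Other Mammalia": 127002,
    "Other Vertebrata": 174804,
    "Viridiplantae": 355389,
    "Fungi": 225852,
    "Insecta": 144995,
    "Nematoda": 37220,
    "Other": 329590,
}

eukaryota_total = sum(eukaryota.values())

# For each category: (% of Eukaryota, % of the complete database).
shares = {
    name: (round(100 * count / eukaryota_total),
           round(100 * count / TOTAL_ENTRIES))
    for name, count in eukaryota.items()
}

for name, (pct_euk, pct_db) in shares.items():
    print(f"{name:<18}{pct_euk:>3}% {pct_db:>3}%")
```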



4.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50   60636             1001-1100    26286
                 51- 100  301601             1101-1200    18518
                101- 150  378501             1201-1300    12788
                151- 200  360318             1301-1400     8482
                201- 250  361834             1401-1500     6934
                251- 300  346925             1501-1600     5095
                301- 350  322854             1601-1700     3905
                351- 400  254139             1701-1800     3210
                401- 450  208459             1801-1900     2411
                451- 500  175752             1901-2000     2043
                501- 550  126518             2001-2100     1627
                551- 600   93347             2101-2200     1698
                601- 650   70033             2201-2300     1317
                651- 700   54300             2301-2400     1093
                701- 750   47578             2401-2500      864
                751- 800   42478             >2500         7587
                801- 850   31364
                851- 900   27564
                901- 950   20246
                951-1000   15849

   The average sequence length in UniProtKB/TrEMBL is   324 amino acids.

   The shortest sequence is Q96AT0_HUMAN:     4 amino acids.
   The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
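
   The average length quoted above is consistent with the totals given in the
   introduction to this release (1418480772 amino acids in 4377315 entries).
   A minimal Python check, with illustrative variable names:

```python
total_amino_acids = 1418480772  # from the TrEMBL introduction
total_entries = 4377315         # from the TrEMBL introduction

average_length = total_amino_acids / total_entries
print(f"Average sequence length: {average_length:.0f} amino acids")
```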



5.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    6233829              1.42
   Submitted to EMBL/GenBank/DDBJ  3316771   2557101    0.76
   Journal                         2829436   2384972    0.65
   Thesis                             6545      6491   <0.01
   Book citation                      4240      4195   <0.01
   Submitted to other databases        273       267   <0.01
   Other                             76564     44407    0.02

Comments (CC)                      1728629              0.39
   CAUTION                          930449    930449    0.21
   SIMILARITY                       281294    276006    0.06
   FUNCTION                         120986    113447    0.03
   SUBCELLULAR LOCATION             116592    116592    0.03
   CATALYTIC ACTIVITY               111221    100470    0.03
   SUBUNIT                           78221     78221    0.02
   COFACTOR                          58468     58243    0.01
   PATHWAY                           20159     16511   <0.01
   DOMAIN                             4879      4275   <0.01
   MISCELLANEOUS                      3656      3656   <0.01
   INTERACTION                        2695      2695   <0.01
   ALLERGEN                              5         5   <0.01
   MASS SPECTROMETRY                     4         4   <0.01

Features (FT)                      1963260              0.45
   NON_TER                         1627029    970945    0.37
   CHAIN                            198930    168578    0.05
   SIGNAL                           136762    136762    0.03
   TRANSIT                             539       539   <0.01

Cross-references (DR)             32364872              7.39
   InterPro                        6595878   3152098    1.51
   GO                              5811332   2060966    1.33
   EMBL                            4964829   4369327    1.13
   Pfam                            4064279   3000740    0.93
   PROSITE                         2155488   1407184    0.49
   GenomeReviews                   1143708   1099331    0.26
   Gene3D                           933992    822246    0.21
   KEGG                             869694    832105    0.20
   PRINTS                           853606    716490    0.20
   SMART                            761643    594461    0.17
   TIGRFAMs                         666776    612627    0.15
   PANTHER                          578057    551858    0.13
   ProDom                           531615    507197    0.12
   SMR                              395863    395863    0.09
   BioCyc                           280518    265666    0.06
   HSSP                             270957    270554    0.06
   UniGene                          242685    224363    0.06
   PIR                              184223    149145    0.04
   TIGR                             171226    164627    0.04
   Ensembl                          157071    157070    0.04
   PIRSF                            154302    147467    0.04
   RZPD-ProtExp                     108875     33477    0.02
   ArrayExpress                      95585     95495    0.02
   Gramene                           70734     70734    0.02
   MGI                               42221     41755    0.01
   HGNC                              34930     34883    0.01
   euHCVdb                           32511     32511    0.01
   FlyBase                           24665     24629    0.01
   WormPep                           19286     19205   <0.01
   TAIR                              18958     18899   <0.01
   WormBase                          18790     18711   <0.01
   ZFIN                              15410     15406   <0.01
   LinkHub                           13489     13489   <0.01
   DictyBase                         12917     12917   <0.01
   MEROPS                            11642     11209   <0.01
   LegioList                          5339      5309   <0.01
   IntAct                             5322      5322   <0.01
   ListiList                          4722      4705   <0.01
   PseudoCAP                          4407      4404   <0.01
   PDB                                4323      2607   <0.01
   BuruList                           4235      4201   <0.01
   PhotoList                          4078      3954   <0.01
   AGD                                4073      4073   <0.01
   RGD                                3795      3466   <0.01
   REBASE                             3692      3667   <0.01
   TubercuList                        2543      2537   <0.01
   DIP                                2509      2504   <0.01
   GeneDB_Spombe                      1770      1758   <0.01
   SagaList                           1745      1651   <0.01
   PeroxiBase                         1371      1368   <0.01
   Leproma                             971       970   <0.01
   TRANSFAC                            872       862   <0.01
   MypuList                            589       585   <0.01
   SGD                                 375       374   <0.01
   PHCI-2DPAGE                         106       106   <0.01
   CYGD                                101        98   <0.01
   ANU-2DPAGE                           60        60   <0.01
   Reactome                             46        33   <0.01
   SWISS-2DPAGE                         37        37   <0.01
   REPRODUCTION-2DPAGE                  30        30   <0.01
   PMMA-2DPAGE                           3         3   <0.01
   Siena-2DPAGE                          2         2   <0.01
   COMPLUYEAST-2DPAGE                    1         1   <0.01

Number of explicitly cross-referenced databases: 87
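
Counts of this kind can in principle be tallied from the flat-file distribution,
in which every line begins with a two-character line code (for example RL, CC,
FT or DR) and each entry ends with a "//" line. The Python sketch below is an
illustration only and is not the procedure used to produce the tables above. The
sample entry is hypothetical and heavily abbreviated, and the function name
tally is our own. The sketch counts physical lines, so its CC count may not
correspond to the per-topic comment counts reported in this document.

```python
from collections import Counter

# Hypothetical, heavily abbreviated entry in flat-file layout; not a real record.
SAMPLE = """\
ID   EXAMPLE_HUMAN
RL   Submitted (JAN-2007) to the EMBL/GenBank/DDBJ databases.
CC   -!- FUNCTION: Hypothetical example.
CC   -!- SIMILARITY: Hypothetical example.
DR   EMBL; X00000; -; -.
DR   InterPro; IPR000000; -.
DR   InterPro; IPR000001; -.
FT   CHAIN         1    100       Example.
//
"""


def tally(flatfile_text):
    """Count lines per two-letter line code and DR lines per database name."""
    line_codes = Counter()
    databases = Counter()
    for line in flatfile_text.splitlines():
        code = line[:2]
        # Skip entry terminators and blank or continuation lines.
        if code == "//" or not code.strip():
            continue
        line_codes[code] += 1
        if code == "DR":
            # The database name is the first ';'-separated field after the code.
            databases[line[5:].split(";", 1)[0].strip()] += 1
    return line_codes, databases


line_codes, databases = tally(SAMPLE)
print(dict(line_codes))
print(dict(databases))
```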


6.  MISCELLANEOUS STATISTICS

Total number of distinct authors cited in UniProtKB/TrEMBL: 248496

Total number of entries encoded on a Mitochondrion: 167124
Total number of entries encoded on a Plasmid: 71535
Total number of entries encoded on a Plastid: 3525
Total number of entries encoded on a Plastid; Apicoplast: 183
Total number of entries encoded on a Plastid; Chloroplast: 57807
Total number of entries encoded on a Plastid; Cyanelle: 7
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 212

Number of fragments: 973161


Submissions and Updates

We welcome feedback from our users. We would especially appreciate your notifying us if you find that sequences belonging to your field of expertise are missing from the database. We also would like to be notified about annotations to be updated, if, for example, the function of a protein has been clarified or if new information about post-translational modifications has become available.

Submit new sequence data, updates and corrections at http://www.uniprot.org/support/submissions.shtml

For all queries regarding submissions to UniProtKB and to submit new protein sequence data, please contact:

UniProt Knowledgebase
The EMBL Outstation - The European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

Telephone: (+44 1223) 494 462
Telefax: (+44 1223) 494 468
E-mail:


Download information

Bi-Weekly releases

The latest data of the UniProt Knowledgebase is available in various formats (flatfile, XML or FASTA) at http://www.uniprot.org/database/download.shtml. The data is further supplemented by a file containing the sequences of all additional alternative isoforms annotated in UniProtKB/Swiss-Prot. This data set is documented in the file ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/README.varsplic

Major releases

For users who wish to download the UniProt Knowledgebase only occasionally, we distribute the latest major release (updated 3 times per year) in flatfile format. Previous UniProtKB/Swiss-Prot and UniProtKB/TrEMBL releases are archived under ftp://ftp.uniprot.org/pub/databases/uniprot/previous_major_releases. The UniProt Knowledgebase major release is also available on DVD from the EBI.


Contact

EMBL Outstation
European Bioinformatics Institute (EBI)
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

Telephone: (+44 1223) 494 444
Fax: (+44 1223) 494 468
Electronic mail address: /
WWW server: http://www.ebi.ac.uk/


Swiss Institute of Bioinformatics (SIB)
Centre Medical Universitaire
1, rue Michel Servet
1211 Geneva 4
Switzerland

Telephone: (+41 22) 379 50 50
Fax: (+41 22) 379 58 58
Electronic mail address:
WWW server: http://www.expasy.org/


Protein Information Resource (PIR)
Georgetown University Medical Center
3300 Whitehaven St., Suite 1200
Washington, DC 20008
United States of America

Telephone: (+1 202) 687 1039
Fax: (+1 202) 687 0057
Electronic mail address:
WWW server: http://pir.georgetown.edu

Citation

If you want to cite UniProt in a publication, please use the following reference:

The UniProt Consortium
"The Universal Protein Resource (UniProt)"
Nucleic Acids Res. 35:D193-D197(2007) doi:10.1093/nar/gkl929