UniProt Knowledgebase
Swiss-Prot Protein Knowledgebase
TrEMBL Protein Database

Release notes
UniProtKB release 13.0 of 26-Feb-2008

Content

  Introduction
  UniProtKB/Swiss-Prot Protein Knowledgebase release statistics
  UniProtKB/TrEMBL Protein Database release statistics

  Submissions and Updates
  Download information
  Contact
  Citation

  Related documents: UniProtKB user manual, Recent changes, Forthcoming changes.

Introduction

Release 13.0 of the UniProt Knowledgebase is composed of the UniProtKB/Swiss-Prot Protein Knowledgebase release 55.0 and the UniProtKB/TrEMBL Protein Database release 38.0.

More information on these databases can be found in the user manual section "What is the UniProt Knowledgebase?".


UniProtKB/Swiss-Prot protein knowledgebase release 55.0 statistics

Release 55.0 of 26-Feb-08 of UniProtKB/Swiss-Prot contains 356'194 sequence entries, comprising 127'836'513 amino acids abstracted from 165'776 references.

The growth of the database is summarized below.

  -------  -----  -----------------  ---------------------
  Release  Date   Number of entries  Number of amino acids
  -------  -----  -----------------  ---------------------
      2.0  09/86              3'939                900'163
      3.0  11/86              4'160                969'641
      4.0  04/87              4'387              1'036'010
      5.0  09/87              5'205              1'327'683
      6.0  01/88              6'102              1'653'982
      7.0  04/88              6'821              1'885'771
      8.0  08/88              7'724              2'224'465
      9.0  11/88              8'702              2'498'140
     10.0  03/89             10'008              2'952'613
     11.0  07/89             10'856              3'265'966
     12.0  10/89             12'305              3'797'482
     13.0  01/90             13'837              4'347'336
     14.0  04/90             15'409              4'914'264
     15.0  08/90             16'941              5'486'399
     16.0  11/90             18'364              5'986'949
     17.0  02/91             20'024              6'524'504
     18.0  05/91             20'772              6'792'034
     19.0  08/91             21'795              7'173'785
     20.0  11/91             22'654              7'500'130
     21.0  03/92             23'742              7'866'596
     22.0  05/92             25'044              8'375'696
     23.0  08/92             26'706              9'011'391
     24.0  12/92             28'154              9'545'427
     25.0  04/93             29'955             10'214'020
     26.0  07/93             31'808             10'875'091
     27.0  10/93             33'329             11'484'420
     28.0  02/94             36'000             12'496'420
     29.0  06/94             38'303             13'464'008
     30.0  10/94             40'292             14'147'368
     31.0  02/95             43'470             15'335'248
     32.0  11/95             49'340             17'385'503
     33.0  02/96             52'205             18'531'384
     34.0  10/96             59'021             21'210'389
     35.0  11/97             69'113             25'083'768
     36.0  07/98             74'019             26'840'295
     37.0  12/98             77'977             28'268'293
     38.0  07/99             80'000             29'085'965
     39.0  05/00             86'593             31'411'114
     40.0  10/01            101'602             37'315'215
     41.0  02/03            122'564             44'986'459
     42.0  10/03            135'850             50'046'799
     43.0  03/04            146'720             54'093'154
     44.0  07/04            153'871             56'608'159
     45.0  10/04            163'235             59'631'787
     46.0  02/05            168'297             61'443'278
     47.0  05/05            181'577             65'746'672
     48.0  09/05            194'317             70'391'852
     49.0  02/06            207'132             75'438'310
     50.0  05/06            222'289             81'585'146
     51.0  10/06            241'242             88'541'632
     52.0  03/07            261'513             95'638'062
     53.0  05/07            269'293             98'902'758
     54.0  07/07            276'256            101'466'206
     55.0  02/08            356'194            127'836'513
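
The counts in the growth table use an apostrophe as the thousands separator, which most numeric parsers do not accept directly. The following is a minimal sketch of one way to read such values and derive the average sequence length per entry. The helper name parse_count and the choice of the first and last table rows are illustrative assumptions, not part of any UniProt tool.

```python
def parse_count(text):
    """Convert a count written with apostrophe separators, e.g. "3'939", to int."""
    return int(text.replace("'", ""))


# (release, entries, amino acids) copied from the first and last table rows.
rows = [
    ("2.0", "3'939", "900'163"),
    ("55.0", "356'194", "127'836'513"),
]

# Average number of residues per entry for each selected release.
mean_length = {}
for release, entries, residues in rows:
    mean_length[release] = parse_count(residues) / parse_count(entries)
    print(f"release {release}: {mean_length[release]:.1f} residues per entry")
```

The same loop can be applied to every row of the table to follow how the average entry length has changed over time.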

In rare cases, UniProtKB/Swiss-Prot entries are removed. Deleted entries are almost exclusively Open Reading Frames (ORFs) that were wrongly predicted to code for proteins. When there is enough evidence that these hypothetical proteins are not real, we remove them from UniProtKB/Swiss-Prot. The document delac_sp.txt lists all accession numbers that were previously present in UniProtKB/Swiss-Prot but have since been deleted from the database.
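
A typical use of the deleted-accession list is to check whether an accession number held locally is still valid. The sketch below shows one possible approach. It assumes that the file contains one accession number per line, surrounded by free-text header lines, and that accession numbers follow the six-character format. The function name read_deleted_accessions is hypothetical, and the sample accessions are placeholders rather than actual deletions.

```python
import io
import re

# Pattern for six-character UniProtKB accession numbers.
ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9][A-Z][A-Z0-9]{2}[0-9]"
)


def read_deleted_accessions(handle):
    """Collect every line that consists of exactly one accession number.

    Assumption: one accession per line; any other line (titles, blank
    lines, explanatory text) is skipped.
    """
    deleted = set()
    for line in handle:
        token = line.strip()
        if ACCESSION.fullmatch(token):
            deleted.add(token)
    return deleted


# Stand-in for an open delac_sp.txt; the accessions below are placeholders.
sample = io.StringIO(
    "Deleted accession numbers (illustrative excerpt)\n"
    "\n"
    "P00001\n"
    "Q9ZZZ9\n"
)
deleted = read_deleted_accessions(sample)

print("P00001" in deleted)
print("P99999" in deleted)
```

With a local copy of the file, the io.StringIO stand-in would be replaced by an ordinary open("delac_sp.txt") call.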


Status of the model organisms

We have selected a number of organisms that are the target of genome sequencing and/or mapping projects and for which we intend to:

Our efforts to annotate human sequence entries as completely as possible gave rise to the HPI project, and the bacterial model organisms became the focus of the HAMAP project. The current status of the model organisms that are not covered by these two projects is as follows:

  --------------  -------------------------  ------------  -------------------
  Organism        Database cross-references  Index file    Number of sequences
  --------------  -------------------------  ------------  -------------------
  A.thaliana      TAIR                       arath.txt                   6'461
  C.albicans      None yet                   calbican.txt                  679
  C.elegans       Wormpep                    celegans.txt                3'153
  D.discoideum    DictyBase                  dicty.txt                     587
  D.melanogaster  FlyBase                    fly.txt                     2'747
  M.musculus      MGD                        mgdtosp.txt                15'015
  S.cerevisiae    SGD                        yeast.txt                   6'556
  S.pombe         GeneDB_SPombe              pombe.txt                   4'198

UniProtKB/Swiss-Prot release statistics
1.  INTRODUCTION

Release 55.0 of 26-Feb-08 of UniProtKB/Swiss-Prot contains 356194 sequence entries,
comprising 127836513 amino acids abstracted from 165776 references. 

80183 sequences have been added since release 54.0, the sequence data of
1411 existing entries has been updated and the annotations of
262009 entries have been revised.
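
The figures above can be reconciled with the entry count of release 54.0 (276256, taken from the growth table). The sketch below works through the arithmetic. Reading the shortfall as entries deleted or merged since release 54.0 is an assumption, not a statement from these notes, and the variable names are illustrative.

```python
entries_prev = 276256    # release 54.0, from the growth table
entries_now = 356194     # release 55.0
added = 80183            # sequences added since release 54.0
annot_revised = 262009   # entries with revised annotations

# Entry count expected if no entry had left the database.
expected = entries_prev + added

# Assumption: the shortfall corresponds to entries deleted or merged.
shortfall = expected - entries_now

# Shares of the current release that are new or have revised annotations.
share_new = 100 * added / entries_now
share_revised = 100 * annot_revised / entries_now

print(f"expected {expected}, actual {entries_now}, shortfall {shortfall}")
print(f"new entries: {share_new:.1f} %")
print(f"revised annotations: {share_revised:.1f} %")
```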



2.  AMINO ACID COMPOSITION

   2.1  Composition in percent for the complete database

   Ala (A) 8.09   Gln (Q) 3.95   Leu (L) 9.67   Ser (S) 6.69
   Arg (R) 5.49   Glu (E) 6.72   Lys (K) 5.89   Thr (T) 5.36
   Asn (N) 4.05   Gly (G) 7.02   Met (M) 2.41   Trp (W) 1.10
   Asp (D) 5.40   His (H) 2.29   Phe (F) 3.89   Tyr (Y) 2.95
   Cys (C) 1.43   Ile (I) 5.90   Pro (P) 4.79   Val (V) 6.81

   Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.00


   2.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Glu, Ser, Ile, Lys, Arg, Asp, Thr, Pro, Asn, Gln,
   Phe, Tyr, Met, His, Cys, Trp
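
   The ordering in 2.2 follows directly from the percentages in 2.1. As a
   minimal sketch, the ranking can be reproduced by sorting the composition
   values in descending order. The dictionary below is transcribed from 2.1,
   and the variable names are illustrative.

```python
# Composition in percent, transcribed from table 2.1.
composition = {
    "Ala": 8.09, "Gln": 3.95, "Leu": 9.67, "Ser": 6.69,
    "Arg": 5.49, "Glu": 6.72, "Lys": 5.89, "Thr": 5.36,
    "Asn": 4.05, "Gly": 7.02, "Met": 2.41, "Trp": 1.10,
    "Asp": 5.40, "His": 2.29, "Phe": 3.89, "Tyr": 2.95,
    "Cys": 1.43, "Ile": 5.90, "Pro": 4.79, "Val": 6.81,
}

# Most frequent residue first; all twenty values are distinct, so no ties.
ranking = sorted(composition, key=composition.get, reverse=True)

print(", ".join(ranking))
print(f"sum of percentages: {sum(composition.values()):.2f}")
```

   The percentages are rounded to two decimals, so their sum is close to,
   but not exactly, 100.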


3.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/Swiss-Prot: 11290

   The twenty most represented species account for 92957 sequences, or 26.1 %
   of the total number of entries.
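
   The 26.1 % figure can be checked against the first twenty rows of table
   3.2. The sketch below copies those twenty frequencies and divides their
   sum by the total number of entries in the release. The variable names are
   illustrative.

```python
# Frequencies of the twenty most represented species, from table 3.2.
top20 = [
    18609, 15015, 6783, 6556, 6461, 4846, 4343, 4198, 3153, 2871,
    2747, 2559, 2153, 1973, 1931, 1922, 1782, 1774, 1678, 1603,
]
total_entries = 356194  # entries in release 55.0

top20_total = sum(top20)
share = 100 * top20_total / total_entries

print(f"{top20_total} sequences, {share:.1f} % of all entries")
```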


   3.1 Table of the frequency of occurrence of species

        Species represented 1x: 5244
                            2x: 1710
                            3x:  810
                            4x:  542
                            5x:  372
                            6x:  330
                            7x:  229
                            8x:  195
                            9x:  159
                           10x:   99
                       11- 20x:  478
                       21- 50x:  327
                       51-100x:  141
                         >100x:  654
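
   The bins in 3.1 should add up to the total number of species given at the
   start of this section. A minimal consistency check, with the bin labels
   used as dictionary keys (an illustrative choice), might look like this:

```python
# Number of species per frequency bin, transcribed from table 3.1.
bins = {
    "1x": 5244, "2x": 1710, "3x": 810, "4x": 542, "5x": 372,
    "6x": 330, "7x": 229, "8x": 195, "9x": 159, "10x": 99,
    "11-20x": 478, "21-50x": 327, "51-100x": 141, ">100x": 654,
}

total_species = sum(bins.values())

# Share of species that are represented by a single entry.
singleton_share = 100 * bins["1x"] / total_species

print(f"total species: {total_species}")
print(f"species represented once: {singleton_share:.1f} %")
```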


   3.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1      18609  Homo sapiens (Human)
       2      15015  Mus musculus (Mouse)
       3       6783  Rattus norvegicus (Rat)
       4       6556  Saccharomyces cerevisiae (Baker's yeast)
       5       6461  Arabidopsis thaliana (Mouse-ear cress)
       6       4846  Bos taurus (Bovine)
       7       4343  Escherichia coli (strain K12)
       8       4198  Schizosaccharomyces pombe (Fission yeast)
       9       3153  Caenorhabditis elegans
      10       2871  Bacillus subtilis
      11       2747  Drosophila melanogaster (Fruit fly)
      12       2559  Xenopus laevis (African clawed frog)
      13       2153  Pongo pygmaeus (Orangutan)
      14       1973  Gallus gallus (Chicken)
      15       1931  Danio rerio (Zebrafish) (Brachydanio rerio)
      16       1922  Escherichia coli O157:H7
      17       1782  Methanocaldococcus jannaschii (Methanococcus jannaschii)
      18       1774  Haemophilus influenzae
      19       1678  Salmonella typhimurium
      20       1603  Escherichia coli O6
      21       1601  Shigella flexneri
      22       1529  Oryza sativa subsp. japonica (Rice)
      23       1436  Mycobacterium tuberculosis
      24       1287  Sus scrofa (Pig)
      25       1271  Salmonella typhi
      26       1234  Pseudomonas aeruginosa
      27       1177  Mycobacterium bovis
      28       1048  Macaca fascicularis (Crab eating macaque) (Cynomolgus monkey)
      29       1009  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      30        979  Synechocystis sp. (strain PCC 6803)
      31        974  Archaeoglobus fulgidus
      32        938  Yersinia pestis
      33        909  Acanthamoeba polyphaga mimivirus (APMV)
      34        904  Vibrio cholerae
      35        882  Rhizobium meliloti (Sinorhizobium meliloti)
      36        863  Oryctolagus cuniculus (Rabbit)
      37        845  Staphylococcus aureus (strain Mu50 / ATCC 700699)
      38        844  Staphylococcus aureus (strain N315)
      39        823  Salmonella paratyphi A
      40        816  Staphylococcus aureus (strain MW2)
      41        816  Staphylococcus aureus (strain COL)
      42        812  Staphylococcus aureus (strain MSSA476)
      43        808  Staphylococcus aureus (strain MRSA252)
      44        780  Yersinia pseudotuberculosis
      45        769  Salmonella choleraesuis
      46        757  Aquifex aeolicus
      47        753  Vibrio parahaemolyticus
      48        749  Pasteurella multocida
      49        745  Shigella sonnei (strain Ss046)
      50        744  Escherichia coli O6:K15:H31 (strain 536 / UPEC)
      51        736  Canis familiaris (Dog)
      52        715  Shigella boydii serotype 4 (strain Sb227)
      53        705  Shigella dysenteriae serotype 1 (strain Sd197)
      54        703  Ashbya gossypii (Yeast) (Eremothecium gossypii)
      55        699  Erwinia carotovora subsp. atroseptica (Pectobacterium atrosepticum)
      56        696  Vibrio vulnificus
      57        695  Streptomyces coelicolor
      58        687  Staphylococcus epidermidis (strain ATCC 35984 / RP62A)
      59        687  Mycoplasma pneumoniae
      60        686  Staphylococcus epidermidis (strain ATCC 12228)
      61        683  Bacillus halodurans
      62        683  Neurospora crassa
      63        679  Vibrio vulnificus (strain YJ016)
      64        679  Candida albicans (Yeast)
      65        674  Escherichia coli (strain UTI89 / UPEC)
      66        672  Kluyveromyces lactis (Yeast) (Candida sphaerica)
      67        668  Photorhabdus luminescens subsp. laumondii
      68        664  Pan troglodytes (Chimpanzee)
      69        655  Escherichia coli O9:H4 (strain HS)
      70        652  Bacillus anthracis
      71        650  Escherichia coli O139:H28 (strain E24377A / ETEC)
      72        641  Mycobacterium leprae
      73        641  Anabaena sp. (strain PCC 7120)
      74        629  Pseudomonas syringae pv. tomato
      75        626  Pseudomonas putida (strain KT2440)
      76        625  Candida glabrata (Yeast) (Torulopsis glabrata)
      77        615  Escherichia coli
      78        612  Treponema pallidum
      79        612  Shigella flexneri serotype 5b (strain 8401)
      80        608  Bradyrhizobium japonicum
      81        607  Yersinia pestis (biovar Antiqua strain Nepal516)
      82        602  Yersinia pestis (biovar Antiqua strain Antiqua)
      83        602  Zea mays (Maize)
      84        593  Methanobacterium thermoautotrophicum
      85        593  Staphylococcus aureus (strain NCTC 8325)
      86        587  Dictyostelium discoideum (Slime mold)
      87        587  Agrobacterium tumefaciens (strain C58 / ATCC 33970)
      88        587  Yersinia enterocolitica serotype O:8 / biotype 1B (strain 8081)
      89        581  Rickettsia prowazekii
      90        580  Bacillus cereus (strain ATCC 14579 / DSM 31)
      91        579  Ralstonia solanacearum (Pseudomonas solanacearum)
      92        577  Helicobacter pylori (Campylobacter pylori)
      93        572  Buchnera aphidicola subsp. Acyrthosiphon pisum 
      94        569  Yersinia pseudotuberculosis (serotype O:1b / strain IP 31758)
      95        568  Rhizobium loti (Mesorhizobium loti)
      96        567  Shewanella oneidensis
      97        562  Buchnera aphidicola subsp. Schizaphis graminum
      98        561  Lactococcus lactis subsp. lactis (Streptococcus lactis)
      99        560  Listeria monocytogenes
     100        558  Helicobacter pylori J99 (Campylobacter pylori J99)
     101        558  Neisseria meningitidis serogroup B
     102        557  Escherichia coli O1:K1 / APEC
     103        552  Listeria innocua
     104        551  Xanthomonas campestris pv. campestris
     105        549  Staphylococcus aureus (strain USA300)
     106        539  Staphylococcus aureus (strain bovine RF122)
     107        538  Yersinia pestis (strain Pestoides F)
     108        537  Photobacterium profundum (Photobacterium sp. (strain SS9))
     109        537  Neisseria meningitidis serogroup A
     110        532  Enterobacter sp. (strain 638)
     111        527  Clostridium acetobutylicum
     112        525  Staphylococcus haemolyticus (strain JCSC1435)
     113        524  Caulobacter crescentus (Caulobacter vibrioides)
     114        523  Staphylococcus saprophyticus subsp. saprophyticus 
     115        521  Brucella melitensis
     116        521  Brucella suis
     117        519  Bacillus cereus (strain ATCC 10987)
     118        517  Xanthomonas axonopodis pv. citri
     119        516  Klebsiella pneumoniae subsp. pneumoniae (strain ATCC 700721 / MGH 78578)
     120        508  Citrobacter koseri (strain ATCC BAA-895 / CDC 4225-83 / SGSC4696)
     121        507  Buchnera aphidicola subsp. Baizongia pistaciae
     122        505  Oceanobacillus iheyensis
     123        503  Streptococcus pneumoniae
     124        499  Serratia proteamaculans (strain 568)
     125        499  Bacillus thuringiensis subsp. konkukian
     126        496  Emericella nidulans (Aspergillus nidulans)
     127        495  Yarrowia lipolytica (Candida lipolytica)
     128        495  Xylella fastidiosa
     129        494  Thermotoga maritima
     130        493  Listeria monocytogenes serotype 4b (strain F2365)
     131        491  Vibrio fischeri (strain ATCC 700601 / ES114)
     132        490  Rickettsia conorii
     133        486  Bacillus cereus (strain ZK / E33L)
     134        486  Xylella fastidiosa (strain Temecula1 / ATCC 700964)
     135        486  Pseudomonas syringae pv. syringae (strain B728a)
     136        483  Mycoplasma genitalium
     137        478  Bordetella bronchiseptica (Alcaligenes bronchisepticus)
     138        478  Pseudomonas fluorescens (strain PfO-1)
     139        476  Haemophilus ducreyi
     140        475  Bacillus licheniformis (strain DSM 13 / ATCC 14580)
     141        475  Pseudomonas fluorescens (strain Pf-5 / ATCC BAA-477)
     142        472  Deinococcus radiodurans
     143        471  Debaryomyces hansenii (Yeast) (Torulaspora hansenii)
     144        469  Bordetella pertussis
     145        468  Bordetella parapertussis
     146        468  Chromobacterium violaceum
     147        468  Pseudomonas syringae pv. phaseolicola (strain 1448A / Race 6)
     148        466  Clostridium perfringens
     149        465  Corynebacterium glutamicum (Brevibacterium flavum)
     150        462  Methanosarcina acetivorans
     151        459  Pseudomonas aeruginosa (strain UCBPP-PA14)
     152        459  Enterobacter sakazakii (strain ATCC BAA-894)
     153        451  Pyrococcus horikoshii
     154        447  Pyrococcus abyssi
     155        445  Rickettsia felis (Rickettsia azadi)
     156        444  Halobacterium salinarium (Halobacterium halobium)
     157        443  Sodalis glossinidius (strain morsitans)
     158        443  Brucella abortus
     159        442  Methanosarcina mazei (Methanosarcina frisia)
     160        440  Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
     161        440  Haemophilus influenzae (strain 86-028NP)
     162        439  Mannheimia succiniciproducens (strain MBEL55E)
     163        439  Enterococcus faecalis (Streptococcus faecalis)
     164        439  Streptomyces avermitilis
     165        437  Chlamydia trachomatis
     166        433  Lactobacillus plantarum
     167        432  Rickettsia bellii (strain RML369-C)
     168        431  Streptococcus mutans
     169        430  Thermoanaerobacter tengcongensis
     170        429  Synechococcus elongatus (Thermosynechococcus elongatus)
     171        429  Pyrococcus furiosus
     172        428  Burkholderia pseudomallei (Pseudomonas pseudomallei)
     173        428  Bacillus clausii (strain KSM-K16)
     174        426  Streptococcus pyogenes serotype M6
     175        426  Xanthomonas campestris pv. campestris (strain 8004)
     176        425  Borrelia burgdorferi (Lyme disease spirochete)
     177        425  Ovis aries (Sheep)
     178        423  Nicotiana tabacum (Common tobacco)
     179        422  Geobacillus kaustophilus
     180        418  Campylobacter jejuni
     181        418  Pseudomonas entomophila (strain L48)
     182        417  Chlamydia pneumoniae (Chlamydophila pneumoniae)
     183        416  Acinetobacter sp. (strain ADP1)
     184        412  Rhodopseudomonas palustris
     185        408  Shewanella sp. (strain MR-7)
     186        407  Chlamydia muridarum
     187        407  Brucella abortus (strain 2308)
     188        406  Synechococcus sp. (strain PCC 7942) (Anacystis nidulans R2)
     189        406  Rhizobium sp. (strain NGR234)
     190        405  Burkholderia mallei (Pseudomonas mallei)
     191        404  Pseudomonas aeruginosa (strain PA7)
     192        404  Streptococcus pyogenes serotype M1
     193        404  Shewanella sp. (strain MR-4)
     194        404  Sulfolobus solfataricus
     195        403  Rickettsia typhi
     196        402  Xanthomonas campestris pv. vesicatoria (strain 85-10)
     197        401  Streptococcus pyogenes serotype M18
     198        399  Burkholderia sp. (strain 383) (Burkholderia cepacia 
     199        399  Streptococcus pyogenes serotype M3
     200        398  Anabaena variabilis (strain ATCC 29413 / PCC 7937)
     201        395  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
     202        392  Nitrosomonas europaea
     203        392  Xanthomonas oryzae pv. oryzae (strain MAFF 311018)
     204        390  Methylococcus capsulatus
     205        389  Ralstonia eutropha (strain JMP134) (Alcaligenes eutrophus)
     206        388  Vibrio cholerae (strain ATCC 39541 / O395)
     207        388  Vibrio harveyi (strain ATCC BAA-1116 / BB120)
     208        386  Gloeobacter violaceus
     209        385  Corynebacterium efficiens
     210        382  Chlorobium tepidum
     211        380  Aspergillus fumigatus (Sartorya fumigata)
     212        379  Neisseria gonorrhoeae (strain ATCC 700825 / FA 1090)
     213        378  Ralstonia eutropha  (Cupriavidus necator 
     214        377  Rhodobacter sphaeroides (strain ATCC 17023 / 2.4.1 / NCIB 8253 / DSM 158)
     215        376  Pseudomonas putida (strain F1 / ATCC 700007)
     216        376  Pyrococcus kodakaraensis (Thermococcus kodakaraensis)
     217        374  Solanum tuberosum (Potato)
     218        373  Shewanella sp. (strain ANA-3)
     219        371  Idiomarina loihiensis
     220        368  Mycobacterium paratuberculosis
     221        368  Synechococcus sp. (strain WH8102)
     222        368  Shewanella frigidimarina (strain NCIMB 400)
     223        367  Pseudoalteromonas haloplanktis (strain TAC 125)
     224        366  Streptococcus agalactiae serotype III
     225        365  Colwellia psychrerythraea (strain 34H / ATCC BAA-681) (Vibrio psychroerythus)
     226        364  Methanopyrus kandleri
     227        364  Synechococcus sp. (strain ATCC 27144 / PCC 6301 / SAUG 1402/1) 
     228        362  Streptococcus agalactiae serotype V
     229        362  Oryza sativa subsp. indica (Rice)
     230        358  Staphylococcus aureus (strain Newman)
     231        358  Xanthomonas oryzae pv. oryzae
     232        357  Hahella chejuensis (strain KCTC 2396)
     233        357  Leptospira interrogans
     234        356  Prochlorococcus marinus (strain MIT 9313)
     235        356  Coxiella burnetii
     236        355  Azoarcus sp. (strain EbN1) (Aromatoleum aromaticum (strain EbN1))
     237        355  Dechloromonas aromatica (strain RCB)
     238        354  Burkholderia xenovorans (strain LB400)
     239        353  Aeropyrum pernix
     240        352  Prochlorococcus marinus
     241        351  Staphylococcus aureus (strain Mu3 / ATCC 700698)
     242        350  Shewanella baltica (strain OS185)
     243        349  Shewanella sp. (strain W3-18-1)
     244        349  Burkholderia cenocepacia (strain AU 1054)
     245        349  Pisum sativum (Garden pea)
     246        348  Burkholderia thailandensis (strain E264 / ATCC 700388 / DSM 13276 / CIP 106301)
     247        347  Leptospira interrogans serogroup Icterohaemorrhagiae serovar copenhageni
     248        346  Haemophilus influenzae (strain PittEE)
     249        346  Aeromonas hydrophila subsp. hydrophila (strain ATCC 7966 / NCIB 9240)
     250        345  Geobacter sulfurreducens
     251        342  Shewanella putrefaciens (strain CN-32 / ATCC BAA-453)
     252        340  Burkholderia pseudomallei (strain 1710b)
     253        339  Sulfolobus tokodaii
     254        339  Glycine max (Soybean)
     255        338  Listeria welshimeri serovar 6b (strain ATCC 35897 / DSM 20650 / SLCC5334)
     256        338  Shewanella denitrificans (strain OS217 / ATCC BAA-1090 / DSM 15013)
     257        338  Thermus thermophilus (strain HB8 / ATCC 27634 / DSM 579)
     258        337  Aeromonas salmonicida (strain A449)
     259        335  Rhizobium etli (strain CFN 42 / ATCC 51251)
     260        331  Prochlorococcus marinus subsp. pastoris (strain CCMP 1378 / MED4)
     261        331  Shewanella baltica (strain OS155 / ATCC BAA-1091)
     262        330  Actinobacillus pleuropneumoniae serotype 5b (strain L20)
     263        330  Legionella pneumophila (strain Paris)
     264        330  Pseudomonas mendocina (strain ymp)
     265        330  Shewanella loihica (strain BAA-1088 / PV-4)
     266        330  Shewanella amazonensis (strain ATCC BAA-1098 / SB2B)
     267        329  Nitrosococcus oceani (strain ATCC 19707 / NCIMB 11848)
     268        329  Legionella pneumophila (strain Lens)
     269        328  Macaca mulatta (Rhesus macaque)
     270        327  Nocardia farcinica
     271        327  Bacillus amyloliquefaciens (strain FZB42)
     272        327  Haemophilus influenzae (strain PittGG)
     273        327  Rhodopirellula baltica
     274        326  Silicibacter pomeroyi
     275        325  Desulfovibrio vulgaris (strain Hildenborough / ATCC 29579 / NCIMB 8303)
     276        323  Haemophilus somnus (strain 129Pt) (Histophilus somni (strain 129Pt))
     277        323  Ralstonia metallidurans (strain CH34 / ATCC 43123 / DSM 2839)
     278        323  Legionella pneumophila subsp. pneumophila 
     279        323  Fusobacterium nucleatum subsp. nucleatum
     280        321  Thiobacillus denitrificans (strain ATCC 25259)
     281        321  Thermoplasma acidophilum
     282        320  Rhizobium leguminosarum bv. viciae (strain 3841)
     283        316  Zymomonas mobilis
     284        314  Burkholderia cenocepacia (strain HI2424)
     285        313  Symbiobacterium thermophilum
     286        313  Wolinella succinogenes
     287        312  Bacillus pumilus (strain SAFR-032)
     288        312  Neisseria meningitidis serogroup C / serotype 2a (strain ATCC 700532 / FAM18)
     289        312  Chromohalobacter salexigens (strain DSM 3043 / ATCC BAA-138 / NCIMB 13768)
     290        311  Mycobacterium tuberculosis (strain ATCC 25177 / H37Ra)
     291        311  Triticum aestivum (Wheat)
     292        309  Bacteroides thetaiotaomicron
     293        307  Saccharophagus degradans (strain 2-40 / ATCC 43961 / DSM 17024)
     294        307  Clostridium tetani
     295        307  Streptococcus agalactiae serotype Ia
     296        306  Bordetella avium (strain 197N)
     297        305  Thermus thermophilus (strain HB27 / ATCC BAA-163 / DSM 7039)
     298        305  Lactococcus lactis subsp. cremoris (strain MG1363)
     299        305  Methanococcus maripaludis
     300        304  Corynebacterium diphtheriae
     301        304  Caenorhabditis briggsae
     302        303  Burkholderia cepacia (strain ATCC 53795 / AMMD)
     303        303  Mycobacterium bovis (strain BCG / Paris 1173P2)
     304        302  Geobacter metallireducens (strain GS-15 / ATCC 53774 / DSM 7210)
     305        301  Campylobacter jejuni (strain RM1221)
     306        300  Nitrosospira multiformis (strain ATCC 25196 / NCIMB 11849)
     307        299  Clostridium perfringens (strain ATCC 13124 / NCTC 8237 / Type A)
     308        299  Geobacillus thermodenitrificans (strain NG80-2)
     309        298  Hordeum vulgare (Barley)
     310        297  Methanosarcina barkeri (strain Fusaro / DSM 804)
     311        297  Rhodopseudomonas palustris (strain HaA2)
     312        296  Sulfolobus acidocaldarius
     313        296  Staphylococcus aureus
     314        295  Nitrobacter winogradskyi (strain Nb-255 / ATCC 25391)
     315        294  Streptococcus thermophilus (strain CNRZ 1066)
     316        293  Actinobacillus succinogenes (strain ATCC 55618 / 130Z)
     317        293  Pseudoalteromonas atlantica (strain T6c / BAA-1087)
     318        293  Rhodopseudomonas palustris (strain BisB18)
     319        292  Haloarcula marismortui (Halobacterium marismortui)
     320        292  Streptococcus thermophilus (strain ATCC BAA-250 / LMG 18311)
     321        291  Cavia porcellus (Guinea pig)
     322        291  Rhodospirillum rubrum (strain ATCC 11170 / NCIB 8255)
     323        290  Pelobacter carbinolicus (strain DSM 2380 / Gra Bd 1)
     324        290  Streptococcus pneumoniae serotype 2 (strain D39 / NCTC 7466)
     325        289  Pseudomonas stutzeri (strain A1501)
     326        289  Bacteroides fragilis
     327        289  Pyrobaculum aerophilum
     328        288  Psychromonas ingrahamii (strain 37)
     329        288  Rhodoferax ferrireducens (strain DSM 15236 / ATCC BAA-621 / T118)
     330        287  Prochlorococcus marinus (strain NATL2A)
     331        287  Bacillus thuringiensis (strain Al Hakam)
     332        287  Thiomicrospira crunogena (strain XCL-2)
     333        285  Thermoplasma volcanium
     334        284  Burkholderia vietnamiensis (strain G4 / LMG 22486) (Burkholderia cepacia 
     335        284  Clostridium perfringens (strain SM101 / Type A)
     336        284  Methylobacillus flagellatus (strain KT / ATCC 51484 / DSM 6875)
     337        284  Gluconobacter oxydans (Gluconobacter suboxydans)
     338        283  Burkholderia pseudomallei (strain 1106a)
     339        283  Bartonella henselae (Rochalimaea henselae)
     340        281  Helicobacter hepaticus
     341        280  Nitrobacter hamburgensis (strain X14 / DSM 10229)
     342        280  Sinorhizobium medicae (strain WSM419) (Ensifer medicae)
     343        278  Spinacia oleracea (Spinach)
     344        278  Rhodopseudomonas palustris (strain BisB5)
     345        277  Pseudomonas putida
     346        277  Alcanivorax borkumensis (strain SK2 / ATCC 700651 / DSM 11573)
     347        277  Carboxydothermus hydrogenoformans (strain Z-2901 / DSM 6008)
     348        277  Prochlorococcus marinus (strain MIT 9312)
     349        276  Streptococcus pyogenes serotype M28
     350        276  Bartonella quintana (Rochalimaea quintana)
     351        276  Mesorhizobium sp. (strain BNC1)
     352        275  Wigglesworthia glossinidia brevipalpis
     353        274  Cryptococcus neoformans (Filobasidiella neoformans)
     354        274  Bifidobacterium longum
     355        273  Psychrobacter arcticum
     356        273  Desulfotalea psychrophila
     357        273  Azoarcus sp. (strain BH72)
     358        272  Synechococcus sp. (strain CC9902)
     359        272  Burkholderia mallei (strain NCTC 10247)
     360        271  Burkholderia mallei (strain NCTC 10229)
     361        270  Gorilla gorilla gorilla (Lowland gorilla)
     362        270  Burkholderia pseudomallei (strain 668)
     363        269  Equus caballus (Horse)
     364        269  Ochrobactrum anthropi (strain ATCC 49188 / DSM 6882 / NCTC 12168)
     365        268  Bacteriophage T4
     366        268  Lactobacillus johnsonii
     367        268  Alkalilimnicola ehrlichei (strain MLHE-1)
     368        267  Marinobacter aquaeolei  (Marinobacter hydrocarbonoclasticus 
     369        267  Roseobacter denitrificans (strain ATCC 33942 / OCh 114) (Erythrobacter sp.  
     370        267  Porphyromonas gingivalis (Bacteroides gingivalis)
     371        267  Streptococcus pyogenes serotype M5 (strain Manfredo)
     372        266  Lactococcus lactis subsp. cremoris (strain SK11)
     373        266  Burkholderia mallei (strain SAVP1)
     374        266  Leifsonia xyli subsp. xyli
     375        264  Synechococcus sp. (strain CC9605)
     376        264  Streptococcus sanguinis (strain SK36)
     377        264  Lactobacillus sakei subsp. sakei (strain 23K)
     378        263  Ureaplasma parvum (Ureaplasma urealyticum biotype 1)
     379        262  Rhodobacter capsulatus (Rhodopseudomonas capsulata)
     380        262  Nitrosomonas eutropha (strain C71)
     381        262  Streptococcus thermophilus (strain ATCC BAA-491 / LMD-9)
     382        261  Blochmannia floridanus
     383        261  Moorella thermoacetica (strain ATCC 39073)
     384        258  Ustilago maydis (Smut fungus)
     385        257  Chlamydophila caviae
     386        257  Propionibacterium acnes
     387        257  Campylobacter jejuni subsp. jejuni serotype O:23/36 (strain 81-176)
     388        257  Bacteroides fragilis (strain ATCC 25285 / NCTC 9343)
     389        256  Lactobacillus acidophilus
     390        256  Psychrobacter cryohalolentis (strain K5)
     391        255  Rhodopseudomonas palustris (strain BisA53)
     392        254  Francisella tularensis subsp. tularensis
     393        254  Vaccinia virus (strain Copenhagen) (VACV)
     394        253  Synechococcus sp. (strain JA-2-3B'a(2-13)) 
     395        251  Helicobacter pylori (strain HPAG1)
     396        250  Legionella pneumophila (strain Corby)
     397        250  Synechococcus sp. (strain JA-3-3Ab) 
     398        250  Silicibacter sp. (strain TM1040)
     399        249  Jannaschia sp. (strain CCS1)
     400        246  Brucella ovis (strain ATCC 25840 / 63/290 / NCTC 10512)
     401        245  Desulfovibrio desulfuricans (strain G20)
     402        244  Campylobacter jejuni subsp. jejuni serotype O:6 (strain 81116 / NCTC 11828)
     403        244  Streptococcus pyogenes serotype M12 (strain MGAS9429)
     404        243  Rhodobacter sphaeroides (strain ATCC 17029 / ATH 2.4.9)
     405        243  Chlorobium chlorochromatii (strain CaD3)
     406        242  Streptococcus pyogenes serotype M4 (strain MGAS10750)
     407        241  Streptococcus pyogenes serotype M2 (strain MGAS10270)
     408        240  Magnetospirillum magneticum (strain AMB-1 / ATCC 700264)
     409        239  Bdellovibrio bacteriovorus
     410        239  Francisella tularensis subsp. holarctica (strain LVS)
     411        237  Trichodesmium erythraeum (strain IMS101)
     412        237  Corynebacterium jeikeium (strain K411)
     413        237  Clostridium novyi (strain NT)
     414        237  Aspergillus oryzae
     415        235  Bacillus stearothermophilus (Geobacillus stearothermophilus)
     416        235  Halorhodospira halophila (strain DSM 244 / SL1) (Ectothiorhodospira halophila 
     417        234  Mycobacterium ulcerans (strain Agy99)
     418        234  Polaromonas sp. (strain JS666 / ATCC BAA-500)
     419        234  Bradyrhizobium sp. (strain BTAi1 / ATCC BAA-1182)
     420        233  Rhodococcus sp. (strain RHA1)
     421        233  Pelodictyon luteolum (strain DSM 273) (Chlorobium luteolum (strain DSM 273))
     422        233  Anaeromyxobacter dehalogenans (strain 2CP-C)
     423        232  Corynebacterium glutamicum (strain R)
     424        232  Thermobifida fusca (strain YX)
     425        230  Treponema denticola
     426        230  Novosphingobium aromaticivorans (strain DSM 12444)
     427        229  Lactobacillus salivarius subsp. salivarius (strain UCC118)
     428        229  Acidovorax avenae subsp. citrulli (strain AAC00-1)
     429        229  Chlamydomonas reinhardtii
     430        229  Bradyrhizobium sp. (strain ORS278)
     431        227  Blochmannia pennsylvanicus (strain BPEN)
     432        227  Sulfurimonas denitrificans  (Thiomicrospira denitrificans 
     433        226  Clostridium botulinum (strain ATCC 19397 / Type A)
     434        226  Thermotoga petrophila (strain RKU-1 / ATCC BAA-488 / DSM 13995)
     435        226  Streptococcus pyogenes serotype M12 (strain MGAS2096)
     436        226  Prochlorococcus marinus (strain MIT 9301)
     437        226  Clostridium botulinum (strain Langeland / NCTC 10281 / Type F)
     438        224  Prochlorococcus marinus (strain MIT 9515)
     439        224  Mycobacterium avium (strain 104)
     440        223  Desulfitobacterium hafniense (strain Y51)
     441        221  Myxococcus xanthus (strain DK 1622)
     442        220  Prochlorococcus marinus (strain AS9601)
     443        220  Francisella tularensis subsp. tularensis (strain FSC 198)
     444        219  Chlamydia trachomatis (strain A/HAR-13 / ATCC VR-571B)
     445        219  Desulfovibrio vulgaris subsp. vulgaris (strain DP4)
     446        219  Campylobacter jejuni subsp. doylei (strain ATCC BAA-1458 / RM4099 / 269.97)
     447        219  Baumannia cicadellinicola subsp. Homalodisca coagulata
     448        219  Synechococcus sp. (strain CC9311)
     449        217  Cricetulus griseus (Chinese hamster)
     450        217  Porphyra purpurea
     451        217  Natronomonas pharaonis (strain DSM 2160 / ATCC 35678)
     452        217  Francisella tularensis subsp. holarctica (strain OSU18)
     453        216  Janthinobacterium sp. (strain Marseille) (Minibacterium massiliensis)
     454        216  Paracoccus denitrificans (strain Pd 1222)
     455        216  Prochlorococcus marinus (strain NATL1A)
     456        215  Protochlamydia amoebophila (strain UWE25)
     457        214  Klebsiella pneumoniae
     458        214  Francisella tularensis subsp. novicida (strain U112)
     459        214  Clostridium thermocellum (strain ATCC 27405 / DSM 1237)
     460        213  Clostridium difficile (strain 630)
     461        213  Syntrophus aciditrophicus (strain SB)
     462        212  Felis silvestris catus (Cat)
     463        212  Mycobacterium sp. (strain MCS)
     464        212  Chlamydophila abortus
     465        212  Synechococcus sp. (strain WH7803)
     466        212  Rhodobacter sphaeroides (strain ATCC 17025 / ATH 2.4.3)
     467        211  Leptospira borgpetersenii serovar Hardjo-bovis (strain JB197)
     468        211  Mycobacterium vanbaalenii (strain DSM 7251 / PYR-1)
     469        210  Herminiimonas arsenicoxydans
     470        210  Polaromonas naphthalenivorans (strain CJ2)
     471        209  Gibberella zeae (Fusarium graminearum)
     472        209  Helicobacter acinonychis (strain Sheeba)
     473        209  Acidovorax sp. (strain JS42)
     474        208  Prochlorococcus marinus (strain MIT 9215)
     475        208  Porphyra yezoensis
     476        208  Prochlorococcus marinus (strain MIT 9303)
     477        207  Chlorobium phaeobacteroides (strain DSM 266)
     478        206  Clostridium botulinum (strain Hall / ATCC 3502 / NCTC 13319 / Type A)
     479        206  Sphingopyxis alaskensis (Sphingomonas alaskensis)
     480        205  Lactobacillus casei (strain ATCC 334)
     481        204  Lactobacillus delbrueckii subsp. bulgaricus (strain ATCC 11842 / DSM 20081)
     482        204  Mesocricetus auratus (Golden hamster)
     483        204  Mycobacterium smegmatis (strain ATCC 700084 / mc(2)155)
     484        203  Pediococcus pentosaceus (strain ATCC 25745 / 183-1w)
     485        203  Dehalococcoides sp. (strain CBDB1)
     486        203  Francisella tularensis subsp. tularensis (strain WY96-3418)
     487        202  Encephalitozoon cuniculi
     488        202  Methanococcus vannielii (strain SB / ATCC 35089 / DSM 1224)
     489        201  Dehalococcoides ethenogenes (strain 195)
     490        201  Mycobacterium sp. (strain KMS)
     491        200  Pelobacter propionicus (strain DSM 2379)
     492        200  Vaccinia virus (strain Western Reserve / WR) (VACV)


   
   3.3  Taxonomic distribution of the sequences

   Kingdom        sequences (% of the database)
    Archaea           13942 (  4%)
    Bacteria         198528 ( 56%)
    Eukaryota        131953 ( 37%)
    Viruses           11771 (  3%)


   Within Eukaryota:

    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                  18610 ( 14%)           (  5%)
     Other Mammalia         40919 ( 31%)           ( 11%)
     Other Vertebrata       13069 ( 10%)           (  4%)
     Viridiplantae          22036 ( 17%)           (  6%)
     Fungi                  20495 ( 16%)           (  6%)
     Insecta                 5343 (  4%)           (  2%)
     Nematoda                3702 (  3%)           (  1%)
     Other                   7779 (  6%)           (  2%)
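
   The percentages in the two tables above can be reproduced from the raw
   counts. The Python sketch below is illustrative only: the names KINGDOMS,
   EUKARYOTA and percent() are hypothetical, and it assumes that the published
   figures are rounded to the nearest whole percent.

```python
# Sequence counts copied from the two tables in section 3.3.
KINGDOMS = {
    "Archaea": 13942,
    "Bacteria": 198528,
    "Eukaryota": 131953,
    "Viruses": 11771,
}
EUKARYOTA = {
    "Human": 18610,
    "Other Mammalia": 40919,
    "Other Vertebrata": 13069,
    "Viridiplantae": 22036,
    "Fungi": 20495,
    "Insecta": 5343,
    "Nematoda": 3702,
    "Other": 7779,
}


def percent(count, total):
    """Share of `count` in `total`, rounded to the nearest whole percent (assumed rule)."""
    return round(100 * count / total)


database_total = sum(KINGDOMS.values())
eukaryota_total = sum(EUKARYOTA.values())

kingdom_percent = {name: percent(n, database_total) for name, n in KINGDOMS.items()}
within_eukaryota = {name: percent(n, eukaryota_total) for name, n in EUKARYOTA.items()}
of_database = {name: percent(n, database_total) for name, n in EUKARYOTA.items()}

for name in EUKARYOTA:
    print(f"{name:<18}{EUKARYOTA[name]:>8} ({within_eukaryota[name]:>3}%) ({of_database[name]:>3}%)")
```

   The eight Eukaryota categories add up to the Eukaryota total, so the
   "Other" row accounts for every eukaryotic sequence not listed separately.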



4.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50    6132             1001-1100     2680
                 51- 100   26810             1101-1200     1822
                101- 150   38249             1201-1300     1444
                151- 200   37092             1301-1400     1316
                201- 250   37148             1401-1500     1051
                251- 300   32525             1501-1600      522
                301- 350   31514             1601-1700      396
                351- 400   27716             1701-1800      338
                401- 450   23033             1801-1900      330
                451- 500   18961             1901-2000      259
                501- 550   13546             2001-2100      159
                551- 600    9888             2101-2200      230
                601- 650    8484             2201-2300      211
                651- 700    5714             2301-2400      142
                701- 750    4531             2401-2500      104
                751- 800    3622             >2500          794
                801- 850    3059
                851- 900    3253
                901- 950    2722
                951-1000    1922
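
   The table uses bins of 50 residues up to length 1000, bins of 100 residues
   from 1001 to 2500, and a single open bin above 2500. A minimal Python sketch
   of that binning follows; size_bin() is a hypothetical helper, not UniProt
   code. The entry total (356194) and the fragment count (8475) are taken from
   elsewhere in these release notes.

```python
# Bin counts copied from the table above, in ascending order of sequence length.
BIN_COUNTS = [
    6132, 26810, 38249, 37092, 37148, 32525, 31514, 27716, 23033, 18961,
    13546, 9888, 8484, 5714, 4531, 3622, 3059, 3253, 2722, 1922,   # 1-1000
    2680, 1822, 1444, 1316, 1051, 522, 396, 338, 330, 259,
    159, 230, 211, 142, 104,                                       # 1001-2500
    794,                                                           # >2500
]

TOTAL_ENTRIES = 356_194  # Swiss-Prot entries in release 55.0
FRAGMENTS = 8_475        # "Number of fragments" in section 7


def size_bin(length):
    """Return the label of the table bin that a sequence length falls into."""
    if length < 1:
        raise ValueError("sequence length must be at least 1")
    if length > 2500:
        return ">2500"
    width = 50 if length <= 1000 else 100
    low = ((length - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"


non_fragment_entries = sum(BIN_COUNTS)
print(non_fragment_entries, TOTAL_ENTRIES - FRAGMENTS)
print(size_bin(358))
```

   The bin counts sum to the entry total minus the number of fragments, which
   is consistent with the table excluding fragments.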



   The average sequence length in UniProtKB/Swiss-Prot is 358 amino acids.

   The shortest sequence is   GWA_SEPOF (P83570):     2 amino acids.
   The longest sequence is  TITIN_MOUSE (A2ASS6): 35213 amino acids.
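
   The average length can be checked against the totals quoted in the
   introduction to these statistics (356'194 entries, 127'836'513 amino acids).
   The sketch below is only an arithmetic check. How the published value is
   rounded is not stated in these notes; the quotient is about 358.9, so the
   figure of 358 is consistent with truncation, which is an assumption here.

```python
TOTAL_AMINO_ACIDS = 127_836_513  # from the release 55.0 introduction
TOTAL_ENTRIES = 356_194          # from the release 55.0 introduction

mean_length = TOTAL_AMINO_ACIDS / TOTAL_ENTRIES      # exact quotient
truncated_mean = TOTAL_AMINO_ACIDS // TOTAL_ENTRIES  # assumed truncation rule

print(f"mean = {mean_length:.2f}, truncated = {truncated_mean}")
```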


5.  JOURNAL CITATIONS

   Note: the following citation statistics reflect the number of distinct
         journal citations.

   Total number of journals cited in this release of UniProtKB/Swiss-Prot: 1884


   5.1 Table of the frequency of journal citations

        Journals cited 1x:  635
                       2x:  252
                       3x:  127
                       4x:  105
                       5x:   64
                       6x:   56
                       7x:   35
                       8x:   41
                       9x:   37
                      10x:   16
                  11- 20x:  152
                  21- 50x:  143
                  51-100x:   84
                    >100x:  137
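
   The frequency table partitions the journals by citation count, so its rows
   should add up to the total number of journals cited. A short Python check,
   with hypothetical names:

```python
# (citation range, number of journals) pairs copied from table 5.1.
JOURNAL_FREQUENCY = [
    ("1x", 635), ("2x", 252), ("3x", 127), ("4x", 105), ("5x", 64),
    ("6x", 56), ("7x", 35), ("8x", 41), ("9x", 37), ("10x", 16),
    ("11-20x", 152), ("21-50x", 143), ("51-100x", 84), (">100x", 137),
]
TOTAL_JOURNALS = 1884  # total quoted at the start of section 5

journals_counted = sum(count for _, count in JOURNAL_FREQUENCY)

# Journals cited ten times or fewer: the first ten rows of the table.
rarely_cited = sum(count for _, count in JOURNAL_FREQUENCY[:10])

print(journals_counted, rarely_cited)
```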


   5.2  List of the most cited journals in UniProtKB/Swiss-Prot

   Nb    Citations   Journal name
   --    ---------   -------------------------------------------------------------
    1        15830   Journal of Biological Chemistry
    2         7462   Proceedings of the National Academy of Sciences of the U.S.A.
    3         4618   Journal of Bacteriology
    4         4344   Gene
    5         4134   Nucleic Acids Research
    6         4064   Biochemical and Biophysical Research Communications
    7         3679   FEBS Letters
    8         3440   Biochemistry
    9         3404   The EMBO Journal
   10         3015   Molecular and Cellular Biology
   11         2982   European Journal of Biochemistry
   12         2912   Nature
   13         2752   Biochimica et Biophysica Acta
   14         2639   Journal of Molecular Biology
   15         2387   Genomics
   16         2360   Cell
   17         1958   Biochemical Journal
   18         1840   Science
   19         1546   Molecular Microbiology
   20         1504   Journal of Virology
   21         1391   Plant Molecular Biology
   22         1387   Journal of Cell Biology
   23         1283   Molecular and General Genetics
   24         1195   Virology
   25         1165   Human Molecular Genetics
   26         1158   Nature Genetics
   27         1148   Genes and Development
   28         1108   Journal of Biochemistry
   29         1054   Oncogene
   30         1045   The American Journal of Human Genetics
   31         1045   Plant Physiology
   32          908   Development
   33          868   Human Mutation
   34          868   Journal of Immunology
   35          827   Genetics
   36          798   Infection and Immunity
   37          777   Structure
   38          772   Molecular Biology of the Cell
   39          743   Journal of General Virology
   40          730   Archives of Biochemistry and Biophysics
   41          712   Yeast
   42          677   Blood
   43          653   The Plant Cell
   44          646   Microbiology
   45          603   Molecular Cell
   46          584   FEMS Microbiology Letters
   47          567   Cancer Research
   48          567   Developmental Biology
   49          563   Journal of Cell Science
   50          560   Nature Structural Biology
   51          544   Human Genetics
   52          544   The Plant Journal
   53          505   Mechanisms of Development
   54          501   Current Genetics
   55          487   Current Biology
   56          459   Journal of Clinical Investigation
   57          458   Applied and Environmental Microbiology
   58          455   Neuron
   59          451   Protein Science
   60          450   Acta Crystallographica, Section D
   61          449   Journal of Neuroscience
   62          443   Mammalian Genome
   63          411   Molecular Endocrinology
   64          409   Molecular and Biochemical Parasitology
   65          408   The Journal of Experimental Medicine
   66          389   Immunogenetics
   67          374   Journal of Neurochemistry
   68          373   Toxicon
   69          366   American Journal of Physiology
   70          360   Endocrinology
   71          355   Journal of Molecular Evolution
   72          347   DNA and Cell Biology
   73          334   The Journal of Clinical Endocrinology and Metabolism
   74          334   DNA Sequence
   75          316   Molecular Biology and Evolution
   76          307   Bioscience, Biotechnology, and Biochemistry
   77          300   Brain Research. Molecular Brain Research
   78          285   Biological Chemistry Hoppe-Seyler
   79          284   Journal of Medical Genetics
   80          270   Proteins
   81          263   Cytogenetics and Cell Genetics
   82          257   Comparative Biochemistry and Physiology
   83          247   Journal of Investigative Dermatology
   84          244   Journal of General Microbiology
   85          243   Peptides
   86          238   Antimicrobial Agents and Chemotherapy
   87          238   Molecular Pharmacology
   88          229   Biology of Reproduction
   89          226   Nature Cell Biology
   90          219   Plant and Cell Physiology
   91          215   Genome Research
   92          215   Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
   93          208   Experimental Cell Research
   94          206   Virus Research
   95          193   Molecular Plant-Microbe Interactions
   96          191   Neurology
   97          187   DNA Research
   98          184   European Journal of Immunology
   99          183   RNA
  100          181   Developmental Dynamics
  101          172   Biochimie
  102          168   Annals of Neurology
  103          165   European Journal of Human Genetics
  104          163   Molecular and Cellular Endocrinology
  105          163   Tissue Antigens
  106          159   Journal of Human Genetics
  107          159   DNA
  108          158   Planta
  109          155   Genes to Cells
  110          153   American Journal of Medical Genetics
  111          152   Molecular Phylogenetics and Evolution
  112          152   Hemoglobin
  113          149   Developmental Cell
  114          148   Immunity
  115          146   Archives of Microbiology
  116          145   Bioorganicheskaia Khimiia
  117          139   Insect Biochemistry and Molecular Biology
  118          136   The New England Journal of Medicine
  119          133   Molecular Reproduction and Development
  120          130   Diabetes
  121          130   Animal Genetics
  122          130   Investigative Ophthalmology and Visual Science
  123          128   Glycobiology
  124          127   Molecular Immunology
  125          123   General and Comparative Endocrinology
  126          121   Molecular and Cellular Neuroscience
  127          118   Agricultural and Biological Chemistry
  128          117   Eukaryotic Cell
  129          116   Archives of Virology
  130          111   International Journal of Cancer
  131          110   British Journal of Haematology
  132          109   The FASEB Journal
  133          107   Journal of Protein Chemistry
  134          104   Molecular Genetics and Metabolism
  135          102   Journal of Neuroscience Research
  136          102   EMBO Reports
  137          101   Molecular Genetics and Genomics
  138          100   Biochemistry and Molecular Biology International


6.  STATISTICS FOR SOME LINE TYPES

The following table gives, for selected UniProtKB/Swiss-Prot line types, the
total number of lines, the number of entries containing at least one such
line, and the average number of lines per entry.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                     658462              1.85
   Journal                          541550    286387    1.52
   Submitted to EMBL/GenBank/DDBJ   110587    101077    0.31
   Submitted to other databases       4361      4022    0.01
   Book citation                       580       570   <0.01
   Plant Gene Register                 539       527   <0.01
   Thesis                              416       414   <0.01
   Unpublished observations            284       280   <0.01
   Patent                              139       137   <0.01
   Worm Breeder's Gazette                6         6   <0.01

Comments (CC)                      1457200              4.09
   SIMILARITY                       406912    333079    1.14
   FUNCTION                         254080    244576    0.71
   SUBCELLULAR LOCATION             201326    197449    0.57
   CATALYTIC ACTIVITY               140827    129089    0.40
   SUBUNIT                          137675    137675    0.39
   PATHWAY                           80947     70624    0.23
   COFACTOR                          57092     52038    0.16
   TISSUE SPECIFICITY                28336     28336    0.08
   PTM                               26602     21828    0.07
   MISCELLANEOUS                     25620     23189    0.07
   DOMAIN                            22053     19330    0.06
   ALTERNATIVE PRODUCTS              14956     14956    0.04
   SEQUENCE CAUTION                   9219      9219    0.03
   INTERACTION                        9147      9147    0.03
   INDUCTION                          8458      8458    0.02
   DEVELOPMENTAL STAGE                7152      7152    0.02
   ENZYME REGULATION                  5354      5354    0.02
   WEB RESOURCE                       5071      4011    0.01
   CAUTION                            4606      4509    0.01
   DISEASE                            4200      2911    0.01
   MASS SPECTROMETRY                  3242      2580    0.01
   BIOPHYSICOCHEMICAL PROPERTIES      2026      2026    0.01
   POLYMORPHISM                        675       650   <0.01
   RNA EDITING                         517       517   <0.01
   ALLERGEN                            432       432   <0.01
   TOXIC DOSE                          367       359   <0.01
   BIOTECHNOLOGY                       229       227   <0.01
   PHARMACEUTICAL                       79        79   <0.01

Features (FT)                      2193020              6.16
   CHAIN                            361940    352406    1.02
   TRANSMEM                         216487     47150    0.61
   METAL                            159687     39854    0.45
   BINDING                          107776     35320    0.30
   CONFLICT                         103693     35990    0.29
   DOMAIN                           103615     58710    0.29
   TOPO_DOM                          98201     20112    0.28
   STRAND                            92184      8655    0.26
   HELIX                             88577      9093    0.25
   ACT_SITE                          86543     50940    0.24
   MOD_RES                           85863     31943    0.24
   CARBOHYD                          83674     21422    0.23
   DISULFID                          82625     20876    0.23
   REPEAT                            64854      9936    0.18
   NP_BIND                           60272     42086    0.17
   REGION                            54380     29930    0.15
   VARIANT                           54045     11916    0.15
   VAR_SEQ                           32496     13901    0.09
   COMPBIAS                          31544     18673    0.09
   SIGNAL                            28029     28019    0.08
   TURN                              23669      7381    0.07
   MOTIF                             22815     14879    0.06
   ZN_FING                           22612      8858    0.06
   MUTAGEN                           22566      5463    0.06
   SITE                              22452     12757    0.06
   COILED                            13384      8710    0.04
   INIT_MET                          11919     11919    0.03
   NON_TER                           10874      8325    0.03
   PROPEP                             8895      7475    0.02
   LIPID                              8754      5657    0.02
   DNA_BIND                           8192      7572    0.02
   PEPTIDE                            7143      4444    0.02
   TRANSIT                            5139      5085    0.01
   CA_BIND                            2958      1258    0.01
   CROSSLNK                           2793      1949    0.01
   NON_CONS                           1405       571   <0.01
   UNSURE                              626       208   <0.01
   NON_STD                             339       265   <0.01

Cross-references (DR)              5789634             16.25
   InterPro                         825089    331328    2.32
   EMBL                             627265    347443    1.76
   GO                               497745    216624    1.40
   Pfam                             459384    322980    1.29
   PROSITE                          321640    201793    0.90
   RefSeq                           304120    279958    0.85
   GeneID                           294610    279522    0.83
   KEGG                             264948    246931    0.74
   GenomeReviews                    226623    209455    0.64
   HAMAP                            178738    178641    0.50
   TIGRFAMs                         161263    151501    0.45
   Gene3D                           144931    122545    0.41
   BioCyc                           144788    138347    0.41
   PANTHER                          127681    117494    0.36
   PRINTS                           116051     94354    0.33
   PIR                              109846    100280    0.31
   ProDom                           100202     97501    0.28
   SMART                             95398     72741    0.27
   HSSP                              83032     83032    0.23
   UniGene                           74720     68329    0.21
   Ensembl                           63363     62014    0.18
   SMR                               49846     49846    0.14
   PDBsum                            48670     12327    0.14
   PDB                               48670     12327    0.14
   ArrayExpress                      44118     44118    0.12
   PIRSF                             43200     43200    0.12
   GermOnline                        41993     41381    0.12
   TIGR                              30250     29583    0.08
   CleanEx                           27382     26814    0.08
   HGNC                              17972     17859    0.05
   LinkHub                           17909     17909    0.05
   IntAct                            16028     16028    0.04
   PharmGKB                          15517     15512    0.04
   MGI                               14890     14840    0.04
   MIM                               14743     11762    0.04
   PhosphoSite                       14155     14155    0.04
   H-InvDB                           11268      9573    0.03
   DIP                                8984      8934    0.03
   MEROPS                             6722      6399    0.02
   RGD                                6656      6651    0.02
   SGD                                6643      6541    0.02
   CYGD                               6630      6526    0.02
   TAIR                               6545      6431    0.02
   PeptideAtlas                       5132      5132    0.01
   EcoGene                            4331      4328    0.01
   GeneDB_Spombe                      4236      4196    0.01
   EchoBASE                           4159      4124    0.01
   WormPep                            3811      3150    0.01
   FlyBase                            3591      3463    0.01
   Gramene                            3537      3535    0.01
   WormBase                           3514      3432    0.01
   Reactome                           3398      2056    0.01
   HPA                                2986      2565    0.01
   TRANSFAC                           2907      2608    0.01
   SubtiList                          2812      2811    0.01
   Orphanet                           2618      1670    0.01
   GeneFarm                           2091      2071    0.01
   ZFIN                               1852      1838    0.01
   DrugBank                           1821       501    0.01
   StyGene                            1631      1627   <0.01
   TubercuList                        1464      1428   <0.01
   SWISS-2DPAGE                       1184      1182   <0.01
   PseudoCAP                          1173      1164   <0.01
   ListiList                          1113      1105   <0.01
   REPRODUCTION-2DPAGE                1025       937   <0.01
   AGD                                 709       703   <0.01
   PhotoList                           668       668   <0.01
   LegioList                           659       659   <0.01
   Leproma                             644       641   <0.01
   dictyBase                           637       587   <0.01
   World-2DPAGE                        492       492   <0.01
   MaizeGDB                            463       458   <0.01
   PeroxiBase                          446       435   <0.01
   DisProt                             397       394   <0.01
   OGP                                 380       378   <0.01
   SagaList                            367       366   <0.01
   REBASE                              366       360   <0.01
   HIV                                 361       351   <0.01
   ECO2DBASE                           351       299   <0.01
   GlycoSuiteDB                        282       282   <0.01
   PHCI-2DPAGE                         241       241   <0.01
   BuruList                            234       234   <0.01
   MypuList                            198       198   <0.01
   VectorBase                          192       186   <0.01
   DOSAC-COBS-2DPAGE                   152       150   <0.01
   Aarhus/Ghent-2DPAGE                 126        96   <0.01
   Siena-2DPAGE                        102       102   <0.01
   HSC-2DPAGE                           85        85   <0.01
   2DBase-Ecoli                         84        84   <0.01
   PhosSite                             70        70   <0.01
   Cornea-2DPAGE                        67        67   <0.01
   COMPLUYEAST-2DPAGE                   59        59   <0.01
   euHCVdb                              55        44   <0.01
   PMMA-2DPAGE                          52        52   <0.01
   PptaseDB                             31        31   <0.01
   Rat-heart-2DPAGE                     28        28   <0.01
   ANU-2DPAGE                           22        22   <0.01

Number of explicitly cross-referenced databases: 98
Number of implicitly cross-referenced databases: 25
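
The "Average per entry" column of the table above is the total number of lines
divided by the number of Swiss-Prot entries in this release (356194). The
Python sketch below reproduces it; average_per_entry() is a hypothetical helper,
and the rule that values rounding to 0.00 are printed as "<0.01" is inferred
from the table rather than documented.

```python
SWISS_PROT_ENTRIES = 356_194  # entries in UniProtKB/Swiss-Prot release 55.0


def average_per_entry(total_lines, entries=SWISS_PROT_ENTRIES):
    """Format lines-per-entry to two decimals, as in the table (inferred rule)."""
    formatted = f"{total_lines / entries:.2f}"
    return "<0.01" if formatted == "0.00" else formatted


# Totals copied from the table above.
for line_type, total in [
    ("References (RL)", 658462),
    ("Comments (CC)", 1457200),
    ("Features (FT)", 2193020),
    ("Cross-references (DR)", 5789634),
    ("Patent", 139),
]:
    print(f"{line_type:<24}{total:>9}  {average_per_entry(total):>6}")
```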


7.  MISCELLANEOUS STATISTICS

Total number of distinct authors cited in UniProtKB/Swiss-Prot: 254724

Total number of entries encoded on a Mitochondrion: 4370
Total number of entries encoded on a Plasmid: 3377
Total number of entries encoded on a Plastid: 72
Total number of entries encoded on a Plastid; Apicoplast: 14
Total number of entries encoded on a Plastid; Chloroplast: 8834
Total number of entries encoded on a Plastid; Cyanelle: 145
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 118

Number of fragments: 8475 
Number of additional sequences produced by alternative splicing, initiation or promoter usage: 24213



UniProtKB/TrEMBL protein database release 38.0 statistics


1.  INTRODUCTION

Release 38.0 of 26-Feb-2008 of UniProtKB/TrEMBL contains 5395414 sequence entries
comprising 1746448602 amino acids.

988909 sequences have been added since release 37; the sequence data of
19885 existing entries has been updated, and the annotations of
3686771 entries have been revised. This represents an increase of 23%.



2.  AMINO ACID COMPOSITION

   2.1  Composition in percent for the complete database

   Ala (A) 8.56   Gln (Q) 3.90   Leu (L) 9.85   Ser (S) 6.82
   Arg (R) 5.55   Glu (E) 6.05   Lys (K) 5.21   Thr (T) 5.59
   Asn (N) 4.19   Gly (G) 7.06   Met (M) 2.41   Trp (W) 1.33
   Asp (D) 5.26   His (H) 2.22   Phe (F) 4.05   Tyr (Y) 3.02
   Cys (C) 1.35   Ile (I) 5.92   Pro (P) 4.83   Val (V) 6.65

   Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.07


   2.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Ser, Val, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Cys, Trp
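
   The classification in 2.2 is the composition table of 2.1 sorted by
   decreasing percentage. A minimal Python sketch (the dictionary name is
   hypothetical; the values are copied from 2.1):

```python
# Percent composition of the twenty standard amino acids, from section 2.1.
COMPOSITION = {
    "Ala": 8.56, "Gln": 3.90, "Leu": 9.85, "Ser": 6.82,
    "Arg": 5.55, "Glu": 6.05, "Lys": 5.21, "Thr": 5.59,
    "Asn": 4.19, "Gly": 7.06, "Met": 2.41, "Trp": 1.33,
    "Asp": 5.26, "His": 2.22, "Phe": 4.05, "Tyr": 3.02,
    "Cys": 1.35, "Ile": 5.92, "Pro": 4.83, "Val": 6.65,
}

# Sort residue names from most to least frequent.
ranking = sorted(COMPOSITION, key=COMPOSITION.get, reverse=True)

print(", ".join(ranking))
```

   No two residues share the same percentage in this release, so the order is
   unambiguous.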


3.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 155282

   The twenty most represented species account for 910939 sequences, or 16.9% of
   the total number of entries.


   3.1 Table of the frequency of occurrence of species

        Species represented 1x:71108
                            2x:28370
                            3x:14802
                            4x: 8349
                            5x: 4833
                            6x: 3577
                            7x: 2706
                            8x: 2194
                            9x: 1731
                           10x: 2022
                       11- 20x: 8971
                       21- 50x: 3194
                       51-100x: 1327
                         >100x: 2098



   3.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     208379  Human immunodeficiency virus 1
       2      95503  Oryza sativa subsp. japonica (Rice)
       3      54297  Vitis vinifera (Grape)
       4      52760  Homo sapiens (Human)
       5      50189  Trichomonas vaginalis G3
       6      43738  Mus musculus (Mouse)
       7      43432  Arabidopsis thaliana (Mouse-ear cress)
       8      40657  Hepatitis C virus
       9      39808  Paramecium tetraurelia
      10      39306  Oryza sativa subsp. indica (Rice)
      11      35649  Physcomitrella patens subsp. patens
      12      28061  Tetraodon nigroviridis (Green puffer)
      13      27783  Drosophila melanogaster (Fruit fly)
      14      24866  Nematostella vectensis (Starlet sea anemone)
      15      24850  uncultured bacterium
      16      22520  Danio rerio (Zebrafish) (Brachydanio rerio)
      17      20488  Trypanosoma cruzi
      18      20471  Caenorhabditis elegans
      19      19334  Caenorhabditis briggsae
      20      18848  Hepatitis B virus (HBV)
      21      17895  Laccaria bicolor S238N-H82
      22      16810  Aedes aegypti (Yellowfever mosquito)
      23      16685  Tetrahymena thermophila SB210
      24      16332  Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
      25      15912  Phaeosphaeria nodorum (Septoria nodorum)
      26      14678  Chlamydomonas reinhardtii (Chlamydomonas smithii)
      27      14676  Plasmodium chabaudi
      28      14359  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
      29      14058  Aspergillus niger
      30      13923  Dictyostelium discoideum (Slime mold)
      31      13523  Coprinopsis cinerea okayama7#130
      32      12791  Magnaporthe grisea (Rice blast fungus) (Pyricularia grisea)
      33      12773  Anopheles gambiae str. PEST
      34      12475  Xenopus laevis (African clawed frog)
      35      11969  Aspergillus oryzae
      36      11784  Plasmodium berghei
      37      11570  Brugia malayi (Filarial nematode worm)
      38      10945  Chaetomium globosum (Soil fungus)
      39      10457  Neurospora crassa
      40      10358  Coccidioides immitis
      41      10339  Bos taurus (Bovine)
      42      10302  Neosartorya fischeri (Aspergillus fischerianus
      43      10298  Aspergillus terreus (strain NIH 2624)
      44      10009  Drosophila pseudoobscura (Fruit fly)
      45       9723  Schistosoma japonicum (Blood fluke)
      46       9707  Cryptococcus neoformans (Filobasidiella neoformans)
      47       9676  Aspergillus fumigatus (Sartorya fumigata)
      48       9473  Emericella nidulans (Aspergillus nidulans)
      49       9463  Trypanosoma brucei
      50       9444  Escherichia coli
      51       9320  Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
      52       9290  Candida albicans (Yeast)
      53       9225  Ajellomyces capsulata (strain NAm1) (Histoplasma capsulatum)
      54       9155  Monosiga brevicollis MX1
      55       9132  Hepatitis C virus subtype 1b
      56       9012  Aspergillus clavatus
      57       8861  Rhodococcus sp. (strain RHA1)
      58       8607  Entamoeba dispar SAW760
      59       8603  Methylobacterium nodulans ORS 2060
      60       8512  Stigmatella aurantiaca DW4/3-1
      61       8437  Plesiocystis pacifica SIR-1
      62       8427  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz) 
      63       8249  Microscilla marina ATCC 23134
      64       8238  Burkholderia xenovorans (strain LB400)
      65       8195  Rattus norvegicus (Rat)
      66       8177  Helicobacter pylori (Campylobacter pylori)
      67       8172  Acaryochloris marina (strain MBIC 11017)
      68       8167  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      69       8120  Bradyrhizobium japonicum
      70       8013  Leishmania infantum
      71       7971  Ostreococcus tauri
      72       7887  Leishmania braziliensis
      73       7835  Burkholderia phymatum STM815
      74       7810  Plasmodium yoelii yoelii
      75       7612  Solibacter usitatus (strain Ellin6076)
      76       7515  Streptomyces coelicolor
      77       7461  Burkholderia cenocepacia MC0-3
      78       7434  Plasmodium vivax
      79       7401  Ostreococcus lucimarinus (strain CCE9901)
      80       7349  Burkholderia pseudomallei 305
      81       7336  Bradyrhizobium sp. (strain BTAi1 / ATCC BAA-1182)
      82       7316  Burkholderia sp. (strain 383) (Burkholderia cepacia 
      83       7310  Burkholderia phytofirmans PsJN
      84       7279  Streptomyces avermitilis
      85       7274  Clostridium bolteae ATCC BAA-613
      86       7142  Rhizobium loti (Mesorhizobium loti)
      87       7133  Frankia sp. EAN1pec
      88       7126  Burkholderia vietnamiensis (strain G4 / LMG 22486) (Burkholderia cepacia 
      89       7116  Leishmania major
      90       7102  Plasmodium falciparum
      91       7097  Myxococcus xanthus (strain DK 1622)
      92       7011  Saccharopolyspora erythraea (strain NRRL 23338)
      93       7010  Pseudomonas aeruginosa
      94       6996  Burkholderia ambifaria MC40-6
      95       6974  Methylobacterium sp. 4-46
      96       6946  Rhodopirellula baltica
      97       6945  Burkholderia pseudomallei (strain 668)
      98       6864  Burkholderia pseudomallei (strain 1106a)
      99       6808  Rhizobium leguminosarum bv. viciae (strain 3841)
     100       6679  Psychroflexus torquis ATCC 700755


   
   3.3  Taxonomic distribution of the sequences

   Kingdom        sequences (% of the database)
    Archaea          112655 (  2%)
    Bacteria        2905746 ( 54%)
    Eukaryota       1785622 ( 33%)
    Viruses          586657 ( 11%)
    Other              4733 ( <1%)



   Within Eukaryota:

 
    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                  52761 (  3%)           (  1%)
     Other Mammalia        130205 (  7%)           (  2%)
     Other Vertebrata      197451 ( 11%)           (  4%)
     Viridiplantae         471101 ( 26%)           (  9%)
     Fungi                 327023 ( 18%)           (  6%)
     Insecta               158021 (  9%)           (  3%)
     Nematoda               56038 (  3%)           (  1%)
     Other                 393022 ( 22%)           (  7%)
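
The percentages in the two tables above appear to be each count divided by the relevant total and rounded to the nearest whole percent, with shares that would round to zero printed as <1%. The following Python sketch reproduces the kingdom column under that assumption. The helper name `share` is illustrative only, and the database total is assumed to be the sum of the five kingdom counts.

```python
# Counts copied from the kingdom table above.
kingdoms = {
    "Archaea": 112655,
    "Bacteria": 2905746,
    "Eukaryota": 1785622,
    "Viruses": 586657,
    "Other": 4733,
}


def share(count, total):
    """Format count/total as a whole percent.

    Assumed convention: a share that rounds to 0 is shown as '<1%'.
    """
    pct = round(100 * count / total)
    return "<1%" if pct == 0 else f"{pct}%"


# Assumption: the database total equals the sum of the kingdom counts.
total = sum(kingdoms.values())

for name, count in kingdoms.items():
    print(f"{name:<10} {count:>8} ({share(count, total)})")
```

Calling `share` with the Eukaryota count as the total gives the "% of Eukaryota" column of the second table in the same way.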



4.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50   91470             1001-1100    33181
                 51- 100  383677             1101-1200    23244
                101- 150  468708             1201-1300    15878
                151- 200  447604             1301-1400    10607
                201- 250  449588             1401-1500     8618
                251- 300  430199             1501-1600     6336
                301- 350  398830             1601-1700     4861
                351- 400  314841             1701-1800     3959
                401- 450  258394             1801-1900     2977
                451- 500  217852             1901-2000     2552
                501- 550  156341             2001-2100     2030
                551- 600  116102             2101-2200     2085
                601- 650   86645             2201-2300     1640
                651- 700   67840             2301-2400     1326
                701- 750   59065             2401-2500     1089
                751- 800   53008             >2500         9352
                801- 850   38898
                851- 900   34173
                901- 950   24888
                951-1000   19770



   The average sequence length in UniProtKB/TrEMBL is 323 amino acids.

   The shortest sequence is Q16047_HUMAN:     4 amino acids.
   The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
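
As an illustration of how the size table can be read, the Python sketch below computes the cumulative share of non-fragment sequences up to a given length. The bin counts are copied from the table. The function name `share_up_to` and the representation of the open-ended ">2500" bin as `None` are choices made for this example only.

```python
# (upper bound of bin, count) pairs copied from the size table above.
# None marks the open-ended ">2500" bin.
bins = [
    (50, 91470), (100, 383677), (150, 468708), (200, 447604),
    (250, 449588), (300, 430199), (350, 398830), (400, 314841),
    (450, 258394), (500, 217852), (550, 156341), (600, 116102),
    (650, 86645), (700, 67840), (750, 59065), (800, 53008),
    (850, 38898), (900, 34173), (950, 24888), (1000, 19770),
    (1100, 33181), (1200, 23244), (1300, 15878), (1400, 10607),
    (1500, 8618), (1600, 6336), (1700, 4861), (1800, 3959),
    (1900, 2977), (2000, 2552), (2100, 2030), (2200, 2085),
    (2300, 1640), (2400, 1326), (2500, 1089), (None, 9352),
]

# Total number of sequences covered by the table (fragments excluded).
total = sum(count for _, count in bins)


def share_up_to(limit):
    """Fraction of tabulated sequences whose bin ends at or below `limit`."""
    covered = sum(c for upper, c in bins if upper is not None and upper <= limit)
    return covered / total


print(f"sequences in table: {total}")
print(f"share with at most 500 residues: {share_up_to(500):.1%}")
```

With these counts, about 81% of the tabulated sequences are 500 residues or shorter.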



5.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    7019903              1.30
   Submitted to EMBL/GenBank/DDBJ  3757942   3171477    0.70
   Journal                         3147573   2771279    0.58
   Thesis                             6762      6709   <0.01
   Submitted to other databases       4688      4671   <0.01
   Book citation                      4382      4337   <0.01
   Other                             98556     98380    0.02

Comments (CC)                      3791142              0.70
   SIMILARITY                      1200496   1084100    0.22
   CAUTION                         1126073   1126073    0.21
   CATALYTIC ACTIVITY               403074    342747    0.07
   FUNCTION                         367781    356067    0.07
   SUBCELLULAR LOCATION             315018    315008    0.06
   PATHWAY                          130311    121801    0.02
   SUBUNIT                          123048    122305    0.02
   COFACTOR                         114551    112375    0.02
   MISCELLANEOUS                      5502      5502   <0.01
   INTERACTION                        4745      4745   <0.01
   DOMAIN                              543       543   <0.01

Features (FT)                      2300912              0.43
   NON_TER                         1925207   1145607    0.36
   CHAIN                            224689    186970    0.04
   SIGNAL                           150459    150459    0.03
   TRANSIT                             557       557   <0.01

Cross-references (DR)             47445741              8.79
   GO                              9719800   3154534    1.80
   InterPro                        8029335   3799022    1.49
   EMBL                            6079047   5387662    1.13
   Pfam                            4861188   3591860    0.90
   PROSITE                         2596670   1703309    0.48
   RefSeq                          2269376   2189726    0.42
   GeneID                          2262156   2188782    0.42
   KEGG                            1879939   1814567    0.35
   GenomeReviews                   1592238   1541926    0.30
   Gene3D                          1382127   1183950    0.26
   PRINTS                          1000951    841622    0.19
   SMART                            915287    717020    0.17
   TIGRFAMs                         801155    734682    0.15
   PANTHER                          780419    739026    0.14
   ProDom                           634393    605610    0.12
   SMR                              499090    498972    0.09
   BioCyc                           305512    292732    0.06
   UniGene                          285368    260272    0.05
   HSSP                             264084    263797    0.05
   TIGR                             200230    192919    0.04
   PIR                              183544    150486    0.03
   PIRSF                            180767    180767    0.03
   Ensembl                          166331    159129    0.03
   ArrayExpress                     104405    104378    0.02
   Gramene                           70292     70292    0.01
   euHCVdb                           47780     47780    0.01
   MGI                               42265     42052    0.01
   FlyBase                           35850     35710    0.01
   HGNC                              31568     31534    0.01
   VectorBase                        29029     28730    0.01
   MEROPS                            27182     26524    0.01
   WormPep                           19555     19453   <0.01
   WormBase                          19547     19453   <0.01
   TAIR                              18856     18804   <0.01
   ZFIN                              16420     16413   <0.01
   LinkHub                           12490     12490   <0.01
   dictyBase                         12365     12363   <0.01
   RGD                                6137      4218   <0.01
   PDBsum                             5994      3421   <0.01
   PDB                                5994      3421   <0.01
   IntAct                             5604      5603   <0.01
   LegioList                          5244      5214   <0.01
   ListiList                          4702      4685   <0.01
   PseudoCAP                          4397      4394   <0.01
   PhotoList                          4012      3888   <0.01
   BuruList                           4006      3972   <0.01
   AGD                                3985      3985   <0.01
   REBASE                             3685      3660   <0.01
   TubercuList                        2525      2519   <0.01
   DIP                                2311      2306   <0.01
   PeroxiBase                         2131      2125   <0.01
   PhosphoSite                        1750      1750   <0.01
   SagaList                           1727      1633   <0.01
   Leproma                             963       962   <0.01
   TRANSFAC                            846       837   <0.01
   GeneDB_Spombe                       735       729   <0.01
   MypuList                            584       580   <0.01
   PharmGKB                            472       471   <0.01
   World-2DPAGE                        421       421   <0.01
   SGD                                 337       337   <0.01
   PeptideAtlas                        249       249   <0.01
   PHCI-2DPAGE                         106       106   <0.01
   Reactome                             85        79   <0.01
   ANU-2DPAGE                           59        59   <0.01
   SWISS-2DPAGE                         29        29   <0.01
   REPRODUCTION-2DPAGE                  18        18   <0.01
   CYGD                                 16        16   <0.01
   PMMA-2DPAGE                           3         3   <0.01
   Siena-2DPAGE                          2         2   <0.01
   COMPLUYEAST-2DPAGE                    1         1   <0.01

Number of explicitly cross-referenced databases: 98
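
The "Average per entry" column appears to be the total number of lines divided by the number of entries in the whole database, not by the number of entries that carry such a line. The Python sketch below checks this for a few rows. It assumes a database size of 5395413 entries (the sum of the kingdom counts in section 3.3) and assumes that averages below 0.005 are printed as <0.01. The function names are illustrative.

```python
# Assumption: total number of UniProtKB/TrEMBL entries, taken as the sum
# of the kingdom counts in section 3.3.
TOTAL_ENTRIES = 5395413

# (label, total number of lines, entries with at least one such line)
# copied from the table above; None where the table gives no entry count.
rows = [
    ("References (RL)", 7019903, None),
    ("Thesis", 6762, 6709),
    ("Cross-references (DR)", 47445741, None),
    ("GO", 9719800, 3154534),
    ("InterPro", 8029335, 3799022),
]


def per_entry(total_lines, n_entries=TOTAL_ENTRIES):
    """Average over all database entries, formatted like the table column."""
    avg = total_lines / n_entries
    return "<0.01" if avg < 0.005 else f"{avg:.2f}"


def per_annotated_entry(total_lines, entries_with_line):
    """Average over only those entries that carry at least one such line."""
    return total_lines / entries_with_line


for label, total_lines, with_line in rows:
    line = f"{label:<24} {per_entry(total_lines):>6}"
    if with_line is not None:
        line += f"  ({per_annotated_entry(total_lines, with_line):.2f} per annotated entry)"
    print(line)
```

The second figure is not part of the release statistics. It is shown only to make the difference between the two denominators explicit.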


6.  MISCELLANEOUS STATISTICS

Total number of distinct authors cited in UniProtKB/TrEMBL: 260940

Total number of entries encoded on a Mitochondrion: 195527
Total number of entries encoded on a Plasmid: 85079
Total number of entries encoded on a Plastid: 4006
Total number of entries encoded on a Plastid; Apicoplast: 182
Total number of entries encoded on a Plastid; Chloroplast: 67647
Total number of entries encoded on a Plastid; Cyanelle: 7
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 231

Number of fragments: 1147786


Submissions and Updates

We welcome feedback from our users. We would especially appreciate being notified if sequences belonging to your field of expertise are missing from the database. We would also like to hear about annotations that need updating, for example when the function of a protein has been clarified or when new information about post-translational modifications has become available.

Submit new sequence data, updates and corrections at http://www.uniprot.org/support/submissions.shtml

For all queries regarding submissions to UniProtKB and to submit new protein sequence data, please contact:

UniProt Knowledgebase
The EMBL Outstation - The European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

Telephone: (+44 1223) 494 462
Telefax: (+44 1223) 494 468
E-mail: datasubs@ebi.ac.uk


Download information

Minor releases (every 3 weeks)

The latest data of the UniProt Knowledgebase is available in several formats (flatfile, XML or FASTA) at http://www.uniprot.org/database/download.shtml. The data is further supplemented by a file containing the sequences of all additional alternative isoforms annotated in UniProtKB/Swiss-Prot. This data set is documented in the file ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/README.varsplic

Major releases

For users who wish to download the UniProt Knowledgebase only occasionally, we distribute the latest major release (updated 3 times per year) in flatfile format. Previous UniProtKB/Swiss-Prot and UniProtKB/TrEMBL releases are archived under ftp://ftp.uniprot.org/pub/databases/uniprot/previous_major_releases. The UniProt Knowledgebase major release is also available on DVD from the EBI.


Contact

EMBL Outstation
European Bioinformatics Institute (EBI)
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

Telephone: (+44 1223) 494 444
Fax: (+44 1223) 494 468
Electronic mail address: datalib@ebi.ac.uk / swissprot@ebi.ac.uk
WWW server: http://www.ebi.ac.uk/


Swiss Institute of Bioinformatics (SIB)
Centre Medical Universitaire
1, rue Michel Servet
1211 Geneva 4
Switzerland

Telephone: (+41 22) 379 50 50
Fax: (+41 22) 379 58 58
Electronic mail address: swiss-prot@expasy.org
WWW server: http://www.expasy.org/


Protein Information Resource (PIR)
Georgetown University Medical Center
3300 Whitehaven St., Suite 1200
Washington, DC 20008
United States of America

Telephone: (+1 202) 687 1039
Fax: (+1 202) 687 0057
Electronic mail address: pirmail@georgetown.edu
WWW server: http://pir.georgetown.edu

Citation

If you want to cite UniProt in a publication, please use the following reference:

The UniProt Consortium
"The Universal Protein Resource (UniProt)"
Nucleic Acids Res. 36:D190-D195(2008) doi:10.1093/nar/gkm895