Swiss-Prot Protein Knowledgebase
Release Notes

Release 43.0 of 29-Mar-2004

Content


  Introduction
  Status of the model organisms
  Swiss-Prot protein knowledgebase release 43.0 statistics
  We need your help


  See also Recent changes and Forthcoming changes.

Introduction

Release 43.0 of 29-Mar-2004 of Swiss-Prot contains 146'720 sequence entries, comprising 54'093'154 amino acids abstracted from 113'719 references. Since release 42, 10'760 sequences have been added, the sequence data of 663 existing entries has been updated and the annotations of 44'948 entries have been revised. The number of entries has increased by 8% since release 42.

Release  Date   Number of entries  Number of amino acids
-------  -----  -----------------  ---------------------
    2.0  09/86              3'939                900'163
    3.0  11/86              4'160                969'641
    4.0  04/87              4'387              1'036'010
    5.0  09/87              5'205              1'327'683
    6.0  01/88              6'102              1'653'982
    7.0  04/88              6'821              1'885'771
    8.0  08/88              7'724              2'224'465
    9.0  11/88              8'702              2'498'140
   10.0  03/89             10'008              2'952'613
   11.0  07/89             10'856              3'265'966
   12.0  10/89             12'305              3'797'482
   13.0  01/90             13'837              4'347'336
   14.0  04/90             15'409              4'914'264
   15.0  08/90             16'941              5'486'399
   16.0  11/90             18'364              5'986'949
   17.0  02/91             20'024              6'524'504
   18.0  05/91             20'772              6'792'034
   19.0  08/91             21'795              7'173'785
   20.0  11/91             22'654              7'500'130
   21.0  03/92             23'742              7'866'596
   22.0  05/92             25'044              8'375'696
   23.0  08/92             26'706              9'011'391
   24.0  12/92             28'154              9'545'427
   25.0  04/93             29'955             10'214'020
   26.0  07/93             31'808             10'875'091
   27.0  10/93             33'329             11'484'420
   28.0  02/94             36'000             12'496'420
   29.0  06/94             38'303             13'464'008
   30.0  10/94             40'292             14'147'368
   31.0  02/95             43'470             15'335'248
   32.0  11/95             49'340             17'385'503
   33.0  02/96             52'205             18'531'384
   34.0  10/96             59'021             21'210'389
   35.0  11/97             69'113             25'083'768
   36.0  07/98             74'019             26'840'295
   37.0  12/98             77'977             28'268'293
   38.0  07/99             80'000             29'085'965
   39.0  05/00             86'593             31'411'114
   40.0  10/01            101'602             37'315'215
   41.0  02/03            122'564             44'986'459
   42.0  10/03            135'850             50'046'799
   43.0  03/04            146'720             54'093'154
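
The 8% growth quoted in the Introduction can be checked against the last two rows of the release table. The following is a minimal sketch, not part of the Swiss-Prot tooling; the function name `percent_increase` is hypothetical and the figures are copied from the table above.

```python
# Entry counts copied from the release table (releases 42.0 and 43.0).
prev_entries = 135_850  # release 42.0
curr_entries = 146_720  # release 43.0


def percent_increase(previous: int, current: int) -> float:
    """Return the relative growth from `previous` to `current`, in percent."""
    return (current - previous) / previous * 100


growth = percent_increase(prev_entries, curr_entries)
print(f"Net change: {curr_entries - prev_entries} entries ({growth:.1f}%)")
```

The result rounds to the 8% stated in the release notes.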

Status of the model organisms

We have selected a number of organisms that are the target of genome sequencing and/or mapping projects and for which we intend to provide, among other things, cross-references to specialized databases and specific index files.

From our efforts to annotate human sequence entries as completely as possible arose the HPI project, and the bacterial model organisms became the focus of the HAMAP project. Here is the current status of the model organisms which are not covered by these two projects:

Organism        Database cross-references  Index file    Number of sequences
--------------  -------------------------  ------------  -------------------
A.thaliana      None yet                   arath.txt                   2'591
C.albicans      None yet                   calbican.txt                  286
C.elegans       Wormpep                    celegans.txt                2'458
D.discoideum    DictyDB                    dicty.txt                     319
D.melanogaster  FlyBase                    fly.txt                     1'967
M.musculus      MGD                        mgdtosp.txt                 7'326
S.cerevisiae    SGD                        yeast.txt                   4'930
S.pombe         GeneDB_SPombe              pombe.txt                   2'386

Swiss-Prot protein knowledgebase release 43.0 statistics






1.  INTRODUCTION

Release 43.0 of 29-Mar-2004 of Swiss-Prot contains 146720 sequence entries,
comprising 54093154 amino acids abstracted from 113719 references.

10760 sequences have been added since release 42, the sequence data of
663 existing entries has been updated and the annotations of
44948 entries have been revised. The number of entries has increased by 8%
since release 42.





2.  AMINO ACID COMPOSITION

   2.1  Composition in percent for the complete database

   Ala (A) 7.79   Gln (Q) 3.92   Leu (L) 9.60   Ser (S) 6.89
   Arg (R) 5.28   Glu (E) 6.59   Lys (K) 5.93   Thr (T) 5.47
   Asn (N) 4.23   Gly (G) 6.93   Met (M) 2.37   Trp (W) 1.16
   Asp (D) 5.30   His (H) 2.27   Phe (F) 4.03   Tyr (Y) 3.09
   Cys (C) 1.56   Ile (I) 5.91   Pro (P) 4.85   Val (V) 6.70

   Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.01

   2.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Ser, Val, Glu, Lys, Ile, Thr, Asp, Arg, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Cys, Trp
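
The ordering in 2.2 follows directly from the percentages in 2.1. As an illustration only (the variable names are hypothetical, and this is not the script used to build the release), sorting the twenty standard residues by their percentage reproduces the list:

```python
# Percentages copied from section 2.1 (standard amino acids only).
composition = {
    "Ala": 7.79, "Gln": 3.92, "Leu": 9.60, "Ser": 6.89,
    "Arg": 5.28, "Glu": 6.59, "Lys": 5.93, "Thr": 5.47,
    "Asn": 4.23, "Gly": 6.93, "Met": 2.37, "Trp": 1.16,
    "Asp": 5.30, "His": 2.27, "Phe": 4.03, "Tyr": 3.09,
    "Cys": 1.56, "Ile": 5.91, "Pro": 4.85, "Val": 6.70,
}

# Sort residue names from most to least frequent.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))

# The rounded percentages do not have to sum to exactly 100.
print(f"Sum of listed percentages: {sum(composition.values()):.2f}")
```

No two residues share the same percentage in this release, so the ordering is unambiguous.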





3.  TAXONOMIC ORIGIN

   Total number of species represented in this release of Swiss-Prot: 8424

   The first twenty species represent 57715 sequences:  39.3 % of the total
   number of entries.
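
The share of the first twenty species can be recomputed from the frequencies listed in table 3.2. This is a small sketch under the assumption that the percentage is taken relative to the 146720 entries of the release; the names `top20` and `share` are hypothetical.

```python
# Frequencies of the twenty most represented species, copied from table 3.2.
top20 = [
    10691, 7326, 4930, 4835, 3726, 2712, 2591, 2458, 2386, 1967,
    1773, 1772, 1647, 1438, 1406, 1393, 1284, 1210, 1090, 1080,
]
total_entries = 146_720  # entries in release 43.0

top20_total = sum(top20)
share = top20_total / total_entries * 100
print(f"{top20_total} sequences: {share:.1f} % of the total number of entries")
```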





   3.1  Table of the frequency of occurrence of species

        Species represented 1x: 4065
                            2x: 1283
                            3x:  661
                            4x:  431
                            5x:  269
                            6x:  262
                            7x:  197
                            8x:  149
                            9x:  127
                           10x:   88
                       11- 20x:  344
                       21- 50x:  250
                       51-100x:   93
                         >100x:  205





   3.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1      10691  Homo sapiens (Human)
       2       7326  Mus musculus (Mouse)
       3       4930  Saccharomyces cerevisiae (Baker's yeast)
       4       4835  Escherichia coli
       5       3726  Rattus norvegicus (Rat)
       6       2712  Bacillus subtilis
       7       2591  Arabidopsis thaliana (Mouse-ear cress)
       8       2458  Caenorhabditis elegans
       9       2386  Schizosaccharomyces pombe (Fission yeast)
      10       1967  Drosophila melanogaster (Fruit fly)
      11       1773  Haemophilus influenzae
      12       1772  Methanococcus jannaschii
      13       1647  Escherichia coli O157:H7
      14       1438  Bos taurus (Bovine)
      15       1406  Salmonella typhimurium
      16       1393  Mycobacterium tuberculosis
      17       1284  Escherichia coli O6
      18       1210  Shigella flexneri
      19       1090  Gallus gallus (Chicken)
      20       1080  Mycobacterium bovis
      21        980  Salmonella typhi
      22        962  Pseudomonas aeruginosa
      23        941  Synechocystis sp. (strain PCC 6803)
      24        937  Archaeoglobus fulgidus
      25        873  Xenopus laevis (African clawed frog)
      26        850  Sus scrofa (Pig)
      27        766  Rhizobium meliloti (Sinorhizobium meliloti)
      28        743  Vibrio cholerae
      29        738  Aquifex aeolicus
      30        725  Oryctolagus cuniculus (Rabbit)
      31        695  Yersinia pestis
      32        687  Mycoplasma pneumoniae
      33        647  Pasteurella multocida
      34        605  Mycobacterium leprae
      35        603  Treponema pallidum
      36        601  Streptomyces coelicolor
      37        586  Bacillus halodurans
      38        572  Buchnera aphidicola (subsp. Acyrthosiphon pisum)
      39        570  Vibrio parahaemolyticus
      40        560  Buchnera aphidicola (subsp. Schizaphis graminum)
      41        557  Methanobacterium thermoautotrophicum
      42        557  Helicobacter pylori (Campylobacter pylori)
      43        543  Rickettsia prowazekii
      44        542  Anabaena sp. (strain PCC 7120)
      45        538  Helicobacter pylori J99 (Campylobacter pylori J99)
      46        518  Vibrio vulnificus
      47        504  Staphylococcus aureus (strain Mu50 / ATCC 700699)
      48        503  Staphylococcus aureus (strain N315)
      49        499  Zea mays (Maize)
      50        495  Lactococcus lactis (subsp. lactis) (Streptococcus lactis)
      51        487  Staphylococcus aureus (strain MW2)
      52        486  Mycoplasma genitalium
      53        467  Ralstonia solanacearum (Pseudomonas solanacearum)
      54        464  Staphylococcus epidermidis
      55        463  Listeria monocytogenes
      56        459  Neisseria meningitidis (serogroup B)
      57        457  Listeria innocua
      58        457  Neisseria meningitidis (serogroup A)
      59        449  Pseudomonas putida (strain KT2440)
      60        448  Thermotoga maritima
      61        447  Rhizobium loti (Mesorhizobium loti)
      62        447  Agrobacterium tumefaciens (strain C58 / ATCC 33970)
      63        443  Xanthomonas campestris (pv. campestris)
      64        443  Clostridium acetobutylicum
      65        438  Pseudomonas syringae (pv. tomato)
      66        434  Caulobacter crescentus
      67        424  Oryza sativa (Rice)
      68        419  Deinococcus radiodurans
      69        417  Chlamydia trachomatis
      70        416  Streptococcus pneumoniae
      71        414  Borrelia burgdorferi (Lyme disease spirochete)
      72        412  Xylella fastidiosa
      73        411  Canis familiaris (Dog)
      74        407  Xanthomonas axonopodis (pv. citri)
      75        406  Pyrococcus horikoshii
      76        405  Chlamydia pneumoniae (Chlamydophila pneumoniae)
      77        403  Rhizobium sp. (strain NGR234)
      78        400  Buchnera aphidicola (subsp. Baizongia pistaciae)
      79        400  Pyrococcus abyssi
      80        400  Xylella fastidiosa (strain Temecula1 / ATCC 700964)
      81        395  Chlamydia muridarum
      82        382  Clostridium perfringens
      83        377  Brucella melitensis
      84        375  Brucella suis
      85        374  Bradyrhizobium japonicum
      86        371  Corynebacterium glutamicum (Brevibacterium flavum)
      87        365  Halobacterium sp. (strain NRC-1 / ATCC 700922 / JCM 11081)
      88        362  Campylobacter jejuni
      89        361  Methanosarcina acetivorans
      90        356  Methanosarcina mazei (Methanosarcina frisia)
      91        355  Nicotiana tabacum (Common tobacco)
      92        355  Pyrococcus furiosus
      93        353  Sulfolobus solfataricus
      94        353  Thermoanaerobacter tengcongensis
      95        348  Streptococcus pyogenes
      96        343  Rickettsia conorii
      97        342  Ovis aries (Sheep)
      98        330  Lactobacillus plantarum
      99        330  Shewanella oneidensis
     100        321  Aeropyrum pernix





   

   3.3  Taxonomic distribution of the sequences

   Kingdom        sequences (% of the database)
    Archaea            8393 (  6%)
    Bacteria          62334 ( 42%)
    Eukaryota         67392 ( 46%)
    Viruses            8601 (  6%)

   Within Eukaryota:

    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                  10691 ( 16%)           (  7%)
     Other Mammalia         18197 ( 27%)           ( 12%)
     Other Vertebrata        6287 (  9%)           (  4%)
     Viridiplantae          10743 ( 16%)           (  7%)
     Fungi                   9849 ( 15%)           (  7%)
     Insecta                 3685 (  5%)           (  3%)
     Nematoda                2692 (  4%)           (  2%)
     Other                   5248 (  8%)           (  4%)
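
The kingdom percentages in 3.3 appear to be each count divided by the total number of entries, rounded to the nearest whole percent. That rounding rule is an inference from the figures, not something the release notes state. A minimal sketch with hypothetical names:

```python
# Sequence counts per kingdom, copied from section 3.3.
kingdoms = {
    "Archaea": 8393,
    "Bacteria": 62334,
    "Eukaryota": 67392,
    "Viruses": 8601,
}

total = sum(kingdoms.values())  # should equal the 146720 entries of the release

# Assumption: percentages are rounded to the nearest integer.
percentages = {name: round(count / total * 100) for name, count in kingdoms.items()}

for name, count in kingdoms.items():
    print(f"{name:<10} {count:>6} ({percentages[name]:>3}%)")
```

The four counts add up to the total number of entries, so every entry is assigned to exactly one of these groups.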





4.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50    2672             1001-1100     1294
                 51- 100    9908             1101-1200      921
                101- 150   14435             1201-1300      686
                151- 200   13501             1301-1400      496
                201- 250   14264             1401-1500      388
                251- 300   12453             1501-1600      250
                301- 350   12935             1601-1700      185
                351- 400   11893             1701-1800      135
                401- 450    9066             1801-1900      150
                451- 500    7801             1901-2000      120
                501- 550    5961             2001-2100       70
                551- 600    3965             2101-2200      108
                601- 650    3391             2201-2300      100
                651- 700    2385             2301-2400       59
                701- 750    2073             2401-2500       62
                751- 800    1741             >2500          386
                801- 850    1359
                851- 900    1418
                901- 950    1006
                951-1000     862

   The average sequence length in Swiss-Prot is 368 amino acids.

   The shortest sequence is   GWA_SEPOF (P83570):     2 amino acids.
   The longest sequence is   SNE1_HUMAN (Q8NF91):  8797 amino acids.
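
The reported average length can be compared with the totals given in the introduction. The sketch below assumes the average is the total number of amino acids divided by the total number of entries; the release notes do not say whether the value is truncated or computed over a different subset, so treat the comparison as indicative only.

```python
# Totals copied from the introduction of the statistics.
total_amino_acids = 54_093_154
total_entries = 146_720

mean_length = total_amino_acids / total_entries
print(f"Mean length: {mean_length:.2f} amino acids")
print(f"Integer part: {int(mean_length)}")
```

The integer part of the quotient agrees with the 368 amino acids quoted above.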





5.  JOURNAL CITATIONS

   Note: the following citation statistics reflect the number of distinct
         journal citations.

   Total number of journals cited in this release of Swiss-Prot: 1437

   5.1  Table of the frequency of journal citations

        Journals cited 1x:  529
                       2x:  181
                       3x:  103
                       4x:   66
                       5x:   56
                       6x:   35
                       7x:   32
                       8x:   26
                       9x:   24
                      10x:   15
                  11- 20x:  110
                  21- 50x:  110
                  51-100x:   46
                    >100x:  104





   5.2  List of the most cited journals in Swiss-Prot

   Nb    Citations   Journal name
   --    ---------   -------------------------------------------------------------
    1        10139   Journal of Biological Chemistry
    2         5380   Proceedings of the National Academy of Sciences of the U.S.A.
    3         3835   Journal of Bacteriology
    4         3693   Nucleic Acids Research
    5         3568   Gene
    6         2843   FEBS Letters
    7         2789   Biochemical and Biophysical Research Communications
    8         2573   Biochemistry
    9         2543   European Journal of Biochemistry
   10         2361   The EMBO Journal
   11         2233   Nature
   12         2157   Biochimica et Biophysica Acta
   13         1949   Journal of Molecular Biology
   14         1886   Genomics
   15         1728   Cell
   16         1700   Molecular and Cellular Biology
   17         1353   Biochemical Journal
   18         1293   Science
   19         1153   Plant Molecular Biology
   20         1147   Molecular Microbiology
   21         1141   Molecular and General Genetics
   22          887   Journal of Biochemistry
   23          868   Virology
   24          834   Human Molecular Genetics
   25          788   Journal of Cell Biology
   26          745   Nature Genetics
   27          682   Genes and Development
   28          657   Journal of Virology
   29          641   The American Journal of Human Genetics
   30          639   Plant Physiology
   31          626   Human Mutation
   32          621   Oncogene
   33          568   Infection and Immunity
   34          566   Journal of Immunology
   35          551   Yeast
   36          519   Journal of General Virology
   37          517   Structure
   38          505   Archives of Biochemistry and Biophysics
   39          488   Microbiology
   40          475   FEMS Microbiology Letters
   41          475   Development
   42          436   Nature Structural Biology
   43          423   Genetics
   44          416   Human Genetics
   45          399   Current Genetics
   46          383   Blood
   47          367   Molecular and Biochemical Parasitology
   48          345   Applied and Environmental Microbiology
   49          336   Journal of Clinical Investigation
   50          318   Mammalian Genome
   51          316   Molecular Endocrinology
   52          314   Developmental Biology
   53          310   Protein Science
   54          297   Immunogenetics
   55          297   DNA and Cell Biology
   56          293   Cancer Research
   57          291   Journal of Molecular Evolution
   58          279   Neuron
   59          274   The Journal of Experimental Medicine
   60          274   Molecular Biology of the Cell
   61          271   Biological Chemistry Hoppe-Seyler
   62          269   Mechanisms of Development
   63          265   Acta Crystallographica, Section D
   64          265   The Plant Cell
   65          250   Endocrinology
   66          246   Journal of Cell Science
   67          239   DNA Sequence
   68          234   The Plant Journal
   69          232   Journal of General Microbiology
   70          223   Journal of Neuroscience
   71          222   Molecular Biology and Evolution
   72          213   Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
   73          212   Journal of Neurochemistry
   74          208   Brain Research. Molecular Brain Research
   75          206   The Journal of Clinical Endocrinology and Metabolism
   76          194   Cytogenetics and Cell Genetics
   77          182   Toxicon
   78          177   Comparative Biochemistry and Physiology
   79          175   American Journal of Physiology
   80          174   Bioscience, Biotechnology, and Biochemistry
   81          167   Molecular Cell
   82          163   Molecular Pharmacology
   83          160   Antimicrobial Agents and Chemotherapy
   84          158   Current Biology
   85          156   DNA
   86          144   Journal of Investigative Dermatology
   87          142   Tissue Antigens
   88          141   DNA Research
   89          140   Proteins
   90          136   Molecular Plant-Microbe Interactions
   91          136   Biochimie
   92          132   Peptides
   93          132   Virus Research
   94          132   Journal of Medical Genetics
   95          129   Bioorganicheskaia Khimiia
   96          125   American Journal of Medical Genetics
   97          124   Genome Research
   98          120   Hemoglobin
   99          117   Molecular and Cellular Endocrinology
  100          114   Agricultural and Biological Chemistry
  101          108   Biology of Reproduction
  102          107   Plant and Cell Physiology
  103          105   European Journal of Immunology
  104          102   Archives of Microbiology





6.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some Swiss-Prot lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                     283423              1.93
   Journal                          250489    136613    1.71
   Submitted to EMBL/GenBank/DDBJ    30227     25422    0.21
   Unpublished observations            536       532   <0.01
   Submitted to Swiss-Prot             527       525   <0.01
   Plant Gene Register                 487       476   <0.01
   Book citation                       465       453   <0.01
   Thesis                              263       261   <0.01
   Submitted to other databases        203       202   <0.01
   Unpublished results                 127       125   <0.01
   Patent                               97        96   <0.01
   Worm Breeder's Gazette                2         2   <0.01



Comments (CC)                       512278              3.49
   SIMILARITY                       147106    127783    1.00
   FUNCTION                          93944     92443    0.64
   SUBCELLULAR LOCATION              69668     69668    0.47
   CATALYTIC ACTIVITY                51230     48248    0.35
   SUBUNIT                           44297     44297    0.30
   PATHWAY                           24285     23209    0.17
   COFACTOR                          17058     17058    0.12
   TISSUE SPECIFICITY                16130     16130    0.11
   PTM                                9012      8155    0.06
   MISCELLANEOUS                      8738      8050    0.06
   ALTERNATIVE PRODUCTS               5272      5272    0.04
   DOMAIN                             4975      4457    0.03
   CAUTION                            4387      4105    0.03
   INDUCTION                          4067      4067    0.03
   DEVELOPMENTAL STAGE                3868      3868    0.03
   DISEASE                            2514      1876    0.02
   ENZYME REGULATION                  2012      2012    0.01
   DATABASE                           1294      1217    0.01
   MASS SPECTROMETRY                  1213      1078    0.01
   POLYMORPHISM                        454       444   <0.01
   ALLERGEN                            335       335   <0.01
   RNA EDITING                         295       295   <0.01
   BIOTECHNOLOGY                        77        77   <0.01
   PHARMACEUTICAL                       47        47   <0.01



Features (FT)                       831689              5.67
   DOMAIN                           116075     35669    0.79
   TRANSMEM                          91827     20000    0.63
   TURN                              62474      4662    0.43
   STRAND                            57252      4163    0.39
   CONFLICT                          55195     19373    0.38
   METAL                             54792     13543    0.37
   CARBOHYD                          50364     12429    0.34
   DISULFID                          46118     12311    0.31
   HELIX                             45117      4520    0.31
   REPEAT                            31810      4634    0.22
   ACT_SITE                          31418     19008    0.21
   VARIANT                           27420      5089    0.19
   CHAIN                             26007     21096    0.18
   NP_BIND                           18909     13304    0.13
   SIGNAL                            16306     16304    0.11
   MOD_RES                           14982      8452    0.10
   NON_TER                           10597      8092    0.07
   SITE                              10154      6238    0.07
   VARSPLIC                           9968      4490    0.07
   BINDING                            9770      7652    0.07
   ZN_FING                            9215      3288    0.06
   MUTAGEN                            6587      1880    0.04
   INIT_MET                           6135      6090    0.04
   PROPEP                             5196      4409    0.04
   DNA_BIND                           4618      4327    0.03
   LIPID                              4175      2790    0.03
   PEPTIDE                            2806      1101    0.02
   TRANSIT                            2791      2766    0.02
   CA_BIND                            1840       792    0.01
   NON_CONS                            835       428    0.01
   CROSSLNK                            458       360   <0.01
   UNSURE                              315       131   <0.01
   SE_CYS                              163       108   <0.01



Cross-references (DR)              1336204              9.11
   EMBL                             284537    140087    1.94
   InterPro                         264209    129846    1.80
   Pfam                             168429    124579    1.15
   PROSITE                          128678     81216    0.88
   PIR                               88842     81354    0.61
   PRINTS                            47994     42345    0.33
   GO                                47413     14722    0.32
   SMART                             42918     32422    0.29
   HAMAP                             40549     40436    0.28
   TIGRFAMs                          40318     37407    0.27
   HSSP                              38738     38738    0.26
   ProDom                            36531     35107    0.25
   PDB                               22244      6010    0.15
   TIGR                              14632     14556    0.10
   Genew                              9613      9565    0.07
   MIM                                9433      7904    0.06
   MGD                                6973      6952    0.05
   SGD                                4973      4919    0.03
   GermOnline                         4927      4876    0.03
   EcoGene                            4227      4225    0.03
   MEROPS                             3454      3339    0.02
   WormPep                            2730      2439    0.02
   SubtiList                          2667      2666    0.02
   TRANSFAC                           2648      2373    0.02
   FlyBase                            2520      2446    0.02
   GeneDB_SPombe                      2399      2369    0.02
   RGD                                2297      2297    0.02
   TubercuList                        1421      1385    0.01
   StyGene                            1362      1359    0.01
   PIRSF                              1168      1168    0.01
   SWISS-2DPAGE                       1075      1075    0.01
   ListiList                           921       860    0.01
   Leproma                             609       605   <0.01
   GK                                  594       594   <0.01
   Gramene                             556       552   <0.01
   MaizeDB                             411       406   <0.01
   HIV                                 370       354   <0.01
   REBASE                              361       356   <0.01
   ECO2DBASE                           351       299   <0.01
   DictyBase                           321       319   <0.01
   ZFIN                                260       260   <0.01
   GlycoSuiteDB                        259       259   <0.01
   PHCI-2DPAGE                         214       214   <0.01
   SagaList                            205       204   <0.01
   PhotoList                           175       175   <0.01
   MypuList                            159       159   <0.01
   Aarhus/Ghent-2DPAGE                 128        98   <0.01
   Siena-2DPAGE                        103       103   <0.01
   HSC-2DPAGE                           85        85   <0.01
   PhosSite                             53        53   <0.01
   COMPLUYEAST-2DPAGE                   50        50   <0.01
   PMMA-2DPAGE                          48        48   <0.01
   Maize-2DPAGE                         39        39   <0.01
   ANU-2DPAGE                           13        13   <0.01
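
The "Average per entry" column appears to be the total number of lines divided by the total number of entries in the release (146720), shown with two decimals, with values that would round to 0.00 displayed as "<0.01". This rule is inferred from the figures in the table rather than documented, and the function name below is hypothetical.

```python
TOTAL_ENTRIES = 146_720  # entries in release 43.0


def average_per_entry(total_lines: int, entries: int = TOTAL_ENTRIES) -> str:
    """Format lines-per-entry the way the table seems to (assumed rule)."""
    text = f"{total_lines / entries:.2f}"
    # Assumption: anything that rounds to 0.00 is printed as "<0.01".
    return "<0.01" if text == "0.00" else text


# A few totals copied from the table above.
for label, total_lines in [("EMBL", 284537), ("NON_CONS", 835), ("Leproma", 609)]:
    print(f"{label:<10} {total_lines:>7} {average_per_entry(total_lines):>6}")
```

Note that the divisor is the whole database, not the number of entries carrying the line type, which is why a cross-reference present in only a few hundred entries averages below 0.01.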





7.  MISCELLANEOUS STATISTICS

Total number of distinct authors cited in Swiss-Prot: 180569

Total number of entries encoded on a chloroplast: 3494
Total number of entries encoded on a mitochondrion: 2886
Total number of entries encoded on a cyanelle: 145
Total number of entries encoded on a plasmid: 2736

Number of fragments: 8221
Number of additional sequences encoded on splice variants: 7776

We need your help

We welcome feedback from our users. We would especially appreciate being notified if sequences belonging to your field of expertise are missing from the database. We would also like to be notified about annotations that need updating, for example when the function of a protein has been clarified or new information about post-translational modifications has become available. To facilitate this feedback, the ExPASy WWW server offers a form for submitting updates and/or corrections to Swiss-Prot.

It is also possible, from any entry in Swiss-Prot displayed by the ExPASy server, to submit updates and/or corrections for that particular entry. Finally, you can also send your comments by electronic mail.

Note that all update requests are assigned a unique identifier of the form UR-Xnnnn (example: UR-A0123). This identifier is used internally by the Swiss-Prot staff at SIB and EBI to track requests and is also used in e-mail exchanges with the persons who have submitted a request.
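
For readers who want to recognize such identifiers in their own correspondence, the pattern can be sketched as a regular expression. This assumes that "X" stands for a single uppercase letter and "nnnn" for exactly four digits, which is inferred from the example UR-A0123 and not confirmed by the text; the function name is hypothetical.

```python
import re

# Assumed shape: "UR-", one uppercase letter, four digits (e.g. UR-A0123).
UPDATE_REQUEST_ID = re.compile(r"UR-[A-Z]\d{4}")


def is_update_request_id(text: str) -> bool:
    """Return True if `text` is exactly one identifier of the assumed form."""
    return UPDATE_REQUEST_ID.fullmatch(text) is not None


print(is_update_request_id("UR-A0123"))
print(is_update_request_id("UR-A123"))
```

Using `fullmatch` rather than `search` rejects strings that merely contain an identifier, which is usually what a validation step wants.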