SWISS-PROT RELEASE 32.0 RELEASE NOTES
1. INTRODUCTION
1.1 Evolution
Release 32.0 of SWISS-PROT contains 49'340 sequence entries, comprising
17'385'503 amino acids abstracted from 43'056 references. This
represents an increase of 13.5% in the number of entries over release
31. The recent growth of the data bank is summarized below.
Release  Date   Number of entries  Number of amino acids
2.0      09/86               3939                900 163
3.0      11/86               4160                969 641
4.0      04/87               4387              1 036 010
5.0      09/87               5205              1 327 683
6.0      01/88               6102              1 653 982
7.0      04/88               6821              1 885 771
8.0      08/88               7724              2 224 465
9.0      11/88               8702              2 498 140
10.0     03/89              10008              2 952 613
11.0     07/89              10856              3 265 966
12.0     10/89              12305              3 797 482
13.0     01/90              13837              4 347 336
14.0     04/90              15409              4 914 264
15.0     08/90              16941              5 486 399
16.0     11/90              18364              5 986 949
17.0     02/91              20024              6 524 504
18.0     05/91              20772              6 792 034
19.0     08/91              21795              7 173 785
20.0     11/91              22654              7 500 130
21.0     03/92              23742              7 866 596
22.0     05/92              25044              8 375 696
23.0     08/92              26706              9 011 391
24.0     12/92              28154              9 545 427
25.0     04/93              29955             10 214 020
26.0     07/93              31808             10 875 091
27.0     10/93              33329             11 484 420
28.0     02/94              36000             12 496 420
29.0     06/94              38303             13 464 008
30.0     10/94              40292             14 147 368
31.0     02/95              43470             15 335 248
32.0     11/95              49340             17 385 503
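The growth between two releases can be derived directly from the table
above. The following Python fragment is only a small illustration of that
arithmetic for releases 31.0 and 32.0; the function name is ours and is
not part of any SWISS-PROT tool.

```python
def percent_increase(old: int, new: int) -> float:
    """Return the growth from old to new as a percentage of old."""
    return (new - old) / old * 100.0

# Figures copied from the table above (releases 31.0 and 32.0).
entries_31, entries_32 = 43_470, 49_340
residues_31, residues_32 = 15_335_248, 17_385_503

entry_growth = percent_increase(entries_31, entries_32)
residue_growth = percent_increase(residues_31, residues_32)
print(f"entries: {entry_growth:.1f}%  amino acids: {residue_growth:.1f}%")
```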
2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 31
2.1 Sequences and annotations
5'959 sequences have been added since release 31, the sequence data of
921 existing entries has been updated and the annotations of 10'691
entries have been revised.
Major annotation and sequence updates have been made in preparation for
the changes that will take place in release 33 (see section 3.1 of these
notes).
2.2 What's happening with the model organisms
We have selected a number of organisms that are the target of genome
sequencing and/or mapping projects and for which we intend to:
- Be as complete as possible. All sequences available at a given time
should be immediately included in SWISS-PROT. This also includes
sequence corrections and updates;
- Provide a higher level of annotation;
- Provide cross-references to specialized database(s) that contain,
among other data, some genetic information about the genes that code
for these proteins;
- Provide specific indices or documents.
What was done since the last release or in preparation for the next
release concerning model organisms:
- We have added two species to the list of model organisms:
Haemophilus influenzae. Haemophilus influenzae is the first bacterial
genome to be completely sequenced. Its 1'830 Kb sequence was recently
(Science 269:496-512(1995)) determined by a team from The Institute
for Genomic Research (TIGR). This bacterial genome codes for an
estimated 1'740 protein sequences. We have already annotated and
incorporated about 85% of this data into SWISS-PROT. What is left
will be made available in the following weeks.
Candida albicans. We have added Candida albicans to the list of model
organisms because of the extensive work being done by Stew Scherer
and colleagues at the Department of Microbiology of the University of
Minnesota to organize data from that fungal organism. Their data is
available from a WWW server:
http://alces.med.umn.edu/Candida.html
We currently have in SWISS-PROT all the publicly available C.albicans
protein sequences.
- We have started a major effort to catch up with the backlog of
sequences from Arabidopsis thaliana. About 150 entries have been
added since release 31. This effort will be continued and expanded in
the coming months.
- We have added to SWISS-PROT all the sequences from yeast chromosome
VI. All the data from yeast chromosome X is also in preparation and
will be available in a few days with the first weekly update of
SWISS-PROT. Yeast sequence entries are now cross-referenced to both
LISTA and SGD (see section 2.3). We plan to work on chromosome XII
and XIII entries very soon.
- We are regularly adding data coming from the S.pombe chromosome I
sequencing project. About 180 S.pombe entries were added since
release 31.
- Although we added 234 entries from C.elegans, we have not yet caught
up with the backlog of sequence data produced from the genome
sequencing project of that organism. We hope to be able to clear up a
significant part of that backlog for release 33.
- We are almost up to date concerning Bacillus subtilis (306 entries
added), Escherichia coli (317 entries added) and Salmonella
typhimurium (35 entries added).
- A major effort is still needed to catch up on human (214 entries
added) and Drosophila (57 entries added) sequences.
- We plan to add Mycoplasma genitalium (the second bacterial genome to
be completely sequenced) as a model organism in release 33.
Here is the current status of the model organisms:
Organism        Database               Index file      Number of
                cross-referenced                       sequences
--------------  ---------------------  --------------  ---------
A.thaliana      None yet               In preparation        432
B.subtilis      SubtiList              SUBTILIS.TXT         1389
C.albicans      None yet               CALBICAN.TXT          100
C.elegans       WormPep                CELEGANS.TXT          924
D.discoideum    DictyDB                DICTY.TXT             213
D.melanogaster  FlyBase                In preparation        768
E.coli          EcoGene                ECOLI.TXT            3468
H.influenzae    None yet               HAEINFLU.TXT         1575
H.sapiens       MIM                    MIMTOSP.TXT          3281
S.cerevisiae    LISTA/SGD              YEAST.TXT            3391
S.typhimurium   StyGene                SALTY.TXT             603
S.pombe         None yet               POMBE.TXT             460
S.solfataricus  None yet               None yet               61
2.3 Changes in the DR line and other news about cross-references
We have added cross-references from SWISS-PROT to the Saccharomyces
Genome Database (SGD) (previously known as SacchDB) prepared under the
supervision of Michael Cherry at the Stanford University School of
Medicine. These cross-references are present in the DR lines:
Data bank identifier: SGD
Primary identifier: Unique identifier attributed by SGD to the gene
coding for the protein
Secondary identifier: The gene designation (name)
Example: DR SGD; L0000008; AAR2.
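As an illustration of how such a line can be read by a program, here is
a minimal Python sketch that splits a three-field DR line into the data
bank identifier, primary identifier and secondary identifier described
above. The class and function names are hypothetical, and the parser
assumes only the "DR" line code, semicolon-separated fields and a
terminating period.

```python
from typing import NamedTuple

class CrossReference(NamedTuple):
    database: str      # data bank identifier, e.g. "SGD"
    primary_id: str    # primary identifier
    secondary_id: str  # secondary identifier

def parse_dr_line(line: str) -> CrossReference:
    """Split a three-field DR line into its components."""
    if not line.startswith("DR"):
        raise ValueError("not a DR line")
    body = line[2:].strip().rstrip(".")
    fields = [field.strip() for field in body.split(";")]
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}")
    return CrossReference(*fields)

xref = parse_dr_line("DR   SGD; L0000008; AAR2.")
print(xref.database, xref.primary_id, xref.secondary_id)
```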
We very recently started to receive pre-releases of protein
3D-structure entries directly from PDB. Thanks to this new development,
we will be able to keep the cross-references between SWISS-PROT and PDB
up to date.
Currently there are 920 SWISS-PROT entries that are cross-referenced to
PDB, but we need to catch up with a small backlog corresponding to the
significant increase in the number of PDB entries in the last six
months. We plan to be in synchronization with PDB starting with release
33.
There are currently 174'439 DR lines in SWISS-PROT, an average of 3.53
cross-references per entry.
2.4 Replacement of RM line by RX line
In this release, the RM (Reference Medline) line has been replaced by a
more 'generic' line called RX (Reference cross-references). The format
of that line is:
RX BIBLIOGRAPHIC_DATABASE_NAME; IDENTIFIER.
As of this release, the only "bibliographic_database_name" that is used
is "MEDLINE" and the associated "identifier" is the eight digit Medline
Unique Identifier (UID). However, it is 'rumored' that additional
bibliographic databases are interested in being linked to the sequence
databases.
Example:
RM 91002678
has been changed to:
RX MEDLINE; 91002678.
There are currently 64'668 Medline cross-references (RX) in SWISS-PROT.
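Software that still handles entries from earlier releases may need to
convert RM lines to the new format. The following Python sketch shows one
possible way to do so. It assumes that an RM line holds nothing but the
eight digit UID and that the line code is followed by three blanks; the
function name is ours.

```python
import re

# An RM line carries only the eight digit Medline UID.
RM_PATTERN = re.compile(r"^RM\s+(\d{8})\s*$")

def rm_to_rx(line: str) -> str:
    """Rewrite an old RM line as an RX line; return other lines unchanged."""
    match = RM_PATTERN.match(line)
    if match is None:
        return line
    return f"RX   MEDLINE; {match.group(1)}."

converted = rm_to_rx("RM   91002678")
print(converted)
```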
2.5 Status of the documentation files
SWISS-PROT is distributed with a large number of documentation files.
Some of these files have been available for a long time (the user
manual, release notes, the various indices for authors, citations,
keywords, etc.), but many have been created recently and we are
continuously adding new files. Since release 31, we have added 8 new
document files. The following table lists all the documents that are
either currently available or that we plan to add in the next few
months.
USERMAN .TXT User manual
RELNOTES.TXT Release notes
SHORTDES.TXT Short description of entries in SWISS-PROT
JOURLIST.TXT List of abbreviations for journals cited
KEYWLIST.TXT List of keywords in use
SPECLIST.TXT List of organism identification codes
EXPERTS .TXT List of on-line experts for PROSITE and SWISS-PROT
SUBMIT .TXT Submission of sequence data to the SWISS-PROT data bank [1]
ACINDEX .TXT Accession number index
AUTINDEX.TXT Author index
CITINDEX.TXT Citation index
KEYINDEX.TXT Keyword index
SPEINDEX.TXT Species index
7TMRLIST.TXT List of 7-transmembrane G-linked receptors entries
AATRNASY.TXT List of aminoacyl-tRNA synthetases [1]
ALLERGEN.TXT Nomenclature and index of allergen sequences [1]
CALBICAN.TXT Index of Candida albicans entries and their corresponding
gene designations [1]
CDLIST .TXT CD nomenclature for surface proteins of human leucocytes
CELEGANS.TXT Index of Caenorhabditis elegans entries and their
corresponding gene designations and WormPep cross-
references
DICTY .TXT Index of Dictyostelium discoideum entries and their
corresponding gene designations and DictyDB cross-
references
EC2DTOSP.TXT Index of Escherichia coli Gene-protein database entries
referenced in SWISS-PROT
ECOLI .TXT Index of Escherichia coli K12 chromosomal entries and
their corresponding EcoGene cross-reference
EMBLTOSP.TXT Index of EMBL Database entries referenced in SWISS-PROT
EXTRADOM.TXT Nomenclature of extracellular domains
GLYCOSYL.TXT Index of glycosyl hydrolases classified by families on the
basis of sequence similarities [2]
HAEINFLU.TXT Index of Haemophilus influenzae RD chromosomal entries [1]
HOXLIST .TXT Vertebrate homeotic Hox proteins: nomenclature and index
HUMCHR21.TXT Index of protein sequence entries encoded on human
chromosome 21
HUMCHR22.TXT Index of protein sequence entries encoded on human
chromosome 22 [1]
HUMCHRY .TXT Index of protein sequence entries encoded on human
chromosome Y
MIMTOSP .TXT Index of MIM entries referenced in SWISS-PROT
NOMLIST .TXT List of nomenclature related references for proteins
PDBTOSP .TXT Index of Brookhaven PDB entries referenced in SWISS-PROT
PEPTIDAS.TXT Classification of peptidase families and index of peptidases
entries [1]
PLASTID .TXT List of chloroplast and cyanelle encoded proteins
POMBE .TXT Index of Schizosaccharomyces pombe entries in SWISS-PROT
and their corresponding gene designations
RESTRIC .TXT List of restriction enzymes and methylases entries
RIBOSOMP.TXT Index of ribosomal proteins classified by families on the
basis of sequence similarities [2]
SALTY .TXT Index of Salmonella typhimurium LT2 chromosomal entries
and their corresponding StyGene cross-references
SUBTILIS.TXT Index of Bacillus subtilis 168 chromosomal entries and
their corresponding SubtiList cross-references
YEAST .TXT Index of Saccharomyces cerevisiae entries and their
corresponding gene designations [3]
YEAST1 .TXT Yeast Chromosome I entries
YEAST2 .TXT Yeast Chromosome II entries
YEAST3 .TXT Yeast Chromosome III entries
YEAST5 .TXT Yeast Chromosome V entries
YEAST6 .TXT Yeast Chromosome VI entries [1]
YEAST8 .TXT Yeast Chromosome VIII entries
YEAST9 .TXT Yeast Chromosome IX entries
YEAST10 .TXT Yeast Chromosome X entries [2]
YEAST11 .TXT Yeast Chromosome XI entries
Notes:
[1] New in release 32.
[2] Will be available starting with release 33 in February 1996.
[3] The format of that file was changed to add cross-references to SGD.
We have also started to include, in SWISS-PROT document files, listings
of World-Wide Web (WWW) sites relevant to the subject under
consideration. For
example, in the "POMBE.TXT" file, you will find the following lines:
More information on Schizosaccharomyces, its genome, biology and
genetics, is available from the following WWW pages:
NIH : http://www.nih.gov/sigs/yeast/fission.html
Salk: http://flosun.salk.edu/users/forsburg/lab.html
UCL : http://t-chappell.mcbl.ucl.ac.uk/
2.6 The ExPASy World-Wide Web server
2.6.1 Background information
The most efficient and user-friendly way to browse interactively in
SWISS-PROT, PROSITE, ENZYME, SWISS-2DPAGE and other databases is to use
the World-Wide Web (WWW) molecular biology server ExPASy. WWW is a
global information retrieval system merging the power of world-wide
networks, hypertext and multimedia. Through hypertext links, it gives
access to documents and information available on thousands of servers
around the world. To access a WWW server one needs a WWW browser.
Popular browsers available for most computer platforms include
Mosaic(TM), developed at the National Center for Supercomputing
Applications (NCSA) of the University of Illinois at Urbana-Champaign (it may
be obtained by anonymous ftp from ftp.ncsa.uiuc.edu) and Netscape
Navigator(TM) from Netscape Communications Corp. (available from
ftp.netscape.com). Using a WWW browser, one has access to all the
hypertext documents stored on the ExPASy server as well as many other
WWW servers.
The ExPASy server was made available to the public in September 1993. In
November 1995 a cumulative total of 3 million connections was reached.
It may be accessed through its Uniform Resource Locator (URL - the
addressing system defined in WWW), which is:
http://expasy.hcuge.ch/
The ExPASy WWW server allows access, using the user-friendly hypertext
model, to the SWISS-PROT, PROSITE, ENZYME, SWISS-2DPAGE and SWISS-
3DIMAGE databases and, through any SWISS-PROT protein sequence entry, to
other databases such as EMBL, EcoCyc, FlyBase, GCRDb, LISTA, MaizeDB,
SubtiList, OMIM, PDB, HSSP, ProDom, REBASE, SGD, YEPD and Medline. Using
a browser which is able to display images one can also remotely access
2D gel image data from SWISS-2DPAGE. ExPASy also offers many tools for
the analysis of protein sequences and 2D gels.
For more information on the ExPASy WWW server, you can read the
following article:
Appel R.D., Bairoch A., Hochstrasser D.F.
A new generation of information retrieval tools for biologists: the
example of the ExPASy WWW server.
Trends Biochem. Sci. 19:258-260(1994).
Or you can contact Dr. Ron Appel:
Email: ron.appel@dim.hcuge.ch
Fax: +41-22-372 61 98
2.6.2 SWISS-SHOP
Thanks to the work of Manuel Peitsch from the Geneva Glaxo Institute for
Molecular Biology, we can provide, on ExPASy, a service called
SWISS-SHOP. SWISS-SHOP allows any user of SWISS-PROT to indicate which
proteins he/she is interested in. This can be done using various
criteria that can be combined:
- By entering one or more words that should be present in the
description line;
- By entering one or more species name(s) or taxonomic division(s);
- By entering one or more keywords;
- By entering one or more author names;
- By entering the accession number (or entry name) of a PROSITE pattern
or a user-defined sequence pattern;
- By entering the accession number (or entry name) of an existing
SWISS-PROT entry or by entering a "private" sequence.
Every week, the new sequences entered in SWISS-PROT are automatically
compared with all the criteria that have been defined by the users. If a
sequence corresponds to the selection criteria defined by a user, that
sequence is sent by electronic mail.
2.6.3 What is new on ExPASy
Since the last release, there have been many new developments on the
ExPASy WWW server. Here are some highlights of these changes:
- A new option has been introduced that allows you to search SWISS-PROT,
PROSITE and SeqAnalRef by citation. When you call this option, you
are prompted to enter the name of a journal and optionally a volume
number and/or a year. The program is written in such a way that you
can enter either the full name of a journal or its official (as
listed in the JOURLIST.TXT file) abbreviation (with or without
periods). It is also able to recognize special abbreviations such as
JBC, NAR, PNAS, etc. So, for example, you can either enter:
Journal of Biological Chemistry
J. Biol. Chem.
J Biol Chem
JBC
If you do not enter a valid journal name or abbreviation, it will
show you the list of those that could potentially match your input.
- We have improved the options that allow you to search in SWISS-PROT
by 'description' or by 'full text':
If your search criteria return a list that contains more than two
entries, you now have the option to save these SWISS-PROT
entries into a file which is stored (for up to a week) on the
ExPASy anonymous FTP server. Thus it is now possible for users to
create custom subsets of the database and to download them to their
computer.
If your search criteria do not return any entry, you can, if you
believe that the sequence(s) that you are looking for are currently
missing in SWISS-PROT, send a message to the SWISS-PROT team so
that they can take steps to ensure that these sequence(s) are added
to the database.
- The Journal of Biological Chemistry (JBC) has a WWW server where
abstracts and full text of articles are made available. We are happy
to announce the implementation of what we believe to be the first
direct link in a sequence database between a reference and the full
text version of a cited article. Recent JBC references in SWISS-PROT
and PROSITE are directly linked to the corresponding entry point in
the JBC server.
- ProtParam is a new tool which we have implemented and that allows the
computation of various physical and chemical parameters for a given
protein stored in SWISS-PROT or for a user entered sequence. The
computed parameters include the molecular weight, theoretical pI,
amino acid composition, extinction coefficient, estimated half-life,
instability index and aliphatic index.
- RandSeq is a new tool which generates random protein sequences. You
can choose the length of the sequence to be created as well as choose
between four different options for the composition of the generated
sequence: equal composition for all amino acids; use the composition
of a specific sequence from SWISS-PROT; average amino acid
composition (computed from SWISS-PROT); user specified composition in
percent.
- WWW links have been implemented between SWISS-PROT yeast entries and
SGD (see section 2.3), as well as between Escherichia coli K12
chromosomal entries and the EcoCyc database, the Encyclopedia of E.
coli Genes and Metabolism.
- Most SWISS-PROT documents are now directly linked to relevant WWW
servers or specific documents (see section 2.5).
- Many other changes have been made to all parts of the server.
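To make the RandSeq composition options listed above more concrete, here
is a minimal Python sketch of how random protein sequences could be drawn
either with equal composition or with the composition of a given
sequence. It is not the program running on ExPASy; the function names and
the restriction to the 20 standard amino acids are our assumptions.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes

def composition_of(sequence):
    """Return the fractional amino acid composition of a sequence."""
    counts = Counter(residue for residue in sequence if residue in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}

def random_sequence(length, composition=None, rng=None):
    """Draw `length` residues; equal composition when none is given."""
    rng = rng or random.Random()
    if composition is None:
        weights = [1.0] * len(AMINO_ACIDS)
    else:
        weights = [composition.get(aa, 0.0) for aa in AMINO_ACIDS]
    return "".join(rng.choices(AMINO_ACIDS, weights=weights, k=length))

# Equal composition, then the composition of a (made-up) template sequence.
uniform = random_sequence(60)
templated = random_sequence(60, composition_of("MKTAYIAKQRQISFVKSHFSRQ"))
```

A user specified composition in percent can be passed in the same way, as
a dictionary mapping one-letter codes to weights; the weights do not need
to sum to one.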
2.7 Weekly updates of SWISS-PROT
Weekly updates of SWISS-PROT are available by anonymous FTP. Three files
are refreshed at each update:
new_seq.dat Contains all the new entries since the last full release;
upd_seq.dat Contains the entries for which the sequence data has been
updated since the last release;
upd_ann.dat Contains the entries for which one or more annotation
fields have been updated since the last release.
Currently these files are available on the following anonymous ftp
servers:
Organization ExPASy (Geneva University Expert Protein Analysis System)
Address expasy.hcuge.ch (or 129.195.254.61)
Directory /databases/swiss-prot/updates
Organization National Center for Biotechnology Information (NCBI)
Address ncbi.nlm.nih.gov (or 130.14.20.1)
Directory /repository/swiss-prot/updates
Organization European Bioinformatics Institute (EBI)
Address ftp.ebi.ac.uk (or 193.62.196.6)
Directory /pub/databases/swissprot/new
Organization Bioinformatics Unit, Weizmann Institute of Science (WIS)
Address bioinformatics.weizmann.ac.il (or 132.76.55.12)
Directory /pub/databases/swiss-prot/updates
!!! Important notes !!!
Although we try to follow a regular schedule, we do not promise to
update these files every week. In some cases two weeks will elapse
between two updates.
Due to the current mechanism used to build a release, the entries that
are provided in these updates are not guaranteed to be error free.
3. IMPORTANT FORTHCOMING CHANGES
3.1 Major changes to the cross-references to EMBL
In the next release, the format of the DR (Database cross-Reference)
lines pointing to EMBL Nucleotide Sequence Database entries will be
changed from:
DR EMBL; ACCESSION_NUMBER; ENTRY_NAME.
to:
DR EMBL; ACCESSION_NUMBER; PID; STATUS_IDENTIFIER.
Where 'PID' stands for the "Protein IDentification" number. It is a
number that you will find from EMBL release 45 onwards (and GenBank
release 94.0 onwards) in a qualifier called "/db_xref" which is tagged
to every CDS in the nucleotide database. Example:
FT CDS 54..1382
FT /note="ribulose-1,5-bisphosphate carboxylase/
FT oxygenase activase precursor"
FT /db_xref="PID:g1006835"
When an EMBL database CDS exists as a sequence report in SWISS-PROT, the
SWISS-PROT DR lines of the corresponding SWISS-PROT entry will be
updated by citing the PID as secondary identifier. In all cases where a
PID will have been integrated into SWISS-PROT, a "/db_xref" qualifier
citing the corresponding SWISS-PROT entry will be added to the EMBL
database CDS labeled with this PID. Example:
FT CDS 14556..15696
FT /gene="cytochrome b"
FT /codon_start=1
FT /product="apoprotein"
FT /db_xref="PID:g463170"
FT /db_xref="SWISS-PROT:P12778"
This approach enables us to point precisely from a given SWISS-PROT
entry to one of potentially many CDS in the corresponding EMBL entry and
vice versa. This change will allow the development of software tools
that automatically retrieve the part of a nucleotide sequence entry that
codes for a specific protein. This will be especially useful in the
context of World-Wide Web as it will render obsolete the current
situation where, for example, one needs to retrieve the complete
sequence of a yeast chromosome when one wants the nucleotide sequence
coding for a specific protein encoded on that chromosome.
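As a rough illustration of the kind of software tool this makes possible,
the Python sketch below collects, for every CDS feature of an EMBL
feature table, its location and its "/db_xref" values. It assumes the
usual fixed-column layout of FT lines (feature key in columns 6-20,
location and qualifiers from column 22); the function names and the
returned data structure are ours.

```python
import re

# Matches qualifiers such as /db_xref="PID:g463170".
DB_XREF = re.compile(r'/db_xref="([^:"]+):([^"]+)"')

def ft_line(key, value):
    """Build an FT line in the assumed fixed-column layout."""
    return f"FT   {key:<15} {value}"

def cds_cross_references(ft_lines):
    """Return the location and db_xref values of every CDS feature."""
    features = []
    current = None
    for line in ft_lines:
        if not line.startswith("FT"):
            continue
        key = line[5:20].strip()
        value = line[21:].strip()
        if key:
            # A non-blank key column starts a new feature.
            current = None
            if key == "CDS":
                current = {"location": value, "db_xref": {}}
                features.append(current)
        elif current is not None:
            # Qualifier or continuation line of the current CDS.
            match = DB_XREF.match(value)
            if match:
                current["db_xref"][match.group(1)] = match.group(2)
    return features

example = [
    ft_line("CDS", "14556..15696"),
    ft_line("", '/gene="cytochrome b"'),
    ft_line("", '/db_xref="PID:g463170"'),
    ft_line("", '/db_xref="SWISS-PROT:P12778"'),
]
features = cds_cross_references(example)
```

With the location and the PID in hand, a program can select the one CDS
that corresponds to a given SWISS-PROT entry and extract only that part
of the nucleotide sequence.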
This major change has been in preparation for the last six months; it
is one of the reasons that release 32 was delayed so long. In the course
of cross-referencing at the level of the "PID", we had to manually check
thousands of problem cases. This led to many sequence and annotation
updates.
An additional important principle of the PID system is that whenever a
change is made to the nucleotide entry or to the annotations of that
entry, and this change produces a modification in the translated
protein sequence, the PID number corresponding to the modified CDS is
replaced by a completely new number. The old number will be kept in a
special field tagged to the CDS. The exact syntax of this field is under
discussion among the international nucleotide databases.
The new cross-referencing system will allow a much closer
interconnection between SWISS-PROT and the international nucleotide
sequence databases. For example, it will allow us to automatically take
into account sequence updates made to the nucleotide entry when these
updates have an impact on the derived protein sequence(s).
It should also be noted that the "PID" numbers in the context of GenBank
replace the "NCBI gi" numbering system which was present in the "/note"
qualifier. The "gi" identifiers for the nucleic acid sequences have been
replaced by "NID" (nucleic acid identifier) numbers.
The 'STATUS_IDENTIFIER' provides information about the relationship
between the sequence in the SWISS-PROT entry and the CDS in the
corresponding EMBL entry.
a) In most cases the translation of the EMBL nucleotide sequence CDS
results in the same sequence as shown in the corresponding SWISS-PROT
entry or the differences are mentioned in the SWISS-PROT feature (FT)
lines as CONFLICT, VARIANT or VARSPLIC and in the RP lines. In these
cases the status identifier shows a dash ("-").
Example:
DR EMBL; Y00312; G63880; -.
b) In some cases the translation of the EMBL nucleotide sequence CDS
results in a sequence different from the sequence shown in the
corresponding SWISS-PROT entry and the differences are either not
mentioned in the SWISS-PROT feature (FT) lines as CONFLICT, VARIANT or
VARSPLIC and in the RP lines, or simply do not meet the criteria for
such situations.
1) If the difference is due to a different start of the sequence (e.g.
SWISS-PROT believes that the start of the sequence is upstream or
downstream of the site annotated as the start of the sequence in the
EMBL database), the status identifier shows the comment "ALT_INIT".
Example:
DR EMBL; L29151; G466334; ALT_INIT.
2) If the difference is due to a different termination of the sequence
(e.g. SWISS-PROT believes that the termination of the sequence is
upstream or downstream of the site annotated as the end of the
sequence in the EMBL database), the status identifier shows the
comment "ALT_TERM". Example:
DR EMBL; L20562; G398099; ALT_TERM.
3) If the difference is due to frameshifts in the EMBL sequence, the
status identifier shows the comment "ALT_FRAME". Example:
DR EMBL; M95935; G146416; ALT_FRAME.
4) If the difference is not due to the cases mentioned above (e.g. wrong
intron-exon boundaries given in the EMBL entry) or to a mixture of
the cases mentioned above, the status identifier shows the comment
"ALT_SEQ". Example:
DR EMBL; X79206; G809602; ALT_SEQ.
c) In some cases the nucleotide sequence of a complete CDS is divided into
exons present in different EMBL entries. We point to the exon-containing
EMBL entries by citing the PID as secondary identifier and adding the
comment "JOINED" in the status identifier. These EMBL entries do not
contain a CDS feature; they contain exons joined to a CDS feature
which is labeled with the given PID.
Example:
DR EMBL; M63397; G177196; -.
DR EMBL; M63395; G177196; JOINED.
DR EMBL; M63396; G177196; JOINED.
In the above example the SWISS-PROT sequence is derived from the CDS
labeled with the PID G177196. This CDS feature can be found in the EMBL
entry M63397. Exons belonging to this CDS are not only found in EMBL
entry M63397, but also in the EMBL entries M63395 and M63396.
d) In some cases there is no CDS feature key annotating a protein
translation in an EMBL entry and thus no PID for that CDS. Therefore it
is not possible for us to point to a PID as a secondary identifier. In
these cases we point to the relevant EMBL entries by including a dash
("-") in the position of the missing PID and "NOT_ANNOTATED_CDS" in
the status identifier.
Example:
DR EMBL; J04126; -; NOT_ANNOTATED_CDS.
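The cases a) to d) above can be summarized in a small parser. The
following Python sketch reads a DR line in the new four-field format and
checks the status identifier against the values listed in this section.
The class and function names are hypothetical, and representing a missing
PID as None is our own choice.

```python
from typing import NamedTuple, Optional

# Status identifiers listed in the cases a) to d) above.
STATUS_IDENTIFIERS = {
    "-", "ALT_INIT", "ALT_TERM", "ALT_FRAME", "ALT_SEQ",
    "JOINED", "NOT_ANNOTATED_CDS",
}

class EmblCrossReference(NamedTuple):
    accession: str
    pid: Optional[str]  # None when the EMBL entry has no annotated CDS
    status: str

def parse_embl_dr_line(line: str) -> EmblCrossReference:
    """Parse a DR line in the new four-field EMBL format."""
    body = line.strip()
    if body.endswith("."):
        body = body[:-1]
    fields = [field.strip() for field in body.split(";")]
    if len(fields) != 4 or fields[0].split() != ["DR", "EMBL"]:
        raise ValueError("not a four-field DR EMBL line")
    _, accession, pid, status = fields
    if status not in STATUS_IDENTIFIERS:
        raise ValueError(f"unknown status identifier: {status}")
    return EmblCrossReference(accession, None if pid == "-" else pid, status)

xref = parse_embl_dr_line("DR   EMBL; L29151; G466334; ALT_INIT.")
print(xref.accession, xref.pid, xref.status)
```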
3.2 TREMBL - a supplement to SWISS-PROT
The ongoing genome sequencing and mapping projects have dramatically
increased the number of protein sequences to be incorporated into SWISS-
PROT. Since we do not want to dilute the quality standards of SWISS-PROT
by incorporating sequences into SWISS-PROT without proper sequence
analysis and annotation, we cannot speed up the incorporation of new
incoming data indefinitely. But as we also want to make the sequences
available as fast as possible, we will introduce a computer-annotated
supplement to SWISS-PROT. This supplement consists of
entries in SWISS-PROT-like format derived from the translation of all
coding sequences (CDS) in the EMBL nucleotide sequence database, except
the CDS already included in SWISS-PROT.
We name this supplement TREMBL (TRanslation from EMBL), since the
translation tools used to create the translations of the CDS are based
on the program 'trembl' written by Thure Etzold at the EMBL in
Heidelberg.
We will translate all CDSs in the EMBL Nucleotide Sequence Database
into TREMBL preentries. The preentries already present as sequence
reports in SWISS-PROT will be excluded from TREMBL. Then the remaining
entries will be automatically merged whenever possible to reduce
redundancy in TREMBL. This step will lead to approximately 90'000 TREMBL
entries, which supplement SWISS-PROT.
We will split TREMBL into two main sections, SP-TREMBL and REM-TREMBL:
SP-TREMBL (SWISS-PROT TREMBL) will contain the entries (about 75'000)
which should be incorporated into SWISS-PROT. SP-TREMBL will be
partially redundant with SWISS-PROT, since approximately 40'000 of
these SP-TREMBL entries will only be additional sequence reports of
proteins already in SWISS-PROT. We will try to merge these sequence
reports as fast as possible with the already existing SWISS-PROT entries
for these proteins, so as to make SWISS-PROT and TREMBL completely
nonredundant.
REM-TREMBL (REMaining TREMBL) will contain the entries (about 15'000)
that we do not want to include in SWISS-PROT. This section will be
organized in four subsections:
1) Most REM-TREMBL entries will be immunoglobulins and T-cell receptors.
We stopped entering immunoglobulins and T-cell receptors into
SWISS-PROT, because we only want to keep the germ line gene derived
translations of these proteins in SWISS-PROT and not all known
somatically recombined variants of these proteins. We are expecting
more than 10'000 immunoglobulins and T-cell receptors in TREMBL. We
would like to create a specialized database dealing with these
sequences as a further supplement to SWISS-PROT and keep only a
representative cross-section of these proteins in SWISS-PROT.
2) Another category of data which will not be included in SWISS-PROT are
synthetic sequences. Again, we do not want to leave these entries in
TREMBL. Ideally one should build a specialized database for
artificial sequences as a further supplement to SWISS-PROT.
3) A third subsection consists of fragments with fewer than seven amino
acids.
4) The last subsection consists of CDS translations where we have strong
evidence that these CDS do not code for real proteins.
The first full release of TREMBL will be distributed with release 34 of
SWISS-PROT. However, we will make available, with release 33, a beta
release so that users and software developers can send us feedback about
this new supplement to SWISS-PROT.
3.3 Introduction of a new CC line-type topic (MASS SPECTROMETRY)
We will introduce in the next release a new 'topic' for the comments
(CC) line-type: MASS SPECTROMETRY. This topic will be used to report the
exact molecular weight of a protein or part of a protein as determined
by mass spectrometric methods. The syntax of this new topic will be:
CC -!- MASS SPECTROMETRY: MW=XXX[; MW_ERR=XX]; METHOD=XX[; RANGE=XX-XX].
Where:
- "MW=XXX" is the determined molecular weight (MW);
- "MW_ERR=XX" (optional) is the accuracy or error range of the MW
measurement;
- "METHOD=XX" is the mass spectrometric method: "ELECTROSPRAY" is used
for electrospray ionization (ESI) and "MALDI" is used for matrix-
assisted laser desorption/ionization;
- "RANGE=XX-XX" (optional) is used to indicate what part of the protein
sequence entry corresponds to the molecular weight. If this qualifier
is not present, the MW value corresponds to the full length of the
protein sequence.
Examples of its usage:
CC -!- MASS SPECTROMETRY: MW=13423.3; METHOD=ELECTROSPRAY.
CC -!- MASS SPECTROMETRY: MW=71890; MW_ERR=7; METHOD=ELECTROSPRAY.
CC -!- MASS SPECTROMETRY: MW=8597.5; METHOD=ELECTROSPRAY; RANGE=40-119.
It should be noted that the syntax of this topic may evolve in future
releases as we expect feedback from groups using mass spectrometry for
protein identification on 2D gels, MW determination and characterization
of post-translational modifications.
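For software developers who wish to prepare for this topic, the Python
sketch below shows one way of reading the qualifiers defined in the
syntax above (MW, MW_ERR, METHOD and RANGE). It assumes the comment fits
on a single CC line; the function name and the keys of the returned
dictionary are ours, and the code will need to follow any later change
in the syntax.

```python
TOPIC = "MASS SPECTROMETRY:"

def parse_mass_spectrometry(line):
    """Return the qualifiers of a MASS SPECTROMETRY comment as a dict."""
    start = line.find(TOPIC)
    if start < 0:
        raise ValueError("not a MASS SPECTROMETRY comment")
    body = line[start + len(TOPIC):].strip()
    if body.endswith("."):
        body = body[:-1]  # drop only the terminating period
    qualifiers = {}
    for item in body.split(";"):
        name, _, value = item.strip().partition("=")
        qualifiers[name] = value
    if "MW" not in qualifiers or "METHOD" not in qualifiers:
        raise ValueError("MW and METHOD are mandatory")
    result = {"mw": float(qualifiers["MW"]), "method": qualifiers["METHOD"]}
    if "MW_ERR" in qualifiers:
        result["mw_err"] = float(qualifiers["MW_ERR"])
    if "RANGE" in qualifiers:
        first, _, last = qualifiers["RANGE"].partition("-")
        result["range"] = (int(first), int(last))
    return result

parsed = parse_mass_spectrometry(
    "CC   -!- MASS SPECTROMETRY: MW=8597.5; METHOD=ELECTROSPRAY; RANGE=40-119."
)
```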
3.4 Change in the syntax of the SQ line
The SQ (SeQuence header) line marks the beginning of the sequence data
and gives a quick summary of its content. The format of the SQ line is
currently:
SQ SEQUENCE XXXX AA; XXXXX MW; XXXXX CN;
The line contains the length of the sequence in amino acids (AA)
followed by the molecular weight (MW) rounded to the nearest mass unit
(Dalton) and a checking number (CN) as shown in the example:
SQ SEQUENCE 104 AA; 11530 MW; 54319 CN;
Starting with the next release, we will replace the checking number (CN)
by a 32-bit CRC (Cyclic Redundancy Check) value. The new syntax will be:
SQ SEQUENCE XXXX AA; XXXXX MW; XXXXXXXX CRC32;
Example:
SQ SEQUENCE 104 AA; 11530 MW; 7A70363C CRC32;
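Programs that read the SQ line will have to accept both syntaxes during
the transition. The following Python sketch parses either form. It
assumes, from the examples above, that the CN value is a decimal number
and that the CRC32 value is written in hexadecimal; it does not compute
or verify the CRC itself, whose exact algorithm is not described in
these notes. The function name is ours.

```python
import re

SQ_PATTERN = re.compile(
    r"^SQ\s+SEQUENCE\s+(\d+)\s+AA;\s+(\d+)\s+MW;\s+([0-9A-Fa-f]+)\s+(CN|CRC32);\s*$"
)

def parse_sq_line(line):
    """Parse an SQ line in the current (CN) or the new (CRC32) syntax."""
    match = SQ_PATTERN.match(line)
    if match is None:
        raise ValueError("not a recognized SQ line")
    length, weight, check, kind = match.groups()
    # Assumption: CN is decimal, CRC32 is hexadecimal.
    check_value = int(check, 16 if kind == "CRC32" else 10)
    return {
        "length": int(length),
        "mw": int(weight),
        "check_type": kind,
        "check_value": check_value,
    }

old_style = parse_sq_line("SQ   SEQUENCE   104 AA;  11530 MW;  54319 CN;")
new_style = parse_sq_line("SQ   SEQUENCE   104 AA;  11530 MW;  7A70363C CRC32;")
```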
4. ENZYME AND PROSITE
4.1 The ENZYME data bank
4.1.1 Content of the release
Release 19.0 of the ENZYME data bank is distributed with release 32 of
SWISS-PROT. ENZYME release 19.0 contains information relative to 3601
enzymes. We have updated the data bank with new information released by
the Nomenclature Committee of IUBMB.
4.1.2 Improvements in the ENZYME section of the ExPASy WWW server
On ExPASy, the display of ENZYME entries has been completely redesigned
to make it more readable. One of the changes is that each compound listed
in a reaction is presented on a separate line. Example:
UDP-GLUCOSE + 2 NAD(+) + H(2)O = UDP-GLUCURONATE + 2 NADH.
is now shown as:
UDP-GLUCOSE
+ 2 NAD(+)
+ H(2)O
<=> UDP-GLUCURONATE
+ 2 NADH.
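The transformation illustrated above can be sketched in a few lines of Python. This is not the code used on ExPASy; the function name split_reaction is hypothetical, and the sketch assumes that compounds are separated by " + " and the two sides of the reaction by " = ", as in the example.

```python
def split_reaction(reaction):
    """Return one display line per compound of an ENZYME reaction (sketch)."""
    left, right = reaction.strip().split(" = ", 1)
    lines = []
    for side_index, side in enumerate((left, right)):
        # " + " (with spaces) separates compounds; the "+" inside
        # names such as NAD(+) has no surrounding spaces and is kept.
        for i, compound in enumerate(side.split(" + ")):
            if i > 0:
                prefix = "+ "
            elif side_index == 1:
                prefix = "<=> "
            else:
                prefix = ""
            lines.append(prefix + compound)
    return lines


reaction = "UDP-GLUCOSE + 2 NAD(+) + H(2)O = UDP-GLUCURONATE + 2 NADH."
print("\n".join(split_reaction(reaction)))
```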
Links have been added to the Klotho database of metabolic compounds
maintained by Toni Kazic at the Institute for Biomedical Computing at
Washington University in St. Louis.
4.2 The PROSITE data bank
4.2.1 Statistics for release 13
Release 13.0 of the PROSITE data bank is distributed with release 32 of
SWISS-PROT. This release of PROSITE contains 889 documentation entries
that describe 1'167 different patterns, rules and profiles/matrices.
Since the last full release (12.0 of June 1994) we added 104 new
documentation entries and updated 499 entries. Therefore 68% of all
PROSITE entries are either new or updated.
Out of a total of 49'340 entries in SWISS-PROT, 24'137 are
cross-referenced in PROSITE (excluding the false positives). This
accounts for about 49% of the sequences in SWISS-PROT.
4.2.2 List of the new entries in release 13
C1q domain signature
Death domain profile
Forkhead-associated (FHA) domain profile
PH domain profile
Src homology 2 (SH2) domain profile
Src homology 3 (SH3) domain profile
WW/rsp5/WWP domain signature and profile
S-layer homology domain signature
Prokaryotic dksA/traR C4-type zinc finger
Copper-fist domain
Bacterial regulatory proteins, iclR family signature
Bacterial regulatory proteins, marR family signature
Bacterial regulatory proteins, tetR family signature
Sigma-70 factors ECF subfamily signature
Ribosomal protein L10 signature
Ribosomal protein L24 signature
Ribosomal protein L31 signature
Ribosomal protein L7Ae signature
Ribosomal protein L13e signature
Ribosomal protein L18e signature
Ribosomal protein L24e signature
Ribosomal protein L27e signature
Ribosomal protein L31e signature
Ribosomal protein L34e signatures
Ribosomal protein L35Ae signature
Ribosomal protein L37e signature
Ribosomal protein S6 signature
Homoserine dehydrogenase signature
Aspartate-semialdehyde dehydrogenase signature
Pyridoxamine 5'-phosphate oxidase signature
Respiratory-chain NADH dehydrogenase 20 Kd subunit signature
Respiratory-chain NADH dehydrogenase 24 Kd subunit signature
NNMT/PNMT/TEMT family of methyltransferases signature
Ribosomal RNA adenine dimethylases signature
Squalene and phytoene synthases signatures
ROK family signature
Casein kinase II regulatory subunit signature
Shikimate kinase signature
Prokaryotic diacylglycerol kinase signature
Acetate and butyrate kinases family signatures
RNA polymerases H / 23 Kd subunits signature
RNA polymerases L / 13 to 16 Kd subunits signature
RNA polymerases N / 8 Kd subunits signature
RNA polymerases RPB6 / 6 Kd subunits signature
Lipolytic enzymes "G-D-S-L" family, serine active site
Class A bacterial acid phosphatases signature
Phosphatidylinositol-specific phospholipase C profiles
DNA/RNA non-specific endonucleases active site
Thermonuclease family signature
Chitinases family 18 signature
Glycosyl hydrolases family 45 active site
ATP-dependent serine proteases, lon family, serine active site
Interleukin-1 beta converting enzyme family active sites
Hydroxymethylglutaryl-coenzyme A lyase active site
DNA photolyases class 2 signatures
Adenylate cyclases class-I signatures
Ribulose-phosphate 3-epimerase family signatures
PpiC-type peptidyl-prolyl cis-trans isomerase signature
Glucosamine/galactosamine-6-phosphate isomerases signature
Terpene synthases signature
SAICAR synthetase signatures
NAD-dependent DNA ligase signatures
Transposases, IS30 family, signature
Molybdenum cofactor biosynthesis proteins signatures
Radical activating enzymes signature
Electron transfer flavoprotein beta-subunit signature
Heavy-metal-associated domain
Bacterial extracellular solute-binding proteins, family 1 signature
Bacterial extracellular solute-binding proteins, family 3 signature
Bacterial extracellular solute-binding proteins, family 5 signature
Sulfate transporters signature
Xanthine/uracil permeases family signature
OmpA-like domain
GPR1/FUN34/yaaH family signature
FtsZ protein signatures
Kinesin light chain repeat
Bacterial microcompartiments proteins signature
Flagella transport protein fliP family signatures
Macrophage migration inhibitory factor family signature
Scorpion short toxins signature
GrpE protein signature
Bacterial type II secretion system protein C signature
Bacterial type II secretion system protein N signature
Protein secE/sec61-gamma signature
Fimbrial biogenesis outer membrane usher protein signature
Apoptosis regulator proteins, Bcl-2 family signature
GTP-binding nuclear protein ran signature
Elongation factor Ts signatures
Translation initiation factor SUI1 signature
Calponin family repeat
CAP protein signatures
Hydrogenases expression/synthesis hupF/hypC family signature
NOL1/NOP2/fmu family signature
Hypothetical SUA5/yciO/yrdC family signature
Hypothetical YBL055c/yjjV family signatures
Hypothetical YBR002c family signature
Hypothetical YBR177c/yheT family signature
Hypothetical YER057c/yjgF family signature
Hypothetical YKL151c/yjeF family signatures
Hypothetical hesB/yadR/yfhF family signature
Hypothetical yabO/yceC/yfiI family signature
Hypothetical yciL/yejD/yjbC family signature
Hypothetical yedF/yeeD/yhhP family signature
Hypothetical yhdG/yjbN/yohI family signature
4.2.3 Status of profiles in PROSITE
This is the second release of PROSITE to include weight matrices (also
known as profiles). The last release included only two profile entries;
this release includes 16 profiles. Seven of these profiles are described
by documentation entries that are linked to both a signature pattern and
a profile.
Since a profile is generally much more sensitive than a pattern, you
should use the profile whenever you have access to the necessary
software tools.
Many new profiles are being prepared and will be progressively added to
PROSITE. We also plan to upgrade some unsatisfactory pattern entries to
profiles.
4.2.4 Software to make use of the profiles
A set of two programs (for Unix systems) has been developed by Philipp
Bucher to make use of the PROSITE profile entries:
pfscan scans a single sequence for the occurrences of several
PROSITE profile entries.
pfsearch searches a sequence database for occurrences of a single
PROSITE profile entry.
These programs are available from the ISREC anonymous ftp server
"ulrec3.unil.ch"; the files are located in the directory "/pub/pftools".
From the WWW, you can use "ProfileScan", an ISREC service that allows
you to scan a sequence against the profile entries in PROSITE; the URL
for this service is:
http://ulrec3.unil.ch/software/profilescan.html
A link to this tool is also provided by the ExPASy WWW server.
4.2.5 Changes in the format of the PROSITE.DAT file
In the NR line (Numerical Results) we changed the format of the
"/FALSE_NEG" qualifier and added a new qualifier, "/PARTIAL".
The syntax of the "/FALSE_NEG" qualifier which reports the number of
known missed sequences used to be: "/FALSE_NEG=x(y);" where `x'
represented the number of hits and `y' the number of sequences; we
simplified this syntax to "/FALSE_NEG=y;" where `y' represents the
number of sequences.
The new qualifier "/PARTIAL" is used to indicate the number of partial
sequences which belong to the set in consideration, but which are not
hit by the pattern or profile because they are partial (fragment)
sequences. Its syntax is "/PARTIAL=y;" where `y' represents the number
of sequences.
Example of a complete block of NR lines:
NR /RELEASE=32,49340;
NR /TOTAL=123(56); /POSITIVE=115(51); /UNKNOWN=5(2); /FALSE_POS=3(3);
NR /FALSE_NEG=3; /PARTIAL=2;
In the above example, the scan for the pattern (or profile) was done on
release 32 of SWISS-PROT, which contains 49'340 sequence entries. The
pattern (or profile) was found 123 times in 56 different sequences
(/TOTAL). Out of those 123 `hits', 115 were produced by 51 sequences
that belong to the set under consideration (/POSITIVE), 5 hits were
produced by 2 sequences which could possibly belong to the set
(/UNKNOWN) and 3 hits were produced by 3 other sequences (/FALSE_POS).
That particular pattern missed 3 sequences (/FALSE_NEG) and there were
2 partial sequences that belong to the set under consideration but
which do not include the region that contains that pattern (or profile)
(/PARTIAL).
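To show how the revised NR syntax could be read by a program, here is a minimal Python sketch. The names QUALIFIER and parse_nr_block and the layout of the result are assumptions made for this illustration. Qualifiers of the form "x(y)" are stored as hits and sequences, while "/FALSE_NEG" and "/PARTIAL" carry a number of sequences only.

```python
import re

# One "/NAME=value;" qualifier; the value never contains a semicolon.
QUALIFIER = re.compile(r"/(\w+)=([^;]+);")


def parse_nr_block(lines):
    """Parse a block of NR lines into a dictionary (hypothetical helper)."""
    result = {}
    for line in lines:
        if not line.startswith("NR"):
            continue
        for name, value in QUALIFIER.findall(line):
            if name == "RELEASE":
                release, entries = value.split(",")
                result[name] = {"release": release, "entries": int(entries)}
            elif "(" in value:
                hits, sequences = value.rstrip(")").split("(")
                result[name] = {"hits": int(hits), "sequences": int(sequences)}
            else:
                result[name] = {"sequences": int(value)}
    return result


block = [
    "NR /RELEASE=32,49340;",
    "NR /TOTAL=123(56); /POSITIVE=115(51); /UNKNOWN=5(2); /FALSE_POS=3(3);",
    "NR /FALSE_NEG=3; /PARTIAL=2;",
]
nr = parse_nr_block(block)
# In the example, TOTAL is the sum of POSITIVE, UNKNOWN and FALSE_POS.
parts = ("POSITIVE", "UNKNOWN", "FALSE_POS")
print(nr["TOTAL"]["hits"] == sum(nr[p]["hits"] for p in parts))
print(nr["TOTAL"]["sequences"] == sum(nr[p]["sequences"] for p in parts))
```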
4.2.6 New feature in the PROSITE.DOC file
Starting with release 13, we added a new form of references in the
PROSITE documentation file (PROSITE.DOC). These references are of the
form "[En]", where "n" is a number. These references are used to point
to electronic documents available on the World-Wide Web. Example:
{BEGIN}
********************************
* AAA-protein family signature *
********************************
A large family of ATPases has been described [1 to 5,E1] whose key
feature is that they share a conserved region of about 220 amino acids
that contains an ATP-binding site. This family is now called AAA, for
'A'TPases 'A'ssociated
..Lots of lines deleted..
[ 5] Confalonieri F., Duguet M.
BioEssays 17:639-650(1995).
[E1] http://yeamob.pci.chemie.uni-tuebingen.de/AAA/Description.html
{END}
On the ExPASy WWW server, these electronic references can of course be
accessed directly when a PROSITE documentation entry is displayed. While
this change seems minor, we consider it the first step in the
establishment of an on-line decentralized encyclopedia for protein
families.
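Programs that process PROSITE.DOC may want to collect the electronic references of an entry. The Python sketch below is one possible approach; the names E_REF and electronic_references are hypothetical, and it assumes that each "[En]" reference occupies the start of a line and is followed by a single URL, as in the example above.

```python
import re

# "[E1] http://..." at the start of a line; "[ 5]" style references
# do not match because the label must begin with "E".
E_REF = re.compile(r"^[ \t]*\[\s*(E\d+)\]\s+(\S+)", re.MULTILINE)


def electronic_references(entry_text):
    """Map each [En] label of a documentation entry to its URL (sketch)."""
    return dict(E_REF.findall(entry_text))


entry = """[ 5] Confalonieri F., Duguet M.
     BioEssays 17:639-650(1995).
[E1] http://yeamob.pci.chemie.uni-tuebingen.de/AAA/Description.html
"""
print(electronic_references(entry))
```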
WE NEED YOUR HELP !
We welcome feedback from our users. We would especially appreciate being
notified if you find that sequences belonging to your field of
expertise are missing from the data bank. We would also like to be
notified about annotations that need updating, for example if the
function of a protein has been clarified or if new post-translational
information has become available.
APPENDIX A: SOME STATISTICS
A.1 Amino acid composition
A.1.1 Composition in percent for the complete data bank
Ala (A) 7.58 Gln (Q) 4.03 Leu (L) 9.29 Ser (S) 7.17
Arg (R) 5.17 Glu (E) 6.31 Lys (K) 5.91 Thr (T) 5.76
Asn (N) 4.52 Gly (G) 6.88 Met (M) 2.36 Trp (W) 1.26
Asp (D) 5.30 His (H) 2.23 Phe (F) 4.04 Tyr (Y) 3.21
Cys (C) 1.70 Ile (I) 5.70 Pro (P) 4.92 Val (V) 6.52
Asx (B) 0.001 Glx (Z) 0.001 Xaa (X) 0.02
A.1.2 Classification of the amino acids by their frequency
Leu, Ala, Ser, Gly, Val, Glu, Lys, Thr, Ile, Asp, Arg, Pro, Asn, Phe,
Gln, Tyr, Met, His, Cys, Trp
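The classification above follows directly from the percentages in A.1.1. As a short illustration, the Python sketch below sorts the twenty standard amino acids by decreasing frequency; the variable names are arbitrary, and the ambiguity codes (Asx, Glx, Xaa) are left out.

```python
# Percent composition of the 20 standard amino acids, from table A.1.1.
composition = {
    "Ala": 7.58, "Arg": 5.17, "Asn": 4.52, "Asp": 5.30, "Cys": 1.70,
    "Gln": 4.03, "Glu": 6.31, "Gly": 6.88, "His": 2.23, "Ile": 5.70,
    "Leu": 9.29, "Lys": 5.91, "Met": 2.36, "Phe": 4.04, "Pro": 4.92,
    "Ser": 7.17, "Thr": 5.76, "Trp": 1.26, "Tyr": 3.21, "Val": 6.52,
}

# Sort residue names from the most to the least frequent.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))
```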
A.2 Repartition of the sequences by their organism of origin
Total number of species represented in this release of SWISS-PROT: 4921
A.2.1 Table of the frequency of occurrence of species
Species represented 1x: 2231
2x: 776
3x: 441
4x: 272
5x: 200
6x: 198
7x: 117
8x: 95
9x: 103
10x: 50
11- 20x: 194
21- 50x: 147
51-100x: 49
>100x: 48
A.2.2 Table of the most represented species
Number Frequency Species
1 3468 Escherichia coli
2 3391 Baker's yeast (Saccharomyces cerevisiae)
3 3281 Human
4 1978 Mouse
5 1773 Rat
6 1575 Haemophilus influenzae
7 1389 Bacillus subtilis
8 924 Caenorhabditis elegans
9 800 Bovine
10 768 Fruit fly (Drosophila melanogaster)
11 605 Chicken
12 603 Salmonella typhimurium
13 479 African clawed frog (Xenopus laevis)
14 460 Fission yeast (Schizosaccharomyces pombe)
15 432 Rabbit
432 Arabidopsis thaliana (Mouse-ear cress)
17 376 Pig
18 282 Maize
19 275 Bacteriophage T4
20 251 Vaccinia virus (strain Copenhagen)
21 236 Rice
22 232 Pseudomonas aeruginosa
23 213 Slime mold (Dictyostelium discoideum)
24 205 Tobacco
25 193 Human cytomegalovirus (strain AD169)
26 190 Pea
27 183 Vaccinia virus (strain WR)
183 Wheat
29 173 Barley
30 165 Staphylococcus aureus
31 161 Soybean
32 160 Pseudomonas putida
160 Dog
34 157 Rhodobacter capsulatus
35 155 Neurospora crassa
36 154 Autographa californica nuclear polyhedrosis virus
37 150 Marchantia polymorpha (Liverwort)
38 148 Sheep
148 Klebsiella pneumoniae
40 146 Variola virus
146 Bacillus stearothermophilus
42 138 Spinach
43 130 Tomato
44 124 Potato
45 122 Rhizobium meliloti
122 Mycobacterium leprae
47 117 Lactococcus lactis (subsp. lactis)
48 116 Agrobacterium tumefaciens
49 100 Candida albicans
100 Chlamydomonas reinhardtii
100 Streptomyces coelicolor
A.3 Repartition of the sequences by size
From To Number From To Number
1- 50 2622 1001-1100 445
51- 100 4679 1101-1200 318
101- 150 6342 1201-1300 239
151- 200 4810 1301-1400 151
201- 250 4339 1401-1500 134
251- 300 3837 1501-1600 83
301- 350 3650 1601-1700 61
351- 400 3624 1701-1800 60
401- 450 2762 1801-1900 64
451- 500 2777 1901-2000 40
501- 550 1982 2001-2100 23
551- 600 1412 2101-2200 51
601- 650 1036 2201-2300 56
651- 700 782 2301-2400 23
701- 750 713 2401-2500 30
751- 800 568 >2500 145
801- 850 431
851- 900 457
901- 950 322
951-1000 272
A.4 Longest sequences
The longest sequences (>=4000 residues) are listed here:
HTS1_COCCA 5217
FAT_DROME 5147
RYNR_RABIT 5037
RYNR_PIG 5035
RYNR_HUMAN 5032
RYNC_RABIT 4969
DYHC_DICDI 4725
DYHC_RAT 4644
DYHC_DROME 4639
APB_HUMAN 4563
APOA_HUMAN 4548
RRPA_CVMJH 4488
DYHC_TRIGR 4466
DYHC_ANTCR 4466
GRSB_BACBR 4451
PKSK_BACSU 4447
PKSL_BACSU 4427
YP73_CAEEL 4385
DYHC_NEUCR 4367
DYHC_EMENI 4344
PLEC_RAT 4140
DYHC_YEAST 4092
RRPA_CVH22 4085
A.5 List of the most cited journals in SWISS-PROT
Citations Journal abbreviation
4793 J. BIOL. CHEM.
3162 NUCLEIC ACIDS RES.
3037 PROC. NATL. ACAD. SCI. U.S.A.
2087 J. BACTERIOL.
1706 GENE
1644 FEBS LETT.
1535 EUR. J. BIOCHEM.
1394 EMBO J.
1323 NATURE
1304 BIOCHEM. BIOPHYS. RES. COMMUN.
1235 BIOCHEMISTRY
1023 BIOCHIM. BIOPHYS. ACTA
973 J. MOL. BIOL.
956 CELL
923 MOL. CELL. BIOL.
786 MOL. GEN. GENET.
716 PLANT MOL. BIOL.
705 VIROLOGY
684 BIOCHEM. J.
610 SCIENCE
570 MOL. MICROBIOL.
551 J. BIOCHEM.
452 J. VIROL.
404 J. GEN. VIROL.
316 J. CELL BIOL.
304 GENOMICS
287 GENES DEV.
258 YEAST
253 BIOL. CHEM. HOPPE-SEYLER
250 CURR. GENET.
233 PLANT PHYSIOL.
232 ARCH. BIOCHEM. BIOPHYS.
229 J. IMMUNOL.
223 INFECT. IMMUN.
213 HOPPE-SEYLER'S Z. PHYSIOL. CHEM.
212 MOL. BIOCHEM. PARASITOL.
197 J. GEN. MICROBIOL.
179 MOL. ENDOCRINOL.
175 HUM. MOL. GENET.
169 J. CLIN. INVEST.
167 ONCOGENE
156 FEMS MICROBIOL. LETT.
151 AM. J. HUM. GENET.
145 DNA
136 J. EXP. MED.
129 J. MOL. EVOL.
129 GENETICS
115 BLOOD
112 DEVELOPMENT
108 NEURON
108 HEMOGLOBIN
102 AGRIC. BIOL. CHEM.
APPENDIX B: RELATIONSHIPS BETWEEN BIOMOLECULAR DATABASES
The current status of the relationships (cross-references) between some
biomolecular databases is shown in the following schematic:
[Schematic: the original layout could not be preserved. It centres on
the EMBL Nucleotide Sequence Database [EBI] and the SWISS-PROT Protein
Sequence Data Bank, which are cross-referenced to each other, and shows
cross-reference links to the following databases:]

   EPD [Euk.Prom]
   ECDC [E.coli map]
   FlyBase [D.melanogas.]
   SubtiList [B.subtilis]
   MaizeDb [Zea mays]
   WormPep [C.elegans]
   GCRDb [7TM recep.]
   EcoGene [E.coli]
   LISTA [Yeast]
   SGD [Yeast]
   DictyDB [D.disco.]
   REBASE [Restriction enzymes]
   ENZYME [Nomencl.]
   OMIM [Human]
   StyGene [S.Typhimurium]
   Transfac
   PROSITE [Patterns and profiles]
   ECO2DBASE [2D]
   SWISS-2DPAGE [2D]
   Aarhus/Ghent [2D]
   YEPD [Yeast] [2D]
   PDB [3D structures]
   HSSP [3D similar.]