SWISS-PROT RELEASE 36.0 RELEASE NOTES
!! Important: do not forget to read section 11 of these release notes. It
contains an important announcement relevant to SWISS-PROT and PROSITE !!
1. INTRODUCTION
Release 36.0 of SWISS-PROT contains 74'019 sequence entries, comprising
26'840'295 amino acids abstracted from 59'911 references. This represents
an increase of 7% over release 35. The growth of the data bank is
summarized below.
Release  Date   Number of entries  Number of amino acids
2.0 09/86 3939 900 163
3.0 11/86 4160 969 641
4.0 04/87 4387 1 036 010
5.0 09/87 5205 1 327 683
6.0 01/88 6102 1 653 982
7.0 04/88 6821 1 885 771
8.0 08/88 7724 2 224 465
9.0 11/88 8702 2 498 140
10.0 03/89 10008 2 952 613
11.0 07/89 10856 3 265 966
12.0 10/89 12305 3 797 482
13.0 01/90 13837 4 347 336
14.0 04/90 15409 4 914 264
15.0 08/90 16941 5 486 399
16.0 11/90 18364 5 986 949
17.0 02/91 20024 6 524 504
18.0 05/91 20772 6 792 034
19.0 08/91 21795 7 173 785
20.0 11/91 22654 7 500 130
21.0 03/92 23742 7 866 596
22.0 05/92 25044 8 375 696
23.0 08/92 26706 9 011 391
24.0 12/92 28154 9 545 427
25.0 04/93 29955 10 214 020
26.0 07/93 31808 10 875 091
27.0 10/93 33329 11 484 420
28.0 02/94 36000 12 496 420
29.0 06/94 38303 13 464 008
30.0 10/94 40292 14 147 368
31.0 02/95 43470 15 335 248
32.0 11/95 49340 17 385 503
33.0 02/96 52205 18 531 384
34.0 10/96 59021 21 210 389
35.0 11/97 69113 25 083 768
36.0 07/98 74019 26 840 295
2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 35
2.1 Sequences and annotations
4'976 sequences have been added since release 35, the sequence data of 712
existing entries has been updated and the annotations of 9'954 entries
have been revised.
2.2 What's happening with the model organisms
We have selected a number of organisms that are the target of genome
sequencing and/or mapping projects and for which we intend to:
. Be as complete as possible. All sequences available at a given time
should be immediately included in SWISS-PROT. This also includes
sequence corrections and updates;
. Provide a higher level of annotation;
. Provide cross-references to specialized database(s) that contain, among
other data, some genetic information about the genes that code for these
proteins;
. Provide specific indices or documents.
What was done since the last release or in preparation for the next
release concerning model organisms:
- We have continued our effort to catch up with the backlog of
sequences from other model organisms. In particular we added about 350
entries from human and from E.coli, 300 from mouse, 250 from S.pombe,
200 from M.jannaschii, 150 from C.elegans, 100 from B.subtilis, H.pylori
and from M.tuberculosis.
- We plan to finish as quickly as possible the annotation of the
Escherichia coli and Haemophilus influenzae sequence entries which are
not yet part of SWISS-PROT.
Here is the current status of the model organisms in SWISS-PROT:
Organism        Database          Index file      Number of
                cross-referenced                  sequences
--------------  ----------------  --------------  ---------
A.thaliana      None yet          In preparation        719
B.subtilis      SubtiList         SUBTILIS.TXT         1970
C.albicans      None yet          CALBICAN.TXT          192
C.elegans       Wormpep           CELEGANS.TXT         1887
D.discoideum    DictyDB           DICTY.TXT             280
D.melanogaster  FlyBase           FLY.TXT              1042
E.coli          EcoGene           ECOLI.TXT            4416
H.influenzae    HiDB (TIGR)       HAEINFLU.TXT         1693
H.sapiens       MIM               MIMTOSP.TXT          4980
H.pylori        HpDB (TIGR)       HPYLORI.TXT           334
M.genitalium    MgDB (TIGR)       MGENITAL.TXT          470
M.musculus      MGD               MGDTOSP.TXT          3253
M.jannaschii    MjDB (TIGR)       MJANNASC.TXT         1283
M.tuberculosis  None yet          None yet              873
S.cerevisiae    SGD               YEAST.TXT            4787
S.typhimurium   StyGene           SALTY.TXT             706
S.pombe         None yet          POMBE.TXT            1315
S.solfataricus  None yet          None yet               72
Collectively the entries from the above model organisms represent 40.9% of
all SWISS-PROT entries.
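The 40.9% share can be reproduced by summing the per-organism counts in the table and dividing by the total number of entries in release 36. This is a minimal Python sketch for checking the arithmetic; the variable names are illustrative only.

```python
# Number of sequences per model organism, copied from the table above.
MODEL_ORGANISM_COUNTS = {
    "A.thaliana": 719, "B.subtilis": 1970, "C.albicans": 192,
    "C.elegans": 1887, "D.discoideum": 280, "D.melanogaster": 1042,
    "E.coli": 4416, "H.influenzae": 1693, "H.sapiens": 4980,
    "H.pylori": 334, "M.genitalium": 470, "M.musculus": 3253,
    "M.jannaschii": 1283, "M.tuberculosis": 873, "S.cerevisiae": 4787,
    "S.typhimurium": 706, "S.pombe": 1315, "S.solfataricus": 72,
}

TOTAL_ENTRIES = 74019  # release 36.0, from section 1

model_total = sum(MODEL_ORGANISM_COUNTS.values())
model_share = model_total / TOTAL_ENTRIES * 100.0

print(f"{model_total} entries, {model_share:.1f}% of SWISS-PROT")
```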
2.3 Changes affecting the accession numbers
With the creation of the TrEMBL database (see section 6) and the rapid
increase in the amount of sequence data, we are faced with a problem of
availability of accession numbers. Currently we use a system based on a
one-letter prefix followed by 5 digits. This system was also used by the
nucleotide sequence databases which had originally reserved for SWISS-PROT
the prefix letters 'P' and 'Q'. The nucleotide databases, having run out of
space (due mainly to ESTs), have been forced to start using a new format
based on a two-letter prefix followed by 6 digits.
We have used up all possible numbers with 'P' and 'Q' and the only letter
prefix which was not used by the nucleotide databases is 'O'. As we believe
that changing the format of the accession numbers to that now used by the
nucleotide databases would wreak havoc on the numerous software packages
using SWISS-PROT, we have decided to keep a system of accession numbers
based on a six-character code, but with the following changes:
1) We have started using 'O'. This extra letter should allow the
continuation of the present format (1 prefix letter + 5 digits) for
approximately one year.
2) Once we have used up 'O', we will introduce a system
based on the following format:
   1         2       3            4            5            6
   [O,P,Q]   [0-9]   [A-Z, 0-9]   [A-Z, 0-9]   [A-Z, 0-9]   [0-9]
What the above means is that we will keep a six-character code, but that
in positions 3, 4 and 5 of this code any combination of letters and
numbers can be present. This format allows a total of 14 million accession
numbers (up from 300'000 with the current system).
We only allow numbers in positions 2 and 6 so that the SWISS-PROT
accession numbers cannot be mistaken for gene names, acronyms, other
types of accession numbers or any kind of word!
Examples: P0A3S2, Q2ASD4, O13YX2, P9B123
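For software developers, the extended format can be expressed as a regular expression, and the quoted capacity follows from multiplying the number of choices at each position. The Python sketch below is an illustration under these assumptions, not an official validator; the names NEW_AC_PATTERN and is_valid_accession are hypothetical.

```python
import re

# Position 1: O, P or Q; position 2: digit; positions 3-5: letter or digit;
# position 6: digit. Accession numbers in the current format (one prefix
# letter followed by 5 digits) also match this pattern.
NEW_AC_PATTERN = re.compile(r"[OPQ][0-9][A-Z0-9]{3}[0-9]")

def is_valid_accession(accession: str) -> bool:
    """True if the whole string follows the six-character format."""
    return NEW_AC_PATTERN.fullmatch(accession) is not None

# Capacity: 3 prefix letters x 10 x 36^3 x 10 for the new format,
# 3 prefix letters x 10^5 for the current one.
new_capacity = 3 * 10 * 36**3 * 10
old_capacity = 3 * 10**5

for example in ("P0A3S2", "Q2ASD4", "O13YX2", "P9B123"):
    print(example, is_valid_accession(example))
```

The computed capacity is 13'996'800, which is the "14 million" given above.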
2.4 Changes concerning the reference location line (RL)
The (IN) prefix used for books is now also used for references to the
electronic Plant Gene Register (see http://www.tarweed.com/pgr/). Example:
RL (IN) PLANT GENE REGISTER PGR98-023.
2.5 Cleaning up of the SIMILARITY comment line (CC) topic
We started a major overhaul of the "SIMILARITY" topic. We would like the
majority of the information stored in this topic to be usable by computer
programs (while being human-readable). We are therefore standardizing the
format of this topic using two different subformats. The first describes
the family to which a protein belongs:
CC -!- SIMILARITY: BELONGS TO THE {Name1} FAMILY [OF {Name2}].
CC [{Name3} SUBFAMILY.]
Examples:
CC -!- SIMILARITY: BELONGS TO THE 14-3-3 FAMILY.
CC -!- SIMILARITY: BELONGS TO THE 6-PHOSPHOGLUCONATE DEHYDROGENASE
CC FAMILY.
CC -!- SIMILARITY: BELONGS TO THE AAA FAMILY OF ATPASES.
CC -!- SIMILARITY: BELONGS TO THE IRON/ASCORBATE-DEPENDENT FAMILY OF
CC OXIDOREDUCTASES.
CC -!- SIMILARITY: BELONGS TO THE ANTP FAMILY OF HOMEOBOX PROTEINS.
CC "DEFORMED" SUBFAMILY.
CC -!- SIMILARITY: BELONGS TO THE KINESIN-LIKE PROTEIN FAMILY. KINESIN
CC SUBFAMILY.
The second describes which domains are found in a given protein:
CC -!- SIMILARITY: CONTAINS n {Name} [DOMAIN|REPEAT][S].
Examples:
CC -!- SIMILARITY: CONTAINS 1 FHA DOMAIN.
CC -!- SIMILARITY: CONTAINS 45 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 2 SH3 DOMAINS.
CC -!- SIMILARITY: CONTAINS 2 SUSHI (SCR) REPEATS.
We have already updated many entries in this release and plan to continue
to do so for the next release.
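To illustrate how a program might use the two subformats, here is a small Python sketch. It assumes that the text of the topic (everything after "SIMILARITY:") has already been extracted and joined across continuation lines; the function name parse_similarity and the dictionary keys are hypothetical, and entries not yet converted to the standard format will simply not match.

```python
import re

# BELONGS TO THE {Name1} FAMILY [OF {Name2}]. [{Name3} SUBFAMILY.]
FAMILY_RE = re.compile(
    r"BELONGS TO THE (?P<family>.+?) FAMILY"
    r"(?: OF (?P<family_of>[^.]+))?\."
    r"(?: (?P<subfamily>.+?) SUBFAMILY\.)?"
)

# CONTAINS n {Name} [DOMAIN|REPEAT][S].
CONTAINS_RE = re.compile(
    r"CONTAINS (?P<count>\d+) (?P<name>.+?) (?P<kind>DOMAIN|REPEAT)S?\."
)

def parse_similarity(text):
    """Parse the text of one SIMILARITY topic; return None if unrecognized."""
    text = " ".join(text.split())  # normalize whitespace
    match = FAMILY_RE.fullmatch(text)
    if match:
        return {"type": "family", **match.groupdict()}
    match = CONTAINS_RE.fullmatch(text)
    if match:
        return {
            "type": "contains",
            "count": int(match["count"]),
            "name": match["name"],
            "kind": match["kind"],
        }
    return None

print(parse_similarity("BELONGS TO THE AAA FAMILY OF ATPASES."))
print(parse_similarity("CONTAINS 2 SUSHI (SCR) REPEATS."))
```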
2.6 Changes concerning cross-references (DR line)
We have added cross-references from SWISS-PROT to the Mendel database, a
plant gene nomenclature database from the Commission for Plant Gene
Nomenclature (CPGN). These cross-references are present in the DR lines:
Data bank identifier: MENDEL
Primary identifier : The Mendel accession number for a gene in a given
species.
Secondary identifier: Composed of the acronym of the species (generally
the same five-letter code as that defined and used
by SWISS-PROT in the entry name), the gene name and
a number.
Example: DR MENDEL; 294; Amahy;psbA;1.
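A program reading such a line could split it into the fields described above. The following Python sketch assumes the layout shown in the example (fields separated by semicolons, line terminated by a period); the function name parse_mendel_dr and the returned keys are hypothetical.

```python
def parse_mendel_dr(line: str) -> dict:
    """Split a 'DR MENDEL; ...' line into its documented fields."""
    if not line.startswith("DR"):
        raise ValueError("not a DR line")
    body = line[2:].strip().rstrip(".")
    # Data bank identifier; primary identifier; secondary identifier.
    databank, primary, secondary = (p.strip() for p in body.split(";", 2))
    if databank != "MENDEL":
        raise ValueError("not a MENDEL cross-reference")
    # Secondary identifier: species acronym, gene name and a number.
    species, gene, number = (p.strip() for p in secondary.split(";"))
    return {
        "accession": primary,
        "species": species,
        "gene": gene,
        "number": int(number),
    }

print(parse_mendel_dr("DR MENDEL; 294; Amahy;psbA;1."))
```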
3. PLANNED CHANGES
3.1 Extension of the accession number system
As already explained in detail under 2.3, we will extend the accession
number system once we have used up the 'O' series of accession
numbers. We anticipate that this will happen in October 1998.
3.2 Switch to the NCBI taxonomy
To standardize the taxonomies used by different databases, we will change
our taxonomy with release 37. We will switch to the NCBI taxonomy, which
is already used as the common taxonomy by the DDBJ/EMBL/GenBank
nucleotide sequence databases.
3.3 Introduction of RT lines
With release 37 we will introduce a new line type, the RT (Reference
Title) line. This optional line will be placed between the RA and RL
lines. The RT line gives the title of the paper (or other work) as
exactly as possible given the limitations of the computer character set.
The form which will be used is that which would be used in a citation
rather than displayed at the top of the published paper. For instance,
where journals capitalize major title words this is not preserved. The
title is enclosed in double quotes, and may be continued over several
lines as necessary. The title lines are terminated by a semicolon. An
example of the use of RT lines is shown below:
RT "Sequence analysis of the genome of the unicellular cyanobacterium
RT Synechocystis sp. strain PCC6803. I. Sequence features in the 1 Mb
RT region from map positions 64% to 92% of the genome.";
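Software that reads entries will need to reassemble a title that is continued over several RT lines. The Python sketch below follows the rules stated above (title enclosed in double quotes, terminated by a semicolon); the function name join_rt_lines is hypothetical.

```python
def join_rt_lines(lines):
    """Reassemble the title from consecutive RT lines of one reference."""
    # Drop the two-character line code and surrounding blanks.
    parts = [line[2:].strip() for line in lines if line.startswith("RT")]
    text = " ".join(parts)
    # The title is enclosed in double quotes and terminated by a semicolon.
    if not (text.startswith('"') and text.endswith('";')):
        raise ValueError("RT block is not a quoted, semicolon-terminated title")
    return text[1:-2]

rt_block = [
    'RT "Sequence analysis of the genome of the unicellular cyanobacterium',
    "RT Synechocystis sp. strain PCC6803. I. Sequence features in the 1 Mb",
    'RT region from map positions 64% to 92% of the genome.";',
]
print(join_rt_lines(rt_block))
```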
4. STATUS OF THE DOCUMENTATION FILES
SWISS-PROT is distributed with a large number of documentation files. Some
of these files have been available for a long time (the user manual,
release notes, the various indices for authors, citations, keywords,
etc.), but many have been created recently and we are continuously adding
new files. Since release 35, we have added three new document files. The
following table lists all the documents that are currently available.
USERMAN.TXT User manual
RELNOTES.TXT Release notes
OLDRLNOT.TXT Release notes for previous release [1,2]
SHORTDES.TXT Short description of entries in SWISS-PROT
JOURLIST.TXT List of abbreviations for journals cited [3]
KEYWLIST.TXT List of keywords in use
SPECLIST.TXT List of organism identification codes
TISSLIST.TXT List of tissues [4]
EXPERTS.TXT List of on-line experts for PROSITE and SWISS-PROT
SUBMIT.TXT Submission of sequence data to SWISS-PROT
ACINDEX.TXT Accession number index
AUTINDEX.TXT Author index
CITINDEX.TXT Citation index
KEYINDEX.TXT Keyword index
SPEINDEX.TXT Species index
DELETEAC.TXT Deleted accession number index
7TMRLIST.TXT List of 7-transmembrane G-linked receptor entries
AATRNASY.TXT List of aminoacyl-tRNA synthetases
ALLERGEN.TXT Nomenclature and index of allergen sequences
BLOODGRP.TXT List of blood group antigen proteins
CALBICAN.TXT Index of Candida albicans entries and their
corresponding gene designations
CDLIST.TXT CD nomenclature for surface proteins of human
leucocytes
CELEGANS.TXT Index of Caenorhabditis elegans entries and their
corresponding gene Wormpep cross-references
DICTY.TXT Index of Dictyostelium discoideum entries and
their corresponding gene designations and DictyDb
cross-references
EC2DTOSP.TXT Index of Escherichia coli Gene-protein database
entries referenced in SWISS-PROT
ECOLI.TXT Index of Escherichia coli K12 chromosomal entries
and their corresponding EcoGene cross-references
EMBLTOSP.TXT Index of EMBL Database entries referenced in
SWISS-PROT
EXTRADOM.TXT Nomenclature of extracellular domains
FLY.TXT Index of Drosophila entries and FlyBase cross-
references
GLYCOSID.TXT Classification of glycosyl hydrolase families and
index of glycosyl hydrolase entries
HAEINFLU.TXT Index of Haemophilus influenzae RD chromosomal
entries
HOXLIST.TXT Vertebrate homeotic Hox proteins: nomenclature and
index
HPYLORI.TXT Index of Helicobacter pylori strain 26695
chromosomal entries
HUMCHR17.TXT Index of protein sequence entries encoded on human
chromosome 17 [1]
HUMCHR18.TXT Index of protein sequence entries encoded on human
chromosome 18
HUMCHR19.TXT Index of protein sequence entries encoded on human
chromosome 19
HUMCHR20.TXT Index of protein sequence entries encoded on human
chromosome 20
HUMCHR21.TXT Index of protein sequence entries encoded on human
chromosome 21
HUMCHR22.TXT Index of protein sequence entries encoded on human
chromosome 22
HUMCHRX.TXT Index of protein sequence entries encoded on human
chromosome X
HUMCHRY.TXT Index of protein sequence entries encoded on human
chromosome Y
HUMPVAR.TXT Index of human proteins with sequence variants [1]
INITFACT.TXT List and index of translation initiation factors
MIMTOSP.TXT Index of MIM entries referenced in SWISS-PROT
METALLO.TXT Classification of metallothioneins and index of
entries in SWISS-PROT
MGDTOSP.TXT Index of MGD entries referenced in SWISS-PROT
MGENITAL.TXT Index of Mycoplasma genitalium chromosomal entries
MJANNASC.TXT Index of Methanococcus jannaschii entries
NGR234.TXT Table of putative genes in Rhizobium plasmid
pNGR234a
NOMLIST.TXT List of nomenclature related references for
proteins
PCC6803.TXT Index of Synechocystis strain PCC 6803 entries
PDBTOSP.TXT Index of X-ray crystallography Protein Data Bank
(PDB) entries referenced in SWISS-PROT
PEPTIDAS.TXT Classification of peptidase families and index of
peptidase entries
PLASTID.TXT List of chloroplast and cyanelle encoded proteins
POMBE.TXT Index of Schizosaccharomyces pombe entries in
SWISS-PROT and their corresponding gene
designations
RESTRIC.TXT List of restriction enzyme and methylase entries
RIBOSOMP.TXT Index of ribosomal proteins classified by families
on the basis of sequence similarities
SALTY.TXT Index of Salmonella typhimurium LT2 chromosomal
entries and their corresponding StyGene cross-
references
SUBTILIS.TXT Index of Bacillus subtilis 168 chromosomal entries
and their corresponding SubtiList cross-references
UPFLIST.TXT UPF (Uncharacterized Protein Families) list and
index of members
YEAST.TXT Index of Saccharomyces cerevisiae entries and
their corresponding gene designations
YEAST1.TXT Yeast Chromosome I entries
YEAST2.TXT Yeast Chromosome II entries
YEAST3.TXT Yeast Chromosome III entries
YEAST5.TXT Yeast Chromosome V entries
YEAST6.TXT Yeast Chromosome VI entries
YEAST7.TXT Yeast Chromosome VII entries
YEAST8.TXT Yeast Chromosome VIII entries
YEAST9.TXT Yeast Chromosome IX entries
YEAST10.TXT Yeast Chromosome X entries
YEAST11.TXT Yeast Chromosome XI entries
YEAST13.TXT Yeast Chromosome XIII entries
YEAST14.TXT Yeast Chromosome XIV entries
Notes:
1 New in release 36.
2 We apologize for not having included the corresponding release notes
with release 35. We are therefore including them with this release. As
we believe that it may be useful to always distribute the release notes
of the previous release, we will do so from now on, and this file will
be known as "OLDRLNOT.TXT".
3 Has been extensively updated and contains Web links to more than 640
journals.
4 Has been extensively updated and now includes synonyms for many
tissues.
We have continued to include in some SWISS-PROT document files the
references of Web sites relevant to the subject under consideration. There
are now 24 documents that include such links.
5. THE EXPASY WORLD-WIDE WEB SERVER
5.1 Background information
The most efficient and user-friendly way to browse interactively in
SWISS-PROT, PROSITE, ENZYME, SWISS-2DPAGE and other databases is to use
the World-Wide Web (WWW) molecular biology server ExPASy. The ExPASy
server was made available to the public in September 1993. It is
reachable at the following address:
http://www.expasy.ch/
The ExPASy WWW server allows access, using the user-friendly hypertext
model, to the SWISS-PROT, PROSITE, ENZYME, SWISS-2DPAGE, SWISS-3DIMAGE
and CD40Lbase databases and, through any SWISS-PROT protein sequence
entry, to other databases such as EMBL, Eco2DBASE, EcoCyc, FlyBase,
GCRDb, MaizeDB, SubtiList/NRSub, OMIM, PDB, HSSP, ProDom, REBASE, SGD,
YEPD and Medline. ExPASy also offers many tools for the analysis of
protein sequences and 2D gels.
5.2 SWISS-SHOP
We provide, on ExPASy, a service called SWISS-SHOP. SWISS-SHOP allows
any user of SWISS-PROT to indicate which proteins he/she is interested
in. This can be done using various criteria that can be combined:
- By entering one or more words that should be present in the
description line;
- By entering one or more species name(s) or taxonomic division(s);
- By entering one or more keywords;
- By entering one or more author names;
- By entering the accession number (or entry name) of a PROSITE
pattern or a user-defined sequence pattern;
- By entering the accession number (or entry name) of an existing
SWISS-PROT entry or by entering a "private" sequence.
Every week, the new sequences entered in SWISS-PROT are automatically
compared with all the criteria that have been defined by the users. If a
sequence corresponds to the selection criteria defined by a user, that
sequence is sent by electronic mail.
5.3 What is new on ExPASy
ExPASy is constantly modified and improved. If you wish to be informed
on the changes made to the server you can either:
- Read the document "History of changes, improvements and new
features" which is available at the address:
http://www.expasy.ch/www/history.html
- Subscribe to SWISS-Flash, a service that reports news of databases,
software and services developments. By subscribing to this service,
you will automatically get SWISS-Flash bulletins by electronic
mail. To subscribe use the address:
http://www.expasy.ch/www/swiss-flash.html
6. TREMBL - A SUPPLEMENT TO SWISS-PROT
The ongoing genome sequencing and mapping projects have dramatically
increased the number of protein sequences to be incorporated into
SWISS-PROT. Since we do not want to dilute the quality standards of
SWISS-PROT by incorporating sequences into the database without proper
sequence analysis and annotation, we cannot speed up the incorporation of
new incoming data indefinitely. But as we also want to make the sequences
available as fast as possible, we have introduced with SWISS-PROT a
computer-annotated supplement. This supplement consists of entries in
SWISS-PROT-like format derived from the translation of all coding
sequences (CDS) in the EMBL nucleotide sequence database, except those
already included in SWISS-PROT.
We name this supplement TrEMBL (Translation from EMBL). It can be
considered as a preliminary section of SWISS-PROT. This SWISS-PROT release
is supplemented by TrEMBL release 6. TrEMBL is split into two main
sections, SP-TrEMBL and REM-TrEMBL:
- SP-TrEMBL (SWISS-PROT TrEMBL) contains the entries (150'329 in release
6) which should be incorporated into SWISS-PROT. SWISS-PROT accession
numbers have been assigned for all SP-TrEMBL entries.
- REM-TrEMBL (REMaining TrEMBL) contains the entries (27'428 in release
6) that we do not want to include in SWISS-PROT for a variety of
reasons (synthetic sequences, pseudogenes, translations of incorrect
open reading frames, fragments with fewer than eight amino acids,
patent-derived sequences, immunoglobulins and T-cell receptors, etc.).
TrEMBL is available by FTP from the EBI server (ftp.ebi.ac.uk) in the
directory '/pub/databases/trembl'. It can be queried on the WWW through
the EBI SRS server (http://www.ebi.ac.uk/). It is also available on the
SWISS-PROT CD-ROM and is searchable on the FASTA, BIC and BLAST servers of
the EBI.
7. WEEKLY UPDATES OF SWISS-PROT
Weekly updates of SWISS-PROT are available by anonymous FTP. Three files
are updated at each update:
new_seq.dat Contains all the new entries since the last full release;
upd_seq.dat Contains the entries for which the sequence data has been
updated since the last release;
upd_ann.dat Contains the entries for which one or more annotation
fields have been updated since the last release.
Currently these files are available on the following anonymous FTP
servers:
Organization Swiss Institute of Bioinformatics (SIB)
Address ftp.expasy.ch
Directory /databases/swiss-prot/updates
Organization European Bioinformatics Institute (EBI)
Address ftp.ebi.ac.uk
Directory /pub/databases/swissprot/new
!! Important notes !!
- Although we try to follow a regular schedule, we do not promise to
update these files every week. In some cases two weeks will elapse
between two updates.
- Due to the current mechanism used to build a release, the entries that
are provided in these updates are not guaranteed to be error free.
- Instead of using the above files, you can, every week, download an
updated copy of the SWISS-PROT database. This file is available in the
directory containing the non-redundant database (see next section).
8. NON-REDUNDANT DATABASE
A few months ago, we started to distribute, on the ExPASy and EBI FTP
servers, files that make up a non-redundant (see below) and complete
protein sequence database consisting of three components:
1) SWISS-PROT
2) TrEMBL
3) New entries to be later integrated into TrEMBL (hereafter known as
TrEMBL_New)
Every week three files are completely rebuilt. These files are named:
sprot.dat.Z, trembl.dat.Z and trembl_new.dat.Z. As indicated by their ".Z"
extension these are Unix "compress" format files which, when decompressed,
will produce ASCII files in SWISS-PROT format.
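Once decompressed, such a file can be read entry by entry, since every entry in SWISS-PROT format ends with a line containing only "//". The Python sketch below assumes the file has already been decompressed (for example with the Unix "uncompress" utility); the function names are hypothetical and the two sample entries are invented placeholders, not real database entries.

```python
from typing import Iterable, Iterator, List, Optional

def iter_entries(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each entry as a list of lines, without the '//' terminator."""
    entry: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.strip() == "//":
            if entry:
                yield entry
            entry = []
        elif line.strip():
            entry.append(line)
    if entry:  # tolerate a final entry with no terminator
        yield entry

def primary_accession(entry: List[str]) -> Optional[str]:
    """Return the first accession number listed on the first AC line."""
    for line in entry:
        if line.startswith("AC"):
            return line[2:].split(";")[0].strip()
    return None

# Invented placeholder entries, for illustration only.
sample = [
    "ID   TEST1_HUMAN\n", "AC   P00001; Q00001;\n", "//\n",
    "ID   TEST2_MOUSE\n", "AC   O00002;\n", "//\n",
]
for entry in iter_entries(sample):
    print(primary_accession(entry), len(entry))
```

With a real file, the list `sample` would be replaced by an open file object, for example `open("sprot.dat")`.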
Three other files are also available (sprot.fas.Z, trembl.fas.Z and
trembl_new.fas.Z), which are compressed "fasta" format sequence files
useful for building the databases used by FASTA, BLAST and other sequence
similarity search programs. Please do not use these files for other
purposes, as you lose all annotations by using this very primitive format.
The files for the non-redundant database are stored in the directory
"/databases/sp_tr_nrdb" on the ExPASy FTP server (ftp.expasy.ch) and in
the directory "/pub/databases/sp_tr_nrdb" on the EBI FTP server
(ftp.ebi.ac.uk).
Additional notes
- The SWISS-PROT file continuously grows as new annotated sequences are
added.
- The TrEMBL file decreases in size as sequences are annotated and moved
out of that section into SWISS-PROT. Four times a year a new release of
TrEMBL is built at EBI; at this point the TrEMBL file increases in size
as it then includes all of the new data (see next section) that has
accumulated since the last release.
- The TrEMBL_New file starts as a very small file and grows in size until
a new release of TrEMBL is available.
- SWISS-PROT and TrEMBL share the same system of accession numbers.
Therefore you will not find any primary accession number duplicated
between the two sections. A TrEMBL entry (and its associated accession
number(s)) can either move to SWISS-PROT as a new entry or be merged with
an existing SWISS-PROT entry. In the latter case, the accession
number(s) of that TrEMBL entry are added to those of the SWISS-PROT
entry.
- TrEMBL_New does not have real accession numbers. However it was
necessary to have an "AC" line so as to be able to use it with
different software products. This AC line contains a temporary
identifier which consists of the pID (protein identifier) of the coding
sequence in the parent nucleotide sequence.
- While these three files allow you to build what we call a
"non-redundant" database, it must be noted that this description is not
entirely accurate. Without going into a long explanation we can say that
this is currently the best attempt at providing a complete selection of
protein sequence entries while trying to eliminate redundancies. While
SWISS-PROT is completely (well 99.994% !) non-redundant, TrEMBL is far
from being non-redundant and the combination of SWISS-PROT and TrEMBL is
even less so.
- To describe to your users the version of the non-redundant database
that you are providing to them, you should use a statement of the form:
SWISS-PROT release 36 and updates until {current_date};
TrEMBL release 6 minus data integrated into SWISS-PROT as of
{current_date};
New preliminary TrEMBL entries created since release 6 of TrEMBL
9. ENZYME and PROSITE
9.1 The ENZYME data bank
Release 23.0 of the ENZYME data bank is distributed with release 36 of
SWISS-PROT. ENZYME release 23.0 contains information relative to 3704
enzymes. It also differs from the previous release (release 22 of
November 1997)
in that the "DE" (Description), "AN" (Alternative Names), "CF" (Cofactor)
and "CC" (Comments) lines are now in mixed-case characters instead of
being all in UPPER case.
For example, what was previously:
ID 1.4.4.2
DE GLYCINE DEHYDROGENASE (DECARBOXYLATING).
AN GLYCINE DECARBOXYLASE.
AN GLYCINE CLEAVAGE SYSTEM P-PROTEIN.
CA GLYCINE + LIPOYLPROTEIN = S-AMINOMETHYLDIHYDROLIPOYLPROTEIN + CO(2).
CF PYRIDOXAL-PHOSPHATE.
CC -!- LIPOAMIDE CAN ALSO ACT AS ACCEPTOR.
CC -!- A COMPONENT, WITH EC 2.1.2.10, OF THE GLYCINE CLEAVAGE SYSTEM,
CC PREVIOUSLY KNOWN AS GLYCINE SYNTHASE.
DI NONKETOTIC HYPERGLYCINEMIA TYPE II; MIM:238310.
DR P54376, GCS1_BACSU; P54377, GCS2_BACSU; P49361, GCSA_FLAPR;
DR P49362, GCSB_FLAPR; P15505, GCSP_CHICK; P33195, GCSP_ECOLI;
DR O49850, GCSP_FLAAN; O49852, GCSP_FLATR; P23378, GCSP_HUMAN;
DR Q50601, GCSP_MYCTU; P26969, GCSP_PEA ; Q09785, GCSP_SCHPO;
DR O49954, GCSP_SOLTU; P49095, GCSP_YEAST;
//
is now:
ID 1.4.4.2
DE Glycine dehydrogenase (decarboxylating).
AN Glycine decarboxylase.
AN Glycine cleavage system P-protein.
CA GLYCINE + LIPOYLPROTEIN = S-AMINOMETHYLDIHYDROLIPOYLPROTEIN + CO(2).
CF Pyridoxal-phosphate.
CC -!- Lipoamide can also act as acceptor.
CC -!- A component, with EC 2.1.2.10, of the glycine cleavage system,
CC previously known as glycine synthase.
DI NONKETOTIC HYPERGLYCINEMIA TYPE II; MIM:238310.
DR P54376, GCS1_BACSU; P54377, GCS2_BACSU; P49361, GCSA_FLAPR;
DR P49362, GCSB_FLAPR; P15505, GCSP_CHICK; P33195, GCSP_ECOLI;
DR O49850, GCSP_FLAAN; O49852, GCSP_FLATR; P23378, GCSP_HUMAN;
DR Q50601, GCSP_MYCTU; P26969, GCSP_PEA ; Q09785, GCSP_SCHPO;
DR O49954, GCSP_SOLTU; P49095, GCSP_YEAST;
//
We plan to convert the "CA" (Catalytic Activity) lines to mixed-case for
the next release.
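As the example shows, the DR lines of an ENZYME entry are not affected by the change of case. For developers who extract the SWISS-PROT cross-references from these lines, the following Python sketch shows one possible way to do it; it assumes the layout seen in the example (pairs of accession number and entry name, separated by a comma and terminated by a semicolon), and the function name parse_enzyme_dr is hypothetical.

```python
def parse_enzyme_dr(line: str):
    """Return the (accession, entry name) pairs found on one ENZYME DR line."""
    if not line.startswith("DR"):
        raise ValueError("not a DR line")
    pairs = []
    for chunk in line[2:].split(";"):
        chunk = chunk.strip()
        if not chunk:  # text after the final semicolon
            continue
        accession, entry_name = (part.strip() for part in chunk.split(","))
        pairs.append((accession, entry_name))
    return pairs

# Entry names shorter than the field width are padded with blanks
# before the semicolon (see GCSP_PEA); strip() removes the padding.
print(parse_enzyme_dr("DR Q50601, GCSP_MYCTU; P26969, GCSP_PEA ; Q09785, GCSP_SCHPO;"))
```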
9.2 The PROSITE data bank
Release 15.0 of the PROSITE data bank is distributed with release 36 of
SWISS-PROT. This release of PROSITE contains 1014 documentation entries
that describe 1'352 different patterns, rules and profiles/matrices.
10. WE NEED YOUR HELP !
We welcome feedback from our users. We would especially appreciate it if
you would notify us if you find that sequences belonging to your field of
expertise are missing from the data bank. We would also like to be
notified about annotations to be updated if, for example, the function
of a protein has been clarified or if new post-translational information
has become available. To facilitate such feedback, we offer on the
ExPASy WWW server a form that allows the submission of updates and/or
corrections to SWISS-PROT:
http://www.expasy.ch/sprot/sp_update_form.html
It is also possible, from any entry in SWISS-PROT displayed by the
ExPASy server, to submit updates and/or corrections for that particular
entry. Finally, you can also send your comments by electronic mail to
the address:
swiss-prot@expasy.ch
11. IMPORTANT ANNOUNCEMENT
It has become obvious in recent years that the tremendous increase in data
flow has created a requirement for resources which cannot be addressed in
full by public funding. This is causing databases to fall behind the
research. We believe that the only solution to the resource shortfall is
to ask commercial users to participate by paying a license fee. No fee
will be charged to academic users, nor will any restriction be imposed on
their use or reuse of the data. Both SWISS-PROT and PROSITE are affected
by these changes; ENZYME is not.
A document fully describing what will be the impact of this change for
SWISS-PROT is available with the SWISS-PROT distribution files on FTP
(SP_98.TXT). You can also access the document as well as other relevant
ones from:
http://www.expasy.ch/announce/
http://www.ebi.ac.uk/news.html
If you do not have the time to read this document, the most important
take-home message is that these changes should not have any impact on the
way SWISS-PROT or PROSITE are accessed or redistributed. Academic users
will not be affected by these changes. Industrial end-users will also not
directly be affected as long as their employer pays the license fee. The
same holds true for bioinformatics companies. Academic software or
database developers as well as providers of database distribution services
will be only minimally affected by these changes. We hope to be able to
keep the spirit of SWISS-PROT and PROSITE alive and at the same time
ensure their long-term financial survival. We sincerely hope and believe
that in the next two years the only change that will matter will be the
increase in scope and timeliness of the databases.
Finally, it should be noted that release 36 of SWISS-PROT and release 15
of PROSITE are not concerned by these changes. There are no restrictions
on their use and their distribution.
========================================================================
APPENDIX A: SOME STATISTICS
A.1 Amino acid composition
A.1.1 Composition in percent for the complete data bank
Ala (A) 7.58 Gln (Q) 3.99 Leu (L) 9.42 Ser (S) 7.15
Arg (R) 5.14 Glu (E) 6.35 Lys (K) 5.93 Thr (T) 5.69
Asn (N) 4.47 Gly (G) 6.83 Met (M) 2.37 Trp (W) 1.24
Asp (D) 5.28 His (H) 2.24 Phe (F) 4.08 Tyr (Y) 3.18
Cys (C) 1.67 Ile (I) 5.80 Pro (P) 4.91 Val (V) 6.56
Asx (B) 0.001 Glx (Z) 0.001 Xaa (X) 0.01
A.1.2 Classification of the amino acids by their frequency
Leu, Ala, Ser, Gly, Val, Glu, Lys, Ile, Thr, Asp, Arg, Pro, Asn, Phe,
Gln, Tyr, Met, His, Cys, Trp
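The ordering in A.1.2 follows directly from the percentages in A.1.1. A short Python sketch that reproduces it is given here as an illustration; the variable names are hypothetical.

```python
# Composition in percent for the 20 standard amino acids, from A.1.1.
COMPOSITION = {
    "Ala": 7.58, "Gln": 3.99, "Leu": 9.42, "Ser": 7.15,
    "Arg": 5.14, "Glu": 6.35, "Lys": 5.93, "Thr": 5.69,
    "Asn": 4.47, "Gly": 6.83, "Met": 2.37, "Trp": 1.24,
    "Asp": 5.28, "His": 2.24, "Phe": 4.08, "Tyr": 3.18,
    "Cys": 1.67, "Ile": 5.80, "Pro": 4.91, "Val": 6.56,
}

# Sort the three-letter codes from most to least frequent.
by_frequency = sorted(COMPOSITION, key=COMPOSITION.get, reverse=True)

print(", ".join(by_frequency))
```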
A.2 Distribution of the sequences by their organism of origin
Total number of species represented in this release of SWISS-PROT: 6002
The first twenty species represent 35826 sequences: 48.4 % of the total
number of entries.
A.2.1 Table of the frequency of occurrence of species
Species represented 1x: 2754
2x: 951
3x: 479
4x: 332
5x: 238
6x: 212
7x: 159
8x: 99
9x: 102
10x: 73
11- 20x: 277
21- 50x: 176
51-100x: 72
>100x: 78
A.2.2 Table of the most represented species
Number Frequency Species
1 4980 Human
2 4787 Baker's yeast (Saccharomyces cerevisiae)
3 4416 Escherichia coli
4 3253 Mouse
5 2491 Rat
6 1970 Bacillus subtilis
7 1887 Caenorhabditis elegans
8 1693 Haemophilus influenzae
9 1315 Fission yeast (Schizosaccharomyces pombe)
10 1283 Methanococcus jannaschii
11 1088 Bovine
12 1042 Fruit fly (Drosophila melanogaster)
13 873 Mycobacterium tuberculosis
14 840 Chicken
15 719 Arabidopsis thaliana (Mouse-ear cress)
16 706 Salmonella typhimurium
17 697 African clawed frog (Xenopus laevis)
18 616 Synechocystis sp. (strain PCC 6803)
19 607 Pig
20 563 Rabbit
21 489 Mycoplasma pneumoniae
22 470 Mycoplasma genitalium
23 406 Maize
24 403 Rhizobium sp. (strain NGR234)
25 345 Pseudomonas aeruginosa
26 334 Helicobacter pylori
27 304 Rice
28 284 Dog
29 280 Slime mold (Dictyostelium discoideum)
30 278 Tobacco
31 273 Bacteriophage T4
32 253 Vaccinia virus (strain Copenhagen)
33 250 Mycobacterium leprae
34 244 Sheep
35 240 Pea
36 219 Porphyra purpurea
37 215 Barley
38 212 Staphylococcus aureus
39 209 Neurospora crassa
40 208 Soybean
41 205 Wheat
42 195 Tomato
43 193 Rhodobacter capsulatus
193 Human cytomegalovirus (strain AD169)
45 192 Candida albicans
192 Potato
47 191 Klebsiella pneumoniae
48 190 Methanobacterium thermoautotrophicum
49 185 Bacillus stearothermophilus
50 184 Vaccinia virus (strain WR)
51 178 Pseudomonas putida
52 164 Agrobacterium tumefaciens
53 160 Spinach
160 Guinea pig
55 158 Chlamydomonas reinhardtii
56 157 Rhizobium meliloti
57 154 Autographa californica nuclear polyhedrosis virus
58 150 Marchantia polymorpha (Liverwort)
59 146 Variola virus
146 Cyanophora paradoxa
61 145 Aspergillus nidulans
62 139 Odontella sinensis
63 136 Streptomyces coelicolor
136 Golden hamster
136 Lactococcus lactis (subsp. lactis)
66 134 Orgyia pseudotsugata multicapsid polyhedrosis virus
67 130 Horse
68 127 Kluyveromyces lactis
69 125 Thermus aquaticus (subsp. thermophilus)
70 124 Trypanosoma brucei brucei
71 122 Synechococcus sp. (strain PCC 7942)
72 114 Anabaena sp. (strain PCC 7120)
73 113 Bradyrhizobium japonicum
74 111 Alcaligenes eutrophus
75 110 Bombyx mori (Silk moth)
76 107 Archaeoglobus fulgidus
77 105 Yersinia enterocolitica
78 101 Brassica napus (Rape)
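The ranking above leaves the rank column blank when an organism has the same number of entries as the one listed before it (for example after ranks 43 and 45). The following Python sketch illustrates that convention under stated assumptions: the function name `rank_with_ties` and its `start` parameter are hypothetical and are not part of any SWISS-PROT tool.

```python
def rank_with_ties(rows, start=1):
    """Assign table ranks to (count, organism) rows sorted by descending count.

    A row whose count equals the previous row's count gets rank None,
    which corresponds to a blank rank column in the table.
    """
    ranked = []
    previous_count = None
    for position, (count, organism) in enumerate(rows, start=start):
        rank = None if count == previous_count else position
        ranked.append((rank, count, organism))
        previous_count = count
    return ranked


# Rows 43 to 47 of the table, used here only as sample input.
sample = [
    (193, "Rhodobacter capsulatus"),
    (193, "Human cytomegalovirus (strain AD169)"),
    (192, "Candida albicans"),
    (192, "Potato"),
    (191, "Klebsiella pneumoniae"),
]
ranks = [rank for rank, _, _ in rank_with_ties(sample, start=43)]
```

With this sample, tied rows receive `None` and the next distinct count resumes at its list position, as in the table.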
A.3 Distribution of the sequences by size
From   To  Number     From   To  Number
   1-  50    3048     1001-1100     667
  51- 100    6272     1101-1200     511
 101- 150    9004     1201-1300     348
 151- 200    7032     1301-1400     233
 201- 250    6626     1401-1500     193
 251- 300    6172     1501-1600     119
 301- 350    5852     1601-1700     112
 351- 400    5882     1701-1800      86
 401- 450    4500     1801-1900      91
 451- 500    4176     1901-2000      58
 501- 550    3138     2001-2100      33
 551- 600    2191     2101-2200      68
 601- 650    1688     2201-2300      67
 651- 700    1221     2301-2400      35
 701- 750    1095     2401-2500      41
 751- 800     891         >2500     207
 801- 850     685
 851- 900     736
 901- 950     509
 951-1000     432
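The size table uses bins of 50 residues up to 1000, bins of 100 residues from 1001 to 2500, and a single open bin above 2500. A minimal Python sketch of that binning is given below. The function names `size_bin` and `size_distribution` are hypothetical, and the sketch is not the program used to produce the table.

```python
from collections import Counter


def size_bin(length):
    """Return (low, high) for the table bin containing `length`.

    The open bin above 2500 residues is returned as (2501, None).
    """
    if length < 1:
        raise ValueError("sequence length must be positive")
    if length > 2500:
        return (2501, None)
    width = 50 if length <= 1000 else 100
    low = (length - 1) // width * width + 1
    return (low, low + width - 1)


def size_distribution(lengths):
    """Count how many sequence lengths fall into each table bin."""
    return Counter(size_bin(length) for length in lengths)


# Example with a few lengths, including two from the list of longest sequences.
example = size_distribution([50, 51, 1000, 1001, 4092, 5217])
```

Here `example[(2501, None)]` counts the two sequences longer than 2500 residues, and every other bin receives one sequence.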
A.4 Longest sequences
The longest sequences (>=4000 residues) are listed here:
HTS1_COCCA 5217
MUC2_HUMAN 5179
FAT_DROME 5147
RYNR_RABIT 5037
RYNR_PIG 5035
RYNR_HUMAN 5032
RYNC_RABIT 4969
LRP_CAEEL 4753
DYHC_DICDI 4725
PLEC_RAT 4687
LRP2_RAT 4660
DYHC_RAT 4644
DYHC_DROME 4639
DYHC_CAEEL 4568
DYHB_CHLRE 4568
APB_HUMAN 4563
APOA_HUMAN 4548
LRP1_HUMAN 4544
LRP1_CHICK 4543
DYHC_PARTE 4540
RRPA_CVMJH 4488
DYHG_CHLRE 4485
DYHC_ANTCR 4466
DYHC_TRIGR 4466
GRSB_BACBR 4451
PKSK_BACSU 4447
PKSL_BACSU 4427
PGBM_HUMAN 4393
YP73_CAEEL 4385
DYHC_NEUCR 4367
DYHC_NECHA 4349
DYHC_EMENI 4344
PKD1_HUMAN 4303
DYHC_SCHPO 4196
DYHC_YEAST 4092
RRPA_CVH22 4085
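A list like the one above can be derived from the ID lines of the flat file. The sketch below assumes the ID line layout `ID   ENTRY_NAME  STANDARD;  PRT;  nnnn AA.`, in which the sequence length is the field before `AA.`. The function name `longest_entries` is hypothetical, and the entry `TEST_ENTRY` in the sample is invented for illustration only.

```python
def longest_entries(lines, min_length=4000):
    """Return (entry_name, length) pairs with length >= min_length, longest first."""
    hits = []
    for line in lines:
        # Only ID lines carry the entry name and the sequence length.
        if not line.startswith("ID "):
            continue
        fields = line.split()
        # Assumed fields: ["ID", name, "STANDARD;", "PRT;", length, "AA."]
        name, length = fields[1], int(fields[-2])
        if length >= min_length:
            hits.append((name, length))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Two entries from the list above plus one hypothetical short entry.
sample_lines = [
    "ID   DYHC_YEAST     STANDARD;      PRT;  4092 AA.",
    "//",
    "ID   TEST_ENTRY     STANDARD;      PRT;   104 AA.",
    "//",
    "ID   HTS1_COCCA     STANDARD;      PRT;  5217 AA.",
    "//",
]
result = longest_entries(sample_lines)
```

For the sample lines, `result` holds HTS1_COCCA followed by DYHC_YEAST; the short hypothetical entry is filtered out.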
A.5 Statistics for journal citations
Total number of journals cited in this release of SWISS-PROT: 913
A.5.1 Table of the frequency of journal citations
Journals cited       1x:  339
                     2x:  124
                     3x:   70
                     4x:   39
                     5x:   37
                     6x:   23
                     7x:   17
                     8x:   15
                     9x:   14
                    10x:   10
                11- 20x:   63
                21- 50x:   65
                51-100x:   24
                  >100x:   73
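As a quick consistency check, the frequency classes should add up to the total number of journals cited (913). The short Python sketch below copies the counts from the table; the variable names are illustrative.

```python
# Number of journals per citation-frequency class, copied from the table.
frequency_classes = {
    "1x": 339, "2x": 124, "3x": 70, "4x": 39, "5x": 37,
    "6x": 23, "7x": 17, "8x": 15, "9x": 14, "10x": 10,
    "11-20x": 63, "21-50x": 65, "51-100x": 24, ">100x": 73,
}

total_journals = sum(frequency_classes.values())
print(total_journals)  # prints 913
```

The 73 journals in the ">100x" class are the ones listed individually in section A.5.2.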
A.5.2 List of the most cited journals in SWISS-PROT
Nb  Citations  Journal abbreviation
--  ---------  ----------------------------------
 1       6303  J. BIOL. CHEM.
 2       3814  PROC. NATL. ACAD. SCI. U.S.A.
 3       3384  NUCLEIC ACIDS RES.
 4       2714  J. BACTERIOL.
 5       2498  GENE
 6       2058  FEBS LETT.
 7       1932  EUR. J. BIOCHEM.
 8       1780  BIOCHEM. BIOPHYS. RES. COMMUN.
 9       1732  BIOCHEMISTRY
10       1713  EMBO J.
11       1617  NATURE
12       1438  BIOCHIM. BIOPHYS. ACTA
13       1339  J. MOL. BIOL.
14       1228  CELL
15       1184  MOL. CELL. BIOL.
16        953  MOL. GEN. GENET.
17        929  PLANT MOL. BIOL.
18        888  BIOCHEM. J.
19        873  GENOMICS
20        808  SCIENCE
21        768  MOL. MICROBIOL.
22        764  VIROLOGY
23        682  J. BIOCHEM.
24        515  J. VIROL.
25        464  YEAST
26        461  J. CELL BIOL.
27        445  J. GEN. VIROL.
28        417  PLANT PHYSIOL.
29        407  GENES DEV.
30        376  HUM. MOL. GENET.
31        346  J. IMMUNOL.
32        342  HUM. MUTAT.
33        323  ARCH. BIOCHEM. BIOPHYS.
34        319  CURR. GENET.
35        312  ONCOGENE
36        312  INFECT. IMMUN.
37        305  MOL. BIOCHEM. PARASITOL.
38        270  FEMS MICROBIOL. LETT.
39        264  BIOL. CHEM. HOPPE-SEYLER
40        261  STRUCTURE
41        254  AM. J. HUM. GENET.
42        247  NAT. GENET.
43        239  DEVELOPMENT
44        237  MOL. ENDOCRINOL.
45        234  J. CLIN. INVEST.
46        218  J. MOL. EVOL.
47        218  J. GEN. MICROBIOL.
48        213  HOPPE-SEYLER'S Z. PHYSIOL. CHEM.
49        204  MICROBIOLOGY
50        202  GENETICS
51        191  HUM. GENET.
52        188  NAT. STRUCT. BIOL.
53        186  DNA CELL BIOL.
54        182  J. EXP. MED.
55        181  BLOOD
56        175  DEV. BIOL.
57        174  APPL. ENVIRON. MICROBIOL.
58        172  NEURON
59        157  PROTEIN SCI.
60        153  DNA
61        145  IMMUNOGENETICS
62        137  ENDOCRINOLOGY
63        136  DNA SEQ.
64        125  PLANT CELL
65        115  HEMOGLOBIN
66        113  CANCER RES.
67        113  BIOCHIMIE
68        109  J. NEUROCHEM.
69        109  BIOORG. KHIM.
70        108  MOL. BIOL. EVOL.
71        107  AGRIC. BIOL. CHEM.
72        106  BRAIN RES. MOL. BRAIN RES.
73        105  PLANT J.
========================================================================
APPENDIX B: RELATIONSHIPS BETWEEN SWISS-PROT AND SOME BIOMOLECULAR
DATABASES
The current status of the relationships (cross-references) between
SWISS-PROT and some biomolecular databases is shown in the following
schematic:
                        ***********************
                        *   EMBL Nucleotide   *
                        *  Sequence Database  *
                        *        [EBI]        *
                        ***********************
                           ^ ^ ^   ^ ^ ^ ^ ^ ^
******************         | | |   I | | | | |         **********************
* FlyBase        * <-------+ | |   I | | | | +-------> * MGD [Mouse]        *
******************         | | |   I | | | | |         **********************
                           | | |   I | | | | |
******************         | | |   I | | | | |         **********************
* SubtiList      * <---------+ |   I | | | +---------> * GCRDb [7TM recep.] *
* [B.subtilis]   *         | | |   I | | | | |         **********************
******************         | | |   I | | | | |
                           | | |   I | | | | |         **********************
******************         | | |   I | | +-----------> * EcoGene [E.coli]   *
* Mendel [Plant] * <-----+ | | |   I | | | | |         **********************
******************       | | | |   I | | | | |
                         | | | |   I | | | | |         **********************
******************       | | | |   I +---------------> * SGD [Yeast]        *
* MaizeDb        * <-----------+   I | | | | |         **********************
* [Zea mays]     *       | | | |   I | | | | |
******************       | | | |   I | | | | |         **********************
                         | | | |   I | +-------------> * DictyDB [D.disco.] *
******************       | | | |   I | | | | |         **********************
* WormPep        *       | | | |   I | | | | |
* [C.elegans]    * <---+ | | | |   I | | | | |         **********************
******************     | | | | |   I | | | | | +-----> * ENZYME [Nomencl.]  *
                       | | | | |   I | | | | | |       **********************
******************     | v v v v v v v v v v v v
* REBASE         *     *************************       **********************
* [Restriction   * <-- *      SWISS-PROT       * ----> * OMIM [Human]       *
*  enzymes]      *     *   Protein Sequence    *       **********************
******************     *       Data Bank       *
                       *************************       **********************
******************      ^ ^ ^ ^ ^ ^ ^ | ^ ^ ^          * ECO2DBASE [2D]     *
* StyGene        *      | | | | | | | | | | +--------> **********************
* [S.Typhimurium]* <----+ | | | | | | | | |
******************        | | | | | | | | |            **********************
                          | | | | | | | | +----------> * Maize-2DPAGE [2D]  *
******************        | | | | | | | |              **********************
* Transfac       * <------+ | | | | | | |
******************          | | | | | | |              **********************
                            | | | | | | +------------> * SWISS-2DPAGE [2D]  *
******************          | | | | | |                **********************
* Harefield [2D] * <--------+ | | | | |
******************            | | | | |                **********************
                              | | | | +--------------> * Aarhus/Ghent [2D]  *
******************            | | | |                  **********************
* PROSITE        *            | | | |
* [Patterns and  * <----------+ | | +----------------> **********************
*  profiles]     *              | |                    * YEPD [Yeast] [2D]  *
******************              | +----------------+   **********************
             |                  v                  |
             |          ***********************    +-> **********************
             +--------> * PDB [3D structures] * <----- * HSSP [3D similar.] *
                        ***********************        **********************
=End=of=SWISS-PROT=release=36=notes=====================================