
PROSITE documentation PDOC50099

Sequence regions enriched in a particular amino acid

Description:

Many proteins contain compositionally biased sequence regions which are also called low-complexity regions [1]. Typically, such regions are highly enriched in one or a few amino acids. We have included profiles specific for each of the 20 amino acids so as to search for regions that are significantly enriched in a particular amino acid. The behaviour of these profiles is controlled by two parameters, the match and mismatch scores. These parameters were chosen such that the "target frequencies" of the corresponding amino acids computed according to the Karlin-Altschul theory [2] approximate 35% for the residue composition of Swiss-Prot (see below).

   Amino     Average    Match    Mismatch   Target
   acid      frequency  score    score      frequency
             (%)                            (%)

   Ala (A)   7.55          4       -1       38.5
   Cys (C)   1.69          7       -1       36.8
   Asp (D)   5.30          5       -1       35.1
   Glu (E)   6.32          5       -1       32.4
   Phe (F)   4.07          6       -1       31.9
   Gly (G)   6.84          5       -1       31.2
   His (H)   2.24          7       -1       33.6
   Ile (I)   5.72          5       -1       34.0
   Lys (K)   5.93          5       -1       33.4
   Leu (L)   9.33          4       -1       34.7
   Met (M)   2.35          7       -1       33.1
   Asn (N)   4.52          5       -1       37.4
   Pro (P)   4.92          5       -1       36.2
   Gln (Q)   4.02          6       -1       32.1
   Arg (R)   5.15          5       -1       35.5
   Ser (S)   7.22          4       -1       39.2
   Thr (T)   5.74          5       -1       33.9
   Val (V)   6.52          5       -1       32.0
   Trp (W)   1.25          8       -1       34.9
   Tyr (Y)   3.19          6       -1       35.1
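
The relation between the columns of the table can be sketched numerically. The following is a minimal illustration, not the code used to build the profiles. It assumes the standard Karlin-Altschul setting for a two-letter scoring scheme (the residue of interest versus everything else): lambda is the positive solution of p*exp(lambda*match) + (1-p)*exp(lambda*mismatch) = 1, and the target frequency is p*exp(lambda*match), where p is the average frequency expressed as a fraction. The function names and the bisection solver are choices made for this example.

```python
import math


def karlin_altschul_lambda(p, match, mismatch=-1.0):
    """Positive root of p*exp(lam*match) + (1-p)*exp(lam*mismatch) = 1.

    p is the background frequency of the residue (a fraction, not a percent).
    """
    if p * match + (1.0 - p) * mismatch >= 0:
        raise ValueError("expected score per residue must be negative")

    def g(lam):
        return p * math.exp(lam * match) + (1.0 - p) * math.exp(lam * mismatch) - 1.0

    # g(0) = 0 is a trivial root; g is negative just above 0 and then rises,
    # so bracket the positive root and bisect.
    lo, hi = 1e-6, 1.0
    while g(hi) <= 0:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0


def target_frequency(p, match, mismatch=-1.0):
    """Frequency of the residue expected inside high-scoring segments."""
    lam = karlin_altschul_lambda(p, match, mismatch)
    return p * math.exp(lam * match)


# Two rows of the table: (average frequency in %, match score).
for name, freq_percent, match in [("Ala", 7.55, 4), ("Trp", 1.25, 8)]:
    q = target_frequency(freq_percent / 100.0, match)
    print(f"{name}: target frequency = {100.0 * q:.1f}%")
```

Under these assumptions the values for Ala and Trp should land close to the tabulated 38.5 and 34.9. Because the match score is restricted to integers, the target frequency can only approximate 35%, which is consistent with the spread seen in the last column.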

The normalisation parameters for converting raw scores into per-residue log expectation values, which are given within the profile, were empirically derived by fitting an extreme value distribution to the score distribution obtained from a random database that conserves the length distribution and global amino acid composition of Swiss-Prot but not the composition of the individual sequences.
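
The general idea of such a calibration can be illustrated as follows. This is only a rough sketch under several assumptions that are not taken from the profiles themselves: the raw score of a sequence is taken to be the score of its best contiguous segment, all random sequences share one hypothetical length (350) instead of following the Swiss-Prot length distribution, and the extreme value (Gumbel) distribution is fitted by the method of moments. The function names, the sample size and the seed are arbitrary choices for the example.

```python
import math
import random
import statistics

# Average residue frequencies (%) copied from the table above.
FREQ = {
    "A": 7.55, "C": 1.69, "D": 5.30, "E": 6.32, "F": 4.07,
    "G": 6.84, "H": 2.24, "I": 5.72, "K": 5.93, "L": 9.33,
    "M": 2.35, "N": 4.52, "P": 4.92, "Q": 4.02, "R": 5.15,
    "S": 7.22, "T": 5.74, "V": 6.52, "W": 1.25, "Y": 3.19,
}


def best_segment_score(seq, residue, match, mismatch=-1):
    """Highest total score of any contiguous segment (0 if none is positive)."""
    best = running = 0
    for aa in seq:
        running = max(0, running + (match if aa == residue else mismatch))
        best = max(best, running)
    return best


def simulate_scores(residue, match, n_seqs=500, length=350, seed=0):
    """Best-segment scores of random sequences with the global composition."""
    rng = random.Random(seed)
    letters = list(FREQ)
    weights = list(FREQ.values())
    return [
        best_segment_score(rng.choices(letters, weights, k=length), residue, match)
        for _ in range(n_seqs)
    ]


def fit_gumbel_moments(scores):
    """Method-of-moments estimate of the Gumbel location (mu) and scale (beta)."""
    beta = statistics.stdev(scores) * math.sqrt(6.0) / math.pi
    mu = statistics.mean(scores) - 0.5772156649 * beta  # Euler-Mascheroni constant
    return mu, beta


# Proline-rich profile parameters from the table: match 5, mismatch -1.
scores = simulate_scores("P", 5)
mu, beta = fit_gumbel_moments(scores)
print(f"Gumbel fit for Pro: mu = {mu:.2f}, beta = {beta:.2f}")
```

The fitted location and scale describe how high a raw score is expected to be by chance alone, which is the information needed to express raw scores on a common, comparable scale.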

Note:

These profiles do not characterize biologically defined objects. As the underlying definition is purely statistical, it is not possible to speak of true or false matches to these profiles, nor is it possible to assign a false negative status to a sequence.

Expert(s) to contact by email:

Bucher P.

Last update:

April 2002 / First entry.

Technical section:

PROSITE methods (with tools and information) covered by this documentation:

ALA_RICH, PS50310; Alanine-rich region profile (MATRIX with a high probability of occurrence!)
ARG_RICH, PS50323; Arginine-rich region profile (MATRIX with a high probability of occurrence!)
ASN_RICH, PS50321; Asparagine-rich region profile (MATRIX with a high probability of occurrence!)
ASP_RICH, PS50312; Aspartic acid-rich region profile (MATRIX with a high probability of occurrence!)
CYS_RICH, PS50311; Cysteine-rich region profile (MATRIX with a high probability of occurrence!)
GLN_RICH, PS50322; Glutamine-rich region profile (MATRIX with a high probability of occurrence!)
GLU_RICH, PS50313; Glutamic acid-rich region profile (MATRIX with a high probability of occurrence!)
GLY_RICH, PS50315; Glycine-rich region profile (MATRIX with a high probability of occurrence!)
HIS_RICH, PS50316; Histidine-rich region profile (MATRIX with a high probability of occurrence!)
ILE_RICH, PS50317; Isoleucine-rich region profile (MATRIX with a high probability of occurrence!)
LEU_RICH, PS50319; Leucine-rich region profile (MATRIX with a high probability of occurrence!)
LYS_RICH, PS50318; Lysine-rich region profile (MATRIX with a high probability of occurrence!)
MET_RICH, PS50320; Methionine-rich region profile (MATRIX with a high probability of occurrence!)
PHE_RICH, PS50314; Phenylalanine-rich region profile (MATRIX with a high probability of occurrence!)
PRO_RICH, PS50099; Proline-rich region profile (MATRIX with a high probability of occurrence!)
SER_RICH, PS50324; Serine-rich region profile (MATRIX with a high probability of occurrence!)
THR_RICH, PS50325; Threonine-rich region profile (MATRIX with a high probability of occurrence!)
TRP_RICH, PS50327; Tryptophan-rich region profile (MATRIX with a high probability of occurrence!)
TYR_RICH, PS50328; Tyrosine-rich region profile (MATRIX with a high probability of occurrence!)
VAL_RICH, PS50326; Valine-rich region profile (MATRIX with a high probability of occurrence!)

References:

[1] Wootton J.C., Federhen S.
    Analysis of compositionally biased regions in sequence databases.
    Methods Enzymol. 266:554-571(1996).
    PubMed ID: 8743706

[2] Karlin S., Bucher P., Brendel V., Altschul S.F.
    Statistical methods and insights for protein and DNA sequences.
    Annu. Rev. Biophys. Biophys. Chem. 20:175-203(1991).
    PubMed ID: 1867715

Copyright:

PROSITE is copyright. It is produced by the Swiss Institute of Bioinformatics (SIB). There are no restrictions on its use by non-profit institutions as long as its content is in no way modified. Usage by and for commercial entities requires a license agreement. For information about the licensing scheme send an email to license@isb-sib.ch or see: http://www.expasy.org/prosite/prosite_license.htm.

