Protein
Select the database(s) and taxon(s) to which you want to limit the search.
Note! Multiple selections are possible by pressing the "Ctrl" key.
Some remarks about the database(s) selection:
- The databases available on ExPASy are Swiss-Prot and TrEMBL. Consult the Swiss-Prot/TrEMBL page, the SPTR README file or the Swiss-Prot/TrEMBL references for more information about these databases.
- When searching Swiss-Prot, you can choose whether or not to include all alternative splice isoforms described in Swiss-Prot feature tables (e.g. Isoform 12S of O43184). For splice isoforms, processing and post-translational modifications documented in the Swiss-Prot feature table are not taken into account, as they are usually documented for the primary isoform described in the original entry.
- Annotation in the TrEMBL database is generated automatically; it is therefore incomplete and not always correct. Where available, TrEMBL annotation is used in the same way as Swiss-Prot annotation to process the proteins into mature chains or peptides. TrEMBL results should therefore be interpreted with care.
- Some Swiss-Prot/TrEMBL entries contain ambiguous residues (X = any amino acid, B = Asx = Asp (D) or Asn (N), Z = Glx = Gln (Q) or Glu (E)). An example of such an entry is P19341. As substitution of D by N or of Q by E induces a mass difference of about 1 Dalton, it is not possible to compute exact masses for peptides containing one or more B, Z or X residues. Protein sequences containing such ambiguous residues are therefore not included in the database.
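The remark on ambiguous residues can be illustrated with a minimal Python sketch. It is not the code used by the server; the function names are hypothetical, and the residue masses are the standard monoisotopic values, which are an assumption here since this page does not say whether monoisotopic or average masses are meant.

```python
# Monoisotopic residue masses in Dalton (standard values; assumed, not taken
# from this page).
RESIDUE_MASS = {"D": 115.02694, "N": 114.04293, "E": 129.04259, "Q": 128.05858}

# One-letter codes that do not identify a single amino acid.
AMBIGUOUS_CODES = set("BZX")

# The two residues that each of the codes B and Z may stand for.
AMBIGUOUS_PAIRS = {"B": ("D", "N"), "Z": ("E", "Q")}


def has_ambiguous_residues(sequence: str) -> bool:
    """Return True if the sequence contains B, Z or X (hypothetical helper)."""
    return any(residue in AMBIGUOUS_CODES for residue in sequence.upper())


def ambiguity_mass_gap(code: str) -> float:
    """Mass difference between the two residues that B or Z may represent."""
    first, second = AMBIGUOUS_PAIRS[code]
    return abs(RESIDUE_MASS[first] - RESIDUE_MASS[second])


if __name__ == "__main__":
    # "MKTBLLV" is a made-up sequence used only for illustration.
    print(has_ambiguous_residues("MKTBLLV"))
    print(f"B (D or N) mass gap: {ambiguity_mass_gap('B'):.3f} Da")
    print(f"Z (E or Q) mass gap: {ambiguity_mass_gap('Z'):.3f} Da")
```

Both gaps come out at just under 1 Dalton, which is why no exact mass can be given for a peptide containing B or Z.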
Some remarks about the taxon(s) selection:
- We speak of "single species matching" when, for example, you have proteins from E. coli and match them against only the E. coli proteins in the database. This is a good approach when the organism you are working with is molecularly well defined or, ideally, the subject of a genome project.
- If the source of your proteins is not molecularly well defined, it is best to do "cross-species matching". Thus, for example, if you are working with proteins from C. albicans, you may wish to match your proteins either against all proteins from fungi or against the fully sequenced yeast S. cerevisiae.
- When cross-species matching, protein pI is frequently poorly conserved [ref], but protein mass is generally very well conserved. You should take this into consideration when setting your pI and Mw ranges.
- Peptide masses are not well conserved across species boundaries. The poor conservation of peptide mass data is expected, as a single amino acid substitution in any peptide can drastically change its mass [1].
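To make the last remark concrete, the following Python sketch computes the mass of a short peptide before and after a single substitution. The peptides are made up for illustration, the function names are hypothetical, and the residue masses are the standard monoisotopic values (an assumption, since this page does not state which mass type is meant).

```python
# Standard monoisotopic residue masses in Dalton (assumed mass type).
MONOISOTOPIC = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the free N- and C-termini


def peptide_mass(sequence: str) -> float:
    """Mass of an unmodified, uncharged peptide (hypothetical helper)."""
    return sum(MONOISOTOPIC[residue] for residue in sequence.upper()) + WATER


def masses_match(mass_a: float, mass_b: float, tolerance_da: float) -> bool:
    """True if two masses agree within an absolute tolerance in Dalton."""
    return abs(mass_a - mass_b) <= tolerance_da


if __name__ == "__main__":
    # Made-up peptide and a variant with a single A -> V substitution.
    reference = peptide_mass("LSGAEIK")
    variant = peptide_mass("LSGVEIK")
    print(f"mass shift: {variant - reference:.4f} Da")
    print("match within 0.5 Da:", masses_match(reference, variant, 0.5))
```

Here a single alanine-to-valine exchange moves the peptide mass by about 28 Dalton, so the variant peptide no longer matches at a 0.5 Dalton tolerance.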
Some remarks about the Mw range selection:
- For bacterial proteins larger than 20 kDa, a range of ± 20% around the Mw estimate is usually sufficient. For smaller proteins, allow a ± 40% range. These ranges are usually also sufficient for cytoplasmic eukaryotic proteins, but secreted eukaryotic proteins often carry post-translational modifications that require ranges of ± 40% (proteins larger than 20 kDa) and ± 100% or more (smaller proteins) to be inclusive. If masses have been determined by MS, much smaller ranges can be used.
However, note that if MS has been used to determine the size of a glycoprotein or another heavily modified protein, the measured mass may be considerably larger than the mass of the unmodified polypeptide predicted in the database.
- If you only have a vague idea of the protein Mw, you can use a large range. However, as the proteins are ranked by the number of matching peptide masses, very large proteins are likely to obtain a high score and appear at the top of the list. Eliminating proteins with high molecular weight can reduce random matches. Whenever you have an idea about the Mw range, it is highly recommended to use this information in the identification to speed up searches and to reduce false positives.
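The Mw guidelines above can be turned into a search window with a few lines of Python. This is only a sketch of the rule of thumb, not the server's implementation: the function names and class labels are hypothetical, the ± 40% and ± 100% figures for secreted eukaryotic proteins are read as applying to proteins above and below 20 kDa respectively, and a protein of exactly 20 kDa is treated as "small".

```python
def mw_tolerance(estimate_kda: float, protein_class: str) -> float:
    """Fractional half-width of the Mw window for a given Mw estimate.

    protein_class is a hypothetical label: "bacterial",
    "cytoplasmic_eukaryotic" or "secreted_eukaryotic".
    """
    is_large = estimate_kda > 20.0
    if protein_class in ("bacterial", "cytoplasmic_eukaryotic"):
        return 0.20 if is_large else 0.40
    if protein_class == "secreted_eukaryotic":
        # Wider windows allow for post-translational modifications.
        return 0.40 if is_large else 1.00
    raise ValueError(f"unknown protein class: {protein_class!r}")


def mw_window(estimate_kda: float, protein_class: str) -> tuple[float, float]:
    """Lower and upper Mw bounds in kDa; the lower bound is never negative."""
    tolerance = mw_tolerance(estimate_kda, protein_class)
    lower = max(0.0, estimate_kda * (1.0 - tolerance))
    upper = estimate_kda * (1.0 + tolerance)
    return lower, upper


if __name__ == "__main__":
    for estimate, label in [(50.0, "bacterial"), (15.0, "secreted_eukaryotic")]:
        low, high = mw_window(estimate, label)
        print(f"{label}, {estimate} kDa estimate: {low:.1f} to {high:.1f} kDa")
```

For masses determined by MS, a much smaller tolerance than these defaults would be passed instead.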
Some remarks about the pI range selection:
- For bacterial proteins separated in immobilized pH gradients (IPG), a range of ± 0.25 pH units around the pI estimate is usually sufficient. For eukaryotic proteins, increase this range to ± 0.5 units if the proteins are thought to be unmodified. If there is a high probability that the eukaryotic protein carries charge-altering modifications, such as sialic acid, the range should be widened to ± 1 unit.
- If you only have a vague idea of the protein pI, use a very large range. Even a wide pI range can increase the power of your search.
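As a rough illustration of the pI guidelines, the Python sketch below builds a pI window and uses it to filter a candidate list. The class labels, function names and candidate entries are all hypothetical, and the half-widths simply restate the figures given above.

```python
# Half-widths of the pI window in pH units, as suggested in the text above.
PI_HALF_WIDTH = {
    "bacterial": 0.25,
    "eukaryotic_unmodified": 0.5,
    "eukaryotic_charge_modified": 1.0,
}


def pi_window(estimate: float, protein_class: str) -> tuple[float, float]:
    """Lower and upper pI bounds around an experimental estimate."""
    half_width = PI_HALF_WIDTH[protein_class]
    return estimate - half_width, estimate + half_width


def filter_by_pi(candidates, estimate: float, protein_class: str) -> list[str]:
    """Keep the names of candidates whose theoretical pI lies in the window.

    candidates is an iterable of (name, theoretical_pI) pairs.
    """
    low, high = pi_window(estimate, protein_class)
    return [name for name, pi in candidates if low <= pi <= high]


if __name__ == "__main__":
    # Made-up candidates with made-up theoretical pI values.
    candidates = [("protA", 5.1), ("protB", 5.6), ("protC", 6.4)]
    for label in PI_HALF_WIDTH:
        print(label, filter_by_pi(candidates, 5.2, label))
```

Narrow windows discard more candidates, so the class label should reflect how confident you are that the protein is unmodified.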