SIB resources
External resources (no support from the ExPASy Team)

Databases

  • PROSITE  •  PROSITE consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them.
  • HAMAP  •  HAMAP is a system for the classification and annotation of protein sequences. It consists of a collection of manually curated family profiles for protein classification, and associated, manually created annotation rules that specify annotations that apply to family members. HAMAP is applied to bacterial, archaeal and eukaryotic proteins and used to annotate records in UniProtKB via UniProt's automatic annotation pipeline.
  • MyHits  •  Hits is a free database devoted to protein domains. It is also a collection of tools for investigating the relationships between protein sequences and the motifs described on them. These motifs are defined by a heterogeneous collection of predictors, which currently includes regular expressions, generalized profiles and hidden Markov models.
  • PANDITplus  •  Along with sequence data for Pfam gene families and protein domains, PANDITplus provides access to data on protein interactions, functional and chemical pathway annotation, gene expression, and association with disease, and pre-computed estimates from evolutionary codon models.
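
Several resources in this directory work with patterns written in PROSITE syntax. As a rough illustration of what such a pattern expresses, the following minimal sketch translates a simple PROSITE-style pattern into a Python regular expression. The function names `prosite_to_regex` and `find_hits` are hypothetical and belong to no listed tool, the example sequence is made up, and only the basic elements are handled (`x`, `[...]`, `{...}`, repeat counts, and `<`/`>` anchors outside brackets). Unlike a dedicated scanner, it does not report overlapping matches.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    pattern = pattern.strip().rstrip(".")  # a trailing period ends the pattern
    parts = []
    for element in pattern.split("-"):
        # '<' pins the match to the N-terminus, '>' to the C-terminus.
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Split off an optional repeat count such as (2) or (2,4).
        m = re.fullmatch(r"([^()]+)(?:\((\d+(?:,\d+)?)\))?", element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, repeat = m.group(1), m.group(2)
        if core == "x":
            core = "."  # any amino acid
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        # [ABC] alternatives and single letters are already valid regex.
        if repeat:
            core += "{" + repeat + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def find_hits(pattern, sequence):
    """Return (0-based start, matched text) for non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


# N-glycosylation site pattern, scanned against a made-up sequence.
print(prosite_to_regex("N-{P}-[ST]-{P}."))           # N[^P][ST][^P]
print(find_hits("N-{P}-[ST]-{P}.", "MKNGSAANPTL"))   # [(2, 'NGSA')]
```

For real analyses, the dedicated scanners listed under Tools remain the appropriate choice, since they also cover profiles, mismatches and overlapping hits.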

Tools

  • 2ZIP  •  The leucine zipper is a dimerisation domain occurring mostly in regulatory and thus in many oncogenic proteins. 2ZIP combines a standard coiled coil prediction algorithm with an approximate search for the characteristic leucine repeat. No further information from homologues is required for prediction. This approach improves significantly over existing methods, especially in that the coiled coil prediction turns out to be highly informative and avoids large numbers of false positives.
  • 3of5  •  The 3of5 application analyses protein sequences to find user-defined patterns described via complex regular-expression-like terms, e.g. to search for a motif with 3 basic amino acids in 5 positions.
  • Coiled-Coils prediction  •  This program delineates coiled-coil domains in otherwise globular proteins, such as the leucine zipper domains in transcriptional regulators, and predicts regions of discontinuity within coiled-coil structures, such as the hinge region in myosin.
  • ELM  •  Eukaryotic Linear Motif resource for functional sites in proteins
  • epestfind  •  epestfind allows rapid and objective identification of PEST motifs in protein target sequences. These motifs share high local concentrations of the amino acids proline (P), glutamic acid (E), serine (S), threonine (T) and, to a lesser extent, aspartic acid (D). PEST motifs appear to reduce the half-lives of proteins dramatically and hence to target proteins for proteolytic degradation.
  • FingerPRINTScan  •  The program scans a protein sequence against the PRINTS Protein Fingerprint Database.
  • GENIO/logo  •  The position-dependent information content of aligned RNA/DNA or amino acid sequences is useful for displaying consensus sequences and for finding optimal search windows in sequence analysis. The program calculates the positional information content of mono- or polynucleotides/amino acids from a FASTA file of aligned sequences and writes a PostScript (or Encapsulated PostScript, EPS) file that can be viewed and included in text processors.
  • HAMAP  •  HAMAP is a system for the classification and annotation of protein sequences. It consists of a collection of manually curated family profiles for protein classification, and associated, manually created annotation rules that specify annotations that apply to family members. HAMAP is applied to bacterial, archaeal and eukaryotic proteins and used to annotate records in UniProtKB via UniProt's automatic annotation pipeline.
  • HAMAP-Scan  •  Scan several protein sequences or a whole genome (all ORFs) against HAMAP family profiles. Sequences that match HAMAP profiles will be annotated in the UniProtKB format by the associated annotation rules.
  • HeliQuest  •  HeliQuest calculates from the amino acid sequence of a helix (α-helix, 3-10 helix, 3-11 helix or π helix) its physicochemical properties and amino acid composition and uses the results to screen any databank in order to identify protein segments possessing similar features.
  • InterProScan  •  InterProScan is a tool that combines different protein signature recognition methods into one resource. The number of signature databases and their associated scanning tools, as well as the further refinement procedures, increases the complexity of the problem.
  • Multicoil  •  The MultiCoil program predicts the location of coiled-coil regions in amino acid sequences and classifies the predictions as dimeric or trimeric. The method is based on the PairCoil algorithm. To analyze your own sequences with MultiCoil, you can either use the web interface or download the program.
  • MyDomains  •  Image Creator for linear (multi) domain views
  • MyHits  •  Hits is a free database devoted to protein domains. It is also a collection of tools for investigating the relationships between protein sequences and the motifs described on them. These motifs are defined by a heterogeneous collection of predictors, which currently includes regular expressions, generalized profiles and hidden Markov models.
  • Paircoil  •  The Paircoil program predicts the location of coiled-coil regions in amino acid sequences.
  • Paircoil2  •  Paircoil2 predicts the parallel coiled coil fold from sequence using pairwise residue probabilities with the Paircoil algorithm and an updated coiled coil database. Paircoil2 shows improved performance over Paircoil and other coiled-coil prediction algorithms.
  • PattInProt  •  PattInProt scans a protein database of one or several sequences for one or several patterns written in PROSITE syntax. The user can specify an allowed number of mismatches or a similarity threshold towards the pattern.
  • Pfam HMM Search  •  Scans a sequence against the Pfam protein families database.
  • pftools  •  The pftools are a collection of programs to build, calibrate, and search biological sequences with generalized profiles. Generalized profiles extend position-specific scoring matrices by including position-specific scores for insertions and deletions. They correspond to a matrix representation of a multiple sequence alignment that can be used to search for distantly homologous sequences and to align sequences precisely to the model.
  • PLOGO  •  Protein sequence logos using relative entropy.
  • PPSearch  •  Search your query sequence for protein motifs: rapidly compare your query protein sequence against all patterns stored in the PROSITE pattern database to help determine the function of an uncharacterised protein. This tool requires a protein sequence as input, but DNA/RNA may be translated into a protein sequence using transeq and then queried. Graphical output is available.
  • PRATT  •  An important problem in sequence analysis is to find patterns matching sets or subsets of sequences. This tool allows the user to discover patterns conserved in sets of unaligned protein sequences. The user can specify what kind of patterns should be searched for, and how many sequences should match a pattern to be reported. The patterns are reported using PROSITE syntax.
  • PRATT (EBI)  •  An important problem in sequence analysis is to find patterns matching sets or subsets of sequences. This tool allows the user to search for patterns conserved in sets of unaligned protein sequences. The user can specify what kind of patterns should be searched for, and how many sequences should match a pattern to be reported.
  • PredictProtein  •  PredictProtein integrates feature prediction for secondary structure, solvent accessibility, transmembrane helices, globular regions, coiled-coil regions, structural switch regions, B-values, disorder regions, intra-residue contacts, protein-protein and protein-DNA binding sites, sub-cellular localization, domain boundaries, beta-barrels, cysteine bonds, metal binding sites and disulphide bridges.
  • ProDom  •  ProDom is a protein domain family database constructed automatically by clustering homologous segments. Compare your sequence with ProDom by running a Blast-P or Blast-X search against: the consensus sequence provided with the ProDom families or the multiple alignments provided with each ProDom family. ProDom-CG (Complete Genomes) and ProDom-SG (Structural Genomics Candidate Search) can also be searched.
  • Protein Sequence Logos  •  Protein sequence alignment viewed as sequence logos. The total height of the sequence information part is computed as the relative entropy between the observed fractions of a given symbol and the respective a priori probabilities.
  • Radar  •  RADAR stands for Rapid Automatic Detection and Alignment of Repeats in protein sequences. Many large proteins have evolved by internal duplication, and many internal sequence repeats correspond to functional and structural units. RADAR uses an automatic algorithm to segment your query sequence into repeats. It identifies short composition-biased repeats as well as gapped approximate repeats and complex repeat architectures involving many different types of repeats in your query sequence.
  • REP  •  Protein search for repeats, using a collection of repeat families
  • REPRO  •  REPRO is able to recognise distant repeats in a single query sequence. The technique relies on a variation of the Smith-Waterman local alignment strategy to find non-overlapping top-scoring local alignments, followed by a graph-based iterative clustering procedure to delineate the repeat set(s) based on consistency of the pairwise top-alignments.
  • ScanProsite  •  Scan protein sequence(s) against PROSITE, or search for hits by specific motif(s) in protein sequence database(s). ScanProsite may be used alternatively in quick scan mode or advanced scan mode.
  • Seq2Logo  •  Seq2Logo is a web-based sequence logo method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion.
  • SeqLogos  •  SeqLogos generates sequence logos from an amino acid sequence alignment. Sequence logos are useful tools for visualizing sequence patterns and are a more informative alternative to consensus sequences.
  • SMART  •  The Simple Modular Architecture Research Tool allows the identification and annotation of genetically mobile domains and the analysis of domain architectures. Normal SMART (against UniProtKB and stable Ensembl proteomes) and Genomic SMART (against complete proteomes) are supported.
  • Superfamily Sequence search  •  Assign SCOP domains to your sequences using the SUPERFAMILY hidden Markov models
  • SUPERFAMILY Sequence Search  •  Use the SUPERFAMILY database of structural and functional annotation to provide structural (and hence implied functional) assignments to protein sequences primarily at the SCOP superfamily level. A superfamily contains all proteins for which there is structural evidence of a common evolutionary ancestor. This service offers sophisticated and expertly chosen remote homology detection.
  • T-REKS  •  T-REKS is an algorithm for de novo detection and alignment of repeats in sequences, based on the K-means algorithm. The minimal length of repeat arrays is 9 for true homorepeats and 14 for other repeats with potential biological meaning.
  • TEIRESIAS  •  Pattern discovery on event streams of alphanumeric characters. Possible alphabet sets include nucleic acids, amino acids, etc.
  • TRUST  •  TRUST is a method for ab-initio determination of internal repeats in proteins. The high sensitivity and accuracy of the method are achieved by exploiting the concept of transitivity of alignments.
  • WebLogo  •  Sequence logos are a graphical representation of an amino acid or nucleic acid multiple sequence alignment. Each logo consists of stacks of symbols, one stack for each position in the sequence. The overall height of the stack indicates the sequence conservation at that position, while the height of symbols within the stack indicates the relative frequency of each amino or nucleic acid at that position.
  • XSTREAM  •  XSTREAM is a rapid and powerful algorithm for identifying perfect and degenerate tandem repeat motifs in protein (and nucleotide) sequence data. XSTREAM also effectively models the architecture of repetitive domains in tandem repeat proteins and eliminates motif redundancy to identify "fundamental" tandem repeat patterns.
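
Several of the tools above draw sequence logos, in which the stack height at each alignment position reflects conservation and each symbol's share of the stack reflects its relative frequency. The following minimal sketch shows one common way to compute such heights, using Shannon information content against a uniform background. It is an illustration under stated assumptions, not the method of any listed tool: the function names `column_heights` and `logo_heights` are hypothetical, the alignment is assumed to be gap-free with equal-length rows, and no small-sample correction or non-uniform a priori probabilities (as used by the relative-entropy tools) are applied.

```python
import math
from collections import Counter


def column_heights(column, alphabet_size=20):
    """Return {symbol: height in bits} for one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    freqs = {symbol: count / total for symbol, count in counts.items()}
    # Shannon entropy of the observed symbol frequencies.
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    # Information content: maximum possible entropy minus observed entropy.
    information = math.log2(alphabet_size) - entropy
    # Each symbol takes a share of the stack proportional to its frequency.
    return {symbol: f * information for symbol, f in freqs.items()}


def logo_heights(alignment, alphabet_size=20):
    """Compute symbol heights for every column of a gap-free alignment."""
    return [column_heights("".join(col), alphabet_size)
            for col in zip(*alignment)]


# Toy nucleotide alignment (alphabet of 4): the first column is fully
# conserved, the second is evenly mixed.
for position, heights in enumerate(logo_heights(["AC", "AG", "AT", "AA"], 4), 1):
    print(position, heights)
```

In the toy alignment, the conserved first column reaches the maximum of 2 bits for a four-letter alphabet, while the evenly mixed second column carries no information and would be drawn with zero height.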