SIB resources
External resources (no support from the ExPASy Team)

Databases

  • EPD  •  The Eukaryotic Promoter Database is an annotated non-redundant collection of eukaryotic POL II promoters, for which the transcription start site has been determined experimentally. Access to promoter sequences is provided by pointers to positions in nucleotide sequence entries. The annotation part of an entry includes a description of the initiation site mapping data, cross-references to other databases, and bibliographic references.
  • OMA  •  OMA is a project that aims to identify orthologs among publicly available, complete genomes. With many hundreds of genomes analyzed to date, OMA is one of the largest projects of its kind.
  • OrthoDB  •  Catalog of eukaryotic orthologous protein-coding genes. OrthoDB explicitly delineates orthologs at each radiation along the species phylogeny. Available protein descriptors, together with Gene Ontology and InterPro attributes, serve to provide general descriptive annotations of the orthologous groups, and facilitate comprehensive database querying. Data sources include proteomes from arthropods, fungi, vertebrates and basal metazoans.
  • smirnaDB  •  smiRNAdb is a database containing expression information for human, mouse, rat, zebrafish, worm and fruit fly small RNAs (mostly miRNAs).
  • SwissRegulon  •  SwissRegulon is a database of genome-wide annotations of regulatory sites. It contains annotations for 17 prokaryotes and 3 eukaryotes. The database frontend offers an intuitive interface showing genomic information in a clear and comprehensible graphical form.
  • arrayMap  •  arrayMap is a curated reference database and bioinformatics resource targeting copy number profiling data in human cancer. The arrayMap database provides an entry point for meta-analysis and systems level data integration of high-resolution oncogenomic CNA data.
  • CLIPZ  •  CLIPZ supports the automatic functional annotation of short reads resulting primarily from crosslinking and immunoprecipitation (CLIP) experiments performed with RNA-binding proteins in order to identify the binding sites of these proteins. The functional annotation can also be applied to short reads from other types of experiments, such as mRNA-Seq, Digital Gene Expression and small RNA cloning. CLIPZ enables visualization, mining and analysis of data sets.
  • ElMMo  •  A website to browse miRNA target predictions from the ElMMo algorithm
  • GPSDB  •  GPSDB (Gene and Protein Synonym DataBase) collects gene/protein names, in a species-specific way, from several biological resources. A web-based search interface gives access to the database: given a gene/protein name, it retrieves all synonyms for this entity and queries Medline with a set of user-selected terms.
  • ImmunoDB  •  A simple access point to view gene family assignments, annotations and phylogenetic data for insect immune-related genes and gene families.
  • miROrtho  •  miROrtho contains predictions of precursor miRNA genes covering many animal genomes combining evidence from sequence homology and Support Vector Machine classifiers. We provide both consistent extrapolation of already known miRBase families and novel miRNA predictions by our SVM and orthology pipeline.
  • OpenFlu  •  The OpenFlu database (OpenFluDB) is part of a collaborative effort to share observations on the evolution of Influenza virus in both animals and humans. It contains genomic and protein sequences as well as epidemiological data from more than 25'000 isolates.
  • Progenetix  •  The Progenetix database provides an overview of copy number abnormalities in human cancer from Comparative Genomic Hybridization (CGH) experiments. With >30000 cases from >1000 publications, Progenetix is the largest curated database for whole genome copy number profiles. The current dataset contains >20000 chromosomal CGH and >10000 profiles from genomic array experiments. This data covers hundreds of cancer entities.

Tools

  • EPD  •  The Eukaryotic Promoter Database is an annotated non-redundant collection of eukaryotic POL II promoters, for which the transcription start site has been determined experimentally. Access to promoter sequences is provided by pointers to positions in nucleotide sequence entries. The annotation part of an entry includes a description of the initiation site mapping data, cross-references to other databases, and bibliographic references.
  • OMA  •  OMA is a project that aims to identify orthologs among publicly available, complete genomes. With many hundreds of genomes analyzed to date, OMA is one of the largest projects of its kind.
  • smirnaDB  •  smiRNAdb is a database containing expression information for human, mouse, rat, zebrafish, worm and fruit fly small RNAs (mostly miRNAs).
  • ALF  •  ALF simulates a wide range of evolutionary forces that act on genomes, such as character substitutions, indels, gene duplication, gene loss, lateral gene transfer and genome rearrangement.
  • Alignment tools  •  Four tools for multiple alignments.
  • arrayMap  •  arrayMap is a curated reference database and bioinformatics resource targeting copy number profiling data in human cancer. The arrayMap database provides an entry point for meta-analysis and systems level data integration of high-resolution oncogenomic CNA data.
  • Association Viewer  •  AssociationViewer is a Java application used to display SNPs in a genetic context. Supplementary data (such as genes or LD plots) is downloaded from various public data sources on the fly and saved locally in a cache. Custom data can be added as supplementary tracks.
  • BayeScan  •  BayeScan aims at identifying candidate loci under natural selection from genetic data, using differences in allele frequencies between populations. BayeScan is based on the multinomial-Dirichlet model.
  • BLAST - NCBI  •  The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
  • BLAST - PBIL  •  BLAST search on protein sequence databases
  • Blast2Fasta  •  Converts BLAST output from NCBI to FASTA format.
  • boxshade  •  Boxshade is a program for creating good looking printouts from multiple-aligned protein or DNA sequences.
  • ChIP-Seq  •  The ChIP-Seq tools are used to analyze ChIP-seq data and other types of mass genome annotation (MGA) data. The programs are: a feature correlation tool (ChIP-cor); a tag centering tool (ChIP-center); a signal peak detection tool (ChIP-peak); and a partitioning tool (ChIP-part).
  • CLIPZ  •  CLIPZ supports the automatic functional annotation of short reads resulting primarily from crosslinking and immunoprecipitation (CLIP) experiments performed with RNA-binding proteins in order to identify the binding sites of these proteins. The functional annotation can also be applied to short reads from other types of experiments, such as mRNA-Seq, Digital Gene Expression and small RNA cloning. CLIPZ enables visualization, mining and analysis of data sets.
  • ClustalW  •  Multiple alignment of nucleic acid and protein sequences.
  • ClustalW - PBIL  •  ClustalW is a general purpose multiple sequence alignment program for DNA or proteins
  • ClustalW2  •  ClustalW2 is a general purpose multiple sequence alignment program for DNA or proteins.
  • Codon Suite  •  The CodonSuite server takes in-frame coding DNA sequences as input and performs various codon-based comparisons: codon-based alignments, a codon distance-based tree, SynPAM distance estimates, and estimates of Nei and Gojobori's dN and dS.
  • COILS  •  COILS is a program that compares a sequence to a database of known parallel two-stranded coiled-coils and derives a similarity score. By comparing this score to the distribution of scores in globular and coiled-coil proteins, the program then calculates the probability that the sequence will adopt a coiled-coil conformation.
  • Decrease redundancy  •  Redundancy reduction in a set of aligned or unaligned sequences
  • DIALIGN  •  While standard alignment methods rely on comparing single residues and imposing gap penalties, DIALIGN constructs pairwise and multiple alignments by comparing entire segments of the sequences. No gap penalty is used. This approach can be used for both global and local alignment, but it is particularly successful in situations where sequences share only local homologies.
  • ElMMo  •  A website to browse miRNA target predictions from the ElMMo algorithm
  • EMBOSS translation tools  •  Sequence Translation (Transeq, Sixpack) is used to translate nucleic acid sequence to corresponding peptide sequences. Back-translation (Backtranseq, Backtranambig) is used to predict the possible nucleic acid sequence that a specified peptide sequence has originated from.
  • ESTscan  •  ESTScan can detect coding regions (CDS) in DNA sequences, even if they are of low quality. It also detects and corrects sequencing errors that lead to frameshifts. ESTScan is not a gene prediction program, nor is it an open reading frame detector. In fact, its strength lies in the fact that it does not require an open reading frame to detect a coding region. The program may miss a few translated amino acids at termini, but detects coding regions with high selectivity and sensitivity.
  • FASTA/SSEARCH/GGSEARCH/GLSEARCH  •  This tool provides sequence similarity searching against protein databases using the FASTA suite of programs. FASTA provides a heuristic search with a protein query. FASTX and FASTY translate a DNA query. Optimal searches are available with SSEARCH (local), GGSEARCH (global) and GLSEARCH (global query, local database).
  • FastEpistasis  •  FastEpistasis is a high performance tool designed to test for epistasis effects when analysing continuous phenotypes. It computes tests of epistasis for a large number of SNP pairs, and is an efficient parallel extension to the PLINK epistasis module.
  • fastsimcoal  •  Fast sequential Markov coalescent simulation of genomic data under complex evolutionary models.
  • FetchGWI / tagger  •  Tagger and fetchGWI are tools which allow searching short sequence tags against entire genomes or mRNA reference sequence databases.
  • GENIO/logo  •  The position-dependent information content of aligned RNA/DNA or amino acid sequences is useful for displaying consensus sequences and for finding optimal search windows in sequence analysis. The program calculates the positional information content of mono- or polynucleotides/amino acids from a FASTA file of aligned sequences and writes a PostScript (or Encapsulated PostScript, EPS) file that can be viewed and included in text processors.
  • GMM  •  Gaussian Mixture Model (GMM) detects copy number variation from the distribution of copy number ratios. From the data, it fits one component for each of the following copy number states: deletion, copy-neutral, and one or two additional copies, with a constraint on the difference between the mixture means. Then, for a given individual, it determines the probability of each copy number state and computes the expected copy number (dosage).
  • Graphical Codon Usage Analyser  •  Differences in codon usage preference among organisms lead to a variety of problems concerning heterologous gene expression but can be overcome by rational gene design and gene synthesis. The gcua tool displays the codon quality either in codon usage frequency values or relative adaptiveness values
  • IScan  •  A package to identify insertion sequences and similar transposable elements, their inverted repeats, and the direct target repeats they generate in entire genomes.
  • ISMARA  •  MARA models genome-wide expression data in terms of our genome-wide annotations of regulatory sites. For a given expression data-set it infers the key transcription regulators, their sample-dependent activities, and their genome-wide targets.
  • IsotopIdent  •  Isotopident can estimate the theoretical isotopic distribution of a peptide or protein, a polynucleotide or a chemical compound from its composition (a sequence of amino acids in 1-letter or 3-letter code, a sequence of nucleotides, or a chemical formula). It can also compute the monoisotopic mass and predict the most likely isotope combination and the exact mass of the given input.
  • Kalign - EBI  •  Kalign is a fast and accurate multiple sequence alignment algorithm.
  • Kalign - SBC  •  Kalign is a fast and accurate multiple sequence alignment algorithm.
  • LALIGN  •  LALIGN, from the FASTA package, finds multiple matching subsegments in two sequences, locally or globally.
  • MADAP  •  MADAP is a flexible clustering tool for the interpretation of one-dimensional genome annotation data. Such data might consist of counts, probabilities, or intensities associated with genome positions. They may be generated by mapping mRNA 5' and 3' sequence tags to genomes, by ChIP-chip or ChIP-Seq assays, or by genome-wide genotype-phenotype association studies. MADAP identifies clusters of data points corresponding to genomic features.
  • MAFFT - CBRC  •  MAFFT (Multiple Alignment using Fast Fourier Transform) is a high speed multiple sequence alignment program.
  • MAFFT - EBI  •  MAFFT (Multiple Alignment using Fast Fourier Transform) is a high speed multiple sequence alignment program.
  • MAMOT  •  C++ program for HMM models
  • MaxAlign  •  The presence of gaps in an alignment of nucleotide or protein sequences is often an inconvenience for bioinformatical studies. MaxAlign maximizes the number of characters that are present in gap-free columns. It can be used prior to any phylogenetic analysis as well as in other situations where this form of alignment clean up is useful, such as the presence of badly aligned or truncated sequences.
  • Multialin  •  Multiple sequence alignment with hierarchical clustering
  • MUSCLE  •  Multiple alignment server
  • Newick Utilities  •  The Newick Utilities are a set of command-line tools for processing phylogenetic trees. They can process arbitrarily large amounts of data and do not require user interaction, which makes them suitable for automating phylogeny processing tasks.
  • Phylogibbs  •  Phylogibbs is an algorithm for discovering regulatory sites in a collection of DNA sequences, including multiple alignments of orthologous sequences from related organisms. The algorithm uses a Gibbs sampling strategy, takes the phylogenetic relationships of the input sequences rigorously into account, and assigns realistic posterior probabilities to reported sites using a novel annealing+tracking strategy.
  • Progenetix  •  The Progenetix database provides an overview of copy number abnormalities in human cancer from Comparative Genomic Hybridization (CGH) experiments. With >30000 cases from >1000 publications, Progenetix is the largest curated database for whole genome copy number profiles. The current dataset contains >20000 chromosomal CGH and >10000 profiles from genomic array experiments. This data covers hundreds of cancer entities.
  • Reverse Transcription and Translation Tool  •  This tool converts DNA to RNA to protein, and reverse-transcribes RNA to DNA.
  • Reverse Translate  •  Accepts a protein sequence as input and uses a codon usage table to generate a DNA sequence representing the most likely non-degenerate coding sequence. A consensus sequence derived from all the possible codons for each amino acid is also returned.
  • SAMBA  •  SAMBA is a 128 processor array for speeding up the comparison of biological sequences. The hardware implements a parameterized version of the Smith and Waterman algorithm allowing the computation of local or global alignments with or without gap penalty.
  • Sequence Similarity Maps (SSM)  •  A data mining solution to map thousands of viral isolates (influenza, dengue and FMDV virus).
  • Sequerome  •  Sequerome is a web based Java tool that acts as a front-end to BLAST queries and provides simplified access to web-distributed resources for protein and nucleic acid analysis. It provides a web-based sequence profiling tool for integrating the results of a BLAST sequence-alignment report with external research tools and servers that perform advanced sequence manipulations, and allowing the user to record the steps of such an analysis.
  • SHOPS  •  SHOPS (SHow OPeron Structures) was developed to analyze the genomic operon context for any group of proteins selected on the basis of a set of sequence or domain identifiers. It uses genome annotations of all available fully sequenced prokaryotes to create a scaled graphical overview of the genomic context around the proteins of interest.
  • ShoRAH  •  Software to analyse deep-sequencing (NGS) data and reconstruct haplotypes in a genetically heterogeneous sample
  • SIBsim4  •  SIBsim4 is a modified version of the sim4 program, which is a similarity-based tool for spliced alignments, i.e. for aligning an expressed DNA sequence (EST, mRNA) with a genomic sequence.
  • SSA  •  SSA is a software package for the analysis of nucleic acid sequence motifs that are positionally correlated with a functional site such as a transcription initiation site. The programs provided by this web server are the following: generation of a constraint profile (Cpr); generation of a signal list (SList); generation of a signal occurrence profile (OProf); pattern optimization tool (PatOp); find motifs around functional sites (FindM), and extract sequences around functional sites (FromFPS).
  • T-Coffee  •  A collection of tools for computing, evaluating and manipulating multiple alignments of DNA, RNA, protein sequences and structures. Includes M-Coffee, R-Coffee, Expresso, PSI-Coffee, iRMSD-APDB.
  • T-Coffee - EBI  •  T-Coffee is a multiple sequence alignment program. Its main characteristic is that it will allow you to combine results obtained with several alignment methods.
  • T-Coffee - WUR  •  T-Coffee is a multiple sequence alignment program. Its main characteristic is that it will allow you to combine results obtained with several alignment methods.
  • TagScan  •  TagScan allows searching for exact or near-exact matches between oligonucleotide queries of up to 60 bases and sequence databases comprising entire genomes or mRNA reference sequences. The smallest query allowed is 10 nucleotides long.
  • Translate  •  Translation of a nucleotide (DNA/RNA) sequence to a protein sequence.
  • TriFLe  •  TRiFLe is a program for simulating terminal restriction fragments (TRFs) and identifying species by TRF profiling.
  • tromer  •  The transcriptome analyser project aims to provide tools to determine and document all the transcribed parts of a genome. The transcribed parts are defined by analysing experimental evidence, like expressed sequence tags (EST) and other mRNA sequences.
  • WebLogo  •  Sequence logos are a graphical representation of an amino acid or nucleic acid multiple sequence alignment. Each logo consists of stacks of symbols, one stack for each position in the sequence. The overall height of the stack indicates the sequence conservation at that position, while the height of symbols within the stack indicates the relative frequency of each amino or nucleic acid at that position.
  • Wise2  •  The Wise2 form compares a protein sequence to a genomic DNA sequence, allowing for introns and frameshifting errors.
  • WU BLAST  •  Protein databases query to find regions of sequence similarity quickly, with minimum loss of sensitivity.
  • ZFN-Site  •  ZFN-Site is intended to search genomes for specific target sites and off-target sites, such as for pairs of zinc finger proteins (ZFPs), zinc finger nucleases (ZFNs) and modified homing endonucleases. The search can either define a zinc finger nuclease (hetero-dimer) or allow for homo-dimers, depending on the status of the checkbox in the input form.
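
Several entries in the list above (GENIO/logo, WebLogo) refer to the positional information content of an alignment, which is what the stack height in a sequence logo represents. As a rough illustration only, the sketch below computes that quantity per column as the maximum entropy of the alphabet minus the observed Shannon entropy. It is not the code of either tool: the function name `column_information` is hypothetical, a 4-letter nucleotide alphabet is assumed by default, gap characters are simply skipped, and no small-sample correction is applied.

```python
import math
from collections import Counter


def column_information(aligned_seqs, alphabet_size=4):
    """Return the information content (in bits) of each alignment column.

    Hypothetical helper for illustration: information = log2(alphabet_size)
    minus the Shannon entropy of the residues observed in the column.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all sequences must have the same aligned length")

    max_bits = math.log2(alphabet_size)
    bits_per_column = []
    for i in range(length):
        # Count residues in this column, skipping gap characters.
        counts = Counter(seq[i].upper() for seq in aligned_seqs if seq[i] not in "-.")
        total = sum(counts.values())
        if total == 0:
            # A column made only of gaps carries no information here.
            bits_per_column.append(0.0)
            continue
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        bits_per_column.append(max_bits - entropy)
    return bits_per_column


if __name__ == "__main__":
    toy_alignment = ["ACGT", "ACGA", "ACTC", "AGAG"]  # made-up example data
    for position, bits in enumerate(column_information(toy_alignment), start=1):
        print(position, round(bits, 3))
```

For a protein alignment the same function could be called with `alphabet_size=20`. A fully conserved column reaches the maximum (2 bits for nucleotides), while a column with all four bases equally frequent scores 0.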