Proteome Informatics Group: Research activities
The Proteome Informatics Group is part of the Swiss Institute of Bioinformatics (SIB).
The group develops software tools and databases for proteomics. In addition to the teaching activities of the PIG members, in particular within the Master's Degree in Proteomics and Bioinformatics, three axes of research and services have been pursued since the creation of the SIB. Some of these activities have been ongoing since 1984.
The Make 2D-DB II Package
Conversion of federated 2-DE gel databases into a relational format and interconnection of distributed databases
[1]
The growing use of 2-DE techniques, coupled with an increasing amount of experimental data, requires an easy-to-use environment. Such an environment has to ensure data consistency, as
well as the interconnection of remote databases into one global "virtual" database that is freely accessible over the Internet. The basics were introduced by the federated databases [2] concept.
2-DE databases, such as SWISS-2DPAGE [3], include textual descriptions of the identified proteins, as well as 2-DE images. The Make2ddb package [4], previously developed to help build similar
databases on one's own Web site, provides various keyword search mechanisms and the ability to perform queries through a graphical interface.
The purpose of our current work is to extend the strengths of the first package by meeting the needs and demands that have arisen from its use. The enhancements first require converting
the previously adopted text file, which lists the proteins sequentially, into a relational database system specific to 2-DE data. Once this is done, various implementations are set up automatically.
They allow easier control of information consistency, as well as the extension of various queries.
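The conversion step described above can be illustrated with a minimal sketch. It assumes a simplified, hypothetical flat-file layout (two-letter line codes, entries terminated by "//") and uses Python's built-in sqlite3 module as a stand-in relational engine. The line codes, table names and column names are illustrative assumptions, not the actual Make 2D-DB II schema.

```python
import sqlite3

# Hypothetical flat-file excerpt: two-letter line codes, entries end with "//".
FLAT_FILE = """\
ID   PROT_A
AC   P00001
DE   Example protein A
SP   SPOT-0101; pI=5.20; Mw=42000
SP   SPOT-0102; pI=5.35; Mw=41500
//
ID   PROT_B
AC   P00002
DE   Example protein B
SP   SPOT-0200; pI=6.80; Mw=23000
//
"""

def parse_entries(text):
    """Yield one dict per entry, grouping line values by their two-letter code."""
    entry = {}
    for line in text.splitlines():
        if line.startswith("//"):
            if entry:
                yield entry
            entry = {}
        elif line.strip():
            code, value = line[:2], line[5:].strip()
            entry.setdefault(code, []).append(value)

def load(conn, entries):
    """Create two related tables (illustrative schema) and fill them."""
    conn.execute(
        "CREATE TABLE protein (ac TEXT PRIMARY KEY, id TEXT NOT NULL, description TEXT)"
    )
    conn.execute(
        "CREATE TABLE spot (spot_id TEXT PRIMARY KEY, "
        "ac TEXT NOT NULL REFERENCES protein(ac), pi REAL, mw REAL)"
    )
    for e in entries:
        ac = e["AC"][0]
        conn.execute(
            "INSERT INTO protein VALUES (?, ?, ?)",
            (ac, e["ID"][0], " ".join(e.get("DE", []))),
        )
        for raw in e.get("SP", []):
            # "SPOT-0101; pI=5.20; Mw=42000" -> identifier plus key=value fields
            spot_id, *fields = [part.strip() for part in raw.split(";")]
            values = dict(field.split("=") for field in fields)
            conn.execute(
                "INSERT INTO spot VALUES (?, ?, ?, ?)",
                (spot_id, ac, float(values["pI"]), float(values["Mw"])),
            )

conn = sqlite3.connect(":memory:")
load(conn, parse_entries(FLAT_FILE))

# A query that is awkward on the sequential text file but direct in SQL:
rows = conn.execute(
    "SELECT p.id, s.spot_id FROM protein p JOIN spot s ON s.ac = p.ac "
    "WHERE s.pi < 6.0 ORDER BY s.spot_id"
).fetchall()
print(rows)
```

Once the data are split into related tables, queries that would require scanning the whole text file, such as selecting spots by pI range, reduce to a single join.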
The development of the Make 2D-DB II package had to take into account several requirements:
- Conversion: the conversion process should be able to read, check and translate the content of existing flat files into the new relational format.
- Consistency: the syntax, nature and content of the data are extensively checked. Suggestions for correction should be proposed if errors are encountered during conversion.
- Standardization: the new schema aims to include the most common standardized data structures, whether already adopted or still in progress, e.g. MIAME ('mged' sample preparation), PSI (for MS data),
proteomics ontology projects, etc.
- Flexibility: the relational schema should be adaptable for future evolution of protein annotation.
- Ease of use: updates and maintenance of data, through a Web interface, should not require any technical knowledge. Moreover, several predefined queries are set to carry out the most common
requests. Users can also choose to make data public or to keep them private.
- Automatic updates: numerous external data (e.g. taxonomy codes, main indexes, EC codes, tissue lists, etc.) can be updated automatically and transparently by the maintainers.
- Restitution: any type of record can be restored or tracked throughout the database lifetime.
- Interconnection: the model is likely to promote a standard format, with the benefit of making it possible to group many different remote databases into one global virtual one. A special
interface could easily be set up to send queries simultaneously to all distributed databases.
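The consistency requirement in the list above (check the data and propose corrections) can be sketched as follows. This is an illustration only: the tissue list, the accession-number pattern, the field names and the pI bounds are all hypothetical assumptions, not the checks actually performed by the package.

```python
import difflib
import re

# Hypothetical controlled vocabulary and accession pattern, for illustration.
KNOWN_TISSUES = ["Liver", "Kidney", "Plasma", "Platelet"]
AC_PATTERN = re.compile(r"^[A-Z][0-9][A-Z0-9]{3}[0-9]$")

def check_record(record):
    """Return a list of (field, problem, suggestion) tuples; empty if clean."""
    issues = []
    if not AC_PATTERN.match(record.get("ac", "")):
        issues.append(("ac", "malformed accession number", None))
    tissue = record.get("tissue", "")
    if tissue not in KNOWN_TISSUES:
        # Propose the closest known term, if any is similar enough.
        close = difflib.get_close_matches(tissue, KNOWN_TISSUES, n=1)
        issues.append(("tissue", "unknown tissue", close[0] if close else None))
    pi = record.get("pi")
    if pi is None or not 0.0 < pi < 14.0:
        issues.append(("pi", "pI outside the 0-14 range", None))
    return issues

issues = check_record({"ac": "P00001", "tissue": "Livr", "pi": 5.2})
print(issues)
```

Returning the problems as data instead of raising on the first error lets a conversion run report every issue in a file at once, together with the proposed corrections.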
In its current state, the tool handles the conversion of existing databases and sets up the Web interface for the most common queries. It has been applied to SWISS-2DPAGE. The first version of
the new package will soon be available from the ExPASy server. The tools needed to update the content will then be finalized. Finally, a
special interface has been set up to query various remote databases simultaneously.
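The idea of querying several remote databases simultaneously can be sketched as below. The three "remote" databases are in-memory stand-ins with hypothetical names and contents; a real deployment would send a network request to each site's query interface instead. The sketch only shows the fan-out and merge pattern, not the interface actually set up for the package.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote 2-DE databases (accession -> spot list).
REMOTE_DATABASES = {
    "site-a": {"P00001": ["SPOT-0101", "SPOT-0102"]},
    "site-b": {"P00001": ["SPOT-7"], "P00002": ["SPOT-9"]},
    "site-c": {},
}

def query_one(name, accession):
    """Query a single database; return (name, list of spot identifiers)."""
    return name, REMOTE_DATABASES[name].get(accession, [])

def query_all(accession):
    """Send the same query to every database concurrently and merge the hits."""
    with ThreadPoolExecutor(max_workers=len(REMOTE_DATABASES)) as pool:
        futures = [
            pool.submit(query_one, name, accession) for name in REMOTE_DATABASES
        ]
        results = dict(future.result() for future in futures)
    # Keep only the databases that actually returned something.
    return {name: spots for name, spots in results.items() if spots}

hits = query_all("P00001")
print(hits)
```

Because every database answers the same query in the same form, the caller sees the distributed sites as one virtual database, which is the benefit a shared relational format is meant to provide.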
We hope this work will contribute to establishing an evolving standard for 2-DE databases and will facilitate access to remote information distributed among the many laboratories active in the 2-DE
field.
[1] Mostaguir K., Hoogland C., Binz P.-A., Appel R.D., The Make 2D-DB II package: Conversion of federated two-dimensional gel electrophoresis databases into a relational format and interconnection of distributed databases. Proteomics 2003, 3, 1441-1444.
[2] Appel R.D., Bairoch A., Sanchez J.-C., Vargas J.R., Golaz O., Pasquali C., Hochstrasser D.F., Federated 2-DE database: a simple means of publishing 2-DE data. Electrophoresis 1996, 17, 540-546.
[3] Hoogland C., Sanchez J.-C., Tonella L., Binz P.-A., Bairoch A., Hochstrasser D.F., Appel R.D., The 1999 SWISS-2DPAGE database update. Nucleic Acids Res. 2000, 28, 286-28.
[4] Hoogland C., Baujard V., Sanchez J.-C., Hochstrasser D.F., Appel R.D., Make2ddb: A simple package to set up a two-dimensional electrophoresis database for the World Wide Web. Electrophoresis 1997, 18, 2755-2758.