In the life sciences, scientists are confronted with an exponential growth of biological data, especially in genomics and proteomics. The efficient management and use of these data, and their transformation into knowledge, are basic requirements for biological research. Therefore, the integration of diverse applications and data from geographically distributed computing resources will become a major issue. We present the status of our efforts to realize an automated protein structure prediction pipeline as an example of a complex biological workflow scenario in a Grid environment based on Web services. This case study demonstrates that complex biological workflows can be orchestrated easily with Web services as building blocks and Triana as the workflow engine.
Disciplines :
Computer science
Author, co-author :
MAY, Patrick ; Zuse Institute Berlin - ZIB > Computer Science Research
Ehrlich, Hans-Christian
Steinke, Thomas
Language :
English
Title :
ZIB Structure Prediction Pipeline: Composing a Complex Biological Workflow through Web Services
Publication date :
2006
Event name :
12th International Euro-Par Conference
Event place :
Dresden, Germany
Event date :
August 28 – September 1, 2006
Audience :
International
Main work title :
Euro-Par 2006 Parallel Processing
Editor :
Nagel, Wolfgang E.
Walter, Wolfgang V.
Lehner, Wolfgang
Publisher :
Springer
ISBN/EAN :
978-3-540-37783-2
Collection name :
Lecture Notes in Computer Science; Volume 4128 (2006)