Publications by Biophoenix' Principals

Innovations in Bioinformatics:
Emerging Tools for Drug Discovery and Development
Publisher: Business Insights Ltd (Datamonitor)
Year of publication: 2008
Type of publication: Management report
Publisher's reference (if any): RBDD0020
Author(s): Sreten Bogdanovic and Beata Langlands
Approximate page count: 250
Price when published: $3835
Remarks:
  1. Page numbers, where given, refer to the draft manuscript (which may differ from the published version).
  2. The copyright in this report is owned by the publisher, to whom any requests for copies should be addressed.
  3. The price shown is for a single copy of the print version. Multiple copies and electronic copies usually have different prices.
Table of Contents
Executive Summary

Chapter 1 Bioinformatics in the omics era
Chapter 2 Genomics and related omics
Chapter 3 Proteomics
Chapter 4 Metabolomics and Systems Biology
Chapter 5 Knowledge management solutions
Chapter 6 Profiles of selected companies
Chapter 7 Trends and opportunities

Chapter 1 Bioinformatics in the omics era

  1.0 Summary
  1.1 What is bioinformatics?
  1.2 End-users of bioinformatic tools and services
  1.3 Life sciences in the omics era
     1.3.1 Overview of omics
       1.3.1.1 Genomics
       1.3.1.2 Pharmacogenomics
       1.3.1.3 Transcriptomics
       1.3.1.4 Proteomics
       1.3.1.5 Metabolomics
       1.3.1.6 Other omics
  1.4 Background on drug discovery and development
     1.4.1 Types of drugs under development
       1.4.1.1 Small molecules
       1.4.1.2 Protein-based biotherapies
       1.4.1.3 Nucleic acid-based biotherapies
     1.4.2 The drug discovery and development process
       1.4.2.1 Stages (US)
       1.4.2.2 Role of biomarkers
  1.5 Bioinformatics in drug discovery and development
     1.5.1 From biological data to drug knowledge

Chapter 2 Genomics and related omics

  2.0 Summary
  2.1 Background
  2.2 Categories of genomic analysis
     2.2.1 Sequencing
       2.2.1.1 Next-generation sequencing
     2.2.2 Genotyping and gene expression
       2.2.2.1 DNA microarrays
  2.3 Value of genomic analysis to the drug industry
  2.4 Bioinformatics solutions
     2.4.1 Analysis of sequencing data
       2.4.1.1 Sequence databases
       2.4.1.2 Sequence search tools
       2.4.1.3 Multiple sequence alignment tools
       2.4.1.4 Focus on RNAs
       2.4.1.5 Genome finishing and annotating
     2.4.2 Microarray platforms
       2.4.2.1 Affymetrix
       2.4.2.2 Illumina
       2.4.2.3 Agilent Technologies
       2.4.2.4 Applied Biosystems
     2.4.3 Genome-wide association studies
     2.4.4 Other advances

Chapter 3 Proteomics

  3.0 Summary
  3.1 Background
  3.2 Categories of proteomic analysis
     3.2.1 Protein separation
     3.2.2 Protein identification
       3.2.2.1 Mass spectrometry
       3.2.2.2 Protein microarrays
     3.2.3 Structure determination
  3.3 Value of proteomic analysis to the drug industry
  3.4 Bioinformatics solutions
     3.4.1 Analysis of sequencing data
       3.4.1.1 Sequence databases
       3.4.1.2 Sequence search tools
       3.4.1.3 Multiple sequence alignment tools
     3.4.2 In silico drug discovery
     3.4.3 Data analysis and integration
       3.4.3.1 Agilent Technologies
       3.4.3.2 Applied Biosystems
       3.4.3.3 BioWisdom
       3.4.3.4 Bruker Daltonics
       3.4.3.5 Geneva Bioinformatics
       3.4.3.6 GeneLogics
       3.4.3.7 Health Discovery Corporation
       3.4.3.8 Nonlinear Dynamics
       3.4.3.9 Sage-N Research
       3.4.3.10 Thermo Scientific
       3.4.3.11 Vermillion

Chapter 4 Metabolomics and Systems Biology

  4.0 Summary
  4.1 Background
  4.2 Metabolomics
     4.2.1 Introduction
     4.2.2 Value to the drug industry
     4.2.3 Commercial bioinformatics solutions
  4.3 Systems biology
     4.3.1 Introduction
     4.3.2 Value to the drug industry
       4.3.2.1 Initiatives at Big Pharma
     4.3.3 Approaches
     4.3.4 In silico mathematical models
       4.3.4.1 The Systems Biology Markup Language
     4.3.5 Publicly available modeling software
     4.3.6 Commercial pathway analysis tools
       4.3.6.1 Ariadne Genomics
       4.3.6.2 GeneGo
       4.3.6.3 Ingenuity Systems
     4.3.7 Commercial modeling technologies
       4.3.7.1 BG Medicine
       4.3.7.2 Cellnomica
       4.3.7.3 Compugen
       4.3.7.4 CuraGen
       4.3.7.5 Entelos
       4.3.7.6 Genstruct
       4.3.7.7 Genomatix
       4.3.7.8 Genomatica
       4.3.7.9 Gene Network Sciences
       4.3.7.10 Health Discovery Corporation
       4.3.7.11 Merrimack Pharmaceuticals
       4.3.7.12 Physiomics

Chapter 5 Knowledge management solutions

  5.0 Summary
  5.1 Introduction
  5.2 Providers of high-performance computing
  5.3 Providers of storage systems
  5.4 Web-based solutions
     5.4.1 Ontologies
     5.4.2 Knowledge sharing
     5.4.3 The Semantic Web
  5.5 Software to support R&D labs
     5.5.1 Abrevity
     5.5.2 Accelrys
     5.5.3 Agilent Technologies
     5.5.4 BioWisdom
     5.5.5 CambridgeSoft
     5.5.6 Elsevier MDL
     5.5.7 Geospiza
     5.5.8 GeneLogics
     5.5.9 IO Informatics
     5.5.10 KOOPrime
     5.5.11 LabVantage
     5.5.12 MathWorks
     5.5.13 NextBio
     5.5.14 Oracle
     5.5.15 SAS
     5.5.16 Symyx
     5.5.17 Teranode
     5.5.18 Thermo Fisher Scientific
  5.6 Text searching and mining
     5.6.1 QUOSA
     5.6.2 Linguamatics
     5.6.3 InforSense
     5.6.4 Insightful
     5.6.5 Nervana
     5.6.6 Velocity
  5.7 Clinical trials solutions
     5.7.1 Adobe Systems
     5.7.2 Infosys
     5.7.3 Oracle
     5.7.4 Pharsight

Chapter 6 Profiles of selected companies

  6.0 Summary
  6.1 Accelrys Inc
  6.2 Affymetrix Inc
  6.3 Agilent Technologies Inc
  6.4 Avalon Pharmaceuticals Inc
  6.5 BG Medicine Inc
  6.6 BioWisdom Ltd
  6.7 Caliper Life Sciences Inc
  6.8 Compugen Ltd
  6.9 CuraGen Corporation
  6.10 Entelos Inc
  6.11 Genstruct Inc
  6.12 Gene Network Sciences Inc
  6.13 GeneGo Inc
  6.14 Health Discovery Corporation
  6.15 Illumina Inc
  6.16 Ingenuity Systems Inc
  6.17 Insightful Corp
  6.18 NextBio
  6.19 SAS Institute Inc
  6.20 Vermillion Inc

Chapter 7 Trends and opportunities

  7.0 Summary
  7.1 End-user needs and sentiment
  7.2 Emerging trends in market evolution
  7.3 Bioinformatics-enabled biomarker discovery
  7.4 The promise of Semantic Web technology
  7.5 Facilitating implementation of IT systems
  7.6 Market snapshot and forecasts
     7.6.1 Genomics and related omics
     7.6.2 Proteomics
     7.6.3 Systems biology
     7.6.4 Life science enterprise knowledge management
     7.6.5 Market estimates

Appendix:  Research Methodology

Index

List of Tables and Figures

Table 2.01  Bioinformatic Analyses of DNA/RNA Sequences
Table 2.02  Contribution of Bioinformatics to Genomics-Based
            Drug Discovery
Table 2.03  Databases and On-line Tools for Analysing DNA
            Sequences and Signals
Table 2.04  Leading Sequence Comparison Servers
Table 2.05  BLAST Applications and Flavors
Table 2.06  Online Pairwise Alignment Programs
Table 2.07  Multiple Sequence Alignment Tools
Table 2.08  RNA Secondary Structure Prediction
Table 2.09  miRNA and siRNA Resources
Table 2.10  Selected genome-sequencing packages
Table 2.11  Phylogeny and Orthology
Table 2.12  Summary of patent-related genes in the major
            organisms
Table 3.01  Role of Bioinformatics in Protein Diagnostics
            and Therapeutics
Table 3.02  Selected Protein Databases
Table 3.03  On-line tools to test for protein transmembrane
            segments
Table 3.04  Principal Protein Domain Recognition Resources
Table 3.05  Protein Structure Prediction
Table 4.01  Software for Systems Biology
Table 7.01  World Bioinformatics Market, 2006-2011

Figure 01  An entry from Entrez GENE, the US NCBI's web-based
           interface to GENBANK
Figure 02  Genomic context of the myostatin gene (GDF8) using
           the NCBI Map Viewer
Figure 03  Advanced Search page of the Protein Data Bank (PDB),
           the central repository for 3D protein structure
           information
Figure 04  PDB Entry for the complex of the acetylcholine
           receptor with carbamylcholine (1UV6)
Figure 05  Results of a search in the UniProt KnowledgeBase
           (Swiss-PROT and TrEMBL) for MAP kinase phosphatases
Figure 06  Partial UniProt record for Dual Specificity Protein
           Phosphatase 4 (DUSP4) / MAP Kinase Phosphatase 2
Figure 07  UniProt Feature Aligner for Dual Specificity Protein
           Phosphatase 4 (DUSP4)
Figure 08  UniProt sequences at least 90% similar to Dual
           Specificity Protein Phosphatase 4 (DUSP4)
Figure 09  Front page of the Protein Kinase Resource, University
           of California at San Diego
Figure 10  Search page of microInspector
Figure 11  Search page of the miRNA database (miRBase)
Figure 12  miRBase Targets database, a new resource for
           predicting miRNA targets in animals
Figure 13  Predicted Human Genomic Targets for the
           hsa-let-7g* miRNA
Figure 14  Dharmacon siRNA Designer Search Page
Figure 15  Genomatica's SimPheny(TM) Systems Biology Model
           Development Process
Executive Summary

Chapter 1 Bioinformatics In The Omics Era

  • Bioinformatics is an interdisciplinary field at the junction of several disciplines ranging from biology to mathematics to computer science to information technology.

  • The pharmaceutical industry is facing a data overload from omics technologies such as genomics and proteomics which utilise high throughput methods such as genome sequencing, DNA microarrays or mass spectrometry to detect, quantitate, and identify DNA, RNA, proteins, and metabolites from body tissue and fluids.

  • Bioinformatics tools are essential to manage the deluge of omics data. A major challenge is correctly interpreting omics data, which is also a prime force behind the development of bioinformatics.

  • Bioinformatics tools enable omic profiling, which holds the key to personalized medicine, and underpin systems biology, which aims at integrating the enormous amount of existing omics data.

  • Drugs currently under development include synthetic small molecules and biopharmaceuticals. The process of drug discovery and development is expensive, time consuming and risky. Well-informed decision-making is vitally important throughout this process.

  • Bioinformatics-enabled data management and integration is one of the keys to improved productivity in biopharmaceutical R&D.

  • Biomarkers can help pharmaceutical companies identify disease targets, serve as indicators of drug efficacy, safety and toxicity, and allow earlier detection of disease. Biomarker discovery requires bioinformatics tools capable of handling the analysis of vast numbers of clinical samples.

  • Extraordinary developments in the life sciences and bioinformatics are creating unprecedented opportunities to increase drug research productivity at a time when pharmaceutical companies are under immense pressure owing to declining drug pipelines and loss of patent protection for blockbuster drugs.

Chapter 2 Genomics And Related Omics

  • Technological advances in sequencing, genotyping, and gene expression studies are driving developments in genomics and related omics (in particular pharmacogenomics and transcriptomics).

  • Next-generation sequencing machines, such as 454 Life Sciences' Genome Sequencer FLX, utilise miniaturized reactions. Associated software enables complex regions of genomes to be analysed without the time or cost constraints of traditional DNA-sequencing methods.

  • DNA microarrays have become a mainstay for a vast range of genomic applications such as high throughput genotyping and gene expression profiling.

  • Companies pursuing genomics-based drug discovery programs hope to discover new drug targets, predict treatment safety and efficacy, and develop better diagnostics. In order to achieve these aims they need bioinformatics tools to extract meaning from sequencing, genotyping, and gene expression data.

  • The storage and manipulation of sequence data (for nucleic acids and proteins) are key activities in bioinformatics. Most available sequence data, and a vast range of tools with which to analyse it, are open source and publicly available. GenBank (USA), EMBL-Bank (Europe), and DDBJ (Japan) are the leading sequence repositories.

  • BLAST (Basic Local Alignment Search Tool) is the primary tool for sequence comparisons in bioinformatics and contains several subprograms for different computational problems. Various on-line tools can be used to compare many nucleotide or protein sequences simultaneously.

  • The current focus on genome finishing and annotating has led to the emergence of a new set of bioinformatics tools for parsing large DNA sequences into their components, such as protein-coding regions and regulatory elements. The TIGR Assembler is the classic sequence assembly tool developed by The Institute for Genomic Research.

  • Affymetrix and Illumina lead the SNP genotyping and gene expression markets. Both companies utilize hybridization microarrays. Companies such as Agilent Technologies and Applied Biosystems compete directly with them. Proprietary technologies and bioinformatics tools are discussed.

  • Microarrays are increasingly used in large-scale whole-genome disease association studies, which has driven demand for software that can support massive data sets in a more open-ended type of system.
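
The "local alignment" in BLAST's name, mentioned in the summary above, can be illustrated with a small sketch. The code below is not BLAST, which relies on faster heuristics; it is the classic Smith-Waterman dynamic-programming score, shown only to make the idea concrete. The function name and the scoring values (match, mismatch, gap) are illustrative assumptions, not taken from the report.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between strings a and b.

    Scoring values are illustrative assumptions, not BLAST defaults.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Two sequences that share only the short region "ACGT"
shared = local_alignment_score("TTACGTTT", "GGACGTGG")
unrelated = local_alignment_score("AAAA", "TTTT")
print(shared, unrelated)
```

Because scores are floored at zero, the shared region is rewarded while the unrelated flanking bases do not drag the score down, which is what distinguishes local from global alignment.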

Chapter 3 Proteomics

  • Most current proteomics experiments are aimed at the quantitative profiling of proteins in complex mixtures using high-resolution separations followed by high-throughput mass spectrometry and protein fingerprinting.

  • While the primary structure of a protein can be determined from the DNA sequence, secondary, tertiary, and quaternary structures require nuclear magnetic resonance or x-ray crystallography. Bioinformatics can enhance experimental information or may in some circumstances predict higher-level protein structures.

  • Clinical proteomics studies are being carried out to identify new drug targets, biomarkers of drug efficacy and toxicity, and diagnostic biomarkers. In this way, proteomics holds the potential to select successful drugs and eliminate toxic compounds early in the drug discovery process, saving significant resources.

  • The storage and manipulation of sequence data (for nucleic acids and proteins) are key activities in bioinformatics. Most available sequence data, and a vast range of tools with which to analyse it, are open source and publicly available.

  • Prominent protein databases include Protein Data Bank, a repository for protein 3D structures; Swiss-Prot, a curated database of protein sequences; and PROSITE, a database of protein families and domains. Two dedicated servers scan for lipophilic transmembrane regions, which are potentially important drug targets.

  • Scanning a protein sequence database is usually done with the BLAST sequence search tool. Various on-line tools can be used to compare many sequences simultaneously. Methods available to compare multiple sequences include ClustalW, MUSCLE, and T-Coffee.

  • Structure-based drug design programmes have been used for decades for the identification and optimization of drug leads. If the structure of the target is known, virtual screening of compound libraries can be followed by the application of docking and scoring algorithms.

  • Proteomics technologies generate dauntingly large amounts of heterogeneous data types and rely on bioinformatics tools to analyse and integrate data and impose order on the wealth of data generated. The activities of 11 companies which are major contributors of bioinformatics tools to the proteomics field are reviewed.

  • Commercial bioinformatics tools include: tools that work in tandem with mass spectrometers; proteomics lab and data management solutions; analysis and data mining tools; and pattern recognition tools for detection of potential biomarkers and biomarker patterns.

Chapter 4 Metabolomics And Systems Biology

  • Metabonomics generally combines spectroscopic methods with pattern recognition analysis. Commercial bioinformatics tools are being developed to enable researchers to manage and understand metabolomics data.

  • Metabolomics offers a means of evaluating the metabolic actions and toxicity of new drugs. Analysis of dynamic systems is becoming possible, making metabolomics a key technology for systems biology.

  • Systems biology seeks to understand biomolecules and pathways in which they participate at a whole systems level. Systems biology allows hypothesis-generation and requires discovery science to test and support the hypothesis.

  • Systems biology has the potential to impact the entire drug discovery and development timeline by providing a means for identifying disease pathways and intervention points, and discovering both on- and off-target effects of compounds.

  • Many major pharmaceutical companies are embracing systems biology. Novartis, an early advocate of systems biology, has already integrated it into all stages of its drug discovery process.

  • The bottom-up approach to systems biology involves putting omics data together to allow hypothesis-driven research. The top-down modeling approach uses a conceptual framework to integrate data. Top-down disease models are generally built from information about relevant pathways together with consensus clinical opinions.

  • In silico mathematical models develop biological knowledge into a system-oriented understanding. They may be deterministic or stochastic. Using object-oriented techniques it is possible to build modeling libraries and allow the user to extend them.

  • The Systems Biology Markup Language (SBML) is the standard model description language and provides a way to integrate software packages from multiple suppliers.

  • The characteristics of models are influenced by the capabilities of the modeling software. Many systems biology modeling packages have been developed by non-industrial sources and are publicly available.

  • Ariadne Genomics, GeneGo, and Ingenuity Systems are the leading commercial providers of pathway analysis tools. These tools may be used for target identification/validation and lead identification, but most recently, toxicity and safety assessment has emerged as a key application.

  • Providers of modeling technologies often use pathway analysis tools to help inform their model building. We survey 12 systems biology companies, focusing on their approaches to modeling and predictive simulation, drug discovery and development and biomarker discovery.
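
The distinction drawn above between deterministic and stochastic in silico models can be sketched with a toy example. The code below is a minimal illustration, assuming a single first-order decay reaction with a hypothetical rate constant; it is not drawn from the report or from any of the commercial packages it surveys. The deterministic version integrates the rate equation dN/dt = -kN, while the stochastic version uses the Gillespie algorithm to simulate individual decay events.

```python
import math
import random


def deterministic_decay(n0, k, t_end, dt=0.001):
    """Integrate dN/dt = -k*N with the forward Euler method."""
    n = float(n0)
    steps = int(round(t_end / dt))
    for _ in range(steps):
        n += -k * n * dt
    return n


def stochastic_decay(n0, k, t_end, rng):
    """Gillespie simulation of the same decay, one molecule at a time."""
    n, t = n0, 0.0
    while n > 0:
        # Waiting time to the next decay event is exponential with rate k*n
        t += rng.expovariate(k * n)
        if t > t_end:
            break
        n -= 1
    return n


# Hypothetical parameters: 100 molecules, rate constant 0.5, time horizon 2.0
rng = random.Random(0)
det = deterministic_decay(100, 0.5, 2.0)
runs = [stochastic_decay(100, 0.5, 2.0, rng) for _ in range(1000)]
mean_stochastic = sum(runs) / len(runs)
print(det, mean_stochastic, min(runs), max(runs))
```

The deterministic model returns the same value every time, whereas each stochastic run gives a different whole number of molecules; averaged over many runs, the stochastic results should approach the deterministic one.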

Chapter 5 Knowledge Management Solutions

  • Every step of the drug discovery and development process is information intensive. Integrating knowledge management solutions into the workflow is now a major priority for many companies. Data and information about potential new drugs must be readily available so that project teams can make well-informed decisions.

  • Analysis of disparate, complex biological data requires computationally intensive numerical operations on a large data domain. High-performance computing (HPC) and supercomputers can solve those problems in a reasonable amount of time. Providers of HPC including IBM, Penguin Computing, Microsoft, InforSense, Silicon Graphics, Mitrionics, and Hewlett-Packard are discussed here.

  • Companies such as BlueArc Corporation, EMC Corporation, Isilon Systems, and Silicon Graphics address the unique challenges associated with high-performance data storage.

  • It is anticipated that in the coming years every traditional software application will be ported to the World Wide Web and will run on industry standard browsers. Launched in June 2007, GraphLogic's PointDragon is the first commercially available 100% web-based environment for enterprise systems.

  • The US NIH's Digital Roadmap Initiative envisages that all biomedical knowledge and data should be disseminated on the web using principled ontologies, so that the knowledge and data are semantically interoperable.

  • The emerging bottom-up collaborative and knowledge sharing technologies include shared wikis, Technorati, and del.icio.us. The Semantic Web, an extension of the current web, provides a universally accessible platform that will allow scientific data to be shared and processed by automated tools.

  • We survey 18 companies that offer innovative software solutions for biomedical R&D, focusing on solutions developed to support management and analysis of metadata, workflow, and laboratory information. Many of these software solutions are web-based.

  • The use of text searching and mining tools can help a company achieve significant gains in R&D productivity. New insights can be generated by passing content from one tool to another. Companies providing text searching and mining tools include QUOSA, Linguamatics, and InforSense.

  • Few aspects of drug development are as costly as clinical trials. Many modeling providers previously focused on discovery have been turning their attention to accelerating the drug development process. Companies offering clinical trials solutions include Adobe Systems, Infosys, Oracle, and Pharsight.

Chapter 6 Profiles Of Selected Companies

  • Companies profiled in this Chapter offer bioinformatics tools and services with applications in genomics, proteomics, systems biology and knowledge management.

  • CuraGen offers bioinformatics for use in genome sequencing, while Affymetrix, Agilent, and Illumina offer bioinformatics for high-throughput genotyping.

  • Several of the profiled companies focus on systems biology: GeneGo and Ingenuity Systems offer pathway analysis tools, while Entelos, Gene Network Sciences, and Genstruct are biosimulation technology providers.

  • Accelrys, Insightful, BioWisdom, NextBio and SAS are focused on knowledge management; they are active in areas such as scientific business intelligence, data mining and metadata analysis.

  • Many of the profiled companies are involved in, or support, biomarker discovery. They include Avalon Pharmaceuticals, BG Medicine, BioWisdom, Caliper Life Sciences, Compugen, CuraGen, Genstruct, Gene Network Sciences, GeneGo, Health Discovery Corp, Ingenuity Systems, and Vermillion.

Chapter 7 Trends And Opportunities

  • The life science industry (Big Pharma, specialty biopharma, and biotechs) looks towards enhanced bioinformatics platforms to deliver the ability to reduce the time to market and improve the success rate for new drugs with higher efficacy profiles and significantly reduced side effects.

  • There is likely to be continued growth in open, configurable, complete bioinformatics solutions based on interoperable platforms that allow customers to aggregate scientific data and technology regardless of source. Bioinformatics solutions will need to leverage open web standards.

  • Bioinformatics-enabled biomarker discovery will be fundamental to delivering the promise of more cost-effective, safe and efficient drug trials, targeted therapies and molecular diagnostics. Making smarter decisions early in the life of a compound will demand superior data management and modeling software platforms.

  • The promise of the Semantic Web is to provide access to diverse data resources and support more complex knowledge-based efforts, with the potential for long-term benefits of improved efficiencies and cost savings of drug R&D. Broad adoption of Semantic Web technology is anticipated in the next few years.

  • The Life Sciences Information Technology Global Institute is developing the Good Informatics Practices (GIP) Guidance Document to improve the quality of, and trust in, IT within the life science industry. Adopting a GIP guideline is expected to help companies realize lower costs and reduced timelines for drug discovery and development.

  • The bioinformatics market was worth $1.6 billion in 2006 and is forecast to grow at an average compound annual growth rate (AGR) of 23% to $4.5 billion in 2011. Systems biology (including biomarker and metabolomics software) will enjoy the highest rate of growth (35%), but the leading segment is - and, during the forecast period, will continue to be - enterprise knowledge management with a compound AGR of 20%.

  • Proteomic applications of bioinformatics are expected to grow faster than genomic applications (16% vs 12% compound AGR), but they start from a lower baseline and are not increasing at a sufficient rate to catch up during the forecast period. The leading geographical market is the US, with a share of 45% in 2006, but Europe will narrow the gap over the forecast period owing to a significantly higher AGR (24.6% vs 21.9%).
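
The headline forecast above can be checked with simple compounding arithmetic. The sketch below uses only figures quoted in the summary ($1.6 billion in 2006, a 23% compound annual growth rate, five years to 2011, and a 45% US share growing at 21.9%); the function and variable names are illustrative, and the published rates are rounded, so the results are approximate.

```python
def project(value, rate, years):
    """Compound a starting value forward at a fixed annual growth rate."""
    return value * (1.0 + rate) ** years


def implied_cagr(start, end, years):
    """Compound annual growth rate implied by a start and end value."""
    return (end / start) ** (1.0 / years) - 1.0


market_2006 = 1.6  # $ billion, from the summary
market_2011 = project(market_2006, 0.23, 5)   # close to the quoted $4.5 billion
check_rate = implied_cagr(1.6, 4.5, 5)        # close to the quoted 23%

# US segment: 45% share in 2006, growing at 21.9% per year
us_2006 = 0.45 * market_2006
us_2011 = project(us_2006, 0.219, 5)
us_share_2011 = us_2011 / market_2011         # about 0.43 under these inputs

print(round(market_2011, 2), round(check_rate, 3), round(us_share_2011, 2))
```

Because the US grows more slowly than the market as a whole, its share slips from 45% to roughly 43% under these inputs, which is consistent with the statement that Europe narrows the gap.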


