Chapter 1 Bioinformatics In The Omics Era
Bioinformatics is an interdisciplinary field at the junction
of several disciplines, ranging from biology and mathematics to
computer science and information technology.
The pharmaceutical industry is facing a data overload from
omics technologies such as genomics and proteomics, which utilise
high-throughput methods such as genome sequencing, DNA
microarrays, or mass spectrometry to detect, quantitate, and
identify DNA, RNA, proteins, and metabolites from body tissues
and fluids.
Bioinformatics tools are essential to manage the deluge of
omics data. Correctly interpreting omics data is a major
challenge, and also a prime force behind the development of
bioinformatics.
Bioinformatics tools enable omics profiling, which holds the
key to personalized medicine, and underpin systems biology,
which aims to integrate the enormous amount of existing omics
data.
Drugs currently under development include synthetic small
molecules and biopharmaceuticals. The process of drug discovery
and development is expensive, time-consuming, and risky.
Well-informed decision-making is vitally important throughout
this process.
Bioinformatics-enabled data management and integration are
among the keys to improved productivity in biopharmaceutical R&D.
Biomarkers can help pharmaceutical companies identify disease
targets, serve as indicators of drug efficacy, safety and
toxicity, and allow earlier detection of disease. Biomarker
discovery requires bioinformatics tools capable of handling the
analysis of vast numbers of clinical samples.
Extraordinary developments in the life sciences and
bioinformatics are creating unprecedented opportunities to
increase drug research productivity at a time when
pharmaceutical companies are under immense pressure owing to
declining drug pipelines and loss of patent protection for
blockbuster drugs.
Chapter 2 Genomics And Related Omics
Technological advances in sequencing, genotyping, and gene
expression studies are driving developments in genomics and
related omics (in particular pharmacogenomics and
transcriptomics).
Next-generation sequencing machines, such as 454 Life
Sciences' Genome Sequencer FLX, utilise miniaturized reactions.
Associated software enables complex regions of genomes to be
analysed without the time or cost constraints of traditional
DNA-sequencing methods.
DNA microarrays have become a mainstay for a vast range of
genomic applications such as high throughput genotyping and gene
expression profiling.
Companies pursuing genomics-based drug discovery programs hope
to discover new drug targets, predict treatment safety and
efficacy, and develop better diagnostics. In order to achieve
these aims they need bioinformatics tools to extract meaning
from sequencing, genotyping, and gene expression data.
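As a hedged illustration of one first step in extracting meaning from gene expression data, the sketch below ranks genes by log2 fold change and Welch's t-statistic between two sample groups. The function names, gene names, and expression values are hypothetical; real analyses add normalization and multiple-testing correction, both omitted here.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples with unequal variances."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def rank_genes(expr, n_case):
    """Rank genes by |t|.

    expr maps a gene name to log2 expression values; the first n_case
    values are case samples and the rest are control samples.
    """
    results = []
    for gene, values in expr.items():
        case, control = values[:n_case], values[n_case:]
        # On a log2 scale, a difference of means is a log2 fold change.
        log2_fc = mean(case) - mean(control)
        results.append((gene, log2_fc, welch_t(case, control)))
    return sorted(results, key=lambda row: abs(row[2]), reverse=True)


# Hypothetical log2 intensities: three case samples, then three controls.
expr = {
    "GENE_A": [8.1, 8.3, 8.0, 6.0, 6.2, 5.9],
    "GENE_B": [7.0, 7.2, 6.9, 7.1, 6.8, 7.0],
}

for gene, fc, t in rank_genes(expr, n_case=3):
    print(f"{gene}\tlog2FC={fc:.2f}\tt={t:.2f}")
```

GENE_A, with a consistent two-fold-plus shift on the log2 scale, ranks first; GENE_B's small difference yields a small t-statistic.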
The storage and manipulation of sequence data (for nucleic
acids and proteins) are key activities in bioinformatics. Most
available sequence data, and a vast range of tools with which to
analyse it, are open source and publicly available. GenBank
(USA), EMBL-Bank (Europe), and DDBJ (Japan) are the leading
sequence repositories.
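Sequences from these repositories are commonly exchanged in the plain-text FASTA format: a `>` header line followed by one or more sequence lines. A minimal parser might look like this, assuming well-formed input and using the first word of each header as the identifier. `parse_fasta` is a hypothetical helper, not part of any library, and the records are made up.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping identifier -> sequence."""
    records = {}
    current_id = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            header = line[1:].split()
            # Use the first word of the header as the identifier.
            current_id = header[0] if header else ""
            chunks = []
        elif current_id is not None:
            chunks.append(line)  # a sequence may be wrapped over several lines
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


example = """>seq1 hypothetical nucleotide record
ACGTACGT
ACGT
>seq2 hypothetical protein record
MKTAYIAK
"""
print(parse_fasta(example))
```

For production work, established libraries such as Biopython's `SeqIO` handle the many format variants that this sketch ignores.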
BLAST (Basic Local Alignment Search Tool) is the primary tool
for sequence comparisons in bioinformatics and contains several
subprograms (such as blastn, blastp, and blastx) for different
computational problems. Various on-line tools can be used to
compare many nucleotide or protein sequences simultaneously.
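BLAST is a heuristic for finding high-scoring local alignments quickly. The exact dynamic-programming method it approximates is the Smith-Waterman algorithm, sketched below to show what a local alignment score is. The scoring values (match +2, mismatch -1, linear gap -2) are illustrative assumptions, not BLAST's defaults. Real tools use substitution matrices such as BLOSUM62 and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b (linear gap penalty)."""
    cols = len(b) + 1
    prev = [0] * cols  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap          # gap inserted in b
            left = curr[j - 1] + gap    # gap inserted in a
            # Local alignment: scores are floored at zero, so an alignment
            # may start and end anywhere in either sequence.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman("TTACGTTT", "GGACGTGG"))  # shared core "ACGT"
```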
The current focus on genome finishing and annotation has led to
the emergence of a new set of bioinformatics tools for parsing
large DNA sequences into their components, such as protein-coding
regions and regulatory elements. The TIGR Assembler, developed by
The Institute for Genomic Research (TIGR), is a classic sequence
assembly tool.
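Sequence assembly reconstructs a longer sequence from overlapping reads. The following is a simplified greedy overlap-merge sketch. It assumes error-free reads from a single strand and illustrates the idea only; it is not the TIGR Assembler's actual algorithm. Function names and the example reads are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of contigs with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are wholly contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_pair = 0, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_pair = olen, (i, j)
        if best_pair is None:
            return contigs  # no overlaps left: the remaining contigs are final
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))
```

On these four overlapping reads, the merges chain into the single contig `ATTAGACCTGCCGGAATAC`. Greedy merging is fast but can mis-assemble repeats, which is one reason production assemblers are far more elaborate.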
Affymetrix and Illumina lead the SNP genotyping and gene
expression markets. Both companies utilize hybridization
microarrays. Companies such as Agilent Technologies and Applied
Biosystems are direct competitors. Proprietary technologies
and bioinformatics tools are discussed.
Microarrays are increasingly used in large-scale, whole-genome
disease association studies, a trend that has driven demand for
software programs that support massive data sets in a more
open-ended type of system.
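A basic statistic in such disease association studies is the allelic test. It compares allele counts at a single SNP between cases and controls. The sketch below computes the Pearson chi-square statistic on the 2x2 allele-count table and its p-value with one degree of freedom. The counts are invented for illustration. A genome-wide study would repeat the test across many SNPs and correct for multiple testing.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts, cases vs controls.

    Assumes every row and column total is non-zero.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: alternative allele in 60 of 100 case alleles vs 40 of 100 control alleles.
chi2, p = allelic_chi_square(60, 40, 40, 60)
print(f"chi2={chi2:.2f}, p={p:.4f}")
```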
Chapter 3 Proteomics
Most current proteomics experiments are aimed at the
quantitative profiling of proteins in complex mixtures using
high-resolution separations followed by high-throughput mass
spectrometry and protein fingerprinting.
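Protein (peptide mass) fingerprinting identifies a protein by digesting it, typically with trypsin, and comparing the measured peptide masses with theoretical ones. A minimal sketch of the theoretical side follows. It assumes the simplified trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages, no modifications, and standard monoisotopic residue masses. The protein sequence is a made-up example.

```python
# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_digest(protein):
    """Cleave after K or R unless followed by P (simplified rule, no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


for pep in tryptic_digest("MAGKPLRSTKAR"):
    print(pep, round(peptide_mass(pep), 4))
```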
While the primary structure of a protein can be determined
from the DNA sequence, secondary, tertiary, and quaternary
structures must be determined experimentally, by nuclear magnetic
resonance (NMR) spectroscopy or X-ray crystallography.
Bioinformatics can complement this experimental information and,
in some circumstances, predict higher-level protein structures.
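The point that primary structure follows from the DNA sequence can be illustrated by translating a coding sequence with the standard genetic code. This sketch assumes the input is already a spliced coding sequence in reading frame 1, and it stops at the first stop codon. It ignores alternative genetic codes and post-translational processing.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a coding DNA sequence (frame 1) up to the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' marks codons with ambiguous bases
        if aa == "*":
            break  # a stop codon ends the polypeptide
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCAAGTAA"))
```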
Clinical proteomics studies are being carried out to identify
new drug targets, biomarkers of drug efficacy and toxicity, and
diagnostic biomarkers. In this way, proteomics holds the
potential to select successful drugs and eliminate toxic
compounds early in the drug discovery process, saving significant
resources.
Prominent protein databases include Protein Data Bank, a
repository for protein 3D structures; Swiss-Prot, a curated
database of protein sequences; and PROSITE, a database of
protein families and domains. Two dedicated servers scan for
lipophilic transmembrane regions, which are potentially
important drug targets.
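A classic, simple way to flag lipophilic stretches that may be transmembrane helices is a Kyte-Doolittle hydropathy plot. It averages hydropathy over a sliding window of about 19 residues and treats windows above roughly 1.6 as candidates. The two servers mentioned above are not named here and likely use more sophisticated methods. This sketch shows only the hydropathy heuristic, with the commonly cited window and threshold exposed as parameters.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(seq, window=19):
    """Mean hydropathy of each window; entry i is the window starting at residue i (0-based)."""
    scores = [KD[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window for i in range(len(scores) - window + 1)]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """(start, end) spans, end exclusive, covered by windows above the threshold."""
    hits = [i for i, s in enumerate(hydropathy_windows(seq, window)) if s > threshold]
    segments = []
    for i in hits:
        if segments and i <= segments[-1][1]:
            # Overlapping or adjacent windows are merged into one segment.
            segments[-1] = (segments[-1][0], i + window)
        else:
            segments.append((i, i + window))
    return segments


# Hypothetical sequence: a 20-residue leucine stretch flanked by lysines.
print(candidate_tm_segments("K" * 10 + "L" * 20 + "K" * 10))
```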
Scanning a protein sequence database is usually done with the
BLAST sequence search tool. Various on-line tools can be used to
compare many sequences simultaneously. Methods available to
compare multiple sequences include ClustalW, MUSCLE, and
T-Coffee.
Structure-based drug design programmes have been used for
decades for the identification and optimization of drug leads.
If the structure of the target is known, compound libraries can
be screened virtually by applying docking and scoring
algorithms.
Proteomics technologies generate dauntingly large amounts of
heterogeneous data and rely on bioinformatics tools to analyse
and integrate these data and impose order on them. The activities
of 11 companies that are major contributors of bioinformatics
tools to the proteomics field are reviewed.
Commercial bioinformatics tools include: tools that work in
tandem with mass spectrometers; proteomics lab and data
management solutions; analysis and data mining tools; and
pattern recognition tools for detection of potential biomarkers
and biomarker patterns.
Chapter 4 Metabolomics And Systems Biology
Metabonomics generally combines spectroscopic methods with
pattern recognition analysis. Commercial bioinformatics tools
are being developed to enable researchers to manage and
understand metabolomics data.
Metabolomics offers a means of evaluating the metabolic
actions and toxicity of new drugs. Analysis of dynamic systems
is becoming possible, making metabolomics a key technology for
systems biology.
Systems biology seeks to understand biomolecules, and the
pathways in which they participate, at the level of the whole
system. Systems biology generates hypotheses and requires
discovery science to test and support them.
Systems biology has the potential to impact the entire drug
discovery and development timeline by providing a means for
identifying disease pathways and intervention points, and
discovering both on- and off-target effects of compounds.
Many major pharmaceutical companies are embracing systems
biology. Novartis, an early advocate of systems biology, has
already integrated it into all stages of its drug discovery
process.
The bottom-up approach to systems biology involves putting
omics data together to allow hypothesis-driven research. The
top-down modeling approach uses a conceptual framework to
integrate data. Top-down disease models are generally built from
information about relevant pathways together with consensus
clinical opinions.
In silico mathematical models turn biological knowledge into a
system-oriented understanding. They may be deterministic or
stochastic. Object-oriented techniques make it possible to build
modeling libraries that users can extend.
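To make the deterministic/stochastic distinction concrete, consider a hypothetical one-species model. An mRNA is produced at a constant rate k and degraded at a rate gamma per molecule. The deterministic version integrates dn/dt = k - gamma n, here with simple forward-Euler steps. The stochastic version uses Gillespie's algorithm to simulate individual reaction events. The parameter values are arbitrary and chosen only for illustration.

```python
import random


def deterministic(k, gamma, n0, t_end, dt=0.01):
    """Forward-Euler integration of the rate equation dn/dt = k - gamma * n."""
    n = float(n0)
    for _ in range(round(t_end / dt)):
        n += (k - gamma * n) * dt
    return n


def gillespie(k, gamma, n0, t_end, rng):
    """Gillespie simulation of production (propensity k) and degradation (gamma * n).

    Assumes k > 0, so the total propensity is never zero.
    """
    n, t = n0, 0.0
    while True:
        a_prod, a_deg = k, gamma * n
        a_total = a_prod + a_deg
        t += rng.expovariate(a_total)  # exponential waiting time to the next reaction
        if t > t_end:
            return n
        # Pick which reaction fires, with probability proportional to its propensity.
        if rng.random() * a_total < a_prod:
            n += 1
        else:
            n -= 1


k, gamma = 10.0, 0.5  # hypothetical rates; the steady-state mean is k / gamma = 20
print("deterministic:", round(deterministic(k, gamma, 0, 50), 3))
rng = random.Random(42)
print("stochastic runs:", [gillespie(k, gamma, 0, 50, rng) for _ in range(5)])
```

Both approaches should agree on the average (k / gamma = 20 molecules here), but individual stochastic runs fluctuate around it. These fluctuations matter when molecule counts are small.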
The Systems Biology Markup Language (SBML) is the standard
model description language; it allows models to be exchanged
between software packages from multiple suppliers.
The characteristics of models are influenced by the
capabilities of the modeling software. Many systems biology
modeling packages have been developed by non-industrial sources
and are publicly available.
Ariadne Genomics, GeneGo, and Ingenuity Systems are the
leading commercial providers of pathway analysis tools. These
tools may be used for target identification/validation and lead
identification, but toxicity and safety assessment has recently
emerged as a key application.
Providers of modeling technologies often use pathway analysis
tools to help inform their model building. We survey 12 systems
biology companies, focusing on their approaches to modeling and
predictive simulation, drug discovery and development and
biomarker discovery.
Chapter 5 Knowledge Management Solutions
Every step of the drug discovery and development process is
information-intensive. Integrating knowledge management
solutions into the workflow of the company is now a major
priority for many companies. Data and information about potential
new drugs must be readily available so that project teams can make
well-informed decisions.
Analysis of disparate, complex biological data requires
computationally intensive numerical operations on a large data
domain. High-performance computing (HPC) and supercomputers can
solve those problems in a reasonable amount of time. HPC
providers discussed here include IBM, Penguin Computing,
Microsoft, InforSense, Silicon Graphics, Mitrionics, and
Hewlett-Packard.
Companies such as BlueArc Corporation, EMC Corporation, Isilon
Systems, and Silicon Graphics address the unique challenges
associated with high-performance data storage.
It is anticipated that in the coming years every traditional
software application will be ported to the World Wide Web and
will run on industry standard browsers. Launched in June 2007,
GraphLogic's PointDragon is the first commercially available
100% web-based environment for enterprise systems.
The US NIH's Digital Roadmap Initiative envisages that all
biomedical knowledge and data will be disseminated on the web
using principled ontologies, so that the knowledge and data are
semantically interoperable.
Emerging bottom-up collaborative and knowledge-sharing
technologies include shared wikis, Technorati, and del.icio.us.
The Semantic Web, an extension of the current web, aims to provide
a universally accessible platform that will allow scientific data
to be shared and processed by automated tools.
We survey 18 companies that offer innovative software
solutions for biomedical R&D, focusing on solutions developed to
support management and analysis of metadata, workflow, and
laboratory information. Many of these software solutions are
web-based.
The use of text searching and mining tools can help a company
achieve significant gains in R&D productivity. New insights can be
generated by passing content from one tool to another. Companies
providing text searching and mining tools include QUOSA,
Linguamatics, and InforSense.
Few aspects of drug development are as costly as clinical
trials. Many modeling providers that previously focused on
discovery are turning their attention to accelerating the drug
development process. Companies offering clinical trials
solutions include Adobe Systems, Infosys, Oracle, and Pharsight.
Chapter 6 Profiles Of Selected Companies
Companies profiled in this Chapter offer bioinformatics tools
and services with applications in genomics, proteomics, systems
biology and knowledge management.
CuraGen offers bioinformatics for use in genome sequencing,
while Affymetrix, Agilent, and Illumina offer bioinformatics for
high-throughput genotyping.
Several of the profiled companies focus on systems biology:
GeneGo and Ingenuity Systems offer pathway analysis tools, while
Entelos, Gene Network Services, and Genstruct are biosimulation
technology providers.
Accelrys, Insightful, BioWisdom, NextBio and SAS are focused
on knowledge management; they are active in areas such as
scientific business intelligence, data mining and metadata
analysis.
Many of the profiled companies are involved in, or support,
biomarker discovery. They include Avalon Pharmaceuticals, BG
Medicine, BioWisdom, Caliper Life Sciences, Compugen, CuraGen,
Genstruct, Gene Network Services, GeneGo, Health Discovery Corp,
Ingenuity Systems, and Vermillion.
Chapter 7 Trends And Opportunities
The life science industry (Big Pharma, specialty biopharma,
and biotechs) looks to enhanced bioinformatics platforms to
reduce time to market and improve the success rate of new drugs
with higher efficacy profiles and significantly reduced side
effects.
There is likely to be continued growth in open, configurable,
complete bioinformatics solutions based on interoperable
platforms that allow customers to aggregate scientific data and
technology regardless of source. Bioinformatics solutions will
need to leverage open web standards.
Bioinformatics-enabled biomarker discovery will be fundamental
to deliver the promise of more cost-effective, safe, and
efficient drug trials, targeted therapies and molecular
diagnostics. Making smarter decisions early in the life of a
compound will demand superior data management and modeling
software platforms.
The promise of the Semantic Web is to provide access to
diverse data resources and support more complex knowledge-based
efforts, with the potential for long-term benefits of improved
efficiency and cost savings in drug R&D. Broad adoption of
Semantic Web technology is anticipated in the next few years.
The Life Sciences Information Technology Global Institute is
developing the Good Informatics Practices (GIP) Guidance
Document for improving the quality of, and trust in, IT within the
life science industry. Adopting a GIP guideline is expected to help
companies realize lower costs and reduced timelines for drug
discovery and development.
The bioinformatics market was worth $1.6 billion in 2006 and is
forecast to grow at a compound annual growth rate (CAGR) of 23%
to $4.5 billion in 2011. Systems biology (including biomarker and
metabolomics software) will enjoy the highest rate of growth
(35%), but enterprise knowledge management, with a CAGR of 20%,
is the leading segment and will remain so throughout the forecast
period.
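The forecast figures can be cross-checked with the standard compound-growth formula, CAGR = (end / start)^(1 / years) - 1. The check uses only the values stated above: $1.6 billion in 2006 growing to $4.5 billion in 2011, which is five years of growth. The implied rate is about 23%, consistent with the forecast.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value, rate, years):
    """Value after compounding start_value at `rate` for `years` years."""
    return start_value * (1 + rate) ** years


# Figures from the text: $1.6 billion (2006) to $4.5 billion (2011).
print(f"implied CAGR: {cagr(1.6, 4.5, 5):.1%}")       # 23.0%
print(f"2011 at 23%: ${project(1.6, 0.23, 5):.2f}B")  # $4.50B
```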
Proteomic applications of bioinformatics are expected to grow
faster than genomic applications (16% vs 12% CAGR), but
they start from a lower baseline and are not increasing at a
sufficient rate to catch up during the forecast period. The
leading geographical market is the US, with a share of 45% in
2006, but Europe will narrow the gap over the forecast period
owing to a significantly higher CAGR (24.6% vs 21.9%).