Eugenio Del Prete1,2 and Angelo Facchiano1*
1Institute of Food Science, Italy
2Institute for Applied Mathematics “Mauro Picone”, Italy
*Corresponding author: Angelo Facchiano, Institute of Food Science, Italy
Submission: March 4, 2020; Published: November 09, 2020
ISSN: 2689-2707 Volume 2 Issue 4
The integration of computational approaches, omics and multi-omics studies, and bioinformatics resources offers great opportunities for clinical research. Biomarker discovery for novel diagnostic devices, drug development, and personalized medicine are the new targets of medical research. The effectiveness of this integration is key to the success of future research in medicine.
Keywords: Clinical bioinformatics; Translational bioinformatics; Medical informatics; Multi-omics data; Data integration; Clinical metadata
Emerging techniques in molecular biology open grand challenges in the development
of new diagnostic procedures. Omics techniques can analyse a large number of molecules
in a single analysis, and the continuous development of technology makes such analyses
available at low cost, so that it becomes possible to screen large numbers of patients
with a given pathology in search of novel biomarkers [1-4].
Integration of experimental studies, computational approaches, and bioinformatics tools
provides an effective way to process omics information, especially in multi-omics
studies, in order to investigate complex pathologies [5]. Further integration with clinical data opens
the way to precision medicine. In this mini review, we describe how the integration of
computational and bioinformatics methods can support the investigation of human diseases
and lead to more effective clinical research.
Clinical bioinformatics takes advantage of bioinformatics methods and technologies for
the processing of clinical data. Physicians, clinicians, and other researchers with expertise in
the management of patients collect heterogeneous data, with the aim of uncovering underlying
information about a disease that is usually not trivial or easy to obtain with standard analyses.
Such structured information can help in studying not only a single pathology, but also other
diseases (comorbidities or not) connected with the starting one [6]. The evolution of clinical
research also involves the application of bioinformatics and computational biochemistry to
drug discovery and development [7]. Virtual screening is a computational approach that
searches a database of molecules for potential biological activity [8]. It is
an object of interest in bioinformatics for the development of appropriate tools [9], and it
is an active field in the search for molecules of interest for drug development [10-13].
Machine learning approaches have been developed for novel drug discovery [14]. Molecular
simulations rely heavily on computational approaches and offer many examples of
success in supporting the steps of drug development [15,16]. Molecular simulations are also
useful in investigating the molecular mechanisms underlying pathologies, including rare
diseases, a topic of particular interest in our laboratory [17-24]. The low number of patients
makes it difficult to find resources to support studies, so that in many cases rare diseases are
also referred to as orphan diseases. In these cases, the bioinformatics and bio-computational
approach offers the opportunity to investigate the disease as well as individual clinical cases [20,21,23], with an approach that is an example of personalized medicine.
Clinical
bioinformatics represents an interface between healthcare data and
the disciplines suitable for their analysis (statistics, mathematics,
informatics, molecular biology, biochemistry, and so on), and is essential
to achieving the goal of personalized medicine. From a biological
point of view, omics data and technologies, together with research
strategies (such as cancer research and systems biology), are
included in the field of clinical bioinformatics [25]. The increasing
importance of clinical bioinformatics in medical laboratories is
related to the diagnosis of complex diseases: the integration of omics
data with patients’ Electronic Health Records (EHRs) helps physicians
improve diagnoses and design suitable therapies [26].
Many factors contribute to extracting knowledge from stored
data, which is useful in finding new evidence on a given pathology. High-throughput
experimental techniques (e.g. Next Generation
Sequencing, NGS) provide a huge amount of data [27,28] to be
integrated with the clinical information. The discovery of new molecular
biomarkers completes the knowledge of a particular disease and strengthens
the strategy for approaching it [3-5]. Structured scientific
literature underlines the importance of semantic algorithms for
the automated extraction of information from dedicated documents
[29,30]. The decrease in the cost of technologies makes protocols
and machines available to many clinical and research centres. An
example of a complete clinical bioinformatics pipeline [31] starts
from the enrolment of patients in clinical trials; the extraction of
biological samples and their conversion into omics data (by means of
microarray technologies) then provide the material for in silico post-processing,
namely the study of differentially expressed genes (DEGs), which is
necessary for the discovery of new biomarkers related to a disease.
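The in silico step of such a pipeline can be sketched as follows. This is a minimal illustration and not the method of [31]: the gene names, expression values, and thresholds are invented for the example, and a plain Welch t-statistic combined with a log2 fold-change cut-off stands in for the moderated statistics and multiple-testing correction that dedicated microarray packages provide.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples with unequal variances."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def find_degs(expr, cases, controls, min_abs_lfc=1.0, min_abs_t=3.0):
    """Return (gene, log2 fold change, t) for genes passing both thresholds.

    expr maps gene -> {sample_id: log2 expression}.
    Both thresholds are illustrative assumptions, not recommended values.
    """
    hits = []
    for gene, values in expr.items():
        a = [values[s] for s in cases]
        b = [values[s] for s in controls]
        lfc = mean(a) - mean(b)  # values are log2, so a difference is a log ratio
        t = welch_t(a, b)
        if abs(lfc) >= min_abs_lfc and abs(t) >= min_abs_t:
            hits.append((gene, lfc, t))
    # Largest absolute fold change first.
    return sorted(hits, key=lambda h: abs(h[1]), reverse=True)


# Hypothetical log2 intensities: three patients (P*) and three controls (C*).
expression = {
    "GENE_A": {"P1": 9.1, "P2": 9.4, "P3": 9.0, "C1": 6.9, "C2": 7.2, "C3": 7.0},
    "GENE_B": {"P1": 5.0, "P2": 5.3, "P3": 4.8, "C1": 5.1, "C2": 4.9, "C3": 5.2},
    "GENE_C": {"P1": 3.2, "P2": 3.0, "P3": 3.4, "C1": 5.6, "C2": 5.9, "C3": 5.5},
}
degs = find_degs(expression, ["P1", "P2", "P3"], ["C1", "C2", "C3"])
for gene, lfc, t in degs:
    print(f"{gene}\tlog2FC={lfc:+.2f}\tt={t:+.1f}")
```

In this toy dataset GENE_A and GENE_C pass both thresholds, while GENE_B, whose group means are nearly identical, is discarded. With real data, the number of genes tested makes a correction for multiple comparisons indispensable.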
Translational bioinformatics is closely connected to clinical
bioinformatics, with particular reference to the storage, analysis and
interpretation of biomedical data from an informatics point of view,
in order to ease health management as a whole [32]. The cooperation of
both fields is aimed at personalized medicine. Many challenges
of personalized medicine are still open: handling large-scale
genomic data, interpreting the effect of variations and the
differences in biological functions, creating robust models for
complex systems, translating evidence into medical practice, and so
on [33]. The ‘fourth paradigm’, data-intensive science, is oriented
towards personalized medicine, since in silico analyses
are now feasible all over the world, thanks to affordable computational
power and suitable infrastructures, together with the possibility of
collaboration among scientists and the availability of data and
results from public online repositories [34].
From these perspectives, personalized medicine can be
advanced by integrating different kinds of omics data, which helps
in the prediction of phenotypic outcomes. The main omics areas
are genomics, transcriptomics, proteomics and metabolomics.
Genomics studies the biological function of the genome, the distribution of genes
on it, modifications in their expression, and their relationships
with biological pathways, with the aim of increasing therapeutic
efficacy. The clustering of cells and tissues by expression profile
is based on transcriptomics, as in single-cell experiments,
in order to classify diseases by their similarity (e.g. by means of
microarray technology and Polymerase Chain Reaction protocols).
Proteomics studies proteins and their relationships in
biological pathways, how modifications in structure affect
function, and the interactions among them, with goals such as
drug discovery or the discrimination of patients by mass spectrometry
data. Metabolomics studies the set of metabolites, key regulators
of system homeostasis, under specific conditions, with emphasis on
changes caused by genetic or environmental variations, analysing
the profiles with technologies such as Gas Chromatography Mass
Spectrometry (GC-MS) or Nuclear Magnetic Resonance (NMR)
spectroscopy [35,36].
Cleaning, integrating and analysing multi-omics datasets are
important tasks in improving personalized medicine and require
continuously updated tools and algorithms to find intra-layer
and inter-layer connections among different omics, with reference
to biological systems and, consequently, to clinical evidence.
Currently, in silico integration methods are mainly divided into
unsupervised and supervised methods, with techniques focused
on dimensionality reduction, classification, clustering, variable
selection and network representation [37-39]. The integration
of omics data is helpful in cancer analysis [40,41] and in tissue
analysis, even when imaging data are considered [42]. From a statistical,
mathematical and informatics point of view, the work in [43] clearly
explains how the concept of integration can be extended in different
directions, in terms of omics data (P-integration) or in terms of
patients (N-integration), highlighting how the same problem can
be approached by considering different features or the availability of
the data. From this perspective, the patient can be an active part of
the integration process, with all the information stored in the form
of metadata, which are available for a deeper post-processing analysis
of the results obtained. Metadata should clearly be
standardized, creating an ontology where possible [44,45], not only
to ease access for physicians and clinicians, but also
for the researchers who need these features, which are important
in studies such as clustering, classification and outlier detection.
This process also has important implications for research
reproducibility [46].
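The two directions in which integration can be extended can be illustrated with a small sketch. All patient identifiers, feature names, and values below are hypothetical, and the function names are chosen for this example only; the code shows just the bookkeeping (which axis of the data grows), not the statistical models discussed in [43].

```python
def integrate_layers(layers):
    """Join omics layers measured on the same patients: the feature axis grows.

    layers maps layer name -> {patient_id: {feature: value}}.
    Only patients present in every layer are kept.
    """
    shared = set.intersection(*(set(table) for table in layers.values()))
    merged = {}
    for patient in sorted(shared):
        merged[patient] = {
            f"{layer}:{feature}": value
            for layer, table in layers.items()
            for feature, value in table[patient].items()
        }
    return merged


def integrate_cohorts(cohorts):
    """Pool cohorts measured on the same features: the patient axis grows.

    cohorts maps cohort name -> {patient_id: {feature: value}}.
    Only features measured for every patient of every cohort are kept.
    """
    shared = set.intersection(
        *(set(profile) for table in cohorts.values() for profile in table.values())
    )
    return {
        f"{cohort}:{patient}": {f: profile[f] for f in sorted(shared)}
        for cohort, table in cohorts.items()
        for patient, profile in table.items()
    }


# Two layers on (partly) the same patients.
transcriptomics = {"pt1": {"GENE_A": 8.2}, "pt2": {"GENE_A": 6.1}, "pt3": {"GENE_A": 7.4}}
proteomics = {"pt1": {"PROT_X": 0.9}, "pt2": {"PROT_X": 1.7}}
by_feature = integrate_layers({"rna": transcriptomics, "prot": proteomics})

# Two studies sharing only one measured gene.
study_1 = {"pt1": {"GENE_A": 8.2, "GENE_B": 4.0}}
study_2 = {"pt9": {"GENE_A": 7.7, "GENE_C": 5.5}}
by_patient = integrate_cohorts({"s1": study_1, "s2": study_2})

print(by_feature)
print(by_patient)
```

Prefixing features with the layer name and patients with the cohort name keeps the provenance of every value, which is the kind of metadata that later analyses (for example, checking for batch effects between studies) depend on.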
A clinical bioinformatics pipeline is useful in the analysis of a
complex disease. The definition of a complex disease is not trivial,
especially because many chronic diseases do not show Mendelian
behaviour. A representative list of features describing a complex disease
includes: causation by a combination of different factors (genetic,
environmental and lifestyle); no simple patterns of inheritance;
difficulty in predicting onset and transmission; complicated
treatment. For example, two patients with the same disease can carry different mutations
in their genomes, so the challenge is to extract the phenotypes
and unravel the specific causal mutations with association studies
[47]. An example of a clinical bioinformatics pipeline applied to a
complex disease is reported in [48], with the analysis of celiac
disease together with some specific comorbidities. In this study,
microarray data available online are selected in order to extract DEGs from transcriptomics data, and Gene Set Enrichment Analysis
(GSEA) is performed to connect the most important DEGs to Gene
Ontology (GO) terms, extracting the biological processes (BP) most closely
related to the pathology. Moreover, the GO and Disease Ontology
(DO) trees are compared by semantic similarity, to show which
datasets (celiac disease or other autoimmune diseases) are more
correlated. Finally, pathway analysis is used to map DEGs
to the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, in
search of biological pathways that strongly link
celiac disease and its comorbidities.
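As an illustration of the enrichment step, the sketch below tests whether a DEG list is over-represented in an annotated gene set by computing a hypergeometric tail probability. This is a simpler over-representation test, not GSEA itself (which works on a ranked gene list), and it is not taken from [48]; the gene identifiers, set sizes, and the annotated term are invented for the example.

```python
from math import comb


def hypergeom_pvalue(universe_size, set_size, n_degs, overlap):
    """P(X >= overlap) when n_degs genes are drawn without replacement
    from a universe that contains set_size annotated genes."""
    total = comb(universe_size, n_degs)
    upper = min(set_size, n_degs)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, n_degs - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


universe = {f"g{i}" for i in range(1, 21)}      # 20 measured genes
term_genes = {"g1", "g2", "g3", "g4", "g5"}     # genes annotated to a hypothetical GO term
deg_list = {"g1", "g2", "g3", "g10"}            # hypothetical DEG list

overlap = len(deg_list & term_genes)
p_value = hypergeom_pvalue(len(universe), len(term_genes), len(deg_list), overlap)
print(f"overlap={overlap}, p={p_value:.4f}")
```

The choice of the gene universe matters: it should contain only the genes that could have been detected on the platform, otherwise the p-value is biased. When many terms are tested, the resulting p-values also need a correction for multiple comparisons.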
Novel efforts are being applied to omics analysis (and integration) to reduce
computational time and to provide and store big data through
modern workflows and pipelines, which cover the whole study from the
in vivo/in vitro experiment to the in silico prediction and analysis
[49,50]. The standardization of the different approaches to in silico
processing is important at each step (getting
and cleaning data, aggregation of data, statistical analysis and
validation, presentation of results) [51]. Nevertheless, cooperation
between the ‘bio’ area (biology, medicine, chemistry) and the ‘info’
area (mathematics, physics, statistics, informatics, engineering)
is essential, in terms of both knowledge and collaboration among
researchers. Fortunately, many programming languages
are oriented towards clinical and biological data and provide ad hoc libraries
for specific problems: a valuable example is R (and the related
Bioconductor repository) [52-54], with many libraries conceived
for reproducible research. For example, the data class
‘SummarizedExperiment’ [55] was conceived to carry both
expression values and patients’ metadata, and libraries such as
‘TCGAbiolinks’ [56] make it possible to download, arrange,
analyze, and integrate with clinical information cancer data from
the Genomic Data Commons (GDC) Data Portal online repository
[57].
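The idea behind such a container can be sketched in a few lines. The class below is a hypothetical Python analogue written for this illustration; it is not the Bioconductor ‘SummarizedExperiment’ API, and the sample identifiers and metadata fields are invented. Its only purpose is to show why keeping expression values and patient metadata in one object helps: subsetting by a clinical attribute cannot silently desynchronize the two.

```python
from dataclasses import dataclass


@dataclass
class ExpressionSet:
    """Expression values (gene -> one value per sample) plus per-sample metadata."""
    samples: list    # sample identifiers, in column order
    assay: dict      # gene -> list of values, aligned with samples
    metadata: dict   # sample identifier -> dict of clinical attributes

    def __post_init__(self):
        # Reject objects whose columns and metadata are out of step.
        if set(self.samples) != set(self.metadata):
            raise ValueError("samples and metadata keys differ")
        for gene, values in self.assay.items():
            if len(values) != len(self.samples):
                raise ValueError(f"{gene}: expected {len(self.samples)} values")

    def subset(self, predicate):
        """Keep the samples whose metadata satisfy predicate, in all slots at once."""
        keep = [i for i, s in enumerate(self.samples) if predicate(self.metadata[s])]
        kept_samples = [self.samples[i] for i in keep]
        return ExpressionSet(
            samples=kept_samples,
            assay={g: [v[i] for i in keep] for g, v in self.assay.items()},
            metadata={s: self.metadata[s] for s in kept_samples},
        )


cohort = ExpressionSet(
    samples=["S1", "S2", "S3"],
    assay={"GENE_A": [8.1, 6.0, 7.9], "GENE_B": [2.2, 2.5, 2.1]},
    metadata={
        "S1": {"status": "case", "age": 54},
        "S2": {"status": "control", "age": 61},
        "S3": {"status": "case", "age": 47},
    },
)
cases = cohort.subset(lambda m: m["status"] == "case")
print(cases.samples, cases.assay["GENE_A"])
```

Because the consistency check runs every time an object is built, including inside `subset`, a mismatch between the expression columns and the clinical records is reported immediately instead of surfacing later as a wrong result.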
In conclusion, clinical bioinformatics connects bioinformatics approaches to clinical data from patients (and healthy controls) in order to extract principal features (e.g. biomarkers) that represent a sort of fingerprint for the subject. Such features are important for the prediction, diagnosis and treatment of the disease, with a view to personalized medicine and the ultimate goal of treating every single patient on the basis of this specific evidence. Integration of this area with telemedicine and e-health services can represent an effective step towards personalized medicine approaches, especially considering the possibility of reducing the time needed to treat the patient and of helping physicians and clinicians make decisions using information that is extracted automatically and is quickly available on electronic devices.
© 2020 and Angelo Facchiano. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and build upon your work non-commercially.