Untargeted UPLC-MS Downstream Data Processing and Statistical Analysis - Illustrated by a Pilot Study on Cognitive Impairment

TANG Xingyu; KOH Woon-Puay; NG Sean Pin

Open Access Biostatistics & Bioinformatic

TANG Xingyu¹*, KOH Woon-Puay¹ and NG Sean Pin¹*

¹Singapore Phenome Centre, Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore

²Health Services and Systems Research, Duke-NUS Medical School Singapore, Singapore

*Corresponding author: NG Sean Pin, TANG Xingyu, Singapore Phenome Centre, Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore

Submission: June 26, 2018; Published: July 25, 2018

DOI: 10.31031/OABB.2018.02.000531

ISSN: 2578-0247
Volume 2 Issue 2

Abstract

This article introduces the procedure of untargeted ultra-performance liquid chromatography-mass spectrometry (UPLC-MS) downstream data processing and statistica

Keywords: Feature filtration; Normalization; OPLS-DA; PCA; Quality control; Untargeted UPLC-MS

Introduction

Study overview

The present analysis utilized baseline data and bio-specimens from 99 cases with cognitive impairment and 99 age- and sex-matched controls nested within the prospective Singapore Chinese Health Study cohort [1]. Participants were recruited and interviewed about lifestyle and habitual dietary intake from 1993 to 1998, and gave blood for research from 1999 to 2004, at a mean age of 63.6 (range 50.5-74.1) years. Cognitive function was measured using a 30-item modified Singapore version of the Mini-Mental State Examination (MMSE) from 2014 to 2016, after an average of 13.8 (range 10.4-20.2) years of follow-up. Participants were at a mean age of 77.5 (range 66.0-88.3) years at the time of cognitive assessment, and cases of cognitive impairment were identified using education-adjusted cut-offs of the MMSE scores (MMSE.edu) [2].

Data generation

The plasma samples collected from these 198 individuals underwent lipid profiling on an ACQUITY UPLC/Xevo G2-XS QToF system (Waters, Manchester, UK) equipped with an electrospray source operating in either positive (ESI+) or negative (ESI-) ionization mode. The liquid chromatography (LC) stage physically separated the compounds in the samples, and the mass spectrometry (MS) stage then measured the mass-to-charge ratio of the charged particles (i.e. ions). Therefore, an MS feature was characterised by a combination of a retention time (RT) and a mass-to-charge ratio (m/z). Raw MS data were preprocessed using Progenesis QI (Nonlinear Dynamics, Newcastle, UK), including automatic RT alignment, peak picking, and deconvolution that groups ions into compounds. Abundance data for all injections were exported for downstream data processing and statistical analysis. The data processing and statistical analysis procedures performed at the Singapore Phenome Centre (SPC) are described in detail in Sections 2 and 3 of this article, respectively. Section 4 discusses the study as well as some related and future work, followed by the conclusions presented in Section 5.

Data Processing

Feature filtration

Multiple criteria are employed to identify and remove unreliable MS features (noise) from the dataset [3]. Prior to the UPLC-MS experiment, equal aliquots of the study samples are pooled to form the quality control (QC) sample. During the experiment, the QC sample is injected after every 5 or 10 injections of study samples, as well as at the start and the end of the analysis run, to monitor instrument stability and analytical reproducibility. The abundance variation among the QC series is one of the key criteria in feature filtration. In addition, after the analysis run, a sequence of diluted QC (dQC) samples with known dilution factors is injected. The correlation between abundance and dilution factor among the dQC series is another key criterion, since features that do not respond to dilution are very likely to be background noise. In this pilot study, the ESI+ and ESI- modes initially generated 8887 and 4806 MS features, respectively. After the multiple-criteria filtration, 3020 (34%) and 818 (17%) of them remained in the datasets.
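The two QC-based criteria described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the SPC implementation (which the article reports was written in R). The function name filter_features and the thresholds max_cv and min_corr are hypothetical; the article does not state the cut-offs it used.

```python
import numpy as np

def filter_features(qc, dqc, dilution, max_cv=0.30, min_corr=0.80):
    """Return a boolean mask of features to keep.

    qc       : (n_qc, n_features) abundances of the repeated QC injections
    dqc      : (n_dqc, n_features) abundances of the diluted QC injections
    dilution : (n_dqc,) known dilution factors (relative concentrations)
    max_cv, min_corr : hypothetical thresholds, not taken from the article
    """
    qc = np.asarray(qc, dtype=float)
    dqc = np.asarray(dqc, dtype=float)
    dilution = np.asarray(dilution, dtype=float)

    # Criterion 1: coefficient of variation (SD / mean) across the QC series.
    mean = qc.mean(axis=0)
    sd = qc.std(axis=0, ddof=1)
    cv = np.full(mean.shape, np.inf)          # features with mean <= 0 fail
    np.divide(sd, mean, out=cv, where=mean > 0)

    # Criterion 2: Pearson correlation between abundance and dilution factor.
    x = dilution - dilution.mean()
    y = dqc - dqc.mean(axis=0)
    denom = np.sqrt((x ** 2).sum() * (y ** 2).sum(axis=0))
    corr = np.zeros(denom.shape)              # constant features fail
    np.divide(x @ y, denom, out=corr, where=denom > 0)

    return (cv <= max_cv) & (corr >= min_corr)

# Toy data: three features, three QC injections, four dQC injections.
dilution = np.array([1.0, 0.5, 0.25, 0.125])
qc = np.array([[100.0, 100.0, 50.0],
               [102.0, 180.0, 51.0],
               [98.0, 40.0, 49.0]])
dqc = np.column_stack([
    100 * dilution,                       # stable in QC, responds to dilution
    100 * dilution,                       # responds, but unstable in QC
    np.array([50.0, 52.0, 49.0, 51.0]),   # stable, but flat background signal
])
keep = filter_features(qc, dqc, dilution)
print(keep)
```

In the toy data only the first feature passes both criteria; the second fails on QC variation and the third on dilution response.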

Data normalization

After feature filtration, data normalization is employed to further correct the dataset for potential noise. Probabilistic quotient normalization (PQN) [4] accounts for potentially different dilutions of samples by scaling the spectra to the same virtual overall concentration; it is generic and widely applied in metabolomics studies. Locally weighted scatterplot smoothing (LOESS) [5] normalization corrects for variability in feature abundances related to injection order, caused, for example, by drift of detector sensitivity over time. LOESS normalization is less generic because it relies on the QC series, but it is powerful in correcting abundances for injection order, which is commonly needed in studies with relatively large sample sizes [6].
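As an illustration of the PQN step only, the sketch below divides each sample by the median of its feature-wise quotients against a reference profile. It is written in Python with NumPy as a stand-in for the authors' R code. The function name pqn_normalize is hypothetical, and using the feature-wise median over all samples as the reference is an assumption; the article does not say which reference was used (the median QC profile is another common choice). The optional total-intensity pre-normalization of the original PQN method is omitted.

```python
import numpy as np

def pqn_normalize(X, reference=None):
    """Probabilistic quotient normalization of a samples x features matrix.

    reference : optional reference profile (n_features,). If omitted, the
                feature-wise median over all samples is used (an assumption).
    Returns the normalized matrix and the per-sample dilution factors.
    """
    X = np.asarray(X, dtype=float)
    if reference is None:
        reference = np.median(X, axis=0)
    reference = np.asarray(reference, dtype=float)

    usable = reference > 0                      # avoid division by zero
    quotients = X[:, usable] / reference[usable]
    factors = np.median(quotients, axis=1)      # most probable dilution factor
    return X / factors[:, None], factors

# Toy data: the same profile measured at three different overall dilutions.
base = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
X = np.vstack([base, 2.0 * base, 0.5 * base])
normalized, factors = pqn_normalize(X)
print(factors)
print(normalized)
```

After normalization the three toy samples coincide, because they differ only by an overall dilution factor.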

Statistical Analysis

Normalized abundance data of the MS features that passed filtration, for all study samples as well as the QC injections, are used as input to the statistical analysis.

Unsupervised analysis

Unsupervised analysis aims to identify major sources of variation in a high-dimensional dataset, and a typical approach is principal component analysis (PCA). In preparation for PCA, normalized abundances are log-transformed so that their distributions are closer to normal. The data are then mean-centred and Pareto-scaled, i.e. each feature is divided by the square root of its standard deviation, which is a compromise between conventional autoscaling and no scaling at all. PCA converts possibly correlated features into orthogonal components sorted by the variance they explain, making it easier to reduce dimensionality and to visualize the dataset. Moreover, the positions of the QC series in the PCA score plot help monitor instrument stability and analytical reproducibility.
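The preparation and PCA steps can be sketched as below, using a singular value decomposition in Python with NumPy rather than the authors' R code. The function name pareto_pca is hypothetical, and the sketch assumes strictly positive abundances; how zero or missing values were handled before the log transform is not stated in the article.

```python
import numpy as np

def pareto_pca(X, n_components=2):
    """Log-transform, mean-centre, Pareto-scale, then PCA via SVD.

    X : (n_samples, n_features) normalized abundances, assumed > 0.
    Returns scores, loadings and the fraction of variance explained.
    """
    X = np.asarray(X, dtype=float)
    logged = np.log(X)
    centred = logged - logged.mean(axis=0)

    # Pareto scaling: divide by the square root of the standard deviation.
    sd = logged.std(axis=0, ddof=1)
    scaled = centred / np.sqrt(np.where(sd > 0, sd, 1.0))

    # Rows of Vt are the loadings; U * s gives the sample scores.
    U, s, Vt = np.linalg.svd(scaled, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, Vt[:n_components], explained[:n_components]

# Toy data: 20 samples x 50 features of positive random abundances.
rng = np.random.default_rng(0)
X = rng.lognormal(size=(20, 50))
scores, loadings, explained = pareto_pca(X)
print(scores.shape, loadings.shape)
print(explained)
```

Plotting the two columns of scores against each other, with QC injections marked separately, would give a score plot of the kind shown in the figures.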

In this pilot study, the first two principal components explain 11.32% and 9.90% of the total variance of the data for ESI+ mode, and 11.78% and 8.64% for ESI- mode. From the score plots (Figures 1 and 2), it is observed that:

Figure 1: ESI+ PCA Score Plot (First 2 Components).

1. QC series are clustered together on the plot, showing good stability and reproducibility in the UPLC-MS experiment.

2. Major variations in the data are not correlated with the study outcome (MMSE.edu).

Figure 2: ESI- PCA Score Plot (First 2 Components).

Supervised analysis

Supervised analysis aims to maximize the separation between groups of study subjects defined by the study outcome of interest. Orthogonal partial least squares (OPLS) [7] and OPLS-discriminant analysis (OPLS-DA) [8] remove variation in the data that is orthogonal (i.e. unrelated) to the study outcome and then calculate components that are predictive of the outcome. For each OPLS(-DA) model, an R2 value and a Q2 value are obtained to measure the model fit and the model predictability, respectively. To validate the model, a random permutation strategy is employed in which the study outcome is randomly shuffled and a new OPLS(-DA) model is built. A total of 100 independent permutations are performed, and the p-values testing whether the R2 and Q2 values of the actual model are higher than those of a random model are used to evaluate model validity.
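The permutation strategy can be illustrated independently of the model being validated. In the Python sketch below, an ordinary least-squares R2 stands in for the OPLS(-DA) fit statistic, so this is not the authors' model or code. The function names are hypothetical, and the p-value convention (1 + number of permuted statistics at least as large as the observed one) / (1 + number of permutations) is one common choice; the article does not state how its p-values were computed.

```python
import numpy as np

def r2_least_squares(X, y):
    """Stand-in fit statistic: R2 of an ordinary least-squares fit (not OPLS-DA)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    total = y - y.mean()
    return 1.0 - (resid @ resid) / (total @ total)

def permutation_p_value(X, y, statistic, n_perm=100, seed=0):
    """Compare the observed statistic with its distribution under shuffled outcomes."""
    rng = np.random.default_rng(seed)
    observed = statistic(X, y)
    permuted = np.array([statistic(X, rng.permutation(y)) for _ in range(n_perm)])
    # Assumed convention: count permutations that do at least as well.
    p = (1 + np.sum(permuted >= observed)) / (1 + n_perm)
    return observed, p

# Toy data: a binary outcome driven mainly by the first of three features.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = (X[:, 0] + 0.3 * rng.normal(size=40) > 0).astype(float)
observed, p = permutation_p_value(X, y, r2_least_squares)
print(observed, p)
```

The same wrapper could be applied to a cross-validated statistic such as Q2 by passing a different statistic function.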

In this pilot study, for ESI+ mode, we obtained R2 = 0.308 (p-value = 0.034) and Q2 = -0.180 (p-value = 0.219). For ESI- mode, we obtained R2 = 0.380 (p-value = 0.002) and Q2 = -0.128 (p-value = 0.092). These statistics indicate that the model fit is moderate but the model predictability is poor (both Q2 values are negative), implying that the models built might not be reliable.
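For reference, the negative Q2 values can be read against the usual definitions of the two statistics, given here as an assumption, since the article does not spell out its formulas. Here y_i is the outcome of subject i, \bar{y} is the mean outcome, \hat{y}_i is the fitted value, and \hat{y}_{(i)} is the cross-validated prediction obtained when subject i is left out of model building.

```latex
R^2 = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2},
\qquad
Q^2 = 1 - \frac{\sum_i \left(y_i - \hat{y}_{(i)}\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}
```

Under these definitions, Q2 is negative whenever the cross-validated prediction error exceeds the total variation around the mean, i.e. when the model predicts held-out subjects worse than simply using the mean outcome.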

Discussion

In this pilot study, we were not able to obtain any conclusive results, and this could be due to the following limitations of the study:

1. Plasma samples were collected at least 10 years before the MMSE tests were carried out. This can make the “signal” that we are looking for, i.e. differential metabolic profiles between cases and controls, too weak to be observed.

2. The role of the MMSE as a stand-alone, single-administration test for identifying patients with mild cognitive impairment who could develop dementia is still controversial [9].

3. The association between cognitive impairment and plasma metabolite profile can be expected to be confounded by many dietary, lifestyle and comorbidity factors. In a substudy where we excluded all subjects with any one of four major confounding diseases, namely hypertension, heart attack, stroke, and diabetes, the results improved but were still not conclusive because of the reduced sample size.

Given the limitations of observational epidemiologic studies, more quantitative approaches may be more promising for achieving conclusive results. Targeted MS experiments focus on quantitative analysis of pre-specified metabolites, and recent work has increased the number of compounds that can be measured in a single run [10]. Nuclear magnetic resonance (NMR) spectroscopy is also gaining popularity in modern molecular epidemiology studies owing to its high reproducibility and quantitative accuracy [11].

Conclusion

The procedures of untargeted UPLC-MS downstream data processing and statistical analysis performed at SPC are described in this article and illustrated by a pilot study on cognitive impairment. SPC continues to optimize these procedures based on experience gained from various studies and on the latest literature.

Programming

The data processing and statistical analysis procedures are all implemented at SPC using R v3.4.1 (R Core Team) and RStudio v1.0.153 (RStudio, Inc.).

Acknowledgment

This work was supported by the Singapore National Medical Research Council (NMRC/CSA/0055/2013) and the United States National Institutes of Health (UM1 CA182876 and R01 CA144034). The SPC team supported this study by conducting the UPLC-MS experiment, authorizing the use of data, and providing valuable comments and suggestions.

References

© 2018 TANG Xingyu. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and build upon your work non-commercially.
