Efficient Prediction of Protein Malonylation Sites Using NLP and Machine Learning

Hananeh Rajabiun, Mohammad Ghasemzadeh* and Masroor Hassan


COJ Robotics & Artificial Intelligence


Computer Engineering Department, Yazd University, Yazd, Iran

*Corresponding author: Mohammad Ghasemzadeh, Computer Engineering Department, Yazd University, Yazd, Iran

Submission: May 22, 2023; Published: June 08, 2023

DOI: 10.31031/COJRA.2023.03.000558

ISSN: 2832-4463
Volume 3 Issue 2

Abstract

This research addresses the challenge of identifying malonylation sites in proteins, with an emphasis on efficient solutions that reduce execution time and improve prediction accuracy. The study introduces a novel framework, CRF-Mal, for extracting informative features from protein functional domains. Multiple classifiers are used for prediction, and the experimental results indicate that CRF-Mal outperforms other approaches. Among the classifiers tested, XGBoost performs best.

Keywords: Malonylation; Machine learning; Natural language processing; Feature extraction

Introduction

Research on protein function and Post-Translational Modifications (PTMs) has highlighted their crucial role in regulating biological processes. More than 600 types of PTMs have been identified, involving diverse chemical groups and peptides. Malonylation, a recently discovered PTM, plays a significant role in cellular processes and dynamic regulation. However, conventional experimental methods for identifying malonylation sites are time-consuming and labor-intensive. To address this, machine learning techniques are being employed to predict these sites accurately and reduce the reliance on experimental approaches [1]. Various methods for predicting malonylation sites in proteins have been proposed. These include Mal-Lys [2], MaloPred [3], feature extraction from sequence and structural features, a hybrid feature selection method, and machine learning classifiers such as random forest, support vector machine, K-nearest neighbors, logistic regression and Light Gradient Boosting Machine. Deep learning approaches such as DeepMal and Malsite-Deep have also been used for accurate prediction [4,5]. The use of natural language processing techniques and protein domain information has been proposed as well. This work introduces the CRF-Mal framework, which extracts features from amino acid sequences using Term Frequency and Category Relevancy Factor (TFCRF) weighting and applies Fisher's score for feature selection. The framework is evaluated and compared with existing tools and machine learning classifiers.
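To make the feature selection step concrete, the following is a minimal sketch of Fisher's score in its common textbook form: for each feature, the between-class scatter of the class means divided by the within-class variance. The text does not state which variant the framework uses, so the formula, the function names `fisher_scores` and `rank_features`, and the toy data are assumptions made for illustration only.

```python
from statistics import fmean, pvariance
from typing import List, Sequence


def fisher_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Fisher's score per feature (assumed textbook form, not the paper's code).

    X is a non-empty list of feature vectors, y the class label of each row.
    """
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = fmean(column)
        between = 0.0  # scatter of class means around the overall mean
        within = 0.0   # pooled variance inside each class
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            between += len(values) * (fmean(values) - overall_mean) ** 2
            within += len(values) * pvariance(values)
        if within > 0:
            scores.append(between / within)
        else:
            # no spread inside any class: perfectly separating or constant
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def rank_features(scores: Sequence[float]) -> List[int]:
    """Feature indices ordered from most to least discriminative."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [1.2, 1.0], [3.0, 5.2], [3.2, 0.8]]
y = [0, 0, 1, 1]
scores = fisher_scores(X, y)
ranking = rank_features(scores)
```

Features with a high score vary little inside each class but differ strongly between classes, so keeping only the top-ranked ones is one way to reduce the feature set before training a classifier.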

Materials and Methods

This study analyzed three protein data sets, from E. coli, H. sapiens, and M. musculus, using a cross-validation strategy. The data sets were prepared from UniProt and filtered with CD-HIT to reduce sequence similarity and homology [6]. Each protein sequence was truncated to a window of 25 amino acids with a lysine at the center. The model was trained with 10-fold cross-validation: in each of the 10 iterations the parameters were optimized on the training set, and the results of the 10 repetitions were averaged. The study provides a systematic approach for analyzing protein sequences from different organisms and can help identify similarities and differences in protein structure and function across species.

In the TFCRF method, two factors, Positive RF and Negative RF, are used to weight the features. Positive RF is the ratio of the number of amino acids in a protein sequence that have a specific property to the total number of amino acids in that sequence. Negative RF is the ratio of the number of amino acids in the other protein sequences that lack that property to the total number of amino acids in those sequences. These factors help determine the significance of each feature for prediction [7].
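The window extraction and the two ratio factors described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the padding symbol `X` for windows that run past the ends of a sequence, and the use of a set of residues to stand in for "a specific property" are all hypothetical choices. How the two factors are combined into a final TFCRF weight is not described in the text and is left out.

```python
from typing import Iterable, List, Set

WINDOW = 25          # window length stated in the text
HALF = WINDOW // 2   # 12 residues on each side of the central lysine
PAD = "X"            # assumed padding symbol; the text does not specify one


def lysine_windows(sequence: str) -> List[str]:
    """Return one 25-residue window per lysine (K), with K at the center."""
    padded = PAD * HALF + sequence + PAD * HALF
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # residue i of `sequence` sits at index i + HALF of `padded`,
            # so the window centered on it starts at index i
            windows.append(padded[i:i + WINDOW])
    return windows


def positive_rf(sequence: str, residues: Set[str]) -> float:
    """Share of amino acids in `sequence` that have the property."""
    if not sequence:
        return 0.0
    return sum(aa in residues for aa in sequence) / len(sequence)


def negative_rf(other_sequences: Iterable[str], residues: Set[str]) -> float:
    """Share of amino acids in the other sequences that lack the property."""
    others = list(other_sequences)
    total = sum(len(s) for s in others)
    if total == 0:
        return 0.0
    without = sum(aa not in residues for s in others for aa in s)
    return without / total


# Hypothetical example: a short peptide and "is lysine" as the property.
windows = lysine_windows("MKTAYIAKQR")
pos = positive_rf("AKKA", {"K"})
neg = negative_rf(["AKKA", "AAAA"], {"K"})
```

Padding keeps every window at the same length, so lysines near the start or end of a protein are not discarded; whether the original data sets handle terminal lysines this way is not stated.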

Results

The proposed framework for detecting malonylation sites was evaluated using 10-fold cross-validation on three datasets: E. coli, M. musculus, and H. sapiens. XGBoost achieved the highest accuracy among the classifiers, while SVM had the lowest reported accuracy. The analysis of the area under the ROC curve confirmed that XGBoost outperformed the other classifiers in generalization ability and in distinguishing malonylation from non-malonylation sites. Error analysis further supported this result, showing lower error rates and greater consistency for XGBoost than for the DNN, RF and SVM classifiers.

The performance of the proposed method was also compared with previous methods for predicting malonylation sites. The comparison used the XGBoost classification results and the evaluation criteria accuracy (ACC), sensitivity (SN), specificity (SP) and Matthews correlation coefficient (MCC). The proposed method outperformed the other methods on all tested datasets, with higher values for accuracy, sensitivity, and the other criteria. On the E. coli dataset in particular, it achieved significantly higher accuracy than the other methods. On the H. sapiens and M. musculus datasets, it likewise achieved higher accuracy and better values on the other evaluation metrics than previous methods.
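For reference, the four evaluation criteria named above have standard definitions in terms of true and false positives and negatives. The sketch below computes them from confusion-matrix counts; the function name `binary_metrics` and the counts in the example are hypothetical and are not results from this study.

```python
import math
from typing import Dict


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """ACC, SN, SP and MCC from confusion-matrix counts (standard definitions)."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    sn = tp / (tp + fn) if (tp + fn) else 0.0   # sensitivity (recall)
    sp = tn / (tn + fp) if (tn + fp) else 0.0   # specificity
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SN": sn, "SP": sp, "MCC": mcc}


# Made-up counts, for illustration only.
example = binary_metrics(tp=40, tn=45, fp=5, fn=10)
print(example)
```

MCC is useful alongside accuracy because malonylation data sets are often imbalanced, and MCC stays near zero for a classifier that simply predicts the majority class.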

Conclusion

This study presents a machine learning and natural language processing approach for detecting malonylation sites in proteins. The TFCRF technique is used to extract functional domain information, and the most effective features are selected to prevent overfitting of the model. The cross-validation results show that the XGBoost classifier performs better than the other classifiers when using the extracted and selected features. Furthermore, the results indicate that the features extracted by the proposed method give the best performance among the methods compared.

References

1. Islam S, Mugdha SBS, Dipta SR, Arafat ME, Shatabda S, et al. (2022) MethEvo: An accurate evolutionary information-based methylation site predictor. Neural Computing and Applications pp: 1-12.
2. Xu Y, Ding YX, Ding J, Wu LY, Xue Y (2016) Mal-Lys: Prediction of lysine malonylation sites in proteins integrated sequence-based features with mRMR feature selection. Sci Rep 6: 38318.
3. Liu X, Wang L, Li J, Hu J, Zhang X (2020) Mal-Prec: Computational prediction of protein malonylation sites via machine learning based feature integration. BMC Genomics 21(1).
4. Wang M, Cui X, Li S, Yang X, Ma A, et al. (2020) DeepMal: Accurate prediction of protein malonylation sites by deep neural networks. Chemometrics and Intelligent Laboratory Systems 207: 104175.
5. Wang M, Song L, Zhang Y, Gao H, Yan L, et al. (2022) Malsite-Deep: Prediction of protein malonylation sites through deep learning and multi-information fusion based on NearMiss-2 strategy. Knowledge-Based Systems 240: 108191.
6. Hananeh R, Mahdis MH, Hadi Z, Mehdi D (2022) A hybrid feature selection method for predicting lysine malonylation sites in proteins via machine learning. Chemometrics and Intelligent Laboratory Systems 222: 104496.
7. Maleki M, Abdollahzadeh A (2007) TFCRF: A novel feature weighting method based on class information in text categorization. International Conference on Computer, Information and Systems Science and Engineering, Bangkok.

© 2023 Mohammad Ghasemzadeh. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and build upon your work non-commercially.
