Crimson Publishers


Significances of Bioengineering & Biosciences

Prediction and Survival Analysis of Head and Neck Cancer in Patients Using Epigenomics Data and Advanced Machine Learning Methods

Vikaskumar Chaudhary1, Kalpdrum Passi1* and Chakresh Kumar Jain2

1School of Engineering and Computer Science, Laurentian University, Canada

2Department of Biotechnology, Jaypee Institute of Information Technology, India

*Corresponding author: Kalpdrum Passi, School of Engineering and Computer Science, Laurentian University, Canada

Submission: January 25, 2024; Published: February 12, 2024

DOI: 10.31031/SBB.2024.06.000646

ISSN 2637-8078
Volume 6 Issue 5

Abstract

Epigenomics is the field of biology that studies modifications affecting the phenotype without altering the DNA sequence of the cell. Epigenetic modifications add chemical marks on top of the DNA that change its properties and can prevent certain DNA functions from being carried out. Such modifications occur in cancer cells and are implicated as a cause of cancer. The main objective of this research is to perform prediction and survival analysis of Head and Neck Squamous Cell Carcinoma (HNSCC), which is one of the leading causes of death and accounts for more than 650,000 cases and 330,000 deaths annually worldwide. Tobacco use, alcohol consumption, Human Papillomavirus (HPV) infection (for oropharyngeal cancer) and Epstein-Barr Virus (EBV) infection (for nasopharyngeal cancer) are the main risk factors associated with head and neck cancer. Males are affected more often than females, with a male-to-female ratio ranging from 2:1 to 4:1. Four types of data are used in this research to predict HNSCC in patients: methylation, histone, human genome and RNA-sequence data. The data is accessed through open-source tools in the R and Python programming languages. The data is processed to create features and, with the help of statistical analysis and advanced machine learning techniques, the prediction of HNSCC is obtained from a fine-tuned model. The optimal model was determined to be ResNet50 using the Sobel feature selection method for image data and ReliefF-based feature selection for clinical features, achieving a test accuracy of 97.9%. The model's precision score was 0.929, its recall score was 0.930 and its F1 score was 0.930. Additionally, the ResNet101 model performed best when combined with the Histogram of Gradients feature selection method for image data and mutual information-based feature selection for clinical features, yielding a test accuracy of 96.1%. Its precision, recall and F1 scores were identical to those of the aforementioned ResNet50 model. The research also used Kaplan-Meier survival analysis to investigate the survival rates of patients based on various factors, including age, gender, smoking status, tumor size and tumor site. The results of this analysis demonstrated the effectiveness of the method in providing valuable insights for risk assessment.
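For readers unfamiliar with the Kaplan-Meier method mentioned above, the following is a minimal sketch of the product-limit estimator, S(t) = product over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of deaths at t_i and n_i is the number of patients still at risk. It is an illustration only, not the authors' implementation: the function name `kaplan_meier` and the toy follow-up data are hypothetical and are not taken from the study.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each patient
    events:    1 if death was observed at that time, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # every patient leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving this event time.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Deaths and censored patients at time t both drop out of the risk set.
        at_risk -= exits[t]
    return curve


# Hypothetical toy data (months of follow-up); NOT data from the study.
durations = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"t={t:>2}  S(t)={s:.3f}")
```

Comparing groups such as smokers and non-smokers would amount to calling the function once per group and plotting the resulting step curves side by side. In practice an established survival-analysis library would usually be preferred over a hand-written estimator.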

Keywords: Epigenomics; Histone; DNA methylation; Human Genome; RNA
