Jahandideh S1*, Jahandideh M2, Asefzadeh S3 and Ziaee A4
1Griffith University, Australia
2Zanjan University, Iran
3Qazvin University of Medical Sciences, Iran
4Qazvin University of Medical Sciences, Iran
*Corresponding author: Jahandideh S, School of Medicine & Menzies Health Institute Queensland, Gold Coast Campus, Griffith University, Queensland, Australia, Email: Sepideh.jahandideh@griffithuni.edu.au
Submission: September 19, 2017; Published: November 15, 2017
ISSN: 2577-2007, Volume 1, Issue 1
Objective: Cardiovascular disease (CVD) is the leading cause of death worldwide. CVD covers a wide array of disorders, including diseases of the cardiac muscle and of the vascular system supplying the heart, brain, and other vital organs. This research aims to assess the comprehensive impact that a set of lifestyle data from a population has on the main cardiovascular risk factors. Such predictions of the influence of lifestyle on cardiovascular risk factors could be useful.
Material and methods: A cross-sectional study was designed; subjects living in Minodar, Iran were interviewed by trained nurses using a structured questionnaire. Data were processed from a sample of 393 subjects of both sexes aged 26-79 years. The output data were: high/low total cholesterol, HDL-C, and triglycerides. The input data were: sex, age, height, weight, marital status, individual's status in the family, physical activity, hours of sleep per day, smoking, tobacco type, and BMI. Two predictor models, an artificial neural network and logistic regression, were applied.
Results: Logistic regression (LR), as a conventional model, yielded poor prediction performance. However, LR identified relationships between the inputs and the dichotomous output variables; the more significant parameters were sex and BMI for triglycerides (TG); sex, weight, and tobacco type for HDL-C; and sex for total cholesterol. The artificial neural network (ANN), as a more powerful model, showed high accuracy in predicting CVD risk factors. These results can be attributed to the non-linear nature of ANNs, which allows independent variables to be mapped to dependent ones non-linearly.
Conclusion: The results show that our ANN-based approach is promising and may play a useful role in developing a better method for assessing the influence of lifestyle on cardiovascular risk factors.
Keywords: Cardiovascular risk factors; Lifestyle; Neural network; Logistic regression
Recent years have witnessed a dramatic decline in cardiac mortality, but ischemic heart disease is still the leading killer in many parts of the world [1]. In Iran, similarly, Coronary Artery Disease (CAD) is the principal cause of mortality, morbidity, and disability [2].
Direct (hospitalization and treatment) and indirect (absenteeism and unemployment) costs caused by CAD are estimated at 26.77 billion Rials in the Iranian Oil Industry [3]. The most important established risk factors for CAD include high blood cholesterol (total cholesterol, LDL), hypertension, smoking, diabetes, and poor eating habits [4]. Although these risk factors have been identified as the main causes of CAD, many studies have shown that more than 50 percent of patients with CAD lack all of them except high cholesterol [5-8]. Smoking, hypertension, dyslipidemia, and inactivity are the main controllable factors in CAD. Control and treatment of risk factors can significantly reduce mortality and the costs associated with CAD [9]. Primary and secondary prevention of coronary heart disease in residents of South Asia is a major health priority because risk factors for CAD are very common in this population [10]. Iran, a country in southwestern Asia, is no exception. The death rates from CAD in Iran show that a solution is needed to reduce the incidence of these illnesses and deaths.
Ischemic Heart Disease (IHD) is the single largest cause of death in developed countries and is one of the main contributors to the disease burden in developing countries. The two leading manifestations of IHD are angina and acute myocardial infarction. In 2001, IHD was responsible for 7.3 million deaths and 58 million Disability-Adjusted Life Years (DALYs) lost worldwide [11].
The risk of developing Cardiovascular Disease (CVD) depends to a large extent on the presence of several risk factors. The major risk factors for CVD include tobacco use, high blood pressure, high blood glucose, lipid abnormalities, obesity, and physical inactivity. The global variations in CVD rates are related to temporal and regional variations in these known risk factors. The strength of the associations of the various factors with CVD is widely discussed in the literature. Although some risk factors, such as age, ethnicity, and gender, cannot be modified, most of the risk is attributable to lifestyle and behavioral patterns, which can be changed [12].
Artificial Neural Networks (ANNs) have attracted growing interest in recent years as a supplement or alternative to standard statistical techniques to predict complex phenomena in medicine and biological studies [13]. A neural network is a non-linear statistical data modeling tool that is able to capture and represent complex input/output relationships.
In this study, ANNs are used as a non-algorithmic model to predict the influence of lifestyle on cardiovascular risk factors. Predicting risk factors helps in assessing the comprehensive impact that a set of lifestyle data from a population has on the main cardiovascular risk factors. Most previous research has instead focused on predicting heart disease from risk factors [14-16].
This study used a cross-sectional design and a convenience sample of 393 subjects. Participation in the research was voluntary. The subjects were interviewed by trained nurses and physicians using a structured questionnaire, with each interview taking around 30 minutes. The questionnaire contained anthropometric, laboratory, and physical activity questions. The responses were used to compile the dataset. Both sexes were represented; subjects were aged 26 to 75 years (average age 41.97 years) and were all living in Minodar, Iran in 2015.
Logistic regression and an artificial neural network were used as the algorithmic and non-algorithmic models, respectively.
The input data to the network for each subject were as follows: sex (1: male, 2: female), age (years), height (cm), weight (kg), marital status (1: single, 2: married, 3: divorced, 4: widowed), individual's status in the family (1: head of family, 2: wife, 3: children), physical activity (1: high, 2: moderate, 3: low), hours of sleep per day (1: less than 8 hours, 2: more than 8 hours), smoking (1: yes, 2: no), tobacco type (cigarette, pipe, hubble-bubble, none of them), and BMI (underweight: <18.5, normal: 18.5-24.9, overweight: 25-29.9, obesity grade 1: 30-34.9, obesity grade 2: >35). More details about HDL-cholesterol, cholesterol, triglycerides, and the values of selected parameters are presented in Table 1.
Table 1: Characterization of the study population.
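The BMI input and its categories described above can be illustrated with a short sketch. This is not the authors' code (they report using MATLAB); it is a minimal Python example in which the function names `bmi` and `bmi_category` are hypothetical and the handling of boundary values (for example, a BMI of exactly 35) is an assumption.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg/m^2 from weight (kg) and height (cm)."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def bmi_category(value: float) -> str:
    """Map a BMI value to the categories named in the text.

    Assumption: each category includes its lower bound, so a BMI of
    exactly 35 falls into "obesity grade 2".
    """
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal"
    if value < 30.0:
        return "overweight"
    if value < 35.0:
        return "obesity grade 1"
    return "obesity grade 2"


# Hypothetical subject: 80 kg, 175 cm (BMI is about 26.1).
print(bmi_category(bmi(80.0, 175.0)))  # overweight
```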
We used the WHO Global Physical Activity Questionnaire (GPAQ) to calculate physical activity. It comprises 16 questions that collect information on physical activity participation in three settings as well as on sedentary behavior. The domains are activity at work, travel to and from places, and recreational activities. To calculate a categorical indicator, the total time spent in physical activity during a typical week, the number of days, and the intensity of the activity are taken into account. The three levels of physical activity suggested for classifying populations are low, moderate, and high (World Health Organization, 2012). Following the GPAQ Analysis Guide, the subjects' physical activity was calculated and categorized in MATLAB, and each subject was assigned to one of the 3 categories. The cut-off values for the output data were: triglyceride: 160mg/dl; cholesterol: 220mg/dl; HDL-cholesterol: 45mg/dl for males and 50mg/dl for females, taken from the ranges of normality indicated by the literature [17]. Of the 393 subjects in the study group, 80 were used for the test phase and the remaining 313 for the training phase.
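The dichotomization by cut-off and the 313/80 split can be sketched as follows. This is an illustrative Python example, not the authors' MATLAB code. Two points are assumptions: the text does not say which side of each cut-off is coded as the risk class (here, values above the triglyceride and cholesterol cut-offs and below the HDL-C cut-off are coded 1), and it does not say how the test subjects were chosen (here, a seeded random shuffle). The names `dichotomize` and `split_subjects` are hypothetical.

```python
import random

# Cut-offs quoted in the text (mg/dl).
TG_CUTOFF = 160.0
CHOL_CUTOFF = 220.0
HDL_CUTOFF = {1: 45.0, 2: 50.0}  # keyed by the sex coding: 1 = male, 2 = female


def dichotomize(sex, triglycerides, cholesterol, hdl):
    """Return 0/1 labels; 1 marks the assumed at-risk side of each cut-off."""
    return {
        "triglycerides": int(triglycerides > TG_CUTOFF),
        "cholesterol": int(cholesterol > CHOL_CUTOFF),
        "hdl_c": int(hdl < HDL_CUTOFF[sex]),  # assumption: low HDL-C is at-risk
    }


def split_subjects(subject_ids, n_test=80, seed=0):
    """Shuffle subject ids and hold out n_test of them (assumed random split)."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    return ids[n_test:], ids[:n_test]  # (training ids, test ids)


train_ids, test_ids = split_subjects(range(393))
print(len(train_ids), len(test_ids))  # 313 80

# Hypothetical female subject.
print(dichotomize(sex=2, triglycerides=150.0, cholesterol=230.0, hdl=48.0))
```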
Regression is the study of dependence. It is used to answer questions such as: do changes in cholesterol depend on age, sex, or physical activity? The goal of regression is to summarize observed data as simply and usefully as possible.
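For a dichotomous outcome, logistic regression models the probability of the outcome as a logistic function of a weighted sum of the predictors. The sketch below is a minimal illustration of that idea in Python, fitted by plain batch gradient descent on made-up toy data; it is not the authors' procedure, and the function names, learning rate, and epoch count are all assumptions.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(b0, b, x):
    """P(y = 1 | x) under the model sigmoid(b0 + b . x)."""
    return sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, x)))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit intercept b0 and coefficients b by batch gradient descent."""
    n_features = len(xs[0])
    b0, b = 0.0, [0.0] * n_features
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * n_features
        for x, y in zip(xs, ys):
            err = predict_proba(b0, b, x) - y  # gradient of the log-loss
            g0 += err
            for j, xj in enumerate(x):
                g[j] += err * xj
        b0 -= lr * g0 / len(xs)
        b = [bj - lr * gj / len(xs) for bj, gj in zip(b, g)]
    return b0, b


# Toy, made-up data: one standardized predictor and a binary outcome.
xs = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
ys = [0, 0, 1, 0, 1, 1]
b0, b = fit_logistic(xs, ys)
print(predict_proba(b0, b, [1.5]) > predict_proba(b0, b, [-1.5]))  # True
```

A positive fitted coefficient means the predicted probability of the outcome rises with the predictor, which is the kind of dependence the regression is meant to summarize.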
ANNs were designed and developed with the human brain as a model. Accordingly, they are cellular information-processing systems. A neural network consists of an interconnected set of artificial neurons, which act collectively and simultaneously as summing and non-linear mapping junctions for all data and inputs. Because its structure changes on the basis of the internal and external information that flows through the network during the learning phase, an ANN is called an adaptive system. Modern neural networks are usually applied to model complex relationships between inputs and outputs or to find patterns in data (Figure 1).
Figure 1: Diagram illustrating the structure of the developed neural network.
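The description of neurons as summing and non-linear mapping junctions can be made concrete with a small forward-pass sketch in Python. It is an illustration only: the sigmoid activation, the random initial weights, and the layer sizes (12 inputs, hidden layers of 6 and 4 neurons, 1 output) are placeholders chosen for the example, not the optimized parameters of the study.

```python
import math
import random


def sigmoid(z):
    """Non-linear mapping applied to each neuron's weighted sum."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum followed by a non-linear mapping."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def layer(inputs, weight_rows, biases):
    """A layer applies every one of its neurons to the same input vector."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (an arbitrary initialization)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, layers):
    """Feed an input vector through the layers in order."""
    for weights, biases in layers:
        x = layer(x, weights, biases)
    return x


rng = random.Random(0)
sizes = [12, 6, 4, 1]  # placeholder sizes: inputs, two hidden layers, one output
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
output = forward([0.0] * sizes[0], layers)
print(output)  # a single value in (0, 1), read as a dichotomous prediction score
```

Training adjusts the weights and biases so that this output matches the observed class; the forward pass itself stays the same.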
In this study, we trained an independent network for each output (cholesterol, HDL-C, and triglycerides). Our networks were trained with two hidden layers of neurons. The parameters of the optimized neural network are shown in Table 2. Networks with one or two hidden layers, different learning constants, and different numbers of hidden nodes were tested during training. An in-house program written in the MATLAB programming language was used to build the neural network.
Table 2: Optimized neural network parameters
The ANN-based models were fed with the twelve mentioned factors. The ANNs were used as predictor models on the dataset and were trained with the "trainrp" method. For each network, the optimized four-layer structure comprised an input layer, two hidden layers, and one output neuron. The variables (cholesterol, HDL-C, and triglycerides) were used as outputs in dichotomous form. A total of 313 subjects were entered into the network as the training set, and the remaining 80 subjects were used as the testing set.
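In MATLAB, "trainrp" is the resilient backpropagation (Rprop) training function. The core idea of Rprop is that only the sign of each partial derivative is used, while a separate step size per weight grows when the sign is stable and shrinks when it flips. The Python sketch below illustrates that idea with one common Rprop variant (no weight backtracking) on a toy quadratic; it makes no claim to reproduce MATLAB's exact implementation, and the function names and parameter values are assumptions for the example.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def rprop_step(weights, grads, prev_grads, steps,
               inc=1.2, dec=0.5, step_min=1e-6, step_max=50.0):
    """One in-place Rprop update (variant without weight backtracking)."""
    for i, g in enumerate(grads):
        agreement = g * prev_grads[i]
        if agreement > 0:
            # Same sign as last time: accelerate along this weight.
            steps[i] = min(steps[i] * inc, step_max)
        elif agreement < 0:
            # Sign flipped, so a minimum was overshot: shrink the step
            # and skip the update for this weight in this iteration.
            steps[i] = max(steps[i] * dec, step_min)
            g = 0.0
        weights[i] -= sign(g) * steps[i]
        prev_grads[i] = g
    return weights


def toy_gradient(w):
    """Gradient of the toy loss (w0 - 3)^2 + (w1 + 1)^2."""
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]


weights = [0.0, 0.0]
prev_grads = [0.0, 0.0]
steps = [0.1, 0.1]  # initial per-weight step sizes (arbitrary choice)
for _ in range(200):
    rprop_step(weights, toy_gradient(weights), prev_grads, steps)
print(weights)  # close to the minimizer [3, -1]
```

In a neural network, `toy_gradient` would be replaced by the gradient of the training error with respect to the weights, obtained by backpropagation.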
In the first phase, all of the inputs were entered into the network, and the outputs (triglycerides, HDL-C, and total cholesterol) were modeled separately. Learning constants of 0.08, 0.1, and 0.2 were tested, and 0.1 was selected. The prediction accuracies of the three networks were 85.75%, 91.25%, and 93.75%, respectively. The results are shown in Table 2.
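Prediction accuracy here is the share of test subjects whose predicted class matches the observed class. The short Python sketch below shows the calculation on made-up labels; the function name `accuracy` and the label lists are illustrative only. With an 80-subject test set, 73 correct predictions give 91.25% and 75 correct predictions give 93.75%.

```python
def accuracy(y_true, y_pred):
    """Fraction of subjects whose predicted class equals the observed class."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels for an 80-subject test set with 73 correct predictions.
y_true = [1] * 40 + [0] * 40
y_pred = [1] * 36 + [0] * 4 + [0] * 37 + [1] * 3
print(f"{accuracy(y_true, y_pred):.2%}")  # 91.25%
```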
The logistic regression results indicate a strong influence of sex on cholesterol (P < 0.001), HDL-C (P < 0.001), and triglycerides (P < 0.001); of BMI on triglycerides (P = 0.012); and of weight (P = 0.005) and tobacco type (P = 0.02) on HDL-C. Changes in other factors were not significantly associated with the cardiovascular risk factors. The statistical characteristics of the developed models are shown in Tables 3-5.
Table 3: Statistical characteristics of the developed Logistic regression model.
Table 4: Statistical characteristics of the developed Logistic regression model.
Table 5: Statistical characteristics of the developed Logistic regression model.
Several studies have indicated correlations among variables such as body weight, smoking, age, and the classic cardiovascular disease risk factors [18-20]; that is, no single factor is decisive; rather, the combined interaction of all of them determines the outcome.
In this study, logistic regression, as a conventional model, yielded poor prediction performance. However, LR identified relationships between the inputs and the dichotomous output variables; the more significant parameters were sex and BMI for TG; sex, weight, and tobacco type for HDL-C; and sex for total cholesterol. ANNs, as a more powerful model, showed high prediction accuracy. ANNs are broadly used to examine complex relationships between input and output variables (Nelson & Illingworth), and there are reasons to support the ANN-based model. It can handle factors without their complex non-linear relationships having to be specified. This is particularly beneficial in situations where the determining factors are numerous or are not completely understood or controlled. However, this black-box nature of ANN-based models can be a disadvantage when the aim is to determine the relative influence of individual factors on prediction performance. Because the objective of this research was to predict the influence of lifestyle on cardiovascular risk factors, predictive accuracy is the greatest advantage, and the black-box nature is not a weakness. Gueli [21], in research with a similar aim, reported that a neural network predicted data for a single individual with high probability (up to 93.33%).
In conclusion, our results are promising and confirm a beneficial role for neural networks in predicting the influence of lifestyle on cardiovascular risk factors. To take full advantage of data mining techniques, we suggest combining two successful data mining tools, neural networks and genetic algorithms, to compensate for the weaknesses of ANNs.
© 2017 Jahandideh S, et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and build upon your work non-commercially.