Keywords: Heart Disease Prediction, Machine Learning, Feature Selection, Data Preprocessing, Support Vector Machine
Abstract
Heart disease remains the leading cause of death worldwide, which makes efficient diagnostic tools crucial. Data-driven predictive models built with machine learning offer a promising route to early detection and intervention. In this study, the data were first preprocessed by handling missing values, encoding categorical variables, and normalizing numerical features. Exploratory data analysis (EDA) then showed that features such as age, gender, and chest pain type are significantly correlated with heart disease. Five machine learning algorithms were implemented and compared on the prediction task: Logistic Regression, K-Nearest Neighbors, Decision Tree, Random Forest, and Support Vector Machine (SVM). Feature selection techniques, namely correlation analysis, recursive feature elimination (RFE), and LASSO regression, were applied to refine the models' input features. Experimental results indicate that SVM achieved the best predictive performance, with an accuracy of 83.61%. This research demonstrates the potential of machine learning models for heart disease prediction in clinical settings; future work could explore ensemble learning and deep learning techniques.
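The three preprocessing steps named above can be illustrated with a minimal, dependency-free sketch. It is not the study's implementation: the toy records, the column names (`age`, `chol`, `cp`), the choice of median imputation, and the choice of z-score standardization as the normalization are all assumptions made for illustration.

```python
from statistics import mean, median, pstdev

# Hypothetical toy records (not the study's data); None marks a missing value.
RECORDS = [
    {"age": 63, "chol": 233, "cp": "typical"},
    {"age": 41, "chol": None, "cp": "atypical"},
    {"age": 56, "chol": 294, "cp": "typical"},
    {"age": 50, "chol": 250, "cp": "non-anginal"},
]
NUMERIC = ["age", "chol"]  # assumed numeric columns
CATEGORICAL = ["cp"]       # assumed categorical column (chest pain type)


def preprocess(rows, numeric, categorical):
    """Return (header, matrix) after imputing, standardizing, and one-hot encoding."""
    columns = {}

    for name in numeric:
        # Step 1: fill missing values with the median of the observed values.
        observed = [r[name] for r in rows if r[name] is not None]
        fill = median(observed)
        values = [fill if r[name] is None else r[name] for r in rows]

        # Step 2: z-score standardization (zero mean, unit population std).
        mu, sigma = mean(values), pstdev(values)
        sigma = sigma or 1.0  # guard against a constant column
        columns[name] = [(v - mu) / sigma for v in values]

    for name in categorical:
        # Step 3: one-hot encode, one indicator column per observed level.
        for level in sorted({r[name] for r in rows}):
            columns[f"{name}={level}"] = [
                1.0 if r[name] == level else 0.0 for r in rows
            ]

    header = list(columns)
    matrix = [[columns[h][i] for h in header] for i in range(len(rows))]
    return header, matrix


header, matrix = preprocess(RECORDS, NUMERIC, CATEGORICAL)
```

In a real experiment the imputation and scaling statistics would normally be computed on the training split only and then applied to the test split, so that no test-set information leaks into the model.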