A machine learning-based predictive model for progression of knee osteoarthritis from clinical data

      Purpose: Knee osteoarthritis (KOA) is a multi-factorial disease. Precisely predicting its progression is challenging, yet such prediction is required for personalized medicine aimed at preventing further joint destruction. There are three definitions of KOA progression: WOMAC progression is a persistent increase of at least nine points on the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC), normalised to a 0-100 scale; KL progression is any increase in Kellgren-Lawrence grade (KLG), except an increase from KLG 0 to KLG 1; and Joint Space Width (JSW) progression is a loss of at least 0.7 mm in medial knee joint space width. In this study, KOA progression is further classified by rate as rapid (aggravation of KOA within 48 months) or chronic (aggravation between 48 and 96 months). This study aims to develop a patient-specific predictive model for KOA progression via a machine learning approach, using the database of the FNIH OA Biomarkers Consortium.
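The three progression definitions and the rapid/chronic split can be written as simple rules. The following is a minimal sketch, not the study's code: the function names are hypothetical, the "persistent" requirement for WOMAC is not modelled (only one baseline and one follow-up value are compared), and assigning exactly 48 months to the rapid class is an assumption, since the text does not say which side the boundary falls on.

```python
def womac_progression(baseline, follow_up, threshold=9.0):
    """WOMAC progression: increase of at least 9 points on the 0-100 normalised scale.
    Assumption: persistence of the increase is not checked here."""
    return (follow_up - baseline) >= threshold


def kl_progression(baseline_grade, follow_up_grade):
    """KL progression: any increase in KL grade, except the increase from 0 to 1."""
    if baseline_grade == 0 and follow_up_grade == 1:
        return False
    return follow_up_grade > baseline_grade


def jsw_progression(baseline_mm, follow_up_mm, threshold_mm=0.7):
    """JSW progression: loss of at least 0.7 mm in medial joint space width."""
    return (baseline_mm - follow_up_mm) >= threshold_mm


def progression_rate(months_to_progression):
    """Label the rate of progression.
    Assumptions: None means no progression was observed; exactly 48 months counts
    as rapid; anything beyond 96 months is outside the window described above."""
    if months_to_progression is None or months_to_progression > 96:
        return "non-progressor"
    if months_to_progression <= 48:
        return "rapid"
    return "chronic"
```

For example, under these assumptions a knee moving from KLG 2 to KLG 3 at 36 months would be labelled a rapid KL progressor.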
      Methods: For feature selection, 36 features that could enhance the predictions of the supervised machine learning model were selected from the dataset (Figure 1). To maintain an adequate sample size for training, Multivariate Imputation by Chained Equations (MICE) was employed to approximate the missing entries and complete the dataset.
      For all progression types, the non-progressor class outnumbered the progressor class in every instance, resulting in class imbalance (see Table 1). We developed a ‘Random Feature Generator’, which generated combinations of features from the feature pool. Each combination was fed into a Logistic Regression model trained on the 48- and 96-month datasets with 5-fold cross-validation, and the average accuracy across folds was calculated. Downsampling was used to lower the computational cost. To investigate the performance trend over time, data points from the 72-month dataset were included and compared with the 48- and 96-month datasets using a paired t-test for all three KOA progression types. To examine feature specificity, we extracted the 50 best-performing feature combinations from the 48- and 96-month datasets for further analysis, and plotted the occurrence frequency of each feature across these 50 combinations. It was hypothesised that the more frequently a feature appears, the more important it is to KOA progression.
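The pipeline described above (random feature combinations, downsampling, and counting how often each feature appears among the best combinations) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the authors' ‘Random Feature Generator’: all function names and parameters are hypothetical, downsampling is assumed to mean randomly reducing every class to the size of the smallest one, and the accuracy attached to each combination is assumed to come from elsewhere (for instance a 5-fold cross-validated logistic regression).

```python
import random
from collections import Counter


def random_feature_combinations(feature_pool, n_combinations,
                                min_size=1, max_size=None, seed=0):
    """Draw up to n_combinations distinct random subsets of the feature pool."""
    rng = random.Random(seed)
    pool = sorted(feature_pool)
    max_size = len(pool) if max_size is None else max_size
    seen = set()
    attempts = 0
    # Cap the attempts so the loop ends even if too few distinct subsets exist.
    while len(seen) < n_combinations and attempts < 100 * n_combinations:
        attempts += 1
        size = rng.randint(min_size, max_size)
        seen.add(tuple(sorted(rng.sample(pool, size))))
    return sorted(seen)


def downsample(rows, labels, seed=0):
    """Randomly reduce every class to the size of the smallest class (assumption)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(members) for members in by_class.values())
    balanced = []
    for label, members in by_class.items():
        balanced.extend((row, label) for row in rng.sample(members, n_min))
    rng.shuffle(balanced)
    return balanced


def feature_frequency(scored_combinations, top_k=50):
    """Count feature occurrences among the top_k (combination, accuracy) pairs."""
    ranked = sorted(scored_combinations, key=lambda pair: pair[1], reverse=True)
    counts = Counter()
    for combination, _accuracy in ranked[:top_k]:
        counts.update(combination)
    return counts
```

Given a list of `(combination, accuracy)` pairs, `feature_frequency(pairs).most_common()` would yield the ranking that a frequency chart of this kind visualises.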
      Results: Tables 2a and 2b show that the top-performing combinations came from KL progression in both datasets. Moreover, with Logistic Regression, both models with feature selection outperformed those without it in terms of accuracy. Table 3 shows that a Multi-layer Perceptron improved on the performance of the original Logistic Regression model in all metrics.
      In Figure 2, all progression types showed a significant increase in accuracy over time, with KL progression outperforming the other progression types in all instances. It could be inferred that the more frequently a feature appears in the top-performing combinations, the more important it is to the prediction of KOA progression.
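The significance of the accuracy trend rests on the paired t-test mentioned in the Methods. As a rough sketch of that statistic, assuming accuracies are paired by feature combination across two time points (the pairing is not stated in the text, and the function name is hypothetical):

```python
import math
import statistics


def paired_t_statistic(earlier, later):
    """Return (t, degrees_of_freedom) for paired samples, using later - earlier."""
    if len(earlier) != len(later):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for a, b in zip(earlier, later)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1
```

A positive t indicates higher accuracy at the later time point; the p-value would then be read from a t distribution with n - 1 degrees of freedom, which the standard library does not provide.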
      Figure 3 shows that the feature frequency distributions for the 48- and 96-month datasets differed. However, the most frequently occurring features all fell into the categories of either generic or radiographic information about the subject. Features of metabolic syndrome also appeared to have a greater impact on KOA progression in the short term (48 months) than in the long term (96 months). In short, the frequency plot reflects the extent to which different features influence KOA progression.
      Conclusions: In this study, the top model achieved prediction accuracies of 0.7059 and 0.7272 for rapid and chronic KOA progression, respectively. Prediction accuracy also increased over time for all progression types. This may be because KOA is a degenerative disease whose signs and symptoms manifest over time. Furthermore, the model's predictive capability was found to be specific to each time point, meaning that features exert varying influence on predictive performance over time. The five most important and highly correlated features for predicting rapid KOA progression were joint space width, OARSI joint space narrowing grade in the medial compartment, sex, race, and medial meniscus extrusion. For chronic KOA progression, sex, KL grade, weight, smoking habit, and OARSI joint space narrowing grade in the lateral compartment were the most important. Notably, these features all came from either generic or radiographic information about the subject. In conclusion, the findings of this study may offer insights into treatment protocols for short-, mid- and long-term disease management.