Automated Detection of Patellofemoral Osteoarthritis from Knee Lateral View Radiographs Using Deep Learning: Data from the Multicenter Osteoarthritis Study (MOST)

Objective: To assess the ability of imaging-based deep learning to predict radiographic patellofemoral osteoarthritis (PFOA) from knee lateral view radiographs. Design: Knee lateral view radiographs were extracted from the Multicenter Osteoarthritis Study (MOST) (n = 18,436 knees). The patellar region-of-interest (ROI) was first automatically detected using a deep-learning-based object detection method, and subsequently, end-to-end deep convolutional neural networks (CNNs) were trained and validated to detect the status of patellofemoral OA. The manual PFOA status assessment provided in the MOST dataset was used as the classification outcome for the CNNs. Performance of the prediction models was assessed using the area under the receiver operating characteristic curve (ROC AUC) and the average precision (AP) obtained from the precision-recall (PR) curve in a stratified 5-fold cross-validation setting. Results: Of the 18,436 knees, 3,425 (19%) had PFOA. AUC and AP for the reference model, which included age, sex, body mass index (BMI), the total Western Ontario and McMaster Universities Arthritis Index (WOMAC) score, and tibiofemoral Kellgren-Lawrence (KL) grade, were 0.806 and 0.478, respectively. The CNN model that used only image data significantly improved the prediction of PFOA status (ROC AUC = 0.958, AP = 0.862). Conclusion: We present the first machine-learning-based automatic PFOA detection method. Furthermore, our deep-learning-based model trained on the patellar region of knee lateral view radiographs performs better at predicting PFOA than models based on patient characteristics and clinical assessments.


Introduction
Plain radiography is commonly used in the diagnosis of osteoarthritis (OA) because it is cheap, fast, and widely available. Both clinical practice and the majority of research studies in OA have traditionally concentrated on the tibiofemoral (TF) joint, and frontal plane radiography (postero-anterior (PA) view) is routinely used to evaluate it. However, the patellofemoral (PF) joint is the compartment most frequently affected by OA, yet it often remains unrecognized 1 . Moreover, patellofemoral osteoarthritis (PFOA) is both highly prevalent 2,3 and clinically important, because it is more strongly associated with knee OA symptoms than tibiofemoral OA 4 . PFOA can occur in the absence of tibiofemoral OA as well as in conjunction with it 5 . In fact, some studies suggest that OA is more likely to start in the patellofemoral joint and only then extend to the tibiofemoral joint [6][7][8] . Several studies have found that radiographic PFOA cannot be identified using only a subject's characteristics and clinical assessments; therefore, imaging data are needed for the diagnosis of PFOA 9-12 .
However, the patellofemoral joint cannot be evaluated from the most commonly used frontal plane radiographs. Consequently, previous studies have suggested that the PF joint should routinely be considered in knee OA studies by obtaining multiple radiographic views of the knee 13,14 ; otherwise, 4-7% of OA cases would be missed 15 .
Patellofemoral OA may be an indicator of an early disease process and therefore a possible target for early intervention 8,16 , which is one of the high-priority research areas of The European League Against Rheumatism (EULAR) 13 . Additionally, identifying PFOA is important for surgical approaches and rehabilitative treatments 17 . However, there is a lack of consistency among researchers and clinicians in grading patellofemoral joint OA status 5 . Since clinical features cannot be used to diagnose PFOA 9-12 , clear diagnostic guidelines are missing 17 . Thus, an accurate analysis of PFOA from imaging data is of high importance to better understand OA and especially its early stages. In the current study, we describe the first fully automated method to detect PFOA from knee lateral view radiographs.


Data
Knees with missing data (radiographs, PFOA status), non-standard Kellgren-Lawrence (KL) scores, and low confidence of the patellar region-of-interest (ROI) detector were excluded ( Figure 2). As such, the final subset for assessing PFOA status within the period from baseline to 84 months included 18,436 knees from 2,803 subjects (Table 1).

Automatic Detection of Patellar Region-of-Interest
Prior to extraction of the ROIs, the 16-bit DICOM images were normalized using global contrast normalization with histogram truncation between the 5th and 99th percentiles, and then converted to 8-bit images (0-255 grayscale range). The image spatial resolution, which was not standardized in the database, was standardized to 0.2 mm using bicubic interpolation. Right knee images were then horizontally flipped to match the left knee orientation.
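As a minimal sketch of the steps above (assuming NumPy/SciPy; the exact resampling routine used in the study is not specified):

```python
import numpy as np
from scipy.ndimage import zoom

def preprocess(img16, pixel_spacing, target_spacing=0.2, flip=False):
    img = img16.astype(np.float32)
    # Truncate the histogram between the 5th and 99th percentiles.
    lo, hi = np.percentile(img, (5, 99))
    img = np.clip(img, lo, hi)
    # Global contrast normalization to the [0, 1] range.
    img = (img - lo) / max(hi - lo, 1e-8)
    # Standardize pixel spacing to 0.2 mm with cubic-order spline
    # interpolation (a stand-in for the bicubic interpolation in the text).
    img = zoom(img, pixel_spacing / target_spacing, order=3)
    # Convert to an 8-bit image (0-255 grayscale range).
    img8 = np.clip(img * 255.0, 0, 255).astype(np.uint8)
    # Right knee images are horizontally flipped to left-knee orientation.
    return np.fliplr(img8) if flip else img8
```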
A state-of-the-art CNN-based object detection algorithm based on the Faster R-CNN design 18,19 was used to automatically detect the patellar ROI from lateral view radiographs. 596 knee radiographs were manually annotated to train the model, using rectangular ROIs that cover the patella. We initialized the weights from backbone models pre-trained on COCO, a large-scale object detection dataset (Supplementary Figures S1 and S2). By setting a minimum threshold of 90% certainty, only 31 patellar ROIs out of 18,467 knees (0.17%) were missed.

Predicting Patellofemoral Osteoarthritis Status Using Deep CNN
We used the patellar ROI for predicting the PFOA status using a second deep CNN ( Figure 1). Our CNN model consists of three convolutional layers. Each convolutional layer (stride = 1, padding = 1) is followed by batch normalization (BN), max pooling (2 × 2), and ReLU. Two fully connected layers make the prediction, with a dropout of 0.5 after the first fully connected layer.
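The architecture described above can be sketched in PyTorch. The channel widths (32/64/128) and the hidden size of the first fully connected layer are assumptions, as they are not given in the text:

```python
import torch
import torch.nn as nn

class PatellaCNN(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        def block(cin, cout):
            # conv (stride 1, padding 1) -> BN -> 2x2 max pool -> ReLU
            return nn.Sequential(
                nn.Conv2d(cin, cout, kernel_size=3, stride=1, padding=1),
                nn.BatchNorm2d(cout),
                nn.MaxPool2d(2),
                nn.ReLU(inplace=True),
            )
        self.features = nn.Sequential(block(1, 32), block(32, 64),
                                      block(64, 128))
        # A 128x64 input is halved three times -> 16x8 feature maps.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 16 * 8, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),  # dropout after the first fully connected layer
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```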
We trained the models from scratch (end-to-end) using random weight initialization. Pre-trained models were not utilized due to the custom size of our input (the patellar ROI). An input image size of 128 × 64 was used, and the models were trained with stochastic gradient descent on a GPU using a mini-batch of 64 images, a momentum of 0.9, and no weight decay. The learning rate started at 0.001 and was divided by 10 every 8 epochs, and the models were trained for 20 epochs.
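This setup maps directly onto a standard PyTorch optimizer and step scheduler; the loop below is a minimal sketch (the loss function, assumed here to be cross-entropy, is not named in the text):

```python
import torch
import torch.nn as nn

def make_optimizer_and_scheduler(model):
    # SGD with momentum 0.9 and no weight decay; the learning rate
    # starts at 0.001 and is divided by 10 every 8 epochs.
    opt = torch.optim.SGD(model.parameters(), lr=1e-3,
                          momentum=0.9, weight_decay=0.0)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=8, gamma=0.1)
    return opt, sched

def train(model, loader, epochs=20, device="cpu"):
    # Mini-batches of 64 images would come from `loader`; the loss
    # function here is an assumption.
    model.to(device).train()
    opt, sched = make_optimizer_and_scheduler(model)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x.to(device)), y.to(device))
            loss.backward()
            opt.step()
        sched.step()
    return model
```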

Reference Models
We compared our CNN method with more conventional machine-learning-based prediction models using the clinical data. A hyperparameter search 23 was performed to find the optimal parameters of these models. We also analysed the feature importance of the reference models using the SHAP library 24 .
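Such a reference model can be sketched with scikit-learn's gradient boosting classifier. The feature list mirrors the clinical reference model (age, sex, BMI, WOMAC, KL grade); the hyperparameter grid is illustrative only, and impurity-based importances stand in for the SHAP analysis used in the study:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Clinical features of the reference model (column order is illustrative).
FEATURES = ["age", "sex", "bmi", "womac_total", "kl_grade"]

def fit_reference_model(X, y):
    # Illustrative hyperparameter grid; the study's search space is not
    # reproduced here.
    grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}
    search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                          grid, scoring="roc_auc", cv=5)
    search.fit(X, y)
    return search.best_estimator_

def feature_importance(model):
    # Impurity-based importances as a simple stand-in for SHAP values.
    return dict(zip(FEATURES, model.feature_importances_))
```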

Statistical Analyses
In order to obtain an unbiased estimate of future performance, subject-wise stratified 5-fold cross-validation was used, so that all knees of a given subject were assigned to the same fold.

Cross-validation Results
The performances of all the models are summarized in Figure 3. We obtained a statistically significant performance difference in AUC between the image-based CNN model and the reference models (DeLong's test). These results are shown in Table 2 and in Supplementary Figure S4.
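The ROC AUC and AP metrics can be computed with scikit-learn in a stratified 5-fold setting. This sketch uses plain StratifiedKFold; the subject-wise grouping used in the study, which keeps all knees of a subject in the same fold, is omitted for brevity:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score
from sklearn.model_selection import StratifiedKFold

def cross_validated_scores(model_factory, X, y, n_splits=5):
    # Mean ROC AUC and AP (area under the precision-recall curve)
    # over stratified folds.
    aucs, aps = [], []
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    for tr, te in skf.split(X, y):
        model = model_factory().fit(X[tr], y[tr])
        p = model.predict_proba(X[te])[:, 1]
        aucs.append(roc_auc_score(y[te], p))
        aps.append(average_precision_score(y[te], p))
    return float(np.mean(aucs)), float(np.mean(aps))
```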

Multi-modal Model: Combination of CNN Model, Clinical Features and Patient Characteristics
We also developed a GBM model that combines the prediction of the CNN model (the probability of PFOA) with age, sex, BMI, WOMAC, and KL grade.
We used the same 5-fold stratified cross-validation setup. However, this fusion did not yield any performance increase compared to the image-based CNN model alone (Supplementary Figure S5).
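The multi-modal fusion can be sketched as a GBM over the concatenated features; the hyperparameters here are illustrative, not those of the study:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def fit_fusion_model(cnn_prob, clinical, y):
    # Concatenate the CNN's PFOA probability with the clinical features
    # (age, sex, BMI, WOMAC, KL grade) and fit a GBM on top.
    X = np.column_stack([cnn_prob, clinical])
    return GradientBoostingClassifier(n_estimators=100, max_depth=3,
                                      random_state=0).fit(X, y)
```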

Discussion
In this study, we developed a deep learning method for the assessment of radiographic patellofemoral OA status from knee lateral view radiographs and assessed the ability of deep imaging features to predict PFOA. The trained models were evaluated in a subject-wise stratified cross-validation setting to assess their robustness. The discriminative ability of the final model was high (AUC 0.958).
Given the high prevalence of PFOA 2,3 , there is a need to also consider the patellofemoral joint in knee OA research and clinical settings. In the earlier literature, prediction models based on clinical features and patient characteristics have been studied for radiographic PFOA 9-12 . They all reached the same conclusion: a confident diagnosis of radiographic PFOA is not possible with clinical signs and patient characteristics alone, and thus imaging is necessary to confirm the diagnosis. In our experiments, we likewise found that the diagnostic accuracy of such models was only modest. To the best of our knowledge, this is the first study to automatically detect radiographic PFOA from imaging data. Therefore, we believe that it adds a new tool for early OA diagnostics, since the disease often starts in the patellofemoral joint 6-8 .
To assess the potential bias of the trained CNN model, we stratified the data according to the stage of tibiofemoral OA (KL grade) and pain level (WOMAC). ROC AUC values among the different groups were similar, whereas the AP score increased with more severe pain. This could be an indication of an association between high pain and PFOA, which has been reported previously [27][28][29] , and our CNN model may capture some symptom-related features from the image data. It is also notable that the combination of patient characteristics and clinical features did not improve the CNN model's performance further, and the feature importance analysis of the multi-modal model (Supplementary Figure S3) showed that the CNN's image-based predictions had the strongest impact on the output.
A major limitation of this study is that we used data from the Multicenter Osteoarthritis Study (MOST) alone. The generalizability of our approach could have been established more convincingly with two independent datasets for training and testing. However, since PFOA is largely unrecognized, there is a lack of available datasets that allow the evaluation of PFOA detection from lateral (or skyline) view radiographs. Therefore, we had to use a stratified cross-validation setup with the MOST data. Another limitation of the study concerns model explanation. While we provide the first results in automatic radiographic PFOA prediction from imaging data, we did not attempt to characterize the CNN model's "black box" behavior. Despite efforts like attention maps 30 , such post-hoc visualization methods do not explain the reasoning process by which a network actually makes its decisions.
Therefore, further work is needed for interpreting a deep neural network model and explaining its predictions in the context of PFOA.
In conclusion, this study demonstrated the first results for automatic detection of radiographic PFOA from knee lateral view radiographs using deep learning. Our model had superior discriminative ability over models using patient characteristics and clinical assessments, and could be valuable when building prediction tools for early OA, as well as for surgical approaches and rehabilitative treatments.