Abstract | Volume 30, Supplement 1, S22, April 2022


      Purpose: The need for health systems to shift from late, reactive care of osteoarthritis (OA) to earlier, preventative strategies is widely acknowledged. Rigorously derived and validated models capable of predicting future individual-level risk of OA incidence in the general population, based on affordable and easily accessible sources of data, could play an important role in this endeavour. Our objective was to critically synthesise published evidence on the performance of multivariable prediction models for OA incidence and their applicability to large-scale use in the general population.
      Methods: For this systematic review with narrative synthesis, we searched MEDLINE, EMBASE and Web of Science from inception to November 2020, and supplemented this with reference list screening, citation searches, and hand-searches. We included longitudinal studies conducted in a general population sample that reported the derivation, comparison, or validation of a multivariable prediction model to predict individual risk of future OA incidence, defined by recognised clinical or imaging criteria. We excluded studies reporting prognostic models in populations with prevalent OA at baseline and those with joint arthroplasty as the sole outcome. Pairs of reviewers independently performed article selection, data extraction, and risk of bias assessment using PROBAST. We summarised evidence on model performance and calibration, and described the types of predictors included in final models and how they were assessed. Our review was prospectively registered on PROSPERO (CRD42020220446).
      Results: Of 6,462 records identified, 21 original research articles published between 2010 and 2020 were eligible and included (Table 1). From these we extracted data on 26 final multivariable prediction models for incident knee OA (18), hip OA (4), hand OA (3), and any-site OA (1). The most common outcome was incident OA defined by plain radiography. Other outcomes included first OA diagnosis in the electronic health record, symptomatic radiographic OA, frequent pain in the target joint, and American College of Rheumatology clinical classification criteria. The median prediction horizon was 8 years (range 2 to 41 years), the median number of participants/joints with the outcome of interest was 99 (range 27 to 12,803), and the median number of predictors included in the final models was 5.5 (range 3 to 13). Models used multiple modes of assessment for predictors: self-report (25 models), physical examination (21), imaging (12), urinary/serum biomarkers (6), and electronic health record (2). Age, body mass index (BMI), previous injury, and (occupational) physical exposures were commonly included predictors, but predictor measurement was heterogeneous within most domains and the majority of predictor variables were included in only a single final model. Beyond age, sex, and occupational exposures, educational level was the only other social stratifier included, appearing in 5 final models. No final models included race/ethnicity, indicators of individual socioeconomic position, or measures of area-level deprivation. All except 3 final models used internal validation, e.g. cross-validation or bootstrapping (14), external validation in a separate cohort (7), or both (2). Model performance for 25 of the 26 models was reported as the area under the curve (AUC). Median AUC for knee, hip, and hand OA models was 0.72, 0.76, and 0.62, respectively. The one model for any-site OA had an AUC of 0.84. All but one model was judged to have high overall risk of bias. Common reasons for this were the use of univariate analysis in predictor selection and a lack of accounting for competing risks. Many models also lacked clear applicability for large-scale use in the general population, for instance by requiring imaging or by restricting model derivation to a specific subpopulation.
      Conclusions: Of the 21 studies included in our review, 15 were published within the past 4 years, suggesting increasing interest among researchers in predicting individual-level risk for OA. The widespread use of internal and external validation is encouraging, and in general the level of discrimination appears comparable to that of established risk prediction models for cardiovascular outcomes. However, models published to date remain heavily focussed on knee OA and have relied on a relatively small number of underlying cohort datasets. A relative lack of OA-relevant predictors and outcomes recorded in routine datasets may be one reason for this. Furthermore, our systematic review highlights common shortfalls in applicability, suggesting that many models are not designed (nor yet intended) for mass application. Future studies could be enhanced by the standard inclusion of key social stratifiers (e.g. race/ethnicity) to explicitly encourage an equity lens in this field, and by greater patient, public, and stakeholder involvement with a view to clarifying and strengthening intended ‘real-world’ application.