A big data approach for selection of a large osteoarthritis cohort

      Purpose: The Arthritis Foundation initiated this demonstration project with the long-term goal of implementing an innovative big data approach to recruitment for osteoarthritis (OA) clinical trials. A major challenge in OA clinical trials is correctly classifying the OA phenotype for each patient. This abstract describes the first step in this project: identifying and validating a large cohort of OA patients. The data were obtained from the national clinical repository of the Veterans Health Administration (VHA). The VHA is an integrated health care system consisting of roughly 150 hospitals, 800 community-based outpatient clinics, and 50,000 providers. A single electronic health record system is used to capture a diverse collection of clinical data, including demographics, diagnostic codes, outpatient visits, hospital admissions, physician orders, vital signs, laboratory testing, pharmacy data, health screening, progress notes, and radiology reports.
      Methods: The data for our cohort were obtained from the VA Informatics and Computing Infrastructure (VINCI), which maintains the national VA clinical repository and makes these data available to scientists within the VHA system. Institutional Review Board (IRB) approval was obtained through the University of Maryland School of Medicine and the Baltimore VA Medical Center. Funding and scientific resources were provided by the Arthritis Foundation (USA), with additional resources provided by the VHA.
      Our inclusion criteria were based on clinical diagnostic guidelines from the American College of Rheumatology (ACR), but were simplified so that the task could be completed with the available data resources. We started with an initial cohort of people with a diagnosis of OA (ICD-9 code 715) who were treated within the VHA system between January 1, 2000 and December 31, 2014. We then subclassified the cohort, focusing on OA of the knee and hip. For knee OA, we included people with a diagnosis of OA (ICD-9 code 715) who were at least 50 years old and who had been treated at least once for knee pain (ICD-9 code 719.46). For hip OA, we included people with a diagnosis of OA (ICD-9 code 715) who never had an erythrocyte sedimentation rate of 20 mm/h or above and who had been treated at least once for hip pain (ICD-9 code 719.45). We validated the cohort through a manual review of 40 charts, with half of the charts randomly selected from the identified OA cohort and half randomly selected from among those excluded from the cohort.
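      The simplified knee and hip criteria above can be sketched as a pair of rule-based checks. This is a minimal illustration only, not the project's actual VINCI query: the `Patient` record, its field names, and the helper functions are hypothetical, the 2000 to 2014 treatment window is not modeled, and the abstract does not say at which date age was assessed.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Patient:
    """Hypothetical flattened patient record (field names are assumptions)."""
    age: int                                   # age in years at assessment
    icd9_codes: Set[str]                       # all ICD-9 codes ever recorded
    esr_values: List[float] = field(default_factory=list)  # ESR results, mm/h


def has_oa_diagnosis(p: Patient) -> bool:
    # ICD-9 715 and any of its subcodes (e.g. 715.16) count as an OA diagnosis.
    return any(c == "715" or c.startswith("715.") for c in p.icd9_codes)


def meets_knee_oa_criteria(p: Patient) -> bool:
    # OA diagnosis, age of at least 50, and at least one knee pain code.
    return has_oa_diagnosis(p) and p.age >= 50 and "719.46" in p.icd9_codes


def meets_hip_oa_criteria(p: Patient) -> bool:
    # OA diagnosis, no ESR result of 20 or above, and at least one hip pain code.
    # A patient with no ESR results at all passes the ESR check in this sketch.
    never_high_esr = all(v < 20 for v in p.esr_values)
    return has_oa_diagnosis(p) and never_high_esr and "719.45" in p.icd9_codes


if __name__ == "__main__":
    example = Patient(age=62, icd9_codes={"715.16", "719.46"}, esr_values=[12.0])
    print(meets_knee_oa_criteria(example), meets_hip_oa_criteria(example))
```

      How patients without any recorded ESR should be handled is a design choice the abstract does not specify; the sketch treats them as eligible.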
      Results: A total of 12,064,025 clinical records were available for this research. The identified OA cohort included 1,147,535 patients, of whom 1,073,169 (94%) were male. The median age was 56 (interquartile range 51–68). There were 755,010 (66%) Caucasian, 177,745 (16%) African American, 51,826 (4%) Hispanic, 4,822 (<1%) Asian, and 158,132 (14%) of unknown race/ethnicity. Within the cohort, 11,976 (1%) had a history of a lower joint replacement. Our manual chart review found 20 true positives, 13 true negatives, 7 false negatives, and no false positives, resulting in a sensitivity of 74% and a specificity of 100%.
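      The reported sensitivity and specificity follow directly from the chart-review counts. The short sketch below reproduces the arithmetic using the standard definitions; the function names are illustrative only.

```python
def sensitivity(tp: int, fn: int) -> float:
    # Share of true OA cases that the selection criteria captured.
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    # Share of non-OA cases that the selection criteria correctly excluded.
    return tn / (tn + fp)


# Counts from the manual review of 40 charts reported in the abstract.
tp, tn, fn, fp = 20, 13, 7, 0

sens = sensitivity(tp, fn)   # 20 / 27
spec = specificity(tn, fp)   # 13 / 13

print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
# prints "sensitivity = 74%, specificity = 100%"
```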
      Between 2000 and 2014, the cohort accounted for 5,980,233 primary care visits, 884,882 rheumatology clinic visits, and 4,649,522 emergency medical visits within the VHA system. The five most frequent ICD-9 diagnoses across all clinic and emergency medical visits were hypertension (719,558 visits), hyperlipidemia (372,900 visits), diabetes (348,787 visits), lower back pain (222,397 visits), and joint pain (211,739 visits).
      Most patients in the cohort received prescriptions for NSAIDs (919,469, 79%), acetaminophen (874,721, 76%), and opiates (829,063, 72%). Fewer than half of the cohort received prescriptions for neuropathic pain medication (478,462, 42%), tramadol (431,338, 38%), or muscle relaxants (355,325, 31%). Over-the-counter use of acetaminophen and NSAIDs could not be ascertained.
      Conclusions: This demonstration project illustrates the feasibility of using a big data approach to select a large cohort of OA patients. This approach could improve our understanding of OA and holds promise for characterizing OA subphenotypes, developing personalized treatment strategies, and facilitating OA clinical trials. The next steps include combining clinical analytics with available genomic data and further validating this work in other datasets that include more women and greater demographic diversity.