Background: Periprosthetic joint infection (PJI) data elements are contained in both structured and unstructured documents in electronic health records and require manual data collection. The goal of this study is to develop a natural language processing (NLP) algorithm to replicate manual chart review for PJI data elements. Methods: PJI was identified among all total joint arthroplasty (TJA) procedures performed at a single academic institution between 2000 and 2017. Data elements that comprise the Musculoskeletal Infection Society (MSIS) criteria were manually extracted and used as the gold standard for validation. A training sample of 1208 TJA surgeries (170 PJI cases) was randomly selected to develop the prototype NLP algorithms and an additional 1179 surgeries (150 PJI cases) were randomly selected as the test sample. The algorithms were applied to all consultation notes, operative notes, pathology reports, and microbiology reports to predict the correct status of PJI based on MSIS criteria. Results: The algorithm, which identified patients with PJI based on MSIS criteria, achieved an f1-score (harmonic mean of precision and recall) of 0.911. Algorithm performance in extracting the presence of sinus tract, purulence, pathologic documentation of inflammation, and growth of cultured organisms from the involved TJA achieved f1-scores that ranged from 0.771 to 0.982, sensitivity that ranged from 0.730 to 1.000, and specificity that ranged from 0.947 to 1.000. Conclusion: NLP-enabled algorithms have the potential to automate data collection for PJI diagnostic elements, which could directly improve patient care and augment cohort surveillance and research efforts. Further validation is needed in other hospital settings. Level of Evidence: Level III, Diagnostic.
- artificial intelligence
- electronic health records
- natural language processing
- periprosthetic joint infection
- total joint arthroplasty
ASJC Scopus subject areas
- Orthopedics and Sports Medicine