Purpose In-training evaluation reports (ITERs) constitute an integral component of medical student and postgraduate physician trainee (resident) assessment. ITER narrative comments have received less attention than the numeric scores. The authors sought both to determine what validity evidence informs the use of narrative comments from ITERs for assessing medical students and residents and to identify evidence gaps. Method Reviewers searched for relevant English-language studies in MEDLINE, EMBASE, Scopus, and ERIC (last search June 5, 2015), and in reference lists and author files. They included all original studies that evaluated ITERs for qualitative assessment of medical students and residents. Working in duplicate, they selected articles for inclusion, evaluated quality, and abstracted information on validity evidence using Kane's framework (inferences of scoring, generalization, extrapolation, and implications). Results Of 777 potential articles, 22 met inclusion criteria. The scoring inference is supported by studies showing that rich narratives are possible, that changing the prompt can stimulate more robust narratives, and that comments vary by context. Generalization is supported by studies showing that narratives reach thematic saturation and that analysts make consistent judgments. Extrapolation is supported by favorable relationships between ITER narratives and numeric scores from ITERs and non-ITER performance measures, and by studies confirming that narratives reflect constructs deemed important in clinical work. Evidence supporting implications is scant. Conclusions The use of ITER narratives for trainee assessment is generally supported, except that evidence is lacking for implications and decisions. Future research should seek to confirm implicit assumptions and evaluate the impact of decisions.
ASJC Scopus subject areas