Introduction: We sought to describe a crowdsourcing methodology for obtaining quantitative performance ratings of surgeons performing renal artery and vein dissection during robotic partial nephrectomy (RPN), and to compare the crowdworkers' assessments of technical performance with those of surgical content experts (CE). We hypothesized that the crowd could score performances of renal hilar dissection comparably to surgical CE using the Global Evaluative Assessment of Robotic Skills (GEARS).

Methods: A group of resident and attending robotic surgeons submitted a total of 14 video clips of hilar dissection during RPN. Both crowdworkers and CE rated the technical skill shown in these videos using GEARS. Each video was evaluated by a minimum of 3 CE and 30 Amazon Mechanical Turk crowdworkers.

Results: We received GEARS ratings of all videos from all CE within 13 days, and 548 GEARS ratings from crowdworkers within 11.5 hours. Even though the CE were exposed to a training module, the internal consistency of their GEARS ratings across videos remained low (intraclass correlation coefficient [ICC] = 0.38). Despite this, crowdworker GEARS ratings were highly correlated with CE ratings at both the video level (R = 0.82, p < 0.001) and the surgeon level (R = 0.84, p < 0.001). Similarly, crowdworker ratings on the surgery-specific assessment question for renal artery dissection were highly correlated with expert assessments (R = 0.83, p < 0.001).

Conclusions: Crowdsourced assessment of surgical performance may be an alternative and/or adjunct to surgical experts' ratings and would provide a rapid, scalable solution for triaging technical skills.
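The abstract reports two kinds of statistics: inter-rater consistency of the expert ratings (ICC) and video-level agreement between crowd and expert scores (Pearson R). The sketch below illustrates how such an analysis could be run; it is not the authors' code. It assumes a complete ratings matrix (every expert scored every video), uses ICC(2,1) as one plausible ICC form (the abstract does not specify which was used), and fills in synthetic stand-in data for the 14 videos.

```python
# Minimal sketch (not the authors' analysis) of the two statistics the
# abstract reports: expert inter-rater consistency via ICC(2,1), and
# video-level crowd-vs-expert agreement via Pearson correlation.
import numpy as np
from scipy.stats import pearsonr

def icc_2_1(ratings: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is an (n_videos, k_raters) matrix; assumes every rater
    scored every video, which may differ from the study's actual design.
    """
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)  # per-video means
    col_means = ratings.mean(axis=0)  # per-rater means

    ss_total = ((ratings - grand) ** 2).sum()
    ss_rows = k * ((row_means - grand) ** 2).sum()  # between videos
    ss_cols = n * ((col_means - grand) ** 2).sum()  # between raters
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Shrout & Fleiss ICC(2,1) formula.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

rng = np.random.default_rng(0)
# Hypothetical data: 14 videos, 3 experts, GEARS-like scores on a 1-5 scale.
expert = rng.integers(2, 6, size=(14, 3)).astype(float)
# Hypothetical per-video mean crowd scores (each averaged over >=30 workers).
crowd_means = expert.mean(axis=1) + rng.normal(0, 0.3, size=14)

print(f"Expert ICC(2,1): {icc_2_1(expert):.2f}")
r, p = pearsonr(crowd_means, expert.mean(axis=1))
print(f"Video-level Pearson R = {r:.2f}, p = {p:.3g}")
```

A surgeon-level correlation like the reported R = 0.84 would follow the same pattern, averaging each rater group's scores over all of a surgeon's videos before computing the correlation.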