A randomized double-blind controlled trial of automated term dissection.

P. L. Elkin, Kent R Bailey, P. V. Ogren, Brent A Bauer, C. G. Chute

Research output: Contribution to journal (Article)

9 Citations (Scopus)

Abstract

OBJECTIVE: To compare the accuracy of an automated mechanism for term dissection, which represents the semantic dependencies within a compositional expression, with the accuracy of a practicing Internist performing the same task. We also compare the results of four evaluators to determine the inter-observer variability and the variance between term sets, with respect to the accuracy of the mappings and the consistency of the failure analysis.
METHODS: 500 terms, which required a compositional expression to effect an exact match, were randomly distributed into two sets of 250 terms (Set A and Set B). Set A was dissected using the Automated Term Dissection (ATD) Algorithm. A physician specializing in Internal Medicine dissected Set B. He had no prior knowledge of the dissection algorithm or how it functioned. In this manuscript, the authors use Human Term Dissection (HTD) to refer to this method. Set A was randomized to two sets of 125 terms (Set A1 and Set A2). Set B was randomized to two sets of 125 terms (Set B1 and Set B2). A new set of 250 terms, Set C, was created from Set A1 and Set B2. A second new set of 250 terms, Set D, was created from Set A2 and Set B1. Two expert Indexers reviewed Set C and another two expert Indexers reviewed Set D. They were blinded to which terms were dissected by the clinician and which terms were dissected by the automated term dissection algorithm. The person providing the files for review to the Indexers was also unaware of which terms were dissected by the ATD method and which by the HTD method. The Indexers recorded whether or not the dissection was the best possible representation of the input concept. If not, a failure analysis was conducted. They recorded whether or not the dissection was in error and, if so, whether a modifier was not subsumed or a Kernel concept was subsumed when it should not have been. If a concept was missing, the Indexers recorded whether it was a Kernel concept, a modifier, a qualifier or a negative qualifier.
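
The set construction described in METHODS can be sketched as follows. This is a minimal illustration only: the function name `build_review_sets`, the fixed seed, and the `(term, method)` labelling are assumptions made for the example and are not taken from the paper, which does not describe how the randomization was implemented.

```python
import random


def build_review_sets(terms, seed=0):
    """Split terms into Sets A and B, halve each, and cross the halves into C and D.

    Hypothetical helper: returns two lists of (term, method) pairs, where the
    method label ("ATD" or "HTD") would be withheld from the Indexers.
    """
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    shuffled = list(terms)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    set_a, set_b = shuffled[:half], shuffled[half:]  # A goes to ATD, B to HTD

    def halve(items):
        # Randomize one set into two equally sized subsets.
        items = list(items)
        rng.shuffle(items)
        mid = len(items) // 2
        return items[:mid], items[mid:]

    a1, a2 = halve(set_a)
    b1, b2 = halve(set_b)

    # Set C = A1 + B2 and Set D = A2 + B1, so each review set mixes both methods.
    set_c = [(t, "ATD") for t in a1] + [(t, "HTD") for t in b2]
    set_d = [(t, "ATD") for t in a2] + [(t, "HTD") for t in b1]
    rng.shuffle(set_c)  # mix the order so position does not reveal the method
    rng.shuffle(set_d)
    return set_c, set_d


terms = [f"term{i}" for i in range(500)]  # placeholder term identifiers
set_c, set_d = build_review_sets(terms)
```

With 500 input terms, each review set holds 250 terms, 125 from each dissection method, which is the balance the crossover design relies on.
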
RESULTS: The ATD method was judged to be accurate and readable in 265 of the 424 terms with adequate content (62.7%). The HTD method was judged to be accurate in 272 of the 414 terms with adequate content (65.7%). There was no statistically significant difference between the rates of acceptability of the ATD and HTD methods (p = 0.33). There was a non-significant trend toward greater acceptability of the ATD method in the subgroup of terms with three or more compositional elements: ATD was acceptable in 53.6% of these terms, whereas HTD was acceptable in only 43.6% (p = 0.11). The failure analysis showed that both methods misrepresented kernel concepts and modifiers much more commonly than qualifiers (p < 0.001).
CONCLUSIONS: There is no statistically significant difference in the accuracy and readability of terms dissected using the automated term dissection method when compared with human term dissection, as judged by four expert medical indexers. There is a non-significant trend toward improved performance of the ATD method in the subset of more complex terms. The authors submit that this may be due to a tendency for users to be less compulsive when the time to complete the task is long. Automated term dissection is a useful and perhaps preferable method for representing readable and accurate compound terminological expressions.
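
The overall comparison in RESULTS can be approximated from the reported counts. The abstract does not say which statistical test the authors used, so the pooled two-proportion z-test below is an assumption chosen for illustration, and `two_proportion_z_test` is a hypothetical helper name. The proportions are recomputed from the counts rather than taken from the reported percentages.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test for the difference between two proportions.

    Assumed test for illustration; the source does not name the test used.
    Returns (z statistic, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Counts from the abstract: ATD 265 of 424, HTD 272 of 414.
z, p = two_proportion_z_test(265, 424, 272, 414)
print(f"z = {z:.2f}, p = {p:.2f}")
```

Under this assumed test the p-value lands close to the reported p = 0.33, which suggests the published comparison was based on the raw counts.
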

Original language: English (US)
Pages (from-to): 62-66
Number of pages: 5
Journal: Proceedings / AMIA ... Annual Symposium. AMIA Symposium
State: Published - 1999

Fingerprint

Dissection
varespladib methyl
Observer Variation
Internal Medicine
Semantics

Cite this

A randomized double-blind controlled trial of automated term dissection. / Elkin, P. L.; Bailey, Kent R; Ogren, P. V.; Bauer, Brent A; Chute, C. G.

In: Proceedings / AMIA ... Annual Symposium. AMIA Symposium, 1999, p. 62-66.

@article{759468b1aca34064b5a24b52e6d0782d,
title = "A randomized double-blind controlled trial of automated term dissection.",
abstract = "OBJECTIVE: To compare the accuracy of an automated mechanism for term dissection to represent the semantic dependencies within a compositional expression, with the accuracy of a practicing Internist to perform this same task. We also compare the results of four evaluators to determine the inter-observer variability and the variance between term sets, with respect to the accuracy of the mappings and the consistency of the failure analysis. METHODS: 500 terms, which required a compositional expression to effect an exact match, were randomly distributed into two sets of 250 terms (Set A and Set B). Set A was dissected using the Automated Term Dissection (ATD) Algorithm. A physician specializing in Internal Medicine dissected set B. He had no prior knowledge of the dissection algorithm or how it functioned. In this manuscript, the authors use Human Term Dissection (HTD) to refer to this method. Set A was randomized to two sets of 125 terms (Set A1 and Set A2). Set B was randomized to two sets of 125 terms (Set B1 and Set B2). A new set of 250 terms Set C was created from Set A1 and Set B2. A second new set of 250 terms Set D was created from Set A2 and Set B1. Two expert Indexers reviewed Set C and another two expert Indexers reviewed Set D. They were blinded to which terms were dissected by the clinician and which terms were dissected by the automated term dissection algorithm. The person providing the files for review to the Indexers was also unaware of which terms were dissected by ATD vs. the HTD method. The Indexers recorded whether or not the dissection was the best possible representation of the input concept. If not, a failure analysis was conducted. They recorded whether or not the dissection was in error and if so was a modifier not subsumed or was a Kernel concept subsumed when it should not have been. If a concept was missing, the Indexers recorded whether it was a Kernel concept, a modifier, a qualifier or a negative qualifier. 
RESULTS: The ATD method was judged to be accurate and readable in 265 out of the 424 terms with adequate content (62.7{\%}). The HTD method was judged to be accurate in 272 out of 414 terms with adequate content (65.7{\%}). There was no statistically significant difference between the rates of acceptability of the ATD and HTD methods (p = 0.33). There was a non-significant trend toward greater acceptability of the ATD method in the subgroup of terms with three or more compositional elements. ATD was acceptable in 53.6{\%} of the terms where the HTD was only acceptable in 43.6{\%} (p = 0.11). The failure analysis showed that both methods misrepresented kernel concepts and modifiers much more commonly than qualifiers (p < 0.001). CONCLUSIONS: There is no statistically significant difference in the accuracy and readability of terms dissected using the automated term dissection method when compared with human term dissection, as judged by four expert medical indexers. There is a non-significant trend toward improved performance of the ATD method in the subset of more complex terms. The authors submit that this may be due to a tendency for users to be less compulsive when the time to complete the task is long. Automated term dissection is a useful and perhaps preferable method for representing readable and accurate compound terminological expressions.",
author = "Elkin, {P. L.} and Bailey, {Kent R} and Ogren, {P. V.} and Bauer, {Brent A} and Chute, {C. G.}",
year = "1999",
language = "English (US)",
pages = "62--66",
journal = "Proceedings / AMIA ... Annual Symposium. AMIA Symposium",
issn = "1531-605X",
publisher = "Hanley \& Belfus",

}

TY - JOUR

T1 - A randomized double-blind controlled trial of automated term dissection.

AU - Elkin, P. L.

AU - Bailey, Kent R

AU - Ogren, P. V.

AU - Bauer, Brent A

AU - Chute, C. G.

PY - 1999

Y1 - 1999

N2 - OBJECTIVE: To compare the accuracy of an automated mechanism for term dissection to represent the semantic dependencies within a compositional expression, with the accuracy of a practicing Internist to perform this same task. We also compare the results of four evaluators to determine the inter-observer variability and the variance between term sets, with respect to the accuracy of the mappings and the consistency of the failure analysis. METHODS: 500 terms, which required a compositional expression to effect an exact match, were randomly distributed into two sets of 250 terms (Set A and Set B). Set A was dissected using the Automated Term Dissection (ATD) Algorithm. A physician specializing in Internal Medicine dissected set B. He had no prior knowledge of the dissection algorithm or how it functioned. In this manuscript, the authors use Human Term Dissection (HTD) to refer to this method. Set A was randomized to two sets of 125 terms (Set A1 and Set A2). Set B was randomized to two sets of 125 terms (Set B1 and Set B2). A new set of 250 terms Set C was created from Set A1 and Set B2. A second new set of 250 terms Set D was created from Set A2 and Set B1. Two expert Indexers reviewed Set C and another two expert Indexers reviewed Set D. They were blinded to which terms were dissected by the clinician and which terms were dissected by the automated term dissection algorithm. The person providing the files for review to the Indexers was also unaware of which terms were dissected by ATD vs. the HTD method. The Indexers recorded whether or not the dissection was the best possible representation of the input concept. If not, a failure analysis was conducted. They recorded whether or not the dissection was in error and if so was a modifier not subsumed or was a Kernel concept subsumed when it should not have been. If a concept was missing, the Indexers recorded whether it was a Kernel concept, a modifier, a qualifier or a negative qualifier. 
RESULTS: The ATD method was judged to be accurate and readable in 265 out of the 424 terms with adequate content (62.7%). The HTD method was judged to be accurate in 272 out of 414 terms with adequate content (65.7%). There was no statistically significant difference between the rates of acceptability of the ATD and HTD methods (p = 0.33). There was a non-significant trend toward greater acceptability of the ATD method in the subgroup of terms with three or more compositional elements. ATD was acceptable in 53.6% of the terms where the HTD was only acceptable in 43.6% (p = 0.11). The failure analysis showed that both methods misrepresented kernel concepts and modifiers much more commonly than qualifiers (p < 0.001). CONCLUSIONS: There is no statistically significant difference in the accuracy and readability of terms dissected using the automated term dissection method when compared with human term dissection, as judged by four expert medical indexers. There is a non-significant trend toward improved performance of the ATD method in the subset of more complex terms. The authors submit that this may be due to a tendency for users to be less compulsive when the time to complete the task is long. Automated term dissection is a useful and perhaps preferable method for representing readable and accurate compound terminological expressions.

UR - http://www.scopus.com/inward/record.url?scp=0033258137&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0033258137&partnerID=8YFLogxK

M3 - Article

C2 - 10566321

AN - SCOPUS:0033258137

SP - 62

EP - 66

JO - Proceedings / AMIA ... Annual Symposium. AMIA Symposium

JF - Proceedings / AMIA ... Annual Symposium. AMIA Symposium

SN - 1531-605X

ER -