Learning semantic and visual similarity for endomicroscopy video retrieval

Barbara Andre, Tom Vercauteren, Anna M. Buchner, Michael B. Wallace, Nicholas Ayache

Research output: Contribution to journal › Article

58 Citations (Scopus)

Abstract

Content-based image retrieval (CBIR) is a valuable computer vision technique which is increasingly being applied in the medical community for diagnosis support. However, traditional CBIR systems deliver only visual outputs, i.e., images with an appearance similar to the query's, which are not directly interpretable by physicians. Our objective is to provide a system for endomicroscopy video retrieval that delivers visual and semantic outputs consistent with each other. In a previous study, we developed an adapted bag-of-visual-words method for endomicroscopy retrieval, called Dense-Sift, which computes a visual signature for each video. In this paper, we present a novel approach that complements visual similarity learning with semantic knowledge extraction in the field of in vivo endomicroscopy. We first leverage a semantic ground truth based on eight binary concepts to transform these visual signatures into semantic signatures that reflect how strongly each semantic concept is expressed by the visual words describing the videos. Using cross-validation, we demonstrate that, in terms of semantic detection, our intuitive Fisher-based method for transforming visual-word histograms into semantic estimations outperforms support vector machine (SVM) methods with statistical significance. In a second step, we propose to improve retrieval relevance by learning an adjusted similarity distance from a perceived-similarity ground truth. Our distance learning method yields a statistically significant improvement in the correlation with the perceived similarity. We also demonstrate that, in terms of perceived similarity, the recall performance of the semantic signatures is close to that of the visual signatures and significantly better than that of several state-of-the-art CBIR methods. The semantic signatures are thus able to communicate high-level medical knowledge while remaining consistent with the low-level visual signatures and much more compact than them.
In our resulting retrieval system, we therefore use visual signatures for perceived similarity learning and retrieval, and semantic signatures to output additional information, expressed in the endoscopist's own language, that provides a relevant semantic translation of the visual retrieval outputs.
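The signature pipeline the abstract outlines — a per-video visual-word histogram projected onto a small set of binary semantic concepts — can be sketched as follows. This is an illustrative sketch only: the vocabulary size, the synthetic data, and the simple co-occurrence weighting are assumptions standing in for the paper's Dense-Sift signatures and Fisher-based estimation, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

N_WORDS = 100     # visual vocabulary size (hypothetical)
N_CONCEPTS = 8    # the abstract mentions eight binary semantic concepts

def visual_signature(word_ids, n_words=N_WORDS):
    """Normalized histogram of visual-word occurrences for one video."""
    hist = np.bincount(word_ids, minlength=n_words).astype(float)
    return hist / max(hist.sum(), 1.0)

def concept_expression(train_signatures, train_labels):
    """For each visual word, estimate how strongly each concept is expressed
    when that word occurs (a plain co-occurrence average, standing in for the
    paper's Fisher-based estimation).

    train_signatures: (n_videos, n_words) histograms
    train_labels:     (n_videos, n_concepts) binary concept annotations
    """
    word_mass = train_signatures.sum(axis=0, keepdims=True).T   # (n_words, 1)
    cooc = train_signatures.T @ train_labels                    # (n_words, n_concepts)
    return cooc / np.maximum(word_mass, 1e-12)

def semantic_signature(visual_sig, expression):
    """Project a visual-word histogram onto the concept space."""
    return visual_sig @ expression                              # (n_concepts,)

# Tiny synthetic demo: 20 "videos" with random words and random concept labels.
train_sigs = np.stack(
    [visual_signature(rng.integers(0, N_WORDS, 500)) for _ in range(20)]
)
train_labels = rng.integers(0, 2, size=(20, N_CONCEPTS)).astype(float)
E = concept_expression(train_sigs, train_labels)

query_sig = visual_signature(rng.integers(0, N_WORDS, 500))
sem_sig = semantic_signature(query_sig, E)
print(sem_sig.shape)  # (8,) — far more compact than the 100-bin visual signature
```

Each entry of the resulting semantic signature is a value in [0, 1] expressing how much the query's visual words co-occur with one concept in the training annotations, which mirrors the abstract's point that the eight-dimensional semantic signature is much shorter than the visual one while staying consistent with it.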

Original language: English (US)
Article number: 6153380
Pages (from-to): 1276-1288
Number of pages: 13
Journal: IEEE Transactions on Medical Imaging
Volume: 31
Issue number: 6
DOIs: 10.1109/TMI.2012.2188301
State: Published - 2012

Keywords

  • Bag-of-visual-words (BoW)
  • content-based image retrieval (CBIR)
  • endomicroscopy
  • semantic and visual similarity
  • semantic gap
  • similarity learning

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Computer Science Applications
  • Radiological and Ultrasound Technology
  • Software

Cite this

Andre, B., Vercauteren, T., Buchner, A. M., Wallace, M. B., & Ayache, N. (2012). Learning semantic and visual similarity for endomicroscopy video retrieval. IEEE Transactions on Medical Imaging, 31(6), 1276-1288. [6153380]. https://doi.org/10.1109/TMI.2012.2188301

@article{eb2dea1b8b654dfbbebb163d15c99bb7,
title = "Learning semantic and visual similarity for endomicroscopy video retrieval",
keywords = "Bag-of-visual-words (BoW), content-based image retrieval (CBIR), endomicroscopy, semantic and visual similarity, semantic gap, similarity learning",
author = "Barbara Andre and Tom Vercauteren and Buchner, {Anna M.} and Wallace, {Michael B.} and Nicholas Ayache",
year = "2012",
doi = "10.1109/TMI.2012.2188301",
language = "English (US)",
volume = "31",
pages = "1276--1288",
journal = "IEEE Transactions on Medical Imaging",
issn = "0278-0062",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "6",

}

TY - JOUR

T1 - Learning semantic and visual similarity for endomicroscopy video retrieval

AU - Andre, Barbara

AU - Vercauteren, Tom

AU - Buchner, Anna M.

AU - Wallace, Michael B.

AU - Ayache, Nicholas

PY - 2012

Y1 - 2012

KW - Bag-of-visual-words (BoW)

KW - content-based image retrieval (CBIR)

KW - endomicroscopy

KW - semantic and visual similarity

KW - semantic gap

KW - similarity learning

UR - http://www.scopus.com/inward/record.url?scp=84861873452&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84861873452&partnerID=8YFLogxK

U2 - 10.1109/TMI.2012.2188301

DO - 10.1109/TMI.2012.2188301

M3 - Article

C2 - 22353403

AN - SCOPUS:84861873452

VL - 31

SP - 1276

EP - 1288

JO - IEEE Transactions on Medical Imaging

JF - IEEE Transactions on Medical Imaging

SN - 0278-0062

IS - 6

M1 - 6153380

ER -