MayoBMI at ImageCLEF 2016 handwritten document retrieval task

Sijia Liu, Yanshan Wang, Saeed Mehrabi, Dingcheng Li, Hongfang D Liu

Research output: Contribution to journal (Article)

Abstract

In this working note, we describe our participation in the ImageCLEF 2016 Handwritten Document Retrieval Task. We focused mainly on hyphenation detection using line images and on information retrieval using n-best results. The hyphenation detection step extracts image features from the beginning and end of each line and uses a binary classifier to determine whether the line contains hyphenation. A spelling correction step then eliminates spelling errors introduced by concatenating the broken word at the end of one line with the beginning of the next line. The final text retrieval step employs a suffix-stripping algorithm to normalize word tense and form, and a TF-IDF scheme to rank the retrieved relevant segments in our submission.
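The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the image features, the classifier, the spelling corrector, or the stemmer. Here the hypothetical `hyphen_flags` list stands in for the output of the binary classifier, spelling correction is omitted, `strip_suffix` is a toy rule set rather than a published stemming algorithm, and the smoothed IDF formula is one common variant chosen for illustration.

```python
import math
import re
from collections import Counter


def join_hyphenated(lines, hyphen_flags):
    """Rejoin words broken across lines.

    hyphen_flags[i] is True when line i is assumed to end in a broken word
    (a stand-in for the classifier decision; hypothetical input).
    """
    tokens, carry = [], ""
    for line, broken in zip(lines, hyphen_flags):
        words = line.split()
        if carry and words:
            # Prepend the fragment carried over from the previous line.
            words[0] = carry + words[0]
            carry = ""
        if broken and words:
            # Hold back the last fragment until the next line is read.
            carry = words.pop().rstrip("-")
        tokens.extend(words)
    if carry:
        tokens.append(carry)
    return tokens


def strip_suffix(word):
    """Toy suffix stripper (illustrative only, not a full stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Lowercase, tokenize on letters, and strip suffixes."""
    return [strip_suffix(w) for w in re.findall(r"[a-z]+", text.lower())]


def rank_segments(segments, query):
    """Return indices of matching segments, best TF-IDF score first."""
    docs = [Counter(normalize(s)) for s in segments]
    n = len(docs)
    scored = []
    for i, doc in enumerate(docs):
        total = sum(doc.values()) or 1
        score = 0.0
        for term in normalize(query):
            df = sum(1 for d in docs if term in d)
            if df == 0:
                continue
            idf = math.log((1 + n) / (1 + df)) + 1.0  # smoothed IDF
            score += (doc[term] / total) * idf
        scored.append((score, i))
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in ranked if score > 0]


# Usage: two recognized lines, the first flagged as ending in a broken word.
merged = " ".join(join_hyphenated(
    ["the prisoner was trans-", "ported to the colony"], [True, False]))
segments = [merged, "the prisoners wrote letters", "a note about taxes"]
ranking = rank_segments(segments, "prisoners transported")
```

In this sketch the query and the segments pass through the same normalization, so inflected forms such as "prisoners" and "prisoner" match each other.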

Original language: English (US)
Pages (from-to): 347-355
Number of pages: 9
Journal: CEUR Workshop Proceedings
Volume: 1609
State: Published - 2016

Fingerprint

Image retrieval
Information retrieval
Classifiers

Keywords

  • Handwriting recognition
  • Hyphenation detection
  • Text retrieval

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

MayoBMI at ImageCLEF 2016 handwritten document retrieval task. / Liu, Sijia; Wang, Yanshan; Mehrabi, Saeed; Li, Dingcheng; Liu, Hongfang D.

In: CEUR Workshop Proceedings, Vol. 1609, 2016, p. 347-355.

@article{ac58f3faf9af45cb8adda38361eb1157,
title = "MayoBMI at ImageCLEF 2016 handwritten document retrieval task",
abstract = "In this working note, we describe our participation in the ImageCLEF 2016 Handwritten Document Retrieval Task. We focused mainly on hyphenation detection using line images and on information retrieval using n-best results. The hyphenation detection step extracts image features from the beginning and end of each line and uses a binary classifier to determine whether the line contains hyphenation. A spelling correction step then eliminates spelling errors introduced by concatenating the broken word at the end of one line with the beginning of the next line. The final text retrieval step employs a suffix-stripping algorithm to normalize word tense and form, and a TF-IDF scheme to rank the retrieved relevant segments in our submission.",
keywords = "Handwriting recognition, Hyphenation detection, Text retrieval",
author = "Sijia Liu and Yanshan Wang and Saeed Mehrabi and Dingcheng Li and Liu, {Hongfang D}",
year = "2016",
language = "English (US)",
volume = "1609",
pages = "347--355",
journal = "CEUR Workshop Proceedings",
issn = "1613-0073",
publisher = "CEUR-WS",

}

TY - JOUR

T1 - MayoBMI at ImageCLEF 2016 handwritten document retrieval task

AU - Liu, Sijia

AU - Wang, Yanshan

AU - Mehrabi, Saeed

AU - Li, Dingcheng

AU - Liu, Hongfang D

PY - 2016

Y1 - 2016

AB - In this working note, we describe our participation in the ImageCLEF 2016 Handwritten Document Retrieval Task. We focused mainly on hyphenation detection using line images and on information retrieval using n-best results. The hyphenation detection step extracts image features from the beginning and end of each line and uses a binary classifier to determine whether the line contains hyphenation. A spelling correction step then eliminates spelling errors introduced by concatenating the broken word at the end of one line with the beginning of the next line. The final text retrieval step employs a suffix-stripping algorithm to normalize word tense and form, and a TF-IDF scheme to rank the retrieved relevant segments in our submission.

KW - Handwriting recognition

KW - Hyphenation detection

KW - Text retrieval

UR - http://www.scopus.com/inward/record.url?scp=85019593233&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85019593233&partnerID=8YFLogxK

M3 - Article

VL - 1609

SP - 347

EP - 355

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

SN - 1613-0073

ER -