Article

Enhancing cross-lingual biomedical concept normalization using deep neural network pretrained language models

SN Comput. Sci., (July 2022)

Abstract

In this study, we propose a new approach for cross-lingual biomedical concept normalization, the process of mapping text in non-English documents to English concepts of a knowledge base. The resulting mappings, known as semantic annotations, enhance data integration and interoperability of documents in different languages. The US Food and Drug Administration (FDA) therefore requires all submitted medical forms to be semantically annotated. These standardized medical forms are used in health care practice and biomedical research and are translated and adapted into various languages. Mapping them to the same concepts (normally in English) facilitates the comparison of multiple medical studies, even cross-lingually. However, translation and adaptation can cause the forms to deviate from their original text in syntax and wording. As a result, conventional string matching methods produce low-quality annotation results. Our new approach therefore incorporates semantics into the cross-lingual concept normalization process, using sentence embeddings generated by BERT-based pretrained language models. We evaluate the new approach by annotating entire questions of German medical forms with concepts in English, as required by the FDA. The new approach achieves an improvement of 136% in recall, 52% in precision, and 66% in F-measure compared to conventional string matching methods.
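The contrast the abstract draws can be illustrated with a minimal sketch: a conventional string matcher scores surface overlap between the non-English question and an English concept label, while the embedding-based approach compares sentence vectors by cosine similarity. The embeddings below are toy vectors for illustration; in the paper's approach they would come from a BERT-based multilingual sentence-embedding model, and the concept names and values here are assumptions, not data from the study.

```python
# Sketch: string matching vs. embedding-based concept normalization.
from difflib import SequenceMatcher
import math

def string_similarity(a: str, b: str) -> float:
    # Conventional baseline: surface-level string matching.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def cosine(u, v) -> float:
    # Cosine similarity between two sentence-embedding vectors.
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv)

def normalize(query_vec, concepts) -> str:
    # Map a question embedding to the closest English concept.
    return max(concepts, key=lambda name: cosine(query_vec, concepts[name]))

# A German question and its matching English concept share little surface
# overlap, so string matching scores low even though the meanings align.
german_q = "Haben Sie Schmerzen in der Brust?"
print(round(string_similarity(german_q, "Chest pain"), 2))

# With (hypothetical) multilingual embeddings, the semantically matching
# concept is still retrieved despite the cross-lingual wording gap.
concepts = {"Chest pain": [0.9, 0.1, 0.0], "Headache": [0.1, 0.8, 0.2]}
query_vec = [0.85, 0.15, 0.05]  # toy embedding of the German question
print(normalize(query_vec, concepts))  # -> Chest pain
```

The design point is that cosine similarity over embeddings is insensitive to the surface form of the text, which is exactly what cross-lingual matching requires and what character-level matching cannot provide.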
