Enhancing Cross-lingual Semantic Annotations Using Deep Network Sentence Embeddings
Description
This work won the Best Paper Award at HEALTHINF 2021. Please see the award announcement.
Annotating documents with ontology concepts enhances data quality and interoperability. Such semantic annotations also facilitate the comparison of multiple studies and even of results across languages. The FDA therefore requires that all submitted medical forms be annotated. In this work, we annotate medical forms in German. These standardized forms are used in health care practice and biomedical research and have been translated and adapted to various languages. We focus on annotations that cover the whole question in a form, as required by the FDA. We need to map these non-English questions to English concepts because many of these concepts do not exist in other languages. Through translation and adaptation, the non-English forms deviate syntactically from the original forms, so conventional string-matching methods yield low-quality annotations. We therefore propose a new approach that incorporates semantics into the mapping procedure. By using sentence embeddings generated by deep networks in the cross-lingual annotation process, we achieve a recall of 84.62%, an improvement of 134% over conventional string matching. We also improve precision by 51% and F-measure by 65%.
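To illustrate the general idea of embedding-based cross-lingual annotation, here is a minimal sketch, not the authors' implementation. It assumes that a German question and the English concept descriptions have already been encoded into one shared vector space by some multilingual sentence encoder (the abstract does not specify which; the 3-dimensional vectors below are hand-made toy values). Each question is mapped to the concepts whose embeddings are closest by cosine similarity. The concept IDs, the `threshold` parameter, and the helper names are hypothetical. The sketch also computes precision, recall, and F-measure (taken here to be F1) for a set of predicted annotations.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def annotate(question_vec: Vector,
             concept_vecs: Dict[str, Vector],
             threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Rank English concepts by similarity to a (German) question embedding.

    Only concepts scoring at or above the hypothetical `threshold` are kept,
    sorted from most to least similar.
    """
    scored = [(cid, cosine_similarity(question_vec, vec))
              for cid, vec in concept_vecs.items()]
    kept = [(cid, score) for cid, score in scored if score >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


def precision_recall_f1(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Standard set-based precision, recall, and F1 for one annotated question."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy embeddings standing in for encoder output (hypothetical values).
CONCEPTS: Dict[str, Vector] = {
    "C_SMOKING": [0.8, 0.2, 0.1],   # e.g. "Current smoker"
    "C_ALCOHOL": [0.1, 0.9, 0.2],   # e.g. "Alcohol consumption"
    "C_PAIN": [0.0, 0.1, 0.95],     # e.g. "Pain severity"
}
QUESTION: Vector = [0.9, 0.1, 0.0]  # e.g. "Rauchen Sie zurzeit?"

if __name__ == "__main__":
    matches = annotate(QUESTION, CONCEPTS)
    print([cid for cid, _ in matches])  # → ['C_SMOKING']
    print(precision_recall_f1({cid for cid, _ in matches}, {"C_SMOKING"}))
```

Because the comparison happens between embeddings rather than character strings, a German question can match an English concept with which it shares almost no surface text; that is the weakness of string matching the abstract describes. In a real pipeline the threshold (or a top-k cutoff) would need tuning against annotated data.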
The presentation video can be seen here.