Detection and Correction of Inconsistencies in the Multilingual Treebank HamleDT
Thesis title in Czech: | Detection and Correction of Inconsistencies in the Multilingual Treebank HamleDT |
---|---|
Title in English: | Detection and Correction of Inconsistencies in the Multilingual Treebank HamleDT |
Keywords (Czech): | závislostní korpusy, detekce chyb, oprava chyb, variační n-gramy |
Keywords (English): | dependency treebanks, error detection, error correction, variation n-grams |
Academic year of topic announcement: | 2013/2014 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | doc. Ing. Zdeněk Žabokrtský, Ph.D. |
Author: | hidden - assigned and confirmed by the Study Department |
Date of registration: | 27.03.2014 |
Date of assignment: | 27.03.2014 |
Date of confirmation by the Study Department: | 02.04.2014 |
Date and time of defence: | 05.06.2015 00:00 |
Date of electronic submission: | 07.05.2015 |
Date of submission of the printed version: | 07.05.2015 |
Date of completed defence: | 05.06.2015 |
Opponents: | RNDr. David Mareček, Ph.D. |
Guidelines |
The goal of the work is to increase the quality of the multilingual treebank HamleDT, which contains dependency syntactic structures for thirty languages. First, the student will study the annotation conventions used in the individual resources integrated in HamleDT. After collecting empirical observations on the annotation and transformation flaws present in the current version of HamleDT, the student will design criteria for measuring the quality of the HamleDT data from two viewpoints: the data should be maximally consistent within each language, and the annotation principles used for the individual languages should be unified as much as possible (within the obvious limits imposed by typological differences among the languages). The student will implement software tools for detecting and correcting inconsistencies in HamleDT and will evaluate their impact on data quality using statistical measures.
|
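To make the notion of within-language consistency concrete, the following is a minimal sketch of one possible check, loosely inspired by the variation-based approach in the literature listed below. It is an illustration under stated assumptions, not the method prescribed by the assignment: the toy edge list, the label names, and the function `find_label_variation` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: (head_form, dependent_form, dependency_label) triples.
# A real treebank would supply these from its annotated dependency trees.
edges = [
    ("reads", "book", "Obj"),
    ("reads", "book", "Obj"),
    ("reads", "book", "Adv"),
    ("reads", "John", "Sb"),
]


def find_label_variation(edges):
    """Return head-dependent word pairs that occur with more than one label."""
    labels_by_pair = defaultdict(lambda: defaultdict(int))
    for head, dependent, label in edges:
        labels_by_pair[(head, dependent)][label] += 1
    # Keep only pairs whose label varies; these are candidates for review,
    # not proven errors, since genuine ambiguity also produces variation.
    return {
        pair: dict(counts)
        for pair, counts in labels_by_pair.items()
        if len(counts) > 1
    }


variation = find_label_variation(edges)
```

Pairs flagged this way are only candidates: the same two word forms can legitimately stand in different relations, so a practical tool would also take surrounding context into account before proposing a correction.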
List of literature |
Daniel Zeman, David Mareček, Martin Popel, Loganathan Ramasamy, Jan Štěpánek, Zdeněk Žabokrtský, Jan Hajič: HamleDT: To Parse or Not to Parse? In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), Copyright © European Language Resources Association, İstanbul, Turkey, ISBN 978-2-9517408-7-7, pp. 2735-2741, 2012
Markus Dickinson, W. Detmar Meurers: Detecting Inconsistencies in Treebanks. In: Proceedings of the Second Workshop on Treebanks and Linguistic Theories (TLT 2003), Växjö, Sweden, 2003
Václav Novák, Magda Ševčíková: Unsupervised Detection of Annotation Inconsistencies Using Apriori Algorithm. In: Proceedings of the Third Linguistic Annotation Workshop (LAW III), Copyright © Association for Computational Linguistics, Suntec, Singapore, ISBN 978-1-932432-52-7, pp. 138-141, 2009
Adriane Boyd, Markus Dickinson, Detmar Meurers: Increasing the recall of corpus annotation error detection. In: Proceedings of the Sixth International Workshop on Treebanks and Linguistic Theories (TLT 2007), Bergen, Norway, 2007 |