Thesis details
Consistency of Linguistic Annotation
Thesis title in Czech: Konzistence lingvistických anotací
Thesis title in English: Consistency of Linguistic Annotation
Keywords (Czech): anotace, tokenizace, morfologie, syntax, universal dependencies
Keywords (English): annotation, tokenization, morphology, syntax, universal dependencies
Academic year of topic announcement: 2018/2019
Thesis type: diploma thesis
Thesis language: English
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: RNDr. Daniel Zeman, Ph.D.
Author: hidden - assigned and confirmed by the Study Department
Date of registration: 05.12.2018
Date of assignment: 04.02.2019
Confirmed by Study Department on: 25.04.2019
Date and time of defence: 11.02.2020 09:00
Date of electronic submission: 07.01.2020
Date of submission of printed version: 07.01.2020
Opponents: doc. RNDr. Markéta Lopatková, Ph.D.
Guidelines
Natural language texts manually annotated with linguistic information are an indispensable resource for machine learning algorithms. However, even human annotators make occasional errors or fail to take a consistent approach to borderline cases. The topic of the thesis is to explore methods that automatically identify potential inconsistencies in the annotation and, where possible, suggest corrections.
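One simple instance of such a method is the variation-nucleus idea: collect occurrences of the same word form in the same immediate context and flag those that received different labels. The following Python sketch (an illustration only, not the method to be developed in the thesis; the input file name is hypothetical) applies this idea to UPOS tags in a CoNLL-U treebank:

from collections import defaultdict

def read_sentences(conllu_path):
    """Yield sentences as lists of (FORM, UPOS) pairs from a CoNLL-U file."""
    sentence = []
    with open(conllu_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line:                    # blank line terminates a sentence
                if sentence:
                    yield sentence
                sentence = []
                continue
            if line.startswith("#"):        # skip sentence-level comments
                continue
            cols = line.split("\t")
            if "-" in cols[0] or "." in cols[0]:   # skip multiword tokens and empty nodes
                continue
            sentence.append((cols[1], cols[3]))    # FORM and UPOS columns
    if sentence:
        yield sentence

def find_upos_variation(conllu_path):
    """Flag word forms that occur with different UPOS tags between the same
    left and right neighbouring forms (a crude variation-nucleus check)."""
    contexts = defaultdict(lambda: defaultdict(int))   # (left, form, right) -> {upos: count}
    for sent in read_sentences(conllu_path):
        forms = ["<s>"] + [f for f, _ in sent] + ["</s>"]
        tags = [None] + [t for _, t in sent] + [None]
        for i in range(1, len(forms) - 1):
            contexts[(forms[i-1], forms[i], forms[i+1])][tags[i]] += 1
    for nucleus, tag_counts in contexts.items():
        if len(tag_counts) > 1:             # same word, same context, different tags
            yield nucleus, dict(tag_counts)

if __name__ == "__main__":
    # Hypothetical input file; any UD .conllu file can be substituted.
    for (left, form, right), tags in find_upos_variation("cs_pdt-ud-train.conllu"):
        print(f"{left} [{form}] {right}: {tags}")

Real detection methods would of course consider lemmas, morphological features and dependency relations as well, and use richer context than the immediate neighbours.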

The methods should be as language-neutral as possible. If language-specific or treebank-specific rules prove useful, they should be clearly separated from the portable components and evaluated separately.

The methods will be tested on one or more treebanks in the Universal Dependencies collection.

Inconsistencies identified by the methods will be manually evaluated on a sample of the data. They will be categorized as annotation errors, unclear cases, problems in the underlying text, and false alarms.

Besides identifying errors, it will also be investigated whether, to what extent, and how reliably the errors can be corrected automatically; in particular, whether a correction can be proposed directly by the tool (as opposed to rules written by a human who inspects the errors the tool has identified).
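As a hedged sketch of what such an automatic proposal could look like (building on the illustrative check above, and again not prescribing the thesis method), the tool might propose the majority tag of a flagged group whenever that tag clearly dominates:

def propose_corrections(conllu_path, min_majority=0.8):
    """Propose the majority UPOS tag as a correction for the minority occurrences
    of a flagged nucleus, but only when the majority share reaches min_majority.
    Illustration only; the reliability of such proposals is what the thesis evaluates."""
    for nucleus, tag_counts in find_upos_variation(conllu_path):
        total = sum(tag_counts.values())
        best_tag, best_count = max(tag_counts.items(), key=lambda kv: kv[1])
        if best_count / total >= min_majority:
            minority = {t: c for t, c in tag_counts.items() if t != best_tag}
            yield nucleus, best_tag, minority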
References
Marie-Catherine de Marneffe, Matias Grioni, Jenna Kanerva, Filip Ginter (2017): Assessing the Annotation Consistency of the Universal Dependencies Corpora. In: Proceedings of Depling 2017, Pisa, Italy.

Chiara Alzetta, Felice Dell'Orletta, Simonetta Montemagni, Giulia Venturi (2018): Dangerous Relations in Dependency Treebanks. In: Proceedings of TLT 16, Prague, Czechia.

Chiara Alzetta, Felice Dell'Orletta, Simonetta Montemagni, Maria Simi, Giulia Venturi (2018): Assessing the Impact of Incremental Error Detection and Correction. A Case Study on the Italian Universal Dependency Treebank. In: Proceedings of the Second Workshop on Universal Dependencies (UDW 2018), Brussels, Belgium.

Héctor Martínez Alonso, Daniel Zeman (2016): Universal Dependencies for the AnCora treebanks. In: Procesamiento del Lenguaje Natural, ISSN 1135-5948, 57, pp. 91-98.

Kira Droganova, Olga Lyashevskaya, Daniel Zeman (2018): Data Conversion and Consistency of Monolingual Corpora: Russian UD Treebanks. In: Proceedings of TLT 17, Oslo, Norway.