Robust Parsing of Noisy Content
Thesis title in Czech: | Robustní parsing zašuměného obsahu |
---|---|
Thesis title in English: | Robust Parsing of Noisy Content |
Keywords (Czech): | závislostní syntax, syntaktická analýza, parsing, doménová adaptace |
Keywords (English): | dependency syntax, parsing, domain adaptation |
Academic year of topic announcement: | 2012/2013 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | RNDr. Daniel Zeman, Ph.D. |
Author: | hidden - assigned and confirmed by the Study Department |
Date of registration: | 06.11.2012 |
Date of assignment: | 08.11.2012 |
Date of confirmation by the Study Department: | 27.11.2012 |
Date and time of defense: | 02.09.2013 00:00 |
Date of electronic submission: | 02.08.2013 |
Date of printed submission: | 02.08.2013 |
Date of defense held: | 02.09.2013 |
Opponents: | RNDr. David Mareček, Ph.D. |
Guidelines |
While parsing accuracy on in-domain text has improved steadily in recent years, out-of-domain and grammatically noisy text remains an obstacle and often leads to a significant drop in parsing accuracy. In this thesis, we focus on parsing noisy content, such as user-generated content on services like Twitter.
We will compare various strategies for adapting to noise and explore whether a text-normalization step based on machine translation (MT) techniques and parallel data, which has been applied successfully to other tasks such as machine translation and part-of-speech tagging, can also be used for parsing. We will further explore semi-supervised and unsupervised methods that do not require parallel data, and investigate how a pre-processing step can be integrated with a dependency parser model (MST parser). We will test our approach by comparing various parser configurations on existing datasets for dependency parsing of noisy content (e.g. Twitter messages). |
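The comparison of parser configurations with and without a normalization pre-processing step could be set up roughly as in the following minimal sketch. It is an illustration under stated assumptions, not the method of the thesis: the lexicon entries, the function names (`normalize`, `unlabeled_attachment_score`, `evaluate`), and the parser interface (a callable mapping a token list to a list of 1-based head indices, with 0 for the root) are all hypothetical. A simple one-to-one lexicon lookup stands in for the MT-based normalization; it keeps the token count unchanged, so predicted heads stay aligned with the gold annotation.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical normalization lexicon (assumption); a real system would learn
# such mappings, e.g. from parallel noisy/clean data.
NORMALIZATION_LEXICON: Dict[str, str] = {
    "u": "you",
    "r": "are",
    "2moro": "tomorrow",
    "gr8": "great",
}

# Assumed parser interface: tokens in, one head index per token out
# (1-based positions, 0 marks the root).
Parser = Callable[[List[str]], List[int]]


def normalize(tokens: List[str]) -> List[str]:
    """Replace tokens found in the lexicon (case-insensitive); keep the rest."""
    return [NORMALIZATION_LEXICON.get(tok.lower(), tok) for tok in tokens]


def unlabeled_attachment_score(gold: List[List[int]],
                               pred: List[List[int]]) -> float:
    """Fraction of tokens whose predicted head equals the gold head."""
    correct = 0
    total = 0
    for gold_heads, pred_heads in zip(gold, pred):
        if len(gold_heads) != len(pred_heads):
            raise ValueError("gold and predicted sentences differ in length")
        correct += sum(1 for g, p in zip(gold_heads, pred_heads) if g == p)
        total += len(gold_heads)
    return correct / total if total else 0.0


def evaluate(parser: Parser,
             sentences: List[List[str]],
             gold: List[List[int]],
             preprocess: Optional[Callable[[List[str]], List[str]]] = None
             ) -> float:
    """Parse every sentence, optionally after pre-processing, and score it."""
    predictions = []
    for tokens in sentences:
        parser_input = preprocess(tokens) if preprocess else tokens
        predictions.append(parser(parser_input))
    return unlabeled_attachment_score(gold, predictions)


def left_chain_parser(tokens: List[str]) -> List[int]:
    """Toy stand-in parser: attaches each token to the preceding one."""
    return list(range(len(tokens)))
```

Calling `evaluate(parser, sentences, gold)` and `evaluate(parser, sentences, gold, preprocess=normalize)` with the same parser gives two scores that can be compared directly. A normalization step that merges or splits tokens would additionally need an alignment back to the original tokens before scoring.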
List of references |
McDonald, Ryan, et al. "Non-projective dependency parsing using spanning tree algorithms." Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2005.
Foster, Jennifer, et al. "#hardtoparse: POS Tagging and Parsing the Twitterverse." Proceedings of the Workshop On Analyzing Microtext (AAAI 2011). 2011.
Gadde, Phani, L. V. Subramaniam, and Tanveer A. Faruquie. "Adapting a WSJ trained part-of-speech tagger to noisy text: preliminary results." Proceedings of the 2011 Joint Workshop on Multilingual OCR and Analytics for Noisy Unstructured Text Data. ACM, 2011.
Kaufmann, Max, and Jugal Kalita. "Syntactic normalization of Twitter messages." International Conference on Natural Language Processing, Kharagpur, India. 2010.
Petrov, Slav, and Ryan McDonald. "Overview of the 2012 shared task on parsing the web." Notes of the First Workshop on Syntactic Analysis of Non-Canonical Language (SANCL). 2012. |