Methods for Creating Subjectivity Lexicon for Indonesian
Thesis title in Czech: | Metody vytváření subjektivního slovníku pro indonézštinu |
---|---|
Thesis title in English: | Methods for Creating Subjectivity Lexicon for Indonesian |
Academic year of topic announcement: | 2012/2013 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | doc. RNDr. Ondřej Bojar, Ph.D. |
Author: | hidden - assigned and confirmed by the Study Dept. |
Date of registration: | 01.11.2012 |
Date of assignment: | 06.11.2012 |
Confirmed by Study dept. on: | 21.11.2012 |
Date and time of defence: | 02.09.2013 00:00 |
Date of electronic submission: | 31.07.2013 |
Date of submission of printed version: | 01.08.2013 |
Date of proceeded defence: | 02.09.2013 |
Opponents: | doc. RNDr. Vladislav Kuboň, Ph.D. |
Advisors: | Mgr. Kateřina Lesch, Ph.D. |
Guidelines |
Polarity detection, one of the subfields of sentiment analysis, is the task of identifying whether a text fragment (a single sentence or a whole document) expresses a positive or a negative evaluation. Current successful methods usually rely on a "subjectivity lexicon", i.e. a list of positive and negative words or expressions. Subjectivity lexicons are naturally language-dependent, and often domain-dependent as well.
The aim of the thesis is to explore methods of creating a subjectivity lexicon for a language that does not have one yet. A number of approaches are possible, and they differ in their resource requirements: a dictionary of synonyms, if available, can be used to expand a small seed subjectivity lexicon; a parallel corpus or a machine translation system can transfer a lexicon from another language; frequent words can be extracted from (unlabelled) sample input segments; and so on. The thesis will survey existing methods, possibly suggesting a novel one, and apply a promising subset of them to a sample language: Indonesian. An integral part of the thesis is to construct a small test set of manually annotated segments and to evaluate each of the examined methods in terms of precision and recall. Depending on the availability of data, the thesis may also explore how domain-dependent each of the examined methods is. |
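As an illustration only, the synonym-based expansion and the precision/recall evaluation mentioned in the guidelines could be sketched roughly as follows. This is a minimal sketch, not the method used in the thesis: the function names `expand_lexicon` and `precision_recall`, the `max_depth` parameter, and the tiny Indonesian word lists are all hypothetical, and recall is computed here over lexicon entries rather than over annotated text segments.

```python
from collections import deque


def expand_lexicon(seed, synonyms, max_depth=1):
    """Propagate seed polarities along synonym links, breadth-first.

    seed:      dict mapping word -> polarity label (hypothetical toy format)
    synonyms:  dict mapping word -> list of synonyms (stand-in for a thesaurus)
    max_depth: how many synonym hops away from a seed word we still trust
    """
    lexicon = dict(seed)
    queue = deque((word, 0) for word in seed)
    while queue:
        word, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for syn in synonyms.get(word, ()):
            if syn not in lexicon:  # the first label to reach a word wins
                lexicon[syn] = lexicon[word]
                queue.append((syn, depth + 1))
    return lexicon


def precision_recall(predicted, gold):
    """Compare predicted polarity labels with manually annotated ones."""
    correct = sum(1 for w, label in predicted.items() if gold.get(w) == label)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy data, assumed purely for illustration.
seed = {"baik": "positive", "buruk": "negative"}
synonyms = {"baik": ["bagus"], "buruk": ["jelek"], "bagus": ["indah"]}
gold = {
    "baik": "positive",
    "bagus": "positive",
    "indah": "positive",
    "buruk": "negative",
    "jelek": "negative",
}

lexicon = expand_lexicon(seed, synonyms, max_depth=1)
precision, recall = precision_recall(lexicon, gold)
print(sorted(lexicon), precision, recall)
```

With a single hop, "indah" is not reached, so the expanded lexicon stays fully correct but misses one gold entry; raising `max_depth` trades precision risk for recall.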
References |
Bakliwal, A., Piyush, A. and V. Varma: Hindi Subjective Lexicon: A Lexical Resource for Hindi Adjective Polarity Classification. In Proceedings of the 8th Language Resources and Evaluation Conference (LREC'12), 2012.
Banea, C., Mihalcea, R., Wiebe, J.: A Bootstrapping Method for Building Subjectivity Lexicons for Languages with Scarce Resources. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008), 2008.
Beigman Klebanov, B., Burstein, J., Madnani, N., Faulkner, A. and J. Tetreault: Building Subjectivity Lexicon(s) From Scratch For Essay Data. CICLING '12, New Delhi, India, 2012.
De Smedt, T. and W. Daelemans: Vreselijk mooi! (terribly beautiful): A Subjectivity Lexicon for Dutch Adjectives. In Proceedings of the 8th Language Resources and Evaluation Conference (LREC'12), 2012.
Jijkoun, V., K. Hofmann: Generating a Non-English Subjectivity Lexicon: Relations That Matter. In EACL 2009, 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference, 2009.
Maks, I., P. Vossen: Building a fine-grained subjectivity lexicon from a web corpus. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC2012), 2012.
Perez-Rosas, V., Banea, C. and R. Mihalcea: Learning Sentiment Lexicons in Spanish. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC2012), 2012.
Wiebe, J., Wilson, T., Bruce, R., Bell, M., and Martin, M.: Learning subjective language. Computational Linguistics 30 (3), 2004. |