Thesis (Selection of subject) (version: 381)
Thesis details
Methods for Creating Subjectivity Lexicon for Indonesian
Thesis title in Czech: Metody vytváření subjektivního slovníku pro indonézštinu
Thesis title in English: Methods for Creating Subjectivity Lexicon for Indonesian
Academic year of topic announcement: 2012/2013
Thesis type: diploma thesis
Thesis language: English
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: doc. RNDr. Ondřej Bojar, Ph.D.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 01.11.2012
Date of assignment: 06.11.2012
Confirmed by Study dept. on: 21.11.2012
Date and time of defence: 02.09.2013 00:00
Date of electronic submission: 31.07.2013
Date of submission of printed version: 01.08.2013
Date of proceeded defence: 02.09.2013
Opponents: doc. RNDr. Vladislav Kuboň, Ph.D.
Advisors: Mgr. Kateřina Lesch, Ph.D.
Guidelines
Polarity detection, one of the subfields of sentiment analysis, is the task of identifying whether a text fragment (a single sentence or a whole document) expresses positive or negative evaluation. Current successful methods usually rely on a "subjectivity lexicon", i.e. a list of positive and negative words or expressions. Subjectivity lexicons are naturally language-dependent but often also domain-dependent.

The aim of the thesis is to explore methods of creating a subjectivity lexicon for a language that does not have one yet. A number of approaches are possible, and they differ in the resources they require. For example, a dictionary of synonyms, if available, can be used to expand a small seed subjectivity lexicon; a parallel corpus or a machine translation system can transfer a lexicon from another language; and frequent words can be extracted from (unlabelled) sample input segments.
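
As an illustration of the first approach, the following is a minimal sketch of expanding a seed lexicon through a synonym dictionary. It is not a method prescribed by the thesis assignment: the function name `expand_lexicon`, the `max_depth` parameter, and the tiny Indonesian word lists are assumptions made up for this example, and a real synonym resource would be far larger and noisier.

```python
from collections import deque


def expand_lexicon(seed, synonyms, max_depth=2):
    """Propagate polarity labels from seed words along synonym links.

    seed:      dict mapping word -> "positive" / "negative"
    synonyms:  dict mapping word -> list of synonyms (hypothetical resource)
    max_depth: how many synonym hops away from a seed word we still trust
    """
    lexicon = dict(seed)
    queue = deque((word, 0) for word in seed)
    while queue:
        word, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for neighbour in synonyms.get(word, ()):
            # The first label to reach a word wins; conflicts are ignored here.
            if neighbour not in lexicon:
                lexicon[neighbour] = lexicon[word]
                queue.append((neighbour, depth + 1))
    return lexicon


# Toy data, invented for illustration only.
seed = {"bagus": "positive", "buruk": "negative"}
synonyms = {
    "bagus": ["baik", "indah"],
    "baik": ["hebat"],
    "buruk": ["jelek"],
}

lexicon = expand_lexicon(seed, synonyms)
shallow = expand_lexicon(seed, synonyms, max_depth=1)
```

Limiting the number of hops is one simple way to trade coverage for reliability, since polarity tends to drift as the expansion moves further from the seed words.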

The thesis will survey existing methods, possibly suggesting a novel one, and apply a promising subset of them to a sample language: Indonesian. An integral part of the thesis is to construct a small test set of manually annotated segments and to evaluate each of the examined methods in terms of precision and recall.
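
The evaluation described above could be sketched as follows. This is only one possible setup, assumed for illustration: segments are classified by counting lexicon hits, and precision and recall are computed per polarity class. The functions `classify` and `precision_recall`, the toy lexicon, and the three annotated segments are all hypothetical.

```python
def classify(segment, lexicon):
    """Label a segment by summing the polarities of its lexicon words."""
    score = 0
    for token in segment.lower().split():
        label = lexicon.get(token)
        if label == "positive":
            score += 1
        elif label == "negative":
            score -= 1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def precision_recall(gold, predicted, target):
    """Precision and recall of the predictions for one target class."""
    true_pos = sum(1 for g, p in zip(gold, predicted) if g == target and p == target)
    predicted_target = sum(1 for p in predicted if p == target)
    gold_target = sum(1 for g in gold if g == target)
    precision = true_pos / predicted_target if predicted_target else 0.0
    recall = true_pos / gold_target if gold_target else 0.0
    return precision, recall


# Toy lexicon and manually "annotated" segments, invented for illustration.
lexicon = {"bagus": "positive", "buruk": "negative"}
test_set = [
    ("film ini bagus", "positive"),
    ("pelayanan buruk", "negative"),
    ("makanan enak", "positive"),  # "enak" is missing from the toy lexicon
]

gold = [label for _, label in test_set]
predicted = [classify(text, lexicon) for text, _ in test_set]
pos_precision, pos_recall = precision_recall(gold, predicted, "positive")
neg_precision, neg_recall = precision_recall(gold, predicted, "negative")
```

In this toy run the lexicon misses one positive segment, so recall for the positive class falls below its precision, which is the typical pattern for a small lexicon.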

Depending on the availability of data, the thesis may also explore aspects of domain dependence of each of the examined methods.
References
Bakliwal, A., Piyush, A. and V. Varma: Hindi Subjective Lexicon: A Lexical Resource for Hindi Adjective Polarity Classification. In Proceedings of the 8th Language Resources and Evaluation Conference (LREC'12), 2012.

Banea, C., Mihalcea, R., Wiebe, J.: A Bootstrapping Method for Building Subjectivity Lexicons for Languages with Scarce Resources. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008), 2008.

Beigman Klebanov, B., Burstein, J., Madnani, N., Faulkner, A. and J. Tetreault: Building Subjectivity Lexicon(s) From Scratch For Essay Data. CICLING '12, New Delhi, India, 2012.

De Smedt, T. and W. Daelemans: Vreselijk mooi! (terribly beautiful): A Subjectivity Lexicon for Dutch Adjectives. In Proceedings of the 8th Language Resources and Evaluation Conference (LREC'12), 2012.

Jijkoun, V. and K. Hofmann: Generating a Non-English Subjectivity Lexicon: Relations That Matter. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2009), 2009.

Maks, I. and P. Vossen: Building a fine-grained subjectivity lexicon from a web corpus. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), 2012.

Perez-Rosas, V., Banea, C. and R. Mihalcea: Learning Sentiment Lexicons in Spanish. In Proceedings of the 8th international conference on Language Resources and Evaluation (LREC2012), 2012.

Wiebe, J., Wilson, T., Bruce, R., Bell, M. and M. Martin: Learning subjective language. Computational Linguistics 30 (3), 2004.
 