Thesis details
Methods for Creating Subjectivity Lexicon for Indonesian
Title in Czech: Metody vytváření subjektivního slovníku pro indonézštinu
Title in English: Methods for Creating Subjectivity Lexicon for Indonesian
Academic year of announcement: 2012/2013
Type of thesis: Master's thesis
Language of thesis: English
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: doc. RNDr. Ondřej Bojar, Ph.D.
Author: hidden (assigned and confirmed by the Study Department)
Date of registration: 01.11.2012
Date of assignment: 06.11.2012
Date of confirmation by the Study Department: 21.11.2012
Date and time of defence: 02.09.2013 00:00
Electronic version submitted: 31.07.2013
Printed version submitted: 01.08.2013
Date of defence held: 02.09.2013
Reviewers: doc. RNDr. Vladislav Kuboň, Ph.D.
Consultants: Mgr. Kateřina Lesch, Ph.D.
Guidelines
Polarity detection, one of the subfields of sentiment analysis, is the task of identifying whether a text fragment (a single sentence or a whole document) expresses a positive or a negative evaluation. Successful current methods usually rely on a "subjectivity lexicon", i.e. a list of positive and negative words or expressions. Subjectivity lexicons are naturally language-dependent, and often domain-dependent as well.
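To make the role of the lexicon concrete, here is a minimal sketch of lexicon-based polarity detection under the simplest possible scoring assumption: each lexicon word contributes +1 or -1, and the sign of the sum decides the label. The tiny Indonesian word list `TOY_LEXICON` and the function name `classify_polarity` are hypothetical illustrations, not part of the assignment.

```python
import re

# Hypothetical toy lexicon for illustration only; a real subjectivity
# lexicon would contain far more entries.
TOY_LEXICON = {
    "bagus": 1,    # "good"
    "senang": 1,   # "happy"
    "buruk": -1,   # "bad"
    "jelek": -1,   # "ugly, poor"
}


def classify_polarity(text, lexicon):
    """Label a text by the sign of the summed lexicon scores of its words."""
    tokens = re.findall(r"\w+", text.lower())
    score = sum(lexicon.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # no lexicon words, or positive and negative cancel out


print(classify_polarity("Filmnya bagus dan saya senang", TOY_LEXICON))  # positive
```

This sketch deliberately ignores negation (e.g. Indonesian *tidak*, "not") and intensifiers, which practical systems usually have to handle.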

The aim of the thesis is to explore methods of creating a subjectivity lexicon for a language that does not yet have one. Several approaches are possible, and they differ in the resources they require. For example, a dictionary of synonyms, if available, can be used to expand a small seed subjectivity lexicon; a parallel corpus or a machine translation system can transfer a lexicon from another language; and frequent words can be extracted from (unlabelled) sample input segments.
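The first approach, expanding a small seed lexicon with a dictionary of synonyms, can be sketched as breadth-first propagation of polarity over a synonym graph. This is one simple variant shown for illustration, not the method prescribed by the assignment. It assumes a hypothetical thesaurus stored as a plain Python dict (`TOY_SYNONYMS`), a hypothetical depth limit `max_depth`, and the rule that words reached from both positive and negative seeds are discarded as ambiguous.

```python
from collections import deque


def expand_seed_lexicon(seeds, synonyms, max_depth=2):
    """Propagate seed polarities (+1 / -1) through a synonym graph.

    seeds:     dict mapping word -> polarity (+1 or -1)
    synonyms:  dict mapping word -> list of synonyms (hypothetical thesaurus)
    max_depth: maximum number of synonym links followed from a seed
    Words reachable from both positive and negative seeds are dropped.
    """
    reached = {}  # word -> set of polarities it was reached with
    seen = set()  # (word, polarity) pairs already processed
    queue = deque((word, pol, 0) for word, pol in seeds.items())
    while queue:
        word, pol, depth = queue.popleft()
        if (word, pol) in seen:
            continue
        seen.add((word, pol))
        reached.setdefault(word, set()).add(pol)
        if depth < max_depth:
            for syn in synonyms.get(word, []):
                queue.append((syn, pol, depth + 1))
    # Keep only words whose polarity is unambiguous.
    return {w: next(iter(p)) for w, p in reached.items() if len(p) == 1}


# Hypothetical toy thesaurus (Indonesian), for illustration only.
TOY_SYNONYMS = {
    "bagus": ["baik", "indah"],  # "good" -> "good", "beautiful"
    "indah": ["cantik"],         # "beautiful" -> "pretty"
    "buruk": ["jelek"],          # "bad" -> "ugly, poor"
}
print(expand_seed_lexicon({"bagus": 1, "buruk": -1}, TOY_SYNONYMS))
```

The depth limit is meant to curb drift: the further a word lies from a seed in the synonym graph, the less reliably it shares the seed's polarity.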

The thesis will survey existing methods, possibly proposing a novel one, and apply a promising subset of them to a sample language, Indonesian. An integral part of the thesis is the construction of a small test set of manually annotated segments, on which each of the examined methods is evaluated in terms of precision and recall.
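Precision and recall can be computed per polarity class by comparing the labels a method predicts for the test segments with the manual annotation. The sketch below assumes segment-level labels (`positive`, `negative`, `neutral`); `precision_recall` is a hypothetical helper name, and the four example labels are made up purely to show the arithmetic.

```python
def precision_recall(gold, predicted, label):
    """Precision and recall of the predictions for one class label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    true_pos = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
    predicted_pos = sum(1 for p in predicted if p == label)
    gold_pos = sum(1 for g in gold if g == label)
    # Undefined ratios (division by zero) are reported as 0.0 by convention.
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / gold_pos if gold_pos else 0.0
    return precision, recall


# Made-up labels for four segments, purely to illustrate the arithmetic.
gold = ["positive", "negative", "positive", "neutral"]
predicted = ["positive", "positive", "neutral", "neutral"]
for label in ("positive", "negative", "neutral"):
    print(label, precision_recall(gold, predicted, label))
```

For the `positive` class, one of the two segments predicted positive is correct (precision 0.5) and one of the two gold positive segments is found (recall 0.5).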

Depending on the availability of data, the thesis may also explore the domain dependence of each of the examined methods.
References
Bakliwal, A., Arora, P. and Varma, V.: Hindi Subjective Lexicon: A Lexical Resource for Hindi Adjective Polarity Classification. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), 2012.

Banea, C., Mihalcea, R. and Wiebe, J.: A Bootstrapping Method for Building Subjectivity Lexicons for Languages with Scarce Resources. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008), 2008.

Beigman Klebanov, B., Burstein, J., Madnani, N., Faulkner, A. and Tetreault, J.: Building Subjectivity Lexicon(s) from Scratch for Essay Data. In Proceedings of CICLing 2012, New Delhi, India, 2012.

De Smedt, T. and Daelemans, W.: Vreselijk mooi! (terribly beautiful): A Subjectivity Lexicon for Dutch Adjectives. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), 2012.

Jijkoun, V. and Hofmann, K.: Generating a Non-English Subjectivity Lexicon: Relations That Matter. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2009), 2009.

Maks, I. and Vossen, P.: Building a Fine-Grained Subjectivity Lexicon from a Web Corpus. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), 2012.

Pérez-Rosas, V., Banea, C. and Mihalcea, R.: Learning Sentiment Lexicons in Spanish. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), 2012.

Wiebe, J., Wilson, T., Bruce, R., Bell, M. and Martin, M.: Learning Subjective Language. Computational Linguistics 30(3), 2004.
 
Charles University | Information System of Charles University