Combining text-based and vision-based semantics
Thesis title in Czech: | Combining text-based and vision-based semantics |
---|---|
Thesis title in English: | Combining text-based and vision-based semantics |
Key words: | semantics, semantic similarity measurement, text, image, vector space model |
English key words: | semantics, semantic similarity measurement, text, image, vector space model |
Academic year of topic announcement: | 2010/2011 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | RNDr. Martin Holub, Ph.D. |
Author: | hidden - assigned and confirmed by the Study Dept. |
Date of registration: | 12.11.2010 |
Date of assignment: | 12.11.2010 |
Date and time of defence: | 06.09.2011 00:00 |
Date of electronic submission: | 05.08.2011 |
Date of submission of printed version: | 05.08.2011 |
Date of proceeded defence: | 06.09.2011 |
Opponents: | RNDr. Jana Straková, Ph.D. |
Guidelines |
Automatic semantic similarity measurement supported by unsupervised statistical vector space models has proved successful in many active and growing areas of research [1]. High-quality measurement is becoming increasingly important in many applied fields. This thesis focuses on integrating text-based and vision-based semantics. The Internet offers an opportunity to exploit the relation between images and texts [6]. The goal of this thesis is to use both vision-based and text-based semantics to create a multimodal semantic space from images and texts, in order to improve the measurement of semantic similarity. The student's task will be to build a semantic space model from corpora [3], [4], and to extract bags of visual words from images that share some topic characteristics or themes. The resulting multimodal semantic spaces should then be applied to tasks such as measuring word similarity or concept clustering [2], which may in turn be helpful in applications such as query reformulation in information retrieval [5]. The quality of the semantic similarity measurement should be tested using the Rubenstein and Goodenough similarity ratings, and/or the TOEFL synonym test, and/or noun/verb/concept clustering. |
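The pipeline outlined in the guidelines (a text-based vector and a bag-of-visual-words vector per word, combined into one multimodal space, then scored against human similarity ratings) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy vectors, the invented ratings (not the actual Rubenstein and Goodenough data), the helper names, and the choice of weighted concatenation with a mixing weight `alpha` as the fusion method. The thesis itself may combine the modalities differently.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(text_vec, visual_vec, alpha=0.5):
    """Hypothetical fusion: concatenate the normalized text and visual
    vectors, weighting text by alpha and vision by (1 - alpha)."""
    t = [alpha * x for x in l2_normalize(text_vec)]
    v = [(1.0 - alpha) * x for x in l2_normalize(visual_vec)]
    return t + v


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


# Toy data (invented): text co-occurrence counts and visual-word histograms.
text_vectors = {"car": [4, 0, 1], "automobile": [3, 0, 2], "banana": [0, 5, 1]}
visual_vectors = {"car": [2, 0], "automobile": [2, 1], "banana": [0, 3]}

# Invented human similarity ratings, standing in for a real gold standard.
pairs = [("car", "automobile"), ("car", "banana"), ("automobile", "banana")]
human_ratings = [3.9, 0.2, 0.4]

multimodal = {w: fuse(text_vectors[w], visual_vectors[w]) for w in text_vectors}
model_scores = [cosine(multimodal[a], multimodal[b]) for a, b in pairs]
rho = spearman(model_scores, human_ratings)

for (a, b), score in zip(pairs, model_scores):
    print(f"{a:>10} / {b:<10} cosine = {score:.3f}")
print(f"Spearman correlation with the toy ratings: {rho:.3f}")
```

Normalizing each modality before concatenation keeps either one from dominating purely because of its scale. In a real experiment `alpha` would be tuned on held-out data, and rank correlation is only one of several ways to compare model scores with human ratings.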
References |
[1] Turney, P. D. and Pantel, P. 2010. From Frequency to Meaning: Vector Space Models of Semantics. Journal of Artificial Intelligence Research (JAIR), 37(1):141-188. AI Access Foundation.
[2] Turney, P. D. 2006. Similarity of semantic relations. Computational Linguistics, 32(3):379-416.
[3] Baroni, M. and Lenci, A. 2010. Distributional Memory: A general framework for corpus-based semantics. Computational Linguistics. To appear.
[4] Baroni, M. and Zamparelli, R. 2010. Nouns are vectors, adjectives are matrices: Representing adjective-noun constructions in semantic space. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2010). ACL, East Stroudsburg, PA, 1183-1193.
[5] Manning, C. D., Raghavan, P., and Schütze, H. 2008. Introduction to Information Retrieval. Cambridge University Press, Cambridge, UK.
[6] Hare, J. S., Samangooei, S., Lewis, P. H., and Nixon, M. S. 2008. Semantic spaces revisited: investigating the performance of auto-annotation and semantic retrieval using semantic spaces. In Proceedings of the 2008 International Conference on Content-based Image and Video Retrieval (CIVR '08). ACM, New York, NY, USA, 359-368. |
Preliminary scope of work |
The goal of this thesis is to use both vision-based and text-based semantics
to create a multimodal semantic space from images and texts, in order to improve the measurement of semantic similarity. |
Preliminary scope of work in English |
The goal of this thesis is to use both vision-based and text-based semantics
to create a multimodal semantic space from images and texts, in order to improve the measurement of semantic similarity. |