Thesis (Selection of subject)

Combining text-based and vision-based semantics

Thesis title in Czech:	Combining text-based and vision-based semantics
Thesis title in English:	Combining text-based and vision-based semantics
Key words:	semantics, semantic similarity measurement, text, image, vector space model
English key words:	semantics, semantic similarity measurement, text, image, vector space model
Academic year of topic announcement:	2010/2011
Thesis type:	diploma thesis
Thesis language:	English
Department:	Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor:	RNDr. Martin Holub, Ph.D.
Author:	hidden - assigned and confirmed by the Study Dept.
Date of registration:	12.11.2010
Date of assignment:	12.11.2010
Date and time of defence:	06.09.2011 00:00
Date of electronic submission:	05.08.2011
Date of submission of printed version:	05.08.2011
Date of proceeded defence:	06.09.2011
Opponents:	RNDr. Jana Straková, Ph.D.

Guidelines

Automatic measurement of semantic similarity, supported by
unsupervised statistical vector space models, has proved successful in
many active and growing areas of research [1]. High-quality
measurement is becoming increasingly important in many applied fields.

This thesis focuses on the integration of text-based and vision-based
semantics. The Internet offers opportunities to exploit the
relations between images and texts [6].

The goal of this thesis is to use both vision-based and text-based
semantics to create a multimodal semantic space from images and texts,
in order to improve the measurement of semantic similarity. The
student's task will be to build a semantic space model from corpora
[3], [4], and to extract bags of visual words from images that share
some topic characteristics or themes.
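
As an illustration only, the two building blocks named above can be
sketched in Python. The sketch assumes that local image descriptors and
a codebook of centroids are already available, and it fuses the two
modalities by weighted concatenation of length-normalized vectors. The
function names, the `alpha` weight and all numbers are hypothetical toy
choices and are not prescribed by this assignment.

```python
import math

def bag_of_visual_words(descriptors, codebook):
    """Count how many local descriptors fall nearest to each codebook centroid."""
    histogram = [0] * len(codebook)
    for d in descriptors:
        # Squared Euclidean distance from this descriptor to every centroid.
        distances = [sum((a - b) ** 2 for a, b in zip(d, c)) for c in codebook]
        histogram[distances.index(min(distances))] += 1
    return histogram

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def fuse(text_vec, visual_vec, alpha=0.5):
    """Concatenate normalized text and visual vectors.

    alpha is an assumed mixing weight: 1.0 keeps only text, 0.0 only vision.
    """
    t = [alpha * x for x in l2_normalize(text_vec)]
    v = [(1.0 - alpha) * x for x in l2_normalize(visual_vec)]
    return t + v

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Toy visual words: three 2-D descriptors quantized against a 2-word codebook.
toy_histogram = bag_of_visual_words(
    [[0.1, 0.0], [0.9, 1.0], [1.0, 0.8]],
    [[0.0, 0.0], [1.0, 1.0]],
)

# Toy co-occurrence counts (text) and visual-word histograms (images).
text_space = {"car": [4, 1, 0], "automobile": [3, 1, 0], "banana": [0, 1, 5]}
visual_space = {"car": [2, 0, 3, 1], "automobile": [2, 0, 2, 1], "banana": [0, 4, 0, 1]}

multimodal = {w: fuse(text_space[w], visual_space[w]) for w in text_space}

sim_related = cosine(multimodal["car"], multimodal["automobile"])
sim_unrelated = cosine(multimodal["car"], multimodal["banana"])
```

In this toy setting the related pair scores higher than the unrelated
pair in the fused space. Other fusion schemes, such as dimensionality
reduction over the joint matrix, would fit the same goal.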

The resulting multimodal semantic spaces should then be applied to
tasks such as measuring word similarity or concept clustering [2],
which may in turn be helpful in applications such as query
reformulation in information retrieval [5]. The quality of the
semantic similarity measurement should be tested using the Rubenstein
and Goodenough similarity ratings, and/or TOEFL synonym tests, and/or
noun/verb/concept clustering.
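
One common way to compare model similarities with human similarity
ratings is a rank correlation. The following Python sketch computes
Spearman's correlation with the standard library only. The choice of
Spearman's coefficient is an assumption, not a requirement stated here,
and the numbers are made-up placeholders, not actual Rubenstein and
Goodenough ratings.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Placeholder human ratings and model cosine scores for four word pairs.
human_ratings = [3.9, 3.1, 1.2, 0.4]
model_scores = [0.92, 0.75, 0.30, 0.35]

rho = spearman(human_ratings, model_scores)
```

A value of rho close to 1 would indicate that the model orders the word
pairs much as the human raters do.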

References

[1] Turney, P. D. and Pantel, P. 2010. From Frequency to Meaning:
Vector Space Models of Semantics. Journal of Artificial Intelligence
Research (JAIR), 37(1), 141-188. AI Access Foundation.

[2] Turney, P. D. 2006. Similarity of semantic relations. Computational
Linguistics, 32(3), 379-416.

[3] Baroni, M. and Lenci, A. To appear. Distributional Memory: A general
framework for corpus-based semantics. Computational Linguistics, 2010.

[4] Baroni, M. and Zamparelli, R. 2010. Nouns are vectors, adjectives are
matrices: Representing adjective-noun constructions in semantic
space. Proceedings of the Conference on Empirical Methods in Natural
Language Processing (EMNLP 2010), East Stroudsburg PA: ACL, 1183-1193.

[5] Manning, C. D., Raghavan, P., and Schuetze, H. 2008. Introduction to
Information Retrieval. Cambridge University Press, Cambridge, UK.

[6] Hare, J. S., Samangooei, S., Lewis, P. H., and Nixon, M. S. 2008.
Semantic spaces revisited: investigating the performance of
auto-annotation and semantic retrieval using semantic spaces. In
Proceedings of the 2008 International Conference on Content-based
Image and Video Retrieval (CIVR '08). ACM, New York, NY, USA,
359-368.

Preliminary scope of work

The goal of this thesis is to use both vision-based and text-based
semantics to create a multimodal semantic space from images and texts,
in order to improve the measurement of semantic similarity.