Semantic relation extraction from unstructured data in the business domain
Thesis title in Czech: | Extrakce sémantických vztahů z nestrukturovaných dat v komerční sféře |
---|---|
Thesis title in English: | Semantic relation extraction from unstructured data in the business domain |
Keywords in Czech: | Nestrukturovaná Data, Získávání informací, Určování vztahů mezi entitami, Textová analytika, Distant Supervision, Snowball |
Keywords in English: | Unstructured data, Information Retrieval, Relation Extraction, Text Analytics, Distant Supervision, Snowball |
Academic year of topic announcement: | 2014/2015 |
Thesis type: | diploma thesis |
Thesis language: | English |
Department: | Institute of Formal and Applied Linguistics (32-UFAL) |
Supervisor: | doc. RNDr. Pavel Pecina, Ph.D. |
Author: | hidden - assigned and confirmed by the Study Department |
Date of registration: | 27.03.2015 |
Date of assignment: | 27.03.2015 |
Date of confirmation by the Study Department: | 15.07.2015 |
Date and time of defence: | 08.06.2016 09:00 |
Date of electronic submission: | 13.05.2016 |
Date of printed submission: | 13.05.2016 |
Date of defence held: | 08.06.2016 |
Opponents: | doc. RNDr. Vladislav Kuboň, Ph.D. |
Guidelines |
Text Analytics in the business domain is a growing field of research. A significant part of the data held in large companies' databases is raw unstructured text (e.g. clients' emails), often linked to structured data (e.g. clients' profiles). Nevertheless, this information is rarely utilized. To take advantage of the textual information, it first has to be converted to a structured form. One way to accomplish this is to extract the named entities in the text and the relations between them. The extracted relations could then be used in predictive models and play a role in decision-making.
The goal of the thesis is to apply methods for semantic relation extraction to enrich the available structured data with relations extracted from unstructured textual data. The work will include a comparison of several methods for relation extraction, such as the semi-supervised Snowball [1] and DIPRE [2] and the weakly supervised Distant Supervision [3], and their modification to fit the use case (e.g. by utilizing the existing structured data). The methods will be applied to a set of Czech texts from the business domain provided by an industrial partner and evaluated on a manually annotated sample of the data. |
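The idea of reusing existing structured data for Distant Supervision [3] can be illustrated with a minimal Python sketch. It is not the method developed in the thesis: the `Fact` records, the `employed_by` relation, the example names and sentences, and the helper functions `distant_label` and `middle_context` are all hypothetical, and entity matching is reduced to a plain substring test instead of real named entity recognition.

```python
import re
from typing import List, NamedTuple


class Fact(NamedTuple):
    """One row of the (hypothetical) structured data: subject, relation, object."""
    subject: str
    relation: str
    obj: str


class LabeledSentence(NamedTuple):
    """A sentence automatically labeled with the relation of a matching fact."""
    sentence: str
    subject: str
    obj: str
    relation: str


def distant_label(sentences: List[str], facts: List[Fact]) -> List[LabeledSentence]:
    """Label every sentence that mentions both entities of a known fact.

    This is the core distant supervision assumption: a sentence containing
    both entities is taken to express their known relation. The assumption
    is noisy, so some labels will be wrong.
    """
    labeled = []
    for sentence in sentences:
        lowered = sentence.lower()
        for fact in facts:
            if fact.subject.lower() in lowered and fact.obj.lower() in lowered:
                labeled.append(
                    LabeledSentence(sentence, fact.subject, fact.obj, fact.relation)
                )
    return labeled


def middle_context(example: LabeledSentence) -> List[str]:
    """Return the lowercase word tokens between the two entity mentions."""
    lowered = example.sentence.lower()
    subj, obj = example.subject.lower(), example.obj.lower()
    subj_start, obj_start = lowered.find(subj), lowered.find(obj)
    if subj_start < obj_start:
        left_end, right_start = subj_start + len(subj), obj_start
    else:
        left_end, right_start = obj_start + len(obj), subj_start
    return re.findall(r"\w+", lowered[left_end:right_start])


# Hypothetical structured data and raw text, invented for illustration only.
facts = [Fact("Jan Novak", "employed_by", "Acme Bank")]
sentences = [
    "Jan Novak has worked at Acme Bank since 2010.",
    "Acme Bank raised its fees last year.",
    "Yesterday Jan Novak called Acme Bank about a loan.",
]

labeled = distant_label(sentences, facts)
features = [middle_context(example) for example in labeled]
```

In this toy data the third sentence also receives the `employed_by` label although it does not express employment, which shows the label noise that such automatically generated training data carries. The tokens between the entities are one simple lexical feature that a relation classifier could be trained on.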
References |
[1] Agichtein, E., & Gravano, L. (2000). Snowball: Extracting relations from large plain-text collections. In Proceedings of the Fifth ACM Conference on Digital Libraries. ACM.
[2] Brin, S. (1998). Extracting patterns and relations from the World-Wide Web. In Proceedings of the 1998 International Workshop on the Web and Databases (WebDB’98), March 1998.
[3] Mintz, M., Bills, S., Snow, R., & Jurafsky, D. (2009). Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 (pp. 1003-1011). Association for Computational Linguistics. |