Active learning in E-Commerce Merchant Classification using Website Information
Thesis title in Czech: | Aktivní učení pro klasifikaci |
---|---|
Title in English: | Active learning in E-Commerce Merchant Classification using Website Information |
Keywords in Czech: | aktivní učení|klasifikace|e-komerce |
Keywords in English: | Active learning|Web mining|Classification|E-commerce |
Academic year of announcement: | 2022/2023 |
Thesis type: | master's (diploma) thesis |
Thesis language: | English |
Department: | Department of Theoretical Computer Science and Mathematical Logic (32-KTIML) |
Supervisor: | Mgr. Marta Vomlelová, Ph.D. |
Author: | hidden |
Date of registration: | 25.05.2022 |
Date of assignment: | 27.06.2022 |
Date of confirmation by the Student Affairs Department: | 23.11.2022 |
Date and time of defence: | 12.06.2023 09:00 |
Date of electronic submission: | 03.05.2023 |
Date of printed submission: | 09.05.2023 |
Date of completed defence: | 12.06.2023 |
Opponents: | doc. Mgr. Martin Pilát, Ph.D. |
Guidelines |
The aim of this Master's thesis is to design a machine learning model that classifies a merchant, given its URL as input, into a main category and the corresponding subcategories.
One of the challenges is to achieve a complete categorization of the whole available market, in which each merchant is assigned a main category based on what it offers and then more specific subcategories, e.g., Eco, Zero Waste or Bike Sharing. The available database contains categorised merchants mainly from the CEE region, but the goal is global coverage, so the demand for accurate automated categorization is increasing. The collected data make it possible to apply modern machine learning approaches. The work assumes an existing dataset with a small number of manually categorised merchants. The student should build a merchant dataset from information on the merchants' websites. On this partially labelled dataset, the student will test selection strategies for active learning: strategies based on misclassification error and strategies based on a Bayesian approach. |
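As an illustration of what a selection strategy based on misclassification error can look like, the following minimal sketch ranks unlabelled merchants by the expected error of predicting their most probable category (often called least-confidence sampling) and picks the top candidates for manual labelling. It is not the method of the thesis. The merchant URLs, the class probabilities, and the function names `expected_error` and `select_queries` are hypothetical, and the Bayesian strategy is not shown.

```python
from typing import Dict, List, Sequence


def expected_error(probs: Sequence[float]) -> float:
    """Expected misclassification error when predicting the most probable class."""
    return 1.0 - max(probs)


def select_queries(pool_probs: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the k unlabelled merchants with the highest expected error."""
    ranked = sorted(
        pool_probs,
        key=lambda url: expected_error(pool_probs[url]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical pool: merchant URL -> predicted probabilities over three main categories.
pool = {
    "https://shop-a.example": [0.90, 0.05, 0.05],
    "https://shop-b.example": [0.40, 0.35, 0.25],
    "https://shop-c.example": [0.55, 0.30, 0.15],
}

# Merchants the current model is least sure about are sent for manual labelling.
queries = select_queries(pool, k=2)
print(queries)
```

In a full active learning loop, the selected merchants would be labelled by a human, added to the training set, and the classifier retrained before the next selection round.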
References |
Galuh Tunggadewi Sahid, Rahmad Mahendra, and Indra Budi. 2019. E-Commerce Merchant Classification using Website Information. In Proceedings of the 9th International Conference on Web Intelligence, Mining and Semantics (WIMS2019). Association for Computing Machinery, New York, NY, USA, Article 5, 1–10. https://doi.org/10.1145/3326467.3326486
Kottke, D., Herde, M., Sandrock, C. et al. Toward optimal probabilistic active learning using a Bayesian approach. Mach Learn 110, 1199–1231 (2021). https://doi.org/10.1007/s10994-021-05986-9
Michael Färber, Benjamin Scheer, and Frederic Bartscherer. 2020. Who's Behind That Website? Classifying Websites by the Degree of Commercial Intent. In Web Engineering: 20th International Conference, ICWE 2020, Helsinki, Finland, June 9–12, 2020, Proceedings. Springer-Verlag, Berlin, Heidelberg, 130–145. https://doi.org/10.1007/978-3-030-50578-3_10 |
Preliminary scope of work in English |
This topic is created for a specific student. |