Thesis (Selection of subject) (version: 368)
Thesis details
Similarity of XML Data
Thesis title in Czech: Similarity of XML Data
Thesis title in English: Similarity of XML Data
Academic year of topic announcement: 2008/2009
Thesis type: diploma thesis
Thesis language: English
Department: Department of Software Engineering (32-KSI)
Supervisor: doc. RNDr. Irena Holubová, Ph.D.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 20.10.2008
Date of assignment: 20.10.2008
Date and time of defence: 06.09.2010 00:00
Date of electronic submission: 06.09.2010
Date of completed defence: 06.09.2010
Opponent: RNDr. Jakub Klímek, Ph.D.
 
 
 
Guidelines
One possible enhancement of XML data management tools is to store and manage similar XML data in the same or a similar way, i.e., to exploit the idea of clustering. For this purpose, a suitable technique is needed that can measure similarity among XML documents, among XML schemas, or between documents and schemas.
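As an illustration of what such a similarity measure and clustering step can look like, the following is a minimal sketch under stated assumptions; it is not the method proposed in this thesis. It compares two documents by the Jaccard coefficient of their sets of root-to-element tag paths, a deliberately simple structural measure that ignores text content, attributes, and sibling order (unlike, for example, the tree edit distance approach of [4]). It then groups documents with a simple leader-style rule: each document joins the first group whose first member is at least as similar as a threshold. The function names `tag_paths`, `path_similarity`, and `cluster` and the default threshold of 0.5 are hypothetical choices made only for this sketch.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text: str) -> set[str]:
    """Collect every root-to-element tag path, e.g. 'book/author'.

    Assumption for this sketch: only element names matter; text,
    attributes, and the order of siblings are ignored.
    """
    root = ET.fromstring(xml_text)
    paths: set[str] = set()

    def walk(elem: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        paths.add(path)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return paths


def path_similarity(doc_a: str, doc_b: str) -> float:
    """Jaccard coefficient of the two tag-path sets, in [0.0, 1.0]."""
    a, b = tag_paths(doc_a), tag_paths(doc_b)
    # Both sets contain at least the root path, so the union is never empty.
    return len(a & b) / len(a | b)


def cluster(docs: list[str], threshold: float = 0.5) -> list[list[int]]:
    """Leader-style grouping: compare each document with the first
    member of every existing group; open a new group if none qualifies.
    Returns groups as lists of document indices."""
    groups: list[list[int]] = []
    for i, doc in enumerate(docs):
        for group in groups:
            if path_similarity(docs[group[0]], doc) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


if __name__ == "__main__":
    d1 = "<book><title/><author/></book>"
    d2 = "<book><title/><author/><year/></book>"
    d3 = "<order><item/></order>"
    print(path_similarity(d1, d2))  # 0.75 (3 shared paths out of 4)
    print(path_similarity(d1, d3))  # 0.0 (no shared paths)
    print(cluster([d1, d2, d3]))    # [[0, 1], [2]]
```

Because the result of the leader rule depends on the order in which documents arrive, a real tool would more likely use an order-independent method such as hierarchical clustering over the full similarity matrix; the simpler rule is used here only to keep the sketch short.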
The aim of this work is to research various aspects of the problem and its limitations. First, existing solutions are to be analyzed and their advantages and disadvantages discussed. The core of the work is the proposal and implementation of the author's own method for similarity evaluation that addresses the identified disadvantages and shortcomings. The work will include appropriate experimental results.
References
1. Extensible Markup Language (XML) 1.0 (Fourth Edition). W3C Recommendation, 16 August 2006. http://www.w3.org/TR/REC-xml
2. W3C. W3C Technical Reports and Publications. http://www.w3.org/TR/
3. I. Mlýnková, M. Nečaský, J. Pokorný, K. Richta, K. Toman, and V. Toman. Technologie XML - Principy a aplikace v praxi [XML Technologies: Principles and Applications in Practice]. Grada Publishing, Prague, Czech Republic, September 2008. ISBN 978-80-247-2725-7.
4. A. Nierman and H. V. Jagadish. Evaluating Structural Similarity in XML Documents. In Proceedings of the Fifth International Workshop on the Web and Databases - WebDB 2002, Madison, Wisconsin, USA, 2002.
5. T. Jiang, L. Wang, and K. Zhang. Alignment of Trees - An Alternative to Tree Edit. Theor. Comput. Sci., 143(1):137-148, 1995.
6. Z. Zhang, R. Li, S. Cao, and Y. Zhu. Similarity Metric for XML Documents. In Proceedings of FGWM03: Workshop on Knowledge and Experience Management, Karlsruhe, Germany, 2003.
7. E. Bertino, G. Guerrini, and M. Mesiti. A Matching Algorithm for Measuring the Structural Similarity between an XML Document and a DTD and its Applications. Inf. Syst., 29(1):23-46, 2004.
8. P. K. L. Ng and V. T. Y. Ng. Structural Similarity between XML Documents and DTDs. In Lecture Notes in Computer Science, pages 412-421. Springer, Berlin/Heidelberg, 2003.
9. M. L. Lee, L. H. Yang, W. Hsu, and X. Yang. XClust: Clustering XML Schemas for Effective Integration. In CIKM '02: Proceedings of the Eleventh International Conference on Information and Knowledge Management, pages 292-299, New York, NY, USA, 2002. ACM Press.
10. E. Rahm and P. A. Bernstein. A Survey of Approaches to Automatic Schema Matching. The VLDB Journal, 10(4):334-350, 2001.
11. H. Do, S. Melnik, and E. Rahm. Comparison of Schema Matching Evaluations. In Revised Papers from the NODe 2002 Web and Database-Related Workshops on Web, Web-Services, and Database Systems, pages 221-237, London, UK, 2003. Springer-Verlag.
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html