Thesis details
Automatic Construction of an XML Schema for a Given Set of XML Documents
Thesis title in Czech: Automatic Construction of an XML Schema for a Given Set of XML Documents
Thesis title in English: Automatic Construction of an XML Schema for a Given Set of XML Documents
Academic year of topic announcement: 2007/2008
Thesis type: diploma thesis
Thesis language: English
Department: Department of Software Engineering (32-KSI)
Supervisor: doc. RNDr. Irena Holubová, Ph.D.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 22.10.2007
Date of assignment: 22.10.2007
Date and time of defence: 25.05.2009 00:00
Date of electronic submission: 25.05.2009
Date of proceeded defence: 25.05.2009
Opponents: doc. Mgr. Martin Nečaský, Ph.D.
 
 
 
Guidelines
Statistical analyses of real-world XML data show that a significant portion of XML documents do not have an appropriate XML schema. Even among documents that do have one, the XML Schema language is used less often still. This is probably because manual construction of an XML schema is not an easy task and because the XML Schema language is relatively complex.
The aim of this work is to research various aspects of the problem of automatic construction of an XML schema for a given set of XML documents. First, it is necessary to analyze existing solutions and to discuss their advantages and disadvantages. The core of the work is the proposal and implementation of an original method of automatic construction that addresses the disadvantages and shortcomings found. A possible approach can focus on new XML Schema constructs, e.g., inheritance, global and local items, and groups of elements and attributes, in combination with user interaction. The work will include suitable experimental results.
References
Extensible Markup Language (XML) 1.0 (Fourth Edition). W3C Recommendation, 16 August 2006. http://www.w3.org/TR/REC-xml

W3C. W3C Technical Reports and Publications. http://www.w3.org/TR/

Mlýnková, I. - Pokorný, J. - Richta, K. - Toman, K. - Toman, V.: Technologie XML. Skripta. Karlova Univerzita, Praha, Česká republika, září 2006.

Vošta, O.: Automatická konstrukce schématu pro množinu XML dokumentů. Diplomová práce, MFF UK, 2005. http://kocour.ms.mff.cuni.cz/~mlynkova/dp/Vosta.ps

Moh, C.-H. - Lim, E.-P. - Ng, W.-K.: Re-engineering Structures from Web Documents. In DL '00: Proc. of the 5th ACM Conf. on Digital Libraries, pages 67-76, New York, NY, USA, 2000. ACM Press.

Garofalakis, M. - Gionis, A. - Rastogi, R. - Seshadri, S. - Shim, K.: XTRACT: a System for Extracting Document Type Descriptors from XML Documents. In SIGMOD '00: Proc. of the 2000 ACM SIGMOD Int. Conf. on Management of Data, pages 165-176, New York, NY, USA, 2000. ACM Press.

Ahonen, H.: Generating Grammars for Structured Documents Using Grammatical Inference Methods. Report A-1996-4, Department of Computer Science, University of Helsinki, 1996.
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html