Thesis (Selection of subject) (version: 390)
Thesis details
Automatic Generation of Synthetic XML Documents
Thesis title in Czech: Automatické generování umělých XML dokumentů
Thesis title in English: Automatic Generation of Synthetic XML Documents
Czech key words: XML, generátor, testování, benchmark, umělá data
English key words: XML, generator, testing, benchmark, synthetic data
Academic year of topic announcement: 2007/2008
Thesis type: diploma thesis
Thesis language: English
Department: Department of Software Engineering (32-KSI)
Supervisor: doc. RNDr. Irena Holubová, Ph.D.
Author: hidden - assigned and confirmed by the Study Dept.
Date of registration: 11.11.2010
Date of assignment: 11.11.2010
Confirmed by Study dept. on: 07.05.2015
Date and time of defence: 15.06.2015 10:00
Date of electronic submission: 07.05.2015
Date of submission of printed version: 07.05.2015
Date of completed defence: 15.06.2015
Opponents: doc. RNDr. Jakub Klímek, Ph.D.
 
 
 
Guidelines
The aim of this work is to research the possibilities and limitations of automatically generating synthetic XML documents for testing XML applications. First, existing data generators need to be analyzed and their advantages and disadvantages discussed. The core of the work should be the proposal and implementation of an original algorithm that focuses on a reasonable subset of the possible characteristics of XML data, such as the size of the document, depth, fan-out, number of elements, number of attributes, mixed content, IDs and IDREF(S), distribution of the constructs, complexity of the constructs, and textual values. At the same time, the system should be easy and fast to use. The resulting algorithm will also be able to handle, at least partly, mutual dependencies among the various parameters. The parameters can be set either manually or extracted from a given set of XML documents, XML queries, etc. The work will include suitable experimental results.
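To make the listed characteristics concrete, here is a minimal Python sketch of a parameter-driven generator. It is paired with a function that measures the same characteristics on any tree, so parameters could also be read off existing documents. This is an illustration under simple assumptions, not the algorithm the thesis is expected to propose. The names `GenParams`, `generate` and `stats`, the element names `root`/`e2`/`e3`..., the attribute names `id`, `ref` and `a0`..., and the word list are all hypothetical. Attributes named `id` and `ref` act as ID/IDREF only when a DTD or schema declares them, and the sketch emits no DTD. It also shows why mutual dependencies matter: here the element count is not set directly but follows from depth and the random fan-out draws.

```python
import random
import xml.etree.ElementTree as ET
from dataclasses import dataclass

WORDS = ["alpha", "beta", "gamma", "delta", "omega"]  # toy text vocabulary (hypothetical)


@dataclass
class GenParams:
    # Hypothetical parameters covering a subset of the characteristics
    # listed in the guidelines; names and defaults are illustrative only.
    max_depth: int = 4           # depth of every leaf (root has depth 1)
    min_fanout: int = 1          # min children per inner element; >= 1 keeps all leaves at max_depth
    max_fanout: int = 3          # max children per inner element
    attrs_per_element: int = 1   # plain attributes besides id/ref
    mixed_prob: float = 0.2      # chance an inner element gets mixed content
    idref_prob: float = 0.3      # chance an element references an earlier id
    seed: int = 42               # fixed seed, so documents are reproducible


def generate(p: GenParams) -> ET.Element:
    rng = random.Random(p.seed)
    issued: list[str] = []       # ids assigned so far (the only IDREF targets)

    def build(depth: int) -> ET.Element:
        el = ET.Element("root" if depth == 1 else f"e{depth}")
        # Pick the reference before issuing this element's own id,
        # so a ref always points to an existing, different element.
        if issued and rng.random() < p.idref_prob:
            el.set("ref", rng.choice(issued))
        own_id = f"n{len(issued)}"
        el.set("id", own_id)
        issued.append(own_id)
        for i in range(p.attrs_per_element):
            el.set(f"a{i}", rng.choice(WORDS))
        if depth == p.max_depth:
            el.text = rng.choice(WORDS)        # leaves carry text values
            return el
        mixed = rng.random() < p.mixed_prob
        if mixed:
            el.text = rng.choice(WORDS) + " "  # text before the first child
        for _ in range(rng.randint(p.min_fanout, p.max_fanout)):
            child = build(depth + 1)
            if mixed:
                child.tail = " " + rng.choice(WORDS)  # text after each child
            el.append(child)
        return el

    return build(1)


def stats(root: ET.Element) -> dict:
    """Measure a few characteristics of a tree, generated or parsed from a file."""
    def depth(el: ET.Element) -> int:
        return 1 + max((depth(c) for c in el), default=0)

    elements = list(root.iter())
    return {
        "depth": depth(root),
        "elements": len(elements),
        "attributes": sum(len(e.attrib) for e in elements),
        "max_fanout": max(len(e) for e in elements),
        "idrefs": sum(1 for e in elements if "ref" in e.attrib),
    }


if __name__ == "__main__":
    doc = generate(GenParams())
    print(ET.tostring(doc, encoding="unicode")[:200])
    print(stats(doc))
```

Running `stats(ET.parse("sample.xml").getroot())` on an existing document (the file name is hypothetical) yields measured values that could seed a `GenParams`, for example by taking the measured depth as `max_depth`. A fuller generator would also need to model value distributions and to reconcile parameters that constrain each other.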
References
1. Extensible Markup Language (XML) 1.0 (Fourth Edition). W3C Recommendation, 16 August 2006. http://www.w3.org/TR/REC-xml

2. W3C. W3C Technical Reports and Publications. http://www.w3.org/TR/

3. Mlýnková, I. - Pokorný, J. - Richta, K. - Toman, K. - Toman, V.: Technologie XML. Lecture notes. Charles University, Prague, Czech Republic, September 2006.

4. XML benchmarking projects:
XMark http://monetdb.cwi.nl/xml/
XOO7 Benchmark http://www.comp.nus.edu.sg/~ebh/XOO7.html
XMach-1 http://dbs.uni-leipzig.de/en/projekte/XML/XmlBenchmarking.html
The Michigan Benchmark http://www.eecs.umich.edu/db/mbench/
XBench http://se.uwaterloo.ca/~ddbms/projects/xbench/
XPathMark http://users.dimi.uniud.it/~massimo.franceschet/xpathmark/
MemBeR: XQuery Micro-Benchmark Repository http://ilps.science.uva.nl/Resources/MemBeR/index.html
TPoX http://tpox.sourceforge.net/

5. Data generators:
ToXgene http://www.alphaworks.ibm.com/tech/toxgene
A. Aboulnaga, J. F. Naughton, and C. Zhang. Generating Synthetic Complex-Structured XML Data. In WebDB'01: Proc. of the 4th Int. Workshop on the Web and Databases, pages 79-84, Washington, DC, USA, 2001.
L. Afanasiev, I. Manolescu, and P. Michiels. MemBeR XML Generator. http://ilps.science.uva.nl/Resources/MemBeR/member-generator.html
P. Azalov and F. Zlatarova. SDG - A System for Synthetic Data Generation. In ITCC'03: Proc. of the Int. Conf. on Information Technology: Computers and Communications, pages 69-75, Washington, DC, USA, 2003. IEEE Computer Society.

6. Mlýnková, I. - Toman, K. - Pokorný, J.: Statistical Analysis of Real XML Data Collections. Technical report 2006/5. Charles University, Prague, Czech Republic, June 2006, 43 pages. http://www.ksi.mff.cuni.cz/~mlynkova/doc/tr2006-5.pdf

7. Vranec, M.: XML Benchmarking. http://www.ksi.mff.cuni.cz/~mlynkova/dp/Vranec.pdf

8. Mlýnková, I.: XML Benchmarking: Limitations and Opportunities. Technical report 2008/1. Charles University, Prague, Czech Republic, January 2008, 23 pages. http://www.ksi.mff.cuni.cz/~mlynkova/doc/tr2008-1.pdf
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html