Course, academic year 2019/2020
Data Integration and Quality - NSWI144
Title in Czech: Integrace a kvalita dat
Guaranteed by: Department of Software Engineering (32-KSI)
Faculty: Faculty of Mathematics and Physics
Valid: from 2018 to 2019
Semester: winter
E-Credits: 4
Hours per week, examination: winter semester: 2/1, C+Ex [hours/week]
Capacity: unlimited
Min. number of students: unlimited
State of the course: taught
Language: Czech, English
Teaching methods: full-time
Additional information: https://jakub.klí
Note: enabled for web enrollment
Guarantor: RNDr. Jakub Klímek, Ph.D.
Class: Informatika Mgr. - Softwarové systémy (Computer Science, Master's - Software Systems)
Classification: Informatics > Software Engineering
Annotation -
Last update: RNDr. Michal Kopecký, Ph.D. (28.12.2018)
Students will learn about the problems of data integration and the approaches to solving them, mainly using web technologies. They will try out the whole integration process on selected open data using open-source tools. Students will then study the problem of data quality: its dimensions and the approaches to measuring, evaluating, and improving it.
Course completion requirements -
Last update: RNDr. Jakub Klímek, Ph.D. (07.06.2019)

The course credit (zápočet) is awarded for completing a semester assignment before the deadline set by the lecturer. Because of the nature of this requirement, the credit assessment cannot be retaken.

Students must obtain the course credit (zápočet) before registering for the exam.

The exam is written.

Literature - Czech
Last update: RNDr. Jakub Klímek, Ph.D. (02.10.2017)

[1] C. Batini and M. Scannapieco. Data Quality: Concepts, Methodologies and Techniques (Data-Centric Systems and Applications). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.

[2] J. Bleiholder and F. Naumann. Data fusion. ACM Comput. Surv., 41(1):1:1-1:41, Jan. 2009.

[3] T. Heath and C. Bizer. Linked Data: Evolving the Web into a Global Data Space (1st edition). Synthesis Lectures on the Semantic Web: Theory and Technology, 1(1):1-136. Morgan & Claypool, 2011.

[4] C. Batini, C. Cappiello, C. Francalanci, and A. Maurino. Methodologies for Data Quality Assessment and Improvement. ACM Comput. Surv., 41(3):1-52, 2009.

[5] T. Berners-Lee. Linked Data - Design Issues.

[6] L. Dodds and I. Davis. Linked Data Patterns.

Syllabus -
Last update: RNDr. Michal Kopecký, Ph.D. (28.12.2018)

1. Publishing and consuming data on the Web according to the Linked Data principles. The Linked Open Data cloud. Practical examples.

2. Data integration: problems and approaches using web technologies.

3. Integration of open data on the Web and relevant tools.

4. Integration of open data on the Web: students design a data integration process for selected open data on the Web, analyse the data quality dimensions relevant to it, and design metrics for assessing and improving its quality.

5. Data quality: its dimensions and metrics. Methodologies and tools for assessing data quality.

6. Data quality dimensions and metrics important for assessing the quality of open data on the Web. Typical data quality problems in existing data sources. Methods for improving data quality.

Charles University | Information system of Charles University