Course, academic year 2019/2020
Web Semantization - NSWI108
Title in Czech: Sémantizace webu
Guaranteed by: Department of Software Engineering (32-KSI)
Faculty: Faculty of Mathematics and Physics
Valid: from 2019 to 2019
Semester: winter
E-Credits: 5
Hours per week, examination: winter s.:2/2 C+Ex [hours/week]
Capacity: unlimited
Min. number of students: unlimited
State of the course: not taught
Language: Czech, English
Teaching methods: full-time
Additional information:
Note: enabled for web enrollment
Guarantor: prof. RNDr. Peter Vojtáš, DrSc.
Class: Informatics Mgr. - Software Systems
Classification: Informatics > Informatics, Software Applications, Computer Graphics and Geometry, Database Systems, Didactics of Informatics, Discrete Mathematics, External Subjects, General Subjects, Computer and Formal Linguistics, Optimization, Programming, Software Engineering, Theoretical Computer Science
Annotation -
Last update: RNDr. Michal Kopecký, Ph.D. (09.05.2019)
Basic semantic web models are covered in NSWI145; dynamic aspects of web data extraction (semantization) are covered in NDBI021 and NSWI167 (see also NSWI144 and NSWI142). Semantization is the dynamic enrichment of web content that is needed for automated processing of that content. We treat the problem from a software engineering perspective: models, methodology, and the process of web semantization. We cover the basic formal knowledge needed for orientation in the field and practice some practical skills. Labs consist of reporting on current achievements and learning rules for semantization.
Course completion requirements -
Last update: prof. RNDr. Peter Vojtáš, DrSc. (12.10.2017)

The credit requirements consist of reporting on current achievements, induction on semantized data, a virtual Lean Startup project, and customer imitation via a social network. These are requirements for obtaining the credit only. The exam is oral and requires a basic understanding of the whole material.

Once the terminology has been introduced, detailed milestones (including the form of deliverables) and preferred deadlines will be announced (with repeated attempts possible). Attendance is not recorded. Nevertheless, no additional explanation of the tasks will be given beyond the respective lab and a brief description on the course web page. The final deadline is the end of the semester (without repeated attempts).

Literature -
Last update: RNDr. Michal Kopecký, Ph.D. (10.05.2017)
  • P. Hitzler, M. Krötzsch, S. Rudolph. Foundations of Semantic Web Technologies. Chapman & Hall/CRC 2010
  • E. Ries. The Lean Startup. Crown Business 2011
  • D. Harel, D. Kozen, J. Tiuryn. Dynamic Logic. The MIT Press 2000
  • G. James, D. Witten, T. Hastie, R. Tibshirani. An Introduction to Statistical Learning with Applications in R. Springer 2013
  • C. D. Manning, P. Raghavan, H. Schütze. An Introduction to Information Retrieval. Cambridge University Press 2009

Syllabus -
Last update: RNDr. Michal Kopecký, Ph.D. (10.05.2017)
Web semantization
Basic problems and vision of automation of web content processing, extraction, annotation

Lean start-up methodology and semantization

RDF-framework, description logic, OWL
Data model RDF and RDFS as a model of metadata, formal semantics, satisfiability

Basics of description logic (DeL), knowledge and ontology representation

Web querying languages
Language SPARQL, SPARQL algebra

Dynamic logic
Propositional dynamic logic (DyL)

Decidability of DyL

A dynamic model of web semantization
Integration of W3C models and Dynamic logic

Reliability of automated web information extraction and annotation

A Kripke-style model: states are described in query-based predicate logic, while programs (extractors) remain propositional, supplemented with information on extractor training (metrics, data)

A hypothesis: extraction success is similar on similar resources (e.g. those created from the same templates)
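
To make the Kripke-style reading of the syllabus more concrete, the following is a minimal sketch of how propositional dynamic logic formulas could be evaluated when atomic programs are extractors. It is an illustration only, not the course's own formalization: the state names (`raw`, `tagged`, `linked`), the extractors `tag` and `link`, and all function names are hypothetical, and states are reduced to plain labels instead of query-based descriptions.

```python
from typing import Set, Tuple

State = str
Relation = Set[Tuple[State, State]]


def compose(r1: Relation, r2: Relation) -> Relation:
    """Sequential composition: run r1, then r2."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}


def star(r: Relation, states: Set[State]) -> Relation:
    """Iteration: reflexive-transitive closure of r over the given states."""
    closure: Relation = {(s, s) for s in states}
    while True:
        extended = closure | compose(closure, r)
        if extended == closure:
            return closure
        closure = extended


def diamond(r: Relation, holds: Set[State], states: Set[State]) -> Set[State]:
    """States where SOME run of the program ends in a state from `holds`."""
    return {s for s in states if any((s, t) in r for t in holds)}


def box(r: Relation, holds: Set[State], states: Set[State]) -> Set[State]:
    """States where EVERY run of the program ends in a state from `holds`."""
    return {s for s in states if all(t in holds for (s2, t) in r if s2 == s)}


# Hypothetical toy model: three annotation states of one web resource.
states: Set[State] = {"raw", "tagged", "linked"}

# Hypothetical extractors, each given as a transition relation on states.
tag: Relation = {("raw", "tagged")}
link: Relation = {("tagged", "linked")}

# Atomic proposition "the resource is fully annotated".
annotated: Set[State] = {"linked"}

# <tag ; link> annotated: running tag and then link can reach an annotated state.
can_annotate = diamond(compose(tag, link), annotated, states)

# <(tag U link)*> annotated: some sequence of extractor runs reaches it.
eventually = diamond(star(tag | link, states), annotated, states)

# [tag] tagged: every run of tag ends in "tagged" (vacuously true where tag cannot run).
always_tagged = box(tag, {"tagged"}, states)
```

In this toy model `can_annotate` contains only `raw`, while `eventually` and `always_tagged` contain all three states. Representing programs as explicit relations keeps the sketch small; information on extractor training (metrics, data) would have to be attached to the relations separately.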
