Web Knowledge Mining - NSWI107

Title:	Dobývání informací z webu (Czech title; literally "Mining Information from the Web")
Guaranteed by:	Department of Software Engineering (32-KSI)
Faculty:	Faculty of Mathematics and Physics
Valid:	from 2007
Semester:	summer
E-Credits:	6
Hours per week, examination:	summer semester: 2/2 (lectures/practicals), C+Ex (course credit and examination) [hours per week]
Capacity:	unlimited
Min. number of students:	unlimited
4EU+:	no
Virtual mobility / capacity:	no
State of the course:	cancelled
Language:	Czech
Teaching methods:	full-time

Guarantor:	RNDr. Leo Galamboš, Ph.D.
Class:	Informatics, Master's programme, elective
Classification:	Informatics > Informatics, Software Applications, Computer Graphics and Geometry, Database Systems, Didactics of Informatics, Discrete Mathematics, External Subjects, General Subjects, Computer and Formal Linguistics, Optimalization, Programming, Software Engineering, Theoretical Computer Science
Prerequisites:	NDBI010, NPRG013

Annotation

Last update: T_KSI (29.03.2005)

This course introduces the fundamental concepts and advanced techniques of text-based information systems on the Web. It covers efficient Web indexing, searching, and crawling, as well as clustering, classification, and text mining. Each student implements a project on one of a range of topics in Web information retrieval.

Literature

Soumen Chakrabarti: Mining the Web: Discovering Knowledge from Hypertext Data. Amsterdam: Morgan Kaufmann, 2003.

Ricardo Baeza-Yates, Berthier Ribeiro-Neto: Modern Information Retrieval. Addison-Wesley, 1999.

Ian H. Witten, Alistair Moffat, and Timothy C. Bell: Managing Gigabytes: Compressing and Indexing Documents and Images. Van Nostrand Reinhold, 1994.

Syllabus

Engineering Large-Scale Crawlers.

The Vector-Space Model, Inverted Index, Recall, Precision.

Stop words, stemming, lemmatization, Soundex.

Handling "Find-Similar" Queries, Eliminating Near Duplicates.

Clustering: Bottom-Up/Top-Down; The k-Means Algorithm, Self-Organizing Maps, Multidimensional Scaling, Latent Semantic Indexing, Collaborative Filtering.

(Semi-)Supervised Learning.

PageRank, HITS.

Measuring and Modeling the Web.

Resource Discovery, Communities.
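The syllabus lists PageRank without describing it. The following is a minimal sketch of the common power-iteration formulation, not taken from the course materials. It assumes a damping factor of 0.85, uniform teleportation, and that pages with no out-links (dangling pages) spread their rank evenly over all pages. The function name `pagerank`, its parameters, the input format, and the four-page example graph are hypothetical.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Sketch of PageRank by power iteration.

    `links` maps each page to the list of pages it links to (hypothetical
    input format). Assumes uniform teleportation and that dangling pages
    (no out-links) spread their rank evenly over all pages.
    """
    # Collect every page that occurs as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution

    for _ in range(max_iter):
        # Rank held by dangling pages is redistributed uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new = {p: base for p in pages}
        # Each page passes a damped, equal share of its rank along every out-link.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        # Stop once the L1 distance between successive vectors is negligible.
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical four-page web: a -> b, c; b -> c; c -> a; d -> c.
    web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 4))
```

In the example graph, page `c` receives links from three pages and ends up with the highest score, while `d`, which no page links to, keeps only the teleportation share 0.15/4. Because each iteration redistributes exactly the total rank, the scores always sum to 1.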