Automatic Text Data Processing - NPFL098

Title:	Automatické zpracování textových dat
Guaranteed by:	Institute of Formal and Applied Linguistics (32-UFAL)
Faculty:	Faculty of Mathematics and Physics
Actual:	from 2022
Semester:	summer
E-Credits:	6
Hours per week, examination:	summer s.:2/2, C+Ex [HT]
Capacity:	unlimited
Min. number of students:	unlimited
4EU+:	no
Virtual mobility / capacity:	no
State of the course:	cancelled
Language:	Czech, English
Teaching methods:	full-time
Additional information:	http://ufal.mff.cuni.cz/courses/npfl098
Note:	course is intended for doctoral students only

Guarantor:	Mgr. Pavel Straňák, Ph.D.
Class:	Informatics Mgr. - elective
Classification:	Informatics > Computer and Formal Linguistics
Incompatibility :	NPFL131
Interchangeability :	NPFL131
Is incompatible with:	NPFL131
Is interchangeable with:	NPFL131

Annotation -

Last update: Mgr. Pavel Straňák, Ph.D. (10.05.2013)

An introductory course in automatic text processing using the most common and efficient tools and methods. The skills acquired in the course benefit any scientific work that involves large texts, and they are required for serious study of computational linguistics.

Course completion requirements -

Last update: Mgr. Pavel Straňák, Ph.D. (10.06.2019)

Oral exam.

Obtaining the course credit is a precondition for taking the exam.

The course credit requires attendance and active participation in class, submission of all homework assignments, and more than 50% of the available homework points.

Literature -

Last update: Mgr. Pavel Straňák, Ph.D. (10.06.2019)

http://ufal.mff.cuni.cz/courses/npfl098

Learning Perl, 7th Edition (or at least 5th)

Learning the bash Shell

Linux Pocket Guide

Requirements to the exam -

Last update: Mgr. Pavel Straňák, Ph.D. (10.06.2019)

Exams test knowledge of the content explained in the lectures.

Syllabus -

Last update: Mgr. Pavel Straňák, Ph.D. (10.05.2013)

We will use large texts from the students' field of study to demonstrate the most important methods of text processing required to acquire non-trivial information or verify hypotheses.

The impact of large text data: properties of big data

Unix shell and basic commands

more commands for text processing

text editors

searching via regular expressions

using regular expressions for text manipulation

formulation and verification of hypotheses, application to data, precision, recall

example applications: stripping diacritics, sentence segmentation, tokenisation

rule-based part-of-speech tagging

corpus acquisition

NLP workflow engines: GATE, OpenNLP, Treex

automatic complex analysis of a corpus

visualisation of the analysis and results
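
Three of the syllabus topics (stripping diacritics, tokenisation, and precision/recall) can be illustrated with a minimal sketch. This is not course material: the literature listed above points to Perl and the shell, and Python is used here only for illustration. The function names and the simple regex-based tokeniser are assumptions made for this example.

```python
import re
import unicodedata


def strip_diacritics(text):
    """Remove diacritics by decomposing characters (NFD) and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenise(text):
    """Naive tokeniser (an assumption, not a standard): runs of word characters,
    or single non-space punctuation characters."""
    return re.findall(r"\w+|[^\w\s]", text)


def precision_recall(predicted, gold):
    """Compute precision and recall of a predicted set against a gold-standard set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    print(strip_diacritics("Automatické zpracování textových dat"))
    print(tokenise("Hello, world!"))
    print(precision_recall({"a", "b", "c"}, {"b", "c", "d", "e"}))
```

In the last call, two of the three predicted items are in the gold set, so precision is 2/3, and two of the four gold items were found, so recall is 1/2.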