Course, academic year 2025/2026
   
Automatic Text Data Processing - NPFL098
Title: Automatické zpracování textových dat
Guaranteed by: Institute of Formal and Applied Linguistics (32-UFAL)
Faculty: Faculty of Mathematics and Physics
Actual: from 2022
Semester: summer
E-Credits: 6
Hours per week, examination: summer s.:2/2, C+Ex [HT]
Capacity: unlimited
Min. number of students: unlimited
4EU+: no
Virtual mobility / capacity: no
State of the course: cancelled
Language: Czech, English
Teaching methods: full-time
Additional information: http://ufal.mff.cuni.cz/courses/npfl098
Note: course is intended for doctoral students only
Guarantor: Mgr. Bc. Pavel Straňák, Ph.D.
Class: Informatics Mgr. - elective
Classification: Informatics > Computer and Formal Linguistics
Incompatibility : NPFL131
Interchangeability : NPFL131
Is incompatible with: NPFL131, NPFL152
Is interchangeable with: NPFL131, NPFL152
Annotation -
An introductory course in automatic text processing using the most common and efficient tools and methods. The skills acquired during the course will benefit any scientific work that involves large texts, and they are also required for serious study of computational linguistics.
Last update: Straňák Pavel, Mgr. Bc., Ph.D. (10.05.2013)
Course completion requirements -

Oral exam.

Obtaining the course credit is a precondition for taking the exam.

The course credit requires attendance and active participation in class, submission of all homework assignments, and a score of more than 50% of the points for the homework.

Last update: Straňák Pavel, Mgr. Bc., Ph.D. (10.06.2019)
Literature -

http://ufal.mff.cuni.cz/courses/npfl098

Learning Perl, 7th Edition (or at least 5th)

Learning the bash Shell

Linux Pocket Guide

Last update: Straňák Pavel, Mgr. Bc., Ph.D. (10.06.2019)
Requirements to the exam -

The exam tests knowledge of the content covered in the lectures.

Last update: Straňák Pavel, Mgr. Bc., Ph.D. (10.06.2019)
Syllabus -

We will use large texts from the students' field of study to demonstrate the most important methods of text processing required to acquire non-trivial information or verify hypotheses.

  • The impact of large text data: properties of big data
  • Unix shell and basic commands
  • more commands for text processing
  • text editors
  • searching via regular expressions
  • using regular expressions for text manipulation
  • formulation and verification of hypotheses, application on data, precision, recall
  • example applications: stripping diacritics, sentence segmentation, tokenisation
  • rule-based part of speech tagging
  • corpus acquisition
  • NLP workflow engines: GATE, OpenNLP, Treex
  • automatic complex analysis of a corpus
  • visualisation of the analysis and results
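
To give a flavour of the example applications and evaluation measures listed above, here is a minimal sketch in Python. It is an illustration only and not course material: the listed literature suggests the course itself works with the shell and Perl. The function names (strip_diacritics, tokenize, precision_recall) are hypothetical, and the tokeniser is a deliberately naive regular expression.

```python
import re
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove diacritics by decomposing characters (NFD) and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text: str) -> list[str]:
    """Naive tokenisation: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Precision = |predicted ∩ gold| / |predicted|; recall = |predicted ∩ gold| / |gold|."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    sample = "Automatické zpracování textových dat."
    print(strip_diacritics(sample))
    print(tokenize(sample))
    # Hypothetical hypothesis check: items a rule retrieved vs. items that are truly relevant.
    print(precision_recall({"a", "b", "c"}, {"b", "c", "d", "e"}))
```

Precision tells how many of the items a rule retrieved are correct, while recall tells how many of the correct items the rule managed to retrieve; a hypothesis about a text is usually judged on both.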
Last update: Straňák Pavel, Mgr. Bc., Ph.D. (10.05.2013)
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html