Course, academic year 2022/2023
Programming for corpus linguistics: Python and NLTK II - AMLV00063
Title: Programování pro korpusovou lingvistiku: Python a NLTK II
Guaranteed by: Institute of the Czech National Corpus (21-UCNK)
Faculty: Faculty of Arts
Valid: from 2019
Semester: summer
Points: 0
E-Credits: 4
Examination process: summer s.:
Hours per week, examination: summer s.: 0/2, C [HT]
Capacity: unknown / 10 (10)
Min. number of students: unlimited
Virtual mobility / capacity: no
State of the course: taught
Language: Czech
Teaching methods: full-time
Level:  
Note: students may enroll in the course outside their study plan
enabled for web enrollment
Guarantor: Mgr. David Lukeš
Teacher(s): Mgr. David Lukeš
Annotation -
Last update: Mgr. David Lukeš (26.11.2019)
Introduction to programming in Python for linguists, part II. The course is taught mainly in Czech, so attending requires sufficient proficiency in the language. Please refer to the Czech annotation for further details.
Course completion requirements -
Last update: Mgr. David Lukeš (26.11.2019)

Credit requirements: regular attendance, active participation, completion of coursework including a final assignment.

Literature -
Last update: Mgr. David Lukeš (16.08.2018)

Bird, S., Klein, E., & Loper, E. (2014). Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. Retrieved from http://www.nltk.org/book/

Gries, P., Campbell, J., & Montojo, J. (2013). Practical Programming: An Introduction to Computer Science Using Python 3 (2nd ed.). Dallas, Texas: Pragmatic Bookshelf.

Jurafsky, D., & Martin, J. H. (2019). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (3rd ed.). Retrieved from https://web.stanford.edu/~jurafsky/slp3/

Lukeš, D. (2016, January 27). How computers handle text: A gentle but thorough introduction to Unicode. Retrieved August 15, 2018, from https://dlukes.github.io/unicode.html

Matthes, E. (2015). Python Crash Course: A Hands-On, Project-Based Introduction to Programming (1st ed.). San Francisco: No Starch Press.

McEnery, T., & Hardie, A. (2011). Corpus Linguistics: Method, Theory and Practice. Cambridge University Press.

Moran, S., & Cysouw, M. (2018). The Unicode cookbook for linguists: managing writing systems using orthography profiles. Berlin: Language Science Press. Retrieved from http://langsci-press.org/catalog/book/176

Skiena, S. S. (2008). The Algorithm Design Manual. London: Springer London. https://doi.org/10.1007/978-1-84800-070-4

Sweigart, A. (2018). Cracking Codes with Python: An Introduction to Building and Breaking Ciphers. San Francisco: No Starch Press.

Vaughan, L. (2018). Impractical Python: Playful Programming Activities to Make You Smarter. San Francisco: No Starch Press.

Zinoviev, D. (2016). Data Science Essentials in Python: Collect – Organize – Explore – Predict – Value (1st ed.). Raleigh, North Carolina: Pragmatic Bookshelf.

Syllabus -
Last update: Mgr. David Lukeš (03.01.2021)

Main topics (grouped thematically here for clarity; the order in which they are covered during the semester partly differs):

1. a look "under the hood" of various techniques used in natural language processing

· text generation

· morphological tagging

· text classification

2. introduction to the command line, writing command-line programs

3. source code management

· organizing / structuring longer programs, choosing a text editor

· version control (git, https://github.com/)

· publishing (free / libre / open-source software)

4. object-oriented programming in Python

· creating your own new types of objects, known as classes

· how to recognize situations where reaching for these more complex tools pays off

 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html