Transcriptor
Thesis title in Czech: Transkriptor
Thesis title in English: Transcriptor
Keywords (Czech): transkripce, transliterace, fonetická abeceda
Keywords (English): transcription, transliteration, phonetic alphabet
Academic year of topic announcement: 2014/2015
Thesis type: bachelor's thesis
Thesis language: English
Department: Institute of Formal and Applied Linguistics (32-UFAL)
Supervisor: RNDr. Daniel Zeman, Ph.D.
Author: hidden - assigned and confirmed by the Study Department
Date of registration: 16.03.2015
Date of assignment: 16.03.2015
Date of confirmation by the Study Department: 23.03.2015
Guidelines
Transcription of natural language text from one script to another is needed for various tasks such as:

- transcription of foreign personal or geographical names for use in a language other than the original
- pronunciation guide for foreigners
- input method on computers and other devices lacking a keyboard for the target script

Transcription, in contrast to transliteration, does not necessarily mean a 1-1 mapping between sets of characters. Transcription focuses on capturing the pronunciation using the spelling rules of another script AND language. For instance, transcription of the Russian name Чайковский into the Latin script may result in Chaikovsky, Tchaïkovski, Tschaikowski or Čajkovskij, among others, depending on the target language. The focus on pronunciation can be exploited if we decompose transcription into modeling pronunciation of all the languages involved, using the International Phonetic Alphabet (IPA). We could model the mapping between sequences of characters in language L1 and sequences of IPA symbols. Then we could combine the models so that L1 → IPA → L2 would render the desired transcription L1 → L2.
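The L1 → IPA → L2 composition can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `apply_rules` and `transcribe`, the greedy longest-match strategy, and the toy rule tables, which cover only the example name and are not a vetted pronunciation model of any language.

```python
def apply_rules(text, rules):
    """Rewrite text left to right, always taking the longest matching rule."""
    max_len = max(len(key) for key in rules)
    out = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in rules:
                out.append(rules[chunk])
                i += length
                break
        else:
            # No rule matched: copy the symbol unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


def transcribe(text, src_to_ipa, ipa_to_tgt):
    """Compose two rule sets: source script -> IPA -> target spelling."""
    ipa = apply_rules(text.lower(), src_to_ipa)
    result = apply_rules(ipa, ipa_to_tgt)
    # Restore the capitalization of the first letter, if there was one.
    if text[:1].isupper():
        result = result[:1].upper() + result[1:]
    return result


# Toy rule tables (hypothetical, just enough for one name).
RU_TO_IPA = {"ч": "tʃ", "а": "a", "й": "j", "к": "k",
             "о": "o", "в": "v", "с": "s", "и": "i"}
IPA_TO_CS = {"tʃ": "č"}                        # other symbols are copied as-is
IPA_TO_EN = {"tʃ": "ch", "ij": "y", "j": "i"}  # "ij" is tried before "j"

print(transcribe("Чайковский", RU_TO_IPA, IPA_TO_CS))  # Čajkovskij
print(transcribe("Чайковский", RU_TO_IPA, IPA_TO_EN))  # Chaikovsky
```

The same Russian-to-IPA table is reused for both target languages; only the second table changes. That reuse is the point of routing every direction through IPA.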

The goal of the thesis is to test the approach on at least three languages, two of which use the Latin script and one of which uses a different script. A minimal solution involves the following:

- Design and implement a general system that transcribes text according to user-supplied rules.
- As an interface to the transcription system, implement a web-based application. It should provide means for designing transcription rules, importing sets of rules and applying sets of rules to user-supplied text or existing websites.
- Create a rule-based model of pronunciation of each language (i.e. bi-directional mapping Lx ↔ IPA).
- Create (or find online) test data with transcriptions for evaluation purposes.
- Use the models to test and evaluate all 6 (or more in case of more languages) transcription directions. Analyze the results in the thesis.
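The count of 6 directions follows from taking every ordered pair of distinct languages: n languages give n × (n − 1) directions, so 3 languages give 6. The sketch below enumerates the directions and scores a transcription system against reference data. The language codes, the function names, and the two metrics (exact-match word accuracy and mean edit distance) are assumptions for illustration; the assignment does not prescribe a metric.

```python
from itertools import permutations


def directions(languages):
    """All ordered (source, target) pairs of distinct languages."""
    return list(permutations(languages, 2))


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def evaluate(system, test_pairs):
    """Score a transcription function on (source, reference) pairs."""
    exact = 0
    total_distance = 0
    for source, reference in test_pairs:
        hypothesis = system(source)
        exact += hypothesis == reference
        total_distance += edit_distance(hypothesis, reference)
    n = len(test_pairs)
    return {"word_accuracy": exact / n,
            "mean_edit_distance": total_distance / n}


for src, tgt in directions(["cs", "en", "ru"]):
    print(f"{src} -> {tgt}")
```

Reporting an edit distance next to exact-match accuracy separates near misses (one wrong letter) from outputs that are entirely wrong, which helps when analyzing the results per direction.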

An optional enhancement would be to add a machine-learning module that would learn transcription rules and/or the context of their application from human-transcribed training data. A pre-existing, downloadable library implementing a machine-learning algorithm can be used for this purpose; the student would implement the pre- and postprocessing of the data. The focus of this enhancement would be on research rather than programming: what is the best way of preparing the training model in order to get good transcription rules?
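As a rough illustration of what learning rules from human-transcribed data could mean in the simplest case, the sketch below counts how often each source segment is paired with each target segment and keeps the most frequent pairing. It is a frequency baseline, not the machine-learning library the assignment refers to. It also assumes the training pairs are already segmented and aligned one-to-one, which real data would not be; producing such an alignment is part of the preprocessing question raised above. The name `learn_rules` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def learn_rules(aligned_pairs):
    """Pick the most frequent target segment for each source segment.

    aligned_pairs: iterable of (source_segments, target_segments), where
    both lists have the same length and position k aligns with position k.
    """
    counts = defaultdict(Counter)
    for source_segments, target_segments in aligned_pairs:
        for src, tgt in zip(source_segments, target_segments):
            counts[src][tgt] += 1
    return {src: tgt_counts.most_common(1)[0][0]
            for src, tgt_counts in counts.items()}


# Hypothetical, hand-aligned training data (Cyrillic -> Czech spelling).
training = [
    (["ч", "а", "й"], ["č", "a", "j"]),
    (["ч", "а", "с"], ["č", "a", "s"]),
    (["ч", "и", "с"], ["tsch", "i", "s"]),  # an inconsistent annotation
]

rules = learn_rules(training)
print(rules["ч"])  # the majority choice among the pairings seen for "ч"
```

Context-free counting like this cannot learn when a rule applies, only which rewrite is most common. Capturing context would need features such as neighboring segments, which is where a proper learning algorithm becomes useful.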
References
Min Zhang, A Kumaran, Haizhou Li: Whitepaper of NEWS 2011 Shared Task on Machine Transliteration, 2011
 