Scala-based Euphoria API
|Thesis title in Czech:||Scala-based Euphoria API|
|Thesis title in English:||Scala-based Euphoria API|
|Academic year of topic announcement:||2019/2020|
|Type of assignment:||diploma thesis|
|Department:||Department of Distributed and Dependable Systems (32-KDSS)|
|Supervisor:||prof. RNDr. Tomáš Bureš, Ph.D.|
|Euphoria API is an open-source framework for big-data application development (https://github.com/seznam/euphoria). The key features of Euphoria API are:
- Independence of an application on the particular runtime environment and infrastructure
- Unified way of expressing data operations over batch and stream data
- Good readability and understandability of code written in Euphoria API
At present, Euphoria supports Java 8 with lambda functions. This API does not allow the underlying frameworks (executors) to exploit knowledge about the structure of the operations. That knowledge is nevertheless present in the code of the data operations, and exposing it could let the underlying framework optimize Euphoria-based code, bringing the performance of Euphoria applications close to that of native applications developed specifically for a particular executor framework.
The goal of the thesis is to address this problem by providing a Scala-based Euphoria API. Scala allows compile-time extraction of additional information from lambda functions, which should allow executors to perform the optimizations. The Scala API should be designed from scratch, with emphasis on ease of use (while exploiting Scala's advanced language features) and minimal overhead. The thesis should further extend the existing Java API with an option to specify the additional information for the underlying executors. The Scala API will internally use this extended Java API. The work will also include extending the executor-specific code (for Apache Spark and Apache Flink) so that it can exploit the additional information and pass it on to the underlying framework.
The API changes can be validated on the provided test applications, executed on a Hadoop cluster (of tens to hundreds of nodes) and on data generated by the Seznam.cz crawler.
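To illustrate what "additional information for the underlying executors" could mean on the Java side, the following is a minimal sketch and not part of the existing Euphoria API. All names in it (`AnnotatedFunction`, `readFields`, `canPrune`, `Page`) are hypothetical. It assumes the extra information is the set of input fields a user function reads, declared by hand here; in the envisioned Scala API such information would be extracted at compile time instead.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical sketch: a user function bundled with metadata an executor could inspect. */
public class AnnotatedFunctionSketch {

    /** A function plus the names of the input fields it reads (hypothetical type). */
    static final class AnnotatedFunction<I, O> implements Function<I, O> {
        private final Function<I, O> body;
        private final Set<String> readFields;

        AnnotatedFunction(Function<I, O> body, String... readFields) {
            this.body = body;
            this.readFields = Collections.unmodifiableSet(
                new LinkedHashSet<>(Arrays.asList(readFields)));
        }

        @Override
        public O apply(I input) {
            return body.apply(input);
        }

        Set<String> readFields() {
            return readFields;
        }
    }

    /** Toy input record with two fields. */
    static final class Page {
        final String url;
        final String body;

        Page(String url, String body) {
            this.url = url;
            this.body = body;
        }
    }

    /** What an executor might ask: may this field be dropped before the function runs? */
    static boolean canPrune(Function<?, ?> fn, String field) {
        if (fn instanceof AnnotatedFunction) {
            return !((AnnotatedFunction<?, ?>) fn).readFields().contains(field);
        }
        // A plain lambda is opaque, so every field must be assumed to be read.
        return false;
    }

    public static void main(String[] args) {
        Function<Page, Integer> opaque = p -> p.url.length();
        AnnotatedFunction<Page, Integer> annotated =
            new AnnotatedFunction<Page, Integer>(p -> p.url.length(), "url");

        System.out.println(canPrune(opaque, "body"));    // false
        System.out.println(canPrune(annotated, "body")); // true
        System.out.println(annotated.apply(new Page("http://a", "text"))); // 8
    }
}
```

Because the wrapper still implements `Function`, it could be passed wherever a plain lambda is accepted today, which is one possible way to keep such an extension backward compatible.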
|Martin Odersky, Lex Spoon, and Bill Venners: Programming in Scala, Third Edition, Artima
Euphoria API, http://github.com/seznam/euphoria/
Apache Spark, http://spark.apache.org/
Apache Flink, http://flink.apache.org/