Distributed job execution in IVIS Framework
| Field | Value |
|---|---|
| Thesis title in Czech | Distribuované vykonávání jobu v IVIS Framework |
| Thesis title in English | Distributed job execution in IVIS Framework |
| Key words (Czech) | distribuovaný výpočet, cloud, zpracování dat, javascript |
| English key words | distributed computing, cloud, data processing, javascript |
| Academic year of topic announcement | 2022/2023 |
| Thesis type | Bachelor's thesis |
| Thesis language | English |
| Department | Department of Distributed and Dependable Systems (32-KDSS) |
| Supervisor | prof. RNDr. Tomáš Bureš, Ph.D. |
| Author | hidden; assigned and confirmed by the Study Dept. |
| Date of registration | 29.09.2022 |
| Date of assignment | 02.10.2022 |
| Confirmed by Study Dept. on | 23.11.2022 |
| Date and time of defence | 07.09.2023 09:00 |
| Date of electronic submission | 14.07.2023 |
| Date of submission of printed version | 14.07.2023 |
| Date of completed defence | 07.09.2023 |
| Opponent (reviewer) | Mgr. Vojtěch Horký, Ph.D. |
Guidelines
IVIS is a web-based framework for creating data analytics and visualization web applications. The framework allows complex data processing through the mechanism of tasks and jobs. A task is a collection of Python scripts and metadata that specifies the parameters of the scripts. A job is an instantiation of the task with a concrete set of parameters and input datasets and with the specification of triggers that govern when the job is executed.
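To make the task/job distinction concrete, here is a minimal Python sketch of the data model. It rests on stated assumptions: the class names (`ParamSpec`, `Task`, `Trigger`, `Job`), the helper `instantiate`, and the two trigger kinds (`interval`, `dataset_change`) are hypothetical illustrations and are not taken from IVIS-core. It shows how a job binds one task to concrete parameter values, input datasets, and triggers, and how those values might be checked against the parameter metadata of the task.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class ParamSpec:
    """Hypothetical parameter descriptor stored in a task's metadata."""
    name: str
    expected: type
    required: bool = True


@dataclass(frozen=True)
class Task:
    """A task: Python scripts plus metadata describing their parameters."""
    name: str
    scripts: dict[str, str]  # file name -> source code
    params: tuple[ParamSpec, ...]


@dataclass(frozen=True)
class Trigger:
    """Hypothetical trigger: run periodically or when an input dataset changes."""
    kind: str  # "interval" or "dataset_change" (assumed kinds)
    interval_s: Optional[int] = None
    dataset: Optional[str] = None


@dataclass
class Job:
    """A job: a task instantiated with concrete parameters, inputs and triggers."""
    task: Task
    params: dict[str, Any]
    inputs: list[str]
    triggers: list[Trigger] = field(default_factory=list)


def instantiate(task: Task, params: dict[str, Any], inputs: list[str],
                triggers: tuple[Trigger, ...] | list[Trigger] = ()) -> Job:
    """Create a Job after checking params against the task's ParamSpecs."""
    specs = {p.name: p for p in task.params}
    unknown = set(params) - specs.keys()
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    for spec in task.params:
        if spec.name not in params:
            if spec.required:
                raise ValueError(f"missing required parameter: {spec.name}")
            continue
        if not isinstance(params[spec.name], spec.expected):
            raise TypeError(f"{spec.name} must be {spec.expected.__name__}")
    return Job(task, dict(params), list(inputs), list(triggers))


# Usage: one task, instantiated as a job that runs every hour.
task = Task(
    name="anomaly-detection",
    scripts={"job.py": "print('hello')"},
    params=(ParamSpec("window", int), ParamSpec("threshold", float, required=False)),
)
job = instantiate(task, {"window": 60}, ["temperature-sensor"],
                  [Trigger("interval", interval_s=3600)])
```

Keeping the task immutable and copying parameters into each job reflects the idea that many jobs can share one task while differing only in parameters, inputs, and triggers.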
In its current form, a job can be executed only on the host machine of the IVIS server. This assumption is built into almost every part of the job lifecycle implementation, from creation through scheduling to execution. This thesis aims to extend the task and job subsystems so that jobs can be executed on external computation resources (individual machines and machine pools). The solution will require extending the IVIS-core server with:

1. a way to add external executor nodes,
2. a way to configure the executor node on a per-job basis,
3. a way to communicate securely with the executor nodes over the internet,
4. new logic for scheduling remotely executing jobs.

While executing, a job uses the IVIS Python package to coordinate with the IVIS-core server. This includes receiving input data, connecting to a storage and indexing solution (currently Elasticsearch), and sending requests to the IVIS server to perform side effects such as creating derived datasets and storing the job's state. Because data storage is centralized on the IVIS server host, remotely executing jobs will require a secure way to communicate with both the IVIS server and the storage and indexing server (Elasticsearch).

The solution should enable the user of the IVIS framework to run jobs on a particular machine (local or remote) as well as on a configured pool of automatically managed machines. The implementation aims to support local execution (the status quo), execution on a publicly available machine, and execution using a pool backed by a local computing cluster or a commercial cloud service provider. The solution will be validated using an artificial set of tasks, jobs, and datasets.
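The executor requirements listed above (executor nodes, per-job executor configuration, secure communication, and scheduling of remote runs) can be illustrated with a small Python sketch. Everything here is an assumption for illustration: the `Executor` record, its three kinds (`local`, `remote`, `pool`), the least-loaded selection policy for pools, and the helpers `pick_host`, `dispatch`, `signed_run_request`, and `verify` are hypothetical and do not describe IVIS-core. For the secure channel, the sketch swaps in HMAC-SHA256 request signing with a per-executor shared key (Python's standard `hmac` module); the guidelines do not prescribe a mechanism.

```python
from __future__ import annotations

import hashlib
import hmac
import json
from dataclasses import dataclass, field


@dataclass
class Executor:
    """Hypothetical executor record: 'local', 'remote' (one machine) or 'pool'."""
    id: int
    kind: str
    hosts: list[str] = field(default_factory=list)
    secret: bytes = b""  # shared signing key per executor (assumption)
    running: dict[str, int] = field(default_factory=dict)  # host -> active runs


def pick_host(executor: Executor) -> str:
    """Choose where a run goes; pools use the least-loaded machine (assumed policy)."""
    if executor.kind == "local":
        return "localhost"
    if executor.kind == "remote":
        return executor.hosts[0]
    if executor.kind == "pool":
        return min(executor.hosts, key=lambda h: executor.running.get(h, 0))
    raise ValueError(f"unknown executor kind: {executor.kind}")


def signed_run_request(executor: Executor, job_id: int, run_id: int,
                       params: dict) -> dict:
    """Build a run request whose body is authenticated with HMAC-SHA256."""
    body = json.dumps({"job": job_id, "run": run_id, "params": params},
                      sort_keys=True)
    sig = hmac.new(executor.secret, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "signature": sig}


def verify(secret: bytes, request: dict) -> bool:
    """Executor-side check; compare_digest avoids timing side channels."""
    expected = hmac.new(secret, request["body"].encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, request["signature"])


def dispatch(executor: Executor, job_id: int, run_id: int, params: dict):
    """Intended to be called by the scheduler when one of a job's triggers fires."""
    host = pick_host(executor)
    executor.running[host] = executor.running.get(host, 0) + 1
    if host == "localhost":
        return host, None  # status quo: run on the IVIS server host itself
    return host, signed_run_request(executor, job_id, run_id, params)


# Usage: a pool of two machines, one of which already has a run in progress.
pool = Executor(2, "pool", ["node-a", "node-b"], b"pool-key", {"node-a": 1})
host, req = dispatch(pool, job_id=7, run_id=1, params={"window": 60})
```

Here the run goes to `node-b`, the idle machine. HMAC only authenticates the request and protects its integrity; it does not encrypt it, so a real deployment would still need TLS for confidentiality on the links to the executors, the IVIS server, and Elasticsearch.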