The purpose of this thesis is to design an interface for checkpointing tasks (processes) in the HelenOS operating system. The proposed solution is limited to a homogeneous environment (i.e., the checkpointed task is always restored on the same hardware architecture) and is implemented with kernel support. The solution handles not only the storing and restoring of task state, but also task dependencies that might no longer exist at the time the task is restored (e.g., open files or other tasks it communicates with). The thesis also compares the proposed solution with other possible approaches and includes a prototype implementation for a single hardware architecture.
List of references
Asim Shankar: A system for Process Checkpointing and Restarting (Using a core dump), http://www.geocities.com/asimshankar/checkpointing/report.pdf
Mishra, Wang: Choosing an Appropriate Checkpointing and Rollback Recovery Algorithm for Long-Running Parallel and Distributed Applications, Proceedings of the 11th ISCA International Conference on Computers and their Applications, San Francisco, USA, 1996
Macy, Dillon: Process checkpointing support in DragonFlyBSD
Preliminary scope of work
The goal of the thesis is to design an interface for checkpointing tasks (processes) in the HelenOS operating system. The proposed solution is limited to a homogeneous environment (the checkpointed task is restarted on the same hardware architecture) and operates with kernel support. The thesis covers not only the preservation of task state itself, but also the subsequent resolution of dependencies that may no longer exist after the task is restarted (e.g., open files, or other tasks with which the checkpointed task communicates). The thesis also compares the capabilities of the proposed solution with other approaches. It further includes a prototype implementation for one selected architecture.