Prof. Henri Casanova, in collaboration with Prof. Rafael Ferreira da Silva at USC, has received a $600,000 grant from the National Science Foundation to support a project entitled “Simulation-driven runtime resource management for distributed workflow applications”.
Scientific applications are run on cyberinfrastructure systems by software systems that automate application execution. This automation includes resource management decision making along several axes: selecting and provisioning (virtualized) hardware, picking application configuration options, and scheduling application activities in time and space. The objective is to optimize application performance and a range of resource usage efficiency metrics, including monetary and energy costs. Consequently, the resource management decision space is enormous, and making good decisions is a steep challenge that has been the subject of countless efforts by both theoreticians and practitioners. And yet the challenge is far from solved: theoreticians produce solutions that are rarely used by practitioners, and conversely practitioners implement solutions that may be vastly sub-optimal because they are not informed by theory. This project resolves this disconnect by obviating the need to develop effective resource management strategies. The key idea is to use online simulations to search the resource management decision space rapidly at runtime. Large numbers of fast simulations of the application’s execution are run throughout that very execution, so as to evaluate many potential resource management options and automatically select desirable ones. This approach thus shifts the overall problem from the design of elusive resource management algorithms to the enumeration of many resource management decisions. This transformation of resource management practice in cyberinfrastructure systems not only renders the resource management problem tractable but also unlocks previously out-of-reach resource management decisions. The benefits of this transformation will be demonstrated for a critical class of production systems and applications, namely Workflow Management Systems and the scientific applications they support.
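The enumerate-and-simulate idea can be illustrated with a minimal Python sketch. It is not the project's implementation: the task list, the decision space (core counts and speed tiers), the prices, and the `simulate` and `pick_decision` functions are all hypothetical, and the "simulation" is a deliberately crude formula standing in for a real, fast simulator.

```python
from itertools import product

# Hypothetical toy workflow: task name -> amount of work (arbitrary units).
TASKS = {"preprocess": 40.0, "analyze": 120.0, "render": 60.0}

# Hypothetical decision space: how many cores, and which speed tier.
CORE_OPTIONS = [1, 2, 4, 8]
# tier -> (relative speed, price per core-second); made-up numbers.
SPEED_OPTIONS = {"slow": (1.0, 0.02), "fast": (2.0, 0.05)}


def simulate(remaining_work, cores, tier):
    """Toy stand-in for one fast simulation: returns (makespan, cost).

    Assumes perfectly parallel work, which a real simulator would not.
    """
    speed, price = SPEED_OPTIONS[tier]
    makespan = sum(remaining_work.values()) / (cores * speed)
    cost = makespan * cores * price
    return makespan, cost


def pick_decision(remaining_work, budget):
    """Enumerate every option, simulate it, keep the fastest within budget.

    Returns (makespan, cost, cores, tier), or None if nothing fits.
    """
    best = None
    for cores, tier in product(CORE_OPTIONS, SPEED_OPTIONS):
        makespan, cost = simulate(remaining_work, cores, tier)
        if cost <= budget and (best is None or makespan < best[0]):
            best = (makespan, cost, cores, tier)
    return best


if __name__ == "__main__":
    # At runtime, this search would be repeated as tasks complete,
    # each time with the work that is still remaining.
    print(pick_decision(TASKS, budget=5.0))
    remaining = {k: v for k, v in TASKS.items() if k != "preprocess"}
    print(pick_decision(remaining, budget=5.0))
```

Note that no scheduling algorithm is designed here: the selection logic is a plain loop over candidate decisions, and all of the intelligence lives in the simulator. Under this sketch's assumptions, the quality of the chosen decision depends entirely on how accurate and how fast that simulator is.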