Mining the Workload of Real Grid Computing Systems


Since the mid-1990s, grid computing systems have emerged by analogy with the electric power grid, with the aim of making computing power as pervasive and easily accessible as electricity. Since then, they have been shown to provide very large amounts of storage and computing power, mainly to support scientific and engineering research on a wide geographic scale. Understanding the characteristics of the workload submitted to such systems is a key step in the design and tuning of effective resource management strategies. This is accomplished through workload characterization, in which workload characteristics are analyzed and a model of them, as realistic as possible, is obtained. In this paper, we study the workload of some real grid systems, using a data mining approach to build a workload model for job interarrival time and runtime, and a Bayesian approach to capture user correlations and usage patterns. The final model is then validated against the workload of a real grid system.

CoRR, abs/1412.2673