Empowering Leadership: Computing Scholars of Tomorrow Alliance
 
 

In pursuit of unprecedented high-performance computing (HPC) capabilities, and the scientific and economic advances such capabilities will bring, governments in the U.S., Europe, and Asia, including India, have established initiatives to build and deploy extreme-scale systems with exaflop computational power (one quintillion, or 10^18, floating-point operations per second). Such systems are expected to contain orders of magnitude more components than current systems. In this context, fault tolerance has been identified as a major concern. However, effectively evaluating fault-tolerance mechanisms for systems of this size has been challenging.

In this webinar, I describe a research collaboration among Sandia National Laboratories, ETH Zurich, and the University of New Mexico in which we are developing a simulation-based framework for accurately predicting the performance of resilience mechanisms for HPC systems and applications. I show that our approach can simulate unprecedented time and space scales, and I present some of the results and insights we have gained from this framework.

Dr. Dorian Arnold is an Associate Professor in the Department of Computer Science at the University of New Mexico. His research focuses on the performance and reliability of extremely large-scale systems with tens of thousands, hundreds of thousands, or even millions of processing elements. Dorian earned his Ph.D. at the University of Wisconsin, where he developed MRNet with Phil Roth and their advisor, Barton Miller. He received his M.S. and B.S. degrees from the University of Tennessee and Regis University, respectively. Dorian also worked in the Innovative Computing Laboratory, directed by Dr. Jack Dongarra, as technical lead of the NetSolve project, which won an R&D Top 100 award in 2000. As a student scholar at the Lawrence Livermore National Laboratory, Dorian, in collaboration with LLNL researchers, developed the Stack Trace Analysis Tool for effectively debugging large-scale applications.

On a personal note, Dorian is originally from Belize, and his wife, Janice, is from Guam. They have two awesome kids, Denice and DJ. Apart from his family and CS, of course, his greatest loves are sports and loud music, particularly of the dancehall reggae variety.