## Performance Analysis of Common Loop Optimizations

Presented at the HPC Mini-Showcase, August 2021

Brian Gravelle and Dave Nystrom {gravelle,wdn}@lanl.gov

In high-performance computing (HPC), developers tune applications, especially computationally intensive kernels, for specific systems. In this presentation, we combine two methods of performance analysis: Roofline visualization and hardware counter analysis. Roofline plots show how an application's performance compares with the hardware's potential, while hardware counters reveal in detail how a computational kernel uses the CPU. We discuss the background of these methods and demonstrate how they provide insight into a matrix multiplication benchmark running on a Fujitsu A64FX CPU.
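As a rough illustration of the Roofline idea, the sketch below computes the standard bound: attainable performance is the smaller of the compute peak and arithmetic intensity times memory bandwidth. The function names and the machine numbers are hypothetical placeholders chosen for illustration. They are not A64FX specifications and not results from the presentation. The matrix multiplication intensity assumes an idealized case in which each of the three n x n matrices moves through memory exactly once.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Roofline bound in GFLOP/s.

    arithmetic_intensity: FLOPs performed per byte moved from memory.
    peak_gflops:          compute ceiling of the machine (GFLOP/s).
    peak_bandwidth_gbs:   memory bandwidth ceiling (GB/s).
    """
    memory_bound = arithmetic_intensity * peak_bandwidth_gbs
    return min(peak_gflops, memory_bound)


def matmul_arithmetic_intensity(n, bytes_per_element=8):
    """Idealized FLOPs per byte for an n x n matrix multiplication.

    Assumes 2*n^3 FLOPs and that A, B, and C are each moved once
    (3*n^2 elements). Real kernels usually move more data than this.
    """
    flops = 2.0 * n**3
    bytes_moved = 3.0 * n**2 * bytes_per_element
    return flops / bytes_moved


# Hypothetical machine parameters (placeholders, not measured values).
PEAK_GFLOPS = 1000.0
PEAK_BANDWIDTH_GBS = 100.0

# The ridge point is the intensity at which the two ceilings meet.
ridge_point = PEAK_GFLOPS / PEAK_BANDWIDTH_GBS

if __name__ == "__main__":
    for n in (24, 120, 1200):
        ai = matmul_arithmetic_intensity(n)
        bound = attainable_gflops(ai, PEAK_GFLOPS, PEAK_BANDWIDTH_GBS)
        regime = "memory-bound" if ai < ridge_point else "compute-bound"
        print(f"n={n}: AI={ai:.1f} FLOP/byte, bound={bound:.1f} GFLOP/s ({regime})")
```

A kernel whose measured performance falls well below this bound is a candidate for the hardware counter analysis described above, which can help indicate where the gap comes from.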

LA-UR-21-27741