Steinar H. Gunderson

Sun, 28 Apr 2013 - Precise cache miss monitoring with perf

This should have been obvious, but seemingly it's not (perf is amazingly undocumented, and has this huge lex/yacc grammar for its command-line parsing), so here goes:

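If you want precise cache miss data from perf (where “precise” means using PEBS, so that each miss is attributed to the load that actually caused it and not to some random instruction a few cycles later), you cannot use “cache-misses:pp”, since “cache-misses” on Intel maps to an event that is not PEBS-capable. Instead, you have to use “perf record -e r10cb:pp”, i.e. the raw event 0xCB with unit mask 0x10. The trick, apparently, is that “perf list” very much suggests that what you want is rcb10 and not r10cb, but that's not how it is really encoded: the raw value follows the layout of the IA32_PERFEVTSELx register, with the event select code in the low byte and the unit mask in the byte above it, so the unit mask comes first when you write the value out in hex.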
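The encoding rule can be sketched as a small shell function. `raw_event` is a hypothetical helper, not something perf ships; it assumes the event select code and unit mask are passed as hex digits without the `0x` prefix, the way Intel's tables list them (e.g. CBH and 10H). It simply shifts the unit mask into the second byte and ORs in the event code.

```shell
#!/bin/sh
# raw_event: hypothetical helper (not part of perf) that builds a perf raw
# event descriptor "rNNN" from an event select code and a unit mask, both
# given as hex digits without the 0x prefix.
# Layout assumed: event select in bits 0-7, unit mask in bits 8-15
# (as in the IA32_PERFEVTSELx register).
raw_event() {
    event=$1
    umask=$2
    # Unit mask goes into the second byte, event code into the low byte.
    printf 'r%x\n' $(( (0x$umask << 8) | 0x$event ))
}

raw_event cb 10   # event 0xCB, unit mask 0x10
```

Usage would then look like `perf record -e "$(raw_event cb 10):pp" -- ./your_program`, where `./your_program` is a placeholder. Swapping the arguments, `raw_event 10 cb`, yields `rcb10`, which perf reads as event 0x10 with unit mask 0xCB, a different event altogether. To try another unit mask of the same event, change only the second argument; `%x` drops leading zeros (e.g. `raw_event cb 02` gives `r2cb`), which is fine since perf parses the value as a hex number.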

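FWIW, this event counts loads that miss the last-level cache (LLC), so it's really loads that go either to another socket (less likely) or to DRAM (more likely). You can change the 10, which is the unit mask, to something else (see “perf list”) if you want, e.g., L2 hits instead. Note that the event and unit mask numbers differ between CPU generations, so check the documentation for your model.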

[22:53]

Steinar H. Gunderson <sgunderson@bigfoot.com>