How to calculate the amount (in bytes) of cache misses

How to calculate the amount (in bytes) of cache misses

Xiang Li

Dear Valgrind users,

I am using Cachegrind to study the cache miss behavior of our program. I need to measure the last-level (LL) cache misses in bytes. According to the online manual http://valgrind.org/docs/manual/cg-manual.html (the relevant parts are copied at the end for reference), Cachegrind outputs the number of misses. To my understanding, the amount in bytes can be calculated as follows:

Misses in bytes = (number of misses) * (cache line size)

where the line size can be configured with an option such as --LL=<size>,<associativity>,<line size>. Is my understanding correct? If not, could you tell me how to calculate the misses in bytes?
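
A minimal sketch of the calculation proposed above. The miss count and the 64-byte line size are made-up numbers, not measurements, and the function name `miss_bytes` is only illustrative. It assumes every miss fills exactly one whole cache line.

```python
def miss_bytes(num_misses: int, line_size: int) -> int:
    """Bytes moved because of misses, assuming one full line per miss."""
    if num_misses < 0 or line_size <= 0:
        raise ValueError("num_misses must be >= 0 and line_size must be > 0")
    return num_misses * line_size


# Hypothetical values: 1,250,000 LL misses with a 64-byte line.
ll_misses = 1_250_000
line_size = 64
print(miss_bytes(ll_misses, line_size))  # 80000000
```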

 

Copied from the manual:

Cache accesses for instruction fetches are summarised first, giving the number of fetches made (this is the number of instructions executed, which can be useful to know in its own right), the number of I1 misses, and the number of LL instruction (LLi) misses.

Cache accesses for data follow. The information is similar to that of the instruction fetches, except that the values are also shown split between reads and writes (note each row's rd and wr values add up to the row's total).

Thanks and regards,

Xiang


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Valgrind-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/valgrind-users
Re: How to calculate the amount (in bytes) of cache misses

John Reiser
> I am using Cachegrind to study the cache miss behavior of our program. I need to collect the last level cache misses in bytes. According to the online manual http://valgrind.org/docs/manual/cg-manual.html (related parts are copied at the end for reference), Cachegrind outputs the number of misses.
> To my understanding, the amount in bytes can be calculated as follows
>
> Misses in bytes = (the number of misses) * (line size of the cache)
>
> where the line size can be configured with option such as --LL=<size>,<associativity>,<line size>. Is my understanding correct? If not, could you tell me how to calculate the misses in bytes?

That product is the total number of bytes of traffic caused by misses.
But it ignores the width of the bus, which determines the duration
of the transfers.  Most desktop computers have a 64-bit data bus
(72 bits with ECC) to DDR3 or DDR4 SDRAM.  Some embedded devices
have a 32-bit bus (or even narrower).  Desktop graphics cards
usually have a memory bus of 32, 64, 128, 192, 256, or 384 bits
[and no cache :-)].  The bus width to L1 and L2 caches can be wider:
it is 128 or 256 bits on PowerPC chips, for instance.
(Yes: the icache fetches 4 or 8 32-bit instructions at a time,
and all can be decoded and executed in parallel except for
dataflow constraints.  Aligning branch destinations to 32-byte
boundaries might make a big difference in execution speed.)
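
To illustrate the point about bus width, here is a rough sketch that counts how many bus transfers one cache-line fill needs. The function name is hypothetical, the 64-byte line is an assumed value, and the model is deliberately crude: it ignores bursts, DDR signalling, and latency, so it gives a transfer count, not a time.

```python
def bus_transfers_per_line(line_size_bytes: int, bus_width_bits: int) -> int:
    """Number of bus-wide transfers needed to move one cache line."""
    if line_size_bytes <= 0:
        raise ValueError("line size must be positive")
    if bus_width_bits <= 0 or bus_width_bits % 8 != 0:
        raise ValueError("bus width must be a positive multiple of 8 bits")
    bus_bytes = bus_width_bits // 8
    # Ceiling division: a partly filled last transfer still costs a transfer.
    return -(-line_size_bytes // bus_bytes)


# Same 64-byte line, three of the data-bus widths mentioned above.
for width in (32, 64, 256):
    print(width, bus_transfers_per_line(64, width))
# 32 16
# 64 8
# 256 2
```

The byte total is the same in all three cases; only the number of transfers, and hence the duration, changes.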

Re: How to calculate the amount (in bytes) of cache misses

Xiang Li
Hi John,

Thanks for your prompt reply. I got your point.

BR, Xiang

-----Original Message-----
From: John Reiser [mailto:[hidden email]]
Sent: Monday, June 26, 2017 10:49 AM
To: [hidden email]
Subject: Re: [Valgrind-users] How to calculate the amount (in bytes) of cache misses

Re: How to calculate the amount (in bytes) of cache misses

Xiang Li
In reply to this post by John Reiser
Hi John,

One thing to confirm: for LL data read misses in bytes, the amount is DLmr * lineSize, correct? (For a simple case, suppose the bus width is the same as that of LLd.)
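
A sketch of how DLmr * lineSize could be pulled out of a Cachegrind output file. The layout of the `desc:`, `events:` and `summary:` lines in `SAMPLE` is an assumption about the cachegrind.out.<pid> format and should be checked against a real file; all numbers are invented, and `ll_read_miss_bytes` is a hypothetical helper, not part of Valgrind.

```python
import re

# Invented excerpt in the assumed cachegrind.out.<pid> layout.
SAMPLE = """\
desc: I1 cache:         32768 B, 64 B, 8-way associative
desc: D1 cache:         32768 B, 64 B, 8-way associative
desc: LL cache:         8388608 B, 64 B, 16-way associative
cmd: ./a.out
events: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
summary: 5000000 1200 900 1800000 40000 25000 700000 9000 6000
"""


def ll_read_miss_bytes(text: str) -> int:
    """Return DLmr * (LL line size) from Cachegrind-style output text."""
    line_size = None
    events = []
    summary = []
    for line in text.splitlines():
        m = re.match(r"desc: LL cache:\s*\d+ B, (\d+) B", line)
        if m:
            line_size = int(m.group(1))      # second field: line size in bytes
        elif line.startswith("events:"):
            events = line.split()[1:]        # event names, in column order
        elif line.startswith("summary:"):
            summary = [int(v) for v in line.split()[1:]]
    if line_size is None or "DLmr" not in events or len(summary) != len(events):
        raise ValueError("could not find LL line size or DLmr total")
    totals = dict(zip(events, summary))
    return totals["DLmr"] * line_size


print(ll_read_miss_bytes(SAMPLE))  # 1600000
```

With a real run, the text would come from reading the output file instead of `SAMPLE`.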

Thanks, Xiang

