CPU cycle counts match real processors?

- JinsongJi

Hi Valgrind Developers,

Thanks for your contributions in making such an amazing product.

I work on a performance team at AMD and am thinking about aggregating CPU cycle counts to evaluate an application’s performance across different platforms (CPU + OS). For that, the cycle counts would need to match real processors.

However, I saw this on page 214 of valgrind_manual.pdf for 3.12.0:

> Especially, cycle counts matching real processors probably will never be generated by any Valgrind tools, as these are bound to simulations of simple machine models for acceptable slowdown.

So my question is whether it is feasible for Valgrind to produce real cycle counts for one specific CPU if I supply the exact cycle numbers for all instructions on that CPU. That is, I could collect these cycle numbers for some CPUs and add them to the Valgrind source code. If so, which Valgrind tool comes closest to this? I would like to give it a try.

Thanks for your time!

Jinsong


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Valgrind-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/valgrind-developers

Re: CPU cycle counts match real processors?

wcohen (Bugzilla)
On 01/25/2017 04:32 AM, - JinsongJi wrote:

> Hi Valgrind Developers,
>
> Thanks for your contributions in making such an amazing product.
>
> I work on a performance team at AMD and am thinking about aggregating CPU cycle counts to evaluate an application’s performance across different platforms (CPU + OS). For that, the cycle counts would need to match real processors.
>
> However, I saw this on page 214 of valgrind_manual.pdf for 3.12.0:
>
> > Especially, cycle counts matching real processors probably will never be generated by any Valgrind tools, as these are bound to simulations of simple machine models for acceptable slowdown.
>
> So my question is whether it is feasible for Valgrind to produce real cycle counts for one specific CPU if I supply the exact cycle numbers for all instructions on that CPU. That is, I could collect these cycle numbers for some CPUs and add them to the Valgrind source code. If so, which Valgrind tool comes closest to this? I would like to give it a try.
>
> Thanks for your time!
>
> Jinsong

Hi Jinsong,

Having Valgrind estimate cycles is probably not feasible. Individual instructions do not have deterministic cycle counts; the count for a given instruction varies greatly with conditions. One time, an instruction at a particular address has a cache miss, causing a high cycle count. Another time, that same instruction executes with its data already in cache, and the cycle count is much lower. Other mechanisms, such as speculative and superscalar execution, add to the difficulty of estimating the cycles for a single instruction because instruction execution can overlap.
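
To make the cache-miss point concrete, here is a minimal sketch of a toy model. Everything in it is an assumption made for illustration: the latency table, the miss penalty, and the instruction trace are invented numbers, not data from any real CPU or from Valgrind. It compares a fixed per-instruction table (the approach proposed in the question) with an estimate in which a load pays extra when its address is not yet cached.

```python
# Toy model only: all latencies below are hypothetical, not measured values.
FIXED_LATENCY = {"add": 1, "mul": 3, "load": 4}  # assumed cycles per instruction
MISS_PENALTY = 200  # assumed extra cycles when a load misses in cache


def table_estimate(trace):
    """Sum fixed per-instruction latencies, as a static table would."""
    return sum(FIXED_LATENCY[op] for op, _ in trace)


def cache_aware_estimate(trace, cached_addresses):
    """Same sum, but a load to an address not yet cached pays the penalty."""
    cached = set(cached_addresses)
    cycles = 0
    for op, addr in trace:
        cycles += FIXED_LATENCY[op]
        if op == "load" and addr not in cached:
            cycles += MISS_PENALTY
            cached.add(addr)  # later loads of this address now hit
    return cycles


# The same instruction stream, evaluated under different starting conditions.
trace = [("load", 0x100), ("add", None), ("load", 0x100), ("mul", None)]
static = table_estimate(trace)
cold = cache_aware_estimate(trace, cached_addresses=[])
warm = cache_aware_estimate(trace, cached_addresses=[0x100])
print(f"static table: {static}, cold cache: {cold}, warm cache: {warm}")
```

The identical trace gives very different totals depending only on what was in cache beforehand, which a static table cannot express. The sketch still ignores speculative and superscalar overlap, so even the cache-aware figure would not match a real processor.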

There are tools such as the perf command, libpfm, PAPI, OProfile, and SystemTap that can read data from the processor's performance counters. The counts obtained usually apply only to the particular processor the data was collected on. Differences in system parameters such as cache size and latency, and in processor microarchitecture, make it difficult to apply measurements from one processor implementation to another.

Even though performance event counts may be specific to a particular processor, the events can point out performance issues in an application that would be visible on a variety of processor implementations. For example, if you see poor last-level cache (LLC) performance for an application on a processor with a large LLC, you can infer that performance will be worse on a processor implementation with a smaller LLC.
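
The LLC inference can be illustrated with a small simulation. This is only a sketch under stated assumptions: it models a fully associative cache with LRU replacement, and the access trace and capacities are hypothetical. Real LLCs are set-associative and use other replacement policies, so treat it as intuition rather than a model of any actual processor.

```python
from collections import OrderedDict


def lru_misses(trace, capacity):
    """Count misses for a fully associative LRU cache holding `capacity` lines."""
    cache = OrderedDict()
    misses = 0
    for line in trace:
        if line in cache:
            cache.move_to_end(line)  # mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


# Hypothetical access pattern: touch six lines twice, then stream six others.
one_round = list(range(6)) + list(range(6)) + list(range(6, 12))
trace = one_round * 5

large = lru_misses(trace, capacity=8)  # stand-in for the larger LLC
small = lru_misses(trace, capacity=4)  # stand-in for the smaller LLC
print(f"misses with 8 lines: {large}, with 4 lines: {small}, of {len(trace)}")
```

Under pure LRU, a smaller cache never has fewer misses than a larger one on the same trace, so a trace that already misses often in the larger cache can only do the same or worse in the smaller one.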

-Will
