Dear Valgrind developers,
first, please forgive us if this post is out of place on this list.

We would like to introduce Verrou [1], a floating-point error diagnostics tool based on Valgrind. The idea behind the tool is that it replaces all floating-point operations with randomly rounded ones: instead of always rounding a non-representable result to the nearest floating-point number, one of the two nearest floating-point numbers is chosen randomly. The results of an instrumented program thus become realizations of a random variable, whose dispersion gives an estimate of the impact of accumulated floating-point round-off errors during program execution. In the computer arithmetic community, this technique is known as an asynchronous CESTAC method, which is a variant of Monte-Carlo arithmetic. More details can be found in Verrou's user manual [2].

This work was pursued at EDF R&D [3], but we think such a tool might be of broader interest, especially since Valgrind's "Project Suggestions" page lists the detection of floating-point inaccuracies as a topic of interest.

We would also like to take the opportunity of this message to thank Josef Weidendorfer, who kindly helped us get started with the development of a new Valgrind tool, back when this project began in 2014.

We just released (under the GPLv2) version 1.0.0 of Verrou, which we believe to be stable enough for others to use. So please let us know of any comments you might have about this tool.

In any case, many thanks to the Valgrind development team: Verrou would not exist without your outstanding work. And the other codes we develop would be a lot buggier and less performant if it were not for Valgrind tools...

François Févotte and Bruno Lathuilière

[1] http://github.com/edf-hpc/verrou
[2] http://edf-hpc.github.io/verrou/vr-manual.html
[3] EDF is France's main electricity utility. Its R&D department develops a lot of numerical simulation software, which is used in crucial parts of our industrial process.
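As a rough illustration of the random-rounding idea described in this announcement, here is a minimal Python sketch. It is not Verrou's implementation: it emulates randomly rounded double-precision addition in software using exact rational arithmetic, and it assumes that the two neighbouring doubles are chosen with equal probability. The names `random_round`, `rr_add`, `rr_sum` and `sample_sums` are hypothetical, and `math.nextafter` requires Python 3.9 or later.

```python
import math
import random
import statistics
from fractions import Fraction


def random_round(exact: Fraction, rng: random.Random) -> float:
    """Round an exact rational to one of its two neighbouring doubles.

    Assumption of this sketch: both neighbours are equally likely.
    """
    nearest = float(exact)          # correctly rounded to nearest
    if Fraction(nearest) == exact:  # representable results stay unchanged
        return nearest
    if Fraction(nearest) < exact:
        lower, upper = nearest, math.nextafter(nearest, math.inf)
    else:
        lower, upper = math.nextafter(nearest, -math.inf), nearest
    return rng.choice((lower, upper))


def rr_add(a: float, b: float, rng: random.Random) -> float:
    """Add two doubles exactly, then round the result randomly."""
    return random_round(Fraction(a) + Fraction(b), rng)


def rr_sum(values, rng: random.Random) -> float:
    """Accumulate values left to right with randomly rounded additions."""
    total = 0.0
    for v in values:
        total = rr_add(total, v, rng)
    return total


def sample_sums(values, n_runs: int):
    """Repeat the same computation with a different seed for each run."""
    return [rr_sum(values, random.Random(seed)) for seed in range(n_runs)]


if __name__ == "__main__":
    samples = sample_sums([0.1] * 1000, n_runs=20)
    # The spread of the samples hints at how many digits can be trusted.
    print("mean   :", statistics.mean(samples))
    print("stddev :", statistics.stdev(samples))
```

Each run is one realization of the random variable; the standard deviation across runs plays the role of the dispersion mentioned in the announcement.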
Si vous n'êtes pas le destinataire de ce Message, il vous est interdit de le copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou partie. Si vous avez reçu ce Message par erreur, merci de le supprimer de votre système, ainsi que toutes ses copies, et de n'en garder aucune trace sur quelque support que ce soit. Nous vous remercions également d'en avertir immédiatement l'expéditeur par retour du message. Il est impossible de garantir que les communications par messagerie électronique arrivent en temps utile, sont sécurisées ou dénuées de toute erreur ou virus. This message and any attachments (the 'Message') are intended solely for the addressees. The information contained in this Message is confidential. Any use of information contained in this Message not in accord with its purpose, any dissemination or disclosure, either whole or partial, is prohibited except formal approval. If you are not the addressee, you may not copy, forward, disclose or use any part of it. If you have received this message in error, please delete it and all copies from your system and notify the sender immediately by return message. E-mail communication cannot be guaranteed to be timely secure, error or virus-free. ------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot _______________________________________________ Valgrind-developers mailing list [hidden email] https://lists.sourceforge.net/lists/listinfo/valgrind-developers |
2017-05-29 13:20 GMT+02:00 FEVOTTE Francois <[hidden email]>:
Dear François and Bruno,

Thank you for sharing information about your new Valgrind tool with the Valgrind developers. I am also Cc'ing Valgrind users, because it is the users who will actually be using this tool.

From the Valgrind (tooling) perspective it looks quite neat. But I do not know enough about floating-point rounding modes to judge how practical it is for finding real issues. So I asked some of my colleagues for their thoughts and comments. Your comments about these are welcome.

----------------------------------------------------------------

Comment #1:

My first reaction is that just using random rounding might be considerably less interesting than also being able to do precision bounding. The latter might be able to help with questions like "do I need to switch between float and double?" and stuff like that.

I also wonder if random rounding leads to tremendous understatement or overstatement of a rounding problem. I can imagine the former, since random choices might tend to cancel each other out. I could also imagine the latter, since interval arithmetic (most pessimistic rounding) was rather incapable of judging the numerical stability of conventional algorithms. I'm more inclined to bet on the former.

Looking at top google hits on Monte Carlo arithmetic makes this stuff sound a little researchy and unproven (old hits, many from the same author), but I didn't look closely.

Comment #2:

I once tracked down a numerical problem with a SPEC benchmark by modifying the compiler to do arithmetic in both the precision specified by the program and a higher precision, and then to print a warning when they diverged. (Of course, that meant that the higher precision results had to be stored in a table hashed by the address of the lower precision results; also it had to reset the higher precision value when the variable was assigned from some source that didn’t have an associated higher precision value, for example via I/O.) That worked pretty well, although of course it slowed the program down a bit.

Comment #3:

While I've recently been reading up on design of elementary functions for speed and accuracy, I won't claim to be a master of numerical methods. With that disclaimer, I will say that the idea of "random rounding" makes me uncomfortable, in part because any method which does not give repeatable results creates difficulty for debugging. Also, some types of cumulative numerical instabilities will not be shown by random rounding. On the other hand, the general problem of identifying numerical instability in large applications is a tough problem. If "random rounding" has been shown to help identify some problems, then it could be considered one of several valid numerical stability tests and a useful tool in the numerical analyst's toolbox.

Personally, I like the approach of doing test runs of an application at higher precision to see if the results change. That approach is often supported by use of a compiler switch and SW libraries for the higher precision, requiring modest programming effort and a one time investment of slow test runs. I'm sure this tool also requires slow test runs as it talks about repeated runs of different portions of the application to determine the source of maximum round-off variation.

For elementary functions, such as those found in libm, there are more rigorous methods than either of the above for proving the worst case error does not exceed defined bounds.

Whether this particular tool will become important to customers is unknown. Many more tools are developed than are widely used. It may be determined in part by the 'marketing' of it by its developers. Some approaches get a lot of buzz and then fade away. I class interval arithmetic in that category. Maybe not gone forever, but not driving any major purchase decisions. Others grow and eventually become part of everyone's base expectations (Perl, Java, ...).

---------------------------------------------------------------------------

What will happen at this point?

1. I hope we can discuss more about this tool.
2. We can add a link at this page: http://valgrind.org/downloads/variants.html pointing to your tool.
3. If the community agrees that this tool will be worth adding into Valgrind source code repository then you can initiate talks about integrating it. But 1. needs to happen first.

Kind regards,
Ivosh Raisr
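The shadow-precision approach described in Comment #2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the compiler modification the commenter describes: instead of a table hashed by address, each value simply carries its own shadow, and exact rational arithmetic (`fractions.Fraction`) stands in for "a higher precision". The class `Shadowed` and the function `check` are hypothetical names.

```python
import math
from fractions import Fraction
from typing import Optional, Union

Number = Union[int, float, "Shadowed"]


class Shadowed:
    """A double paired with an exact 'shadow' of the same computation."""

    def __init__(self, value: float, shadow: Optional[Fraction] = None):
        self.value = float(value)
        # A value arriving without a shadow (for example from I/O)
        # resets the shadow to the working-precision value.
        self.shadow = Fraction(self.value) if shadow is None else shadow

    @staticmethod
    def _lift(x: Number) -> "Shadowed":
        return x if isinstance(x, Shadowed) else Shadowed(x)

    def __add__(self, other: Number) -> "Shadowed":
        o = Shadowed._lift(other)
        return Shadowed(self.value + o.value, self.shadow + o.shadow)

    def __sub__(self, other: Number) -> "Shadowed":
        o = Shadowed._lift(other)
        return Shadowed(self.value - o.value, self.shadow - o.shadow)

    def __mul__(self, other: Number) -> "Shadowed":
        o = Shadowed._lift(other)
        return Shadowed(self.value * o.value, self.shadow * o.shadow)

    def relative_error(self) -> float:
        """Relative distance between the double and its shadow."""
        if self.shadow == 0:
            return 0.0 if self.value == 0.0 else math.inf
        return float(abs(Fraction(self.value) - self.shadow) / abs(self.shadow))


def check(x: Shadowed, label: str, tolerance: float = 1e-9) -> bool:
    """Print a warning and return True when the two precisions diverge."""
    diverged = x.relative_error() > tolerance
    if diverged:
        print(f"warning: {label} diverged (relative error {x.relative_error():.3g})")
    return diverged


if __name__ == "__main__":
    # Classic cancellation: the 1.0 is absorbed, then the large terms cancel.
    x = (Shadowed(1e16) + 1.0) - 1e16
    check(x, "x")
```

In the example, the working-precision result is 0.0 while the shadow is exactly 1, so the divergence check fires.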
In reply to this post by FEVOTTE Francois
On Mon, May 29, 2017 at 1:20 PM, FEVOTTE Francois <[hidden email]> wrote:
> Dear Valgrind developers,
>
> first, please forgive us if this post is out of place in this list.
>
> We would like to introduce Verrou [1], a floating-point error diagnostics
> tool based on Valgrind. The idea behind the tool is that it replaces all
> floating-point operations by randomly rounded ones (which means that instead
> of always rounding non-representable results to the nearest floating-point
> number, one of the two nearest floating-point numbers is chosen randomly).

It would be nice to be able to "define" the randomness here, e.g. by providing a pseudo-random-number generator and a command line option to provide the "seed" value. Point is that you can actually debug issues in a deterministic&&repeatable way *IF* they happen.

> Instrumented program results thus become realizations of a random variable,
> the dispersion of which gives an estimation of the impact of the
> accumulation of floating-point round-off errors during program execution. In
> the computer arithmetic community, this technique is known as an
> asynchronous CESTAC method, which is a variant of Monte-Carlo arithmetic.
> More details can be found in Verrou's user manual [2].
>
> This work was pursued at EDF R&D [3], but we think such a tool might be of
> broader interest, especially since Valgrind's "Project Suggestions" page
> lists the detection of floating-point inaccuracies as a topic of interest.

Another issue is to make sure things like +nan/-nan and NaNs with payloads work correctly with your tool since there are lots of applications which use this kind of stuff for error ("error" as in "error message") propagation.

----

Bye,
Roland

--
  __ .  . __
 (o.\ \/ /.o) [hidden email]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)
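To make the NaN concern above concrete, here is a small Python sketch of helpers that build a quiet NaN with a chosen sign and payload and read those fields back from the raw IEEE 754 binary64 bits. The helper names (`make_quiet_nan`, `nan_fields`, `same_nan`) are hypothetical and are not part of Verrou or Valgrind; the sketch only shows what "preserving sign and payload" means at the bit level.

```python
import math
import struct

_SIGN_BIT = 1 << 63
_EXPONENT_MASK = 0x7FF << 52      # all exponent bits set: infinity or NaN
_QUIET_BIT = 1 << 51              # top fraction bit marks a quiet NaN
_PAYLOAD_MASK = (1 << 51) - 1     # remaining 51 fraction bits


def make_quiet_nan(payload: int, negative: bool = False) -> float:
    """Build a quiet NaN carrying the given payload and sign."""
    if not 0 <= payload <= _PAYLOAD_MASK:
        raise ValueError("payload must fit in 51 bits")
    bits = _EXPONENT_MASK | _QUIET_BIT | payload
    if negative:
        bits |= _SIGN_BIT
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def nan_fields(x: float):
    """Return (is_negative, payload) of a NaN, read from its raw bits."""
    if not math.isnan(x):
        raise ValueError("not a NaN")
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bool(bits & _SIGN_BIT), bits & _PAYLOAD_MASK


def same_nan(before: float, after: float) -> bool:
    """True when 'after' is still a NaN with the same sign and payload."""
    return math.isnan(after) and nan_fields(before) == nan_fields(after)


if __name__ == "__main__":
    tagged = make_quiet_nan(0x1234, negative=True)
    print(nan_fields(tagged))
```

A check of this kind could compare `same_nan(x, f(x))` between a native run and an instrumented run. Whether plain arithmetic propagates payloads at all depends on the hardware, so the native run is the baseline, not an absolute expectation.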
In reply to this post by iraisr
Thank you, Ivosh, for extending the announcement to valgrind-users.

On Thursday, 1 June 2017 at 12:58 +0200, [hidden email] wrote:
> From the Valgrind (tooling) perspective it looks quite neat. But I know nothing about floating point rounding modes to know how practical it is [...]
> Your comments about these are welcome.

The comments and concerns raised by your colleagues are all very valid. I'll try to provide below the answers they deserve, based on theoretical grounds as well as our experience with analyzing our codes.

> ----------------------------------------------------------------
> Comment #1:
> My first reaction is that just using random rounding might be considerably less interesting than also being able to do precision bounding. The latter might be able to help with questions like "do I need to switch between float and double?" and stuff like that.

Agreed. There are several ways (other than Random Rounding or even MCA) which can produce more complete (and sometimes also more guaranteed) results. Some tools providing a more in-depth analysis than Verrou are listed below. These tools generally use expensive arbitrary-precision computations and/or perform an in-depth analysis of the program execution flow, so their overhead is generally very high (we're talking about slowdowns by factors of 600 to 10000 in some cases). This is why we would recommend using them only on small programs, in order to track and debug a specific error. Random rounding and other forms of Monte-Carlo arithmetic are, in our opinion, more useful for the broad exploration of a full-scale program on its whole set of test cases. Of course, your mileage may very well vary.

Here is a list of other tools that we know about. Some of them are also based on Valgrind:
- fpDebug: https://github.com/fbenz/FpDebug
- CRAFT-HPC: https://github.com/crafthpc/craft
- Herbgrind: http://herbgrind.ucsd.edu/

> I also wonder if random rounding leads to tremendous understatement or overstatement of a rounding problem. I can imagine the former, since random choices might tend to cancel each other out. I could also imagine the latter, since interval arithmetic (most pessimistic rounding) was rather incapable of judging the numerical stability of conventional algorithms. I'm more inclined to bet on the former.

One of the reasons why Interval Arithmetic tends to produce extremely large intervals is that it does not track the dependencies between variables (think of how {x^2 for x in [-1,1]} = [0,1] is different from {x*y for x,y in [-1,1]} = [-1,1]). Random Rounding naturally keeps track of how the values of variables are related, so that the dispersion of the randomly rounded results often accurately estimates floating-point inaccuracies in practice.

> Looking at top google hits on Monte Carlo arithmetic makes this stuff sound a little researchy and unproven (old hits, many from the same author), but I didn't look closely.

Of course, the key words above are "often" and "in practice": Random Rounding offers no strong guarantee. Still, I invite you to look at the CESTAC method, and its synchronous variant DSA (Discrete Stochastic Arithmetic) implemented in CADNA (http://www-pequan.lip6.fr/cadna/), which provide a few research papers explaining why the statistical analysis of randomly rounded results works, as well as the underlying hypotheses.

> Comment #2:
> I once tracked down a numerical problem with a SPEC benchmark by modifying the compiler to do arithmetic in both the precision specified by the program and a higher precision, and then to print a warning when they diverged. (Of course, that meant that the higher precision results had to be stored in a table hashed by the address of the lower precision results; also it had to reset the higher precision value when the variable was assigned from some source that didn’t have an associated higher precision value, for example via I/O.) That worked pretty well, although of course it slowed the program down a bit.

Yes. Running a program in higher precision arithmetic is a very good way of debugging numerical errors. In order to apply it, you either need to recompile your whole program (and its 3rd party dependencies; think for example of a Python interpreter or a distributed linear algebra solver) or you need to perform Dynamic Binary Analysis (which is what the tools mentioned above do).

> Comment #3:
> With that disclaimer, I will say that the idea of "random rounding" makes me [...]

The reproducibility of random rounding runs, by being able to set the seed of the random generator, is one of the next planned features of Verrou (in fact, I thought we already had the feature; it might have accidentally "disappeared" when we refactored the RNG part of the code).

I must also say that, in practice, one of the great advantages of random rounding is its ability to trigger instabilities that would otherwise arise only when something changes in the environment (like the CPU architecture, or the number of parallel processes). I like to think of it as a way of "reproducing the non-reproducibilities", which can in some instances help debugging.

> Also, some types of cumulative [...]

Do you have something particular in mind? In our experience, the first errors that are usually detected using random rounding are large dot products (or other forms of accumulation of round-off errors in one variable).

> On the other [...]

Yes, this is also the way we think about it. No individual tool/methodology is (yet) able to tackle all problems at once. We like to think of random rounding as one of several tools which can help, alongside shadow execution in higher precision, interval arithmetic, static analysis, and code rewriting techniques.

> I'm sure this tool also requires slow test runs as it talks about repeated runs [...]

This is actually the part that we're the least satisfied with: localizing the origin of errors using a bisection does produce good results. But it is painfully slow due to the large number of randomly rounded runs that is required. And we feel like there should be better ways of doing this.

> For elementary functions, such as those found in libm, there are more rigorous [...]

Yes. We enter here the realm of formal proofs. We are currently working on integrating Verrou with a guaranteed libm, such as MetaLibm (http://www.metalibm.org/), in order to get the best of both worlds: get accurate evaluations of mathematical functions, but still check how the last round-off error propagates during program execution.

> What will happen at this point?
> 1. I hope we can discuss more about this tool.

Sure. Do you have something in mind (other than discussions on this mailing list)?

> 2. We can add a link at this page: http://valgrind.org/downloads/variants.html pointing to your tool.

Thanks, but I think you've already done enough by extending the announcement to valgrind-users (I hesitated before contacting the small circle of Valgrind developers; I would not have risked "spamming" the whole users list).

> 3. If the community agrees that this tool will be worth adding into Valgrind source code repository then you can initiate talks about integrating it. But 1. needs to happen first.

I'm not sure we would be ready for integration on our side anyway. Although we've tried to follow Valgrind's way of doing things, we still have difficulties with our build system (especially the C++ parts, which need some tweaking of the Makefiles, and the detection of FMA support, which is brittle), and we have never tested anything on platforms other than Linux x86_64.

Best regards,
François

--
François Févotte
Research Engineer
EDF R&D - PERICLES Group I23 (Analysis and Numerical Modeling)
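The dependency problem mentioned in this reply ({x^2 for x in [-1,1]} = [0,1] versus {x*y for x,y in [-1,1]} = [-1,1]) can be reproduced with a toy interval type. This is a minimal sketch with a hypothetical `Interval` class; it deliberately ignores the outward rounding of bounds that a real interval library would perform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __mul__(self, other: "Interval") -> "Interval":
        # Generic product: treats the two operands as independent variables.
        products = (
            self.lo * other.lo,
            self.lo * other.hi,
            self.hi * other.lo,
            self.hi * other.hi,
        )
        return Interval(min(products), max(products))

    def square(self) -> "Interval":
        # Dependency-aware square: both factors are the same x.
        squares = (self.lo * self.lo, self.hi * self.hi)
        if self.lo <= 0.0 <= self.hi:
            return Interval(0.0, max(squares))
        return Interval(min(squares), max(squares))


if __name__ == "__main__":
    x = Interval(-1.0, 1.0)
    print("x * x      =", x * x)         # the product forgets both factors are x
    print("x.square() =", x.square())    # the true range of x^2
```

The generic product returns [-1, 1] for `x * x` because it has no way of knowing that both operands are the same variable, whereas the dedicated `square` method returns the tight range [0, 1].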
In reply to this post by iraisr
Hello,

On Thursday, 1 June 2017 at 11:29 +0000, [hidden email] wrote:
> This is potentially interesting for what I do. Is the documentation at
> http://edf-hpc.github.io/verrou/vr-manual.html up to date?

Yes, it should be.

> Something that would be very useful would be control of the seed of
> the random number generator, so that we could repeat and debug cases
> that gave strange results.

I actually think we had this feature in earlier versions of Verrou, and I'm not sure when and how it disappeared. But I opened an issue on GitHub (https://github.com/edf-hpc/verrou/issues/3) and will (re)introduce this feature as soon as I can.

> The code I work on is a large mathematical modeller, with perhaps
> 70,000 functions. Rather than using an exclusion file, it would be
> very useful to have an inclusion file as an alternative, where one
> could specify only the functions that should be subject to
> perturbation. We'd like this because it’s quite difficult to
> adequately test various numerical algorithms outside the context of
> the modeller.

This one is also on our to-do list. One way to work around the problem is to let Verrou generate the whole list of functions it encounters during a test run, and then provide this list as an exclusion list. If you give the full list, nothing will be perturbed; if you comment out some functions in the list, only these functions will be perturbed.

The generation of exclusion lists is documented here: http://edf-hpc.github.io/verrou/vr-manual.html#idm6262

Thank you for your interest in Verrou.

Best regards,
François

--
François FÉVOTTE
Research Engineer
EDF – R&D – PERICLES I23 (Analysis and Numerical Modeling)
7 boulevard Gaspard Monge
91120 Palaiseau
[hidden email]
Phone: +33 1 78 19 44 23
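The workaround described above (generate the full list, then comment out the functions to perturb) is easy to script. The sketch below is only an illustration: the file layout it assumes (one entry per line, symbol name as the first whitespace-separated field, `#` as the comment marker) is an assumption and should be checked against the Verrou manual, and `make_inclusion_list` is a hypothetical helper name.

```python
def make_inclusion_list(generated_lines, perturb, comment_prefix="#"):
    """Comment out the entries whose symbol should be perturbed.

    generated_lines: lines of a generated exclusion list (assumed layout:
                     symbol name first, then other fields).
    perturb:         set of symbol names that should remain instrumented.
    """
    result = []
    for line in generated_lines:
        fields = line.split()
        already_commented = line.lstrip().startswith(comment_prefix)
        if fields and not already_commented and fields[0] in perturb:
            # Commented-out entries are no longer excluded, so only these
            # functions end up being perturbed.
            result.append(f"{comment_prefix} {line}")
        else:
            result.append(line)
    return result


if __name__ == "__main__":
    generated = [
        "dot_product /opt/app/libsolver.so",   # hypothetical entries
        "main /opt/app/modeller",
    ]
    for line in make_inclusion_list(generated, {"dot_product"}):
        print(line)
```

Used this way, the list of functions to perturb becomes the only thing that has to be maintained by hand, which approximates the inclusion file requested in the quoted message.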
In reply to this post by Roland Mainz
On Thursday, 1 June 2017 at 13:46 +0200, Roland Mainz wrote:
> It would be nice to be able to "define" the randomness here, e.g. by
> providing a pseudo-random-number generator and a command line option
> to provide the "seed" value.

We are planning to add a command-line switch to provide the seed value (https://github.com/edf-hpc/verrou/issues/3), but I don't think that we will go any further than that. In other words, we don't plan on changing the pRNG or letting the user define it.

> Point is that you can actually debug
> issues in a deterministic&&repeatable way *IF* they happen.

In order to have a deterministic way to perturb results, we have a "farthest" rounding mode, which always rounds in the opposite direction to the standard nearest rounding mode (it leaves representable values unchanged, though). This is documented here: http://edf-hpc.github.io/verrou/vr-manual.html#vr-manual.feat.rounding-mode

> Another issue is to make sure things like +nan/-nan and NaNs with
> payloads work correctly with your tool since there are lots of
> applications which use this kind of stuff for error ("error" as in
> "error message") propagation.

This is a very good point, thanks! I just checked and it appears that Verrou currently preserves NaN values, but sometimes changes an infinite value into a (large but) finite one. Also, NaN payloads are sometimes changed. So there is some work to do here. I opened an issue to handle this: https://github.com/edf-hpc/verrou/issues/4

Thanks for your comment,
François

--
François FÉVOTTE
Research Engineer
EDF – R&D – PERICLES I23 (Analysis and Numerical Modeling)
7 boulevard Gaspard Monge
91120 Palaiseau - FRANCE
[hidden email]
Phone: +33 1 78 19 44 23
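The "farthest" rounding mode described in this reply can be emulated in a few lines of Python to see what it does to a single operation. This is a sketch of the idea, not Verrou's code: it uses exact rational arithmetic to find the two neighbouring doubles and returns the one that round-to-nearest would not pick. The names `farthest_round` and `farthest_add` are hypothetical, overflow is not handled, and `math.nextafter` requires Python 3.9 or later.

```python
import math
from fractions import Fraction


def farthest_round(exact: Fraction) -> float:
    """Return the neighbouring double that round-to-nearest would NOT pick."""
    nearest = float(exact)            # correctly rounded to nearest
    if Fraction(nearest) == exact:    # representable values stay unchanged
        return nearest
    if Fraction(nearest) < exact:
        return math.nextafter(nearest, math.inf)
    return math.nextafter(nearest, -math.inf)


def farthest_add(a: float, b: float) -> float:
    """Add two doubles exactly, then apply 'farthest' rounding."""
    return farthest_round(Fraction(a) + Fraction(b))


if __name__ == "__main__":
    print("nearest :", 1.0 + 1e-17)
    print("farthest:", farthest_add(1.0, 1e-17))
```

Because no random number is involved, two runs give bit-identical results, which is what makes this mode attractive for deterministic debugging.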