Trying to figure out if errors from app linked with IntelMPI are spurious

Ramsey, James J CIV USARMY RDECOM ARL (US)
I have an application with a hard-to-find segfault that is compiled against IntelMPI and linked with the Valgrind MPI wrappers. When I run it under Valgrind, I get several possibly spurious (?) errors of the following forms:

Invalid read of size 8
  at 0x5D3D570: free (i_rtc_hook.c:57)
  by 0x13F5AEF4: ??? (in /usr/lib64/libdaploscm.so.2.0.0)
 ...
Address 0x9166b38 is 8 bytes before a block of size 552 alloc'd
  at 0x4C293FA: malloc (vg_replace_malloc.c:299)

Invalid read of size 8
  at 0x5D3D570: free (i_rtc_hook.c:57)
  by 0x6E507CC: fclose@@GLIBC_2.2.5 (in /lib64/libc-2.12.so)
 ...
Address 0x9193fa8 is 8 bytes before a block of size 568 alloc'd
  at 0x4C293FA: malloc (vg_replace_malloc.c:299)

Invalid read of size 8
   at 0x5D3D570: free (i_rtc_hook.c:57)
   by 0x4DA756: SomeWrapperOfFree(char*, int, void*) (MemCheck.cc:linenum1)
  ...
Address 0x9199f08 is 8 bytes before a block of size 4 alloc'd
  at 0x4C293FA: malloc (vg_replace_malloc.c:299)

Invalid read of size 8
 at 0x5D3D6D6: realloc (i_rtc_hook.c:82)
 by 0x4DA8E2: SomeWrapperOfRealloc(char*, int, void*, unsigned long) (MemCheck.cc:linenum2)
 ...
Address 0x9197ab8 is 8 bytes before a block of size 280 alloc'd
 at 0x4C293FA: malloc (vg_replace_malloc.c:299)

SomeWrapperOfFree() and SomeWrapperOfRealloc() are functions in source file MemCheck.cc of my application. (Actually, those aren't the real names of the functions. The names are changed to protect the innocent/guilty/???.) Near as I can tell, i_rtc_hook.c is some file in the implementation of IntelMPI. I am using the suppressions file "$I_MPI_ROOT/intel64/etc/valgrind.supp" included with the Intel MPI implementation that I'm using.

I'm not sure what to make of these "Invalid read" errors. If I try a simple test program where I feed free() a pointer that is offset from one that was properly malloc'd, I get an "Invalid free() / delete / delete[] / realloc()" error instead of an "Invalid read". Also, I'm not sure why SomeWrapperOfFree() and SomeWrapperOfRealloc() are calling the versions of free() and realloc() apparently defined in i_rtc_hook.c rather than the ones in vg_replace_malloc.c. That makes me wonder whether the calls to malloc(), realloc(), and free() are all being fully wrapped, or whether there's some oddity in how things are linked.
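
One way to probe the linking question is to ask the dynamic linker which shared object supplies each allocator function. The following is a minimal sketch, assuming Linux with glibc; `provider_of` and `report_allocator_providers` are hypothetical helper names, not part of Valgrind or Intel MPI. It shows only the dynamic linker's view of symbol resolution, not what Valgrind redirects at run time.

```cpp
// Sketch: report which shared object defines malloc/free/realloc in this process.
// Assumes Linux/glibc, where dlsym() and dladdr() are declared in <dlfcn.h>.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for RTLD_DEFAULT, Dl_info and dladdr() on glibc
#endif
#include <dlfcn.h>
#include <iostream>
#include <string>

// Hypothetical helper: returns the path of the object that defines `symbol`
// according to the default lookup order, or "" if it cannot be resolved.
std::string provider_of(const char* symbol) {
    void* addr = dlsym(RTLD_DEFAULT, symbol);
    if (addr == nullptr) {
        return "";
    }
    Dl_info info;
    if (dladdr(addr, &info) == 0 || info.dli_fname == nullptr) {
        return "";
    }
    return info.dli_fname;
}

// Hypothetical helper: prints one line per allocator function.
void report_allocator_providers() {
    const char* names[] = {"malloc", "free", "realloc"};
    for (const char* name : names) {
        std::cout << name << " -> " << provider_of(name) << "\n";
    }
}
```

Calling `report_allocator_providers()` right after `MPI_Init()` in the MPI-linked binary would show whether free() and realloc() resolve to the C library or to some other library. If they resolve elsewhere, Valgrind's documented `--soname-synonyms=somalloc=<pattern>` option may be worth trying; whether it helps with Intel MPI is an assumption that has not been tested here. Older glibc versions may need `-ldl` at link time.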

In short, it's not clear if these errors are spurious, real, or indicative of some other problem.

------------------------------------------------------------------------------
_______________________________________________
Valgrind-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/valgrind-users

Re: Trying to figure out if errors from app linked with IntelMPI are spurious

Ramsey, James J CIV USARMY RDECOM ARL (US)
________________________________________

I'm not sure what to make of these instances of "Invalid read error". If I try a simple test program where I feed free() a pointer that is off from a pointer that's been properly malloc'd, I get an "Invalid free() / delete / delete[] / realloc()" instead of the "Invalid read error".
________________________________________

Finally managed to reproduce the "Invalid read error" from free() with a small test case here:

#include <iostream>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char ** argv) {
    MPI_Init(&argc, &argv);
   
    int size, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &size);  
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   
    int * foo;
    foo = (int*)malloc(sizeof(int)*size);
   
    std::cout << "foo = [";
    for (int i = 0; i < size; ++i) {
        foo[i] = rank + i;
        std::cout << " " << foo[i];
    }
    std::cout << " ]\n";
   
    free(foo);
   
    MPI_Finalize();
    return 0;
}


The catch is that the "Invalid read error" only shows up if I run this simple test case on more than two nodes of a cluster. If I run in parallel, but the parallel cores are all on a single node, then there are no errors.

Re: Trying to figure out if errors from app linked with IntelMPI are spurious

John Reiser
In reply to this post by Ramsey, James J CIV USARMY RDECOM ARL (US)
> I have an application with a hard-to-find segfault problem that is compiled against IntelMPI and linked with valgrind MPI wrappers, and when I try to run it with Valgrind, I get several possibly spurious (?) errors of the following forms:
>
> Invalid read of size 8
>    at 0x5D3D570: free (i_rtc_hook.c:57)
>    by 0x13F5AEF4: ??? (in /usr/lib64/libdaploscm.so.2.0.0)
>    ...
> Address 0x9193fa8 is 8 bytes before a block of size 568 alloc'd
>   at 0x4C293FA: malloc (vg_replace_malloc.c:299)


A traceback which contains "free (i_rtc_hook.c:57)" gives a hint that
valgrind did not intercept the call to free() like it should have.
Instead the filename should be something like "vg_replace_free.c",
or even perhaps "vg_replace_malloc.c".

Accessing one Word just below an allocated block is something
that many implementations of malloc+free do.  That Word typically
contains size information, some flags, etc., that help malloc+free
maintain the allocation arena.  Valgrind(memcheck) wants to replace
that mechanism entirely.  The reported symptoms indicate that
only the malloc() side was replaced; the free() side was not recognized.
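
The size-word layout described above can be sketched with a toy allocator. This is an illustration only, under the assumption of an 8-byte header; `toy_malloc`, `toy_block_size`, and `toy_free` are hypothetical names, and this is not how Intel MPI, glibc, or Valgrind actually lay out their heaps.

```cpp
// Sketch: an allocator that keeps the block size in a word just below the
// pointer it hands to the caller, and reads that word back when freeing.
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Assumed header size for this illustration: one 8-byte word.
constexpr std::size_t kHeaderBytes = sizeof(std::uint64_t);

// Allocates header + payload and returns a pointer to the payload.
void* toy_malloc(std::size_t size) {
    unsigned char* raw =
        static_cast<unsigned char*>(std::malloc(kHeaderBytes + size));
    if (raw == nullptr) {
        return nullptr;
    }
    std::uint64_t header = size;
    std::memcpy(raw, &header, sizeof header);  // size word sits below the payload
    return raw + kHeaderBytes;
}

// Reads the size word located 8 bytes before the payload pointer.
std::size_t toy_block_size(const void* user_ptr) {
    std::uint64_t header = 0;
    std::memcpy(&header,
                static_cast<const unsigned char*>(user_ptr) - kHeaderBytes,
                sizeof header);
    return static_cast<std::size_t>(header);
}

// Steps back over the header before releasing the underlying allocation.
void toy_free(void* user_ptr) {
    if (user_ptr == nullptr) {
        return;
    }
    std::free(static_cast<unsigned char*>(user_ptr) - kHeaderBytes);
}
```

If a free() written this way were handed a block that came from a different malloc(), such as Memcheck's replacement, the read in `toy_block_size` would land 8 bytes before a block that never had a header written there. That would be consistent with "Invalid read of size 8 ... 8 bytes before a block", though it remains a hypothesis about what i_rtc_hook.c does.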




Re: Trying to figure out if errors from app linked with IntelMPI are spurious

Ramsey, James J CIV USARMY RDECOM ARL (US)
________________________________________

A traceback which contains "free (i_rtc_hook.c:57)" gives a hint that
valgrind did not intercept the call to free() like it should have.
Instead the filename should be something like "vg_replace_free.c",
or even perhaps "vg_replace_malloc.c".
_______________________________________________

FYI, there is no "vg_replace_free.c" in Valgrind's source, just "vg_replace_malloc.c", which seems to contain the replacements for malloc(), free(), etc.

Re: Trying to figure out if errors from app linked with IntelMPI are spurious

Ramsey, James J CIV USARMY RDECOM ARL (US)
In reply to this post by Ramsey, James J CIV USARMY RDECOM ARL (US)
________________________________________

Finally managed to reproduce the "Invalid read error" from free() with a small test case here:
_______________________________________________

I noticed that the small test case reproduces the error when IntelMPI 5.0.2 is used, but not with IntelMPI 4.1.3. Likewise, on my real problem, the invalid read error from free (i_rtc_hook.c:57) does not appear when the code in question is compiled against IntelMPI 4.1.3 instead of IntelMPI 5.0.2.
