Problem of running mozilla thunderbird under valgrind on Debian GNU/Linux 4.x kernel.


ISHIKAWA,chiaki
Hi,

Thank you for sharing the great debugging tool.

When I tried to run the Mozilla Thunderbird mail client, which I build
myself under 64-bit Debian GNU/Linux, under valgrind, valgrind
mysteriously crashed and gdb was not much help.
This happened under the latest 4.8.x kernel, which Debian distributes as
part of its testing repository.

I tried a few things but eventually reverted to kernel 3.19.5.
Now thunderbird under valgrind works (!).

The following is an excerpt from a post that I sent to the Mozilla
Thunderbird developer mailing list.

If the symptom rings a bell, please let me know.

TIA

--- begin quote
Well, the original problem was this: valgrind crashed when I invoked it
as part of the |make mozmill| test of Mozilla Thunderbird.

It looks like there is a Debian GNU/Linux kernel issue.
(The same thing happened a couple of years ago, in 2015.)
I have been using the 4.x series since late last year, and valgrind has
not worked since, so I reverted to kernel version 3.19.5. [The userland
has been upgraded to work with the 4.x series, so I am a little
uncomfortable doing this. But I need valgrind to work for debugging.]
Now thunderbird under valgrind works again under 3.19.5:
$ uname -a
Linux ip030 3.19.5 #1 SMP Mon Apr 20 08:50:21 JST 2015 x86_64 GNU/Linux


Something in Debian's kernel config interferes with correct valgrind
operation. I am not sure what.

For those interested:
I can send you the config file Debian used to build the kernel images it
officially distributes.
Conversely, if someone uses valgrind on, say, CentOS or Fedora (4.x
kernel series) and can run thunderbird under it successfully, I would
appreciate a look at the config file used to build the kernel image
there. Then I could compare the two, tinker with the kernel, and see
whether I can make valgrind run under a modified kernel in the Debian
GNU/Linux environment.

TIA
--- end quote
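Comparing a Debian kernel config with a Fedora or CentOS one, as proposed above, mostly comes down to diffing CONFIG_ options. Below is a minimal Python sketch (the script name kconfig_diff.py is hypothetical). It assumes both files use the usual Kconfig .config format, where each option appears either as `CONFIG_FOO=value` or as the comment `# CONFIG_FOO is not set`.

```python
"""Compare two Linux kernel .config files (a sketch, not a tool from this thread).

Assumption: both files use the standard Kconfig output format, where an
option is either "CONFIG_FOO=value" or the comment "# CONFIG_FOO is not set".
"""
import re
import sys

_SET = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def parse_config(text):
    """Return {option: value}; options marked 'is not set' map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def diff_configs(a, b):
    """Yield (option, value_in_a, value_in_b) for every option that differs.

    An option absent from one file is reported as None on that side.
    """
    for name in sorted(set(a) | set(b)):
        va, vb = a.get(name), b.get(name)
        if va != vb:
            yield name, va, vb


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 kconfig_diff.py debian.config fedora.config
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        cfg_a, cfg_b = parse_config(fa.read()), parse_config(fb.read())
    for name, va, vb in diff_configs(cfg_a, cfg_b):
        print(f"{name}: {va} -> {vb}")
```

The kernel source tree also ships scripts/diffconfig, which does essentially the same comparison. The sketch only makes the idea explicit. On two distribution configs the output will be long, so it is worth filtering it for areas that plausibly matter to valgrind, such as memory management or stack protection.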

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Valgrind-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/valgrind-users

Re: Problem of running mozilla thunderbird under valgrind on Debian GNU/Linux 4.x kernel.

Tom Hughes-2
On 15/02/17 13:34, ISHIKAWA,chiaki wrote:

> When I tried to run mozilla thunderbird mail client, which I create
> under Debian GNU/Linux 64-bit,
> under valgrind, valgrind mysteriously crashed and gdb was not much help.

Well valgrind almost never "mysteriously crashes".

In fact it is usually very verbose when anything goes wrong.

So the first thing you should do is to tell us in detail exactly what it
said when it stopped.

> This happened under the latest 4.8.x kernel which Debian distributed as
> part of its testing repository.
>
> I tried a few things but subsequently reverted to kernel 3.19.5.
> Now thunderbird under valgrind works (!).

So most likely this is just a new system call that valgrind doesn't
handle or something, in which case valgrind will have reported all the
details needed to fix it when it stopped.
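Tom's hypothesis can be checked mechanically. When valgrind meets a system call it has no wrapper for, it normally prints a warning line of the form `WARNING: unhandled amd64-linux syscall: N` instead of failing silently. The following Python sketch scans a log for such warnings. The exact wording is assumed from recent valgrind releases and may differ in yours. The log could, for example, be written with `--log-file=vg.%p.log`, where `%p` expands to the process ID. The script name find_unhandled.py is hypothetical.

```python
"""Scan a valgrind log for syscalls valgrind reported as unhandled (a sketch).

Assumption: the log uses the default "==PID==" / "--PID--" prefixes and
contains warnings worded like
    --1234-- WARNING: unhandled amd64-linux syscall: 332
"""
import re
import sys

_UNHANDLED = re.compile(
    r"^(?:==|--)(\d+)(?:==|--)\s+WARNING: unhandled (\S+) syscall: (\S+)"
)


def unhandled_syscalls(lines):
    """Return a list of (pid, platform, syscall) tuples, in log order."""
    found = []
    for line in lines:
        m = _UNHANDLED.match(line)
        if m:
            found.append((int(m.group(1)), m.group(2), m.group(3)))
    return found


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python3 find_unhandled.py vg.3760.log
    with open(sys.argv[1], errors="replace") as log:
        hits = unhandled_syscalls(log)
    for pid, platform, sysno in hits:
        print(f"pid {pid}: unhandled {platform} syscall {sysno}")
    if not hits:
        print("no 'unhandled syscall' warnings found")
```

An empty result does not prove that a syscall is not involved. It only means valgrind did not print that particular warning before it died.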

Tom

--
Tom Hughes ([hidden email])
http://compton.nu/


Re: Problem of running mozilla thunderbird under valgrind on Debian GNU/Linux 4.x kernel.

ISHIKAWA,chiaki
On 2017/02/15 23:32, Tom Hughes wrote:

> On 15/02/17 13:34, ISHIKAWA,chiaki wrote:
>
>> When I tried to run mozilla thunderbird mail client, which I create
>> under Debian GNU/Linux 64-bit,
>> under valgrind, valgrind mysteriously crashed and gdb was not much help.
>
> Well valgrind almost never "mysteriously crashes".
>
> In fact it is usually very verbose when anything goes wrong.
>

Hi,

Thank you for your comment.

That is what I thought back in 2015, and I actually exchanged a few
e-mails with Julian Seward about the issue back then. But we gave up on
it, because the system printed "Segmentation fault" without any useful
trace at all (!), which was quite surprising. We traced signals and so
on: everything we could think of using the various options passed to
valgrind. We even traced the system calls valgrind was issuing, using
strace.


> So the first thing you should do is to tell us in detail exactly what it
> said when it stopped.

Since gdb and the various traces enabled by valgrind's options are
useless (as in 2015), I traced the system calls issued by valgrind.

There was an mmap call just before something went wrong and signal 11
(SIGSEGV) was raised. Then I saw SIGSEGV delivered a dozen times or so,
and voilà: "Segmentation fault" back at the shell prompt.
gdb does not print anything useful at all...
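When the strace output is long, the interesting part is what happened just before the first SIGSEGV. Below is a minimal Python sketch that returns the last few mmap() calls preceding the first SIGSEGV, together with the faulting address. It assumes a log written by something like `strace -f -o valgrind.strace valgrind ...`. The file name and the script name segv_context.py are hypothetical. strace reports a delivered signal as a line of the form `--- SIGSEGV {si_signo=SIGSEGV, ...} ---`, and the `si_addr` field in that line is the address that faulted.

```python
"""Show the mmap() calls that precede the first SIGSEGV in an strace log.

A sketch, assuming output from something like
    strace -f -o valgrind.strace valgrind ... thunderbird-bin ...
where signal delivery appears as "--- SIGSEGV {si_signo=SIGSEGV, ...} ---"
(with -f, each line is additionally prefixed by the PID).
"""
import re
import sys
from collections import deque

_MMAP = re.compile(r"\bmmap\(")
_SIGSEGV = re.compile(r"--- SIGSEGV \{(.*)\} ---")
_SI_ADDR = re.compile(r"si_addr=(0x[0-9a-fA-F]+|NULL)")


def mmaps_before_first_segv(lines, keep=5):
    """Return (segv_line, fault_addr, recent_mmaps), or None if no SIGSEGV.

    recent_mmaps holds up to `keep` mmap() lines seen before the first
    SIGSEGV, oldest first; fault_addr is the si_addr text, or None.
    """
    recent = deque(maxlen=keep)  # only the last `keep` mmap lines are kept
    for raw in lines:
        line = raw.rstrip("\n")
        segv = _SIGSEGV.search(line)
        if segv:
            addr = _SI_ADDR.search(segv.group(1))
            return line, (addr.group(1) if addr else None), list(recent)
        if _MMAP.search(line):
            recent.append(line)
    return None


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python3 segv_context.py valgrind.strace
    with open(sys.argv[1], errors="replace") as trace:
        result = mmaps_before_first_segv(trace)
    if result is None:
        print("no SIGSEGV found")
    else:
        segv_line, fault_addr, recent = result
        for call in recent:
            print(call)
        print(segv_line)
        print("faulting address:", fault_addr)
```

Comparing the faulting address with the ranges returned by those mmap() calls shows at a glance whether the fault lies inside freshly mapped memory or just outside it.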

>
>> This happened under the latest 4.8.x kernel which Debian distributed as
>> part of its testing repository.
>>
>> I tried a few things but subsequently reverted to kernel 3.19.5.
>> Now thunderbird under valgrind works (!).
>
> So most likely this is just a new system call that valgrind doesn't
> handle or something, in which case valgrind will have reported all the
> details needed to fix it when it stopped.

That is what Julian Seward and I hoped back in 2015, but valgrind
reported nothing of the sort. From the debugging I did over the last few
months, I concluded that the problem I face is as perplexing as the 2015
case, so this time I took the easy course: I decided it would be easier
to find out whether there is ANYBODY who uses valgrind to run a big
program under the official Debian GNU/Linux kernel (which I doubt, based
on my experience). Also, back in 2015 Julian Seward mentioned that
valgrind could run thunderbird under Fedora, so I thought it would be
easier to find someone running 64-bit thunderbird under 64-bit CentOS or
Fedora and compare the kernel configs to figure out what causes the
problem under Debian's kernel.

BTW, the following is what I found back in 2015.


------------------------+------------------------------------------------
Kernel version          | valgrind + C-C TB works or not
------------------------+------------------------------------------------
Debian          3.2.0...| works  <--- base Debian version for wheezy
------------------------+------------------------------------------------
self-compiled   3.9.0...| works
------------------------+------------------------------------------------
self-compiled   3.12.40 | works
------------------------+------------------------------------------------
self-compiled   3.13.11 | works
------------------------+------------------------------------------------
self-compiled   3.14.38 | ???    <--- pristine kernel hit the problem
                        |             mentioned in the following patch and
                        |             panicked. Open source is wonderful
                        |             when it works, but when it does not...
                        | http://lkml.iu.edu/hypermail/linux/kernel/1407.3/04296.html
------------------------+------------------------------------------------
self-compiled   3.15.9  | ???    <--- vanilla kernel could not bring up X,
                        |             probably for the same reason as above.
                        |             X did not start within a few minutes,
                        |             so I gave up. I did not see a kernel
                        |             panic, though.
------------------------+------------------------------------------------
Debian backport 3.16... | Segmentation fault! [Why? I have no idea.]
------------------------+------------------------------------------------
Vanilla         3.19.5  | works  (worked back in 2015, and now I have to
                        |        revert to it...)
------------------------+------------------------------------------------

This time around, I tried to do something similar with the latest
vanilla 4.9.x kernel, hoping valgrind might run thunderbird under it
without a segmentation fault. But with such a recent kernel, the
VirtualBox guest utilities (such as the graphics driver that supports
dynamic resizing) did not work at all, so I had to give up.
(Yes, I am running Debian GNU/Linux inside VirtualBox.)

Sorry: I was so tired of debugging, and the current issue looked so much
like the mysterious 2015 problem, that I did not bother to pursue it in
valgrind per se. I wanted to focus on the kernel side instead.

I am currently running the |make mozmill| test of thunderbird, which now
takes about 48 hours. Once it is over, I will switch kernels, gather the
gdb stack trace (which is useless) when valgrind crashes, and also post
the last part of the strace output (system call trace), which again is
not very revealing.

I am sure you will be just as perplexed about why on earth valgrind
crashes when we run thunderbird under it on Debian's kernel. [I *DID*
notice some differences in the Debian kernel: for starters, it enables
stack protection. I am not sure whether that affects valgrind's
operation.]


>
> Tom
>

TIA




Re: Problem of running mozilla thunderbird under valgrind on Debian GNU/Linux 4.x kernel.

ISHIKAWA,chiaki
On 2017/02/16 1:50, ISHIKAWA,chiaki wrote:

> [...]

Here are snippets from the log when valgrind could not run Mozilla
Thunderbird (which seems to spawn a few binaries during its lifetime
when invoked as part of the |make mozmill| test suite).

uname -a
Linux ip030 4.8.0-2-amd64 #1 SMP Debian 4.8.15-2 (2017-01-04) x86_64 GNU/Linux
ishikawa@ip030:/NREF-COMM-CENTRAL/comm-central$

---

run-valgrind (a wrapper masquerading as the thunderbird binary); the
final command line is:
valgrind --trace-children=yes --smc-check=all-non-file \
    --gen-suppressions=all --malloc-fill=0xA5 --free-fill=0xC3 \
    --leak-check=full --num-callers=50 \
    --suppressions=$HOME/Dropbox/myown.sup \
    --suppressions=$HOME/Dropbox/myown32.sup --show-possibly-lost=no \
    /NREF-COMM-CENTRAL/objdir-tb3/dist/bin/thunderbird-bin -jsbridge 24242 \
    -foreground -profile \
    /NREF-COMM-CENTRAL/objdir-tb3/_tests/mozmill/mozmillprofile

==3755== Memcheck, a memory error detector
==3755== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==3755== Using Valgrind-3.13.0.SVN and LibVEX; rerun with -h for copyright info
==3755== Command: /NREF-COMM-CENTRAL/objdir-tb3/dist/bin/thunderbird-bin -jsbridge 24242 -foreground -profile /NREF-COMM-CENTRAL/objdir-tb3/_tests/mozmill/mozmillprofile
==3755==
==3755== Mismatched free() / delete / delete []
==3755==    at 0x4C2CD3A: free (vg_replace_malloc.c:530)
==3755==    by 0x13EE71B3: bool
google::protobuf::InsertIfNotPresent<std::map<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> >, std::pair<void const*,
int>, std::less<std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > >,
std::allocator<std::pair<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const, std::pair<void
const*, int> > > > >(std::map<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> >, std::pair<void const*,
int>, std::less<std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > >,
std::allocator<std::pair<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const, std::pair<void
const*, int> > > >*, std::map<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> >, std::pair<void const*,
int>, std::less<std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > >,
std::allocator<std::pair<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const, std::pair<void
const*, int> > > >::value_type::first_type const&,
std::map<std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> >, std::pair<void const*, int>,
std::less<std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > >,
std::allocator<std::pair<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const, std::pair<void
const*, int> > > >::value_type::second_type const&) (mozalloc.h:218)
==3755==    by 0x13EE8827:
google::protobuf::SimpleDescriptorDatabase::DescriptorIndex<std::pair<void
const*, int> >::AddFile(google::protobuf::FileDescriptorProto const&,
std::pair<void const*, int>) (descriptor_database.cc:56)
==3755==    by 0x13EE8DDE:
google::protobuf::EncodedDescriptorDatabase::Add(void const*, int)
(descriptor_database.cc:313)
==3755==    by 0x13EE8E4A:
google::protobuf::DescriptorPool::InternalAddGeneratedFile(void const*,
int) (descriptor.cc:1018)
==3755==    by 0x13EE8EEE:
google::protobuf::protobuf_AddDesc_google_2fprotobuf_2fdescriptor_2eproto()
(descriptor.pb.cc:711)
==3755==    by 0x13EEB33A:
__static_initialization_and_destruction_0(int, int) (descriptor.pb.cc:762)
==3755==    by 0x13EF879F:
_GLOBAL__sub_I_Unified_cpp_components_protobuf0.cpp (message.cc:358)
==3755==    by 0x400F649: call_init.part.0 (dl-init.c:72)
==3755==    by 0x400F75A: call_init (dl-init.c:30)
==3755==    by 0x400F75A: _dl_init (dl-init.c:120)
==3755==    by 0x4013CD7: dl_open_worker (dl-open.c:575)
==3755==    by 0x400F4F3: _dl_catch_error (dl-error.c:187)
==3755==    by 0x4013488: _dl_open (dl-open.c:660)
==3755==    by 0x5055EE8: dlopen_doit (dlopen.c:66)
==3755==    by 0x400F4F3: _dl_catch_error (dl-error.c:187)
==3755==    by 0x5056520: _dlerror_run (dlerror.c:163)
==3755==    by 0x5055F81: dlopen@@GLIBC_2.2.5 (dlopen.c:87)
==3755==    by 0x123072: GetLibHandle(char const*) (nsXPCOMGlue.cpp:105)
==3755==    by 0x1230FA: ReadDependentCB(char const*) (nsXPCOMGlue.cpp:157)
==3755==    by 0x123337: XPCOMGlueLoad(char const*) (nsXPCOMGlue.cpp:333)
==3755==    by 0x12347B: mozilla::GetBootstrap(char const*)
(nsXPCOMGlue.cpp:408)
==3755==    by 0x10D406: InitXPCOMGlue(char const*) (nsMailApp.cpp:247)
==3755==    by 0x10D7C4: main (nsMailApp.cpp:295)
==3755==  Address 0x5f08f90 is 0 bytes inside a block of size 33 alloc'd
==3755==    at 0x4C2C1EC: operator new(unsigned long)
(vg_replace_malloc.c:334)
==3755==    by 0x113B8A: void std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char>
 >::_M_construct<char*>(char*, char*, std::forward_iterator_tag)
(basic_string.tcc:219)
==3755==    by 0x13EE7185: bool
google::protobuf::InsertIfNotPresent<std::map<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> >, std::pair<void const*,
int>, std::less<std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > >,
std::allocator<std::pair<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const, std::pair<void
const*, int> > > > >(std::map<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> >, std::pair<void const*,
int>, std::less<std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > >,
std::allocator<std::pair<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const, std::pair<void
const*, int> > > >*, std::map<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> >, std::pair<void const*,
int>, std::less<std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > >,
std::allocator<std::pair<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const, std::pair<void
const*, int> > > >::value_type::first_type const&,
std::map<std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> >, std::pair<void const*, int>,
std::less<std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > >,
std::allocator<std::pair<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const, std::pair<void
const*, int> > > >::value_type::second_type const&) (basic_string.h:196)
==3755==    by 0x13EE8827:
google::protobuf::SimpleDescriptorDatabase::DescriptorIndex<std::pair<void
const*, int> >::AddFile(google::protobuf::FileDescriptorProto const&,
std::pair<void const*, int>) (descriptor_database.cc:56)


... a flurry of further mismatched free() / delete errors, etc. ...



==3755==    by 0x123337: XPCOMGlueLoad(char const*) (nsXPCOMGlue.cpp:333)
==3755==    by 0x12347B: mozilla::GetBootstrap(char const*)
(nsXPCOMGlue.cpp:408)
==3755==    by 0x10D406: InitXPCOMGlue(char const*) (nsMailApp.cpp:247)
==3755==    by 0x10D7C4: main (nsMailApp.cpp:295)
==3755==
{
    <insert_a_suppression_name_here>
    Memcheck:Free
    fun:free
    fun:_ZN6google8protobuf20OneofDescriptorProto10SharedDtorEv
    fun:_ZN6google8protobuf20OneofDescriptorProtoD1Ev
    fun:_ZN6google8protobuf20OneofDescriptorProtoD0Ev
    fun:_ZN6google8protobuf8internal20RepeatedPtrFieldBase7DestroyINS0_16RepeatedPtrFieldINS0_20OneofDescriptorProtoEE11TypeHandlerEEEvv
    fun:_ZN6google8protobuf16RepeatedPtrFieldINS0_20OneofDescriptorProtoEED1Ev
    fun:_ZN6google8protobuf15DescriptorProtoD1Ev
    fun:_ZN6google8protobuf15DescriptorProtoD0Ev
    fun:_ZN6google8protobuf8internal20RepeatedPtrFieldBase7DestroyINS0_16RepeatedPtrFieldINS0_15DescriptorProtoEE11TypeHandlerEEEvv
    fun:_ZN6google8protobuf16RepeatedPtrFieldINS0_15DescriptorProtoEED1Ev
    fun:_ZN6google8protobuf15DescriptorProtoD1Ev
    fun:_ZN6google8protobuf15DescriptorProtoD0Ev
    fun:_ZN6google8protobuf8internal20RepeatedPtrFieldBase7DestroyINS0_16RepeatedPtrFieldINS0_15DescriptorProtoEE11TypeHandlerEEEvv
    fun:_ZN6google8protobuf16RepeatedPtrFieldINS0_15DescriptorProtoEED1Ev
    fun:_ZN6google8protobuf19FileDescriptorProtoD1Ev
    fun:_ZN6google8protobuf25EncodedDescriptorDatabase3AddEPKvi
    fun:_ZN6google8protobuf14DescriptorPool24InternalAddGeneratedFileEPKvi
    fun:_ZN7mozilla8devtools8protobuf33protobuf_AddDesc_CoreDump_2eprotoEv
    fun:_Z41__static_initialization_and_destruction_0ii
    fun:_GLOBAL__sub_I_CoreDump.pb.cc
    fun:call_init.part.0
    fun:call_init
    fun:_dl_init
    fun:dl_open_worker
    fun:_dl_catch_error
    fun:_dl_open
    fun:dlopen_doit
    fun:_dl_catch_error
    fun:_dlerror_run
    fun:dlopen@@GLIBC_2.2.5
    fun:_ZL12GetLibHandlePKc
    fun:_ZL15ReadDependentCBPKc
    fun:_ZL13XPCOMGlueLoadPKc
    fun:_ZN7mozilla12GetBootstrapEPKc
    fun:_ZL13InitXPCOMGluePKc
    fun:main
}
Segmentation fault      <===== one of the binaries invoked by the above
                                command fails here.

==3760==
==3760== HEAP SUMMARY:
==3760==     in use at exit: 426,021 bytes in 1,928 blocks
==3760==   total heap usage: 6,366 allocs, 4,438 frees, 12,676,013 bytes
allocated
==3760==
==3760== LEAK SUMMARY:
==3760==    definitely lost: 0 bytes in 0 blocks
==3760==    indirectly lost: 0 bytes in 0 blocks
==3760==      possibly lost: 8,848 bytes in 150 blocks
==3760==    still reachable: 416,533 bytes in 1,777 blocks
==3760==       of which reachable via heuristic:
==3760== newarray    : 1,536 bytes in 16 blocks
==3760== suppressed: 640 bytes in 1 blocks
==3760== Reachable blocks (those to which a pointer was found) are not
shown.
==3760== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==3760==
==3760== For counts of detected and suppressed errors, rerun with: -v
==3760== ERROR SUMMARY: 155 errors from 37 contexts (suppressed: 1 from 1)

Traceback (most recent call last):
   File "runtestlist.py", line 107, in <module>
     line = proc.stdout.readline()
KeyboardInterrupt
xfwm4: Fatal IO error 11 (Resource temporarily unavailable) on X server
:2.0.
/NREF-COMM-CENTRAL/comm-central/mozilla/../mail/testsuite-targets.mk:30:
recipe for target 'mozmill' failed
make: *** [mozmill] Interrupt



[Note the "Segmentation fault" above.]
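With --trace-children=yes, several processes write into the same stream. Each valgrind line carries its process ID in the `==PID==` or `--PID--` prefix. In the excerpt above, 3755 is the thunderbird-bin started from the command line, and 3760 is a different process whose HEAP SUMMARY follows the "Segmentation fault". Below is a minimal Python sketch that splits such a log by PID and flags which processes reached memcheck's end-of-run report. It assumes the default prefixes and Python 3.7+ (for ordered dicts). The script name split_vg_log.py is hypothetical.

```python
"""Split a --trace-children=yes valgrind log by process ID (a sketch).

Assumption: valgrind lines start with the default "==PID==" or "--PID--"
prefix; lines without a prefix are grouped under None.
"""
import re
import sys

_PREFIX = re.compile(r"^(?:==|--)(\d+)(?:==|--)")


def split_by_pid(lines):
    """Group log lines by valgrind PID, keeping first-seen order.

    Unprefixed lines (shell messages such as "Segmentation fault",
    Python tracebacks) are collected under the key None.
    """
    groups = {}
    for raw in lines:
        line = raw.rstrip("\n")
        m = _PREFIX.match(line)
        key = int(m.group(1)) if m else None
        groups.setdefault(key, []).append(line)
    return groups


def summarize(groups):
    """Yield (pid, line_count, reached_heap_summary) for each valgrind PID.

    A PID without a "HEAP SUMMARY:" line never printed memcheck's end-of-run
    report, for example because it was still running when the log ended or
    because it died without going through valgrind's normal shutdown.
    """
    for pid, lines in groups.items():
        if pid is None:
            continue
        reached = any("HEAP SUMMARY:" in text for text in lines)
        yield pid, len(lines), reached


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python3 split_vg_log.py mozmill-valgrind.log
    with open(sys.argv[1], errors="replace") as log:
        groups = split_by_pid(log)
    for pid, count, reached in summarize(groups):
        status = "HEAP SUMMARY printed" if reached else "no HEAP SUMMARY"
        print(f"pid {pid}: {count} lines, {status}")
```

The PIDs that never reach a HEAP SUMMARY are the natural first candidates for the process that died. They are not necessarily the culprit, though.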

===

So I invoked valgrind under gdb directly.

ishikawa@ip030:/NREF-COMM-CENTRAL/comm-central$ gdb /usr/local/bin/valgrind
GNU gdb (GDB) 7.10.50.20160102-cvs
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/bin/valgrind...done.
(gdb) run
Starting program: /usr/local/bin/valgrind --verbose
--trace-children=yes --smc-check=all-non-file --gen-suppressions=all
--malloc-fill=0xA5 --free-fill=0xC3 --leak-check=full --num-callers=50
--suppressions=$HOME/Dropbox/myown.sup
--suppressions=$HOME/Dropbox/myown32.sup --show-possibly-lost=no
/NREF-COMM-CENTRAL/objdir-tb3/dist/bin/thunderbird-bin -jsbridge 24242
-foreground -profile
/NREF-COMM-CENTRAL/objdir-tb3/_tests/mozmill/mozmillprofile
process 3973 is executing new program:
/usr/local/lib/valgrind/memcheck-amd64-linux
==3973== Memcheck, a memory error detector
==3973== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==3973== Using Valgrind-3.13.0.SVN and LibVEX; rerun with -h for copyright info
==3973== Command: /NREF-COMM-CENTRAL/objdir-tb3/dist/bin/thunderbird-bin -jsbridge 24242 -foreground -profile /NREF-COMM-CENTRAL/objdir-tb3/_tests/mozmill/mozmillprofile
==3973==
--3973-- Valgrind options:
--3973--    --verbose
--3973--    --trace-children=yes
--3973--    --smc-check=all-non-file
--3973--    --gen-suppressions=all
--3973--    --malloc-fill=0xA5
--3973--    --free-fill=0xC3
--3973--    --leak-check=full
--3973--    --num-callers=50
--3973--    --suppressions=/home/ishikawa/Dropbox/myown.sup
--3973--    --suppressions=/home/ishikawa/Dropbox/myown32.sup
--3973--    --show-possibly-lost=no
--3973-- Contents of /proc/version:
--3973--   Linux version 4.8.0-2-amd64 ([hidden email]) (gcc version 5.4.1 20161202 (Debian 5.4.1-4) ) #1 SMP Debian 4.8.15-2 (2017-01-04)
--3973--
--3973-- Arch and hwcaps: AMD64, LittleEndian, amd64-cx16-rdtscp-sse3-avx
--3973-- Page sizes: currently 4096, max supported 4096
--3973-- Valgrind library directory: /usr/local/lib/valgrind
--3973-- Reading syms from
/NREF-COMM-CENTRAL/objdir-tb3/dist/bin/thunderbird-bin
--3973-- Reading syms from /lib/x86_64-linux-gnu/ld-2.24.so
--3973--   Considering
/usr/lib/debug/.build-id/09/5935d2da92389e2991f2b56d14dab9e6978696.debug ..
--3973--   .. build-id is valid
--3973-- Reading syms from /usr/local/lib/valgrind/memcheck-amd64-linux
--3973--    object doesn't have a dynamic symbol table
--3973-- Scheduler: using generic scheduler lock implementation.
--3973-- Reading suppressions file: /home/ishikawa/Dropbox/myown.sup
--3973-- Reading suppressions file: /home/ishikawa/Dropbox/myown32.sup
--3973-- Reading suppressions file: /usr/local/lib/valgrind/default.supp
==3973== embedded gdbserver: reading from
/tmp/vgdb-pipe-from-vgdb-to-3973-by-ishikawa-on-???
==3973== embedded gdbserver: writing to
/tmp/vgdb-pipe-to-vgdb-from-3973-by-ishikawa-on-???
==3973== embedded gdbserver: shared mem
/tmp/vgdb-pipe-shared-mem-vgdb-3973-by-ishikawa-on-???
==3973==
==3973== TO CONTROL THIS PROCESS USING vgdb (which you probably
==3973== don't want to do, unless you know exactly what you're doing,
==3973== or are doing some strange experiment):
==3973==   /usr/local/lib/valgrind/../../bin/vgdb --pid=3973 ...command...
==3973==
==3973== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==3973==   /path/to/gdb
/NREF-COMM-CENTRAL/objdir-tb3/dist/bin/thunderbird-bin
==3973== and then give GDB the following command
==3973==   target remote | /usr/local/lib/valgrind/../../bin/vgdb --pid=3973
==3973== --pid is optional if only one valgrind process is running
==3973==
--3973-- REDIR: 0x401af50 (ld-linux-x86-64.so.2:strlen) redirected to
0x380a80e8 (vgPlain_amd64_linux_REDIR_FOR_strlen)
--3973-- REDIR: 0x40198a0 (ld-linux-x86-64.so.2:index) redirected to
0x380a8102 (vgPlain_amd64_linux_REDIR_FOR_index)
--3973-- Reading syms from
/usr/local/lib/valgrind/vgpreload_core-amd64-linux.so
--3973-- Reading syms from
/usr/local/lib/valgrind/vgpreload_memcheck-amd64-linux.so
==3973== WARNING: new redirection conflicts with existing -- ignoring it
--3973--     old: 0x0401af50 (strlen              ) R-> (0000.0)
0x380a80e8 vgPlain_amd64_linux_REDIR_FOR_strlen
--3973--     new: 0x0401af50 (strlen              ) R-> (2007.0)
0x04c2ec60 strlen
--3973-- REDIR: 0x4019ac0 (ld-linux-x86-64.so.2:strcmp) redirected to
0x4c2fd60 (strcmp)
--3973-- REDIR: 0x401ba60 (ld-linux-x86-64.so.2:mempcpy) redirected to
0x4c33130 (mempcpy)
--3973-- Reading syms from /lib/x86_64-linux-gnu/libpthread-2.24.so
--3973--   Considering
/usr/lib/debug/.build-id/75/b2a574fa9c03e43b58f53b424b1daec1211862.debug ..
--3973--   .. build-id is valid
--3973-- Reading syms from /lib/x86_64-linux-gnu/libdl-2.24.so
--3973--   Considering
/usr/lib/debug/.build-id/e4/8bb27b88670405041a12eefef9ef586f6e1533.debug ..
--3973--   .. build-id is valid
--3973-- Reading syms from /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22
--3973--    object doesn't have a symbol table
--3973-- Reading syms from /lib/x86_64-linux-gnu/libm-2.24.so
--3973--   Considering
/usr/lib/debug/.build-id/d0/4c68ec51462ba3088cf1b19d54e1706463f723.debug ..
--3973--   .. build-id is valid
--3973-- Reading syms from /lib/x86_64-linux-gnu/libgcc_s.so.1
--3973--   Considering
/usr/lib/debug/.build-id/90/f96c5be1c683de41a42ab262411fb7a3876fb2.debug ..
--3973--   .. build-id is valid
--3973-- Reading syms from /lib/x86_64-linux-gnu/libc-2.24.so
--3973--   Considering
/usr/lib/debug/.build-id/4b/9cc30ba41f027a0dca6cd877f59f0db38f4025.debug ..
--3973--   .. build-id is valid
--3973-- REDIR: 0x5b7a510 (libc.so.6:strcasecmp) redirected to 0x4a26742
(_vgnU_ifunc_wrapper)
--3973-- REDIR: 0x5b75fc0 (libc.so.6:strcspn) redirected to 0x4a26742
(_vgnU_ifunc_wrapper)
--3973-- REDIR: 0x5b7c800 (libc.so.6:strncasecmp) redirected to
0x4a26742 (_vgnU_ifunc_wrapper)
--3973-- REDIR: 0x5b78430 (libc.so.6:strpbrk) redirected to 0x4a26742
(_vgnU_ifunc_wrapper)
--3973-- REDIR: 0x5b787c0 (libc.so.6:strspn) redirected to 0x4a26742
(_vgnU_ifunc_wrapper)
--3973-- REDIR: 0x5b79b90 (libc.so.6:memmove) redirected to 0x4a26742
(_vgnU_ifunc_wrapper)
--3973-- REDIR: 0x5b78140 (libc.so.6:rindex) redirected to 0x4c2e5f0
(rindex)
--3973-- REDIR: 0x5b70d30 (libc.so.6:malloc) redirected to 0x4c2bb1f
(malloc)
--3973-- REDIR: 0x5c1bbf0 (libc.so.6:__strcasecmp_avx) redirected to
0x4c2f4a0 (strcasecmp)

Program received signal SIGSEGV, Segmentation fault.
0x000000080470fdf8 in ?? ()
(gdb) where
#0  0x000000080470fdf8 in ?? ()
#1  0x0000000802e8df30 in ?? ()
#2  0x000000000010d76b in ?? ()
#3  0x0000000802008460 in ?? ()
#4  0x0000000802e8df30 in ?? ()
#5  0x0000000000001c00 in ?? ()
#6  0x0000000038c6bb00 in ?? ()
#7  0x0000000000000601 in ?? ()
#8  0x0000000000011af3 in ?? ()
#9  0x0000000000000000 in ?? ()
(gdb) quit
A debugging session is active.

        Inferior 1 [process 3973] will be killed.

Quit anyway? (y or n) y
ishikawa@ip030:/NREF-COMM-CENTRAL/comm-central$ /usr/local/bin/valgrind
--help


===
Oh, by the way, I added --vex-iropt-register... to the options.

valgrind --vex-iropt-register-updates=allregs-at-mem-access --verbose \
  --trace-children=yes --smc-check=all-non-file --gen-suppressions=all \
  --malloc-fill=0xA5 --free-fill=0xC3 --leak-check=full --num-callers=50 \
  --suppressions=$HOME/Dropbox/myown.sup \
  --suppressions=$HOME/Dropbox/myown32.sup --show-possibly-lost=no \
  /NREF-COMM-CENTRAL/objdir-tb3/dist/bin/thunderbird-bin -jsbridge 24242 \
  -foreground -profile \
  /NREF-COMM-CENTRAL/objdir-tb3/_tests/mozmill/mozmillprofile

But something segfaults anyway.

     [...]
--5688--    object doesn't have a symbol table
--5688-- REDIR: 0x5b76820 (libc.so.6:strncat) redirected to 0x4a26742
(_vgnU_ifunc_wrapper)
Segmentation fault
ishikawa@ip030:/NREF-COMM-CENTRAL/comm-central$ --5688-- REDIR:
0x5b72dd0 (libc.so.6:posix_memalign) redirected to 0x4c2de3a
(posix_memalign)
--5688-- Reading syms from
/usr/lib/x86_64-linux-gnu/libtxc_dxtn_s2tc.so.0.0.0
--5688--    object doesn't have a symbol table
--5688-- REDIR: 0x5c1bad0 (libc.so.6:__strspn_sse42) redirected to
0x4c33530 (strspn)
--5688-- REDIR: 0x5b79c90 (libc.so.6:__memcpy_chk_sse2_unaligned)
redirected to 0x4c33220 (__memcpy_chk)

          [...]

===

The final part of the strace output.


getpid()                                = 4280
gettid()                                = 4280
write(1029, "F", 1)                     = 1
rt_sigprocmask(SIG_SETMASK, [], ~[KILL STOP], 8) = 0
open("/tmp/thunderbird_ishikawa/.parentlock", O_WRONLY|O_CREAT|O_TRUNC,
0666) = 6
rt_sigprocmask(SIG_SETMASK, ~[KILL STOP], NULL, 8) = 0
gettid()                                = 4280
read(1028, "F", 1)                      = 1
fcntl(6, F_GETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0,
l_len=0, l_pid=0}) = 0
fcntl(6, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0,
l_len=0}) = 0
lstat("/tmp/thunderbird_ishikawa/lock", 0xffeffe100) = -1 ENOENT (No
such file or directory)
uname({sysname="Linux", nodename="ip030", ...}) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xffeffbab8} ---
+++ killed by SIGSEGV +++



Is it possible that the signal handler was not completely installed
before the signal (SIGSEGV) was raised?
=====
The above is the situation under the 4.8.0-2-amd64 kernel (Debian GNU/Linux).


---
To my consternation, this is kernel-dependent.

Under the vanilla kernel I built myself, valgrind runs just fine:
Linux ip030 3.19.5 #1 SMP Mon Apr 20 08:50:21 JST 2015 x86_64 GNU/Linux


Any thoughts?



------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Valgrind-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/valgrind-users

Re: Problem of running mozilla thunderbird under valgrind on Debian GNU/Linux 4.x kernel.

John Reiser
Hint #1.  Fix the first complaint.  Do not pass GO, do not collect $200.  FIX the first complaint.
You will get more sympathy and attention if the *first* significant event
is the bug/error/mystery that is the focus of your inquiry.

> ==3755== Mismatched free() / delete / delete []
> ==3755==    at 0x4C2CD3A: free (vg_replace_malloc.c:530)
> ==3755==    by 0x13EE71B3: bool
> google::protobuf::InsertIfNotPresent...

=====

> ishikawa@ip030:/NREF-COMM-CENTRAL/comm-central$ gdb /usr/local/bin/valgrind
   [[snip]]

> Program received signal SIGSEGV, Segmentation fault.
> 0x000000080470fdf8 in ?? ()
> (gdb) where
> #0  0x000000080470fdf8 in ?? ()
> #1  0x0000000802e8df30 in ?? ()
> #2  0x000000000010d76b in ?? ()
> #3  0x0000000802008460 in ?? ()
> #4  0x0000000802e8df30 in ?? ()
> #5  0x0000000000001c00 in ?? ()
> #6  0x0000000038c6bb00 in ?? ()
> #7  0x0000000000000601 in ?? ()
> #8  0x0000000000011af3 in ?? ()
> #9  0x0000000000000000 in ?? ()
> (gdb) quit
> A debugging session is active.

Hint #2.  Use gdb effectively.

(gdb) info reg   ## show all registers
(gdb) x/5i $pc   ## examine instruction stream
(gdb) x/30i $pc-0x20   ## likely previous instruction stream (heuristic sync for variable-length instructions)
(gdb) x/32xw $sp   ## examine memory at stack pointer
(gdb) info proc   ## display the process ID
(gdb) shell cat /proc/<PID>/maps   ## show memory mapping; <PID> is "process" from "info proc"


Hint #3.  If child processes are involved, then apply the tool to them, too.
$ valgrind --trace-children=yes ...

--



Re: Problem of running mozilla thunderbird under valgrind on Debian GNU/Linux 4.x kernel.

ISHIKAWA,chiaki
On 2017/02/18 0:57, John Reiser wrote:
> Hint #1.  Fix the first complaint.  Do not pass GO, do not collect $200.  FIX the first complaint.
> You will get more sympathy and attention if the *first* significant event
> is the bug/error/mystery that is the focus of your inquiry.
>
>> ==3755== Mismatched free() / delete / delete []
>> ==3755==    at 0x4C2CD3A: free (vg_replace_malloc.c:530)
>> ==3755==    by 0x13EE71B3: bool
>> google::protobuf::InsertIfNotPresent...

Thank you. I had thought about investigating this myself.
(But my previous brief analysis came to a dead end, since the allocation
was done inside libstdc++ AND the mozilla code seemed to
honor the proper malloc/free, new/delete, new[]/delete[] pairing, at least
at the superficial source code level :-( )

By running the latest thunderbird code under valgrind/memcheck
under Linux kernel 3.19.5 (the latest kernel under which I could make
memcheck + thunderbird work on Debian GNU/Linux), I collected as many
of the mismatch warnings as possible and tried to analyze them.

According to
https://bugzilla.mozilla.org/show_bug.cgi?id=1340576,
the prospect is grim.
See comment 5 there.
https://bugzilla.mozilla.org/show_bug.cgi?id=1340576#c5
--- begin quote ---
(In reply to ISHIKAWA, Chiaki from comment #4)
 > Julian, what course of action should I take from here?

The simple answer is, run with --show-mismatched-frees=no.  Most of
them are false positives caused by inconsistent inlining of malloc
into new vs free into delete.

The more complex answer is, we'd have to look at them on an
individual basis.  Bug 1325470 is an example which Mike Hommey
believes is a real bug.  But those are relatively rare.  Mostly
Valgrind is reporting false positives here.
--- end quote ---

So it seems that these are actually FALSE POSITIVES caused by inconsistent
inlining in the compiler/headers/whatever [I am not a C++ guru].
If I pass --show-mismatched-frees=no, they won't show up, and since
they don't interfere with the operation of valgrind under kernel
3.19.5, they do look like false positives to me.
(That these false positives are reported at all is a big problem in itself:
I think it is an issue with GCC 6 and libstdc++ code compiled by GCC 6. I
am not sure whether these false positives would go away if clang were used
to compile libstdc++ and mozilla thunderbird. But I digress.)

My original question was why the same test setup works under the vanilla
3.19.5 Linux kernel but not under the 4.8.y Debian GNU/Linux kernel.

>
> =====
>
>> ishikawa@ip030:/NREF-COMM-CENTRAL/comm-central$ gdb /usr/local/bin/valgrind
>    [[snip]]
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x000000080470fdf8 in ?? ()
>> (gdb) where
>> #0  0x000000080470fdf8 in ?? ()
>> #1  0x0000000802e8df30 in ?? ()
>> #2  0x000000000010d76b in ?? ()
>> #3  0x0000000802008460 in ?? ()
>> #4  0x0000000802e8df30 in ?? ()
>> #5  0x0000000000001c00 in ?? ()
>> #6  0x0000000038c6bb00 in ?? ()
>> #7  0x0000000000000601 in ?? ()
>> #8  0x0000000000011af3 in ?? ()
>> #9  0x0000000000000000 in ?? ()
>> (gdb) quit
>> A debugging session is active.
>
> Hint #2.  Use gdb effectively.
>
> (gdb) info reg   ## show all registers
> (gdb) x/5i $pc   ## examine instruction stream
> (gdb) x/30i $pc-0x20   ## likely previous instruction stream (heuristic sync for variable-length instructions)
> (gdb) x/32xw $sp   ## examine memory at stack pointer
> (gdb) info proc   ## display the process ID
> (gdb) shell cat /proc/<PID>/maps   ## show memory mapping; <PID> is "process" from "info proc"
>
>
> Hint #3.  If child processes are involved, then apply the tool to them, too.
> $ valgrind --trace-children=yes ...

Oh, I thought I had passed "--trace-children=yes" to the particular valgrind
session(s) when I captured the latest log.
Hmm. All the valgrind runs whose logs appeared in my last e-mail
had the --trace-children=yes option (not always at the beginning, though).
Aha, there seems to have been a copy&paste error when I composed the
previous e-mail.

case 1. valgrind --trace-children=yes ...
case 2. (gdb) run
Starting program: /usr/local/bin/valgrind --verbose --trace-children=yes
  --smc-check=all-non-file ...

(I am afraid there was a copy&paste error here. I ran valgrind with
the options echoed back above, but I may have erased the command
line after |run| by mistake.) You can also see from the following
valgrind output that the option was passed correctly:

--3973-- Valgrind options:
--3973--    --verbose
--3973--    --trace-children=yes  <=== here
--3973--    --smc-check=all-non-file
--3973--    --gen-suppressions=all

case 3. valgrind  --vex-iropt-register-updates=allregs-at-mem-access
--verbose --trace-children=yes ...

I will check what is mapped at 0xffeffbab8 (reported in the
strace output):

 > --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xffeffbab8} ---
 > +++ killed by SIGSEGV +++

using

 > (gdb) shell cat /proc/<PID>/maps   ## show memory mapping; <PID> is "process" from "info proc"

and look at the memory area containing that address.
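Matching the fault address against a saved maps listing by eye is error-prone. A minimal bash sketch follows, assuming the maps were copied to a file (for example `cat /proc/<PID>/maps > maps.txt`) while the process still existed; `find_mapping` and `maps.txt` are hypothetical names. It prints the mapping whose range contains the address plus the address's offset into the mapped file (address minus range start plus file offset), or reports that the address is unmapped, which is what SEGV_MAPERR suggests.

```shell
#!/usr/bin/env bash
# find_mapping ADDR MAPS_FILE  (hypothetical helper name)
# Print the /proc/<PID>/maps line whose range contains ADDR, together with
# ADDR's offset into the mapped file (range start subtracted, file offset added).
find_mapping() {
  local addr=$(( 16#${1#0x} ))   # accepts "0xffeffbab8" or "ffeffbab8"
  local range perms off dev inode path start end
  while read -r range perms off dev inode path; do
    start=$(( 16#${range%-*} ))
    end=$(( 16#${range#*-} ))
    # Ranges at or above 2^63 (e.g. [vsyscall]) wrap negative in bash's
    # signed 64-bit arithmetic, so skip them instead of comparing.
    if (( start < 0 || end < 0 )); then
      continue
    fi
    if (( addr >= start && addr < end )); then
      printf '%s %s off=0x%x %s\n' "$range" "$perms" \
        $(( addr - start + 16#$off )) "${path:-[anon]}"
      return 0
    fi
  done < "$2"
  printf '0x%x is not mapped\n' "$addr" >&2
  return 1
}

# Usage: ./find_mapping.sh 0xffeffbab8 maps.txt
if (( $# == 2 )); then
  find_mapping "$1" "$2"
fi
```

Anonymous mappings have no path column and are shown as [anon]. Every user-space address relevant here is far below 2^63, so skipping the wrapped ranges loses nothing.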

In the meantime, if anyone has run a large program under
valgrind/memcheck on a stock Debian GNU/Linux kernel, please let me
know your kernel version number. Even if I can figure out the
mmap/stack/whatever condition by analyzing the memory map, etc. at the
address reported with the SIGSEGV, that won't be of much help to me as
it stands unless I can figure out exactly WHICH KERNEL OPTION is the
culprit. :-(
If I knew which KERNEL OPTION is the culprit, I could at least try to
rebuild the 4.8.y series kernel accordingly and try valgrind under it.
There are so many differences in kernel options between 3.19.5 and 4.8.y
that a fishing trip won't discover the culprit easily.
(I have been using Debian for close to 20 years now, but maybe I should
switch to Fedora/CentOS since that is what the Mozilla Foundation's
compilation/test farm uses. Then again, the compilation/test farm uses
clang, so there is another issue: GCC vs clang. I have been using GCC for
30 years and was comfortable with Debian GNU/Linux and GCC. Maybe it is
time for a change.)
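The Linux source tree already ships scripts/diffconfig for listing the option changes between two configs, which at least shrinks the fishing area to the options that actually changed. A self-contained bash sketch of the same idea follows, assuming both .config files are available; `config_diff` is a hypothetical name and the /boot paths in the usage line are examples only.

```shell
#!/usr/bin/env bash
# config_diff OLD NEW  (hypothetical helper name)
# List CONFIG_ options whose value differs between two kernel .config files.
# "# CONFIG_X is not set" lines are dropped, so such options show as (unset).
# Values that themselves contain '=' are truncated at that '=' (sketch only).
config_diff() {
  # join needs both inputs sorted on the key, in the same (C) collation.
  LC_ALL=C join -t '=' -a 1 -a 2 -e '(unset)' -o 0,1.2,2.2 \
    <(grep '^CONFIG_' "$1" | LC_ALL=C sort -t '=' -k 1,1) \
    <(grep '^CONFIG_' "$2" | LC_ALL=C sort -t '=' -k 1,1) |
    awk -F '=' '$2 != $3 { printf "%s: %s -> %s\n", $1, $2, $3 }'
}

# Usage: ./config_diff.sh /boot/config-3.19.5 /boot/config-4.8.0-2-amd64
if (( $# == 2 )); then
  config_diff "$1" "$2"
fi
```

The output is still long between two major kernel series, but each line is a concrete candidate that can be toggled when rebuilding.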

TIA



Re: Problem of running mozilla thunderbird under valgrind on Debian GNU/Linux 4.x kernel.

John Reiser
How many failures occur in 10 runs of thunderbird under valgrind?
How many failures occur in 10 runs if you reboot just before each run?

Thunderbird is a user mail agent that uses interactive graphics.
How many failures occur before the display window appears, and how many after?
Are the symptoms and frequency the same for a Radeon card as for NVidia?
On the open-source NVidia driver versus the proprietary driver?
In "dumb framebuffer" mode ("no" acceleration)?
Please tell us which cards: "lspci -nn | grep VGA" or similar.

Are the symptoms and frequency the same for Firefox as for thunderbird?
Are the symptoms and frequency the same for Chrome as for thunderbird?

Please present a histogram of the {mapped file, pc offset, instruction stream}
when the SIGSEGV happens.  [You should have at least 70 runs by now: 10 each
for thunderbird plain, with reboot, other graphics card, other NVidia driver,
dumb framebuffer, Firefox, Chrome.]
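Tallying how each run ended can be scripted. A minimal bash sketch, assuming the whole valgrind command line is passed as arguments; `count_failures` is a hypothetical name. It relies on the shell convention that a process killed by signal N reports exit status 128 + N, so a run that dies of SIGSEGV is counted as "killed by signal SEGV".

```shell
#!/usr/bin/env bash
# count_failures N CMD [ARGS...]  (hypothetical helper name)
# Run CMD N times and print a tally of how each run ended. Exit statuses above
# 128 are assumed to mean "killed by signal (status - 128)", e.g. 139 = SIGSEGV.
count_failures() {
  local n=$1; shift
  local -A tally=()
  local i status
  for (( i = 1; i <= n; i++ )); do
    "$@" > /dev/null 2>&1
    status=$?
    tally[$status]=$(( ${tally[$status]:-0} + 1 ))
  done
  for status in "${!tally[@]}"; do
    if (( status > 128 )); then
      printf '%3d x killed by signal %s\n' "${tally[$status]}" "$(kill -l "$status")"
    else
      printf '%3d x exit status %d\n' "${tally[$status]}" "$status"
    fi
  done | sort -rn
}

# Usage: ./count_failures.sh 10 valgrind --trace-children=yes ... thunderbird-bin ...
if (( $# >= 2 )); then
  count_failures "$@"
fi
```

Run it once per configuration (plain, after reboot, other driver, and so on). It discards the program's output, so collecting the {mapped file, pc offset, instruction stream} data still needs a separate step.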

thunderbird is not available from the Debian stable "jessie" repository
(Debian 8.7.1, 2017-01-20.)  Where did you get it?

Which kernel modules have been loaded (lsmod)?
Which version(s) of the low-level X11 and display drivers (DRM: direct
rendering manager) are in use?




Re: Problem of running mozilla thunderbird under valgrind on Debian GNU/Linux 4.x kernel.

ISHIKAWA,chiaki
Hi,

Thank you again.

I will hopefully upload the requested info next week.
Here is what I can write down today.

What would be the appropriate upload service? [The data would be too
large for e-mail to the list.]

On 2017/02/19 7:32, John Reiser wrote:
> How many failures occur in 10 runs of thunderbird under valgrind?

10 out of 10, i.e., every time, under Debian's newer stock kernel.

> How many failures occur in 10 runs if you reboot just before each run?

It never occurred to me to reboot the system before retrying.
I will check this next week. (Given the tests I did over the last few
months, SWITCHING kernel versions by rebooting into a different revision,
I would guess 10 out of 10, i.e., every time, but again, let me
check.)

>
> Thunderbird is a user mail agent that uses interactive graphics.
> How many failures occur before the display window appears, and how many after?

One caveat: the valgrind failure I am seeing happens when I run the
thunderbird test suite, and the complicating factor is that, aside from
the usual user interaction through the GUI under X Window, during the
|make mozmill| test suite run there is a daemon that runs test
scripts and talks to the main TB binary via a COM interface. [I stay away
from the keyboard and mouse during tests to avoid interfering with the
test suite run. I do this by starting a virtual X desktop using Xephyr:
the test suite run under valgrind happens in that virtual desktop. If I
wanted to, I COULD interact with thunderbird's GUI explicitly via the
mouse. I did this a few times when a bug in thunderbird or in the test
scripts made the run hang waiting for confirmation of a modal dialog,
etc.]

From what I have seen, the crash always occurs before the display window
of the tested thunderbird appears [every time valgrind printed the
mysterious segmentation fault under the newer Debian kernel].

> Are the symptoms and frequency the same for a Radeon card as for NVidia?
> On the open-source NVidia driver versus the proprietary driver?
> In "dumb framebuffer" mode ("no" acceleration)?
> Please tell us which cards: "lspci -nn | grep VGA" or similar.

I am using Debian GNU/Linux, installed as the guest OS in VirtualBox
under Windows 10, as the platform to develop and test thunderbird
patches.

So the relevant video driver here is the VirtualBox video
driver, I think, correct? (But there was a puzzling message in
Xorg.0.log. I mention it in my answer to your question about the X11
and display drivers.)

Under the 3.19.5 kernel, where valgrind + the thunderbird test suite works:

$ lspci -nn | grep VGA
00:02.0 VGA compatible controller [0300]: InnoTek Systemberatung GmbH VirtualBox Graphics Adapter [80ee:beef]
ishikawa@ip030:/KERNEL-SRC/kernel/linux-source-4.9$

(InnoTek is the name of the original VirtualBox developer.)

I am not sure if I can remove the above virtualbox graphics adaptor and
revert to the plain VGA adaptor emulation done by VirtualBox, but let me
try.

> Are the symptoms and frequency the same for Firefox as for thunderbird?
I am not developing or creating patches for Firefox. Sorry.

> Are the symptoms and frequency the same for Chrome as for thunderbird?
Ditto.

Oh, do you mean whether a very simple
|valgrind firefox-binary| run (without any test harness involvement)
works under the new kernel?

That I can test.
But Chrome: I have never even installed it.

> Please present a histogram of the {mapped file, pc offset, instruction stream}
> when the SIGSEGV happens.  [You should have at least 70 runs by now: 10 each
> for thunderbird plain, with reboot, other graphics card, other NVidia driver,
> dumb framebuffer, Firefox, Chrome.]

OK, I will gather the data. (I am not sure what you mean by "histogram",
but I will gather what I think is relevant.)

   10 each

   for thunderbird plain,
   with reboot [I will certainly reboot before each test run],
   x 10 times with the above InnoTek driver (built into VirtualBox).
   [I am not sure whether SIGSEGV happens under this setup.]

   for thunderbird + test suite hookup.
   I am quite certain that SIGSEGV happens under this setup.

   BTW, DOES ANYONE HAVE A GOOD IDEA ABOUT HOW TO CAPTURE the mapped
file, etc., WHEN SIGSEGV happens? It is very dynamic, and by the time I am
ready to type shell commands, the child process that hit it may be
gone. In fact, I have not been able to figure out exactly which
process in the test suite setup started by thunderbird (under
valgrind) is running into trouble.
I guess some clever hacking with gdb would get me started there?
BTW, valgrind's --gdb-* options are meant to debug the target under
valgrind, NOT a segfault of valgrind itself, correct?
[That the whole thing, valgrind included, works under kernel 3.19.5 but
not under later kernels drives me crazy.]

   > other graphics card, other NVidia driver: these won't apply.
   for thunderbird plain,
   dumb framebuffer [IF THIS SETUP IS FEASIBLE under VirtualBox.]
   after reboot

   for thunderbird + test suite hookup.
   dumb framebuffer [IF THIS SETUP IS FEASIBLE under VirtualBox.]
   after reboot

   > Firefox,

   I think I can simply run the Firefox ESR now available from the
   Debian GNU/Linux repository, without any test suite hookup or anything.
   I suspect that without the test suite hookup, it will run.
   Anyway, after a reboot, I will try to compare the
   mmap status of firefox with the stock VirtualBox graphics driver against
   the mmap status of firefox with a dumb framebuffer [IF THIS IS FEASIBLE].


   > Chrome.
   It looks like there is a Chrome package for Ubuntu.
   Maybe I can install it under Debian.
   However, this can wait, I think.
   Still, it would be very instructive to compare the mmap layout
   while chrome is running [AFTER REBOOT]
   with the ones while mozilla software {thunderbird, firefox} is running.

> thunderbird is not available from the Debian stable "jessie" repository
> (Debian 8.7.1, 2017-01-20.)  Where did you get it?

Sorry I was not clear about it.

I fetched the so-called comm-central thunderbird repository and
have been building it locally [64-bit] for testing purposes, to fix some
serious bugs I ran into.

The instructions for building thunderbird locally are at the following
URL, and I have basically followed them.

https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Build_Instructions/Simple_Thunderbird_build

"Basically" means that I had to tweak the so-called "mozconfig" in many
ways, especially to enable a valgrind-friendly build.
A very brief explanation is at the following URL:
https://developer.mozilla.org/en-US/docs/Mozilla/Testing/Valgrind

The above refers to the |mochitest| test suite for firefox.
Since thunderbird lives in a different source directory and
uses a very different test suite setup based on mozmill, there are
quirks and modifications one needs to add to the source files and scripts
in order to run thunderbird under valgrind.

It seems that, at one time, somebody hacked the thunderbird test suite
to run thunderbird under valgrind/memcheck, but it was abandoned and
nobody seems to recall exactly how it was done or how to update the
scripts, etc.

So basically, what I do myself to run thunderbird is
  - renaming the original thunderbird binary to something else, and
  - in its place, I place a binary that invokes the original
    thunderbird binary under valgrind/memcheck with the supplied parameters.
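The renaming trick above might be sketched as a bash script rather than the binary described. Everything specific here is an assumption: the renamed file `thunderbird-bin.orig`, the `VALGRIND` override variable and `run_under_valgrind` are hypothetical, and the option list is taken from the command line quoted earlier in this thread with --verbose and the suppression files left out.

```shell
#!/usr/bin/env bash
# Sketch of a stand-in installed as dist/bin/thunderbird-bin. Assumes the real
# binary was renamed to thunderbird-bin.orig (hypothetical) in the same directory.

# Memcheck options from the command line quoted earlier (suppressions omitted).
VG_OPTS=(
  --vex-iropt-register-updates=allregs-at-mem-access
  --trace-children=yes
  --smc-check=all-non-file
  --gen-suppressions=all
  --malloc-fill=0xA5
  --free-fill=0xC3
  --leak-check=full
  --num-callers=50
  --show-possibly-lost=no
)

# run_under_valgrind REAL_BINARY [ARGS...]: replace this shell with valgrind.
# VALGRIND may point at a specific build, e.g. /usr/local/bin/valgrind.
run_under_valgrind() {
  local real=$1; shift
  exec "${VALGRIND:-valgrind}" "${VG_OPTS[@]}" "$real" "$@"
}

# Only act when invoked under the name the test harness launches.
if [[ ${0##*/} == thunderbird-bin ]]; then
  run_under_valgrind "$(dirname -- "$0")/thunderbird-bin.orig" "$@"
fi
```

Because the script `exec`s valgrind, and the valgrind launcher in turn execs the tool in the same process, the harness keeps talking to the PID it started.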

This trick has worked very well, and many bugs/issues were found with it
over several years, until 2015, when I first hit the strange valgrind
failure. Back then, I realized it depended on the kernel version, and
the locally built 3.19.5 kernel saved the day.
But the world has since moved on to the 4.x kernel series, and when I
updated the kernel last summer, the problem reappeared.
I have reverted to 3.19.5 for the moment, but I am not sure
how long I can stick with the older kernel.

If you need a thunderbird binary to test on your end, I can certainly
make one available.
Actually, I occasionally run the tests (without valgrind) on mozilla's
compilation/testing farm. [This makes it possible for me to
compile/test the OS X and Windows versions, which is a necessary
step before a patch is accepted into mozilla's source tree.]
You can fetch a binary from there; please let me know if you want to.

> Which kernel modules have been loaded (lsmod)?

Under 3.19.5
ishikawa@ip030:/KERNEL-SRC/kernel/linux-source-4.9$ uname -a
Linux ip030 3.19.5 #1 SMP Mon Apr 20 08:50:21 JST 2015 x86_64 GNU/Linux
ishikawa@ip030:/KERNEL-SRC/kernel/linux-source-4.9$ lsmod
Module                  Size  Used by
fuse                   72030  1
btrfs                 731518  0
xor                    21081  1 btrfs
raid6_pq               95431  1 btrfs
ufs                    59011  0
qnx4                   13100  0
hfsplus                81692  0
hfs                    45988  0
minix                  27622  0
ntfs                  160179  0
vfat                   17270  0
msdos                  17077  0
fat                    50634  2 vfat,msdos
jfs                   137440  0
xfs                   667205  0
libcrc32c              12426  1 xfs
ext3                  151975  0
jbd                    52800  1 ext3
ext2                   59160  0
dm_mod                 77808  0
vboxsf                 37355  1
mptctl                 29762  0
mptbase                56835  1 mptctl
binfmt_misc            12846  1
ghash_clmulni_intel    13019  0
aesni_intel           163983  0
ppdev                  12724  0
joydev                 17107  0
iTCO_wdt               12831  0
iTCO_vendor_support    12704  1 iTCO_wdt
aes_x86_64             16719  1 aesni_intel
ablk_helper            12572  1 aesni_intel
cryptd                 14600  3 ghash_clmulni_intel,aesni_intel,ablk_helper
lrw                    12871  1 aesni_intel
evdev                  17518  14
gf128mul               13047  1 lrw
glue_helper            12773  1 aesni_intel
microcode              30394  0
snd_intel8x0           30885  2
psmouse                83740  0
serio_raw              12894  0
pcspkr                 12595  0
snd_ac97_codec        102547  1 snd_intel8x0
snd_pcm                73065  2 snd_ac97_codec,snd_intel8x0
snd_timer              22641  1 snd_pcm
snd                    53213  8 snd_ac97_codec,snd_intel8x0,snd_timer,snd_pcm
soundcore              13031  1 snd
sg                     29968  0
ac97_bus               12510  1 snd_ac97_codec
processor              28021  0
lpc_ich                20905  0
mfd_core               12601  1 lpc_ich
video                  18144  0
rng_core               12880  0
vboxvideo              36417  2
vboxguest             181315  6 vboxsf,vboxvideo
thermal_sys            28310  2 video,processor
ttm                    61967  1 vboxvideo
drm_kms_helper         74527  1 vboxvideo
drm                   229484  5 ttm,drm_kms_helper,vboxvideo
i2c_piix4              12665  0
i2c_core               38003  3 drm,i2c_piix4,drm_kms_helper
syscopyarea            12350  1 vboxvideo
sysfillrect            12522  1 vboxvideo
sysimgblt              12351  1 vboxvideo
ac                     12715  0
battery                13356  0
parport_pc             22422  0
parport                31812  2 ppdev,parport_pc
button                 12988  0
sunrpc                192012  1
loop                   22596  0
ip_tables              22004  0
x_tables               19034  1 ip_tables
autofs4                27584  2
ext4                  403601  15
crc16                  12343  1 ext4
jbd2                   71809  1 ext4
mbcache                13488  3 ext2,ext3,ext4
sd_mod                 39859  26
sr_mod                 21993  0
cdrom                  27042  1 sr_mod
ata_generic            12490  0
hid_generic            12393  0
usbhid                 40671  0
hid                    90268  2 hid_generic,usbhid
ohci_pci               12808  0
ehci_pci               12472  0
ohci_hcd               30951  1 ohci_pci
ehci_hcd               40790  1 ehci_pci
crc32c_intel           21850  4
ahci                   29245  16
usbcore               151644  5 ohci_hcd,ohci_pci,ehci_hcd,ehci_pci,usbhid
libahci                23158  1 ahci
usb_common             12440  1 usbcore
ata_piix               29671  0
libata                145717  4 ahci,libahci,ata_generic,ata_piix
scsi_mod              172107  5 sg,libata,mptctl,sd_mod,sr_mod
e1000                  90595  0
ishikawa@ip030:/KERNEL-SRC/kernel/linux-source-4.9$

I did not realize there are so many vbox drivers.

> Which version(s) of the low-level X11 and display drivers (DRM: direct
> rendering manager) are in use?

Under 3.19.5
egrep -i "(module|vbox|drm)" /var/log/Xorg.0.log &

printed out

[     8.651] (==) ModulePath set to "/usr/lib/xorg/modules"
[     8.651] (II) Module ABI versions:
[     8.652] (II) xfree86: Adding drm device (/dev/dri/card0)
[     8.655] (II) LoadModule: "glx"
[     8.658] (II) Loading /usr/lib/xorg/modules/extensions/libglx.so
[     8.716] (II) Module glx: vendor="X.Org Foundation"
[     8.716] compiled for 1.19.1, module version = 1.0.0
[     8.716] (==) Matched vboxvideo as autoconfigured driver 0
[     8.716] (==) Matched vboxvideo as autoconfigured driver 1
[     8.716] (II) LoadModule: "vboxvideo"
[     8.716] (WW) Warning, couldn't open module vboxvideo
[     8.716] (II) UnloadModule: "vboxvideo"
[     8.716] (II) Unloading vboxvideo
[     8.716] (EE) Failed to load module "vboxvideo" (module does not exist, 0)
[     8.716] (II) LoadModule: "modesetting"
[     8.716] (II) Loading /usr/lib/xorg/modules/drivers/modesetting_drv.so
[     8.717] (II) Module modesetting: vendor="X.Org Foundation"
[     8.717] compiled for 1.19.1, module version = 1.19.1
[     8.717] Module class: X.Org Video Driver
[     8.717] (II) LoadModule: "fbdev"
[     8.717] (II) Loading /usr/lib/xorg/modules/drivers/fbdev_drv.so
[     8.717] (II) Module fbdev: vendor="X.Org Foundation"
[     8.717] compiled for 1.19.0, module version = 0.4.4
[     8.717] Module class: X.Org Video Driver
[     8.717] (II) LoadModule: "vesa"
[     8.717] (II) Loading /usr/lib/xorg/modules/drivers/vesa_drv.so
[     8.717] (II) Module vesa: vendor="X.Org Foundation"
[     8.717] compiled for 1.19.0, module version = 2.3.4
[     8.717] Module class: X.Org Video Driver
[     8.721] (II) Loading sub module "fbdevhw"
[     8.721] (II) LoadModule: "fbdevhw"
[     8.721] (II) Loading /usr/lib/xorg/modules/libfbdevhw.so
[     8.722] (II) Module fbdevhw: vendor="X.Org Foundation"
[     8.722] compiled for 1.19.1, module version = 0.0.2
[     8.722] (II) Loading sub module "glamoregl"
[     8.722] (II) LoadModule: "glamoregl"
[     8.722] (II) Loading /usr/lib/xorg/modules/libglamoregl.so
[     8.733] (II) Module glamoregl: vendor="X.Org Foundation"
[     8.733] compiled for 1.19.1, module version = 1.0.0
[     8.838] EGL_MESA_drm_image required.
[     8.839] (II) modeset(0): Monitor name: VBOX monitor
[     8.840] (II) Loading sub module "fb"
[     8.840] (II) LoadModule: "fb"
[     8.840] (II) Loading /usr/lib/xorg/modules/libfb.so
[     8.840] (II) Module fb: vendor="X.Org Foundation"
[     8.840] compiled for 1.19.1, module version = 1.0.0
[     8.840] (II) UnloadModule: "fbdev"
[     8.840] (II) UnloadSubModule: "fbdevhw"
[     8.840] (II) UnloadModule: "vesa"
[     8.916] (II) LoadModule: "libinput"
[     8.916] (II) Loading /usr/lib/xorg/modules/input/libinput_drv.so
[     8.919] (II) Module libinput: vendor="X.Org Foundation"
[     8.919] compiled for 1.19.0, module version = 0.23.0
[     8.919] Module class: X.Org XInput Driver

I am a little surprised, but right now I may be using the glx driver,
given that the "vboxvideo" module does not seem to be loaded and the
other well-known modules get unloaded. Indeed, glxinfo printed many rows
of output including the following lines, and glxgears seems to run fine.
I should have known.
Re: glx:
glxinfo | grep -i1 vmware
Extended renderer info (GLX_MESA_query_renderer):
     Vendor: VMware, Inc. (0xffffffff)
     Device: llvmpipe (LLVM 3.9, 256 bits) (0xffffffff)
--
     Max GLES[23] profile version: 3.0
OpenGL vendor string: VMware, Inc.
OpenGL renderer string: Gallium 0.4 on llvmpipe (L


I will collect the same info under the 4.9.0-1 kernel (the latest test
kernel, under which I could not run the thunderbird test suite because
something dies during execution).

It may take a little time to gather the data. (Since compiling/testing
thunderbird requires a lot of resources, I have only one VM instance
running on the PC, so I really have to reboot this VM to switch kernels
to obtain the data.)
I wish someone with 64GB of memory could try to reproduce the issue in
their VirtualBox images on their hardware :-)
It would be very instructive to compare the mmap usage, etc. under
different kernel revisions side by side (!)

TIA

PS: Just in case the HOST CPU/OS may have something to do with the issues:
OS: Windows 10 Pro
CPU: Intel Xeon CPU E3-1240 V2
Graphics: Radeon 7700

But I am sure that VirtualBox has shielded the bare metal rather well.
Windows version of VirtualBox : 5.1.14 r112924 (Qt5.6.2)


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Valgrind-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/valgrind-users

Re: Problem of running mozilla thunderbird under valgrind on Debian GNU/Linux 4.x kernel.

John Reiser
On 02/18/2017 11:38 PM, ISHIKAWA,chiaki wrote:
>   BTW, DOES ANYONE HAVE A GOOD IDEA ABOUT HOW TO CAPTURE the mapped
> file, etc. WHEN SIGSEGV happens? It is very dynamic, and by the time I
> am ready to type in shell commands, the child binary that experienced
> it may be gone. Yes, I have not been able to figure out exactly which
> process in the test suite setup started by thunderbird (under valgrind)
> is experiencing the difficulty.
> I guess some clever hacking via gdb would get me started there?
> BTW, valgrind's --gdb-* options are meant to debug the target under
> valgrind, NOT the segfault of valgrind itself, correct?
> [That the whole thing, including valgrind, works under kernel 3.19.5
> and not under later kernels drives me crazy.]

This gdb command will stop execution and print a message when SIGSEGV happens:
        (gdb) handle SIGSEGV stop print
When the SIGSEGV happens, you will have to direct keyboard input to that process.
(The above 'handle' command is the default anyway, so if the automation for your
test harness snatches control, then you still might not get a chance for manual input.)
There is no way to ask gdb, "Please run these commands upon SIGSEGV."

You can write a script for the entire input to gdb: gdb -batch -x script -e executable
(beware: it is very brittle), but gdb cannot switch its input stream
(such as back and forth between the script and the terminal)
while it is running.  "gdb -batch -x script -e executable" might be your best option,
but it will take some patience.  There is no way for the script to
check that gdb is waiting for input after SIGSEGV, so you just have to assume
that the SIGSEGV is going to happen after your 'run' command in the input.
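A minimal sketch of that batch-script route, adapted so that gdb runs valgrind itself (via --args) and the commands after 'run' also dump the memory map, which is what the original question asked to capture. The script and file names (segv-gdb.sh, segv.gdb, gdb-segv.log) are hypothetical. Two caveats, stated as assumptions: valgrind may take and handle some SIGSEGVs itself (for example while growing the guest stack), so the first stop is not necessarily the fatal one; and gdb 7.5 and later also offer "catch signal SIGSEGV", to which a "commands" list can be attached, which may be a less brittle alternative (not shown here).

```shell
#!/bin/sh
# Sketch of the "gdb -batch -x script" approach, with valgrind itself as
# the program gdb runs.  Commands after 'run' execute only once the
# inferior stops (hopefully on the SIGSEGV); if it exits normally, the
# next command fails and gdb abandons the rest of the file.
write_segv_script() {
    cat > "$1" <<'EOF'
# Stop and report when SIGSEGV arrives (gdb's default, made explicit).
handle SIGSEGV stop print
# If the crash is in a forked child, uncommenting the next line makes
# gdb follow the child instead of the parent (untested here):
# set follow-fork-mode child
run
# Reached only if 'run' stopped, e.g. on the SIGSEGV.
info proc mappings
bt
info registers
EOF
}

# Usage: sh segv-gdb.sh valgrind --trace-children=yes ./thunderbird ...
if [ $# -gt 0 ]; then
    write_segv_script segv.gdb
    gdb -batch -x segv.gdb --args "$@" > gdb-segv.log 2>&1
fi
```

Everything gdb prints, including the mappings at the moment of the stop, ends up in gdb-segv.log, so nothing has to be typed while the crashing process is still alive.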

Yes, valgrind's --gdb-* options are for debugging the target under valgrind,
and are NOT for debugging valgrind itself.


If you run "strace -f -o strace.out -e trace=execve valgrind --trace-children=yes ..."
then the output in strace.out will tell you which process receives the SIGSEGV.
The "-e trace=execve" is a filter which restricts tracing to execve only;
otherwise the output will be very long because it contains every system call
for every process.
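To pick the crashing process out of that strace log, something like the following sketch may help. It assumes the "-f -o FILE" output format, in which every line begins with the PID, and relies on strace still reporting signal deliveries ("--- SIGSEGV {...} ---") when "-e trace=execve" is used, which is its default behaviour. The script name is hypothetical. A delivered SIGSEGV is not necessarily fatal under valgrind; a "+++ killed by SIGSEGV" line, if present, shows which process actually died.

```shell
#!/bin/sh
# Sketch: given the output of
#   strace -f -o strace.out -e trace=execve valgrind --trace-children=yes ...
# list each PID that received SIGSEGV and the last execve() it made.
# Assumes the "-f -o FILE" format, where every line starts with the PID:
#   1234 execve("/usr/bin/valgrind", [...], ...) = 0
#   1234 --- SIGSEGV {si_signo=SIGSEGV, ...} ---
segv_report() {
    log=$1
    awk '$2 == "---" && $3 == "SIGSEGV" { print $1 }' "$log" | sort -un |
    while read -r pid; do
        # Last execve line for this PID, with the leading PID stripped.
        prog=$(awk -v p="$pid" '
            $1 == p && $2 ~ /^execve\(/ { line = $0 }
            END { sub(/^[0-9]+ +/, "", line); print line }' "$log")
        # A forked child that never called execve() still runs its parent's image.
        printf 'pid %s: %s\n' "$pid" "${prog:-no execve recorded (forked child?)}"
    done
}

# Usage: sh segv-report.sh strace.out
if [ -r "${1:-}" ]; then
    segv_report "$1"
fi
```

With --trace-children=yes, the execve lines of traced children normally show valgrind being re-executed with the child program among its arguments, so the printed line should still name the program involved.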


Re: Problem of running mozilla thunderbird under valgrind on Debian GNU/Linux 4.x kernel.

ISHIKAWA,chiaki
Sorry for top-posting, but thank you for the suggestions.

So far, I have found that the memory maps differ under the 3.19.5,
4.7.0.1, and 4.9.6 versions of the Linux kernel.

Also, I have found that the SIGSEGV problem is timing-related (a race)
under 4.7.0.1 (the worst kind of bug in terms of reproducibility).
I managed to attach to the thunderbird binary executing under valgrind
using --vgdb=yes and --vgdb-error=0, and single-stepped (or stepped over
functions) through thunderbird to figure out what kind of thunderbird
behavior may trigger the valgrind problem.
Then I noticed that the SIGSEGV did not occur when I stepped through the
code (over the fork), whereas if I simply let thunderbird run all the
way with "cont", the SIGSEGV occurs :-(
To sum up, under 4.7.0.1 the SIGSEGV seems to occur near the fork()
system call. Thunderbird invokes a small glxtest program, which checks
the graphics driver info (for debugging?), and fork() is reached before
the SIGSEGV is observed under 4.7.0.1.
[I thought that I was homing in on a possible bug.]
But under 4.9.6, the SIGSEGV seems to occur well before this fork() is
executed, and it is still very difficult to figure out where the SIGSEGV
occurs.
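The attach procedure described above (valgrind's gdbserver with --vgdb-error=0, then gdb connecting through vgdb) could be scripted roughly as follows, so that the same probe can be repeated under each kernel. This is a sketch: the script and file names are hypothetical, "set breakpoint pending on" is there on the assumption that fork lives in libc, which is not yet loaded when valgrind first waits for gdb, and the "break fork" probe is itself an assumption about where to stop. Since stepping over fork() made the SIGSEGV disappear under 4.7.0.1, even a breakpoint may perturb the timing enough to hide the race.

```shell
#!/bin/sh
# Sketch: script the gdb side of a valgrind gdbserver session.
# Step 1 (another terminal): start the target so it waits for gdb
# before running the first guest instruction, e.g.
#   valgrind --vgdb=yes --vgdb-error=0 ./thunderbird ...
# Step 2: run this script with the same executable as its argument.
write_vgdb_cmds() {
    cat > "$1" <<'EOF'
# fork() is in libc, which is not loaded yet: allow a pending breakpoint.
set breakpoint pending on
# Connect to the valgrind gdbserver; with several valgrind processes
# waiting, use "target remote | vgdb --pid=PID" instead.
target remote | vgdb
# Hypothetical probe: stop at fork() rather than single-stepping to it.
break fork
continue
bt
EOF
}

# Usage: sh vgdb-attach.sh ./thunderbird
if [ $# -eq 1 ]; then
    write_vgdb_cmds vgdb.gdb
    # Without -batch, gdb runs the file and then stays interactive.
    gdb -x vgdb.gdb "$1"
fi
```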

Under 3.19.5, thunderbird runs under valgrind just fine.

At this rate, I should be able to post the logs, along with some results
from additional probes, at the beginning of next week.


On 2017/02/20 8:30, John Reiser wrote:

> [...]



Re: Problem of running mozilla thunderbird under valgrind on Debian GNU/Linux 4.x kernel.

ISHIKAWA,chiaki
After looking for a suitable place to post lengthy log files, I filed
https://bugs.kde.org/show_bug.cgi?id=377006
in the valgrind bugzilla with a set of log files.

Sorry, I forgot to post a notice to this mailing list.

If someone can figure out a possible way to proceed with debugging, I
would appreciate hearing it.

TIA


On 2017/02/22 2:25, ISHIKAWA,chiaki wrote:

> [...]

