I am getting a segmentation fault inside malloc. Here is the gdb backtrace of the error (note: everything from frame 4 upward is my code).
#0 0xf7d109dd in _int_malloc (av=av@entry=0xf7e45420 <main_arena>, bytes=bytes@entry=100)
at malloc.c:3697
#1 0xf7d12358 in __GI___libc_malloc (bytes=100) at malloc.c:2888
#2 0xf7efb4a5 in operator new(unsigned int) () from /usr/lib/libstdc++.so.6
#3 0xf7efb5ab in operator new[](unsigned int) () from /usr/lib/libstdc++.so.6
#4 0x0804f295 in GetDataFromMem (fileAddr=1744, size=-1) at ../userprog/exception.cc:709
#5 0x0804ec9d in ExecSC (bufAddr=1744, ptrAddr=2912) at ../userprog/exception.cc:578
#6 0x0804dde9 in ExceptionHandler (which=SyscallException) at ../userprog/exception.cc:191
#7 0x08051585 in Machine::RaiseException (this=0x805b8c8, which=SyscallException, badVAddr=0)
at ../machine/machine.cc:109
#8 0x08052ee3 in Machine::OneInstruction (this=0x805b8c8, instr=0x80628e8) at ../machine/mipssim.cc:535
#9 0x080519d0 in Machine::Run (this=0x805b8c8) at ../machine/mipssim.cc:40
#10 0x0804e74f in ForkBootStrap (val=0) at ../userprog/exception.cc:462
#11 0x08054338 in ThreadRoot ()
#12 0x00000000 in ?? ()
The line in my code that resulted in the fault:
int vpn = fileAddr / PageSize;
int offset = fileAddr % PageSize;
int counter = 0;
bool nullhit = false;
char *buf;
if (size == -1)
{
int curSize = DEF_BUF_LEN; //100
buf = new char[curSize]; //SEGFAULT
The exact line where the segmentation fault occurred (within malloc code):
3687 else
3688 {
3689 size = chunksize (victim);
3690
3691 /* We know the first chunk in this bin is big enough to use. */
(gdb)
3692 assert ((unsigned long) (size) >= (unsigned long) (nb));
3693
3694 remainder_size = size - nb;
3695
3696 /* unlink */
3697 unlink (victim, bck, fwd); //SEGFAULTS HERE
3698
3699 /* Exhaust */
3700 if (remainder_size < MINSIZE)
3701 {
(gdb)
3702 set_inuse_bit_at_offset (victim, size);
3703 if (av != &main_arena)
3704 victim->size |= NON_MAIN_ARENA;
3705 }
3706
3707 /* Split */
3708 else
3709 {
3710 remainder = chunk_at_offset (victim, nb);
3711
(gdb)
3712 /* We cannot assume the unsorted list is empty and therefore
3713 have to perform a complete insert here. */
3714 bck = unsorted_chunks (av);
3715 fwd = bck->fd;
3716 if (__builtin_expect (fwd->bk != bck, 0))
3717 {
3718 errstr = "malloc(): corrupted unsorted chunks 2";
3719 goto errout;
3720 }
3721 remainder->bk = bck;
The values of victim, bck, and fwd in unlink are, respectively:
(gdb) p victim
$6 = (mchunkptr) 0x805c650
(gdb) p bck
$7 = (mchunkptr) 0xffffffff
(gdb) p fwd
$8 = (mchunkptr) 0xf7e454e8 <main_arena+200>
I am not really sure what the causes for this are. Any insight is appreciated.
I came across the code below for walking a backtrace:
struct stack_frame {
struct stack_frame *prev;
void *return_addr;
} __attribute__((packed));
typedef struct stack_frame stack_frame;
__attribute__((noinline, noclone))
void backtrace_from_fp(void **buf, int size)
{
int i;
stack_frame *fp;
__asm__("movl %%ebp, %[fp]" : /* output */ [fp] "=r" (fp));
for(i = 0; i < size && fp != NULL; fp = fp->prev, i++)
buf[i] = fp->return_addr;
}
The reason for looking at this code is that we are using a third-party malloc hook and therefore don't want to use backtrace, which itself allocates memory. The code above doesn't work on x86_64, so I modified the asm statement to:
__asm__("movl %%rbp, %[fp]" : /* output */ [fp] "=r" (fp));
With that change I get a crash:
(gdb) bt
#0 backtrace_from_fp (size=10, buf=<optimized out>) at src/tcmalloc.cc:1910
#1 tc_malloc (size=<optimized out>) at src/tcmalloc.cc:1920
#2 0x00007f5023ade58d in __fopen_internal () from /lib64/libc.so.6
#3 0x00007f501e687956 in selinuxfs_exists () from /lib64/libselinux.so.1
#4 0x00007f501e67fc28 in init_lib () from /lib64/libselinux.so.1
#5 0x00007f5029a32503 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#6 0x00007f5029a241aa in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#7 0x0000000000000001 in ?? ()
#8 0x00007fff22cb8e24 in ?? ()
#9 0x0000000000000000 in ?? ()
(gdb)
(gdb) p $rbp
$2 = (void *) 0x7f501e695f37
(gdb) p (stack_frame *)$rbp
$3 = (stack_frame *) 0x7f501e695f37
(gdb) p *$3
$4 = {prev = 0x69662f636f72702f, return_addr = 0x6d6574737973656c}
(gdb) x /1xw 0x69662f636f72702f
0x69662f636f72702f: Cannot access memory at address 0x69662f636f72702f
(gdb) fr
#0 backtrace_from_fp (size=10, buf=<optimized out>) at src/tcmalloc.cc:1910
1910 in src/tcmalloc.cc
(gdb)
Am I missing something? Any help on how I can reconstruct the same backtrace in code?
Am I missing something?
The code you referenced assumes that the compiled code maintains a frame-pointer register chain.
This was the default on (32-bit) i*86 up until about 5-7 years ago, and has not been the default on x86_64 since ~forever.
The code will most likely work fine in non-optimized builds, but will fail miserably in optimized builds on both 32-bit and 64-bit x86 platforms with non-ancient versions of the compiler.
If you can rebuild all code (including libc) with -fno-omit-frame-pointer, then this code will work most of the time (but not all the time, because libc may have hand-coded assembly, and that assembly will not maintain a frame-pointer chain).
One solution is to use libunwind. Unfortunately, using it from inside malloc can still run into a problem if you (or any libraries you use) also use dlopen.
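If rebuilding with -fno-omit-frame-pointer is an option, one way to make the walk fail softly instead of crashing on a garbage frame pointer is to validate every frame against known stack bounds before dereferencing it. The following is only a sketch under that assumption: walk_frame_chain, StackFrame, lo, and hi are hypothetical names that do not appear in the question, and the bounds are assumed to be captured once outside the allocator.

```cpp
#include <cassert>
#include <cstdint>

// One frame record as laid out on x86 / x86_64 when frame pointers are kept:
// the saved frame pointer first, then the return address.
struct StackFrame {
    StackFrame* prev;
    void* return_addr;
};

// Hypothetical helper: walks a frame-pointer chain starting at fp and stores
// up to size return addresses in buf. A frame is dereferenced only if it lies
// entirely inside [lo, hi), is suitably aligned, and sits at a strictly higher
// address than the previous frame, so a bogus pointer ends the walk.
// Returns the number of entries written. Allocates no memory.
int walk_frame_chain(const StackFrame* fp, std::uintptr_t lo, std::uintptr_t hi,
                     void** buf, int size) {
    assert(size <= 0 || buf != nullptr);
    int n = 0;
    std::uintptr_t last = 0;
    while (n < size) {
        const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(fp);
        if (addr < lo || addr >= hi || hi - addr < sizeof(StackFrame)) break;
        if (addr % alignof(StackFrame) != 0) break;
        if (addr <= last) break;  // the chain must move toward the stack base
        buf[n++] = fp->return_addr;
        last = addr;
        fp = fp->prev;
    }
    return n;
}
```

With GCC on x86 or x86_64, the starting point could be static_cast<StackFrame*>(__builtin_frame_address(0)), with lo and hi set to the current thread's stack limits. Frames belonging to code built without frame pointers will still be skipped or will cut the walk short; the bounds check only keeps the walk from faulting.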
I got the following segmentation fault:
Program terminated with signal 11, Segmentation fault.
#0 0x000000000040fbf6 in release (this=<value optimized out>, __in_chrg=<value optimized out>)
at /usr/local/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:145
145 dispose();
Missing separate debuginfos, use: debuginfo-install boost-filesystem- 1.41.0-11.el6_1.2.x86_64 boost-program-options-1.41.0-11.el6_1.2.x86_64 boost-system-1.41.0-11.el6_1.2.x86_64 bzip2-libs-1.0.5-7.el6_0.x86_64 glibc-2.12-1.80.el6.x86_64 libgcc-4.4.6-4.el6.x86_64 libstdc++-4.4.6-4.el6.x86_64 lzo-2.03-3.1.el6.x86_64
(gdb) bt
#0 0x000000000040fbf6 in release (this=<value optimized out>, __in_chrg=<value optimized out>)
at /usr/local/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:145
#1 boost::detail::shared_count::~shared_count (this=<value optimized out>, __in_chrg=<value optimized out>)
at /usr/local/include/boost/smart_ptr/detail/shared_count.hpp:217
#2 0x00007f13fad83dab in boost::detail::set_tss_data(void const*, boost::shared_ptr<boost::detail::tss_cleanup_function>, void*, bool) ()
from /usr/local/lib/libboost_thread.so.1.40.0
#3 0x000000000042e191 in boost::thread_specific_ptr<infrastructure::tfeed::sequenced_data_queue_element_t::mem_prealloc>::release (this=<value optimized out>)
at /usr/local/include/boost/thread/tss.hpp:95
#4 0x000000000042ed43 in infrastructure::tfeed::sequenced_data_queue_element_t::operator new (size=16)
at ../../../infrastructure/include/tfeed/tfeed_multicast_defs.h:120
Also, I got a similar seg fault in another thread:
(gdb) thread 5
[Switching to thread 5 (Thread 0x7f122e1fe700 (LWP 7547))]
#0 0x000000000040fbf6 in release (this=<value optimized out>, __in_chrg=<value optimized out>)
at /usr/local/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:145
145 dispose();
(gdb) bt
#0 0x000000000040fbf6 in release (this=<value optimized out>, __in_chrg=<value optimized out>)
at /usr/local/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:145
#1 boost::detail::shared_count::~shared_count (this=<value optimized out>, __in_chrg=<value optimized out>)
at /usr/local/include/boost/smart_ptr/detail/shared_count.hpp:217
#2 0x00007f13fad83dab in boost::detail::set_tss_data(void const*, boost::shared_ptr<boost::detail::tss_cleanup_function>, void*, bool) ()
from /usr/local/lib/libboost_thread.so.1.40.0
#3 0x000000000042e591 in release (q_elem=<value optimized out>) at /usr/local/include/boost/thread/tss.hpp:95
#4 infrastructure::tfeed::sequenced_data_queue_element_t::operator delete (q_elem=<value optimized out>)
at ../../../infrastructure/include/tfeed/tfeed_multicast_defs.h:144
This happened while using Boost's thread-specific pointer, at /usr/local/include/boost/thread/tss.hpp:95 (the set_tss_data() call below):
T* release()
{
T* const temp=get();
detail::set_tss_data(this,boost::shared_ptr<detail::tss_cleanup_function>(),0,false);
return temp;
}
and further down in sp_counted_base_gcc_x86.hpp (at dispose()):
void release() // nothrow
{
if( atomic_exchange_and_add( &use_count_, -1 ) == 1 )
{
dispose();
weak_release();
}
}
I am using the thread-specific pointer in the class-specific operator new and operator delete of a data structure (sequenced_data_queue_element_t), because new and delete are called from multiple threads:
class sequenced_data_queue_element_t
{
public:
sequenced_data_queue_element_t() {
}
~sequenced_data_queue_element_t() {
delete data;
}
unsigned char* data;
uint32_t data_len;
typedef struct mem_prealloc
{
struct mem_prealloc* next;
} mem_prealloc_t;
static boost::thread_specific_ptr<mem_prealloc_t> mem_prealloc_q_head;
static void* operator new(size_t size)
{
mem_prealloc_t* q_elem;
if (UNLIKELY(mem_prealloc_q_head.get() == NULL))
{
/* allocate PREALLOC_BATCH elems at a time */
for (int i=0; i < MEM_PREALLOC_BATCH; i++)
{
q_elem = (mem_prealloc_t*)malloc(size);
q_elem->next = mem_prealloc_q_head.release();
mem_prealloc_q_head.reset(q_elem);
cur_mem_prealloced += size;
}
}
q_elem = mem_prealloc_q_head.release();
mem_prealloc_q_head.reset(q_elem->next);
return (void*)q_elem;
}
static void operator delete(void* q_elem)
{
/* C++ guarantees that an object's destructor
* is automatically called just before delete executes. */
/* next reuses the first pointer of sequenced_data_element_t */
((mem_prealloc_t*)q_elem)->next = mem_prealloc_q_head.release();
mem_prealloc_q_head.reset((mem_prealloc_t*)q_elem);
if (cur_mem_prealloced > MEM_PREALLOC_MAX_BYTES)
{
for (int i=0; i < MEM_PREALLOC_BATCH; i++)
{
mem_prealloc_t* qelem = mem_prealloc_q_head.release();
mem_prealloc_q_head.reset(qelem->next);
free(qelem);
cur_mem_prealloced -= sizeof(sequenced_data_queue_element_t);
if (mem_prealloc_q_head.get() == NULL)
break;
}
}
}
};
cur_mem_prealloced (a uint64_t) is a global variable.
What could be the possible reason triggering this bug?
Further, the stack of some other threads of the program seems to be corrupted. Also, the core dump shows unexpected code paths for those other threads.
The kernel log shows the following error messages:
[7547]: segfault at 10 ip 000000000040fbf6 sp 00007f122e1fcd90 error 4 in tbt[400000+a0000]
[7531]: segfault at 10 ip 000000000040fbf6 sp 00007fffc94bcca0 error 4 in tbt[400000+a0000]
Can these seg faults be triggered in case of stack corruption as well?
I allocate a dynamic 2D integer array (an array of int arrays) in Population.cpp as follows:
sectionProf = new int*[section_count]; //list of professor for each section declaration
It is declared in Population.h as:
int ** sectionProf; //list of professor for each section
It is then filled from a file, later on in Population.cpp:
sectionProf[section] = new int[professors + 1];
sectionProf[section][0] = professors;
if (professors > 0) {
for (int x = 1; x < professors + 1; ++x) {
sectionProf[section][x] = stoi(tokenizedVersion[x + 1]);
}
}
Then, in the destructor, I destroy it as follows:
if(sectionProf){
for(int i = 0; i < section_count; ++i){
delete [] sectionProf[i];
}
delete [] sectionProf;
}
However, upon execution, I keep getting the following error:
*** glibc detected *** ./research_scheduling_backend: corrupted double-linked list: 0x00000000020b78c0 ***
Here is the gdb backtrace (#17 is referring to the 'delete [] sectionProf' line):
#0 __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:93
#1 0x00007ffff7085f61 in _L_lock_10611 () at malloc.c:5249
#2 0x00007ffff7083c87 in __GI___libc_malloc (bytes=140737341265696) at malloc.c:2921
#3 0x00007ffff7de7900 in _dl_map_object_deps (map=0x7ffff7fdd4e0, preloads=<optimized out>, npreloads=<optimized out>, trace_mode=0, open_mode=-2147483648) at dl-deps.c:517
#4 0x00007ffff7ded8a9 in dl_open_worker (a=0x7fffffffbb00) at dl-open.c:262
#5 0x00007ffff7de9176 in _dl_catch_error (objname=0x7fffffffbb48, errstring=0x7fffffffbb50, mallocedp=0x7fffffffbb5f, operate=0x7ffff7ded700 <dl_open_worker>, args=0x7fffffffbb00) at dl-error.c:178
#6 0x00007ffff7ded31a in _dl_open (file=0x7ffff717a858 "libgcc_s.so.1", mode=-2147483647, caller_dlopen=0x7ffff710bea5, nsid=-2, argc=3, argv=<optimized out>, env=0x7fffffffeac8) at dl-open.c:639
#7 0x00007ffff7131bb2 in do_dlopen (ptr=0x7fffffffbd00) at dl-libc.c:89
#8 0x00007ffff7de9176 in _dl_catch_error (objname=0x7fffffffbd30, errstring=0x7fffffffbd20, mallocedp=0x7fffffffbd3f, operate=0x7ffff7131b70 <do_dlopen>, args=0x7fffffffbd00) at dl-error.c:178
#9 0x00007ffff7131c74 in dlerror_run (args=0x7fffffffbd00, operate=0x7ffff7131b70 <do_dlopen>) at dl-libc.c:48
#10 __GI___libc_dlopen_mode (name=<optimized out>, mode=<optimized out>) at dl-libc.c:165
#11 0x00007ffff710bea5 in init () at ../sysdeps/x86_64/../ia64/backtrace.c:53
#12 0x00007ffff6df1400 in pthread_once () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_once.S:104
#13 0x00007ffff710bfc4 in __GI___backtrace (array=<optimized out>, size=64) at ../sysdeps/x86_64/../ia64/backtrace.c:104
#14 0x00007ffff707505f in __libc_message (do_abort=2, fmt=0x7ffff717f560 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:180
#15 0x00007ffff707f846 in malloc_printerr (action=3, str=0x7ffff717be4c "corrupted double-linked list", ptr=<optimized out>) at malloc.c:5047
#16 0x00007ffff7080b1b in _int_free (av=0x7ffff73b9720, p=0x627dd0, have_lock=0) at malloc.c:4125
#17 0x0000000000404b7e in Population::~Population (this=0x7fffffffc910, __in_chrg=<optimized out>) at Population.cpp:91
#18 0x0000000000403919 in main (argc=3, argv=0x7fffffffeaa8) at Scheduler.cpp:101
At absolutely no place in the code is the sectionProf array ever modified. It is only used to check values. Can someone please tell me why I might be getting this error? I have looked all over the place about glibc double-linked list errors and I understand that it is because in some way I am corrupting the symbol table(?) somehow...
For anyone who lands on this problem, here is what was wrong in my specific case. While building the array, I was reading garbage values for the section index that were out of range (not below section_count). That is, in the for loop,
sectionProf[section] = new int[professors + 1];
sectionProf[section][0] = professors;
if (professors > 0) {
for (int x = 1; x < professors + 1; ++x) {
sectionProf[section][x] = stoi(tokenizedVersion[x + 1]);
}
}
my value for section was not in the range 0 to section_count - 1, which is the range of indices used in the delete loop. Writing to sectionProf[section] with such an index is what corrupted memory.
Lesson: Check for PEBKAC errors generated in input files.
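A cheap way to catch this class of input error at the point where it happens, instead of at delete time, is to validate the index before storing anything. The sketch below is not the original code: store_section is a hypothetical helper, it swaps the raw int** for std::vector, and it assumes the token layout from the question (professor IDs start at tokens[2]).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: stores the professor list for one section and rejects
// any section index outside [0, section_count), where section_count is
// sectionProf.size(). Slot 0 of each row holds the professor count, as in the
// question.
void store_section(std::vector<std::vector<int>>& sectionProf, int section,
                   int professors, const std::vector<std::string>& tokens) {
    if (section < 0 || static_cast<std::size_t>(section) >= sectionProf.size())
        throw std::out_of_range("section index out of range: " +
                                std::to_string(section));
    if (professors < 0 ||
        tokens.size() < static_cast<std::size_t>(professors) + 2)
        throw std::invalid_argument("too few tokens for the professor count");

    std::vector<int> row(static_cast<std::size_t>(professors) + 1);
    row[0] = professors;
    for (int x = 1; x < professors + 1; ++x)
        row[x] = std::stoi(tokens[x + 1]);
    sectionProf[section] = row;
    assert(sectionProf[section].front() == professors);  // layout postcondition
}
```

With this shape a bad line in the input file produces an exception naming the offending index, and the vectors release their memory without a hand-written delete loop.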
I am developing test code using GPB (Google Protocol Buffers) on QNX as follows:
Offer_event Offer;
string a = "127.0.0.7";
Offer.set_ipaddress(a);
Offer.set_port(9000);
BufSize = Offer.ByteSize();
Length_message = BufSize + Message_Header_Size;
Message->PayloadLength_of_Payload = BufSize;
PayloadBuffer = new char[BufSize];
Offer.SerializeToArray(PayloadBuffer, BufSize);
When I run it, I get an error that I cannot understand.
The error is as follows:
#0 std::string::size (this=0xcd21c0)
at /home/builder/hudson/650-gcc-4.4/svn/linux-x86-o-ntoarmeabi/arm-unknown-nto-qnx6.5.0eabi/pic/libstdc++-v3/include/bits/basic_string.h:624
624 /home/builder/hudson/650-gcc-4.4/svn/linux-x86-o-ntoarmeabi/arm-unknown-nto-qnx6.5.0eabi/pic/libstdc++-v3/include/bits/basic_string.h: No such file or directory.
in /home/builder/hudson/650-gcc-4.4/svn/linux-x86-o-ntoarmeabi/arm-unknown-nto-qnx6.5.0eabi/pic/libstdc++-v3/include/bits/basic_string.h
(gdb) bt
#0 std::string::size (this=0xcd21c0)
at /home/builder/hudson/650-gcc-4.4/svn/linux-x86-o-ntoarmeabi/arm-unknown-nto-qnx6.5.0eabi/pic/libstdc++-v3/include/bits/basic_string.h:624
#1 0x0067d6b0 in google::protobuf::internal::WireFormatLite::StringSize ()
#2 0x0063ecd0 in Offer_event::ByteSize ()
#3 0x00404f18 in AnalysisCmdC_Actor::TestGPB ()
from C:/QNX650/target/qnx6/armle-v7/lib/libc.so.3
#11 0x0004201a in ?? ()
Cannot access memory at address 0x0
Current language: auto; currently c++
(gdb)
I don't know why ByteSize fails.
If I remove the string field, it works fine.
I suspect the way I use the string is the problem.
What is going wrong?
I am trying to get a backtrace at some point during the execution of my (C++) program.
For that I am using backtrace and backtrace_symbols. Something along these lines:
std::string stacktrace( unsigned int frames_to_skip )
{
std::string str;
void* stack_addrs[50];
int trace_size = backtrace( stack_addrs, 50 );
char** stack_strings = backtrace_symbols( stack_addrs, trace_size );
str += "[bt] backtrace:\n";
// skip frames_to_skip stack frames
for( int i = frames_to_skip; i < trace_size; ++i )
{
char tmp[4096];
snprintf( tmp, sizeof tmp, "[bt] #%d %s\n", i-frames_to_skip, stack_strings[i] ); // bounded: symbol strings can be long
str += tmp;
}
free( stack_strings );
return str;
}
It works, but some function names are missing. For example:
[bt] #0 /path/to/executable() [0x43e1b5]
[bt] #1 /path/to/executable() [0x43e0cd]
[bt] #2 /path/to/executable() [0x43df51]
[bt] #3 /path/to/executable() [0x43dd44]
[bt] #4 /path/to/executable() [0x43db50]
[bt] #5 /path/to/executable() [0x43d847]
[bt] #6 /path/to/executable() [0x43d216]
[bt] #7 /path/to/executable() [0x43c1e1]
[bt] #8 /path/to/executable() [0x43b293]
[bt] #9 /path/to/executable(_Z29SomeRN5other8symbolE+0x2c) [0x43a6ca]
[bt] #10 /path/to/executable(_Z11SomeIN5_8symbolEPFvRS1_EEvRKT_RKT0_+0x77) [0x441716]
...
Functions 0 to 8 have one thing in common: they all sit in a namespace...
I tried putting function 9 in an anonymous namespace (without any other modification) and it disappears from the backtrace, which now looks like this:
[bt] #0 /path/to/executable() [0x43e1b5]
[bt] #1 /path/to/executable() [0x43e0cd]
[bt] #2 /path/to/executable() [0x43df51]
[bt] #3 /path/to/executable() [0x43dd44]
[bt] #4 /path/to/executable() [0x43db50]
[bt] #5 /path/to/executable() [0x43d847]
[bt] #6 /path/to/executable() [0x43d216]
[bt] #7 /path/to/executable() [0x43c1e1]
[bt] #8 /path/to/executable() [0x43b293]
[bt] #9 /path/to/executable() [0x43a6ca]
[bt] #10 /path/to/executable(_Z11SomeIN5_8symbolEPFvRS1_EEvRKT_RKT0_+0x77) [0x441716]
...
Is there any way to fix that?
p.s.: version of g++:
g++ (GCC) 4.6.0 20110530 (Red Hat 4.6.0-9)
Edit: fixed the max depth of the backtrace after Code Monkey's remark.
Edit 2: added the full code of the function.
Edit 3: the code is compiled with -O0 -g3 and linked with -rdynamic.
Your problem may be the functions you are using. Your max_depth in backtrace(..) is set to 16. That may be too low. At any rate...
This blog post on C++ stack traces with GCC explains how you should be performing stack traces. In sum,
#include <execinfo.h>
#include <stdio.h>  // FILE, fprintf, fflush
#include <stdlib.h> // free
void print_trace(FILE *out, const char *file, int line)
{
const size_t max_depth = 100;
size_t stack_depth;
void *stack_addrs[max_depth];
char **stack_strings;
stack_depth = backtrace(stack_addrs, max_depth);
stack_strings = backtrace_symbols(stack_addrs, stack_depth);
fprintf(out, "Call stack from %s:%d:\n", file, line);
for (size_t i = 1; i < stack_depth; i++) {
fprintf(out, " %s\n", stack_strings[i]);
}
free(stack_strings); // malloc()ed by backtrace_symbols
fflush(out);
}
GCC also provides access to the C++ name (de)mangler. There are some
pretty hairy details to learn about memory ownership, and interfacing
with the stack trace output requires a bit of string parsing, but it
boils down to replacing the above inner loop with this:
#include <cxxabi.h>
#include <cstring>
...
for (size_t i = 1; i < stack_depth; i++) {
    size_t sz = 200; // just a guess, template names will go much wider
    char *function = static_cast<char *>(malloc(sz));
    char *begin = 0, *end = 0;
    // find the parentheses and address offset surrounding the mangled name
    for (char *j = stack_strings[i]; *j; ++j) {
        if (*j == '(') {
            begin = j;
        }
        else if (*j == '+') {
            end = j;
        }
    }
    if (begin && end) {
        *begin++ = '\0';
        *end = '\0';
        // found our mangled name, now in [begin, end)
        int status;
        char *ret = abi::__cxa_demangle(begin, function, &sz, &status);
        if (ret) {
            // return value may be a realloc() of the input
            function = ret;
        }
        else {
            // demangling failed, just pretend it's a C function with no args
            std::strncpy(function, begin, sz - 1);
            function[sz - 1] = '\0';
            std::strncat(function, "()", sz - std::strlen(function) - 1);
        }
        fprintf(out, " %s:%s\n", stack_strings[i], function);
    }
    else
    {
        // didn't find the mangled name, just print the whole line
        fprintf(out, " %s\n", stack_strings[i]);
    }
    free(function);
}
There is more information on that site (I didn't want to copy verbatim) but looking at this code and the above site should get you on the right track.
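The buffer bookkeeping in the loop above can be avoided by letting abi::__cxa_demangle allocate its own result, which it does when the output buffer argument is null. The helper below is a minimal sketch of that variant, not code from the blog post; the name demangle is hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium C++ ABI
// mangled name, or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    assert(mangled != nullptr);
    int status = 0;
    // With a null output buffer, __cxa_demangle returns a malloc()ed string
    // that the caller must free(). A non-zero status means failure.
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        std::free(out);
        return mangled;
    }
    std::string result(out);
    std::free(out);
    return result;
}
```

In the loop above, the text between '(' and '+' would be passed to this helper once both delimiters have been replaced by '\0'. Note that it allocates, so it is not suitable for use inside a malloc hook or a signal handler.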
Can you try adding -rdynamic to your link?
http://www.linuxforums.org/forum/programming-scripting/35192-backtrace_symbols-no-symbols.html
backtrace lists the call frames, which correspond to machine code call instructions, not source level function calls.
The difference is that with inlining, an optimizing compiler can often avoid using a call instruction for every logical function call in the source code.