I came across the code below for walking a backtrace:
struct stack_frame {
    struct stack_frame *prev;
    void *return_addr;
} __attribute__((packed));
typedef struct stack_frame stack_frame;

__attribute__((noinline, noclone))
void backtrace_from_fp(void **buf, int size)
{
    int i;
    stack_frame *fp;
    __asm__("movl %%ebp, %[fp]" : /* output */ [fp] "=r" (fp));
    for (i = 0; i < size && fp != NULL; fp = fp->prev, i++)
        buf[i] = fp->return_addr;
}
The reason I am looking at this code is that we use a third-party malloc hook, so we don't want to call backtrace(), which itself allocates memory. The code above doesn't work on x86_64, so I changed the asm statement to:
__asm__("movl %%rbp, %[fp]" : /* output */ [fp] "=r" (fp));
and I get a crash:
(gdb) bt
#0 backtrace_from_fp (size=10, buf=<optimized out>) at src/tcmalloc.cc:1910
#1 tc_malloc (size=<optimized out>) at src/tcmalloc.cc:1920
#2 0x00007f5023ade58d in __fopen_internal () from /lib64/libc.so.6
#3 0x00007f501e687956 in selinuxfs_exists () from /lib64/libselinux.so.1
#4 0x00007f501e67fc28 in init_lib () from /lib64/libselinux.so.1
#5 0x00007f5029a32503 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#6 0x00007f5029a241aa in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#7 0x0000000000000001 in ?? ()
#8 0x00007fff22cb8e24 in ?? ()
#9 0x0000000000000000 in ?? ()
(gdb)
(gdb) p $rbp
$2 = (void *) 0x7f501e695f37
(gdb) p (stack_frame *)$rbp
$3 = (stack_frame *) 0x7f501e695f37
(gdb) p *$3
$4 = {prev = 0x69662f636f72702f, return_addr = 0x6d6574737973656c}
(gdb) x /1xw 0x69662f636f72702f
0x69662f636f72702f: Cannot access memory at address 0x69662f636f72702f
(gdb) fr
#0 backtrace_from_fp (size=10, buf=<optimized out>) at src/tcmalloc.cc:1910
1910 in src/tcmalloc.cc
(gdb)
Am I missing something? Any help on how I can reconstruct the same via code?
Am I missing something?
The code you referenced assumes the compiled code maintains a frame pointer register chain.
This was the default on (32-bit) i*86 up until about 5-7 years ago, and has essentially never been the default on x86_64.
The code will most likely work fine in non-optimized builds, but will fail miserably with optimization on both 32-bit and 64-bit x86 platforms using non-ancient versions of the compiler.
If you can rebuild all code (including libc) with -fno-omit-frame-pointer, then this code will work most of the time (but not all the time, because libc may have hand-coded assembly, and that assembly will not maintain the frame pointer chain).
One solution is to use libunwind. Unfortunately, using it from inside malloc can still run into a problem if you (or any libraries you use) also use dlopen.
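For reference, here is a minimal sketch of an allocation-free frame-pointer walk, assuming GCC or Clang on x86_64 and that every frame in the chain was built with -fno-omit-frame-pointer. It uses __builtin_frame_address(0) instead of inline asm, and it adds a bounds check so that a garbage frame pointer (like the 0x69662f636f72702f in the question) ends the walk instead of crashing. The stack_top parameter is a hypothetical addition, not part of the original code: the caller passes an address known to lie higher on the same thread's stack, for example the address of a local variable in main().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Frame record layout assumed here (x86_64 with frame pointers enabled):
 * [fp] = caller's saved frame pointer, [fp + 8] = return address. */
struct stack_frame {
    struct stack_frame *prev;
    void *return_addr;
};

/* Walks the frame-pointer chain without allocating.
 * stack_top is a hypothetical upper bound supplied by the caller.
 * Returns the number of return addresses written to buf. */
__attribute__((noinline))
int backtrace_from_fp(void **buf, int size, const void *stack_top)
{
    assert(buf != NULL && size >= 0);  /* precondition only */

    struct stack_frame *fp = __builtin_frame_address(0);
    uintptr_t top = (uintptr_t)stack_top;
    int i = 0;

    while (i < size) {
        uintptr_t cur = (uintptr_t)fp;
        /* Stop unless the whole record is aligned and lies below stack_top. */
        if (cur == 0 || cur % sizeof(void *) != 0 ||
            cur >= top || top - cur < sizeof(struct stack_frame))
            break;
        buf[i++] = fp->return_addr;
        /* The stack grows down, so a valid caller frame is at a higher address. */
        if ((uintptr_t)fp->prev <= cur)
            break;
        fp = fp->prev;
    }
    return i;
}
```

A frame built without a frame pointer may be skipped or may end the walk early, so the result is best treated as a hint rather than a complete trace.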
I have a problem using Google V8 on Linux. If I create a V8 instance in my shared library, I get a segfault. The same code works fine in a Windows DLL and in a Linux executable.
My code:
extern "C" void InitV8ExtensionInterFace(){
    v8::V8::InitializeICU();
    v8::V8::Initialize();
    v8::Isolate* isolate = v8::Isolate::New(); // error occurs here
    threadfunc(argc, args);
}
gdb stack trace:
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff3cb86d5 in v8::internal::Builtins::SetUp (this=0x7fffffffb9e0, isolate=0x235abb0, create_heap_objects=false) at ../src/builtins.cc:1567
#2 0x00007ffff3e271cf in v8::internal::Isolate::Init (this=0x235abb0, des=0x0) at ../src/isolate.cc:2115
#3 0x00007ffff3c96049 in v8::Isolate::New (params=...) at ../src/api.cc:6861
#4 0x00007ffff3b78d40 in InitV8ExtensionInterFace () at ../Framework/ExPublic.cpp:107
#5 0x00000000004729db in myTest1 () at arangod/RestServer/arangod.cpp:106
#6 0x0000000000472a50 in main (argc=1, argv=0x7fffffffe118) at arangod/RestServer/arangod.cpp:126
It appears that in the V8 function void Builtins::SetUp(Isolate* isolate, bool create_heap_objects), the functions array is empty. If I initialize a v8::Platform first, the error occurs in V8::InitializePlatform(platform) instead:
extern "C" void InitV8ExtensionInterFace(){
    v8::V8::InitializeICU();
    v8::Platform* platform = v8::platform::CreateDefaultPlatform();
    v8::V8::InitializePlatform(platform); // error occurs here
    v8::V8::Initialize();
    v8::Isolate* isolate = v8::Isolate::New();
    threadfunc(argc, args);
}
gdb stack trace:
1: V8_Fatal
2: v8::internal::V8::InitializePlatform(v8::Platform*)
3: InitV8ExtensionInterFace
4: 0x4aab60
5: 0x4aacf1
6: 0x5f4532
7: 0x47c32b
8: 0x474498
9: 0x472ae1
10: __libc_start_main
11: 0x4726f9
Thread 1 received signal SIGABRT, Aborted.
0x00007ffff66595e5 in raise () from /lib64/libc.so.6
(gdb) where
#0 0x00007ffff66595e5 in raise () from /lib64/libc.so.6
#1 0x00007ffff665adc5 in abort () from /lib64/libc.so.6
#2 0x00007fffc6e7d9c9 in v8::base::OS::Abort () at ../src/base/platform/platform-posix.cc:233
#3 0x00007fffc6e7b586 in V8_Fatal (file=0x7fffc703f535 "../src/v8.cc", line=107, format=0x7fffc6ffe77a "Check failed: %s.") at ../src/base/logging.cc:116
#4 0x00007fffc6cbd909 in v8::internal::V8::InitializePlatform (platform=0x267d840) at ../src/v8.cc:107
#5 0x00007fffc690bc8b in InitV8ExtensionInterFace () at ../Framework/ExPublic.cpp:98
#6 0x00000000004aab60 in myTest () at arangod/V8Server/ApplicationV8.cpp:1068
#7 0x00000000004aacf1 in triagens::arango::ApplicationV8::prepare2 (this=0x2378310) at arangod/V8Server/ApplicationV8.cpp:1093
#8 0x00000000005f4532 in triagens::rest::ApplicationServer::prepare2 (this=0x2377000) at arangod/ApplicationServer/ApplicationServer.cpp:525
#9 0x000000000047c32b in triagens::arango::ArangoServer::startupServer (this=0x2375330) at arangod/RestServer/ArangoServer.cpp:1009
#10 0x0000000000474498 in triagens::rest::AnyServer::start (this=0x2375330) at arangod/Rest/AnyServer.cpp:347
#11 0x0000000000472ae1 in main (argc=1, argv=0x7fffffffe118) at arangod/RestServer/arangod.cpp:139
I get "4.3.61" at runtime with v8::V8::GetVersion.
This problem has troubled me for several days. I very much hope someone can help. Thank you.
You're missing a call to V8::InitializeExternalStartupData(), see the Get Started tutorials.
int main(int argc, char* argv[])
{
    // Initialize V8.
    V8::InitializeICU();
    V8::InitializeExternalStartupData(argv[0]);
    Platform* platform = platform::CreateDefaultPlatform();
    V8::InitializePlatform(platform);
    V8::Initialize();
    // ...
}
You will need to copy natives_blob.bin and snapshot_blob.bin alongside your executable. They should be somewhere with the V8 binaries.
Looking at your edits with a better stack trace, you're running into two very different problems. Your first crash (the empty functions array) is because you're not calling CreateDefaultPlatform(), which is mandatory.
The second crash is inside InitializePlatform() on this line:
void V8::InitializePlatform(v8::Platform* platform) {
    CHECK(!platform_); // <- here
    CHECK(platform);
    platform_ = platform;
}
This check makes sure the platform is only initialized once. It appears that you're calling InitializePlatform() twice. You can try putting a breakpoint in it to figure out where it gets called from.
I got the following segmentation fault:
Program terminated with signal 11, Segmentation fault.
#0 0x000000000040fbf6 in release (this=<value optimized out>, __in_chrg=<value optimized out>)
at /usr/local/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:145
145 dispose();
Missing separate debuginfos, use: debuginfo-install boost-filesystem-1.41.0-11.el6_1.2.x86_64 boost-program-options-1.41.0-11.el6_1.2.x86_64 boost-system-1.41.0-11.el6_1.2.x86_64 bzip2-libs-1.0.5-7.el6_0.x86_64 glibc-2.12-1.80.el6.x86_64 libgcc-4.4.6-4.el6.x86_64 libstdc++-4.4.6-4.el6.x86_64 lzo-2.03-3.1.el6.x86_64
(gdb) bt
#0 0x000000000040fbf6 in release (this=<value optimized out>, __in_chrg=<value optimized out>)
at /usr/local/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:145
#1 boost::detail::shared_count::~shared_count (this=<value optimized out>, __in_chrg=<value optimized out>)
at /usr/local/include/boost/smart_ptr/detail/shared_count.hpp:217
#2 0x00007f13fad83dab in boost::detail::set_tss_data(void const*, boost::shared_ptr<boost::detail::tss_cleanup_function>, void*, bool) ()
from /usr/local/lib/libboost_thread.so.1.40.0
#3 0x000000000042e191 in boost::thread_specific_ptr<infrastructure::tfeed::sequenced_data_queue_element_t::mem_prealloc>::release (this=<value optimized out>)
at /usr/local/include/boost/thread/tss.hpp:95
#4 0x000000000042ed43 in infrastructure::tfeed::sequenced_data_queue_element_t::operator new (size=16)
at ../../../infrastructure/include/tfeed/tfeed_multicast_defs.h:120
Also, I got a similar seg fault in another thread:
(gdb) thread 5
[Switching to thread 5 (Thread 0x7f122e1fe700 (LWP 7547))]
#0 0x000000000040fbf6 in release (this=<value optimized out>, __in_chrg=<value optimized out>)
at /usr/local/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:145
145 dispose();
(gdb) bt
#0 0x000000000040fbf6 in release (this=<value optimized out>, __in_chrg=<value optimized out>)
at /usr/local/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:145
#1 boost::detail::shared_count::~shared_count (this=<value optimized out>, __in_chrg=<value optimized out>)
at /usr/local/include/boost/smart_ptr/detail/shared_count.hpp:217
#2 0x00007f13fad83dab in boost::detail::set_tss_data(void const*, boost::shared_ptr<boost::detail::tss_cleanup_function>, void*, bool) ()
from /usr/local/lib/libboost_thread.so.1.40.0
#3 0x000000000042e591 in release (q_elem=<value optimized out>) at /usr/local/include/boost/thread/tss.hpp:95
#4 infrastructure::tfeed::sequenced_data_queue_element_t::operator delete (q_elem=<value optimized out>)
at ../../../infrastructure/include/tfeed/tfeed_multicast_defs.h:144
This happened while using Boost's thread-specific pointer, at /usr/local/include/boost/thread/tss.hpp:95 (the set_tss_data() call below):
T* release()
{
    T* const temp=get();
    detail::set_tss_data(this,boost::shared_ptr<detail::tss_cleanup_function>(),0,false);
    return temp;
}
and further down in sp_counted_base_gcc_x86.hpp (at dispose()):
void release() // nothrow
{
    if( atomic_exchange_and_add( &use_count_, -1 ) == 1 )
    {
        dispose();
        weak_release();
    }
}
I use the thread-specific pointer in the specialized operator new and operator delete of a data structure (sequenced_data_queue_element_t), because new and delete are called from multiple threads:
class sequenced_data_queue_element_t
{
public:
    sequenced_data_queue_element_t() {
    }
    ~sequenced_data_queue_element_t() {
        delete data;
    }
    unsigned char* data;
    uint32_t data_len;
    typedef struct mem_prealloc
    {
        struct mem_prealloc* next;
    } mem_prealloc_t;
    static boost::thread_specific_ptr<mem_prealloc_t> mem_prealloc_q_head;
    static void* operator new(size_t size)
    {
        mem_prealloc_t* q_elem;
        if (UNLIKELY(mem_prealloc_q_head.get() == NULL))
        {
            /* allocate PREALLOC_BATCH elems at a time */
            for (int i=0; i < MEM_PREALLOC_BATCH; i++)
            {
                q_elem = (mem_prealloc_t*)malloc(size);
                q_elem->next = mem_prealloc_q_head.release();
                mem_prealloc_q_head.reset(q_elem);
                cur_mem_prealloced += size;
            }
        }
        q_elem = mem_prealloc_q_head.release();
        mem_prealloc_q_head.reset(q_elem->next);
        return (void*)q_elem;
    }
    static void operator delete(void* q_elem)
    {
        /* C++ guarantees that an object's destructor
         * is automatically called just before delete executes. */
        /* next reuses the first pointer of sequenced_data_element_t */
        ((mem_prealloc_t*)q_elem)->next = mem_prealloc_q_head.release();
        mem_prealloc_q_head.reset((mem_prealloc_t*)q_elem);
        if (cur_mem_prealloced > MEM_PREALLOC_MAX_BYTES)
        {
            for (int i=0; i < MEM_PREALLOC_BATCH; i++)
            {
                mem_prealloc_t* qelem = mem_prealloc_q_head.release();
                mem_prealloc_q_head.reset(qelem->next);
                free(qelem);
                cur_mem_prealloced -= sizeof(sequenced_data_queue_element_t);
                if (mem_prealloc_q_head.get() == NULL)
                    break;
            }
        }
    }
};
cur_mem_prealloced (a uint64_t) is a global variable.
What could be triggering this bug?
Further, the stacks of some other threads of the program seem to be corrupted. Also, the core dump shows unexpected code paths for those other threads.
The kernel log shows the following error messages:
[7547]: segfault at 10 ip 000000000040fbf6 sp 00007f122e1fcd90 error 4 in tbt[400000+a0000]
[7531]: segfault at 10 ip 000000000040fbf6 sp 00007fffc94bcca0 error 4 in tbt[400000+a0000]
Can these segfaults also be triggered by stack corruption?
Here is the gdb backtrace:
Program terminated with signal 11, Segmentation fault.
#0 0xb7e78830 in Gtk::Widget::get_width () from /usr/lib/libgtkmm-2.4.so.1
(gdb) bt
#0 0xb7e78830 in Gtk::Widget::get_width () from /usr/lib/libgtkmm-2.4.so.1
#1 0x08221d5d in sigc::bound_mem_functor0<bool, videoScreen>::operator() (this=0xb1c04714)
at /usr/include/sigc++-2.0/sigc++/functors/mem_fun.h:1787
#2 0x08221d76 in sigc::adaptor_functor<sigc::bound_mem_functor0<bool, videoScreen> >::operator() (this=0xb1c04710)
at /usr/include/sigc++-2.0/sigc++/adaptors/adaptor_trait.h:251
#3 0x08221d96 in sigc::internal::slot_call0<sigc::bound_mem_functor0<bool, videoScreen>, bool>::call_it (rep=0xb1c046f8)
at /usr/include/sigc++-2.0/sigc++/functors/slot.h:103
#4 0xb7b1ed35 in ?? () from /usr/lib/libglibmm-2.4.so.1
#5 0xb73c6bb6 in ?? () from /usr/lib/libglib-2.0.so.0
#6 0xb28ff1f8 in ?? ()
#7 0xb647479c in __pthread_mutex_unlock_usercnt () from /lib/libpthread.so.0
#8 0xb73c6446 in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
#9 0xb73c97e2 in ?? () from /usr/lib/libglib-2.0.so.0
#10 0xb3d11af8 in ?? ()
#11 0x00000000 in ?? ()
I figured out which line crashes; here is the code around it:
1:currPicLoaded = 1;
2:int status = -1;
3:zoomedPicWidth = drawVideo1->get_width();
I figured out that line 3 above is the cause of the crash, but it executes 5 times before the crash, so I do not know why it crashes the 6th time.
PS: The code above is inside a thread that runs continuously.
Any help is more than welcome :)
how should I proceed
Your very first step should be to find out which instruction caused the SIGSEGV. Do this:
(gdb) x/i $pc
The most likely cause is that your drawVideo1 object is either dangling (has been deleted), or is corrupt in some other way.
Since you are apparently on Linux (you didn't say, but you should always say), the first tool to reach for when debugging "strange" problems like this is Valgrind.
I am trying to port some Python ctypes code from a Windows-specific program to link with a Linux port of my library. The shortest Python code sample that reproduces my problem is shown below. When I execute it, I get a segmentation fault in examine_argument() inside Python's _ctypes module. I placed a printf statement in my library at the crashing function call, but it is never executed, which leads me to think the problem is in the ctypes code.
import ctypes
avidll = ctypes.CDLL("libavxsynth.so")
class AVS_Value(ctypes.Structure, object):
    def __init__(self, val=None):
        self.type=ctypes.c_short(105) # 'i'
        self.array_size = 5
        self.d.i = 99

class U(ctypes.Union):
    _fields_ = [("c", ctypes.c_void_p),
                ("b", ctypes.c_long),
                ("i", ctypes.c_int),
                ("f", ctypes.c_float),
                ("s", ctypes.c_char_p),
                ("a", ctypes.POINTER(AVS_Value))]

AVS_Value._fields_ = [("type", ctypes.c_short),
                      ("array_size", ctypes.c_short),
                      ("d", U)]
avs_create_script_environment = avidll.avs_create_script_environment
avs_create_script_environment.restype = ctypes.c_void_p
avs_create_script_environment.argtypes = [ctypes.c_int]
avs_set_var = avidll.avs_set_var
avs_set_var.restype = ctypes.c_int
avs_set_var.argtypes = [ctypes.c_void_p, ctypes.c_char_p, AVS_Value]
env = avs_create_script_environment(2)
val = AVS_Value()
res = avs_set_var(env, b'test', val)
My library has the following in its headers, and a plain-C program doing what I describe above (calling create_script_environment followed by set_var) runs fine. Looking at logging information my library is putting onto the console, the crash happens when I try to enter avs_set_var.
typedef struct AVS_ScriptEnvironment AVS_ScriptEnvironment;
typedef struct AVS_Value AVS_Value;
struct AVS_Value {
    short type; // 'a'rray, 'c'lip, 'b'ool, 'i'nt, 'f'loat, 's'tring, 'v'oid, or 'l'ong
                // for some function e'rror
    short array_size;
    union {
        void * clip; // do not use directly, use avs_take_clip
        char boolean;
        int integer;
        float floating_pt;
        const char * string;
        const AVS_Value * array;
    } d;
};
AVS_ScriptEnvironment * avs_create_script_environment(int version);
int avs_set_var(AVS_ScriptEnvironment *, const char* name, AVS_Value val);
I tried getting a backtrace of the call in GDB, but I don't know how to interpret the results, and I don't know much about using GDB in general.
#0 0x00007ffff61d6490 in examine_argument () from /usr/lib/python2.7/lib-dynload/_ctypes.so
#1 0x00007ffff61d65ba in ffi_prep_cif_machdep () from /usr/lib/python2.7/lib-dynload/_ctypes.so
#2 0x00007ffff61d3447 in ffi_prep_cif () from /usr/lib/python2.7/lib-dynload/_ctypes.so
#3 0x00007ffff61c7275 in _ctypes_callproc () from /usr/lib/python2.7/lib-dynload/_ctypes.so
#4 0x00007ffff61c7aa2 in PyCFuncPtr_call.2798 () from /usr/lib/python2.7/lib-dynload/_ctypes.so
#5 0x00000000004c7c76 in PyObject_Call ()
#6 0x000000000042aa4a in PyEval_EvalFrameEx ()
#7 0x00000000004317f2 in PyEval_EvalCodeEx ()
#8 0x000000000054b171 in PyRun_FileExFlags ()
#9 0x000000000054b7d8 in PyRun_SimpleFileExFlags ()
#10 0x000000000054c5d6 in Py_Main ()
#11 0x00007ffff68e576d in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x000000000041b931 in _start ()
I'm at a loss as to how to approach this problem. I've looked at the details of the calling types, but I don't see anything obviously incorrect there. Am I falling into any platform-specific usages of types?
Edit: It seems there's a problem with 32-bit vs 64-bit architectures in the ctypes module. When I tested this again with a 32-bit build of my library and 32-bit Python, it ran successfully. On 64-bit, it segfaults at the same place.
Try using c_void_p for the opaque AVS_ScriptEnvironment*:
avs_create_script_environment.restype = ctypes.c_void_p
and:
avs_set_var.argtypes = [ctypes.c_void_p, ctypes.c_char_p, AVS_Value]
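Since AVS_Value is passed by value, it may also be worth checking that the ctypes declaration matches the layout the C compiler actually uses. The sketch below is only a diagnostic aid, not a confirmed fix: it repeats the struct from the question's header and prints its size and field offset, which can be compared with ctypes.sizeof(AVS_Value) and AVS_Value.d.offset on the Python side. The function name print_avs_value_layout is made up for this example. One visible difference in the question is that the Python union declares "b" as c_long while the header declares char boolean; whether that matters for the crash is not established here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef struct AVS_Value AVS_Value;
struct AVS_Value {
    short type;
    short array_size;
    union {
        void *clip;
        char boolean;
        int integer;
        float floating_pt;
        const char *string;
        const AVS_Value *array;
    } d;
};

/* Hypothetical helper: prints the layout chosen by the C compiler so it can
 * be compared with what ctypes reports for the Python declaration. */
void print_avs_value_layout(void)
{
    /* The two shorts are expected to be adjacent, with padding before d. */
    assert(offsetof(AVS_Value, array_size) == sizeof(short));

    printf("sizeof(AVS_Value) = %zu\n", sizeof(AVS_Value));
    printf("offsetof(d)       = %zu\n", offsetof(AVS_Value, d));
    printf("sizeof(d)         = %zu\n", sizeof(((AVS_Value *)0)->d));
    printf("sizeof(d.boolean) = %zu\n", sizeof(((AVS_Value *)0)->d.boolean));
}
```

If the numbers printed here differ from the ones ctypes reports, the two declarations disagree and the Python side should be adjusted to match the header.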
I wrote a PHP extension but can't find the reason why it reports a segmentation fault.
Related code:
char *tkey = (char *)emalloc(sizeof(char)*(result_pair[num-1].length+1));
std::memcpy(tkey, (text+i),result_pair[num-1].length);
tkey[result_pair[num-1].length]='\0';
if (zend_hash_find(HASH_OF(return_value), tkey, result_pair[num-1].length+1, (void**)&origval) == SUCCESS) {
    ZVAL_LONG(*origval, Z_LVAL_P(*origval)+1);
    zend_hash_update(HASH_OF(return_value), tkey, result_pair[num-1].length+1, origval, sizeof(origval), NULL);
} else {
    add_assoc_long(return_value, tkey, 1);
}
efree(tkey);
num--;
The following is from gdb:
Program received signal SIGSEGV, Segmentation fault.
_zend_mm_alloc_int (heap=0x1ca07750, size=32) at /php-5.3.3/Zend/zend_alloc.c:1825
1825 heap->cache[index] = best_fit->prev_free_block;
Current language: auto; currently c
(gdb) bt
#0 _zend_mm_alloc_int (heap=0x1ca07750, size=32) at /php-5.3.3/Zend/zend_alloc.c:1825
#1 0x0000000000729160 in add_assoc_long_ex (arg=0x1cc29380, key=0x20 <Address 0x20 out of bounds>, key_len=2, n=2) at /php-5.3.3/Zend/zend_API.c:1117
#2 0x00002b2ebee6552e in zif_xs_search (ht=<value optimized out>, return_value=0x1cc29380, return_value_ptr=<value optimized out>,
this_ptr=<value optimized out>, return_value_used=<value optimized out>) at /release/xsplit_t/xsplit.cpp:1007
#3 0x000000000076b489 in zend_do_fcall_common_helper_SPEC (execute_data=0x2b2ebf070050) at /php-5.3.3/Zend/zend_vm_execute.h:316
#4 0x0000000000741cae in execute (op_array=0x1cc26378) at /php-5.3.3/Zend/zend_vm_execute.h:107
#5 0x000000000071e5c9 in zend_execute_scripts (type=8, retval=0x0, file_count=3) at /php-5.3.3/Zend/zend.c:1194
#6 0x00000000006cc8b8 in php_execute_script (primary_file=0x7fffb5324230) at /php-5.3.3/main/main.c:2260
#7 0x00000000007a897e in main (argc=2, argv=0x7fffb53244a8) at /php-5.3.3/sapi/cli/php_cli.c:1192
(gdb) frame 2
#2 0x00002b2ebee6552e in zif_xs_search (ht=<value optimized out>, return_value=0x1cc29380, return_value_ptr=<value optimized out>,
this_ptr=<value optimized out>, return_value_used=<value optimized out>) at /release/xsplit_t/xsplit.cpp:1007
1007 add_assoc_long(return_value, tkey, 1);
Current language: auto; currently c++
I figured out that the crash happens in the add_assoc_long call. 'tkey' is declared as:
char *tkey = (char *)emalloc(sizeof(char)*(result_pair[num-1].length+1));
The segfault only occurs under some circumstances, not always, and I thought tkey wouldn't cause any problems. Any help is appreciated, thanks a lot.
Your memcpy is not copying the NUL character; you need a +1 on the number of bytes to be copied:
std::memcpy(tkey, (text+i),result_pair[num-1].length+1);
                                                    ^^
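For comparison, here is a minimal sketch of copying a length-delimited substring into a NUL-terminated buffer. It uses plain malloc rather than emalloc so that it compiles without the PHP headers, and copy_substring is a made-up name for this example. Note that copying length+1 bytes only produces a terminated string if the source byte right after the substring is itself NUL; writing the terminator explicitly does not depend on that.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated, NUL-terminated copy of the len bytes starting
 * at src, or NULL if allocation fails. The caller frees the result. */
char *copy_substring(const char *src, size_t len)
{
    assert(src != NULL);

    char *out = malloc(len + 1);   /* one extra byte for the terminator */
    if (out == NULL)
        return NULL;
    memcpy(out, src, len);         /* copies exactly len bytes */
    out[len] = '\0';               /* terminator written explicitly */
    return out;
}
```

With this shape the key length passed to a hash API is strlen(result) + 1, which matches the length+1 the question's code passes to zend_hash_find.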