I'm trying to write some code that will allow me to insert into a C++ map from an extern "C" function. The code is as follows:
class CFITracing {
std::unordered_map<uintptr_t, uintptr_t> CallerCalleePairs;
std::map<std::string, int> BranchResults;
public:
void HandleCallerCallee(uintptr_t Caller, uintptr_t Callee);
void HandleBranchResult(int cond, char* branchName);
void printResults();
};
...
void CFITracing::HandleBranchResult(int cond, char* branchName) {
std::string branchStr(branchName);
printf("%s\n", "pre");
/* segfault at this line, regardless of what string I use as a key (even "hi") */
BranchResults[branchStr] = cond;
printf("%s\n", "success");
}
CFITracing CFIT;
__attribute__((used))
__attribute__((optnone))
extern "C" void __trace(int cond, char* branchName) {
CFIT.HandleBranchResult(cond, branchName);
}
Calls to the __trace function are inserted into the binary via an LLVM pass I've written, which passes an int and char* to my code above.
On calls to __trace, "pre" is printed repeatedly until a segmentation fault occurs. GDB shows that when the map-insert line is reached, the code somehow loops: that line executes repeatedly until the segfault occurs.
When debugging via valgrind, the following error occurs:
==20881== Stack overflow in thread #1: can't grow stack to 0x1ffe801000
==20881==
==20881== Process terminating with default action of signal 11 (SIGSEGV)
==20881== Access not within mapped region at address 0x1FFE801FF8
==20881== Stack overflow in thread #1: can't grow stack to 0x1ffe801000
==20881== at 0x58069A2: _IO_file_xsputn@@GLIBC_2.2.5 (fileops.c:1220)
The presence of a stack overflow, together with the repeated prints, makes me think infinite recursion has been triggered. I think it's likely I've created some type of undefined behavior, but I'm not sure exactly how to write this code to prevent it, given that I need to insert into the map via this extern "C" __trace() function.
Thus, my questions are as follows: is there anything quick I can fix to prevent this behavior from taking place? If not, how should I aim to redesign this, given that I need to insert into the map via the extern "C" function? Thanks for your help!
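One quick guard worth trying, assuming the pass is also instrumenting the branches inside the tracing runtime itself (in which case every branch in HandleBranchResult calls __trace again until the stack overflows): a thread-local re-entrancy flag. This is only a sketch; CFITracing here is a cut-down stand-in for the class above.

```cpp
#include <cstdio>

// Cut-down stand-in for the CFITracing class from the question.
struct CFITracing {
    int calls = 0;
    void HandleBranchResult(int cond, const char* branchName) {
        ++calls;
        std::printf("%s -> %d\n", branchName, cond);
    }
};

static CFITracing CFIT;

extern "C" void __trace(int cond, const char* branchName) {
    // If the pass instrumented the tracing runtime itself, every branch
    // inside HandleBranchResult calls __trace again, recursing until the
    // stack overflows. A thread-local flag breaks the cycle.
    static thread_local bool in_trace = false;
    if (in_trace) return;
    in_trace = true;
    CFIT.HandleBranchResult(cond, branchName);
    in_trace = false;
}
```

The cleaner long-term fix is to exclude the tracing runtime's own translation unit from instrumentation in the pass, so __trace is never inserted into its own call path.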
I'm designing a preloader-based lock tracing utility that attaches to Pthreads, and I've run into a weird issue. The program works by providing wrappers that replace relevant Pthreads functions at runtime; these do some logging, and then pass the args to the real Pthreads function to do the work. They do not modify the arguments passed to them, obviously. However, when testing, I discovered that the condition variable pointer passed to my pthread_cond_wait() wrapper does not match the one that gets passed to the underlying Pthreads function, which promptly crashes with "futex facility returned an unexpected error code," which, from what I've gathered, usually indicates an invalid sync object passed in. Relevant stack trace from GDB:
#8 __pthread_cond_wait (cond=0x7f1b14000d12, mutex=0x55a2b961eec0) at pthread_cond_wait.c:638
#9 0x00007f1b1a47b6ae in pthread_cond_wait (cond=0x55a2b961f290, lk=0x55a2b961eec0)
at pthread_trace.cpp:56
I'm pretty mystified. Here's the code for my pthread_cond_wait() wrapper:
int pthread_cond_wait(pthread_cond_t* cond, pthread_mutex_t* lk) {
// log arrival at wait
the_tracer.add_event(lktrace::event::COND_WAIT, (size_t) cond);
// run pthreads function
GET_REAL_FN(pthread_cond_wait, int, pthread_cond_t*, pthread_mutex_t*);
int e = REAL_FN(cond, lk);
if (e == 0) the_tracer.add_event(lktrace::event::COND_LEAVE, (size_t) cond);
else {
the_tracer.add_event(lktrace::event::COND_ERR, (size_t) cond);
}
return e;
}
// GET_REAL_FN is defined as:
#define GET_REAL_FN(name, rtn, params...) \
typedef rtn (*real_fn_t)(params); \
static const real_fn_t REAL_FN = (real_fn_t) dlsym(RTLD_NEXT, #name); \
assert(REAL_FN != NULL) // semicolon absence intentional
And here's the code for __pthread_cond_wait in glibc 2.31 (this is the function that gets called if you call pthread_cond_wait normally; it has a different name because of symbol versioning. The stack trace above confirms that this is the function REAL_FN points to):
int
__pthread_cond_wait (pthread_cond_t *cond, pthread_mutex_t *mutex)
{
/* clockid is unused when abstime is NULL. */
return __pthread_cond_wait_common (cond, mutex, 0, NULL);
}
As you can see, neither of these functions modifies cond, yet it is not the same in the two frames. Examining the two different pointers in a core dump shows that they point to different contents, as well. I can also see in the core dump that cond does not appear to change in my wrapper function (i.e. it's still equal to 0x5... in frame 9 at the crash point, which is the call to REAL_FN). I can't really tell which pointer is correct by looking at their contents, but I'd assume it's the one passed in to my wrapper from the target application. Both pointers point to valid segments for program data (marked ALLOC, LOAD, HAS_CONTENTS).
My tool is definitely causing the error somehow, the target application runs fine if it is not attached. What am I missing?
UPDATE: Actually, this doesn't appear to be what's causing the error, because calls to my pthread_cond_wait() wrapper succeed many times before the error occurs, and exhibit similar behavior (pointer value changing between frames without explanation) each time. I'm leaving the question open, though, because I still don't understand what's going on here and I'd like to learn.
UPDATE 2: As requested, here's the code for tracer.add_event():
// add an event to the calling thread's history
// hist_entry ctor gets timestamp & stack trace
void tracer::add_event(event e, size_t obj_addr) {
size_t tid = get_tid();
hist_map::iterator hist = histories.contains(tid);
assert(hist != histories.end());
hist_entry ev (e, obj_addr);
hist->second.push_back(ev);
}
// hist_entry ctor:
hist_entry::hist_entry(event e, size_t obj_addr) :
ts(chrono::steady_clock::now()), ev(e), addr(obj_addr) {
// these are set in the tracer ctor
assert(start_addr && end_addr);
void* buf[TRACE_DEPTH];
int v = backtrace(buf, TRACE_DEPTH);
int a = 0;
// find first frame outside of our own code
while (a < v && start_addr < (size_t) buf[a] &&
end_addr > (size_t) buf[a]) ++a;
// skip requested amount of frames
a += TRACE_SKIP;
if (a >= v) a = v-1;
caller = buf[a];
}
histories is a lock-free concurrent hashmap from libcds (mapping tid->per-thread vectors of hist_entry), and its iterators are guaranteed to be thread-safe as well. GNU docs say backtrace() is thread-safe, and there are no data races mentioned in the CPP docs for steady_clock::now(). get_tid() just calls pthread_self() using the same method as the wrapper functions, and casts its result to size_t.
Hah, figured it out! The issue is that Glibc exposes multiple versions of pthread_cond_wait(), for backwards compatibility. The version I reproduce in my question is the current version, the one we want to call. The version that dlsym() was finding is the backwards-compatible version:
int
__pthread_cond_wait_2_0 (pthread_cond_2_0_t *cond, pthread_mutex_t *mutex)
{
if (cond->cond == NULL)
{
pthread_cond_t *newcond;
newcond = (pthread_cond_t *) calloc (sizeof (pthread_cond_t), 1);
if (newcond == NULL)
return ENOMEM;
if (atomic_compare_and_exchange_bool_acq (&cond->cond, newcond, NULL))
/* Somebody else just initialized the condvar. */
free (newcond);
}
return __pthread_cond_wait (cond->cond, mutex);
}
As you can see, this version tail-calls the current one, which is probably why this took so long to detect: GDB is normally pretty good at detecting frames elided by tail calls, but I'm guessing it didn't detect this one because the functions have the "same" name (and the error doesn't affect the mutex functions because they don't expose multiple versions). This blog post goes into much more detail, coincidentally specifically about pthread_cond_wait(). I stepped through this function many times while debugging and sort of tuned it out, because every call into glibc is wrapped in multiple layers of indirection; I only realized what was going on when I set a breakpoint on the pthread_cond_wait symbol, instead of a line number, and it stopped at this function.
Anyway, this explains the changing pointer phenomenon: the old, incorrect function gets called, reinterprets the pthread_cond_t object as a struct containing a pointer to a pthread_cond_t object, allocates a new pthread_cond_t for that pointer, and then passes the newly allocated one to the new, correct function. The frame of the old function gets elided by the tail call, so in a GDB backtrace taken after leaving the old function it looks like the correct function was called directly from my wrapper, with a mysteriously changed argument.
The fix for this was simple: GNU provides the libdl extension dlvsym(), which is like dlsym() but also takes a version string. Looking for pthread_cond_wait with version string "GLIBC_2.3.2" solves the problem. Note that these versions do not usually correspond to the current glibc version (e.g. pthread_create()/exit() have version string "GLIBC_2.2.5"), so they need to be looked up on a per-function basis. The correct string can be determined either by looking at the compat_symbol() or versioned_symbol() macros near the function definition in the glibc source, or by using readelf to see the names of the symbols in the compiled library (mine has "pthread_cond_wait@@GLIBC_2.3.2" and "pthread_cond_wait@GLIBC_2.2.5").
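The dlvsym() fix can be sketched as follows. This is glibc-specific, and find_cond_wait is a made-up helper name, not part of the original code:

```cpp
#include <dlfcn.h>    // dlvsym() is a GNU extension (g++ defines _GNU_SOURCE on Linux)
#include <pthread.h>

using cond_wait_fn = int (*)(pthread_cond_t*, pthread_mutex_t*);

// Resolve pthread_cond_wait pinned to an explicit symbol version.
// Plain dlsym() may hand back the oldest compatibility version of a
// symbol; dlvsym() looks up exactly the version string you name.
inline cond_wait_fn find_cond_wait(const char* version = "GLIBC_2.3.2") {
    return reinterpret_cast<cond_wait_fn>(
        dlvsym(RTLD_DEFAULT, "pthread_cond_wait", version));
}
```

In the actual preload library the handle would be RTLD_NEXT rather than RTLD_DEFAULT, so the lookup skips the wrapper's own definition of the symbol.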
I've got 2 functions, func1() and func2(). func2 takes a character array as input. Both functions run on different threads. I call func2 from func1. When I passed a stack allocated array to func2, I got garbage values when I printed the array from inside func2(). However, when I passed a heap allocated array to func2, I got the correct string inside func2() i.e.
void func2(char * char_array)
{
/* Some code */
cout<<char_array;
}
/* This does not work(Garbage values were printed in func2()) */
void func1()
{
char array_of_char[SIZE];
memset(array_of_char,0,SIZE);
strncpy(array_of_char,"SOME_STRING",SIZE);
func2(array_of_char); //Asynchronous call to func2(). func1() proceeds immediately.
/*
some code
*/
}
/* This works(Correct values were printed in func2) */
void func1()
{
char * array_of_char=new char[SIZE];
memset(array_of_char,0,SIZE);
strncpy(array_of_char,"SOME_STRING",SIZE);
func2(array_of_char); //Asynchronous call to func2(). func1() proceeds immediately.
/*
some code
*/
}
Does this mean that in multi-threaded programs, whenever some pointer has to be passed between different threads, the pointer should always be pointing at a heap-allocated memory?
Please note that func2() is actually a callback function which
executes on the occurrence of an event. I've hidden those details in
the question. Func1() does not stop/wait for the execution of func2().
Edit: I feel like I need to provide some more details of my implementation. In my program, I'm using Datastax C++ client library for Cassandra. Please find the link in the end, containing some details of the functions used in the program:
int main()
{
func1();
/* some code */
return 0;
}
/* This does not work(Garbage is printed in func2) */
void func1()
{
/* some code */
char array_of_char[SIZE];
memset(array_of_char,0,SIZE);
strncpy(array_of_char,"SOME_STRING",SIZE);
CassFuture* l_query_future = NULL;
/* some code where query is created */
l_query_future = cass_session_execute(rtGetSession(), l_stmt); //l_stmt is the query statement, rtgetSession() returns CassSession *
cass_future_set_callback ( l_query_future, func2, (void *)array_of_char); //details in the link given in the end
/* some code */
}
/* This works(Correct values were printed in func2) */
void func1()
{
/* some code */
char * array_of_char=new char[SIZE];
memset(array_of_char,0,SIZE);
strncpy(array_of_char,"SOME_STRING",SIZE);
CassFuture* l_query_future = NULL;
/* some code where query is created */
l_query_future = cass_session_execute(rtGetSession(), l_stmt); //l_stmt is the query statement, rtgetSession() returns CassSession *
cass_future_set_callback ( l_query_future, func2, (void *)array_of_char);
/*
some code
*/
}
void func2(CassFuture* l_query_future, void * data)
{
/* some code */
cout<<(char *)data;
}
References for Datastax driver APIs:
cass_future_set_callback
CassFuture
CassSession
cass_session_execute
How do you run func1() and func2() under different threads? func1() directly calls func2(), so they run under the same thread. Even the first implementation of func1() should work, as the array is still in its place.
EDIT:
But calling func2() directly from within func1() isn't an "asynchronous call to func2()" (even if at some other point it's used as a thread function). "Asynchronous call" means creating a new thread with func2() as the thread function. If so, this behaviour is very much expected, because func1() may have exited when func2() runs, and the array wouldn't exist by that time. On the other hand, the heap-block would still be allocated, so this would work. func2() should release the block then.
EDIT2:
Ummm, yes, the 2nd version is indeed an "asynchronous call to func2()", so the objects' lifetime considerations listed above indeed apply.
Does this mean that in multi-threaded programs, whenever some pointer has to be passed between different threads, the pointer should always be pointing at a heap-allocated memory?
No. This means that you must keep track of the lifetimes of your objects properly. When thread 1 finishes execution, its stack is automatically cleaned up, spoiling the data that thread 2 is working with. Heap memory, on the other hand, stays around until explicitly freed. You have to check in thread 1 whether thread 2 is still executing and wait until it is finished, e.g. by using the join function.
No, the pointer does not have to point to heap-allocated memory, but you have to ensure the memory (in this case - an array) will be available until you join the thread.
Here, in the version that doesn't work, the array is allocated on the stack, and it is destroyed when func1 finishes. Hence the rubbish values - something else has likely been written at that address already.
To work around this, you could wait until the thread finishes in func1. The local variable would in this case be OK.
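That workaround can be sketched like this. The names mirror the question's func1/func2, with a std::thread standing in for the real async mechanism, and a global string added so the result can be inspected; all of that is illustrative, not the original code:

```cpp
#include <cstring>
#include <iostream>
#include <string>
#include <thread>

constexpr std::size_t SIZE = 32;
std::string captured;  // lets the result be inspected after the thread joins

void func2(const char* char_array) {
    captured = char_array;
    std::cout << captured << '\n';
}

void func1() {
    char array_of_char[SIZE];
    std::memset(array_of_char, 0, SIZE);
    std::strncpy(array_of_char, "SOME_STRING", SIZE - 1);
    // The thread gets a pointer into func1's stack frame...
    std::thread t(func2, static_cast<const char*>(array_of_char));
    // ...so func1 must not return until the thread is done with it.
    t.join();
}
```

If func1 cannot afford to block, the alternative is exactly what the question stumbled on: give the callback heap-owned data and make the callback responsible for freeing it.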
This code runs fine with the array allocated on the stack. As has been mentioned there is absolutely no threading happening here. To answer your question though, when passing pointers (or other data) between different threads (when you are actually using them) you would likely need some kind of synchronization such as a mutex or atomics, and of course ensure the lifetimes of any data.
Here is your code working,
#include <iostream>
#include <cstring>
using namespace std;
#define SIZE 20
void func2(char * char_array)
{
/* Some code */
cout<<char_array;
}
void func1()
{
char array_of_char[SIZE];
strncpy(array_of_char,"SOME_STRING",SIZE);
func2(array_of_char);
}
int main() {
func1();
return 0;
}
I should note that strncpy'ing SIZE bytes doesn't copy garbage - it zero-pads the remainder of the array - but copying strlen("SOME_STRING") + 1 bytes would avoid the unnecessary writes.
I am on Linux (CentOS 7.4, compiler Clang) and getting a segmentation fault (not easily reproducible) within a C++ struct object. The struct is a class member of a polymorphic object I do not allocate; it is instantiated within a framework I do not have the source code to. This means I cannot easily compile with sanitizers, and Valgrind increases the initialisation time from seconds to 5 minutes:
// C is allocated within a third party framework, I assume they use new()
//
class C : public ThirdPartyParentClass
{
S s;
};
struct S
{
.
std::mutex _mutex;
.
};
The segmentation fault involves a corrupted _mutex.
I therefore added a char buffer so I could see the corruption:
struct S
{
.
char _buffer[1000];
std::mutex _mutex;
.
};
and I can see the corrupted bytes when the segmentation fault occurs. However, I cannot determine when the corruption takes place.
To determine when the corruption takes place I would like to protect the char buffer bytes. I tried:
struct S
{
S()
{
mprotect(&_buffer[0], 4096, PROT_NONE);
const int test = _buffer[0]; // Trigger seg fault to test it works
}
.
char _buffer[4096]__attribute__((aligned(4096)));
std::mutex _mutex;
.
};
but my test to confirm that the memory protection is working doesn't cause a seg fault.
Could somebody please help?
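One likely reason the test read doesn't fault: mprotect() returns -1 (typically EINVAL) when the address isn't page-aligned, and the return value above is never checked, so the memory may simply never have been protected (the aligned(4096) attribute on the member is also not guaranteed to be honoured by an arbitrary heap allocator before C++17's aligned new). A checked sketch on an explicitly page-aligned heap block, with made-up names:

```cpp
#include <cstdlib>
#include <sys/mman.h>
#include <unistd.h>

// Protect one heap page and report whether every step succeeded.
// mprotect() demands a page-aligned address and fails with EINVAL
// otherwise - which is the usual reason an unchecked call "does nothing".
bool protect_roundtrip() {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    void* buf = nullptr;
    if (posix_memalign(&buf, page, page) != 0)   // guaranteed page-aligned
        return false;
    // After this succeeds, any read of buf would SIGSEGV.
    if (mprotect(buf, page, PROT_NONE) != 0)
        return false;
    // Restore access before freeing the block (touching it now would fault).
    bool ok = mprotect(buf, page, PROT_READ | PROT_WRITE) == 0;
    free(buf);
    return ok;
}
```

Strictly speaking, POSIX only guarantees mprotect() on mmap()-obtained memory, though it works on page-aligned heap blocks on Linux.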
Doing this at the source level is a bit silly. If you want to find the exact moment when something gets written to a particular memory address, use a data breakpoint (gcc calls them "watchpoints"). Just do watch *(int*)0xWHATEVER on the area of memory you expect to be corrupted, and it'll break on the first modification, with very low overhead.
So I have a library (not written by me) which unfortunately uses abort() to deal with certain errors. At the application level, these errors are recoverable so I would like to handle them instead of the user seeing a crash. So I end up writing code like this:
static jmp_buf abort_buffer;
static void abort_handler(int) {
longjmp(abort_buffer, 1); // perhaps siglongjmp if available..
}
int function(int x, int y) {
struct sigaction new_sa;
struct sigaction old_sa;
sigemptyset(&new_sa.sa_mask);
new_sa.sa_handler = abort_handler;
sigaction(SIGABRT, &new_sa, &old_sa);
if(setjmp(abort_buffer)) {
sigaction(SIGABRT, &old_sa, 0);
return -1;
}
// attempt to do some work here
int result = f(x, y); // may call abort!
sigaction(SIGABRT, &old_sa, 0);
return result;
}
Not very elegant code. Since this pattern ends up having to be repeated in a few spots of the code, I would like to simplify it a little and possibly wrap it in a reusable object. My first attempt involves using RAII to handle the setup/teardown of the signal handler (needs to be done because each function needs different error handling). So I came up with this:
template <int N>
struct signal_guard {
signal_guard(void (*f)(int)) {
sigemptyset(&new_sa.sa_mask);
new_sa.sa_handler = f;
sigaction(N, &new_sa, &old_sa);
}
~signal_guard() {
sigaction(N, &old_sa, 0);
}
private:
struct sigaction new_sa;
struct sigaction old_sa;
};
static jmp_buf abort_buffer;
static void abort_handler(int) {
longjmp(abort_buffer, 1);
}
int function(int x, int y) {
signal_guard<SIGABRT> sig_guard(abort_handler);
if(setjmp(abort_buffer)) {
return -1;
}
return f(x, y);
}
Certainly the body of function is much simpler and more clear this way, but this morning a thought occurred to me. Is this guaranteed to work? Here's my thoughts:
No variables are volatile or change between calls to setjmp/longjmp.
I am longjmping to a location in the same stack frame as the setjmp and returning normally, so I am allowing the code to execute the cleanup code that the compiler emitted at the exit points of the function.
It appears to work as expected.
But I still get the feeling that this is likely undefined behavior. What do you guys think?
I assume that f is in a third party library/app, because otherwise you could just fix it to not call abort. Given that, and that RAII may or may not reliably produce the right results on all platforms/compilers, you have a few options.
Create a tiny shared object that defines abort and LD_PRELOAD it. Then you control what happens on abort, and NOT in a signal handler.
Run f within a subprocess. Then you just check the return code and if it failed try again with updated inputs.
Instead of using the RAII, just call your original function from multiple call points and let it manually do the setup/teardown explicitly. It still eliminates the copy-paste in that case.
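The subprocess option can be sketched as follows. Here f and call_guarded are hypothetical stand-ins, not part of the original code, and the result is squeezed into the child's 8-bit exit status:

```cpp
#include <csignal>
#include <cstdlib>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for the third-party function that may call abort().
int f(int x, int y) {
    if (y == 0) abort();
    return x / y;
}

// Run f in a child process; report -1 if it aborted (or otherwise died).
int call_guarded(int x, int y) {
    pid_t pid = fork();
    if (pid < 0) return -1;                 // fork failed
    if (pid == 0)
        _exit(f(x, y) & 0xff);              // child: result as exit status
    int status = 0;
    waitpid(pid, &status, 0);
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT)
        return -1;                          // the library called abort()
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The obvious costs are a fork() per call and the narrow exit-status channel for results; for anything bigger than a byte you would need a pipe or shared memory. The upside is that an abort() (or any other crash) in the library can no longer take down the whole application.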
I actually like your solution, and have coded something similar in test harnesses to check that a target function assert()s as expected.
I can't see any reason for this code to invoke undefined behaviour. The C Standard seems to bless it: handlers resulting from an abort() are exempted from the restriction on calling library functions from a handler. (Caveat: this is 7.14.1.1(5) of C99 - sadly, I don't have a copy of C90, the version referenced by the C++ Standard).
C++03 adds a further restriction: If any automatic objects would be destroyed by a thrown exception transferring control to another (destination) point in the program, then a call to longjmp(jbuf, val) at the throw point that transfers control to the same (destination) point has undefined behavior. I'm supposing that your statement that 'No variables are volatile or change between calls to setjmp/longjmp' includes instantiating any automatic C++ objects. (I guess this is some legacy C library?).
Nor is POSIX async signal safety (or lack thereof) an issue - abort() generates its SIGABRT synchronously with program execution.
The biggest concern would be corrupting the global state of the 3rd party code: it's unlikely that the author will take pains to get the state consistent before an abort(). But, if you're correct that no variables change, then this isn't a problem.
If someone with a better understanding of the standardese can prove me wrong, I'd appreciate the enlightenment.
Is there any way, short of putting an attribute on each function prototype, to let gcc know that C functions can never propagate exceptions, i.e. that all functions declared inside extern "C" should be __attribute__((nothrow))? Ideal would be a -f style command line option.
You can always use -fno-exceptions, which will ensure that the C++ compiler does not generate exception propagation code.
Side note:
Are you sure telling the compiler "all these funcs never throw" is exactly what you want ?
It's not necessarily so that extern "C" ... functions cannot propagate/trigger exceptions. Take a case in point:
class Foo {
public:
class Away {};
static void throwaway(void) { throw Away(); }
};
extern "C" {
void wrap_a_call(void (*wrapped)(void)) { wrapped(); }
}
int main(int argc, char **argv)
{
wrap_a_call(Foo::throwaway);
return 0;
}
Compiling and running this creates a C-linkage function wrap_a_call() which, when called like above, will happily cause an exception:
$ ./test
terminate called after throwing an instance of 'Foo::Away'
Abort(coredump)
I.e. there can be "exception leakage" with extern "C" (through invoking function pointers); just because you're using/invoking extern "C" functions in a particular place in C++ doesn't guarantee that no exceptions can be thrown when invoking those.
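For contrast, the per-declaration annotation the question hopes to avoid looks like this (a sketch; add_ints is a made-up example function, not a real library symbol):

```cpp
// One per-declaration workaround (not the command-line flag the question
// asks for): declare the extern "C" function noexcept, which both documents
// the no-throw guarantee and lets the compiler drop landing pads for calls.
extern "C" int add_ints(int a, int b) noexcept;

extern "C" int add_ints(int a, int b) noexcept { return a + b; }

// The noexcept operator sees the guarantee, so callers can verify it at
// compile time (since C++17, noexcept is even part of the function type):
static_assert(noexcept(add_ints(1, 2)), "declared non-throwing");
```

As the wrap_a_call example above shows, this annotation is a promise, not an enforcement: a noexcept-declared C function that ends up propagating a C++ exception terminates the program.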
When an exception is thrown, the runtime unwinds the stack up to the nearest enclosing try/catch. This means you have no overhead as long as no exception is thrown; the only memory/time overhead is in the try/catch blocks themselves and in the stack unwinding at the throw point.
If your C functions do not generate exceptions, your overhead will only be the space for the try/catch blocks in your C++ code, and that space is the same for any number of exceptions (plus a small time overhead for initialising it with a constant).
GCC 4.5 seems to optimize those out for me automatically. Indeed, this line appears in the list of changes at http://gcc.gnu.org/gcc-4.5/changes.html :
GCC now optimize exception handling code. In particular cleanup regions that are proved to not have any effect are optimized out.