How can I schedule some code to run after all '_atexit()' functions are completed - c++

I'm writing a memory tracking system, and the only problem I've run into is that when the application exits, any static/global objects that didn't allocate in their constructor but deallocate in their destructor do so after my memory tracker has already reported the allocated data as a leak.
As far as I can tell, the only way to solve this properly is either to force the memory tracker's _atexit callback to the head of the stack (so that it is called last) or to have it execute after the entire _atexit stack has been unwound. Is it actually possible to implement either of these solutions, or is there another solution that I have overlooked?
Edit:
I'm working on/developing for Windows XP and compiling with VS2005.

I've finally figured out how to do this under Windows/Visual Studio. Looking through the crt startup function again (specifically where it calls the initializers for globals), I noticed that it was simply running "function pointers" that were contained between certain segments. So with just a little bit of knowledge on how the linker works, I came up with this:
#include <stdio.h>  // puts
#include <stdlib.h> // atexit
#include <iostream>
using std::cout;
using std::endl;
// Typedef for the function pointer
typedef void (*_PVFV)(void);
// Our various functions/classes that are going to log the application startup/exit
struct TestClass
{
int m_instanceID;
TestClass(int instanceID) : m_instanceID(instanceID) { cout << " Creating TestClass: " << m_instanceID << endl; }
~TestClass() {cout << " Destroying TestClass: " << m_instanceID << endl; }
};
static int InitInt(const char *ptr) { cout << " Initializing Variable: " << ptr << endl; return 42; }
static void LastOnExitFunc() { puts("Called " __FUNCTION__ "();"); }
static void CInit() { puts("Called " __FUNCTION__ "();"); atexit(&LastOnExitFunc); }
static void CppInit() { puts("Called " __FUNCTION__ "();"); }
// Our variables to be initialized
extern "C" { static int testCVar1 = InitInt("testCVar1"); }
static TestClass testClassInstance1(1);
static int testCppVar1 = InitInt("testCppVar1");
// Define our segment names
#define SEGMENT_C_INIT ".CRT$XIM"
#define SEGMENT_CPP_INIT ".CRT$XCM"
// Build our various function tables and insert them into the correct segments.
#pragma data_seg(SEGMENT_C_INIT)
#pragma data_seg(SEGMENT_CPP_INIT)
#pragma data_seg() // Switch back to the default segment
// Create our function pointer arrays and place them in the segments created above
#define SEG_ALLOCATE(SEGMENT) __declspec(allocate(SEGMENT))
SEG_ALLOCATE(SEGMENT_C_INIT) _PVFV c_init_funcs[] = { &CInit };
SEG_ALLOCATE(SEGMENT_CPP_INIT) _PVFV cpp_init_funcs[] = { &CppInit };
// Some more variables just to show that declaration order isn't affecting anything
extern "C" { static int testCVar2 = InitInt("testCVar2"); }
static TestClass testClassInstance2(2);
static int testCppVar2 = InitInt("testCppVar2");
// Main function which prints itself just so we can see where the app actually enters
int main()
{
cout << " Entered Main()!" << endl;
}
which outputs:
Called CInit();
Called CppInit();
Initializing Variable: testCVar1
Creating TestClass: 1
Initializing Variable: testCppVar1
Initializing Variable: testCVar2
Creating TestClass: 2
Initializing Variable: testCppVar2
Entered Main()!
Destroying TestClass: 2
Destroying TestClass: 1
Called LastOnExitFunc();
This works due to the way MS have written their runtime library. Basically, they've set up the following variables in the data segments:
(although this info is copyright I believe this is fair use as it doesn't devalue the original and IS only here for reference)
extern _CRTALLOC(".CRT$XIA") _PIFV __xi_a[];
extern _CRTALLOC(".CRT$XIZ") _PIFV __xi_z[]; /* C initializers */
extern _CRTALLOC(".CRT$XCA") _PVFV __xc_a[];
extern _CRTALLOC(".CRT$XCZ") _PVFV __xc_z[]; /* C++ initializers */
extern _CRTALLOC(".CRT$XPA") _PVFV __xp_a[];
extern _CRTALLOC(".CRT$XPZ") _PVFV __xp_z[]; /* C pre-terminators */
extern _CRTALLOC(".CRT$XTA") _PVFV __xt_a[];
extern _CRTALLOC(".CRT$XTZ") _PVFV __xt_z[]; /* C terminators */
On initialization, the program simply iterates from '__xN_a' to '__xN_z' (where N is {i,c,p,t}) and calls any non null pointers it finds. If we just insert our own segment in between the segments '.CRT$XnA' and '.CRT$XnZ' (where, once again n is {I,C,P,T}), it will be called along with everything else that normally gets called.
The linker simply joins up the segments in alphabetical order. This makes it extremely simple to select when our functions should be called. If you have a look in defsects.inc (found under $(VS_DIR)\VC\crt\src\) you can see that MS have placed all the "user" initialization functions (that is, the ones that initialize globals in your code) in segments ending with 'U'. This means that we just need to place our initializers in a segment earlier than 'U' and they will be called before any other initializers.
You must be really careful not to use any functionality that isn't initialized until after your selected placement of the function pointers (frankly, I'd recommend you just use .CRT$XCT; that way it's only your code that hasn't been initialized. I'm not sure what will happen if you've linked with standard 'C' code; you may have to place it in the .CRT$XIT block in that case).
One thing I did discover was that the "pre-terminators" and "terminators" aren't actually stored in the executable if you link against the DLL versions of the runtime library. Because of this, you can't really use them as a general solution. Instead, the way I made my specific function run as the last "user" function was to call atexit() within the 'C initializers'. That way, no other function could have been added to the stack before it (the stack is called in the reverse of the order in which functions are added, and it is how global/static destructors are all called).
Just one final (obvious) note: this is written with Microsoft's runtime library in mind. It may work similarly on other platforms/compilers (hopefully you'll be able to get away with just changing the segment names to whatever they use, IF they use the same scheme), but don't count on it.

atexit is processed by the C/C++ runtime (CRT). It runs after main() has already returned. Probably the best way to do this is to replace the standard CRT with your own.
On Windows tlibc is probably a great place to start: http://www.codeproject.com/KB/library/tlibc.aspx
Look at the code sample for mainCRTStartup and just run your code after the call to _doexit();
but before ExitProcess.
Alternatively, you could just get notified when ExitProcess gets called. When ExitProcess gets called the following occurs (according to http://msdn.microsoft.com/en-us/library/ms682658%28VS.85%29.aspx):
1. All of the threads in the process, except the calling thread, terminate their execution without receiving a DLL_THREAD_DETACH notification.
2. The states of all of the threads terminated in step 1 become signaled.
3. The entry-point functions of all loaded dynamic-link libraries (DLLs) are called with DLL_PROCESS_DETACH.
4. After all attached DLLs have executed any process termination code, the ExitProcess function terminates the current process, including the calling thread.
5. The state of the calling thread becomes signaled.
6. All of the object handles opened by the process are closed.
7. The termination status of the process changes from STILL_ACTIVE to the exit value of the process.
8. The state of the process object becomes signaled, satisfying any threads that had been waiting for the process to terminate.
So, one method would be to create a DLL and have that DLL attach to the process. It will get notified when the process exits, which should be after atexit has been processed.
Obviously, this is all rather hackish, proceed carefully.

This is dependent on the development platform. For example, Borland C++ has a #pragma which could be used for exactly this. (From Borland C++ 5.0, c. 1995)
#pragma startup function-name [priority]
#pragma exit function-name [priority]
These two pragmas allow the program to specify function(s) that should be called either upon program startup (before the main function is called), or program exit (just before the program terminates through _exit).
The specified function-name must be a previously declared function as:
void function-name(void);
The optional priority should be in the range 64 to 255, with highest priority at 0; default is 100. Functions with higher priorities are called first at startup and last at exit. Priorities from 0 to 63 are used by the C libraries, and should not be used by the user.
Perhaps your C compiler has a similar facility?
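For comparison, GCC and Clang offer a roughly similar facility through function attributes. The sketch below is only an illustration of that extension, not of Borland's pragmas; the hook names and the priority value 200 are made up for the example, and the attributes are not available in Visual C++.

```cpp
#include <cassert>
#include <cstdio>

static int g_startup_ran;  // zero-initialized before any startup hook runs

// GCC/Clang extension: this function is run before main() is entered.
// Priorities 0-100 are reserved for the implementation, so 200 is used here.
__attribute__((constructor(200))) static void startup_hook() {
    g_startup_ran = 1;
}

// GCC/Clang extension: this function is run during normal process exit.
__attribute__((destructor(200))) static void exit_hook() {
    std::puts("exit_hook called");
}
```

By the time main() starts, g_startup_ran has already been set by startup_hook(), and exit_hook() prints its message once the program terminates normally.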

I've read multiple times you can't guarantee the construction order of global variables (cite). I'd think it is pretty safe to infer from this that destructor execution order is also not guaranteed.
Therefore, if your memory tracking object is global, you will almost certainly be unable to get any guarantee that it will be destructed last (or constructed first). If it's not destructed last, and other allocations are outstanding, then yes, it will report the leaks you mention.
Also, what platform is this _atexit function defined for?

Having the memory tracker's cleanup executed last is the best solution. The easiest way I've found to do that is to explicitly control all the relevant global variables' initialization order. (Some libraries hide their global state in fancy classes or otherwise, thinking they're following a pattern, but all they do is prevent this kind of flexibility.)
Example main.cpp:
#include "global_init.inc"
int main() {
// do very little work; all initialization, main-specific stuff
// then call your application's mainloop
}
Where the global-initialization file includes object definitions and #includes similar non-header files. Order the objects in this file in the order you want them constructed, and they'll be destructed in the reverse order. 18.3/8 in C++03 guarantees that destruction order mirrors construction: "Non-local objects with static storage duration are destroyed in the reverse order of the completion of their constructor." (That section is talking about exit(), but a return from main is the same, see 3.6.1/5.)
As a bonus, you're guaranteed that all globals (in that file) are initialized before entering main. (Something not guaranteed in the standard, but allowed if implementations choose.)
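A small sketch of that ordering rule, assuming everything lives in one translation unit. The Logged type, the object names and the g_order log are hypothetical and exist only to make the order visible.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical log of construction order. Plain arrays and ints with static
// storage duration are zero-initialized before any constructor runs, so the
// constructors below can safely write to them.
static const char* g_order[8];
static int g_count;

struct Logged {
    const char* name;
    explicit Logged(const char* n) : name(n) {
        assert(g_count < 8);
        g_order[g_count++] = name;  // record when this object was constructed
    }
    ~Logged() { std::printf("destroying %s\n", name); }
};

// Within one translation unit, definition order is construction order.
// Destruction runs in reverse, so the tracker (defined first) is destroyed last.
static Logged tracker("tracker");
static Logged serviceA("serviceA");
static Logged serviceB("serviceB");
```

At exit the destructors print serviceB, then serviceA, then tracker, which is the behaviour a memory tracker wants: it outlives everything defined after it.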

I've had this exact problem, also writing a memory tracker.
A few things:
Along with destruction, you also need to handle construction. Be prepared for malloc/new to be called BEFORE your memory tracker is constructed (assuming it is written as a class). So you need your class to know whether it has been constructed or destructed yet!
class MemTracker
{
enum State
{
unconstructed = 0, // must be 0 !!!
constructed,
destructed
};
State state;
public:
MemTracker()
{
if (state == unconstructed)
{
// construct...
state = constructed;
}
}
};
static MemTracker memTracker; // all statics are zero-initted by linker
On every allocation that calls into your tracker, construct it!
void * MemTracker::malloc(...)
{
// force call to constructor, which does nothing after first time
new (this) MemTracker();
...
}
Strange, but true. Anyhow, onto destruction:
~MemTracker()
{
OutputLeaks(file);
state = destructed;
}
So, on destruction, output your results. Yet we know that there will be more calls. What to do? Well,...
void MemTracker::free(void * ptr)
{
do_tracking(ptr);
if (state == destructed)
{
// we must be getting called late
// so re-output
// Note that this might happen a lot...
OutputLeaks(file); // again!
}
}
And lastly:
be careful with threading
be careful not to call malloc/free/new/delete inside your tracker, or be able to detect the recursion, etc :-)
EDIT:
and I forgot, if you put your tracker in a DLL, you will probably need to LoadLibrary() (or dlopen, etc) yourself to up your reference count, so that you don't get removed from memory prematurely. Because although your class can still be called after destruction, it can't if the code has been unloaded.
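To pull the pieces of this answer together, here is a compact sketch of the state-machine idea. It is an illustration under assumptions, not the code I actually shipped: all names are hypothetical, it swaps the placement-new trick for an explicit ensure_constructed() check, and it only counts allocations instead of recording them.

```cpp
#include <cassert>
#include <cstddef>

// The struct has no constructor, so the static instance below is
// zero-initialized before any other code runs: state == unconstructed.
struct TrackerSketch {
    enum State { unconstructed = 0, constructed, destructed };
    State state;
    std::size_t live;  // allocations not yet freed
    int reports;       // how many times the leak report was produced

    // Lazily "construct" on first use; does nothing afterwards.
    void ensure_constructed() {
        if (state == unconstructed) state = constructed;
    }
    void on_alloc() { ensure_constructed(); ++live; }
    void on_free() {
        ensure_constructed();
        assert(live > 0);
        --live;
        if (state == destructed) report();  // late free: refresh the report
    }
    // A real tracker would write its list of outstanding blocks here.
    void report() { ++reports; }
    // Stands in for the destructor: report once, then remember we are done.
    void shutdown() { report(); state = destructed; }
};

static TrackerSketch g_tracker;  // zero-initialized static storage
```

The point of the design is that a free arriving after shutdown() still updates the count and triggers a fresh report, so the final output reflects the late deallocations.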

Related

Pointer passed to function changes unexpectedly

I'm designing a preloader-based lock tracing utility that attaches to Pthreads, and I've run into a weird issue. The program works by providing wrappers that replace relevant Pthreads functions at runtime; these do some logging, and then pass the args to the real Pthreads function to do the work. They do not modify the arguments passed to them, obviously. However, when testing, I discovered that the condition variable pointer passed to my pthread_cond_wait() wrapper does not match the one that gets passed to the underlying Pthreads function, which promptly crashes with "futex facility returned an unexpected error code," which, from what I've gathered, usually indicates an invalid sync object passed in. Relevant stack trace from GDB:
#8 __pthread_cond_wait (cond=0x7f1b14000d12, mutex=0x55a2b961eec0) at pthread_cond_wait.c:638
#9 0x00007f1b1a47b6ae in pthread_cond_wait (cond=0x55a2b961f290, lk=0x55a2b961eec0)
at pthread_trace.cpp:56
I'm pretty mystified. Here's the code for my pthread_cond_wait() wrapper:
int pthread_cond_wait(pthread_cond_t* cond, pthread_mutex_t* lk) {
// log arrival at wait
the_tracer.add_event(lktrace::event::COND_WAIT, (size_t) cond);
// run pthreads function
GET_REAL_FN(pthread_cond_wait, int, pthread_cond_t*, pthread_mutex_t*);
int e = REAL_FN(cond, lk);
if (e == 0) the_tracer.add_event(lktrace::event::COND_LEAVE, (size_t) cond);
else {
the_tracer.add_event(lktrace::event::COND_ERR, (size_t) cond);
}
return e;
}
// GET_REAL_FN is defined as:
#define GET_REAL_FN(name, rtn, params...) \
typedef rtn (*real_fn_t)(params); \
static const real_fn_t REAL_FN = (real_fn_t) dlsym(RTLD_NEXT, #name); \
assert(REAL_FN != NULL) // semicolon absence intentional
And here's the code for __pthread_cond_wait in glibc 2.31 (this is the function that gets called if you call pthread_cond_wait normally, it has a different name because of versioning stuff. The stack trace above confirms that this is the function that REAL_FN points to):
int
__pthread_cond_wait (pthread_cond_t *cond, pthread_mutex_t *mutex)
{
/* clockid is unused when abstime is NULL. */
return __pthread_cond_wait_common (cond, mutex, 0, NULL);
}
As you can see, neither of these functions modifies cond, yet it is not the same in the two frames. Examining the two different pointers in a core dump shows that they point to different contents, as well. I can also see in the core dump that cond does not appear to change in my wrapper function (i.e. it's still equal to 0x5... in frame 9 at the crash point, which is the call to REAL_FN). I can't really tell which pointer is correct by looking at their contents, but I'd assume it's the one passed in to my wrapper from the target application. Both pointers point to valid segments for program data (marked ALLOC, LOAD, HAS_CONTENTS).
My tool is definitely causing the error somehow: the target application runs fine if it is not attached. What am I missing?
UPDATE: Actually, this doesn't appear to be what's causing the error, because calls to my pthread_cond_wait() wrapper succeed many times before the error occurs, and exhibit similar behavior (pointer value changing between frames without explanation) each time. I'm leaving the question open, though, because I still don't understand what's going on here and I'd like to learn.
UPDATE 2: As requested, here's the code for tracer.add_event():
// add an event to the calling thread's history
// hist_entry ctor gets timestamp & stack trace
void tracer::add_event(event e, size_t obj_addr) {
size_t tid = get_tid();
hist_map::iterator hist = histories.contains(tid);
assert(hist != histories.end());
hist_entry ev (e, obj_addr);
hist->second.push_back(ev);
}
// hist_entry ctor:
hist_entry::hist_entry(event e, size_t obj_addr) :
ts(chrono::steady_clock::now()), ev(e), addr(obj_addr) {
// these are set in the tracer ctor
assert(start_addr && end_addr);
void* buf[TRACE_DEPTH];
int v = backtrace(buf, TRACE_DEPTH);
int a = 0;
// find first frame outside of our own code
while (a < v && start_addr < (size_t) buf[a] &&
end_addr > (size_t) buf[a]) ++a;
// skip requested amount of frames
a += TRACE_SKIP;
if (a >= v) a = v-1;
caller = buf[a];
}
histories is a lock-free concurrent hashmap from libcds (mapping tid->per-thread vectors of hist_entry), and its iterators are guaranteed to be thread-safe as well. GNU docs say backtrace() is thread-safe, and there are no data races mentioned in the CPP docs for steady_clock::now(). get_tid() just calls pthread_self() using the same method as the wrapper functions, and casts its result to size_t.
Hah, figured it out! The issue is that Glibc exposes multiple versions of pthread_cond_wait(), for backwards compatibility. The version I reproduce in my question is the current version, the one we want to call. The version that dlsym() was finding is the backwards-compatible version:
int
__pthread_cond_wait_2_0 (pthread_cond_2_0_t *cond, pthread_mutex_t *mutex)
{
if (cond->cond == NULL)
{
pthread_cond_t *newcond;
newcond = (pthread_cond_t *) calloc (sizeof (pthread_cond_t), 1);
if (newcond == NULL)
return ENOMEM;
if (atomic_compare_and_exchange_bool_acq (&cond->cond, newcond, NULL))
/* Somebody else just initialized the condvar. */
free (newcond);
}
return __pthread_cond_wait (cond->cond, mutex);
}
As you can see, this version tail-calls the current one, which is probably why this took so long to detect: GDB is normally pretty good at detecting frames elided by tail calls, but I'm guessing it didn't detect this one because the functions have the "same" name (and the error doesn't affect the mutex functions because they don't expose multiple versions). This blog post goes into much more detail, coincidentally specifically about pthread_cond_wait(). I stepped through this function many times while debugging and sort of tuned it out, because every call into glibc is wrapped in multiple layers of indirection; I only realized what was going on when I set a breakpoint on the pthread_cond_wait symbol, instead of a line number, and it stopped at this function.
Anyway, this explains the changing pointer phenomenon: what happens is that the old, incorrect function gets called, reinterprets the pthread_cond_t object as a struct containing a pointer to a pthread_cond_t object, allocates a new pthread_cond_t for that pointer, and then passes the newly allocated one to the new, correct function. The frame of the old function gets elided by the tail-call, and to a GDB backtrace after leaving the old function it looks like the correct function gets called directly from my wrapper, with a mysteriously changed argument.
The fix for this was simple: GNU provides the libdl extension dlvsym(), which is like dlsym() but also takes a version string. Looking for pthread_cond_wait with version string "GLIBC_2.3.2" solves the problem. Note that these versions do not usually correspond to the current glibc version (e.g. pthread_create()/exit() have version string "GLIBC_2.2.5"), so they need to be looked up on a per-function basis. The correct string can be determined either by looking at the compat_symbol() or versioned_symbol() macros that are somewhere near the function definition in the glibc source, or by using readelf to see the names of the symbols in the compiled library (mine has "pthread_cond_wait@@GLIBC_2.3.2" and "pthread_cond_wait@GLIBC_2.2.5").
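A minimal sketch of that lookup, assuming glibc on Linux (g++ defines _GNU_SOURCE by default, which RTLD_NEXT and dlvsym() need; older glibc releases also need -ldl at link time). The helper name lookup_versioned and its fallback to plain dlsym() are my own additions for illustration, not what the tool in the question does.

```cpp
#include <cassert>
#include <dlfcn.h>  // dlsym, dlvsym (dlvsym is a GNU extension)

// Hypothetical helper: ask for a specific symbol version first and fall back
// to the unversioned lookup when that version does not exist on this platform.
static void* lookup_versioned(const char* name, const char* version) {
    void* fn = dlvsym(RTLD_NEXT, name, version);
    if (fn == nullptr) {
        fn = dlsym(RTLD_NEXT, name);
    }
    return fn;
}
```

In a wrapper, the result of lookup_versioned("pthread_cond_wait", "GLIBC_2.3.2") would be cast to the real function's pointer type, in place of the dlsym() call inside GET_REAL_FN. The version string is the one that applied on my machine; other architectures use different strings, which is why the fallback is there.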

C++ syntax I don't understand

I've found a C++ code that has this syntax:
void MyClass::method()
{
beginResetModel();
{
// Various stuff
}
endResetModel();
}
I've no idea why there are { } after a line ending with ; but the code compiles and runs without any problem. Is it possible this has something to do with the fact that the code may be asynchronous (I'm not sure yet)? Or maybe the { } are only here to delimit a part of the code and don't really make a difference, but honestly I doubt this. Does anyone have a clue what this syntax means?
More info: There is no other reference to beginResetModel, resetModel or ResetModel in the whole project (searched with grep). Btw the project is a Qt one. Maybe it's another Qt-related macro I haven't heard of.
Using {} will create a new scope. In your case, any variable created in those braces will cease to exist at the } in the end.
beginResetModel();
{
// Various stuff
}
endResetModel();
The open and close braces in your code are a very important feature in C++, as they delimit a new scope. You can appreciate the power of this in combination with another powerful language feature: destructors.
So, suppose that inside those braces you have code that creates various objects, like graphics models, or whatever.
Assuming that these objects are instances of classes that allocate resources (e.g. textures on the video card), and those classes have destructors that release the allocated resources, you are guaranteed that, at the }, these destructors are automatically invoked.
In this way, all the allocated resources are automatically released, before the code outside the closing curly brace, e.g. before the call to endResetModel() in your sample.
This automatic and deterministic resource management is a key powerful feature of C++.
Now, suppose that you remove the curly braces, and your method looks like this:
void MyClass::method()
{
beginResetModel();
// {
// Various stuff
// }
endResetModel();
}
Now, all the objects created in the Various stuff section of code will be destroyed before the } that terminates the MyClass::method(), but after the call to endResetModel().
So, in this case, you end up with the endResetModel() call, followed by other release code that runs after it. This may cause bugs.
On the other hand, the curly braces that define a new scope enclosed in begin/endResetModel() do guarantee that all the objects created inside this scope are destroyed before endResetModel() is invoked.
{} delimits a scope. That means that any variable declared inside there is not accessible outside of it and is erased from memory once the } is reached. Here is an example:
#include <iostream>
using namespace std;
class MyClass{
public:
~MyClass(){
cout << "Destructor called" << endl;
}
};
int main(){
{
int x = 3;
MyClass foo;
cout << x << endl; //Prints 3
} //Here "Destructor called" is printed since foo is cleared from the memory
cout << x << endl; //Compiler error, x isn't defined here
return 0;
}
Usually scopes are used for functions, loops, if-statements, etc, but you're perfectly allowed to use scopes without any statement before them. This can be particularly useful to declare variables inside a switch (this answer explains why).
As others have pointed out, the curly braces create a new scope, but maybe the interesting thing is why you would want to do that, that is, what the difference is between using it and not using it. There are cases where scopes are obviously necessary, such as with if or for blocks; if you don't create a scope after them you can only have one statement. Another possible reason is that you use a variable in one part of the function and do not want it to be used outside of that part, so you put it into its own scope. However, the main use of scopes outside of control statements has to do with RAII. When you declare an instance variable (not a pointer or reference), it is always initialized; when it goes out of scope, it is always destroyed. This can be used to define blocks that require some setup at the beginning and some teardown at the end (if you are familiar with Python, similar to with blocks).
Take this example:
#include <mutex>
void fun(std::mutex & mutex) {
// 1. Perform some computations...
{
std::lock_guard<std::mutex> lock(mutex);
// 2. Operations in this scope are performed with the mutex locked
}
// 3. More computations...
}
In this example, part 2 is only run after the mutex has been acquired, and the mutex is released before part 3 starts. If you remove the additional scope:
#include <mutex>
void fun(std::mutex & mutex) {
// 1. Perform some computations...
std::lock_guard<std::mutex> lock(mutex);
// 2. Operations in this scope are performed with the mutex locked
// 3. More computations...
}
In this case the mutex is acquired before starting part 2, but it is held until part 3 is complete (possibly producing more interlocking between threads than necessary). Note, however, that in both cases there was no need to specify when the lock is released; std::lock_guard is responsible for both acquiring the lock on construction and releasing it on destruction (i.e. when it goes out of scope).

How do you know whether main has exited?

In both C and C++, atexit functions are called either inside exit, or after main returns (which notionally calls exit: __libc_start_main(argc,argv) { __libc_constructors(); exit(main(argc,argv)); }).
Is there a way to find out if we're inside the exit sequence? Destructors of C++ global and local statics are registered with atexit, so your code can certainly be called into at this stage. (Interestingly, on some platforms if you try to create a C++ local-static object inside exit, it deadlocks on the exit lock!)
My best attempt so far is as follows:
static bool mainExited = false;
static void watchMain() {
static struct MainWatcher {
~MainWatcher() { mainExited = true; }
} watcher;
}
When you want to watch for exit, you call watchMain(), and mainExited tells you at any time whether or not the exit sequence has begun -- except of course if a later-initialized local-static object is destructing!
Can the technique be improved to correct this, or is there another method that would work?
Aside - the use case!
While the problem is interesting from a language point-of-view (a bit like "can I tell if I'm inside a catch block?"), it's also useful to outline a use-case. I came across the problem while writing some code which will be run with and without a JVM loaded (with either direct calls or calls via JNI). After the JVM exits, the C atexit handlers are called, and JNI_OnUnload is not called if the JNI shared library is not unloaded by the class loader.
Since the shared library's objects can be destructed both by explicit destruction (and should free their resources), and by cleanup at exit, I need to distinguish these two cases safely, since the JVM is gone by the time we reach the exit code! Basically without a bit of sniffing there's no way I can find in the JNI specs/docs for a shared library to know whether the JVM is still there or not, and if it's gone, then it's certainly wrong to try and free up references we have to Java objects.
The real issue here is that the ownership semantics you've listed are messed up. The JVM kinda owns your shared library but also kinda doesn't. You have a bunch of references to Java objects that sometimes you need to clean up but sometimes you don't.
The real solution here is simply to not keep references to Java objects as global variables. Then you won't need to know if the JVM still exists or not when the library is unloaded for whatever reason. Just keep references to Java objects from inside objects referenced by Java and then let the JVM care about whether or not it needs to free them.
In other words, don't make yourself responsible for cleanup on exit in the first place.
Your watcher doesn't need to rely on any static initialization order:
#include <iostream>
struct MainWatcher // : boost::noncopyable
{
enum MainStatus { before, during, after };
MainWatcher(MainStatus &b): flag(b) { flag = during; }
~MainWatcher() { flag = after; }
MainStatus &flag;
};
//////////////////////////////////////////////////////////////////////
// Test suite
//////////////////////////////////////////////////////////////////////
// note: static data area is zero-initialized before static objects constructed
MainWatcher::MainStatus main_flag;
char const *main_word()
{
switch(main_flag)
{
case MainWatcher::before: return "before main()";
case MainWatcher::during: return "during main()";
case MainWatcher::after: return "after main()";
default: return "(error)";
}
}
struct Test
{
Test() { std::cout << "Test created " << main_word() << "\n"; }
~Test() { std::cout << "Test destroyed " << main_word() << "\n"; }
};
Test t1;
int main()
{
MainWatcher watcher(main_flag);
// rest of code
Test t2;
}

Is the main thread allowed to spawn a POSIX thread before it enters main()?

I have this object that contains a thread. I want the fate of the object and the fate of the thread to be one and the same. So the constructor creates a thread (with pthread_create) and the destructor performs actions to cause the thread to return in a reasonable amount of time and then joins the thread. This is working fine as long as I don't instantiate one of these objects with static storage duration. If I instantiate one of these objects at global or namespace or static class scope the program compiles fine (gcc 4.8.1) but immediately segfaults upon running. With print statements I have determined that the main thread doesn't even enter main() before the segfault. Any ideas?
Update: Also added a print statement to the first line of the constructor (so before pthread_create is called), and not even that gets printed before the segfault BUT the constructor does use an initialization list so it is possible something there is causing it?
Here is the constructor:
worker::worker(size_t buffer_size):
m_head(nullptr),m_tail(nullptr),
m_buffer_A(operator new(buffer_size)),
m_buffer_B(operator new(buffer_size)),
m_next(m_buffer_A),
m_buffer_size(buffer_size),
m_pause_gate(true),
m_worker_thread([this]()->void{ thread_func(); }),
m_running(true)
{
print("this wont get printed b4 segfault");
scoped_lock lock(worker_lock);
m_worker_thread.start();
all_workers.push_back(this);
}
And destructor:
worker::~worker()
{
{
scoped_lock lock(worker_lock);
auto w=all_workers.begin();
while(w!=all_workers.end())
{
if(*w==this)
{
break;
}
++w;
}
all_workers.erase(w);
}
{
scoped_lock lock(m_lock);
m_running=false;
}
m_sem.release();
m_pause_gate.open();
m_worker_thread.join();
operator delete(m_buffer_A);
operator delete(m_buffer_B);
}
Update 2:
Okay I figured it out. My print function is atomic and likewise protects cout with an extern namespace scope mutex defined elsewhere. I changed to just plain cout and it printed at the beginning of the ctor. Apparently none of these static storage duration mutexes are getting initialized before things are trying to access them. So yeah it is probably Casey's answer.
I'm just not going to bother with complex objects and static storage duration. It's no big deal anyway.
Initialization of non-local variables is described in C++11 §3.6.2; there's a ton of scary stuff in paragraph 2 that has to do with threads:
If a program starts a thread (30.3), the subsequent initialization of a variable is unsequenced with respect to the initialization of a variable defined in a different translation unit. Otherwise, the initialization of a variable is indeterminately sequenced with respect to the initialization of a variable defined in a different translation unit. If a program starts a thread, the subsequent unordered initialization of a variable is unsequenced with respect to every other dynamic initialization.
I interpret "the subsequent unordered initialization of a variable is unsequenced with respect to every other dynamic initialization" to mean that the spawned thread cannot access any variable with dynamic initialization that was not initialized before the thread was spawned without causing a data race. If that thread doesn't somehow synchronize with main, you're basically dancing through a minefield with your hands over your eyes.
I'd strongly suggest you read through and understand all of 3.6; even without threads it's a huge PITA to do much before main starts.
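One common way to sidestep the "mutex used before it was initialized" problem from the question's second update is the construct-on-first-use idiom. The following is only a sketch under assumptions: print_mutex(), safe_print() and EarlyPrinter are made-up names standing in for the question's own print helper and worker class, and the sketch does not address the thread-related sequencing rules quoted above.

```cpp
#include <cassert>
#include <iostream>
#include <mutex>

// Construct-on-first-use: the mutex is created the first time this function
// runs, so no caller can observe it before its constructor has finished.
// C++11 makes the initialization of function-local statics thread-safe.
static std::mutex& print_mutex() {
    static std::mutex m;
    return m;
}

static int g_prints;  // zero-initialized; counts calls for the demo

// Hypothetical stand-in for the question's atomic print() helper.
static void safe_print(const char* msg) {
    std::lock_guard<std::mutex> lock(print_mutex());
    std::cout << msg << '\n';
    ++g_prints;
}

// An object with static storage duration that prints from its constructor,
// which means it prints before main() is entered.
struct EarlyPrinter {
    EarlyPrinter() { safe_print("constructed before main"); }
};
static EarlyPrinter g_early;
```

Because the lock lives behind a function call rather than in a namespace-scope object in another translation unit, the order in which translation units are initialized no longer matters for it.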
What happens before entering main will be platform specific, but here is a link on how main() executes on Linux
http://linuxgazette.net/84/hawk.html
The useful snippet is:
__libc_start_main initializes necessary stuffs, especially C library(such as malloc) and thread environment and calls our main.
For more information, look up __libc_start_main.
I'm not sure how this behaves on Windows, but it seems like making any standard C library call before entering main is not a good idea.
There may be many ways to do that. See the snippet below, where the constructor of class A is called before main because an object of class A is declared at global scope. (I have expanded the example to demonstrate how a thread can be created before main executes.)
#include <iostream>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h> // sleep()
using namespace std;
void *fun(void *x)
{
while (true) {
cout << "Thread\n";
sleep(2);
}
}
pthread_t t_id;
class A
{
public:
A()
{
cout << "Hello before main \n " ;
pthread_create(&t_id, 0, fun, 0);
sleep(6);
}
};
A a;
int main()
{
cout << "I am main\n";
sleep(40);
return 0;
}
I found this question after I posted my own question about threads. Reviewing my question might be helpful to others. I found that when I allocated an object that creates a thread in its constructor at global scope, I got strange behavior, but if I moved the object creation just inside main(), things worked as I expected. That seems to be consistent with the comments on this question.

Is it possible to bind() *this to class member function to make a callback to C API

Is there a way to use boost or std bind() so I could use a result as a callback in C API?
Here's sample code I use:
#include <iostream>
#include <boost/function.hpp>
#include <boost/bind/bind.hpp>
typedef void (*CallbackType)();
void CStyleFunction(CallbackType functionPointer)
{
functionPointer();
}
class Class_w_callback
{
public:
Class_w_callback()
{
//This would not work
CStyleFunction(boost::bind(&Class_w_callback::Callback, this));
}
void Callback(){std::cout<<"I got here!\n";};
};
Thanks!
No, there is no way to do that. The problem is that a C function pointer is fundamentally nothing more than an instruction address: "go to this address, and execute the instructions you find". Any state you want to bring into the function has to either be global, or passed as parameters.
That is why most C callback APIs have a "context" parameter, typically a void pointer, that you can pass in, and just serves to allow you to pass in the data you need.
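To illustrate the context-parameter pattern, here is a minimal sketch. It assumes a hypothetical variant of the question's API, CStyleFunctionWithContext, that accepts a void pointer and passes it back to the callback; the question's real CStyleFunction has no such parameter. A plain function with C linkage receives the pointer and forwards to the member function.

```cpp
#include <iostream>

// Hypothetical C API: same idea as CStyleFunction, plus a context pointer.
extern "C" {
typedef void (*CallbackWithContext)(void* context);
void CStyleFunctionWithContext(CallbackWithContext fn, void* context) {
    fn(context);  // the library hands the pointer back untouched
}
}

class Class_w_callback {
public:
    int calls = 0;
    void Register();
    void Callback() {
        ++calls;
        std::cout << "I got here!\n";
    }
};

// Plain function with C linkage: this is what the C API actually calls.
// It recovers the object from the context pointer and forwards the call.
extern "C" void Class_w_callback_thunk(void* context) {
    static_cast<Class_w_callback*>(context)->Callback();
}

void Class_w_callback::Register() {
    CStyleFunctionWithContext(&Class_w_callback_thunk, this);
}
```

Each object passes its own this, so any number of instances can register without sharing state.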
You cannot do this in portable C++. However, there are libraries out there that enable creation of C functions that resemble closures. These libraries include assembly code in their implementation and require manual porting to new platforms, but if they support architectures you care about, they work fine.
For example, using the trampoline library by Bruno Haible, you would write the code like this:
extern "C" {
#include <trampoline.h>
}
#include <iostream>
typedef int (*callback_type)();
class CallbackDemo {
static CallbackDemo* saved_this;
public:
callback_type make_callback() {
return reinterpret_cast<callback_type>(
alloc_trampoline(invoke, &saved_this, this));
}
void free_callback(callback_type cb) {
free_trampoline(reinterpret_cast<int (*)(...)>(cb));
}
void target(){
std::cout << "I got here, " << this << '\n';
};
static int invoke(...) {
CallbackDemo& me = *saved_this;
me.target();
return 0;
}
};
CallbackDemo *CallbackDemo::saved_this;
int main() {
CallbackDemo x1, x2;
callback_type cb1 = x1.make_callback();
callback_type cb2 = x2.make_callback();
cb1();
cb2();
}
Note that, despite the use of a static member, the trampolines created by alloc_trampoline are reentrant: when the returned callback is invoked, it first copies the pointer to the designated address, and then invokes the original function with original arguments. If the code must also be thread-safe, saved_this should be made thread-local.
This won't work.
The problem is that bind returns a functor, that is a C++ class with an operator() member function. This will not bind to a C function pointer. What you need is a static or non-member function that stores the this pointer in a global or static variable. Granted, finding the right this pointer for the current callback might be a non-trivial task.
Globals
As mentioned by the others, you need a global (a static member is just a global tucked inside a class). Of course, if multiple objects need to use different state in said callback, a single global won't work.
Context Parameters in Callback
A C library may offer a void * or some similar context. In that case use that feature.
For example, the ffmpeg library supports a callback to read data which is defined like so:
int(*read_packet)(void *opaque, uint8_t *buf, int buf_size);
The opaque parameter can be set to this. Within your callback, just cast it back to your type (name of your class).
Library Context Parameter in Callback
A C library may call your callback with its object (struct pointer). Say you have a library named example which offers a type named example_t and defines callbacks like this:
callback(example_t *e, int param);
Then you may be able to place your context (a.k.a. this pointer) in that example_t structure and retrieve it back out in your callback.
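A rough sketch of that idea follows. Everything about example_t here is made up for illustration: a real library defines its own struct, and whether it has a spare field for user data (called user_data below) is an assumption you would need to check in its header.

```cpp
#include <iostream>

// Hypothetical stand-in for the library's header.
extern "C" {
struct example_t {
    void* user_data;                            // assumed slot for caller data
    void (*callback)(example_t* e, int param);  // callback shape from above
};
// Hypothetical library function that triggers the callback.
void example_fire(example_t* e, int param) { e->callback(e, param); }
}

// Forward declaration of the C-linkage function the library will call.
extern "C" void on_example_event(example_t* e, int param);

class Listener {
public:
    int last_param = 0;
    void Attach(example_t& e) {
        e.user_data = this;  // stash the this pointer in the library object
        e.callback = &on_example_event;
    }
    void OnEvent(int param) {
        last_param = param;
        std::cout << "event " << param << '\n';
    }
};

// Pull the object back out of the library struct and forward the call.
extern "C" void on_example_event(example_t* e, int param) {
    static_cast<Listener*>(e->user_data)->OnEvent(param);
}
```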
Serial Calls
Assuming you have only one thread using that specific C library and that the callback can only be triggered when you call a function in the library (i.e. you do not get events triggered at some random point in time), you could still use a global variable. What you have to do is save your current object in the global before each call. Something like this:
object_i_am_working_with = this;
make_a_call_to_that_library();
This way, inside the callback you can always access the object_i_am_working_with pointer. This does not work in a multithreaded application or when the library automatically generates events in the background (e.g. a key press, a packet from the network, a timer, etc.).
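Put together with the question's CStyleFunction, the serial-call approach might look like the sketch below. The Run and Thunk names are hypothetical, and the same single-threaded, no-background-events assumptions apply.

```cpp
typedef void (*CallbackType)();

// Stand-in for the C API from the question.
void CStyleFunction(CallbackType functionPointer) { functionPointer(); }

class Class_w_callback;

// The global that tells the callback which object is currently active.
static Class_w_callback* object_i_am_working_with = nullptr;

class Class_w_callback {
public:
    int calls = 0;

    void Run() {
        object_i_am_working_with = this;     // save before each library call
        CStyleFunction(&Class_w_callback::Thunk);
        object_i_am_working_with = nullptr;  // avoid leaving a stale pointer
    }

    void Callback() { ++calls; }

private:
    // Static member: convertible to a plain function pointer.
    static void Thunk() {
        if (object_i_am_working_with) {
            object_i_am_working_with->Callback();
        }
    }
};
```

Resetting the global after the call is optional, but it turns a late, unexpected callback into a no-op instead of a call on the wrong object.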
One Thread Per Object (since C++11)
This is an interesting solution in a multi-threaded environment. When none of the previous solutions are available to you, you may be able to resolve the problem using threads.
In C++11, there is a new special specifier named thread_local. In the old days, you had to handle that by hand which would be specific to each thread implementation... now you can just do this:
thread_local Class_w_callback * callback_context = nullptr;
Then, inside your callback, you can use callback_context as the pointer back to your Class_w_callback object.
This, of course, means you need to create one thread per object you create. This may not be feasible in your environment. In my case, I have components which are all running their own loop and thus each have their own thread_local environment.
Note that if the library automatically generates events you probably can't do that either.
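Here is a small sketch of the one-thread-per-object idea using the callback_context variable from above. The Run, Thunk and run_two_threads names are hypothetical, and CStyleFunction is a stand-in for the question's API. Each thread sets its own copy of the pointer, so the two objects do not interfere.

```cpp
#include <thread>

typedef void (*CallbackType)();

// Stand-in for the C API from the question.
void CStyleFunction(CallbackType functionPointer) { functionPointer(); }

class Class_w_callback;

// One pointer per thread; each thread starts with nullptr.
thread_local Class_w_callback* callback_context = nullptr;

class Class_w_callback {
public:
    int calls = 0;

    void Run() {
        callback_context = this;  // visible only to the calling thread
        CStyleFunction(&Class_w_callback::Thunk);
    }

    void Callback() { ++calls; }

private:
    static void Thunk() {
        if (callback_context) {
            callback_context->Callback();
        }
    }
};

// Hypothetical driver: runs two objects, each on its own thread, and
// encodes both call counts in one number (a.calls * 10 + b.calls).
int run_two_threads() {
    Class_w_callback a, b;
    std::thread ta([&a] { a.Run(); });
    std::thread tb([&b] { b.Run(); b.Run(); });
    ta.join();
    tb.join();
    return a.calls * 10 + b.calls;
}
```

On some toolchains you may need to link with the threading library (for example -pthread with g++).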
Old Way with Threads (And C solution)
As I mentioned above, in the old days you would have to manage the thread-local environment yourself. With pthreads (POSIX, e.g. Linux), thread-specific data is accessed through pthread_getspecific() and pthread_setspecific():
void *pthread_getspecific(pthread_key_t key);
int pthread_setspecific(pthread_key_t key, const void *value);
This makes use of dynamically allocated memory. On Linux, g++ normally implements thread_local with the platform's native TLS support rather than with these calls, but the effect for the user is similar.
Under MS-Windows, you probably would use the TlsAlloc function.
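For the pthread route, a minimal sketch could look like this. The helper names (set_callback_context, get_callback_context, context_worker) are hypothetical; the pthread calls themselves are the standard POSIX ones. The key is created once with pthread_once, and each thread then stores and reads its own value under that key.

```cpp
#include <pthread.h>

// One key shared by all threads; each thread has its own value under it.
static pthread_key_t context_key;
static pthread_once_t context_key_once = PTHREAD_ONCE_INIT;

// Called exactly once, by whichever thread gets there first.
static void make_context_key() {
    pthread_key_create(&context_key, nullptr);  // no destructor needed here
}

// Hypothetical helper: store this thread's context pointer.
void set_callback_context(void* context) {
    pthread_once(&context_key_once, make_context_key);
    pthread_setspecific(context_key, context);
}

// Hypothetical helper: fetch this thread's context pointer
// (a null pointer if the thread never set one).
void* get_callback_context() {
    pthread_once(&context_key_once, make_context_key);
    return pthread_getspecific(context_key);
}

// Hypothetical thread entry point: stores its argument as the context and
// reads it back, showing that each thread sees only its own value.
void* context_worker(void* arg) {
    set_callback_context(arg);
    return get_callback_context();
}
```

Inside a C callback you would call get_callback_context() and cast the result back to your class type.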