C++ registration functions called before main() - c++

My program looks something like this:
map<string, function<void(const MyType&)>> callables;
int main(int argc, char *argv[]) {
string name = GetFromSomewhere();
auto iter = callables.find(name);
if (iter != callables.end()) {
MyType my_thing = GetSomeValue();
iter->second(my_thing);
}
}
In other words, I have a table of functions, and main is going to do something that produces a lookup key into that table, do the lookup, and if successful, call the function.
Now I could initialise the table in the translation unit where I define the map, but that means each new function that wants to be in that map has to modify the map's TU. That gets cumbersome.
Better to have a registration function:
void RegisterCallable(const string&, function<void(MyType)>);
and then any developer who wants to put something in the table just calls RegisterCallable():
// In foo.cc:
void NiftyCallable(const MyType& thing) { ... }
RegisterCallable("nifty", NiftyCallable);
Past experience with string vs char[] warns me that I'm asking for pain, but I've not been able to (re)find the specific C++ rule that tells me when those RegisterCallable() calls that we'll scatter about the code base will be called (in particular, if they're guaranteed to be called before main executes or if maybe the TU can load on demand later -- and so whether my memory of pain is correct or not for C++14).
Am I misremembering that this will cause pain?
Or is there a better way to do this other than asking for some TU to know about (currently 100 or so) functions that need registering?

Don't put the table in global scope; put it in function scope (it still has to be static to make sure that it lives for the length of the application). That way you can force the initialization order, which solves the problem of initialization order across compilation units.
static std::map<std::string, std::function<void(const MyType&)>>& getCallables() {
static std::map<std::string, std::function<void(const MyType&)>> callables;
// ^^^^^^ Static storage duration object.
// lives as long as the application.
return callables;
}
int main(int argc, char *argv[]) {
std::string name = GetFromSomewhere();
auto iter = getCallables().find(name);
if (iter != getCallables().end()) {
MyType my_thing = GetSomeValue();
iter->second(my_thing);
}
}
When calling from any scope to register a new function it calls getCallables() which forces initialization. So you avoid the initialization order issue.
void RegisterCallable(const std::string& name, std::function<void(MyType)> f)
{
getCallables()[name] = f;
}
Unfortunately, you cannot have freestanding function calls directly in a compilation unit in C++ (unlike a lot of interpreted languages).
// So this will not work
RegisterCallable("nifty", NiftyCallable);
So the way to do this is to declare objects at global scope whose constructor registers the object.
struct DoRegisterCallable {
DoRegisterCallable(std::string const& name, std::function<void(MyType)> f) {
RegisterCallable(name, f);
}
};
Now in your compilation unit the person adding the function will do:
// In foo.cc:
void NiftyCallable(const MyType& thing) { ... }
DoRegisterCallable niftyCallableRegister("nifty", NiftyCallable);
In the comments above IgorTandetnik suggests that niftyCallableRegister may not be included in the executable as the compiler may optimize the variable out. This statement is not entirely true but is worth thinking about.
If the file foo.cc is compiled into a static library and that static library is then linked into the executable, there is a possibility that it will not be included. But in normal situations most builds are done with dynamic libraries, not static libraries (as static libraries have so many other issues that people have mostly stopped using them), so this is a minor concern in normal operations (but is something to think about).
Additionally, it is implementation-defined whether file scope, static storage duration variables are initialized before main or deferred. This is easily testable via some unit tests as it is a property of the compiler and not undefined behavior (if your compiler is doing this then you need to check the documentation to see if the behavior can be changed).
My speculation on this language in the standard is that it is there to allow delayed loading of shared libraries until after the application starts, while still guaranteeing that their behavior conforms to the standard. The way this is written allows an application to dynamically load a shared library and initialize it (make sure file scope static storage duration objects are initialized) after the application's main() has started. A corner case and easily tested via unit test.
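As a rough sketch of the kind of check a unit test could make, the snippet below registers one function through a global registrar object and then asks whether the table already contains it. The names table, Registrar, isRegistered and callRegistered are hypothetical and not part of the code above. In a real project the registrar would live in a different translation unit or library, which is the only place the check tells you anything.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

using Table = std::map<std::string, std::function<int(int)>>;

// Function-local static: constructed on first use, so registrars can
// safely call this during static initialization.
Table& table() {
    static Table t;
    return t;
}

// Hypothetical registrar object: its constructor adds an entry to the table.
struct Registrar {
    Registrar(const std::string& name, std::function<int(int)> f) {
        table()[name] = std::move(f);
    }
};

int twice(int x) { return 2 * x; }
Registrar twiceRegistrar("twice", twice);

// The question a unit test would ask first thing in main().
bool isRegistered(const std::string& name) {
    return table().count(name) == 1;
}

// Looks the function up and calls it; asserts that it was registered.
int callRegistered(const std::string& name, int arg) {
    assert(isRegistered(name));
    return table().at(name)(arg);
}
```

Calling isRegistered("twice") at the top of main() then shows whether your toolchain ran the registrar before main started.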
I may decide that it is nice to wrap this in a class for easy usage:
#include <string>
#include <functional>
#include <map>
#include <iostream>
class MyType
{
};
using Callable = std::function<void(MyType)>;
using CallableMap = std::map<std::string, Callable>;
class Callables
{
static CallableMap& getCallables()
{
static CallableMap callables;
return callables;
}
public:
static void registerFunc(std::string name, std::function<void(MyType)>&& f)
{
getCallables()[std::move(name)] = std::move(f);
}
static void call(std::string const& name, std::function<MyType()>&& getter)
{
auto find = getCallables().find(name);
if (find != getCallables().end()) {
find->second(getter());
}
}
static void call(std::string const& name, MyType const& value)
{
call(name, [&value](){return value;});
}
};
struct RegisterCallables
{
RegisterCallables(std::string value, Callable&& f)
{
Callables::registerFunc(std::move(value), std::move(f));
}
};
void echo(MyType v)
{
std::cout << "Echo\n";
}
RegisterCallables echoRegister("echo", echo);
int main()
{
MyType d;
Callables::call("echo", d);
}

Related

Constructing std::function from extern functions gives std::bad_function_call

I am experimenting with making pure Haskell-style I/O in C++. It's working correctly, but when I reorganize some definitions, I run into a std::bad_function_call.
This is about as much as it takes to trigger the problem:
//common.h
#include <functional>
#include <iostream>
#include <utility>
#include <string>
class Empty {};
class State {};
template <class A>
class IOMonad {
public:
typedef std::function<std::pair<A, State> (State)> T;
};
template <class A, class B>
const auto bind(typename IOMonad<A>::T ma, std::function<typename IOMonad<B>::T (A)> f) {
return [ma, f] (State state) {
const auto x = ma(state);
return f(x.first)(x.second);
};
}
extern const IOMonad<std::string>::T getLine;
IOMonad<Empty>::T putLine(std::string str);
//externs.cpp
#include "common.h"
const IOMonad<std::string>::T getLine = [](State s) {
(void)s;
std::string str;
std::cin >> str;
return std::make_pair(str, State());
};
IOMonad<Empty>::T putLine(std::string str) {
return [str] (State s) {
(void)s;
std::cout << str;
return std::make_pair(Empty(), State());
};
}
//main.cpp
#include "common.h"
const auto putGet = bind<std::string, Empty>(getLine, putLine);
int main() {
(void)putGet(State());
return 0;
}
With this setup, I get a std::bad_function_call when putGet is called. Previously, I had the contents of externs.cpp in main.cpp between including common.h and defining putGet, and everything worked fine. Something about having those functions in a different translation unit seems to be causing this problem. Also, if I keep the functions in externs.cpp, but I make putGet a local variable to main instead of a global variable, this does not happen. Another thing that makes the exception go away is folding the definition of bind into the definition of putGet, like so:
const auto putGet = [] (State state) {
const auto x = getLine(state);
return putLine(x.first)(x.second);
};
Why is this happening? Does std::function have some limitations I don't know about?
You've run afoul of the static initialization order fiasco. In your case, getLine is not yet initialized when it's used to initialize putGet.
The cardinal rule of C++ global variables is: Global variables must not depend on global variables in other compilation units for their initialization.
While global variables in a single compilation unit are initialized in the order they're defined, the order in which global variables in different compilation units are initialized is unspecified. There is no guarantee that getLine will be initialized before putGet (and indeed, it seems it wasn't).
To work around this, you need to do one of the following (two of which you've already found):
A. Move the initialization of putGet into main so that getLine is guaranteed to be initialized before it's used
or
B. Don't use getLine directly in the initialization of putGet (i.e. wrap it in an extra layer of lambda).
or
C. Make getLine an actual function instead of a std::function holding a lambda. Objects and functions are fundamentally different in C++ and have different rules governing their lifetime. Despite their name, std::functions are objects, not functions.
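A minimal sketch of option C, assuming a simplified stand-in for getLine: the names greeter, shout and demoShout are hypothetical. The std::function object is kept, but it is reached through a real function that holds it as a function-local static, so it is constructed on the first call no matter which translation unit asks first.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for getLine: an accessor instead of a global object.
// The function-local static is constructed on the first call, so callers in
// any translation unit always see an initialized std::function.
const std::function<std::string()>& greeter() {
    static const std::function<std::string()> instance = [] {
        return std::string("hello");
    };
    return instance;
}

// A global whose initializer depends on greeter(). Calling the accessor
// forces construction first, whatever the cross-TU initialization order is.
const std::function<std::string()> shout = [] {
    auto inner = greeter();  // copy of an already constructed std::function
    return std::function<std::string()>([inner] { return inner() + "!"; });
}();

// Uses the composed global; asserts that the dependency is non-empty.
std::string demoShout() {
    assert(static_cast<bool>(greeter()));
    return shout();
}
```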

Running C++ code outside of functions scope

(I know) In C++ I can declare variables outside any function scope, but I can't run any code/statement there, except for initializing global/static variables.
IDEA
Is it a good idea to use below tricky code in order to (for example) do some std::map manipulation ?
Here I use void *fakeVar and initialize it through Fake::initializer() and do whatever I want in it !
std::map<std::string, int> myMap;
class Fake
{
public:
static void* initializer()
{
myMap["test"]=222;
// Do whatever with your global Variables
return NULL;
}
};
// myMap["Error"] = 111; => Error
// Fake::initializer(); => Error
void *fakeVar = Fake::initializer(); //=> OK
int main()
{
std::cout<<"Map size: " << myMap.size() << std::endl; // Show myMap has initialized correctly :)
}
One way of solving it is to have a class with a constructor that does things, then declare a dummy variable of that class. Like
struct Initializer
{
Initializer()
{
// Do pre-main initialization here
}
};
Initializer initializer;
You can of course have multiple such classes doing miscellaneous initialization. The order in each translation unit is specified to be top-down, but the order between translation units is not specified.
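A small sketch of the top-down guarantee inside one translation unit. The log and the names initLog, Initializer and initializedTopDown are assumptions made for illustration. The log is a function-local static so that it exists before the first initializer writes to it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Function-local static log, so it is constructed before its first use.
std::vector<std::string>& initLog() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical helper: records its name when constructed.
struct Initializer {
    explicit Initializer(const char* name) { initLog().push_back(name); }
};

// Defined in this order in one translation unit, so constructed in this order.
Initializer first("first");
Initializer second("second");

// Returns true when the recorded order matches the order of definition.
bool initializedTopDown() {
    const std::vector<std::string>& log = initLog();
    assert(log.size() == 2);
    return log[0] == "first" && log[1] == "second";
}
```

If first and second were defined in two different source files, no such ordering could be relied on.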
You don't need a fake class... you can initialize using a lambda
auto myMap = []{
std::map<std::string, int> m;
m["test"] = 222;
return m;
}();
Or, if it's just plain data, initialize the map:
std::map<std::string, int> myMap { { "test", 222 } };
Is it a good idea to use below tricky code in order to (for example)
do some std::map manipulation ?
No.
Any solution entailing mutable non-local variables is a terrible idea.
Is it a good idea...?
Not really. What if someone decides that in their "tricky initialisation" they want to use your map, but on some system or other, or for no obvious reason after a particular relink, your map ends up being initialised after their attempted use? If you instead have them call a static function that returns a reference to the map, then it can initialise it on first call. Make the map a static local variable inside that function and you stop any accidental use without this protection.
§ 8.5.2 states
Except for objects declared with the constexpr specifier, for which
see 7.1.5, an initializer in the definition of a variable can consist
of arbitrary expressions involving literals and previously declared
variables and functions, regardless of the variable’s storage duration
therefore what you're doing is perfectly allowed by the C++ standard. That said, if you need to perform "initialization operations" it might be better to just use a class constructor (e.g. a wrapper).
What you've done is perfectly legal C++. So, if it works for you and is maintainable and understandable by anybody else who works with the code, it's fine. Joachim Pileborg's sample is clearer to me though.
One problem with initializing global variables like this can occur if they use each other during initialization. In that case it can be tricky to ensure that variables are initialized in the correct order. For that reason, I prefer to create InitializeX, InitializeY, etc functions, and explicitly call them in the correct order from the Main function.
Wrong ordering can also cause problems during program exit where globals still try to use each other when some of them may have been destroyed. Again, some explicit destruction calls in the correct order before Main returns can make it clearer.
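A sketch of that explicit style, with hypothetical subsystems X and Y (the names g_x, g_y, ShutdownX, ShutdownY and runWithExplicitOrder are made up for illustration). The globals are std::unique_ptr objects, whose default constructor is constexpr, so they need no dynamic initialization themselves; everything interesting happens in the Initialize functions.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical subsystems: Y reads X while it is being set up.
std::unique_ptr<std::map<std::string, int>> g_x;
std::unique_ptr<int> g_y;

void InitializeX() {
    g_x.reset(new std::map<std::string, int>{{"test", 222}});
}

void InitializeY() {
    assert(g_x && "InitializeX() must run first");
    g_y.reset(new int(g_x->at("test") + 1));
}

// Tear down in the reverse order of initialization.
void ShutdownY() { g_y.reset(); }
void ShutdownX() { g_x.reset(); }

// What main() would do: the whole order is visible in one place.
int runWithExplicitOrder() {
    InitializeX();
    InitializeY();
    const int result = *g_y;
    ShutdownY();
    ShutdownX();
    return result;
}
```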
So, go for it if it works for you, but be aware of the pitfalls. The same advice applies to pretty much every feature in C++!
You said in your question that you yourself think the code is 'tricky'. There is no need to overcomplicate things for the sake of it. So, if you have an alternative that appears less 'tricky' to you... that might be better.
When I hear "tricky code", I immediately think of code smells and maintenance nightmares. To answer your question, no, it isn't a good idea. While it is valid C++ code, it is bad practice. There are other, much more explicit and meaningful alternatives to this problem. To elaborate, the fact that your initializer() method returns void* NULL is meaningless as far as the intention of your program goes (i.e. each line of your code should have meaningful purpose), and you now have yet another unnecessary global variable fakeVar, which needlessly points to NULL.
Let's consider some less "tricky" alternatives:
If it's extremely important that you only ever have one global instance of myMap, perhaps using the Singleton Pattern would be more fitting, and you would be able to lazily initialize the contents of myMap when they are needed. Keep in mind that the Singleton Pattern has issues of its own.
Have a static method create and return the map or use a global namespace. For example, something along the lines of this:
// global.h
namespace Global
{
extern std::map<std::string, int> myMap;
};
// global.cpp
namespace Global
{
std::map<std::string, int> initMap()
{
std::map<std::string, int> map;
map["test"] = 222;
return map;
}
std::map<std::string, int> myMap = initMap();
};
// main.cpp
#include "global.h"
int main()
{
std::cout << Global::myMap.size() << std::endl;
return 0;
}
If this is a map with specialized functionality, create your own class (best option)! While this isn't a complete example, you get the idea:
class MyMap
{
private:
std::map<std::string, int> map;
public:
MyMap()
{
map["test"] = 222;
}
void put(std::string key, int value)
{
map[key] = value;
}
unsigned int size() const
{
return map.size();
}
// Overload operator[] and create any other methods you need
// ...
};
MyMap myMap;
int main()
{
std::cout << myMap.size() << std::endl;
return 0;
}
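For comparison, the lazily initialised singleton mentioned as the first alternative might be sketched as follows. MapRegistry and demoSingleton are hypothetical names, and this is only one common way to write the pattern (a function-local static instance), not the only one.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical singleton wrapper around the map from the question.
class MapRegistry {
public:
    // The single instance is created on the first call (lazy initialization).
    static MapRegistry& instance() {
        static MapRegistry registry;
        return registry;
    }
    MapRegistry(const MapRegistry&) = delete;
    MapRegistry& operator=(const MapRegistry&) = delete;

    std::map<std::string, int>& map() { return map_; }

private:
    MapRegistry() { map_["test"] = 222; }  // contents filled in on first use
    std::map<std::string, int> map_;
};

// Adds one more key and returns the resulting number of entries.
std::size_t demoSingleton() {
    MapRegistry::instance().map()["other"] = 111;
    assert(&MapRegistry::instance() == &MapRegistry::instance());
    return MapRegistry::instance().map().size();
}
```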
In C++, you cannot have statements outside any function. However, you can declare global objects, and the constructor (initializer) calls for these global objects happen automatically before main starts. In your example, fakeVar is a global pointer that gets initialized through a static member function of a class; this is absolutely fine.
Even a global object would do, provided that the global object's constructor does the desired initialization.
For example,
class Fake
{
public:
Fake() {
myMap["test"]=222;
// Do whatever with your global Variables
}
};
Fake fake;
This is a case where unity builds (single translation unit builds) can be very powerful. The __COUNTER__ macro is a de facto standard among C and C++ compilers, and with it you can write arbitrary imperative code at global scope:
// At the beginning of the file...
template <uint64_t N> void global_function() { global_function<N - 1>(); } // This default-case skips "gaps" in the specializations, in case __COUNTER__ is used for some other purpose.
template <> void global_function<__COUNTER__>() {} // This is the base case.
void run_global_functions();
#define global_n(N, ...) \
template <> void global_function<N>() { \
global_function<N - 1>(); /* Recurse and call the previous specialization */ \
__VA_ARGS__; /* Run the user code. */ \
}
#define global(...) global_n(__COUNTER__, __VA_ARGS__)
// ...
std::map<std::string, int> myMap;
global({
myMap["test"]=222;
// Do whatever with your global variables
})
global(myMap["Error"] = 111);
int main() {
run_global_functions();
std::cout << "Map size: " << myMap.size() << std::endl; // Show myMap has initialized correctly :)
}
global(std::cout << "This will be the last global code run before main!");
// ...At the end of the file
void run_global_functions() {
global_function<__COUNTER__ - 1>();
}
This is especially powerful once you realize that you can use it to initialize static variables without a dependency on the C runtime. This means you can generate very small executables without having to eschew non-zero global variables:
// At the beginning of the file...
extern bool has_static_init;
#define default_construct(x) x{}; global(if (!has_static_init) new (&x) decltype(x){})
// Or if you don't want placement new:
// #define default_construct(x) x{}; global(if (!has_static_init) x = decltype(x){})
class Complicated {
int x = 42;
public:
Complicated() { std::cout << "Constructor!"; }
};
Complicated default_construct(my_complicated_instance); // Will be zero-initialized if the CRT is not linked into the program.
int main() {
run_global_functions();
}
// ...At the end of the file
static bool get_static_init() {
volatile bool result = true; // This function can't be inlined, so the CRT *must* run it.
return result;
}
bool has_static_init = get_static_init(); // Will stay zero without CRT
This answer is similar to Some programmer dude's answer, but may be considered a bit cleaner. As of C++17 (that's when std::invoke() was added), you could do something like this:
#include <functional>
auto initializer = std::invoke([]() {
// Do initialization here...
// The following return statement is arbitrary. Without something like it,
// the auto will resolve to void, which will not compile:
return true;
});

static unordered_map is erased when putting into different compilation unit in XCode

I have a static unordered_map in my class C. I experience difference in behaviour if I put my class definition and declaration in different files from the file containing function main.
The thing is that I observed that if the class C is in the same compilation unit as function main, all is well, I see only once the text "new string created: c". However if I split my code into three files (see the listing below), I see "new string created: c" twice which means that my static unordered_map is wiped right before entering main.
My question would be: why does this happen? (The difference only happens when compiling with Apple LLVM compiler 4.1. I have tested it with g++4.7 -std=c++11 and the split code works out just fine.)
Thanks in advance for any ideas!
// would go to My_header.h
#include <unordered_map>
#include <string>
#include <iostream>
using namespace std;
class C{
public:
C(const string & s);
private:
static unordered_map<string, string*> m;
string *name;
};
// would go to My_code.cpp
// (when separated, add #include "My_header.h")
unordered_map<string, string*> C::m;
C::C(const string & s):
name(NULL)
{
string*& rs = m[s];
if(rs)
{
name = rs;
}
else
{
cout<<"new string created: "<<s<<endl;
rs = name = new string(s);
}
}
// would go to main.cpp
// (when separated, add #include "My_header.h")
C c("c");
int main(int argc, const char * argv[])
{
cout << "main" << endl;
C c1("c");
}
The order of initialization of global objects is defined only within one translation unit. Between different translation units the order isn't guaranteed. Thus, you probably see behavior resulting from the std::unordered_map being accessed before it is constructed.
The way to avoid these problems is to not use global objects, of course. If you really need to use a global object it is best to wrap the object in a function. This way it is guaranteed that the object is constructed the first time it is accessed. With C++ 2011 the construction is even thread-safe:
T& global() {
static T rc;
return rc;
}
Thanks, guys! Following Dietmar's advice, I did this:
class C{
//...
private:
static unordered_map<string, string*>& m();
};
unordered_map<string, string*>& C::m()
{
static unordered_map<string, string*> m;
return m;
}
and then I kept referring to m(). It is strange that it did not happen before. I guess I got lucky. But then, this should be a case for a warning message, shouldn't it?
To avoid mistakes like this I will use the following macros to declare and define static variables:
/// Use this macro in classes to declare static variables
#define DECLARE_STATIC(type, name) static type& name();
/// Use this macro in definition files to define static variables
#define DEFINE_STATIC(type, name) type& name(){static type local; return local;}
Usage in this case:
// A typedef keeps the comma in the template argument list out of the macro call
typedef unordered_map<string, string*> StringMap;
class C{
//...
private:
DECLARE_STATIC(StringMap, m);
};
DEFINE_STATIC(StringMap, C::m)

Very strange memory leak

I am running the following piece of code under the Marmalade SDK. I need to know if there's a "bug" in my code or in Marmalade:
template <class Return = void, class Param = void*>
class IFunction {
private:
static unsigned int counterId;
protected:
unsigned int id;
public:
//
static unsigned int getNewId() { return counterId++; }
template <class FunctionPointer>
static unsigned int discoverId(FunctionPointer funcPtr) {
typedef std::pair<FunctionPointer, unsigned int> FP_ID;
typedef std::vector<FP_ID> FPIDArray;
static FPIDArray siblingFunctions; // <- NOTE THIS
typename FPIDArray::iterator it = siblingFunctions.begin();
while (it != siblingFunctions.end()) {
if (funcPtr == it->first) return it->second; /// found
++it;
}
/// not found
unsigned int newId = getNewId();
siblingFunctions.push_back( FP_ID(funcPtr, newId) ); // <- NOTE THIS
return newId;
}
//
virtual ~IFunction() {}
bool operator<(const IFunction* _other) const {
if (this->id < _other->id) return true;
return false;
}
virtual Return call(Param) = 0;
};
Note that the first time each instantiation of the template function discoverId is called, a static local array is created.
At program exit, the Marmalade memory manager complains that the memory reserved at this line :
siblingFunctions.push_back( FP_ID(funcPtr, newId) );
hasn't been freed. (The truth is that I don't empty the array, but how could I, I don't have access to it outside that function!).
Here is the catch : Marmalade complains only for the memory reserved at the very first call of this function! This function is called several times and with several different template parameters, but the complaining always occurs only for the memory reserved at the 1st call. This is the case even if I mix up the order of the various calls to this function. Memory reserved for every call after the 1st one is automatically freed - I have checked this out.
So, who's to blame now?
I don't know what "Marmalade" is (and a quick search for this word expectedly found a lot of irrelevant references) but your code doesn't have a resource leak with respect to the static FPIDArray siblingFunctions: this object is constructed the first time the function is called. It is destroyed at some point after main() is exited. I seem to recall that objects with static storage duration are destroyed in the reverse of the order in which they were constructed, but I'm not sure whether this extends to function-local statics.
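To illustrate the point about construction on first call, here is a reduced sketch that is not the asker's code: seenValues, record and instantiationsAreIndependent are hypothetical names. Each instantiation of a function template owns its own function-local static, created the first time that particular instantiation runs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reduction of discoverId: each instantiation of the template
// owns a separate function-local static vector.
template <class T>
std::vector<T>& seenValues() {
    static std::vector<T> values;  // constructed on the first call per T
    return values;
}

// Records a value and returns how many values this instantiation holds.
template <class T>
std::size_t record(T value) {
    seenValues<T>().push_back(value);
    return seenValues<T>().size();
}

// Exercises two instantiations; returns true if their storage is independent.
bool instantiationsAreIndependent() {
    record(1);
    record(2);
    record(3.5);
    assert(seenValues<int>().size() == 2);
    return seenValues<int>().size() == 2 && seenValues<double>().size() == 1;
}
```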

Partially initialize variable defined in other module

I'm considering a certain solution where I would like to initialize a cell of an array that is defined in another module (there will be many modules initializing one table). The array won't be read before running main (so there is no problem with static initialization order).
My approach:
/* secondary module */
extern int i[10]; // the array
const struct Initialize {
Initialize() { i[0] = 12345; }
} init;
/* main module */
#include <stdio.h>
int i[10];
int main()
{
printf("%d\n", i[0]); // check if the value is initialized
}
Compiler won't strip out init constant because constructor has side effects. Am I right? Is the mechanism OK? On GCC (-O3) everything is fine.
//EDIT
In the real world there will be many modules. I want to avoid an extra module, a central place that will gather all minor initialization routines (for better scalability). So it is important that each module triggers its own initialization.
This works with MSVC compilers but with GNU C++ does not (at least for me). GNU linker will strip all the symbols not used outside your compilation unit. I know only one way to guarantee such initialization - the "init once" idiom. For example:
init_once.h:
template <typename T>
class InitOnce
{
T *instance;
static unsigned refs;
public:
InitOnce() {
if (!refs++) {
instance = new T();
}
}
~InitOnce() {
if (!--refs) {
delete instance;
}
}
};
template <typename T> unsigned InitOnce<T>::refs(0);
unit.h:
#include "init_once.h"
class Init : public InitOnce<Init>
{
public:
Init();
~Init();
};
static Init module_init_;
secondary.cpp:
#include "unit.h"
extern int i[10]; // the array
Init::Init()
{
i[0] = 12345;
}
...
I don't think you want the extern int i[10]; in your main module, though, adf88.
EDIT
/*secondary module (secondary.cpp) */
int i[10];
void func()
{
i[0]=1;
}
/*main module (main.cpp)*/
#include<iostream>
extern int i[];
void func();
int main()
{
func();
std::cout<<i[0]; //prints 1
}
Compile, link and create an executable using g++ secondary.cpp main.cpp -o myfile
In general, constructors are used (and should be used) for initializing members of a class only.
This might work, but it's dangerous. The construction order of globals/statics across the translation units of a single module is unspecified, and so is module loading order (unless you're managing it explicitly). For example, you assume that when the Initialize() ctor in secondary.c runs, i is already present. You'd have to be very careful not to have two modules initialize the same common data, or have two modules carry out initializations with overlapping side effects.
I think a cleaner design to tackle such a need is to have the owner of the common data (your main module) expose it as a global singleton, with an interface to carry out whichever data initializations are needed. You'd have a central place to control init-order, and maybe even control concurrent access (using critical sections or other concurrency primitives). Along the lines of your simplified example, that might be -
/* main module (main.c) */
#include <stdio.h>
class CommonDat
{
int i;
public:
int GetI() const { return i; }
void SetI(int newI) { i = newI; }
void incI()
{
AcquireSomeLock();
i++;
ReleaseTheLock();
}
};
CommonDat g_CommonDat;
CommonDat* getCommonDat() { return &g_CommonDat; }
int main(void)
{
printf("%d",getCommonDat()->GetI());
}
It's also preferable to have the secondary modules call these interfaces at controlled times in runtime (and not during the global c'tors pass).
(NOTE: you named the files as C files, but tagged the question as c++. The suggested code is c++, of course).
May I ask why you use an array (running the risk of getting out of bounds) when you could use a std::vector ?
std::vector<int>& globalArray()
{
static std::vector<int> V;
return V;
}
bool const push_back(std::vector<int>& vec, int v)
{
vec.push_back(v);
return true; // dummy return for static init
}
This array is lazily initialized on the first call to the function.
You can use it like such:
// module1.cpp
static bool const dummy = push_back(globalArray(), 1);
// module2.cpp
static bool const dummy = push_back(globalArray(), 2);
It seems much easier and less error-prone. It's not multithread compliant until C++0x though.