Use side effect of static initialization for one-time initialization - c++

I want to initialize some lookup table in a shared library, I want to see if this is a valid way, and if it is continue using it in more libraries I am about to write.
typedef std::map<std::string, int> NAME_LUT;
NAME_LUT g_mLUT;
namespace
{
bool OneTimeInit()
{
::g_mLUT.insert(NAME_LUT::value_type("open", 1));
::g_mLUT.insert(NAME_LUT::value_type("close", 2));
return true;
}
bool bInit = OneTimeInit(); // Just to make initialization happen
}
It seems to work fine on both Visual Studio, and gcc (Linux). Only gcc complains that bInit is not used anywhere.
Is it possible that initialization is optimized out (bInit not used), or language does not allow it (because of the side effect).
It sure looks like a good cross platform way of handling onetime initialization, but I am not sure if this is the best approach.
Does it make sense to make OneTimeInit declared static? (i.e. use static bool OneTimeInit() {...}), or namespace alone is a better approach to make it unique to this compilation unit

I don't quite like the idea of variables with static storage, but if you are going to do so, you can actually simplify the code by just writing a function that will initialize your object:
typedef std::map<std::string, int> NAME_LUT;
namespace {
NAME_LUT create_lut() {
NAME_LUT table;
table.insert(NAME_LUT::value_type("open", 1));
table.insert(NAME_LUT::value_type("close", 2));
return table;
}
}
NAME_LUT g_mLut = create_lut();
Note that this has all of the usual initialization order issues (accross different translation units, and specially with dynamic libraries)

Yes, it is legal but since you mention that it's in a library, you must ensure that the translation you are creating this in will be linked:
How to force inclusion of "unused" object definitions in a library

If you have C++11 initializer lists, would this work better for you?
NAME_LUT g_mLUT = { {"open", 1}, {"close", 2}, };

If bInit gets optimized out (possible, especially if your lib gets dlopened) then your initialization code won't run.
Is there a problem with having the user make a setup call to your library in a sane place (after main starts)? Doign it thihs way removes any possibility of one of many possible hard-to-debug order-of-initialization bugs from cropping up in your code.
If that's not an option I would really suggest hiding the init into a function call with a static local, something like this:
typedef std::map<std::string, int> NAME_LUT;
namespace
{
bool OneTimeInit(NAME_LUT& mLUT)
{
::mLUT.insert(NAME_LUT::value_type("open", 1));
::mLUT.insert(NAME_LUT::value_type("close", 2));
return true;
}
NAME_LUT& get_global_mLUT()
{
static NAME_LUT g_mLUT;
static bool bInit = OneTimeInit(g_mLUT); // Just to make initialization happen
return g_mLUT;
}
}

Related

Setting the value for variables in a method just once

I have this method:
bool CDemoPickerDlg::IsStudentTalk(CString strAssignment)
{
bool bStudentTalk = false;
CString strTalkMain, strTalkClass;
if (theApp.UseTranslationINI())
{
strTalkMain = theApp.GetSMMethod(_T("IDS_STR_HISTORY_TALK_MAIN"));
strTalkClass = theApp.GetSMMethod(_T("IDS_STR_HISTORY_TALK_AUX"));
}
else
{
strTalkMain.LoadString(IDS_STR_HISTORY_TALK_MAIN);
strTalkClass.LoadString(IDS_STR_HISTORY_TALK_AUX);
}
int iTalkMainLen = strTalkMain.GetLength();
int iTalkClassLen = strTalkClass.GetLength();
if (strAssignment.Left(iTalkMainLen) == strTalkMain ||
strAssignment.Left(iTalkClassLen) == strTalkClass)
{
bStudentTalk = true;
}
return bStudentTalk;
}
It is called multiple times. Without added "member variables" to the class to cache values is there any other way to create the values for the two CString and int values just the once? As they will not change for the duration of the program.
The method above is static. I know about assigning a value to a static variable but I understand that can only be done once at the time of declaration. Have I miss-understood that?
You can use a static constant (or variable, but why make it variable if it isn't supposed to be changed?) at function scope:
static CString const someImmutableText = <some initializer>;
The placeholder <some initializer> above can be a literal, a function call or any other expression that you can initialize a CString from. The static makes sure the object is only created once and subsequently only initialized once, too.
#Ulrich's answer will of course work fine, but if <some initializer> is non-trivial there is a hidden downside - as of C++11, the compiler is required to generate a threadsafe initialiser.
This has minimal runtime overhead but it does generate quite a lot of code, see at Godbolt, and if you have a lot of these then this can add up.
If there are no multi-threading issues (which generally there aren't, especially in initialisation code), then there is a simple alternative which will eliminate this code. In fact, it's so simple that it's barely worth posting at all, but I'll do it here anyway for completeness. It's just this; please excuse the anglicisms:
static bool initialised;
static Foo *initialise_me;
static Bar *initialise_me_too;
...
if (!initialised)
{
initialise_me = new Foo (...);
initialise_me_too = new Bar (...);
...
initialised = true;
}
...
Note that the variables to be initialised are declared as raw pointers here and allocated with new. This is done for a reason - the one thing you most definitely don't want is to call constructors at the point where you declare these variables, else you'll be right back where you started. There are no object lifetime issues because the variables remain in existence for the entire duration of the program, so it's all good.
And, in fact, you don't actually need that bool at all - just test (say) initialise_me against nullptr.

Init values by using (somewhat) global variables vs. static function variables?

I have some small helper functions needed throughout the code.
To work, they need to be initialized with some data once.
Where should I store the init data?
I've come up with two methods:
I create static variables in the scope of the helper.cpp file which I set with a dedicated setter function and then use in my helper function.
static int _initData = 0;
void initHelpMe(int initData)
{
_initData = initData;
}
void helpMe()
{
doSomethingWith(_initData);
}
Or I use a static function variable inside the original helper function and a default parameter to it.
void helpMe(int initData = 0)
{
static int _initData = 0;
if (initData != 0)
_initData = initData;
doSomethingWith(_initData);
}
(Lets asume that 0 is outside of the valid data range of initData and that I've not shown additional code to ensure an error is raised when the function is called for the first time without initiating it first.)
What are the advantages / disadvantages of those two methods and is there an even better way of doing it?
I of course like the second method, because it keeps all the functionality in one place. But I already know it is not thread-safe (which is not an issue a.t.m.).
And, to make this more interesting, albeit being C++ this is not to be used in object-oriented but in procedural code. So please no answers proposing objects or classes. Just imagine it to be C with the syntax of C++.
I was going to suggest that you wrap your data into an object, until I realized that you are asking for a C solution with a C++ tag...
Both of your solutions have their benefits.
The second one is the one I'd prefer, assuming we just go by "what it looks like/maintainability". However, there is a drawback if helpMe is called MANY times with initData == 0, because of the extra if, which isn't present in the first case. This may or may not be an issue if doSomethingWith() is long enough a function and/or the compiler has the ability to inline helpMe (and initData is constant).
And of course, something in the code will have to call initHelpMe too, so it may turn out to be the same anyway.
In summary: Prefer the second one, based on isolation/encapsulation.
I clearly prefer the second! Global static data in different compilation units are initialized in unspecified order (In one unit in order, though). Local static data of a function is initialized at first call.
Example:
If you have two translation units A and B. The unit A calls during initialization the function helpMe of unit B. Assume the order of initialization is A, B.
The first solution will set the zero initialized _initData to some initData. After that the initialization of unit B resets _initData back to zero and may produce a memory leak or other harm.
There is a third solution:
void helpMe(int initData = 0)
{
static std::once_flag once;
static int _initData = 0;
std::call_once(once, [&] {
_initData = initData;
}
doSomethingWith(_initData);
}
I feel strongly both ways.
Prefer option 2 for the isolation, but option 1 lends itself to porting to a C++ class. I've coded both ways. It comes down to the SW architecture.
Let me offer another point.
Both options down side: You have not limited initialization to one occurrence. "need to be initialized with some data once". It appears OP's conditions insure a proper initialization of initHelpMe(123) or HelpMe(123) followed by helpMe(), but do not prevent/detect a secondary initialization.
Should a secondary need to be prevented/detected, some additional code could be used.
// Initialization
if (_initData != 0) {
; // Handle error
}
_initData = initData;
Another paradigm I've used follows. It may not be realizable in you code as it does not pass initData as a parameter but magically can get it.
void helpMe(void) {
static int Initialized = 0;
if (!Initialized) {
Initialized = 1;
_initData = initData();
}
doSomethingWith(_initData);
}

Segmentation fault when initializing a file scoped variable

The situation I have is that I am trying to initialize a file scoped variable, std::string, in a shared object constructor. It will probably be more clear in code:
#include <string>
#include <dlfcn.h>
#include <cstring>
static std::string pathToDaemon; // daemon should always be in the same dir as my *.so
__attribute__((constructor))
static void SetPath()
{
int lastSlash(0):
Dl_info dl_info;
memset(&dl_info, 0, sizeof(dl_info));
if((dladdr((void*)SetPath, &dl_info)) == 0)
throw up;
pathToDaemon = dl_info.dli_fname; // **whoops, segfault here**
lastSlash = pathToDaemon.find_last_of('/');
if(std::string::npos == lastSlash)
{
// no slash, but in this dir
pathToDaemon = "progd";
}
else
{
pathToDaemon.erase(pathToDaemon.begin() + (lastSlash+1), pathToDaemon.end());
pathToDaemon.append("progd");
}
std::cout << "DEBUG: path to daemon is: " << pathToDaemon << std::endl;
}
I have a very simple program that does this same thing: a test driver program for concept if you will. The code in that looks just like this: a "shared object ctor" which uses dladdr() to store off the path of the *.so file when the file is loaded.
Modifications I've tried:
namespace {
std::string pathToDaemon;
__attribute__((constructor))
void SetPath() {
// function def
}
}
or
static std::string pathToDaemon;
__attribute__((constructor))
void SetPath() { // this function not static
// function def
}
and
std::string pathToDaemon; // variable not static
__attribute__((constructor))
void SetPath() { // this function not static
// function def
}
The example you see above sits in a file that is compiled into both a static object library and a DLL. The compilation process:
options for static.a: --std=C++0x -c -Os.
options for shared.so: -Wl,--whole-archive /path/to/static.a -Wl,--no-whole-archive -lz -lrt -ldl -Wl,-Bstatic -lboost_python -lboost_thread -lboost_regex -lboost_system -Wl,-Bdynamic -fPIC -shared -o mymodule.so [a plethora of more objects which wrap into python the static stuff]
The hoops I have to jump through in the bigger project make a much more complicated build process than my little test driver program requires. This makes me think that the problem lies there. Can anyone please shed some light on what I'm missing?
Thanks,
Andy
I think it's worth giving the answer that I've found. The problem was due to the complex nature of the shared library loading. I discovered after some digging that I could reproduce the problem in my test bed program when compiling the code with optimizations enabled. That confirmed the hypothesis that the variable truly didn't exist when it was being accessed by the constructor function.
GCC includes some extra tools for C++ which allow for developers to force certain things to happen at particular times during code initialization. More precisely, it allows for certain things to take place in particular order rather than particular times.
For example:
int someVar(55) __attribute__((init_priority(101)));
// This function is a lower priority than the initialization above
// so, this will happen *after*
__attribute__((constructor(102)))
void SomeFunc() {
// do important stuff
if(someVar == 55) {
// do something here that important too
someVar = 44;
}
}
I was able to use these tools to success in the test bed program even with optimizations enabled. The happiness which ensued was short lived when applied to my much larger library. Ultimately, the problem was due to the nature of such a large amount of code and the problematic way in which the variables are brought into existence. It just wasn't reliable to use these mechanisms.
Since I wanted to avoid repeated calls for evaluating the path, i.e.
std::string GetPath() {
Dl_info dl_info;
dladdr((void*)GetPath, &dl_info);
// do wonderful stuff to find the path
return dl_info.dli_fname;
}
The solution turned out to be much simpler than I was trying to make it:
namespace {
std::string PathToProgram() {
Dl_info dl_info;
dladdr((void*)PathToProgram, &dl_info);
std::string pathVar(dl_info.dli_fname);
// do amazing things to find the last slash and remove the shared object
// from that path and append the name of the external daemon
return pathVar;
}
std::string DaemonPath() {
// I'd forgotten that static variables, like this, are initialized
// only once due to compiler magic.
static const std::string pathToDaemon(PathToProgram());
return pathToDaemon;
}
}
As you can see, exactly what I wanted with less confusion. Everything happens only once, except calls to DaemonPath(), and everything remains within the translation unit.
I hope this helps someone who runs into this in the future.
Andy
Maybe you could try running valgrind on your program
In you self posted solution above, you have changed your »interface« (for the code that reads your pathToDaemon / DaemonPath()) from »Accessing a file scoped variable« to »calling a function in anonymous namespace« - so far ok.
But the implementation of DaemonPath() is not done in a thread-safe way. I though that thread-safeness matters, because your are wrote »-lboost_thread« in your question. So you may think about to change the implementation thread-safe. There are many discussions and solutions about singleton pattern and thread-safeness available, e.g.:
Article from Scott Meyers
Stack Overflow
The fact is, that your DaemonPath() will invoked (maybe far) after loading of the library is done. Note, that only the 1st call to the singleton pattern is critical in a multithreaded environment.
As an alternative, you may add a simple »early« call to your DaemonPath() function like this:
namespace {
std::string PathToProgram() {
... your code from above ...
}
std::string DaemonPath() {
... your code from above ...
}
__attribute__((constructor)) void MyPathInit() {
DaemonPath();
}
}
or in a more portable way like this:
namespace {
std::string PathToProgram() {
... your code from above ...
}
std::string DaemonPath() {
... your code from above ...
}
class MyPathInit {
public:
MyPathInit() {
DaemonPath();
}
} myPathInit;
}
Of course, this approach don't makes your singleton pattern thread-safe. But sometimes, there are situations, we can be sure that there are no concurrent thread accesses (e.g. at initialization time when the shared lib is loading). If this conditions matches for you, this approach could be a way to bypass thread-safeness problem without the use of thread locking (mutex...).

What could cause initialization order to corrupt the stack?

Question is in bold below :
This works fine:
void process_batch(
string_vector & v
)
{
training_entry te;
entry_vector sv;
assert(sv.size() == 0);
...
}
However, this causes the assert to fail :
void process_batch(
string_vector & v
)
{
entry_vector sv;
training_entry te;
assert(sv.size() == 0);
...
}
Now I know this issue isn't shrink wrapped, so I'll restrict my question to this: what conditions could cause such a problem ? Specifically: variable initialization getting damaged dependant on appearance order in the stack frame. There are no malloc's or free's in my code, and no unsafe functions like strcpy, memcpy etc... it's modern c++. Compilers used: gcc and clang.
For brevity here are the type's
struct line_string
{
boost::uint32_t line_no;
std::string line;
};
typedef std::vector<boost::uint32_t> line_vector;
typedef std::vector<line_vector> entry_vector;
typedef std::vector<line_string> string_vector;
struct training_body
{
boost::uint32_t url_id;
bool relevant;
};
struct training_entry
{
boost::uint32_t session_id;
boost::uint32_t region_id;
std::vector< training_body> urls;
};
p.s., I am in no way saying that there is a issue in the compiler, it's probably my code. But since I am templatizing some code I wrote a long time ago, the issue has me completely stumped, I don't know where to look to find the problem.
edit
followed nim's suggestion and went through the following loop
shrink wrap the code to what I have shown here, compile and test, no problem.
#if 0 #endif to shrink wrap the main program.
remove headers till it compiles in shrink wrapped form.
remove library links till compiles in shrink wrapped form.
Solution: removing link to protocol buffers gets rid of the problem
The C++ standard guarantees that the following assertion will succeed:
std::vector<anything> Default;
//in your case anything is line_vector and Default is sv
assert(Default.size() == 0);
So, either you're not telling the whole story or you have a broken STL implementation.
OR: You have undefined behavior in your code. The C++ standard gives no guarantees about the behavior of a program which has a construct leading to UB, even prior to reaching that construct.
The usual case for this when one of the created objects writes beyond
its end in the constructor. And the most frequent reason this happens
in code I've seen is that object files have been compiled with different
versions of the header; e.g. at some point in time, you added (or
removed) a data member of one of the classes, and didn't recompile all
of the files which use it.
What might cause the sort of problem you see is a user-defined type with a misbehaving constructor;
class BrokenType {
public:
int i;
BrokenType() { this[1].i = 9999; } // Bug!
};
void process_batch(
string_vector & v
)
{
training_entry te;
BrokenType b; // bug in BrokenType shows up as assert fail in std::vector
entry_vector sv;
assert(sv.size() < 100);
...
}
Do you have the right version of the Boost libaries suited for your platform? (64 bit/32 bit)? I'm asking since the entry_vector object seems to be have a couple of member variables of type boost::uint32_t. I'm not sure what could be the behaviour if your executable is built for one platform and the boost library loaded is of another platform.

Problems with Static Initialization

I'm having some weird issues with static initalization. I'm using a code generator to generate structs and serialization code for a message passing system I wrote. In order to have a way of easily allocating a message based on it's message id I have my code generator ouput something similar to the following for each message type:
MessageAllocator s_InputPushUserControllerMessageAlloc(INPUT_PUSH_USER_CONTROLLER_MESSAGE_ID, (AllocateMessageFunc)Create_InputPushUserControllerMessage);
The MessageAllocator class basically looks like this:
MessageAllocator::MessageAllocator( uint32_t messageTypeID, AllocateMessageFunc func )
{
if (!s_map) s_map = new std::map<uint32_t, AllocateMessageFunc>();
if (s_map->insert(std::make_pair(messageTypeID, func)).second == false)
{
//duplicate key!
ASSERT(false, L"Nooooo!");
}
s_count++;
}
MessageAllocator::~MessageAllocator()
{
s_count--;
if (s_count == 0) delete s_map;
}
where s_map and s_count are static members of MessageAllocator. This works most of the time but sometimes messages are not added to the map. For example, this particular message is not added unless i call Create_InputPushUserControllerMessage() somewhere in my startup code, however other messages work fine. I thought this might be something to do with the linker incorrectly thinking the type is unreferenced and removing it so I disabled that using the /OPT:NOREF switch (I'm using Visual Studio 2008 SP1) but that had no effect.
I'm aware of the problem of the "static initialization order fiasco" but as far as I know the order in which these objects are created shouldn't alter the result so this seems ok to me.
Any insight here would be appreciated.
Put the static into a class so it is a static member of a class
struct InputPushUserControllerMessageAlloc { static MessageAllocator s_obj; };
MessageAllocator InputPushUserControllerMessageAlloc::s_obj(
INPUT_PUSH_USER_CONTROLLER_MESSAGE_ID,
(AllocateMessageFunc)Create_InputPushUserControllerMessage);
The Standard allows it to delay initialization of objects having namespace scope until any function/object from its translation unit is used. If the initialization has side-effect, it can't be optimized out. But that doesn't forbid delaying it.
Not so of objects having class-scope. So that might forbid it optimizing something there.
I would change s_map from a static class member into a static method member:
std::map<uint32_t,AllocateMessageFunc>& MessageAllocator::getMap()
{
// Initialized on first use and destroyed correctly on program termination.
static std::map<uint32_t,AllocateMessageFunc> s_map;
return s_map;
}
MessageAllocator::MessageAllocator( uint32_t messageTypeID, AllocateMessageFunc func )
{
if (getMap().insert(std::make_pair(messageTypeID, func)).second == false)
{
//duplicate key!
ASSERT(false, L"Nooooo!");
}
}
No need for destructor or a count.
If your global objects are in separate DLL's(or shared libs) that are lazy loaded.
This may cause a problem similar to your description.
You are not setting the pointer back to null.
MessageAllocator::~MessageAllocator()
{
s_count--;
if (s_count == 0)
{
delete s_map;
s_map = 0;
}
}
Turns out that the object files containing the static initializers were not included by the linker because nothing referenced any functions in them. To work around this I extern "C"-ed one of the generated functions so that it would have a predictable non-mangled name and then forced a reference to it using a pragma like this for each message
#pragma comment(linker, "/include:Create_GraphicsDynamicMeshCreationMessage")
which I put in the generated header file that is later included in all the other non-generated files. It's MSVC only and kind of hack but I assume I can do something similar on GCC once I eventually port it.