(Yes, I know that one machine instruction usually doesn't matter. I'm asking this question because I want to understand the pimpl idiom, and use it in the best possible way; and because sometimes I do care about one machine instruction.)
In the sample code below, there are two classes, Thing and OtherThing. Users would include "thing.hh".
Thing uses the pimpl idiom to hide its implementation.
OtherThing uses a C style: non-member functions that take and return pointers. This style produces slightly better machine code. I'm wondering: is there a way to use C++ style (i.e., make the functions into member functions) and still save the machine instruction? I like this style because it doesn't pollute the namespace outside the class.
Note: I'm only looking at calling member functions (in this case, calc). I'm not looking at object allocation.
Below are the files, commands, and the machine code, on my Mac.
thing.hh:
class ThingImpl;
class Thing
{
ThingImpl *impl;
public:
Thing();
int calc();
};
class OtherThing;
OtherThing *make_other();
int calc(OtherThing *);
thing.cc:
#include "thing.hh"
struct ThingImpl
{
int x;
};
Thing::Thing()
{
impl = new ThingImpl;
impl->x = 5;
}
int Thing::calc()
{
return impl->x + 1;
}
struct OtherThing
{
int x;
};
OtherThing *make_other()
{
OtherThing *t = new OtherThing;
t->x = 5;
return t; // was missing: falling off the end of a non-void function is undefined behavior
}
int calc(OtherThing *t)
{
return t->x + 1;
}
main.cc (just to test the code actually works...)
#include "thing.hh"
#include <cstdio>
int main()
{
Thing *t = new Thing;
printf("calc: %d\n", t->calc());
OtherThing *t2 = make_other();
printf("calc: %d\n", calc(t2));
}
Makefile:
all: main
thing.o : thing.cc thing.hh
	g++ -fomit-frame-pointer -O2 -c thing.cc
main.o : main.cc thing.hh
	g++ -fomit-frame-pointer -O2 -c main.cc
main: main.o thing.o
	g++ -O2 -o $@ $^
clean:
	rm *.o
	rm main
Run make and then look at the machine code. On the Mac I use otool -tv thing.o | c++filt. On Linux, objdump -d -C thing.o does the same job (-C demangles the C++ names). Here is the relevant output:
Thing::calc():
0000000000000000 movq (%rdi),%rax
0000000000000003 movl (%rax),%eax
0000000000000005 incl %eax
0000000000000007 ret
calc(OtherThing*):
0000000000000010 movl (%rdi),%eax
0000000000000012 incl %eax
0000000000000014 ret
Notice the extra instruction because of the pointer indirection. The first function looks up two fields (impl, then x), while the second only needs to get x. What can be done?
One instruction is rarely a thing to spend much time worrying over. Firstly, the compiler may cache the pImpl in a more complex use case, thus amortising the cost in a real-world scenario. Secondly, pipelined architectures make it almost impossible to predict the real cost in clock cycles. You'll get a much more realistic idea of the cost if you run these operations in a loop and time the difference.
Not too hard: just use the same technique inside your class. Any halfway decent optimizer will inline the trivial wrapper.
class ThingImpl;
class Thing
{
ThingImpl *impl;
static int calc(ThingImpl*);
public:
Thing();
int calc() { return calc(impl); }
};
There's the nasty way, which is to replace the pointer to ThingImpl with a big-enough (and suitably aligned) array of unsigned chars, placement-new the ThingImpl into it, reinterpret_cast the array to reach the object, and explicitly call its destructor.
Or you could just pass the Thing around by value, since it should be no larger than the pointer to the ThingImpl, though may require a little more than that (reference counting of the ThingImpl would defeat the optimisation, so you need some way of flagging the 'owning' Thing, which might require extra space on some architectures).
I disagree with your comparison: you are not comparing the same two things.
#include "thing.hh"
#include <cstdio>
int main()
{
Thing *t = new Thing; // 1
printf("calc: %d\n", t->calc());
OtherThing *t2 = make_other(); // 2
printf("calc: %d\n", calc(t2));
}
At // 1 you in fact have 2 calls to new: one explicit and the other implicit (done by the constructor of Thing).
At // 2 you have only 1 new, implicit (inside make_other).
You should allocate Thing on the stack. It probably would not change the double-dereferencing instruction... but it could change its cost (by removing a cache miss).
However the main point is that Thing manages its memory on its own, so you can't forget to delete the actual memory, while you definitely can with the C-style method.
I would argue that automatic memory handling is worth an extra memory instruction, specifically because as it's been said, the dereferenced value will probably be cached if you access it more than once, thus amounting to almost nothing.
Correctness is more important than performance.
Let the compiler worry about it. It knows far more about what is actually faster or slower than we do. Especially on such a minute scale.
Having items in classes has far, far more benefits than just encapsulation. PIMPL's a great idea, if you've forgotten how to use the private keyword.
I have a program which nearly immediately finishes with -O0 on gcc, but hangs forever with gcc and -O3. It also exits immediately if I remove the [[gnu::pure]] function attribute, even though the function does not modify global state. The program is in three files:
thread.hpp
#include <atomic>
extern ::std::atomic<bool> stopthread;
extern void threadloop();
[[gnu::pure]] extern int get_value_plus(int x);
thread.cpp
#include <thread>
#include <atomic>
#include "thread.hpp"
namespace {
::std::atomic<int> val;
}
::std::atomic<bool> stopthread;
void threadloop()
{
while (!stopthread.load())
{
++val;
}
}
[[gnu::pure]] int get_value_plus(int x)
{
return val.load() + x;
}
main.cpp
#include <thread>
#include "thread.hpp"
int main()
{
stopthread.store(false);
::std::thread loop(threadloop);
while ((get_value_plus(5) + get_value_plus(5)) % 2 == 0)
;
stopthread.store(true);
loop.join();
return 0;
}
Is this a compiler bug? A lack of documentation for the proper caveats to using [[gnu::pure]]? A misreading of the documentation for [[gnu::pure]] such that I've coded a bug?
I have a program which nearly immediately finishes with -O0 on gcc, but hangs forever with gcc and -O3
Yes, because the program gets compiled down to an infinite loop when optimizations are enabled.
Is this a compiler bug? A lack of documentation for the proper caveats to using [[gnu::pure]]? A misreading of the documentation for [[gnu::pure]] such that I've coded a bug?
It isn't a compiler bug. get_value_plus is not a pure function:
[[gnu::pure]] int get_value_plus(int x)
{
return val.load() + x;
}
since the return value can change at any time (for the same x), because val is expected to be modified by the other thread.
The compiler, however, thinking that get_value_plus will always return the same value for the same argument, will perform common subexpression elimination (CSE) and will therefore assume that this:
while ((get_value_plus(5) + get_value_plus(5)) % 2 == 0);
can be written as:
int x = get_value_plus(5);
while ((x + x) % 2 == 0);
Which is indeed an infinite loop regardless of the value of x, because x + x is always even:
while (true);
Please see the GCC documentation on pure for more details.
In general, avoid using optimization hints unless they are well understood!
In this case, the misunderstanding is that pure functions are allowed to read global memory, but not if that memory can be changed between calls by someone other than the caller:
However, functions declared with the pure attribute can safely read any non-volatile objects, and modify the value of objects in a way that does not affect their return value or the observable state of the program.
As it turns out, I misread the documentation. From the online documentation about the pure attribute in gcc:
The pure attribute prohibits a function from modifying the state of the program that is observable by means other than inspecting the function’s return value. However, functions declared with the pure attribute can safely read any non-volatile objects, and modify the value of objects in a way that does not affect their return value or the observable state of the program.
and a different paragraph:
Some common examples of pure functions are strlen or memcmp. Interesting non-pure functions are functions with infinite loops or those depending on volatile memory or other system resource, that may change between consecutive calls (such as the standard C feof function in a multithreading environment).
These two paragraphs make it clear that I've been lying to the compiler, and the function I wrote does not qualify as being 'pure' because it depends on a variable that might change at any time.
The reason I asked this question is because the answers to this question: __attribute__((const)) vs __attribute__((pure)) in GNU C didn't address this problem at all (at the time I asked my question anyway). And a recent C++ Weekly episode had a comment asking about threads and pure functions. So it's clear there's some confusion out there.
So the criteria for a function that qualifies for this marker is that it must not modify global state, though it is allowed to read it. But, if it does read global state, it is not allowed to read any global state that could be considered 'volatile', and this is best understood as state that might change between two immediately successive calls to the function, i.e. if the state it's reading can change in a situation like this:
f();
f();
Google wrote on the Android NDK guides site:
Memory allocated in one library, and freed in the other, causing memory leakage or heap corruption.
Why?
Is it always correct?
EDIT
As @Galik wrote, the context of this quote is:
In C++, it is not safe to define more than one copy of the same function or object in a single program. This is one aspect of the One Definition Rule present in the C++ standard.
When using a static runtime (and static libraries in general), it is easy to accidentally break this rule. For example, the following application breaks this rule:
...
In this situation, the STL, including and global data and static constructors, will be present in both libraries. The runtime behavior of this application is undefined, and in practice crashes are very common. Other possible issues include:
Memory allocated in one library, and freed in the other, causing memory leakage or heap corruption.
Exceptions raised in libfoo.so going uncaught in libbar.so, causing your app to crash.
Buffering of std::cout not working properly.
One possible reason why it's considered a mistake is because usually allocation comes with a certain initialization, and deallocation with some destruction logic.
Theory:
The main danger is mismatching initialization / destruction logic.
Let's look at two different STL versions as two different and separate libraries.
Consider this: Each library lets you allocate / deallocate something. Upon resource acquisition, each library does some house-keeping on that thing in its own way, which is encapsulated (read: you don't know about it, and don't need to).
What happens if the housekeeping each does is significantly different?
Example:
#include <algorithm>
#include <vector>
class Foo
{
private:
int x;
public:
Foo() : x(42) {}
};
namespace ModuleA
{
Foo* createAFoo()
{
return new Foo();
}
void deleteAFoo(Foo* foo)
{
if(foo != nullptr)
delete foo;
}
}
namespace ModuleB
{
std::vector<Foo*> all_foos;
Foo* createAFoo()
{
Foo* foo = new Foo();
all_foos.push_back(foo);
return foo;
}
void deleteAFoo(Foo* foo)
{
if(foo != nullptr)
{
std::vector<Foo*>::iterator position = std::find(all_foos.begin(), all_foos.end(), foo);
if (position != all_foos.end())
{
all_foos.erase(position);
}
delete foo;
}
}
}
Question: What happens if we do the following?
Foo* foo = ModuleB::createAFoo();
ModuleA::deleteAFoo(foo);
Answer: ModuleB now has a dangling pointer.
This can cause all sorts of scary, hard-to-debug issues down the line.
We're also not making all_foos smaller, which may be considered a memory leak (the size of a pointer each time).
Question: What happens if we do the following?
Foo* foo = ModuleA::createAFoo();
ModuleB::deleteAFoo(foo);
Answer: Looks like... nothing bad happens!
But what if I removed the if (position != all_foos.end()) check? Then erase would be called with the end iterator, which is undefined behavior, and we'd have a problem.
And an STL might do that in the name of optimization, so...
I wrote that section of the doc. I've had to debug an issue where one of the standard stream objects (cout or similar) was linked into two libraries, resulting in two distinct instances of the object. The constructor for the object was run twice, but both times on the same instance. One object was double-initialized, the other was uninitialized. When the unconstructed object was used, it would attempt to access some uninitialized memory and crash.
There's really no limit to the strangeness of undefined behavior. It's entirely possible that the bug I'm remembering was unique to the version of the compiler, linker, or loader that we were using at the time.
EDIT: Here's a repro case:
// foo.cpp
#include <stdio.h>
class Foo {
public:
Foo() { printf("this: %p\n", this); }
};
Foo foo;
// main.cpp
int main() {
}
Build with:
$ clang++ --version
clang version 7.0.0 (trunk 330210)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/local/bin
$ clang++ foo.cpp -shared -o libfoo.so
$ clang++ foo.cpp -shared -o libbar.so
$ clang++ main.cpp -L. -lfoo -lbar -rpath '$ORIGIN'
Both libfoo and libbar will be loaded, and each have their own copy of the object. The constructor will be run twice, but as you can see only one instance of the object has its constructor run; it just runs twice.
$ ./a.out
this: 0x7f9475d48031
this: 0x7f9475d48031
Using policy-based design, an EncapsulatedAlgorithm:
#include <iostream>
template< typename Policy>
class EncapsulatedAlgorithm : public Policy
{
double x = 0;
public:
using Policy::subCalculate;
void calculate()
{
Policy::subCalculate(x);
}
// No protected destructor here: main() below creates these objects directly,
// which requires the destructor to be public.
};
may have a policy Policy that performs a sub-calculation. The sub-calculation is not necessary for the algorithm: it can be used in some cases to speed up algorithm convergence. So, to model that, let's say there are three policies.
One that just "logs" something:
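struct loggedCalculation // matches the typedef below; a global "log" could also clash with ::log from <cmath>
{
static void subCalculate(double& /*x*/)
{
std::cout << "Doing the calculation" << std::endl;
}
};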
one that calculates:
struct calculate
{
static void subCalculate(double& x)
{
x = x * x;
}
};
and one to bring them all and in the darkness bind them :D - that does absolutely nothing:
struct doNothing
{
static void subCalculate(double& x)
{
// Do nothing.
}
};
Here is the example program:
typedef EncapsulatedAlgorithm<doNothing> nothingDone;
typedef EncapsulatedAlgorithm<calculate> calculationDone;
typedef EncapsulatedAlgorithm<loggedCalculation> calculationLogged;
int main(int argc, const char *argv[])
{
nothingDone n;
n.calculate();
calculationDone c;
c.calculate();
calculationLogged l;
l.calculate();
return 0;
}
And here is the live example. I tried examining the assembly code produced by gcc with the optimization turned on:
g++ -S -O3 -std=c++11 main.cpp
but I do not know enough about Assembly to interpret the result with certainty - the resulting file was tiny and I was unable to recognize the function calls, because the code of the static functions of all policies was inlined.
What I could see is that when optimization is disabled, there is a call within the main function, followed by a leave, related to 'doNothing::subCalculate':
call _ZN9doNothing12subCalculateERd
leave
Here are my questions:
Where do I start to learn in order to be able to read what g++ -S spews out?
Is the empty function optimized away or not and where in main.s are those lines?
Is this design O.K.? Usually, implementing a function that does nothing is a bad thing, as the interface is saying something completely different (subCalculate instead of doNothing), but in the case of policies, the policy name clearly states that the function will not do anything. Otherwise I need to do type traits stuff like enable_if, etc, just to exclude a single function call.
I went to http://assembly.ynh.io/, which shows assembly output. I entered this code:
template< typename Policy>
struct EncapsulatedAlgorithm : public Policy
{
void calculate(double& x)
{
Policy::subCalculate(x);
}
};
struct doNothing
{
static void subCalculate(double& x)
{
}
};
void func(double& x) {
EncapsulatedAlgorithm<doNothing> a;
a.calculate(x);
}
and got these results:
.Ltext0:
.globl _Z4funcRd
_Z4funcRd:
.LFB2:
.cfi_startproc #void func(double& x) {
.LVL0:
0000 F3 rep #not sure what this is
0001 C3 ret #}
.cfi_endproc
.LFE2:
.Letext0:
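Well, I only see two opcodes in the assembly there: rep and ret. Together they form rep ret, a two-byte return that GCC emits to avoid a branch-prediction penalty on some AMD processors, so the function body is effectively empty. It appears that G++ can easily optimize out the function bodies.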
Where do I start to learn in order to be able to read what g++ -S spews out?
This site's not for recommending reading material. Google "x86 assembly language".
Is the empty function optimized away or not and where in main.s are those lines?
It will have been when the optimiser was enabled, so there won't be any lines in the generated .s file. You've already found the call in the unoptimised output....
In fact, even the policy that's meant to do a multiplication may be removed, as the compiler should be able to work out that you're not using the resultant value. Add code to print the value of x, and seed x from some value that can't be known at compile time (in a little experimental program like this it's often convenient to use argc); then you'll be forcing the compiler to at least leave in the functionally significant code.
Is this design O.K.?
That depends on a lot of things (like whether you want to use templates given the implementation needs to be exposed in the header file, whether you want to deal with having distinct types for every instantiation...), but you're implementing the design correctly.
Usually, implementing a function that does nothing is a bad thing, as the interface is saying something completely different (subCalculate instead of doNothing), but in the case of policies, the policy name clearly states that the function will not do anything. Otherwise I need to do type traits stuff like enable_if, etc, just to exclude a single function call.
You may want to carefully consider your function names: do_any_necessary_calculations(), ensure_exclusivity() instead of lock_mutex(), after_each_value() instead of print_breaks(), etc.
The situation I have is that I am trying to initialize a file scoped variable, std::string, in a shared object constructor. It will probably be more clear in code:
#include <string>
#include <iostream>
#include <stdexcept>
#include <dlfcn.h>
#include <cstring>
static std::string pathToDaemon; // daemon should always be in the same dir as my *.so
__attribute__((constructor))
static void SetPath()
{
std::string::size_type lastSlash(0); // same type as npos, so the comparison below is exact
Dl_info dl_info;
memset(&dl_info, 0, sizeof(dl_info));
if((dladdr((void*)SetPath, &dl_info)) == 0)
throw std::runtime_error("dladdr failed");
pathToDaemon = dl_info.dli_fname; // **whoops, segfault here**
lastSlash = pathToDaemon.find_last_of('/');
if(std::string::npos == lastSlash)
{
// no slash, but in this dir
pathToDaemon = "progd";
}
else
{
pathToDaemon.erase(pathToDaemon.begin() + (lastSlash+1), pathToDaemon.end());
pathToDaemon.append("progd");
}
std::cout << "DEBUG: path to daemon is: " << pathToDaemon << std::endl;
}
I have a very simple program that does this same thing: a proof-of-concept test driver, if you will. Its code looks just like this: a "shared object ctor" which uses dladdr() to store off the path of the *.so file when the file is loaded.
Modifications I've tried:
namespace {
std::string pathToDaemon;
__attribute__((constructor))
void SetPath() {
// function def
}
}
or
static std::string pathToDaemon;
__attribute__((constructor))
void SetPath() { // this function not static
// function def
}
and
std::string pathToDaemon; // variable not static
__attribute__((constructor))
void SetPath() { // this function not static
// function def
}
The example you see above sits in a file that is compiled into both a static library and a shared object. The compilation process:
options for static.a: -std=c++0x -c -Os.
options for shared.so: -Wl,--whole-archive /path/to/static.a -Wl,--no-whole-archive -lz -lrt -ldl -Wl,-Bstatic -lboost_python -lboost_thread -lboost_regex -lboost_system -Wl,-Bdynamic -fPIC -shared -o mymodule.so [a plethora of more objects which wrap into python the static stuff]
The hoops I have to jump through in the bigger project make a much more complicated build process than my little test driver program requires. This makes me think that the problem lies there. Can anyone please shed some light on what I'm missing?
Thanks,
Andy
I think it's worth giving the answer that I've found. The problem was due to the complex nature of the shared library loading. I discovered after some digging that I could reproduce the problem in my test bed program when compiling the code with optimizations enabled. That confirmed the hypothesis that the variable truly hadn't been constructed yet when the constructor function accessed it.
GCC provides some extra attributes for C++ which allow developers to force certain things to happen at particular times during initialization. More precisely, they let certain things take place in a particular order rather than at particular times.
For example:
#include <string>
// init_priority only applies to namespace-scope objects of class type, so use a std::string
std::string someVar("55") __attribute__((init_priority(101)));
// This function is a lower priority than the initialization above
// so, this will happen *after*
__attribute__((constructor(102)))
void SomeFunc() {
// do important stuff
if(someVar == "55") {
// do something here that's important too
someVar = "44";
}
}
I was able to use these tools successfully in the test bed program, even with optimizations enabled. The happiness which ensued was short-lived once I applied them to my much larger library. Ultimately, the problem was due to the sheer amount of code and the problematic way in which the variables are brought into existence. It just wasn't reliable to use these mechanisms.
I wanted to avoid repeatedly evaluating the path with a function like this:
std::string GetPath() {
Dl_info dl_info;
dladdr((void*)GetPath, &dl_info);
// do wonderful stuff to find the path
return dl_info.dli_fname;
}
The solution turned out to be much simpler than I was trying to make it:
namespace {
std::string PathToProgram() {
Dl_info dl_info;
dladdr((void*)PathToProgram, &dl_info);
std::string pathVar(dl_info.dli_fname);
// do amazing things to find the last slash and remove the shared object
// from that path and append the name of the external daemon
return pathVar;
}
std::string DaemonPath() {
// I'd forgotten that static variables, like this, are initialized
// only once due to compiler magic.
static const std::string pathToDaemon(PathToProgram());
return pathToDaemon;
}
}
As you can see, this does exactly what I wanted, with less confusion. Everything happens only once (except the calls to DaemonPath() themselves), and everything remains within the translation unit.
I hope this helps someone who runs into this in the future.
Andy
Maybe you could try running valgrind on your program.
In your self-posted solution above, you have changed your »interface« (for the code that reads pathToDaemon / DaemonPath()) from »accessing a file-scoped variable« to »calling a function in an anonymous namespace«; so far OK.
But the implementation of DaemonPath() may not be thread-safe: the standard only guarantees thread-safe initialization of function-local statics from C++11 on. I thought that thread safety matters, because you wrote »-lboost_thread« in your question, so you may want to make the implementation thread-safe. There are many discussions and solutions about the singleton pattern and thread safety available, e.g.:
Article from Scott Meyers
Stack Overflow
The fact is that your DaemonPath() will be invoked (possibly long) after the library has finished loading. Note that only the first call to the singleton is critical in a multithreaded environment.
As an alternative, you may add a simple »early« call to your DaemonPath() function like this:
namespace {
std::string PathToProgram() {
... your code from above ...
}
std::string DaemonPath() {
... your code from above ...
}
__attribute__((constructor)) void MyPathInit() {
DaemonPath();
}
}
or in a more portable way like this:
namespace {
std::string PathToProgram() {
... your code from above ...
}
std::string DaemonPath() {
... your code from above ...
}
class MyPathInit {
public:
MyPathInit() {
DaemonPath();
}
} myPathInit;
}
Of course, this approach doesn't make your singleton pattern thread-safe. But sometimes we can be sure that there are no concurrent thread accesses (e.g. at initialization time, while the shared lib is loading). If this condition holds for you, this approach could be a way to sidestep the thread-safety problem without thread locking (mutex...).
Question: How can I access a member variable in assembly from within a non-POD class?
Elaboration:
I have written some inline assembly code for a class member function but what eludes me is how to access class member variables. I've tried the offsetof macro but this is a non-POD class.
The current solution I'm using is to assign a pointer from global scope to the member variable, but it's a messy solution and I was hoping there was something better that I don't know about.
note: I'm using the G++ compiler. A solution with Intel syntax Asm would be nice but I'll take anything.
example of what I want to do (intel syntax):
class SomeClass
{
int* var_j;
void set4(void)
{
asm("mov var_j, 4"); // sets pointer SomeClass::var_j to address "4"
}
};
current hackish solution:
int* global_j;
class SomeClass
{
int* var_j;
void set4(void)
{
asm("mov global_j, 4"); // sets pointer global_j to address "4"
var_j = global_j; // copy it back to member variable :(
}
};
Those are crude examples but I think they get the point across.
This is all you need:
__asm__ __volatile__ ("movl $4,%[v]" : [v] "+m" (var_j)) ; // movl stores 4 bytes: a whole pointer on 32-bit x86; use movq on x86-64
Edited to add: The assembler does accept Intel syntax, but the compiler doesn't know it, so this trick won't work using Intel syntax (not with g++ 4.4.0, anyway).
class SomeClass
{
int* var_j;
void set4(void)
{
__asm__ __volatile__("movl $4, (%0,%1)"
:
: "r"(this), "r"((char*)&var_j-(char*)this)
: "memory" // the asm writes var_j, which the compiler can't see from the operands
);
}
};
This might work too, saving you one register:
__asm__ __volatile__("movl $4, %c1(%0)" // %c1 prints the constant without the $ prefix
:
: "r"(this), "i"((char*)&var_j-(char*)this)
: "memory"
);
In fact, since the offset of var_j wrt. this should be known at compile time, the second option is the way to go, even if it requires some tweaking to get it working. (I don't have access to a g++ system right now, so I'll leave this up to you to investigate.)
And don't ever underestimate the importance of __volatile__. It took more of my time than I'd have liked to track down bugs that appeared because I missed the volatile keyword and the compiler took it upon itself to do strange things with my assembly.