Mapping class to instance - c++

I'm trying to build a simple entity/component system in c++ base on the second answer to this question : Best way to organize entities in a game?
Now, I would like to have a static std::map that returns (and even automatically create, if possible).
What I am thinking is something along the lines of:
PositionComponent *pos = systems[PositionComponent].instances[myEntityID];
What would be the best way to achieve that?

Maybe this?
std::map< std::type_info*, Something > systems;
Then you can do:
Something sth = systems[ &typeid(PositionComponent) ];
Just out of curiosity, I checked the assembler code of this C++ code
#include <typeinfo>
#include <cstdio>
class Foo {
virtual ~Foo() {}
};
int main() {
printf("%p\n", &typeid(Foo));
}
to be sure that it is really a constant. Assembler (stripped) outputted by GCC (without any optimisations):
.globl _main
_main:
LFB27:
pushl %ebp
LCFI0:
movl %esp, %ebp
LCFI1:
pushl %ebx
LCFI2:
subl $20, %esp
LCFI3:
call L3
"L00000000001$pb":
L3:
popl %ebx
leal L__ZTI3Foo$non_lazy_ptr-"L00000000001$pb"(%ebx), %eax
movl (%eax), %eax
movl %eax, 4(%esp)
leal LC0-"L00000000001$pb"(%ebx), %eax
movl %eax, (%esp)
call _printf
movl $0, %eax
addl $20, %esp
popl %ebx
leave
ret
So it actually has to read the L__ZTI3Foo$non_lazy_ptr symbol (I wonder though that this is not constant -- maybe with other compiler options or with other compilers, it is). So a constant may be slightly faster (if the compiler sees the constant at compile time) because you save a read.

You should create some constants (ex. POSITION_COMPONENT=1) and then map those integers to instances.

Related

Hidden parameter in C++ function call [duplicate]

The return value of a function is usually stored on the stack or in a register. But for a large structure, it has to be on the stack. How much copying has to happen in a real compiler for this code? Or is it optimized away?
For example:
struct Data {
unsigned values[256];
};
Data createData()
{
Data data;
// initialize data values...
return data;
}
(Assuming the function cannot be inlined..)
None; no copies are done.
The address of the caller's Data return value is actually passed as a hidden argument to the function, and the createData function simply writes into the caller's stack frame.
This is known as the named return value optimisation. Also see the c++ faq on this topic.
commercial-grade C++ compilers implement return-by-value in a way that lets them eliminate the overhead, at least in simple cases
...
When yourCode() calls rbv(), the compiler secretly passes a pointer to the location where rbv() is supposed to construct the "returned" object.
You can demonstrate that this has been done by adding a destructor with a printf to your struct. The destructor should only be called once if this return-by-value optimisation is in operation, otherwise twice.
Also you can check the assembly to see that this happens:
Data createData()
{
Data data;
// initialize data values...
data.values[5] = 6;
return data;
}
here's the assembly:
__Z10createDatav:
LFB2:
pushl %ebp
LCFI0:
movl %esp, %ebp
LCFI1:
subl $1032, %esp
LCFI2:
movl 8(%ebp), %eax
movl $6, 20(%eax)
leave
ret $4
LFE2:
Curiously, it allocated enough space on the stack for the data item subl $1032, %esp, but note that it takes the first argument on the stack 8(%ebp) as the base address of the object, and then initialises element 6 of that item. Since we didn't specify any arguments to createData, this is curious until you realise this is the secret hidden pointer to the parent's version of Data.
But for a large structure, it has to be on the heap stack.
Indeed so! A large structure declared as a local variable is allocated on the stack. Glad to have that cleared up.
As for avoiding copying, as others have noted:
Most calling conventions deal with "function returning struct" by passing an additional parameter that points the location in the caller's stack frame in which the struct should be placed. This is definitely a matter for the calling convention and not the language.
With this calling convention, it becomes possible for even a relatively simple compiler to notice when a code path is definitely going to return a struct, and for it to fix assignments to that struct's members so that they go directly into the caller's frame and don't have to be copied. The key is for the compiler to notice that all terminating code paths through the function return the same struct variable. If that's the case, the compiler can safely use the space in the caller's frame, eliminating the need for a copy at the point of return.
There are many examples given, but basically
This question does not have any definite answer. it will depend on the compiler.
C does not specify how large structs are returned from a function.
Here's some tests for one particular compiler, gcc 4.1.2 on x86 RHEL 5.4
gcc trivial case, no copying
[00:05:21 1 ~] $ gcc -O2 -S -c t.c
[00:05:23 1 ~] $ cat t.s
.file "t.c"
.text
.p2align 4,,15
.globl createData
.type createData, #function
createData:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %eax
movl $1, 24(%eax)
popl %ebp
ret $4
.size createData, .-createData
.ident "GCC: (GNU) 4.1.2 20080704 (Red Hat 4.1.2-46)"
.section .note.GNU-stack,"",#progbits
gcc more realistic case , allocate on stack, memcpy to caller
#include <stdlib.h>
struct Data {
unsigned values[256];
};
struct Data createData()
{
struct Data data;
int i;
for(i = 0; i < 256 ; i++)
data.values[i] = rand();
return data;
}
[00:06:08 1 ~] $ gcc -O2 -S -c t.c
[00:06:10 1 ~] $ cat t.s
.file "t.c"
.text
.p2align 4,,15
.globl createData
.type createData, #function
createData:
pushl %ebp
movl %esp, %ebp
pushl %edi
pushl %esi
pushl %ebx
movl $1, %ebx
subl $1036, %esp
movl 8(%ebp), %edi
leal -1036(%ebp), %esi
.p2align 4,,7
.L2:
call rand
movl %eax, -4(%esi,%ebx,4)
addl $1, %ebx
cmpl $257, %ebx
jne .L2
movl %esi, 4(%esp)
movl %edi, (%esp)
movl $1024, 8(%esp)
call memcpy
addl $1036, %esp
movl %edi, %eax
popl %ebx
popl %esi
popl %edi
popl %ebp
ret $4
.size createData, .-createData
.ident "GCC: (GNU) 4.1.2 20080704 (Red Hat 4.1.2-46)"
.section .note.GNU-stack,"",#progbits
gcc 4.4.2### has grown a lot, and does not copy for the above non-trivial case.
.file "t.c"
.text
.p2align 4,,15
.globl createData
.type createData, #function
createData:
pushl %ebp
movl %esp, %ebp
pushl %edi
pushl %esi
pushl %ebx
movl $1, %ebx
subl $1036, %esp
movl 8(%ebp), %edi
leal -1036(%ebp), %esi
.p2align 4,,7
.L2:
call rand
movl %eax, -4(%esi,%ebx,4)
addl $1, %ebx
cmpl $257, %ebx
jne .L2
movl %esi, 4(%esp)
movl %edi, (%esp)
movl $1024, 8(%esp)
call memcpy
addl $1036, %esp
movl %edi, %eax
popl %ebx
popl %esi
popl %edi
popl %ebp
ret $4
.size createData, .-createData
.ident "GCC: (GNU) 4.1.2 20080704 (Red Hat 4.1.2-46)"
.section .note.GNU-stack,"",#progbits
In addition, VS2008 (compiled the above as C) will reserve struct Data on the stack of createData() and do a rep movsd loop to copy it back to the caller in Debug mode, in Release mode it will move the return value of rand() (%eax) directly back to the caller
typedef struct {
unsigned value[256];
} Data;
Data createData(void) {
Data r;
calcualte(&r);
return r;
}
Data d = createData();
msvc(6,8,9) and gcc mingw(3.4.5,4.4.0) will generate code like the following pseudocode
void createData(Data* r) {
calculate(&r)
}
Data d;
createData(&d);
gcc on linux will issue a memcpy() to copy the struct back on the stack of the caller. If the function has internal linkage, more optimizations become available though.

Why does GCC 4.8.2 not propagate 'unused but set' optimization?

If a variable is not read from ever, it is obviously optimized out. However, the only store operation on that variable is the result of the only read operation of another variable. So, this second variable should also be optimized out. Why is this not being done?
int main() {
timeval a,b,c;
// First and only logical use of a
gettimeofday(&a,NULL);
// Junk function
foo();
// First and only logical use of b
gettimeofday(&b,NULL);
// This gets optimized out as c is never read from.
c.tv_sec = a.tv_sec - b.tv_sec;
//std::cout << c;
}
Aseembly (gcc 4.8.2 with -O3):
subq $40, %rsp
xorl %esi, %esi
movq %rsp, %rdi
call gettimeofday
call foo()
leaq 16(%rsp), %rdi
xorl %esi, %esi
call gettimeofday
xorl %eax, %eax
addq $40, %rsp
ret
subq $8, %rsp
Edit: The results are the same for using rand() .
There's no store operation! There are 2 calls to gettimeofday, yes, but that is a visible effect. And visible effects are precisely the things that may not be optimized away.

Does C++ optimisation level affect Swig Python module performance

I have a large Swig Python module. The C++ wrapper ends up being about 320,000 LoC (including headers I guess). I currently compile this with -O1, and g++ produces a binary that is 44MiB in size, and takes about 3 minutes to compile it.
If I turn off optimisation (-O0), the binary comes out at 40MiB, and it takes 44s to compile.
Is compiling the wrapper with -O0 going to hurt the performance of the python module significantly? Before I go and profile the performance of the module at different optimisation levels, has anyone done this sort of analysis before or have any insight into whether it matters?
-O0 deactivates all the optimizations performed by gcc. And optimizations matter.
So, without a lot of knowledge in your application, I could suggest that this will hurt the performance of your application.
A usually safe optimization level to use is -O2.
You can check the kind of optimizations performed by GCC in:
http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html.
But at the end, if you want to know exactly you should compile at different levels and profile.
This is bad irrespective of SWIG modules or not. There are many optimisations that happen even with gcc -O1 which you will miss if you prevent them happening.
You can check the difference by inspecting the asm generated by your compiler of choice. Of these the ones I trivially know will be detrimental to SWIG's generated wrapper:
Dead code elimination:
void foo() {
int a = 1;
a = 0;
}
With -O1 this entirely pointless code gets totally removed:
foo:
pushl %ebp
movl %esp, %ebp
popl %ebp
ret
whereas with -O0 it becomes:
foo:
pushl %ebp
movl %esp, %ebp
subl $16, %esp
movl $1, -4(%ebp)
movl $0, -4(%ebp)
leave
ret
Register allocation will be detrimentally impacted in functions with lots of local variables - most SWIG wrapper functions will see a hit from this. It's hard to show a concise example of this though.
Another example, the output from gcc compiling the SWIG wrapper for the prototype:
int foo(unsigned int a, unsigned int b, unsigned int c, unsigned int d);
Generates with -O0:
Java_testJNI_foo:
pushl %ebp
movl %esp, %ebp
subl $88, %esp
movl 16(%ebp), %eax
movl %eax, -48(%ebp)
movl 20(%ebp), %eax
movl %eax, -44(%ebp)
movl 24(%ebp), %eax
movl %eax, -56(%ebp)
movl 28(%ebp), %eax
movl %eax, -52(%ebp)
movl 32(%ebp), %eax
movl %eax, -64(%ebp)
movl 36(%ebp), %eax
movl %eax, -60(%ebp)
movl 40(%ebp), %eax
movl %eax, -72(%ebp)
movl 44(%ebp), %eax
movl %eax, -68(%ebp)
movl $0, -32(%ebp)
movl -48(%ebp), %eax
movl %eax, -28(%ebp)
movl -56(%ebp), %eax
movl %eax, -24(%ebp)
movl -64(%ebp), %eax
movl %eax, -20(%ebp)
movl -72(%ebp), %eax
movl %eax, -16(%ebp)
movl -16(%ebp), %eax
movl %eax, 12(%esp)
movl -20(%ebp), %eax
movl %eax, 8(%esp)
movl -24(%ebp), %eax
movl %eax, 4(%esp)
movl -28(%ebp), %eax
movl %eax, (%esp)
call foo
movl %eax, -12(%ebp)
movl -12(%ebp), %eax
movl %eax, -32(%ebp)
movl -32(%ebp), %eax
leave
ret
Compared to -O1 which generates just:
Java_testJNI_foo:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
movl 40(%ebp), %eax
movl %eax, 12(%esp)
movl 32(%ebp), %eax
movl %eax, 8(%esp)
movl 24(%ebp), %eax
movl %eax, 4(%esp)
movl 16(%ebp), %eax
movl %eax, (%esp)
call foo
leave
ret
With -O1 g++ can generate far smarter code for:
%module test
%{
int value() { return 100; }
%}
%feature("compactdefaultargs") foo;
%inline %{
int foo(int a=value(), int b=value(), int c=value()) {
return 0;
}
%}
The short answer is with optimisations disabled completely GCC generates extremely naive code - this is true of SWIG wrappers as much as any other program, if not more given the style of the automatically generated code.

how do static variables inside functions work?

In the following code:
int count(){
static int n(5);
n = n + 1;
return n;
}
the variable n is instantiated only once at the first call to the function.
There should be a flag or something so it initialize the variable only once.. I tried to look on the generated assembly code from gcc, but didn't have any clue.
How does the compiler handle this?
This is, of course, compiler-specific.
The reason you didn't see any checks in the generated assembly is that, since n is an int variable, g++ simply treats it as a global variable pre-initialized to 5.
Let's see what happens if we do the same with a std::string:
#include <string>
void count() {
static std::string str;
str += ' ';
}
The generated assembly goes like this:
_Z5countv:
.LFB544:
.cfi_startproc
.cfi_personality 0x3,__gxx_personality_v0
.cfi_lsda 0x3,.LLSDA544
pushq %rbp
.cfi_def_cfa_offset 16
movq %rsp, %rbp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
pushq %r13
pushq %r12
pushq %rbx
subq $8, %rsp
movl $_ZGVZ5countvE3str, %eax
movzbl (%rax), %eax
testb %al, %al
jne .L2 ; <======= bypass initialization
.cfi_offset 3, -40
.cfi_offset 12, -32
.cfi_offset 13, -24
movl $_ZGVZ5countvE3str, %edi
call __cxa_guard_acquire ; acquire the lock
testl %eax, %eax
setne %al
testb %al, %al
je .L2 ; check again
movl $0, %ebx
movl $_ZZ5countvE3str, %edi
.LEHB0:
call _ZNSsC1Ev ; call the constructor
.LEHE0:
movl $_ZGVZ5countvE3str, %edi
call __cxa_guard_release ; release the lock
movl $_ZNSsD1Ev, %eax
movl $__dso_handle, %edx
movl $_ZZ5countvE3str, %esi
movq %rax, %rdi
call __cxa_atexit ; schedule the destructor to be called at exit
jmp .L2
.L7:
.L3:
movl %edx, %r12d
movq %rax, %r13
testb %bl, %bl
jne .L5
.L4:
movl $_ZGVZ5countvE3str, %edi
call __cxa_guard_abort
.L5:
movq %r13, %rax
movslq %r12d,%rdx
movq %rax, %rdi
.LEHB1:
call _Unwind_Resume
.L2:
movl $32, %esi
movl $_ZZ5countvE3str, %edi
call _ZNSspLEc
.LEHE1:
addq $8, %rsp
popq %rbx
popq %r12
popq %r13
leave
ret
.cfi_endproc
The line I've marked with the bypass initialization comment is the conditional jump instruction that skips the construction if the variable already points to a valid object.
This is entirely up to the implementation; the language standard says nothing about that.
In practice, the compiler will usually include a hidden flag variable somewhere that indicates whether the static variable has already been instantiated or not. The static variable and the flag will probably be in the static storage area of the program (e.g. the data segment, not the stack segment), not in the function scope memory, so you may have to look around about in the assembly. (The variable can't go on the call stack, for obvious reasons, so it's really like a global variable. "static allocation" really covers all sorts of static variables!)
Update: As #aix points out, if the static variable is initialized to a constant expression, you may not even need a flag, because the initialization can be performed at load time rather than at the first function call. In C++11 you should be able to take advantage of that better than in C++03 thanks to the wider availability of constant expressions.
It's quite likely that this variable will be handled just as ordinary global variable by gcc. That means the initialization will be statically initialized directly in the binary.
This is possible, since you initialize it by a constant. If you initialized it eg. with another function return value, the compiler would add a flag and skip the initialization based on the flag.

gcc stack optimization

Hi I have a question on possible stack optimization by gcc (or g++)..
Sample code under FreeBSD (does UNIX variance matter here?):
void main() {
char bing[100];
..
string buffer = ....;
..
}
What I found in gdb for a coredump of this program is that the address
of bing is actually lower than that buffer (namely, &bing[0] < &buffer).
I think this is totally the contrary of was told in textbook. Could there
be some compiler optimization that re-organize the stack layout in such a
way?
This seems to be only possible explanation but I'm not sure..
In case you're interested, the coredump is due to the buffer overflow by
bing to buffer (but that also confirms &bing[0] < &buffer).
Thanks!
Compilers are free to organise stack frames (assuming they even use stacks) any way they wish.
They may do it for alignment reasons, or for performance reasons, or for no reason at all. You would be unwise to assume any specific order.
If you hadn't invoked undefined behavior by overflowing the buffer, you probably never would have known, and that's the way it should be.
A compiler can not only re-organise your variables, it can optimise them out of existence if it can establish they're not used. With the code:
#include <stdio.h>
int main (void) {
char bing[71];
int x = 7;
bing[0] = 11;
return 0;
}
Compare the normal assembler output:
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $80, %esp
movl %gs:20, %eax
movl %eax, 76(%esp)
xorl %eax, %eax
movl $7, (%esp)
movb $11, 5(%esp)
movl $0, %eax
movl 76(%esp), %edx
xorl %gs:20, %edx
je .L3
call __stack_chk_fail
.L3:
leave
ret
with the insanely optimised:
main:
pushl %ebp
xorl %eax, %eax
movl %esp, %ebp
popl %ebp
ret
Notice anything missing from the latter? Yes, there are no stack manipulations to create space for either bing or x. They don't exist. In fact, the entire code sequence boils down to:
set return code to 0.
return.
A compiler is free to layout local variables on the stack (or keep them in register or do something else with them) however it sees fit: the C and C++ language standards don't say anything about these implementation details, and neither does POSIX or UNIX. I doubt that your textbook told you otherwise, and if it did, I would look for a new textbook.