Difference between static variable in free function and static member function - c++

I have such code:
struct Storage
{
static int GetData()
{
static int global_value;
return global_value++;
}
};
int free_func()
{
static int local_value;
return local_value++;
}
int storage_test_func()
{
return Storage::GetData();
}
Compiling it on OSX:
$ clang++ 1.cpp -shared
And running nm:
$ nm | c++filt
I am getting strange results:
0000000000000f50 T storage_test_func()
0000000000000f30 T free_func()
0000000000000f60 unsigned short Storage::GetData()
0000000000001024 bool free_func()::local_value
0000000000001020 D Storage::GetData()::global_value
U dyld_stub_binder
Two symbols (local_value and global_value) have different linkage! One visible difference is that global_value is defined in static member function and local_value is defined in free function.
Can somebody explain why it happens?
UPD:
After comments reading it looks like I should clarify things.
Maybe using of c++filt was a bad idea. Without it it shows:
$ nm
0000000000000f50 T __Z17storage_test_funcv
0000000000000f30 T __Z9free_funcv
0000000000000f60 t __ZN7Storage7GetDataEv
0000000000001024 b __ZZ9free_funcvE11local_value
0000000000001020 D __ZZN7Storage7GetDataEvE5value
U dyld_stub_binder
So yes. __ZZ9free_funcvE11local_value goes to BSS and __ZZN7Storage7GetDataEvE5value goes to data section.
man nm says that:
If the symbol is local (non-external), the symbol's type is instead represented by the corresponding lowercase letter.
And that what I see. __ZZ9free_funcvE11local_value is marked with lowercase b and __ZZN7Storage7GetDataEvE5value is marked with capital D.
And this is the main part of the question. Why it happens like that?
UPD2
One more way:
$ clang++ -c -emit-llvm 1.cpp
$ llvm-dis 1.bc
shows how these variables are represented internally:
#_ZZ9free_funcvE11local_value = internal global i32 0, align 4
#_ZZN7Storage7GetDataEvE5value = global i32 0, align 4
UPD3
Also there were some concerns about different sections symbols belong to. Putting __ZZ9free_funcvE11local_value to the text section does not change its visibility:
struct Storage
{
static int GetData()
{
static int value;
return value++;
}
};
int free_func()
{
static int local_value = 123;
return local_value++;
}
int storage_test_func()
{
return Storage::GetData();
}
Compiling:
$ clang++ 1.cpp -shared
Checking:
$ nm
Gives:
0000000000000f50 T __Z17storage_test_funcv
0000000000000f30 T __Z9free_funcv
0000000000000f60 t __ZN7Storage7GetDataEv
0000000000001020 d __ZZ9free_funcvE11local_value
0000000000001024 D __ZZN7Storage7GetDataEvE5value
Now both symbols are in data section but still one of those is local and another is global.
And the question is why it happens? Can somebody the logic on such compiler's decision?

There is no difference in a local static variable in static member function, and a local static variable in a free function.
Two symbols (local_value and global_value) have different linkage!
In standard nomenclature and from point of view of the standard, both of the variables have no linkage.
The relevant difference between the functions is not that one is a static member function and the other is not. The relevant difference is that the former is an inline function, but the latter is not.
Non-inline functions can only be defined in one translation unit, and therefore their local static variables do not need to be accessible from other translation units.
Inline functions on the other hand must be defined in every translation unit that use them. And, since the local static must refer to the same object globally, that object must be visible to multiple translation units.

The values will be linked in different storage sections.
000000000001020 D Storage::GetData()::global_value
The Dshows that your variable will be linked to a section which will be initialized. From the nm man page:
"D"
"d" The symbol is in the initialized data section.
The local_value will not be initialized via the c-startup code.
After linking I got:
0804a0d0 b free_func()::local_value
0804a0d8 u Storage::GetData()::global_value
Also from nm man page:
"B"
"b" The symbol is in the uninitialized data section (known as BSS).
"U" The symbol is undefined.
"u" The symbol is a unique global symbol. This is a GNU extension to the standard set of ELF symbol bindings. For such a symbol the
dynamic linker will make sure that in the entire process there is just one symbol with this name and type in use.
The reason is simply that the values will initialized via the standard startup initialization ( data section ) or not ( bss section ). The names for the section are not generally specified but used in common implementations.
You can find a lot Q&A to "When and where will which var initialized or not".
e.g.:
When are static and global variables initialized?
Clarification: ( I hope )
That a variable is not initialized via startup code did not mean that is not initialized. A static variable inside a method/function is typically initialized in the first access of the containing code block. If you want to know how your compiler do this job, have a look in the assembly output.
For gcc you will find some __cxa_ labels around the value initialization which protects your vars from multiple init.

Related

For a function that takes a const struct, does the compiler not optimize the function body?

I have the following piece of code:
#include <stdio.h>
typedef struct {
bool some_var;
} model_t;
const model_t model = {
true
};
void bla(const model_t *m) {
if (m->some_var) {
printf("Some var is true!\n");
}
else {
printf("Some var is false!\n");
}
}
int main() {
bla(&model);
}
I'd imagine that the compiler has all the information required to eliminate the else clause in the bla() function. The only code path that calls the function comes from main, and it takes in const model_t, so it should be able to figure out that that code path is not being used. However:
With GCC 12.2 we see that the second part is linked in.
If I inline the function this goes away though:
What am I missing here? And is there some way I can make the compiler do some smarter work? This happens in both C and C++ with -O3 and -Os.
The compiler does eliminate the else path in the inlined function in main. You're confusing the global function that is not called anyway and will be discarded by the linker eventually.
If you use the -fwhole-program flag to let the compiler know that no other file is going to be linked, that unused segment is discarded:
[See online]
Additionally, you use static or inline keywords to achieve something similar.
The compiler cannot optimize the else path away as the object file might be linked against any other code. This would be different if the function would be static or you use whole program optimization.
The only code path that calls the function comes from main
GCC can't know that unless you tell it so with -fwhole-program or maybe -flto (link-time optimization). Otherwise it has to assume that some static constructor in another compilation unit could call it. (Including possibly in a shared library, but another .cpp that you link with could do it.) e.g.
// another .cpp
typedef struct { bool some_var; } model_t;
void bla(const model_t *m); // declare the things from the other .cpp
int foo() {
model_t model = {false};
bla(&model);
return 1;
}
int some_global = foo(); // C++ only: non-constant static initializer.
Example on Godbolt with these lines in the same compilation unit as main, showing that it outputs both Some var is false! and then Some var is true!, without having changed the code for main.
ISO C doesn't have easy ways to get init code executed, but GNU C (and GCC specifically) have ways to get code run at startup, not called by main. This works even for shared libraries.
With -fwhole-program, the appropriate optimization would be simply not emitting a definition for it at all, as it's already inlined into the call-site in main. Like with inline (In C++, a promise that any other caller in another compilation unit can see its own definition of the function) or static (private to this compilation unit).
Inside main, it has optimized away the branch after constant propagation. If you ran the program, no branch would actually execute; nothing calls the stand-alone definition of the function.
The stand-alone definition of the function doesn't know that the only possible value for m is &model. If you put that inside the function, then it could optimize like you're expecting.
Only -fPIC would force the compiler to consider the possibility of symbol-interposition so the definition of const model_t model isn't the one that is in effect after (dynamic) linking. But you're compiling code for an executable not a library. (You can disable symbol-interposition for a global variable by giving it "hidden" visibility, __attribute__((visibility("hidden"))), or use -fvisibility=hidden to make that the default).

different behavior when linking with static library vs using object files in C++

I'm working with some legacy C++ code that is behaving in a way I don't understand. I'm using the Microsoft compiler but I've tried it with g++ (on Linux) as well—same behavior.
I have 4 files listed below. In essence, it's a registry that's keeping track of a list of members. If I compile all files and link the object files into one program, it shows the correct behavior: registry.memberRegistered is true:
>cl shell.cpp registry.cpp member.cpp
>shell.exe
1
So somehow the code in member.cpp gets executed (which I don't really understand, but OK).
However, what I want is to build a static library from registry.cpp and member.cpp, and link that against the executable built from shell.cpp. But when I do this, the code in member.cpp does not get executed and registry.memberRegistered is false:
>cl registry.cpp member.cpp /c
>lib registry.obj member.obj -OUT:registry.lib
>cl shell.cpp registry.lib
>shell.exe
0
My questions: how come it works the first way and not the second and is there a way (e.g. compiler/linker options) to make it work with the second way?
registry.h:
class Registry {
public:
static Registry& get_registry();
bool memberRegistered;
private:
Registry() {
memberRegistered = false;
}
};
registry.cpp:
#include "registry.h"
Registry& Registry::get_registry() {
static Registry registry;
return registry;
}
member.cpp:
#include "registry.h"
int dummy() {
Registry::get_registry().memberRegistered = true;
return 0;
}
int x = dummy();
shell.cpp:
#include <iostream>
#include "registry.h"
class shell {
public:
shell() {};
void init() {
std::cout << Registry::get_registry().memberRegistered;
};
};
void main() {
shell *cf = new shell;
cf->init();
}
You have been hit by what is popularly known as static initialization order fiasco.
The basics is that the order of initialization of static objects across translation units is unspecified. See this
The call here Registry::get_registry().memberRegistered; in "shell.cpp" may happen before the call here int x = dummy(); in "member.cpp"
EDIT:
Well, x isn't ODR-used. Therefore, the compiler is permitted not to evaluate int x = dummy(); before or after entering main(), or even at all.
Just a quote about it from CppReference (emphasis mine)
It is implementation-defined whether dynamic initialization
happens-before the first statement of the main function (for statics)
or the initial function of the thread (for thread-locals), or deferred
to happen after.
If the initialization is deferred to happen after the first statement
of main/thread function, it happens before the first odr-use of any
variable with static/thread storage duration defined in the same
translation unit as the variable to be initialized. If no variable or function is odr-used from a given translation unit, the non-local variables defined in that translation unit may never be initialized (this models the behavior of an on-demand dynamic library)...
The only way to get your program working as you want is to make sure x is ODR-used
shell.cpp
#include <iostream>
#include "registry.h"
class shell {
public:
shell() {};
void init() {
std::cout << Registry::get_registry().memberRegistered;
};
};
extern int x; //or extern int dummy();
int main() {
shell *cf = new shell;
cf->init();
int k = x; //or dummy();
}
^ Now, your program should work as expected. :-)
This is a result of the way linkers treat libraries: they pick and choose the objects that define symbols left undefined by other objects processed so far. This helps keep executable sizes smaller, but when a static initialization has side effects, it leads to the fishy behavior you've discovered: member.obj / member.o doesn't get linked in to the program at all, although its very existence would do something.
Using g++, you can use:
g++ shell.cpp -Wl,-whole-archive registry.a -Wl,-no-whole-archive -o shell
to force the linker to put all of your library in the program. There may be a similar option for MSVC.
Thanks a lot for all the replies. Very helpful.
So both the solution proposed WhiZTiM (making x ODR-used) and aschepler (forcing linker to include the whole library) work for me. The latter has my preference since it doesn't require any changes to the code. However, there seems to be no MSVC equivalent for --whole-archive.
In Visual Studio I managed to solve the problem as follows (I have a project for the registry static library, and one for the shell executable):
In the shell project add a reference to the registry project;
In the linker properties of the shell project under General set
"Link Library Dependencies" and "Use Library Dependent Inputs" to
"Yes".
If these options are set registry.memberRegistered is properly initialized. However, after studying the compiler/linker commands I concluded that setting these options results in VS simply passing the registry.obj and member.obj files to the linker, i.e.:
>cl /c member.cpp registry.cpp shell.cpp
>lib /OUT:registry.lib member.obj registry.obj
>link /OUT:shell.exe "registry.lib" shell.obj member.obj registry.obj
>shell.exe
1
To my mind, this is essentially the first approach to my original question. If you leave out registry.lib in the linker command it works fine as well.
Anyway, for now, it's good enough for me.
I'm working with CMake so now I need to figure out how to adjust CMake settings to make sure that the object files get passed to the linker? Any thoughts?

Does static function hide non-static function with the same name?

I tried to look this up, but did not find it anywhere. So here's the question:
Static functions in C/C++ can be used to "make them invisible to the outer world". Great, when having two same-named static functions in two different compiled units (.c files), it makes me sure that I call the right one. But can I also be sure that I call my local static function when there exists a same-named non-static function somewhere in the project or libraries? That is, does the static function locally hide the non-static one?
Sure I can test it (and I did) but I want to know whether this behaviour has fixed definition in C/C++. Thanks.
Edit: Simplified example code which caused unexpected behaviour to me. The question is about the fix of the problem (suppose I cannot change the library).
In mylib.c:
#include "mylib.h"
int send(void * data, int size);
...
int send(void * data, int size) {
return send_message(queueA, data, size);
}
void libfunc(void) {
send(str, strlen(str));
}
In mylib.h:
// only libfunc is declared here
void libfunc(void);
In myprog.c:
#include "mylib.h"
int send(void * data, int size);
...
int send(void * data, int size) {
return send_message(queueB, data, size);
}
void progfunc(void) {
// expected to send a message to queueB
// !!! but it was sent to queueA instead !!!
send(str, strlen(str));
}
Compiled mylib.c + further files -> mylib.a
Compiled myprog.c -> myprog.o
Linked myprog.o + mylib.a -> myprog
You'd get a compilation error because functions have default external linkage, thus the new static function would result in a conflict of linkage specifiers.
If the declaration of the non-static function isn't visible, the static one will be called:
void foo(); //external linkage
static void foo() {}; //internal linkage and error
It does not hide functions with same name declared in the same scope. However you may not have a function with the same signature declared as having internal and external linkage.

global const variable definition - access through extern in c++

I read some answers about this topic, but I am still not sure:
In C++ a global const variable definition is automatically static. However I can access it from another cpp-file through extern:
// module.cpp
const int i = 0;
and
// main.cpp
extern const int i;
int main ()
{
if (i > 10)
return 0;
else
return 1;
}
Why is this possible (accessing an object with internal linkage from another module)? Normally I should be forced to define i as extern const int i = 0 in module.cpp, to have an explicitely global but non-static const, or?
In comparison, this is not possible:
// module.cpp
static int i = 0;
and
// main.cpp
extern int i;
int main ()
{
i = 10; // but read-only access like (i > 10) would be possible!
return 0;
}
So is the answer that: yes, you can access internal linked objects from other modules, but only for read (so always with const)?
Edit:
Sorry, but I made a mistake: in my original code I just tried an expression without effect (in both examples):
extern const int i; // or extern int i for second example
int main ()
{
i>10;
return 0;
}
I thought, that it behaves the same, as if program flow or data was dependent of this expression, but in fact it does not! The compiler seems to just cut out this effectless expression, so that the linker does not see it! So all is alright: in the first example i must be indeed defined extern const int i = 0 in module.cpp, and in the second example i cannot be accessed at all (unless in effectless expression). Compiler is VC++2010.
Edit2:
However, now I don't understand why this is possible:
// module.cpp
extern const int i = 0;
and
// main.cpp
int i = 99;
int main ()
{
bool b = i>10;
return 0;
}
i have both external linkage. But no error. When in module.cpp I define int i = 0, then error (multiple symbols). Why?
As for me it looks like a compiler bug. Constant variable i is not defined It is only declared in main.cpp. As for variable i in module.cpp then it has internal linkage and shall not be accessible outside the module.
Relative to your addition to the original post then the compiler has nothing common with this situation. It is linker that checks whether there are duplicate external symbols. I think it decided that if one variable has qualifier const and other has no it then there are two different variables. I think that it is implementation defined whether the linker will issue an error or not. Moreover it can have some options that could control the behaviour of the linker in such situations.

Static function access in other files

Is there any chance that a function defined with static can be accessed outside the file scope?
It depends upon what you mean by "access". Of course, the function cannot be called by name in any other file since it's static in a different file, but you have have a function pointer to it.
$ cat f1.c
/* static */
static int number(void)
{
return 42;
}
/* "global" pointer */
int (*pf)(void);
void initialize(void)
{
pf = number;
}
$ cat f2.c
#include <stdio.h>
extern int (*pf)(void);
extern void initialize(void);
int main(void)
{
initialize();
printf("%d\n", pf());
return 0;
}
$ gcc -ansi -pedantic -W -Wall f1.c f2.c
$ ./a.out
42
It could be called from outside the scope via function pointer.
For example, if you had:
static int transform(int x)
{
return x * 2;
}
typedef int (*FUNC_PTR)(int);
FUNC_PTR get_pointer(void)
{
return transform;
}
then a function outside the scope can call get_pointer() and use the returned function pointer to call transform.
No, unless there's a bug in the compiler. Normally the static function code is not tagged with a name used for exporting the function in the object file, so it is not presented to the linker and it just can't link to it.
This of course only applies to calling the function by name. Other code within the same file can get the function address and pass it into a non-static function in another file and then the function from another file can call your static function.
It cannot be accessed outside a file by it's name. But, you can as well assign it to function pointer and use it wherever you want.
"Accessed"? It depends on what you mean by this term. I assume when you say "static function" you are talking about standalone function declared static (i.e. declared with internal linkage) as opposed to static class member functions in C++, since the latter are obviosly and easily accessible from anywhere.
Now, a standalone function declared static has internal linkage. It cannot be linked to from any other translation unit. Or, putting it differently, it cannot be referred to by name from any other translation unit. If that's what you meant by "access from outside the file scope", then no, it can't be done.
However, if the other translation units somehow get a pointer to that function (i.e. if you somehow allow that pointer to "leak" into the ouside world), then anybody can still call that function by making an idirect call and thus "access" it. For example, if you declare
static void foo_static(void) {
}
extern void (*foo_ptr)(void) = foo_static;
then in any other translation unit the user will be able to do
extern void (*foo_ptr)(void);
foo_ptr();
and the call will go to your foo_static function. I don't know if that kind of access qualifies as "access" in your question.
Following the standard, a static function cannot be accessed outside of the scope of the file by name because it is subject to internal linkage. It's name is not exported, and not provided to the linker. However, it can still be accessed and called by function pointer, like any other function.
Only with trickery. The function is generally not visible to the linker so it won't let you do it.
But, if you provide a function inside the same compilation unit (as the static function) which returns the address of that function:
In main.c:
#inclde <stdio.h>
int (*getGet7(void))(void);
int main (void) {
int (*fn)(void) = getGet7();
printf ("Result is: %d\n", fn());
return 0;
}
In hidden.c:
static int get7 (void) {
return 7;
}
int (*getGet7(void)) (void) {
return get7;
}
This will result in the static function get7 being called.
pax> gcc -o demo main.c hidden.c ; ./demo
Result is: 7
No, the purpose of the keyword static is to limit the scope of the function name to the file.