Consider the following translation unit:
extern "C" int x = 1;
I know that C-mangling an int doesn't mean much, and that int x = 1 already guarantees external linkage, but this should work. And it does; the thing is, GCC warns about it, while clang doesn't. See this on GodBolt.
Which compiler is "right"?
How strongly, if at all, would you advise avoiding such code?
According to the standard at [dcl.link.2]
Linkage between C++ and non-C++ code fragments can be achieved using a linkage-specification:
linkage-specification:
extern string-literal { declaration-seq }
extern string-literal declaration
Both brace-enclosed and as a prefix to a single declaration are valid.
Which compiler is correct? I'd say both, as they both accept the code and do the correct thing with it.
GCC's warning is just that, a warning. This is the same kind of warning as a "named parameter is not used anywhere": Information about something unusual, but not affecting the behavior of the program.
Both compilers are correct to compile the code without error.
gcc's warning 'x' initialized and declared 'extern' makes sense as well. "extern" declarations, including extern "C" ones, usually go in header files. Initializing in the header file will get you a multiple-definitions error at link time.
If you use the usual pattern, gcc will not give you any warning:
(in header) extern "C" int x;
(in one source file) int x = 1;
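The usual pattern above can be sketched in one listing. This is only an illustration: the file names shared.h and shared.c and the accessor get_x are hypothetical, and the __cplusplus guard is an assumption added so that the same header can be read by both C and C++ compilers.

```c
/* ---- shared.h (hypothetical header, included by every user) ---- */
#ifdef __cplusplus
extern "C" {            /* only seen by C++ compilers */
#endif

extern int x;           /* declaration only: no initializer, no storage */
int get_x(void);        /* hypothetical accessor, shares the C linkage */

#ifdef __cplusplus
}
#endif

/* ---- shared.c (the single defining source file) ---- */
int x = 1;              /* the one definition, carrying the initializer */

int get_x(void)
{
    return x;
}
```

Because the initializer appears only on the definition, the header can be included from any number of source files without creating a second definition.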
Your question says that you're doing this inside one compilation unit. That's not illegal under the language rules, but it is very sketchy.
Either you aren't using the variable in any other compilation unit (in which case it doesn't need linkage at all, let alone "C" linkage) or else you are repeating the declaration in other compilation units, which risks having a mismatch on the type or language linkage.
Putting declarations in header files is good practice because it prevents type mismatches across compilation units, which are instant undefined behavior.
I ran into a situation where I declared two separate global variables with the same name in two separate files, without using static, volatile or extern on either of them.
One file was a .c and the other a .cpp file.
The compiler and build environment (GCC) was the ESP IDF, and the data types of the two declarations were even different.
mqtt_ssl.c:
esp_mqtt_client_handle_t client;
mb_master.cpp:
PL::ModbusClient client(port, PL::ModbusProtocol::rtu, 1);
At runtime I experienced a lot of problems with reboots of the ESP32, until I found out that the shared name of two actually separate variables was causing the issue.
My guess is that the compiler used the same memory region for both of them.
I expected them to be handled as two separate objects, each with its own region in memory, which is obviously not what happens.
After reading some questions here and there, my understanding now is that the behavior is undefined if a variable with the same name is declared without static, extern or volatile in two separate C/C++ files.
It took me quite a while to figure that out.
If it is not allowed to declare it like this, why didn't the compiler/linker throw an error?
Is there an option for GCC to treat such a situation as error to prevent that situation in the future?
Edit 1:
This is a reproducible example with xtensa-esp32-elf-gcc.exe (crosstool-NG esp-2021r2-patch3) 8.4.0.
app_main.c
#include <stdio.h>
#include <stdint.h>
void test_sub(void);
uint16_t test1;
void app_main(void)
{
test1 = 1;
printf("app_main:test1=%d\n", test1);
test_sub();
printf("app_main:test1=%d\n", test1);
}
test_sub.c
#include <stdio.h>
#include <stdint.h>
int16_t test1;
void test_sub(void)
{
test1 = 2;
printf("test_sub:test1=%d\n", test1);
}
Result:
app_main:test1=1
test_sub:test1=2
app_main:test1=2
test1 in app_main() got overwritten by test_sub(), because it had the same name in both files.
esp_mqtt_client_handle_t client; is a tentative definition. In spite of its name, it is not a definition, just a declaration, but it will cause a definition to be created at the end of the translation unit if there is no regular definition in the translation unit.
The C standard allows C implementations to choose how they resolve multiple definitions (because it says it does not define the behavior, so compilers and linkers may define it). Prior to GCC version 10, the default in GCC was to mark definitions from tentative definitions as “common,” meaning they could be coalesced with other definitions. This results in the linker not complaining about such multiple definitions.
Giving the -fno-common switch to GCC instructs it not to do this; definitions will not be marked as “common,” and the linker will complain if it sees multiple definitions.
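Independently of the compiler switch, a file-local variable can be kept out of the linker's view altogether. The following is a minimal sketch, assuming that app_main.c only needs its own private test1; the helper name app_main_bump is hypothetical and exists only to make the effect observable.

```c
#include <stdint.h>

/* static gives test1 internal linkage: the name is not exported from this
 * translation unit, so a test1 in another file cannot be merged with it. */
static uint16_t test1;

/* Hypothetical helper: increments the file-local counter and returns it. */
uint16_t app_main_bump(void)
{
    test1 = (uint16_t)(test1 + 1u);
    return test1;
}
```

If both files really are meant to share one object, the alternative is a single definition in one file and an extern declaration in a shared header.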
Is your problem really that the variable is shadowed? Often with global variables the problem can be that the variables are not initialized in the right order.
My code mysteriously stopped working. I figured out that I had accidentally written int listen; in my main.cpp and used listen in my network.cpp, which seems to have tried to call the int as a function instead of the C function. Changing the name of the variable (or making it static) fixed the problem.
Are there any warnings I can turn on so I don't get caught by something like this again? The closest I found was something that suggests making variables static if they don't need to be extern.
Here's the code:
//a.cpp
#include<cstdio>
int main() { puts("Hello"); }
//b.cpp
int puts;
What you have is an ODR violation. You have two different definitions of puts in two different TUs. The standard (N4713 - C++17 draft) says
§6.2 One-definition rule [basic.def.odr]
Every program shall contain exactly one definition of every non-inline function or variable that is odr-used in that program
outside of a discarded statement (9.4.1); no diagnostic required.
This is of course for C++. C has similar rules.
Because of the "no diagnostic required" the compiler chain is not required to issue errors or warnings for your code.
As far as I know there are no flags on popular compilers to issue error/warnings for ODR violations. This is because of how compilers optimize their parsing. See this great answer for more info on that.
There are some good practices that you can follow to minimize this kind of mistake:
In C++ don't declare symbols at global space. Use namespaces.
In C you can prefix global variables with g_. If you write a library you also add a library prefix to everything global.
(as you've suggested) make symbols that are used only in their TU have internal linkage (declare them static or in an unnamed namespace).
use more meaningful, relevant names. E.g. instead of listen you can name it server_is_listening (or something like that that makes sense for its use).
-Werror=missing-variable-declarations may help. It forces you either to make the variable static, which avoids this problem, or to declare it elsewhere (i.e. in a header). Including that header in any file that also includes the C function's declaration then causes an error, which gives you another chance to catch the mistake.
int a;
int a=3; //error as cpp compiled with clang++-7 compiler but not as C compiled with clang-7;
int main() {
}
For C, the compiler seems to merge these symbols into one global symbol but for C++ it is an error.
Demo
file1:
int a = 2;
file2:
#include<stdio.h>
int a;
int main() {
printf("%d", a); //2
}
As C files compiled with clang-7, the linker does not produce an error and I assume it converts the uninitialised global symbol 'a' to an extern symbol (treating it as if it were compiled as an extern declaration). As C++ files compiled with clang++-7, the linker produces a multiple definition error.
Update: the linked question does answer the first example in my question, specifically 'In C, If an actual external definition is found earlier or later in the same translation unit, then the tentative definition just acts as a declaration.' and 'C++ does not have “tentative definitions”'.
As for the second scenario, if I printf a, then it does print 2, so obviously the linker has linked it correctly (but I previously would have assumed that a tentative definition would be initialised to 0 by the compiler as a global definition and would cause a link error).
It turns out that a tentative definition int i[]; in both files also gets linked to one definition. int i[5]; is also a tentative definition in .common, just with a different size expressed to the assembler. The former is known as a tentative definition with an incomplete type, whereas the latter is a tentative definition with a complete type.
What happens with the C compiler is that int a is made a strong-bound weak global in .common and left uninitialised in the symbol table (where .common implies a weak global), whereas extern int a would be an extern symbol. The linker then makes the necessary decision. It ignores all weak-bound globals defined using #pragma weak if there is a strong-bound global with the same identifier in a translation unit, and two strong-bounds would be a multiple definition error. If it finds no strong-bounds and one weak-bound, the output is a single weak-bound. If it finds no strong-bounds but two weak-bounds, it chooses the definition in the first file on the command line and outputs the single weak-bound; though two weak-bounds are two definitions to the linker (because they are initialised to 0 by the compiler), it is not a multiple definition error, because they are both weak-bound. Finally, it resolves all .common symbols to point to the strong/weak-bound strong global.
https://godbolt.org/z/Xu_8tY
https://docs.oracle.com/cd/E19120-01/open.solaris/819-0690/chapter2-93321/index.html
As baz is declared with #pragma weak, it is weak-bound and gets zeroed by the compiler and put in .bss (even though it is a weak global, it doesn't go in .common, because it is weak-bound; all weak-bound variables go in .bss if uninitialised and get initialised by the compiler, or .data if they are initialised). If it were not declared with #pragma weak, baz would go in common and the linker will zero it if no weak/strong-bound strong global symbol is found.
The C++ compiler makes int a a strong-bound strong global in .bss and initialises it to 0 (https://godbolt.org/z/aGT2-o), so the linker treats it as a multiple definition.
Update 2:
GCC 10.1 defaults to -fno-common. As a result, global variable accesses are more efficient on various targets. In C, global variables with multiple tentative definitions now result in linker errors (like C++). With -fcommon such definitions are silently merged during linking.
I'll address the C end of the question, since I'm more familiar with that language and you seem to already be pretty clear on why the C++ side works as it does. Someone else is welcome to add a detailed C++ answer.
As you noted, in your first example, C treats the line int a; as a tentative definition (see 6.9.2 in N2176). The later int a = 3; is a declaration with an initializer, so it is an external definition. As such, the earlier tentative definition int a; is treated as merely a declaration. So, retroactively, you have first declared a variable at file scope and later defined it (with an initializer). No problem.
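The single-translation-unit case described above can be sketched as follows. The accessor read_a is a hypothetical helper added only so the result can be checked; the repeated int a; line is an extra illustration that several tentative definitions may coexist in one C translation unit.

```c
int a;          /* tentative definition */
int a;          /* a second tentative definition is still fine in C */
int a = 3;      /* external definition: the lines above now act as
                   plain declarations of the same object */

/* Hypothetical accessor, used only to observe the value. */
int read_a(void)
{
    return a;
}
```

Compiled as C++, the same listing is rejected, because every int a; there is a full definition.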
In your second example, file2 also has a tentative definition of a. There is no external definition in this translation unit, so
the behavior is exactly as if the translation
unit contains a file scope declaration of that identifier, with the composite type as of the end of the
translation unit, with an initializer equal to 0. [6.9.2 (1)]
That is, it is as if you had written int a = 0; in file2. Now you have two external definitions of a in your program, one in file1 and another in file2. This violates 6.9 (5):
If an identifier declared with external linkage is used in an expression
(other than as part of the operand of a sizeof or _Alignof operator whose result is an integer
constant), somewhere in the entire program there shall be exactly one external definition for the
identifier; otherwise, there shall be no more than one.
So under the C standard, the behavior of your program is undefined and the compiler is free to do as it likes. (But note that no diagnostic is required.) With your particular implementation, instead of summoning nasal demons, what your compiler chooses to do is what you described: use the common feature of your object file format, and have the linker merge the definitions into one. Although not required by the standard, this behavior is traditional at least on Unix, and is mentioned by the standard as a "common extension" (no pun intended) in J.5.11.
This feature is quite convenient, in my opinion, but since it's only possible if your object file format supports it, we couldn't really expect the C standard authors to mandate it.
clang doesn't document this behavior very clearly, as far as I can see, but gcc, which has the same behavior, describes it under the -fcommon option. On either compiler, you can disable it with -fno-common, and then your program should fail to link with a multiple definition error.
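A portable arrangement that links with or without -fno-common is one definition plus extern declarations everywhere else. The sketch below folds the two files into one listing for illustration; the comments mark where each part would live, and the names read_a, zeroed and read_zeroed are hypothetical additions.

```c
/* ---- would live in file2.c ---- */
extern int a;            /* declaration only: never becomes a definition */

int read_a(void)
{
    return a;
}

/* A lone tentative definition behaves as if written `int zeroed = 0;`
 * once the end of the translation unit is reached (6.9.2). */
int zeroed;

int read_zeroed(void)
{
    return zeroed;
}

/* ---- would live in file1.c ---- */
int a = 2;               /* the single external definition in the program */
```

With this layout there is exactly one external definition of a, so the program no longer relies on the common-symbol extension.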
When I declare a global variable in two different source files and only define it in one of the source files, I get different results compiling for C++ than for C. See the following example:
main.c
#include <stdio.h>
#include "func.h" // only contains declaration of void print();
int def_var = 10;
int main() {
printf("%d\n", def_var);
return 0;
}
func.c
#include <stdio.h>
#include "func.h"
/* extern */int def_var; // extern needed for C++ but not for C?
void print() {
printf("%d\n", def_var);
}
I compile with the following commands:
gcc/g++ -c main.c -o main.o
gcc/g++ -c func.c -o func.o
gcc/g++ main.o func.o -o main
g++/clang++ complain about multiple definition of def_var (this is the behaviour I expected, when not using extern).
gcc/clang compile just fine. (using gcc 7.3.1 and clang 5.0)
According to this link:
A tentative definition is a declaration that may or may not act as a definition. If an actual external definition is found earlier or later in the same translation unit, then the tentative definition just acts as a declaration.
So my variable def_var should be defined at the end of each translation unit and then result in multiple definitions (as it is done for C++). Why is that not the case when compiling with gcc/clang?
This isn't valid C either, strictly speaking. The standard says as much in
6.9 External definitions - p5
An external definition is an external declaration that is also a
definition of a function (other than an inline definition) or an
object. If an identifier declared with external linkage is used in an
expression (other than as part of the operand of a sizeof or _Alignof
operator whose result is an integer constant), somewhere in the entire
program there shall be exactly one external definition for the
identifier; otherwise, there shall be no more than one.
You have two definitions for an identifier with external linkage. You violate that requirement, the behavior is undefined. The program linking and working is not in opposition to that. It's not required to be diagnosed.
And it's worth noting that C++ is no different in that regard.
[basic.def.odr]/4
Every program shall contain exactly one definition of every non-inline
function or variable that is odr-used in that program outside of a
discarded statement; no diagnostic required. The definition can appear
explicitly in the program, it can be found in the standard or a
user-defined library, or (when appropriate) it is implicitly defined
(see [class.ctor], [class.dtor] and [class.copy]). An inline function
or variable shall be defined in every translation unit in which it is
odr-used outside of a discarded statement.
Again, a "shall" requirement, and it says explicitly that no diagnostic is required. As you may have noticed, there's quite a bit more machinery that this paragraph can apply to. So the front ends for GCC and Clang probably need to work harder, and as such are able to diagnose it, despite not being required to.
The program is ill-formed either way.
As M.M pointed out in a comment, the C standard has an informative section that mentions the very extension in zwol's answer.
J.5.11 Multiple external definitions
There may be more than one external definition for the identifier of
an object, with or without the explicit use of the keyword extern; if
the definitions disagree, or more than one is initialized, the
behavior is undefined (6.9.2).
I believe you are observing an extension to C known as "common symbols", implemented by most, but not all, Unix-lineage C compilers, originally (IIUC) for compatibility with FORTRAN. The extension generalizes the "tentative definitions" rule described in StoryTeller's answer to multiple translation units. All external object definitions with the same name and no initializer,
int foo; // at file scope
are collapsed into one, even if they appear in more than one TU, and if there exists an external definition with an initializer for that name,
int foo = 1; // different TU, also file scope
then all of the external definitions with no initializers are treated as external declarations. C++ compilers do not implement this extension, because (oversimplifying) nobody wanted to figure out what it should do in the presence of templates. For GCC and Clang, you can disable the extension with -fno-common, but other Unix C compilers may not have any way to turn it off.
main.h
extern int array[100];
main.c
#include "main.h"
int array[100] = {0};
int main(void)
{
/* do_stuff_with_array */
}
In the main.c module, the array is defined, and declared. Does the act of also having the extern statement included in the module, cause any problems?
I have always visualized the extern statement as a command to the linker to "look elsewhere for the actual named entity. It's not in here."
What am I missing?
Thanks.
Evil.
The correct interpretation of extern is that you tell something to the compiler. You tell the compiler that, despite not being present right now, the variable declared will somehow be found by the linker (typically in another object (file)). The linker will then be the lucky guy to find everything and put it together, whether you had some extern declarations or not.
To avoid exposure of names (variables, functions, ..) outside of a specific object (file), you would have to use static.
Yes, it's harmless. In fact, I would say that this is a pretty standard way to do what you want.
As you know, it just means that any .c file that includes main.h will also be able to see array and access it.
Edit
In both C and C++, the presence of extern indicates that the first declaration is not a definition. Therefore, it just makes the name available in the current translation unit (anyone who includes the header) and indicates that the object referred to has external linkage - i.e. is available across all the translation units making up the program. It's not saying that the object is necessarily located in another translation unit - just that 'this line isn't the definition'.
End edit
In C, the extern is optional. Without it, the first declaration is a 'tentative definition'. If it were not for the later definition (which is unambiguously a definition because it has an initializer), this would be treated as a definition (C99 6.9.2). As it is, it's just a declaration and does not conflict.
In C++, the extern is not optional - without it, the first declaration is a definition (C++03 3.1) which conflicts with the second.
This difference is explicitly pointed out in Annex C of C++:
"Change: C++ does not have “tentative definitions” as in C
E.g., at file scope,
int i;
int i;
is valid in C, invalid in C++."
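By contrast, the header-plus-definition arrangement from the question is accepted by both languages, because the extern line is never a definition. A minimal sketch, with the contents of main.h pasted inline and a hypothetical helper array_count added for checking:

```c
#include <stddef.h>

/* ---- what main.h contributes after #include ---- */
extern int array[100];      /* declaration: external linkage, no storage */

/* ---- main.c ---- */
int array[100] = {0};       /* definition: matches the declaration above */

/* Hypothetical helper: number of elements in the array. */
size_t array_count(void)
{
    return sizeof array / sizeof array[0];
}
```

Seeing both lines in the same translation unit also lets the compiler check that the declared and defined types agree.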
The extern is harmless and correct. You couldn't declare it in the header without extern.
As an extra, usually it is best practice to create a macro or a constant to hold the size of the array; in your code the actual size (100) appears twice in the source base. It would be cleaner to do it like this:
#define ARRAY_SIZE 100
extern int array[ARRAY_SIZE];
...
int array[ARRAY_SIZE] = { 0 };
But maybe you did not want to include this in the code snippet just for the sake of brevity, so please take no offense :)
From a compilation or execution point of view, it makes no difference.
However, it's potentially dangerous as it makes array[] available to any other file which #includes main.h, which could result in the contents of array[] being changed in another file.
So, if array[] will only ever be used in main.c, remove the line from main.h, and declare array[] as static in main.c.
If array[] will only be used in the main() function, declare it in there.
In other words, array[] should have its scope limited to the smallest possible.
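One way to limit the scope as suggested is to give the array internal linkage and expose only small accessors. This is a sketch under assumptions: the accessor names array_set and array_get, the bounds checks, and returning 0 for an out-of-range index are illustrative choices, not part of the original code.

```c
#include <stddef.h>

#define ARRAY_SIZE 100

/* static: visible only inside this file, so no other file can modify it
 * directly or collide with its name at link time. */
static int array[ARRAY_SIZE];

/* Hypothetical accessor: stores v at index i, ignoring bad indices. */
void array_set(size_t i, int v)
{
    if (i < ARRAY_SIZE) {
        array[i] = v;
    }
}

/* Hypothetical accessor: returns the element at i, or 0 if out of range. */
int array_get(size_t i)
{
    return (i < ARRAY_SIZE) ? array[i] : 0;
}
```

Other files then depend only on the two function declarations, and the storage itself stays private to main.c.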