C++ name mangling in C

C does not mangle names the way C++ does. This can lead to subtle bugs when a function prototype is declared differently in different files. A simple example:
/* file1.c */
int test(int x, int y)
{
return y;
}
/* file2.c */
#include <stdio.h>
extern int test(int x);
int main()
{
int n = test(2);
printf("n = %d\n", n);
return 0;
}
When compiling this code with a C compiler (gcc in my case), no errors are reported. After switching to a C++ compiler, linking fails with the error "undefined reference to 'test(int)'". Unfortunately, in practice switching is not that easy: there is code that a C compiler accepts (possibly with warnings) but that fails to compile with a C++ compiler.
This is of course bad coding practice: every function prototype should live in a .h file that is included both where the function is implemented and where it is used. Unfortunately my app has many cases like this, and fixing all of them is not possible in the short term. Switching to g++ is also not an option; I hit compilation errors almost immediately.
One possible solution would be to use C++ name mangling when compiling C code. Unfortunately gcc does not seem to allow this; I did not find a command-line option for it. Do you know whether it is possible (maybe with another compiler)? I also wonder whether any static analysis tools can catch this.
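For reference, here is a minimal sketch of the shared-header practice described above, squeezed into one listing. The header name test.h and the wrapper call_test are made up for illustration; in a real tree the three parts would be separate files.

```c
/* ---- test.h (hypothetical shared header) ---- */
#ifndef TEST_H
#define TEST_H
int test(int x, int y);   /* the single authoritative prototype */
#endif

/* ---- file1.c: would #include "test.h", so the definition is checked against it ---- */
int test(int x, int y)
{
    (void)x;              /* x is unused, as in the original example */
    return y;
}

/* ---- file2.c: would #include "test.h" instead of writing its own extern ---- */
int call_test(void)
{
    /* With the prototype in scope, test(2) is rejected at compile time
       (too few arguments) instead of silently linking. */
    return test(2, 7);
}
```

With only one declaration in existence, a mismatch between caller and definition becomes a compile error in the file that is wrong, which is the same protection the mangled name would have given at link time.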

Using splint catches these kinds of errors.
foo.c:
int test(int x);
int main() {
test(0);
}
bar.c:
int test(int x, int y) {
return y;
}
Running splint:
$ splint -weak foo.c bar.c
Splint 3.1.2 --- 20 Feb 2009
bar.c:1:5: Function test redeclared with 2 args, previously declared with 1
Types are incompatible. (Use -type to inhibit warning)
foo.c:4:5: Previous declaration of test
Finished checking --- 1 code warning

~/dev/temp$ cat > a.c
int f(int x, int y) { return x + y; }
~/dev/temp$ cat > b.c
extern int f(int x); int g(int x) { return f(x + x); }
~/dev/temp$ splint *.c
Splint 3.1.2 --- 03 May 2009
b.c:1:12: Function f redeclared with 1 arg, previously declared with 2
Types are incompatible. (Use -type to inhibit warning)
a.c:1:5: Previous declaration of f
Finished checking --- 1 code warning
~/dev/temp$

Related

Multi-dimensional array of unknown bounds argument : difference between C and C++

The following program compiles as a C program:
#include <stdlib.h>
#include <stdio.h>
void f(int n, int m, int x[n][m]) {
printf("x[0][2] = %i\n",x[0][2]);
}
int main() {
int v[][3] = { {0,1,2}, {3,4,5} };
f(2,3,v);
}
However, when compiled as C++ with g++, I have:
main.c:4:29: error: use of parameter outside function body before ‘]’ token
void f(int n, int m, int x[n][m]) {
^
It seems that this feature of C does not exist in C++. Is there any flag that can be given to g++ so that it accepts the code?
It seems that this feature of C does not exist in C++.
Correct.
Is there any flag that can be given to g++ so that it accepts the code?
No, g++ has no flag that allows a VLA in a parameter list. You will have to compile the code as C.
Other similar gcc extensions exist, see: https://gcc.gnu.org/onlinedocs/gcc/Variable-Length.html
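If the function has to build under both compilers, one common workaround is to avoid the VLA type in the signature altogether and index a flat pointer by hand. The following is only a sketch under that assumption; the helper name at is invented here and is not part of the original program.

```c
#include <stdio.h>

/* Hypothetical helper: element (row, col) of a matrix with m columns,
   stored contiguously and passed as a pointer to its first element. */
int at(const int *x, int m, int row, int col)
{
    return x[row * m + col];
}

/* Same job as f(int n, int m, int x[n][m]), but with a signature
   that is valid in both C and C++. */
void f(int n, int m, const int *x)
{
    (void)n;  /* the row count is not needed for this lookup */
    printf("x[0][2] = %i\n", at(x, m, 0, 2));
}
```

The call in main then becomes f(2, 3, &v[0][0]); the price is that the x[i][j] syntax is lost inside the function.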

Make a object accessible by only its library, and not by any other routine in the program

Let's say I have two (or more) C functions func1() and func2(), both requiring a buffer variable int buff. If the functions are kept in separate files, func1.c and func2.c, how do I make buff accessible only to func1() and func2() and not to the calling routine (or any other routine)?
Here is an example setup:
file func1.c:
/*func1.c*/
static int buff;
int *func1(int x)
{
buff = x;
return &buff;
}
file func2.c:
/*func2.c*/
static int buff;
int *func2(int x)
{
buff = x;
return &buff;
}
header header.h:
/*header for func1.c and func2.c*/
//multiple inclusion guard not present.
int *func1(int);
int *func2(int);
file main.c:
#include<stdio.h>
#include"header.h"
int main()
{
int *ptr;
ptr = func1(1);
printf("&buff = %p , buff = %d\n", ptr, *ptr);
ptr = func2(2);
printf("&buff = %p , buff = %d\n", ptr, *ptr);
return 0;
}
As expected, the output shows different memory locations for buff.
&buff = 0x55b8fd3f0034 , buff = 1
&buff = 0x55b8fd3f0038 , buff = 2
But I need only one copy of buff, not more.
I could of course put both functions in the same file and define buff as static int, but then I would lose the ability to compile the functions separately.
If I put int buff in a separate buff.c and declare it extern in func1.c and func2.c, it would be easily accessible to the calling routine (main in this case).
Basically, I need to create a library of functions that work on the same external object, which is accessible only to them. The calling routine may not need all the functions, so I do not want to put them in a single file and create unused code. But there must be only one copy of the object.
Please help me achieve this, if it is achievable.
The C standard does not provide a way to do this. It is usually done using features of compilers and linkers beyond the C standard. Here is an example using Apple’s developer tools on macOS. For options suitable to your environment, you should specify the build tools and versions you are using, such as whether you are using Apple tools, GNU tools, Microsoft tools, or something else.
With this in a.c:
#include <stdio.h>
int x = 123;
void a(void)
{
printf("In a.c, x is %d.\n", x);
}
and this in b.c:
#include <stdio.h>
extern int x;
void b(void)
{
printf("In b.c, x is %d.\n", x);
}
we compile the source files to object modules:
clang -c a.c b.c
and then link them to a new object module r.o while requesting that the symbol x (_x in the linker view) not be exported:
ld -r -o r.o -unexported_symbol _x a.o b.o
Then, if we have another source file c.c that attempts to use x:
#include <stdio.h>
extern int x;
extern void a(void);
extern void b(void);
int main(void)
{
a();
b();
printf("In c.c, x is %d.\n", x);
}
attempting to build an executable with it using clang -o c c.c r.o yields:
Undefined symbols for architecture x86_64:
"_x", referenced from:
_main in c-139a35.o
ld: symbol(s) not found for architecture x86_64
However, if we remove the two lines in c.c that refer to x, the build succeeds, and the program prints:
In a.c, x is 123.
In b.c, x is 123.
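With GNU tools on Linux, a roughly comparable effect is usually obtained when the functions are shipped as a shared library: a symbol marked with hidden visibility can be shared between the library's own object files but is not exported from the resulting .so. The sketch below assumes GCC or Clang (the attribute is a compiler extension, not Standard C) and puts everything in one listing for brevity.

```c
/* buff.c in a real library; assumes GCC or Clang.
   Hidden visibility: other object files linked into the same shared
   library can still refer to buff, but it is not exported from the .so. */
__attribute__((visibility("hidden"))) int buff;

/* func1.c would contain: extern int buff; plus this definition */
int *func1(int x)
{
    buff = x;
    return &buff;
}

/* func2.c would contain: extern int buff; plus this definition */
int *func2(int x)
{
    buff = x;
    return &buff;
}
```

This only takes effect at the shared-library boundary. For a static library, I believe the closest analogue to the macOS recipe above is to combine the objects with ld -r and then run objcopy --localize-symbol=buff on the result, but check that against your binutils version.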
One typical approach to this problem is to give the global variable a name that begins with _.
That is, in func1.c you might write
int _mylib_buff;
And then in func2.c, of course, you'd have
extern int _mylib_buff;
Now, of course, in this case, _mylib_buff is technically an ordinary global variable. It's not truly "private" at all. But global variables beginning with _ are private "by convention", and I'd say this works okay in practice. But, obviously, there's nothing preventing some other source file from cheating and peeking at the nominally-private variable, and there's no way in Standard C to prevent one from doing so.
The other complication is that identifiers beginning with _ are reserved to the implementation, and you're not supposed to use them in your own code. (That is, components of the implementation -- like your C compiler and C library -- have semi-global variables they're trying to hide from you, and they're typically using a leading _ to achieve this, also.) Strictly speaking, the C standard reserves every identifier that begins with an underscore for use at file scope, so a global named _mylib_buff is technically off limits even though it tends to work in practice; a prefix such as mylib_priv_ or a trailing underscore follows the same convention without touching the reserved names. See questions 1.9 and 1.29 in the C FAQ list.
The answer is: It's not possible.
C has no way of saying "this variable may be used by source file x, y, z and not by any other sources files".
So if you want buff to be "private" to a number of functions, you'll have to put those functions in the same source file.
You need to define the non-static variable in one of the files for example:
int buff;
int *func1(int x)
{
buff = x;
return &buff;
}
in the header file declare it as extern:
/*header for func1.c and func2.c*/
//multiple inclusion guard not present.
extern int buff;
int *func1(int);
int *func2(int);
Include it in all other files:
/*func2.c*/
#include "header.h"
int *func2(int x)
{
buff = x;
return &buff;
}
If you do not want the variable to be visible, you need to create a function that gets and sets the "hidden" variable.
#include <stddef.h> /* NULL */
typedef enum
{
GET,
SET,
REF,
}OP_t;
/* CREATE defines the accessor; buff is a static local, so only the accessor can name it. */
#define CREATE(type, name) type getset##name(OP_t oper, type val, type **ref) \
{\
static type buff;\
switch(oper)\
{\
case GET:\
return buff;\
case SET:\
buff = val;\
break;\
case REF:\
if(ref) *ref = &buff;\
break;\
}\
return 0;\
}
/* HEAD produces the matching prototype for a header file. */
#define HEAD(type, name) type getset##name(OP_t oper, type val, type **ref)
#define GETVAL(name) getset##name(GET, 0, NULL)
#define SETVAL(name,val) getset##name(SET, val, NULL)
#define GETREF(name,ref) getset##name(REF, 0, ref)
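The same idea can be written without macros. This is a minimal sketch, assuming a single int buffer; the accessor names buff_set, buff_get and buff_ref are invented for illustration, and the listing merges what would be three source files.

```c
/* buff.c (hypothetical file name): the only file that can name buff */
static int buff;

void buff_set(int value) { buff = value; }
int  buff_get(void)      { return buff; }
int *buff_ref(void)      { return &buff; }

/* func1.c: would see only the accessor prototypes, not buff itself */
int *func1(int x)
{
    buff_set(x);
    return buff_ref();
}

/* func2.c: same single buffer, reached through the same accessors */
int *func2(int x)
{
    buff_set(x);
    return buff_ref();
}
```

Both functions now return the same address, so there is exactly one buff. Note that the accessors themselves still have external linkage, so main could call them; the variable is hidden, but access to it is only restricted by convention.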

Is the return type of a function part of the mangled name?

Suppose I have two functions with the same parameter types and name (not in the same program):
std::string foo(int x) {
return "hello";
}
int foo(int x) {
return x;
}
Will they have the same mangled name once compiled?
Is the return type part of the mangled name in C++?
As mangling schemes aren't standardised, there's no single answer to this question; the closest thing to an actual answer would be to look at mangled names generated by the most common mangling schemes. To my knowledge, those are the GCC and MSVC schemes, in alphabetical order, so...
GCC:
To test this, we can use a simple program.
#include <string>
#include <cstdlib>
std::string foo(int x) { return "hello"; }
//int foo(int x) { return x; }
int main() {
// Assuming executable file named "a.out".
system("nm a.out");
}
Compile and run with GCC or Clang, and it'll list the symbols it contains. Depending on which of the functions is uncommented, the results will be:
// GCC:
// ----
std::string foo(int x) { return "hello"; } // _Z3fooB5cxx11i
// foo[abi:cxx11](int)
int foo(int x) { return x; } // _Z3fooi
// foo(int)
// Clang:
// ------
std::string foo(int x) { return "hello"; } // _Z3fooi
// foo(int)
int foo(int x) { return x; } // _Z3fooi
// foo(int)
The GCC scheme contains relatively little information, not including return types:
Prefix: _Z, which marks the symbol as a mangled name.
Name: 3foo for ::foo.
Parameters: i for int.
Despite this, however, they are different when compiled with GCC (but not with Clang), because GCC indicates that the std::string version uses the cxx11 ABI.
Note that it does still keep track of the return type, and make sure signatures match; it just doesn't use the function's mangled name to do so.
MSVC:
To test this, we can use a simple program, as above.
#include <string>
#include <cstdlib>
std::string foo(int x) { return "hello"; }
//int foo(int x) { return x; }
int main() {
// Assuming object file named "a.obj".
// Pipe to file, because there are a lot of symbols when <string> is included.
system("dumpbin /symbols a.obj > a.txt");
}
Compile and run with Visual Studio, and a.txt will list the symbols it contains. Depending on which of the functions is uncommented, the results will be:
std::string foo(int x) { return "hello"; }
// ?foo@@YA?AV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@H@Z
// class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > __cdecl foo(int)
int foo(int x) { return x; }
// ?foo@@YAHH@Z
// int __cdecl foo(int)
The MSVC scheme contains the entire declaration, including things that weren't explicitly specified:
Name: foo@ for ::foo, followed by @ to terminate.
Symbol type: Everything after the name-terminating @.
Type and member status: Y for "non-member function".
Calling convention: A for __cdecl.
Return type:
H for int.
?AV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@ (followed by @ to terminate) for std::basic_string<char, std::char_traits<char>, std::allocator<char>> (std::string for short).
Parameter list: H for int (followed by @ to terminate).
Exception specifier: Z for throw(...); this one is omitted from demangled names unless it's something else, probably because MSVC just ignores it anyway.
This allows it to whine at you if declarations aren't identical across every compilation unit.
Generally, most compilers will use one of those schemes (or sometimes a variation thereof) when targeting *nix or Windows, respectively, but this isn't guaranteed. For example...
Clang, to my knowledge, will use the GCC scheme for *nix, or the MSVC scheme for Windows.
Intel C++ uses the GCC scheme for Linux and Mac, and the MSVC scheme (with a few minor variations) for Windows.
The Borland and Watcom compilers have their own schemes.
The Symantec and Digital Mars compilers generally use the MSVC scheme, with a few small changes.
Older versions of GCC, and a lot of UNIX tools, use a modified version of cfront's mangling scheme.
And so on...
Schemes used by other compilers are thanks to Agner Fog's PDF.
Note:
Examining the generated symbols, it becomes apparent that GCC's mangling scheme doesn't provide the same level of protection against Machiavelli as MSVC's. Consider the following:
// foo.cpp
#include <string>
// Simple wrapper class, to avoid encoding `cxx11 ABI` into the GCC name.
class MyString {
std::string data;
public:
MyString(const char* const d) : data(d) {}
operator std::string() { return data; }
};
// Evil.
MyString foo(int i) { return "hello"; }
// -----
// main.cpp
#include <iostream>
// Evil.
int foo(int);
int main() {
std::cout << foo(3) << '\n';
}
If we compile each source file separately, then attempt to link the object files together...
GCC: MyString, due to not being part of the cxx11 ABI, causes MyString foo(int) to be mangled as _Z3fooi, just like int foo(int). This allows the object files to be linked, and an executable is produced. Attempting to run it causes a segfault.
MSVC: The linker will look for ?foo@@YAHH@Z; as we instead supplied ?foo@@YA?AVMyString@@H@Z, linking will fail.
Considering this, a mangling scheme that includes the return type is safer, even though functions can't be overloaded solely on differences in return type.
No, and I expect that their mangled name will be the same with all modern compilers. More importantly, using them in the same program results in undefined behavior. Functions in C++ cannot differ only in their return type.

Compiling C program in C++ compiler

I wrote a program in C and I want to use a C++ library in it. I thought I would be able to compile the C code with g++, since C++ is built on top of C. However, I couldn't, and the main error arose because in one part of the code I wrote a function, placed before main, that reads data from an input file. That worked well with the C compiler but not with the C++ compiler.
Below are some of the error messages I got. I'd like general comments on points to take into consideration when using C and C++ interchangeably.
error : ‘get_inputs’ was not declared in this scope
error: use of parameter outside function body before ‘]’ token
The following program compiles in C with a warning such as: 'bar' undefined; assuming extern returning int
void foo()
{
bar(5);
}
int bar(int x)
{
return x*2;
}
If you want this to compile in C++ you must declare bar before you use it:
int bar(int x); // forward declaration
void foo()
{
bar(5);
}
int bar(int x)
{
return x*2;
}
Even in C it is good practice to use forward declarations and to enable all compiler warnings; otherwise the error in the following program will slip through:
void foo()
{
bar(); // calling bar without argument....
}
int bar(int x)
{
return x*2; // ... will result in an undefined value for x here
}

Linker error-Calling function in C++ file from C file

I am trying to run some basic C and C++ code in a Linux environment.
I am using Eclipse to run it. The current project was created as a C project.
All I am trying to do is call a function from a different file in the same folder.
My main is in sample.c. In main I would like to call the function sum(int a, int b) in A.c, and I was able to run that. But when I rewrite the same function sum in A.cpp (a C++ template file), it throws a linker error.
gcc -o "Test" ./sample.o
./sample.o: In function `main':
/home/idtech/workspace/Test/Debug/../sample.c:19: undefined reference to `sum'
collect2: ld returned 1 exit status
make: *** [Test] Error 1
I need help in calling functions in C++ file from C file in same folder.
Please help me to solve this linker issue.
Thanks
Harsha
The C++ compiler mangles symbol names in order to encode type information. Typically, when writing C++ functions that should be exposed to C code, you'll want to wrap the function in an extern "C" { ... } block, like so (or just prefix it with extern "C" as @DaoWen pointed out):
A.cpp:
extern "C" {
int sum(int a, int b)
{
return a+b;
}
}
caller.c:
extern int sum(int a, int b);
...
int main() { sum(42, 4711); }
By marking a function as extern "C", you're sacrificing the ability to overload it, because different overloads are distinguishable only by their mangled symbol names, and you just requested that mangling be turned off! What it means is that you cannot do this:
extern "C" {
int sum(int a, int b) { return a+b; }
float sum(float a, float b) { return a+b; } // conflict!
}
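In practice the declaration usually goes into a header that both sides include, guarded so that a C compiler never sees the extern "C" syntax. A minimal sketch, assuming a header called sum.h (the name is made up here); the definition is shown in the same listing only to keep it self-contained, and would live in A.cpp.

```c
/* ---- sum.h (hypothetical header), included by both A.cpp and sample.c ---- */
#ifndef SUM_H
#define SUM_H

#ifdef __cplusplus   /* defined only when a C++ compiler reads this file */
extern "C" {
#endif

int sum(int a, int b);

#ifdef __cplusplus
}
#endif

#endif /* SUM_H */

/* ---- A.cpp would #include "sum.h" and then define the function ---- */
int sum(int a, int b)
{
    return a + b;
}
```

Because A.cpp sees the extern "C" declaration before the definition, the function is emitted under the plain name sum, which is what sample.o is looking for. Also note that the link line in the question lists only sample.o; the object file built from A.cpp has to be on that line as well, and linking with g++ rather than gcc pulls in the C++ runtime if the C++ side needs it.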