I am looking after a huge old C program and converting it to C++ (which I'm new to).
There are a great many complicated preprocessor hacks going on connected to the fact that the program must run on many different platforms in many different configurations.
In one file (call it file1.c) I am calling functionA().
And in another file (call it file2.c) I have a definition of functionA().
Unfortunately the exact type of the function is specified by a collection of macros created in a bewildering number of ways.
Now the linker is complaining that:
functionA is an unresolved external symbol.
I suspect that the problem is that the prototype as seen in file1.c is slightly different from the true definition of the function as seen in file2.c.
There is a lot of scope for subtle differences due to mismatches between __cdecl and __fastcall, and between the presence and absence of __forceinline.
Is there some way to show exactly what the compiler thinks is the type of functionA() as seen by file1.c as opposed to file2.c?
You can pass a flag to the compiler (/P, I think) that makes it write out the complete preprocessed source for each translation unit - you can then open this (huge) file and search through it; the information you need will be in there, somewhere.
Must you actually convert all the existing C code to C++? This is likely to be a lot of work, especially given what you've described so far.
Instead, you can write new code in C++ and call into the C code using extern "C". For example, in a C++ source file you can:
extern "C" {
#include "old_c_header.h"
}
This changes the linkage so the C++ compiler generates external references to the C code without name mangling, allowing the linker to match everything up.
Normally you should see both the expected and the actual signature in the error output.
Otherwise you can instruct the compiler to write the result of preprocessing to a separate file: cl.exe /P for MSVC, and gcc -E for GCC.
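As a hedged illustration of the kind of mismatch the preprocessed output can reveal (the names and conventions below are just placeholders based on the question):

/* file1.c - the declaration the caller sees after macro expansion */
int __cdecl functionA(int x);      /* on 32-bit MSVC this is decorated roughly as _functionA */

/* file2.c - the definition after a different macro expansion */
int __fastcall functionA(int x)    /* decorated roughly as @functionA@4 - a different symbol */
{
    return x;
}

Comparing the two preprocessed files at the point where functionA is declared shows which convention each translation unit actually ended up with.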
Related
I have a project in pure C (the ST USB library) and I need to migrate it to C++ and change some structures into classes. I removed all the C++ "protections" like:
#ifdef __cplusplus
extern "C" {
#endif
#ifdef __cplusplus
}
#endif
I changed all file extensions from .c to .cpp (except the HAL library).
I realized that the C++ .hex is 7 kB smaller than the C .hex. When I looked into the .map file I saw that many functions are missing. I thought that static functions caused that, but removing the static keyword didn't help. Does anyone have an idea why some functions weren't compiled? When the extensions are .c, everything is fine.
I can think of two main reasons:
Inlining. The compiler can decide there is no need to emit the function as a standalone function if all usages can be inlined.
Unused code. The compiler can see that a function isn't used anywhere in your code and decide to eliminate it from the final result.
If the result is to be used as a sort of library - that is, your environment calls specific functions without there being an explicit call in your own code - then I think the best method is to compile and link your code as a library (probably a dynamic library) and export those functions as the library interface (visibility attributes in GCC, dllexport in MSVC). That forces the compiler/linker to include them in the produced binary even if it can't see why they are needed right now.
(Of course, this is a wild guess about your target environment.)
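A rough sketch of that export idea, assuming an MSVC or GCC toolchain (the macro and function names are invented):

// Hypothetical export macro; adjust to the actual toolchain and build.
#if defined(_MSC_VER)
  #define LIB_API __declspec(dllexport)
#else
  #define LIB_API __attribute__((visibility("default"), used))
#endif

// Marking the entry point as part of the library interface keeps it in the
// produced binary even when nothing in this build calls it directly.
extern "C" LIB_API void library_entry_point(void)
{
    // ...
}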
Another option is to turn off the specific compiler/linker optimizations that remove dead code and to force emitting a standalone instance of each inlined function, but I think that's a very indirect approach; it has a wider effect and can complicate maintenance later.
C++ functions are given different (mangled) symbol names than C functions. Since you lost functionality and the code is much smaller, it's likely that a function that requires C linkage is being compiled as C++, and the resulting symbol mismatch is preventing proper linking.
A likely place for this to happen is in the interrupt vector table. If the handler function is compiled with C++ linkage, the handler address won't make it into a table that's compiled with C.
Double check the interrupt vectors and make sure that they reference the correct functions. If they are correct, check any other code compiled with C that might reference an external symbol compiled with C++.
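As a hedged sketch of the usual fix (the handler name here is just the common CMSIS-style convention; use whatever your startup code expects):

// In the .cpp file that implements the handler: C linkage keeps the symbol
// name un-mangled so the C-compiled vector table / startup code can find it.
extern "C" void USART1_IRQHandler(void)
{
    // ... handle the interrupt ...
}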
I have a 10+ years old C library which -- I believe -- used to work just fine in the good old days, but when I tried to use it with a C++ source (containing the main function) the other day I ran into some difficulties.
Edit: to clarify, the C library compiles just fine with gcc, and it generates an object file old_c_library.o. This library was supposed to be used in a way so that the C header file old_c_library.h is #included in your main.c C source file. Then your main C source file should be compiled and linked together with old_c_library.o via gcc. Here I want to use a C++ source file main.cpp instead, and compile/link it with g++.
The following three problems occurred, during the compilation of the C++ source file:
one of the header files of the C library contains the C++ reserved word new (it is the name of an integer), which resulted in fatal error; and
one of the header files of the C library contains a calloc call (an explicit typecast is missing), which resulted in fatal error; and
various files of the C library contain code where comparison of signed and unsigned integers happen, which resulted in warnings.
Edit: I tried to use the extern "C" { #include "obsolete_c_library.h" } "trick", as suggested in the comments, but that did not solve any of my problems.
I can sort out problem 1 by renaming all instances of the reserved words and replacing them by -- basically -- anything else. I can sort out problem 2 by typecasting the calloc call. I might try to sort out the warnings by ideas suggested here: How to disable GCC warnings for a few lines of code.
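For problem 2, a hedged example of the kind of cast involved (the helper name is made up); C converts from void * implicitly, C++ does not:

#include <stdlib.h>

/* A C-style cast keeps the code valid C while also satisfying the C++ compiler. */
static int *make_buffer(size_t n)
{
    return (int *) calloc(n, sizeof(int));
}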
But I still wonder, is there a way to overcome these difficulties in an elegant, high-level way, without actually touching the original library?
Relevant:
Where is C not a subset of C++? and Do I cast the result of malloc? and How do I use extern to share variables between source files?.
Generally speaking, it is not safe to #include C header files into C++ sources if those headers were not built in anticipation of such usage. Under some circumstances it can be made to work, but you need to be prepared to either modify the headers or write your own declarations for the functions and global variables you want to access.
At minimum, if the C headers declare any functions and you are not recompiling those functions in C++ then you must ensure that the declarations are assigned C linkage in your C++ code. It's not uncommon for C headers to account for that automatically via conditional compilation directives, but if they do not then you can do it on the other side by wrapping the inclusion(s) in a C linkage block:
extern "C" {
#include "myclib.h"
}
If the C headers declare globals whose names conflict with C++ keywords, and which you do not need to reference, then you may be able to use the preprocessor to redefine them:
#define new extern_new
#include "myclib.h"
#undef new
That's not guaranteed to work, but it's worth a try. Do not forget to #undef such macros after including the C headers, as shown.
There may be other fun tricks you can play with macros to adapt specific headers to C++, but at some point it makes more sense to just copy / rewrite the needed declarations (and only those), either in your main C++ source or in your own C++ header. Note that doing so does not remove the need to declare C linkage -- that requirement comes from the library having been compiled by a C compiler rather than a C++ compiler.
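For instance, a minimal hand-written header might look like this (every name below is a stand-in for whatever the real library provides):

// myclib_shim.h - C++-friendly declarations for just the pieces actually used.
#ifndef MYCLIB_SHIM_H
#define MYCLIB_SHIM_H

extern "C" {
    int  myclib_init(void);          // assumed to exist in the compiled C library
    void myclib_process(double x);   // assumed signature
    extern int myclib_status;        // assumed global defined by the library
}

#endif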
You can write a duplicate of the C header, with the only difference being the lack of the extern int new declaration. Then use the newly created C++-friendly header instead of the old, incompatible one.
If you need to refer to that variable from C++, then you need to do something more complex: write a small wrapper library in C that exposes read and write functions (or a pointer) that C++ can use to access the unfortunately named variable.
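A rough sketch of such a wrapper (the accessor names are invented); the wrapper itself has to be compiled as C, because only a C compiler can include a header that uses new as an identifier:

/* new_accessors.c - compile this file with the C compiler */
#include "old_c_library.h"   /* the header that declares: extern int new; */

int get_new(void) { return new; }
void set_new(int value) { new = value; }

/* new_accessors.h - safe to include from C++ */
#ifdef __cplusplus
extern "C" {
#endif
int get_new(void);
void set_new(int value);
#ifdef __cplusplus
}
#endif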
If some inline functions refer to the variable, then you can leave those out of the duplicate header and, if you need to call them from C++, re-implement them in a C++-friendly manner. Just don't give the re-implementation the same name within extern "C", because that would conflict with the original inline function possibly used by some existing C code.
Do the same header duplication as described in 1., leaving out the problematic inline functions and re-implementing them if needed.
Warnings can be ignored or disabled as you already know. There is no need to modify the original headers. You could rewrite any {const,type}-unsafe inline functions with versions that don't generate warnings, but you have to consider whether it's worth your time.
The disadvantage of this approach is that you now have two versions of some or all of the headers: new ones used by C++, and old ones that may still be used by some old code.
If you can give up the requirement of not touching the original library and don't need to work with an existing compiled binary, then an elegant solution would be to simply rename the problematic variables (1.), make the non-C++-compatible inline functions (2.) non-inline and move them into C source files, and fix the unsafe code (3.) that generates warnings - or, if a warning is C++-specific, simply make the offending function non-inline.
I have the situation that two C++ libraries export the very same C-function symbols from shared code. When I now compile an executable which links both libraries, I do not get any linker error or warning from VC12. Why is this? It silently just chooses one of the two symbols and I have no idea which one is chosen.
extern "C" { __declspec(dllexport) int function(void* argument);}
There is a flag named /FORCE which would be able to convince VC to link even if there are multiply defined symbols, but this flag is not set.
I cannot find any official information from Microsoft on why this links at all. I was expecting to get a LNK4006 warning, but I don't.
I just want to know if this is the expected or undefined behavior, which only did not explode by coincidence. I read things about the One Definition Rule not being applied generally to C-Code, but I cannot find any reliable statement for the VC compiler.
Can I assume that, given the functions do not use any singletons, use the very same code and compiler flags, it does not matter which one is chosen?
You are violating the one definition rule.
The behavior of your program is undefined.
See section "3.2 One definition rule [basic.def.odr]" in the C++ standard.
Every program shall contain exactly one definition of every non-inline function or variable that is odr-used in that program; no diagnostic required. ...
Section 3.2.6 describes when there can be more than one definition of a class type, inline function with external linkage etc. in a program.
I just want to know if this is the expected or undefined behavior, which only did not explode by coincidence. I read things about the One Definition Rule not being applied generally to C-Code, but I cannot find any reliable statement for the VC compiler.
It is undefined behavior.
The C++ standard is the master, not the VC compiler.
Can I assume that, given the functions do not use any singletons, use the very same code and compiler flags, it does not matter which one is chosen?
It is still undefined behavior - though the program might appear to behave as expected.
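To make the situation concrete, a hedged sketch (file and library names invented) of the setup being described:

// shared.cpp - compiled into BOTH a.dll and b.dll
extern "C" __declspec(dllexport) int function(void* argument)
{
    return argument != nullptr;
}

// app.cpp - links against both import libraries; the single reference to
// function is silently satisfied by whichever library the linker searches
// first, which is the silent choice the question describes.
extern "C" __declspec(dllimport) int function(void* argument);

int main()
{
    return function(nullptr);
}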
I'm currently updating a C++ library for Arduino (Specifically 8-bit AVR processors compiled using avr-gcc).
Typically the authors of the default Arduino libraries like to include an extern variable for the class inside the header, which is defined in the class's .cpp file as well. I assume this is basically to have everything provided ready to go for newbies, as built-in objects.
The scenario I have is: the library I have updated no longer requires the .cpp file, and I have removed it from the library. It wasn't until I went on a final pass checking for bugs that I realized that no linker error was produced, despite the fact that a definition wasn't provided for the extern variable in a .cpp file.
This is as simple as I can get it (header file):
struct Foo {
    void method() {}
};

extern Foo foo;
Including this code and using it in one or many source files does not cause any linker error. I have tried it in both versions of GCC which Arduino uses (4.3.7, 4.8.1) and with C++11 enabled/disabled.
In my attempt to cause an error, I found it was only possible when doing something like taking the address of the object or modifying the contents of a dummy variable I added.
After discovering this I find its important to note:
The class functions only return other objects, as in, nothing like operators returning references to itself, or even a copy.
It only modifies external objects (registers which are effectively volatile uint8_t references in code), and returns temporaries of other classes.
All of the class functions in this header are so basic that they cost less than or equal to the cost of a function call, therefore they are (in my tests) completely in-lined into the caller. A typical statement may create many temporary objects in the call chain, however the compiler sees through these and outputs efficient code modifying registers directly, rather than a set of nested function calls.
I also recall reading in n3797 7.1.1 - 8 that extern can be used on incomplete types, however the class is fully defined whereas the declaration is not (this is probably irrelevant).
I'm led to believe that this may be a result of optimizations at play. I have seen the effect that taking the address has on objects which would otherwise be considered constant and compiled without RAM usage. Adding any layer of indirection to an object whose state the compiler cannot guarantee will cause this RAM-consuming behavior.
So, maybe I've answered my question by simply asking it, however I'm still making assumptions and it bothers me. After quite some time hobby-coding C++, literally the only thing on my list of do-not's is making assumptions.
Really, what I want to know is:
With respect to the working solution I have, is it a simple case of documenting the inability to take the address (cause indirection) of the class?
Is it just an edge case behavior caused by optimizations eliminating the need for something to be linked?
Or is it plain and simple undefined behavior? As in, GCC may have a bug and is permitting code that might fail if optimizations were lowered or disabled?
Or one of you may be lucky enough to be in possession of a decoder ring that can find a suitable paragraph in the standard outlining the specifics.
This is my first question here, so let me know if you would like to know certain details, I can also provide GitHub links to the code if needed.
Edit: As the library needs to be compatible with existing code I need to maintain the ability to use the dot syntax, otherwise I'd simply have a class of static functions.
To remove assumptions for now, I see two options:
Add a .cpp just for the variable definition.
Use a define in the header like #define foo (Foo()) allowing dot syntax via a temporary.
I prefer the method using a define; what does the community think?
Cheers.
Declaring something extern just informs the assembler and the linker that whenever you use that label/symbol, it should refer to an entry in the symbol table, instead of a locally allocated symbol.
The role of the linker is to replace symbol table entries with an actual address whenever possible.
If you don't use the symbol at all in your source file, it will not show up in the assembly code, and thus will not cause any linker error when your module is linked with others, since there is no undefined reference.
It is either an edge case behaviour caused by optimization, or you never use the foo variable in your code. I'm not 100% sure it is formally not undefined behavior, but I'm quite sure it isn't undefined from a practical point of view.
extern variables are implemented in such a way that code compiled with them produces so-called relocations - empty places where the address of the variable should be placed - which are then filled in by the linker. Apparently foo is never used in your code in a way that would need its address, and therefore the linker doesn't even try to find that symbol. If you turn optimization off (-O0) you will probably get a linker error.
Update: If you want to keep the "dot notation" but remove the problem with the undefined extern, you may replace extern with static (in the header file), creating a separate "instance" of the variable for each TU. As this variable is going to be optimized out anyway, this will not change the generated code at all, but it will also work for an unoptimized build.
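A minimal sketch of that static-in-the-header suggestion, using the Foo from the question:

// Foo.h - hedged sketch of the suggestion above
struct Foo {
    void method() {}
};

// Internal linkage: every translation unit that includes this header gets its
// own copy, so there is always a definition to bind to, optimized or not.
static Foo foo;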
I want to understand exactly which parts of a program the compiler looks at and which the linker looks at. So I wrote the following code:
#include <iostream>
using namespace std;
#include <string>
class Test {
private:
    int i;
public:
    Test(int val) { i = val; }
    void DefinedCorrectFunction(int val);
    void DefinedIncorrectFunction(int val);
    void NonDefinedFunction(int val);
    template <class paramType>
    void FunctionTemplate(paramType val) { i = val }
};

void Test::DefinedCorrectFunction(int val)
{
    i = val;
}

void Test::DefinedIncorrectFunction(int val)
{
    i = val
}

void main()
{
    Test testObject(1);
    //testObject.NonDefinedFunction(2);
    //testObject.FunctionTemplate<int>(2);
}
I have three functions:
DefinedCorrectFunction - This is a normal function declared and defined correctly.
DefinedIncorrectFunction - This function is declared correctly but the implementation is wrong (missing ;)
NonDefinedFunction - Only declaration. No definition.
FunctionTemplate - A function template.
Now if I compile this code I get a compiler error for the missing ';' in DefinedIncorrectFunction.
Suppose I fix this and then uncomment testObject.NonDefinedFunction(2). Now I get a linker error.
Now uncomment testObject.FunctionTemplate<int>(2). Now I get a compiler error for the missing ';'.
For function templates I understand that they are not touched by the compiler unless they are invoked in the code. So the compiler does not complain about the missing ';' until I call testObject.FunctionTemplate<int>(2).
For testObject.NonDefinedFunction(2), the compiler did not complain but the linker did. As I understand it, all the compiler cared about was that NonDefinedFunction was declared; it didn't care about the implementation. Then the linker complained because it could not find the implementation. So far so good.
Where I get confused is when the compiler complained about DefinedIncorrectFunction. It didn't look for the implementation of NonDefinedFunction, but it did go through DefinedIncorrectFunction.
So I'm a little unclear as to what exactly the compiler does and what the linker does. My understanding is that the linker links components with their calls, so when NonDefinedFunction is called it looks for the compiled implementation of NonDefinedFunction and complains. But the compiler didn't care about the implementation of NonDefinedFunction, yet it did care about DefinedIncorrectFunction.
I'd really appreciate if someone can explain this or provide some reference.
Thank you.
The function of the compiler is to compile the code that you have written and convert it into object files. So if you have missed a ; or used an undefined variable, the compiler will complain because these are syntax errors.
If the compilation proceeds without any hitch, the object files are produced. The object files have a complex structure but basically contain five things
Headers - The information about the file
Object Code - Code in machine language (This code cannot run by itself in most cases)
Relocation Information - What portions of code will need to have addresses changed when the actual execution occurs
Symbol Table - Symbols referenced by the code. They may be defined in this code, imported from other modules or defined by linker
Debugging Info - Used by debuggers
The compiler compiles the code and fills the symbol table with every symbol it encounters. Symbols refer to both variables and functions. The answer to this question explains the symbol table.
This contains a collection of executable code and data that the linker can process into a working application or shared library. The object file has a data structure called a symbol table in it that maps the different items in the object file to names that the linker can understand.
The point to note
If you call a function from your code, the compiler doesn't put the final address of the routine in the object file. Instead, it puts a placeholder value into the code and adds a note that tells the linker to look up the reference in the various symbol tables from all the object files it's processing and stick the final location there.
The generated object files are processed by the linker, which will fill in the blanks in the symbol tables, link one module to another, and finally produce the executable code which can be loaded by the loader.
So in your specific case -
DefinedIncorrectFunction() - The compiler gets the definition of the function and begins compiling it to produce the object code and insert the appropriate reference into the symbol table. Compilation fails due to the syntax error, so the compiler aborts with an error.
NonDefinedFunction() - The compiler gets the declaration but no definition, so it adds an entry to the symbol table and leaves it for the linker to resolve (since the linker will process a bunch of object files, it is possible this definition is present in some other object file). In your case you do not specify any other file, so the linker aborts with an undefined reference to NonDefinedFunction error because it can't find a definition for the concerned symbol table entry.
To understand it further, let's say your code is structured as follows:
File- try.h
#include<string>
#include<iostream>
class Test {
private:
    int i;
public:
    Test(int val) { i = val; }
    void DefinedCorrectFunction(int val);
    void DefinedIncorrectFunction(int val);
    void NonDefinedFunction(int val);
    template <class paramType>
    void FunctionTemplate(paramType val) { i = val; }
};
File try.cpp
#include "try.h"
void Test::DefinedCorrectFunction(int val)
{
    i = val;
}

void Test::DefinedIncorrectFunction(int val)
{
    i = val;
}

int main()
{
    Test testObject(1);
    testObject.NonDefinedFunction(2);
    //testObject.FunctionTemplate<int>(2);
    return 0;
}
Let us first only compile and assemble the code but not link it
$g++ -c try.cpp -o try.o
$
This step proceeds without any problem. So you have the object code in try.o. Let's try and link it up
$g++ try.o
try.o: In function `main':
try.cpp:(.text+0x52): undefined reference to `Test::NonDefinedFunction(int)'
collect2: ld returned 1 exit status
You forgot to define Test::NonDefinedFunction. Let's define it in a separate file.
File- try1.cpp
#include "try.h"
void Test::NonDefinedFunction(int val)
{
    i = val;
}
Let us compile it into object code
$ g++ -c try1.cpp -o try1.o
$
Again it is successful. Let us try to link only this file
$ g++ try1.o
/usr/lib/gcc/x86_64-redhat-linux/4.4.5/../../../../lib64/crt1.o: In function `_start':
(.text+0x20): undefined reference to `main'
collect2: ld returned 1 exit status
No main, so it won't link!!
Now you have two separate object files that have all the components you need. Just pass BOTH of them to the linker and let it do the rest
$ g++ try.o try1.o
$
No error!! This is because the linker finds definitions of all the functions (even though they are scattered across different object files) and fills in the blanks in the object code with appropriate values
I believe this is your question:
Where I get confused is when compiler complained about DefinedIncorrectFunction. It didn't look for implementation of NonDefinedFunction but it went through the DefinedIncorrectFunction.
The compiler tried to parse DefinedIncorrectFunction (because you provided a definition in this source file) and there was a syntax error (missing semicolon). On the other hand, the compiler never saw a definition for NonDefinedFunction because there simply was no code in this module. You might have provided a definition of NonDefinedFunction in another source file, but the compiler doesn't know that. The compiler only looks at one source file (and its included header files) at a time.
Say you want to eat some soup, so you go to a restaurant.
You search the menu for soup. If you don't find it in the menu, you leave the restaurant. (kind of like a compiler complaining it couldn't find the function) If you find it, what do you do?
You call the waiter to go get you some soup. However, just because it's in the menu, doesn't mean that they also have it in the kitchen. Could be an outdated menu, it could be that someone forgot to tell the chef that he's supposed to make soup. So again, you leave. (like an error from the linker that it couldn't find the symbol)
Compiler checks that the source code is language conformant and adheres to the semantics of the language. The output from compiler is object code.
The linker links the different object modules together to form an exe. The definitions of functions are located in this phase and the appropriate code to call them is added in this phase.
The compiler compiles code in the form of translation units. It will compile all the code that is included in a source .cpp file.
DefinedIncorrectFunction() is defined in your source file, so the compiler checks it for language validity.
NonDefinedFunction() does not have any definition in the source file, so the compiler does not need to compile it. If the definition is present in some other source file, the function will be compiled as part of that translation unit, and the linker will link to it; if the definition is not found by the linker at the linking stage, it will raise a linking error.
What the compiler does, and what the linker does, depends on the implementation: a legal implementation could just store the tokenized source in the “compiler”, and do everything in the linker. Modern implementations do put off more and more to the linker, for better optimization. And many early implementations of templates didn't even look at the template code until link time, other than matching braces enough to know where the template ended. From a user point of view, you're more interested in whether the error “requires a diagnostic” (which can be emitted by the compiler or the linker) or is undefined behavior.
In the case of DefinedIncorrectFunction, you have provided source text which the implementation is required to parse. That text contains an error for which a diagnostic is required. In the case of NonDefinedFunction: if the function is used, failure to provide a definition (or providing more than one definition) in the complete program is a violation of the one definition rule, which is undefined behavior. No diagnostic is required (but I can't imagine an implementation that didn't provide one for a missing definition of a function that was used).
In practice, errors which can be easily detected simply by examining the text input of a single translation unit are defined by the standard to “require a diagnostic”, and will be detected by the compiler. Errors which cannot be detected by the examination of a single translation unit (e.g. a missing definition, which might be present in a different translation unit) are formally undefined behavior—in many cases, the errors can be detected by the linker, and in such cases, implementations will in fact emit an error.
This is somewhat modified in cases like inline functions, where you're allowed to repeat the definition in each translation unit, and modified far more by templates, since many errors cannot be detected until instantiation. In the case of templates, the standard leaves implementations a great deal of freedom: at the least, the compiler must parse the template enough to determine where the template ends. The standard added things like typename, however, to allow much more parsing before instantiation. In dependent contexts, however, some errors cannot possibly be detected before instantiation, which may take place at compilation time or at link time—early implementations favored link time instantiation; compile time instantiation dominates today, and is used by VC++ and g++.
The missing semi-colon is a syntax error and therefore the code should not compile. This might happen even in a template implementation. Essentially, there is a parsing stage, and whilst it is obvious to a human how to "fix and recover", a compiler doesn't have to do that. It can't just "imagine the semi-colon is there because that's what you meant" and continue.
A linker looks for the definitions of the functions that are called, where they are required. NonDefinedFunction isn't required here, so there is no complaint. There is no error in this file as such; even if the function were required, it might be implemented in a different compilation unit. The linker is responsible for collecting together different compilation units, i.e. "linking" them.
Ah, but you could have NonDefinedFunction(int) in another compilation unit.
The compiler produces some output for the linker that basically says the following (among other things):
Which symbols (functions/variables/etc) are defined.
Which symbols are referenced but undefined. In this case the linker needs to resolve the references by searching through the other modules being linked. If it can't, you get a linker error.
The linker is there to link in code defined (possibly) in external modules - libraries or object files you will use together with this particular source file to generate the complete executable. So, if you have a declaration but no definition, your code will compile because the compiler knows the linker might find the missing code somewhere else and make it work. Therefore, in this case you will get an error from the linker, not the compiler.
If, on the other hand, there's a syntax error in your code, the compiler can't even compile it and you will get an error at this stage. Macros and templates may behave a bit differently, not causing errors if they are not used (templates are roughly macros with a somewhat nicer interface), but it also depends on the severity of the error. If you mess up so badly that the compiler can't figure out where the templated/macro code ends and regular code starts, it won't be able to compile.
With regular code, the compiler must compile even dead code (code not referenced in your source file) because someone might want to use that code from another source file, by linking your .o file to his code. Therefore non-templated/macro code must be syntactically correct even if it is not directly used in the same source file.
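A small hypothetical example of that difference:

// Parsed and fully checked even though nothing calls it; a missing semicolon
// or unknown name here stops the compile.
void never_called()
{
    int unused = 42;
}

// Only has to be parseable until it is instantiated; this dependent call is
// not checked against any concrete type unless the template is used.
template <class T>
void never_instantiated(T value)
{
    value.member_that_may_not_exist();
}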