Understanding C++ Compilation - c++

I have recently become aware that I have no idea, generally speaking, how a C/C++ compiler works. I will admit this initially came from trying to understand header guards, but I came to the realization that I am lacking in how compiling works.
Take Visual C++ for instance: there's the "Header Files" folder, the "Resource Files" folder, and the "Source Files" folder. Is there any significance to the separation of these folders and what you put in them? To me, they are all source files. Take the code snippets:
Snippet 1
//a1.h
int r=4;
and
//a1.cpp
int b //<--semicolon left out on purpose
and
//main.cpp
#include <iostream>
#include "a1.h"
void main()
{
cout << r;
}
The compiler errors out saying "a1.cpp(3) : fatal error C1004: unexpected end-of-file found". I would not expect that, because the a1.cpp file is not #included in the file where the main method exists, whereas the next code snippet
Snippet 2
//a1.h
int r=4 //<--semicolon left out on purpose
and
//a1.cpp
int b = 4;
and
//main.cpp
#include <iostream>
void main()
{
cout << b;
}
Errors out because "main.cpp(6) : error C2065: 'b' : undeclared identifier". If you include the a1.cpp like so
Snippet 3
//a1.h
int r=4 //<--semicolon left out on purpose
and
//a1.cpp
int b = 4;
and
//main.cpp
#include <iostream>
#include "a1.cpp"
void main()
{
cout << b;
}
the compiler complains "a1.obj : error LNK2005: "int b" (?b@@3HA) already defined in main.obj". Both snippets 2 and 3 ignore the fact that int r = 4 is missing its semicolon, which I suspect has something to do with it being a .h file. If I remove the a1.cpp file from the project in snippet 1, then it compiles fine. Clearly what I expected is not what I am getting. There are plenty of books and tutorials on how to code in C++, but not much on how C++ handles files and source code in the compilation process. What on earth is going on here?

Your questions aren't really about the compiler, but about how your IDE is handling the entire build system. The build systems for most C/C++ projects compile each .c or .cpp file separately, and then link the resulting object files together into a final executable. In your case, your IDE is compiling any file you have in the project with a filename extension of .cpp and then linking the resulting objects. The behaviour you're seeing can be explained as follows:
Snippet 1: a1.cpp is missing a ;, so when the IDE tries to compile that file, you get the error about 'unexpected end of file'.
Snippet 2: b isn't declared anywhere in the main.cpp compilation unit, so you get an error about an undeclared identifier.
Snippet 3: b exists in both the main.cpp and a1.cpp compilation units (obviously in a1.cpp, and via your #include for main.cpp). Your IDE compiles both of those files, so a1.o and main.o each contain an object called b. When linking, you get a duplicate symbol error.
The important point to take away here, which explains all of the behaviour you see, is that your IDE compiles every .cpp file - not just main.cpp and the files it includes - and then links the resulting objects.
I recommend setting up a command-line test project with a makefile you create yourself - that will teach you all about the inner workings of build systems, and you can then apply that knowledge to the inner workings of your IDE.

1. Header files are not compiled on their own.
2. An #include directive literally pastes the contents of the included file in place of the #include line.
3. All source files (regardless of main) are compiled into .o or .obj files.
4. All object files are linked together, along with external .lib files if there are any.
5. You get an executable.
Regarding point 2:
example
//a.h
int
//b.h
x =
//c.h
5
//main.cpp
#include <iostream>
int main()
{
#include "a.h"
#include "b.h"
#include "c.h"
;
std::cout << x << std::endl; //prints 5 :)
}
This isn't a full answer, but hth, my2c, etc :)

Since it seems that there are two ways of understanding your question, I will answer to the understanding C++ compilation part.
I suggest that you start by reading the "compiler" definition in Wikipedia. After that, try a Google search for compiler tutorials to build up your comprehension of compilers. More specific to C++, you can read about #include and preprocessor directives (try a Google search for those terms).
If you still want to understand compilers further, I suggest a compiler book. You'll find a good list of books on StackOverflow.

The #include statement inserts that file into the file making the #include. Your snippet 3 main.cpp thus becomes the following before compilation.
// main.cpp
// All sorts of stuff from iostream
//a1.cpp
int b = 4;
void main()
{
cout << b;
}
The reason you are getting a linker error is that you are defining b twice. It is defined in a1.cpp and, through the #include, in main.cpp.
You may wish to read about declaring and defining.
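As a rough sketch of that distinction, the three files are simulated below inside one translation unit, with comments marking where each part would normally live. The header name a1.h follows the question; the helper read_b is made up for illustration. A declaration may be repeated as often as you like, while the definition must appear exactly once across all object files.

```cpp
// Simulated contents of a1.h: a declaration only.
// "extern" says that b exists somewhere; no storage is reserved here.
extern int b;
extern int b;   // repeating a declaration is harmless

// Simulated contents of a1.cpp: the one and only definition.
int b = 4;

// Simulated contents of main.cpp: it only needs the declaration above.
// In a real project the linker connects this use to the definition
// that ended up in a1.obj.
int read_b() {
    return b;
}
```

With this layout, main.cpp would #include "a1.h" rather than "a1.cpp", so b is defined in a1.obj only.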

You tell the build system which files to compile. In the case of Visual C++, it will automatically compile any file named "*.cpp" that you add to the project, though you can go into the project settings and tell it not to.
It will not compile files named *.h (though it can if you explicitly tell it to).
The #include directive is something that is processed before any compilation takes place, by what is called the preprocessor. It basically takes the file the directive points to and sticks it into the source file being compiled, at the point where the #include directive appears. The compiler then compiles that whole thing as one complete unit.
So, in your example cases:
Snippet 1
Both a1.cpp and main.cpp are compiled separately by the build system. So, when it encounters the error in a1.cpp, it reports it.
Snippet 2
Note that it compiles these files separately, with no knowledge of each other, so when you reference b in main.cpp, it does not know that b is defined in a1.cpp.
Snippet 3
Now you've included a1.cpp in main.cpp, so it compiles main.cpp, sees a definition for b and says, OK, I have a b at global scope. Then it compiles a1.cpp, and says, OK, I have a b at global scope.
Now the linker steps in and tries to put a1 and main together, and it tells you: hey, I have two b's at global scope. No good.

The compiler picks up source files from where you tell it to. In the case of Visual C++ there's an IDE telling the compiler what to do, and the different folders are there because that's how the IDE organises the files.
Also, the error in snippet 3 is from the linker, not from the compiler.
The compiler has compiled main.cpp and a1.cpp into object files main.obj and a1.obj and then the linker is trying to make an executable combining these object files, but the variable b is in both a1.obj (directly) and main.obj (via the include of a1.cpp), so you get the "already defined" error.

The problems you see in cases 1 and 3 are VS specific. VS apparently tries to compile both main.cpp and a1.cpp.
Case 1: As VS tries to compile a1.cpp, which has a syntax error (the missing semicolon), the compilation fails.
Case 2: You have not declared the variable b in your main.cpp or in any included files. Thus the compilation fails.
Case 3: This is a linker error. Due to the include, int b has been defined in main.cpp as well as in a1.cpp. Since neither of them is static or extern, two global variables with the same identifier have been defined in the same scope. This is not allowed.

Related

"Undefined reference to" Error while linking object files [duplicate]

I realize this question has been asked in a number of ways including this very comprehensive answer but I have looked at many and tried fixing my error to no avail.
I am using .cpp and .c files to create a program. I compiled all files with g++, it seemed to have no more linking errors, though it gave me a bunch of C++ errors related to the C syntax. This was the command I used:
g++ -o program main.cpp /data/*.c -l[...libs]
The main.cpp calls functions in the .c files.
I then understood that one should not try to compile both .c and .cpp files with one compiler, instead to use gcc for one, g++ for the other, and thereafter to simply link the object files.
So I did this (the .c files are part of a library, and already have .o files)
g++ -c main.cpp
g++ -o program main.o /data/*.o -l[..libs]
But then here I will get "undefined reference to" errors for functions called from main.cpp to the precompiled .c files, errors which I didn't get previously.
Could anyone help? Or do I maybe need to provide more information?
EDIT (a more in depth excerpt of code, I've tried to simplify otherwise this will be impossible to read, but let me know if I still need to add stuff, I'll try to remove unnecessary code):
main.cpp :
#include "header.h"
int main(int argc, char** argv) {
string s1 = argv[2];
fn1(s1)
}
header.h
void fn1(string s1)
mycfile.c
#include "header.h"
void fn1(string s1){
fprintf(stdout, " you entered %s", s1);
fflush(stdout);
}
ANSWER:
@Smeehey helped me figure out the solution: it was actually that I was still including the old headers in an .hpp file I was using. But the core solution was indeed to use extern "C" {}.
This is highly likely to do with C-vs-C++ linkage. In your main.cpp, you probably have something like this:
#include <data/header.h>
where header.h refers to your c library. Replace it as follows:
extern "C" {
#include <data/header.h>
}
This tells your c++ compiler not to use c++-style name mangling when defining the required symbols from the header, allowing the linker to successfully find them in the c-compiled .o files.
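A minimal sketch of how a C header is often written so that it can be included from both C and C++ without the caller adding the wrapper. The name fn1 follows the question; the const char* parameter and the int return value are assumptions made for illustration, since the real library header is not shown.

```cpp
#include <cstring>

// Hypothetical header content. __cplusplus is defined only by C++
// compilers, so a C compiler sees a plain prototype and a C++ compiler
// sees the same prototype wrapped in extern "C".
#ifdef __cplusplus
extern "C" {
#endif

int fn1(const char* s);   // C linkage: the symbol is not name-mangled

#ifdef __cplusplus
}
#endif

// Stand-in for the definition that would normally be compiled from the
// C source file; it is placed here only to keep the sketch self-contained.
extern "C" int fn1(const char* s) {
    return static_cast<int>(std::strlen(s));
}
```

One consequence of C linkage is that fn1 cannot be overloaded, since the symbol name no longer encodes the parameter types.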
You have to compile C files with the gcc command, C++ files with either gcc or g++, and link with the g++ command. This sequence will probably work:
gcc -c data/*.c main.cpp
g++ -o program *.o -l <libs...>
Next step: learn to write proper makefiles.
This is a shot in the dark, but the problem might be that the way C and C++ files are compiled is similar yet slightly different.
Because C++ mangles names (to support namespaces and overloading), a C++ function foo would generate a symbol such as some_prefix@foo,
whereas in C a function goo generates the symbol goo.
Try doing the following in your .cpp files
extern "C"
{
#include "yourcfilesheader.h"
}
And please attach your code

Conditional compilation confusion and failure

I want to compile different files with a common *.c file. Like I want to compile A.c common.c xor B.c common.c but I can't figure out how to achieve that.
Can you please tell me how do I make common.c use different headers without using my text editor to change the headers list every time I want to compile
So let's say I have 3 files: A.c, B.c and common.c.
A.h and B.h define an enum enum {Mon, Tues, Wed...} in different ways. And that enum is used in common.c. So I can't just do in common.c:
#include "A.h"
#include "B.h"
What I thought of doing is to use preprocessor directives:
In common.h
#define A 1
#define B 2
In A.c
#define COMPILE_HEADER A
And in common.c
#if COMPILE_HEADER == A
#include A.h
#end
This doesn't work, of course, because the compiler didn't visit the A.c file to find #define COMPILE_HEADER A
So can you please tell me how do I make common.c use different headers without using my text editor to change the headers list every time I want to compile?
It's pretty complicated to explain, but I'll give it a try.
Explanation
The compiler, for example gcc, gets the input files provided, includes (literally just copy-pastes) the header files into their respective places (where the #include directive is located) in the *.c file, then compiles the file to the object (*.o) file. No executable is created. Here comes the linker, which is included in gcc. It takes the *.o files and links them into one executable.
The catch is that the files are compiled independently and then linked together. A declaration (prototype) like int func(int param); is like saying to the compiler "Hey man, don't worry about any usage of func in the code, the linker will take care of it". The compiler then just records this usage as an external symbol in the corresponding *.o file, and when the linker does its job, it first finds the location of the symbol's definition (the function implementation) and then points to it wherever func is called.
Try to include function definition in header file, then include it in 2 or more files from same project (compiled/linked together). Compiler will say it's ok, since the code is correct and the generated code is valid. Then, the linker will try to link it into one executable and he would have to decide which version of the same name-param function should it link to. Since most developer tools are not really good at making the right choices, he will just yell at you saying "hey man, you gave me two definitions of same function, what to do now?". This results with an error like this:
obj\Release\b.o:b.cpp:(.text+0x0): multiple definition of 'func(int)'
obj\Release\a.o:a.cpp:(.text+0xc): first defined here
What about having two main in one project?
obj\Release\b.o:b.cpp:(.text.startup+0x0): multiple definition of 'main'
obj\Release\a.o:a.cpp:(.text.startup+0x0): first defined here
Both files compile, but they cannot be linked together.
Header files are meant to contain class definitions and function declarations so that you write them only once and then share them between all files that want to use them. You could always type the class definitions into each file separately (as long as they stay the same) and use them just as you would from a .h file; the same applies to function declarations.
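To make the role of a shared header a bit more concrete, here is a small sketch in which the text of a guarded header is pasted twice into one file, as two #include lines would do. The guard name T_H and func come from the example further down; Point and sum are hypothetical additions.

```cpp
// First simulated include of t.h: T_H is not yet defined, so the body is kept.
#ifndef T_H
#define T_H
struct Point { int x; int y; };  // hypothetical class definition
int func(int a);                 // function declaration only
#endif // T_H

// Second simulated include of the same text: T_H is already defined,
// so the preprocessor skips everything up to #endif. Without the guard,
// the second "struct Point" would be a redefinition error.
#ifndef T_H
#define T_H
struct Point { int x; int y; };
int func(int a);
#endif // T_H

// The single definition, which would normally live in one .cpp file.
int func(int a) { return a + 5; }

// Uses the class definition that came from the "header".
int sum(const Point& p) { return p.x + p.y; }
```

Note that the guard only protects against repeated inclusion within one translation unit; it does nothing about the same definition ending up in two different object files.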
Here comes your problem. You have to compile only one of the files, not include only one. You can do this with a preprocessor trick, but I wouldn't recommend it as a solution, since the problem can be solved much more easily (I'll tell you how in a moment).
TL;DR; (The actual answer without explanation)
You can #ifdef / #ifndef both files, then in another *.c define (or not define) some value. For example:
A.cpp
#include "t.h"
#ifdef USE_A
int func(int a)
{
return a + 5;
}
#endif
B.cpp
#include "t.h"
#ifndef USE_A
int func(int a)
{
return a * 10;
}
#endif
t.h
#ifndef T_H
#define T_H
//Or comment to use B
#define USE_A
int func(int a);
#endif // T_H
main.cpp
#include <iostream>
#include "t.h"
using namespace std;
int main()
{
cout << func(3);
}
Notice that the #ifdef/#ifndef come after the #include, so the preprocessor knows which one to compile. Also, it's not possible (at least I can't think of any way) to put the #define in another .c file, because the files are compiled separately (as described above). You could use the -D switch, but that involves messing with build configurations in your environment, and if you want to do that it's better to try the second solution presented below.
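For reference, a sketch of what the -D route could look like. -D is the GCC/Clang option that defines a macro on the command line; the file name select.cpp and the helper variant are hypothetical, and both variants are shown in one file only to keep the sketch short.

```cpp
// Built as:  g++ -DUSE_A -c select.cpp   -> the first definition is compiled
// Built as:  g++ -c select.cpp           -> the second definition is compiled
#ifdef USE_A
int func(int a) { return a + 5; }
#else
int func(int a) { return a * 10; }
#endif

// Reports which variant the preprocessor selected at compile time.
const char* variant() {
#ifdef USE_A
    return "A";
#else
    return "B";
#endif
}
```

Because the macro comes from the command line, no source file has to be edited to switch variants, but every file that tests USE_A must be built with the same setting.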
Better answer
You should choose which file to compile for each version. It depends heavily on your needs, but basically, instead of using g++ main.cpp a.cpp b.cpp, you should compile either a.cpp or b.cpp. If you are using an integrated environment like Visual Studio or Code::Blocks, the configuration managers allow you to decide which files to include. I mean, those Debug/Release dropdowns can contain your own entries, which customize your project. Then, switching between A.cpp and B.cpp is just a matter of choosing the appropriate option in the always-visible bar in your environment.
If you want a more detailed tutorial on how to manage configurations in Code::Blocks or Visual Studio, create an appropriate question on Stack Overflow and it will be answered in no time :)

Fatal error: 'stdafx.h' file not found

I am new to programming C++ and am trying to learn myself through websites (learncpp.com) although I am already stuck on compiling my first program =( . They use Visual Studio to program their code and because I am using a macbook, I just use vi and terminal (or should I use something else?)
Here's the helloworld.cpp program I wrote based on the tutorial:
#include "stdafx.h"
#include <iostream>
{
std::cout <<"Hello World!" <<std::end1;
return 0;
}
when I compiled (gcc -Wall hello.cpp) I get the error :
helloworld.cpp:1:10: fatal error: 'stdafx.h' file not found
#include "stdafx.h"
^
1 error generated.
Can anyone give me insight on to how to fix this?
stdafx.h is the precompiled header used by Visual Studio, you do not need this.
You seem to have missed out the int main() function
It is std::endl not std::end1
So something like this:
#include <iostream>
int main() {
std::cout << "Hello World!" << std::endl;
return 0;
}
stdafx.h is a precompiled header file, and it is specific to Visual Studio.
A precompiled header is of little use unless you are facing slow compilation times. Your program doesn't need it at all, so you can remove that line and everything will be fine.
You might be wondering: if it is not needed, then why is it included?
I will explain:
Whenever a .cpp file includes a header (#include), the compiler walks through that header, examines it, and compiles it as part of the .cpp file.
This process is repeated for each and every .cpp file that includes the header.
If a project has 1000 .cpp files that all include, say, xyz.h, then the compiler will compile xyz.h 1000 times, which may take a noticeable amount of time.
To avoid that, the compiler gives us the option to "precompile" the header file, so it gets compiled only once, which speeds up compilation.
Two problems:
a) stdafx.h is not needed (as others noted).
b) 'end1' should be 'endl' (note the letter 'l' vs. the number '1').

How can the order of include statements matter in the linking step?

I cannot explain the behaviour I am seeing when linking my code. Maybe someone has an idea what's going on...
I have a multiple file C++ project which uses GNU automake tools as its build system (all on Linux).
After adding a source and header file (let's call them util.cc and util.h) to the project and having an already existing source file (calc.cc) call a function from the newly added files, I get a linking error depending on where the include statement appears. I repeat: the error occurs in the linking step, compilation runs fine!!
Example:
I get an error when putting the new include statement at the end of the preexisting statements, like:
calc.cc:
#include "file1.h"
#include "file2.h"
#include "file3.h"
#include "file4.h"
#include "util.h" // new header
This version compiles fine. But linking produces an error (symbol not found)!!
Now, when changing this to
#include "util.h" // new header
#include "file1.h"
#include "file2.h"
#include "file3.h"
#include "file4.h"
then compilation and linking runs fine!
Since the linker only reads the .o files, this must mean that different content is produced depending on where the include statement appears. How can this be?
Compiler is g++ (GCC) 4.4.6
Chances are that util.h has a #define that changes the behaviour of one of the other files.
Your best chance of working out exactly what is going on would involve examining those header files for the name of the missing symbol and getting the pre-processor output from compiling calc.cc both 'working' and 'non working' way, and comparing the two files.
Simple, header files can (re)define macros which can change the interpretation of later macros.
For instance, in your example above, if file1.h does
#define lseek lseek64
and util.h has an inline function which calls lseek, then depending on the include order the generated object code will have a symbol reference to lseek or lseek64.
Which is why projects tend to have rules that config.h (generated by autoconf) is included first.
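The effect can be sketched in a single file. The functions seek_plain and seek_wide are hypothetical stand-ins for lseek and lseek64, and the two namespaces stand in for the same util.h text being included before or after the header that defines the macro.

```cpp
// Two hypothetical library functions, standing in for lseek and lseek64.
inline int seek_plain() { return 32; }
inline int seek_wide()  { return 64; }

// Case 1: "util.h" is included before the header that defines the macro.
namespace util_first {
    inline int util_seek() { return seek_plain(); }  // refers to seek_plain
}

// What a header such as file1.h might do.
#define seek_plain seek_wide

// Case 2: the same "util.h" text, included after the macro is defined.
namespace util_last {
    inline int util_seek() { return seek_plain(); }  // preprocessed into seek_wide()
}

#undef seek_plain
```

The two util_seek bodies are textually identical, yet they reference different symbols, which is how an include reorder can turn into a link-time difference.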
You're absolutely right: different object code is produced in the two cases. As @hmjd also points out, most likely there's a macro in util.h which one of the other (.h or .c) files uses, and (in C, at least) any undeclared identifier that is called is assumed to be a function by the compiler -- this is most likely the error here.

Understanding the origin of a linker duplicate symbol error

I have a c++ program that compiled previously, but after mucking with the Jamfiles, the program no longer compiled and ld emitted a duplicate symbol error. This persisted after successively reverting to the original Jamfiles, running bjam clean, removing the objects by hand, and switching from clang with the gcc front end to gcc 4.2.1 on MacOs 10.6.7.
A simplified description of the program is that there is main.cpp and four files, a.h,cpp and b.h,cpp, which are compiled into a static library which is linked to main.o. Both, main.cpp and b.cpp depend on the file containing the offending symbol, off.h, through two different intermediate files, but neither a.h nor a.cpp depend in any way on off.h.
Before you ask, I made sure that all files were wrapped in multiple definition guards (#ifndef, #define, #endif), and while I did find a file that was missing them, it did not reference off.h. More importantly, b.h does not include anything that references off.h, only the implementation, b.cpp, makes any reference to off.h. This alone had me puzzled.
To add to my confusion, I was able to remove the reference to off.h from b.cpp and, as expected, it recompiled successfully. However, when I added the reference back in, it also compiled successfully, and continued to do so after cleaning out the object files. I am still at a loss for why it was failing to compile, especially considering that the symbols should not have conflicted, I had prevented symbol duplication, and I had gotten rid of any prior/incomplete builds.
Since I was able to successfully compile my program, I doubt I'll be able to reproduce it to test out any suggestions. However, I am curious as to how this can happen, and if I run across this behavior in the future, what, if anything beyond what I've done, might I do to fix it?
This is often the result of defining an object in a header file, rather than merely declaring it. Consider:
h.h :
#ifndef H_H_
#define H_H_
int i;
#endif
a.cpp :
#include "h.h"
b.cpp :
#include "h.h"
int main() {}
This will produce a duplicate symbol i. The solution is to declare the object in the header file: extern int i; and to define it in exactly one of the source-code files: int i;.