How does main.cpp see this? - c++

Just started out in C++, so I'm a bit of a noob, and I'm not sure why this works. How can main.cpp use the print() function contained in the separate print.cpp file? I thought you had to use #include/header files or something like that? I'm using Visual Studio, if that helps.
main.cpp
#include "stdafx.h"
#include <iostream>
#include <string>
void print(std::string message);
int main()
{
    std::cout << "Enter message: ";
    std::string message = "";
    std::getline(std::cin, message);
    print(message);
    return 0;
}
print.cpp
#include "stdafx.h"
#include <iostream>
#include <string>
void print(std::string message)
{
    std::cout << "Your message is - " << message << std::endl;
}

Actually the code in main.cpp does not "see" the print function in print.cpp at all!
The call to print is only checked by the compiler against the incomplete function specification (a declaration, or prototype) you wrote earlier in the file, not against anything from any other file. C++ allows this declaration as a way to say, "well, I'm not telling you how this function is implemented now, but it should be available to you after this file and all other files are compiled and ready to link together, perhaps with some existing libraries."
You mentioned include files. All an include directive does is paste the contents of another file in place of the directive, and header files typically contain a bunch of partial function specifications (declarations). After including (which runs as a pre-processing phase before the compiler proper runs), you will have some code that looks just like your main.cpp above. In fact, to the C++ compiler, your code looks no different from code in which your declaration of print was replaced with an #include directive of a file containing that declaration.
An interesting thing about writing incomplete function specifications is that the functions implementing them can often be written in different languages, as long as their data types map directly to C++ types and the declaration uses the matching language linkage (for C, that means declaring the function extern "C" so its name is not mangled). In your case, std::string binds you directly to C++, but had you used int or even char*, a function written in assembly language or C could have been used!
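A minimal sketch of the C-language case, assuming a hypothetical function add_ints: the C++ side declares it extern "C" so the compiler does not mangle the name, and the linker can then match it with the symbol a C compiler produces. The definition is included here only so the sketch links on its own; in a real project it would come from a separately compiled .c or assembly file.

```cpp
#include <cassert>

// Declaration as the C++ side would see it (normally in a header).
// extern "C" disables C++ name mangling, so the linker looks for the
// same unmangled symbol name that a C compiler would emit for add_ints.
extern "C" int add_ints(int a, int b);

// Stand-in definition so this file links by itself. In a real project this
// would live in a separate .c (or assembly) file compiled on its own.
extern "C" int add_ints(int a, int b) {
    return a + b;
}
```

Usage: `assert(add_ints(2, 3) == 5);` holds. Note that std::string could not cross this boundary, since C has no such type.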

The reason that you can compile code in separate translation units is linkage: Linkage is the property of a name, and names come in three kinds of linkage, which determine what the name means when it is seen in different scopes:
None: the meaning of a name with no linkage is unique to the scope in which the name appears. For example, "normal" variables declared inside a function have no linkage, so the name i in foo() has a distinct meaning from the name i in bar().
Internal: the meaning of a name with internal linkage is the same inside each translation unit, but distinct across translation units. Typical examples are the names of variables declared at namespace scope that are const, or that appear in an unnamed namespace, or that use the static specifier. For a concrete example, static int n = 10; declared in one .cpp file refers to the same entity in every use of that name inside that file, but a different static int n in a different file refers to a distinct entity.
External: the meaning of a name with external linkage is the same across the entire program. That is, wherever you declare a specific name with external linkage, that name refers to the same thing. This is the default linkage for functions and non-constants at namespace scope, but you can also explicitly request external linkage with the extern specifier. For example, extern int a; would refer to the same int object anywhere in the program.
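The three kinds of linkage can be seen side by side in a single translation unit. This is a sketch: the names shared_counter, n, hidden_total, limit and bump are hypothetical, and the comments describe what another .cpp file in the same program could or could not see.

```cpp
#include <cassert>

// External linkage: namespace-scope, non-const. Another .cpp file could
// refer to this very object after declaring `extern int shared_counter;`.
int shared_counter = 0;

// Internal linkage via `static`: a `static int n` in another .cpp file
// would be a different object.
static int n = 10;

// Internal linkage via an unnamed namespace.
namespace {
int hidden_total = 0;
}

// Internal linkage because it is a const namespace-scope variable (C++ rule).
const int limit = 3;

int bump() {
    int i = 1;  // No linkage: this `i` exists only inside bump().
    shared_counter += i;
    hidden_total += n;
    return shared_counter;
}
```

Calling bump() twice returns 1 and then 2, leaving hidden_total at 20.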
Now we see how your program fits together (or: "links"): the name print has external linkage (because it's a namespace-scope function that isn't declared static), and so every declaration of it in the program refers to the same function. There's a declaration in main.cpp that you use to call the function, and there's another declaration in print.cpp that defines the function; the two mean the same thing, which means that the thing you call in main is exactly the thing you define in print.cpp.
The use of header files doesn't involve any magic: header files are just textually substituted, and now we see precisely what header files are useful for: they hold declarations of names with external linkage, so that anyone wanting to refer to the entities thus named has an easy and maintainable way of including those declarations in their code.
You could do entirely without headers, but that would require you to know precisely how to declare the names you need, and that is generally not desirable, because the specifics of the declarations are owned by the library owner, not the user, and it is the library owner's responsibility to maintain and ship the declarations.
Now you also see what the purpose of the "linker" part of the translation toolchain is: The linker matches up references to names with external linkage. The linker fills in the reference to the print name in your first translation unit with the ultimate address of the defined entity with that name (coming from the second translation unit) in the final link.

Related

Real Applications of internal linkage in C++

This sounds like a duplicate of "What is the point of internal linkage in C++", and it probably is. There was only one post with some code, and it didn't look like a practical example. The C++ ISO draft says:
When a name has internal linkage, the entity it denotes can be
referred to by names from other scopes in the same translation unit.
It looks like a good, precise definition to me, but I couldn't find any reasonable application of it, something like: "look at this code, the internal linkage here makes a great difference to the implementation". Furthermore, based on the definition provided above, it looks like global variables fulfil the internal linkage duty. Could you provide some examples?
Internal linkage is a practical way to comply with the One Definition Rule.
One might find the need to define functions or objects with plain names, like sum, or total, or collection, or any other common term, more than once. In different translation units they might serve different purposes, specific to that particular translation unit.
If only external linkage existed you'd have to make sure that the same name will not be repeated in different translation units, i.e. little_sum, big_sum, red_sum, etc... At some point this will get real old, real fast.
Internal linkage solves this problem. And unnamed namespaces effectively give internal linkage to entire classes and templates. With an unnamed namespace, if a translation unit needs its own private little template with a practical name like trampoline, it can go ahead and use it safely, without worrying about violating the ODR.
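A minimal sketch of such a file-private template and class. Only trampoline comes from the text above; Total and sum_doubled are made-up names standing in for whatever a real file would contain.

```cpp
#include <cassert>

namespace {
// File-private helper template: another .cpp file can define its own,
// different `trampoline` in its own unnamed namespace without an ODR clash.
template <typename F>
int trampoline(F f, int x) {
    return f(x);
}

// Classes in an unnamed namespace are private to this file as well.
struct Total {
    int value = 0;
    void add(int x) { value += x; }
};
}  // namespace

// The only name this file exports (external linkage).
int sum_doubled(int a, int b) {
    auto twice = [](int v) { return v * 2; };
    Total t;
    t.add(trampoline(twice, a));
    t.add(trampoline(twice, b));
    return t.value;
}
```

With this file, sum_doubled(1, 2) returns 6, and no other translation unit can see trampoline or Total.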
Consider helper functions that don't need to be exposed to the outside, like:
// Foo.hh
void do_something();
// Foo.cc
#include <string>
#include "Foo.hh"
static void log(std::string) { /* Log into foo.log */ }
void do_something() { /* Do stuff while using log() to log info */ }
// Bar.hh
void bar();
// Bar.cc
#include <string>
#include "Bar.hh"
static void log(std::string) { /* Log into bar.log */ }
void bar() { /* Do stuff while using log() */ }
You can use the proper log function within two parts of your project, while avoiding multiple definition errors.
This latter part becomes very important for header only libraries, where the library might be included in multiple translation units within the same project.
As for a reason to use internal linkage with variables: again, you can avoid multiple-definition errors, which would be the result of code like this:
// Foo.cc
int a = 5;
// Bar.cc
int a = 5;
When compiling this, the compiler will happily produce object code for each file, but the linker will refuse to link them together, reporting that a is defined more than once.
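One way to make that pair link, sketched for Foo.cc: give a internal linkage, either with static or by putting it in an unnamed namespace. If Bar.cc does the same, each file gets its own a. The accessor foo_value is hypothetical, added only so the variable can be observed.

```cpp
#include <cassert>

// Foo.cc: `a` now has internal linkage, so a Bar.cc that also writes
// `static int a = 5;` defines a separate object and the link succeeds.
static int a = 5;

// Equivalent alternative:
// namespace { int a = 5; }

// Hypothetical accessor with external linkage.
int foo_value() {
    return a;
}
```

Usage: `assert(foo_value() == 5);` holds regardless of what Bar.cc does with its own a.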

Link an externally defined static function in C++ Application code

I have a set of predefined C source code files that declare and define a lot of static functions - they are just coded up in .c files and not declared in any .h header file.
Now I am trying to make use of those functions in my C++ application code:
Cmethods.c
static int amethod(int oppcheck)
{
}
A library is created using the C source code files:
$ nm --demangle userlib.a | grep amethod
00000000000001b6 t amethod
CppApp.h
extern "C" { int amethod(int oppcheck); }
CppApp.cpp
#include "CppApp.h"
void callme()
{
amethod(check);
}
However, when linking with userlib.a, I get the error below:
: undefined reference to `amethod'
$ nm --demangle userappcode.a | grep amethod
00000000000001b6 t amethod
U amethod
I also found that functions in the C source files that are declared in C header files never cause this linker error.
Note I cannot touch the C source code files - they are provided by third party community and we cannot break the license.
How can I resolve the issue?
I have a set of pre defined C source code files that declares defines a lot of static functions.
Now I am trying to make use of those functions in my C++ application code:
Remove the static from those functions you want to call from some other translation unit.
If you cannot do that, you can't use these functions from outside. And the compiler could even optimize them to the point of removing them from your object file.
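(a dirty trick that I do not recommend could be to compile that C code with gcc -Dstatic= to have the preprocessor replace static by nothing; note that this also strips static from any local variables inside those functions, changing their behaviour)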
Note I cannot touch the C source code files.
Then your task is impossible.
You could "augment" the translation units, perhaps by appending to them a public (non static) function calling the static one. For example, you might compile something like
// include the original C code, whose functions are `static`
#include "Cmethods.c"
// code a public wrapper calling the static method
extern int public_amethod(int oppcheck);
int public_amethod(int oppcheck) { return amethod(oppcheck); }
Note I cannot touch the C source code files - they are provided by third party community and we cannot break the license -
It looks like you might not be legally allowed to compile that code, or that you cannot distribute its object file. There are no technical tricks to overcome a legal prohibition. If that goes to court, you'll probably lose!
Your issue is not technical, but social and legal. You may need a lawyer. You could also talk with the provider of the original code and ask them whether you are allowed to do what you want.
(without more motivation and context, your question looks weird)
static functions in C and static member functions in C++ are two different things. In C, a static function is limited to its translation unit and invisible outside it, which essentially means its object file (.o/.obj).
In C++, static can also apply to member functions and data members of classes. A static data member is also called a "class variable", while a non-static data member is an "instance variable".
If you defined amethod with static, then the function has internal linkage, which means you can't link to this function from other source files.
internal linkage.
The name can be referred to from all scopes in the current translation unit. Any of the following names declared at namespace scope have internal linkage:
variables, functions, or function templates declared static;
non-volatile non-inline (since C++17) const-qualified variables (including constexpr) that aren't declared extern and aren't previously declared to have external linkage;
data members of anonymous unions.
You might not be able to touch the C source, but a hacky solution is to #include it into a different translation unit with functions that forward to the statics. This is not something I would recommend as a general answer, but ...
file1.c:
static void function1(int)
{ ...}
file2.c:
#include "file1.c"
void fwd_function1(int x)
{
function1(x);
}

How are function definitions determined with header files?

When using separate files in C++, I know that functions can be declared using header files like this:
// MyHeader.h
int add(int num, int num2);
// MySource.cpp
int add(int num, int num2) {
    return num + num2;
}
// Main.cpp
#include "MyHeader.h"
#include <iostream>
int main() {
    std::cout << add(4, 5) << std::endl;
    return 0;
}
My question is, in this situation, how does the compiler determine the function definition of add(int,int) when MyHeader.h and Main.cpp have no references at all to MySource.cpp?
Also, if there were multiple add functions (with the same arguments) in a program, how can I make sure the correct one is being used in a certain situation?
The function declaration gives the compiler enough information to generate a call to that function.
The compiler then generates an object file that lists the names (which, in the case of C++, are mangled to encode the arguments, namespace, cv-qualifiers, etc.) of external functions to which that object file refers (along with another list of names it defines).
The linker then takes all those object files, and tries to match up every name that something refers to but doesn't define with some other object file that defines the same name. Then it assigns and fills in addresses, so where one object file refers to nameX, it fills in the address it's assigning to nameX from the other file.
At least in a typical case, the object files it looks at will include a number of libraries (the standard library plus any others you specify). A library is basically just a collection of object files, shoved together into a single file, with enough data to index which data belongs to which object file. In some cases, it also includes extra metadata to (for example) quickly find an object file that defines a specific name (obviously handy for faster linking, but not an absolute necessity).
If there are two or more functions with exactly the same mangled name, then your code has undefined behavior (you're violating the one definition rule). The linker will usually give an error message telling you that nameZ was defined in both object file A and object file B (but the C++ standard doesn't really require that).
The compiler does not "determine" (you mean "know") the function definition. The linker does. You have just discovered why the build process consists of compiling and linking.
So, basically, the compiler produces two object files here. One which contains the definition of add and one which just refers to the "unknown" function add. The linker then takes the two object files and puts the reference and definition together. Of course, that's just a very simple explanation, but for a beginner, that's all you need to know.
The compiler doesn't compile header files; it compiles source files. It will include the code in the header when the header is #included in a source file being compiled, but on its own, the header file doesn't "do" anything.
Also, the compiler doesn't worry about whether a function is defined or not. It just compiles against function declarations. It's the linker that resolves the definitions of functions.
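The split between compiling against a declaration and resolving the definition can be sketched in one file: the call in use_add compiles with nothing but the declaration in scope, exactly as Main.cpp does after #include "MyHeader.h". Here the definition happens to appear later in the same file so the sketch is self-contained; in the question it is in MySource.cpp and the linker connects the two. use_add is a hypothetical name.

```cpp
#include <cassert>

// What #include "MyHeader.h" pastes in: a declaration only.
int add(int num, int num2);

// Compiled against the declaration alone; the compiler emits a call to
// `add` without knowing where its body is.
int use_add() {
    return add(4, 5);
}

// The definition (MySource.cpp in the question). When it is in another
// file, the linker is what matches this to the call above.
int add(int num, int num2) {
    return num + num2;
}
```

Usage: `assert(use_add() == 9);` holds.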
You don't need to include a definition of a function at all, unless it's being called by some other code you need to link.
As to your question, "If there were multiple add functions (with the same arguments) in a program, how can I make sure the correct one is being used in a certain situation?": It depends on the linker and the settings, but generally, if you have more than one definition of a function with the same signature, the linker will issue an error stating that the function is multiply defined.

Accessing a function through inclusion vs declaring static

I have a header file I want to include in another cpp file. I want to know the difference between writing the header file like this:
#include <iostream>
#include <string>
using namespace std;
string ret()
{
    return "called";
}
===================================
#include <iostream>
#include <string>
using namespace std;
static string ret()
{
    return "called";
}
I can access the ret() function anyway!! So, what's the use of the static?
That is a pretty evil header file you're showing.
Never put using namespace std; into a header file. This forces anyone including the header to have all of std in the global namespace.
Use some form of include guards.
static makes the function invisible outside the .cpp where it's included. This means that every .cpp which includes the header will have its own copy of the function. static (non-member) functions should only be used if you specifically need this behaviour.
If you don't use static, you should either move the definition from the header into a source file (if you want it defined once), or declare the function inline (this permits an identical definition in every source file that includes the header; the compiler may also inline its code at call sites, but that is not guaranteed). If you do neither of these, you'll get multiple definition errors if you include the header in more than one source file.
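A sketch of the inline option for the question's header, with an include guard and fully qualified names instead of using namespace std. The guard macro RET_INLINE_H is a made-up name.

```cpp
#include <cassert>
#include <string>

// ret.h: may be included by any number of .cpp files in one program.
#ifndef RET_INLINE_H
#define RET_INLINE_H

// `inline` allows this identical definition to appear in every translation
// unit that includes the header; they all refer to one function, so the
// linker reports no multiple-definition error.
inline std::string ret() {
    return "called";
}

#endif  // RET_INLINE_H
```

Usage from any including file: `assert(ret() == "called");`.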
The first header file defines a function called ret with external linkage in every translation unit that includes it. This is incorrect if more than one such TU is linked in the same program.
The second header file defines a function called ret with internal linkage in every translation unit that includes it. This means that each TU has its own private copy of the function (with a different address) no matter how many are linked together.
There are three correct ways to share code using a header file:
function with internal linkage (as in your second header, or in C++11 by putting it in a nameless namespace).
inline function with external linkage (replace static with inline). The meaning of inline is that although there is only one copy of the function in the program, every TU that uses the function contains its definition.
declare the function in the header, and define it in exactly one .cpp file (for example ret.cpp).
In C++03 there was a fourth way:
function with external linkage in a nameless namespace
In C++11 this fourth way is gone: everything in a nameless namespace has internal linkage, even if declared extern, so there is no way to give such a function external linkage. As far as functions are concerned, nameless namespaces are a nice way of giving the function internal linkage.
Which one you use depends on your needs. The third option means that you can change the definition of the function without re-compiling the calling code, although you'd still need to re-link the executable unless the function is in a dll.
The first two (static or inline) differ in their behaviour if:
the function contains static local variables,
you compare function pointers to ret taken in different TUs,
you examine your executable size or symbol table,
the definition of the function is different in different TUs (perhaps due to different #defines), which is forbidden if the function has external linkage but not if internal.
Otherwise they're much the same.
According to the standard, inline is also a hint that the compiler should optimize calls to that function for fast execution (which in practice means, inline the code at the call site). Most compilers ignore this hint most of the time. They will happily inline a static but non-inline function if they assess it to be a good candidate for inlining, and they will happily avoid inlining an inline function if they assess it to be a bad candidate for inlining.
Use header guards.
Don't use "using namespace" in header files. (Actually, don't use "using" in header files. Use identifiers fully qualified.)
And use a header for declaring functions, not for defining them. You will want the code for ret() to be present in the resulting executable only once. You achieve this by putting the definition (code) of ret() in a .cpp file. One .cpp file, not multiple ones (by including the definition).
The header file lists the declaration of the function ret() so that other code "knows" that the function exists, which parameter it takes, and what it returns.
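A sketch of the declare-in-header, define-once layout described above, squeezed into one listing so it compiles on its own; in a real project the two parts would be separate files, and ret.cpp would #include "ret.h". The guard name RET_H is an assumption.

```cpp
#include <cassert>
#include <string>

// ---- ret.h ----
#ifndef RET_H
#define RET_H
#include <string>

// Declaration only: tells every includer that ret() exists, what it
// takes and what it returns. No `using namespace std;` in a header.
std::string ret();

#endif  // RET_H

// ---- ret.cpp (the one and only definition) ----
// #include "ret.h"  (in a real project)
std::string ret() {
    return "called";
}
```

Any number of .cpp files can include ret.h and call ret(); only ret.cpp's object file contains its code.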
If you define C++ functions as static in the header file, each translation unit (each .cpp file which includes that header file) will have its own copy of those static functions, each with a different address.
Hence the size of your program may increase unnecessarily.
Also, just for clarity:
Defining a function as static only in the .cpp file means that the function has internal linkage and is only accessible from other functions within the same .cpp file.

Is extern keyword really necessary?

...
#include "test1.h"
int main(..)
{
    cout << aaa << endl;
}
aaa is defined in test1.h, and I didn't use the extern keyword, but I can still reference aaa.
So I wonder: is extern really necessary?
extern has its uses. But it mainly involves "global variables" which are frowned upon. The main idea behind extern is to declare things with external linkage. As such it's kind of the opposite of static. But external linkage is in many cases the default linkage so you don't need extern in those cases. Another use of extern is: It can turn definitions into declarations. Examples:
extern int i; // Declaration of i with external linkage
// (only tells the compiler about the existence of i)
int i; // Definition of i with external linkage
// (actually reserves memory, should not be in a header file)
const int f = 3; // Definition of f with internal linkage (due to const)
// (This applies to C++ only, not C. In C f would have
// external linkage.) In C++ it's perfectly fine to put
// something like this into a header file.
extern const int g; // Declaration of g with external linkage
// could be placed into a header file
extern const int g = 3; // Definition of g with external linkage
// Not supposed to be in a header file
static int t; // Definition of t with internal linkage.
// may appear anywhere. Every translation unit that
// has a line like this has its very own t object.
You see, it's rather complicated. There are two orthogonal concepts: Linkage (external vs internal) and the matter of declaration vs definition. The extern keyword can affect both. With respect to linkage it's the opposite of static. But the meaning of static is also overloaded and -- depending on the context -- does or does not control linkage. The other thing it does is to control the life-time of objects ("static life-time"). But at global scope all variables already have a static life-time and some people thought it would be a good idea to recycle the keyword for controlling linkage (this is me just guessing).
Linkage basically is a property of an object or function declared/defined at "namespace scope". If it has internal linkage, it won't be directly accessible by name from other translation units. If it has external linkage, there shall be only one definition across all translation units (with exceptions, see one-definition-rule).
I've found the best way to organise your data is to follow two simple rules:
Only declare things in header files.
Define things in C (or cpp, but I'll just use C here for simplicity) files.
By declare, I mean notify the compiler that things exist, but don't allocate storage for them. This includes typedef, struct, extern and so on.
By define, I generally mean "allocate space for", like int and so on.
If you have a line like:
int aaa;
in a header file, every compilation unit (basically defined as an input stream to the compiler - the C file along with everything it brings in with #include, recursively) will get its own copy. That's going to cause problems if you link two object files together that have the same symbol defined (except under certain limited circumstances like const).
A better way to do this is to define that aaa variable in one of your C files and then put:
extern int aaa;
in your header file.
Note that if your header file is only included in one C file, this isn't a problem. But, in that case, I probably wouldn't even have a header file. Header files are, in my opinion, only for sharing things between compilation units.
If your test1.h has the definition of aaa and you include the header file in more than one translation unit, you will run into a multiple definition error, unless aaa is constant.
It is better to define aaa in a .cpp file and put an extern declaration in the header file, which other files can then include.
Rule of thumb for variables and constants in header files:
extern int a ;//Data declarations
const float pi = 3.141593 ;//Constant definitions
Since constants have internal linkage in C++, any constant defined in a translation unit will not be visible to other translation units. This is not the case for variables: they have external linkage, i.e., they are visible to other translation units. Putting the definition of a variable in a header that is shared by several translation units would lead to multiple definitions of that variable, and hence a multiple definition error.
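The rule of thumb can be sketched as follows, with header and source shown in one listing so it compiles alone. aaa is the variable from the question; its value 42 and the read_aaa accessor are hypothetical.

```cpp
#include <cassert>

// ---- test1.h: safe to include from any number of .cpp files ----
extern int aaa;              // declaration: no storage allocated
const float pi = 3.141593f;  // const => internal linkage in C++

// ---- test1.cpp: exactly one definition in the whole program ----
int aaa = 42;

// Hypothetical accessor showing the declared name refers to that definition.
int read_aaa() {
    return aaa;
}
```

Every file that includes test1.h sees the same aaa, while each gets its own private copy of pi.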
In that case, extern is not necessary. extern is needed when the symbol is defined in another compilation unit.
When you use the #include preprocessing directive, the included file is copied in place of the directive. In this case you don't need extern because the compiler already knows aaa.
If aaa is not defined in another compilation unit you don't need extern, otherwise you do.