So, I have a file structure like this:
FileA
FileB
FileC
FileA includes FileB and FileC
FileB has:
#define image(i, j, w) (image[ ((i)*(w)) + (j) ])
and FileC has:
#define image(i, j, h) (image[ ((j)*(h)) + (i) ])
On compilation I get:
warning: "image" redefined
note: this is the location of the previous definition ...
Does this warning mean it changes the definition of the other file where it found it initially when compiling?
Is there any way to avoid this warning while keeping these two defines, with each definition applying only to its respective file?
Thank you in advance :)
Does this warning mean it changes the definition of the other file where it found it initially when compiling ?
The program is ill-formed. The language doesn't specify what happens in this case. If the compiler accepts an ill-formed program, then you must read the compiler's documentation to find out what it does in that case.
Note that the program might not even compile with other compilers.
Is there any way to avoid this warning while maintaining these two defines, and them applying their different definitions on their respective files?
Technically, you could use a hack like this without touching either header:
#include "FileB"
#undef image
#include "FileC"
But a good solution - if you can modify the headers - is to not use macros. Edit the headers to get rid of them. Use functions instead, and declare them in distinct namespaces so that their names don't conflict.
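As a rough sketch of the function-based approach, assuming the image is a flat std::vector<int>: the namespace names file_b and file_c and the function name pixel are made up for illustration and do not come from the question.

```cpp
#include <cstddef>
#include <vector>

namespace file_b {
// Row-major access: element (i, j) of an image whose rows are w wide.
inline int& pixel(std::vector<int>& image, std::size_t i, std::size_t j, std::size_t w) {
    return image[i * w + j];
}
}  // namespace file_b

namespace file_c {
// Column-major access: element (i, j) of an image whose columns are h tall.
inline int& pixel(std::vector<int>& image, std::size_t i, std::size_t j, std::size_t h) {
    return image[j * h + i];
}
}  // namespace file_c
```

Both functions can be visible in the same translation unit without any conflict, and callers pick one explicitly with file_b::pixel(...) or file_c::pixel(...).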
Some rules of thumb:
Don't use unnecessary macros. Functions and variables are superior to macros.
Follow the common convention of using only upper case for macro names, if you absolutely need to use macros. It is important to make sure that macro names don't mix with non-macros because macros don't respect namespaces nor scopes.
If you need a macro within a single header, then undefine it immediately when it's no longer needed instead of leaking it into other headers.
Don't use names without namespaces. That will lead to name conflicts. Macros don't respect C++ namespaces, but you can instead prefix their names. For example, you could have FILE_B_IMAGE and FILE_C_IMAGE (or something more descriptive based on the concrete context).
They are not functionally equivalent, one can be seen as a row-wise iteration and the other a column-wise
This seems like a good argument for renaming the functions (or the macros, if you for some reason cannot replace them). Call one row_wise and the other column_wise or something along those lines. Use descriptive names!
Does this warning mean it changes the definition of the other file where it found it initially when compiling ?
For GCC (tagged) it means that the definition processed second is used from the point of the redefinition onward, including not only in the same file but at any places later in the translation unit where the macro identifier appears followed by a (. Previous appearances will have used the previous definition.
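The "from that point onward" behaviour can be illustrated with a well-formed variant that puts an #undef between the two definitions. This is only a sketch: the macro name IMAGE_INDEX and the two helper functions are hypothetical, not taken from the question.

```cpp
#include <cstddef>

// First definition: row-major index, in the spirit of FileB.
#define IMAGE_INDEX(i, j, n) (((i) * (n)) + (j))

// The macro is expanded here, so this function uses the row-major definition.
inline std::size_t index_before(std::size_t i, std::size_t j, std::size_t n) {
    return IMAGE_INDEX(i, j, n);
}

// Remove the first definition so that the next #define is not a redefinition.
#undef IMAGE_INDEX

// Second definition: column-major index, in the spirit of FileC.
#define IMAGE_INDEX(i, j, n) (((j) * (n)) + (i))

// Expanded after the second #define, so this one is column-major.
inline std::size_t index_after(std::size_t i, std::size_t j, std::size_t n) {
    return IMAGE_INDEX(i, j, n);
}

// Do not leak the macro to whatever comes next.
#undef IMAGE_INDEX
```

The earlier function keeps the earlier expansion because macro replacement happens where the text is read, not where it is later called.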
Neither the C language specification nor the C++ language specification provides a more general answer: the redefinition other than with an identical token sequence violates language constraints, therefore both the translation behavior and the execution behavior of a program containing such a non-matching redefinition are undefined.
Is there any way to avoid this warning while maintaining these two defines, and them applying their different definitions on their respective files?
If these definitions are meant to be used only within their respective files, then the easiest solution would be for each file to #undef image at the end. This would work in both C and C++.
If both are intended to be exposed for use by other files then you have a name collision that you will have to resolve one way or another. You might, for instance, add a distinguishing prefix to the definition and all uses of each one. In C++ only, you also have the option of resolving the name collision by changing the macros to [inline] functions and putting them in different namespaces. That would probably make it easier to adapt each one's users to the new names than prefixing the names would do.
AFAIK, we can have two static variables with the same name in different functions? How are these managed by the compiler and symbol table? How are their identities managed separately?
Compilers don't store static variables' names in the linking symbol table. They are just some memory that is part of the module as far as the linker is concerned. (This may not be 100% true in all cases, but it is effectively true.)
The names of static variables are usually included within the debugging symbol table.
When you feed a .c file to the compiler it keeps up with the names of all known symbols so that it can recognize them for what they are when they come up in future code. It also remembers them so that it can give useful error/warning messages, but it pretty much forgets about them when generating output files (unless debugging symbols are being generated).
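A small illustration of two same-named statics living independently; the function names next_a and next_b are invented for the example, and it is written as C++ although the same idea holds in C.

```cpp
// Each function owns a distinct object that happens to be called `count`.
// The name is only meaningful inside its own function body.
int next_a() {
    static int count = 0;    // initialized once, persists across calls
    return ++count;
}

int next_b() {
    static int count = 100;  // unrelated to the `count` in next_a
    return ++count;
}
```

Calling next_a() never affects the value seen by next_b(), because the compiler resolves each `count` by scope while compiling and does not need to export either name to the linker.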
They are likely mangled in the table, in a similar way to how overloaded functions are implemented.
See dumpbin /symbols foo.obj if you want to peek at the table, or use objdump on Linux.
It depends on the compiler, but some embedded ones simply add a number to the end of each duplicate name. That way each variable has a unique name.
I am trying to learn and understand name mangling in C++. Here are some questions:
(1) From devx
When a global function is overloaded, the generated mangled name for each overloaded version is unique. Name mangling is also applied to variables. Thus, a local variable and a global variable with the same user-given name still get distinct mangled names.
Are there other examples that are using name mangling, besides overloading functions and same-name global and local variables ?
(2) From Wiki
The need arises where the language allows different entities to be named with the same identifier as long as they occupy a different namespace (where a namespace is typically defined by a module, class, or explicit namespace directive).
I don't quite understand why name mangling is only applied to the cases when the identifiers belong to different namespaces, since overloading functions can be in the same namespace and same-name global and local variables can also be in the same space. How to understand this?
Do variables with same name but in different scopes also use name mangling?
(3) Does C have name mangling? If it does not, how can it deal with the case when some global and local variables have the same name? C does not have overloading functions, right?
Thanks and regards!
C does not do name mangling, though on some platforms (macOS, for example) the toolchain prepends an underscore to symbol names, so printf(3) is actually _printf in the libc object there.
In C++ the story is different. The history of it is that originally Stroustrup created "C with Classes", implemented by cfront, a compiler that translated early C++ to C. Then the rest of the tools (the C compiler and linker) would be used to produce object code. This implied that C++ names had to be translated to C names somehow. This is exactly what name mangling does. It provides a unique name for each class member and global/namespace function and variable, so namespace and class names (for resolution) and argument types (for overloading) are somehow included in the final linker names.
This is very easy to see with tools like nm(1) - compile your C++ source and look at the generated symbols. The following is on OSX with GCC:
#include <stdexcept>
#include <string>

namespace zoom
{
    void boom( const std::string& s )
    {
        throw std::runtime_error( s );
    }
}
~$ nm a.out | grep boom
0000000100001873 T __ZN4zoom4boomERKSs
In both C and C++, local (automatic) variables produce no symbols, but live in registers or on the stack.
Edit:
Local variables do not have names in the resulting object file for the simple reason that the linker does not need to know about them. So no name, no mangling. Everything else (that the linker has to look at) is name-mangled in C++.
Mangling is simply how the compiler keeps the linker happy.
In C, you can't have two functions with the same name, no matter what. So that's what the linker was written to assume: unique names. (You can have static functions in different compilation units, because their names aren't of interest to the linker.)
In C++, you can have two functions with the same name as long as they have different parameter types. So C++ combines the function name with the types in some way. That way the linker sees them as having different names.
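To see how parameter types end up inside the linker name, here is a sketch that decodes a few mangled names at run time. It assumes GCC or Clang (the Itanium C++ ABI) and uses abi::__cxa_demangle from <cxxabi.h>; the helper name demangle is made up, and none of this applies to MSVC, which uses a different scheme.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI only)
#include <cstdlib>
#include <string>

// Returns the human-readable form of a mangled name,
// or the input unchanged if it cannot be demangled.
inline std::string demangle(const char* mangled) {
    int status = 0;
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return mangled;
    }
    std::string result(text);
    std::free(text);  // the buffer is allocated with malloc
    return result;
}
```

With this ABI, demangle("_Z3fooi") gives foo(int) and demangle("_Z3food") gives foo(double): the two overloads of foo differ only in the trailing type code, which is what lets the linker tell them apart.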
The exact manner of mangling is not significant to the programmer, only the compiler, and in fact every compiler does it differently. All that matters is that every function with the same base name is somehow made unique for the linker.
You can see now that adding namespaces and templates to the mix keeps extending the principle.
Technically, it's "decorating". It sounds less crude but also mangling sort of implies that CreditInterest might get rearranged into IntCrederestit whereas what actually happens is more like _CreditInterest#4 which is, fair to say, "decorated" more than mangled. That said, I call it mangling too :-) but you'll find more technical info and examples if you search for "C++ name decoration".
Are there other examples that are using name mangling, besides overloading functions and same-name global and local variables?
C++ compilers mangle essentially every symbol with external linkage (declarations marked extern "C" are the main exception). It's just easier for the compiler. Typically the mangling encodes something about the parameter list or types, as these are the most common reasons mangling is needed.
C does not mangle. Scoping is used to control access to local and global variables of the same name.
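Scoping alone is enough to keep a local and a global of the same name apart, as this sketch shows. It is written in C++ so that :: can be used to reach the global; plain C has no such operator, and the names here are invented for the example.

```cpp
int value = 1;  // global variable

int read_local() {
    int value = 2;   // local variable shadows the global inside this function
    return value;    // refers to the local
}

int read_global() {
    int value = 2;   // still shadows the global
    (void)value;     // silence "unused variable" warnings
    return ::value;  // explicitly refers to the global
}
```

Only the global needs a linker-visible name; the local is resolved entirely at compile time, so no name conflict ever reaches the linker.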
Source: http://sickprogrammersarea.blogspot.in/2014/03/technical-interview-questions-on-c_6.html
Name mangling is the process C++ compilers use to give each function in your program a unique name. C++ programs generally have at least a few functions with the same name, so name mangling is an important aspect of C++.
Example:
Commonly, member names are uniquely generated by concatenating the name of the member with that of the class e.g. given the declaration:
class Class1
{
public:
int val;
...
};
val becomes something like:
// a possible member name mangling
val__11Class1
Agner Fog has more information on what name mangling is and how it is done in different compilers.
Name mangling (also called name decoration) is a method used by C++
compilers to add additional information to the names of functions and
objects in object files. This information is used by linkers when a
function or object defined in one module is referenced from another
module. Name mangling serves the following purposes:
1. Make it possible for linkers to distinguish between different versions of overloaded functions.
2. Make it possible for linkers to check that objects and functions are declared in exactly the same way in all modules.
3. Make it possible for linkers to give complete information about the type of unresolved references in error messages.
Name mangling was invented to fulfill purpose 1. The other purposes
are secondary benefits not fully supported by all compilers. The
minimum information that must be supplied for a function is the name
of the function and the types of all its parameters as well as any
class or namespace qualifiers. Possible additional information
includes the return type, calling convention, etc. All this
information is coded into a single ASCII text string which looks
cryptic to the human observer. The linker does not have to know what
this code means in order to fulfill purpose 1 and 2. It only needs to
check if strings are identical.
I have been searching through various posts to find out whether the symbol table for C++ code contains function names along with the class name. One post I found says that it depends on the type of compiler:
if it compiles code in one-pass then it will not need to store class name and subroutine names in your symbol table
but if it is a multi-pass compiler, it could add information about the class(es) it encounters and their subroutines so that it could do argument type checking and issue meaningful error messages.
I could not understand whether it is actually compiler-dependent or not. I was assuming that a compiler (for C++ code) would put function names with class names in the table whether it is a single-pass or multi-pass compiler. How is it dependent on the passes? I don't have deep knowledge of this.
Moreover, could anyone show a sample symbol table for a simple C++ class, how would it look like (function names with class name)?
Most compiler textbooks will tell you about symbol tables, and often show you details for a modest-complexity language such as Pascal. You won't find information about C++ symbol tables in a textbook; it is too arcane.
We offer a complete C++14 front end for our DMS Software Reengineering Toolkit. It parses C++, builds detailed ASTs, and performs name-and-type resolution, which includes building a precise symbol table.
What follows are slides from our tutorial on how to use DMS, focused on the C++ symbol table structures.
OP asked specifically for a view of what happens with classes. The following diagram shows this for the tiny C++ program in the upper left corner. The rest of the diagram shows boxes, which represent what we call "symbol spaces" (or "scopes"), which are essentially hash tables mapping symbol names (each box lists the symbols it owns) to the information that DMS knows about that symbol (source file location of definition, list of AST nodes that reference the definition, and a complex union that represents the type, and that may in turn point to other types). The arrows show how symbol spaces are connected; an arrow from space A to space B means "scope A is contained within scope B". Typically the symbol space lookup process, searching scope A for a symbol x, will continue the search in scope B if x is not found in A. You'll note the arrows are numbered with an integer; this tells the search machinery to look in the least-numbered parent scope first, before trying to search scopes using arrows with larger numbers. This is how scopes are ordered (note that class C inherits from A and B; any lookup of a field in class C, such as "b", will be forced to look first in the scope for A and then in the scope for B). In this way, the C++ lookup rules are achieved.
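The chained lookup described above can be sketched in a few lines. This is a simplified illustration, not the DMS implementation: the struct name SymbolSpace, its fields, and the use of a plain string as the per-symbol information are all assumptions, and the position of a parent in the vector stands in for the integer on each arrow.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// A hypothetical symbol space: a hash table of names plus ordered parent links.
struct SymbolSpace {
    // symbol name -> information about the symbol (just a description here)
    std::unordered_map<std::string, std::string> symbols;
    // parent scopes, searched in order: index 0 plays the role of arrow number 1
    std::vector<const SymbolSpace*> parents;

    // Look in this space first, then in each parent in order, depth-first.
    // Returns nullptr if the symbol is not found anywhere.
    const std::string* lookup(const std::string& symbol) const {
        auto it = symbols.find(symbol);
        if (it != symbols.end()) {
            return &it->second;
        }
        for (const SymbolSpace* parent : parents) {
            if (const std::string* hit = parent->lookup(symbol)) {
                return hit;
            }
        }
        return nullptr;
    }
};
```

For a class C that inherits from A and B, the space for C would list the space for A before the space for B among its parents, so a member declared in both is found in A first. Real C++ lookup has more rules than this (ambiguity between bases is an error, for instance), which this sketch ignores.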
Note that the class names are recorded in the (unique) global namespace because they are declared at top level. If they had been defined in some explicit namespace, then the namespace would have a corresponding symbol space of its own that recorded the declared classes, and the namespace itself would be recorded in the global symbol space.
OP did not ask what the symbol table looks like for function bodies, but I just so happen to have an illustrative slide for that, too, below.
The symbol spaces work the same way. What is shown in this slide is the linkage between a symbol space and the scoped region it represents. That linkage is actually implemented by a pointer, associated with the symbol space, to the corresponding AST(s); namespace definitions can be scattered around in multiple places.
Note that in this case, the function name is recorded in the global namespace because it is declared at top level. If it had been defined inside the scope of a class, the function name would have been recorded in the symbol space for the class body (on previous diagram).
As a general rule, the details of how the symbol table is organized are completely dependent on the compiler and the choices the designers made. In our case, we designed a very general symbol table management package because we planned to use (and have used) the same package to handle multiple languages (C, C++, Java, COBOL, several legacy languages) in a uniform way.
However, the abstract structures of symbol spaces and inheritance will have to be implemented in essentially equivalent ways across C++ compilers; after all, they have to model the same information. I'd expect similar structures in the GCC and Clang compilers (well, the integer-numbered inheritance arcs, maybe not :)
As a practical matter, it doesn't matter how many "passes" your compiler has. It pretty much has to build these structures to remember what it knows about the symbols, within a pass, and across passes.
While building a C++ parser is very hard by itself, building such a symbol table is much harder. The effort dwarfs the effort to build the C++ parser. Our C++ name resolver is some 250K SLOC of attribute-grammar code compiled and executed by DMS. Getting the details right is an enormous headache; the C++ reference manual is enormous and confusing, the facts are scattered across the document, and in a variety of places it is contradictory (we try to send complaints about this to the committee) and/or inconsistent between compilers (we have versions for GCC and Visual Studio 201x).
Update March 2017: Now have symbol tables for C++2014.
Update June 2018: Now have symbol tables for C++2017.
A symbol table maps names to constructs within the program. As such it is used to record the names of classes, functions, variables, and anything else that has a user-specified name within the program.
(There are two common kinds of symbol table - one that the compiler maintains when it is compiling your program, and another that exists in object file so that it can be linked to other objects. The two are strongly related, but need not have similar representation internally. Typically only some of the symbols from the compiler's symbol table will be output into the object).
Part of what you say makes no sense:
if it compiles code in one-pass then it will not need to store class name and subroutine names in your symbol table
How can the compiler determine to what construct a name refers if it cannot look it up in the symbol table?
but if it is a multi-pass compiler, it could add information about the class(es) it encounters and their subroutines so that it could do argument type checking and issue meaningful error messages.
There's no reason it could not do this in a single pass.
I could not understand whether it is actually compiler dependent or not?
All compilers are going to use a symbol table, but its use will be hidden inside the implementation.
I was assuming that compiler(for C++ code) would put function names with class names in the table whether it is single pass or multi pass compiler. How is it dependent on the passes?
How is what dependent on the passes? All names go in the symbol table - that's what it's for - and usually symbol resolution is important for just about everything else the compiler does, so it needs to be done early (i.e. in the first pass - and in fact the main purpose of the first pass in a multi-pass compiler may well be just to build the symbol table!).
Moreover, could anyone show a sample symbol table for a simple C++ class, how would it look like (function names with class name)?
I'll give it a stab:
class A
{
int a;
void f(int, int);
};
Will yield a symbol table containing symbols "A", "a", and "f". Typically "a" and "f" would be marked with a scope to simplify lookup, eg:
"A" -> (class)
"A::a" -> (class variable member)
"A::f(int,int)" -> (class function member)
It's also possible that the a and f symbols will not be stored in the top-level symbol table, but rather that each name space (including C++ namespaces and classes) will have its own symbol table, containing the symbols defined inside it. But this is, arguably, just a data structure choice. You can still abstractly view the symbol table as a flat table, where a name maps to a construct.
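The flat view can be modelled with an ordinary map keyed by qualified name. This is only a toy: the Kind enum and the function build_table_for_A are invented for the example, and a real compiler stores far more than a kind per symbol.

```cpp
#include <map>
#include <string>

// What sort of construct a name refers to (hypothetical, heavily simplified).
enum class Kind { Class, DataMember, MemberFunction };

// Builds the flat table for: class A { int a; void f(int, int); };
inline std::map<std::string, Kind> build_table_for_A() {
    return {
        {"A", Kind::Class},
        {"A::a", Kind::DataMember},
        {"A::f(int,int)", Kind::MemberFunction},
    };
}
```

Looking up "A::f(int,int)" finds the member function, while a bare "a" is not a key at all, which is the flat-table equivalent of a member not being visible outside its class scope.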
In general the "A::a" symbol would not be output to the object file, since it is not required for linking.
Short answer: yes, using 'nm --demangle' on Linux.
Long answer: A function's entry in the symbol table contains the function name plus its parameter types (some compilers also encode the return type) and, if it belongs to a class, the class name too. To save space, the names, types (not always), and classes are not written out in full. These encoded strings are called mangled names. Each mangled name is still unique, and the full class name can be parsed back out of it. To view the symbol table of your program you can use 'nm' on Linux.
http://linux.about.com/library/cmd/blcmdl1_nm.htm
It has a --demangle flag to show the original names. You can compile a few short programs to see what comes out.
Why do some languages, like C++ and Python, require the namespace of an object be specified even when no ambiguity exists? I understand that there are backdoors to this, like using namespace x in C++, or from x import * in Python. However, I can't understand the rationale behind not wanting the language to just "do the right thing" when only one accessible namespace contains a given identifier and no ambiguity exists. To me it's just unnecessary verbosity and a violation of DRY, since you're being forced to specify something the compiler already knows.
For example:
import foo # Contains someFunction().
someFunction() # imported from foo. No ambiguity. Works.
Vs.
import foo # Contains someFunction()
import bar # Contains someFunction() also.
# foo.someFunction or bar.someFunction? Should be an error only because
# ambiguity exists.
someFunction()
One reason is to protect against accidentally introducing a conflict when you change the code (or for an external module/library, when someone else changes it) later on. For example, in Python you can write
from foo import *
from bar import *
without conflicts if you know that modules foo and bar don't have any variables with the same names. But what if in later versions both foo and bar include variables named rofl? Then bar.rofl will cover up foo.rofl without you knowing about it.
I also like to be able to look up to the top of the file and see exactly what names are being imported and where they're coming from (I'm talking about Python, of course, but the same reasoning could apply for C++).
Python takes the view that 'explicit is better than implicit'.
(type import this into a python interpreter)
Also, say I'm reading someone's code. Perhaps it's your code; perhaps it's my code from six months ago. I see a reference to bar(). Where did the function come from? I could look through the file for a def bar(), but if I don't find it, what then? If Python is automatically finding the first bar() available through an import, then I have to search through each file imported to find it. What a pain! And what if the function-finding recurses through the import hierarchy?
I'd rather see zomg.bar(); that tells me where the function is from, and ensures I always get the same one if code changes (unless I change the zomg module).
The problem is about abstraction and reuse : you don't really know if there will not be any future ambiguity.
For example, it's very common to set up different libraries in a project just to discover that they all have their own string class implementation, called "string".
Your compiler will then complain that there is ambiguity if the libraries are not encapsulated in separate namespaces.
It's then a delightful pleasure to dodge this kind of ambiguity by specifying which implementation (like the standard std::string one) you want to use at each specific instruction or context (read: scope).
And if you think that it's obvious in a particular context (read: in a particular function or .cpp in C++, .py file in Python - NEVER in C++ header files) you just have to express yourself and say that "it should be obvious", adding the "using namespace" instruction (or import *). Until the compiler complains because it is not.
If you use using in specific scopes, you don't break the DRY rule at all.
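A sketch of what scoped using looks like in practice, with two made-up libraries lib_a and lib_b that both export a function called describe:

```cpp
#include <string>

// Two hypothetical libraries that happen to export the same name.
namespace lib_a { inline std::string describe() { return "lib_a"; } }
namespace lib_b { inline std::string describe() { return "lib_b"; } }

inline std::string use_a() {
    using lib_a::describe;  // using-declaration, visible only inside this function
    return describe();
}

inline std::string use_b() {
    using namespace lib_b;  // using-directive, also limited to this function body
    return describe();
}
```

Each function states once which library it means and then uses the short name freely; outside those bodies the two names still need qualification, so neither choice leaks into other code.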
There have been languages where the compiler tried to "do the right thing" - Algol and PL/I come to mind. The reason they are not around anymore is that compilers are very bad at doing the right thing, but very good at doing the wrong one, given half a chance!
The ideal this rule strives for is to make creating reusable components easy - and if you reuse your component, you just don't know which symbols will be defined in other namespaces the client uses. So the rule forces you to make your intention clear with respect to further definitions you don't know about yet.
However, this ideal has not been reached for C++, mainly because of Koenig lookup.
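For readers who have not met Koenig lookup (argument-dependent lookup, ADL), here is a minimal sketch. The namespace geometry and its contents are invented for the example.

```cpp
namespace geometry {
struct Point {
    int x;
    int y;
};

// Manhattan distance of p from the origin.
inline int manhattan(const Point& p) {
    int ax = p.x < 0 ? -p.x : p.x;
    int ay = p.y < 0 ? -p.y : p.y;
    return ax + ay;
}
}  // namespace geometry

inline int distance_from_origin() {
    geometry::Point p{3, -4};
    // No geometry:: prefix and no using-declaration: the call still resolves,
    // because the argument's type lives in namespace geometry.
    return manhattan(p);
}
```

So an unqualified call can already pull in names from a namespace the caller never mentioned, which is why the "always say where a name comes from" ideal does not fully hold in C++.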
Is it really the right thing?
What if I have two types ::bat and ::foo::bar
I want to reference the bat type but accidentally hit the r key instead of t (they're right next to each other).
Is it "the right thing" for the compiler to then go searching through every namespace to find ::foo::bar without giving me even a warning?
Or what if I use "bar" as shorthand for the "::foo::bar" type all over my codebase.
Then one day I include a library which defines a ::bar datatype. Suddenly an ambiguity exists where there was none before. And suddenly, "the right thing" has become wrong.
The right thing for the compiler to do in this case would be to assume I meant the type I actually wrote. If I write bar with no namespace prefix, it should assume I'm referring to a type bar in the global namespace. But if it does that in our hypothetical scenario, it'll change what type my code references without even alerting me.
Alternatively, it could give me an error, but come on, that'd just be ridiculous, because even with the current language rules, there should be no ambiguity here, since one of the types is hidden away in a namespace I didn't specify, so it shouldn't be considered.
Another problem is that the compiler may not know what other types exist. In C++, the order of definitions matters.
In C#, types can be defined in separate assemblies, and referenced in your code. How does the compiler know that another type with the same name doesn't exist in another assembly, just in a different namespace? How does it know that one won't be added to another assembly later on?
The right thing is to do what gives the programmer the fewest nasty surprises. Second-guessing the programmer based on incomplete data is generally not the right thing to do.
Most languages give you several tools to avoid having to specify the namespace.
In C++, you have "using namespace foo", as well as typedefs. If you don't want to repeat the namespace prefix, then don't. Use the tools made available by the language so you don't have to.
This all depends on your definition of "right thing". Is it the right thing for the compiler to guess your intention if there's only one match?
There are arguments for both sides.
Interesting question. In the case of C++, as I see it, provided the compiler flagged an error as soon as there was a conflict, the only problem this could cause would be:
Auto-lookup of all C++ namespaces would remove the ability to hide the names of internal parts of library code.
Library code often contains parts (types, functions, global variables) that are never intended to be visible to the "outside world." C++ has unnamed namespaces for exactly this reason -- to avoid "internal parts" clogging up the global namespace, even when those library namespaces are explicitly imported with using namespace xyz;.
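A minimal sketch of hiding an internal helper with an unnamed namespace, assuming this code lives in one of the library's .cpp files; the names clamp_to_byte and brighten are made up, and xyz is borrowed from the text above.

```cpp
namespace xyz {

namespace {
// Internal helper: the unnamed namespace gives it internal linkage,
// so other translation units cannot link against it.
int clamp_to_byte(int v) {
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}
}  // unnamed namespace

// Public entry point: this is the only name other files are meant to use.
int brighten(int pixel, int amount) {
    return clamp_to_byte(pixel + amount);
}

}  // namespace xyz
```

Client code that only sees a header declaring xyz::brighten gets no clamp_to_byte from a using namespace xyz; directive, because the helper is never declared outside this file.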
Example: Suppose C++ did do auto-lookup, and a particular implementation of the C++ Standard Library contained an internal helper function, std::helper_func(). Suppose a user Joe develops an application containing a function joe::helper_func() using a different library implementation that does not contain std::helper_func(), and calls his own method using unqualified calls to helper_func(). Now Joe's code will compile fine in his environment, but any other user who tries to compile that code using the first library implementation will hit compiler error messages. So the first thing required to make Joe's code portable is to either insert the appropriate using declarations/directives or use fully qualified identifiers. In other words, auto-lookup buys nothing for portable code.
Admittedly, this doesn't seem like a problem that's likely to come up very often. But since typing explicit using declarations/directives (e.g. using namespace std;) is not a big deal for most people, solves this problem completely, and would be required for portable development anyway, using them (heh) seems like a sensible way to do things.
NOTE: As Klaim pointed out, you would never in any circumstances want to rely on auto-lookup inside a header file, as this would immediately prevent your module from being used at the same time as any module containing a conflicting name. (This is just a logical extension of why you don't do using namespace xyz; inside headers in C++ as it stands.)