Will static variables with the same name in two separate functions conflict? [duplicate] - c++

AFAIK, we can have two static variables with the same name in different functions. How are these managed by the compiler and the symbol table? How are their identities kept separate?

Compilers don't store static variables' names in the linker symbol table. As far as the linker is concerned, they are just some memory that is part of the module. (This may not be 100% true in all cases, but it is effectively true.)
The names of static variables are usually included within the debugging symbol table.
When you feed a .c file to the compiler it keeps up with the names of all known symbols so that it can recognize them for what they are when they come up in future code. It also remembers them so that it can give useful error/warning messages, but it pretty much forgets about them when generating output files (unless debugging symbols are being generated).

They are likely mangled in the table, in a similar way to how overloaded functions are implemented.
See dumpbin /symbols foo.obj if you want to peek at the table, or use objdump on Linux.

It depends on the compiler, but some embedded ones simply add a number to the end of each duplicate name. That way each variable has a unique name.

Related

Does Symbol table for C++ code contain function names along with class names?

I have been searching through various posts on whether the symbol table for C++ code contains function names along with the class name. What I could find in one post is that it depends on the type of compiler,
if it compiles code in one-pass then it will not need to store class name and subroutine names in your symbol table
but if it is a multi-pass compiler, it could add information about the class(es) it encounters and their subroutines so that it could do argument type checking and issue meaningful error messages.
I could not understand whether it is actually compiler dependent or not? I was assuming that compiler(for C++ code) would put function names with class names in the table whether it is single pass or multi pass compiler. How is it dependent on the passes? I don't have such a great/deep knowledge.
Moreover, could anyone show a sample symbol table for a simple C++ class, how would it look like (function names with class name)?
Most compiler textbooks will tell you about symbol tables, and often show you details for a language of modest complexity such as Pascal. You won't find information about C++ symbol tables in a textbook; it is too arcane.
We offer a complete C++14 front end for our DMS Software Reengineering Toolkit. It parses C++, builds detailed ASTs, and performs name-and-type resolution, which includes building a precise symbol table.
What follows are slides from our tutorial on how to use DMS, focused on the C++ symbol table structures.
OP asked specifically for a view of what happens with classes. The following diagram shows this for the tiny C++ program in the upper left corner. The rest of the diagram shows boxes, which represent what we call "symbol spaces" (or "scopes"), which are essentially hash tables mapping symbol names (each box lists the symbols it owns) to the information that DMS knows about that symbol (source file location of definition, list of AST nodes that reference the definition, and a complex union that represents the type, and that may in turn point to other types). The arrows show how symbol spaces are connected; an arrow from space A to space B means "scope A is contained within scope B". Typically the symbol space lookup process, searching scope A for a symbol x, will continue the search in scope B if x is not found in A. You'll note the arrows are numbered with an integer; this tells the search machinery to look in the least-numbered parent scope first, before trying to search scopes using arrows with larger numbers. This is how scopes are ordered (note Class C inherits from A and B; any lookup of a field in class C such as "b" will be forced to first look in the scope for A, and then in the scope for B). In this way, the C++ lookup rules are achieved.
Note that the class names are recorded in the (unique) global namespace because they are declared at top level. If they had been defined in some explicit namespace, then the namespace would have a corresponding symbol space of its own that recorded the declared classes, and the namespace itself would be recorded in the global symbol space.
OP did not ask what the symbol table looks like for function bodies, but I just so happen to have an illustrative slide for that, too, below.
The symbol spaces work the same way. What is shown in this slide is the linkage between a symbol space and the scoped region it represents. That linkage is actually implemented by a pointer, associated with the symbol space, to the corresponding AST(s); namespace definitions can be scattered around in multiple places.
Note that in this case, the function name is recorded in the global namespace because it is declared at top level. If it had been defined inside the scope of a class, the function name would have been recorded in the symbol space for the class body (on previous diagram).
As a general rule, the details of how the symbol table is organized are completely dependent on the compiler and the choices its designers made. In our case, we designed a very general symbol table management package because we planned to use (and have used) the same package to handle multiple languages (C, C++, Java, COBOL, several legacy languages) in a uniform way.
However, the abstract structures of symbol spaces and inheritance will have to be implemented in essentially equivalent ways across C++ compilers; after all, they have to model the same information. I'd expect similar structures in the GCC and Clang compilers (well, the integer-numbered inheritance arcs, maybe not :)
As a practical matter, it doesn't matter how many "passes" your compiler has. It pretty much has to build these structures to remember what it knows about the symbols, within a pass, and across passes.
While building a C++ parser is very hard by itself, building such a symbol table is much harder. The effort dwarfs the effort to build the C++ parser. Our C++ name resolver is some 250K SLOC of attribute-grammar code compiled and executed by DMS. Getting the details right is an enormous headache; the C++ reference manual is enormous and confusing, the facts are scattered across the document, and in a variety of places it is contradictory (we try to send complaints about this to the committee) and/or inconsistent between compilers (we have versions for GCC and Visual Studio 201x).
Update March 2017: Now have symbol tables for C++2014.
Update June 2018: Now have symbol tables for C++2017.
A symbol table maps names to constructs within the program. As such it is used to record the names of classes, functions, variables, and anything else that has a user-specified name within the program.
(There are two common kinds of symbol table: one that the compiler maintains while it is compiling your program, and another that exists in the object file so that it can be linked with other objects. The two are strongly related, but need not have similar representations internally. Typically only some of the symbols from the compiler's symbol table will be output into the object file.)
Part of what you say makes no sense:
if it compiles code in one-pass then it will not need to store class name and subroutine names in your symbol table
How can the compiler determine to what construct a name refers if it cannot look it up in the symbol table?
but if it is a multi-pass compiler, it could add information about the class(es) it encounters and their subroutines so that it could do argument type checking and issue meaningful error messages.
There's no reason it could not do this in a single pass.
I could not understand whether it is actually compiler dependent or not?
All compilers are going to use a symbol table, but its use will be hidden inside the implementation.
I was assuming that compiler(for C++ code) would put function names with class names in the table whether it is single pass or multi pass compiler. How is it dependent on the passes?
How is what dependent on the passes? All names go in the symbol table - that's what it's for - and usually symbol resolution is important for just about everything else the compiler does, so it needs to be done early (i.e. in the first pass - and in fact the main purpose of the first pass in a multi-pass compiler may well be just to build the symbol table!).
Moreover, could anyone show a sample symbol table for a simple C++ class, how would it look like (function names with class name)?
I'll give it a stab:
class A
{
int a;
void f(int, int);
};
Will yield a symbol table containing symbols "A", "a", and "f". Typically "a" and "f" would be marked with a scope to simplify lookup, eg:
"A" -> (class)
"A::a" -> (class variable member)
"A::f(int,int)" -> (class function member)
It's also possible that the a and f symbols will not be stored in the top-level symbol table, but rather that each name space (including C++ namespaces and classes) will have its own symbol table, containing the symbols defined inside it. But this is, arguably, just a data structure choice. You can still abstractly view the symbol table as a flat table, where a name maps to a construct.
In general the "A::a" symbol would not be output to the object file, since it is not required for linking.
Short answer: yes, you can see them using 'nm --demangle' on Linux.
Long answer: A function's entry in the symbol table contains the function name plus its parameter types (and, with some ABIs, the return type) and, if it belongs to a class, the class name too. But the names, types (not always) and classes are not written out in full, to save space; they are encoded into a shorter string. This encoding is called name mangling. The mangled name is still unique, and the full class name can be recovered from it. To view the symbol table of your program you can use 'nm' on Linux.
http://linux.about.com/library/cmd/blcmdl1_nm.htm
It has the --demangle flag to show the original names. You can compile random short programs to see what comes out.

How to find the linking path for extern variable

Is there any way to find the file/shared object from which an extern variable used in the current file/module is being linked?
For example: in a large application on Linux, I want to find the declaration of a particular variable that I have declared extern in my module ...
Thanks in advance.
I am not entirely sure I understand your question, but perhaps dladdr is suitable for your needs. dladdr is a GNU/glibc extension. From its manual:
The function dladdr() takes a function pointer and tries to resolve name and file where it is located. (and it very probably could be used with the pointer to a global variable).
However, I am puzzled by the phrasing of your question. The "declaration of a variable" makes no sense inside executable ELF binaries or shared objects, because a declaration is essentially a source code concept, not an object code one. And practically speaking, most declarations (of global variables) are inside some header file.
Be aware of C++ name mangling.
If you have the source code of your application, you could use textual tools (like grep or etags) or even extend the GCC compiler through plugins or MELT extensions to find such declarations.
Of course, you can also use dlsym to find the address of some symbol, given its name.

Do unused functions get optimized out?

Compilers these days tend to do a significant amount of optimizations. Do they also remove unused functions from the final output?
It depends on the compiler. Visual C++ 9 can do that - unused static functions are removed at compilation phase (there's even a C4505 warning for that), unused functions with external linkage can be removed at link phase depending on linker settings.
MSVC (the Visual Studio compiler/linker) can do this if you compile with /Gy and link with /OPT:REF.
GCC/binutils can do this if you compile with -ffunction-sections -fdata-sections and link with --gc-sections.
Don't know about other compilers.
As a general rule, the answer is:
Yes: for unused static functions.
No: for unused globally available functions.
The compiler doesn't know if some other compilation unit references it. Also, most object module types do not allow functions to be removed after compilation and also do not provide a way for the linker to tell if there exist internal references. (The linker can tell if there are external ones.) Some linkers can do it but many things work against this.
Of course, a function in its own module won't be loaded unnecessarily by any linker, unless it is part of a shared library. (Because it might be referenced in the future at runtime, obviously.)
Many compilers do, but it depends on the particular implementation. Debug builds will often include all functions, to allow them to be invoked or examined from within the debugger. Many embedded systems compilers, for reasons I don't totally understand(*), will include all of the functions in an object file if they include any, but will omit entirely any object files that aren't used at all.
Note that in languages which support reflection (e.g., Java, C#, VB.NET, etc.) it's possible, given the name of a function, to create a reference to it at runtime even if no references exist in the code. For example, a routine could accept a string from the console, munge it in some fashion, and generate a call to a function by that name. There wouldn't be any way for a compiler or linker to know what names might be so generated, and thus no way to know what functions may be safely omitted from the code.
No such difficulty exists in C or C++, however, since there is no defined way for code to create a reference to a function, variable, or constant without an explicit reference existing in the code. Some implementations may arrange things so that consecutively-declared constants or variables will be stored consecutively, and one might thus create a reference to a later-declared one by adding an offset to an earlier-declared one, but the behavior of such tricks is explicitly not guaranteed by the C or C++ standards.
(*)I understand that it makes compiling and linking easier, but today's computers should have no trouble running more sophisticated compiling and linking algorithms than would have been practical in decades past. If nothing else, a two-pass pre-compile/pre-link/compile/link method could on the pre-compile/link phase produce a list of things that are used, and then on the "real" compile/link phase omit those that are not.
GCC, if you turn on the optimizations, can remove unused functions and dead code.
More on GCC optimizations can be found here.
Quite a lot of the time, yes. It’s often called linker stripping.
When it comes to Microsoft, it's the linker that takes care of this during the link phase and the compiler might warn you about unused static functions (file scope).
If you want the linker to remove unused functions, you use the /OPT:REF option.
Under MSVC, with global functions or variables, you can use __declspec(selectany).
It will remove the function or variable if it has not been referenced in the code and the linker option /OPT:REF (Optimizations) is selected.

symbol not found AKA undefined symbol

Most people who work on UNIX face this irritating error often.
Sometimes it takes little time to solve, and sometimes it takes a hell of a lot of time.
I face it regularly too, and I need a good document or article about this specific error in C/C++.
What are all the cases in which a Symbol not found/Undefined Symbol error can occur?
Could anybody help me understand what all those cases are?
The error is not related to UNIX/Windows/any other OS, but rather to the languages themselves. It is actually rather simple to diagnose with the information that compilers provide. Usually they will tell you what symbol is missing and sometimes where it is being used. The main reasons for a symbol to be missing are:
You have declared but never defined it
You have defined it, but did not add the compiled symbol (object file/library) to the linker
It is external and you forgot to link the library, or you are linking an invalid version, or in the wrong order.
The first one is a little trickier if you intended to define the symbol but did not match the declaration (declared void foo( int, double );, but defined void foo( double, int )). As with all other cases, the compiler will tell you the exact signature that it is looking for; make sure that you have defined that symbol, and not something close or similar. A particular corner case arises if you use different calling conventions in the declaration and the definition, as they will look very similar in code.
In the case of external library code, the complexity is in identifying which library needs to be linked for that symbol to be added, and that comes from the documentation of the lib. Beware that with static libraries the order of the libs on the linker command line affects the result.
To help you find which symbols are actually defined, you can use nm (shipped with the GNU toolchain, which is common on Unix systems). Basically, you can run nm against the object files/libs that you are linking and search for the symbol that the linker is complaining about. This will help in cases where the order is what makes the difference (i.e. the symbol is there, but the linker skipped it).
At runtime (thanks to Matthieu M. for pointing it out) you might have similar issues with dynamic libraries, if the wrong version of a library is found in the LD_LIBRARY_PATH you might end up with a library that does not have a required symbol.
Although they can be platform dependent, I have some "more complex" instances of some of the points from Andreas and David:
When dealing with shared libraries (.so or .dll) and linking against symbols which are not exported (dllimport/dllexport on Windows and visibility("default") with GCC on *nix)
Or similar: Linking against the static lib while expecting a shared lib, or vice versa. This one is a bit similar to Matthieu's comment about linking against another, unexpected version of the library.
Creating a pure virtual class and not providing an implementation for at least one method (causing no vtable to be available).
Actually a more complex case of declaring but not defining: The linking errors you can get when dealing with large, nested templates. Finding out what was not defined can be difficult with large error messages.
For most cases when you get a symbol not found/undefined symbol or sometimes even a "duplicate symbol" error, they usually stem from the fact that the linker is unable to find the symbol in the project that you are trying to build.
The best way to go about it is to look at the map file generated or a symbol table that is the output of the compiler. It may look something like this:
This will allow you to see if the symbol is present or not. Also, there might be other esoteric problems such as compiler optimizations that might cause a symbol duplication especially with inline assembly. These are about the hardest to detect.
As for good resources and materials, I don't have many good references. When I did ask around back then, most of the senior engineers have actually learned from their own experiences.
But I'm sure that's where forums such as these are present to help us expedite such knowledge acquisition.
Hope it helped :)
Cheers!
I assume you're referring to the linker error. Here's a list off the top of my head, in what I think is most-to-least common order:
You forgot to tell the linker about a dependency (e.g. a LIB-file).
You have a class with a static data member and forgot to initialize it (only C++).
You declared a virtual function that is not pure virtual and forgot to implement it.
You forgot to implement a function that you called from another function (only C, C++ will give a compiler error which is much easier to find).
You declared an external variable and forgot to initialize it.
The declaration of the function doesn't match the implementation (only C++, C will accept it and might die horribly).
You forgot to implement a function that you declared and called from another function.

What is a symbol table?

Can someone describe what a symbol table is within the context of C and C++?
There are two common and related meanings of symbol table here.
First, there's the symbol table in your object files. Usually, a C or C++ compiler compiles a single source file into an object file with a .obj or .o extension. This contains a collection of executable code and data that the linker can process into a working application or shared library. The object file has a data structure called a symbol table in it that maps the different items in the object file to names that the linker can understand. If you call a function from your code, the compiler doesn't put the final address of the routine in the object file. Instead, it puts a placeholder value into the code and adds a note that tells the linker to look up the reference in the various symbol tables from all the object files it's processing and stick the final location there.
Second, there's also the symbol table in a shared library or DLL. This is produced by the linker and serves to name all the functions and data items that are visible to users of the library. This allows the system to do run-time linking, resolving open references to those names to the location where the library is loaded in memory.
If you want to learn more, I suggest John Levine's excellent book "Linkers and Loaders".
Briefly, it is the mapping of the name you assign a variable to its address in memory, including metadata like type, scope, and size. It is used by the compiler.
That's in general, not just C[++]*. Technically, it doesn't always include direct memory address. It depends on what language, platform, etc. the compiler is targeting.
On Linux, you can use the command:
nm [object file]
to list the symbol table of that object file. From this printout, you may then decipher the in-use linker symbols from their mangled names.
The symbol table is the list of "symbols" in a program/unit. Symbols are most often the names of variables or functions. The symbol table can be used to determine where in memory variables or functions will be located.
Check out the Symbol Table wikipedia entry.
Symbol table is an important data structure created and maintained by compilers in order to store information about the occurrence of various entities such as variable names, function names, objects, classes, interfaces, etc.
From the "Computer Systems A Programmer’s Perspective" book, Ch 7 Linking. "Symbols and Symbol Tables":
Symbol table is information about functions and global variables that
are defined and referenced in the program
And an important note (from the same chapter):
It is important to realize that local linker symbols are not the same
as local program variables. The symbol table does not contain any
symbols that correspond to local nonstatic program variables. These
are managed at run time on the stack and are not of interest to the
linker