C++ compiled object files and internal identifiers

I read here that:
A function with internal linkage is only visible inside one translation unit. When the compiler compiles a function with internal linkage, the compiler writes the machine code for that function at some address and puts that address in all calls to that function (which are all in that one translation unit), but strips out all mention of that function in the ".o" file.
I compiled this code:
int g_i{}; //extern
static int sg_i{}; //static
static int add(int a, int b) //internal linkage!
{
return a+b;
}
int main()
{
static int s_i{}; //static - local
int a_i{}; //auto - local
a_i = add(1,2);
return 0;
}
I compiled it with g++ -c to create my main.o file.
Running nm -C main.o then gives this result:
0000000000000000 b .bss
0000000000000000 d .data
0000000000000000 p .pdata
0000000000000000 r .rdata$zzz
0000000000000000 t .text
0000000000000000 r .xdata
U __main
0000000000000000 t add(int, int)
0000000000000004 b sg_i
0000000000000008 b main::s_i
0000000000000000 B g_i
0000000000000014 T main
Can you please explain why these internal identifiers are still mentioned in the object file, when I have heard that a linker using this object file will have no idea they exist?
Thanks.

The linker does know that the function exists. It also knows that a function with internal linkage is visible only inside its own translation unit, so it will not resolve a call from any other translation unit to it. In the nm output above this shows up as the lowercase t next to add(int, int) and the lowercase b next to sg_i and main::s_i: lowercase letters mark local symbols, while the uppercase T and B next to main and g_i mark global ones.
That's why those internal identifiers are still recorded: the entry tells the linker that the symbol belongs only to this translation unit.
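As a minimal sketch of what internal linkage looks like at the source level (the names add_static, add_anon and check_internal_linkage are made up for illustration), the first two functions below can only be called from inside this one translation unit. With GCC they would typically show up in nm as local symbols (lowercase t) if they are emitted at all, while check_internal_linkage would get a global T; that mapping is an assumption about the toolchain, not something the language guarantees.

```cpp
#include <cassert>

// Internal linkage via the static keyword, as in the question.
static int add_static(int a, int b) { return a + b; }

// Internal linkage via an unnamed namespace (the usual C++ spelling).
namespace {
int add_anon(int a, int b) { return a + b; }
}

// External linkage: other translation units may declare and call this.
int check_internal_linkage() {
    assert(add_static(1, 2) == 3);
    assert(add_anon(3, 4) == 7);
    return add_static(1, 2) + add_anon(3, 4);  // 3 + 7
}
```

Another source file could define its own add_static or add_anon without any conflict at link time, because neither name is offered to other object files.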

Related

How does the linker allow multiple definitions of a function template in different object files but only one definition of ordinary functions?

I know how to use the inline keyword to avoid 'multiple definition' errors with C++ templates. What I am curious about is how the linker tells that one specialization is a full specialization that violates the ODR and reports an error, while another specialization is implicit and is handled correctly.
From the nm output, we can see duplicate definitions in main.o and other.o for both the int version and the char version of max(), but the linker only reports a 'multiple definition' error for the char version and lets the int version link successfully. How does the linker tell them apart?
// tmplhdr.hpp
#include <iostream>
// this function is instantiated in main.o and other.o
// but leads no 'multiple definition' error by linker
template<typename T>
T max(T a, T b)
{
std::cout << "match generic\n";
return (b<a)?a:b;
}
// 'multiple definition' link error if without inline
template<>
inline char max(char a, char b)
{
std::cout << "match full specialization\n";
return (b<a)?a:b;
}
// main.cpp
#include "tmplhdr.hpp"
extern int mymax(int, int);
int main()
{
std::cout << max(1,2) << std::endl;
std::cout << mymax(10,20) << std::endl;
std::cout << max('a','b') << std::endl;
return 0;
}
// other.cpp
#include "tmplhdr.hpp"
int mymax(int a, int b)
{
return max(a, b);
}
Test output on Ubuntu is reasonable; but output on Cygwin is rather strange and confusing...
==== Test on Cygwin ====
g++ linker only reported 'char max(char, char)' is duplicated.
$ g++ -o main.exe main.cpp other.cpp
/usr/lib/gcc/x86_64-pc-cygwin/11/../../../../x86_64-pc-cygwin/bin/ld:
/tmp/ccYivs3O.o:other.cpp:(.text$_Z3maxIcET_S0_S0_[_Z3maxIcET_S0_S0_]+0x0):
multiple definition of `char max<char>(char, char)';
/tmp/cc7HJqbS.o:main.cpp:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status
I dumped my .o object files and found few clues (maybe I am not familiar enough with the object format spec).
$ nm main.o | grep max | c++filt.exe
0000000000000000 p .pdata$_Z3maxIcET_S0_S0_
0000000000000000 p .pdata$_Z3maxIiET_S0_S0_
0000000000000000 t .text$_Z3maxIcET_S0_S0_
0000000000000000 t .text$_Z3maxIiET_S0_S0_
0000000000000000 r .xdata$_Z3maxIcET_S0_S0_
0000000000000000 r .xdata$_Z3maxIiET_S0_S0_
0000000000000000 T char max<char>(char, char) <-- full specialization
0000000000000000 T int max<int>(int, int) <<-- implicit specialization
U mymax(int, int)
$ nm other.o | grep max | c++filt.exe
0000000000000000 p .pdata$_Z3maxIcET_S0_S0_
0000000000000000 p .pdata$_Z3maxIiET_S0_S0_
0000000000000000 t .text$_Z3maxIcET_S0_S0_
0000000000000000 t .text$_Z3maxIiET_S0_S0_
0000000000000000 r .xdata$_Z3maxIcET_S0_S0_
0000000000000000 r .xdata$_Z3maxIiET_S0_S0_
000000000000009b t _GLOBAL__sub_I__Z5mymaxii
0000000000000000 T char max<char>(char, char) <-- full specialization
0000000000000000 T int max<int>(int, int) <-- implicit specialization
0000000000000000 T mymax(int, int)
==== Test on Ubuntu ====
This is what I got on Ubuntu with g++-9 after removing inline from tmplhdr.hpp:
tony@Win10Bedroom:/mnt/c/Users/Tony Su/My Documents/cpphome$ g++ -o main main.o other.o
/usr/bin/ld: other.o: in function `char max<char>(char, char)':
other.cpp:(.text+0x0): multiple definition of `char max<char>(char, char)'; main.o:main.cpp:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status
The char version of max() is marked T, which does not allow multiple definitions, but the int version is marked W, which does. However, I am now curious why nm gives different marks on Cygwin than on Ubuntu, and why the linker on Cygwin can handle two T definitions correctly.
tony@Win10Bedroom:/mnt/c/Users/Tony Su/My Documents/cpphome$ nm main.o | grep max | c++filt
0000000000000133 t _GLOBAL__sub_I__Z3maxIcET_S0_S0_
0000000000000000 T char max<char>(char, char)
0000000000000000 W int max<int>(int, int)
U mymax(int, int)
tony@Win10Bedroom:/mnt/c/Users/Tony Su/My Documents/cpphome$ nm other.o | grep max | c++filt
00000000000000d7 t _GLOBAL__sub_I__Z3maxIcET_S0_S0_
0000000000000000 T char max<char>(char, char)
0000000000000000 W int max<int>(int, int)
000000000000003e T mymax(int, int)
However, I am now curious why nm gives different marks on Cygwin than on Ubuntu, and why the linker on Cygwin can handle two T definitions correctly?
You need to understand that the nm output does not give you the full picture.
nm is part of binutils, and uses libbfd. The way this works is that various object file formats are parsed into libbfd-internal representation, and then tools like nm print that internal representation in human-readable format.
Some things get "lost in translation". This is why you should almost never use, e.g., objdump to look at ELF files (at least not at their symbol tables).
As you correctly deduced, the reason multiple max<int>() symbols are allowed on Linux is that the compiler emits them as a W (weakly defined) symbol.
The same is true on Windows, except that Windows uses the older COFF format, which doesn't have weak symbols. Instead, the symbol is emitted into a special .linkonce.$name section, and the linker knows that it can select any such section into the link, but should only do that once (i.e. it knows to discard all other duplicates of that section in any other object file).
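A small sketch of the two cases the linker has to tell apart, using a made-up function template max_kind in place of the max() from the question (it returns a label instead of printing). The comments describe what GCC typically emits; the exact symbol marks are an assumption that depends on the platform and object format.

```cpp
#include <cassert>
#include <string>

namespace demo {

// Primary template: every translation unit that uses max_kind<int> emits its
// own copy, as a weak symbol (ELF) or in a link-once section (COFF), so the
// linker keeps one copy and discards the rest.
template <typename T>
std::string max_kind(T, T) { return "generic"; }

// A full specialization is an ordinary function, not a template. Without
// inline, each translation unit that includes this header would emit a
// strong definition and the link would fail with 'multiple definition'.
template <>
inline std::string max_kind<char>(char, char) { return "full specialization"; }

}  // namespace demo

// Hypothetical helper exercising both paths.
void check_max_kind() {
    assert(demo::max_kind(1, 2) == "generic");
    assert(demo::max_kind('a', 'b') == "full specialization");
}
```

Marking the full specialization inline asks for the same keep-one-copy treatment that implicit instantiations get automatically.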

g++ skipping a function when compiling to object file [duplicate]

This question already has answers here:
Why can templates only be implemented in the header file?
The solution to this is probably trivial, but I can't find it. I tried to google it, with no luck.
I'm working on a C++ project using g++ on linux (gcc version 10.1.0 Ubuntu 10.1.0-2ubuntu1-18.04).
g++ compiles a C++ file into an object .o without raising any error, but the end object file is missing a function! The other 8 library files that I wrote are all compiled and linked fine, only this one is giving me trouble. Why, and how do I solve it?
The library header file bpo_interface.h is:
#pragma once
#include <boost/program_options/options_description.hpp>
#include <boost/program_options/variables_map.hpp>
#include <boost/algorithm/string.hpp>
#include <optional>
#include <string>
namespace bpo = boost::program_options;
namespace ibsimu_client::bpo_interface {
template <typename T>
std::optional<T> get(bpo::variables_map &params_op, std::string key);
}
The bpo_interface.cpp:
#include "bpo_interface.h"
namespace ic_bpo = ibsimu_client::bpo_interface;
template <typename T>
std::optional<T> ic_bpo::get(bpo::variables_map &params_op, std::string key)
{
try {
const T& value =
params_op[key].as<T>();
return value;
}
catch(const std::exception& e) {
return std::nullopt;
}
return std::nullopt;
}
The g++ command used to compile the file:
g++-10 -std=c++20 -lboost_program_options -Wall -g `pkg-config --cflags ibsimu-1.0.6dev` -c -o bin/build/bpo_interface.o src/bpo_interface.cpp
and the output of objdump -t -C bin/build/bpo_interface.o:
bin/build/bpo_interface.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 bpo_interface.cpp
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 l d .bss 0000000000000000 .bss
0000000000000000 l d .rodata 0000000000000000 .rodata
0000000000000000 l O .rodata 0000000000000001 __pstl::execution::v1::seq
0000000000000001 l O .rodata 0000000000000001 __pstl::execution::v1::par
0000000000000002 l O .rodata 0000000000000001 __pstl::execution::v1::par_unseq
0000000000000003 l O .rodata 0000000000000001 __pstl::execution::v1::unseq
0000000000000004 l O .rodata 0000000000000004 __gnu_cxx::__default_lock_policy
0000000000000008 l O .rodata 0000000000000008 boost::container::ADP_nodes_per_block
0000000000000010 l O .rodata 0000000000000008 boost::container::ADP_max_free_blocks
0000000000000018 l O .rodata 0000000000000008 boost::container::ADP_overhead_percent
0000000000000020 l O .rodata 0000000000000008 boost::container::ADP_only_alignment
0000000000000028 l O .rodata 0000000000000008 boost::container::NodeAlloc_nodes_per_block
0000000000000030 l O .rodata 0000000000000001 boost::container::ordered_range
0000000000000031 l O .rodata 0000000000000001 boost::container::ordered_unique_range
0000000000000032 l O .rodata 0000000000000001 boost::container::default_init
0000000000000033 l O .rodata 0000000000000001 boost::container::value_init
0000000000000000 l d .debug_info 0000000000000000 .debug_info
0000000000000000 l d .debug_abbrev 0000000000000000 .debug_abbrev
0000000000000000 l d .debug_aranges 0000000000000000 .debug_aranges
0000000000000000 l d .debug_line 0000000000000000 .debug_line
0000000000000000 l d .debug_str 0000000000000000 .debug_str
0000000000000000 l d .note.GNU-stack 0000000000000000 .note.GNU-stack
0000000000000000 l d .comment 0000000000000000 .comment
Consistent with the objdump output, the linker complains that it cannot find the ic_bpo::get() function. Specifically:
undefined reference to 'std::optional<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > ibsimu_client::bpo_interface::get<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(boost::program_options::variable_maps&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
If I copy and paste the function body into the header file bpo_interface.h and remove bpo_interface.cpp and bpo_interface.o from the project, everything works fine.
So I guess g++ is perfectly able to process that function at compile time and match its declaration with its use in the project.
But why is it not compiled into the bpo_interface.o object file?
Thank you
You have placed the definition of the function template get() in a source (.cpp) file. This means the definition is not available when instantiations of the function template are needed for particular specializations outside of this source file. Note that the definition of a function template is not equivalent to the definition of a non-template function; it is more like a blueprint for generating definitions of particular specializations, as needed for implicit or explicit instantiation.
When the definition is moved to the header file, the function template definition is readily available whenever particular specializations are instantiated. You can place the definition of a function template in a source file, but as that source file is then the only translation unit that can see the definition, you would also need to provide explicit instantiation definitions for all the specializations you want the function template to provide. This is quite uncommon, and usually only used for e.g. static dependency injection into class templates that have only a single specialization intended for production code (which can then be explicitly instantiated) plus e.g. other instantiations for test code (e.g. injecting mocked or stubbed implementations).
But why is it not compiled into the bpo_interface.o object file?
From cppreference - function templates [emphasis mine]:
Function template instantiation
A function template by itself is not a type, or a function, or any other entity. No code is generated from a source file that contains only template definitions. In order for any code to appear, a template must be instantiated: the template arguments must be determined so that the compiler can generate an actual function (or class, from a class template).
tl;dr: Add the following to your .cpp file:
template std::optional<std::string> ic_bpo::get<std::string>(bpo::variables_map &, std::string);
(and of course make sure to include the <string> header.)
But why is it not compiled into the bpo_interface.o object file?
Because you defined a template; you did not instantiate that template at all. Only instantiations are actual functions, which can be compiled and put in object files. So, you need to force an instantiation of your template; that's what the line above does, for the case of T = std::string.
Alternatively, if you keep the template definition in your header, then other translation units can instantiate it themselves as needed.
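To illustrate explicit instantiation without pulling in Boost, here is a hedged C++17 sketch. The alias demo::variables is a hypothetical stand-in for bpo::variables_map, and get_as and check_get_as are made-up names; only the pattern (template definition followed by explicit instantiation definitions in the same source file) mirrors the advice above.

```cpp
#include <any>
#include <cassert>
#include <map>
#include <optional>
#include <string>

namespace demo {

// Hypothetical stand-in for bpo::variables_map, so the sketch needs no Boost.
using variables = std::map<std::string, std::any>;

// Template definition: by itself this produces no code in the object file.
template <typename T>
std::optional<T> get_as(const variables& vars, const std::string& key) {
    auto it = vars.find(key);
    if (it == vars.end()) return std::nullopt;
    if (const T* value = std::any_cast<T>(&it->second)) return *value;
    return std::nullopt;  // key present, but stored with a different type
}

// Explicit instantiation definitions: these force get_as<std::string> and
// get_as<int> to be compiled into this translation unit's object file.
template std::optional<std::string> get_as<std::string>(const variables&, const std::string&);
template std::optional<int> get_as<int>(const variables&, const std::string&);

}  // namespace demo

void check_get_as() {
    demo::variables vars;
    vars["name"] = std::string("ibsimu");
    vars["count"] = 3;
    assert(demo::get_as<std::string>(vars, "name").value() == "ibsimu");
    assert(demo::get_as<int>(vars, "count").value() == 3);
    assert(!demo::get_as<int>(vars, "name").has_value());
    assert(!demo::get_as<int>(vars, "missing").has_value());
}
```

With the two explicit instantiation lines in the .cpp file, a header would only need the declaration of get_as, but callers would be limited to the instantiated types.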
See also:
Explicit template instantiation - when is it used?

Name Mangling in C++

I was going through the article - http://www.geeksforgeeks.org/extern-c-in-c/
There are two example given -
int printf(const char *format,...);
int main()
{
printf("GeeksforGeeks");
return 0;
}
It says this won't compile because the compiler won't be able to find the mangled version of the printf function. However, the code below does produce output.
extern "C"
{
int printf(const char *format,...);
}
int main()
{
printf("GeeksforGeeks");
return 0;
}
This is because the extern "C" block prevents the name from being mangled. However, the code runs and gives output. Where does it get the definition of printf from? I read a post which says stdio.h is included by default. If this is true, the code below should run. However, it gives an error that printf is not defined.
int main()
{
printf("GeeksforGeeks");
return 0;
}
Can somebody explain this?
Your compiler is being helpful by treating printf specially as a built-in.
Sample code "tst.cpp":
int printf(char const *format,...);
int foo(int a, char const *b);
int main() {
printf("Hello, World!");
foo(42, static_cast<char const *>("Hello, World!"));
return 0;
}
When compiling with Microsoft's cl compiler command "cl /c tst.cpp" we can inspect the resulting .obj and find:
00000000 r $SG2552
00000010 r $SG2554
00000000 N .debug$S
00000000 i .drectve
00000000 r .rdata
00000000 t .text$mn
U ?foo@@YAHHPBD@Z
U ?printf@@YAHPBDZZ
00e1520d a @comp.id
80000191 a @feat.00
00000000 T _main
Note that both foo() and printf() are mangled.
But when we compile with /usr/lib/gcc/i686-pc-cygwin/3.4.4/cc1plus.exe via cygwin "g++ -c tst.cpp", we get:
00000000 b .bss
00000000 d .data
00000000 r .rdata
00000000 t .text
U __Z3fooiPKc
U ___main
U __alloca
00000000 T _main
U _printf
Here foo() is mangled and printf() is not, because the cygwin compiler is being helpful. Most would consider this a compiler defect. If the cygwin compiler is invoked with "g++ -fno-builtin -c tst.cpp" then the problem goes away and both symbols are mangled as they should be.
A more up-to-date g++ gets it right. Compiling with /usr/libexec/gcc/i686-redhat-linux/4.8.3/cc1plus via "g++ -c tst.cpp" we get:
00000000 T main
U _Z3fooiPKc
U _Z6printfPKcz
Both foo() and printf() are mangled.
But if we declare printf such that cygwin g++ does not recognize it:
char const * printf(char const *format,...);
int foo(int a, char const *b);
int main() {
printf("Hello, World!");
foo(42, static_cast<char const *>("Hello, World!"));
return 0;
}
Then both foo() and printf() are mangled:
00000000 b .bss
00000000 d .data
00000000 r .rdata
00000000 t .text
U __Z3fooiPKc
U __Z6printfPKcz
U ___main
U __alloca
00000000 T _main
Let's take a look at the relevant standard quotes:
17.6.2.3 Linkage [using.linkage]
2 Whether a name from the C standard library declared with external linkage has extern "C" or extern "C++" linkage is implementation-defined. It is recommended that an implementation use extern "C++" linkage for this purpose.
17.6.4.3 Reserved names [reserved.names]
2 If a program declares or defines a name in a context where it is reserved, other than as explicitly allowed by this Clause, its behavior is undefined.
17.6.4.3.3 External linkage [extern.names]
1 Each name declared as an object with external linkage in a header is reserved to the implementation to designate that library object with external linkage, both in namespace std and in the global namespace.
2 Each global function signature declared with external linkage in a header is reserved to the implementation to designate that function signature with external linkage.
3 Each name from the Standard C library declared with external linkage is reserved to the implementation for use as a name with extern "C" linkage, both in namespace std and in the global namespace.
4 Each function signature from the Standard C library declared with external linkage is reserved to the implementation for use as a function signature with both extern "C" and extern "C++" linkage, or as a name of namespace scope in the global namespace.
What we get from this is that the compiler may assume that printf in any of the given instances always refers to the standard library function printf, and thus can have any amount of information about it baked in. And if you get the declaration wrong, or indeed simply provide your own, it is free to do whatever it wants, including but not limited to magically correcting it.
Anyway, you cannot know which language-linkage it expects.
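A short sketch of the difference between the two language linkages, and of the safer alternative to declaring printf by hand. The function names c_add, cpp_add, format_sum and check_linkage_demo are made up for illustration; the mangled name in the comment assumes the Itanium C++ ABI used by GCC and Clang.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// extern "C" gives the function C language linkage: its symbol name is not
// mangled (c_add, possibly with a leading underscore on some platforms).
extern "C" int c_add(int a, int b) { return a + b; }

// Default C++ language linkage: under the Itanium ABI the symbol is
// mangled as _Z7cpp_addii.
int cpp_add(int a, int b) { return a + b; }

// Including <cstdio> lets the implementation declare the printf family with
// whatever linkage it expects, instead of guessing with a hand-written
// declaration.
std::string format_sum(int a, int b) {
    char buffer[32];
    std::snprintf(buffer, sizeof buffer, "%d", c_add(a, b) + cpp_add(a, b));
    return buffer;
}

void check_linkage_demo() {
    assert(c_add(1, 2) == 3);
    assert(cpp_add(1, 2) == 3);
    assert(format_sum(1, 2) == "6");
}
```

Running nm on the resulting object file should show the two add functions under different symbol names even though their source declarations look alike.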

Section type conflict for identically defined variables

This question arose in the context of this question: Find unexecuted lines of c++ code
When searching for this problem most people tried to add code and variables into the same section - but this is definitely not the problem here. Here is a minimal working example:
unsigned cover() { return 0; }
#define COV() do { static unsigned cov[2] __attribute__((section("cov"))) = { __LINE__, cover() }; } while(0)
inline void foo() {
COV();
}
int main(int argc, char* argv[])
{
COV();
if (argc > 1)
COV();
if (argc > 2)
foo();
return 0;
}
which results with g++ -std=c++11 test.cpp (g++ (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)) in the following error:
test.cpp:6:23: error: cov causes a section type conflict with cov
COV();
^
test.cpp:11:30: note: ‘cov’ was declared here
COV();
^
The error is not very helpful though, as it does not state why this is supposed to be a conflict. Both the .ii and .s temporary files give no hint as to what might be the problem. In fact there is only one section definition in the .s file
.section cov,"aw",@progbits
and I don't see why the next definition should conflict with this ("aw",@progbits is correct...).
Is there any way to get more information on this? See what the precise
conflict is? Or is this just a bug...?
The message is indeed very bad, but it isn't a bug.
The problem occurs with the inline function foo(), because inline functions must be defined in each translation unit where they are used. In the documentation of the section attribute we can read:
"...uninitialized variables tentatively go in the common (or bss) section and can be multiply 'defined'. Using the section attribute changes what section the variable goes into and may cause the linker to issue an error if an uninitialized variable has multiple definitions...".
Thus, when the foo function needs to be 'defined' in function main, the compiler finds the cov variable previously defined in the inline function foo and issues the error.
Let's do the preprocessor's work and expand the COV() macro to help clarify the problem:
inline void foo()
{
do { static unsigned cov[2] __attribute__((section("cov"))) = { 40, cover() }; } while(0);
}
int main(int argc, char *argv[]) {
do { static unsigned cov[2] __attribute__((section("cov"))) = { 44, cover() }; } while(0);
if (argc > 1)
do { static unsigned cov[2] __attribute__((section("cov"))) = { 47, cover() }; } while(0);
if (argc > 2)
foo();
return 0;
}
To make this easier to reason about, let's change the section attribute of the definition in the inline function foo to cov.2, just so the code compiles. Now there is no error, so we can examine the object (.o) file with objdump:
objdump -C -t -j cov ./cmake-build-debug/CMakeFiles/stkovf.dir/main.cpp.o
./cmake-build-debug/CMakeFiles/stkovf.dir/main.cpp.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l d cov 0000000000000000 cov
0000000000000000 l O cov 0000000000000008 main::cov
0000000000000008 l O cov 0000000000000008 main::cov
objdump -C -t -j cov.2 ./cmake-build-debug/CMakeFiles/stkovf.dir/main.cpp.o
./cmake-build-debug/CMakeFiles/stkovf.dir/main.cpp.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l d cov.2 0000000000000000 cov.2
0000000000000000 u O cov.2 0000000000000008 foo()::cov
We can see that the compiler makes foo()::cov, in section cov.2, GLOBAL (marked by the letter 'u').
When we use the same section name (cov), the compiler, trying to 'define' foo in the main block, encounters a previously globally defined cov and issues the error.
If you make the inline foo static (inline static void foo() . . .), which stops the compiler from emitting separate code for the inline function and makes it just copy the body at expansion time, you'll see the error disappear, because there isn't a global foo()::cov.
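The static inline workaround can be sketched as follows. This is an adaptation of the question's code, not a verified fix for every compiler version: COV_SECTION, run and check_cov_demo are made-up names, and the section attribute is only applied when targeting ELF with a GNU-compatible compiler, since section naming rules differ elsewhere.

```cpp
#include <cassert>

// The section attribute is a GNU extension; this sketch applies it only on
// ELF targets and leaves the variables in their default section otherwise.
#if defined(__GNUC__) && defined(__ELF__)
#define COV_SECTION __attribute__((section("cov")))
#else
#define COV_SECTION
#endif

unsigned cover() { return 0; }

#define COV() \
    do { \
        static unsigned cov[2] COV_SECTION = { __LINE__, cover() }; \
        (void)cov; \
    } while (0)

// static gives foo internal linkage, so foo()::cov is a local symbol like
// the cov variables in run() and no longer clashes with them.
static inline int foo() {
    COV();
    return 1;
}

int run(int argc) {
    COV();
    if (argc > 1)
        COV();
    if (argc > 2)
        return foo();
    return 0;
}

void check_cov_demo() {
    assert(run(1) == 0);
    assert(run(3) == 1);
}
```

The trade-off is that every translation unit including such a header gets its own private copy of foo and of its cov entry.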

ld of data file makes size of data an *ABS* and not an integer

I have a c++ program which includes an external dependency on an empty xlsx file. To remove this dependency I converted this file to a binary object in view of linking it in directly, using:
ld -r -b binary -o template.o template.xlsx
followed by
objcopy --rename-section .data=.rodata,alloc,load,readonly,data,contents template.o template.o
Using objdump, I can see three variables declared :
$ objdump -x template.o
template.o: file format elf64-x86-64
template.o
architecture: i386:x86-64, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000
Sections:
Idx Name Size VMA LMA File off Algn
0 .rodata 00000fd1 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
SYMBOL TABLE:
0000000000000000 l d .rodata 0000000000000000 .rodata
0000000000000fd1 g *ABS* 0000000000000000 _binary_template_xlsx_size
0000000000000000 g .rodata 0000000000000000 _binary_template_xlsx_start
0000000000000fd1 g .rodata 0000000000000000 _binary_template_xlsx_end
I then tell my program about this data :
template.h:
#ifndef TEMPLATE_H
#define TEMPLATE_H
#include <cstddef>
extern "C" {
extern const char _binary_template_xlsx_start[];
extern const char _binary_template_xlsx_end[];
extern const int _binary_template_xlsx_size;
}
#endif
This compiles and links fine (although I am having some trouble automating it with cmake; see: compile and add object file from binary with cmake).
However, when I use _binary_template_xlsx_size in my code, it is interpreted as a pointer to an address that doesn't exist. So to get the size of my data, I have to pass (int)&_binary_template_xlsx_size (or (int)(_binary_template_xlsx_end - _binary_template_xlsx_start))
Some research tells me that the *ABS* in the objdump above means "absolute value" but I don't get why. How can I get my c++ (or c) program to see the variable as an int and not as a pointer?
An *ABS* symbol is an absolute address; it's more often created by passing --defsym foo=0x1234 to ld.
--defsym symbol=expression
Create a global symbol in the output file, containing the absolute
address given by expression. [...]
Because an absolute symbol is a constant, it's not possible to link it into a C source file as a variable; all C object variables have an address, but a constant doesn't.
To make sure you don't dereference the address (i.e. read the variable) by accident, it's best to define it as const char [] as you have with the other symbols:
extern const char _binary_template_xlsx_size[];
If you want to make sure you're using it as an int, you could use a macro:
#include <stdint.h> /* for intptr_t */
extern const char _abs_binary_template_xlsx_size[] asm("_binary_template_xlsx_size");
#define _binary_template_xlsx_size ((int) (intptr_t) _abs_binary_template_xlsx_size)
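The pointer-difference alternative mentioned in the question avoids the *ABS* symbol altogether. Below is a minimal sketch of it; blob_view and check_blob_view are made-up names, and the array fake_blob is stand-in data, since the real bytes would only exist after linking template.o.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Builds a view over a blob delimited by start/end symbols. With the object
// file from the question the call would be
//   blob_view(_binary_template_xlsx_start, _binary_template_xlsx_end)
std::string_view blob_view(const char* start, const char* end) {
    return std::string_view(start, static_cast<std::size_t>(end - start));
}

void check_blob_view() {
    // Stand-in data; the real bytes would come from template.o.
    static const char fake_blob[] = {'P', 'K', 3, 4};
    std::string_view view = blob_view(fake_blob, fake_blob + sizeof fake_blob);
    assert(view.size() == 4);
    assert(view[0] == 'P' && view[1] == 'K');
}
```

Because both symbols are ordinary addresses in .rodata, this needs no cast from an address to an integer and the size comes out as a std::size_t.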