Barebones C++ without standard library? - c++

Compilers such as GCC and Clang allow to compile C++ programs without the C++ standard library, e.g. using the -nostdlib command line flag. It seems that such often fail to link thou, for example:
void f() noexcept { throw 42; }
int main() { f(); }
Usually fails to link due to undefined symbols like __cxa_allocate_exception, typeinfo for int, __cxa_throw, __gxx_personality_v0, __clang_call_terminate, __cxa_begin_catch, std::terminate() etc.
Even a simple
int main() {}
Fails to link with
ld: warning: cannot find entry symbol _start; defaulting to 0000000000400120
and is killed by the OS upon execution. Using -c the compiler still runs the linker which blatantly fails with:
ld: error in mytest(.eh_frame); no .eh_frame_hdr table will be created.
Is it a realistic goal to program and compile C++ applications or libraries without using and linking to the standard library? How can I compile my code using GCC or Clang on Linux? What core language features would one be unable to use without the standard library?

You will basically find all of your questions answered at osdev.org, but I'll give a brief summary anyway.
When you give GCC -nostdlib, you are saying "no startup or library files". This includes:
crti.o, crtbegin.o, crtend.o and crtn.o. Generally kernel developers only care about implementing crti.o and crtend.o and let GCC supply crtbegin.o and crtend.o by passing -print-file-name= to the linker. Generally these are just stubs that consist of .init and .fini respectively, leaving room for GCC to shove the contents of crtbegin.o and crtend.o respectively. These files are necessary for calling global constructors/destructors.
You can't avoid linking libgcc (the "low-level runtime library" (-lgcc) because even if you pass -nostdlib GCC will emit calls to its functions whenever you use it, leading to inexplicable linking errors for seemingly no reason. This is the case even when you're implementing/porting a C library.
You don't "need" libstdc++ no, but typically kernel developers want it. Porting a C library then implementing the C++ standard library from scratch is an extremely difficult task.
Since you only want to get rid of the "standard library", but keeping libc (on a Linux system) you're essentially programming C++ with just a C library. Of course, there's nothing wrong with this and you do you, but ultimately I don't see the point unless you plan on developing a kernel.
Required reading:
OSDev's C++ page - If you really care about RTTI/exception support, it's more annoying to implement than it sounds. Typically people just pass -fno-rtti or -fno-exceptions and then worry about it down the line or not at all.

"Standard" is a misnomer. In this context it doesn't mean "the library (set of functions, classes etc) as defined by the C++ standard" but "the usual set of libraries and objects (compiled files in a certain format) gcc links with by default". Some of those are necessary for most or even all programs to function.
If you use this flag, it's your responsibility to provide any missing functionality. There are several ways to do so:
Cherry-pick libraries and objects that your program really needs out of the default set. (Makes little sense as the result will most probably be exactly the same as with the default link flags).
Provide your own implementation of missing functionality.
Explicitly disable, through compiler flags, language features your program isn't using. I know of two such features: exceptions and RTTI. This is needed because the compiler needs to generate exceptions-related code and RTTI info even if these features are not explicitly used in this module.

Related

"undefined reference to __dso_handle" while linking static library with -nostdlib [duplicate]

I have an unresolved symbol error when trying to compile my program which complains that it cannot find __dso_handle. Which library is this function usually defined in?
Does the following result from nm on libstdc++.so.6 mean it contains that?
I tried to link against it but the error still occurs.
nm libstdc++.so.6 | grep dso
00000000002fc480 d __dso_handle
__dso_handle is a "guard" that is used to identify dynamic shared objects during global destruction.
Realistically, you should stop reading here. If you're trying to defeat object identification by messing with __dso_handle, something is likely very wrong.
However, since you asked where it is defined: the answer is complex. To surface the location of its definition (for GCC), use iostream in a C++ file, and, after that, do extern int __dso_handle;. That should surface the location of the declaration due to a type conflict (see this forum thread for a source).
Sometimes, it is defined manually.
Sometimes, it is defined/supplied by the "runtime" installed by the compiler (in practice, the CRT is usually just a bunch of binary header/entry-point-management code, and some exit guards/handlers). In GCC (not sure if other compilers support this; if so, it'll be in their sources):
Main definition
Testing __dso_handle replacement/tracker example 1
Testing __dso_handle replacement/tracker example 2
Often, it is defined in the stdlib:
Android
BSD
Further reading:
Subtle bugs caused by __dso_handle being unreachable in some compilers
I ran into this problem. Here are the conditions which seem to reliably generate the trouble:
g++ linking without the C/C++ standard library: -nostdlib (typical small embedded scenario).
Defining a statically allocated standard library object; specific to my case is std::vector. Previously this was std::array statically allocated without any problems. Apparently not all std:: statically allocated objects will cause the problem.
Note that I am not using a shared library of any type.
GCC/ARM cross compiler is in use.
If this is your use case then merely add the command line option to your compile/link command line: -fno-use-cxa-atexit
Here is a very good link to the __dso_handle usage as 'handle to dynamic shared object'.
There appears to be a typo in the page, but I have no idea who to contact to confirm:
After you have called the objects' constructor destructors GCC automatically calls the function ...
I think this should read "Once all destructors have been called GCC calls the function" ...
One way to confirm this would be to implement the __cxa_atexit function as mentioned and then single step the program and see where it gets called. I'll try that one of these days, but not right now.
Adding to #natersoz's answer-
For me, using -Wabi-tag -D_GLIBCXX_USE_CXX11_ABI=0 alongside -fno-use-cxa-atexit helped compile an old lib. A telltale is if the C++ functions in the error message have std::__cxx11 in them, due to an ABI change.

Is it allowed to name a global variable `read` or `malloc` in C++?

Consider the following C++17 code:
#include <iostream>
int read;
int main(){
std::ios_base::sync_with_stdio(false);
std::cin >> read;
}
It compiles and runs fine on Godbolt with GCC 11.2 and Clang 12.0.1, but results in runtime error if compiled with a -static key.
As far as I understand, there is a POSIX(?) function called read (see man read(2)), so the example above actually invokes ODR violation and the program is essentially ill-formed even when compiled without -static. GCC even emits warning if I try to name a variable malloc: built-in function 'malloc' declared as non-function
Is the program above valid C++17? If no, why? If yes, is it a compiler bug which prevents it from running?
The code shown is valid (all C++ Standard versions, I believe). The similar restrictions are all listed in [reserved.names]. Since read is not declared in the C++ standard library, nor in the C standard library, nor in older versions of the standard libraries, and is not otherwise listed there, it's fair game as a name in the global namespace.
So is it an implementation defect that it won't link with -static? (Not a "compiler bug" - the compiler piece of the toolchain is fine, and there's nothing forbidding a warning on valid code.) It does at least work with default settings (though because of how the GNU linker doesn't mind duplicated symbols in an unused object of a dynamic library), and one could argue that's all that's needed for Standard compliance.
We also have at [intro.compliance]/8
A conforming implementation may have extensions (including additional library functions), provided they do not alter the behavior of any well-formed program. Implementations are required to diagnose programs that use such extensions that are ill-formed according to this International Standard. Having done so, however, they can compile and execute such programs.
We can consider POSIX functions such an extension. This is intentionally vague on when or how such extensions are enabled. The g++ driver of the GCC toolset links a number of libraries by default, and we can consider that as adding not only the availability of non-standard #include headers but also adding additional translation units to the program. In theory, different arguments to the g++ driver might make it work without the underlying link step using libc.so. But good luck - one could argue it's a problem that there's no simple way to link only names from the C++ and C standard libraries without including other unreserved names.
(Does not altering a well-formed program even mean that an implementation extension can't use non-reserved names for the additional libraries? I hope not, but I could see a strict reading implying that.)
So I haven't claimed a definitive answer to the question, but the practical situation is unlikely to change, and a Standard Defect Report would in my opinion be more nit-picking than a useful clarification.
Here is some explanation on why it produces a runtime error with -static only.
The https://godbolt.org/z/asKsv95G5 link in the question indicates that the runtime error with -static is Program returned: 139. The output of kill -l in Bash on Linux contains 11) SIGSEGV (and 128 + 11 = 139), so the process exits with fatal signal SIGSEGV (Segmentation fault) indicating invalid memory reference. The reason for that is that the process tries to run the contents (4 bytes) of the read variable as machine code. (Eventually std::cin >> ... calls read.) Either somethings fails in those 4 bytes accidentally interpreted as machine code, or it fails because the memory page containing those 4 bytes is not executable.
The reason why it succeeds without -static is that with dynamic linking it's possible to have multiple symbols with the same name (read): one in the program executable, and another one in the shared library (libc.so.6). std::cin >> ... (in libstdc++.so.6) links against libc.so.6, so when the dynamic linker tries to find the symbol read at program load time (to be used by libstdc++.so.6), it will look at libc.so.6 first, finding read there, and ignoring the read symbol in the program executable.

I receive different results on UNIX and WIN when use static members with static linking of shared library to executable. Please explain why?

Please consider following peace of code:
// 1. Single header file. Imagine that it is some static library.
// Counter.h
#pragma once
struct Counter
{
Counter()
{
++getCount();
}
static int& getCount()
{
static int counter = 0;
return counter;
}
};
// 2. Shared library (!) :
// main_DLL.cpp
#include <iostream>
#include "counter.h"
extern "C"
{
__declspec(dllexport) // for WIN
void main_DLL()
{
Counter c;
std::cout << "main_DLL : ptr = " << &Counter::getCount()<< " value = " << Counter::getCount() << std::endl;
}
}
// 3. Executable. Shared library statically (!) linked to the executable file.
// main.cpp
#include "counter.h"
#include <iostream>
extern "C"
{
__declspec(dllimport) // for WIN
void main_DLL();
}
int main()
{
main_DLL();
Counter c;
std::cout << "main_EXE : ptr = " << &Counter::getCount() << " value = " << Counter::getCount() << std::endl;
}
Results:
Results for WIN (Win8.1 gcc 5.1.0):
main_DLL : ptr = 0x68783030 value = 1
main_EXE : ptr = 0x403080 value = 1
// conclusion: two different counters
Results for UNIX (Red Hat <I don’t remember version exactly> gcc 4.8.3):
main_DLL : ptr = 0x75693214 value = 1
main_EXE : ptr = 0x75693214 value = 2
// conclusion: the same counter addressed
Building:
Building for WIN:
g++ -c -Wall -Werror -o main_DLL.o main_DLL.cpp
g++ -shared -Wl,--out-implib=libsharedLib.a -o libsharedLib.so main_DLL.o
g++ -Wall –Werror -o simpleExample main.cpp -L./ -lsharedLib
Building for UNIX:
g++ -c -Wall -Werror -fPIC -o main_DLL.o main_DLL.cpp
g++ -shared -fPIC -o libsharedLib.so main_DLL.o
g++ -Wall –Werror -fPIC -o simpleExample main.cpp -L./ -lsharedLib
So, you see that I added –fPIC on UNIX and there is no need to create import library for UNIX, because all exports symbols are included inside shared library. On Windows I use __declspec for it.
For me, results on Windows are pretty much expected. Because shared library and executable are building separately and they should know about static variable in Counter::getCount. They should simply allocate memory for it, that’s why they have different static counters.
I did quite some analysis using tools like nm, objdump. Although I’m not a big expert in them, so I haven’t found anything suspicious. I can provide their output if needed.
Using ldd tool I can see that library linked statically in both cases.
Why I can’t see the same results on Unix for me it’s strange. Could the root cause lie in building options (–fPIC, for example), or I’m missing something?
In windows, A DLL is not exporting global and static symbols unless you add the dllexport statement, therefore, the linker doesn't even know they exists, so it allocate new instance for the static member.
In linux/unix a shared lib is exporting all the global and static symbols, so when the linker find the existence of the static member in the shared lib, it just use its address.
That is the reason for the different result.
EDIT: This is a complete rewrite of the answer. With much more details.
I think that this question deserves more elaborated answer. Especially that there are things that were not mentioned so far.
Dependency Walker
Let me start with referring to the “Dependency Walker” program.
It is a nice program (although these days a bit old-schoolish in its look & feel) that allows analyzing Windows binaries (both EXE and DLL) for symbols that they export/import and their own dependencies to other DLLs. Also it allows showing undecorated symbol names but this seems to be working only with MSVC build binaries. (And some more but that is not important here.)
Thanks to this program crucial information (for this question) can be uncovered. So I encourage you to use it during experiments.
Exporting policy on Linux vs. Windows
SHR already pointed this out but I will mention it also for completeness of the answer. And some extra details.
On Linux every symbol gets exported from a shared library by default. On the other hand on Windows you have to explicitly state which symbols to export from a shared library.
GCC seems however to provide some means of controlling exports in "Windows style". See for example Visibility entry on GCC Wiki.
Also note that there are various ways of exporting on both Linux and Windows. For example both seem to support exporting selectively by providing linker with a list of names for symbols to export. But it also seems that nowadays (on Windows at least) this isn't really used much. __declspec approach seems to be preferred.
What can be exported?
After that general introduction let's now stick to Windows case. Nowadays you export/import symbols from shared libraries by using the __declspec. Just as shown in the question. (Well maybe not exactly that - typically you use a #define to handle bi-directionality as shown in already mentioned Visibility entry on GCC Wiki.)
But the declaration can be applied not only to functions, methods and global variables. It can also be applied to types. For example you can have:
class __declspec(dllexport) Counter { /* ... */ };
Such exporting/importing means in general that all members get exported/imported.
Not so easy!
But it would be too easy, wouldn't it? The complication is that GCC and MSVC handle exporting types differently.
My notes here are based mostly on experiments (checks done using Dependency Walker) so I can be wrong or not precise enough. But I did observe differences in behavior.
In tests I used MSVC 2013 from the Express Edition with update 5. For GCC I used MinGW distro from nuwen.net, version 13.0.
MSVC, when exporting entire type, exports each and every member. Including implicitly defined members (like compiler generated copy constructor). And including inlined functions. Furthermore if inlined function has some static local variables they get exported to (!).
GCC on the other hand seems to be far more restrictive. It doesn't export implicitly defined members. Nor it doesn't export inlined members.
Exporting/Importing inline functions
If instead of exporting entire type you would explicitly export an inlined function then and only then will GCC really export it. But still it will not export static local variables in that function.
Further more if you try to import an inlined function GCC will error. With GCC you cannot define symbols that you are importing. And this happens when you import inlined (and so defined) symbol. So in fact it doesn't make any sense to export inlined functions with GCC.
MSVC allows to import inlined functions. In all cases I checked it didn't seem to actually inline the function but instead called the imported version.
Yet note that because MSVC in case of inlined function exports also its static local variables it would be possible for it to really inline the function (rather than import it) while maintaining a single copy of static local variables. For ordinary programs such behavior is mandated by the Standard (N3337, C++11), in point 7.1.2 ([dcl.fct.spec]) at $4 we can read:
(…) A static local variable in an extern inline function always refers to the same object. (…)
But a program and a shared library are actually more like two programs so they are out of scope for the Standard. Yet MSVC even in that case acts (or better to say: could act) as one would expect from a single program.
Solution
Denis Bakhvalov in a comment provided solution for his own question. The solution is to move getCount function from header to source file and export/import it.
This seems to be the only solution portable between GCC and MSVC. Or to be more precise MSVC allows more solutions to this problem but none of them will work when program is build under GCC.
The variable trick
The above is not entirely true. There is another workaround that will work consistently between GCC and MSVC.
This is to stop using static local variable. Instead make it a global variable (most likely by making it static variable in the class) and export it. This will make the trick as well.
Sadly there is no way (or I don't know any) to directly force exporting/importing static local variables. You have to change them to global variables to do that.
MSVC solutions
With MSVC you have more options.
As mentioned before exporting/importing the inlined function itself (whether directly or through type) will do the job.
Summary
As described above even consistency between GCC and MSVC on Windows only requires care. You have to limit yourself to stay in common subset of allowed solutions.
Keeping the program (source) interoperable between Linux and Windows even if with same compiler (GCC) also requires care.
Luckily there is a common subset for all three environments: GCC on Linux, GCC on Windows and MSVC on Windows. That common subset is described already by mentioned Denis' comment.
So do not inline functions that you intend to export/import. Keep them in sources. And on Windows builds (regardless of compiler) export them explicitly (otherwise you will get linker error anyway since the functions in sources of a shared library will not be available when building program).
Note that this is actually a reasonable approach on its own. Inlining function from shared library doesn't seem wise. It freezes not only the interface but also implementation (of that function). You can no longer change this function freely (and deliver new version of your shared library) since all clients would have to be rebuild since they could have inlined that function. So it is a wise approach by itself not to inline from shared library. And as a bonus it assures that your sources are multi-platform friendly.
Also do have a look into the mentioned Visibility entry on GCC Wiki. It might be reasonable to use that approach (of explicit exports) on Linux as well since it seems cleaner (from design point of view) and more efficient at runtime. While it fits well what you have to do for Windows anyway.

Create automatic C wrapper for C++ library?

Let say I have a C++ DLL. AFAIK, there is no widely-adopted ABI standard for C++, therefore to make sure it works and does not depend on the compiler of the target application I would need to wrap my library in a C interface.
Are there any tools that can automatically generate such interface? Would also be nice if they could generate wrappers around C interface to look as if they are original C++ objects, e.g.
Foo* f = new Foo(); // FooWrapper* fw = Foo_create();
f->bar("test"); // Foo_bar(fw, "test")
translates into C functions that are invoked in my library using generated C ABI. I understand that C++ is fairly complicated language and not everything can be easily wrapped in a C interface, but I was wondering if there are any such solutions that even support a subset of the C++ language (maybe with the help of some manually written IDL/XML files)?
there is no widely-adopted ABI standard for C++
I'm pretty sure that is a bit exaggerated - there aren't THAT many different compilers available for any given platform, so it would probably be easier to just produce a DLL for each vendor (e.g. Microsoft, GCC on Windows, GCC on Linux, Sun and GCC for Solaris, GCC for MacOS - CLANG is compatible with GCC as far as I know).
To add a C layer interface basically means that the interface layer must not:
1. Use any objects of that require special copy/assignment/construction behaviour.
2. Use any "throw" exceptions.
3. Use virtual functions.
across that interface.
It is my opinion that it's easier to "fix" the problems caused by "lack of ABI" than it is to make a good interface suitable for C++ use with a C interface in the middle of it.
If you want a way to make C++ code callable from other compilers/standard libraries, you can use cppcomponents from https://github.com/jbandela/cppcomponents. Full disclosure - I am the author of the library.
Here is a simple hello world example
First make a file called library.h
In this file you will define the Component
#include <cppcomponents/cppcomponents.hpp>
struct IPerson
:public cppcomponents::define_interface<cppcomponents::uuid<0xc618fd04,0xaa62,0x46e0,0xaeb8,0x6605eb4a1e64>>
{
std::string SayHello();
CPPCOMPONENTS_CONSTRUCT(IPerson,SayHello);
};
inline std::string PersonId(){return "library!Person";}
typedef cppcomponents::runtime_class<PersonId,cppcomponents::object_interfaces<IPerson>> Person_t;
typedef cppcomponents::use_runtime_class<Person_t> Person;
Next create library.cpp
In this file you will implement the interface and component
#include "library.h"
struct PersonImplementation:cppcomponents::implement_runtime_class<PersonImplementation,Person_t>
{
std::string SayHello(){return "Hello World\n";}
};
CPPCOMPONENTS_DEFINE_FACTORY(PersonImplementation);
Finally here is you main program (call it example1.cpp) that uses your implementation
#include "library.h"
#include <iostream>
int main(){
Person p;
std::cout << p.SayHello();
}
To build the program you will need to download cppcomponents (just clone from the git link above). It is a header only library and needs only a c++11 compiler.
Here is how you would build it on Windows
cl /EHsc example1.cpp /I pathtocppcomponents
g++ -std=c++11 library.cpp -o library.dll -shared -I pathtocppcomponents
where pathocppcomponents is the directory of cppcomponents.
I am assuming you have cl and g++ in your path.
To run the program, make sure library.dll is in the same directory as example1.exe and run example1.exe
This library requires fairly compliant c++11 support, so it needs MSVC 2013 Preview, and at least g++ 4.7. This library works on both Windows and Linux.
As far as I know the answer is no and you are supposed to handle this by yourself with a little bit of "hacking" and modifications, for example your t variable which is an std::string can possibly be "externed" to a C interface by t.c_str() because c_str returns a const char * which is a type that C understands without any problem at all.
I personally don't find C++ complicated, I can't see that "ABI issue" either, I mean nothing is perfect but you are externalizing to C your entire code base to "solve" this issue ? Just use C in the first place, also C it's no easy language to deal with either, for example in C there is not even the notion of "string", and problems that are trivial to solve in C++ while keeping everything type-safe, are really challenging in C if you want to meet the same goal.
I think that you are going a little bit too far with this, and you are complicating things, as it is now you have 3 + 1 main options on the most popular platforms :
libsupc++
libcxxrt
libc++abi
plus the whetever ABI is for the MSVC of your choice ( aka "only god knows")
for me, on linux, libsupc++ works very well, I'm following the libc++abi project and I don't see any big problem either, the only real problem with this is that llvm is basically an Apple oriented project for now, so there isn't that real and good support for the other platforms, but libc++abi compiles and works quite well on linux too ( although it's basically useless and pointless, on linux there is libsupc++ already.) .
I also would never ever use MSVC under Windows, in my opinion it's better to stick with a GCC-like compiler such as mingw, you got bleeding edge features, and you can simplify your codebase and your building phase a lot.

How to debug GCC/LD linking process for STL/C++

I'm working on a bare-metal cortex-M3 in C++ for fun and profit. I use the STL library as I needed some containers. I thought that by simply providing my allocator it wouldn't add much code to the final binary, since you get only what you use.
I actually didn't even expect any linking process at all with the STL
(giving my allocator), as I thought it was all template code.
I am compiling with -fno-exception by the way.
Unfortunately, about 600KB or more are added to my binary. I looked up what symbols are included in the final binary with nm and it seemed a joke to me. The list is so long I won't try and past it. Although there are some weak symbols.
I also looked in the .map file generated by the linker and I even found the scanf symbols
.text
0x000158bc 0x30 /CodeSourcery/Sourcery_CodeBench_Lite_for_ARM_GNU_Linux/bin/../arm-none-linux-gnueabi/libc/usr/lib/libc.a(sscanf.o)
0x000158bc __sscanf
0x000158bc sscanf
0x000158bc _IO_sscanf
And:
$ arm-none-linux-gnueabi-nm binary | grep scanf
000158bc T _IO_sscanf
0003e5f4 T _IO_vfscanf
0003e5f4 T _IO_vfscanf_internal
000164a8 T _IO_vsscanf
00046814 T ___vfscanf
000158bc T __sscanf
00046814 T __vfscanf
000164a8 W __vsscanf
000158bc T sscanf
00046814 W vfscanf
000164a8 W vsscanf
How can I debug this? For first I wanted to understand what exactly GCC is using for linking (I'm linking through GCC). I know that if symbol is found in a text segment, the
whole segment is used, but still that's too much.
Any suggestion on how to tackle this would really be appreciated.
Thanks
Using GCC's -v and -Wl,-v options will show you the linker commands (and version info of the linker) being used.
Which version of GCC are you using? I made some changes for GCC 4.6 (see PR 44647 and PR 43863) to reduce code size to help embedded systems. There's still an outstanding enhancement request (PR 43852) to allow disabling the inclusion of the IO symbols you're seeing - some of them come from the verbose terminate handler, which prints a message when the process is terminated with an active exception. If you're not using execptions then some of that code is useless to you.
The problem is not about the STL, it is about the Standard library.
The STL itself is pure (in a way), but the Standard Library also includes all those streams packages and it seems that you also managed to pull in the libc as well...
The problem is that the Standard Library has never been meant to be picked apart, so there might not have been much concern into re-using stuff from the C Standard Library...
You should first try to identify which files are pulled in when you compile (using strace for example), this way you can verify that you only ever use header-only files.
Then you can try and remove the linking that occurs. There are options to pass to gcc to precise that you would like a standard library-free build, something like --nostdlib for example, however I am not well versed enough in those to instruct you exactly here.