Linkage of standard libraries in C++ code called from ASM - c++

as I am developing my "OsDev" project, where I am learning a new stuff (for somebody who did not code in C/C++ for a long time due to web development it is kinda "new"). I figured out in the other thread, that calling a C++ function from ASM needs to have a extern "C" prefix but now I have problem with the lining of standard libraries as a for example cstdio etc. I stuck with this message.
kc.o: In function `kmain':
kernel.cpp:(.text+0x3e4): undefined reference to `strlen`
C++
#include <string.h>
#include <cstdio>
#include "inc/screen.h"
extern "C" void kmain()
{
clearScreen();
kernel_print((char*)"Hello Github! :-)", 0x04);
}
and if I try to use strlen() it won't link. (BTW. including screen.h is working for some reason).
Compiling script
nasm -f elf32 kernel.asm -o kasm.o
g++ -c kernel.cpp -o kc.o -lgcc -m32 -Wall -Wextra -O2
ld -m elf_i386 -T link.ld -o kernel.bin kasm.o kc.o
link.ld
OUTPUT_FORMAT(elf32-i386)
ENTRY(start)
SECTIONS
{
. = 0x100000;
.text : { *(.text) }
.data : { *(.data) }
.bss : { *(.bss) }
}
Thanks for any suggestions. :)

Your code cannot work as kernel's can't directly use shared libraries.
Why can't I use shared libraries directly in my kernel?
When an application is loaded by the operating system, all the required files are brought into its address space. This includes the executable file and any dynamic libraries (all ABI-conforming ELF applications will always link with a system library - the C Standard Library or just libc).
But while loading the kernel, only the original executable is loaded. Multiboot 2 (with GRUB bootloader) will allow you to load kernel-modules which can be dynamic libraries. But still, your kernel must know how to link itself and the kernel-modules in physical memory. To do so, you must implement a ELF parser and dynamic linker in your kernel.
Before implementing one, make sure your kernel is mature enough to systematically handle dynamic memory allocation, pagination, and other basic features.
How can I use the sweet features of libc?
Usually, you won't use all of the userspace functionality of libc. But things like memcpy, strlen, strcpyn and so on are absolutely necessary. You will have to implement these functions on your own, but the better part here is that, you can change the names of these functions. For example, if you prefere camelCase for function names, then you can also use function names like copyMemory, lengthOfString, etc.
https://github.com/SukantPal/Silcos-Kernel
I have built my own kernel, which has a few implementations of the required functions in KernelHost/Source/Util/CircuitPrimitive.cpp. You can look into that. Also, it has a full-fledged module linker. (KernelHost, ModuleFramework, etc. those parent folders contain separate kernel-module source code).
Make sure not to use the standard C headers in your kernel, as for now. Implement all required functions on your own, including printf

Related

"no main" function for linking or execution in C++ [duplicate]

This question already has answers here:
How to change entry point of C program with gcc?
(4 answers)
Closed 5 years ago.
I am trying to compile a function (not called main) that can be integrated in another code or directly executed (after linking).
I try it one my mac, and work well.
I finally test it on Linux (CentOS and ubuntu). However, the task looks harder as expected on Linux.
The source code is the following one (just to explain the problem)
test.cpp:
#include <cstdio>
#ifdef __cplusplus
extern "C" {
#endif
int test(int argc, char const *argv[]);
#ifdef __cplusplus
}
#endif
int test(int argc, char const *argv[]) {
fprintf(stderr, "%s\n", "test");
return 0;
}
Compilation line on MacOS
g++ -c test.cpp -o test.o && g++ test.o -o test -e _test
and on Linux
g++ -c test.cpp -o test.o && g++ test.o -o test -e test
I try on my MacOS with clang, g++ and Intel compiler, all 3 works fine.
And I try with g++ and the Intel compiler on Linux, always, the same error.
usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crt1.o: In function `_start':
(.text+0x20): undefined reference to `main'
collect2: error: ld returned 1 exit status
Any advice, explanation or solution, on what I am doing wrong or missing would be very helpful.
Thanks
Edit:
Currently, I have a "define" to create a main, but if we have lots of function we are obligated to do two compilations each time (one for the function version and one for the execution) and make finally the code heavier.
Like discussed in this topic is there a GCC compiler/linker option to change the name of main?
To don't do a XY I inherited from a bunch of small programs that I want to put to gather, that it is easier to use (for remote execution ...). However, each one need to be able to be executed independently if needed, for debugging,... I hesitate, between using "execv" and just convert each main as a function. I probably take the bad chose.
Edit:
The final goal is to be able to have independent programs. But that we can call from an external software too.
Solution:
The solution looks to be, to a call to the main through a dlopen
You cannot do that (and even if it appears to work on MacOSX it is implementation specific and undefined behavior).
Linux crt0 is doing more complex stuff that what you think.
The C standard (e.g. n1570 for C11) requires a main function for hosted implementations (§5.1.2.2.1) :
The function called at program startup is named main. The implementation declares no prototype for this function.
And the C++ standard also requires a main and strongly requires some processing (e.g. construction of static data) to be done before main is running and after it has returned (and various crt0 tricks are implementing that feature on Linux).
If you want to understand gory details (and they are not easy!), study the ABI and the (free software) source code of the implementation of the crt0.
I am trying to compile a function (not called main) that can be integrated in another code
BTW, to use dynamically some code (e.g. plug-ins) from another program, consider using the dynamic linker. I recommend using the POSIX compliant dlopen(3) with dlsym(3) on position-independent code shared libraries. It works on most Unix flavors (including MacOSX & Linux & Solaris & AIX). For C++ code beware of name mangling so read at least the C++ dlopen mini howto.
Read also the Program Library HowTo.
Problems with libraries, they cannot be executed, no ?
I don't understand what that means. You certainly can load a plugin then run code inside it from the main program dlopen-ing it.
(and on Linux, some libraries like libc.so are even specially built to also work as an executable; I don't recommend this practice for your own code)
You might take several days to read Drepper's How To Write Shared Libraries (but it is advanced stuff).
If you want to add some code at runtime, read also this answer and that one.
The final goal is to be able to have independent program. But that we can call from an external software too
You can't do that (and it would make no sense). However, you could have conventions for communicating with other running programs (i.e. processes), using inter-process communication such as pipe(7)-s and many others. Read Advanced Linux Programming first and before coding. Read also Operating Systems : Three Easy Pieces
The solution looks to be, to a call to the main through a dlopen
Calling the main function via dlopen & dlsym is forbidden by the C++ standard (which disallows using a pointer to main). The main function has a very specific status and role (and is compiled specially; the compiler knows about main).
(perhaps calling main obtained by dlsym would appear to work on some Linux systems, but it certainly is undefined behavior so you should not do that)

Barebones C++ without standard library?

Compilers such as GCC and Clang allow to compile C++ programs without the C++ standard library, e.g. using the -nostdlib command line flag. It seems that such often fail to link thou, for example:
void f() noexcept { throw 42; }
int main() { f(); }
Usually fails to link due to undefined symbols like __cxa_allocate_exception, typeinfo for int, __cxa_throw, __gxx_personality_v0, __clang_call_terminate, __cxa_begin_catch, std::terminate() etc.
Even a simple
int main() {}
Fails to link with
ld: warning: cannot find entry symbol _start; defaulting to 0000000000400120
and is killed by the OS upon execution. Using -c the compiler still runs the linker which blatantly fails with:
ld: error in mytest(.eh_frame); no .eh_frame_hdr table will be created.
Is it a realistic goal to program and compile C++ applications or libraries without using and linking to the standard library? How can I compile my code using GCC or Clang on Linux? What core language features would one be unable to use without the standard library?
You will basically find all of your questions answered at osdev.org, but I'll give a brief summary anyway.
When you give GCC -nostdlib, you are saying "no startup or library files". This includes:
crti.o, crtbegin.o, crtend.o and crtn.o. Generally kernel developers only care about implementing crti.o and crtend.o and let GCC supply crtbegin.o and crtend.o by passing -print-file-name= to the linker. Generally these are just stubs that consist of .init and .fini respectively, leaving room for GCC to shove the contents of crtbegin.o and crtend.o respectively. These files are necessary for calling global constructors/destructors.
You can't avoid linking libgcc (the "low-level runtime library" (-lgcc) because even if you pass -nostdlib GCC will emit calls to its functions whenever you use it, leading to inexplicable linking errors for seemingly no reason. This is the case even when you're implementing/porting a C library.
You don't "need" libstdc++ no, but typically kernel developers want it. Porting a C library then implementing the C++ standard library from scratch is an extremely difficult task.
Since you only want to get rid of the "standard library", but keeping libc (on a Linux system) you're essentially programming C++ with just a C library. Of course, there's nothing wrong with this and you do you, but ultimately I don't see the point unless you plan on developing a kernel.
Required reading:
OSDev's C++ page - If you really care about RTTI/exception support, it's more annoying to implement than it sounds. Typically people just pass -fno-rtti or -fno-exceptions and then worry about it down the line or not at all.
"Standard" is a misnomer. In this context it doesn't mean "the library (set of functions, classes etc) as defined by the C++ standard" but "the usual set of libraries and objects (compiled files in a certain format) gcc links with by default". Some of those are necessary for most or even all programs to function.
If you use this flag, it's your responsibility to provide any missing functionality. There are several ways to do so:
Cherry-pick libraries and objects that your program really needs out of the default set. (Makes little sense as the result will most probably be exactly the same as with the default link flags).
Provide your own implementation of missing functionality.
Explicitly disable, through compiler flags, language features your program isn't using. I know of two such features: exceptions and RTTI. This is needed because the compiler needs to generate exceptions-related code and RTTI info even if these features are not explicitly used in this module.

I receive different results on UNIX and WIN when use static members with static linking of shared library to executable. Please explain why?

Please consider following peace of code:
// 1. Single header file. Imagine that it is some static library.
// Counter.h
#pragma once
struct Counter
{
Counter()
{
++getCount();
}
static int& getCount()
{
static int counter = 0;
return counter;
}
};
// 2. Shared library (!) :
// main_DLL.cpp
#include <iostream>
#include "counter.h"
extern "C"
{
__declspec(dllexport) // for WIN
void main_DLL()
{
Counter c;
std::cout << "main_DLL : ptr = " << &Counter::getCount()<< " value = " << Counter::getCount() << std::endl;
}
}
// 3. Executable. Shared library statically (!) linked to the executable file.
// main.cpp
#include "counter.h"
#include <iostream>
extern "C"
{
__declspec(dllimport) // for WIN
void main_DLL();
}
int main()
{
main_DLL();
Counter c;
std::cout << "main_EXE : ptr = " << &Counter::getCount() << " value = " << Counter::getCount() << std::endl;
}
Results:
Results for WIN (Win8.1 gcc 5.1.0):
main_DLL : ptr = 0x68783030 value = 1
main_EXE : ptr = 0x403080 value = 1
// conclusion: two different counters
Results for UNIX (Red Hat <I don’t remember version exactly> gcc 4.8.3):
main_DLL : ptr = 0x75693214 value = 1
main_EXE : ptr = 0x75693214 value = 2
// conclusion: the same counter addressed
Building:
Building for WIN:
g++ -c -Wall -Werror -o main_DLL.o main_DLL.cpp
g++ -shared -Wl,--out-implib=libsharedLib.a -o libsharedLib.so main_DLL.o
g++ -Wall –Werror -o simpleExample main.cpp -L./ -lsharedLib
Building for UNIX:
g++ -c -Wall -Werror -fPIC -o main_DLL.o main_DLL.cpp
g++ -shared -fPIC -o libsharedLib.so main_DLL.o
g++ -Wall –Werror -fPIC -o simpleExample main.cpp -L./ -lsharedLib
So, you see that I added –fPIC on UNIX and there is no need to create import library for UNIX, because all exports symbols are included inside shared library. On Windows I use __declspec for it.
For me, results on Windows are pretty much expected. Because shared library and executable are building separately and they should know about static variable in Counter::getCount. They should simply allocate memory for it, that’s why they have different static counters.
I did quite some analysis using tools like nm, objdump. Although I’m not a big expert in them, so I haven’t found anything suspicious. I can provide their output if needed.
Using ldd tool I can see that library linked statically in both cases.
Why I can’t see the same results on Unix for me it’s strange. Could the root cause lie in building options (–fPIC, for example), or I’m missing something?
In windows, A DLL is not exporting global and static symbols unless you add the dllexport statement, therefore, the linker doesn't even know they exists, so it allocate new instance for the static member.
In linux/unix a shared lib is exporting all the global and static symbols, so when the linker find the existence of the static member in the shared lib, it just use its address.
That is the reason for the different result.
EDIT: This is a complete rewrite of the answer. With much more details.
I think that this question deserves more elaborated answer. Especially that there are things that were not mentioned so far.
Dependency Walker
Let me start with referring to the “Dependency Walker” program.
It is a nice program (although these days a bit old-schoolish in its look & feel) that allows analyzing Windows binaries (both EXE and DLL) for symbols that they export/import and their own dependencies to other DLLs. Also it allows showing undecorated symbol names but this seems to be working only with MSVC build binaries. (And some more but that is not important here.)
Thanks to this program crucial information (for this question) can be uncovered. So I encourage you to use it during experiments.
Exporting policy on Linux vs. Windows
SHR already pointed this out but I will mention it also for completeness of the answer. And some extra details.
On Linux every symbol gets exported from a shared library by default. On the other hand on Windows you have to explicitly state which symbols to export from a shared library.
GCC seems however to provide some means of controlling exports in "Windows style". See for example Visibility entry on GCC Wiki.
Also note that there are various ways of exporting on both Linux and Windows. For example both seem to support exporting selectively by providing linker with a list of names for symbols to export. But it also seems that nowadays (on Windows at least) this isn't really used much. __declspec approach seems to be preferred.
What can be exported?
After that general introduction let's now stick to Windows case. Nowadays you export/import symbols from shared libraries by using the __declspec. Just as shown in the question. (Well maybe not exactly that - typically you use a #define to handle bi-directionality as shown in already mentioned Visibility entry on GCC Wiki.)
But the declaration can be applied not only to functions, methods and global variables. It can also be applied to types. For example you can have:
class __declspec(dllexport) Counter { /* ... */ };
Such exporting/importing means in general that all members get exported/imported.
Not so easy!
But it would be too easy, wouldn't it? The complication is that GCC and MSVC handle exporting types differently.
My notes here are based mostly on experiments (checks done using Dependency Walker) so I can be wrong or not precise enough. But I did observe differences in behavior.
In tests I used MSVC 2013 from the Express Edition with update 5. For GCC I used MinGW distro from nuwen.net, version 13.0.
MSVC, when exporting entire type, exports each and every member. Including implicitly defined members (like compiler generated copy constructor). And including inlined functions. Furthermore if inlined function has some static local variables they get exported to (!).
GCC on the other hand seems to be far more restrictive. It doesn't export implicitly defined members. Nor it doesn't export inlined members.
Exporting/Importing inline functions
If instead of exporting entire type you would explicitly export an inlined function then and only then will GCC really export it. But still it will not export static local variables in that function.
Further more if you try to import an inlined function GCC will error. With GCC you cannot define symbols that you are importing. And this happens when you import inlined (and so defined) symbol. So in fact it doesn't make any sense to export inlined functions with GCC.
MSVC allows to import inlined functions. In all cases I checked it didn't seem to actually inline the function but instead called the imported version.
Yet note that because MSVC in case of inlined function exports also its static local variables it would be possible for it to really inline the function (rather than import it) while maintaining a single copy of static local variables. For ordinary programs such behavior is mandated by the Standard (N3337, C++11), in point 7.1.2 ([dcl.fct.spec]) at $4 we can read:
(…) A static local variable in an extern inline function always refers to the same object. (…)
But a program and a shared library are actually more like two programs so they are out of scope for the Standard. Yet MSVC even in that case acts (or better to say: could act) as one would expect from a single program.
Solution
Denis Bakhvalov in a comment provided solution for his own question. The solution is to move getCount function from header to source file and export/import it.
This seems to be the only solution portable between GCC and MSVC. Or to be more precise MSVC allows more solutions to this problem but none of them will work when program is build under GCC.
The variable trick
The above is not entirely true. There is another workaround that will work consistently between GCC and MSVC.
This is to stop using static local variable. Instead make it a global variable (most likely by making it static variable in the class) and export it. This will make the trick as well.
Sadly there is no way (or I don't know any) to directly force exporting/importing static local variables. You have to change them to global variables to do that.
MSVC solutions
With MSVC you have more options.
As mentioned before exporting/importing the inlined function itself (whether directly or through type) will do the job.
Summary
As described above even consistency between GCC and MSVC on Windows only requires care. You have to limit yourself to stay in common subset of allowed solutions.
Keeping the program (source) interoperable between Linux and Windows even if with same compiler (GCC) also requires care.
Luckily there is a common subset for all three environments: GCC on Linux, GCC on Windows and MSVC on Windows. That common subset is described already by mentioned Denis' comment.
So do not inline functions that you intend to export/import. Keep them in sources. And on Windows builds (regardless of compiler) export them explicitly (otherwise you will get linker error anyway since the functions in sources of a shared library will not be available when building program).
Note that this is actually a reasonable approach on its own. Inlining function from shared library doesn't seem wise. It freezes not only the interface but also implementation (of that function). You can no longer change this function freely (and deliver new version of your shared library) since all clients would have to be rebuild since they could have inlined that function. So it is a wise approach by itself not to inline from shared library. And as a bonus it assures that your sources are multi-platform friendly.
Also do have a look into the mentioned Visibility entry on GCC Wiki. It might be reasonable to use that approach (of explicit exports) on Linux as well since it seems cleaner (from design point of view) and more efficient at runtime. While it fits well what you have to do for Windows anyway.

Writing a plugin system?

After many hours of research I have turned up nothing, so I turn to you good folks in hopes of a solution. I am going to be writing a bot in c++, and at some point would like to make a plugin system for it. Now I know I could just write a scripting language for it, however, I know its possible to just write an api and have the program link to that dynamically at run time. My question is, how do i get that dynamic linkage (like what hexchat has for its plugins)? Are there any elegant solutions, or at least theories on the typical design?
On Linux and Posix systems, you want to use dlopen(3) & dlsym (or some libraries wrapping these functions, e.g. Glib from GTK, Qt, POCO, etc...). More precisely,
Build a position independent code shared library as your plugin:
gcc -fPIC -Wall -c plugin1.c -o plugin1.pic.o
gcc -fPIC -Wall -c plugin2.c -o plugin2.pic.o
Notice that if the plugin is coded in C++ you'll compile it with g++ and you should declare the plugin functions as extern "C" to avoid name mangling.
Then link your plugin as
gcc -shared -Wall plugin1.pic.o plugin2.pic.o -o plugin.so
You may add dynamic libraries (e.g. a -lreadline at end of command above if your plugin wants GNU readline).
At last, call dlopen with a full path in your main program, e.g.
void* dlh = dlopen("./plugin.so", RTLD_NOW);
if (!dlh) { fprintf(stderr, "dlopen failed: %s\n", dlerror());
exit(EXIT_FAILURE); };
(often dlh is a global data)
Then use dlsym to get the function pointers. So declare their signature in some header included both by program and plugin code like
typedef int readerfun_t (FILE*);
declare some (often) global function pointers
readerfun_t* readplugfun;
and use dlsym on the plugin handle dlh:
readplugfun = (readerfun_t*) dlsym(dlh, "plugin_reader");
if (!readplugfun) { fprintf (stderr, "dlsym failed: %s\n", dlerror());
exit(EXIT_FAILURE); };
Of course in your plugin source code (e.g. in plugin1.cc) you'll define
extern "C" int plugin_reader (FILE*inf) { // etc...
You might define some constructor (or destructor) functions in your plugin (see GCC function attributes); the would be called at dlopen (or dlclose) time. In C++ you should simply use static objects. (their constructor is called at dlopen time, their destructor is called at dlclose time; hence the name of the function attributes).
At the end of your program call
dlclose(dlh), dlh = NULL;
In practice, you can do a lot (perhaps a million) of dlopen calls.
You generally want to link your main program with -rdynamic to let its symbols be visible from plugins.
gcc -rdynamic prog1.o prog2.o -o yourprog -ldl
Read Program Library HowTo & C++ dlopen mini HowTo & Drepper's paper: How to Write a Shared Library
The most important part is to define and document a plugin convention (i.e. "protocol"), that is a set (and API) of functions (to be dlsym-ed) required in your plugin and how to use them, in which order they are called, what is the memory ownership policy, etc. If you allow several similar plugins, you might have some well documented hooks in your main program which calls all the dlsym-ed functions of relevant dlopen-ed plugins. Examples: GCC plugins conventions, GNU make modules, Gedit plugins, ...

in gcc how to force symbol resolution at runtime

My first post on this site with huge hope::
I am trying to understand static linking,dynamic linking,shared libraries,static libraries etc, with gcc. Everytime I try to delve into this topic, I have something which I don't quite understand.
Some hands-on work:
bash$ cat main.c
#include "printhello.h"
#include "printbye.h"
void main()
{
PrintHello();
PrintBye();
}
bash$ cat printhello.h
void PrintHello();
bash$ cat printbye.h
void PrintBye();
bash$ cat printbye.c
#include <stdio.h>
void PrintBye()
{
printf("Bye bye\n");
}
bash$ cat printhello.c
#include <stdio.h>
void PrintHello()
{
printf("Hello World\n");
}
gcc -Wall -fPIC -c *.c -I.
gcc -shared -Wl,-soname,libcgreet.so.1 -o libcgreet.so.1.0 *.o
ln -sf libcgreet.so.1.0 libcgreet.so
ln -sf libcgreet.so.1.0 libcgreet.so.1
So I have created a shared library.
Now I want to link this shared library with my main program to create an executable.
gcc -Wall -L. main.c -lcgreet -o greet
It very well works and if I set the LD_LIBRARY_PATH before running greet( or link it with rpath option) I can make it work.
My question is however different:
Since I am anyway using shared library, is it not possible to force symbol resolution at runtime (not sure about the terminology but perhaps called dynamic linking as per the book "Linkers and Loaders"). I understand that we may not want to do it because this makes the program run slow and has overhead everytime we want to run the program, but I am trying to understand this to clear my concepts.
Does gcc linker provide any option to delay symbol resolution at runtime? (to do it with the library we are actually going to run the program with)(as library available at compile time may be different than the one available at runtime if any changes in the library)
I want to be able to do sth like:
bash$ gcc main.c -I.
(what option needed here?)
so that I don't have to give the library name, and just tell it that I want to do symbol resolution at runtime, so headers are good enough for now, actual library names are not needed.
Thanks,
Learner For Ever.
Any linker (gcc, ld or any other) only resolves links at compile-time. That is because the ELF standard (as most others) do not define 'run-time' linkage as you describe. They either link statically (i.e. lib.a) or at start-up time (lib.so, which must be present when the ELF is loaded). However, if you use a dynamic link, the linker will only put in the ELF the name of the file and the symbols it must find, it does not link the file directly. So, if you want to upgrade the lib to a newer version later, you can do so, as long as system can find the same filename (the path can actually be different) and the same symbol names.
The other option, to get symbols at run-time, is to use dlopen, which has nothing to do with gcc or ld. dlopen simply put, opens a dynamic link library, just like fopen might, and returns you a handle, which then you pass to dlsym with the name of the symbol you want, which might be a function name for example. dlsym will then pass you a pointer to that symbol, which you can then use to call the function or use as a variable. This is how plugins are implemented.
I think you are looking for ld option '--unresolved-symbols=ignore-all', yes it can actually do it (ignore prev answer). Imagine scenario where a shared library loaded late (when program is already running), it can use all symbols that are already resolved/loaded by the main process, no need to bother to do it again . btw it does not nervelessly makes it slow , at least on Linux