I have messed around a few times by making a small assembly boot loader on a floppy disk and was wondering if it's possible to make a boot loader in c++ and if so where might I begin? For all I know im not sure it would even use int main().
Thanks for any help.
If you're writing a boot loader, you're essentially starting from nothing: a small chunk of code is loaded into memory, and executed. You can write the majority of your boot loader in C++, but you will need to bootstrap your own C++ runtime environment first.
Assembly is really the only option for the first stage, as you need to set up a sensible environment for running anything higher-level. Doing enough to run C code is fairly straightforward -- you need:
code and data loaded in the right place;
there may be an additional part of the data area which must be zero-initialised;
you need to point the stack pointer at a suitable area of memory for the stack.
Then you can jump into the code at an appropriate point (e.g. main()) and expect that the basic language features will work. (It's possible that any features of the standard library that may have been implemented or linked in might require additional initialisation at this stage.)
Getting a suitable environment going for C++ requires more effort, as it needs more initialisation here, and also has core language features which require runtime support (again, this is before considering library features). These include:
running static constructors;
memory allocation to support new and delete;
support for run-time type information (RTTI);
support for exceptions;
probably some other things I've forgotten to mention.
None of these are required until the C environment is up and running, so the code that handles these can be written in C rather than assembler (or even in a subset of C++ that does not make use of the above features).
(The same principles apply in embedded systems, and it's not uncommon for such systems to make use of C++, but only in a limited way -- e.g. no exceptions and/or RTTI because the runtime support isn't implemented.)
It's been a while since I played with writing bootloaders, so I'm going off memory.
For an x86 bootloader, you need to have a C++ compiler that can emit x86 assembly, or, at the very least, you need to write your own preamble in 16-bit assembly that will put the CPU into 32-bit protected (or 64-bit long) mode, before you can call your C++ functions.
Once you've done that, though, you should be able to make use of most, if not all, of C++'s language features, so long as you stay away from things that require an underlying libc. But statically link everything without the CRT and you're golden.
Bootloaders don't have "int main()"s, unless you write assembly code to call it.
If you are writing a stage 1 bootloader, then it is seriously discouraged.
Otherwise, the osdev.org has great documentation on the topic.
While it is probably possible to make a bootloader in C++, remember not to link your code to any dynamic libraries, and remember that just because it is C++, that doesn't mean you can/should use the STL, etc.
Yes it is possible. You have elements of answer and usefull links in this question
You also can have a look here, there is a C++ bootloader example.
The main thing to understand is that you need to create a flat binary instead of the usual fancy executable file formats (PE on windows, or ELF on Unixes), because these file format need an OS to load them, and in a boot loader you don't have an OS yet.
Using library is not a problem if you link statically (no dynamic link because again of the above executable problem). But obviously all OS API related entry points are not available...
Related
I am almost certain this question has been asked before, but I can not seem to find the right keywords to search for to get an answer. My apologies if this is a duplicate.
I am better trying to understand the compilation process of say a C++ file as it goes from the C++ syntax to the binary machine code. In addition I am trying to understand what influences the resulting machine code.
First, I am nearly certain that the following are the only factors (for most systems) that dictate the final machine code (please correct me if I am wrong here)
The tools used to compile, assemble, and link.
Things like gnu c compiler, clang, visual studio, nasm, ect.
The kernel of the system being used.
Whether its a specific version of the linux kernel, windows microkernel, or some other kernel like a mac os x one.
The operating system being used.
This one I am less clear about. I am unsure if machines running the same linux kernel, but different os, in this case let's say debian vs centos, will they produce different binaries.
Lastly the hardware architecture.
Different cpu architectures like arm 64, x86, power pc, ect. take different op codes so obviously the machine code should be different.
So with that being said here is my understanding of the compilation process and where each of these dependencies show up.
I write a C++ file and use code that my system can understand. A good example might be using <winsock.h> on windows and <sys/socket.h> on linux.
The preprocessor runs and executes any preprocessor macros.
Here I know that different preprocessors will define different macros but for now I will assume this is not too machine dependent. (This might be wrong to assume).
The compiler tools run to produce assembly file outputs.
Here the assembly produced depends on the compiler and what optimizations or choices it makes.
It also depends on the kernel because different kernels have different system calls and store files in different locations. This means the assembly might make changes such as different branching when calling functions specific to that kernel.
The operating system? Still unsure how the operating system fits in to this. If two machines have the same kernel, what does the operating system do to the binaries?
Finally the assembly code depends on the cpu architecture. I think that is a pretty obvious statement.
Once the compiler produces an assembly. We can then invoke the assembler to turn our assembly code into almost complete machine code. (I think machine code is identical to binary opcodes a cpu manual lists but this might be wrong).
The corresponding machine code files (often called object files I think) contain nearly all the instructions needed to run or reference other machine code files which will be linked in the next step.
This machine code usually has some format (I think ELF is a popular format for linux) and this format is dependent on the linker for sure.
I don't think the kernel, operating system, or hardware affect the layout/format of the object file but this is probably wrong. If they do please correct this.
The hardware will affect the actual machine code produced because again I think it is a 1 to 1 mapping of machine code instructions to opcodes for a cpu.
I am unsure if the kernel or operating system affect the linking process because I thought their changes were already incorporated in the compiling step.
Finally the linking step occurs.
I think this is as simple as the linker looking for all the referenced machine code and injecting it into one complete machine code file which can be executed.
I have no clue what affects this besides the linker tool itself.
So with all that, I need help identifying inaccuracies with the procedure I described above, and any dependencies I might have missed whether it be cpu, os, kernel, or tool ones.
Thank you and sorry for the long winded question. This probably should have been broken up into multiple questions but I am too far in. If this does not go well I may ask each part in individual questions.
EDIT:
Questions with more focus.
What components of a machine affect the machine code produced given a C++ file input?
Actually that is a lot of questions and usually you're question would be much too broad for SO (as you managed to recognize by yourself). But on the other hand you showed a deep interest (just by writing such a long and profound question) and also a lot of correct understanding of the process of compiling a program. The things you are missing or not understanding correctly (and you are probably the most interested in) are those things, that I myself found hard to learn. Thus I will provide you with some important points, that I think you are missing in the big picture.
Note that I am very much used to Linux, so I will mostly describe how things work on Linux. But I believe that most things also happen in a similar way on other operating systems.
Let's begin with the hardware. A modern computer has a CPU of some architecture. There are lots of different of CPU architectures. You mentioned some of them like arm, x86, etc. which are families of similar CPUs and can be divided into smaller groups by bit width and/or supported extensions. Ultimately your processor has a specified instruction set that defines which opcodes it supports and what those opcodes do. If a native (compiled) program runs, there are raw opcodes in the memory and the CPU directly executes them following its architecture specification.
Aside from the CPU there is a lot more hardware connected to your computer. Usually communicating with this hardware is complicated and not standardized. If a user program for example gets input keystrokes from the keyboard, in does not have to directly communicate with the keyboard, but rather does this via the operating system kernel. This works by a mechanism called syscall interrupt. The kernel installs an handler routine, that is called if a user program triggers such an interrupt with a special CPU instruction. You can think of it like a language agnostic function call from the program into the kernel. For example for Linux you can find a list of all syscalls at the syscall(2) man page. The syscalls form the kernel's Application Binary Interface (kernel ABI). Reading and writing from a terminal or using a filesystem are examples for syscall functionality.
As you can see, there are already very high level functions, that are implemented in the kernel. However the functionality is still quite limited for most typical applications. To encapsulate the syscalls and provide functions for memory management, utility functions, mathematical functions and many other things you probably use in your daily programs, there is usually another layer between the program and the kernel. This thing is called the C standard library, and it is a shared library (we will cover what exactly this is in a moment). On GNU/Linux it is the glibc which is the single most important library on a GNU/Linux system (and notably not part of the kernel 1). While it implements all the features that are required by the C standard (for example functions like malloc() or strcpy()), it also ships a lot of additional functions which are a superset of the ISO C standard library, the POSIX standard and some extensions. This interface is usually called the Application Programming Interface (API) of the operating system. While it is in principle possible to bypass the API and directly use the syscalls, almost all programs (even when written in other languages than C or C++) use the C library.
Now get yourself a coffee and a few minutes of rest. We now have enough background information to look at how a C++ program is transformed into a binary, and how exactly this binary is executed.
A C++ program consists of different compilation units (usually each different source file is a compilation unit). Each compilation unit undergoes the following steps
The preprocessor is run on the file. It includes header, expands macros and does some other stuff. As you wrote in your question this is rather platform independent. The preprocessor actions are standardized in the C++ standard.
The resulting code is compiled. That means C++ code is translated into assembly code. Because assembly code directly reflects the CPU instructions, this step is dependent on the target CPU architecture, that the compiler was configured for (usually the host CPU). The compiler is allowed to optimize and translate the program in any way it wants, as long as it follows the as-if rule. Thus this step is also higly dependent on the compiler you are using.
Note: Symbols (especially functions) that are not defined, are left undefined. If you say call the malloc() function, this will not be compiled, but left unevaluated until later. Thus this step is also not much dependent on the operating system.
Assembling takes place. This is very straightforward. The assembly code usually can be converted directly into binary CPU instructions. Local symbols (such as goto labels etc.) are resolved and replaced by their corresponding addresses. Unknown external symbols such as the mentioned malloc() call still are left unevaluated and are stored in the object file's symbol table. Because most of the syscalls are wrapped in library functions, the assembly code will usually not directly contain syscall code. Thus this step is depended on the CPU architecture. It is however dependent on the ABI2, which in term is dependent on the compiler and the OS.
Linking takes place. The different compilation units are combined into a single executable binary in an OS-dependent format (e.g. GNU/Linux uses ELF). Here yet more symbols are resolved. For example if one compilation calls a function in another compilation unit, this call is resolved and the symbol is replaced by the function address. If you link to a library statically, this is just treated like another compilation unit and included into the executable with its symbols resolved.
Shared libraries are checked for the needed symbols, but not linked yet. For example in case of the malloc() call, the linker checks, that there is a malloc symbol in the glibc, but the symbol in the executable still remains unresolved.
At this point you have a executable binary. As you might noticed, there might still be unresolved symbols in that binary. Thus you cannot just load that binary into RAM and let the CPU execute it. A final step called dynamic linking is needed. On Linux the program that performs this step is called the dynamic linker/loader. Its task is to load the executable ELF file into memory, look up all the needed dynamic libraries, load them into memory as well (a list is stored in the ELF file) and resolve the remaining symbols. This last step happens each time the program is executed. Now finally the malloc() symbol is resolved with the address in the glibc shared library.
You have pure CPU instructions in memory, the CPU's program counter register (the one that tracks the next instruction) is set to the entry point, and the program can begin to run. Every now and then it is interrupted either because it makes a syscall, or because it is interrupted by the kernel scheduler to let another program run on that CPU core.
I hope I could answer some of your questions and satisfy your curiosity. I think the most important part you were missing, was how dynamic linking happens. This is a very interesting topic which is related to concepts like position independent code. I wish you could luck learning.
1 this is also one reason why some people insist on calling Linux based systems GNU/Linux. The glibc library (together with many other GNU programs) defines much of the operating system structure, interacts with supplementary programs and configuration files etc. There are however Linux based systems without glibc. One of them is Android, using Googles bionic libc.
2 The ABI is related to the calling convention. This is a mixture of operating system, programming language and compiler specification. It is one of the reasons (besides name mangling, see the comment of PeterCordes below) you need those extern "C" {...} scopes in C++ header files, that declare C functions in shared libraries. It basically is a convention on how to pass parameters and return values between functions.
Neither operating system nor kernel are directly involved in any of this.
Their limited involvement is in that if you want to build Linux 64 bit binaries for x86 using gnu tools then you need to in some way (download and install or build yourself) build the gnu tools themselves for that target processor and that operating system. As system calls are specific to the operating system and target, and also the binaries supported by that operating system. Not strictly just the elf file format, that is just a container, but the linking and possibly bootstrap is also specific to the operating systems loader. (or if building something for the kernel that would have other rules). For example, does the application loader initialize .bss and .data for you from specific information in the .elf file, or like on an mcu does the bootstrap code itself have to do this?
The builder for gnu tools for a target like linux and ideally a pre-built binary for your os and target, would have paths setup in some way. The c library would have a default linker script and its intimate partner the bootstrap.
After that point, it is just a dumb toolchain. Include files be they at the system level, compiler level, or programmer level are just includes in the C language. The default paths and gcc knows where it was executed from so it knows where in a normal build the gcc and other libraries live.
gcc itself is not a compiler actually it calls other programs like the preprocessor, the compiler itself, the assembler and linker.
The preprocessor is going to do the search and replace for includes and defines and end up with one great big cpp file, then pass that to the compiler.
The compiler front end (C++ language for gcc for example) turns that into an internal language, allocate an int with this name, and another add the two and blah. A pseudo code if you will. This gets a lot of the optimization work done on it then eventually the back end (which for gnu could be x86, mips, arm, etc independent to some extent of the front and middle). The LLVM tools, are at least capable of exposing that middle, internal, language to external files (external to the memory used by the compiler to do the compilation) and you can combine and optimize those bytecode files and then convert them to assembly or direct to object in the llvm world. I think this is an exception not a rule, others just use internal tables.
While I think it is wise and sane to use an assembly language step. Not all compilers do and do not assume that all compilers do. Some output objects.
Yes that assembly is naturally partial, external functions (labels) and variables (labels) cannot be resolved at the object level. The linker has to do that.
So the target (x86, arm, etc) does affect the construction of the elf file as
there are certain items, magic numbers specific to the target. As mentioned the operating system and or kernel do affect the elf in that there are rules for construction of the binary for that kernel or operating system. Remember that elf is just a container like tar or zip or mkv etc. Do not assume that the operating system can handle every possible choice you want to make with the contents that the linker will allow (the tools are dumb, do what they are told).
So your source.
All the relevant sources that go with it including system includes, compiler includes and your includes.
gcc/g++ is a wrapper program that manages the steps.
calls the pre-processor expands includes and defines into one file (no magic here)
call the compiler to parse that one file into internal tables, think pseudo code and data
many, many possible optimizers that operate on these structures
backend, including peephole optimizer, turns the tables into assembly language (for gnu at least)
assembler is called to turn the asm into an object
If all the objects are specified and gcc is told to link, then...
Linker combines all the objects for the binary, including the bootstrap, including already built libraries, stubs, etc, and command line or more likely a linker script (linker script and bootstrap have an intimate relationship they are not assumed to be separable and not part of the compiler they are part of a C library, etc).
Kernel module loader or operating system application loader fed the file and per the rules of that loader loads and runs the program.
Recently I have been learning how to program in C++, and was wandering, if compiler languages are translated to machine code is it possible to just simply run the code as if it was an assembly code? Or in another example I load just the compiled code onto a formatted flash drive and nothing else and plug up that flash drive into a computer with no OS on it what so ever, and boot from the flash drive to make the computer run the compiled code, and nothing else. Is something like this even possible? Is the language not supported directly by the processor or is some sort of interpreter/execution environment for the language needed to run the program?
Sorry if what im asking is a bit abstract, tbh I don't know exactly how to explain it beyond providing examples.
Almost.
You will probably need some initialization before you can hand execution over to compiled C++. For example you would maybe need to initialize the stack pointer and other low level initialization that can't be done in C++.
After that you should be aware that there are some initialization that needs to be done before main is being run, but that could normally be done in C++, especially if you want a reasonable set of the features of the language (memory allocation, exception handling etc) available.
You should also be aware that much of the functionality that are taken for granted are normally handled by the operating system. Without an OS the executable would have to have libraries that handles that functionality if needed (like for example stream output functionality, file system etc).
So I watched this video months back where Epic showcased their new Unreal Engine 4 developer features. Sorry I could not find the video but I'll try my best to explain.
The one feature that got my attention was the C++ "on the fly" modification and compilation. The guy showed how he was playing the game in editor and modified some variables in the code, saved it and the changes were immediately mirrored in game.
So I've been wondering... How does one achieve this? Currently I can think of two possible ways: either a hoax and it was only "c style"-scripting language not C++ itself OR it's shared library (read: DLL) magic.
Here's something I whipped up to try myself(simplified):
for(;;)
{
Move DLL from build directory to execution directory
Link to DLL using LoadLibrary() / dlopen() and fetch a pointer to function "foo()" from it
for(;;)
{
Call the function "foo()" found in the dll
Check the source of the dll for changes
If it has changed, attempt to compile
On succesfull compile, break
}
Unload DLL with FreeLibrary() / dlclose()
}
Now that seems to work but I can't help but wonder what other methods are there?
And how would this kind of approach compare versus using a scripting language?
edit: https://www.youtube.com/watch?v=MOvfn1p92_8&t=10m0s
Yes, "hot code modification" it's definitely a capability that many IDEs/debuggers can have to one extent or another. Here's a good article:
http://www.technochakra.com/debugging-modifying-code-at-runtime/
Here's the man page for MSVS "Edit and Continue":
http://msdn.microsoft.com/en-us/library/esaeyddf%28v=vs.80%29.aspx
Epic's Hot Reload works by having the game code compiled and loaded as a dll/.so, and this is then dynamically loaded by the engine. Runtime reloading is then simply re-compiling the library and reloading, with state stored and restored if needed.
There are alternatives. You could take a look at Runtime Compiled C++ (or see RCC++ blog and videos), or perhaps try one of the other alternatives I've listed on the wiki.
As the author of Runtime Compiled C++ I think it has some advantages - it's far faster to recompile since it only compiles the code you need, and the starting point can be a single exe, so you don't need to configure a seperate project for a dll. However it does require some learning and a bit of extra code to get this to work.
Neither C nor C++ require ahead-of-time compilation, although the usual target environments (operating systems, embedded systems, high-performance number crunching) often benefit greatly from AOT.
It's completely possible to have a C++ script interpreter. As long as it adheres to the behavior in the Standard, it IS C++, and not a "hoax" as you suggest.
Replacement of a shared library is also possible, although to make it work well you'll need a mechanism for serializing all state and reloading it under the new version.
guys I want to start programing with C++. I have written some programs in vb6, vb.net and now I want to gain knowledge in C++, what I want is a compiler that can compile my code to the smallest windows application. For example there is a Basic language compiler called PureBasic that can make Hello world standalone app's size 5 kb, and simple socket program which i compiled was only 12kb (without any DLL-s and Runtime files). I know it is amazing, so I want something like this for C++.
If I am wrong and there is not such kind of windows compiler can someone give me a website or book that can teach me how to reduce C++ executable size, or how to use Windows API calls?
Taking Microsoft Visual C++ compiler as example, if you turn off linking to the C runtime (/NODEFAULTLIB) your executable will be as small as 5KB.
There's a little problem though: you won't be able to use almost anything from the standard C or C++ libraries, nor standard features of C++ like exception handling, new and delete operators, floating point arithmetics, and more. You'll need to use only the features directly provided by WinAPI (e.g. create files with CreateFile, allocate memory with HeapAlloc, etc...).
It's also worth noting that while it's possible to create small executables with C++ using these methods, you may not be using most of C++ features at this point. In fact typical C++ code have some significant bloat due to heavy use of templates, polymorphism that prevents dead code elimination, or stack unwinding tables used for exception handling. You may be better off using something like C for this purpose.
I had to do this many years ago with VC6. It was necessary because the executable was going to be transmitted over the wire to a target computer, where it would run. Since it was likely to be sent over a modem connection, it needed to be as small as possible. To shrink the executable, I relied on two techniques:
Do not use the C or C++ runtime. Tell the compiler not to link them in. Implement all necessary functionality using a subset of the Windows API that was guaranteed to be available on all versions of Windows at the time (98, Me, NT, 2000).
Tell the linker to combine all code and data segments into one. I don't remember the switches for this and I don't know if it's still possible, especially with 64-bit executables.
The final executable size: ~2K
Reduction of the executable size for the code below from 24k to 1.6k bytes in Visual C++
int main (char argv[]) {
return 0;
}
Linker Switches (although the safe alignment is recommended to be 512):
/FILEALIGN:16
/ALIGN:16
Link with (in the VC++ project properties):
LIBCTINY.LIB
Additional pragmas (this will address Feruccio's suggestion)
However, I still see a section of ASCII(0) making a third of the executable, and the "Rich" Windows signature. (I'm reading the latter is not really needed for program execution).
#ifdef NDEBUG
#pragma optimize("gsy",on)
#pragma comment(linker,"/merge:.rdata=.data")
#pragma comment(linker,"/merge:.text=.data")
#pragma comment(linker,"/merge:.reloc=.data")
#pragma comment(linker,"/OPT:NOWIN98")
#endif // NDEBUG
int main (char argv[]) {
return 0;
}
I don't know why you are interested in this kind of optimization before learning the language, but anyways...
It doesn't make much difference of what compiler you use, but on how you use it. Chose a compiler like the Visual Studio C++'s or MinGW for example, and read its documentation. You will find information of how to optimize the compilation for size or performance (usually when you optimize for size, you lose performance, and vice-versa).
In Visual Studio, for example, you can minimize the size of the executable by passing the /O1 parameter to the compiler (or Project Properties/ C-C++ /Optimization).
Also don't forget to compile in "release" mode, or your executable may be full of debugging symbols, which will increase the size of your executable.
A modern desktop PC running Windows has at least 1Gb RAM and a huge hard drive, worrying about the size of a trivial program that is not representative of any real application is pointless.
Much of the size of a "Hello world" program in any language is fixed overhead to do with establishing an execution environment and loading and starting the code. For any non-trivial application you should be more concerned with the rate the code size increases as more functionality is added. And in that sense it is likley that C++ code in any compiler is pretty efficient. That is to say your PureBasic program that does little or nothing may be smaller than an equivalent C++ program, but that is not necessarily the case by the time you have built useful functionality into the code.
#user: C++ does produce small object code, however if the code for printf() (or cout<<) is statically linked, the resulting executable may be rather larger because printf() has a lot of functionality that is not used in a "hello world" program so is redundant. Try using puts() for example and you may find the code is smaller.
Moreover are you sure that you are comparing apples with apples? Some execution environments rely on a dynamically linked runtime library or virtual machine that is providing functionality that might be statically linked in a C++ program.
I don't like to reply to a dead post, but since none of the responses mentions this (except Mat response)...
Repeat after me: C++ != ( vb6 || vb.net || basic ). And I'm not only mentioning syntax, C++ coding style is typically different than the one in VB, as C++ programmers try to make things usually better designed than vb programmers...
P.S.: No, there is no place for copy-paste in C++ world. Sorry, had to say this...
I am not trying to do any such thing, but I was wondering out of curiosity whether one could implement an "entire OS" (not necessarily something big like Linux or Microsoft Windows, but more like a small DOS-like operating system) in C and/or C++ using no or little assembly.
By implementing an OS , I mean making an OS from scratch starting the boot-loader and the kernel to the graphics drivers (and optionally GUI) in C or C++. I have seen a few low-level things done in C++ by accessing low-level features through the compiler. Can this be done for an entire OS?
I am not asking whether it is a good idea, I am just asking whether it is even remotely possible?
Obligatory link to the OSDev wiki, which describes most of the steps needed to create an OS as described on x86/x64.
To answer your question, it is gonna be extremely difficult/unpleasant to create the boot loader and start protected mode without resorting to at least some assembly, though it can be kept to a minimum (especially if you're not really counting stuff like using __asm__ ( "lidt %0\n" : : "m" (*idt) ); as 'assembly').
A big hurdle (again on x86) is that the processor starts in 16-bit real mode, so you need some 16-bit code. According to this discussion you can have GCC generate 16-bit code, but you would still need some way to setup memory, load code from some storage media and so on, all of which requires interfacing with the hardware in ways that standard C just has no concept of (interrupts, IO ports etc.).
For architectures which communicate with hardware solely through memory mapped IO you could probably get away with writing everything except the C start-up code (that sets up the stack, initializes variables and so on) in pure C, though specific requirements of interrupt routines / exception or syscall gates etc. may be difficult to impossible to implement (as you have to access special CPU registers).
I assume that you have an OS for x86 in mind. In that case you need at least a few pages of assembler to set up protected mode and stuff like that, and besides that a lot of knowledge of all the stuff like paging, call gates, rings, exceptions, etc. If you are going to use a form of system calls you'll also need some lines of assembly code to switch between kernel and userspace mode.
Besides those things the rest of an OS can easily be programmed in C. For C++ you'll need a runtime environment to support things like virtual members and exceptions, but as far as I know that can all be programmed in C.
Just take a look at Linux kernel source, the most important assembler code (for x86) can be found in arch/x86/boot, but you'll notice that even in that directory most files are written in C. Furthermore you'll find a few assembly lines in the arch/x86/kernel directory for handling system calls and stuff like that.
Outside the arch directory there is hardly any assembler used (because assembler is machine specific, that machine specific code belongs in the arch directory). Even graphic drivers don't use assembler (e.g. nvidia driver in drivers/gpu/drm/nouveau).
A boot loader? You might want to skip that bit. For instance, Linux is quite often started by non-Linux boot loaders such as UBoot. After all, once the system is running, the OS will be present but not the boot loader, that's just there to get the OS proper into memory.
And once you've selected a decent existing bootloader, the remainder is pretty much all straightforward. Well, you have to deal with memory and files yourself; you can't rely on fopen obviously. But even a C++ compiler has little problem generating code that can run without OS support.