Is it possible to directly run C++ at assembly level? - c++

Recently I have been learning how to program in C++, and I was wondering: if compiled languages are translated to machine code, is it possible to simply run the code as if it were assembly code? Or, as another example, suppose I load just the compiled code onto a formatted flash drive and nothing else, plug that flash drive into a computer with no OS on it whatsoever, and boot from the flash drive so the computer runs the compiled code and nothing else. Is something like this even possible? Is the language not supported directly by the processor, or is some sort of interpreter/execution environment needed to run the program?
Sorry if what I'm asking is a bit abstract; to be honest, I don't know exactly how to explain it beyond providing examples.

Almost.
You will probably need some initialization before you can hand execution over to compiled C++. For example, you may need to initialize the stack pointer and do other low-level setup that can't be done in C++.
After that, be aware that some initialization needs to be done before main runs, but that can normally be done in C++, especially if you want a reasonable set of the language's features (memory allocation, exception handling, etc.) available.
You should also be aware that much of the functionality that is taken for granted is normally handled by the operating system. Without an OS, the executable would have to include libraries that handle that functionality if needed (for example stream output, a file system, etc.).
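The pre-main initialization mentioned above can be sketched in C++ once assembly has set the stack pointer. This is a hypothetical, minimal version: the `.bss` bounds and the constructor table are passed in as parameters so the sketch can run in an ordinary process; in a real image they would come from linker-script symbols (GNU ld's default ELF scripts, for instance, provide `__init_array_start`/`__init_array_end`, but custom scripts may use other names). All function names here are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Type of one entry in the static-constructor table (.init_array style).
using InitFn = void (*)();

// Uninitialized globals (.bss) must read as zero before any C++ code relies on them.
void zero_bss(unsigned char* bss_begin, unsigned char* bss_end) {
    for (unsigned char* p = bss_begin; p != bss_end; ++p) *p = 0;
}

// Run global constructors in table order.
void run_static_constructors(InitFn* begin, InitFn* end) {
    for (InitFn* fn = begin; fn != end; ++fn) (*fn)();
}

// Hypothetical start-up routine: assumes the stack pointer was already set in assembly.
int start_program(unsigned char* bss_begin, unsigned char* bss_end,
                  InitFn* init_begin, InitFn* init_end, int (*main_fn)()) {
    assert(bss_begin <= bss_end && init_begin <= init_end);
    zero_bss(bss_begin, bss_end);
    run_static_constructors(init_begin, init_end);
    return main_fn();  // with no OS to return to, real code would halt afterwards
}
```

In a bare-metal image, the assembly entry point would set up the stack, call something like `start_program`, and halt the CPU if it ever returned.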

Related

How to Prevent I/O Access in C++ or Native Compiled Code

I know this may be impossible but I really hope there's a way to pull it off. Please tell me if there's any way.
I want to write a sandbox application in C++ and allow other developers to write native plugins that can be loaded right into the application on the fly. I'd probably want to do this via DLLs on Windows, but I also want to support Linux and hopefully Mac.
My issue is that I want to be able to prevent the plugins from doing I/O access on their own. I want to require them to use my wrapped routines so that I can ensure none of the plugins contain malicious code that starts harming the user's files on disk or doing undesirable things on the network.
My best guess on how to pull off something like this would be to include a compiler with the application and require the source code for the plugins to be distributed and compiled right on the end-user platform. Then I'd need a code scanner that could search the plugins' uncompiled code for signatures that would show up in I/O operations for hard disk, network, or other storage media.
My understanding is that the standard libraries like fstream wrap platform-specific functions, so I would think that simply scanning all the code that will be compiled for platform-specific functions would let me accomplish the task. Because ultimately, native C code can't do any I/O unless it talks to the OS using one of the OS's provided methods, right?
If my line of thinking is correct on this, does anyone have a book or resource recommendation on where I could find the nuts and bolts of this stuff for Windows, Linux, and Mac?
If my line of thinking is incorrect and it's impossible for me to really prevent native code (compiled or uncompiled) from doing I/O operations on its own, please tell me so I don't create an application that I think is secure but really isn't.
In an absolutely ideal world, I don't want to require the plugins to distribute uncompiled code. I'd like to allow the developers to compile and keep their code to themselves. Perhaps I could scan the binaries for signatures that pertain to I/O access?
Sandboxing a program that executes native code is certainly harder than merely scanning the code for specific accesses! For example, the program could synthesize machine instructions that make system calls directly.
The original approach on UNIX systems is to chroot() the program, but I think there are problems with that approach, too. Another approach is a secured environment like SELinux, possibly combined with chroot(). The modern approach to things like this seems to be running the program in a virtual machine: when the program starts, fire up a suitable snapshot of a VM, and when it terminates, just rewind to the snapshot. That merely requires that the allowed accesses are somehow channeled out of the VM.
Even a VM doesn't block I/O. It can block network traffic very easily though.
If you want to make sure the plugin doesn't do I/O, you can scan its DLL for all its imported functions and check that list against a blacklist of I/O functions.
Windows has the dumpbin utility (dumpbin /IMPORTS) and Linux has nm (nm -D lists a shared library's dynamic symbols). Both can be run via a system() call, with their output redirected to files.
Of course, you can write your own analyzer, but it's much harder.
User code can't do I/O on its own; only the kernel can. If you're worried about the plugin gaining ring 0/kernel privileges, then you need to scan the disassembled code of the DLL for I/O instructions.
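The import-list check described above could be sketched as a function that filters saved tool output against a blacklist. This is a hypothetical, naive sketch: it assumes `nm -D --undefined-only`-style lines (symbol name in the last column, with an optional `@VERSION` suffix), does not parse dumpbin's format, and, as noted earlier in the thread, is easy to bypass by code that makes system calls without importing anything.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Returns every imported symbol from nm-style output that is on the blacklist.
// Example input line: "                 U fopen@GLIBC_2.2.5"
std::vector<std::string> flag_blacklisted_imports(const std::string& tool_output,
                                                  const std::set<std::string>& blacklist) {
    std::vector<std::string> flagged;
    std::istringstream lines(tool_output);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string field, last;
        while (fields >> field) last = field;  // the symbol name is the last column
        if (last.empty()) continue;            // skip blank lines
        std::string name = last.substr(0, last.find('@'));  // drop "@GLIBC_x.y" suffix
        if (blacklist.count(name) != 0) flagged.push_back(name);
    }
    return flagged;
}
```

The blacklist contents are up to the host application; a real list would need to cover every I/O entry point on each platform.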

What does embedding a language into another do?

This may be kind of basic but... here goes.
If I decide to embed some kind of scripting language like Lua or Ruby into a C++ program by linking its interpreter, what does that allow me to do in C++ then?
Would I be able to write Ruby or Lua code right into the cpp file or simply call scripts from the program?
If the latter is true, how would I do that?
Because they're scripting languages, the code is always going to be "interpreted." In reality, you aren't "calling" the script code inside your program; rather, when you reach that point, you're running the interpreter in the context of that thread (the thread that reaches the scripting portion). The interpreter reads the script and executes it, typically by first translating it into its own internal bytecode rather than into native machine code (similar in spirit to JIT compilation, but no native code is generated).
Because of this, it's basically the same thing as forking the interpreter and running the script, unless you want to exchange values between your compiled program and your script. Because the interpreter runs on the same thread, inside your compiled program's process, values can be passed in both directions through the interpreter's embedding API and read back once the interpreter returns control to your code. Lua, for example, exchanges values with C through a virtual stack that both sides push to and pop from.
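The same-thread, shared-stack arrangement described above can be illustrated with a deliberately tiny, hypothetical "interpreter" for a made-up postfix language. It is not Lua or Ruby, only a toy showing the host pushing a value, running a script on its own thread, and reading the result back from the shared stack.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Shared value stack: the host and the toy interpreter both read and write it.
using ValueStack = std::vector<double>;

// Runs a postfix script such as "2 * 1 +" on the caller's thread.
void run_script(const std::string& script, ValueStack& stack) {
    std::istringstream tokens(script);
    std::string tok;
    while (tokens >> tok) {
        if (tok == "+" || tok == "-" || tok == "*" || tok == "/") {
            if (stack.size() < 2) throw std::runtime_error("stack underflow");
            double rhs = stack.back(); stack.pop_back();
            double lhs = stack.back(); stack.pop_back();
            if (tok == "+")      stack.push_back(lhs + rhs);
            else if (tok == "-") stack.push_back(lhs - rhs);
            else if (tok == "*") stack.push_back(lhs * rhs);
            else                 stack.push_back(lhs / rhs);
        } else {
            stack.push_back(std::stod(tok));  // any other token is a number literal
        }
    }
}
```

For example, a host that pushes `20` and runs `"2 * 1 +"` finds `41` on top of the stack when `run_script` returns.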
Edit, in response to a comment:
You would have to write it yourself. Think about it this way: if you want to use assembly in C++, you use the asm keyword. The C++ compiler then needs to parse the source file, reach the asm keyword, and hand the contents of the asm region, up to its closing bracket, over to the assembler.
If you want to do this, it will be a bit different, since assembly gets assembled into machine code, not interpreted (which is what you want to do). What you'll need to do is change the compiler you're using (let's say a C++ compiler) so that it recognizes your own user-defined keyword. Let's say this keyword is scriptX{}. You need to change the C++ parser so that when it sees scriptX{}, it stores everything between the brackets in the read-only data section of your compiled program. You then need to add a hook in the generated assembly that switches the thread over to your script interpreter and starts the interpreter at the beginning of your script section (which you put in the read-only data section of the object file).
Good luck with that...
A common reason to embed a scripting language into a program is to provide for the ability to control the program with scripts provided by the end user.
Probably the simplest example of such a script is a configuration file. Assume that your program has options, and needs to remember the options from run to run. You could write them out to a file as a binary image of your options structure, but that would be fragile, not easy to inspect or edit, and likely not portable across systems. Writing the options out in plain text with some sort of labels for which is which addresses most of those complaints, but now you need to parse that text and recover the options. Then some users want different options on Tuesdays, want to do simple arithmetic to compute one option from another, or to write one configuration file that they can use on both Windows and Linux, and pretty soon you find yourself inventing a little language to express all of those ideas and mechanisms with. At this point, there's a better way.
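To make the "plain text with labels" stage concrete, here is a minimal, hypothetical `name = value` parser (the format and function name are invented for illustration). Even this small format already needs its own rules for comments and whitespace, and it still has no arithmetic, conditionals, or per-platform logic; each of those would have to be invented by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// One "name = value" per line; '#' starts a comment; lines without '=' are ignored.
std::map<std::string, std::string> parse_options(const std::string& text) {
    auto trim = [](std::string s) {
        const char* ws = " \t\r";
        s.erase(0, s.find_first_not_of(ws));   // leading whitespace
        s.erase(s.find_last_not_of(ws) + 1);   // trailing whitespace
        return s;
    };
    std::map<std::string, std::string> options;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        line = line.substr(0, line.find('#'));  // strip comments
        std::size_t eq = line.find('=');
        if (eq == std::string::npos) continue;  // not a labeled option
        options[trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
    }
    return options;
}
```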
The languages Lua and TCL both grew out of essentially that scenario. Larger systems needed to be configured and controlled by end users. End users wanted to edit a simple text file and get immediate satisfaction, even (especially) when working with large systems that might have required hours to compile successfully.
One advantage here is that rather than inventing a programming language one feature at a time as users' needs change, you start with a complete language along with its documentation. The language designer has already made a number of tough decisions for you (how do I represent strings and numbers, what about lists, what about named values, what does if look like, etc.) and has generally also brought a carefully designed and debugged implementation to the table.
Lua is particularly easy to integrate. Reading a simple configuration file and extracting the settings from the Lua state can be done using a small subset of its C API. Once you have Lua available, it is attractive to use it for other purposes. In many cases, you will find that it is more productive to write only the innermost loops in C, and use Lua to glue those functions together and provide all the "business logic" of the application. This is how Adobe Lightroom is implemented, as well as many games on platforms ranging from simple set-top boxes to iOS devices and even PCs.

Kernel mode programming using simplistic c++?

I am about to delve into kernel land. My question relates to the programming language. I have seen that most tutorials are written in C. I currently program in C++ and assembly. I also studied C before C++, but I didn't use it a lot. Would it be possible to program in kernel mode using simplistic C++ without using any advanced constructs? Basically, I am trying to avoid the minor differences that exist between the two languages (like no bool in C, no automatic returning of 0 from main; really minor differences). I won't be using templates, classes and the like. So would it be possible to program in kernel mode using simplistic C++ without any major annoyances?
Even if not officially supported, you can use C++ as the development language for Windows kernel development.
You should be aware of the following things:
you MUST define the new and delete operators to map to ExAllocatePoolWithTag and ExFreePool.
try to avoid virtual functions. It seems it is not possible to control the location of an object's vtable, and this may have unexpected results if it is in a pageable section and your code is called at IRQL >= DISPATCH_LEVEL.
if you still need to use virtual method tables, then lock the .rdata section before using them at IRQL >= DISPATCH_LEVEL.
Apart from these kinds of limitations, you can use C++ for your driver development.
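The first rule above can be sketched as replacement global allocation functions. To keep the sketch compilable in user mode, `pool_alloc_with_tag` and `pool_free` are hypothetical stand-ins built on `malloc`/`free`; in an actual driver their bodies would call `ExAllocatePoolWithTag(NonPagedPool, size, tag)` and `ExFreePool(p)`, and the failure path would be handled without C++ exceptions. The tag value is arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-ins for the WDK pool routines, with a live-allocation counter.
static std::size_t g_live_allocations = 0;
constexpr unsigned long kDriverPoolTag = 0x74736554;  // arbitrary example tag

void* pool_alloc_with_tag(std::size_t size, unsigned long tag) {
    (void)tag;  // a driver passes the tag through so leaks can be traced per tag
    void* p = std::malloc(size != 0 ? size : 1);
    if (p != nullptr) ++g_live_allocations;
    return p;
}

void pool_free(void* p) {
    if (p == nullptr) return;
    --g_live_allocations;
    std::free(p);
}

// Replacement global allocation functions: new/delete (and, by default, new[]/delete[])
// now go through the pool routines above.
void* operator new(std::size_t size) {
    if (void* p = pool_alloc_with_tag(size, kDriverPoolTag)) return p;
    throw std::bad_alloc();  // user-mode only; kernel code has no C++ exceptions
}

void operator delete(void* p) noexcept { pool_free(p); }
void operator delete(void* p, std::size_t) noexcept { pool_free(p); }
```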
Here are two links if you want to do C++ in the WDK; it's a one-time setup effort.
The NT Insider:Guest Article: C++ in an NT Driver
The NT Insider:Global Relief Effort - C++ Runtime Support for the NT DDK
I have seen kernel code use lots of auto-locks/smart pointers; although they make the code neat, I feel they have a learning curve for beginners to fully understand, and if abused, lots of construction/destruction code slows things down.
If you write your code carefully, knowing what exactly stands behind each definition, operator, call, etc, then there should be no problem writing kernel code in C++. The Microsoft document mentioned in the comments above is a good reading precisely because it describes situations in which C++ isn't as transparent as C or doesn't provide similar important guarantees and from that you know what to avoid.
Microsoft has written a guide. Basically they tell us to steer clear of anything but using C++'s relaxed rules of variable declarations...sigh. Anything else and you're on your own. Anyway it can't be all that bad but here are some examples of what you need to remember:
Memory allocated in the paged pool can get paged out. If you try to access it when IRQL is above PASSIVE_LEVEL you're screwed (or at least you will be every once in a while when your customer complains about your driver BSODding their system)! Test your driver on a low memory system under load!
The non-paged pool is limited, you most likely cannot allocate all your needs from it.
The stack is much smaller than in user mode (~12-24 KB).
Anything you do involving floating point in the kernel must be protected by KeSaveFloatingPointState and KeRestoreFloatingPointState.
C++ exceptions: not supported.
Read the guide for more. Now if you can make sure that the generated code follows the rules, go ahead and use C++.

Is it possible to implement a small Disk OS in C or C++?

I am not trying to do any such thing, but I was wondering out of curiosity whether one could implement an "entire OS" (not necessarily something big like Linux or Microsoft Windows, but more like a small DOS-like operating system) in C and/or C++ using no or little assembly.
By implementing an OS , I mean making an OS from scratch starting the boot-loader and the kernel to the graphics drivers (and optionally GUI) in C or C++. I have seen a few low-level things done in C++ by accessing low-level features through the compiler. Can this be done for an entire OS?
I am not asking whether it is a good idea, I am just asking whether it is even remotely possible?
Obligatory link to the OSDev wiki, which describes most of the steps needed to create an OS as described on x86/x64.
To answer your question, it is gonna be extremely difficult/unpleasant to create the boot loader and start protected mode without resorting to at least some assembly, though it can be kept to a minimum (especially if you're not really counting stuff like using __asm__ ( "lidt %0\n" : : "m" (*idt) ); as 'assembly').
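As an illustration of how much of this can stay in C/C++, the table that the `lidt` example loads can be built entirely in C++; only the instruction itself needs inline assembly. This sketch assumes x86-64, the 16-byte interrupt-gate layout from the Intel/AMD manuals, and GCC/Clang's `__attribute__((packed))`; the struct and function names are hypothetical, and `lidt` (a privileged instruction) appears only in a comment.

```cpp
#include <cassert>
#include <cstdint>

// One x86-64 IDT gate descriptor (16 bytes).
struct __attribute__((packed)) IdtGate64 {
    std::uint16_t offset_low;   // handler address bits 0..15
    std::uint16_t selector;     // code segment selector
    std::uint8_t  ist;          // interrupt stack table index (0 = none)
    std::uint8_t  type_attr;    // present bit, DPL, gate type
    std::uint16_t offset_mid;   // handler address bits 16..31
    std::uint32_t offset_high;  // handler address bits 32..63
    std::uint32_t reserved;     // must be zero
};
static_assert(sizeof(IdtGate64) == 16, "gate must be 16 bytes");

// The memory operand read by lidt: 16-bit limit followed by 64-bit base.
struct __attribute__((packed)) IdtPointer {
    std::uint16_t limit;  // table size in bytes minus one
    std::uint64_t base;   // address of the first gate
};
static_assert(sizeof(IdtPointer) == 10, "IDTR operand must be 10 bytes");

constexpr std::uint8_t kInterruptGate = 0x8E;  // P=1, DPL=0, type 0xE

IdtGate64 make_interrupt_gate(std::uint64_t handler, std::uint16_t selector) {
    IdtGate64 g{};
    g.offset_low  = static_cast<std::uint16_t>(handler & 0xFFFF);
    g.selector    = selector;
    g.type_attr   = kInterruptGate;
    g.offset_mid  = static_cast<std::uint16_t>((handler >> 16) & 0xFFFF);
    g.offset_high = static_cast<std::uint32_t>(handler >> 32);
    return g;
}

IdtPointer make_idt_pointer(const IdtGate64* table, std::uint16_t entries) {
    IdtPointer p{};
    p.limit = static_cast<std::uint16_t>(entries * sizeof(IdtGate64) - 1);
    p.base  = reinterpret_cast<std::uintptr_t>(table);
    return p;
}
// In a kernel: __asm__ volatile("lidt %0" : : "m"(idt_ptr));
```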
A big hurdle (again on x86) is that the processor starts in 16-bit real mode, so you need some 16-bit code. According to this discussion you can have GCC generate 16-bit code, but you would still need some way to set up memory, load code from some storage media and so on, all of which requires interfacing with the hardware in ways that standard C just has no concept of (interrupts, I/O ports etc.).
For architectures which communicate with hardware solely through memory mapped IO you could probably get away with writing everything except the C start-up code (that sets up the stack, initializes variables and so on) in pure C, though specific requirements of interrupt routines / exception or syscall gates etc. may be difficult to impossible to implement (as you have to access special CPU registers).
I assume that you have an OS for x86 in mind. In that case you need at least a few pages of assembler to set up protected mode and stuff like that, and besides that a lot of knowledge of all the stuff like paging, call gates, rings, exceptions, etc. If you are going to use a form of system calls you'll also need some lines of assembly code to switch between kernel and userspace mode.
Besides those things the rest of an OS can easily be programmed in C. For C++ you'll need a runtime environment to support things like virtual members and exceptions, but as far as I know that can all be programmed in C.
Just take a look at Linux kernel source, the most important assembler code (for x86) can be found in arch/x86/boot, but you'll notice that even in that directory most files are written in C. Furthermore you'll find a few assembly lines in the arch/x86/kernel directory for handling system calls and stuff like that.
Outside the arch directory there is hardly any assembler used (because assembler is machine specific, that machine specific code belongs in the arch directory). Even graphic drivers don't use assembler (e.g. nvidia driver in drivers/gpu/drm/nouveau).
A boot loader? You might want to skip that bit. For instance, Linux is quite often started by non-Linux boot loaders such as U-Boot. After all, once the system is running, the OS will be present but the boot loader won't; it's just there to get the OS proper into memory.
And once you've selected a decent existing bootloader, the remainder is pretty much all straightforward. Well, you have to deal with memory and files yourself; you can't rely on fopen obviously. But even a C++ compiler has little problem generating code that can run without OS support.

A boot loader in C++

I have messed around a few times making a small assembly boot loader on a floppy disk, and I was wondering if it's possible to make a boot loader in C++, and if so, where I might begin? I'm not even sure it would use int main().
Thanks for any help.
If you're writing a boot loader, you're essentially starting from nothing: a small chunk of code is loaded into memory, and executed. You can write the majority of your boot loader in C++, but you will need to bootstrap your own C++ runtime environment first.
Assembly is really the only option for the first stage, as you need to set up a sensible environment for running anything higher-level. Doing enough to run C code is fairly straightforward -- you need:
code and data loaded in the right place;
there may be an additional part of the data area which must be zero-initialised;
you need to point the stack pointer at a suitable area of memory for the stack.
Then you can jump into the code at an appropriate point (e.g. main()) and expect that the basic language features will work. (It's possible that any features of the standard library that may have been implemented or linked in might require additional initialisation at this stage.)
Getting a suitable environment going for C++ requires more effort, as it needs more initialisation here, and also has core language features which require runtime support (again, this is before considering library features). These include:
running static constructors;
memory allocation to support new and delete;
support for run-time type information (RTTI);
support for exceptions;
probably some other things I've forgotten to mention.
None of these are required until the C environment is up and running, so the code that handles these can be written in C rather than assembler (or even in a subset of C++ that does not make use of the above features).
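For the "memory allocation to support new and delete" item, early boot code often starts with something as simple as a bump allocator over a fixed arena, which a freestanding `operator new` could forward to. The class below is a hypothetical sketch: it never frees memory, which is usually acceptable before a real memory manager exists, and it returns `nullptr` rather than throwing when the arena is exhausted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hands out memory from a fixed arena by advancing an offset; nothing is ever freed.
class BumpArena {
public:
    BumpArena(unsigned char* base, std::size_t size) : base_(base), size_(size), used_(0) {}

    void* allocate(std::size_t bytes, std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);  // alignment must be a power of two
        std::uintptr_t start = reinterpret_cast<std::uintptr_t>(base_) + used_;
        std::uintptr_t aligned = (start + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
        std::size_t offset = static_cast<std::size_t>(aligned - reinterpret_cast<std::uintptr_t>(base_));
        if (offset > size_ || bytes > size_ - offset) return nullptr;  // arena exhausted
        used_ = offset + bytes;
        return base_ + offset;
    }

    std::size_t used() const { return used_; }

private:
    unsigned char* base_;
    std::size_t size_;
    std::size_t used_;
};
```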
(The same principles apply in embedded systems, and it's not uncommon for such systems to make use of C++, but only in a limited way -- e.g. no exceptions and/or RTTI because the runtime support isn't implemented.)
It's been a while since I played with writing bootloaders, so I'm going off memory.
For an x86 bootloader, you need to have a C++ compiler that can emit x86 assembly, or, at the very least, you need to write your own preamble in 16-bit assembly that will put the CPU into 32-bit protected (or 64-bit long) mode, before you can call your C++ functions.
Once you've done that, though, you should be able to make use of most, if not all, of C++'s language features, so long as you stay away from things that require an underlying libc. But statically link everything without the CRT and you're golden.
Bootloaders don't have "int main()"s, unless you write assembly code to call it.
If you are writing a stage 1 bootloader, then it is seriously discouraged.
Otherwise, osdev.org has great documentation on the topic.
While it is probably possible to make a bootloader in C++, remember not to link your code to any dynamic libraries, and remember that just because it is C++, that doesn't mean you can/should use the STL, etc.
Yes, it is possible. You have elements of an answer and useful links in this question.
You also can have a look here, there is a C++ bootloader example.
The main thing to understand is that you need to create a flat binary instead of the usual fancy executable file formats (PE on Windows, or ELF on Unixes), because these file formats need an OS to load them, and in a boot loader you don't have an OS yet.
Using library is not a problem if you link statically (no dynamic link because again of the above executable problem). But obviously all OS API related entry points are not available...
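Once a flat binary exists (produced, for example, with `objcopy -O binary`), a legacy-BIOS boot sector is just that code padded to 512 bytes with the `0x55 0xAA` signature at offsets 510 and 511. The helper below is a hypothetical build-tool sketch of that last step; it assumes legacy BIOS boot (not UEFI) and that the code was already linked to run at address 0x7C00, where the BIOS loads the sector.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Pads flat boot code to one 512-byte sector and appends the BIOS boot signature.
std::array<std::uint8_t, 512> make_boot_sector(const std::vector<std::uint8_t>& flat_code) {
    constexpr std::size_t kMaxCode = 510;  // the last two bytes hold the signature
    if (flat_code.size() > kMaxCode) throw std::length_error("boot code exceeds 510 bytes");
    std::array<std::uint8_t, 512> sector{};  // zero-filled padding
    for (std::size_t i = 0; i < flat_code.size(); ++i) sector[i] = flat_code[i];
    sector[510] = 0x55;
    sector[511] = 0xAA;
    return sector;
}
```

Writing the returned array to the first sector of a disk image would give the BIOS something it is willing to jump to.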