Writing functions to binary files? - c++

Something that made me fairly curious: since it's possible in C++ to pass a function as an argument under the right circumstances, that would suggest that the code behind that function can be pointed to, and therefore written out to and read back from a binary file as executable code.
This is obviously coming from someone who, while having a fairly strong background in C++, isn't familiar with the internals of how memory is managed on the heap, and especially how executable machine code fits into the picture.
I assume that since it's possible to pass around a reference to a function, it's possible to get the data it points to and write it somewhere. I don't know.
Anyone want to tell me if this is possible? If so, can you give an example? If not, please tell me why! I love learning more in-depth about how C++ actually works internally.

Twenty years ago your suggestion would have been fresh and usable. People saved memory by loading code from a file on demand, calling it, then unloading it. That technique was called overlays. To a certain level it IS still usable, but only in a form that is standardized by the platform and managed through the platform's API.
The mechanism behind shared libraries (.so on POSIX systems, .dll on Windows) is that the library file contains a table of where certain functions are and what their names are, as well as data about how the stack and data segments should be initialized. This can be done automatically by the system when the program is loaded. Otherwise you can load the library manually and obtain a pointer to a function. On Windows that would be LoadLibrary() and GetProcAddress(); on Linux, dlopen() and dlsym().
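For illustration, here is a minimal sketch of that manual-loading path on Windows; the library name example.dll and the exported function add are hypothetical placeholders.
#include <windows.h>
#include <cstdio>

int main()
{
    // Load the library at run time rather than at program load time.
    HMODULE lib = LoadLibraryA("example.dll");   // hypothetical library
    if (!lib) return 1;

    // Look up an exported function by name and cast it to its real type.
    typedef int (*add_fn)(int, int);
    add_fn add = reinterpret_cast<add_fn>(GetProcAddress(lib, "add"));
    if (add)
        std::printf("%d\n", add(2, 3));

    FreeLibrary(lib);
    return 0;
}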
The reason this isn't possible today in a high-level language is security: protection from malicious code being executed out of a data segment. The run-time library and the OS loader usually handle this.
It is still possible using assembler, but you will be fighting antivirus software and system security measures. With careful programming you could, I suppose, create your own "linker", and with it build and load your own library format.

No, you can't really do this. There are a whole lot of reasons, but here's a simple and intuitive one: functions may call other functions. If you were able to write a function to disk, and restore it, this would not account for its dependencies (functions it calls, global variables it updates, etc.). It won't work.
If you want to read functions from disk, it is better to express them in a scripting language like Lua. This is a proven solution which is used in many commercial products such as video games and Adobe Lightroom.
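For example, a minimal sketch of loading a function from a Lua file and calling it from C++; the file name funcs.lua and the function name compute are placeholders, and this assumes the Lua C API headers are available.
extern "C" {
#include <lua.h>
#include <lauxlib.h>
#include <lualib.h>
}
#include <cstdio>

int main()
{
    lua_State* L = luaL_newstate();
    luaL_openlibs(L);

    // "Read a function from disk" by loading and running the script file.
    if (luaL_dofile(L, "funcs.lua") != 0) return 1;

    // Fetch the function by name and call it with one argument.
    lua_getglobal(L, "compute");
    lua_pushnumber(L, 4.0);
    if (lua_pcall(L, 1, 1, 0) == 0)
        std::printf("result: %f\n", lua_tonumber(L, -1));

    lua_close(L);
    return 0;
}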

While a function pointer is the entry point of a function, and that memory can be read and therefore copied, the first problem is that there is no reliable means of determining the length of that code, so you cannot determine for certain how much to copy to get the entire function and only that.
The other issue is to what practical end? Depending on the platform the code may not be relocatable and will have links to other code. The binary contains no symbolic information; the best you can do is disassemble it, but out of the context of the entire linked executable it may not be very useful to do so.
If your aim is to separate functions from the primary executable, and to be able to later load and run them, then that is what DLLs and shared libraries are for.
If you just want to observe the binary code of a function, that is best done in a debugger - it will have a disassembly view that shows the raw binary (in hexadecimal), the assembly code with symbols, and the corresponding C source. This makes a lot more sense if your aim is merely to investigate how source code relates to machine code.
Below is how you could possibly do what you are asking - even though there is no practical reason for doing it. It makes assumptions about the behaviour of the compiler that may not be valid in some cases. It assumes, for example, that the compiler will place adjacent functions contiguously in memory and at increasing addresses, so that function2 is immediately after function1 in memory. Here function2 serves only as an end marker for function1 and may be a dummy.
#include <cstdio>
#include <cstddef>

int function1()
{
    // ... body of the function being dumped ...
    return 0;
}

// function2 exists only as an end marker; it is assumed to be placed
// immediately after function1 in memory (not guaranteed by the standard).
void function2()
{
}

int main()
{
    // Casting function pointers to object pointers is only
    // conditionally-supported; this relies on compiler behaviour.
    const char* start = reinterpret_cast<const char*>(&function1);
    const char* end   = reinterpret_cast<const char*>(&function2);
    std::ptrdiff_t function1_length = end - start;

    std::FILE* fp = std::fopen("function1.bin", "wb");
    if (fp)
    {
        std::fwrite(start, function1_length, 1, fp);
        std::fclose(fp);
    }
    return 0;
}

Related

Share code between bootloader and application

I am working on an embedded system (STM32, ARM M33). I am developing both bootloader and application code. The bootloader and application both use the same filesystem code to access external FLASH memory. Since the size of this code is NOT trivial and it won't change (at least not very often), I would like to have only one copy of it located in the MCU to be a "shared library."
I have referenced the following articles looking for a solution:
Linker script: insert absolute address of the function to the generated code
https://www.embeddedrelated.com/showthread/comp.arch.embedded/213239-1.php
Bootloader and main application to share common code/functionalities
One option is to hard-code addresses to the functions and force the linker (of the bootloader) to place these functions at those addresses. This is very hard to maintain and prone to all sorts of problems.
Option 2 is not much better. It involves exporting a list of symbols from the bootloader and linking the application against this so that my shared functions are linked directly into the bootloader's address space.
Option 3 is to locate some sort of jump table at a very specific address within the bootloader's address space (similar to an interrupt vector). The application code would then call the filesystem functions indirectly via this vector. I think I know how to accomplish something like this using a linker script and a special section in flash.
Finally, one of the articles mentioned "create a jump table or a C++ object that implements a virtual interface." Since I am using C++ for my application, using a virtual interface seems the most intriguing option to me. From my understanding, virtual methods work through two levels of indirection: the object pointer gets you to a vtable, and the vtable gets you to the actual methods. This is very similar to a C-style jump table, but with concrete language support.
My question is, how would this be implemented in practice?
At the moment, my bootloader starts executing the application code by calling the Reset ISR from the application's interrupt vector table (the same function the hardware itself would call immediately after reset). In doing this, the bootloader has no way to "pass on" information (i.e., a pointer to a virtual object) to the application.
Your first link is the right thing to do, except you should scrape the ROM map file to generate your linker/symbol definition file that you link to.
The bigger problem is ensuring that the symbols you're linking to don't reference other symbols that are no longer alive, such as static or global variables.
The first option is also the approach some semiconductor vendors use to provide ROM code. Normally, the shared software should have a stable interface, since it will mostly be impossible to change or update in the future. Therefore, it is not necessary to worry much about maintaining the shared code later.
The other options might fit some special needs, but they also increase the complexity of your software.
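To illustrate the jump-table idea from the question, here is a minimal sketch; the section name .shared_api, the fixed address 0x08004000, and the function names are all assumptions that would have to come from your linker script.
// Interface shared (as a header) by both bootloader and application.
struct FilesystemApi {
    int (*fs_read)(const char* path, void* buf, unsigned len);
    int (*fs_write)(const char* path, const void* buf, unsigned len);
};

// --- Bootloader side ----------------------------------------------------
int fs_read_impl(const char* path, void* buf, unsigned len);
int fs_write_impl(const char* path, const void* buf, unsigned len);

// Place the table in a dedicated section pinned to a known flash address
// by the bootloader's linker script.
__attribute__((section(".shared_api")))
const FilesystemApi g_fs_api = { &fs_read_impl, &fs_write_impl };

// --- Application side ---------------------------------------------------
// The application only needs the agreed address of the table, not the
// bootloader's internal symbols.
const FilesystemApi* const fs =
    reinterpret_cast<const FilesystemApi*>(0x08004000);

void app_example(char* buffer, unsigned size)
{
    fs->fs_read("/log.txt", buffer, size);   // indirect call through the table
}
A C++ object implementing a pure virtual interface works the same way, except the vtable pointer stored in the object provides the second level of indirection instead of an explicit struct of function pointers.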

hibernate-like saving state of a program

Is there any way in C++ or Java or Python that would allow me to save the state of my program, no questions asked? For example, I've spent an hour learning how to save a tree-like structure into a file. Very educational, but I feel I could just do:
saveState(file);
And the "file" would contain the whole memory my program uses, just like the operating system's "hibernate" or "suspend-to-disk" feature. I know about Boost serialization; that is probably not what I'm looking for.
What you most likely want is what we call serialization or object marshalling. There are a whole butt load of academic problems with data/object serialization that you can easily google.
That being said, given the right library (probably something very native), you could do a true snapshot of your running program, similar to what an OS-specific hibernate does. Here is an SO answer for doing that on Linux: https://stackoverflow.com/a/12190830/318174
To do the above snapshotting, though, you will most likely need an external process separate from the process you want to save. I highly recommend you don't do that. Instead, read up in your language of choice (btw welcome to SO, don't tag every language... that pisses people off) on how to do serialization or object marshalling... hint... most people these days pick JSON.
I think that what you describe would be a feature that few people would actually want to use for a real system. Usually you want to save something so it can be transmitted, or so you can stop running the program, or guard against the possibility that the program quits (or power fails).
In most production systems one wants to make the writes to disk small and incremental so that the system can remain responsive, and writing inconsistent data can be avoided. Writing ALL memory to disk on a regular basis would probably result in lots of non-responsive time. You would need to lock the entire system to avoid inconsistent state.
Writing your own persistence is tedious and error-prone, however, so you may find this SO question of interest: Persisting graph data (Java)
There are a couple of frameworks around for this. Check out Google Protocol Buffers if you need support for Java, Python, and C++: https://developers.google.com/protocol-buffers/ - I've used it in some projects and it works well.
There's also Thrift (originally from Facebook): http://thrift.apache.org/ - I don't have any experience with it, though.
Another option is what @QuentinUK suggests: use a class that inherits from something streamable and/or write streaming operators/functions.
I'd use a framework.
Here's your problem:
http://en.wikipedia.org/wiki/Address_space_layout_randomization
Back in ancient history (16-bit DOS programs with extenders), compilers used to support "based" pointers, which stored relative addresses. These were safe to serialize en masse, and applications did so, saving both code and data; the serialized modules were called "overlays".
Today, you'd need based pointer support in your toolchain (resulting in every pointer access requiring an extra adjustment), or else to go through all the data, distinguishing the pointers from the other data (how?) and adjusting them to their new storage location, in case the OS already loaded some library at the same address your old program had used for its heap. In modern "managed" environments, where pointers already have to be identified for the garbage collector, this is feasible even if not commonly done. In native code, it's very difficult, although that metadata is created to enable relocation of shared libraries.
So instead people end up walking their entire data structures manually, and converting object links (pointers) into something that can be restored on the other end, even though the object has a new address (again, because the old address may have been used for a shared library).
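As a concrete illustration of that last point, here is a minimal sketch that writes a node's link as an index rather than a raw pointer; the Node layout and the text format are made up for the example.
#include <cstdio>
#include <unordered_map>
#include <vector>

struct Node {
    int value;
    Node* next;   // pointer that cannot be written to disk as-is
};

void save(const std::vector<Node*>& nodes, std::FILE* fp)
{
    // First pass: give every node a stable index.
    std::unordered_map<const Node*, int> index;
    for (std::size_t i = 0; i < nodes.size(); ++i)
        index[nodes[i]] = static_cast<int>(i);

    // Second pass: write the value and the *index* of the linked node
    // (-1 for null) instead of its raw address.
    for (const Node* n : nodes) {
        int next_index = n->next ? index[n->next] : -1;
        std::fprintf(fp, "%d %d\n", n->value, next_index);
    }
}
On load, the indices are read back and turned into pointers once all nodes have been allocated at their new addresses.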
Note that many processors have features to support based addressing... and that since based addressing is no longer common, compilers went ahead and used those pointer arithmetic features to speed up user code.
Yes, derive objects from a streamable class and add the streaming functions. Then you can stream everything to disk. You will need a library for this such as MFC.
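A framework-free sketch of that pattern (this is not MFC's actual API, just the general shape): a base class declares virtual save/load functions and each derived class streams its own members.
#include <fstream>
#include <iostream>
#include <string>

// Minimal "streamable" base class: every serializable object implements
// save() and load() against a std::ostream / std::istream.
class Streamable {
public:
    virtual ~Streamable() = default;
    virtual void save(std::ostream& out) const = 0;
    virtual void load(std::istream& in) = 0;
};

class Player : public Streamable {
public:
    std::string name;
    int score = 0;

    void save(std::ostream& out) const override {
        out << name << '\n' << score << '\n';
    }
    void load(std::istream& in) override {
        std::getline(in, name);
        in >> score;
        in.ignore();
    }
};

int main()
{
    Player p;
    p.name = "alice";
    p.score = 42;

    std::ofstream out("state.txt");
    p.save(out);                     // stream the object's state to disk
}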

Calling external files (e.g. executables) in C++ in a cross-platform way

I know many have asked this question before, but as far as I can see, there's no clear answer that helps C++ beginners. So, here's my question (or request if you like),
Say I'm writing C++ code using Xcode or any text editor, and I want to use some of the tools provided in another C++ program. For instance, an executable. So, how can I call that executable file in my code?
Also, can I exploit other functions/objects/classes provided in a C++ program and use them in my C++ code via this calling technique? Or is it just executables that I can call?
I hope someone could provide a clear answer that beginners can absorb.. :p
So, how can I call that executable file in my code?
The easiest way is to use system(). For example, if the executable is called tool, then:
system( "tool" );
However, there are a lot of caveats with this technique. This call just asks the operating system to do something, but each operating system can understand or answer the same command differently.
For example:
system( "pause" );
...will work on Windows, stopping the execution, but not in other operating systems. Also, the rules regarding spaces inside the path to the file are different. Finally, even the path separator can be different ('\' is for Windows only).
And can I also exploit other functions/objects/classes... from a C++ program and use them in my C++ code via this calling technique?
Not really. If you want to use classes or functions created by others, you will have to get the source code for them and compile it with your program. This is probably one of the easiest ways to do it, provided the source code is small enough.
Many times, people create libraries, which are collections of useful classes and/or functions. If the library is distributed in binary form, then you'll need the DLL file (or the equivalent for other OSes), and a header file describing the classes and functions provided by the library. This is a rich source of frustration for C++ programmers, since even libraries created with different compilers on the same operating system are potentially incompatible. That's why libraries are often distributed in source code form, with a set of instructions (a makefile, or worse) for building a binary version in a single file, plus a header file, as described before.
This is because the C++ standard does not specify the low-level details of what happens inside a compiler. There are lots of implementation details that were deliberately left for compiler vendors to handle as they see fit, possibly to achieve better performance. This unfortunately means that it is difficult to distribute even a simple library.
You can call another program easily - this will start an entirely separate copy of the program. See the system() or exec() family of calls.
This is common on Unix, where there are lots of small programs which take an input stream of text, do something, and write the output to the next program. Using these you could sort or search a set of data without having to write any more code.
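For instance, here is a minimal POSIX-only sketch of the exec() family mentioned above; the program name "tool" and its argument are placeholders.
#include <unistd.h>
#include <sys/wait.h>

int run_tool()
{
    pid_t pid = fork();
    if (pid == 0) {
        // Child: replace this process image with the external program.
        execlp("tool", "tool", "input.txt", (char*)nullptr);
        _exit(127);                 // only reached if exec failed
    }
    if (pid < 0) return -1;         // fork failed

    int status = 0;
    waitpid(pid, &status, 0);       // parent: wait for the child to finish
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}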
On Windows it's easy to start the default application for a file automatically, so you could write a PDF file and start the default app for viewing it. What is harder on Windows is to control a separate GUI program - unless the program has been deliberately written to allow remote control (e.g. with COM/OLE on Windows), you can't control anything the user does in that program.

Writing dynamically loadable components in c++

I'm currently working on a program which should perform calculations on a home brewed data structure.
I want to build it in a way that it would be easy to add supported calculations (say, as source files which conform to a predetermined structure).
The problem is that I don't want to load all calculations in advance, because there might be a lot of them.
The only mechanism I found which supports dynamic loading of functionality is dlopen, which expects .so files, so in this context, using dlopen means compiling a separate so file for every group of computations.
While I don't see any inherent problem with this design, my spider senses tell me I should verify with the all-knowing-web that it's not utterly stupid. If there are any other suggested ways to do so I'd be glad to hear.
Using dlopen() is the most widely used way to load executable code dynamically in an application on POSIX-compatible operating systems. It allows using a modular architecture where optional or rarely used code is only loaded on-demand, which sounds pretty much like what you need.
I would certainly use this method - if after some time you find that the shared object compilation step is becoming a hurdle, you can build additional dynamically loaded modules to support e.g. an interpreted language such as Lua or Python. This would allow you to keep your existing codebase without losing extensibility.
Seems like a good approach.
A good way to do this is to declare an abstract (pure virtual) class in C++, say Calculator, with all the methods and accessors you need to perform a calculation. Then, have your separate dynamic libraries or .so files implement a global function Calculator* create_calculator() that creates an instance of a class deriving from Calculator. Finally, you'll have to devise a registration mechanism so that your main program can determine the name of the dynamic library to load, based on some kind of identifier like a string, enum, or UUID. This would typically come from an easily editable configuration file.
#include <dlfcn.h>

typedef Calculator* (*create_calculator_fn)();

/* open the needed shared object file */
char* libName = get_lib_name_from_config(identifier);
void* handle = dlopen(libName, RTLD_LOCAL | RTLD_LAZY);

/* find the address of the create_calculator factory function */
create_calculator_fn create_calculator =
    reinterpret_cast<create_calculator_fn>(dlsym(handle, "create_calculator"));

Calculator* calc = create_calculator();
This scheme can be made more flexible (and complex) by allowing the create_calculator method name to vary, at the cost of having to obtain that from the config file as well.
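For completeness, here is a sketch of what one loadable module might contain, assuming the abstract Calculator interface described above; the class names and the computation are illustrative only.
// calculator_mean.cpp -- built into its own shared object, e.g. libmeancalc.so
class Calculator {                       // normally shared via a common header
public:
    virtual ~Calculator() = default;
    virtual double compute(const double* data, int n) = 0;
};

class MeanCalculator : public Calculator {
public:
    double compute(const double* data, int n) override {
        double sum = 0.0;
        for (int i = 0; i < n; ++i) sum += data[i];
        return n > 0 ? sum / n : 0.0;
    }
};

// Exported with C linkage so dlsym() can find it by its unmangled name.
extern "C" Calculator* create_calculator() {
    return new MeanCalculator();
}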
Opening shared libraries using dlopen() is certainly the first thing that comes to my mind; it's a fine plan.

A boot loader in C++

I have messed around a few times making a small assembly boot loader on a floppy disk, and was wondering if it's possible to make a boot loader in C++, and if so, where might I begin? For all I know, I'm not sure it would even use int main().
Thanks for any help.
If you're writing a boot loader, you're essentially starting from nothing: a small chunk of code is loaded into memory, and executed. You can write the majority of your boot loader in C++, but you will need to bootstrap your own C++ runtime environment first.
Assembly is really the only option for the first stage, as you need to set up a sensible environment for running anything higher-level. Doing enough to run C code is fairly straightforward -- you need:
code and data loaded at the right addresses;
any zero-initialised part of the data area (typically the BSS) cleared;
the stack pointer pointing at a suitable area of memory for the stack.
Then you can jump into the code at an appropriate point (e.g. main()) and expect that the basic language features will work. (It's possible that any features of the standard library that may have been implemented or linked in might require additional initialisation at this stage.)
Getting a suitable environment going for C++ requires more effort, as it needs more initialisation here, and also has core language features which require runtime support (again, this is before considering library features). These include:
running static constructors;
memory allocation to support new and delete;
support for run-time type information (RTTI);
support for exceptions;
probably some other things I've forgotten to mention.
None of these are required until the C environment is up and running, so the code that handles these can be written in C rather than assembler (or even in a subset of C++ that does not make use of the above features).
(The same principles apply in embedded systems, and it's not uncommon for such systems to make use of C++, but only in a limited way -- e.g. no exceptions and/or RTTI because the runtime support isn't implemented.)
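For example, a hedged sketch of the static-constructor step only; the symbol names __init_array_start/__init_array_end and kmain depend on your linker script and toolchain and are assumptions here.
// Freestanding C++ entry point (no OS, no CRT).
typedef void (*ctor_fn)();
extern ctor_fn __init_array_start[];   // provided by the linker script
extern ctor_fn __init_array_end[];

extern "C" int kmain();                // your C++ "main"

extern "C" void _start()
{
    // Run global/static constructors before any C++ code that relies on them.
    for (ctor_fn* p = __init_array_start; p != __init_array_end; ++p)
        (*p)();

    kmain();

    for (;;) { }                       // no OS to return to
}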
It's been a while since I played with writing bootloaders, so I'm going off memory.
For an x86 bootloader, you need to have a C++ compiler that can emit x86 assembly, or, at the very least, you need to write your own preamble in 16-bit assembly that will put the CPU into 32-bit protected (or 64-bit long) mode, before you can call your C++ functions.
Once you've done that, though, you should be able to make use of most, if not all, of C++'s language features, so long as you stay away from things that require an underlying libc. But statically link everything without the CRT and you're golden.
Bootloaders don't have "int main()"s, unless you write assembly code to call it.
If you are writing a stage 1 bootloader, then it is seriously discouraged.
Otherwise, osdev.org has great documentation on the topic.
While it is probably possible to make a bootloader in C++, remember not to link your code to any dynamic libraries, and remember that just because it is C++, that doesn't mean you can/should use the STL, etc.
Yes, it is possible. You can find elements of an answer and useful links in this question.
You can also have a look here; there is a C++ bootloader example.
The main thing to understand is that you need to create a flat binary instead of the usual fancy executable file formats (PE on Windows, or ELF on Unixes), because those file formats need an OS to load them, and in a boot loader you don't have an OS yet.
Using a library is not a problem if you link statically (no dynamic linking, again because of the executable format problem above). But obviously all OS-API-related entry points are not available...