The way I understand it, segmented stacks are built with compiler support so that whenever a function running on the segmented stack calls another function, it first checks whether the stack has enough space for the new function's stack frame. If it doesn't, another stack segment is attached and the code branches to that function.
But does this work if, for example, I have a fiber running and I call a function from another library (shared, or compiled into a non-shared object file) that was not compiled with the -fsplit-stack option? How do the functions in that library know that they have to check whether the segmented stack has enough space to continue?
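To make the check described above concrete, here is a toy model of it. This is only a sketch of the idea, not what GCC actually emits: the names SplitStack, Segment, enter and leave are made up for illustration, and a byte counter stands in for the real stack pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a segmented stack: each segment has a fixed capacity and
// a "used" counter that stands in for the stack pointer.
struct Segment {
    std::size_t capacity;
    std::size_t used;
};

struct SplitStack {
    std::vector<Segment> segments;
    std::size_t segment_size;  // hypothetical default size of a new segment

    explicit SplitStack(std::size_t seg_size) : segment_size(seg_size) {
        segments.push_back({seg_size, 0});
    }

    // Models the check performed at function entry.
    // Returns true if a new segment had to be attached.
    bool enter(std::size_t frame_size) {
        Segment& top = segments.back();
        if (top.capacity - top.used >= frame_size) {
            top.used += frame_size;
            return false;
        }
        // Not enough room: attach a segment large enough for this frame.
        std::size_t cap = frame_size > segment_size ? frame_size : segment_size;
        segments.push_back({cap, frame_size});
        return true;
    }

    // Models function return: pop the frame and release an emptied segment.
    void leave(std::size_t frame_size) {
        Segment& top = segments.back();
        assert(top.used >= frame_size);
        top.used -= frame_size;
        if (top.used == 0 && segments.size() > 1) segments.pop_back();
    }
};
```

In this model a function that was not compiled with the check simply never calls enter(), which is exactly the situation the question is about.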
Only interested in clang and gcc implementations (and in particular with boost context), thanks!
I'm going to fall back on a piece of documentation that I remember seeing in an earlier question on this subject:
Backward compatibility
We want to be able to use split stack programs on systems with prebuilt libraries compiled without split stacks. This means that we need to ensure that there is sufficient stack space before calling any such function.
Each object file compiled in split stack mode will be annotated to indicate that the functions use split stacks. This should probably be annotated with a note but there is no general support for creating arbitrary notes in GNU as. Therefore, each object file compiled in split stack mode will have an empty section with a special name: .note.GNU-split-stack. If an object file compiled in split stack mode includes some functions with the no_split_stack attribute, then the object file will also have a .note.GNU-no-split-stack section. This will tell the linker that some functions may not have the expected split stack prologue.
[...]
For calls from split-stack code to non-split-stack code, the linker will change the initial instructions in the split-stack (caller) function. This means that the linker will have to have special knowledge of the instructions that the compiler emits. The effect of the changes will be to increase the required framesize by a number large enough to reasonably work for a non-split-stack. This will be a target dependent number; the default will be something like 64K. Note that this large stack will be released when the split-stack function returns. Note that I'm disregarding the case of split-stack code in a shared library calling non-split-stack code in the main executable; that seems like an unlikely problem.
I specifically remember the last (italicized) caveat; I don't remember whether it was me who highlighted it or someone else. The keyword in that discussion was "callbacks".
I am working on an embedded system (STM32, ARM M33). I am developing both bootloader and application code. The bootloader and application both use the same filesystem code to access external FLASH memory. Since the size of this code is NOT trivial and it won't change (at least not very often), I would like to have only one copy of it located in the MCU to be a "shared library."
I have referenced the following articles looking for a solution:
Linker script: insert absolute address of the function to the generated code
https://www.embeddedrelated.com/showthread/comp.arch.embedded/213239-1.php
Bootloader and main application to share common code/functionalities
One option is to hard-code addresses to the functions and force the linker (of the bootloader) to place these functions at those addresses. This is very hard to maintain and prone to all sorts of problems.
Option 2 is not much better. It involves exporting a list of symbols from the bootloader and linking the application against this so that my shared functions are linked directly into the bootloader's address space.
Option 3 is to locate some sort of jump table at a very specific address within the bootloader's address space (similar to an interrupt vector). The application code would then call the filesystem functions indirectly via this vector. I think I know how to accomplish something like this using a linker script and a special section in flash.
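A rough sketch of what such a jump table could look like follows. Everything named here (FsApi, kFsApiMagic, fs_api_at, the stub functions, the ".fs_api" section name) is hypothetical, and the linker-script part that pins the table to a fixed flash address is not shown. The magic and version fields are an assumption added so the application can refuse a missing or incompatible table.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical API table exported by the bootloader. On the target it would
// be placed at a fixed flash address by the linker script (not shown).
struct FsApi {
    std::uint32_t magic;    // lets the application verify the table is present
    std::uint32_t version;  // bump when the layout changes
    int (*read)(std::uint32_t addr, std::uint8_t* buf, std::uint32_t len);
    int (*write)(std::uint32_t addr, const std::uint8_t* buf, std::uint32_t len);
};

constexpr std::uint32_t kFsApiMagic = 0x46534150u;  // arbitrary marker value

// --- bootloader side (stub implementations, for illustration only) ---
static int boot_fs_read(std::uint32_t, std::uint8_t* buf, std::uint32_t len) {
    for (std::uint32_t i = 0; i < len; ++i) buf[i] = 0xFF;  // pretend erased flash
    return static_cast<int>(len);
}

static int boot_fs_write(std::uint32_t, const std::uint8_t*, std::uint32_t len) {
    return static_cast<int>(len);
}

// With GCC/Clang this definition would typically also carry something like
// __attribute__((section(".fs_api"), used)); the section name is made up.
const FsApi g_fs_api = {kFsApiMagic, 1, boot_fs_read, boot_fs_write};

// --- application side ---
// Interprets a known address as the table and validates it before use.
inline const FsApi* fs_api_at(std::uintptr_t address) {
    const FsApi* api = reinterpret_cast<const FsApi*>(address);
    return (api->magic == kFsApiMagic && api->version == 1) ? api : nullptr;
}
```

On the target, the application would pass the agreed flash address to fs_api_at() and then call api->read(...) and api->write(...) through the returned pointer.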
Finally, one of the articles mentioned "create a jump table or a C++ object that implements a virtual interface." Since I am using C++ for my application, the virtual interface seems the most intriguing option to me. From my understanding, virtual methods work through two levels of indirection: the object pointer gets you to a vtable, then the vtable gets you to the actual methods. This is very similar to a C-style jump table but with concrete language support.
My question is, how would this be implemented in practice?
At the moment, my bootloader starts executing the application code by calling the Reset ISR from the application's interrupt vector table (the same function the hardware itself would call immediately after reset). In doing this, the bootloader has no way to "pass on" information (i.e., a pointer to a virtual object) to the application.
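One possible shape for the virtual-interface idea is sketched below. It assumes both images are built with the same compiler and C++ ABI so that they agree on the vtable layout. IFileSystem, BootFileSystem, g_fs_mailbox and the two helper functions are all hypothetical names. The "mailbox" is an ordinary global here; on the target it would be a word at an agreed address (for example RAM reserved in both linker scripts) that the bootloader fills in before jumping to the application.

```cpp
#include <cassert>
#include <cstdint>

// Abstract interface, shared by both images through a common header.
class IFileSystem {
public:
    virtual int read(std::uint32_t addr, std::uint8_t* buf, std::uint32_t len) = 0;
    virtual int write(std::uint32_t addr, const std::uint8_t* buf, std::uint32_t len) = 0;

protected:
    ~IFileSystem() = default;  // objects are never deleted through the interface
};

// --- bootloader side (stub implementation, for illustration only) ---
class BootFileSystem final : public IFileSystem {
public:
    int read(std::uint32_t, std::uint8_t* buf, std::uint32_t len) override {
        for (std::uint32_t i = 0; i < len; ++i) buf[i] = 0xFF;  // pretend erased flash
        return static_cast<int>(len);
    }
    int write(std::uint32_t, const std::uint8_t*, std::uint32_t len) override {
        return static_cast<int>(len);
    }
};

static BootFileSystem g_boot_fs;

// Stand-in for a pointer-sized slot at an address both images agree on.
static IFileSystem* g_fs_mailbox = nullptr;

// Called by the bootloader just before it jumps to the application.
void bootloader_publish() { g_fs_mailbox = &g_boot_fs; }

// --- application side ---
// Returns nullptr if the bootloader never published an object.
IFileSystem* application_get_fs() { return g_fs_mailbox; }
```

The application then calls application_get_fs()->read(...) as with any other C++ object. The object and its vtable both live in the bootloader's image, so the concern about the shared code depending on state that is no longer alive applies here too.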
Your first link is the right thing to do, except you should scrape the ROM map file to generate your linker/symbol definition file that you link to.
The bigger problem is ensuring the symbols you're linking to aren't referencing other symbols that aren't alive any more like static or global variables.
The first option is also the approach some semiconductor vendors use to provide ROM code. Normally, the shared software should have a stable interface, since for the most part it cannot be changed or updated later. Therefore, future maintenance of the shared code should not be a major concern.
The other options might fit some special needs. However, they might increase the complexity of your software.
I watched some videos on YouTube where the bytes for C++ or C# code get hardcoded in an unsigned char* and then get injected into memory and executed.
How can I do that with my own source code? I only found a way to inject the bytes from an exe, in a somewhat complicated way that caused me problems when executing.
I also found this page, where they use some kind of pentesting tool to generate executable code (bytes) that can simply be injected into memory.
https://www.ired.team/offensive-security/code-execution/using-msbuild-to-execute-shellcode-in-c
In short: give up until you understand enough assembly language to inject assembly code. Blindly copying executable code won't work.
Compiling C++ or C# produces machine code which:
1) May contain external references. A function may call other functions, use global variables, etc. Even if you don't explicitly do this, the language may call into its runtime. At program load time this is fixed up by having all statically imported objects in the executable and by loading dynamically imported modules.
2) Isn't necessarily position independent. That is, it may not behave well at another memory location. It may contain absolute references to itself that need to be adjusted, or relative external references that also need to be adjusted. At program load time this is fixed by processing the relocation table.
3) Actually a specific case of 1, but can be viewed separately: besides code and data, an executable contains some annotations to the code, most notably exception handlers. Without the exception handlers, the code may not execute as expected either.
That is, arbitrarily copied bytes of an executable may or may not work in another location. If you try to copy an entire program, it most likely will not work.
For tricks like injecting code, one would use assembly or machine code, not high-level languages. Sorry.
To get the machine code that VS generates when compiling your C++ code:
During debugging - copy or drag and drop the address from Disassembly window to Memory window.
During compilation - use /FAc option
I recently posted a question about stack segmentation and boost coroutines, but it seems like the -fsplit-stack approach only works with source files that are compiled with that flag; the runtime breaks down when you branch to a function that has not been compiled with -fsplit-stack. For example
This implies that the runtime uses a function-local technique to detect when the current stack has been exhausted, and not a "guard page signal" trick, where the end of the stack always has a guard page that raises a signal on a read or write, telling the runtime to allocate a new stack segment and branch to it.
Then what is the use of this flag? If I link to any other library that has not been built with this, code will break (even libstdc++ and libc), then how is this something people use practically with big projects?
From reading the gcc wiki about split stacks it seems like calling a non split stack function from a split stack function results in an allocation of a 64KB stack frame. Good.
But it seems like calling a non split stack function from a function pointer has not yet been implemented to follow the above scheme.
What use is this flag then? If I proceed to call any virtual function will my program break?
Further from the answer below it seems like clang has not implemented split stacks?
You have to compile both boost (at least boost.context and boost.coroutine) and your application with segmented-stacks support.
Compile boost (boost.context and boost.coroutine) with the b2 property segmented-stacks=on (this enables special code inside boost.coroutine and boost.context).
Your app has to be compiled with -DBOOST_USE_SEGMENTED_STACKS and -fsplit-stack (required by the boost.coroutine headers).
see boost.coroutine documentation
boost.coroutine contains an example that demonstrates segmented stacks (in directory coroutine/example/asymmetric/ call b2 toolset=gcc segmented-stacks=on).
regarding your last question GCC Wiki states:
For calls from split-stack code to non-split-stack code, the linker
will change the initial instructions in the split-stack (caller)
function. This means that the linker will have to have special
knowledge of the instructions that the compiler emits. The effect of
the changes will be to increase the required framesize by a number
large enough to reasonably work for a non-split-stack. This will be a
target dependent number; the default will be something like 64K. Note
that this large stack will be released when the split-stack function
returns. Note that I'm disregarding the case of split-stack code in a
shared library calling non-split-stack code in the main executable;
that seems like an unlikely problem.
Please note: while LLVM supports segmented stacks, clang seems not to provide the __splitstack_<xyz> functions.
First I'd say split stack support is somewhat experimental in nature to begin with. It is not a widely supported thing nor has a single implementation become accepted as the way to go. As such, part of the purpose of it existing in the compiler is to enable research in real use.
That said, one generally wants to use such a feature to enable lots of threads with small stacks, but which can get bigger if they need to. In some applications, the code that runs in these threads can be tightly controlled. E.g. fairly specialized request handlers that do not call general purpose libraries such as Boost. High performance systems work often involves tightening down the constraints on what code is used in a given path and this would be an example thereof. It certainly limits the applicability of the feature, but I wouldn't be surprised if someone is using it in production this way.
Note that similar issues exist with flags such as -fno-exceptions and -fno-rtti . Generally C++ requires compiling everything that goes into an executable with a compatible set of flags. Sometimes one can mix and match, but it is often fragile. This is part of the motivation of building everything from source and hermetic build tools like bazel. Other languages have different approaches to non-source components, especially virtual machine based languages such as Java and the .NET family. In those worlds things like split stacks are decided at a lower-level of compilation, but typically one would not have any control over or awareness of them at the source code level.
I have the location/offset of a particular function present inside an executable. Would it be possible to call such a function (while suppressing the CRT's execution of the executable's entry point, hopefully) ?
In effect, you can simulate the Windows loader, assuming you run under Windows, but the basics should be the same on any platform. See e.g. http://msdn.microsoft.com/en-us/magazine/cc301805.aspx.
Load the file into memory,
Replace all relative addresses of functions that are called by the loaded executable with the actual function addresses.
Change the memory page to "executable" (this is the difficult and platform-dependent part)
Initialize the CRT in order to, e.g., initialize static variables.
Call.
However, as the commenters point out correctly, this might only be practical as an exercise using very simple functions. There are many, many things that can go wrong if you don't manage to emulate the complete OS loader.
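The final "Call" step itself comes down to address arithmetic plus a cast. The sketch below shows only that part and assumes the hard work (mapping, fix-ups, page permissions, CRT initialization) has already been done. TargetFn and function_at are hypothetical names, the int(int) signature is an assumption about the target function, and image_anchor/image_target are stand-ins inside the current process so the arithmetic can be tried out safely.

```cpp
#include <cassert>
#include <cstdint>

// Assumed signature of the function we want to reach; the real one has to be
// known from documentation or from disassembly.
using TargetFn = int (*)(int);

// Computes base + offset and reinterprets the result as a function pointer.
// Only meaningful once the image at `base` is mapped executable and its
// relocations and imports have been fixed up.
inline TargetFn function_at(std::uintptr_t base, std::uintptr_t offset) {
    return reinterpret_cast<TargetFn>(base + offset);
}

// Stand-ins living in the current process, used to exercise the arithmetic.
int image_anchor(int x) { return x; }
int image_target(int x) { return x * 2 + 1; }
```

With a really loaded image, base would be the address the file was mapped at and offset the function's known offset within it; calling the result before the fix-ups are done will most likely crash.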
PS: You could also ask the Google: http://www.cultdeadcow.com/tools/pewrap.html
PPS: You may also find helpful advice in the "security" community: https://www.blackhat.com/presentations/bh-usa-07/Harbour/Whitepaper/bh-usa-07-harbour-WP.pdf
Yes, you can call it, if you initialize all the global variables that this function uses, probably including CRT global variables. Alternatively, you can hook and replace all the CRT functions that the callee uses. Look at the disassembly of that function to find the right solution.
1) Take a look at the LoadLibraryEx() API. It has some flags that may be able to do all the dirty work described by Sebastian.
2) Edit the executable. Several modified bytes will do the job. Here is some documentation on the file format: http://docsrv.sco.com:507/en/topics/COFF.html
I have the following question and from a systems perspective want to know how to achieve this easily and efficiently.
Given a task 'abc' that has been built with debug information and a global variable "TRACE" that is normally set to 0, I would like to print out to file 'log' the address of each function that is called between the time that TRACE is set to 1 and back again to 0.
I was considering doing this through a front-loading / boot-strapping task that I'd develop which looks at the instructions for a common pattern of jump/frame pointer push, writing down the address and then mapping addresses to function names from the symbolic debug information in abc. There could be better system level ways to do this without a front-loader though, and I'm not sure what is most feasible.
Any implemented techniques out there?
One possibility is to preprocess the source before compiling it. This preprocessing would add code at the beginning of each function that would check the TRACE global and, if set, write to the log. As Mystagogue said, the compiler has preprocessor macros that expand to the name of the function.
You might also look at some profiling tools. Some of them have functionality close to what you're asking for. For example, some will sample the entire callstack periodically, which can tell you a lot about the code flow without actually logging every call.
Looking for a common prologue/epilogue won't work in the presence of frame-pointer omission and tail call optimization. Also, modern optimizers like to split functions into several chunks and merge common tail chunks of different functions.
There is no standard solution.
For the Microsoft compiler, check out the _penter and _pexit hooks. For GCC, look at the -finstrument-functions option and friends.
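For the GCC route, a minimal sketch could look like this. The hook names and signatures are the ones GCC documents for -finstrument-functions. The in-memory buffer, its size, and the helpers trace_count and trace_dump are made-up choices for illustration; addresses are buffered rather than written immediately to keep the hook cheap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

volatile int TRACE = 0;  // the global from the question

namespace {
constexpr std::size_t kMaxEntries = 4096;  // arbitrary buffer size
void* g_entries[kMaxEntries];
std::size_t g_count = 0;
}  // namespace

extern "C" {
// Called on entry to every function compiled with -finstrument-functions.
// The hooks themselves must not be instrumented, or they would recurse.
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void* this_fn, void* /*call_site*/) {
    if (TRACE && g_count < kMaxEntries) g_entries[g_count++] = this_fn;
}

// Called on exit from every instrumented function; nothing to record here.
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void* /*this_fn*/, void* /*call_site*/) {}
}

// Number of function addresses recorded so far.
__attribute__((no_instrument_function))
std::size_t trace_count() { return g_count; }

// Writes the recorded addresses, one per line, to the given file.
__attribute__((no_instrument_function))
bool trace_dump(const char* path) {
    std::FILE* f = std::fopen(path, "w");
    if (!f) return false;
    for (std::size_t i = 0; i < g_count; ++i) std::fprintf(f, "%p\n", g_entries[i]);
    return std::fclose(f) == 0;
}
```

Build 'abc' with -finstrument-functions and -g, call trace_dump("log") after TRACE goes back to 0, and map the addresses to names with a tool such as addr2line. For a position-independent executable the load base probably has to be subtracted first.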
Also, on x86 Windows you can use a monitor such as WinApiOverride32. It's primarily intended for monitoring DLL and system API calls, but you can generate a description file from your application's map file and monitor internal functions as well.
Make sure you've looked into the __func__ or __FUNCTION__ predefined identifiers. They give you the name of the function/method you are currently executing as a string.
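A small sketch of how __func__ could be combined with the TRACE global is shown below. The TRACE_ENTER macro, the g_trace_log vector (a stand-in for the 'log' file) and the two example functions are hypothetical. This approach records names rather than addresses, and it requires adding the macro to every function of interest.

```cpp
#include <cassert>
#include <string>
#include <vector>

int TRACE = 0;                          // the global from the question
std::vector<std::string> g_trace_log;   // stand-in for the 'log' file

// Records the enclosing function's name, but only while TRACE is set.
#define TRACE_ENTER() \
    do { if (TRACE) g_trace_log.push_back(__func__); } while (0)

// Two example functions with the macro placed at the top of the body.
int helper(int x) {
    TRACE_ENTER();
    return x + 1;
}

int compute(int x) {
    TRACE_ENTER();
    return helper(x) * 2;
}
```

Calls made while TRACE is 0 leave the log untouched; calls made while it is 1 are appended in call order.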