Extend call stack to disk in C++? - c++

When it comes to massively-recursive method calls, the call-stack size has to be extended by modifying the appropriate compiler parameters in order to avoid stack-overflow.
Let's consider writing a portable application whose layout is simple enough so that its users need only possess minimal technical knowledge, so manual virtual memory configuration is out of question.
Running massively-recursive methods (behind the scenes obviously) may result in the call-stack limit being surpassed, especially if the machine the application is running on is limited memory-wise.
Enough chit-chat: In C++ is it possible to manually extend the call-stack to disk in case memory is (almost) full?

It may just barely be possible.
Use a coroutine library. With that, you allocate your own stack out of the heap. Restructure your code to track how deep it is in its callstack, and when it gets dangerously deep, create a new cothread and switch into that instead. When you run out of heap memory, freeze old cothreads and free their memory. Of course, you better be sure to unfreeze them to the same address--so I suggest you allocate their stacks yourselves out of your own arena that you can control. In fact it may be easier to just reuse the same piece of memory for the cothread stack and swap them in and out one at a time.
It's certainly easier to rewrite your algorithm to be non-recursive.
This may be an example of it working, or it may just print the right answer on accident:
#include <stdio.h>
#include "libco.h"
//byuu's libco has been modified to use a provided stack; it's a simple mod, but needs to be done per platform
//x86.c:
////if(handle = (cothread_t)malloc(size)) {
//handle = (cothread_t)stack;
//here we're going to have a stack on disk and have one recursion's stack in RAM at a time
//I think it may be impossible to do this without a main thread controlling the coroutines, but I'm not sure.
#define STACKSIZE (32*1024)
char stack[STACKSIZE];
FILE* fpInfiniteStack;
cothread_t co_mothership;
#define RECURSING 0
#define EXITING 1
int disposition;
volatile int recurse_level;
int call_in_cothread( int (*entrypoint)(int), int arg);
int fibo_b(int n);
int fibo(int n)
{
if(n==0)
return 0;
else if(n==1)
return 1;
else {
int a = call_in_cothread(fibo,n-1);
int b = call_in_cothread(fibo_b,n-2);
return a+b;
}
}
int fibo_b(int n) { printf("fibo_b\n"); return fibo(n); } //just to make sure we can call more than one function
long filesize;
void freeze()
{
fwrite(stack,1,STACKSIZE,fpInfiniteStack);
fflush(fpInfiniteStack);
filesize += STACKSIZE;
}
void unfreeze()
{
fseek(fpInfiniteStack,filesize-STACKSIZE,SEEK_SET);
int read = fread(stack,1,STACKSIZE,fpInfiniteStack);
filesize -= STACKSIZE;
fseek(fpInfiniteStack,filesize,SEEK_SET);
}
struct
{
int (*proc)(int);
int arg;
} thunk, todo;
void cothunk()
{
thunk.arg = thunk.proc(thunk.arg);
disposition = EXITING;
co_switch(co_mothership);
}
int call_in_cothread(int (*proc)(int), int arg)
{
if(co_active() != co_mothership)
{
todo.proc = proc;
todo.arg = arg;
disposition = RECURSING;
co_switch(co_mothership);
//we land here after unfreezing. the work we wanted to do has already been done.
return thunk.arg;
}
NEXT_RECURSE:
thunk.proc = proc;
thunk.arg = arg;
cothread_t co = co_create(stack,STACKSIZE,cothunk);
recurse_level++;
NEXT_EXIT:
co_switch(co);
if(disposition == RECURSING)
{
freeze();
proc = todo.proc;
arg = todo.arg;
goto NEXT_RECURSE;
}
else
{
recurse_level--;
unfreeze();
if(recurse_level==0)
return thunk.arg; //return from initial level of recurstion
goto NEXT_EXIT;
}
return -666; //this should not be possible
}
int main(int argc, char**argv)
{
fpInfiniteStack = fopen("infinite.stack","w+b");
co_mothership = co_active();
printf("result: %d\n",call_in_cothread(fibo,10));
}
Now you just need to detect how much memory's in the system, how much of it is available, how big the callstack is, and when the callstack's exhausted, so you know when to deploy the infinite stack. That's not easy stuff for one system, let alone doing it portably. It might be better to learn how the stack is actually meant to be used instead of fighting it.

It's feasible.
You need a bit of assembly to manipulate the stack pointer as there's no standardized way of accessing it from C++ directly (as far as I know). Once you are there you can point to your memory page and take care of swapping memory in and out. There are already libraries out there doing it for you.
On the other hand if the system provider considered that paging memory or the other virtual memory techniques would not work/be worth on the platform they probably had a very good reason (most likely it would be incredibly slow). Try to get your solution to work without the recursion or change it to make the recursion fit into what's available. Even a less efficient implementation would end up faster than your disk paged recursion.

Related

What is the best memory management for GCC C++? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 6 months ago.
Improve this question
What's the problem of my memory management? Because it causes crash that I show in a comment in the code below ("A memory block could not be found when trying to free."). I know that my memory management is not thread safe because I use global variables g_numBlocks and g_blocks that can cause risk when using multiple threads.
Since my memory management code seems too complex, can anyone suggest a stable and better "Memory Management for C++" to avoid memory leaks.
The code that contains bug
#include "emc-memory.h" // <-- Declare the functions MALLOC() and FREE() from other library.
#include <vector>
int main() {
printf("HERE(1)\n");
{
std::vector<string> paths = { // <-- Problem, 'std::vector' & 'string' use internal malloc/free & operator new/delete that are overwritten with my own custom memory management.
"/foo/bar.txt",
"/foo/bar.",
"/foo/bar",
"/foo/bar.txt/bar.cc",
"/foo/bar.txt/bar.",
"/foo/bar.txt/bar",
"/foo/.",
"/foo/..",
"/foo/.hidden",
"/foo/..bar",
};
} // <-- It crashes here, error in FREE(): "A memory block could not be found when trying to free.".
printf("HERE(2)\n"); // The reason I know it crashes above is this line is not evaluated, only "HERE(1)" is printed. I'm using [RelWithDebInfo] with blurry debugging info.
return 0;
}
Compilers:
[Visual Studio 2015] [Debug]: No problem.
[Visual Studio 2015] [RelWithDebInfo]: No problem.
[GCC 12.1.0 x86_64-w64-mingw32] [Debug]: No problem.
[GCC 12.1.0 x86_64-w64-mingw32] [RelWithDebInfo]: Broken which means there's a bug.
In "emc-memory.h" in other library .so file
extern const char* __file;
extern int __line;
#define new (__file = __FILE__, __line = __LINE__, 0) ? 0 : new
enum MEMORYBLOCKTYPE {
MEMORYBLOCKTYPE_MALLOC,
MEMORYBLOCKTYPE_NEW,
};
void *MALLOC(size_t size, MEMORYBLOCKTYPE type);
void *REALLOC(void *block, size_t newSize);
void FREE(void *block, MEMORYBLOCKTYPE type);
#define malloc(size) ((__file = __FILE__, __line = __LINE__, 0) ? 0 : MALLOC(size, MEMORYBLOCKTYPE_MALLOC))
#define realloc(block, newSize) REALLOC(block, newSize)
#define free(block) FREE(block, MEMORYBLOCKTYPE_MALLOC)
In "emc-memory.cpp" in other library .so file
I use this code in a link to override the operator new & delete: https://codereview.stackexchange.com/questions/7216/custom-operator-new-and-operator-delete
typedef unsigned long long BlockId; // The reason it's 64-bit is a memory block can be freed and reallocated multiple times, which means that there can be a lot of ids.
BlockId g_blockId = 0;
BlockId newBlockId() {
return g_blockId++;
}
struct Block {
const char *file;
int line;
const char *scope;
char *hint;
size_t size;
BlockId id; // That id is used for comparison because it will never be changed but the block pointer can be changed.
void *block;
MEMORYBLOCKTYPE type;
};
bool g_blocks_initialized = false;
int g_numBlocks;
Block **g_blocks;
void *MALLOC(size_t size, MEMORYBLOCKTYPE type) {
if (g_blocks_initialized == false) {
g_blocks_initialized = true;
_initializeList(g_numBlocks, g_blocks);
}
Block *b = (Block *)malloc(sizeof(*b));
b->file = __file ; __file = nullptr;
b->line = __line ; __line = 0;
b->scope = __scope; __scope = nullptr;
b->hint = allocateMemoryHint(__hint);
b->size = size;
b->id = newBlockId();
b->block = malloc(size);
b->type = type;
_addListItem(g_numBlocks, g_blocks, b);
return b->block;
}
void FREE(void *block, MEMORYBLOCKTYPE type) {
if (block == nullptr) {
return; // 'free' can free a nullptr.
}
for (int i = 0; i < g_numBlocks; i++) {
Block *b = g_blocks[i];
if (b->block == block) {
if (b->type != type) {
switch (type) {
case MEMORYBLOCKTYPE_MALLOC: EMC_ERROR("The memory block type must be MALLOC."); break;
case MEMORYBLOCKTYPE_NEW: EMC_ERROR("The memory block type must be NEW."); break;
default: EMC_ERROR("Error"); break;
}
}
_removeListItem(g_numBlocks, g_blocks, b);
freeMemoryHint(b->hint); b->hint = nullptr;
SAFE_FREE(b->block);
SAFE_FREE(b);
return;
}
}
EMC_ERROR("A memory block could not be found when trying to free.\n\nExamples:\n - Calling free(pointer) where pointer was not set to zero after it's been called twice, the solution was to use SAFE_FREE(). And if possible, replace any free() with SAFE_FREE(). For example, see Lexer::read0() on the original line \"free(out.asIdentifier);\".\n - If an 'Engine' object is destroyed before destroying a Vulkan object then it can cause this error (It can happen with 'Release' or 'RelWithDebInfo' configuration but not with 'Debug' configuration), that problem happened to me and I stuck there for hours until I realized it.");
}
I would humbly suggest that without a very clear reason to think otherwise the best memory management for GCC C++ is the out-of-the-box default memory management for GCC C++.
That would mean your best solution would have been to do nothing or as it is now strip out your overrides of the global operators.
You may find in some area of a system the default memory management is sub-optimal but in 2022 the default options are very effective and if you find a general purpose strategy that is better it's a publishable paper.
However your question tells us nothing about the application in question or your motivations for thinking you should even try to change the memory management let alone give advice on what to.
Sure you can add a global allocation mutex to block memory management and make it thread-safe. I will be surprised if that doesn't turn out to more than throw away whatever advantage you're hoping to gain.
Not sure where SAFE_FREE is coming from.
If you see in MALLOC function, they use c runtime malloc. Meaning that if you want to free the block you need to use the corresponding free() function.
Make sure that SAFE_FREE is indeed using the c runtime free with the correct parameters.

build function at runtime c++ from number of functions that built at compilation

I am creating scripting language that first parse the code
and then copy functions (To execute the code) to one buffer\memory as the parsed code.
There is a way to copy function's binary code to buffer and then execute the whole buffer?
I need to execute all the functions at once to get better performance.
To understand my question to best I want to do something like this:
#include <vector>
using namespace std;
class RuntimeFunction; //The buffer to my runtime function
enum ByteCodeType {
Return,
None
};
class ByteCode {
ByteCodeType type;
}
void ReturnRuntime() {
return;
}
RuntimeFunction GetExecutableData(vector<ByteCode> function) {
RuntimeFunction runtimeFunction=RuntimeFunction(sizeof(int)); //Returns int
for (int i = 0 ; i < function.size() ; i++ ) {
#define CurrentByteCode function[i]
if (CurrentByteCode.Type==Return) {
runtimeFunction.Append(&ReturnRuntime);
} //etc.
#undef
}
return runtimeFunction;
}
void* CallFunc(RuntimeFunction runtimeFunction,vector<void*> custom_parameters) {
for (int i=custom_parameters-1;i>=0;--i) { //Invert parameters loop
__asm {
push custom_parameters[i]
}
}
__asm {
call runtimeFunction.pHandle
}
}
There are a number of ways of doing this, depending on how deep you want to get into generating code at runtime, but one relatively simple way of doing it is with threaded code and a threaded code interpreter.
Basically, threaded code consists of an array of function pointers, and the interpreter goes through the array calling each pointed at function. The tricky part is that you generally have each function return the address of array element containing a pointer to the next function to call, which allows you to implement things like branches and calls without any effort in the interpreter
Usually you use something like:
typedef void *(*tc_func_t)(void *, runtime_state_t *);
void *interp(tc_func_t **entry, runtime_state_t *state) {
tc_func_t *pc = *entry;
while (pc) pc = (*pc)(pc+1, state);
return entry+1;
}
That's the entire interpreter. runtime_state_t is some kind of data structure containing some runtime state (usually one or more stacks). You call it by creating an array of tc_func_t function pointers and filling them in with function pointers (and possibly data), ending with a null pointer, and then call interp with the address of a variable containing the start of the array. So you might have something like:
void *add(tc_func_t *pc, runtime_state_t *state) {
int v1 = state->data.pop();
int v2 = state->data.pop();
state->data.push(v1 + v2);
return pc; }
void *push_int(tc_func_t *pc, runtime_state_t *state) {
state->data.push((int)*pc);
return pc+1; }
void *print(tc_func_t *pc, runtime_state_t *state) {
cout << state->data.pop();
return pc; }
tc_func_t program[] = {
(tc_func_t)push_int,
(tc_func_t)2,
(tc_func_t)push_int,
(tc_func_t)2,
(tc_func_t)add,
(tc_func_t)print,
0
};
void run_prgram() {
runtime_state_t state;
tc_func_t *entry = program;
interp(&entry, &state);
}
Calling run_program runs the little program that adds 2+2 and prints the result.
Now you may be confused by the slightly odd calling setup for interp, with an extra level of indirection on the entry argument. That's so that you can use interp itself as a function in a threaded code array, followed by a pointer to another array, and it will do a threaded code call.
edit
The biggest problem with threaded code like this is related to performance -- the threaded coded interpreter is extremely unfriendly to branch predictors, so performance is pretty much locked at one threaded instruction call per branch misprediction recovery time.
If you want more performance, you pretty much have to go to full-on runtime code generation. LLVM provides a good, machine independent interface to doing that, along with pretty good optimizers for common platforms that will produce pretty good code at runtime.

C/C++ code that damages call stack

Is it possible an usual code to damage call stack in c/c++?
I don't mean a kind of hack or something, just an oversight mistake or something, but not random, such that damages it every time.
Someone told me that an ex colleague managed but I don't think it is possible.
Does someone have such an experience?
Yes, easy. One of the very common issues, in fact. Consider this:
void foo()
{
int i;
int *p = &i;
p -= 5; // now point somewhere god knows where, generally undefined behavior
*p = 0; // boom, on different compilers will end up with various bad things,
// including potentially trashing the call stack
}
Many cases of an out-of-boundaries access of a local array/buffer end up with trashed stacks.
Yes. On many platforms, local variables are stored along with the call stack; in that case, writing outside a local array is a very easy way to corrupt it:
void evil() {
int array[1];
std::fill(array, array+1000000, 0);
return; // BOOM!
}
More subtly, returning a reference to a local variable could corrupt the stack of a function that's called later on:
int & evil() {
int x;
return x;
}
void good(int & x) {
x = 0;
return; // BOOM!
}
void innocent() {
good(evil());
}
Note that neither of these (and indeed anything else that could corrupt the stack) are legal; but the compiler doesn't have to diagnose them. Luckily, most compilers will spot these errors, as long as you enable the appropriate warnings.

Are there any problems with changing the target of a function pointer from within the function? (C++)

I have a program that defines changes the target function of a function pointer from within the function being called by said pointer, like so:
void increment(int&);
void superincrement(int&);
void (*fooncrement)(int&);
int main() {
int j = 0;
fooncrement = increment;
while (true == true) {
fooncrement(j);
}
}
void increment(int& i) {
static int counter = 0;
i++;
if (counter > 7)
fooncrement = superincrement;
counter++;
}
void superincrement(int& i) {
i += 23;
}
A quick run through MSVC's debugger shows that the program more or less works as expected. However, are there are any problems not immediately obvious here that might manifest if I tried something like this in a more complex environment?
This is well-defined.
In fact, this technique is often used to implement state machines.
No, there shouldn't be any problem. It's not like it clings to that pointer once you made the call.
This isn't thread-safe, but for single-threaded apps, no problems. If you don't know what threading is, then you're not using it, so you have no problem.
More advanced programs use multiple "threads", each of which is basically it's own program, but they work together. Imagine if you had int main(), but also int main1(), int main2(), int main3(), and they all ran at the same time. Your program could do four things at once! However, if one thread changes fooncrement, there will be a slight delay before the other threads see the updated value. In the meantime, they'd use the old value, which might cause problems.

Boost, Shared Memory and Vectors

I need to share a stack of strings between processes (possibly more complex objects in the future). I've decided to use boost::interprocess but I can't get it to work. I'm sure it's because I'm not understanding something. I followed their example, but I would really appreciate it if someone with experience with using that library can have a look at my code and tell me what's wrong. The problem is it seems to work but after a few iterations I get all kinds of exceptions both on the reader process and sometimes on the writer process. Here's a simplified version of my implementation:
using namespace boost::interprocess;
class SharedMemoryWrapper
{
public:
SharedMemoryWrapper(const std::string & name, bool server) :
m_name(name),
m_server(server)
{
if (server)
{
named_mutex::remove("named_mutex");
shared_memory_object::remove(m_name.c_str());
m_segment = new managed_shared_memory (create_only,name.c_str(),65536);
m_stackAllocator = new StringStackAllocator(m_segment->get_segment_manager());
m_stack = m_segment->construct<StringStack>("MyStack")(*m_stackAllocator);
}
else
{
m_segment = new managed_shared_memory(open_only ,name.c_str());
m_stack = m_segment->find<StringStack>("MyStack").first;
}
m_mutex = new named_mutex(open_or_create, "named_mutex");
}
~SharedMemoryWrapper()
{
if (m_server)
{
named_mutex::remove("named_mutex");
m_segment->destroy<StringStack>("MyStack");
delete m_stackAllocator;
shared_memory_object::remove(m_name.c_str());
}
delete m_mutex;
delete m_segment;
}
void push(const std::string & in)
{
scoped_lock<named_mutex> lock(*m_mutex);
boost::interprocess::string inStr(in.c_str());
m_stack->push_back(inStr);
}
std::string pop()
{
scoped_lock<named_mutex> lock(*m_mutex);
std::string result = "";
if (m_stack->size() > 0)
{
result = std::string(m_stack->begin()->c_str());
m_stack->erase(m_stack->begin());
}
return result;
}
private:
typedef boost::interprocess::allocator<boost::interprocess::string, boost::interprocess::managed_shared_memory::segment_manager> StringStackAllocator;
typedef boost::interprocess::vector<boost::interprocess::string, StringStackAllocator> StringStack;
bool m_server;
std::string m_name;
boost::interprocess::managed_shared_memory * m_segment;
StringStackAllocator * m_stackAllocator;
StringStack * m_stack;
boost::interprocess::named_mutex * m_mutex;
};
EDIT Edited to use named_mutex. Original code was using interprocess_mutex which is incorrect, but that wasn't the problem.
EDIT2 I should also note that things work up to a point. The writer process can push several small strings (or one very large string) before the reader breaks. The reader breaks in a way that the line m_stack->begin() does not refer to a valid string. It's garbage. And then further execution throws an exception.
EDIT3 I have modified the class to use boost::interprocess::string rather than std::string. Still the reader fails with invalid memory address. Here is the reader/writer
//reader process
SharedMemoryWrapper mem("MyMemory", true);
std::string myString;
int x = 5;
do
{
myString = mem.pop();
if (myString != "")
{
std::cout << myString << std::endl;
}
} while (1); //while (myString != "");
//writer
SharedMemoryWrapper mem("MyMemory", false);
for (int i = 0; i < 1000000000; i++)
{
std::stringstream ss;
ss << i; //causes failure after few thousand iterations
//ss << "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" << i; //causes immediate failure
mem.push(ss.str());
}
return 0;
There are several things that leaped out at me about your implementation. One was the use of a pointer to the named mutex object, whereas the documentation of most boost libraries tends to bend over backwards to not use a pointer. This leads me to ask for a reference to the program snippet you worked from in building your own test case, as I have had similar misadventures and sometimes the only way out was to go back to the exemplar and work forward one step at a time until I come across the breaking change.
The other thing that seems questionable is your allocation of a 65k block for shared memory, and then in your test code, looping to 1000000000, pushing a string onto your stack each iteration.
With a modern PC able to execute 1000 instructions per microsecond and more, and operating systems like Windows still doling out execution quanta in 15 millisecond. chunks, it won't take long to overflow that stack. That would be my first guess as to why things are haywire.
P.S.
I just returned from fixing my name to something resembling my actual identity. Then the irony hit that my answer to your question has been staring us both in the face from the upper left hand corner of the browser page! (That is, of course, presuming I was correct, which is so often not the case in this biz.)
Well maybe shared memory is not the right design for your problem to begin with. However we would not know, because we don't know what you try to achieve in the first place.