How to set the stacksize with C++11 std::thread - c++

I've been trying to familiarize myself with the std::thread library in C++11, and have arrived at a stumbling block.
I come from a POSIX threads background, and was wondering how to set the stack size of a std::thread prior to construction, as I can't seem to find any reference to doing so.
With pthreads, the stack size is set like this:
void* foo(void* arg);
.
.
.
.
pthread_attr_t attribute;
pthread_t thread;
pthread_attr_init(&attribute);
pthread_attr_setstacksize(&attribute,1024);
pthread_create(&thread,&attribute,foo,0);
pthread_join(thread,0);
Is there something similar when using std::thread?
I've been using the following reference:
http://en.cppreference.com/w/cpp/thread

You can't. std::thread doesn't support this because std::thread is standardized, and C++ does not require that a machine even has a stack, much less a fixed-size one.
pthreads is more restrictive in terms of the hardware it supports, and it assumes that there is some fixed stack size per thread (so you can configure it).

As Loki Astari already said, it is extremely rare to actually need a non-default stack-size and usually either a mistake or the result of bad coding.
If you feel like the default stack size is too big for your needs and want to reduce it, just forget about it. Every modern OS now uses virtual memory / on-demand commit, which means that memory is only reserved, not actually allocated, until you access the pages. Reducing the stack size will not reduce your actual memory footprint.
Due to this very behaviour, OSes can afford to set the default stack size to very big values. E.g. on a vanilla Debian this is 8MB (ulimit -s) which should be enough for every need. If you still manage to hit that limit, my first idea would be that your code is wrong, so you should first and foremost review it and move things to the heap, transform recursive functions into loops, etc.
If despite all of this you really need to change the stack size (i.e. increase it, since reducing it is useless), on POSIX you can always use setrlimit at the start of your program to increase the default stack size. Sure, this will affect all threads, but only the ones that need it will actually use the additional memory.
Last but not least, in all fairness I can see a corner case where reducing the stack size would make sense: if you have tons of threads on a 32-bit system, they could eat up your virtual address space (again, not the actual memory consumption) to the point that you don't have enough address space left for the heap. Again, setrlimit is your friend here, even though I'd advise moving to a 64-bit system to benefit from the larger virtual address space (and if your program is that big anyway, you'll probably benefit from the additional RAM too).

I have also been investigating this issue. For some applications, the default stack size is not adequate. Examples: the program does deep recursion dependent on the specific problem it is solving; the program needs to create many threads and memory consumption is an issue.
Here is a summary of (partial) solutions / workarounds I found:
g++ supports a -fsplit-stack option on Linux (see the split stacks documentation for more information). Here is a summary from their website:
The goal of split stacks is to permit a discontiguous stack which is
grown automatically as needed. This means that you can run multiple
threads, each starting with a small stack, and have the stack grow and
shrink as required by the program.
Remark: -fsplit-stack only worked for me after I started using the gold linker.
It seems clang++ will also support this flag. The version I tried (clang++ 3.3) crashed when trying to compile my application using the flag -fsplit-stack.
On Linux, set the stack size by executing ulimit -s <size> before starting your application, where size is the stack size in KB. Remark: the command ulimit -s unlimited did not affect the size of threads created with std::thread. When I used ulimit -s unlimited, the main thread's stack could grow, but the threads created with std::thread had the default size.
On Windows using Visual Studio, we can use the linker /STACK parameter or /STACKSIZE in the module definition file; this sets the default size for all created threads. See this link for more information. We can also modify this parameter in any executable using the command line tool EDITBIN.
On Windows using mingw g++, we can use the option -Wl,--stack,<size>. For some reason, when using cygwin g++, this flag only affects the size of the main thread.
Approaches that did not work for me:
ulimit -s <size> on OSX. It affects only the size of the main thread. Moreover, the Mac OSX default for a pthread stack size is 512kB.
setrlimit only affects the size of the main thread on Linux and OSX. On cygwin, it never worked for me; it seems to always return an error.
For OSX, the only alternative seems to be to use boost::thread instead of std::thread, but this is not nice if we want to stick with the standard. I hope g++ and clang++ will also support -fsplit-stack on OSX in the future.

I found this in Scott Meyers' book Overview of the New C++ (C++0x). As it's quite long, I can't post it as a comment. Is this helpful?
There is also a standard API for getting at the platform-specific
handles behind threads, mutexes, condition variables, etc.. These
handles are assumed to be the mechanism for setting thread priorities,
setting stack sizes, etc. (Regarding setting stack sizes, Anthony
Williams notes: "Of those OSs that support setting the stack size,
they all do it differently. If you're coding for a specific platform
(such that use of the native_handle would be OK), then you could use
that platform's facilities to switch stacks. e.g. on POSIX you could
use makecontext and swapcontext along with explicit allocation of a
stack, and on Windows you could use Fibers. You could then use the
platform-specific facilities (e.g. Linker flags) to set the default
stack size to something really tiny, and then switch stacks to
something bigger where necessary.")

Was looking for the answer to this myself just now.
It appears that while std::thread does not support this, boost::thread does.
In particular, you can use boost::thread::attributes to accomplish this:
boost::thread::attributes attrs;
attrs.set_stack_size(4096*10);
boost::thread myThread(attrs, fooFunction, 42);

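You can do some modifications like this if you don't want to include a big library.
It still depends on the C++ compiler's STL implementation (Clang / MSVC for now). Note that std::stacking_thread is provided by the library below, not by the C++ standard.
HackingSTL Library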
std::thread thread = std::stacking_thread(65536, []{
printf("Hello, world!\n");
});

Related

Capping allocated memory in multi-threaded C++ library

I've developed a library in C++ that allows multi-threaded usage. I want to support an option for the caller to specify a cap on the memory allocated by a given thread. (We can ignore the case of one thread allocating memory and others using it.)
Possibly making this more complicated is that my library uses various open source components (boost, ICU, etc), some of which are statically linked and others dynamically.
One option I've been looking into is overriding the allocation functions (new/delete/etc) to do the bookkeeping per thread ID. Natural concerns come up around the bookkeeping: performance, etc.
But an even bigger question/concern is whether this approach will work with the open source components without code changes to them?
I can't seem to find pre-existing solutions for this, though it seems to me like it's not very unusual.
Any suggestions on this approach, or another approach?
EDIT: More background: The library can allocate a significantly large range of memory per calling thread depending on the input provided (i.e. KBs to GBs).
So the goal of this request is to (more gracefully & deterministically) support running in RAM-constrained environments. This is not for a hard-real-time environment with strict memory limits--it's to support a number of concurrent threads which each have a "safe" allocation cap to avoid engaging the page/swap file.
Basic example use case: a system with 32GB RAM, 20GB free, the application using my library may configure itself to use a max of 10 threads and configure the library to use a max of 1GB per thread.
Upon hitting the cap the current thread's call into the library will cease further work and return a suitable error. (The code is already fully RAII so unwinding cleanly is easy.)
BTW I found some interesting content on the web already, sadly none provide a lot of hope for a "simple & effective" solution. But this one is especially insightful.

How big is the stack memory for a certain program, and are there any compiler flags that can set it?

As the title states: Is there any general "rule of thumb" about the size of the stack. I'm guessing the size will vary depending on the OS, the architecture, the size of the cache(s), how much RAM is available etc.
However, can anything be said in general, or is there any way to find out how much of the stack a given program is allowed to use? As a bonus question, is there any way (with compiler flags etc.; thinking mostly of C/C++ here, but also more generally) for the user to set the stack to a fixed size?
Btw, I'm asking strictly out of curiosity, I'm not having a stack overflow. :)
In Windows the default stack size for a thread is a million bytes, regardless of operating system, etc.
In managed code (C#, VB, etc) you can force a new thread to have a different stack size with this ctor:
http://msdn.microsoft.com/en-us/library/5cykbwz4.aspx
To change the stack size of the default thread of a Windows program, whether it is managed or not, you can use the editbin utility:
http://msdn.microsoft.com/en-us/library/xd3shwhf.aspx
Yes you can set the stack size, it usually is a linker flag, and it depends on your toolchain (typically this is referred to by the name of the compiler).
For Microsoft Visual C++, use the /F option to change the size, and DUMPBIN /HEADERS to see what the setting is.
For the GCC toolchain and most other Unix linkers, use -Wl,--stack
You will also find several existing questions here on StackOverflow.

How to diagnose weird race-condition-bug?

The bug we are tracking occurs within a specific VxWorks-based embedded environment (the vendor modified things to an unknown extent and provides an abstraction layer over much of the VxWorks functionality). We have two tasks running at different priorities, executing roughly every 100ms. The task with the higher priority simply counts up an integer (just so it does something), while the task with the lower priority creates a string, like this:
std::string text("Some text");
Note that there is no shared state between these tasks whatsoever. They both operate exclusively on automatic local variables.
On each run, each task does this a hundred times, so that the probability of the race condition occurring is higher. The application runs fine for a couple of minutes, and then the CPU load shoots from 5% to 100% and stays there. The entire time appears to be used by the task that creates the string. So far we have not been able to reproduce the behavior without using std::string.
We are using GCC 4.1.2 and running on VxWorks 5.5. The program is run on a Pentium III.
I have tried analyzing what happens there, but I cannot enter any of the string methods with a debugger, and adding print statements to basic_string does not seem to work (this was the background for this question of mine). My suspicion is that something in there corrupts the stack, resulting in a power-loop. My question is, is there any known error in older VxWorks versions that could explain this? If not, do you have any further suggestions on how to diagnose this? I can get the disassembly and stack dumps, but I have no experience in interpreting either. Can anyone provide some pointers?
If I remember, vxWorks provides thread specific memory locations (or possibly just one location). This feature lets you specify a memory location that will be automatically shadowed by task switches so that whenever a thread writes on it the value is preserved across task switches. It's sort of like an additional register save/restore.
GCC uses one of those thread-specific memory locations to track the exception stack. Even if you don't otherwise use exceptions there are some situations (particularly new, such as the std::string constructor might invoke) which implicitly create try/catch like environments which manipulate this stack. On a much older version of gcc I saw that go haywire in code that nominally did not use any exception handling.
In that case the solution was to compile with -fno-exceptions to eliminate all of that behavior, after which the problem went away.
Whenever I see a weird race-condition in a VxWorks system with unexplainable behavior, my first thought is always "VX_FP_TASK strikes again!" The first thing you should check is whether your threads are being created with the VX_FP_TASK flag in taskSpawn.
The documentation says something like "It is deadly to execute any floating-point operations in a task spawned without VX_FP_TASK option, and very difficult to find." Now, you may think that you're not using FP registers at all, but C++ uses them for some optimizations, and MMX operations (like you may be using for your add there) do require those registers to be preserved.

Setting the default stack size on Linux globally for the program

So I've noticed that the default stack size for threads on Linux is 8 MB (if I'm wrong, PLEASE correct me), and, incidentally, 1 MB on Windows. This is quite bad for my application, as on a 4-core processor that means 64 MB of space is used JUST for threads! The worst part is, I'm never using more than 100 KB of stack per thread (I abuse the heap a LOT ;)).
My solution right now is to limit the stack size of threads. However, I have no idea how to do this portably. Just for context, I'm using Boost.Thread for my threading needs. I'm okay with a little bit of #ifdef hell, but I'd like to know how to do it easily first.
Basically, I want something like this (where windows_* is linked on windows builds, and posix_* is linked under linux builds)
// windows_stack_limiter.c
int limit_stack_size()
{
    // Windows impl.
    return 0;
}
// posix_stack_limiter.c
int limit_stack_size()
{
    // Linux impl.
    return 0;
}
// stack_limiter.cpp
int limit_stack_size();
static volatile int placeholder = limit_stack_size();
How do I flesh out those functions? Or, alternatively, am I just doing this entirely wrong? Remember I have no control over the actual thread creation (no new params to CreateThread on Windows), as I'm using Boost.Thread.
You do not need to do this. The machine's physical memory is employed only where it is needed by a demand page fault system. Even if the thread stacks are significantly larger than the amount you are using the extra size is in virtual address space and does not tie up physical RAM.
Had physical RAM been tied up at that rate, a typical machine would run out of memory with only a few dozen processes running. You can see from a ps -Al that quite a few more than that execute concurrently.
I've run into similar problems on 32-bit systems (especially MIPS) running large application programs with hundreds of threads. Large default stacks don't tie up physical memory, but virtual memory can be a scarce resource as well. There are a couple of ways to resolve the problem:
Use setrlimit from within the program. I haven't done this but I suspect it would work.
Before starting the program from the shell, use "ulimit -s" with a parameter smaller than the default. (e.g., "ulimit -s 1024" for default 1 MB stack)
First, you don't need to change this unless you are getting SEGVs from hitting this limit. (see man setrlimit for detailed info)
Second, you change this in all of the linux distributions I'm aware of by editing /etc/security/limits.conf (to change the default) or by running ulimit -s <stack size in kilobytes> to change the value until you exit the shell.

Switching stacks in C++

I have some old code written in C for 16-bit using Borland C++ that switches between multiple stacks, using longjmps. It creates a new stack by doing a malloc, and then setting the SS and SP registers to the segment and offset, resp., of the address of the malloc'd area, using inline Assembler. I would like to convert it to Win32, and it looks like the two instructions should be replaced by a single one setting the ESP. The two instructions were surrounded by a CLI/STI pair, but in Win32 these give "privileged instructions", so I have cut them out for now. I am a real innocent when it comes to Windows, so, I was rather surprised that my first test case worked! So, my rather vague question is to ask the experts here if what I am doing is a) too dangerous to continue with, or b) will work if I add some code, take certain precautions, etc.? If the latter, what should be added, and where can I find out about it? Do I have to worry about any other registers, like the SS, EBX, etc.? I am using no optimization... Thanks for any tips people can give me.
Removing CLI/STI still works due to the differences in the operating environment.
On 16-bit DOS, an interrupt could occur and this interrupt would be initially running on the same stack. If you got interrupted in the middle of the operation, the interrupt could crash because you only updated ss and not sp.
On Windows, and any other modern environment, each user mode thread gets its own stack. If your thread is interrupted for whatever reason, its stack and context are safely preserved - you don't have to worry about something else running on your thread and your stack. cli/sti in this case would be protecting against something you're already protected against by the OS.
As Greg mentioned, the safe, supported way of swapping stacks like this on Windows is CreateFiber/SwitchToFiber. This does have the side-effect of changing your entire context, so it is not like just switching the stack.
This really raises the question of what you want to do. A lot of times, switching stacks is to get by limited stack space, which was 64k on 16-bit DOS. On Windows, you have a 1 MB stack and you can allocate even larger. Why are you trying to switch stacks?
By far the safest way to do this is to port the code to official Win32 multiprogramming structures, such as threads or fibers. Fibers provide a very lightweight multi-stack paradigm that sounds like it might be suitable for your application.
The Why does Win32 even have fibers? article is an interesting read too.
I have done this in user mode, and it appears to have no problems. You do not need cli/sti; these instructions merely prevent interrupts at that point in the code, which should be unnecessary judging from the limited information you have given us.
Have a look at Mtasker by Bert Hubert. It does simple cooperative multitasking, it might be easy for you to use this to port your code.
Don't forget that jumping stacks is going to hose any arguments or stack-resident variables.