Specifying thread stack size in pthreads - c++

I want to increase the stack size of a thread created through pthread_create(). The way to go seems to be
int pthread_attr_setstack( pthread_attr_t *attr,
void *stackaddr,
size_t stacksize );
from pthread.h.
However, according to multiple online references,
The stackaddr shall be aligned appropriately to be used as a stack; for example, pthread_attr_setstack() may fail with [EINVAL] if ( stackaddr & 0x7) is not 0.
My question: could someone provide an example of how to perform the alignment? Is it (the alignment) platform or implementation dependent?
Thanks in advance

Never use pthread_attr_setstack. It has a lot of fatal flaws, the worst of which is that it is impossible to ever free or reuse the stack after a thread has been created using it. (POSIX explicitly states that any attempt to do so results in undefined behavior.)
POSIX provides a much better function, pthread_attr_setstacksize which allows you to request the stack size you need, but leaves the implementation with the responsibility for allocating and deallocating the stack.

Look into posix_memalign().
It will allocate a memory block of the requested alignment and size.

Related

Do I have to lock std::queue when getting size?

I'm using std::queue in a multithreaded environment. Other threads may modify the queue as they wish. At some point I would like to call std::queue::size(). Do I have to lock the queue for that call? Will something bad happen if I don't?
This is undefined behavior and anything can happen. Behavior is undefined when one thread accesses an object while another thread is, or might be, modifying it.
I hesitate to add this because whether or not you can think of a way it can fail is not relevant. It's not defined, period. But just in case someone argues there's no imaginable way it could fail: Consider a queue that dynamically allocates a control structure that contains the size of the queue and information about each object in the queue. When the queue is enlarged, a new control structure might be allocated, the old structure freed, and a pointer updated. A concurrent call to size might grab the old pointer and then access it after it's freed and possibly contains completely different information or even has been removed from the memory map.
Just reading the size isn't going to be an issue in itself. You're going to read a single memory location. You might read the wrong size, because a change in the size made by another thread might not yet be visible in this thread, but you're not going to read a corrupted value. Such as half the bits from one value and the other from another value.

How to increase c++ thread stack size on solaris using pthreads?

What is the easiest way to increase the size of the default stacksize for pthreads? Is there any way to tune the heapsize as well, process level and individual thread level? If new operator fails because of underlying memory leakage, how do I set new handler to take care of bad allocations?
What is the easiest way to increase the size of the default stacksize for pthreads?
You can use pthread_attr_setstacksize to set the stacksize when creating new threads. The stack size must not be smaller than PTHREAD_STACK_MIN.
Is there any way to tune the heapsize as well, process level and individual thread level?
Using the Solaris compiler you can try to change the pagesize using the -xpagesize option, but you cannot adjust the size of the heap (it will be as large as the memory available to the machine). There is only one heap shared by all threads, so you cannot adjust it per-thread.
If new operator fails because of underlying memory leakage, how do I set new handler to take care of bad allocations?
The new handler is a specialized feature and there is no general answer, how to use a new handler is very dependent on the details of your program. It can't be used to fix memory leaks, once the memory is leaked it's too late so you need to prevent leaks from happening in the first place. (And if you don't know how to write a new handler then you probably don't need to use one.)

Use memory region as stack space?

In Linux is it possible to start a process (e.g. with execve) and make it use a particular memory region as stack space?
Background:
I have a C++ program and a fast allocator that gives me "fast memory". I can use it for objects that make use of the heap and create them in fast memory. Fine. But I also have a lot of variable living on the stack. How can I make them use the fast memory as well?
Idea: Implement a "program wrapper" that allocates fast memory and then starts the actual main program, passing a pointer to the fast memory and the program uses it as stack. Is that possible?
[Update]
The pthread setup seems to work.
With pthreads, you could use a secondary thread for your program logic, and set its stack address using pthread_attr_setstack():
NAME
pthread_attr_setstack, pthread_attr_getstack - set/get stack
attributes in thread attributes object
SYNOPSIS
#include <pthread.h>
int pthread_attr_setstack(pthread_attr_t *attr,
void *stackaddr, size_t stacksize);
DESCRIPTION
The pthread_attr_setstack() function sets the stack address and
stack size attributes of the thread attributes object referred
to by attr to the values specified in stackaddr and stacksize,
respectively. These attributes specify the location and size
of the stack that should be used by a thread that is created
using the thread attributes object attr.
stackaddr should point to the lowest addressable byte of a buf‐
fer of stacksize bytes that was allocated by the caller. The
pages of the allocated buffer should be both readable and
writable.
What I don't follow is how you're expecting to get any performance improvements out of doing something like this (I assume the purpose of your "fast" memory is better performance).

How can I get the size of a memory block allocated using malloc()? [duplicate]

This question already has answers here:
Closed 13 years ago.
Possible Duplicates:
How can I get the size of an array from a pointer in C?
Is there any way to determine the size of a C++ array programmatically? And if not, why?
I get a pointer to a chunk of allocated memory out of a C style function.
Now, it would be really interesting for debugging purposes to know how
big the allocated memory block that this pointer points is.
Is there anything more elegant than provoking an exception by blindly running over its boundaries?
Thanks in advance,
Andreas
EDIT:
I use VC++2005 on Windows, and GCC 4.3 on Linux
EDIT2:
I have _msize under VC++2005
Unfortunately it results in an exception in debug mode....
EDIT3:
Well. I have tried the way I described above with the exception, and it works.
At least while I am debugging and ensuring that immediately after the call
to the library exits I run over the buffer boundaries. Works like a charm.
It just isn't elegant and in no way usable in production code.
It's not standard but if your library has a msize() function that will give you the size.
A common solution is to wrap malloc with your own function that logs each request along with the size and resulting memory range, in the release build you can switch back to the 'real' malloc.
If you don't mind sleazy violence for the sake of debugging, you can #define macros to hook calls to malloc and free and pad the first 4 bytes with the size.
To the tune of
void *malloc_hook(size_t size) {
size += sizeof (size_t);
void *ptr = malloc(size);
*(size_t *) ptr = size;
return ((size_t *) ptr) + 1;
}
void free_hook (void *ptr) {
ptr = (void *) (((size_t *) ptr) - 1);
free(ptr);
}
size_t report_size(ptr) {
return * (((size_t *) ptr) - 1);
}
then
#define malloc(x) malloc_hook(x)
and so on
The C runtime library does not provide such a function. Furthermore, deliberately provoking an exception will not tell you how big the block is either.
Usually the way this problem is solved in C is to maintain a separate variable which keeps track of the size of the allocated block. Of course, this is sometimes inconvenient but there's generally no other way to know.
Your C runtime library may provide some heap debug functions that can query allocated blocks (after all, free() needs to know how big the block is), but any of this sort of thing will be nonportable.
With gcc and the GNU linker, you can easily wrap malloc
#include <stdlib.h>
#include <stdio.h>
void* __real_malloc(size_t sz);
void* __wrap_malloc(size_t sz)
{
void *ptr;
ptr = __real_malloc(sz);
fprintf(stderr, "malloc of size %d yields pointer %p\n", sz, ptr);
/* if you wish to save the pointer and the size to a data structure,
then remember to add wrap code for calloc, realloc and free */
return ptr;
}
int main()
{
char *x;
x = malloc(103);
return 0;
}
and compile with
gcc a.c -o a -Wall -Werror -Wl,--wrap=malloc
(Of course, this will also work with c++ code compiled with g++, and with the new operator (through it's mangled name) if you wish.)
In effect, the statically/dynamically loaded library will also use your __wrap_malloc.
No, and you can't rely on an exception when overrunning its boundaries, unless it's in your implementation's documentation. It's part of the stuff you really don't need to know about to write programs. Dig into your compiler's documentation or source code if you really want to know.
There is no standard C function to do this. Depending on your platform, there may be a non-portable method - what OS and C library are you using?
Note that provoking an exception is unreliable - there may be other allocations immediately after the chunk you have, and so you might not get an exception until long after you exceed the limits of your current chunk.
Memory checkers like Valgrind's memcheck and Google's TCMalloc (the heap checker part) keep track of this sort of thing.
You can use TCMalloc to dump a heap profile that shows where things got allocated, or you can just have it check to make sure your heap is the same at two points in program execution using SameHeap().
Partial solution: on Windows you can use the PageHeap to catch a memory access outside the allocated block.
PageHeap is an alternate memory manager present in the Windows kernel (in the NT varieties but nobody should be using any other version nowadays). It takes every allocation in a process and returns a memory block that has its end aligned with the end of a memory page, then it makes the following page unaccessible (no read, no write access). If the program tries to read or write past the end of the block, you'll get an access violation you can catch with your favorite debugger.
How to get it: Download and install the package Debugging Tools for Windows from Microsoft: http://www.microsoft.com/whdc/devtools/debugging/default.mspx
then launch the GFlags utility, go to the 3rd tab and enter the name of your executable, then Hit the key. Check the PageHeap checkbox, click OK and you're good to go.
The last thing: when you're done with debugging, don't ever forget to launch GFlags again, and disable PageHeap for the application. GFlags enters this setting into the Registry (under HKLM\Software\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\), so it is persistent, even across reboots.
Also, be aware that using PageHeap can increase the memory needs of your application tremendously.
The way to do what you want is to BE the allocator. If you filter all requests, and then record them for debugging purposes, then you can find out what you want when the memory is free'd.
Additionally, you can check at the end of the program to see if all allocated blocks were freed, and if not, list them. An ambitious library of this sort could even take FUNCTION and LINE parameters via a macro to let you know exactly where you are leaking memory.
Finally, Microsoft's MSVCRT provides a a debuggable heap that has many useful tools that you can use in your debug version to find memory problems: http://msdn.microsoft.com/en-us/library/bebs9zyz.aspx
On Linux, you can use valgrind to find many errors. http://valgrind.org/

What can modify the frame pointer?

I have a very strange bug cropping up right now in a fairly massive C++ application at work (massive in terms of CPU and RAM usage as well as code length - in excess of 100,000 lines). This is running on a dual-core Sun Solaris 10 machine. The program subscribes to stock price feeds and displays them on "pages" configured by the user (a page is a window construct customized by the user - the program allows the user to configure such pages). This program used to work without issue until one of the underlying libraries became multi-threaded. The parts of the program affected by this have been changed accordingly. On to my problem.
Roughly once in every three executions the program will segfault on startup. This is not necessarily a hard rule - sometimes it'll crash three times in a row then work five times in a row. It's the segfault that's interesting (read: painful). It may manifest itself in a number of ways, but most commonly what will happen is function A calls function B and upon entering function B the frame pointer will suddenly be set to 0x000002. Function A:
result_type emit(typename type_trait<T_arg1>::take _A_a1) const
{ return emitter_type::emit(impl_, _A_a1); }
This is a simple signal implementation. impl_ and _A_a1 are well-defined within their frame at the crash. On actual execution of that instruction, we end up at program counter 0x000002.
This doesn't always happen on that function. In fact it happens in quite a few places, but this is one of the simpler cases that doesn't leave that much room for error. Sometimes what will happen is a stack-allocated variable will suddenly be sitting on junk memory (always on 0x000002) for no reason whatsoever. Other times, that same code will run just fine. So, my question is, what can mangle the stack so badly? What can actually change the value of the frame pointer? I've certainly never heard of such a thing. About the only thing I can think of is writing out of bounds on an array, but I've built it with a stack protector which should come up with any instances of that happening. I'm also well within the bounds of my stack here. I also don't see how another thread could overwrite the variable on the stack of the first thread since each thread has it's own stack (this is all pthreads). I've tried building this on a linux machine and while I don't get segfaults there, roughly one out of three times it will freeze up on me.
Stack corruption, 99.9% definitely.
The smells you should be looking carefully for are:-
Use of 'C' arrays
Use of 'C' strcpy-style functions
memcpy
malloc and free
thread-safety of anything using pointers
Uninitialised POD variables.
Pointer Arithmetic
Functions trying to return local variables by reference
I had that exact problem today and was knee-deep in gdb mud and debugging for a straight hour before occurred to me that I simply wrote over array boundaries (where I didn't expect it the least) of a C array.
So, if possible, use vectors instead because any decend STL implementation will give good compiler messages if you try that in debug mode (whereas C arrays punish you with segfaults).
I'm not sure what you're calling a "frame pointer", as you say:
On actual execution of that
instruction, we end up at program
counter 0x000002
Which makes it sound like the return address is being corrupted. The frame pointer is a pointer that points to the location on the stack of the current function call's context. It may well point to the return address (this is an implementation detail), but the frame pointer itself is not the return address.
I don't think there's enough information here to really give you a good answer, but some things that might be culprits are:
incorrect calling convention. If you're calling a function using a calling convention different from how the function was compiled, the stack may become corrupted.
RAM hit. Anything writing through a bad pointer can cause garbage to end up on the stack. I'm not familiar with Solaris, but most thread implementations have the threads in the same process address space, so any thread can access any other thread's stack. One way a thread can get a pointer into another thread's stack is if the address of a local variable is passed to an API that ultimately deals with the pointer on a different thread. unless you synchronize things properly, this will end up with the pointer accessing invalid data. Given that you're dealing with a "simple signal implementation", it seems like it's possible that one thread is sending a signal to another. Maybe one of the parameters in that signal has a pointer to a local?
There's some confusion here between stack overflow and stack corruption.
Stack Overflow is a very specific issue cause by try to use using more stack than the operating system has allocated to your thread. The three normal causes are like this.
void foo()
{
foo(); // endless recursion - whoops!
}
void foo2()
{
char myBuffer[A_VERY_BIG_NUMBER]; // The stack can't hold that much.
}
class bigObj
{
char myBuffer[A_VERY_BIG_NUMBER];
}
void foo2( bigObj big1) // pass by value of a big object - whoops!
{
}
In embedded systems, thread stack size may be measured in bytes and even a simple calling sequence can cause problems. By default on windows, each thread gets 1 Meg of stack, so causing stack overflow is much less of a common problem. Unless you have endless recursion, stack overflows can always be mitigated by increasing the stack size, even though this usually is NOT the best answer.
Stack Corruption simply means writing outside the bounds of the current stack frame, thus potentially corrupting other data - or return addresses on the stack.
At it's simplest:-
void foo()
{
char message[10];
message[10] = '!'; // whoops! beyond end of array
}
That sounds like a stack overflow problem - something is writing beyond the bounds of an array and trampling over the stack frame (and probably the return address too) on the stack. There's a large literature on the subject. "The Shell Programmer's Guide" (2nd Edition) has SPARC examples that may help you.
With C++ unitialized variables and race conditions are likely suspects for intermittent crashes.
Is it possible to run the thing through Valgrind? Perhaps Sun provides a similar tool. Intel VTune (Actually I was thinking of Thread Checker) also has some very nice tools for thread debugging and such.
If your employer can spring for the cost of the more expensive tools, they can really make these sorts of problems a lot easier to solve.
It's not hard to mangle the frame pointer - if you look at the disassembly of a routine you will see that it is pushed at the start of a routine and pulled at the end - so if anything overwrites the stack it can get lost. The stack pointer is where the stack is currently at - and the frame pointer is where it started at (for the current routine).
Firstly I would verify that all of the libraries and related objects have been rebuilt clean and all of the compiler options are consistent - I've had a similar problem before (Solaris 2.5) that was caused by an object file that hadn't been rebuilt.
It sounds exactly like an overwrite - and putting guard blocks around memory isn't going to help if it is simply a bad offset.
After each core dump examine the core file to learn as much as you can about the similarities between the faults. Then try to identify what is getting overwritten. As I remember the frame pointer is the last stack pointer - so anything logically before the frame pointer shouldn't be modified in the current stack frame - so maybe record this and copy it elsewhere and compare upon return.
Is something meaning to assign a value of 2 to a variable but instead is assigning its address to 2?
The other details are lost on me but "2" is the recurring theme in your problem description. ;)
I would second that this definitely sounds like a stack corruption due to out of bound array or buffer writing. Stack protector would be good as long as the writing is sequential, not random.
I second the notion that it is likely stack corruption. I'll add that the switch to a multi-threaded library makes me suspicious that what has happened is a lurking bug has been exposed. Possibly the sequencing the buffer overflow was occurring on unused memory. Now it's hitting another thread's stack. There are many other possible scenarios.
Sorry if that doesn't give much of a hint at how to find it.
I tried Valgrind on it, but unfortunately it doesn't detect stack errors:
"In addition to the performance penalty an important limitation of Valgrind is its inability to detect bounds errors in the use of static or stack allocated data."
I tend to agree that this is a stack overflow problem. The tricky thing is tracking it down. Like I said, there's over 100,000 lines of code to this thing (including custom libraries developed in-house - some of it going as far back as 1992) so if anyone has any good tricks for catching that sort of thing, I'd be grateful. There's arrays being worked on all over the place and the app uses OI for its GUI (if you haven't heard of OI, be grateful) so just looking for a logical fallacy is a mammoth task and my time is short.
Also agreed that the 0x000002 is suspect. It is about the only constant between crashes. Even weirder is the fact that this only cropped up with the multi-threaded switch. I think that the smaller stack as a result of the multiple-threads is what's making this crop up now, but that's pure supposition on my part.
No one asked this, but I built with gcc-4.2. Also, I can guarantee ABI safety here so that's also not the issue. As for the "garbage at the end of the stack" on the RAM hit, the fact that it is universally 2 (though in different places in the code) makes me doubt that as garbage tends to be random.
It is impossible to know, but here are some hints that I can come up with.
In pthreads you must allocate the stack and pass it to the thread. Did you allocate enough? There is no automatic stack growth like in a single threaded process.
If you are sure that you don't corrupt the stack by writing past stack allocated data check for rouge pointers (mostly uninitialized pointers).
One of the threads could overwrite some data that others depend on (check your data synchronisation).
Debugging is usually not very helpful here. I would try to create lots of log output (traces for entry and exit of every function/method call) and then analyze the log.
The fact that the error manifest itself differently on Linux may help. What thread mapping are you using on Solaris? Make sure you map every thread to it's own LWP to ease the debugging.
Also agreed that the 0x000002 is suspect. It is about the only constant between crashes. Even weirder is the fact that this only cropped up with the multi-threaded switch. I think that the smaller stack as a result of the multiple-threads is what's making this crop up now, but that's pure supposition on my part.
If you pass anything on the stack by reference or by address, this would most certainly happen if another thread tried to use it after the first thread returned from a function.
You might be able to repro this by forcing the app onto a single processor. I don't know how you do that with Sparc.