How to view backtrace of untracked stack - gdb

Using arm-none-eabi-gdb, Cortex M0, FreeRTOS.
How can I view the stack trace of an arbitrary stack at address addr?
Do I have to manually change the stack pointer, or is there a simpler way?

Just view the memory in the debugger, remembering that the stack grows downwards.
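For example, assuming the task's saved stack pointer is 0x20001000 (a made-up address; substitute the value from the task's TCB), you can either dump the raw words or temporarily repoint the registers so gdb unwinds the frames for you:

(gdb) x/32xw 0x20001000     # dump 32 words of the stack, lowest address first
(gdb) set $sp = 0x20001000  # or: point the CPU registers at that stack...
(gdb) set $pc = <pc value saved in the task's context>
(gdb) bt                    # ...and let gdb walk the frames

If you repoint $sp and $pc like this, note the live register values first so you can restore them before resuming execution.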

Related

How do segmented stacks work

How do segmented stacks work? This question also applies to Boost.Coroutine, so I am using the C++ tag as well here. The main doubt comes from this article. It looks like what they do is keep some space at the bottom of the stack and check whether it has gotten corrupted by registering some sort of signal handler with the memory allocated there (perhaps via mmap and mprotect?). Then, when they detect that they have run out of space, they continue by allocating more memory and carrying on from there. Three questions about this:
Isn't this construct a user-space thing? How do they control where the new stack is allocated, and how do the instructions the program is compiled down to become aware of that?
A push instruction basically just adjusts the stack pointer and stores a register's value on the stack, so how can the push instruction be aware of where the new stack starts, and correspondingly how can the pop know when it has to move the stack pointer back to the old stack?
They also say
After we've got a new stack segment, we restart the goroutine by retrying the function that caused us to run out of stack
What does this mean? Do they restart the entire goroutine? Won't this possibly cause non-deterministic behavior?
How do they detect that the program has overrun the stack? If they keep a canary-ish memory area at the bottom, then what happens when the user program creates an array big enough to overflow it? Won't that cause a stack overflow, and isn't it a potential security vulnerability?
If the implementations are different for Go and Boost, I would be happy to know how either of them deals with this situation 🙂
I'll give you a quick sketch of one possible implementation.
First, assume most stack frames are smaller than some size. For ones that are larger, we can use a longer instruction sequence at entry to make sure there is enough stack space. Let's assume we're on an architecture that has 4k pages and we're choosing 4k - 1 as the maximum size stack frame handled by the fast path.
The stack is allocated with a single guard page at the bottom. That is, a page that is not mapped for write. At function entry, the stack pointer is decremented by the stack frame size, which is less than the size of a page, and then the program arranges to write a value at the lowest address in the newly allocated stack frame. If the end of the stack has been reached, this write will cause a processor exception and ultimately be turned into some sort of upcall from the OS to the user program -- e.g. a signal in UNIX family OSes.
The signal handler (or equivalent) has to be able to determine this is a stack extension fault from the address of the instruction that faulted and the address it was writing to. This is determinable as the instruction is in the prolog of a function and the address being written to is in the guard page of the stack for the current thread. The instruction being in the prolog can be recognized by requiring a very specific pattern of instructions at the start of functions, or possibly by maintaining metadata about functions. (Possibly using traceback tables.)
At this point the handler can allocate a new stack block, set the stack pointer to the top of the block, do something to handle unchaining the stack block, and then call the function that faulted again. This second call is safe because the fault is in the function prolog the compiler generated and no side effects are allowed before validating there is enough stack space. (The code may also need to fixup the return address for architectures that push it onto the stack automatically. If the return address is in a register, it just needs to be in the same register when the second call is made.)
Likely the easiest way to handle unchaining is to push a small stack frame onto the new extension block for a routine that when returned to unchains the new stack block and frees the allocated memory. It then returns the processor registers to the state they were in when the call was made that caused the stack to need to be extended.
The advantage of this design is that the function entry sequence is very few instructions and is very fast in the non-extending case. The disadvantage is that when the stack does need to be extended, the processor incurs an exception, which may cost much, much more than a function call.
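To make this concrete, here is a minimal sketch of the guard-page mechanism on Linux, in C++ (the four-page "stack", the addresses, and the bare-bones handler are illustrative; error checking is omitted, and a real runtime would extend the stack instead of exiting):

#include <cstddef>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

// A real runtime would check the faulting address against the guard page
// and extend the stack; here we just report the fault and exit.
static void on_fault(int, siginfo_t *, void *) {
    const char msg[] = "write hit the guard page\n";
    write(2, msg, sizeof msg - 1);
    _exit(1);
}

int main() {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));

    // Reserve a four-page "stack" and make the lowest page inaccessible.
    char *stk = static_cast<char *>(mmap(nullptr, 4 * page,
                                         PROT_READ | PROT_WRITE,
                                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
    mprotect(stk, page, PROT_NONE); // the guard page at the bottom

    struct sigaction sa = {};
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, nullptr);

    stk[page] = 1;     // lowest writable byte of the stack: fine
    stk[page - 1] = 1; // one byte lower, inside the guard page: faults
    return 0;
}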
Go doesn't actually use a guard page if I understand correctly. Rather the function prolog explicitly checks the stack limit and if the new stack frame won't fit it calls a function to extend the stack.
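In C++, the shape of that prolog check looks roughly like this (stack_limit and morestack are hypothetical stand-ins for state and routines the real runtime provides):

#include <cstdint>
#include <cstdio>

// Hypothetical per-thread stack limit; a real runtime keeps this in thread
// state that the generated code can reach cheaply.
static std::uintptr_t stack_limit;

// Stand-in for the runtime's grow-the-stack routine, which would allocate
// (or move to) a bigger stack and retry the function that did not fit.
static void morestack() {
    std::puts("frame does not fit: the runtime would extend the stack here");
}

static void callee() {
    char frame[256]; // rough stand-in for this call's stack frame
    if (reinterpret_cast<std::uintptr_t>(frame) < stack_limit)
        morestack(); // the compiler-inserted prolog check
    frame[0] = 0;    // ... the function body proper ...
}

int main() {
    char anchor;
    // Pretend only 128 more bytes of stack may be used below main's frame,
    // so callee's 256-byte frame should trip the check.
    stack_limit = reinterpret_cast<std::uintptr_t>(&anchor) - 128;
    callee();
    return 0;
}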
Go 1.3 changed its design to not use a linked list of stack blocks. This is to avoid the trap cost if the extension boundary is crossed in both directions many times in a certain calling pattern. They start with a small stack, and use a similar mechanism to detect the need for extension. But when a stack extension fault does occur, the entire stack is moved to a larger block. This removes the need for unchaining entirely.
There are quite a few details glossed over here. (E.g. one may not be able to do the stack extension in the signal handler itself. Rather the handler can arrange to have the thread suspended and hand it off to a manager thread. One likely has to use a dedicated signal stack to handle the signal as well.)
Another common pattern with this sort of thing is the runtime requiring there to be a certain amount of valid stack space below the current stack frame for either something like a signal handler or for calling special routines in the runtime. Go works this way and the stack limit test guarantees a certain fixed amount of stack space is available below the current frame. One can e.g. call plain C functions on the stack so long as one guarantees they do not consume more than the fixed stack reserve amount. (One can use this to call C library routines in theory, though most of these have no formal specification of how much stack they might use.)
Dynamic allocation in the stack frame, such as alloca or stack allocated variable length arrays, add some complexity to the implementation. If the routine can compute the entire final size of the frame in the prolog then it is fairly straightforward. Any increase in the frame size while the routine is running likely has to be modeled as a new call, though with Go's new architecture that allows moving the stack, it is possible the alloca point in the routine can be made such that all the state allows a stack move to happen there.
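For instance, assuming glibc's alloca.h, a single alloca makes the final frame size depend on a runtime value, so no fixed-size prolog check can cover it up front:

#include <alloca.h>
#include <cstddef>
#include <cstring>

void dynamic_frame(std::size_t n) {
    // The frame grows by n bytes at runtime; the prolog cannot know n in
    // advance, so the runtime must recheck the stack limit here, much as
    // it would for a fresh call.
    char *buf = static_cast<char *>(alloca(n));
    std::memset(buf, 0, n);
}

int main() {
    dynamic_frame(100);
    return 0;
}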

Is it ok to allocate lots of memory on stack in single threaded applications?

I understand that if you have a multithreaded application and you need to allocate a lot of memory, then you should allocate it on the heap. Stack space is divided up amongst the threads of your application, so the size of each thread's stack gets smaller as you create new threads. Thus, if you tried to allocate lots of memory on the stack, it could overflow. But assuming you have a single-threaded application, is the stack size essentially the same as that of the heap?
I read elsewhere that stack and heap don't have a clearly defined boundary in the address space, rather that they grow into each other.
P.S. The lifetime of the objects being allocated is not an issue. The objects get created first thing in the program and get cleaned up at exit. I don't have to worry about them going out of scope and thus getting cleaned off the stack.
No, stack size is not the same as heap size. Stack objects are pushed/popped in a LIFO manner, and the stack is used for things such as program flow. For example, arguments are "pushed" onto the stack before a function call, then "popped" into function arguments to be accessed. Recursion therefore uses a lot of stack space if you go too deep. The heap is really for pointers and allocated memory. In the real world, the stack is like the gears in your clock, and the heap is like your desk. Your clock sits on your desk because it takes up room - but you use it for something completely different than your desk.
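To make that concrete, a small C++ illustration (the sizes are arbitrary):

#include <vector>

int main() {
    int small[16];                    // 64 bytes of locals: fine on the stack
    // int big[10'000'000];           // ~40 MB of locals: likely a stack overflow
    std::vector<int> big(10'000'000); // same 40 MB, but the elements live on the heap
    small[0] = big[0];
    return small[0];
}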
Check out this question on Stack Overflow:
Why is memory split up into stack and heap?

What does stack size in a thread define in C++?

I'm using C++ and Windows.h in my source code. I read the CreateThread API documentation on MSDN, but I still don't understand the essence of specifying the stack size. By default it is 1 MB. But what will happen if I specify 32 bytes?
What does stack size in a thread define?
Please provide a thorough explanation and I'll appreciate it. Thanks.
The stack is used to store local variables, pass parameters in function calls, and store return addresses. A thread's stack has a fixed size, which is determined when the thread is created. That is the value you are referring to.
The stack size is determined when the thread is created since it needs to occupy contiguous address space. That means that the entire address space for the thread's stack has to be reserved at the point of creating the thread.
If the stack is too small then it can overflow. That's an error condition known as stack overflow, from which this website took its name. When you call a function some or all of the following happens:
Parameters are pushed onto the stack.
The return address is pushed onto the stack.
A stack frame containing space for the function's local variables is created.
All of this consumes space from the stack. When the function in turn calls another function, more stack space is consumed. As the call stack goes deeper, more stack space is required.
The consequence therefore of setting the stack size too low is that you can exhaust the stack and overflow it. That is a terminal condition from which you cannot recover. Certainly 32 bytes (rounded up to one page which is 4096 bytes) is too small for almost all threads.
If you have a program with a lot of threads, and you know that the threads don't need a 1 MB stack reservation each, then there can be benefits to using a smaller stack size. Doing so can avoid exhausting the available process address space.
On the other hand you might have a program with a single thread that has deep call stacks that consume large amounts of stack space. In this scenario you might reserve more than the default 1MB.
However, unless you have strong reason to do otherwise, it is likely best to stick to the default stack size.
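For example, here is a sketch of passing an explicit stack size to CreateThread (64 KiB is an arbitrary choice; as noted above, a request of 32 bytes would simply be rounded up):

#include <windows.h>
#include <cstdio>

static DWORD WINAPI worker(LPVOID) {
    std::puts("running on a thread with a 64 KiB stack reservation");
    return 0;
}

int main() {
    // The second parameter is dwStackSize; the flag makes it the reserved
    // size of the stack rather than the initially committed size.
    HANDLE h = CreateThread(nullptr, 64 * 1024, worker, nullptr,
                            STACK_SIZE_PARAM_IS_A_RESERVATION, nullptr);
    if (h != nullptr) {
        WaitForSingleObject(h, INFINITE);
        CloseHandle(h);
    }
    return 0;
}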
Stack size is a tradeoff between the ability to create many threads and the possibility of a stack overflow in one of them.
The larger the stack size, the fewer threads you can create, but the lower the chance of a stack overflow. You should worry about stack size only if you are going to create many threads (you will have to lower the stack size, but remember the risk of stack overflow). Otherwise the default value suffices.
But what will happen if I specify 32 bytes?
I have not read the Windows documentation, but if Windows allows you to specify only 32 bytes, you will most likely get a stack overflow. According to the documentation, the value is rounded up to the page size in any case, so in reality your stack size will be at least the size of a page. The created thread assumes that there is enough stack space for it to use (for allocating automatic variables, storing function addresses, etc.) and allocates space according to its needs. When there is not enough stack space, the stack allocator may use invalid memory, overwriting memory used elsewhere.
What does stack size in a thread define?
It defines how much memory will be allocated for use by that thread's stack.
There is a good description of what exactly a thread call stack is here

When is a C++ program's used stack size determined?

I know the maximum stack size is usually fixed at link time (maybe that is how it works on Windows).
But I don't know when the stack size a program actually uses (not the maximum stack size, just the used size) is fixed as far as the OS is concerned. At compile time? Link time? Execution?
Like this:
int main(){ int a[10]; return 0;}
The program uses just 10 * sizeof(int) of stack, so is the stack size fixed?
Above all: does the heap size change when malloc or free is called?
Stack size is not explicitly provided to the OS when a program is loaded. Instead, the OS uses the mechanism of page faults (if it is supported by the MMU).
If you try to access memory that has not yet been granted by the operating system, the MMU generates a page fault, which is handled by the OS. The OS checks the faulting address and either expands the stack by mapping a new memory page or, if you have exhausted the stack limit, handles it as a stack overflow.
Consider the following program running on x86-64 Linux:
void foo(void) {
    volatile int a = 10;
    foo();
}

int main() {
    foo();
}
It faults because of infinite recursion and stack overflow; it would actually require an infinite stack to complete. When the program is loaded, the OS allocates the initial stack and writes its address to %rsp (the stack pointer). Let's look at the disassembly of foo():
push   %rbp
mov    %rsp,%rbp         <--- save stack pointer into %rbp (the frame pointer)
sub    $0x10,%rsp        <--- move stack pointer down by 16 bytes
movl   $0xa,-0x4(%rbp)   <--- write memory at %rbp - 4 (the volatile int)
callq  0x400500 <foo>
leaveq
retq
Each call of foo() actually consumes 32 bytes of stack: 8 for the return address pushed by callq, 8 for the saved %rbp, and 16 from the sub. So after at most 4096 / 32 = 128 calls of foo(), you will cross a page boundary by writing to memory below X - 4096, where X is the initial %rsp value. A page fault is then generated, and the OS provides a new memory page for the stack, allowing the program to utilize it.
After roughly 256k calls of foo() (8 MiB, the default Linux stack ulimit, divided by 32 bytes per call), the OS will detect that the application is using too many stack pages and send SIGSEGV to it.
In an answer to a question I provided the following information:
The BSS/DATA segment contains all the global variables, initialized to a specific value or to zero by default. This segment is part of the executable image. At load time, the heap segment is added to this; however, it is not a "segment" but just the amount of extra data to be allocated as an extension of the loaded BSS/DATA segment. In the same way, the stack "segment" is not a true segment but is added to the BSS+heap segment. The stack grows down whereas the heap grows up. If these overlap (more heap used and stack still growing), an "out of memory" error occurs (heap) or a "stack overflow" (stack) - this may be detected with the use of segment registers (Intel) to trigger a hardware-generated exception, or by using software checks.
This is the traditional way of laying out the segments. Think of older Intel chips where all program data had to fit in 64 KB. With more modern chips the same layout is often used: an address space of, say, 32 MB is laid out this way, but only the physical memory actually required is used. The stack can thus be pretty big.

Is the stack corrupted if the EBP frame pointer is NULL?

My understanding of stack traces is essentially based on What is exactly the base pointer and stack pointer? To what do they point?.
A program I have been helping to develop for years spits out a stack dump when it crashes, and I have become accustomed to evaluating these stack traces, in correspondence with a .map file that the C++ compiler produces. A number of times, I have successfully been able to walk the stack and debug issues.
However, sometimes the stack trace has a NULL EBP (frame) pointer. Here is the relevant snippet from such a sample stack dump:
Initial EBP pointer value: 04d8fab0
{at address 04d8fab0: 00000000}
As you can see, the value of the EBP frame pointer is NULL. Therefore, I cannot walk the stack.
Is this the sign of a corrupted stack, or is there another possible explanation?
As you can see, the value of the EBP frame pointer is NULL. Therefore, I cannot walk the stack. Is this the sign of a corrupted stack, or is there another possible explanation?
I think there is another explanation, rooted in the fact that, in addition to holding the address of the current stack frame, the EBP register can also be used like a general-purpose register. In order to do that safely, two things are required:
Store its current content on the stack by executing
PUSH EBP
Restore the content after the general-purpose usage, and before exiting the current procedure, by executing
POP EBP
So I was thinking that the case you were experiencing was not necessarily caused by stack corruption: it may simply be that the dump was generated while the EBP register was temporarily being used as a general-purpose register somewhere else in the process's code, maybe not even code you've written.
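A sketch of what such code can look like (illustrative x86 assembly, not taken from any particular compiler):

PUSH EBP            ; step 1: save the caller's EBP
XOR  EBP, EBP       ; EBP now holds scratch data; here it happens to be zero
; ... code that uses EBP as an extra general-purpose register ...
POP  EBP            ; step 2: restore EBP before returning

If the crash dump is taken between the PUSH and the POP, the EBP register holds whatever scratch value the code put there (possibly NULL, as in your dump), so the frame-pointer chain cannot be walked from the register even though the stack memory itself may be perfectly intact.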