Visual Studio 2013 C++ projects have a /GS switch that enables buffer security check validation at runtime. We are encountering many more STATUS_STACK_BUFFER_OVERRUN errors since upgrading to VS 2013, and suspect this has something to do with improved buffer overrun checking in the new compiler. I've been trying to verify this and to better understand how buffer overruns are detected. I'm befuddled by the fact that a buffer overrun is reported even when the memory updated by a statement only changes the contents of another local variable on the stack in the same scope! So it must be checking not only that the write doesn't corrupt memory not "owned" by any local variable, but also that it doesn't touch any local variable other than the one referenced by the statement. How does this work? Has it changed since VS 2010?
Edit:
Here's an example illustrating a case that Mysticial's explanation doesn't cover:
#include <stdio.h>
#include <tchar.h>
void TestFunc1();
int _tmain(int argc, _TCHAR* argv[])
{
    TestFunc1();
    return 0;
}
void TestFunc1()
{
    char buffer1[4] = ("123");
    char buffer2[4] = ("456");
    int diff = buffer1 - buffer2;
    printf("%d\n", diff);
    getchar();
    buffer2[4] = '\0'; // deliberately writes one element past the end of buffer2
}
The output is 4, indicating that the memory about to be overwritten lies within the bounds of buffer1 (immediately after buffer2), but the program then terminates with a buffer overrun. Technically it should be considered a buffer overrun, but I don't know how it's being detected, since the write stays within the local variables' storage and doesn't corrupt anything outside them.
This screenshot of the memory layout proves it. After stepping one more line, the program aborted with the buffer overrun error.
I just tried the same code in VS 2010: debug mode caught the buffer overrun (with a buffer offset of 12), but release mode did not (with a buffer offset of 8). So I think VS 2013 tightened the behavior of the /GS switch.
Edit 2:
I managed to sneak past even VS 2013 range checking with this code. It still did not detect that an attempt to update one local variable actually updated another:
#include <stdio.h>
void TestFunc()
{
    char buffer1[4] = "123";
    char buffer2[4] = "456";
    int diff;
    if (buffer1 < buffer2)
    {
        puts("Sequence 1,2");
        diff = buffer2 - buffer1;
    }
    else
    {
        puts("Sequence 2,1");
        diff = buffer1 - buffer2;
    }
    printf("Offset: %d\n", diff);
    // Depending on the layout, one of these two writes lands in the other array.
    switch (getchar())
    {
    case '1':
        puts("Updating buffer 1");
        buffer1[diff] = '!';
        break;
    case '2':
        puts("Updating buffer 2");
        buffer2[diff] = '!';
        break;
    }
    getchar(); // Eat enter keypress
    printf("%s,%s\n", buffer1, buffer2);
}
You are seeing an improvement to the /GS mechanism, first added in VS2012. Originally /GS could detect buffer overflows, but there was still a loophole where attacking code can stomp the stack yet bypass the cookie. Roughly like this:
void foo(int index, char value) {
char buf[256];
buf[index] = value;
}
If the attacker can manipulate the value of index then the cookie doesn't help. This code is now rewritten to:
void foo(int index, char value) {
char buf[256];
if (index >= 256) __report_rangefailure();
buf[index] = value;
}
It's just plain index checking, which, when triggered, instantly terminates the app with __fastfail() if no debugger is attached.
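To make the shape of that check concrete, here is a portable C++ sketch of a bounds-checked store. It is not what MSVC literally emits: checked_store() and report_range_failure() are hypothetical names standing in for the compiler-generated check and __report_rangefailure()/__fastfail(), and the failure path here simply prints a message and aborts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for MSVC's internal __report_rangefailure():
// here it just reports and aborts the process.
[[noreturn]] void report_range_failure() {
    std::fputs("range check failed\n", stderr);
    std::abort();
}

// Mirrors the check described above: the index is validated against the
// array size *before* the store, so an out-of-range write never happens.
template <std::size_t N>
void checked_store(char (&buf)[N], std::size_t index, char value) {
    if (index >= N) report_range_failure();
    buf[index] = value;
}
```

Because the index parameter is a std::size_t, a negative int passed in wraps to a huge value and is rejected by the same single comparison, which is a common trick for making one compare cover both ends of the range.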
From the MSDN page on /GS in Visual Studio 2013:
Security Checks
On functions that the compiler recognizes as subject to buffer overrun problems, the compiler allocates space on the stack before the return address. On function entry, the allocated space is loaded with a security cookie that is computed once at module load. On function exit, and during frame unwinding on 64-bit operating systems, a helper function is called to make sure that the value of the cookie is still the same. A different value indicates that an overwrite of the stack may have occurred. If a different value is detected, the process is terminated.
For more details, the same page refers to Compiler Security Checks In Depth:
What /GS Does
The /GS switch provides a "speed bump," or cookie, between the buffer and the return address. If an overflow writes over the return address, it will have to overwrite the cookie put in between it and the buffer, resulting in a new stack layout:
Function parameters
Function return address
Frame pointer
Cookie
Exception Handler frame
Locally declared variables and buffers
Callee save registers
The cookie will be examined in more detail later. The function's execution does change with these security checks. First, when a function is called, the first instructions to execute are in the function’s prolog. At a minimum, a prolog allocates space for the local variables on the stack, such as the following instruction:
sub esp, 20h
This instruction sets aside 32 bytes for use by local variables in the function. When the function is compiled with /GS, the function's prolog will set aside an additional four bytes and add three more instructions as follows:
sub esp,24h
mov eax,dword ptr [___security_cookie (408040h)]
xor eax,dword ptr [esp+24h]
mov dword ptr [esp+20h],eax
The prolog contains an instruction that fetches a copy of the cookie, followed by an instruction that does a logical xor of the cookie and the return address, and then finally an instruction that stores the cookie on the stack directly below the return address. From this point forward, the function will execute as it does normally. When a function returns, the last thing to execute is the function’s epilog, which is the opposite of the prolog. Without security checks, it will reclaim the stack space and return, such as the following instructions:
add esp,20h
ret
When compiled with /GS, the security checks are also placed in the epilog:
mov ecx,dword ptr [esp+20h]
xor ecx,dword ptr [esp+24h]
add esp,24h
jmp __security_check_cookie (4010B2h)
The stack's copy of the cookie is retrieved and then follows with the XOR instruction with the return address. The ECX register should contain a value that matches the original cookie stored in the __security_cookie variable. The stack space is then reclaimed, and then, instead of executing the RET instruction, the JMP instruction to the __security_check_cookie routine is executed.
The __security_check_cookie routine is straightforward: if the cookie was unchanged, it executes the RET instruction and ends the function call. If the cookie fails to match, the routine calls report_failure. The report_failure function then calls __security_error_handler(_SECERR_BUFFER_OVERRUN, NULL). Both functions are defined in the seccook.c file of the C run-time (CRT) source files.
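The prolog/epilog logic above can be modelled in portable C++ to see why a linear overflow is caught. This is only a toy model under stated assumptions: the "frame" is a plain byte array laid out as buffer, cookie, return address; the cookie value and return address are made-up placeholders (a real cookie is randomized at module load); and ToyFrame, prolog(), unchecked_copy() and epilog_cookie_ok() are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Placeholder module-wide cookie (the real one is randomized at load time).
static const std::uint64_t g_security_cookie = 0x00002B992DDFA232ULL;

// Toy frame: 16-byte local buffer, then the cookie slot, then the saved
// return address, all in one byte array so an "overflow" stays well-defined.
struct ToyFrame {
    static constexpr std::size_t kBuf = 0;
    static constexpr std::size_t kCookie = 16;
    static constexpr std::size_t kRet = 24;
    unsigned char bytes[32];
};

// Prolog: store cookie XOR return address just below the return address.
void prolog(ToyFrame& f, std::uint64_t return_address) {
    std::memcpy(f.bytes + ToyFrame::kRet, &return_address, 8);
    const std::uint64_t stored = g_security_cookie ^ return_address;
    std::memcpy(f.bytes + ToyFrame::kCookie, &stored, 8);
}

// An unchecked copy into the local buffer, like a careless strcpy/memcpy.
void unchecked_copy(ToyFrame& f, const unsigned char* src, std::size_t len) {
    assert(len <= sizeof f.bytes);  // keeps the toy itself free of real UB
    std::memcpy(f.bytes + ToyFrame::kBuf, src, len);
}

// Epilog: XOR the stored value with the return address again and compare.
bool epilog_cookie_ok(const ToyFrame& f) {
    std::uint64_t stored = 0, ret = 0;
    std::memcpy(&stored, f.bytes + ToyFrame::kCookie, 8);
    std::memcpy(&ret, f.bytes + ToyFrame::kRet, 8);
    return (stored ^ ret) == g_security_cookie;
}
```

The model also shows a limit of the cookie: a write that stays inside the frame but lands in a different local, as in the question's examples, never reaches the cookie, so this mechanism alone cannot flag it.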
We all know that the stack grows downward, so it seems a straightforward assumption that the address of the last declared variable is the smallest address in use on the stack, and therefore that this address tells us how much stack is left.
So I did that, and I got a humongous address: 0x000000dc9354f540 = 947364623680. We know that the stack grows downward, and we know that it can't go lower than 0.
So, a bit of math:
947364623680 / (1024*1024*1024) = 882.302060425
Does this imply that I have 882 GB of stack on my machine?!
I tested it, and of course I get a stack overflow exception after allocating an additional 2 MB on the stack:
uint8_t array[1024*1024*2] = {};
So here is my question: what is going on, and how can I get my actual stack size?
Since your question has the tag "visual-studio-debugging", I assume you use Windows.
First you should get the current stack pointer. Either take the address of a local dummy variable (as you do now), read esp/rsp with raw asm, or get the CPU registers via the Win32 API call GetThreadContext.
Now, to find out the available stack size, you can use VirtualQuery to get the starting address of this virtual memory region (the allocation base address). Subtracting the two pointers gives you the remaining stack size (precise up to the size of the current stack frame).
A long time ago I wrote an article about this subject, including querying the currently allocated/reserved stack size. You can find more info there if you want.
Do they imply that i have 882Gb of stack on my machine?!
It has nothing to do with the "stack on your machine". It's about virtual address space, which has nothing to do with the physical storage (RAM + page files) available in the system.
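The VirtualQuery approach is Windows-specific. For comparison, here is a sketch of the same idea on Linux with glibc, swapping in pthread_getattr_np() and pthread_attr_getstack() (pthread_getattr_np is a GNU extension, not portable POSIX) to find the low end of the current thread's stack. StackInfo and current_stack_info() are hypothetical names, and the result has the same "one stack frame" precision.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <pthread.h>

struct StackInfo {
    std::size_t total;      // size of this thread's stack region
    std::size_t remaining;  // approx. bytes between here and the low end
};

// Linux/glibc analogue of the VirtualQuery technique: get the stack bounds
// from the threading library, then subtract the low end from the address of
// a local variable (the stack grows downward, toward the low end).
bool current_stack_info(StackInfo& info) {
    pthread_attr_t attr;
    if (pthread_getattr_np(pthread_self(), &attr) != 0) return false;
    void* low = nullptr;
    std::size_t size = 0;
    const int rc = pthread_attr_getstack(&attr, &low, &size);
    pthread_attr_destroy(&attr);
    if (rc != 0) return false;

    char marker = 0;  // lives in the current stack frame
    const auto here = reinterpret_cast<std::uintptr_t>(&marker);
    const auto base = reinterpret_cast<std::uintptr_t>(low);
    info.total = size;
    info.remaining = here - base;
    return true;
}
```

With older glibc versions you may need to link with -pthread for pthread_getattr_np to resolve.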
Another approach to get an approximate value of the stack space left at any given point in a Win32 application is something like the following function. It uses structured exception handling to catch the stack overflow exception.
Note: @valdo's solution is the correct one. I'm posting this answer because it's an interesting way to solve the problem. It's going to be very slow because its runtime is linear in the stack size, as opposed to the constant runtime of @valdo's solution.
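#include <windows.h>
#include <malloc.h>   // _resetstkoflw
#include <cstddef>
#include <cstdint>
static uint64_t GetAvailableStackSpace()
{
    volatile uint8_t var;
    volatile uint8_t* addr = &var;
    volatile uint8_t sink;
    ptrdiff_t reached = 0;
    auto filter = [](unsigned int code) -> int
    {
        return (code == EXCEPTION_STACK_OVERFLOW) ? EXCEPTION_EXECUTE_HANDLER : EXCEPTION_CONTINUE_SEARCH;
    };
    __try
    {
        // Probe downward 1 KB at a time until the stack overflow exception is raised.
        while (true)
        {
            addr = addr - 1024;
            sink = *addr;
        }
    }
    __except (filter(GetExceptionCode()))
    {
        reached = &var - addr;
    }
    // The overflow consumed the stack guard page; restore it so that a later
    // stack overflow is detected again.
    _resetstkoflw();
    return (uint64_t)reached;
}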
This is an implementation of the VirtualQuery technique mentioned by @valdo.
This function returns an approximate number of bytes of stack available. I tested this on Windows x64.
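#include <windows.h>
#include <cstdint>
static uint64_t GetAvailableStackSpace()
{
    volatile uint8_t var;
    MEMORY_BASIC_INFORMATION mbi;
    auto virtualQuerySuccess = VirtualQuery((LPCVOID)&var, &mbi, sizeof(mbi));
    if (!virtualQuerySuccess)
    {
        return 0;
    }
    // AllocationBase is the lowest address reserved for this thread's stack.
    // Convert both pointers to integers, since arithmetic on void* is not allowed.
    return reinterpret_cast<uintptr_t>(&var) - reinterpret_cast<uintptr_t>(mbi.AllocationBase);
}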
I'm trying to perform a kind of timing attack on a Java Card. I need a way to measure the time elapsed between sending a command and getting the answer. I'm using the winscard.h interface and the language is C++. I created a wrapper around the winscard.h interface to make my work easier. For example, to send an APDU I'm now using this code, which seems to work.
Based on this answer, I updated my code:
byte pbRecvBuffer[258];
long rv;
if (this->sessionHandle >= this->internal.vSessions.size())
throw new SmartCardException("There is no card inserted");
SCARD_IO_REQUEST pioRecvPci;
pioRecvPci.dwProtocol = (this->internal.vSessions)[sessionHandle].dwActiveProtocol;
pioRecvPci.cbPciLength = sizeof(pioRecvPci);
LPSCARD_IO_REQUEST pioSendPci;
if ((this->internal.vSessions)[sessionHandle].dwActiveProtocol == SCARD_PROTOCOL_T1)
pioSendPci = (LPSCARD_IO_REQUEST)SCARD_PCI_T1;
else
pioSendPci = (LPSCARD_IO_REQUEST)SCARD_PCI_T0;
word expected_length = 258;//apdu.getExpectedLen();
word send_length = apdu.getApduLength();
CardSession session = (this->internal.vSessions).operator[](sessionHandle);
byte * data = const_cast<Apdu&>(apdu).getNonConstantData();
auto start = Timer::now();
rv = SCardTransmit(session.hCard, pioSendPci,data,
send_length, &pioRecvPci, pbRecvBuffer,&expected_length);
auto end = Timer::now();
auto duration = (float)(end - start) / Timer::ticks();
return *new ApduResponse(pbRecvBuffer, expected_length,duration);
class Timer
{
public:
static inline int ticks()
{
LARGE_INTEGER ticks;
QueryPerformanceFrequency(&ticks);
return ticks.LowPart;
}
static inline __int64 now()
{
struct { __int32 low, high; } counter;
__asm cpuid
__asm push EDX
__asm rdtsc
__asm mov counter.low, EAX
__asm mov counter.high, EDX
__asm pop EDX
__asm pop EAX
return *(__int64 *)(&counter);
}
};
My code fails with the error "The value of ESP was not properly saved across a function call. This is usually a result of calling a function declared with one calling convention with a function pointer declared with a different calling convention." My guess is that the rdtsc instruction is not supported by my Intel processor. I have an Intel Broadwell 5500U.
I'm looking for a proper way to do this kind of measurement and, eventually, to get more accurate results.
The error message that you provided
The value of ESP was not properly saved across a function call. This
is usually a result of calling a function declared with one calling
convention with a function pointer declared with a different calling
convention.
indicates a mistake in the inline assembly function that you call. Assuming the default calling convention is used when calling it, the function is fundamentally flawed: cpuid overwrites ebx, which is a callee-saved register. Furthermore, it pushes only one value onto the stack but pops two: the second pop (most likely) removes the function's return address or the base pointer saved as part of the stack frame. As a result, the function fails when it executes ret, since it has no valid address to return to, or the runtime detects that the new value of esp (which is restored from the value at the beginning of the function) is simply invalid. This has nothing to do with the CPU that you're using, since all x86 CPUs since the Pentium support RDTSC. However, on some CPUs the rate at which the counter ticks depends on the CPU's current speed state, which is why using the instruction directly is discouraged and OS facilities should be favoured over it: they compensate for the different implementations of the instruction on various steppings.
Since you're using C++11 (judging by the use of auto), use std::chrono to measure time intervals. If that doesn't work for some reason, use the facilities provided by your OS (this looks like Windows, so QueryPerformanceCounter is probably the one to use). If that still doesn't satisfy you, you can emit rdtsc with the __rdtsc intrinsic function and not worry about inline assembly.
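A minimal sketch of the std::chrono suggestion, assuming a C++11 compiler: time_call_us() is a hypothetical helper that wraps any callable, such as a lambda around the SCardTransmit call, and reports the elapsed real time measured with std::chrono::steady_clock in microseconds. Whether steady_clock is fine-grained enough depends on the standard library; older MSVC versions (before VS 2015) reportedly had a coarse implementation, in which case QueryPerformanceCounter is the fallback.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Times a single call of `fn` and returns the elapsed time in microseconds.
// steady_clock is monotonic, so it is not affected by wall-clock adjustments.
template <typename Fn>
double time_call_us(Fn&& fn) {
    using steady = std::chrono::steady_clock;
    const auto start = steady::now();
    std::forward<Fn>(fn)();
    const auto end = steady::now();
    return std::chrono::duration<double, std::micro>(end - start).count();
}
```

Usage with the code above would look like double us = time_call_us([&] { rv = SCardTransmit(session.hCard, pioSendPci, data, send_length, &pioRecvPci, pbRecvBuffer, &expected_length); }); which keeps the measured region limited to the transmit call itself.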
For example:
In the file demo.c,
#include <stdio.h>
int a = 5;
int main(){
int b=5;
int c=a;
printf("%d", b+c);
return 0;
}
For int a = 5, does the compiler translate this into something like "store 0x5 at a virtual memory address in the const area, for example 0x0000000f", so that int c = a is translated into something like movl 0x0000000f, %eax?
Then for int b = 5, the number 5 is not put into the const area, but is translated directly into an immediate in the assembly instruction, like mov $0x5, %ebx.
It depends. Your program has several constants:
int a = 5;
This is a "static" initialization (which happens when the program text and data are loaded, before the program runs). The value is stored in the memory reserved for a, which is in a read-write data "program section". If something changes a, the value 5 is lost.
int b=5;
This is a local variable with limited scope (only within main()). Its storage could well be a CPU register or a location on the stack. For most architectures, the generated instructions will place the value 5 in an instruction as "immediate data"; an x86 example:
mov eax, 5
The ability of instructions to hold arbitrary constants is limited. Small constants are supported by most CPU instructions. "Large" constants are often not directly supported (x86 is relatively generous here, but fixed-width RISC encodings are not). In that case the compiler stores the constant in memory and loads it instead. For example:
.psect rodata
k1 dd 3141592653
.psect code
mov eax, k1
The ARM family has a powerful design for loading many constants directly: any 8-bit constant value can be rotated right by any even number of bits (see page 2-25 of the ARM documentation).
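The ARM rule above is easy to check mechanically. The sketch below tests whether a 32-bit value fits the classic A32 data-processing immediate form (an 8-bit value rotated right by 0, 2, ..., 30 bits). is_arm_immediate() and rotl32() are hypothetical helper names, and other ARM encodings such as Thumb-2 modified immediates or movw/movt are ignored.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32-bit value left by r bits (r in 0..31).
static std::uint32_t rotl32(std::uint32_t v, unsigned r) {
    return r == 0 ? v : (v << r) | (v >> (32 - r));
}

// True if value == imm8 rotated right by an even amount, for some 8-bit imm8.
// Rotating left by the same amount undoes the right rotation, so we just
// check whether any even left rotation brings the value down to 8 bits.
bool is_arm_immediate(std::uint32_t value) {
    for (unsigned rot = 0; rot < 32; rot += 2) {
        if (rotl32(value, rot) <= 0xFFu) return true;
    }
    return false;
}
```

For example, 0x104 and 0xFF000000 are encodable, while 0x101, 0x1FE (which would need an odd rotation) and 3141592653 (the k1 constant above) are not, so a compiler targeting this encoding needs another instruction sequence or a load from memory for those.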
One not-as-obvious but totally different item is in the statement:
printf("%d", b+c);
The string literal "%d" is an array of three char ('%', 'd' and the terminating '\0'), and modifying it is undefined behavior (in C++ its type is const char[3]). Most modern implementations store it in read-only memory, so an attempt to change it causes a segmentation fault, a memory-protection error that usually aborts the program immediately.
.psect rodata
s1 db '%', 'd', 0
.psect code
mov eax, s1
push eax
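A quick way to confirm the type of that format string in C++ is a pair of compile-time checks. This is a sketch: format_string() is a hypothetical helper, and the checks only say what type the language gives the literal, not where a particular toolchain places its bytes.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// A string literal is an lvalue, so decltype yields a reference to its
// array type: const char[3] holds '%', 'd' and the terminating '\0'.
// (In C the type is char[3], but writing to it is still undefined behavior.)
static_assert(std::is_same<decltype("%d"), const char (&)[3]>::value,
              "\"%d\" is an lvalue of type const char[3]");
static_assert(sizeof("%d") == 3, "two characters plus the terminator");

// Pointing at the literal is fine; modifying it through a cast is not.
const char* format_string() { return "%d"; }
```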
In OP's program, a is an "initialized" "global". I expect that it is placed in the initialized part of the data segment. See https://en.wikipedia.org/wiki/File:Program_memory_layout.pdf, http://www.cs.uleth.ca/~holzmann/C/system/memorylayout.gif (from more info on Memory layout of an executable program (process)). The location of a is decided by the compiler-linker duo.
On the other hand, being automatic (stack) variables, b and c are expected in the stack segment.
That being said, the compiler/linker has the liberty to perform any optimization as long as the observable behavior is not changed (see What exactly is the "as-if" rule?). For example, if a is never referenced, it may be optimized out completely.
I'm trying to debug a rather large program with many variables. The code is setup in this way:
while (condition1) {
//non timing sensitive code
while (condition2) {
//timing sensitive code
//many variables that change each iteration
}
}
I have many variables on the inner loop that I want to save for viewing. I want to write them to a text file each outer loop iteration. The inner loop executes a different number of times each iteration. It can be just 2 or 3, or it can be several thousands.
I need to see all the variables values from each inner iteration, but I need to keep the inner loop as fast as possible.
Originally, I tried just storing each data variable in its own vector where I just appended a value at each inner loop iteration. Then, when the outer loop iteration came, I would read from the vectors and write the data to a debug file. This quickly got out of hand as variables were added.
I thought about using a string buffer to store the information, but I'm not sure if this is the fastest way given strings would need to be created multiple times within the loop. Also, since I don't know the number of iterations, I'm not sure how large the buffer would grow.
The information would be stored in a format such as:
"Var x: 10\n
Var y: 20\n
.
.
.
Other Text: Stuff\n"
So, is there a cleaner option for writing large amounts of debug data quickly?
If it's really time-sensitive, then don't format strings inside the critical loop.
I'd go for appending records to a log buffer of binary records inside the critical loop. The outer loop can either write that directly to a binary file (which can be processed later), or format text based on the records.
This has the advantage that the loop only needs to track a couple extra variables (pointers to the end of used and allocated space of one std::vector), rather than two pointers for a std::vector for every variable being logged. This will have much lower impact on register allocation in the critical loop.
In my testing, it looks like you just get a bit of extra loop overhead to track the vector, and a store instruction for every variable you want to log. I didn't write a big enough test loop to expose any potential problems from keeping all the variables "alive" until the emplace_back(). If the compiler does a bad job with bigger loops where it needs to spill registers, see the section below about using a simple array without any size checking. That should remove any constraint on the compiler that makes it try to do all the stores into the log buffer at the same time.
Here's an example of what I'm suggesting. It compiles and runs, writing a binary log file which you can hexdump.
See the source and asm output with nice formatting on the Godbolt compiler explorer. It can even colourize source and asm lines so you can more easily see which asm comes from which source line.
#include <vector>
#include <cstdint>
#include <cstddef>
#include <iostream>
struct loop_log {
// Generally sort in order of size for better packing.
// Use as narrow types as possible to reduce memory bandwidth.
// e.g. logging an int loop counter into a short log record is fine if you're sure it always in-practice fits in a short, and has zero performance downside
int64_t x, y, z;
uint64_t ux, uy, uz;
int32_t a, b, c;
uint16_t t, i, j;
uint8_t c1, c2, c3;
// isn't there a less-repetitive way to write this?
loop_log(int64_t x, int32_t a, int outer_counter, char c1)
: x(x), a(a), i(outer_counter), c1(c1)
// leaves other members *uninitialized*, not zeroed.
// note lack of gcc warning for initializing uint16_t i from an int
// and for not mentioning every member
{}
};
static constexpr size_t initial_reserve = 10000;
// take some args so gcc can't count the iterations at compile time
void foo(std::ostream &logfile, int outer_iterations, int inner_param) {
std::vector<struct loop_log> log;
log.reserve(initial_reserve);
int outer_counter = outer_iterations;
while (--outer_counter) {
//non timing sensitive code
int32_t a = inner_param - outer_counter;
while (a != 0) {
//timing sensitive code
a <<= 1;
int64_t x = outer_counter * (100LL + a);
char c1 = x;
// much more efficient code with gcc 5.3 -O3 than push_back( a struct literal );
log.emplace_back(x, a, outer_counter, c1);
}
const auto logdata = log.data();
const size_t bytes = log.size() * sizeof(*logdata);
// write group size, then a group of records
logfile.write( reinterpret_cast<const char *>(&bytes), sizeof(bytes) );
logfile.write( reinterpret_cast<const char *>(logdata), bytes );
// you could format the records into strings at this point if you want
log.clear();
}
}
#include <fstream>
int main() {
std::ofstream logfile("dbg.log", std::ios::binary); // binary mode: raw records, no newline translation
foo(logfile, 100, 10);
}
gcc's output for foo() pretty much optimizes away all the vector overhead. As long as the initial reserve() is big enough, the inner loop is just:
## gcc 5.3 -masm=intel -O3 -march=haswell -std=gnu++11 -fverbose-asm
## The inner loop from the above C++:
.L59:
test rbx, rbx # log // IDK why gcc wants to check for a NULL pointer inside the hot loop, instead of doing it once after reserve() calls new()
je .L6 #,
mov QWORD PTR [rbx], rbp # log_53->x, x // emplace_back the 4 elements
mov DWORD PTR [rbx+48], r12d # log_53->a, a
mov WORD PTR [rbx+62], r15w # log_53->i, outer_counter
mov BYTE PTR [rbx+66], bpl # log_53->c1, x
.L6:
add rbx, 72 # log, // struct size is 72B
mov r8, r13 # D.44727, log
test r12d, r12d # a
je .L58 #, // a != 0
.L4:
add r12d, r12d # a // a <<= 1
movsx rbp, r12d # D.44726, a // x = ...
add rbp, 100 # D.44726, // x = ...
imul rbp, QWORD PTR [rsp+8] # x, %sfp // x = ...
cmp r14, rbx # log$D40277$_M_impl$_M_end_of_storage, log
jne .L59 #, // stay in this tight loop as long as we don't run out of reserved space in the vector
// fall through into code that allocates more space and copies.
// gcc generates pretty lame copy code, using 8B integer loads/stores, not rep movsq. Clang uses AVX to copy 32B at a time
// anyway, that code never runs as long as the reserve is big enough
// I guess std::vector doesn't try to realloc() to avoid the copy if possible (e.g. if the following virtual address region is unused) :/
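Since foo() writes each group as a size_t byte count followed by raw records, the file can be processed later by reading it back the same way. A minimal sketch, assuming the reader is built with the same compiler and struct layout as the writer (the file is a raw memory dump, not a portable format); Rec and read_group() are hypothetical names, with Rec standing in for loop_log.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical record type; it must match the writer's struct exactly.
struct Rec {
    std::int64_t x;
    std::int32_t a;
    std::uint16_t i;
};

// Reads one group: a size_t byte count, then that many bytes of Rec.
// Returns false at end of stream or on a malformed group.
bool read_group(std::istream& in, std::vector<Rec>& out) {
    std::size_t bytes = 0;
    if (!in.read(reinterpret_cast<char*>(&bytes), sizeof bytes)) return false;  // EOF or truncated header
    if (bytes % sizeof(Rec) != 0) return false;  // not a whole number of records
    out.resize(bytes / sizeof(Rec));
    if (bytes == 0) return true;                 // empty group: inner loop never ran
    return static_cast<bool>(in.read(reinterpret_cast<char*>(out.data()),
                                     static_cast<std::streamsize>(bytes)));
}
```

To read the real file, open it with std::ifstream in("dbg.log", std::ios::binary); and call read_group(in, recs) in a loop until it returns false, formatting each group as text if you like.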
An attempt to avoid repetitive constructor code:
I tried a version that uses a braced initializer list to avoid having to write a really repetitive constructor, but got much worse code from gcc:
#ifdef USE_CONSTRUCTOR
// much more efficient code with gcc 5.3 -O3.
log.emplace_back(x, a, outer_counter, c1);
#else
// Put the mapping from local var names to struct member names right here in with the loop
log.push_back( (struct loop_log) {
.x = x, .y =0, .z=0, // C99 designated-initializers are a GNU extension to C++,
.ux=0, .uy=0, .uz=0, // but gcc doesn't support leaving uninitialized elements before the last initialized one:
.a = a, .b=0, .c=0, // without all the ...=0, you get "sorry, unimplemented: non-trivial designated initializers not supported"
.t=0, .i = outer_counter, .j=0,
.c1 = (uint8_t)c1
} );
#endif
This unfortunately stores a struct onto the stack and then copies it 8B at a time with code like:
mov rax, QWORD PTR [rsp+72]
mov QWORD PTR [rdx+8], rax // rdx points into the vector's buffer
mov rax, QWORD PTR [rsp+80]
mov QWORD PTR [rdx+16], rax
... // total of 9 loads/stores for a 72B struct
So it will have more impact on the inner loop.
There are a few ways to push_back() a struct into a vector, but using a braced-initializer-list unfortunately seems to always result in a copy that doesn't get optimized away by gcc 5.3. It would be nice to avoid writing a lot of repetitive code for a constructor. And with designated initializer lists ({.x = val}), the code inside the loop wouldn't have to care much about what order the struct actually stores things in. You could just write them in easy-to-read order.
BTW, the .x = val C99 designated-initializer syntax is a GNU extension in C++ (C++20 later standardized a restricted form that requires declaration order). Also, you can get warnings for forgetting to initialize a member in a braced list with gcc's -Wextra (which enables -Wmissing-field-initializers).
For more on syntax for initializers, have a look at Brace-enclosed initializer list constructor and the docs for member initialization.
This was a fun but terrible idea:
// Doesn't compile. Worse: hard to read, probably easy to screw up
while (outerloop) {
int64_t x=0, y=1;
struct loop_log {int64_t logx=x, logy=y;}; // loop vars as default initializers
// error: default initializers can't be local vars with automatic storage.
while (innerloop) { x+=y; y+=x; log.emplace_back(loop_log()); }
}
Lower overhead from using a flat array instead of a std::vector
Perhaps trying to get the compiler to optimize away any kind of std::vector operation is less good than just making a big array of structs (static, local, or dynamic) and keeping a count yourself of how many records are valid. std::vector checks to see if you've used up the reserved space on every iteration, but you don't need anything like that if there is a fixed upper bound you can use to allocate enough space to never overflow. (Depending on the platform and how you allocate the space, a big chunk of memory that's allocated but never written isn't really a problem. E.g., on Linux, malloc uses mmap(MAP_ANONYMOUS) for big allocations, and that gives you pages that are all copy-on-write mapped to a zeroed physical page. The OS doesn't need to allocate physical pages until you write them. The same should apply to a large static array.)
So in your loop, you could just have code like
loop_log *current_record = logbuf;
while(inner_loop) {
int64_t x = ...;
current_record->x = x;
...
current_record->i = (short)outer_counter;
...
// or maybe
// *current_record = { .x = x, .i = (short)outer_counter };
// compilers will probably have an easier time avoiding any copying with a braced initializer list in this case than with vector.push_back
current_record++;
}
size_t record_bytes = (current_record - logbuf) * sizeof(logbuf[0]);
// or size_t record_bytes = reinterpret_cast<char*>(current_record) - reinterpret_cast<char*>(logbuf);
logfile.write((const char*)logbuf, record_bytes);
Scattering the stores throughout the inner loop will require the array pointer to be live all the time, but OTOH doesn't require all the loop variables to be live at the same time. IDK if gcc would optimize an emplace_back to store each variable into the vector once the variable was no longer needed, or if it might spill variables to the stack and then copy them all into the vector in one group of instructions.
Using log[records++].x = ... might lead to the compiler keeping the array and counter tying up two registers, since we'd use the record count in the outer loop. We want the inner loop to be fast, and can take the time to do the subtraction in the outer loop, so I wrote it with pointer increments to encourage the compiler to only use one register for that piece of state. Besides register pressure, base+index store instructions are less efficient on Intel SnB-family hardware than single-register addressing modes.
You could still use a std::vector for this, but it's hard to get std::vector not to write zeroes into memory it allocates. reserve() just allocates without zeroing, but calling .data() and using the reserved space without telling the vector about it via .resize() kind of defeats the purpose (and is undefined behavior). And of course .resize() will initialize all the new elements. So std::vector is a bad choice for getting your hands on a large allocation without dirtying it.
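Putting the flat-array idea together, here is a self-contained sketch. Rec, make_log_buffer() and run_and_log() are hypothetical names, the arithmetic in the loop is placeholder "work", and it assumes you know an upper bound on inner iterations so the hot loop needs no capacity check. new Rec[n] default-initializes a trivial type, so no zeroing pass touches the memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <ostream>
#include <sstream>

// Hypothetical record type; log whatever fields you need.
struct Rec {
    std::int64_t x;
    std::int32_t a;
    std::uint16_t i;
};

// new Rec[n] (unlike std::make_unique<Rec[]>(n)) leaves trivial members
// uninitialized, so allocating the buffer does not write to it.
std::unique_ptr<Rec[]> make_log_buffer(std::size_t n) {
    return std::unique_ptr<Rec[]>(new Rec[n]);
}

std::size_t run_and_log(std::ostream& out, int outer_counter, int inner_iters,
                        Rec* logbuf, std::size_t capacity) {
    assert(static_cast<std::size_t>(inner_iters) <= capacity);  // checked once, outside the hot loop
    Rec* cur = logbuf;  // the only logging state carried through the loop
    for (int k = 0; k < inner_iters; ++k) {
        const std::int64_t x = static_cast<std::int64_t>(outer_counter) * (100 + k);  // placeholder work
        cur->x = x;
        cur->a = k;
        cur->i = static_cast<std::uint16_t>(outer_counter);
        ++cur;  // pointer bump, no per-iteration capacity check
    }
    const std::size_t n = static_cast<std::size_t>(cur - logbuf);
    out.write(reinterpret_cast<const char*>(logbuf), static_cast<std::streamsize>(n * sizeof(Rec)));
    return n;
}
```

The subtraction cur - logbuf happens once per outer iteration, which matches the reasoning above about keeping only one pointer live in the inner loop.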
It sounds like what you really want is to look at your program from within a debugger. You haven't specified a platform, but if you build with debug information (-g with gcc or clang), you should be able to step through the loop when starting the program from within the debugger (gdb on Linux). Assuming you are on Linux, tell it to break at the beginning of the function (break followed by the function name) and then run. If you tell the debugger to display all the variables you want to see after each step or breakpoint hit, you'll get to the bottom of your problem in no time.
Regarding performance: unless you do something fancy like set conditional breakpoints or watch memory, running the program through the debugger will not dramatically affect perf as long as the program is not stopped. You may need to turn down the optimization level to get meaningful information though.
I wrote a recursive function. While testing it, I found that the program is killed for no obvious reason while the recursion is still running.
To test this, I wrote an infinite recursion.
On my PC this function quits after about 2 seconds and the last output is about 327400.
The last number isn't always the same.
I am using Ubuntu Lucid Lynx, the GCC compiler and Eclipse as IDE. If somebody has an idea what the problem is and how I can prevent the program from exiting I would be really pleased.
#include <iostream>
void rek(double x){
std::cout << x << std::endl;
rek(x + 1);
}
int main(int argc, char **argv) {
rek(1);
}
You are most likely overflowing the stack, at which point your program will be summarily killed. The depth of the stack will always limit the amount you can recurse, and if you are hitting that limit, it means your algorithm needs to change.
I think you are right in expecting the code to run forever, as explained in
How do I check if gcc is performing tail-recursion optimization?
your code should be able to run forever if gcc is performing tail-call optimization. On my machine, it looks like -O3 actually makes gcc generate tail calls and flatten the stack. :-)
I suggest you set the optimization flag to -O2 or -O3.
You are causing a stack overflow (running out of stack space) because you don't provide an exit condition.
void rek(double x){
if(x > 10)
return;
std::cout << x << std::endl;
rek(x + 1);
}
Are you expecting this to work forever?
It won't. At some point you're going to run out of stack.
This is funny, talking about stack overflow on stackoverflow.com. ;) The call stack is limited (you can customize its size in the project settings), but at some point, with infinite recursive calls, the limit will be exceeded and your program terminated.
If you want to avoid a stack overflow with infinite recursion, you're unfortunately going to have to delve into some assembly in order to change the stack so that a new activation record isn't constantly pushed onto it, which after some point will cause the overflow. Because you make the recursive call at the end of the function, this is called a tail-call optimization in languages where recursion is popular (e.g., Lisp, Scheme, Haskell). It prevents a stack overflow by essentially transforming the tail call into a loop. It would look something like this in C (note: I'm using inline assembly with gcc on 32-bit x86, and I changed your argument from double to int to simplify the assembly. I've also switched from C++ to C to avoid name mangling of function names. Finally, the "\n\t" at the end of each statement is not an actual assembly command, but it is needed for inline assembly in gcc):
#include <stdio.h>
void rek(int x)
{
printf("Value for x: %d\n", x);
//we now duplicate the equivalent of `rek(x+1);` with tail-call optimization
__asm("movl 8(%ebp), %eax\n\t" //get the value of x off the stack
"incl %eax\n\t" //add 1 to the value of x
"movl 4(%ebp), %ecx\n\t" //load our return address into ecx
"movl (%ebp), %edx\n\t" //load the caller's saved base pointer into edx
"addl $12, %ebp\n\t" //point just past this frame's argument, discarding the activation record
"movl %ebp, %esp\n\t" //reset the stack pointer
"pushl %eax\n\t" //push the new value of x on the stack for the function call
"pushl %ecx\n\t" //push the return address back (rek() eventually returns to main())
"movl %edx, %ebp\n\t" //restore the caller's stack base pointer
"jmp rek\n\t"); //jump to the start of rek()
}
int main()
{
rek(1);
printf("Finished call\n"); //<== we never get here
return 0;
}
Compiled with gcc 4.4.3 on Ubuntu 10.04, this ran pretty much "forever" in an infinite loop with no stack overflow, whereas without the tail-call optimization it crashed with a segmentation fault pretty quickly. You can see from the comments in the __asm section how the stack activation record space is being "recycled" so that each new call does not use up space on the stack. This involves saving the key values in the old activation record (the previous caller's activation record base pointer and the return address), and restoring them, but with the arguments changed for the next recursive call to the function.
And again, other languages, mainly functional languages, perform tail-call optimization as a base-feature of the language. So a tail-call recursive function in Scheme/Lisp/etc. won't overflow the stack since this type of stack manipulation is done under-the-hood for you when a new function call is made as the last statement of an existing function.
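To see what tail-call optimization does without any inline assembly, compare a tail-recursive function with the loop it is equivalent to. This is a sketch with hypothetical names (count_up_rec(), count_up_loop()); C and C++ do not guarantee the optimization, so only the loop version is safe for arbitrarily large limits, although gcc at -O2 usually compiles the recursive one into a loop as well.

```cpp
#include <cassert>
#include <cstdint>

// Tail-recursive form: the recursive call is the last thing the function does,
// and it has an exit condition, unlike rek() in the question.
std::uint64_t count_up_rec(std::uint64_t x, std::uint64_t limit) {
    if (x >= limit) return x;           // exit condition
    return count_up_rec(x + 1, limit);  // tail call
}

// The same computation written the way tail-call optimization rewrites it:
// update the parameter in place and jump back to the top instead of calling.
std::uint64_t count_up_loop(std::uint64_t x, std::uint64_t limit) {
    while (x < limit) {
        x = x + 1;
    }
    return x;
}
```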
Well, you have written an infinite recursion that overflows the stack, which kills your app. If you really want to print all numbers, use a loop.
#include <iostream>
int main()
{
    double x = 1;
    while (true)
    {
        std::cout << x << std::endl;
        x += 1;
    }
}
Each recursive method should implement an exit condition, otherwise you will get stack overflow and the program will terminate.
In your case, there is no condition on the parameter you pass to the function; hence it recurses forever and eventually crashes.