I want to do DFS on a 100 x 100 array (say the elements of the array represent graph nodes). Assuming the worst case, the depth of recursive function calls can go up to 10,000, with each call taking up to, say, 20 bytes. Is this feasible, i.e., is there a possibility of a stack overflow?
What is the maximum size of stack in C/C++?
Please specify for gcc for both
1) cygwin on Windows
2) Unix
What are the general limits?
In Visual Studio the default stack size is 1 MB, I think, so with a recursion depth of 10,000, each stack frame can be at most ~100 bytes, which should be sufficient for a DFS algorithm.
Most compilers, including Visual Studio, let you specify the stack size. On some (all?) Linux flavours the stack size isn't part of the executable but a process limit set in the OS. You can check the stack size with ulimit -s and set it to a new value with, for example, ulimit -s 16384.
Here's a link with default stack sizes for gcc.
DFS without recursion (made concrete here with an adjacency list; the visited set is needed so cyclic graphs don't loop forever):
#include <stack>
#include <vector>

// Returns true if `target` is reachable from `start`; nodes are numbered 0..n-1.
bool dfs(const std::vector<std::vector<int>>& adj, int start, int target)
{
    std::vector<bool> visited(adj.size(), false);
    std::stack<int> nodes;
    nodes.push(start);
    while (!nodes.empty()) {
        int top = nodes.top();          // current node
        nodes.pop();
        if (top == target) return true; // found what we are looking for
        if (visited[top]) continue;
        visited[top] = true;
        for (int next : adj[top])       // outgoing nodes from top
            nodes.push(next);
    }
    return false;
}
Stacks for threads are often smaller.
You can change the default at link time, or at run time.
For reference, some defaults are:
glibc i386, x86_64: 7.4 MB
Tru64 5.1: 5.2 MB
Cygwin: 1.8 MB
Solaris 7..10: 1 MB
MacOS X 10.5: 460 KB
AIX 5: 98 KB
OpenBSD 4.0: 64 KB
HP-UX 11: 16 KB
Platform-dependent, toolchain-dependent, ulimit-dependent, parameter-dependent.... It is not at all specified, and there are many static and dynamic properties that can influence it.
Yes, there is a possibility of stack overflow. The C and C++ standards do not dictate things like stack depth; those are generally an environmental issue.
Most decent development environments and/or operating systems will let you tailor the stack size of a process, either at link or load time.
You should specify which OS and development environment you're using for more targeted assistance.
For example, under Ubuntu Karmic Koala, the default for gcc is 2M reserved and 4K committed but this can be changed when you link the program. Use the --stack option of ld to do that.
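A sketch of how that might look when driven through gcc (syntax assumed here; note that GNU ld documents --stack for PE/Windows targets such as Cygwin and MinGW, so verify it against your platform's ld manual):

gcc -Wl,--stack,16777216 -o myprog myprog.c    # request a 16 MiB stack reserve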
I just ran out of stack at work. It was a database that was running some threads; basically, the previous developer had thrown a big array onto the stack, and the stack was small anyway. The software was compiled using Microsoft Visual Studio 2015.
Even though the thread had run out of stack, it silently failed and continued on; it only overflowed when it came to access the contents of the data on the stack.
The best advice I can give is not to declare arrays on the stack, especially in complex applications and particularly in threads; use the heap instead. That's what it's there for ;)
Also keep in mind that it may not fail immediately when the stack is declared, but only on access. My guess is that the compiler allocates stack under Windows "optimistically", i.e. it assumes that the stack has been reserved and is sufficiently sized until it comes to use it, and then finds out that the stack isn't there.
Different operating systems may have different stack declaration policies. Please leave a comment if you know what these policies are.
I am not sure what you mean by doing a depth first search on a rectangular array, but I assume you know what you are doing.
If the stack limit is a problem you should be able to convert your recursive solution into an iterative solution that pushes intermediate values onto a stack which is allocated from the heap.
(Added 26 Sept. 2020)
On 24 Oct. 2009, as @pixelbeat first pointed out here, Bruno Haible empirically discovered the following default thread stack sizes for several systems. He said that in a multithreaded program, "the default thread stack size is" as follows. I added the "Actual" size column because @Peter Cordes indicates in his comments below my answer that the odd tested numbers shown below do not include all of the thread stack, since some of it was used in initialization. If I run ulimit -s to see "the maximum stack size" that my Linux computer is configured for, it outputs 8192 kB, which is exactly 8 MB, not the odd 7.4 MB listed in the table below for my x86-64 computer with the gcc compiler and glibc. So, you can probably add a little to the numbers in the table below to get the actual full stack size for a given thread.
Note also that the below "Tested" column units are all in MB and KB (base 1000 numbers), NOT MiB and KiB (base 1024 numbers). I've proven this to myself by verifying the 7.4 MB case.
Thread stack sizes
System and std library Tested Actual
---------------------- ------ ------
- glibc i386, x86_64 7.4 MB 8 MiB (8192 KiB, as shown by `ulimit -s`)
- Tru64 5.1 5.2 MB ?
- Cygwin 1.8 MB ?
- Solaris 7..10 1 MB ?
- MacOS X 10.5 460 KB ?
- AIX 5 98 KB ?
- OpenBSD 4.0 64 KB ?
- HP-UX 11 16 KB ?
Bruno Haible also stated that:
32 KB is more than you can safely allocate on the stack in a multithreaded program
And he said:
And the default stack size for sigaltstack, SIGSTKSZ, is
only 16 KB on some platforms: IRIX, OSF/1, Haiku.
only 8 KB on some platforms: glibc, NetBSD, OpenBSD, HP-UX, Solaris.
only 4 KB on some platforms: AIX.
Bruno
He wrote the following simple Linux C program to empirically determine the above values. You can run it on your system today to quickly see what your maximum thread stack size is, or you can run it online on GDBOnline here: https://onlinegdb.com/rkO9JnaHD.
Explanation: It simply creates a single new thread, so as to check the thread stack size and NOT the program stack size, in case they differ. That thread then repeatedly allocates 128 bytes of memory on the stack (NOT the heap) using the Linux alloca() call, writes a 0 to the first byte of each new memory block, and prints out how many total bytes it has allocated. It repeats this process, allocating 128 more bytes on the stack each time, until the program crashes with a Segmentation fault (core dumped) error. The last value printed is the estimated maximum thread stack size allowed for your system.
Important note: alloca() allocates on the stack. Even though this looks like dynamic memory allocation onto the heap, similar to a malloc() call, alloca() does NOT allocate onto the heap. Rather, alloca() is a specialized function (not standard C, but widely available on Linux and other systems) that "pseudo-dynamically" (I'm not sure what else to call it, so that's the term I chose) allocates directly onto the stack as though it were statically-allocated memory. Stack memory used and returned by alloca() is scoped at the function level, and is therefore "automatically freed when the function that called alloca() returns to its caller." That's why memory allocated by alloca() is NOT freed each time a for loop iteration completes and the end of the for loop scope is reached: the allocation persists until the function returns. See man 3 alloca for details. Here's the pertinent quote (emphasis added):
DESCRIPTION
The alloca() function allocates size bytes of space in the stack frame of the caller. This temporary space is automatically freed when the function that called alloca() returns to its caller.
RETURN VALUE
The alloca() function returns a pointer to the beginning of the allocated space. If the allocation causes stack overflow, program behavior is undefined.
Here is Bruno Haible's program from 24 Oct. 2009, copied directly from the GNU mailing list here:
Again, you can run it live online here.
// By Bruno Haible
// 24 Oct. 2009
// Source: https://lists.gnu.org/archive/html/bug-coreutils/2009-10/msg00262.html
// =============== Program for determining the default thread stack size =========
#include <alloca.h>
#include <pthread.h>
#include <stdio.h>
void* threadfunc(void* p)
{
    int n = 0;
    for (;;) {
        printf("Allocated %d bytes\n", n);
        fflush(stdout);
        n += 128;
        *((volatile char *) alloca(128)) = 0;
    }
}

int main()
{
    pthread_t thread;
    pthread_create(&thread, NULL, threadfunc, NULL);
    for (;;) {}
}
When I run it on GDBOnline using the link above, I get the exact same results each time I run it, as both a C and a C++17 program. It takes about 10 seconds or so to run. Here are the last several lines of the output:
Allocated 7449856 bytes
Allocated 7449984 bytes
Allocated 7450112 bytes
Allocated 7450240 bytes
Allocated 7450368 bytes
Allocated 7450496 bytes
Allocated 7450624 bytes
Allocated 7450752 bytes
Allocated 7450880 bytes
Segmentation fault (core dumped)
So, the thread stack size is ~7.45 MB for this system, as Bruno mentioned above (7.4 MB).
I've made a few changes to the program, mostly just for clarity, but also for efficiency, and a bit for learning.
Summary of my changes:
[learning] I passed in BYTES_TO_ALLOCATE_EACH_LOOP as an argument to the threadfunc() just for practice passing in and using generic void* arguments in C.
Note: This is also the required function prototype, as required by the pthread_create() function, for the callback function (threadfunc() in my case) passed to pthread_create(). See: https://www.man7.org/linux/man-pages/man3/pthread_create.3.html.
[efficiency] I made the main thread sleep instead of wastefully spinning.
[clarity] I added more-verbose variable names, such as BYTES_TO_ALLOCATE_EACH_LOOP and bytes_allocated.
[clarity] I changed this:
*((volatile char *) alloca(128)) = 0;
to this:
volatile uint8_t * byte_buff =
(volatile uint8_t *)alloca(BYTES_TO_ALLOCATE_EACH_LOOP);
byte_buff[0] = 0;
Here is my modified test program, which does exactly the same thing as Bruno's, and even has the same results:
You can run it online here, or download it from my repo here. If you choose to run it locally from my repo, here are the build and run commands I used for testing:
Build and run it as a C program:
mkdir -p bin && \
gcc -Wall -Werror -g3 -O3 -std=c11 -pthread -o bin/tmp \
onlinegdb--empirically_determine_max_thread_stack_size_GS_version.c && \
time bin/tmp
Build and run it as a C++ program:
mkdir -p bin && \
g++ -Wall -Werror -g3 -O3 -std=c++17 -pthread -o bin/tmp \
onlinegdb--empirically_determine_max_thread_stack_size_GS_version.c && \
time bin/tmp
It takes < 0.5 seconds to run locally on a fast computer with a thread stack size of ~7.4 MB.
Here's the program:
// =============== Program for determining the default thread stack size =========
// Modified by Gabriel Staples, 26 Sept. 2020
// Originally by Bruno Haible
// 24 Oct. 2009
// Source: https://lists.gnu.org/archive/html/bug-coreutils/2009-10/msg00262.html
#include <alloca.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h> // sleep
/// Thread function to repeatedly allocate memory within a thread, printing
/// the total memory allocated each time, until the program crashes. The last
/// value printed before the crash indicates how big a thread's stack size is.
///
/// Note: passing in a `uint32_t` as a `void *` type here is for practice,
/// to learn how to pass in ANY type to a func by using a `void *` parameter.
/// This is also the required function prototype, as required by the
/// `pthread_create()` function, for the callback function (this function)
/// passed to `pthread_create()`. See:
/// https://www.man7.org/linux/man-pages/man3/pthread_create.3.html
void* threadfunc(void* bytes_to_allocate_each_loop)
{
    const uint32_t BYTES_TO_ALLOCATE_EACH_LOOP =
            *(uint32_t*)bytes_to_allocate_each_loop;

    uint32_t bytes_allocated = 0;
    while (true)
    {
        printf("bytes_allocated = %u\n", bytes_allocated);
        fflush(stdout);
        // NB: it appears that you don't necessarily need `volatile` here,
        // but you DO definitely need to actually use (ex: write to) the
        // memory allocated by `alloca()`, as we do below, or else the
        // `alloca()` call does seem to get optimized out on some systems,
        // making this whole program just run infinitely forever without
        // ever hitting the expected segmentation fault.
        volatile uint8_t * byte_buff =
                (volatile uint8_t *)alloca(BYTES_TO_ALLOCATE_EACH_LOOP);
        byte_buff[0] = 0;

        bytes_allocated += BYTES_TO_ALLOCATE_EACH_LOOP;
    }
}

int main()
{
    const uint32_t BYTES_TO_ALLOCATE_EACH_LOOP = 128;

    pthread_t thread;
    pthread_create(&thread, NULL, threadfunc,
                   (void*)(&BYTES_TO_ALLOCATE_EACH_LOOP));

    while (true)
    {
        const unsigned int SLEEP_SEC = 10000;
        sleep(SLEEP_SEC);
    }

    return 0;
}
Sample output (same results as Bruno Haible's original program):
bytes_allocated = 7450240
bytes_allocated = 7450368
bytes_allocated = 7450496
bytes_allocated = 7450624
bytes_allocated = 7450752
bytes_allocated = 7450880
Segmentation fault (core dumped)
For my application I need to declare a big std::array in global memory. Its total size is about 1 GB. So I declared a global variable like this:
#include <array>

std::array<char, 1000000000> BigGlobal;

int main()
{
    //Do stuff with BigGlobal
}
The code compiles fine. When I run the application I am getting the error message:
The application was unable to start correctly (0xc0000018). Click OK to close the application
I am using Visual Studio 2017. I am aware of the fact that there is an MSVC linker option for the stack reserve size, but it is only relevant for local variables, not for global variables. Can you please help me fix the issue?
C++ compilers are full of limits - some make it into the standard, some don't.
Common limits include a size limit on the length of variable names, the number of times a function can call itself (directly or indirectly), the maximum size of memory grabbed by a variable with automatic storage duration and so on.
You've hit upon another limit with your use of std::array.
A sensible workaround in your case could be to use a std::vector as the type for the global, then resize that vector in the first statement of main. Of course this assumes there is no use of the global variable prior to program control reaching main; if there is, move the initialization somewhere more explicit.
tl;dr
The solution is to increase the stack size. e.g., when using Visual Studio 2022, pass in /STACK:<reserve>[,commit]; details here: https://learn.microsoft.com/en-us/cpp/build/reference/stack-stack-allocations?view=msvc-170&viewFallbackFrom=vs-2017.
Clarification: this does not mean that the global object is created on the stack. It's likely due to stack space needed for temporary objects created to initialize the object's contents; see the details below for the case where a std::array (whose size is known at compile time) is used instead of a std::vector. The std::array is wholly allocated in the data segment, IIRC. The internal storage for the vector is heap-allocated, but the vector instance itself is in the data segment; the additional stack space is needed in the case of the vector for the temporaries.
Example with Details
The issue is that even globals have to be initialized by some thread. In the case of MSVC, from what I can tell, it's done by the main thread. If we have a global vector of 564,168 64-bit prime numbers declared something like this:
std::vector<int64_t> knownInt24PrimesVector = {
2, 3, 5, 7, 11,
13, 17, 19, 23, 29,
31, 37, 41, 43, 47,
...
...
8388461, 8388473, 8388539, 8388547, 8388571,
8388581, 8388587, 8388593,
};
The program compiles fine, but when it is run, it will crash. It shows in the debugger as:
Unhandled exception at 0x00007FF620824107 in isprime.exe: 0xC00000FD: Stack overflow (parameters: 0x0000000000000001, 0x0000002F19603000).
The call stack looks like this:
isprime.exe!__chkstk() Line 109
> isprime.exe!`dynamic initializer for 'knownInt24PrimesVector''() Line 112842
[External Code]
The threads window in the debugger shows this:
Not Flagged > 16252 0 Main Thread Main Thread isprime.exe!__chkstk
Not Flagged 22980 0 Worker Thread ntdll.dll thread ntdll.dll!00007ffefdb30b14
Not Flagged 13112 0 Worker Thread ntdll.dll thread ntdll.dll!00007ffefdb30b14
Not Flagged 5580 0 Worker Thread ntdll.dll thread ntdll.dll!00007ffefdb30b14
As can be seen from the call stack and the threads window, the huge (~4.5MB) vector is getting initialized by Main Thread.
The default stack size is 1MB for programs compiled with MSVC. dumpbin shows this (the 100000 is in hexadecimal even though it doesn't have the 0x prefix):
> dumpbin.exe /headers .\isprime.exe | findstr /ispn stack
46: 100000 size of stack reserve
47: 1000 size of stack commit
Since the data is ~4.5MB in size, increasing the stack size via /STACK:0x00480000 fixes the problem. dumpbin of the rebuilt program shows the stack reserve as 0x480000, which is 4,718,592.
> dumpbin.exe /headers .\isprime.exe | findstr /ispn stack
46: 480000 size of stack reserve
47: 1000 size of stack commit
Note that with the Visual Studio 2022 using the v143 toolchain, using a std::array<int64_t, 564168> did not require the stack size to be changed from the default value, which is expected since the array size is known at compile time.
Similarly, neither did constructing the vector passing in the array::cbegin() and array::cend() iterators require a stack size increase.
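For illustration, a sketch of that construction (the names and the truncated initializer here are hypothetical, not from the original project):

#include <array>
#include <cstdint>
#include <vector>

// The array lives in the data segment; the vector copies from it during
// dynamic initialization without building a huge temporary on the stack.
static const std::array<int64_t, 564168> knownPrimesArray = { 2, 3, 5 /* ... */ };
std::vector<int64_t> knownInt24PrimesVector(knownPrimesArray.cbegin(),
                                            knownPrimesArray.cend());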
Also, more of an FYI: Using the v141 toolchain (which VS 2017 ships with), the compiler crashed (no additional info other than a message to contact tech support) trying to compile the file.
According to Does std::array<> guarantee allocation on the stack only?
std::array is allocated on the stack, not the heap, so it is a bad idea to use it if you need a big chunk of memory
I would use a std::vector and do dynamic allocation.
This can be done as follows:
#include <vector>

static std::vector<char> BigGlobal;

int main()
{
    // one-time init: can be done anywhere.
    if (BigGlobal.empty())
    {
        BigGlobal.resize(1000000000);
    }

    //Do stuff with BigGlobal
}
I can create this array:
int Array[490000000];
cout << "Array Byte= " << sizeof(Array) << endl;
sizeof reports 1,960,000,000 bytes, which is about 1.96 GB, roughly 2 GB. But I can't create both of these at the same time:
int Array[490000000];
int Array2[490000000];
It gives an error. Why? (Sorry for the bad English :))
Also, I checked my compiler like this:
printf("%d\n", sizeof(char *));
It gives me 8.
C++ programs are not usually compiled to have 2 GB+ of stack space, regardless of whether they are compiled in 32-bit mode or 64-bit mode. Stack space can be increased as part of the compiler options, but even in the scenario where it is permissible to set the stack size that high, it's still not an idiomatic solution or recommended.
If you need an array of 2 GB, you should use std::vector<int> Array(490'000'000); (strongly recommended) or a manually created array, i.e. int* Array = new int[490'000'000]; (remember that manually allocated memory must be manually deallocated with delete[]), either of which will allocate dynamic memory. You'll still want to be compiling in 64-bit mode, since this will brush up against the maximum memory limit of your application if you don't, but in your scenario it's not strictly necessary, since 2 GB is less than the maximum memory of a 32-bit application.
But still I can't use more than 2 GB :( why?
The C++ language does not have semantics to modify (nor report) how much automatic memory is available (or at least I have not seen it.) The compilers rely on the OS to provide some 'useful' amount. You will have to search (google? your hw documents, user's manuals, etc) for how much. This limit is 'machine' dependent, in that some machines do not have as much memory as you may want.
On Ubuntu, for the last few releases, the POSIX function pthread_attr_getstacksize(...) reports 8 MB per thread. What Linux calls the 'stack' (I am not sure of the proper terminology) is the resource that the C++ compiler uses for automatic memory. For this release of OS and compiler, the limit for automatic variables is thus 8 MB (much smaller than 2 GB).
I suppose that because the next machine might have more memory, the compiler might be given a bigger automatic memory pool, and I've seen no semantics that limit the size of your array based on the memory size of the machine performing the compile, so there can be no compile-time report that the stack will overflow.
I see POSIX has a function for adjusting the size of the stack. I've not tried it.
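For reference, a minimal sketch of that POSIX call (untested here, as I said; error checking omitted):

#include <pthread.h>

void* threadfunc(void* arg)   // placeholder for work needing a deep stack
{
    return NULL;
}

int main()
{
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, 64 * 1024 * 1024);  // request a 64 MiB stack
    pthread_t t;
    pthread_create(&t, &attr, threadfunc, NULL);
    pthread_join(t, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}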
I have also found Ubuntu commands that can report and adjust size of various memory issues.
From https://www.nics.tennessee.edu/:
The command to modify limits varies by shell. The C shell (csh) and
its derivatives (such as tcsh) use the limit command to modify limits.
The Bourne shell (sh) and its derivatives (such as ksh and bash) use
the ulimit command. The syntax for these commands varies slightly and
is shown below. More detailed information can be found in the man page
for the shell you are using.
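The commands themselves look roughly like this (a sketch; sizes are in kilobytes on most systems, so check your shell's man page):

% limit stacksize 131072      # csh/tcsh: set the stack limit to 128 MB
$ ulimit -s 131072            # sh/ksh/bash: the same thing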
One minor experiment: at the command prompt,
$ dtb_chimes
launches this work-in-progress app, which uses POSIX and reports an 8 MB stack (for automatic variables).
With the ulimit prefix command
$ ulimit -S -s 131072 ; dtb_chimes
the app now reports 134,217,728
./dtb_chimes_ut
default Stack size: 134,217,728
argc: 1
1 ./dtb_chimes_ut
But I have not confirmed the actual allocation ... and this is still a lot smaller than 1.96 GBytes ... but, maybe you can get there.
Note: I strongly recommend std::vector versus big array.
On my Ubuntu desktop, there is 4 GByte total dram (I have memory test utilities), and my dynamic memory is limited to about 3.5 GB. Again, the amount of dynamic memory is machine dependent.
64 bits address a lot more memory than I can afford.
I was doing a problem where I used a recursive function to create a segment tree. For larger values it started giving a segmentation fault. At first I thought it might be because of an array index out of bounds, but later I thought it might be because of the program stack growing too big.
I wrote this code to count the maximum number of recursive calls allowed before the system gives a seg-fault.
#include <iostream>
using namespace std;

void recur(long long int);

int main()
{
    recur(0);
    return 0;
}

void recur(long long int v)
{
    v++;
    cout << v << endl;
    recur(v);
}
After running the above code I got values of v of 261926, 261893, and 261816 before getting a segmentation fault, and all runs produced values close to these.
Now I know that this varies from machine to machine and depends on the size of each function's stack frame, but can someone explain the basics of how to stay safe from seg-faults and what soft limit one can keep in mind?
The number of recursion levels you can do depends on the call-stack size combined with the size of the local variables and arguments that are placed on that stack. Aside from "how the code is written", just like many other memory-related things, this is very much dependent on the system you're running on, what compiler you are using, the optimisation level [1], and so on. On some embedded systems I've worked on, the stack would be a few hundred bytes; my first home computer had 256 bytes of stack; modern desktops have megabytes of stack (and you can adjust it, but eventually you will run out).
Doing recursion at unlimited depth is not a good idea, and you should look at changing your code so that "it doesn't do that". You need to understand the algorithm and understand to what depth it will recurse, and whether that is acceptable in your system. There is unfortunately nothing anyone can do at the time the stack runs out (at best your program crashes; at worst it doesn't, but instead causes something ELSE to go wrong, such as the stack or heap of some other part of the application getting messed up!).
On a desktop machine, I'd think it's acceptable to have a recursion depth of a few hundred to some thousands, but not much more than this, and that is if you have small usage of stack in each call; if each call is using up kilobytes of stack, you should limit the call depth even further, or reduce the need for stack space.
If you need to have more recursion depth than that, you need to re-arrange the code - for example using a software stack to store the state, and a loop in the code itself.
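As an illustration of that rearrangement, here is a minimal sketch (with a hypothetical Node type) that walks a binary tree using a heap-backed std::stack instead of the call stack:

#include <stack>

struct Node { Node* left; Node* right; };

int countNodes(Node* root)
{
    if (root == nullptr) return 0;
    int count = 0;
    std::stack<Node*> work;   // state lives on the heap, not the call stack
    work.push(root);
    while (!work.empty()) {
        Node* n = work.top();
        work.pop();
        ++count;
        if (n->left)  work.push(n->left);
        if (n->right) work.push(n->right);
    }
    return count;
}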
[1] Using g++ -O2 on your posted code, I got to 50 million and counting, and I expect that if I leave it long enough, it will restart at zero because it keeps going forever; g++ detects that this recursion can be converted into a loop, and does so. The same program compiled with -O0 or -O1 does indeed stop at a little over 200,000. With clang++ -O1 it just keeps going. The clang-compiled code was still running, at 185 million "recursions", as I finished writing the rest of this answer.
There is (AFAIK) no well established limit. (I am answering from a Linux desktop point of view).
On desktops, laptops the default stack size is a few megabytes in 2015. On Linux you could use setrlimit(2) to change it (to a reasonable figure, don't expect to be able to set it to a gigabyte these days) - and you could use getrlimit(2) or parse /proc/self/limits (see proc(5)) to query it . On embedded microcontrollers - or inside the Linux kernel- , the entire stack may be much more limited (to a few kilobytes in total).
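A sketch of raising the soft limit from within the program itself (Linux; the new soft limit must not exceed the hard limit, and whether an already-running main thread benefits is system-dependent):

#include <sys/resource.h>

int main()
{
    struct rlimit rl;
    getrlimit(RLIMIT_STACK, &rl);
    rl.rlim_cur = 64 * 1024 * 1024;   // 64 MiB soft limit, in bytes
    setrlimit(RLIMIT_STACK, &rl);     // fails with EINVAL/EPERM if too large
    // ... deep recursion here ...
    return 0;
}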
When you create a thread using pthread_create(3) you could use an explicit pthread_attr_t and use pthread_attr_setstack(3) to set the stack space.
BTW, with recent GCC, you might compile all your software (including the standard C library) with split stacks (so pass -fsplit-stack to gcc or g++)
At last, your example is a tail call, and GCC can optimize that (into a jump with arguments). I checked that if you compile with g++ -O2 (using GCC 4.9.2 on Linux/x86-64/Debian), the recursion is transformed into a genuine loop and no stack allocation grows indefinitely (your program ran for nearly 40 million calls to recur in a minute, then I interrupted it). In better languages like Scheme or OCaml there is a guarantee that tail calls are indeed compiled iteratively (the tail-recursive call then becomes the usual, or even the only, looping construct).
CyberSpok is excessive in his comment (hinting to avoid recursions). Recursions are very useful, but you should limit them to a reasonable depth (e.g. a few thousand), and you should take care that call frames on the call stack are small (less than a kilobyte each), so practically allocate and deallocate most of the data in the C heap. The GCC -fstack-usage option is really useful for reporting the stack usage of every compiled function. See this and that answers.
Notice that continuation passing style is a canonical way to transform recursions into iterations (then you trade stack frames with dynamically allocated closures).
Some clever algorithms replace a recursion with fancy modifying iterations, e.g. the Deutsch-Schorr-Waite graph marking algorithm.
For Linux-based applications, we can use the getrlimit and setrlimit APIs to query and set various kernel resource limits, such as the size of the core file, CPU time, stack size, nice values, max number of processes, etc. 'RLIMIT_STACK' is the resource name for the stack, defined in the Linux kernel. Below is a simple program to retrieve the stack size:
#include <iostream>
#include <sys/time.h>
#include <sys/resource.h>
#include <errno.h>
using namespace std;
int main()
{
    struct rlimit sl;
    int returnVal = getrlimit(RLIMIT_STACK, &sl);
    if (returnVal == -1)
    {
        cout << "Error. errno: " << errno << endl;
    }
    else if (returnVal == 0)
    {
        cout << "stackLimit soft - max : " << sl.rlim_cur << " - " << sl.rlim_max << endl;
    }
}
Why is the release version of memset slower than the debug version in Visual Studio 2012?
In Visual Studio 2010 I get the same result too.
My computer:
Intel Core i7-3770 3.40 GHz
8 GB memory
OS:
Windows 7 SP1 64-bit
This is my test code:
#include <boost/progress.hpp>
int main()
{
    const int Size = 1000*1024*1024;
    char* Data = (char*)malloc(Size);

#ifdef _DEBUG
    printf_s("debug\n");
#else
    printf_s("release\n");
#endif

    boost::progress_timer timer;
    memset(Data, 0, Size);
    return 0;
}
the output:
release
0.27 s
debug
0.06 s
Edited: if I change the code to this, both builds give the same result:
#include <boost/progress.hpp>
int main()
{
    const int Size = 1000*1024*1024;
    char* Data = (char*)malloc(Size);
    memset(Data, 1, Size);

#ifdef _DEBUG
    printf_s("debug\n");
#else
    printf_s("release\n");
#endif

    {
        boost::progress_timer timer;
        memset(Data, 0, Size);
    }
    return 0;
}
So Hans Passant is right. Thank you very much.
This is a standard benchmark mistake, you don't measure the execution time of memset() at all. You actually measure the time needed for the operating system to deal with the quarter of a million page faults that your code generates. Which is highly dependent on what other processes are running and how many pages were prepped by the kernel's zero page thread.
On a demand-page virtual memory operating system like Windows, malloc() doesn't allocate memory at all. It allocates address space. Just numbers to the processor. The physical memory allocation doesn't happen until the processor accesses the address space. At which point the kernel is forced to provide the physical RAM to allow the processor to continue. Triggered by a soft page fault generated by the processor when it discovers that an address isn't mapped to RAM yet.
If you want to have an estimate of how long memset() really takes then you have to call it twice. The first call ensures that the RAM is mapped. Time the second call to measure how long the memory writes take. Which is a fixed number for large memory ranges like you are using, the memory cache and write-back buffers are ineffective so speed is entirely determined by the bandwidth of the memory bus. Your debug result suggests DDR3 clocked at 266 MHz, pretty common.
This also removes the bias you get from using the debug allocator in the debug build of the CRT. Which fills allocated memory with a bit-pattern that's likely to induce a crash when you try to access uninitialized memory. This hides the page fault overhead since you didn't include the cost of malloc() in the measurement.
I'm playing with some basic C++ stuff. I'm new to this language, so I warn you that my question may not be correctly formulated. I appreciate any help.
The thing is that after seeing the example at www.cplusplus.com/reference/cstdlib/malloc/ I found myself with this code:
#include <stdio.h>
#include <stdlib.h> // for malloc()

int main (void) {
    char *str;
    str = (char*) malloc(2);
    str[0] = '8';
    str[1] = '8';
    str[2] = '6';
    str[3] = '\0';
    printf("%s\n", str);
}
And compiling with:
gcc -O0 -pedantic -Wall test2.cpp
(gcc version 4.7.2)
I get no errors, and the output is 886. Why no errors? Have I not passed the boundary of the allocated space?
If that code is OK... why does the example in the reference allocate the exact size needed?
In the other (more probable) case... what are the risks?
Thanks!
You don't get any errors because C and C++ don't do bounds checking. You overwrote sections of memory that you weren't using, but you got lucky and it wasn't anything important. Compare it to putting a row of nails into a wall where you know there's a stud. If you miss the stud, most of the time, you just put a hole in the plaster, but it's dangerous to keep doing it because eventually, you're going to hit one of the live wires instead.
You have passed over the boundary of the allocated memory.
However, printf does not care what size of memory you have allocated. All it does is start from the beginning and continue until it finds a 0.
The case you created is undefined behaviour. There can be some other data right after your allocated region (maybe another variable), in which case it will get corrupted. If the next part is unused heap memory, you might escape without a visible problem. And if the address right after your allocated memory is not mapped into your process, you will see a nice and tidy segmentation fault. The consequences can be even worse, so better not to try this anywhere.
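A way to make this kind of bug visible even when the program happens to print 886 is AddressSanitizer, available in reasonably recent gcc and clang (a sketch, reusing the question's file name):

g++ -fsanitize=address -g test2.cpp -o test2
./test2    # should abort with a heap-buffer-overflow report instead of printing 886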
The following can be found in the comments in malloc.c of glibc:
Minimum overhead per allocated chunk: 4 or 8 bytes. Each malloced chunk has a hidden word of overhead holding size and status information.

Minimum allocated size: 4-byte ptrs: 16 bytes (including 4 overhead)
                        8-byte ptrs: 24/32 bytes (including, 4/8 overhead)

When a chunk is freed, 12 (for 4byte ptrs) or 20 (for 8 byte ptrs but 4 byte size) or 24 (for 8/8) additional bytes are needed; 4 (8) for a trailing size field and 8 (16) bytes for free list pointers. Thus, the minimum allocatable size is 16/24/32 bytes.
Since the minimum allocated size is 16/24/32 bytes, which is greater than the 4 bytes your program wrote, your program ran without errors. That is one possible explanation for why your program appeared to execute correctly.
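On glibc you can observe this slack directly with malloc_usable_size() (a sketch; the extra space is an implementation detail, and writing past the requested size is still undefined behaviour):

#include <malloc.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *p = malloc(2);
    // glibc typically rounds tiny requests up to the minimum chunk size
    printf("requested 2, usable %zu\n", malloc_usable_size(p));
    free(p);
    return 0;
}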