I'm trying to do this:
int main(void){
u_int64_t NNUM = 2<<19;
u_int64_t list[NNUM], i;
for(i = 0; i < 4; i++){
list[i] = 999;
}
}
Why am I getting segfault at my Ubuntu 10.10 64 bits (gcc 4.6.1)?
You try to create a very large array on the stack. This leads to a stack overflow.
Try allocating the array on the heap instead. For example:
// Allocate memory
u_int64_t *list = malloc(NNUM * sizeof(u_int64_t));
// work with `list`
// ...
// Free memory again
free(list);
You declare NNUM = 2*2^19 == 2<<19 == 1048576.
and try to allocate on the stack 64 bits * 1048576 = num of bits* num of cells.
It is 8.5 MegaBytes, it is just too much for allocation on the stack, you can try to allocate it on the heap and check if it really works using the return value of malloc.
heap VS. stack
Your program requires a minimum stack size of 1048576,
if you check with 'ulimit -s', it is most likely less than that.
you can try 'ulimit -s 16384' and then re-execute again.
Related
My question is related to a problem described here. I have written a C++ implementation of the Sieve of Eratosthenes that hits a memory overflow if I set the target value too high. As suggested in that question, I am able to fix the problem by using a boolean <vector> instead of a normal array.
However, I am hitting the memory overflow at a much lower value than expected, around n = 1 200 000. The discussion in the thread linked above suggests that the normal C++ boolean array uses a byte for each entry, so with 2 GB of RAM, I expect to be able to get to somewhere on the order of n = 2 000 000 000. Why is the practical memory limit so much smaller?
And why does using <vector>, which encodes the booleans as bits instead of bytes, yield more than an eightfold increase in the computable limit?
Here is a working example of my code, with n set to a small value.
#include <iostream>
#include <cmath>
#include <vector>
using namespace std;
int main() {
// Count and sum of primes below target
const int target = 100000;
// Code I want to use:
bool is_idx_prime[target];
for (unsigned int i = 0; i < target; i++) {
// initialize by assuming prime
is_idx_prime[i] = true;
}
// But doesn't work for target larger than ~1200000
// Have to use this instead
// vector <bool> is_idx_prime(target, true);
for (unsigned int i = 2; i < sqrt(target); i++) {
// All multiples of i * i are nonprime
// If i itself is nonprime, no need to check
if (is_idx_prime[i]) {
for (int j = i; i * j < target; j++) {
is_idx_prime[i * j] = 0;
}
}
}
// 0 and 1 are nonprime by definition
is_idx_prime[0] = 0; is_idx_prime[1] = 0;
unsigned long long int total = 0;
unsigned int count = 0;
for (int i = 0; i < target; i++) {
// cout << "\n" << i << ": " << is_idx_prime[i];
if (is_idx_prime[i]) {
total += i;
count++;
}
}
cout << "\nCount: " << count;
cout << "\nTotal: " << total;
return 0;
}
outputs
Count: 9592
Total: 454396537
C:\Users\[...].exe (process 1004) exited with code 0.
Press any key to close this window . . .
Or, changing n = 1 200 000 yields
C:\Users\[...].exe (process 3144) exited with code -1073741571.
Press any key to close this window . . .
I am using the Microsoft Visual Studio interpreter on Windows with the default settings.
Turning the comment into a full answer:
Your operating system reserves a special section in the memory to represent the call stack of your program. Each function call pushes a new stack frame onto the stack. If the function returns, the stack frame is removed from the stack. The stack frame includes the memory for the parameters to your function and the local variables of the function. The remaining memory is referred to as the heap. On the heap, arbitrary memory allocations can be made, whereas the structure of the stack is governed by the control flow of your program. A limited amount of memory is reserved for the stack, when it gets full (e.g. due to too many nested function calls or due to too large local objects), you get a stack overflow. For this reason, large objects should be allocated on the heap.
General references on stack/heap: Link, Link
To allocate memory on the heap in C++, you can:
Use vector<bool> is_idx_prime(target);, which internally does a heap allocation and deallocates the memory for you when the vector goes out of scope. This is the most convenient way.
Use a smart pointer to manage the allocation: auto is_idx_prime = std::make_unique<bool[]>(target); This will also automatically deallocate the memory when the array goes out of scope.
Allocate the memory manually. I am mentioning this only for educational purposes. As mentioned by Paul in the comments, doing a manual memory allocation is generally not advisable, because you have to manually deallocate the memory again. If you have a large program with many memory allocations, inevitably you will forget to free some allocation, creating a memory leak. When you have a long-running program, such as a system service, creating repeated memory leaks will eventually fill up the entire memory (and speaking from personal experience, this absolutely does happen in practice). But in theory, if you would want to make a manual memory allocation, you would use bool *is_idx_prime = new bool[target]; and then later deallocate again with delete [] is_idx_prime.
int main(){
int * x = new int[3];
for(int i = 0; i < 5; ++i){
x = new int[i];
x[i] = i;
cout << i << endl;
}
}
So say I have an integer array allocated on the heap, with capacity of 3 integers, as seen in x.
Now say I have this for loop where I want to change the values of x into whatever i is.
Now when I run this code by it self it does run, and 0,1,2,3,4 prints like I want.
What I'm wondering is, when I do new int[i] when i is 0, 1, 2, since x[0], x[1], x[2] is already allocated on the heap, am I make three new address in the heap?
Thanks!
int main(){
int * x = new int[3];
for(int i = 0; i < 5; ++i){
x = new int[i];
x[i] = i;
cout << i << endl;
}
}
Running it
-> An array of size 3 is created on the heap
-> An array of size 0 is created on the heap
-> The 0 index of array size 0 is equal to 0
-> Print 0
-> An array of size 1 is created on the heap
-> The 1 index of array size 1 is equal to 1
-> Print 1
.
.
.
-> An array of size 4 is created on the heap
-> The 4 index of array size 4 is equal to 4
-> Print 4
I'm not sure if this is your intention, but as the rest of the comments said, there are memory leaks and undefined behavior.
I think what you're trying to implement is instead
#include <iostream>
#include <vector>
int main()
{
std::vector<int> g1; //initialises a vector called g1
for (int i = 1; i <= 5; i++) {
// Adds i to the vector, expands the vector if
// there is not enough space
g1.push_back(i);
// Prints out i
std::cout<<i<<std::endl;
}
Every time you new something, you are requesting memory and system is allocating a memory block(it may fail, that's another case itself) and giving you a pointer to the initial address of that memory, please remember this when you new something. So here initially you new an array of 3 ints, then in the loop you again new 5 times which returns 5 new memory addresses(which is 5 different memory blocks). So you have 6 new addresses(memory blocks of different sizes) to deal with. It's definitely not the thing you want. So you should use the 1st allocation without any more new in the loop, in that case you should know the bounds of array beforehand. So to make that automatic you can use vector which can grow when you push elements into it.
please refer to this for vector.
a sidenote: when you new something you should take care of that memory yourself, so new is generally not inspired, please look at smart pointers to make your code safe.
Can you allocate memory for an array already on the heap?
Answer: Yes (but not how you are doing it...)
Whenever you have memory already allocated, in order to expand or reduce the allocation size making up a given block of memory, you must (1) allocate a new block of memory of the desired size, and (2) copy the existing block to the newly allocated block (up to the size of the newly allocated block), before (3) freeing the original block. In essence since there is no equivalent to realloc in C++, you simply have to do it yourself.
In your example, beginning with an allocation size of 3-int, you can enter your for loop and create a temporary block to hold 1-int (one more than the loop index) and copy the number of existing bytes in x that will fit in your new tmp block to tmp. You can then delete[] x; and assign the beginning address of the new temporary block of memory to x (e.g. x = tmp;)
A short example continuing from your post could be:
#include <iostream>
#include <cstring>
int main (void) {
int nelem = 3, /* var to track no. of elements allocated */
*x = new int[nelem]; /* initial allocation of 3 int - for fun */
for (int i = 0; i < 5; i++) {
nelem = i + 1; /* update nelem */
/* create temporary block to hold nelem int */
int *tmp = new int[nelem]; /* allocate tmp for new x */
memcpy (tmp, x, i * sizeof *tmp); /* copy i elements to tmp */
delete[] x; /* free x */
x = tmp; /* assign tmp to x */
x[i] = i; /* assign x[i] */
for (int j = 0; j < nelem; j++) /* output all */
std::cout << " " << x[j];
std::cout << '\n';
}
delete[] x; /* free x */
}
(note: on the first iteration zero bytes are copied from x -- which is fine. You can include an if (i) before the memcpy if you like)
Example Use/Output
$ ./bin/allocrealloc
0
0 1
0 1 2
0 1 2 3
0 1 2 3 4
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/allocrealloc
==6202== Memcheck, a memory error detector
==6202== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==6202== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==6202== Command: ./bin/allocrealloc
==6202==
0
0 1
0 1 2
0 1 2 3
0 1 2 3 4
==6202==
==6202== HEAP SUMMARY:
==6202== in use at exit: 0 bytes in 0 blocks
==6202== total heap usage: 7 allocs, 7 frees, 72,776 bytes allocated
==6202==
==6202== All heap blocks were freed -- no leaks are possible
==6202==
==6202== For counts of detected and suppressed errors, rerun with: -v
==6202== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Look things over and let me know if you have further questions.
I'll just throw your code back at you with some comments that hopefully clear things up a bit.
int main()
{
// Allocate memory for three integers on the heap.
int * x = new int[3];
for(int i = 0; i < 5; ++i)
{
// Allocate memory for i (0-4) integers on the heap.
// This will overwrite the x variable allocated earlier.
// What happens when i is zero?
x = new int[i];
// Set the recently allocated array at x[i] to its index.
x[i] = i;
// Print out current index of 0-4.
cout << i << endl;
}
}
I am implementing a version of malloc and free for practice. So, I have a static char array of a fixed length (10000). Then, I implemented a struct memblock that holds information like size of the block, if it is free...
The way I am implementing malloc is such that I put small blocks (< 8 bytes) to the front of the char array and larger ones to the other end. So, I am basically using two linked lists to link the blocks in front and blocks in back. However, I am having weird problems with initializing the lists (on first call of malloc).
This is my code:
#define MEMSIZE 10000 // this is the maximum size of the char * array
#define BLOCKSIZE sizeof(memblock) // size of the memblock struct
static char memory[MEMSIZE]; // array to store all the memory
static int init; // checks if memory is initialized
static memblock root; // general ptr that deals with both smallroot and bigroot
static memblock smallroot, bigroot; // pointers to storage of small memory blocks and bigger blocks
void initRoots(size_t size, char* fileName, int lineNum)
{
smallroot = (memblock)memory;
smallroot->prev = smallroot->next = 0;
smallroot->size = MEMSIZE - 2 * BLOCKSIZE;
smallroot->isFree = 1;
smallroot->file = fileName;
smallroot->lineNum = lineNum;
bigroot = (memblock)(((char *)memory) + MEMSIZE - BLOCKSIZE - 1);
bigroot->prev = bigroot->next = 0;
bigroot->size = MEMSIZE - 2 * BLOCKSIZE;
bigroot->isFree = 1;
bigroot->file = fileName;
bigroot->lineNum = lineNum;
init = 1;
}
I used GDB to see where I am getting a Seg Fault. It happens when bigroot->next = 0; is executed. This somehow sets smallroot to 0. What is more weird? If I set bigroot->next = 0x123, then smallroot becomes 0x1. If I set 0x1234, then it becomes 0x12. It is setting smallroot to the value of bigroot->next's value excluding its last two bits. I really don't understand how this is happening!
This is the definition of memblock:
typedef struct memblock_* memblock;
struct memblock_ {
struct memblock_ *prev, *next; // pointers to next and previous blocks
/* size: size of allocated memory
isFree: 0 if not free, 1 if free
lineNum: line number of user's file where malloc was invoked
*/
size_t size, isFree, lineNum;
char* file; // user's file name where the block was malloced
};
#define BLOCKSIZE sizeof(memblock) // size of the memblock struct
You want:
#define BLOCKSIZE sizeof(*memblock) // size of the memblock_ struct
Also the -1 here is bogus (creates mis-aligned pointer):
bigroot = (memblock)(((char *)memory) + MEMSIZE - BLOCKSIZE - 1);
Actually, I am storing the pointer to the memblock in the memory array. The values of memblock are stored in stack.
No, they are not. The smallroot and bigroot clearly point into the array itself.
I'm experiencing strange memory access performance problem, any ideas?
int* pixel_ptr = somewhereFromHeap;
int local_ptr[307200]; //local
//this is very slow
for(int i=0;i<307200;i++){
pixel_ptr[i] = someCalculatedVal ;
}
//this is very slow
for(int i=0;i<307200;i++){
pixel_ptr[i] = 1 ; //constant
}
//this is fast
for(int i=0;i<307200;i++){
int val = pixel_ptr[i];
local_ptr[i] = val;
}
//this is fast
for(int i=0;i<307200;i++){
local_ptr[i] = someCalculatedVal ;
}
Tried consolidating values to local scanline
int scanline[640]; // local
//this is very slow
for(int i=xMin;i<xMax;i++){
int screen_pos = sy*screen_width+i;
int val = scanline[i];
pixel_ptr[screen_pos] = val ;
}
//this is fast
for(int i=xMin;i<xMax;i++){
int screen_pos = sy*screen_width+i;
int val = scanline[i];
pixel_ptr[screen_pos] = 1 ; //constant
}
//this is fast
for(int i=xMin;i<xMax;i++){
int screen_pos = sy*screen_width+i;
int val = i; //or a constant
pixel_ptr[screen_pos] = val ;
}
//this is slow
for(int i=xMin;i<xMax;i++){
int screen_pos = sy*screen_width+i;
int val = scanline[0];
pixel_ptr[screen_pos] = val ;
}
Any ideas? I'm using mingw with cflags -01 -std=c++11 -fpermissive.
update4:
I have to say that these are snippets from my program and there are heavy code/functions running before and after. The scanline block did ran at the end of function before exit.
Now with proper test program. thks to #Iwillnotexist.
#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>
#define SIZE 307200
#define SAMPLES 1000
double local_test(){
int local_array[SIZE];
timeval start, end;
long cpu_time_used_sec,cpu_time_used_usec;
double cpu_time_used;
gettimeofday(&start, NULL);
for(int i=0;i<SIZE;i++){
local_array[i] = i;
}
gettimeofday(&end, NULL);
cpu_time_used_sec = end.tv_sec- start.tv_sec;
cpu_time_used_usec = end.tv_usec- start.tv_usec;
cpu_time_used = cpu_time_used_sec*1000 + cpu_time_used_usec/1000.0;
return cpu_time_used;
}
double heap_test(){
int* heap_array=new int[SIZE];
timeval start, end;
long cpu_time_used_sec,cpu_time_used_usec;
double cpu_time_used;
gettimeofday(&start, NULL);
for(int i=0;i<SIZE;i++){
heap_array[i] = i;
}
gettimeofday(&end, NULL);
cpu_time_used_sec = end.tv_sec- start.tv_sec;
cpu_time_used_usec = end.tv_usec- start.tv_usec;
cpu_time_used = cpu_time_used_sec*1000 + cpu_time_used_usec/1000.0;
delete[] heap_array;
return cpu_time_used;
}
double heap_test2(){
static int* heap_array = NULL;
if(heap_array==NULL){
heap_array = new int[SIZE];
}
timeval start, end;
long cpu_time_used_sec,cpu_time_used_usec;
double cpu_time_used;
gettimeofday(&start, NULL);
for(int i=0;i<SIZE;i++){
heap_array[i] = i;
}
gettimeofday(&end, NULL);
cpu_time_used_sec = end.tv_sec- start.tv_sec;
cpu_time_used_usec = end.tv_usec- start.tv_usec;
cpu_time_used = cpu_time_used_sec*1000 + cpu_time_used_usec/1000.0;
return cpu_time_used;
}
int main (int argc, char** argv){
double cpu_time_used = 0;
for(int i=0;i<SAMPLES;i++)
cpu_time_used+=local_test();
printf("local: %f ms\n",cpu_time_used);
cpu_time_used = 0;
for(int i=0;i<SAMPLES;i++)
cpu_time_used+=heap_test();
printf("heap_: %f ms\n",cpu_time_used);
cpu_time_used = 0;
for(int i=0;i<SAMPLES;i++)
cpu_time_used+=heap_test2();
printf("heap2: %f ms\n",cpu_time_used);
}
Complied with no optimization.
local: 577.201000 ms
heap_: 826.802000 ms
heap2: 686.401000 ms
The first heap test with new and delete is 2x slower. (paging as suggested?)
The second heap with reused heap array is still 1.2x slower.
But I guess the second test is not that practical as there tend to other codes running before and after at least for my case. For my case, my pixel_ptr of course only allocated once during
prograim initialization.
But if anyone has solutions/idea to speeding things up please reply!
I'm still perplexed why heap write is so much slower than stack segment.
Surely there must be some tricks to make the heap more cpu/cache flavourable.
Final update?:
I revisited, the disassemblies again and this time, suddenly I have an idea why some of my breakpoints
don't activate. The program looks suspiciously shorter thus I suspect the complier might
have removed the redundant dummy code I put in which explains why the local array is magically many times faster.
I was a bit curious so I did the test, and indeed I could measure a difference between stack and heap access.
The first guess would be that the generated assembly is different, but after taking a look, it is actually identical for heap and stack (which makes sense, memory shouldn't be discriminated).
If the assembly is the same, then the difference must come from the paging mechanism. The guess is that on the stack, the pages are already allocated, but on the heap, first access cause a page fault and page allocation (invisible, it all happens at kernel level). To verify this, I did the same test, but first I would access the heap once before measuring. The test gave identical times for stack and heap. To be sure, I also did a test in which I first accessed the heap, but only every 4096 bytes (every 1024 int), then 8192, because a page is usually 4096 bytes long. The result is that accessing only every 4096 bytes also gives the same time for heap and stack, but accessing every 8192 gives a difference, but not as much as with no previous access at all. This is because only half of the pages were accessed and allocated beforehand.
So the answer is that on the stack, memory pages are already allocated, but on the heap, pages are allocated on-the-fly. This depends on the OS paging policy, but all major PC OSes probably have a similar one.
For all the tests I used Windows, with MS compiler targeting x64.
EDIT: For the test, I measured a single, larger loop, so there was only one access at each memory location. deleteing the array and measuring the same loop multiple time should give similar times for stack and heap, because deleteing memory probably don't de-allocate the pages, and they are already allocated for the next loop (if the next new allocated on the same space).
The following two code examples should not differ in runtime with a good compiler setting. Probably your compiler will generate the same code:
//this is fast
for(int i=0;i<307200;i++){
int val = pixel_ptr[i];
local_ptr[i] = val;
}
//this is fast
for(int i=0;i<307200;i++){
local_ptr[i] = pixel_ptr[i];
}
Please try to increase optimization setting.
I need to allocate space for a temporary array once per iteration. I try to use realloc each iteration to optimize memory using. Like that:
int *a = (int*)std::alloc(2 * sizeof(int));
for(int i=0; i<N; ++i)
{
int m = calculate_enough_size();
a = (int*)std::realloc(m * sizeof(int));
// ...
}
std::free(a);
N is a big number, 1000000 for example. There are example m values per iteration: 8,2,6,10,4,8
Am I doing right when I realloc a at each iteration? Does it prevent redundant memory allocation?
Firstly, realloc takes 2 parameters. First is the original pointer and the second is the new size. You are trying to pass the size as the original pointer and the code shouldn't compile.
Secondly, the obligatory reminder: Don't optimize prematurely. Unless you've measured and found that the allocations are a bottleneck, just use std::vector.
Few issues I have noticed are:
Realloc should be used in case you want older values remain in the memory, if you didn't bother about old values as mentioned in one of your comment use just alloc.
Please check size of already allocated memory before allocating again, if allocated memory is insufficient for new data then only allocate new memory.
Please refer to the sample code which will taking care of above mentioned problems:
int size = 2;
int *a = (int*)std::alloc(size * sizeof(int));
for(int i=0; i<N; ++i)
{
int m = calculate_enough_size();
if(m > size)
{
size = m;
std::free(a);
a = (int*)std::alloc(size * sizeof(int));
}
// ...
}
std::free(a);
Also you can further optimized memory allocation by allocating some extra memory, e.g:
size = m*2; //!
To better understand this step let's take an example suppose m = 8, then you will allocate memory = 16, so when now m changes to 10, 12 up-to 16 there is no need to allocate memory again.
If you can get all the sizes beforehand, allocate the biggest you need before the cycle and then use as much as needed.
If, on the other hand, you can not do that, then reallocation is a good solution, I think.
You can also further optimize your solution by reallocating only when a bigger size is needed:
int size = 0;
for(int i = 0; i < N; ++i)
{
int new_size = calculate_enough_size();
if ( new_size > size ){
a = (int*)std::realloc(new_size * sizeof(int));
size = new_size;
}
// ...
}
Like this you will need less reallocations (half of them in a randomized case).