I have the following code
int main()
{
int* myDynamicArray;
myDynamicArray = new int[20000000];
int numIte;
cout << "number of iterations" << endl;
cin >> numIte;
for (int i = 0; i < numIte; ++i)
foo(myDynamicArray);
delete [] myDynamicArray;
return 0;
}
The thing that i dont understand is that when the number of iterations input is large, the memory used by the system increases as we loop through more iterations. Is that normal?
Since foo is not shown and because it it possibly doesn't make sense to call it without the array index passed in, I'll make a guess. In other words, I'm guessing that the real foo accepts some kind of array index or length as a parameter and that it accesses the elements of myDynamicArray based on that index.
If that is true (and it is not a simple case of foo leaking memory), then what you might be measuring is the amount of memory actually committed. The allocation is for 80MB, but the commit of the memory may not happen until you access the array. So the more of the array accessed by foo could cause more of the memory to be committed.
Without having a full definition for foo, this question is impossible to answer. However here are some thoughts...
It is probably a good idea to wrap myDynamicArray inside some form of safe pointer, possibly std::auto_ptr or in the case that foo might keep reference to the pointer, std::tr1::shared_ptr.
Unless the call to the foo constructor/function is causing additional memory to be allocated, there is no reason to suggest that increasing the number of loop iterations should affect the programs runtime memory usage in any way.
Finally, how are you monitoring the runtime memory usage of the program? Watching the numbers within Windows Task Manager (or equivalent) isn't a particularly robust solution, you could try manually tracking all memory allocations yourself (by overriding new/malloc) to get a true idea of when, where and how much memory is being allocated on the heap.
Related
I am trying to better understand items on a stack and how they are addressed. The article I found here seems to indicate that when MIPS stack is initialized, a fixed amount of memory is allocated and the stack grows down to the stack limit which would appear to be smaller addresses. I would assume that based on this logic a stack overflow would occur when 0x0000 was traversed?
I realize MIPS is big endian, but does that change how the stack grows? I wrote what I believed would be a quick way to observe this on an x86_64 machine, but the stack appears to grow up, as I originally assumed it did.
#include <iostream>
#include <vector>
int main() {
std::vector<int*> v;
for( int i = 0; i < 10; i++ ) {
v.push_back(new int);
std::cout << v.back() << std::endl;
}
}
I'm also confused by the fact that not all of the memory address's do not appear to be contiguous, which makes me think I did something stupid. Could somebody please clarify?
The stack on x86 machines also grows downwards. Endianness is unrelated to the direction in which the stack grows.
The stack of a machine has absolutely nothing to do with std::vector<>. Also, new int allocates heap memory, so it tells you absolutely nothing about the stack.
In order to see in which direction the stack grows, you need to do something like this:
recursive( 5 );
void recursive( int n )
{
if( n == 0 )
return;
int a;
printf( "%p\n", &a );
recursive( n - 1 );
}
(Note that if your compiler is smart enough to optimize tail recursion, then you will need to tell it to not optimize it, otherwise the observations will be all wrong.)
essentially there are 3 types of memory you use in programming: static, dynamic/heap and stack.
Static memory is pre-allocated by the compiler and consist of the constants and variables declared statically in your program.
Heap is the memory which you can freely allocate and release
Stack is the memory which gets allocated for all local variables declared in a function. This is important because every time you call the function a new memory for its variables is allocated. So, that every call to a function will assure that it has its own unique copy of the variables. And every time you return from the function the memory gets freed.
It absolutely does not matter how the stack is managed as soon as it follows the above rules. It is convenient however to have program memory to be allocated in the lower address space and grow up, and the stack to start from a top memory space and grow down. Most systems implement this scheme.
In general there is a stack pointer register/variable which points so the current stack address. when a function gets called it decrease this address by the number of bytes it needs for its variables. when it calls the next function, this new one will start with the new pointer already decreased by the caller. When the function returns it restores the pointer which it started from.
There could be different schemes but as far as I know, mips and i86 follow this one.
And essentially there is only one virtual memory space in the program. This is up to the operating system and/or compiler how to use it. The compiler will split the memory in the logical regions for its own use and handle them, hopefully, according to the calling conventions defined in the platform documents.
So, in our example, v and i are allocated on the function stack. cout is static. every new int allocate space in heap. v is not a simple variable but a struct which contains fields which it needs to manage the list. it needs space for all these internals. So, every push_back modifies those fields to point to the allocated 'int' in some way. push_back() and back() are function calls and allocate their own stacks for internal variables to not interfere with the top function.
What is the advantage of allocating a memory for some data. Instead we could use an array of them.
Like
int *lis;
lis = (int*) malloc ( sizeof( int ) * n );
/* Initialize LIS values for all indexes */
for ( i = 0; i < n; i++ )
lis[i] = 1;
we could have used an ordinary array.
Well I don't understand exactly how malloc works, what is actually does. So explaining them would be more beneficial for me.
And suppose we replace sizeof(int) * n with just n in the above code and then try to store integer values, what problems might i be facing? And is there a way to print the values stored in the variable directly from the memory allocated space, for example here it is lis?
Your question seems to rather compare dynamically allocated C-style arrays with variable-length arrays, which means that this might be what you are looking for: Why aren't variable-length arrays part of the C++ standard?
However the c++ tag yields the ultimate answer: use std::vector object instead.
As long as it is possible, avoid dynamic allocation and responsibility for ugly memory management ~> try to take advantage of objects with automatic storage duration instead. Another interesting reading might be: Understanding the meaning of the term and the concept - RAII (Resource Acquisition is Initialization)
"And suppose we replace sizeof(int) * n with just n in the above code and then try to store integer values, what problems might i be facing?"
- If you still consider n to be the amount of integers that it is possible to store in this array, you will most likely experience undefined behavior.
More fundamentally, I think, apart from the stack vs heap and variable vs constant issues (and apart from the fact that you shouldn't be using malloc() in C++ to begin with), is that a local array ceases to exist when the function exits. If you return a pointer to it, that pointer is going to be useless as soon as the caller receives it, whereas memory dynamically allocated with malloc() or new will still be valid. You couldn't implement a function like strdup() using a local array, for instance, or sensibly implement a linked representation list or tree.
The answer is simple. Local1 arrays are allocated on your stack, which is a small pre-allocated memory for your program. Beyond a couple thousand data, you can't really do much on a stack. For higher amounts of data, you need to allocate memory out of your stack.
This is what malloc does.
malloc allocates a piece of memory as big as you ask it. It returns a pointer to the start of that memory, which could be treated similar to an array. If you write beyond the size of that memory, the result is undefined behavior. This means everything could work alright, or your computer may explode. Most likely though you'd get a segmentation fault error.
Reading values from the memory (for example for printing) is the same as reading from an array. For example printf("%d", list[5]);.
Before C99 (I know the question is tagged C++, but probably you're learning C-compiled-in-C++), there was another reason too. There was no way you could have an array of variable length on the stack. (Even now, variable length arrays on the stack are not so useful, since the stack is small). That's why for variable amount of memory, you needed the malloc function to allocate memory as large as you need, the size of which is determined at runtime.
Another important difference between local arrays, or any local variable for that matter, is the life duration of the object. Local variables are inaccessible as soon as their scope finishes. malloced objects live until they are freed. This is essential in practically all data structures that are not arrays, such as linked-lists, binary search trees (and variants), (most) heaps etc.
An example of malloced objects are FILEs. Once you call fopen, the structure that holds the data related to the opened file is dynamically allocated using malloc and returned as a pointer (FILE *).
1 Note: Non-local arrays (global or static) are allocated before execution, so they can't really have a length determined at runtime.
I assume you are asking what is the purpose of c maloc():
Say you want to take an input from user and now allocate an array of that size:
int n;
scanf("%d",&n);
int arr[n];
This will fail because n is not available at compile time. Here comes malloc()
you may write:
int n;
scanf("%d",&n);
int* arr = malloc(sizeof(int)*n);
Actually malloc() allocate memory dynamically in the heap area
Some older programming environments did not provide malloc or any equivalent functionality at all. If you needed dynamic memory allocation you had to code it yourself on top of gigantic static arrays. This had several drawbacks:
The static array size put a hard upper limit on how much data the program could process at any one time, without being recompiled. If you've ever tried to do something complicated in TeX and got a "capacity exceeded, sorry" message, this is why.
The operating system (such as it was) had to reserve space for the static array all at once, whether or not it would all be used. This phenomenon led to "overcommit", in which the OS pretends to have allocated all the memory you could possibly want, but then kills your process if you actually try to use more than is available. Why would anyone want that? And yet it was hyped as a feature in mid-90s commercial Unix, because it meant that giant FORTRAN simulations that potentially needed far more memory than your dinky little Sun workstation had, could be tested on small instance sizes with no trouble. (Presumably you would run the big instance on a Cray somewhere that actually had enough memory to cope.)
Dynamic memory allocators are hard to implement well. Have a look at the jemalloc paper to get a taste of just how hairy it can be. (If you want automatic garbage collection it gets even more complicated.) This is exactly the sort of thing you want a guru to code once for everyone's benefit.
So nowadays even quite barebones embedded environments give you some sort of dynamic allocator.
However, it is good mental discipline to try to do without. Over-use of dynamic memory leads to inefficiency, of the kind that is often very hard to eliminate after the fact, since it's baked into the architecture. If it seems like the task at hand doesn't need dynamic allocation, perhaps it doesn't.
However however, not using dynamic memory allocation when you really should have can cause its own problems, such as imposing hard upper limits on how long strings can be, or baking nonreentrancy into your API (compare gethostbyname to getaddrinfo).
So you have to think about it carefully.
we could have used an ordinary array
In C++ (this year, at least), arrays have a static size; so creating one from a run-time value:
int lis[n];
is not allowed. Some compilers allow this as a non-standard extension, and it's due to become standard next year; but, for now, if we want a dynamically sized array we have to allocate it dynamically.
In C, that would mean messing around with malloc; but you're asking about C++, so you want
std::vector<int> lis(n, 1);
to allocate an array of size n containing int values initialised to 1.
(If you like, you could allocate the array with new int[n], and remember to free it with delete [] lis when you're finished, and take extra care not to leak if an exception is thrown; but life's too short for that nonsense.)
Well I don't understand exactly how malloc works, what is actually does. So explaining them would be more beneficial for me.
malloc in C and new in C++ allocate persistent memory from the "free store". Unlike memory for local variables, which is released automatically when the variable goes out of scope, this persists until you explicitly release it (free in C, delete in C++). This is necessary if you need the array to outlive the current function call. It's also a good idea if the array is very large: local variables are (typically) stored on a stack, with a limited size. If that overflows, the program will crash or otherwise go wrong. (And, in current standard C++, it's necessary if the size isn't a compile-time constant).
And suppose we replace sizeof(int) * n with just n in the above code and then try to store integer values, what problems might i be facing?
You haven't allocated enough space for n integers; so code that assumes you have will try to access memory beyond the end of the allocated space. This will cause undefined behaviour; a crash if you're lucky, and data corruption if you're unlucky.
And is there a way to print the values stored in the variable directly from the memory allocated space, for example here it is lis?
You mean something like this?
for (i = 0; i < len; ++i) std::cout << lis[i] << '\n';
I'm very confused with regard to the following instructions:
#include <iostream>
#define MAX_IT 100
using namespace std;
class Integer{
private :
int a;
public:
Integer(int valoare){a=valoare;}
int getA(){return a;}
void setA(int valoare){a=valoare;}
};
int main(){
Integer* a=new Integer(0);
//cout<<a[0].getA();
for(int i=1;i<=MAX_IT;i++)
{
a[i]=*(new Integer(i));
}
for(int i=0;i<=MAX_IT;i++)
cout<<a[i].getA()<<endl;
return 13;
}
It works for small values of MAX_IT, but when I try to set MAX_IT to 1000 it doesn't work anymore.
Initially, I thought "new" operator was supposed to do the job, but after some reading documentation I understood it is not supposed to work at all like this (out of bound array).
So my question is: why is it working for small values of MAX_IT and not for bigger ones?
EDIT:
I am experimenting with this code for a larger program, where I am not allowed to use STL. You have not understood my concern: if I have Integer *var=new Integer[10]; for(int k=1;K<10;k++) *(var+k)=k; //this is perfectly fine, but if I try var[10]=new Integer; //this should not be working and should generate a memory problem //My concern is that it is working if I do it only 100 times or so...The question if why is it working everytime for small number of iterations?
Because by allocating space for one Integer then using it as an array of multiple Integers, your code invokes undefined behavior, meaning that it can do anything, including crashing, working seemingly fine, or pulling demons out of your nose.
And anyways it's leaking memory. If you don't need dynamic memory allocation, then don't use it.
a[i]=*(new Integer(i));
And kaboom, you lost the pointer to the Integer, no chance to delete it later. Leaks.
If you don't need raw arrays, don't use them. Prefer std::vector. Or switch to C if C++ is too hard.
std::vector<Integer> vec;
vec.push_back(Integer(1337));
The reason that things tend to work nicely when you overflow your buffer by just a little bit is... memory fragmentation! Who would have guessed?
To avoid memory fragmentation, allocators won't return you a block of just sizeof (Integer). They'll give you a somewhat larger block, to ensure that if the block is later freed before the adjacent blocks, it's at least big enough to be useful.
Exactly how big this is can vary by architecture, OS, compiler version, or even how much memory is physically present in the machine. You should consider it to be completely unpredictable. Also, some libraries designed to help catch this sort of bug force any small object to be placed at the end of the block instead of the beginning, so the extra bytes could be negative array indices instead of positive.
Therefore, don't ever rely on having spare area given to you for free after (or before) an object.
Guru note: Occasionally someone comes up with a valid use for the extra memory, and asks for a way to discover how large it is. One good example is that the capacity (not size!) of a std::vector could be adjusted to match the actual allocated space instead of the requested space, and therefore reduce (on average) the number of reallocations needed. Such requests usually come paired with other guru allocator APIs, such as the ability to expand an allocation in-place if there happen to be free blocks adjacent.
Note that in your particular case you do still have undefined behavior, because you're calling operator= on a non-POD object which hasn't first been constructed. If you gave class Integer a trivial default constructor that would change.
you actually need
Integer* a=new Integer[MAX_IT];
//cout<<a[0].getA();
for(int i=1;i<MAX_IT;i++) << note < not <=
{
a[i]=i;
}
better would be to use std::vector though
I have an application where the sequence of malloc/free operations is known in advance. I'd like to do a pre-computation to minimize the maximum memory usage. Are there any resources on that (c++ implementations/research papers)?
More precisely, the same sequence of malloc/free operations is repeated many times (in the end of each cycle everything is freed). So I can afford some computation to optimize memory usage.
Assuming what you want to achieve is minimize time spent allocating memory and possibly improve cache locality, this sounds quite simple, actually.
Just choose memory manager (write one or use a pre-existing such as Hoard). Then, let the memory manager allocate the maximum amount of memory used during a cycle at the start of the program.
The main issue is calculating this amount of memory. A simple solution would be to go through one cycle using an allocator which does nothing other than wrap malloc/free together with a counter which keeps track of current memory usage and maximum usage. At the end of your cycle, that maximum is how much you should allocate in the beginning.
The one thing to look out for is that fragmentation in your allocated memory could cause the need for extra allocations. This can usually be avoided by a good memory manager. In the worst case, you may have to track max memory allocated for every allocation size separately.
As a sidenote, if you are using C++, why are you using malloc/free instead of new/delete?
More precisely, the same sequence of malloc/free operations is
repeated many times (in the end of each cycle everything is freed). So
I can afford some computation to optimize memory usage.
For memory usage, this is not a hard case to solve. The same memory will be reallocated for the same purpose, so it's not "wasteful" of memory if you allocate the same lumps of memory over and over again.
Since you are saying, malloc and free, are we talking old style "C" type usage of the heap? So there's no constructors or destructors to worry about? Why not then create arrays of elements of a given type, e.g.
struct X
{
...
};
Old code:
X* px[10];
for(i = 0; i < 10; i++)
{
px[i] = malloc(sizeof(X));
...
}
instead do:
X* px[10];
X* xx = malloc(sizeof(X)*10);
for(i = 0; i < 10; i++)
{
px[i] = &xx[i];
}
I want to explore the performance differences for multiple dereferencing of data inside a vector of new-ly allocated structs (or classes).
struct Foo
{
int val;
// some variables
}
std::vector<Foo*> vectorOfFoo;
// Foo objects are new-ed and pushed in vectorOfFoo
for (int i=0; i<N; i++)
{
Foo *f = new Foo;
vectorOfFoo.push_back(f);
}
In the parts of the code where I iterate over vector I would like to enhance locality of reference through the many iterator derefencing, for example I have very often to perform a double nested loop
for (vector<Foo*>::iterator iter1 = vectorOfFoo.begin(); iter!=vectorOfFoo.end(); ++iter1)
{
int somevalue = (*iter)->value;
}
Obviously if the pointers inside the vectorOfFoo are very far, I think locality of reference is somewhat lost.
What about the performance if before the loop I sort the vector before iterating on it? Should I have better performance in repeated dereferencings?
Am I ensured that consecutive ´new´ allocates pointer which are close in the memory layout?
Just to answer your last question: no, there is no guarantee whatsoever where new allocates memory. The allocations can be distributed throughout the memory. Depending on the current fragmentation of the memory you may be lucky that they are sometimes close to each other but no guarantee is - or, actually, can be - given.
If you want to improve the locality of reference for your objects then you should look into Pool Allocation.
But that's pointless without profiling.
It depends on many factors.
First, it depends on how your objects that are being pointed to from the vector were allocated. If they were allocated on different pages then you cannot help it but fix the allocation part and/or try to use software prefetching.
You can generally check what virtual addresses malloc gives out, but as a part of the larger program the result of separate allocations is not deterministic. So if you want to control the allocation, you have to do it smarter.
In case of NUMA system, you have to make sure that the memory you are accessing is allocated from the physical memory of the node on which your process is running. Otherwise, no matter what you do, the memory will be coming from the other node and you cannot do much in that case except transfer you program back to its "home" node.
You have to check the stride that is needed in order to jump from one object to another. Pre-fetcher can recognize the stride within 512 byte window. If the stride is greater, you are talking about a random memory access from the pre-fetcher point of view. Then it will shut off not to evict your data from the cache, and the best you can do there is to try and use software prefetching. Which may or may not help (always test it).
So if sorting the vector of pointers makes the objects pointed by them continuously placed one after another with a relatively small stride - then yes, you will improve the memory access speed by making it more friendly for the prefetch hardware.
You also have to make sure that sorting that vector doesn't result in a worse gain/lose ratio.
On a side note, depending on how you use each element, you may want to allocate them all at once and/or split those objects into different smaller structures and iterate over smaller data chunks.
At any rate, you absolutely must measure the performance of the whole application before and after your changes. These sort of optimizations is a tricky business and things can get worse even though in theory the performance should have been improved. There are many tools that can be used to help you profile the memory access. For example, cachegrind. Intel's VTune does the same. And many other tools. So don't guess, experiment and verify the results.