I was looking to see how many elements I can stick into a vector before the program crashes. When running the code below the program crashed with a bad alloc at i=90811045, aka when trying to add the 90811045th element. My question is: Why 90811045?
it is:
not a power of two
not the value that vector.max_size() gives
the same number both in debug and release
the same number after restarting my computer
the same number regardless of what the value of the long long is
note: I know I can fix this by using vector.reserve() or other methods, I am just interested in where 90811045 comes from.
code used:
#include <iostream>
#include <vector>
int main() {
std::vector<long long> myLongs;
std::cout << "Max size expected : " << myLongs.max_size() << std::endl;
for (int i = 0; i < 160000000; i++) {
myLongs.push_back(i);
if (i % 10000 == 0) {
std::cout << "Still going! : " << i << " \r";
}
}
return 0;
}
extra info:
I am currently using 64 bit windows with 16 GB of ram.
Why 90811045?
It's probably just incidental.
That vector is not the only thing that uses memory in your process. There is the execution stack where local variables are stored. There is memory allocated for buffering the input and output streams. Furthermore, the global memory allocator uses some of the memory for bookkeeping.
90811044 elements were added successfully. The vector implementation (typically) has a deterministic strategy for allocating a larger internal buffer. Typically, it multiplies the previous capacity by a constant factor (greater than 1). Hence, we can conclude that 90811044 * sizeof(long long) + other_usage is consistently small enough to be allocated successfully, but (90811044 * sizeof(long long)) * some_factor + other_usage is consistently too much. While the elements are being moved, the old and the new buffer both exist, so the peak demand during a reallocation is higher still.
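To make the "deterministic growth strategy" concrete, here is a minimal sketch that simulates one such policy. It assumes the capacity grows to old + old / 2 (a factor of 1.5, which is what the Microsoft standard library is known to use); the question does not say which toolchain was used, so treat that as an assumption. The function name simulate_capacities is made up for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Simulate a grow-by-half capacity policy: new = max(old + old / 2, old + 1).
// Assumption: this mirrors a 1.5x growth factor; other standard libraries
// (for example libstdc++) double the capacity instead.
std::vector<std::size_t> simulate_capacities(std::size_t limit) {
    std::vector<std::size_t> caps;
    std::size_t cap = 0;
    while (cap < limit) {
        std::size_t grown = cap + cap / 2;
        if (grown < cap + 1) grown = cap + 1;  // always make room for one more element
        cap = grown;
        caps.push_back(cap);
    }
    return caps;
}
```

Under that assumption the simulated sequence passes through 60540697 and then 90811045, so the number from the question is simply one of the capacities at which the vector has to reallocate. The next buffer would hold 136216567 elements, i.e. about 1.09 GB on top of the roughly 0.73 GB that is still occupied by the old buffer.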
Related
My question is related to a problem described here. I have written a C++ implementation of the Sieve of Eratosthenes that hits a memory overflow if I set the target value too high. As suggested in that question, I am able to fix the problem by using a boolean <vector> instead of a normal array.
However, I am hitting the memory overflow at a much lower value than expected, around n = 1 200 000. The discussion in the thread linked above suggests that the normal C++ boolean array uses a byte for each entry, so with 2 GB of RAM, I expect to be able to get to somewhere on the order of n = 2 000 000 000. Why is the practical memory limit so much smaller?
And why does using <vector>, which encodes the booleans as bits instead of bytes, yield more than an eightfold increase in the computable limit?
Here is a working example of my code, with n set to a small value.
#include <iostream>
#include <cmath>
#include <vector>
using namespace std;
int main() {
// Count and sum of primes below target
const int target = 100000;
// Code I want to use:
bool is_idx_prime[target];
for (unsigned int i = 0; i < target; i++) {
// initialize by assuming prime
is_idx_prime[i] = true;
}
// But doesn't work for target larger than ~1200000
// Have to use this instead
// vector <bool> is_idx_prime(target, true);
for (unsigned int i = 2; i < sqrt(target); i++) {
// All multiples of i * i are nonprime
// If i itself is nonprime, no need to check
if (is_idx_prime[i]) {
for (int j = i; i * j < target; j++) {
is_idx_prime[i * j] = 0;
}
}
}
// 0 and 1 are nonprime by definition
is_idx_prime[0] = 0; is_idx_prime[1] = 0;
unsigned long long int total = 0;
unsigned int count = 0;
for (int i = 0; i < target; i++) {
// cout << "\n" << i << ": " << is_idx_prime[i];
if (is_idx_prime[i]) {
total += i;
count++;
}
}
cout << "\nCount: " << count;
cout << "\nTotal: " << total;
return 0;
}
outputs
Count: 9592
Total: 454396537
C:\Users\[...].exe (process 1004) exited with code 0.
Press any key to close this window . . .
Or, changing n = 1 200 000 yields
C:\Users\[...].exe (process 3144) exited with code -1073741571.
Press any key to close this window . . .
I am using the Microsoft Visual Studio compiler on Windows with the default settings.
Turning the comment into a full answer:
Your operating system reserves a special region of memory for the call stack of your program. Each function call pushes a new stack frame onto the stack, and when the function returns, that frame is removed again. The stack frame holds the function's parameters and local variables. The remaining memory is referred to as the heap. On the heap, arbitrary memory allocations can be made, whereas the structure of the stack is governed by the control flow of your program. Only a limited amount of memory is reserved for the stack (1 MB by default for programs built with Visual Studio), so a local array of roughly a million one-byte bools already fills it. When the stack gets full (e.g. due to too many nested function calls or too-large local objects), you get a stack overflow; the exit code -1073741571 in your output is 0xC00000FD, the Windows status code for exactly that. For this reason, large objects should be allocated on the heap.
To allocate memory on the heap in C++, you can:
Use vector<bool> is_idx_prime(target);, which internally does a heap allocation and deallocates the memory for you when the vector goes out of scope. This is the most convenient way.
Use a smart pointer to manage the allocation: auto is_idx_prime = std::make_unique<bool[]>(target); This will also automatically deallocate the memory when the array goes out of scope.
Allocate the memory manually. I am mentioning this only for educational purposes. As mentioned by Paul in the comments, doing a manual memory allocation is generally not advisable, because you have to manually deallocate the memory again. If you have a large program with many memory allocations, inevitably you will forget to free some allocation, creating a memory leak. When you have a long-running program, such as a system service, creating repeated memory leaks will eventually fill up the entire memory (and speaking from personal experience, this absolutely does happen in practice). But in theory, if you would want to make a manual memory allocation, you would use bool *is_idx_prime = new bool[target]; and then later deallocate again with delete [] is_idx_prime.
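As an illustration of the smart-pointer option above, here is a minimal sketch of the sieve with its flag array on the heap. The function name count_primes_below and the inverted flag (is_composite instead of is_idx_prime) are choices made for this example, not part of the original code.

```cpp
#include <cstddef>
#include <memory>

// Count the primes below `target` using a heap-allocated flag array.
std::size_t count_primes_below(std::size_t target) {
    // make_unique<bool[]>(n) value-initializes, so every flag starts as false.
    auto is_composite = std::make_unique<bool[]>(target);
    for (std::size_t i = 2; i * i < target; ++i) {
        if (!is_composite[i]) {
            // Multiples below i * i were already crossed out by smaller primes.
            for (std::size_t j = i * i; j < target; j += i) {
                is_composite[j] = true;
            }
        }
    }
    std::size_t count = 0;
    for (std::size_t i = 2; i < target; ++i) {
        if (!is_composite[i]) ++count;
    }
    return count;  // the array is freed automatically here
}
```

Because the array no longer lives on the stack, a call such as count_primes_below(2000000) works with the default stack size.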
So I'm taking an assembly course and have been tasked with making a benchmark program for my computer - needless to say, I'm a bit stuck on this particular piece.
As the title says, we're supposed to create a function to read from 5x10^8 different array elements, 4 bytes each time. My only problem is, I don't even think it's possible for me to create an array of 500 million elements? So what exactly should I be doing? (For the record, I'm trying to code this in C++)
//Benchmark Program in C++
#include <iostream>
#include <time.h>
using namespace std;
int main() {
clock_t t1,t2;
int readTemp;
int* arr = new int[5*100000000];
t1=clock();
cout << "Memory Test"
<< endl;
for(long long int j=0; j < 500000000; j+=1) // '<' keeps the last read inside the array
{
readTemp = arr[j];
}
t2=clock();
float diff ((float)t2-(float)t1);
float seconds = diff / CLOCKS_PER_SEC;
cout << "Time Taken: " << seconds << " seconds" <<endl;
}
Your program tries to allocate 2 billion bytes (about 1907 MiB) in one contiguous block, while a 32-bit Windows process gets only 2 GiB (2048 MiB) of user address space by default. These numbers are very close. It's likely that the remaining 141 MiB are already taken by other things. Even though your code is very small, the OS is pretty liberal with the 2048 MiB address space and spends large chunks of it on, for example:
C++ runtime (standard library and other libraries)
Stack: OS allocates a lot of memory to support recursive functions; it doesn't matter that you don't have any
Paddings between virtual memory pages
Paddings used just to make specific sections of data appear at specific addresses (e.g. 0x00400000 for lowest code address, or something like that, is used in Windows)
Padding used to randomize the values of pointers
There's a Windows application that shows a memory map of a running process. You can use it by adding a delay (e.g. getchar()) before the allocation and looking at the largest contiguous free block of memory at that point, and which allocations prevent it from being large enough.
The size is possible :
5 * 10^8 * 4 = ~1.9 GB.
First you will need to allocate your array dynamically; the stack is nowhere near large enough for it.
For your task, 4 bytes is the size of an integer (on typical platforms), so you can write
int* arr = new int[5*100000000];
Alternatively, if you want to be more precise, you can allocate it as bytes
char* bytes = new char[5*4*100000000]; // a char allocation needs a char pointer
Next, you need to make the memory dirty (meaning write something into it) :
memset(arr,0,5*100000000*sizeof(int));
Now you can benchmark cache misses (I'm guessing that's the intent behind such a huge array):
int randomIndex= GetRandomNumberBetween(0,5*100000000-1); // make your own random implementation
int bytes = arr[randomIndex]; // access 4 bytes through integer
If you want 5 * 10^8 random accesses that each hit a different element, you can use a Knuth shuffle of the indices inside GetRandomNumberBetween instead of pure random numbers.
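The shuffled-index idea can be sketched as follows. The function name sum_in_random_order is hypothetical, and the sketch is meant for sizes smaller than the full 5 * 10^8: with std::size_t indices the index array for the full size would itself need about 4 GB.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Visit every element exactly once in a shuffled order and return the sum,
// so that the reads have an observable result and cannot be optimized away.
long long sum_in_random_order(const std::vector<int>& data, unsigned seed) {
    std::vector<std::size_t> order(data.size());
    std::iota(order.begin(), order.end(), std::size_t{0});  // 0, 1, 2, ...
    std::mt19937 rng(seed);
    // Random permutation of the indices (the role of the Knuth shuffle).
    std::shuffle(order.begin(), order.end(), rng);
    long long sum = 0;
    for (std::size_t idx : order) {
        sum += data[idx];
    }
    return sum;
}
```

For the benchmark you would build the order once outside the timed region and only time the final loop, otherwise the shuffle dominates the measurement.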
As far as I can tell, calling malloc() basically means the program is asking the OS for a hunk of memory. I'm writing a program to interface with a camera, in which I need to allocate chunks of memory large enough to store hundreds of images at a time (it's a fast camera).
When I allocate space for about 1.9 Gb worth of images, everything works just fine. The allocation calculation is pretty simple:
int allocateBurst( int numImages )
{
int streamSize = ZIMAGESIZE * numImages;
data.images = new unsigned short [streamSize];
return 0;
}
But as soon as I go over the 2 Gb limit, I get runtime errors like this:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
It seems like 2 Gigs might be the maximum size that I can allocate at once. I have 32 Gigs of ram, and would like to simply be able to allocate larger pieces of memory in one allocation. Is this possible?
I'm running Ubuntu 12.10.
There may be an underlying issue that the OS can't grant your large memory allocation because it is using memory for other applications. Check with your OS to see what the limits are.
Also know that some OS's will "page" memory to the hard disk. When your program asks for memory outside the page, the OS will swap pages with the hard disk. Knowing this, I recommend a classic technique of "Double Buffering" or "Multiple Buffering".
You will need at least two threads: reading and writing. One thread is responsible for reading data from the camera and placing into a buffer. When it fills up a buffer, it starts on another buffer. Meanwhile the writing thread is starting at the buffer and writing it to disk (block file writes). When the writing thread finishes a buffer, it starts on the next one. The buffers should be in a circular sequence to reuse them.
The magic is to have enough buffers so that the reader never catches up to the writer.
Since you are using a couple of small buffers, you should not get any errors from the OS.
There are methods to optimize this, such as obtaining static buffers from the OS.
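A minimal sketch of the multiple-buffering scheme described above, using two threads and a small pool of reusable buffers. All names (BufferPool, run_pipeline) are hypothetical, and the camera read and the disk write are replaced by stand-ins; it needs at least one buffer in the pool to make progress.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

using Buffer = std::vector<unsigned char>;

// A fixed set of buffers that circulates between a "free" and a "full" queue.
class BufferPool {
public:
    BufferPool(std::size_t count, std::size_t bytes) {
        for (std::size_t i = 0; i < count; ++i) free_.push(Buffer(bytes));
    }
    Buffer take_free() { return pop(free_); }              // reader: get an empty buffer
    void give_full(Buffer b) { push(full_, std::move(b)); } // reader: hand over data
    Buffer take_full() { return pop(full_); }              // writer: get data to store
    void give_free(Buffer b) { push(free_, std::move(b)); } // writer: recycle the buffer

private:
    Buffer pop(std::queue<Buffer>& q) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&q] { return !q.empty(); });  // block until one is available
        Buffer b = std::move(q.front());
        q.pop();
        return b;
    }
    void push(std::queue<Buffer>& q, Buffer b) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q.push(std::move(b));
        }
        cv_.notify_all();
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Buffer> free_, full_;
};

// Sends `frames` buffers through the pool; returns the number of bytes "written".
std::size_t run_pipeline(std::size_t frames, std::size_t buffers, std::size_t bytes) {
    BufferPool pool(buffers, bytes);
    std::size_t written = 0;
    std::thread writer([&] {
        for (std::size_t i = 0; i < frames; ++i) {
            Buffer b = pool.take_full();
            written += b.size();  // stand-in for a block file write
            pool.give_free(std::move(b));
        }
    });
    for (std::size_t i = 0; i < frames; ++i) {
        Buffer b = pool.take_free();
        b.assign(bytes, static_cast<unsigned char>(i));  // stand-in for a camera read
        pool.give_full(std::move(b));
    }
    writer.join();
    return written;
}
```

With this layout the total memory is buffers * bytes no matter how many frames pass through, so no single huge allocation is needed.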
The problem is you're using a signed 32-bit variable to describe an unsigned 64-bit number.
Use "size_t" instead of "int" for holding the storage count. This has nothing to do with what you intend to store, just how large a count of them you need.
#include <iostream>
#include <new> // std::bad_alloc
int main(int /*argc*/, const char** /*argv*/)
{
int units = 2;
// 32-bit signed, i.e. 31-bit numbers.
int intSize = units * 1024 * 1024 * 1024;
// 64-bit values (ULL suffix)
size_t sizetSize = units * 1024ULL * 1024ULL * 1024ULL;
std::cout << "intSize = " << intSize << ", sizetSize = " << sizetSize << std::endl;
try {
unsigned short* intAlloc = new unsigned short[intSize];
std::cout << "intAlloc = " << intAlloc << std::endl;
delete [] intAlloc;
} catch (const std::bad_alloc&) {
std::cout << "intAlloc failed (std::bad_alloc)" << std::endl;
}
try {
unsigned short* sizetAlloc = new unsigned short[sizetSize];
std::cout << "sizetAlloc = " << sizetAlloc << std::endl;
delete [] sizetAlloc;
} catch (const std::bad_alloc&) {
std::cout << "sizetAlloc failed (std::bad_alloc)" << std::endl;
}
return 0;
}
Output (g++ -m64 -o test test.cpp under Mint 15 64 bit with g++ 4.7.3 on a virtual machine with 4Gb of memory)
intSize = -2147483648, sizetSize = 2147483648
intAlloc failed
sizetAlloc = 0x7f55affff010
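Applied to the function from the question, the same idea could look like the sketch below. The constant kImageSize is a hypothetical stand-in for ZIMAGESIZE (whose real value is not given), and the helper names are made up for this example.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical stand-in for the question's ZIMAGESIZE (elements per image).
constexpr std::size_t kImageSize = 1024 * 1024;

// The element count is computed entirely in size_t, so it does not wrap
// when the product passes 2^31 - 1.
constexpr std::size_t stream_elements(std::size_t num_images) {
    return kImageSize * num_images;
}

// Returns an owning pointer; throws std::bad_alloc if the request fails.
std::unique_ptr<unsigned short[]> allocate_burst(std::size_t num_images) {
    return std::unique_ptr<unsigned short[]>(
        new unsigned short[stream_elements(num_images)]);
}
```

The important detail is that both operands of the multiplication are already size_t; widening only the variable that receives the result would still let the product overflow in int first.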
int allocateBurst( int numImages )
{
// change that from int to long, and widen one operand so the multiplication is done in long
long streamSize = static_cast<long>(ZIMAGESIZE) * numImages;
data.images = new unsigned short [streamSize];
return 0;
}
Try using
long
OR
a fixed-width 64-bit type such as uint64_t for streamSize (and for the numImages parameter).
An int holds only 32 bits, so the element count wraps once it passes 2^31 - 1, while long (on 64-bit Linux) or uint64_t holds 64 bits and can describe a much larger allocation.
Hope that helps
I'm trying to understand why I am getting std::bad_alloc exceptions when I seem to have enough (virtual?) memory available to me.
Essentially I have a prime number generator (Eratosthenes sieve (not segmented yet)) where I'm newing bools for an indicator array, and then newing ints for the primes I've found under a bound I specify on the command line.
I have 1GB RAM (some of this will be hogged by my OS (ubuntu 10.04), and probably some of it is not available as heap memory (am I wrong here?)) and 2.8 GB of swap space (I believe that was auto set for me when installing Ubuntu)
If I set an upper bound of 600000000 then I'm asking for 0.6 GB of memory for my indicator array and roughly 30000000*4 bytes (slight over estimate given there are 26355867 primes less than 500000000) for my primes array, and a few variables here and there; this means I'm asking for about .72 (+ negligible) GB of memory which I believe should be covered by the swap space available to me (I am aware touching that stuff will slow my program down ridiculously). However I am getting std::bad_allocs.
Could anyone point out what I'm missing here? (One last thing: after changing long long ints to ints before pasting, my latest error was a seg fault. My numbers are way below 2^31, so I can't see where I'm overflowing; I'm still trying to figure that one out.)
My code is as follows (and without taking away from me the benefit of my own investigation into quicker algorithms etc.. I'd be appreciative of any code improvements here! (i.e. if I'm committing major no-no s))
main.cpp
#include <iostream>
#include <cmath>
#include "Prime.hpp"
#include <ctime>
#include <stdio.h>
#include <cstring>
//USAGE: execute program with the nth prime you want and an upper bound for finding primes --too high may cause bad alloc
int main(int argc, const char *argv[])
{
clock_t start = clock();
if(argc != 2) // check argc before touching argv[1]
{
std::cout << "USAGE: Enter a positive inputNumber <= 500000000.\n"
<< "This inputNumber is an upper bound for the primes that can be found\n";
return -1;
}
const char* primeBound = argv[1];
int inputNum = 0;
for(int i = 0; i < strlen(argv[1]); i++)
{
if(primeBound[i] < 48 || primeBound[i] > 57 || primeBound[0] == 48)
{
std::cout << "USAGE: Enter a positive inputNumber <= 500000000.\n"
<< "This inputNumber is an upper bound for the primes that can be found\n";
return -1;
}
inputNum = (int)(primeBound[i]-48) + (10 * inputNum);
}
if(inputNum > 600000000)//getting close to the memory limit for this machine (1GB - memory used by the OS):
//(each bool takes 1 byte and I'd be asking for more than 500 million of these
//and I'd also asking for over 100000000 bytes to store the primes > 0.6 GB)
{
std::cout << "USAGE: Enter a positive inputNumber <= 500000000.\n"
<< "This inputNumber is an upper bound for the primes that can be found\n";
return -1;
}
Prime p(inputNum);
std::cout << "the largest prime less than " << inputNum << " is: " << p.getPrime(p.getNoOfPrimes()) << "\n";
std::cout << "Number of primes: " << p.getNoOfPrimes() << "\n";
std::cout << ((double)clock() - start) / CLOCKS_PER_SEC << "\n";
return 0;
}
Prime.hpp
#ifndef PRIME_HPP
#define PRIME_HPP
#include <iostream>
#include <cmath>
class Prime
{
int lastStorageSize;
bool* primeIndicators;
int* primes;
int noOfPrimes;
void allocateIndicatorArray(int num);
void allocatePrimesArray();
void generateIndicators();
void generatePrimeList();
Prime(){}; //forcing constructor with param = size
public:
Prime(int num);
int getNoOfPrimes();
int getPrime(int nthPrime);
~Prime(){delete [] primeIndicators; delete [] primes;}
};
#endif
Prime.cpp
#include "Prime.hpp"
#include <iostream>
//don't know how much memory I will need so allocate on the heap
void Prime::allocateIndicatorArray(int num)
{
try
{
primeIndicators = new bool[num](); // value-initialized: every flag starts as 0
}
catch(std::bad_alloc ba)
{
std::cout << "not enough memory :[";
//if I'm looking for a particular prime I might have over-allocated here anyway...might be worth
//decreasing num and trying again - if this is possible!
}
lastStorageSize = num;
}
void Prime::allocatePrimesArray()
{
//could probably speed up generateIndicators() if, using some prime number theory, I slightly over allocate here
//since that would cut down the operations dramatically (a small procedure done many times made smaller)
try
{
primes = new int[lastStorageSize];
}
catch(std::bad_alloc ba)
{
std::cout << "not enough memory :[";
//if I'm looking for a particular prime I might have over-allocated here anyway...might be worth
//decreasing num and trying again - if this is possible!
}
}
void Prime::generateIndicators()
{
//first identify the primes -- if we see a 0 then start flipping all elements that are multiples of i starting from i*i (these will not be prime)
int numPrimes = lastStorageSize - 2; //we'll be starting at i = 2 (so numPrimes is at least 2 less than lastStorageSize)
for(int i=4; i < lastStorageSize; i+=2)
{
primeIndicators[i]++; //dispense with all the even numbers (barring 2 - that one's prime)
numPrimes--;
}
//TODO here I'm multiple counting the same things...not cool >;[
//may cost too much to avoid this wastage unfortunately
for(int i=3; i < sqrt(double(lastStorageSize)); i+=2) //we start j at i*i hence the square root
{
if(primeIndicators[i] == 0)
{
for(int j = i*i; j < lastStorageSize; j = j+(2*i)) //note: i is prime, and we'll have already sieved any j < i*i
{
if(primeIndicators[j] == 0)
{
numPrimes--;//we are not checking each element uniquely yet :/
primeIndicators[j]=1;
}
}
}
}
noOfPrimes = numPrimes;
}
void Prime::generatePrimeList()
{
//now we go and get the primes, i.e. wherever we see zero in primeIndicators[] then populate primes with the value of i
int primesCount = 0;
for(int i=2;i<lastStorageSize; i++)
{
if(primeIndicators[i] == 0)
{
if(i%1000000 == 0)
std::cout << i << " ";
primes[primesCount]=i;
primesCount++;
}
}
}
Prime::Prime(int num)
{
noOfPrimes = 0;
allocateIndicatorArray(num);
generateIndicators();
allocatePrimesArray();
generatePrimeList();
}
int Prime::getPrime(int nthPrime)
{
if(nthPrime < lastStorageSize)
{
return primes[nthPrime-1];
}
else
{
std::cout << "insufficient primes found\n";
return -1;
}
}
int Prime::getNoOfPrimes()
{
return noOfPrimes;
}
Whilst I'm reading around has anybody got any insight on this?
edit For some reason I decided to start newing my primes list with lastStorageSize ints instead of noOfPrime! thanks to David Fischer for spotting that one!
I can now exceed 600000000 as an upper bound
The amount of memory you can use inside your program is limited by the lesser of the two: 1) the available virtual memory, 2) the available address space.
If you are compiling your program as a 32-bit executable on a platform with flat memory model, the absolute limit of addressable space for a single process is 4GB. In this situation it is completely irrelevant how much swap space you have available. You simply can't allocate more than 4GB in a flat-memory 32-bit program, even if you still have lots of free swap space. Moreover, a large chunk of those 4GB of available addresses will be reserved for system needs.
On such a 32-bit platform allocating a large amount of swap space does make sense, since it will let you run multiple processes at once. But it does nothing to overcome the 4GB address space barrier for each specific process.
Basically, think of it as a phone number availability problem: if some region uses 7-digit phone numbers, then once you run out of the available 7-digit phone numbers in that region, manufacturing more phones for that region no longer makes any sense - they won't be usable. By adding swap space you are essentially "manufacturing phones". But you have already run out of available "phone numbers".
The same issue formally exists, of course, with flat-memory model 64-bit platforms. However, the address space of 64-bit platform is so huge, that it is no longer a bottleneck (you know, "64-bit should be enough for everyone" :) )
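A quick way to see which of the two situations applies is to check the pointer width of the build. The helper name pointer_bits is made up for this sketch.

```cpp
#include <climits>
#include <cstddef>

// Width of a pointer in bits: 32 for a 32-bit build, 64 for a 64-bit build.
// A flat 32-bit address space can name at most 2^32 bytes (4 GB), however
// much RAM or swap the machine has.
constexpr std::size_t pointer_bits() {
    return sizeof(void*) * CHAR_BIT;
}
```

If this reports 32 on a 64-bit operating system, the program was compiled as a 32-bit executable and the address-space limit described above still applies to it.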
When you allocate the sieve,
void Prime::allocateIndicatorArray(int num)
{
try
{
primeIndicators = new bool[num];
}
catch(std::bad_alloc ba)
{
std::cout << "not enough memory :[";
}
lastStorageSize = num;
}
you set lastStorageSize to num, the given bound for the primes. Then you never change it, and
void Prime::allocatePrimesArray()
{
try
{
primes = new int[lastStorageSize];
}
catch(std::bad_alloc ba)
{
std::cout << "not enough memory :[";
}
}
try to allocate an int array of lastStorageSize elements.
If num is around 500 million, that's around 2 GB that you request. Depending on operating system/overcommitting strategy, that can easily cause a bad_alloc even though you only need a fraction of the space actually.
After the sieving is finished, you set noOfPrimes to the count of found primes - use that number to allocate the primes array.
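A minimal sketch of that order of operations: sieve first, count, and only then size the list of primes. It uses std::vector instead of the class from the question, and the function name primes_below is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Returns all primes below `bound`; the result is sized from the actual count.
std::vector<int> primes_below(int bound) {
    if (bound < 2) return {};
    std::vector<bool> composite(static_cast<std::size_t>(bound), false);
    std::size_t count = 0;
    for (int i = 2; i < bound; ++i) {
        if (composite[i]) continue;
        ++count;  // i survived the sieve, so it is prime
        // Start at i * i; use long long so the product cannot overflow int.
        for (long long j = 1LL * i * i; j < bound; j += i) {
            composite[j] = true;
        }
    }
    std::vector<int> primes;
    primes.reserve(count);  // room for the primes found, not for `bound` ints
    for (int i = 2; i < bound; ++i) {
        if (!composite[i]) primes.push_back(i);
    }
    return primes;
}
```

For a bound of 600000000 this asks for space for the primes actually found (a few tens of millions of ints) instead of 600000000 ints.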
Since the memory usage of the program is so easy to analyze, just let the memory layout be completely fixed. Don't dynamically allocate anything. Use std::bitset to get a fixed-size bitvector, and make that a global variable.
std::bitset< 600000000 > indicators; // 75 MB
This won't take up space on disk. The OS will just allocate pages of zeroes as you progress along the array. And it makes better use of each bit.
Of course, half the bits represent even numbers, despite there being only one even prime. Here are a couple prime generators that optimize out such things.
By the way, it's better to avoid explicitly writing new if possible, avoid calling functions from the constructor, and to rethrow the std::bad_alloc to avoid allowing the object to be constructed into an invalid state.
The first question is "what other processes are running?" The
2.8 GB of swap space is shared between all of the running
processes; it is not per process. And frankly, on a modern
system, 2.8 GB sounds fairly low to me. I wouldn't try to run
recent versions of Windows or Linux with less than 2GB ram and
4GB swap. (Recent versions of Linux, at least in the Ubuntu
distribution, especially, seem to start up a lot of daemons
which hog the memory.) You might want to try top, sorted on
virtual memory size, just to see how much other processes are
taking.
cat /proc/meminfo will also give you a lot of valuable
information about what is actually being used. (On my system,
running just a couple of xterm with bash, plus Firefox, I
have only 3623776 kB free, on a system with 8GB. Some of the
memory counted as used is probably things like disk caching,
which the system can scale back if an application requests
memory.)
Second, concerning your seg faults: by default, Linux doesn't
always report allocation failures correctly; it
will often lie, telling you that you have the memory, when you
don't. Try cat /proc/sys/vm/overcommit_memory. If it
displays zero, then you need to change it. If this is the case,
try echo 2 > /proc/sys/vm/overcommit_memory (and do this in
one of the rc files). You may have to change the
/proc/sys/vm/overcommit_ratio as well to get reliable behavior
from sbrk (which both malloc and operator new depend on).
When using a C++ vector, the time spent is 718 milliseconds,
while with an array the time is almost 0 milliseconds.
Why so much performance difference?
int _tmain(int argc, _TCHAR* argv[])
{
const int size = 10000;
clock_t start, end;
start = clock();
vector<int> v(size*size);
for(int i = 0; i < size; i++)
{
for(int j = 0; j < size; j++)
{
v[i*size+j] = 1;
}
}
end = clock();
cout<< (end - start)
<<" milliseconds."<<endl; // 718 milliseconds
int f = 0;
start = clock();
int arr[size*size];
for(int i = 0; i < size; i++)
{
for(int j = 0; j < size; j++)
{
arr[i*size+j] = 1;
}
}
end = clock();
cout<< ( end - start)
<<" milliseconds."<<endl; // 0 milliseconds
return 0;
}
Your array arr is allocated on the stack, i.e., the compiler has calculated the necessary space at compile time. At the beginning of the method, the compiler will insert an assembler statement like
sub esp, 10000*10000*sizeof(int)
which means the stack pointer (esp) is decreased by 10000 * 10000 * sizeof(int) bytes to make room for an array of 10000^2 integers. This operation is almost instant.
The vector is heap allocated and heap allocation is much more expensive. When the vector allocates the required memory, it has to ask the operating system for a contiguous chunk of memory and the operating system will have to perform significant work to find this chunk of memory.
As Andreas says in the comments, all your time is spent in this line:
vector<int> v(size*size);
Accessing the vector inside the loop is just as fast as for the array.
For an additional overview see e.g.
What and where are the stack and heap?
http://computer.howstuffworks.com/c28.htm
http://www.cprogramming.com/tutorial/virtual_memory_and_heaps.html
Edit:
After all the comments about performance optimizations and compiler settings, I did some measurements this morning. I had to set size=3000 so I did my measurements with roughly a tenth of the original entries. All measurements performed on a 2.66 GHz Xeon:
With debug settings in Visual Studio 2008 (no optimization, runtime checks, and debug runtime) the vector test took 920 ms compared to 0 ms for the array test.
98,48 % of the total time was spent in vector::operator[], i.e., the time was indeed spent on the runtime checks.
With full optimization, the vector test needed 56 ms (with a tenth of the original number of entries) compared to 0 ms for the array.
The vector ctor required 61,72 % of the total application running time.
So I guess everybody is right depending on the compiler settings used. The OP's timing suggests an optimized build or an STL without runtime checks.
As always, the moral is: profile first, optimize second.
If you are compiling this with a Microsoft compiler, to make it a fair comparison you need to switch off iterator security checks and iterator debugging, by defining _SECURE_SCL=0 and _HAS_ITERATOR_DEBUGGING=0.
Secondly, the constructor you are using initialises each vector value with zero, and you are not memsetting the array to zero before filling it. So you are traversing the vector twice.
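Try reserving the capacity up front and appending with push_back (reserve does not change size(), so indexing with v[i] right after it would be out of bounds):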
vector<int> v;
v.reserve(size*size);
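A minimal sketch of the reserve variant, so that each element is written only once. The function name filled_with_ones is made up for this example.

```cpp
#include <cstddef>
#include <vector>

// Build a vector of n ones with a single allocation and a single pass.
std::vector<int> filled_with_ones(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);  // allocates capacity; size() is still 0 at this point
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(1);  // constructs each element exactly once
    }
    return v;
}
```

Note the use of push_back rather than v[i] = 1: after reserve the vector still has no elements, so operator[] would access storage outside its size.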
To get a fair comparison I think something like the following should be suitable:
#include <sys/time.h>
#include <vector>
#include <iostream>
#include <algorithm>
#include <numeric>
int main()
{
static size_t const size = 7e6;
timeval start, end;
int sum;
gettimeofday(&start, 0);
{
std::vector<int> v(size, 1);
sum = std::accumulate(v.begin(), v.end(), 0);
}
gettimeofday(&end, 0);
std::cout << "= vector =" << std::endl
<< "(" << end.tv_sec - start.tv_sec
<< " s, " << end.tv_usec - start.tv_usec
<< " us)" << std::endl
<< "sum = " << sum << std::endl << std::endl;
gettimeofday(&start, 0);
int * const arr = new int[size];
std::fill(arr, arr + size, 1);
sum = std::accumulate(arr, arr + size, 0);
delete [] arr;
gettimeofday(&end, 0);
std::cout << "= Simple array =" << std::endl
<< "(" << end.tv_sec - start.tv_sec
<< " s, " << end.tv_usec - start.tv_usec
<< " us)" << std::endl
<< "sum = " << sum << std::endl << std::endl;
}
In both cases, dynamic allocation and deallocation is performed, as well as accesses to elements.
On my Linux box:
$ g++ -O2 foo.cpp
$ ./a.out
= vector =
(0 s, 21085 us)
sum = 7000000
= Simple array =
(0 s, 21148 us)
sum = 7000000
Both the std::vector<> and array cases have comparable performance. The point is that std::vector<> can be just as fast as a simple array if your code is structured appropriately.
On a related note switching off optimization makes a huge difference in this case:
$ g++ foo.cpp
$ ./a.out
= vector =
(0 s, 120357 us)
sum = 7000000
= Simple array =
(0 s, 60569 us)
sum = 7000000
Many of the optimization assertions made by folks like Neil and jalf are entirely correct.
HTH!
EDIT: Corrected code to force vector destruction to be included in time measurement.
Change the assignment to e.g. arr[i*size+j] = i*j, or some other non-constant expression. I think the compiler optimizes away the whole loop, as the assigned values are never used, or replaces the array with some precalculated values, so that the loop isn't even executed and you get 0 milliseconds.
Having changed 1 to i*j, I get the same timings for both vector and array, unless I pass the -O1 flag to gcc, in which case I get 0 milliseconds for both.
So, first of all, double-check whether your loops are actually executed.
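One way to make sure the loop is really executed is to compute something from the stored values and return or print it. The sketch below does that and times the work with std::chrono; the struct Timed and the function time_fill are hypothetical names, and the i % 7 pattern is an arbitrary non-constant value.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

struct Timed {
    long long checksum;  // depends on every stored value
    double millis;       // elapsed wall-clock time in milliseconds
};

// Fill a vector with non-constant values, then read them back into a checksum.
Timed time_fill(std::size_t n) {
    auto start = std::chrono::steady_clock::now();
    std::vector<int> v(n);
    for (std::size_t i = 0; i < n; ++i) {
        v[i] = static_cast<int>(i % 7);
    }
    long long checksum = 0;
    for (int x : v) {
        checksum += x;
    }
    auto stop = std::chrono::steady_clock::now();
    double ms = std::chrono::duration<double, std::milli>(stop - start).count();
    return {checksum, ms};
}
```

Printing the checksum together with the time gives the compiler a reason to keep the stores, and it doubles as a sanity check that both variants under test did the same work.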
You are probably using VC++, in which case by default standard library components perform many checks at run-time (e.g. whether the index is in range). These checks can be turned off by defining some macros as 0 (I think _SECURE_SCL).
Another thing is that I can't even run your code as is: the automatic array is way too large for the stack. When I make it global, then with MingW 3.5 the times I get are 627 ms for the vector and 26875 ms (!!) for the array, which indicates there are really big problems with an array of this size.
As to this particular operation (filling with value 1), you could use the vector's constructor:
std::vector<int> v(size * size, 1);
and the fill algorithm for the array:
std::fill(arr, arr + size * size, 1);
Two things. One, operator[] is much slower for vector. Two, vector in most implementations will behave weird at times when you add in one element at a time. I don't mean just that it allocates more memory but it does some genuinely bizarre things at times.
The first one is the main issue. For a mere million bytes, even reallocating the memory a dozen times should not take long (it won't do it on every added element).
In my experiments, preallocating doesn't change its slowness much. When the contents are actual objects it basically grinds to a halt if you try to do something simple like sort it.
Conclusion, don't use stl or mfc vectors for anything large or computation heavy. They are implemented poorly/slowly and cause lots of memory fragmentation.
When you declare the array, it lives on the stack (or in the static memory zone), which is very fast, but it can't increase its size.
When you declare the vector, it allocates dynamic memory, which is not as fast but is more flexible: you can change the size and don't have to dimension it to the maximum size.
When profiling code, make sure you are comparing similar things.
vector<int> v(size*size);
initializes each element in the vector,
int arr[size*size];
doesn't. Try
int arr[size * size];
memset( arr, 0, size * size * sizeof(int) ); // memset counts bytes, not elements
and measure again...