Can't deque.push_back() 10 million+ deques

I'm a student, and my Operating Systems class project has a little snag, which is admittedly a bit superfluous to the assignment specifications itself:
While I can push 1 million deques into my deque of deques, I cannot push ~10 million or more.
Now, in the actual program, there is lots of stuff going on, and the only thing already asked on Stack Overflow with even the slightest relevance had exactly that, only slight relevance. https://stackoverflow.com/a/11308962/3407808
Since that answer had focused on "other functions corrupting the heap", I isolated the code into a new project and ran that separately, and found everything to fail in exactly the same ways.
Here's the code itself, stripped down and renamed for the sake of space.
#include <iostream>
#include <string>
#include <sstream>
#include <deque>
using namespace std;
class cat
{
cat();
};
bool number_range(int lower, int upper, double value)
{
while(true)
{
if(value >= lower && value <= upper)
{
return true;
}
else
{
cin.clear();
cerr << "Value not between " << lower << " and " << upper << ".\n";
return false;
}
}
}
double get_double(const char *message, int lower, int upper)
{
double out;
string in;
while(true) {
cout << message << " ";
getline(cin,in);
stringstream ss(in); //convert input to stream for conversion to double
if(ss >> out && !(ss >> in))
{
if (number_range(lower, upper, out))
{
return out;
}
}
//(ss >> out) checks for valid conversion to double
//!(ss >> in) checks for unconverted input and rejects it
cin.clear();
cerr << "Value not between " << lower << " and " << upper << ".\n";
}
}
int main()
{
int dq_amount = 0;
deque<deque <cat> > dq_array;
deque<cat> dq;
do {
dq_amount = get_double("INPUT # OF DEQUES: ", 0, 99999999);
for (int i = 0; i < dq_amount; i++)
{
dq_array.push_back(dq);
}
} while (!number_range(0, 99999999, dq_amount));
}
In case that's a little obfuscated, the design (just in case it's related to the error) is that my program asks you to input an integer value. It takes your input and verifies that it can be read as a number, then checks that it is within certain numerical bounds. Once it's found to be within bounds, I push deques of myClass (cat in the code above) into a deque of deques of myClass, as many times as the user's input specifies.
This code has been working for the past few weeks that I've been making this project, but my upper bound had always been 9999, and I decided to standardize it with most of the other inputs in my program, which use an appreciably large 99,999,999. Running this code with 9999 as the user input works fine, even with 99999999 as the upper bound. The issue is a runtime error that happens if the user input is 9999999 or more.
Is there any particular, clear reason why this doesn't work?
Oh, right, the error message itself from Code::Blocks 13.12:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's supprt team for more information.
Process returned 3 (0x3) execution time : 12.559 s
Press any key to continue.
I had screenshots, but I need to be 10+ reputation in order to put images into my questions.

This looks like address space exhaustion.
If you are compiling for a 32-bit target, you will generally be limited to 2 GiB of user-mode accessible address space per process, or maybe 3 GiB on some platforms. (The remainder is reserved for kernel-mode mappings shared between processes.)
If you are running on a 64-bit platform and build a 64-bit binary, you should be able to do substantially more new/malloc() calls, but be advised you may start hitting swap.
Alternatively, you might be hitting a resource quota even if you are building a 64-bit binary. On Linux you can check ulimit -d to see if you have a per-process memory limit.

Related

Max number of elements vector

I was looking to see how many elements I can stick into a vector before the program crashes. When running the code below the program crashed with a bad alloc at i=90811045, aka when trying to add the 90811045th element. My question is: Why 90811045?
it is:
not a power of two
not the value that vector.max_size() gives
the same number both in debug and release
the same number after restarting my computer
the same number regardless of what the value of the long long is
note: I know I can fix this by using vector.reserve() or other methods, I am just interested in where 90811045 comes from.
code used:
#include <iostream>
#include <vector>
int main() {
std::vector<long long> myLongs;
std::cout << "Max size expected : " << myLongs.max_size() << std::endl;
for (int i = 0; i < 160000000; i++) {
myLongs.push_back(i);
if (i % 10000 == 0) {
std::cout << "Still going! : " << i << " \r";
}
}
return 0;
}
extra info:
I am currently using 64 bit windows with 16 GB of ram.

Why 90811045?
It's probably just incidental.
That vector is not the only thing that uses memory in your process. There is the execution stack where local variables are stored. There is memory allocated for buffering the input and output streams. Furthermore, the global memory allocator uses some of the memory for bookkeeping.
90811044 were added successfully. The vector implementation (typically) has a deterministic strategy for allocating a larger internal buffer. Typically, it multiplies the previous capacity by a constant factor (greater than 1). Hence, we can conclude that 90811044 * sizeof(long long) + other_usage is consistently small enough to be allocated successfully, but (90811044 * sizeof(long long)) * some_factor + other_usage is consistently too much.

Why does the program end even before running?

I want to use this question to improve my general understanding of how computers work, since I'll probably never have the chance to study the subject in depth. Sorry in advance if the question is silly and not useful in general, but I prefer to learn in this way.
I am learning C++, and I found some code online that implements the Newton-Raphson method for finding the root of a function. The code is pretty simple: at the beginning it asks for the required tolerance, and if I give a "decent" number it works fine. If instead I enter something like 1e-600 when it asks for the tolerance, the program breaks down immediately and the output is Enter starting value x: Failed to converge after 100 iterations .
The output of failed convergence should be a consequence of running the loop for more than 100 iterations, but this isn't the case since the loop doesn't even start. It looks like the program knows already it won't reach that level of tolerance.
Why does this happen? How can the program write that output even if it didn't try the loop for 100 times?
Edit: It seems that everything meaningless (too small numbers, words) I write when it asks for tolerance produces a pnew=0.25 and then the code runs 100 times and fails.
The code is the following:
#include <iostream>
#include <cmath>
using namespace std;
#define N 100 // Maximum number of iterations
int main() {
double p, pnew;
double f, dfdx;
double tol;
int i;
cout << "Enter tolerance: ";
cin >> tol;
cout << "Enter starting value x: ";
cin >> pnew;
// Main Loop
for(i=0; i < N; i++){
p = pnew;
//Evaluate the function and its derivative
f = 4*p - cos(p);
dfdx= 4 + sin(p);
// The Newton-Raphson step
pnew = p - f/dfdx;
// Check for convergence and quit if done
if(abs(p-pnew) < tol){
cout << "Root is " << pnew << " to within " << tol << "\n";
return 0;
}
}
// We reach this point only if the iteration failed to converge
cerr << "Failed to converge after " << N << " iterations.\n";
return 1;
}

1e-600 is not representable by most implementations of double. std::cin will fail to convert your input to double and fall into a failed state. This means that, unless you clear the error state, any future std::cin also automatically fails without waiting for user input.
From cppreference (since C++11):
If extraction fails, zero is written to value and failbit is set. If extraction results in the value too large or too small to fit in value, std::numeric_limits<T>::max() or std::numeric_limits<T>::min() is written and failbit flag is set.

As mentioned, 1e-600 is not a valid double value. However, there's more to it than being outside of the range. What's likely happening is that 1 is scanned into tol, and then some portion of e-600 is being scanned into pnew, and that's why it ends immediately, instead of asking for input for pnew.

Like François said, you cannot exceed 2^64 when you work on a 64-bit machine (with a corresponding OS) and 2^32 on a 32-bit machine; you can use SSE, which are 4 32 bytes data used for floating point representation. In your program the function fails at every iteration and skips your test with "if" and so never returns before ending the loop.

How can I print a newline without flushing the buffer?

I was experimenting with c++ trying to figure out how I could print the numbers from 0 to n as fast as possible.
At first I just printed all the numbers with a loop:
for (int i = 0; i < n; i++)
{
std::cout << i << std::endl;
}
However, I think this flushes the buffer after every single number that it outputs, and surely that must take some time, so I tried to first print all the numbers to the buffer (or actually until it's full, as it then seems to flush automatically) and then flush it all at once. However, it seems that printing a \n afterwards flushes the buffer just like std::endl, so I omitted it:
for (int i = 0; i < n; i++)
{
std::cout << i << ' ';
}
std::cout << std::endl;
This seems to run about 10 times faster than the first example. However I want to know how to store all the values in the buffer and flush it all at once rather than letting it flush every time it becomes full so I have a few questions:
Is it possible to print a newline without flushing the buffer?
How can I change the buffer size so that I could store all the values inside it and flush it at the very end?
Is this method of outputting text dumb? If so, why, and what would be a better alternative to it?
EDIT: It seems that my results were biased by a laggy system (Terminal app of a smartphone)... With a faster system the execution times show no significant difference.

TL;DR: In general, using '\n' instead of std::endl is faster since std::endl also flushes the buffer.
Explanation:
std::endl causes a flushing of the buffer, whereas '\n' does not.
However, you might or might not notice any speedup whatsoever depending upon the method of testing that you apply.
Consider the following test files:
endl.cpp:
#include <iostream>
int main() {
for ( int i = 0 ; i < 1000000 ; i++ ) {
std::cout << i << std::endl;
}
}
slashn.cpp:
#include <iostream>
int main() {
for ( int i = 0 ; i < 1000000 ; i++ ) {
std::cout << i << '\n';
}
}
Both of these are compiled using g++ on my Linux system and undergo the following tests:
1. time ./a.out
For endl.cpp, it takes 19.415s.
For slashn.cpp, it takes 19.312s.
2. time ./a.out >/dev/null
For endl.cpp, it takes 0.397s
For slashn.cpp, it takes 0.153s
3. time ./a.out >temp
For endl.cpp, it takes 2.255s
For slashn.cpp, it takes 0.165s
Conclusion: '\n' is definitely faster (even in practice), but the difference in speed can depend on other factors. In the case of a terminal window, the limiting factor seems to be how fast the terminal itself can display the text. As the text is shown on screen and auto-scrolling etc. needs to happen, massive slowdowns occur in the execution. On the other hand, for normal files (like the temp example above), the rate at which the buffer is flushed affects it a lot. In the case of some special files (like /dev/null above), since the data is just sunk into a black hole, the flushing seems to matter much less.

How to set a priority for the execution of the program in source code?

I wrote the following code, which should enumerate all possible combinations of two digits in a string whose length is specified:
#include <iostream>
#include <Windows.h>
int main ()
{
using namespace std;
cout<<"Enter length of array"<<endl;
int size;
cin>>size;
int * ps=new int [size];
for (int i=0; i<size; i++)
ps[i]=3;
int k=4;
SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS);
while (k>=0)
{
for (int bi=0; bi<size; bi++)
std::cout<<ps[bi];
std::cout<<std::endl;
int i=size-1;
if (ps[i]==3)
{
ps[i]=4;
continue;
}
if (ps[i]==4)
{
while (ps[i]==4)
{
ps[i]=3;
--i;
}
ps[i]=4;
if (i<k)
k--;
}
}
}
When the program ran on Windows 7, I saw that the CPU load was only 10-15%. To make my code run faster, I decided to change the priority of my program to High. But when I did, there was no speedup and the CPU load stayed the same. Why doesn't the CPU load change? Is the statement SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS); incorrect? Or can this code not run any faster?

If your CPU is not working at its full capacity, it means that your application is not capable of using it because of causes like I/O, sleeps, memory or other device throughput limits.
Most probably, however, it means that your CPU has 2+ cores and your application is single-threaded. In this case you have to go through the process of parallelizing your application, which is often neither simple nor fast.
In case of the code you posted, the most time consuming operation is actually (most probably) printing the results. Remove the cout code and see for yourself how fast the code will work.

Increasing the priority of your program won't help much.
What you need to do is to remove the cout from your calculations. Store your computations and output them afterwards.
As others have noted, it might also be that you are on a multi-core machine. Either way, removing any output from your computation loop is always a first step towards using 100% of your machine's computing power for the calculation rather than wasting cycles on output.
std::vector<int> results;
results.reserve(1000); // this should ideally match the number of results you expect
while (k>=0)
{
for (int bi=0; bi<size; bi++){
results.push_back(ps[bi]);
}
int i=size-1;
if (ps[i]==3)
{
ps[i]=4;
continue;
}
if (ps[i]==4)
{
while (ps[i]==4)
{
ps[i]=3;
--i;
}
ps[i]=4;
if (i<k)
k--;
}
}
// now here you can output your data
for(auto&& res : results){
cout << res << "\n"; // \n to not force flush
}
cout << endl; // now force flush

What's probably happening is you're on a multi-core/multi-thread machine and you're running on only one thread, the rest of the CPU power is just sitting idle. So you'll want to multi-thread your code. Look at boost thread.

std::bad_alloc without going into swap space

I'm trying to understand why I am getting std::bad_alloc exceptions when I seem to have enough (virtual?) memory available to me.
Essentially I have a prime number generator (Eratosthenes sieve (not segmented yet)) where I'm newing bools for an indicator array, and then newing ints for the primes I've found under a bound I specify on the command line.
I have 1GB RAM (some of this will be hogged by my OS (ubuntu 10.04), and probably some of it is not available as heap memory (am I wrong here?)) and 2.8 GB of swap space (I believe that was auto set for me when installing Ubuntu)
If I set an upper bound of 600000000 then I'm asking for 0.6 GB of memory for my indicator array and roughly 30000000*4 bytes (slight over estimate given there are 26355867 primes less than 500000000) for my primes array, and a few variables here and there; this means I'm asking for about .72 (+ negligible) GB of memory which I believe should be covered by the swap space available to me (I am aware touching that stuff will slow my program down ridiculously). However I am getting std::bad_allocs.
Could anyone point out what I'm missing here? (one last thing having changed long long ints to ints before pasting my last error was a seg fault (my numbers are way below 2^31 though so I can't see where I'm overflowing) - still trying to figure that one out)
My code is as follows (and without taking away from me the benefit of my own investigation into quicker algorithms etc.. I'd be appreciative of any code improvements here! (i.e. if I'm committing major no-no s))
main.cpp
#include <iostream>
#include <cmath>
#include "Prime.hpp"
#include <ctime>
#include <stdio.h>
#include <cstring>
//USAGE: execute program with the nth prime you want and an upper bound for finding primes --too high may cause bad alloc
int main(int argc, const char *argv[])
{
clock_t start = clock();
if(argc != 2)
{
std::cout << "USAGE: Enter a positive inputNumber <= 500000000.\n"
<< "This inputNumber is an upper bound for the primes that can be found\n";
return -1;
}
const char* primeBound = argv[1];
int inputNum = 0;
for(int i = 0; i < strlen(argv[1]); i++)
{
if(primeBound[i] < 48 || primeBound[i] > 57 || primeBound[0] == 48)
{
std::cout << "USAGE: Enter a positive inputNumber <= 500000000.\n"
<< "This inputNumber is an upper bound for the primes that can be found\n";
return -1;
}
inputNum = (int)(primeBound[i]-48) + (10 * inputNum);
}
if(inputNum > 600000000)//getting close to the memory limit for this machine (1GB - memory used by the OS):
//(each bool takes 1 byte and I'd be asking for more than 500 million of these
//and I'd also asking for over 100000000 bytes to store the primes > 0.6 GB)
{
std::cout << "USAGE: Enter a positive inputNumber <= 500000000.\n"
<< "This inputNumber is an upper bound for the primes that can be found\n";
return -1;
}
Prime p(inputNum);
std::cout << "the largest prime less than " << inputNum << " is: " << p.getPrime(p.getNoOfPrimes()) << "\n";
std::cout << "Number of primes: " << p.getNoOfPrimes() << "\n";
std::cout << ((double)clock() - start) / CLOCKS_PER_SEC << "\n";
return 0;
}
Prime.hpp
#ifndef PRIME_HPP
#define PRIME_HPP
#include <iostream>
#include <cmath>
class Prime
{
int lastStorageSize;
bool* primeIndicators;
int* primes;
int noOfPrimes;
void allocateIndicatorArray(int num);
void allocatePrimesArray();
void generateIndicators();
void generatePrimeList();
Prime(){}; //forcing constructor with param = size
public:
Prime(int num);
int getNoOfPrimes();
int getPrime(int nthPrime);
~Prime(){delete [] primeIndicators; delete [] primes;}
};
#endif
Prime.cpp
#include "Prime.hpp"
#include <iostream>
//don't know how much memory I will need so allocate on the heap
void Prime::allocateIndicatorArray(int num)
{
try
{
primeIndicators = new bool[num];
}
catch(std::bad_alloc ba)
{
std::cout << "not enough memory :[";
//if I'm looking for a particular prime I might have over-allocated here anyway...might be worth
//decreasing num and trying again - if this is possible!
}
lastStorageSize = num;
}
void Prime::allocatePrimesArray()
{
//could probably speed up generateIndicators() if, using some prime number theory, I slightly over allocate here
//since that would cut down the operations dramatically (a small procedure done many times made smaller)
try
{
primes = new int[lastStorageSize];
}
catch(std::bad_alloc ba)
{
std::cout << "not enough memory :[";
//if I'm looking for a particular prime I might have over-allocated here anyway...might be worth
//decreasing num and trying again - if this is possible!
}
}
void Prime::generateIndicators()
{
//first identify the primes -- if we see a 0 then start flipping all elements that are multiples of i starting from i*i (these will not be prime)
int numPrimes = lastStorageSize - 2; //we'll be starting at i = 2 (so numPrimes is at least 2 less than lastStorageSize)
for(int i=4; i < lastStorageSize; i+=2)
{
primeIndicators[i]++; //dispense with all the even numbers (barring 2 - that one's prime)
numPrimes--;
}
//TODO here I'm multiple counting the same things...not cool >;[
//may cost too much to avoid this wastage unfortunately
for(int i=3; i < sqrt(double(lastStorageSize)); i+=2) //we start j at i*i hence the square root
{
if(primeIndicators[i] == 0)
{
for(int j = i*i; j < lastStorageSize; j = j+(2*i)) //note: i is prime, and we'll have already sieved any j < i*i
{
if(primeIndicators[j] == 0)
{
numPrimes--;//we are not checking each element uniquely yet :/
primeIndicators[j]=1;
}
}
}
}
noOfPrimes = numPrimes;
}
void Prime::generatePrimeList()
{
//now we go and get the primes, i.e. wherever we see zero in primeIndicators[] then populate primes with the value of i
int primesCount = 0;
for(int i=2;i<lastStorageSize; i++)
{
if(primeIndicators[i] == 0)
{
if(i%1000000 == 0)
std::cout << i << " ";
primes[primesCount]=i;
primesCount++;
}
}
}
Prime::Prime(int num)
{
noOfPrimes = 0;
allocateIndicatorArray(num);
generateIndicators();
allocatePrimesArray();
generatePrimeList();
}
int Prime::getPrime(int nthPrime)
{
if(nthPrime < lastStorageSize)
{
return primes[nthPrime-1];
}
else
{
std::cout << "insufficient primes found\n";
return -1;
}
}
int Prime::getNoOfPrimes()
{
return noOfPrimes;
}
Whilst I'm reading around has anybody got any insight on this?
Edit: For some reason I decided to start newing my primes list with lastStorageSize ints instead of noOfPrimes! Thanks to David Fischer for spotting that one!
I can now exceed 600000000 as an upper bound

The amount of memory you can use inside your program is limited by the lesser of the two: 1) the available virtual memory, 2) the available address space.
If you are compiling your program as a 32-bit executable on a platform with flat memory model, the absolute limit of addressable space for a single process is 4GB. In this situation it is completely irrelevant how much swap space you have available. You simply can't allocate more than 4GB in a flat-memory 32-bit program, even if you still have lots of free swap space. Moreover, a large chunk of those 4GB of available addresses will be reserved for system needs.
On such a 32-bit platform allocating a large amount of swap space does make sense, since it will let you run multiple processes at once. But it does nothing to overcome the 4GB address space barrier for each specific process.
Basically, think of it as a phone number availability problem: if some region uses 7-digit phone numbers, then once you run out of the available 7-digit phone numbers in that region, manufacturing more phones for that region no longer makes any sense - they won't be usable. By adding swap space you are essentially "manufacturing phones". But you have already run out of available "phone numbers".
The same issue formally exists, of course, with flat-memory model 64-bit platforms. However, the address space of 64-bit platform is so huge, that it is no longer a bottleneck (you know, "64-bit should be enough for everyone" :) )

When you allocate the sieve,
void Prime::allocateIndicatorArray(int num)
{
try
{
primeIndicators = new bool[num];
}
catch(std::bad_alloc ba)
{
std::cout << "not enough memory :[";
}
lastStorageSize = num;
}
you set lastStorageSize to num, the given bound for the primes. Then you never change it, and
void Prime::allocatePrimesArray()
{
try
{
primes = new int[lastStorageSize];
}
catch(std::bad_alloc ba)
{
std::cout << "not enough memory :[";
}
}
try to allocate an int array of lastStorageSize elements.
If num is around 500 million, that's around 2 GB that you request. Depending on operating system/overcommitting strategy, that can easily cause a bad_alloc even though you only need a fraction of the space actually.
After the sieving is finished, you set noOfPrimes to the count of found primes - use that number to allocate the primes array.

Since the memory usage of the program is so easy to analyze, just let the memory layout be completely fixed. Don't dynamically allocate anything. Use std::bitset to get a fixed-size bitvector, and make that a global variable.
std::bitset< 600000000 > indicators; // 75 MB
This won't take up space on disk. The OS will just allocate pages of zeroes as you progress along the array. And it makes better use of each bit.
Of course, half the bits represent even numbers, despite there being only one even prime. Here are a couple prime generators that optimize out such things.
By the way, it's better to avoid explicitly writing new if possible, avoid calling functions from the constructor, and to rethrow the std::bad_alloc to avoid allowing the object to be constructed into an invalid state.

The first question is "what other processes are running?" The
2.8 GB of swap space is shared between all of the running
processes; it is not per process. And frankly, on a modern
system, 2.8 GB sounds fairly low to me. I wouldn't try to run
recent versions of Windows or Linux with less than 2GB ram and
4GB swap. (Recent versions of Linux, at least in the Ubuntu
distribution, especially, seem to start up a lot of daemons
which hog the memory.) You might want to try top, sorted on
virtual memory size, just to see how much other processes are
taking.
cat /proc/meminfo will also give you a lot of valuable
information about what is actually being used. (On my system,
running just a couple of xterm with bash, plus Firefox, I
have only 3623776 kB free, on a system with 8GB. Some of the
memory counted as used is probably things like disk caching,
which the system can scale back if an application requests
memory.)
Second, concerning your seg faults: by default, Linux doesn't
always report allocation failures correctly; it
will often lie, telling you that you have the memory, when you
don't. Try cat /proc/sys/vm/overcommit_memory. If it
displays zero, then you need to change it. If this is the case,
try echo 2 > /proc/sys/vm/overcommit_memory (and do this in
one of the rc files). You may have to change the
/proc/sys/vm/overcommit_ratio as well to get reliable behavior
from sbrk (which both malloc and operator new depend on).
