C++ vector and memoization runtime error issues - c++

I encountered a problem here at Codechef. I am trying to use a vector for memoization. As I am still new at programming and quite unfamiliar with STL containers, I have used vector, for the lookup table. (although, I was suggested that using map helps to solve the problem).
So, my question is how is the solution given below running into a run time error. In order to get the error, I used the boundary value for the problem (100000000) as the input. The error message displayed by my Netbeans IDE is RUN FAILED (exit value 1, total time: 4s) with input as 1000000000. Here is the code:
#include <iostream>
#include <cstdlib>
#include <vector>
#include <string>
#define LCM 12
#define MAXSIZE 100000000
using namespace std;
/*
*
*/
vector<unsigned long> lookup(MAXSIZE,0);
int solve(int n)
{
if ( n < 12) {
return n;
}
else {
if (n < MAXSIZE) {
if (lookup[n] != 0) {
return lookup[n];
}
}
int temp = solve(n/2)+solve(n/3)+solve(n/4);
if (temp >= lookup[n] ) {
lookup[n] = temp;
}
return lookup[n];
}
}
int main(int argc, char** argv) {
int t;
cin>>t;
int n;
n = solve(t);
if ( t >= n) {
cout<<t<<endl;
}
else {
cout<<n<<endl;
}
return 0;
}

I doubt if this is a memory issue because he already said that the program actually runs and he inputs 100000000.
One things that I noticed, in the if condition you're doing a lookup[n] even if n == MAXSIZE (in this exact condition). Since C++ is uses 0-indexed vectors, then this would be 1 beyond the end of the vector.
if (n < MAXSIZE) {
...
}
...
if (temp >= lookup[n] ) {
lookup[n] = temp;
}
return lookup[n];
I can't guess what the algorithm is doing but I think the closing brace } of the first "if" should be lower down and you could return an error on this boundary condition.

You either don't have enough memory or don't have enough contiguous address space to store 100,000,000 unsigned longs.

This mostly is a memory issue. For a vector, you need contiguous memory allocation [so that it can keep up with its promise of constant time lookup]. In your case, with an 8 byte double, you are basically requesting your machine to give you around 762 mb of memory, in a single block.
I don't know which problem you're solving, but it looks like you're solving Bytelandian coins. For this, it is much better to use a map, because:
You will mostly not be storing the values for all 100000000 cases in a test case run. So, what you need is a way to allocate memory for only those values that you are actually memoize.
Even if you are, you have no need for a constant time lookup. Although it would speed up your program, std::map uses trees to give you logarithmic look up time. And it does away with the requirement of using up 762 mb contiguously. 762 mb is not a big deal, but expecting in a single block is.
So, the best thing to use in your situation is an std::map. In your case, actually just replacing std::vector<unsigned long> by std::map<int, unsigned long> would work as map also has [] operator access [for the most part, it should].

Related

Performance gap between vector<bool> and array

I was trying to solve a coding problem in C++ which counts the number of prime numbers less than a non-negative number n.
So I first came up with some code:
int countPrimes(int n) {
vector<bool> flag(n+1,1);
for(int i =2;i<n;i++)
{
if(flag[i]==1)
for(long j=i;i*j<n;j++)
flag[i*j]=0;
}
int result=0;
for(int i =2;i<n;i++)
result+=flag[i];
return result;
}
which takes 88 ms and uses 8.6 MB of memory. Then I changed my code into:
int countPrimes(int n) {
// vector<bool> flag(n+1,1);
bool flag[n+1] ;
fill(flag,flag+n+1,true);
for(int i =2;i<n;i++)
{
if(flag[i]==1)
for(long j=i;i*j<n;j++)
flag[i*j]=0;
}
int result=0;
for(int i =2;i<n;i++)
result+=flag[i];
return result;
}
which takes 28 ms and 9.9 MB. I don't really understand why there is such a performance gap in both the running time and memory consumption. I have read relative questions like this one and that one but I am still confused.
EDIT: I reduced the running time to 40 ms with 11.5 MB of memory after replacing vector<bool> with vector<char>.
std::vector<bool> isn't like any other vector. The documentation says:
std::vector<bool> is a possibly space-efficient specialization of
std::vector for the type bool.
That's why it may use up less memory than an array, because it might represent multiple boolean values with one byte, like a bitset. It also explains the performance difference, since accessing it isn't as simple anymore. According to the documentation, it doesn't even have to store it as a contiguous array.
std::vector<bool> is special case. It is specialized template. Each value is stored in single bit, so bit operations are needed. This memory compact but has couple drawbacks (like no way to have a pointer to bool inside this container).
Now bool flag[n+1]; compiler will usually allocate same memory in same manner as for char flag[n+1]; and it will do that on stack, not on heap.
Now depending on page sizes, cache misses and i values one can be faster then other. It is hard to predict (for small n array will be faster, but for larger n result may change).
As an interesting experiment you can change std::vector<bool> to std::vector<char>. In this case you will have similar memory mapping as in case of array, but it will be located at heap not a stack.
I'd like to add some remarks to the good answers already posted.
The performance differences between std::vector<bool> and std::vector<char> may vary (a lot) between different library implementations and different sizes of the vectors.
See e.g. those quick benches: clang++ / libc++(LLVM) vs. g++ / libstdc++(GNU).
This: bool flag[n+1]; declares a Variable Length Array, which (despites some performance advantages due to it beeing allocated in the stack) has never been part of the C++ standard, even if provided as an extension by some (C99 compliant) compilers.
Another way to increase the performances could be to reduce the amount of calculations (and memory occupation) by considering only the odd numbers, given that all the primes except for 2 are odd.
If you can bare the less readable code, you could try to profile the following snippet.
int countPrimes(int n)
{
if ( n < 2 )
return 0;
// Sieve starting from 3 up to n, the number of odd number between 3 and n are
int sieve_size = n / 2 - 1;
std::vector<char> sieve(sieve_size);
int result = 1; // 2 is a prime.
for (int i = 0; i < sieve_size; ++i)
{
if ( sieve[i] == 0 )
{
// It's a prime, no need to scan the vector again
++result;
// Some ugly transformations are needed, here
int prime = i * 2 + 3;
for ( int j = prime * 3, k = prime * 2; j <= n; j += k)
sieve[j / 2 - 1] = 1;
}
}
return result;
}
Edit
As Peter Cordes noted in the comments, using an unsigned type for the variable j
the compiler can implement j/2 as cheaply as possible. C signed division by a power of 2 has different rounding semantics (for negative dividends) than a right shift, and compilers don't always propagate value-range proofs sufficiently to prove that j will always be non-negative.
It's also possible to reduce the number of candidates exploiting the fact that all primes (past 2 and 3) are one below or above a multiple of 6.
I am getting different timings and memory usage than the ones mentioned in the question when compiling with g++-7.4.0 -g -march=native -O2 -Wall and running on a Ryzen 5 1600 CPU:
vector<bool>: 0.038 seconds, 3344 KiB memory, IPC 3.16
vector<char>: 0.048 seconds, 12004 KiB memory, IPC 1.52
bool[N]: 0.050 seconds, 12644 KiB memory, IPC 1.69
Conclusion: vector<bool> is the fastest option because of its higher IPC (instructions per clock).
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <vector>
size_t countPrimes(size_t n) {
std::vector<bool> flag(n+1,1);
//std::vector<char> flag(n+1,1);
//bool flag[n+1]; std::fill(flag,flag+n+1,true);
for(size_t i=2;i<n;i++) {
if(flag[i]==1) {
for(size_t j=i;i*j<n;j++) {
flag[i*j]=0;
}
}
}
size_t result=0;
for(size_t i=2;i<n;i++) {
result+=flag[i];
}
return result;
}
int main() {
{
const rlim_t kStackSize = 16*1024*1024;
struct rlimit rl;
int result = getrlimit(RLIMIT_STACK, &rl);
if(result != 0) abort();
if(rl.rlim_cur < kStackSize) {
rl.rlim_cur = kStackSize;
result = setrlimit(RLIMIT_STACK, &rl);
if(result != 0) abort();
}
}
printf("%zu\n", countPrimes(10e6));
return 0;
}

Recursive code exits with exit code 3221225477

I am new to c++ programming and StackOverflow, but I have some experience with core Java. I wanted to participate in programming Olympiads and I choose c++ because c++ codes are generally faster than that of an equivalent Java code.
I was solving some problems involving recursion and DP at zonal level and I came across this question called Sequence game
But unfortunately my code doesn't seem to work. It exits with exit code 3221225477, but I can't make anything out of it. I remember Java did a much better job of pointing out my mistakes, but here in c++ I don't have a clue of what's happening. Here's the code btw,
#include <iostream>
#include <fstream>
#include <cstdio>
#include <algorithm>
#include <vector>
#include <set>
using namespace std;
int N, minimum, maximum;
set <unsigned int> result;
vector <unsigned int> integers;
bool status = true;
void score(unsigned int b, unsigned int step)
{
if(step < N)
{
unsigned int subtracted;
unsigned int added = b + integers[step];
bool add_gate = (added <= maximum);
bool subtract_gate = (b <= integers[step]);
if (subtract_gate)
subtracted = b - integers[step];
subtract_gate = subtract_gate && (subtracted >= minimum);
if(add_gate && subtract_gate)
{
result.insert(added);
result.insert(subtracted);
score(added, step++);
score(subtracted, step++);
}
else if(!(add_gate) && !(subtract_gate))
{
status = false;
return;
}
else if(add_gate)
{
result.insert(added);
score(added, step++);
}
else if(subtract_gate)
{
result.insert(subtracted);
score(subtracted, step++);
}
}
else return;
}
int main()
{
ios_base::sync_with_stdio(false);
ifstream input("input.txt"); // attach to input file
streambuf *cinbuf = cin.rdbuf(); // save old cin buffer
cin.rdbuf(input.rdbuf()); // redirect cin to input.txt
ofstream output("output.txt"); // attach to output file
streambuf *coutbuf = cout.rdbuf(); // save old cout buffer
cout.rdbuf(output.rdbuf()); // redirect cout to output.txt
unsigned int b;
cin>>N>>b>>minimum>>maximum;
for(unsigned int i = 0; i < N; ++i)
cin>>integers[i];
score(b, 0);
set<unsigned int>::iterator iter = result.begin();
if(status)
cout<<*iter<<endl;
else
cout<<-1<<endl;
cin.rdbuf(cinbuf);
cout.rdbuf(coutbuf);
return 0;
}
(Note: I intentionally did not use typedef).
I compiled this code with mingw-w64 in a windows machine and here is the Output:
[Finished in 19.8s with exit code 3221225477] ...
Although I have an intel i5-8600, it took so much time to compile, much of the time was taken by the antivirus to scan my exe file, and even sometimes it keeps on compiling for long without any intervention from the anti-virus.
(Note: I did not use command line, instead I used used sublime text to compile it).
I even tried tdm-gcc, and again some other peculiar exit code came up. I even tried to run it on a Ubuntu machine, but unfortunately it couldn't find the output file. When I ran it on a Codechef Online IDE, even though it did not run properly, but the error message was less scarier than that of mingw's.
It said that there was a run-time error and "SIGSEGV" was displayed as an error code. Codechef states that
A SIGSEGV is an error(signal) caused by an invalid memory reference or
a segmentation fault. You are probably trying to access an array
element out of bounds or trying to use too much memory. Some of the
other causes of a segmentation fault are : Using uninitialized
pointers, dereference of NULL pointers, accessing memory that the
program doesn’t own.
It's been a few days that I am trying to solve this, and I am really frustrated by now. First when i started solving this problem I used c arrays, then changed to vectors and finally now to std::set, while hopping that it will solve the problem, but nothing worked. I tried a another dp problem, and again this was the case.
It would be great if someone help me figure out what's wrong in my code.
Thanks in advance.
3221225477 converted to hex is 0xC0000005, which stands for STATUS_ACCESS_VIOLATION, which means you tried to access (read, write or execute) invalid memory.
I remember Java did a much better job of pointing out my mistakes, but here in c++ I don't have a clue of what's happening.
When you run into your program crashing, you should run it under a debugger. Since you're running your code on Windows, I highly recommend Visual Studio 2017 Community Edition. If you ran your code under it, it would point exact line where the crash happens.
As for your crash itself, as PaulMcKenzie points out in the comment, you're indexing an empty vector, which makes std::cin write into out of bounds memory.
integers is a vector which is a dynamic contiguous array whose size is not known at compile time here. So when it is defined initially, it is empty. You need to insert into the vector. Change the following:
for(unsigned int i = 0; i < N; ++i)
cin>>integers[i];
to this:
int j;
for(unsigned int i = 0; i < N; ++i) {
cin>> j;
integers.push_back(j);
}
P.W's answer is correct, but an alternative to using push_back is to pre-allocate the vector after N is known. Then you can read from cin straight into the vector elements as before.
integers = vector<unsigned int>(N);
for (unsigned int i = 0; i < N; i++)
cin >> integers[i];
This method has the added advantage of only allocating memory for the vector once. The push_back method will reallocate if the underlying buffer fills up.

What is the fastest implementation for accessing and changing a long array of boolean?

I want to implement a very long boolean array (as a binary genome) and access some intervals to check if that interval is all true or not, and in addition I want to change some intervals value,
For example, I can create 4 representations:
boolean binaryGenome1[10e6]={false};
vector<bool> binaryGenome2; binaryGenome2.resize(10e6);
vector<char> binaryGenome3; binaryGenome3.resize(10e6);
bitset<10e6> binaryGenome4;
and access this way:
inline bool checkBinGenome(long long start , long long end){
for(long long i = start; i < end+1 ; i++)
if(binaryGenome[i] == false)
return false;
return true;
}
inline void changeBinGenome(long long start , long long end){
for(long long i = start; i < end+1 ; i++)
binaryGenome[i] = true;
}
vector<char> and normal boolean array (ass stores every boolean in a byte) both seem to be a poor choice as I need to be efficient in space. But what are the differences between vector<bool> and bitset?
Somewhere else I read that vector has some overhead as you can choose it's size and compile time - "overhead" for what - accessing? And how much is that overhead?
As I want to access array elements many times using CheckBinGenome() and changeBinGenome(), what is the fastest implementation?
Use std::bitset It's the best.
If the length of the data is known at compile time, consider std::array<bool> or std::bitset. The latter is likely to be more space-efficient (you'll have to measure whether the associated extra work in access times outweighs the speed gain from reducing cache pressure - that will depend on your workload).
If your array's length is not fixed, then you'll need a std::vector<bool> or std::vector<char>; there's also boost::dynamic_bitset but I've never used that.
If you will be changing large regions at once, as your sample implies, it may well be worth constructing your own representation and manipulating the underlying storage directly, rather than one bit at a time through the iterators. For example, if you use an array of char as the underlying representation, then setting a large range to 0 or 1 is mostly a memset() or std::fill() call, with computation only for the values at the start and end of the range. I'd start with a simple implementation and a good set of unit tests before trying anything like that.
It is (at least theoretically) possible that your Standard Library has specialized versions of algorithms for the iterators of std::vector<bool>, std::array<bool> and/or std::bitset that do exactly the above, or you may be able to write and contribute such specializations. That's a better path if possible - the world may thank you, and you'll have shared some of the maintenance responsibility.
Important note
If using std::array<bool>, you do need to be aware that, unlike other std::array<> instantiations, it does not implement the standard container semantics. That's not to say it shouldn't be used, but make sure you understand its foibles!
E.g., checking whether all the elements are true
I am really NOT sure whether this will give us more overheads than speedup or not. Actually I think that nowadays CPU can do this quite fast, are you really experiencing a poor performance? (or is this just a skeleton of your real problem?)
#include <omp.h>
#include <iostream>
#include <cstring>
using namespace std;
#define N 10000000
bool binaryGenome[N];
int main() {
memset(binaryGenome, true, sizeof(bool) * N);
int shouldBreak = 0;
bool result = true;
cout << result << endl;
binaryGenome[9999995] = false;
bool go = true;
uint give = 0;
#pragma omp parallel
{
uint start, stop;
#pragma omp critical
{
start = give;
give += N / omp_get_num_threads();
stop = give;
if (omp_get_thread_num() == omp_get_num_threads() - 1)
stop = N;
}
while (start < stop && go) {
if (!binaryGenome[start]) {
cout << start << endl;
go = false;
result = false;
}
++start;
}
}
cout << result << endl;
}

which data type should we use for taking input a number which is in between 0<= number <= 10^18

in this below code , i am getting run time error , i want to input a number which is in between 0<= a <= 10^18 , a is number , also which data type should we use for taking such a big number like 10^18 , help
#include<iostream>
#include<algorithm>
#include<string.h>
using namespace std;
main()
{
int flag=1,cases;
long int j,i;
unsigned long long int a;
cin>>a;
if(a==0)
{
cout<<"yes";
}
else
{
unsigned long long int temp[a];
temp[0]=0;
cases=1;
i=1;
while(temp[i]!=a)
{
if(cases==1)
temp[i]=temp[i-1]+1;
else if(cases==2)
temp[i]=temp[i-1]+2;
else if(cases==3)
temp[i]=temp[i-1]+3;
cases++;
if(cases>3)
cases=1;
i++;
}
for(j=0;j<i;j++)
{
if(temp[j]==a)
{
cout<<"yes";
break;
}
if(j==i-1)
flag=0;
}
if(flag==0)
cout<<"no";
}
}
You are using temp without initializing it. You do set the first element to zero but then you start iterating from 1. Furter how should temp of any index be to equal a? Unlikely - so your while will not terminate. Sooner or later you will get a run time error.
Try adding
temp[a-1] = a;
Before the while loop
Or perhaps you actually want the while to be
while(i <a)
You're already using a unsigned long long int, which is large enough for 18446744073709551615 (2^64 - 1) or greater*
http://www.cplusplus.com/reference/climits/
My guess is your runtime error is resulting from something other than this.
Be wary of using a variable length array though. Think about how much memory it would take to allocate 10^18 unsigned long long int's for your array? Assuming that it maps to a 64 bit type : 8 bytes per * (10^18) is ~ 7105 PetaBytes.
Let's say you are using unsigned long long int vector.
vector<unsigned long long int> temp ;
if you write cout<<temp.max_size(); it will give you how many elements vector can theoretically manage depending on your machine(((e.g. if you write the same in computer where RAM is bigger, it will give you another number))).It doesn't know how much memory will actually be available when you run the program.Let's say max_size() gives us some number,578910255011, So let's say your a=10^10, unsigned long long = 8 byte,(10^10*8)/1024/1024/1024 = 9.31GB. So your RAM should be equal to 9.31Gb, that's why it's a runtime error, because RAM is the memory used by running programs. So to solve this problem you should either increase your RAM which for 10^12 unsigned long long elements=931Gb and would cost a fortune :) (if your computer can even support that much) or use something like STXXL

How could I parallelize the following loop USING C++ AMP?

I have the following loop in c++
dword result = 0;
for ( int i = 0; i < 16; i++ ) {
result |= ( value[i] << (unsigned int)( i << 1 ) );
}
And I would like to parallelize it in amp. I know it might go slower then the actual non-parallelized version above, but I want to do it to learn something more about AMP.
My idea was to loop trough the value array in parallel:
And fill a new array with newarray[0] = value[0] << (unsigned int)(0 << 1 ), newarray[1] = value[1] << (unsigned int)(1 << 1 ), etc. Then I would OR the values in the array in parallel in a tree structure (see image).
I have tried to put this idea in some simple c++ amp code, but I don't succeed in it, so any help would be appreciated.
Thank you for your consideration of this matter, I look forward to a response.
The following code is part of what I think you need. This code will take a number of elements as input and preps the vector on the CPU, then it does the bit shift operations in parallel on the GPU. Then I set av[elements] back to 0 because I am using that element to store your final result. It's rough, but AMP is pretty restrictive about what data types can be processed on the GPU, so I just use an extra element of the existing array for it. After the bit shifting is done, I do another parallel for each for the bitwise OR function. This one also happens on the GPU, but it is less satisfactory because every operation is ORing any given element of the array with exactly the av[elements] element, so that will create a bottleneck. Your tree structure will make this part run much more quickly, but I was unable to figure out how to do that part easily. As it is, this program can process 100 million elements in a couple seconds on a fairly old computer. Apologies in advance for any best-practice violations in the code; I am a novice as well. The code follows:
#include <conio.h>
#include <amp.h>
#include <iostream>
using namespace concurrency;
using namespace std;
unsigned int doParallel(unsigned int);
unsigned int elements;
void main()
{
int ch=NULL;
cout<<"\nHow many elements to populate: ";
cin>>elements;
cout<<"The result is: "<<doParallel(elements);
cout<<"\nPress 'X' to exit.";
do
{
ch=_getch();
} while (ch!='X' && ch!='x');
exit(0);
}
unsigned int doParallel(unsigned int elements)
{
vector<unsigned int> v(elements+1);
for (unsigned int i = 0; i<elements+1;i++)
{
v[i]=i;
}
array_view<unsigned int,1> av(elements+1,v);
parallel_for_each(av.extent,[=](index<1> idx)
restrict(amp)
{
av[idx] = static_cast<unsigned int>(av[idx])<<1;
});
av[elements]=0;
parallel_for_each(av.extent,[=](index<1> idx)
restrict(amp)
{
av[elements] |= static_cast<unsigned int>(av[idx]);
});
return av[elements];
}