Why can't I insert 6 million elements in STL set? - c++

I am trying to insert a little over 6.5 million elements (ints) into an STL set. Here is the code:
set<int> s;
cout << s.max_size() << endl;
for(int i = 0; i < T.MULT * T.MAXP; i++) {
s.insert(a[i]);
}
T.MULT is 10; T.MAXP is 666013.
a is an array - statically allocated - (int a[T.MULT * T.MAXP];) that contains distinct elements.
After about 4.6 million elements s.insert() throws a bad_alloc exception. The resource monitor available on Windows 7 says I have 3 GB free memory left.
What am I doing wrong? Why can't STL set allocate the memory?
Edit: Here is the full code: http://ideone.com/rdrEnt
Edit2: apparently the inserted elements might not be distinct after all, but that should not be a problem.
Edit3: Here is a simplified version of the code: http://ideone.com/dTp0fZ

The problem most likely lies in the array a itself. With more than 6.5 million ints (roughly 26 MB at 4 bytes each) it is far larger than a typical default stack (often about 1 MB on Windows), so if it is declared as a local variable it overflows the stack. If you allocate the array on the heap instead, it works. I changed the code based on your description, and it ran fine.
int *A = new int[T.MULT * T.MAXP];
for (int i= 0; i < T.MULT * T.MAXP; ++i)
{
A[i] = i; //for simplicity purpose, your array may have different elem. values
}
set<int> s;
for (int i = 0; i < T.MULT * T.MAXP; ++i )
{
s.insert(A[i]);
}
cout << s.size();
set<int>::iterator iter;
int count = 0;
for (iter = s.begin(); iter != s.end(); ++ iter)
{
cout << *iter << " ";
count ++;
if (count == 100)
{
cout <<endl;
count = 0;
}
}
delete [] A;
return 0;
It worked perfectly fine with both vector and set. It can print all those 6.6 million elements on the screen.
As other posts indicated, you may also want to try STXXL if you are interested.
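A minimal sketch of the same idea using std::vector, which keeps its elements on the heap and frees them automatically, so no delete[] is needed. The name count_distinct and the parameter n are illustrative, not from the original code, and the values are just 0..n-1 as in the snippet above.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper: builds n ints on the heap, inserts them into a set,
// and returns how many distinct values the set ends up holding.
std::size_t count_distinct(std::size_t n) {
    std::vector<int> a(n);                 // storage lives on the heap, not the stack
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = static_cast<int>(i);        // placeholder values; real data may differ
    }
    std::set<int> s(a.begin(), a.end());   // range constructor inserts every element
    assert(s.size() <= a.size());          // a set never holds more than was inserted
    return s.size();
}
```

Calling count_distinct(10 * 666013) would mirror the sizes in the question, assuming the machine has enough free memory for both containers.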

You might want to take a look at STXXL.

While I can't answer your question directly, I think it is more efficient to store your data in a std::vector, sort it, and then use std::binary_search to test for the existence of an item. Storage in a std::set is relatively expensive compared to that of a std::vector, because each set element typically lives in its own separately allocated tree node, which carries pointers and bookkeeping in addition to the value.
As an example, here's how you could do it. This sorts the static array.
std::sort(a,a+(T.MULT*T.MAXP));
bool existence=std::binary_search(a,a+(T.MULT*T.MAXP),3);
Fast and easy.
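For a std::vector rather than a raw array, the same approach might look like the sketch below. The helper names prepare and contains are made up for illustration. The std::unique step is an addition that mimics a set's "distinct elements" property and can be dropped if duplicates do not matter.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: sorts the data once and drops duplicates,
// so that later lookups can use binary search.
void prepare(std::vector<int>& data) {
    std::sort(data.begin(), data.end());
    data.erase(std::unique(data.begin(), data.end()), data.end());
}

// Returns true if value is present. The data must already be sorted.
bool contains(const std::vector<int>& data, int value) {
    // Precondition of std::binary_search; this check is O(n) and
    // disappears when NDEBUG is defined.
    assert(std::is_sorted(data.begin(), data.end()));
    return std::binary_search(data.begin(), data.end(), value);
}
```

The sort costs O(n log n) once; each lookup afterwards is O(log n), the same order as a lookup in std::set, without the per-node overhead.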

Related

heap-buffer-overflow issue in array C++ [duplicate]

So the user inputs values within the for loop and the vector pushes them back, creating its own index. The problem arises in the second for loop; I think it has something to do with sizeof(v)/sizeof(vector).
vector<int> v;
for (int i; cin >> i;)
{
v.push_back(i);
cout << v.size() << endl;
}
for (int i =0; i < sizeof(v)/sizeof(vector); i++)
{
cout << v[i] << endl;
}
How will I determine the size of the vector after entering values?
(I'm quite new to C++, so if I have made a stupid mistake, I apologize.)
Use the vector::size() method: i < v.size().
The sizeof operator returns the size in bytes of the object or expression at compile time, which is constant for a std::vector.
How will I determine the size of the vector after entering values?
v.size() is the number of elements in v. Thus,
another style for the second loop, which is easy to understand, is:
for (int i=0; i<v.size(); ++i)
A different aspect of the 'size' function you might find interesting:
on Ubuntu 15.10, g++ 5.2.1,
Using a 32 byte class UI224, the sizeof(UI224) reports 32 (as expected)
Note that
sizeof(std::vector<UI224>) with 0 elements reports 24
sizeof(std::vector<UI224>) with 10 elements reports 24
sizeof(std::vector<UI224>) with 100 elements reports 24
sizeof(std::vector<UI224>) with 1000 elements reports 24
Note also, that
sizeof(std::vector<uint8_t>) with 0 elements reports 24
(update)
Thus, in your line
for (int i =0; i < sizeof(v) / sizeof(vector); i++)
                   ^^^^^^^^^   ^^^^^^^^^^^^^^
the 2 values being divided are probably not what you are expecting.
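To check the numbers above on your own platform, here is a small sketch. The two helper names are invented for illustration, and the actual sizeof value depends on the compiler and standard library (24 is what the g++ setup above reported).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sizeof of a vector object that holds n elements.
std::size_t object_size_with(std::size_t n) {
    std::vector<int> v(n);
    // sizeof looks only at the type, so it cannot depend on n.
    assert(sizeof(v) == sizeof(std::vector<int>));
    return sizeof(v);
}

// Hypothetical helper: element count of a vector that holds n elements.
std::size_t element_count_with(std::size_t n) {
    std::vector<int> v(n);
    return v.size();   // run-time value, tracks the actual contents
}
```

object_size_with(0) and object_size_with(1000) return the same number, while element_count_with returns 0 and 1000 respectively.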
http://cppreference.com is a great site to look up member functions of STL containers.
That being said, you are looking for the vector::size() member function.
for (int i = 0; i < v.size(); i++)
{
cout << v[i] << endl;
}
If you have a compiler that supports C++11 or later, you can use the range-based for loop:
for(auto i : v)
{
cout << i << endl;
}
A std::vector is a class. It's not the actual data, but a class that manages it.
Use std::vector.size() to get the size of the actual data.
Coliru example:
http://coliru.stacked-crooked.com/a/de0bffb1f4d8c836

Why is my C++ vector resizing and appending a new vector instead of changing in place?

I'm doing some LeetCode questions, and I'm not sure why my vector is resizing. Here's relevant portions of my code:
void turnToString(std::vector<int> & charFreq, std::string & freqStr)
{
for(int i : charFreq)
freqStr.append(std::to_string(i));
std::cout << freqStr << std::endl;
}
std::vector<int> charFreq (26,0);
for(int i = 0, j = p.size() - 1; j < s.size(); i++, j++)
{
charFreq[s[j] -'a']++;
turnToString(charFreq, str);
if(str == freqStr)
res.push_back(i);
charFreq[s[i]-'a']--;
}
Everything compiles fine, but in my turnToString() function, when I print out the frequency vector as a string, it keeps doubling:
[image: program output]
I'm not sure why it's acting like this. My intention was for the vector to stay a size of 26 and change the frequencies in place as I iterate through 's'. Instead, it's appending a new frequency array to my vector. I know I can fix this with just using a regular array, but thought I'd use this as a learning opportunity. Why is this happening and what can I do to fix it w/o using a different data structure?
For each entry in charFreq, you append that entry to freqStr:
freqStr.append(std::to_string(i));
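so str (outside the function) grows with every call, because nothing ever empties it. The vector itself stays at 26 elements. Call freqStr.clear() at the start of turnToString if you want the string rebuilt each time.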
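A possible fix, sketched under the assumption that the string should be rebuilt from scratch on every call. The only functional change from the question's function is the clear() call; the printing is left out, the parameter is taken by const reference, and the assert is just a sanity check.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Variant of turnToString that rebuilds freqStr on every call.
void turnToString(const std::vector<int>& charFreq, std::string& freqStr) {
    freqStr.clear();                             // drop the previous call's contents
    for (int count : charFreq) {
        freqStr.append(std::to_string(count));   // at least one character per entry
    }
    assert(freqStr.size() >= charFreq.size());
}
```

With this version, calling the function repeatedly with a 26-element vector of single-digit counts always yields a 26-character string.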

C++ Persistent Vector, fill vector with data from a text file

I am currently trying to learn some C++ and I am stuck on an exercise with vectors. The task is to read ints from a text file and store them in a vector, which should be dynamic.
I guess there is something wrong with the while loop?
If I run this, the program fails, and if I set the vector size to 6, I get
6 0 0 0 0 0 as output.
Thanks for any hints.
int main()
{
const string filename = "test.txt";
int s = 0;
fstream f;
f.open(filename, ios::in);
vector<int> v;
if (f){
while(f >> s){
int i = 0;
v[i] = s;
i = i+1;
}
f.close();
}
for(int i = 0; i < 6; i++){
cout << v[i] << "\n";
}
}
You don't grow the vector. It is empty and cannot hold any ints. You'll need to either resize it every time you want to add another int or you use push_back which automatically enlarges the vector.
You set i = 0 for every iteration so you would change the first value of the vector every iteration instead of the next one.
Go for:
v.push_back(s);
in your loop and
for(int i = 0; i < v.size(); i++) { // ...
Remark:
You normally don't hardcode vector sizes/bounds. One major point about using std::vector is its ability to behave dynamically with respect to its size. Thus, the code dealing with vectors should not impose any restrictions about the size of the vector onto the respective object.
Example:
for(int i = 0; i < 6; i++){ cout << v[i] << "\n"; }
requires the vector to have at least 6 elements, otherwise (less than 6 ints) you access values out of bounds (and you potentially miss elements if v contains more than 6 values).
Use either
for(int i = 0; i < v.size(); i++){ cout << v[i] << "\n"; }
or
for(std::vector<int>::const_iterator i = v.begin(); i != v.end(); ++i)
{
cout << *i << "\n";
}
or
for(auto i = v.begin(); i != v.end(); ++i)
{
cout << *i << "\n";
}
or
for(int x : v){ cout << x << "\n"; }
or
for(auto && x : v){ cout << x << "\n"; }
or
std::for_each(v.begin(), v.end(), [](int x){ std::cout << x << "\n"; });
or variants of the above which possibly pre-store v.size() or v.end()
or whatever you like as long as you don't impose any restriction on the dynamic size of your vector.
The first issue is the line int i = 0. Fixing that exposes a second issue in the line v[i] = s.
You initialise i to 0 on every pass through the while loop, and that is responsible for the current output. Move the declaration out of the while loop.
After fixing that, the vector is still empty, so v[i] accesses memory out of bounds. That is undefined behaviour and typically shows up as a segmentation fault. Instead, use v.push_back(s), which adds the element to the end of the vector and allocates memory if needed.
If you are using std::vector, you can use v.push_back(s) to fill the vector.
The error is in this line: int i = 0;
because it resets i to 0 on every iteration of the while loop.
To correct this, move the line outside the loop.
Note: this will work if you declare v as a plain array, for example int v[101].
With std::vector you can simply append each element to the end with v.push_back(element);
v[i] = s; // error: the vector is empty, so there is no storage for v[i]
Change it to: v.push_back(s);
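Putting the suggestions from this thread together, a reading loop could look like the sketch below. The helper name read_ints is made up for illustration, and it takes any std::istream so the same code works for a file stream or a string stream.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical helper: reads whitespace-separated ints until extraction fails
// (end of input or a token that is not an int).
std::vector<int> read_ints(std::istream& in) {
    std::vector<int> v;
    int s = 0;
    while (in >> s) {
        v.push_back(s);   // grows the vector as needed, no index required
    }
    return v;
}
```

For the original exercise you would open the file with std::ifstream f("test.txt"); pass f to read_ints, and then loop up to v.size() rather than a hardcoded 6.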

Removing duplicate entries in an array (C++)

I'm having an issue in which a function that in theory should remove all duplicate values from an array doesn't work. Here's how it works:
I have two arrays, and then I populate them with random numbers
between 0 and 50 inclusive.
I sort the array values in order using a sort function
I then run my dedupe function
I sort the array values in order again
I then output the values in both arrays
The problem is that the loop in the dedupe function runs 19 times regardless of how many duplicate entries it finds, which is extremely strange. Also, it still leaves duplicates.
Any ideas? Thanks!
int* dedupe(int array[ARRAY_SIZE]) //remove duplicate array values and replace with new values.
{ bool dupe = false;
while(dupe!=true)
{
for(int j=0; j<ARRAY_SIZE; j++)
{ if(array[j] == array[j+1])
{ array[j] = rand();
array[j] = array[j] % 51;
dupe = false;
}
else { dupe = true; // the cout part is for debugging
cout << dupe << endl; }
}
} return array;
}
int main()
{
int a[9], b[9];
srand(time(0));
populate(b);
populate(a);
sort(a,ARRAY_SIZE);
sort(b,ARRAY_SIZE);
dedupe(a);
dedupe(b);
sort(a,ARRAY_SIZE);
sort(b,ARRAY_SIZE);
for(int i=0; i<10; i++)
{ cout << "a[" << i << "] = " << a[i] << "\t\t" << "b[" << i << "] = " << b[i] << endl; }
return 0;
}
Nothing suggested so far has solved the problem. Does anyone know of a solution?
You're not returning from inside the for loop... so it should run exactly ARRAY_SIZE times each time.
The problem that you want to solve and the algorithm that you provided do not really match. You do not really want to remove the duplicates, but rather guarantee that all the elements in the array are different, the difference being that by removing duplicates the number of elements in the array would be less than the size of the array, but you want a full array.
I don't know what the perfect solution would be (algorithmically), but one simple answer would be creating an array of all the values in the valid range (since the range is small), shuffling it and then picking up the first N elements. Think of this as using cards to pick the values.
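#include <algorithm>  // std::shuffle, std::copy_n
#include <random>     // std::mt19937, std::random_device

const int array_size = 9;
void create_array( int (&array)[array_size] ) {
const int max_value = 51;
int range[max_value];
for ( int i = 0; i < max_value; ++i ) {
range[i] = i; // candidate values 0..50
}
// std::random_shuffle was removed in C++17; std::shuffle takes an explicit engine
std::mt19937 gen( std::random_device{}() );
std::shuffle( range, range+max_value, gen );
std::copy_n( range, array_size, array ); // keep the first array_size shuffled values
}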
This is not the most efficient approach, but it is simple, and with a small number of elements there should not be any performance issues. A more complex approach would be to initialize the array with the random elements in the range, sort and remove duplicates (actually remove, which means that the array will not be full at the end) and then continue generating numbers and checking whether they are new against the previously generated numbers.
The simplest approach is to compare each value with every other value. That is linear time per element (quadratic overall), but on an array of 9 elements it is small enough not to matter.
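One way to read the "compare with every other value" idea is to keep drawing random numbers and only accept a number that has not been seen yet. The sketch below assumes that reading. The name distinct_random and its parameters are invented for illustration, and it uses <random> instead of rand().

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: returns `count` distinct values drawn from [0, max_value].
std::vector<int> distinct_random(std::size_t count, int max_value, std::mt19937& gen) {
    // More distinct values than the range contains would loop forever.
    assert(max_value >= 0 && count <= static_cast<std::size_t>(max_value) + 1);
    std::uniform_int_distribution<int> dist(0, max_value);
    std::vector<int> out;
    while (out.size() < count) {
        int candidate = dist(gen);
        bool seen = false;
        for (int v : out) {              // linear scan over the values kept so far
            if (v == candidate) { seen = true; break; }
        }
        if (!seen) out.push_back(candidate);
    }
    return out;
}
```

For the question's case that would be distinct_random(9, 50, gen) with a std::mt19937 seeded however you like; the result is not sorted.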
you are doing it wrong at
array[j] = rand();
array[j] = array[j] % 51
It will always have 1 to ARRAY SIZE!!

tallying elements in an array

So, I'm trying to tally the elements of an array. By this I mean, I have a large array, and each element will have multiples of itself throughout the array. I am trying to figure out how many times each element occurs, however I keep running into the issue of there being duplicate tallies. Since "x" could exist at 12 different places in the array, when I loop through it and keep a running sum, I get the tally for "x" 12 different times. Does anyone know of a simpler/better way to keep a tally of an array with no duplicates?
My code is:
where count is the number of elements
for(i=0;i<count;i++)
{
for(x=0; x<count;x++)
{
if(array[i]==array[x])
{
tallyz++;
}
}
tally[i]=tallyz-1;
tallyz=0;
}
}
std::map<X, unsigned> tally;
for(i = 0; i < count; ++i)
++tally[array[i]];
Note that this is best if the redundancy in the array is fairly high. If most items are unique you're probably better just sorting the array as others have mentioned.
If you can sort the array, simply sort it. Then all you have left is a linear scan of the elements, checking if the element behind this one is the same as the current element (don't forget bounds checking).
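A sketch of the sort-then-scan approach, assuming int elements. The name tally_sorted is made up for illustration. It takes the data by value so the caller's array order is left alone, and it compares each element with the last recorded value instead of indexing the neighbour, which sidesteps the bounds check.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns (value, count) pairs, one per distinct value,
// ordered by value.
std::vector<std::pair<int, std::size_t>> tally_sorted(std::vector<int> data) {
    std::sort(data.begin(), data.end());          // equal values become adjacent
    std::vector<std::pair<int, std::size_t>> result;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (!result.empty() && result.back().first == data[i]) {
            ++result.back().second;               // same value as the previous element
        } else {
            result.emplace_back(data[i], std::size_t{1});  // first occurrence of a new value
        }
    }
    return result;
}
```

Each distinct value appears exactly once in the result, so there are no duplicate tallies to filter out afterwards.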
As an alternative to sorting, you could use a map:
template<class T, size_t N>
void printSums(T (&array)[N]) {
map<T, size_t> m;
for(T*p = array; p < array+N; ++p) {
++m[*p];
}
for(typename map<T,size_t>::iterator it = m.begin(); it != m.end(); ++it) {
cout << it->first << ": " << it->second << "\n";
}
}
Warning: this is untested code.
First use a map, just as John said, then fill in the tally array:
std::map<X, unsigned> data;
for(i = 0; i < count; i++)
data[array[i]]++;
for(i = 0; i < count; i++)
tally[i]=data[array[i]]-1; // look up by array[i], the element being tallied