Program killed: using vector of set of vector - c++

In my algorithm, I need to keep all the combinations of (3 bytes of) extended ASCII characters. My code is below, but when I run it, the program gets killed in the terminal when the last step (BigVector.push_back) runs. Why is this so, and what can be the alternative in my case?
vector<set<vector<int> > > BigVector;
set<vector<int> > SmallSet;
for(int k=0; k <256; k++)
{
for(int j=0; j <256; j++)
{
for(int m=0; m <256; m++)
{
vector<int> temp;
temp.push_back(k);
temp.push_back(j);
temp.push_back(m);
SmallSet.insert(temp);
}
}
}
BigVector.push_back(SmallSet);
P.S: I have to keep the ascii characters like this:
{ {(a,b,c) ,(a,b,d),...... (z,z,z)} }

Please note that 256^3 = 16,777,216. This is huge, especially when you use vector and set!
Because each component only needs one of 256 = 2^8 values, you can store it in a single char (one byte; prefer unsigned char so that values above 127 are represented portably). You can store each combination as one tuple of three chars. The memory needed is then 16,777,216 * 3 bytes = 48 MiB. On my computer, it finishes in 1 second.
If you accept C++11, I would suggest using std::array, instead of writing a helper struct like Info in my old code.
C++11 code using std::array.
vector<array<char,3>> bs;
.... for loop
array<char,3> temp;
temp[0]=k; temp[1]=j; temp[2]=m;
bs.push_back(temp);
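A fuller version of the std::array idea might look like the sketch below. The names Triple and make_triples and the parameter n are made up for illustration, and unsigned char is used on the assumption that the values 0..255 should be stored as-is.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::array<unsigned char, 3> Triple;  // hypothetical name: one 3-byte combination

// Builds every (k, j, m) triple with each component in [0, n).
// n = 256 yields the 16,777,216 triples from the question, 3 bytes each.
std::vector<Triple> make_triples(int n) {
    assert(n >= 0 && n <= 256);  // each component must fit in one byte
    std::vector<Triple> out;
    // Reserve once so the vector never reallocates while it is being filled.
    out.reserve(static_cast<std::size_t>(n) * n * n);
    for (int k = 0; k < n; ++k) {
        for (int j = 0; j < n; ++j) {
            for (int m = 0; m < n; ++m) {
                Triple t = {{static_cast<unsigned char>(k),
                             static_cast<unsigned char>(j),
                             static_cast<unsigned char>(m)}};
                out.push_back(t);
            }
        }
    }
    return out;
}
```

Calling make_triples(256) gives the full set; a smaller n is handy for trying the code out.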
C++98 code using home-made struct.
struct Info{
char chrs[3];
Info(char c1, char c2, char c3) { chrs[0] = c1; chrs[1] = c2; chrs[2] = c3; }
};
int main() {
vector<Info> bs;
for (int k = 0; k < 256; k++) {
for (int j = 0; j < 256; j++) {
for (int m = 0; m < 256; m++) {
bs.push_back(Info(k,j,m));
}
}
}
return 0;
}
Ways to use the combinations. (You can write wrapper method for Info).
// Suppose s[256] contains the 256 extended chars.
for( auto b : bs){
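// Cast to unsigned char first: plain char may be signed, which would give negative indices.
cout << s[(unsigned char)b.chrs[0]] << " " << s[(unsigned char)b.chrs[1]] << " " << s[(unsigned char)b.chrs[2]] << endl;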
}

First: your example doesn't correspond with the actual code.
You are creating ( { (a,a,a), ..., (z,z,z) } )
As already mentioned you will have 16'777'216 different vectors. Every vector will hold the 3 characters and typically ~20 bytes[1] overhead because of the vector object.
In addition a typical vector implementation will reserve memory for future push_backs.
You can avoid this by specifying the correct size during initialization or using reserve():
vector<int> temp(3);
(capacity() tells you the "real" size of the vector)
push_back makes a copy of the object you are pushing [2]; that extra copy might need too much memory and therefore crash your program.
16'777'216 * (3 characters + 20 overhead) * 2 copy = ~736MiB.
(This assumes that the vectors are already initialized with the correct size!)
See [2] for a possible solution to the copying problem.
I do agree with Potatoswatter: your data structure is very inefficient.
[1] What is the overhead cost of an empty vector?
[2] Is std::vector copying the objects with a push_back?
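If C++11 is available, one way to avoid the second copy is to move the set into the outer vector instead of copying it. This is only a sketch: build_moved, TripleSet and the parameter n are hypothetical names, and the per-element overhead of set<vector<int> > discussed above remains.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

typedef std::set<std::vector<int> > TripleSet;  // hypothetical alias for the question's type

// Builds the set of all (k, j, m) with components in [0, n) and stores it in
// the outer vector by moving it, so the set's nodes are not duplicated.
std::vector<TripleSet> build_moved(int n) {
    assert(n >= 0);
    TripleSet small;
    for (int k = 0; k < n; ++k) {
        for (int j = 0; j < n; ++j) {
            for (int m = 0; m < n; ++m) {
                std::vector<int> temp(3);  // sized up front, no spare capacity needed
                temp[0] = k;
                temp[1] = j;
                temp[2] = m;
                small.insert(std::move(temp));
            }
        }
    }
    std::vector<TripleSet> big;
    big.push_back(std::move(small));  // C++11: hands over the nodes instead of copying them
    return big;
}
```

With n = 256 this still allocates one node and one small vector per combination, so the flat layouts suggested elsewhere in this thread are likely to remain far cheaper.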

Related

Function behaves badly when passing dynamically allocated pointer

I have this function
void shuffle_array(int* array, const int size){
/* given an array of size size, this is going to randomly
* attribute a number from 0 to size-1 to each of the
* array's elements; the numbers don't repeat */
int i, j, r;
bool in_list;
for(i = 0; i < size; i++){
in_list = 0;
r = mt_lrand() % size; // my RNG function
for(j = 0; j < size; j++)
if(array[j] == r){
in_list = 1;
break;
}
if(!in_list)
array[i] = r;
else
i--;
}
}
When I call this function from
int array[FIXED_SIZE];
shuffle_array(array, FIXED_SIZE);
everything goes all right, and I can check that the shuffling is as expected, in a reasonable amount of time -- after all, it's not that big of an array (< 1000 elements).
However, when I call the function from
int *array = new int[dynamic_size];
shuffle_array(array, dynamic_size);
[...]
delete array;
the function loops forever for no apparent reason. I have checked it with debugging tools, and I can't tell where the failure would be (in part due to my algorithm's reliance on random numbers).
The thing is, it doesn't work... I have tried passing the array as int*& array, I have tried using std::vector<int>&, I have tried to use random_shuffle (but the result for the big project didn't please me).
Why does this behavior happen, and what can I do to solve it?
Your issue is that array is uninitialized in your first example. If you are using Visual Studio in debug mode, each entry in array will be set to all 0xCC (for "created"). This is masking your actual problem (see below).
When you use new int[dynamic_size], the elements are not initialized either, but freshly allocated heap memory often happens to contain zeros. This then triggers your actual bug.
Your actual bug is that you add a new item only when the array doesn't already contain it, but you search the entire array each time. If an element you have not filled yet already holds a valid value (like 0), that value can never be placed: the loop always finds it in the array, and once all of the other numbers have been used up it never terminates.
To fix this, change your algorithm to only look at the values that you have put in to the array (i.e. up to i).
Change
for(j = 0; j < size; j++)
to
for(j = 0; j < i; j++)
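Put together, the corrected function might look like the sketch below. The question's mt_lrand() is not shown, so std::mt19937 from <random> stands in for it here; that substitution is an assumption, not the asker's RNG.

```cpp
#include <cassert>
#include <random>

// Fills array[0..size-1] with the numbers 0..size-1 in random order.
// Only the slots already written (indices below i) are checked for duplicates,
// so the initial contents of the array no longer matter.
void shuffle_array(int* array, const int size) {
    assert(size >= 0);
    assert(array != nullptr || size == 0);
    if (size == 0) return;
    static std::mt19937 rng(std::random_device{}());  // stand-in for mt_lrand()
    std::uniform_int_distribution<int> pick(0, size - 1);
    int i = 0;
    while (i < size) {
        const int r = pick(rng);
        bool in_list = false;
        for (int j = 0; j < i; ++j) {  // look only at values placed so far
            if (array[j] == r) {
                in_list = true;
                break;
            }
        }
        if (!in_list) {
            array[i++] = r;  // accept r and move on to the next slot
        }
    }
}
```

The same call works for both a stack array and one obtained from new int[n], whatever values the memory held beforehand.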
I am going to guess that the problem lies with the way the array is initialized and the line:
r = mt_lrand() % size; // my RNG function
If the dynamically allocated array has been initialized to 0 for some reason, your code will always get stuck when filling in the last number of the array.
I can think of the following two ways to overcome that:
You make sure that you initialize array with numbers greater than or equal to size.
int *array = new int[dynamic_size];
for ( int i = 0; i < dynamic_size; ++i )
array[i] = dynamic_size;
shuffle_array(array, dynamic_size);
You can allow the random numbers to be between 1 and size instead of between 0 and size-1 in the loop. As a second step, you can subtract 1 from each element of the array.
void shuffle_array(int* array, const int size){
int i, j, r;
bool in_list;
for(i = 0; i < size; i++){
in_list = 0;
// Make r be between 1 and size
r = rand() % size + 1;
for(j = 0; j < size; j++)
if(array[j] == r){
in_list = 1;
break;
}
if(!in_list)
{
array[i] = r;
}
else
i--;
}
// Now decrement the elements of array by 1.
for(i = 0; i < size; i++){
--array[i];
// Debugging output
std::cout << "array[" << i << "] = " << array[i] << std::endl;
}
}
You are mixing C code with C++ memory allocation routines of new and delete. Instead stick to pure C and use malloc/free directly.
int *array = malloc(dynamic_size * sizeof(int));
shuffle_array(array, dynamic_size);
[...]
free(array);
On a side note, if you are allocating an array using the new[] operator in C++, use the equivalent delete[] operator to properly free up the memory. Read more here - http://www.cplusplus.com/reference/new/operator%20new[]/
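As a sketch of the new[]/delete[] pairing mentioned above, assuming C++11: std::unique_ptr<int[]> calls delete[] automatically, and the trailing () on new int[n]() value-initializes every element to 0. The function name sum_of_indices is made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Allocates n ints on the heap, fills them with 0..n-1 and returns their sum.
long sum_of_indices(std::size_t n) {
    // new int[n]() zero-initializes; plain new int[n] would leave the values indeterminate.
    std::unique_ptr<int[]> array(new int[n]());
    for (std::size_t i = 0; i < n; ++i) {
        assert(array[i] == 0);  // guaranteed by the () above
        array[i] = static_cast<int>(i);
    }
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += array[i];
    }
    return total;  // unique_ptr<int[]> releases the block with delete[] here
}
```

A std::vector<int> would give the same guarantees with less ceremony; the smart pointer is shown only because the question already uses new.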

Define an array whose number of rows (cols) is unknown in C++

I have a 2048x2048 matrix holding a grayscale image. I want to find the points whose value is > 0 and store their positions in an array of 2 columns and n rows (n is the number of points found). Here is my algorithm:
int icount;
icount = 0;
for (int i = 0; i < 2048; i++)
{
for (int j = 0; j < 2048; j++)
{
if (iout.at<double>(i, j) > 0)
{
icount++;
temp[icount][1] = i;
temp[icount][2] = j;
}
}
}
I have 2 problems :
temp is an array whose number of rows is unknown, because it grows with each match, so how can I define the temp array? I need the exact number of rows for another step later, so I can't just pick an arbitrary size for it.
My algorithm above doesn't work; the result is
temp[1][1]=0 , temp[1][2]=0 , temp[2][1]=262 , temp[2][2]=655
which is completely wrong. The right one is:
temp[1][1]=1779 , temp[1][2]=149 , temp[2][1]=1780 , temp[2][2]=149
I know the right result because I implemented it in Matlab:
[a,b]=find(iout>0);
How about a std::vector of std::pair:
std::vector<std::pair<int, int>> temp;
Then add (i, j) pairs to it using push_back. No size needed to be known in advance:
temp.push_back(make_pair(i, j));
We'll need to know more about your problem and your code to be able to tell what's wrong with the algorithm.
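The std::vector of std::pair approach can be sketched as follows. The question's iout looks like an OpenCV matrix, which is not reproduced here; a plain vector of rows (the Image typedef) stands in for it, and find_positive is a made-up name.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

typedef std::vector<std::vector<double> > Image;  // stand-in for the question's iout

// Returns the (row, column) position of every element greater than 0,
// using 0-based indices and scanning row by row.
std::vector<std::pair<int, int> > find_positive(const Image& img) {
    std::vector<std::pair<int, int> > hits;
    for (std::size_t i = 0; i < img.size(); ++i) {
        assert(img[i].size() == img[0].size());  // sketch assumes a rectangular image
        for (std::size_t j = 0; j < img[i].size(); ++j) {
            if (img[i][j] > 0) {
                hits.push_back(std::make_pair(static_cast<int>(i),
                                              static_cast<int>(j)));
            }
        }
    }
    return hits;
}
```

The number of points found is simply hits.size(). When comparing against MATLAB, keep in mind that find reports 1-based indices and scans column by column, so the order and the values will differ from a row-by-row C++ loop.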
When you define a variable of pointer type, you need to allocate memory and have the pointer point to that memory address. In your case, you have a multidimensional pointer so it requires multiple allocations. For example:
int **temp = new int *[100]; // This means you have room for 100 arrays (in the 2nd dimension)
int icount = 0;
for(int i = 0; i < 2048; i++) {
for(int j = 0; j < 2048; j++) {
if(iout.at<double>(i, j) > 0) {
temp[icount] = new int[2]; // only 2 variables needed at this dimension
temp[icount][0] = i; // valid indices of a 2-element array are 0 and 1
temp[icount][1] = j;
icount++;
}
}
}
This will work for you, but it's only good if you know for sure you're not going to need any more than the pre-allocated array size (100 in this example). If you know exactly how much you need, this method is ok. If you know the maximum possible, it's also ok, but could be wasteful. If you have no idea what size you need in the first dimension, you have to use a dynamic collection, for example std::vector as suggested by IVlad. In case you do use the method I suggested, don't forget to free the allocated memory using delete []temp[i]; and delete []temp;

C/C++ How a 3 dimensional array is stored in memory and what is the fastest way to traverse it

I am trying to understand how a 3 dimensional array is stored in memory and how that differs from the way a std::vector is stored.
This is how I understand arrays are stored; std::vectors are stored the same way, with the difference that they make full use of memory blocks:
a[0][0][0] a[0][0][1] a[0][0][2]... a[0][1][0] a[0][1][1] ... a[1][0][0] a[1][0][1]...
My goal is to find the most efficient way to traverse an array.
For example, I have array:
v[1000][500][3];
So which is the more efficient way to traverse it?
for(i = 0; i < 1000; i++)
{
for(j = 0; j < 500; j++)
{
for(k = 0; k < 3; ++k)
{
//some operation
}
}
}
Or maybe it would be more efficient to declare the array as
v[3][500][1000]
and to traverse as
for(i = 0; i < 3; i++) {
for(j = 0; j < 500; j++)
{
for(k = 0; k < 1000; ++k)
{
//some operation
}
} }
Is there any CL tool to visualize how arrays are stored?
You're right in your representation of arrays in memory: the values are contiguous. So an int v[2][2][2] initialized to 0 would look like:
[[[0, 0], [0, 0]], [[0, 0], [0, 0]]]
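As far as performance goes, you want consecutive accesses to be as close to each other in memory as possible to avoid data cache misses. That means putting the first (outermost) dimension in the outer loop and the last dimension in the innermost loop, since elements that differ only in the last index are located next to each other.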
Something that might happen though with your first example is the compiler might optimize the inner loop(if right conditions are met) and unroll it so you would save some time there by skipping branching.
Since both of your examples already iterate in the right order, I would say profile them and see which is faster.
std::vector also stores its elements contiguously in memory, but since it has only one dimension, locality applies by default (provided you aren't iterating randomly). The good side of a vector is that it can grow, whereas an array can't (automatically, anyway).
When the memory is contiguous (e.g., a compile-time array a[][][]), the most efficient way to traverse a multidimensional array is to use a pointer. For an array declared as a[D1][D2][D3], a[i][j][k] is actually *(&a[0][0][0] + (i*D2*D3 + j*D3 + k)). Thus, initialize a pointer p to the beginning address, then repeatedly read *(p++):
int main() {
int a[2][3]={{1,2,3},{4,5,6}};
int *p = &a[0][0];
for( int i=0; i<6; ++i ){
cout<<*(p++)<<endl;
}
return 0;
}
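The row-major offset of an element can be checked against the addresses the compiler really uses. In this sketch flat_index and matches_builtin_layout are hypothetical helper names, and the 4x5x3 test array is an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Offset (in elements) of a[i][j][k] from a[0][0][0] for an array declared
// as a[D1][D2][D3]. D1 itself does not appear in the formula.
std::size_t flat_index(std::size_t i, std::size_t j, std::size_t k,
                       std::size_t d2, std::size_t d3) {
    assert(j < d2 && k < d3);
    return (i * d2 + j) * d3 + k;  // same as i*d2*d3 + j*d3 + k
}

// Compares the formula with the real byte distance between elements.
bool matches_builtin_layout() {
    static int a[4][5][3];
    const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(&a[0][0][0]);
    for (std::size_t i = 0; i < 4; ++i) {
        for (std::size_t j = 0; j < 5; ++j) {
            for (std::size_t k = 0; k < 3; ++k) {
                const std::uintptr_t addr =
                    reinterpret_cast<std::uintptr_t>(&a[i][j][k]);
                if (addr - base != flat_index(i, j, k, 5, 3) * sizeof(int)) {
                    return false;
                }
            }
        }
    }
    return true;
}
```

Because the last index has the smallest stride, a loop nest that varies k fastest walks memory one element at a time.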
To make it visible:
#include <iostream>
int main()
{
int a[][3] = { { 0, 1, 2 }, { 3, 4, 5 } };
int* p = reinterpret_cast<int*>(a);
for(unsigned i = 0; i < 6; ++i) {
std::cout << *(p + i);
}
std::cout << std::endl;
return 0;
}
Shows a row major order - see: http://en.wikipedia.org/wiki/Row-major_order
Given this, you should iterate row by row to make good use of the cache. In a higher dimension N you get the same pattern, where each element represents a block of data of dimension N-1.

printing a 2d array read from a txt file displays wrong in c++

I am new to C++ and I would really appreciate some help with the following problem:
I have dynamically allocated space for a 2D int array of N rows and two columns in the following way:
int **input;
input = new int *[N];
for (int count = 0; count < N; count++)
input[count] = new int[2];
When I print its contents inside the while loop in which I "fill" the array, the actual contents are printed:
while (!myfileafter.eof())
{
int temp1,temp2;
int i=0;
int j=0;
myfileafter >> temp1>>temp2;
input[i][j]=temp1;
input [i][j+1] = temp2;
i++;
j=0;
cout<<input[i-1][j]<<" "<<input[i-1][j+1]<<endl;
}
// for (int p=0;p<N;p++)
// cout<<input[p][0]<<" "<<input[p][1]<<endl;
However, if I use the two commented-out lines just after the while loop, the array seems to contain totally different contents from the right ones printed before, and this is the cause of many problems in the rest of the program. Any idea how this can be solved?
It seems that the contents of the file do not match with the length of your array.
Try this:
int temp1,temp2;
int i=0;
int j=0;
while ( i < N && myfileafter >> temp1>> temp2 )
{
input[i][j]=temp1;
input[i][j+1] = temp2;
i++;
j=0;
cout<<input[i-1][j]<<" "<<input[i-1][j+1]<<endl;
}
// Note the termination condition. It is uncertain whether all N locations have been filled.
for (int p=0;p<i;p++)
cout<<input[p][0]<<" "<<input[p][1]<<endl;
EDIT: Instead of using a 2D Nx2 array, I would suggest you to use 2 1D arrays to avoid possible errors and for code clarity. Or better still, use two 1D vectors.
With pointers you will have to take care of deleting the allocated memory.
At the beginning of the loop you set i to zero, so you're always reading into input[0].
It's better to use the actual reading as the condition:
int i = 0;
while ( i < N && myfileafter >> input[i][0] >> input[i][1] ) ++i;
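The same idea, with the read itself as the loop condition and the index declared outside the loop, can be sketched with a std::vector so that no manual allocation is needed. The helper names read_pairs and read_pairs_from_text are invented for this example, and max_rows plays the role of N.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Reads whitespace-separated integer pairs until the stream runs out,
// a read fails, or max_rows pairs have been stored.
std::vector<std::pair<int, int> > read_pairs(std::istream& in,
                                             std::size_t max_rows) {
    std::vector<std::pair<int, int> > rows;
    int a = 0;
    int b = 0;
    // The extraction is the condition, so a failed read never stores stale values.
    while (rows.size() < max_rows && in >> a >> b) {
        rows.push_back(std::make_pair(a, b));
    }
    assert(rows.size() <= max_rows);
    return rows;
}

// Convenience wrapper for trying the reader on an in-memory string.
std::vector<std::pair<int, int> > read_pairs_from_text(const std::string& text,
                                                       std::size_t max_rows) {
    std::istringstream in(text);
    return read_pairs(in, max_rows);
}
```

For a file, pass the std::ifstream directly to read_pairs; the size of the returned vector tells you how many rows were really read.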
The first thing I'd recommend is to practice formatting your code more consistently. There are some accepted conventions that can make your code a lot more readable. Only changing formatting, I'd recommend something like this:
while (!myfileafter.eof())
{
    int temp1, temp2;
    int i = 0;
    int j = 0;
    myfileafter >> temp1 >> temp2;
    input[i][j] = temp1;
    input[i][j+1] = temp2;
    i++;
    j = 0;
    cout << input[i-1][j] << " " << input[i-1][j+1] << endl;
}
//for (int p = 0; p < N; p++)
//{
//    cout << input[p][0] << " " << input[p][1] << endl;
//}
I'm sure it's also possible that you formatted it correctly and it just got messed up when you entered it here, but proper formatting can make a world of difference.
Moving on... gahh! Carl beat me to it: you're overwriting input[0] every time.
The only thing I'll add is that the cout inside your loop is a bit deceptive because it will print out what you expect it to, but it's printing it from input[0][0] and input[0][1] every time.
Ok, there are other answers here that explain what is wrong with your code specifically, but I'll also add some other information about how you're approaching the array allocation itself.
Typically speaking, an array (unless it's some STL or other intelligent/class array) is a contiguous piece of memory. Then an additional array points to that. In other words, for foo[X][Y] you're creating foo[X] first then adding the [Y] component after the fact, individually, instead of creating a contiguous [X*Y] piece of memory then having each foo[X] element point to the first element of each [Y]. Visually, you're doing this:
foo -|
|
|
| [...]
then each int individually like
foo -| -- _
|_
| -- _
|_
| -- _
|_
| [...]
when you should be allocating the entire int chunk as one piece because 1) if you do a lot of small allocations like this it tends to kill performance (not that important for you here, I don't think) and 2) pointer arithmetic will actually work if the array is properly created.
Let's say you have the following chunks:
foo is an array of int* that starts at ADDRESS_X and is 4 elements long. For your example you need 4 elements * 2 columns = 8 ints total. So you create a contiguous 8 int long array that starts at ADDRESS_Y. You'd then want to do this (pseudocode-ish here):
int **foo = new int *[SIZE_OF_X]; // an array of 4 pointers
int *bar = new int[SIZE_OF_X * NUMBER_OF_COLUMNS]; // in other words, 8 ints
for (int i = 0; i < SIZE_OF_X; i++) {
foo[i] = bar + (i * NUMBER_OF_COLUMNS); // row i starts NUMBER_OF_COLUMNS ints after row i-1
}
Where bar is:
ADDRESS_Y + 0: 10
ADDRESS_Y + 1: 20
ADDRESS_Y + 2: 30
...
And foo is:
ADDRESS_X + 0: ADDRESS_Y
ADDRESS_X + 1: ADDRESS_Y + 2
...
so foo[1][0] == 30.
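A compilable version of this layout might look like the sketch below. It assumes C++11, and the Table struct with its cells and rows members is a made-up wrapper: cells plays the role of bar and rows the role of foo.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One contiguous block of ints plus an array of row pointers into it.
struct Table {
    std::vector<int> cells;  // n_rows * n_cols ints in one allocation ("bar")
    std::vector<int*> rows;  // rows[i] points at the first int of row i ("foo")

    Table(std::size_t n_rows, std::size_t n_cols)
        : cells(n_rows * n_cols, 0), rows(n_rows) {
        assert(n_cols > 0);
        for (std::size_t i = 0; i < n_rows; ++i) {
            rows[i] = cells.data() + i * n_cols;
        }
    }

    // Copying would leave the row pointers aimed at the other object's cells.
    Table(const Table&) = delete;
    Table& operator=(const Table&) = delete;

    int* operator[](std::size_t i) {
        assert(i < rows.size());
        return rows[i];
    }
};
```

With Table t(4, 2) and cells filled with 10, 20, 30, ... the expression t[1][0] reads the third int of the block, matching the foo[1][0] == 30 example.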

Vector of Vector Initialization

I am having a tough time getting my head wrapped around how to initialize a vector of vectors.
typedef vector< vector < vector < vector< float > > > > DataContainer;
I want this to conform to
level_1 (2 elements/vectors)
level_2 (7 elements/vectors)
level_3 (480 elements/vectors)
level_4 (31 elements of float)
Addressing the elements isn't the issue. That should be as simple as something like
dc[0][1][2][3];
The problem is that I need to fill it with data coming in out of order from a file such that successive items need to be placed something like
dc[0][3][230][22];
dc[1][3][110][6]; //...etc
So I need to initialize the V of V beforehand.
Am I psyching myself out or is this as simple as
for 0..1
for 0..6
for 0..479
for 0..30
dc[i][j][k][l] = 0.0;
It doesn't seem like that should work. Somehow the top level vectors must be initialized first.
Any help appreciated. I am sure this must be simpler than I am imagining.
Please do not use nested vectors if the size of your storage is known ahead of time, i.e. there is a specific reason why e.g. the first index must be of size 6, and will never change. Just use a plain array. Better yet, use boost::array. That way, you get all the benefits of having a plain array (save huge amounts of space when you go multi-dimensional), and the benefits of having a real object instantiation.
Please do not use nested vectors if your storage must be rectangular, i.e. you might resize one or more of the dimensions, but every "row" must be the same length at some point. Use boost::multi_array. That way, you document "this storage is rectangular", save huge amounts of space and still get the ability to resize, benefits of having a real object, etc.
The thing about std::vector is that it (a) is meant to be resizable and (b) doesn't care about its contents in the slightest, as long as they're of the correct type. This means that if you have a vector<vector<int> >, then all of the "row vectors" must maintain their own separate book-keeping information about how long they are - even if you want to enforce that they're all the same length. It also means that they all manage separate memory allocations, which hurts performance (cache behaviour), and wastes even more space because of how std::vector reallocates. boost::multi_array is designed with the expectation that you may want to resize it, but won't be constantly resizing it by appending elements (rows, for a 2-dimensional array / faces, for a 3-dimensional array / etc.) to the end. std::vector is designed to (potentially) waste space to make sure that operation is not slow. boost::multi_array is designed to save space and keep everything neatly organized in memory.
That said:
Yes, you do need to do something before you can index into the vector. std::vector will not magically cause the indexes to pop into existence because you want to store something there. However, this is easy to deal with:
You can default-initialize the vector with the appropriate amount of zeros first, and then replace them, by using the (size_t n, const T& value = T()) constructor. That is,
std::vector<int> foo(10); // makes a vector of 10 ints, each of which is 0
because a "default-constructed" int has the value 0.
In your case, we need to specify the size of each dimension, by creating sub-vectors that are of the appropriate size and letting the constructor copy them. This looks like:
typedef vector<float> d1;
typedef vector<d1> d2;
typedef vector<d2> d3;
typedef vector<d3> d4;
d4 result(2, d3(7, d2(480, d1(31))));
That is, an unnamed d1 of size 31 is constructed and copied 480 times to fill an unnamed d2, which is copied 7 times to fill an unnamed d3, which is copied 2 times to fill result.
There are other approaches, but they're much clumsier if you just want a bunch of zeroes to start. If you're going to read the entire data set from a file, though:
You can use .push_back() to append to a vector. Make an empty d1 just before the inner-most loop, in which you repeatedly .push_back() to fill it. Just after the loop, you .push_back() the result onto the d2 which you created just before the next-innermost loop, and so on.
You can resize a vector beforehand with .resize(), and then index into it normally (up to the amount that you resized to).
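The constructor-based initialization, followed by out-of-order writes, can be sketched like this. make_container and store are hypothetical helper names; the dimensions are passed in so that the question's 2 x 7 x 480 x 31 is just one possible call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<float> d1;
typedef std::vector<d1> d2;
typedef std::vector<d2> d3;
typedef std::vector<d3> d4;
typedef d4 DataContainer;

// Builds an n1 x n2 x n3 x n4 container with every float set to 0.
DataContainer make_container(std::size_t n1, std::size_t n2,
                             std::size_t n3, std::size_t n4) {
    return d4(n1, d3(n2, d2(n3, d1(n4))));
}

// Writes one value at an arbitrary position, checking each index first.
void store(DataContainer& dc, std::size_t i, std::size_t j,
           std::size_t k, std::size_t l, float value) {
    assert(i < dc.size());
    assert(j < dc[i].size());
    assert(k < dc[i][j].size());
    assert(l < dc[i][j][k].size());
    dc[i][j][k][l] = value;
}
```

After DataContainer dc = make_container(2, 7, 480, 31), values can be stored in any order, for example store(dc, 0, 3, 230, 22, value).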
You would probably have to set a size first (reserving memory alone does not create elements you can index).
Could you do a for-each or a nested for that would call
myVector.resize(x); //or size
on each level.
EDIT: I admit this code is not elegant. I like @Karl's answer, which is the right way to go.
This code is compiled and tested. It printed 208320 zeroes which is expected (2 * 7 * 480 * 31)
#include <iostream>
#include <vector>
using namespace std;
typedef vector< vector < vector < vector< float > > > > DataContainer;
int main()
{
const int LEVEL1_SIZE = 2;
const int LEVEL2_SIZE = 7;
const int LEVEL3_SIZE = 480;
const int LEVEL4_SIZE = 31;
DataContainer dc;
dc.resize(LEVEL1_SIZE);
for (int i = 0; i < LEVEL1_SIZE; ++i) {
dc[i].resize(LEVEL2_SIZE);
for (int j = 0; j < LEVEL2_SIZE; ++j) {
dc[i][j].resize(LEVEL3_SIZE);
for (int k = 0; k < LEVEL3_SIZE; ++k) {
dc[i][j][k].resize(LEVEL4_SIZE);
}
}
}
for (int i = 0; i < LEVEL1_SIZE; ++i) {
for (int j = 0; j < LEVEL2_SIZE; ++j) {
for (int k = 0; k < LEVEL3_SIZE; ++k) {
for (int l = 0; l < LEVEL4_SIZE; ++l) {
dc[i][j][k][l] = 0.0;
}
}
}
}
for (int i = 0; i < LEVEL1_SIZE; ++i) {
for (int j = 0; j < LEVEL2_SIZE; ++j) {
for (int k = 0; k < LEVEL3_SIZE; ++k) {
for (int l = 0; l < LEVEL4_SIZE; ++l) {
cout << dc[i][j][k][l] << " ";
}
}
}
}
cout << endl;
return 0;
}