creating large 2d array of size int arr[1000000][1000000] - c++

I want to create a two-dimensional integer array of 10⁶ × 10⁶ elements. For this I'm using the Boost library:
boost::multi_array<int, 2> x(boost::extents[1000000][1000000]);
But it throws the following exception:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Please tell me how to solve the problem.

You seriously don't want to allocate an array that huge. It's about 4 terabytes in memory.
Depending on what you want to do with that array you should consider two options:
External data structure. The array is stored on a hard drive, with the most recently accessed parts cached in RAM, so depending on your access pattern it can be pretty fast, though never as fast as if it were fully in RAM. Have a look at STXXL for external data structures.
This method has the advantage that you can access all of the elements in the array (in contrast to the second method, as you'll see). However, the problem remains: 4 terabytes is a lot even on a hard drive, at least if you are talking about a general desktop application.
Sparse data structure. If you only actually need a few items from that array but want to address them in a space of size 10⁶ × 10⁶, don't use a plain array but something like a map, or a combination of both: allocate the array in blocks of, say, 1024 × 1024 elements, and put these blocks into a map, using the block index (coordinate divided by 1024) as the key.
This method has the advantage that you don't have to link against another library, since it can easily be written yourself. However, it has the disadvantage that if you access elements distributed over the whole 10⁶ × 10⁶ coordinate space, or even need all of the values, it also uses around 4 TB (even a bit more) of memory. It only works if you actually access a small part of this huge "virtual" array.
The following (untested) C++ code should demonstrate this:
#include <array>
#include <map>

class Sparse2DArray
{
    struct Coord {
        int x, y;
        Coord(int x, int y) : x(x), y(y) {}
        // strict weak ordering, required for std::map
        bool operator<(const Coord &o) const { return x < o.x || (x == o.x && y < o.y); }
    };

    static const int BLOCKSIZE = 1024;

    std::map<Coord, std::array<std::array<int, BLOCKSIZE>, BLOCKSIZE>> blocks;

    static Coord block(Coord c) {
        return Coord(c.x / BLOCKSIZE, c.y / BLOCKSIZE);
    }
    static Coord blockSubCoord(Coord c) {
        return Coord(c.x % BLOCKSIZE, c.y % BLOCKSIZE);
    }

public:
    // operator[] cannot take two arguments before C++23, so use operator()
    int & operator()(int x, int y) {
        Coord c(x, y);
        Coord b = block(c);
        Coord s = blockSubCoord(c);
        return blocks[b][s.x][s.y];
    }
};
Instead of a std::map you can also use a std::unordered_map (hash map), but then you have to define a hash function for the Coord type instead of operator< (or use std::pair instead).
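For illustration, a self-contained sketch of what that might look like (Coord is redefined standalone here, since the one above is private to the class; the particular bit-mix in the hash is an arbitrary choice, and note that std::unordered_map also needs operator== on the key):

#include <cstddef>
#include <functional>
#include <unordered_map>

struct Coord { int x, y; };

bool operator==(const Coord &a, const Coord &b) { return a.x == b.x && a.y == b.y; }

struct CoordHash {
    std::size_t operator()(const Coord &c) const {
        // combine both coordinates; this particular mix is arbitrary
        return std::hash<int>()(c.x) ^ (std::hash<int>()(c.y) << 16);
    }
};

int main() {
    std::unordered_map<Coord, int, CoordHash> m;
    m[{3, 7}] = 42; // behaves like the std::map version, minus the ordering
    return 0;
}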

When you declare the array the way the title does (int arr[1000000][1000000] as a local variable), it is created on the stack, and the stack has a limited size. Your program fails because there isn't enough room to allocate that big an array.
There are two ways you can solve this. You can create the array on the heap using the new keyword, but you have to delete it afterwards or you get a memory leak; also be careful, because while the heap is much larger than the stack, it is still finite.
The other way is to use a std::vector of std::vectors and let it handle the memory for you, as sketched below.
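A minimal sketch of both options, with the size shrunk to something that actually fits in memory (the 1000 here is an arbitrary stand-in):

#include <vector>

int main() {
    const int n = 1000; // a size that actually fits in RAM

    // option 1: manual heap allocation, flattened into one block
    int *arr = new int[n * n];
    arr[3 * n + 5] = 42; // element (3, 5)
    delete[] arr;        // required, or the memory leaks

    // option 2: vector of vectors, memory managed automatically
    std::vector<std::vector<int>> v(n, std::vector<int>(n));
    v[3][5] = 42;
    return 0;
}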

What do you intend by creating a 10⁶ × 10⁶ matrix? If you're trying to create a sparse matrix (e.g. a diffusion matrix for a heat transfer problem with 10⁶ finite elements), then you should look at using an existing linear algebra library. For example, the Trilinos project has support for solving large sparse matrices like the one you may be trying to create.

Related

C++ allocate N-dimensional vector without copying a c-array

I want to load N-dimensional matrices from disk (HDF5) into std::vector objects.
I know their rank beforehand, just not the shape. For instance, one of the matrices is rank 4: std::vector<std::vector<std::vector<std::vector<float>>>> data;
I want to use vectors to store the values because they are standard and not as ugly as c-arrays (mostly because they are aware of their length).
However, the way to load them is via a loading function that takes a void *, which works fine for rank-1 vectors, where I can just resize them and then access their data pointer (vector.data()). For higher ranks, vector.data() will just point to vectors, not the actual data.
Worst case scenario, I just load all the data into an auxiliary c-array and then copy it manually, but this could slow things down quite a bit for big matrices.
Is there a way to have contiguous multidimensional data in vectors and then get a single address to it?
If you are concerned about performance, please don't use a vector of vectors of vectors. Here is why; I think the answer by OldPeculier is worth reading.
The reason that it's both fat and slow is actually the same. Each "row" in the matrix is a separately allocated dynamic array. Making a heap allocation is expensive both in time and space. The allocator takes time to make the allocation, sometimes running O(n) algorithms to do it. And the allocator "pads" each of your row arrays with extra bytes for bookkeeping and alignment. That extra space costs... well... extra space. The deallocator will also take extra time when you go to deallocate the matrix, painstakingly freeing each individual row allocation. Gets me in a sweat just thinking about it.
There's another reason it's slow. These separate allocations tend to live in discontinuous parts of memory. One row may be at address 1,000, another at address 100,000—you get the idea. This means that when you're traversing the matrix, you're leaping through memory like a wild person. This tends to result in cache misses that vastly slow down your processing time.
So, if you absolutely must have your cute [x][y] indexing syntax, use that solution. If you want quickness and smallness (and if you don't care about those, why are you working in C++?), you need a different solution.
Your plan is not a wise one. Vectors of vectors of vectors are inefficient and only really useful for dynamic jagged arrays, which you don't have.
Instead of your plan, load into a flat vector.
Next, wrap it with a multidimensional view.
#include <cstddef>

template<class T, std::size_t Dim>
struct dimensional {
    std::size_t const* strides;
    T* data;
    dimensional<T, Dim-1> operator[](std::size_t i) const {
        // peel off one dimension: advance past its stride and
        // jump i blocks of *strides elements
        return {strides + 1, data + i * *strides};
    }
};
template<class T>
struct dimensional<T, 1> { // specialized at Dim==1 so the last [] yields an element
    std::size_t const* strides; // points one past the last stride; not valid to dereference
    T* data;
    T& operator[](std::size_t i) const {
        return data[i];
    }
};
where strides points at an array of strides, one for each dimension that gets peeled off; each stride is the product of the sizes of all later dimensions.
So my_data.access()[3][5][2] gets a specific element.
This sketch of a solution leaves everything public and doesn't support for(:) iteration. A more shipping-quality version would have proper privacy and support C++11-style for loops.
I am unaware of the name of a high-quality multi-dimensional array view already written for you, but there is almost certainly one in Boost (boost::multi_array is one candidate).
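For concreteness, a minimal usage sketch of the view above over a flat vector, assuming the dimensional template is in scope and an arbitrary example shape of 4×3×2:

#include <cstddef>
#include <vector>

int main() {
    const std::size_t A = 4, B = 3, C = 2; // example shape
    std::vector<float> flat(A * B * C);    // contiguous storage

    // one stride per peeled-off dimension: {B*C, C}
    const std::size_t strides[] = {B * C, C};
    dimensional<float, 3> view{strides, flat.data()};

    view[3][2][1] = 1.0f; // writes flat[3*B*C + 2*C + 1]
    return 0;
}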
For a bi-dimensional matrix, you could use an ugly c-array like this:
float data[w * h]; // width, height; w and h must be compile-time constants here
data[(y * w) + x] = 0; // access element (x, y)
For a tri-dimensional matrix:
float data[w * h * d]; //width, height, depth
data[((z * h) + y) * w + x] = 0; //access (x,y,z) element
And so on. To load data from, let's say, a file,
float *data = yourProcToLoadData(); //works for any dimension
That's not very scalable, but you said you know the rank beforehand. This way your data is contiguous and you have a single address.

Dynamically allocated 2D array

I have in my class 2 const int variables:
const int m_width;
const int m_height;
In my constructor I set the variables, and I want to create a 2D array of exactly this size, where the size comes from the player's input. I am making a TicTacToe game, and I need the user's input to determine the width and height of the playing field. How do I dynamically declare a 2D array in my situation?
It is a common misconception that two-dimensional matrices need two-dimensional storage. People often reach for vectors of vectors or similar techniques, and this comes at a cost in both performance and code maintainability.
This is not needed. In fact, a perfect two-dimensional matrix is a single std::vector in which every row is packed one after another. Such a vector has a size of M * N, where M and N are the matrix height and width. To access the element at location X, Y, you do v[K], where K is calculated as X * N + Y.
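Applied to this question's Board, a minimal sketch might look as follows (the accessor name at() and the member m_cells are illustrative, not from the original post):

#include <vector>

class Board {
    const int m_width;
    const int m_height;
    std::vector<int> m_cells; // flat, row-major storage of m_width * m_height cells
public:
    Board(int width, int height)
        : m_width(width), m_height(height), m_cells(width * height) {}

    // the element at column x, row y lives at index y * m_width + x
    int &at(int x, int y) { return m_cells[y * m_width + x]; }
};

int main() {
    Board board(3, 3); // size chosen at run time
    board.at(1, 2) = 1;
    return 0;
}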
C++ doesn't provide a standard dynamic 2D array container.
What you can do (if you don't want to write your own full implementation) is use an std::vector of std::vectors instead.
It's not exactly the same thing (provides you with an extra degree of freedom: rows can be of different length) but unless you're working in an extremely constrained environment (or need an extremely optimized solution) the extra cost is not big.
Supposing your elements need to be integers, the code to initialize a 2D array can be, for example:
std::vector<std::vector<int>> board(rows, std::vector<int>(cols));
PS: A few years ago I wrote a class here to implement a simple 2D array as an answer to an SO question... you can find it here.

Is it worth using a vector when making a map

I have a class that represents a 2D map with size 40×40.
I read some data from sensors and build this map, marking cells when my sensors find something, and I set a value for the probability of finding an obstacle. For example, when I find an obstacle in cell [52,22], I add 10 to its value and add 5 to the surrounding cells.
So each cell of this map keeps some small value (probably nothing bigger). When a cell is marked three times by the sensor, its value will be 30 and the surrounding cells will have 15.
And my question is: is it worth using a plain array, or is it better to use a vector, even though I don't sort the cells, don't remove them, etc.? I just set a value and read it later.
Update:
Actually I have in my header file:
#include <cstdint>

using cell = uint8_t;

class Grid {
private:
    int xSize, ySize;
    cell *cells;
public:
    // some methods
};
In the .cpp file:
using cell = uint8_t;

Grid::Grid(int xSize, int ySize) : xSize(xSize), ySize(ySize) {
    cells = new cell[xSize * ySize];
    for (int x = 0; x < xSize; x++) {
        for (int y = 0; y < ySize; y++)
            cells[x + y * xSize] = 0; // loop variables now match the index expression
    }
}

Grid::~Grid() {
    delete[] cells; // new[] must be matched with delete[], not delete
}

inline cell* Grid::getCell(int x, int y) const {
    return &cells[x + y * xSize];
}
Does it look fine?
I'd use std::array rather than std::vector.
For fixed size arrays you get the benefits of STL containers with the performance of 'naked' arrays.
http://en.cppreference.com/w/cpp/container/array
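A minimal sketch of that suggestion, using the 40×40 size and the one-byte cell type from the question:

#include <array>
#include <cstdint>

using cell = uint8_t;

int main() {
    // fixed 40x40 grid, zero-initialized; no heap allocation involved
    std::array<std::array<cell, 40>, 40> grid{};
    grid[22][12] += 10; // mark the cell at row 22, column 12
    return 0;
}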
A static (C-style) array is possible in your case since the size is known at compile time.
BUT. It may be interesting to have the data on the heap instead of the stack.
If the array is a global variable, it's ugly and bug-prone (avoid that when you can).
If the array is a local variable (let's say, in your main() function), then a stack overflow may occur. Well, it's very unlikely for a 40×40 array of tiny things, but I'd rather have my data on the heap, to keep things safe, clean, and future-proof.
So, IMHO you should definitely go for the vector, it's fast, clean and readable, and you don't have to worry about stack overflow, memory allocation, etc.
About your data: if you know your values fit in a single byte, go for it!
A uint8_t (same as unsigned char) can store values from 0 to 255. If that's enough, use it.
using cell = uint8_t; // define a nice name for your data type
std::vector<cell> myMap;
size_t size = 40;
myMap.resize(size * size); // resize, not reserve: we need size*size live, zero-initialized cells
myMap[12 + 22 * size] += 10; // access cell (12, 22)
Side note: don't use new[]. Well, you can, but it has no advantage over a vector; you will probably only gain headaches from handling memory manually.
Some advantages of using a std::vector are that it can be dynamically allocated (flexible size, can be resized during execution, etc.) and can be passed to or returned from a function. Since you have a fixed 40×40 size and you know you have one element per cell, I don't think it matters that much in your case, and I would NOT suggest using std::vector for this simple task.
And here is a possible duplicate.

Maximum number of stl::list objects

The problem is to find periodic graph patterns in a dataset. I have 1000 timesteps with a graph (encoded as integers) in each timestep, so there are 999 possible periods in which a graph can recur. I also define a phase offset as (timestep mod period). For a graph first seen in the 5th timestep with period 2, the phase offset is 1.
I am trying to create a bidimensional array of lists in C++. Each cell contains a list containing graphs having a specified period and phase offset. I keep inserting graphs in the corresponding lists.
list<ListNode> A[timesteps][phase offsets]
ListNode is a class with 4 integer variables.
This gives me a segmentation fault. Using 500 for the size runs fine. Is this due to lack of memory or some other issue?
Thanks.
Probably due to limited stack size.
You're creating an array of 1000×1000 = 1000000 objects that are almost certainly at least 4 bytes apiece, so roughly 4 megabytes at a minimum. Assuming that's inside a function, it has automatic storage duration, which normally translates to being allocated on the stack. Typical stack sizes are around 1 to 4 megabytes.
Try something like std::vector<std::list<ListNode> > A(1000*1000); (and, if necessary, create a wrapper to make it look 2-dimensional).
Edit: The wrapper would overload an operator to give you 2D addressing:
#include <cstddef>
#include <vector>

template <class T>
class array_2D {
    std::vector<T> data;
    std::size_t cols;
public:
    // members are initialized in declaration order: data first, then cols
    array_2D(std::size_t x, std::size_t y) : data(x * y), cols(x) {}
    T &operator()(std::size_t x, std::size_t y) { return data[y * cols + x]; }
};
You may want to embellish that (e.g., with bounds checking) but that's the general idea. Addressing it would use (), as in:
array_2D<int> x(1000, 1000);
x(100, 3) = 2;
int y = x(20, 20);
Sounds like you're running out of stack space. Try allocating it on the heap, e.g. through std::vector, and wrap in try ... catch to see out of memory errors instead of crashing.
(Edit: Don't use std::array since it also allocates on the stack.)
try {
    std::vector<std::list<ListNode> > a(1000000); // create 1000*1000 lists
    // index a by e.g. [index1 * 1000 + index2]
    a[42 * 1000 + 18].size(); // size of that list

    // or if you really want double subscripting without a wrapper function:
    std::vector<std::vector<std::list<ListNode> > > b(1000);
    for (size_t i = 0; i < 1000; ++i) { // do 1000 times:
        b[i].resize(1000); // default-create 1000 lists in each
    }
    b[42][18].size(); // size of that list
} catch (std::exception const& e) {
    std::cerr << "Caught " << typeid(e).name() << ": " << e.what() << std::endl;
}
In libstdc++ on a 32-bit system a std::list object weighs 8 bytes (only the object itself, not counting the allocations it may make), and even in other implementations I don't think it will be much different; so you are allocating about 8 MB of data. That isn't much per se on a regular computer, but if you put that declaration in a function it becomes a local variable, thus allocated on the stack, which is quite limited in size (a few MB at most).
You should allocate that thing on the heap, e.g. using new, or, even better, using a std::vector.
By the way, it doesn't seem right that you need a 1000×1000 array of std::list; could you specify exactly what you are trying to achieve? There are probably data structures that better fit your needs.
You're declaring a two-dimensional array [1000]x[1000] of list<ListNode>. I don't think that's what you intended.
The segmentation fault is probably from trying to use elements of the list that aren't valid.

C++ Array Size Initialization

I am trying to define a class. This is what I have:
enum Tile {
    GRASS, DIRT, TREE
};

class Board {
public:
    int toShow;
    int toStore;
    Tile* shown;
    Board(int tsh, int tst);
    ~Board();
};

Board::Board(int tsh, int tst) {
    toShow = tsh;
    toStore = tst;
    shown = new Tile[toStore][toStore]; // ERROR!
}

Board::~Board() {
    delete[] shown;
}
However, I get the following error on the indicated line -- Only the first dimension of an allocated array can have dynamic size.
What I want to be able to do is, rather than hard-coding it, pass the parameter toShow to the constructor and create a two-dimensional array which only contains the elements that I want to be shown.
However, my understanding is that when the constructor is called and shown is initialized, its size is fixed by the current value of toStore. Even if toStore later changes, the memory has already been allocated for the array shown, so the size should not change. However, the compiler doesn't like this.
Is there a genuine misconception in how I'm understanding this? Does anyone have a fix which will do what I want without having to hard-code the size of the array?
Use C++'s containers, that's what they're there for.
#include <vector>

class Board {
public:
    int toShow;
    int toStore;
    std::vector<std::vector<Tile> > shown;
    Board(int tsh, int tst) :
        toShow(tsh), toStore(tst),
        shown(tst, std::vector<Tile>(tst))
    {
    }
};
...
Board board(4, 5);
board.shown[1][3] = DIRT;
You can use a one-dimensional array. Two-dimensional arrays are laid out in memory exactly like one-dimensional ones, so when you want a variable size you can use this pattern. For example:
int arr1[3][4];
int arr2[3 * 4];
They have the same layout, and their members can be accessed via different notations:
int x = arr1[1][2];
int x = arr2[1 * 4 + 2];
Of course arr1 can be seen as a matrix with 3 rows and 4 columns, or with 3 columns and 4 rows; which index you treat as the row is a convention you choose.
With this kind of multi-dimensional array you can access everything through a single pointer, but you have to know the internal structure: it is a one-dimensional array that is merely treated as 2- or 3-dimensional.
Let me tell you about what I did when I needed a 3D array. It might be overkill, but it's rather cool and might help, although it's a whole different way of doing what you want.
I needed to represent a 3D box of cells. Only some of the cells were marked and of any interest. There were two options. The first: declare a static 3D array with the largest possible size, and use a portion of it if one or more dimensions of the box were smaller than the corresponding dimensions of the static array.
The second way was to allocate and deallocate the array dynamically. It's quite an effort with a 2D array, not to mention 3D.
The static-array solution meant defining a 3D array in which the cells of interest held a special value; most of the allocated memory was unnecessary.
I dumped both ways. Instead, I turned to an STL map.
I defined a struct called Cell with 3 member variables, x, y, z, which represent coordinates. The constructor Cell(x, y, z) creates such a Cell easily.
I defined operator< on it to make it orderable. Then I defined a map<Cell, Data>. Adding a marked cell with coordinates x, y, z to the map is done simply by
my_map[Cell(x, y, z)] = my_data;
This way I didn't need to maintain 3D array memory management, and also only the required cells were actually created.
Checking whether a cell at coordinates x0, y0, z0 exists (or is marked) is done by:
map<Cell, Data>::iterator it = my_map.find(Cell(x0, y0, z0));
if (it != my_map.end()) { ...
And referencing the cell's data at coordinates x0, y0, z0 is done by:
my_map[Cell(x0, y0, z0)]...
This method might seem odd, but it is robust, self-managing with regard to memory, and safe: no boundary overrun.
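A compact sketch of that approach (Data is a stand-in for whatever payload a cell carries):

#include <map>

struct Data { int value; }; // stand-in payload

struct Cell {
    int x, y, z;
    Cell(int x, int y, int z) : x(x), y(y), z(z) {}
    bool operator<(const Cell &o) const { // makes Cell orderable for std::map
        if (x != o.x) return x < o.x;
        if (y != o.y) return y < o.y;
        return z < o.z;
    }
};

int main() {
    std::map<Cell, Data> my_map;
    my_map[Cell(1, 2, 3)] = Data{42}; // mark a cell

    std::map<Cell, Data>::iterator it = my_map.find(Cell(1, 2, 3));
    if (it != my_map.end()) {
        // the cell exists; it->second is its Data
    }
    return 0;
}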
First, if you want to refer to a 2D array, you have to declare a pointer to a pointer:
Tile **shown;
Then, have a look at the error message. It's proper, comprehensible English. It says what the error is. Only the first dimension of an allocated array can have dynamic size. means -- guess what, that only the first dimension of an allocated array can have dynamic size. That's it. If you want your matrix to have multiple dynamic dimensions, use the C-style malloc() to maintain the pointers to pointers, or, which is even better for C++, use vector, made exactly for this purpose.
It's good to understand a little of how memory allocation works in C and C++.
char x[10];
The compiler will allocate ten bytes and remember the starting address, perhaps it's at 0x12 (in real life probably a much larger number.)
x[3] = 'a';
Now the compiler looks up x[3] by taking the starting address of x, which is 0x12, and adding 3*sizeof(char), which brings us to 0x15. So x[3] lives at 0x15.
This simple addition-arithmetic is how memory inside an array is accessed. For two dimensional arrays the math is only slightly trickier.
char xy[20][30];
Allocates 600 bytes starting at some place, maybe it's 0x2000. Now accessing
xy[4][3];
Requires some math: xy[0][0], xy[0][1], xy[0][2], ... occupy the first 30 bytes. Then xy[1][0], xy[1][1], ... occupy bytes 30 through 59. It's multiplication: xy[a][b] will be located at the address of xy, plus a*30, plus b.
This is only possible if the compiler knows how long each row is: you'll notice the compiler needed to know the number 30 to do this math.
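A quick sanity check of that arithmetic (a sketch; the actual addresses printed will differ from machine to machine):

#include <cstdio>

int main() {
    char xy[20][30];
    char *base = &xy[0][0];
    // the compiler turns xy[4][3] into *(base + 4*30 + 3)
    std::printf("%p\n", (void *)&xy[4][3]);
    std::printf("%p\n", (void *)(base + 4 * 30 + 3)); // same address
    return 0;
}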
Now, function calls. The compiler cares little whether you call
foo(int *x);
or
foo(int x[]);
Because in either case you're passing the starting address of a block of memory, and the compiler can do the addition to find the place where x[3] or whatever lives. But in the case of a two-dimensional array, the compiler needs to know that magic number 30 from the example above. So
foo(int xy[][]) {
    xy[3][4] = 5; // compiler has NO idea where this lives,
                  // because it doesn't know the row length of xy!
}
But if you specify
foo(int xy[][30])
the compiler knows what to do. For reasons I can't remember, it's often considered better practice to pass it as a double pointer, but this is what's going on at the technical level.
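Putting it together, a minimal compilable sketch of the working form, reusing the 20×30 shape from the example above:

#include <cstdio>

// the column count (30) must be part of the parameter type so the
// compiler can compute xy[a][b] as *(base + a*30 + b)
void foo(char xy[][30]) {
    xy[3][4] = 5;
}

int main() {
    char xy[20][30] = {};
    foo(xy);                       // the array decays to char (*)[30]
    std::printf("%d\n", xy[3][4]); // prints 5
    return 0;
}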