Handling Huge Multidimensional Arrays in C++

Handling Huge Multidimensional Arrays in C++ - c++

I'm designing a game in C++ similar to Minecraft that holds an enormous amount of terrain data in memory. In general, I want to store an array in memory that is [5][4][5][50][50][50]. This isn't bad since it amounts to about 100mb of virtual memory since my structure will only be about 8 bytes.
However, I'm having trouble figuring out the best way to handle this. I do want this to be in virtual memory, but obviously not on the stack. And I keep making the mistake some how of creating this array on the stack an causing a stack overflow. What I would like to do is below. This is just code that I threw together to give you an example of what I'm doing, I have code with correct syntax on my machine, I just didn't want to clutter the post.
typedef struct modelBlock
{
// Information about the blocks
} BLOCK;
typedef struct modelGrid
{
bool empty;
BLOCK blocksArray[50][50][50];
} GRID;
class Parent
{
Child* child;
Parent(void);
}
Parent::Parent()
{
Child c;
child = &c;
}
class Child
{
GRID grids[5][4][5];
}
However, every time I do this, I cause a stack overflow (appropriate web site choice right?). I played with using pointer based arrays, but I had a lot of trouble with data being lost outside of its scope.
If anyone could give me some insight on how to get my data to store on the heap instead of the stack, or if I should use some other way of creating my array, I'd really appreciate the help. I'd like to avoid using vectors because of overhead, though I'm not sure how substantial it is.

Use boost::multi_array

If you want to allocate something on the heap, use new.
#include <memory>
class Parent
{
std::auto_ptr<Child> child; // use auto_ptr for dynamically-allocated members
Parent(const Parent&); // You probably don't want to copy this giant thing
public:
Parent();
};
Parent::Parent()
: child(new Child) // initialize members with an initializer list
{
}
Also, avoid mixing C and C++ styles. There's no reason to do
typedef struct blah{ ... } BLAH;
in C++. A struct is just a class with all of the members public by default; just like a class, you can refer to the struct type's name without using the struct tag. There's also no need to specify void for a function that takes no parameters.
boost::multi_array (linked in PigBen's answer) is a good choice over raw arrays.

If you want the class created on the heap, create it with new:
Child * c = new Child;
and then of course delete it, or better still use a smart pointer.

In order to do exactly what you're trying to do you have to declare everything as pointers (and pointers to pointers to pointers to pointers) and then allocate each one individually.
Teh sux!
A better option is to simply allocate the ginormous block in one chunk and use multiple variable along with pointer arithmetic to arrive at the correct location.
Edit: Wasn't paying attention and didn't notice your constructor. That's not only not the way to get your Child allocated on the free-store, it's a great way to create situations eliciting undefined behavior. Your Child will be gone when the constructor is through and the pointer to it will then be invalid. I wonder if you shouldn't run through some basic tutorials before trying to write a game.

Here's something that works and can be built upon without the boost dependency. One downside is it removes use of [][][] style of referencing elements, but it's a small cost and can be added.
template<class T>
class Matrix {
unsigned char* _data;
const size_t _depth;
const size_t _cols;
const size_t _rows;
public:
Matrix(const size_t& depth, const size_t& rows, const size_t& cols):
_depth(depth),
_rows(rows),
_cols(cols) {
_data = new unsigned char [depth * rows * cols * sizeof(T)];
}
~Matrix() {
delete[] _data;
}
T& at(const size_t& depthIndex, const size_t& rowIndex, const size_t& colIndex) const {
return *reinterpret_cast<T*>(_data + ((((depthIndex * _cols + colIndex) * _rows) + rowIndex) * sizeof(T)));
}
const size_t& getDepth() const {
return _depth;
}
const size_t& getRows() const {
return _rows;
}
const size_t& getCols() const {
return _cols;
}
};
int _tmain(int argc, _TCHAR* argv[])
{
Matrix<int> block(50, 50, 50);
size_t d, r, c;
for (d = 0; d < block.getDepth(); d++) {
for (r = 0; r < block.getRows(); r++) {
for (c = 0; c < block.getCols(); c++) {
block.at(d, r, c) = d * 10000000 + r * 10000 + c;
}
}
}
for (d = 0; d < block.getDepth(); d++) {
for (r = 0; r < block.getRows(); r++) {
for (c = 0; c < block.getCols(); c++) {
assert(block.at(d, r, c) == d * 10000000 + r * 10000 + c);
}
}
}
return 0;
}

A smaller example (with changed names for all the structs, to make the general principle clearer). The 'Bloe' struct is the one you want to allocate on the heap, and this is accomplished using 'new'.
struct Bla {
int arr[4][4];
};
struct Bloe {
Bla bla[2][2];
};
int main()
{
Bloe* bloe = new Bloe();
bloe->bla[1][1].arr[1][1] = 1;
return 0;
}

I did this by putting all the data in a binary file. I calculated the offset of the data and used seek() and read() to get the data when needed. The open() call is very slow so you should leave the file open during the lifetime of the program.

Below is how I understood what you showed you were trying to do in your example. I tried to keep it straightforward. Each Array of [50][50][50] is allocated in one memory chunk on the heap, and only allocated when used. There is also an exemple of access code. No fancy boost or anything special, just basic C++.
#include <iostream>
class Block
{
public:
// Information about the blocks
int data;
};
class Grid
{
public:
bool empty;
Block (*blocks)[50][50];
Grid() : empty(true) {
}
void makeRoom(){
this->blocks = new Block[50][50][50];
this->empty = false;
}
~Grid(){
if (!this->empty){
delete [] this->blocks;
}
}
};
class Parent
{
public:
Grid (* child)[4][5];
Parent()
{
this->child = new Grid[5][4][5];
}
~Parent()
{
delete [] this->child;
}
};
main(){
Parent p;
p.child[0][0][0].makeRoom();
if (!p.child[0][0][0].empty){
Block (* grid)[50][50] = p.child[0][0][0].blocks;
grid[49][49][49].data = 17;
}
std::cout << "item = "
<< p.child[0][0][0].blocks[49][49][49].data
<< std::endl;
}
This could still be more simple and straightfoward and just use one bug array of [50][50][50][5][4][5] blocks in one memory chunk on the heap, but I'll let you figure out how if this is what you want.
Also, usind dynamic allocation in class Parent only has the sole purpose to use heap instaed of stack, but for such a small array (5*4*5 pointers), allocating it on stack should not be a problem, hence it could be written.
class Parent
{
public:
Grid child[5][4][5];
};
without changing anything in the way it is used.

Related

how to find and track objects and functions that using a dynamically allocated memory?

if we have a code like this :
void g(int *n){
int m=n[1];
n= null ;
}
void main(){
int ∗ p = (int∗)malloc(10 ∗ sizeof (int));
int * q= p;
g(p);
}
so we know if we overload malloc, calloc , realloc ,new ,free and delete functions we can track first pointer that create or delete with this functions(p pointer in above example ) but how can i find and track other pointers and functions that using this allocated memory ? ( q pointer and g function in above example ) .should i overload Assignment statement and function call ? if yes how ? in other words i want to know live objects and last used time and location of an allocated memory too .
i want to implement an custom memory leak detection tools so i need to find all objects and pointer that using an allocated memory before report it that's leak or not .

What you're talking about is called reference counting. Searching stackoverflow for this topic gives you these results:
What is a smart pointer and when should I use one?
How does a reference-counting smart pointer's reference counting work?
what exactly reference counting in c++ means?,
The standard template library already has a std::shared_ptr which does that.
We need to keep track of the lifecycle of the resource, this is possible by controlling the creation, the copying and the deletion. If you do it without using a class, you'll end up with some functions, a state variable and a global variable. This is not very effective in keeping the concept of the shared resource in the focus of the user's mind. One tends to forget that one is using a shared resource because one sees a naked pointer and tends to use it like one, which would disregard the provided shared-pointer functionality.
If you were to encapsulate the functionality in a class, you should want to implement the concept for all types i.e. you should want to use templates. one way would be this:
#include <vector>
#include <stdexcept>
#include <cstdlib>
#ifndef SHARED_PTR_H
#define SHARED_PTR_H
template<class Tres_t>
class shared_pointer
{
// members
Tres_t* data_;
static std::vector<std::size_t> ref_counts_;
std::size_t ref_index_ = {};
public:
// lifecycle
shared_pointer() = delete;
shared_pointer(const std::size_t& size)
: data_{nullptr}
{
data_ = static_cast<Tres_t*>(std::malloc(size * sizeof(Tres_t)));
ref_counts_.push_back(1);
ref_index_ = ref_counts_.size() - 1;
}
shared_pointer(const shared_pointer& rhs)
: data_{rhs.data_}, ref_index_{rhs.ref_index_}
{
if (ref_index_ >= ref_counts_.size())
{
throw std::runtime_error("shared_pointer_ctor_index_error");
}
++ref_counts_[ref_index_];
}
shared_pointer(shared_pointer&& rhs)
: data_{rhs.data_}, ref_index_{rhs.ref_index_} {}
shared_pointer& operator=(const shared_pointer& rhs)
{
data_ = rhs.data_;
ref_index_ = rhs.ref_index_;
if (ref_index_ >= ref_counts_.size())
{
throw std::runtime_error("shared_pointer_ctor_index_error");
}
++ref_counts_[ref_index_];
}
shared_pointer& operator=(shared_pointer&& rhs)
{
data_ = rhs.data_;
ref_index_ = rhs.ref_index_;
}
~shared_pointer()
{
if (ref_counts_[ref_index_] == 0)
{
std::logic_error("shared_point_dtor_reference_counting_error");
}
--ref_counts_[ref_index_];
if (ref_counts_[ref_index_] == 0)
{
std::free(data_);
}
}
// main functionality
Tres_t* data()
{
return data_;
}
const Tres_t* data() const
{
return data_;
}
};
template<class Tres_t>
std::vector<std::size_t> shared_pointer<Tres_t>::ref_counts_ = {};
#endif
I've tested this code only rudimentarily so I'll leave it to you to test and improve it.

Why is alloca returning the same address twice?

I'm trying to implement my own math library, and I'm starting off with vectors. The idea is to give the class a pointer to an array of numbers, then copy the array and store it in the data address given by a private variable pointer. To begin with, I used alloca to try and free up some memory for the private variable
vml.h
namespace vml {
// Vectors
template <typename in_type, const int in_length>
class vec {
public:
vec(in_type* in_data) {
std::cout << data << std::endl;
std::copy(in_data, in_data + in_length, data);
}
vec() {
data = nullptr;
}
in_type& operator()(int index) const {
_ASSERT(0 <= index && index < in_length);
return data[index];
}
private:
in_type* data = alloca(in_length * sizeof(in_type));
};
main.cpp
int main() {
int list[] = { 1,2,3 };
int list2[] = {2,4,6 };
vml::vec<int, 3> a(list);
vml::vec<int, 3> b(list);
return 0;
}
This gives no errors however, for some reason, alloca returns the same address twice when calling two instances. I searched this up everywhere and I couldn't find an explanation why. So I decided to allocate memory using an array. If you can answer this question that would be extremely helpful.
Thanks.

You have to be very careful with alloca. It allocates memory on the stack rather than the heap. That memory is freed as soon as the function which called alloca exits. In this case, it will be called in the constructor so when you call operator() that memory has already been freed and you are dealing with undefined behavior.
Unless you really need to avoid heap allocations and you know for sure that you wont overflow the stack and you understand all the limitations of using alloca, it's best to steer clear of it.

Lets start with the basics, your stack is most likely only 1 MB, so after a few vectors and recursive calls you program will likely die.
To solve it if you want it on stack you could use std::array as data
Warning untested code ahead
template <typename in_type, const int in_length>
class vec {
public:
vec(in_type* in_data) {
std::cout << data << std::endl;
std::copy(in_data, in_data + in_length, data);
}
vec() = default;
in_type& operator()(int index) const {
_ASSERT(0 <= index && index < in_length);
return data[index];
}
private:
std::array<in_type, in_length> data;
};
Alternatively if you want to use all the nice things from std::array
template <typename in_type, const int in_length>
class vec : public std::array<in_type, in_length> {
public:
using std::array::array; // use constructors, might need template param to compile
}
This also means that if you at some point just want to change to heap you just allocate your vec as every other class.
Another alternative is to use C++17 PMR, use an allocation on the stack as the storage and make vec PMR aware.

You cannot wrap alloca in a function and return its pointer outside, since the stack of wrapper function would be freed.
If you call it as member initializer, it is actually called from constructor, and may be freed when constructor returns and then re-used.

C++ Access memory which isn't part of the object itself

It sounds weird, I guess, but I'm creating some low-level code for a hardware device. Dependend on specific conditions I need to allocate more space than the actual struct needs, store informations there and pass the address of the object itself to the caller.
When the user is deallocating such an object, I need to read these informations before I actually deallocate the object.
At the moment, I'm using simple pointer operations to get the addresses (either of the class or the extra space). However, I tought it would be more understandable if I do the pointer arithmetics in member functions of an internal (!) type. The allocator, which is dealing with the addresses, is the only one who know's about this internal type. In other words, the type which is returned to the user is a different one.
The following example show's what I mean:
struct foo
{
int& get_x() { return reinterpret_cast<int*>(this)[-2]; }
int& get_y() { return reinterpret_cast<int*>(this)[-1]; }
// actual members of foo
enum { size = sizeof(int) * 2 };
};
int main()
{
char* p = new char[sizeof(foo) + foo::size];
foo* bar = reinterpret_cast<foo*>(p + foo::size);
bar->get_x() = 1;
bar->get_y() = 2;
std::cout << bar->get_x() << ", " << bar->get_y() << std::endl;
delete p;
return 0;
}
Is it arguable to do it in that way?

It seems needlessly complex to do it this way. If I were to implement something like this, I would take a simpler approach:
#pragma pack(push, 1)
struct A
{
int x, y;
};
struct B
{
int z;
};
#pragma pack(pop)
// allocate space for A and B:
unsigned char* data = new char[sizeof(A) + sizeof(B)];
A* a = reinterpret_cast<A*>(data);
B* b = reinterpret_cast<B*>(a + 1);
a->x = 0;
a->y = 1;
b->z = 2;
// When deallocating:
unsigned char* address = reinterpret_cast<unsigned char*>(a);
delete [] address;
This implementation is subtly different, but much easier (in my opinion) to understand, and doesn't rely on intimate knowledge of what is or is not present. If all instances of the pointers are allocated as unsigned char and deleted as such, the user doesn't need to keep track of specific memory addresses aside from the first address in the block.

The very straightforward idea: wrap your extra logic in a factory which will create objects for you and delete them smart way.

You can also create the struct as a much larger object, and use a factory function to return an instance of the struct, but cast to a much smaller object that would basically act as the object's handle. For instance:
struct foo_handle {};
struct foo
{
int a;
int b;
int c;
int d;
int& get_a() { return a; }
int& get_b() { return b; }
//...more member methods
//static factory functions to create and delete objects
static foo_handle* create_obj() { return new foo(); }
static void delete_obj(foo_handle* obj) { delete reinterpret_cast<foo*>(obj); }
};
void another_function(foo_handle* masked_obj)
{
foo* ptr = reinterpret_cast<foo*>(masked_obj);
//... do something with ptr
}
int main()
{
foo_handle* handle = foo::create_obj();
another_function(handle);
foo::delete_obj(handle);
return 0;
}
Now you can hide any extra space you may need in your foo struct, and to the user of your factory functions, the actual value of the pointer doesn't matter since they are mainly working with an opaque handle to the object.

It seems your question is a candidate for the popular struct hack.
Is the "struct hack" technically undefined behavior?

Offset and pass vector reference

When using arrays you can do something like
class SomeClass
{
public:
int* LockMember( size_t& numInts );
private:
int* member;
size_t numInts;
};
int* SomeClass::LockMember( size_t& out_numInts )
{
out_numInts = numInts - 1;
return member + 1;
}
To return an array offset by some amount so as to prevent someone from modifying some part of contingeous memory, or, atleast, show some intent that this part of contingeous memory of the object should remain untouched.
Since I use vectors everywhere, I am wondering if there was some way to accomplish the same sort of thing:
class SomeClass
{
public:
std::vector<int> LockMember( void );
private:
std::vector<int> member;
};
std::vector<int> SomeClass::LockMember( void )
{
// somehow make a vector with its beginning iterator pointing to member.begin() + 1
// have a size smaller by one, still the same end iterator. The vector must be
// pointing to the same data as in this class as it needs to be modifiable.
return magicOffsetVector;
}
With the commented part replaced by real code. Any ideas?

If I understand you correctly: You want some memory with two parts: At the beginning you want something that can't be touched, and after that you want something that is open for use by client code.
You could do something along the following code. This will give the client code a copy to play with. This does mean you would have to do a lot of copying, though.
class SomeClass
{
public:
std::vector<int> getMember( void ) const;
void setMember(std::vector<int> newContent);
private:
std::vector<int> member;
size_t magicOffset;
};
// Read restricted part
std::vector<int> SomeClass::getMember( void ) const
{
return vector<int>(member.begin() + magicOffset, member.end());
}
// Assign to restricted part
void SomeClass::setMember(const std::vector<int>& v)
{
std::copy(v.begin(), v.end(), member.begin() + magicOffset);
}
In order to avoid the copying, it is possible that you could allocate memory for two vectors, one for the protected part and one for the unprotected part, and use placement new to put both vectors into that memory, thus ensuring that they are in contiguous memory. And then give the client code more or less free access to the public part of the vector. However, there's still the thing with bookkeeping variables in vector, and basically this would be an awful hack that's just waiting to blow up.
However, if you only need access to the unrestricted part on a per-element basis, you could just do range-checking on the arguments, i.e.:
int getElement(size_t idx)
{
idx += magicOffset;
if (idx > member.size() || idx < 0) throw std::out_of_range("Illegal index");
return member[idx];
}
And then either provide a setElement, or return int&.

Determine array size in constructor initializer

In the code below I would like array to be defined as an array of size x when the Class constructor is called. How can I do that?
class Class
{
public:
int array[];
Class(int x) : ??? { }
}

You folks have so overcomplicated this. Of course you can do this in C++. It is fine for him to use a normal array for efficiency. A vector only makes sense if he doesn't know the final size of the array ahead of time, i.e., it needs to grow over time.
If you can know the array size one level higher in the chain, a templated class is the easiest, because there's no dynamic allocation and no chance of memory leaks:
template < int ARRAY_LEN > // you can even set to a default value here of C++'11
class MyClass
{
int array[ARRAY_LEN]; // Don't need to alloc or dealloc in structure! Works like you imagine!
}
// Then you set the length of each object where you declare the object, e.g.
MyClass<1024> instance; // But only works for constant values, i.e. known to compiler
If you can't know the length at the place you declare the object, or if you want to reuse the same object with different lengths, or you must accept an unknown length, then you need to allocate it in your constructor and free it in your destructor... (and in theory always check to make sure it worked...)
class MyClass
{
int *array;
MyClass(int len) { array = calloc(sizeof(int), len); assert(array); }
~MyClass() { free(array); array = NULL; } // DON'T FORGET TO FREE UP SPACE!
}

You can't initialize the size of an array with a non-const dimension that can't be calculated at compile time (at least not in current C++ standard, AFAIK).
I recommend using std::vector<int> instead of array. It provides array like syntax for most of the operations.

Use the new operator:
class Class
{
int* array;
Class(int x) : array(new int[x]) {};
};

I don't think it can be done. At least not the way you want. You can't create a statically sized array (array[]) when the size comes from dynamic information (x).
You'll need to either store a pointer-to-int, and the size, and overload the copy constructor, assignment operator, and destructor to handle it, or use std::vector.
class Class
{
::std::vector<int> array;
Class(int x) : array(x) { }
};

Sorry for necroing this old thread.
There is actually a way to find out the size of the array compile-time. It goes something like this:
#include <cstdlib>
template<typename T>
class Class
{
T* _Buffer;
public:
template<size_t SIZE>
Class(T (&static_array)[SIZE])
{
_Buffer = (T*)malloc(sizeof(T) * SIZE);
memcpy(_Buffer, static_array, sizeof(T) * SIZE);
}
~Class()
{
if(_Buffer)
{
free(_Buffer);
_Buffer = NULL;
}
}
};
int main()
{
int int_array[32];
Class<int> c = Class<int>(int_array);
return 0;
}
Alternatively, if you hate to malloc / new, then you can create a size templated class instead. Though, I wouldn't really recommend it and the syntax is quite ugly.
#include <cstdio>
template<typename T, size_t SIZE>
class Class
{
private:
T _Array[sz];
public:
Class(T (&static_array)[SIZE])
{
memcpy(_Array, static_array, sizeof(T) * SIZE);
}
};
int main()
{
char int_array[32];
Class<char, sizeof(int_array)> c = Class<char, sizeof(int_array)>(int_array);
return 0;
}
Anyways, I hope this was helpful :)

I had the same problem and I solved it this way
class example
{
int *array;
example (int size)
{
array = new int[size];
}
}

Don't you understand there is not need to use vector, if one wants to use arrays it's a matter of efficiency, e.g. less space, no copy time (in such case if handled properly there is not even need to delete the array within a destructor), etc. wichever reasons one has.
the correct answer is: (quoted)
class Class
{
int* array;
Class(int x) : array(new int[x]) {};
};
Do not try to force one to use non optimal alternatives or you'll be confusing unexperienced programmers

Instead of using a raw array, why not use a vector instead.
class SomeType {
vector<int> v;
SomeType(size_t x): v(x) {}
};
Using a vector will give you automatic leak protection in the face of an exception and many other benefits over a raw array.

Like already suggested, vector is a good choice for most cases.
Alternatively, if dynamic memory allocation is to be avoided and the maximum size is known at compile time, a custom allocator can be used together with std::vector or a library like the embedded template library can be used.
See here: https://www.etlcpp.com/home.html
Example class:
#include <etl/vector.h>
class TestDummyClass {
public:
TestDummyClass(size_t vectorSize) {
if(vectorSize < MAX_SIZE) {
testVector.resize(vectorSize);
}
}
private:
static constexpr uint8_t MAX_SIZE = 20;
etl::vector<int, MAX_SIZE> testVector;
uint8_t dummyMember = 0;
};

You can't do it in C++ - use a std::vector instead:
#include <vector>
struct A {
std::vector <int> vec;
A( int size ) : vec( size ) {
}
};

Declare your array as a pointer. You can initialize it in the initializer list later through through new.
Better to use vector for unknown size.
You might want to look at this question as well on variable length arrays.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Handling Huge Multidimensional Arrays in C++ - c++

Use boost::multi_array

If you want the class created on the heap, create it with new: Child * c = new Child; and then of course delete it, or better still use a smart pointer.

I did this by putting all the data in a binary file. I calculated the offset of the data and used seek() and read() to get the data when needed. The open() call is very slow so you should leave the file open during the lifetime of the program.

Related

how to find and track objects and functions that using a dynamically allocated memory?

Why is alloca returning the same address twice?

C++ Access memory which isn't part of the object itself

Offset and pass vector reference

Determine array size in constructor initializer

Categories

Resources