How to perform deep copying of struct with CUDA? [duplicate] - c++

This question already has answers here:
Copying a struct containing pointers to CUDA device
(3 answers)
Closed 5 years ago.
Programming with CUDA, I am facing a problem trying to copy some data from the host to the GPU.
I have 3 nested structs like these:
typedef struct {
char data[128];
short length;
} Cell;
typedef struct {
Cell* elements;
int height;
int width;
} Matrix;
typedef struct {
Matrix* tables;
int count;
} Container;
So a Container "includes" some Matrix elements, which in turn include some Cell elements.
Let's suppose I dynamically allocate the host memory in this way:
Container c;
c.tables = (Matrix*)malloc(20 * sizeof(Matrix));
for(int i = 0; i < 20; i++){
Matrix m;
m.elements = (Cell*)malloc(100 * sizeof(Cell));
c.tables[i] = m;
}
That is, a Container of 20 Matrix of 100 Cells each.
How could I now copy this data to the device memory using cudaMemcpy()?
Is there any good way to perform a deep copy of "struct of struct" from host to device?
Thanks for your time.
Andrea

The short answer is "just don't". There are four reasons why I say that:
There is no deep copy functionality in the API
The resulting code you will have to write to set up and copy the structure you have described to the GPU will be ridiculously complex (about 4000 API calls at a minimum, and probably an intermediate kernel, for your 20 Matrix of 100 Cells example)
The GPU code using three levels of pointer indirection will have massively increased memory access latency and will break what little cache coherency is available on the GPU
If you want to copy the data back to the host afterwards, you have the same problem in reverse
Consider using linear memory and indexing instead. It is portable between host and GPU, and the allocation and copy overhead is about 1% of the pointer-based alternative.
If you really want to do this, leave a comment and I will try and dig up some old code examples which show what a complete folly nested pointers are on the GPU.

Related

How can I allocate memory for a data structure that contains a vector?

If I have a struct instanceData:
struct InstanceData
{
unsigned usedInstances;
unsigned allocatedInstances;
void* buffer;
Entity* entity;
std::vector<float> *vertices;
};
And I allocate enough memory for an Entity and std::vector:
newData.buffer = malloc(size * (sizeof(Entity) + sizeof(std::vector<float>))); // Pseudo code
newData.entity = (Entity *)(newData.buffer);
newData.vertices = (std::vector<float> *)(newData.entity + size);
And then attempt to copy a vector of any size to it:
SetVertices(unsigned i, std::vector<float> vertices)
{
instanceData.vertices[i] = vertices;
}
I get an Access Violation Reading location error.
I've chopped up my code to make it concise, but it's based on Bitsquid's ECS, so just assume it works if I'm not dealing with vectors (it does). With this in mind, I'm assuming it's having issues because it doesn't know what size the vector is going to scale to. However, I thought the vectors might increase along another dimension, like this?:
Am I wrong? Either way, how can I allocate memory for a vector in a buffer like this?
And yes, I know vectors manage their own memory. That's beside the point. I'm trying to do something different.
It looks like you want InstanceData.buffer to have the actual memory space which is allocated/deallocated/accessed by other things. The entity and vertices pointers then point into this space. But by trying to use std::vector, you are mixing up two completely incompatible approaches.
1) You can do this with the language and the standard library, which means no raw pointers, no "new", no "sizeof".
struct Point {float x; float y;}; // usually this is int, not float
struct InstanceData {
Entity entity;
std::vector<Point> vertices;
};
This is the way I would recommend. If you need to output to a specific binary format for serialization, just handle that in the save method.
2) You can manage the memory internally to the class, using old-school C, which means using N*sizeof(float) for the vertices. Since this will be extremely error-prone for a new programmer (and still rough for vets), you must make all of this private to class InstanceData, and not allow any code outside InstanceData to manage it. Use unit tests. Provide public getter functions. I've done stuff like this for data structures that go across the network, or when reading/writing files with a specified format (Tiff, pgp, z39.50). But just to store in memory using difficult data structures -- no way.
Some other questions you asked:
How do I allocate memory for std::vector?
You don't. The vector allocates its own memory, and manages it. You can tell it to resize() or reserve() space, or push_back, but it will handle it. Look at http://en.cppreference.com/w/cpp/container/vector
How do I allocate memory for a vector [sic] in a buffer like this?
You seem to be thinking of an array. You're way off with your pseudo code so far, so you really need to work your way up through a tutorial. You have to allocate with "new". I could post some starter code for this, if you really need, which I would edit into the answer here.
Also, you said something about vector increasing along another dimension. Vectors are one dimensional. You can make a vector of vectors, but let's not get into that.
edit addendum:
The basic idea with a megabuffer is that you allocate all the required space in the buffer, then you initialize the values, then you use it through the getters.
The data layout is "Header, Entity1, Entity2, ..., EntityN"
// I did not check this code in a compiler, sorry, need to get to work soon
MegaBuffer::MegaBuffer() : buffer(nullptr), header(nullptr), entity(nullptr) {}
MegaBuffer::~MegaBuffer() {ReleaseBuffer();}
void MegaBuffer::AllocateBuffer(size_t size /*, whatever is needed for the header*/){
if (nullptr != buffer)
ReleaseBuffer();
size_t total_bytes = sizeof(Header) + size * sizeof(Entity);
buffer = new unsigned char [total_bytes];
header = reinterpret_cast<Header*>(buffer);
// need to set up the header
header->count = 0;
header->allocated = size;
// set up internal pointer
entity = reinterpret_cast<Entity*>(buffer + sizeof(Header));
}
void MegaBuffer::ReleaseBuffer(){
delete [] buffer;
buffer = nullptr;
}
Entity* MegaBuffer::operator[](int n) {return &entity[n];}
The header is always a fixed size, and appears exactly once, and tells you how many entities you have. In your case there's no header because you are using member variables "usedInstances" and "allocatedInstances" instead. So you do sort of have a header but it is not part of the allocated buffer. But you don't want to allocate 0 bytes, so just set usedInstances=0; allocatedInstances=0; buffer=nullptr;
I did not code for changing the size of the buffer, because the bitsquid ECS example covers that, but he doesn't show the first-time initialization. Make sure you initialize count and allocated, and assign meaningful values to each entity before you use them.
You are not doing the bitsquid ECS the same as the link you posted. In that, he has several different objects of fixed size in parallel arrays. There is an entity, its mass, its position, etc. So entity[4] is an entity which has mass equal to "mass[4]" and its acceleration is "acceleration[4]". This uses pointer arithmetic to access array elements. (built-in array, NOT std::array, NOT std::vector)
The data layout is "Entity1, Entity2, ..., EntityN, mass1, mass2, ..., massN, position1, position2, ..., positionN, velocity1 ... " you get the idea.
If you read the article, you'll notice he says basically the same thing everyone else said about the standard library. You can use an std container to store each of these arrays, OR you can allocate one megabuffer and use pointers and "built-in array" math to get to the exact memory location within that buffer for each item. In the classic faux-pas, he even says "This avoids any hidden overheads that might exist in the Array class and we only have a single allocation to keep track of." But you don't know if this is faster or slower than std::array, and you're introducing a lot of bugs and extra development time dealing with raw pointers.
I think I see what you are trying to do.
There are numerous issues. First: you are making a buffer of raw data and telling C++ that a vector-sized piece of it is a vector, but at no time do you actually call the constructor of std::vector, which would initialize the pointers and bookkeeping inside it to viable values.
This has already been answered here: Call a constructor on a already allocated memory
The second issue is the line
instanceData.vertices[i] = vertices;
instanceData.vertices is a pointer to a Vector, so you actually need to write
(*(instanceData.vertices))[i]
The third issue is that the contents of *(instanceData.vertices) are floats, and not Vector, so you should not be able to do the assignment there.

Vector Object Invetory, Object that can store other Object types?

I'm trying to create a Inventory system that can hold any object
for example
struct Ore{
string name;
int Size;
};
struct Wood{
string name;
int size;
int color;
};
My idea is to create a struct with 2 vectors: one for numeric values, like items with Attack, Defense and such, and the other vector for the name, description, or other text.
With multiple constructors for different item types.
The problem I have with this is I've heard vectors can take up more memory, and I expect this program to create hundreds or thousands of items.
So I was looking for any suggestions for better memory storage.
struct Invetory{
vector<float> Number;
vector<string> Word;
Invetory(string n,float a)
{Word.push_back(n); Number.push_back(a);}
Invetory(string n,float a, float b)
{Word.push_back(n); Number.push_back(a); Number.push_back(b);}
};
vector<Invetory>Bag_Space;
You are trying to optimize too early.
Go with whatever is the cleanest thing to use. vectors are not an insane choice. (Using arrays or std::vectors in C++, what's the performance gap?)
Deal with a performance issue if/when it arises.
Checkout the following discussions on premature optimizations.
When is optimisation premature?
https://softwareengineering.stackexchange.com/questions/80084/is-premature-optimization-really-the-root-of-all-evil
BTW I stumbled upon an interesting discussion on potential performance issues with vectors. In summary, it says that if your vectors shrink, the memory footprint won't shrink with the vector's size unless you use the swap trick.
And if you are making a lot of vectors and don't need their elements initialized to 0s, then instead of
vector<int> bigarray(N);
try
vector<int> bigarray;
bigarray.reserve(N);

C++ - Performance of vector of pointer to objects, vs performance of objects

In this case the question scenario is a game, so all resources are allocated at the beginning then iterated over for a level.
The objects being stored in the vector are instances of complex classes; actually copying them into the vector at load time is of course time-consuming, but of low concern.
But if my main concern is the speed of iteration over the class objects at runtime, would I be better to store the class objects themselves in the vector, rather than just pointers to the class objects as is traditionally recommended?
I am not worried about memory management in this example, only speed of iteration.
I'm answering this question late, but the performance aspect is important and the answers online so far have been purely theoretical and/or focusing exclusively on the memory-management aspects. So here is some actual benchmarking info on three related scenarios I recently tried. Your results may be different but at least there's some idea of how things pan out in a practical application.
The class A referenced here has about 10 member fields, half of which are primitives and the other half are std::string, std::vector<int>, and other dynamically sized containers. The application has already been fairly optimized and thus we would like to see which architecture now gives us the fastest looping over the collection of A. The values of any A object's member fields may change over the application lifetime, but the number of A objects in the vector does not change over the many repeated iterations we perform (this continual iterating constitutes about 95% of this application's execution time). In all scenarios, looping was performed with the container's typical iterator or const_iterator. Each enumerated A object has at least several member fields accessed.
Scenario 1 — Vector Of Object Pointers
Although the simplest, this architecture of std::vector<A*> ended up being slightly slower than the others.
Scenario 2 — Vector Of Object Pointers, Objects Are Allocated Using Placement New
The idea behind this approach is that we can improve the locality of caching by forcing our objects to be allocated into contiguous memory space. So the std::vector<A*> of object pointers is guaranteed to be contiguous by the std::vector implementation and the A objects themselves will also be contiguous on the heap because we've used the placement new idiom. I used the same approach outlined in this answer; more info on placement new can be found here.
This scenario was 2.7% faster than Scenario 1.
Scenario 3 — Vector Of Objects
Here we use std::vector<A> directly. The std::vector implementation guarantees our A objects will be contiguous in memory. Note that a std::vector of objects does involve considerations of the move and copy constructors of A. To avoid unnecessary moving and/or reconstruction, it is best to std::vector.reserve() the maximum possibly needed size in advance (if possible) and then use std::vector.emplace_back() (instead of push_back()) if at all possible. Looping over this structure was the fastest because we are able to eliminate one level of pointer indirection.
This approach was 6.4% faster than Scenario 1.
A related answer to a different question also shows that plain objects (as class members) can be considerably faster than the respective pointers (as class members).
No, she is not wrong, she is absolutely right. Though you are asking only about fast iteration, iteration speed is tightly linked with memory layout: the more pointer indirection there is, the slower the access will be.
I have a live demo...
#include <iostream>
#include <string>
#include <vector>
#include "CHRTimer.h"
struct Items
{
std::string name;
int id;
float value;
float quantity;
};
int main()
{
std::vector<Items> vecItems1;
for(int i = 0; i < 10000; i++)
{
Items newItem;
newItem.name = "Testing";
newItem.id = i + 1;
newItem.value = 10.00;
newItem.quantity = 1.00;
vecItems1.push_back(newItem);
}
CHRTimer g_timer;
g_timer.Reset();
g_timer.Start();
for(int i = 0; i < 10000; i++)
{
Items currentItem = vecItems1[i];
}
g_timer.Stop();
float elapsedTime1 = g_timer.GetElapsedSeconds();
std::cout << "Time Taken to load Info from Vector of 10000 Objects -> " << elapsedTime1 << std::endl;
std::vector<Items*> vecItems;
for(int i = 0; i < 100000; i++)
{
Items *newItem = new Items();
newItem->name = "Testing";
newItem->id = i + 1;
newItem->value = 10.00;
newItem->quantity = 1.00;
vecItems.push_back(newItem);
}
g_timer.Reset();
g_timer.Start();
for(int i = 0; i < 100000; i++)
{
Items *currentItem = vecItems[i];
}
g_timer.Stop();
float elapsedTime = g_timer.GetElapsedSeconds();
std::cout << "\nTime Taken to load Info from Vector of 100000 pointers of Objects -> " << elapsedTime;
}
The first thing is that pointers should be used for storing bulky objects, because with an array of objects you would be creating n bulky objects and copying each one every time it is stored (which is also a big cost).
The second thing is that if you are using STL vectors, the vector's capacity grows every time it runs out of memory.
The main cost of that growth is copying the existing data into the new allocation; that copying really is the dominant expense.
Even so, this cost is minimal compared with what you would incur managing the same growth by hand with built-in arrays.

creating large 2d array of size int arr[1000000][1000000]

I want to create a two-dimensional integer array of size 10⁶ × 10⁶ elements. For this I'm using the boost library:
boost::multi_array<int, 2> x(boost::extents[1000000][1000000]);
But it throws the following exception:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Please tell me how to solve the problem.
You seriously don't want to allocate an array that huge. It's about 4 terabytes in memory.
Depending on what you want to do with that array you should consider two options:
External data structure. The array will be stored on a hard drive. The most recently accessed parts are also kept in RAM, so depending on your access pattern it can be pretty fast, though of course never as fast as if it were fully in RAM. Have a look at STXXL for external data structures.
This method has the advantage that you can access all of the elements in the array (in contrast to the second method as you'll see). However, the problem still remains: 4 terabytes are very huge even on a hard drive, at least if you are talking about a general desktop application.
Sparse data structure. If you only actually need a couple of items from that array, but you want to address these items in a space of size 10⁶ ⨯ 10⁶, don't use an array but something like a map or a combination of both: Allocate the array in "blocks" of, let's say 1024 x 1024 elements. Put these blocks into a map while referencing the block index (coordinate divided by 1024) as the key in the map.
This method has the advantage that you don't have to link against another library, since it can be written easily by yourself. However, it has the disadvantage that if you access elements distributed over the whole coordinate space of 10⁶ ⨯ 10⁶ or even need all of the values, it also uses around 4 TB (even a bit more) of memory. It only works if you actually access only a small part of this huge "virtual" array.
The following (untested) C++ code should demonstrate this:
class Sparse2DArray
{
struct Coord {
int x, y;
Coord(int x, int y) : x(x), y(y) {}
bool operator<(const Coord &o) const { return x < o.x || (x == o.x && y < o.y); } // required for std::map
};
static const int BLOCKSIZE = 1024;
std::map<Coord, std::array<std::array<int,BLOCKSIZE>,BLOCKSIZE>> blocks;
static Coord block(Coord c) {
return Coord(c.x / BLOCKSIZE, c.y / BLOCKSIZE);
}
static Coord blockSubCoord(Coord c) {
return Coord(c.x % BLOCKSIZE, c.y % BLOCKSIZE);
}
public:
int & operator()(int x, int y) { // operator[] cannot take two arguments before C++23
Coord c(x, y);
Coord b = block(c);
Coord s = blockSubCoord(c);
return blocks[b][s.x][s.y];
}
};
Instead of a std::map you can also use a std::unordered_map (hash map) but have to define a hash function instead of operator< for the Coord type (or use std::pair instead).
When you create an array the way the title shows (int arr[1000000][1000000]), it is created on the stack, and the stack has a limited size. Therefore, your program crashes because it doesn't have enough room to allocate that big of an array.
There are two ways you can solve this: you can create the array on the heap using the new keyword, but then you have to delete[] it afterward or else you have a memory leak; also be careful, because while the heap is larger than the stack, it is still finite.
The other way is for you to use std::vector inside std::vector and let it handle the memory for you.
What do you intend by creating a 10⁶ × 10⁶ matrix? If you're trying to create a sparse matrix (i.e. a diffusion matrix for a heat transfer problem with 10⁶ finite elements), then you should look at using an existing linear algebra library. For example, the Trilinos project has support for solving large sparse matrices like the one you may be trying to create.

What's the proper way to declare and initialize a (large) two dimensional object array in c++?

I need to create a large two dimensional array of objects. I've read some related questions on this site and others regarding multi_array, matrix, vector, etc, but haven't been able to put it together. If you recommend using one of those, please go ahead and translate the code below.
Some considerations:
The array is somewhat large (1300 x 1372).
I might be working with more than one of these at a time.
I'll have to pass it to a function at some point.
Speed is a large factor.
The two approaches that I thought of were:
Pixel pixelArray[1300][1372];
for(int i=0; i<1300; i++) {
for(int j=0; j<1372; j++) {
pixelArray[i][j].setOn(true);
...
}
}
and
Pixel* pixelArray[1300][1372];
for(int i=0; i<1300; i++) {
for(int j=0; j<1372; j++) {
pixelArray[i][j] = new Pixel();
pixelArray[i][j]->setOn(true);
...
}
}
What's the right approach/syntax here?
Edit:
Several answers have assumed Pixel is small - I left out details about Pixel for convenience, but it's not small/trivial. It has ~20 data members and ~16 member functions.
Your first approach allocates everything on stack, which is otherwise fine, but leads to stack overflow when you try to allocate too much stack. The limit is usually around 8 megabytes on modern OSes, so that allocating arrays of 1300 * 1372 elements on stack is not an option.
Your second approach allocates 1300 * 1372 elements on the heap, which is a tremendous load for the allocator, which holds multiple linked lists to chunks of allocated and free memory. Also a bad idea, especially since Pixel seems to be rather small.
What I would do is this:
Pixel* pixelArray = new Pixel[1300 * 1372];
for(int i=0; i<1300; i++) {
for(int j=0; j<1372; j++) {
pixelArray[i * 1372 + j].setOn(true);
...
}
}
This way you allocate one large chunk of memory on heap. Stack is happy and so is the heap allocator.
If you want to pass it to a function, I'd vote against using simple arrays. Consider:
void doWork(Pixel array[][]); // illegal: only the first dimension may be left unsized
This does not contain any size information (and does not even compile as written, since the inner dimension must be specified). You could pass the size info via separate arguments, but I'd rather use something like std::vector<Pixel>. Of course, this requires that you define an addressing convention (row-major or column-major).
An alternative is std::vector<std::vector<Pixel> >, where each level of vectors is one array dimension. Advantage: The double subscript like in pixelArray[x][y] works, but the creation of such a structure is tedious, copying is more expensive because it happens per contained vector instance instead of with a simple memcpy, and the vectors contained in the top-level vector must not necessarily have the same size.
These are basically your options using the Standard Library. The right solution would be something like std::vector with two dimensions. Numerical libraries and image manipulation libraries come to mind, but matrix and image classes are most likely limited to primitive data types in their elements.
EDIT: Forgot to make it clear that everything above is only arguments. In the end, your personal taste and the context will have to be taken into account. If you're on your own in the project, vector plus defined and documented addressing convention should be good enough. But if you're in a team, and it's likely that someone will disregard the documented convention, the cascaded vector-in-vector structure is probably better because the tedious parts can be implemented by helper functions.
I'm not sure how complicated your Pixel data type is, but maybe something like this will work for you?:
std::fill(array, array+100, 42); // sets every value in the array to 42
Reference:
Initialization of a normal array with one default value
Check out Boost's Generic Image Library.
gray8_image_t pixelArray;
pixelArray.recreate(1300,1372);
gray8_view_t v = view(pixelArray); // GIL images are iterated through views
for(gray8_view_t::iterator pIt = v.begin(); pIt != v.end(); pIt++) {
*pIt = 1;
}
My personal preference would be to use std::vector
typedef std::vector<Pixel> PixelRow;
typedef std::vector<PixelRow> PixelMatrix;
PixelMatrix pixelArray(1300, PixelRow(1372, Pixel(true)));
// ^^^^ ^^^^ ^^^^^^^^^^^
// Size 1 Size 2 default Value
While I wouldn't necessarily make this a struct, this demonstrates how I would approach storing and accessing the data. If Pixel is rather large, you may want to use a std::deque instead.
struct Pixel2D {
Pixel2D (size_t rsz_, size_t csz_) : data(rsz_*csz_), rsz(rsz_), csz(csz_) {
for (size_t r = 0; r < rsz; r++)
for (size_t c = 0; c < csz; c++)
at(r, c).setOn(true);
}
Pixel &at(size_t row, size_t col) {return data.at(row*csz+col);}
std::vector<Pixel> data;
size_t rsz;
size_t csz;
};