cudaMalloc/cudaMemcpy with embedded objects/structures - c++

I am working on a fairly large parallel application that uses OpenMPI to distribute data among MPI processes. Using MPI together with a serialization library such as "cereal" makes it very comfortable to pass large, multi-level embedded objects around. To give a hint of what I mean by a multi-embedded structure, I am currently working with a simplified version like this:
// structures for CUDA - this is stored inside std::vector<struct_multi_data> multi_data_vector
struct struct_multi_data{
    int intended_kernel_block;
    int intended_kernel_thread;
    std::vector<float> data_float;
    std::vector<int> data_int;
    float result;
};
struct struct_unique_data{
    // this structure is shared among all blocks/threads
    float x;
    float y;
    float z;
};
class Data_object{
// functions
public:
    Data_object();
    ~Data_object();
    int resize(int multi_data_vector_len, int data_float_len, int data_int_len);
    void set_id(int id);
    int clean(void);
    int get_multi_data_len();
    int get_multi_data(struct_multi_data * data, int vector_element);
    int set_multi_data(struct_multi_data * data, int vector_element);
// variables
private:
    std::vector<struct_multi_data> multi_data_vector;
    struct_unique_data unique_data;
    int data_id;
};
* The above code is simplified; I have removed the serialization functions and some other basic stuff, but the overall structure holds.
To put it simply, I am moving around a Data_object containing a std::vector<struct_multi_data>, which is a vector of structures, where every struct_multi_data contains some std::vector<float>.
I have a good reason to embed all the data into one Data_object, as it simplifies the MPI sending and receiving.
QUESTION
Is there some comfortable way to move the Data_object to GPU memory using the cudaMalloc/cudaMemcpy functions?
There seems to be a problem with regular std::vector. I don't want to rely on the Thrust library, because I am not sure whether it would work with my MPI serialization solution.
EDIT QUESTION
Can I use managed memory for my Data_object, or cudaMallocManaged(), to make the data accessible to the GPU?
PLEASE READ
The size of the Data_object is well defined at the beginning of the program execution. None of the vectors changes size anywhere other than at the beginning of the execution. So why am I using vectors? This way I can set the vector sizes by passing parameters, instead of re-compiling the program to change the data size (as I would have to if the data were defined as fixed-size arrays).
RESPONSE TO COMMENTS
1) I think I can replace all the vectors with pointers to arrays.

No, and the extra sections in this question don't help. std::vector is just not intended to work that way: It "owns" the memory it points to, and if you mem-copy it someplace else (even in host memory) and use it from there, you'll just corrupt your memory. Also, the std::vector code can't even run on the GPU since it's not __device__-code.
What you could do is use an std::span, which doesn't own the memory, instead of the std::vector. If you do that, and the memory is managed, then mem-copying a class might work.
Note I'm completely disregarding the members other than the vector as that seems to be the main issue here.
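To make that concrete, here is a minimal sketch (not from the original answer) of one common workaround: mirror struct_multi_data with a POD struct whose buffers are allocated with cudaMallocManaged(), so the mirror itself can be mem-copied or passed to a kernel by value. The mirror type and helper function are made up for illustration, error checking is omitted, and data_int is assumed to hold ints.
#include <algorithm>
#include <vector>
#include <cuda_runtime.h>

// struct_multi_data as in the question (assuming data_int holds ints)
struct struct_multi_data {
    int intended_kernel_block;
    int intended_kernel_thread;
    std::vector<float> data_float;
    std::vector<int> data_int;
    float result;
};

// Hypothetical flattened mirror: raw pointers into managed memory instead of
// std::vector, so the struct itself is trivially copyable and GPU-readable.
struct multi_data_dev {
    int intended_kernel_block;
    int intended_kernel_thread;
    float* data_float; // data_float_len elements in managed memory
    int*   data_int;   // data_int_len elements in managed memory
    float result;
};

multi_data_dev make_managed_copy(const struct_multi_data& src)
{
    multi_data_dev dst{};
    dst.intended_kernel_block  = src.intended_kernel_block;
    dst.intended_kernel_thread = src.intended_kernel_thread;
    dst.result                 = src.result;
    cudaMallocManaged(&dst.data_float, src.data_float.size() * sizeof(float));
    cudaMallocManaged(&dst.data_int,   src.data_int.size()   * sizeof(int));
    std::copy(src.data_float.begin(), src.data_float.end(), dst.data_float);
    std::copy(src.data_int.begin(),   src.data_int.end(),   dst.data_int);
    return dst; // can be passed to a kernel by value; free the buffers with cudaFree()
}
A plain host-side std::vector<multi_data_dev> (or one more managed allocation) can then play the role of multi_data_vector.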

Related

Flexible Array Member for 2D-Array

I am currently working on a big project involving repast_hpc and MPI. I wanted to implement a two-dimensional array shared across processes, because repast_hpc itself does not seem to come with that. For that I need an array member of a class. However, I do not know the size of the array at compile time.
I need to be able to access and change the values in constant time. The code given below is my current header file, where the problem is located. How can I get an array member like the values array in C++11?
template <typename Value>
class SharedValueField {
private:
    Value[][] values;
    std::queue<ValueChangePackage<Value>> changes;
public:
    void initializeValueChange(int x, int y, Value value);
    Value getValue(int x, int y);
    void update();
};
All help appreciated. Thanks!
Tritos
I already tried using std::array. That has the same problems. I can't use std::vector, because it doesn't allow for constant-time random element value manipulation.
Using std::vector works after all. As explained by many people in the comments, only changing the size of the vector has O(n) time complexity. I only need to reassign elements, which works fine.
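Not from the original thread, but a small sketch of the usual approach: back the 2D field with one flat std::vector sized at run time and index it as y * width + x, which gives constant-time reads and writes. The names below are illustrative.
#include <cstddef>
#include <vector>

template <typename Value>
class FlatGrid {
public:
    FlatGrid(int width, int height)
        : width_(width), values_(static_cast<std::size_t>(width) * height) {}

    // Both accessors are O(1): one multiplication, one addition, one array access.
    Value get(int x, int y) const    { return values_[index(x, y)]; }
    void  set(int x, int y, Value v) { values_[index(x, y)] = v; }

private:
    std::size_t index(int x, int y) const { return static_cast<std::size_t>(y) * width_ + x; }

    int width_;
    std::vector<Value> values_;
};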

Adding member field for 3rd library type

I am dealing with a library that contains a type (let us say rectangle) as follows:
namespace some_lib{
    struct rectangle{
        int x;
        int y;
        int width;
        int height;
    };
}
I have a vector of these rectangles (a very large vector, maybe 10^9 rectangles) and I want to compute the area of each rectangle and use it in many places in the program.
I want to compute it only once, of course, so I should store it somewhere. I cannot edit the struct. I came up with this solution:
namespace my_own_program{
    struct rectangle_wrapper{
        some_lib::rectangle rect;
        int area;
        operator some_lib::rectangle() const { return rect; }
    };
}
Now I can store the area in this structure, but if I want to pass the vector to the library to do some processing on it, I have to convert the elements while copying them into another vector.
I feel this method is rubbish. I solved the problem of computing the areas, but converting every time I need the library to process the data seems horrible.
My question:
How can I achieve this in a better way?
Well, if you are stuck with using the library and you have to pass the vector many times, then I'd suggest creating a "shadow vector" that holds the area for each rectangle i at index i.
It depends on how the vector is used though. If you have a high churn rate, then this is obviously not the way to go, but then you shouldn't use a vector in the first place anyway.
You can wrap both vectors in a custom class that you use in your code to access the rectangles and also to pass the vector to the library.
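A minimal sketch of such a wrapper (not from the original answer; the class and member names are made up, and the library type is redeclared only to keep the sketch self-contained):
#include <cstddef>
#include <vector>

namespace some_lib { struct rectangle { int x; int y; int width; int height; }; }

// Hypothetical wrapper keeping the library's rectangles and a parallel
// "shadow vector" of precomputed areas, aligned by index.
class rectangle_set {
public:
    void add(const some_lib::rectangle& r) {
        rects_.push_back(r);
        areas_.push_back(r.width * r.height); // computed once, on insertion
    }

    int area(std::size_t i) const { return areas_[i]; }

    // Hand the untouched rectangle vector straight to the library - no copying, no casting.
    const std::vector<some_lib::rectangle>& rectangles() const { return rects_; }

private:
    std::vector<some_lib::rectangle> rects_;
    std::vector<int>                 areas_;
};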
I would prefer another solution:
// derive directly from the library's type:
struct extended_rectangle : public some_lib::rectangle {
    int area;
};
In this case, you can pass an extended_rectangle to any function expecting the rectangle type (which is probably passed by reference).
UPDATE: after reading Aconcagua's comment, to be more precise:
If you want to call a third-party function which receives as a parameter a vector (a contiguous memory block) holding the sequence of objects (that is, a C-style array of them), you have no solution, as Aconcagua indicated (thank you, Aconcagua, I did not think about that detail).
But if you have already accepted that your rectangle_wrapper cannot be passed as a contiguous block of plain rectangles either, then I would still prefer my extended_rectangle solution.
If you want the objects to be in a vector, you could use any solution based on a vector of pointers to objects in order to take advantage of polymorphism (you could process *v[i] as a rectangle).
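A small sketch of that pointer-based variant (not from the original answer; it reuses the extended_rectangle idea from above, with the library type redeclared only to make the sketch self-contained):
#include <memory>
#include <vector>

namespace some_lib { struct rectangle { int x; int y; int width; int height; }; }

struct extended_rectangle : public some_lib::rectangle {
    int area;
};

int main() {
    std::vector<std::unique_ptr<extended_rectangle>> v;

    auto r = std::make_unique<extended_rectangle>();
    r->x = 0; r->y = 0; r->width = 3; r->height = 4;
    r->area = r->width * r->height;
    v.push_back(std::move(r));

    // *v[0] is-a some_lib::rectangle, so it can be handed to code expecting one,
    // but the elements are no longer contiguous plain rectangles in memory.
    some_lib::rectangle& as_base = *v[0];
    (void)as_base;
}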

Dynamic multi-dimensional array

I have to create a three-dimensional array using class A as the element type. Class A is defined as below; should I use vector<vector<vector<A> > > or boost::multi_array? Which one is better?
struct C
{
    int C_1;
    short C_2;
};
class B
{
public:
    bool B_1;
    vector<C> C_;
};
class A
{
public:
    bool A_1;
    B B_[6];
};
If you know the size of all three dimensions at the time you write your code, and if you don't need array-bounds checking, then just use traditional arrays:
const int N1 = ...;
const int N2 = ...;
const int N3 = ...;
A a[N1][N2][N3];
If the array dimensions can only be determined at run time, but remain constant after program initialization, and if array usage is distributed uniformly, then boost::multi_array is your friend. However, if a lot of dynamic extension is going on at runtime, and/or if array sizes are not uniform (for example, you need A[0][0][0...99] but only A[2][3][0...3]), then the nested vector is likely the best solution. In the case of non-uniform sizes, put the dimension whose size varies the most last. Also, in the nested vector solution, it is generally a good idea to put small dimensions first.
The main concern that I would have about using vector<vector<vector<A> > > would be making sure that the second- and third-level vectors all have the same length like they would in a traditional 3D array, since there would be nothing in the data type to enforce that. I'm not terribly familiar with boost::multi_array, but it looks like this isn't an issue there - you can resize() the whole array, but unless I'm mistaken you can't accidentally remove an item from the third row and leave it a different size than all of the other rows (for example).
So assuming concerns like file size and compile time aren't much of an issue, I would think you'd want boost::multi_array. If those things are an issue, you might want to consider using a plain-old 3D array, since that should beat either of the other two options hands-down in those areas.
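Not from the original answers, but a short sketch of setting up both options with run-time sizes, to make the tradeoff concrete: boost::multi_array enforces the rectangular shape through its type, while the nested vector is rectangular only because of how it is constructed.
#include <vector>
#include <boost/multi_array.hpp>

// The element types from the question
struct C { int C_1; short C_2; };
class B { public: bool B_1; std::vector<C> C_; };
class A { public: bool A_1; B B_[6]; };

int main() {
    const int n1 = 4, n2 = 5, n3 = 6;   // in general known only at run time

    // Option 1: boost::multi_array - one allocation, shape enforced by the type.
    boost::multi_array<A, 3> ma(boost::extents[n1][n2][n3]);
    ma[1][2][3].A_1 = true;

    // Option 2: nested vectors - rectangular only because we built it that way.
    std::vector<std::vector<std::vector<A>>> nv(
        n1, std::vector<std::vector<A>>(n2, std::vector<A>(n3)));
    nv[1][2][3].A_1 = true;
}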

Flexible Array Members on iOS in Objective-C++

I am working on some core audio code and have a problem that could be solved by a variable array in a struct--a la Flexible Array Members. In doing a bit of looking around, I see that there is a lot of dialogue about the portability and viability of Flexible Member Arrays.
From what I understand, Objective-C is C99 compliant. For this reason, I think Flexible Array Members should be a fine solution. I also see that Flexible Array Members are not a good idea in C++.
What to do in Objective-C++? Technically, I won't use it in Objective-C++. I am writing callbacks that are C and C++ based... That seems like a point against.
Anyway, can I (should I) do it? If not, is there another technique with the same results?
You can always just declare a trailing array of size 1. In the worst case here, you waste a pretty small amount of memory, and it is very slightly more complicated to compute the right size for malloc.
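A minimal sketch of the trailing-array-of-size-1 trick described above (not from the original answer; the note-time payload is borrowed from the update further down, and strictly speaking indexing past times[0] is outside what standard C++ guarantees, which is part of why the next answer advises against it):
#include <cstdint>
#include <cstdlib>

struct note_start_times {
    std::size_t   count;     // number of valid entries in times[]
    std::uint64_t times[1];  // trailing array of size 1; extra elements follow in memory
};

// The sizing arithmetic: one element is already part of sizeof(note_start_times).
note_start_times* note_start_times_create(std::size_t count)
{
    void* raw = std::malloc(sizeof(note_start_times)
                            + (count - 1) * sizeof(std::uint64_t));
    auto* p = static_cast<note_start_times*>(raw);
    if (p) p->count = count;
    return p; // release with std::free()
}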
Don't bother. It's not compatible, it is messy and error prone, and C++ had solutions which are managed more easily long before this feature existed. What are you tacking onto the end of your struct? Normally, you'll just use something like a std::vector, std::array, or fixed-size array.
UPDATE
I want to have a list of note start times (uint64_t) and iterate through them to see which, if any, is playing. I was going to add a count variable to the struct to track how many items are in the flexible array.
OK, then a fixed-size array should be fine if you have fixed polyphony. You will not need more than one such array in most iOS synths. Of course, 'upcoming note' array sizes could vary based on the app: synth? sampler? sequencer? live input?
template <size_t NumNotes_>
class t_note_start_times {
public:
    static const size_t NumNotes = NumNotes_;
    typedef uint64_t t_timestamp;
    /*...*/
    const t_timestamp& timestampAt(const size_t& idx) const {
        assert(this->d_numFutureNotes <= NumNotes);
        assert(idx < NumNotes);
        assert(idx < this->d_numFutureNotes);
        return this->d_startTimes[idx];
    }
private:
    t_timestamp d_presentTime;
    size_t d_numFutureNotes; // presumably, this will be the number of active notes,
                             // and values will be compacted to [0...d_numFutureNotes)
    t_timestamp d_startTimes[NumNotes];
};
// in use
const size_t Polyphony = 16;
t_note_start_times<Polyphony> startTimes;
startTimes.addNoteAtTime(noteTimestamp); // defined in the '...' ;)
startTimes.timestampAt(0);
If you need a dynamically sized array which could be very large, then use a vector. If you need only one instance of this and the max polyphony is (say) 64, then just use this.

C++ fixed size arrays vs multiple objects of same type

I was wondering whether (apart from the obvious syntax differences) there would be any efficiency difference between having a class containing multiple instances of an object (of the same type) or a fixed size array of objects of that type.
In code:
struct A {
double x;
double y;
double z;
};
struct B {
double xvec[3];
};
In reality I would be using boost::array, which is a better C++ alternative to C-style arrays.
I am mainly concerned with construction/destruction and reading/writing such doubles, because these classes will often be constructed just to invoke one of their member functions once.
Thank you for your help/suggestions.
Typically the representation of those two structs would be exactly the same. It is, however, possible to have poor performance if you pick the wrong one for your use case.
For example, if you need to access each element in a loop, with an array you could do:
for (int i = 0; i < 3; i++)
dosomething(xvec[i]);
However, without an array, you'd either need to duplicate code:
dosomething(x);
dosomething(y);
dosomething(z);
This means code duplication - which can go either way. On the one hand there's less loop code; on the other hand very tight loops can be quite fast on modern processors, and code duplication can blow away the I-cache.
The other option is a switch:
for (int i = 0; i < 3; i++) {
    double *r;
    switch(i) {
        case 0: r = &x; break;
        case 1: r = &y; break;
        case 2: r = &z; break;
    }
    dosomething(*r); // assume this is some big inlined code
}
This avoids the possibly-large i-cache footprint, but has a huge negative performance impact. Don't do this.
On the other hand, it is, in principle, possible for array accesses to be slower, if your compiler isn't very smart:
xvec[0] = xvec[1] + 1;
dosomething(xvec[1]);
Since xvec[0] and xvec[1] are distinct, in principle, the compiler ought to be able to keep the value of xvec[1] in a register, so it doesn't have to reload the value at the next line. However, it's possible some compilers might not be smart enough to notice that xvec[0] and xvec[1] don't alias. In this case, using separate fields might be a very tiny bit faster.
In short, it's not about one or the other being fast in all cases. It's about matching the representation to how you use it.
Personally, I would suggest going with whatever makes the code working on xvec most natural. It's not worth spending a lot of human time worrying about something that, at best, will probably only produce such a small performance difference that you'll only catch it in micro-benchmarks.
MSVC++ 2010 generated exactly the same code for reading/writing two POD structs like those in your example. Since the offsets to read from/write to are computable at compile time, this is not surprising. The same goes for construction and destruction.
As for the actual performance, the general rule applies: profile it if it matters, if it doesn't - why care?
Indexing into an array member is perhaps a bit more work for the user of your struct, but then again, he can more easily iterate over the elements.
In case you can't decide and want to keep your options open, you can use an anonymous union:
#include <iostream>

struct Foo
{
    union
    {
        struct
        {
            double x;
            double y;
            double z;
        } xyz;
        double arr[3];
    };
};
int main()
{
    Foo a;
    a.xyz.x = 42;
    std::cout << a.arr[0] << std::endl;
}
Some compilers also support anonymous structs, in that case you can leave the xyz part out.
It depends. For instance, the example you gave is a classic one in favor of 'old-school' arrays: a math point/vector (or matrix)
- has a fixed number of elements
- the data itself is usually kept private in an object
- since (if?) it has a class as an interface, you can properly initialize them in the constructor (otherwise, classic array initialization is something I don't really like, syntax-wise)
In such cases (going with the math vector/matrix examples), I always ended up using C-style arrays internally, as you can loop over them instead of writing copy/pasted code for each component.
But this is a special case -- for me, in C++ nowadays arrays == STL vector, it's fast and I don't have to worry about nuthin' :)
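A tiny sketch of that pattern (not from the original answer): a fixed-size math vector that keeps its components in an internal array so member functions can loop over them, while the public interface still exposes named accessors.
#include <cmath>

class Vec3 {
public:
    Vec3(double x, double y, double z) : c_{x, y, z} {}

    double x() const { return c_[0]; }
    double y() const { return c_[1]; }
    double z() const { return c_[2]; }

    // Looping over the internal array avoids copy/pasted per-component code.
    double length() const {
        double sum = 0.0;
        for (double v : c_) sum += v * v;
        return std::sqrt(sum);
    }

private:
    double c_[3];
};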
The difference can be in how the variables are stored in memory. In the first example the compiler can add padding to align the data, but in your particular case it doesn't matter.
Raw arrays offer better cache locality than C++ arrays. As presented, however, the array example's only advantage over the multiple objects is the ability to iterate over the elements.
The real answer is, of course: create a test case and measure.