CUDA C++, modifying a function so that it runs from device? [duplicate]

CUDA C++, modifying a function so that it runs from device? [duplicate] - c++

The question is that: is there a way to use the class "vector" in Cuda kernels? When I try I get the following error:
error : calling a host function("std::vector<int, std::allocator<int> > ::push_back") from a __device__/__global__ function not allowed
So there a way to use a vector in global section?
I recently tried the following:
create a new Cuda project
go to properties of the project
open Cuda C/C++
go to Device
change the value in "Code Generation" to be set to this value:
compute_20,sm_20
........ after that I was able to use the printf standard library function in my Cuda kernel.
is there a way to use the standard library class vector in the way printf is supported in kernel code? This is an example of using printf in kernel code:
// this code only to count the 3s in an array using Cuda
//private_count is an array to hold every thread's result separately
__global__ void countKernel(int *a, int length, int* private_count)
{
printf("%d\n",threadIdx.x); //it's print the thread id and it's working
// vector<int> y;
//y.push_back(0); is there a possibility to do this?
unsigned int offset = threadIdx.x * length;
int i = offset;
for( ; i < offset + length; i++)
{
if(a[i] == 3)
{
private_count[threadIdx.x]++;
printf("%d ",a[i]);
}
}
}

You can't use the STL in CUDA, but you may be able to use the Thrust library to do what you want. Otherwise just copy the contents of the vector to the device and operate on it normally.

In the cuda library thrust, you can use thrust::device_vector<classT> to define a vector on device, and the data transfer between host STL vector and device vector is very straightforward. you can refer to this useful link:http://docs.nvidia.com/cuda/thrust/index.html to find some useful examples.

you can't use std::vector in device code, you should use array instead.

I think you can implement a device vector by youself, because CUDA supports dynamic memory alloction in device codes. Operator new/delete are also supported. Here is an extremely simple prototype of device vector in CUDA, but it does work. It hasn't been tested sufficiently.
template<typename T>
class LocalVector
{
private:
T* m_begin;
T* m_end;
size_t capacity;
size_t length;
__device__ void expand() {
capacity *= 2;
size_t tempLength = (m_end - m_begin);
T* tempBegin = new T[capacity];
memcpy(tempBegin, m_begin, tempLength * sizeof(T));
delete[] m_begin;
m_begin = tempBegin;
m_end = m_begin + tempLength;
length = static_cast<size_t>(m_end - m_begin);
}
public:
__device__ explicit LocalVector() : length(0), capacity(16) {
m_begin = new T[capacity];
m_end = m_begin;
}
__device__ T& operator[] (unsigned int index) {
return *(m_begin + index);//*(begin+index)
}
__device__ T* begin() {
return m_begin;
}
__device__ T* end() {
return m_end;
}
__device__ ~LocalVector()
{
delete[] m_begin;
m_begin = nullptr;
}
__device__ void add(T t) {
if ((m_end - m_begin) >= capacity) {
expand();
}
new (m_end) T(t);
m_end++;
length++;
}
__device__ T pop() {
T endElement = (*m_end);
delete m_end;
m_end--;
return endElement;
}
__device__ size_t getSize() {
return length;
}
};

You can't use std::vector in device-side code. Why?
It's not marked to allow this
The "formal" reason is that, to use code in your device-side function or kernel, that code itself has to be in a __device__ function; and the code in the standard library, including, std::vector is not. (There's an exception for constexpr code; and in C++20, std::vector does have constexpr methods, but CUDA does not support C++20 at the moment, plus, that constexprness is effectively limited.)
You probably don't really want to
The std::vector class uses allocators to obtain more memory when it needs to grow the storage for the vectors you create or add into. By default (i.e. if you use std::vector<T> for some T) - that allocation is on the heap. While this could be adapted to the GPU - it would be quite slow, and incredibly slow if each "CUDA thread" would dynamically allocate its own memory.
#Now, you could say "But I don't want to allocate memory, I just want to read from the vector!" - well, in that case, you don't need a vector per se. Just copy the data to some on-device buffer, and either pass a pointer and a size, or use a CUDA-capable span, like in cuda-kat. Another option, though a bit "heavier", is to use the [NVIDIA thrust library]'s 3 "device vector" class. Under the hood, it's quite different from the standard library vector though.

Related

Is it concurrency-safe to call concurrency::concurrent_vector::push_back while iterating over that concurrent_vector in other thread?

push_back, begin, end are described as concurrent safe in
https://learn.microsoft.com/en-us/cpp/parallel/concrt/reference/concurrent-vector-class?view=vs-2019#push_back
However the below code is asserting. Probably because element is added but not initialized yet.
struct MyData
{
explicit MyData()
{
memset(arr, 0xA5, sizeof arr);
}
std::uint8_t arr[1024];
};
struct MyVec
{
concurrency::concurrent_vector<MyData> v;
};
auto vector_pushback(MyVec &vec) -> void
{
vec.v.push_back(MyData{});
}
auto vector_loop(MyVec &vec) -> void
{
MyData myData;
for (auto it = vec.v.begin(); it != vec.v.end(); ++it)
{
auto res = memcmp(&(it->arr), &(myData.arr), sizeof myData.arr);
assert(res == 0);
}
}
int main()
{
auto vec = MyVec{};
auto th_vec = std::vector<std::thread>{};
for (int i = 0; i < 1000; ++i)
{
th_vec.emplace_back(vector_pushback, std::ref(vec));
th_vec.emplace_back(vector_loop, std::ref(vec));
}
for(auto &th : th_vec)
th.join();
return 0;
}

According to the docs, it should be safe to append to a concurrency::concurrent_vector while iterating over it because the elements are not actually stored contiguously in memory like std::vector:
A concurrent_vector object does not relocate its elements when you append to it or resize it. This enables existing pointers and iterators to remain valid during concurrent operations.
However, looking at the actual implementation of push_back in VS2017, I see the following, which I don't think is thread-safe:
iterator push_back( _Ty &&_Item )
{
size_type _K;
void *_Ptr = _Internal_push_back(sizeof(_Ty), _K);
new (_Ptr) _Ty( std::move(_Item));
return iterator(*this, _K, _Ptr);
}
I have to speculate on _Internal_push_back here, but I'd wager it allocates raw memory for storing the item (and points the last element towards this new node) so that the next line can use emplacement new. I'd imagine that _Internal_push_backis internally thread-safe, however I don't see any synchronization happening before the emplacement new. Meaning the following is possible:
memory is obtained and the node is "present" (yet emplacement new hasn't happend)
the looping thread encounters this node and performs memcmp to discover that they're not equal
emplacement new happens.
There's definitely a race condition here. I can spontaneously reproduce the problem, moreso the more threads I use.
I recommend that you open a ticket with Microsoft support on this one.

Placement New on already existing object in SharedMemory

I have two programs. The first allocates a Shared-Memory file and the second reads from it.. I am using placement-new to place objects into this memory guaranteeing that the objects do NOT use new or allocate any memory outside of the Shared-Memory file.
My Array structure:
template<typename T, size_t Size>
struct SHMArray {
SHMArray() : ptr(elements) {}
SHMArray(const SHMArray& other) { std::copy(other.begin(), other.end(), begin()); }
SHMArray(SHMArray&& other)
{
std::swap(other.ptr, ptr);
std::fill_n(ptr.get(), Size, T());
}
~SHMArray()
{
std::fill_n(ptr.get(), Size, T());
}
constexpr bool empty() const noexcept
{
return Size == 0;
}
constexpr size_type size() const noexcept
{
return Size;
}
T& operator[](std::size_t pos)
{
return *(ptr.get() + pos);
}
constexpr const T& operator[](std::size_t pos) const
{
return *(ptr.get() + pos);
}
T* data() noexcept
{
return ptr.get();
}
constexpr const T* data() const noexcept
{
return ptr.get();
}
private:
offset_ptr<T> ptr;
T elements[];
};
Program 1:
int main()
{
//Allocate a shared memory file of 1mb..
auto memory_map = SharedMemoryFile("./memory.map", 1024 * 1024, std::ios::in | std::ios::out);
memory_map.lock();
//Pointer to the shared memory.
void* data = memory_map.data();
//Place the object in the memory..
SHMArray<int, 3>* array = ::new(data) SHMArray<int, 3>();
(*array)[0] = 500;
(*array)[1] = 300;
(*array)[2] = 200;
memory_map.unlock(); //signals other program it's okay to read..
}
Program 2:
int main()
{
//Open the file..
auto memory_map = SharedMemoryFile("./memory.map", 1024 * 1024, std::ios::in | std::ios::out);
memory_map.lock();
//Pointer to the shared memory.
void* data = memory_map.data();
//Place the object in the memory..
//I already understand that I could just cast the `data` to an SHMArray..
SHMArray<int, 3>* array = ::new(data) SHMArray<int, 3>();
for (int i = 0; i < array.size(); ++i)
{
std::cout<<(*array)[i]<<"\n";
}
memory_map.unlock(); //signals other program it's okay to read..
}
Program One placed the SHMArray in memory with placement new. Program Two does the same thing on top of program one's already placed object (overwriting it). Is this undefined behaviour? I don't think it is but I want to confirm.
Neither program calls the destructor array->~SHMVEC(); I also don't think this leaks as long as I close the MemoryMapped file then it should all be fine.. but I want to make sure this is fine. If I ran the programs again on the same file, it shouldn't be a problem.
I am essentially making the assumption that placement new is working as if I placed a C struct in memory in this particular scenario via: struct SHMArray* array = (struct SHMArray*)data;.. Is this correct?

I am essentially making the assumption that placement new is working
as if I placed a C struct in memory in this particular scenario via:
struct SHMArray* array = (struct SHMArray*)data;.. Is this correct?
No, this is not correct. Placement new also invokes the object's appropriate constructor. "struct SHMArray* array = (struct SHMArray*)data;" does not invoke any object's constructor. It's just a pointer conversion cast. Which does not invoke anyone's constructor. Key difference.
In your sample code, you do actually want to invoke the templated object's constructor. Although the shown example has other issues, as already mentioned in the comments, this does appear to be what needs to be done in this particular situation.
But insofar as the equivalent of placement new versus a pointer cast, no they're not the same. One invokes a constructor, one does not. new always invokes the constructor, whether it's placement new, or not. This is a very important detail, that's not to be overlooked.

how to allocate memory for arrays of structure of arrays

So I have a struct as shown below, I would like to create an array of that structure and allocate memory for it (using malloc).
typedef struct {
float *Dxx;
float *Dxy;
float *Dyy;
} Hessian;
My first instinct was to allocate memory for the whole structure, but then, I believe the internal arrays (Dxx, Dxy, Dyy) won't be assigned. If I assign internal arrays one by one, then the structure of arrays would be undefined. Now I think I should assign memory for internal arrays and then for the structure array, but it seems just wrong to me. How should I solve this issue?
I require a logic for using malloc in this situation instead of new / delete because I have to do this in cuda and memory allocation in cuda is done using cudaMalloc, which is somewhat similar to malloc.

In C++ you should not use malloc at all and instead use new and delete if actually necessary. From the information you've provided it is not, because in C++ you also rather use std::vector (or std::array) over C-style-arrays. Also the typedef is not needed.
So I'd suggest rewriting your struct to use vectors and then generate a vector of this struct, i.e.:
struct Hessian {
std::vector<float> Dxx;
std::vector<float> Dxy;
std::vector<float> Dyy;
};
std::vector<Hessian> hessianArray(2); // vector containing two instances of your struct
hessianArray[0].Dxx.push_back(1.0); // example accessing the members
Using vectors you do not have to worry about allocation most of the time, since the class handles that for you. Every Hessian contained in hessianArray is automatically allocated for you, stored on the heap and destroyed when hessianArray goes out of scope.

It seems like problem which could be solved using STL container. Regarding the fact you won't know sizes of arrays you may use std::vector.
It's less error-prone, easier to maintain/work with and standard containers free their resources them self (RAII). #muXXmit2X already shown how to use them.
But if you have/want to use dynamic allocation, you have to first allocate space for array of X structures
Hessian *h = new Hessian[X];
Then allocate space for all arrays in all structures
for (int i = 0; i < X; i++)
{
h[i].Dxx = new float[Y];
// Same for Dxy & Dyy
}
Now you can access and modify them. Also dont forget to free resources
for (int i = 0; i < X; i++)
{
delete[] h[i].Dxx;
// Same for Dxy & Dyy
}
delete[] h;
You should never use malloc in c++.
Why?
new will ensure that your type will have their constructor called. While malloc will not call constructor. The new keyword is also more type safe whereas malloc is not typesafe at all.

As other answers point out, the use of malloc (or even new) should be avoided in c++. Anyway, as you requested:
I require a logic for using malloc in this situation instead of new / delete because I have to do this in cuda...
In this case you have to allocate memory for the Hessian instances first, then iterate throug them and allocate memory for each Dxx, Dxy and Dyy. I would create a function for this like follows:
Hessian* create(size_t length) {
Hessian* obj = (Hessian*)malloc(length * sizeof(Hessian));
for(size_t i = 0; i < length; ++i) {
obj[i].Dxx = (float*)malloc(sizeof(float));
obj[i].Dxy = (float*)malloc(sizeof(float));
obj[i].Dyy = (float*)malloc(sizeof(float));
}
return obj;
}
To deallocate the memory you allocated with create function above, you have to iterate through Hessian instances and deallocate each Dxx, Dxy and Dyy first, then deallocate the block which stores the Hessian instances:
void destroy(Hessian* obj, size_t length) {
for(size_t i = 0; i < length; ++i) {
free(obj[i].Dxx);
free(obj[i].Dxy);
free(obj[i].Dyy);
}
free(obj);
}
Note: using the presented method will pass the responsibility of preventing memory leaks to you.
If you wish to use the std::vector instead of manual allocation and deallocation (which is highly recommended), you can write a custom allocator for it to use cudaMalloc and cudaFree like follows:
template<typename T> struct cuda_allocator {
using value_type = T;
cuda_allocator() = default;
template<typename U> cuda_allocator(const cuda_allocator<U>&) {
}
T* allocate(std::size_t count) {
if(count <= max_size()) {
void* raw_ptr = nullptr;
if(cudaMalloc(&raw_ptr, count * sizeof(T)) == cudaSuccess)
return static_cast<T*>(raw_ptr);
}
throw std::bad_alloc();
}
void deallocate(T* raw_ptr, std::size_t) {
cudaFree(raw_ptr);
}
static std::size_t max_size() {
return std::numeric_limits<std::size_t>::max() / sizeof(T);
}
};
template<typename T, typename U>
inline bool operator==(const cuda_allocator<T>&, const cuda_allocator<U>&) {
return true;
}
template<typename T, typename U>
inline bool operator!=(const cuda_allocator<T>& a, const cuda_allocator<U>& b) {
return !(a == b);
}
The usage of an custom allocator is very simple, you just have to specify it as second template parameter of std::vector:
struct Hessian {
std::vector<float, cuda_allocator<float>> Dxx;
std::vector<float, cuda_allocator<float>> Dxy;
std::vector<float, cuda_allocator<float>> Dyy;
};
/* ... */
std::vector<Hessian, cuda_allocator<Hessian>> hessian;

wondering if I can use stl smart pointers for this

I currently have a c++ class as follows:
template<class T>
class MyQueue {
T** m_pBuffer;
unsigned int m_uSize;
unsigned int m_uPendingCount;
unsigned int m_uAvailableIdx;
unsigned int m_uPendingndex;
public:
MyQueue(): m_pBuffer(NULL), m_uSize(0), m_uPendingCount(0), m_uAvailableIdx(0),
m_uPendingndex(0)
{
}
~MyQueue()
{
delete[] m_pBuffer;
}
bool Initialize(T *pItems, unsigned int uSize)
{
m_uSize = uSize;
m_uPendingCount = 0;
m_uAvailableIdx = 0;
m_uPendingndex = 0;
m_pBuffer = new T *[m_uSize];
for (unsigned int i = 0; i < m_uSize; i++)
{
m_pBuffer[i] = &pItems[i];
}
return true;
}
};
So, I have this pointer to arrays m_pBuffer object and I was wondering if it is possible to replace this way of doing things with the c++ smart pointer perhaps? I know I can do things like:
std::unique_ptr<T> buffer(new T[size]);
Is using a vector of smart pointers the way to go? Is this recommended and safe?
[EDIT]
Based on the answers and the comments, I have tried to make a thread-safe buffer array. Here it is. Please comment.
#ifndef __BUFFER_ARRAY_H__
#define __BUFFER_ARRAY_H__
#include <memory>
#include <vector>
#include <mutex>
#include <thread>
#include "macros.h"
template<class T>
class BufferArray
{
public:
class BufferArray()
:num_pending_items(0), pending_index(0), available_index(0)
{}
// This method is not thread-safe.
// Add an item to our buffer list
void add(T * buffer)
{
buffer_array.push_back(std::unique_ptr<T>(buffer));
}
// Returns a naked pointer to an available buffer. Should not be
// deleted by the caller.
T * get_available()
{
std::lock_guard<std::mutex> lock(buffer_array_mutex);
if (num_pending_items == buffer_array.size()) {
return NULL;
}
T * buffer = buffer_array[available_index].get();
// Update the indexes.
available_index = (available_index + 1) % buffer_array.size();
num_pending_items += 1;
return buffer;
}
T * get_pending()
{
std::lock_guard<std::mutex> lock(buffer_array_mutex);
if (num_pending_items == 0) {
return NULL;
}
T * buffer = buffer_array[pending_index].get();
pending_index = (pending_index + 1) % buffer_array.size();
num_pending_items -= 1;
}
private:
std::vector<std::unique_ptr<T> > buffer_array;
std::mutex buffer_array_mutex;
unsigned int num_pending_items;
unsigned int pending_index;
unsigned int available_index;
// No copy semantics
BufferArray(const BufferArray &) = delete;
void operator=(const BufferArray &) = delete;
};
#endif

Vector of smart pointers is good idea. It is safe enough inside your class - automatic memory deallocation is provided.
It is not thread-safe though, and it's not safe in regard of handling external memory given to you by simple pointers.
Note that you current implementation does not delete pItems memory in destructor, so if after refactoring you mimic this class, you should not use vector of smart pointers as they will delete memory referenced by their pointers.
On the other side you cannot garantee that noone outside will not deallocate memory for pItems supplied to your Initialize. IF you want to use vector of smart pointers, you should formulate contract for this function that clearly states that your class claims this memory etc. - and then you should rework outside code that calls your class to fit into new contract.
If you don't want to change memory handling, vector of simple pointers is the way to go. Nevertheless, this piece of code is so simple, that there is no real benefit of vector.
Note that overhead here is creation of smart pointer class for each buffer and creation of vector class. Reallocation of vector can take up more memory and happens without your direct control.

The code has two issues:
1) Violation of the rule of zero/three/five:
To fix that you do not need a smart pointer here. To represent a dynamic array with variable size use a std:vector<T*>. That allows you to drop m_pBuffer, m_uSize and the destructor, too.
2) Taking the addresses of elements of a possible local array
In Initialize you take the addresses of the elements of the array pItems passed as argument to the function. Hence the queue does not take ownership of the elements. It seems the queue is a utility class, which should not be copyable at all:
template<class T>
class MyQueue
{
std::vector<T*> m_buffer;
unsigned int m_uPendingCount;
unsigned int m_uAvailableIdx;
unsigned int m_uPendingndex;
public:
MyQueue(T *pItems, unsigned int uSize)
: m_buffer(uSize, nullptr), m_uPendingCount(0), m_uAvailableIdx(0), m_uPendingndex(0)
{
for (unsigned int i = 0; i < uSize; i++)
{
m_buffer[i] = &pItems[i];
}
}
private:
MyQueue(const MyQueue&); // no copy (C++11 use: = delete)
MyQueue& operator = (const MyQueue&); // no copy (C++11 use: = delete)
};
Note:
The red herring is the local array.
You may consider a smart pointer for that, but that is another question.

Use constructor to allocate memory

I have a class that contains several arrays whose sizes can be determined by parameters to its constructor. My problem is that instances of this class have sizes that can't be determined at compile time, and I don't know how to tell a new method at run time how big I need my object to be. Each object will be of a fixed size, but different instances may be different sizes.
There are several ways around the problem:- use a factory- use a placement constructor- allocate arrays in the constructor and store pointers to them in my object.
I am adapting some legacy code from an old application written in C. In the original code, the program figures out how much memory will be needed for the entire object, calls malloc() for that amount, and proceeds to initialize the various fields.
For the C++ version, I'd like to be able to make a (fairly) normal constructor for my object. It will be a descendant of a parent class, and some of the code will be depending on polymorphism to call the right method. Other classes descended from the same parent have sizes known at compile time, and thus present no problem.
I'd like to avoid some of the special considerations necessary when using placement new, and I'd like to be able to delete the objects in a normal way.
I'd like to avoid carrying pointers within the body of my object, partially to avoid ownership problems associated with copying the object, and partially because I would like to re-use as much of the existing C code as possible. If ownership were the only issue, I could probably just use shared pointers and not worry.
Here's a very trimmed-down version of the C code that creates the objects:
typedef struct
{
int controls;
int coords;
} myobject;
myobject* create_obj(int controls, int coords)
{
size_t size = sizeof(myobject) + (controls + coords*2) * sizeof(double);
char* mem = malloc(size);
myobject* p = (myobject *) mem;
p->controls = controls;
p->coords = coords;
return p;
}
The arrays within the object maintain a fixed size of the life of the object. In the code above, memory following the structure of myobject will be used to hold the array elements.
I feel like I may be missing something obvious. Is there some way that I don't know about to write a (fairly) normal constructor in C++ but be able to tell it how much memory the object will require at run time, without resorting to a "placement new" scenario?

How about a pragmatic approach: keep the structure as is (if compatibility with C is important) and wrap it into a c++ class?
typedef struct
{
int controls;
int coords;
} myobject;
myobject* create_obj(int controls, int coords);
void dispose_obj(myobject* obj);
class MyObject
{
public:
MyObject(int controls, int coords) {_data = create_obj(controls, coords);}
~MyObject() {dispose_obj(_data);}
const myobject* data() const
{
return _data;
}
myobject* data()
{
return _data;
}
int controls() const {return _data->controls;}
int coords() const {return _data->coords;}
double* array() { return (double*)(_data+1); }
private:
myobject* _data;
}

While I understand the desire to limit the changes to the existing C code, it would be better to do it correctly now rather than fight with bugs in the future. I suggest the following structure and changes to your code to deal with it (which I suspect would mostly be pulling out code that calculates offsets).
struct spots
{
double x;
double y;
};
struct myobject
{
std::vector<double> m_controls;
std::vector<spots> m_coordinates;
myobject( int controls, int coordinates ) :
m_controls( controls ),
m_coordinates( coordinates )
{ }
};

To maintain the semantics of the original code, where the struct and array are in a single contigious block of memory, you can simply replace malloc(size) with new char[size] instead:
myobject* create_obj(int controls, int coords)
{
size_t size = sizeof(myobject) + (controls + coords*2) * sizeof(double);
char* mem = new char[size];
myobject* p = new(mem) myobject;
p->controls = controls;
p->coords = coords;
return p;
}
You will have to use a type-cast when freeing the memory with delete[], though:
myobject *p = create_obj(...);
...
p->~myobject();
delete[] (char*) p;
In this case, I would suggest wrapping that logic in another function:
void free_obj(myobject *p)
{
p->~myobject();
delete[] (char*) p;
}
myobject *p = create_obj(...);
...
free_obj(p);
That being said, if you are allowed to, it would be better to re-write the code to follow C++ semantics instead, eg:
struct myobject
{
int controls;
int coords;
std::vector<double> values;
myobject(int acontrols, int acoords) :
controls(acontrols),
coords(acoords),
values(acontrols + acoords*2)
{
}
};
And then you can do this:
std::unique_ptr<myobject> p = std::make_unique<myobject>(...); // C++14
...
std::unique_ptr<myobject> p(new myobject(...)); // C++11
...
std::auto_ptr<myobject> p(new myobject(...)); // pre C++11
...

New Answer (given comment from OP):
Allocate a std::vector<byte> of the correct size. The array allocated to back the vector will be contiguous memory. This vector size can be calculated and the vector will manage your memory correctly. You will still need to be very careful about how you manage your access to that byte array obviously, but you can use iterators and the like at least (if you want).
By the way here is a little template thing I use to move along byte blobs with a little more grace (note this has aliasing issues as pointed out by Sergey in the comments below, I'm leaving it here because it seems to be a good example of what not to do... :-) ) :
template<typename T>
T readFromBuf(byte*& ptr) {
T * const p = reinterpret_cast<T*>(ptr);
ptr += sizeof(T);
return *p;
}
Old Answer:
As the comments suggest, you can easily use a std::vector to do what you want. Also I would like to make another suggestion.
size_t size = sizeof(myobject) + (controls + coords*2) * sizeof(double);
The above line of code suggests to me that you have some "hidden structure" in your code. Your myobject struct has two int values from which you are calculating the size of what you actually need. What you actually need is this:
struct ControlCoord {
double control;
std::pair<double, double> coordinate;
};
std::vector<ControlCoord>> controlCoords;

When the comments finally scheded some light on the actual requirements, the solution would be following:
allocate a buffer large enough to hold your object and the array
use placement new in the beginning of the buffer
Here is how:
class myobject {
myobject(int controls, int coords) : controls(controls), coords(coords) {}
~myobject() {};
public:
const int controls;
const int coords;
static myobject* create(int controls, int coords) {
std::unique_ptr<char> buffer = new char[sizeof(myobject) + (controls + coords*2) * sizeof(double)];
myobject obj* = new (buffer.get()) myobject(controls, coords);
buffer.release();
return obj;
}
void dispose() {
~myobject();
char* p = (char*)this;
delete[] p;
}
};
myobject *p = myobject::create(...);
...
p->dispose();
(or suitably wrapped inside deleter for smart pointer)

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

CUDA C++, modifying a function so that it runs from device? [duplicate] - c++

You can't use the STL in CUDA, but you may be able to use the Thrust library to do what you want. Otherwise just copy the contents of the vector to the device and operate on it normally.

In the cuda library thrust, you can use thrust::device_vector<classT> to define a vector on device, and the data transfer between host STL vector and device vector is very straightforward. you can refer to this useful link:http://docs.nvidia.com/cuda/thrust/index.html to find some useful examples.

you can't use std::vector in device code, you should use array instead.

Related

Is it concurrency-safe to call concurrency::concurrent_vector::push_back while iterating over that concurrent_vector in other thread?

Placement New on already existing object in SharedMemory

how to allocate memory for arrays of structure of arrays

wondering if I can use stl smart pointers for this

Use constructor to allocate memory

Categories

Resources