Dynamic array in an array of structures in OpenCL - c++

I have a struct:
struct A
{
    double a;
    int c;
    double *array;
};
int main()
{
    A *str = new A[50];
    for(int i = 0; i < 50; i++)
    {
        str[i].array = new double[5];
        str[i].array[0] = 50;
    }
    .....
    Buffer BufA = Buffer(...,..., 50 * sizeof(A), str);
    .....
}
In kernel
struct A
{
    double a;
    int c;
    double *array;
};
__kernel void vector(__global A *str)
{
    int id = get_global_id(0);
    printf("Element - %f", str[id].array[0]);
}
But the kernel does not see the value in the array. Probably this is because the buffer only contains memory for the array of structures, not for the dynamic arrays they point to. How can I implement this?

On modern systems, a process doesn't see the actual addresses of objects, but rather their virtual addresses.
This means two processes cannot pass each other pointers and expect them to mean the same thing. You need to rethink your application with that in mind.

On top of the address virtualization mentioned by YSC, you should also keep in mind that the memory that your graphics card (or other OCL device) is operating on may be distinct (as in, different pieces of hardware) from the memory your CPU is operating on.
The OpenCL buffers are responsible for transporting their contents between these memories. So for example an array of ints that you create and write to on the CPU would have to be copied to GPU memory (and have space allocated there, and possibly be copied back after the kernel is done), which these buffers do for you. But if you store pointers to other CPU memory in your buffer, then that other memory will not be transferred automatically. Further, the pointer relation would most likely break, as there is no guarantee that your other data is at the same location in GPU memory as in CPU memory.
The solution, naturally, is to put all the data you want transferred into buffers, including the sub-arrays. One way to do this without using an excessive number of buffers is to pack the sub-arrays together into one buffer and store indices into it instead of pointers to memory.
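For illustration, here is a host-side sketch of that packing approach. The names N, SUB_LEN, offset and flatArray are made up for this example, and the struct layout/padding must of course be kept consistent between host and kernel:
// Host side: pack all sub-arrays into one flat buffer and store an index
// (offset) in each struct instead of a pointer.
struct A
{
    double a;
    int c;
    int offset;   // index into flatArray instead of a raw pointer
};

const int N = 50, SUB_LEN = 5;
A *str = new A[N];
double *flatArray = new double[N * SUB_LEN];
for (int i = 0; i < N; i++)
{
    str[i].offset = i * SUB_LEN;
    flatArray[str[i].offset + 0] = 50;
}
// Then create two buffers - one for str, one for flatArray - pass both to the
// kernel, and read flatArray[str[id].offset + 0] instead of str[id].array[0].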

Related

Memory allocation in C++

I have a project in which I have to allocate 1024 bytes when my C++ program starts.
void* available = new char*[1024];
I write this and I think it is okay.
Now my problem starts: I should make a function that receives a size_t size (a number of bytes) to allocate. My allocate function should return a void* pointer to the first byte of this available memory. So my question is how to hand out a void* of the requested size, taking the memory from my available block.
I'm a student and I'm not a professional in C++.
Also sorry for my bad explanation.
It looks like you're trying to make a memory pool. Even though that's a big topic, let's see what the minimal effort is to create something like this.
There are some basic elements to a pool that one needs to grasp. Firstly, the memory itself, i.e. where you draw memory from. In your case you already decided that you're going to dynamically allocate a fixed amount of memory. To do it properly the code should be:
char *poolMemory = new char[1024];
I didn't choose void* pool here because delete[] pool is undefined when pool is a void pointer. You could go with malloc/free but I'll keep it C++. Secondly I didn't allocate an array of pointers as your code shows because that allocates 1024 * sizeof(char*) bytes of memory.
A second consideration is how to give back the memory you acquired for your pool. In your case you want to remember to delete it so best you put it in a class to do the RAII for you:
class Pool
{
    char *_memory;
    void *_pool;
    size_t _size;
public:
    Pool(size_t poolSize = 1024)
        : _memory(new char[poolSize])
        , _pool(_memory)
        , _size(poolSize)
    {
    }

    void* allocate(size_t nBytes, size_t alignment = 1); // defined below

    ~Pool() { delete[] _memory; } // Forgetting this will leak memory.
};
Now we come to the part you're asking about. You want to use memory inside that pool. Make a method in the Pool class called allocate (declared above) that will give back n bytes. This method should know how many bytes are left in the pool (member _size) and essentially perform pointer arithmetic to let you know which location is free. There is a catch, unfortunately. You must provide the required alignment that the resulting memory should have. This is another big topic that, judging from the question, I don't think you intend to handle (so I'm defaulting alignment to 2^0 = 1 byte).
#include <memory>

void* Pool::allocate(size_t nBytes, size_t alignment)
{
    if (std::align(alignment, nBytes, _pool, _size))
    {
        void *result = _pool;
        // Bookkeeping
        _pool = (char*)_pool + nBytes; // Advance the pointer to available memory.
        _size -= nBytes;               // Update the available space.
        return result;
    }
    return nullptr;
}
I did this pointer arithmetic using std::align but I guess you could do it by hand. In a real-world scenario you'd also want a deallocate function that "opens up" spots inside the pool after they have been used. You'd also want some strategy for when the pool has run out of memory, a fallback allocation. Additionally the initial memory acquisition can be made more efficient, e.g. by using static memory where appropriate. There are many flavors and aspects to this; I hope the initial link I included gives you some motivation to research the topic a bit.
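To make the intended use concrete, here is a minimal usage sketch of the Pool above (assuming allocate is declared as a public member, as in the listing):
int main()
{
    Pool pool(1024);

    void *a = pool.allocate(100);     // first 100 bytes of the pool
    void *b = pool.allocate(200, 8);  // next 200 bytes, 8-byte aligned

    if (pool.allocate(2048) == nullptr)
    {
        // more than what's left in the pool: handle exhaustion here
        // (fallback allocation, error, ...)
    }
    // No per-allocation cleanup for a and b; the whole block is released
    // by ~Pool() when pool goes out of scope.
}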

lazy allocation for c++ object arrays

If I do something like:
void f() {
    const int n = 1<<14;
    int *foo = new int [n];
}
or
void f() {
    const int n = 1<<14;
    int *foo = new int [n]();
}
Will the Linux kernel use lazy memory allocation? For the second case, in the same way as when creating static arrays?
How far can I take this? For instance, having a struct that will be filled with 0s, will it always be allocated lazily, or will it actually allocate physical RAM when it is initialized?
struct X {
    int a, b, c, d, f, g, ..., z;
};

void f() {
    X *foo = new X();    //lazy?
    const int n = 1<<14;
    X *bar = new X[n](); //lazy?
}
For a standard Ubuntu 20.04 machine running Linux 5.4.0-51-generic....
We can observe this directly. In the code below, I increased the n value to 1 << 24 (~16 million ints = 64MB for 32-bit int) so it's the dominant factor in overall memory usage. I compiled, ran, and observed memory usage in htop:
#include <unistd.h>

int main() {
    int *foo = new int [1 << 24];
    sleep(100);
}
htop values: VIRT 71416KB / RES 1468KB
The virtual address allocations include the memory allocated by new, but the resident memory size is much smaller - indicating that distinct physical backing memory pages weren't needed yet for all the 64MB allocated.
After changing to int *foo = new int[1<<24]();:
htop values: VIRT 71416KB / RES 57800KB
Requesting the memory be zeroed resulted in a resident memory value just under the 64MB that was initialised, and it won't have been due to memory pressure (I have 64GB RAM), but some algorithm in the kernel must have decided to page out some of the backing memory after it was zeroed (I suspect kswapd?). The large RES value suggests that each page zeroed was given a distinct page of physical backing memory (as distinct from e.g. being mapped to the OS's zero-page for COW-allocation of an actual backing page).
With structs:
#include <unistd.h>

struct X {
    int a[1 << 24];
};

int main() {
    auto foo = new X;
    sleep(100);
}
htop values: VIRT 71416KB / RES 1460KB
This shows insufficient RES for the static arrays to have distinct backing pages. Either the virtual memory has been pre-mapped to the OS zero-page, or it's unmapped and will be mapped initially to the zero-page when accessed, then given its own physical backing page if written to - I'm not sure which, but in terms of actual physical RAM usage it doesn't make any difference.
After changing to auto foo = new X{};
htop values: VIRT 71416KB / RES 67844KB
You can clearly see that initialising the bytes to 0s resulted in use of backing memory for the arrays.
Addressing your questions:
Will the Linux kernel use lazy memory allocation?
The virtual memory allocation is done when the new is done. Distinct physical backing memory is allocated lazily when an actual write is done to the memory by the user-space code.
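If you want to see that lazy physical allocation directly, here is a small sketch along the same lines as the tests above (assuming 4 KiB pages): run it and watch RES in htop stay low during the first sleep, then jump once one int per page has been written.
#include <unistd.h>

int main() {
    const long n = 1L << 24;                      // ~64MB of ints, as above
    int *foo = new int[n];
    sleep(20);                                    // RES stays low: no writes yet
    const long intsPerPage = 4096 / sizeof(int);  // assuming 4 KiB pages
    for (long i = 0; i < n; i += intsPerPage)
        foo[i] = 1;                               // first write faults each page in
    sleep(20);                                    // RES now close to 64MB
    delete[] foo;
}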
For the second case, in the same way as when creating static arrays?
#include <unistd.h>

int g_a[1 << 24];

int f(int i) {
    static int a[1 << 24];
    return a[i];
}

int main(int argc, const char* argv[]) {
    sleep(20);
    int k = f(2930);
    sleep(20);
    return argc + k;
}
htop values: VIRT 133MB / RES 1596KB
When this was run, the memory didn't jump after 20 seconds, indicating all the virtual address space was allocated during program loading. The low resident memory shows that the pages were not accessed and zeroed the way they were for new.
Just to address a potential point of confusion: while the Linux kernel will zero out backing memory the first time it's provided to the process, any given call to new won't (in any implementation I've seen) know whether the memory it returns is being recycled from earlier dynamic allocations, which might have had non-zero values written into them before being deleted/freed. Because of this, if you use memory-zeroing forms like new X{} or new int[n](), the memory will be unconditionally cleared by the user-space code, causing the full amount of backing memory to be assigned and faulted in.
As many comments said, operator new usually uses malloc under the hood. malloc allocates space but does not by default allocate physical pages. However, malloc often writes internal data to the beginning of a block of memory, so only the first or first couple of pages allocated as virtual address space will fault and be allocated physically by the Linux kernel. The Linux kernel zeroes all allocated physical pages so whether you add () to the end of the allocation to zero-initialize the allocated memory probably has no effect, in terms of new physical pages being assigned. (Already allocated physical pages mapped to the allocated virtual address range are zeroed in that case.)

How to create pointer to pointer array same as VST audio buffer?

In the VST spec, a buffer of multichannel audio data is passed around.....
void MyClass::ProcessDoubleReplacing(double **inputs, double **outputs, int frames)
{
    //and accessed like this...
    outputs[channel][sample] = inputs[channel][sample];
}
I want to create a similar "2d pointer array", but am not having much success. I can create a simple pointer and iterate through it reading/writing values....
double* samples;
samples[0] = aValue;
.... but am having a crash festival trying to implement something that will allow me to...
samples[0][0] = aValue;
What would be the correct way to implement this?
double* samples;
samples[0] = aValue;
That's really bad. :( Please don't do this! samples is just a pointer to somewhere in your memory.
The memory it points to is not allocated at all, but you're writing to this memory...
You can allocate a block of memory either from the heap or from the stack. However, the stack has a size limit (configured in your compiler settings), so for larger blocks (like audio data) you would typically allocate from the heap. But then you have to take care that you don't leak memory from the heap; stack memory is automatically managed by the scope of your variable, so that's easier to start with.
In C/C++ you can allocate memory from the stack like this:
double samples[512];
then you can do stuff like:
samples[0] = aValue; // change value of 1st sample in sample buffer with 512 elements
or
double* pointerToSample = &samples[255]; // point to sample index 255 (the 256th sample) in the sample buffer
pointerToSample[127] = aValue; // change the value of sample index 382 (255 + 127) in our sample buffer with 512 elements
and so on...
BUT if you just do,
double* pointerToSample;
pointerToSample[127] = aValue;
You're actually writing to unallocated memory! Your pointer points somewhere, but there is no allocated memory behind it.
Be careful with this! Also never access pointerToSample after the samples variable has gone out of scope: at that point the memory pointerToSample points to is no longer allocated.
To allocate memory from the heap dynamically in C++ there are the keywords new (to allocate memory) and delete (to free it afterwards).
i.e.
double *samples = new double[512];
will allocate a block of memory for your sample data. But after using it, you have to manually delete it - otherwise you're leaking memory. So just do:
delete[] samples;
after you're finished with it.
Last but not least, to answer your question, here is how to create a two-dimensional array to call the method ProcessDoubleReplacing():
int main(int argc, char ** argv){
    const int numChannels = 2;   // stereo
    const int numFrames = 44100; // 1 s @ 44.1 kHz
    /* create 2 dimensional arrays: one pointer per channel */
    double** samplesIn = new double*[numChannels];
    double** samplesOut = new double*[numChannels];
    for(int ch = 0; ch < numChannels; ++ch){
        samplesIn[ch] = new double[numFrames];
        samplesOut[ch] = new double[numFrames];
    }
    /* TODO: fill your input buffer with audio samples from somewhere, i.e. a file */
    ProcessDoubleReplacing(samplesIn, samplesOut, numFrames);
    /* cleanup */
    for(int ch = 0; ch < numChannels; ++ch) {
        delete [] samplesIn[ch];
        delete [] samplesOut[ch];
    }
    delete [] samplesIn;
    delete [] samplesOut;
    return 0;
}
@Constantin's answer pretty much nailed it, but I just wanted to add that in your implementation you should not allocate the buffers in your process() callback. Doing so may cause your plugin to take too much time, and as a consequence the system can drop audio buffers, causing playback glitches.
So instead, these buffers should be fields of your main processing class (i.e., the AEffect), and you should allocate them to their full size in the constructor. Never use new or delete inside of the process() method or else you are asking for trouble!
Here's a great guide about the do's and don'ts of realtime audio programming.
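As a rough sketch of that advice (MyPlugin, numChannels and maxFrames are made-up names, not part of the VST API), the buffers are sized once in the constructor and process() only reads and writes them:
#include <vector>

class MyPlugin {
    std::vector<std::vector<double>> scratch;   // one working buffer per channel
public:
    MyPlugin(int numChannels, int maxFrames)
        : scratch(numChannels, std::vector<double>(maxFrames)) {}

    void process(double **inputs, double **outputs, int frames) {
        for (size_t ch = 0; ch < scratch.size(); ++ch)
            for (int i = 0; i < frames; ++i) {
                scratch[ch][i] = inputs[ch][i];   // no new/delete in here
                outputs[ch][i] = scratch[ch][i];
            }
    }
};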
If you want to write something in C++ that provides a similar interface to the one you showed, I would use std::vector for managing the memory, like this:
vector<vector<double>> buffers (2,vector<double>(500));
This only stores the data. For an array of pointers you need an array of pointers. :)
vector<double*> pointers;
pointers.push_back(buffers[0].data());
pointers.push_back(buffers[1].data());
This works since std::vector guarantees that its elements are stored contiguously in memory. So you're also allowed to do this:
double** p = pointers.data();
p[0][123] = 17;
p[1][222] = 29;
It's important to note that if you resize some of these vectors, the pointers might become invalid, in which case you should go ahead and get the new pointer(s).
Keep in mind that the data member function is a C++11 feature. If you don't want to use it, you can write
&some_vector[0] // instead of some_vector.data()
(unless the vector is empty)
Instead of passing a double** to some function, you might be interested in passing the buffers vector directly by reference, though this obviously won't work if you want your interface to be C compatible. Just saying.
Edit: A note on why I chose std::vector over new[] and malloc: because it's the right thing to do in modern C++! The chance of messing up is lower. You won't have any memory leaks since the vector takes care of managing the memory. This is especially important in C++ since exceptions might be flying around, so a function might be exited early, before a delete[] at its end is ever reached.
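For completeness, a small sketch of that by-reference interface (processBlock is an illustrative name, not from the original code):
#include <vector>
using std::vector;

// C++-only interface: take the channel buffers by reference instead of double**.
void processBlock(const vector<vector<double>>& in, vector<vector<double>>& out)
{
    for (size_t ch = 0; ch < in.size(); ++ch)
        for (size_t i = 0; i < in[ch].size(); ++i)
            out[ch][i] = in[ch][i];
}

int main()
{
    vector<vector<double>> inputs (2, vector<double>(500));
    vector<vector<double>> outputs(2, vector<double>(500));
    processBlock(inputs, outputs);
}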

How do I "reset" a buffer?

Say I create a member variable pointer pBuffer. I send this buffer into some unknown land to be filled with data. Now say pBuffer has an arbitrary amount of data in it.
Q: Is there a way to reset pBuffer without completely deleting it, while still deallocating all unnecessary memory it was occupying?
Example:
class Blah
{
public:
    unsigned char* pBuffer;

    Blah() { pBuffer = NULL; }
    ~Blah() {}

    void FillBuffer()
    {
        //fill the buffer with data, doesn't matter how
    }

    void ResetBuffer()
    {
        //????? reset the buffer without deleting it, still deallocate memory ?????
    }
};
int main()
{
    Blah b;
    b.FillBuffer();
    b.ResetBuffer();
    b.FillBuffer(); //if pBuffer were deleted, this wouldn't work
}
Try realloc() if you know the amount of stuff in the buffer vs the remaining space in the buffer.
Using only a single raw pointer, no; but if you keep a size variable you can reset the buffer relatively easily.
However, this being tagged as C++, I would like to caution you against doing that and will instead propose an alternative. It meets your requirement of allowing memory to be allocated and the buffer later "reset" without deleting it. As a side benefit, using std::vector means that you don't have to worry about the memory leaking in subsequent calls to FillBuffer(), specifically when the existing buffer is too small and would need to be reallocated.
#include <vector>

class Blah
{
public:
    std::vector<unsigned char> pBuffer;

    Blah() {}
    ~Blah() {}

    void FillBuffer()
    {
        //fill the buffer with data, doesn't matter how
    }

    void ResetBuffer()
    {
        pBuffer.clear();
        // if you _really_ want the memory "pointed to" to be freed to the heap
        // use the std::vector<> swap idiom:
        //     std::vector<unsigned char> empty_vec;
        //     pBuffer.swap(empty_vec);
    }
};
Buffers typically need a maximum size and a current size. To "reset", you would set the current size to zero. When you use it again, you might need to grow or shrink the maximum size of the buffer. Use realloc or malloc/new and memcpy (which realloc does internally when growing) to move existing data to the new buffer.
Note that these are expensive operations. If you expect the buffer to need to grow from use to use, you might consider doubling its maximum size every time. This effectively amortizes the cost of the allocation and copy.
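Here is a rough sketch of that scheme (Buffer, append and the doubling growth factor are illustrative choices, not the only way to do it):
#include <cstddef>
#include <cstring>  // std::memcpy

struct Buffer {
    unsigned char *data = nullptr;
    std::size_t capacity = 0;  // maximum size currently allocated
    std::size_t size = 0;      // current amount of valid data

    void reset() { size = 0; }  // "reset" without freeing the allocation

    void append(const unsigned char *src, std::size_t n) {
        if (size + n > capacity) {
            std::size_t newCap = capacity ? capacity : 64;
            while (newCap < size + n) newCap *= 2;      // amortized doubling
            unsigned char *bigger = new unsigned char[newCap];
            if (size) std::memcpy(bigger, data, size);  // copy existing data over
            delete[] data;
            data = bigger;
            capacity = newCap;
        }
        std::memcpy(data + size, src, n);
        size += n;
    }

    ~Buffer() { delete[] data; }
};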

New is taking lots of extra memory

I'm making an application that is going to be using many dynamically created objects (raytracing). Instead of just using [new] over and over again, I thought I'd just make a simple memory system to speed things up. It's very simple at this point, as I don't need much.
My question is: when I run this test application, using my memory manager uses the correct amount of memory. But when I run the same loop using [new], it uses 2.5 to 3 times more memory. Is there just something I'm not seeing here, or does [new] incur a huge overhead?
I am using VS 2010 on Win7. Also I'm just using the Task Manager to view the process memory usage.
template<typename CLASS_TYPE>
class MemFact
{
public:
    int m_obj_size; //size of the incoming object
    int m_num_objs; //number of instances
    char* m_mem;    //memory block

    MemFact(int num) : m_num_objs(num)
    {
        CLASS_TYPE t;
        m_obj_size = sizeof(t);
        m_mem = new char[m_obj_size * m_num_objs];
    }

    CLASS_TYPE* getInstance(int ID)
    {
        if( ID >= m_num_objs) return 0;
        return (CLASS_TYPE*)(m_mem + (ID * m_obj_size));
    }

    void release() { delete[] m_mem; m_mem = 0; }
};
/*---------------------------------------------------*/
class test_class
{
    float a,b,c,d,e,f,g,h,i,j; //10 floats
};
/*---------------------------------------------------*/
int main()
{
    int num = 10000000; //10 M items

    // at this point we are using 400K memory
    MemFact<test_class> mem_fact(num);
    // now we're using 382MB memory

    for(int i = 0; i < num; i++)
        test_class* new_test = mem_fact.getInstance(i);

    mem_fact.release();
    // back down to 400K

    for(int i = 0; i < num; i++)
        test_class* new_test = new test_class();
    // now we are up to 972MB memory
}
There is a minimum size and granularity for a memory allocation, depending on the CRT you are using; often that's 16 bytes. Your object is 40 bytes wide (10 4-byte floats), so rounding each allocation up to the allocator's granularity already wastes some bytes per object. The memory manager also has its own structures to keep track of which memory is free and which is not, and that's not free either. Your memory manager is probably much simpler (e.g. it frees all those objects in one go), which is inherently going to be more efficient than what new does for the general case.
Also keep in mind that if you're building in debug mode, the debugging allocator will pad both sides of the returned allocation with canaries in an attempt to detect undefined behavior. That padding pushes each allocation up into the next size bucket, adding even more per-object overhead. That'll be disabled when you build in release mode.
Boy, I sure hope that nobody wants to allocate any non-PODs from your memory manager. Or objects of dynamic size. And doesn't mind instantiating it for every type. Or creating as many as they like all at once. Or having their lifetime be longer than the MemFact.
In fact, there is a valid pattern known as an Object Pool, which is similar to yours but doesn't suck. The simple answer is that operator new is required to be ultra flexible: its objects must live until delete is called, their destructors must be called too, and they must all have completely separate, independent lifetimes. It must be able to allocate variable-size objects, of any type, at any time. Your MemFact meets none of these requirements. The Object Pool has fewer requirements, and is significantly faster than regular new because of it, but it also doesn't completely fail on all the other fronts.
You're trying to compare an almost completely rotten apple with an orange.
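For reference, a bare-bones sketch of the Object Pool idea mentioned above (illustrative only, not the poster's MemFact, and assuming new char[] returns memory suitably aligned for T): objects are constructed with placement new so constructors and destructors actually run, unlike raw casts into a char block.
#include <cstddef>
#include <new>      // placement new
#include <utility>  // std::forward

template <typename T>
class ObjectPool {
    char *m_mem;
    std::size_t m_capacity;
    std::size_t m_used;
public:
    explicit ObjectPool(std::size_t capacity)
        : m_mem(new char[capacity * sizeof(T)]), m_capacity(capacity), m_used(0) {}

    template <typename... Args>
    T* create(Args&&... args) {
        if (m_used >= m_capacity) return nullptr;           // pool exhausted
        void *slot = m_mem + m_used * sizeof(T);
        T *obj = new (slot) T(std::forward<Args>(args)...); // construct in place
        ++m_used;
        return obj;
    }

    void destroyAll() {  // run destructors, keep the raw block for reuse
        for (std::size_t i = 0; i < m_used; ++i)
            reinterpret_cast<T*>(m_mem + i * sizeof(T))->~T();
        m_used = 0;
    }

    ~ObjectPool() { destroyAll(); delete[] m_mem; }
};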