Copying between interleaved OpenGL Vertex Buffer Objects - C++

Using OpenGL 3.3, Radeon HD 3870, C++.
I have a question about interleaved arrays of data. In my application I have a vector of structs, which is sent as data to a buffer object. Something like this:
struct data{
int a;
int b;
int c;
};
std::vector<data> datVec;
...
glBufferData(GL_ARRAY_BUFFER, sizeof(data)*datVec.size(), &datVec[0], GL_DYNAMIC_DRAW);
This works fine, and I use it very often. What it creates is an interleaved array, so the data is laid out like:
a1,b1,c1,a2,b2,c2,a3,b3,c3
Now I send this to the GPU for processing, and with transform feedback I read back into a buffer, for example, the b variables. So it looks like:
bU1, bU2, bU3
I'd like to copy the updated values into the interleaved buffer. Can this be done with a single command like glCopyBufferSubData? That one isn't suitable, as it only takes an offset and a size, not a stride (it probably behaves like memcpy in C++). The result should look like:
a1, bU1, c1, a2, bU2, c2, a3, bU3, c3
If not, is there a better approach than these two of mine?
1. Map the updated buffer, copy the values into temporary storage in the app, unmap it, then map the data buffer and iterate through it, setting the new values.
2. Split the data into a constant buffer and a variable buffer. The constant one stays the same over time, while the variable one can be updated in a single glCopyBufferSubData call.
Thanks

glMapBuffer seems like a better solution for what you are doing.
The basic idea, from what I can tell, is to map the buffer into your address space, and then update the buffer manually using your own update method (iterative loop likely).
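glBindBuffer(GL_ARRAY_BUFFER, buffer_id);
// Cast to int* so the mapping can be indexed; a void* cannot be subscripted.
int *buffer = static_cast<int *>(glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY));
if (buffer == NULL) {
    // Handle error, usually means lack of virtual memory
} else {
    // bufferLen is the number of ints in the buffer; index 1 is the first b.
    for (int i = 1; i < bufferLen; i += stride /* 3, in this case */)
        buffer[i] = newValue;
    glUnmapBuffer(GL_ARRAY_BUFFER);
}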
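The strided update described above can also be sketched on the CPU side without any OpenGL calls. This is only an illustration: `Data` mirrors the struct from the question, `scatterB` is a hypothetical helper name, and the two vectors stand in for the two mapped buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mirrors the question's interleaved struct (a, b, c per element).
struct Data {
    int a;
    int b;
    int c;
};

// Writes updatedB[i] into the b field of interleaved[i].
// Both containers stand in for mapped buffer memory; no GL call is made here.
void scatterB(std::vector<Data>& interleaved, const std::vector<int>& updatedB) {
    assert(interleaved.size() == updatedB.size());
    for (std::size_t i = 0; i < interleaved.size(); ++i) {
        interleaved[i].b = updatedB[i];  // a and c are left untouched
    }
}
```

With real buffers, the same loop would run over the pointers returned by mapping the feedback buffer for reading and the interleaved buffer for writing.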

I would separate the dynamic part from the static one (your point 2).
If you still want to keep them interleaved in a single buffer, and you have some spare video memory, you can do the following:
1. Copy the original interleaved array into a backup one. This requires memory for all components, rather than only the dynamic ones as before.
2. Run transform feedback into the original interleaved buffer, carrying the static values through unchanged.

Map a layout onto memory address

In C++ is there a way to "map" my desired layout onto a memory data, without memcopying it?
I.e. there is a void* buffer, and I know its layout:
byte1: uint8_t
byte2-3: uint16_t
byte4: uint8_t
I know I can create a struct, and memcpy the data to the struct, and then I can have the values as fields of struct.
But is there a way to achieve this without copying? The data is already there, I just need to read some fields, and I'm looking for something that can help with the layout.
(I could keep some static ints for the memory offsets, but I'm hoping for something more generic.)
I.e., I would have several "layouts", and based on the type of the raw data I'd map the appropriate layout and access its fields, which would still point to the original data.
I know I can point structs to data, it is easy:
struct message {
uint8_t type;
};
struct request:message {
uint8_t rid;
uint8_t other;
};
struct response:message {
uint8_t result;
};
vector<uint8_t> data;
data.push_back(1); //type
data.push_back(10);
data.push_back(11);
data.push_back(12);
data.push_back(13);
struct request* ptrRequest;
ptrRequest = (struct request*)&data[0]; // start at the type byte so that rid and other line up
cout << (int)ptrRequest->rid; //10
cout << (int)ptrRequest->other; //11
But what I'd like to achieve is to have a map with the layouts, i.e:
map<int, struct message*> messagetypes;
But I have no clue how to proceed, as emplacing would need a new object, and casting is also challenging if the map stores only base pointers.
If your layout structure is POD you can do placement new-expression with no initialization, that serves as an object creation marker. E.g.:
#include <new> // Placement new.
#include <type_traits> // std::is_pod.
// ...
uint8_t* data = ...; // Read from disk, network, or elsewhere.
static_assert(std::is_pod<request>::value, "struct request must be POD.");
request* ptrRequest = new (static_cast<void*>(data)) request;
That only works with PODs. This is a long-standing issue documented in P0593R6, "Implicit creation of objects for low-level object manipulation".
If your target architecture requires data to be aligned, add an alignment check for the data pointer.
As another answer states, the memcpy may be eliminated by the compiler; examine the assembly output.
In C++ is there a way to "map" my desired layout onto a memory data, without memcopying it?
No, not in standard C++.
If the layout matches that of the class [1], then what you might be able to do is write the memory data directly into the class instance in the first place, so that no copy is needed afterwards.
If the above is not possible, then what you might do is copy (yes, this is memcopy, but hold that thought) the data onto an automatic instance of the class, then placement-new a copy of the automatic instance onto the source array. A good optimiser can see that these copies back and forth do not change the value, and can optimise them away. Matching layout is also necessary here. Example:
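#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory> // std::align
#include <new>
#include <stdexcept>
struct data {
std::uint8_t byte;
std::uint8_t another;
std::uint16_t properly_aligned;
};
void* buffer = get_some_buffer();
// Assumption: at least sizeof(data) bytes are available at buffer.
std::size_t space = sizeof(data);
if (!std::align(alignof(data), sizeof(data), buffer, space))
throw std::invalid_argument("bad alignment");
data local{};
std::memcpy(&local, buffer, sizeof local);
data* dataptr = new(buffer) data{local};
std::uint16_t value_from_offset = dataptr->properly_aligned;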
https://godbolt.org/z/uvrXS2 Notice how there is no call to std::memcpy in the generated assembly.
One thing to consider here is that the multi-byte integers must have the same byte order as the CPU uses natively. Therefore the data is not portable across systems of different endianness. More advanced de-serialisation is required for portability.
[1] However, it seems unlikely that the data could match the layout of the class, because the second element, a uint16_t, is not aligned to a 16-bit boundary from the start of the layout.
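For the exact layout in the question (one byte, an unaligned 16-bit value, one byte), a non-owning accessor class is one way to read the fields in place without a struct overlay. The sketch below is an illustration only: `PacketView` and its member names are hypothetical, and it assumes the 16-bit field is stored little-endian in the buffer.

```cpp
#include <cassert>
#include <cstdint>

// Non-owning view over the 4-byte layout from the question:
//   byte 0: uint8_t, bytes 1-2: uint16_t, byte 3: uint8_t.
// Assumption: the 16-bit field is stored little-endian in the buffer.
class PacketView {
public:
    explicit PacketView(const std::uint8_t* bytes) : bytes_(bytes) {
        assert(bytes != nullptr);
    }

    std::uint8_t first() const { return bytes_[0]; }

    // Assembled byte by byte, so it needs no alignment and does not depend
    // on the byte order of the host CPU.
    std::uint16_t middle() const {
        return static_cast<std::uint16_t>(bytes_[1] | (bytes_[2] << 8));
    }

    std::uint8_t last() const { return bytes_[3]; }

private:
    const std::uint8_t* bytes_;  // points at the original data, nothing is copied
};
```

A view like this sidesteps both the alignment problem and the endianness problem mentioned above, at the cost of writing one accessor per field.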

Byte Array Initialization Causes DirectX to Crash

So I'm trying to gain access to vertex buffers on the GPU. Specifically I need to do some calculations with the vertices. So in order to do that I attempt to map the resource (vertex buffer) from the GPU, and copy it into system memory so the CPU can access the vertices. I used the following SO thread to put the code together: How to read vertices from vertex buffer in Direct3d11
Here is my code:
HRESULT hr = pSwapchain->GetDevice(__uuidof(ID3D11Device), (void**)&pDevice);
if (FAILED(hr))
return false;
pDevice->GetImmediateContext(&pContext);
pContext->OMGetRenderTargets(1, &pRenderTargetView, nullptr);
//Vertex Buffer
ID3D11Buffer* veBuffer;
UINT Stride;
UINT veBufferOffset;
pContext->IAGetVertexBuffers(0, 1, &veBuffer, &Stride, &veBufferOffset);
D3D11_MAPPED_SUBRESOURCE mapped_rsrc;
pContext->Map(veBuffer, NULL, D3D11_MAP_READ, NULL, &mapped_rsrc);
void* vert = new BYTE[mapped_rsrc.DepthPitch]; //DirectX crashes on this line...
memcpy(vert, mapped_rsrc.pData, mapped_rsrc.DepthPitch);
pContext->Unmap(veBuffer, 0);
I'm somewhat of a newbie when it comes to C++. So my assumptions may be incorrect. The initialization value that
mapped_rsrc.DepthPitch
returns is quite large: 343597386. According to the documentation linked below, DepthPitch is given in bytes. If I replace the size with a much smaller number, like 10, the code runs just fine. From what I read about the Map() function here: https://learn.microsoft.com/en-us/windows/win32/api/d3d11/ns-d3d11-d3d11_mapped_subresource
It states :
Note The runtime might assign values to RowPitch and DepthPitch that
are larger than anticipated because there might be padding between
rows and depth.
Could this have something to do with the large value that is being returned? If so, does that mean I have to parse DepthPitch to remove any unneeded data? Or maybe it is an issue with the way vert is initialized?
There was no Vertex Buffer bound, so your IAGetVertexBuffers failed to return anything. You have to create a VB.
See Microsoft Docs: How to Create a Vertex Buffer
As someone new to DirectX 11, you should take a look at DirectX Tool Kit.

Dynamic array in an array of structures in OpenCL

I have a struct :
struct A
{
double a;
int c;
double *array;
};
int main()
{
A *str = new A[50];
for(int i=0;i<50;i++)
{
str[i].array = new double[5];
str[i].array[0] = 50;
}
.....
Buffer BufA = Buffer(...,..., 50 * sizeof(A),str);
.....
}
In kernel
struct A
{
double a;
int c;
double *array;
};
__kernel void vector(__global A *str)
{
int id = get_global_id(0);
printf("Element - %f",str[id].array[0]);
}
But the kernel does not see the values in the array. Probably this is because the buffer only holds the array of structures, without the memory of the dynamic arrays. How can I implement this?
On a modern system, a process doesn't see the actual addresses of objects, but rather the virtual addresses of such objects.
This means two processes cannot pass each other pointers and expect them to mean the same thing. You need to rethink your application with that in mind.
On top of the address virtualization mentioned by YSC, you should also keep in mind that the memory that your graphics card (or other OCL device) is operating on may be distinct (as in, different pieces of hardware) from the memory your CPU is operating on.
The OpenCL buffers are responsible for transporting their contents between these memories. So for example an array of ints that you create and write to on the CPU would have to be copied to GPU memory (and have space allocated there, and possibly be copied back after the kernel is done), which these buffers do for you. But if you store pointers to other CPU memory in your buffer, then that other memory will not be transferred automatically. Further, the pointer relation would most likely break, as there is no guarantee that your other data is at the same location in GPU memory as in CPU memory.
The solution, naturally, is to put all the data you want transferred into buffers, including the sub-arrays. One way to do this without using an excessive number of buffers would be to pack the sub-arrays together into one array and store indices into it instead of pointers to memory.
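The packing idea can be sketched on the host side in plain C++, with no OpenCL calls. All names here (`HostA`, `FlatA`, `Packed`, `pack`, `valueAt`) are hypothetical, and the sketch assumes every element may own a different number of values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side element that owns a variable-length array (stand-in for struct A).
struct HostA {
    double a;
    int c;
    std::vector<double> array;
};

// Pointer-free element that is safe to copy into a device buffer.
struct FlatA {
    double a;
    int c;
    int offset;  // index of the first value in the packed array
    int count;   // number of values belonging to this element
};

struct Packed {
    std::vector<FlatA> elements;  // would go into one buffer
    std::vector<double> values;   // would go into a second buffer
};

Packed pack(const std::vector<HostA>& input) {
    Packed out;
    out.elements.reserve(input.size());
    for (const HostA& h : input) {
        FlatA f{h.a, h.c, static_cast<int>(out.values.size()),
                static_cast<int>(h.array.size())};
        out.elements.push_back(f);
        out.values.insert(out.values.end(), h.array.begin(), h.array.end());
    }
    return out;
}

// The lookup a kernel would perform: values[elements[id].offset + i].
double valueAt(const Packed& p, std::size_t element, int i) {
    const FlatA& f = p.elements.at(element);
    assert(i >= 0 && i < f.count);
    return p.values[static_cast<std::size_t>(f.offset + i)];
}
```

In the kernel, `str[id].array[0]` would then become an index into the second buffer. The struct layout has to match between host and device, which is worth checking separately.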

How do I "reset" a buffer?

Say I create a member variable pointer pBuffer. I send this buffer into some unknown land to be filled with data. Now say pBuffer has an arbitrary amount of data in it.
Q: Is there a way to reset pBuffer without completely deleting it, while still deallocating all unnecessary memory it was occupying?
Example:
class Blah
{
public:
unsigned char* pBuffer;
Blah(){pBuffer = NULL;}
~Blah(){}
FillBuffer()
{
//fill the buffer with data, doesn't matter how
}
ResetBuffer()
{
//????? reset the buffer without deleting it, still deallocate memory ?????
}
};
int main()
{
Blah b;
b.FillBuffer();
b.ResetBuffer();
b.FillBuffer(); //if pBuffer were deleted, this wouldn't work
}
Try realloc() if you know the amount of stuff in the buffer vs the remaining space in the buffer.
Using only a single raw pointer, no; but if you keep a size variable you can reset the buffer relatively easily.
However, since this is tagged as C++, I would caution you against doing this and propose an alternative instead. It lets memory be allocated and the buffer later be "reset" without deallocating the memory. As a side benefit, using std::vector means that you don't have to worry about leaking memory in subsequent calls to FillBuffer(), specifically when the existing buffer is too small and would need to be reallocated.
#include <vector>
class Blah
{
public:
std::vector<unsigned char> pBuffer;
Blah(){}
~Blah(){}
void FillBuffer()
{
//fill the buffer with data, doesn't matter how
}
void ResetBuffer()
{
pBuffer.clear();
// if you _really_ want the memory "pointed to" to be freed to the heap
// use the std::vector<> swap idiom:
// std::vector<unsigned char> empty_vec;
// pBuffer.swap(empty_vec);
}
};
Buffers typically need a maximum size and a current size. To "reset", you would set the current size to zero. When you use it again, you might need to grow or shrink the maximum size of the buffer. Use realloc or malloc/new and memcpy (which realloc does internally when growing) to move existing data to the new buffer.
Note that these are expensive operations. If you expect the buffer to need to grow from use to use, you might consider doubling its maximum size every time. This effectively amortizes the cost of the allocation and copy.
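The distinction between current size and maximum size can be illustrated with a small wrapper around std::vector, which tracks both. This is a sketch under assumptions: `ByteBuffer` and its member names are hypothetical, and the claim that `clear()` keeps the allocation reflects how the common standard library implementations behave.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical buffer wrapper: current size and maximum size (capacity)
// are both tracked by std::vector.
class ByteBuffer {
public:
    void fill(std::size_t count, unsigned char value) {
        bytes_.assign(count, value);  // grows the allocation only if needed
    }

    // "Reset" by setting the current size to zero; the allocation is kept.
    void reset() { bytes_.clear(); }

    // Reset and also hand the allocation back, using the swap idiom.
    void release() { std::vector<unsigned char>().swap(bytes_); }

    unsigned char at(std::size_t i) const {
        assert(i < bytes_.size());
        return bytes_[i];
    }

    std::size_t size() const { return bytes_.size(); }
    std::size_t capacity() const { return bytes_.capacity(); }

private:
    std::vector<unsigned char> bytes_;
};
```

Calling `reset()` between fills avoids repeated allocation, while `release()` is the choice when the memory really should go back to the heap.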

Who can tell me what this bit of C++ does?

CUSTOMVERTEX* pVertexArray;
if( FAILED( m_pVB->Lock( 0, 0, (void**)&pVertexArray, 0 ) ) ) {
return E_FAIL;
}
pVertexArray[0].position = D3DXVECTOR3(-1.0, -1.0, 1.0);
pVertexArray[1].position = D3DXVECTOR3(-1.0, 1.0, 1.0);
pVertexArray[2].position = D3DXVECTOR3( 1.0, -1.0, 1.0);
...
I've not touched C++ for a while, hence the topic, but this bit of code is confusing me. After m_pVB->Lock is called, the array is initialized.
This is great and all, but the problem I'm having is how this happens. The code underneath uses nine elements, but another function (pretty much a copy/paste) in the code I'm working with only accesses, say, four elements.
CUSTOMVERTEX is a struct, but I was under the impression that this doesn't matter and that an array of structs/objects needs to be initialized to a fixed size.
Can anyone clear this up?
Edit:
Given the replies, how does it know that I require nine elements in the array, or four etc...?
So as long as the buffer is big enough, the elements are legal. If so, this code is setting the buffer size if I'm not mistaken.
if( FAILED( m_pd3dDevice->CreateVertexBuffer( vertexCount * sizeof(CUSTOMVERTEX), 0, D3DFVF_CUSTOMVERTEX, D3DPOOL_DEFAULT, &m_pVB, NULL ) ) ) {
return E_FAIL;
}
m_pVB points to a graphics object, in this case presumably a vertex buffer. The data held by this object will not generally be in CPU-accessible memory - it may be held in onboard RAM of your graphics hardware or not allocated at all; and it may be in use by the GPU at any particular time; so if you want to read from it or write to it, you need to tell your graphics subsystem this, and that's what the Lock() function does - synchronise with the GPU, ensure there is a buffer in main memory big enough for the data and it contains the data you expect at this time from the CPU's point of view, and return to you the pointer to that main memory. There will need to be a corresponding Unlock() call to tell the GPU that you are done reading / mutating the object.
To answer your question about how the size of the buffer is determined, look at where the vertex buffer is being constructed - you should see a description of the vertex format and an element count being passed to the function that creates it.
You're passing a pointer to the CUSTOMVERTEX pointer (a pointer to a pointer) into the Lock function, so Lock itself must be creating the CUSTOMVERTEX object and setting your pointer to point to the object it creates.
In order to modify a vertex buffer in DX you have to lock it. To enforce this the DX API will only reveal the guts of a VB through calling Lock on it.
Your code is passing in the address of pVertexArray which Lock points at the VB's internal data. The code then proceeds to modify the vertex data, presumably in preparation for rendering.
You're asking the wrong question: it's not how it knows that you require x objects, it's how YOU know that IT requires x objects. You pass in a pointer to a pointer to your struct, and the function returns the pointer to your struct array, already allocated in memory (from when you first initialized the vertex buffer). Everything is always there; you're just requesting a pointer to the array to work with it, and then you "release it" so DX knows to read the vertex buffer and upload it to the GPU.
When you created the vertex buffer, you had to specify a size. When you call Lock(), you're passing 0 as the size to lock, which tells it to lock the entire size of the vertex buffer.
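The pattern the answers describe, where storage is sized once at creation and Lock only hands back a pointer to it, can be imitated with a small mock. This is not the Direct3D API: `MockVertexBuffer`, `Vertex`, and the member names are hypothetical stand-ins meant only to show the out-parameter mechanics.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vertex {
    float x, y, z;
};

// Hypothetical stand-in for a vertex buffer; not the Direct3D API.
// The storage is sized once, at creation, much like a vertex buffer that is
// created with vertexCount * sizeof(CUSTOMVERTEX) bytes.
class MockVertexBuffer {
public:
    explicit MockVertexBuffer(std::size_t vertexCount) : storage_(vertexCount) {}

    // Hands out a pointer to the existing storage through an out-parameter.
    // Nothing is allocated here. Returns false if the buffer is already locked.
    bool Lock(void** outData) {
        assert(outData != nullptr);
        if (locked_) return false;
        locked_ = true;
        *outData = storage_.data();
        return true;
    }

    void Unlock() { locked_ = false; }

    std::size_t count() const { return storage_.size(); }
    const Vertex& at(std::size_t i) const { return storage_[i]; }

private:
    std::vector<Vertex> storage_;
    bool locked_ = false;
};
```

A caller would pass the address of a `void*`, cast the result to `Vertex*`, and write at most `count()` elements. The element count is known to the caller from creation time, not discovered by Lock.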