I'm trying to gain access to vertex buffers on the GPU. Specifically, I need to do some calculations with the vertices, so I attempt to map the resource (the vertex buffer) and copy it into system memory so the CPU can access the vertices. I put the code together using the following SO thread: How to read vertices from vertex buffer in Direct3d11
Here is my code:
HRESULT hr = pSwapchain->GetDevice(__uuidof(ID3D11Device), (void**)&pDevice);
if (FAILED(hr))
    return false;
pDevice->GetImmediateContext(&pContext);
pContext->OMGetRenderTargets(1, &pRenderTargetView, nullptr);

// Vertex buffer
ID3D11Buffer* veBuffer;
UINT Stride;
UINT veBufferOffset;
pContext->IAGetVertexBuffers(0, 1, &veBuffer, &Stride, &veBufferOffset);

D3D11_MAPPED_SUBRESOURCE mapped_rsrc;
pContext->Map(veBuffer, 0, D3D11_MAP_READ, 0, &mapped_rsrc);
void* vert = new BYTE[mapped_rsrc.DepthPitch]; // DirectX crashes on this line...
memcpy(vert, mapped_rsrc.pData, mapped_rsrc.DepthPitch);
pContext->Unmap(veBuffer, 0);
I'm somewhat of a newbie when it comes to C++, so my assumptions may be incorrect. The value that mapped_rsrc.DepthPitch holds after the call is quite large: 343597386. According to the documentation listed below, DepthPitch is measured in bytes. If I replace the allocation size with a much smaller number, like 10, the code runs just fine. From what I read about the Map() function here: https://learn.microsoft.com/en-us/windows/win32/api/d3d11/ns-d3d11-d3d11_mapped_subresource
It states:
Note The runtime might assign values to RowPitch and DepthPitch that
are larger than anticipated because there might be padding between
rows and depth.
Could this have something to do with the large value that is being returned? If so, does that mean I have to parse DepthPitch to remove any unneeded data? Or maybe it is an issue with the way vert is initialized?
There was no vertex buffer bound, so your IAGetVertexBuffers call returned nothing: veBuffer is null, the Map call fails (its HRESULT is never checked), and mapped_rsrc is left uninitialized, which is why DepthPitch holds a garbage value like 343597386. You have to create a vertex buffer first.
See Microsoft Docs: How to Create a Vertex Buffer
As someone new to DirectX 11, you should take a look at DirectX Tool Kit.
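Once a vertex buffer actually exists and is bound, note that a D3D11_USAGE_DEFAULT buffer still cannot be mapped with D3D11_MAP_READ; the usual route is to copy it into a staging buffer first. A minimal sketch, assuming pDevice, pContext, and a valid veBuffer as in the question (error handling mostly elided):

```cpp
// Readback via a staging copy: DEFAULT-usage resources are not CPU-readable,
// so clone the buffer description with staging usage and copy into it.
D3D11_BUFFER_DESC desc = {};
veBuffer->GetDesc(&desc);
desc.Usage = D3D11_USAGE_STAGING;
desc.BindFlags = 0;
desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
desc.MiscFlags = 0;

ID3D11Buffer* staging = nullptr;
if (FAILED(pDevice->CreateBuffer(&desc, nullptr, &staging)))
    return false;

pContext->CopyResource(staging, veBuffer);

D3D11_MAPPED_SUBRESOURCE mapped = {};
if (SUCCEEDED(pContext->Map(staging, 0, D3D11_MAP_READ, 0, &mapped)))
{
    // For a buffer, desc.ByteWidth is the reliable size to copy;
    // RowPitch/DepthPitch are meaningful mainly for textures.
    std::vector<BYTE> vertices(desc.ByteWidth);
    memcpy(vertices.data(), mapped.pData, desc.ByteWidth);
    pContext->Unmap(staging, 0);
}
staging->Release();
```

Checking every HRESULT, including the one from Map in the original snippet, would have surfaced the real failure immediately.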
Goal
I want to set the size of a 'shared' buffer in GLSL at runtime.
The question is: "How do I create shared memory from a Vulkan host C/C++ program?"
Example
In OpenCL, the kernel function below takes a '__local' argument.
// foo.cl
__kernel void foo(__global float *dst, __global float *src, __local float *buf) {
/*do something great*/
}
and in the host C++ program, I set the size of the __local memory and pass it as a kernel argument.
int main() {
    ...
    cl_uint args_index = 2;
    foo.setArg(args_index, __local(128)); // allocate 128 bytes of local memory and pass it to the kernel
}
I want to do the same thing with a Vulkan compute pipeline, and I tried the following.
GLSL
//foo.comp
#version 450
layout(binding=0) buffer dstBuffer{
uint dst[];
};
layout(binding=1) buffer srcBuffer{
uint src[];
};
// try 1
layout(binding=2) uniform SharedMemSize{
uint size[];
};
shared uint buf[size[0]]; // compile Error! msg : array size must be a constant integer expression
// try 2
layout(binding=2) shared SharedBuffer{
uint buf[];
}; // compile Error! msg :'shared block' not supported
//try 3
layout(binding=2) shared uint buf[]; // compile Error! msg : binding requires uniform or buffer.
All of the above attempts failed. I need your help; thanks.
GLSL has shared variables, which represent storage accessible to any member of a work group. However, it doesn't have "shared memory" in the same way as OpenCL. You cannot directly upload data to shared variables from the CPU, for example. You can't get a pointer to shared variables.
The closest you might get to this is to have some kind of shared variable array whose size is determined from outside of the shader. But even then, you're not influencing the size of memory in total; you're influencing the size of just that array.
And you can sort of do that. SPIR-V in Vulkan allows specialization constants. These are values specified by the outside world during the compilation process for the shader object. Such constants can be used as the size of a shared variable array. Of course, this means that changing the size requires a full recompile/relink process.
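A sketch of what that looks like in GLSL (the names and binding layout mirror the question; constant_id 0 is an arbitrary choice):

```glsl
#version 450
layout(local_size_x = 64) in;

layout(binding = 0) buffer dstBuffer { uint dst[]; };
layout(binding = 1) buffer srcBuffer { uint src[]; };

// Overridable from the host at pipeline-creation time; 128 is only the default.
layout(constant_id = 0) const uint BUF_SIZE = 128u;
shared uint buf[BUF_SIZE];
```

On the host side you fill in a VkSpecializationMapEntry (constantID = 0) and a VkSpecializationInfo, and point the compute stage's pSpecializationInfo at it when creating the pipeline. Picking a different size means creating a new pipeline, though the SPIR-V module itself can be reused.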
From the man page for vkEnumerateDeviceExtensionProperties,
vkEnumerateDeviceExtensionProperties retrieves properties for
extensions on a physical device whose handle is given in
physicalDevice. To determine the extensions implemented by a layer set
pLayerName to point to the layer’s name and any returned extensions
are implemented by that layer. Setting pLayerName to NULL will return
the available non-layer extensions. pPropertyCount must be set to the
size of the VkExtensionProperties array pointed to by pProperties. The
pProperties should point to an array of VkExtensionProperties to be
filled out or null. If null, vkEnumerateDeviceExtensionProperties will
update pPropertyCount with the number of extensions found. The
definition of VkExtensionProperties is as follows:
(emphasis mine). It seems that in the current implementation (Vulkan SDK v1.0.13), pPropertyCount is updated with the number of extensions regardless of whether pProperties is null or not. However, the documentation doesn't appear to be explicit about what happens in this situation.
Here's an example of why having such a feature is 'nicer':
const uint32_t MaxCount = 1024; // More than you'll ever need
uint32_t ActualCount = MaxCount;
VkLayerProperties layers[MaxCount];
VkResult result = vkEnumerateDeviceLayerProperties(physicalDevice, &ActualCount, layers);
//...
vs.
uint32_t ActualCount = 0;
VkLayerProperties* layers = nullptr;
VkResult result = vkEnumerateDeviceLayerProperties(physicalDevice, &ActualCount, nullptr);
if (ActualCount > 0)
{
    layers = (VkLayerProperties*)alloca(ActualCount * sizeof(VkLayerProperties));
    result = vkEnumerateDeviceLayerProperties(physicalDevice, &ActualCount, layers);
    //...
}
My question is: am I depending on unsupported functionality by doing this, or is this somehow advertised somewhere else in the documentation?
From the latest spec:
For both vkEnumerateInstanceExtensionProperties and vkEnumerateDeviceExtensionProperties, if pProperties is NULL, then the number of extensions properties available is returned in pPropertyCount. Otherwise, pPropertyCount must point to a variable set by the user to the number of elements in the pProperties array, and on return the variable is overwritten with the number of structures actually written to pProperties. If pPropertyCount is less than the number of extension properties available, at most pPropertyCount structures will be written. If pPropertyCount is smaller than the number of extensions available, VK_INCOMPLETE will be returned instead of VK_SUCCESS, to indicate that not all the available properties were returned.
So your approach is correct, even though it's a bit wasteful on memory. Similar functions returning arrays also behave like this.
Also note that since 1.0.13, device layers are deprecated. All instance layers are able to intercept commands to both the instance and the devices created from it.
Most Vulkan commands rely on a double call:
First call: get the count of structures or handles that will be returned.
Second call: pass a properly sized array to get the requested structures/handles back. In this second call, the count parameter tells Vulkan the size of your array.
If, in the second step, you get VkResult::VK_INCOMPLETE, then the array you passed was too short to get all the objects back. Note that VK_INCOMPLETE is not an error; it is a partial success (2.6.2 Return Codes: "All successful completion codes are non-negative values.")
Your Question :
Am I depending on unsupported functionality by doing
this, or is this somehow advertised somewhere else in the
documentation?
You proposed creating a big array before calling the function, to avoid calling the Vulkan function twice.
My reply: yes, you are depending on it, and you are making a bad design decision by "guessing" the array size.
Please don't get me wrong. I strongly agree with you that it is annoying to call the same function twice, but you can solve that by wrapping these sorts of functions with more programmer-friendly behaviour.
I'll use another Vulkan function just to illustrate. Let's say you want to avoid the double call to:
VkResult vkEnumeratePhysicalDevices(
VkInstance instance,
uint32_t* pPhysicalDeviceCount,
VkPhysicalDevice* pPhysicalDevices);
A possible solution would be write the sweet wrap function:
VkResult getPhysicalDevices(VkInstance instance, std::vector<VkPhysicalDevice>& container) {
    uint32_t count = 0;
    VkResult res = vkEnumeratePhysicalDevices(instance, &count, nullptr); // first call: get count
    if (res < 0) // something went wrong here
        return res;
    container.resize(count); // removes extra entries or allocates more
    res = vkEnumeratePhysicalDevices(instance, &count, container.data()); // second call: all entries overwritten
    return res; // possibly OK
}
Those are my two cents about the double call to Vulkan functions. It is a naive implementation and may not work for all cases! Note that the caller creates the vector and passes it in by reference.
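The same idiom can be wrapped generically, and made robust against the count changing between the two calls (the situation where the second call returns VK_INCOMPLETE). A sketch, with a minimal stand-in Result enum whose values mirror the corresponding VkResult codes (names here are mine):

```cpp
#include <cstdint>
#include <vector>

// Minimal stand-in for VkResult; the values match the real enum.
enum Result : int32_t { SUCCESS = 0, INCOMPLETE = 5 };

// Generic wrapper for the count/fill two-call idiom. `Enumerate` is any
// callable with the shape Result(uint32_t* count, T* out), where out == null
// means "just give me the count". Retries on INCOMPLETE, which can happen if
// the count grows between the two calls.
template <typename T, typename Enumerate>
Result enumerateAll(Enumerate enumerate, std::vector<T>& container) {
    Result res;
    do {
        uint32_t count = 0;
        res = enumerate(&count, (T*)nullptr);      // first call: query the count
        if (res < 0 || count == 0) {               // real error, or nothing to return
            container.clear();
            return res;
        }
        container.resize(count);
        res = enumerate(&count, container.data()); // second call: fill the array
        container.resize(count);                   // the count may have shrunk
    } while (res == INCOMPLETE);                   // count grew in between: retry
    return res;
}
```

The wrapper is independent of Vulkan itself, so it can be unit-tested with a mock enumerator, and reused for vkEnumeratePhysicalDevices, vkEnumerateDeviceExtensionProperties, and friends via small lambdas.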
Good Luck!
I allocated some space, wrote some asm into it, and tried to start a thread at that address.
But I keep getting an access violation. It's supposed to push four 0s and call the MessageBoxA function, but right at the start of the allocated area it gets an access violation.
How can I get it to run like normal code?
void test2()
{
    byte* area;
    HANDLE process;
    area = new byte[1024];
    for (int i = 0; i < 1024; i++)
        area[i] = 0;
    memmove((char*)area, "\x6a\x00\x6a\x00\x6a\x00\x6a\x00\xE8", 9);
    *(DWORD*)&area[9] = ((DWORD)GetProcAddress(GetModuleHandle("User32.dll"), "MessageBoxA") - (DWORD)&area[9] - 4);
    memmove((char*)&area[13], "\x33\xc0\xc3", 3);
    VirtualProtect(area, 17, PAGE_EXECUTE_READWRITE, 0);
    CreateThread(0, 0, (LPTHREAD_START_ROUTINE)area, 0, 0, 0);
}
Here's a screenshot of the disassembly:
http://screensnapr.com/v/P33NsH.png
The VirtualProtect() call doesn't do anything in this case: it simply fails, because it expects the 4th parameter to be a valid pointer to a memory location that receives the previous access-protection flags (so you can restore them later). Since the protection never changes, the CPU refuses to execute this page and you get the GPF at the very first instruction.
You also need to use PAGE_EXECUTE_READ for the flag, otherwise the first heap operation (even read access to any other variable in the heap, which happens to touch the same page) will generate GPF. Alternatively, use VirtualAlloc(), instead of allocating on the heap.
Note, I didn't check the rest of the code, so there might be some other issues with it. Also note that this is not the way to write assembly, unless you're writing an exploit (messing with VirtualProtect() is a sure sign of that). Here's to hoping that I'm wrong in my assumption about the exploit.
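For reference, a sketch of the corrected sequence (Windows-only; the helper name is mine). The fixes are the ones described above: allocate the page yourself with VirtualAlloc instead of new[], pass a real DWORD* so VirtualProtect can succeed, and check its return value:

```cpp
#include <windows.h>
#include <cstring>

// Copies machine code into a page we own and runs it on a new thread.
void runGeneratedCode(const BYTE* code, SIZE_T len)
{
    BYTE* area = (BYTE*)VirtualAlloc(nullptr, len, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (!area)
        return;
    memcpy(area, code, len);

    DWORD oldProtect = 0; // receives the previous protection; must not be null
    if (!VirtualProtect(area, len, PAGE_EXECUTE_READ, &oldProtect))
    {
        VirtualFree(area, 0, MEM_RELEASE); // the original code ignored this failure
        return;
    }
    FlushInstructionCache(GetCurrentProcess(), area, len);

    HANDLE thread = CreateThread(nullptr, 0, (LPTHREAD_START_ROUTINE)area, nullptr, 0, nullptr);
    if (thread)
    {
        WaitForSingleObject(thread, INFINITE);
        CloseHandle(thread);
    }
    VirtualFree(area, 0, MEM_RELEASE);
}
```

Writing the code first and only then flipping the page to PAGE_EXECUTE_READ also keeps the page from ever being writable and executable at the same time.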
Using OpenGL 3.3, Radeon HD 3870, C++.
I have a question about interleaved arrays of data. My application keeps a structure in a vector, which is sent as data to a buffer object. Something like this:
struct data {
    int a;
    int b;
    int c;
};
std::vector<data> datVec;
...
glBufferData(GL_ARRAY_BUFFER, sizeof(data) * datVec.size(), &datVec[0], GL_DYNAMIC_DRAW);
This is fine; I use this pattern very often. But what I create is an interleaved array, so the data are laid out like:
a1,b1,c1,a2,b2,c2,a3,b3,c3
Now I send this down for processing on the GPU, and with transform feedback I read back, for example, the b variables into a buffer. So it looks like:
bU1, bU2, bU3
I'd like to copy the updated values back into the interleaved buffer. Can this be done with a single command like glCopyBufferSubData? That one isn't suitable, as it only takes an offset and a size, not a stride (it's essentially the buffer equivalent of memcpy)... The result should look like:
a1, bU1, c1, a2, bU2, c2, a3, bU3, c3
If not, is there a better approach than these two of mine?
map the updated buffer, copy its values into temporary storage in the app, unmap it, then map the data buffer and iterate through it setting the new values
split the data into a constant buffer and a variable buffer; the constant one stays the same over time, while the variable one can be updated in a single glCopyBufferSubData call
Thanks
glMapBuffer seems like a better solution for what you are doing.
The basic idea, from what I can tell, is to map the buffer into your address space and then update it manually with your own update method (likely an iterative loop).
glBindBuffer(GL_ARRAY_BUFFER, buffer_id);
void* mapped = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
if (mapped == NULL)
    return; // handle error; usually means lack of virtual memory
int* buffer = (int*)mapped; // a void* can't be indexed, so cast to the element type
for (int i = 1; i < bufferLen; i += stride /* 3, in this case */)
    buffer[i] = newValue;
glUnmapBuffer(GL_ARRAY_BUFFER);
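The stride-walk itself involves no GL at all, so it can be sketched (and tested) CPU-side; the struct mirrors the question's data layout, and with a mapped buffer the same loop runs over the pointer returned by glMapBuffer:

```cpp
#include <cstddef>
#include <vector>

// One interleaved record, mirroring the question's struct: a, b, c per vertex.
struct data { int a; int b; int c; };

// Writes each updated b value back into its slot of the interleaved array,
// leaving the a and c components untouched.
void scatterB(std::vector<data>& interleaved, const std::vector<int>& updatedB) {
    const std::size_t n = interleaved.size() < updatedB.size()
                              ? interleaved.size() : updatedB.size();
    for (std::size_t i = 0; i < n; ++i)
        interleaved[i].b = updatedB[i];
}
```

With glMapBufferRange and GL_MAP_FLUSH_EXPLICIT_BIT you can additionally limit which byte ranges the driver has to re-upload after the scatter.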
I would separate the dynamic part from the static one (your point 2).
If you still want to keep them interleaved in a single buffer, and you have some spare video memory, you can do the following:
Copy the original interleaved array into a backup one. This requires memory for all components rather than only the dynamic ones, as it was originally.
Transform feedback into the original interleaved buffer, carrying the static values over unchanged.
CUSTOMVERTEX* pVertexArray;
if( FAILED( m_pVB->Lock( 0, 0, (void**)&pVertexArray, 0 ) ) ) {
    return E_FAIL;
}
pVertexArray[0].position = D3DXVECTOR3(-1.0, -1.0, 1.0);
pVertexArray[1].position = D3DXVECTOR3(-1.0,  1.0, 1.0);
pVertexArray[2].position = D3DXVECTOR3( 1.0, -1.0, 1.0);
...
I've not touched C++ for a while (hence the topic), but this bit of code confuses me. After m_pVB->Lock is called, the array is initialized.
This is great and all, but the problem I'm having is how this happens. The code underneath uses nine elements, yet another function (pretty much a copy/paste of the code I'm working with) only accesses, say, four elements.
CUSTOMVERTEX is a struct, but I was under the impression that this doesn't matter and that an array of structs/objects needs to be initialized to a fixed size.
Can anyone clear this up?
Edit:
Given the replies, how does it know that I require nine elements in the array, or four, etc.?
So as long as the buffer is big enough, the accesses are legal. If so, this code sets the buffer size, if I'm not mistaken:
if( FAILED( m_pd3dDevice->CreateVertexBuffer( vertexCount * sizeof(CUSTOMVERTEX), 0,
                                              D3DFVF_CUSTOMVERTEX, D3DPOOL_DEFAULT,
                                              &m_pVB, NULL ) ) ) {
    return E_FAIL;
}
m_pVB points to a graphics object, in this case presumably a vertex buffer. The data held by this object will not generally be in CPU-accessible memory: it may live in the onboard RAM of your graphics hardware, or not be allocated at all, and it may be in use by the GPU at any particular time. So if you want to read from it or write to it, you need to tell your graphics subsystem, and that's what the Lock() function does. It synchronises with the GPU, ensures there is a buffer in main memory big enough for the data and that it contains the data you expect at this time from the CPU's point of view, and returns the pointer to that main memory. There must be a corresponding Unlock() call to tell the GPU that you are done reading or mutating the object.
To answer your question about how the size of the buffer is determined, look at where the vertex buffer is being constructed - you should see a description of the vertex format and an element count being passed to the function that creates it.
You're passing a pointer to the CUSTOMVERTEX pointer (a pointer to a pointer) into the Lock function, so Lock itself must be creating the CUSTOMVERTEX object and setting your pointer to point at the object it creates.
In order to modify a vertex buffer in DX you have to lock it. To enforce this the DX API will only reveal the guts of a VB through calling Lock on it.
Your code is passing in the address of pVertexArray which Lock points at the VB's internal data. The code then proceeds to modify the vertex data, presumably in preparation for rendering.
You're asking the wrong question: it's not how does it know that you require x objects, it's how do YOU know that IT requires x objects. You pass a pointer to a pointer to your struct in, and the function returns a pointer to your struct array, already allocated in memory (from when you first initialized the vertex buffer). Everything is always there; you're just requesting a pointer to the array to work with it, then "releasing" it (Unlock) so DX knows to read the vertex buffer and upload it to the GPU.
When you created the vertex buffer, you had to specify a size. When you call Lock(), you're passing 0 as the size to lock, which tells it to lock the entire size of the vertex buffer.
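The out-parameter pattern that causes the confusion above can be shown without any D3D at all: a toy buffer whose Lock() hands the caller a pointer to storage the buffer already owns (everything here is illustrative, not the real D3D interface):

```cpp
#include <cstddef>
#include <vector>

struct Vertex { float x, y, z; };

// Toy stand-in for a vertex buffer: the storage is allocated once, at
// construction time, which is why the caller never allocates the array.
class ToyVertexBuffer {
public:
    explicit ToyVertexBuffer(std::size_t count) : storage_(count) {}

    // Mirrors the pointer-to-pointer out-parameter style of
    // IDirect3DVertexBuffer9::Lock: writes the address of the internal
    // storage into the caller's pointer.
    bool Lock(Vertex** out) { *out = storage_.data(); return true; }

    void Unlock() {} // a real buffer would hand the data back to the GPU here

    std::size_t size() const { return storage_.size(); }

private:
    std::vector<Vertex> storage_;
};
```

Usage follows the same shape as the D3D code: `Vertex* verts = nullptr; vb.Lock(&verts); verts[0] = {...}; vb.Unlock();`. Writing past `vb.size()` elements is exactly as illegal as overrunning the size passed to CreateVertexBuffer.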