OpenGL: why does glMapNamedBuffer() return GL_INVALID_OPERATION?

Using OpenGL 4.6, I have the following (abbreviated) code, in which I create a buffer and then attempt to map it in order to copy data over using memcpy():
glCreateBuffers(buffers.size(), buffers.data()); // buffers is a std::array of GLuints
// ...
glNamedBufferStorage(buffers[3], n * sizeof(glm::vec4), nullptr, 0); // I also tried GL_DYNAMIC_STORAGE_BIT
// ...
void* bfrptr = glMapNamedBuffer(buffers[3], GL_WRITE_ONLY);
This last call fails with GL_INVALID_OPERATION. I am sure that this is the call that generates the error, as I check for OpenGL errors right before it as well. The manpage suggests that this error is only generated if the given buffer handle is not the name of an existing buffer object, but I'm sure I created it. Is there anything else I'm missing or that I'm doing wrong?

When you create immutable buffer storage, you must tell OpenGL how you intend to access that storage from the CPU. These are not "usage hints"; these are requirements, a contract between yourself and OpenGL which GL will hold you to.
You passed 0 for the access mask. That means that you told OpenGL (among other things) that you were not going to access it by mapping it. Which you then tried to do.
So it didn't let you.
If you want to map an immutable buffer, you must tell OpenGL at storage allocation time that you're going to do that. Specifically, if you want to map it for writing, you must use the GL_MAP_WRITE_BIT flag in the gl(Named)BufferStorage call.
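The contract described above can be modelled in a few lines. This is only a sketch of the rule, not driver code: the constants are copied from the OpenGL headers so that it compiles without a GL context, and the helpers requiredStorageFlags and mappingAllowed are hypothetical names introduced for illustration.

```cpp
#include <cstdint>

// Values copied from the OpenGL headers so the sketch compiles without a GL context.
constexpr std::uint32_t kMapReadBit        = 0x0001; // GL_MAP_READ_BIT
constexpr std::uint32_t kMapWriteBit       = 0x0002; // GL_MAP_WRITE_BIT
constexpr std::uint32_t kDynamicStorageBit = 0x0100; // GL_DYNAMIC_STORAGE_BIT
constexpr std::uint32_t kReadOnly          = 0x88B8; // GL_READ_ONLY
constexpr std::uint32_t kWriteOnly         = 0x88B9; // GL_WRITE_ONLY
constexpr std::uint32_t kReadWrite         = 0x88BA; // GL_READ_WRITE

// Hypothetical helper: the storage flags that a glMapNamedBuffer access value needs.
// Returns 0 for an access value that is not one of the three valid enums.
inline std::uint32_t requiredStorageFlags(std::uint32_t access) {
    if (access == kReadOnly)  return kMapReadBit;
    if (access == kWriteOnly) return kMapWriteBit;
    if (access == kReadWrite) return kMapReadBit | kMapWriteBit;
    return 0u;
}

// Hypothetical helper: true if a buffer whose storage was created with
// storageFlags may be mapped with the given access value.
inline bool mappingAllowed(std::uint32_t storageFlags, std::uint32_t access) {
    const std::uint32_t needed = requiredStorageFlags(access);
    return needed != 0u && (storageFlags & needed) == needed;
}
```

In this model, both flag values tried in the question (0 and GL_DYNAMIC_STORAGE_BIT) reject a GL_WRITE_ONLY mapping. Passing GL_MAP_WRITE_BIT as the last argument of glNamedBufferStorage for buffers[3] is what makes the mapping legal.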

Related

How to read subresource data in DirectX12?

I have a DDS texture that I am creating using the CreateDDSTextureFromMemory12 function from Microsoft's DDSTextureLoader helper library. The texture has 10 mipmaps. I am able to create the texture and use it without any problem.
What I would like to do is read the texture data for a specific mipmap at a given index.
Here I am trying to read the data at the subresource index 5:
DirectX::CreateDDSTextureFromMemory12(
g_device,
g_cmd_list,
&bytes.front(),
file_size,
texResource,
tmpUploadHeap);
void* pData = nullptr;
texResource->ReadFromSubresource(pData, 64, 1, 5, nullptr);
However I am getting the following error:
D3D12 ERROR: ID3D12Resource1::ID3D12Resource::ReadFromSubresource:
ReadFromSubresource can not be called on a resource associated with a heap that has the CPU page properties of D3D12_CPU_PAGE_PROPERTY_NOT_AVAILABLE.
Heaps of the type D3D12_HEAP_TYPE_DEFAULT should be assumed to have these properties.
[ RESOURCE_MANIPULATION ERROR #895: READFROMSUBRESOURCE_INVALIDRESOURCE]
The easiest way is to use LoadDDSTextureFromMemory from the DirectXTK12 project on GitHub. Its 5th parameter returns all the subresources in a std::vector.

Vulkan: AMD vkCmdDebugMarkerBeginEXT only found with vkGetInstanceProcAddr

I've run into some odd behavior with getting a handle to vkCmdDebugMarkerBeginEXT using vkGetDeviceProcAddr, which differs between AMD and Nvidia. However, using vkGetInstanceProcAddr works.
VkDevice device = ...; // valid initialized device
VkInstance instance = ...; // valid initialized instance
PFN_vkVoidFunction fnDevice = vkGetDeviceProcAddr(device, "vkCmdDebugMarkerBeginEXT");
// fnDevice == nullptr on AMD. Non-null on Nvidia
PFN_vkVoidFunction fnInstance = vkGetInstanceProcAddr(instance, "vkCmdDebugMarkerBeginEXT");
// fnInstance == Non-null on both
From the layer interface documentation:
vkGetDeviceProcAddr can only be used to query for device extension or
core device entry points. Device entry points include any command that
uses a VkDevice as the first parameter or a dispatchable object that
is a child of a VkDevice (currently this includes VkQueue and
VkCommandBuffer). vkGetInstanceProcAddr can be used to query either
device or instance extension entry points in addition to all core
entry points.
The prototype for vkCmdDebugMarkerBeginEXT seems to match this description:
VKAPI_ATTR void VKAPI_CALL vkCmdDebugMarkerBeginEXT(
VkCommandBuffer commandBuffer,
VkDebugMarkerMarkerInfoEXT* pMarkerInfo);
While I can quite easily call the device version, and if this fails, call the instance version (to avoid the extra dispatch cost, if possible), I'm wondering if this is expected behavior, or a driver bug?
Yes, vkCmdDebugMarkerBeginEXT fits that description.
You should quote the Vulkan specification instead, which IMO is the more authoritative source on this matter.
There is one additional requirement: the particular extension has to be enabled on that device for vkGetDeviceProcAddr to work. If the extension is enabled and the call still returns null, that looks like a driver bug.
In fact, the in-spec Example 2 does use vkGetDeviceProcAddr.
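The two points above (check that the extension is enabled, and fall back to the instance-level lookup if the device-level one returns null) can be sketched without the Vulkan headers. Everything here is an assumption made for illustration: VoidFn stands in for PFN_vkVoidFunction, the Loader callbacks stand in for wrappers around vkGetDeviceProcAddr and vkGetInstanceProcAddr, and extensionEnabled and resolveCommand are hypothetical helper names.

```cpp
#include <cstring>
#include <functional>
#include <vector>

using VoidFn = void (*)();                          // stand-in for PFN_vkVoidFunction
using Loader = std::function<VoidFn(const char*)>;  // stand-in for a vkGet*ProcAddr wrapper

// Hypothetical helper: was `name` in the list passed as the device's enabled extensions?
inline bool extensionEnabled(const std::vector<const char*>& enabledExtensions,
                             const char* name) {
    for (const char* ext : enabledExtensions) {
        if (std::strcmp(ext, name) == 0) return true;
    }
    return false;
}

// Hypothetical helper: prefer the device-level entry point (no extra dispatch),
// and fall back to the instance-level lookup if the device lookup returns null.
inline VoidFn resolveCommand(const Loader& deviceLoader,
                             const Loader& instanceLoader,
                             const char* name) {
    if (VoidFn fn = deviceLoader(name)) return fn;
    return instanceLoader(name);
}
```

In real code the device loader would be a lambda that calls vkGetDeviceProcAddr(device, name), and the list checked by extensionEnabled would be the one used at device creation, which must contain "VK_EXT_debug_marker" for this command.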

Source Filter cBuffers > 1 and GetDeliveryBuffer

I'm writing a source filter for DirectShow. The Intel Media SDK H.264 Encoder requires ALLOCATOR_PROPERTIES->cBuffers > 1.
In DoBufferProcessingLoop, I get the buffer using GetDeliveryBuffer(&pSample, NULL, NULL, 0).
Do I need to do anything to make sure I get the next buffer, and that I'm not overwriting the previous buffer?
I noticed the pSample->AddRef() in the sample encoder. Do I have to do something similar when I call GetDeliveryBuffer or in FillBuffer?
The buffer won't be reused until the only reference to the buffer is the reference from its owning memory allocator.
This means that in DoBufferProcessingLoop you get a clean buffer, you fill it, and then you pass it downstream. The buffer becomes available for reuse only once it has been discarded or presented and nobody else is holding a reference to it. You don't need to do anything to ensure this; it happens on its own.

Accessing buffer using C++-AMP

Could somebody please help me understand exactly the step that is not working here?
I am trying to use C++-AMP to do parallel-for loops, however despite having no trouble or errors going through my process, I can't get my final data.
I want to pull out my data by means of mapping it
m_pDeviceContext->Map(pBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &MappedResource);
{
blah
}
But I've worked on this for days on end without even a single inch of progress.
Here is everything I do with C++-AMP:
Constructor: I initialise my variables because I have to
: m_AcceleratorView(concurrency::direct3d::create_accelerator_view(reinterpret_cast<IUnknown *>(_pDevice)))
, m_myArray(_uiNumElement, m_AcceleratorView)
I copy my initial data into the C++-AMP array
concurrency::copy(Data.begin(), m_myArray);
I do stuff to the data
concurrency::parallel_for_each(...) restrict(amp)
{
blah
}
All of this seems fine, I run into no errors.
However the next step I want to do is pull the data from the buffer, which doesn't seem to work:
ID3D11Buffer* pBuffer = reinterpret_cast<ID3D11Buffer *>(concurrency::direct3d::get_buffer(m_myArray));
When I map this data (deviceContext->Map) the data inside is 0x00000000
What step am I forgetting that will allow me to read this data? Even when I try to set the CPU read/write access type I get an error, and I didn't even see any of my references do it that way either:
m_Accelerator.set_default_cpu_access_type(concurrency::access_type::access_type_read_write);
This creates an error to say "accelerator does not support zero copy"
Can anyone please help me and tell me why I can't read my buffer, and how to fix it?
The following code should work for this. You should also check that the DX device you use and the C++ AMP accelerator are associated with the same hardware.
HRESULT hr = S_OK;
array<int, 1> arr(1024);
CComPtr<ID3D11Buffer> buffer;
IUnknown* unkBuf = get_buffer(arr);
hr = unkBuf->QueryInterface(__uuidof(ID3D11Buffer), reinterpret_cast<LPVOID*>(&buffer));
This question has an answer that shows you how to do the opposite.
Reading Buffer Data using C++ AMP

parallelize a video transformation program with tbb

So, I am given a program in c++ and I have to parallelize it using TBB (make it faster). As I looked into the code I thought that using pipeline would make sense. The problem is that I have little experience and whatever I found on the web confused me even more. Here is the main part of the code:
uint64_t cbRaw=uint64_t(w)*h*bits/8;
std::vector<uint64_t> raw(cbRaw/8);
std::vector<uint32_t> pixels(w*h);
while(1){
    if(!read_blob(STDIN_FILENO, cbRaw, &raw[0]))
        break; // No more images
    unpack_blob(w, h, bits, &raw[0], &pixels[0]);
    process(levels, w, h, bits, pixels);
    //invert(levels, w, h, bits, pixels);
    pack_blob(w, h, bits, &pixels[0], &raw[0]);
    write_blob(STDOUT_FILENO, cbRaw, &raw[0]);
}
It actually reads a video file, unpacks it, applies the transformation, packs it and then writes it to the output. It seems pretty straightforward, so if you have any ideas or resources that could be helpful please share.
Thanx in advance,
D. Christ.
Indeed you can use tbb::parallel_pipeline to process multiple video "blobs" in parallel.
The basic scheme is a 3-stage pipeline: an input filter reads a blob, a middle filter processes it, and the last one writes the processed blob into the file. The input and output filters should be serial_in_order, and the middle filter can be parallel. Unpacking and packing seemingly might be done in either the middle stage (I would start with that, to minimize the amount of work in the serial stages) or in the input & output stages (but that could be slower).
You will also need to ensure that the data storage (raw and pixels in your case) is not shared between concurrently processed blobs. Perhaps the easiest way is to have per-blob storage which is passed through the pipeline. Unlike in the serial program, it will be impossible to use automatic variables for the storage that needs to be passed between pipeline stages; thus, you will need to allocate your storage with new in the input filter, pass it by reference (or via a pointer) through the pipeline, and then delete it in the output filter after all processing is done. This is surely necessary for the raw storage. For pixels, however, you can keep using an automatic variable if all operations that need it (i.e. unpacking, processing, and packing the result) are done within the body of the middle filter. Of course, the declaration of the variable should move there as well.
Let me sketch a modification to your serial code to make it more ready for applying parallel_pipeline. Note that I changed raw to be a dynamically allocated array, rather than std::vector; the code you showed seemingly did not use it as a vector anyway. Be aware that it's just a sketch, and it might not work as is.
uint64_t cbRaw=uint64_t(w)*h*bits/8;
uint64_t * raw; // now a pointer to a dynamically allocated array
while(1){
    { // The input stage
        raw = new uint64_t[cbRaw/8];
        if(!read_blob(STDIN_FILENO, cbRaw, raw)) {
            delete[] raw;
            break; // No more images
        }
    }
    { // The second stage
        std::vector<uint32_t> pixels(w*h);
        unpack_blob(w, h, bits, raw, &pixels[0]);
        process(levels, w, h, bits, pixels);
        //invert(levels, w, h, bits, pixels);
        pack_blob(w, h, bits, &pixels[0], raw);
    }
    { // The output stage
        write_blob(STDOUT_FILENO, cbRaw, raw);
        delete[] raw;
    }
}
There is a tutorial on the pipeline in the TBB documentation. Try matching your code to the example there; it should be pretty easy to do. You may also ask for help at the TBB forum.
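To make the three-stage scheme described above concrete without depending on TBB, here is a rough standard-library sketch of the same shape: a serial in-order input stage, a parallel middle stage, and a serial in-order output stage, with storage owned per blob. It deliberately swaps tbb::parallel_pipeline for std::async, so it shows the structure and the ownership rule rather than the TBB API. The names Blob, processBlob and runPipeline are hypothetical, the in-memory vectors stand in for read_blob and write_blob, and maxInFlight plays a role similar to the token limit of a TBB pipeline.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <future>
#include <memory>
#include <utility>
#include <vector>

using Frame = std::vector<std::uint32_t>;
// Per-blob storage: owned by exactly one in-flight item, never shared between blobs.
using Blob = std::unique_ptr<Frame>;

// Hypothetical stand-in for unpack_blob + process + pack_blob: inverts every value.
inline Blob processBlob(Blob blob) {
    for (auto& v : *blob) v = ~v;
    return blob;
}

// Serial in-order input, parallel middle stage, serial in-order output.
inline std::vector<Frame> runPipeline(const std::vector<Frame>& input,
                                      std::size_t maxInFlight) {
    if (maxInFlight == 0) maxInFlight = 1;
    std::vector<Frame> output;
    std::deque<std::future<Blob>> inFlight;

    // Output stage: futures are consumed in submission order, so results stay ordered.
    auto drainOne = [&] {
        Blob done = inFlight.front().get();
        inFlight.pop_front();
        output.push_back(std::move(*done));
    };

    for (const Frame& frame : input) {
        // Input stage: allocate fresh storage for this blob only.
        Blob blob = std::make_unique<Frame>(frame);
        if (inFlight.size() >= maxInFlight) drainOne();
        // Middle stage: runs concurrently with the other in-flight blobs.
        inFlight.push_back(std::async(std::launch::async, processBlob, std::move(blob)));
    }
    while (!inFlight.empty()) drainOne();
    return output;
}
```

Because each blob's storage is moved into the task and moved back out in the output stage, no two concurrently processed blobs can touch the same buffer, which is the property the answer asks for.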