Accessing a buffer using C++ AMP

Could somebody please help me understand exactly which step is not working here?
I am trying to use C++ AMP to run parallel-for loops, but despite hitting no trouble or errors anywhere in the process, I can't get my final data out.
I want to pull the data out by mapping it:
m_pDeviceContext->Map(pBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &MappedResource);
{
blah
}
But I've worked on this for days on end without even a single inch of progress.
Here is everything I do with C++-AMP:
Constructor: I initialise my variables because I have to
: m_AcceleratorView(concurrency::direct3d::create_accelerator_view(reinterpret_cast<IUnknown *>(_pDevice)))
, m_myArray(_uiNumElement, m_AcceleratorView)
I copy my initial data into the C++-AMP array
concurrency::copy(Data.begin(), m_myArray);
I do stuff to the data
concurrency::parallel_for_each(...) restrict(amp)
{
blah
}
All of this seems fine, I run into no errors.
However the next step I want to do is pull the data from the buffer, which doesn't seem to work:
ID3D11Buffer* pBuffer = reinterpret_cast<ID3D11Buffer *>(concurrency::direct3d::get_buffer(m_myArray));
When I map this data (deviceContext->Map) the data inside is 0x00000000
What step am I forgetting that will allow me to read this data? I also tried setting the CPU read/write access type, but that produces an error, and none of my references do it that way either:
m_Accelerator.set_default_cpu_access_type(concurrency::access_type::access_type_read_write);
This fails with the error "accelerator does not support zero copy".
Can anyone please help me and tell me why I can't read my buffer, and how to fix it?

The following code should work for this. You should also check that the DX device and the C++ AMP accelerator are associated with the same hardware.
HRESULT hr = S_OK;
concurrency::array<int, 1> arr(1024);
CComPtr<ID3D11Buffer> buffer;
// get_buffer returns the IUnknown of the Direct3D buffer backing the array
IUnknown* unkBuf = concurrency::direct3d::get_buffer(arr);
hr = unkBuf->QueryInterface(__uuidof(ID3D11Buffer), reinterpret_cast<LPVOID*>(&buffer));
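Note that the buffer backing a C++ AMP array is typically created with D3D11_USAGE_DEFAULT and no CPU access, so it cannot be mapped directly; also, D3D11_MAP_WRITE_DISCARD (as in the question) is for writing, while reading requires D3D11_MAP_READ on a staging resource. A minimal sketch of the usual Direct3D 11 readback pattern, assuming the device and context from the question (m_pDevice and m_pDeviceContext are illustrative names):
D3D11_BUFFER_DESC desc = {};
buffer->GetDesc(&desc);
desc.Usage = D3D11_USAGE_STAGING;            // CPU-readable copy target
desc.BindFlags = 0;
desc.MiscFlags = 0;
desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
CComPtr<ID3D11Buffer> staging;
hr = m_pDevice->CreateBuffer(&desc, nullptr, &staging);
if (SUCCEEDED(hr))
{
    m_pDeviceContext->CopyResource(staging, buffer); // GPU -> staging copy
    D3D11_MAPPED_SUBRESOURCE mapped = {};
    hr = m_pDeviceContext->Map(staging, 0, D3D11_MAP_READ, 0, &mapped);
    if (SUCCEEDED(hr))
    {
        // mapped.pData now points at a CPU-readable copy of the array data
        m_pDeviceContext->Unmap(staging, 0);
    }
}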
This question has an answer that shows you how to do the opposite.
Reading Buffer Data using C++ AMP

Related

OpenGL: why does glMapNamedBuffer() return GL_INVALID_OPERATION?

Using OpenGL 4.6, I have the following (abbreviated) code, in which I create a buffer and then attempt to map it in order to copy data over using memcpy():
glCreateBuffers(buffers.size(), buffers.data()); // buffers is a std::array of GLuints
// ...
glNamedBufferStorage(buffers[3], n * sizeof(glm::vec4), nullptr, 0); // I also tried GL_DYNAMIC_STORAGE_BIT
// ...
void* bfrptr = glMapNamedBuffer(buffers[3], GL_WRITE_ONLY);
This latter call returns GL_INVALID_OPERATION. I am sure that this is the call that generates the error, as I catch OpenGL errors right before it as well. The manpage suggests that this error is only generated if the given buffer handle is not the name of an existing buffer object, but I'm sure I created it. Is there anything else I'm missing or that I'm doing wrong?
When you create immutable buffer storage, you must tell OpenGL how you intend to access that storage from the CPU. These are not "usage hints"; these are requirements, a contract between yourself and OpenGL which GL will hold you to.
You passed 0 for the access mask. That means that you told OpenGL (among other things) that you were not going to access it by mapping it. Which you then tried to do.
So it didn't let you.
If you want to map an immutable buffer, you must tell OpenGL at storage allocation time that you're going to do that. Specifically, if you want to map it for writing, you must use the GL_MAP_WRITE_BIT flag in the gl(Named)BufferStorage call.
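For illustration, here is the allocation from the question with the access bit declared up front, which makes the later map call legal (the source pointer data is hypothetical):
glNamedBufferStorage(buffers[3], n * sizeof(glm::vec4), nullptr,
                     GL_MAP_WRITE_BIT); // declare up front that this buffer will be mapped for writing
void* bfrptr = glMapNamedBuffer(buffers[3], GL_WRITE_ONLY); // now valid
if (bfrptr) {
    memcpy(bfrptr, data, n * sizeof(glm::vec4));
    glUnmapNamedBuffer(buffers[3]);
}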

Error reading character of string - Access violation error C++

I am working on a Kinect v2 related project in C++, and I cannot use the depth frame (BYTE*) outside the function that fetches it.
It works for the first few minutes; I think that is just luck.
Then I get errors like:
Error reading characters of string
and access violations, and at some point "no symbols loaded for Kinect20.dll".
Here is the method where I read the values:
BYTE* bodyIndex = new BYTE[512*424]; // initialisation
HRESULT frameGet(){
    // ... initialisation; if successful:
    hr = pDepthFrame->AccessUnderlyingBuffer(&m_nDepthBufferSize, &bodyIndex); // Kinect DLL method
    prints(depth[300]); // prints the value every time
    return hr;
}
HRESULT getDepthFrame(){
    // if frameGet succeeded:
    prints(bodyIndex[300]); // throws "error reading characters of string"
    return hr;
}
Can anyone please explain how I can access the bodyIndex data every time?
I didn't get any response when I posted the full code, so I need to understand the logic of how this works in C++.
If my assumption is right, the depth data gets cleaned up by the Kinect DLL after a while, and that is what I am seeing.
I tried with memcpy; the error is still there.
Thanks in advance.
According to https://msdn.microsoft.com/en-us/library/microsoft.kinect.kinect.idepthframe.accessunderlyingbuffer.aspx you don't need to allocate the memory:
Gets a pointer to the depth frame data.
public:
HRESULT AccessUnderlyingBuffer(
    UINT *capacity,
    UINT16 **buffer
)
buffer [out] - Type: UINT16**. When this method returns, contains the pointer to the depth frame data.
If I understand the spec correctly, you always have to call AccessUnderlyingBuffer() before accessing the data.
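In other words, the runtime overwrites your pointer with a pointer into its own internal frame buffer (so the new BYTE[512*424] allocation is never used, and the pointer dies when the frame is released). A minimal sketch of the intended pattern, using pDepthFrame from the question (the vector copy is illustrative):
UINT capacity = 0;
UINT16* frameData = nullptr; // will point into the frame's internal buffer
HRESULT hr = pDepthFrame->AccessUnderlyingBuffer(&capacity, &frameData);
if (SUCCEEDED(hr))
{
    // The pointer is only valid while the frame object is alive,
    // so copy the data out before releasing the frame.
    std::vector<UINT16> depthCopy(frameData, frameData + capacity);
}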

Using new to allocate memory for unsigned char array fails

I'm trying to load a tga file in C++ with code that I got from a Google search, but the part that allocates memory fails. The beginning of my "LoadTarga" method includes these variables:
int imageSize;
unsigned char* targaImage;
Later on in the method the imageSize variable gets set to 262144 and I use that number to set the size of the array:
// Calculate the size of the 32 bit image data.
imageSize = width * height * 4;
// Allocate memory for the targa image data.
targaImage = new unsigned char[imageSize];
if (!targaImage)
{
    MessageBox(hwnd, L"LoadTarga - failed to allocate memory for the targa image data", L"Error", MB_OK);
    return false;
}
The problem is that the body of the if statement executes, and I have no idea why the memory allocation failed. As far as I know it should work - the code compiles and runs up to this point, and I haven't seen anything on Google yet that shows a proper alternative.
What should I change in my code to make it allocate memory correctly?
Important Update:
Rob L's comments and suggestions were very useful (though I didn't try _heapchk, since I solved the issue before I got to it).
Trying each of fritzone's ideas meant the program ran past the "if (!targaImage)" point without trouble. The code that sets targaImage, and the if statement that checks whether memory was allocated correctly, has been replaced with this:
try
{
    targaImage = new unsigned char[imageSize];
}
catch (std::bad_alloc& ba)
{
    std::cerr << "bad_alloc caught: " << ba.what() << '\n';
    return false;
}
However I got a new problem with the very next bit of code:
count = (unsigned int)fread(targaImage, 1, imageSize, filePtr);
if (count != imageSize)
{
    MessageBox(hwnd, L"LoadTarga - failed to read in the targa image data", L"Error", MB_OK);
    return false;
}
count was giving me a value of 250394, which is different from imageSize's value of 262144. I couldn't figure out why, and a bit of searching (though I must admit, not much searching) on how fread works didn't yield an answer.
I decided to cancel my search and try the answer code from the tutorial site here http://www.rastertek.com/dx11s2tut05.html (scroll to the bottom of the page where it says "Source Code and Data Files" and download the zip). However, creating a new project and putting in the source files and image file didn't work, as I got a new error. At this point I thought maybe the way I converted the image file to tga might have been incorrect.
So rather than spend a whole lot of time debugging the answer code, I put the image file from the answer into my own project. I noted that the size of mine was MUCH smaller than the answer's (245 KB compared to 1025 KB), so maybe if I used the answer code's image my code would run fine. Turns out I was right! Now the image is stretched sideways for some reason, but my original query appears to have been solved.
Thanks Rob L and fritzone for your help!
You are NOT using the form of new that returns a null pointer on failure, so it makes no sense to check the return value. Instead, you should catch std::bad_alloc. The null-pointer-returning form of new has this syntax: new (std::nothrow) unsigned char[imageSize];
Please see: http://www.cplusplus.com/reference/new/operator%20new[]/
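A minimal sketch of the two equivalent ways to detect the failure (targaImage and imageSize as in the question; the error handling is illustrative):
#include <new> // std::nothrow, std::bad_alloc
// 1) Default new throws std::bad_alloc on failure, so catch it:
try {
    targaImage = new unsigned char[imageSize];
} catch (const std::bad_alloc&) {
    // report the error and bail out
}
// 2) Nothrow new returns nullptr on failure, so the if-check is meaningful:
targaImage = new (std::nothrow) unsigned char[imageSize];
if (!targaImage) {
    // report the error and bail out
}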
Nothing in your sample looks wrong. It is pretty unlikely that a modern Windows system will run out of memory allocating 256k just once. Perhaps your allocator is being called in a loop and allocating more than you think, or the value of imageSize is wrong. Look in the debugger.
Another possibility is that your heap is corrupt. Calling _heapchk() can help diagnose that.
Check the "peak working set" in Windows Task Manager to see how much memory you are really trying to allocate.

Source Filter cBuffers > 1 and GetDeliveryBuffer

I'm writing a source filter for DirectShow. The Intel Media SDK H.264 encoder requires ALLOCATOR_PROPERTIES::cBuffers > 1.
In DoBufferProcessingLoop I get the buffer using GetDeliveryBuffer(&pSample, NULL, NULL, 0).
Do I need to do anything to make sure I get the next buffer and I'm not overwriting the previous one?
I noticed the pSample->AddRef() in the sample encoder. Do I have to do something similar when I call GetDeliveryBuffer, or in FillBuffer?
The buffer won't be reused until the only reference to the buffer is the reference from its owning memory allocator.
This means that in DoBufferProcessingLoop you get a clean buffer, you do your thing filling it, and then you pass it downstream. The magic continues, and the buffer finally becomes ready for reuse once it has been discarded or presented and is no longer being used by anybody else. You don't need to do anything to ensure this; it happens on its own.
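For illustration, a rough sketch of what the base-class loop does with each sample (simplified from the DirectShow base classes; error handling omitted):
IMediaSample* pSample = nullptr;
HRESULT hr = GetDeliveryBuffer(&pSample, NULL, NULL, 0); // allocator hands out a free buffer
if (SUCCEEDED(hr)) {
    hr = FillBuffer(pSample);   // your code writes into the sample
    if (hr == S_OK)
        Deliver(pSample);       // downstream filters AddRef it while they still need it
    pSample->Release();         // once the last reference drops, the allocator can reuse the buffer
}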

parallelize a video transformation program with tbb

So, I am given a program in C++ and I have to parallelize it using TBB (make it faster). As I looked into the code, I thought that using a pipeline would make sense. The problem is that I have little experience, and whatever I found on the web confused me even more. Here is the main part of the code:
uint64_t cbRaw=uint64_t(w)*h*bits/8;
std::vector<uint64_t> raw(cbRaw/8);
std::vector<uint32_t> pixels(w*h);
while(1){
    if(!read_blob(STDIN_FILENO, cbRaw, &raw[0]))
        break; // No more images
    unpack_blob(w, h, bits, &raw[0], &pixels[0]);
    process(levels, w, h, bits, pixels);
    //invert(levels, w, h, bits, pixels);
    pack_blob(w, h, bits, &pixels[0], &raw[0]);
    write_blob(STDOUT_FILENO, cbRaw, &raw[0]);
}
It actually reads a video file, unpacks it, applies the transformation, packs it, and then writes it to the output. It seems pretty straightforward, so if you have any ideas or resources that could be helpful, please share.
Thanks in advance,
D. Christ.
Indeed you can use tbb::parallel_pipeline to process multiple video "blobs" in parallel.
The basic scheme is a 3-stage pipeline: an input filter reads a blob, a middle filter processes it, and the last one writes the processed blob into the file. The input and output filters should be serial_in_order, and the middle filter can be parallel. Unpacking and packing seemingly might be done in either the middle stage (I would start with that, to minimize the amount of work in the serial stages) or in the input & output stages (but that could be slower).
You will also need to ensure that the data storage (raw and pixels in your case) is not shared between concurrently processed blobs. Perhaps the easiest way is to have per-blob storage that is passed through the pipeline. Unlike in the serial program, it will be impossible to use automatic variables for the storage that needs to be passed between pipeline stages; thus, you will need to allocate your storage with new in the input filter, pass it by reference (or via a pointer) through the pipeline, and then delete it after all processing is done in the output filter. This is surely necessary for the raw storage. For pixels, however, you can keep using an automatic variable if all operations that need it - i.e. unpacking, processing, and packing the result - are done within the body of the middle filter. Of course the declaration of the variable should move there as well.
Let me sketch a modification to your serial code to make it more ready for applying parallel_pipeline. Note that I changed raw to be a dynamically allocated array, rather than std::vector; the code you showed seemingly did not use it as a vector anyway. Be aware that it's just a sketch, and it might not work as is.
uint64_t cbRaw=uint64_t(w)*h*bits/8;
uint64_t * raw; // now a pointer to a dynamically allocated array
while(1){
    { // The input stage
        raw = new uint64_t[cbRaw/8];
        if(!read_blob(STDIN_FILENO, cbRaw, raw)) {
            delete[] raw;
            break; // No more images
        }
    }
    { // The second stage
        std::vector<uint32_t> pixels(w*h);
        unpack_blob(w, h, bits, raw, &pixels[0]);
        process(levels, w, h, bits, pixels);
        //invert(levels, w, h, bits, pixels);
        pack_blob(w, h, bits, &pixels[0], raw);
    }
    { // The output stage
        write_blob(STDOUT_FILENO, cbRaw, raw);
        delete[] raw;
    }
}
There is a tutorial on the pipeline in the TBB documentation. Try matching your code to the example there; it should be pretty easy to do. You may also ask for help at the TBB forum.
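To give an idea of the shape, here is a minimal, untested sketch of how the three stages above might map onto tbb::parallel_pipeline, using the classic interface from tbb/pipeline.h (newer oneTBB spells the modes tbb::filter_mode::serial_in_order etc.); it assumes the same read_blob/unpack_blob/process/pack_blob/write_blob functions as the question:
#include "tbb/pipeline.h"
void run_pipeline(unsigned w, unsigned h, unsigned bits, unsigned levels) {
    uint64_t cbRaw = uint64_t(w)*h*bits/8;
    tbb::parallel_pipeline(
        4, // max number of blobs in flight at once
        // Input stage: serial, in order; allocates the per-blob storage.
        tbb::make_filter<void, uint64_t*>(
            tbb::filter::serial_in_order,
            [&](tbb::flow_control& fc) -> uint64_t* {
                uint64_t* raw = new uint64_t[cbRaw/8];
                if (!read_blob(STDIN_FILENO, cbRaw, raw)) {
                    delete[] raw;
                    fc.stop(); // no more images
                    return nullptr;
                }
                return raw;
            }) &
        // Middle stage: parallel; pixels is an automatic variable local to the body.
        tbb::make_filter<uint64_t*, uint64_t*>(
            tbb::filter::parallel,
            [&](uint64_t* raw) -> uint64_t* {
                std::vector<uint32_t> pixels(w*h);
                unpack_blob(w, h, bits, raw, &pixels[0]);
                process(levels, w, h, bits, pixels);
                pack_blob(w, h, bits, &pixels[0], raw);
                return raw;
            }) &
        // Output stage: serial, in order; frees the per-blob storage.
        tbb::make_filter<uint64_t*, void>(
            tbb::filter::serial_in_order,
            [&](uint64_t* raw) {
                write_blob(STDOUT_FILENO, cbRaw, raw);
                delete[] raw;
            }));
}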