DirectX 12 - Command list questions - c++

Trying to understand command lists.
Well, command lists records my commands for rendering, but also binding resources, let's say a buffer with vertex data.
m_commandList->IASetVertexBuffers(0, 1, &m_vertexBufferView);
This records the binding of vertex buffer. What happends with the buffer at this moment ? What will happen if I change a content of this vertex buffer after recording it ? What will happen if i change the content of this vertex buffer after calling execute command list and gpu not finished it yet ?
I guess ExecuteCommandList is a asynchronous function call, am I right ? Does it execute all binding ( data transfer to gpu ) at once, or it executes commands one by one even all bindings ? Is the command list executed by a driver, or is is all sent to gpu ?
Well, becuase lack of good examples, I still have lots of questions. I would be happy, if you can answer few of them to make it clear.

With DirectX 12, synchronization is entirely the application's responsibility. You have to insert and check fences to make sure the GPU is done with your buffer before you modify it. For dynamic buffers, you need to do your own double/triple/n buffering.
When you call ExecuteCommandList is just queues it up to the GPU. It will take some time before the GPU actually picks it up and then completes it.
Be sure to check the DirectX Graphics Samples GitHub project. The samples there and the Mini Engine demo are good places to find example usage.
This design makes DirectX 12 extremely powerful as it gives the application direct control over lots of things that were 'magic' in the Direct3D 11 Runtime. That said, the resulting API is much harder to actually use unless you are already a good enough graphics engineer that you could write the Direct3D 11 Runtime. Using Direct3D 11 API is still a fine choice for a project too.

Related

C++, OpenCV, & Kinect: Processing speed goes down

I use C++ (Visual Studio 2015) and OpenCV (ver 3.2.0) to process data sent from Kinect v1. My C++ program has no problem when it starts debugging for the first time. After it stops debugging and re-start debugging, however, it gets very slow.
I am suspecting that the program closes without releasing some memory (i.e., memory leak). I am aware of that I would need to use the delete function to release the memory if I use the new function. But I didn't use the new function in the C++ program (I neither used the malloc() function, which is equivalent to the new function in C programs).
For OpenCV, I use the destroyAllWindows function at the end of the program. For Kinect v1, I also use the NuiShutdown(), Release(), and CloseHandle() functions at the end of the program.
Is there anything else I need do to release the memory (e.g., releasing memory associated with Mat in OpenCV)? Or is something else causing the decrease in processing speed?
I'd appreciate your help. Thanks.
After first run disconnect Kinect then reconnect and try second run.
If all goes well now then the problem is most likely stuck thread. The device access is usually handled by separate threads and especially with USB they can get stuck (in case of error or sync problem between accessing form host and expecting on device side) until you disconnect device (not sure which Kinect driver are you using but JUNGO version which NuiShutdown() infers have this problem). You can also check task manager before disconnection if there are not some stuck processes left after first run.
To remedy this you need to find out what are you doing wrong during access. It could be:
wrong USB port
use the back side not front slots.
invalid USB transfer request
device is always waiting for specific set of commands or stream and waits until it does not receive it so it blocks all other things. So using unsupported commands or reading in wrong times or sizes of packets can cause this.
USB communication is out of sync
PC host can timeout in case you do not have enough CPU power while critical operation is processed (or have opened too many apps on background).
This can be caused also by wrong gfx driver as I suspect you are using rendering ... Intel HD graphics can generate such problems with ease especially on notebooks. Try to disable any rendering in your app or at least limit rendering to OpenGL 1.0 to see if speed is the same in between runs. If this is the case the whole desktop usually flickers or is not repainting parts of apps ... and animations are sometimes sluggish.
Another problem might be a debugger. If without it all is well then debugger is the problem and you can not solve it. Debugging while accessing IO can cause sync and timeout problems especially with USB.
To check for memory leaks you can simply see how much free memory you got before 1st run and compare it to values after 1st,2nd,3th .. runs if the value lowers you got something stuck somewhere. After app close all the memory belonging to app is freed by OS so even if you forget some delete that does not matter unless some thread is still running ...
Some USB drivers based on libUSB I encountered got also problem with Handle leaks. But that behaves differently ... all runs fine until there are no free handles. After that OS is non functional you can not open any window,app, anything ... until any app is closed.
[Edit1] Front USB slots
Front slots are usually connected to motherboard with relatively long cable (usually flat and not very well shielded) so it is more susceptible to noise. Also as it is located usually around HDD and above high frequency parts of the motherboard it also induce it into the USB feed. All this degrades the quality of USB signal causing much much bigger rejection rate hence lowering sync capability and also the overall usable bandwidth.
If you compare that with backside USB ports they have no cables but are connected directly in PCB with short and well shielded paths so the connection quality is much much better.
So if you use device demanding high bandwith or synchronism then front ports are a bad choice.

Inner workings of Raspberry Pi userland graphics driver (not firmware or kernel part)

I'm trying to understand the userland part of the Raspberry Pi graphics driver code from https://github.com/raspberrypi/userland
My understanding so far is:
- a firmware blob runs in the GPU and offers an OpenGL-like interface which, on lower levels, is based on message (byte-array) passing on top of one of multiple 28-bit-word FIFOs called VCHIQ (the other VCHIQ queues are irrelevant for graphics)
- on the CPU part, OpenGL calls are turned into messages to the GPU. Access to the low-level facility (either the message queue or VCHIQ -- I haven't found that part yet in the code) requires a Linux kernel module, but no high-level logic happens in there.
- the GPU part is closed, but that's okay for my purposes. The (ARM) CPU part is, AFAIK, open
My ultimate goal is to get communication with the GPU working on bare metal (without Linux), but with the closed firmware blob intact. As a first goal, I want to understand how an OpenGL call is actually passed to the GPU. Anything beyond that is not part of this question.
However, I'm stuck at finding the actual code for this. The OpenGL calls use RPC_CALL* and in turn RPC_DO, which calls khronos_server_lock_func_table(). However, that function seems to be missing from the code, and to my surprise, I couldn't find anything useful about it on Google.
My questions:
- am I still on the ARM CPU side, or did I move to GPU land without noticing? If the latter is the case, where did I cross that line?
- Assuming I'm still on the CPU side -- where is the code for that function? Is it open at all, or do we actually have closed parts left around on the CPU side here? All sources on the web seem to indicate that the code for the CPU is 100% open.
- at which point does the implementation of the C OpenGL functions actually send a message to the GPU? I'm somewhat expecting a call to the kernel functionality that represents VCHIQ to be happening at some point, probably implemented as a device file.
I don't fully understand how do you intend to access the GPU without using Linux, and I am not that familiar with the technicalities, but some time ago I've been digging into the GPU for my private project so I'll tell you what I know.
The GPU is VideoCore IV and its documentation is available on Broadcom's website.
Also, on the Raspberry Pi Wiki you can see on the picture on the left that VCHIQ is in the kernel driver, so you might look for the implementation details in the kernel's source code.
Maybe this might be of some help too: VideoCore IV Programmer's Manual. About the document:
This is a independent documentation project based on a combination of static analysis and trial and error on real hardware. This work is 100% independent from and not sanctioned by or connected with Broadcom or its agents. No Broadcom documents or materials were used beyond those publicly available.
As for the software itself, The Khronos Group provides OpenGL ES and OpenVG implementation, but it's not open source. You can get the documentation from their website, but I doubt you'll find anything on such low level.
Hope it helps.

(linux) How can I know inside a c++ program if Opengl 4 is supported?

I would like to detect inside my C++ program if opengl 4 is supported on the running computer.
I don't know if I search on google and stackoverflow with wrong/bad terms (my english skill...), but surprisingly I didn't found any example... I would not be suprise if you tell me this question is a duplicate...
It would eventually useful for me to know how to get more usefull datas from the video card and the drivers used by it on the running computer. I didn't take time to look around to know how to do that, but if you have some usefull link, feel free to share it with me.
Step 1: Create an OpenGL Context; first try by the "attrib" method requesting the minium OpenGL version you want to have. If that succeeds you're done.
Step 2: If that didn't work and you can gracefully downgrade create a no-frills context
and call glGetString(GL_VERSION) to get the actual context version supported. Note that on MacOS X this limits you to 2.1 and earlier.
Step 3: If you want some context, portable and reliably between 2.1 and your optimimal version, try with the attribs method in a loop, decrementing your needs until it succeeds.
Note that there is no way to determine in advance which version is supported in OpenGL. The main reason for this is, that operating systems and the graphics layer may decide on demand which locally available OpenGL version to use, depending on the request and the resources available at the moment (graphics cards in theory can be hotplugged).

How to get a pointer to a hardware driver in Windows?

I want to write a program that will monitor memory in a driver and print the memory contents every so often.
However, I'm not finding any resources in the Windows API that seem to allow me to grab a pointer (Handle) to a specific driver.
I'd appreciate any answer either from User space OR kernel space.
If you want to know exactly what I'm doing, I'm attempting to duplicate the results from this paper except on Windows. After I gain the ability to monitor a buffer in a basic windows console program, I intend to monitor from the GPU.
[For the record: I am a Graduate Student who is pursuing this as a summer project... this is ethical malware research.]
============UPDATE ==================
This might technically be better suited as an answer, but not really until I have a working solution.
My initial plan of attack is to use WinDbg to do dynamic analysis on the keyboard driver when it gets loaded, so I can get some idea about normal loading/unloading behavior. I'm using chapter 10 of this book, to guide setting up my testbed and once I understand more about the keyboard structure and its buffer, I'll work backwards towards getting a permanent reference to this structure and see about passing it into the graphics card and monitoring it with DMA as the original paper did on Linux.
You won't solve this problem by "grabbing a pointer to a specific driver". You need to locate the specific buffer used by the keyboard driver that resides on top of the USB driver.
You will have to actually grok the keyboard and USB drivers for Windows. At least part of which is probably available if you have a DDK (driver development kit) [aka WDK, Windows Driver Kit]. You will definitely need a graphics driver for this part of the project.
You will also have to develop a driver mechanism to map an arbitrary (kernel) lump of memory to your graphics driver - which means you need access to the source code for the graphics driver. (In theory, you could perhaps hack about in the page-tables, but Windows itself isn't too keen on software messing with the page-tables, and you'd definitely need to be VERY careful if the system is SMP, since modifying page-tables in an SMP system requires that you flush the TLB's of the "other" CPU(cores) in the system after updates).
To me, this seems like a rather interesting project, but a really tough one in a closed source system like Windows. At least in Linux, the developer has the source-code to read. When it comes to Windows, most of the relevant source code is completely unavailable (unless your school has special license to the MS Source code - I think there are some that do).

How to make software run in parallel, c++, directshow, opencv

I've been working on a facetracking system last couple of months and now I need to make everything run in parallel to increase the performance.
The main cpp file is:
int _tmain(int argc, _TCHAR* argv[])
{
cFrame.initCamFrames(20, 1600, 1200, 3); //INITIATES BUFFER FOR CAM FRAMES, 20 frames, res:1600x1200, 3bytes per pixel.
eyeTracking.initTrackingSystem(&cFrame); //INITIATES EYETRACKING SOFTWARE WITH POINTER TO THE BUFFER WHERE EYETRACKINGSOFTWARE GETS THE FRAMES TO SEARCH WITHIN. (opencv)
directShow directShowClass;
directShowClass.initiateDirectShow(false, &cFrame); //INITIATES DIRECTSHOW WITH POINTER TO BUFFER WHERE IT SHOULD SAVE FRAMES FROM CAM
directShowClass.runDirectShow(); //START CAPTURING FRAMES INTO BUFFER
eyeTracking.runTrackingSystem(); //START SEARCH FOR FACE AND EYES.
system("pause");
directShowClass.stopDirectShow();
}
I want "directShowClass.runDirectShow();" and "eyeTracking.runTrackingSystem();" to run in real parallel. now I think that they run as threads in pseudo-parallel. (simple printf in each method occur mixed up in the terminal).
I guess that making a program run in parallel is not that simple as I would like it to be. But I guess that it is possible :D
Please give me some advise where to start searching for information about how to paralellisize.
I have a dual core processor.
Thanks!
Printf is not thread-safe,aka it can mix up buffers like you encountered. You can run the process in pseudo-parallel (like switch each call to another processing step) or run it in hardware-concurrency (std::thread, pthreads, windows thread, boost::thread).
if you have a dual core processor you surely can take advantage of multi-core processing, I would suggest to use boost.
Just to clear out, by using threads you do get real parallelism. But remember that your computer is also running in it's cores other processes in background that occupy CPU time, so your functions are not always being executed.
In order to get some parallelism in C++ you have many options. I name three:
. The oldest most common way is to use the pthread library, which is built in into almost every compiler.
. The new C++ standard, called C++11 includes some native libraries to deal with multi-threading, you can check that out, but it is still not supported by every compiler. And most compilers that support it have only partial functionality. You also need to activate the standard explicitly. For example:
gcc -std=c++11
. Finally, if you are in the mood for some "higher level" stuff, you can put some effort in learning about the OpenMP framework, which uses pragma directives to annotate parallel tasks. The framework will then deal with all the creation of threads, so you can use your time in some other stuff.
P.S: The reason why the output comes out mixed is not because the threads run in pseudo-parallel, but because they are concurrently writting on the buffer. So when the buffer is dumped you see it as the threads wrote it. If any, this is a proof that they are actually running in parallel, but you are making them write their output in the same buffer ;)