I'm trying to capture desktop frames using the Desktop Duplication API and encode them right away with NvPipe, without going through CPU access to the pixels.
Is there any way to feed the ID3D11Texture2D data to NvPipe as input, or some other efficient way of doing it? I'm working on a VR solution that requires latency as low as possible, so even 1 ms saved is a big deal.
Edit: After following the recommendations from @Soonts, I've ended up with this code, which doesn't seem to work:
cudaArray *array;
m_DeviceContext->CopySubresourceRegion(CopyBuffer, 0, 0, 0, 0, m_SharedSurf, 0, Box);
cudaError_t err = cudaGraphicsD3D11RegisterResource(&_cudaResource, CopyBuffer, cudaGraphicsRegisterFlagsNone);
err = cudaGraphicsResourceSetMapFlags(_cudaResource, cudaGraphicsMapFlagsReadOnly);
cudaStream_t cuda_stream;
cudaStreamCreate(&cuda_stream);
err = cudaGraphicsMapResources(1, &_cudaResource, cuda_stream);
err = cudaGraphicsSubResourceGetMappedArray(&array, _cudaResource, 0, 0);
uint64_t compressedSize = NvPipe_Encode(encoder, array, dataPitch, buffer.data(), buffer.size(), width, height, false);
The NvPipe_Encode call results in a memory access violation and does nothing. I don't know which step I'm messing up: I can't find any documentation on these functions or structures online, and putting watches on the variables shows nothing useful beyond their addresses in memory.
I have not tried it, but I think it should be doable with CUDA interop.
Call cudaGraphicsD3D11RegisterResource to register a texture for CUDA interop. I'm not sure you can register the DWM-owned texture you get from desktop duplication. If you can't, create another texture yourself (with default usage), register that one, and update it each frame with ID3D11DeviceContext::CopyResource.
Call cudaGraphicsResourceSetMapFlags to specify you want read-only access from CUDA side of the interop.
Call cudaGraphicsMapResources to allow CUDA to access the texture.
Call cudaGraphicsSubResourceGetMappedArray. This gives you a cudaArray with the frame data on the GPU, which you can then hand off to NvPipe without ever touching the CPU. A minimal sketch of these steps follows below.
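Here is a minimal sketch of those steps, assuming a BGRA8 ID3D11Texture2D you own (named frameTexture here) that you keep updated with CopyResource, and an NvPipe encoder created for device-memory input. NvPipe_Encode expects linear device memory rather than a cudaArray, so the sketch copies the array into a pitched buffer first; the function and variable names are mine, and error handling is omitted.

// Sketch only: assumes `frameTexture` is a BGRA8 ID3D11Texture2D you created
// yourself and keep updated with CopyResource, and `encoder` is an
// already-created NvPipe encoder configured for device-memory input.
#include <d3d11.h>
#include <cuda_runtime.h>
#include <cuda_d3d11_interop.h>
#include <NvPipe.h>
#include <vector>

uint64_t EncodeFrame(ID3D11Texture2D* frameTexture, NvPipe* encoder,
                     uint32_t width, uint32_t height,
                     std::vector<uint8_t>& output)
{
    // 1. Register the texture (in real code, do this once and cache it).
    cudaGraphicsResource* resource = nullptr;
    cudaGraphicsD3D11RegisterResource(&resource, frameTexture,
                                      cudaGraphicsRegisterFlagsNone);
    cudaGraphicsResourceSetMapFlags(resource, cudaGraphicsMapFlagsReadOnly);

    // 2. Map it so CUDA may access it; this yields a cudaArray,
    //    not a linear device pointer.
    cudaGraphicsMapResources(1, &resource);
    cudaArray_t array = nullptr;
    cudaGraphicsSubResourceGetMappedArray(&array, resource, 0, 0);

    // 3. NvPipe_Encode wants linear device memory, so copy the array into a
    //    pitched device buffer (allocate once and reuse in real code).
    void* devicePixels = nullptr;
    size_t pitch = 0;
    cudaMallocPitch(&devicePixels, &pitch, width * 4, height);
    cudaMemcpy2DFromArray(devicePixels, pitch, array, 0, 0,
                          width * 4, height, cudaMemcpyDeviceToDevice);

    // 4. Encode straight from device memory.
    uint64_t size = NvPipe_Encode(encoder, devicePixels, pitch,
                                  output.data(), output.size(),
                                  width, height, false);

    cudaFree(devicePixels);
    cudaGraphicsUnmapResources(1, &resource);
    cudaGraphicsUnregisterResource(resource);
    return size;
}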
P.S. Another option is using Media Foundation instead of NvPipe. It works on all GPUs, not just NVIDIA, and on most systems MF also uses hardware encoders. I'm not sure about latency; I've never used MF for anything too sensitive to it, and I've never used NvPipe at all, so I have no idea how they compare.
Related
I'm trying to use the MediaFoundation API to encode a video but I'm having problems pushing the samples to the SinkWriter.
I'm getting the frames to encode through the Desktop Duplication API. What I end up with is an ID3D11Texture2D with the desktop image in it.
I'm trying to create an IMFVideoSample containing this surface and then push that video sample to a SinkWriter.
I've tried going about this in different ways:
I called MFCreateVideoSampleFromSurface(texture, &pSample), where texture is the ID3D11Texture2D, filled in the SampleTime and SampleDuration, and then passed the created sample to the SinkWriter.
The SinkWriter returned E_INVALIDARG.
I tried creating the sample by passing nullptr as the first argument, creating the buffer myself using MFCreateDXGISurfaceBuffer, and then adding the resulting buffer to the sample.
That didn't work either.
I read through the MediaFoundation documentation and couldn't find detailed information on how to create the sample out of a DirectX texture.
I ran out of things to try.
Has anyone out there used this API before and can think of things I should check, or suggest how I could go about debugging this?
First of all, you should learn to use the mftrace tool.
Very likely, it will tell you the problem right away.
But my guess is that the following problems are the most likely.
Probably, some other attributes are required besides SampleTime / SampleDuration.
Probably, the SinkWriter needs a texture it can read on the CPU. To fix that, when a frame is available, create a staging texture of the same format and size, call CopyResource to copy the desktop into it, and then pass that staging texture to MF.
Even if you use a hardware encoder, so the CPU never tries to read the texture data, I don't think it's a good idea to pass your desktop texture to MF directly.
When you set a D3D texture on a sample, no data is copied anywhere; the sample merely retains a reference to the texture.
MF works asynchronously and may buffer several samples in its topology nodes.
DD gives you data synchronously: you may only access the texture between the AcquireNextFrame and ReleaseFrame calls. So make a copy you own and give MF that copy; a sketch follows below.
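A minimal sketch of that copy-then-wrap idea, assuming a D3D-aware Sink Writer (hardware encoder path) and a BGRA8 desktop texture. The function name and parameters are mine, and error handling is trimmed; if the writer really needs CPU access, the copy would become a staging texture and a different wrapping path instead.

// Sketch: copy the DD-owned desktop texture into a texture we own, then
// wrap that copy in an IMFSample for the Sink Writer. Names and the BGRA8
// assumption are mine; check every HRESULT in real code.
#include <d3d11.h>
#include <mfapi.h>
#include <mfidl.h>
#pragma comment(lib, "mfplat.lib")

HRESULT CreateSampleFromDesktopTexture(ID3D11Device* device,
                                       ID3D11DeviceContext* context,
                                       ID3D11Texture2D* desktopTexture,
                                       LONGLONG sampleTime100ns,
                                       LONGLONG sampleDuration100ns,
                                       IMFSample** outSample)
{
    // 1. Make a private copy so MF can hold on to it after ReleaseFrame.
    D3D11_TEXTURE2D_DESC desc = {};
    desktopTexture->GetDesc(&desc);
    desc.MiscFlags = 0;
    desc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
    desc.Usage = D3D11_USAGE_DEFAULT;
    desc.CPUAccessFlags = 0;

    ID3D11Texture2D* copy = nullptr;
    HRESULT hr = device->CreateTexture2D(&desc, nullptr, &copy);
    if (FAILED(hr)) return hr;
    context->CopyResource(copy, desktopTexture);

    // 2. Wrap the copy in a DXGI media buffer and attach it to a sample.
    IMFMediaBuffer* buffer = nullptr;
    hr = MFCreateDXGISurfaceBuffer(__uuidof(ID3D11Texture2D), copy, 0,
                                   FALSE, &buffer);
    if (SUCCEEDED(hr))
    {
        IMFSample* sample = nullptr;
        hr = MFCreateSample(&sample);
        if (SUCCEEDED(hr))
        {
            sample->AddBuffer(buffer);
            sample->SetSampleTime(sampleTime100ns);
            sample->SetSampleDuration(sampleDuration100ns);
            // Some sinks also want the buffer length set; if needed, query
            // IMF2DBuffer for GetContiguousLength and call
            // IMFMediaBuffer::SetCurrentLength with that value.
            *outSample = sample;   // caller releases
        }
        buffer->Release();
    }
    copy->Release();               // the media buffer keeps its own reference
    return hr;
}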
Trying to figure out what the issue (and error code) is for this call. To preface: this works just fine on AMD; it only has issues on NVIDIA.
unsigned char *buffer;
...
cl_int status;
cl::size_t<3> origin;
cl::size_t<3> region;
origin[0]=0;
origin[1]=0;
origin[2]=0;
region[0]=m_width;
region[1]=m_height;
region[2]=1;
status=clEnqueueWriteImage(m_commandQueue, m_image, CL_FALSE, origin, region, 0, 0, buffer, 0, NULL, NULL);
status returns -1000, which is not a standard OpenCL error code. All the other functions related to opening the device, context, and command queue succeed. The context is interop'ed with OpenGL, and again, this is all completely functional on AMD.
For future reference: it seems the error happens if the image is interop'ed with an OpenGL texture and the call is made before the image has been acquired using clEnqueueAcquireGLObjects. I had the acquire later, where the images were used, but not right before the image was written. AMD's driver does not appear to care about this little detail.
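For reference, here is a minimal sketch of the ordering that works, written against the plain C API (the wrapper types in the snippet above convert to these). The function name is mine, and it assumes the image was created with clCreateFromGLTexture and that OpenGL has finished touching the texture before the call.

// Sketch: acquire the GL-shared image before writing to it from OpenCL,
// release it afterwards. Uses a blocking write for simplicity.
#include <CL/cl.h>
#include <CL/cl_gl.h>

cl_int UploadToSharedImage(cl_command_queue queue, cl_mem image,
                           const unsigned char* pixels,
                           size_t width, size_t height)
{
    // NVIDIA's driver requires the acquire before any CL access.
    cl_int status = clEnqueueAcquireGLObjects(queue, 1, &image, 0, NULL, NULL);
    if (status != CL_SUCCESS) return status;

    const size_t origin[3] = {0, 0, 0};
    const size_t region[3] = {width, height, 1};
    status = clEnqueueWriteImage(queue, image, CL_TRUE, origin, region,
                                 0, 0, pixels, 0, NULL, NULL);

    // Hand the image back to OpenGL when OpenCL is done with it.
    clEnqueueReleaseGLObjects(queue, 1, &image, 0, NULL, NULL);
    clFinish(queue);
    return status;
}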
According to this MS blog post
http://blogs.msdn.com/b/nativeconcurrency/archive/2012/07/02/interop-with-direct3d-textures-in-c-amp.aspx
You can write directly to the backbuffer from C++AMP.
Using Interop, you can get the texture object of the back buffer associated with the window using the IDXGISwapChain and update it directly in the C++ AMP kernel.
I created an AMP device descriptor from the DX device and got a pointer to the back buffer, then tried to make an AMP texture from it. But I found that the back buffer's bind flags were only D3D11_BIND_RENDER_TARGET, and I need at least D3D11_BIND_UNORDERED_ACCESS or D3D11_BIND_SHADER_RESOURCE for Concurrency::graphics::direct3d::make_texture to work.
I can easily make any other D3D texture and connect it to AMP if I set those bind flags myself, but I cannot connect the back buffer with the flags it is created with.
Then I found this post
http://social.msdn.microsoft.com/Forums/vstudio/en-US/15aa1186-210b-4ba7-89b0-b74f742d6830/c-amp-and-direct2d
which has the following marked as an answer by a Microsoft community contributor:
I was trying to write to back buffer of the swap chain directly. As far as I understood, this can't be done, because usage flags that can be used when creating a back buffer texture are incompatible with ones that are needed by C++ AMP to manipulate the texture.
So, on one hand, writing to the back buffer from C++ AMP is used as an example of interop, and on the other hand it is explained to be impossible...?
My current requirement is just to generate a raytraced image in C++ AMP and show it on a D3D display without copying data back from the graphics card every frame. I realize I could just generate my own texture and render a quad with it, but writing directly to the back buffer would be simpler, and if it can be done, that is what I would like to do.
Perhaps someone here can explain whether it can be done and what steps are required to accomplish it, or alternatively explain that no, this truly cannot be done.
Thanks in advance for any help on this topic.
[EDIT]
I now found this info
https://software.intel.com/en-us/articles/microsoft-directcompute-on-intel-ivy-bridge-processor-graphics
// this qualifies the back buffer for being the target of compute shader writes
sd.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT | DXGI_USAGE_UNORDERED_ACCESS | DXGI_USAGE_SHADER_INPUT;
I actually did try that previously, but the call to CreateSwapChainForCoreWindow fails with
First-chance exception at 0x75251D4D in TestDxAmp.exe: Microsoft C++ exception: Platform::InvalidArgumentException ^ at memory location 0x0328E484. HRESULT:0x80070057 The parameter is incorrect.
Which is not very informative.
I think the original forum post may be misleading. For both texture and buffer interop, the unordered access binding is required by C++ AMP. AMP is built on top of DirectX/DirectCompute, so this applies in both cases, as noted in the Intel link.
your program can create an array associated with an existing Direct3D buffer using the make_array() function.

template<typename T, int N>
array<T,N> make_array(const extent& ext, IUnknown* buffer);

The Direct3D buffer must implement the ID3D11Buffer interface. It must support raw views (D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS) and allow SHADER_RESOURCE and UNORDERED_ACCESS binding. The buffer itself must be of the correct size: the size of the extent multiplied by the size of the buffer type. The following code uses make_array to create an array using the accelerator_view, dxView, which was created in the previous section:

HRESULT hr = S_OK;

-- C++ AMP Book
I'm not a DX expert, but from the following post it looks like you can configure the swap chain to support UAVs:
Sobel Filter Compute Shader
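To illustrate, here is a rough sketch of what requesting UAV usage on the swap chain buffers might look like. I'm using CreateSwapChainForHwnd because that's the path I can sketch confidently; whether the runtime accepts DXGI_USAGE_UNORDERED_ACCESS can depend on the swap effect, platform, and driver (the CoreWindow path requires a flip-model swap effect, which may be related to the InvalidArgumentException above). Treat it as a starting point, not a verified fix.

// Sketch: request UAV (and shader input) usage on the swap chain buffers so
// the back buffer can be bound for compute/AMP writes. Device creation and
// error handling are omitted.
#include <d3d11.h>
#include <dxgi1_2.h>

HRESULT CreateUavSwapChain(IDXGIFactory2* factory, ID3D11Device* device,
                           HWND hwnd, UINT width, UINT height,
                           IDXGISwapChain1** outSwapChain)
{
    DXGI_SWAP_CHAIN_DESC1 sd = {};
    sd.Width = width;
    sd.Height = height;
    sd.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
    sd.SampleDesc.Count = 1;
    sd.BufferCount = 2;
    sd.SwapEffect = DXGI_SWAP_EFFECT_SEQUENTIAL;
    // The extra usage flags are the point of the exercise:
    sd.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT |
                     DXGI_USAGE_UNORDERED_ACCESS |
                     DXGI_USAGE_SHADER_INPUT;

    return factory->CreateSwapChainForHwnd(device, hwnd, &sd,
                                           nullptr, nullptr, outSwapChain);
}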
I need some way to get the screen data and pass it to a DX9 surface/texture in my application, rendering 1600×900 at a minimum of 25 fps; 30 would be better.
I tried BitBlt, but even with just that I am at 20 fps, and after loading the data into a texture and rendering it I am at 11 fps, which is far behind what I need.
GetFrontBufferData is out of the question.
Here is something about using the Windows Media API, but I am not familiar with it. The sample saves the data straight into a file; maybe it can be set up to give you individual frames, but I haven't found good enough documentation to try that on my own.
My code:
m_memDC.BitBlt(0, 0, m_Rect.Width(),m_Rect.Height(), //m_Rect is area to be captured
&m_dc, m_Rect.left, m_Rect.top, SRCCOPY);
//at 20-25fps after this if I comment out the rest
//DC,HBITMAP setup and memory alloc is done once at the begining
GetDIBits( m_hDc, (HBITMAP)m_hBmp.GetSafeHandle(),
0L, // Start scan line
(DWORD)m_Rect.Height(), // # of scan lines
m_lpData, // LPBYTE
(LPBITMAPINFO)m_bi, // address of bitmapinfo
(DWORD)DIB_RGB_COLORS); // Use RGB for color table
//at 17-20fps
IDirect3DSurface9 *tmp;
m_pImageBuffer[0]->GetSurfaceLevel(0,&tmp); //m_pImageBuffer is Texture of same
//size as bitmap to prevent stretching
hr= D3DXLoadSurfaceFromMemory(tmp,NULL,NULL,
(LPVOID)m_lpData,
D3DFMT_X8R8G8B8,
m_Rect.Width()*4,
NULL,
&r, //SetRect(&r,0,0,m_Rect.Width(),m_Rect.Height();
D3DX_DEFAULT,0);
//12-14fps
IDirect3DSurface9 *frameS;
hr=m_pFrameTexture->GetSurfaceLevel(0,&frameS); // Texture of that is rendered
pd3dDevice->StretchRect(tmp,NULL,frameS,NULL,D3DTEXF_NONE);
//11fps
I found out that for a 512×512 square it runs at 30 fps (and at 20-25 fps for, e.g., 490×450), so I tried dividing the screen into smaller parts, but it didn't seem to work well.
If there is something missing from the code, please say so rather than voting down. Thanks.
Starting with Windows 8, there is a new desktop duplication API that can be used to capture the screen in video memory, including mouse cursor changes and which parts of the screen actually changed or moved. This is far more performant than any of the GDI or D3D9 approaches out there and is really well-suited to doing things like encoding the desktop to a video stream, since you never have to pull the texture out of GPU memory. The new API is available by enumerating DXGI outputs and calling DuplicateOutput on the screen you want to capture. Then you can enter a loop that waits for the screen to update and acquires each frame in turn.
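As an illustration, here is a rough sketch of that enumerate-and-duplicate loop; the structure is what matters, not the exact names, and error handling, cursor/dirty-rect handling, and cleanup are omitted.

// Sketch: duplicate the primary output and pull frames as they change.
#include <d3d11.h>
#include <dxgi1_2.h>

void CaptureLoop(ID3D11Device* device)
{
    // Find the output (monitor) behind the D3D11 device.
    IDXGIDevice* dxgiDevice = nullptr;
    device->QueryInterface(__uuidof(IDXGIDevice), (void**)&dxgiDevice);
    IDXGIAdapter* adapter = nullptr;
    dxgiDevice->GetAdapter(&adapter);
    IDXGIOutput* output = nullptr;
    adapter->EnumOutputs(0, &output);            // 0 = primary output
    IDXGIOutput1* output1 = nullptr;
    output->QueryInterface(__uuidof(IDXGIOutput1), (void**)&output1);

    // Start duplicating that output.
    IDXGIOutputDuplication* duplication = nullptr;
    output1->DuplicateOutput(device, &duplication);

    for (;;)
    {
        DXGI_OUTDUPL_FRAME_INFO frameInfo = {};
        IDXGIResource* resource = nullptr;
        HRESULT hr = duplication->AcquireNextFrame(500, &frameInfo, &resource);
        if (hr == DXGI_ERROR_WAIT_TIMEOUT)
            continue;                            // nothing changed on screen
        if (FAILED(hr))
            break;                               // e.g. mode change: re-create

        ID3D11Texture2D* frame = nullptr;
        resource->QueryInterface(__uuidof(ID3D11Texture2D), (void**)&frame);

        // The frame texture is only valid until ReleaseFrame, so copy it
        // (CopyResource) or encode it before releasing.

        frame->Release();
        resource->Release();
        duplication->ReleaseFrame();
    }
}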
To encode the frames to a video, I'd recommend taking a look at Media Foundation. Take a look specifically at the Sink Writer for the simplest method of encoding the video frames. Basically, you just have to wrap the D3D textures you get for each video frame into IMFSample objects. These can be passed directly into the sink writer. See the MFCreateDXGISurfaceBuffer and MFCreateVideoSampleFromSurface functions for more information. For the best performance, typically you'll want to use a codec like H.264 that has good hardware encoding support (on most machines).
For full disclosure, I work on the team that owns the desktop duplication API at Microsoft, and I've personally written apps that capture the desktop (and video, games, etc.) to a video file at 60fps using this technique, as well as a lot of other scenarios. This is also used to do screen streaming, remote assistance, and lots more within Microsoft.
If you don't like the FrontBuffer, try the BackBuffer:
LPDIRECT3DSURFACE9 surface;
surface = GetBackBufferImageSurface(&fmt);
To save it to a file, use:
D3DXSaveSurfaceToFile(filename, D3DXIFF_JPG, surface, NULL, NULL);
A while ago I converted a C# program of mine to use OpenGL and found it ran perfectly (and faster) on my computer at home. However, I have two issues. Firstly, the code I use to free textures from the graphics card doesn't work; it gives me a memory access violation exception at runtime. Secondly, most of the graphics don't work on any machine but mine.
By accident, I managed to convert some of the graphics to 8-bit PNGs (all the others are 32-bit) and these work fine on other machines. Recognising this, I attempted to regulate the quality when loading the images. My attempts failed (this was a while ago; I think they largely involved trying to format a bitmap and then using GDI to draw the texture onto it, creating a lower-quality version). Is there any way in .NET to take a bitmap and nicely change its quality? The code concerned is below. I recall it is largely based on some I found on Stack Overflow in the past, but which didn't quite suit my needs. 'img' is a .NET Image, and 'd' is an integer dimension, which I use to ensure the images are square.
uint[] output = new uint[1];
Bitmap bMap = new Bitmap(img, new Size(d, d));
System.Drawing.Imaging.BitmapData bMapData;
Rectangle rect = new Rectangle(0, 0, bMap.Width, bMap.Height);
bMapData = bMap.LockBits(rect, System.Drawing.Imaging.ImageLockMode.ReadOnly, bMap.PixelFormat);
gl.glGenTextures(1, output);
gl.glBindTexture(gl.GL_TEXTURE_2D, output[0]);
gl.glTexParameteri(gl.GL_TEXTURE_2D, gl.GL_TEXTURE_MAG_FILTER, gl.GL_NEAREST);
gl.glTexParameteri(gl.GL_TEXTURE_2D,gl.GL_TEXTURE_MIN_FILTER, gl.GL_NEAREST);
gl.glTexParameteri(gl.GL_TEXTURE_2D, gl.GL_TEXTURE_WRAP_S, gl.GL_CLAMP);
gl.glTexParameteri(gl.GL_TEXTURE_2D, gl.GL_TEXTURE_WRAP_T, gl.GL_CLAMP);
gl.glPixelStorei(gl.GL_UNPACK_ALIGNMENT, 1);
if (use16bitTextureLimit)
gl.glTexImage2D(gl.GL_TEXTURE_2D, 0, gl.GL_RGBA_FLOAT16_ATI, bMap.Width, bMap.Height, 0, gl.GL_BGRA, gl.GL_UNSIGNED_BYTE, bMapData.Scan0);
else
gl.glTexImage2D(gl.GL_TEXTURE_2D, 0, gl.GL_RGBA, bMap.Width, bMap.Height, 0, gl.GL_BGRA, gl.GL_UNSIGNED_BYTE, bMapData.Scan0);
bMap.UnlockBits(bMapData);
bMap.Dispose();
return output;
The 'use16bitTextureLimit' is a bool, and I rather hoped the code shown would reduce the quality to 16 bit, but I haven't noticed any difference. It may be that it works and the graphics cards still don't like it. I was unable to find any indication of a way to use 8-bit PNGs.
This is in a function that returns the uint array (as a texture address) for use when rendering. The faulty texture disposal simply involves: gl.glDeleteTextures(1, imgsGL[i]); where imgsGL is an array of uint arrays.
As said, the rendering is fine on some computers, and the texture deletion causes a runtime error on all systems (except my netbook, where I can't create textures at all, though I think that may be linked to the quality issue).
If anyone can provide any relevant information, that would be great. I've spent many days on this program and would really like it to be more compatible with less capable graphics cards.
The kind of access violation you encounter usually happens when the call to glTexImage2D causes a buffer overrun. Double-check that all the glPixelStore parameters related to unpacking are set properly and that the format parameter (the second format argument, i.e. the one describing the data you supply, not the internal format) matches the type and size of the data. I know this kind of bug very well, and those are the first checks I do whenever I encounter it.
For the texture not showing up: did you check that the texture's dimensions are actually powers of two? In C, the test for a power of two can be written as a macro like this (it boils down to testing that only one bit of the integer is set):
#define ISPOW2(x) ( (x) && !( (x) & ((x) - 1) ) )
It is not necessary that a texture image is square, though. That is a common misconception; you really just have to make sure that each dimension is a power of two. A 16×128 image is perfectly fine.
Changing the internal format to GL_RGBA_FLOAT16_ATI will probably even increase quality, but one cannot be sure, as GL_RGBA may be coerced into anything the driver sees fit. Also, it is a vendor-specific format, so I'd avoid using it. There are all kinds of ARB formats, including a half-float one (which is what FLOAT16_ATI is).