Context:
I'm developing a native C++ Unity 5 plugin that reads DXT-compressed texture data and uploads it to the GPU for further use in Unity. The aim is to create a fast image-sequence player that updates image data on the fly. The textures are compressed with an offline console application.
Unity can work with different graphics APIs; I'm targeting DirectX 11 and OpenGL 3.3+.
Problem:
The DirectX runtime texture update code, through a mapped subresource, gives different outputs on different graphics drivers. Updating a texture through such a mapped resource means mapping a pointer to the texture data and memcpy'ing the data from the RAM buffer to the mapped GPU buffer. In doing so, different drivers seem to expect different values for the row pitch when copying bytes. I never had problems on the several NVIDIA GPUs I tested on, but AMD and Intel GPUs seem to act differently and I get distorted output, as shown below. Furthermore, I'm working with DXT1 pixel data (0.5 bytes per pixel) and DXT5 data (1 byte per pixel). I can't seem to get the correct pitch parameter for these DXT textures.
Code:
The following initialisation code for generating the D3D11 texture and filling it with initial texture data - e.g. the first frame of an image sequence - works perfectly on all drivers. The player pointer points to a custom class that handles all file reads and contains getters for the currently loaded DXT-compressed frame, its dimensions, etc.
if (s_DeviceType == kUnityGfxRendererD3D11)
{
HRESULT hr;
DXGI_FORMAT format = (compression_type == DxtCompressionType::DXT_TYPE_DXT1_NO_ALPHA) ? DXGI_FORMAT_BC1_UNORM : DXGI_FORMAT_BC3_UNORM;
// Create texture
D3D11_TEXTURE2D_DESC desc;
desc.Width = w;
desc.Height = h;
desc.MipLevels = 1;
desc.ArraySize = 1;
desc.Format = format;
// no anti-aliasing
desc.SampleDesc.Count = 1;
desc.SampleDesc.Quality = 0;
desc.Usage = D3D11_USAGE_DYNAMIC;
desc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
desc.MiscFlags = 0;
// Initial data: first frame
D3D11_SUBRESOURCE_DATA data;
data.pSysMem = player->getBufferPtr();
data.SysMemPitch = 16 * (player->getWidth() / 4);
data.SysMemSlicePitch = 0; // just a 2d texture, no depth
// Init with initial data
hr = g_D3D11Device->CreateTexture2D(&desc, &data, &dxt_d3d_tex);
if (SUCCEEDED(hr) && dxt_d3d_tex != 0)
{
DXT_VERBOSE("Succesfully created D3D Texture.");
DXT_VERBOSE("Creating D3D SRV.");
D3D11_SHADER_RESOURCE_VIEW_DESC SRVDesc;
memset(&SRVDesc, 0, sizeof(SRVDesc));
SRVDesc.Format = format;
SRVDesc.ViewDimension = D3D11_SRV_DIMENSION_TEXTURE2D;
SRVDesc.Texture2D.MipLevels = 1;
hr = g_D3D11Device->CreateShaderResourceView(dxt_d3d_tex, &SRVDesc, &textureView);
if (FAILED(hr))
{
dxt_d3d_tex->Release();
return hr;
}
DXT_VERBOSE("Succesfully created D3D SRV.");
}
else
{
DXT_ERROR("Error creating D3D texture.")
}
}
The following update code that runs for each new frame has the error somewhere. Please note the commented line containing method 1, which uses a simple memcpy without any row pitch specified and works well on NVIDIA drivers.
Further on, in method 2, you can see that I log the different row pitch values. For instance, for a 1920x960 frame I get 1920 for the buffer stride and 2048 for the runtime stride. This difference of 128 probably has to be padded (as can be seen in the example pic below), but I can't figure out how. When I just use mappedResource.RowPitch without dividing it by 4 (done by the bit shift), Unity crashes.
ID3D11DeviceContext* ctx = NULL;
g_D3D11Device->GetImmediateContext(&ctx);
if (dxt_d3d_tex && bShouldUpload)
{
if (player->gather_stats) before_upload = ns();
D3D11_MAPPED_SUBRESOURCE mappedResource;
ctx->Map(dxt_d3d_tex, 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedResource);
/* 1: THIS CODE WORKS ON ALL NVIDIA DRIVERS BUT GENERATES DISTORTED OR NO OUTPUT ON AMD/INTEL: */
//memcpy(mappedResource.pData, player->getBufferPtr(), player->getBytesPerFrame());
/* 2: THIS CODE GENERATES OUTPUT BUT SEEMS TO NEED PADDING? */
BYTE* mappedData = reinterpret_cast<BYTE*>(mappedResource.pData);
BYTE* buffer = player->getBufferPtr();
UINT height = player->getHeight();
UINT buffer_stride = player->getBytesPerFrame() / player->getHeight();
UINT runtime_stride = mappedResource.RowPitch >> 2;
DXT_VERBOSE("Buffer stride: %d", buffer_stride);
DXT_VERBOSE("Runtime stride: %d", runtime_stride);
for (UINT i = 0; i < height; ++i)
{
memcpy(mappedData, buffer, buffer_stride);
mappedData += runtime_stride;
buffer += buffer_stride;
}
ctx->Unmap(dxt_d3d_tex, 0);
}
Example pic 1 - distorted output when using memcpy to copy the whole buffer without a separate row pitch on AMD/Intel (method 1)
Example pic 2 - better but still erroneous output when using the above code with mappedResource.RowPitch on AMD/Intel (method 2). The blue bars indicate the zones of error; they need to disappear so all pixels align and form one image.
Thanks for any pointers!
Best,
Vincent
The mapped data row pitch is in bytes; dividing it by four is definitely an issue.
UINT runtime_stride = mappedResource.RowPitch >> 2;
...
mappedData += runtime_stride; // here you are only jumping one quarter of a row
With a BC format, it is the row (height) count that is divided by 4, not the pitch.
Also, BC1 is 8 bytes per 4x4 block, so the line below should be 8 * and not 16 *. But as long as you handle the row stride properly on your side, D3D will understand; you just waste half the memory here.
data.SysMemPitch = 16 * (player->getWidth() / 4);
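A minimal sketch of the per-frame update with both fixes applied, assuming the same ctx, dxt_d3d_tex and player getters as in the question (an illustration of the idea, not a tested drop-in):
D3D11_MAPPED_SUBRESOURCE mappedResource;
HRESULT hr = ctx->Map(dxt_d3d_tex, 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedResource);
if (SUCCEEDED(hr))
{
BYTE* mappedData = reinterpret_cast<BYTE*>(mappedResource.pData);
BYTE* buffer = player->getBufferPtr();
// BC textures are stored as rows of 4x4 blocks, so the number of rows to copy
// is the texel height divided by 4 (e.g. 960 / 4 = 240).
UINT block_rows = player->getHeight() / 4;
// Tightly packed bytes per block row in the source buffer
// (e.g. 1843200 / 240 = 7680 for a 1920x960 DXT5 frame).
UINT buffer_stride = player->getBytesPerFrame() / block_rows;
for (UINT i = 0; i < block_rows; ++i)
{
memcpy(mappedData, buffer, buffer_stride);
mappedData += mappedResource.RowPitch; // the pitch is already in bytes, no shift
buffer += buffer_stride;
}
ctx->Unmap(dxt_d3d_tex, 0);
}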
Related
I am trying to understand how rendering works for the YV12 format. For example, I took a simple sample. See this graph:
The webcam creates 640x480 frames in RGB24 or MJPEG. After it, the LAV decoder transforms the frames to YV12 and sends them to the DirectShow renderer (EVR or VMR9).
The decoder changes the frame width (stride) from 640 to 1024. Hence, the output size of a frame will be 1.5*1024*480 = 737280. The normal size for YV12 is 1.5*640*480 = 460800. I know the stride can be larger than the width of the real frame (https://learn.microsoft.com/en-us/windows/desktop/medfound/image-stride). My first question: why did the renderer select that value (1024) rather than another? Can I get it programmatically?
When I replace the LAV decoder with my RGB24-to-YV12 transformation filter (https://gist.github.com/thedeemon/8052fb98f8ba154510d7), the renderer shows me a shifted image, though all parameters are the same as for the first graph:
Why? I noticed that VIDEOINFOHEADER2 had the interlacing flag dwInterlaceFlags set. Hence my next question: do I have to add interlacing support to my filter for the renderer to work normally?
My first question: why did the renderer select that value (1024) rather than another? Can I get it programmatically?
The video renderer uses a Direct3D texture as a carrier for the image. When the texture is mapped into system memory to enable CPU write access, such an extended stride can be applied because of implementation specifics of the video hardware. You get the value 1024 via dynamic media type negotiation, as described in Handling Format Changes from the Video Renderer.
Your transformation filter has to handle such updates if you want it to be able to connect to the video renderer directly.
Otherwise you are generally not interested in obtaining this extended stride value yourself: the one you get via the media type update is the one to be used, and you have to accept it.
When I replace the LAV decoder with my RGB24-to-YV12 transformation filter, the renderer shows me a shifted image, though all parameters are the same as for the first graph...
Why?
Your filter does not handle the stride update correctly.
...I noticed that VIDEOINFOHEADER2 had the interlacing flag dwInterlaceFlags set. Hence my next question: do I have to add interlacing support to my filter for the renderer to work normally?
You don't have interlaced video here. The problem is unrelated to interlaced video.
My solution:
I have to copy the YV12 frame into the video buffer correctly through its three planes: the full-resolution Y plane and the two chroma planes (U and V), each at half the width and half the height. Here is the code for a 640x480 frame:
HRESULT CVideoGrabberFilter::Transform(IMediaSample *pIn, IMediaSample *pOut)
{
BYTE* pSrcBuf = 0;
pIn->GetPointer(&pSrcBuf);
BYTE* pDstBuf = 0;
pOut->GetPointer(&pDstBuf);
SIZE size;
size.cx = 640;
size.cy = 480;
int nLen = pOut->GetActualDataLength();
BYTE* pDstTmp = new BYTE[nLen];
YV12ConverterFromRGB24(pSrcBuf, pDstTmp, size.cx, size.cy);
BYTE* pDst = pDstTmp;
int stride = 1024; //the real video stride for 640x480. For other resolutions you need to use pOut->GetMediaType() for the stride defining.
//Y
for (int y = 0; y < size.cy; ++y)
{
memcpy(pDstBuf, pDst, size.cx);
pDst += size.cx;
pDstBuf += stride;
}
stride /= 2;
size.cy /= 2;
size.cx /= 2;
//U and V
for (int y = 0; y < size.cy; y++ )
{
memcpy(pDstBuf, pDst, size.cx );
pDst += size.cx;
pDstBuf += stride;
memcpy(pDstBuf, pDst, size.cx);
pDst += size.cx;
pDstBuf += stride;
}
delete[] pDstTmp;
return S_OK;
}
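As a follow-up to the hard-coded stride above, a hedged sketch of how the negotiated stride could be picked up from the dynamic media type change instead of assuming 1024. m_stride is a hypothetical member variable; IMediaSample::GetMediaType only returns a media type when it changed for that sample, so the value has to be cached by the filter:
AM_MEDIA_TYPE* pmt = nullptr;
if (pOut->GetMediaType(&pmt) == S_OK && pmt != nullptr)
{
if (pmt->formattype == FORMAT_VideoInfo2 && pmt->pbFormat != nullptr)
{
const VIDEOINFOHEADER2* vih2 = reinterpret_cast<const VIDEOINFOHEADER2*>(pmt->pbFormat);
// For YV12, biWidth carries the stride of the Y plane in pixels
// (one byte per pixel), e.g. 1024 for the 640x480 frame here.
m_stride = vih2->bmiHeader.biWidth;
}
DeleteMediaType(pmt); // base-classes helper that frees the returned media type
}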
For some reason, the code below crashes when I try to create the 1d texture.
D3D11_TEXTURE1D_DESC desc;
ZeroMemory(&desc, sizeof(D3D11_TEXTURE1D_DESC));
desc.Width = 64;
desc.MipLevels = 1;
desc.ArraySize = 1;
desc.Format = DXGI_FORMAT_R8G8B8A8_SNORM;
desc.Usage = D3D11_USAGE_STAGING;
desc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE | D3D11_CPU_ACCESS_READ;
HRESULT hr = D3DDev_0001->CreateTexture1D(&desc, NULL, &texture); //crashes here
assert(hr == S_OK);
where D3DDev_0001 is an ID3D11Device. I am able to create 3D and 2D textures, but making a 1D texture causes the program to crash. Can anyone explain why?
A USAGE_STAGING texture can't have any BindFlags since it can't be set on the graphics context for use as an SRV, UAV or RTV. Set BindFlags to 0 if you want a STAGING texture, or set the Usage to D3D11_USAGE_DEFAULT if you just want a 'normal' texture that can be bound to the context.
USAGE_STAGING resources are either for the CPU to fill in with data before being copied to a USAGE_DEFAULT resource, or, they're the destination for GPU copies to get data from the GPU back to the CPU.
The exact cause of this error would have been explained in a message printed by D3D11's Debug Layer; use it to find the cause of errors like this in the future.
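For reference, a minimal sketch of the two valid combinations described above (either drop the bind flags for a staging texture, or switch to default usage for a bindable one):
D3D11_TEXTURE1D_DESC desc;
ZeroMemory(&desc, sizeof(desc));
desc.Width = 64;
desc.MipLevels = 1;
desc.ArraySize = 1;
desc.Format = DXGI_FORMAT_R8G8B8A8_SNORM;
// Variant A: staging texture for CPU read/write, never bound to the pipeline.
desc.Usage = D3D11_USAGE_STAGING;
desc.BindFlags = 0;
desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE | D3D11_CPU_ACCESS_READ;
// Variant B: a 'normal' GPU texture that can be bound as a shader resource.
// desc.Usage = D3D11_USAGE_DEFAULT;
// desc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
// desc.CPUAccessFlags = 0;
ID3D11Texture1D* texture = nullptr;
HRESULT hr = D3DDev_0001->CreateTexture1D(&desc, NULL, &texture);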
I am trying to read input from a camera and use it to create a background surface in D3D11. I receive memory errors all the time.
My render target size is 2364 x 1461.
The image I get from the camera is an array of type unsigned char:
unsigned char* p = g_pOvrvision->GetCamImage(OVR::OV_CAMEYE_LEFT, (OVR::OvPSQuality)processer_quality);
It returns 640 * 480 * 3 bytes. The code I am working with is below; CreateTexture2D gives a memory error. I tried filling the array with dummy data to fill all of 2364 * 1461, but that did not work either.
Could you please suggest a solution?
D3D11_TEXTURE2D_DESC desc;
ZeroMemory(&desc, sizeof(desc));
desc.Width = renderTargetSize.w;
desc.Height = renderTargetSize.h;
desc.MipLevels = desc.ArraySize = 1;
desc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
desc.SampleDesc.Count = 1;
desc.SampleDesc.Quality = 0;
desc.Usage = D3D11_USAGE_DYNAMIC;
desc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
desc.MiscFlags = 0;
ID3D11Texture2D *pTexture = NULL;
D3D11_SUBRESOURCE_DATA TexInitData;
ZeroMemory(&TexInitData, sizeof(D3D11_SUBRESOURCE_DATA));
TexInitData.pSysMem = texArray;
TexInitData.SysMemPitch = static_cast<UINT>(2364 * 3);
TexInitData.SysMemSlicePitch = static_cast<UINT>(3 * 2364 * 1461 * sizeof(unsigned char));
d3dDevice->CreateTexture2D(&desc, &TexInitData, &d3dEyeTexture);
d3dDevice->CreateShaderResourceView(d3dEyeTexture, nullptr, &d3dEyeTextureShaderResourceView);
d3dDevice->CreateRenderTargetView(d3dEyeTexture, nullptr, &d3dEyeTextureRenderTargetView);
Chuck Walbourn's comment gets you 99% of the way there: there is no three-byte DXGI pixel format, so to properly pack the data you are receiving from the camera you must create a new buffer that matches the DXGI_FORMAT_R8G8B8A8_UNORM format, and copy your camera data into it, adding a fourth (alpha) value of 0xFF to each pixel.
But, as you mentioned, your last line of code still fails. I believe this is because you haven't set d3dEyeTexture with the bind flag D3D11_BIND_RENDER_TARGET as well as D3D11_BIND_SHADER_RESOURCE. From what I understand, however, if you're trying to make the 640x480 camera image the background of your 2364x1461 render target, you'll need to resample the data to fit, which is a pain to do (and slow) on the CPU. My recommendation would be to create a separate 640x480 texture, flagged D3D11_BIND_SHADER_RESOURCE, with your camera data, and draw it to fill the whole screen as the first step in your draw loop. This means creating a separate, dedicated 2364x1461 render target as your back buffer, but that's a common first step in setting up a Direct3D app.
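A rough sketch of the repacking step, assuming a 640x480 camera buffer p laid out as in the question (the channel order is an assumption and may need swapping):
const UINT camWidth = 640;
const UINT camHeight = 480;
BYTE* rgba = new BYTE[camWidth * camHeight * 4];
for (UINT i = 0; i < camWidth * camHeight; ++i)
{
// Copy the three camera channels and force alpha to fully opaque.
rgba[i * 4 + 0] = p[i * 3 + 0];
rgba[i * 4 + 1] = p[i * 3 + 1];
rgba[i * 4 + 2] = p[i * 3 + 2];
rgba[i * 4 + 3] = 0xFF;
}
D3D11_SUBRESOURCE_DATA TexInitData;
ZeroMemory(&TexInitData, sizeof(TexInitData));
TexInitData.pSysMem = rgba;
TexInitData.SysMemPitch = camWidth * 4; // bytes per row of the 640x480 texture
TexInitData.SysMemSlicePitch = 0;       // unused for a 2D texture
// ... create the 640x480 DXGI_FORMAT_R8G8B8A8_UNORM texture with this data,
// then delete[] rgba once CreateTexture2D has returned.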
I want to understand DXGI Desktop Duplication. I have read a lot, and this is code I copied from parts of the DesktopDuplication sample on the Microsoft website. My plan is to get the buffer or array from the desktop image, because I want to make a new texture for another program. I hope somebody can explain what I need to do to get it.
void DesktopDublication::GetFrame(_Out_ FRAME_DATA* Data, _Out_ bool* Timeout)
{
IDXGIResource* DesktopResource = nullptr;
DXGI_OUTDUPL_FRAME_INFO FrameInfo;
// Get new frame
HRESULT hr = m_DeskDupl->AcquireNextFrame(500, &FrameInfo, &DesktopResource);
if (hr == DXGI_ERROR_WAIT_TIMEOUT)
{
*Timeout = true;
return;
}
*Timeout = false;
if (FAILED(hr))
{
}
// If still holding old frame, destroy it
if (m_AcquiredDesktopImage)
{
m_AcquiredDesktopImage->Release();
m_AcquiredDesktopImage = nullptr;
}
// QI for IDXGIResource
hr = DesktopResource->QueryInterface(__uuidof(ID3D11Texture2D), reinterpret_cast<void **>(&m_AcquiredDesktopImage));
DesktopResource->Release();
DesktopResource = nullptr;
if (FAILED(hr))
{
}
// Get metadata
if (FrameInfo.TotalMetadataBufferSize)
{
// Old buffer too small
if (FrameInfo.TotalMetadataBufferSize > m_MetaDataSize)
{
if (m_MetaDataBuffer)
{
delete[] m_MetaDataBuffer;
m_MetaDataBuffer = nullptr;
}
m_MetaDataBuffer = new (std::nothrow) BYTE[FrameInfo.TotalMetadataBufferSize];
if (!m_MetaDataBuffer)
{
m_MetaDataSize = 0;
Data->MoveCount = 0;
Data->DirtyCount = 0;
}
m_MetaDataSize = FrameInfo.TotalMetadataBufferSize;
}
UINT BufSize = FrameInfo.TotalMetadataBufferSize;
// Get move rectangles
hr = m_DeskDupl->GetFrameMoveRects(BufSize, reinterpret_cast<DXGI_OUTDUPL_MOVE_RECT*>(m_MetaDataBuffer), &BufSize);
if (FAILED(hr))
{
Data->MoveCount = 0;
Data->DirtyCount = 0;
}
Data->MoveCount = BufSize / sizeof(DXGI_OUTDUPL_MOVE_RECT);
BYTE* DirtyRects = m_MetaDataBuffer + BufSize;
BufSize = FrameInfo.TotalMetadataBufferSize - BufSize;
// Get dirty rectangles
hr = m_DeskDupl->GetFrameDirtyRects(BufSize, reinterpret_cast<RECT*>(DirtyRects), &BufSize);
if (FAILED(hr))
{
Data->MoveCount = 0;
Data->DirtyCount = 0;
}
Data->DirtyCount = BufSize / sizeof(RECT);
Data->MetaData = m_MetaDataBuffer;
}
Data->Frame = m_AcquiredDesktopImage;
Data->FrameInfo = FrameInfo;
}
If I'm understanding you correctly, you want to get the current desktop image, duplicate it into a private texture, and then render that private texture onto your window. I would start by reading up on Direct3D 11 and learning how to render a scene, as you will need D3D to do anything with the texture object you get from DXGI. This, this, and this can get you started on D3D11. I would also spend some time reading through the source of the sample you copied your code from, as it completely explains how to do this. Here is the link to the full source code for that sample.
To actually get the texture data and render it out, you need to do the following:
1). Create a D3D11 Device object and a Device Context.
2). Write and compile a Vertex and Pixel shader for the graphics card, then load them into your application.
3). Create an Input Layout object and set it to the device.
4). Initialize the required Blend, Depth-Stencil, and Rasterizer states for the device.
5). Create a Texture object and a Shader Resource View object.
6). Acquire the Desktop Duplication texture using the above code.
7). Use CopyResource to copy the data into your texture.
8). Render that texture to the screen.
This will capture all data displayed on one of the desktops to your texture. It does not do processing on the dirty rects of the desktop. It does not do processing on moved regions. This is bare bones 'capture the desktop and display it elsewhere' code.
If you want to get more in depth, read the linked resources and study the sample code, as the sample basically does what you're asking for.
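As an illustration of steps 6 and 7, a sketch of copying the acquired desktop image into a private texture; duplication is an instance of the DesktopDublication class from the question, and myTexture is assumed to be a D3D11_USAGE_DEFAULT texture matching the desktop image's size and DXGI format:
FRAME_DATA frameData = {};
bool timedOut = false;
duplication.GetFrame(&frameData, &timedOut);
if (!timedOut && frameData.Frame != nullptr)
{
// Copy the acquired desktop image into the private texture so it can be
// sampled from a shader. ReleaseFrame must be called on the duplication
// interface before the next AcquireNextFrame.
deviceContext->CopyResource(myTexture, frameData.Frame);
}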
Since tacking this onto my last answer didn't feel quite right, I decided to create a second.
If you want to read the desktop data to a file, you need a D3D11 Device object, a texture object with the D3D11_USAGE_STAGING flag set, and a method of converting the RGBA pixel data of the desktop texture to whatever it is you want. The basic procedure is a simplified version of the one in my original answer:
1). Create a D3D11 Device object and a Device Context.
2). Create a Staging Texture with the same format as the Desktop Texture.
3). Use CopyResource to copy the Desktop Texture into your Staging Texture.
4). Use ID3D11DeviceContext::Map() to get a pointer to the data contained in the Staging Texture.
Make sure you know how Map works and make sure you can write out image files from a single binary stream. There may also be padding in the image buffer, so be aware you may also need to filter that out. Additionally, make sure you Unmap the buffer instead of calling free, as the buffer given to you almost certainly does not belong to the CRT.
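A minimal sketch of step 4, assuming stagingTex already received the desktop image via CopyResource, and that width, height and dst (a tightly packed output buffer) are provided by your own code:
D3D11_MAPPED_SUBRESOURCE mapped;
HRESULT hr = context->Map(stagingTex, 0, D3D11_MAP_READ, 0, &mapped);
if (SUCCEEDED(hr))
{
const BYTE* src = reinterpret_cast<const BYTE*>(mapped.pData);
// mapped.RowPitch may be larger than width * 4, so copy row by row and
// skip the per-row padding when writing a tightly packed image.
for (UINT y = 0; y < height; ++y)
{
memcpy(dst + y * width * 4, src + y * mapped.RowPitch, width * 4);
}
context->Unmap(stagingTex, 0);
}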
Here is my attempt:
in = cvLoadImage("24bpp_1920x1200_1.bmp", 1);
HRESULT err;
IDirect3DTexture9 * texture = NULL;
///D3DFMT_L8, D3DFMT_R8G8B8
err = D3DXCreateTexture(g_pd3dDevice, in->width, in->height, 1, 0 , D3DFMT_R8G8B8, D3DPOOL_MANAGED, &g_pTexture);
D3DLOCKED_RECT lockRect;
RECT rect;
err = g_pTexture->LockRect(0, &lockRect, NULL, 0); // I have specified that the format is RGB, so why does lockRect.Pitch = 7680?
memcpy(lockRect.pBits, in->imageData, in->widthStep*in->height);
if(FAILED(g_pTexture->UnlockRect(0)))
{
///
}
It can't display an image in RGB format, but it can display an image in grayscale or in RGBA format.
I also want to display a high-resolution image, like the texture-display sample in the DirectX SDK (June 2010) at ".\DXSDK\Samples\InstalledSamples\Textures" does.
But, again, it can't display an image larger than approximately 1920x1200 px.
How can I do this?
Two things to keep in mind:
(1) You need to check the caps for support of the 24-bpp format. Not every Direct3D 9 device supports it.
(2) You have to respect the pitch returned by LockRect, so you need to do the copy scan line by scan line; you can't do the whole thing in just one memcpy. Something like:
BYTE* sptr = in->imageData;
BYTE* dptr = reinterpret_cast<BYTE*>(lockRect.pBits);
for( int y = 0; y < in->height; ++y )
{
memcpy( dptr, sptr, in->widthStep );
sptr += in->widthStep;
dptr += lockRect.Pitch;
}
This assumes in->widthStep is the pitch of the source image in bytes, that in->widthStep is <= lockRect.Pitch, and that the format of the source image is identical to the format of the Direct3D resource.
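For point (1), a sketch of the caps check, assuming g_pD3D is the IDirect3D9 interface used to create g_pd3dDevice and that the display mode is X8R8G8B8:
HRESULT hr = g_pD3D->CheckDeviceFormat(
D3DADAPTER_DEFAULT,
D3DDEVTYPE_HAL,
D3DFMT_X8R8G8B8,   // current display-mode format
0,                 // no special usage
D3DRTYPE_TEXTURE,
D3DFMT_R8G8B8);    // the 24-bpp format being tested
if (FAILED(hr))
{
// 24-bpp textures are not supported on this device; fall back to
// D3DFMT_X8R8G8B8 and expand each pixel to 4 bytes while copying.
}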