Strange Direct3D memory behavior in Present() - C++

So I am working on my own rendering engine and encountered some strange memory behavior.
Whenever I call
IDXGISwapChain::Present(0, 0)
it increases my program's memory usage by the size of the vertices I rendered that frame.
I create a vertex buffer using this code:
ID3D10Buffer *pVertexBuffer;
D3D10_BUFFER_DESC desc;
desc.Usage = D3D10_USAGE_DYNAMIC;
desc.ByteWidth = stride * nNumVertices;
desc.BindFlags = D3D10_BIND_VERTEX_BUFFER;
desc.CPUAccessFlags = D3D10_CPU_ACCESS_WRITE;
desc.MiscFlags = 0;
D3D10_SUBRESOURCE_DATA vData;
vData.pSysMem = Vertices;
HRESULT hr = m_pDevice->CreateBuffer(&desc, &vData, &pVertexBuffer);
Draw it using
m_pDevice->IASetPrimitiveTopology(D3D10_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
m_pDevice->IASetVertexBuffers(0, 1, &pVertexBuffer, &stride, &offset);
m_pDevice->Draw(nNumVertices, 0);
And then Release it
pVertexBuffer->Release();
This produces the following debug-layer message:
D3D10 INFO: ID3D10Device::IASetVertexBuffers: A currently bound VertexBuffer is being deleted; so naturally, will no longer be bound. [ STATE_SETTING INFO #31: IASETVERTEXBUFFERS_UNBINDDELETINGOBJECT]
But according to MSDN and other questions, this shouldn't be a problem.
Has anyone else experienced this before, or could give me a helping hand?
Edit 1:
This happens a certain number of times, around 500; after that it stops using more memory.
If I then unload my mesh and reload it (so it has a different pointer), Present() starts allocating more memory, the size of my mesh, around 500 more times. This goes on until my program runs out of memory.
One thing to add: the first time I call Present(), it increases the used memory by the size of any texture I have bound, including the back buffer!

After a lot of experimenting I have found out what is happening here.
Each time an ID3D10Buffer is created with CreateBuffer(), IDXGISwapChain::Present() will allocate 4 kilobytes. For me this happens 64 times per identical buffer, and then it either stops or releases the previously allocated memory.
I hope this might help someone else who runs into this undocumented 'feature'.
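Since the allocations seem tied to CreateBuffer(), one way to sidestep them (only a sketch, reusing the names from the question: m_pDevice, stride, Vertices, nNumVertices) is to create a single dynamic vertex buffer once, sized for the largest mesh, and refill it each frame with Map/Unmap instead of creating and releasing a buffer every frame:
// Sketch: reuse one dynamic buffer instead of CreateBuffer()/Release() per frame.
// m_pDevice, stride, Vertices and nNumVertices are assumed from the question.
#include <d3d10.h>
#include <cstring>

extern ID3D10Device *m_pDevice;
extern UINT stride;
extern UINT nNumVertices;
extern void *Vertices;

static ID3D10Buffer *s_pSharedVB = nullptr;

void EnsureSharedVertexBuffer(UINT maxVertices)
{
    if (s_pSharedVB) return;                        // create once, keep for the app's lifetime
    D3D10_BUFFER_DESC desc = {};
    desc.Usage          = D3D10_USAGE_DYNAMIC;
    desc.ByteWidth      = stride * maxVertices;     // size it for the largest mesh you expect
    desc.BindFlags      = D3D10_BIND_VERTEX_BUFFER;
    desc.CPUAccessFlags = D3D10_CPU_ACCESS_WRITE;
    m_pDevice->CreateBuffer(&desc, nullptr, &s_pSharedVB);
}

void FillAndDraw()
{
    void *pData = nullptr;
    // WRITE_DISCARD hands back a fresh region, so the GPU can keep reading the old data.
    if (SUCCEEDED(s_pSharedVB->Map(D3D10_MAP_WRITE_DISCARD, 0, &pData)))
    {
        std::memcpy(pData, Vertices, stride * nNumVertices);
        s_pSharedVB->Unmap();
    }
    UINT offset = 0;
    m_pDevice->IASetPrimitiveTopology(D3D10_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
    m_pDevice->IASetVertexBuffers(0, 1, &s_pSharedVB, &stride, &offset);
    m_pDevice->Draw(nNumVertices, 0);
}
With this pattern the shared buffer is Release()-d once at shutdown instead of after every draw.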

Related

OpenGL: How does updating buffers affect speed?

I have a buffer that I map in order to send vertex attributes. Here is the basic functionality of the code:
glBindBuffer(GL_ARRAY_BUFFER, _bufferID);
_buffer = (VertexData*)glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
for (Renderable* renderable : renderables) {
    const glm::vec3& size = renderable->getSize();
    const glm::vec3& position = renderable->getPosition();
    const glm::vec4& color = renderable->getColor();
    const glm::mat4& modelMatrix = renderable->getModelMatrix();
    glm::vec3 vertexNormal = glm::vec3(0, 1, 0);

    _buffer->position = glm::vec3(modelMatrix * glm::vec4(position.x, position.y, position.z, 1));
    _buffer->color = color;
    _buffer->texCoords = glm::vec2(0, 0);
    _buffer->normal = vertexNormal;
    _buffer++;
}
and then I draw all renderables in one draw call. I am curious as to why touching the _buffer variable at all causes a massive slowdown in the program. For example, if I call std::cout << _buffer->position.x; every frame, my FPS tanks to about a quarter of what it usually is.
What I want to know is why it does this. The reason I want to know is that I want to be able to translate objects in the batch when they are moved. Essentially, I want the buffer to always stay in the same spot, but be able to change it without a huge sacrifice in performance. I assume this isn't possible, but I would like to know why. Here is an example of what I would want to do if this didn't cause massive issues:
if (renderables.at(index)->hasChangedPosition()) {
    _buffer += index;
    _buffer->position = renderables.at(index)->getPosition();
}
I am aware I can send the transforms through the shader uniform but you can't do that for batched objects in one draw call.
"why touching the _buffer variable at all causes a massive slowdown in the program"
...well, you did request a GL_WRITE_ONLY buffer; it's entirely possible that the GL driver set up the memory pages backing the pointer returned by glMapBuffer() with a custom fault handler that actually goes out to the GPU to fetch the requested bytes, which can be...not fast.
Whereas if you only write to the provided addresses the driver/OS doesn't have to do anything until the glUnmapBuffer() call, at which point it can set up a nice, fast DMA transfer to blast the new buffer contents out to GPU memory in one go.
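A minimal sketch of how that plays out for the "translate objects later" case from the question (VertexData here mirrors the fields written in the loop above, and _bufferID is assumed to be the VBO handle): keep a CPU-side copy of the vertex data, read and modify that copy when a renderable moves, and only ever write through the mapped pointer, here via glMapBufferRange with an invalidate hint:
// Sketch only: keep a CPU mirror so reads never touch GPU-mapped memory.
#include <GL/glew.h>
#include <glm/glm.hpp>
#include <cstring>
#include <vector>

struct VertexData {
    glm::vec3 position;
    glm::vec4 color;
    glm::vec2 texCoords;
    glm::vec3 normal;
};

extern GLuint _bufferID;                    // the VBO created elsewhere (assumed)
std::vector<VertexData> cpuVertices;        // readable and writable at full CPU speed

void MoveRenderable(size_t firstVertex, const glm::vec3& newPosition)
{
    // Read and modify the CPU mirror, never the mapped pointer.
    cpuVertices[firstVertex].position = newPosition;
}

void UploadAll()
{
    glBindBuffer(GL_ARRAY_BUFFER, _bufferID);
    // INVALIDATE_BUFFER tells the driver the old contents can be thrown away,
    // so the write can be streamed to the GPU without any read-back.
    void* dst = glMapBufferRange(GL_ARRAY_BUFFER, 0,
                                 cpuVertices.size() * sizeof(VertexData),
                                 GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
    std::memcpy(dst, cpuVertices.data(), cpuVertices.size() * sizeof(VertexData));
    glUnmapBuffer(GL_ARRAY_BUFFER);
}
The writes then stream out in one go at glUnmapBuffer(), while all the reads stay in ordinary system memory.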

Memory leak when loading a 2D texture with DX11

I'm loading in a texture with no real issues, but the ID3D11Texture2D that I'm using to temporarily grab the D3D11_TEXTURE2D_DESC info does not seem to be released properly.
If I don't bother using the ID3D11Texture2D item I don't have memory leaks. When I do use it, despite deleting it after getting what I need, it is left over on shutdown when using the ReportLiveObjects function.
I've tried using my own raw pointer and manually deleting it, and I've tried using a ComPtr and the various ways to free one of those up.
Microsoft::WRL::ComPtr<ID3D11Texture2D> pTextureInterface = nullptr;
ID3D11Resource *res = nullptr;
HRESULT hres = DirectX::CreateWICTextureFromFileEx(m_pDev, const_cast<wchar_t*>(wStrAddress.c_str()), 0, D3D11_USAGE_DEFAULT, D3D11_BIND_SHADER_RESOURCE, 0, 0, DirectX::WIC_LOADER_IGNORE_SRGB, &res, tempTex.ReleaseAndGetAddressOf());
tempTex->GetResource(&res);
res->QueryInterface<ID3D11Texture2D>(&pTextureInterface);
D3D11_TEXTURE2D_DESC desc;
pTextureInterface->GetDesc(&desc);
RECT r;
r.left = 0;
r.top = 0;
r.right = desc.Width;
r.bottom = desc.Height;
if (pTextureInterface != nullptr)
{
    pTextureInterface.ReleaseAndGetAddressOf();
    pTextureInterface = nullptr;
}
if (res != nullptr)
{
    res->Release();
    res = nullptr;
}
When I create and use pTextureInterface I get a DirectX memory-leak warning, with a Refcount of 1, for every texture I load.
If I don't bother to use pTextureInterface at all I have no leaks and everything loads and displays properly (except for their dimensions, since I am then setting the rectangle bounds of the source image manually).
I realize this is probably some tiny little oversight but I have been unable to find a working solution all day, any help would be appreciated.
At the invocation tempTex->GetResource(&res); the naked pointer acquired during the call to CreateWICTextureFromFileEx is overwritten without being released. So the reference count for that object is never decremented and the object leaks. It looks like tempTex->GetResource(&res); is not needed at all.
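A possible cleanup along those lines (a sketch only; tempTex is assumed to be a Microsoft::WRL::ComPtr<ID3D11ShaderResourceView> member, and m_pDev and wStrAddress come from the question) drops the extra GetResource() call and lets ComPtr handle every Release():
Microsoft::WRL::ComPtr<ID3D11Resource> res;
Microsoft::WRL::ComPtr<ID3D11Texture2D> pTextureInterface;

HRESULT hres = DirectX::CreateWICTextureFromFileEx(
    m_pDev, wStrAddress.c_str(), 0,
    D3D11_USAGE_DEFAULT, D3D11_BIND_SHADER_RESOURCE, 0, 0,
    DirectX::WIC_LOADER_IGNORE_SRGB,
    res.GetAddressOf(), tempTex.ReleaseAndGetAddressOf());

if (SUCCEEDED(hres))
{
    // res already owns the texture resource; just query its 2D interface.
    // Both ComPtrs release automatically when they go out of scope,
    // so nothing is left over for ReportLiveObjects to complain about.
    res.As(&pTextureInterface);

    D3D11_TEXTURE2D_DESC desc;
    pTextureInterface->GetDesc(&desc);

    RECT r{ 0, 0, (LONG)desc.Width, (LONG)desc.Height };
}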

Exception "Texture cannot be null" Direct X

I am coding a 2D Game using DirectX11 and DirectXTK.
I wrote a Framework class that initializes both the window displayed for the game and DirectX. These initializations work correctly. Then I decided to draw some backgrounds, etc. in the window, but after a while it exits on an exception. I added a try { ... } catch() { } block, which tells me that "Texture cannot be null". However, I could not find which texture it is talking about, even by debugging and checking all the values.
I decided to separate the different elements I was drawing in the window, to see where the problem might come from... So now I have 3 draw methods:
Draw(DWORD &elapsedTime);
DrawBackground(DWORD &elapsedTime);
DrawCharacter(DWORD &elapsedTime);
The Draw(DWORD &elapsedTime) method calls both the DrawBackground() and DrawCharacter() methods.
Here is my Draw method:
void Framework::Draw(DWORD * elapsedTime)
{
    // Clearing the Back Buffer
    immediateContext->ClearRenderTargetView(renderTargetView, Colors::Aquamarine);
    // Clearing the depth buffer to max depth (1.0)
    immediateContext->ClearDepthStencilView(depthStencilView, D3D11_CLEAR_DEPTH, 1.0f, 0); // immediateContext is a ID3D11DeviceContext*

    CommonStates states(d3dDevice); // d3dDevice is a ID3D11Device*
    sprites.reset(new SpriteBatch(immediateContext));
    sprites->Begin(SpriteSortMode_Deferred, states.NonPremultiplied());

    DrawBackground1(elapsedTime);
    DrawCharacter(elapsedTime);

    sprites->End();

    // Presenting the back buffer to the front buffer
    swapChain->Present(0, 0);
}
By debugging I am almost sure that the exception comes from both DrawBackground() and DrawCharacter(). Indeed, when I comment those out in the Draw method I have no error, but as soon as I put one back it throws the exception after displaying what I want for a few seconds.
Here is the DrawBackground() method, for example:
void Framework::DrawBackground1(DWORD * elapsedTime)
{
    RECT *try1 = new RECT();
    try1->bottom = 0; try1->left = 0; try1->right = (int)WIDTH; try1->bottom = (int)HEIGHT;

    ID3D11ShaderResourceView * texture2 = nullptr;
    ID3D11ShaderResourceView * textureRV = nullptr;
    CreateDDSTextureFromFile(d3dDevice, L"../Images/backgrounds/set2_background.dds", nullptr, &textureRV);
    CreateDDSTextureFromFile(d3dDevice, L"../Images/backgrounds/set3_tiles.dds", nullptr, &texture2);

    sprites->Draw(textureRV, XMFLOAT2(0, 0), try1, Colors::White);
    sprites->Draw(texture2, XMFLOAT2(0, 0), try1, Colors::CornflowerBlue);
}
So as soon as I uncomment this method (or DrawCharacter(), which follows the same steps), the window displays what I expect for a few seconds, but then I get the exception "Texture cannot be null". I also noticed that DrawCharacter() lets the window display what I want for longer than DrawBackground() does, whose texture is much bigger than the character's.
I'm not sure if this information is useful, but I think this might be linked to the size of the texture?
Do you notice anything that I did wrong in this code? Why would a texture be considered null when it does get displayed for a while? I've been looking for answers for a few hours now; some help would be amazing, please!
Thank you
I noticed that you create two new ID3D11ShaderResourceView objects every iteration without Release()-ing the old ones. You could try creating the ShaderResourceViews only once and storing them as member (or global) variables, or you could try Release()-ing them after the sprites->Draw(...) calls.
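The first suggestion could look roughly like this (a sketch; m_backgroundRV and m_tilesRV are hypothetical members, while d3dDevice, sprites, WIDTH and HEIGHT come from the question): load the two DDS files once at initialization and only issue the SpriteBatch draws in the render loop:
// Hypothetical members holding the textures for the whole lifetime of the Framework.
Microsoft::WRL::ComPtr<ID3D11ShaderResourceView> m_backgroundRV;
Microsoft::WRL::ComPtr<ID3D11ShaderResourceView> m_tilesRV;

void Framework::LoadTextures()
{
    // Called once after the device is created, not once per frame.
    CreateDDSTextureFromFile(d3dDevice, L"../Images/backgrounds/set2_background.dds",
                             nullptr, m_backgroundRV.ReleaseAndGetAddressOf());
    CreateDDSTextureFromFile(d3dDevice, L"../Images/backgrounds/set3_tiles.dds",
                             nullptr, m_tilesRV.ReleaseAndGetAddressOf());
}

void Framework::DrawBackground1(DWORD *elapsedTime)
{
    RECT fullScreen{ 0, 0, (LONG)WIDTH, (LONG)HEIGHT };
    // No loading here, so nothing leaks and the SRVs stay valid frame after frame.
    sprites->Draw(m_backgroundRV.Get(), XMFLOAT2(0, 0), &fullScreen, Colors::White);
    sprites->Draw(m_tilesRV.Get(), XMFLOAT2(0, 0), &fullScreen, Colors::CornflowerBlue);
}
The same idea applies to DrawCharacter(); checking the HRESULT of each CreateDDSTextureFromFile call would also reveal immediately if a file fails to load and leaves a view null.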

Bind CUDA output array/surface to GL texture in ManagedCUDA

I'm currently attempting to connect some form of output from a CUDA program to a GL_TEXTURE_2D for use in rendering. I'm not that worried about the output type from CUDA (whether that's an array or a surface, I can adapt the program to it).
So the question is, how would I do that? (My current code copies the output array to system memory and uploads it to the GPU again with GL.TexImage2D, which is obviously highly inefficient; when I disable those two pieces of code, it goes from approximately 300 kernel executions per second to a whopping 400.)
I already have a little bit of test code, to at least bind a GL texture to CUDA, but I'm not even able to get the device pointer from it...
ctx = CudaContext.CreateOpenGLContext(CudaContext.GetMaxGflopsDeviceId(), CUCtxFlags.SchedAuto);
uint textureID = (uint)GL.GenTexture(); //create a texture in GL
GL.TexParameter(TextureTarget.Texture2D, TextureParameterName.TextureMinFilter, (int)TextureMinFilter.Linear);
GL.TexParameter(TextureTarget.Texture2D, TextureParameterName.TextureMagFilter, (int)TextureMagFilter.Linear);
GL.TexImage2D(TextureTarget.Texture2D, 0, PixelInternalFormat.Rgba, width, height, 0, OpenTK.Graphics.OpenGL.PixelFormat.Rgba, PixelType.UnsignedByte, null); //allocate memory for the texture in GL
CudaOpenGLImageInteropResource resultImage = new CudaOpenGLImageInteropResource(textureID, CUGraphicsRegisterFlags.WriteDiscard, CudaOpenGLImageInteropResource.OpenGLImageTarget.GL_TEXTURE_2D, CUGraphicsMapResourceFlags.WriteDiscard); //using writediscard because the CUDA kernel will only write to this texture
//then, as far as I understood the ManagedCuda example, I have to do the following when I call my kernel
//(done without a CudaGraphicsInteropResourceCollection because I only have one item)
resultImage.Map();
var ptr = resultImage.GetMappedPointer(); //this crashes
kernelSample.Run(ptr); //pass the pointer to the kernel so it knows where to write
resultImage.UnMap();
The following exception is thrown when attempting to get the pointer:
ErrorNotMappedAsPointer: This indicates that a mapped resource is not available for access as a pointer.
What do I need to do to fix this?
And even if this exception can be resolved, how would I solve the other part of my question; that is, how do I work with the acquired pointer in my kernel? Can I use a surface for that? Access it as an arbitrary array (pointer arithmetic)?
Edit:
Looking at this example, apparently I don't even need to map the resource every time I call the kernel and the render function. But how would this translate to ManagedCUDA?
Thanks to the example I found, I was able to translate it to ManagedCUDA (after browsing the source code and fiddling around), and I'm happy to announce that this really does improve my samples per second from about 300 to 400 :)
Apparently a 3D array is needed (I haven't seen any overloads in ManagedCUDA that use 2D arrays), but that doesn't really matter; I just use a 3D array/texture which is exactly 1 deep.
id = GL.GenTexture();
GL.BindTexture(TextureTarget.Texture3D, id);
GL.TexParameter(TextureTarget.Texture3D, TextureParameterName.TextureMinFilter, (int)TextureMinFilter.Linear);
GL.TexParameter(TextureTarget.Texture3D, TextureParameterName.TextureMagFilter, (int)TextureMagFilter.Linear);
GL.TexImage3D(TextureTarget.Texture3D, 0, PixelInternalFormat.Rgba, width, height, 1, 0, OpenTK.Graphics.OpenGL.PixelFormat.Bgra, PixelType.UnsignedByte, IntPtr.Zero); //allocate memory for the texture but do not upload anything
CudaOpenGLImageInteropResource resultImage = new CudaOpenGLImageInteropResource((uint)id, CUGraphicsRegisterFlags.SurfaceLDST, CudaOpenGLImageInteropResource.OpenGLImageTarget.GL_TEXTURE_3D, CUGraphicsMapResourceFlags.WriteDiscard);
resultImage.Map();
CudaArray3D mappedArray = resultImage.GetMappedArray3D(0, 0);
resultImage.UnMap();
CudaSurface surfaceResult = new CudaSurface(kernelSample, "outputSurface", CUSurfRefSetFlags.None, mappedArray); //nothing needs to be done anymore - this call connects the 3D array from the GL texture to a surface reference in the kernel
Kernel code:
// Surface references need an element type and dimensionality to compile.
surface<void, cudaSurfaceType3D> outputSurface;

__global__ void Sample() {
    ...
    // Note: the x coordinate passed to surf3Dwrite is in bytes, not elements.
    surf3Dwrite(output, outputSurface, pixelX * sizeof(output), pixelY, 0);
}

(DirectX 11) Dynamic Vertex/Index Buffers implementation with constant scene content changes

I've been delving into unmanaged DirectX 11 for the first time (bear with me), and there's an issue that, although asked about several times over the forums, still leaves me with questions.
I am developing an app in which objects are added to the scene over time. On each render loop I want to collect all vertices in the scene and render them, reusing a single vertex and index buffer, for performance and best practice. My question is regarding the usage of dynamic vertex and index buffers: I haven't been able to fully understand their correct usage when the scene content changes.
vertexBufferDescription.Usage = D3D11_USAGE_DYNAMIC;
vertexBufferDescription.BindFlags = D3D11_BIND_VERTEX_BUFFER;
vertexBufferDescription.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
vertexBufferDescription.MiscFlags = 0;
vertexBufferDescription.StructureByteStride = 0;
Should I create the buffers when the scene is initialized and somehow update their contents every frame? If so, what ByteWidth should I set in the buffer description? And what do I initialize it with?
Or should I create the buffer the first time the scene is rendered (frame 1), using the current vertex count as its size? If so, when I add another object to the scene, don't I need to recreate the buffer, changing the buffer description's ByteWidth to the new vertex count? If my scene keeps updating its vertices every frame, a single dynamic buffer would lose its purpose this way...
I've been testing initializing the buffer the first time the scene is rendered and, from there on, using Map/Unmap every frame. I start by filling a vector with all the scene vertices and then update the resource like so:
void Scene::Render()
{
    (...)
    std::vector<VERTEX> totalVertices;
    std::vector<int> totalIndices;
    int totalVertexCount = 0;
    int totalIndexCount = 0;

    for (shapeIterator = models.begin(); shapeIterator != models.end(); ++shapeIterator)
    {
        Model* currentModel = (*shapeIterator);
        // totalVertices gets filled here...
    }

    // At this point totalVertices and totalIndices have all scene data
    if (isVertexBufferSet)
    {
        // This is where it copies the new vertices to the buffer,
        // but it's causing flickering in the entire screen...
        D3D11_MAPPED_SUBRESOURCE resource;
        context->Map(vertexBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &resource);
        memcpy(resource.pData, &totalVertices[0], sizeof(totalVertices));
        context->Unmap(vertexBuffer, 0);
    }
    else
    {
        // This is run in the first frame. But what if new vertices are added to the scene?
        vertexBufferDescription.ByteWidth = sizeof(VERTEX) * totalVertexCount;
        UINT stride = sizeof(VERTEX);
        UINT offset = 0;
        D3D11_SUBRESOURCE_DATA resourceData;
        ZeroMemory(&resourceData, sizeof(resourceData));
        resourceData.pSysMem = &totalVertices[0];
        device->CreateBuffer(&vertexBufferDescription, &resourceData, &vertexBuffer);
        context->IASetVertexBuffers(0, 1, &vertexBuffer, &stride, &offset);
        isVertexBufferSet = true;
    }
At the end of the render loop, while keeping track of each object's vertex offset into the buffer, I finally invoke Draw():
context->Draw(objectVertexCount, currentVertexOffset);
}
My current implementation is causing my whole scene to flicker, but there are no memory leaks. I wonder if it has anything to do with the way I am using the Map/Unmap API?
Also, in this scenario, when would it be ideal to invoke buffer->Release()?
Tips or code sample would be great! Thanks in advance!
At the memcpy into the vertex buffer you do the following:
memcpy(resource.pData, &totalVertices[0], sizeof(totalVertices));
sizeof(totalVertices) is just asking for the size of a std::vector<VERTEX>, which is not what you want.
Try the following code:
memcpy(resource.pData, &totalVertices[0], sizeof( VERTEX ) * totalVertices.size() );
Also, you don't appear to be calling IASetVertexBuffers when isVertexBufferSet is true. Make sure you do so.
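Putting both points together, the per-frame update could look roughly like this (a sketch that reuses the question's member names, and assumes the buffer was created once up front with a ByteWidth large enough for the biggest scene expected):
void Scene::UpdateAndBindVertexBuffer(const std::vector<VERTEX>& totalVertices)
{
    const UINT byteCount = (UINT)(sizeof(VERTEX) * totalVertices.size());

    // Refill the existing dynamic buffer; WRITE_DISCARD avoids stalling on the GPU.
    D3D11_MAPPED_SUBRESOURCE resource;
    context->Map(vertexBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &resource);
    memcpy(resource.pData, totalVertices.data(), byteCount);   // real byte size, not sizeof(vector)
    context->Unmap(vertexBuffer, 0);

    // Bind every frame so the IA stage always sees the current buffer.
    UINT stride = sizeof(VERTEX);
    UINT offset = 0;
    context->IASetVertexBuffers(0, 1, &vertexBuffer, &stride, &offset);
}
If the scene ever outgrows the capacity chosen at creation time, the buffer does need to be Release()-d and recreated with a larger ByteWidth, but that should be the exception rather than something done every frame.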