D3D11: E_OUTOFMEMORY when mapping vertex buffer - c++

In my Unity game, I have to modify a lot of graphics resources, like textures and vertex buffers, via native code to keep performance good.
The problems start when the code calls ID3D11DeviceContext::Map on the immediate context several times in a very short time (I mean very short: called from different threads running in parallel). There is no pattern to whether a mapping succeeds or not. The call looks like this:
ID3D11DeviceContext* sU_m_D_context;

void* BeginModifyingVBO(void* bufferHandle)
{
    ID3D11Buffer* d3dbuf = static_cast<ID3D11Buffer*>(bufferHandle);
    D3D11_MAPPED_SUBRESOURCE mapped;
    HRESULT res = sU_m_D_context->Map(d3dbuf, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
    assert(mapped.pData);
    return mapped.pData;
}

void FinishModifyingVBO(void* bufferHandle)
{
    ID3D11Buffer* d3dbuf = static_cast<ID3D11Buffer*>(bufferHandle);
    sU_m_D_context->Unmap(d3dbuf, 0);
}

std::mutex sU_m_D_locker;

void Mesh::ApplyBuffer()
{
    sU_m_D_locker.lock();
    // map buffer
    VBVertex* mappedBuffer = (VBVertex*)BeginModifyingVBO(this->currentBufferPtr);
    memcpy(mappedBuffer, this->mainBuffer, this->mainBufferLength * sizeof(VBVertex));
    // unmap buffer
    FinishModifyingVBO(this->currentBufferPtr);
    sU_m_D_locker.unlock();
    this->markedAsChanged = false;
}
where d3dbuf is a dynamic vertex buffer. I don't know why, but sometimes the result is E_OUTOFMEMORY, even though there is plenty of free memory. I tried surrounding the code with mutexes, with no effect.
Is this really a memory problem, or maybe something less obvious?

None of the device context methods are thread safe. If you are going to use them from several threads you will need to either manually sync all the calls, or use multiple (deferred) contexts, one per thread. See Introduction to Multithreading in Direct3D 11.
Also, error checking should be better: you need to always check the returned HRESULT, because in case of failure something like assert(mapped.pData); may still pass even though the mapping failed.
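For illustration, a minimal sketch of what both suggestions could look like applied to the question's BeginModifyingVBO (the mutex shown here is an assumption and would have to guard every call made on the immediate context, not just this one):

std::mutex g_contextMutex; // assumed to protect *all* uses of sU_m_D_context

void* BeginModifyingVBO(void* bufferHandle)
{
    ID3D11Buffer* d3dbuf = static_cast<ID3D11Buffer*>(bufferHandle);
    D3D11_MAPPED_SUBRESOURCE mapped = {};

    std::lock_guard<std::mutex> lock(g_contextMutex);
    HRESULT res = sU_m_D_context->Map(d3dbuf, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
    if (FAILED(res))
    {
        // Report the error instead of asserting on mapped.pData, which is not
        // guaranteed to be null when Map fails.
        return nullptr;
    }
    return mapped.pData;
}

Alternatively, per-thread deferred contexts avoid the lock entirely, at the cost of recording command lists that the immediate context must later execute.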

Related

Why don't these static functions return as expected from another thread?

I'm trying to use multiple threads to make one function run concurrently with another, but when the function that the new thread is running uses a static function, it always returns 0 for some reason.
I'm using Boost for the threading, on Linux, and the static functions work exactly as expected when not using threads. I'm pretty sure this isn't a data race issue because if I join the thread directly after making it (not giving any other code a chance to change anything), the problem persists.
The function that the thread is created in:
void WorldIOManager::createWorld(unsigned int seed, std::string worldName, bool isFlat) {
    boost::thread t( [=]() { P_createWorld(seed, worldName, isFlat); } );
    t.join();
    //P_createWorld(seed, worldName, isFlat); // This works perfectly fine
}
The part of P_createWorld that uses a static function (The function that the newly-created thread actually runs):
m_world->chunks[i]->tiles[y][x] = createBlock(chunkData[i].tiles[y][x].id, chunkData[i].tiles[y][x].pos, m_world->chunks[i]);
m_world is a struct that contains an array of Chunks, each of which has a two-dimensional array of Tiles, and each Tile has a texture ID associated with a texture in a cache. createBlock returns a pointer to a new, completely initialized tile. The static function in question belongs to a statically-linked library and is defined as follows:
namespace GLEngine {
    // This is a way for us to access all our resources, such as
    // Models or textures.
    class ResourceManager
    {
    public:
        static GLTexture getTexture(std::string texturePath);
    private:
        static TextureCache _textureCache;
    };
}
Also, its implementation:
#include "ResourceManager.h"
namespace GLEngine {
TextureCache ResourceManager::_textureCache;
GLTexture ResourceManager::getTexture(std::string texturePath) {
return _textureCache.getTexture(texturePath);
}
}
Expected result: For each tile to actually get assigned its proper texture ID
Actual result: Every tile, no matter the texturePath, is assigned 0 as its texture ID.
If you need any more code like the constructor for a tile or createBlock(), I'll happily add it, I just don't really know what information is relevant in this kind of situation...
So, as I stated before, all of this works perfectly if I don't have a thread, so my final question is: Is there some sort of undefined behaviour that has to do with static functions being called by threads, or am I just doing something wrong here?
As @fifoforlifo mentioned, OpenGL contexts have thread affinity, and it turns out I was making GL calls deeper inside my texture loading function. I created a second GL context and turned on context sharing, and then it began to work. Thanks a lot, @fifoforlifo!
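For anyone hitting the same thing, here is a minimal sketch of that fix; the question doesn't say which windowing library is used, so GLFW is assumed purely for illustration:

// Each thread needs its own current context, and the loader context must
// share objects (textures, buffers) with the render context.
#include <GLFW/glfw3.h>
#include <thread>

int main() {
    glfwInit();

    GLFWwindow* mainWindow = glfwCreateWindow(800, 600, "main", nullptr, nullptr);

    // Hidden 1x1 window whose context shares objects with mainWindow's context.
    glfwWindowHint(GLFW_VISIBLE, GLFW_FALSE);
    GLFWwindow* loaderWindow = glfwCreateWindow(1, 1, "loader", nullptr, mainWindow);

    glfwMakeContextCurrent(mainWindow);           // render thread owns this context

    std::thread loader([loaderWindow]() {
        glfwMakeContextCurrent(loaderWindow);     // worker thread owns the shared context
        // ... texture loading that makes GL calls happens here ...
    });
    loader.join();

    glfwTerminate();
    return 0;
}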

How does glDeleteTextures and glDeleteBuffers work?

Basically, in my code I hook the glDeleteTextures and glBufferData functions. I store a list of textures and a list of buffers. The buffer list holds checksums and pointers to the buffer. The below code intercepts the data before it reaches the graphics card.
void Hook_glDeleteTextures(GLsizei n, const GLuint* textures)
{
    for (int I = 0; I < n; ++I)
    {
        if (ListOfTextures[I] == textures[I]) //??? Not sure if correct..
        {
            // Erase them from list of textures..
        }
    }
    (*original_glDeleteTextures)(n, textures);
}
And I do the same thing for my buffers. I save the buffers and textures to a list like below:
void Hook_glBufferData(GLenum target, GLsizeiptr size, const GLvoid* data, GLenum usage)
{
    Buffer.size = size;
    Buffer.target = target;
    Buffer.data = data;
    Buffer.usage = usage;
    ListOfBuffers.push_back(Buffer);
    (*original_glBufferData)(target, size, data, usage);
}
Now I need to delete whenever the client deletes. How can I do this? I used a debugger and it seems to know exactly which textures and buffers are being deleted.
Am I doing it wrong? Should I be iterating the pointers passed and deleting the textures?
You do realize that you should do it the other way round: keep a list of texture-info objects, and when you delete one of them, call OpenGL to delete the texture. BTW: OpenGL calls don't go to the graphics card, they go to the driver, and textures may not be stored in GPU memory at all but be swapped out to system memory.
Am I doing it wrong? Should I be iterating the pointers passed and deleting the textures?
Yes. You should not intercept OpenGL calls to trigger data management in your program. For one, you'd have to track the active OpenGL context as well. But more importantly, it's your program that makes the OpenGL calls in the first place. And unless your program/compiler/CPU is schizophrenic, it should be easier to track the data first and manage the OpenGL objects accordingly. Also, the usual approach is to keep texture image data in a cache, but delete the OpenGL textures for those images if you don't need them right now but may need them again in the near future.
Your approach is basically inside-out, you're putting the cart before the horse.
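As a rough sketch of the inverted approach (all names here are made up for illustration), the program owns texture objects and the GL name is deleted when the owning object goes away, so no interception is needed:

#include <GL/gl.h>
#include <memory>
#include <string>
#include <vector>

class Texture {
public:
    explicit Texture(const std::string& path) {
        glGenTextures(1, &m_name);
        // ... bind, upload image data loaded from `path` ...
    }
    ~Texture() {
        glDeleteTextures(1, &m_name);   // the GL object is deleted exactly here
    }
    Texture(const Texture&) = delete;            // avoid double-deleting the GL name
    Texture& operator=(const Texture&) = delete;
    GLuint name() const { return m_name; }
private:
    GLuint m_name = 0;
};

// The application tracks its own list; erasing an entry destroys the Texture,
// which deletes the GL object, rather than the other way around.
std::vector<std::unique_ptr<Texture>> g_textures;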

Segfault accessing classes across threads

I'm a bit stumped on an issue I'm having with threading and C++. I'm writing a DSP plugin for Windows Media Player, and I want to send the data I intercept to a separate thread where I'll send it out on the network. I'm using a simple producer-consumer queue like the one explained here
The program crashes in the isFull() function, which just compares two integers:
bool ThreadSafeQueue::isFull()
{
    if (inCount == outCount) //CRASH!
        return true;
    else
        return false;
}
The thread that's doing the dequeuing:
void WMPPlugin::NetworkThread (LPVOID pParam)
{
    ThreadSafeQueue* dataQueue = (ThreadSafeQueue*)(pParam);
    while (!networkThreadDone)
    {
        Sleep(2); // so we don't hog the processor or make a race condition
        if (!dataQueue->isFull())
            short s = dataQueue->dequeue();
        if (networkThreadDone) // variable set in another process so we know to exit
            break;
    }
}
The constructor of the class that's creating the consumer thread:
WMPPlugin::WMPPlugin()
{
    // etc etc
    dataQueue = new ThreadSafeQueue();
    _beginthread(WMPPlugin::NetworkThread, 0, dataQueue);
}
inCount and outCount are just integers and they're only read here, not written. I was under the impression this meant they were thread safe. The parts that write them aren't included, but each variable is only written to by one thread, never by both. I've done my best not to include code that I don't feel is the issue, but I can include more if necessary. Thanks in advance for any help.
Most often, when a crash happens while accessing a normal member variable, it means the this pointer is NULL or an invalid address.
Are you sure you aren't invoking it on a NULL instance?
Regarding this line:
ThreadSafeQueue* dataQueue = (ThreadSafeQueue*)(pParam);
How sure are you that pParam is always non-NULL?
How sure are you that pParam is always a ThreadSafeQueue object?
Are you possibly deleting the ThreadSafeQueue objects on other threads?
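One way to rule out that last possibility is to make sure the queue outlives the worker thread. The sketch below is hypothetical (the question doesn't show a destructor or how the thread handle is kept): it assumes the thread is started with _beginthreadex, whose handle stays valid for waiting, stored in an assumed member m_threadHandle.

// Thread routine signature required by _beginthreadex (instead of _beginthread):
// static unsigned __stdcall NetworkThread(void* pParam);

WMPPlugin::WMPPlugin()
{
    dataQueue = new ThreadSafeQueue();
    m_threadHandle = (HANDLE)_beginthreadex(nullptr, 0, &WMPPlugin::NetworkThread, dataQueue, 0, nullptr);
}

WMPPlugin::~WMPPlugin()
{
    networkThreadDone = true;                       // ask the worker to stop (ideally an atomic/volatile flag)
    WaitForSingleObject(m_threadHandle, INFINITE);  // wait until NetworkThread has returned
    CloseHandle(m_threadHandle);
    delete dataQueue;                               // only now is it safe to free the queue
}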

Accessing and modifying automatic variables on another thread's stack

I want to pass some data between threads but want to refrain from using global variables if I can manage it. The way I wrote my thread routine, the user passes in a separate function for each "phase" of a thread's life cycle. For instance, this would be typical usage when spawning a thread:
void init_thread(void *arg) {
    graphics_init();
}

void process_msg_thread(message *msg, void *arg) {
    if (msg->ID == MESSAGE_DRAW) {
        graphics_draw();
    }
}

void cleanup_thread(void *arg) {
    graphics_cleanup();
}

int main () {
    threadCreator factory;
    factory.createThread(init_thread, 0, process_msg_thread, 0, cleanup_thread, 0);
    // even-indexed arguments are the args to be passed into their respective functions;
    // this is why each of those functions must have a fixed signature, so they can be
    // passed to the factory this way
}
// Behind the scenes: in the newly spawned thread, the first argument given to
// createThread() is called, then a message pumping loop which will call the third
// argument is entered. Upon receiving a special exit message via another function
// of threadCreator, the fifth argument is called.
The most straightforward way to do this is with globals. I'd like to avoid that, though, because it is bad programming practice and generates clutter.
A certain problem arises when I try to refine my example slightly:
void init_thread(void *arg) {
    GLuint tex_handle[50]; // suppose I've got 50 textures to deal with.
    graphics_init(&tex_handle); // fill up the array with them during graphics init which loads my textures
}

void process_msg_thread(message *msg, void *arg) {
    if (msg->ID == MESSAGE_DRAW) { // this message indicates which texture my thread was told to draw
        graphics_draw_this_texture(tex_handle[msg->texturehandleindex]); // send back the handle so it knows what to draw
    }
}

void cleanup_thread(void *arg) {
    graphics_cleanup();
}
I am greatly simplifying the interaction with the graphics system here, but you get the point. In this example tex_handle is an automatic variable, and all its values are lost when init_thread completes, so they will not be available when process_msg_thread needs to reference them.
I can fix this by using globals, but that means I can't have (for instance) two of these threads running simultaneously, since they would trample on each other's texture handle list, as they'd be using the same one.
I can use thread-local globals, but is that a good idea?
I came up with one last idea. I can allocate storage on the heap in my parent thread and send a pointer to it to the children to mess with. Then I can just free it when the parent thread finishes, since I intend for it to clean up its child threads before it exits anyway. So, something like this:
void init_thread(void *arg) {
    GLuint *tex_handle = (GLuint*)arg; // my storage space passed as arg
    graphics_init(tex_handle);
}

void process_msg_thread(message *msg, void *arg) {
    GLuint *tex_handle = (GLuint*)arg; // same thing here
    if (msg->ID == MESSAGE_DRAW) {
        graphics_draw_this_texture(tex_handle[msg->texturehandleindex]);
    }
}

int main () {
    threadCreator factory;
    GLuint *tex_handle = new GLuint[50];
    factory.createThread(init_thread, tex_handle, process_msg_thread, tex_handle, cleanup_thread, 0);
    // do stuff, wait etc
    ...
    delete[] tex_handle;
}
This looks more or less safe because my values go on the heap: my main thread allocates the storage and then lets the children mess with it as they wish. The children can use it freely since the pointer was given to all the functions that need access.
So this got me thinking: why not just make it an automatic variable?
int main () {
    threadCreator factory;
    GLuint tex_handle[50];
    factory.createThread(init_thread, &tex_handle, process_msg_thread, &tex_handle, cleanup_thread, 0);
    // do stuff, wait etc
    ...
} // tex_handle automatically cleaned up at this point
This means the child threads directly access the parent's stack. I wonder if this is kosher.
I found this on the internets: http://software.intel.com/sites/products/documentation/hpc/inspectorxe/en-us/win/ug_docs/olh/common/Problem_Type__Potential_Privacy_Infringement.htm
It seems Intel Inspector XE detects this behavior. So maybe I shouldn't do it? Is it simply a warning of potential privacy infringement, as the URL suggests, or are there other potential issues that may arise that I am not aware of?
P.S. After thinking through all this, I realize that maybe this architecture of splitting a thread into a bunch of functions that get called independently wasn't such a great idea. My intention was to remove the complexity of having to code up a message handling loop for each thread that gets spawned. I had anticipated possible problems: if I had a generalized thread implementation that always checked for messages (like my custom one that tells the thread to terminate), then I could guarantee that some future user couldn't accidentally forget to check for that condition in each and every message loop of theirs.
The problem with my solution to that is that those individual functions are now separate and cannot communicate with each other. They may do so only via globals and thread local globals. I guess thread local globals may be my best option.
P.P.S. This got me thinking about RAII and how the concept of the thread at least as I have ended up representing it has a certain similarity with that of a resource. Maybe I could build an object that represents a thread more naturally than traditional ways... somehow. I think I will go sleep on it.
Put your thread functions into a class. Then they can communicate using instance variables. This requires your thread factory to be changed, but is the cleanest way to solve your problem.
Your idea of using automatic variables will work too, as long as you can guarantee that the function whose stack frame contains the data never returns before your child threads exit. This is not really easy to achieve; even after main() returns, child threads can still be running.
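A minimal sketch of the class-based suggestion (the required threadCreator change is assumed, not shown): the three phases become member functions and share state through a data member instead of globals or a pointer into another thread's stack.

#include <vector>

class GraphicsThread {
public:
    void init() {
        tex_handles.resize(50);
        graphics_init(tex_handles.data());   // fills in the texture handles
    }
    void process_msg(message *msg) {
        if (msg->ID == MESSAGE_DRAW) {
            graphics_draw_this_texture(tex_handles[msg->texturehandleindex]);
        }
    }
    void cleanup() {
        graphics_cleanup();
    }
private:
    std::vector<GLuint> tex_handles;  // shared by all three phases, lives as long as the object
};

// The factory would own one GraphicsThread per spawned thread and call
// obj.init(), obj.process_msg(msg), and obj.cleanup() from that thread.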

Problem with a trainer I'm trying to create (for educational purposes)

I'm trying to create a trainer for Icy Tower 1.4, for educational purposes.
I wrote a function that wraps the WriteProcessMemory call, like this:
void WPM(HWND hWnd, int address, byte data[])
{
    DWORD proc_id;
    GetWindowThreadProcessId(hWnd, &proc_id);
    HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, proc_id);
    if(!hProcess)
        return;
    DWORD dataSize = sizeof(data);
    WriteProcessMemory(hProcess, (LPVOID)address, &data, dataSize, NULL);
    CloseHandle(hProcess);
}
and that's the function that should stop the Icy Tower Clock:
void ClockHack(int status)
{
    if(status == 1) // enable
    {
        // crashes the game
        byte data[] = {0xc7,0x05,0x04,0x11,0x45,0x00,0x00,0x00,0x00,0x00};
        WPM(FindIcyTower(), 0x00415E19, data);
    }
    else if(status == 0) // disable
    {
        byte data[] = {0xA3,0x04,0x11,0x45,0x00};
    }
}
In the else branch is the original AOB (array of bytes) of the opcode.
When I call the ClockHack function with the status parameter set to 1, the game crashes.
In Cheat Engine I wrote a script for this that doesn't write to exactly the same address, because I used a code cave, and it works great.
Does someone know why? Thank you.
By the way: it is for educational purposes only.
You can't pass an array to a function like that. Having a byte[] parameter is the same as a byte * parameter, and sizeof(data) will just give you the size of a pointer. Also, you shouldn't use &data since it's already a pointer.
So your function should look like:
void WPM(HWND hWnd, int address, byte *data, int dataSize)
{
    //....
    WriteProcessMemory(hProcess, (LPVOID)address, data, dataSize, NULL);
    //...
}
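The call site would then pass the size explicitly, for example (based on the question's ClockHack):

byte data[] = {0xc7,0x05,0x04,0x11,0x45,0x00,0x00,0x00,0x00,0x00};
WPM(FindIcyTower(), 0x00415E19, data, sizeof(data));   // sizeof(data) is 10 here, the full patch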
When an array is passed into a function it decays to a pointer, so byte[] is the same as byte*, and sizeof(data) is only sizeof(byte*), which is 4 bytes on x86 platforms, so only that many bytes of your code get written.
Also, it looks like what you are writing is object code; if not, ignore the rest of this answer.
Well, assuming that you are writing to the correct location and what you are writing is correct, you still have a problem: WriteProcessMemory isn't guaranteed to be atomic with respect to the thread that is running in the target process.
You need to make sure that the target thread is suspended and not executing in that part of the code. And I have no idea what sort of thing you (may) have to do to flush the instruction decoding pipeline and/or the L1 cache.
Edit: Now that I've thought about it some more, I think that using a mutex to protect this piece of code from being overwritten while it is being executed is better than suspending the thread.
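For illustration only, a rough sketch of the suspend/write/flush/resume sequence described above; it assumes hProcess and proc_id are obtained the same way as in the question's WPM, and it does not handle page protection or error checking:

#include <windows.h>
#include <tlhelp32.h>
#include <vector>

void PatchWhileSuspended(HANDLE hProcess, DWORD proc_id, LPVOID address, const BYTE* data, SIZE_T dataSize)
{
    std::vector<HANDLE> threads;

    // Suspend every thread of the target process so none of them can be
    // executing the bytes we are about to overwrite.
    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
    THREADENTRY32 te;
    te.dwSize = sizeof(te);
    if (Thread32First(snap, &te))
    {
        do
        {
            if (te.th32OwnerProcessID == proc_id)
            {
                HANDLE hThread = OpenThread(THREAD_SUSPEND_RESUME, FALSE, te.th32ThreadID);
                if (hThread)
                {
                    SuspendThread(hThread);
                    threads.push_back(hThread);
                }
            }
        } while (Thread32Next(snap, &te));
    }
    CloseHandle(snap);

    WriteProcessMemory(hProcess, address, data, dataSize, NULL);
    FlushInstructionCache(hProcess, address, dataSize);   // discard any stale decoded instructions

    for (HANDLE hThread : threads)
    {
        ResumeThread(hThread);
        CloseHandle(hThread);
    }
}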