Trouble Reliably Enabling/Disabling MMCSS - c++

I'm trying to implement MMCSS in my project to improve timing performance, and I am seeing odd behavior. I'm a low-level embedded software developer struggling to do software, and this is my first post here, so your patience and understanding are appreciated if my description needs work.
I can't share all my code, but here's some pseudo-code showing how I'm applying MMCSS (taken mostly from the exclusive-mode streams example on MSDN):
// setup/initialize some things
DWORD taskIndex = 0;
HANDLE hTask = AvSetMmThreadCharacteristics(TEXT("Pro Audio"), &taskIndex);
if (hTask == NULL)
    return 1;
while (true) {
    // print time since last loop
    // do things - some conditions will break out of loop
    // sleep
}
if (hTask != NULL)
    AvRevertMmThreadCharacteristics(hTask);
The behavior that I'm seeing is hard to fully characterize. When I run my project, if MMCSS is successfully applied (which it may not be the first few times I run my code), it appears to work reliably the whole time I'm running the exe. If it ran right the previous time, then it will run right every time after that. However, if I comment out the MMCSS code and rebuild and run, it will continue to behave as if MMCSS is applied. I have found that if I restart my computer then run again, I will get timing information indicating MMCSS is no longer being applied.
It seems like I am missing something that is causing the transition from MMCSS enabled to disabled (and vice versa) to be messy. Am I calling AvSetMmThreadCharacteristics and AvRevertMmThreadCharacteristics in the wrong place? Is there some other or additional function I should be calling?


After upgrading the Vulkan SDK, vkQueueWaitIdle == VK_ERROR_DEVICE_LOST

I recently upgraded my ray tracing renderer to a newer Vulkan SDK version. This was necessary because the ray tracing extension went out of beta and thus now works with non-beta graphics drivers (I am on version 461.40 for my RTX 2070 SUPER). It required quite a few changes to the ray tracing side of my renderer, which I managed thanks to the NVIDIA tutorial.
Unfortunately, code that used to work started to cause errors now.
In many situations, submitting a single-time command causes vkQueueWaitIdle to fail with VK_ERROR_DEVICE_LOST, which in turn results in a validation error saying I'm trying to free the command buffer while it's still in use. This happens for a variety of uses: transitioning an image layout (undefined to general, it seems), building acceleration structures, and copying buffers, though not every time. For example, copying from a staging buffer to a device buffer fails, after which freeing the staging buffer also throws an error because it's still in use, the copy not having finished. For other uses, it works fine. I can't currently identify a common denominator.
Finally, the program crashes because presenting the first frame fails, because its layout is undefined - I assume this is caused by one or more of the previously mentioned errors.
Did something change about this since last I used it? This is the offending code (endSingleTimeCommands):
vkEndCommandBuffer(commandBuffer);
VkSubmitInfo submitInfo{};
submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &commandBuffer;
vkQueueSubmit(graphicsQueue, 1, &submitInfo, VK_NULL_HANDLE);
switch (vkQueueWaitIdle(graphicsQueue)) {
    //debug output removed for brevity
}
vkFreeCommandBuffers(device, commandPool, 1, &commandBuffer);
One of the places where it fails is this:
//[fill the structs with info...]
//function pointer grabbed via vkGetDeviceProcAddr
vk::vkCmdBuildAccelerationStructuresKHR(cmd, 1, &buildInfo, &buildOffset);
//[call to the above code here]
But also code unrelated to extensions fails (sometimes!) such as this one:
VkCommandBuffer commandBuffer = beginSingleTimeCommands();
VkBufferCopy copyRegion{};
copyRegion.srcOffset = 0; // Optional
copyRegion.dstOffset = 0; // Optional
copyRegion.size = size;
vkCmdCopyBuffer(commandBuffer, srcBuffer, dstBuffer, 1, &copyRegion);
endSingleTimeCommands(commandBuffer);
Perhaps beginSingleTimeCommands is also relevant:
VkCommandBufferAllocateInfo allocInfo{};
allocInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
allocInfo.level = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
allocInfo.commandPool = commandPool;
allocInfo.commandBufferCount = 1;
VkCommandBuffer commandBuffer;
if (vkAllocateCommandBuffers(device, &allocInfo, &commandBuffer) != VK_SUCCESS) {
    std::cout << "beginSingleTimeCommands: could not allocate command buffer!\n";
}
VkCommandBufferBeginInfo beginInfo{};
beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
beginInfo.flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT;
if (vkBeginCommandBuffer(commandBuffer, &beginInfo) != VK_SUCCESS) {
    std::cout << "beginSingleTimeCommands: could not begin command buffer!\n";
}
return commandBuffer;
Some additional info I think I gathered:
I used the NVIDIA pipeline checkpoint system to add a checkpoint before and after the call to vkCmdBuildAccelerationStructuresKHR, and both checkpoints are at TOP_OF_PIPE. After the first call to this function, no more checkpoint output is generated, leading me to believe that the first call to the build somehow ruins everything. I will triple-check my AS building, I guess; I'll get back to you if I find anything.
Turns out, the actual error can occur before the command buffer whose vkQueueWaitIdle returns the DEVICE_LOST error. I've had and continue to have a variety of errors in my acceleration structure building code. I can't easily debug it, because apparently the validation layers don't show whether there are subtle mistakes in the structs fed to vkCmdBuildAccelerationStructuresKHR; instead it's a lot of trial and error.
One notable example which I'm certain would've been caught by the validation layers pre-upgrade is forgetting to set the VkAccelerationStructureBuildGeometryInfoKHR::scratchData field, the last mistake I had to fix to finally get everything to run.
The answer to my question is thus: Don't look at the commands that trigger the DEVICE_LOST, look at what you do with the queue before that command, there's a chance the error is there, instead. In fact, once the first DEVICE_LOST error occurred, (almost?) all further vkQueueWaitIdle failed with the same error (same with the vkQueueSubmit). In cases such as my copy buffer code being the first to fail, the error was always found in the queue usage before that one.
I can't post the exact solution to my problem as - like I've said - there's more than one cause and I've only fixed some of them so far, there's still some left. I think the details are not relevant to future people who come across my question but if there's anything I can add to help other people, please let me know.
This is so true! I was stuck on this issue for a couple of days, only to figure out that the VkAccelerationStructureBuildGeometryInfoKHR flags did not match between querying the size with vkGetAccelerationStructureBuildSizesKHR() and actually building the BLAS! In my case, I was using VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_TRACE_BIT_KHR | VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_UPDATE_BIT_KHR while querying the size and only FAST_TRACE while actually creating the AS, and this was causing the same issue!

NtQuerySystemInformation Hook Failure

After successfully building a trampoline and learning more about process memory space, I tested the trampoline on MessageBoxA. It worked perfectly so I decided to finally put the code to the use it was supposed to be for in the first place, hiding a process by hooking NtQuerySystemInformation. The redirect function should work fine, but the code I used to write the jmp instruction now causes the task manager to crash every time.
BYTE tmpJMP[5] = {0xE9,0x00,0x00,0x00,0x00}; //jmp,A,D,D,R
DWORD Addr = ((DWORD)func - ((DWORD)oNtQuerySystemInformation + 0x5));
for (int i=0;i<4;++i)
JMP[i+1] = ((BYTE*)&Addr)[i];
if (VirtualProtect((LPVOID)oNtQuerySystemInformation,5,PAGE_EXECUTE_READWRITE,&oldProtect) == FALSE)
MessageBox(NULL,L"Error unprotecting memory",L"",MB_OK);
if (!WriteProcessMemory(GetCurrentProcess(),(LPVOID)oNtQuerySystemInformation,(LPCVOID)JMP,5,NULL))
MessageBox(NULL,L"Unable to write to process memory space",L"",MB_OK);
I'm writing to the memory as such. I can't seem to find a problem with the code. I was thinking that maybe the memory changes from API to API but I was told that's incorrect, making me all out confused. Is there anything wrong that you all see? Please be descriptive. I love to learn o3o

Make main() "uncrashable"

I want to program a daemon-manager that takes care that all daemons are running, like so (simplified pseudocode):
void watchMe(filename)
{
    while (true)
    {
        system(filename); //freezes as long as filename runs
        //oh, filename must have crashed. Never mind, it will be restarted
    }
}
int main()
{
    _beginThread(watchMe, "foo.exe");
    _beginThread(watchMe, "bar.exe");
}
This part is already working - but now I am facing the problem that when an observed application - say foo.exe - crashes, the corresponding system-call freezes until I confirm this beautiful message box:
This makes the daemon useless.
What I think might be a solution is to make the main() of the observed programs (which I control) "uncrashable" so they are shutting down gracefully without showing this ugly message box.
Like so:
try {
    char *p = NULL;
    *p = 123; //nice null pointer exception
}
catch (...) {
    cout << "Caught Exception. Terminating gracefully" << endl;
    return 0;
}
But this doesn't work as it still produces this error message:
("Untreated exception ... Write access violation ...")
I've tried SetUnhandledExceptionFilter and all other stuff, but without effect.
Any help would be highly appreciated.
This looks more like an SEH exception than a C++ exception, and it needs to be handled differently. Try the following code:
__try {
    char *p = NULL;
    *p = 123; //nice null pointer exception
}
__except(GetExceptionCode() == EXCEPTION_ACCESS_VIOLATION ?
         EXCEPTION_EXECUTE_HANDLER : EXCEPTION_CONTINUE_SEARCH) {
    cout << "Caught Exception. Terminating gracefully" << endl;
    return 0;
}
But that's a remedy and not a cure; you might have better luck running the processes within a sandbox.
You can change the /EHsc to /EHa flag in your compiler command line (Properties/ C/C++ / Code Generation/ Enable C++ exceptions).
See this for a similar question on SO.
You can run the watched process asynchronously, and use kernel objects to communicate with it. For instance, you can:
Create a named event.
Start the target process.
Wait on the created event.
In the target process, when the crash is encountered, open the named event, and set it.
This way, your monitor will continue to run as soon as the crash is encountered in the watched process, even if the watched process has not ended yet.
BTW, you might be able to control the appearance of the first error message using drwtsn32 (or whatever is used in Win7), and I'm not sure, but the second error message might only appear in debug builds. Building in release mode might make it easier for you, though the most important thing, IMHO, is solving the cause of the crashes in the first place - which will be easier in debug builds.
I did this a long time ago (in the 90s, on NT4). I don't expect the principles to have changed.
The basic approach is once you have started the process to inject a DLL that duplicates the functionality of UnhandledExceptionFilter() from KERNEL32.DLL. Rummaging around my old code, I see that I patched GetProcAddress, LoadLibraryA, LoadLibraryW, LoadLibraryExA, LoadLibraryExW and UnhandledExceptionFilter.
The hooking of the LoadLibrary* functions dealt with making sure the patching was present for all modules. The revised GetProcAddress had to provide addresses of the patched versions of the functions rather than the KERNEL32.DLL versions.
And, of course, the UnhandledExceptionFilter() replacement does what you want. For example, start a just in time debugger to take a process dump (core dumps are implemented in user mode on NT and successors) and then kill the process.
My implementation had the patched functions implemented with __declspec(naked), and dealt with all the registers by hand because the compiler can destroy the contents of some registers that callers from assembly might not expect to be destroyed.
Of course there was a bunch more detail, but that is the essential outline.

c++ having strange problem

I have a function that creates some numbers and inserts them into two vectors.
for(int i=0; i<6; i++)
It should insert 6 numbers in two vectors, the only problem is that it doesn't - it actually inserts more.
I don't think the snippet will run unless Enemy2.dEnemy == true, and it won't stay true for ever.
The first time the snippet runs, then Enemy2.dEnemy is set to false and it shouldn't run again.
I don't set Enemy2.dEnemy to true anywhere except when the window is created.
If I insert a break point any where in the snippet, the program will work fine - it will insert ONLY 6 numbers in the two vectors.
Any ideas what's wrong here?
OK, so I did some debugging.
I found that Enemy2.dEnemy=false; is being skipped for some reason.
I tried this to see whether it was:
for(int i=0; i<6; i++)
TCHAR s[244];
MessageBox(hWnd, _T("0"), _T(""), MB_OK);
MessageBox(hWnd, _T("1"), _T(""), MB_OK);
Well, the message box popped up saying 1 and my code worked fine. It seems that Enemy2.dEnemy=false; doesn't have time to run ;/
OK, I found the real problem that was causing more than 6 numbers to be inserted.
It was where I was assigning Enemy2.dEnemy=true;
The problem seems to be that the second if runs more than once, which is weird!
First things first: get rid of that abominable if (Enemy2.dEnemy == true) - it should be:
if (Enemy2.dEnemy)
(I also prefer to name my booleans as readable sentence segments like Enemy2.isABerserker or Enemy3.hasHadLeftLegCutOffThreeInchesBelowTheKnee but that's just personal preference).
Other than that, the only thing I can suggest is a threading problem. There's nothing wrong with that code per se, but there is a window in which two threads could enter the if statement and both start pushing values into your vector.
In other words, if thread 1 is doing the pushing when thread 2 encounters the if statement, thread 2 will also start pushing values, since thread 1 has yet to set dEnemy to false. And don't think you can just move the assignment to the top of the if block - that will reduce but not remove the window.
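If threads do turn out to be involved, one way to close that window entirely is to make the test and the reset of the flag a single atomic step. The following is only a minimal sketch under that assumption: dEnemyPending, fillOnce and demoFillOnce are hypothetical names standing in for Enemy2.dEnemy and the snippet from the question, which is not shown in full.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical stand-in for Enemy2.dEnemy, changed to an atomic flag.
std::atomic<bool> dEnemyPending{true};

// Pushes six values only for the single caller that claims the flag.
// exchange(false) returns the old value and stores false in one atomic
// step, so two threads can never both see true.
// Returns the number of values pushed (6 or 0).
int fillOnce(std::vector<int>& values) {
    if (!dEnemyPending.exchange(false))
        return 0;                       // someone else already claimed it
    for (int i = 0; i < 6; ++i)
        values.push_back(i);
    return 6;
}

// Usage sketch: the second call is a no-op.
void demoFillOnce() {
    dEnemyPending = true;
    std::vector<int> values;
    int first = fillOnce(values);
    int second = fillOnce(values);
    assert(first == 6 && second == 0);
    assert(values.size() == 6);
}
```

Note that this only guarantees that one caller reaches the loop; the vector itself is still not safe to read from another thread while it is being filled.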
My advice is to print out the contents of the vectors in the situation where they have more than six entries and that may give a clue as to what's happened (post the output here if you wish).
Re your update that the second if below is running twice:
If this code is executed twice in the same second (and that's not beyond the bounds of possibility), the second if statement will run twice.
That's because time(NULL) gives you the number of seconds since the epoch so, until that second is over, you may well be executing the contents of that if thousands of times (or more).
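One way to guard against this is to remember the second in which the branch last ran and skip it until time(NULL) has moved on. This is a rough sketch only: the original if statement is not shown, so OncePerSecond, shouldRun and demoOncePerSecond are hypothetical names, and it assumes the condition depends on a time(NULL)-style value in whole seconds.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical helper (not from the question): lets a branch run at most
// once per distinct second, however often the surrounding code executes.
class OncePerSecond {
public:
    // `now` is a time(NULL)-style timestamp in whole seconds.
    bool shouldRun(std::time_t now) {
        if (hasRun_ && now == lastRun_)
            return false;               // already ran during this second
        hasRun_ = true;
        lastRun_ = now;
        return true;
    }

private:
    bool hasRun_ = false;
    std::time_t lastRun_ = 0;
};

// Usage sketch with fixed timestamps instead of time(NULL).
void demoOncePerSecond() {
    OncePerSecond gate;
    bool first = gate.shouldRun(100);   // first call in second 100
    bool repeat = gate.shouldRun(100);  // same second again
    bool next = gate.shouldRun(101);    // one second later
    assert(first && !repeat && next);
}
```

In real code the call would look like gate.shouldRun(time(NULL)) combined with whatever the existing condition is.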
If this problem disappears when you put in a breakpoint or a diagnostic output message, that's a strong clue that the problem is undefined behavior, which is usually caused by something like dereferencing an uninitialized pointer or careless use of const_cast.
The cause of the problem probably has nothing to do with the code you're looking at. It's caused somewhere else and just happens to show up here. It's like someone being hit by a falling brick: the obvious symptom is a man lying unconscious on the sidewalk, but the real problem has nothing to do with the man or the sidewalk, it's several stories up.
If you want to find the cause of the error, remove your diagnostics until the problem reappears, then start removing everything else. Prune away all of the other code. Whenever the error stops, back up until it starts again; if you don't see the cause of the error, start pruning somewhere else. Eventually the bug will have nowhere to hide.

Memory leak checking using Instruments on Mac

I've just been pulling my hair out trying to make Instruments cough up my deliberately constructed memory leaks. My test example looks like this:
#include <cstdlib>
#include <unistd.h>
class Leaker {
public:
    char *_array;
    Leaker() {
        _array=new char[1000]; // deliberately never deleted
    }
};
void *leaker() {
    void *p=malloc(1000);
    int *pa=new int[2000];
    Leaker l;
    Leaker *pl=new Leaker();
    return p;
}
int main (int argc, char **argv) {
    for (int i=0; i<1000; ++i) {
        leaker();
    }
    sleep(2); // Needed to give Instruments a chance to poll memory
    return 0;
}
Basically, Instruments never found the obvious leaks. I was going nuts as to why, but then discovered "sec Between Auto Detections" in the "Leaks Configuration" panel under the Leaks panel. I dialed it back as low as it would go, which was 1 second, placed the sleep(2) in my code, and voila: leaks found!
As far as I'm concerned, a leak is a leak, regardless of whether it happens 30 minutes into an app or 30 milliseconds. In my case, I stripped the test case back to the above code, but my real application is a command-line application with no UI or anything and it runs very quickly; certainly less than the default 10 second sample interval.
Ok, so I can live with a couple of seconds upon exit of my app in instrumentation mode, but what I REALLY want, is to simply have Instruments snapshot memory on exit, then do whatever it needs over time while the app is running.
So... the question is: Is there a way to make Instruments snapshot memory on exit of an application, regardless of the sampling interval?
Instruments in Leaks mode can be really powerful for leak tracing, but I've found that it's more biased towards event-based GUI apps than command-line programs (particularly those which exit after a short time). There used to be a CHUD API where you could programmatically control aspects of the instrumentation, but last time I tried it the frameworks were no longer provided as part of the SDK. Perhaps some of this is now replaced with DTrace.
Also, ensure you're up to date with Xcode as there were some recent improvements in this area which might make it easier to do what you need. You could also keep the short delay before exit but make it conditional on the presence of an environment variable, and then set that environment variable in the Instruments launch properties for your app, so that running outside Instruments doesn't have the delay.
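The environment-variable idea could look roughly like this. It is a sketch, not a tested recipe for Instruments: the variable name LEAK_CHECK_DELAY_SECONDS and both function names are made up for illustration, and the one-hour cap is an arbitrary safety limit.

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <thread>

// Hypothetical variable name; set it only in the Instruments launch settings.
constexpr const char* kLeakDelayVar = "LEAK_CHECK_DELAY_SECONDS";

// Parses the delay from the environment value. Returns 0 when the value is
// missing, not a positive integer, or larger than one hour, so that normal
// runs exit immediately.
int leakDelaySeconds(const char* value) {
    if (value == nullptr)
        return 0;
    long seconds = std::strtol(value, nullptr, 10);
    if (seconds <= 0 || seconds > 3600)
        return 0;
    return static_cast<int>(seconds);
}

// Call this as the last step of main(), just before returning.
void waitForLeakCheckIfRequested() {
    int seconds = leakDelaySeconds(std::getenv(kLeakDelayVar));
    assert(seconds >= 0);
    if (seconds > 0)
        std::this_thread::sleep_for(std::chrono::seconds(seconds));
}
```

With the variable set to 2 in the launch settings, the program pauses for two seconds before exiting under Instruments and exits immediately everywhere else.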
Most unit testing code executes the desired code paths and exits. Although this is perfectly normal for unit testing, it creates a problem for the leaks tool, which needs time to analyze the process memory space. To fix this problem, you should make sure your unit-testing code does not exit immediately upon completing its tests. You can do this by putting the process to sleep indefinitely instead of exiting normally.
I've just decided to leave the 2 second delay during my debug+leaking build.