ANOMALY: meaningless REX prefix used

ANOMALY: meaningless REX prefix used - c++

What does the error ANOMALY: meaningless REX prefix used mean? I have googled and all information I got was completly random that it is related to java or avg or minecraft (because of java).
However, I got this error in the console output of my Visual Studio console application after I merged several branches of my c++ opengl 4.0 graphics engine and it suddenly popped up. I might have updated the AMD graphics driver between the time points I have written them, so this could be one source. After the error popped up also the depth buffer test was suddenly disabled.
After using clean and rebuild in visual studio the error is gone now, I therefore do not need help in fixing the error but I would like to know what it means and what in general causes this error. It makes me curious as I have not found ANYTHING useful searching for this error.

Myria in the comments said:
It's referring to an x86-64 assembly instruction using a REX prefix byte when it didn't need to
To expand upon this, REX prefixes are ignored in a few different scenarios.
If the ModR/M field specifies other registers or an extended opcode.
If more than 1 REX prefix is used in an instruction (though I read on osdev.org this is undefined
If general formatting isn't followed. For example the REX prefix must precede the opcode or escape opcode byte unless being used in conjunction with a mandatory prefix. In which case REX can be right after the opcode/escape byte.
If you try to use the single byte form of INC/DEC in 64 bit mode.
Looks like this ANOMALY message displays in a variety of contexts from git to Java related programs (maybe the one you are referencing) in which a new driver seems to have been the problem. The culprit: Raptr, which comes with AMD's Radeon drivers. In the Java post someone reported using SAPPHIRE Radeon HD 5850 and on the next site I'll link you to, one person was using AMD R9 390 and another the 380. In this context someone saw the message on the console of their 64-bit Win7 sys. Now this person's site took me through a hook Raptr was using (which connects to the opengl32.dll) called mhook, I started digging through this 'Windows API hooking library' and found this starting on line 1230:
assert(X86Instruction->AddressSize >= 4);
if (X86Instruction->rex.w)
{
X86Instruction->OperandSize = 8;
X86Instruction->HasOperandSizePrefix = FALSE;
}
else if (X86Instruction->HasOperandSizePrefix)
{
assert(X86Instruction->OperandSize == 2);
}
else if (X86Instruction->rex_b == REX_PREFIX_START)
{
if (!Instruction->AnomalyOccurred)
{
if (!SuppressErrors) printf("[0x%08I64X] ANOMALY: meaningless REX prefix used\n", VIRTUAL_ADDRESS);
Instruction->AnomalyOccurred = TRUE;
}
X86Instruction->rex_b = 0;
}
To summarize, this ANOMALY message occurs when software handles a REX prefix ignore, like this Windows API library does.
So there you have it, you were in all the right places. The mhook library even has a long list of Visual Studio files to ignore.
additional note*
I found this comment from the os2museum site a good clue to this whole mystery
The Windows amd64 ABI requires that the first opcode of a function be at least 2 bytes in length. (I think this is so the function can be hotpatched.) Many times the first instruction is “push ” but the instruction has a 1-byte encoding! To comply with the ABI, a rex prefix is added to the instruction, making it 2 bytes — “rex push rbp” or “rex push rbx” or whatever. The compiler does this for you, but if you are writing a function in assembler, you need to remember the rule.
Other fun error messages (just a few of many!) in this particular hook library include
ANOMALY: Meaningless segment override
ANOMALY: REX prefix before legacy prefix 0x%02X\n
ANOMALY: Conflicting prefix\n
ANOMALY: Reached maximum prefix count %d\n
and my favorite:
ANOMALY: branch into the middle of an instruction\n
And just because I can't help myself, it might be worth noting these are the instructions that default to 64-bit operands:
+--------------+------------+-------------+
| CALL (near) | ENTER | Jcc |
+--------------+------------+-------------+
| JrCXZ | JMP (near) | LEAVE |
+--------------+------------+-------------+
| LGDT | LIDT | LLDT |
+--------------+------------+-------------+
| LOOP | LOOPcc | LTR |
+--------------+------------+-------------+
| MOV CR(n) | MOV DR(n) | POP reg/mem |
+--------------+------------+-------------+
| POP reg | POP FS | POP GS |
+--------------+------------+-------------+
| POPFQ | PUSH imm8 | PUSH imm32 |
+--------------+------------+-------------+
| PUSH reg/mem | PUSH reg | PUSH FS |
+--------------+------------+-------------+
| PUSH GS | PUSHFQ | RET (near) |
+--------------+------------+-------------+

Related

Control Windows 10's "Power Mode" programmatically

Background
Hi. I have an SB2 (Surface Book 2), and I'm one of the unlucky users who are facing the infamous 0.4GHz throttling problem that is plaguing many of the SB2 machines. The problem is that the SB2 suddenly, and very frequently depending on the ambient temperature, throttles heavily from a boost of 4GHz to 0.4GHz and hangs in there for a minute or two (this causes a severe slowup of the whole laptop). This is extremely frustrating and almost makes the machine unusable for even the simplest of workloads.
Microsoft apparently stated that it fixed the problem in the October 2019 update, but I and several other users are still facing it. I'm very positive my machine is up to date, and I even manually installed all the latest Surface Book 2 firmware updates.
Here's a capture of the CPU state when the problem is happening:
As you can see, the temperature of the unit itself isn't high at all, but CPU is throttling at 0.4GHz exactly.
More links about this: 1 2
Workarounds
I tried pretty much EVERYTHING. Undervolting until freezing screens, disabling BD PROCHOT, disabling power throttling in GPE, messing up with the registry, tuning several CPU/GPU settings. Nothing worked.
You can do only 2 things when the throttling starts:
Wait for it to finish (usually takes a minute or two).
Change the Power Mode in windows 10. It doesn't even matter if you're changing it from "Best performance" to "Best battery life", what matters is that you change it. As soon as you do, throttling completely stops in a couple seconds. This is the only manual solution that worked.
Question
In practice, changing this slider each 10 seconds no matter how heavy the workload is, indefinitely lead to a smooth experience without throttling. Of course, this isn't a feasible workaround by hand.
In theory, I thought that if I could find a way to control this mode programmatically, I might be able to wish this problem goodbye by switching power modes every 10 seconds or so.
I don't mind if it's in win32 (winapi) or a .net thing. I looked a lot, found this about power management, but it seems there's no interface for setting in win32. I could have overlooked it, so here's my question:
Is there any way at all to control the Power Mode in Windows 10 programmatically?

OK... I've been wanting command line or programmatic access to adjust the power slider for a while, and I've run across this post multiple times when looking into it. I'm surprised no one else has bothered to figure it out. I worked it out myself today, motivated by the fact that Windows 11 appears to have removed the power slider from the taskbar and you have to go digging into the Settings app to adjust it.
As previously discussed, in the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Power\User\PowerSchemes you can find values "ActiveOverlayAcPowerScheme" and "ActiveOverlayDcPowerScheme" which record the current values of the slider for AC power and battery power, respectively. However, changing these values is not sufficient to adjust the power slider or the system's mode of operation.
Turns out there is an undocumented method in C:\Windows\System32\powrprof.dll called PowerSetActiveOverlayScheme. It takes a single parameter. I "guessed" that it would take a GUID in the same manner that PowerSetActiveScheme does, and it seems to work.
Note — Using an undocumented API is unsupported by Microsoft. This method may break in future Windows releases. It can be used for personal tinkering but I would not suggest using it in any actual production projects.
Here is the C# PInvoke signature:
[DllImportAttribute("powrprof.dll", EntryPoint = "PowerSetActiveOverlayScheme")]
public static extern uint PowerSetActiveOverlayScheme(Guid OverlaySchemeGuid);
It returns zero on success and non-zero on failure.
Calling it is as simple as:
PowerSetActiveOverlayScheme(new Guid("ded574b5-45a0-4f42-8737-46345c09c238"));
It has immediate effect. This particular GUID moved the slider all the way to the right for me and also updated the "ActiveOverlayAcPowerScheme" value in the registry. Using a GUID of all zeros reset the slider to the middle value. You can see what GUID options are available by just observing the values that show up in the registry when you set the power slider to different positions.
There are two methods that can be used to read the current position of the slider. I'm not sure what the difference between them is, they returned the same value each time in my testing.
[DllImportAttribute("powrprof.dll", EntryPoint = "PowerGetActualOverlayScheme")]
public static extern uint PowerGetActualOverlayScheme(out Guid ActualOverlayGuid);
[DllImportAttribute("powrprof.dll", EntryPoint = "PowerGetEffectiveOverlayScheme")]
public static extern uint PowerGetEffectiveOverlayScheme(out Guid EffectiveOverlayGuid);
They also return zero on success and non-zero on failure. They can be called like...
if (PowerGetEffectiveOverlayScheme(out Guid activeScheme) == 0)
{
Console.WriteLine(activeScheme);
}
There is one more method called "PowerGetOverlaySchemes", which I presume can be used to fetch a list of available GUIDs that could be used. It appears to take three parameters and I haven't bothered with figuring it out.
I created a command-line program which can be used to set the power mode, and it can be found at https://github.com/AaronKelley/PowerMode.

Aaron's answer is awesome work, helped me massively, thank you.
If you're anything like me and
don't have Visual Studio at the ready to compile his tool for yourself and/or
don't necessarily want to run an arbitrary executable file off of GitHub (no offence),
you can use Python (3, in this case) to accomplish the same thing.
For completeness' sake, I'll copy over the disclaimer:
Note — Using an undocumented API is unsupported by Microsoft. This method may break in future Windows releases. It can be used for personal tinkering but I would not suggest using it in any actual production projects.
Please also note, that the following is just basic proof-of-concept code!
Getting the currently active Byte Sequence:
import ctypes
output_buffer = ctypes.create_string_buffer(b"",16)
ctypes.windll.powrprof.PowerGetEffectiveOverlayScheme(output_buffer)
print("Current Effective Byte Sequence: " + output_buffer.value.hex())
ctypes.windll.powrprof.PowerGetActualOverlayScheme(output_buffer)
print("Current Actual Byte Sequence: " + output_buffer.value.hex())
On my system, this results in the following values:
Mode
Byte Sequence
Better Battery
77c71c9647259d4f81747d86181b8a7a
Better Performance
00000000000000000000000000000000
Best Performance
b574d5dea045424f873746345c09c238
Apparently Aaron's and my system share the same peculiarity, where the "Better Performance" Byte Sequence is just all zeros (as opposed to the "expected" value of 3af9B8d9-7c97-431d-ad78-34a8bfea439f).
Please note, that the Byte Sequence 77c71c9647259d4f81747d86181b8a7a is equivalent to the GUID 961cc777-2547-4f9d-8174-7d86181b8a7a and b574d5dea045424f873746345c09c238 represents ded574b5-45a0-4f42-8737-46345c09c238.
This stems from the the fact that GUIDs are written down differently than how they're actually represented in memory. (If we assume a GUID's bytes to be written as ABCD-EF-GH-IJ-KLMN its Byte Sequence representation ends up being DCBAFEHGIJKLMN). See https://stackoverflow.com/a/6953207 (particularly the pararaph and table under "Binary encodings could differ") and/or https://uuid.ramsey.dev/en/latest/nonstandard/guid.html if you want to know more.
Setting a value (for "Better Battery" in this example) works as follows:
import ctypes
modes = {
"better_battery": "77c71c9647259d4f81747d86181b8a7a",
"better_performance": "00000000000000000000000000000000",
"best_performance": "b574d5dea045424f873746345c09c238"
}
ctypes.windll.powrprof.PowerSetActiveOverlayScheme(bytes.fromhex(modes["better_battery"]))
For me, this was a nice opportunity to experiment with Python's ctypes :).

Here is a PowerShell version that sets up a scheduled task to toggle the power overlay every minute. It is based off the godsend answers of Michael and Aaron.
The CPU throttling issue has plagued me on multiple Lenovo X1 Yoga laptops (Gen2 and Gen4 models).
# Toggle power mode away from and then back to effective overlay
$togglePowerOverlay = {
$function = #'
[DllImport("powrprof.dll", EntryPoint="PowerSetActiveOverlayScheme")]
public static extern int PowerSetActiveOverlayScheme(Guid OverlaySchemeGuid);
[DllImport("powrprof.dll", EntryPoint="PowerGetActualOverlayScheme")]
public static extern int PowerGetActualOverlayScheme(out Guid ActualOverlayGuid);
[DllImport("powrprof.dll", EntryPoint="PowerGetEffectiveOverlayScheme")]
public static extern int PowerGetEffectiveOverlayScheme(out Guid EffectiveOverlayGuid);
'#
$power = Add-Type -MemberDefinition $function -Name "Power" -PassThru -Namespace System.Runtime.InteropServices
$modes = #{
"better_battery" = [guid] "961cc777-2547-4f9d-8174-7d86181b8a7a";
"better_performance" = [guid] "00000000000000000000000000000000";
"best_performance" = [guid] "ded574b5-45a0-4f42-8737-46345c09c238"
}
$actualOverlayGuid = [Guid]::NewGuid()
$ret = $power::PowerGetActualOverlayScheme([ref]$actualOverlayGuid)
if ($ret -eq 0) {
"Actual power overlay scheme: $($($modes.GetEnumerator()|where {$_.value -eq $actualOverlayGuid}).Key)." | Write-Host
}
$effectiveOverlayGuid = [Guid]::NewGuid()
$ret = $power::PowerGetEffectiveOverlayScheme([ref]$effectiveOverlayGuid)
if ($ret -eq 0) {
"Effective power overlay scheme: $($($modes.GetEnumerator() | where { $_.value -eq $effectiveOverlayGuid }).Key)." | Write-Host
$toggleOverlayGuid = if ($effectiveOverlayGuid -ne $modes["best_performance"]) { $modes["best_performance"] } else { $modes["better_performance"] }
# Toggle Power Mode
$ret = $power::PowerSetActiveOverlayScheme($toggleOverlayGuid)
if ($ret -eq 0) {
"Toggled power overlay scheme to: $($($modes.GetEnumerator()| where { $_.value -eq $toggleOverlayGuid }).Key)." | Write-Host
}
$ret = $power::PowerSetActiveOverlayScheme($effectiveOverlayGuid)
if ($ret -eq 0) {
"Toggled power overlay scheme back to: $($($modes.GetEnumerator()|where {$_.value -eq $effectiveOverlayGuid }).Key)." | Write-Host
}
}
else {
"Failed to toggle active power overlay scheme." | Write-Host
}
}
# Execute the above
& $togglePowerOverlay
Create a scheduled job that runs the above script every minute:
Note that Register-ScheduledJob only works with Windows PowerShell, not PowerShell Core
I couldn't get the job to start without using the System principal. Otherwise gets stuck indefinitely in Task Scheduler with "The task has not run yet. (0x41303)".
Get-Job will show the job in Windows PowerShell, but Receive-Job doesn't return anything even though there is job output in dir $env:UserProfile\AppData\Local\Microsoft\Windows\PowerShell\ScheduledJobs$taskName\Output. This might be due to running as System while trying to Receive-Job as another user?
I wish -MaxResultCount 0 was supported to hide the job in Get-Job, but alas it is not.
You can see the task in Windows Task Scheduler under Task Scheduler Library path \Microsoft\Windows\PowerShell\ScheduledJobs
It was necessary to have two script blocks, one as command and one as arguments (that gets serialized/deserialized as a string) because PowerShell script blocks use dynamic closures instead of lexical closures and thus referencing one script block from another when creating a new runspace is not readily possible.
The min interval for scheduled tasks is 1 minute. If it turns out that more frequent toggling is needed, might just add a loop in the toggling code and schedule the task only for startup or login.
$registerJob = {
param($script)
$taskName = "FixCpuThrottling"
Unregister-ScheduledJob -Name $taskName -ErrorAction Ignore
$job = Register-ScheduledJob -Name $taskName -ScriptBlock $([scriptblock]::create($script)) -RunEvery $([TimeSpan]::FromMinutes(1)) -MaxResultCount 1
$psSobsSchedulerPath = "\Microsoft\Windows\PowerShell\ScheduledJobs";
$principal = New-ScheduledTaskPrincipal -UserId SYSTEM -LogonType ServiceAccount
$someResult = Set-ScheduledTask -TaskPath $psSobsSchedulerPath -TaskName $taskName -Principal $principal
}
# Run as Administrator needed in order to call Register-ScheduledJob
powershell.exe -command $registerJob -args $togglePowerOverlay
To stop and remove the scheduled job (must use Windows PowerShell):
$taskName = "FixCpuThrottling"
Unregister-ScheduledJob -Name $taskName-ErrorAction Ignore

Isolate crash prone (SEGV) but speed critical legacy code into a separate binary

I have a code base (mostly C++) which is well tested and crash free. Mostly. A part of the code -- which is irreplaceable, hard to maintain or improve and links against a binary-only library* -- causes all crashes. These to not happen often, but when they do, the entire program crashes.
+----------------------+
| Shiny new sane |
| code base |
| |
| +-----------------+ | If the legacy code crashes,
| | | | the entire program does, too.
| | Legacy Code | |
| | * Crash prone * | |
| | int abc(data) | |
| +-----------------+ |
| |
+----------------------+
Is it possible to extract that part of the code into a separate program, start that from the main program, move the data between these programs (on Linux, OS X and, if possible, Windows), tolerate crashes in the child process and restart the child? Something like this:
+----------------+ // start,
| Shiny new sane | ------. // re-start on crash
| code base | | // and
| | v // input data
| | +-----------------+
| return | | |
| results <-------- | Legacy Code |
+----------------+ | * Crash prone * |
| int abc(data) |
(or not results +-----------------+
because abc crashed)
Ideally the communication would be fast enough so that the synchronous call to int abc(char *data) can be replaced transparently with a wrapper (assuming the non-crash case). And because of slight memory leaks, the legacy program should be restarted every hour or so. Crashes are deterministic, so bad input data should not be sent twice.
The code base is C++11 and C, notable external libraries are Qt and boost. It runs on Linux, OSX and Windows.
--
*: some of the crashes/leaks stem from this library which has no source code available.

Well, if I were you, I wouldn't start from here ...
However, you are where you are. Yes, you can do it. You are going to have to serialize your input arguments, send them, deserialize them in the child process, run the function, serialize the outputs, return them, and then deserialize them. Boost will have lots of useful code to help with this (see asio).
Global variables will make life much more "interesting". Does the legacy code use Qt? - that probably won't like being split into two processes.
If you were using Windows only, I would say "use DCOM" - it makes this very simple.
Restarting is simple enough if the legacy is only used from one thread (the code which handles "return" just looks to see if it needs to restart, and kills the processes.) If you have multiple threads, then the shiny code will need to check if a restart is required, block any further threads, wait until all calls have returned, restart the process, and then unblock everything.
Boost::interprocess looks to have everything you need for the communication - it's got shared memory, mutexes, and condition variables. Boost::serialization will do the job for marshalling and unmarshalling.

system(mkdir) hangs in my application

There is a method in our codebase which used to work fine, but not any more(without any modification to this method):
void XXX::setCSVFileName()
{
//get current working directory
char the_path[1024];
getcwd(the_path, 1023);
printf("current dir: %s \n",the_path);
std::string currentPath(the_path);
std::string currentPathTmp = currentPath + "/tmp_"+pathSetParam->pathSetTravelTimeTmpTableName;
std::string cmd = "mkdir -p "+currentPathTmp;
if (system(cmd.c_str()) == 0) // stops here
{
csvFileName = currentPathTmp+"/"+pathSetParam->pathSetTravelTimeTmpTableName + ".csv";
}
//...
}
I tried to debug it and found the culprit line to be if (system(cmd.c_str()) == 0) . I put a breakpoint on that line and tried to step over it. it just stays there.
The value of cmd as debugger shows is:
Details:{static npos = , _M_dataplus =
{> = {<__gnu_cxx::new_allocator> = {}, }, _M_p = 0x306ae9e78 "mkdir -p
/home/fm-simmobility/vahid/simmobility/dev/Basic/tmp_xuyan_pathset_exp_dy_traveltime_tmp"}}
I dont know what the system is doing but my application in top shows around 100% cpu usage.
Have you ever hit such a situation?
IMPORTANT UPDATE
As usual, I started reverting changes in my code one-by-one back to the state prior to the problem. Surprisingly, I found the problem(but not the solution....yet).
I added -pg to my compilation options to enable gprof. and that is what caused the issue.
May be you have some knowledge of why gropf doesn't line system() or mkdir ??
thanks

You said in a comment on your other question that you needed to use gprof to support the results generated by your own profiler.
In other words, you want to write a profiler, and compare it to gprof, and you're questioning if the -pg flag is making system hang.
I'm saying forget about the -pg flag. All that does is put call-counting code for gprof in the functions the compiler sees.
If I were you I would find something better to compare your profiler to.
Remember the typical reason why people use a profiler is to find speedups,
and they may think collecting measurements will help them do that.
It doesn't.
What it does instead is convince them there are no speedups to be found.
(They ask questions like "new is taking 5% of the time, and that's my bottleneck, how can I speed it up?")
That's what gprof has done for us.
Here's a table of profiler features, from poor to better to best:
gprof perf zoom pausing
samples program counter | X | X | X | X |
show self % by function | X | X | X | X |
show inclusive % by function | | X | X | X |
samples stack | | X | X | X |
detects extra calls | | X | X | X |
show self % by line | | X | X | X |
show inclusive % by line | | ? | X | X |
handles recursion properly | | ? | X | X |
samples on wall-clock time | | | X | X |
let you examine samples | | | | X |
The reason these are important is that speedups are really good at hiding from profilers:
If % by line not shown, speedup may be anywhere in a large function.
If inclusive % not shown, extraneous calls are not seen.
If samples not taken on wall-clock time, extraneous I/O or blocking not seen.
If hot-path is shown, speedups can hide on either side of it.
If call-graph is shown, speedups can hide in it by not being localized to A calls B, such as by a "tunnel" function.
If flame-graph is shown, speedups can hide in it by not aggregating samples that could be removed.
But they can't hide from simply examining stack samples.
P.S. Here are some examples of how speedups can hide from profilers.
If the profiler shows a "hot-path", it only shows a small subset of the stack samples, so it can only show small problems.
But there could be a large problem that would be evident if only comparing stack samples for similarity, not equality:
Speedups can also hide in call graphs, as in this case the fact that A1 always calls C2 and A2 always calls C1 is obscured by the "tunnel function" B (which might be multiple layers).
The call stacks are shown on the right, and a human recognizes the pattern easily:
In this case, the fact that A always calls C is obscured by A calling any of a number of Bi functions (possibly over multiple layers) that then call C.
Again, the pattern is easily recognized in call stacks:
Another way is if the stack samples show that a lot of time is spent calling functions that have the same name but belong to different classes (and are therefore different functions), or have different names but are related by a similar purpose.
In a profiler these conspire to divide the time into small amounts, telling you there is nothing big going on.
That's a consequence of people "looking for slow functions" which is actually a form of blinders.

Interposing functions calls from static library to system framework

I have a situation in which I need to interpose function calls made by a third-party static library to an iOS system framework (a shared library). Chances of cost-effective or timely maintenance support from the vendor of the static library are slim to non-existent.
In effect changing the calling sequence from:
+-------------+ +-------------+ +---------------------+
| Application | ---> | libVendor.a | ----> | FrameworkA (.dylib) |
+-------------+ +-------------+ +---------------------+
to:
+-------------+ +-------------+ +------------+ +---------------------+
| Application | ---> | libVendor.a | ----> | Interposer | ----> | FrameworkA (.dylib) |
+-------------+ +-------------+ +------------+ +---------------------+
The orthodox solutions to this problem is either to persuade the run-time linker to load a different library in place of FrameworkA or loading libraries with dlopen(). Neither of these are options on iOS.
One solution that does work is using sed to rename symbols in the the symbol table of libVendor.a.
Suppose I want to interpose calls FrameworkA_Foo() in FrameworkA made by functions in libVendor.a:
sed s/FrameworkA_Foo/InterposeA_Foo/g < libVendor.a > libInterposedVendor.a
And then interpose it with:
void InterposeA_Foo()
{
// do some stuff here
// ....
// then (maybe) forward
FrameworkA_Foo();
}
This works just so long as the length of the symbol names remains the same.
Whilst this approach works, it lacks elegance and feels fairly hacky. Do better solutions exist?
Other approaches considered:
Changing link order in order to get the interposing function to link to libVendor.a rather than the framework: Apple's linkers (unlike those on most UNIX platforms) recursively resolve symbols in libraries, and the order they are presented on the command line makes little difference)
Linker scripts: Not supported by lld
mach-o object-file editing tools: Found nothing that worked
[For clarity, this is C (rather than Objective-c) API and the toolchain is clang/LLVM. Using an Apple-supplied GCC is not an option due deprecation and lack of C++11 support]

What is your favourite Windbg tip/trick? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
This question's answers are a community effort. Edit existing answers to improve this post. It is not currently accepting new answers or interactions.
I have come to realize that Windbg is a very powerful debugger for the Windows platform & I learn something new about it once in a while. Can fellow Windbg users share some of their mad skills?
ps: I am not looking for a nifty command, those can be found in the documentation. How about sharing tips on doing something that one couldn't otherwise imagine could be done with windbg? e.g. Some way to generate statistics about memory allocations when a process is run under windbg.

My favorite is the command .cmdtree <file> (undocumented, but referenced in previous release notes). This can assist in bringing up another window (that can be docked) to display helpful or commonly used commands. This can help make the user much more productive using the tool.
Initially talked about here, with an example for the <file> parameter:
http://blogs.msdn.com/debuggingtoolbox/archive/2008/09/17/special-command-execute-commands-from-a-customized-user-interface-with-cmdtree.aspx
Example:
alt text http://blogs.msdn.com/photos/debuggingtoolbox/images/8954736/original.aspx

To investigate a memory leak in a crash dump (since I prefer by far UMDH for live processes).
The strategy is that objects of the same type are all allocated with the same size.
Feed the !heap -h 0 command to WinDbg's command line version cdb.exe (for greater speed) to get all heap allocations:
"C:\Program Files\Debugging Tools for Windows\cdb.exe" -c "!heap -h 0;q" -z [DumpPath] > DumpHeapEntries.log
Use Cygwin to grep the list of allocations, grouping them by size:
grep "busy ([[:alnum:]]\+)" DumpHeapEntries.log \
| gawk '{ str = $8; gsub(/\(|\)/, "", str); print "0x" str " 0x" $4 }' \
| sort \
| uniq -c \
| gawk '{ printf "%10.2f %10d %10d ( %s = %d )\n", $1*strtonum($3)/1024, $1, strtonum($3), $2, strtonum($2) }' \
| sort > DumpHeapEntriesStats.log
You get a table that looks like this, for example, telling us that 25529270 allocations of 0x24 bytes take nearly 1.2 GB of memory.
8489.52 707 12296 ( 0x3000 = 12288 )
11894.28 5924 2056 ( 0x800 = 2048 )
13222.66 846250 16 ( 0x2 = 2 )
14120.41 602471 24 ( 0x2 = 2 )
31539.30 2018515 16 ( 0x1 = 1 )
38902.01 1659819 24 ( 0x1 = 1 )
40856.38 817 51208 ( 0xc800 = 51200 )
1196684.53 25529270 48 ( 0x24 = 36 )
Then if your objects have vtables, just use the dps command to seek some of the 0x24 bytes heap allocations in DumpHeapEntries.log to know the type of the objects that are taking all the memory.
0:075> dps 3be7f7e8
3be7f7e8 00020006
3be7f7ec 090c01e7
3be7f7f0 0b40fe94 SomeDll!SomeType::`vftable'
3be7f7f4 00000000
3be7f7f8 00000000
It's cheesy but it works :)

The following command comes very handy when looking on the stack for C++ objects with vtables, especially when working with release builds when quite a few things get optimized away.
dpp esp Range
Being able to load an arbitrary PE file as dump is neat:
windbg -z mylib.dll
Query GetLastError() with:
!gle
This helps to decode common error codes:
!error error_number

Almost 60% of the commands I use everyday..
dv /i /t
?? this
kM (kinda undocumented) generates links to frames
.frame x
!analyze -v
!lmi
~
Explanation
dv /i /t [doc]
dv - display names and values of local variables in the current scope
/i - specify the kind of variable: local, global, parameter, function, or unknown
/t - display data type of variables
?? this [doc]
?? - evaluate C++ expression
this - C++ this pointer
kM [doc]
k - display stack back trace
M - DML mode. Frame numbers are hyperlinks to the particular frame. For more info about kM refer to http://windbg.info/doc/1-common-cmds.html
.frame x [doc]
Switch to frame number x. 0 being the frame at top of stack, 1 being frame 1 below the 0th frame, and so on.
To display local variables from another frame on the stack, first switch to that frame - .frame x, then use dv /i /t. By default d will show info from top frame.
!analyze -v [doc1] [doc2 - Using the !analyze Extension]
!analyze - analyze extension. Display information about the current exception or bug check. Note that to run an extension we prefix !.
-v - verbose output
!lmi [doc]
!lmi - lmi extension. Display detailed information about a module.
~ [doc]
~ - Displays status for the specified thread or for all threads in the current process.

The "tip" I use most often is one that will save you from having to touch that pesky mouse so often: Alt + 1
Alt + 1 will place focus into the command window so that you can actually type a command and so that up-arrow actually scrolls through command history. However, it doesn't work if your focus is already in the scrollable command history.
Peeve: why the heck are key presses ignored while the focus is in a source window? It's not like you can edit the source code from inside WinDbg. Alt + 1 to the rescue.

One word (well, OK, three) : DML, i.e. Debugger Markup Language.
This is a fairly recent addition to WinDbg, and it's not documented in the help file. There is however some documentation in "dml.doc" in the installation directory for the Debugging Tools for Windows.
Basically, this is an HTML-like syntax you can add to your debugger scripts for formatting and, more importantly, linking. You can use links to call other scripts, or even the same script.
My day-to-day work involves maintenance on a meta-modeler that provides generic objects and relationship between objects for a large piece of C++ software. At first, to ease debugging, I had written a simple dump script that extracts relevant information from these objects.
Now, with DML, I've been able to add links to the output, allowing the same script to be called again on related objects. This allows for much faster exploration of a model.
Here's a simplified example. Assume the object under introspection has a relationship called "reference" to another object.
r #$t0 = $arg1 $$ arg1 is the address of an object to examine
$$ dump some information from $t0
$$ allow the user to examine our reference
aS /x myref ##(&((<C++ type of the reference>*)#$t0)->reference )
.block { .printf /D "<link cmd=\"$$>a< <full path to this script> ${myref}\">dump Ref</link> " }
Obviously, this a pretty canned example, but this stuff is really invaluable for me. Instead of hunting around in very complex objects for the right data members (which usually took up to a minute and various casting and dereferencing trickery), everything is automated in one click!

.prefer_dml 1
This modifies many of the built in commands (for example, lm) to display DML output which allows you to click links instead of running commands. Pretty handy...
.reload /f /o file.dll (the /o will overwrite the current copy of the symbol you have)
.enable_unicode 1 //Switches the debugger to default to Unicode for strings since all the Windows components use Unicode internally, this is pretty handy.
.ignore_missing_pages 1 //If you do a lot of kernel dump analysis, you will see a lot of errors regarding memory being paged out. This command will tell the debugger to stop throwing this warning.
alias alias alias...
Save yourself some time in the debugger. Here are some of mine:
aS !p !process;
aS !t !thread;
aS .f .frame;
aS .p .process /p /r
aS .t .thread /p /r
aS dv dv /V /i /t //make dv do your favorite options by default
aS f !process 0 0 //f for find, e.g. f explorer.exe

Another answer mentioned the command window and Alt + 1 to focus on the command input window. Does anyone find it difficult to scroll the command output window without using the mouse?
Well, I have recently used AutoHotkey to scroll the command output window using keyboard and without leaving the command input window.
; WM_VSCROLL = 0x115 (277)
ScrollUp(control="")
{
SendMessage, 277, 0, 0, %control%, A
}
ScrollDown(control="")
{
SendMessage, 277, 1, 0, %control%, A
}
ScrollPageUp(control="")
{
SendMessage, 277, 2, 0, %control%, A
}
ScrollPageDown(control="")
{
SendMessage, 277, 3, 0, %control%, A
}
ScrollToTop(control="")
{
SendMessage, 277, 6, 0, %control%, A
}
ScrollToBottom(control="")
{
SendMessage, 277, 7, 0, %control%, A
}
#IfWinActive, ahk_class WinDbgFrameClass
; For WinDbg, when the child window is attached to the main window
!UP::ScrollUp("RichEdit50W1")
^k::ScrollUp("RichEdit50W1")
!DOWN::ScrollDown("RichEdit50W1")
^j::ScrollDown("RichEdit50W1")
!PGDN::ScrollPageDown("RichEdit50W1")
!PGUP::ScrollPageUp("RichEdit50W1")
!HOME::ScrollToTop("RichEdit50W1")
!END::ScrollToBottom("RichEdit50W1")
#IfWinActive, ahk_class WinBaseClass
; Also for WinDbg, when the child window is a separate window
!UP::ScrollUp("RichEdit50W1")
!DOWN::ScrollDown("RichEdit50W1")
!PGDN::ScrollPageDown("RichEdit50W1")
!PGUP::ScrollPageUp("RichEdit50W1")
!HOME::ScrollToTop("RichEdit50W1")
!END::ScrollToBottom("RichEdit50W1")
After this script is run, you can use Alt + up/down to scroll one line of the command output window, Alt + PgDn/PgUp to scroll one screen.
Note: it seems different versions of WinDbg will have different class names for the window and controls, so you might want to use the window spy tool provided by AutoHotkey to find the actual class names first.

Script to load SOS based on the .NET framework version (v2.0 / v4.0):
!for_each_module .if(($sicmp( "##ModuleName" , "mscorwks") = 0) )
{.loadby sos mscorwks} .elsif ($sicmp( "##ModuleName" , "clr") = 0)
{.loadby sos clr}

I like to use advanced breakpoint commands, such as using breakpoints to create new one-shot breakpoints.

Do not use WinDbg's .heap -stat command. It will sometimes give you incorrect output. Instead, use DebugDiags memory reporting.
Having the correct numbers, you can then use WinDbg's .heap -flt ... command.

For command & straightforward (static or automatable) routines where the debugger is used, it is very cool to be able to put all the debugger commands to run through in a text command file and run that as input through kd.exe or cdb.exe, callable via a batch script, etc.
Run that whenever you need to do this same old routine, without having to fire up WinDbg and do things manually. Too bad this doesn't work when you aren't sure what you are looking for, or some command parameters need manual analysis to find/get.

Platform-independent dump string for managed code which will work for x86/x64:
j $ptrsize = 8 'aS !ds .printf "%mu \n", c+';'aS !ds .printf "%mu \n", 10+'
Here is a sample usage:
0:000> !ds 00000000023620b8
MaxConcurrentInstances

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js