How to print the result of a C++ evaluation with GDB?

I've been looking around but was unable to figure out how to print, in GDB, the result of an evaluation. For example, in the code below:
if (strcmp(current_node->word, min_node->word) > 0)
    min_node = current_node;
(Above, I was trying out a possible method for checking alphabetical order of strings, and wasn't absolutely certain it works correctly.)
Now, I could watch min_node and see if the value changes, but in more involved code this is sometimes more complicated. I am wondering if there is a simple way to watch the evaluation of a test on the line where GDB / program flow currently is.

There is no expression-level single stepping in gdb, if that's what you are asking for.
Your options are (from most to least commonly used):
evaluate the expression in gdb with print strcmp(current_node->word, min_node->word). Surprisingly, this works: gdb can evaluate function calls by injecting code into the running program and having it execute the call. Of course, this is fairly dangerous if the function has side effects or may crash; in this case it is so harmless that people typically won't think about potential problems.
perform instruction-level (assembly) single-stepping (ni/si). When the call instruction is done, you find the result in a register, according to the processor conventions (%eax on x86).
edit the code to assign intermediate values to variables, splitting the expression into separate lines/statements; then use regular single-stepping and inspect the variables.
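For the last option, the rewrite could look roughly like this (a minimal sketch; the node type and helper function are invented here to make it self-contained):

#include <cstring>

struct node { const char* word; };

void update_min(node*& min_node, node* current_node) {
    // Split the comparison so each intermediate value can be
    // inspected in gdb with a plain "print cmp" / "print is_greater".
    int cmp = strcmp(current_node->word, min_node->word);
    bool is_greater = (cmp > 0);
    if (is_greater)
        min_node = current_node;
}

int main() {
    node a = {"apple"}, b = {"banana"};
    node* min_node = &b;
    update_min(min_node, &a);          // "apple" < "banana": min_node stays &b
    return min_node == &b ? 0 : 1;
}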

You may simply try typing:
call my_function()
As far as I remember, though, it won't work when the function is inlined.

Related

Can the return value from finish in gdb be different from the actual one in execution?

I am a gdb novice, and I was trying to debug some GSSAPI code, and was using fin to see the return value from the frame. As seen in the snip pasted below, the return value from gssint_mechglue_initialize_library() seems to be 0, but the actual check seems to fail. Can someone please point out if I am missing something obvious here?
Thanks in advance!
One possible explanation for the observed behavior is that you are debugging optimized code, and that line 1001 isn't really executed.
You can confirm this with a few nexts, or by executing fin again and observing whether GSS_S_COMPLETE or something else is returned from gssint_select_mech_type.
When optimization is on, code motion performed by the optimizer often prevents correct assignment of actual code sequences to line numbers (instructions "belonging" to different lines get mixed and re-ordered). This often causes the code to "jump around" when, e.g., stepping with the nexti command.
For ease of debugging, recompile with -O0, or make sure to remove any -O2 and the like from your compile lines.

Why is my program going into both an if statement AND its corresponding else statement?

In part of my program, I have the code:
if (cameraName == L"AVT Prosilica GT2750") {
    mCamera = new camera_avtcam_ex_t();
} else if (cameraName == L"QImaging Retiga 2000R\\4000R") {
    mCamera = new camera_qcam_ex_t();
}
When I have set up my program so that cameraName defaults to L"AVT Prosilica GT2750" (and my debugger will show this to be its value), it goes into the if statement and runs mCamera = new camera_avtcam_ex_t();, but then when I step to the next executed line my debugger skips directly to the line mCamera = new camera_qcam_ex_t(); and executes it. How can this possibly be happening given the nature of if/else statements?
NOTE: If I replace the else if with just a simple else statement, the same behavior is seen.
You are seeing this because you are trying to debug a release build.
Try debugging a "debug" build; you should see the behavior you are expecting. When debugging an optimized build, the lines don't necessarily 'line up' with the source code. For all you know, the optimizer decided that it was best to execute both of those and throw one away if it wasn't needed.
Note - I am not suggesting the optimizer did do that; I am just saying it is possible, and that you may actually be seeing which line is executed next. The optimizer is free to reorder the code, unroll loops, propagate constants, remove variables, add temporaries, etc.
Edit - additional thoughts
When you get down to the hardware level, things can get reordered even more. The hardware may choose to execute both sides of a branch before it figures out which one should be taken, so that the answer is ready as soon as it is needed. It will do that even though it means throwing the unneeded work away, as that may yield faster execution overall.

gdb: finding every jump to an address

I'm trying to understand a small binary using gdb, but there is something I can't find a way to achieve: how can I find the list of jumps that point to a specified address?
I have a small set of instructions in the disassembled code and I want to know from where it is called.
I first thought about searching for the corresponding instructions in .text, but since there are many kinds of jumps, and addresses can be relative, this can't work.
Is there a way to do that?
Alternatively, if I put a breakpoint on this address, is there a way to know the address of the previous instruction executed (in this case, the jump)?
If this is some subroutine being called from other places, then it must respect some ABI while it's called.
Depending on the CPU used, the return address (and therefore the place from where it was called) will be stored somewhere (on the stack or in some register). If you replace the original code with code that examines this, you can create a list of return addresses. Or, simpler, as you suggested: if you use gdb and put a breakpoint at that routine, you can see from where it was called by using the bt command.
If it was an actual jump (versus a "jump to subroutine") that led you there (which I doubt, if it's called from many places, unless it's a kind of longjmp/setjmp), then you will probably not be able to determine where it was called from, unless the CPU you are using allows you to trace the execution in some way.

Optimal virtual machine/byte-code interpreter loop

My project has a VM that executes a byte-code compiled from a domain-specific-language. I'm looking at ways that I can improve the execution time of the byte-code. As a first step I'd like to see if there is a way to simply improve the byte-code interpreter before I venture into machine code compilation.
The main loop of the interpreter looks like this:
while (true)
{
    uint8_t cmd = *code++;
    switch (cmd)
    {
    case op_1: ...; break;
    ...
    }
}
QUESTION: Is there a faster way to implement this loop without resorting to assembler?
The one option I see is GCC-specific: using computed goto with label addresses. Rather than a break at the end of each case, I could jump directly to the next instruction. I had hoped the optimizer would do this for me, but looking at the disassembly it apparently doesn't: there is a repeated constant jump at the end of most opcodes.
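For reference, the computed-goto form I have in mind looks roughly like this (a sketch with invented opcodes, not my actual interpreter; &&label and goto * are GCC extensions):

#include <cstdint>
#include <cstdio>

int main() {
    // GCC extension: labels as values. Each opcode byte indexes
    // directly into this table, replacing the switch dispatch.
    static void* const labels[] = { &&do_inc, &&do_dec, &&do_halt };
    const uint8_t code[] = { 0, 0, 1, 2 };   // inc, inc, dec, halt
    const uint8_t* pc = code;
    int reg = 0;

    goto *labels[*pc++];          // dispatch to the first opcode
do_inc:
    ++reg;
    goto *labels[*pc++];          // jump straight to the next handler
do_dec:
    --reg;
    goto *labels[*pc++];
do_halt:
    printf("reg = %d\n", reg);    // prints 1
    return 0;
}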
If relevant, the VM is a simple register-based machine with floating-point and integer registers (8 of each). There is no stack, only a global heap (the language is not that complicated).
One very easy optimisation: instead of
switch/case/case/case/...,
define an array of function pointers (where each function processes a specific command, or a couple of commands, in which case you can set several entries of the array to the same function and have the function itself check the exact code), and instead of
switch(cmd)
just do
array[cmd]()
This is given that you don't have too many commands. Also, do some range checking if you will not define all the possible command values (maybe you only have 300 commands but have to use 2 bytes to represent them; instead of defining an array with 65536 entries, just check that the command is less than 301 and skip the lookup if it isn't).
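A minimal runnable sketch of the idea (the opcodes, handlers, and VM state here are invented for illustration):

#include <cstdint>
#include <cstdio>

static int reg = 0;          // stand-in for the VM's registers
static bool running = true;

static void op_inc()  { ++reg; }
static void op_dec()  { --reg; }
static void op_halt() { running = false; }

typedef void (*op_handler)();
static const op_handler dispatch[] = { op_inc, op_dec, op_halt };
static const uint8_t op_count = sizeof(dispatch) / sizeof(dispatch[0]);

int main() {
    const uint8_t code[] = { 0, 0, 1, 2 };   // inc, inc, dec, halt
    const uint8_t* pc = code;
    while (running) {
        uint8_t cmd = *pc++;
        if (cmd >= op_count) break;   // the range check mentioned above
        dispatch[cmd]();              // array[cmd]() instead of switch(cmd)
    }
    printf("reg = %d\n", reg);        // prints 1
    return 0;
}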
If you won't do that, at least sort the commands so that the most-used ones are at the beginning of the switch statement.
Otherwise you could look into hash tables, but I assume you don't have that many commands, in which case the overhead of computing a hash would probably cost you more than the switch it replaces. (Or use a VERY simple hash function.)
What's the architecture? You may get a speed-up with word-aligned opcodes, but it'll blow out your code size, which means you'll have to balance it against the cost of a cache miss.
A few obvious optimizations I see:
If you don't use cmd anywhere other than in the switch(), use the pointer indirection directly: switch( *code++ ). For a long-running while(true) loop, this can help a little.
In the switch(), you can use continue instead of break: when continue is used inside an if/else or switch, the compiler knows that execution has to jump back to the outer loop; the same is not true for break (with respect to switch).
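The continue variant would look roughly like this (a sketch; the opcodes are invented):

#include <cstdint>

enum : uint8_t { OP_NOP = 0, OP_HALT = 1 };

static int run(const uint8_t* code) {
    int executed = 0;
    while (true) {
        switch (*code++) {
        case OP_NOP:
            ++executed;
            continue;      // straight back to the top of the while loop
        case OP_HALT:
        default:
            return executed;
        }
    }
}

int main() {
    const uint8_t program[] = { OP_NOP, OP_NOP, OP_HALT };
    return run(program) == 2 ? 0 : 1;
}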
Hope this helps.

Gentle introduction to JIT and dynamic compilation / code generation

The deceptively simple foundation of dynamic code generation within a C/C++ framework has already been covered in another question. Are there any gentle introductions to the topic, with code examples?
My eyes are starting to bleed staring at highly intricate open source JIT compilers when my needs are much more modest.
Are there good texts on the subject that don't assume a doctorate in computer science? I'm looking for well worn patterns, things to watch out for, performance considerations, etc. Electronic or tree-based resources can be equally valuable. You can assume a working knowledge of (not just x86) assembly language.
Well, a pattern I've used in emulators goes something like this:
#include <map>

typedef void (*code_ptr)();

// These are assumed to exist elsewhere in the emulator:
extern unsigned long entry_point;
code_ptr generate_code_block();              // emits native code for the current block
unsigned long update_instruction_pointer();  // reads the next PC from VM state

unsigned long instruction_pointer = entry_point;
std::map<unsigned long, code_ptr> code_map;

void execute_block() {
    code_ptr f;
    std::map<unsigned long, code_ptr>::iterator it = code_map.find(instruction_pointer);
    if (it != code_map.end()) {
        f = it->second;
    } else {
        f = generate_code_block();
        code_map[instruction_pointer] = f;
    }
    f();
    instruction_pointer = update_instruction_pointer();
}

void execute() {
    while (true) {
        execute_block();
    }
}
This is a simplification, but the idea is there. Basically, every time the engine is asked to execute a "basic block" (usually everything up to the next flow-control op, or a whole function where possible), it looks it up to see if it has already been created. If so, execute it; else create it, add it, and then execute.
Rinse, repeat :)
As for the code generation, that gets a little complicated, but the idea is to emit a proper "function" which does the work of your basic block in the context of your VM.
EDIT: note that I haven't demonstrated any optimizations either, but you asked for a "gentle introduction"
EDIT 2: I forgot to mention one of the most immediately productive speed-ups you can implement with this pattern. Basically, if you never remove a block from your tree (you can work around it if you do, but it is way simpler if you never do), then you can "chain" blocks together to avoid lookups. Here's the concept: whenever you return from f() and are about to do the "update_instruction_pointer", if the block you just executed ended in either a call, an unconditional jump, or didn't end in flow control at all, then you can "fix up" its ret instruction with a direct jmp to the next block it will execute (because it will always be the same one), provided you have already emitted it. This way you execute more and more inside the generated code and less and less in the "execute_block" function.
I'm not aware of any sources specifically related to JITs, but I imagine that it's pretty much like a normal compiler, only simpler if you aren't worried about performance.
The easiest way is to start with a VM interpreter. Then, for each VM instruction, generate the assembly code that the interpreter would have executed.
To go beyond that, I imagine that you would parse the VM byte codes and convert them into some sort of suitable intermediate form (three address code? SSA?) and then optimize and generate code as in any other compiler.
For a stack-based VM, it may help to keep track of the "current" stack depth as you translate the byte codes into intermediate form, and treat each stack location as a variable. For example, if you think that the current stack depth is 4 and you see a "push" instruction, you might generate an assignment to "stack_variable_5" and increment a compile-time stack counter, or something like that. An "add" when the stack depth is 5 might generate the code "stack_variable_4 = stack_variable_4 + stack_variable_5" and decrement the compile-time stack counter.
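A tiny sketch of that compile-time-stack idea (the opcode names and the textual output are invented for illustration):

#include <cstdio>

// Translate a toy stack bytecode into assignments to numbered
// "stack variables", tracking the stack depth at compile time.
enum Op { PUSH_X, PUSH_Y, ADD };

int main() {
    const Op code[] = { PUSH_X, PUSH_Y, ADD };   // the program "X Y +"
    int depth = 0;                               // compile-time stack counter
    for (Op op : code) {
        switch (op) {
        case PUSH_X:
            printf("stack_variable_%d = X\n", ++depth);
            break;
        case PUSH_Y:
            printf("stack_variable_%d = Y\n", ++depth);
            break;
        case ADD:
            // consumes the top two stack variables, produces one
            printf("stack_variable_%d = stack_variable_%d + stack_variable_%d\n",
                   depth - 1, depth - 1, depth);
            --depth;
            break;
        }
    }
    return 0;
}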
It is also possible to translate stack based code into syntax trees. Maintain a compile-time stack. Every "push" instruction causes a representation of the thing being pushed to be stored on the stack. Operators create syntax tree nodes that include their operands. For example, "X Y +" might cause the stack to contain "var(X)", then "var(X) var(Y)" and then the plus pops both var references off and pushes "plus(var(X), var(Y))".
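And a matching sketch of the syntax-tree variant (the node representation is invented; it just prints the tree it builds):

#include <iostream>
#include <memory>
#include <stack>
#include <string>

struct Node {
    std::string label;                 // "var(X)", "plus", ...
    std::shared_ptr<Node> lhs, rhs;    // null for leaves
};

static std::string dump(const std::shared_ptr<Node>& n) {
    if (!n->lhs) return n->label;
    return n->label + "(" + dump(n->lhs) + ", " + dump(n->rhs) + ")";
}

int main() {
    std::stack<std::shared_ptr<Node>> s;   // the compile-time stack
    // Translating "X Y +":
    s.push(std::make_shared<Node>(Node{"var(X)", nullptr, nullptr}));
    s.push(std::make_shared<Node>(Node{"var(Y)", nullptr, nullptr}));
    std::shared_ptr<Node> rhs = s.top(); s.pop();   // "+" pops both operands...
    std::shared_ptr<Node> lhs = s.top(); s.pop();
    s.push(std::make_shared<Node>(Node{"plus", lhs, rhs}));   // ...and pushes the op node
    std::cout << dump(s.top()) << "\n";    // prints plus(var(X), var(Y))
    return 0;
}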
Get yourself a copy of Joel Pobar's book on Rotor (when it's out), and delve through the source to the SSCLI. Beware, insanity lies within :)