How do I run GDB one instruction at a time? - gdb

I am on Linux, on an x86-64 CPU. I want to run a certain assembly stub exactly one instruction at a time.
"s" does not work. This stub is part of a shared object and has no line information.

I want to run a certain assembly stub exactly one instruction at a
time. "s" does not work
Use the nexti command repeatedly until the assembly stub ends. See the GDB documentation.
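For example, a minimal session might look like this (the stub's start address is just a placeholder); stepi (si) executes exactly one instruction and steps into calls, nexti (ni) steps over call instructions, and display/i $pc prints the next instruction after every stop:
break *0x7ffff7bd4000
run
display/i $pc
stepi
nexti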

Related

How to run a parallel program from within a fortran code already launched with srun on SLURM?

I think my question is pretty specific and niche, and I couldn't find an answer anywhere else.
I have a parallel code in Fortran (using MPI), and I would like a subroutine on each individual processor to call another (in principle serial) program during runtime. I do this with EXECUTE_COMMAND_LINE. Now it turns out the other code I'm calling is also parallelized, with no possibility of producing a purely serial version without MPI. In my SLURM file, the cluster is set up such that I have to use srun, so
srun ./mycode < input.in > output.out
calls my code. In the 3rd party code, however, the easiest way to specify the number of cores is to use the provided launcher, which itself uses mpirun to launch the right number of nodes.
In principle, it is possible to run the 3rd party code without mpirun, in which case it should launch a "serial" version (parallel version but on a single core). However, as my code is already being run with srun, it looks like this is triggering the parallel version of the 3rd party software to run on multiple processors, which is ruining what I'm trying to do with this. If I use the normal launcher that calls mpirun to invoke the 3rd party code, everything hangs because mpirun is waiting for the first instance of srun to complete, which it never will.
Is there any way I can tell the 3rd party code (which doesn't have a flag to specify this explicitly without invoking mpirun) to run on a single processor? Perhaps an environment variable I can set, or a way of using EXECUTE_COMMAND_LINE that would specify the number of cores to run the command on? Or even a way to make multiple mpirun commands coexist without preventing each other from running?
I use the Intel compilers and Intel MPI for everything.
A colleague found one way to do this for anyone struggling:
call execute_command_line("bash -lc 'env -i PATH=/usr/bin/:/bin mpirun -n 2 ./bin/slave &> slave.out &'", wait=.false.)
Executed from within the calling Fortran code.
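A slightly fuller sketch of that call (same command string as above; CMDSTAT and CMDMSG are the standard Fortran 2008 arguments for catching launch errors when WAIT is .false.):
integer :: icmd
character(len=256) :: msg = ''
! wait=.false. launches asynchronously; icmd is non-zero if the launch itself failed
call execute_command_line( &
     "bash -lc 'env -i PATH=/usr/bin/:/bin mpirun -n 2 ./bin/slave &> slave.out &'", &
     wait=.false., cmdstat=icmd, cmdmsg=msg)
if (icmd /= 0) print *, 'launch failed: ', trim(msg)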

Intel Pin GDB Runtime Overhead

I am running a Python gdb script on a program that runs with a Pintool. Specifically, I used the -appdebug_enable switch and created a semantic breakpoint in the Pintool that automatically triggers the breakpoint and runs the Python script that I sourced. The script basically inspects local and global variables and scans the memory that was dynamically allocated by the program. I notice that the gdb script runs orders of magnitude slower than if I run the program and gdb without the Pintool. I also tried with a dummy Pintool to see if my Pintool implementation caused the slowdown but it did not seem to be the case.
My conclusion is that Pin slows down my gdb script, but can anyone explain how and why? Is there any tool I can use to profile the performance slowdown from Pin?
(I understand that gdb performance is not something people usually care too much about, but I am curious about the source of the slowdown.)
In your case Pin is used as a JIT (just-in-time) compiler, so it effectively stops your program whenever the instruction sequence changes and recompiles it (the main reason for the delay), and it also adds some instructions of its own (additional delay). Pin takes your program's executable as an argument. From the first instruction up to the next branch instruction (call, jmp, ret) it regenerates the sequence of instructions, together with additional instructions of its own that transfer control back to the Pin framework after the generated sequence has executed. While regenerating code, Pin lets you inject additional analysis code through instrumentation routines.
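For context, the instrumentation/analysis split looks roughly like Pin's classic instruction-counting tool. This is only a sketch (tool boilerplate and build details omitted; exact API spellings may differ between Pin versions):
#include "pin.H"
#include <iostream>

static UINT64 icount = 0;

// Analysis routine: runs before every instruction Pin regenerates.
VOID docount() { icount++; }

// Instrumentation routine: called once per instruction while Pin is
// generating the new code sequence; it injects the analysis call.
VOID Instruction(INS ins, VOID *v)
{
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_END);
}

VOID Fini(INT32 code, VOID *v)
{
    std::cerr << "Executed " << icount << " instructions" << std::endl;
}

int main(int argc, char *argv[])
{
    if (PIN_Init(argc, argv)) return 1;
    INS_AddInstrumentFunction(Instruction, 0);
    PIN_AddFiniFunction(Fini, 0);
    PIN_StartProgram();   // never returns
    return 0;
}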

How can you read/write asm registers from an .exe using C++?

I want to modify the value of a register that is in a certain program.
The only problem is, I don't know how to access it. If there is a way, how do I read/write into it? (preferred language C++)
If you want to modify a particular register one time while the program is running you can do so using a debugger, such as OllyDbg.
If you want to modify the code while the program isn't running, such that in the future when you run the program it behaves differently, you can view the assembly using a disassembler such as IDA. But you'll also need something that can reassemble the program with your modifications, such as NASM.
You can also attach one program to another while both are running, using the OpenProcess() function on Windows. You can then read and write arbitrary values in the other process, including modifying its code. This is a pretty tricky thing to set up and have work properly... It's how debuggers work, which are usually pretty complex pieces of software. Better to use an existing one than to try to write your own!
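A rough sketch of that approach (the process id and address are placeholders; real code needs error checks, sufficient access rights, and VirtualProtectEx if you patch code pages):
#include <windows.h>

int main()
{
    DWORD pid = 1234;                    // placeholder: target process id
    LPVOID addr = (LPVOID)0x00400000;    // placeholder: address to patch

    HANDLE h = OpenProcess(PROCESS_VM_READ | PROCESS_VM_WRITE | PROCESS_VM_OPERATION,
                           FALSE, pid);
    if (!h) return 1;

    int value = 0;
    SIZE_T n = 0;
    ReadProcessMemory(h, addr, &value, sizeof value, &n);   // read current value
    value = 42;
    WriteProcessMemory(h, addr, &value, sizeof value, &n);  // write a new value

    CloseHandle(h);
    return 0;
}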
If I understand you correctly, you want to write a program that:
Stops another running program at a certain address (breakpoint)
Waits until the breakpoint is reached
Reads the current value of a certain register
Writes another value into that register
Continues program execution after the breakpoint
Not using a breakpoint makes absolutely no sense because the assembly registers are used for completely different purposes in different parts of the program. So modifying an assembly register only makes sense when the other program is stopped at a specific position.
Under Windows, with Win32 code (not .NET code), you may do this the following way:
Start the other .EXE file using CreateProcess with the DEBUG_PROCESS flag set (and maybe, which would be safer, also CREATE_SUSPENDED) or...
...use DebugActiveProcess to debug an already running process.
Use ReadProcessMemory and WriteProcessMemory to replace the assembler instruction (to be correct: the byte, not the whole instruction) at the breakpoint address with 0xCC (int3).
If you used CREATE_SUSPENDED use ResumeThread to actually start the other .EXE file
Call WaitForDebugEvent and ContinueDebugEvent until the breakpoint is reached. You may use GetThreadContext to get the location where the program is stopped.
Replace the int3 instruction by the original instruction using WriteProcessMemory
Use GetThreadContext and SetThreadContext to modify registers. Note: after an int3 instruction the EIP register must be decremented!
Use ContinueDebugEvent to let the program continue
Unfortunately your program has to wait for further debugging events of the observed EXE file and handle them (WaitForDebugEvent and ContinueDebugEvent) - even if you "just" want the program to run...
If you want to stop the program at this breakpoint multiple times it becomes more complicated: you must set the "trace" flag in the EFlags register each time you continue after the breakpoint so that a single step is done; after that single step you have to replace the original assembler instruction with int3 again.
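A rough sketch of those steps in C++, for a 32-bit (x86) Win32 target (the EXE path, the breakpoint address and the choice of EAX are placeholders; most error handling is omitted):
#include <windows.h>
#include <cstdio>

int main()
{
    const wchar_t *exePath = L"C:\\path\\to\\target.exe";   // placeholder
    const DWORD_PTR kBreakAddr = 0x00401000;                // placeholder address

    STARTUPINFOW si = { sizeof(si) };
    PROCESS_INFORMATION pi = {};
    if (!CreateProcessW(exePath, nullptr, nullptr, nullptr, FALSE,
                        DEBUG_ONLY_THIS_PROCESS, nullptr, nullptr, &si, &pi))
        return 1;

    BYTE original = 0, int3 = 0xCC;
    bool patched = false;
    DEBUG_EVENT ev = {};

    for (;;)
    {
        if (!WaitForDebugEvent(&ev, INFINITE))
            break;
        DWORD cont = DBG_CONTINUE;
        SIZE_T n = 0;

        if (ev.dwDebugEventCode == CREATE_PROCESS_DEBUG_EVENT && !patched)
        {
            // Save the original byte and plant the int3 breakpoint.
            // (If the write fails, use VirtualProtectEx to make the page writable first.)
            ReadProcessMemory(pi.hProcess, (LPCVOID)kBreakAddr, &original, 1, &n);
            WriteProcessMemory(pi.hProcess, (LPVOID)kBreakAddr, &int3, 1, &n);
            FlushInstructionCache(pi.hProcess, (LPCVOID)kBreakAddr, 1);
            patched = true;
            CloseHandle(ev.u.CreateProcessInfo.hFile);
        }
        else if (ev.dwDebugEventCode == EXCEPTION_DEBUG_EVENT &&
                 ev.u.Exception.ExceptionRecord.ExceptionCode == EXCEPTION_BREAKPOINT &&
                 (DWORD_PTR)ev.u.Exception.ExceptionRecord.ExceptionAddress == kBreakAddr)
        {
            // Our breakpoint was hit: read/modify registers, then undo the patch.
            HANDLE hThread = OpenThread(THREAD_GET_CONTEXT | THREAD_SET_CONTEXT,
                                        FALSE, ev.dwThreadId);
            CONTEXT ctx = {};
            ctx.ContextFlags = CONTEXT_FULL;
            GetThreadContext(hThread, &ctx);
            printf("EAX at breakpoint: 0x%08lx\n", ctx.Eax);
            ctx.Eax = 0x1234;     // write another value into the register
            ctx.Eip -= 1;         // step back over the int3 byte we injected
            SetThreadContext(hThread, &ctx);
            CloseHandle(hThread);

            WriteProcessMemory(pi.hProcess, (LPVOID)kBreakAddr, &original, 1, &n);
            FlushInstructionCache(pi.hProcess, (LPCVOID)kBreakAddr, 1);
        }
        else if (ev.dwDebugEventCode == EXCEPTION_DEBUG_EVENT)
        {
            // Not our breakpoint: pass the exception on to the program.
            if (ev.u.Exception.ExceptionRecord.ExceptionCode != EXCEPTION_BREAKPOINT)
                cont = DBG_EXCEPTION_NOT_HANDLED;
        }
        else if (ev.dwDebugEventCode == EXIT_PROCESS_DEBUG_EVENT)
        {
            ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, cont);
            break;
        }

        ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, cont);
    }

    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return 0;
}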
I did this myself years ago because I wrote an inspection program that worked similarly to "strace" on Linux...
If you are using C++/.NET (or C#): there is a .NET assembly that allows you to do all this, but using it is not easier than the Win32 variant.
If you want to do this for educational uses only:
Unfortunately the debugging API of Windows is by far less easy to use than the debugging API (ptrace) of Linux. If you just want to do this for educational purposes (and the EXE file you want to inspect is written by you), I would do this exercise under Linux, not under Windows. However, even under Linux this is not an easy job...

How can I get the number of instructions executed by a program?

I have written and cross-compiled a small C++ program, and I can run it on an ARM board or on a PC. Since ARM and x86 have different instruction set architectures, I want to compare them. Is it possible for me to get the number of executed instructions of this C++ program for both ISAs?
What you need is a profiler; perf is an easy one to use. It will give you the number of instructions executed, which is the best metric if you want to compare ISA efficiency.
Check the perf tutorial for details.
You need to use: perf stat ./your_binary
Look for the instructions metric. This approach uses a register in your CPU's performance monitoring unit (PMU) that counts the number of executed instructions.
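For example (your_binary is a placeholder; the :u modifier restricts the count to user-space instructions):
perf stat -e instructions:u ./your_binary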
Are you trying to get the number of static instructions or dynamic instructions? For instance, if you have the following loop (here in C):
for (int i = 0; i < N; i++)
    a[i] = b[i] + c[i];
The static instruction count will be just under 10 instructions, give or take depending on your ISA, but the dynamic count depends on N: if the loop body compiles to roughly 5 instructions, the dynamic count is roughly 5*N plus a little loop overhead.
So for static count I would recommend using objdump, as per recommendations in the comments. You can find the entry and exit labels of your subroutine and count the number of instructions in between.
For dynamic instruction count, I would recommend one of two things:
You can simulate running that code using an instruction set simulator (there are open-source ISA simulators for both ARM and x86 out there; gem5, for instance, implements both of them, and there are others that support one or the other).
Your second option is to run this natively on the target system and set up performance counters in the CPU to report the dynamic instruction count. You would reset the counter before executing your code and read it afterwards (there might be some noise here associated with calling your subroutine and exiting, but you should be able to isolate that out)
Hope this helps :)
objdump -dw mybinary | wc -l
On Linux and friends, this gives a good approximation of the number of instructions in an executable, library or object file. This is a static count, which is of course completely different from runtime behavior.
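If you want to count only the instruction lines and skip objdump's headers and labels, something along these lines (a sketch that relies on objdump printing each instruction with a leading hex address and a colon) is a bit closer:
objdump -dw mybinary | grep -cE '^[[:space:]]*[0-9a-f]+:'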
Linux:
valgrind --tool=callgrind ./program 1 > /dev/null
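Callgrind writes its counts to a callgrind.out.<pid> file (<pid> being whatever process id you get); the instruction counts show up as the Ir (instruction read) event when you post-process it:
callgrind_annotate callgrind.out.<pid>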

Can my app arrange a gdb breakpoint or watch?

Is there a way for my code to be instrumented to insert a break point or watch on a memory location that will be honored by gdb? (And presumably have no effect when gdb is not attached.)
I know how to do such things as gdb commands within the gdb session, but for certain types of debugging it would be really handy to do it "programmatically", if you know what I mean -- for example, the bug only happens with a particular circumstance, not any of the first 11,024 times the crashing routine is called, or the first 43,028,503 times that memory location is modified, so setting a simple break point on the routine or watch point on the variable is not helpful -- it's all false positives.
I'm concerned mostly about Linux, but curious whether similar solutions exist for OS X (or Windows, though obviously not with gdb).
For breakpoints, on x86 you can break at any location with
asm("int3");
Unfortunately, I don't know how to detect if you're running inside gdb (doing that outside a debugger will kill your program with a SIGTRAP signal)
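One way to fill that gap on Linux (an addition, not part of the answer above) is to check the TracerPid field of /proc/self/status, which is non-zero while a debugger such as gdb is attached. A minimal sketch:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int debugger_attached(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    int attached = 0;

    if (!f)
        return 0;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "TracerPid:", 10) == 0) {
            attached = atoi(line + 10) != 0;   /* non-zero pid => tracer present */
            break;
        }
    }
    fclose(f);
    return attached;
}

void maybe_break(void)
{
    if (debugger_attached())
        asm("int3");   /* traps into gdb; skipped when nothing is attached */
}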
GDB supports a scripting language that can help in situations like this. For example, you can attach a bit of custom script to a breakpoint that may decide to "continue" because some condition hasn't been met.
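A minimal sketch for the 11,024-calls case from the question (crashing_routine is a placeholder for the routine's name): the breakpoint's command list silently counts hits and continues until the count is exceeded.
set $hits = 0
break crashing_routine
commands
silent
set $hits = $hits + 1
if $hits <= 11024
  continue
end
end
For a plain hit count like this, GDB's ignore command (ignore 1 11024, assuming the breakpoint is number 1) does the same thing with less typing.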
Not directly related to your question, but it may be helpful: have you looked at the backtrace and backtrace_symbols calls in execinfo.h?
http://linux.die.net/man/3/backtrace
This can help you log a backtrace whenever your condition is met. It isn't gdb, so you can't break and step through your program, but it may be useful as a quick diagnostic.
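A minimal sketch of that idea (the function and condition names are made up): log a backtrace to stderr when the rare condition is hit. Linking with -rdynamic makes function names appear instead of raw addresses.
#include <execinfo.h>
#include <unistd.h>

void log_backtrace_if(int rare_condition)
{
    void *frames[64];
    int n;

    if (!rare_condition)
        return;
    n = backtrace(frames, 64);                      /* capture up to 64 frames */
    backtrace_symbols_fd(frames, n, STDERR_FILENO); /* write them to stderr */
}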
The commonly used approach is to use a dummy function with a non-obvious name. Then you can augment your .gdbinit, or use whatever other technique, to always break on that symbol name.
Trivial dummy function:
void my_dummy_breakpoint_loc(void) {}
Code under test (can be an assert-like macro):
if (rare_condition)
my_dummy_breakpoint_loc();
gdb session (obvious, eh?):
b my_dummy_breakpoint_loc
It is important to make sure that "my_dummy_breakpoint_loc" is not optimized away by the compiler for this technique to work.
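One way to do that (an assumption on my part, for GCC/Clang) is to mark the function noinline and give it a body the optimizer cannot prove side-effect-free, or simply define it in a separate translation unit:
__attribute__((noinline)) void my_dummy_breakpoint_loc(void)
{
    asm volatile("");   /* the empty asm keeps calls from being optimized out */
}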
In the fanciest of cases, the actual assembler instruction that calls my_dummy_breakpoint_loc can be replaced by NOPs and enabled on a site-by-site basis by a bit of runtime code self-modification. This technique is used by Linux kernel instrumentation, to name one example.