Intel Pin GDB Runtime Overhead - gdb

I am running a Python gdb script on a program that runs under a Pintool. Specifically, I used the -appdebug_enable switch and created a semantic breakpoint in the Pintool that triggers automatically and runs the Python script that I sourced. The script basically inspects local and global variables and scans the memory that was dynamically allocated by the program. I notice that the gdb script runs orders of magnitude slower than when I run the program under gdb without the Pintool. I also tried a dummy Pintool to check whether my own Pintool implementation caused the slowdown, but that did not seem to be the case.
My conclusion is that Pin slows down my gdb script, but can anyone explain how and why? Is there any tool I can use to profile the performance slowdown from Pin?
(I understand that gdb performance is not something people usually care too much about, but I am curious about the source of the slowdown.)

In your case Pin is used as a JIT (just-in-time) compiler. It effectively stops your program whenever the instruction sequence changes, recompiles it (the main reason for the delay), and also adds some instructions of its own (additional delay). Pin takes your program's executable as an argument. From the first instruction up to the next control-transfer instruction (call, jmp, ret) it regenerates the sequence of instructions, adding its own instructions that transfer control back to the Pin framework after the generated sequence has executed. While regenerating code, Pin lets the user inject additional analysis code via instrumentation code.
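To make the JIT/instrumentation model concrete, here is a minimal Pintool sketch in the style of the classic instruction-counting example (the names docount and icount and the output text are just illustrative). Every instruction Pin recompiles gets an extra call into the analysis routine, which is where much of the overhead comes from:

```cpp
// Minimal Pintool sketch, modeled on the classic instruction-count example.
// Build it with the makefiles shipped in the Pin kit.
#include "pin.H"
#include <iostream>

static UINT64 icount = 0;

// Analysis routine: injected before every instruction Pin recompiles.
static VOID docount() { icount++; }

// Instrumentation routine: called by Pin each time it JIT-compiles an instruction.
static VOID Instruction(INS ins, VOID *v)
{
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_END);
}

static VOID Fini(INT32 code, VOID *v)
{
    std::cerr << "executed " << icount << " instructions" << std::endl;
}

int main(int argc, char *argv[])
{
    if (PIN_Init(argc, argv)) return 1;        // parse Pin's command line
    INS_AddInstrumentFunction(Instruction, 0); // register the JIT-time callback
    PIN_AddFiniFunction(Fini, 0);
    PIN_StartProgram();                        // never returns
    return 0;
}
```

Note that even a truly empty Pintool (no INS_InsertCall at all) still forces Pin to JIT-compile every trace it executes, which is consistent with your observation that a dummy Pintool shows a similar slowdown.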

Related

How to get linux kernel coredump for later analysis using gdb tool?

Is it possible to intentionally crash the kernel at a specific point during its execution (by inserting some C statement there, or otherwise) and then collect the core file for analysis with a normal gdb program?
Can somebody please share the steps and what needs to be done?
Is it possible to intentionally crash the kernel
Sure: just insert a call to panic() in the desired place.
The easiest way to do this is to use User-Mode Linux. The kernel becomes just a regular program, and you can execute it under GDB the usual way, setting breakpoints, looking at variables, etc.
If you need to do "bare metal" execution, you should probably start here or here.

How can you read/write asm registers from an .exe using C++?

I want to modify the value of a register that is in a certain program.
The only problem is, I don't know how to access it. If there is a way, how do I read/write into it? (preferred language C++)
If you want to modify a particular register one time while the program is running you can do so using a debugger, such as OllyDbg.
If you want to modify the code while the program isn't running, such that in the future when you run the program it behaves differently, you can view the assembly using a disassembler such as IDA. But you'll also need something that can reassemble the program with your modifications, such as NASM.
You can also attach one program to another while both are running, using the OpenProcess() function on Windows. You can then read and write arbitrary values in the other process, including modifying its code. This is a pretty tricky thing to set up and have work properly... It's how debuggers work, and they are usually pretty complex pieces of software. Better to use an existing one than to try to write your own!
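A minimal sketch of the OpenProcess approach described above, assuming you already know the target's PID and the address of the value you want to change (both values below are made up, and error handling is mostly omitted):

```cpp
#include <windows.h>
#include <cstdio>

int main()
{
    const DWORD pid = 1234;            // hypothetical target PID
    void* addr = (void*)0x0040A000;    // hypothetical address inside the target

    HANDLE h = OpenProcess(PROCESS_VM_READ | PROCESS_VM_WRITE | PROCESS_VM_OPERATION,
                           FALSE, pid);
    if (!h) { std::printf("OpenProcess failed: %lu\n", GetLastError()); return 1; }

    int value = 0;
    SIZE_T n = 0;
    ReadProcessMemory(h, addr, &value, sizeof(value), &n);   // read the current value
    std::printf("old value: %d\n", value);

    value = 42;
    WriteProcessMemory(h, addr, &value, sizeof(value), &n);  // overwrite it

    CloseHandle(h);
    return 0;
}
```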
If I understand you correctly, you want to write a program that:
Stops another running program at a certain address (breakpoint)
Waits until the breakpoint is reached
Reads the current value of a certain register
Writes another value into that register
Continues program execution after the breakpoint
Not using a breakpoint makes absolutely no sense because the assembly registers are used for completely different purposes in different parts of the program. So modifying an assembly register only makes sense when the other program is stopped at a specific position.
Under Windows, in Win32 code (not .NET code), you may do this the following way (a code sketch of these steps follows the list):
Start the other .EXE file using CreateProcess with the DEBUG_PROCESS flag set (and maybe, which would be safer, CREATE_SUSPENDED as well), or...
...use DebugActiveProcess to debug an already running process.
Use ReadProcessMemory and WriteProcessMemory to replace the assembler instruction (to be precise: the byte, not the whole instruction) at the breakpoint address with 0xCC (int3).
If you used CREATE_SUSPENDED, use ResumeThread to actually start the other .EXE file.
Call WaitForDebugEvent and ContinueDebugEvent until the breakpoint is reached. You may use GetThreadContext to get the location where the program is stopped.
Replace the int3 instruction with the original instruction using WriteProcessMemory.
Use GetThreadContext and SetThreadContext to modify registers. Note: after an int3 instruction the EIP register must be decremented!
Use ContinueDebugEvent to let the program continue
Unfortunately your program has to wait for further debugging events of the observed EXE file and handle them (WaitForDebugEvent and ContinueDebugEvent) - even if you "just" want the program to run...
If you want to stop the program at this breakpoint multiple times it becomes more complicated: you must set the trap ("trace") flag in the EFlags register each time you continue after the breakpoint so that a single step is executed; after that single step you have to replace the original assembler instruction with int3 again.
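A rough, hedged sketch of those steps for a native Win32 target is shown below; error handling is omitted, and target.exe plus BREAK_ADDR are placeholders you would have to replace with the real program and breakpoint address:

```cpp
#include <windows.h>
#include <cstdio>

int main()
{
    const ULONG_PTR BREAK_ADDR = 0x00401000;  // hypothetical breakpoint address

    STARTUPINFOA si = { sizeof(si) };
    PROCESS_INFORMATION pi = {};
    // DEBUG_PROCESS also works, but attaches to child processes too.
    CreateProcessA("target.exe", nullptr, nullptr, nullptr, FALSE,
                   DEBUG_ONLY_THIS_PROCESS, nullptr, nullptr, &si, &pi);

    BYTE original = 0;
    const BYTE int3 = 0xCC;
    SIZE_T n = 0;

    DEBUG_EVENT ev = {};
    for (;;)
    {
        WaitForDebugEvent(&ev, INFINITE);
        DWORD status = DBG_CONTINUE;

        switch (ev.dwDebugEventCode)
        {
        case CREATE_PROCESS_DEBUG_EVENT:
            // Save the original byte and patch in the int3 (0xCC) breakpoint.
            ReadProcessMemory(pi.hProcess, (LPCVOID)BREAK_ADDR, &original, 1, &n);
            WriteProcessMemory(pi.hProcess, (LPVOID)BREAK_ADDR, &int3, 1, &n);
            FlushInstructionCache(pi.hProcess, (LPCVOID)BREAK_ADDR, 1);
            CloseHandle(ev.u.CreateProcessInfo.hFile);
            break;

        case EXCEPTION_DEBUG_EVENT:
            if (ev.u.Exception.ExceptionRecord.ExceptionCode == EXCEPTION_BREAKPOINT &&
                (ULONG_PTR)ev.u.Exception.ExceptionRecord.ExceptionAddress == BREAK_ADDR)
            {
                // Our breakpoint: restore the original byte, rewind the
                // instruction pointer past the int3, and change a register.
                WriteProcessMemory(pi.hProcess, (LPVOID)BREAK_ADDR, &original, 1, &n);
                FlushInstructionCache(pi.hProcess, (LPCVOID)BREAK_ADDR, 1);

                HANDLE hThread = OpenThread(THREAD_GET_CONTEXT | THREAD_SET_CONTEXT,
                                            FALSE, ev.dwThreadId);
                CONTEXT ctx = {};
                ctx.ContextFlags = CONTEXT_FULL;
                GetThreadContext(hThread, &ctx);
#ifdef _WIN64
                ctx.Rip -= 1;   // step back over the int3
                ctx.Rax = 42;   // example register modification
#else
                ctx.Eip -= 1;
                ctx.Eax = 42;
#endif
                SetThreadContext(hThread, &ctx);
                CloseHandle(hThread);
                // To hit the breakpoint again, set the trap flag (bit 8 of EFlags),
                // single-step once, then write the 0xCC byte back.
            }
            // The very first EXCEPTION_BREAKPOINT is the loader breakpoint;
            // DBG_CONTINUE is fine for it. Real code should pass
            // DBG_EXCEPTION_NOT_HANDLED for exceptions it does not own.
            break;

        case EXIT_PROCESS_DEBUG_EVENT:
            ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, status);
            CloseHandle(pi.hThread);
            CloseHandle(pi.hProcess);
            return 0;
        }

        ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, status);
    }
}
```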
I did this myself years ago because I wrote an inspection program that worked similarly to "strace" on Linux...
If you are using C++.NET (or C#): there is a .NET assembly that allows you to do all this, but using it is not easier than the Win32 variant.
If you want to do this for educational uses only:
Unfortunately the debugging API of Windows is far less easy to use than the debugging API (ptrace) of Linux. If you just want to do this for educational purposes (and the EXE file you want to inspect is written by you), I would do this exercise under Linux, not under Windows. However, even under Linux this is not an easy job...

How does GDB know it has to break at the specified breakpoint?

A basic question & I am very new to C/C++ and GDB.
We use GDB to debug a process. We attach GDB to a process and then specify filename.c along with a line number to set a breakpoint.
My question is: how would GDB, the OS, or possibly anything else know that it has to break at the specified line number (in filename.c) after we connect GDB to the running process?
What comes into play such that, when the process is run in debug mode and a breakpoint is set, execution stops (and waits for user input) at that point?
The same way that if your program stops or crashes at a particular point, the debugger can tell you where in the program that point is.
For both of these to work the program binary must contain additional debugging information that associates addresses in the program image with locations in the source code (source file and line number.)
To add a breakpoint at a particular line the debugger finds the program address closest to that line, modifies the copy of the executable in memory to insert a special "break" instruction at that location which will cause the program's execution to be interrupted, then "traces" the program's execution and waits for it to reach the breakpoint and stop.
For more details see http://eli.thegreenplace.net/2011/01/23/how-debuggers-work-part-1/ and http://www.howzatt.demon.co.uk/articles/SimplePTrace.html
I can't comment for the latest version of gdb - but many debuggers actually swap the assembly instruction at the desired breakpoint location (in memory) with an interrupt instruction. This "wakes up" the debugger which takes control at this point.
Using a substituted interrupt instruction means that the CPU can execute your program at full speed and "trip up" at the desired location.
Modern processors also provide hardware debugging support, however, such as dedicated debug registers for hardware breakpoints and watchpoints.
GDB is aware of your code: it knows all about it. When you set a breakpoint at a line, GDB looks up the address of the equivalent machine instruction: all of your code (as machine instructions) is loaded in memory, so each instruction of your code has an address.
So now GDB knows the address of the instruction at which you want to break. When you run your program, GDB uses ptrace to control it: it patches a trap instruction in at that address, and when the process executes it the kernel stops the process and hands control to GDB, which checks that the stop happened at the instruction you asked to break on.
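The mechanism the answers above describe can be illustrated with a small ptrace sketch for x86-64 Linux (this shows what gdb does in principle, not its actual source). ./target and BREAK_ADDR are assumptions; with a PIE executable and ASLR you would first have to compute the runtime address:

```cpp
// Plant an int3 breakpoint in a child process with ptrace, wait for the trap,
// inspect registers, then restore the original byte and continue.
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <cstdlib>

int main()
{
    const unsigned long BREAK_ADDR = 0x401136;   // hypothetical instruction address

    pid_t pid = fork();
    if (pid == 0)
    {
        ptrace(PTRACE_TRACEME, 0, nullptr, nullptr); // let the parent trace us
        execl("./target", "./target", (char*)nullptr);
        _exit(127);
    }

    int status;
    waitpid(pid, &status, 0);                        // child stops after execve

    // Save the original word and patch its low byte with int3 (0xCC).
    long orig = ptrace(PTRACE_PEEKTEXT, pid, (void*)BREAK_ADDR, nullptr);
    long patched = (orig & ~0xFFL) | 0xCC;
    ptrace(PTRACE_POKETEXT, pid, (void*)BREAK_ADDR, (void*)patched);

    ptrace(PTRACE_CONT, pid, nullptr, nullptr);      // run until the trap fires
    waitpid(pid, &status, 0);                        // SIGTRAP: breakpoint hit

    struct user_regs_struct regs;
    ptrace(PTRACE_GETREGS, pid, nullptr, &regs);
    std::printf("breakpoint hit, rip=0x%llx rax=0x%llx\n",
                (unsigned long long)regs.rip, (unsigned long long)regs.rax);

    // Undo the patch, rewind rip past the int3, and let the program continue.
    ptrace(PTRACE_POKETEXT, pid, (void*)BREAK_ADDR, (void*)orig);
    regs.rip -= 1;
    ptrace(PTRACE_SETREGS, pid, nullptr, &regs);
    ptrace(PTRACE_CONT, pid, nullptr, nullptr);

    waitpid(pid, &status, 0);                        // wait for the child to exit
    return 0;
}
```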

Locating segmentation fault for multithread program running on cluster

It's quite straightforward to use gdb to locate a segmentation fault while running a simple program in interactive mode. But consider that we have a multithreaded program - written with pthreads - submitted to a cluster node (via the qsub command), so we have no interactive session.
How can we locate the segmentation fault? I am looking for a general approach, a program, or a test tool. I cannot provide a reproducible example because the program is really big and crashes on the cluster in some unknown situations.
I need to find the problem in this difficult situation because the program runs correctly on the local machine with any number of threads.
The "normal" approach is to have the environment produce a core file and get hold of those. If this isn't an option, you might want to try installing a signal handler for SIGSEGV which obtains, at least, a stack trace dumped somewhere. Of course, this immediately leads to the question "how to get a stack trace" but this is answered elsewhere.
The easiest approach is probably to get hold of a core file. Assuming you have a similar machine where the core file can be read, you can use gdb program corefile to debug the program program which produced the core file corefile: you should be able to look at the different threads, their data (to some extent), etc. If you don't have a suitable machine it may be necessary to cross-compile gdb to match the hardware of the machine where the program was run.
I'm a bit confused about the statement that the core files are empty: You can set the limits for core files using ulimit on the shell. If the size for cores is set to zero it shouldn't produce any core file. Producing an empty one seems odd. However, if you cannot change the limits on your program you are probably down to installing a signal handler and dumping out a stack trace from the offending thread.
Thinking of it, you may be able to put the program to sleep in the signal handler and attach to it using a debugger, assuming you can run a debugger on the corresponding machine. You would determine the process ID (using, e.g., ps -elf | grep program) and then attach to it using
gdb program pid
I'm not sure how best to put a program to sleep from within the program itself, though (possibly by raising SIGSTOP, or simply spinning, inside the SIGSEGV handler); something along the lines of the sketch below should work.
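Here is a hedged sketch of such a SIGSEGV handler: it dumps a backtrace to stderr and then parks the process so you can attach gdb to the printed PID from another shell. Note that backtrace() and fprintf() are not strictly async-signal-safe, so treat this as a diagnostic hack rather than production code; compile with -g -rdynamic to get readable symbol names:

```cpp
#include <csignal>
#include <cstdio>
#include <execinfo.h>
#include <unistd.h>

// On SIGSEGV: print a backtrace of the offending thread, then wait forever
// so a debugger can be attached ("gdb ./program <pid>").
static void segv_handler(int sig)
{
    void* frames[64];
    int n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);  // symbol names need -rdynamic

    std::fprintf(stderr, "SIGSEGV in pid %d, waiting for a debugger...\n", (int)getpid());
    for (;;) pause();                                // park the thread; attach with gdb
}

int main()
{
    struct sigaction sa = {};
    sa.sa_handler = segv_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGSEGV, &sa, nullptr);

    int* p = nullptr;
    return *p;                                       // deliberate crash to demo the handler
}
```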
That said, I assume you already tried running your program on your local machine...? Some problems are more fundamental and don't need a distributed system with many threads running on each node to show up. This is, obviously, not a replacement for the approaches above.

What exactly does a debugger do?

I've stumbled onto a very interesting issue where a function (has to deal with the Windows clipboard) in my app only works properly when a breakpoint is hit inside the function. This got me wondering, what exactly does the debugger do (VS2008, C++) when it hits a breakpoint?
Without directly answering your question (since I suspect the debugger's internal workings may not really be the problem), I'll offer two possible reasons this might occur that I've seen before:
First, your program does pause when it hits a breakpoint, and often that delay is enough time for something to happen (perhaps in another thread or another process) that has to happen before your function will work. One easy way to verify this is to add a pause for a few seconds beforehand and run the program normally. If that works, you'll have to look for a more reliable way of finding the problem.
Second, Visual Studio has historically (I'm not certain about 2008) over-allocated memory when running in debug mode. So, for example, if you have an array of int[10] allocated, it should, by rights, get 40 bytes of memory, but Visual Studio might give it 44 or more, presumably in case you have an out-of-bounds error. Of course, if you DO have an out-of-bounds error, this over-allocation might make it appear to be working anyway.
Typically, for software breakpoints, the debugger places an interrupt instruction at the location you set the breakpoint at. This transfers control of the program to the debugger's interrupt handler, and from there you're in a world where the debugger can decide what to do (present you with a command prompt, print the stack and continue, what have you.)
On a related note, "This works in the debugger but not when I run without a breakpoint" suggests to me that you have a race condition. So if your app is multithreaded, consider examining your locking discipline.
It might be a timing / thread synchronization issue. Do you do any multimedia or multithreading stuff in your program?
The reason your app only works properly when a breakpoint is hit might be that you have some watches with side effects still in your watch list from previous debugging sessions. When you hit the break point, the watch is executed and your program behaves differently.
http://en.wikipedia.org/wiki/Debugger
A debugger essentially allows you to step through your source code and examine how the code is working. If you set a breakpoint and run in debug mode, your code will pause at that breakpoint and allow you to step into the code. This has some distinct advantages. First, you can see what the values of your variables are in memory. Second, it allows you to make sure your code is doing what you expect it to do without having to add a whole ton of print statements. And third, it lets you make sure the logic is working the way you expect it to work.
Edit: A debugger is one of the more valuable tools in my development toolbox, and I'd recommend that you learn and understand how to use the tool to improve your development process.
I'd recommend reading the Wikipedia article for more information.
The debugger just halts execution of your program when it hits a breakpoint. If your program is working okay when it hits the breakpoint, but doesn't work without the breakpoint, that would indicate to me that you have a race condition or another threading issue in your code. The breakpoint is stopping the execution of your code, perhaps allowing another process to complete normally?
It stops the program counter for your process (the one you are debugging), shows the current values of your variables, and uses those values at that moment to evaluate expressions.
You must take into account, that if you edit some variable value when you hit a breakpoint, you are altering your process state, so it may behave differently.
Debugging is possible because the compiler inserts debugging information (such as function names, variable names, etc.) into your executable. It's possible not to include this information.
Debuggers sometimes change the way the program behaves in order to work properly.
I'm not sure about Visual Studio, but in Eclipse, for example, Java classes are not loaded the same way when run inside the IDE as when run outside of it.
You may also have a race condition, and the debugger stopping one of the threads means that when you continue, the program happens to run under the right conditions.
More info on the program might help.
On Windows there is another difference caused by the debugger. When your program is launched by the debugger, Windows uses a different memory manager (heap manager, to be exact) for your program. Instead of the default heap manager your program now gets the debug heap manager, which differs in the following points:
it initializes allocated memory to a pattern (0xCDCDCDCD comes to mind but I could be wrong)
it fills freed memory with another pattern
it overallocates heap allocations (like a previous answer mentioned)
All in all it changes the memory usage patterns of your program, so if you have a memory-trashing bug somewhere its behavior might change.
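A hedged illustration of why this matters: the exact fill values depend on which heap you end up with (the MSVC debug CRT uses patterns such as 0xCD for fresh and 0xDD for freed memory, while the OS debug heap used when a process is launched under a debugger uses other values such as 0xBAADF00D), but the effect is the same, namely that an uninitialized-read or use-after-free bug behaves differently:

```cpp
#include <cstdint>
#include <cstdio>

// Illustrative only: reading uninitialized memory is undefined behavior, and
// the observed value depends on build type and on whether the process runs
// under a debugger. In an MSVC debug build, fresh heap memory is typically
// filled with 0xCD bytes and freed memory with 0xDD bytes.
int main()
{
    uint32_t* p = new uint32_t;          // deliberately left uninitialized
    std::printf("fresh: 0x%08X\n", *p);  // debug CRT: often 0xCDCDCDCD
    *p = 1234;
    delete p;
    // Reading *p here would be a use-after-free; with the debug CRT the block
    // is overwritten with 0xDD bytes, so the bug is more likely to be noticed
    // than to "work by accident" as it might in a release build.
    return 0;
}
```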
Two useful tricks:
Use PageHeap to catch memory accesses beyond the end of allocated blocks
Build using the /RTCsu switch (older Visual C++ compilers: /GZ). This will initialize the memory of all your local variables to a nonzero bit pattern and will also raise a runtime error when an uninitialized local variable is accessed.