I have a nondeterministic memory corruption problem. Because it's not always the same address, and it occurs only rarely, I can't simply watchpoint it with gdb.
The problem is a value changes between point A and point B in my program. The only thing that is supposed to change it is point C, which does not run in that time (at least not for the specific instance that experiences the unexpected modification).
What I'd like to do is something like mprotect the value at point A, so that the machine traps if it is modified, and unprotect it again around the intentional modification at point C. Of course, mprotect is not meant to be taken literally, as I need this to work with word granularity.
Simply watchpointing at point A manually with gdb is far too much toil; the problem only shows up about once in a thousand times.
Ideally, I would like a stack trace at the point that modifies it.
Any ideas?
Update: I just found out about rr http://rr-project.org/, a tool that can allegedly "determinize" non-determinism problems. I'm going to give it a go.
Update2: Well that was a short trip:
[FATAL /build/rr-jR8ti5/rr-4.1.0/src/PerfCounters.cc:167:init_attributes() errno: 0 'Success']
-> Microarchitecture `Intel Merom' currently unsupported.
You are experiencing undefined behavior that is being caused somewhere else in your program, and debugging this is really hard.
Since you are apparently on Linux, use valgrind; it will help you a lot. If you are on neither Linux nor OS X (which valgrind also supports), search for equivalent memory-error detection software for your system.
I found that it isn't that difficult to script gdb in a scripting language that you know (in my case, Ruby). This cuts down on the need to learn how to make proper gdb scripts!
The API between the target program and the script is that the target program has a blank function called my_breakpoint that accepts a single machine word as an argument. Calling my_breakpoint(1); my_breakpoint(addr); adds an address to the watch list while the same thing with the constant 2 removes an address from the watch list.
To use this, you need to start gdbserver 127.0.0.1:7117 myapp myargs, and then launch the following script. When the script detects a problem, it disconnects cleanly from gdbserver so that you can reconnect another instance of gdb with gdb -ex 'target remote 127.0.0.1:7117' and off you go.
Note that using software watchpoints like this is extremely slow; maybe someday something like this can be implemented as a valgrind tool.
#!/usr/bin/env ruby
system("rm -f /tmp/gdb_i /tmp/gdb_o");
system("mkfifo /tmp/gdb_i /tmp/gdb_o");
system("killall -w gdb");
system("gdb -ex 'target remote 127.0.0.1:7117' </tmp/gdb_i >/tmp/gdb_o &");
$fo = File.open("/tmp/gdb_i", "wb");
$fi = File.open("/tmp/gdb_o", "rb");

def gdb_put(l)
  $stderr.puts("gdb_out: #{l}");
  $fo.write((l + "\n"));
  $fo.flush;
end

gdb_put("b my_breakpoint");
gdb_put("set can-use-hw-watchpoints 0");
gdb_put("c");

$state = 0;
$watchpoint_ctr = 1; # start at 1 so the 1st watchpoint gets 2, etc.; this is because the breakpoint gets 1.
$watchpoint_nr = {};

def gdb_got_my_breakpoint(x)
  $stderr.puts("my_breakpoint #{x}");
  if ((x == 1) || (x == 2))
    raise if ($state != 0);
    $state = x;
    gdb_put("c");
  else
    if ($state == 1)
      raise if ($watchpoint_nr[x].nil?.!);
      $watchpoint_nr[x] = ($watchpoint_ctr += 1);
      # Cast to (long *) so gdb watches a full machine word; a bare *ADDR is treated as an int.
      gdb_put("watch *(long *) #{x}");
    elsif ($state == 2)
      nr = $watchpoint_nr[x];
      if (nr.nil?)
        $stderr.puts("WARNING: ignoring delete request for watchpoint #{x} not previously established");
      else
        gdb_put("delete #{nr}");
        $watchpoint_nr.delete(x);
      end
    end
    $state = 0;
    gdb_put("info breakpoints");
    $stderr.puts("INFO: my current notion: #{$watchpoint_nr}");
    gdb_put("c");
  end
end

def gdb_got(l)
  t = l.split;
  if ((t[0] == "Breakpoint") && (t[2] == "my_breakpoint"))
    gdb_got_my_breakpoint(t[3][3..-2].to_i);
  end
  if (l.start_with?("Program received signal ") || l.start_with?("Watchpoint "))
    gdb_put("disconnect");
    gdb_put("q");
    sleep;
  end
end

while (l = $fi.gets)
  l = l.strip;
  $stderr.puts("gdb_inp: #{l}");
  gdb_got(l);
end
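The watch-list bookkeeping in the Ruby script can be separated from the FIFO plumbing, which makes it easier to test on its own. Below is a minimal Python sketch of that bookkeeping, not the author's code: WatchList, STOP_RE and the method names are hypothetical. It assumes the my_breakpoint parameter has an integer type (so gdb prints it in decimal), that no other breakpoints are created after my_breakpoint (so watchpoint numbers start at 2), and that `watch *(long *) ADDR` covers one machine word, which holds on LP64 Linux.

```python
import re

# gdb's stop line for the marker function, e.g.
#   Breakpoint 1, my_breakpoint (x=140737488346112) at main.c:3
# Assumes the parameter has an integer type, so gdb prints it in decimal.
STOP_RE = re.compile(r"^Breakpoint \d+, my_breakpoint \(\w+=(\d+)\)")

ADD, REMOVE = 1, 2  # opcodes passed in the first my_breakpoint() call of a pair


class WatchList:
    """Translate my_breakpoint(op); my_breakpoint(addr) pairs into gdb commands."""

    def __init__(self, first_watchpoint_number=2):
        self.pending_op = None                      # ADD/REMOVE awaiting its address
        self.next_number = first_watchpoint_number  # breakpoint 1 is my_breakpoint
        self.numbers = {}                           # watched address -> gdb number

    def on_call(self, value):
        """Return the gdb commands to send after my_breakpoint(value) stops."""
        if self.pending_op is None:
            if value not in (ADD, REMOVE):
                raise ValueError(f"expected opcode 1 or 2, got {value}")
            self.pending_op = value
            return ["c"]
        op, self.pending_op = self.pending_op, None
        if op == ADD:
            if value in self.numbers:
                raise ValueError(f"address {value:#x} is already watched")
            self.numbers[value] = self.next_number
            self.next_number += 1
            return [f"watch *(long *) {value}", "c"]
        if value in self.numbers:  # REMOVE; unknown addresses are ignored
            return [f"delete {self.numbers.pop(value)}", "c"]
        return ["c"]

    def on_line(self, line):
        """Return commands for one line of gdb output ([] if it is not a marker stop)."""
        m = STOP_RE.match(line)
        return self.on_call(int(m.group(1))) if m else []
```

Each returned command would be written to gdb's input at the point where the Ruby script calls gdb_put; keeping the state machine free of I/O is what lets it be checked without a running gdbserver.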
SIGSEGV SEGV_MAPERR at 0x00000008
0 libpjsua2.so 0x56585a88 pj::Call::getInfo() const
1 libpjsua2.so 0x56546b44 std::allocator<pj::CallMediaInfo>::allocator()
I'm using pjsip in one of my hobby projects (which complies with the GPL). Above you can see the stack trace received from Crashlytics. I'm using the Java wrapper for pjsip.
A lot of users (50%) are affected by this error; however, I'm not able to reproduce it on my local devices.
I'm not sure, but I suspect that the following Java method, which calls into C++ via JNI, leads to the error:
public void notifyCallState(MyCall call) {
    if (currentCall == null || call.getId() != currentCall.getId())
        return;

    CallInfo ci;
    try {
        ci = call.getInfo();
    } catch (Exception e) {
        ci = null;
    }

    Message m = Message.obtain(handler, MSG_TYPE.CALL_STATE, ci);
    m.sendToTarget();

    if (ci != null && ci.getState() == pjsip_inv_state.PJSIP_INV_STATE_DISCONNECTED) {
        currentCall = null;
    }
}
The code snippet is taken from the examples that come with the pjsua download. Link to http repo. My code is the same. Any help is highly appreciated.
From the stack trace, it looks like call is null and getId accesses offset 0x8 from it, which would explain the fault at 0x00000008.
If that's really the case, the fix is to make sure notifyCallState isn't called with a null argument, or to check for it inside the method, e.g.:
if (call == null || currentCall == null || call.getId() != currentCall.getId())
    return;
Your program is most likely hitting some sort of memory corruption, most likely of heap memory. The following observations point towards that:
"I'm not able to reproduce it on my local devices." This is a common symptom of memory corruption.
The stack trace includes std::allocator, which indicates that the program was terminated while using (creating, deleting or accessing) heap memory.
Recommendation
We should review the code logic and check whether this program uses the Java/C++ interop correctly. I do not know much about this, but it looks like your program logic does involve Java/C++ interaction. If we are lucky, we might find something obvious here and be done.
If the stack trace is an after-effect of something else, then we are in trouble, and we might have to take the approach suggested in the posts below.
Windows Platform
https://stackoverflow.com/a/22074401/2724703
Linux Platform
https://stackoverflow.com/a/22658693/2724703
Android Platform
https://stackoverflow.com/a/22663360/2724703
You may want to refer to the above posts to get an idea of how to approach such problems. As far as I understand, the Android platform does not have dynamic analysis tools, so you might have to use special builds (debug builds or builds with additional logging) of your library.
I hope the above information is useful and gives you some guidelines for approaching your problem.
Is there any way to print the code that is being executed to a text file, for debugging purposes?
for example:
if (i == 1)
{
    a = true;
}
else
{
    a = false;
}
So when i = 1 we print to a text file:
if (i == 1)
{
    a = true;
}
else
and when i != 1 we print to the text file:
if (i == 1)
else
{
    a = false;
}
I am not saying that this is good practice. I know that gdb and other tools are much better for debugging code, so please don't get mad if you think that it's an awful idea. I was just wondering whether it can be done. It would be like adding a printf after every line, so that only the lines that get executed are printed. No thread safety or anything like that.
I think what you want has nothing to do with debugging in the first place, but rather with unit testing and test coverage:
You'll need to create unit tests (e.g. using googletest) for your code and compile it with code coverage options switched on (e.g. --coverage for GCC). Then you can use a tool to create a coverage report (e.g. lcov/genhtml for the mentioned toolchain).
The unit tests control the input for your cases (i == 1 and i != 1).
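To make concrete what a coverage tool records, here is a minimal sketch in Python rather than C: CPython exposes a per-line hook, sys.settrace, which plays roughly the role that GCC's --coverage instrumentation plays for C. The names executed_lines and example are illustrative only. The sketch records which lines of one function ran, numbered relative to its def line.

```python
import sys


def executed_lines(func, *args):
    """Call func(*args) and return the line numbers (relative to its def) that ran."""
    code = func.__code__
    lines = []

    def tracer(frame, event, arg):
        # Only trace frames running func's own code; ignore every other call.
        if frame.f_code is not code:
            return None
        if event == "line":
            lines.append(frame.f_lineno - code.co_firstlineno)
        return tracer  # keep receiving events for this frame

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)  # always remove the hook, even if func raises
    return lines


def example(i):
    if i == 1:        # relative line 1
        a = True      # relative line 2
    else:
        a = False     # relative line 4
    return a          # relative line 5
```

executed_lines(example, 1) yields a list containing line 2 but not line 4, and executed_lines(example, 0) the reverse, which is the per-branch information the question asks for. coverage.py's tracers are built on this same kind of hook.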
For debugging purposes I would say it is not practical. Yes, you can do a printf before/after each line of execution, but that would just clog up your program. Also, if you're talking about debugging the execution of loops, you will end up printing a bunch of junk over and over again and would have to look forever to find potential bugs. In short, use breakpoints.
However, from a theoretical standpoint, it is possible to create a program that outputs itself. This is a little different from what you want because you only need parts of your program, but my best guess is that with a little modification it can be done.
The problem I am trying to solve is that I want to dynamically compute the length of an instruction given its address (from within GDB) and set that length as the value of a variable. The challenge is that I don't want any extraneous output printed to the console (e.g. disassembled instructions, etc.).
My normal approach to this is to do x/2i ADDR, then subtract the two addresses. I would like to achieve the same thing automatically; however, I don't want anything printed to the console. If I could disable console output then I would be able to do this by doing x/2i ADDR, followed by $_ - ADDR.
I have not found a way to disable the output of a command in GDB. If you know such a way then please tell me! However, I have discovered interpreter-exec and GDB/MI. A quick test shows that doing x/2i works on GDB/MI, and the value of $_ computed by the MI interpreter is shared with the console interpreter. Unfortunately, this approach also spits out a lot of output.
Does anyone know a way to either calculate the length of an instruction without displaying anything, or how to disable the output of interpreter-exec, thus allowing me to achieve my goal? Thank you.
I'll give an arguably cleaner and more extensible solution that's not really shorter. It implements $instn_length() as a new GDB convenience function.
Save this to instn-length.py:
import gdb

def instn_length(addr_expr):
    # x/2i sets $_ to the address of the second instruction; to_string=True
    # captures the disassembly instead of printing it to the console.
    gdb.execute('x/2i ' + addr_expr, to_string=True)
    return int(gdb.parse_and_eval('$_')) - int(gdb.parse_and_eval(addr_expr))

class InstnLength(gdb.Function):
    def __init__(self):
        super(InstnLength, self).__init__('instn_length')

    def invoke(self, addr):
        return instn_length(str(int(addr)))

InstnLength()
Then run
$ gdb -q -x instn-length.py /bin/true
Reading symbols from /usr/bin/true...Reading symbols from /usr/lib/debug/usr/bin/true.debug...done.
done.
(gdb) start
Temporary breakpoint 1 at 0x4014c0: file true.c, line 59.
Starting program: /usr/bin/true
Temporary breakpoint 1, main (argc=1, argv=0x7fffffffde28) at true.c:59
59 if (argc == 2)
(gdb) p $instn_length($pc)
$1 = 3
(gdb) disassemble /r $pc, $pc + 4
Dump of assembler code from 0x4014c0 to 0x4014c4:
An alternative implementation of instn_length() is to use the gdb.Architecture.disassemble() method in GDB 7.6+:
def instn_length(addr_expr):
    addr = int(gdb.parse_and_eval(addr_expr))
    arch = gdb.selected_frame().architecture()
    # disassemble() returns a list of dicts with 'addr', 'asm' and 'length' keys.
    return arch.disassemble(addr)[0]['length']
I have found a suitable solution; however, shorter solutions would be preferred. This solution sets the logging file to /dev/null, sets it to be overwritten if it exists, and then temporarily redirects the console output to the log file.
define get-in-length
  set logging file /dev/null
  set logging overwrite on
  set logging redirect on
  set logging on
  x/2i $arg0
  set logging off
  set logging redirect off
  set logging overwrite off
  set $_in_length = ((unsigned long) $_) - ((unsigned long) $arg0)
end
This solution was heavily inspired by another question's answer: How to get my program name in GDB when writting a "define" script?.
I'm wondering if it's possible to create a script that will continue the program's execution (after a break) step by step, based on the memory address value.
So, if I'm tracing a function and it goes into a high memory value, I'd call the gdb script, which would keep stepping until the memory value is below a set value, and then it would break again.
I'm very new to gdb and still reading the manual/tutorials, but I'd like to know if my goal is possible :) And if you could point me in the right direction, even better ;)
Thanks!
Edit, updated with pseudocode:
while (1) {
    cma = getMemoryAddressForCurrentInstruction();
    if (cma > 0xdeadbeef) {
        stepi;
    } else {
        break;
    }
}
You're talking about the Program Counter (sometimes called the instruction pointer). It's available in gdb as $pc. Your pseudocode can be translated into this actual gdb command:
while $pc > 0xdeadbeef
  stepi
end
It'll be slow, since it's starting and stopping the program for every instruction, but as far as I know there's no fast way to do it if you don't know exactly what address you're looking for. If you do, then you can just set a breakpoint there:
break *0xf0abcdef
cont
This runs until the program counter hits 0xf0abcdef.
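If the stepping loop needs a safety limit, the same idea can be written with gdb's Python API. Here is a minimal sketch, assuming a gdb built with Python support; step_while_above and its max_steps limit are hypothetical. Like the question's pseudocode, it steps while the program counter is above a threshold. The program-counter reader and the single-step action are passed in as callables, so the loop logic can be checked without gdb; the trailing comment shows how they might be wired to gdb.parse_and_eval and gdb.execute.

```python
def step_while_above(read_pc, stepi, threshold, max_steps=1_000_000):
    """Single-step while the program counter is above `threshold`.

    read_pc:   callable returning the current program counter as an int
    stepi:     callable that executes exactly one machine instruction
    max_steps: safety limit so a runaway loop cannot step forever
    Returns the number of instructions stepped.
    """
    steps = 0
    while read_pc() > threshold:
        if steps >= max_steps:
            raise RuntimeError(
                f"pc still above {threshold:#x} after {max_steps} steps")
        stepi()
        steps += 1
    return steps


# Inside gdb, after `source step_while.py`, a `python` command could call:
#   step_while_above(lambda: int(gdb.parse_and_eval("$pc")),
#                    lambda: gdb.execute("stepi", to_string=True),
#                    0xdeadbeef)
```

Passing to_string=True to gdb.execute keeps each stepi from printing, which also helps with the slowness of stepping one instruction at a time.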
How can I detect file descriptor leaks, and the corresponding stacks, on Solaris? On Linux, valgrind reports this information well. Are there any similar tools on Solaris?
On Linux, you can use strace to log all file open and close calls. Then you can analyse the log for resource leaks: the number of open calls should match the number of close calls. If it doesn't, you have a leak. On Solaris, a similar tool is DTrace.
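The log comparison described above can be automated by pairing each successful open with a later close of the same descriptor number. Here is a minimal Python sketch; find_unclosed is a hypothetical helper. It assumes strace's plain `name(args) = result` line format with no `<... resumed>` lines, and it only tracks open, openat, creat and close. Descriptors from dup, socket, pipe and similar calls are not tracked.

```python
import re

# Open-family calls, e.g.
#   open("/etc/passwd", O_RDONLY) = 3
#   openat(AT_FDCWD, "/tmp/x", O_RDONLY|O_CLOEXEC) = 4
#   open("/missing", O_RDONLY) = -1 ENOENT (No such file or directory)
_OPEN_RE = re.compile(r'\b(?:open|openat|creat)\((?P<args>.*)\)\s+=\s+(?P<ret>-?\d+)')
# close(3) = 0
_CLOSE_RE = re.compile(r'\bclose\((?P<fd>\d+)\)\s+=\s+(?P<ret>-?\d+)')
# First double-quoted argument (the path), allowing escaped quotes inside it.
_PATH_RE = re.compile(r'"((?:[^"\\]|\\.)*)"')


def find_unclosed(lines):
    """Return {fd: path} for descriptors opened but never closed in the trace."""
    open_fds = {}
    for line in lines:
        m = _OPEN_RE.search(line)
        if m:
            fd = int(m.group("ret"))
            if fd >= 0:  # a negative result means the open failed
                path = _PATH_RE.search(m.group("args"))
                open_fds[fd] = path.group(1) if path else "?"
            continue
        m = _CLOSE_RE.search(line)
        if m and int(m.group("ret")) == 0:  # only successful closes release the fd
            open_fds.pop(int(m.group("fd")), None)
    return open_fds
```

Usage might look like `strace -o trace.txt -e trace=open,openat,creat,close ./myapp`, followed by `with open("trace.txt") as f: print(find_unclosed(f))`. Successful calls in Solaris truss output use a similar `name(args) = result` shape, so the sketch may work there too, but that is untested.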
On Solaris, you can look at the currently open file descriptors of a process simply by using the pfiles command. If you want to track files being opened and closed, truss (the Solaris equivalent of strace) comes to mind, with a filter for file-related syscalls (truss -t open,close; note that there are other calls that create file descriptors as well).
If you find that the pfiles output grows, first identify whether what you're leaking is ordinary files or things like sockets or pipes. If it's leaking ordinary files, then a DTrace script can be used. The following is a starting point for your own experiments; I currently don't have a Solaris system at hand to try it out and refine it:
#!/usr/sbin/dtrace -s
syscall::open:entry { self->t = ustack(); }
syscall::open:return /arg0 >= 0/ { trackedfds[arg0] = self->t; }
syscall::open:return { self->t = 0; }
syscall::close:entry { self->t = arg0; }
syscall::close:return /arg0 >= 0/ { trackedfds[self->t] = 0; }
syscall::close:return { self->t = 0; }
END { printa(trackedfds); }
This builds an associative array indexed by file descriptor number whose contents are the user-space stack trace at the time of the open() system call. On a successful close, the entry for the given file descriptor number is discarded, and when the program exits (or the script is stopped), the remaining contents of the associative array are printed. If anything is left, that's a candidate for a leak.
Note that the END {} probe might not be the correct place for this; proc:::exit or something like it may be required. It depends on exactly when it triggers: before or after the cleanup done at program teardown (exiting or killing a program closes all its file descriptors, which would erase the trackedfds[] array). That's why I said above that this is a starting point; I can't check without a Solaris system.