Debugging an assertion with gdb shows weird std::string size - c++

I have a problem with an assertion in a C++ program.
HA_Archive & HA_Archive::operator << (const string & str) {
buffer[wcursor] = HA_TYPE_STRING;
wcursor++;
unsigned size = str.size();
CASSERT((bufferSize > wcursor + size),"buffer exceeds the maximum");
CASSERT is a simple assert, and that is where the program fails.
The program left a core dump, which I debugged with gdb, and I found something strange:
Program terminated with signal 6, Aborted.
#0 0xb7766424 in __kernel_vsyscall ()
(gdb) bt
#0 0xb7766424 in __kernel_vsyscall ()
#1 0xb6cd1cb1 in raise () from /lib/libc.so.6
#2 0xb6cd33e8 in abort () from /lib/libc.so.6
#3 0xb6ccb58c in __assert_fail () from /lib/libc.so.6
#4 0x086c6dbd in HA_Archive::operator<< (this=0xb2610fb8, str=@0xb49e1f08) at HA_Archive.cxx:94
#5 0x0849b4d3 in PortDriver::serialize (this=0xb49e1ed8, ar=@0xb2610fb8) at PortDriver.cxx:624
#6 0x0838ed80 in PortSession::serialize (this=0xb49e1630, ar=@0xb2610fb8) at PortSession/PortSession.h:71
(gdb) frame 4
#4 0x086c6dbd in HA_Archive::operator<< (this=0xb2610fb8, str=@0xb49e1f08) at HA_Archive.cxx:94
94 HA_Archive.cxx: No such file or directory.
in HA_Archive.cxx
(gdb) print str
$1 = (const string &) @0xb49e1f08: {static npos = 4294967295, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0xb322f9b4 "NOT-SET"}}
(gdb) print wcursor
$2 = 180
(gdb) print bufferSize
$3 = 4096
(gdb) print size
$4 = 171791040
Printing str, I can see that it contains "NOT-SET", which is OK, but when I print the variable size, which is str.size(), the value is huge! Obviously that is what makes the assertion fail, because bufferSize is 4096 and wcursor is only 180.
I am far from being an expert in gdb, so my first question is whether I am doing something wrong with it. Maybe size is not the real value at runtime?
My second question is: if gdb is showing the correct value of size, why do I see the string "NOT-SET" correctly when I print it, while the size is that huge number?
Thanks!

There are a few ways this can happen.
The string could really be that size, but the contents could have a nul character at str[7], which would cause GDB to stop printing it out.
Or maybe something has scribbled on your heap and has overwritten the memory location that stores the string's size, so although the contents are still only 7 bytes long the size member has been overwritten with garbage.
Or str could just be a dangling reference and the memory pointed to by _M_p still contains the string "NOT-SET" but the memory containing the size member has been re-used for something else.
I would try running under valgrind to ensure there are no buffer overruns that might be overwriting the member, or use-after-free errors.

Related

Using gdb to decode hex data to struct

I have a stream of hex data that is printed like
0x3a45 0x1234 0x0352 (in real far longer)
I know that it is the contents of a struct. Is there a way in gdb to map it onto the struct? gdb seems to accept only single values for this.
Like:
(gdb) print (myStruct) 0x3a45 0x1234 0x0352
$1 = { a = 3a, b = 45, f = 0x1234, c = 03, e = 52}
In this case it's very simple, but I have a complex struct and the hex string is far longer.
I think there are a couple of viable ways to do this in gdb.
The simplest way is to write the data into the inferior's memory somehow. It might look something like:
(gdb) set $mem = (unsigned char *) malloc(50) # number of bytes
(gdb) set $mem[0] = 0x72
(gdb) set $mem[1] = 0xff
# etc - you can find faster ways to do this
(gdb) print *(struct whatever *) $mem
Filling the memory is a pain, but this can be scripted. For example you can write a little shell script to convert the raw bytes into a sequence of set commands and then source it. Or you can just write a new gdb command in Python that automates it all.
gdb also has an extension to let one create an array on the command line, and do a kind of "reinterpret cast" on it. I found this method a bit less handy, because I could only make the array feature create arrays of int, not char. But anyhow, consider this little program:
struct x {
int a;
long b;
};
int main() {
struct x x = { 23, 97 };
return 0;
}
I start gdb and stop on the return, then examine the memory:
(gdb) p sizeof(int)
$1 = 4
(gdb) p sizeof(x)
$2 = 16
(gdb) x/4xw &x
0x7fffffffe240: 0x00000017 0x00007fff 0x00000061 0x00000000
(That second word is garbage because it is in the struct padding...)
Now we can recreate x by hand from the raw data:
(gdb) print {struct x}{0x17, 0x7fff, 0x61, 0}
$3 = {
a = 23,
b = 97
}
This expression uses two extensions to C expressions that gdb provides. First, {0x17, 0x7fff...} is a way to write an array. Second, {struct x} is a kind of "reinterpret cast": it reinterprets the raw bytes of the value as the named type.

unexpected results for memory allocation with c malloc function

I have to allocate memory for 4 pointers to float (a 2D array) over many iterations (6), but on the second iteration malloc gives me the same address for two allocations. Code:
#include <stdlib.h>

int main(void)
{
    int i = 0, a = 0;
    for (i = 0; i < 6; i++)
    {
        float **P_i = (float **) malloc(4 * sizeof(float *));
        for (a = 0; a < 4; a++) P_i[a] = (float *) calloc(4, sizeof(float));
        for (a = 0; a < 4; a++) free(P_i[a]);
        free(P_i);
    }
    return 0;
}
Debugging with gdb:
(gdb) print i
$42 = 1
(gdb) set $pos=0
(gdb) print P_i[$pos++]
$51 = (float *) 0x804d500
(gdb) print P_i[$pos++]
$52 = (float *) 0x804d148
(gdb) print P_i[$pos++]
$53 = (float *) 0x804d4e8
(gdb) print P_i[$pos++]
$54 = (float *) 0x804d500
P_i[0] and P_i[3] point to the same address, 0x804d500, and I can't find out why :/
I printed these between the first for(a=0;a<4;a++) loop and the second one (before freeing).
My guess is that gdb breaks on the last iteration of the loop, before the last calloc() call. If that's the case, P_i[3] still holds the address from the previous iteration.
By the way, it's hard to use gdb when there's more than one statement per line.
With the information available this can't be answered definitively, but let me try.
The code seems OK, and I can't reproduce your problem either.
You can't really put a breakpoint on a blank line; I guess gdb would put it on the next line, the one with free.
My guess is that your code was compiled with optimization enabled, which probably reordered things, so you cannot be sure where execution has stopped. Disable optimization and rebuild (with GCC that would be -O0). Or show us the disassembly (including the current PC where you print).
My run on Ubuntu with gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4, built with -O0 -g and stopped on the line with free (before it executed):
(gdb) print i
$1 = 0
(gdb) set $pos=0
(gdb) print P_i[$pos++]
$2 = (float *) 0x602040
(gdb) print P_i[$pos++]
$3 = (float *) 0x602060
(gdb) print P_i[$pos++]
$4 = (float *) 0x602080
(gdb) print P_i[$pos++]
$5 = (float *) 0x6020a0
(gdb) bt
#0 main () at malloc.c:12
(gdb) list
7 for(i=0;i<6;i++)
8 {
9 float** P_i=(float**) malloc(4*sizeof(float*));
10 for(a=0;a<4;a++) P_i[a]=(float*) calloc(4,sizeof(float));
11
12 for(a=0;a<4;a++) free(P_i[a]);
Does your source code exhibit the problem even if you build it separately (not as part of a larger program)? Do you have a custom calloc / malloc implementation? What does "nm your-executable|grep calloc" show? It should be something like this:
U calloc@@GLIBC_2.2.5

Segfault when using time.h

OK, I've tried just about everything I know to stop this program from crashing, but I just can't see why it crashes. I isolated the problem to code that uses ctime and made a small program to demonstrate what's wrong. This code compiles without a problem.
#include<iostream>
#include<ctime>
int main();
time_t getDay(time_t t);
int diffDay(time_t end,time_t begin);
int main()
{
time_t curTime=time(NULL); //Assign current time
time_t curDay=getDay(curTime); //Assign beginning of day
time_t yesterday=curDay-16*60*60; //Assign a time that's within yesterday
time_t dif=diffDay(curTime,yesterday); //Assign how many days are between yesterday and curTime
std::cout << "Cur Time: " << curTime << '\n'
<< "Cur Day: " << curDay << '\n'
<< "Yes Day: " << dif << '\n' << std::flush;
char a;
std::cin >> a; ///Program crashes after here.
return 0;
}
///Get beginning of day that t is a part of
time_t getDay(time_t t)
{
//Get current time
struct tm* loctim=localtime(&t);
if(loctim==0)
return 0;
//Set loctim to beginning of day
loctim->tm_sec=0;
loctim->tm_min=0;
loctim->tm_hour=0;
//Create a int from the new time
int reval=mktime(loctim);
//Free memory
delete loctim;
return reval;
}
///Calculate how many days are between begin and end
int diffDay(time_t end,time_t begin)
{
time_t eDay=getDay(end); //Get beginning of day end is a part of
time_t bDay=getDay(begin); //Get beginning of day begin is a part of
time_t dif=(eDay-bDay)/(24*60*60); //Get how many days (86400 seconds)
return dif;
}
Here is some text I got from debugging.
Call Stack
#0 77BC3242 ntdll!LdrLoadAlternateResourceModuleEx() (C:\Windows\system32\ntdll.dll:??)
#1 00000000 0x6d067ad3 in ??() (??:??)
#2 00000000 0x00000018 in ??() (??:??)
#3 77BC3080 ntdll!LdrLoadAlternateResourceModuleEx() (C:\Windows\system32\ntdll.dll:??)
#4 00000000 0x00000018 in ??() (??:??)
#5 77C60FCB ntdll!TpCheckTerminateWorker() (C:\Windows\system32\ntdll.dll:??)
#6 00000000 0x007f0000 in ??() (??:??)
#7 00000000 0x50000163 in ??() (??:??)
#8 00000000 0x00000018 in ??() (??:??)
#9 77C1AC4B ntdll!RtlReAllocateHeap() (C:\Windows\system32\ntdll.dll:??)
#10 00000000 0x007f0000 in ??() (??:??)
#11 00000000 0x50000163 in ??() (??:??)
#12 00000000 0x00000018 in ??() (??:??)
#13 77BC3080 ntdll!LdrLoadAlternateResourceModuleEx() (C:\Windows\system32\ntdll.dll:??)
#14 00000000 0x00000018 in ??() (??:??)
#15 769A9D45 msvcrt!malloc() (C:\Windows\syswow64\msvcrt.dll:??)
#16 769AF5D3 strcpy_s() (C:\Windows\syswow64\msvcrt.dll:??)
#17 769B2B18 open_osfhandle() (C:\Windows\syswow64\msvcrt.dll:??)
#18 00000000 0x00000018 in ??() (??:??)
#19 769B3C7D msvcrt!_get_fmode() (C:\Windows\syswow64\msvcrt.dll:??)
#20 769BA6A0 msvcrt!_fsopen() (C:\Windows\syswow64\msvcrt.dll:??)
#21 00000000 0xc3458a06 in ??() (??:??)
#22 00000000 0x00000000 in ??() (??:??)
Also here's another call stack from the same build.
#0 77BE708C ntdll!RtlTraceDatabaseLock() (C:\Windows\system32\ntdll.dll:??)
#1 00000000 0x6ccdaf66 in ??() (??:??)
#2 00000000 0x00000000 in ??() (??:??)
Is it some special build option? I was using -std=c++0x but decided to try the program without it and it still crashed. Thanks for any help, I've been trying to fix this all day.
I think that the problem is here:
struct tm* loctim=localtime(&t);
delete loctim;
localtime returns a pointer to a static buffer. You must not free it. Doing so causes undefined behaviour: some data are put into an inconsistent state, which may cause a crash at another place in the program that seems unrelated to the problem.
A nice way to find such problems is to run the program under valgrind. It gives you very accurate information about what is going wrong:
vlap:~/src $ valgrind ./a.out
==29314== Memcheck, a memory error detector
==29314== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==29314== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==29314== Command: ./a.out
==29314==
==29314== Invalid free() / delete / delete[] / realloc()
==29314== at 0x4C29E6C: operator delete(void*) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==29314== by 0x400D2A: getDay(long) (test.cpp:44)
==29314== by 0x400BEE: main (test.cpp:11)
==29314== Address 0x59f5560 is 0 bytes inside data symbol "_tmbuf"
==29314==
==29314== Invalid free() / delete / delete[] / realloc()
==29314== at 0x4C29E6C: operator delete(void*) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==29314== by 0x400D2A: getDay(long) (test.cpp:44)
==29314== by 0x400D4D: diffDay(long, long) (test.cpp:52)
==29314== by 0x400C13: main (test.cpp:13)
==29314== Address 0x59f5560 is 0 bytes inside data symbol "_tmbuf"
==29314==
==29314== Invalid free() / delete / delete[] / realloc()
==29314== at 0x4C29E6C: operator delete(void*) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==29314== by 0x400D2A: getDay(long) (test.cpp:44)
==29314== by 0x400D5D: diffDay(long, long) (test.cpp:53)
==29314== by 0x400C13: main (test.cpp:13)
==29314== Address 0x59f5560 is 0 bytes inside data symbol "_tmbuf"
==29314==
Cur Time: 1395580379
Cur Day: 1395529200
Yes Day: 1
a
==29314==
==29314== HEAP SUMMARY:
==29314== in use at exit: 0 bytes in 0 blocks
==29314== total heap usage: 12 allocs, 15 frees, 1,846 bytes allocated
==29314==
==29314== All heap blocks were freed -- no leaks are possible
==29314==
==29314== For counts of detected and suppressed errors, rerun with: -v
==29314== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 3 from 3)
You can't use delete, which is a C++ operator, to free the result of localtime(), which doesn't use C++ memory management. In any case, you don't actually need to release the value returned by localtime.
You can use cmd or the terminal to write the time to a file: on cmd, echo %time% > time.txt, and in a Linux terminal, date > time.txt.
You can run the command with system(command).
Then you read the file.

gdb, hp ux: getting message Couldn't find virtual table -- object may not be constructed yet

On HP-UX 11.23 on ia64, when debugging a particular code segment, I get this error when trying to access a particular object.
I was wondering if anyone faced it, and could help me make sense of it.
(gdb) p *rsp
$8 = {<> = {Couldn't find virtual table -- object may not be constructed yet.
(gdb) p rrc
$9 = (class iface::rrc::MeasurementMessage *) 0x0
(gdb) l
417 iface::cpr::PositionResponse &rspPtr,
418 bool &is3D)
419 {
420 iface::rrlp::PositionResponse *rsp = lt.getRrlpResponse();
421 iface::rrc::MeasurementMessage *rrc = lt.getRrc();
422 iface::lpp::PositionResponse *lpp = lt.getLppResponse();
423 const iface::util::GadShape *gad = 0;
424 iface::cpr::PositionRequest &req = lt.getCprRequest();
425 const iface::is801::MsRspLocation *cdma = lt.getMsRspLocation();
426
(gdb) bt
#0 eotd::fetchAndSetPosition (lt=@0xa76200, position=@0x65e2c640,
rspPtr=@0x3be9660, is3D=@0x65e2c580)
at /home/egpsbld/source/smlc47hpux/icursor/com/cps/eotd/utils.cpp:422
#1 0x200000007e7195b0:0 in eotd::P6Locator::compute (this=0x4076c0,
lt=@0xa76200)
These kinds of issues are common when you debug optimized code. The optimizer can reorder, merge, or eliminate local variables; the resulting behavior is the same, of course, but you will not see the data where and when you'd expect.
If that is the case, simply recompile your program with -O0.

Can I set a breakpoint on 'memory access' in GDB?

I am running an application through gdb and I want to set a breakpoint for any time a specific variable is accessed / changed. Is there a good method for doing this? I would also be interested in other ways to monitor a variable in C/C++ to see if/when it changes.
watch only breaks on write, rwatch lets you break on read, and awatch lets you break on read/write.
You can set read watchpoints on memory locations:
gdb$ rwatch *0xfeedface
Hardware read watchpoint 2: *0xfeedface
One limitation applies to the rwatch and awatch commands, though; you can't use gdb variables in their expressions:
gdb$ rwatch $ebx+0xec1a04f
Expression cannot be implemented with read/access watchpoint.
So you have to expand them yourself:
gdb$ print $ebx
$13 = 0x135700
gdb$ rwatch *0x135700+0xec1a04f
Hardware read watchpoint 3: *0x135700 + 0xec1a04f
gdb$ c
Hardware read watchpoint 3: *0x135700 + 0xec1a04f
Value = 0xec34daf
0x9527d6e7 in objc_msgSend ()
Edit: Oh, and by the way, watchpoints need either hardware or software support, and software watchpoints are much slower. To find out whether your setup supports hardware watchpoints, you can check the can-use-hw-watchpoints setting:
gdb$ show can-use-hw-watchpoints
Debugger's willingness to use watchpoint hardware is 1.
What you're looking for is called a watchpoint.
Usage
(gdb) watch foo: watch the value of variable foo
(gdb) watch *(int*)0x12345678: watch the value pointed to by an address, cast to whatever type you want
(gdb) watch a*b + c/d: watch an arbitrarily complex expression, valid in the program's native language
Watchpoints are of three kinds:
watch: gdb will break when a write occurs
rwatch: gdb will break when a read occurs
awatch: gdb will break in both cases
You may choose the more appropriate for your needs.
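Assuming the first answer means the address 0x135700 + 0xec1a04f (in C-like syntax, (char *)(0x135700 + 0xec1a04f)), then its command rwatch *0x135700+0xec1a04f is incorrect: unary * binds more tightly than +, so that expression dereferences 0x135700 and adds 0xec1a04f to the result. The correct syntax is rwatch *(0x135700+0xec1a04f).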
The lack of ()s there caused me a great deal of pain trying to use watchpoints myself.
I just tried the following:
$ cat gdbtest.c
int abc = 43;
int main()
{
abc = 10;
}
$ gcc -g -o gdbtest gdbtest.c
$ gdb gdbtest
...
(gdb) watch abc
Hardware watchpoint 1: abc
(gdb) r
Starting program: /home/mweerden/gdbtest
...
Old value = 43
New value = 10
main () at gdbtest.c:6
6 }
(gdb) quit
So it seems possible, but you do appear to need some hardware support.
Use watch to see when a variable is written to, rwatch when it is read and awatch when it is read/written from/to, as noted above. However, please note that to use this command, you must break the program, and the variable must be in scope when you've broken the program:
Use the watch command. The argument to the watch command is an expression that is evaluated. This implies that the variable you want to set a watchpoint on must be in the current scope. So, to set a watchpoint on a non-global variable, you must have set a breakpoint that will stop your program when the variable is in scope. You set the watchpoint after the program breaks.
In addition to what has already been answered/commented by asksol and Paolo M
At first I didn't understand why we need the cast. I read https://sourceware.org/gdb/onlinedocs/gdb/Set-Watchpoints.html, yet it wasn't intuitive to me.
So I did an experiment to make the result clearer:
Code: (Let's say that int main() is at Line 3; int i=0 is at Line 5 and other code.. is from Line 10)
int main()
{
int i = 0;
int j;
i = 3840; // binary 1111 0000 0000 (0xF00), to take endianness into account
// other code..
}
Then I started gdb with the executable.
In my first attempt, I set the watchpoint on the variable's address without a cast; these were the results:
Breakpoint 2 at 0x10040109b: file testing2.c, line 10.
(gdb) s
7 i = 3840;
(gdb) p i
$1 = 0
(gdb) p &i
$2 = (int *) 0xffffcbfc
(gdb) watch *0xffffcbfc
Hardware watchpoint 3: *0xffffcbfc
(gdb) s
[New Thread 13168.0xa74]
Thread 1 "testing2" hit Breakpoint 2, main () at testing2.c:10
10 b = a;
(gdb) p i
$3 = 3840
(gdb) p *0xffffcbfc
$4 = 3840
(gdb) p/t *0xffffcbfc
$5 = 111100000000
As we can see, the breakpoint I had set on line 10 was hit, but gdb did not stop for the watchpoint: although the variable i changed, the location being watched did not (due to endianness, it remained all 0s).
In my second attempt, I cast the address of the variable so that the watchpoint covers all sizeof(int) bytes. This time:
(gdb) p &i
$6 = (int *) 0xffffcbfc
(gdb) p i
$7 = 0
(gdb) watch *(int *) 0xffffcbfc
Hardware watchpoint 6: *(int *) 0xffffcbfc
(gdb) b 10
Breakpoint 7 at 0x10040109b: file testing2.c, line 10.
(gdb) i b
Num Type Disp Enb Address What
6 hw watchpoint keep y *(int *) 0xffffcbfc
7 breakpoint keep y 0x000000010040109b in main at testing2.c:10
(gdb) n
[New Thread 21508.0x3c30]
Thread 1 "testing2" hit Hardware watchpoint 6: *(int *) 0xffffcbfc
Old value = 0
New value = 3840
Thread 1 "testing2" hit Breakpoint 7, main () at testing2.c:10
10 b = a;
gdb broke because it detected that the value had changed.