gdb to debug double free not detected by valgrind (?) - c++

About once every three times I run my program, malloc reports a double free error; e.g.
myprogram(703,0xb06d9000) malloc: *** error for object 0x17dd0240: double free
*** set a breakpoint in malloc_error_break to debug
I've run the same code through valgrind more than a dozen times but it never reports a double free.
I ran the code through gdb with a breakpoint on malloc_error_break and (when the bug occurs) the error is always reported in a standard C++ library function. I isolated the parent function and ran it under valgrind in a unit test, but got no errors.
I think the parent function/standard C++ library is not to blame; it is simply freeing something it allocated that some other function in the parent program had already freed.
I've tried looking up which object is double freed but my gdb skills aren't up to finding the first object that was freed. Please help me find which object caused the first free, and additionally any help as to why my program generates this error. Thank you.
The parent function boils down to:
int i;
double px, py;
int start, finish;
std::string comment;
std::vector<double> x, y;
std::fstream myfile;
myfile.open("filename.txt", std::ios_base::in);
// Read header
std::getline(myfile, comment);
// Read data
while(!myfile.eof())
{
myfile >> comment >> start >> comment >> finish;
for(i = 0; i <= finish-start; i++)
{
myfile >> px >> py; // double free here
x.push_back(px);
y.push_back(py);
}
}
EDIT:
My data file is something like this:
Comment: My Data
start 33 end 36
10.2 139.0076
9.22616 141.584
8.62802 141.083
8.87098 141.813
start 33 end 35
300.354 405
301.698 404.029
303.369 403.953
start 33 end 35
336.201 148.07
334.616 147.243
334.735 146.09
The backtrace from gdb is
(gdb) backtrace
#0 0x93c2d4a9 in malloc_error_break ()
#1 0x93c28497 in szone_error ()
#2 0x93b52503 in szone_free ()
#3 0x93b5236d in free ()
#4 0x93b51f24 in localeconv_l ()
#5 0x93c18163 in strtod_l$UNIX2003 ()
#6 0x93c192e0 in strtod$UNIX2003 ()
#7 0x919b76e8 in std::__convert_to_v<double> ()
#8 0x919983cf in std::num_get<char, std::istreambuf_iterator<char, std::char_traits<char> > >::do_get ()
#9 0x91991671 in std::num_get<char, std::istreambuf_iterator<char, std::char_traits<char> > >::get ()
#10 0x9198d2dc in std::istream::operator>> ()
Just to reiterate, I need help to find which object was freed the first time, I'm not so interested in refactoring my code for this function - which I don't believe is causing the problem; unless you can find something catastrophic in it.
EDIT: Changed the example code.

You appear to be using Mac OSX (you should have divulged that fact :-)
There are several environment variables which can help you debug heap corruption.
In particular, MallocStackLoggingNoCompact looks very promising.
Here is what I see:
$ cat t.c
int main()
{
char *p = strdup("hello");
free(p);
free(p);
return 0;
}
$ gdb ./a.out
GNU gdb 6.3.50-20050815 (Apple version gdb-967) (Tue Jul 14 02:11:58 UTC 2009)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-apple-darwin"...Reading symbols for shared libraries ... done
(gdb) set env MallocStackLoggingNoCompact 1
(gdb) b malloc_error_break
Breakpoint 1 at 0x13f44a9
(gdb) r
Starting program: /Users/emp-russian/a.out
bash(22634) malloc: recording malloc stacks to disk using standard recorder
bash(22634) malloc: stack logging compaction turned off; size of log files on disk can increase rapidly
bash(22634) malloc: process 22536 no longer exists, stack logs deleted from /tmp/stack-logs.22536.a.out.8D3VZO
bash(22634) malloc: stack logs being written into /tmp/stack-logs.22634.bash.kjFTGa
arch(22634) malloc: recording malloc stacks to disk using standard recorder
arch(22634) malloc: stack logging compaction turned off; size of log files on disk can increase rapidly
arch(22634) malloc: stack logs deleted from /tmp/stack-logs.22634.bash.kjFTGa
arch(22634) malloc: stack logs being written into /tmp/stack-logs.22634.arch.8L8iLX
Reading symbols for shared libraries ++. done
Breakpoint 1 at 0x909b54a9
a.out(22634) malloc: recording malloc stacks to disk using standard recorder
a.out(22634) malloc: stack logging compaction turned off; size of log files on disk can increase rapidly
a.out(22634) malloc: stack logs deleted from /tmp/stack-logs.22634.arch.8L8iLX
a.out(22634) malloc: stack logs being written into /tmp/stack-logs.22634.a.out.s1qQRw
a.out(22634) malloc: *** error for object 0x100080: double free
*** set a breakpoint in malloc_error_break to debug
Breakpoint 1, 0x909b54a9 in malloc_error_break ()
(gdb) shell ls -l /tmp/stack-logs.22634.a.out.s1qQRw
total 16
-rw------- 1 emp-russian wheel 96 Sep 12 09:42 stack-logs.index
-rw------- 1 emp-russian wheel 208 Sep 12 09:42 stack-logs.stacks
(gdb) shell malloc_history 22634 0x100080
This first part of the history we don't actually care about:
Call [2] [arg=24]: thread_a0103720 |_dyld_start | dyldbootstrap::start(mach_header const*, int, char const**, long) | dyld::_main(mach_header const*, unsigned long, int, char const**, char const**, char const**) | dyld::initializeMainExecutable() | ImageLoader::runInitializers(ImageLoader::LinkContext const&) | ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int) | ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int) | ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int) | ImageLoaderMachO::doModInitFunctions(ImageLoader::LinkContext const&) | libSystem_initializer | __keymgr_initializer | _dyld_register_func_for_add_image | dyld::registerAddCallback(void (*)(mach_header const*, long)) | dwarf2_unwind_dyld_add_image_hook | calloc | _malloc_initialize | malloc_set_zone_name | malloc_zone_malloc | __disk_stack_logging_log_stack | reap_orphaned_log_files | opendir$INODE64$UNIX2003 | __opendir2$INODE64$UNIX2003 | telldir$INODE64$UNIX2003 | malloc | malloc_zone_malloc
Call [4] [arg=0]: thread_a0103720 |_dyld_start | dyldbootstrap::start(mach_header const*, int, char const**, long) | dyld::_main(mach_header const*, unsigned long, int, char const**, char const**, char const**) | dyld::initializeMainExecutable() | ImageLoader::runInitializers(ImageLoader::LinkContext const&) | ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int) | ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int) | ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int) | ImageLoaderMachO::doModInitFunctions(ImageLoader::LinkContext const&) | libSystem_initializer | __keymgr_initializer | _dyld_register_func_for_add_image | dyld::registerAddCallback(void (*)(mach_header const*, long)) | dwarf2_unwind_dyld_add_image_hook | calloc | _malloc_initialize | malloc_set_zone_name | malloc_zone_malloc | __disk_stack_logging_log_stack | reap_orphaned_log_files | closedir$UNIX2003 | _reclaim_telldir | free | malloc_zone_free
But here is the interesting stuff:
Call [2] [arg=6]: thread_a0103720 |0x1 | start | main | strdup | malloc | malloc_zone_malloc
Call [4] [arg=0]: thread_a0103720 |0x1 | start | main | free | malloc_zone_free
Call [4] [arg=0]: thread_a0103720 |0x1 | start | main | free | malloc_zone_free

I am currently trying to track down a "trying to free unallocated pointer" error, also on MacOS. My error is triggered about one in fifty runs (with identical input) of the program, and, of course, it has never happened when I've been running it in the debugger (sigh).
What I describe below is a convenient way to get a backtrace on that rare occasion when the bug is triggered, without having to interact with gdb on each run. It is an alternative to EmployedRussian's excellent tips.
Use the following gdb command file:
# malloc_error_break.gdb
break malloc_error_break
run # Add program arguments here
backtrace
Run gdb, executing the command file, like this:
gdb -x malloc_error_break.gdb --batch my_program
If the program runs without hitting the freeing problem, it will print "No stack" in response to the backtrace command, then gdb will exit (thanks to the --batch option).
If the program hits the freeing error, I will (hopefully!) get a stack before gdb exits.

This line:
myfile >> comment
will only read "Comment:" and not the entire line "Comment: My Data". The next thing read from myfile after that line will be "My", which will likely cause problems.
In particular, the first time through the outer loop, it will attempt to read the string "Data" into start, and will be unable to do so (since it can't be parsed as an integer). So the statement:
myfile >> comment >> start >> comment >> finish;
will abort and both start and finish will be left uninitialized.
Depending on the (arbitrary) uninitialized values of start and finish, this could easily make your inner loop infinite. Inserting an infinite number of elements into a vector is likely to lead to strange behavior, although I don't see the crash that you do... it just runs for a very long time and I kill it because I run out of patience.
However, when I work around this bug by removing "My Data" from the first line, I can run your program 10,000 times without a crash.
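The failure described above can be reproduced in isolation. The following is a minimal sketch, not the asker's code: read_record_header is a hypothetical helper that performs the same chained extraction and reports whether it succeeded, which the original loop never checks.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not from the question): tries to read one
// "start <int> end <int>" record header and reports success.
bool read_record_header(std::istream& in, int& start, int& finish)
{
    std::string label;
    // operator>> returns the stream, which converts to false as soon as
    // any extraction in the chain fails (for example "Data" read as an int).
    return static_cast<bool>(in >> label >> start >> label >> finish);
}
```

Fed the leftover text "My Data" followed by a real record, the helper returns false and the stream stays in a failed state, so every later extraction also fails until clear() is called. Fed "start 33 end 36" directly, it returns true with start == 33 and finish == 36.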

Try running the program under gdb, continue (cont) to the crash point, then print the backtrace (type bt) and see if that helps you pin down where the problem is. Note that you have to compile your program with debugging information (g++ -g) to get a legible backtrace.
EDIT:
On most machines, freeing or deleting a memory location does not set the pointer you are freeing to NULL. At the point where you free the memory for the second time, try adding a "= NULL", i.e.:
delete myPointer;
myPointer = NULL;
That does NOT fix the problem; however, it lets you confirm or rule out the possibility that the first free()/delete is that exact same line on a previous execution (say, if you're in a loop).
By the way, your code snippet doesn't contain any explicit dynamic allocation (apart from std::string and std::vector, which allocate memory internally).

Have you looked at the values of start and finish, and at whether the file has enough content to fill the vectors x and y?
A better approach would be to refactor the looping logic: you should break the moment you hit EOF. As written, you are leaving it to faith.
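One way to refactor the loop along those lines is sketched below. It assumes the file layout shown in the question (one header line, then "start a end b" records, each followed by b - a + 1 coordinate pairs). The function name read_points is hypothetical, and it takes any std::istream so it can be exercised without a file.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns false as soon as the input stops matching
// the expected layout, instead of looping on !eof().
bool read_points(std::istream& in, std::vector<double>& x, std::vector<double>& y)
{
    std::string header;
    if (!std::getline(in, header))
        return false;                       // empty input

    std::string label;
    int start = 0, finish = 0;
    // The loop condition is the extraction itself, so it ends as soon as
    // a record header cannot be read (end of file or malformed token).
    while (in >> label >> start >> label >> finish) {
        if (finish < start)
            return false;                   // nonsensical range
        for (int i = 0; i <= finish - start; ++i) {
            double px = 0.0, py = 0.0;
            if (!(in >> px >> py))
                return false;               // data ran out mid-record
            x.push_back(px);
            y.push_back(py);
        }
    }
    // True if we stopped at end of input rather than on a bad token.
    // A header cut off at the very end of the file is not detected here.
    return in.eof();
}
```

With the sample data from the question this reads 10 points and returns true. A record that promises more pairs than the file contains makes it return false instead of pushing stale values into the vectors.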

Related

How to get the correct line of code with backtrace in C++?

I took this code and modified it to look like this:
std::string Backtrace(int skip = 1)
{
void *callstack[128];
const int nMaxFrames = sizeof(callstack) / sizeof(callstack[0]);
char buf[1024];
int nFrames = backtrace(callstack, nMaxFrames);
char **symbols = backtrace_symbols(callstack, nFrames);
string message = "";
for (int i = skip; i < nFrames; i++) {
Dl_info info;
if (dladdr(callstack[i], &info)) {
char *demangled = nullptr;
int status;
demangled = abi::__cxa_demangle(info.dli_sname, NULL, 0, &status);
if(demangled != nullptr)
message += string(demangled) + ": " +
to_string((char *)callstack[i] - (char *)info.dli_saddr) + "\n";
free(demangled);
}
}
free(symbols);
if (nFrames == nMaxFrames)
message += "[truncated]\n";
return message;
}
This is supposed to print a stack trace of my current program to identify where things went awry without having to turn on gdb every time my program crashes.
When I run this code (in a state guaranteed to trigger an issue) I get the following stack trace:
DebugCallback(VkDebugUtilsMessageSeverityFlagBitsEXT, unsigned int, VkDebugUtilsMessengerCallbackDataEXT const*, void*): 146
vk::DispatchLoaderStatic::vkQueueSubmit(VkQueue_T*, unsigned int, VkSubmitInfo const*, VkFence_T*) const: 50
Display::UpdateFrame(): 1088
RenderingPipeline::RenderFrame(vk::Buffer&, vk::Buffer&, Image&, unsigned int): 63
RenderHandler::RenderHandler(Window*, HardwareInterface*, Display*, Memory*): 784
My goal is to try to print as much relevant information as possible. (file, function, line). Now, I thought that the instruction:
(char *)callstack[i] - (char *)info.dli_saddr, which I copied from the original script, would get me the line where the code was called, but, for example, the file where Display::UpdateFrame() is defined doesn't even have 1000 lines, so clearly that number isn't the line number of the calling code in the original file.
Is there a way to obtain this information with stack trace similarly to how GDB does it?
i.e if the function was called in the source code at
File: Display.hpp
Function: Display::UpdateFrame()
Line: 227
Can I retrieve that information at runtime using stacktrace?
backtrace() returns raw return addresses; the number your code prints is the byte offset of that address from the start of the enclosing function (dli_saddr), not a source line. In order to get line numbers and file names you need to use a library that can read the debug info of your program and then figure out which source file / line number / function a given address corresponds to.
Here is an example of how to do this (written by me), using libbfd (assuming you're on Linux):
https://github.com/CarloWood/libmemleak/blob/master/src/addr2line.c

Valgrind invalid read on FILE*

The following code builds into an executable on Ubuntu.
#include <stdio.h>
void otherfunc(FILE* fout){
fclose(fout);//Line 4
fout = fopen("test.txt", "w");//Delete contents and create a new file//Line 5
setbuf(fout, 0);//Line 6
}
int main() {
FILE *fout = fopen("test.txt", "r");//Line 10
if (fout) {
//file exists and can be opened
fclose(fout);//Line 13
fout = fopen("test.txt", "a");//Line 14
setbuf(fout, 0);
}
else {
//file doesn't exists or cannot be opened
fout = fopen("test.txt", "a");//Line 19
}
otherfunc(fout);//Line 22
fclose(fout);//Line 24
return 0;
}
When run through valgrind, valgrind gives the following warnings:
==13569== Invalid read of size 4
==13569== at 0x4EA7264: fclose@@GLIBC_2.2.5 (iofclose.c:53)
==13569== by 0x400673: main (newmain.cpp:24)
==13569== Address 0x52042b0 is 0 bytes inside a block of size 552 free'd
==13569== at 0x4C2EDEB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13569== by 0x4EA7362: fclose@@GLIBC_2.2.5 (iofclose.c:84)
==13569== by 0x4005CD: otherfunc(_IO_FILE*) (newmain.cpp:4)
==13569== by 0x400667: main (newmain.cpp:22)
==13569== Block was alloc'd at
==13569== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13569== by 0x4EA7CDC: __fopen_internal (iofopen.c:69)
==13569== by 0x400657: main (newmain.cpp:19)
Essentially, it is complaining that the fclose(fout); on Line 24 is closing memory that was already freed on Line 4 by the fclose(fout); within otherfunc(). But the fclose(fout); on Line 24 is meant to close the stream opened by fopen() on Line 5.
At any point in time in the code, whenever a fclose() is called, there is always exactly one open fopen(). Why is this then an invalid read as reported by valgrind?
otherfunc takes the file pointer by value. The value you assign at line 5 is therefore lost when otherfunc returns, and fout in main remains unchanged: it still holds the dangling file pointer that you closed at line 4. The fclose on line 24 therefore receives an invalid pointer, and the stream opened at line 5 is never closed.
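One possible fix, sketched here under the assumption that the caller is free to reassign its own pointer, is to have the function return the newly opened stream. The name reopen_truncated and the path parameter are hypothetical and not part of the original code. Taking a FILE** or a FILE*& parameter would work equally well.

```cpp
#include <cstdio>

// Hypothetical replacement for otherfunc: closes the old stream, reopens
// the file truncated, and hands the NEW pointer back to the caller.
std::FILE* reopen_truncated(std::FILE* fout, const char* path)
{
    if (fout)
        std::fclose(fout);                     // caller's old pointer is dangling from here on
    std::FILE* fresh = std::fopen(path, "w");  // delete contents / create a new file
    if (fresh)
        std::setbuf(fresh, nullptr);           // unbuffered, as in the question
    return fresh;                              // may be nullptr if fopen failed
}
```

In main the call then becomes fout = reopen_truncated(fout, "test.txt");, so the final fclose(fout) closes the stream that is actually open.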

Segmentation fault in reading from a file into a string array

I am having a bizarre error in assigning values in an array of strings. It assigns two or three intermittently and then crashes.
I am trying to implement a simple unsorted dictionary like structure using two parallel arrays(one of strings and one of integers)
Here is where in the code I think the fault is occurring.
std::ifstream infile1("out1.tmp");
std::string word;
while (getline(infile1,word)){
std::istringstream iss(word);
printf("stream created\n");
if (!(iss >> bookone.keys[i]))
break;
i++;
}
bookone is a dictionary object that has the public fields keys and index. Even if I move the variable assignment into the class, it fails in the same way.
The part that confuses me the most is that it appears to work the first few iterations.
strace provides this:
open("out1.tmp", O_RDONLY) = 5
read(5, "This\nEtext\nfile\nis\npresented\nby\n"..., 8191) = 8191
write(1, "stream created\n", 15stream created
) = 15
write(1, "stream created\n", 15stream created
) = 15
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xd36ff8} ---
+++ killed by SIGSEGV +++
[1] 16430 segmentation fault strace ./book macbeth.txt othello.txt
and valgrind gives me a bunch of these:
==16499== Invalid read of size 8
==16499== at 0x4EAB634: std::basic_istream<char, std::char_traits<char> >& std::operator>><char, std::char_traits<char>, std::allocator<char> >(std::basic_istream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.20)
==16499== by 0x401673: main (in /home/luna/sauce/books/book)
==16499== Address 0x5a00ca8 is 24 bytes before a block of size 568 alloc'd
==16499== at 0x4C28C20: malloc (vg_replace_malloc.c:296)
==16499== by 0x56C106C: __fopen_internal (iofopen.c:73)
==16499== by 0x401555: main (in /home/luna/sauce/books/book)

Buffer overflow successful, but it shouldn't be?

This is my program, with a vulnerable char buffer, name[400].
void greeting(char *temp1,char *temp2)
{
char name[400];
strcpy(name,temp2);
printf("Hello %s %s\n", temp1, name);
}
int main(int argc,char *argv[])
{
greeting(argv[1],argv[2]);
return 0;
}
Compiled as follows on Linux (64-bit) with ASLR disabled:
gcc -m32 -ggdb -fno-stack-protector -mpreferred-stack-boundary=2 -z execstack -o buffer buffer.c
(gdb) run Mr `perl -e 'print "A" x 400'`
Hello Mr AAAAAAA.... (truncated)
Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()
(gdb) info reg eip ebp
eip 0x41414141
ebp 0x41414141
I'm assuming here that a null byte is added, resulting in the overflow, but what I don't understand is how the EIP can be 0x41414141 with only a 1 byte overflow?
EDIT: After more probing with gdb, there is no null byte added, and there is no overflow at all when only 400 bytes are entered. So how does my EIP end up pointing to my buffer contents without any overflow? I'm assuming that the absence of a null byte causes problems for printf().
C strings are NUL terminated, so you end up with a 1-byte overflow with a value of zero (NUL).
The one-byte NUL overflow modifies the saved value of $ebp to point lower on the stack than it should. This results in restoring an incorrect value into $esp, and control of $eip.
Take specific note of the value of ebp. After the call, the value of $ebp is still the same, but the value it points to (the value which main will restore off the stack) has been adjusted, and points into the middle of our controlled buffer.
When greeting returns into main, nothing happens. However, when main restores the stack frame with a leave instruction, the stack pointer $esp is set into the middle of our controlled buffer. When the ret instruction is executed, we have control over $eip.
Note that I've used a cyclic pattern generated by pwntools rather than the standard AAAAA since we can use it to calculate offsets. For example 'aaaa' => 0, 'aaab' => 1, 'aaba' => 2.
Before Strcpy
EBP: 0xffffc6e8 --> 0xffffc6f8 --> 0x0
ESP: 0xffffc54c --> 0xffffc558 --> 0xffffc5c8 --> 0xf63d4e2e
EIP: 0x8048466 (<greeting+25>: call 0x8048320 <strcpy@plt>)
After Strcpy
EBP: 0xffffc6e8 --> 0xffffc600 ("raabsaabtaabuaabvaabwaabxaabyaab"...)
ESP: 0xffffc54c --> 0xffffc558 ("aaaabaaacaaadaaaeaaafaaagaaahaaa"...)
EIP: 0x804846b (<greeting+30>: lea eax,[ebp-0x190])
Before leave in main
EBP: 0xffffc600 ("raabsaabtaabuaabvaabwaabxaabyaab"...)
ESP: 0xffffc6f0 --> 0xffffc9bb ("Mister")
EIP: 0x80484b1 (<main+39>: leave)
After leave in main
EBP: 0x62616172 (b'raab')
ESP: 0xffffc604 ("saabtaabuaabvaabwaabxaabyaabzaac"...)
EIP: 0x80484b2 (<main+40>: ret)
At ret in main
EBP: 0x62616172 (b'raab')
ESP: 0xffffc608 ("taabuaabvaabwaabxaabyaabzaacbaac"...)
EIP: 0x62616173 (b'saab')
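The arithmetic behind these dumps can be checked directly. The sketch below only reuses addresses copied from the dumps above, so it holds for this particular run and stack layout; the constant and function names are made up for illustration.

```cpp
#include <cstdint>

// Values copied from the register dumps above (specific to that run).
constexpr std::uint32_t kGreetingEbp = 0xffffc6e8; // EBP inside greeting
constexpr std::uint32_t kSavedEbp    = 0xffffc6f8; // main's EBP, stored at [EBP]
constexpr std::uint32_t kNameOffset  = 0x190;      // name lives at EBP-0x190 (400 bytes)

// Address of name[0].
constexpr std::uint32_t name_start() { return kGreetingEbp - kNameOffset; }

// strcpy writes the terminating NUL to name[400], which is the first byte
// of the saved EBP. x86 is little-endian, so that is its low-order byte.
constexpr std::uint32_t corrupted_saved_ebp() { return kSavedEbp & 0xffffff00u; }

// "leave" in main does esp = ebp; pop ebp, so the new EBP is loaded from
// the corrupted address and "ret" pops EIP from the four bytes after it.
constexpr std::uint32_t ebp_offset_in_name() { return corrupted_saved_ebp() - name_start(); }
constexpr std::uint32_t eip_offset_in_name() { return ebp_offset_in_name() + 4; }

static_assert(name_start() == 0xffffc558, "matches ESP --> 0xffffc558 in the dump");
static_assert(corrupted_saved_ebp() == 0xffffc600, "matches EBP --> 0xffffc600 after strcpy");
```

Both offsets (168 and 172) fall inside the 400-byte buffer, which is where 'raab' and 'saab' sit in the cyclic pattern. With 400 'A' characters as input, the same two slots hold 0x41414141, which is what the question observed in both ebp and eip.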

Linux Segmentation fault with std::string::iterator

I keep getting unusual segmentation faults inside libc.so.6 on a CentOS 6.4 64bit machine. This is the backtrace that gdb most often reports:
0x00007ffff60d9b3f in memcpy () from /lib64/libc.so.6
(gdb) backtrace
#0 0x00007ffff60d9b3f in memcpy () from /lib64/libc.so.6
#1 0x00000000004b6a6b in std::string::_S_construct<__gnu_cxx::__normal_iterator<char*, std::string> > ()
#2 0x00000000004b719b in NewsMAIL::SMTPClient::receiveLine(std::basic_string<char, std::char_traits<char>, std::allocator<char> >*) ()
#3 0x00000000004b776f in NewsMAIL::SMTPClient::handleResponse() ()
And this is the code in question that seems to trigger the segfault:
bool SMTPClient::receiveLine(std::string* Line)
{
static std::string Buffer;
std::string::iterator iter;
while((iter = std::find(Buffer.begin(), Buffer.end(), '\n')) == Buffer.end()) {
char Bucket[MAX_BUCKET_SIZE + 1] = {};
int BytesRecv = read(m_Socket, Bucket, MAX_BUCKET_SIZE);
//Did we get a socket error?
if(BytesRecv == -1) {
//This is generally considered a bad thing..
*Line = Buffer;
Buffer = std::string("");
return false;
}
Bucket[BytesRecv] = 0;
Buffer += Bucket;
}
*Line = std::string(Buffer.begin(), iter);
Buffer = std::string(iter + 1, Buffer.end());
return true;
}
Sometimes it works 100% without any failures, so unfortunately it does not happen every time.
The above code is a slightly modified version of this: https://stackoverflow.com/a/1584620/3133245
Does anyone have any thoughts on why this might be happening? I am compiling with g++ 4.7.2
Thanks!
Nate
Using a static variable (Buffer) is not thread-safe. If receiveLine is called from more than one thread, this could cause a crash.
You should add a check that Line is not NULL.
BTW, the line Buffer = std::string(""); could be Buffer.clear();
In addition to the static variable issue, are you sure that the data that is received contains no embedded NUL characters?
If the received data contains embedded NUL bytes, this line will not do the correct concatenation using the += operator:
Buffer += Bucket;
The += overload assumes that Bucket is a C-style string, thus the first NUL byte encountered will be used as the terminator when the concatenation occurs.
Taking a glance at the code, it would seem to be the case that if the Bucket does indeed contain embedded NUL characters, doing the above concatenation could result in your "iter" iterator pointing past the end() of Buffer (in those lines after the while() loop).
Instead, you can do this:
Buffer.append(Bucket, BytesRecv);
This guarantees that all characters that Bucket is addressing will be concatenated onto the existing string.
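The difference between the two calls is easy to see on a small buffer. This is a minimal sketch with hypothetical helper names; it only illustrates the std::string behaviour and says nothing about whether embedded NUL bytes are the actual cause of the crash.

```cpp
#include <cstddef>
#include <string>

// Appends with operator+=, which treats bucket as a NUL-terminated
// C string and stops at the first zero byte.
std::string append_as_c_string(std::string buffer, const char* bucket)
{
    buffer += bucket;
    return buffer;
}

// Appends exactly count bytes, embedded zero bytes included.
std::string append_with_length(std::string buffer, const char* bucket, std::size_t count)
{
    buffer.append(bucket, count);
    return buffer;
}
```

For a five-byte bucket holding 'a', 'b', '\0', 'c', '\n', the first helper appends only "ab" and the newline never reaches the buffer, while the second appends all five bytes so a later std::find for '\n' succeeds.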
But before making any changes, make sure you know exactly what the issue is, especially since you stated the error doesn't happen very often. Changing around code without first knowing the true cause of the error may just mask the error, thus making it harder to diagnose the real issue.