I am getting an invalid read error when the src string ends with \n; the error disappears when I remove the \n:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main (void)
{
    char *txt = strdup ("this is a not socket terminated message\n");
    printf ("%d: %s\n", strlen (txt), txt);
    free (txt);
    return 0;
}
valgrind output:
==18929== HEAP SUMMARY:
==18929== in use at exit: 0 bytes in 0 blocks
==18929== total heap usage: 2 allocs, 2 frees, 84 bytes allocated
==18929==
==18929== All heap blocks were freed -- no leaks are possible
==18929==
==18929== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
==18929==
==18929== 1 errors in context 1 of 1:
==18929== Invalid read of size 4
==18929== at 0x804847E: main (in /tmp/test)
==18929== Address 0x4204050 is 40 bytes inside a block of size 41 alloc'd
==18929== at 0x402A17C: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==18929== by 0x8048415: main (in /tmp/test)
==18929==
==18929== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
How can I fix this without sacrificing the newline character?
It's not about the newline character, nor the printf format specifier. You've found what is arguably a bug in strlen(), and I can tell you must be using gcc.
Your program code is perfectly fine. The printf format specifier could be a little better, but it won't cause the valgrind error you are seeing. Let's look at that valgrind error:
==18929== Invalid read of size 4
==18929== at 0x804847E: main (in /tmp/test)
==18929== Address 0x4204050 is 40 bytes inside a block of size 41 alloc'd
==18929== at 0x402A17C: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==18929== by 0x8048415: main (in /tmp/test)
"Invalid read of size 4" is the first message we must understand. It means that the processor ran an instruction which would load 4 consecutive bytes from memory. The next line indicates that the address attempted to be read was "Address 0x4204050 is 40 bytes inside a block of size 41 alloc'd."
With this information, we can figure it out. First, if you replace that '\n' with a '$', or any other character, the same error will be produced. Try it.
Secondly, we can see that your string has 40 characters in it. Adding the \0 termination character brings the total bytes used to represent the string to 41.
Because we have the message "Address 0x4204050 is 40 bytes inside a block of size 41 alloc'd," we now know everything about what is going wrong.
strdup() allocated the correct amount of memory, 41 bytes.
strlen() attempted to read 4 bytes starting at offset 40 (the terminating \0), so the read covers offsets 40 through 43, but the block only contains offsets 0 through 40.
valgrind caught the problem.
This is a glibc bug. Once upon a time, a project called Tiny C Compiler (TCC) was starting to take off. Coincidentally, glibc was completely changed so that the normal string functions, such as strlen(), no longer existed. They were replaced with optimized versions which read memory using various methods, such as reading four bytes at a time. gcc was changed at the same time to generate calls to the appropriate implementations, depending on the alignment of the input pointer, the hardware compiled for, etc. The TCC project was abandoned when this change to the GNU environment made it so difficult to produce a new C compiler, by taking away the ability to use glibc for the standard library.
If you report the bug, the glibc maintainers probably won't fix it. The reason is that in practical use, this will likely never cause an actual crash. The strlen function is reading bytes 4 at a time because it sees that the addresses are 4-byte aligned. It's always possible to read 4 bytes from a 4-byte-aligned address without segfaulting, given that reading 1 byte from that address would succeed. Therefore, the warning from valgrind doesn't reveal a potential crash, just a mismatch in assumptions about how to program. I consider valgrind technically correct, but I think there is zero chance that the glibc maintainers will do anything to squelch the warning.
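The alignment argument can be checked with a few lines of C. This is only a sketch: the 4096-byte page size is an assumption (typical on x86 Linux, not taken from the question), and read_stays_in_page is a made-up helper name.

```c
#include <assert.h>
#include <stdint.h>

#define ASSUMED_PAGE_SIZE 4096u   /* assumption: typical x86 page size */

/* Nonzero if a read of word_size bytes starting at addr touches only
   one page, so it cannot fault unless its first byte would fault too. */
int read_stays_in_page(uintptr_t addr, unsigned word_size)
{
    assert(word_size > 0);
    uintptr_t first_page = addr / ASSUMED_PAGE_SIZE;
    uintptr_t last_page = (addr + word_size - 1) / ASSUMED_PAGE_SIZE;
    return first_page == last_page;
}
```

Because the page size is a multiple of 4, a 4-byte read from any 4-byte-aligned address, such as the 0x4204050 in the report, stays inside one page. Only an unaligned read near the end of a page can spill into the next one.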
The error message seems to indicate that it's strlen that read past the malloced buffer allocated by strdup. On a 32-bit platform, an optimal strlen implementation could read 4 bytes at a time into a 32-bit register and do some bit-twiddling to see if there's a null byte in there. If, near the end of the string, there are fewer than 4 bytes left but 4 bytes are still read to perform the null byte check, then I could see this error getting printed. In that case, presumably the strlen implementer would know if it's "safe" to do this on the particular platform, in which case the valgrind error is a false positive.
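To make the bit-twiddling concrete, here is a minimal sketch of a word-at-a-time scan. It is not glibc's actual code. The names has_zero_byte and word_strlen are hypothetical, and the function sidesteps the over-read by requiring a buffer whose size is a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if any of the four bytes in v is 0x00 (well-known bit trick). */
int has_zero_byte(uint32_t v)
{
    return ((v - 0x01010101u) & ~v & 0x80808080u) != 0;
}

/* Length of the string stored in buf, scanning 4 bytes per step.
   Assumption: buf_size is a multiple of 4 and the string, including
   its '\0', lies entirely inside buf, so no load leaves the buffer. */
size_t word_strlen(const char *buf, size_t buf_size)
{
    assert(buf_size % 4 == 0);
    size_t i = 0;
    while (i + 4 <= buf_size) {
        uint32_t w;
        memcpy(&w, buf + i, sizeof w);   /* one 4-byte load */
        if (has_zero_byte(w))
            break;                       /* the '\0' is in this word */
        i += 4;
    }
    while (i < buf_size && buf[i] != '\0')   /* finish byte by byte */
        i++;
    return i;
}
```

An optimized library routine does the same kind of 4-byte load on the last word of a 41-byte block, which is exactly the read that memcheck flags.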
I have some strange behaviour that I do not understand. The code is a bit complex so I would refrain from posting it here and instead describe the behaviour and hope that somebody, knowing how valgrind works, has an idea that I can pursue despite this little information.
Background:
I am developing some additional functionality for an open-source, C/C++-based agent-based modelling platform (fork on my GitHub). Compilation is fine. Everything seems to work as it should so far, based on my validation with test programs. Also, valgrind does not report any errors of relevance. But reproducibility (which is crucial) is strange.
Within the framework one defines a model file (initialisation of a simulation run, basically). Based on this file, one should be able to reproduce the exact same output (and platform independent). In a way this works: If I start the simulation environment (GUI version), load the file and run it, it produces the same result each time. Also, using the command-line version, I get the same results each time.
But, if, from a running instance of the simulation environment, I run the same model more than once, then the strange behavior occurs - sometimes...
Compiler options used:
CC=g++
GLOBAL_CC=-march=native -std=gnu++14
SSWITCH_CC=-fnon-call-exceptions -Og -ggdb3 -Wall
The set-up:
I run the compiled program, which internally runs a fixed simulation set-up three times. It should produce the exact same results each time, which I check by printing random numbers at different stages.
The strange behaviour:
Option #1:
When I run the program in valgrind using the options:
valgrind --leak-check=full --leak-resolution=high --show-reachable=yes
I do not get the same results across the internal runs.
Report from Option 1:
Finished processing sim1
==6206==
==6206== HEAP SUMMARY:
==6206== in use at exit: 43 bytes in 1 blocks
==6206== total heap usage: 4,124,309 allocs, 4,124,308 frees, 888,390,511 bytes allocated
==6206==
==6206== 43 bytes in 1 blocks are still reachable in loss record 1 of 1
==6206== at 0x4C2DDCF: realloc (vg_replace_malloc.c:785)
==6206== by 0x5BE7FB2: getcwd (getcwd.c:84)
==6206== by 0x143391: lsdmain(int, char**) (lsdmain.cpp:203)
==6206== by 0x10C37D: main (main_gnuwin.cpp:29)
==6206==
==6206== LEAK SUMMARY:
==6206== definitely lost: 0 bytes in 0 blocks
==6206== indirectly lost: 0 bytes in 0 blocks
==6206== possibly lost: 0 bytes in 0 blocks
==6206== still reachable: 43 bytes in 1 blocks
==6206== suppressed: 0 bytes in 0 blocks
==6206==
==6206== For counts of detected and suppressed errors, rerun with: -v
==6206== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Option #2
However, when I use the following option instead:
valgrind --tool=helgrind
I do get the same results each time with the command line version. Interestingly, the first results with option #1 are the same as the results with option #2.
I would be happy for any suggestions. And, I am not a trained computer scientist... I am using <random> and mt19937 (reinitialised each time), but the initial random numbers between the simulations are the same, so I do not think the error resides here. Later within the run, though, the random numbers change in Option #1 (this is my test, besides the time the simulation needs to find an equilibrium).
Finally, I could find the issue: At two points in the program I sort a temporary vector with pairs of distance values and pointers of objects located on a 2d space:
std::sort( vector.begin(), vector.end() ); // vector of std::pair<double, pointer>
The solution, obviously, is to only sort by the first item of the pair:
std::sort( vector.begin(),vector.end(), [](auto const &A, auto const &B ){return A.first < B.first; } );
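For readers working in plain C, the same idea can be sketched with qsort. This is an illustration under assumptions, not the platform's code: struct neighbour and its id field are hypothetical, with id standing for any run-independent number (for example a creation counter) used to break ties instead of an address.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record: a distance plus a run-independent id that is
   assigned in creation order (not derived from a memory address). */
struct neighbour {
    double distance;
    unsigned id;
};

/* Order by distance; break ties by id so the result never depends on
   where the objects happen to live in memory. */
int compare_neighbours(const void *a, const void *b)
{
    const struct neighbour *x = a;
    const struct neighbour *y = b;
    if (x->distance < y->distance) return -1;
    if (x->distance > y->distance) return 1;
    return (x->id > y->id) - (x->id < y->id);
}

void sort_neighbours(struct neighbour *items, size_t count)
{
    assert(items != NULL || count == 0);
    qsort(items, count, sizeof items[0], compare_neighbours);
}
```

With unique ids the comparison is a total order, so the sorted sequence is the same on every run and every platform, even when several objects are at equal distance.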
Some remarks on why I did not find this issue directly:
When I implemented this sort, I intended to make it "stable". The pointers of the objects are kind of unique, thus in different subsets the ordering would be the same and also independently of how I add the items to the set.
I did not consider that pointer values are (not precisely, but in effect) random numbers outside of my control.
I did not see this, because somehow the OS (or whatever) always assigns the same pointer values between different calls of the program (I suppose there is a "virtual" address space that is initialised afresh each time). Because of this, I did not suspect that pointers were the issue.
Curiously, when I ran the program with Valgrind and the --tool=helgrind option, the issue did not appear. One suggestion I got (offline) was that memcheck preinitialises the memory with a given pattern; this would have been an answer if uninitialised variables had been the cause. As it seems, helgrind also manages the memory differently, giving each of my subsequent simulations "fresh" virtual memory such that my pointer sorting was stable in the repeated loop.
I hope this helps somebody if he or she runs into the same problems. Thanks for all the suggestions!
I am new to Valgrind. I got this Valgrind message:
==932767== Invalid read of size 16
==932767== at 0x3D97D2B9AA: __strcasecmp_l_sse42 (in /lib64/libc-2.12.so)
...
==932767== Address 0x8c3e170 is 9 bytes after a block of size 7 alloc'd
==932767== at 0x6A73B4A: malloc (vg_replace_malloc.c:296)
==932767== by 0x34E821195A: ???
Here I have two questions:
The allocated block is 7 bytes, so how can the address 0x8c3e170 be 9 bytes after it? Normally the address being accessed lies inside the allocated block. Under what circumstances does this happen?
The invalid read size is 16 bytes. Does that include the 2 extra bytes from "Address 0x8c3e170 is 9 bytes after a block of size 7 alloc'd"?
If it weren't for the ellipsis I would say the Address 0x8c3e170... msg is directly related to the Invalid read of size 16 because it's indented further.
It's possible to get false positives, so don't rule that out. For example, it's possible that strcasecmp is reading more than it needs to as an optimization.
I read the 2nd message as the address being read from starts 9 bytes after the end of a block of size 7.
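That reading can be turned into simple arithmetic. The sketch below assumes that "N bytes after a block of size S" means the address lies S + N bytes past the start of the block. The struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Offsets, relative to the start of the heap block, of the first and
   last byte touched by the reported read. */
struct read_range {
    size_t first;
    size_t last;
};

/* block_size:  size of the allocated block (7 in the report)
   bytes_after: distance past the end of the block (9 in the report)
   read_size:   width of the read (16 in the report) */
struct read_range invalid_read_range(size_t block_size, size_t bytes_after,
                                     size_t read_size)
{
    assert(read_size > 0);
    struct read_range r;
    r.first = block_size + bytes_after;
    r.last = r.first + read_size - 1;
    return r;
}
```

For the report above, invalid_read_range(7, 9, 16) yields offsets 16 through 31, so under this reading the whole 16-byte read lies outside the 7-byte block and includes none of its bytes.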
I have two suggestions, either of which will probably help you track this down:
1) Run your application under valgrind such that you can attach in a separate terminal window with gdb:
~ valgrind --vgdb=yes --vgdb-error=0 your_program
in another window:
~ gdb your_program
(gdb) target remote | vgdb
This option makes it halt, as though a breakpoint were set, on every problem valgrind finds.
2) Compile with the undefined-behaviour and/or address sanitizers (-fsanitize=undefined, -fsanitize=address), either with clang or gcc (4.9 or higher). They catch the same sorts of issues, but I find the error messages more informative.
In valgrind, we have leak logs like this
==15788== 480 bytes in 20 blocks are definitely lost in loss record 5,016 of 5,501
==20901== 112 (48 direct, 64 indirect) bytes in 2 blocks are definitely lost in loss record 3,501 of 5,122
==20901== 1,375,296 bytes in 78 blocks are possibly lost in loss record 5,109 of 5,122
==20901== Conditional jump or move depends on uninitialised value(s)
==20901== Use of uninitialised value of size 8
In Valgrind's documentation, I couldn't find exact details. Can somebody please explain?
I know "definitely lost" means the allocated memory is never freed. But what does "20 blocks" mean, and what does "lost in loss record 5,016 of 5,501" mean? And if it says 480 bytes are lost, is that for one iteration of a loop or the total?
In the second line, "112 (48 direct, 64 indirect) bytes in 2 blocks are definitely lost", what does "48 direct, 64 indirect" mean?
I understand the meaning of "possibly lost", but does that mean valgrind is not sure whether it is a leak?
Regarding the 4th line, I have no idea at all. I checked the call stack provided along with it, and I don't see any "jump or move".
For the 5th line, it says the uninitialised value is used in the last line of this code snippet. I don't see any uninitialised value here.
char *data = new char[somebigSize];
memset(data, '\0', somebigSize);
int sizeInt = sizeof(int);
int length = 20; //some value obtained
int position = 10;
char *newPtrVar = new char[sizeInt + 1];
memset(newPtrVar, '\0', sizeInt + 1);
memcpy(newPtrVar, &length, sizeInt);
memcpy(&data[position], newPtrVar, sizeInt);
The valgrind manual covers this in detail. It's quite complicated (see the manual for full details), but in essence you can have:
"still reachable" (memory that is still pointed to by a live pointer).
"directly lost" (memory that is not pointed to by any live pointer).
"indirectly lost" (memory that is pointed to only by a pointer stored in memory that is itself "directly lost").
"possibly lost" (memory that is pointed to, but the pointer does not point to the start of the memory).
The last case could be some random pointer, or it could be something like a memory manager that allocates a redzone before the memory returned to the user.
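A small program that leaks on purpose shows how these categories arise. It is a sketch, not output from the question's program: struct node, interior_pointer and leak_examples are hypothetical names, and the categories named in the comments are what memcheck would be expected to report when the code is built without optimisation (-O0).

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    struct node *next;
    char payload[24];
};

/* File scope, so the pointer is still live when the program exits. */
char *interior_pointer;

/* Leaks on purpose; returns how many blocks were allocated. */
int leak_examples(void)
{
    /* A two-node chain whose only pointer to the head is dropped:
       the head is lost directly, and the tail, reachable only through
       the head, is lost indirectly. */
    struct node *head = malloc(sizeof *head);
    struct node *tail = malloc(sizeof *tail);
    if (head == NULL || tail == NULL) {
        free(head);
        free(tail);
        return 0;
    }
    tail->next = NULL;
    head->next = tail;
    head = NULL;

    /* Only a pointer into the middle of this block survives, which is
       the "possibly lost" situation. */
    char *block = malloc(64);
    assert(block != NULL);
    interior_pointer = block + 16;
    return 3;
}
```

In a report line such as "112 (48 direct, 64 indirect) bytes", the direct figure corresponds to blocks like the head here and the indirect figure to blocks like the tail.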
I'm playing with some basic C++ stuff. I'm new to this language, so be warned that my question may not be well formulated. I appreciate any help.
The thing is that after seeing the example at www.cplusplus.com/reference/cstdlib/malloc/ I found myself with this code:
#include <stdio.h>
#include <stdlib.h>
int main (void) {
    char *str;
    str = (char*) malloc(2);
    str[0] ='8';
    str[1] ='8';
    str[2] ='6';
    str[3] ='\0';
    printf ("%s\n",str);
}
And compiling with:
gcc -O0 -pedantic -Wall test2.cpp
(gcc version 4.7.2)
I get no errors and the output is 886. Why do I get no errors? Have I not passed the boundary of the allocated space?
In the case that code is ok... Why the example in the reference?
In the other (more probable) case... What are the risks?
Thanks!
You don't get any errors because C and C++ don't do bounds checking. You overwrote sections of memory that you weren't using, but you got lucky and it wasn't anything important. Compare it to putting a row of nails into a wall where you know there's a stud. If you miss the stud, most of the time, you just put a hole in the plaster, but it's dangerous to keep doing it because eventually, you're going to hit one of the live wires instead.
You have passed over the boundary of the allocated memory.
However, printf does not care how much memory you allocated. It simply starts at the beginning of the string and continues until it finds a 0 byte.
What you have created is undefined behaviour. There can be other data right after your allocated region (maybe another variable), in which case it will get corrupted. If the memory after it is unused, you might escape without a visible problem. And if the write lands in a page that is not mapped into your process, you will see the nice and tidy Segmentation Fault. The consequences can be even worse, so better not to try this anywhere.
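For comparison, a version that stays inside its allocation only needs to request one byte per character plus one for the terminator. This is a minimal sketch with a hypothetical helper name, copy_text.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of text, or NULL if the allocation fails.
   The caller is responsible for freeing the result. */
char *copy_text(const char *text)
{
    assert(text != NULL);
    size_t size = strlen(text) + 1;   /* +1 for the terminating '\0' */
    char *copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, text, size);
    return copy;
}
```

For the string "886" this requests 4 bytes, which is exactly what the four assignments in the question need.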
The following can be found in the comments in malloc.c of glibc:
Minimum overhead per allocated chunk: 4 or 8 bytes Each malloced
chunk has a hidden word of overhead holding size and status
information.
Minimum allocated size: 4-byte ptrs: 16 bytes (including 4
overhead)
8-byte ptrs: 24/32 bytes (including, 4/8 overhead)
When a chunk is freed, 12 (for 4byte ptrs) or 20 (for 8 byte
ptrs but 4 byte size) or 24 (for 8/8) additional bytes are needed;
4 (8) for a trailing size field and 8 (16) bytes for free list
pointers. Thus, the minimum allocatable size is 16/24/32 bytes.
Since the minimum allocated size is 16/24/32 bytes, which is more than the 4 bytes you wrote, your program ran without errors. This is one possible reason why your program appeared to run correctly.
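On glibc the slack can be observed with malloc_usable_size, a glibc extension declared in <malloc.h> rather than standard C. The helper name usable_bytes below is made up, and the value it returns depends on the allocator and platform, so treat this as a sketch. A replacement allocator, such as the one a memory checker installs, may report a different number.

```c
#include <assert.h>
#include <malloc.h>   /* malloc_usable_size: glibc extension */
#include <stdlib.h>

/* Returns how many bytes the allocator made usable for a request of
   `requested` bytes, or 0 if the allocation failed. */
size_t usable_bytes(size_t requested)
{
    assert(requested > 0);
    void *block = malloc(requested);
    if (block == NULL)
        return 0;
    size_t usable = malloc_usable_size(block);
    free(block);
    return usable;
}
```

Even when usable_bytes(2) reports more than 2, writing past the requested size remains undefined behaviour as far as the language is concerned. The slack only explains why the overflow often goes unnoticed.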
I am using PageHeap to identify heap corruption. My application has a heap corruption, but it breaks (due to a crash) when it creates an STL object for a string passed to a method. I cannot see any visible memory issues near the crash location. I enabled full page heap for detecting heap corruption and /RTCs for detecting stack corruption.
What should I do to break at the exact location where the heap corruption occurs?
Enabling FULL pageheap can increase the chances of the debugger catching a heap corruption as it's happening:
gflags /p /enable /full <processname>
Also, if you can find out what address is getting overwritten, you can set up a breakpoint on memory access in windbg. Not sure if the VS debugger has the same feature.
Pageheap does not always detect heap corruption exactly at the moment when it occurs.
Pageheap inserts an invalid page right after each allocation, so whenever you overrun an allocated block you get an AV. But there are other possible cases. One example is writing just before an allocated block, corrupting the heap block header data structure. The heap block header is valid writable memory (most likely in the same page as the allocated block). Consider the following example:
#include <stdlib.h>
int
main()
{
    void* block = malloc(100);
    int* intPtr = (int*)block;
    *(intPtr-1) = 0x12345; // no crash
    free(block); // crash
    return 0;
}
So writing some garbage just before the allocated block passes just fine. With Pageheap enabled the example breaks inside free() call. Here is the call stack:
verifier.dll!_VerifierStopMessage@40() + 0x206 bytes
verifier.dll!_AVrfpDphReportCorruptedBlock@16() + 0x239 bytes
verifier.dll!_AVrfpDphCheckNormalHeapBlock@16() + 0x11a bytes
verifier.dll!_AVrfpDphNormalHeapFree@16() + 0x22 bytes
verifier.dll!_AVrfDebugPageHeapFree@12() + 0xe3 bytes
ntdll.dll!_RtlDebugFreeHeap@12() + 0x2f bytes
ntdll.dll!@RtlpFreeHeap@16() + 0x36919 bytes
ntdll.dll!_RtlFreeHeap@12() + 0x722 bytes
heapripper.exe!free(void * pBlock=0x0603bf98) Line 110 C
> heapripper.exe!main() Line 11 + 0x9 bytes C++
heapripper.exe!__tmainCRTStartup() Line 266 + 0x12 bytes C
kernel32.dll!@BaseThreadInitThunk@12() + 0xe bytes
ntdll.dll!___RtlUserThreadStart@8() + 0x27 bytes
ntdll.dll!__RtlUserThreadStart@8() + 0x1b bytes
Pageheap enables rigorous heap consistency checks, but the checks do not kick in until some other heap API is called. The check routines can be seen on the stack. (Without Pageheap the application would probably just AV in the heap implementation while attempting to use an invalid pointer.)
So Pageheap does not give you a 100% guarantee of catching a corruption exactly at the moment it occurs. You need tools like Purify or Valgrind that track every memory access.
Don't get me wrong, I think Pageheap is still very useful. It causes much less performance degradation compared to the mentioned Purify and Valgrind, so it allows running much more complex scenarios.