unexpected successful copy based on strlen - c++

I was reviewing my skills with pointers and buffers in C++. I tried the code below and everything works fine: no leaks, no crash, nothing.
To be honest, I didn't expect this.
When I call char* buf2 = new char[strlen(buf)] I didn't expect strlen(buf) to return the right size. I always thought that strlen
needs a null-terminated string to work. Here that is not the case, so why does this code work?
#include <cstring>
int main(){
const char* mystr = "mineminemine";
char* buf = new char[strlen(mystr)];
memcpy(buf, mystr, strlen(mystr));
char* buf2 = new char[strlen(buf)];
memcpy(buf2, buf, strlen(buf));
delete[] buf2;
delete[] buf;
}

That's called undefined behavior: the program appears to work, but you can't rely on that.
When the memory was allocated, there happened to be a null character somewhere close enough after the start of the buffer, and the program could technically access all memory between the start of the buffer and that null character, so you don't observe a crash.
You can't rely on that behavior. Don't write code like that, always allocate enough space to store the terminating null character.
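A minimal sketch of the fix, assuming the goal is simply to duplicate mystr. The helper name duplicate is hypothetical; it reserves strlen(src) + 1 bytes and copies the terminator as well, so a later strlen on the copy is well defined.

```cpp
#include <cstring>

// Hypothetical helper: duplicates a C string into a new[]-allocated buffer.
// The caller owns the result and must release it with delete[].
char* duplicate(const char* src) {
    const std::size_t len = std::strlen(src);  // strlen does not count the '\0'
    char* copy = new char[len + 1];            // one extra byte for the '\0'
    std::memcpy(copy, src, len + 1);           // copy the terminator too
    return copy;
}
```

With this helper, the question's main becomes char* buf = duplicate(mystr); char* buf2 = duplicate(buf); followed by the two delete[] calls.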

Consider another way to do the same thing:
#include <string>
int main(){
std::string mystr = "mineminemine";
std::string mystr2 = mystr;
}
Internally, std::string keeps a buffer with a terminating null character added. When you copy a standard string, you don't have to worry about keeping track of the start and end of the buffer.
Now, regarding the lifetime of the strings: these two variables are declared on the stack and destroyed when main returns (i.e. at program termination). If you need strings to be shared among objects and you do not necessarily know when they will be destroyed, I recommend considering shared pointers (boost::shared_ptr, or std::shared_ptr since C++11).
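A minimal sketch of the shared-ownership idea using std::shared_ptr. The Label class is hypothetical and only illustrates several objects holding the same string; the string is destroyed when the last shared_ptr to it goes away.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical class that shares (rather than owns exclusively) its text.
class Label {
public:
    explicit Label(std::shared_ptr<const std::string> text) : text_(std::move(text)) {}
    const std::string& text() const { return *text_; }

private:
    std::shared_ptr<const std::string> text_;  // reference-counted, freed automatically
};
```

No object has to know who destroys the string last; the reference count decides that.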

Related

Memory Leak (char[])

When I run my program, it can run for a while, then all of a sudden it experiences a huge memory leak. I traced it using a snapshot of the heap when it crashed, and I found a mysterious char[] with a size of 232,023,801 bytes. The minutes preceding the crash show no unusual behavior. The only place where I use char arrays is the following piece of code:
string ReadString(DWORD64 addr) {
char* buffer = new char[128];
bool validChar = true;
for (int c = 0; c < 128 && validChar; c++) {
buffer[c] = Mem.Read<char>(addr+ (0x1 * c), sizeof(char));
if (!isalnum(buffer[c]) && !ispunct(buffer[c]))
validChar = false;
}
string ret= string(buffer);
delete[] buffer;
return ret;
}
All this code should be doing is reading a few characters from memory, saving the char array to a string, cleaning up the array, and returning the string. How is the memory leak originating from here? Or does the char[] in the heap snapshot potentially point to another issue?
Assuming that string here is std::string:
You call string(buffer) which assumes that buffer is 0-terminated and allocates a new string. But your code doesn't ensure that buffer is actually 0-terminated, so this can cause undefined behavior, including potentially crashing or allocating too much memory for the string.
You probably want to use the string(buffer, size) constructor instead, which doesn't require buffer to be 0-terminated.
I'd also recommend avoiding the manual new/delete. One way to do this is to create an empty string and push_back the characters you read into it. This avoids the need for buffer.
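A minimal sketch of the push_back approach. Since Mem.Read<char> is not shown in the question, this version assumes a hypothetical CharReader callable that returns the byte at a given offset from addr; the loop stops at the first character that is neither alphanumeric nor punctuation, and that character is not included.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical stand-in for Mem.Read<char>(addr + offset, sizeof(char)).
using CharReader = std::function<char(std::size_t offset)>;

std::string ReadString(const CharReader& readAt, std::size_t maxLen = 128) {
    std::string result;  // owns its memory; no new/delete, no terminator bookkeeping
    for (std::size_t c = 0; c < maxLen; ++c) {
        const char ch = readAt(c);
        const unsigned char uch = static_cast<unsigned char>(ch);  // <cctype> needs non-negative values
        if (!std::isalnum(uch) && !std::ispunct(uch))
            break;  // stop at the first non-text character
        result.push_back(ch);
    }
    return result;
}
```

Because the string's length is tracked explicitly, nothing depends on a '\0' being present in the memory being read.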

c++: Does the new operator for dynamic allocation check for memory safety?

My question arises from one of my c++ exercises (from Programming Abstraction in C++, 2012 version, Exercise 12.2). Here it is:
void strcpy(char *dst, char *src) {
while (*dst++ = *src++);
}
The definition of strcpy is dangerous. The danger stems from the fact
that strcpy fails to check that there is sufficient space in the
character array that receives the copy, thereby increasing the chance
of a buffer-overflow error. It is possible, however, to eliminate much
of the danger by using dynamic allocation to create memory space for
the copied string. Write a function
char *copyCString(char *str);
that allocates enough memory for the C-style string str and then
copies the characters—along with the terminating null character—into
the newly allocated memory.
Here's my question:
Is this new method really safe? Why is it safe?
I mean, to be a little bit radical, what if there isn't enough space on the heap?
Is the new operator able to check for space availability and fail gracefully if there isn't enough space?
Will that cause other kind of "something-overflow"?
If new fails to allocate the requested memory, it's supposed to throw a std::bad_alloc exception (but see below for more). After that, the stack will be unwound to the matching exception handler, and it'll be up to your code to figure out what to do from there.
If you really want/need to guard against an exception being thrown, there is a nothrow version of new you can use that will return a null pointer to signal failure, but this is included almost exclusively for C compatibility, and not frequently used (or useful).
For the type of situation cited in the question, you normally want to use std::string instead of messing with allocating space yourself at all.
Also note that on many modern systems, the notion of new either throwing or returning a null pointer on failure is fairly foreign in practice. In reality, Windows will normally attempt to expand the paging file to meet your request. Linux has the kernel's "OOM killer", which will attempt to find "bad" processes and kill them to free up memory if you run out.
As such, even though the C++ standard (and the C standard) prescribe what should happen if allocation fails, that's rarely what happens in real life.
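A minimal sketch of both failure-reporting styles described above. The function names are hypothetical. As noted, on systems that overcommit memory a failure may never be reported at this point, so this only shows what the standard prescribes.

```cpp
#include <cstddef>
#include <iostream>
#include <new>

// Throwing form: catch std::bad_alloc and turn it into a null result.
char* tryAllocate(std::size_t n) {
    try {
        return new char[n];
    } catch (const std::bad_alloc& e) {
        std::cerr << "allocation of " << n << " bytes failed: " << e.what() << '\n';
        return nullptr;
    }
}

// Non-throwing form: new (std::nothrow) returns a null pointer on failure.
char* tryAllocateNothrow(std::size_t n) {
    return new (std::nothrow) char[n];
}
```

Either way the caller must check for nullptr before using the buffer and release it with delete[].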
The new operator throws a std::bad_alloc exception if it cannot allocate memory, unless std::nothrow is specified. If you pass std::nothrow, you get a null pointer back when it cannot allocate memory.
The code for strcpy is unsafe because it will try copying outside of the allocated memory for the dst pointer. Example:
#include <cstring>
#include <iostream>
using namespace std;
int main()
{
const char* s1 = "hello"; // allocated space for 6 characters
char* s2 = new char[ 2 ]; // allocated space for 2 characters.
strcpy( s2, s1 );
cout << s2 << endl;
char c; cin >> c;
return 0;
}
This may print the expected value "hello", but remember that s2 was allocated with space for only 2 characters. The remaining characters were written to the memory that follows, which is unsafe: we could be overwriting other data or accessing invalid memory.
Consider this solution:
#include <cstddef>
// c is taken by value so that string literals can be passed directly
char* e4_strdup( const char* c )
{
// holds the number of characters required for the c-string
std::size_t sz{ 0 };
// since c-style strings are terminated by the '\0' character,
// increase the required space until we've found a '\0' character.
for ( const char* p_to_c = c; *p_to_c != '\0'; ++p_to_c )
++sz;
// allocate correct amount of space for copy.
// we do ++sz during allocation because we must provide enough space for the '\0' character.
char* c_copy{ new char[ ++sz ] }; // extra space for '\0' character.
for ( std::size_t i{ 0 }; i < sz; ++i )
c_copy[ i ] = c[ i ]; // copy every character, including the '\0', onto allocated memory
return c_copy;
}
The new operator will still throw a std::bad_alloc exception if you run out of memory.

program crash while using char*

While running following code, my program crashes unexpectedly!
#include<stdio.h>
#include<string.h>
int main(){
char *str = NULL;
strcpy(str, "swami");
printf("%s", str);
return 0;
}
But if I do like this:
#include<stdio.h>
#include<string.h>
int main(){
char *str;
strcpy(str, "swami");
printf("%s", str);
return 0;
}
This code works fine and generates correct output!
I am using the GCC compiler (Code::Blocks IDE). Also, both versions crash in Dev-C++. Can anyone please explain to me why this is so?
You cannot write to NULL pointers.
In the second case, your uninitialized pointer happened to contain a garbage value that pointed to a valid, writable location in your program's memory. That is why you could strcpy into it.
Change both programs to have
str = malloc(size)
(or use calloc) before the strcpy, where size is the size of the space you want to reserve.
As suggested in a comment, you can also change the declaration of str to char str[6] (or more).
Last edit: here is a picture showing the memory of your program and the pointers:
The gray areas and the red one are forbidden (you cannot read from or write to them; the top gray one is kernel memory, while the others are regions not yet allocated). The red area at the bottom is the special 0 page. Since NULL is 0, str = NULL makes str point there, and your program will fail.
If you don't assign anything to str, it will hold a garbage value and can point anywhere. It can still point to the red area or to a gray area, and your program will fail. It could point to a green or blue area (either shade), making your program work (except when it points to a read-only location and you write to it). Allocating memory for the pointer makes it point into the green area, enlarging that area upward.
The other option, str[6], enlarges the stack area downward. All local variables have space reserved on the stack, while all space allocated with malloc, realloc, calloc and friends goes on the heap.
Lastly, have a look at a blog article about the difference between char[] and char *.
PS: If you want to use a GNU extension, you can look into the asprintf function. It allocates space for a string and writes content there:
asprintf(&str, "swami");
or
asprintf(&str, "%d + %d == %d\n", 1, 2, 3);
But if you want portability, stay away from this function.
In neither version have you allocated memory to copy the string to, so both invoke undefined behaviour. The first one crashes because you explicitly initialised str to NULL, so the strcpy dereferences a NULL pointer, that crashes on most systems. In the second, str points to arbitrary memory, dereferencing that uninitialised pointer may or may not crash.
Because NULL is typically defined as ((void *)0) (for example in <stdlib.h>), you are trying to write to an invalid memory address, which makes your program crash.
Read this link
//destination =Pointer to the destination array where the content is to be copied.
char * strcpy ( char * destination, const char * source );
You're setting destination to NULL, so you're trying to copy source to NULL. That's why it crashes. You should set some memory aside to copy the string to instead.
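#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(){
char *str=malloc(6); //enough for "swami"+'\0'
if (str == NULL) return 1; // allocation failed
strcpy(str, "swami");
printf("%s", str);
free(str);
return 0;
}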
strcpy only copies; it does not allocate space for the copy.
In the first case you tried to write the string to address 0 (the NULL pointer), which is not a good idea: on most systems that page is deliberately left inaccessible.
In the second case, you started writing the string to some arbitrary location, and in your case it didn't crash due to luck and maybe some help from the compiler.
You should do one of the following:
a. allocate memory: str = new char[10]
b. use strdup, which will duplicate the string into a newly allocated location.

Good way to get a char[] from another function. Starting to think in C/C++

As I understand it, correct programming style says that if you want to get a string (char[]) from another function, it is best for the caller to create the char* buffer and pass it to the string-formatting function together with the buffer's length. In my case, the string-formatting function is "getss".
void getss(char *ss, int& l)
{
sprintf (ss,"aaaaaaaaaa%d",1);
l=11;
}
int _tmain(int argc, _TCHAR* argv[])
{
char *f = new char [1];
int l =0;
getss(f,l);
cout<<f;
char d[50] ;
cin>> d;
return 0;
}
"getss" formats the string and returns it in ss. I thought that getss is not allowed to go outside the string length that was created by the caller. By my understanding, the caller tells the length through the variable "l", and "getss" returns the length back in case the buffer is not filled completely, but it is not allowed to go outside the array range defined by the caller.
But reality showed me that it does not really matter what size of buffer the caller creates. It is OK if you create a buffer of size 1 and getss fills it with 11 characters. In the output I get all the characters that "getss" has filled.
So what is the reason to pass the length variable? You will always get a string that is zero-terminated, and you will find its end from that.
What is the reason to create a buffer with a specified length if getss can expand it?
How is it done in the real world, to get a string from another function?
Actually, the caller is the one that has allocated the buffer and knows the maximum size of the string that can fit inside. It passes that size to the function, and the function has to use it to avoid overflowing the passed buffer.
In your example, it means calling snprintf() rather than sprintf():
void getss(char *ss, int& l)
{
l = snprintf(ss, l, "aaaaaaaaaa%d", 1);
}
In C++, of course, you only have to return an instance of std::string, so that's mostly a C paradigm. Since C does not support references, the function usually returns the length of the string:
int getss(char *buffer, size_t bufsize)
{
return snprintf(buffer, bufsize, "aaaaaaaaaa%d", 1);
}
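A minimal sketch of how a caller can use that return value, relying on the documented behavior that snprintf returns the length the full output would have, and accepts a null buffer when the size is 0. The callGetss helper is hypothetical.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// C-style producer: writes at most bufsize bytes (including the '\0') and
// returns the length the complete string would have, like snprintf itself.
int getss(char* buffer, std::size_t bufsize) {
    return std::snprintf(buffer, bufsize, "aaaaaaaaaa%d", 1);
}

// Hypothetical caller: ask for the required size first, then allocate exactly enough.
std::string callGetss() {
    const int needed = getss(nullptr, 0);  // size 0: nothing is written
    if (needed < 0)
        return {};  // snprintf reported an encoding error
    std::vector<char> buf(static_cast<std::size_t>(needed) + 1);  // +1 for the '\0'
    getss(buf.data(), buf.size());
    return std::string(buf.data(), static_cast<std::size_t>(needed));
}
```

A buffer that is too small is never overrun: the output is truncated, and a return value greater than or equal to bufsize tells the caller so.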
You were only lucky. sprintf() can't expand the caller's storage, and unless you pass in a char array of at least length + 1 elements, you get undefined behavior; expect your program to crash.
In this case you are simply lucky that there is no "important" other data after the "char*" in memory.
The C runtime does not always detect these kinds of violations reliably.
Nonetheless, you are corrupting memory here, and your program may crash at any time.
Apart from that, using raw "char*" pointers is really a thing you should not do any more in "modern" C++ code.
Use STL classes (std::string, std::wstring) instead. That way you do not have to bother about memory issues like this.
In the real world, in C++ it is better to use std::string objects and std::stringstream.
char *f = new char [1];
sprintf (ss,"aaaaaaaaaa%d",1);
Hello, buffer overflow! Use snprintf instead of sprintf in C and use C++ features in C++.
By my understanding, the caller tells the length through the variable "l", and "getss" returns the length back in case the buffer is not filled completely, but it is not allowed to go outside the array range defined by the caller.
This is spot on!
But reality showed me that it does not really matter what size of buffer the caller creates. It is OK if you create a buffer of size 1 and getss fills it with 11 characters. In the output I get all the characters that "getss" has filled.
This is absolutely wrong: you invoked undefined behavior, and did not get a crash. A memory checker such as valgrind would report this behavior as an error.
So what is reason to pass length variable.
The length is there to avoid this kind of undefined behavior. I understand that this is rather frustrating when you do not know the length of the string being returned, but this is the only safe way of doing it that does not create questions of string ownership.
One alternative is to allocate the return value dynamically. This lets you return strings of arbitrary length, but the caller is now responsible for freeing the returned value. This is not very intuitive to the reader, because malloc and free happen in different places.
The answer in C++ is quite different, and it is a lot better: you use std::string, a class from the standard library that represents strings of arbitrary length. Objects of this class manage the memory allocated for the string, eliminating the need to call free manually.
In C++, consider smart pointers, in your case probably a shared_ptr (for arrays, std::unique_ptr<char[]>, or std::shared_ptr<char[]> since C++17), which will take care of freeing the memory. Currently your program leaks memory, since you never free the memory you allocate with new. Space allocated by new must be deallocated with delete (delete[] for arrays), or it stays allocated until your program exits. This is bad: imagine your browser not freeing the memory it uses for tabs when you close them.
In the special case of strings, I would recommend what the other answers said: go with std::string. Since C++11 it will be moved (not copied) when returned, so you don't need to use new and have no worries about delete.
#include <string>
std::string myFunc() {
std::string str;
//work with str
return str;
}
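A minimal sketch of getss rewritten in that style, using std::ostringstream as suggested above. The caller no longer supplies a buffer or a length, because the returned std::string owns its memory and knows its own size.

```cpp
#include <sstream>
#include <string>

// C++ version of getss: builds "aaaaaaaaaa1" and returns it by value.
std::string getss() {
    std::ostringstream out;
    out << "aaaaaaaaaa" << 1;  // same content as sprintf(ss, "aaaaaaaaaa%d", 1)
    return out.str();          // moved (or elided) out of the function
}
```

The caller simply writes std::string f = getss(); with no new, no delete, and no way to overflow a buffer.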
In C++ you don't have to build a string. Just output the parts separately
std::cout << "aaaaaaaaaa" << 1;
Or, if you want to save it as a string
std::string f = "aaaaaaaaaa" + std::to_string(1);
(Even though calling to_string is a bit silly for a constant value.)

Memcpy, string and terminator

I have to write a function that fills a char* buffer for an assigned length with the content of a string. If the string is too long, I just have to cut it. The buffer is not allocated by me but by the user of my function. I tried something like this:
int writebuff(char* buffer, int length){
string text="123456789012345";
memcpy(buffer, text.c_str(),length);
//buffer[length]='\0';
return 1;
}
int main(){
char* buffer = new char[10];
writebuff(buffer,10);
cout << "After: "<<buffer<<endl;
}
My question is about the terminator: should it be there or not? This function is used in much larger code, and sometimes I seem to get strange characters when the string needs to be cut.
Any hints on the correct procedure to follow?
A C-style string must be terminated with a zero character '\0'.
In addition you have another problem with your code - it may try to copy from beyond the end of your source string. This is classic undefined behavior. It may look like it works, until the one time that the string is allocated at the end of a heap memory block and the copy goes off into a protected area of memory and fails spectacularly. You should copy only until the minimum of the length of the buffer or the length of the string.
P.S. For completeness here's a good version of your function. Thanks to Naveen for pointing out the off-by-one error in your terminating null. I've taken the liberty of using your return value to indicate the length of the returned string, or the number of characters required if the length passed in was <= 0.
int writebuff(char* buffer, int length)
{
string text="123456789012345";
if (length <= 0)
return text.size();
if (text.size() < length)
{
memcpy(buffer, text.c_str(), text.size()+1);
return text.size();
}
memcpy(buffer, text.c_str(), length-1);
buffer[length-1]='\0';
return length-1;
}
If you want to treat the buffer as a string, you should null-terminate it. For this you need to copy length-1 characters using memcpy and set the character at index length-1 to '\0'.
it seems you are using C++ - given that, the simplest approach is (assuming that NUL termination is required by the interface spec)
int writebuff(char* buffer, int length)
{
string text = "123456789012345";
std::fill_n(buffer, length, 0); // reset the entire buffer
// use the built-in copy method from std::string, it will decide what's best.
text.copy(buffer, length);
// only over-write the last character if the source fills the whole buffer
if (static_cast<std::size_t>(length) <= text.size())
buffer[length-1] = 0;
return 1; // eh?
}
char* buffers must be null-terminated, unless you explicitly pass the length along with them everywhere and document that the buffer is not null-terminated.
Whether or not you should terminate the string with a \0 depends on the specification of your writebuff function. If what you have in buffer should be a valid C-style string after calling your function, you should terminate it with a \0.
Note, though, that c_str() will terminate with a \0 for you, so you could use text.size() + 1 as the size of the source string. Also note that if length is larger than the size of the string, your current code will copy beyond the end of what text provides (you can copy min(length - 1, text.size() + 1/*trailing \0*/) bytes to prevent that, and set buffer[length - 1] = 0 to cap it off).
The buffer allocated in main is leaked, btw
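A minimal sketch of that min-based approach. Unlike the question's version, this writebuff takes the source text as a parameter (an assumption made for testability) and returns the number of characters copied, excluding the terminator.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Copies as much of text as fits into buffer (capacity length bytes) and
// always leaves a null-terminated C string behind.
int writebuff(char* buffer, int length, const std::string& text) {
    if (buffer == nullptr || length <= 0)
        return 0;  // nothing can be written safely
    const std::size_t cap = static_cast<std::size_t>(length) - 1;  // keep room for '\0'
    // c_str() is null-terminated, so size() + 1 bytes are readable from it.
    const std::size_t n = std::min(cap, text.size() + 1);
    std::memcpy(buffer, text.c_str(), n);
    buffer[cap] = '\0';  // cap it off in case the text was truncated
    return static_cast<int>(std::min(cap, text.size()));
}
```

The copy never reads past the end of text and never writes past buffer[length - 1].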
my question is about the terminator: should it be there or not?
Yes. It should be there. Otherwise how would you later know where the string ends? And how would cout know? It would keep printing garbage until it encounters a garbage byte whose value happens to be \0. Your program might even crash.
As a side note, your program is leaking memory: it doesn't free the memory it allocates. But since you're exiting from main(), it doesn't matter much; once the program ends, all the memory goes back to the OS whether you deallocate it or not. Still, it's good practice in general not to forget to deallocate memory (or any other resource) yourself.
I agree with Necrolis that strncpy is the way to go, but it will not get the null terminator if the string is too long. You had the right idea in putting an explicit terminator, but as written your code puts it one past the end. (This is in C, since you seemed to be doing more C than C++?)
int writebuff(char* buffer, int length){
const char* text="123456789012345";
strncpy(buffer, text, length);
buffer[length-1]='\0';
return 1;
}
It should most definitely be there*: this prevents strings that are too long for the buffer from filling it completely and causing an overflow later on when it's accessed. Though IMO, strncpy should be used instead of memcpy, but you'll still have to null-terminate it. (Also, your example leaks memory.)
*if you're ever in doubt, go the safest route!
First, I don't know whether writebuff should terminate the string or not. That is a design question, to be answered by the person who decided that writebuff should exist at all.
Second, taking your specific example as a whole, there are two problems. One is that you pass an unterminated string to operator<<(ostream, char*). Second is the commented-out line writes beyond the end of the indicated buffer. Both of these invoke undefined behavior.
(Third is a design flaw -- can you know that length is always less than the length of text?)
Try this:
int writebuff(char* buffer, int length){
string text="123456789012345";
memcpy(buffer, text.c_str(),length);
buffer[length-1]='\0';
return 1;
}
int main(){
char* buffer = new char[10];
writebuff(buffer,10);
cout << "After: "<<buffer<<endl;
}
In main(), you should delete the buffer you allocated with new, or allocate it statically (char buf[10]). Yes, it's only 10 bytes, and yes, it's a memory "pool," not a leak, since it's a one-time allocation, and yes, you need that memory around for the entire running time of the program. But it's still a good habit to get into.
In C/C++ the general contract with character buffers is that they be null-terminated, so I would include it unless I had been explicitly told not to do it. And if I did, I would comment it, and maybe even use a typedef or name on the char* parameter indicating that the result is a string that is not null-terminated.