Why do I get "double free or corruption"? - c++

I am trying to serialize a struct, but the program crashed with:
*** glibc detected *** ./unserialization: double free or corruption (fasttop): 0x0000000000cf8010 ***
#include <iostream>
#include <cstdlib>
#include <cstring>
struct Dummy
{
std::string name;
double height;
};
template<typename T>
class Serialization
{
public:
static unsigned char* toArray (T & t)
{
unsigned char *buffer = new unsigned char [ sizeof (T) ];
memcpy ( buffer , &t , sizeof (T) );
return buffer;
};
static T fromArray ( unsigned char *buffer )
{
T t;
memcpy ( &t , buffer , sizeof (T) );
return t;
};
};
int main ( int argc , char **argv )
{
Dummy human;
human.name = "Someone";
human.height = 11.333;
unsigned char *buffer = Serialization<Dummy>::toArray (human);
Dummy dummy = Serialization<Dummy>::fromArray (buffer);
std::cout << "Name:" << dummy.name << "\n" << "Height:" << dummy.height << std::endl;
delete buffer;
return 0;
}

I see two problems with this code:
You are invoking undefined behavior by memcpying a struct containing a std::string into another location. If you memcpy a class that isn't just a pure struct (for example, a std::string), it can cause all sorts of problems. In this particular case, I think that part of the problem might be that std::string sometimes stores an internal pointer to a buffer of characters containing the actual contents of the string. If you memcpy the std::string, you bypass the string's normal copy constructor that would duplicate the string. Instead, you now have two different instances of std::string sharing a pointer, so when they are destroyed they will both try to delete the character buffer, causing the bug you're seeing. There is no easy fix for this other than to not do what you're doing. It's just fundamentally unsafe.
You are allocating memory with new[], but deleting it with delete. You should use the array deleting operator delete[] to delete this memory, since using regular delete on it will result in undefined behavior, potentially causing this crash.
Hope this helps!
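The memcpy hazard described above can be turned into a compile-time error. The following is a minimal sketch, assuming a C++11 compiler; `copyBytes` and `PlainDummy` are hypothetical names introduced only for illustration.

```cpp
#include <cstring>
#include <string>
#include <type_traits>

struct Dummy {              // same members as the struct in the question
    std::string name;
    double height;
};

struct PlainDummy {         // hypothetical variant that holds no owning members
    char name[100];
    double height;
};

// Copies the raw bytes of t into a caller-supplied buffer of at least sizeof(T) bytes.
// The static_assert rejects any type that memcpy cannot copy safely.
template <typename T>
void copyBytes(unsigned char* out, const T& t) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable to be copied with memcpy");
    std::memcpy(out, &t, sizeof(T));
}
```

With this guard, `copyBytes(buf, PlainDummy{})` compiles, while calling it with a `Dummy` fails at compile time instead of crashing at run time.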

It's not valid to use memcpy() with a data element of type std::string (or really, any non-POD data type). The std::string class stores the actual string data in a dynamically-allocated buffer. When you memcpy() the contents of a std::string around, you are obliterating the pointers allocated internally and end up accessing memory that has been already freed.
You could make your code work by changing the declaration to:
struct Dummy
{
char name[100];
double height;
};
However, that has the disadvantages of a fixed size name buffer. If you want to maintain a dynamically sized name, then you will need to have a more sophisticated toArray and fromArray implementation that doesn't do straight memory copies.
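One possible shape for such a toArray/fromArray pair is sketched below. The byte layout (a 4-byte length, then the characters, then the double) is an assumption of this sketch, not something the question specifies. It uses the host's byte order and does no bounds checking, so it only suits buffers produced by the matching toArray on the same machine.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

struct Dummy {
    std::string name;
    double height;
};

// Assumed layout: [uint32 name length][name bytes][double height]
std::vector<unsigned char> toArray(const Dummy& d) {
    const std::uint32_t len = static_cast<std::uint32_t>(d.name.size());
    std::vector<unsigned char> out(sizeof len + len + sizeof d.height);
    unsigned char* p = out.data();
    std::memcpy(p, &len, sizeof len);
    p += sizeof len;
    std::memcpy(p, d.name.data(), len);     // copy the characters, not the string object
    p += len;
    std::memcpy(p, &d.height, sizeof d.height);
    return out;
}

// Assumes `in` was produced by toArray above; no validation is performed.
Dummy fromArray(const std::vector<unsigned char>& in) {
    Dummy d;
    std::uint32_t len = 0;
    const unsigned char* p = in.data();
    std::memcpy(&len, p, sizeof len);
    p += sizeof len;
    d.name.assign(reinterpret_cast<const char*>(p), len);  // builds a fresh, owning string
    p += len;
    std::memcpy(&d.height, p, sizeof d.height);
    return d;
}
```

Because the new string is built with assign, each Dummy owns its own character buffer and nothing is freed twice.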

You're copying the string's internal buffer in the toArray call. When deserializing with fromArray you "create" a second string in dummy, which thinks it owns the same buffer as human.

std::string probably contains a pointer to a buffer that contains the string data. When you call toArray (human), you're memcpy()'ing the Dummy class's string, including the pointer to the string's data. Then when you create a new Dummy object by memcpy()'ing directly into it, you've created a new string object with the same pointer to string data as the first object. Next thing you know, dummy gets destructed and the copy of the pointer gets destroyed, then human gets destructed and BAM, you got a double free.
Generally, copying objects using memcpy like this will lead to all sorts of problems, like the one you've seen. It's probably just the tip of the iceberg. Instead, you might consider explicitly implementing some sort of marshalling function for each class you want to serialize.
Alternatively, you might look into JSON libraries for C++, which can serialize things into a convenient text-based format. JSON is commonly used in custom network protocols where you want to serialize objects to send over a socket.

Related

Can't add item to const char* vector? C++

I have an issue where I need a vector of const char* but for some reason whenever I try adding something nothing happens. Here is the code sample in question.
std::vector<const char*> getArgsFromFile(char* arg) {
std::ifstream argsFile(arg);
std::vector<const char*> args;
while (!argsFile.eof()) {
std::string temp;
argsFile >> temp;
args.push_back(temp.c_str());
}
args.pop_back();
return args;
}
The strange part is if I make this change
std::vector<const char*> getArgsFromFile(char* arg) {
std::ifstream argsFile(arg);
std::vector<const char*> args;
while (!argsFile.eof()) {
std::string temp;
argsFile >> temp;
const char* x = "x";
args.push_back(x);
}
args.pop_back();
return args;
}
It will add 'x' to the vector but I can't get the value of temp into the vector. Any thoughts? Help would be greatly appreciated. Thanks!
A const char* is not a string, but merely a pointer to some memory, usually holding some characters. Now std::string under the hood either holds a small region of memory (like char buff[32]) or, for larger strings, keeps a pointer to memory allocated on the heap. In either case, a pointer to the actual memory holding the data can be obtained via string::c_str(). But when the string goes out of scope that pointer no longer points to secured data and becomes dangling.
This is the reason why C++ introduced methods to avoid direct exposure and usage of raw pointers. Good C++ code avoids raw pointers like the plague. Your homework is for poor/bad C++ code (hopefully only to learn the problems that come with such raw pointers).
So, in order for the pointers held in your vector to persistently point to some characters (and not become dangling), they must point to persistent memory. The only guaranteed way to achieve that is to dynamically allocate the memory:
while (!argsFile.eof()) {
std::string temp;
argsFile >> temp;
char* buff = new char[temp.size()+1]; // allocate memory
std::strncpy(buff,temp.c_str(),temp.size()+1); // copy data to memory
args.push_back(buff); // store pointer in vector
}
but then the memory allocated in this way will be leaked, unless you de-allocate it as in
while(!args.empty()) {
delete[] args.back();
args.pop_back();
}
Note that this is extremely bad C++ code and not exception safe (if an exception occurs between allocation and de-allocation, the allocated memory is leaked). In C++ one would instead use std::vector<std::string> or perhaps std::vector<std::unique_ptr<const char[]>> (if you cannot use std::string), both being exception safe.
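If some API still insists on const char* values, one option is to let a std::vector<std::string> own the characters and build the pointer vector as a temporary view over it. This is a rough sketch; `readArgs` and `asCStrings` are hypothetical names, and the input is taken as a generic stream rather than a file so the example stays self-contained.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads whitespace-separated tokens. The strings own their characters.
// Looping on the extraction itself avoids the extra element an eof() loop produces.
std::vector<std::string> readArgs(std::istream& in) {
    std::vector<std::string> args;
    std::string token;
    while (in >> token) {
        args.push_back(token);
    }
    return args;
}

// Builds a non-owning view. The pointers are valid only while `args`
// is alive and none of its strings are modified.
std::vector<const char*> asCStrings(const std::vector<std::string>& args) {
    std::vector<const char*> view;
    view.reserve(args.size());
    for (const std::string& s : args) {
        view.push_back(s.c_str());
    }
    return view;
}
```

The caller keeps the std::vector<std::string> in scope for as long as the pointer view is in use, so nothing dangles and nothing needs a manual delete[].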
Use a standard-library-based implementation
Guideline SL.1 of the C++ coding guidelines says: "Use the standard library whenever possible" (and relevant). Why work so hard? People have already done most of the work for you...
So, using your function's declaration, you could just have:
std::vector<std::string> getArgsFromFile(char* arg) {
using namespace std;
ifstream argsFile(arg);
vector<string> args;
copy(istream_iterator<string>(argsFile),
istream_iterator<string>(),
back_inserter(args));
return args;
}
and Bob's your uncle.
Still, @Walter's answer is very useful to read, so that you realize what's wrong with your use of char * for strings.

how to initialize a char pointer in a class?

I have a confusion while dealing with char pointers. Please have a look at following code:
class Person
{
char* pname;
public:
Person(char* name)
{
//I want to initialize 'pname' with the person's name. So, I am trying to
//achieve the same with different scenario's
//Case I:
strcpy(pname, name); // As expected, system crash.
//Case II:
// suppose the input is "ABCD", so trying to create 4+1 char space
// 1st 4 for holding ABCD and 1 for '\0'.
pname = (char*) malloc(sizeof(char) * (strlen(name)+1) );
strcpy(pname, name);
// Case III:
pname = (char*) malloc(sizeof(char));
strcpy(pname, name);
}
void display()
{
cout<<pname<<endl;
}
};
void main()
{
Person obj("ABCD");
obj.display();
}
For Case I:
As expected, system crash.
Output for Case II:
ABCD
Output for Case III:
ABCD
So, I am not sure why Case II & III are producing the same output !!!!.....
How I should initialize a char pointer in a class?
The third case invokes Undefined Behavior and so anything might happen in that case.
You are writing beyond the bounds of the allocated memory in this case, which may or may not crash, but it is UB either way.
How to do this the right way in C++?
By not using char * at all!
Just simply use std::string.
Note that std::string provides you with the c_str() function, which gets you the underlying character string. Unless you need to pass ownership of a char * to a C-style API, you should always use std::string in C++.
The third option is also wrong as you haven't allocated enough memory for it. You're trying to copy a string of size 5 to a buffer of size 1, which means whatever lies in memory after pname[0] is overwritten.
If you're lucky, you may see a runtime error such as a memory access violation. Otherwise you won't see anything, but the data behind the buffer is corrupted (e.g., your bank account), and you never know about it until much later.
The correct way to go is to always allocate enough memory to copy to. A better way in C++ is to use std::string, as Als points out, because it'll free you from manual management of memory (allocation, growing, deallocation, etc).
E.g.,
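#include <iostream>
#include <string>
class Person
{
std::string pname;
public:
Person(const char* name) // const: string literals cannot bind to char*
{
pname = name; // std::string copies the characters and owns them
}
void display()
{
std::cout << pname << std::endl;
}
};
int main()
{
Person obj("ABCD");
obj.display();
}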
You have to allocate memory for your member variable pname, however, I don't know why you want to use a char* when you can just use a string:
std::string pname;
//...
pname = std::string(name);
If there is a good reason why you must use a char*, then do something of the sort:
// allocate space for the characters plus the terminating '\0'
pname = new char[strlen(name) + 1];
// copy the name, including its terminator
strcpy(pname, name);
Note the + 1: strlen does not count the terminating null character, but strcpy copies it, so the buffer needs one extra char. Without it, strcpy writes one byte past the end of the allocation.
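For completeness, here is a minimal sketch of what the char* route involves once cleanup is included. It assumes C++11 (for `= delete`), and the `name()` accessor is a hypothetical addition for illustration. The buffer is sized strlen(name) + 1 because strlen does not count the terminating '\0' that strcpy copies.

```cpp
#include <cstring>

class Person {
    char* pname;
public:
    explicit Person(const char* name)
        : pname(new char[std::strlen(name) + 1]) {   // +1 for the '\0' terminator
        std::strcpy(pname, name);
    }
    ~Person() { delete[] pname; }                    // matches the new[] above

    // Copying would leave two objects deleting the same buffer, so forbid it.
    Person(const Person&) = delete;
    Person& operator=(const Person&) = delete;

    const char* name() const { return pname; }
};
```

All of this bookkeeping disappears when the member is a std::string.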
If you are into the C++ business, then it's time to dump char pointers in favor of the STL string:
#include <string>
class Person
{
std::string the_name;
public:
Person(std::string name) : the_name(name)
{ ...
Also cout is used the same.
In your Case III, you do pname = (char*) malloc(sizeof(char));, which allocates enough memory for a single char. However, strcpy has no way of knowing that, and writes over whatever memory comes directly after that byte until it has finished copying the whole string you passed into the function. This is known as a buffer overflow, and while it might appear to work at first, it could break something down the road. If you are looking to copy only a subsection of the char*, you could look into strncpy, which copies up to some length (see its API reference). If you use that, be sure to add the null-terminating character yourself, as strncpy will not include it if you copy only part of the string.
That pname = (char*) malloc(sizeof(char)); works is coincidental, the call to strcpy writes into memory that hasn't been allocated, so it could crash your program at any time.
A simpler way to initialize your buffer would be:
pname = strdup(name);
or
pname = strndup(name, strlen(name));
See http://linux.die.net/man/3/strdup.
Also, you must think about freeing the memory allocated by calling free(pname); in the class destructor.
All in all, all of this can be avoided by the use of the C++ std::string class, as mentioned by everyone.
Correct is case II!
Yes, case I is wrong, it will crash since you are copying data to a non initialized pointer.
Case III is also wrong, but it works now because your test string is small! If you try with a bigger string it will corrupt memory since you are copying a big string to a small allocated space.
On some systems malloc works with clusters, allocating chunks of memory instead of allocating byte by byte. This means that when you used malloc to allocate a single byte (like you did in case III), it actually reserved more, up to the minimum block of memory it can handle. That's why you could write more than 1 byte to it without crashing the system.

help with fixing memory leak

I have a member function that needs to return a char array whose size is only known at run time.
My fear
is that if I try
delete buffer;
then I can't
return buffer;
But how do I release the memory I allocated with
char * buffer= new char[size]
The class:
class OpenglShaderLoader
{
char * getLastGlslError()
{
char * buffer;//i don't know the size of this until runtime
int size;
glShaderiv(hShaderId,GL_INFO_LOG_LENGTH,&size);//get size of buffer
buffer= new char[size];
//.. fill in the buffer
return buffer;
}
}
You should return a std::vector<char>. That way, when the caller finishes using the vector, its contents are freed automatically.
std::vector<char> getLastGlslError()
{
int size;
glShaderiv(hShaderId, GL_INFO_LOG_LENGTH, &size);
std::vector<char> buffer(size);
// fill in the buffer using &buffer[0] as the address
return buffer;
}
There is a simple adage: for every new there must be a delete. In your case, getLastGlslError returns a pointer to the buffer, so it is the caller that must free the memory, for example:
OpenglShaderLoader *ptr = new OpenglShaderLoader();
char *buf = ptr->getLastGlslError();
// do something with buf
delete [] buf;
delete ptr;
As the code above shows, the responsibility for managing the pointer rests with the caller.
You'd need another method, such as:
void freeLastGlslError(const char* s)
{
delete [] s;
}
But since you're using C++, not C, you shouldn't return a char*. For an object-oriented design, use a string class that manages the memory for you, like std::string. (Here's the litmus test to keep in mind: if memory is being freed outside of a destructor, you're probably doing something inadvisable.)
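As a rough illustration of returning a std::string whose size is only known at run time, the sketch below swaps the OpenGL calls for two hypothetical stand-ins (`fake_gl::logLength` and `fake_gl::copyLog`), so it runs without a graphics context. The real driver calls and their signatures are not shown here.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-ins for the driver queries in the question.
namespace fake_gl {
    const char kLog[] = "0:1(1): error: syntax error";
    // Length of the log including its terminating '\0'.
    int logLength() { return static_cast<int>(sizeof kLog); }
    // Writes at most `capacity` chars of the log into `out`.
    void copyLog(int capacity, char* out) {
        std::strncpy(out, kLog, static_cast<std::size_t>(capacity));
    }
}

std::string getLastError() {
    const int size = fake_gl::logLength();
    if (size <= 1) {
        return std::string();                 // nothing to report
    }
    std::string buffer(static_cast<std::size_t>(size), '\0');
    fake_gl::copyLog(size, &buffer[0]);       // fill the string's own storage
    buffer.resize(static_cast<std::size_t>(size) - 1);  // drop the trailing '\0'
    return buffer;                            // the caller never calls delete
}
```

The string releases its memory in its destructor, which satisfies the litmus test above.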
Here's a trick how to do it:
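class A {
public:
A() : buffer(0) { }
// frees the previous buffer, so a pointer returned earlier becomes invalid
char *get() { delete [] buffer; buffer = new char[10]; return buffer; }
~A() { delete [] buffer; }
private:
A(const A&); // not copyable: a copy would delete the buffer twice
A& operator=(const A&);
char *buffer;
};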
When you return that pointer, whatever you're returning the pointer to should assume responsibility over that resource (i.e. delete it when done with it).
Alternatively, you can use a smart pointer to automatically delete the memory for you when nothing points to it.
Creating and returning a stl container or class (e.g. std::vector, std::string) is also a viable option.
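The smart-pointer option could look roughly like this, assuming a C++11 compiler. `makeMessage` is a hypothetical function that stands in for whatever fills the buffer.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// Returns a heap buffer whose ownership moves to the caller.
// std::unique_ptr<char[]> calls delete[] automatically when it goes out of scope.
std::unique_ptr<char[]> makeMessage(const char* text) {
    const std::size_t size = std::strlen(text) + 1;   // include the '\0'
    std::unique_ptr<char[]> buffer(new char[size]);
    std::memcpy(buffer.get(), text, size);
    return buffer;
}
```

The return type itself documents who owns the memory, so the caller does not have to remember a matching delete[].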
Don't return a primitive char*. Encapsulate it in a class.
Assuming that the char array is really not a NULL terminated string, you need to include the size of it on return anyway. (It is sort of messy to continuously call glShaderiv to get the length, especially if it has performance implications. Easier to store the size with the allocation.)
Some have suggested using std::string or std::vector as the return. While each of these will work to a varying degree, they don't tell you what it is that is in each instance. Is it a string you print or is it an array of signed 8 bit integers?
A vector might be closer to what you need, but when you're looking at the code a year from now you won't know if the output vector of one method contains shader info when compared to another method that also returns a vector. There may also be implications of vector that make it undesirable for things like filling the buffer by passing a pointer to a device driver method since the storage is technically hidden.
So putting the return in a class that allocates your buffer and stores the size of the allocation allows you to let the return instance go out of scope and delete the buffer when the caller is done with it.
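A small wrapper along those lines might look like the following sketch. `ShaderInfoLog` is a hypothetical name, and C++11 is assumed so that the object can be returned by value through the implicitly generated move constructor.

```cpp
#include <cstddef>
#include <memory>

// A named type that owns a byte buffer and remembers its size.
class ShaderInfoLog {
    std::unique_ptr<char[]> data_;
    std::size_t size_;
public:
    explicit ShaderInfoLog(std::size_t size)
        : data_(new char[size]()), size_(size) {}    // () zero-initializes the bytes

    char* data() { return data_.get(); }             // writable pointer for a fill routine
    const char* data() const { return data_.get(); }
    std::size_t size() const { return size_; }
};
```

A function can declare `ShaderInfoLog getLastGlslError()` and the buffer is freed when the returned object goes out of scope, while the type name says what the bytes mean.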
Nobody mentioned managed pointers yet?
If you don't need the features of a vector, then a managed array pointer such as boost::shared_array<char> might also help rather than rolling your own as in tp1's answer. (With a C++11 compiler, std::unique_ptr<char[]> fills the same role.)
boost::shared_array<char> getLastGlslError()
{
int size;
glShaderiv(hShaderId, GL_INFO_LOG_LENGTH, &size);
boost::shared_array<char> buffer(new char[size]);
// .. fill in the buffer through buffer.get()
return buffer;
}

Passing C style string around function

I have a situation, shown below, in which I need to pass a C-style string into a function and store it in a container for later use. The container stores char*. I can't figure out an efficient way to allocate the memory and store it in the vector. In overloadedfunctionA(int), I need to allocate new memory, copy into a buffer, and pass it to overloadedfunctionA(char*), which again allocates new memory and copies into a buffer. Imagine I have a lot of items of type int and other types, and I am doing twice the work every time. One way to solve it is to copy the logic from overloadedfunctionA(char*) into overloadedfunctionA(int), but that would result in a lot of repetitive code. Any ideas on a more efficient way to do this?
Thanks.
int main(){
overloadedfunctionA(5);
overloadedfunctionA("abc");
}
vector<char*> v1;
void overloadedfunctionA(int intA){
char* buffer = new char[];
convert int to char in buffer;
overloadedfunctionA(buffer1);
delete buffer;
}
//act as base function that has a lot of logic need to be performed
void overloadedfunctionA(char* string1){
char* buffer = new char[];
copy string to buffer;
insert string into vector1;
}
For all that's holy, just use std::string internally. You can assign to a std::string from a const char*, and you can access the std::string's underlying C string through the c_str() method.
Otherwise, you'd need to write a wrapper class that handles memory allocation for you, and store instances of that class in your vector; but then, std::string already IS such a wrapper class.
Hopefully making the earlier solutions easier for you to follow...
#include <vector>
#include <string>
#include <sstream>
std::vector<std::string> the_vector;
// template here... use an overload if you prefer...
template <typename T>
void fn(const T& t)
{
std::ostringstream oss;
oss << t;
the_vector.push_back(oss.str());
}
int main()
{
fn(3);
fn("whatever");
}
One optimization you can do is to maintain a static hashmap of strings, so that you can use a single instance for duplicate values.
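A minimal sketch of that idea, assuming C++11: `intern` is a hypothetical helper that keeps one copy of each distinct string in a function-local pool. It is not thread-safe, and the pooled strings live until the program exits.

```cpp
#include <string>
#include <unordered_set>

// Returns a pointer to the single pooled copy of s.
// Elements of std::unordered_set are never moved once inserted,
// so the returned pointer stays valid for the life of the pool.
const char* intern(const std::string& s) {
    static std::unordered_set<std::string> pool;
    return pool.insert(s).first->c_str();
}
```

Equal strings then share one allocation, and comparing two interned strings reduces to comparing pointers.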
Another solution is to use std::string and redeclare overloadedfunctionA(char*) as overloadedfunctionA(const std::string&).
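With that signature, the two overloads from the question might reduce to the sketch below. It assumes C++11 for std::to_string and keeps the question's names.

```cpp
#include <string>
#include <vector>

std::vector<std::string> v1;

// Base function: the vector stores its own copy of the string.
void overloadedfunctionA(const std::string& string1) {
    v1.push_back(string1);
}

// The int overload only converts and forwards. No manual buffer is needed.
void overloadedfunctionA(int intA) {
    overloadedfunctionA(std::to_string(intA));
}
```

A call such as overloadedfunctionA("abc") picks the std::string overload through the implicit conversion from const char*, so each value is copied once, into the vector.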
Yet another option is to let a Garbage Collector do the memory management.
If your strings are long such that copying is likely to be a performance problem, you could consider boost::shared_array as an alternative to std::string. Provided the contents are invariant, this offers a solution where only one copy of the string will need to be held in memory. Note however that you would be trading reference counting overhead for string copying overhead here.
Some standard library implementations may optimize std::string for you this way anyway (copy-on-write), so that a single copy is kept and ref-counted until the contents are changed, at which point a new array is allocated for the modified contents with its own refcount. This was mainly a pre-C++11 technique; C++11's rules for std::string effectively rule it out.

Caching a const char * as a return type

Was reading up a bit on my C++, and found this article about RTTI (Runtime Type Identification):
http://msdn.microsoft.com/en-us/library/70ky2y6k(VS.80).aspx . Well, that's another subject :) - However, I stumbled upon a weird saying in the type_info-class, namely about the ::name-method. It says: "The type_info::name member function returns a const char* to a null-terminated string representing the human-readable name of the type. The memory pointed to is cached and should never be directly deallocated."
How can you implement something like this yourself!? I've been struggling quite a bit with this exact problem often before, as I don't want to make a new char-array for the caller to delete, so I've stuck to std::string thus far.
So, for the sake of simplicity, let's say I want to make a method that returns "Hello World!", let's call it
const char *getHelloString() const;
Personally, I would make it somehow like this (Pseudo):
const char *getHelloString() const
{
char *returnVal = new char[13];
strcpy("HelloWorld!", returnVal);
return returnVal
}
.. But this would mean that the caller should do a delete[] on my return pointer :(
Thx in advance
How about this:
const char *getHelloString() const
{
return "HelloWorld!";
}
Returning a literal directly means the space for the string is allocated in static storage by the compiler and will be available throughout the duration of the program.
I like all the answers about how the string could be statically allocated, but that's not necessarily true for all implementations, particularly the one whose documentation the original poster linked to. In this case, it appears that the decorated type name is stored statically in order to save space, and the undecorated type name is computed on demand and cached in a linked list.
If you're curious about how the Visual C++ type_info::name() implementation allocates and caches its memory, it's not hard to find out. First, create a tiny test program:
#include <cstdio>
#include <typeinfo>
#include <vector>
int main(int argc, char* argv[]) {
std::vector<int> v;
const type_info& ti = typeid(v);
const char* n = ti.name();
printf("%s\n", n);
return 0;
}
Build it and run it under a debugger (I used WinDbg) and look at the pointer returned by type_info::name(). Does it point to a global structure? If so, WinDbg's ln command will tell the name of the closest symbol:
0:000> ?? n
char * 0x00000000`00857290
"class std::vector<int,class std::allocator<int> >"
0:000> ln 0x00000000`00857290
0:000>
ln didn't print anything, which indicates that the string wasn't in the range of addresses owned by any specific module. It would be in that range if it was in the data or read-only data segment. Let's see if it was allocated on the heap, by searching all heaps for the address returned by type_info::name():
0:000> !heap -x 0x00000000`00857290
Entry User Heap Segment Size PrevSize Unused Flags
-------------------------------------------------------------------------------------------------------------
0000000000857280 0000000000857290 0000000000850000 0000000000850000 70 40 3e busy extra fill
Yes, it was allocated on the heap. Putting a breakpoint at the start of malloc() and restarting the program confirms it.
Looking at the declaration in <typeinfo> gives a clue about where the heap pointers are getting cached:
struct __type_info_node {
void *memPtr;
__type_info_node* next;
};
extern __type_info_node __type_info_root_node;
...
_CRTIMP_PURE const char* __CLR_OR_THIS_CALL name(__type_info_node* __ptype_info_node = &__type_info_root_node) const;
If you find the address of __type_info_root_node and walk down the list in the debugger, you quickly find a node containing the same address that was returned by type_info::name(). The list seems to be related to the caching scheme.
The MSDN page linked in the original question seems to fill in the blanks: the name is stored in its decorated form to save space, and this form is accessible via type_info::raw_name(). When you call type_info::name() for the first time on a given type, it undecorates the name, stores it in a heap-allocated buffer, caches the buffer pointer, and returns it.
The linked list may also be used to deallocate the cached strings during program exit (however, I didn't verify whether that is the case). This would ensure that they don't show up as memory leaks when you run a memory debugging tool.
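The general pattern (compute on first request, cache, hand out a pointer the caller never frees) can be imitated in portable C++11. The sketch below is not the Visual C++ implementation: it uses a std::map instead of a linked list, `describe` is a hypothetical stand-in for the undecoration step, and it is not thread-safe.

```cpp
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>

// Hypothetical "expensive" step standing in for name undecoration.
std::string describe(const std::type_info& ti) {
    return std::string("type:") + ti.name();
}

// Returns a cached, null-terminated description. The caller must not free it.
const char* cachedName(const std::type_info& ti) {
    static std::map<std::type_index, std::string> cache;   // lives until program exit
    std::map<std::type_index, std::string>::iterator it =
        cache.find(std::type_index(ti));
    if (it == cache.end()) {
        // First request for this type: compute once and remember the result.
        it = cache.insert(std::make_pair(std::type_index(ti), describe(ti))).first;
    }
    return it->second.c_str();   // map nodes never move, so the pointer stays valid
}
```

Repeated calls for the same type return the same pointer, and the cache's destructor releases everything at program exit.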
Well, gee, if we are talking about a function that always returns the same value, it's quite simple.
const char * foo()
{
static char return_val[] = "HelloWorld!";
return return_val;
}
The tricky bit is when you start caching a computed result: then you have to consider threading, when your cache gets invalidated, and possibly storing things in thread-local storage. But if it's just a one-off output that is immediately copied, this should do the trick.
Alternatively, if you don't have a fixed size, you either have to use a static buffer of arbitrary size (which may eventually prove too small for some result) or turn to a managed class such as std::string.
const char * foo()
{
static std::string output;
DoCalculation(output);
return output.c_str();
}
also the function signature
const char *getHelloString() const;
is only applicable for member functions.
At which point you don't need to deal with static function local variables and could just use a member variable.
I think that since they know that there are a finite number of these, they just keep them around forever. It might be appropriate for you to do that in some instances, but as a general rule, std::string is going to be better.
They can also look up new calls to see if they made that string already and return the same pointer. Again, depending on what you are doing, this may be useful for you too.
Be careful when implementing a function that allocates a chunk of memory and then expects the caller to deallocate it, as you do in the OP:
const char *getHelloString() const
{
char *returnVal = new char[13];
strcpy(returnVal, "HelloWorld!"); // destination first, then source
return returnVal;
}
By doing this you are transferring ownership of the memory to the caller. If you call this code from some other function:
int main()
{
const char * str = getHelloString();
delete [] str; // memory from new[] must be released with delete[]
return 0;
}
...the semantics of transferring ownership of the memory is not clear, creating a situation where bugs and memory leaks are more likely.
Also, at least under Windows, if the two functions are in 2 different modules you could potentially corrupt the heap. In particular, if main() is in hello.exe, compiled in VC9, and getHelloString() is in utility.dll, compiled in VC6, you'll corrupt the heap when you delete the memory. This is because VC6 and VC9 both use their own heap, and they aren't the same heap, so you are allocating from one heap and deallocating from another.
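A common way around the cross-module problem is to export a matching release function from the module that allocates, so that allocation and deallocation always use the same heap. The sketch below assumes C++11. The names `createHelloString`, `destroyHelloString`, `HelloPtr` and `makeHello` are hypothetical, and everything is shown in one file for brevity.

```cpp
#include <cstring>
#include <memory>

// These two functions would live in the same module (for example the DLL).
char* createHelloString() {
    char* s = new char[13];              // 12 characters plus '\0'
    std::strcpy(s, "Hello World!");
    return s;
}

void destroyHelloString(char* s) {
    delete[] s;                          // runs in the module that called new[]
}

// Caller side: a unique_ptr with a custom deleter makes the hand-off explicit.
using HelloPtr = std::unique_ptr<char, void (*)(char*)>;

HelloPtr makeHello() {
    return HelloPtr(createHelloString(), &destroyHelloString);
}
```

The caller never invokes delete itself, so the string is always returned to the heap it came from.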
Why does the return type need to be const? Don't think of the method as a get method, think of it as a create method. I've seen plenty of APIs that require you to delete something a creation operator/method returns. Just make sure you note that in the documentation.
/* create a hello string
* must be deleted after use
*/
char *createHelloString() const
{
char *returnVal = new char[13];
strcpy(returnVal, "HelloWorld!"); // destination first, then source
return returnVal;
}
What I've often done when I need this sort of functionality is to have a char * pointer in the class - initialized to null - and allocate when required.
viz:
class CacheNameString
{
private:
char *name;
public:
CacheNameString():name(NULL) { }
~CacheNameString() { free(name); } // release the last cached string
const char *make_name(const char *v)
{
free(name); // free(NULL) is a no-op, so no check is needed
name = strdup(v);
return name;
}
};
Something like this would do:
const char *myfunction() {
static char *str = NULL; /* this only happens once */
delete [] str; /* delete previous cached version */
str = new char[strlen("whatever") + 1]; /* allocate space for the string and its NUL terminator */
strcpy(str, "whatever");
return str;
}
EDIT: Something that occurred to me is that a good replacement for this could be returning a boost::shared_ptr (or boost::shared_array for a new[] buffer) instead. That way the caller can hold onto it as long as they want and they don't have to worry about explicitly deleting it. A fair compromise IMO.
The advice that warns about the lifetime of the returned string is sound. You should always be careful about recognising your responsibilities when it comes to managing the lifetime of returned pointers. The practice is quite safe, however, provided the variable pointed to will outlast the call to the function that returned it. Consider, for instance, the pointer to const char returned by c_str() as a method of class std::string. This returns a pointer to memory managed by the string object, which is guaranteed to be valid as long as the string object is not destroyed or made to reallocate its internal memory.
In the case of the std::type_info class, it is a part of the C++ standard, as its namespace implies. The pointer returned from name() actually points to static memory created by the compiler and linker when the class was compiled, and is a part of the run time type identification (RTTI) system. Because it refers to a symbol in code space, you should not attempt to delete it.
I think something like this can only be implemented "cleanly" using objects and the RAII idiom.
When the object's destructor is called (obj goes out of scope), we can safely assume that the const char* pointers aren't being used anymore.
example code:
class ICanReturnConstChars
{
std::vector<char*> cached_strings; // std::vector, since std::stack has no push_back()/back()
public:
const char* yeahGiveItToMe(){
char* newmem = new char[something]; // 'something' is a placeholder for the required size
//write something to newmem
cached_strings.push_back(newmem);
return newmem;
}
~ICanReturnConstChars(){
while(!cached_strings.empty()){
delete [] cached_strings.back();
cached_strings.pop_back();
}
}
};
The only other possibility I know of is to pass a smart pointer.
It's probably done using a static buffer:
const char* GetHelloString()
{
static char buffer[256] = { 0 };
strcpy( buffer, "Hello World!" );
return buffer;
}
This buffer is like a global variable that is accessible only from this function.
You can't rely on GC; this is C++. That means you must keep the memory available until the program terminates. You simply don't know when it becomes safe to delete[] it. So, if you want to construct and return a const char*, simply new[] it and return it. Accept the unavoidable leak.