C++ string manipulation

My lack of C++ experience, or rather my early learning in garbage-collected languages, is really stinging me at the moment, and I have a problem working with strings in C++.
To make it very clear, using std::string or equivalents is not an option: this is char*s all the way.
So: what I need to do is very simple and basically boils down to concatenating strings. At runtime I have 2 classes.
One class contains "type" information in the form of a base filename.
in the header:
char* mBaseName;
and later, in the .cpp it is loaded with info passed in from elsewhere.
mBaseName = attributes->BaseName;
The 2nd class provides version information in the form of a suffix to the base file name, it's a static class and implemented like this at present:
static const char* const suffixes[] = {"Version1", "Version2", "Version3"}; //etc.
static const char* GetSuffix()
{
int i = 0;
//perform checks on some data structures
i = somevalue;
return suffixes[i];
}
Then, at runtime the base class creates the filename it needs:
void LoadStuff()
{
const char* suffix = GetSuffix();
char* nameToUse = new char[50];
sprintf(nameToUse, "%s%s",mBaseName,suffix);
LoadAndSetupData(nameToUse);
}
And you can see the problem immediately: nameToUse is never deleted, so memory leaks.
The suffixes are a fixed list, but the base filenames are arbitrary. The name that is created needs to persist beyond the end of LoadStuff(), as it's not clear when, if, and how it is used subsequently.
I am probably worrying too much, or being very stupid, but code similar to LoadStuff() appears in other places too, so it needs solving. It's frustrating, as I don't know enough about how things work to see a safe and "un-hacky" solution. In C# I'd just write:
LoadAndSetupData(mBaseName + GetSuffix());
and wouldn't need to worry.
Any comments, suggestions, or advice much appreciated.
Update
The issue with the code I am calling, LoadAndSetupData(), is that at some point it probably does copy the filename and keep it locally, but the actual instantiation is asynchronous: LoadAndSetupData actually puts things into a queue, and at that point at least, it expects that the string passed in still exists.
I do not control this code, so I can't change how it works.

Seeing now that the issue is how to clean up the string that you created and passed to LoadAndSetupData(), I am assuming that:
LoadAndSetupData() does not make its own copy
You can't change LoadAndSetupData() to do that
You need the string to still exist for some time after LoadAndSetupData() returns
Here are suggestions:
Can you make your own queue objects to be called? Are they guaranteed to be called after the ones that use your string? If so, create cleanup queue events with the same string that call delete[] on it.
Is there a maximum number of outstanding strings you can count on? If you created a large array of strings, could you use them in a cycle and be assured that when you got back to the beginning, it would be OK to reuse that string?
Is there an amount of time you can count on? If so, register the strings for deletion somewhere and check that after some time.
The best thing would be for functions that take char* to take ownership or copy. Shared ownership is the hardest thing to do without reference counting or garbage collection.
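The second suggestion (reusing a fixed set of buffers in a cycle) could look roughly like the sketch below. It assumes, without any guarantee from LoadAndSetupData, that no more than kSlots names are ever waiting in the queue at once; NameRing, kSlots and kLen are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical ring of fixed-size name buffers. ASSUMPTION: at most kSlots
// names are in flight at once, because a slot is overwritten as soon as
// kSlots newer names have been handed out.
class NameRing {
public:
    static constexpr std::size_t kSlots = 64;
    static constexpr std::size_t kLen = 128;

    // Formats base+suffix into the next slot and returns it.
    // Names longer than kLen - 1 characters are truncated by snprintf.
    char* Next(const char* base, const char* suffix) {
        char* slot = slots_[next_];
        next_ = (next_ + 1) % kSlots;
        std::snprintf(slot, kLen, "%s%s", base, suffix);
        return slot;
    }

private:
    char slots_[kSlots][kLen] = {};
    std::size_t next_ = 0;
};
```

No delete[] is ever needed, because the storage belongs to the ring. In real code the ring would have to be a static or long-lived member so that it outlives every queued job.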

EDIT: This answer doesn't address his problem completely -- I made other suggestions here:
C++ string manipulation
His problem is that he needs to extend the scope of the char* he created to outside the function, and until an asynchronous job is finished.
Original Answer:
In C++, if I can't use the standard library or Boost, I still have a class like this:
template<class T>
class ArrayGuard {
public:
explicit ArrayGuard(T* ptr) : _ptr(ptr) {}
~ArrayGuard() { delete[] _ptr; }
private:
T* _ptr;
// Declared but never defined: copying would delete[] the same array twice.
ArrayGuard(const ArrayGuard&);
ArrayGuard& operator=(const ArrayGuard&);
};
You use it like:
char* buffer = new char[50];
ArrayGuard<char> bufferGuard(buffer);
The buffer will be deleted at the end of the scope (on return or throw).
I use it just for simple deletion of dynamically sized arrays that I want treated like fixed-size arrays, released at the end of the scope.
Keep it simple -- if you need fancier smart pointers, use Boost.
This is useful if the 50 in your example is variable.

The thing to remember with C++ memory management is ownership. If LoadAndSetupData is not going to take ownership of the string, then it's still your responsibility. Since you can't delete it immediately (because of the asynchronicity issue), you're going to have to hold on to those pointers until you know you can delete them.
Maintain a pool of strings that you have created:
If you have some point in time where you know that the queue has been completely dealt with, you can simply delete all the strings in the pool.
If you know that all strings created before a certain point in time have been dealt with, then keep track of when the strings were created, and you can delete that subset.
If you can somehow find out when an individual string has been dealt with, then just delete that string.
#include <cstddef>
#include <ctime>

class StringPool
{
struct StringReference {
char *buffer;
time_t created;
} *Pool;
size_t PoolSize;
size_t Allocated;
static const size_t INITIAL_SIZE = 100;
// Not copyable: a copy would delete[] the same buffers a second time.
StringPool(const StringPool&);
StringPool& operator=(const StringPool&);
void GrowBuffer()
{
StringReference *newPool = new StringReference[PoolSize * 2];
for (size_t i = 0; i < Allocated; ++i)
newPool[i] = Pool[i];
StringReference *oldPool = Pool;
Pool = newPool;
PoolSize *= 2;
delete[] oldPool;
}
public:
StringPool() : Pool(new StringReference[INITIAL_SIZE]), PoolSize(INITIAL_SIZE), Allocated(0)
{
}
~StringPool()
{
ClearPool();
delete[] Pool;
}
char *GetBuffer(size_t size)
{
if (Allocated == PoolSize)
GrowBuffer();
Pool[Allocated].buffer = new char[size];
Pool[Allocated].created = time(NULL);
return Pool[Allocated++].buffer;
}
void ClearPool()
{
for (size_t i = 0; i < Allocated; ++i)
delete[] Pool[i].buffer;
Allocated = 0;
}
void ClearBefore(time_t knownCleared)
{
size_t newAllocated = 0;
for (size_t i = 0; i < Allocated; ++i)
{
if (Pool[i].created < knownCleared)
{
delete[] Pool[i].buffer;
}
else
{
Pool[newAllocated] = Pool[i];
++newAllocated;
}
}
Allocated = newAllocated;
}
// This compares pointers, not strings!
void ReleaseBuffer(char *knownCleared)
{
size_t newAllocated = 0;
for (size_t i = 0; i < Allocated; ++i)
{
if (Pool[i].buffer == knownCleared)
{
delete[] Pool[i].buffer;
}
else
{
Pool[newAllocated] = Pool[i];
++newAllocated;
}
}
Allocated = newAllocated;
}
};

Since std::string is not an option, for whatever reason, have you looked into smart pointers? See Boost.
But I can only encourage you to use std::string.
Christian

If you must use char*'s, then LoadAndSetupData() should explicitly document who owns the memory for the char* after the call. You can do one of two things:
Copy the string. This is probably the simplest thing. LoadAndSetupData copies the string into some internal buffer, and the caller is always responsible for the memory.
Transfer ownership. LoadAndSetupData() documents that it will be responsible for eventually freeing the memory for the char*. The caller doesn't need to worry about freeing the memory.
I generally prefer safe copying as in #1, because the allocator of the string is also responsible for freeing it. If you go with #2, the allocator has to remember NOT to free things, and memory management happens in two places, which I find harder to maintain. In either case, it's a matter of explicitly documenting the policy so that the caller knows what to expect.
If you go with #1, take a look at Lou Franco's answer to see how you might allocate a char[] in an exception-safe, sure to be freed way using a guard class. Note that you can't (safely) use std::auto_ptr for arrays.
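For code you do control, policy #1 might look like the sketch below. LoadJob is a hypothetical stand-in for a queued job (the real LoadAndSetupData cannot be changed, per the question): it copies the name into memory it owns, so the caller is free to release or reuse its own buffer immediately.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical queued job following policy #1: it keeps a private copy of the
// name, so the caller's buffer can be freed or reused right after construction.
class LoadJob {
public:
    explicit LoadJob(const char* name)
        : name_(new char[std::strlen(name) + 1]) {  // +1 for the '\0'
        std::strcpy(name_, name);
    }
    ~LoadJob() { delete[] name_; }  // frees only the copy this job allocated
    const char* Name() const { return name_; }

private:
    LoadJob(const LoadJob&);             // not copyable (declared, never defined)
    LoadJob& operator=(const LoadJob&);
    char* name_;
};
```

With this policy the allocator of each string is also the one that frees it, which is the property the answer prefers.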

Since you need nameToUse to still exist after the function returns, you are stuck using new. What I would do is return a pointer to it, so the caller can delete[] it later, when it is no longer needed.
char * LoadStuff()
{
const char* suffix = GetSuffix();
char* nameToUse = new char[50];
sprintf(nameToUse, "%s%s",mBaseName,suffix);
LoadAndSetupData(nameToUse);
return nameToUse;
}
then:
char *name = LoadStuff();
// do whatever you need to do:
delete [] name;

There is no need to allocate on the heap in this case. And always use snprintf:
char nameToUse[50];
snprintf(nameToUse, sizeof(nameToUse), "%s%s",mBaseName,suffix);
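snprintf also tells you whether the result fit: it returns the length the full string would have had (excluding the terminator), or a negative value on error. A minimal sketch that turns this into a success flag; BuildName is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: writes base+suffix into out and reports whether it fit.
// On truncation the output is still '\0'-terminated, just shortened.
bool BuildName(char* out, std::size_t outSize, const char* base, const char* suffix) {
    int n = std::snprintf(out, outSize, "%s%s", base, suffix);
    return n >= 0 && static_cast<std::size_t>(n) < outSize;
}
```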

Where exactly is nameToUse used beyond the scope of LoadStuff? If someone needs it after LoadStuff, it needs to be passed to them, along with the responsibility for deallocating the memory.
If you would have done it in C#, as you suggested:
LoadAndSetupData(mBaseName + GetSuffix());
then nothing would reference LoadAndSetupData's parameter, therefore you can safely change it to
char nameToUse[50];
as Martin suggested.

You're going to have to manage the lifetime of the memory you allocate for nameToUse. Wrapping it up in a class such as std::string makes your life a bit simpler.
I guess this is a minor digression, but since I can't think of any better solution to your problem, I'll point out another potential problem. You need to be very careful to check the size of the buffer you're writing into when copying or concatenating strings. Functions such as strcat, strcpy and sprintf can easily overwrite the end of their target buffers, leading to spurious runtime errors and security vulnerabilities.
Apologies, my own experience is mostly on the Windows platform, where they introduced "safe" versions of these functions, called strcat_s, strcpy_s, and sprintf_s. The same goes for all their many related functions.

First: why do you need the allocated string to persist beyond the end of LoadStuff()? Is there a way you can refactor to remove that requirement?
Since C++ doesn't provide a straightforward way to do this kind of stuff, most programming environments use a set of guidelines about pointers to prevent delete/free problems. Since things can only be allocated/freed once, it needs to be very clear who "owns" the pointer. Some sample guidelines:
1) Usually the person that allocates the string is the owner, and is also responsible for freeing the string.
2) If you need to free in a different function/class than you allocated in, there must be an explicit hand-off of ownership to another class/function.
3) Unless explicitly stated otherwise, pointers (including strings) belong to the caller. A function, constructor, etc. cannot assume that the string pointer it gets will persist beyond the end of the function call. If it needs a persistent copy, it should make its own local copy, for example with strdup() (whose result must later be released with free(), not delete[]).
What this boils down to in your specific case is that LoadStuff() should delete[] nameToUse, and the function that it calls should make a local copy.
One alternate solution: if nameToUse is going to be passed lots of places and needs to persist for the lifetime of the program, you could make it a global variable. (This saves the trouble of making lots of copies of it.) If you don't want to pollute your global namespace, you could just declare it static local to the function:
static char *nameToUse = new char[50];

Thank you everyone for your answers. I have not selected one as "the answer", as there isn't a concrete solution to this problem, and the best discussions on it have all been upvoted by me and others anyway.
Your suggestions are all good, and you have been very patient with the clunkiness of my question. As I am sure you can see, this is a simplification of a more complicated problem and there is a lot more going on which is connected with the example I gave, hence the way that bits of it may not have entirely made sense.
For your interest I have decided to "cheat" my way out of the difficulty for now. I said that the base names were arbitrary, but this isn't quite true. In fact they are a limited set of names too, just a limited set that could change at some point, so I was attempting to solve a more general problem.
For now I will extend the "static" solution to suffixes and build a table of possible names. This is very "hacky", but will work and moreover avoids refactoring a large amount of complex code which I am not able to.
Feedback has been fantastic, many thanks.

You can combine some of the ideas here.
Depending on how you have modularized your application, there may be a method (main?) whose execution determines the scope in which nameToUse is definable as a fixed size local variable. You can pass the pointer (&nameToUse[0] or simply nameToUse) to those other methods that need to fill it (so pass the size too) or use it, knowing that the storage will disappear when the function having the local variable exits or your program terminates by any other means.
There is little difference between this and using dynamic allocation and deletion (since the pointer holding the location will have to be managed more-or-less the same way). The local allocation is more direct in many cases and is very inexpensive when there is no problem with associating the maximum-required lifetime with the duration of a particular function's execution.
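A minimal sketch of that idea: the fixed-size array lives in a long-lived function and is passed down together with its size. FillName and RunLoader are hypothetical names; the assumption is that all work using the name finishes before RunLoader returns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical callee: fills a buffer it does not own, so it also takes the size.
void FillName(char* out, std::size_t outSize, const char* base, const char* suffix) {
    std::snprintf(out, outSize, "%s%s", base, suffix);
}

// Hypothetical long-lived scope (e.g. main, or the lifetime of a loader object).
// ASSUMPTION: anything that reads `name` is finished before this function returns.
std::size_t RunLoader() {
    char name[50];  // fixed size, no new/delete needed
    FillName(name, sizeof name, "Base", "Version1");
    // ... queue work that reads `name` and wait for it to complete here ...
    return std::strlen(name);
}
```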

I'm not totally clear on where LoadAndSetupData is defined, but it looks like it's keeping its own copy of the string. So then you should delete your locally allocated copy after the call to LoadAndSetupData and let it manage its own copy.
Or, make sure LoadAndSetupData cleans up the allocated char[] that you give it.
My preference would be to let the other function keep its own copy and manage it so that you don't allocate an object for another class.
Edit: since you use new with a fixed size [50], you might as well make it local, as has been suggested, and let LoadAndSetupData make its own copy.

Related

Stack overflow in C++ with big array

Well, I am writing a program for the university, where I have to put a data dump into HDF format. The data dump looks like this:
"1444394028","1","5339","M","873"
"1444394028","1","7045","V","0.34902"
"1444394028","1","7042","M","2"
"1444394028","1","7077","V","0.0470588"
"1444394028","1","5415","M","40"
"1444394028","1","7077","V","0.462745"
"1444394028","1","7076","B","10001101"
"1444394028","1","7074","M","19"
"1444394028","1","7142","M","16"
"1444394028","1","7141","V","0.866667"
For the HDF5 API I need an array. So my method at the moment is to write the data dump into an array like this:
int count = 0;
std::ifstream countInput("share/ObservationDump.txt");
std::string line;
if(!countInput) std::cout << "File not found" << std::endl;
while( std::getline( countInput, line ) ) {
count++;
}
std::cout << count << std::endl;
struct_t writedata[count];
int i = 0;
std::ifstream dataInput("share/ObservationDump.txt");
std::string line2;
char delimeter(',');
std::string timestampTemp, millisecondsSinceStartTemp, deviceTemp, typeTemp, valueTemp;
while (std::getline(dataInput, timestampTemp, delimeter) )
{
std::getline(dataInput, millisecondsSinceStartTemp, delimeter);
std::getline(dataInput, deviceTemp, delimeter);
std::getline(dataInput, typeTemp, delimeter);
std::getline(dataInput, valueTemp);
writedata[i].timestamp = atoi(timestampTemp.substr(1, timestampTemp.size()-2).c_str());
writedata[i].millisecondsSinceStart = atoi(millisecondsSinceStartTemp.substr(1, millisecondsSinceStartTemp.size()-2).c_str());
writedata[i].device = atoi(deviceTemp.substr(1, deviceTemp.size()-2).c_str());
writedata[i].value = atof(valueTemp.substr(1, valueTemp.size()-2).c_str());
writedata[i].type = *(typeTemp.substr(1, typeTemp.size()-2).c_str());
i++;
}
with struct_t defined as
struct struct_t
{
int timestamp;
int millisecondsSinceStart;
int device;
char type;
double value;
};
As some of you might see, with big data dumps (about 60 thousand lines) the array writedata tends to cause a stack overflow (segmentation fault). I need an array to pass it to my HDF adapter. How can I prevent the overflow? I was not able to find answers by extensive googling. Thanks in advance!
The example code you are following is in C, while the code you are writing is in C++. In most cases, valid C code is valid C++ code, although not necessarily good style; this is one of the times where it is not, although since that isn’t your real problem I’ll leave the explanation of that to the end of my answer.
When you declare struct_t writedata[count];, you are creating an array on the stack. The stack is often artificially limited in size, and so creating a large array on the stack could lead to a problem where you run out of stack space. This is what you are seeing. The typical solution is to create large data structures in the heap (although the primary use of the heap is to make data that lasts past the return of the function that creates it).
The most C++-idiomatic way to access the heap is to not do it directly, but to use a helper container class. In this case what you want is an std::vector, which lets you push data onto the end and will automatically grow as you push on more data. Since it automatically grows, you don’t need to specify the size in advance; just declare it as a std::vector<struct_t> writedata; (read “std::vector of struct_t”). Again, since it doesn’t need to know the size in advance, you can also ignore the whole first loop.
The vector is initially empty; to put data into it, you usually want to use writedata.push_back() or writedata.emplace_back(). The first of these takes an existing struct_t; the second takes the parameters you’d use to create one. All of the elements are stored contiguously in memory, like in a C array, which you can access directly with writedata.data().
At the end of the function, when the vector goes out of scope and is no longer accessible, its destructor will be called and automatically clean up the memory it used.
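A sketch of the std::vector approach just described. To keep it testable it reads from any std::istream rather than opening share/ObservationDump.txt itself, and ReadDump and Unquote are hypothetical helper names. The HDF adapter's API isn't shown in the question, so the call is left out; rows.data() and rows.size() are what you would hand to it.

```cpp
#include <cassert>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct struct_t {
    int timestamp;
    int millisecondsSinceStart;
    int device;
    char type;
    double value;
};

// Strips one pair of surrounding double quotes, e.g. "\"873\"" -> "873".
static std::string Unquote(const std::string& field) {
    if (field.size() >= 2 && field.front() == '"' && field.back() == '"')
        return field.substr(1, field.size() - 2);
    return field;
}

// Parses every line into a heap-backed vector: no counting pass, no VLA.
std::vector<struct_t> ReadDump(std::istream& in) {
    std::vector<struct_t> rows;
    std::string ts, ms, dev, type, val;
    while (std::getline(in, ts, ',') && std::getline(in, ms, ',') &&
           std::getline(in, dev, ',') && std::getline(in, type, ',') &&
           std::getline(in, val)) {
        struct_t row;
        row.timestamp = std::atoi(Unquote(ts).c_str());
        row.millisecondsSinceStart = std::atoi(Unquote(ms).c_str());
        row.device = std::atoi(Unquote(dev).c_str());
        const std::string t = Unquote(type);
        row.type = t.empty() ? '\0' : t[0];
        row.value = std::atof(Unquote(val).c_str());
        rows.push_back(row);  // the vector grows on the heap as needed
    }
    return rows;
}
```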
Another option, instead of using std::vector, is to manage the memory yourself. The C++ way of doing that is with new and delete. The easiest way to handle that is to still calculate count, as you do, but instead of creating the array on the stack by just declaring it as a count-sized array, you do struct_t* writedata = new struct_t[count];. This will create an array of count struct_ts in the heap, and set writedata as a pointer to the first element of this array. Then you can use it as you use the array in your program, but since it’s on the heap you won’t run out of stack space.
The downsides to this are that you need to know the size in advance, and you need to clean up the memory you used yourself. To do this, when you no longer need the data, you should run delete[] writedata. After that, writedata will still point to the same place in memory, but your program no longer owns that data, so you need to make sure to never use that value again; the standard way is to, immediately after deletion, set writedata to nullptr.
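The manual new[]/delete[] route, sketched under the assumption that count has already been computed by the first loop. FillOnHeap is a hypothetical name, and its fill loop is only a placeholder for the getline parsing; the parts that matter are the heap allocation, the i < count bound and the matching delete[].

```cpp
#include <cassert>
#include <cstddef>

struct struct_t {
    int timestamp;
    int millisecondsSinceStart;
    int device;
    char type;
    double value;
};

// Returns how many records were written; placeholder data stands in for parsing.
std::size_t FillOnHeap(std::size_t count) {
    struct_t* writedata = new struct_t[count];  // heap allocation, not a stack VLA
    std::size_t i = 0;
    while (i < count) {                          // never write past the end
        writedata[i].device = static_cast<int>(i);
        ++i;
    }
    // ... hand writedata and i to the HDF adapter here ...
    delete[] writedata;   // array form, matching new[]
    writedata = nullptr;  // the old address must not be used again
    return i;
}
```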
You can also use the C equivalents to new and delete, which are malloc and free. They are mostly equivalent in your case, but for more complicated examples you should keep in mind that these leave the memory uninitialized, while new and delete will run the constructors/destructors of what you create to make sure the objects are in a sane state at the beginning and don’t leave resources lying around at the end.
Now for why your original code isn’t actually valid C++ for any size of file: Your line struct_t writedata[count]; tries to create an array of count struct_ts. Since count is a variable, this is called a variable-length array (VLA). Such things are legal in newer versions of C, but not in C++. This alone is just worth a warning as long as you only want to compile the code on the same system you’re currently using, since your compiler seems to support VLAs as an extension. However, if you want to compile your code on any other system (make it more portable), you shouldn’t use compiler extensions like this.
struct_t writedata[count];
This array is allocated on the stack which is normally quite small, and when it gets to a value that's too big (which is semi-arbitrary) this will overflow the stack.
You'd be better off allocating on the heap by doing something like:
struct_t* writedata = (struct_t*)malloc(sizeof(struct_t) * count);
And then add a corresponding call to free once you're finished with the memory, e.g.
free(writedata);
writedata = nullptr;
It's best practice to check that i < count in your while loop: if you write off the end of your array, Bad Things may happen.

practical explanation of c++ functions with pointers

I am relatively new to C++...
I am learning and coding but I am finding the idea of pointers to be somewhat fuzzy. As I understand it * points to a value and & points to an address...great but why? Which is byval and which is byref and again why?
And while I feel like I am learning and understanding the idea of stack vs heap, runtime vs design time etc, I don't feel like I'm fully understanding what is going on. I don't like using coding techniques that I don't fully understand.
Could anyone please elaborate on exactly what and why the pointers in this fairly "simple" function below are used, esp the pointer to the function itself.. [got it]
Just asking how to clean up (delete[]) the str... or if it just goes out of scope.. Thanks.
char *char_out(AnsiString ansi_in)
{
// allocate memory for char array
char *str = new char[ansi_in.Length() + 1];
// copy contents of string into char array
strcpy(str, ansi_in.c_str());
return str;
}
TL;DR:
AnsiString appears to be an object which is passed by value to that function.
char* str is on the stack.
A new array is created on the heap with (ansi_in.Length() + 1) elements. A pointer to the array is stored in str. +1 is used because strings in C/C++ typically use a null terminator, which is a special character used to identify the end of the string when scanning through it.
ansi_in.c_str() is called, copying a pointer to its string buffer into an unnamed local variable on the stack.
str and the temporary pointer are pushed onto the stack and strcpy is called. This has the effect of copying the string (including the null terminator) pointed at by the temporary into str.
str is returned to the caller
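On the cleanup question: str itself is a local pointer and disappears when char_out returns, but the heap array it points to does not, so every caller must delete[] the result. The sketch below assumes a plain const char* parameter instead of AnsiString (a C++Builder type) so that it compiles anywhere.

```cpp
#include <cassert>
#include <cstring>

// Stand-in for char_out(): ASSUMES a C string parameter instead of AnsiString.
char* char_out(const char* ansi_in) {
    char* str = new char[std::strlen(ansi_in) + 1];  // +1 for the null terminator
    std::strcpy(str, ansi_in);                        // copies the '\0' as well
    return str;  // the caller now owns the array and must delete[] it
}
```

Typical use is `char* s = char_out("Edward"); /* use s */ delete[] s;`; forgetting the delete[] leaks the array.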
Long answer:
You appear to be struggling to understand stack vs heap, and pointers vs non-pointers. I'll break them down for you and then answer your question.
The stack is a concept where a fixed region of memory is allocated for each thread before it starts and before any user code runs.
Ignoring lower level details such as calling conventions and compiler optimizations, you can reason that the following happens when you call a function:
Arguments are pushed onto the stack. This reserves part of the stack for use of the arguments.
The function performs some job, using and copying the arguments as needed.
The function pops the arguments off the stack and returns. This frees the space reserved for the arguments.
This isn't limited to function calls. When you declare objects and primitives in a function's body, space for them is reserved via pushing. When they're out of scope, they're automatically cleaned up by calling destructors and popping.
When your program runs out of stack space and starts using the space outside of it, you'll typically encounter an error. Regardless of what the actual error is, it's known as a stack overflow because you're going past it and therefore "overflowing".
The heap is a different concept where the remaining unused memory of the system is available for you to manually allocate and deallocate from. This is primarily used when you have a large data set that's too big for the stack, or when you need data to persist across arbitrary functions.
C++ is a difficult beast to master, but if you can wrap your head around the core concepts, it becomes easier to understand.
Suppose we wanted to model a human:
struct Human
{
const char* Name;
int Age;
};
int main(int argc, char** argv)
{
Human human;
human.Name = "Edward";
human.Age = 30;
return 0;
}
This allocates at least sizeof(Human) bytes on the stack for storing the 'human' object. Right before main() returns, the space for 'human' is freed.
Now, suppose we wanted an array of 10 humans:
int main(int argc, char** argv)
{
Human humans[10];
humans[0].Name = "Edward";
humans[0].Age = 30;
// ...
return 0;
}
This allocates at least (sizeof(Human) * 10) bytes on the stack for storing the 'humans' array. This too is automatically cleaned up.
Note uses of ".". When using anything that's not a pointer, you access their contents using a period. This is direct memory access if you're not using a reference.
Here's the single object version using the heap:
int main(int argc, char** argv)
{
Human* human = new Human();
human->Name = "Edward";
human->Age = 30;
delete human;
return 0;
}
This allocates sizeof(Human*) bytes on the stack for the pointer 'human', and at least sizeof(Human) bytes on the heap for storing the object it points to. 'human' is not automatically cleaned up; you must call delete to free it. Note uses of "a->b". When using pointers, you access their contents using the "->" operator. This is indirect memory access, because you're accessing memory through an address stored in a variable.
It's sort of like mail. When someone wants to mail you something, they write an address on an envelope and submit it through the mail system. A mail carrier takes the mail and moves it to your mailbox. For comparison, the pointer is the address written on the envelope, the memory management unit (MMU) is the mail system, the electrical signals passed down the wire are the mail carrier, and the memory location the address refers to is the mailbox.
Here's the array version using the heap:
int main(int argc, char** argv)
{
Human* humans = new Human[10];
humans[0].Name = "Edward";
humans[0].Age = 30;
// ...
delete[] humans;
return 0;
}
This allocates sizeof(Human*) bytes on the stack for pointer 'humans', and (sizeof(Human) * 10) bytes on the heap for storing the array it points to. 'humans' is also not automatically cleaned up; you must call delete[] to free it.
Note uses of "a[i].b" rather than "a[i]->b". The "[]" operator(indexer) is really just syntactic sugar for "*(a + i)", which really just means treat it as a normal variable in a sequence so I can type less.
In both of the above heap examples, if you didn't write delete/delete[], the memory that the pointers point to would leak. This is bad because, if left unchecked, it could eat through all your available memory, eventually crashing when there isn't enough or when the OS decides other apps are more important than yours.
Using the stack is usually the wiser choice as you get automatic lifetime management via scope(aka RAII) and better data locality. The only "drawback" to this approach is that because of scoped lifetime you can't directly access your stack variables once the scope has exited. In other words you can only use stack variables within the scope they're declared. Despite this, C++ allows you to copy pointers and references to stack variables, and indirectly use them outside the scope they're declared in. Do note however that this is almost always a very bad idea, don't do it unless you really know what you're doing, I can't stress this enough.
Passing an argument by-ref means pushing a copy of a pointer or reference to the data on the stack. Under the hood, compilers typically implement references as pointers. This is a very lightweight concept, but you typically need to check for null in functions receiving pointers.
Pointer variant of an integer adding function:
#include <stdexcept>
int add(const int* firstIntPtr, const int* secondIntPtr)
{
if (firstIntPtr == nullptr) {
throw std::invalid_argument("firstIntPtr cannot be null.");
}
if (secondIntPtr == nullptr) {
throw std::invalid_argument("secondIntPtr cannot be null.");
}
return *firstIntPtr + *secondIntPtr;
}
Note the null checks. If it didn't verify its arguments are valid, they very well may be null or point to memory the app doesn't have access to. Attempting to read such values via dereferencing(*firstIntPtr/*secondIntPtr) is undefined behavior and if you're lucky results in a segmentation fault(aka access violation on windows), crashing the program. When this happens and your program doesn't crash, there are deeper issues with your code that are out of the scope of this answer.
Reference variant of an integer adding function:
int add(const int& firstInt, const int& secondInt)
{
return firstInt + secondInt;
}
Note the lack of null checks. By design C++ limits how you can acquire references, so you're not supposed to be able to pass a null reference, and therefore no null checks are required. That said, it's still possible to end up with a "null reference" by dereferencing a null pointer to form a reference; that is undefined behavior, and if you're doing it without checking for null first, you have a bug in your code.
Passing an argument by-val means pushing a copy of it on the stack. You almost always want to pass small data structures by value. You don't have to check for null when passing values because you're passing the actual data itself and not a pointer to it.
i.e.
int add(int firstInt, int secondInt)
{
return firstInt + secondInt;
}
No null checks are required because values, not pointers are used. Values can't be null.
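To answer the by-value vs. by-reference question directly, here are the three forms side by side. The Increment* names are hypothetical; the point is which of them can change the caller's variable.

```cpp
#include <cassert>

// By value: the function works on its own copy; the caller's variable is untouched.
void IncrementCopy(int n) { ++n; }

// By reference: n is another name for the caller's variable.
void IncrementRef(int& n) { ++n; }

// By pointer: the caller passes an address explicitly (&x) and could pass nullptr.
void IncrementPtr(int* n) {
    if (n != nullptr) ++*n;
}
```

Only the pointer version needs a null check, matching the discussion above.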
Assuming you're interested in learning about all this, I highly suggest you use std::string for all your string needs and std::unique_ptr for managing pointers.
i.e.
std::string char_out(AnsiString ansi_in)
{
return std::string(ansi_in.c_str());
}
std::unique_ptr<char[]> char_out(AnsiString ansi_in)
{
std::unique_ptr<char[]> str(new char[ansi_in.Length() + 1]);
strcpy(str.get(), ansi_in.c_str());
return str; // std::move(str) if you're using an older C++11 compiler.
}

Potential dynamic memory problems

I am creating an Arduino device using C++. I need a stack object with variable size and variable data types. Essentially this stack needs to be able to be resized and used with bytes, chars, ints, doubles, floats, shorts, and longs.
I have a basic class set up, but with the amount of dynamic memory allocation that is required, I wanted to make sure that my use of data frees enough space for the program to continue without memory problems. This does not use std methods, but instead the built-in versions of those for the Arduino.
For clarification, my question is: are there any potential memory problems in my code?
NOTE: This is not on the Arduino Stack Exchange because it requires an in-depth knowledge of C/C++ memory allocation that could be useful to all C and C++ programmers.
Here's the code:
Stack.h
#pragma once
class Stack {
public:
void init();
void deinit();
void push(byte* data, size_t data_size);
byte* pop(size_t data_size);
size_t length();
private:
byte* data_array;
};
Stack.cpp
#include "Arduino.h"
#include "Stack.h"
void Stack::init() {
// Initialize the Stack as having no size or items
data_array = (byte*)malloc(0);
}
void Stack::deinit() {
// free the data so it can be re-used
free(data_array);
}
// Push an item of variable size onto the Stack (byte, short, double, int, float, long, or char)
void Stack::push(byte* data, size_t data_size) {
data_array = (byte*)realloc(data_array, sizeof(data_array) + data_size);
for(size_t i = 0; i < sizeof(data); i++)
data_array[sizeof(data_array) - sizeof(data) + i] = data[i];
}
// Pop an item of variable size off the Stack (byte, short, double, int, float, long, or char)
byte* Stack::pop(size_t data_size) {
byte* data;
if(sizeof(data_array) - data_size >= 0) {
data = (byte*)(&data_array + sizeof(data_array) - data_size);
data_array = (byte*)realloc(data_array, sizeof(data_array) - data_size);
} else {
data = NULL;
}
// Make sure to free(data) when done with the data from pop()!
return data;
}
// Return the sizeof the Stack
size_t Stack::length() {
return sizeof(data_array);
}
There are some minor code bugs, apparently, which -- although important -- are easily resolved. The following answer only applies to the overall design of this class:
There is nothing wrong with just the code that is shown.
But only the code that's shown. No opinion is rendered on any code that's not shown.
And, it's fairly likely that there are going to be massive problems, and memory leaks, in the rest of the code which will attempt to use this class.
It's going to be very, very easy to use this class in a way that leaks or corrupts memory. It's going to be much harder to use this class correctly, and much easier to screw up. The fact that these functions themselves appear to do their job correctly is not going to help if all it takes is a sneeze in the wrong direction for these functions to be called in the wrong order or sequence.
Just to name the first two readily apparent problems:
1) Failure to call deinit(), when any instance of this class goes out of scope and gets destroyed, will leak memory. Every time you use this class, you have to be cognizant of when the instance of this class goes out of scope and gets destroyed. It's easy to keep track of every time you create an instance of this class, and it's easy to remember to call init() every time. But keeping track of every possible way an instance of this class could go out of scope and get destroyed, so that you must call deinit() and free up the internal memory, is much harder. It's very easy to not even realize when that happens.
2) If an instance of this class gets copy-constructed, or the default assignment operator gets invoked, this is guaranteed to result in memory corruption, with an extra side-helping of a memory leak.
Note that you don't have to go out of your way to write code that copy-constructs, or assigns one instance of the object to another one. The compiler will be more than happy to do it for you, if you do not pay attention.
Generally, the best way to avoid these kinds of problems is to make it impossible to happen, by using the language correctly. Namely:
1) Following the RAII design pattern. Get rid of init() and deinit(). Instead, do this work in the object's constructor and destructor.
2) Either deleting the copy constructor and the assignment operator, or implementing them correctly. So, if instances of this class should never be copy-constructed or assigned-to, it's much better to have the compiler yell at you, if you accidentally write some code that does that, instead of spending a week tracking down where that happens. Or, if the class can be copy-constructed or assigned, doing it properly.
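A minimal sketch of both points, assuming the class only ever needs to hold raw bytes: the constructor and destructor replace init() and deinit(), the copy operations are deleted, and the number of stored bytes is tracked explicitly. The bool return values and the copy-out pop() signature are my own choices, not the original API. On an AVR Arduino, where the std:: headers may be unavailable, <stdlib.h>/<string.h> and unqualified malloc/realloc/free/memcpy would be used instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

using byte = std::uint8_t;  // Arduino.h already provides `byte`; defined here for a host build

class Stack {
public:
    Stack() = default;              // starts empty: nothing to init()
    ~Stack() { std::free(data_); }  // replaces deinit(); runs however the object dies

    // Copying would leave two objects owning (and freeing) the same block.
    Stack(const Stack&) = delete;
    Stack& operator=(const Stack&) = delete;

    // Appends data_size bytes. Returns false if realloc fails (old data is kept).
    bool push(const byte* data, std::size_t data_size) {
        if (data_size == 0) return true;
        byte* grown = static_cast<byte*>(std::realloc(data_, size_ + data_size));
        if (grown == nullptr) return false;
        data_ = grown;
        std::memcpy(data_ + size_, data, data_size);
        size_ += data_size;
        return true;
    }

    // Copies the top data_size bytes into `out` (owned by the caller) and
    // removes them. Returns false if the stack holds fewer bytes than that.
    bool pop(byte* out, std::size_t data_size) {
        if (data_size > size_) return false;
        if (data_size == 0) return true;
        size_ -= data_size;
        std::memcpy(out, data_ + size_, data_size);
        return true;  // the block is not shrunk here; the next push resizes it
    }

    // Number of bytes currently stored, tracked explicitly.
    std::size_t length() const { return size_; }

private:
    byte* data_ = nullptr;
    std::size_t size_ = 0;
};
```

Because pop() copies into a caller-supplied buffer, the caller never receives a pointer into the stack's own storage, so there is nothing for the caller to free().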
Of course, if there would only be a small number of instances of this class, it should be possible to safely use it, with tight controls, and lots of care, without doing this kind of a redesign. But, even if it were the case, it's always better to do the job right, instead of shrugging this off now, but then later deciding to expand the use of this class in more places, and then forgetting about the fact that this class is so error-prone.
P.S.: a few of the minor bugs that I mentioned in the beginning:
data_array = (byte*)realloc(data_array, sizeof(data_array) + data_size);
This can't be right. data_array is a byte *, so sizeof(data_array) will always be a compile-time constant, which would be sizeof(byte *). That's obviously not what you want here. You need to explicitly keep track of the allocated array's size.
The same general bug appears in several other places here, but it's easily fixed. The overall class design is the bigger problem.
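A small host-side check of the sizeof point (the function name is illustrative): the value reported for the pointer is the same whether the block behind it holds 1 byte or 1000.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

using byte = std::uint8_t;  // stand-in for Arduino's byte

// Mirrors the expression used in push() and length(): sizeof is applied to the
// pointer variable, so the result is sizeof(byte*) regardless of the allocation.
std::size_t sizeof_after_malloc(std::size_t bytes)
{
    byte* data_array = static_cast<byte*>(std::malloc(bytes));
    std::size_t reported = sizeof(data_array);  // compile-time constant
    std::free(data_array);
    return reported;
}
```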

Determine the nature of a parameter at runtime

I have a function
void fname(char* Ptr)
{
...
}
I want to know inside this function whether this pointer Ptr holds the address of dynamically allocated memory using new char[] or the address of locally allocated memory in the calling function. Is there any way I can determine that? I think <typeinfo> doesn't help here.
One way to do this is to have your own operator new functions and keep track of everything allocated so that you can just ask your allocation library if the address given is one it allocated. The custom allocator then just calls the standard one to actually do the allocation.
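A minimal sketch of the tracking idea, with one deviation: instead of replacing the global operator new[] (also possible, but it affects every allocation in the program), allocations go through hypothetical wrapper functions allocateChars/releaseChars, which call the standard new[]/delete[] and record each block in a std::set. It recognizes only the exact start address of a live block, and it is not thread-safe.

```cpp
#include <cstddef>
#include <set>

// Hypothetical registry of blocks handed out by allocateChars(); not thread-safe.
static std::set<const char*>& liveBlocks()
{
    static std::set<const char*> blocks;  // constructed on first use
    return blocks;
}

// The custom allocator: calls the standard new[] and records the result.
char* allocateChars(std::size_t n)
{
    char* p = new char[n];
    liveBlocks().insert(p);
    return p;
}

// Matching release: forget the block, then free it.
void releaseChars(char* p)
{
    liveBlocks().erase(p);
    delete[] p;
}

// True only for the start address of a block that is still live.
bool isFromAllocator(const void* p)
{
    return liveBlocks().count(static_cast<const char*>(p)) != 0;
}
```

Inside fname(), isFromAllocator(Ptr) would then answer the question, but only for memory obtained through allocateChars().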
Another approach (messy and details highly OS dependent) may be to examine the process layout in virtual memory and hence determine which addresses refer to which areas of memory.
You can combine these ideas by actually managing your own memory pools. If you get a single large chunk of system memory with known address bounds and use it for all new'd memory, you can just check that an address is in that range to answer your question.
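A sketch of the pool idea, assuming one fixed-size buffer (CharPool and kSize are illustrative names) used as a bump allocator that never frees individual blocks. std::less is used for the range check because it gives a total order even for pointers into unrelated objects, where the built-in < gives an unspecified result.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical bump allocator over one fixed buffer; blocks are never freed individually.
class CharPool {
public:
    // Returns n chars from the pool, or nullptr if the pool is exhausted.
    char* allocate(std::size_t n) {
        if (n > kSize - used_) return nullptr;
        char* p = buffer_ + used_;
        used_ += n;
        return p;
    }

    // True if p points anywhere inside the pool, including interior pointers.
    bool owns(const void* p) const {
        std::less<const char*> before;  // total order, even for unrelated pointers
        const char* c = static_cast<const char*>(p);
        return !before(c, buffer_) && before(c, buffer_ + kSize);
    }

private:
    static constexpr std::size_t kSize = 4096;
    char buffer_[kSize];
    std::size_t used_ = 0;
};
```

The pool object itself needs static storage duration (a global, or `static CharPool pool;`). A CharPool declared as an ordinary local would put its buffer on the stack.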
However, any of these ideas is a lot of work, and not appropriate if answering this question is the only reason for doing it.
Having said all that, if you do want to implement something, you will need to work carefully through all the ways that an address might be generated.
For example (and surely I've missed some):
Stack
Returned from new
Inside something returned from new
Was returned from new but already deleted (hopefully not, but that's why we need diagnostics)
Statically allocated
Static constant memory
Command-line arguments / environment
Code addresses
Now, ignoring all that for a moment, and assuming this is for some debug purpose rather than system design, you might be able to try this kind of thing:
This is ugly, unreliable, not guaranteed by the standard, etc., but it might work...
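#include <cstdint>
#include <cstring>
#include <iostream>
char* firstStack = nullptr;
bool isOnStack(const void* p)
{
    // Compare addresses as integers: subtracting pointers into unrelated objects is undefined.
    std::uintptr_t check = reinterpret_cast<std::uintptr_t>(p);
    std::uintptr_t here = reinterpret_cast<std::uintptr_t>(&check);
    std::uintptr_t first = reinterpret_cast<std::uintptr_t>(firstStack);
    // "On the stack" means between main's frame and this frame, in either growth direction.
    return (first > check && check > here) || (first < check && check < here);
}
void g(const char* p)
{
    bool onStack = isOnStack(p);
    std::cout << p << (onStack ? "" : " not") << " on stack" << std::endl;
}
void f()
{
    char stuff[1024] = "Hello";
    g(stuff);
}
void h()
{
    char* nonsense = new char[1024];
    std::strcpy(nonsense, "World");
    g(nonsense);
    delete [] nonsense;
}
int main()
{
    int var = 0;
    firstStack = (char*)&var;
    f();
    h();
}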
Output:
Hello on stack
World not on stack
The short answer: no, you can't. You have no way of knowing whether Ptr is a pointer to a single char, the start of a statically allocated array, a pointer to a single dynamically allocated char, or the start of an array thereof.
If you really wanted to, you could try an overload like so:
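// Takes a reference to an array: a plain `char Ptr[N]` parameter would be
// adjusted to `char*`, and N could never be deduced.
template <std::size_t N>
void fname(char (&Ptr)[N])
{
    // ...
}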
which would match when passed an actual array whose size is known at compile time (a local or static char array), whereas the first version would be picked when dealing with dynamically allocated memory or a pointer to a single char.
(But note that function overloading rules are a bit complicated in the presence of templates -- in particular, a non-template function is preferred if it matches. So you might need to make the original function take a "dummy" template parameter if you go for this approach.)
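One way to sidestep the non-template preference is to use a single function template with a forwarding reference, which deduces the array type without decaying it. A sketch (known_extent is a hypothetical name); like the overload above, it only separates "array with a compile-time size" from "pointer", not heap from stack: a pointer to a local buffer still reports 0.

```cpp
#include <cstddef>
#include <type_traits>

// Returns N for an array argument of type char[N] (or const char[N]),
// and 0 when the argument is only a pointer.
template <typename T>
std::size_t known_extent(T&&)
{
    using U = typename std::remove_reference<T>::type;
    static_assert(std::is_array<U>::value || std::is_pointer<U>::value,
                  "expects an array or a pointer");
    // std::extent is N for an array type U[N] and 0 for a pointer type.
    return std::extent<U>::value;
}
```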
In vc++ there is an assertion _CrtIsMemoryBlock (http://msdn.microsoft.com/en-us/library/ww5t02fa.aspx#BKMK_CRT_assertions) that can be used to check if a pointer was allocated from the heap. This will only work when a debug heap is being used but this is fine if you are just wanting to add some 'debug only' assertions. This method has worked well for me in the past under Windows.
For Linux, however, I know of no equivalent.
Alternatively, you could use an inline assembler block to try to determine whether it is a stack address or not. This would be hardware dependent, as it would rely heavily not only on the processor type but also on the memory model being used (flat versus segmented addressing, etc.). It's probably best to avoid this type of approach.

C++ - Regarding the scope of dynamic arrays

I have a quick question regarding the scope of dynamic arrays, which I assume is causing a bug in a program I'm writing. This snippet checks a function parameter and branches to either the first or the second, depending on what the user passes.
When I run the program, however, I get a scope related error:
error: ‘Array’ was not declared in this scope
Unless my knowledge of C++ fails me, I know that variables created within a conditional fall out of scope when the branch is finished. However, I dynamically allocated these arrays, so I cannot understand why I can't manipulate the arrays later in the program, since the pointer should remain.
//Prepare to store integers
if (flag == 1) {
int *Array;
Array = new int[input.length()];
}
//Prepare to store chars
else if (flag == 2) {
char *Array;
Array = new char[input.length()];
}
Can anyone shed some light on this?
Declare the pointer before the if. Also, you can't declare arrays of different types as one variable, so I think you should use two pointers.
char *char_array = nullptr;
int *int_array = nullptr;
//Prepare to store integers
if (flag == 1) {
    int_array = new int[input.length()];
}
//Prepare to store chars
else if (flag == 2) {
    char_array = new char[input.length()];
}
if (char_array)
{
    //do something with char_array
}
else if (int_array)
{
    //do something with int_array
}
// delete[] on a null pointer does nothing, so both can be released unconditionally
delete[] char_array;
delete[] int_array;
Also, as j_random_hacker points out, you might want to change your program design to avoid lots of ifs.
While you are right that since you dynamically allocated them on the heap, the memory won't be released to the system until you explicitly delete it (or the program ends), the pointer to the memory falls out of scope when the block it was declared in exits. Therefore, your pointer(s) need to exist at a wider scope if they will be used after the block.
The memory remains allocated (i.e. taking up valuable space), there's just no way to access it after the closing }, because at that point the program loses the ability to address it. To avoid this, you need to assign the pointer returned by new[] to a pointer variable declared in an outer scope.
As a separate issue, it looks as though you're trying to allocate memory of one of 2 different types. If you want to do this portably, you're obliged to either use a void * to hold the pointer, or (less commonly done) a union type containing a pointer of each type. Either way, you will need to maintain state information that lets the program know which kind of allocation has been made. Usually, wanting to do this is an indication of poor design, because every single access will require switching on this state information.
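A sketch of the union-plus-state version (the class and member names are hypothetical). The tag kind_ is what makes the union usable, and the destructor and both accessors have to check it, which is exactly the per-access switching described above.

```cpp
#include <cstddef>

class IntOrCharArray {
public:
    enum class Kind { Int, Char };

    // Allocates n value-initialized elements of the chosen type.
    IntOrCharArray(Kind kind, std::size_t n) : kind_(kind) {
        if (kind_ == Kind::Int) ptr_.ints = new int[n]();
        else                    ptr_.chars = new char[n]();
    }
    // Even cleanup must consult the tag to call delete[] on the right type.
    ~IntOrCharArray() {
        if (kind_ == Kind::Int) delete[] ptr_.ints;
        else                    delete[] ptr_.chars;
    }
    IntOrCharArray(const IntOrCharArray&) = delete;
    IntOrCharArray& operator=(const IntOrCharArray&) = delete;

    Kind kind() const { return kind_; }
    // Each accessor returns nullptr when the other type is stored.
    int* ints() { return kind_ == Kind::Int ? ptr_.ints : nullptr; }
    char* chars() { return kind_ == Kind::Char ? ptr_.chars : nullptr; }

private:
    Kind kind_;
    union {
        int* ints;
        char* chars;
    } ptr_;  // only the member matching kind_ is ever read
};
```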
If I understand your intent correctly, what you are trying to do is: depending on some logic, allocate memory to store n elements of either int or char, and then later in your function access that array as either int or char without needing a single if statement.
If that understanding is correct, then the simple answer is: "C++ is a strongly typed language and what you want is not possible".
However... C++ is also an extremely powerful and flexible language, so here's what can be done:
Casting. Something like the following:
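void * Array;
if (flag) Array = new int[len]();
else Array = new char[len]();
// ... later in the function
if (flag) // access as int array
    int i = ((int*)Array)[0];
// cast back before freeing: delete[] on a void* is undefined
if (flag) delete[] (int*)Array; else delete[] (char*)Array;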
Yes, this is ugly, and you'll have to have those ifs sprinkled around the function. So here's an alternative: templates.
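template<class T> T foo(size_t _len)
{
    T* Array = new T[_len]();   // value-initialized; assumes _len > 0
    T element = Array[0];
    delete[] Array;             // release before returning, or the array leaks
    return element;
}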
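If the goal is to avoid sprinkling ifs through the function, one option is to branch on flag exactly once and put the rest of the work in a function template. A sketch, assuming input is a std::string (as input.length() suggests) and flag is 1 or 2; processAs and process are hypothetical names, and std::unique_ptr<T[]> is used so the array is freed automatically.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Written once; instantiated for whichever element type the flag selects.
template <class T>
std::size_t processAs(const std::string& input)
{
    // unique_ptr<T[]> calls delete[] when the function returns.
    std::unique_ptr<T[]> array(new T[input.length()]());
    for (std::size_t i = 0; i < input.length(); ++i)
        array[i] = static_cast<T>(input[i]);
    // ... further work on array[i] as a T, with no checks of flag ...
    return input.length() * sizeof(T);  // bytes used, just to return something
}

// The only branch on flag (1 = int, 2 = char, as in the question).
std::size_t process(int flag, const std::string& input)
{
    return flag == 1 ? processAs<int>(input) : processAs<char>(input);
}
```

The compiler still knows the element type everywhere, as the answer says it must; it just learns it from the template argument instead of from repeated runtime checks.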
Yet another, even more obscure way of doing things, could be the use of unions:
union int_or_char {int i; char c;};
int_or_char *Array = new int_or_char[len];
if(flag) // access as int
int element = Array[0].i;
But one way or the other (or the third) there's no way around the fact that the compiler has to know how to deal with the data you are trying to work with.
Turix's answer is right. You need to keep in mind that two things are being allocated here: the memory for the array and the memory where the location of the array is stored.
So even though the memory for the array is allocated from the heap and will be available to the code wherever required, the memory where the location of the array is stored (the Array variable) is allocated on the stack and will be lost as soon as it goes out of scope, which in this case is when the if block ends. You can't even use it in the else part of the same if.
A code suggestion different from Andrew's would be:
void *Array = nullptr;
if (flag == 1) {
Array = new int[input.length()];
} else if (flag == 2) {
Array = new char[input.length()];
}
Then you can use Array after the if as you intended, casting it back to int* or char* before each access.
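This part I am not sure about: in case you want to know whether it's an int or a char, you could try the typeid operator. It doesn't work, at least I can't get it to work; typeid(Array) only reports the static type void*, and the pointed-to memory carries no type information.
Alternatively, you can use your flag variable to tell which type it is.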
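A small check of why typeid cannot recover the element type (the function name is illustrative): the result equals typeid(void*) whichever array was allocated, so the flag, or some other stored tag, is the only record of the type.

```cpp
#include <typeinfo>

// Allocates either kind of array behind a void*, asks typeid about it, frees it.
bool typeidSeesOnlyVoidPointer(bool useInt)
{
    void* Array = useInt ? static_cast<void*>(new int[4])
                         : static_cast<void*>(new char[4]);
    // typeid of a non-polymorphic expression reports its static type: void*.
    bool sameAsVoidPtr = (typeid(Array) == typeid(void*));
    if (useInt) delete[] static_cast<int*>(Array);
    else        delete[] static_cast<char*>(Array);
    return sameAsVoidPtr;
}
```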