I have a working function implementation in C that requires a large locally allocated chunk of memory as working space. This function gets called many times in succession, and it is guaranteed that the required amount of working space does not change between calls. To optimize the function, I refactored it to allocate a single static contiguous block of memory the first time it is called; the block is released only when the caller asks for it. It looks something like this:
void worker(struct* ptr, size_t m) {
    static double *stack;
    static size_t sz_stack;
    static double *alpha;
    static double *delta;
    // A NULL ptr is the request to release the working space
    if (!ptr) {
        if (stack) {
            free(stack);
        }
        stack = NULL;
        sz_stack = 0;
        return;
    }
    // First call: allocate one zeroed block and split it in two
    if (!stack) {
        sz_stack = 2*m;
        stack = calloc(sz_stack, sizeof(*stack));
        if (stack == NULL) {
            // Error and cleanup
            return;
        }
        alpha = stack;
        delta = alpha + m;
    }
    // Do work using alpha and delta as arrays
    return;
}
The caller can call this function successively where ptr will hold the final result as long as the problem size given by m does not change. When the caller is done with the function, or the problem size changes, he calls worker(NULL, 0); and the allocated memory will be freed.
I am now rewriting this codebase in C++ and, as best practices suggest, I have used an individual std::vector each for alpha and delta instead of the contiguous stack. However, profiling revealed a huge bottleneck: the std::vector containers are allocated and freed on each and every function call.
My question now is:
What is the modern C++ way to maintain a contiguous piece of working space for a function in between calls?
If it is guaranteed that the required working space will not change between successive function calls (as you mentioned), the simplest solution seems to me to be a static array (somewhat similar to your C code, but using 'new[]' and 'delete[]' instead of 'calloc' and 'free').
The C++ way would be to have a class with a private array that you can then manage.
It seems to me that the way you are handling static memory is very analogous to a constructor and destructor. I would have the array as the sole private member and then your worker function as a public member function.
Also, in terms of performance, STL can do some strange things and each implementation can be more or less strange than the next. If you really want speed (as you've seen), sometimes you have to handle things yourself.
static is a dreadful thing because it plays really badly with thread safety and is wholly unnecessary.
The modern way is one of the following:
Declare the memory further up on the stack. vector<> or array<> or even malloc if you like. Pass a pointer (or, equivalently, reference) to this memory into your function.
int main()
{
vector<double> storage;
while (1)
{
worker(&storage,0,0);
}
return 0;
}
Or, write your function as a member of a class. Declare the memory as a member of your class. Create one instance of your class, then call the function repeatedly:
struct oo_hack
{
void worker (struct* ptr, size_t m)
{
// Do some things using storage
}
vector<double> storage;
};
int main()
{
oo_hack myhack; // This is on the stack, but has a vector inside
while (1)
{
myhack.worker(0,0);
}
return 0;
} // memory gets freed here
I'd suggest declaring the memory further up on the stack and passing it into the functions, but you may prefer the second.
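A minimal sketch of the class-based option, assuming a hypothetical Result type in place of the struct from the question and placeholder arithmetic in place of the real work. The names Worker, run and stack_ are illustrative only. The single vector plays the role of the C code's contiguous stack, split into alpha and delta:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type; the struct from the question is not shown.
struct Result {
    double value = 0.0;
};

class Worker {
public:
    // Computes into `out` for problem size m, reusing the workspace.
    void run(Result& out, std::size_t m) {
        // Only touch the allocation when the problem size changes.
        if (stack_.size() != 2 * m) {
            stack_.assign(2 * m, 0.0);
        }
        double* alpha = stack_.data();
        double* delta = alpha + m;

        // Placeholder work using alpha and delta as arrays.
        for (std::size_t i = 0; i < m; ++i) {
            alpha[i] = static_cast<double>(i);
            delta[i] = 2.0 * alpha[i];
        }
        out.value = (m > 0) ? delta[m - 1] : 0.0;
    }

    std::size_t workspace_size() const { return stack_.size(); }
    const double* workspace_data() const { return stack_.data(); }

private:
    std::vector<double> stack_;  // one contiguous block: alpha, then delta
};
```

Create one Worker, call run repeatedly, and the memory is released when the Worker goes out of scope, so no worker(NULL, 0) call is needed. Each thread can own its own Worker, which avoids the thread-safety problem of static storage.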
Is it possible to allocate a variable length array to the stack in one function from another function?
One way that works is to just allocate the largest possible size up front, but I'm wondering if there is a way to avoid this.
void outside_function(){
char[] place_to_allocate_stack_array;
size_t array_size = allocate_and_fill_array(place_to_allocate_stack_array);
//do stuff with the now allocated variable length array on stack
}
size_t allocate_and_fill_array(char* place_to_allocate){
//does some stuff to determine how long the array needs to be
size_t length= determine_length();
//here I want to allocate the variable length array to the stack,
//but I want the outside_function to still be able to access it after
//the code exits allocate_and_fill_array
place_to_allocate[length];
//do stuff to fill the array with data
return length;
}
size_t determine_length(){
////unknown calculations to determine required length
}
No, even ignoring the concerns people have about using variable-length arrays (VLAs). You are trying to accomplish too much in a single function. Step back a bit and look at what you are asking.
For consistency and to get away from arrays, I'm going to rename some things. Consider this version of your setup:
class X; // instead of "char []" so we can drop the VLA baggage
size_t inner_function(X & data) { // was "allocate_and_fill_array"
// Determine how data should be allocated
// Do stuff with data
}
void outer_function() {
X data;
size_t data_size = inner_function(data);
}
Requirement #1: The inner function needs access to a variable declared in the outer function. This requires that the variable be passed as a parameter to the inner function. This in turn requires that the inner function be called after the variable is declared.
Requirement #2: The inner function determines how data should be allocated (which happens at the point of declaration). This requires that the inner function be called before the variable is declared.
These requirements have contradictory prerequisites. Not possible.
I am led to the question: what led you to this approach? You already wrote a separate determine_length function. Let outside_function call that, declare the VLA, then pass the VLA and the length to the inner function. Much simpler conceptually.
size_t determine_length() {
// unknown calculations to determine required length
}
void fill_array(char* stack_array, size_t length) {
//do stuff to fill the array with data
}
void outside_function(){
size_t length = determine_length();
char stack_array[length];
fill_array(stack_array, length);
//do stuff with the variable length array on stack
}
Still, this obsession with getting the data on the stack is probably premature. While it is true that heap storage is more expensive than stack storage, the difference is often not worth worrying about. Get your program working before jumping through hoops to tweak performance. Focus on robustness. Only spend time on a performance bottleneck after it has been identified by a profiler.
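If heap storage is acceptable, the same "determine the length, then allocate, then fill" order can be written in standard C++ without a VLA. This is only a sketch: determine_length returns a made-up constant and fill_array writes placeholder data, since the real calculations are not shown in the question. Here the helper can both size and fill the buffer, because a std::vector can be returned to the caller:

```cpp
#include <cstddef>
#include <vector>

// Stand-in for the "unknown calculations" in the question.
std::size_t determine_length() { return 16; }

// Placeholder fill: writes a, b, c, ... into the buffer.
void fill_array(char* data, std::size_t length) {
    for (std::size_t i = 0; i < length; ++i) {
        data[i] = static_cast<char>('a' + i % 26);
    }
}

// The buffer is sized inside the helper and handed back to the caller.
std::vector<char> make_filled_array() {
    const std::size_t length = determine_length();
    std::vector<char> buffer(length);
    fill_array(buffer.data(), buffer.size());
    return buffer;  // moved (or elided), not copied element by element
}
```

The caller writes `std::vector<char> data = make_filled_array();` and reads the length from `data.size()`.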
I have some code using a variable length array (VLA), which compiles fine in gcc and clang, but does not work with MSVC 2015.
class Test {
public:
Test() {
P = 5;
}
void somemethod() {
int array[P];
// do something with the array
}
private:
int P;
};
There seem to be two solutions in the code:
using alloca(), taking the risks of alloca into account by making absolutely sure not to access elements outside of the array.
using a vector member variable (assuming that the overhead between vector and c array is not the limiting factor as long as P is constant after construction of the object)
The vector would be more portable (less #ifdef testing which compiler is used), but I suspect alloca() to be faster.
The vector implementation would look like this:
class Test {
public:
Test() {
P = 5;
init();
}
void init() {
array.resize(P);
}
void somemethod() {
// do something with the array
}
private:
int P;
vector<int> array;
};
Another consideration: when I only change P outside of the function, is having an array on the heap which isn't reallocated even faster than having a VLA on the stack?
Maximum P will be about 400.
You could and probably should use some dynamically allocated heap memory, such as managed by a std::vector (as answered by Peter). You could use smart pointers, or plain raw pointers (new, malloc,....) that you should not forget to release (delete,free,....). Notice that heap allocation is probably faster than what you believe (practically, much less than a microsecond on current laptops most of the time).
Sometimes you can move the allocation out of some inner loop, or grow it only occasionally (so for a realloc-like thing, better use unsigned newsize=5*oldsize/4+10; than unsigned newsize=oldsize+1; i.e. have some geometrical growth). If you can't use vectors, be sure to keep separate allocated size and used lengths (as std::vector does internally).
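A rough sketch of the geometric-growth idea for the case where std::vector is not available. IntBuffer, buffer_push and buffer_free are hypothetical names; the growth formula is the one quoted above, and the allocated capacity is kept separate from the used length:

```cpp
#include <cstddef>
#include <cstdlib>

struct IntBuffer {
    int* data = nullptr;
    std::size_t length = 0;    // elements in use
    std::size_t capacity = 0;  // elements allocated
};

// Appends value, growing geometrically. Returns false if allocation fails.
bool buffer_push(IntBuffer& b, int value) {
    if (b.length == b.capacity) {
        const std::size_t new_capacity = 5 * b.capacity / 4 + 10;
        void* p = std::realloc(b.data, new_capacity * sizeof(int));
        if (p == nullptr) {
            return false;  // the old block is still valid and unchanged
        }
        b.data = static_cast<int*>(p);
        b.capacity = new_capacity;
    }
    b.data[b.length++] = value;
    return true;
}

// Releases the storage and resets the buffer to its empty state.
void buffer_free(IntBuffer& b) {
    std::free(b.data);
    b = IntBuffer{};
}
```

With this formula, 100 pushes trigger only six reallocations (capacities 10, 22, 37, 56, 80, 110) instead of 100.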
Another strategy would be to special case small sizes vs bigger ones. e.g. for an array less than 30 elements, use the call stack; for bigger ones, use the heap.
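The small-versus-large split could look roughly like this. The threshold of 30 is taken from the text; kStackLimit, compute and sum_of_squares are made-up names, and the computation is only a placeholder for real work on the scratch array:

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t kStackLimit = 30;  // threshold suggested in the text

// Placeholder computation that needs n ints of scratch space.
long sum_of_squares(int* scratch, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        scratch[i] = static_cast<int>(i * i);
        total += scratch[i];
    }
    return total;
}

long compute(std::size_t n) {
    if (n < kStackLimit) {
        // Small case: fixed-size array in the call frame, no heap traffic.
        std::array<int, kStackLimit> small{};
        return sum_of_squares(small.data(), n);
    }
    // Large case: fall back to the heap.
    std::vector<int> large(n);
    return sum_of_squares(large.data(), n);
}
```

Both branches are standard C++, so no VLA extension or alloca is required.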
If you insist on allocating (using VLAs -they are a commonly available extension of standard C++11- or alloca) on the call stack, be wise to limit your call frame to a few kilobytes. The total call stack is limited (e.g. often to about a megabyte or a few of them on many laptops) to some implementation specific limit. In some OSes you can raise that limit (see also setrlimit(2) on Linux)
Be sure to benchmark before hand-tuning your code. Don't forget to enable compiler optimization (e.g. g++ -O2 -Wall with GCC) before benchmarking. Remember that cache misses are generally much more expensive than heap allocation. Don't forget that developer's time also has some cost (which often is comparable to cumulated hardware costs).
Notice that using static variable or data has also issues (it is not reentrant, not thread safe, not async-signal-safe -see signal-safety(7) ....) and is less readable and less robust.
First of all, you're getting lucky if your code compiles with ANY C++ compiler as is. VLAs are not standard C++. Some compilers support them as an extension.
Using alloca() is also not standard, so is not guaranteed to work reliably (or even at all) when using different compilers.
Using a static vector is inadvisable in many cases. In your case, it gives behaviour that is potentially not equivalent to the original code.
A third option you may wish to consider is
// in definition of class Test
void somemethod()
{
std::vector<int> array(P); // assume preceding #include <vector>
// do something with array
}
A vector is essentially a dynamically allocated array, but will be cleaned up properly in the above when the function returns.
The above is standard C++. Unless you perform rigorous testing and profiling that provides evidence of a performance concern this should be sufficient.
Why don't you make the array a private member?
#include <vector>
class Test
{
public:
Test()
{
data_.resize(5);
}
void somemethod()
{
// do something with data_
}
private:
std::vector<int> data_;
};
As you've specified a likely maximum size of the array, you could also look at something like boost::small_vector, which could be used like:
#include <cstddef>
#include <boost/container/small_vector.hpp>
class Test
{
public:
    Test()
    {
        data_.resize(5);
    }
    void somemethod()
    {
        // do something with data_
    }
private:
    // Must be static to be usable as a template argument
    static constexpr std::size_t preset_capacity_ = 400;
    // Up to preset_capacity_ elements live inside the object itself
    boost::container::small_vector<int, preset_capacity_> data_;
};
You should profile to see if this is actually better, and be aware this will likely use more memory, which could be an issue if there are many Test instances.
I am relatively new to C++...
I am learning and coding but I am finding the idea of pointers to be somewhat fuzzy. As I understand it * points to a value and & points to an address...great but why? Which is byval and which is byref and again why?
And while I feel like I am learning and understanding the idea of stack vs heap, runtime vs design time etc, I don't feel like I'm fully understanding what is going on. I don't like using coding techniques that I don't fully understand.
Could anyone please elaborate on exactly what and why the pointers in this fairly "simple" function below are used, esp the pointer to the function itself.. [got it]
Just asking how to clean up (delete[]) the str... or if it just goes out of scope.. Thanks.
char *char_out(AnsiString ansi_in)
{
// allocate memory for char array
char *str = new char[ansi_in.Length() + 1];
// copy contents of string into char array
strcpy(str, ansi_in.c_str());
return str;
}
Revision 3
TL;DR:
AnsiString appears to be an object which is passed by value to that function.
char* str is on the stack.
A new array is created on the heap with (ansi_in.Length() + 1) elements. A pointer to the array is stored in str. +1 is used because strings in C/C++ typically use a null terminator, which is a special character used to identify the end of the string when scanning through it.
ansi_in.c_str() is called, copying a pointer to its string buffer into an unnamed local variable on the stack.
str and the temporary pointer are pushed onto the stack and strcpy is called. This has the effect of copying the string (including the null terminator) pointed at by the temporary into str.
str is returned to the caller
Long answer:
You appear to be struggling to understand stack vs heap, and pointers vs non-pointers. I'll break them down for you and then answer your question.
The stack is a concept where a fixed region of memory is allocated for each thread before it starts and before any user code runs.
Ignoring lower level details such as calling conventions and compiler optimizations, you can reason that the following happens when you call a function:
Arguments are pushed onto the stack. This reserves part of the stack for use of the arguments.
The function performs some job, using and copying the arguments as needed.
The function pops the arguments off the stack and returns. This frees the space reserved for the arguments.
This isn't limited to function calls. When you declare objects and primitives in a function's body, space for them is reserved via pushing. When they're out of scope, they're automatically cleaned up by calling destructors and popping.
When your program runs out of stack space and starts using the space outside of it, you'll typically encounter an error. Regardless of what the actual error is, it's known as a stack overflow because you're going past it and therefore "overflowing".
The heap is a different concept where the remaining unused memory of the system is available for you to manually allocate and deallocate from. This is primarily used when you have a large data set that's too big for the stack, or when you need data to persist across arbitrary functions.
C++ is a difficult beast to master, but if you can wrap your head around the core concepts it becomes easier to understand.
Suppose we wanted to model a human:
struct Human
{
const char* Name;
int Age;
};
int main(int argc, char** argv)
{
Human human;
human.Name = "Edward";
human.Age = 30;
return 0;
}
This allocates at least sizeof(Human) bytes on the stack for storing the 'human' object. Right before main() returns, the space for 'human' is freed.
Now, suppose we wanted an array of 10 humans:
int main(int argc, char** argv)
{
Human humans[10];
humans[0].Name = "Edward";
humans[0].Age = 30;
// ...
return 0;
}
This allocates at least (sizeof(Human) * 10) bytes on the stack for storing the 'humans' array. This too is automatically cleaned up.
Note uses of ".". When using anything that's not a pointer, you access their contents using a period. This is direct memory access if you're not using a reference.
Here's the single object version using the heap:
int main(int argc, char** argv)
{
Human* human = new Human();
human->Name = "Edward";
human->Age = 30;
delete human;
return 0;
}
This allocates sizeof(Human*) bytes on the stack for the pointer 'human', and at least sizeof(Human) bytes on the heap for storing the object it points to. The object 'human' points to is not automatically cleaned up; you must call delete to free it. Note uses of "a->b". When using pointers, you access their contents using the "->" operator. This is indirect memory access, because you're accessing memory through an address stored in a variable.
It's sort of like mail. When someone wants to mail you something they write an address on an envelope and submit it through the mail system. A mailman takes the mail and moves it to your mailbox. For comparison the pointer is the address written on the envelope, the memory management unit(mmu) is the mail system, the electrical signals being passed down the wire are the mailman, and the memory location the address refers to is the mailbox.
Here's the array version using the heap:
int main(int argc, char** argv)
{
Human* humans = new Human[10];
humans[0].Name = "Edward";
humans[0].Age = 30;
// ...
delete[] humans;
return 0;
}
This allocates sizeof(Human*) bytes on the stack for pointer 'humans', and (sizeof(Human) * 10) bytes on the heap for storing the array it points to. 'humans' is also not automatically cleaned up; you must call delete[] to free it.
Note uses of "a[i].b" rather than "a[i]->b". The "[]" operator(indexer) is really just syntactic sugar for "*(a + i)", which really just means treat it as a normal variable in a sequence so I can type less.
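To make the equivalence concrete, here is a small sketch that reads the same field in three ways. The Human struct is repeated from above so the snippet stands alone; the three function names are made up for illustration:

```cpp
#include <cstddef>

struct Human {
    const char* Name;
    int Age;
};

// All three functions read the Age of element i of the array a.
int age_by_index(const Human* a, std::size_t i) { return a[i].Age; }
int age_by_deref(const Human* a, std::size_t i) { return (*(a + i)).Age; }
int age_by_arrow(const Human* a, std::size_t i) { return (a + i)->Age; }
```

a[i] already yields the Human itself rather than a pointer to it, which is why "." is used after the index. "->" only appears once you are holding a pointer such as a + i.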
In both of the above heap examples, if you didn't write delete/delete[], the memory that the pointers point to would leak. (A dangling pointer is the opposite problem: a pointer that still refers to memory after it has been freed.) Leaks are bad because, if left unchecked, they can eat through all your available memory, eventually crashing your program when there isn't enough left or the OS decides other apps are more important than yours.
Using the stack is usually the wiser choice, as you get automatic lifetime management via scope (aka RAII) and better data locality. The only "drawback" to this approach is that, because of scoped lifetime, you can only use stack variables within the scope they're declared in. C++ does allow you to copy pointers and references to stack variables and use them indirectly outside that scope, but once the scope has exited they refer to objects that no longer exist. This is almost always a very bad idea; don't do it unless you really know what you're doing. I can't stress this enough.
Passing an argument by-ref means pushing a copy of a pointer or reference to the data on the stack. As far as the computer is concerned pointers and references are the same thing. This is a very lightweight concept, but you typically need to check for null in functions receiving pointers.
Pointer variant of an integer adding function:
int add(const int* firstIntPtr, const int* secondIntPtr)
{
if (firstIntPtr == nullptr) {
throw std::invalid_argument("firstIntPtr cannot be null.");
}
if (secondIntPtr == nullptr) {
throw std::invalid_argument("secondIntPtr cannot be null.");
}
return *firstIntPtr + *secondIntPtr;
}
Note the null checks. If the function didn't verify that its arguments are valid, they could very well be null or point to memory the app doesn't have access to. Attempting to read such values via dereferencing (*firstIntPtr/*secondIntPtr) is undefined behavior and, if you're lucky, results in a segmentation fault (aka access violation on Windows), crashing the program. When this happens and your program doesn't crash, there are deeper issues with your code that are out of the scope of this answer.
Reference variant of an integer adding function:
int add(const int& firstInt, const int& secondInt)
{
return firstInt + secondInt;
}
Note the lack of null checks. By design C++ limits how you can acquire references, so you're not supposed to be able to pass a null reference, and therefore no null checks are required. That said, it's still possible to get a null reference by converting a pointer to a reference, but if you're doing that and not checking for null before converting, you have a bug in your code.
Passing an argument by-val means pushing a copy of it on the stack. You almost always want to pass small data structures by value. You don't have to check for null when passing values because you're passing the actual data itself and not a pointer to it.
i.e.
int add(int firstInt, int secondInt)
{
return firstInt + secondInt;
}
No null checks are required because values, not pointers are used. Values can't be null.
Assuming you're interested in learning about all this, I highly suggest you use std::string for all your string needs and std::unique_ptr for managing pointers.
i.e.
std::string char_out(AnsiString ansi_in)
{
return std::string(ansi_in.c_str());
}
std::unique_ptr<char[]> char_out(AnsiString ansi_in)
{
std::unique_ptr<char[]> str(new char[ansi_in.Length() + 1]);
strcpy(str.get(), ansi_in.c_str());
return str; // std::move(str) if you're using an older C++11 compiler.
}
I am creating an Arduino device using C++. I need a stack object with variable size and variable data types. Essentially this stack needs to be able to be resized and used with bytes, chars, ints, doubles, floats, shorts, and longs.
I have a basic class setup, but with the amount of dynamic memory allocation that is required, I wanted to make sure that my use of data frees enough space for the program to continue without memory problems. This does not use std methods, but instead built in versions of those for the Arduino.
For clarification, my question is: Are there any potential memory problems in my code?
NOTE: This is not on the Arduino Stack Exchange because it requires an in-depth knowledge of C/C++ memory allocation that could be useful to all C and C++ programmers.
Here's the code:
Stack.h
#pragma once
class Stack {
public:
void init();
void deinit();
void push(byte* data, size_t data_size);
byte* pop(size_t data_size);
size_t length();
private:
byte* data_array;
};
Stack.cpp
#include "Arduino.h"
#include "Stack.h"
void Stack::init() {
// Initialize the Stack as having no size or items
data_array = (byte*)malloc(0);
}
void Stack::deinit() {
// free the data so it can be re-used
free(data_array);
}
// Push an item of variable size onto the Stack (byte, short, double, int, float, long, or char)
void Stack::push(byte* data, size_t data_size) {
data_array = (byte*)realloc(data_array, sizeof(data_array) + data_size);
for(size_t i = 0; i < sizeof(data); i++)
data_array[sizeof(data_array) - sizeof(data) + i] = data[i];
}
// Pop an item of variable size off the Stack (byte, short, double, int, float, long, or char)
byte* Stack::pop(size_t data_size) {
byte* data;
if(sizeof(data_array) - data_size >= 0) {
data = (byte*)(&data_array + sizeof(data_array) - data_size);
data_array = (byte*)realloc(data_array, sizeof(data_array) - data_size);
} else {
data = NULL;
}
// Make sure to free(data) when done with the data from pop()!
return data;
}
// Return the sizeof the Stack
size_t Stack::length() {
return sizeof(data_array);
}
There are some minor code bugs, apparently, which -- although important -- are easily resolved. The following answer only applies to the overall design of this class:
There is nothing wrong with just the code that is shown.
But only the code that's shown. No opinion is rendered on any code that's not shown.
And, it's fairly likely that there are going to be massive problems, and memory leaks, in the rest of the code which will attempt to use this class.
It's going to be very, very easy to use this class in a way that leaks or corrupts memory. It's going to be much harder to use this class correctly, and much easier to screw up. The fact that these functions themselves appear to do their job correctly is not going to help if all you have to do is sneeze in the wrong direction, and end up with these functions not being used in the proper order, or sequence.
Just to name the first two readily apparent problems:
1) Failure to call deinit(), when any instance of this class goes out of scope and gets destroyed, will leak memory. Every time you use this class, you have to be cognizant of when the instance of this class goes out of scope and gets destroyed. It's easy to keep track of every time you create an instance of this class, and it's easy to remember to call init() every time. But keeping track of every possible way an instance of this class could go out of scope and get destroyed, so that you must call deinit() and free up the internal memory, is much harder. It's very easy to not even realize when that happens.
2) If an instance of this class gets copy-constructed, or the default assignment operator gets invoked, this is guaranteed to result in memory corruption, with an extra side-helping of a memory leak.
Note that you don't have to go out of your way to write code that copy-constructs, or assigns one instance of the object to another one. The compiler will be more than happy to do it for you, if you do not pay attention.
Generally, the best way to avoid these kinds of problems is to make it impossible to happen, by using the language correctly. Namely:
1) Following the RAII design pattern. Get rid of init() and deinit(). Instead, do this work in the object's constructor and destructor.
2) Either deleting the copy constructor and the assignment operator, or implementing them correctly. So, if instances of this class should never be copy-constructed or assigned-to, it's much better to have the compiler yell at you, if you accidentally write some code that does that, instead of spending a week tracking down where that happens. Or, if the class can be copy-constructed or assigned, doing it properly.
Of course, if there would only be a small number of instances of this class, it should be possible to safely use it, with tight controls, and lots of care, without doing this kind of a redesign. But, even if it were the case, it's always better to do the job right, instead of shrugging this off now, but then later deciding to expand the use of this class in more places, and then forgetting about the fact that this class is so error-prone.
P.S.: a few of the minor bugs that I mentioned in the beginning:
data_array = (byte*)realloc(data_array, sizeof(data_array) + data_size);
This can't be right. data_array is a byte *, so sizeof(data_array) will always be a compile-time constant, which would be sizeof(byte *). That's obviously not what you want here. You need to explicitly keep track of the allocated array's size.
The same general bug appears in several other places here, but it's easily fixed. The overall class design is the bigger problem.
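For illustration, a minimal sketch of what the suggested redesign might look like: constructor and destructor instead of init()/deinit(), copying disabled, and the size tracked explicitly instead of via sizeof. It assumes a hosted C++11 toolchain with the standard <cstdlib> and <cstring> headers, which may not match the Arduino environment, and it changes pop() to copy into a caller-supplied buffer so that no pointer into reallocated memory is ever handed out:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

class Stack {
public:
    Stack() = default;
    ~Stack() { std::free(data_); }  // runs automatically at end of scope

    // Copying would make two objects free the same block, so forbid it.
    Stack(const Stack&) = delete;
    Stack& operator=(const Stack&) = delete;

    // Appends `size` bytes from src. Returns false if allocation fails.
    bool push(const void* src, std::size_t size) {
        if (size == 0) return true;
        void* p = std::realloc(data_, size_ + size);
        if (p == nullptr) return false;  // existing contents stay intact
        data_ = static_cast<std::uint8_t*>(p);
        std::memcpy(data_ + size_, src, size);
        size_ += size;
        return true;
    }

    // Copies the top `size` bytes into dst and removes them from the stack.
    bool pop(void* dst, std::size_t size) {
        if (size > size_) return false;
        if (size == 0) return true;
        size_ -= size;
        std::memcpy(dst, data_ + size_, size);
        return true;
    }

    std::size_t length() const { return size_; }

private:
    std::uint8_t* data_ = nullptr;
    std::size_t size_ = 0;  // bytes in use, tracked explicitly
};
```

The caller is still responsible for popping with the same sizes, in reverse order, that were pushed. The storage itself is released by the destructor, and an accidental copy is now a compile error.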
I have a function
void fname(char* Ptr)
{
...
}
I want to know inside this function whether this pointer Ptr holds the address of dynamically allocated memory using new char[] or the address of locally allocated memory in the calling function. Is there any way I can determine that? I think <typeinfo> doesn't help here.
One way to do this is to have your own operator new functions and keep track of everything allocated so that you can just ask your allocation library if the address given is one it allocated. The custom allocator then just calls the standard one to actually do the allocation.
Another approach (messy and details highly OS dependent) may be to examine the process layout in virtual memory and hence determine which addresses refer to which areas of memory.
You can combine these ideas by actually managing your own memory pools. So if you get a single large chunk of system memory with known address bounds and use that for all new'd memory, you can just check that an address is in the given range to answer your question.
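A toy sketch of the pool idea, assuming every "dynamic" char buffer in the program is handed out by one Pool object (a made-up class, not a library type). It is a simple bump allocator with no alignment handling and no per-block free, which is enough to show the address-range check:

```cpp
#include <cstddef>
#include <cstdint>

class Pool {
public:
    explicit Pool(std::size_t bytes)
        : begin_(new char[bytes]), size_(bytes), used_(0) {}
    ~Pool() { delete[] begin_; }

    Pool(const Pool&) = delete;
    Pool& operator=(const Pool&) = delete;

    // Hands out the next `bytes` bytes, or nullptr if the pool is exhausted.
    void* allocate(std::size_t bytes) {
        if (bytes > size_ - used_) return nullptr;
        void* p = begin_ + used_;
        used_ += bytes;
        return p;
    }

    // True if p points into the pool's address range.
    bool owns(const void* p) const {
        // Compare as integers: relational comparison of pointers into
        // different objects is not reliably specified.
        const auto addr = reinterpret_cast<std::uintptr_t>(p);
        const auto lo = reinterpret_cast<std::uintptr_t>(begin_);
        return addr >= lo && addr - lo < size_;
    }

private:
    char* begin_;
    std::size_t size_;
    std::size_t used_;
};
```

Inside fname, `pool.owns(Ptr)` then answers the question, but only for memory that really came from this pool.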
However: Any of these ideas is a lot of work and not appropriate if this problem is the only purpose in doing so.
Having said all that, if you do want to implement something, you will need to work carefully through all the ways that an address might be generated.
For example (and surely I've missed some):
Stack
Return from new
Inside something returned from new.
Was returned from new but already deleted (hopefully not, but that's why we need diagnostics)
statically allocated
static constant memory
command line arguments/ environment
code addresses.
Now, ignoring all that for a moment, and assuming this is for some debug purpose rather than system design, you might be able to try this kind of thing:
This is ugly, unreliable, not guaranteed by the standard, etc etc, but might work . . .
char* firstStack = 0;
bool isOnStack(const void* p)
{
char* check =(char*)p;
char * here = (char*)&check;
int a = firstStack - check;
int b = check - here;
return (a*b > 0);
}
void g(const char* p)
{
bool onStack = isOnStack(p);
std::cout << p << (onStack ? "" : " not" ) << " on stack " << std::endl;
}
void f()
{
char stuff[1024] = "Hello";
g(stuff);
}
void h()
{
char* nonsense = new char[1024];
strcpy(nonsense, "World");
g(nonsense);
delete [] nonsense;
}
int main()
{
int var = 0;
firstStack = (char*)&var;
f();
h();
}
Output:
Hello on stack
World not on stack
The short answer: no, you can't. You have no way of knowing whether Ptr is a pointer to a single char, the start of a statically allocated array, a pointer to a single dynamically allocated char, or the start of an array thereof.
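If you really wanted to, you could try an overload that takes the array by reference (a plain char Ptr[N] parameter would decay to char*, and N could not be deduced):
template <std::size_t N>
void fname(char (&Ptr)[N])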
{
// ...
}
which would match when passed a statically allocated array, whereas the first version would be picked when dealing with dynamically allocated memory or a pointer to a single char.
(But note that function overloading rules are a bit complicated in the presence of templates -- in particular, a non-template function is preferred if it matches. So you might need to make the original function take a "dummy" template parameter if you go for this approach.)
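One way the "dummy template parameter" variant might be written, as a sketch rather than a definitive recipe. The Kind enum and the std::enable_if constraint are additions for illustration. Taking the pointer as const T& prevents arrays from decaying during deduction, so the pointer overload drops out for real arrays and the two overloads never compete:

```cpp
#include <cstddef>
#include <type_traits>

enum class Kind { Array, Pointer };

// Selected when the argument is an actual array; N is deduced from its type.
template <std::size_t N>
Kind fname(char (&)[N]) { return Kind::Array; }

// Selected when the argument is a pointer. For an array argument T would be
// deduced as char[N], which is not a pointer, so this overload is removed.
template <typename T,
          typename std::enable_if<std::is_pointer<T>::value, int>::type = 0>
Kind fname(const T&) { return Kind::Pointer; }
```

Note what this does and does not tell you: it distinguishes "array with a known size" from "pointer", not stack from heap. A char* that points at a local array still selects the pointer overload.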
In vc++ there is an assertion _CrtIsMemoryBlock (http://msdn.microsoft.com/en-us/library/ww5t02fa.aspx#BKMK_CRT_assertions) that can be used to check if a pointer was allocated from the heap. This will only work when a debug heap is being used but this is fine if you are just wanting to add some 'debug only' assertions. This method has worked well for me in the past under Windows.
For Linux however I know of no such equivalent.
Alternatively you could use an inline assembler block to try to determine whether it is a stack address or not. This would be hardware dependent, as it would rely heavily not only on the processor type but also on the memory model being used (flat address model vs segmented etc). It's probably best to avoid this type of approach.