g++ compiler hints to allocate on stack

Are there any ways to hint to the compiler that some objects have more static behaviour, so that they are allocated on the stack instead of the heap?
For example, a string object might have an effectively constant size inside some functions.
I'm asking because I'm trying to improve the performance of an application using OpenMP. I've already brought the serial run time down from 50 to 20 seconds, and it drops to 12 seconds with parallelism (most of the code can run in parallel). I'd like to improve it further. I think one limitation is the continuous allocation and release of dynamic memory inside the same process.
The serial optimizations so far consisted of moving to a more ANSI C style, with more hardcoded allocation of variables (they are still allocated dynamically, but sized for the worst case, so everything is allocated once).
Now I'm pretty much stuck, because I've reached a part of the code that relies heavily on C++ idioms.

The standard std::basic_string template (of which std::string is a specialization) accepts an allocator as its third template argument, so you could supply your own stack-based allocator instead of std::allocator. That would be brittle and tricky, though: you could use alloca(3), but only if every allocation is inlined, because otherwise alloca won't behave as you want. I don't recommend this approach.
A more viable approach could be to write your own arena- or region-based allocator. See std::allocator_traits.
You could perhaps simply use the C snprintf(3) on a large enough local buffer (e.g. char buf[128];).
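As a minimal sketch of the snprintf suggestion above: the helper name format_pair and the "name=value" format are made up for illustration. The point is only that the caller owns a fixed local buffer and can detect truncation from the return value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: writes "name=value" into caller-provided storage.
// Returns true only if the whole text (plus the terminating NUL) fit.
bool format_pair(char* out, std::size_t cap, const char* name, int value) {
    assert(out != nullptr && cap > 0);
    // snprintf never writes more than cap bytes and always NUL-terminates
    // when cap > 0; it returns the length the full text would have had.
    const int n = std::snprintf(out, cap, "%s=%d", name, value);
    return n >= 0 && static_cast<std::size_t>(n) < cap;
}
```

A caller would declare char buf[128]; on the stack and pass sizeof buf as the capacity, so no heap allocation happens on this path.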

I think you are looking for a small buffer optimization.
A detailed description can be found here.
The basic idea is to add a union to the class that holds the buffer:
class string
{
    union Buffer
    {
        char* _begin;      // heap pointer, used when the text does not fit locally
        char  _local[16];  // in-object storage for short strings
    };
    Buffer _buffer;
    size_t _size;
    size_t _capacity;
    // ...
};

So you are looking for a way to find performance problems through static analysis?
It's a good idea. cppcheck has some checks of that kind, but they are very rudimentary.
I'm not aware of any tool that does that so far.
There are, however, tools that do different things:
jemalloc
jemalloc has an allocation profiler. (See: http://www.canonware.com/jemalloc/)
Perhaps this is of some help to you. I haven't tried it myself so far, but I would expect it to report object lifetimes and the objects that put the highest pressure on the allocator (so you can find the most painful parts first).
cacheGrind
Valgrind also has a cache and branch prediction simulator. http://valgrind.org/docs/manual/cg-manual.html
clang-check
If you find yourself with too much free time, you can try writing your own checking tools on top of clang-check.
google perftools
The Google perf tools also have a heap profiler. https://code.google.com/p/gperftools/

Related

Passing the results of `std::string::c_str()` to `mkdtemp()` using `const_cast<char*>()`

OK, so: we all know that generally the use of const_cast<>() anywhere is so bad it’s practically a programming war crime. So this is a hypothetical question about how bad it might be, exactly, in a specific case.
To wit: I ran across some code that did something like this:
std::string temporary = "/tmp/directory-XXXXXX";  // mkdtemp() requires exactly six trailing X characters
const char* dtemp = ::mkdtemp(const_cast<char*>(temporary.c_str()));
/// `temporary` is unused hereafter
… now, I have run across numerous descriptions about how to get writeable access to the underlying buffer of a std::string instance (q.v. https://stackoverflow.com/a/15863513/298171 for example) – all of them have the caveat that yes, these methods aren’t guaranteed to work by any C++ standard, but in practice they all do.
With this in mind, I am just curious on how using const_cast<char*>(string.c_str()) compares to other known methods (e.g. the aforementioned &string[0], &c)… I ask because the code in which I found this method in use seems to work fine in practice, and I thought I’d see what the experts thought before I attempt the inevitable const_cast<>()-free rewrite.

const cannot be enforced at the hardware level because, in practice, the read-only attribute can only be set on a whole 4K memory page, and huge pages, which drastically reduce the CPU's TLB lookup misses, are becoming common.
const doesn't affect code generation the way C99's restrict (spelled __restrict in g++) does. Roughly speaking, const means "poison all write attempts to this data, I'd like to protect my invariants here".
Since std::string is a mutable string, its underlying buffer cannot be allocated in read-only memory. So const_cast<> shouldn't crash the program here unless you change bytes outside the underlying buffer's bounds or try to delete, free() or realloc() something. However, altering the chars in the buffer may count as an invariant violation. Because you don't use the std::string instance afterwards and simply throw it away, this shouldn't provoke a crash unless some particular std::string implementation decides to check its invariants before destruction and forces a crash if any are broken. Such a check couldn't be done in less than O(N) time, and std::string is a performance-critical class, so it is unlikely that anyone does it.
Another issue may come from the copy-on-write strategy: by modifying the buffer directly you may break another std::string instance that shares the buffer with your string. But a few years ago the majority of C++ experts concluded that COW is too fragile and too slow, especially in multi-threaded environments, so modern C++ libraries shouldn't use it; instead they rely on move construction where possible and avoid heap traffic for short strings.
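For the const_cast-free rewrite the question mentions, one possible sketch is to give mkdtemp() a buffer that is writable to begin with. The function name make_temp_dir is hypothetical, and the code assumes a POSIX system where mkdtemp() is declared in <stdlib.h>.

```cpp
#include <cassert>
#include <stdlib.h>   // mkdtemp (POSIX)
#include <string>
#include <unistd.h>   // rmdir (POSIX), handy for cleaning up afterwards
#include <vector>

// Hypothetical helper: creates a unique directory from a pattern such as
// "/tmp/directory-XXXXXX" and returns its path, or "" on failure.
std::string make_temp_dir(const std::string& pattern) {
    assert(pattern.size() >= 6);  // mkdtemp needs a trailing "XXXXXX"
    // Copy the pattern into storage we are allowed to modify.
    std::vector<char> buf(pattern.begin(), pattern.end());
    buf.push_back('\0');
    if (::mkdtemp(buf.data()) == nullptr) {
        return std::string();  // errno describes the failure
    }
    return std::string(buf.data());
}
```

The copy costs one small allocation, which is negligible next to the directory creation itself, and the original std::string is never written through a const pointer.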

How to use C++ STD with AVR compiler?

I have set up the AVR compiler for using with an Atmel microcontroller using this guide.
I don't have access to strings, vectors etc. How can this be added?

The quick answer is that they are not available, and you need to write your own wrapper classes to get this sort of functionality.
If you want to use C++ on an embedded platform you won't have access to all of the standard library. Importantly, though, you don't want all of the standard library, as it's too heavyweight for some embedded projects. Some language features (like exception handling) might not be possible on the platform you are choosing, or might be too expensive given the resources available to you. The lack of some language features makes it impossible to implement certain standard containers; for example, the containers that can throw exceptions might not be implementable in a standards-conforming way on some platforms. Additionally, there are some C++ constructs that might be available but would be a bad idea to use on an embedded platform. Dynamic allocation of memory via new and delete will very likely cause a significant number of problems, as you don't have a lot of memory and issues such as memory fragmentation are very difficult to deal with. (If you need dynamic memory for some reason, you would probably want to look into placement new along with some other memory allocation scheme to avoid some of these issues.)
If you want the benefits of containers like std::array and std::string you will need to write your own memory management classes. One of the main benefits of the std containers is the way they greatly simplify your memory management (compared with using raw C-style arrays). If you are doing a large embedded C++ project you can write your own wrappers for the memory management using RAII and other basic C++ language constructs. For the most part you need to avoid dynamic memory allocation and exception handling when writing these classes.
One of the things I find has a good ROI is making some structs/classes that wrap an array along with the length of the array. By keeping the sizes connected you can keep your code a lot clearer. Frequently I find myself writing something like this:
#include <stdint.h>

template<typename T, uint8_t MAX_SIZE>
class array_helper{
public:
    typedef T value_type;
    array_helper():
        m_data()  // value-initializes every element
    {}
    T& operator[](unsigned int idx){
        return m_data[idx];  // no bounds check, like a raw array
    }
    T* data(){
        return this->m_data;
    }
    static const uint8_t s_max_size = MAX_SIZE;
private:
    T m_data[MAX_SIZE];
};
You would want to expand on this to do what you need, but hopefully this gives you an idea.

Do not do this.
Using dynamic memory allocation on AVR is not advisable: it has no MMU and only very limited RAM, and dynamic memory allocation requires some bookkeeping overhead.
There is also the danger of memory fragmentation.
On such tiny processors you should only use static and automatic fixed-size memory buffers.
That ensures deterministic run-time behavior.

Static arrays VS. dynamic arrays in C++11

I know that it's a very old debate that has already been discussed many times all over the world. But I'm currently having trouble deciding between static and dynamic arrays in a particular case. If I weren't using C++11, I would have used static arrays. But I'm now confused, since both seem to offer equivalent benefits.
First solution:
template<size_t N>
class Foo
{
private:
    int array[N];
public:
    // Some functions
};
Second solution:
template<size_t N>
class Foo
{
private:
    int* array;
public:
    // Some functions
};
I can't manage to choose, since the two have their own advantages:
Static arrays are faster, and we don't have to care about memory management at all.
Dynamic arrays don't weigh anything as long as memory is not allocated. After that, they are less handy to use than static arrays. But since C++11 we can get great benefits from move semantics, which we cannot use with static arrays.
I don't think there is one good solution, but I would like to get some advice or just to know what you think of all that.

I will actually disagree with the "it depends". Never use option 2. If you want to use a translation-time constant, always use option 1 or std::array. The one advantage you listed, that dynamic arrays weigh nothing until allocated, is actually a horrible, huge disadvantage, and one that needs to be pointed out with great emphasis.
Do not ever have objects that have more than one phase of construction. Never, ever. That should be a rule committed to memory through some large tattoo. Just never do it.
When you have zombie objects that are not quite alive yet, though not quite dead either, the complexity of managing their lifetime grows exponentially. You have to check in every method whether the object is fully alive or only pretending to be alive. Exception safety requires special cases in your destructor. Instead of one simple construction and automatic destruction, you've now added requirements that must be checked at N different places (# methods + dtor). And the compiler doesn't care whether you check. And other engineers won't have this requirement broadcast to them, so they may adjust your code in unsafe ways, using variables without checking. And now all these methods have multiple behaviors depending on the state of the object, so every user of the object needs to know what to expect. Zombies will ruin your (coding) life.
Instead, if you have two different natural lifetimes in your program, use two different objects. But that means you have two different states in your program, so you should have a state machine, with one state having just one object and another state having both, separated by an asynchronous event. If there is no asynchronous event between the two points, and they all fit in one function scope, then the separation is artificial and you should be doing single-phase construction.
The only case where a translation time size should translate to a dynamic allocation is when the size is too large for the stack. This then gets to memory optimisation, and it should always be evaluated using memory and profiling tools to see what's best. Option 2 will never be best (it uses a naked pointer - so again we lose RAII and any automatic cleanup and management, adding invariants and making the code more complex and easily breakable by others). Vector (as suggested by bitmask) would be the appropriate first thought, though you may not like the heap allocation costs in time. Other options might be static space in your application's image. But again, these should only be considered once you've determined that you have a memory constraint and what to do from there should be determined by actual measurable needs.

Use neither. You're better off using std::vector in nearly any case. In the other cases, that heavily depends on the reason why std::vector would be insufficient and hence cannot be answered generally!
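The two recommendations from the answers above (std::array for a translation-time size, std::vector when the storage should live on the heap) can be sketched as follows. The class names StackFoo and HeapFoo are made up for illustration; both are fully usable as soon as the constructor returns, so there is no second construction phase.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Storage inside the object: no allocation, copies touch all N elements.
template <std::size_t N>
class StackFoo {
public:
    StackFoo() : values_() {}  // value-initialized, so every element is 0
    int& operator[](std::size_t i) { assert(i < N); return values_[i]; }
    int sum() const { return std::accumulate(values_.begin(), values_.end(), 0); }
private:
    std::array<int, N> values_;
};

// Storage on the heap: allocated once in the constructor, cheap to move.
template <std::size_t N>
class HeapFoo {
public:
    HeapFoo() : values_(N) {}  // N zero-initialized elements
    int& operator[](std::size_t i) { assert(i < N); return values_[i]; }
    int sum() const { return std::accumulate(values_.begin(), values_.end(), 0); }
private:
    std::vector<int> values_;
};
```

Moving a HeapFoo only transfers the vector's internal pointers, which is the move-semantics benefit the question attributes to dynamic arrays, without a naked pointer or manual cleanup.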

I'm currently having a problem to decide which one I should use more than another in a particular case.
You'll need to consider your options case-by-case to determine the optimal solution for the given context -- that is, a generalization cannot be made. If one container were ideal for every scenario, the other would be obsolete.
As mentioned already, consider using std implementations before writing your own.
More details:
Fixed Length
Be careful of how much of the stack you consume.
May consume more memory, if you treat it as a dynamically sized container.
Fast copies.
Variable Length
Reallocation and resizing can be costly.
May consume more memory than needed.
Fast moves.
The better choice also requires you understand the complexity of creation, copy, assign, etc. of the element types.
And if you do use std implementations, remember that implementations may vary.
Finally, you can create a container for these types which abstract the implementation details and dynamically select an appropriate data member based on the size and context -- abstracting the detail behind a general interface. This is also useful at times to disable features, or to make some operations (e.g. costly copies) more obvious.
In short, you need to know a lot about the types and usage, and measure several aspects of your program to determine the optimal container type for a specific scenario.
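The idea of hiding the storage choice behind one interface could look roughly like this. It is a sketch under assumptions: the name Buffer and the 256-byte threshold are invented, and the selection here happens at compile time from the element count, which is only one of the ways such a container might choose.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical container: small sizes are stored in the object itself,
// large sizes in a std::vector, behind the same interface.
template <class T, std::size_t N, std::size_t MaxInlineBytes = 256>
class Buffer {
    static constexpr bool kInline = sizeof(T) * N <= MaxInlineBytes;
    using Storage = typename std::conditional<
        kInline, std::array<T, N>, std::vector<T>>::type;

    // Only the overload that matches kInline is ever instantiated.
    static Storage make(std::true_type)  { return Storage(); }   // std::array
    static Storage make(std::false_type) { return Storage(N); }  // std::vector

    Storage data_;

public:
    Buffer() : data_(make(std::integral_constant<bool, kInline>())) {}
    static constexpr bool is_inline() { return kInline; }
    static constexpr std::size_t size() { return N; }
    T& operator[](std::size_t i) { assert(i < N); return data_[i]; }
};
```

Callers write the same code for Buffer<int, 8> and Buffer<int, 1000>; only measuring will tell where the threshold should sit for a given program.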

std::string with no free store memory allocation

I have a question very similar to
How do I allocate a std::string on the stack using glibc's string implementation?
but I think it's worth asking again.
I want an std::string with local storage that overflows into the free store. std::basic_string provides an allocator as a template parameter, so it seems like the thing to do is to write an allocator with local storage and use it to parameterize the basic_string, like so:
std::basic_string<
char,
std::char_traits<char>,
inline_allocator<char, 10>
>
x("test");
I tried to write the inline_allocator class that would work the way you'd expect: it reserves 10 bytes for storage, and if the basic_string needs more than 10 bytes, then it calls ::operator new(). I couldn't get it to work. In the course of executing the above line of code, my GCC 4.5 standard string library calls the copy constructor for inline_allocator 4 times. It's not clear to me that there's a sensible way to write the copy constructor for inline_allocator.
In the other StackOverflow thread, Eric Melski provided this link to a class in Chromium:
http://src.chromium.org/svn/trunk/src/base/stack_container.h
which is interesting, but it's not a drop-in replacement for std::string, because it wraps the std::basic_string in a container so that you have to call an overloaded operator->() to get at the std::basic_string.
I can't find any other solutions to this problem. Could it be that there is no good solution? And if that's true, then are the std::basic_string and std::allocator concepts badly flawed? I mean, it seems like this should be a very basic and simple use case for std::basic_string and std::allocator. I suppose the std::allocator concept is designed primarily for pools, but I think it ought to cover this as well.
It seems like the rvalue-reference move semantics in C++0x might make it possible to write inline_allocator, if the string library is re-written so that basic_string uses the move constructor of its allocator instead of the copy constructor. Does anyone know what the prospect is for that outcome?
My application needs to construct a million tiny ASCII strings per second, so I ended up writing my own fixed-length string class based on Boost.Array, which works fine, but this is still bothering me.

Andrei Alexandrescu, C++ programmer extraordinaire who wrote "Modern C++ Design" once wrote a great article about building different string implementations with customizable storage systems. His article (linked here) describes how you can do what you've described above as a special case of a much more general system that can handle all sorts of clever memory allocation requirements. This doesn't talk so much about std::string and focuses more on a completely customized string class, but you might want to look into it as there are some real gems in the implementation.

C++2011 is really going to help you here :)
The fact is that the allocator concept in C++03 was crippled. One of the requirements was that an allocator of type A should be able to deallocate memory obtained from any other allocator of type A... Unfortunately, this requirement is at odds with stateful allocators that are each hooked to their own pool.
Howard Hinnant (who manages the STL subgroup of the C++ committee and is implementing a new STL from scratch for C++0x) has explored stack-based allocators on his website, which you could take inspiration from.

This is generally unnecessary. It's called the "short string optimization", and most implementations of std::string already include it. It may be hard to find, but it's usually there anyway.
Just for example, here's the relevant piece of sso_string_base.h that's part of MinGW:
enum { _S_local_capacity = 15 };
union
{
_CharT _M_local_data[_S_local_capacity + 1];
size_type _M_allocated_capacity;
};
The _M_local_data member is the relevant one -- space for it to store (up to) 15 characters (plus a NUL terminator) without allocating any space on the heap.
If memory serves, the Dinkumware library included with VC++ allocates space for 20 characters, though it's been a while since I looked, so I can't swear to that (and tracking down much of anything in their headers tends to be a pain, so I prefer to avoid looking if I can).
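To see what a particular library does, one rough check is whether the character data lives inside the std::string object itself. The function name stored_inline is made up, and the result for short strings depends entirely on the standard library in use.

```cpp
#include <cassert>
#include <functional>
#include <string>

// True if the string's characters are stored inside the string object,
// i.e. the short string optimization is active for this value.
bool stored_inline(const std::string& s) {
    const char* obj_begin = reinterpret_cast<const char*>(&s);
    const char* obj_end = obj_begin + sizeof(std::string);
    // std::less gives a well-defined ordering even for unrelated pointers.
    std::less<const char*> before;
    return !before(s.data(), obj_begin) && before(s.data(), obj_end);
}
```

Calling it with strings of increasing length shows where a given implementation switches from local storage to the heap. A string longer than sizeof(std::string) can never be stored inline.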
In any case, I'd give good odds that you've been engaged in that all-too-popular pastime known as premature optimization.

I believe the code from Chromium just wraps things into a nice shell. But you can get the same effect without using the Chromium wrapper container.
Because the allocator object gets copied so often, it needs to hold a reference or pointer to the memory. So what you'd need to do is create the storage buffer, create the allocator object, then call the std::string constructor with the allocator.
It will be a lot wordier than using the wrapper class but should get the same effect.
You can see an example of the verbose method (still using the chromium stuff) in my question about stack vectors.
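The verbose method described above (create the buffer, create an allocator that points at it, then pass the allocator to the string constructor) might look like the sketch below. It assumes C++11; Arena, ArenaAllocator and ArenaString are invented names and this is not the Chromium code. The allocator holds only a pointer, so copying it is harmless, and it falls back to ::operator new when the arena is full.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <new>
#include <string>

// Hypothetical bump-pointer arena; declare it before any string using it.
template <std::size_t N>
struct Arena {
    alignas(std::max_align_t) char buf[N];
    std::size_t used = 0;
    bool owns(const void* p) const {
        std::less<const char*> before;
        const char* c = static_cast<const char*>(p);
        return !before(c, buf) && before(c, buf + N);
    }
};

template <class T, std::size_t N>
class ArenaAllocator {
public:
    using value_type = T;
    template <class U> struct rebind { using other = ArenaAllocator<U, N>; };

    explicit ArenaAllocator(Arena<N>& a) noexcept : arena_(&a) {}
    template <class U>
    ArenaAllocator(const ArenaAllocator<U, N>& o) noexcept : arena_(o.arena()) {}

    T* allocate(std::size_t n) {
        const std::size_t bytes = n * sizeof(T);
        const std::size_t start =
            (arena_->used + alignof(T) - 1) / alignof(T) * alignof(T);
        if (start <= N && bytes <= N - start) {
            arena_->used = start + bytes;
            return reinterpret_cast<T*>(arena_->buf + start);
        }
        return static_cast<T*>(::operator new(bytes));  // overflow to the heap
    }
    void deallocate(T* p, std::size_t) noexcept {
        assert(p != nullptr);
        // Arena memory is reclaimed all at once when the Arena is destroyed.
        if (!arena_->owns(p)) ::operator delete(p);
    }
    Arena<N>* arena() const noexcept { return arena_; }

private:
    Arena<N>* arena_;
};

template <class T, class U, std::size_t N>
bool operator==(const ArenaAllocator<T, N>& a, const ArenaAllocator<U, N>& b) noexcept {
    return a.arena() == b.arena();
}
template <class T, class U, std::size_t N>
bool operator!=(const ArenaAllocator<T, N>& a, const ArenaAllocator<U, N>& b) noexcept {
    return !(a == b);
}

template <std::size_t N>
using ArenaString =
    std::basic_string<char, std::char_traits<char>, ArenaAllocator<char, N>>;
```

Usage follows the three steps from the answer: Arena<128> arena; then ArenaAllocator<char, 128> alloc(arena); then ArenaString<128> s("some text", alloc);. Space released by the string is not reused until the arena itself goes away, which is the usual trade-off of a bump allocator.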

Creating a scoped custom memory pool/allocator?

Would it be possible in C++ to create a custom allocator that works simply like this:
{
// Limit memory to 1024 KB
ScopedMemoryPool memoryPool(1024 * 1024);
// From here on all heap allocations ('new', 'malloc', ...) take memory from the pool.
// If the pool is depleted these calls result in an exception being thrown.
// Examples:
std::vector<int> integers(10);
int* a = new int[10];
}
I couldn't find something like this in the boost libraries, or anywhere else.
Is there a fundamental problem that makes this impossible?

You would need to create a custom allocator that you pass in as a template param to vector. This custom allocator would essentially wrap the access to your pool and do whatever size validations that it wants.

Yes you can make such a construct, it's used in many games, but you'll basically need to implement your own containers and call memory allocation methods of that pool that you've created.
You could also experiment with writing a custom allocator for the STL containers, although it seems that that sort of work is generally advised against. (I've done it before and it was tedious, but I don't remember any specific problems.)
Mind you, writing your own memory allocator is not for the faint of heart. You could take a look at Doug Lea's malloc, which provides "memory spaces" that you could use in your scoping construct somehow.

I will answer a different question. Look at the 'efficient c++' book. One of the things they discuss is implementing this kind of thing, in their case for a web server.
For this particular thing you can work at the C++ layer, by overriding new and supplying custom allocators to the STL.
Or you can work at the malloc level: start with a custom malloc and go from there (like dmalloc).

Is there a fundamental problem that makes this impossible?
Reasoning about program behavior would become fundamentally impossible. All sorts of weird issues will come up. Certain sections of the code may or may not execute, though this will seemingly have no effect on the following sections, which may work unhindered. Certain sections may always fail. Dealing with the standard library or any other third-party library will become extremely difficult. There may be fragmentation at run time on some runs and not on others.

If the intent is that all allocations within that scope go through that allocator object, then it's essentially a thread-local variable.
So, there will be multithreading issues if you use a static or global variable to implement it. Otherwise, not a bad workaround for the statelessness of allocators.
(Of course, you'll need to pass a second template argument eg vector< int, UseScopedPool >.)
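A rough sketch of that thread-local approach, assuming C++11. ScopedMemoryPool and UseScopedPool are the names used in this thread, but the implementation below is invented for illustration: it only serves containers that opt in through the allocator argument (plain new and malloc are not redirected), and it refuses to allocate when no pool is active.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// RAII pool: becomes the current pool of this thread for its lifetime.
class ScopedMemoryPool {
public:
    explicit ScopedMemoryPool(std::size_t bytes)
        : storage_(new char[bytes]), size_(bytes), used_(0), previous_(current_) {
        current_ = this;
    }
    ~ScopedMemoryPool() {
        assert(current_ == this);  // pools must end in reverse order of creation
        current_ = previous_;
    }
    ScopedMemoryPool(const ScopedMemoryPool&) = delete;
    ScopedMemoryPool& operator=(const ScopedMemoryPool&) = delete;

    // Bump-pointer allocation; throws std::bad_alloc when the pool is depleted.
    void* allocate(std::size_t bytes, std::size_t align) {
        const std::size_t start = (used_ + align - 1) / align * align;
        if (start > size_ || bytes > size_ - start) throw std::bad_alloc();
        used_ = start + bytes;
        return storage_.get() + start;
    }
    std::size_t used() const { return used_; }
    static ScopedMemoryPool* current() { return current_; }

private:
    std::unique_ptr<char[]> storage_;
    std::size_t size_;
    std::size_t used_;
    ScopedMemoryPool* previous_;
    static thread_local ScopedMemoryPool* current_;
};
thread_local ScopedMemoryPool* ScopedMemoryPool::current_ = nullptr;

// Stateless allocator that forwards to whichever pool is currently in scope.
template <class T>
struct UseScopedPool {
    using value_type = T;
    UseScopedPool() noexcept {}
    template <class U> UseScopedPool(const UseScopedPool<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ScopedMemoryPool* pool = ScopedMemoryPool::current();
        if (pool == nullptr || n > static_cast<std::size_t>(-1) / sizeof(T))
            throw std::bad_alloc();
        return static_cast<T*>(pool->allocate(n * sizeof(T), alignof(T)));
    }
    // Nothing to do: the pool releases everything when it leaves scope.
    void deallocate(T*, std::size_t) noexcept {}
};
template <class T, class U>
bool operator==(const UseScopedPool<T>&, const UseScopedPool<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const UseScopedPool<T>&, const UseScopedPool<U>&) noexcept { return false; }
```

Containers have to be declared after the pool, for example std::vector<int, UseScopedPool<int>> integers(10);, so that they are destroyed before the pool that backs them. Because the current pool is thread_local, each OpenMP or std::thread worker would need its own pool.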
