Stack allocator for C++03 standard containers - c++

For one piece of software I have to avoid any use of heap memory and rely only on stack-allocated memory. This prevents me from using the C++ standard containers, such as vector, map, and string (well, basic_string), which I would really like to use to ease development and data manipulation.
I found (many) implementations of stack allocators, such as this one which itself references two others, or this one from chromium.
Many of them are not fully compliant with the standard, or rely on C++11 (and I am stuck with C++03 at the moment, sadly). Do you have any feedback about a good existing stack allocator for C++03, or should I adapt one of the above?
Thanks!

Howard Hinnant's short_alloc.h (see also here) is a pretty good start (you will need to add C++03 boilerplate, see here).
Of course, this will still fall back to the heap if the stack buffer runs out of memory; the alternative is to throw std::bad_alloc instead.
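To give an idea of what the C++03 boilerplate involves, here is a minimal sketch of an arena-backed allocator that throws std::bad_alloc when the buffer is exhausted instead of falling back to the heap. It is not Howard Hinnant's code: the names arena, stack_allocator and max_align_03 are hypothetical, memory is only reclaimed in LIFO order, and alignment is handled by rounding every request up to the size of a union of the largest built-in types. Note also that C++03 allowed library implementations to assume that all allocator instances of a type compare equal, so a stateful allocator like this one relies on the implementation tolerating it.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Union of the largest built-in types; its size is a multiple of its alignment.
union max_align_03 { long double ld; double d; long l; void* p; };

// Fixed-size buffer that hands out memory by bumping a pointer.
template <std::size_t N>
class arena {
    union { char buf_[N]; max_align_03 align_; };  // forces suitable alignment
    char* ptr_;
    arena(const arena&);             // non-copyable, C++03 style
    arena& operator=(const arena&);
    static std::size_t round_up(std::size_t n) {
        const std::size_t u = sizeof(max_align_03);
        return (n + u - 1) / u * u;
    }
public:
    arena() : ptr_(buf_) {}
    char* allocate(std::size_t n) {
        n = round_up(n);
        if (static_cast<std::size_t>(buf_ + N - ptr_) < n)
            throw std::bad_alloc();  // never touches the heap
        char* r = ptr_;
        ptr_ += n;
        return r;
    }
    void deallocate(char* p, std::size_t n) {
        if (p + round_up(n) == ptr_) ptr_ = p;  // reclaim only the newest block
    }
    std::size_t used() const { return static_cast<std::size_t>(ptr_ - buf_); }
};

// C++03-style allocator: all the typedefs and members the old standard asks for.
template <class T, std::size_t N>
class stack_allocator {
    arena<N>* a_;
public:
    typedef T value_type;
    typedef T* pointer;
    typedef const T* const_pointer;
    typedef T& reference;
    typedef const T& const_reference;
    typedef std::size_t size_type;
    typedef std::ptrdiff_t difference_type;
    template <class U> struct rebind { typedef stack_allocator<U, N> other; };

    explicit stack_allocator(arena<N>& a) : a_(&a) {}
    template <class U>
    stack_allocator(const stack_allocator<U, N>& other) : a_(other.get_arena()) {}
    arena<N>* get_arena() const { return a_; }

    pointer address(reference x) const { return &x; }
    const_pointer address(const_reference x) const { return &x; }
    size_type max_size() const { return N / sizeof(T); }
    pointer allocate(size_type n, const void* = 0) {
        if (n > max_size()) throw std::bad_alloc();
        return static_cast<pointer>(static_cast<void*>(a_->allocate(n * sizeof(T))));
    }
    void deallocate(pointer p, size_type n) {
        a_->deallocate(static_cast<char*>(static_cast<void*>(p)), n * sizeof(T));
    }
    void construct(pointer p, const T& v) { new (static_cast<void*>(p)) T(v); }
    void destroy(pointer p) { p->~T(); }
};

// Two allocators are interchangeable only if they share the same arena.
template <class T, class U, std::size_t N>
bool operator==(const stack_allocator<T, N>& a, const stack_allocator<U, N>& b) {
    return a.get_arena() == b.get_arena();
}
template <class T, class U, std::size_t N>
bool operator!=(const stack_allocator<T, N>& a, const stack_allocator<U, N>& b) {
    return !(a == b);
}
```

A container would then be declared as std::vector<int, stack_allocator<int, 256> > v(alloc), with the arena living on the stack and declared before the container so that it outlives it. Calling reserve up front avoids the wasted space that vector growth leaves behind in a LIFO arena.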

Related

Why is there no non-synchronized smart pointer in C++? [duplicate]

This is a bit of a two part question, all about the atomicity of std::shared_ptr:
1. As far as I can tell, std::shared_ptr is the only smart pointer in <memory> that's atomic. I'm wondering if there is a non-atomic version of std::shared_ptr available (I can't see anything in <memory>, so I'm also open to suggestions outside of the standard, like those in Boost). I know boost::shared_ptr is also atomic (if BOOST_SP_DISABLE_THREADS isn't defined), but maybe there's another alternative? I'm looking for something that has the same semantics as std::shared_ptr, but without the atomicity.
2. I understand why std::shared_ptr is atomic; it's kinda nice. However, it's not nice for every situation, and C++ has historically had the mantra of "only pay for what you use." If I'm not using multiple threads, or if I am using multiple threads but am not sharing pointer ownership across threads, an atomic smart pointer is overkill. My second question is why wasn't a non-atomic version of std::shared_ptr provided in C++11? (assuming there is a why) (if the answer is simply "a non-atomic version was simply never considered" or "no one ever asked for a non-atomic version" that's fine!).
With question #2, I'm wondering if someone ever proposed a non-atomic version of shared_ptr (either to Boost or the standards committee) (not to replace the atomic version of shared_ptr, but to coexist with it) and it was shot down for a specific reason.
1. I'm wondering if there is a non-atomic version of std::shared_ptr available
Not provided by the standard. There may well be one provided by a "3rd party" library. Indeed, prior to C++11, and prior to Boost, it seemed like everyone wrote their own reference counted smart pointer (including myself).
2. My second question is why wasn't a non-atomic version of std::shared_ptr provided in C++11?
This question was discussed at the Rapperswil meeting in 2010. The subject was introduced by a National Body Comment #20 by Switzerland. There were strong arguments on both sides of the debate, including those you provide in your question. However, at the end of the discussion, the vote was overwhelmingly (but not unanimously) against adding an unsynchronized (non-atomic) version of shared_ptr.
Arguments against included:
Code written with the unsynchronized shared_ptr may end up being used in threaded code down the road, causing difficult-to-debug problems with no warning.
Having one "universal" shared_ptr that is the "one way" to traffic in reference counting has benefits. From the original proposal:
"Has the same object type regardless of features used, greatly facilitating interoperability between libraries, including third-party libraries."
The cost of the atomics, while not zero, is not overwhelming. The cost is mitigated by the use of move construction and move assignment which do not need to use atomic operations. Such operations are commonly used in vector<shared_ptr<T>> erase and insert.
Nothing prohibits people from writing their own non-atomic reference-counted smart pointer if that's really what they want to do.
The final word from the LWG in Rapperswil that day was:
Reject CH 20. No consensus to make a change at this time.
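The point about move construction in the list above can be seen in a small sketch: a copy has to bump the shared reference count, while a move only transfers the existing reference and leaves the count alone. The function name is hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Returns the use count after one copy and one move of the same pointer.
long count_after_copy_and_move() {
    std::shared_ptr<int> a = std::make_shared<int>(42);
    std::shared_ptr<int> b = a;             // copy: increments the shared count
    std::shared_ptr<int> c = std::move(a);  // move: takes over a's reference, no increment
    assert(!a);                             // the moved-from pointer is left empty
    return c.use_count();                   // only b and c hold references now
}
```

This is why operations such as vector<shared_ptr<T>> insert and erase, which can shuffle elements by moving them, need not pay for reference count updates on every element.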
Howard's answered the question well already, and Nicol made some good points about the benefits of having a single standard shared pointer type, rather than lots of incompatible ones.
While I completely agree with the committee's decision, I do think there is some benefit to using an unsynchronized shared_ptr-like type in special cases, so I've investigated the topic a few times.
If I'm not using multiple threads, or if I am using multiple threads but am not sharing pointer ownership across threads, an atomic smart pointer is overkill.
With GCC when your program doesn't use multiple threads shared_ptr doesn't use atomic ops for the refcount. This is done by updating the reference counts via wrapper functions that detect whether the program is multithreaded (on GNU/Linux this is done by checking a special variable in Glibc that says if the program is single-threaded[1]) and dispatch to atomic or non-atomic operations accordingly.
I realised many years ago that because GCC's shared_ptr<T> is implemented in terms of a __shared_ptr<T, _LockPolicy> base class, it's possible to use the base class with the single-threaded locking policy even in multithreaded code, by explicitly using __shared_ptr<T, __gnu_cxx::_S_single>. You can use an alias template like this to define a shared pointer type that is not thread-safe, but is slightly faster[2]:
template<typename T>
using shared_ptr_unsynchronized = std::__shared_ptr<T, __gnu_cxx::_S_single>;
This type would not be interoperable with std::shared_ptr<T> and would only be safe to use when it is guaranteed that the shared_ptr_unsynchronized objects would never be shared between threads without additional user-provided synchronization.
This is of course completely non-portable, but sometimes that's OK. With the right preprocessor hacks your code would still work fine with other implementations if shared_ptr_unsynchronized<T> is an alias for shared_ptr<T>, it would just be a little faster with GCC.
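One way such a preprocessor guard might look, as a sketch only: it assumes that detecting libstdc++ through its __GLIBCXX__ macro is good enough, and it depends on the internal std::__shared_ptr and __gnu_cxx::_S_single names described above, which are not a documented interface and may change. The helper function is hypothetical.

```cpp
#include <cassert>
#include <memory>  // also defines __GLIBCXX__ when libstdc++ is in use

#if defined(__GLIBCXX__)
// libstdc++ only: select the single-threaded (non-atomic) lock policy.
template<typename T>
using shared_ptr_unsynchronized = std::__shared_ptr<T, __gnu_cxx::_S_single>;
#else
// Other implementations: fall back to the ordinary shared_ptr.
template<typename T>
using shared_ptr_unsynchronized = std::shared_ptr<T>;
#endif

// Hypothetical helper showing that the alias behaves like a shared pointer.
inline long unsynchronized_copy_count() {
    shared_ptr_unsynchronized<int> p(new int(5));
    shared_ptr_unsynchronized<int> q = p;  // plain increment under _S_single
    assert(*q == 5);
    return p.use_count();
}
```

Code written against the alias compiles on both branches, but on the libstdc++ branch the objects must never be shared between threads without external synchronization.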
[1] Before Glibc 2.33 added that variable, the wrapper functions would detect whether the program links to libpthread.so as an imperfect method of checking for single-threaded vs multi-threaded.
[2] Unfortunately because that wasn't an intended use case it didn't quite work optimally before GCC 4.9, and some operations still used the wrapper functions and so dispatched to atomic operations even though you've explicitly requested the `_S_single` policy. See point (2) at http://gcc.gnu.org/ml/libstdc++/2007-10/msg00180.html for more details and a patch to GCC to allow the non-atomic implementation to be used even in multithreaded apps. I sat on that patch for years but I finally committed it for GCC 4.9.
My second question is why wasn't a non-atomic version of std::shared_ptr provided in C++11? (assuming there is a why).
One could just as easily ask why there isn't an intrusive pointer, or any number of other possible variations of shared pointers one could have.
The design of shared_ptr, handed down from Boost, has been to create a minimum standard lingua franca of smart pointers: something that, generally speaking, you can just pull down off the wall and use. It's something that would be used generally, across a wide variety of applications. You can put it in an interface, and odds are good people will be willing to use it.
Threading is only going to get more prevalent in the future. Indeed, as time passes, threading will generally be one of the primary means to achieve performance. Requiring the basic smart pointer to do the bare minimum needed to support threading facilitates this reality.
Dumping a half-dozen smart pointers with minor variations between them into the standard, or even worse a policy-based smart pointer, would have been terrible. Everyone would pick the pointer they like best and forswear all others. Nobody would be able to communicate with anyone else. It'd be like the current situations with C++ strings, where everyone has their own type. Only far worse, because interoperation with strings is a lot easier than interoperation between smart pointer classes.
Boost, and by extension the committee, picked a specific smart pointer to use. It provided a good balance of features and was widely and commonly used in practice.
std::vector has some inefficiencies compared to naked arrays in some corner cases too. It has some limitations; some uses really want to have a hard limit on the size of a vector, without using a throwing allocator. However, the committee didn't design vector to be everything for everyone. It was designed to be a good default for most applications. Those for whom it can't work can just write an alternative that suits their needs.
Just as you can for a smart pointer if shared_ptr's atomicity is a burden. Then again, one might also consider not copying them around so much.
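For illustration, a hand-rolled non-atomic reference-counted pointer can be quite small. This is a minimal sketch under obvious assumptions: the name unsync_ptr is made up, there is no custom deleter, no weak pointer support and no conversion between pointer types, and it must never be shared across threads.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

template <class T>
class unsync_ptr {
    T* p_;
    long* n_;  // plain counter: no atomic operations anywhere
public:
    explicit unsync_ptr(T* p = 0) : p_(p), n_(0) {
        if (p) {
            try { n_ = new long(1); }
            catch (...) { delete p; throw; }  // do not leak p if the counter cannot be allocated
        }
    }
    unsync_ptr(const unsync_ptr& o) : p_(o.p_), n_(o.n_) { if (n_) ++*n_; }
    unsync_ptr& operator=(unsync_ptr o) { swap(o); return *this; }  // copy-and-swap
    ~unsync_ptr() {
        if (n_ && --*n_ == 0) { delete p_; delete n_; }
    }
    void swap(unsync_ptr& o) { std::swap(p_, o.p_); std::swap(n_, o.n_); }
    T& operator*() const { assert(p_); return *p_; }
    T* operator->() const { return p_; }
    long use_count() const { return n_ ? *n_ : 0; }
};
```

The cost is exactly the interoperability problem described above: nothing that takes a std::shared_ptr will accept this type.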
Boost provides a shared_ptr that's non-atomic. It's called local_shared_ptr, and can be found in the smart pointers library of boost.
I am preparing a talk on shared_ptr at work. I have been using a modified boost shared_ptr that avoids the separate malloc (like what make_shared can do) and has a template parameter for the lock policy, like the shared_ptr_unsynchronized mentioned above. I am using the program from
http://flyingfrogblog.blogspot.hk/2011/01/boosts-sharedptr-up-to-10-slower-than.html
as a test, after cleaning up the unnecessary shared_ptr copies. The program uses the main thread only and the test argument is shown. The test env is a notebook running linuxmint 14. Here is the time taken in seconds:
test run setup    boost(1.49)    std with make_shared         modified boost
mt-unsafe(11)     11.9           9/11.5 (-pthread on)         8.4
atomic(11)        13.6           12.4                         13.0
mt-unsafe(12)     113.5          85.8/108.9 (-pthread on)     81.5
atomic(12)        126.0          109.1                        123.6
Only the 'std' version uses -std=c++11, and -pthread likely switches the lock policy in g++'s __shared_ptr class.
From these numbers, I see the impact of atomic instructions on code optimization. The test case does not use any C++ containers, but vector<shared_ptr<some_small_POD>> is likely to suffer if the object doesn't need the thread protection. Boost suffers less, probably because the additional malloc is limiting the amount of inlining and code optimization.
I have yet to find a machine with enough cores to stress test the scalability of atomic instructions, but using std::shared_ptr only when necessary is probably better.

Is C++'s new operator reentrant (or async-safe)?

The background is in this question of mine. Put shortly, I have to fork in a multithreaded C++ program, so I'd like to figure out how much I can do when restricted to reentrant functions only, and one of the most essential things is dynamic memory.
So, malloc is known to be non-reentrant. But what about C++'s new? I googled for that with not many relevant results (mostly due to the difficulty to hit the correct "new"), but there is at least one claim that new is reentrant. There is also a relevant question concerning the whole C++ standard library with no satisfying answer.
Edit: I guess the standard didn't say anything about this, so I'm mostly concerned about major implementations.
I've looked at both the gcc libsupc++ and clang libc++ source, for replacing the standard-conforming C++ new/delete operators - to support native SIMD alignment requirements on platforms where it wasn't guaranteed by malloc.
They are basically wrappers for malloc and free with some EH logic, etc. I am not a language lawyer, but unless both have it wrong, I think it's safe to conclude: no, they are not reentrant.
The standard allows new to be just a wrapper around malloc, so if malloc is allowed to be non-reentrant, so is new.
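To make the "wrapper around malloc" point concrete, here is a sketch of the usual shape of such an implementation. It is not copied from libsupc++ or libc++, and the names sketch_new and sketch_delete are hypothetical stand-ins for operator new and operator delete, chosen so the example does not replace the real global operators.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Same overall logic as a typical operator new(std::size_t).
void* sketch_new(std::size_t size) {
    if (size == 0) size = 1;  // new must return a unique non-null pointer
    for (;;) {
        if (void* p = std::malloc(size)) return p;  // all real work happens in malloc
        std::new_handler handler = std::get_new_handler();
        if (!handler) throw std::bad_alloc();       // no handler installed: give up
        handler();                                  // handler may free memory, then retry
    }
}

// Same overall logic as a typical operator delete(void*).
void sketch_delete(void* p) {
    std::free(p);
}
```

Since every path goes through malloc and free, whatever reentrancy or async-signal-safety limits apply to them apply to new and delete as well.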
Thread-safety and re-entrance are not exactly the same.
AFAIK, the C++ ISO standard does not guarantee thread-safety for the new and delete operators. But the g++ implementation does provide thread-safety (and it's one of the reasons it's slow).

What are the advantages of using an STL clone?

When would I need an alternative to C++'s STL?
Are there any advantages to using an alternative STL?
Which ones would you recommend, if any?
Sorry for these noob bullet points, but I see a lot of products that ship with different STLs linked in and was wondering when something like that is useful.
I'm assuming you're talking about alternative implementations of STL, rather than alternatives to the STL.
There's a few reasons you might use a 3rd party STL implementation, rather than the default one provided by your compiler.
Consistency - you might be using multiple compilers and want to ensure you get the same behavior on each platform.
Speed - An implementation might be more efficient than the one provided by your compiler.
Completeness - Your compiler's default library might not provide the full complement of STL features. (This may only be for old compilers, or compilers for embedded systems, or for C++11 features).
Extra features - Some implementations of STL provide features like improved debugging of invalid iterators etc, which may not be in your compiler's implementation.
Obviously not all these hold for all compilers .. but there are certainly cases where 3rd party STLs can be helpful.
As for implementations: you can find a list here
Michael's provided a good answer - just a couple points to add:
"Speed" isn't just a linear thing where you can say decisively that STL implementation X is N% faster than STL Y: there are implementation choices trading off speed and memory usage in various usage scenarios. For example, a "short string optimisation" may allow very short strings to be stored directly in the string object rather than in heap memory; implementations may have slightly different choices about how generously to resize containers exceeding their current capacity.
Binary interoperability is a big deal: if you need to call a library function that's pre-compiled to accept STL X objects, you can't simply link the library and feed it the STL Y equivalents: there could be differences in the mangled names preventing linking, the binary layout of the objects may well be different, and even if not and you forced such a call, the operations your client code performs on those objects may not be everything the library code expects or needs (i.e. wouldn't maintain the same invariants).
Thread safety is a noteworthy example of "extra features"... e.g. many early STLs had errors with Copy-on-Write string implementations.
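The short string optimisation mentioned in the first point can be probed with a small sketch that checks whether a string's character buffer lives inside the string object itself. The function name is hypothetical, and the result for short strings depends entirely on the library in use, which is exactly the kind of implementation difference being described.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// True if the characters of s are stored inside the std::string object itself.
bool stored_inline(const std::string& s) {
    const char* object_begin = reinterpret_cast<const char*>(&s);
    const char* object_end = object_begin + sizeof(std::string);
    const char* data = s.data();
    std::less<const char*> before;  // well-defined ordering even for unrelated pointers
    return !before(data, object_begin) && before(data, object_end);
}
```

A string far longer than sizeof(std::string) can never be stored inline, whereas a string such as "hi" may or may not be, depending on the implementation.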
Another point: some STL implementations allow you to disable the use of exceptions, possibly using a custom global error handler instead of C++ exceptions. This is less important nowadays, but for a long time, a lot of systems had exceptions disabled for various reasons, and there are still a few outlier systems on which exceptions are discouraged or completely unsupported.

Reason for not using the STL? [duplicate]

Possible Duplicate:
To STL or !STL, that is the question
Are there cases where one should avoid the use of the C++ STL in his / her project?
When you choose to use a framework like Qt you might consider using lists, vectors, etc. from Qt rather than the STL. Not using the STL in this case saves you from having to convert from STL to the Qt equivalents when you need to use them in your GUI.
This is debatable and not everyone wants to use everything from Qt
from http://doc.qt.nokia.com/latest/containers.html
These container classes are designed to be lighter, safer, and easier to use than the STL containers. If you are unfamiliar with the STL, or prefer to do things the "Qt way", you can use these classes instead of the STL classes.
If you cannot use RTTI and/or exceptions, you might find that parts of the STL won't work. This is the case e.g. for native Android apps. So if it doesn't give you what you need, it's a reason not to use it!
Not really. There's no excuse to ban the use of an entire library- unless that lib only serves one function, which is not the case with the Standard library. The provided facilities should be evaluated on a per-function basis- for example, you may well argue that you need a container that performs a more specific purpose than vector, but that is no excuse to ban the use of deque, iostream or for_each too.
More importantly, code generated via template will not be more bloated than the equivalent code written by hand. You won't save code bloat by refusing to use std::vector and then writing your equivalent vector for float and double. Especially in 2011, the size of an executable is pretty meaningless compared to the sizes of other things like media in the vast, vast majority of situations.
If you care a lot about executable size, then you might want to avoid using STL in your program.
For example, uTorrent doesn't use STL and that is one reason why it's so small.
Since the STL relies heavily on templates (it is the Standard TEMPLATE Library, after all), the compiler has to generate extra code for every type you use with it.
This is compile time polymorphism and will increase your executable size the more you use it.
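A small sketch of what "extra code for every type" means: each distinct template argument produces a separate instantiation, which can be observed because every instantiation gets its own function-local static. The function name is hypothetical.

```cpp
#include <cassert>

// Each instantiation of this template is a distinct function with its own static.
template <class T>
int& instance_counter() {
    static int count = 0;
    return count;
}

// Calls two instantiations and reports whether they share state.
inline bool instantiations_are_separate() {
    ++instance_counter<float>();
    ++instance_counter<double>();
    // Distinct addresses: the compiler emitted one copy per type.
    return &instance_counter<float>() != &instance_counter<double>();
}
```

The same holds for std::vector<float> and std::vector<double>: every member function actually used is emitted once per element type.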
If you exclude STL from your project (and use templates sparingly or not at all), your code size will get smaller. Note that it won't necessarily be faster.
Also note that I'm not talking about a program's memory usage during execution, since that will depend on how many objects you're allocating during your app's lifetime.
I'm talking about your binary's executable.
If you want an example, note that a simple Hello world program, when compiled, might be bigger than a cleverly coded demo which can include an entire 3D engine (run time generated) in a very small executable.
Some info regarding uTorrent's size:
Official FAQ (from 2008), this question doesn't appear in recent FAQ.
How was uTorrent programmed to be so efficient?
Second post regarding this.
Third post regarding this.
Note that, even though uTorrent is >300kb and is compressed with UPX, it is still really small when you take into account what it's capable of doing.
I would say that there may be occasions where you do not use a particular feature of STL in your project for a given circumstance because you can custom write it better for your needs. STL collections are generic by nature.
You might want in your code:
Lock-free containers that are thread-safe. (STL ones are not).
A string class that is immutable by nature and copies the actual data "by reference" (with some mechanism).
An efficient string-building class that is not ostringstream (not part of STL anyway but you may mean all the standard library)
algorithms that use Map and Reduce (not to be confused with std::map. Map and Reduce is a way to iterate over a collection using multiple threads or processes, possibly even distributed on different machines).
Hey, look, so much of boost was written because what the Standard Library provided at the time did not really address the needs of the programmer and thus provided alternatives.
I am not sure if this is what you meant or if you particularly meant that the STL should be "banned" at all times (e.g. device driver programming, where templates are considered bloaty even though that is not always the case).
If you are working to particular standards that forbid it.
For example, the MISRA C/C++ guidelines are aimed at automotive and embedded systems and forbid using dynamic memory allocation, so you might choose to avoid STL containers altogether.
Note: The MISRA guideline is just an example of a standard that might influence your choice to use STL. That particular guideline doesn't rule out using all of the STL. But (I believe) it rules out using STL containers as they rely on runtime allocation of memory.
It can increase executable size. If you're running on an embedded platform you may wish to exclude the STL.
When you use something like the Qt library that implements identical functionality you may not need the STL. May depend on other needs, like performance.
The only reason is if you are working on embedded systems with low memory, or if your project coding guidelines explicitly forbid STL.
I can't think of any other reasonable reason to manually roll your own incompatible, bug-ridden implementation of some of the features of the STL.
TR18015 deals with some of the limitations of the STL. It looks at it from a different angle - what compilers could do better - but still is an interesting (if in-depth) read.
I'd be careful in general with microprocessors and small embedded systems. First, compiler optimizations are not up to what you know from desktops, and you run into hardware limits much sooner.
Having said that, it depends a lot on the libraries you use. I/O streams are notoriously slow (and require a careful implementation to not be), whereas std::vector is merely a thin wrapper.

Using boost in embedded system with memory limitation

We are using c++ to develop an application that runs in Windows CE 4 on an embedded system.
One of our constraints is that all the memory used by the application shall be allocated during startup only. We wrote a lot of containers and algorithms that use only preallocated memory instead of allocating new memory.
Do you think it is possible for us to use the boost libraries instead of our own containers in these conditions?
Any comments and/or advice are welcomed!
Thanks a lot,
Nic
We use boost for embedded systems. With boost you can pick and choose what you use. We use smart_ptr and boost::bind in all of our projects. We write software for cheap cell phones.
And if Windows CE can run on your hardware I would expect that parts of boost would be applicable.
There are parts of boost that have no allocation and you might find them useful.
I would pick and choose based on your requirements.
Like anything that you use, you need to know the costs.
You could write your own allocator for the container, which allocates from a fixed size static buffer. Depending on the usage patterns of the container the allocator could be as simple as incrementing a pointer (e.g. when you only insert stuff into the container once at app startup, and don't continuously add/remove elements.)
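As a sketch of the "incrementing a pointer" idea, the allocator below carves memory out of one static buffer and never gives it back. It uses the minimal C++11 allocator interface for brevity, so an older compiler would need the full set of typedefs and members; the names g_pool, kPoolSize, pool_used and bump_allocator are hypothetical, and types aligned more strictly than std::max_align_t are not supported.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

namespace {
const std::size_t kPoolSize = 4096;                     // assumed budget, fixed at build time
alignas(std::max_align_t) unsigned char g_pool[kPoolSize];
std::size_t g_offset = 0;                               // next free byte in the pool
}

inline std::size_t pool_used() { return g_offset; }

template <class T>
struct bump_allocator {
    typedef T value_type;
    bump_allocator() {}
    template <class U> bump_allocator(const bump_allocator<U>&) {}

    T* allocate(std::size_t n) {
        const std::size_t a = alignof(T);
        const std::size_t start = (g_offset + a - 1) / a * a;  // align the bump offset
        if (start > kPoolSize || n > (kPoolSize - start) / sizeof(T))
            throw std::bad_alloc();                            // pool exhausted
        g_offset = start + n * sizeof(T);
        return reinterpret_cast<T*>(g_pool + start);
    }
    void deallocate(T*, std::size_t) {}  // memory is never returned to the pool
};

// All instances draw from the same pool, so they always compare equal.
template <class T, class U>
bool operator==(const bump_allocator<T>&, const bump_allocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const bump_allocator<T>&, const bump_allocator<U>&) { return false; }
```

Because deallocate does nothing, this only suits the fill-once-at-startup pattern; a container that keeps growing and shrinking would exhaust the pool.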
Replacing your containers with Boost containers is NOT a good idea. The work to make appropriate custom allocators wouldn't be that bad, but you'd be violating the spirit of your 'allocate at startup' rule. The idea behind this rule (in my experience) is generally to make sure that you don't have to deal with out of memory type situations at run-time. The idea is to make sure that you have all the memory you could possibly need RIGHT AT THE START, so that there's no possibility of any part of the system coming up short of memory later on.
If you used the Boost containers with a custom allocator, you'd suddenly have to deal with the possibility that the pool the container is allocating from could go empty, thus eliminating the purpose of the 'allocate at startup' rule.
In the situation of a limited memory device, I would avoid any kind of container more complex than a statically allocated array.
Boost is a set of libraries. Some of them are focussed on template metaprogramming. Those don't even use any memory at runtime. But your question seems to be about replacing your containers. I'd doubt that is possible except using custom allocators. But even then, it's most likely you would be using plain STL containers and not boost. Boost only provides the TR1 containers, for those compilers that do not yet include TR1.
Do not use Boost.
It is a big library and your basic memory allocation requirements are very different from those of the library's designers.
Even if you can get a current version of Boost to work according to your requirements with custom allocators it may break with a new version of Boost.
Feel free to look at the Boost source code though for some useful ideas but use your own implementation for what you need.
I'm looking into this right now — I would like to use circular buffers, lock-free containers, and asynchronous I/O, and instead of allocating dynamic memory, I'd prefer to use memory pools.
The biggest problem I've seen so far is that shared_ptr is used in a lot of places, with no easy way to replace it with intrusive_ptr. Since shared_ptr allocates dynamic memory to keep track of the reference count, I can't use it in an embedded system.
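The difference can be sketched as follows: an intrusive pointer keeps the reference count inside the object, so no separate count block has to be allocated. This is not Boost's intrusive_ptr, which hooks into the object through the free functions intrusive_ptr_add_ref and intrusive_ptr_release; the names intrusive_handle, message and dispose are hypothetical.

```cpp
#include <cassert>

// The pointee carries its own count and says how it is given back.
struct message {
    long refs;     // embedded reference count, not atomic
    int payload;
    bool in_use;
    static void dispose(message* m) { m->in_use = false; }  // e.g. return the slot to a pool
};

template <class T>
class intrusive_handle {
    T* p_;
public:
    explicit intrusive_handle(T* p = 0) : p_(p) { if (p_) ++p_->refs; }
    intrusive_handle(const intrusive_handle& o) : p_(o.p_) { if (p_) ++p_->refs; }
    intrusive_handle& operator=(const intrusive_handle& o) {
        if (o.p_) ++o.p_->refs;  // increment first so self-assignment is safe
        release();
        p_ = o.p_;
        return *this;
    }
    ~intrusive_handle() { release(); }
    T* operator->() const { assert(p_); return p_; }
private:
    void release() {
        if (p_ && --p_->refs == 0) T::dispose(p_);  // no delete, no heap involved
    }
};
```

With this shape the object can live in a preallocated pool or a static array, and dropping the last handle simply marks the slot as free.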
Fixing this looks doable, but a lot of work — I have to expand the template specification of any class that contains a shared_ptr so that the specific type of shared-pointer can be changed to intrusive_ptr if desired. So now I have to consider how much work that'll be, versus how much work it'll be to write my own version of the Boost features I need. Not a pleasant place to be.
I hope someone points out why I'm wrong about this.