I'm writing some code that handles cryptographic secrets, and I've created a custom ZeroedMemory implementation of std::pmr::memory_resource which sanitizes memory on deallocation and encapsulates the magic you have to use to prevent optimizing compilers from eliding the operation. The idea was to avoid specializing std::array, because the lack of a virtual destructor means that destruction after type erasure would free memory without sanitizing it.
Unfortunately, I came to realize afterwards that std::array isn't an AllocatorAwareContainer. My std::pmr::polymorphic_allocator approach was a bit misguided, since obviously there's no room in an std::array to store a pointer to a specific allocator instance. Still, I can't fathom why allocators for which std::allocator_traits<A>::is_always_equal::value == true wouldn't be allowed, and I could easily re-implement my solution as a generic Allocator instead of the easier-to-use std::pmr::memory_resource...
Now, I could normally just use an std::pmr::vector instead, but one of the nice features of std::array is that the length of the array is part of the type. If I'm dealing with a 32-byte key, for example, I don't have to do runtime checks to be sure that the std::array<uint8_t, 32> parameter someone passed to my function is, in fact, the right length. In fact, those cast down nicely to a const std::span<uint8_t, 32>, which vastly simplifies writing functions that need to interoperate with C code because they enable me to handle arbitrary memory blocks from any source basically for free.
Ironically, std::tuple takes allocators... but I shudder to imagine the typedef needed to handle a 32-byte std::tuple<uint8_t, uint8_t, uint8_t, uint8_t, ...>.
So: is there any standard-ish type that holds a fixed number of homogeneously-typed items, a la std::array, but is allocator aware (and preferably stores the items in a contiguous region, so it can be converted down to an std::span)?
You need cooperation from both the compiler and the OS in order for such a scheme to work. P1315 is a proposal to address the compiler/language side of things. As for the OS, you have to make sure that the memory was never paged out to disk, etc. in order for this to truly zero memory.
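For reference, the compiler-side "magic" the question alludes to usually amounts to writing through a volatile pointer, which the optimizer may not elide (a best-effort sketch; memset_s, explicit_bzero, or SecureZeroMemory are the platform facilities to prefer where available):

```cpp
#include <cstddef>
#include <cstdint>

// Best-effort zeroing: volatile writes are observable behavior,
// so a conforming compiler cannot optimize the loop away.
inline void secure_clear(void* data, std::size_t size) {
    volatile std::uint8_t* p = static_cast<volatile std::uint8_t*>(data);
    while (size--) {
        *p++ = 0;
    }
}
```

This addresses only the compiler half; as noted above, the OS can still have paged or copied the memory elsewhere.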
This sounds like an XY problem. You seem to be misusing allocators. Allocators are used to handle runtime memory allocation and deallocation, not to hook stack memory. What you are trying to do — zeroing the memory after use — should really be done with a destructor. You may want to write a class Key for this:
class Key {
public:
    // ...
    ~Key()
    {
        secure_clear(*this); // for illustration
    }
    // ...
private:
    std::array<std::uint8_t, 32> key;
};
You can easily implement iterator and span support. And you don't need to play with allocators.
If you want to reduce boilerplate code and make the new class automatically iterator / span friendly, use inheritance:
class Key : public std::array<std::uint8_t, 32> {
public:
    // ...
    ~Key()
    {
        secure_clear(*this); // for illustration
    }
    // ...
};
How can I efficiently return a vector of derived pointers from a vector of base pointers?
std::vector<const Base*> getb();

std::vector<const Derived*> getd()
{
    auto vb = getb(); // I know for a fact all vb elements point to Derived
    return ...;
}
Derived does not inherit directly from Base
The objects exist in other containers that have process lifetime.
boost::ranges?
I know for a fact all vb elements point to Derived
The best course of action is to express that assertion with types. Why does getb() return a vector of base pointers in the first place, if you know a better type for the elements? Make it a vector of derived pointers from the start.
Failing that, you need to dynamic_cast each and every individual pointer in vb and put the result in another container. Other casts may or may not work.
First, I would say that if you run into this problem, you should examine why your design needs this step. Possibly there is something you could do differently to avoid the problem. Personally, I find it fishy that you generate a container of Base* where each element only points to Derived objects.
But if you want to do this, there are some possibilities. If Base is not a virtual base class of Derived, you can use a static_cast everywhere instead of a dynamic_cast, due to [expr.static.cast]/11 (in short, if you know that the dynamic_cast would succeed, you can also static_cast). This saves you the runtime check inside the dynamic_cast.
Conversion with memory overhead
You basically create a second vector and copy all pointers over:
const auto vb = get_b();
std::vector<const Derived*> ret;
ret.reserve(vb.size());
std::transform(cbegin(vb), cend(vb), std::back_inserter(ret),
               [](const Base* p) { return dynamic_cast<const Derived*>(p); });
return ret;
This is, in my opinion, the fastest and most concise way to do this with only the STL in C++14. I am not well versed in the capabilities of boost. If you can use some kind of transforming iterator, you could initialize ret directly from two iterators.
No memory overhead, boilerplate code, slight access runtime overhead
Wrap your std::vector<const Base*> in a class of your own that works like a vector but returns a const Derived* on access. With dynamic_cast this will have a slight runtime overhead on access, since it has to do a check. If you can use a static_cast (as discussed above), this will not be the case. (You may still have a very slight overhead due to the added level of indirection.)
I really like this solution and personally would use it. I am not sure whether boost has some kind of container adaptor; otherwise, you will have to write a bit of boilerplate code to get a vector-like interface (you could inherit from std::vector and shadow operator[] and at(), but this has problems of its own, since std::vector has no virtual methods, including its destructor!).
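A minimal sketch of such a wrapper (Base, Derived, and DerivedView are illustrative names; it uses static_cast per the assumption above that every element really points at a Derived):

```cpp
#include <cstddef>
#include <vector>

struct Base { virtual ~Base() = default; };
struct Derived : Base { int value = 0; };

// Non-owning adaptor over a vector<const Base*> whose elements are all
// known to point at Derived objects; casts on access instead of copying.
class DerivedView {
public:
    explicit DerivedView(const std::vector<const Base*>& v) : v_(&v) {}

    const Derived* operator[](std::size_t i) const {
        return static_cast<const Derived*>((*v_)[i]);  // assumed to hold
    }
    std::size_t size() const { return v_->size(); }

private:
    const std::vector<const Base*>* v_;
};
```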
No memory and no runtime overhead
You would really like that, I guess^^. I don't think this is possible with an std::vector. For no runtime overhead, you would need to return an object that is really of type std::vector<const Derived*>. For no memory overhead you have to recycle the memory of the returned object of get_b.
But a std::vector<T> has no way to relinquish ownership of its owned memory (except to another std::vector<T> of the same T through swap or move construction/assignment). Maybe you can do some fishy stuff with a custom allocator such that the underlying storage is not deleted when the vector is destroyed, obtaining it before destruction through data(). But this seems like a perfect way to create a memory leak, especially since I don't really know how this would interact with capacity vs. size.
Even if you get the underlying storage, you cannot use it, since you cannot construct a vector from an already allocated piece of memory. Of course, here again you could do something evil with a custom allocator, but that also seems like a bad idea.
One could go around this problem by using std::unique_ptr<T[]> instead of std::vector. This is basically a compromise between std::array and std::vector. It holds an array of runtime size, but the size of this array is constant once it is allocated. Here you can obtain the storage with release() and construct a new std::unique_ptr from it without issue.
This brings us to the worst problem. The following code is not valid: you cannot even cast a Derived** to a Base** (and let's not even talk about the other way round).
std::unique_ptr<const Base*[]> base(static_cast<const Base**>(new Derived*[5]));
std::unique_ptr<const Derived*[]> dev(dynamic_cast<const Derived**>(base.release()));
I have no idea whether there is some weird way of reinterpreting the whole chunk of memory as pointers of another type, or whether that would even be a sensible thing to do. So I see no way of making this variant work.
After studying allocators for a few days by reading some articles
(cppreference and Are we out of memory),
I am confused about how to control the way a data structure allocates memory.
I am quite sure I misunderstand something,
so I will divide the rest of the question into several parts to make my mistakes easier to refer to.
Here is what I (mis)understand:
Snippet
Suppose that B::generateCs() is a function that generates a list of C from a list of CPrototype.
B::generateCs() is used in the B() constructor:
class C {/*some trivial code*/};
class CPrototype {/*some trivial code*/};

class B {
public:
    std::vector<C> generateCs() {
        std::vector<CPrototype> prototypes = getPrototypes();
        std::vector<C> result; //#X
        for (std::size_t n = 0; n < prototypes.size(); n++) {
            //construct real object (CPrototype->C)
            result.push_back(makeItBorn(prototypes[n]));
        }
        return result;
    }
    std::vector<C> bField; //#Y
    B() {
        this->bField = generateCs(); //#Y ; "generateCs()" is called only here
    }
    //.... other functions, e.g. "makeItBorn()" and "getPrototypes()"
};
From the above code, std::vector<C> currently uses the generic default std::allocator.
For simplicity, from now on, let's say there are only 2 allocators (besides std::allocator),
which I may code myself or adapt from somewhere:
HeapAllocator
StackAllocator
Part 1 (#X)
This snippet can be improved by using a specific allocator type,
in 2 locations (#X and #Y).
The std::vector<C> at line #X seems to be a stack variable,
so I should use a stack allocator:
std::vector<C,StackAllocator> result; //#X
This tends to yield a performance gain. (#X is finished.)
Part 2 (#Y)
Next, the harder part is the B() constructor. (#Y)
It would be nice if the member bField used an appropriate allocation protocol.
Coding the caller to use an allocator explicitly can't achieve this,
because the caller of the constructor can do no better than:
std::allocator<B> bAllo;
B* b = bAllo.allocate(1);
which has no impact on the allocation protocol of bField.
Thus, it is the duty of the constructor itself to pick the correct allocation protocol.
Part 3
I can't know whether an instance of B will be constructed as a heap variable or a stack variable.
This matters because that information is important for picking the correct allocator/protocol.
If I knew which one it is (heap or stack), I could change the declaration of bField to:
std::vector<C,StackAllocator> bField; //.... or ....
std::vector<C,HeapAllocator> bField;
Unfortunately, with this limited information (I don't know whether it will be heap or stack; it can be both),
this path (using std::vector) leads to a dead end.
Part 4
Therefore, a better way is to pass an allocator into the constructor:
MyVector<C> bField; //create my own "MyVector" that acts almost like "std::vector"
B(Allocator* allo) {
    this->bField.setAllocationProtocol(allo); //<-- run-time flexibility
    this->bField = generateCs();
}
It is tedious because callers have to pass an allocator as an additional parameter,
but there seems to be no other way.
Moreover, it is the only practical way to gain the data-coherence advantage below when there are many callers, each using its own memory chunk:
class System1 {
    Allocator* heapForSystem1;
    void test() {
        B b = B(heapForSystem1);
    }
};

class System2 {
    Allocator* heapForSystem2;
    void test() {
        B b = B(heapForSystem2);
    }
};
Question
Where did I start to go wrong, and how?
How can I improve the snippet to use an appropriate allocator (#X and #Y)?
When should I pass an allocator as a parameter?
It is hard to find practical examples of allocator usage.
Edit (reply Walter)
... using another than std::allocator<> is only rarely recommendable.
For me, this is the core of Walter's answer.
It would be valuable knowledge if it is reliable.
1. Are there any books/links/references/evidence that support it?
The list doesn't support the claim. (It actually supports the opposite a little.)
Is it from personal experience?
2. The answer somehow contradicts many sources. Please defend it.
There are many sources that recommend not using std::allocator<>.
Are we out of memory:
being unable to answer "How much memory are you using for subsystem X?" is treated as a fault.
Custom C++ allocators suitable for video games:
implies that a custom allocator is a must for console games
(see the section "Why replace the default allocator?").
Memory Management part 1 of 3:
without a custom allocator, "every now and then there's a little lag (in game)".
More specifically, are custom allocators just "hype" that is rarely worth using in the real world?
Another small question:
can the claim be expanded to "Most quality games rarely use custom allocators"?
3. If I am in such a rare situation, I have to pay the cost, right?
There are only 2 good ways:
passing the allocator as a template argument, or
as a parameter of a function (including the constructor).
(Another, bad, approach is to create some global flag about which protocol to use.)
Is that correct?
In C++, the allocator used for the standard containers is tied to the container type (but see below). Thus, if you want to control the allocation behaviour of your class (including its container members), the allocator must be part of the type, i.e. you must pass it as a template parameter:
template<template <typename> class Allocator>
class B
{
public:
    using allocator = Allocator<C>;
    using fieldcontainer = std::vector<C, allocator>;
    B(allocator alloc = allocator{})
    : bFields(create_fields(alloc)) {}
private:
    const fieldcontainer bFields;
    static fieldcontainer create_fields(allocator);
};
Note, however, that there is experimental polymorphic allocator support, which allows you to change the allocator behaviour independently of the type. This is certainly preferable to designing your own MyVector<> template.
Note that using an allocator other than std::allocator<> is only recommendable if there is a good reason. Possible cases are as follows.
A stack allocator may be preferred for small objects that are frequently allocated and deallocated, though even the heap allocator may be no less efficient.
An allocator that provides memory aligned to, say, 64 bytes (suitable for aligned loading into AVX registers).
A cache-aligned allocator is useful to avoid false sharing in multi-threaded situations.
An allocator could avoid default initialising trivially constructible objects to enhance performance in multi-threaded settings.
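As an illustration of the second and third points, a 64-byte-aligned allocator can be built on C++17's aligned operator new (a sketch; AlignedAllocator is a made-up name, not a standard component):

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Minimal C++17 allocator returning storage aligned to Align bytes,
// e.g. a cache line or an AVX-512 vector width.
template <typename T, std::size_t Align = 64>
struct AlignedAllocator {
    using value_type = T;
    template <typename U>
    struct rebind { using other = AlignedAllocator<U, Align>; };

    AlignedAllocator() = default;
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Align>&) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T), std::align_val_t{Align}));
    }
    void deallocate(T* p, std::size_t) {
        ::operator delete(p, std::align_val_t{Align});
    }
};

template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) { return false; }

// Usage: std::vector<float, AlignedAllocator<float>> v(16);
// v.data() is then 64-byte aligned.
```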
note added in response to additional questions.
The article Are we out of memory dates from 2008 and doesn't apply to contemporary C++ practice (using the C++11 standard or later), when memory management using std containers and smart pointers (std::unique_ptr and std::shared_ptr) avoids memory leaks, which are the main source of increasing memory demand in poorly written code.
When writing code for certain specific applications, there may well be good reasons to use a custom allocator -- and the C++ standard library supports this, so this is a legitimate and appropriate approach. The good reasons include those listed already above, in particular when high performance is required in a multi-threaded environment or to be achieved via SIMD instructions.
If memory is very limited (as it may be on some game consoles), a custom allocator cannot really magically increase the amount of memory, so in this case the usage of the allocator, not the allocator itself, is most critical. A custom allocator may help reduce memory fragmentation, though.
It sounds like you are misunderstanding what a stack allocator is. A stack allocator is just an allocator that uses a stack, the data structure. A stack allocator can manage memory that is either allocated on the stack or the heap. It is dangerous to use if you don't know what you are doing as a stack allocator deallocates all the memory past the specified pointer when deallocate is called. You can use a stack allocator for when the most recently initialized element in a data structure is always the next one destroyed (or if you end up destroying them all at once in the end).
You can look at some of the std collections, such as std::vector, to see how they allow programmers to supply a specific allocator. They use an optional template argument so the user can choose the allocator class. They also allow you to pass the allocator in as an instance if you want to; if you don't, one is instantiated with the default constructor. If you don't choose an allocator class, the default allocator is used, which just uses the heap. You could do the same.
template<typename C, typename Allocator = std::allocator<C> >
class B {
    std::vector<C, Allocator> bField;

    void generateCs() {
        std::vector<CPrototype> prototypes = getPrototypes();
        for (std::size_t n = 0; n < prototypes.size(); n++) {
            //construct real object (CPrototype->C)
            bField.push_back(makeItBorn(prototypes[n]));
        }
    }

public:
    B(const Allocator& allo = Allocator()) : bField(allo) {
        generateCs();
    }
};
This allows users to have control over allocation when they want it, while letting them ignore it if they don't care.
I have recently been reading about custom memory allocators for C++ and came across an interesting concept where, rather than pointers, "handles" are used, which are effectively pointers to pointers. This allows the allocator to rearrange its memory to avoid fragmentation while avoiding the problem of invalidating all the pointers to the allocated memory.
However, different allocators may wish to use handles differently. For example, a pool allocator would have no need to rearrange its memory, whereas other allocators would. Those that need to rearrange their memory may need to treat handles as pointers to pointers, indexes into an array of pointers, etc., whereas allocators that do not rearrange their memory would treat handles as simple pointers. Ideally each allocator would be able to use a different type of handle so that it could achieve optimum performance; having a base handle class with virtual methods would incur a lot of overhead, as handles would be used every time you needed to access any function/member of a dynamically allocated object.
My solution was to use partial template specialization so that the handle type was worked out at compile time, removing the run time overhead of virtuals and allowing the compiler to do other optimizations (eg: inlining)
/////////////////////////////////////////////////
/// \brief The basic handle class, acts as a simple pointer
/// Single layer of indirection
/////////////////////////////////////////////////
template <typename T>
class Handle{
public:
    T* operator->(){ return obj; }
    //other methods...
private:
    T* obj;
};
/////////////////////////////////////////////////
/// \brief Pointer specialization of the handle class, acts as a pointer to pointer
/// allowing allocators to rearrange their data
/////////////////////////////////////////////////
template <typename T>
class Handle<T *>{
public:
    T* operator->(){ return *obj; }
    //other methods...
private:
    T** obj;
};
This works perfectly and allows allocators to return whichever handle type they need. However, it means that any function that needs to take a handle as a parameter must be overloaded to accept both specializations, and a class holding a handle as a member must be templated on whether it holds a normal handle, a pointer-to-pointer handle, or some other type.
This problem only gets worse as more handle types are added, or when a function takes more than one handle and every combination of handle types must be given an overload.
Either I need to make all handles that point to an instance of "TypeA" have the type Handle<TypeA>, using some method other than template specialization to provide the different behaviour, or I need to somehow hide the template parameter from any code using the handles. How could this be achieved?
(This method of hiding template parameters would also be useful in other instances, for example in a policy based logging system where a class may wish to hold a reference to any type of logger without itself being templated. Obviously in the case of logging virtual inheritance could be used as the dominating factor in speed would be the I/O rather than function call overhead)
I have implemented a memory system that allowed exactly what you describe, but could not think of a way to have unique handle types without virtual functions. The template parameters are part of the type.
In the end I made a single handle type and used the least significant bit of the pointer to store whether it was a direct or indirect pointer. Before dereferencing I would check the bit and if it was not set I would simply return the pointer, otherwise I would unset the bit and ask the memory system for the actual pointer.
The scheme did work but I eventually removed the indirect memory handle support from my memory system as I found that the overheads could be quite high and because it was so intrusive on all aspects of my code. Basically almost everywhere a pointer would normally be used I had to use a handle instead. It also required memory to be locked before use on other threads so that it wasn't defragmented while in use. Finally it required me to write entirely custom containers in order to get acceptable performance. I didn't want a double indirection on every access of a vector in a loop for example.
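As an illustrative sketch (not the original implementation), the least-significant-bit tagging described above might look like this, assuming all targets are at least 2-byte aligned:

```cpp
#include <cstdint>

// Handle storing "direct vs indirect" in the pointer's low bit.
// Direct: bits_ is a T*. Indirect: bits_ (minus the tag) is a T**
// owned by the memory system, which may retarget it on defragmentation.
template <typename T>
class TaggedHandle {
public:
    static TaggedHandle direct(T* p) {
        return TaggedHandle(reinterpret_cast<std::uintptr_t>(p));
    }
    static TaggedHandle indirect(T** pp) {
        return TaggedHandle(reinterpret_cast<std::uintptr_t>(pp) | 1u);
    }

    T* get() const {
        if (bits_ & 1u)  // indirect: strip the tag, follow two levels
            return *reinterpret_cast<T* const*>(bits_ & ~std::uintptr_t{1});
        return reinterpret_cast<T*>(bits_);
    }
    T* operator->() const { return get(); }

private:
    explicit TaggedHandle(std::uintptr_t bits) : bits_(bits) {}
    std::uintptr_t bits_;
};
```

The branch on the tag bit is the per-access overhead the answer mentions; the indirect path adds the double dereference on top of it.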
A different question inspired the following thought:
Does std::vector<T> have to move all the elements when it increases its capacity?
As far as I understand, the standard behaviour is for the underlying allocator to request an entire chunk of the new size, then move all the old elements over, then destroy the old elements and then deallocate the old memory.
This behaviour appears to be the only possible correct solution given the standard allocator interface. But I was wondering, would it make sense to amend the allocator to offer a reallocate(std::size_t) function which would return a pair<pointer, bool> and could map to the underlying realloc()? The advantage of this would be that in the event that the OS can actually just extend the allocated memory, then no moving would have to happen at all. The boolean would indicate whether the memory has moved.
(std::realloc() is maybe not the best choice, because we don't need to copy the data if we cannot extend. So in fact we'd rather want something like extend_or_malloc_new(). Edit: perhaps an is_pod-trait-based specialization would allow us to use the actual realloc, including its bitwise copy. Just not in general.)
It seems like a missed opportunity. Worst case, you could always implement reallocate(size_t n) as return make_pair(allocate(n), true);, so there wouldn't be any penalty.
Is there any problem that makes this feature inappropriate or undesirable for C++?
Perhaps the only container that could take advantage of this is std::vector, but then again that's a fairly useful container.
Update: A little example to clarify. Current resize():
pointer p = alloc.allocate(new_size);
for (size_t i = 0; i != old_size; ++i)
{
    alloc.construct(p + i, T(std::move(buf[i])));
    alloc.destroy(buf + i);
}
for (size_t i = old_size; i < new_size; ++i)
{
    alloc.construct(p + i, T());
}
alloc.deallocate(buf, old_size);
buf = p;
New implementation:
pair<pointer, bool> pp = alloc.reallocate(buf, new_size);
if (pp.second) { /* as before */ }
else { /* only construct new elements */ }
When std::vector<T> runs out of capacity it has to allocate a new block. You have correctly covered the reasons.
IMO it would make sense to augment the allocator interface. Two of us tried to for C++11 and we were unable to gain support for it: [1] [2]
I became convinced that in order to make this work, an additional C-level API would be needed. I failed in gaining support for that as well: [3]
In most cases, realloc will not extend the memory, but rather allocate a separate block and move the contents. That was considered when defining C++ in the first place, and it was decided that the current interface is simpler and not less efficient in the common case.
In real life, there are actually few cases where realloc is able to grow. In any implementation where malloc has different pool sizes, chances are that the new size (remember that vector sizes must grow geometrically) will fall in a different pool. Even in the case of large chunks that are not allocated from any memory pool, it will only be able to grow if the virtual addresses at the larger size are free.
Note that while realloc can sometimes grow the memory without moving, by the time realloc completes it might already have moved (bitwise-moved) the memory, and that bitwise move causes undefined behavior for all non-POD types. I don't know of any allocator implementation (POSIX, *NIX, Windows) where you can ask the system whether it will be able to grow, but fail if that requires moving.
Yep, you're right that the standard allocator interface doesn't provide optimizations for memcpy'able types.
It has been possible to determine whether a type can be memcpy'd using the Boost type traits library (I'm not sure whether it provides that out of the box or one would have to build a composite type discriminator based on the Boost ones).
Anyway, to take advantage of realloc() one would probably create a new container type that can explicitly take advantage of this optimization. With current standard allocator interface it doesn't seem to be possible.
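For what it's worth, the standard library now exposes that trait directly as std::is_trivially_copyable, so a realloc-based growth path can be gated at compile time (a sketch; grow_buffer is a made-up helper, not part of any container):

```cpp
#include <cstddef>
#include <cstdlib>
#include <type_traits>

// realloc may extend in place or bitwise-move the block; either is only
// valid for trivially copyable element types, so enforce that statically.
template <typename T>
T* grow_buffer(T* old_buf, std::size_t new_count) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "realloc's bitwise move is only safe for trivially copyable types");
    return static_cast<T*>(std::realloc(old_buf, new_count * sizeof(T)));
}
```

Non-trivially-copyable types would still have to take the allocate-construct-destroy path shown earlier.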
class Help
{
public:
    Help();
    ~Help();

    typedef std::set<string> Terms;
    typedef std::map<string, std::pair<int, Terms> > TermMap;
    typedef std::multimap<int, string, greater<int> > TermsMap;

private:
    TermMap terms;
    TermsMap termsMap;
};
How can we find the memory used (in bytes) by the objects terms and termsMap? Is there any library for this?
If you are looking for the full memory usage of an object, this can't be solved in general in C++: while we can get the size of an instance itself via sizeof(), the object can always allocate memory dynamically as needed.
If you can find out how big the individual elements in a container are, you can get a lower bound:
size = sizeof(map<type>) + sum_of_element_sizes;
Keep in mind, though, that the containers can still allocate additional memory as an implementation detail, and that for containers like vector and string you have to check the allocated capacity, not just the size.
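For a concrete container such as std::map<std::string, int>, such a lower bound might be computed like this (a sketch; per-node bookkeeping overhead is implementation-specific and deliberately ignored):

```cpp
#include <cstddef>
#include <map>
#include <string>

// Lower-bound estimate: the map object itself plus the visible payload
// of each element. Real usage is strictly larger, since tree-node
// pointers, color bits, and allocator padding are not counted.
std::size_t estimate_bytes(const std::map<std::string, int>& m) {
    std::size_t total = sizeof(m);
    for (const auto& kv : m) {
        total += sizeof(kv) + kv.first.capacity();  // string's own buffer
    }
    return total;
}
```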
How can we find the memory used (in bytes) by the objects term and termsMap. Do we have any library?
You should use your own allocator type.
typedef std::set<string,
your_allocator_1_that_can_count_memory_consumption_t> Terms;
typedef std::map<string, std::pair<int,Terms>,
your_allocator_2_that_can_count_memory_consumption_t> TermMap;
typedef std::multimap<int, string, greater<int>,
your_allocator_3_that_can_count_memory_consumption_t> TermsMap;
I have not yet checked this idea for std::string, so if it is difficult to implement, just use your own class fixed_string which simply wraps char s[max-string-length].
And when you need in your program to find out memory consumption just get it from your_allocator_1_that_can_counts_memory_consumption_t, your_allocator_2_that_can_counts_memory_consumption_t,
your_allocator_3_that_can_counts_memory_consumption_t.
Edited
For UncleBens I want to clarify my point.
As far as I understand the question, it is necessary to know how much memory is allocated for the std::set and std::map, including all memory allocated for the elements of the set and the map. So it is not just sizeof(terms).
So I just suggested a very simple allocator. Without going into too much detail, it might look like this:
template <class T>
class your_allocator_1_that_can_counts_memory_consumption_t {
public:
    // interfaces that are required by the standard
private:
    std::allocator<T> std_allocator_;
    // here you need to put your variable to count bytes
    size_t globale_variable_for_allocator_1_to_count_bytes_;
};
This allocator just counts the number of allocated and deallocated bytes and uses its member std_allocator_ for the real allocation and deallocation. You might need to debug it under gdb, setting breakpoints on malloc() and free(), to make sure that every allocation and deallocation actually goes through the allocator.
I would be grateful if you pointed me at problems with this idea, since I have already implemented it in a program that runs on Windows, Linux, and HP-UX, and I simply ask my allocators to find out how much memory each of my containers uses.
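Fleshed out into something compilable, such a counting allocator might look like this (a sketch; the byte counter lives outside the allocator and is shared by all rebound copies, since containers copy and rebind their allocators freely):

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <set>

// Forwards to std::allocator<T> and tallies the net allocated bytes
// in an external counter shared across copies and rebinds.
template <typename T>
class counting_allocator {
public:
    using value_type = T;

    explicit counting_allocator(std::size_t* bytes) : bytes_(bytes) {}
    template <typename U>
    counting_allocator(const counting_allocator<U>& other) : bytes_(other.bytes_) {}

    T* allocate(std::size_t n) {
        *bytes_ += n * sizeof(T);
        return std::allocator<T>().allocate(n);
    }
    void deallocate(T* p, std::size_t n) {
        *bytes_ -= n * sizeof(T);
        std::allocator<T>().deallocate(p, n);
    }

    std::size_t* bytes_;  // public so rebound copies can read it
};

template <typename T, typename U>
bool operator==(const counting_allocator<T>& a, const counting_allocator<U>& b) {
    return a.bytes_ == b.bytes_;
}
template <typename T, typename U>
bool operator!=(const counting_allocator<T>& a, const counting_allocator<U>& b) {
    return !(a == b);
}

// Usage with a set (the container rebinds the allocator to its node type):
//   std::size_t bytes = 0;
//   std::set<int, std::less<int>, counting_allocator<int>>
//       s{std::less<int>{}, counting_allocator<int>(&bytes)};
```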
Short Answer: No
Long Answer:
-> The basic object: yes, via sizeof(<TYPE>), but this is only useful for limited things.
-> A container and its contained members: NO
If you make assumptions about the structures used to implement these objects, you can estimate it. But even that is not really useful (apart from the very specific case of vector).
The designers of the STL deliberately did not define the data structures that should be used by these containers. There are several reasons for this, but one of them (in my opinion) is to stop people making assumptions about the internals and thus try and do silly things that are not encapsulated by the interface.
So the question then comes down to why do you need to know the size?
Do you really need to know the size (unlikely but possible).
Or is there a task you are trying to achieve where you think you need the size?
If you're looking for the actual block of memory, the numerical value of a pointer to it should be it. (Then just add the number of bytes, and you have the end of the block).
the sizeof() operator ought to do it:
size_t bytes = sizeof(Help::TermMap);