How to represent existing data as std::vector - C++

I have to pass existing data (an unsigned char memory area with a known size) to a library function expecting const std::vector<std::byte>&. Is there any way to "fool" the library function into believing that it received a vector while operating on existing data?
I have the data from old legacy code as a pointer and a size, not as a std::vector. The legacy C code allocates the memory with malloc() and provides the pointer and size. Please do not suggest touching the legacy code - by the end of that phrase I'd cease to be an employee of the company.
I don't want to create a temporary vector and copy the data, because the memory throughput is huge (> 5 GB/s).
Placement new creates a vector - but with the first bytes used for the vector object itself. I cannot use a few bytes before the memory area - the legacy code doesn't expect that (see above - the memory area is allocated by malloc()).
Changing the third-party library is out of the question. It expects const std::vector<std::byte>& - not a span, iterators, etc.
It looks like I have no way but to go with a temporary vector, but maybe there are other ideas... I wouldn't care, but this is intensive video processing and there will be a lot of data to copy for nothing.

Is there any way to "fool" the library function into believing that it received a vector while operating on existing data?
No.
The potential options are:
1. Put the data in a vector in the first place.
2. Change the function expecting a vector to not expect a vector.
3. Create a vector and copy the data.
If 1. and 2. are not valid options for you, that leaves you with 3., whether you want it or not.
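If you do end up with option 3, the range constructor at least keeps it to a single allocation and one bulk copy; a minimal sketch, assuming data and size are the pointer and byte count coming from the legacy code:
#include <cstddef>
#include <vector>

// Hypothetical glue: copy the legacy malloc()'d buffer into the vector the library wants.
std::vector<std::byte> to_vector(const unsigned char* data, std::size_t size) {
    auto first = reinterpret_cast<const std::byte*>(data);
    return std::vector<std::byte>(first, first + size); // one allocation, one bulk copy
}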

As the top answer mentions, this is impossible to do in standard C++. And you should not try to do it.
If you can tolerate only using libstdc++ and getting potentially stuck with a specific standard library version, it looks like you can do it. Again, you should not do this. I'm only writing this answer as it seems to be possible without UB in this specific circumstance.
It appears that the current version of libstdc++ exposes their vectors' important members as protected: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/stl_vector.h#L422
All you need to do is inherit from std::vector (it's not forbidden), write your own constructor for setting these protected members, and write a destructor to reset the members so that the actual vector destructor does not delete your memory.
#include <vector>
#include <cstddef>

template <class T>
struct dont_use_me_in_prod : std::vector<T>
{
    dont_use_me_in_prod(T* data, size_t n) {
        this->_M_impl._M_start = data;
        this->_M_impl._M_finish = data + n;
        this->_M_impl._M_end_of_storage = this->_M_impl._M_finish;
    }
    ~dont_use_me_in_prod() {
        this->_M_impl._M_start = nullptr;
        this->_M_impl._M_finish = nullptr;
        this->_M_impl._M_end_of_storage = nullptr;
    }
};

void innocent_function(const std::vector<int>& v);

void please_dont_do_this_in_prod(int* vals, int n) {
    dont_use_me_in_prod evil_vector(vals, n);
    innocent_function(evil_vector);
}
Note that this is not compiler- but standard-library-dependent, meaning that it'll work with clang as well, as long as you use libstdc++ with it. But this is not conforming, so you should still fix innocent_function somehow soon:
https://godbolt.org/z/Tfcn7rdKq
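For completeness, since the advice is to eventually fix innocent_function: a non-owning view is the usual non-hacky interface for this. A hypothetical sketch (this is not the third-party library's actual API):
#include <cstddef>
#include <span>

// Hypothetical replacement signature: accepts any contiguous range of bytes without copying.
void innocent_function_fixed(std::span<const std::byte> data);

void call_site(const unsigned char* ptr, std::size_t size) {
    // Wraps the legacy pointer + size directly; no allocation, no copy.
    innocent_function_fixed({reinterpret_cast<const std::byte*>(ptr), size});
}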

The problem is that std::vector is not a non-owning view class like std::string_view or std::span. std::vector owns the managed memory: it allocates the memory and releases it. It is not designed to adopt an external buffer or to give up its managed buffer.
What you can do is a very dirty hack. You can create a new structure with exactly the same layout as std::vector, assign its data and size fields from what you get from the external lib, and then pass this struct as a std::vector const& using reinterpret_cast. It can work because your library does not modify the vector (I assume it does not perform const_cast on the std::vector const&).
The drawback is that this code is unmaintainable. The next standard library update can cause an application crash if the layout of std::vector changes.
The following is pseudocode:
struct FakeVector
{
    std::byte* Data;
    std::size_t Size;
    std::size_t Capacity;
};

void onNewData(std::byte* ptr, size_t size)
{
    auto vectorRef = FakeVector{ptr, size, size};
    doSomething(*reinterpret_cast<std::vector<std::byte>*>(&vectorRef));
}

Well, I've found a way that works for me. I must admit that it is not fully standard compliant, because the cast between vector types results in undefined behavior, but for the foreseeable future I wouldn't expect this to fail. The idea is to use my own allocator for the vector, one that accepts the buffer from the legacy code and works inside it. The problem is that std::vector<std::byte> value-initializes the elements on resize(), which zeroes the buffer. If there were a way to disable that, it would be a perfect solution, but I have not found one... So here comes the ugly cast - from std::vector<InnerType>, where InnerType is nothing but std::byte with a do-nothing default constructor, to the std::vector<std::byte> that the library expects. Working code is shown at https://godbolt.org/z/7jME79EE9 , and also here:
#include <cstdlib>
#include <iostream>
#include <new>
#include <vector>
#include <cstddef>

// std::byte with a do-nothing default constructor, so resize() leaves the buffer untouched.
struct InnerType {
    std::byte value;
    InnerType() {}
    InnerType(std::byte v) : value(v) {}
};
static_assert(sizeof(InnerType) == sizeof(std::byte));

// Allocator that hands out the externally owned buffer instead of allocating.
template <class T> class AllocatorExternalBufferT {
    T* const _buffer;
    const size_t _size;
public:
    typedef T value_type;
    constexpr AllocatorExternalBufferT() = delete;
    constexpr AllocatorExternalBufferT(T* buf, size_t size) : _buffer(buf), _size(size) {}
    [[nodiscard]] T* allocate(std::size_t n) {
        if (n > _size / sizeof(T)) {
            throw std::bad_array_new_length();
        }
        return _buffer;
    }
    void deallocate(T*, std::size_t) noexcept {}
};
template <class T, class U> bool operator==(const AllocatorExternalBufferT<T>&, const AllocatorExternalBufferT<U>&) { return true; }
template <class T, class U> bool operator!=(const AllocatorExternalBufferT<T>&, const AllocatorExternalBufferT<U>&) { return false; }

typedef std::vector<InnerType, AllocatorExternalBufferT<InnerType>> BufferDataVector;
typedef std::vector<std::byte, AllocatorExternalBufferT<std::byte>> InterfaceVector;

static void report(const InterfaceVector& vec) {
    std::cout << "size=" << vec.size() << " capacity=" << vec.capacity() << " ";
    for(const auto& el : vec) {
        std::cout << static_cast<int>(el) << " ";
    }
    std::cout << "\n";
}

int main() {
    InnerType buffer4allocator[16];
    BufferDataVector v((AllocatorExternalBufferT<InnerType>(buffer4allocator, sizeof(buffer4allocator)))); // double parentheses here to avoid the "most vexing parse"
    v.resize(sizeof(buffer4allocator));
    std::cout << "memory area kept intact after resizing vector:\n";
    report(*reinterpret_cast<InterfaceVector*>(&v));
}

Yes you can do this. Not in a nice safe way but it's certainly possible.
All you need to do is create a fake std::vector that has the same ABI (memory layout) as std::vector. Then set its internal pointer to point to your data and reinterpret_cast your fake vector back to a std::vector.
I wouldn't recommend it unless you really need to do it because any time your compiler changes its std::vector ABI (field layout basically) it will break. Though to be fair that is very unlikely to happen these days.

Related

Easy way of managing the recycling of C++ STL vectors of POD types

My application consists of calling dozens of functions millions of times. In each of those functions, one or a few temporary std::vector containers of POD (plain old data) types are initialized, used, and then destructed. By profiling my code, I find the allocations and deallocations lead to a huge overhead.
A lazy solution is to rewrite all the functions as functors containing those temporary buffer containers as class members. However, this would blow up the memory consumption, as the functions are many and the buffer sizes are not trivial.
A better way is to analyze the code, gather all the buffers, premeditate how to maximally reuse them, and feed a minimal set of shared buffer containers to the functions as arguments. But this can be too much work.
I want to solve this problem once and for all for my future development, whenever temporary POD buffers become necessary, without much premeditation. My idea is to implement a container port, and take a reference to it as an argument for every function that may need temporary buffers. Inside those functions, one should be able to fetch containers of any POD type from the port, and the port should also auto-recall the containers before the functions return.
#include <algorithm> // std::min
#include <cstddef>
#include <cstring>   // std::memcpy
#include <vector>

// Port of vectors of POD types.
struct PODvectorPort
{
    std::size_t Nlent; // Number of dispatched containers.
    std::vector<std::vector<std::size_t> > X; // Container pool.
    PODvectorPort() { Nlent = 0; }
};

// Functor that manages the port.
struct PODvectorPortOffice
{
    std::size_t initialNlent; // Number of already-dispatched containers
                              // when the office is set up.
    PODvectorPort *p; // Pointer to the port.

    PODvectorPortOffice(PODvectorPort &port)
    {
        p = &port;
        initialNlent = p->Nlent;
    }

    template<typename X, typename Y>
    std::vector<X> & repaint(std::vector<Y> &y) // Repaint the container.
    {
        // return *((std::vector<X>*)(&y)); // UB although works
        std::vector<X> *rst = nullptr;
        std::vector<Y> *yp = &y; // copy the pointer value, not the vector's contents
        std::memcpy(&rst, &yp, std::min(
            sizeof(std::vector<X>*), sizeof(std::vector<Y>*)));
        return *rst; // guess it makes no difference. Should still be UB.
    }

    template<typename T>
    std::vector<T> & lend()
    {
        ++p->Nlent;
        // Ensure sufficient container pool size:
        while (p->X.size() < p->Nlent) p->X.push_back( std::vector<std::size_t>(0) );
        return repaint<T, std::size_t>( p->X[p->Nlent - 1] );
    }

    void recall() { p->Nlent = initialNlent; }
    ~PODvectorPortOffice() { recall(); }
};
struct ArbitraryPODstruct
{
    char a[11]; short b[7]; int c[5]; float d[3]; double e[2];
};

// Example f1():
// f2(), f3(), ..., f50() are similarly defined.
// All functions are called a few million times in certain
// order in main().
// port is defined in main().
void f1(other arguments..., PODvectorPort &port)
{
    PODvectorPortOffice portOffice(port);

    // Oh, I need a buffer of chars:
    std::vector<char> &tmpchar = portOffice.lend<char>();
    tmpchar.resize(789); // Trivial if container already has sufficient capacity.
    // ... do things.

    // Oh, I need a buffer of shorts:
    std::vector<short> &tmpshort = portOffice.lend<short>();
    tmpshort.resize(456); // Trivial if container already has sufficient capacity.
    // ... do things.

    // Oh, I need a buffer of ArbitraryPODstruct:
    std::vector<ArbitraryPODstruct> &tmpArb = portOffice.lend<ArbitraryPODstruct>();
    tmpArb.resize(123); // Trivial if container already has sufficient capacity.
    // ... do things.

    // Oh, I need a buffer of integers, but also tmpArb is no longer
    // needed. Why waste it? Cache hot.
    std::vector<int> &tmpint = portOffice.repaint<int>(tmpArb);
    tmpint.resize(300); // Trivial.
    // ... do things.
}
Although the code is compilable with both gcc 8.3 and MSVS 2019 at -O2 to -Ofast, and passes extensive tests for all options, I expect criticism due to the hacky nature of PODvectorPortOffice::repaint(), which "casts" the vector type in-place.
A set of sufficient but not necessary conditions for the correctness and efficiency of the above code are:
std::vector<T> stores 3 pointers to the underlying buffer's &[0], &[0] + .size(), &[0] + .capacity().
std::vector<T>'s allocator calls malloc().
malloc() returns an 8-byte (or sizeof(std::size_t)) aligned address.
So, if this is unacceptable to you, what would be the modern, proper way of addressing my need? Is there a way of writing a manager that achieve what my code does only without violating the Standard?
Thanks!
Edits: A little more context on my problem. Those functions mainly compute some simple statistics of the inputs. The inputs are data streams of financial parameters of different types and sizes. To compute the statistics, those data need to be altered and re-arranged first, hence the buffers for temporary copies. Computing the statistics is cheap, so the allocations and deallocations can become relatively expensive. Why do I want a manager for arbitrary POD types? Because two weeks from now I may start receiving a data stream of a different type, which can be a bunch of primitive types zipped in a struct, or a struct of the composite types encountered so far. I, of course, would like the upstream to just send separate flows of primitive types, but I have no control over that aspect.
More edits: after tons of reading and code experimenting regarding the strict aliasing rule, the answer should be: don't do any of what I put up there --- it works, for now, but don't do it. Instead, I'll be diligent and stick to my previous code-as-you-go style, just add a vector<vector<myNewType> > to the port once a new type comes up, and manage it in a similar way. The accepted answer also offers a nice alternative.
Even more edits: conceived a stronger class that has a better chance of thwarting potential optimizations under the strict aliasing rule. DO NOT USE IT WITHOUT TESTING AND THOROUGH UNDERSTANDING OF THE STRICT ALIASING RULE.
// -std=c++17
#include <cstring>
#include <cstddef>
#include <iostream>
#include <vector>
#include <chrono>
// POD: plain old data.
// Idea: design a class that can let you maximally reuse temporary
// containers during a program.
// Port of vectors of POD types.
template <std::size_t portsize = 42>
class PODvectorPort
{
static constexpr std::size_t Xsize = portsize;
std::size_t signature;
std::size_t Nlent; // Number of dispatched containers.
std::vector<std::size_t> X[portsize]; // Container pool.
PODvectorPort(const PODvectorPort &);
PODvectorPort & operator=( const PODvectorPort& );
public:
std::size_t Ndispatched() { return Nlent; }
std::size_t showSignature() { return signature; }
PODvectorPort() // Permuted random number generator.
{
std::size_t state = std::chrono::high_resolution_clock::now().time_since_epoch().count();
state ^= (uint64_t)(&std::memmove);
signature = ((state >> 18) ^ state) >> 27;
std::size_t rot = state >> 59;
signature = (signature >> rot) | (state << ((-rot) & 31));
Nlent = 0;
}
template<typename podvecport>
friend class PODvectorPortOffice;
};
// Functor that manages the port.
template<typename podvecport>
class PODvectorPortOffice
{
// Number of already-dispatched containers when the office is set up.
std::size_t initialNlent;
podvecport *p; // Pointer to the port.
PODvectorPortOffice( const PODvectorPortOffice& ); // non construction-copyable
PODvectorPortOffice& operator=( const PODvectorPortOffice& ); // non copyable
constexpr void check()
{
while (__cplusplus < 201703)
{
std::cerr << "PODvectorPortOffice: C++ < 17, Stall." << std::endl;
}
// Check if allocation will be 8-byte (or more) aligned.
// Intend it not to work on machine < 64-bit.
constexpr std::size_t aln = alignof(std::max_align_t);
while (aln < 8)
{
std::cerr << "PODvectorPortOffice: Allocation is not at least 8-byte aligned, Stall." <<
std::endl;
}
while ((aln & (aln - 1)) != 0)
{
std::cerr << "PODvectorPortOffice: Alignment is not a power of 2 bytes. Stall." << std::endl;
}
// Random checks to see if sizeof(vector<S>) != sizeof(vector<T>).
if(true)
{
std::size_t vecHeadSize[16] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
vecHeadSize[0] = sizeof(std::vector<char>(0));
vecHeadSize[1] = sizeof(std::vector<short>(1));
vecHeadSize[2] = sizeof(std::vector<int>(2));
vecHeadSize[3] = sizeof(std::vector<long>(3));
vecHeadSize[4] = sizeof(std::vector<std::size_t>(5));
vecHeadSize[5] = sizeof(std::vector<float>(7));
vecHeadSize[6] = sizeof(std::vector<double>(11));
vecHeadSize[7] = sizeof(std::vector<std::vector<char> >(13));
vecHeadSize[8] = sizeof(std::vector<std::vector<int> >(17));
vecHeadSize[9] = sizeof(std::vector<std::vector<double> >(19));
struct tmpclass1 { char a; short b; };
struct tmpclass2 { char a; float b; };
struct tmpclass3 { char a; double b; };
struct tmpclass4 { int a; char b; };
struct tmpclass5 { double a; char b; };
struct tmpclass6 { double a[5]; char b[3]; short c[3]; };
vecHeadSize[10] = sizeof(std::vector<tmpclass1>(23));
vecHeadSize[11] = sizeof(std::vector<tmpclass2>(29));
vecHeadSize[12] = sizeof(std::vector<tmpclass3>(31));
vecHeadSize[13] = sizeof(std::vector<tmpclass4>(37));
vecHeadSize[14] = sizeof(std::vector<tmpclass5>(41));
vecHeadSize[15] = sizeof(std::vector<tmpclass6>(43));
std::size_t notSame = 0;
for(int i = 0; i < 16; ++i)
notSame += vecHeadSize[i] != sizeof(std::size_t) * 3;
while (notSame)
{
std::cerr << "sizeof(std::vector<S>) != sizeof(std::vector<T>), \
PODvectorPortOffice cannot handle. Stall." << std::endl;
}
}
}
void recall() { p->Nlent = initialNlent; }
public:
PODvectorPortOffice(podvecport &port)
{
check();
p = &port;
initialNlent = p->Nlent;
}
template<typename X, typename Y>
std::vector<X> & repaint(std::vector<Y> &y) // Repaint the container.
// AFTER A VECTOR IS REPAINTED, DO NOT USE THE OLD VECTOR AGAIN !!
{
while (std::is_same<bool, X>::value)
{
std::cerr << "PODvectorPortOffice: Cannot repaint the vector to \
std::vector<bool>. Stall." << std::endl;
}
std::vector<X> *x;
std::vector<Y> *yp = &y;
std::memcpy(&x, &yp, sizeof(x));
return *x; // Not compliant with strict aliasing rule.
}
template<typename T>
std::vector<T> & lend()
{
while (p->Nlent >= p->Xsize)
{
std::cerr << "PODvectorPortOffice: No more containers. Stall." << std::endl;
}
++p->Nlent;
return repaint<T, std::size_t>( p->X[p->Nlent - 1] );
}
~PODvectorPortOffice()
{
// Because p->signature can only be known at runtime, an aggressive,
// compliant compiler (ACC) will never remove this
// branch. Volatile might do, but trustworthiness?
if(p->signature == 0)
{
constexpr std::size_t sizeofvec = sizeof(std::vector<std::size_t>);
char dummy[sizeofvec * p->Xsize];
std::memcpy(dummy, p->X, p->Nlent * sizeofvec);
std::size_t ticketNum = 0;
char *xp = (char*)(p->X);
for(int i = 0, iend = p->Nlent * sizeofvec; i < iend; ++i)
{
xp[i] &= xp[iend - i - 1] * 5;
ticketNum += xp[i] ^ ticketNum;
}
std::cerr << "Congratulations! After the port office was decommissioned, \
you found a winning lottery ticket. The odds is less than 2.33e-10. Your \
ticket number is " << ticketNum << std::endl;
std::memcpy(p->X, dummy, p->Nlent * sizeofvec);
// According to the strict aliasing rule, a char* can point to any memory
// block pointed by another pointer of any type T*. Thus given an ACC,
// the writes to that block via the char* must be fully acknowledged in
// time by T*, namely, for reading contents from T*, a reload instruction
// will be kept in the assembly code to achieve a sort of
// "register-cache-memory coherence" (RCMC).
// We also do not care about the renters' (who received the reference via
// .lend()) RCMC, because PODvectorPortOffice never accesses the contents
// of those containers.
}
recall();
}
};
Any adversarial test case to break it, especially on GCC>=8.3 or MSVS >= 2019, is welcomed!
Let me frame this by saying I don't think there's an "authoritative" answer to this question. That said, you've provided enough constraints that a suggested path is at least worthwhile. Let's review the requirements:
Solution must use std::vector. This is in my opinion the most unfortunate requirement for reasons I won't get into here.
Solution must be standards compliant and not resort to rule violations, like the strict aliasing rule.
Solution must either reduce the number of allocations performed, or reduce the overhead of allocations to the point of being negligible.
In my opinion this is definitely a job for a custom allocator. There are a couple of off-the-shelf options that come close to doing what you want, for example the Boost Pool Allocators. The one you're most interested in is boost::pool_allocator. This allocator will create a singleton "pool" for each distinct object size (note: not object type), which grows as needed, but never shrinks until you explicitly purge it.
The main difference between this and your solution is that you'll have distinct pools of memory for objects of different sizes, which means it will use more memory than your posted solution, but in my opinion this is a reasonable trade-off. To be maximally efficient, you could simply start a batch of operations by creating vectors of each needed type with an appropriate size. All subsequent vector operations which use these allocators will do trivial O(1) allocations and deallocations. Roughly in pseudo-code:
// be careful with this, probably want [[nodiscard]]; this code
// is just rough guidance:
void force_pool_sizes(void)
{
    std::vector<int, boost::pool_allocator<int>> size_int_vect;
    std::vector<SomePodSize16, boost::pool_allocator<SomePodSize16>> size_16_vect;
    ...
    size_int_vect.resize(100); // probably makes malloc calls
    size_16_vect.resize(200); // probably makes malloc calls
    ...
    // on return, objects go out of scope, but singleton pools
    // with allocated blocks of memory remain for future use
    // until explicitly purged.
}

void expensive_long_running(void)
{
    force_pool_sizes();
    std::vector<int, boost::pool_allocator<int>> data1;
    ... do stuff, malloc/free will never be called...
    std::vector<SomePodSize16, boost::pool_allocator<SomePodSize16>> data2;
    ... do stuff, malloc/free will never be called...
    // free everything:
    boost::singleton_pool<boost::pool_allocator_tag, sizeof(int)>::release_memory();
}
If you want to take this a step further on being memory efficient, and you know for a fact that certain pool sizes are mutually exclusive, you could modify the boost pool_allocator to use a slightly different singleton backing store which allows you to move a memory block from one block size to another. This is probably out of scope for now, but the boost code itself is straightforward enough; if memory efficiency is critical, it's probably worthwhile.
It's worth pointing out that there's probably some confusion about the strict aliasing rule, especially when it comes to implementing your own memory allocators. There are lots and lots of SO questions about strict aliasing and what it does and doesn't mean. This one is a good place to start.
The key takeaway is that it's perfectly ordinary and acceptable in low level C++ code to take an array of memory and cast it to some object type. If this were not the case, std::allocator wouldn't exist. You also wouldn't have much use for things like std::aligned_storage. Look at the example use case for std::aligned_storage on cppreference. An STL-like static_vector class is created which keeps an array of aligned_storage objects that get recast to a concrete type. Nothing about this is "unacceptable" or "illegal", but it does require some additional knowledge and care in handling.
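The cppreference static_vector example boils down to roughly the following (a trimmed sketch of that idea, not the full class):
#include <cstddef>
#include <new>
#include <type_traits>
#include <utility>

template <class T, std::size_t N>
class static_vector {
    // Raw, correctly aligned storage; no T objects exist until emplace_back constructs them.
    typename std::aligned_storage<sizeof(T), alignof(T)>::type data[N];
    std::size_t count = 0;
public:
    template <class... Args>
    void emplace_back(Args&&... args) {
        if (count >= N) throw std::bad_alloc();
        ::new (&data[count]) T(std::forward<Args>(args)...); // construct a T inside the raw storage
        ++count;
    }
    const T& operator[](std::size_t pos) const {
        return *std::launder(reinterpret_cast<const T*>(&data[pos]));
    }
    ~static_vector() {
        for (std::size_t i = 0; i < count; ++i)
            std::launder(reinterpret_cast<T*>(&data[i]))->~T();
    }
};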
The reason your solution is especially going to enrage the code lawyers is that you're taking pointers of one non-char object type and casting them to different non-char object types. This is a particularly offensive violation of the strict aliasing rule, but also not really necessary given some of your other options.
Also keep in mind that it's not an error to alias memory, it's a warning. I'm not saying go crazy with aliasing, but I am saying that as with all things C and C++, there are justifiable cases to break rules, when you have very thorough knowledge and understanding of both your compiler and the machine you're running on. Just be prepared for some very long and painful debug sessions if it turns out you didn't in fact know those two things as well as you thought you did.

Is it possible to construct a container and fill data into it in one line using C++03?

Suppose I have a Container.
template<typename Type>
class Container
{
public:
    Container(int size_)
    {
        size = size_;
        data = new Type[size];
    }
    ~Container()
    {
        delete [] data;
    }
private:
    int size;
    Type* data;
};
I want to construct the container and fill data into it in one line like this, using C++03:
// very easy to implement using C++11 std::initializer_list
Container<int> container{100,200,300}
or
Container<int> container(100,200,300)
or
// other one line solution
After doing this, data[0]=100, data[1]=200, data[2]=300.
Thanks for your time.
Appendix
A similar question is
How to fill data into container at once without temp variable in C++03
Evg has already given an answer that implements a two-line solution.
Container<int> container(3);
container << 100, 200, 300;
I still wonder: does a one-line solution exist?
The answer you link to can almost do that. You only need one minor modification: you need to make your container resizable. This is actually the major issue. Once you have that, adapting the solution is minor. Write an insert method that reallocates the memory and adjusts the size; then only minor modifications of the proposed solution are necessary.
There is one caveat: you cannot call the constructor, call methods on the constructed object, and assign it to a variable on the same line without a copy. To make that work it is possible to provide a conversion from Proxy to Container. I would reconsider whether putting something on a single line is really worth this cost when it can be done much more easily on two lines.
I didn't include the implementation of insert, because that would be sort of a different question:
#include <iostream>

template<typename Type>
class Container {
private:
    struct Proxy {
        Container* container;
        Proxy(Container* container) : container(container) {}
        Proxy& operator,(Type value) {
            container->insert(value);
            return *this;
        }
        operator Container() { return *container; }
    };
public:
    // ...
    void insert(const Type& value) {
        std::cout << value;
    }
    Proxy operator<<(Type value) {
        insert(value);
        return Proxy(this);
    }
};

int main() {
    Container<int> container = (Container<int>() << 1,2,3);
}
Output:
123
PS:
The problem is that there is Container x={1,2,3,...,1000} everywhere in my project using C++11. Now I must move to C++03, and there is no std::initializer_list.
Yes, that is a problem. I suppose 1,2,3,...,1000 is just an oversimplified example; otherwise you could use something similar to std::iota to fill the container (also only available since C++11, but not too difficult to implement - see the sketch after the example below). If that is the actual problem and you are looking for a temporary hack, I would rather use plain arrays and construct the container from that:
int temp[] = {1,2,3,4,5 ....};
Container<int> x( &temp[0], &temp[999]);
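A hand-rolled substitute for std::iota is only a few lines in C++03; a rough sketch (the name iota03 is made up here):
// C++03-compatible stand-in for std::iota: fills [first, last) with value, value+1, ...
template <class ForwardIt, class T>
void iota03(ForwardIt first, ForwardIt last, T value)
{
    for (; first != last; ++first, ++value)
        *first = value;
}

// Usage with the plain-array workaround above (hypothetical):
// int temp[1000];
// iota03(&temp[0], &temp[1000], 1); // temp is now {1, 2, ..., 1000}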

C++: Can't propagate polymorphic_allocator with scoped_allocator_adaptor

I have a vector<vector<int>> and want the entire memory (i.e., of both the outer and the inner vector) to be taken from a memory_resource. Here is a stripped down example, first the boring part:
#include <boost/container/pmr/memory_resource.hpp>
#include <boost/container/scoped_allocator.hpp>
#include <boost/container/pmr/polymorphic_allocator.hpp>
#include <iostream>
#include <string>
#include <vector>
// Sample memory resource that prints debug information
class MemoryResource : public boost::container::pmr::memory_resource {
    void* do_allocate(std::size_t bytes, std::size_t alignment) {
        std::cout << "Allocate " << bytes << " bytes" << std::endl;
        return malloc(bytes);
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) { free(p); }
    bool do_is_equal(const memory_resource& other) const noexcept { return true; }
};
This is the part that I am interested in:
template <typename T>
using Alloc = boost::container::pmr::polymorphic_allocator<T>;
// using Alloc = std::allocator<T>;
template <typename T>
using PmrVector = std::vector<T, boost::container::scoped_allocator_adaptor<Alloc<T>>>;
using Inner = PmrVector<int>;
int main() {
    MemoryResource resource{};
    PmrVector<Inner> v(1000, Alloc<Inner>{&resource});
    // PmrVector<Inner> v(1337, Alloc<Inner>{});
    v[0].resize(100);
}
This gives me a lengthy compiler error, essentially saying that it can't find a constructor for the inner vector.
If, instead of the polymorphic allocator, I use a regular allocator (e.g., std::allocator - see the lines that are commented out), everything seems to work.
The gcc error message is a bit better than that of clang:
/usr/local/include/boost/container/allocator_traits.hpp:415:10:
error: no matching function for call to '
std::vector<int, polymorphic_allocator<int> >::vector(
scoped_allocator_adaptor<...>&, polymorphic_allocator<...>&
)
'
Why would boost try to construct a vector by passing the allocator twice?
Also, here is a version that uses STL (experimental) instead of boost. That one gives an actual error message "construction with an allocator must be possible if uses_allocator is true", but that doesn't help me either.
Maybe I am understanding something conceptually wrong. Is this the way to do it or is there a better way to solve the original problem?
Argh. The explanation is hidden in std::experimental::pmr::polymorphic_allocator::construct:
This function is called (through std::allocator_traits) by any
allocator-aware object, such as std::vector, that was given a
std::polymorphic_allocator as the allocator to use. Since
memory_resource* implicitly converts to polymorphic_allocator, the
memory resource pointer will propagate to any allocator-aware
subobjects using polymorphic allocators.
So it turns out that polymorphic allocators automatically propagate. That also explains why the allocator is passed twice in the gcc error message.
Here is a working version:
template <typename T>
using Alloc = std::experimental::pmr::polymorphic_allocator<T>;
template <typename T>
using PmrVector = std::vector<T, Alloc<T>>;
using Inner = PmrVector<int>;
int main() {
    MemoryResource resource{};
    PmrVector<Inner> v(1000, Alloc<Inner>{&resource});
    v[0].resize(100);
}
And here is the information that I would have needed a couple of hours ago:
How do I use polymorphic_allocator and scoped_allocator_adaptor together?
You don't. Make sure that all inner containers also use polymorphic allocators, then the memory resource will be handed down automatically.
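For what it's worth, with a C++17 standard library the same pattern is available out of the box via std::pmr, no Boost needed; a minimal sketch, assuming your standard library ships <memory_resource>:
#include <memory_resource>
#include <vector>

int main() {
    std::pmr::monotonic_buffer_resource resource; // any memory_resource works here
    // Both the outer vector and every inner vector draw from the same resource,
    // because polymorphic_allocator propagates itself to allocator-aware elements.
    std::pmr::vector<std::pmr::vector<int>> v(1000, &resource);
    v[0].resize(100);
}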

How to use boost::pool library to create a custom memory allocator

I am new to boost and I want to know how exactly the boost::pool libraries can help me in creating a custom memory allocator.
And I have two vectors of struct objects.
The first vector is of structure type A, while the second vector is of structure type B.
How can I reuse the memory allocated for the first vector for the second vector?
Boost Pool is a library that defines a few allocator types.
Obviously, the focus of the library is to provide Pool Allocators.
Pool Allocators shine when you allocate objects of identical size.
Note: if your structure A and structure B aren't of identical/very similar size, you may not like this design assumption.
The allocators provided by the framework work with singleton pools, and they differentiate on the size of your container value_type. That's a bit inflexible if you want to reuse or even share the pool between different value-types. Also, singleton pools can be inflexible and imply thread-safety costs.
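For reference, the stock usage that the notes above describe looks roughly like this (a sketch; note that the singleton pool is keyed on sizeof(int), not on the type):
#include <boost/pool/pool_alloc.hpp>
#include <vector>

void stock_pool_allocator_usage() {
    {
        // Allocations come from a process-wide singleton pool for chunks of sizeof(int).
        std::vector<int, boost::pool_allocator<int>> v(1024);
        // ... use v; on destruction its chunks go back to the pool, not to the OS ...
    }
    // The pool never shrinks on its own; it has to be purged explicitly:
    boost::singleton_pool<boost::pool_allocator_tag, sizeof(int)>::release_memory();
}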
So, I wanted to see whether I could whip up the simplest allocator that alleviates some of these issues.
I used the source to boost::pool_alloc and the cppreference example as inspiration, and then did some testing and memory profiling.
A More Flexible Stateful Allocator
Here's the simplest pool allocator I could think of:
#include <boost/pool/pool.hpp>
#include <cassert>
#include <cstddef>
#include <new>

using Pool = boost::pool<boost::default_user_allocator_malloc_free>;

template <typename T> struct my_pool_alloc {
    using value_type = T;

    my_pool_alloc(Pool& pool) : _pool(pool) {
        assert(pool_size() >= sizeof(T));
    }
    template <typename U>
    my_pool_alloc(my_pool_alloc<U> const& other) : _pool(other._pool) {
        assert(pool_size() >= sizeof(T));
    }

    T *allocate(const size_t n) {
        T* ret = static_cast<T*>(_pool.ordered_malloc(n));
        if (!ret && n) throw std::bad_alloc();
        return ret;
    }
    void deallocate(T* ptr, const size_t n) {
        if (ptr && n) _pool.ordered_free(ptr, n);
    }

    // for comparing
    size_t pool_size() const { return _pool.get_requested_size(); }

private:
    template <typename U> friend struct my_pool_alloc; // lets the converting constructor see other._pool
    Pool& _pool;
};

template <class T, class U> bool operator==(const my_pool_alloc<T> &a, const my_pool_alloc<U> &b) { return a.pool_size()==b.pool_size(); }
template <class T, class U> bool operator!=(const my_pool_alloc<T> &a, const my_pool_alloc<U> &b) { return a.pool_size()!=b.pool_size(); }
Notes:
This allocator is stateful, and thus requires container implementations that allow stateful allocators (such as Boost Container, Boost MultiIndex). In theory, all C++11-compliant standard libraries should also support them.
The comparisons should guide the containers on whether to swap/copy the allocator or not. This is not an area I've thought much about, and the chosen approach of marking all pools of different requested-sizes as different might be inadequate for some uses.
Sample, Tests
On my compilers it works for both std::vector and Boost's vector:
Live On Coliru - with GCC std::vector
Live On Coliru - with GCC boost::container::vector
Live On Coliru - with Clang std::vector
Live On Coliru - with Clang boost::container::vector
All runs are leak-free and ubsan/asan clean.
Note how we re-use the same pool with different containers of different struct sizes, and how we can even use it with multiple live containers at a time, provided that the element types fit in the request size (32)
struct A { char data[7]; };
struct B { char data[29]; };

int main() {
    //using boost::container::vector;
    using std::vector;

    Pool pool(32); // 32 should fit both sizeof(A) and sizeof(B)

    {
        vector<A, my_pool_alloc<A> > v(1024, pool);
        v.resize(20480);
    }
    // pool.release_memory();

    {
        vector<B, my_pool_alloc<B> > v(1024, pool);
        v.resize(20480);
    }
    // pool.release_memory();

    // sharing the pool between multiple live containers
    {
        vector<A, my_pool_alloc<A> > v(512, pool);
        vector<B, my_pool_alloc<B> > w(512, pool);
        v.resize(10240);
        w.resize(10240);
    }
}
Profiling
Using Valgrind's memory profiler, with the release_memory lines commented out as shown: (memory usage graph)
When commenting the release_memory() calls back in: (memory usage graph)
I hope this looks like the thing you wanted.
Further Ideas: Simple Segregated Storage
This allocator uses the existing pool, which delegates back to malloc/free to allocate memory on demand. To use it with a fixed "realm", you might prefer using simple_segregated_storage directly. This article looks like a good starting point: https://theboostcpplibraries.com/boost.pool
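If you go that route, the core of simple_segregated_storage is: hand it a block of memory you already own and let it carve fixed-size chunks out of it. A rough sketch (the chunk size of 32 is chosen to match the example above):
#include <boost/pool/simple_segregated_storage.hpp>
#include <cstddef>
#include <vector>

void segregated_storage_demo() {
    boost::simple_segregated_storage<std::size_t> storage;
    std::vector<char> arena(1024);                      // the fixed "realm" managed by you
    storage.add_block(arena.data(), arena.size(), 32);  // partition it into 32-byte chunks
    void* chunk = storage.malloc();                     // O(1): take one chunk from the free list
    storage.free(chunk);                                // O(1): put it back; nothing is returned to the OS
}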

Why can't I wrap a T* in an std::vector<T>?

I have a T* addressing a buffer with len elements of type T. I need this data in the form of an std::vector<T>, for certain reasons. As far as I can tell, I cannot construct a vector which uses my buffer as its internal storage. Why is that?
Notes:
Please don't suggest I use iterators - I know that's usually the way around such issues.
I don't mind the vector having to copy data around if it's resized later.
This question especially baffles me now that C++ has move semantics. If we can pull an object's storage from under its feet, why not be able to shove in our own?
You can.
You write about std::vector<T>, but std::vector takes two template arguments, not just one. The second template argument specifies the allocator type to use, and vector's constructors have overloads that allow passing in a custom instance of that allocator type.
So all you need to do is write an allocator that uses your own internal buffer where possible, and falls back to asking the default allocator when your own internal buffer is full.
The default allocator cannot possibly hope to handle it, since it would have no clue on which bits of memory can be freed and which cannot.
A sample stateful allocator with an internal buffer containing already-constructed elements that should not be overwritten by the vector, including a demonstration of a big gotcha:
struct my_allocator_state {
    void *buf;
    std::size_t len;
    bool bufused;
    const std::type_info *type;
};

template <typename T>
struct my_allocator {
    typedef T value_type;

    my_allocator(T *buf, std::size_t len)
        : def(), state(std::make_shared<my_allocator_state, my_allocator_state>({ buf, len, false, &typeid(T) })) { }

    template <std::size_t N>
    my_allocator(T(&buf)[N])
        : def(), state(std::make_shared<my_allocator_state, my_allocator_state>({ buf, N, false, &typeid(T) })) { }

    template <typename U>
    friend struct my_allocator;

    template <typename U>
    my_allocator(my_allocator<U> other)
        : def(), state(other.state) { }

    T *allocate(std::size_t n)
    {
        if (!state->bufused && n == state->len && typeid(T) == *state->type)
        {
            state->bufused = true;
            return static_cast<T *>(state->buf);
        }
        else
            return def.allocate(n);
    }

    void deallocate(T *p, std::size_t n)
    {
        if (p == state->buf)
            state->bufused = false;
        else
            def.deallocate(p, n);
    }

    template <typename...Args>
    void construct(T *c, Args... args)
    {
        if (!in_buffer(c))
            def.construct(c, std::forward<Args>(args)...);
    }

    void destroy(T *c)
    {
        if (!in_buffer(c))
            def.destroy(c);
    }

    friend bool operator==(const my_allocator &a, const my_allocator &b) {
        return a.state == b.state;
    }
    friend bool operator!=(const my_allocator &a, const my_allocator &b) {
        return a.state != b.state;
    }

private:
    std::allocator<T> def;
    std::shared_ptr<my_allocator_state> state;

    bool in_buffer(T *p) {
        return *state->type == typeid(T)
            && points_into_buffer(p, static_cast<T *>(state->buf), state->len);
    }
};

int main()
{
    int buf [] = { 1, 2, 3, 4 };
    std::vector<int, my_allocator<int>> v(sizeof buf / sizeof *buf, {}, buf);
    v.resize(3);
    v.push_back(5);
    v.push_back(6);

    for (auto &i : v) std::cout << i << std::endl;
}
Output:
1
2
3
4
6
The push_back of 5 fits into the old buffer, so construction is bypassed. When 6 is added, new memory is allocated, and everything starts acting as normal. You could avoid that problem by adding a method to your allocator to indicate that from that point onward, construction should not be bypassed any longer.
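One possible shape for that, as a hypothetical delta against the allocator above (the names bypass_construct and stop_bypassing are invented here, not part of the original code): add a flag such as bool bypass_construct = true; to my_allocator_state, expose void stop_bypassing() { state->bypass_construct = false; } on the allocator, and make construct() skip construction only while the flag is set:
// Hypothetical replacement for construct() in my_allocator:
template <typename...Args>
void construct(T *c, Args... args)
{
    if (!(state->bypass_construct && in_buffer(c)))
        def.construct(c, std::forward<Args>(args)...);
}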
points_into_buffer turned out to be the hardest part to write, and I've omitted that from my answer. The intended semantics should be obvious from how I'm using it. Please see my question here for a portable implementation in my answer there, or if your implementation allows it, use one of the simpler versions in that other question.
By the way, I'm not really happy with how some implementations use rebind in such ways that there is no avoiding storing run-time type info along with the state, but if your implementation doesn't need that, you could make it a bit simpler by making the state a template class (or a nested class) too.
The short answer is that a vector can't use your buffer because it wasn't designed that way.
It makes sense, too. If a vector doesn't allocate its own memory, how does it resize the buffer when more items are added? It allocates a new buffer, but what does it do with the old one? Same applies to moving - if the vector doesn't control its own buffer, how can it give control of this buffer to another instance?
These days - you no longer need to wrap a T* in an std::vector, you can wrap it with an std::span (in C++20; before that - use gsl::span). A span offers you all the convenience of a standard library container - in fact, basically all relevant features of std::vector excluding changes to the size - with a very thin wrapper class. That's what you want to use, really.
For more on spans, read: What is a "span" and when should I use one?
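Wrapping the question's buffer in a span is then a one-liner; a minimal sketch:
#include <cstddef>
#include <numeric>
#include <span>

long long sum(std::span<const int> s) {             // hypothetical consumer taking a span
    return std::accumulate(s.begin(), s.end(), 0LL);
}

void use(const int* buf, std::size_t len) {
    std::span<const int> view(buf, len);             // no copy, no ownership: just pointer + length
    sum(view);
}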