I'm creating a pre-allocator with dynamic memory chunk size, and I need to unify contiguous memory chunks.
struct Chunk // Chunk of memory
{
Ptr begin, end; // [begin, end) range
};
struct PreAlloc
{
std::vector<Chunk> chunks; // I need to unify contiguous chunks here
...
};
I tried a naive solution: after sorting the chunks by their begin, it did a single pass through the vector, checking whether the next chunk's begin was equal to the current chunk's end (see the sketch after the note below). I'm sure it could be improved.
Is there a good algorithm to unify contiguous ranges?
Information:
Chunks can never "overlap".
Chunks can have any size greater than 0.
Performance is the most important factor.
NOTE: there was an error in my original algorithm, where I only considered blocks to the left of the current block.
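For reference, here is a minimal sketch of that naive sort-and-merge pass (assuming Ptr is an ordinary pointer type such as char*); my real code differs, but this is the idea:
#include <algorithm>
#include <cstddef>
#include <vector>

using Ptr = char*;                 // assumption: Ptr is an ordinary pointer type
struct Chunk { Ptr begin, end; };  // [begin, end) range

// Sort by begin, then merge every run of chunks where next.begin == current.end.
void unify_naive(std::vector<Chunk>& chunks)
{
    if (chunks.empty()) return;
    std::sort(chunks.begin(), chunks.end(),
              [](const Chunk& a, const Chunk& b) { return a.begin < b.begin; });
    std::vector<Chunk> out;
    out.push_back(chunks.front());
    for (std::size_t i = 1; i < chunks.size(); ++i) {
        if (chunks[i].begin == out.back().end)
            out.back().end = chunks[i].end;   // contiguous: extend the previous chunk
        else
            out.push_back(chunks[i]);         // gap: start a new chunk
    }
    chunks.swap(out);
}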
Use two associative tables (e.g. unordered_map), one mapping the begin address to the Chunk, the other mapping the end address to the Chunk. This lets you find the neighbouring blocks quickly. Alternatively, you can change the Chunk struct to store a pointer/id/whatever to the neighbouring Chunk, plus a flag to tell whether it's free.
The algorithm consists of scanning the vector of chunks once, while maintaining the invariant: if there is a neighbour to the left, you merge them; if there is a neighbour to the right, you merge them. At the end, just collect the remaining chunks.
Here's the code:
void unify(vector<Chunk>& chunks)
{
unordered_map<Ptr, Chunk> begins(chunks.size() * 1.25); // tweak this
unordered_map<Ptr, Chunk> ends(chunks.size() * 1.25); // tweak this
for (Chunk c : chunks) {
// check the left
auto left = ends.find(c.begin);
if (left != ends.end()) { // found something to the left
Chunk neighbour = left->second;
c.begin = neighbour.begin;
begins.erase(neighbour.begin);
ends.erase(left);
}
// check the right
auto right = begins.find(c.end);
if (right != begins.end()) { // found something to the right
Chunk neighbour = right->second;
c.end = neighbour.end;
begins.erase(right);
ends.erase(neighbour.end);
}
begins[c.begin] = c;
ends[c.end] = c;
}
chunks.clear();
for (auto x : begins)
chunks.push_back(x.second);
}
The algorithm has O(n) complexity, assuming constant-time access to the begins and ends tables (which is nearly what you get if you don't trigger rehashing, hence the "tweak this" comments). There are quite a few ways to implement the associative tables, so make sure to try a few alternatives; as pointed out in the comment by Ben Jackson, a hash table doesn't always make good use of the cache, so even a sorted vector with binary searches might be faster.
If you can change the Chunk structure to store left/right pointers, you get a guaranteed O(1) lookup/insert/remove. Assuming you are doing this to consolidate free chunks of memory, the left/right checking can be done in O(1) during the free() call, so there is no need to consolidate it afterwards.
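A minimal sketch of that idea, assuming each Chunk can carry pointers to its physically adjacent neighbours plus a free flag (the names here are illustrative, not taken from your code):
// Hypothetical layout: every chunk knows its physical neighbours, so
// coalescing on free() is a constant-time pointer fix-up.
struct Chunk
{
    char*  begin;
    char*  end;    // [begin, end)
    Chunk* left;   // physically adjacent chunk before this one (or nullptr)
    Chunk* right;  // physically adjacent chunk after this one (or nullptr)
    bool   free;
};

// Called when 'c' is released; merges it with free neighbours in O(1).
void coalesce_on_free(Chunk* c)
{
    c->free = true;
    if (c->left && c->left->free) {           // absorb c into its left neighbour
        c->left->end   = c->end;
        c->left->right = c->right;
        if (c->right) c->right->left = c->left;
        c = c->left;                          // continue with the merged chunk
    }
    if (c->right && c->right->free) {         // absorb the right neighbour into c
        Chunk* r = c->right;
        c->end   = r->end;
        c->right = r->right;
        if (r->right) r->right->left = c;
    }
}
The bookkeeping for recycling the absorbed Chunk objects is left out; the point is only that no search is needed when the neighbours are reachable directly.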
I don't think you can do better than the N log(N) of the naive approach. I dislike the idea of using an unordered associative container: the hashing will degrade performance. An improvement might be to keep the chunks sorted on each insert, which makes 'unify' O(N).
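One way to read that suggestion, sketched below under the assumption that Ptr is an ordinary pointer type: insert each chunk at its sorted position and merge it with its physical neighbours on the spot, so a separate unify pass degenerates to a single linear sweep at most (or disappears entirely). Each insertion still pays O(N) for shifting vector elements, of course.
#include <algorithm>
#include <iterator>
#include <vector>

using Ptr = char*;                 // assumption: Ptr is an ordinary pointer type
struct Chunk { Ptr begin, end; };  // [begin, end) range

// Keep 'chunks' sorted by begin and merge with physical neighbours right away.
void insert_sorted(std::vector<Chunk>& chunks, Chunk c)
{
    auto pos = std::lower_bound(chunks.begin(), chunks.end(), c,
        [](const Chunk& a, const Chunk& b) { return a.begin < b.begin; });
    // Does the new chunk extend the one on its left?
    if (pos != chunks.begin() && std::prev(pos)->end == c.begin) {
        auto left = std::prev(pos);
        left->end = c.end;
        // The enlarged chunk may now also touch the one on its right.
        if (pos != chunks.end() && pos->begin == left->end) {
            left->end = pos->end;
            chunks.erase(pos);
        }
        return;
    }
    // Does the new chunk extend the one on its right?
    if (pos != chunks.end() && pos->begin == c.end) {
        pos->begin = c.begin;
        return;
    }
    chunks.insert(pos, c);
}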
Since it seems you are writing some sort of allocator, I dug up some old code of mine (with some adjustments for C++11 and without any warranty). The allocator is for small objects with a size <= 32 * sizeof(void*).
Code:
// Copyright (c) 1999, Dieter Lucking.
//
// Permission is hereby granted, free of charge, to any person or organization
// obtaining a copy of the software and accompanying documentation covered by
// this license (the "Software") to use, reproduce, display, distribute,
// execute, and transmit the Software, and to prepare derivative works of the
// Software, and to permit third-parties to whom the Software is furnished to
// do so, all subject to the following:
//
// The copyright notices in the Software and this entire statement, including
// the above license grant, this restriction and the following disclaimer,
// must be included in all copies of the Software, in whole or in part, and
// all derivative works of the Software, unless such copies or derivative
// works are solely in the form of machine-executable object code generated by
// a source language processor.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT
// SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE
// FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS IN THE SOFTWARE.
//
#include <limits>
#include <chrono>
#include <iomanip>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>
// raw_allocator
// =============================================================================
class raw_allocator
{
// Types
// =====
public:
typedef std::size_t size_type;
typedef std::ptrdiff_t difference_type;
typedef void value_type;
typedef void* pointer;
typedef const void* const_pointer;
typedef unsigned char byte_type;
typedef byte_type* byte_pointer;
typedef const unsigned char* const_byte_pointer;
// Information
// ===========
public:
static size_type max_size() noexcept {
return std::numeric_limits<size_type>::max();
}
static size_type mem_size(size_type) noexcept;
// Allocation.System
// =================
public:
static pointer system_allocate(size_type) noexcept;
static void system_allocate(size_type, pointer&, size_type&) noexcept;
static void system_deallocate(pointer) noexcept;
// Allocation
// ==========
public:
static void allocate(size_type, pointer& result, size_type& capacity) noexcept;
static pointer allocate(size_type n) noexcept {
pointer result;
allocate(n, result, n);
return result;
}
static void deallocate(pointer p, size_type n) noexcept;
// Allocation.Temporary:
//======================
public:
static void allocate_temporary(size_type, pointer& result,
size_type& capacity) noexcept;
static pointer allocate_temporary(size_type n) noexcept {
pointer result;
allocate_temporary(n, result, n);
return result;
}
static void deallocate_temporary(pointer, size_type) noexcept;
// Logging
// =======
public:
static void log(std::ostream& stream);
};
// static_allocator
// =============================================================================
template<class T> class static_allocator;
template<>
class static_allocator<void>
{
public:
typedef void value_type;
typedef void* pointer;
typedef const void* const_pointer;
template<class U> struct rebind
{
typedef static_allocator<U> other;
};
};
template<class T>
class static_allocator
{
// Types
// =====
public:
typedef raw_allocator::size_type size_type;
typedef raw_allocator::difference_type difference_type;
typedef T value_type;
typedef T& reference;
typedef const T& const_reference;
typedef T* pointer;
typedef const T* const_pointer;
template<class U> struct rebind
{
typedef static_allocator<U> other;
};
// Construction/Destruction
// ========================
public:
static_allocator() noexcept {};
static_allocator(const static_allocator&) noexcept {};
~static_allocator() noexcept {};
// Information
// ===========
public:
static size_type max_size() noexcept {
return raw_allocator::max_size() / sizeof(T);
}
static size_type mem_size(size_type n) noexcept {
return raw_allocator::mem_size(n * sizeof(T)) / sizeof(T);
}
static pointer address(reference x) {
return &x;
}
static const_pointer address(const_reference x) {
return &x;
}
// Construct/Destroy
//==================
public:
static void construct(pointer p, const T& value) {
new ((void*) p) T(value);
}
static void destroy(pointer p) {
((T*) p)->~T();
}
// Allocation
//===========
public:
static pointer allocate(size_type n) noexcept {
return (pointer)raw_allocator::allocate(n * sizeof(value_type));
}
static void allocate(size_type n, pointer& result, size_type& capacity) noexcept
{
raw_allocator::pointer p;
raw_allocator::allocate(n * sizeof(value_type), p, capacity);
result = (pointer)(p);
capacity /= sizeof(value_type);
}
static void deallocate(pointer p, size_type n) noexcept {
raw_allocator::deallocate(p, n * sizeof(value_type));
}
// Allocation.Temporary
// ====================
static pointer allocate_temporary(size_type n) noexcept {
return (pointer)raw_allocator::allocate_temporary(n * sizeof(value_type));
}
static void allocate_temporary(size_type n, pointer& result,
size_type& capacity) noexcept
{
raw_allocator::pointer p;
raw_allocator::allocate_temporary(n * sizeof(value_type), p, capacity);
result = (pointer)(p);
capacity /= sizeof(value_type);
}
static void deallocate_temporary(pointer p, size_type n) noexcept {
raw_allocator::deallocate_temporary(p, n);
}
// Logging
// =======
public:
static void log(std::ostream& stream) {
raw_allocator::log(stream);
}
};
template <class T1, class T2>
inline bool operator ==(const static_allocator<T1>&,
const static_allocator<T2>&) noexcept {
return true;
}
template <class T1, class T2>
inline bool operator !=(const static_allocator<T1>&,
const static_allocator<T2>&) noexcept {
return false;
}
// allocator:
// =============================================================================
template<class T> class allocator;
template<>
class allocator<void>
{
public:
typedef static_allocator<void>::value_type value_type;
typedef static_allocator<void>::pointer pointer;
typedef static_allocator<void>::const_pointer const_pointer;
template<class U> struct rebind
{
typedef allocator<U> other;
};
};
template<class T>
class allocator
{
// Types
// =====
public:
typedef typename static_allocator<T>::size_type size_type;
typedef typename static_allocator<T>::difference_type difference_type;
typedef typename static_allocator<T>::value_type value_type;
typedef typename static_allocator<T>::reference reference;
typedef typename static_allocator<T>::const_reference const_reference;
typedef typename static_allocator<T>::pointer pointer;
typedef typename static_allocator<T>::const_pointer const_pointer;
template<class U> struct rebind
{
typedef allocator<U> other;
};
// Constructor/Destructor
// ======================
public:
template <class U>
allocator(const allocator<U>&) noexcept {}
allocator() noexcept {};
allocator(const allocator&) noexcept {};
~allocator() noexcept {};
// Information
// ===========
public:
size_type max_size() const noexcept {
return static_allocator<T>::max_size();
}
pointer address(reference x) const {
return static_allocator<T>::address(x);
}
const_pointer address(const_reference x) const {
return static_allocator<T>::address(x);
}
// Construct/Destroy
// =================
public:
void construct(pointer p, const T& value) {
static_allocator<T>::construct(p, value);
}
void destroy(pointer p) {
static_allocator<T>::destroy(p);
}
// Allocation
// ==========
public:
pointer allocate(size_type n, typename allocator<void>::const_pointer = 0) {
return static_allocator<T>::allocate(n);
}
void deallocate(pointer p, size_type n) {
static_allocator<T>::deallocate(p, n);
}
// Logging
// =======
public:
static void log(std::ostream& stream) {
raw_allocator::log(stream);
}
};
template <class T1, class T2>
inline bool operator ==(const allocator<T1>&, const allocator<T2>&) noexcept {
return true;
}
template <class T1, class T2>
inline bool operator !=(const allocator<T1>&, const allocator<T2>&) noexcept {
return false;
}
// Types
// =============================================================================
typedef raw_allocator::size_type size_type;
typedef raw_allocator::byte_pointer BytePointer;
struct LinkType
{
LinkType* Link;
};
struct FreelistType
{
LinkType* Link;
};
// const
// =============================================================================
// Memory layout:
// ==============
//
// Freelist
// Index Request Alignment
// =============================================================================
// [ 0 ... 7] [ 0 * align ... 8 * align] every 1 * align bytes
// [ 8 ... 11] ( 8 * align ... 16 * align] every 2 * align bytes
// [12 ... 13] ( 16 * align ... 24 * align] every 4 * align bytes
// [14] ( 24 * align ... 32 * align] 8 * align bytes
//
// temporary memory:
// [15] [ 0 * align ... 256 * align] 256 * align
static const unsigned FreeListArraySize = 16;
static const size_type FreelistInitSize = 16;
static const size_type MinAlign =
(8 < 2 * sizeof(void*)) ? 2 * sizeof(void*) : 8;
static const size_type MaxAlign = 32 * MinAlign;
static const size_type MaxIndex = 14;
static const size_type TmpIndex = 15;
static const size_type TmpAlign = 256 * MinAlign;
static const size_type IndexTable[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 8, 9, 9, 10,
10, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14 };
static_assert(sizeof(IndexTable) / sizeof(size_type) == MaxAlign / MinAlign, "Invalid Index Table");
inline size_type get_index(size_type n) {
return IndexTable[long(n - 1) / MinAlign];
}
static const size_type AlignTable[] = { MinAlign * 1, MinAlign * 2, MinAlign
* 3, MinAlign * 4, MinAlign * 5, MinAlign * 6, MinAlign * 7, MinAlign * 8,
MinAlign * 10, MinAlign * 12, MinAlign * 14, MinAlign * 16, MinAlign * 20,
MinAlign * 24, MinAlign * 32, TmpAlign, };
static_assert(sizeof(AlignTable) / sizeof(size_type) == TmpIndex + 1, "Invalid Align Table");
inline size_type get_align(size_type i) {
return AlignTable[i];
}
// Thread
// ============================================================================
static LinkType* Freelist[FreeListArraySize];
static BytePointer HeapBeg;
static BytePointer HeapEnd;
static size_type TotalHeapSize;
static std::mutex FreelistMutex[FreeListArraySize] = { };
inline void lock_free_list(size_type i) {
FreelistMutex[i].lock();
}
inline void unlock_free_list(size_type i) {
FreelistMutex[i].unlock();
}
// Allocation
// ============================================================================
// Requires: freelist[index] is locked
LinkType* allocate_free_list(size_type index) noexcept {
static std::mutex mutex;
const size_type page_size = 4096; // FIXME some system_page_size();
std::lock_guard<std::mutex> guard(mutex);
size_type heap_size = HeapEnd - HeapBeg;
size_type align = get_align(index);
if(heap_size < align) {
LinkType* new_list = (LinkType*)(HeapBeg);
// If a temporary list:
if(MaxAlign <= heap_size) {
LinkType* current = new_list;
LinkType* next;
while(2*MaxAlign <= heap_size) {
next = (LinkType*)(BytePointer(current) + MaxAlign);
current->Link = next;
current = next;
heap_size -= MaxAlign;
}
if(index != MaxIndex) lock_free_list(MaxIndex);
current->Link = Freelist[MaxIndex];
Freelist[MaxIndex] = new_list;
if(index != MaxIndex) unlock_free_list(MaxIndex);
new_list = (LinkType*)(BytePointer(current) + MaxAlign);
heap_size -= MaxAlign;
}
if(MinAlign <= heap_size) {
std::cout << "heap_size: " << heap_size << std::endl;
size_type i = get_index(heap_size);
if(heap_size < get_align(i)) --i;
if(index != i) lock_free_list(i);
new_list->Link = Freelist[i];
Freelist[i] = new_list;
if(index != i) unlock_free_list(i);
}
heap_size = FreelistInitSize * align + TotalHeapSize / FreelistInitSize;
heap_size = (((heap_size - 1) / page_size) + 1) * page_size;
HeapBeg = BytePointer(raw_allocator::system_allocate(heap_size));
if(HeapBeg) {
HeapEnd = HeapBeg + heap_size;
TotalHeapSize += heap_size;
}
else {
HeapEnd = 0;
size_type i = FreeListArraySize;
while(HeapBeg == 0) {
--i;
if(i <= index) return 0;
lock_free_list(i);
if(Freelist[i]) {
heap_size = get_align(i);
HeapBeg = (BytePointer)(Freelist[i]);
HeapEnd = HeapBeg + heap_size;
Freelist[i] = Freelist[i]->Link;
}
unlock_free_list(i);
}
}
}
size_type size = FreelistInitSize * align;
size_type count = FreelistInitSize;
if(heap_size < size) {
count = heap_size / align;
size = align * count;
}
LinkType* beg_list = (LinkType*)(HeapBeg);
LinkType* end_list = beg_list;
while(--count) {
LinkType* init = (LinkType*)(BytePointer(end_list) + align);
end_list->Link = init;
end_list = init;
}
LinkType*& freelist = Freelist[index];
end_list->Link = freelist;
freelist = beg_list;
HeapBeg += size;
return freelist;
}
// raw_allocator
// ============================================================================
// size
// ====
raw_allocator::size_type
raw_allocator::mem_size(size_type n) noexcept {
if( ! n) return 0;
else {
if(n <= MaxAlign) return get_align(get_index(n));
else return ((difference_type(n) - 1) / difference_type(MaxAlign)) * MaxAlign
+ MaxAlign;
}
}
// allocation.system
// =================
raw_allocator::pointer raw_allocator::system_allocate(size_type n) noexcept
{
return ::malloc(n);
}
void raw_allocator::system_allocate(size_type n, pointer& p, size_type& capacity) noexcept
{
capacity = mem_size(n);
p = ::malloc(capacity);
if(p == 0) capacity = 0;
}
void raw_allocator::system_deallocate(pointer p) noexcept {
::free(p);
}
// allocation
// ==========
void raw_allocator::allocate(size_type n, pointer& p, size_type& capacity) noexcept
{
if(n == 0 || MaxAlign < n) system_allocate(n, p, capacity);
else {
p = 0;
capacity = 0;
size_type index = get_index(n);
lock_free_list(index);
LinkType*& freelist = Freelist[index];
if(freelist == 0) {
freelist = allocate_free_list(index);
}
if(freelist != 0) {
p = freelist;
capacity = get_align(index);
freelist = freelist->Link;
}
unlock_free_list(index);
}
}
void raw_allocator::deallocate(pointer p, size_type n) noexcept {
if(p) {
if(n == 0 || MaxAlign < n) system_deallocate(p);
else {
size_type index = get_index(n);
lock_free_list(index);
LinkType*& freelist = Freelist[index];
LinkType* new_list = ((LinkType*)(p));
new_list->Link = freelist;
freelist = new_list;
unlock_free_list(index);
}
}
}
// allocation.temporary
// ====================
void raw_allocator::allocate_temporary(size_type n, pointer& p,
size_type& capacity) noexcept
{
if(n == 0 || size_type(TmpAlign) < n) system_allocate(n, p, capacity);
else {
p = 0;
capacity = 0;
lock_free_list(TmpIndex);
LinkType*& freelist = Freelist[TmpIndex];
if(freelist == 0) freelist = allocate_free_list(TmpIndex);
if(freelist != 0) {
p = freelist;
freelist = freelist->Link;
capacity = TmpAlign;
}
unlock_free_list(TmpIndex);
}
}
void raw_allocator::deallocate_temporary(pointer p, size_type n) noexcept {
if(p) {
if(n == 0 || size_type(TmpAlign) < n) system_deallocate(p);
else {
lock_free_list(TmpIndex);
LinkType*& freelist = Freelist[TmpIndex];
LinkType* new_list = ((LinkType*)(p));
new_list->Link = freelist;
freelist = new_list;
unlock_free_list(TmpIndex);
}
}
}
void raw_allocator::log(std::ostream& stream) {
stream << " Heap Size: " << TotalHeapSize << '\n';
size_type total_size = 0;
for (unsigned i = 0; i < FreeListArraySize; ++i) {
size_type align = get_align(i);
size_type size = 0;
size_type count = 0;
lock_free_list(i);
LinkType* freelist = Freelist[i];
while (freelist) {
size += align;
++count;
freelist = freelist->Link;
}
total_size += size;
unlock_free_list(i);
stream << " Freelist: " << std::setw(4) << align << ": " << size
<< " [" << count << ']' << '\n';
}
size_type heap_size = HeapEnd - HeapBeg;
stream << " Freelists: " << total_size << '\n';
stream << " Free Heap: " << heap_size << '\n';
stream << " Allocated: " << TotalHeapSize - total_size - heap_size
<< '\n';
}
int main() {
const unsigned sample_count = 100000;
std::vector<char*> std_allocate_pointers;
std::vector<char*> allocate_pointers;
std::vector<unsigned> sample_sizes;
typedef std::chrono::nanoseconds duration;
duration std_allocate_duration;
duration std_deallocate_duration;
duration allocate_duration;
duration deallocate_duration;
std::allocator<char> std_allocator;
allocator<char> allocator;
for (unsigned i = 0; i < sample_count; ++i) {
if (std::rand() % 2) {
unsigned size = unsigned(std::rand()) % MaxAlign;
//std::cout << " Allocate: " << size << std::endl;
sample_sizes.push_back(size);
{
auto start = std::chrono::high_resolution_clock::now();
auto p = std_allocator.allocate(size);
auto end = std::chrono::high_resolution_clock::now();
std_allocate_pointers.push_back(p);
std_allocate_duration += std::chrono::duration_cast<duration>(
end - start);
}
{
auto start = std::chrono::high_resolution_clock::now();
auto p = allocator.allocate(size);
auto end = std::chrono::high_resolution_clock::now();
allocate_pointers.push_back(p);
allocate_duration += std::chrono::duration_cast<duration>(
end - start);
}
}
else {
if (!sample_sizes.empty()) {
char* std_p = std_allocate_pointers.back();
char* p = allocate_pointers.back();
unsigned size = sample_sizes.back();
//std::cout << "Deallocate: " << size << std::endl;
{
auto start = std::chrono::high_resolution_clock::now();
std_allocator.deallocate(std_p, size);
auto end = std::chrono::high_resolution_clock::now();
std_deallocate_duration += std::chrono::duration_cast<
duration>(end - start);
}
{
auto start = std::chrono::high_resolution_clock::now();
allocator.deallocate(p, size);
auto end = std::chrono::high_resolution_clock::now();
deallocate_duration += std::chrono::duration_cast<duration>(
end - start);
}
std_allocate_pointers.pop_back();
allocate_pointers.pop_back();
sample_sizes.pop_back();
}
}
}
for (unsigned i = 0; i < sample_sizes.size(); ++i) {
unsigned size = sample_sizes[i];
std_allocator.deallocate(std_allocate_pointers[i], size);
allocator.deallocate(allocate_pointers[i], size);
}
std::cout << "std_allocator: "
<< (std_allocate_duration + std_deallocate_duration).count() << " "
<< std_allocate_duration.count() << " "
<< std_deallocate_duration.count() << std::endl;
std::cout << " allocator: "
<< (allocate_duration + deallocate_duration).count() << " "
<< allocate_duration.count() << " " << deallocate_duration.count()
<< std::endl;
raw_allocator::log(std::cout);
return 0;
}
Note: The raw allocator never releases memory back to the system (that might be a bug).
Note: Without optimizations enabled the performance is lousy (compile with g++ -std=c++11 -O3 ...).
Result:
std_allocator: 11645000 7416000 4229000
allocator: 5155000 2758000 2397000
Heap Size: 94208
Freelist: 16: 256 [16]
Freelist: 32: 640 [20]
Freelist: 48: 768 [16]
Freelist: 64: 1024 [16]
Freelist: 80: 1280 [16]
Freelist: 96: 1536 [16]
Freelist: 112: 1792 [16]
Freelist: 128: 2176 [17]
Freelist: 160: 5760 [36]
Freelist: 192: 6144 [32]
Freelist: 224: 3584 [16]
Freelist: 256: 7936 [31]
Freelist: 320: 10240 [32]
Freelist: 384: 14208 [37]
Freelist: 512: 34304 [67]
Freelist: 4096: 0 [0]
Freelists: 91648
Free Heap: 2560
Allocated: 0
It seemed like an interesting problem, so I invested some time in it. The approach you took is far from naive; it actually gives pretty good results. It can definitely be optimized further, though. I will assume the list of chunks is not already sorted, because your algorithm is probably optimal in that case.
To optimize it, my approach was to optimize the sort itself, eliminating the chunks that can be combined during the sort, thus making the sort faster for the remaining elements.
The code below is basically a modified version of bubble sort. I also implemented your solution using std::sort, just for comparison.
The results are surprisingly good using my algorithm. For a data set of 10 million chunks, the combined sort with the merge of chunks performs 20 times faster.
The output of the code is below (algo 1 is std::sort followed by merging consecutive elements, algo 2 is the sort optimized by removing the chunks that can be merged during the sort):
generating input took: 00:00:19.655999
algo 1 took 00:00:00.968738
initial chunks count: 10000000, output chunks count: 3332578
algo 2 took 00:00:00.046875
initial chunks count: 10000000, output chunks count: 3332578
You can probably improve it further using a better sort algo like introsort.
full code:
#include <vector>
#include <map>
#include <set>
#include <iostream>
#include <algorithm> // std::sort
#include <cstdlib>   // rand
#include <boost/date_time.hpp>
#define CHUNK_COUNT 10000000
struct Chunk // Chunk of memory
{
char *begin, *end; // [begin, end) range
bool operator<(const Chunk& rhs) const
{
return begin < rhs.begin;
}
};
std::vector<Chunk> in;
void generate_input_data()
{
std::multimap<int, Chunk> input_data;
Chunk chunk;
chunk.begin = 0;
chunk.end = 0;
for (int i = 0; i < CHUNK_COUNT; ++i)
{
int continuous = rand() % 3; // 66% chance of a chunk being continuous
if (continuous)
chunk.begin = chunk.end;
else
chunk.begin = chunk.end + rand() % 100 + 1;
int chunk_size = rand() % 100 + 1;
chunk.end = chunk.begin + chunk_size;
input_data.insert(std::multimap<int, Chunk>::value_type(rand(), chunk));
}
// now we have the chunks randomly ordered in the map
// will copy them in the input vector
for (std::multimap<int, Chunk>::const_iterator it = input_data.begin(); it != input_data.end(); ++it)
in.push_back(it->second);
}
void merge_chunks_sorted(std::vector<Chunk>& chunks)
{
if (in.empty())
return;
std::vector<Chunk> res;
Chunk ch = in[0];
for (size_t i = 1; i < in.size(); ++i)
{
if (in[i].begin == ch.end)
{
ch.end = in[i].end;
} else
{
res.push_back(ch);
ch = in[i];
}
}
res.push_back(ch);
chunks = res;
}
void merge_chunks_orig_algo(std::vector<Chunk>& chunks)
{
std::sort(in.begin(), in.end());
merge_chunks_sorted(chunks);
}
void merge_chunks_new_algo(std::vector<Chunk>& chunks)
{
size_t new_last_n = 0;
Chunk temp;
do {
int last_n = new_last_n;
new_last_n = chunks.size() - 1;
for (int i = chunks.size() - 2; i >= last_n; --i)
{
if (chunks[i].begin > chunks[i + 1].begin)
{
if (chunks[i].begin == chunks[i + 1].end)
{
chunks[i].begin = chunks[i + 1].begin;
if (i + 1 != chunks.size() - 1)
chunks[i + 1] = chunks[chunks.size() - 1];
chunks.pop_back();
} else
{
temp = chunks[i];
chunks[i] = chunks[i + 1];
chunks[i + 1] = temp;
}
new_last_n = i + 1;
} else
{
if (chunks[i].end == chunks[i + 1].begin)
{
chunks[i].end = chunks[i + 1].end;
if (i + 1 != chunks.size() - 1)
chunks[i + 1] = chunks[chunks.size() - 1];
chunks.pop_back();
}
}
}
} while (new_last_n < chunks.size() - 1);
}
void run_algo(void (*algo)(std::vector<Chunk>&))
{
static int count = 1;
// allowing the algo to modify the input vector is intentional
std::vector<Chunk> chunks = in;
size_t in_count = chunks.size();
boost::posix_time::ptime start = boost::posix_time::microsec_clock::local_time();
algo(chunks);
boost::posix_time::ptime stop = boost::posix_time::microsec_clock::local_time();
std::cout<<"algo "<<count++<<" took "<<stop - start<<std::endl;
// if all went ok, statistically we should have around 33% of the original chunks count in the output vector
std::cout<<" initial chunks count: "<<in_count<<", output chunks count: "<<chunks.size()<<std::endl;
}
int main()
{
boost::posix_time::ptime start = boost::posix_time::microsec_clock::local_time();
generate_input_data();
boost::posix_time::ptime stop = boost::posix_time::microsec_clock::local_time();
std::cout<<"generating input took:\t"<<stop - start<<std::endl;
run_algo(merge_chunks_orig_algo);
run_algo(merge_chunks_new_algo);
return 0;
}
I've seen below that you mention n is not that high, so I reran the test with 1000 chunks and 1,000,000 runs to make the times significant. The modified bubble sort still performs 5 times better; basically, for 1000 chunks a single run takes about 3 microseconds. Numbers below.
generating input took: 00:00:00
algo 1 took 00:00:15.343456, for 1000000 runs
initial chunks count: 1000, output chunks count: 355
algo 2 took 00:00:03.374935, for 1000000 runs
initial chunks count: 1000, output chunks count: 355
Add pointers to the chunk struct for the previous and next adjacent chunk in contiguous memory, if such a chunk exists, null otherwise. When a chunk is released, you check whether the adjacent chunks are free, and if they are, you merge them and update the prev->next and next->prev pointers. This procedure is O(1), and you do it each time a chunk is released.
Some memory allocators put the sizes of the current and previous chunk at the memory position immediately before the address returned by malloc. It is then possible to calculate the offset to the adjacent chunks without explicit pointers.
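Roughly, the boundary-tag idea looks like this, assuming every block is prefixed by a small header that stores its own size and the size of the block physically before it (field names are purely illustrative):
#include <cstddef>

// Hypothetical boundary-tag header placed immediately before the payload
// returned to the user.
struct BlockHeader
{
    std::size_t prev_size; // full size of the physically preceding block (0 if none)
    std::size_t size;      // full size of this block, including the header
    bool        free;
};

// Header of the block whose payload starts at p.
inline BlockHeader* header_of(void* p)
{
    return reinterpret_cast<BlockHeader*>(static_cast<char*>(p) - sizeof(BlockHeader));
}

// Physically adjacent neighbours, computed from the stored sizes alone.
inline BlockHeader* right_neighbour(BlockHeader* h)
{
    return reinterpret_cast<BlockHeader*>(reinterpret_cast<char*>(h) + h->size);
}
inline BlockHeader* left_neighbour(BlockHeader* h)
{
    return h->prev_size
        ? reinterpret_cast<BlockHeader*>(reinterpret_cast<char*>(h) - h->prev_size)
        : nullptr;
}
A real allocator also has to know where the heap ends and keep the right neighbour's prev_size up to date after every merge; this only shows how the stored sizes replace explicit pointers.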
The following doesn't require sorted input or provide sorted output. Treat the input as a stack. Pop a chunk off and check if it is adjacent to a member of the initially empty output set. If not, add it to the output set. If it is adjacent, remove the adjacent chunk from the output set and push the new combined chunk onto the input stack. Repeat until input is empty.
vector<Chunk> unify_contiguous(vector<Chunk> input)
{
vector<Chunk> output;
unordered_map<Ptr, Chunk> begins; // output chunks keyed by begin
unordered_map<Ptr, Chunk> ends;   // the same chunks keyed by end
while (!input.empty())
{
// pop chunk from input
auto chunk = input.back();
input.pop_back();
// chunk end adjacent to an output begin?
auto it = begins.find(chunk.end);
if (it != begins.end())
{
auto end = it->second.end;
Chunk combined{chunk.begin, end};
ends.erase(end);
begins.erase(it);
input.push_back(combined);
continue;
}
// chunk begin adjacent to an output end?
it = ends.find(chunk.begin);
if (it != ends.end())
{
auto begin = it->second.begin;
Chunk combined{begin, chunk.end};
begins.erase(begin);
ends.erase(it);
input.push_back(combined);
continue;
}
// if not add chunk to output
begins[chunk.begin] = chunk;
ends[chunk.end] = chunk;
}
// collect output
for (auto kv : begins)
output.push_back(kv.second);
return output;
}
Related
Recently, I asked for a code review of my Graph class, where I use std::pmr::monotonic_buffer_resource with std::pmr::vector of pointers to Node, Edge and Pin classes.
struct Node;
struct Edge;
struct Pin;
struct Graph {
//Same approach with ArenaAllocate is used for allocating edges and pins
Node* AddNode(const std::string_view name)
{
auto node = ArenaAllocate<Node>(*this, name);
nodes.push_back(node);
return node;
}
template<typename T, size_t alignment = alignof(T), typename ... Args>
[[nodiscard]] T* ArenaAllocate(Args&& ... args)
{
void* const p = arena.allocate(sizeof(T), alignment);
return p ? new(p) T(std::forward<Args>(args)...) : nullptr;
}
std::pmr::monotonic_buffer_resource arena{ 1024 * 1024 };
std::pmr::vector<Node*> nodes{&arena};
std::pmr::vector<Pin*> pins{&arena};
std::pmr::vector<Edge*> edges{&arena};
};
and the answer was to use std::pmr::deque or std::pmr::list instead of std::pmr::vector, in order to have stable memory addresses and store objects instead of pointers.
https://codereview.stackexchange.com/a/281841/266548 -> see the first statement
So I did a quick benchmark. I kept the more or less original Node implementation, because its size may be important:
#include <benchmark/benchmark.h>
#include <memory_resource>
#include <deque>
#include <vector>
#include <string>
#include <string_view>
#include <cstdint>
struct Node
{
Node(const std::size_t id, std::string_view name)
: name(name), id(id)
{}
std::vector<std::size_t> inputs{ 1, 2, 3, 4, 5 };
std::vector<std::size_t> outputs{ 5, 4, 3, 2, 1 };
const std::string name;
std::uint32_t refCount = 0;
const std::size_t id = 0;
bool target = false;
};
static constexpr size_t count = 10000;
static void BM_PMRVectorIterator(benchmark::State& state) {
std::pmr::monotonic_buffer_resource arena{ 1024 * 1024 };
std::pmr::vector<Node*> nodes{ &arena };
for (std::size_t i = 0; i < count; ++i) {
void* const p = arena.allocate(sizeof(Node), alignof(Node));
auto node = new (p) Node(i, "hello");
nodes.push_back(node);
}
for (auto _ : state) {
for (int i = 0; i < count; ++i) {
nodes[i]->refCount = i;
nodes[i]->refCount++;
}
}
}
BENCHMARK(BM_PMRVectorIterator);
static void BM_PMRDequeIterator(benchmark::State& state) {
std::pmr::monotonic_buffer_resource arena{ 1024 * 1024 };
std::pmr::deque<Node> nodes(&arena);
for (int i = 0; i < count; ++i) nodes.emplace_back(i, "Hello");
for (auto _ : state) {
for (int i = 0; i < count; ++i) {
nodes[i].refCount = i;
nodes[i].refCount++;
}
}
}
BENCHMARK(BM_PMRDequeIterator);
BENCHMARK_MAIN();
and iterating over std::pmr::deque<Node> is much slower:
---------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------
BM_PMRVectorIterator 131711 ns 92076 ns 11200
BM_PMRDequeIterator 1288377 ns 1293945 ns 640
I don't know what to make of it, because both std::pmr::vector<Node*> and std::pmr::deque<Node> change very little in how the code is used, but these benchmarks scared me.
The linear allocator should ensure that the Node objects are allocated next to each other, I guess, so there shouldn't be cache misses.
I want to build a graphical editor for these nodes, pins and edges, so I can't avoid iterating over nodes, pins and edges.
What should I do to have a fast and memory-friendly system?
I'm looking for an implementation of a stack-allocated 2D array (an array of arrays) which supports O(1) reads. You can guess what I mean from the picture below: black are filled entries, white are possible gaps left by erasure.
The implementation should allow fast O(1) random access to each element, but also allow insertion and erase operations which do not shift around too many elements (there might be string objects located there).
Information for each array is held in an object like this:
struct array
{
iterator begin_;
iterator end_;
array* next_;
array* prev_;
};
It contains information on where this particular array starts and which arrays are its memory neighbours (prev_ and next_).
I'm looking for a proven, battle-hardened algorithm for insertion and erasure that I can rely on. I have tried constructing a few on my own, but they tend to become very complicated very quickly.
Hurdles:
When arrays are shifted, each affected array needs to somehow receive the memo (i.e. adapt its begin and end pointers).
The array objects will themselves be located in an array. This means that with every additional data member of struct array, the memory requirements of the whole thing grow by member_size * 2d_array_size.
I'm open for all suggestions!
I am thinking of an idea where we segment the storage into segments of size n; the entire buffer size will be a multiple of n.
When a new array is initialized, we allocate a segment to it. Normal array operations can be performed there. When it needs more space, it requests one more segment, and if more segment space is available we allocate it to that array to extend it.
In this case, the minimum length of an array cannot go below the segment size n. This size n can be fine-tuned as required for better space efficiency and utilization.
Each segment is numbered, so we can calculate the index of an element and fetch it in O(1).
Sample program (In Python):
class segment:
def __init__(self,number):
self.number=number
class storage:
def __init__(self):
self.size=100
self.default_value=None
self.array=[self.default_value]*self.size
self.segment_size=5
self.number_of_segments =len(self.array)//self.segment_size
self.segment_map=[None]*self.number_of_segments
def store(self,index,value):
if index<self.size and index>=0:
self.array[index]=value
def get(self,index):
return self.array[index]
def get_next_segment(self):
new_seg=None
for i,seg in enumerate(self.segment_map):
if seg == self.default_value:
new_seg= segment(i)
break
self.occupy_segment(new_seg)
self.clean_segment(new_seg)
return new_seg
def occupy_segment(self,seg):
self.segment_map[seg.number]=True
def free_segment(self,seg):
self.segment_map[seg.number]=self.default_value
def destroy_segment(self,seg):
self.clean_segment(seg)
self.free_segment(seg)
def clean_segment(self,segment):
if segment==None:
return
segment_start_index=((segment.number) * (self.segment_size)) + 0
segment_end_index=((segment.number) * (self.segment_size)) + self.segment_size
for i in range(segment_start_index,segment_end_index):
self.store(i,self.default_value)
class array:
def __init__(self,storage):
self.storage=storage
self.length=0
self.segments=[]
self.add_new_segment()
def add_new_segment(self):
new_segment=self.storage.get_next_segment()
if new_segment==None:
raise Exception("Out of storage")
self.segments.append(new_segment)
def is_full(self):
return self.length!=0 and self.length%self.storage.segment_size==0
def calculate_storage_index_of(self,index):
segment_number=index//self.storage.segment_size
element_position=index%self.storage.segment_size
return self.segments[segment_number].number * self.storage.segment_size + element_position
def add(self,value):
if self.is_full():
self.add_new_segment()
last_segement=self.segments[-1]
element_position=0
if self.length!=0:
element_position=(self.length%self.storage.segment_size)
index=(last_segement.number*self.storage.segment_size)+element_position
self.__store(index,value)
self.length+=1
def __store(self,index,value):
self.storage.store(index,value)
def update(self,index,value):
self.__store(
self.calculate_storage_index_of(index),
value
)
def get(self,index):
return self.storage.get(self.calculate_storage_index_of(index))
def destroy(self):
for seg in self.segments:
self.storage.destroy_segment(seg)
st=storage()
array1=array(st)
array1.add(3)
Hi, I did not have enough time to implement a full solution (and no time to complete it), but here is the direction I was taking (live demo: https://onlinegdb.com/sp7spV_Ui):
#include <cassert>
#include <array>
#include <stdexcept>
#include <vector>
#include <iostream>
namespace meta_array
{
template<typename type_t, std::size_t N> class meta_array_t;
// template internal class not for public use.
namespace details
{
// a block contains the meta information on a subarray within the meta_array
template<typename type_t, std::size_t N>
class meta_array_block_t
{
public:
// the iterator within a block is of the same type as that of the containing array
using iterator_t = typename std::array<type_t, N>::iterator;
/// <summary>
///
/// </summary>
/// <param name="parent">parent, this link is needed if blocks need to move within the parent array</param>
/// <param name="begin">begin iterator of the block</param>
/// <param name="end">end iterator of the block (one past last)</param>
/// <param name="size">cached size (to not have to calculate it from iterator differences)</param>
meta_array_block_t(meta_array_t<type_t, N>& parent, const iterator_t& begin, const iterator_t& end, std::size_t size) :
m_parent{ parent },
m_begin{ begin },
m_end{ end },
m_size{ size }
{
}
// the begin and end methods allow a block to be used in a range based for loop
iterator_t begin() const noexcept
{
return m_begin;
}
iterator_t end() const noexcept
{
return m_end;
}
// operation to shrink the size of the last free block in the meta-array
void move_begin(std::size_t n) noexcept
{
assert(n <= m_size);
m_size -= n;
m_begin += n;
}
// operation to move a block n items back in the meta array
void move_to_back(std::size_t n) noexcept
{
m_begin += n;
m_end += n;
}
std::size_t size() const noexcept
{
return m_size;
}
// assign a new array to the sub array
// if the new array is bigger then the array that is already there
// then move the blocks after it toward the end of the meta-array
template<std::size_t M>
meta_array_block_t& operator=(const type_t(&values)[M])
{
// move all other sub-arrays back if the new sub-array is bigger
// if it is smaller then adjusting the end iterator of the block is fine
if (M > m_size)
{
m_parent.move_back(m_end, M - m_size);
}
m_size = M;
// copy will do the right thing (copy from back to front) if needed
std::copy(std::begin(values), std::end(values), m_begin);
m_end = m_begin + m_size;
return *this;
}
private:
meta_array_t<type_t, N>& m_parent;
std::size_t m_index;
iterator_t m_begin;
iterator_t m_end;
std::size_t m_size;
};
} // details
//---------------------------------------------------------------------------------------------------------------------
//
template<typename type_t, std::size_t N>
class meta_array_t final
{
public:
meta_array_t() :
m_free_size{ N },
m_size{ 0ul },
m_last_free_block{ *this, m_buffer.begin(), m_buffer.end(), N }
{
}
~meta_array_t() = default;
// meta_array is non copyable & non moveable
meta_array_t(const meta_array_t&) = delete;
meta_array_t operator=(const meta_array_t&) = delete;
meta_array_t(meta_array_t&&) = delete;
meta_array_t operator=(meta_array_t&&) = delete;
// return the number of subarrays
std::size_t array_count() const noexcept
{
return m_size;
}
// return number of items that can still be allocated
std::size_t free_size() const noexcept
{
return m_free_size;
}
template<std::size_t M>
std::size_t push_back(const type_t(&values)[M])
{
auto block = allocate(M);
std::copy(std::begin(values), std::end(values), block.begin());
return m_blocks.size();
}
auto begin()
{
return m_blocks.begin();
}
auto end()
{
return m_blocks.end();
}
auto& operator[](const std::size_t index)
{
assert(index < m_size);
return m_blocks[index];
}
private:
friend class details::meta_array_block_t<type_t, N>;
void move_back(typename std::array<type_t, N>::iterator begin, std::size_t offset)
{
std::copy(begin, m_buffer.end() - offset - 1, begin + offset);
// update block administation
for (auto& block : m_blocks)
{
if (block.begin() >= begin )
{
block.move_to_back(offset);
}
}
}
auto allocate(std::size_t size)
{
if ((size == 0ul) || (size > m_free_size)) throw std::bad_alloc();
if (m_last_free_block.size() < size)
{
compact();
}
m_blocks.push_back({ *this, m_last_free_block.begin(), m_last_free_block.begin() + size, size });
m_last_free_block.move_begin(size);
m_free_size -= size;
m_size++;
return m_blocks.back();
}
void compact()
{
assert(false); // not implemented yet
// todo when a gap is found between 2 sub-arrays (compare begin/end iterators) then move
// the next array to the front
// the array after that will move to the front by the sum of the gaps ... etc...
}
std::array<type_t, N> m_buffer;
std::vector<details::meta_array_block_t<type_t,N>> m_blocks;
details::meta_array_block_t<type_t,N> m_last_free_block;
std::size_t m_size;
std::size_t m_free_size;
};
} // meta_array
//---------------------------------------------------------------------------------------------------------------------
#define ASSERT_TRUE(x) assert(x);
#define ASSERT_FALSE(x) assert(!x);
#define ASSERT_EQ(x,y) assert(x==y);
static constexpr std::size_t test_buffer_size = 16;
template<typename type_t, std::size_t N>
void show_arrays(meta_array::meta_array_t<type_t, N>& meta_array)
{
std::cout << "\n--- meta_array ---\n";
for (const auto& sub_array : meta_array)
{
std::cout << "sub array = ";
auto comma = false;
for (const auto& value : sub_array)
{
if (comma) std::cout << ", ";
std::cout << value;
comma = true;
}
std::cout << "\n";
}
}
void test_construction()
{
meta_array::meta_array_t<int, test_buffer_size> meta_array;
ASSERT_EQ(meta_array.array_count(),0ul);
ASSERT_EQ(meta_array.free_size(),test_buffer_size);
}
void test_push_back_success()
{
meta_array::meta_array_t<int, test_buffer_size> meta_array;
meta_array.push_back({ 1,2,3 });
meta_array.push_back({ 14,15 });
meta_array.push_back({ 26,27,28,29 });
ASSERT_EQ(meta_array.array_count(),3ul); // cont
ASSERT_EQ(meta_array.free_size(),(test_buffer_size-9ul));
}
void test_range_based_for()
{
meta_array::meta_array_t<int, test_buffer_size> meta_array;
meta_array.push_back({ 1,2,3 });
meta_array.push_back({ 14,15 });
meta_array.push_back({ 26,27,28,29 });
show_arrays(meta_array);
}
void test_assignment()
{
meta_array::meta_array_t<int, test_buffer_size> meta_array;
meta_array.push_back({ 1,2,3 });
meta_array.push_back({ 4,5,6 });
meta_array.push_back({ 7,8,9 });
meta_array[0] = { 11,12 }; // replace with a smaller array then what there was
meta_array[1] = { 21,22,23,24 }; // replace with a bigger array then there was
show_arrays(meta_array);
}
//---------------------------------------------------------------------------------------------------------------------
int main()
{
test_construction();
test_push_back_success();
test_range_based_for();
test_assignment();
return 0;
}
What is the problem and how do I fix it?
Without trying to squeeze in a custom allocator, my vector, at first glance, works correctly.
I would also be happy if you pointed out any mistakes in my code, or showed an example of a correct implementation of a custom vector or another container; that would help me.
This code doesn't work:
using MyLib::Vector;
int main()
{
//Vector<int> v; //Works fine
Vector<int> v(CustomAllocator<int>());
for (size_t i = 0; i < 256; i++) {
v.push_back(i); //Error: "expression must have class type"
}
}
CustomAllocator implementation (it should be fine):
template <typename T>
class CustomAllocator : public std::allocator<T>
{
private:
using Base = std::allocator<T>;
public:
T* allocate(size_t count){
std::cout << ">> Allocating " << count << " elements" << std::endl;
return Base::allocate(count);
}
T* allocate(size_t count, const void* p)
{
std::cout << ">> Allocating " << count << " elements" << std::endl;
return Base::allocate(count, p);
}
void deallocate(T* p, size_t count)
{
if (p != nullptr)
{
std::cout << ">> Deallocating " << count << " elements" << std::endl;
Base::deallocate(p, count);
}
}
};
Vector implementation:
namespace MyLib
{
template <typename T,
template <typename Y> class Allocator = std::allocator>
class Vector
{
private:
std::size_t capacityV;
std::size_t sizeV;
Allocator<T> alloc;
T* arr;
public:
typedef Allocator<T> AllocatorType;
typedef Vector<T, Allocator> VectorType;
using AllocTraits = std::allocator_traits<Allocator<T>>;
public:
explicit Vector(const AllocatorType& allocator = AllocatorType()) {
capacityV = 0;
sizeV = 0;
alloc = allocator;
arr = nullptr;
}
Vector(const std::initializer_list<T>& values,
const AllocatorType& allocator = AllocatorType()) {
sizeV = values.size();
alloc = allocator;
if (sizeV < 128)
capacityV = 128;
else
capacityV = (sizeV / 128) * 256; //that makes sense
arr = AllocTraits::allocate(alloc, capacityV);
AllocTraits::construct(alloc, arr, capacityV);
std::copy(values.begin(), values.end(), arr);
}
~Vector() {
if (arr)
AllocTraits::deallocate(alloc, arr, capacityV);
}
Vector(const Vector& rhs) {
capacityV = rhs.capacityV;
sizeV = rhs.sizeV;
arr = AllocTraits::allocate(alloc, capacityV);
std::copy(rhs.arr, rhs.arr + rhs.sizeV, arr);
}
Vector(Vector&& rhs) noexcept {
capacityV = std::move(rhs.capacityV);
sizeV = std::move(rhs.sizeV);
arr = std::move(rhs.arr);
}
Vector& operator = (const Vector& rhs) {
capacityV = rhs.capacityV;
sizeV = rhs.sizeV;
arr = AllocTraits::allocate(alloc, capacityV);
std::copy(rhs.arr, rhs.arr + rhs.sizeV, arr);
}
Vector& operator = (Vector&& rhs) {
capacityV = std::move(rhs.capacityV);
sizeV = std::move(rhs.sizeV);
arr = std::move(rhs.arr);
}
T& operator [](std::size_t i) noexcept {
if (i < sizeV)
return arr[i];
else
throw std::out_of_range("Wrong index!");
}
const T& operator [](std::size_t i) const noexcept {
if (i < sizeV)
return arr[i];
else
throw std::out_of_range("Wrong index!");
}
T* data() noexcept {
return arr;
}
const T* data() const noexcept {
return arr;
}
void push_back(const T& value) {
++sizeV;
if (!arr) {
if (!capacityV)
capacityV = 128;
arr = AllocTraits::allocate(alloc, capacityV);
}
if (sizeV > capacityV) {
if(capacityV > UINT32_MAX - 256)
throw std::runtime_error("Vector overflowed!");
size_t tmpCap = capacityV;
capacityV = (sizeV / 128) * 256; // Increase capacityV
T* buf = AllocTraits::allocate(alloc, capacityV);
std::move(arr, arr + sizeV - 1, buf);
AllocTraits::deallocate(alloc, arr, tmpCap); // free the old buffer using its old capacity
arr = buf;
}
arr[sizeV - 1] = value;
}
void push_back(T&& value) {
++sizeV;
if (!arr) {
if (!capacityV)
capacityV = 128;
arr = AllocTraits::allocate(alloc, capacityV);
}
if (sizeV > capacityV) {
if (capacityV > UINT32_MAX - 256)
throw std::runtime_error("Vector overflowed!");
size_t tmpCap = capacityV;
capacityV = (sizeV / 128) * 256; // Increase capacityV
T* buf = AllocTraits::allocate(alloc, capacityV);
std::move(arr, arr + sizeV - 1, buf);
AllocTraits::deallocate(alloc, arr, tmpCap); // free the old buffer using its old capacity
arr = buf;
}
arr[sizeV - 1] = std::move(value);
}
void pop_back() {
--sizeV;
}
void resize(std::size_t size) {
if (this->sizeV == size)
return;
if (this->sizeV > size) {
this->sizeV = size;
}
else {
size_t tmpSize = size;
if (capacityV >= size) {
this->sizeV = size;
for (size_t i = tmpSize - 1; i < this->sizeV; i++)
arr[i] = 0;
}
else {
size_t tmpCap = capacityV;
capacityV = (size / 128) * 256; //that makes sense
T* buf = AllocTraits::allocate(alloc, capacityV);
std::move(arr, arr + sizeV - 1, buf);
AllocTraits::deallocate(alloc, arr, tmpCap); // free the old buffer using its old capacity
arr = buf;
this->sizeV = size;
for (size_t i = tmpSize - 1; i < this->sizeV; i++)
arr[i] = 0;
}
}
}
void reserve(std::size_t capacity) {
if (capacity > this->capacityV)
{
size_t tmpCap = this->capacityV; // remember the old capacity
this->capacityV = capacity;
T* buf = AllocTraits::allocate(alloc, capacityV);
std::move(arr, arr + sizeV - 1, buf);
AllocTraits::deallocate(alloc, arr, tmpCap); // free the old buffer using its old capacity
arr = buf;
}
}
std::size_t size() const noexcept {
return sizeV;
}
std::size_t capacity() const noexcept {
return capacityV;
}
bool empty() const noexcept {
return (sizeV == 0);
}
};
}
Vector<int> v(CustomAllocator<int>());
You got hit by the most vexing parse. Vector<int> v(CustomAllocator<int>()); can be parsed either as a variable declaration or as a function declaration, and the grammar prefers the latter. Therefore, the compiler thinks that v is a function, and this is why you get the "expression must have class type" error: you can only invoke methods on values with a class type, but v is a function.
Even if you fixed that error using one of these options:
// C++03 solution (extra parens)
Vector<int> v((CustomAllocator<int>()));
// C++11 solution (uniform initialization)
Vector<int> v{CustomAllocator<int>{}};
Your code still wouldn't do what you expected, though it would run. Vector<int> is the same thing as Vector<int, std::allocator>, so v will still use the standard allocator.
Why doesn't this cause a compilation error? Because CustomAllocator<int> inherits from std::allocator<int> (which it shouldn't!), so the std::allocator<int> copy constructor is used to slice your custom allocator down to an std::allocator<int>, and the program then proceeds using the standard allocator. Your CustomAllocator<int> temporary is basically converted into an std::allocator<int>.
To illustrate, both of the above "fixed" examples are roughly equivalent to this code (if we disregard some value copies/moves that are irrelevant to the program's observable behavior):
// Creates a custom allocator value.
CustomAllocator<int> a;
// Custom allocator is converted to std::allocator<int>.
std::allocator<int> b(a);
// std::allocator<int> is provided to the Vector constructor.
Vector<int> v(b);
The correct fix is to specify the second Vector type parameter and then the constructor arguments aren't even needed since the default constructor will do the right thing:
Vector<int, CustomAllocator> v;
I am trying to implement a lock-free stack that is usable with externally managed memory from a bounded plain C array. I know the reference implementations (e.g. from Anthony Williams: Concurrency in Action) and other books and blogs/articles around the web.
The implementation follows those references and avoids the ABA problem, because external memory locations are addressed using unique indexes rather than recycled pointers. Therefore it does not need to deal with memory management at all, and it is simple.
I wrote some tests that execute pop and push operations on that stack under high load and contention (stress tests) and single-threaded. The former fail with strange problems that I do not understand and that look obscure to me.
Maybe someone has an idea?
Problem: Pushing an already popped node back onto the stack fails, because the precondition that the node has no successor (next) is violated.
BOOST_ASSERT(!m_aData.m_aNodes[nNode-1].next);
Reproduction setup: At least 3 threads and a capacity of ~16. Around 500 passes. Then push op fails.
Problem: The number of elements popped by all threads plus the number of elements left in the stack after the join does not match the capacity (nodes are lost in transition).
BOOST_ASSERT(aNodes.size()+nPopped == nCapacity);
Reproduction setup: 2 threads and capacity 2. It requires a lot of passes to occur, for me at least 700. After that, the head of the stack is 0, but only one node is present in the popped container. Node {2,0} is dangling.
I compiled with VS2005, VS2013 and VS2015. All show the same problem (VS2005 is also the reason the code looks C++03-like).
Here is the basic code for the node and the stack:
template <typename sizeT> struct node
{
sizeT cur; //!< construction invariant
atomic<sizeT> next;
atomic<sizeT> data;
explicit node() // invalid node
: cur(0), next(0), data(0)
{}
explicit node(sizeT const& nCur, sizeT const& nNext, sizeT const& nData)
: cur(nCur), next(nNext), data(nData)
{}
node& operator=(node const& rhs)
{
cur = rhs.cur;
next.store(rhs.next.load(memory_order_relaxed));
data.store(rhs.data.load(memory_order_relaxed));
return *this;
}
};
template <typename sizeT> struct stack
{
private:
static memory_order const relaxed = memory_order_relaxed;
atomic<sizeT> m_aHead;
public:
explicit stack(sizeT const& nHead) : m_aHead(nHead) {}
template <typename tagT, typename T, std::size_t N>
typename enable_if<is_same<tagT,Synchronized>,sizeT>::type
pop(T (&aNodes)[N])
{
sizeT nOldHead = m_aHead.load();
for(;;)
{
if(!nOldHead) return 0;
BOOST_ASSERT(nOldHead <= N);
T& aOldHead = aNodes[nOldHead-1];
sizeT const nNewHead = aOldHead.next.load(/*relaxed*/);
BOOST_ASSERT(nNewHead <= N);
sizeT const nExpected = nOldHead;
if(m_aHead.compare_exchange_weak(nOldHead,nNewHead
/*,std::memory_order_acquire,std::memory_order_relaxed*/))
{
BOOST_ASSERT(nExpected == nOldHead);
// <--- from here on aOldHead is thread local ---> //
aOldHead.next.store(0 /*,relaxed*/);
return nOldHead;
}
// TODO: add back-off strategy under contention (use loop var)
}
}
template <typename tagT, typename T, std::size_t N>
typename enable_if<is_same<tagT,Synchronized>,void>::type
push(T (&aNodes)[N], sizeT const& nNewHead)
{
#ifndef NDEBUG
{
BOOST_ASSERT(0 < nNewHead && nNewHead <= N);
sizeT const nNext = aNodes[nNewHead-1].next;
BOOST_ASSERT(!nNext);
}
#endif
sizeT nOldHead = m_aHead.load(/*relaxed*/);
for(;;)
{
aNodes[nNewHead-1].next.store(nOldHead /*,relaxed*/);
sizeT const nExpected = nOldHead;
BOOST_ASSERT(nOldHead <= N);
if(m_aHead.compare_exchange_weak(nOldHead,nNewHead
/*,std::memory_order_release,std::memory_order_relaxed*/))
{
BOOST_ASSERT(nExpected == nOldHead);
return;
}
// TODO: add back-off strategy under contention (use loop var)
}
}
};
and the quite noisy test class
class StackTest
{
private:
typedef boost::mpl::size_t<64> Capacity;
//typedef boost::uint_t<static_log2_ceil<Capacity::value>::value>::least size_type;
typedef std::size_t size_type;
static size_type const nCapacity = Capacity::value;
static size_type const nNodes = Capacity::value;
typedef node<size_type> Node;
typedef stack<size_type> Stack;
typedef mt19937 Twister;
typedef random::uniform_int_distribution<std::size_t> Distribution;
typedef variate_generator<Twister,Distribution> Die;
struct Data //!< shared along threads
{
Node m_aNodes[nNodes];
Stack m_aStack;
explicit Data() : m_aStack(nNodes)
{
m_aNodes[0] = Node(1,0,0); // tail of stack
for(size_type i=1; i<nNodes; ++i)
{
m_aNodes[i] = Node(static_cast<size_type>(i+1),i,0);
}
}
template <typename syncT>
void Run(
uuids::random_generator& aUUIDGen,
std::size_t const& nPasses,
std::size_t const& nThreads)
{
std::vector<ThreadLocalData> aThreadLocalDatas(nThreads,ThreadLocalData(*this));
{
static std::size_t const N = 100000;
Die aRepetition(Twister(hash_value(aUUIDGen())),Distribution(0,N));
Die aAction(Twister(hash_value(aUUIDGen())),Distribution(0,1));
for(std::size_t i=0; i<nThreads; ++i)
{
std::vector<bool>& aActions = aThreadLocalDatas[i].m_aActions;
std::size_t const nRepetition = aRepetition();
aActions.reserve(nRepetition);
for(std::size_t k=0; k<nRepetition; ++k)
{
aActions.push_back(static_cast<bool>(aAction()));
}
}
}
std::size_t nPopped = 0;
if(nThreads == 1)
{
std::size_t const i = 0;
aThreadLocalDatas[i].Run<syncT>(i);
nPopped += aThreadLocalDatas[i].m_aPopped.size();
}
else
{
std::vector<boost::shared_ptr<thread> > aThreads;
aThreads.reserve(nThreads);
for(std::size_t i=0; i<nThreads; ++i)
{
aThreads.push_back(boost::make_shared<thread>(boost::bind(&ThreadLocalData::Run<syncT>,&aThreadLocalDatas[i],i)));
}
for(std::size_t i=0; i<nThreads; ++i)
{
aThreads[i]->join();
nPopped += aThreadLocalDatas[i].m_aPopped.size();
}
}
std::vector<size_type> aNodes;
aNodes.reserve(nCapacity);
while(size_type const nNode = m_aStack.pop<syncT>(m_aNodes))
{
aNodes.push_back(nNode);
}
std::clog << dump(m_aNodes,4) << std::endl;
BOOST_ASSERT(aNodes.size()+nPopped == nCapacity);
}
};
struct ThreadLocalData //!< local to each thread
{
Data& m_aData; //!< shared along threads
std::vector<bool> m_aActions; //!< either pop or push
std::vector<size_type> m_aPopped; //!< popp'ed nodes
explicit ThreadLocalData(Data& aData)
: m_aData(aData), m_aActions(), m_aPopped()
{
m_aPopped.reserve(nNodes);
}
template <typename syncT>
void Run(std::size_t const& k)
{
BOOST_FOREACH(bool const& aAction, m_aActions)
{
if(aAction)
{
if(size_type const nNode = m_aData.m_aStack.pop<syncT>(m_aData.m_aNodes))
{
BOOST_ASSERT(!m_aData.m_aNodes[nNode-1].next);
m_aPopped.push_back(nNode);
}
}
else
{
if(!m_aPopped.empty())
{
size_type const nNode = m_aPopped.back();
size_type const nNext = m_aData.m_aNodes[nNode-1].next;
ASSERT_IF(!nNext,"nNext=" << nNext << " for " << m_aData.m_aNodes[nNode-1] << "\n\n" << dump(m_aData.m_aNodes));
m_aData.m_aStack.push<syncT>(m_aData.m_aNodes,nNode);
m_aPopped.pop_back();
}
}
}
}
};
template <typename syncT>
static void PushPop(
uuids::random_generator& aUUIDGen,
std::size_t const& nPasses,
std::size_t const& nThreads)
{
BOOST_ASSERT(nThreads > 0);
BOOST_ASSERT(nThreads == 1 || (is_same<syncT,Synchronized>::value));
std::clog << BOOST_CURRENT_FUNCTION << " with threads=" << nThreads << std::endl;
for(std::size_t nPass=0; nPass<nPasses; ++nPass)
{
std::ostringstream s;
s << " " << nPass << "/" << nPasses << ": ...";
std::clog << s.str() << std::endl;
Data().Run<syncT>(aUUIDGen,nPass,nThreads);
}
}
public:
static void Run()
{
typedef StackTest self_t;
uuids::random_generator aUUIDGen;
static std::size_t const nMaxPasses = 1000;
Die aPasses(Twister(hash_value(aUUIDGen())),Distribution(0,nMaxPasses));
{
//std::size_t const nThreads = 2; // thread::hardware_concurrency()+1;
std::size_t const nThreads = thread::hardware_concurrency()+1;
self_t().PushPop<Synchronized>(aUUIDGen,aPasses(),nThreads);
}
}
};
Here is a link to download all required files.
Both problems are just another facet of the ABA problem.
stack: {2,1},{1,0}
Thread A
pop
new_head=1
... time slice exceeded
Thread B
pop
stack: {1,0}, popped: {2,0}
pop
stack: {}, popped: {2,0}, {1,0}
push({2,0})
stack: {2,0}
Thread A
pop continued
cmp_exch succeeds, because head is 2
stack: {}, head=1 --- WRONG, 0 would be correct
All kinds of problems may arise, because access to the nodes is no longer thread-local. This includes unexpected modifications of next for popped nodes (problem 1) or lost nodes (problem 2).
head and next would need to be modified in one cmp_exch to avoid that problem.
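A common way to get the effect of such a combined compare-exchange without a double-width CAS (a sketch of my own, not the original allocator code) is to pack the head index together with a version counter into a single 64-bit word; the stale head held by Thread A in the trace then fails the compare-exchange even though index 2 is back on top:
#include <atomic>
#include <cstdint>
// Sketch: 32-bit node index + 32-bit version counter in one atomic word.
// Any intervening push/pop bumps the version, so a preempted thread's
// compare-exchange can no longer succeed on a "recycled" head index.
struct TaggedHead
{
    std::uint32_t index;   // 1-based node index, 0 == empty
    std::uint32_t version; // incremented on every successful update
};
std::uint32_t pop(std::atomic<TaggedHead>& aHead,
                  std::atomic<std::uint32_t> const* aNext) // aNext[i-1]: next of node i
{
    TaggedHead aOld = aHead.load();
    for (;;)
    {
        if (aOld.index == 0)
            return 0; // empty stack
        TaggedHead aNew{aNext[aOld.index - 1].load(), aOld.version + 1};
        if (aHead.compare_exchange_weak(aOld, aNew))
            return aOld.index; // nobody touched the head in between
        // compare_exchange_weak reloaded aOld with the current value; retry
    }
}
std::atomic<TaggedHead> is lock-free wherever a 64-bit CAS exists; on platforms without one it falls back to a lock, which defeats the purpose.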
I'm coding in C++, and I have the following code:
int array[30];
array[9] = 1;
array[5] = 1;
array[14] = 1;
array[8] = 2;
array[15] = 2;
array[23] = 2;
array[12] = 2;
//...
Is there a way to initialize the array similar to the following?
int array[30];
array[9,5,14] = 1;
array[8,15,23,12] = 2;
//...
Note: In the actual code, there can be up to 30 slots that need to be set to one value.
This function will help make it less painful.
void initialize(int * arr, std::initializer_list<std::size_t> list, int value) {
for (auto i : list) {
arr[i] = value;
}
}
Call it like this.
initialize(array, {9, 5, 14}, 1);
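If you also want some protection against typos in the indexes, a small variant of the same idea (my own sketch, not part of the original answer) takes the array by reference, so its size is known and can be checked in debug builds:
#include <cassert>
#include <cstddef>
#include <initializer_list>
// Sketch: same loop, but the array size N is deduced, so indexes can be asserted.
template <std::size_t N>
void initialize(int (&arr)[N], std::initializer_list<std::size_t> list, int value) {
    for (auto i : list) {
        assert(i < N); // catches out-of-range indexes in debug builds
        arr[i] = value;
    }
}
It is called the same way, e.g. initialize(array, {8, 15, 23, 12}, 2);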
A variant of aaronman's answer:
#include <cstddef>
#include <iostream>
// Base case: no indices left, nothing to do.
template <typename T>
void initialize(T array[], const T& value)
{
}
// Assign value to array[index], then recurse on the remaining indices.
template <std::size_t index, std::size_t... indices, typename T>
void initialize(T array[], const T& value)
{
array[index] = value;
initialize<indices...>(array, value);
}
int main()
{
int array[10];
initialize<0,3,6>(array, 99);
std::cout << array[0] << " " << array[3] << " " << array[6] << std::endl;
}
Just for the fun of it I created a somewhat different approach which needs a bit of infrastructure allowing initialization like so:
double array[40] = {};
"9 5 14"_idx(array) = 1;
"8 15 23 12"_idx(array) = 2;
If the digits need to be separated by commas, only a small change is needed (one possible version is sketched after the code). In any case, here is the complete code:
#include <algorithm>
#include <iostream>
#include <sstream>
#include <iterator>
template <int Size, typename T = int>
class assign
{
int d_indices[Size];
int* d_end;
T* d_array;
void operator=(assign const&) = delete;
public:
assign(char const* base, std::size_t n)
: d_end(std::copy(std::istream_iterator<int>(
std::istringstream(std::string(base, n)) >> std::skipws),
std::istream_iterator<int>(), this->d_indices))
, d_array()
{
}
assign(assign<Size>* as, T* a)
: d_end(std::copy(as->begin(), as->end(), this->d_indices))
, d_array(a) {
}
assign(assign const& o)
: d_end(std::copy(o.begin(), o.end(), this->d_indices))
, d_array(o.d_array)
{
}
int const* begin() const { return this->d_indices; }
int const* end() const { return this->d_end; }
template <typename A>
assign<Size, A> operator()(A* array) {
return assign<Size, A>(this, array);
}
void operator=(T const& value) {
for (auto it(this->begin()), end(this->end()); it != end; ++it) {
d_array[*it] = value;
}
}
};
assign<30> operator""_idx(char const* base, std::size_t n)
{
return assign<30>(base, n);
}
int main()
{
double array[40] = {};
"1 3 5"_idx(array) = 17;
"4 18 7"_idx(array) = 19;
std::copy(std::begin(array), std::end(array),
std::ostream_iterator<double>(std::cout, " "));
std::cout << "\n";
}
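As for the comma-separated form mentioned before the code, my guess at the "small change" (an assumption, not the author's original) is to normalize the literal before parsing it, replacing every ',' with a space inside the parsing constructor (this also needs #include <string>):
// Possible replacement for the char const* constructor of assign above:
// copy the literal into a std::string, turn ',' into ' ', then parse as before.
assign(char const* base, std::size_t n)
    : d_array()
{
    std::string s(base, n);
    std::replace(s.begin(), s.end(), ',', ' '); // std::replace is in <algorithm>
    std::istringstream in(s);
    this->d_end = std::copy(std::istream_iterator<int>(in),
                            std::istream_iterator<int>(),
                            this->d_indices);
}
With that change, "9, 5, 14"_idx(array) = 1; works the same way as the space-separated form.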
I just had a play around for the sake of fun / experimentation (Note my concerns at the bottom of the answer):
It's used like this:
smartAssign(array)[0][8] = 1;
smartAssign(array)[1][4][2] = 2;
smartAssign(array)[3] = 3;
smartAssign(array)[5][9][6][7] = 4;
Source code:
#include <assert.h> //Needed to test variables
#include <iostream>
#include <cstddef>
template <class ArrayPtr, class Value>
class SmartAssign
{
ArrayPtr m_array;
public:
class Proxy
{
ArrayPtr m_array;
size_t m_index;
Proxy* m_prev;
Proxy(ArrayPtr array, size_t index)
: m_array(array)
, m_index(index)
, m_prev(nullptr)
{ }
Proxy(Proxy* prev, size_t index)
: m_array(prev->m_array)
, m_index(index)
, m_prev(prev)
{ }
void assign(Value value)
{
m_array[m_index] = value;
for (auto prev = m_prev; prev; prev = prev->m_prev) {
m_array[prev->m_index] = value;
}
}
public:
void operator=(Value value)
{
assign(value);
}
Proxy operator[](size_t index)
{
return Proxy{this, index};
}
friend class SmartAssign;
};
SmartAssign(ArrayPtr array)
: m_array(array)
{
}
Proxy operator[](size_t index)
{
return Proxy{m_array, index};
}
};
template <class T>
SmartAssign<T*, T> smartAssign(T* array)
{
return SmartAssign<T*, T>(array);
}
int main()
{
int array[10];
smartAssign(array)[0][8] = 1;
smartAssign(array)[1][4][2] = 2;
smartAssign(array)[3] = 3;
smartAssign(array)[5][9][6][7] = 4;
for (auto i : array) {
std::cout << i << "\n";
}
//Now to test the variables
assert(array[0] == 1 && array[8] == 1);
assert(array[1] == 2 && array[4] == 2 && array[2] == 2);
assert(array[3] == 3);
assert(array[5] == 4 && array[9] == 4 && array[6] == 4 && array[7] == 4);
}
Let me know what you think; I don't typically write much code like this, and I'm sure someone will point out some problems somewhere ;)
I'm not 100% certain about the lifetime of the proxy objects (the chained Proxy temporaries should live until the end of the full expression, which is all the assign() call needs).
The best you can do if your indexes are unrelated is "chaining" the assignments:
array[9] = array[5] = array[14] = 1;
However if you have some way to compute your indexes in a deterministic way you could use a loop:
for (size_t i = 0; i < 3; ++i)
array[transform_into_index(i)] = 1;
This last example also obviously applies if you have some container where your indexes are stored. So you could well do something like this:
const std::vector<size_t> indexes = { 9, 5, 14 };
for (auto i: indexes)
array[i] = 1;
With compilers that still don't support variadic template arguments or uniform initialization lists, it can be painful to realize that some of the posted solutions will not work.
Since the OP apparently only intends to work with arrays of numbers, valarray combined with C-style variable arguments can solve this problem quite easily.
#include <valarray>
#include <cstdarg>
#include <iostream>
#include <algorithm>
#include <iterator>
template <std::size_t size >
std::valarray<std::size_t> selection( int first, ... )
{
va_list arguments;
std::valarray<std::size_t> sel(size);
sel[0] = static_cast<std::size_t>(first);
//Read the remaining size-1 indexes from the variable arguments
va_start ( arguments, first );
for(std::size_t i = 1; i < size; ++i)
sel[i] = static_cast<std::size_t>(va_arg ( arguments, int ));
va_end ( arguments );
return sel;
}
int main ()
{
//Create an array of 30 integers
std::valarray<int> array(30);
//The template argument is the count of indexes;
//the function arguments are the indexes of the array to initialize
array[selection<3>(9,5,14)] = 1;
array[selection<4>(8, 15, 23, 12)] = 2;
std::copy(std::begin(array), std::end(array),
std::ostream_iterator<int>(std::cout, " "));
return 0;
}
I remember that for static initialization there is syntax like:
int array[30] = {
[9] = 1, [8] = 2
};
And so on. This works in GCC; I don't know about other compilers.
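For reference, these array designators are standard C99; in C++ they are at best a compiler extension, so take this as a sketch of how the question's example would look under that syntax rather than as portable C++:
int array[30] = {
    [9] = 1, [5] = 1, [14] = 1,
    [8] = 2, [15] = 2, [23] = 2, [12] = 2
};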
Use an overloaded operator<<.
#include <iostream>
#include <iomanip>
#include <cmath>
// value and indexes wrapper
template< typename T, std::size_t ... Ints> struct _s{ T value; };
//deduced value type
template< std::size_t ... Ints, typename T>
constexpr inline _s<T, Ints... > _ ( T const& v )noexcept { return {v}; }
// stored array reference
template< typename T, std::size_t N>
struct _ref
{
using array_ref = T (&)[N];
array_ref ref;
};
//join _s and _ref with << operator.
template<
template< typename , std::size_t ... > class IC,
typename U, std::size_t N, std::size_t ... indexes
>
constexpr _ref<U,N> operator << (_ref<U,N> r, IC<U, indexes...> ic ) noexcept
{
using list = bool[];
return ( (void)list{ false, ( (void)(r.ref[indexes] = ic.value), false) ... }) , r ;
//return r;
}
//helper function, for creating _ref<T,N> from array.
template< typename T, std::size_t N>
constexpr inline _ref<T,N> _i(T (&array)[N] ) noexcept { return {array}; }
int main()
{
int a[15] = {0};
_i(a) << _<0,3,4,5>(7) << _<8,9, 14>( 6 ) ;
for(auto x : a)std::cout << x << " " ;
// 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
//result: 7 0 0 7 7 7 0 0 6 6 0 0 0 0 6
double b[101]{0};
_i(b) << _<0,10,20,30,40,50,60,70,80,90>(3.14)
<< _<11,21,22,23,24,25>(2.71)
<< _<5,15,25,45,95>(1.414) ;
}
A second variant, indexing with an initializer_list of indexes:
#include <cstddef>
#include <initializer_list>
#include <iostream>
struct _i_t
{
int * array;
struct s
{
int* array;
std::initializer_list<int> l;
s const& operator = (int value) const noexcept
{
for(auto i : l )
array[i] = value;
return *this;
}
};
s operator []( std::initializer_list<int> i ) const noexcept
{
return s{array, i};
}
};
template< std::size_t N>
constexpr _i_t _i( int(&array)[N]) noexcept { return {array}; }
int main()
{
int a[15] = {0};
_i(a)[{1,3,5,7,9}] = 7;
for(auto x : a)std::cout << x << ' ';
}
Any fancy trickery you do will be unrolled by the compiler/assembler into exactly what you have. Are you doing this for readability reasons? If your array is already initialized, you can do:
array[8] = array[15] = array[23] = array[12] = 2;
But I stress my point above; it will be transformed into exactly what you have.