Queue Algorithms - C++

I am trying to make a linked-list-based queue for fast operations. Here is what I have:
template< typename T,
          template< typename > class Allocator = default_allocator >
class fast_queue
{
public:
    typedef T  element_type;
    typedef T* element_ptr;
private:
    struct node_type
    {
        element_type val;
        node_type *  next;
    };
    typedef node_type * node_ptr;
    node_ptr parent;
    node_ptr child;
    int m_size;
    typedef Allocator< node_type > allocator_type;
public:
    fast_queue()
        : parent( 0 )
        , child( 0 )
        , m_size( 0 )
    {
    }
    ~fast_queue() {
        this->clear();
    }
    void push(const element_type& i)
    {
        node_ptr n = allocator_type::alloc();
        n->val = i;
        n->next = (node_ptr)0;
        if(child) {
            child->next = n;
        } else {
            parent = n;
        }
        child = n;
        m_size++;
    }
    element_type pop() // precondition: !empty()
    {
        element_type ret = parent->val;
        node_ptr p = parent;
        parent = parent->next;
        m_size--;
        allocator_type::dealloc(p);
        return ret;
    }
    inline int size() const
    {
        return m_size;
    }
    inline bool empty() const
    {
        return (m_size == 0);
    }
    void clear()
    {
        while(!empty()) {
            pop();
        }
        child = 0;
    }
};
Pretty straightforward. What I'm having a problem with is the clear() function:
it seems to take far too long to deallocate all the nodes in the queue (7 seconds).
So the question is: what would be a better algorithm? I tried to understand MSVC's implementation of std::deque, but the code is too bulky for me to follow.
EDIT:
The queue should be generic, allowing arbitrary types of data to be queued.
Here is my testing code (Windows)
DWORD time1 = timeGetTime();
fast_queue<int> queue;
for(int i = 0; i < 100000; i++) {
    queue.push(i);
}
queue.clear();
cout << "Test time: " << (int)(timeGetTime() - time1) << " milliseconds" << endl;

You're constructing a linked list. The deque implementation stores many, many elements per allocation; you, however, allocate and deallocate individually for every single element. That is why your queue is so slow.
In addition, the Standard's queue interface says that you should take a complete Allocator type, not a template template parameter, although in practice fulfilling the Standard's allocator requirements means it must be a template anyway.
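For reference, the standard-style signature would look something like the sketch below: the container takes a complete allocator type for T and rebinds it internally for its node type via std::allocator_traits (`std_style_queue` is a hypothetical name, not the poster's default_allocator interface):

```cpp
#include <cassert>
#include <memory>

// Sketch: a complete allocator type for T (not a template template
// parameter), rebound internally for the private node type.
template <typename T, typename Allocator = std::allocator<T>>
class std_style_queue {
    struct node { T val; node* next; };
    using node_alloc =
        typename std::allocator_traits<Allocator>::template rebind_alloc<node>;
    using node_traits = std::allocator_traits<node_alloc>;
    node_alloc alloc;
    node* head = nullptr;
public:
    void push_front(const T& v) {
        node* n = node_traits::allocate(alloc, 1);
        node_traits::construct(alloc, n, node{v, head});
        head = n;
    }
    T pop_front() {                 // precondition: !empty()
        node* n = head;
        T v = n->val;
        head = n->next;
        node_traits::destroy(alloc, n);
        node_traits::deallocate(alloc, n, 1);
        return v;
    }
    bool empty() const { return head == nullptr; }
    ~std_style_queue() { while (!empty()) pop_front(); }
};
```

This is how std::list, std::queue and friends expose their allocator parameter, which makes the container usable with any standard-conforming allocator (including pool allocators).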

There's not much you can do by changing the push/pop/clear algorithms, because 95% of the time goes to allocating and deallocating the nodes. But there are some things you could do:
1) Use some kind of memory pool for the nodes. You could either use a pool allocator (boost::pool_alloc is a good one if you don't want to implement your own), or you could use an internal node cache in the queue class. Instead of deleting nodes you push them onto the node cache, and when creating nodes you pop them from the cache first.
2) Store multiple items in one node. For example, if you store 8 items per node you only have to allocate/deallocate once every 8 pushes/pops. Of course this requires slightly more complicated code: in addition to pointers to the head and tail nodes, you would also need index variables for both of them to keep track of how many items are actually in use.
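Option 1 can be sketched as an internal free list (`cached_queue` is an illustrative name, not the poster's Allocator interface): popped nodes go onto a cache instead of back to the heap, so a refill of the queue allocates nothing.

```cpp
#include <cassert>
#include <cstddef>

// Sketch of option 1: a FIFO queue that recycles nodes through an
// internal free list instead of freeing them on every pop.
template <typename T>
class cached_queue {
    struct node { T val; node* next; };
    node* head = nullptr;   // front of the queue (popped first)
    node* tail = nullptr;   // back of the queue (pushed last)
    node* cache = nullptr;  // free list of recycled nodes
    std::size_t m_size = 0;

    node* acquire() {                 // reuse a cached node if possible
        if (cache) { node* n = cache; cache = n->next; return n; }
        return new node;
    }
    void release(node* n) {           // push onto the free list, no delete
        n->next = cache;
        cache = n;
    }
public:
    ~cached_queue() {
        while (head)  { node* n = head;  head  = head->next;  delete n; }
        while (cache) { node* n = cache; cache = cache->next; delete n; }
    }
    void push(const T& v) {
        node* n = acquire();
        n->val = v;
        n->next = nullptr;
        if (tail) tail->next = n; else head = n;
        tail = n;
        ++m_size;
    }
    T pop() {                         // precondition: !empty()
        node* n = head;
        T v = n->val;
        head = n->next;
        if (!head) tail = nullptr;
        release(n);                   // recycled, not freed
        --m_size;
        return v;
    }
    std::size_t size() const { return m_size; }
    bool empty() const { return m_size == 0; }
};
```

After the first drain, every subsequent push is served from the cache, so the steady-state cost of push/pop no longer involves the heap at all.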

I had to take out all the Allocator stuff to get it to compile (under g++ 4.4), but it runs in no time at all. Even if I push 100 times as many elements, it only takes about half a second to populate the queue and half a second to clear it. Have you tried just using new and delete?

As suggested by other people, memory allocation/deallocation is the largest performance problem here.
I suggest that you try boost::circular_buffer. If the initial capacity is set high enough, it will only perform one memory allocation over its lifetime.
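If pulling in Boost is not an option, the idea behind a circular buffer (one up-front allocation, head index and count that wrap around) is easy to sketch by hand. `ring_queue` below is a hypothetical name; unlike boost::circular_buffer's default overwrite behavior, this sketch simply refuses pushes when full:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the circular-buffer idea: one allocation up front, then
// push/pop just move indices. Capacity is fixed at construction.
template <typename T>
class ring_queue {
    std::vector<T> buf;
    std::size_t head = 0, count = 0;
public:
    explicit ring_queue(std::size_t capacity) : buf(capacity) {}
    bool push(const T& v) {               // returns false when full
        if (count == buf.size()) return false;
        buf[(head + count) % buf.size()] = v;
        ++count;
        return true;
    }
    T pop() {                             // precondition: !empty()
        T v = buf[head];
        head = (head + 1) % buf.size();
        --count;
        return v;
    }
    std::size_t size() const { return count; }
    bool empty() const { return count == 0; }
};
```

Clearing such a queue is just resetting two indices, which is why this layout avoids the 7-second teardown entirely.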

Related

C++ undefined behavior with too many heap pointer deletions

I wrote a program to create a linked list, and I got undefined behavior (or so I assume, given that the program just stopped without any error) when I increased the size of the list past a certain point and, critically, attempted to delete it (by ending its scope). A basic version of the code is below:
#include <iostream>
#include <memory>

template<typename T> struct Nodeptr;

template<class T>
struct Node {
    Nodeptr<T> next;
    T data;
    Node(const T& data) : data(data) { }
};

template<class T>
struct Nodeptr : public std::shared_ptr<Node<T>> {
    Nodeptr() : std::shared_ptr<Node<T>>(nullptr) { }
    Nodeptr(const T& data) : std::shared_ptr<Node<T>>(new Node<T>(data)) { }
};

template<class T>
struct LinkedList {
    Nodeptr<T> head;
    void prepend(const T& data) {
        auto new_head = Nodeptr<T>(data);
        new_head->next = head;
        head = new_head;
    }
};

int main() {
    int iterations = 10000;
    {
        LinkedList<float> ls;
        std::cout << "START\n";
        for(float k = 0.0f; k < iterations; k++) {
            ls.prepend(k);
        }
        std::cout << "COMPLETE\n";
    }
    std::cout << "DONE\n";
    return 0;
}
Right now, when the code is run, START and COMPLETE are printed, while DONE is not. The program exits beforehand without an error (for some reason).
When I decrease the iteration count to, say, 5000 instead of 10000, it works just fine and DONE is printed. When I delete the curly braces around the LinkedList declaration/testing block (taking it out of its smaller scope, so it is NOT destroyed before DONE is printed), everything works fine and DONE is printed. Therefore, the error must arise in the destruction process, and specifically because of the volume of things being destroyed. There is, however, no error message telling me that there is no more space left on the heap, and 10000 floats seems like awfully little to fill up the heap anyhow. Any help would be appreciated!
Solved! It now works directly off of heap pointers, and I changed Node's destructor to prevent recursive calls to it:
~Node() {
    if(!next) return;
    Node* ptr = next;
    Node* temp = nullptr;
    while(ptr) {
        temp = ptr->next;
        ptr->next = nullptr;
        delete ptr;
        ptr = temp;
    }
}
It is a stack overflow caused by recursive destructor calls.
This is a common issue with smart pointers one should be aware of when writing any deeply-nested data structure.
You need to add an explicit destructor for Node that removes elements iteratively by resetting the smart pointers, starting from the tail of the list. Also, follow the rule of 3/5 and do the same for any other operation that might destroy nodes recursively.
Because this essentially rewrites all object destruction, it does make the use of smart pointers in the first place somewhat questionable, although there is still some benefit in preventing double-delete (and leak) mistakes. It is therefore common to simply not use smart pointers in such a situation at all and instead fall back to raw pointers and manual lifetime management for the data structure's nodes.
Also, there is no need for std::shared_ptr here in any case. There is only one owner per node, so it should be std::unique_ptr. std::shared_ptr has a very significant performance impact and also has the issue of potentially causing leaks if circular references are formed (whether intentionally or not).
I also don't see the point of having Nodeptr as a separate class. It seems to be used merely as an alias for std::make_shared<Node>, which could be achieved with a free function instead. In particular, inheriting from a standard library smart pointer seems dubious; if anything, I would use composition instead.
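A minimal sketch of that advice (hypothetical names; the teardown here walks from the head, which works equally well for a singly linked list): each unique_ptr destroyed inside the loop owns a node whose `next` has already been moved away, so no recursive destructor chain can build up.

```cpp
#include <cassert>
#include <memory>

// Sketch: unique_ptr-owned singly linked list with an iterative destructor.
// Without ~LinkedList, destroying `head` would recurse once per node.
template <typename T>
struct Node {
    T data;
    std::unique_ptr<Node> next;
    explicit Node(const T& d) : data(d) {}
};

template <typename T>
struct LinkedList {
    std::unique_ptr<Node<T>> head;
    void prepend(const T& data) {
        auto n = std::make_unique<Node<T>>(data);
        n->next = std::move(head);
        head = std::move(n);
    }
    ~LinkedList() {
        // Detach one node at a time: the node deleted by each assignment
        // has a null `next`, so its destructor does not recurse.
        while (head)
            head = std::move(head->next);
    }
};
```

With this destructor in place, lists of hundreds of thousands of nodes are destroyed in constant stack space.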

Using Hashtables to get random access to linked list

One of the major drawbacks of linked lists is that access to elements takes linear time. Hashtables, on the other hand, offer access in constant time. However, linked lists have constant insertion and deletion time given an adjacent node in the list. I am trying to construct a FIFO data structure with constant access time and constant insertion/deletion time. I came up with the following code:
unordered_map<string, T*> hashTable;
list<T> linkedList;

T Foo;
linkedList.push_front(Foo);
hashTable.insert(pair<string, T*>("A", &Foo));

T Bar;
linkedList.push_front(Bar);
hashTable.insert(pair<string, T*>("B", &Bar));
However, this code feels like it is really dangerous. The idea was that I could use the hashtable to access any given element in constant time, and since insertion always occurs at the start of the list, and deletion from the end of the list, I can insert and delete elements in constant time. Is there anything inherently poor about the above code? If I wanted to instead store pointers to the nodes in the linkedList to have constant insertion/deletion time from any node would I just store list< T >::iterator*?
If you build a lookup table, you end up with two data structures holding the same data, which does not make any sense.
The best thing you can do is keep your linked list ordered and build a sparse lookup table of some kind to select the start node for a search, in order to amortize some of the run time.
It is unclear how you want to access the list elements. From the code you provided, I assume you want the first pushed node as "A", the second pushed node as "B", etc. Then my question is what happens when we delete the first pushed node? Does the second pushed node become "A"?
If the node identifiers don't change when updating the list, your approach seems alright.
If the node identifiers have to change on list update, then here is a lightweight and limited approach:
const int MAX_ELEMENTS = 3;
vector<int> arr(MAX_ELEMENTS + 1);
int head = 0, tail = 0;

int get_size() {
    if (head > tail)
        return tail + ((int)arr.size() - head);
    return tail - head;
}

void push(int value) {
    if (get_size() == MAX_ELEMENTS) {
        // TODO: handle push to full queue
        cout << "FULL QUEUE\n";
        return;
    }
    arr[tail++] = value;
    if (tail == (int)arr.size()) {
        tail = 0;
    }
}

int pop() {
    if (get_size() == 0) {
        // TODO: handle access to empty queue
        cout << "EMPTY QUEUE\n";
        return -1;
    }
    int result = arr[head++];
    if (head == (int)arr.size()) {
        head = 0;
    }
    return result;
}

int get_item_at(int id) {
    if (id >= get_size()) {
        // TODO: index out of range
        cout << "INDEX OUT OF RANGE\n";
        return -1;
    }
    int actual_id = head + id;
    if (actual_id >= (int)arr.size()) {
        actual_id -= (int)arr.size();
    }
    return arr[actual_id];
}
The above approach keeps indices up to date (e.g. get_item_at(0) will always return the first node in the queue). You can map ids to any suitable identifier you want, like "A" -> 0, "B" -> 1, etc. The limitation of this solution is that you can't store more than MAX_ELEMENTS in the queue.
If I wanted to instead store pointers to the nodes in the linkedList to have constant insertion/deletion time from any node would I just store list< T >::iterator*?
If identifiers must change with insertion/deletion, then it is going to take O(n) time anyways.
This idea makes sense, especially for large lists where order of insertion matters but things might be removed in the middle. What you want to do is create your own data structure:
template<typename T> class constantAccessLinkedList {
public:
    void insert(const T& val) {
        mData.push_back(val); // insert into rear of list, O(1).
        mLookupTable.insert({val, std::prev(mData.end())}); // insert iterator into map, O(1).
    }
    typename std::list<T>::iterator find(const T& val) {
        return mLookupTable[val]; // O(1) lookup time for list member.
    }
    void erase(const T& val) { // `delete` is a keyword, so name it erase.
        auto iter = mLookupTable.find(val); // O(1) get list iterator.
        mData.erase(iter->second);          // O(1) erase from list.
        mLookupTable.erase(iter);           // O(1) remove from LUT.
    }
private:
    std::list<T> mData;
    std::unordered_map<T, typename std::list<T>::iterator> mLookupTable;
};
The reason you can do this with list and unordered_map is that list iterators are not invalidated when the underlying container is modified (only iterators to erased elements are).
This will keep your data in order in the list, but will provide constant access time to iterators, and allow you to remove elements in constant time.
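A self-contained sketch of the same pairing (`keyed_fifo` is a hypothetical name; keys are supplied explicitly rather than derived from the values, and `erase` is used instead of `delete`, which is a reserved word in C++):

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>

// A FIFO with O(1) keyed lookup: std::list keeps insertion order and
// stable iterators, std::unordered_map maps keys to those iterators.
template <typename K, typename V>
class keyed_fifo {
    std::list<V> data;
    std::unordered_map<K, typename std::list<V>::iterator> index;
public:
    void push(const K& key, const V& val) {
        data.push_back(val);
        index.emplace(key, std::prev(data.end()));
    }
    V& get(const K& key) { return *index.at(key); }  // O(1) average
    V pop() {                                        // precondition: non-empty
        V v = data.front();
        data.pop_front();
        // note: a full implementation would also erase the map entry here
        return v;
    }
    void erase(const K& key) {
        auto it = index.find(key);
        if (it == index.end()) return;
        data.erase(it->second);   // O(1): list iterators stay valid
        index.erase(it);
    }
    std::size_t size() const { return data.size(); }
};
```

This avoids the original code's pitfall of storing addresses of stack objects: the map refers to the list's own nodes, whose addresses are stable for the lifetime of each element.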

How to avoid using new operator in C++?

I have a C++ program that creates Huffman codes for all characters in a file. It works well, but I want to create the nodes without using the new operator because I know that you shouldn't use it. I tried using a global vector variable to hold the nodes, but that doesn't work.
std::vector<Node> nodes;

Node* create_node(unsigned char value, unsigned long long counter, Node* left, Node* right) {
    Node temp;
    temp.m_value = value;
    temp.m_counter = counter;
    temp.m_left = left;
    temp.m_right = right;
    nodes.push_back(temp);
    return &nodes[nodes.size() - 1];
}
Edit: I added more code; I didn't really explain what doesn't work. The problem is in generate_code(): it never reaches nullptr. I also tried using Node instead of Node*, but the same thing happened.
void generate_code(Node* current, std::string code, std::map<unsigned char, std::string>& char_codes) {
    if (current == nullptr) {
        return;
    }
    if (!current->m_left && !current->m_right) {
        char_codes[current->m_value] = code;
    }
    generate_code(current->m_left, code + "0", char_codes);
    generate_code(current->m_right, code + "1", char_codes);
}

void huffman(std::ifstream& file) {
    std::unordered_map<unsigned char, ull> char_frequency;
    load_data(file, char_frequency);
    std::priority_queue<Node*, std::vector<Node*>, Comparator> queue;
    for (auto& node : char_frequency) {
        queue.push(create_node(node.first, node.second, nullptr, nullptr));
    }
    while (queue.size() != 1) {
        Node* left = queue.top();
        queue.pop();
        Node* right = queue.top();
        queue.pop();
        auto counter = left->m_counter + right->m_counter;
        queue.push(create_node('\0', counter, left, right));
    }
    std::map<unsigned char, std::string> char_codes;
    Node* root = queue.top();
    generate_code(root, "", char_codes);
    for (auto& i : char_codes) {
        std::cout << +i.first << ": " << i.second << "\n";
    }
}
The general answer is of course to use smart pointers, like std::shared_ptr<Node>.
That said, using regular pointers is not that bad, especially if you hide all pointers from the outside. I wouldn't agree with "you shouldn't use new", more like "you should realize that you have to make sure not to create a memory leak if you do".
In any case, for something like you're doing, especially with your vector, you don't need actual pointers at all. Simply store an index into your vector and replace every occurrence of Node* with an int, somewhat like:
class Node
{
public:
    // constructors and accessors
private:
    ValueType value;
    int index_left;
    int index_right;
};
I used a signed integer as index here in order to allow storing -1 for a non-existent reference, similar to a null pointer.
Note that this only works if nothing gets erased from the vector, at least not before everything is destroyed. If flexibility is the key, you need pointers of some sort.
Also note that you should not have a vector as a global variable. Instead, have a wrapping class, of which Node is an inner class, somewhat like this:
class Tree
{
public:
    class Node
    {
        ...
    };
    // some methods here
private:
    vector<Node> nodes;
};
With such an approach, you can encapsulate your Node class better. Tree should most likely be a friend. Each Node would store a reference to the Tree it belongs to.
Another possibility would be to make the vector a static member of Node, but I would advise against that. If the vector is a static member of Node, or a global object, then in both cases all the trees you create live in one big container, which means you can't free the memory of one tree when you no longer need it.
While this would technically not be a memory leak, in practice it could easily act like one.
On the other hand, if the vector is stored as a member of a Tree object, its memory is automatically freed as soon as that object is destroyed.
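The index-based idea can be sketched as a self-contained Tree (hypothetical names; -1 plays the role of the null pointer, and indices stay valid across vector reallocations, unlike the raw pointers in the original create_node):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the index-based approach: the tree owns all nodes in one
// vector, and "pointers" are indices (-1 meaning no child). No new/delete,
// and all memory is freed when the Tree object goes away.
class Tree {
public:
    struct Node {
        int value;
        int left  = -1;   // index into nodes, -1 == no child
        int right = -1;
    };
    int add_node(int value, int left = -1, int right = -1) {
        nodes.push_back(Node{value, left, right});
        return static_cast<int>(nodes.size()) - 1;   // index stays stable
    }
    const Node& at(int i) const { return nodes[i]; }
    std::size_t size() const { return nodes.size(); }
private:
    std::vector<Node> nodes;
};
```

This also fixes the bug in the question's create_node: returning &nodes[...] from a growing vector leaves dangling pointers after reallocation, whereas indices survive it.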
but I want to create nodes without using new operator because I know that you shouldn't use it.
The reason it is discouraged to use new directly is that the semantics of ownership (i.e. who is responsible for the corresponding delete) isn't clear.
The C++ standard library provides dynamic memory management utilities for this, the smart pointers in particular.
So I think your create function should look like follows:
std::unique_ptr<Node> create_node(unsigned char value, unsigned long long counter, Node* left, Node* right) {
    std::unique_ptr<Node> temp = std::make_unique<Node>();
    temp->m_value = value;
    temp->m_counter = counter;
    temp->m_left = left;
    temp->m_right = right;
    return temp;
}
This way it's clear that the caller takes ownership of the newly created Node instance.
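A sketch of how the caller might then hold that ownership (the Node layout below is assumed, matching the answer's create function: the unique_ptrs own every node in one vector, while the raw m_left/m_right remain non-owning links):

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Assumed minimal Node layout: children stay raw (non-owning),
// ownership lives in the caller's unique_ptrs.
struct Node {
    unsigned char m_value{};
    unsigned long long m_counter{};
    Node* m_left = nullptr;
    Node* m_right = nullptr;
};

std::unique_ptr<Node> create_node(unsigned char value, unsigned long long counter,
                                  Node* left, Node* right) {
    auto temp = std::make_unique<Node>();
    temp->m_value = value;
    temp->m_counter = counter;
    temp->m_left = left;
    temp->m_right = right;
    return temp;
}

// The caller keeps every node alive in one owning vector; merging two
// nodes just links raw pointers, and everything is freed when the
// vector goes out of scope (no new/delete in user code).
Node* merge(std::vector<std::unique_ptr<Node>>& owner, Node* a, Node* b) {
    owner.push_back(create_node('\0', a->m_counter + b->m_counter, a, b));
    return owner.back().get();
}
```

Unlike the global std::vector<Node> in the question, the vector of unique_ptrs can grow freely: reallocation moves the unique_ptrs, not the nodes, so the raw child pointers stay valid.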

Is it concurrency-safe to call concurrency::concurrent_vector::push_back while iterating over that concurrent_vector in other thread?

push_back, begin and end are described as concurrency-safe in
https://learn.microsoft.com/en-us/cpp/parallel/concrt/reference/concurrent-vector-class?view=vs-2019#push_back
However, the code below asserts, probably because an element is added but not yet initialized.
#include <cassert>
#include <cstdint>
#include <cstring>
#include <thread>
#include <vector>
#include <concurrent_vector.h>

struct MyData
{
    explicit MyData()
    {
        memset(arr, 0xA5, sizeof arr);
    }
    std::uint8_t arr[1024];
};

struct MyVec
{
    concurrency::concurrent_vector<MyData> v;
};

auto vector_pushback(MyVec &vec) -> void
{
    vec.v.push_back(MyData{});
}

auto vector_loop(MyVec &vec) -> void
{
    MyData myData;
    for (auto it = vec.v.begin(); it != vec.v.end(); ++it)
    {
        auto res = memcmp(&(it->arr), &(myData.arr), sizeof myData.arr);
        assert(res == 0);
    }
}

int main()
{
    auto vec = MyVec{};
    auto th_vec = std::vector<std::thread>{};
    for (int i = 0; i < 1000; ++i)
    {
        th_vec.emplace_back(vector_pushback, std::ref(vec));
        th_vec.emplace_back(vector_loop, std::ref(vec));
    }
    for (auto &th : th_vec)
        th.join();
    return 0;
}
According to the docs, it should be safe to append to a concurrency::concurrent_vector while iterating over it because the elements are not actually stored contiguously in memory like std::vector:
A concurrent_vector object does not relocate its elements when you append to it or resize it. This enables existing pointers and iterators to remain valid during concurrent operations.
However, looking at the actual implementation of push_back in VS2017, I see the following, which I don't think is thread-safe:
iterator push_back( _Ty &&_Item )
{
    size_type _K;
    void *_Ptr = _Internal_push_back(sizeof(_Ty), _K);
    new (_Ptr) _Ty( std::move(_Item));
    return iterator(*this, _K, _Ptr);
}
I have to speculate about _Internal_push_back here, but I'd wager it allocates raw memory for storing the item (and points the last element at this new node) so that the next line can use placement new. I'd imagine that _Internal_push_back is internally thread-safe; however, I don't see any synchronization happening before the placement new. Meaning the following is possible:
memory is obtained and the node is "present" (yet placement new hasn't happened)
the looping thread encounters this node and performs memcmp to discover that the bytes are not equal
placement new happens.
There's definitely a race condition here. I can reproduce the problem readily, and more so the more threads I use.
I recommend that you open a ticket with Microsoft support on this one.
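Until that is resolved, a conservative workaround is to do the synchronization yourself, e.g. a plain std::vector behind a mutex (`locked_vector` is a hypothetical name; this trades away concurrent iteration, but readers can never observe a half-constructed element):

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Sketch of a conservative workaround: guard the container with a mutex
// so an element is fully constructed before any reader can see it.
template <typename T>
class locked_vector {
    std::vector<T> v;
    mutable std::mutex m;
public:
    void push_back(T item) {
        std::lock_guard<std::mutex> lock(m);
        v.push_back(std::move(item));   // element fully built before unlock
    }
    template <typename F>
    void for_each(F f) const {
        std::lock_guard<std::mutex> lock(m);
        for (const auto& x : v) f(x);
    }
};
```

Under low contention the lock cost is negligible next to the 1 KB memcmp the test performs per element; under high contention a finer-grained scheme (e.g. publishing an element count with release/acquire ordering after construction) would be needed.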

Nested shared_ptr destruction causes stack overflow

This is a more of design problem (I know why this is happening, just want to see how people deal with it). Suppose I have a simple linked list struct:
struct List {
    int head;
    std::shared_ptr<List> tail;
};
The shared_ptr enables sharing of sublists between multiple lists. However, when the list gets very long, a stack overflow might happen in its destructor (caused by recursive releases of shared_ptrs). I've tried using an explicit stack, but that gets very tricky since a tail can be owned by multiple lists. How can I design my List to avoid this problem?
UPDATE: To clarify, I'm not reinventing the wheel (std::forward_list). The List above is only a simplified version of the real data structure. The real data structure is a directed acyclic graph, which, if you think about it, is just a lot of linked lists with shared tails/heads. It's usually prohibitively expensive to copy the graph, so data sharing is necessary.
UPDATE 2: I'm thinking about explicitly traversing down the pointer chain and std::move as I go. Something like:
~List()
{
    auto p = std::move(tail);
    while (p->tail != nullptr && p->tail.use_count() == 1) {
        // Some other thread may start pointing to `p->tail`
        // and increase its use count before the next line
        p = std::move(p->tail);
    }
}
This seems to work in a single thread, but I'm worried about thread safety.
If you're having problems with stack overflows on destruction for your linked datastructure, the easiest fix is just to implement deferred cleanup:
struct Graph {
    std::shared_ptr<Graph> p1, p2, p3; // some pointers in your data structure
    static std::list<std::shared_ptr<Graph>> deferred_cleanup;

    ~Graph() {
        deferred_cleanup.emplace_back(std::move(p1));
        deferred_cleanup.emplace_back(std::move(p2));
        deferred_cleanup.emplace_back(std::move(p3));
    }
    static void cleanup() {
        while (!deferred_cleanup.empty()) {
            std::list<std::shared_ptr<Graph>> tmp;
            std::swap(tmp, deferred_cleanup);
            tmp.clear();
        }
    }
};
and you just need to remember to call Graph::cleanup(); periodically.
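To see why this bounds the stack depth: each destructor only moves its children into the flat list, so a chain of N nodes is torn down by the loop in cleanup() rather than by N nested destructor frames. A self-contained single-pointer version of the same idea (hypothetical `Chain` name):

```cpp
#include <cassert>
#include <list>
#include <memory>

// Self-contained version of the deferred-cleanup idea: destructors hand
// their children to a flat list instead of destroying them recursively.
struct Chain {
    std::shared_ptr<Chain> next;
    static std::list<std::shared_ptr<Chain>> deferred;

    ~Chain() { deferred.emplace_back(std::move(next)); }

    static void cleanup() {
        while (!deferred.empty()) {
            std::list<std::shared_ptr<Chain>> tmp;
            std::swap(tmp, deferred);
            tmp.clear();   // may enqueue more children; the loop handles them
        }
    }
};
std::list<std::shared_ptr<Chain>> Chain::deferred;
```

Each pass of tmp.clear() destroys one layer of nodes, and their children land back in `deferred` for the next pass, so stack usage stays constant regardless of chain length.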
This should do it. With a little work it can easily be made thread-safe (a little locking/atomics in the deleter engine).
Synopsis:
The shared_ptrs to the nodes are created with a custom deleter which, rather than deleting the node, hands it off to the deleter engine.
The engine's implementation is a singleton. Upon being notified of a new node to be deleted, it adds the node to a delete queue. If there is no node being deleted, the nodes in the queue are deleted in turn (no recursion).
While this is happening, new nodes arriving in the engine are simply added to the back of the queue. The in-progress delete cycle will take care of them soon enough.
#include <memory>
#include <deque>
#include <stdexcept>
#include <iostream>

struct node;

struct delete_engine
{
    void queue_for_delete(std::unique_ptr<node> p);
    struct impl;
    static impl& get_impl();
};

struct node
{
    node(int d) : data(d) {}
    ~node() {
        std::cout << "deleting node " << data << std::endl;
    }
    static std::shared_ptr<node> create(int d) {
        return { new node(d),
                 [](node* p) {
                     auto eng = delete_engine();
                     eng.queue_for_delete(std::unique_ptr<node>(p));
                 }};
    }
    int data;
    std::shared_ptr<node> child;
};

struct delete_engine::impl
{
    bool _deleting { false };
    std::deque<std::unique_ptr<node>> _delete_list;

    void queue_for_delete(std::unique_ptr<node> p)
    {
        _delete_list.push_front(std::move(p));
        if (!_deleting)
        {
            _deleting = true;
            while (!_delete_list.empty())
            {
                _delete_list.pop_back();
            }
            _deleting = false;
        }
    }
};

auto delete_engine::get_impl() -> impl&
{
    static impl _{};
    return _;
}

void delete_engine::queue_for_delete(std::unique_ptr<node> p)
{
    get_impl().queue_for_delete(std::move(p));
}

struct tree
{
    std::shared_ptr<node> root;

    auto add_child(int data)
    {
        if (root) {
            throw std::logic_error("already have a root");
        }
        auto n = node::create(data);
        root = n;
        return n;
    }
};

int main()
{
    tree t;
    auto pc = t.add_child(6);
    pc = pc->child = node::create(7);
}
std::shared_ptr (and before that, boost::shared_ptr) is and was the de-facto standard for building dynamic systems involving massive DAGs.
In reality, DAGs don't get that deep (maybe 10 or 12 algorithms deep in your average FX pricing server?) so the recursive deletes are not a problem.
If you're thinking of building an enormous DAG with a depth of 10,000 then it might start to be a problem, but to be honest I think it will be the least of your worries.
Re: the analogy of a DAG being like a linked list: not really. Since it's acyclic, all your pointers pointing "up" will need to be shared_ptrs, and all your back-pointers (e.g. bindings of message subscriptions to sink algorithms) will need to be weak_ptrs, which you lock as you fire the message.
disclaimer: I've spent a lot of time designing and building information systems based on directed acyclic graphs of parameterised algorithm components, with a great deal of sharing of common components (i.e. same algorithm with same parameters).
Performance of the graph is never an issue. The bottlenecks are:
initially building the graph when the program starts - there's a lot of noise at that point, but it only happens once.
getting data into and out of the process (usually a message bus). This is invariably the bottleneck as it involves I/O.