Boost Fibonacci Heap Access Violation during pop() - c++

Context
I'm currently implementing a variant of the A* algorithm. I decided to use Boost's Fibonacci heap as the underlying priority queue.
My graph is being built while the algorithm runs. As my Vertex object I'm using:
class Vertex {
public:
Vertex(double, double);
double distance = std::numeric_limits<double>::max();
double heuristic = 0;
HeapData* fib;
Vertex* predecessor = nullptr;
std::vector<Edge*> adj;
double euclideanDistanceTo(Vertex* v);
};
My Edge looks like:
class Edge {
public:
Edge(Vertex*, double);
Vertex* vertex = nullptr;
double weight = 1;
};
In order to use Boost's Fibonacci heap, I've read that one should create a heap data object, which I did like this:
struct HeapData {
Vertex* v;
boost::heap::fibonacci_heap<HeapData>::handle_type handle;
HeapData(Vertex* u) {
v = u;
}
bool operator<(HeapData const& rhs) const {
return rhs.v->distance + rhs.v->heuristic < v->distance + v->heuristic;
}
};
Note that I included both the heuristic and the actual distance in the comparator to get the A* behaviour I want.
My actual A* implementation looks like this:
boost::heap::fibonacci_heap<HeapData> heap;
HeapData fibs(startPoint);
startPoint->distance = 0;
startPoint->heuristic = getHeuristic(startPoint);
auto handles = heap.push(fibs);
(*handles).handle = handles;
while (!heap.empty()) {
HeapData u = heap.top();
heap.pop();
if (u.v->equals(endPoint)) {
return;
}
doSomeGraphCreationStuff(u.v); // this only creates vertices and edges
for (Edge* e : u.v->adj) {
double newDistance = e->weight + u.v->distance;
if (e->vertex->distance > newDistance) {
e->vertex->distance = newDistance;
e->vertex->predecessor = u.v;
if (!e->vertex->fib) {
if (!u.v->equals(endPoint)) {
e->vertex->heuristic = getHeuristic(e->vertex);
}
e->vertex->fib = new HeapData(e->vertex);
e->vertex->fib->handle = heap.push(*(e->vertex->fib));
}
else {
heap.increase(e->vertex->fib->handle);
}
}
}
}
Problem
The algorithm runs just fine if I use a very small heuristic (which degenerates A* to Dijkstra). If I introduce a stronger heuristic, however, the program throws an exception stating:
0xC0000005: Access violation writing location 0x0000000000000000.
in the unlink method of Boost's circular_list_algorithm.hpp. For some reason, next and prev are null. This is a direct consequence of calling heap.pop().
Note that heap.pop() works fine for several times and does not crash immediately.
Question
What causes this problem and how can I fix it?
What I have tried
My first thought was that I accidentally called increase() even though distance + heuristic got bigger instead of smaller (according to the documentation, this can break things). This is not possible in my implementation, however, because I only change a node if its distance got smaller; the heuristic stays constant. I tried using update() instead of increase() anyway, without success.
I tried setting several breakpoints to get a more detailed view, but my data set is huge and I fail to reproduce the problem with smaller sets.
Additional Information
Boost Version: 1.76.0
C++14
the increase function is indeed correct (rather than a decrease function) since all Boost heaps are implemented as max-heaps. We get a min-heap by reversing the comparator, so increase/decrease are used in the reversed sense

Okay, prepare for a ride.
First I found a bug
Next, I fully reviewed, refactored and simplified the code
When the dust settled, I noticed a behaviour change that looked like a potential logic error in the code
1. The Bug
As I commented on the question, the code complexity is high due to over-reliance on raw pointers without clear semantics.
While reviewing and refactoring the code, I found that this had, indeed, led to a bug:
e->vertex->fib = new HeapData(e->vertex);
e->vertex->fib->handle = heap.push(*(e->vertex->fib));
In the first line you create a HeapData object. You make the fib member point to that object.
The second line inserts a copy of that object (meaning, it's a new object, with a different object identity, or practically speaking: a different address).
So, now
e->vertex->fib points to a (leaked) HeapData object that does not exist in the queue, and
the actual queued HeapData copy has a default-constructed handle member, which means that the handle wraps a null pointer. (Check boost::heap::detail::node_handle<> in detail/stable_heap.hpp to verify this).
This would handsomely explain the symptom you are seeing.
2. Refactor, Simplify
So, after understanding the code I have come to the conclusion that
HeapData and Vertex should be merged: HeapData only served to link a handle to a Vertex, but you can already make the Vertex contain a Handle directly.
As a consequence of this merge
your vertex queue now actually contains vertices, expressing intent of the code
you reduce all of the vertex access by one level of indirection (reducing Law Of Demeter violations)
you can write the push operation in one natural line, removing the room for your bug to crop up. Before:
target->fib = new HeapData(target);
target->fib->handle = heap.push(*(target->fib));
After:
target->fibhandle = heap.push(target);
Your Edge class doesn't actually model an edge, but rather an "adjacency" - the target
part of the edge, with the weight attribute.
I renamed it OutEdge for clarity and also changed the vector to contain values instead of
dynamically allocated OutEdge instances.
I can't tell from the code shown, but I can almost guarantee these were
being leaked.
Also, OutEdge is only 16 bytes on most platforms, so copying them will be fine, and adjacencies are by definition owned by their source vertex (because including/moving it to another source vertex would change the meaning of the adjacency).
In fact, if you're serious about performance, you may want to use a boost::container::small_vector with a suitably chosen capacity if you know that e.g. the median number of edges is "small"
Your comparison can be "outsourced" to a function object
using Node = Vertex*;
struct PrioCompare {
bool operator()(Node a, Node b) const;
};
After which the heap can be typed as:
namespace bh = boost::heap;
using Heap = bh::fibonacci_heap<Node, bh::compare<PrioCompare>>;
using Handle = Heap::handle_type;
Your cost function violated the Law of Demeter some more, which was easily fixed by adding a literate-code accessor:
Cost cost() const { return distance + heuristic; }
From quick inspection I think it would be more accurate to use infinity() over max() as the initial distance. Also, use a constant for readability:
static constexpr auto INF = std::numeric_limits<Cost>::infinity();
Cost distance = INF;
You had a repeated check for xyz->equals(endPoint) to avoid updating the heuristic for a vertex. I suggest moving the update until after vertex dequeue, so the repetition disappears (both the check and the getHeuristic(...) call).
Like you said, we need to tread carefully around the increase/update fixup methods. As I read your code, the priority of a node is inversely related to the "cost" (cumulative edge-weight and heuristic values).
Because Boost Heap heaps are max heaps this implies that increasing the priority should match decreasing cost. We can just assert this to detect any programmer error in debug builds:
assert(target->cost() < previous_cost);
heap.increase(target->fibhandle);
With these changes in place, the code can read a lot quieter:
Cost AStarSearch(Node start, Node destination) {
Heap heap;
start->distance = 0;
start->fibhandle = heap.push(start);
while (!heap.empty()) {
Node u = heap.top();
heap.pop();
if (u->equals(destination)) {
return u->cost();
}
u->heuristic = getHeuristic(u);
doSomeGraphCreationStuff(u);
for (auto& [target, weight] : u->adj) {
auto curDistance = weight + u->distance;
// if cheaper route, queue or update queued
if (curDistance < target->distance) {
auto cost_prior = target->cost();
target->distance = curDistance;
target->predecessor = u;
if (target->fibhandle == NOHANDLE) {
target->fibhandle = heap.push(target);
} else {
assert(target->cost() < cost_prior);
heap.update(target->fibhandle);
}
}
}
}
return INF;
}
2(b) Live Demo
Adding some test data:
Live On Coliru
#include <boost/heap/fibonacci_heap.hpp>
#include <iostream>
using Cost = double;
struct Vertex;
Cost getHeuristic(Vertex const*) { return 0; }
void doSomeGraphCreationStuff(Vertex const*) {
// this only creates vertices and edges
}
struct OutEdge { // adjacency from implied source vertex
Vertex* target = nullptr;
Cost weight = 1;
};
namespace bh = boost::heap;
using Node = Vertex*;
struct PrioCompare {
bool operator()(Node a, Node b) const;
};
using Heap = bh::fibonacci_heap<Node, bh::compare<PrioCompare>>;
using Handle = Heap::handle_type;
static const Handle NOHANDLE{}; // for expressive comparisons
static constexpr auto INF = std::numeric_limits<Cost>::infinity();
struct Vertex {
Vertex(Cost d = INF, Cost h = 0) : distance(d), heuristic(h) {}
Cost distance = INF;
Cost heuristic = 0;
Handle fibhandle{};
Vertex* predecessor = nullptr;
std::vector<OutEdge> adj;
Cost cost() const { return distance + heuristic; }
Cost euclideanDistanceTo(Vertex* v);
bool equals(Vertex const* u) const { return this == u; }
};
// Now Vertex is a complete type, implement comparison
bool PrioCompare::operator()(Node a, Node b) const {
return a->cost() > b->cost();
}
Cost AStarSearch(Node start, Node destination) {
Heap heap;
start->distance = 0;
start->fibhandle = heap.push(start);
while (!heap.empty()) {
Node u = heap.top();
heap.pop();
if (u->equals(destination)) {
return u->cost();
}
u->heuristic = getHeuristic(u);
doSomeGraphCreationStuff(u);
for (auto& [target, weight] : u->adj) {
auto curDistance = weight + u->distance;
// if cheaper route, queue or update queued
if (curDistance < target->distance) {
auto cost_prior = target->cost();
target->distance = curDistance;
target->predecessor = u;
if (target->fibhandle == NOHANDLE) {
target->fibhandle = heap.push(target);
} else {
assert(target->cost() < cost_prior);
heap.update(target->fibhandle);
}
}
}
}
return INF;
}
int main() {
// a very very simple graph data structure with minimal helpers:
std::vector<Vertex> graph(10);
auto node = [&graph](int id) { return &graph.at(id); };
auto id = [&graph](Vertex const* node) { return node - graph.data(); };
// defining 6 edges
graph[0].adj = {{node(2), 1.5}, {node(3), 15}};
graph[2].adj = {{node(4), 2.5}, {node(1), 5}};
graph[1].adj = {{node(7), 0.5}};
graph[7].adj = {{node(3), 0.5}};
// do a search
Node startPoint = node(0);
Node endPoint = node(7);
Cost cost = AStarSearch(startPoint, endPoint);
std::cout << "Overall cost: " << cost << ", reverse path: \n";
for (Node node = endPoint; node != nullptr; node = node->predecessor) {
std::cout << " - " << id(node) << " distance " << node->distance
<< "\n";
}
}
Prints
Overall cost: 7, reverse path:
- 7 distance 7
- 1 distance 6.5
- 2 distance 1.5
- 0 distance 0
3. The Plot Twist: Lurking Logic Errors?
I felt uneasy about moving the getHeuristic() update around. I wondered
whether I might have changed the meaning of the code, even though the control
flow seemed to check out.
And then I realized that indeed the behaviour changed. It is subtle. At first I thought the old behaviour was just problematic. So, let's analyze:
The source of the risk is an inconsistency in node visitation vs. queue prioritization.
When visiting nodes, the condition to see whether the target became "less
distant" is expressed in terms of distance only.
However, the queue priority will be based on cost, which is different
from distance in that it includes any heuristics.
The problem lurking there is that it is possible to write code where the fact that distance decreases NEED NOT guarantee that cost decreases.
Going back to the code, we can see that this is narrowly avoided, because the getHeuristic update is only executed in the non-update path of the code.
In Conclusion
Understanding this made me realize that
the Vertex::heuristic field is intended merely as a "cached" version of the getHeuristic() function call
implying that the function is treated as if it were idempotent
that my version did change behaviour in that getHeuristic was now
potentially executed more than once for the same vertex (if visited again
via a cheaper path)
I would suggest fixing this by
renaming the heuristic field to cachedHeuristic
making an enqueue function to encapsulate the three steps for enqueuing a vertex:
simply omitting the endpoint check because it can at MOST eliminate a single invocation of getHeuristic for that node, probably not worth the added complexity
add a comment pointing out the subtlety of that code path
UPDATE As discovered in the comments, we also need the inverse operation (dequeue) to symmetrically update the handle, so it reflects that the node is no longer in the queue...
It also drives home the usefulness of having the precondition assert added before invoking Heap::increase.
Final Listing
With the above changes
encapsulated into a Graph object, that
also reads the graph from input like:
0 2 1.5
0 3 15
2 4 2.5
2 1 5
1 7 0.5
7 3 0.5
Where each line contains (source, target, weight).
A separate file can contain heuristic values for vertices index [0, ...),
optionally newline-separated, e.g. "7 11 99 33 44 55"
and now returning the arrived-at node instead of its cost only
Live On Coliru
#include <boost/heap/fibonacci_heap.hpp>
#include <iostream>
#include <deque>
#include <fstream>
using Cost = double;
struct Vertex;
struct OutEdge { // adjacency from implied source vertex
Vertex* target = nullptr;
Cost weight = 1;
};
namespace bh = boost::heap;
using Node = Vertex*;
struct PrioCompare {
bool operator()(Node a, Node b) const;
};
using MutableQueue = bh::fibonacci_heap<Node, bh::compare<PrioCompare>>;
using Handle = MutableQueue::handle_type;
static const Handle NOHANDLE{}; // for expressive comparisons
static constexpr auto INF = std::numeric_limits<Cost>::infinity();
struct Vertex {
Vertex(Cost d = INF, Cost h = 0) : distance(d), cachedHeuristic(h) {}
Cost distance = INF;
Cost cachedHeuristic = 0;
Handle handle{};
Vertex* predecessor = nullptr;
std::vector<OutEdge> adj;
Cost cost() const { return distance + cachedHeuristic; }
Cost euclideanDistanceTo(Vertex* v);
};
// Now Vertex is a complete type, implement comparison
bool PrioCompare::operator()(Node a, Node b) const {
return a->cost() > b->cost();
}
class Graph {
std::vector<Cost> _heuristics;
Cost getHeuristic(Vertex* v) {
size_t n = id(v);
return n < _heuristics.size() ? _heuristics[n] : 0;
}
void doSomeGraphCreationStuff(Vertex const*) {
// this only creates vertices and edges
}
public:
Graph(std::string edgeFile, std::string heurFile) {
{
std::ifstream stream(heurFile);
_heuristics.assign(std::istream_iterator<Cost>(stream), {});
if (!stream.eof())
throw std::runtime_error("Unexpected heuristics");
}
std::ifstream stream(edgeFile);
size_t src, tgt;
double weight;
while (stream >> src >> tgt >> weight) {
_nodes.resize(std::max({_nodes.size(), src + 1, tgt + 1}));
_nodes[src].adj.push_back({node(tgt), weight});
}
if (!stream.eof())
throw std::runtime_error("Unexpected input");
}
Node search(size_t from, size_t to) {
assert(from < _nodes.size());
assert(to < _nodes.size());
return AStar(node(from), node(to));
}
size_t id(Node node) const {
// ugh, this is just for "pretty output"...
for (size_t i = 0; i < _nodes.size(); ++i) {
if (node == &_nodes[i])
return i;
}
throw std::out_of_range("id");
};
Node node(int id) { return &_nodes.at(id); };
private:
// simple graph data structure with minimal helpers:
std::deque<Vertex> _nodes; // reference stable when growing at the back
// search state
MutableQueue _queue;
void enqueue(Node n) {
assert(n && (n->handle == NOHANDLE));
// get heuristic before insertion!
n->cachedHeuristic = getHeuristic(n);
n->handle = _queue.push(n);
}
Node dequeue() {
Node node = _queue.top();
node->handle = NOHANDLE;
_queue.pop();
return node;
}
Node AStar(Node start, Node destination) {
_queue.clear();
start->distance = 0;
enqueue(start);
while (!_queue.empty()) {
Node u = dequeue();
if (u == destination) {
return u;
}
doSomeGraphCreationStuff(u);
for (auto& [target, weight] : u->adj) {
auto curDistance = u->distance + weight;
// if cheaper route, queue or update queued
if (curDistance < target->distance) {
auto cost_prior = target->cost();
target->distance = curDistance;
target->predecessor = u;
if (target->handle == NOHANDLE) {
// also caches heuristic
enqueue(target);
} else {
// NOTE: avoid updating heuristic here, because it
// breaks the queue invariant if heuristic increased
// more than decrease in distance
assert(target->cost() < cost_prior);
_queue.increase(target->handle);
}
}
}
}
return nullptr;
}
};
int main() {
Graph graph("input.txt", "heur.txt");
Node arrival = graph.search(0, 7);
std::cout << "reverse path: \n";
for (Node n = arrival; n != nullptr; n = n->predecessor) {
std::cout << " - " << graph.id(n) << " cost " << n->cost() << "\n";
}
}
Again, printing the expected
reverse path:
- 7 cost 7
- 1 cost 17.5
- 2 cost 100.5
- 0 cost 7
Note how the heuristics changed the cost, but not optimal path in this case.

Related

How to prove/disprove this algorithm time complexity is O(M+N) amortized?

The following problem on LeetCode has two described solutions. Let N be the number of input equations and M be the number of queries:
One uses union find and is O((M+N)log*(N))
One uses DFS and is O(M*N)
However it seems to me that answering all queries at the end with DFS will have an O(M+N) runtime. The below code passed all tests and was accepted by the OJ.
General outline
Build a graph. Each equation (a/b) = x creates two weighted edges from a to b with weight x, and from b to a with weight 1/x.
I run DFS over all variables and record the connected components. For each variable, I maintain which connected component it is in via component_map.
Each var in the component has a value V = captain/var, where captain was the first variable inserted
Then for each query, I can give an answer when both variables belong to the same component, without any need for backtracking, since (captain/var1) * (var2/captain) = var2/var1
The key differences between my DFS solution and theirs are:
I do not need to backtrack due to last bullet above
I answer all queries at once at the end
My reasoning is that every single operation I do is amortized O(1), basically with hash maps and vectors. I run DFS inside a loop with N iterations, but the summed complexity of all my DFS calls will be O(M+N), as every node is only visited once.
I hence believe this solution to be O(M+N).
Question: Am I correct? Can you prove the time complexity of this algorithm whatever it is?
class Solution {
public:
typedef unordered_map<string,double> component; // var --> captain/var
typedef unordered_map<string,component> components; // captain --> component , each component identified by its captain.
typedef unordered_map<string,string> component_map; // var --> captain, to what component this var belongs?
typedef unordered_map<string, vector<pair<string,double>>> adjacency_list;
void DFS(const string& node, component& compo, adjacency_list& adj, double value, unordered_set<string>& visited,component_map& m, const string& captain)
{
for(auto p: adj[node])
{
if(compo.find(p.first) == compo.end())
{
visited.insert(p.first);
m.insert({p.first,captain}); // this letter belongs to this "captain component"
compo.insert({p.first,value*p.second}); // insert L,V
DFS(p.first,compo,adj,value*p.second,visited,m,captain);
}
}
}
vector<double> calcEquation(vector<vector<string>>& equations, vector<double>& values, vector<vector<string>>& queries)
{
adjacency_list adj;
for(int i=0;i<equations.size();++i)
{
string a = equations[i][0];
string b = equations[i][1];
double v = values[i];
auto it = adj.find(a);
if( it == adj.end())
{
adj.insert({a,{}});
it = adj.find(a);
}
it->second.push_back({b,v});
it = adj.find(b);
if( it == adj.end())
{
adj.insert({b,{}});
it = adj.find(b);
}
it->second.push_back({a,1/v});
}
components cps;
unordered_set<string> visited;
component_map m;
for(int i=0;i<equations.size();++i)
{
string a = equations[i][0];
if(visited.find(a)==visited.end())
{
auto it = cps.insert({a,{}}).first;
DFS(a,it->second,adj,1,visited,m,a);
}
string b = equations[i][1];
if(visited.find(b)==visited.end())
{
auto it = cps.insert({b,{}}).first;
DFS(b,it->second,adj,1,visited,m,a);
}
}
vector<double> res;
for(auto& q:queries)
{
auto it0 = m.find(q[0]);
auto it1 = m.find(q[1]);
if(it0 != m.end() && it1 != m.end() && it0->second == it1->second)
{
auto& captain = it0->second;
auto& cp = cps[captain];
res.push_back(cp[q[1]]/cp[q[0]]);
}
else
{
res.push_back(-1.0);
}
}
return res;
}
};

Iterating over linked list in C++ is slower than in Go with analogous memory access

In a variety of contexts I've observed that linked list iteration is consistently slower in C++ than in Go by 10-15%. My first attempt at resolving this mystery on Stack Overflow is here. The example I coded up was problematic because:
1) memory access was unpredictable because of heap allocations, and
2) because there was no actual work being done, some people's compilers were optimizing away the main loop.
To resolve these issues I have a new program with implementations in C++ and Go. The C++ version takes 1.75 secs compared to 1.48 secs for the Go version. This time, I do one large heap allocation before timing begins and use it to operate an object pool from which I release and acquire nodes for the linked list. This way the memory access should be completely analogous between the two implementations.
Hopefully this makes the mystery more reproducible!
C++:
#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <vector>
#include <boost/timer.hpp>
using namespace std;
struct Node {
Node *next; // 8 bytes
int age; // 4 bytes
};
// Object pool, where every free slot points to the previous free slot
template<typename T, int n>
struct ObjPool
{
typedef T* pointer;
typedef pointer* metapointer;
ObjPool() :
_top(NULL),
_size(0)
{
pointer chunks = new T[n];
for (int i=0; i < n; i++) {
release(&chunks[i]);
}
}
// Give an available pointer back to the object pool
void release(pointer ptr)
{
// Store the current pointer at the given address
*(reinterpret_cast<metapointer>(ptr)) = _top;
// Advance the pointer
_top = ptr;
// Increment the size
++_size;
}
// Pop an available pointer off the object pool for program use
pointer acquire(void)
{
if(_size == 0){throw std::out_of_range("");}
// Pop the top of the stack
pointer retval = _top;
// Step back to the previous address
_top = *(reinterpret_cast<metapointer>(_top));
// Decrement the size
--_size;
// Return the next free address
return retval;
}
unsigned int size(void) const {return _size;}
protected:
pointer _top;
// Number of free slots available
unsigned int _size;
};
Node *nodes = nullptr;
ObjPool<Node, 1000> p;
void processAge(int age) {
// If the object pool is full, pop off the head of the linked list and release
// it from the pool
if (p.size() == 0) {
Node *head = nodes;
nodes = nodes->next;
p.release(head);
}
// Insert the new Node with given age in global linked list. The linked list is sorted by age, so this requires iterating through the nodes.
Node *node = nodes;
Node *prev = nullptr;
while (true) {
if (node == nullptr || age < node->age) {
Node *newNode = p.acquire();
newNode->age = age;
newNode->next = node;
if (prev == nullptr) {
nodes = newNode;
} else {
prev->next = newNode;
}
return;
}
prev = node;
node = node->next;
}
}
int main() {
Node x = {};
std::cout << "Size of struct: " << sizeof(x) << "\n"; // 16 bytes
boost::timer t;
for (int i=0; i<1000000; i++) {
processAge(i);
}
std::cout << t.elapsed() << "\n";
}
Go:
package main
import (
"time"
"fmt"
"unsafe"
)
type Node struct {
next *Node // 8 bytes
age int32 // 4 bytes
}
// Every free slot points to the previous free slot
type NodePool struct {
top *Node
size int
}
func NewPool(n int) NodePool {
p := NodePool{nil, 0}
slots := make([]Node, n, n)
for i := 0; i < n; i++ {
p.Release(&slots[i])
}
return p
}
func (p *NodePool) Release(l *Node) {
// Store the current top at the given address
*((**Node)(unsafe.Pointer(l))) = p.top
p.top = l
p.size++
}
func (p *NodePool) Acquire() *Node {
if p.size == 0 {
fmt.Printf("Attempting to pop from empty pool!\n")
}
retval := p.top
// Step back to the previous address in stack of addresses
p.top = *((**Node)(unsafe.Pointer(p.top)))
p.size--
return retval
}
func processAge(age int32) {
// If the object pool is full, pop off the head of the linked list and release
// it from the pool
if p.size == 0 {
head := nodes
nodes = nodes.next
p.Release(head)
}
// Insert the new Node with given age in global linked list. The linked list is sorted by age, so this requires iterating through the nodes.
node := nodes
var prev *Node = nil
for true {
if node == nil || age < node.age {
newNode := p.Acquire()
newNode.age = age
newNode.next = node
if prev == nil {
nodes = newNode
} else {
prev.next = newNode
}
return
}
prev = node
node = node.next
}
}
// Linked list of nodes, in ascending order by age
var nodes *Node = nil
var p NodePool = NewPool(1000)
func main() {
x := Node{};
fmt.Printf("Size of struct: %d\n", unsafe.Sizeof(x)) // 16 bytes
start := time.Now()
for i := 0; i < 1000000; i++ {
processAge(int32(i))
}
fmt.Printf("Time elapsed: %s\n", time.Since(start))
}
Output:
clang++ -std=c++11 -stdlib=libc++ minimalPool.cpp -O3; ./a.out
Size of struct: 16
1.7548
go run minimalPool.go
Size of struct: 16
Time elapsed: 1.487930629s
The big difference between your two programs is that your Go code ignores errors (and will panic or segfault, if you're lucky, if you empty the pool), while your C++ code propagates errors via exception. Compare:
if p.size == 0 {
fmt.Printf("Attempting to pop from empty pool!\n")
}
vs.
if(_size == 0){throw std::out_of_range("");}
There are at least three ways1 to make the comparison fair:
Change the C++ code to ignore the error, as you do in Go,
Change both versions to panic/abort on error.
Change the Go version to handle errors idiomatically,2 as you do in C++.
So, let's do all of them and compare the results3:
C++ ignoring error: 1.059329s wall, 1.050000s user + 0.000000s system = 1.050000s CPU (99.1%)
C++ abort on error: 1.081585s wall, 1.060000s user + 0.000000s system = 1.060000s CPU (98.0%)
Go panic on error: Time elapsed: 1.152942427s
Go ignoring error: Time elapsed: 1.196426068s
Go idiomatic error handling: Time elapsed: 1.322005119s
C++ exception: 1.373458s wall, 1.360000s user + 0.000000s system = 1.360000s CPU (99.0%)
So:
Without error handling, C++ is faster than Go.
With panicking, Go gets faster,4 but still not as fast as C++.
With idiomatic error handling, C++ slows down a lot more than Go.
Why? This exception never actually happens in your test run, so the actual error-handling code never runs in either language. But clang can't prove that it doesn't happen. And, since you never catch the exception anywhere, that means it has to emit exception handlers and stack unwinders for every non-elided frame all the way up the stack. So it's doing more work on each function call and return—not much more work, but then your function is doing so little real work that the unnecessary extra work adds up.
1. You could also change the C++ version to do C-style error handling, or to use an Option type, and probably other possibilities.
2. This, of course, requires a lot more changes: you need to import errors, change the return type of Acquire to (*Node, error), change the return type of processAge to error, change all your return statements, and add at least two if err != nil { … } checks. But that's supposed to be a good thing about Go, right?
3. While I was at it, I replaced your legacy boost::timer with boost::auto_cpu_timer, so we're now seeing wall clock time (as with Go) as well as CPU time.
4. I won't attempt to explain why, because I don't understand it. From a quick glance at the assembly, it's clearly optimized out some checks, but I can't see why it couldn't optimize out those same checks without the panic.

A* and N-Puzzle optimization

I am writing a solver for the N-Puzzle (see http://en.wikipedia.org/wiki/Fifteen_puzzle)
Right now I am using an unordered_set to store hash values of the puzzle board,
and Manhattan distance as the heuristic for the algorithm, which is a plain DFS.
so I have
auto pred = [](Node * lhs, Node * rhs){ return lhs->manhattanCost_ < rhs->manhattanCost_; };
std::multiset<Node *, decltype(pred)> frontier(pred);
std::vector<Node *> explored; // holds nodes we have already explored
std::tr1::unordered_set<unsigned> frontierHashTable;
std::tr1::unordered_set<unsigned> exploredHashTable;
This works great for n = 2 and 3.
However, it's really hit and miss for n=4 and above (the STL is unable to allocate memory for a new node).
I also suspect that I am getting hash collisions in the unordered_set
unsigned makeHash(const Node & pNode)
{
unsigned int b = 378551;
unsigned int a = 63689;
unsigned int hash = 0;
for(std::size_t i = 0; i < pNode.data_.size(); i++)
{
hash = hash * a + pNode.data_[i];
a = a * b;
}
return hash;
}
16! = 2 × 10^13 (possible arrangements)
2^32 = 4 x 10^9 (possible hash values in a 32 bit hash)
My question is how can I optimize my code to solve for n=4 and n=5?
I know from here
http://kociemba.org/fifteen/fifteensolver.html
http://www.ic-net.or.jp/home/takaken/e/15pz/index.html
that n=4 is possible in less than a second on average.
edit:
The algorithm itself is here:
bool NPuzzle::aStarSearch()
{
auto pred = [](Node * lhs, Node * rhs){ return lhs->manhattanCost_ < rhs->manhattanCost_; };
std::multiset<Node *, decltype(pred)> frontier(pred);
std::vector<Node *> explored; // holds nodes we have already explored
std::tr1::unordered_set<unsigned> frontierHashTable;
std::tr1::unordered_set<unsigned> exploredHashTable;
// if we are in the solved position in the first place, return true
if(initial_ == target_)
{
current_ = initial_;
return true;
}
frontier.insert(new Node(initial_)); // we are going to delete everything from the frontier later..
for(;;)
{
if(frontier.empty())
{
std::cout << "depth first search " << "cant solve!" << std::endl;
return false;
}
// remove a node from the frontier, and place it into the explored set
Node * pLeaf = *frontier.begin();
frontier.erase(frontier.begin());
explored.push_back(pLeaf);
// do the same for the hash table
unsigned hashValue = makeHash(*pLeaf);
frontierHashTable.erase(hashValue);
exploredHashTable.insert(hashValue);
std::vector<Node *> children = pLeaf->genChildren();
for( auto it = children.begin(); it != children.end(); ++it)
{
unsigned childHash = makeHash(**it);
if(inFrontierOrExplored(frontierHashTable, exploredHashTable, childHash))
{
delete *it;
}
else
{
if(**it == target_)
{
explored.push_back(*it);
current_ = **it;
// delete everything else in children
for( auto it2 = ++it; it2 != children.end(); ++it2)
delete * it2;
// delete everything in the frontier
for( auto it = frontier.begin(); it != frontier.end(); ++it)
delete *it;
// delete everything in explored
explored_.swap(explored);
for( auto it = explored.begin(); it != explored.end(); ++it)
delete *it;
return true;
}
else
{
frontier.insert(*it);
frontierHashTable.insert(childHash);
}
}
}
}
}
Since this is homework I will suggest some strategies you might try.
First, try using valgrind or a similar tool to check for memory leaks. You may have some memory leaks if you don't delete everything you new.
Second, calculate a bound on the number of nodes that should be explored. Keep track of the number of nodes you do explore. If you pass the bound, you might not be detecting cycles properly.
Third, try the algorithm with depth first search instead of A*. Its memory requirements should be linear in the depth of the tree and it should just be a matter of changing the sort ordering (pred). If DFS works, your A* search may be exploring too many nodes or your memory structures might be too inefficient. If DFS doesn't work, again it might be a problem with cycles.
Fourth, try more compact memory structures. For example, std::multiset does what you want but std::priority_queue with a std::deque may take up less memory. There are other changes you could try and see if they improve things.
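For instance, switching the underlying container of std::priority_queue is only a template argument. A small sketch with int keys (your node type would go where int is):

```cpp
#include <deque>
#include <functional>
#include <initializer_list>
#include <queue>

// A min-ordered priority queue backed by std::deque instead of the default
// std::vector. The interface is unchanged; only the underlying storage
// (and therefore the memory allocation pattern) differs.
using MinQueue = std::priority_queue<int, std::deque<int>, std::greater<int>>;

int smallest_of(std::initializer_list<int> xs) {
    MinQueue q;
    for (int x : xs) q.push(x);
    return q.top(); // the element with the smallest value
}
```

Whether the deque actually saves memory depends on your element size and your standard library's deque block size, so measure before committing.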
First, I would recommend Cantor expansion (the Lehmer code) as the hashing method. It is a bijection, i.e. the 16! possible arrangements are mapped one-to-one onto 0 … 16! - 1.
Then I would implement the map myself. As you may know, std::map is not always efficient enough for this kind of computation; it is a balanced binary search tree, and I would recommend a Size Balanced Tree, or you can use an AVL tree.
And just for the record, a plain bool array indexed modulo a large prime may also give good results.
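Cantor expansion can be sketched like this (a hypothetical helper, not code from the question; for n = 16 the rank fits in 64 bits since 16! < 2^64):

```cpp
#include <cstdint>
#include <vector>

// Cantor expansion (Lehmer code): maps a permutation of 0..n-1 to a unique
// rank in [0, n!). O(n^2) here; a BIT/Fenwick tree would make it O(n log n).
uint64_t cantor_rank(const std::vector<int>& perm) {
    const int n = static_cast<int>(perm.size());
    uint64_t rank = 0;
    for (int i = 0; i < n; ++i) {
        // Count elements to the right of perm[i] that are smaller than it.
        int smaller = 0;
        for (int j = i + 1; j < n; ++j)
            if (perm[j] < perm[i]) ++smaller;
        // Weight that count by (n - 1 - i)!.
        uint64_t fact = 1;
        for (int k = 2; k <= n - 1 - i; ++k) fact *= k;
        rank += smaller * fact;
    }
    return rank;
}
```

The identity permutation ranks 0 and the fully reversed one ranks n! - 1, so the rank can index directly into a dense array of 16! entries conceptually, or feed a smaller hash table.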
Then the most important thing: the A* evaluation function. As in the first of your links, you can try a variety of heuristic functions and find the best one.
You are only using the heuristic function to order the multiset. You should order your frontier by f(n) = g(n) + h(n), i.e. path length so far plus heuristic, and expand the node with the minimum f(n).
The problem here is that you are picking the node with the smallest heuristic, which may not be the correct next node to expand.
I believe this is what is causing your calculation to explode.
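Ordering the frontier by f(n) = g(n) + h(n) rather than by h alone can be sketched like this (SearchNode is a hypothetical minimal stand-in for your node type):

```cpp
#include <set>

// Minimal illustrative node: g is the path cost from the start,
// h is the heuristic estimate to the goal.
struct SearchNode {
    int g;
    int h;
};

// Compare frontier entries by f(n) = g(n) + h(n), not by the heuristic alone.
struct ByF {
    bool operator()(const SearchNode* a, const SearchNode* b) const {
        return a->g + a->h < b->g + b->h;
    }
};

using Frontier = std::multiset<SearchNode*, ByF>;
```

With this comparator, *frontier.begin() is always the node with the smallest f, which is exactly what A* expands next.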

Recursive to Iterative Transformation

I've gotten stuck on trying to re-write my code from a recursive function into an iterative function.
I thought I'd ask if there are any general things to think about/tricks/guidelines etc... in regards to going from recursive code to iterative code.
e.g. I can't really get my head around how to make the following code iterative, mainly due to the loop inside the recursion, which further depends on and calls the next recursion.
struct entry
{
uint8_t values[8];
int32_t num_values;
std::array<entry, 256>* next_table;
void push_back(uint8_t value) {values[num_values++] = value;}
};
struct node
{
node* children; // +0 right, +1 left
uint8_t value;
uint8_t is_leaf;
};
void build_tables(node* root, std::array<std::array<entry, 256>, 255>& tables, int& table_count)
{
int table_index = root->value; // root is always a non-leaf, thus value is the current table index.
for(int n = 0; n < 256; ++n)
{
auto current = root;
// Walk the Huffman tree bit by bit for this table entry
for(int i = 0; i < 8; ++i)
{
current = current->children + ((n >> i) & 1); // Travel to the next node: current->children[0] is the left child and current->children[1] is the right child. If current is a leaf, then current->children[0/1] point back to the root.
if(current->is_leaf)
tables[table_index][n].push_back(current->value);
}
if(!current->is_leaf)
{
if(current->value == 0) // For non-leaves, the "value" is the sub-table index for this particular non-leaf node
{
current->value = table_count++;
build_tables(current, tables, table_count);
}
tables[table_index][n].next_table = &tables[current->value];
}
else
tables[table_index][n].next_table = &tables[0];
}
}
As tables and table_count always refer to the same objects, you might make a small performance gain by taking tables and table_count out of the argument list of build_tables by storing them as members of a temporary struct and then doing something like this:
struct build_tables_struct
{
build_tables_struct(std::array<std::array<entry, 256>, 255>& tables, int& table_count) :
tables(tables), table_count(table_count) {}
std::array<std::array<entry, 256>, 255>& tables;
int& table_count;
void build_tables_worker(node* root)
{
...
build_tables_worker(current); // instead of build_tables(current, tables, table_count);
...
}
};
void build_tables(node* root, std::array<std::array<entry, 256>, 255>& tables, int& table_count)
{
build_tables_struct(tables, table_count).build_tables_worker(root);
}
This applies of course only if your compiler is not smart enough to make this optimisation itself.
Otherwise, the only way to make this non-recursive is to manage the stack yourself. I doubt this would be much, if any, faster than the recursive version.
This all being said, I doubt your performance issue here is recursion. Pushing three reference arguments to the stack and calling a function I don't think is a huge burden compared to the work your function does.

How to use array-based Dijkstra's algorithm in C++

I need to use (not implement) an array-based version of Dijkstra's algorithm. The task: given a set of line segments (obstacles) and start/end points, I have to find and draw the shortest path from the start point to the end point. I have done the calculating part etc., but I don't know how to use Dijkstra's with my code. My code is as follows:
class Point
{
public:
int x;
int y;
Point()
{
}
void CopyPoint(Point p)
{
this->x=p.x;
this->y=p.y;
}
};
class NeighbourInfo
{
public:
Point x;
Point y;
double distance;
NeighbourInfo()
{
distance=0.0;
}
};
class LineSegment
{
public:
Point Point1;
Point Point2;
NeighbourInfo neighbours[100];
LineSegment()
{
}
};
int main() // in this I use my classes and some code to fill out the data structure
{
int NoOfSegments=i;
for(int j=0;j<NoOfSegments;j++)
{
for(int k=0;k<NoOfSegments;k++)
{
if( SimpleIntersect(segments[j],segments[k]) )
{
segments[j].neighbours[k].distance=INFINITY;
segments[j].neighbours[k].x.CopyPoint(segments[k].Point1);
segments[j].neighbours[k].y.CopyPoint(segments[k].Point2);
cout<<"Intersect"<<endl;
cout<<segments[j].neighbours[k].distance<<endl;
}
else
{
segments[j].neighbours[k].distance=
EuclidianDistance(segments[j].Point1.x, segments[j].Point1.y, segments[k].Point2.x, segments[k].Point2.y);
segments[j].neighbours[k].x.CopyPoint(segments[k].Point1);
segments[j].neighbours[k].y.CopyPoint(segments[k].Point2);
}
}
}
}
Now I have the distances from each segment to all other segments, and using this data (in NeighbourInfo) I want to use array-based Dijkstra's (a restriction) to trace out the shortest path between the start and end points. There is more code, but I have shortened the problem for the ease of the reader.
Please help! And please no .NET libraries/code, as I am using core C++ only. Thanks in advance.
But I need the array-based version (strictly); I am not supposed to use any other implementation.
Dijkstra's
This is how Dijkstra's algorithm works:
It's not a simple algorithm, so you will have to map it to your own code.
Good luck.
List<Nodes> found; // All processed nodes;
List<Nodes> front; // All nodes that have been reached (but not processed)
// This list is sorted by the cost of getting to this node.
List<Nodes> remaining; // All nodes that have not been explored.
remaining.remove(startNode);
front.add(startNode);
startNode.setCost(0); // Cost nothing to get to start.
while(!front.empty())
{
Node current = front.getFirstNode();
front.remove(current);
found.add(current);
if (current == endNode)
{ return current.cost(); // we found the end
}
List<Edge> edges = current.getEdges();
for(loop = edges.begin(); loop != edges.end(); ++loop)
{
Node dst = loop.getDst();
if (found.find(dst) != found.end())
{ continue; // If we have already processed this node ignore it.
}
// The cost to get here. Is the cost to get to the last node.
// Plus the cost to traverse the edge.
int cost = current.cost() + loop.cost();
Node f = front.find(dst);
if (f != front.end())
{
f.setCost(std::min(f.cost(), cost));
continue; // If the node is on the front line just update the cost
// Then continue with the next node.
}
// Its a new node.
// remove it from the remaining and add it to the front (setting the cost).
remaining.remove(dst);
front.add(dst);
dst.setCost(cost);
}
}
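For the array-based restriction specifically, the classic O(V²) formulation of the pseudocode above needs no heap at all: just a distance array, a visited array, and an adjacency matrix. A minimal sketch (illustrative names, not the question's code; INF marks "no edge"):

```cpp
#include <limits>
#include <vector>

const double INF = std::numeric_limits<double>::infinity();

// Array-based Dijkstra: adj[u][v] is the edge weight from u to v, or INF
// if there is no edge. Returns the shortest distance from `start` to every
// node. Each iteration scans the arrays instead of popping a heap.
std::vector<double> dijkstra(const std::vector<std::vector<double>>& adj, int start) {
    const int n = static_cast<int>(adj.size());
    std::vector<double> dist(n, INF);
    std::vector<bool> visited(n, false);
    dist[start] = 0.0;
    for (int iter = 0; iter < n; ++iter) {
        // Pick the unvisited node with the smallest tentative distance.
        int u = -1;
        for (int i = 0; i < n; ++i)
            if (!visited[i] && (u == -1 || dist[i] < dist[u]))
                u = i;
        if (u == -1 || dist[u] == INF) break; // remaining nodes unreachable
        visited[u] = true;
        // Relax every edge out of u.
        for (int v = 0; v < n; ++v)
            if (adj[u][v] < INF && dist[u] + adj[u][v] < dist[v])
                dist[v] = dist[u] + adj[u][v];
    }
    return dist;
}
```

For your setup, each line segment endpoint would be a node and the EuclidianDistance values would fill the matrix; to draw the path afterwards, also record a predecessor index whenever dist[v] is updated.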