I've gotten stuck on trying to re-write my code from a recursive function into an iterative function.
I thought I'd ask if there are any general things to think about/tricks/guidelines etc... in regards to going from recursive code to iterative code.
e.g. I can't rly get my head around how to get the following code iterative, mainly due to the loop inside the recursion which further depends on and calls the next recursion.
struct entry
{
uint8_t values[8];
int32_t num_values;
std::array<entry, 256>* next_table;
void push_back(uint8_t value) {values[num_values++] = value;}
};
struct node
{
node* children; // +0 right, +1 left
uint8_t value;
uint8_t is_leaf;
};
void build_tables(node* root, std::array<std::array<entry, 8>, 255>& tables, int& table_count)
{
int table_index = root->value; // root is always a non-leave, thus value is the current table index.
for(int n = 0; n < 256; ++n)
{
auto current = root;
// Recurse the the huffman tree bit by bit for this table entry
for(int i = 0; i < 8; ++i)
{
current = current->children + ((n >> i) & 1); // Travel to the next node current->children[0] is left child and current->children[1] is right child. If current is a leaf then current->childen[0/1] point to the root.
if(current->is_leaf)
tables[table_index][n].push_back(current->value);
}
if(!current->is_leaf)
{
if(current->value == 0) // For non-leaves, the "value" is the sub-table index for this particular non-leave node
{
current->value = table_count++;
build_tables(current, tables, table_count);
}
tables[table_index][n].next_table = &tables[current->value];
}
else
tables[table_index][n].next_table = &tables[0];
}
}
As tables and table_count always refer to the same objects, you might make a small performance gain by taking tables and table_count out of the argument list of build_tables by storing them as members of a temporary struct and then doing something like this:
struct build_tables_struct
{
build_tables_struct(std::array<std::array<entry, 8>, 255>& tables, int& table_count) :
tables(tables), table_count(table_count) {}
std::array<std::array<entry, 8>, 255>& tables;
int& table_count;
build_tables_worker(node* root)
{
...
build_tables_worker(current); // instead of build_tables(current, tables, table_count);
...
}
}
void build_tables(node* root, std::array<std::array<entry, 8>, 255>& tables, int& table_count)
{
build_tables_struct(tables, table_count).build_tables_worker(root);
}
This applies of course only if your compiler is not smart enough to make this optimisation itself.
The only way you can make this non-recursive otherwise is managing the stack yourself. I doubt this would be much if any faster than the recursive version.
This all being said, I doubt your performance issue here is recursion. Pushing three reference arguments to the stack and calling a function I don't think is a huge burden compared to the work your function does.
Related
Context
I'm currently implementing some form of A* algorithm. I decided to use boost's fibonacci heap as underlying priority queue.
My Graph is being built while the algorithm runs. As Vertex object I'm using:
class Vertex {
public:
Vertex(double, double);
double distance = std::numeric_limits<double>::max();
double heuristic = 0;
HeapData* fib;
Vertex* predecessor = nullptr;
std::vector<Edge*> adj;
double euclideanDistanceTo(Vertex* v);
}
My Edge looks like:
class Edge {
public:
Edge(Vertex*, double);
Vertex* vertex = nullptr;
double weight = 1;
}
In order to use boosts fibonacci heap, I've read that one should create a heap data object, which I did like that:
struct HeapData {
Vertex* v;
boost::heap::fibonacci_heap<HeapData>::handle_type handle;
HeapData(Vertex* u) {
v = u;
}
bool operator<(HeapData const& rhs) const {
return rhs.v->distance + rhs.v->heuristic < v->distance + v->heuristic;
}
};
Note, that I included the heuristic and the actual distance in the comparator to get the A* behaviour, I want.
My actual A* implementation looks like that:
boost::heap::fibonacci_heap<HeapData> heap;
HeapData fibs(startPoint);
startPoint->distance = 0;
startPoint->heuristic = getHeuristic(startPoint);
auto handles = heap.push(fibs);
(*handles).handle = handles;
while (!heap.empty()) {
HeapData u = heap.top();
heap.pop();
if (u.v->equals(endPoint)) {
return;
}
doSomeGraphCreationStuff(u.v); // this only creates vertices and edges
for (Edge* e : u.v->adj) {
double newDistance = e->weight + u.v->distance;
if (e->vertex->distance > newDistance) {
e->vertex->distance = newDistance;
e->vertex->predecessor = u.v;
if (!e->vertex->fib) {
if (!u.v->equals(endPoint)) {
e->vertex->heuristic = getHeuristic(e->vertex);
}
e->vertex->fib = new HeapData(e->vertex);
e->vertex->fib->handle = heap.push(*(e->vertex->fib));
}
else {
heap.increase(e->vertex->fib->handle);
}
}
}
}
Problem
The algorithm runs just fine, if I use a very small heuristic (which degenerates A* to Dijkstra). If I introduce some stronger heuristic, however, the program throws an exepction stating:
0xC0000005: Access violation writing location 0x0000000000000000.
in the unlink method of boosts circular_list_algorithm.hpp. For some reason, next and prev are null. This is a direct consequence of calling heap.pop().
Note that heap.pop() works fine for several times and does not crash immediately.
Question
What causes this problem and how can I fix it?
What I have tried
My first thought was that I accidentally called increase() even though distance + heuristic got bigger instead of smaller (according to the documentation, this can break stuff). This is not possible in my implementation, however, because I can only change a node if the distance got smaller. The heurisitic stays constant. I tried to use update() instead of increase() anyway, without success
I tried to set several break points to get a more detailed view, but my data set is huge and I fail to reproduce it with smaller sets.
Additional Information
Boost Version: 1.76.0
C++14
the increase function is indeed right (instead of a decrease function) since all boost heaps are implemented as max-heaps. We get a min-heap by reversing the comparator and using increase/decrease reversed
Okay, prepare for a ride.
First I found a bug
Next, I fully reviewed, refactored and simplified the code
When the dust settled, I noticed a behaviour change that looked like a potential logic error in the code
1. The Bug
Like I commented at the question, the code complexity is high due to over-reliance on raw pointers without clear semantics.
While I was reviewing and refactoring the code, I found that this has, indeed, lead to a bug:
e->vertex->fib = new HeapData(e->vertex);
e->vertex->fib->handle = heap.push(*(e->vertex->fib));
In the first line you create a HeapData object. You make the fib member point to that object.
The second line inserts a copy of that object (meaning, it's a new object, with a different object identity, or practically speaking: a different address).
So, now
e->vertex->fib points to a (leaked) HeapData object that does not exist in the queue, and
the actual queued HeapData copy has a default-constructed handle member, which means that the handle wraps a null pointer. (Check boost::heap::detail::node_handle<> in detail/stable_heap.hpp to verify this).
This would handsomely explain the symptom you are seeing.
2. Refactor, Simplify
So, after understanding the code I have come to the conclusion that
HeapData and Vertex should to be merged: HeapData only served to link a handle to a Vertex, but you can already make the Vertex contain a Handle directly.
As a consequence of this merge
your vertex queue now actually contains vertices, expressing intent of the code
you reduce all of the vertex access by one level of indirection (reducing Law Of Demeter violations)
you can write the push operation in one natural line, removing the room for your bug to crop up. Before:
target->fib = new HeapData(target);
target->fib->handle = heap.push(*(target->fib));
After:
target->fibhandle = heap.push(target);
Your Edge class doesn't actually model an edge, but rather an "adjacency" - the target
part of the edge, with the weight attribute.
I renamed it OutEdge for clarity and also changed the vector to contain values instead of
dynamically allocated OutEdge instances.
I can't tell from the code shown, but I can almost guarantee these were
being leaked.
Also, OutEdge is only 16 bytes on most platforms, so copying them will be fine, and adjacencies are by definition owned by their source vertex (because including/moving it to another source vertex would change the meaning of the adjacency).
In fact, if you're serious about performance, you may want to use a boost::container::small_vector with a suitably chosen capacity if you know that e.g. the median number of edges is "small"
Your comparison can be "outsourced" to a function object
using Node = Vertex*;
struct PrioCompare {
bool operator()(Node a, Node b) const;
};
After which the heap can be typed as:
namespace bh = boost::heap;
using Heap = bh::fibonacci_heap<Node, bh::compare<PrioCompare>>;
using Handle = Heap::handle_type;
Your cost function violated more Law-Of-Demeter, which was easily fixed by adding a Literate-Code accessor:
Cost cost() const { return distance + heuristic; }
From quick inspection I think it would be more accurate to use infinite() over max() as initial distance. Also, use a constant for readability:
static constexpr auto INF = std::numeric_limits<Cost>::infinity();
Cost distance = INF;
You had a repeated check for xyz->equals(endPoint) to avoid updating the heuristic for a vertex. I suggest moving the update till after vertex dequeue, so the repetition can be gone (of both the check and the getHeuristic(...) call).
Like you said, we need to tread carefully around the increase/update fixup methods. As I read your code, the priority of a node is inversely related to the "cost" (cumulative edge-weight and heuristic values).
Because Boost Heap heaps are max heaps this implies that increasing the priority should match decreasing cost. We can just assert this to detect any programmer error in debug builds:
assert(target->cost() < previous_cost);
heap.increase(target->fibhandle);
With these changes in place, the code can read a lot quieter:
Cost AStarSearch(Node start, Node destination) {
Heap heap;
start->distance = 0;
start->fibhandle = heap.push(start);
while (!heap.empty()) {
Node u = heap.top();
heap.pop();
if (u->equals(destination)) {
return u->cost();
}
u->heuristic = getHeuristic(start);
doSomeGraphCreationStuff(u);
for (auto& [target, weight] : u->adj) {
auto curDistance = weight + u->distance;
// if cheaper route, queue or update queued
if (curDistance < target->distance) {
auto cost_prior = target->cost();
target->distance = curDistance;
target->predecessor = u;
if (target->fibhandle == NOHANDLE) {
target->fibhandle = heap.push(target);
} else {
assert(target->cost() < cost_prior);
heap.update(target->fibhandle);
}
}
}
}
return INF;
}
2(b) Live Demo
Adding some test data:
Live On Coliru
#include <boost/heap/fibonacci_heap.hpp>
#include <iostream>
using Cost = double;
struct Vertex;
Cost getHeuristic(Vertex const*) { return 0; }
void doSomeGraphCreationStuff(Vertex const*) {
// this only creates vertices and edges
}
struct OutEdge { // adjacency from implied source vertex
Vertex* target = nullptr;
Cost weight = 1;
};
namespace bh = boost::heap;
using Node = Vertex*;
struct PrioCompare {
bool operator()(Node a, Node b) const;
};
using Heap = bh::fibonacci_heap<Node, bh::compare<PrioCompare>>;
using Handle = Heap::handle_type;
static const Handle NOHANDLE{}; // for expressive comparisons
static constexpr auto INF = std::numeric_limits<Cost>::infinity();
struct Vertex {
Vertex(Cost d = INF, Cost h = 0) : distance(d), heuristic(h) {}
Cost distance = INF;
Cost heuristic = 0;
Handle fibhandle{};
Vertex* predecessor = nullptr;
std::vector<OutEdge> adj;
Cost cost() const { return distance + heuristic; }
Cost euclideanDistanceTo(Vertex* v);
bool equals(Vertex const* u) const { return this == u; }
};
// Now Vertex is a complete type, implement comparison
bool PrioCompare::operator()(Node a, Node b) const {
return a->cost() > b->cost();
}
Cost AStarSearch(Node start, Node destination) {
Heap heap;
start->distance = 0;
start->fibhandle = heap.push(start);
while (!heap.empty()) {
Node u = heap.top();
heap.pop();
if (u->equals(destination)) {
return u->cost();
}
u->heuristic = getHeuristic(start);
doSomeGraphCreationStuff(u);
for (auto& [target, weight] : u->adj) {
auto curDistance = weight + u->distance;
// if cheaper route, queue or update queued
if (curDistance < target->distance) {
auto cost_prior = target->cost();
target->distance = curDistance;
target->predecessor = u;
if (target->fibhandle == NOHANDLE) {
target->fibhandle = heap.push(target);
} else {
assert(target->cost() < cost_prior);
heap.update(target->fibhandle);
}
}
}
}
return INF;
}
int main() {
// a very very simple graph data structure with minimal helpers:
std::vector<Vertex> graph(10);
auto node = [&graph](int id) { return &graph.at(id); };
auto id = [&graph](Vertex const* node) { return node - graph.data(); };
// defining 6 edges
graph[0].adj = {{node(2), 1.5}, {node(3), 15}};
graph[2].adj = {{node(4), 2.5}, {node(1), 5}};
graph[1].adj = {{node(7), 0.5}};
graph[7].adj = {{node(3), 0.5}};
// do a search
Node startPoint = node(0);
Node endPoint = node(7);
Cost cost = AStarSearch(startPoint, endPoint);
std::cout << "Overall cost: " << cost << ", reverse path: \n";
for (Node node = endPoint; node != nullptr; node = node->predecessor) {
std::cout << " - " << id(node) << " distance " << node->distance
<< "\n";
}
}
Prints
Overall cost: 7, reverse path:
- 7 distance 7
- 1 distance 6.5
- 2 distance 1.5
- 0 distance 0
3. The Plot Twist: Lurking Logic Errors?
I felt uneasy about moving the getHeuristic() update around. I wondered
whether I might have changed the meaning of the code, even though the control
flow seemed to check out.
And then I realized that indeed the behaviour changed. It is subtle. At first I thought the
the old behaviour was just problematic. So, let's analyze:
The source of the risk is an inconsistency in node visitation vs. queue prioritization.
When visiting nodes, the condition to see whether the target became "less
distant" is expressed in terms of distance only.
However, the queue priority will be based on cost, which is different
from distance in that it includes any heuristics.
The problem lurking there is that it is possible to write code that where the
fact that distance decreases, NEED NOT guarantee that cost decreases.
Going back to the code, we can see that this narrowly avoided, because the
getHeuristic update is only executed in the non-update path of the code.
In Conclusion
Understanding this made me realize that
the Vertex::heuristic field is intended merely as a "cached" version of the getHeuristic() function call
implying that that function is treated as if it is idempotent
that my version did change behaviour in that getHeuristic was now
potentially executed more than once for the same vertex (if visited again
via a cheaper path)
I would suggest to fix this by
renaming the heuristic field to cachedHeuristic
making an enqueue function to encapsulate the three steps for enqueuing a vertex:
simply omitting the endpoint check because it can at MOST eliminate a single invocation of getHeuristic for that node, probably not worth the added complexity
add a comment pointing out the subtlety of that code path
UPDATE as discovered in the comments, we also need the inverse
operatione (dequeue) to symmtrically update handle so it reflects that
the node is no longer in the queue...
It also drives home the usefulness of having the precondition assert added before invoking Heap::increase.
Final Listing
With the above changes
encapsulated into a Graph object, that
also reads the graph from input like:
0 2 1.5
0 3 15
2 4 2.5
2 1 5
1 7 0.5
7 3 0.5
Where each line contains (source, target, weight).
A separate file can contain heuristic values for vertices index [0, ...),
optionally newline-separated, e.g. "7 11 99 33 44 55"
and now returning the arrived-at node instead of its cost only
Live On Coliru
#include <boost/heap/fibonacci_heap.hpp>
#include <iostream>
#include <deque>
#include <fstream>
using Cost = double;
struct Vertex;
struct OutEdge { // adjacency from implied source vertex
Vertex* target = nullptr;
Cost weight = 1;
};
namespace bh = boost::heap;
using Node = Vertex*;
struct PrioCompare {
bool operator()(Node a, Node b) const;
};
using MutableQueue = bh::fibonacci_heap<Node, bh::compare<PrioCompare>>;
using Handle = MutableQueue::handle_type;
static const Handle NOHANDLE{}; // for expressive comparisons
static constexpr auto INF = std::numeric_limits<Cost>::infinity();
struct Vertex {
Vertex(Cost d = INF, Cost h = 0) : distance(d), cachedHeuristic(h) {}
Cost distance = INF;
Cost cachedHeuristic = 0;
Handle handle{};
Vertex* predecessor = nullptr;
std::vector<OutEdge> adj;
Cost cost() const { return distance + cachedHeuristic; }
Cost euclideanDistanceTo(Vertex* v);
};
// Now Vertex is a complete type, implement comparison
bool PrioCompare::operator()(Node a, Node b) const {
return a->cost() > b->cost();
}
class Graph {
std::vector<Cost> _heuristics;
Cost getHeuristic(Vertex* v) {
size_t n = id(v);
return n < _heuristics.size() ? _heuristics[n] : 0;
}
void doSomeGraphCreationStuff(Vertex const*) {
// this only creates vertices and edges
}
public:
Graph(std::string edgeFile, std::string heurFile) {
{
std::ifstream stream(heurFile);
_heuristics.assign(std::istream_iterator<Cost>(stream), {});
if (!stream.eof())
throw std::runtime_error("Unexpected heuristics");
}
std::ifstream stream(edgeFile);
size_t src, tgt;
double weight;
while (stream >> src >> tgt >> weight) {
_nodes.resize(std::max({_nodes.size(), src + 1, tgt + 1}));
_nodes[src].adj.push_back({node(tgt), weight});
}
if (!stream.eof())
throw std::runtime_error("Unexpected input");
}
Node search(size_t from, size_t to) {
assert(from < _nodes.size());
assert(to < _nodes.size());
return AStar(node(from), node(to));
}
size_t id(Node node) const {
// ugh, this is just for "pretty output"...
for (size_t i = 0; i < _nodes.size(); ++i) {
if (node == &_nodes[i])
return i;
}
throw std::out_of_range("id");
};
Node node(int id) { return &_nodes.at(id); };
private:
// simple graph data structure with minimal helpers:
std::deque<Vertex> _nodes; // reference stable when growing at the back
// search state
MutableQueue _queue;
void enqueue(Node n) {
assert(n && (n->handle == NOHANDLE));
// get heuristic before insertion!
n->cachedHeuristic = getHeuristic(n);
n->handle = _queue.push(n);
}
Node dequeue() {
Node node = _queue.top();
node->handle = NOHANDLE;
_queue.pop();
return node;
}
Node AStar(Node start, Node destination) {
_queue.clear();
start->distance = 0;
enqueue(start);
while (!_queue.empty()) {
Node u = dequeue();
if (u == destination) {
return u;
}
doSomeGraphCreationStuff(u);
for (auto& [target, weight] : u->adj) {
auto curDistance = u->distance + weight;
// if cheaper route, queue or update queued
if (curDistance < target->distance) {
auto cost_prior = target->cost();
target->distance = curDistance;
target->predecessor = u;
if (target->handle == NOHANDLE) {
// also caches heuristic
enqueue(target);
} else {
// NOTE: avoid updating heuristic here, because it
// breaks the queue invariant if heuristic increased
// more than decrease in distance
assert(target->cost() < cost_prior);
_queue.increase(target->handle);
}
}
}
}
return nullptr;
}
};
int main() {
Graph graph("input.txt", "heur.txt");
Node arrival = graph.search(0, 7);
std::cout << "reverse path: \n";
for (Node n = arrival; n != nullptr; n = n->predecessor) {
std::cout << " - " << graph.id(n) << " cost " << n->cost() << "\n";
}
}
Again, printing the expected
reverse path:
- 7 cost 7
- 1 cost 17.5
- 2 cost 100.5
- 0 cost 7
Note how the heuristics changed the cost, but not optimal path in this case.
I'm looking for efficient memory allocation when dealing with recursive function. As far as I understand, variables I use in the function will remain allocated in memory until recursion is finished. Is there a way to avoid this as I believe this causes slow run of my code below where state variable is copied every time the function is called (correct me if I'm wrong as I'm new to C++).
#include <fstream>
#include <vector>
using namespace std;
int N = 30;
double MIN_COST = 1000000;
vector<int> MIN_CUT = {};
void minCut(vector<int> state, int index, int nodeValue) {
double currentCost;
if (index >= 0) {
currentCost = getCurrentCost(state); // some magic evaluating state cost
state.push_back(nodeValue);
if (currentCost >= MIN_COST) { // kill branch if incomplete solution is already worse than best achieved solution
return;
}
}
if (index == N - 1) { // check if leaf node
if (currentCost < MIN_COST) {
MIN_COST = currentCost;
MIN_CUT = state;
}
return;
}
minCut(state, index + 1, 1); // left subtree - adding 1 to vector
minCut(state, index + 1, 0); // right subtree - adding 0 to vector
return;
}
int main() {
vector<int> state = {};
minCut(state, -1, NULL);
cout << MIN_COST << "\n";
return 0;
}
Your algorithm is effectively building a tree of paths, but you're using a vector to hold the nodes for each path.
A
/ \
/ \
B C
/ \ / \
D E F G
This is the tree you're traversing.
But you're creating new vectors at every node, which contain the whole path up to that node. So as you're visiting node G, in your stack you have 3 vectors:
vector { A, C, G }
vector { A, C }
vector { A }
It should be clear how this is less efficient as you have noticed, but maybe seeing it this way hints at the correct efficient implementation.
The call stack itself holds the path to the root node. The stack when visiting G would be something like
minCut < visiting G >
minCut < visiting C >
minCut < visiting A >
In order to efficiently exploit this fact, make minCut pass the minimum amount of information. In this case we're talking about something linked-list like.
You have then two options that jump out:
Use vector, but:
Pass it by reference.
And you must then maintain it across calls, pushing and popping nodes to keep synchronized with the actual state.
Use an actual linked list. It should be easy to construct the vector by traversing pointers-to-parent-nodes.
Yes, there is a more efficient way to pass state through each function call. This is called passing by reference and can be achieved like so:
void minCut(vector<int>& state, int index, int nodeValue) { ...
This will result in the original state being referenced instead of copied each time the function is called.
For this to work correctly in the code you posted you will have to make some modifications, this is just the general concept.
I was learning segment tree from this page : http://letuskode.blogspot.com/2013/01/segtrees.html
I am finding trouble to understand various fragments of code.I will ask them one by one.Any help will be appreciated.
Node declaration:
struct node{
int val;
void split(node& l, node& r){}
void merge(node& a, node& b)
{
val = min( a.val, b.val );
}
}tree[1<<(n+1)];
1.What will the split function do here ?
2.This code is used for RMQ . So I think that val will be the minimum of two segments and store it in some other segment.Where the value will be saved?
Range Query Function:
node range_query(int root, int left_most_leaf, int right_most_leaf, int u, int v)
{
//query the interval [u,v), ie, {x:u<=x<v}
//the interval [left_most_leaf,right_most_leaf) is
//the set of all leaves descending from "root"
if(u<=left_most_leaf && right_most_leaf<=v)
return tree[root];
int mid = (left_most_leaf+right_most_leaf)/2,
left_child = root*2,
right_child = left_child+1;
tree[root].split(tree[left_child], tree[right_child]);
//node l=identity, r=identity;
//identity is an element such that merge(x,identity) = merge(identity,x) = x for all x
if(u < mid) l = range_query(left_child, left_most_leaf, mid, u, v);
if(v > mid) r = range_query(right_child, mid, right_most_leaf, u, v);
tree[root].merge(tree[left_child],tree[right_child]);
node n;
n.merge(l,r);
return n;
}
1.What is the use of the array tree and what values will be kept there ?
2.What will this statement : tree[root].split(tree[left_child], tree[right_child]); do ?
3.What will those statements do ? :
node n;
n.merge(l,r);
return n;
Update and Merge Up functions:
I am not understanding those two functions properly:
void mergeup(int postn)
{
postn >>=1;
while(postn>0)
{
tree[postn].merge(tree[postn*2],tree[postn*2+1]);
postn >>=1;
}
}
void update(int pos, node new_val)
{
pos+=(1<<n);
tree[pos]=new_val;
mergeup(pos);
}
Also what should I write inside the main function to make this thing work?
Suppose I have an array A={2,3,2,4,20394,21,-132,2832} , How I can use this code to find RMQ(1,4)?
1.What will the split function do here ?
Nothing: the function body is empty. There may be some other implementation where an action is required. (See Example 3) And see answer to 2b
2.... Where the value will be saved?
In the "val" field of the class/struct for which "merge" is called.
1b.What is the use of the array tree and what values will be kept there ?
Array "node tree[...]" stores all the nodes of the tree. Its element type is "struct node".
2b.What will this statement : tree[root].split(tree[left_child], tree[right_child]); do ?
It calls split for the struct node that's stored at index root, passing the nodes of the split node's children to it. What it actually does to tree[root] depends on the implementation of "split".
3b.What will those statements do ? :
node n; // declare a new node
n.merge(l,r); // call merge - store the minimum of l.val, r.val into n.val
return n; // terminate the function and return n
I'll have to figure out the answers to your last Qs in the context of that code. Will take a little time.
Later
This should build a tree and do a range query. I'm not sure that the code on that page is correct. Anyway, the interface for range_query is not what you'd expect for easy usage.
int main(){
int a[] = { -132, 1, 2, 3, 4, 21, 2832, 20394};
for( int i = 0; i < 8; i++ ){
node x;
x.val = a[i];
update( i, x);
}
node y = range_query(0, 8, 15, 8 + 1, 8 + 4 );
}
I'm new to C++ so am still learning. I am trying to write an algorithm to build a tree recursively, I would usually write it according to Method 1 below, however, as when the function returns it makes a (I hope deep) copy of the RandomTreeNode, I am concerned about calling it recursively and would therefore prefer method 2. Am I correct in my thinking?
Method 1
RandomTreeNode build_tree(std::vector<T>& data, const std::vector<funcion_ptr>& functions){
if(data.size() == 0 || data_has_same_values(data)){
RandomeTreeNode node = RandomTreeNode();
node.setData(node);
return node;
}
RandomTreeNode parent = RandomTreeNode();
vector<T> left_data = split_data_left(data);
vector<T> right_data = split_data_right(data);
parent.set_left_child(build_tree(left_data));
parent.set_right_child(build_tree(right_data));
return parent;
}
Method 2
void build_tree(RandomTreeNode& current_node, vector<T> data){
if(data.size() == 0 || data_has_same_values(data)){
current_node.setData(node);
}
vector<T> left_data = split_data_left(data);
vector<T> right_data = split_data_right(data);
RandomTreeNode left_child = RandomTreeNode();
RandomTreeNode right_child = RandomTreeNode();
current_node.set_left_child(left_child);
current_node.set_right_child(right_child);
build_tree(left_child, left_data);
build_tree(right_child, right_data);
}
There are several improvements.
First, you're copying a vector. As I understand the name of your functions, you're splitting a vector in two blocks ([left|right] and not [l|r|lll|r|...]). So, instead of passing a vector each time, you can just pass index to specify the ranges.
The method 2, if well implemented, will be more efficient in memory. So, you should improve the idea behind it.
Last, you can use an auxilliary function, which will be more suited to the problem (a mix between method 1 and method 2).
Here is some sample code:
// first is inclusive
// last is not inclusive
void build_tree_aux(RandomTreeNode& current_node, std::vector<T>& data, int first, int last)
{
if(last == first || data_has_same_values(data,first,last))
{
current_node.setData(data,first,last);
// ...
}
// Find new ranges
int leftFirst = first;
int leftLast = split_data(data,first,last);
int rightFirst = leftLast;
int rightLast = last;
// Instead of copying an empty node, we create the children
// of current_node, and then process these nodes
current_node.build_left_child();
current_node.build_right_child();
// Recursion, left_child() and right_child() returns reference
build_tree_aux(current_node.left_child(),data,leftFirst,leftLast);
build_tree_aux(current_node.right_child(),data,rightFirst,rightLast);
/*
// left_child() and right_child() are not really breaking encapsulation,
// because you can consider that the child nodes are not really a part of
// a node.
// But if you want, you can do the following:
current_node.build_tree(data,leftFirst,leftLast);
// Where RandomTreeNode::build_tree simply call build_tree_aux on the 2 childrens
*/
}
RandomTreeNode build_tree(std::vector<T>& data)
{
RandomTreeNode root;
build_tree_aux(root,data,0,data.size());
return root;
}
I have written an algorithm for finding nth smallest element in BST but it returns root node instead of the nth smallest one. So if you input nodes in order 7 4 3 13 21 15, this algorithm after call find(root, 0) returns Node with value 7 instead of 3, and for call find(root, 1) it returns 13 instead of 4. Any thoughts ?
Binode* Tree::find(Binode* bn, int n) const
{
if(bn != NULL)
{
find(bn->l, n);
if(n-- == 0)
return bn;
find(bn->r, n);
}
else
return NULL;
}
and definition of Binode
class Binode
{
public:
int n;
Binode* l, *r;
Binode(int x) : n(x), l(NULL), r(NULL) {}
};
It is not possible to efficiently retrieve the n-th smallest element in a binary search tree by itself. However, this does become possible if you keep in each node an integer indicating the number of nodes in its entire subtree. From my generic AVL tree implementation:
static BAVLNode * BAVL_GetAt (const BAVL *o, uint64_t index)
{
if (index >= BAVL_Count(o)) {
return NULL;
}
BAVLNode *c = o->root;
while (1) {
ASSERT(c)
ASSERT(index < c->count)
uint64_t left_count = (c->link[0] ? c->link[0]->count : 0);
if (index == left_count) {
return c;
}
if (index < left_count) {
c = c->link[0];
} else {
c = c->link[1];
index -= left_count + 1;
}
}
}
In the above code, node->link[0] and node->link[1] are the left and right child of node, and node->count is the number of nodes in the entire subtree of node.
The above algorithm has O(logn) time complexity, assuming the tree is balanced. Also, if you keep these counts, another operation becomes possible - given a pointer to a node, it is possible to efficiently determine its index (the inverse of the what you asked for). In the code I linked, this operation is called BAVL_IndexOf().
Be aware that the node counts need to be updated as the tree is changed; this can be done with no (asymptotic) change in time complexity.
There are a few problems with your code:
1) find() returns a value (the correct node, assuming the function is working as intended), but you don't propagate that value up the call chain, so top-level calls don't know about the (possible) found element
.
Binode* elem = NULL;
elem = find(bn->l, n);
if (elem) return elem;
if(n-- == 0)
return bn;
elem = find(bn->r, n);
return elem; // here we don't need to test: we need to return regardless of the result
2) even though you do the decrement of n at the right place, the change does not propagate upward in the call chain. You need to pass the parameter by reference (note the & after int in the function signature), so the change is made on the original value, not on a copy of it
.
Binode* Tree::find(Binode* bn, int& n) const
I have not tested the suggested changes, but they should put you in the right direction for progress