AVL tree balance factor - c++

I have an AVL tree class, and I want to find the balance factor of each node (balance_factor: node->Left_child->height - node->right_Child->height).
Here is my code:
int tree::findBalanceFactor(node p){
    int a;
    if( p.lchild) p.lchild->balance_factor=findBalanceFactor( *p.lchild );
    if( p.rchild) p.rchild->balance_factor=findBalanceFactor( *p.rchild );
    if( p.rchild && p.lchild ) a=p.balance_factor = p.lchild->height - p.rchild->height ;
    if( p.rchild && !p.lchild ) a=p.balance_factor = 0 - p.rchild->height;
    if( !p.rchild && p.lchild ) a=p.balance_factor = p.lchild->height;
    if( !p.rchild && !p.lchild ) a=p.balance_factor = 0;
    cout << "md" << a << endl;
    return a;
}
In the main function, when I print root->balance_factor, it always shows zero. balance_factor is a public member variable, and the constructor initializes it to zero.
What is wrong with my code?

There's a much simpler way to do this than testing every permutation of lchild and rchild:
int tree::findBalanceFactor(node &n) {
    int lheight = 0;
    int rheight = 0;
    if (n.lchild) {
        findBalanceFactor(*n.lchild);
        lheight = n.lchild->height;
    }
    if (n.rchild) {
        findBalanceFactor(*n.rchild);
        rheight = n.rchild->height;
    }
    n.balance_factor = lheight - rheight;
    std::cout << "md " << n.balance_factor << std::endl;
    return n.balance_factor;
}
Since this otherwise seems to have ended up as an all-code answer, I'll add a brief note on how to get from the original code to this.
On one level, it's trivial to observe that each of the four branches in the original has the same form (left - right), but with left=0 whenever lchild is null, and right=0 whenever rchild is null.
More broadly, it's really useful to look for this kind of pattern (i.e., that each branch has essentially the same expression). Writing out truth tables, or otherwise partitioning your state space on paper, can help clarify these patterns in more complex code.
You should always aim to know what the general case is - whether because you implemented that first, or because you were able to factor it back out of several specific cases. Often implementing the general case will be good enough anyway, as well as being the easiest version of the logic to understand.
If the general case isn't good enough for some reason, then being easy to understand means it is still a good comment, as it provides a point of comparison for the special cases you actually implement.
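As a minimal sketch of that general case: a single helper can supply the 0 for a missing child, so only one expression remains. The node layout, the names height_of and update_balance, and the convention that a leaf has height 1 are assumptions for illustration; the asker's real class is not shown. Unlike the code above, this sketch also recomputes height on the way back up.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical node layout; the asker's real class is not shown.
struct node {
    node* lchild = nullptr;
    node* rchild = nullptr;
    int height = 1;          // assumed convention: a leaf has height 1
    int balance_factor = 0;
};

// Height of a possibly missing subtree: an absent child counts as 0.
int height_of(const node* n) { return n ? n->height : 0; }

// Post-order pass that refreshes height and balance_factor for every node.
int update_balance(node& n) {
    if (n.lchild) update_balance(*n.lchild);
    if (n.rchild) update_balance(*n.rchild);
    const int lh = height_of(n.lchild);
    const int rh = height_of(n.rchild);
    n.height = 1 + std::max(lh, rh);
    n.balance_factor = lh - rh;   // the single general expression
    // Sanity check: |balance_factor| is always below the node's height.
    assert(n.balance_factor < n.height && -n.balance_factor < n.height);
    return n.balance_factor;
}
```

Calling update_balance(*root) once after building the tree fills in every node, because the parameter is a reference and no copies are made.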

I am guessing that the balance_factor of the root node is always 0 because of these 2 lines of code in the tree::findBalanceFactor method:
if( p.lchild) p.lchild->balance_factor=findBalanceFactor( *p.lchild );
if( p.rchild) p.rchild->balance_factor=findBalanceFactor( *p.rchild );
I suppose that the node struct/class looks something like this:
struct node {
    struct node *lchild;
    struct node *rchild;
    int balance_factor;
    int height;
};
What happens in findBalanceFactor( *p.lchild ) and findBalanceFactor( *p.rchild ) is that, because the parameter is declared as node p (pass by value), each call works on a new copy of the node it is given. Assignments to p.balance_factor inside the call therefore change only that copy. The same applies to the root passed in from main: nothing ever writes to the original root, so its balance_factor stays at the 0 set by the constructor.
The solution will be to modify the tree::findBalanceFactor method to take in pointers to node, like this (I've taken the liberty to prettify the code a little):
int tree::findBalanceFactor(node *p) {
    int a;
    if (p->lchild) {
        findBalanceFactor(p->lchild);
    }
    if (p->rchild) {
        findBalanceFactor(p->rchild);
    }
    if (p->rchild && p->lchild) {
        a = p->balance_factor = p->lchild->height - p->rchild->height;
    } else if (p->rchild && !p->lchild) {
        a = p->balance_factor = 0 - p->rchild->height;
    } else if (!p->rchild && p->lchild) {
        a = p->balance_factor = p->lchild->height;
    } else {
        // this is the case for !p->rchild && !p->lchild
        a = p->balance_factor = 0;
    }
    cout << "md" << a << endl;
    return a;
}
For p->lchild and p->rchild, we do not need to set their balance_factor another time, since the balance_factor of each node is already set in one of the 4 possible cases of the very long if statement.
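To see the copy problem in isolation, here is a small sketch that contrasts a by-value parameter with a pointer parameter. The node struct repeats the guessed layout, and the function names set_by_value and set_by_pointer are hypothetical.

```cpp
#include <cassert>

// Hypothetical minimal node, mirroring the guessed struct.
struct node {
    node* lchild = nullptr;
    node* rchild = nullptr;
    int balance_factor = 0;
    int height = 1;
};

// Takes the node by value: the assignment changes a local copy only.
int set_by_value(node p) {
    p.balance_factor = 5;
    return p.balance_factor;
}

// Takes a pointer: the assignment reaches the caller's node.
int set_by_pointer(node* p) {
    assert(p != nullptr);
    p->balance_factor = 5;
    return p->balance_factor;
}
```

Both functions return 5, but only set_by_pointer leaves the caller's node changed. A reference parameter (node&) behaves like the pointer version.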

Boost Fibonacci Heap Access Violation during pop()

Context
I'm currently implementing some form of A* algorithm. I decided to use boost's fibonacci heap as underlying priority queue.
My Graph is being built while the algorithm runs. As Vertex object I'm using:
class Vertex {
public:
Vertex(double, double);
double distance = std::numeric_limits<double>::max();
double heuristic = 0;
HeapData* fib;
Vertex* predecessor = nullptr;
std::vector<Edge*> adj;
double euclideanDistanceTo(Vertex* v);
};
My Edge looks like:
class Edge {
public:
Edge(Vertex*, double);
Vertex* vertex = nullptr;
double weight = 1;
};
In order to use Boost's Fibonacci heap, I've read that one should create a heap data object, which I did like this:
struct HeapData {
Vertex* v;
boost::heap::fibonacci_heap<HeapData>::handle_type handle;
HeapData(Vertex* u) {
v = u;
}
bool operator<(HeapData const& rhs) const {
return rhs.v->distance + rhs.v->heuristic < v->distance + v->heuristic;
}
};
Note that I included both the heuristic and the actual distance in the comparator to get the A* behaviour I want.
My actual A* implementation looks like this:
boost::heap::fibonacci_heap<HeapData> heap;
HeapData fibs(startPoint);
startPoint->distance = 0;
startPoint->heuristic = getHeuristic(startPoint);
auto handles = heap.push(fibs);
(*handles).handle = handles;
while (!heap.empty()) {
HeapData u = heap.top();
heap.pop();
if (u.v->equals(endPoint)) {
return;
}
doSomeGraphCreationStuff(u.v); // this only creates vertices and edges
for (Edge* e : u.v->adj) {
double newDistance = e->weight + u.v->distance;
if (e->vertex->distance > newDistance) {
e->vertex->distance = newDistance;
e->vertex->predecessor = u.v;
if (!e->vertex->fib) {
if (!u.v->equals(endPoint)) {
e->vertex->heuristic = getHeuristic(e->vertex);
}
e->vertex->fib = new HeapData(e->vertex);
e->vertex->fib->handle = heap.push(*(e->vertex->fib));
}
else {
heap.increase(e->vertex->fib->handle);
}
}
}
}
Problem
The algorithm runs just fine if I use a very small heuristic (which degenerates A* to Dijkstra). If I introduce a stronger heuristic, however, the program throws an exception stating:
0xC0000005: Access violation writing location 0x0000000000000000.
in the unlink method of Boost's circular_list_algorithm.hpp. For some reason, next and prev are null. This is a direct consequence of calling heap.pop().
Note that heap.pop() works fine for several times and does not crash immediately.
Question
What causes this problem and how can I fix it?
What I have tried
My first thought was that I accidentally called increase() even though distance + heuristic got bigger instead of smaller (according to the documentation, this can break stuff). This is not possible in my implementation, however, because I only change a node if its distance got smaller, and the heuristic stays constant. I tried to use update() instead of increase() anyway, without success.
I tried to set several break points to get a more detailed view, but my data set is huge and I fail to reproduce it with smaller sets.
Additional Information
Boost Version: 1.76.0
C++14
The increase function is indeed the right one (rather than decrease), since all Boost heaps are implemented as max-heaps. We get a min-heap by reversing the comparator and swapping the roles of increase and decrease.
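The reversed-comparator convention can be seen in isolation with std::priority_queue, which is also a max-heap. It is used here only as a stand-in: it has no handles and no increase/decrease operations, and the Item, CheapestFirst, MinQueue and drain names are hypothetical.

```cpp
#include <cassert>
#include <limits>
#include <queue>
#include <vector>

// Hypothetical queue entry; only the cost matters for ordering.
struct Item {
    int id;
    double cost;
};

// Reversed comparison: a has LOWER priority than b when a costs MORE.
// A max-heap ordered this way keeps the cheapest item on top.
struct CheapestFirst {
    bool operator()(const Item& a, const Item& b) const {
        return a.cost > b.cost;
    }
};

using MinQueue = std::priority_queue<Item, std::vector<Item>, CheapestFirst>;

// Pops every item and returns the ids in the order they left the queue.
std::vector<int> drain(MinQueue q) {
    std::vector<int> order;
    double last = -std::numeric_limits<double>::infinity();
    while (!q.empty()) {
        assert(q.top().cost >= last);  // costs leave in non-decreasing order
        last = q.top().cost;
        order.push_back(q.top().id);
        q.pop();
    }
    return order;
}
```

With this ordering, lowering an item's cost raises its priority, which is why the mutable Boost heaps need increase() rather than decrease() in that case.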
Okay, prepare for a ride.
First I found a bug
Next, I fully reviewed, refactored and simplified the code
When the dust settled, I noticed a behaviour change that looked like a potential logic error in the code
1. The Bug
Like I commented at the question, the code complexity is high due to over-reliance on raw pointers without clear semantics.
While I was reviewing and refactoring the code, I found that this has, indeed, led to a bug:
e->vertex->fib = new HeapData(e->vertex);
e->vertex->fib->handle = heap.push(*(e->vertex->fib));
In the first line you create a HeapData object. You make the fib member point to that object.
The second line inserts a copy of that object (meaning, it's a new object, with a different object identity, or practically speaking: a different address).
So, now
e->vertex->fib points to a (leaked) HeapData object that does not exist in the queue, and
the actual queued HeapData copy has a default-constructed handle member, which means that the handle wraps a null pointer. (Check boost::heap::detail::node_handle<> in detail/stable_heap.hpp to verify this).
This would handsomely explain the symptom you are seeing.
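The copy problem does not depend on Boost. Here is a minimal sketch with std::list standing in for the heap; the Entry type, its queued_at member (playing the role of the handle) and the push_copy function are hypothetical.

```cpp
#include <cassert>
#include <list>

// Hypothetical stand-in for HeapData: it remembers where it was queued.
struct Entry {
    int id = 0;
    const Entry* queued_at = nullptr;  // plays the role of the handle
};

// Mirrors the buggy pattern: allocate an object, push a COPY of it,
// then record the location of the copy on the original only.
// Returns the heap-allocated original; the caller owns it.
Entry* push_copy(std::list<Entry>& queue, int id) {
    Entry* original = new Entry;
    original->id = id;
    queue.push_back(*original);           // stores a distinct copy
    original->queued_at = &queue.back();  // only the original learns this
    assert(original != &queue.back());    // two different objects
    return original;
}
```

After the call, the queued copy still has a null queued_at, just as the queued HeapData copy keeps its default-constructed handle.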
2. Refactor, Simplify
So, after understanding the code I have come to the conclusion that
HeapData and Vertex should be merged: HeapData only served to link a handle to a Vertex, but you can already make the Vertex contain a Handle directly.
As a consequence of this merge
your vertex queue now actually contains vertices, expressing intent of the code
you reduce all of the vertex access by one level of indirection (reducing Law Of Demeter violations)
you can write the push operation in one natural line, removing the room for your bug to crop up. Before:
target->fib = new HeapData(target);
target->fib->handle = heap.push(*(target->fib));
After:
target->fibhandle = heap.push(target);
Your Edge class doesn't actually model an edge, but rather an "adjacency" - the target part of the edge, with the weight attribute.
I renamed it OutEdge for clarity and also changed the vector to contain values instead of dynamically allocated OutEdge instances.
I can't tell from the code shown, but I can almost guarantee these were being leaked.
Also, OutEdge is only 16 bytes on most platforms, so copying them will be fine, and adjacencies are by definition owned by their source vertex (because including/moving it to another source vertex would change the meaning of the adjacency).
In fact, if you're serious about performance, you may want to use a boost::container::small_vector with a suitably chosen capacity if you know that e.g. the median number of edges is "small"
Your comparison can be "outsourced" to a function object
using Node = Vertex*;
struct PrioCompare {
bool operator()(Node a, Node b) const;
};
After which the heap can be typed as:
namespace bh = boost::heap;
using Heap = bh::fibonacci_heap<Node, bh::compare<PrioCompare>>;
using Handle = Heap::handle_type;
Your cost function violated more Law-Of-Demeter, which was easily fixed by adding a Literate-Code accessor:
Cost cost() const { return distance + heuristic; }
From quick inspection I think it would be more accurate to use infinity() over max() as the initial distance. Also, use a constant for readability:
static constexpr auto INF = std::numeric_limits<Cost>::infinity();
Cost distance = INF;
You had a repeated check for xyz->equals(endPoint) to avoid updating the heuristic for a vertex. I suggest moving the update till after vertex dequeue, so the repetition can be gone (of both the check and the getHeuristic(...) call).
Like you said, we need to tread carefully around the increase/update fixup methods. As I read your code, the priority of a node is inversely related to the "cost" (cumulative edge-weight and heuristic values).
Because Boost Heap heaps are max heaps this implies that increasing the priority should match decreasing cost. We can just assert this to detect any programmer error in debug builds:
assert(target->cost() < previous_cost);
heap.increase(target->fibhandle);
With these changes in place, the code can read a lot quieter:
Cost AStarSearch(Node start, Node destination) {
Heap heap;
start->distance = 0;
start->fibhandle = heap.push(start);
while (!heap.empty()) {
Node u = heap.top();
heap.pop();
if (u->equals(destination)) {
return u->cost();
}
u->heuristic = getHeuristic(u);
doSomeGraphCreationStuff(u);
for (auto& [target, weight] : u->adj) {
auto curDistance = weight + u->distance;
// if cheaper route, queue or update queued
if (curDistance < target->distance) {
auto cost_prior = target->cost();
target->distance = curDistance;
target->predecessor = u;
if (target->fibhandle == NOHANDLE) {
target->fibhandle = heap.push(target);
} else {
assert(target->cost() < cost_prior);
heap.update(target->fibhandle);
}
}
}
}
return INF;
}
2(b) Live Demo
Adding some test data:
Live On Coliru
#include <boost/heap/fibonacci_heap.hpp>
#include <cassert>
#include <iostream>
#include <limits>
#include <vector>
using Cost = double;
struct Vertex;
Cost getHeuristic(Vertex const*) { return 0; }
void doSomeGraphCreationStuff(Vertex const*) {
// this only creates vertices and edges
}
struct OutEdge { // adjacency from implied source vertex
Vertex* target = nullptr;
Cost weight = 1;
};
namespace bh = boost::heap;
using Node = Vertex*;
struct PrioCompare {
bool operator()(Node a, Node b) const;
};
using Heap = bh::fibonacci_heap<Node, bh::compare<PrioCompare>>;
using Handle = Heap::handle_type;
static const Handle NOHANDLE{}; // for expressive comparisons
static constexpr auto INF = std::numeric_limits<Cost>::infinity();
struct Vertex {
Vertex(Cost d = INF, Cost h = 0) : distance(d), heuristic(h) {}
Cost distance = INF;
Cost heuristic = 0;
Handle fibhandle{};
Vertex* predecessor = nullptr;
std::vector<OutEdge> adj;
Cost cost() const { return distance + heuristic; }
Cost euclideanDistanceTo(Vertex* v);
bool equals(Vertex const* u) const { return this == u; }
};
// Now Vertex is a complete type, implement comparison
bool PrioCompare::operator()(Node a, Node b) const {
return a->cost() > b->cost();
}
Cost AStarSearch(Node start, Node destination) {
Heap heap;
start->distance = 0;
start->fibhandle = heap.push(start);
while (!heap.empty()) {
Node u = heap.top();
heap.pop();
if (u->equals(destination)) {
return u->cost();
}
u->heuristic = getHeuristic(u);
doSomeGraphCreationStuff(u);
for (auto& [target, weight] : u->adj) {
auto curDistance = weight + u->distance;
// if cheaper route, queue or update queued
if (curDistance < target->distance) {
auto cost_prior = target->cost();
target->distance = curDistance;
target->predecessor = u;
if (target->fibhandle == NOHANDLE) {
target->fibhandle = heap.push(target);
} else {
assert(target->cost() < cost_prior);
heap.update(target->fibhandle);
}
}
}
}
return INF;
}
int main() {
// a very very simple graph data structure with minimal helpers:
std::vector<Vertex> graph(10);
auto node = [&graph](int id) { return &graph.at(id); };
auto id = [&graph](Vertex const* node) { return node - graph.data(); };
// defining 6 edges
graph[0].adj = {{node(2), 1.5}, {node(3), 15}};
graph[2].adj = {{node(4), 2.5}, {node(1), 5}};
graph[1].adj = {{node(7), 0.5}};
graph[7].adj = {{node(3), 0.5}};
// do a search
Node startPoint = node(0);
Node endPoint = node(7);
Cost cost = AStarSearch(startPoint, endPoint);
std::cout << "Overall cost: " << cost << ", reverse path: \n";
for (Node node = endPoint; node != nullptr; node = node->predecessor) {
std::cout << " - " << id(node) << " distance " << node->distance
<< "\n";
}
}
Prints
Overall cost: 7, reverse path:
- 7 distance 7
- 1 distance 6.5
- 2 distance 1.5
- 0 distance 0
3. The Plot Twist: Lurking Logic Errors?
I felt uneasy about moving the getHeuristic() update around. I wondered
whether I might have changed the meaning of the code, even though the control
flow seemed to check out.
And then I realized that indeed the behaviour changed. It is subtle. At first I thought
the old behaviour was just problematic. So, let's analyze:
The source of the risk is an inconsistency in node visitation vs. queue prioritization.
When visiting nodes, the condition to see whether the target became "less
distant" is expressed in terms of distance only.
However, the queue priority will be based on cost, which is different
from distance in that it includes any heuristics.
The problem lurking there is that it is possible to write code where the
fact that distance decreases NEED NOT guarantee that cost decreases.
Going back to the code, we can see that this is narrowly avoided, because the
getHeuristic update is only executed in the non-update path of the code.
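A small numeric sketch of that risk, using made-up numbers and a hypothetical helper name: if the heuristic were recomputed during an update and came out larger, a shorter distance could still yield a higher cost.

```cpp
#include <cassert>

using Cost = double;

// Hypothetical helper: the priority value used by the queue.
Cost cost(Cost distance, Cost heuristic) { return distance + heuristic; }

// True when relaxing to new_distance keeps the "cost decreased" invariant
// that Heap::increase relies on with a reversed (min-heap) comparator.
bool relaxation_keeps_invariant(Cost old_distance, Cost old_heuristic,
                                Cost new_distance, Cost new_heuristic) {
    assert(new_distance < old_distance);  // relaxation only shortens distance
    return cost(new_distance, new_heuristic) <
           cost(old_distance, old_heuristic);
}
```

With distance 10 -> 9 and a constant heuristic of 1, the cost drops from 11 to 10 and the invariant holds. If the heuristic changed from 1 to 3 at the same time, the cost would rise from 11 to 12 and calling increase() would be wrong.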
In Conclusion
Understanding this made me realize that
the Vertex::heuristic field is intended merely as a "cached" version of the getHeuristic() function call
implying that that function is treated as if it is idempotent
that my version did change behaviour in that getHeuristic was now
potentially executed more than once for the same vertex (if visited again
via a cheaper path)
I would suggest to fix this by
renaming the heuristic field to cachedHeuristic
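making an enqueue function to encapsulate the three steps for enqueuing a vertex (checking that it is not already queued, caching its heuristic, and pushing it while storing the returned handle)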
simply omitting the endpoint check because it can at MOST eliminate a single invocation of getHeuristic for that node, probably not worth the added complexity
add a comment pointing out the subtlety of that code path
UPDATE: as discovered in the comments, we also need the inverse
operation (dequeue) to symmetrically update the handle so it reflects that
the node is no longer in the queue...
It also drives home the usefulness of having the precondition assert added before invoking Heap::increase.
Final Listing
With the above changes
encapsulated into a Graph object, that
also reads the graph from input like:
0 2 1.5
0 3 15
2 4 2.5
2 1 5
1 7 0.5
7 3 0.5
Where each line contains (source, target, weight).
A separate file can contain heuristic values for vertices index [0, ...),
optionally newline-separated, e.g. "7 11 99 33 44 55"
and now returning the arrived-at node instead of its cost only
Live On Coliru
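#include <boost/heap/fibonacci_heap.hpp>
#include <algorithm>
#include <cassert>
#include <deque>
#include <fstream>
#include <iostream>
#include <iterator>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>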
using Cost = double;
struct Vertex;
struct OutEdge { // adjacency from implied source vertex
Vertex* target = nullptr;
Cost weight = 1;
};
namespace bh = boost::heap;
using Node = Vertex*;
struct PrioCompare {
bool operator()(Node a, Node b) const;
};
using MutableQueue = bh::fibonacci_heap<Node, bh::compare<PrioCompare>>;
using Handle = MutableQueue::handle_type;
static const Handle NOHANDLE{}; // for expressive comparisons
static constexpr auto INF = std::numeric_limits<Cost>::infinity();
struct Vertex {
Vertex(Cost d = INF, Cost h = 0) : distance(d), cachedHeuristic(h) {}
Cost distance = INF;
Cost cachedHeuristic = 0;
Handle handle{};
Vertex* predecessor = nullptr;
std::vector<OutEdge> adj;
Cost cost() const { return distance + cachedHeuristic; }
Cost euclideanDistanceTo(Vertex* v);
};
// Now Vertex is a complete type, implement comparison
bool PrioCompare::operator()(Node a, Node b) const {
return a->cost() > b->cost();
}
class Graph {
std::vector<Cost> _heuristics;
Cost getHeuristic(Vertex* v) {
size_t n = id(v);
return n < _heuristics.size() ? _heuristics[n] : 0;
}
void doSomeGraphCreationStuff(Vertex const*) {
// this only creates vertices and edges
}
public:
Graph(std::string edgeFile, std::string heurFile) {
{
std::ifstream stream(heurFile);
_heuristics.assign(std::istream_iterator<Cost>(stream), {});
if (!stream.eof())
throw std::runtime_error("Unexpected heuristics");
}
std::ifstream stream(edgeFile);
size_t src, tgt;
double weight;
while (stream >> src >> tgt >> weight) {
_nodes.resize(std::max({_nodes.size(), src + 1, tgt + 1}));
_nodes[src].adj.push_back({node(tgt), weight});
}
if (!stream.eof())
throw std::runtime_error("Unexpected input");
}
Node search(size_t from, size_t to) {
assert(from < _nodes.size());
assert(to < _nodes.size());
return AStar(node(from), node(to));
}
size_t id(Node node) const {
// ugh, this is just for "pretty output"...
for (size_t i = 0; i < _nodes.size(); ++i) {
if (node == &_nodes[i])
return i;
}
throw std::out_of_range("id");
};
Node node(int id) { return &_nodes.at(id); };
private:
// simple graph data structure with minimal helpers:
std::deque<Vertex> _nodes; // reference stable when growing at the back
// search state
MutableQueue _queue;
void enqueue(Node n) {
assert(n && (n->handle == NOHANDLE));
// get heuristic before insertion!
n->cachedHeuristic = getHeuristic(n);
n->handle = _queue.push(n);
}
Node dequeue() {
Node node = _queue.top();
node->handle = NOHANDLE;
_queue.pop();
return node;
}
Node AStar(Node start, Node destination) {
_queue.clear();
start->distance = 0;
enqueue(start);
while (!_queue.empty()) {
Node u = dequeue();
if (u == destination) {
return u;
}
doSomeGraphCreationStuff(u);
for (auto& [target, weight] : u->adj) {
auto curDistance = u->distance + weight;
// if cheaper route, queue or update queued
if (curDistance < target->distance) {
auto cost_prior = target->cost();
target->distance = curDistance;
target->predecessor = u;
if (target->handle == NOHANDLE) {
// also caches heuristic
enqueue(target);
} else {
// NOTE: avoid updating heuristic here, because it
// breaks the queue invariant if heuristic increased
// more than decrease in distance
assert(target->cost() < cost_prior);
_queue.increase(target->handle);
}
}
}
}
return nullptr;
}
};
int main() {
Graph graph("input.txt", "heur.txt");
Node arrival = graph.search(0, 7);
std::cout << "reverse path: \n";
for (Node n = arrival; n != nullptr; n = n->predecessor) {
std::cout << " - " << graph.id(n) << " cost " << n->cost() << "\n";
}
}
Again, printing the expected
reverse path:
- 7 cost 7
- 1 cost 17.5
- 2 cost 100.5
- 0 cost 7
Note how the heuristics changed the cost, but not the optimal path in this case.

How does the Hill Climbing algorithm work?

I'm learning Artificial Intelligence from a book. The book only vaguely explains the code I'm about to post here, I assume because the author expects everyone to have seen a hill climbing algorithm before. The concept is rather straightforward, but I don't understand some of the code below, and I'd like someone to help me understand this algorithm a bit more clearly before I move on.
I commented next to the parts that confuse me most; a summary of what these lines are doing would be very helpful to me.
int HillClimb::CalcNodeDist(Node* A, Node* B)
{
int Horizontal = abs(A->_iX - B->_iX);
int Vertical = abs(A->_iY - B->_iY);
return(sqrt(pow(Horizontal, 2) + pow(Vertical, 2)));
}
void HillClimb::StartHillClimb()
{
BestDistance = VisitAllCities();
int CurrentDistance = BestDistance;
while (true)
{
int i = 0;
int temp = VisitAllCities();
while (i < Cities.size())
{
//Swapping the nodes
Node* back = Cities.back();
Cities[Cities.size() - 1] = Cities[i];
Cities[i] = back; // Why swap last city with first?
CurrentDistance = VisitAllCities(); // Why visit all nodes again?
if (CurrentDistance < BestDistance) // What is this doing?
{
BestDistance = CurrentDistance; //???
break;
}
else
{
back = Cities.back();
Cities[Cities.size() - 1] = Cities[i];
Cities[i] = back;
}
i++;
}
if (CurrentDistance == temp)
{
break;
}
}
}
int HillClimb::VisitAllCities()
{
int CurrentDistance = 0;
for (unsigned int i = 0; i < Cities.size(); i++)
{
if (i == Cities.size() - 1)//Check if last city, link back to first city
{
CurrentDistance += CalcNodeDist(Cities[i], Cities[0]);
}
else
{
CurrentDistance += CalcNodeDist(Cities[i], Cities[i + 1]);
}
}
return(CurrentDistance);
}
Also the book doesn't state what type of hill climb this is. I assume it's basic hill climb as it doesn't restart when it gets stuck?
Essentially, it does this in pseudo-code:
initialize an order of nodes (that is, a list) which represents a circle
do{
    find an element in the list so that switching it with the last element of the
    list results in a shorter length of the circle that is imposed by that list
}(until no such element could be found)
VisitAllCities is a helper that computes the length of that circle; CalcNodeDist is a helper that computes the distance between two nodes.
The outer while loop is what I called do-until; the inner while loop iterates over all elements.
The if (CurrentDistance < BestDistance) part simply checks whether changing the list by swapping results in a smaller length. If so, it updates the best distance; if not, it undoes the change.
Did I cover everything you wanted to know? Question about a particular part?
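For comparison, the pseudo-code above can be sketched as a small self-contained program. This is not the book's code: the City type and the function names are hypothetical, and distances are kept as double rather than truncated to int.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical city type; the book's Node class is not shown in full.
struct City {
    double x;
    double y;
};

// Length of the closed tour that visits the cities in list order.
double tour_length(const std::vector<City>& cities) {
    double total = 0.0;
    for (std::size_t i = 0; i < cities.size(); ++i) {
        const City& a = cities[i];
        const City& b = cities[(i + 1) % cities.size()];  // wrap to the start
        total += std::hypot(a.x - b.x, a.y - b.y);
    }
    return total;
}

// Repeatedly swap some city with the last one while that shortens the tour.
// Stops at a local optimum: no single swap with the last city helps.
double hill_climb(std::vector<City>& cities) {
    double best = tour_length(cities);
    bool improved = true;
    while (improved) {
        improved = false;
        for (std::size_t i = 0; i + 1 < cities.size(); ++i) {
            std::swap(cities[i], cities.back());
            const double candidate = tour_length(cities);
            if (candidate < best) {
                best = candidate;   // keep the swap and rescan from the start
                improved = true;
                break;
            }
            std::swap(cities[i], cities.back());  // undo a non-improving swap
        }
    }
    assert(std::fabs(best - tour_length(cities)) < 1e-9);
    return best;
}
```

Because it only ever accepts improving swaps and never restarts, this is plain hill climbing: it can stop at a local optimum that is not the shortest possible tour.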

Making a shell in c++, trying to create shell variables

So I have a relatively simple shell; it handles pipes, chdir, redirects and running programs. But I need a way of implementing shell variables like you would have in a normal shell (e.g. HELLO=world).
int main()
{
while(true)
{
string result;
char * left[128];
char * right[128];
cout << "$$ ";
char command[128];
cin.getline(command,128);
if(strlen(command) != 0)
{
vector<char*>args;
char* prog = strtok(command, " ");
char* tmp = prog;
while ( tmp != NULL )
{
args.push_back( tmp );
tmp = strtok( NULL, " " );
}
char** argv = new char*[args.size()+1];
for ( int k = 0; k < args.size(); k++ )
{
argv[k] = args[k];
}
argv[args.size()] = NULL;
if ( strcmp( command, "exit" ) == 0 )
{
return 0;
}
if(!strcmp(prog,"cd"))
{
chdir(argv[1]);
}
if(prog[0] == '.')
{
std::system(args[0]);
}
else
{
pid_t kidpid = fork();
if(kidpid < 0)
{
perror("Could not fork");
return -1;
}
else if (kidpid == 0)
{
execvp(prog,argv);
}
else
{
if(waitpid(kidpid,0,0) <0 )
{
return -1;
}
}
}
}
}
return 0;
}
Here's the shell in its simplest form; the function calls do pretty much what they say.
You need 3 things:
Parse FOO=foo variable assignments in the input lines
Parse $FOO variable references in the input lines, replace with value
Storage of the variable names and values
There are endless possibilities for how to do the last of these.
Single dynamic char array, all variables stored in a single string, with a magic character of your choice separating the entries: FOO=foo#BAR=baz#SPAM=eggs. Scales O(n) with the number of entries.
Dynamic array containing pairs of char pointers for variable names and values. Scales O(n).
Linked list, where you insert the above pairs in sorted order. Still scales O(n), because a linked list has no random access for a binary search; a sorted array would give O(log n) lookups.
Binary tree for above pairs, if unbalanced, scales between O(log n) and O(n).
Hash table. Scales O(1) on average.
Etc, etc, etc.
With a dynamic array, I mean that you always realloc the whole thing upon insertions.
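As one possible sketch of the three pieces together, here is the hash table option using std::unordered_map. The names VarTable, is_name_char, try_assign and expand are hypothetical, and the rules for what counts as a variable name are a simplifying assumption (no quoting, no ${NAME} form).

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_map>

// Hypothetical variable store: name -> value, O(1) average lookup.
using VarTable = std::unordered_map<std::string, std::string>;

// Treat letters, digits and '_' as variable-name characters.
bool is_name_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// If line has the form NAME=value, store it and return true.
bool try_assign(const std::string& line, VarTable& vars) {
    const std::size_t eq = line.find('=');
    if (eq == std::string::npos || eq == 0) return false;
    for (std::size_t i = 0; i < eq; ++i) {
        if (!is_name_char(line[i])) return false;
    }
    vars[line.substr(0, eq)] = line.substr(eq + 1);
    return true;
}

// Replace each $NAME with its value; unknown names expand to nothing.
std::string expand(const std::string& line, const VarTable& vars) {
    std::string out;
    std::size_t i = 0;
    while (i < line.size()) {
        if (line[i] != '$') {
            out += line[i++];
            continue;
        }
        std::size_t end = i + 1;
        while (end < line.size() && is_name_char(line[end])) ++end;
        assert(end > i);
        if (end == i + 1) {          // a lone '$' is kept literally
            out += line[i++];
            continue;
        }
        const auto it = vars.find(line.substr(i + 1, end - i - 1));
        if (it != vars.end()) out += it->second;
        i = end;
    }
    return out;
}
```

In the shell loop, the input line would go through try_assign first; if that returns false, expand would be applied before the line is tokenized.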
If you want to store the variables in your shell, you should look into getenv() and setenv(), declared in <stdlib.h>.
http://pubs.opengroup.org/onlinepubs/009695399/functions/setenv.html
http://pubs.opengroup.org/onlinepubs/009695399/functions/getenv.html
This avoids having to manage storage for the variables yourself in your C/C++ program (for example with STL containers). For example, you can set variables by
setenv("variablename", "value", 1);
Where the 1 turns on overwrite for the current variable if it exists. So in your example, we would use
setenv("HELLO", "world", 1);
You can also retrieve the value of the variable by using
char *value = getenv("variablename");
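You do not need to dynamically allocate value: getenv returns a pointer to storage owned by the environment, or a null pointer if the variable is not set.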
Note: These values persist for the life of the program that simulates the shell, after which they no longer exist.
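A minimal sketch of how the two calls could be wrapped for the shell, assuming a POSIX system. The function names assign_env and lookup_env are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>    // std::getenv
#include <stdlib.h>   // setenv (POSIX)
#include <string>

// Handle a NAME=value line by storing it in the process environment.
// Returns false when the line is not an assignment or setenv fails.
bool assign_env(const std::string& line) {
    const std::size_t eq = line.find('=');
    if (eq == std::string::npos || eq == 0) return false;
    const std::string name = line.substr(0, eq);
    const std::string value = line.substr(eq + 1);
    return setenv(name.c_str(), value.c_str(), 1) == 0;  // 1 = overwrite
}

// Look up a variable; getenv returns a null pointer when it is not set.
std::string lookup_env(const std::string& name) {
    assert(!name.empty());
    const char* value = std::getenv(name.c_str());
    return value ? std::string(value) : std::string();
}
```

One consequence of this design is that programs started with fork() and execvp() inherit these variables, because they live in the shell's own environment.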

Huffman Coding - Incorrect Codes

I'm trying to build a Huffman tree using an array. Every time I combine two nodes, I add the new node to the array and sort it. My code works for some test cases, but for others it produces the wrong codes. Can someone please point me in the right direction for debugging? Thanks!
Here is a segment of my compress function.
while(tree->getSize() != 1)
{
right = tree->getMinNode();
left = tree->getMinNode();
Node *top = new Node;
top->initializeNode((char)1, left->getFrequency() + right->getFrequency(), left, right);
tree->insertNode(top);
} // while
root = tree->getRootNode();
tree->encodeTree(root, number, 0);
tree->printCode(data);
The getMinNode() function returns the smallest node, and after I insert the node that combines the 2 smallest nodes, I use qsort to sort the array. This is the function I use to sort the array.
I am sorting first by frequency, then by data. If the node is not a leaf node, meaning it does not contain one of the characters present in the uncompressed data, I find the minimum data in the subtree using the function getMinData().
int Tree::compareNodes(const void *a, const void *b)
{
if( ((Node *)a)->frequency < ((Node *)b)->frequency )
return -1;
if( ((Node *)a)->frequency > ((Node *)b)->frequency )
return 1;
if( ((Node *)a)->frequency == ((Node *)b)->frequency )
{
if( ((Node *)a)->isLeafNode() && ((Node *)b)->isLeafNode() )
{
if( (int)((Node *)a)->data < (int)((Node *)b)->data )
return -1;
if( (int)((Node *)a)->data > (int)((Node *)b)->data )
return 1;
} // if
else
{
int minA, minB;
minA = (int)((Node *)a)->data;
minB = (int)((Node *)b)->data;
if(!((Node *)a)->isLeafNode())
getMinData(a, &minA);
if(!((Node *)b)->isLeafNode())
getMinData(b, &minB);
if(minA < minB)
return -1;
if(minA > minB)
return 1;
}// else
} // if
return 0;
} // compareNodes()
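For reference, the ordering described above (frequency first, then the smallest symbol in the subtree) can be sketched with std::sort instead of qsort. This is only an illustration of that ordering, not a diagnosis of the wrong codes: the HNode type, the cached min_symbol member, the function names, and treating symbols as unsigned char are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical node; the asker's Node class is only partly shown.
struct HNode {
    int frequency = 0;
    unsigned char min_symbol = 0;  // smallest symbol in this subtree
    HNode* left = nullptr;
    HNode* right = nullptr;
};

// Order by frequency first, then by the smallest symbol in the subtree.
bool comes_before(const HNode* a, const HNode* b) {
    if (a->frequency != b->frequency) return a->frequency < b->frequency;
    return a->min_symbol < b->min_symbol;
}

// Merge two subtrees; the parent inherits the smaller of the two minima,
// so no subtree walk is needed during comparisons.
HNode* combine(HNode* left, HNode* right) {
    assert(left && right);
    HNode* parent = new HNode;
    parent->frequency = left->frequency + right->frequency;
    parent->min_symbol = std::min(left->min_symbol, right->min_symbol);
    parent->left = left;
    parent->right = right;
    return parent;
}

// Sort a forest of node pointers into the order described above.
void sort_forest(std::vector<HNode*>& forest) {
    std::sort(forest.begin(), forest.end(), comes_before);
}
```

Caching the minimum in each internal node keeps the comparison cheap and makes the tie-break easy to inspect in a debugger.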
Say, for example, I have the following text.
I agree that Miss Emily Grierson is a symbol of the Old South. Her house and family traditions support this suggestion. However, I do not see her as a victim of the values of chivalry, formal manners, and tradition. I consider these values to have positive effects of a person rather have negative impacts. If for any reason that had made Emily isolate herself from her community and ultimately kill a man she likes, it would be herself. She acts as her own antagonist in the story because she does not have conflict with anyone else except herself. She makes herself become a “victim,” as in being friendless and miserable. The traditions and manners taught to her may have effects on her behavior but it is her attitude towards the outside world that separates her from the rest of the townspeople
\n
with the '\n' at the end. For some of the characters I get the correct Huffman codes, but for others I don't. ASCII 83 ('S'), 120 ('x') and 84 ('T') are some of the characters with the wrong codes. Thanks!

c++ directed graph depth first search

I am attempting to write a DFS method for a directed graph. Right now I am running into a segmentation fault, and I am really unsure as to where it is. From what I understand of directed graphs, I believe that my logic is right... but a fresh set of eyes would be a very nice help.
Here is my function:
void wdigraph::depth_first (int v) const {
static int fVertex = -1;
static bool* visited = NULL;
if( fVertex == -1 ) {
fVertex = v;
visited = new bool[size];
for( int x = 0; x < size; x++ ) {
visited[x] = false;
}
}
cout << label[v];
visited[v] = true;
for (int v = 0; v < adj_matrix.size(); v++) {
for( int x = 0; x < adj_matrix.size(); x++) {
if( adj_matrix[v][x] != 0 && visited[x] != false ) {
cout << " -> ";
depth_first(x);
}
if ( v == fVertex ) {
fVertex = -1;
delete [] visited;
visited = NULL;
}
}
}
}
class definition:
class wdigraph {
public:
wdigraph(int =NO_NODES); // default constructor
~wdigraph() {}; // destructor
int get_size() { return size; } // returns size of digraph
void depth_first(int) const;// traverses graph using depth-first search
void print_graph() const; // prints adjacency matrix of digraph
private:
int size; // size of digraph
vector<char> label; // node labels
vector< vector<int> > adj_matrix; // adjacency matrix
};
thanks!
You are deleting visited before the end of the program.
Coming back to the starting vertex doesn't mean you finished.
For example, consider the graph V = {1,2,3}, E = {(1,2),(2,1),(1,3)}: the edge (2,1) leads back to the start vertex 1 while vertex 3 is still unvisited.
Also, notice you are using v as the input parameter and also as the for-loop variable.
There are a few things you might want to consider. The first is that function level static variables are not usually a good idea, you can probably redesign and make those either regular variables (at the cost of extra allocations) or instance members and keep them alive.
The function assumes that the adjacency matrix is square, but the initialization code is not shown, so it should be checked. The assumption can be removed by making the inner loop condition adj_matrix[v].size() (given a node v), or else, if that is an invariant, by adding an assert before that inner loop: assert( adj_matrix[v].size() == adj_matrix.size() && "adj_matrix is not square!" ); The same goes for the member size and the size of the adj_matrix itself.
The whole algorithm seems more complex than it should, a DFS starting at node v has the general shape of:
dfs( v )
    set visited[ v ]
    operate on node (print node label...)
    for each node reachable from v:
        if not visited[ node ]:
            dfs( node )
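That general shape can be sketched in C++ as follows. This is a free-standing sketch rather than a drop-in replacement for wdigraph::depth_first: the Matrix alias and the function names are hypothetical, and it records the visit order instead of printing labels.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Recursive step: mark v, record it, then follow each unvisited out-edge.
void dfs_visit(const Matrix& adj, std::size_t v,
               std::vector<bool>& visited, std::vector<std::size_t>& order) {
    assert(adj[v].size() == adj.size());      // matrix must be square
    visited[v] = true;
    order.push_back(v);                       // "operate on node"
    for (std::size_t next = 0; next < adj[v].size(); ++next) {
        if (adj[v][next] != 0 && !visited[next]) {
            dfs_visit(adj, next, visited, order);
        }
    }
}

// Entry point: owns the visited flags, so no static state is needed.
std::vector<std::size_t> depth_first(const Matrix& adj, std::size_t start) {
    assert(start < adj.size());
    std::vector<bool> visited(adj.size(), false);
    std::vector<std::size_t> order;
    dfs_visit(adj, start, visited, order);
    return order;
}
```

Splitting the work into an entry point and a recursive helper means the visited flags are created and released exactly once per traversal, with no need to detect a return to the start vertex.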
Your algorithm seems to be (incorrectly, by the way) traversing the graph in the opposite direction. You set the given node as visited, and then try to locate any node that is the start point of an edge to that node. That is, instead of following nodes reachable from v, you are trying to get nodes from which v is reachable. If that is the case (i.e. if the objective is printing all paths that converge in v), then you must be careful not to hit the same edge twice, or you will end up in an infinite loop -> stack overflow.
To see that you will end up with a stack overflow, consider this example. The start node is 1. You create the visited vector and mark position 1 as visited. You find that there is an edge (0,1) in the graph, and that triggers the if: adj_matrix[0][1] != 0 && visited[1], so you enter recursively with the start node being 1 again. This time you don't construct the auxiliary data, but re-mark visited[1], enter the loop, find the same edge and call recursively...
I see a couple of problems:
The following line
if( adj_matrix[v][x] != 0 && visited[x] != false ) {
should be changed to
if( adj_matrix[v][x] != 0 && visited[x] == false ) {
(You want to recurse only on vertices that have not been visited already.)
Also, you're creating a new variable v in the for loop that hides the parameter v: that's legal C++, but it's almost always a terrible idea.