Huffman Coding - Incorrect Codes - c++

I'm trying to build a Huffman tree using an array. Every time I combine two nodes, I add the new node to the array and sort it. My code works for some test cases, but for others it produces the wrong codes. Can someone please point me in the right direction for debugging? Thanks!
Here is a segment of my compress function.
while(tree->getSize() != 1)
{
right = tree->getMinNode();
left = tree->getMinNode();
Node *top = new Node;
top->initializeNode((char)1, left->getFrequency() + right->getFrequency(), left, right);
tree->insertNode(top);
} // while
root = tree->getRootNode();
tree->encodeTree(root, number, 0);
tree->printCode(data);
The getMinNode() function returns the smallest node. After I insert the node that combines the two smallest nodes, I use qsort to sort the array. This is the function I use to sort the array.
I sort first by frequency, then by data. If a node is not a leaf node, meaning it does not hold one of the characters present in the uncompressed data, I find the minimum data in its subtree using the function getMinData().
int Tree::compareNodes(const void *a, const void *b)
{
if( ((Node *)a)->frequency < ((Node *)b)->frequency )
return -1;
if( ((Node *)a)->frequency > ((Node *)b)->frequency )
return 1;
if( ((Node *)a)->frequency == ((Node *)b)->frequency )
{
if( ((Node *)a)->isLeafNode() && ((Node *)b)->isLeafNode() )
{
if( (int)((Node *)a)->data < (int)((Node *)b)->data )
return -1;
if( (int)((Node *)a)->data > (int)((Node *)b)->data )
return 1;
} // if
else
{
int minA, minB;
minA = (int)((Node *)a)->data;
minB = (int)((Node *)b)->data;
if(!((Node *)a)->isLeafNode())
getMinData(a, &minA);
if(!((Node *)b)->isLeafNode())
getMinData(b, &minB);
if(minA < minB)
return -1;
if(minA > minB)
return 1;
}// else
} // if
return 0;
} // compareNodes()
Say, for example, I have the following text.
I agree that Miss Emily Grierson is a symbol of the Old South. Her house and family traditions support this suggestion. However, I do not see her as a victim of the values of chivalry, formal manners, and tradition. I consider these values to have positive effects of a person rather have negative impacts. If for any reason that had made Emily isolate herself from her community and ultimately kill a man she likes, it would be herself. She acts as her own antagonist in the story because she does not have conflict with anyone else except herself. She makes herself become a “victim,” as in being friendless and miserable. The traditions and manners taught to her may have effects on her behavior but it is her attitude towards the outside world that separates her from the rest of the townspeople
\n
with the '\n' at the end. For some of the characters I get the correct Huffman codes, but for others I don't. ASCII 83 ('S'), 120 ('x'), and 84 ('T') are some of the characters with the wrong codes. Thanks!
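For comparison only (this is not a diagnosis of the code above), here is a minimal sketch of building Huffman codes with a min-heap instead of re-sorting an array. It uses the same ordering idea, frequency first and then the smallest symbol in the subtree, but caches that symbol in each node so the comparison never has to walk the tree. The names HNode, HNodeGreater, assignCodes and huffmanCodes are made up for this sketch, and the left/right convention is an assumption, so the exact bit patterns need not match any particular reference output.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Hypothetical node type for this sketch; children are indices into a vector.
struct HNode {
    std::size_t frequency;
    int minSymbol;  // smallest symbol in the subtree, cached when the node is built
    int left;       // -1 for a leaf
    int right;      // -1 for a leaf
};

// "Greater than" on (frequency, minSymbol), so std::priority_queue acts as a min-heap.
struct HNodeGreater {
    const std::vector<HNode>* nodes;
    bool operator()(int a, int b) const {
        const HNode& x = (*nodes)[a];
        const HNode& y = (*nodes)[b];
        if (x.frequency != y.frequency) return x.frequency > y.frequency;
        return x.minSymbol > y.minSymbol;
    }
};

// Walk the tree, appending '0' for a left edge and '1' for a right edge.
void assignCodes(const std::vector<HNode>& nodes, int index, const std::string& prefix,
                 std::map<unsigned char, std::string>& codes) {
    const HNode& n = nodes[index];
    if (n.left < 0) {
        // A one-symbol input would otherwise get an empty code.
        codes[static_cast<unsigned char>(n.minSymbol)] = prefix.empty() ? "0" : prefix;
        return;
    }
    assignCodes(nodes, n.left, prefix + "0", codes);
    assignCodes(nodes, n.right, prefix + "1", codes);
}

std::map<unsigned char, std::string> huffmanCodes(const std::string& text) {
    std::map<unsigned char, std::size_t> freq;
    for (std::size_t i = 0; i < text.size(); ++i) ++freq[static_cast<unsigned char>(text[i])];

    std::vector<HNode> nodes;
    for (const auto& kv : freq) {
        HNode leaf = {kv.second, kv.first, -1, -1};
        nodes.push_back(leaf);
    }

    std::map<unsigned char, std::string> codes;
    if (nodes.empty()) return codes;

    HNodeGreater cmp = {&nodes};
    std::priority_queue<int, std::vector<int>, HNodeGreater> heap(cmp);
    for (int i = 0; i < static_cast<int>(nodes.size()); ++i) heap.push(i);

    // Repeatedly merge the two smallest nodes (assumption: first popped becomes the left child).
    while (heap.size() > 1) {
        int a = heap.top(); heap.pop();
        int b = heap.top(); heap.pop();
        HNode parent = {nodes[a].frequency + nodes[b].frequency,
                        std::min(nodes[a].minSymbol, nodes[b].minSymbol), a, b};
        nodes.push_back(parent);
        heap.push(static_cast<int>(nodes.size()) - 1);
    }
    assignCodes(nodes, heap.top(), "", codes);
    return codes;
}
```

Calling huffmanCodes("aaaabbc") gives the most frequent symbol the shortest code. Because ties are broken on a value stored in the node, the ordering is the same every time the comparison runs.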

Related

C++ if a pointer point to nullptr, can I replace it with number?

Hey guys, I ran into a problem with C++. It's not really a language-feature problem; it's more about coding style.
OK, let's get to the point!
I'm trying to write an AVL tree and want to calculate the balance factors.
According to the rule, the height of a subtree without nodes (an empty tree)
should be treated as -1. That all sounds fine,
but when I write the code and use a pointer to read a Node class member,
I can't read through a nullptr (BAD ACCESS), so I add lots of conditions, which makes my code look bad. Here are some parts of my code.
struct Node{
int key;
int height;
Node* left;
Node* right;
Node* parent;
Node(void);
Node(int key);
};
while((parent_node->left->height - parent_node->right->height) <= 1
||(parent_node->left->height - parent_node->right->height) >= (-1))
{
parent_node = parent_node->parent;
if(parent_node == nullptr) break;
}
What I want is that when parent_node's left subtree is empty,
its height is treated as -1. But since the subtree is empty, there is no node that holds a height.
So in the code I list four cases:
1. left subtree == nullptr && right subtree == nullptr
2. left subtree != nullptr && right subtree == nullptr
3. left subtree != nullptr && right subtree != nullptr
4. left subtree == nullptr && right subtree != nullptr
Then, in each case, I replace the height of the empty subtree with the value -1.
It feels painful. This situation comes up many times in my code, and I want to find a better solution.
My English is not that good, so my description may be somewhat misleading. I'd appreciate any help.
Create a function that computes the height of a subtree, including the special case, and use it instead of accessing the ->height data member directly:
int heightOfSubtree(Node* tree) {
if (tree == nullptr) {
return -1;
}
else {
return tree->height;
}
}
then your code becomes:
while((heightOfSubtree(parent_node->left) - heightOfSubtree(parent_node->right)) <= 1
||(heightOfSubtree(parent_node->left) - heightOfSubtree(parent_node->right)) >= (-1))
{
...
}
or better, you can define a member function in the Node structure such as this:
bool Node::isBalanced() {
int unb = heightOfSubtree(left) - heightOfSubtree(right);
return (unb <= 1) && (unb >= -1); // both bounds must hold; with || this would always be true
}
and your while condition becomes:
while(parent_node->isBalanced()) {
...
}
P.S.: I believe there is a logical error in your code: the condition you are checking is always true (any number is either greater than -1 or less than 1, and for some both hold). You probably want && rather than ||.
Unless I misunderstand, you could point to a sentinel node instead of null as the terminator link. Set the height of the sentinel to -1 and it doesn't need to be handled differently in that part of the algorithm.
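A minimal sketch of the sentinel idea, assuming a simplified node without the parent pointer. SNode, nil(), makeLeaf(), balanceFactor() and isBalanced() are names invented for this illustration, not part of the asker's code:

```cpp
// Hypothetical simplified node; every empty child points at the shared sentinel.
struct SNode {
    int key;
    int height;
    SNode* left;
    SNode* right;
};

// One shared sentinel stands in for every empty subtree. Its height is -1,
// so height arithmetic needs no null checks.
inline SNode* nil() {
    static SNode sentinel = {0, -1, nullptr, nullptr};
    return &sentinel;
}

// A new leaf has height 0 and two sentinel children.
inline SNode makeLeaf(int key) {
    SNode n = {key, 0, nil(), nil()};
    return n;
}

inline int balanceFactor(const SNode* n) {
    return n->left->height - n->right->height;  // safe: children are never nullptr
}

inline bool isBalanced(const SNode* n) {
    int bf = balanceFactor(n);
    return bf >= -1 && bf <= 1;
}
```

The trade-off is that "is this subtree empty?" becomes a comparison against nil() rather than nullptr, so traversal code has to stop at the sentinel instead of at a null pointer.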

How does the Hill Climbing algorithm work?

I'm learning Artificial Intelligence from a book. The book only vaguely explains the code I'm about to post here, I assume because the author expects everyone to have seen the hill climbing algorithm before. The concept is rather straightforward, but I don't understand some of the code below, and I'd like someone to help me understand this algorithm more clearly before I move on.
I commented next to the parts that confuse me most; a summary of what these lines are doing would be very helpful to me.
int HillClimb::CalcNodeDist(Node* A, Node* B)
{
int Horizontal = abs(A->_iX - B->_iX);
int Vertical = abs(A->_iY - B->_iY);
return(sqrt(pow(Horizontal, 2) + pow(Vertical, 2)));
}
void HillClimb::StartHillClimb()
{
BestDistance = VisitAllCities();
int CurrentDistance = BestDistance;
while (true)
{
int i = 0;
int temp = VisitAllCities();
while (i < Cities.size())
{
//Swapping the nodes
Node* back = Cities.back();
Cities[Cities.size() - 1] = Cities[i];
Cities[i] = back; // Why swap last city with first?
CurrentDistance = VisitAllCities(); // Why visit all nodes again?
if (CurrentDistance < BestDistance) // What is this doing?
{
BestDistance = CurrentDistance; //???
break;
}
else
{
back = Cities.back();
Cities[Cities.size() - 1] = Cities[i];
Cities[i] = back;
}
i++;
}
if (CurrentDistance == temp)
{
break;
}
}
}
int HillClimb::VisitAllCities()
{
int CurrentDistance = 0;
for (unsigned int i = 0; i < Cities.size(); i++)
{
if (i == Cities.size() - 1)//Check if last city, link back to first city
{
CurrentDistance += CalcNodeDist(Cities[i], Cities[0]);
}
else
{
CurrentDistance += CalcNodeDist(Cities[i], Cities[i + 1]);
}
}
return(CurrentDistance);
}
Also, the book doesn't state what type of hill climbing this is. I assume it's basic hill climbing, since it doesn't restart when it gets stuck?
Essentially, it does this in pseudo-code:
initialize an order of nodes (that is, a list) which represents a circle
do{
find an element in the list so that switching it with the last element of the
list results in a shorter length of the circle that is imposed by that list
}(until no such element could be found)
VisitAllCities is a helper that computes the length of that circle; CalcNodeDist is a helper that computes the distance between two nodes.
The outer while loop is what I called do-until; the inner while loop iterates over all elements.
The if (CurrentDistance < BestDistance) part simply checks whether changing the list by swapping results in a shorter length. If so, it updates the best distance; if not, it undoes the change.
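The same loop can be written as a small self-contained sketch. This is not the book's code: City, tourLength and hillClimb are names chosen for this illustration, and distances are kept as double rather than truncated to int.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct City {
    double x;
    double y;
};

// Length of the closed tour: each city to the next, and the last back to the first.
double tourLength(const std::vector<City>& tour) {
    double total = 0.0;
    for (std::size_t i = 0; i < tour.size(); ++i) {
        const City& a = tour[i];
        const City& b = tour[(i + 1) % tour.size()];
        total += std::hypot(a.x - b.x, a.y - b.y);
    }
    return total;
}

// Basic hill climbing: keep the first swap with the last city that shortens
// the tour, and stop once a full pass finds no improving swap.
double hillClimb(std::vector<City>& tour) {
    double best = tourLength(tour);
    bool improved = true;
    while (improved) {
        improved = false;
        for (std::size_t i = 0; i + 1 < tour.size(); ++i) {
            std::swap(tour[i], tour.back());      // try a neighbouring solution
            double candidate = tourLength(tour);
            if (candidate < best) {
                best = candidate;                 // keep the improvement
                improved = true;
                break;                            // restart the scan from the new tour
            }
            std::swap(tour[i], tour.back());      // no gain: undo the swap
        }
    }
    return best;
}
```

For the four corners of a unit square given in a crossing order, one swap is enough to reach the perimeter tour of length 4. Since the loop only ever accepts strict improvements and never restarts from a different order, it can stop at a local optimum.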
Did I cover everything you wanted to know? Any questions about a particular part?

Binary Search avoid unreadable entry (hole in list)

I have implemented a binary search function, but I have an issue with list entries that may become unreadable. It's implemented in C++, but I'll just use some pseudo code to make it easier. Please do not focus on the unreadable or string implementation; it's just pseudo code. What matters is that there are unreadable entries in the list that have to be navigated around.
int i = 0;
int imin = 0;
int imax = 99;
string search = "test";
while(imin <= imax)
{
i = imin + (imax - imin) / 2;
string text = vector.at(i);
if(text.isUnreadable())
{
continue;
}
if(compare(text, search) == 0)
{
break;
}
else if(compare(text, search) < 0)
{
imin = i + 1;
}
else if(compare(text, search) > 0)
{
imax = i - 1;
}
}
The search itself works pretty well, but the problem I have is how to avoid an endless loop if the text is unreadable. Does anyone have a time-tested approach for this? The loop should not just exit when an entry is unreadable but rather navigate around the hole.
I had a similar task in one of my projects: a lookup in a sequence where some of the items are non-comparable.
I am not sure this is the best possible implementation, but in my case it looks like this:
int low = first_comparable(0,env);
int high = last_comparable(env.total() - 1,env);
while (low < high)
{
int mid = low + ((high - low) / 2);
int tmid = last_comparable(mid,env);
if( tmid < low )
{
tmid = first_comparable(mid,env);
if( tmid == high )
return high;
if( tmid > high )
return -1;
}
mid = tmid;
...
}
If the vector.at(mid) item is non-comparable, it looks in the neighborhood to find the closest comparable item.
The first_comparable() and last_comparable() functions return the index of the nearest comparable element, starting from the given index. The only difference is the direction of the scan.
inline int first_comparable( int n, E& env)
{
int n_elements = env.total();
for( ; n < n_elements; ++n )
if( env.is_comparable(n) )
return n;
return n;
}
Create a list of pointers to your data items. Do not add "unreadable" ones. Search the resulting list of pointers.
the problem I have is how to avoid getting an endless loop if the text is unreadable.
Seems like that continue should be break instead, so that you break out of the loop. You'd probably want to set a flag or something to indicate the error to whatever code follows the loop.
Another option is to throw an exception.
Really, you should do almost anything other than what you're doing. Currently, when you read one of these 'unreadable' states, you simply continue the loop. But imin and imax still have the same values, so you end up reading the same string from the same place in the vector, and find that it's unreadable again, and so on. You need to decide how you want to respond to one of these 'unreadable' states. I guessed above that you'd want to stop the search, in which case either setting a flag and breaking out of the loop or throwing an exception to accomplish the same thing would be reasonable choices.
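If the goal is to navigate around the hole rather than stop, one possible approach is to probe to the right of the midpoint for the nearest readable entry and narrow the range based on that. The sketch below is only an illustration under stated assumptions: Entry and searchAroundHoles are hypothetical names, the readable entries are assumed to be sorted in ascending order, and an unreadable entry is modelled with a readable flag.

```cpp
#include <string>
#include <vector>

// Hypothetical entry type: readable == false marks a hole.
struct Entry {
    bool readable;
    std::string text;
};

// Returns the index of key, or -1 if it is not among the readable entries.
int searchAroundHoles(const std::vector<Entry>& items, const std::string& key) {
    int lo = 0;
    int hi = static_cast<int>(items.size()) - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;

        // Find the nearest readable entry in [mid, hi].
        int probe = mid;
        while (probe <= hi && !items[probe].readable) ++probe;

        if (probe > hi) {
            // Everything in [mid, hi] is a hole, so only [lo, mid - 1] can match.
            hi = mid - 1;
            continue;
        }

        int cmp = items[probe].text.compare(key);
        if (cmp == 0) return probe;
        if (cmp < 0) {
            lo = probe + 1;   // key lies to the right of the probed entry
        } else {
            hi = mid - 1;     // entries in [mid, probe) are holes, so skip them too
        }
    }
    return -1;
}
```

Every iteration shrinks the range by at least one index, so the loop cannot spin on a hole. The cost is the linear scan over a run of holes; with long runs the search degrades towards O(n).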

AVL tree balance factor

I have an AVL tree class, and I want to find the balance factor of each node ( balance_factor: node->Left_child->height - node->right_Child->height )
Here is my code:
int tree::findBalanceFactor(node p){
int a;
if( p.lchild) p.lchild->balance_factor=findBalanceFactor( *p.lchild );
if( p.rchild) p.rchild->balance_factor=findBalanceFactor( *p.rchild );
if( p.rchild && p.lchild ) a=p.balance_factor = p.lchild->height - p.rchild->height ;
if( p.rchild && !p.lchild ) a=p.balance_factor = 0 - p.rchild->height;
if( !p.rchild && p.lchild ) a=p.balance_factor = p.lchild->height;
if( !p.rchild && !p.lchild ) a=p.balance_factor = 0;
cout << "md" << a << endl;
return a;
}
In the main function, when I print root->balance_factor, it always shows zero. balance_factor is a public variable, and in the constructor I assign zero to it.
What is wrong with my code?
There's a much simpler way to do this than testing every permutation of lchild and rchild:
int tree::findBalanceFactor(node &n) {
int lheight = 0;
int rheight = 0;
if (n.lchild) {
findBalanceFactor(*n.lchild);
lheight = n.lchild->height;
}
if (n.rchild) {
findBalanceFactor(*n.rchild);
rheight = n.rchild->height;
}
n.balance_factor = lheight - rheight;
std::cout << "md " << n.balance_factor << std::endl;
return n.balance_factor;
}
Since this otherwise seems to have ended up as an all-code answer, I'll add a brief note on how to get from the original code to this.
On one level, it's trivial to observe that each of the four branches in the original has the same form (left - right), but with left=0 whenever lchild is null, and right=0 whenever rchild is null.
More broadly, it's really useful to look for this kind of pattern (i.e., that each branch has essentially the same expression). Writing out truth tables, or otherwise partitioning your state space on paper, can help clarify these patterns in more complex code.
You should always aim to know what the general case is - whether because you implemented that first, or because you were able to factor it back out of several specific cases. Often implementing the general case will be good enough anyway, as well as being the easiest version of the logic to understand.
If the general case isn't good enough for some reason, then being easy to understand means it is still a good comment, as it provides a point of comparison for the special cases you actually implement.
I am guessing that the reason the balance_factor of the root node is always 0 is these two lines of code in the tree::findBalanceFactor method:
if( p.lchild) p.lchild->balance_factor=findBalanceFactor( *p.lchild );
if( p.rchild) p.rchild->balance_factor=findBalanceFactor( *p.rchild );
I suppose that the node struct/class looks something like this:
struct node {
struct node *lchild;
struct node *rchild;
int balance_factor;
int height;
};
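In findBalanceFactor( *p.lchild ) and findBalanceFactor( *p.rchild ), the pointer is dereferenced and the node is passed by value, so each call works on a new copy of the node. The assignment to p.balance_factor inside the call therefore changes only that copy. The children still receive a value because the caller stores the return value through the pointer, but nothing stores the result for the root, so root->balance_factor keeps the zero from the constructor.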
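To see the pass-by-value behaviour in isolation, here is a tiny sketch with a hypothetical DemoNode type (not the asker's node class). A change made through a by-value parameter is lost when the function returns, while a change made through a reference persists:

```cpp
// Hypothetical stand-in type, used only to show the copy semantics.
struct DemoNode {
    int balance_factor;
};

// Receives a copy: the caller's object is left untouched.
int setByValue(DemoNode n) {
    n.balance_factor = 2;
    return n.balance_factor;
}

// Receives a reference: the caller's object is modified.
int setByReference(DemoNode& n) {
    n.balance_factor = 2;
    return n.balance_factor;
}
```

After setByValue(a), a.balance_factor still has its old value unless the caller assigns the return value back. After setByReference(b), b.balance_factor is 2.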
The solution will be to modify the tree::findBalanceFactor method to take in pointers to node, like this (I've taken the liberty to prettify the code a little):
int tree::findBalanceFactor(node *p) {
int a;
if (p->lchild) {
findBalanceFactor(p->lchild);
}
if (p->rchild) {
findBalanceFactor(p->rchild);
}
if (p->rchild && p->lchild) {
a = p->balance_factor = p->lchild->height - p->rchild->height;
} else if (p->rchild && !p->lchild) {
a = p->balance_factor = 0 - p->rchild->height;
} else if (!p->rchild && p->lchild) {
a = p->balance_factor = p->lchild->height;
} else {
// this is the case for !p->rchild && !p->lchild
a = p->balance_factor = 0;
}
cout << "md" << a << endl;
return a;
}
For p->lchild and p->rchild, we do not need to set their balance_factor another time, since the balance_factor of each node is already set in one of the 4 possible cases of the very long if statement.

How can text in my main C++ file (in code that hasn't executed yet) show up in a string?

I'm new to C++, so there's a lot I don't really understand. I'm trying to narrow down how I'm getting EXC_BAD_ACCESS, but my attempts to print out values seem to be aggravating (or causing) the problem!
#include <iostream>
#include "SI_Term.h"
#include "LoadPrefabs.h"
int main() {
SI_Term * velocity = new SI_Term(1, "m/s");
std::cout<<"MAIN: FIRST UNITS "<<std::endl;
velocity->unitSet()->displayUnits();
return 0;
}
The above code produces an error (EXC_BAD_ACCESS) before the std::cout<< line even occurs. I traced it with xcode and it fails within the function call to new SI_Term(1, "m/s").
When I re-run with the cout line commented out, it runs and finishes. I would attach more code, but I have a lot and I don't know what is relevant to this line seeming to sneak backwards and overwrite a pointer. Can anyone help me with where to look or how to debug this?
NEW INFO:
I narrowed it down to this block. I should explain at this point, this block is attempting to decompose a set of physical units written in the format kg*m/s^2 and break it down into kg, m, divide by s * s. Once something is broken down it uses LoadUnits(const char*) to read from a file. I am assuming (correctly at this point) that no string of units will contain anywhere near my limit of 40 characters.
UnitSet * decomposeUnits(const char* setOfUnits){
std::cout<<"Decomposing Units";
int i = 0;
bool divide = false;
UnitSet * nextUnit = 0;
UnitSet * temp = 0;
UnitSet * resultingUnit = new UnitSet(0, 0, 0, 1);
while (setOfUnits[i] != '\0') {
int j = 0;
char decomposedUnit[40];
std::cout<<"Wiped unit."<<std::endl;
while ((setOfUnits[i] != '\0') && (setOfUnits[i] != '*') && (setOfUnits[i] != '/') && (setOfUnits[i] != '^')) {
std::cout<<"Adding: " << decomposedUnit[i]<<std::endl;
decomposedUnit[j] = setOfUnits[i];
++i;
++j;
}
decomposedUnit[j] = '\0';
nextUnit = LoadUnits(decomposedUnit);
//The new unit has been loaded. now check for powers, if there is one read it, and apply it to the new unit.
//if there is a power, read the power, read the sign of the power and flip divide = !divide
if (setOfUnits[i] == '^') {
//there is a power. Analyze.
++i;++j;
double power = atof(&setOfUnits[i]);
temp = *nextUnit^power;
delete nextUnit;
nextUnit = temp;
temp = 0;
}
//skip i and j till the next / or * symbol.
while (setOfUnits[i] != '\0' && setOfUnits[i] != '*' && setOfUnits[i] != '/') {
++i; ++j;
}
temp = resultingUnit;
if (divide) {
resultingUnit = *temp / *nextUnit;
} else {
resultingUnit = *temp * *nextUnit;
}
delete temp;
delete nextUnit;
temp = 0;
nextUnit = 0;
// we just copied a word and setOfUnits[i] is the multiply or divide or power character for the next set.
if (setOfUnits[i] == '/') {
divide = true;
}
++i;
}
return resultingUnit;
}
I'm tempted to say that SI_Term is messing with the stack (or maybe trashing the heap). Here's a great way to do that:
char buffer[16];
strcpy(buffer, "I'm writing too much into a buffer");
Your function will probably finish, but then wreak havoc. Check all arrays you have on the stack and make sure you don't write out of bounds.
Then apply standard debugging practices: remove code piece by piece until it doesn't crash anymore, then start reinstating it to find your culprit.
You mention Xcode, so I assume you're on a Mac. I'd suggest looking at the valgrind tool from http://valgrind.org/. It's a memory checker that reports when you're doing something wrong with memory. If your program was built with debugging symbols, it should give you a stack trace to help you find the error.
Here, I removed the unimportant stuff:
while (setOfUnits[i] != '\0') {
while ((setOfUnits[i] != '\0') && (setOfUnits[i] != '*') && (setOfUnits[i] != '/') && (setOfUnits[i] != '^')) {
...
++i;
}
...
nextUnit = LoadUnits(decomposedUnit);
...
if (...) {
double power = ...;
temp = *nextUnit^power;
delete nextUnit;
}
....
temp = resultingUnit;
delete temp;
delete nextUnit;
...
++i;
}
There are a number of problems with this:
- In the inner loop, you increment i until setOfUnits[i] == '\0', the end of the string. Then you increment i again, past the end of the string.
- nextUnit is of type UnitSet, which presumably overloads ^. Though it's possible that it overloads it to mean "exponentiation", it probably doesn't (and if it does, it shouldn't): in C-based languages, including C++, ^ means XOR, not exponentiation.
- You are deleting pointers returned from other functions - that is, you have functions that return dynamically-allocated memory, and expect the caller to delete that memory. While not incorrect, and in fact common practice in C, it is considered bad practice in C++. Just have LoadUnits() return a UnitSet (rather than a UnitSet*), and make sure to define the copy constructor and operator= in the UnitSet class. If performance then becomes a concern, you could return a const UnitSet& instead, or use smart pointers.
- In a similar vein, you are allocating and deleting inside the same function. There is no need for this: just make resultingUnit stack-allocated:
UnitSet resultingUnit(0, 0, 0, 1);
I know that last bullet-point sounds very confusing, but once you finally come to understand it, you'll likely know more about C++ than 90% of coders who claim to "know" C++. This site and this book are good places to start learning.
Good luck!
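As an illustration of the first point (never stepping past the end of the string), here is a sketch that splits a unit string with std::string and bounds-checked indices instead of a fixed char buffer. UnitTerm and splitUnits are hypothetical names, and the sketch makes two assumptions that differ from the original code: each '/' applies only to the factor that follows it, and the result is a list of (name, power) pairs rather than a UnitSet.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one parsed factor, e.g. {"s", -2.0} for "/s^2".
struct UnitTerm {
    std::string name;
    double power;  // negative when the factor follows '/'
};

std::vector<UnitTerm> splitUnits(const std::string& units) {
    std::vector<UnitTerm> terms;
    std::size_t i = 0;
    bool divide = false;
    while (i < units.size()) {
        // Read the unit name up to the next operator or the end of the string.
        std::size_t start = i;
        while (i < units.size() && units[i] != '*' && units[i] != '/' && units[i] != '^') ++i;
        UnitTerm term = {units.substr(start, i - start), 1.0};

        // Optional exponent. std::stod throws if no number follows the '^'.
        if (i < units.size() && units[i] == '^') {
            ++i;
            std::size_t used = 0;
            term.power = std::stod(units.substr(i), &used);
            i += used;
        }
        // Skip anything unexpected until the next '*' or '/'.
        while (i < units.size() && units[i] != '*' && units[i] != '/') ++i;

        if (divide) term.power = -term.power;
        terms.push_back(term);

        // Consume the operator only if there is one, so i never passes the end.
        if (i < units.size()) {
            divide = (units[i] == '/');
            ++i;
        }
    }
    return terms;
}
```

For "kg*m/s^2" this yields kg with power 1, m with power 1, and s with power -2. Every index is checked against units.size() before it is read, and there is no 40-character limit to overrun.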