print all words of a dictionary using trie

print all words of a dictionary using trie - c++

I am working on a dictionary using a trie with the following struct in c
struct trie_node {
int is_end; //0 is is not the end of the word ,otherwise 1
char c;
struct trie_node* child[26];
};
I am able to insert words, search words and I would like to print all the words of the dictionary. Not sure how to handle it. I was trying to print
void print(struct trie_node node) {
int i = 0;
for (i = 0; i < 26; i++) {
if (node->child[i] != NULL) {
printf("%c", node->child[i]->c);
print(node->child[i]);
}
}
}
But it is not printing correctly
if for example I have the words
beer
bee
bear
beast
it is printing
bearster
it should print
bearbeastbeebeer
How can I print correctly the list of words ?

You need to keep track of the path (path from the root to the current node). When you reach to an end node (is_end is true), you print the path which is the dictionary word.
One approach is to use an array of char and keep track of its length so you know how many of elements you need to print. See the code below:
void print_path (char *path, int len){
int i;
for(i = 0; i < len; i++)
printf("%c", path[i]);
}
void print(struct trie_node* node, char *path, int len) {
// sanity check
if (! node)
return;
// current node is part of the current path, so add it
path[len++] = node->c;
// if it is an end node then print the path
if (node->is_end)
print_path(path, len);
// now go through the children and recursive call
int i = 0;
for (i = 0; i < 26; i++) {
if (node->child[i] != NULL) {
print(node->child[i], path, len);
}
}
}
int main(){
// proper allocation for the trie
// ...
// calling the print, assuming the height of tree is at most 128
char path[128];
print(b, path, 0);
}

you can try to use node.child[i]->c,when use struct var you must use a ".",when use struct point must use "->" or "(&point).",i don't know my think is true : )

Related

Permutations of String Using Stack C++

The program I have below finds all the permutations of a given string using a stack without recursion. I am having some trouble understanding what the place in the struct is for and how it plays into the logic for the algorithm. Could anyone help me understand this code? I have a struct that only has two entities:
class Node
{
public:
string word; // stores the word in the node
Node *next;
};
I would just like to understand why the place entity is needed.
Here is the code that finds all the permutations of a given string:
struct State
{
State (std::string topermute_, int place_, int nextchar_, State* next_ = 0)
: topermute (topermute_)
, place (place_)
, nextchar (nextchar_)
, next (next_)
{
}
std::string topermute;
int place;
int nextchar;
State* next;
};
std::string swtch (std::string topermute, int x, int y)
{
std::string newstring = topermute;
newstring[x] = newstring[y];
newstring[y] = topermute[x]; //avoids temp variable
return newstring;
}
void permute (std::string topermute, int place = 0)
{
// Linked list stack.
State* top = new State (topermute, place, place);
while (top != 0)
{
State* pop = top;
top = pop->next;
if (pop->place == pop->topermute.length () - 1)
{
std::cout << pop->topermute << std::endl;
}
for (int i = pop->place; i < pop->topermute.length (); ++i)
{
top = new State (swtch (pop->topermute, pop->place, i), pop->place + 1, i, top);
}
delete pop;
}
}
int main (int argc, char* argv[])
{
if (argc!=2)
{
std::cout<<"Proper input is 'permute string'";
return 1;
}
else
{
permute (argv[1]);
}
return 0;
}

Place helps you to know where is going to be the next character "swap". As you can see, it increments inside the for loop. As you can see, inside that for loop, it behaves like a pivot and i increments in order to behave like a permutator (by swapping characters)

What does the following statement in implementation of tries do

I have been trying to implement a C++ implementation of insertion of trie data-structure, working through a blog in which there are a few things I am unable to understand http://theoryofprogramming.com/2015/01/16/trie-tree-implementation/
#define ALPHABETS 26
#define CASE 'a'
#define MAX_WORD_SIZE 25
using namespace std;
struct Node
{
struct Node * parent;
struct Node * children[ALPHABETS];
vector<int> occurrences;
};
// Inserts a word 'text' into the Trie Tree
// 'trieTree' and marks it's occurence as 'index'.
void InsertWord(struct Node * trieTree, char word[], int index)
{
struct Node * traverse = trieTree;
while (*word != '\0') { // Until there is something to process
if (traverse->children[*word - CASE] == NULL) {
// There is no node in 'trieTree' corresponding to this alphabet
// Allocate using calloc(), so that components are initialised
traverse->children[*word - CASE] = (struct Node *) calloc(1, sizeof(struct Node));
traverse->children[*word - CASE]->parent = traverse; // Assigning parent
}
traverse = traverse->children[*word - CASE];
++word; // The next alphabet
}
traverse->occurrences.push_back(index); // Mark the occurence of the word
}
// Prints the 'trieTree' in a Pre-Order or a DFS manner
// which automatically results in a Lexicographical Order
void LexicographicalPrint(struct Node * trieTree, vector<char> word)
{
int i;
bool noChild = true;
if (trieTree->occurrences.size() != 0) {
// Condition trie_tree->occurrences.size() != 0,
// is a neccessary and sufficient condition to
// tell if a node is associated with a word or not
vector<char>::iterator charItr = word.begin();
while (charItr != word.end()) {
printf("%c", *charItr);
++charItr;
}
printf(" -> # index -> ");
vector<int>::iterator counter = trieTree->occurrences.begin();
// This is to print the occurences of the word
while (counter != trieTree->occurrences.end()) {
printf("%d, ", *counter);
++counter;
}
printf("\n");
}
for (i = 0; i < ALPHABETS; ++i) {
if (trieTree->children[i] != NULL) {
noChild = false;
word.push_back(CASE + i); // Select a child
// and explore everything associated with the cild
LexicographicalPrint(trieTree->children[i], word);
word.pop_back();
// Remove the alphabet as we dealt
// everything associated with it
}
}
word.pop_back();
}
int main()
{
int n, i;
vector<char> printUtil; // Utility variable to print tree
// Creating the Trie Tree using calloc
// so that the components are initialised
struct Node * trieTree = (struct Node *) calloc(1, sizeof(struct Node));
char word[MAX_WORD_SIZE];
printf("Enter the number of words-\n");
scanf("%d", &n);
for (i = 1; i <= n; ++i) {
scanf("%s", word);
InsertWord(trieTree, word, i);
}
printf("\n"); // Just to make the output more readable
LexicographicalPrint(trieTree, printUtil);
return 0;
}
I am unable to understand what this statement in insertword does:
if (traverse->children[*word - CASE] == NULL)
Also as we have initialised all the elements to 1 in the main function then how we can it be null?

The function InsertWord() dynamically adds a new word into the trie, and in the process, creates new nodes whenever that word's prefix does not match the prefix of another word already added in the trie.
This is exactly what your line is testing for. From what I can see, traverse is a pointer to the current Node of a prefix of the word. *word is the next character in the word after the prefix. If the node corresponding to the current k-prefix of the word doesn't have a child (pointer is NULL) with label corresponding to the next character, that means we have to allocate a new node for the next k+1-prefix of the word.

C++ malloc() memory corruption(fast)

I am fairly new to programming and am having memory issues with my program. Somewhere I am overusing memory, but can't find the source. I don't understand why it is giving me issues with malloc allocation as i don't dynamically allocate any variables. Thanks
//returns the index of the character in the string
int find(string line, int begin, int end, char character) {
for (int i = begin; i <= end; i++) {
if (line[i] == character) {
return i;
}
}
//return -1 if not found
return -1;
}
//Get the characters from levelorder that align with inorder
char* getCharacters(char inOrder[], char levelOrder[], int a, int b) {
char *newLevelOrder = new char[a];
int j = 0;
for (int i = 0; i <= b; i++)
if (find(inOrder, 0, a-1, levelOrder[i]) != -1)
newLevelOrder[j] = levelOrder[i], j++;
return newLevelOrder;
}
//creates a new Node given a character
Node* newNode(char character) {
Node *node = new Node;
node->character = character;
node->left = NULL;
node->right = NULL;
return node;
}
//creates the huffman tree from inorder and levelorder
Node* createInLevelTree(char inOrder[], char levelOrder[], int beginning, int end, int size) {
//if start index is out of range
if (beginning > end) {
return NULL;
}
//the head of the tree is the 1st item in level order's traversal
Node *head = newNode(levelOrder[0]);
//if there are no children we can't go farther down
if (beginning == end) {
return head;
}
//get the index of the node
int index = find(inOrder, beginning, end, head->character);
//get the subtree on the left
char *leftTree = getCharacters(inOrder, levelOrder, index, size);
//get the subtree on the right
char *rightTree = getCharacters(inOrder + index + 1, levelOrder, size-index-1, size);
//branch off to the left and right
head->left = createInLevelTree(inOrder, leftTree, beginning, index-1, size);
head->right = createInLevelTree(inOrder, rightTree, index+1, end, size);
//delete
delete [] leftTree;
delete [] rightTree;
return head;
}
Fixed with this line. Thanks Sam.
Char* new level order = new char [b]

Somewhere I am overusing memory, but can't find the source.
I'd suggest you at least replace your character arrays with std::vector<char> or std::string and put some size assertions in, or use the at member to see no over-indexing happens. Furthermore, using operator new more than likely is implemented in terms of malloc, and operator delete in terms of free. Therefore you are allocated dynamically.
Also, wiki for RAII. Try and employ RAII for dynamically allocated memory ... always. std::vector and std::string gives you this for free.
Also, consider the code below:
char* getCharacters(char inOrder[], char levelOrder[], int a, int b) {
char *newLevelOrder = new char[a];
int j = 0;
for (int i = 0; i <= b; i++)
if (find(inOrder, 0, a-1, levelOrder[i]) != -1)
newLevelOrder[j] = levelOrder[i], j++;
return newLevelOrder;
}
Reading this, I'm not sure of the quantity of b. There is no restriction imposed at the call sight. How do I know that the for loop won't invoke indefined behavior (by overindexing). Typically a correct for loop would use "a" here, as "a" was used to create the array... If you want to code like this, use asserts liberally, as you are making assumptions about the calling code (but just use a vector....).
char *newLevelOrder = new char[a];
int j = 0;
for (int i = 0; (i < a) && (i <= b); i++)
{
or
assert (b < a);
char *newLevelOrder = new char[a];
int j = 0;
for (int i = 0; (i <= b); i++)
{
I leave the task of replacing your arrays with vectors and string as an exercise for you, as well as liberally spraying asserts in for loops mentioned... That will likely solve your problems

push attribute data to trie, add to multiple keys

my knowledge is limited but I have been working (hacking) at this specific data structure for awhile
I use a trie to store ontology strings that are then returned as a stack including the 'gap' proximity when get (string) is called. As an add on the trie stores attributes on the key. The further down the string the greater the detail of the attribute. This is working well for my purposes.
As an additional add on, I use a wildcard to apply an attribute to all child nodes. For example, to add 'paws' to all subnodes of 'mammals.dogs.' I push(mammals.dogs.*.paws). Now, all dogs have paws.
The problem is only the first dog get paws. The function works for push attributes without wild
If you want I can clean this up and give a simplified version, but in the past i've found on stackoverflow it is better to just give the code; I use 'z' as the '*' wild
void Trie::push(ParseT & packet)
{
if (root==NULL) AddFirstNode(); // condition 1: no nodes exist, should this be in wrapper
const string codeSoFar=packet.ID;
AddRecord(root, packet, codeSoFar); //condotion 2: nodes exist
}
void Trie::AddFirstNode(){ // run-once, initial condition of first node
nodeT *tempNode=new nodeT;
tempNode->attributes.planType=0;
tempNode->attributes.begin = 0;
tempNode->attributes.end = 0;
tempNode->attributes.alt_end = 0;
root=tempNode;
}
//add record to trie with mutal recursion through InsertNode
//record is entered to trie one char at a time, char is removed
//from record and function repeats until record is Null
void Trie::AddRecord(nodeT *w, ParseT &packet, string codeSoFar)
{
if (codeSoFar.empty()) {
//copy predecessor vector at level n, overwrites higher level vectors
if (!packet.predecessorTemp.empty())
w->attributes.predecessorTemp = packet.predecessorTemp;
return; //condition 0: record's last char
}
else { //keep parsing down record path
for (unsigned int i = 0; i < w->alpha.size(); i++) {
if (codeSoFar[0] == w->alpha[i].token_char || codeSoFar[0] == 'z') {
return AddRecord(w->alpha[i].next, packet, codeSoFar.substr(1)); // condition 2: char exists
}
}
InsertNode(w, packet, codeSoFar); //condition 3: no existing char --> mutal recursion
}
}
//AddRecord() helper function
void Trie::InsertNode(nodeT *w, ParseT &packet, string codeSoFar) // add new char to vector array
{
for (unsigned int i=0; i <=w->alpha.size(); i++) { // loop and insert tokens in sorted vector
if (i==w->alpha.size() || codeSoFar[0] < w->alpha[i].token_char) { //look for end of vector or indexical position
//create new TokenT
tokenT *tempChar=new tokenT;
tempChar->next=NULL;
tempChar->token_char=codeSoFar[0];
//create new nodeT
nodeT *tempLeaf=new nodeT;
tempLeaf->attributes.begin = 0;
tempLeaf->attributes.end = 0;
tempLeaf->attributes.planType = 0;
tempLeaf->attributes.alt_end = 0;
//last node
if (codeSoFar.size() == 1){
tempLeaf->attributes.predecessorTemp = packet.predecessorTemp;
}
//link TokenT with its nodeT
tempChar->next=tempLeaf;
AddRecord(tempLeaf, packet, codeSoFar.substr(1)); //mutual recursion --> add next char in record, if last char AddRecord will terminate
w->alpha.insert(w->alpha.begin()+i, *tempChar);
return;
}
}
}
root is global nodeT *w
struct ParseT {
string ID; //XML key
int begin = 0; //planned or actual start date
int end = 0; //planned or actual end date - if end is empty then assumed started but not compelted and flag with 9999 and
int alt_end = 0; //in case of started without completion 9999 case, then this holds expected end
int planType = 0; //actuals == 1, forecast == 2, planned == 3
map<string, string> aux;
vector<string> resourceTemp;
vector<string> predecessorTemp;
};

In this code
for (unsigned int i = 0; i < w->alpha.size(); i++) {
if (codeSoFar[0] == w->alpha[i].token_char || codeSoFar[0] == 'z') {
return AddRecord(w->alpha[i].next, packet, codeSoFar.substr(1)); // condition 2: char exists
}
}
you are returning as soon as you call AddRecord, even if it is because of a wildcard. It might be easier to have a separate loop when codeSoFar[0] == 'z' that goes through all the alphas and adds the record. Then have an else clause that does your current code.
Edit: Here is what I meant, in code form:
else { //keep parsing down record path
// Handle wildcards
if (codeSoFar[0] == 'z') {
for (unsigned int i = 0; i < w->alpha.size(); i++) {
AddRecord(w->alpha[i].next, packet, codeSoFar.substr(1)); // condition 2: char exists
}
}
else {
// Not a wildcard, look for a match
for (unsigned int i = 0; i < w->alpha.size(); i++) {
if (codeSoFar[0] == w->alpha[i].token_char) {
return AddRecord(w->alpha[i].next, packet, codeSoFar.substr(1)); // condition 2: char exists
}
}
InsertNode(w, packet, codeSoFar); //condition 3: no existing char --> mutal recursion
}
}

What's the correct approach to solve SPOJ www.spoj.com/problems/PRHYME/?

I have been trying to solve this problem SPOJ www.spoj.com/problems/PRHYME/? for several days, but have had no success.
Here is the problem in brief:
Given is a wordlist L, and a word w. Your task is to find a word in L that forms a perfect rhyme with w. This word u is uniquely determined by these properties:
It is in L.
It is different from w.
Their common suffix is as long as possible.
Out of all words that satisfy the previous points, u is the lexicographically smallest one.
Length of a word will be<=30.
And number of words both in the dictionary and the queries can be 2,50,000.
I am using a trie to store all the words in the dictionary reversed.
Then to solve the queries I proceed in the following fashion:-
If word is present in the trie,delete it from trie.
Now traverse the trie from the root till the point the character from the query string match the trie values.Let this point where last character match was found be P.
Now from this point P onward ,I traverse the trie using DFS,and on encountering a leaf node,push the string formed to the possible results list.
Now I return the lexicographic ally smallest result from this list.
When I submit my solution on SPOJ,my solution gets a Time Limit Exceeded Error.
Can someone please suggest a detailed algorithm or hint to solve this problem ?
I can post my code if required.
#include<iostream>
#include<cstdio>
#include<cstring>
#include<climits>
#include<vector>
#include<string>
#include<algorithm>
#include<cctype>
#include<cstdlib>
#include<utility>
#include<map>
#include<queue>
#include<set>
#define ll long long signed int
#define ull unsigned long long int
const int alpha=26;
using namespace std;
struct node
{
int value;
node * child[alpha];
};
node * newnode()
{
node * newt=new node;
newt->value=0;
for(int i=0;i<alpha;i++)
{
newt->child[i]=NULL;
}
return newt;
}
struct trie
{
node * root;
int count;
trie()
{
count=0;
root=newnode();
}
};
trie * dict=new trie;
string reverse(string s)
{
int l=s.length();
string rev=s;
for(int i=0;i<l;i++)
{
int j=l-1-i;
rev[j]=s[i];
}
return rev;
}
void insert(string s)
{
int l=s.length();
node * ptr=dict->root;
dict->count++;
for(int i=0;i<l;i++)
{
int index=s[i]-'a';
if(ptr->child[index]==NULL)
{
ptr->child[index]=newnode();
}
ptr=ptr->child[index];
}
ptr->value=dict->count;
}
void dfs1(node *ptr,string p)
{
if(ptr==NULL) return;
if(ptr->value) cout<<"word" <<p<<endl;
for(int i=0;i<26;i++)
{
if(ptr->child[i]!=NULL)
dfs1(ptr->child[i],p+char('a'+i));
}
}
vector<string> results;
pair<node *,string> search(string s)
{
int l=s.length();
node * ptr=dict->root;
node *save=ptr;
string match="";
int i=0;
bool no_match=false;
while(i<l and !no_match)
{
int in=s[i]-'a';
if(ptr->child[in]==NULL)
{
save=ptr;
no_match=true;
}
else
{
ptr=ptr->child[in];
save=ptr;
match+=in+'a';
}
i++;
}
//cout<<s<<" matched till here"<<match <<" "<<endl;
return make_pair(save,match);
}
bool find(string s)
{
int l=s.length();
node * ptr=dict->root;
string match="";
for(int i=0;i<l;i++)
{
int in=s[i]-'a';
//cout<<match<<"match"<<endl;
if(ptr->child[in]==NULL)
{
return false;
}
ptr=ptr->child[in];
match+=char(in+'a');
}
//cout<<match<<"match"<<endl;
return true;
}
bool leafNode(node *pNode)
{
return (pNode->value != 0);
}
bool isItFreeNode(node *pNode)
{
int i;
for(i = 0; i < alpha; i++)
{
if( pNode->child[i] )
return false;
}
return true;
}
bool deleteHelper(node *pNode, string key, int level, int len)
{
if( pNode )
{
// Base case
if( level == len )
{
if( pNode->value )
{
// Unmark leaf node
pNode->value = 0;
// If empty, node to be deleted
if( isItFreeNode(pNode) )
{
return true;
}
return false;
}
}
else // Recursive case
{
int index = (key[level])-'a';
if( deleteHelper(pNode->child[index], key, level+1, len) )
{
// last node marked, delete it
free(pNode->child[index]);
pNode->child[index]=NULL;
// recursively climb up, and delete eligible nodes
return ( !leafNode(pNode) && isItFreeNode(pNode) );
}
}
}
return false;
}
void deleteKey(string key)
{
int len = key.length();
if( len > 0 )
{
deleteHelper(dict->root, key, 0, len);
}
}
string result="***";
void dfs(node *ptr,string p)
{
if(ptr==NULL) return;
if(ptr->value )
{
if((result)=="***")
{
result=reverse(p);
}
else
{
result=min(result,reverse(p));
}
}
for(int i=0;i<26;i++)
{
if(ptr->child[i]!=NULL)
dfs(ptr->child[i],p+char('a'+i));
}
}
int main(int argc ,char ** argv)
{
#ifndef ONLINE_JUDGE
freopen("prhyme.in","r",stdin);
#endif
string s;
while(getline(cin,s,'\n'))
{
if(s[0]<'a' and s[0]>'z')
break;
int l=s.length();
if(l==0) break;
string rev;//=new char[l+1];
rev=reverse(s);
insert(rev);
//cout<<"...........traverse..........."<<endl;
//dfs(dict->root);
//cout<<"..............traverse end.............."<<endl;
}
while(getline(cin,s))
{
results.clear();
//cout<<s<<endl;
int l=s.length();
if(!l) break;
string rev;//=new char[l+1];
rev=reverse(s);
//cout<<rev<<endl;
bool del=false;
if(find(rev))
{
del=true;
//cout<<"here found"<<endl;
deleteKey(rev);
}
if(find(rev))
{
del=true;
//cout<<"here found"<<endl;
deleteKey(rev);
}
else
{
//cout<<"not here found"<<endl;
}
// cout<<"...........traverse..........."<<endl;
//dfs1(dict->root,"");
// cout<<"..............traverse end.............."<<endl;
pair<node *,string> pp=search(rev);
result="***";
dfs(pp.first,pp.second);
//cout<<"search results"<<endl;
//dfs1(pp.first,pp.second);
//cout<<"end of search results"<<
for(int i=0;i<results.size();i++)
{
results[i]=reverse(results[i]);
// cout<<s<<" "<<results[i]<<endl;
}
string smin=result;
if(del)
{
insert(rev);
}
cout<<smin<<endl;
}
return 0;
}

Your algorithm (using a trie that stores all reversed words) is a good start. But one issue with it is that for each lookup, you have to enumerate all words with a certain suffix in order to find the lexicographically smallest one. For some cases, this can be a lot of work.
One way to fix this: In each node (corresponding to each suffix), store the two lexicographically smallest words that have that suffix. This is easy to maintain while building the trie by updating all ancestor nodes of each newly added leaf (see pseudo-code below).
Then to perform a lookup of a word w, start at the node corresponding to the word, and go up in the tree until you reach a node which contains a descendant word other than w. Then return the lexicographically smallest word stored in that node, or the second smallest in case the smallest is equal to w.
To create the trie, the following pseudo-code can be used:
for each word:
add word to trie
let n be the node corresponding to the new word.
for each ancestor a of n (including n):
if a.smallest==null or word < a.smallest:
a.second_smallest = a.smallest
a.smallest = word
else if a.second_smallest==null or word < a.second_smallest:
a.second_smallest = word
To lookup a word w:
let n be the node corresponding to longest possible suffix of w.
while ((n.smallest==w || n.smallest==null) &&
(n.second_smallest==w || n.second_smallest==null)):
n = n.parent
if n.smallest==w:
return n.second_smallest
else:
return n.smallest
Another similar possibility is to use a hash table mapping all suffixes to the two lexicographically smallest words instead of using a trie. This is probably easier to implement if you can use std::unordered_map.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

print all words of a dictionary using trie - c++

you can try to use node.child[i]->c,when use struct var you must use a ".",when use struct point must use "->" or "(&point).",i don't know my think is true : )

Related

Permutations of String Using Stack C++

What does the following statement in implementation of tries do

C++ malloc() memory corruption(fast)

push attribute data to trie, add to multiple keys

What's the correct approach to solve SPOJ www.spoj.com/problems/PRHYME/?

Categories

Resources