I know similar issue has been answered at this link Help me fix this C++ std::set comparator
but unfortunately I am facing exactly same issue and I am unable to understand the reason behind it thus need some help to resolve it.
I am using VS2010 and my release binary is running fine without any issue but the debug binary reports:
My comparator looks like this:
struct PathComp {
bool operator() (const wchar_t* path1, const wchar_t* path2) const
{
int c = wcscmp(path1, path2);
if (c < 0 || c > 0) {
return true;
}
return false;
}
};
My set is declared like this:
set<wchar_t*,PathComp> pathSet;
Could somebody suggest me why my debug binary is failing at this assertion? Is it because I am using wcscmp() function to compare the wide character string getting stored in my set?
Thanks in advance!!!
std::set requires a valid comperator that behaves like operator< or std::less.
The std::set code detected that your operator< is not valid, and as a help to you triggered the assert you showed.
And indeed: your comperator looks like an operator!=, not like an operator<
One of the rules an operator< should follow, is that a<b and b<a cannot be both true. In your implementation, it is.
Correct your code to:
bool operator() (const wchar_t* path1, const wchar_t* path2) const
{
int c = wcscmp(path1, path2);
return (c < 0);
}
and you should be fine.
The problem is that your comparator does not induce a strict-weak ordering. It should only really return true for paths that are "less" - not for all that are different. Change it to:
struct PathComp {
bool operator() (const wchar_t* path1, const wchar_t* path2) const
{
int c = wcscmp(path1, path2);
if (c < 0) { // <- this is different
return true;
}
return false;
}
};
Alternatively, using only c > 0 will also work - but the set will have a reverse order.
The algorithm needs to know the difference between smaller and greater to work, just unequal does not give enough information.
Without smaller-than/greater-than information, the set cannot possibly maintain an order - but that is what a set is all about.
After spending some more time on it we finally decided to take another approach which worked for me.
So we converted wchar_t* to string using this method:
// Converts LPWSTR to string
bool convertLPWSTRToString(string& str, const LPWSTR wStr)
{
bool b = false;
char* p = 0;
int bSize;
// get the required buffer size in bytes
bSize = WideCharToMultiByte(CP_UTF8,
0,
wStr,-1,
0,0,
NULL,NULL);
if (bSize > 0) {
p = new char[bSize];
int rc = WideCharToMultiByte(CP_UTF8,
0,
wStr,-1,
p,bSize,
NULL,NULL);
if (rc != 0) {
p[bSize-1] = '\0';
str = p;
b = true;
}
}
delete [] p;
return b;
}
And then stored that string in the in the set, by doing this I didn't had to worry about comparing the elements getting stored to make sure that all entries are unique.
// set that will hold unique path
set<string> strSet;
So all I had to do was this:
string str;
convertLPWSTRToString(str, FileName);
// store path in the set
strSet.insert(str);
Though I still don't know what was causing "Debug Assertion Failed" issue when I was using a set comparator (PathComp) for set<wchar_t*,PathComp> pathSet;
Related
I have a vector of strings I that pass to my function and I need to compare it with some pre-defined values. What is the fastest way to do this?
The following code snippet shows what I need to do (This is how I am doing it, but what is the fastest way of doing this):
bool compare(vector<string> input1,vector<string> input2)
{
if(input1.size() != input2.size()
{
return false;
}
for(int i=0;i<input1.siz();i++)
{
if(input1[i] != input2[i])
{
return false;
}
}
return true;
}
int compare(vector<string> inputData)
{
if (compare(inputData,{"Apple","Orange","three"}))
{
return 129;
}
if (compare(inputData,{"A","B","CCC"}))
{
return 189;
}
if (compare(inputData,{"s","O","quick"}))
{
return 126;
}
if (compare(inputData,{"Apple","O123","three","four","five","six"}))
{
return 876;
}
if (compare(inputData,{"Apple","iuyt","asde","qwe","asdr"}))
{
return 234;
}
return 0;
}
Edit1
Can I compare two vector like this:
if(inputData=={"Apple","Orange","three"})
{
return 129;
}
You are asking what is the fastest way to do this, and you are indicating that you are comparing against a set of fixed and known strings. I would argue that you would probably have to implement it as a kind of state machine. Not that this is very beautiful...
if (inputData.size() != 3) return 0;
if (inputData[0].size() == 0) return 0;
const char inputData_0_0 = inputData[0][0];
if (inputData_0_0 == 'A') {
// possibly "Apple" or "A"
...
} else if (inputData_0_0 == 's') {
// possibly "s"
...
} else {
return 0;
}
The weakness of your approach is its linearity. You want a binary search for teh speedz.
By utilising the sortedness of a map, the binaryness of finding in one, and the fact that equivalence between vectors is already defined for you (no need for that first compare function!), you can do this quite easily:
std::map<std::vector<std::string>, int> lookup{
{{"Apple","Orange","three"}, 129},
{{"A","B","CCC"}, 189},
// ...
};
int compare(const std::vector<std::string>& inputData)
{
auto it = lookup.find(inputData);
if (it != lookup.end())
return it->second;
else
return 0;
}
Note also the reference passing for extra teh speedz.
(I haven't tested this for exact syntax-correctness, but you get the idea.)
However! As always, we need to be context-aware in our designs. This sort of approach is more useful at larger scale. At the moment you only have a few options, so the addition of some dynamic allocation and sorting and all that jazz may actually slow things down. Ultimately, you will want to take my solution, and your solution, and measure the results for typical inputs and whatnot.
Once you've done that, if you still need more speed for some reason, consider looking at ways to reduce the dynamic allocations inherent in both the vectors and the strings themselves.
To answer your follow-up question: almost; you do need to specify the type:
// new code is here
// ||||||||||||||||||||||||
if (inputData == std::vector<std::string>{"Apple","Orange","three"})
{
return 129;
}
As explored above, though, let std::map::find do this for you instead. It's better at it.
One key to efficiency is eliminating needless allocation.
Thus, it becomes:
bool compare(
std::vector<std::string> const& a,
std::initializer_list<const char*> b
) noexcept {
return std::equal(begin(a), end(a), begin(b), end(b));
}
Alternatively, make them static const, and accept the slight overhead.
As an aside, using C++17 std::string_view (look at boost), C++20 std::span (look for the Guideline support library (GSL)) also allows a nicer alternative:
bool compare(std::span<std::string> a, std::span<std::string_view> b) noexcept {
return a == b;
}
The other is minimizing the number of comparisons. You can either use hashing, binary search, or manual ordering of comparisons.
Unfortunately, transparent comparators are a C++14 thing, so you cannot use std::map.
If you want a fast way to do it where the vectors to compare to are not known in advance, but are reused so can have a little initial run-time overhead, you can build a tree structure similar to the compile time version Dirk Herrmann has. This will run in O(n) by just iterating over the input and following a tree.
In the simplest case, you might build a tree for each letter/element. A partial implementation could be:
typedef std::vector<std::string> Vector;
typedef Vector::const_iterator Iterator;
typedef std::string::const_iterator StrIterator;
struct Node
{
std::unique_ptr<Node> children[256];
std::unique_ptr<Node> new_str_child;
int result;
bool is_result;
};
Node root;
int compare(Iterator vec_it, Iterator vec_end, StrIterator str_it, StrIterator str_end, const Node *node);
int compare(const Vector &input)
{
return compare(input.begin(), input.end(), input.front().begin(), input.front().end(), &root);
}
int compare(Iterator vec_it, Iterator vec_end, StrIterator str_it, StrIterator str_end, const Node *node)
{
if (str_it != str_end)
{
// Check next character
auto next_child = node->children[(unsigned char)*str_it].get();
if (next_child)
return compare(vec_it, vec_end, str_it + 1, str_end, next_child);
else return -1; // No string matched
}
// At end of input string
++vec_it;
if (vec_it != vec_end)
{
auto next_child = node->new_str_child.get();
if (next_child)
return compare(vec_it, vec_end, vec_it->begin(), vec_it->end(), next_child);
else return -1; // Have another string, but not in tree
}
// At end of input vector
if (node->is_result)
return node->result; // Got a match
else return -1; // Run out of input, but all possible matches were longer
}
Which can also be done without recursion. For use cases like yours you will find most nodes only have a single success value, so you can collapse those into prefix substrings, to use the OP example:
"A"
|-"pple" - new vector - "O" - "range" - new vector - "three" - ret 129
| |- "i" - "uyt" - new vector - "asde" ... - ret 234
| |- "0" - "123" - new vector - "three" ... - ret 876
|- new vector "B" - new vector - "CCC" - ret 189
"s" - new vector "O" - new vector "quick" - ret 126
you could make use of std::equal function like below :
bool compare(vector<string> input1,vector<string> input2)
{
if(input1.size() != input2.size()
{
return false;
}
return std::equal(input1.begin(), input2.end(), input2.begin())
}
Can I compare two vector like this
The answer is No, you need compare a vector with another vector, like this:
vector<string>data = {"ab", "cd", "ef"};
if(data == vector<string>{"ab", "cd", "efg"})
cout << "Equal" << endl;
else
cout << "Not Equal" << endl;
What is the fastest way to do this?
I'm not an expert of asymptotic analysis but:
Using the relational operator equality (==) you have a shortcut to compare two vectors, first validating the size and, second, each element on them. This way provide a linear execution (T(n), where n is the size of vector) which compare each item of the vector, but each string must be compared and, generally, it is another linear comparison (T(m), where m is the size of the string).
Suppose that each string has de same size (m) and you have a vector of size n, each comparison could have a behavior of T(nm).
So:
if you want a shortcut to compare two vector you can use the
relational operator equality.
If you want an program which perform a fast comparison you should look for some algorithm for compare strings.
bool fitsKey3(string n) {
int ncheck = str.length(n);
if (ncheck = KEY3) {
return true;
} else {
return false;
}
}
The above function uses a string "n" that is a string given to the function from an input file. I want to write this function that checks the length of this "identifier code" from the input file (it's a drone project), and if the length of the security code is equal to the constant integer "KEY3 (= 50), it returns true. Otherwise, return false.
How do I fix this setup?
= assigns the value of KEY3 to ncheck.
== compares ncheck and KEY3 for equality.
Also, unless you're being paid by lines of code, I'd suggest using the much simpler and clearer form:
return n.length() == KEY3;
(I corrected your usage of the length() member function, since I suppose it was only a typo.)
And as pointed out by Anon Mail, unless you want to make a copy of the string every time you call the function, I'd suggest only passing a reference to it (const because you're not modifying it):
bool fitsKey3(string const& n)
I would write it like this:
bool fitsKey3(string n) {
return n.length() == KEY3;
}
You do two operations:
Get string length by n.length()
Compare the length with a KEY3 (NB: use == to compare)
I am writing a class called Word, that handles a c string and overloads the
<, >, <=, >= operators.
word.h:
friend bool operator<(const Word &a, const Word &b);
word.cc:
bool operator<(const Word &a, const Word &b) {
if(a == NULL && b == NULL)
return false;
if(a == NULL)
return true;
if(b == NULL)
return false;
return strcmp(a.wd, b.wd) < 0; //wd is a valid c string, EDIT: changed to strcmp
}
main:
char* temp = NULL; //EDIT: i was mistaken, temp is a char pointer
Word a("blah"); //a.wd = [b,l,a,h]
cout << (temp<a);
I get a segmentation error before the first line of the operator< method
after the last line in the main. I can correct the problem by writing
cout << (a>temp);
where the operator> is similarly defined and I get no errors but my
assignment requires (temp < a) to work so this is where I ask for help.
EDIT: I made a mistake the first time and i said temp was of type Word,
but it is actually of type char*. So I assume that the compiler converts
temp to a Word using one of my constructors. I dont know which one it would
use and why this would work since the first parameter is not Word.
here is the constructor I think is being used to make the Word using temp:
Word::Word(char* c, char* delimeters="\n") {
char *temporary = "\0";
if(c == NULL)
c = temporary;
check(stoppers!=NULL, "(Word(char*,char*))NULL pointer"); // exits the program if the expression is false
if(strlen(c) == 0)
size = DEFAULT_SIZE; //10
else
size = strlen(c) + 1 + DEFAULT_SIZE;
wd = new char[size];
check(wd!=NULL, "Word(char*,char*))heap overflow");
delimiters = new char[strlen(stoppers) + 1]; // EDIT: changed to []
check(delimiters!=NULL,"Word(char*,char*))heap overflow");
strcpy(wd,c);
strcpy(delimiters,stoppers);
count = strlen(wd);
}
wd is of type char*
thanks for looking at this big question and trying to help. let me know if you
need more code to look at
I'm almost positive you did not mean to construct a char on the heap with an initial value of some integer based on the size of stoppers:
delimiters = new char(strlen(stoppers) + 1); // Should use [] not ()
Also you are using C++ and I would never tell you what to do, but please, unless you know exactly that there is no danger, do not use strcpy. For exactly this reason.
It is a blind copy of strings, and when the destination does not have enough space (as is the case from your typo-ed allocation), things go BAD.
EDIT:
I also see in your overload of operator< that you use
a.wd < b.wd
and claim that .wds are valid C strings. If that is the case, you cannot apply a simple < operator to them and must use strcmp, strncmp or some other full compare function
Cutting out the other bits of the constructor:
Word::Word(char* c, char* delimeters=NULL) {
check(stoppers!=NULL, "(Word(char*,char*))NULL pointer"); //exits the program if the expression is false
delimiters = new char[strlen(stoppers) + 1];
check(delimiters!=NULL,"Word(char*,char*))heap overflow");
strcpy(delimiters,stoppers);
}
You are allocating and copying to the input parameter (delimiters) not the member variable (stoppers). Therefore, when you call:
delimiters = new char[strlen(stoppers) + 1];
Here, stoppers == NULL (infered from the check call) so strlen(NULL) crashes.
Also, in:
bool operator<(const Word &a, const Word &b)
You check things like a == NULL. This is not needed as a and b are references, so the objects are non-null.
If wd can be null, you will need to change these to check a.wd and b.wd.
Just to clarify that I also think the title is a bit silly. We all know that most built-in functions of the language are really well written and fast (there are ones even written by assembly). Though may be there still are some advices for my situation. I have a small project which demonstrates the work of a search engine. In the indexing phase, I have a filter method to filter out unnecessary things from the keywords. It's here:
bool Indexer::filter(string &keyword)
{
// Remove all characters defined in isGarbage method
keyword.resize(std::remove_if(keyword.begin(), keyword.end(), isGarbage) - keyword.begin());
// Transform all characters to lower case
std::transform(keyword.begin(), keyword.end(), keyword.begin(), ::tolower);
// After filtering, if the keyword is empty or it is contained in stop words list, mark as invalid keyword
if (keyword.size() == 0 || stopwords_.find(keyword) != stopwords_.end())
return false;
return true;
}
At first sign, these functions (alls are member functions of STL container or standard function) are supposed to be fast and not take many time in the indexing phase. But after profiling with Valgrind, the inclusive cost of this filter is ridiculous high: 33.4%. There are three standard functions of this filter take most of the time for that percentage: std::remove_if takes 6.53%, std::set::find takes 15.07% and std::transform takes 7.71%.
So if there are any thing I can do (or change) to reduce the instruction times cost by this filter (like using parallellizing or something like that), please give me your advice. Thanks in advance.
UPDATE: Thanks for all your suggestion. So in brief, I've summarize what I need to do is:
1) Merge tolower and remove_if into one by construct my own loop.
2) Use unordered_set instead of set for faster find method.
Thus I've chosen Mark_B's as the right answer.
First, are you certain that optimization and inlining are enabled when you compile?
Assuming that's the case, I would first try writing my own transformer that combines removing garbage and lower-casing into one step to prevent iterating over the keyword that second time.
There's not a lot you can do about the find without using a different container such as unordered_set as suggested in a comment.
Is it possible for your application that doing the filtering really just is a really CPU-intensive part of the operation?
If you use a boost filter iterator you can merge the remove_if and transform into one, something like (untested):
keyword.erase(std::transform(boost::make_filter_iterator(!boost::bind(isGarbage), keyword.begin(), keyword.end()),
boost::make_filter_iterator(!boost::bind(isGarbage), keyword.end(), keyword.end()),
keyword.begin(),
::tolower), keyword.end());
This is assuming you want the side effect of modifying the string to still be visible externally, otherwise pass by const reference instead and just use count_if and a predicate to do all in one. You can build a hierarchical data structure (basically a tree) for the list of stop words that makes "in-place" matching possible, for example if your stop words are SELECT, SELECTION, SELECTED you might build a tree:
|- (other/empty accept)
\- S-E-L-E-C-T- (empty, fail)
|- (other, accept)
|- I-O-N (fail)
\- E-D (fail)
You can traverse a tree structure like that simultaneously whilst transforming and filtering without any modifications to the string itself. In reality you'd want to compact the multi-character runs into a single node in the tree (probably).
You can build such a data structure fairly trivially with something like:
#include <iostream>
#include <map>
#include <memory>
class keywords {
struct node {
node() : end(false) {}
std::map<char, std::unique_ptr<node>> children;
bool end;
} root;
void add(const std::string::const_iterator& stop, const std::string::const_iterator c, node& n) {
if (!n.children[*c])
n.children[*c] = std::unique_ptr<node>(new node);
if (stop == c+1) {
n.children[*c]->end = true;
return;
}
add(stop, c+1, *n.children[*c]);
}
public:
void add(const std::string& str) {
add(str.end(), str.begin(), root);
}
bool match(const std::string& str) const {
const node *current = &root;
std::string::size_type pos = 0;
while(current && pos < str.size()) {
const std::map<char,std::unique_ptr<node>>::const_iterator it = current->children.find(str[pos++]);
current = it != current->children.end() ? it->second.get() : nullptr;
}
if (!current) {
return false;
}
return current->end;
}
};
int main() {
keywords list;
list.add("SELECT");
list.add("SELECTION");
list.add("SELECTED");
std::cout << list.match("TEST") << std::endl;
std::cout << list.match("SELECT") << std::endl;
std::cout << list.match("SELECTOR") << std::endl;
std::cout << list.match("SELECTED") << std::endl;
std::cout << list.match("SELECTION") << std::endl;
}
This worked as you'd hope and gave:
0
1
0
1
1
Which then just needs to have match() modified to call the transformation and filtering functions appropriately e.g.:
const char c = str[pos++];
if (filter(c)) {
const std::map<char,std::unique_ptr<node>>::const_iterator it = current->children.find(transform(c));
}
You can optimise this a bit (compact long single string runs) and make it more generic, but it shows how doing everything in-place in one pass might be achieved and that's the most likely candidate for speeding up the function you showed.
(Benchmark changes of course)
If a call to isGarbage() does not require synchronization, then parallelization should be the first optimization to consider (given of course that filtering one keyword is a big enough task, otherwise parallelization should be done one level higher). Here's how it could be done - in one pass through the original data, multi-threaded using Threading Building Blocks:
bool isGarbage(char c) {
return c == 'a';
}
struct RemoveGarbageAndLowerCase {
std::string result;
const std::string& keyword;
RemoveGarbageAndLowerCase(const std::string& keyword_) : keyword(keyword_) {}
RemoveGarbageAndLowerCase(RemoveGarbageAndLowerCase& r, tbb::split) : keyword(r.keyword) {}
void operator()(const tbb::blocked_range<size_t> &r) {
for(size_t i = r.begin(); i != r.end(); ++i) {
if(!isGarbage(keyword[i])) {
result.push_back(tolower(keyword[i]));
}
}
}
void join(RemoveGarbageAndLowerCase &rhs) {
result.insert(result.end(), rhs.result.begin(), rhs.result.end());
}
};
void filter_garbage(std::string &keyword) {
RemoveGarbageAndLowerCase res(keyword);
tbb::parallel_reduce(tbb::blocked_range<size_t>(0, keyword.size()), res);
keyword = res.result;
}
int main() {
std::string keyword = "ThIas_iS:saome-aTYpe_Ofa=MoDElaKEYwoRDastrang";
filter_garbage(keyword);
std::cout << keyword << std::endl;
return 0;
}
Of course, the final code could be improved further by avoiding data copying, but the goal of the sample is to demonstrate that it's an easily threadable problem.
You might make this faster by making a single pass through the string, ignoring the garbage characters. Something like this (pseudo-code):
std::string normalizedKeyword;
normalizedKeyword.reserve(keyword.size())
for (auto p = keyword.begin(); p != keyword.end(); ++p)
{
char ch = *p;
if (!isGarbage(ch))
normalizedKeyword.append(tolower(ch));
}
// then search for normalizedKeyword in stopwords
This should eliminate the overhead of std::remove_if, although there is a memory allocation and some new overhead of copying characters to normalizedKeyword.
The problem here isn't the standard functions, it's your use of them. You are making multiple passes over your string when you obviously need to be doing only one.
What you need to do probably can't be done with the algorithms straight up, you'll need help from boost or rolling your own.
You should also carefully consider whether resizing the string is actually necessary. Yeah, you might save some space but it's going to cost you in speed. Removing this alone might account for quite a bit of your operation's expense.
Here's a way to combine the garbage removal and lower-casing into a single step. It won't work for multi-byte encoding such as UTF-8, but neither did your original code. I assume 0 and 1 are both garbage values.
bool Indexer::filter(string &keyword)
{
static char replacements[256] = {1}; // initialize with an invalid char
if (replacements[0] == 1)
{
for (int i = 0; i < 256; ++i)
replacements[i] = isGarbage(i) ? 0 : ::tolower(i);
}
string::iterator tail = keyword.begin();
for (string::iterator it = keyword.begin(); it != keyword.end(); ++it)
{
unsigned int index = (unsigned int) *it & 0xff;
if (replacements[index])
*tail++ = replacements[index];
}
keyword.resize(tail - keyword.begin());
// After filtering, if the keyword is empty or it is contained in stop words list, mark as invalid keyword
if (keyword.size() == 0 || stopwords_.find(keyword) != stopwords_.end())
return false;
return true;
}
The largest part of your timing is the std::set::find so I'd also try std::unordered_set to see if it improves things.
I would implement it with lower level C functions, something like this maybe (not checking this compiles), doing the replacement in place and not resizing the keyword.
Instead of using a set for garbage characters, I'd add a static table of all 256 characters (yeah, it will work for ascii only), with 0 for all characters that are ok, and 1 for those who should be filtered out. something like:
static const char GARBAGE[256] = { 1, 1, 1, 1, 1, ...., 0, 0, 0, 0, 1, 1, ... };
then for each character in offset pos in const char *str you can just check if (GARBAGE[str[pos]] == 1);
this is more or less what an unordered set does, but will have much less instructions. stopwords should be an unordered set if they're not.
now the filtering function (I'm assuming ascii/utf8 and null terminated strings here):
bool Indexer::filter(char *keyword)
{
char *head = pos;
char *tail = pos;
while (*head != '\0') {
//copy non garbage chars from head to tail, lowercasing them while at it
if (!GARBAGE[*head]) {
*tail = tolower(*head);
++tail; //we only advance tail if no garbag
}
//head always advances
++head;
}
*tail = '\0';
// After filtering, if the keyword is empty or it is contained in stop words list, mark as invalid keyword
if (tail == keyword || stopwords_.find(keyword) != stopwords_.end())
return false;
return true;
}
I have a project that I am doing, the main objective is to load a list of words (and lots of them 15k+) into a data structure and then do a search on that structure. I did a little research and as far as I can tell a hash table would be best for this (correct me if I am wrong, I looked into tries as well)
Here's the tricky part: I cannot use any STL's for this project. So as far as I can tell I am going to have to write my own hash table class or find one that pretty much works. I understand how has tables work on a basic level but I am not sure I know enough to write a whole one by myself.
I looked around Google and I could not find any suitable sample code.
My question is does anyone know how to do this in c++ and/or where I can find some code to start off with. I need 3 basic functions for the table: insert, search, remove.
Things to remember while you think about this:
The Number 1 Concern is SPEED! this needs to be lighting fast, no concern for system resources. From the reading that I have done, a hash table can do better than O(log n) Consider mutithreading?
Cannot use STL!
I think, sorted array of strings + binary search should be pretty efficient.
std::unordered_map is not STL
http://www.cs.auckland.ac.nz/software/AlgAnim/hash_tables.html
Not entirely clear on all of the restrictions, but assuming you cannot use anything from std, you could write a simple class like the one below to do the job. We will use an array of buckets to store the data, then use a hash function to turn a string into a number in the range 0...MAX_ELEMENTS. each bucket will hold a linked list of strings, so you can retrieve information again. Typically o(1) insertion and find.
Note that for a more effective solution, you may wish to use a vector rather than a fixed length array as I have gone for. There is also minimal error checking and other improvements, but this should get you started.
NOTE you will need to implement your own string hashing function, you can find plenty of these on the net.
class dictionary
{
struct data
{
char* word = nullptr;
data* next = nullptr;
~data()
{
delete [] word;
}
};
public:
const unsigned int MAX_BUCKETS;
dictionary(unsigned int maxBuckets = 1024)
: MAX_BUCKETS(maxBuckets)
, words(new data*[MAX_BUCKETS])
{
memset(words, 0, sizeof(data*) * MAX_BUCKETS);
}
~dictionary()
{
for (int i = 0; i < MAX_BUCKETS; ++i)
delete words[i];
delete [] words;
}
void insert(const char* word)
{
const auto hash_index = hash(word);
auto& d = words[hash_index];
if (d == nullptr)
{
d = new data;
copy_string(d, word);
}
else
{
while (d->next != nullptr)
{
d = d->next;
}
d->next = new data;
copy_string(d->next, word);
}
}
void copy_string(data* d, const char* word)
{
const auto word_length = strlen(word)+1;
d->word = new char[word_length];
strcpy(d->word, word);
printf("%s\n", d->word);
}
const char* find(const char* word) const
{
const auto hash_index = hash(word);
auto& d = words[hash_index];
if (d == nullptr)
{
return nullptr;
}
while (d != nullptr)
{
printf("checking %s with %s\n", word, d->word);
if (strcmp(d->word, word) == 0)
return d->word;
d = d->next;
}
return nullptr;
}
private:
unsigned int hash(const char* word) const
{
// :TODO: write your own hash function here
const unsigned int num = 0; // :TODO:
return num % MAX_BUCKETS;
}
data** words;
};
http://wikipedia-clustering.speedblue.org/trie.php
Seems the above link is down at the moment.
Alternative link:
https://web.archive.org/web/20160426224744/http://wikipedia-clustering.speedblue.org/trie.php
Source Code: https://web.archive.org/web/20160426224744/http://wikipedia-clustering.speedblue.org/download/libTrie-0.1.tgz