C++ storing functions and operators in a structure - c++

How to improve a data structure for storing functions in arithmetic parser converting from infix to postfix notation?
At this moment I am using an array of char arrays:
const char *funct[] = { "sin", "cos", "tan"... };
char text[] = "tan";
This implementation is a little confusing and leads to comparisons like the following when we test whether a string is a function name:
if ( strcmp( funct[0], text ) == 0 || strcmp( funct[1], text ) == 0 || strcmp( funct[2], text ) == 0 )
{
... do something
}
(or to the equivalent for-loop version).
If there are a lot of functions (and a lot of comparisons), referencing them by index is error-prone and unclear. The indices also have to be changed whenever we add or remove a function.
How to improve such a structure so as it is easy to read, easy to maintain and easy to scale up?
I was thinking about enum
typedef enum
{
Fsin=0,
Fcos,
Ftan
} TFunctions;
which results in
if ( strcmp( funct[Fsin], text ) == 0 || strcmp( funct[Fcos], text ) == 0 || strcmp( funct[Ftan], text ) == 0 )
{
...
but there may be a better solution...

You can use std::map.
enum functions
{
// named Fsin etc. so they do not clash with ::sin, ::cos, ::tan from <math.h>
Fsin,
Fcos,
Ftan
};
std::map<std::string, unsigned char> func_map;
func_map["sin"] = Fsin;
func_map["cos"] = Fcos;
func_map["tan"] = Ftan;
// then:
std::string text = "cos";
std::map<std::string, unsigned char>::iterator it; // must match the type of func_map
it = func_map.find(text);
if(it != func_map.end())
{
// ELEMENT FOUND
unsigned char func_id = it->second;
}
else
{
// NOT FOUND
}

For the fastest code you may use some kind of map, as follows:
typedef std::map<std::string, func_t> func_map;
func_map fm;
fm["sin"] = sin_func(); // get value of this entry from somewhere
fm["cos"] = cos_func(); // for example sin_func or cos_func
auto i = fm.find( "sin" );
if( i != fm.end() ) {
func_t f = i->second; // value found, we may use it.
}
Also, if there are really a lot of items, you may use std::unordered_map instead of std::map.
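As a minimal sketch of the map idea above, the table can map each name straight to a callable, so one lookup answers both "is this token a function?" and "which one?". The alias UnaryFn and the helper find_function are hypothetical names chosen for illustration; they are not part of the original answers.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <unordered_map>

// Hypothetical signature for the parser's unary functions.
using UnaryFn = double (*)(double);

// Returns the function registered under `name`, or nullptr if `name`
// is not a known function.
UnaryFn find_function(const std::string& name) {
    static const std::unordered_map<std::string, UnaryFn> table = {
        {"sin", [](double x) { return std::sin(x); }},
        {"cos", [](double x) { return std::cos(x); }},
        {"tan", [](double x) { return std::tan(x); }},
    };
    const auto it = table.find(name);
    return it == table.end() ? nullptr : it->second;
}

// Usage: no indices to maintain, and adding a function is one new table row.
void check_find_function() {
    assert(find_function("cos") != nullptr);
    assert(find_function("cos")(0.0) == 1.0);
    assert(find_function("sqrt") == nullptr);
}
```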

Related

Efficient way to map a long list of strings to enums

(I am not sure what to call my problem. Please edit the title if there is a better way to state it.)
In an API I'm using there is an enumeration type with 2000+ enumerator values. My objective is to have a function that takes a string and returns the corresponding enumerator. (The enumerator values and the strings have almost the same names.)
My idea was to have nested if statements like this:
(queriedEntityNameString is a std::string and queriedEntityType is a enum type)
//P
if (queriedEntityNameString[0] == 'p'){
//PO
if (queriedEntityNameString[1] == 'o'){
if (queriedEntityNameString == "point")
queriedEntityType = et_point;
else if (queriedEntityNameString == "pocket")
queriedEntityType = et_pocket;
else if (queriedEntityNameString == "point_on_curve")
queriedEntityType = et_point_on_curve;
}
//PR
else if (queriedEntityNameString[1] == 'r'){
if (queriedEntityNameString == "product")
queriedEntityType = et_product;
else if (queriedEntityNameString == "product_definition")
queriedEntityType = et_product_definition;
else if (queriedEntityNameString == "product_definition_formation")
queriedEntityType = et_product_definition_formation;
}
}
//Q
else if (queriedEntityNameString[0] == 'q'){
//QU
if (queriedEntityNameString[1] == 'u'){
if (queriedEntityNameString == "qualified_representation_item")
queriedEntityType = et_qualified_representation_item;
else if (queriedEntityNameString == "quantified_assembly_component_usage")
queriedEntityType = et_quantified_assembly_component_usage;
}
}
Obviously setting this up for 2000 cases manually is "impossible". I could write a little script that does this for me, but is this sort of script available somewhere?
My second idea was to put everything in a std::map, but I'm not sure how efficient it would be.
I would like any suggestions on the best way to approach this problem.
Your chain of if-then-else conditions builds a "poor man's" trie data structure. It is efficient, but it is probably overkill in your situation.
A better approach is to use std::unordered_map, a hash-based container, which retrieves the value associated with a key in O(k) time on average, where k is the size of the key (in your case, the maximum length of a string).
Note that this time does not depend on how many items you have in the map. This is different from std::map, which needs O(k log n) time, where n is the number of items in the map.
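A minimal sketch of that approach follows. The enumerators are the ones quoted in the question; the EntityType name, the et_unknown fallback and the parse_entity helper are assumptions made for the example, since the real API's declarations are not shown.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Stand-in for the API's enumeration (assumed; only a few values shown).
enum EntityType { et_point, et_pocket, et_point_on_curve, et_product, et_unknown };

// Maps a name to its enumerator; et_unknown (hypothetical) signals "no match".
EntityType parse_entity(const std::string& name) {
    // Built once on first call; a generator script could emit these rows.
    static const std::unordered_map<std::string, EntityType> table = {
        {"point", et_point},
        {"pocket", et_pocket},
        {"point_on_curve", et_point_on_curve},
        {"product", et_product},
    };
    const auto it = table.find(name);
    return it != table.end() ? it->second : et_unknown;
}

// Usage: the cost of a lookup depends on the key length, not the table size.
void check_parse_entity() {
    assert(parse_entity("pocket") == et_pocket);
    assert(parse_entity("no_such_entity") == et_unknown);
}
```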
How fast do you really need it to be?
The simplest way is to use an array, assuming that your enums are ordered 0 to n-1:
enum class ProductProperty : unsigned { ... };
struct ProductProperties
{
static constexpr char const * NAMES [] = { ... };
static ProductProperty parse ( char const * const str )
{
auto const comparer = [str] ( char const * str_rep ) { return not strcmp ( str, str_rep ); };
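auto const position = std::find_if ( std::begin ( NAMES ), std::end ( NAMES ), comparer );
// If nothing matches, position is std::end ( NAMES ) and the result is one past the last valid value.
return static_cast<ProductProperty> ( position - std::begin ( NAMES ) );
}
};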
If you want to be fancy, you can build a custom hash function h that you verified to be injective on the domain of valid enums, then order the array with regards to h. Then the parsing of a string will be equivalent to hashing with h, doing a binary search, then 1 more strcmp with the target entry, if any.

find if string starts with sub string using std::equal

May you please point me to what is the wrong thing I am doing here?
auto is_start_with = [](std::string const& whole_string, std::string const& starting_substring)->bool{
if (starting_substring.size() > whole_string.size()) return false;
return std::equal(begin(starting_substring), end(starting_substring), begin(whole_string));
};
It always returns true.
I know there are many other solutions, but I want to know what the error is here.
EDIT :
Debugging!
P.S. I tried it in other main file with directly entering the strings and it worked!!
Edit 2:
I deleted the two tolower transforms before the comparison and it worked!
std::transform(std::begin(fd_name), std::end(fd_name), std::begin(fd_name), ::tolower);
std::transform(std::begin(type_id), std::end(type_id), std::begin(type_id_lower), ::tolower);
I would not use such long identifiers as whole_string or starting_substring. It is clear enough from the parameter declarations that the lambda deals with strings. Overly long names make the code less readable.
And there is no point in using the general functions std::begin and std::end. The lambda is written specifically for strings.
Also, you could use only one return statement. For example
auto is_start_with = []( std::string const &source, std::string const &target )
{
return !( source.size() < target.size() ) &&
std::equal( target.begin(), target.end(), source.begin() );
};
Or even like
auto is_start_with = []( std::string const &source, std::string const &target )
{
return ( not ( source.size() < target.size() ) ) &&
std::equal( target.begin(), target.end(), source.begin() );
};
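Since the question's second edit mentions lowercasing both strings before the comparison, here is a hedged sketch of a case-insensitive prefix test that needs no temporary lowercase copies: std::equal accepts a binary predicate. The name starts_with_icase is hypothetical, and this does not claim to identify the original bug, which cannot be established from the code shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// True if `source` begins with `prefix`, ignoring ASCII case.
bool starts_with_icase(const std::string& source, const std::string& prefix) {
    // The size check must come first: std::equal would otherwise read
    // past the end of `source`.
    if (prefix.size() > source.size()) return false;
    return std::equal(prefix.begin(), prefix.end(), source.begin(),
                      [](unsigned char a, unsigned char b) {
                          // unsigned char keeps std::tolower well-defined
                          return std::tolower(a) == std::tolower(b);
                      });
}

// Usage
void check_starts_with_icase() {
    assert(starts_with_icase("FileDescriptor", "file"));
    assert(!starts_with_icase("ab", "abc"));
}
```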

How to make sure user enters allowed enum

I have to write a program with an enum State, which holds the 50 two-letter state abbreviations (NY, FL, etc.). The program asks for the user's info, and the user needs to type in the two letters corresponding to the state. How can I check that their input is valid, i.e. that it matches a two-letter state defined in enum State {AL, ..., WY}? I suppose I could write one huge if statement checking if (input == "AL" || ... || input == "WY") { do stuff } else { error: input does not match a state }, but having to do that for all 50 states would get a bit ridiculous. Is there an easier way to do this?
Also if Enum State is defined as {AL, NY, FL}, how could I cast a user input, which would be a string, into a State? If I changed the States to {"AL", "NY", "FL"} would that be easier or is there another way to do it?
Unfortunately C++ does not provide a portable way to convert an enum to a string and vice versa. A possible solution is to populate a std::map<std::string,State> (or a hash map) and do a lookup on conversion. You can either populate such a map manually or write a simple script that generates a function in a .cpp file to populate the map, and call that script during the build process. Your script can also generate if/else statements instead of populating a map.
Another, more difficult but more flexible and stable, solution is to use a compiler infrastructure like LLVM and write a plugin that generates such a function from the syntax tree produced by the compiler.
The simplest method is to use an STL std::map, but for academic exercises that may not be permitted (for example it may be required to use only techniques covered in the course material).
Unless explicitly initialised, enumerations are integer numbered sequentially starting from zero. Given that, you can scan a lookup-table of strings, and cast the matching index to an enum. For example:
enum eUSstate
{
AL, AK, AZ, ..., NOSTATE
} ;
eUSstate string_to_enum( std::string inp )
{
static const int STATES = 50 ;
std::string lookup[STATES] = { "AL", "AK", "AZ" ... } ;
int i = 0 ;
for( i = 0; i < STATES && lookup[i] != inp; i++ )
{
// do nothing
}
return static_cast<eUSstate>(i) ;
}
If perhaps you don't want to rely on a brute-force cast and maintaining a look-up table in the same order as the enumerations, then a lookup table having both the string and the matching enum may be used.
eUSstate string_to_enum( std::string inp )
{
static const int STATES = 50 ;
struct
{
std::string state_string ;
eUSstate state_enum ;
} lookup[STATES] { {"AL", AL}, {"AK", AK}, {"AZ", AZ} ... } ;
eUSstate ret = NOSTATE ;
for( int i = 0; ret == NOSTATE && i < STATES; i++ )
{
if( lookup[i].state_string == inp )
{
ret = lookup[i].state_enum ;
}
}
return ret ;
}
The look-up can be optimised by taking advantage of alphabetical ordering and performing a binary search, but for 50 states it is hardly worth it.
What you need is a table. Because the enums are linear,
a simple table of strings would be sufficient:
char const* const stateNames[] =
{
// In the same order as in the enum.
"NY",
"FL",
// ...
};
Then:
char const* const* entry
= std::find( std::begin( stateNames ), std::end( stateNames ), userInput );
if (entry == std::end( stateNames ) ) {
// Illegal input...
} else {
State value = static_cast<State>( entry - std::begin( stateNames ) );
}
Alternatively, you can have an array of:
struct StateMapping
{
State enumValue;
char const* name;
struct OrderByName
{
bool operator()( StateMapping const& lhs, StateMapping const& rhs ) const
{
return std::strcmp( lhs.name, rhs. name ) < 0;
}
bool operator()( StateMapping const& lhs, std::string const& rhs ) const
{
return lhs.name < rhs;
}
bool operator()( std::string const& lhs, StateMapping const& rhs ) const
{
return lhs < rhs.name;
}
};
};
StateMapping const states[] =
{
{ NY, "NY" },
// ...
};
sorted by the key, and use std::lower_bound:
StateMapping::OrderByName cmp;
StateMapping const* entry =
std::lower_bound( std::begin( states ), std::end( states ), userInput, cmp );
if ( entry == std::end( states ) || cmp( userInput, *entry ) ) {
// Illegal input...
} else {
State value = entry->enumValue;
// ...
}
The latter is probably slightly faster, but for only fifty
entries, I doubt you'll notice the difference.
And of course, you don't write this code manually; you generate
it with a simple script. (In the past, I had code which would
parse the C++ source for the enum definitions, and generate the
mapping functionality from them. It's simpler than it sounds,
since you can ignore large chunks of the C++ code, other than
for keeping track of the various nestings.)
The solution is simple, but only for 2 characters in the string (as in your case):
#include <stdio.h>
#include <stdint.h>
enum TEnum
{
AL = 'LA',
NY = 'YN',
FL = 'LF'
};
int main(void)
{
const char* input = "NY";
//const char* input = "AL";
//const char* input = "FL";
switch( *(const uint16_t*)input ) // reads two bytes as one integer; relies on byte order and alignment
{
case AL:
printf("input AL");
break;
case NY:
printf("input NY");
break;
case FL:
printf("input FL");
break;
}
return 0;
}
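In the above example I used an enumeration with two-character codes (multi-character literals are legal, but their value is implementation-defined) and passed the input string to the switch statement. I tested it and it works. Notice the reversed character order in the enumeration, which matches the little-endian byte order of the machine I tested on.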
Ciao
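A hedged variation on the same idea that avoids multi-character literals and the pointer cast: pack the two characters arithmetically, so the result does not depend on byte order or alignment. The names pack and parse_state are hypothetical, and only three states are listed to mirror the example above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Combines two characters into one 16-bit code, first character in the high byte.
constexpr std::uint16_t pack(char a, char b) {
    return static_cast<std::uint16_t>(
        (static_cast<unsigned char>(a) << 8) | static_cast<unsigned char>(b));
}

enum class State : std::uint16_t {
    AL = pack('A', 'L'),
    NY = pack('N', 'Y'),
    FL = pack('F', 'L'),
};

// Stores the matching state in `out` and returns true, or returns false
// if `input` is not exactly one of the listed two-letter codes.
bool parse_state(const char* input, State& out) {
    if (input == nullptr || std::strlen(input) != 2) return false;
    switch (pack(input[0], input[1])) {
        case pack('A', 'L'): out = State::AL; return true;
        case pack('N', 'Y'): out = State::NY; return true;
        case pack('F', 'L'): out = State::FL; return true;
        default: return false;
    }
}

// Usage
void check_parse_state() {
    State s = State::AL;
    assert(parse_state("NY", s) && s == State::NY);
    assert(!parse_state("ZZ", s));
}
```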

search for a string in a list of string ranges

I have a list of lexicographical ranges, for example
[a,an) [an,bb) [bb,d) [c,h)
Given a string say apple, I need to find which range it belongs to. In this case it is in the second range. If the string could belong to multiple ranges, the first one needs to be returned. Eg: cat should return range3 and not range4.
Brute force approach would be to loop through the list in order and check if the string fits in there.
Better approach would be to resolve overlaps first, sort the ranges and do a binary search.
Any suggestions for a further optimized algorithm? Implementation tips for C++ are also welcome. This logic sits on a critical execution path and has to be fast.
Update:
Yes, there could be gaps in the ranges.
Yes, binary search can make it O(log(n)). Is there some way I can come up with a hash and make it even better? What would the hash look like? We can assume we have only lowercase characters in all the strings and ranges.
Here is what you should do:
First sort the ranges by their beginnings in lexicographical order. Then do the following pre-processing on them: for each range, make its beginning the greater of its own beginning and the end of the previous range (if this makes the current range empty, simply ignore it). You do that because if a word is before the end of the previous range, then it belongs to one of the previous ranges and will never be classified into the current one. After this pre-processing all the ranges are non-overlapping, so each word you search for belongs to at most one of them. All that remains is to perform a binary search on the pre-processed ranges, which takes O(log(n)) time. I doubt you can achieve better complexity for this problem.
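The steps above can be sketched as follows. This is one possible reading, under the assumption (true for the example in the question) that the list order agrees with the order by lower bound, so that "first in the list" and "earliest start" coincide. The Range struct and the names preprocess and find_range are hypothetical. The sketch clips against the furthest end seen so far rather than only the immediately preceding range, which also covers the case where the preceding range was dropped.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct Range {
    std::string lo;  // inclusive lower bound
    std::string hi;  // exclusive upper bound
    int id;          // position in the original list
};

// Sorts by lower bound, then clips every range so that it starts no earlier
// than the furthest upper bound seen so far. Ranges that become empty are dropped.
std::vector<Range> preprocess(std::vector<Range> ranges) {
    std::stable_sort(ranges.begin(), ranges.end(),
                     [](const Range& a, const Range& b) { return a.lo < b.lo; });
    std::vector<Range> out;
    std::string covered;  // "" sorts before every other string
    for (Range r : ranges) {
        if (r.lo < covered) r.lo = covered;
        if (r.lo < r.hi) out.push_back(r);
        if (covered < r.hi) covered = r.hi;
    }
    return out;
}

// Binary search over the clipped ranges; returns the range id or -1 (gap).
int find_range(const std::vector<Range>& pre, const std::string& s) {
    // First range whose lower bound is greater than s; the candidate precedes it.
    auto it = std::upper_bound(pre.begin(), pre.end(), s,
        [](const std::string& key, const Range& r) { return key < r.lo; });
    if (it == pre.begin()) return -1;
    --it;
    return s < it->hi ? it->id : -1;
}

// Usage with the ranges from the question (ids are zero-based).
void check_find_range() {
    const std::vector<Range> pre =
        preprocess({{"a", "an", 0}, {"an", "bb", 1}, {"bb", "d", 2}, {"c", "h", 3}});
    assert(find_range(pre, "apple") == 1);
    assert(find_range(pre, "cat") == 2);
}
```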
Some kind of index to the start of each range, perhaps a binary tree, would probably be a good idea. Not sure if you need to index to the end of each range, unless there may be gaps.
One solution comes to mind: maybe you can sort the characters of the word apple and identify the character that comes last in a-z order, and then check just that one character against your ranges. Thinking more...
If you have memory to spare and are limited to lowercase, you can build a multi-way tree. Top node has an array of 26 pointers. Pointers are Null if no range starts with that character. They point to a range if all words starting with that character fall into the range, and point to another node if the ranges split on a following character. (so given [aa-ap],[ar-bl]; the 'a' entry would point to another node where entries 'a' through 'p' pointed to range 1, entry 'q' was null, and 'r' thru 'z' pointed to range 2. )
This should be O(max(range_specifier)).
You might approach this by "gridding".
Use an array with 26 entries corresponding to the first letter. Every bin contains the list of ranges having a nonempty intersection with it. Like
'a' -> [a,an) [an,bb), 'b' -> [an,bb) [bb,d), 'c' -> [bb,d) [c,h) ...
You easily generalize the idea to a prefix of a few letters
'aaa' -> [a,an), 'aab' -> [a,an), 'aac' -> [a,an) ...
This can greatly shorten the list of ranges to be tried, especially if there are few overlaps, at the expense of storage and preprocessing time.
A special convention can be used to indicate that a bin is wholly covered.
Happy distributions can lead to O(1), I guess.
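A minimal sketch of the first-letter grid, assuming lowercase ASCII input as stated in the question. The class name FirstLetterGrid and its members are hypothetical. Each bin keeps the indices of the ranges that intersect it in list order, so the first match in a bin is also the first match in the original list.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Range = std::pair<std::string, std::string>;  // [first, second)

class FirstLetterGrid {
public:
    explicit FirstLetterGrid(std::vector<Range> ranges) : ranges_(std::move(ranges)) {
        for (int c = 0; c < 26; ++c) {
            // Bin c holds every string in [bin_lo, bin_hi).
            // For 'z' the upper bound is "{", the next ASCII character.
            const std::string bin_lo(1, static_cast<char>('a' + c));
            const std::string bin_hi(1, static_cast<char>('a' + c + 1));
            for (std::size_t i = 0; i < ranges_.size(); ++i) {
                const std::string& start = std::max(ranges_[i].first, bin_lo);
                // Non-empty intersection of the range with the bin.
                if (start < ranges_[i].second && start < bin_hi) bins_[c].push_back(i);
            }
        }
    }

    // Returns the index of the first listed range containing s, or -1.
    int find(const std::string& s) const {
        if (s.empty() || s[0] < 'a' || s[0] > 'z') return -1;
        for (std::size_t i : bins_[s[0] - 'a']) {
            if (ranges_[i].first <= s && s < ranges_[i].second) return static_cast<int>(i);
        }
        return -1;
    }

private:
    std::vector<Range> ranges_;
    std::array<std::vector<std::size_t>, 26> bins_;
};

// Usage with the ranges from the question (indices are zero-based).
void check_grid() {
    const FirstLetterGrid grid(std::vector<Range>{
        Range("a", "an"), Range("an", "bb"), Range("bb", "d"), Range("c", "h")});
    assert(grid.find("apple") == 1);
    assert(grid.find("cat") == 2);
}
```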
I wouldn't be surprised that your set of ranges can be represented with a trie (http://en.wikipedia.org/wiki/Trie). Once the trie is filled, the query time should not exceed the length of the longest range bound nor the length of the query string.
This is optimal in terms of query time (in fact O(1) in your computational model).
My approach would be
a range has two limits (the lower and upper limit)
each range partitions the space into three parts (below, inside, above)
each limit partitions the space into two parts (below, above_or_equal)
So the method could be:
number the ranges
decompose the ranges into two limits
put the limits into a tree, in nodes containing two lists with the ranges that refer to them (one list for nodes that use this limit as lower limit, one for upper limit)
these lists can be bitmaps, since the ranges are numbered
to find a string
you walk the tree, and every time you step down you actually cross a limit and gain knowledge about which limits you have to your right/left, and which ranges you are left/right/inside.
you need two additional lists (of range numbers) to do this traversal.
these lists can be bitmaps
every time you cross a border you add the range number from one of the lists and remove it from the other.
once you are inside a range (x >= lower limit && x < upper limit; with the limits corresponding to the same range of course) the algorithm finishes.
(given that this is actually the range with the lowest number: first match)
this can be detected if the two lists share one or more members
we want the lowest-numbered overlapping member.
Since this method is a tree search, it has O(log(N)) complexity.
UPDATE: On second thought, bitmaps are not good way to store the usage lists or the results. A linked list (actually two) is better. Code is 300 lines. Should I post it here ?
#include <limits.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#define COUNTOF(a) (sizeof a / sizeof a[0])
#define WANT_DUPLICATES 0
struct range {
/* separate linked lists for {lo,hi})
** for limits that this range participates in. */
struct range *next_lo;
struct range *next_hi;
unsigned num;
char *str_lo, *str_hi;
} ranges[] =
{{NULL,NULL,0, "a","an"}, {NULL,NULL,1, "an", "bb"}
,{NULL,NULL,2, "bb", "d"}, {NULL,NULL,3, "c", "h"}
,{NULL,NULL,4, "bb", "c"}, {NULL,NULL,5, "c", "d"}
};
#define NRANGE COUNTOF(ranges)
void add_range(struct range *pp);
void list_dmp(FILE *fp, int isupper, struct range *bp);
struct treetwo {
struct treetwo *prev;
struct treetwo *next;
char *str;
struct range *list_lo; /* ranges that use str as lower limit */
struct range *list_hi; /* ranges that use str as upper limit */
};
struct treetwo *root = NULL;
struct treetwo ** find_hnd(struct treetwo **tpp, char *str);
void tree_dmp(FILE *fp, struct treetwo *tp, char *msg, unsigned indent);
struct result {
unsigned size;
unsigned used;
struct {
unsigned num;
unsigned mask;
} *entries;
};
#define MASK_BELOW_LO 1
#define MASK_ABOVE_LO 2
#define MASK_BELOW_HI 4
#define MASK_ABOVE_HI 8
int result_resize(struct result *res, unsigned newsize);
void init_structures(void);
struct result *find_matches (char *str);
unsigned update_state(struct result *rp, struct treetwo *tp, int isabove);
int main (void)
{
char buff[100];
struct result *res;
size_t pos;
unsigned idx;
static char *legend[4] = { "unknown", "below", "above", "impossible"};
init_structures();
tree_dmp(stderr, root, "Root", 0);
while (fgets (buff, sizeof buff, stdin) ) {
pos=strcspn(buff, "\r\n");
buff[pos] = 0;
res = find_matches (buff);
for (idx=0; idx < res->used; idx++) {
unsigned num = res->entries[idx].num;
unsigned mask = res->entries[idx].mask;
fprintf(stdout, "[%u]Range%u %x: '%s' %s '%s' and '%s' %s '%s'\n"
, idx, num, mask
, buff, legend[mask & 3], ranges[num].str_lo
, buff, legend[(mask>>2) & 3], ranges[num].str_hi
);
}
}
return 0;
}
unsigned update_state(struct result *rp, struct treetwo *tp, int isabove)
{
struct range *p;
unsigned mask_lo, mask_hi;
unsigned hitcnt,idx;
/* State: (lower limit)
** 0 : unknown
** MASK_BELOW_LO: below limit
** MASK_ABOVE_LO: above limit
** 3: impossible
** State: (upper limit)
** 0 : unknown
** MASK_BELOW_HI: below limit
** MASK_ABOVE_HI: above limit
** c: impossible
** Combined states:
** required state 2|4 := 6
** 5: unreachable
** a: unreachable
** 9: impossible
** f: impossible
*/
if (!tp) return 0;
hitcnt=0;
mask_lo = (isabove>=0) ? MASK_ABOVE_LO : MASK_BELOW_LO;
mask_hi = (isabove>=0) ? MASK_ABOVE_HI : MASK_BELOW_HI;
fprintf(stderr , "Update_state(start{%s}, isabove=%d, mask=%x,%x)\n"
, tp->str , isabove, mask_lo, mask_hi);
fprintf(stderr , "Update_state(Lo=%s)=", tp->str);
list_dmp(stderr , 0, tp->list_lo);
idx=0;
for (p = tp->list_lo; p ; p = p->next_lo) {
unsigned num = p->num;
fprintf(stderr , "Update_state:[%u] |= %u", num, mask_lo );
for ( ;idx < rp->used;idx++) { if (rp->entries[idx].num >= num) break; }
if ( idx < rp->used ) {
fprintf(stderr , " Old was:%u\n", rp->entries[idx].mask );
rp->entries[idx].mask |= mask_lo;
if (rp->entries[idx].mask == (MASK_ABOVE_LO|MASK_BELOW_HI)) hitcnt++;
continue;
}
if ( idx >= rp->used) {
if ( rp->used >= rp->size && result_resize(rp, rp->size ? rp->size*2 : 8)) break;
fprintf(stderr , " New at:%u\n", idx );
rp->entries[idx].num = num;
rp->entries[idx].mask = mask_lo;
rp->used++;
}
}
fprintf(stderr , "Update_state(Hi=%s)=", tp->str);
list_dmp(stderr , 1, tp->list_hi);
idx=0;
for (p = tp->list_hi; p ; p = p->next_hi) {
unsigned num = p->num;
fprintf(stderr , "Update_state:[%u] |= %u", num, mask_hi );
for ( ;idx < rp->used;idx++) { if (rp->entries[idx].num >= num) break; }
if ( idx < rp->used ) {
fprintf(stderr , " Old was:%u\n", rp->entries[idx].mask );
rp->entries[idx].mask |= mask_hi;
if (rp->entries[idx].mask == (MASK_ABOVE_LO|MASK_BELOW_HI)) hitcnt++;
continue;
}
if ( idx >= rp->used) {
if ( rp->used >= rp->size && result_resize(rp, rp->size ? rp->size*2 : 8)) break;
fprintf(stderr , " New at:%u\n", idx );
rp->entries[idx].num = num;
rp->entries[idx].mask = mask_hi;
rp->used++;
}
}
return hitcnt;
}
struct result *find_matches (char *str)
{
int rc;
struct treetwo **hnd;
struct result *res = malloc (sizeof *res);
unsigned dst,src;
res->used=res->size=0; res->entries=0;
for (hnd= &root; *hnd; hnd = (rc < 0) ? &(*hnd)->prev : &(*hnd)->next ) {
rc = strcmp( str, (*hnd)->str);
fprintf(stderr, "####\nStr=%s Node={%s} rc=%d\n"
, str, (*hnd)->str, rc );
list_dmp(stderr , 0, (*hnd)->list_lo );
list_dmp(stderr , 1, (*hnd)->list_hi );
rc = update_state(res, *hnd , rc);
#if WANT_DUPLICATES
continue;
#else
/* if we don't want duplicates we can bail out on the first match */
if (rc) break;
#endif
}
/* Now cleanup the results.
** Below(lower limit) and above(upper limit) and variations can be removed.
** Some results are incomplete, because one of their limits is out
** of reach (shadowed by a narrower range). We'll have to recompute these.
** The result structure is compacted: if entries are deleted, the remaining ones are shifted down.
** Note: part of this cleanup (removal of unreachables) could be done in update_state(),
** that would keep the array with partial results as short as possible.
*/
for (dst=src=0; src < res->used; src++) {
int rc;
unsigned num = res->entries[src].num;
rescan:
switch (res->entries[src].mask & 0xf) {
default: break;
case 0: /* impossible */
continue; /* drop the entry; jumping back to rescan here would loop forever */
#if WANT_DUPLICATES
case MASK_ABOVE_LO:
rc = strcmp(str, ranges[num].str_hi);
res->entries[src].mask |= (rc >=0) ? MASK_ABOVE_HI : MASK_BELOW_HI;
goto rescan;
case MASK_BELOW_HI:
rc = strcmp(str, ranges[num].str_lo);
res->entries[src].mask |= (rc >=0) ? MASK_ABOVE_LO : MASK_BELOW_LO;
goto rescan;
#endif
case MASK_BELOW_HI|MASK_ABOVE_LO:
if (dst != src) res->entries[dst] = res->entries[src];
dst++;
}
}
fprintf(stderr, "####\nFinal pass: %u/%u\n", dst, res->used );
res->used = dst;
return res;
}
void init_structures(void)
{
unsigned idx;
for (idx = 0; idx < NRANGE; idx++) {
add_range( &ranges[idx]);
}
}
void list_dmp(FILE *fp, int isupper, struct range *bp)
{
fprintf(fp, "%s", (isupper) ? "Upper" :"Lower" );
for ( ; bp ; bp = (isupper) ? bp->next_hi : bp->next_lo) {
fprintf(fp, " %u:{%s,%s}"
, bp->num , bp->str_lo , bp->str_hi
);
}
fprintf( fp, "\n" );
}
void add_range(struct range *pp)
{
struct treetwo **tpp;
struct range **rpp;
fprintf(stderr, "Inserting range %u->{%s,%s}\n", pp->num, pp->str_lo, pp->str_hi);
/* find low boundary for this interval */
tpp = find_hnd (&root, pp->str_lo);
if (!*tpp) {
fprintf(stderr, "Creating node for %u->%s (low)\n", pp->num, pp->str_lo);
*tpp = malloc(sizeof **tpp);
(*tpp)->prev = (*tpp)->next = NULL; /* malloc does not zero the child links */
(*tpp)->list_lo = NULL;
(*tpp)->list_hi = NULL;
(*tpp)->str = pp->str_lo;
}
for (rpp = &(*tpp)->list_lo; *rpp ; rpp = &(*rpp)->next_lo) {;}
*rpp = pp;
fprintf(stderr, "Added range %u->{%s,%s} to treenode(%s)->list_lo\n"
, pp->num, pp->str_lo, pp->str_hi
, (*tpp)->str
);
/* find high boundary */
tpp = find_hnd (&root, pp->str_hi);
if (!*tpp) {
fprintf(stderr, "Creating node for %u->%s (High)\n", pp->num, pp->str_hi);
*tpp = malloc(sizeof **tpp);
(*tpp)->prev = (*tpp)->next = NULL; /* malloc does not zero the child links */
(*tpp)->list_lo = NULL;
(*tpp)->list_hi = NULL;
(*tpp)->str = pp->str_hi;
}
for (rpp = &(*tpp)->list_hi; *rpp ; rpp = &(*rpp)->next_hi) {;}
*rpp = pp;
fprintf(stderr, "Added range %u->{%s,%s} to treenode(%s)->list_hi\n"
, pp->num, pp->str_lo, pp->str_hi
, (*tpp)->str
);
}
struct treetwo ** find_hnd(struct treetwo **tpp, char *str)
{
int rc;
for ( ; *tpp; tpp = (rc < 0) ? &(*tpp)->prev : &(*tpp)->next ) {
rc = strcmp( str, (*tpp)->str);
if (!rc) break;
}
return tpp;
}
void tree_dmp(FILE *fp, struct treetwo *tp, char *msg, unsigned indent)
{
unsigned uu;
if (!tp) return;
if (!msg) msg = "";
for (uu=0; uu < indent; uu++) { fputc( ' ', fp); }
fprintf(fp, "%s:{%s}\n", msg, tp->str );
for (uu=0; uu < indent+1; uu++) { fputc( ' ', fp); }
list_dmp(fp , 0, tp->list_lo);
for (uu=0; uu < indent+1; uu++) { fputc( ' ', fp); }
list_dmp(fp , 1, tp->list_hi);
tree_dmp(fp, tp->prev, "Prev", indent+2);
tree_dmp(fp, tp->next, "Next", indent+2);
}
int result_resize(struct result *res, unsigned newsize)
{
void *old;
old = res->entries;
res->entries = realloc ( res->entries , newsize * sizeof *res->entries);
if ( !res->entries) {
res->entries = old; return -1;
}
res->size = newsize;
if (res->used > newsize) res->used = newsize;
return 0;
}

hashkey collision when removing C++

To search for each "symbol" I want to remove from my hashTable, I have chosen to regenerate the hash key it was inserted at. However, the problem I'm seeing in my remove function is this: when I need to remove a symbol that was stored after a collision, an earlier removal makes my while loop condition test false where I do not want it to.
bool hashmap::get(char const * const symbol, stock& s) const
{
int hash = this->hashStr( symbol );
while ( hashTable[hash].m_symbol != NULL )
{ // try to find a match for the stock associated with the symbol.
if ( strcmp( hashTable[hash].m_symbol , symbol ) == 0 )
{
s = hashTable[hash]; // copy the stock; s is a reference, not a pointer
return true;
}
++hash %= maxSize;
}
return false;
}
bool hashmap::put(const stock& s, int& usedIndex, int& hashIndex, int& symbolHash)
{
hashIndex = this->hashStr( s.m_symbol ); // Get remainder, Insert at that index.
symbolHash = (int&)s.m_symbol;
usedIndex = hashIndex;
while ( hashTable[hashIndex].m_symbol != NULL ) // collision found
{
++usedIndex %= maxSize; // if necessary wrap index around
if ( hashTable[usedIndex].m_symbol == NULL )
{
hashTable[usedIndex] = s;
return true;
}
else if ( strcmp( hashTable[usedIndex].m_symbol , s.m_symbol ) == 0 )
{
return false; // prevent duplicate entry
}
}
hashTable[hashIndex] = s; // insert if no collision
return true;
}
// What if I need to remove an index i generate?
bool hashmap::remove(char const * const symbol)
{
int hashVal = this->hashStr( symbol );
while ( hashTable[hashVal].m_symbol != NULL )
{
if ( strcmp( hashTable[hashVal].m_symbol, symbol ) == 0 )
{
stock temp = hashTable[hashVal]; // we can save it
hashTable[hashVal].m_symbol = NULL;
return true;
}
++hashVal %= maxSize; // wrap around if needed
} // go to the next cell, meaning there was a previous collision
return false;
}
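int hashmap::hashStr(char const * const str)
{
size_t length = strlen( str );
unsigned hash = 0; // unsigned: overflow wraps around instead of producing a negative index
for ( unsigned i = 0; i < length; i++ )
{
hash = 31 * hash + static_cast<unsigned char>( str[i] );
}
return static_cast<int>( hash % maxSize );
}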
What would I need to do to remove a "symbol" from my hashTable from a previous collision?
I am hoping it is not java's equation directly above.
It looks like you are implementing a hash table with open addressing, is that right? Deleting is a little tricky in that scheme. See http://www.maths.lse.ac.uk/Courses/MA407/del-hash.pdf:
"Deletion of keys is problematic with open addressing: If there are two colliding keys x and y with h(x) = h(y), and key x is inserted before key y, and one wants to delete key x, this cannot simply be done by marking T[h(x)] as FREE, since then y would no longer be found. One possibility would be to mark T[h(x)] as DELETED (another special entry), which is skipped when searching for a key. A table place marked as DELETED may also be re-used for storing another key z that one wants to insert if one is sure that this key z is not already in the table (i.e., by reaching the end of the probe sequence for key z and not finding it). Such re-use complicates the insertion method. Moreover, places with DELETED keys fill the table."
What you need to do is create a dummy sentinel value that represents a "deleted" item. When you insert a new value into the table, you need to check to see if an element is NULL or "deleted". If a slot contains this sentinel "deleted" value or the slot is NULL, then the slot is a valid slot for insertion.
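The sentinel idea can be sketched as below. This is an illustration only, not the asker's hashmap class: ProbingSet, its State enum and its member names are hypothetical, it stores plain strings instead of stock objects, and it assumes a capacity greater than zero. The hash is the same 31-based formula as in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Linear-probing set of strings with "deleted" markers (tombstones).
class ProbingSet {
    enum class State { Empty, Deleted, Full };
    struct Slot {
        State state;
        std::string key;
    };
    std::vector<Slot> slots_;

    std::size_t home(const std::string& key) const {
        std::size_t h = 0;
        for (unsigned char c : key) h = 31 * h + c;
        return h % slots_.size();
    }

    // Deleted slots are skipped; only an Empty slot ends the probe sequence.
    bool locate(const std::string& key, std::size_t& index) const {
        std::size_t i = home(key);
        for (std::size_t n = 0; n < slots_.size(); ++n, i = (i + 1) % slots_.size()) {
            if (slots_[i].state == State::Empty) return false;
            if (slots_[i].state == State::Full && slots_[i].key == key) {
                index = i;
                return true;
            }
        }
        return false;
    }

public:
    explicit ProbingSet(std::size_t capacity)
        : slots_(capacity, Slot{State::Empty, std::string()}) {}

    bool contains(const std::string& key) const {
        std::size_t i;
        return locate(key, i);
    }

    bool insert(const std::string& key) {
        std::size_t i;
        if (locate(key, i)) return false;  // prevent duplicate entry
        i = home(key);
        for (std::size_t n = 0; n < slots_.size(); ++n, i = (i + 1) % slots_.size()) {
            if (slots_[i].state != State::Full) {  // Empty or Deleted: reusable
                slots_[i].state = State::Full;
                slots_[i].key = key;
                return true;
            }
        }
        return false;  // table is full
    }

    bool remove(const std::string& key) {
        std::size_t i;
        if (!locate(key, i)) return false;
        slots_[i].state = State::Deleted;  // tombstone, so later probes keep going
        slots_[i].key.clear();
        return true;
    }
};

// Usage: "a" and "e" share a home slot in a table of size 4.
void check_probing_set() {
    ProbingSet t(4);
    assert(t.insert("a") && t.insert("e"));
    assert(t.remove("a"));
    assert(t.contains("e"));  // still reachable past the tombstone
}
```

Searching for an existing key before inserting is what makes it safe to reuse a deleted slot, as the quoted notes point out.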
That said, if you are writing this code for production, you should consider using boost::unordered_map (or std::unordered_map, available since C++11) instead of rolling your own hash map implementation. If this is for schoolwork, well, good luck.