Operator [] overload for hash table - c++

I want to overload the [] operator for a hash table I have to write for homework.
I am using a vector of lists that contain pairs: std::vector<std::forward_list<std::pair<std::string, int>>>
What I want the operator to do is return the other part of the given pair. For instance, if there is a pair ("test", 21), writing vectorname["test"] should give 21, and writing vectorname["test"] = 22 should modify the pair. Also, there should be no identical keys, or if there are, only the first one should be taken into consideration.
This is my first Stack Overflow question, sorry if I didn't explain things very well.

In order to do this sort of thing, you need to have your operator[] return a reference-like type that can be assigned to (in order to update the table) or just used (when reading the hash table). The critical thing you need to decide is what to do when the key is not present in the table. There are two options:
Add the key to the table immediately. This means that when you try to read a key that is not present, you'll add it to the table with a defaulted value (this is how STL maps work).
Don't add the key until you actually assign to the element. This is more work, but allows you to have value types without a default constructor.
In the former case, you just return an actual reference to the element value. In the latter case, you need to implement a custom element_ref class that can be assigned to (operator=) or can be implicitly converted to the element value type (operator int in your case).
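As a rough illustration of the first option, here is a minimal sketch built on the container from the question. The class name HashTable, the member buckets_, and the fixed default of 16 buckets are assumptions made for the example; there is no rehashing or erase.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical wrapper around the vector-of-lists layout from the question.
class HashTable {
public:
    explicit HashTable(std::size_t bucket_count = 16) : buckets_(bucket_count) {
        assert(bucket_count > 0);  // the modulo below needs at least one bucket
    }

    // Returns a reference to the value stored for `key`.
    // If the key is absent, it is inserted with the value 0 first.
    int& operator[](const std::string& key) {
        auto& bucket = buckets_[std::hash<std::string>{}(key) % buckets_.size()];
        for (auto& entry : bucket) {
            if (entry.first == key) {
                return entry.second;  // existing key: no duplicate is created
            }
        }
        bucket.emplace_front(key, 0);
        return bucket.front().second;
    }

private:
    std::vector<std::forward_list<std::pair<std::string, int>>> buckets_;
};
```

With this, table["test"] = 22 updates the stored pair, and reading a missing key inserts it with the value 0, much as std::map does. Because the operator inserts on a miss, it cannot be const; a separate find-style member would be needed for read-only access.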

Related

std::unordered_set::find - construct an instance only for find()

A lot of times I see my key is actually inside my value.
For example:
struct Elem {
int key;
// ... Other variables ...
};
That makes me want to use std::unordered_set instead of std::unordered_map, because I already have the key stored inside my value, so there is no need to waste more space on std::unordered_map's .first (key).
Then I start implementing with std::unordered_set and get to the point where I need to perform a find() on my std::unordered_set.
Then I realize I need to create an empty-shell Elem to be able to call find(), because std::unordered_set::find takes a Key as input:
template < class Key, // unordered_set::key_type/value_type
class Hash = hash<Key>, // unordered_set::hasher
class Pred = equal_to<Key>, // unordered_set::key_equal
class Alloc = allocator<Key> // unordered_set::allocator_type
> class unordered_set;
Sometimes building an empty-shell Elem is hard / wasteful / maybe even not possible?
For example, when my key/value is
An iterator
A reference
A class with specific c'tor (not constructing the instance only with the key)
Q. Am I missing something?
Q. Is there a way to do find() that isn't wasteful? I mean that doesn't make me create an instance I didn't want to
Something really strange to me - that I already should have the element I'm looking for in order to find it, or at least an empty-shell of it.
When choosing a data structure to hold your data you need to consider your use case.
If you want to look up data from a key you should use a map. If you just want to store unique values in a collection and you don't need to look them up use set.
I don't see why it's so much trouble to insert an element as map.emplace(elem.key, elem) vs set.emplace(elem) if it means that down the road you can just query the elem as map.at(key) or map[key] vs having to create an empty elem.
Besides, std::set does the whole key thingamajig (roughly) under the hood anyway. (source: What is the difference between set vs map in C++?)
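A minimal sketch of the map-based approach suggested above, assuming an Elem with one extra field. The names payload, ElemIndex, add and lookup are hypothetical and only serve the example.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

struct Elem {
    int key;
    std::string payload;  // hypothetical stand-in for "other variables"
};

// The key is stored twice (once as the map key, once inside Elem),
// which costs one extra int per element.
using ElemIndex = std::unordered_map<int, Elem>;

// Inserts elem under its own key; does nothing if the key already exists.
void add(ElemIndex& index, Elem elem) {
    const int key = elem.key;  // read the key before moving elem
    index.emplace(key, std::move(elem));
}

// Looks an element up by key alone; no empty-shell Elem is needed.
// Returns nullptr when the key is absent.
const Elem* lookup(const ElemIndex& index, int key) {
    auto it = index.find(key);
    return it == index.end() ? nullptr : &it->second;
}
```

The trade-off is the duplicated key in exchange for a find() that only needs the key.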

How to get a fixed-size class name string

I use typeid(ClassName).name() to get the name for a wide range of class types. However, I need to make the length of it fixed (e.g. 8 char). In many cases the class is in a namespace which makes the string so long, and it does not work if I just get the first 10 characters.
Does anyone know a good way to code/decode a string into a fixed size string? I can't really keep a table to map the hash_code to a name since I'm going to send the string to another machine which does not have access to the map.
#include <typeinfo>
// name() returns a const char*, so the return type must be const too.
template <typename ClassType> const char* get_name() {
    return typeid(ClassType).name(); // ??
}
In general, it's not possible to build a function mapping arbitrary-length strings into a fixed domain. That violates the pigeonhole principle.
The following suggestion seems to me fairly convoluted, but given the lack of larger context to your problem, here goes...
Suppose you build a class through which to run all your names, like so:
class compressor {
public:
explicit compressor(std::size_t seed);
std::string operator()(const std::string &name) const;
};
It has two members: a ctor taking a seed, and an operator() taking a name string and returning an 8-char key string. In your code, initialize this object with some fixed, arbitrary seed.
Internally, the class object should hold an unordered_map mapping, for each distinct name on which it was applied, the key to which it was mapped. Initially, obviously, this internal unordered_map will be empty.
The class object should use a universal hash function, pseudo-randomly selected by the seed in the constructor. See the answer to this question on one way to create a universal hash function.
When the operator is called, it should check if the name is in the internal unordered_map. If so, return the key found for it. Otherwise, first use the hash function to calculate the key and place it in the internal unordered_map. When generating a new key, though, check if it collides with an existing key, and throw an exception if so.
Realistically speaking, since each distinct name corresponds to a place in your code where you call typeid, the number of distinct names, say n, should be in the 1000s at most. Set m to be the range possible with 8 characters (2^64).
The probability of a collision is ~n^2 / (2m), which should be tiny. Thus, most likely there will be no collisions and no exception thrown. If one is thrown, though, change the seed and build the program again. The expected number of times you'll have to do that (after the initial time) is close to 0.
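The scheme described above could be sketched roughly as follows. Several things here are assumptions rather than part of the answer: the hash is a seeded FNV-1a variant standing in for a true universal hash family, the key is encoded with a 64-symbol printable alphabet (so m is 2^48 rather than 2^64), operator() is non-const because it updates the cache, and the member names are made up for the example.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>

class compressor {
public:
    explicit compressor(std::uint64_t seed) : seed_(seed) {}

    // Maps a name to an 8-character key, remembering every mapping made.
    // Throws if two different names end up with the same key.
    std::string operator()(const std::string& name) {
        auto known = name_to_key_.find(name);
        if (known != name_to_key_.end()) {
            return known->second;
        }
        std::string key = encode(hash(name));
        if (!key_to_name_.emplace(key, name).second) {
            throw std::runtime_error("key collision: change the seed and rebuild");
        }
        name_to_key_.emplace(name, key);
        return key;
    }

private:
    // Seeded FNV-1a (an assumption; not a proven universal family).
    std::uint64_t hash(const std::string& s) const {
        std::uint64_t h = 0xcbf29ce484222325ULL ^ seed_;
        for (unsigned char c : s) {
            h ^= c;
            h *= 0x100000001b3ULL;
        }
        return h;
    }

    // Uses the low 48 bits of h, 6 bits per output character.
    static std::string encode(std::uint64_t h) {
        static const char alphabet[] =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
        std::string out(8, ' ');
        for (char& c : out) {
            c = alphabet[h & 63];
            h >>= 6;
        }
        return out;
    }

    std::uint64_t seed_;
    std::unordered_map<std::string, std::string> name_to_key_;
    std::unordered_map<std::string, std::string> key_to_name_;
};
```

With the printable alphabet, n = 1000 names gives n^2 / (2m) of roughly 2e-9, still small enough that a collision should be very rare. Two compressor objects built with the same seed produce the same key for the same name, which is what lets the receiving machine work without a shared table.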

std::map<int, int> vs. vector of vector

I need a container to store a value (int) according to two attributes, source (int) and destination (int) i.e. when a source sends something to a destination, I need to store it as an element in a container. The source is identified by a unique int ID (an integer from 0-M), where M is in the tens to hundreds, and so is the destination (0-N). The container will be updated by iterations of another function.
I have been using a vector<vector<int>>, indexed in the order source, then destination, then value. A subsequent process needs to check this container to see if an element exists for a particular source and a particular destination, so it will need to differentiate between an empty 'space' and a filled one. The container has the possibility of being very sparse.
The value to be stored CAN be 0 so I haven't had success trying to find out if the space is empty, since I can't seem to do something like container[M][N].empty().
I have no experience with maps, but I have seen another post that suggests a map might be useful, and an std::map<int, int> seems to be similar to a vector<vector<int>>.
To summarise:
Is there a way to check if a specific vector of vector 'space' is empty (since I can't compare it to 0)?
Is a std::map<int, int> better for this purpose, and how do I use one?
I need a container to store a value (int) according to two attributes,
source (int) and destination (int)
std::map<std::pair<int, int>, int>
A subsequent process needs to check this container, to see if an
element exists in for a particular source, and a particular
destination - it will need to differentiate between an empty 'space'
and a filled one.
std::map::find
http://www.cplusplus.com/reference/map/map/find/
The container has the possibility of being very sparse.
Use a std::map. The "correct" choice of a container is based on how you need to find things and how you need to insert/delete things. If you want to find things fast, use a map.
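A small sketch of the pair-keyed map with std::map::find, assuming the (source, destination) layout from the question. The alias Traffic and the helpers record and lookup are hypothetical names for the example.

```cpp
#include <map>
#include <utility>

// Key is (source, destination); the mapped value is the stored int.
using Traffic = std::map<std::pair<int, int>, int>;

// Stores or overwrites the value for one source/destination pair.
void record(Traffic& t, int source, int destination, int value) {
    t[std::make_pair(source, destination)] = value;
}

// Returns true and writes the stored value to value_out if an entry exists.
// A stored 0 is still reported as present, unlike a zero-filled vector.
bool lookup(const Traffic& t, int source, int destination, int& value_out) {
    auto it = t.find(std::make_pair(source, destination));
    if (it == t.end()) {
        return false;
    }
    value_out = it->second;
    return true;
}
```

Only pairs that were actually recorded take up space, which suits a sparse container.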
First of all, assuming you want an equivalent structure of
vector<vector<int>>
you would want
std::map<int,std::vector<int>>
because for each key in a map, there is one unique value only.
If your sources are indexed closely and sequentially as 0...N, and you will be doing a lot of look-ups and few deletions, you should use a vector of vectors.
If your sources have arbitrary IDs that do not closely follow a sequential order, or if you are going to do a lot of insertions/deletions, you should use a map<int,vector<int>>, which is usually implemented as a balanced binary tree.
To check the size of a vector, you use
myvec.size()
To check whether a key exists in a map, you use
mymap.count(ID) //this will return 0 or 1 (we cannot have more than 1 value to a key)
I have used maps for a while and even though I'm nowhere close to an expert, they've been very convenient for me to use for storing and modifying connections between data.
P.S. If there's only up to one destination matching a source, you can proceed with
map<int,int>
Just use the count() method to see whether a key exists before reading it.
If you want to keep using a vector but want to add a check for whether the item contains a valid value, look at boost::optional. The type would now be std::vector<std::vector<boost::optional<int>>>.
You can also use a map, but the key into the map needs to be both IDs not just one.
std::map<std::pair<int,int>,int>
Edit: std::pair implements a comparison operator operator< that should be sufficient for use in a map, see http://en.cppreference.com/w/cpp/utility/pair/operator_cmp.
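For the vector-based variant, a minimal sketch might look like this. It uses std::optional from C++17 as a stand-in for boost::optional, which has a very similar interface; that substitution, the alias Grid, and the helper make_grid are assumptions made to keep the example free of dependencies.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Each cell is either empty or holds an int (which may legitimately be 0).
using Grid = std::vector<std::vector<std::optional<int>>>;

// Builds a sources x destinations grid in which every cell starts empty.
Grid make_grid(std::size_t sources, std::size_t destinations) {
    return Grid(sources, std::vector<std::optional<int>>(destinations));
}
```

A cell can then be tested with grid[s][d].has_value() and read with *grid[s][d], so a stored 0 is no longer confused with an empty slot. Unlike the map, this still allocates every cell, so it fits dense data better than sparse data.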

Google's dense_hash_map crashing in set_empty_key() function

I am trying to use Google's dense_hash_map to store key-value data instead of std::map.
When I tested with an (int, int) pair, I set set_empty_key(mymap, -2) and it worked.
But now, when I use it with my (hash, value) pair and set set_empty_key(mymap, -2) or set_empty_key(mymap, some_random_hash), in both cases my program crashes in set_empty_key().
Can anyone guide me with this? How can I fix this crash?
Thanks.
I don't know the exact reason for the crash you've got, but based on your description I see at least two potential mistakes.
First, check that both key_type and data_type are POD types and don't contain pointers to themselves. More specifically (from the documentation):
Both key_type and data_type must be
plain old data. In addition, there should
be no data structures that point
directly into parts of key or value,
including the key or value itself (for
instance, you cannot have a value like
struct {int a = 1, *b = &a}. This is
because dense_hash_map uses malloc()
and free() to allocate space for the
key and value, and memmove() to
reorganize the key and value in
memory.
Second, concerning the use of dense_hash_map: you need to set up a special "empty" key value that will never be used for real elements stored in your collection. Moreover, if you are going to use erase(), you need to specify a special key for deleted items, which also must never be used as the key of a real stored item.
That is perfectly described here:
dense_hash_map requires you call
set_empty_key() immediately after
constructing the hash-map, and before
calling any other dense_hash_map
method. (This is the largest
difference between the dense_hash_map
API and other hash-map APIs. See
implementation.html for why this is
necessary.) The argument to
set_empty_key() should be a key-value
that is never used for legitimate
hash-map entries. If you have no such
key value, you will be unable to use
dense_hash_map. It is an error to call
insert() with an item whose key is the
"empty key." dense_hash_map also
requires you call set_deleted_key()
before calling erase(). The argument
to set_deleted_key() should be a
key-value that is never used for
legitimate hash-map entries. It must
be different from the key-value used
for set_empty_key(). It is an error to
call erase() without first calling
set_deleted_key(), and it is also an
error to call insert() with an item
whose key is the "deleted key."

Problem counting words from a phrase without using std::map

I want to count the number of appearances of each word in some phrases.
My problem is that I can't use map to do this:
map[word] = appearance++;
Instead I have a class that uses a binary tree and behaves like a map, but I only have the method:
void insert(string, int);
Is there a way to count the words' appearances using this function? (I can't find a way to increment the int for every different word.) Or do I have to overload operator[] for the class? What should I do?
Presumably you also have a way to retrieve data from your map-like structure (storing data does little good unless you can also retrieve it). The obvious method would be to retrieve the current value, increment it, and store the result (or store 1 if retrieving showed the value wasn't present previously).
I guess this is homework and you're learning about binary trees. In that case I would implement operator[] to return a reference to the existing value (and if no value exists, default construct a value, insert it, and return that). Obviously operator[] will be implemented quite similarly to your insert method.
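A minimal sketch of that idea, assuming an unbalanced binary search tree keyed by std::string. The class TreeMap, its Node layout, and the helper count_words are hypothetical; the asker's real class will differ.

```cpp
#include <memory>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical map-like class backed by an unbalanced binary search tree.
class TreeMap {
public:
    // Returns a reference to the count for `key`,
    // inserting a node with value 0 if the key is not present yet.
    int& operator[](const std::string& key) {
        std::unique_ptr<Node>* slot = &root_;
        while (*slot) {
            if (key < (*slot)->key) {
                slot = &(*slot)->left;
            } else if ((*slot)->key < key) {
                slot = &(*slot)->right;
            } else {
                return (*slot)->value;  // key found
            }
        }
        // Reached an empty link: attach the new node here.
        *slot = std::make_unique<Node>(key);
        return (*slot)->value;
    }

private:
    struct Node {
        explicit Node(std::string k) : key(std::move(k)) {}
        std::string key;
        int value = 0;
        std::unique_ptr<Node> left;
        std::unique_ptr<Node> right;
    };
    std::unique_ptr<Node> root_;
};

// Splits the phrase on whitespace and counts each word.
TreeMap count_words(const std::string& phrase) {
    TreeMap counts;
    std::istringstream in(phrase);
    std::string word;
    while (in >> word) {
        ++counts[word];
    }
    return counts;
}
```

Because operator[] hands back a reference, ++counts[word] covers both the first sighting of a word and every later one without a separate retrieve-then-insert step.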
Can you edit the insert function?
If you can, you could add a static variable inside the function that counts the appearances.