I have learned multiple languages, but It is the first time I have realized that some C++ books implements its HashTables without storing the key, only the value. I understand that due design specifications it is valid but I still have the question.
Is mandatory for a C++ hashtable to be implemented to only store values ?
Edit:
From this question
And this book: M. Weiss Allen, Data Structures and Algorithm Analysis in C++. Addison-Wesley, 2014. pag 197.
What you described are 2 different kinds of tables.
One is a list, the other is a key-value-pair.
They hash tables you are familiar with are unordered_map in the standard library.
The other one is an unordered_set. They have difference use-cases.
Certainly, since both use hash functions, both can be called hash tables.
Related
I just finished a course in data structs and algorithms (cpp) in my school and I am interested in databasing in the real world... so specifically SQL.
So my question is what is the difference between SQL and for example c++ stl std::multimap? Is SQL faster? or can I make an equally as fast (time complexity wise) homemade SQL using a c++ STL?
thanks!
(sorry I'm new to programming outside the boundaries of my classes)
The obvious difference is that SQL is a query language to interact with database while STL is library (conventionally, STL is also used to refer to a certain subset of the standard C++ library). As such, these are apples and oranges.
SQL actually entails a suite of standards specifying various parts of a database system. For a database system to be useful it is desirable that certain characteristics are met (ACID. Even just looking at these there is no requirement that they are met by STL containers. I think only the consistency would even be desirable for STL container:
STL container mutations are not required to be atomic: when an exception is thrown from within one of the mutating functions the container may become unusable, i.e., STL containers are only required to meet the basic exception guarantee.
As mentioned, [successful] mutations yield to a consistent state.
STL containers can't be currently mutated and read, i.e., there is no concept of isolation. If you want to access an STL container in a concurrent environment you need to make sure that there is no other accessor when the container is being mutated (you can have as many concurrent readers while there is not mutator, though).
There is on concept of durability for STL containers while it may be considered the core feature of databases to be durable (well, all ACID features can be considered core database features).
Database internally certainly use some data structures and algorithms to provide the ACID features. This is an area where STL may come in, although primarily with its key strength, i.e., algorithms which aren't really "algorithms" but rather "solvers for specific problems": STL is the foundation of an efficient algorithm library which is applicable to arbitrary data structures (well, that's the objective - I don't think it is, yet, achieved). Sadly, important areas of data structures are not appropriately covered, though. In particular with respect to databases algorithms on trees especially b-trees tend to be important but are not at all covered by STL.
The STL container std::multimap<...> does contain a tree (typically a red/black-tree but that's not mandated) but it is tied to this particular in-memory representation. There is no way to apply the algorithms used to implement this particular data structure to some suitable persistent representation. Also, a std::multimap<...> still uses just one key (the multi refers to allowing multiple elements with the same key, not to having multiple keys) while database typically require multiple look-up mechanisms (indices which are utilized when executing queries based on a query plan for each query.
You got multiple questions and the interesting one (in my opinion) is this: "... or can I make an equally as fast (time complexity wise) homemade SQL using a c++ STL?"
In a perfect world where STL covers all algorithms, yes, you could create a query evaluator for a database based on the STL algorithms. You could even use some of the STL containers as auxiliary data structures although the primary data structures in a database are properly represented in persistent storage. To create an actual database you'd also need something which translates the query into a good query plan which can then be executed.
Of course, if all you really need are some look-ups by a key in a data structure which is read at some point by the program, you wouldn't need a full-blown database and looks are probably faster using suitable STL containers.
Note, that the time complexity tends to be useful for guiding a quick evaluations of different approaches. However, in practice the constant factors tend to matter and often the algorithms with the inferior time complexity behaves better. The canonical example is quicksort which outperforms the "superior" algorithms (e.g. heapsort or mergesort) for typical inputs (although in practice actually introsort is used which is a hybrid of quicksort, heapsort, and insertion-sort is used which combines the strength of these respective algorithms to behave well on all inputs). BTW, to get an illustration of the algorithms you may want to watch the Hungarian Sort Dancers.
This question has been bugging me for quite a long time and today I've read a detailed article related to hash tables. Without checking any implementation examples I wanted to give a shot for writing a Hash Table from scratch.
The seperate chaining method gave me the idea of implementing the hash table. Anyone who has experience on data structures might regard this question as a joke but i'm a beginner and without diving straight at the code I wanted to discuss my implementations efficiency. Would it be efficient or any other fundamental ideas could be preferred than this?
I think for starters you can also peek into the source (or documentations) of some hash maps implemented in boost libraries. It is called unordered_map. (link is here)
As long as you don't know about these implementations and want to use a hash and you are annoyed because it is not in STL, you are intrigued to write your own fast datastore.
But now implementing hash-maps are so much out of the game that C++11 has unordered_map in its STL. You'll see there are plenty of more interesting stuff out there.
Note: separate chaining is called bucket hash. In fact, boost uses bucket hash, see this link. Maybe you could rather look up some performance comparisons. Chances are those who do perf's will write good enough implementations.
Using closed addressing, another alternative is to use a self balancing binary search tree, e.g. red-black tree/std::map or heap tree, for the inner data structure, or even another hash map with different hashing algorithm.
Using open addressing, another alternative to linear probing are quadratic probing and double hashing; there are also less commonly used strategies such as cuckoo hashing, hopscotch hashing, etc.
The key points of implementing hash table is choosing the right hashing algorithm, resizing strategy (load factor), and collision resolution strategy. The best strategy is highly dependant on the type of workload that you're expecting as there are tradeoffs for each approach.
I am a big java fan but i have to work now on C++ for a project. I intended to you a java hashmap kind of feature in c++. After googling i found there exists no hashmap/hashtable in C++ STL library. But i found these data types: map, unordered_map, unorderd_set and hash_map. hash_map is microsoft's specific dll/library and remaining are used under STL. i have to work on IBM XL C/C++ compiler. So i can't use microsoft/boost as my company don't recommend them. All i have to use STL specific. Please, provide some info on these collections. What would be best among these STL specifics if i have to choose hashmap functionality? Thanks in advance.
unordered_map is equivalent to java's HashMap, and is a hash map - so it is probably what you are after.
map is equivalent to a TreeMap in java. It is implemented as a red-black tree.
unordered_set is equivalent to java's HashSet. It contains only keys, and not pairs of (key,value)
Did you read Wikipedia's page on associative containers in C++?
If you want real hash table (with a key providing an hash code, but no order between keys) you could use C++ 2011 std::unordered_map template. You'll need a compiler recent enough to be C++11 compatible in that regard.
If you can provide an order on keys, consider also using std::map which is available even in the older C++03 standard.
We are porting out game from C++ to web; the game make extensive use of STL.
Can you provide short comparison chart (and if possible, a bit of code samples for basic operations like insertion/deletion/searching and (where applicable) equal_range/binary_search) for the classes what are equivalents to the following STL containers :
std::vector
std::set
std::map
std::list
stdext::hash_map
?
Thanks a lot for your time!
UPD:
wow, it seems we do not have everything we needhere :(
Can anyone point to some industry standard algorithms library for AS3 programs (like boost in C++)?
I can not believe people can write non-trivial software without balanced binary search trees (std::set std::map)!
The choices of data structures are significantly more limited in as3. You have:
Array or Vector.<*> which stores a list of values and can be added to after construction
Dictionary (hash_map) which stores key/value pairs
maps and sets aren't really supported as there's no way to override object equality. As for binary search, most search operations take a predicate function for you to override equality for that search.
Edit: As far as common algorithm and utility libraries, I'd take a look at as3commons
Maybe this library will fit your needs.
Looking for good source code either in C or C++ or Python to understand how a hash function is implemented and also how a hash table is implemented using it.
Very good material on how hash fn and hash table implementation works.
Thanks in advance.
Hashtables are central to Python, both as the 'dict' type and for the implementation of classes and namespaces, so the implementation has been refined and optimised over the years. You can see the C source for the dict object here.
Each Python type implements its own hash function - browse the source for the other objects to see their implementations.
When you want to learn, I suggest you look at the Java implementation of java.util.HashMap. It's clear code, well-documented and comparably short. Admitted, it's neither C, nor C++, nor Python, but you probably don't want to read the GNU libc++'s upcoming implementation of a hashtable, which above all consists of the complexity of the C++ standard template library.
To begin with, you should read the definition of the java.util.Map interface. Then you can jump directly into the details of the java.util.HashMap. And everything that's missing you will find in java.util.AbstractMap.
The implementation of a good hash function is independent of the programming language. The basic task of it is to map an arbitrarily large value set onto a small value set (usually some kind of integer type), so that the resulting values are evenly distributed.
There is a problem with your question: there are as many types of hash map as there are uses.
There are many strategies to deal with hash collision and reallocation, depending on the constraints you have. You may find an average solution, of course, that will mostly fit, but if I were you I would look at wikipedia (like Dennis suggested) to have an idea of the various implementations subtleties.
As I said, you can mostly think of the strategies in two ways:
Handling Hash Collision: Bucket, which kind ? Open Addressing ? Double Hash ? ...
Reallocation: freeze the map or amortized linear ?
Also, do you want baked in multi-threading support ? Using atomic operations it's possible to get lock-free multithreaded hashmaps as has been proven in Java by Cliff Click (Google Tech Talk)
As you can see, there is no one size fits them all. I would consider learning the principles first, then going down to the implementation details.
C++ std::unordered_map use a linked-list bucket and freeze the map strategies, no concern is given to proper synchronization as usual with the STL.
Python dict is the base of the language, I don't know of the strategies they elected