what are the differences between these in C++? - c++

I am a big Java fan, but I now have to work in C++ for a project. I intended to use something like Java's HashMap in C++. After googling, I found that there is no hashmap/hashtable in the C++ STL, but I did find these data types: map, unordered_map, unordered_set and hash_map. hash_map is a Microsoft-specific library, and the rest are part of the STL. I have to work with the IBM XL C/C++ compiler, so I can't use Microsoft or Boost libraries, as my company doesn't recommend them. Everything I use has to be STL-specific. Please provide some info on these collections. Which of these STL types would be best if I need hashmap functionality? Thanks in advance.

unordered_map is equivalent to java's HashMap, and is a hash map - so it is probably what you are after.
map is equivalent to a TreeMap in java. It is implemented as a red-black tree.
unordered_set is equivalent to java's HashSet. It contains only keys, and not pairs of (key,value)
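To make the mapping concrete, here is a minimal sketch assuming a C++11 compiler. The function names (count_words, sorted_keys, distinct_words) are made up for this illustration; only the container types come from the standard library.

```cpp
#include <map>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Count word occurrences with a hash map (Java: HashMap<String, Integer>).
std::unordered_map<std::string, int> count_words(const std::vector<std::string>& words) {
    std::unordered_map<std::string, int> counts;
    for (const auto& w : words) {
        ++counts[w];  // operator[] value-initializes a missing entry to 0
    }
    return counts;
}

// Copy into an ordered map (Java: TreeMap) and list the keys in sorted order.
std::vector<std::string> sorted_keys(const std::unordered_map<std::string, int>& counts) {
    std::map<std::string, int> ordered(counts.begin(), counts.end());
    std::vector<std::string> keys;
    for (const auto& kv : ordered) {
        keys.push_back(kv.first);
    }
    return keys;
}

// Collect the distinct words (Java: HashSet<String>).
std::unordered_set<std::string> distinct_words(const std::vector<std::string>& words) {
    return std::unordered_set<std::string>(words.begin(), words.end());
}
```

Lookups that should not insert can use find() or count() instead of operator[], which creates the entry when the key is missing.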

Did you read Wikipedia's page on associative containers in C++?
If you want a real hash table (with a key providing a hash code, but no order between keys) you could use the C++11 std::unordered_map template. You'll need a compiler recent enough to be C++11 compatible in that regard.
If you can provide an order on keys, consider also using std::map which is available even in the older C++03 standard.
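As a rough sketch of what "providing a hash code" versus "providing an order" means for your own key type: Point, PointLess, PointEqual and PointHash below are hypothetical names invented for this example, and the hash combination is just a simple demo formula, not a recommended one.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical key type used only for this illustration.
struct Point {
    int x;
    int y;
};

// Ordering for std::map: a strict weak ordering, compare x first, then y.
struct PointLess {
    bool operator()(const Point& a, const Point& b) const {
        return a.x != b.x ? a.x < b.x : a.y < b.y;
    }
};

// Equality for std::unordered_map.
struct PointEqual {
    bool operator()(const Point& a, const Point& b) const {
        return a.x == b.x && a.y == b.y;
    }
};

// Hash code for std::unordered_map.
struct PointHash {
    std::size_t operator()(const Point& p) const {
        // Simple combination of the two member hashes; good enough for a demo.
        std::size_t h1 = std::hash<int>()(p.x);
        std::size_t h2 = std::hash<int>()(p.y);
        return h1 ^ (h2 * 31u + 0x9e3779b9u);
    }
};

// Needs only an ordering; available since C++03.
typedef std::map<Point, std::string, PointLess> OrderedPoints;
// Needs a hash and an equality test; available since C++11.
typedef std::unordered_map<Point, std::string, PointHash, PointEqual> HashedPoints;
```

Both typedefs are then used the same way (operator[], find, erase); only the ordered one iterates in key order.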

Related

May a hashtable store its keys in C++?

I have learned multiple languages, but this is the first time I have noticed that some C++ books implement their hash tables without storing the key, only the value. I understand that this is valid as a design choice, but I still have a question.
Is it mandatory for a C++ hash table to store only values?
Edit:
From this question
And this book: M. A. Weiss, Data Structures and Algorithm Analysis in C++. Addison-Wesley, 2014, p. 197.
What you described are 2 different kinds of tables.
One is a list, the other is a key-value-pair.
The hash tables you are familiar with correspond to unordered_map in the standard library.
The other one is an unordered_set. They have different use cases.
Certainly, since both use hash functions, both can be called hash tables.
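A small sketch of the two use cases, assuming C++11. The helper names is_known_user and age_of are invented here, as is the convention of returning -1 for a missing key. Note that both containers store the key; in the set, the key is simply the whole element.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>

// A set answers "is this key present?"; the stored element is the key itself.
bool is_known_user(const std::unordered_set<std::string>& users, const std::string& name) {
    return users.count(name) != 0;
}

// A map answers "what value belongs to this key?"; it stores (key, value) pairs.
// Returns -1 when the name is absent (a convention chosen for this sketch).
int age_of(const std::unordered_map<std::string, int>& ages, const std::string& name) {
    auto it = ages.find(name);
    return it == ages.end() ? -1 : it->second;
}
```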

std::hash_set vs std::unordered_set, are they the same thing?

I know hash_set is non-standard and unordered_set is standard. However, I am wondering, performance wise, what is the difference between the two? Why do they exist separately?
The complexity requirements for the unordered_-containers set out by the C++ standard essentially don't leave much room for the implementation, which has to be some sort of hash table. The standard was written in full awareness that those data structures had already been deployed by most vendors as an extension.
Compiler vendors would typically call those containers "hash map" or "hash set", which is what you're probably referring to (there is no literal std::hash_set in the standard, but I think there's one in GCC in a separate namespace, and similarly for other compilers).
When the new standard was written, the authors wanted to avoid possible confusion with existing extension libraries, so they went for a name that reflects the typical C++ mindset: say what it is, not how it's implemented. The unordered containers are, well, unordered. That means you get less from them compared to the ordered containers, but this diminished utility affords you more efficient access.
Implementation-wise, hash_set, Boost-unordered, TR1-unordered and C++11-unordered will be very similar, if not identical.
Regarding the question "are they the same thing" from the subject line: based on my experience of upgrading code from __gnu_cxx::hash_set to std::unordered_set, they are almost, but not exactly, the same thing.
The difference that I ran into is that iterating through __gnu_cxx::hash_set returned the items in what appeared to be the original order of insertion, whereas std::unordered_set would not. So as the name implies, one cannot rely on an iterator to return the items in any particular order when iterating through the entire std::unordered_set.
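If code being upgraded depends on a stable order, one possible workaround (a sketch, not the only option) is to copy the elements out and sort them before use. The helper name sorted_elements is made up for this example.

```cpp
#include <algorithm>
#include <unordered_set>
#include <vector>

// Return the set's elements in ascending order, independent of bucket layout.
std::vector<int> sorted_elements(const std::unordered_set<int>& items) {
    std::vector<int> out(items.begin(), items.end());
    std::sort(out.begin(), out.end());
    return out;
}
```

If the order is needed on every access, switching to std::set may be simpler than sorting snapshots.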
Visual Studio 2010, for example, has both hash_xxx and unordered_xxx, and if you look through the headers, at least their implementation is the same for all of those (same base-/"policy"-classes).
For other compilers, I don't know, but due to how hash containers usually have to be implemented, I guess there won't be many differences, if any at all.
They are pretty much the same things. The standard (C++0x) name is unordered_set. hash_set was an earlier name used by pre-standard vendor extensions such as the SGI STL.

Coming from C++ to AS3 : what are fundamental AS3 data structures classes?

We are porting our game from C++ to the web; the game makes extensive use of the STL.
Can you provide a short comparison chart (and, if possible, a few code samples for basic operations like insertion/deletion/searching and, where applicable, equal_range/binary_search) for the classes that are equivalent to the following STL containers:
std::vector
std::set
std::map
std::list
stdext::hash_map
?
Thanks a lot for your time!
UPD:
Wow, it seems we do not have everything we need here :(
Can anyone point to some industry standard algorithms library for AS3 programs (like boost in C++)?
I can not believe people can write non-trivial software without balanced binary search trees (std::set std::map)!
The choice of data structures is significantly more limited in AS3. You have:
Array or Vector.<*> which stores a list of values and can be added to after construction
Dictionary (hash_map) which stores key/value pairs
maps and sets aren't really supported as there's no way to override object equality. As for binary search, most search operations take a predicate function for you to override equality for that search.
Edit: As far as common algorithm and utility libraries, I'd take a look at as3commons
Maybe this library will fit your needs.

What other data structures are available in the C++ STL?

I'm already aware of the following:
arrays
bitsets
hash maps and sets
regular maps and sets
iterators
lists
pairs
tuples
queues, deques, and priority queues
stacks
valarrays
vectors
Is there any other type of data structure available in the C++ library? What I'm specifically looking for is graphs, but I'd also like to know what else is there.
Also, I'd like to know if there are any external libraries I can link with my projects to implement a graph.
It's "the C++ standard library" or something to that effect, not "the STL". That term refers to an initial draft of some specific data structures and algorithms. Not all of them made it into the standard library, and the standard library also contains other stuff (for example, all the iostream classes).
That looks like a complete list to me (you appear to be talking specifically about C++0x, since you mention tuples and arrays). I don't know if I would even consider bitsets and iterators to be "data structures", but I guess that's a fair description.
There is definitely not a graph implementation. Unfortunately. :( You can get one from Boost, though.
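If Boost is not an option, a basic graph can be assembled from the standard containers. The following is a minimal sketch assuming C++11; the Graph class and its member names are hypothetical and not part of any library.

```cpp
#include <queue>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical minimal undirected graph; vertices are plain ints.
class Graph {
public:
    void add_edge(int u, int v) {
        adjacency_[u].push_back(v);
        adjacency_[v].push_back(u);
    }

    // Breadth-first search: true if `target` can be reached from `start`.
    bool reachable(int start, int target) const {
        std::unordered_set<int> visited;
        std::queue<int> frontier;
        visited.insert(start);
        frontier.push(start);
        while (!frontier.empty()) {
            int current = frontier.front();
            frontier.pop();
            if (current == target) return true;
            auto it = adjacency_.find(current);
            if (it == adjacency_.end()) continue;  // vertex has no edges
            for (int next : it->second) {
                if (visited.insert(next).second) {  // true only on first visit
                    frontier.push(next);
                }
            }
        }
        return false;
    }

private:
    // Adjacency list: each vertex maps to the list of its neighbours.
    std::unordered_map<int, std::vector<int>> adjacency_;
};
```

An adjacency list like this suits sparse graphs; for weighted edges or richer algorithms a dedicated library is probably less work.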
The STL is divided into three parts:
Containers
Iterators
Algorithms
You have obviously found the containers part and you have probably used the iterators associated with the containers. But there is even more to the iterators than you have found.
The algorithms section is linked to the containers via iterators. It also contains the parts that handle functors and their associated binders.
My favorite site for this is: http://www.sgi.com/tech/stl/table_of_contents.html
In addition to the standard libraries you should have a look at the boost libraries:
see also: Boost Library

Looking for production quality Hash table/ unordered map implementation to learn?

I'm looking for good source code in C, C++ or Python to understand how a hash function is implemented and how a hash table is built on top of it.
I'm also interested in very good material on how hash functions and hash table implementations work.
Thanks in advance.
Hashtables are central to Python, both as the 'dict' type and for the implementation of classes and namespaces, so the implementation has been refined and optimised over the years. You can see the C source for the dict object here.
Each Python type implements its own hash function - browse the source for the other objects to see their implementations.
When you want to learn, I suggest you look at the Java implementation of java.util.HashMap. It's clear code, well-documented and comparably short. Admittedly, it's neither C, nor C++, nor Python, but you probably don't want to read GNU libstdc++'s upcoming implementation of a hashtable, which above all consists of the complexity of the C++ standard template library.
To begin with, you should read the definition of the java.util.Map interface. Then you can jump directly into the details of the java.util.HashMap. And everything that's missing you will find in java.util.AbstractMap.
The implementation of a good hash function is independent of the programming language. The basic task of it is to map an arbitrarily large value set onto a small value set (usually some kind of integer type), so that the resulting values are evenly distributed.
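As one concrete illustration of mapping a large value set onto a small one, here is a sketch of the 64-bit FNV-1a string hash in C++. FNV-1a is just one well-known choice picked for this example, and the helper bucket_for is a made-up name showing how a table might reduce the hash to a bucket index.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// 64-bit FNV-1a: XOR in each byte, then multiply by the FNV prime.
std::uint64_t fnv1a_64(const std::string& data) {
    std::uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char byte : data) {
        hash ^= byte;
        hash *= 1099511628211ULL;  // FNV prime; wraps modulo 2^64
    }
    return hash;
}

// Reduce the large hash value to a bucket index in [0, bucket_count).
// bucket_count must be non-zero.
std::size_t bucket_for(const std::string& key, std::size_t bucket_count) {
    return static_cast<std::size_t>(fnv1a_64(key) % bucket_count);
}
```

The same few lines translate almost directly to C, Java or Python, which is the point being made above.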
There is a problem with your question: there are as many types of hash map as there are uses.
There are many strategies to deal with hash collisions and reallocation, depending on the constraints you have. You may find an average solution, of course, that will mostly fit, but if I were you I would look at Wikipedia (like Dennis suggested) to get an idea of the subtleties of the various implementations.
As I said, you can mostly think of the strategies in two ways:
Handling Hash Collision: Bucket, which kind ? Open Addressing ? Double Hash ? ...
Reallocation: freeze the map or amortized linear ?
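To make one combination of these choices concrete, here is a toy sketch assuming C++11: separate chaining (one linked list per bucket) for collisions, and a "freeze the map" reallocation that rehashes every entry at once. ChainedTable and all its members are hypothetical names, and the growth policy (double when the load factor exceeds 1) is an arbitrary choice for the example.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Toy string-to-int hash table (hypothetical class, for illustration only).
class ChainedTable {
public:
    ChainedTable() : buckets_(8), size_(0) {}

    // Insert a new key or overwrite the value of an existing one.
    void put(const std::string& key, int value) {
        for (auto& entry : bucket(key)) {
            if (entry.first == key) { entry.second = value; return; }
        }
        bucket(key).push_back(std::make_pair(key, value));
        if (++size_ > buckets_.size()) grow();  // load factor exceeded 1
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const int* get(const std::string& key) const {
        const Bucket& b = buckets_[index(key, buckets_.size())];
        for (const auto& entry : b) {
            if (entry.first == key) return &entry.second;
        }
        return nullptr;
    }

    std::size_t size() const { return size_; }
    std::size_t bucket_count() const { return buckets_.size(); }

private:
    typedef std::list<std::pair<std::string, int>> Bucket;

    static std::size_t index(const std::string& key, std::size_t n) {
        return std::hash<std::string>()(key) % n;
    }

    Bucket& bucket(const std::string& key) {
        return buckets_[index(key, buckets_.size())];
    }

    // Double the bucket array and move every entry to its new bucket.
    void grow() {
        std::vector<Bucket> bigger(buckets_.size() * 2);
        for (auto& b : buckets_) {
            for (auto& entry : b) {
                bigger[index(entry.first, bigger.size())].push_back(std::move(entry));
            }
        }
        buckets_.swap(bigger);
    }

    std::vector<Bucket> buckets_;
    std::size_t size_;
};
```

An open-addressing table would replace the per-bucket lists with probing inside a single array, and an incremental scheme would spread the work of grow() across several insertions instead of doing it in one step.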
Also, do you want baked-in multi-threading support? Using atomic operations, it's possible to get lock-free multithreaded hashmaps, as has been proven in Java by Cliff Click (Google Tech Talk).
As you can see, there is no one-size-fits-all solution. I would consider learning the principles first, then going down to the implementation details.
C++'s std::unordered_map uses linked-list buckets and the freeze-the-map strategy; as usual with the STL, no concern is given to synchronization.
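The all-at-once reallocation can be observed through the container's bucket interface. This is a small sketch assuming C++11; count_rehashes is a made-up helper, and the exact growth steps depend on the library implementation.

```cpp
#include <cstddef>
#include <unordered_map>

// Insert n keys and count how often the bucket array was rebuilt.
// With reserve_first set, the buckets are sized up front via reserve().
std::size_t count_rehashes(std::size_t n, bool reserve_first) {
    std::unordered_map<std::size_t, int> m;
    if (reserve_first) m.reserve(n);
    std::size_t buckets = m.bucket_count();
    std::size_t rehashes = 0;
    for (std::size_t i = 0; i < n; ++i) {
        m[i] = 0;
        if (m.bucket_count() != buckets) {  // the whole table was rehashed
            buckets = m.bucket_count();
            ++rehashes;
        }
    }
    return rehashes;
}
```

Calling reserve() before a bulk insert is the usual way to avoid these pauses when the final size is known in advance.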
Python's dict is the base of the language; I don't know which strategies it uses.