C++ hash table implementation with collision handling - c++

What is a good C++ library for hash tables / hash maps, similar to what Java offers? I have worked with Google Sparsehash, but it has no support for collisions.

Use std::unordered_map (or std::unordered_multimap), which despite its name is a hash table. It has been part of the C++ standard since C++11 and is available in current C++ implementations. Do not use the classes with "hash" in their names (such as hash_map) that your implementation may provide - they are not and will not be standard.

http://www.sgi.com/tech/stl/hash_multimap.html
or
std::tr1::unordered_multimap

In addition to those mentioned in other answers, you could try MCT's closed_hash_map or linked_hash_map. They are internally similar to Google SparseHash, but don't restrict the values used and have some other functional advantages.
I'm not sure I understand what you mean by "no support for collisions", though. Both Google SparseHash and the similarly implemented MCT of course handle collisions fine, though differently from Java's HashMap.

Related

Why does the C++ STL use RBtree to implement "std::map"?

I have been reading the Python source code lately, and I found that both Python and C# use a hash table to implement their dictionary types.
The time complexity of a hash table lookup is O(1) on average, while a red-black tree is O(log n), so can anybody tell me why the C++ STL uses a red-black tree to implement std::map?
Because it has a separate container for hash tables: std::unordered_map<>. Note also that .NET has SortedDictionary<> in addition to Dictionary<>.
The answer can be found in "The Standard C++ Library, A Tutorial and Reference", available online here: http://cs-people.bu.edu/jingbinw/program/The%20C++STL-T&R.pdf.
Short quote explaining:
In general, the whole standard (language and library) is the result of a lot of discussions and
influence from hundreds of people all over the world. For example, the Japanese came up with
important support for internationalization. Of course, mistakes were made, minds were changed,
and people had different opinions. Then, in 1994, when people thought the standard was close to
being finished, the STL was incorporated, which changed the whole library radically. However, to
get finished, the thinking about major extensions was eventually stopped, regardless of how
useful the extension would be. Thus, hash tables are not part of the standard, although they
should be a part of the STL as a common data structure.
Obviously, C++11 has come out since then. Because the name map was already taken, and hash_map was already widely used by common extension libraries (e.g. __gnu_cxx::hash_map), the name unordered_map was chosen for hash maps.

Something like boost::multi_index for Python

I have come to appreciate boost::multi_index in C++ a lot. I would happily use something like it in Python, for scripts that process data coming out of numerically intensive applications. Is there such a thing for Python? I just want to be sure that it doesn't exist before I try to implement it myself. Things that won't do it for me:
Wrapping boost::multi_index in Python. It simply doesn't scale.
Using sqlite3 in memory. It is ugly.
Since Python collections only store references to objects, not the objects themselves, there isn't much difference between having one collection with multiple indexing schemes and just having multiple collections.
You can for example have several dicts with your data, each of them using different keys to refer to them.
To answer your question of whether a similar thing exists in Python, I would say no.
One useful feature of Boost.MultiIndex is that elements can be modified in place (via replace() or modify()). Python's native dict doesn't provide such functionality and requires the key to be immutable. I haven't seen other implementations that allow the key to be altered. So in this specific area, there's nothing in Python comparable to Boost.MultiIndex.
If you only require multiple static views of your data, then I would agree with Radomir Dopieralski. You can wrap multiple dicts in your own class to provide a unified API to ensure the synchronization between different views. I don't know what you mean by "performance-aware transformations" but if you were talking about the computational complexity of the insertion/deletion operations, even with Boost.MultiIndex, "inserting an element into a multi_index_container reduces to a simple combination of elementary insertion operations on each of the indices, and similarly for deletion."

C++ Data structures API Questions

What C++ library provides a data structures API that matches the one provided by java.util.* as closely as possible?
Specifically, I am looking for the following DS and following Utility Functions:-
**DS**: Priority Queue, HashMap, TreeMap, HashSet,
TreeSet, ArrayList, String most importantly.
**Utility**: Arrays.* , Collections.*, Regex, FileHandling etc.
and other converters and algorithms like Binary Search, Sort, NthElement etc.
My guess is that Boost may be able to do all of these, but I find it too bulky, and adding it to a project is non-trivial. That matters when I want to get started on something quickly: the code needs all these data structures, but overall it is not going to be big enough to warrant spending a lot of effort on setting up libraries.
An example would be if someone had to write a C++ program to do Network Flow Algorithm for a school assignment. I am sure I could come up with better examples, but this one's on top of my head.
Thanks
Ajay
All of those containers are available in some form in the standard C++ library:
Priority Queue std::priority_queue (this is actually a container adapter, rather than a container itself - that is, it works "on top of" another container, usually std::vector or std::deque)
HashMap std::unordered_map (or if your compiler doesn't support C++0x, there's boost::unordered_map)
TreeMap std::map
HashSet and TreeSet are basically the same as HashMap and TreeMap, except that the key and the value are the same thing. The C++ equivalents are std::unordered_set and std::set.
ArrayList is the venerable std::vector
String is the venerable std::string. Many of the functions you get in the Java String class can be found in the Boost String Algorithms library.
Do not be afraid of setting up Boost. In my experience, you set it up once and then use it over and over again in all of your projects. Also, all of the Boost libraries that I mentioned above are header-only. That means you don't actually need to build or install any libraries; you just reference the headers.
For the other things, I'm not so sure, since I don't know Java all that well. At the end of the day, you're not going to find a library that's "just like Java, except written in C++" because that would be kind of pointless. A C++ library is written to play to C++'s strength, a Java library is written to play to Java's strengths. To try and shoehorn a library designed for one language into another doesn't make sense to me.

What is the point of STL?

I've been programming in C++ for about a year now, and when I look around I see lots of references to the STL.
Can someone please tell me what it does?
And what are its advantages and disadvantages?
Also, what does it give me over Borland's VCL or MFC?
Thanks
It's the C++ standard library that gives you all sorts of very useful containers, strings, algorithms to manipulate them with etc.
The term 'STL' is outdated IMHO, what used to be the STL has become a large part of the standard library for C++.
If you are doing any serious C++ development, you will need to be familiar with this library and preferably the boost library. If you are not using it already, you're probably working at the wrong level of abstraction or you're constraining yourself to a small-ish subset of C++.
STL stands for Standard Template Library. This was a library designed mainly by Stepanov and Lee which was then adopted as part of the C++ Standard Library. The term is gradually becoming meaningless, but covers these parts of the Standard Library:
containers (vectors, maps etc.)
iterators
algorithms
If you call yourself a C++ programmer, you should be familiar with all of these concepts, and the Standard Library implementation of them.
The STL is the Standard Template Library. Like any library it's a collection of code that makes your life easier by providing well tested, robust code for you to re-use.
Need a collection (map, list, vector, etc.)? They're in the STL.
Need to operate on a collection (for_each, copy, transform, etc.)? They're in the STL.
Need to do I/O? There are classes for that.
Advantages
1. You don't have to re-implement standard containers (because you'll get it wrong anyway)
Read this book by Nicolai M. Josuttis to learn more about the STL; it's the best STL reference book out there.
It provides common useful tools for the programmer! Iterators, algorithms, etc. Why re-invent the wheel?
"Advantages and disadvantages" compared to what? To writing all that code yourself? Isn't it obvious? It has great collections and tools to work with them.
Wikipedia has a good overview: http://en.wikipedia.org/wiki/Standard_Template_Library
The STL fixes one big deficiency of C++ - the lack of a standard string type. This has caused innumerable headaches, as there have been thousands of string implementations that don't work well together.
It stands for Standard Template Library.
It is a set of functions and classes that are there to save you a lot of work.
They are designed to use templates, which is where you define a function without fixing what data type it will work on.
For example, vector more or less lets you have dynamic arrays. When you create an instance of it, you say what type you want it to work with. This can even be your own data type (class).
It's a hard thing to think about, but it is hugely powerful and can save you loads of time.
Get reading up on it now! You won't regret it.
It gives you another acronym to toss around at cocktail parties.
Seriously, check the intro docs starting e.g. with the Wikipedia article on STL.
The STL has iterators. Sure, collections and stuff are useful, but the power of iterators is gigantic and, in my humble opinion, makes the rest pale in comparison.

Functional data structures in C++

Does anyone know of a C++ data structure library providing functional (a.k.a. immutable, or "persistent" in the FP sense) equivalents of the familiar STL structures?
By "functional" I mean that the objects themselves are immutable, while modifications to those objects return new objects sharing the same internals as the parent object where appropriate.
Ideally, such a library would resemble the STL and would work well with Boost.Phoenix (caveat: I haven't actually used Phoenix, but as far as I can tell it provides many algorithms but no data structures, unless a lazily computed change to an existing data structure counts. Does it?)
I would look and see whether FC++ developed by Yannis Smaragdakis includes any data structures. Certainly this project more than any other is about supporting a functional style in C++.
This is more of a heads up than a detailed answer, but Bartosz Milewski appears to have done a lot of work on this. See, for example:
http://bartoszmilewski.com/2013/11/13/functional-data-structures-in-c-lists/
Looks like he's implemented a lot of algorithms from Okasaki's book Purely Functional Data Structures here:
https://github.com/BartoszMilewski/Okasaki
N.B. I haven't tried these yet, but they're the first C++ persistent data structures I've seen outside of FC++.
Hopefully, I'll get to trying them soon.