Why does the C++ STL use a red-black tree to implement std::map?

Recently I have been reading the Python source code, and I found that both Python and C# use hash tables to implement their dictionaries.
A hash table lookup is O(1) on average, while a red-black tree lookup is O(log n), so can anybody tell me why the C++ STL uses a red-black tree to implement std::map?

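Because it has a separate container for hash tables: std::unordered_map<>. std::map is an ordered container: the standard requires its keys to be kept sorted and lookups to be logarithmic, which a balanced search tree such as a red-black tree provides (the standard itself does not name a specific tree). Note also that .NET has SortedDictionary<> in addition to Dictionary<>.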

The answer can be found in "The C++ Standard Library: A Tutorial and Reference" by Nicolai M. Josuttis, available online here: http://cs-people.bu.edu/jingbinw/program/The%20C++STL-T&R.pdf.
A short quote explaining it:
In general, the whole standard (language and library) is the result of a lot of discussions and influence from hundreds of people all over the world. For example, the Japanese came up with important support for internationalization. Of course, mistakes were made, minds were changed, and people had different opinions. Then, in 1994, when people thought the standard was close to being finished, the STL was incorporated, which changed the whole library radically. However, to get finished, the thinking about major extensions was eventually stopped, regardless of how useful the extension would be. Thus, hash tables are not part of the standard, although they should be a part of the STL as a common data structure.
Since then, C++11 has come out. Because the name map was already taken, and hash_map was already widely used by common extension libraries (e.g. __gnu_cxx::hash_map), the name unordered_map was chosen for hash maps.

Related

C++ Data structures API Questions

Which C++ library provides data-structure APIs that match those provided by java.util.* as closely as possible?
Specifically, I am looking for the following data structures and utility functions:
**DS**: Priority Queue, HashMap, TreeMap, HashSet, TreeSet, ArrayList and, most importantly, String.
**Utility**: Arrays.*, Collections.*, Regex, FileHandling etc., and other converters and algorithms like Binary Search, Sort, NthElement etc.
My guess is that Boost may be able to do all of this, but I find it too bulky and non-trivial to add to a project, especially when I want to get started on something quickly: although the code would need all these data structures, it is not going to be big enough overall to warrant spending a lot of effort setting up libraries.
An example would be someone having to write a C++ program implementing a network flow algorithm for a school assignment. I am sure I could come up with better examples, but this one's off the top of my head.
All of those containers are available in some form in the SC++L:
Priority Queue: std::priority_queue (this is actually a container adapter rather than a container itself; that is, it works "on top of" another container, usually std::vector or std::deque).
HashMap: std::unordered_map (or, if your compiler doesn't support C++0x, boost::unordered_map)
TreeMap: std::map
HashSet and TreeSet are basically the same as HashMap and TreeMap, except that the key and the value are the same thing. That said, there are also dedicated std::unordered_set and std::set.
ArrayList: the venerable std::vector
String: the venerable std::string. Many of the functions you get in the Java String class can be found in the Boost String Algorithms library.
Do not be afraid of setting up Boost. In my experience, you set it up once and then use it over and over again in all of your projects. Also, all of the libraries that I mentioned above are header-only libraries. That means you don't actually need to build or install any libraries; you just reference the headers.
For the other things, I'm not so sure, since I don't know Java all that well. At the end of the day, you're not going to find a library that's "just like Java, except written in C++", because that would be kind of pointless. A C++ library is written to play to C++'s strengths, and a Java library is written to play to Java's strengths. Trying to shoehorn a library designed for one language into another doesn't make sense to me.

What is so great about STL?

I am a Java developer trying to learn C++. I have often read on the internet (including Stack Overflow) that the STL is the best collections library you can get in any language. (Sorry, I do not have any citations for that.)
However, after studying some of the STL, I really fail to see what makes it so special. Could you please shed some light on what sets the STL apart from the collection libraries of other languages and makes it the best collection library?
What is so great about the STL?
The STL is great in that it was conceived very early and yet succeeded in using C++'s generic programming paradigm very effectively.
It cleanly separated the data structures (vector, map, ...) from the algorithms that operate on them (copy, transform, ...), taking advantage of templates to do so.
It neatly decoupled concerns and provided generic containers with customization hooks (the comparator and allocator template parameters).
The result is very elegant (DRY principle) and, thanks to compiler optimizations, very efficient, so hand-written algorithms for a given container are unlikely to do better.
It also means that it is easily extensible: you can create your own container with whatever interface you wish, and as long as it exposes STL-compliant iterators, you'll be able to use the STL algorithms with it!
And thanks to the use of traits, you can even apply the algorithms to C arrays through plain pointers! Talk about backward compatibility!
However, it could (perhaps) have been better...
What is not so great about the STL?
It really pisses me off that one always has to use iterators. I'd really like to be able to write std::for_each(myVector, [](int x) { return x+1; }); because, face it, most of the time you want to iterate over the whole container...
But what's worse is that, because of that:
set<int> mySet = /* ... */;
set<int>::const_iterator it1 = std::find(mySet.begin(), mySet.end(), 1005); // [1]
set<int>::const_iterator it2 = mySet.find(1005); // [2]
[1] and [2] are carried out completely differently, resulting in [1] having O(n) complexity while [2] has O(log n) complexity! Here the problem is that the iterators abstract too much.
I don't mean that iterators are not worthwhile; I just mean that providing an interface exclusively in terms of iterators was a poor choice.
I much prefer the idea of views over containers; for example, check out what has been done with Boost.MPL. With a view, you manipulate your container through a (lazy) layer of transformation. It makes for very efficient structures that allow you to filter out some elements, transform others, etc.
Combining views and concept-checking ideas would, I think, produce a much better interface for STL algorithms (and solve this find, lower_bound, upper_bound, equal_range issue).
It would also avoid the common mistake of using ill-defined iterator ranges, and the undefined behavior that results from it...
It's not so much that it's "great" or "the best collections library that you can get in *any* language", but it does have a different philosophy to many other languages.
In particular, the standard C++ library uses a generic programming paradigm, rather than the object-oriented paradigm that is common in languages like Java and C#. That is, you have a "generic" definition of what an iterator should be, and then you can implement functions like for_each, sort or max_element that accept any class implementing the iterator pattern, without that class having to inherit from some base "Iterator" interface or whatever.
What I love about the STL is how robust it is. It is easy to extend. Some complain that it's small, missing many common algorithms or iterators. But this is precisely when you see how easy it is to add in the missing components you need. Not only that, but small is beautiful: you have about 60 algorithms, a handful of containers and a handful of iterators, but the functionality is on the order of the product of these. The interfaces of the containers remain small and simple.
Because it's the fashion to write small, simple, modular algorithms, it gets easier to spot bugs and holes in your components. Yet, at the same time, as simple as the algorithms and iterators are, they're extraordinarily robust: your algorithms will work with many existing and yet-to-be-written iterators, and your iterators work with many existing and yet-to-be-written algorithms.
I also love how simple the STL is. You have containers, you have iterators and you have algorithms. That's it (I'm lying here, but this is what it takes to get comfortable with the library). You can mix different algorithms with different iterators with different containers. True, some of these have constraints that forbid them from working with others, but in general there's a lot to play with.
Niklaus Wirth said that a program is algorithms plus data structures. That's exactly what the STL is about. If Ruby and Python are string superheroes, then C++ and the STL are an algorithms-and-containers superhero.
STL's containers are nice, but they're not much different than you'll find in other programming languages. What makes the STL containers useful is that they mesh beautifully with algorithms. The flexibility provided by the standard algorithms is unmatched in other programming languages.
Without the algorithms, the containers are just that. Containers. Nothing special in particular.
Now, if you're talking about container libraries for C++ only, it is unlikely you will find libraries as widely used and tested as those provided by the STL, if only because they are standard.
The STL works beautifully with built-in types. A std::array<int, 5> is exactly that -- an array of 5 ints, which consumes 20 bytes on a 32 bit platform.
java.util.Arrays.asList(1, 2, 3, 4, 5), on the other hand, returns a reference to an object containing a reference to an array containing references to Integer objects containing ints. Yes, that's 3 levels of indirection, and I don't dare predict how many bytes that consumes ;)
This is not a direct answer, but as you're coming from Java I'd like to point this out. By comparison to Java equivalents, STL is really fast.
I did find this page, showing some performance comparisons. Generally Java people are very touchy when it comes to performance conversations, and will claim that all kinds of advances are occurring all the time. However similar advances are also occurring in C/C++ compilers.
Keep in mind that the STL is actually quite old now, so other, newer libraries may have specific advantages. Given its age, its popularity is a testament to how good the original design was.
There are four main reasons why I'd say that STL is (still) awesome:
Speed
STL uses C++ templates, which means that the compiler generates code that is specifically tailored to your use of the library. For example, map will automagically generate a new class to implement a map collection of 'key' type to 'value' type. There is no runtime overhead where the library tries to work out how to efficiently store 'key' and 'value'; this is done at compile time. Due to the elegant design, some operations on some types compile down to single assembly instructions (e.g. incrementing a std::vector iterator, which is typically just a pointer).
Efficiency
The collection classes have a notion of 'allocators', which you can either provide yourself or take from the library; the library-provided ones allocate only as much storage as the container asks for, so there is little overhead beyond the container's own bookkeeping (such as a vector's spare capacity or a tree's node links). Where a built-in type can be stored more compactly, there are specializations for that case, e.g. std::vector<bool> is specialized to pack its elements as bits.
Extensibility
You can use the containers (collection classes), algorithms and function objects provided in the STL with any suitable type. If your type can be copied, you can put it into a container; if it can also be compared, it can be sorted, searched, and stored in ordered containers such as std::set. If you provide a predicate like 'bool Predicate(MyType)', it can be filtered, etc.
Elegance
Other libraries/frameworks have to implement Sort()/Find()/Reverse() methods on each type of collection. The STL implements these as separate algorithms that take iterators from whatever collection you are using and operate blindly on that collection. The algorithms don't care whether you're using a vector, list, deque, set or map: as long as the iterators support the operations an algorithm needs, it just works (std::sort, for example, requires random-access iterators, which is why std::list has its own sort() member).
Well, that is somewhat of a bold claim... perhaps in C++0x when it finally gets a hash map (in the form of std::unordered_map), it can make that claim, but in its current state, well, I don't buy that.
I can tell you, though, some cool things about it, namely that it uses templates rather than inheritance to achieve its level of flexibility and generality. This has both advantages and disadvantages; a disadvantage is that lots of code gets duplicated by the compiler, and any sort of dynamic runtime typing is very hard to achieve; however, a key advantage is that it is incredibly quick. Because each template specialization is really its own separate class generated by the compiler, it can be highly optimized for that class. Additionally, many of the STL algorithms that operate on STL containers have general definitions, but have specializations for special cases that result in incredibly good performance.
STL gives you the pieces.
Languages and their environments are built from smaller component pieces, sometimes via programming language constructs, sometimes via cut-and-paste. Some languages give you a sealed box - Java's collections, for instance. You can do what they allow, but woe betide you if you want to do something exotic with them.
The STL gives you the pieces that the designers used to build its more advanced functionality. Directly exposing the iterators, algorithms, etc. gives you an abstract but highly flexible way of recombining core data structures and manipulations in whatever way is suitable for solving your problem. While Java's design probably hits the 90-95% mark for what you need from data structures, the STL's flexibility raises it to maybe 99%, with the iterator abstraction meaning you're not completely on your own for the remaining 1%.
When you combine that with its speed and other extensibility and customizability features (allocators, traits, etc.), you have a quite excellent package. I don't know that I'd call it the best data structures package, but certainly a very good one.
Warning: percentages totally made up.
Unique because it
focuses on basic algorithms instead of providing ready-to-use solutions to specific application problems.
uses unique C++ features to implement those algorithms.
As for being best... There is a reason why the same approach wasn't (and probably never will be) followed by any other language, including direct descendants like D.
The standard C++ library's approach to collections via iterators has come in for some constructive criticism recently. Andrei Alexandrescu, a notable C++ expert, has recently begun working on a new version of a language called D, and describes his experiences designing collections support for it in this article.
Personally I find it frustrating that this kind of excellent work is being put into yet another programming language that overlaps hugely with existing languages, and I've told him so! :) I'd like someone of his expertise to turn their hand to producing a collections library for the so-called "modern languages" that are already in widespread use, Java and C#, that has all the capabilities he thinks are required to be world-class: the notion of a forward-iterable range is already ubiquitous, but what about reverse iteration exposed in an efficient way? What about mutable collections? What about integrating all this smoothly with Linq? etc.
Anyway, the point is: don't believe anyone who tells you that the standard C++ way is the holy grail, the best it could possibly be. It's just one way among many, and has at least one obvious drawback: the fact that in all the standard algorithms, a collection is specified by two separate iterators (begin and end) and hence is clumsy to compose operations on.
Obviously C++, C#, and Java can enter as many pissing contests as you want them to. The clue as to why the STL is at least somewhat great is that Java was initially designed and implemented without type-safe containers. Then Sun decided/realised people actually need them in a typed language, and added generics in 1.5.
You can compare the pros and cons of each, but as to which of the three languages has the "greatest" implementation of generic containers - that is solely a pissing contest. Greatest for what? In whose opinion? Each of them has the best libraries that the creators managed to come up with, subject to other constraints imposed by the languages. C++'s idea of generics doesn't work in Java, and type erasure would be sub-standard in typical C++ usage.
The main thing is that templates let you switch containers in and out without having to resort to the horrendous mess that is Java's interfaces.
If you fail to see what use the STL has, I recommend buying a book, "The C++ Programming Language" by Bjarne Stroustrup. It explains pretty much everything there is to know about C++, because he's the dude who created it.

What is the point of STL?

I've been programming C++ for about a year now, and when I'm looking around I see lots of references to the STL.
Can someone please tell me what it does,
and what its advantages and disadvantages are?
Also, what does it give me over Borland's VCL or MFC?
It's the C++ standard library that gives you all sorts of very useful containers, strings, algorithms to manipulate them with etc.
The term 'STL' is outdated IMHO; what used to be the STL has become a large part of the standard library for C++.
If you are doing any serious C++ development, you will need to be familiar with this library and preferably the boost library. If you are not using it already, you're probably working at the wrong level of abstraction or you're constraining yourself to a small-ish subset of C++.
STL stands for Standard Template Library. This was a library designed mainly by Stepanov and Lee which was then adopted as part of the C++ Standard Library. The term is gradually becoming meaningless, but covers these parts of the Standard Library:
containers (vectors, maps etc.)
iterators
algorithms
If you call yourself a C++ programmer, you should be familiar with all of these concepts, and the Standard Library implementation of them.
The STL is the Standard Template Library. Like any library it's a collection of code that makes your life easier by providing well tested, robust code for you to re-use.
Need a collection (map, list, vector, etc.)? It's in the STL.
Need to operate on a collection (for_each, copy, transform, etc.)? It's in the STL.
Need to do I/O? There are classes for that.
Advantages
1. You don't have to re-implement standard containers (because you'll get it wrong anyway).
Read Nicolai M. Josuttis's book, "The C++ Standard Library: A Tutorial and Reference", to learn more about the STL; it's the best STL reference book out there.
It provides common useful tools for the programmer! Iterators, algorithms, etc. Why re-invent the wheel?
"Advantages and disadvantages" compared to what? To writing all that code yourself? Isn't it obvious? It has great collections and tools to work with them.
Wikipedia has a good overview: http://en.wikipedia.org/wiki/Standard_Template_Library
The STL fixes one big deficiency of C++: the lack of a standard string type. This has caused innumerable headaches, as there have been thousands of string implementations that don't work well together.
It stands for Standard Template Library.
It is a set of functions and classes that are there to save you a lot of work.
They are designed to use templates, which is where you define a function without defining what data type it will work on.
For example, vector more or less gives you dynamic arrays. When you create an instance of it, you say what type you want it to work with. This can even be your own data type (class).
It's a hard thing to get your head around, but it is hugely powerful and can save you loads of time.
Get reading up on it now! You won't regret it.
It gives you another acronym to toss around at cocktail parties.
Seriously, check the intro docs starting e.g. with the Wikipedia article on STL.
The STL has iterators. Sure, collections and stuff are useful, but the power of iterators is gigantic and, in my humble opinion, makes the rest pale in comparison.

C++ hash table implementation with collision handling

What is a good C++ library for hash tables / hash maps, similar to what Java offers? I have worked with Google Sparsehash, but it has no support for collisions.
Use std::unordered_map (or unordered_multimap), which, despite its name, is a hash table. It was standardized in C++11 and is available in most current C++ implementations. Do not use the classes with hash in their names that your implementation may provide; they are not standard.
http://www.sgi.com/tech/stl/hash_multimap.html
or
std::tr1::unordered_multimap
In addition to those mentioned in other answers, you could try MCT's closed_hash_map or linked_hash_map. They are internally similar to Google SparseHash, but don't restrict the values used and have some other functional advantages.
I'm not sure I understand what you mean by "no support for collisions", though. Both Google SparseHash and similarly implemented MCT of course handle collisions fine, though differently than Java's HashMap.

What questions should an expert in STL be expected to answer, in an interview

I was looking at a job posting recently and one of the requirements was that a person be a 9/10 in their knowledge of STL.
When I judge my skills, to me a 10 is someone that writes advanced books on the subject, such as Jon Skeet (C#), John Resig (JavaScript) or Martin Odersky (Scala).
So, a 9/10 is basically a 10, so I am not certain what would be expected at that level.
An example of some questions would be found at: http://discuss.joelonsoftware.com/default.asp?joel.3.414500.47
Obviously some coding will be needed, but should I be expected to have everything memorized, given how much there is in the STL?
In some cases Boost libraries extend the STL, so should I be expected to use Boost as well? I sometimes confuse which function came from which of the two libraries.
I am trying to get an idea if I can answer questions that would be expected of a STL expert, though it is odd that being a C++ expert wasn't a requirement.
UPDATE
After reflecting on the answers to my question it appears that what they may be looking for is someone that can see the limits of STL and extend the library, which is something I haven't done. I am used to thinking within the limits of what STL and Boost give me and staying within the lines. I may need to start looking at whether that has been too limiting and see if I can go outside the box. I hope they don't mean a 9 as Google does. :)
Funny -- I don't consider myself a 9/10 in STL (I used to be, but I'm a bit rusty now), and I do fully agree with #joshperry's important terminological distinguo (I've often been on record as berating the abuse of STL to mean "the parts of the C++ standard library that were originally inspired by SGI's STL"!-), yet I consider his example code less than "optimally STL-ish". I mean, for the given task "Put all the integers in a vector to standard out.", why would anyone ever code, as #joshperry suggests,
for(std::vector<int>::iterator it = intVect.begin(); it != intVect.end(); ++it)
    std::cout << *it;
rather than the obvious:
std::copy(intVect.begin(), intVect.end(), std::ostream_iterator<int>(std::cout));
or the like?! To me, that would kind of suggest they don't know about std::ostream_iterator -- especially if they're supposed to be showing off their STL knowledge, why wouldn't they flaunt it?-)
At my current employer, to help candidates self-rate their competence in a technology, we provide a helpful guide: "10: I invented that technology; 9: I wrote THE book about it" and so on down. So, for example, I'd be a 9/10 in Python; only my colleague and friend Guido can fairly claim a 10/10 there. STL is an interesting case: while Stepanov drove the design, my colleague Matt Austern did the first implementation and wrote "the" book about it, too (Generic Programming and the STL), so I think he'd get to claim, if not a 10, a 9.5. By that standard, I could be somewhere between 7 and 8 if I could take an hour to refresh (custom allocators and traits are always tricky, or at least that's how I recall them!-).
So, if you're probing somebody who claims a 9, grill them over the really hard parts such as custom allocators and traits -- presumably they wouldn't miss a beat on all the containers, algorithms, and special iterators, so don't waste much interview time on those (which would be key if you were probing for a 7 or 7.5). Maybe ask them to give a real-life example where they used custom traits and/or allocators, and code all the details of the implementation as well as a few sample uses.
BTW, if you're the one needing to cram on C++'s standard library at an advanced level, I'm told by knowledgeable and non-rusty friends that Josuttis' book nowadays is even more useful than my friend Matt's (unfortunately, I've never read Josuttis in depth, so I can't confirm or deny that - I do see the book has five stars on Amazon, which is impressive;-).
It is just a dumb requirement for a job. When hiring, you want a great programmer first and specific knowledge second.
It would be reasonable, in this day and age, to expect knowledge of and familiarity with the STL. But unless the job is to reimplement the STL, you don't need 9/10. Even if that is the job, you still just need a great programmer with lots of experience with templates (writing them, not just using them).
For example, for all the answers to "output the integers of a vector", the compiler probably generates exactly the same code. Only the version that has been templated to handle any container of any items shows a hint of 'great' vs. good (just a hint), i.e. the ability to abstract.
Anyhow, just go for it. Be ready to use the STL to help solve other problems. Nothing more.
(In fact, in most of the interviews I've been in, the requirement was to NOT use the STL, i.e. "write a function that reverses a string". My first answer is that there is probably something in the std lib that would do that. Then they say, right, of course, but what if you had to write it yourself...)
I should preface this by noting that I think the same criteria should be applied not only to STL (regardless of which definition you prefer for that), but to many other types of things as well.
From my perspective, simply knowing the existing STL components and being able to apply them well probably should not qualify as a 9/10. Rather, I'd consider that level about 7/10. 8/10 is when the person is able to extend the STL by providing new components that follow its philosophy and fit with existing components naturally and easily.
By 9/10, I'd expect to see somebody who can not only provide new components, but is able to improve some of the existing ones, as Boost.Bind did. For 10/10, I'd expect to see this go beyond the rather ad hoc, localized improvements of a 9/10, toward working at a more architectural level, such as using ranges instead of individual iterators. For a concrete example, consider the difference between Boost's ranges and Andrei Alexandrescu's ideas for ranges. Boost's ranges are handy, useful and convenient, but they change what you type, not how you think. Andrei's version of ranges is much more all-encompassing: an architectural solution that changes how you design and think about code, not just how you type it.
Well, you could walk into the interview and say "I noticed that your posting asked for someone knowledgeable in STL but that term is sometimes used to mean: (1) C++ standard library; (2) the library Stepanov designed at HP; (3) the parts of [1] based on [2]; (4) specific vendor implementations of either [1], [2], or [3]; (5) the underlying principles of [2]. As such, the term is highly ambiguous, and must be used with extreme caution. If you meant [1] and insist on abbreviating, "stdlib" is a far better choice."*
Honestly, though, since it is a library, it is somewhat finite and probably not composable ad infinitum like a language proper. So I would say any question that has them use some of the stdlib algorithms would be an effective way to see whether they know them well.
With iterators being an integral part of the stdlib I would also maybe ask them to "Put all the integers in a vector to standard out." I would expect something like:
// thanks to onebyone
std::copy(vec.begin(), vec.end(), std::ostream_iterator<int>(std::cout, " "));
If they write something like the following they probably aren't very familiar with iterators:
for(int i = 0; i < vec.size(); ++i)
std::cout << vec[i];
Also one interesting thing to look for is if they do using namespace std at the top of their code file. Ask them why, and if they don't say something along the lines of "I use that for short demo code only" or if they put it in a header file, thank them for coming in and send them out the door.
Another aspect of the stdlib is its heavy use of templates; the person should have a good understanding of basic template programming for type substitution. Maybe ask them to "Write a function that will write all of the items of any stdlib container to standard out". I would expect to see something like:
template<typename InputIter>
void Output(InputIter it, InputIter end) {
while(it != end)
std::cout << *it++;
}
These are probably not 9/10 questions but interesting ones I think a 2-3/10 should know.
One 9/10 difficulty I would say is to write a derived iostream properly without using the boost stream base classes. But there is probably quite a difference between using the stdlib and extending it...
*(thanks to nolyc on freenode ##C++ for the quote)
9/10 is quite subjective. I have been asked good questions about STL. These are examples:
When should you use a deque vs a vector (knowledge of how they are internally implemented is helpful)
Recognize STL code that uses invalid references, or can end up using invalid references.
Implement simple operations on different containers, and know where and when to use std::algorithm vs member functions of a container.