Why isn't std::set just called std::binary_tree? [closed] - c++

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
std::set in C++ is not a real set in terms of data structures. std::unordered_set is a real set, but std::set is a binary search tree, more specifically a red-black tree. Why, then, is it called std::set? Is there some specific functionality that sets a std::set apart from a binary tree? Thanks.

Why isn't std::set just called std::binary_tree?
Because
Tree doesn't describe how the interface is used. Set does.
std::set does not provide sufficient operations to be used as a general purpose search tree. It only provides an interface to a particular application of a search tree: The representation of a set.
Technically, the standard doesn't specify that std::set is a binary search tree. Although a red-black BST may be the only data structure that can meet the requirements imposed on std::set, and the interface has been carefully specified with that data structure in mind, the choice of internal data structure is an implementation detail.
For the same reason, std::unordered_set isn't called std::hash_table.
std::unordered_set is a real set, but std::set is a binary search tree
std::unordered_set isn't any more or less "real" than std::set is. They are both sets with slightly different requirements and guarantees: one designed to be implementable using a tree, and another designed to be implementable using a hash table.
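As a small illustration of the point that both are sets, here is a sketch of a function template that only relies on the set-like `insert()` member both containers provide. The name `first_occurrences` is made up for this example.

```cpp
#include <set>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns the input values with repeats dropped,
// keeping the order of first appearance. SetLike can be std::set<int> or
// std::unordered_set<int>; only the shared set interface is used.
template <typename SetLike>
std::vector<int> first_occurrences(const std::vector<int>& input) {
    SetLike seen;
    std::vector<int> result;
    for (int value : input) {
        // insert() returns a pair; its bool is true only when the value
        // was not already in the set.
        if (seen.insert(value).second) {
            result.push_back(value);
        }
    }
    return result;
}
```

Calling `first_occurrences<std::set<int>>(v)` and `first_occurrences<std::unordered_set<int>>(v)` gives the same result; the containers differ in iteration order and complexity guarantees, not in what a set is.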
P.S. A tree and a hash table are not the only ways to represent a set. One internal data structure that can implement most - but not all - of std::set is a sorted vector. Especially for small sets, a sorted vector can be much faster than std::set.
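A minimal sketch of the sorted-vector idea, assuming `int` elements. `SortedVectorSet` is a hypothetical class written for this illustration, not a standard type.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical set built on a sorted std::vector (illustration only).
class SortedVectorSet {
public:
    // Inserts value if absent and returns true when it was inserted.
    // The search is O(log n), but shifting elements makes insertion O(n).
    bool insert(int value) {
        auto pos = std::lower_bound(data_.begin(), data_.end(), value);
        if (pos != data_.end() && *pos == value) {
            return false;  // already present
        }
        data_.insert(pos, value);
        return true;
    }

    // Binary search over contiguous memory.
    bool contains(int value) const {
        assert(std::is_sorted(data_.begin(), data_.end()));
        return std::binary_search(data_.begin(), data_.end(), value);
    }

    const std::vector<int>& elements() const { return data_; }

private:
    std::vector<int> data_;  // kept sorted and free of duplicates
};
```

Two things this sketch cannot match are std::set's logarithmic insertion and its guarantee that inserting does not invalidate existing iterators, which is part of why a sorted vector covers most, but not all, of the interface.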

Related

Is it faster to lookup in set than unordered_map [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
I have a question about lookup speed. I want to know which STL container gives the fastest lookup time in C++. unordered_map comes to mind since it is implemented as a hash map, but I am afraid its performance is penalized because it stores key-value pairs, whereas set stores only keys. I guess the answer will depend on 1) the data type of the key; and 2) the STL implementation of set.
In other words, which container is faster for checking whether a key exists: set, unordered_map, or something else?
Edit:
Would appreciate the answer with more explanation on the implementation or mechanism of the container. For instance, unordered_map is fast because it's implemented with hashmap. That will be more helpful than saying "it depends on the need". Thanks!
This depends to a large degree on the distribution of your data, the size of your dataset, the compiler, the toolchain...
The only way you can know is to measure it for your use case.
Do this after selecting the appropriate container for your task, then switch to something else only if you find that you need to and that you can get better performance for your use case by doing so.
Based on your question, I'd say the choice is between set and unordered_set. On the other hand, if you don't actually know yet whether your data have both keys and values, then you're probably not ready to start profiling your solution.
Consider this from a different angle:
Since performance is what you're interested in here, you might want to design your data structure to make optimal use of cache lines.
If the number of elements is not too high, a vector will often outperform the other containers, because vectors store elements in contiguous memory and caches favor contiguous access.
You also mentioned key-value pairs hampering lookup speed. One way around this, from a cache-line perspective, is to store the keys in a contiguous data structure and do the lookup on the keys alone. Then, only when you have a hit, read the corresponding value.
Check out this talk by Mike Acton for more on this.
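The keys-first layout described in this answer might be sketched as follows. `KeyFirstTable` and its members are hypothetical names, and whether the layout actually wins depends on your data, so measure before adopting it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical structure-of-arrays table: keys[i] belongs to values[i].
struct KeyFirstTable {
    std::vector<int> keys;            // scanned during lookup
    std::vector<std::string> values;  // touched only after a hit

    void add(int key, std::string value) {
        keys.push_back(key);
        values.push_back(std::move(value));
    }

    // Linear scan over the contiguous keys; returns nullptr on a miss.
    // The returned pointer is invalidated by a later add().
    const std::string* find(int key) const {
        assert(keys.size() == values.size());
        for (std::size_t i = 0; i < keys.size(); ++i) {
            if (keys[i] == key) {
                return &values[i];
            }
        }
        return nullptr;
    }
};
```

The lookup loop reads only the `keys` vector, so a miss never pulls any value data into the cache.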

c++ stl containers:where can we use them [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
So we know that the basic structures which form the backbone of C++ algorithms are:
trees set
queue
linkedlist
array
vector map
unordered_map and pair.
My question is which data structure is suitable for which application. For instance, I know that for database indexing and searching the preferred choices are B+ trees and hash tables. Can anyone shed some more light on this?
This is not only a C++ problem, but also an algorithms question. It may be too broad, but I can give you some advice.
set and map: These are ordered containers, suited to data that is inserted into and read from many times. Insertion, deletion, and lookup each take O(log n) time.
vector: a dynamic array, good when you frequently push_back onto it; unless you have a reason to choose something else, use it.
deque: much like vector, but push_front is also O(1).
list: for data where you insert frequently but rarely need random access.
unordered_map and unordered_set: these are hash tables; read up on hash tables.
array: for data whose size is fixed.
pair and tuple: bundle several objects into one struct. Nothing special.
Besides all of these, there are other types that meet other requirements; you can search for them,
e.g. any and optional
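Two of the points from the list above, sketched in code: std::map iterating in key order, and std::deque accepting insertions at the front. The function names are invented for this example.

```cpp
#include <deque>
#include <map>
#include <string>
#include <vector>

// std::map keeps its keys ordered, so iterating over it yields the
// distinct words alphabetically without a separate sort step.
std::vector<std::string> sorted_unique_words(const std::vector<std::string>& words) {
    std::map<std::string, int> counts;
    for (const auto& word : words) {
        ++counts[word];  // operator[] inserts a zero count on first use
    }
    std::vector<std::string> result;
    for (const auto& entry : counts) {
        result.push_back(entry.first);
    }
    return result;
}

// std::deque supports constant-time push_front, so the newest event can
// always be placed first.
std::deque<int> recent_first(const std::vector<int>& events) {
    std::deque<int> result;
    for (int event : events) {
        result.push_front(event);
    }
    return result;
}
```

With a std::vector in place of the deque, each front insertion would shift every existing element, which is the practical difference between the two.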

What is the underlying structure of an std::map? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
Somebody told me yesterday that the underlying structure of an ordered map is a binary search tree. This does not make sense to me since you cannot have O(1) retrieval if that were the case. Can anyone explain?
Also, if one were to implement a hash table in C++ without using the stdlib, what would be the best way to do so?
std::map lookup time is not O(1); it's O(log(n)).
std::unordered_map has an average lookup time of O(1); the worst case is O(n).
std::unordered_map and std::unordered_set are hashtables.
The underlying data structure is implementation-defined. It is most commonly implemented as a Red-Black tree which is a self-balancing binary search tree. The time complexity for getting an element is O(logn) (see this)
I would just read the implementation of std::unordered_map as a starting point. I assume this is a learning activity, so reading and understanding a working STL implementation would be a good exercise. If it's not an exercise, then use std::unordered_map.
std::map typically uses a red-black tree because it offers a reasonable trade-off between the cost of node insertion/deletion and searching.
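To make the trade-off concrete, here is a sketch of something the ordered std::map offers that a hash table does not: finding the nearest key at or above a query. The helper names are hypothetical, and returning -1 for "no such key" assumes keys are non-negative in this illustration.

```cpp
#include <map>
#include <string>
#include <unordered_map>

// Ordered query: smallest key >= query, in O(log n) via lower_bound().
// Returns -1 when there is no such key (assumes non-negative keys).
int next_key_at_or_after(const std::map<int, std::string>& m, int query) {
    auto it = m.lower_bound(query);
    return it == m.end() ? -1 : it->first;
}

// Exact-match lookup: average O(1) in std::unordered_map, which has no
// lower_bound() member because its elements are not kept in key order.
bool has_key(const std::unordered_map<int, std::string>& m, int key) {
    return m.find(key) != m.end();
}
```

If you only ever need exact-match lookups, the hash table's average O(1) is usually the better fit; if you need ordering or range queries, the O(log n) tree earns its cost.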

Why do we need to learn different Sorting algorithms when the STL sort function is already available to us in C++? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
Yesterday, this question came to my mind. Although I have not studied all the sorting algorithms (Quicksort, Merge Sort, Heapsort, Insertion Sort, Selection Sort, and Bubble Sort), nor have I read Introduction to Algorithms by CLRS, I am still curious to know why there is a need to learn all such algorithms when a predefined sort function is already available to us in many languages.
Because
Simply sorting may not always be the requirement. The requirement can differ: you may need to modify or integrate a sorting algorithm in order to build something else entirely.
The predefined sorting methods may not be efficient in all cases.
It's not always about the sorted result, but about the approach to sorting, in order to improve time and space complexity. Efficiency is the key.
No particular algorithm is guaranteed to work best in all cases. Pros and cons differ between algorithms.
You need to understand which algorithm to apply in which scenario.
Sorting is not always done on numbers. It can be applied to other, more complex types and structures. (There may not be predefined methods for complex cases.)
There is always scope for a better approach.
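Even within the C++ standard library, picking the right tool takes some knowledge of how sorting algorithms behave. The sketch below, using a made-up `Employee` type, shows two cases where plain std::sort is not the obvious choice: keeping ties in their original order, and needing only the k smallest values.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record type for illustration.
struct Employee {
    std::string name;
    int age;
};

// std::stable_sort keeps employees of equal age in their original
// relative order; std::sort makes no such promise.
void sort_by_age_stable(std::vector<Employee>& staff) {
    std::stable_sort(staff.begin(), staff.end(),
                     [](const Employee& a, const Employee& b) {
                         return a.age < b.age;
                     });
}

// When only the k smallest values are needed, std::partial_sort avoids
// fully ordering the rest of the range.
std::vector<int> smallest_k(std::vector<int> values, std::size_t k) {
    k = std::min(k, values.size());
    std::partial_sort(values.begin(),
                      values.begin() + static_cast<std::ptrdiff_t>(k),
                      values.end());
    values.resize(k);
    return values;
}
```

Knowing why stability matters, or why a partial sort can be cheaper than a full one, comes from having studied the underlying algorithms.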

Why does the C++ STL have five different iterators? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
Why does the C++ STL have five iterators? The random access iterator alone could be sufficient to operate on all the containers. Is there any specific reason?
Sorry, that was my mistake. I didn't mean random access iterators; I meant to ask about bidirectional iterators. Don't you think bidirectional iterators alone could cover the functionality of input, output, and forward iterators? Is there any specific reason for introducing the input, output, and forward iterator concepts? Thanks.
Containers aren't the only interesting sequences. Also, std::list<...> and the associative containers don't have an efficient method for random access although they are containers. std::forward_list<...> can be walked in just one direction. When sequences are sources or drains, they can often be traversed just once. Oh, look! I actually gave reasons for all five categories!
Note that the "STL iterators" are not classes but concepts, i.e., requirements for operations and associated types needed to meet the respective iterator concept. The basic idea is that algorithm interfaces are specified in terms of the weakest concepts yielding an efficient implementation. When stronger concepts are provided to the algorithms they may be able to apply some optimizations. This approach yields flexible and efficient algorithms operating on all kinds of different sequences.
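The idea of specifying an algorithm against the weakest concept and optimizing for stronger ones can be sketched with tag dispatch on `std::iterator_traits`. `count_steps` is a hypothetical function that mirrors what std::distance does; it is not how any particular library implements it.

```cpp
#include <iterator>
#include <list>    // used in the usage note below
#include <vector>  // used in the usage note below

// Fast path: random access iterators support subtraction, so O(1).
template <typename It>
typename std::iterator_traits<It>::difference_type
count_steps_impl(It first, It last, std::random_access_iterator_tag) {
    return last - first;
}

// General path: only ++ and != are required, so O(n). Forward and
// bidirectional tags derive from input_iterator_tag and land here.
template <typename It>
typename std::iterator_traits<It>::difference_type
count_steps_impl(It first, It last, std::input_iterator_tag) {
    typename std::iterator_traits<It>::difference_type steps = 0;
    for (; first != last; ++first) {
        ++steps;
    }
    return steps;
}

// Public entry point: selects an overload from the iterator's category.
template <typename It>
typename std::iterator_traits<It>::difference_type
count_steps(It first, It last) {
    return count_steps_impl(
        first, last,
        typename std::iterator_traits<It>::iterator_category{});
}
```

Calling `count_steps` on a `std::vector<int>` range takes the subtraction path, while the same call on a `std::list<int>` range walks the elements; the caller's code is identical in both cases.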
To get an idea why, check this page.
A random access iterator cannot always work. A simple example: If you're streaming data via the network, you cannot start again from the beginning. There are more reasons, but simply read the page.