Efficiently find number of elements in range in C++ set [duplicate]

I want to find the rank of an element in an STL set. I can traverse from the beginning to that element to find its rank, but that takes O(n). Is there any method to find the rank in O(log n)?

No; a balanced tree does not need to store the number of descendants of each node, which is required to more quickly compute distance( s.begin(), iter ) for std::set s and iterator iter (which is what I suppose you mean). Therefore the information simply doesn't exist except by counting the items one by one.
If you need to perform many such computations, copy the set into a sorted, random-access sequence such as vector or deque, but then modification of the sequence becomes expensive.
A tree data structure that does what you ask probably exists in a free library somewhere, but I don't know of one.
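The sorted-vector suggestion above can be sketched as follows. This is only an illustration under the assumption that the set changes rarely between queries; the helper names snapshot and rank_in_sorted are made up for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Copy the set into a sorted vector once. This costs O(n) and has to be
// repeated whenever the set changes.
std::vector<int> snapshot(const std::set<int>& s) {
    return std::vector<int>(s.begin(), s.end());
}

// Rank = number of elements strictly less than `value`.
// std::lower_bound on random-access iterators is O(log n).
std::size_t rank_in_sorted(const std::vector<int>& sorted, int value) {
    auto it = std::lower_bound(sorted.begin(), sorted.end(), value);
    return static_cast<std::size_t>(it - sorted.begin());
}
```

For a set holding 10, 20, 25 and 50, rank_in_sorted(snapshot(s), 25) gives 2. Each query is cheap, but every modification of the set forces a new O(n) snapshot.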

What you are looking for is called an order statistic tree. If you are using the GNU C++ library, you should have an extension available for building order statistic trees. A short example is given below:
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <cstdio>
using namespace std;
using namespace pb_ds;
typedef tree<
int, /* key type */
null_mapped_type, /* value type */
less<int>, /* comparison */
rb_tree_tag, /* for having an rb tree */
tree_order_statistics_node_update> order_set;
int main()
{
order_set s;
s.insert(10);
s.insert(20);
s.insert(50);
s.insert(25);
printf("rank of 25 = %d\n", s.order_of_key(25));
}
Output should be rank of 25 = 2. For more examples, you can see this file.

The functionality of a sorted vector suggested by @Potatoswatter is provided by flat_set from Boost.Container. The documentation lists the following trade-offs:
Faster lookup than standard associative containers
Much faster iteration than standard associative containers
Less memory consumption for small objects (and for big objects if shrink_to_fit is used)
Improved cache performance (data is stored in contiguous memory)
Non-stable iterators (iterators are invalidated when inserting and erasing elements)
Non-copyable and non-movable values types can't be stored
Weaker exception safety than standard associative containers (copy/move constructors can throw when shifting values in erasures and insertions)
Slower insertion and erasure than standard associative containers (specially for non-movable types)

There is actually a built-in solution if you are using GCC, but Subhasis Das's answer is somewhat outdated and will not work with newer versions of GCC due to updates. The header is now
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
using namespace __gnu_pbds;
and the set structure is
typedef tree<
int,
null_type,
std::less<int>,
rb_tree_tag,
tree_order_statistics_node_update> ordered_set;
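Alternatively, if a multiset is needed, std::less<int> can be replaced with std::less_equal<int>. Note that less_equal is not a strict weak ordering, so lookups by key such as find() and erase(key) do not behave as they would in a real multiset; the trick is meant for order_of_key() and find_by_order().
Here's a demonstration of finding the rank of a key with order_of_key: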
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
using namespace __gnu_pbds;
#include <iostream>
typedef tree<int, null_type, std::less_equal<int>, rb_tree_tag, tree_order_statistics_node_update> ordered_set;
int main()
{
ordered_set s;
s.insert(10);
s.insert(20);
s.insert(50);
s.insert(25);
for(int i=24; i<=26; i++) std::cout << "Rank of " << i << ": " << s.order_of_key(i) << std::endl;
}
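If duplicates are needed and erasing by value matters, a common alternative to less_equal is to store (value, unique id) pairs under the ordinary std::less, which keeps a valid strict weak ordering. The sketch below assumes GCC's libstdc++ (for the pb_ds headers); the class RankedMultiset and its member names are hypothetical, not part of any library.

```cpp
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <cstddef>
#include <functional>
#include <limits>
#include <utility>

// Each stored element is (value, unique id), so equal values stay distinct.
typedef std::pair<int, int> Entry;
typedef __gnu_pbds::tree<
    Entry,
    __gnu_pbds::null_type,
    std::less<Entry>,
    __gnu_pbds::rb_tree_tag,
    __gnu_pbds::tree_order_statistics_node_update> entry_tree;

class RankedMultiset {
public:
    RankedMultiset() : next_id_(0) {}

    void insert(int value) { tree_.insert(Entry(value, next_id_++)); }

    // Number of stored elements strictly less than `value`.
    std::size_t count_less(int value) const {
        return tree_.order_of_key(Entry(value, std::numeric_limits<int>::min()));
    }

    // Erases one occurrence of `value`; returns false if none is stored.
    bool erase_one(int value) {
        entry_tree::iterator it =
            tree_.lower_bound(Entry(value, std::numeric_limits<int>::min()));
        if (it == tree_.end() || it->first != value) return false;
        tree_.erase(it);
        return true;
    }

    std::size_t size() const { return tree_.size(); }

private:
    entry_tree tree_;
    int next_id_;  // assumption: fewer than INT_MAX insertions overall
};
```

After inserting 10, 20, 20 and 50, count_less(21) is 3, and it drops to 2 after one erase_one(20).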

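std::set has a lower_bound member function, which locates an element in O(log n). It returns an iterator rather than a position, though, so turning the result into a rank still requires std::distance(s.begin(), it), which is O(n) for set iterators. Take a look at https://www.geeksforgeeks.org/set-lower_bound-function-in-c-stl/.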
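A minimal sketch of the lower_bound approach with a plain std::set follows; the function name rank_by_walking is made up for this example. The lookup is logarithmic, but std::set iterators are bidirectional, so std::distance has to walk from begin() and the function is linear overall.

```cpp
#include <cstddef>
#include <iterator>
#include <set>

// Rank = number of elements strictly less than `value`.
std::size_t rank_by_walking(const std::set<int>& s, int value) {
    // O(log n): first element that is not less than `value`.
    std::set<int>::const_iterator it = s.lower_bound(value);
    // O(rank): std::distance steps through the set one element at a time.
    return static_cast<std::size_t>(std::distance(s.begin(), it));
}
```

This gives the right answer (2 for the key 25 in a set holding 10, 20, 25 and 50), but not the O(log n) bound the question asks for.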

Related

C++ STL container suited for finding the nth element in dynamic ordered list?

Using a balanced BST such as an AVL or red-black tree, we can easily maintain a set of values that supports:
1. Insert/delete/query a given value.
2. Count the elements that are smaller/larger than a given value.
3. Find the element with rank k in sorted order.
All of the above can be achieved in O(log N) complexity.
My question is: is there any STL container supporting all 3 of the above operations with the same complexity?
I know STL set/multiset can be used for 1 and 2. I also examined the _Rb_tree based containers map/set/multiset, but none provides support for 3. Is there a way to subclass ext/rb_tree to solve this?

The data structure you're looking for is an order statistic tree, which is a binary search tree where each node stores the size of the subtree rooted at that node.
This supports all of the listed operations in O(log n).
There exists an order statistic tree in GNU Policy-Based Data Structures (part of GNU C++).
The code would look something like this:
#include <iostream>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
using namespace std;
using namespace __gnu_pbds;
typedef
tree<
int,
null_type,
less<int>,
rb_tree_tag,
tree_order_statistics_node_update>
set_t;
int main()
{
set_t s;
s.insert(12);
s.insert(50);
s.insert(30);
s.insert(20);
cout << "1st element: " << *s.find_by_order(1) << '\n'; // 20
cout << "Position of 30: " << s.order_of_key(30) << '\n'; // 2
return 0;
}
Live demo.
[Derived from this answer]
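To illustrate the subtree-size idea itself, here is a deliberately simplified sketch. It is an unbalanced binary search tree, so the O(log n) bound only holds while the tree happens to stay shallow; a real order statistic tree also rebalances and fixes the sizes during rotations. All names (Node, ost_insert, ost_rank, ost_select) are hypothetical.

```cpp
#include <cstddef>
#include <memory>

struct Node {
    int key;
    std::size_t size;  // number of nodes in the subtree rooted here
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k), size(1) {}
};

std::size_t size_of(const std::unique_ptr<Node>& n) { return n ? n->size : 0; }

// Inserts `key` if absent and updates subtree sizes on the way back up.
bool ost_insert(std::unique_ptr<Node>& n, int key) {
    if (!n) { n.reset(new Node(key)); return true; }
    if (key == n->key) return false;
    bool inserted = ost_insert(key < n->key ? n->left : n->right, key);
    if (inserted) ++n->size;
    return inserted;
}

// Number of keys strictly less than `key`.
std::size_t ost_rank(const Node* n, int key) {
    std::size_t rank = 0;
    while (n) {
        if (key <= n->key) {
            n = n->left.get();
        } else {
            rank += size_of(n->left) + 1;  // left subtree plus this node
            n = n->right.get();
        }
    }
    return rank;
}

// Node holding the k-th smallest key (0-based), or nullptr if k is out of range.
const Node* ost_select(const Node* n, std::size_t k) {
    while (n) {
        std::size_t left = size_of(n->left);
        if (k < left) {
            n = n->left.get();
        } else if (k == left) {
            return n;
        } else {
            k -= left + 1;
            n = n->right.get();
        }
    }
    return nullptr;
}
```

With the keys 12, 50, 30 and 20 inserted, ost_rank(root.get(), 30) is 2 and ost_select(root.get(), 1)->key is 20, matching order_of_key and find_by_order in the pb_ds example.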

Tracking node traversals in calls to std::map::find?

I'm performing a large number of lookups, inserts and deletes on a std::map. I'm considering adding some code to optimize for speed, but I'd like to collect some statistics about the current workload. Specifically, I'd like to keep track of how many nodes 'find' has to traverse on each call so I can keep a running tally.
I'm thinking that if most changes in my map occur at the front, I might be better off searching the first N entries before using the tree that 'find' uses.

Find will have to compare elements using the map's compare function so you can provide a custom compare function that counts the number of times it is called in order to see how much work it is doing on each call (essentially how many nodes are traversed).
I don't see how searching the first N entries before calling find() could help in this case though. Iterating through the entries in a map just traverses the tree in sorted order so it can't be more efficient than just calling find() unless somehow your comparison function is much more expensive than a check for equality.
Example code:
#include <algorithm>
#include <iostream>
#include <map>
#include <numeric>
#include <vector>
using namespace std;
int main() {
vector<int> v(100);
iota(begin(v), end(v), 0);
vector<pair<int, int>> vp(v.size());
transform(begin(v), end(v), begin(vp), [](int i) { return make_pair(i, i); });
int compareCount = 0;
auto countingCompare = [&](int x, int y) { ++compareCount; return x < y; };
map<int, int, decltype(countingCompare)> m(begin(vp), end(vp), countingCompare);
cout << "Compares during construction: " << compareCount << "\n";
compareCount = 0;
auto pos = m.find(50);
cout << "Compares during find(): " << compareCount << "\n";
}

If it's feasible for your key/value structures it is worth considering unordered_map (in C++11 or TR1) as an alternative. std::map, being a balanced tree, is not likely to perform well under this usage profile, and hybrid approaches where you search the first N seem like a lot of work to me with no guaranteed payoff.

stdext::hash_map unclear hash function

#include <iostream>
#include <hash_map>
using namespace stdext;
using namespace std;
class CompareStdString
{
public:
bool operator ()(const string & str1, const string & str2) const
{
return str1.compare(str2) < 0;
}
};
int main()
{
hash_map<string, int, hash_compare<string, CompareStdString> > Map;
Map.insert(make_pair("one", 1));
Map.insert(make_pair("two", 2));
Map.insert(make_pair("three", 3));
Map.insert(make_pair("four", 4));
Map.insert(make_pair("five", 5));
hash_map<string, int, hash_compare<string, CompareStdString> > :: iterator i;
for (i = Map.begin(); i != Map.end(); ++i)
{
i -> first; // they are ordered as three, five, two, four, one
}
return 0;
}
I want to use hash_map with std::string as the key. But when I insert the pairs, the order gets mixed up. Why does the order not match the insertion order? How can I get the order one, two, three, four, five?

Why does the order not match the insertion order?
That's because a stdext::hash_map (and the platform-independent standard library version std::unordered_map from C++11) doesn't maintain or guarantee any meaningful order of its elements, not even insertion order. It is a hashed container, in which each element's position depends on its hash value and the size of the container. So you won't be able to maintain a meaningful order for your data with such a container.
What you can use to keep your elements in a guaranteed order is a good old std::map. But this doesn't order elements by insertion order either; it uses the order induced by the comparison predicate (which can be configured to respect insertion time, but that would be quite unintuitive and not that easy at all).
For anything else you won't get around rolling your own (or searching for other libraries; I don't know if Boost has something like that). For example, add all elements to a linear std::vector/std::list for insertion-order iteration and maintain an additional std::(unordered_)map pointing into that vector/list for O(1)/O(log n) retrieval if necessary.
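The vector-plus-index idea from the last paragraph could look roughly like this. It is a minimal sketch assuming string keys, int values and no erasure (erasing from the middle of the vector would invalidate the stored positions); the class name InsertionOrderedMap is made up.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

class InsertionOrderedMap {
public:
    // Returns false (and leaves the container unchanged) if the key exists.
    bool insert(const std::string& key, int value) {
        if (index_.count(key) != 0) return false;
        index_[key] = items_.size();      // position of the new element
        items_.emplace_back(key, value);  // the vector keeps insertion order
        return true;
    }

    // Average O(1) lookup; returns nullptr if the key is absent.
    const int* find(const std::string& key) const {
        auto it = index_.find(key);
        return it == index_.end() ? nullptr : &items_[it->second].second;
    }

    // Iterate over this to visit elements in insertion order.
    const std::vector<std::pair<std::string, int>>& items() const { return items_; }

private:
    std::vector<std::pair<std::string, int>> items_;
    std::unordered_map<std::string, std::size_t> index_;
};
```

Inserting one, two, three, four, five and then looping over items() visits the keys in exactly that order, while find("three") still avoids a linear scan.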

What is the best way to use a HashMap in C++?

I know that STL has a HashMap API, but I cannot find any good and thorough documentation with good examples regarding this.
Any good examples will be appreciated.

The standard library includes the ordered and the unordered map (std::map and std::unordered_map) containers. In an ordered map (std::map) the elements are sorted by key, and insert and access are O(log n). Usually the standard library internally uses red-black trees for ordered maps, but this is just an implementation detail. In an unordered map (std::unordered_map) insert and access are O(1) on average. It is just another name for a hash table.
An example with (ordered) std::map:
#include <map>
#include <string>
#include <iostream>
#include <cassert>
int main(int argc, char **argv)
{
std::map<std::string, int> m;
m["hello"] = 23;
// check if key is present
if (m.find("world") != m.end())
std::cout << "map contains key world!\n";
// retrieve
std::cout << m["hello"] << '\n';
std::map<std::string, int>::iterator i = m.find("hello");
assert(i != m.end());
std::cout << "Key: " << i->first << " Value: " << i->second << '\n';
return 0;
}
Output:
23
Key: hello Value: 23
If you need ordering in your container and are fine with the O(log n) runtime then just use std::map.
Otherwise, if you really need a hash table (average O(1) insert/access), check out std::unordered_map, which has an API similar to std::map (e.g. in the above example you just have to search and replace map with unordered_map).
The unordered_map container was introduced with the C++11 standard revision. Thus, depending on your compiler, you have to enable C++11 features (e.g. when using GCC 4.8 you have to add -std=c++11 to the CXXFLAGS).
Even before the C++11 release GCC supported unordered_map - in the namespace std::tr1. Thus, for old GCC compilers you can try to use it like this:
#include <tr1/unordered_map>
std::tr1::unordered_map<std::string, int> m;
It is also part of boost, i.e. you can use the corresponding boost-header for better portability.
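As a small illustration of the unordered_map API described above, here is a sketch of a word counter. The function name count_words is made up for this example, and it assumes a C++11 or later compiler.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Counts how often each word occurs. operator[] value-initializes a
// missing entry to 0 before the increment, just as it does for std::map.
std::unordered_map<std::string, int> count_words(const std::vector<std::string>& words) {
    std::unordered_map<std::string, int> counts;
    for (const std::string& w : words) {
        ++counts[w];
    }
    return counts;
}
```

The result can be queried with find(), at() or count() exactly as with std::map; only the iteration order differs, since it is unspecified for the unordered container.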

A hash_map is an older, unstandardized version of what for standardization purposes is called an unordered_map (originally in TR1, and included in the standard since C++11). As the name implies, it's different from std::map primarily in being unordered -- if, for example, you iterate through a map from begin() to end(), you get items in order by key [1], but if you iterate through an unordered_map from begin() to end(), you get items in a more or less arbitrary order.
An unordered_map is normally expected to have constant complexity. That is, an insertion, lookup, etc., typically takes essentially a fixed amount of time, regardless of how many items are in the table. A std::map has complexity that's logarithmic in the number of items being stored -- which means the time to insert or retrieve an item grows, but quite slowly, as the map grows larger. For example, a lookup among 1 million items needs on the order of 20 comparisons, and each doubling of the map adds roughly one more: about 21 for 2 million items, 22 for 4 million items, 23 for 8 million items, etc.
From a practical viewpoint, that's not really the whole story though. By nature, a simple hash table has a fixed size. Adapting it to the variable-size requirements for a general purpose container is somewhat non-trivial. As a result, operations that (potentially) grow the table (e.g., insertion) are potentially relatively slow (that is, most are fairly fast, but periodically one will be much slower). Lookups, which cannot change the size of the table, are generally much faster. As a result, most hash-based tables tend to be at their best when you do a lot of lookups compared to the number of insertions. For situations where you insert a lot of data, then iterate through the table once to retrieve results (e.g., counting the number of unique words in a file) chances are that an std::map will be just as fast, and quite possibly even faster (but, again, the computational complexity is different, so that can also depend on the number of unique words in the file).
[1] Where the order is defined by the third template parameter when you create the map, std::less<T> by default.

Here's a more complete and flexible example that doesn't omit the necessary includes, so it compiles without errors:
#include <iostream>
#include <unordered_map>
class Hashtable {
std::unordered_map<const void *, const void *> htmap;
public:
void put(const void *key, const void *value) {
htmap[key] = value;
}
const void *get(const void *key) {
return htmap[key];
}
};
int main() {
Hashtable ht;
ht.put("Bob", "Dylan");
int one = 1;
ht.put("one", &one);
std::cout << (char *)ht.get("Bob") << "; " << *(int *)ht.get("one");
}
Still not particularly useful for keys, unless they are predefined as pointers, because a matching value won't do! (However, since I normally use strings for keys, substituting "string" for "const void *" in the declaration of the key should resolve this problem.)
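The substitution mentioned in the parenthesis might look like the sketch below. StringHashtable is a hypothetical name, and get() is changed to use find() so that looking up a missing key returns nullptr instead of inserting an entry.

```cpp
#include <string>
#include <unordered_map>

class StringHashtable {
    // Keys are compared by content, not by pointer identity.
    std::unordered_map<std::string, const void*> htmap;

public:
    void put(const std::string& key, const void* value) {
        htmap[key] = value;
    }

    // Returns nullptr when the key is absent; never modifies the table.
    const void* get(const std::string& key) const {
        auto it = htmap.find(key);
        return it == htmap.end() ? nullptr : it->second;
    }
};
```

With std::string keys, two separately built strings holding the text "one" refer to the same entry, which the pointer-keyed version cannot guarantee.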

Evidence that std::unordered_map uses a hash map in GCC libstdc++ 6.4
This was mentioned at: https://stackoverflow.com/a/3578247/895245 but in the following answer: What data structure is inside std::map in C++? I have given further evidence of this for the GCC libstdc++ 6.4 implementation by:
GDB step debugging into the class
performance characteristic analysis
How to use a custom class and hash function with unordered_map
This answer nails it: C++ unordered_map using a custom class type as the key
Excerpt: equality:
struct Key
{
std::string first;
std::string second;
int third;
bool operator==(const Key &other) const
{ return (first == other.first
&& second == other.second
&& third == other.third);
}
};
Hash function:
namespace std {
template <>
struct hash<Key>
{
std::size_t operator()(const Key& k) const
{
using std::size_t;
using std::hash;
using std::string;
// Compute individual hash values for first,
// second and third and combine them using XOR
// and bit shifting:
return ((hash<string>()(k.first)
^ (hash<string>()(k.second) << 1)) >> 1)
^ (hash<int>()(k.third) << 1);
}
};
}

For those of us trying to figure out how to hash our own classes whilst still using the standard template, there is a simple solution:
1. In your class you need to define an equality operator overload ==. If you don't know how to do this, GeeksforGeeks has a great tutorial: https://www.geeksforgeeks.org/operator-overloading-c/
2. Under the std namespace, declare a template specialization of struct hash with your class name as the type (see below). I found a great blog post that also shows an example of calculating hashes using XOR and bit shifting; that is outside the scope of this question, but the post includes detailed instructions on writing such hash functions: https://prateekvjoshi.com/2014/06/05/using-hash-function-in-c-for-user-defined-classes/
namespace std {
template<>
struct hash<my_type> {
size_t operator()(const my_type& k) const {
// Do your hash function here
...
}
};
}
So then to implement a hash table using your new hash function, you just have to create a std::unordered_map just like you would normally do and use my_type as the key; the standard library will automatically use the hash function you defined before (in step 2) to hash your keys. (A std::map does not hash its keys; it orders them with operator< instead.)
#include <unordered_map>
int main() {
std::unordered_map<my_type, other_type> my_map;
}
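If you would rather not add a specialization to namespace std, an alternative is to pass the hasher as the third template argument of std::unordered_map. The sketch below uses a hypothetical Point type and PointHash functor; the XOR-and-shift combination is only a simple illustration, not a recommendation for a high-quality hash.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

struct Point {
    int x;
    int y;
    // Equality is still required so the map can resolve hash collisions.
    bool operator==(const Point& other) const {
        return x == other.x && y == other.y;
    }
};

// Hasher passed explicitly to the container instead of specializing std::hash.
struct PointHash {
    std::size_t operator()(const Point& p) const noexcept {
        std::size_t hx = std::hash<int>()(p.x);
        std::size_t hy = std::hash<int>()(p.y);
        return hx ^ (hy << 1);  // simple combination, for illustration only
    }
};

typedef std::unordered_map<Point, std::string, PointHash> PointNames;
```

A PointNames object is then used like any other unordered_map, for example names[Point{1, 2}] = "a";.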
