How to find a pair from set using only the second value? - c++

I want to find the pair using the second element only and the first element could be anything, also all of the second elements are unique.
Code using std::find_if but this takes linear time
set<pair<int,int> > s;
s.insert(make_pair(3,1));
s.insert(make_pair(1,0));
auto it = find_if(s.begin(),s.end(),[value](const pair<int,int>& p ){ return p.second == value; });
if (it == s.end())
    s.insert(make_pair(1, value));
else {
    int v = it->first;
    s.erase(it);
    s.insert(make_pair(v + 1, value));
}
I want to use the find member function of std::set so that the lookup takes logarithmic time.

There is no data structure that does exactly what you want.
However databases do something similar. They call it Index Skip Scanning. To implement the same without starting from scratch, you could implement a std::map from the first thing in the pair to a std::map of the second thing in the pair. And now a lookup of a single pair is logarithmic in time, lookup of the things with a given first entry is also logarithmic in time (though iterating through those things may be slower), and lookup of the things with the second entry is linear in the number of first values you have, times logarithmic in the number of second values that you have.
Do note that this is only worthwhile if you have a very large number of pairs, and relatively few values for the first entry in the pair. And furthermore you are constantly changing data (so maintaining multiple indexes is a lot of overhead), and only rarely doing a lookup on the second value in the pair. Break any of those assumptions and the overhead is not worth it.
That is a rather specific set of assumptions to satisfy. It comes up far more often for databases than C++ programmers. Which is why most databases support the operation, and the standard library of C++ does not.
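A minimal sketch of the nested-map idea described above, assuming C++17. The class name PairIndex and its member names are made up for illustration, and the inner container is a std::set rather than a std::map because the pairs here carry no extra payload.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <utility>

// Hypothetical wrapper: pairs are stored as first -> set of seconds.
class PairIndex {
public:
    void insert(int first, int second) { by_first_[first].insert(second); }

    // Exact pair lookup: two logarithmic probes.
    bool contains(int first, int second) const {
        auto it = by_first_.find(first);
        return it != by_first_.end() && it->second.count(second) > 0;
    }

    // "Skip scan": one logarithmic probe per distinct first value,
    // so this is linear in the number of distinct first values.
    std::optional<std::pair<int, int>> find_by_second(int second) const {
        for (const auto& [first, seconds] : by_first_) {
            if (seconds.count(second) > 0) {
                return std::make_pair(first, second);
            }
        }
        return std::nullopt;
    }

private:
    std::map<int, std::set<int>> by_first_;
};
```

With few distinct first values, find_by_second touches only a handful of inner sets. With many distinct first values it degrades toward the linear scan from the question.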

Related

What is the fastest way to check if a value exists in std::map?

What is the fastest way to check if a value exists in a std::map<int, int>? Should I use an unordered map? In this task I can't use any libraries other than std.
Right now, I don't know any way to do this without checking all values.
The fastest way is to not do it. Don't look for values in maps, look for keys in maps.
If you need to search for a value, use another data structure (or a separate map).
The only way to search for a value in a map is linearly (O(N)), and because iterating over the map's node-based structure is not cache-friendly, it's going to be even slower than iterating over e.g. a vector.
Unless you have very big data sets (over 100'000 elements or so), access times into maps should not bother you; they will be minuscule in either case, because your key is just an int.
To check whether a value exists in a map, just iterate over it or use std::find_if. It doesn't really matter which you choose; it is going to be linear (O(n)) anyway.
// Both examples use `auto` function parameters (abbreviated function templates), which need C++20;
// the structured binding in the first example needs C++17.
#include <algorithm>

// simple loop variant
bool is_value_exists(auto const &map, int val) {
    for (auto const &[key, value] : map) {
        if (value == val) return true;
    }
    return false;
}

// std::find_if variant
bool is_value_exists2(auto const &map, int val) {
    return std::find_if(
        map.begin(),
        map.end(),
        [val](auto const &kv) { return kv.second == val; }
    ) != map.end();
}
You can find an element by key using the find method. Maps are usually implemented as red-black trees, which have logarithmic search complexity.
If you need to search by value, you could create a reverse map: a map which has the values of the initial map as keys and the corresponding keys as values. You can then look up the value as a key in the second map, which will yield the original key. However, rebuilding the reverse map costs both time and storage, so you should only do it if you are going to search multiple times.
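The reverse map could be built roughly like this, assuming C++17 and a std::map<int, int>. The function name make_reverse_map is made up for illustration, and the tie-breaking rule (smallest key wins when several keys share a value) is an assumption of this sketch, not something the answer specifies.

```cpp
#include <cassert>
#include <map>

// Builds value -> key from key -> value.
// std::map::insert does not overwrite, and the forward map is walked in
// ascending key order, so the smallest key wins for duplicated values.
std::map<int, int> make_reverse_map(const std::map<int, int>& forward) {
    std::map<int, int> reverse;
    for (const auto& [key, value] : forward) {
        reverse.insert({value, key});
    }
    return reverse;
}
```

Building costs O(N log N) once; each later search by value is then an ordinary O(log N) find on the reverse map.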
Regarding the values, a map is not really different from a list or a vector. So exhaustive (linear) search is the fastest way.
Whenever you insert a key-value pair into the std::map instance, also store the iterator to that element in a std::vector<iterator_...> variable.
myVec[value]=myMap.find(key); // at time of inserting a new key
This way, you can use the value as an index into the vector and access the map content directly through the stored iterator, after checking that the slot is actually occupied (an iterator cannot be compared to nullptr, so use a sentinel such as myMap.end()).
The biggest downside is the extra bookkeeping required when you remove keys from the map. If removal is frequent, or if the expected value range is too big (like all 32 bits) for a vector to be memory-efficient, you can use a second map in place of the vector.
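A rough sketch of the vector-of-iterators idea, assuming C++17, unique values, and values in the range 0..max_value. The class ValueIndexedMap and its members are hypothetical names; empty slots hold the map's end iterator as a sentinel.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <vector>

class ValueIndexedMap {
public:
    using Map = std::map<int, int>;

    explicit ValueIndexedMap(std::size_t max_value)
        : by_value_(max_value + 1, map_.cend()) {}

    // Copying would leave the stored iterators pointing into the old map.
    ValueIndexedMap(const ValueIndexedMap&) = delete;
    ValueIndexedMap& operator=(const ValueIndexedMap&) = delete;

    void insert(int key, int value) {
        assert(value >= 0 && static_cast<std::size_t>(value) < by_value_.size());
        auto [it, inserted] = map_.insert({key, value});
        if (inserted) by_value_[value] = it;  // remember where this value lives
    }

    // The extra bookkeeping on removal: clear the slot before erasing.
    void erase(int key) {
        Map::const_iterator it = map_.find(key);
        if (it == map_.cend()) return;
        if (by_value_[it->second] == it) by_value_[it->second] = map_.cend();
        map_.erase(it);
    }

    // O(1): index by value, then check the sentinel.
    std::optional<int> key_for_value(int value) const {
        if (value < 0 || static_cast<std::size_t>(value) >= by_value_.size()) {
            return std::nullopt;
        }
        Map::const_iterator it = by_value_[value];
        if (it == map_.cend()) return std::nullopt;
        return it->first;
    }

private:
    Map map_;                                // declared first: by_value_ uses map_.cend()
    std::vector<Map::const_iterator> by_value_;
};
```

Storing iterators is safe here because std::map iterators stay valid until the element they refer to is erased.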
You can also use the map-iteration-search as a backing-store of a direct-mapped cache (which works the fastest for integer keys(values here)). All cache-hits would be served at the cost of just a bitwise & operation with some value like 8191, 4095, etc (O(1)). All cache-misses would still require a full iteration of the map elements that is slow (O(N)).
So, if the cache-hit ratio is close to 100%, it can approach O(1), otherwise it will be O(N) that is slow.
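The direct-mapped cache could look something like the following sketch, assuming C++17. The class CachedValueLookup, the slot count of 4096, and the scans() counter are all illustrative choices. The sketch also assumes the map is not modified while the cache is alive, since it does no invalidation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>

class CachedValueLookup {
public:
    explicit CachedValueLookup(const std::map<int, int>& m) : map_(m) {}

    std::optional<int> key_for_value(int value) {
        // Power-of-two slot count, so a bitwise AND picks the slot.
        Slot& slot = cache_[static_cast<unsigned>(value) & (kSlots - 1)];
        if (slot.used && slot.value == value) return slot.key;  // cache hit, O(1)

        ++scans_;  // cache miss: fall back to the linear scan, O(N)
        for (const auto& [k, v] : map_) {
            if (v == value) {
                slot = Slot{true, value, k};  // overwrite whatever was in the slot
                return k;
            }
        }
        return std::nullopt;
    }

    std::size_t scans() const { return scans_; }

private:
    static constexpr std::size_t kSlots = 4096;
    struct Slot {
        bool used = false;
        int value = 0;
        int key = 0;
    };

    const std::map<int, int>& map_;
    std::array<Slot, kSlots> cache_{};
    std::size_t scans_ = 0;
};
```

Repeated lookups of the same value are served from the slot without touching the map. Values that are absent from the map are never cached in this sketch, so they always pay for a full scan.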

Unordered_map find followed by emplace or just insert for maximum efficiency?

So I've been doing research on the efficiency of the orderings of different unordered_map function calls. Here are two possible workings out of the same code.
Note: keywordMap is an unordered map that maps strings to vector of a home-made struct (which is the type of e). This is done in a loop.
First option:
auto curKeyWord = someString;
auto curEntryPair = keywordMap.insert(
    make_pair( move(curKeyWord), vector<entry*>{e} ) );
if (!curEntryPair.second){//insertion failed
    curEntryPair.first->second.push_back(e);
}
Second option:
auto curKeyWord = someString;
auto curEntry = keywordMap.find(curKeyWord);
if( curEntry == end(keywordMap) ){//DNE in map
    keywordMap.emplace( make_pair( move(curKeyWord), vector<entry*>{e} ) );
}
else{
    curEntry->second.push_back(e);
}
I am interested in which of these blocks of code is faster. The question really boils down to how .insert works. If insert basically works as finding where the key should be and inserting it if it doesn't exist, then the first should be faster, as it is just a single probe. Once I've called insert, I have everything I need to call push_back should the insert not have done anything. It also is, however, significantly uglier. I'm also curious if insert has the same problem emplace does, where it constructs the element before checking whether or not the key exists in the map already.
It is possible that I will have to benchmark these two pieces of code, but I am wondering if there is any piece of information that I am missing that would tell me the answer now.
Generally speaking, the first bit of code is faster because it performs a single lookup.
Here is the usual way of doing insert/overwrite in map:
auto rv = map.insert(std::make_pair(key, value));
if (!rv.second)
    rv.first->second = value;
The point here is that std::map is usually implemented as a balanced binary tree (google: red/black tree), so both insert() and find() take O(log(n)) steps. I.e. the container has a natural internal order, and insertion must place new items at their correct position (that is why the keys must have a strict weak ordering).
std::unordered_map uses hashing, so a lookup is O(1) on average (i.e. when there is roughly one item per bucket). Once collisions pile up (i.e. when you have k items in a bucket), each lookup takes O(k) steps.
Now, going back to the original question - doing less work is always better. The difference in the std::unordered_map case is very small (the second lookup happens in O(1) steps in the default case).
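For the append-or-create case in the question, a single-probe version can be sketched with try_emplace, assuming C++17 is available. The entry struct and the add_entry helper are stand-ins made up for illustration; only keywordMap's shape comes from the question.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct entry { int id; };  // hypothetical stand-in for the home-made struct

using KeywordMap = std::unordered_map<std::string, std::vector<entry*>>;

// One hash lookup: try_emplace inserts an empty vector only if the key is
// missing, and either way returns an iterator to the element for that key.
void add_entry(KeywordMap& keywordMap, std::string keyword, entry* e) {
    auto [it, inserted] = keywordMap.try_emplace(std::move(keyword));
    (void)inserted;  // not needed: we append in both cases
    it->second.push_back(e);
}
```

Unlike emplace, try_emplace does not construct the mapped value (or move from the key) when the key is already present, which addresses the concern raised in the question.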

Accessing adjacent elements of a map in c++

Suppose I have a float-integer map m:
m[1.23] = 3
m[1.25] = 34
m[2.65] = 54
m[3.12] = 51
Imagine that I know that there's a mapping between 2.65 and 54, but I don't know about any other mappings.
Is there any way to visit the adjacent mappings without iterating from the beginning or searching using the find function?
In other words: can I directly access the adjacent values by just knowing about a single mapping...such as m[2.65]=54?
UPDATE Perhaps a more important "point" than my answer, brought up by @MattMcNabb:
Floating point keys in std::map
Can I directly access the adjacent values by just knowing about a single mapping (m[2.65]=54)
Yes. std::map is an ordered collection; which is to say that if an operator< exists (more generally, std::less) for the key type, you can expect it to have sorted access. In fact, you won't be able to make a map for a key type if it doesn't have this comparison operator available (unless you pass in a predicate to perform this comparison as a template argument).
Note there is also a std::unordered_map which is often preferable for cases where you don't need this property of being able to navigate quickly between "adjacent" map entries. However you will need to have std::hash defined in that case. You can still iterate it, but adjacency of items in the iteration won't have anything to do with the sort order of the keys.
UPDATE also due to @MattMcNabb
Is there any way to visit the adjacent mappings without iterating from the beginning or searching using the find function?
You allude to array notation, and the general answer here would be "not really". Which is to say there is no way of saying:
if (not m[2.65][-2]) {
std::cout << "no element 2 steps prior to m[2.65]";
} else {
std::cout << "the element 2 before m[2.65] is " << *m[2.65][-2];
}
While no such notational means exist, the beauty (and perhaps the horror) of C++ is that you could write an augmentation of map that did that. Though people would come after you with torches and pitchforks. Or maybe they'd give you cult status and put your book on the best seller list. It's a fine line--but before you even try, count the letters and sequential consonants in your last name and make sure it's a large number.
What you need to access the ordering is an iterator. And find will get you one; and all the flexibility that it affords.
If you only use the array notation to read or write from a std::map, it's essentially a less-capable convenience layer built above iterators. So unless you build your own class derived from map, you're going to be stuck with the limits of that layer. The notation provides no way to get information about adjacent values...nor does it let you test for whether a key is in the map or not. (With find you can do this by comparing the result of a lookup to end(m) if m is your map.)
Technically speaking, find gives you the same effect as you could get by walking through the iterators front-to-back or back-to-front and comparing, as they are sorted. But that would be slower if you're seeking arbitrary elements. All the containers have a kind of algorithmic complexity guarantee that you can read up on.
When dereferencing an iterator, you will receive a pair whose first element is the key and second element is the value. The value will be mutable, but the key is constant. So you cannot find an element, then navigate to an adjacent element, and alter its key directly...just its value.
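To make the iterator approach concrete, here is a small sketch assuming C++17. The helper names neighbor_before and neighbor_after are made up. Each one does a single find and then steps one element with std::prev or std::next, guarding against the ends of the map.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <optional>
#include <utility>

using FloatMap = std::map<double, int>;

// Entry just before `key` in sort order, if `key` is present and not first.
std::optional<std::pair<double, int>> neighbor_before(const FloatMap& m, double key) {
    auto it = m.find(key);
    if (it == m.end() || it == m.begin()) return std::nullopt;
    auto p = std::prev(it);
    return std::make_pair(p->first, p->second);
}

// Entry just after `key` in sort order, if `key` is present and not last.
std::optional<std::pair<double, int>> neighbor_after(const FloatMap& m, double key) {
    auto it = m.find(key);
    if (it == m.end()) return std::nullopt;
    auto n = std::next(it);
    if (n == m.end()) return std::nullopt;
    return std::make_pair(n->first, n->second);
}
```

The find costs O(log n) and the step to the neighbor is cheap. Note that find on a floating point key only succeeds for a bit-identical value, which is the caveat about floating point keys mentioned above.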

Efficient removal of a set of integers from another set

I have a (large) set of integers S, and I want to run the following pseudocode:
set result = {};
while(S isn't empty)
{
    int i = S.getArbitraryElement();
    result.insert(i);
    set T = elementsToDelete(i);
    S = S \ T; // set difference
}
The function elementsToDelete is efficient (sublinear in the initial size of S) and the size of T is small (assume it's constant). T may contain integers no longer in S.
Is there a way of implementing the above that is faster than O(|S|^2)? I suspect I should be able to get O(|S| k), where k is the time complexity of elementsToDelete. I can of course implement the above in a straightforward way using std::set_difference but my understanding is that set_difference is O(|S|).
Using std::set<int> S;, you can do:
for (auto k : elementsToDelete(i)) {
    S.erase(k);
}
Of course the lookup for erase is O(log(S.size())), not the O(1) you're asking for. That can be achieved with std::unordered_set, assuming not too many collisions (which is a big assumption in general but very often true in particular).
Despite the name, the std::set_difference algorithm doesn't have much to do with std::set. It works on anything you can iterate in order. Anyway it's not for in-place modification of a container. Since T.size() is small in this case, you really don't want to create a new container each time you remove a batch of elements. In another example where the result set is small enough, it would be more efficient than repeated erase.
std::set_difference in the C++ library has a time complexity of O(|S|), so it is not a good fit for your purposes. I advise you to use S.erase() instead, which deletes an element from S in O(log N) because std::set is implemented as a balanced binary search tree. Your overall time complexity is then reduced to O(N log N).
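Putting the erase-based answers together, the whole loop from the question might be sketched as follows, assuming C++11 or later and std::unordered_set. The function name pick_and_prune is hypothetical, and elementsToDelete is passed in as a callable. The sketch also erases the picked element i explicitly, which is an assumption added here so that the loop terminates even if T does not contain i.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Repeatedly picks an arbitrary element, records it, and erases it together
// with everything elements_to_delete(i) returns. Each erase is O(1) on
// average, so the loop does O(|S|) picks plus the cost of the callbacks.
template <class ElementsToDelete>
std::vector<int> pick_and_prune(std::unordered_set<int> s,
                                ElementsToDelete elements_to_delete) {
    std::vector<int> result;
    while (!s.empty()) {
        const int i = *s.begin();  // "arbitrary" element
        result.push_back(i);
        s.erase(s.begin());        // assumption: i itself always leaves S
        for (int k : elements_to_delete(i)) {
            s.erase(k);            // erasing a key that is not in S is a no-op
        }
    }
    return result;
}
```

Which element *s.begin() yields is unspecified for std::unordered_set, so the order and choice of picked elements can differ between runs and implementations.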

Remove elements from first set element which second set contains without iteration

I have two sets of pairs ( I cannot use c++11)
std::set<std::pair<int,int> > first;
std::set<std::pair<int,int> > second;
and I need to remove from the first set all elements that are also in the second set. I can do this by iterating through the second set and erasing each of its elements from the first set if present, but I wonder whether there is a way to do this without iteration?
If I understand correctly, basically you want to calculate the difference of first and second. There is an <algorithm> function for that.
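// needs <algorithm> and <iterator>; "> >" is written with a space because ">>" does not parse before C++11
std::set<std::pair<int, int> > result;
std::set_difference(first.begin(), first.end(), second.begin(), second.end(), std::inserter(result, result.end()));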
Yes, you can.
If you want to remove, not just detect, there is another <algorithm> function, remove_copy_if():
http://www.cplusplus.com/reference/algorithm/remove_copy_if/
imho. It's not so difficult to understand how it works.
I wonder is there way to do this without iteration.
No. Internally, sets are balanced binary trees - there's no way to operate on them without iterating over the structure. (I assume you're interested in the efficiency of the implementation, not the convenience of the code, so I've deliberately ignored library routines that must iterate internally.)
Sets are sorted, though, so you could iterate over both at once, removing as you go (so the number of operations is the sum of the set sizes), instead of iterating over one set and doing a lookup for each element (where the number of operations is the number of elements you're iterating over times log base 2 of the number of elements in the other set). Only if one of your sets is much smaller than the other will the iterate/find approach win out. If you look at the implementation of your library's set_difference function (mentioned in Amen's answer), it should show you how to do the two iterations nicely.
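The simultaneous walk over both sorted sets could be sketched like this. It is written to compile without C++11, as the question requires; the function name erase_common and the PairSet typedef are made up for illustration.

```cpp
#include <cassert>
#include <set>
#include <utility>

typedef std::set<std::pair<int, int> > PairSet;

// Walks both sorted sets in step and erases from `first` every element that
// also appears in `second`. The number of steps is at most
// first.size() + second.size().
void erase_common(PairSet& first, const PairSet& second) {
    PairSet::iterator a = first.begin();
    PairSet::const_iterator b = second.begin();
    while (a != first.end() && b != second.end()) {
        if (*a < *b) {
            ++a;               // *a is not in second: keep it
        } else if (*b < *a) {
            ++b;               // *b is not in first: skip it
        } else {
            first.erase(a++);  // equal: erase, advancing before invalidation
            ++b;
        }
    }
}
```

Erasing through first.erase(a++) keeps the loop valid in C++03, where set::erase returns void: the iterator is advanced before the erased node is destroyed.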
If you want something more efficient, you need to think about how to achieve that earlier: for example, storing your pairs as flags in identically sized two-dimensional matrices, such that you can AND the first with the negation of the second. Whether that's practical depends on the range of int values you're storing, whether the amount of memory needed is OK for your purposes....