C++ Multimap manipulation - c++

I have created a multimap because I have repeating keys. I want to do an efficient manipulation so that I can generate a new multimap in which each value is combined with a value from the next higher key. This is what I mean:
This is what I have:
key values
11 qwer
11 mfiri
21 iernr
21 ghfnfjf
43 dnvfrf
This is what I want to achieve:
key values
11 qwer,iernr
11 mfiri,iernr
21 iernr,dnvfrf
21 ghfnfjf,dnvfrf
43 dnvfrf
I have about 10 million entries so I am looking for something efficient.
In above value "qwer,iernr" is one string.

Here's a simple way to do it:
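// Assumes the map is not empty; cur->first is invalid otherwise.
auto cur = map.begin();
auto next = map.upper_bound(cur->first);
for(; next != map.end(); next = map.upper_bound(cur->first))
{
// Append the first value of the next higher key to every value of the current key.
for(; cur != next; ++cur)
{
cur->second += ",";
cur->second += next->second;
}
}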
... given a std::multimap<int, std::string> map;
However, any operation transforming 10m+ elements isn't going to be super fast.

It looks like the straightforward way would work fine. Map elements are laid out in ascending key order (assuming the comparison operator suits you). So just go through the equal ranges and append to each value the value of the element just after the range.
Clone the map (if you need the original), take the first element, get equal_range() for its key, and modify the values in that range with the value at the range's second iterator (unless that iterator is end()). Then get equal_range() for the key at the second iterator. Repeat.

I agree with Eugene. Also see the following reference on equal_range():
stl::multimap - how do i get groups of data?

To do this, you need to simply iterate through the map, while building the new map in order.
You can do this in two levels:
for (auto it=map.cbegin(); it != map.cend(); )
{
// The inner loop is over all entries having the same key
auto next_key_it=find_next_key_after(it);
for (; it != next_key_it; ++it) {
new_map.emplace_hint(new_map.end(), it->first, new_value(it->second, next_key_it));
}
}
The new_value function (or lambda) does the value transformation (or not, if the second parameter is map.end()).
The find_next_key_after(it) function returns the same as map.upper_bound(it->first), but could also be implemented as linear search for the first entry with different key.
It depends on your (expected) key distribution, which to use - if keys repeat a small, limited number of times, linear search is better; if the number of different keys is limited, with large equal key ranges, then upper_bound may be better.
For guaranteed complexity, linear search is better: The whole algorithm then has O(n) complexity. Which is as efficient as you can get.
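The two-level loop above can be sketched as a complete function. This is only a sketch: the name align_with_next_key is made up for illustration, the linear search stands in for find_next_key_after, and the string concatenation stands in for new_value, assuming the desired output is "value,first value of the next key" as in the question.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical helper: builds a new multimap in which every value is extended
// with the first value stored under the next higher key. Entries under the
// highest key are copied unchanged. The input is already sorted, so one pass
// over it is enough.
std::multimap<int, std::string>
align_with_next_key(const std::multimap<int, std::string>& map)
{
    std::multimap<int, std::string> new_map;
    for (auto it = map.cbegin(); it != map.cend(); ) {
        // Linear search for the first entry with a different key.
        auto next_key_it = it;
        while (next_key_it != map.cend() && next_key_it->first == it->first)
            ++next_key_it;

        // Inner loop over all entries that share the current key.
        for (; it != next_key_it; ++it) {
            std::string value = it->second;
            if (next_key_it != map.cend()) {
                value += ',';
                value += next_key_it->second;
            }
            // Hinting at end() keeps each insertion amortized constant time,
            // because keys arrive in ascending order.
            new_map.emplace_hint(new_map.end(), it->first, std::move(value));
        }
    }
    return new_map;
}
```

Calling it on the multimap from the question should give the five entries listed under "what I want", in the same order.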

Related

How to access a range/interval of keys in an ordered map in C++?

I am trying to write an if-condition where I want to execute code depending on which elements of a map are accessed, e.g. for a map with 100 elements only for elements 26 to 74. But I do not want to address specific keys but rather a certain fraction of the map. Should I do this with the [] operator? I tried
if(map.size()/4 < iterator < map.size()*3/4){}
but this does not work.
Just get iterators to the beginning and end of the range you want to inspect like
auto it = std::next(map.begin(), map.size() / 4);
auto end = std::next(map.begin(), map.size() * 3 / 4);
and then you can iterate that range like
for (; it != end; ++it)
{
// do stuff here
}
You don't even need the end iterator if you want to use a counter. This saves you the cost of advancing the end iterator through the map, which could make a difference, especially on larger maps. That would look like
auto it = std::next(map.begin(), map.size() / 4);
auto end = map.size() / 2;
for (size_t counter = 0; counter < end; ++it, ++counter)
{
// do stuff here
}
You cannot efficiently get this information from a std::map iterator, because they are not random access (but only bidirectional). In other words, given a std::map iterator, you can find out how many entries are before and after it only by decrementing/incrementing it until you are at the start/end. If you do this for every entry (e.g. in some <algorithm> function), that's risking a performance bottleneck - which may be acceptable in your situation, but you should be aware of it.
If you can do what NathanOliver suggests, that's great, but there are situations where it might not be that easy (e.g. non-contiguous ranges).

How does insertion in an unordered_map in C++ work?

int main()
{
auto n=0, sockNumber=0, pairs=0;
unordered_map<int, int> numberOfPairs;
cin >> n;
for(int i=0; i<n; ++i)
{
cin >> sockNumber;
numberOfPairs.insert({sockNumber, 0}); // >>>>>> HERE <<<<<<
numberOfPairs.at(sockNumber) += 1;
if(numberOfPairs.at(sockNumber) % 2 == 0)
{
pairs += 1;
}
}
cout << pairs;
return 0;
}
This code counts the number of pairs in the given input and prints it. I want to know how the insert method of an unordered_map works. Every time I see a number, I've inserted it with a value '0'.
Does the insert method skip inserting the value '0' when it sees the same number again? How does it work?
Input -
9
10 20 20 10 10 30 50 10 20
Output -
3
Does the insert method skip inserting the value '0' when it sees the
same number again?
Yes, it does.
From the cppreference.com page for unordered_map:
Unordered map is an associative container that contains key-value
pairs with unique keys. Search, insertion, and removal of elements
have average constant-time complexity.
And from the cppreference.com page for unordered_map::insert:
Inserts element(s) into the container, if the container doesn't
already contain an element with an equivalent key.
How does it work?
I suppose that the exact details depend on the particular STL implementation.
Basically, unordered_map is implemented as a hash table where elements are organized into buckets according to their hash. When you try to insert a key-value pair, the hash of the key is computed. If there is no bucket for that hash yet, or the bucket for that hash contains no element with an equal key, the new pair is inserted into the unordered_map. Otherwise the container is left unchanged.
A std::unordered_map holds elements with unique keys. If you want to keep inserting the same key, then use std::unordered_multimap.
Also, you should realize that std::unordered_map::insert returns a value that denotes whether the insertion was successful.
if ( !numberOfPairs.insert({sockNumber, 0}).second )
{
// insertion didn't work
}
You could have used the above to confirm that the item wasn't inserted, since the same key existed already in the map.
unordered_map does not allow duplicate keys, so if you use the .insert() method with a key that already exists, the insertion fails and the existing element is left untouched. However, if you use unorderedMap[key] = value with an existing key, the operation is not skipped: the value stored under that key is overwritten with the new value.
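The difference between insert() and operator[] can be sketched as below. The function names count_pairs and insert_versus_subscript are made up for illustration; count_pairs follows the loop from the question but reuses the iterator that insert() returns instead of calling at() afterwards.

```cpp
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: counts pairs as in the question. insert() only adds
// {sock, 0} the first time a key is seen and leaves existing counts alone.
int count_pairs(const std::vector<int>& socks)
{
    std::unordered_map<int, int> seen;
    int pairs = 0;
    for (int sock : socks) {
        // .first always points at the element for this key;
        // .second tells whether it was newly inserted.
        auto result = seen.insert({sock, 0});
        int& count = result.first->second;
        ++count;
        if (count % 2 == 0)
            ++pairs;
    }
    return pairs;
}

// Hypothetical helper: returns the value stored under key 1 after a second
// insert() and after an assignment through operator[].
std::pair<int, int> insert_versus_subscript()
{
    std::unordered_map<int, int> m;
    m.insert({1, 5});
    m.insert({1, 0});               // ignored: key 1 already exists
    int after_insert = m.at(1);     // value is unchanged
    m[1] = 0;                       // overwrites the stored value
    int after_subscript = m.at(1);
    return {after_insert, after_subscript};
}
```

With the input from the question (10 20 20 10 10 30 50 10 20), count_pairs should return 3, matching the output shown there.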

Find which element is not sorted in a list

I have a list filled with the numbers 3, 7, 10, 8, 12. I'd like to write a line that will tell me which element in the list is not sorted (in this case it is the 4th element). However, the code I have right now tells me the value of the 4th element (8). Is there a way I can rewrite this to tell me it's the 4th element rather than the number 8?
Here is the code I have now:
list<int>::iterator i;
if (!is_sorted(myList.begin(), myList.end())) {
i = is_sorted_until(myList.begin(), myList.end());
cout << *i << endl;
}
The first thing I should say, is that if you care about numerical position, you should be using a random access container, such as std::vector. Then your job would be simple:
// calling is_sorted is a waste if you're about to call is_sorted_until
auto i = is_sorted_until(my_vector.begin(), my_vector.end());
if (i != my_vector.end())
cout << (i - my_vector.begin());
If you must use a list, and you still need the position, then you should write your own algorithm which provides this information. It really shouldn't be that hard; it's just a for loop comparing each element to the one that precedes it. When you find one which compares less than the one which precedes it, you've found your element. Just keep an integer count alongside it, and you're good.
The obvious way would be to simply search for an element that's less than the element that preceded it.
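// Assumes myList is not empty. position is the 1-based position of *pos.
int position = 2;
auto prev = myList.begin(), pos=std::next(prev, 1);
// Stop at the first element that is less than its predecessor.
while (pos != myList.end() && !(*pos < *prev)) {
++position;
++prev;
++pos;
}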
You could use a standard algorithm instead, but they seem somewhat clumsy for this situation.
Does this help?
std::is_sorted_until()
From http://www.cplusplus.com/reference/algorithm/is_sorted_until/:
Find first unsorted element in range
Returns an iterator to the first element in the range [first,last) which does not follow an ascending order.
The range between first and the iterator returned is sorted.
If the entire range is sorted, the function returns last.
The elements are compared using operator< for the first version, and comp for the second.
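If the list has to stay a std::list, the iterator returned by is_sorted_until can still be turned into a position with std::distance. A minimal sketch, where the function name first_unsorted_position and the convention of returning 0 for a sorted list are assumptions made for illustration:

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <list>

// Hypothetical helper: returns the 1-based position of the first element
// that is out of order, or 0 if the whole list is sorted.
std::size_t first_unsorted_position(const std::list<int>& values)
{
    auto it = std::is_sorted_until(values.begin(), values.end());
    if (it == values.end())
        return 0;
    // std::distance has to walk the list, so this is another linear pass.
    return static_cast<std::size_t>(std::distance(values.begin(), it)) + 1;
}
```

For the list 3, 7, 10, 8, 12 from the question this should report position 4.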

Using lower_bound() and upper_bound() to select records

I have a map of objects, keyed by a date (stored as a double). I want to filter/extract the objects based on date, so I wrote a function similar to the snippet below.
However, I found that if I provide a date that is either lower than the earliest date, or greater than the last date, the code fails. I have modified the code so that any input startdate that is lower than the first date is set to the first (i.e. lowest) date in the map, likewise, enddate > last date is set to the last (greatest) date in the map
void extractDataRecords(const DatedRecordset& recs, OutStruct& out, const double startdt, const double enddt)
{
double first = recs.begin()->first, last = recs.rbegin()->first;
const double sdate = (startdt < first) ? first : startdt;
const double edate = (enddt > last) ? last : enddt;
DatedRecordsetConstIter start_iter = recs.lower_bound(sdate), end_iter = recs.upper_bound(edate);
if ((start_iter != recs.end()) && (end_iter != recs.end()))
{
// do Something
}
}
Is this the correct way to achieve this behaviour?
std::lower_bound returns: "the first position into which value can be inserted without violating the ordering." std::upper_bound returns: "the furthermost position into which value can be inserted without violating the ordering." In other words, if you insert the new item at either position, you're guaranteed that the overall ordering of the collection remains intact.
If you're going to use both anyway, you should probably use std::equal_range instead -- it returns an std::pair of iterators, one that's the same as lower_bound would have returned, and the other the same as upper_bound would have returned. Although it has the same worst-case complexity as calling the two separately, it's usually faster than two separate calls.
It's worth noting, however that if what you have is really a map (rather than a multimap) there can only be one entry with a given key, so there's not much reason to deal with both lower_bound and upper_bound for any given key.
From GNU libstdc++
lower_bound:
This function returns the first element of a subsequence of elements
that matches the given key. If
unsuccessful it returns an iterator
pointing to the first element that has
a greater value than given key or
end() if no such element exists
Your original approach on using lower_bound sounds correct to me. However, I think you don't need to use upper_bound, you can do a simple comparison with enddt. I would try
for( DatedRecordsetConstIter cit = recs.lower_bound( startdt );
cit != recs.end(); ++cit ) {
// cit->first is the date key of the current record
if( cit->first > enddt ) {
break;
}
// do stuff with *cit
}
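The same selection can also be written with both bounds and no clamping, because lower_bound and upper_bound simply return end() when the date is past the last key. A minimal sketch, assuming a std::map<double, std::string> as a stand-in for DatedRecordset (whose real value type is not shown in the question) and a made-up function name extract_range:

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: collects the values whose key lies in the closed
// interval [startdt, enddt].
std::vector<std::string> extract_range(const std::map<double, std::string>& recs,
                                       double startdt, double enddt)
{
    std::vector<std::string> out;
    if (startdt > enddt)
        return out;                           // empty interval
    auto first = recs.lower_bound(startdt);   // first key >= startdt, or end()
    auto last = recs.upper_bound(enddt);      // first key > enddt, or end()
    // end() is a valid upper limit here, so it needs no special handling.
    for (auto it = first; it != last; ++it)
        out.push_back(it->second);
    return out;
}
```

Note that comparing the returned iterators against end() before iterating, as in the question, would reject every query whose end date is at or after the last key.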

how to get median value from sorted map

I am using a std::map. Sometimes I will do an operation like: finding the median value of all items. e.g
if I add
1 "s"
2 "sdf"
3 "sdfb"
4 "njw"
5 "loo"
then the median is 3.
Is there some solution without iterating over half the items in the map?
I think you can solve the problem by using two std::maps: one for the smaller half of the items (mapL) and one for the other half (mapU). An insert operation is then one of two cases:
add the item to mapU and move the smallest element of mapU to mapL
add the item to mapL and move the greatest element of mapL to mapU
If the maps have different sizes and you insert into the one with fewer elements, you skip the move step.
The basic idea is that you keep your maps balanced so the maximum size difference is 1 element.
As far as I know, all the STL operations involved should work in O(log n) time. The smallest and greatest elements of a map can be accessed through iterators.
When you have a median query, just check the map sizes and return the greatest element in mapL or the smallest element in mapU.
The scenario above covers insertion only. You can extend it to deleting items as well, but then you have to keep track of which map holds the item, or try to delete from both.
Here is my code with sample usage:
#include <iostream>
#include <string>
#include <map>
using namespace std;
typedef pair<int,string> pis;
typedef map<int,string>::iterator itis;
map<int,string>Left;
map<int,string>Right;
itis get_last(map<int,string> &m){
return (--m.end());
}
int add_element(int key, string val){
if (Left.empty()){
Left.insert(make_pair(key,val));
return 1;
}
pis maxl = *get_last(Left);
if (key <= maxl.first){
Left.insert(make_pair(key,val));
if (Left.size() > Right.size() + 1){
itis to_rem = get_last(Left);
pis cpy = *to_rem;
Left.erase(to_rem);
Right.insert(cpy);
}
return 1;
} else {
Right.insert(make_pair(key,val));
if (Right.size() > Left.size()){
itis to_rem = Right.begin();
pis cpy = *to_rem;
Right.erase(to_rem);
Left.insert(cpy); // use the copy: to_rem was invalidated by erase
}
return 2;
}
}
pis get_mid(){
int size = Left.size() + Right.size();
if (Left.size() >= size / 2){
return *(get_last(Left));
}
return *(Right.begin());
}
int main(){
Left.clear();
Right.clear();
int key;
string val;
while (cin >> key >> val){ // stop as soon as a read fails
add_element(key,val);
pis mid = get_mid();
cout << "mid " << mid.first << " " << mid.second << endl;
}
}
I think the answer is no. You cannot just jump to the N / 2 item past the beginning because a std::map uses bidirectional iterators. You must iterate through half of the items in the map. If you had access to the underlying Red/Black tree implementation that is typically used for the std::map, you might be able to get close like in Dani's answer. However, you don't have access to that as it is encapsulated as an implementation detail.
Try:
typedef std::map<int,std::string> Data;
Data data;
Data::iterator median = data.begin();
std::advance(median, data.size() / 2); // std::advance modifies its argument and returns void
Works if the size() is odd. I'll let you work out how to do it when size() is even.
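One possible way to cover both odd and even sizes is to return the lower median, that is, the last element of the lower half when the size is even. This is a sketch of that convention only; the function name lower_median is made up, and an application that needs the average of the two middle keys would have to combine this element with the next one.

```cpp
#include <iterator>
#include <map>
#include <string>

typedef std::map<int, std::string> Data;

// Hypothetical helper: returns an iterator to the lower median of the map,
// or end() if the map is empty. The walk is linear in the map size because
// map iterators are only bidirectional.
Data::const_iterator lower_median(const Data& data)
{
    if (data.empty())
        return data.end();
    auto steps = static_cast<Data::difference_type>((data.size() - 1) / 2);
    return std::next(data.begin(), steps);
}
```

For the five-element example in the question this should point at key 3.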
In a self-balancing binary tree (std::map is one, I think) a good approximation would be the root.
For the exact value, cache the median together with a balance indicator: each time an item is added below the median, decrease the indicator, and increase it when an item is added above. When the indicator reaches 2 or -2, move the median up or down one step and reset the indicator.
If you can switch data structures, store the items in a std::vector and sort it. That will enable accessing the middle item positionally without iterating. (It can be surprising, but a sorted vector often outperforms a map, due to locality. For lookups by the sort key you can use binary search, and it will have much the same performance as a map anyway. See Scott Meyers' Effective STL.)
If you know the map will be sorted, then get the element at floor(length / 2). If you're in a bit twiddly mood, try (length >> 1).
I know no way to get the median from a pure STL map quickly for big maps. If your map is small or you need the median rarely you should use the linear advance to n/2 anyway I think - for the sake of simplicity and being standard.
You can use the map to build a new container that offers the median: Jethro suggested using two maps; building on this, perhaps better would be a single map and a continuously updated median iterator. These methods suffer from the drawback that you have to reimplement every modifying operation and, in Jethro's case, even the reading operations.
A custom-written container will also do what you want, probably most efficiently, but at the price of custom code. You could try, as was suggested, to modify an existing STL map implementation. You can also look for existing implementations.
There is a super efficient C implementation that offers most map functionality and also random access called Judy Arrays. These work for integer, string and byte array keys.
Since it sounds like insert and find are your two common operations while median is rare, the simplest approach is to use the map and advance an iterator from m.begin() by m.size()/2 with std::advance, as originally suggested by David Rodríguez. This is linear time, but easy to understand, so I'd only consider another approach if profiling shows that the median calls are too expensive relative to the work your app is doing.
The nth_element() method is there for you for this :) It implements the partition part of the quick sort and you don't need your vector (or array) to be sorted.
And also the time complexity is O(n) (while for sorting you need to pay O(nlogn)).
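A minimal sketch of the nth_element approach on an unsorted vector follows. The function name median_of is made up, and for an even number of elements it returns the upper of the two middle values rather than their average; both are assumptions for illustration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: returns the median of a non-empty vector.
// The vector is taken by value because nth_element reorders its input.
int median_of(std::vector<int> values)
{
    auto mid = values.begin() + values.size() / 2;
    // After this call *mid holds the value it would have in sorted order;
    // everything before it is not greater, everything after it is not less.
    std::nth_element(values.begin(), mid, values.end());
    return *mid;
}
```

This does not apply directly to a std::map, whose keys are already sorted and whose iterators are not random access; it is useful when the data lives in a vector or array.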
For a sorted list, here it is in Java code, but I assume it's very easy to port to C++:
if (input.length % 2 != 0) {
return input[((input.length + 1) / 2 - 1)];
} else {
return 0.5d * (input[(input.length / 2 - 1)] + input[(input.length / 2 + 1) - 1]);
}