Execute a function on matching pairs in a map - c++

I have some code that looks roughly like this; given two maps, if the first key exists in both maps, then multiply the two second values together, then sum all the products. For example:
s1 = {{1, 2.5}, {2, 10.0}, {3, 0.5}};
s2 = {{1, 10.0}, {3, 20.0}, {4, 3.33}};
The answer should be 2.5*10.0 + 0.5*20.0, the sum of the products of the matching keys.
double calcProduct(std::map<int, double> const &s1, std::map<int, double> const &s2)
{
auto s1_it = s1.begin();
auto s2_it = s2.begin();
double result = 0;
while (s1_it != s1.end() && s2_it != s2.end())
{
if (s1_it->first == s2_it->first)
{
result += s1_it->second * s2_it->second;
s1_it++;
s2_it++;
}
else if (s1_it->first < s2_it->first)
{
s1_it = s1.lower_bound(s2_it->first);
}
else
{
s2_it = s2.lower_bound(s1_it->first);
}
}
return result;
}
I would like to refactor this and std::set_intersection seems to be close to what I want as the documentation has an example using std::back_inserter, but is there a way to get this to work on maps and avoid the intermediate array?

The code you're using is already very close to the way that std::set_intersection would be implemented. I can't see any advantage to creating a new map and iterating over it.
However there were a couple of things with your code I wanted to mention.
If you're going to increment your iterators you shouldn't make them constant.
I would expect that there will be more misses than hits when looking for equivalent elements. I would suggest having the less than comparisons first:
double calcProduct( std::map<int , double> const &s1 , std::map<int , double> const &s2 )
{
auto s1_it = s1.begin();
auto s2_it = s2.begin();
double result = 0;
while ( s1_it != s1.end() && s2_it != s2.end() )
{
if ( s1_it->first < s2_it->first )
{
s1_it = s1.lower_bound( s2_it->first );
}
else if(s2_it->first < s1_it->first )
{
s2_it = s2.lower_bound( s1_it->first );
}
else
{
result += s1_it->second * s2_it->second;
s1_it++;
s2_it++;
}
}
return result;
}
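For reuse beyond summing products, the same merge-style walk can be wrapped in a small helper that takes a callback. This is only a minimal sketch, not a standard-library facility: the name for_each_matching and its signature are made up for illustration, and it assumes both maps use the default std::less ordering. It advances with ++ instead of lower_bound, which is the simpler choice when the two maps are of similar size.

```cpp
#include <map>

// Hypothetical helper: calls f(key, value_in_a, value_in_b) for every key
// that is present in both maps. No intermediate container is built.
template <typename K, typename V1, typename V2, typename F>
void for_each_matching(const std::map<K, V1>& a, const std::map<K, V2>& b, F f)
{
    auto ia = a.begin();
    auto ib = b.begin();
    while (ia != a.end() && ib != b.end()) {
        if (ia->first < ib->first) {
            ++ia;                                  // key only in a so far
        } else if (ib->first < ia->first) {
            ++ib;                                  // key only in b so far
        } else {
            f(ia->first, ia->second, ib->second);  // key in both maps
            ++ia;
            ++ib;
        }
    }
}
```

With the maps from the question, passing a lambda such as [&sum](int, double x, double y) { sum += x * y; } accumulates 2.5*10.0 + 0.5*20.0 = 35.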


How to reduce time complexity under c++ with nested loops and regex?

I have the following function.
Input arguments: a vector of user names, a vector of strings, and the number of top users to return.
First I count the number of occurrences of each user in the strings. If a user occurs several times in one string, it still counts as 1.
Then I sort by the number of occurrences. If the counts are equal, I sort the user names alphabetically.
The function returns the top N users with the most occurrences.
std::vector<std::string> GetTopUsers(const std::vector<std::string>& users,
const std::vector<std::string>& lines, const int topUsersNum) {
std::vector<std::pair<std::string, int>> userOccurancies;
//count user occurancies
for (const auto & user : users) {
int count = 0;
for (const auto &line : lines) {
std::regex rgx("\\b" + user + "\\b", std::regex::icase);
std::smatch match;
if (std::regex_search(line, match, rgx)) {
++count;
auto userIter = std::find_if(userOccurancies.begin(), userOccurancies.end(),
[&user](const std::pair<std::string, int>& element) { return element.first == user; });
if (userIter == userOccurancies.end()) {
userOccurancies.push_back(std::make_pair(user, count));
}
else {
userIter->second = count;
}
}
}
}
//sort by amount of occurancies, if occurancies are equal - sort alphabetically
std::sort(userOccurancies.begin(), userOccurancies.end(),
[](const std::pair<std::string, int>& p1, const std::pair<std::string, int>& p2)
{ return (p1.second > p2.second) ? true : (p1.second == p2.second ? p1.first < p2.first : false); });
//extract top N users
int topUsersSz = (topUsersNum <= userOccurancies.size() ? topUsersNum : userOccurancies.size());
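// reserve instead of pre-sizing, so push_back does not append after topUsersSz empty strings
std::vector<std::string> topUsers;
topUsers.reserve(topUsersSz);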
for (int i = 0; i < topUsersSz; i++) {
topUsers.push_back(userOccurancies[i].first);
}
return topUsers;
}
So for the input
std::vector<std::string> users = { "john", "atest", "qwe" };
std::vector<std::string> lines = { "atest john", "Qwe", "qwe1", "qwe," };
int topUsersNum = 4;
output will be qwe atest john
But it looks very complex: O(n^2) for the loops, plus a regex inside. It must be O(n^3) or even more.
Could you please advise me how to reduce the complexity in C++11?
Any other advice about the code is also welcome.
Or maybe there is a better board for questions about complexity and performance?
Thank you.
UPD
std::vector<std::string> GetTopUsers2(const std::vector<std::string>& users,
const std::vector<std::string>& lines, const size_t topUsersNum) {
std::vector<std::pair<std::string, int>> userOccurancies(users.size());
auto userOcIt = userOccurancies.begin();
for (const auto & user : users) {
userOcIt->first = std::move(user);
userOcIt->second = 0;
userOcIt++;
}
//count user occurancies
for (auto &user: userOccurancies) {
int count = 0;
std::regex rgx("\\b" + user.first + "\\b", std::regex::icase);
std::smatch match;
for (const auto &line : lines) {
if (std::regex_search(line, match, rgx)) {
++count;
user.second = count;
}
}
}
//sort by amount of occurancies, if occurancies are equal - sort alphabetically
std::sort(userOccurancies.begin(), userOccurancies.end(),
[](const std::pair<std::string, int>& p1, const std::pair<std::string, int>& p2)
{ return (p1.second > p2.second) ? true : (p1.second == p2.second ? p1.first < p2.first : false); });
//extract top N users
auto middle = userOccurancies.begin() + std::min(topUsersNum, userOccurancies.size());
int topUsersSz = (topUsersNum <= userOccurancies.size() ? topUsersNum : userOccurancies.size());
std::vector<std::string> topUsers(topUsersSz);
auto topIter = topUsers.begin();
for (auto iter = userOccurancies.begin(); iter != middle; iter++) {
*topIter = std::move(iter->first);
topIter++;
}
return topUsers;
}
Thanks to @Jarod42, I updated the first part. I think that allocating the vector's memory once in the constructor is faster than calling emplace_back every time, so I used that. If I am wrong, please correct me.
Also, I use C++11, not C++17.
Time results:
Old: 3539400.00000 nanoseconds
New: 2674000.00000 nanoseconds
It is better, but it still looks complex, doesn't it?
Constructing a regex is costly, and it can be moved outside the inner loop.
You might also move the strings instead of copying them.
You don't need to sort the whole range; std::partial_sort is enough.
More importantly, you can avoid the inner find_if.
std::vector<std::string>
GetTopUsers(
std::vector<std::string> users,
const std::vector<std::string>& lines,
int topUsersNum)
{
std::vector<std::pair<std::string, std::size_t>> userCount;
userCount.reserve(users.size());
for (auto& user : users) {
userCount.emplace_back(std::move(user), 0);
}
for (auto& [user, count] : userCount) {
std::regex rgx("\\b" + user + "\\b", std::regex::icase);
for (const auto &line : lines) {
std::smatch match;
if (std::regex_search(line, match, rgx)) {
++count;
}
}
}
//sort by amount of occurancies, if occurancies are equal - sort alphabetically
auto middle = userCount.begin() + std::min<std::size_t>(topUsersNum, userCount.size());
std::partial_sort(userCount.begin(),
middle,
userCount.end(),
[](const auto& lhs, const auto& rhs)
{
return std::tie(rhs.second, lhs.first) < std::tie(lhs.second, rhs.first);
});
//extract top N users
std::vector<std::string> topUsers;
topUsers.reserve(std::distance(userCount.begin(), middle));
for (auto it = userCount.begin(); it != middle; ++it) {
topUsers.push_back(std::move(it->first));
}
return topUsers;
}
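If the user names are plain single words, the regex can be dropped altogether. The sketch below swaps in a different technique from the answer above: each line is split once into distinct lower-cased words, and every word is looked up in a hash map of users, so the work is roughly proportional to the total text length rather than users times lines times regex cost. The names ToLower, DistinctWords and GetTopUsersNoRegex are illustrative, it sticks to C++11, and it assumes ASCII names made only of letters, digits and '_' (names containing other characters would behave differently from the \b-based regex). Like the answer above, it keeps users with a count of 0.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::size_t> UserCount;

// Lower-cases an ASCII string (stand-in for std::regex::icase).
std::string ToLower(std::string s)
{
    for (char& ch : s) {
        ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    }
    return s;
}

// Splits a line into distinct lower-cased words. A word is a run of letters,
// digits and '_', the characters that \b treats as word characters.
std::unordered_set<std::string> DistinctWords(const std::string& line)
{
    std::unordered_set<std::string> words;
    std::string current;
    for (char ch : ToLower(line)) {
        if (std::isalnum(static_cast<unsigned char>(ch)) || ch == '_') {
            current.push_back(ch);
        } else if (!current.empty()) {
            words.insert(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        words.insert(current);
    }
    return words;
}

std::vector<std::string> GetTopUsersNoRegex(const std::vector<std::string>& users,
                                            const std::vector<std::string>& lines,
                                            std::size_t topUsersNum)
{
    std::vector<UserCount> counts;                         // user name, number of matching lines
    std::unordered_map<std::string, std::size_t> indexOf;  // lower-cased name -> index in counts
    for (const auto& user : users) {
        if (indexOf.emplace(ToLower(user), counts.size()).second) {
            counts.emplace_back(user, 0);
        }
    }
    // One hash lookup per distinct word; a line counts at most once per user.
    for (const auto& line : lines) {
        for (const auto& word : DistinctWords(line)) {
            auto it = indexOf.find(word);
            if (it != indexOf.end()) {
                ++counts[it->second].second;
            }
        }
    }
    // Highest count first, ties broken alphabetically; only the top part is sorted.
    const std::size_t n = std::min(topUsersNum, counts.size());
    std::partial_sort(counts.begin(), counts.begin() + n, counts.end(),
                      [](const UserCount& a, const UserCount& b) {
                          return a.second != b.second ? a.second > b.second
                                                      : a.first < b.first;
                      });
    std::vector<std::string> top;
    top.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        top.push_back(std::move(counts[i].first));
    }
    return top;
}
```

For the sample input from the question (users john, atest, qwe and the four lines), this yields qwe, atest, john, the same order the question expects.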
I'm no professional coder, but I've made your code a bit faster (~90% faster, unless my math is wrong or I timed it wrong).
What it does: it goes through each of the lines, and for each line it counts the number of occurrences of each given user. If the number of occurrences for the current user is at least as large as for the previous one, it moves the user to the beginning of the vector.
#include <iostream>
#include <cstdio>
#include <vector>
#include <string>
#include <regex>
#include <algorithm>
#include <chrono>
std::vector<std::string> GetTopUsers(const std::vector<std::string>& users,
const std::vector<std::string>& lines, const int topUsersNum) {
std::vector<std::pair<std::string, int>> userOccurancies;
//count user occurancies
for (const auto & user : users) {
int count = 0;
for (const auto &line : lines) {
std::regex rgx("\\b" + user + "\\b", std::regex::icase);
std::smatch match;
if (std::regex_search(line, match, rgx)) {
++count;
auto userIter = std::find_if(userOccurancies.begin(), userOccurancies.end(),
[&user](const std::pair<std::string, int>& element) { return element.first == user; });
if (userIter == userOccurancies.end()) {
userOccurancies.push_back(std::make_pair(user, count));
}
else {
userIter->second = count;
}
}
}
}
//sort by amount of occurancies, if occurancies are equal - sort alphabetically
std::sort(userOccurancies.begin(), userOccurancies.end(),
[](const std::pair<std::string, int>& p1, const std::pair<std::string, int>& p2)
{ return (p1.second > p2.second) ? true : (p1.second == p2.second ? p1.first < p2.first : false); });
//extract top N users
int topUsersSz = (topUsersNum <= userOccurancies.size() ? topUsersNum : userOccurancies.size());
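// reserve instead of pre-sizing, so push_back does not append after topUsersSz empty strings
std::vector<std::string> topUsers;
topUsers.reserve(topUsersSz);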
for (int i = 0; i < topUsersSz; i++) {
topUsers.push_back(userOccurancies[i].first);
}
return topUsers;
}
unsigned int count_user_occurences(
std::string & line,
std::string & user
)
{
unsigned int occur = {};
std::string::size_type curr_index = {};
// while we can find the name of the user in the line, and we have not reached the end of the line
while((curr_index = line.find(user, curr_index)) != std::string::npos)
{
// increase the number of occurences
++occur;
// increase string index to skip the current user
curr_index += user.length();
}
// return the number of occurences
return occur;
}
std::vector<std::string> get_top_users(
std::vector<std::string> & user_list,
std::vector<std::string> & line_list
)
{
// create vector to hold results
std::vector<std::string> top_users = {};
// put all of the users inside the "top_users" vector
top_users = user_list;
// make sure none of the vectors are empty
if(false == user_list.empty()
&& false == line_list.empty())
{
// go trough each one of the lines
for(unsigned int i = {}; i < line_list.size(); ++i)
{
// holds the number of occurences for the previous user
unsigned int last_user_occur = {};
// go trough each one of the users (we copied the list into "top_users")
for(unsigned int j = {}; j < top_users.size(); ++j)
{
// get the number of the current user in the current line
unsigned int curr_user_occur = count_user_occurences(line_list.at(i), top_users.at(j));
// user temporary name holder
std::string temp_user = {};
// if the number of occurences of the current user is larger than the one of the previous user, move it at the top
if(curr_user_occur >= last_user_occur)
{
// save the current user's name
temp_user = top_users.at(j);
// erase the user from its current position
top_users.erase(top_users.begin() + j);
// move the user at the beginning of the vector
top_users.insert(top_users.begin(), temp_user);
}
// save the occurences of the current user to compare further users
last_user_occur = curr_user_occur;
}
}
}
// return the top user vector
return top_users;
}
int main()
{
std::vector<std::string> users = { "john", "atest", "qwe" };
std::vector<std::string> lines = { "atest john", "Qwe", "qwel", "qwe," };
// time the first function
auto start = std::chrono::high_resolution_clock::now();
std::vector<std::string> top_users = get_top_users(users, lines);
auto stop = std::chrono::high_resolution_clock::now();
// save the time in nanoseconds
double time = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
// print time
printf("%.05f nanoseconds\n", time);
// time the second function
auto start2 = std::chrono::high_resolution_clock::now();
std::vector<std::string> top_users2 = GetTopUsers(users, lines, 4);
auto stop2 = std::chrono::high_resolution_clock::now();
// save the time in nanoseconds
double time2 = std::chrono::duration_cast<std::chrono::nanoseconds>(stop2 - start2).count();
// print time
printf("%.05f nanoseconds", time2);
getchar();
return 0;
}
results (for my PC at least, they're pretty consistent across multiple runs):
366800.00000 nanoseconds
4235900.00000 nanoseconds

Find first range not in set of ranges

I'm already familiar with the 1D bin packing nextFit, firstFit and bestFit + their offline algorithm variations. I mention these just for context.
My problem is finding the narrowest range (i, i+required_size) with width >= 1 that is not in the set of non-overlapping ranges:
int required_size;
std::set<std::pair<int,int>> ranges;
std::pair<int,int> result = findNarrowestFitFor(ranges,required_size);
How should I try to solve this?
Explaining it to the duck: obviously I need to iterate over the ranges and find two adjacent items that leave a gap between them.
Beta asked for example code, so here is where I'm at:
std::vector<char> buffer;
std::map<int,int> ranges;
// Search for free space by finding narrowest range
// that is not in ranges
const char * find_free_area(size_t nbytes) {
ptrdiff_t fit = buffer.capacity();
auto pos = ranges.begin();
auto itr = pos;
if(ranges.empty()) {
return buffer.data();
}
while(itr != ranges.end()) {
// Find next hole begin
itr = std::adjacent_find(itr, ranges.end(),
[]( const std::pair<const int,int> & a,
const std::pair<const int,int> & b) {
return a.second != b.first;
});
if(itr == ranges.end()) {
itr = ranges.rbegin().base();
}
// Get next range
auto next = std::next(itr);
if(next != ranges.end()) {
auto space = next->first - itr->second;
if(space < fit) {
fit = space;
pos = ranges.begin();
std::advance(pos, itr->second);
}
} else if(itr->second ) {
// todo..
}
}
}
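One possible way to approach the search described above, offered as a sketch rather than a definitive answer: walk the ranges in order, treat the space between the end of one range and the start of the next as a gap, and remember the narrowest gap that is still wide enough. The function name find_narrowest_gap, the capacity parameter and the -1 "not found" result are assumptions made for illustration. It also assumes the map stores begin -> end of non-overlapping half-open intervals inside [0, capacity) and that required is at least 1.

```cpp
#include <map>

// Returns the start offset of the narrowest gap that is at least `required`
// units wide, or -1 if no gap is wide enough.
// Assumption: `ranges` maps begin -> end of non-overlapping half-open
// intervals that all lie inside [0, capacity).
int find_narrowest_gap(const std::map<int, int>& ranges, int capacity, int required)
{
    int best_start = -1;
    int best_width = 0;
    int gap_start = 0;  // end of the previous range, or start of the buffer

    // Checks the gap [gap_start, gap_end) against the best one found so far.
    auto consider = [&](int gap_end) {
        const int width = gap_end - gap_start;
        if (width >= required && (best_start == -1 || width < best_width)) {
            best_start = gap_start;
            best_width = width;
        }
    };

    for (const auto& r : ranges) {
        consider(r.first);     // gap in front of this range
        gap_start = r.second;  // the next gap starts where this range ends
    }
    consider(capacity);        // trailing gap after the last range
    return best_start;
}
```

For ranges {0,4}, {10,12}, {15,20} in a buffer of 32, a request of 3 would land at offset 12 (the 3-wide gap), while a request of 7 only fits in the trailing gap at offset 20.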

c++ Remove element from list and assign it to an object variable

I have a list of vector of Data (object) and I need to iterate through the list, find the biggest vector, remove it from the list and assign it to a new variable (which is a vector of Data). I am having problems during execution (it compiles ok but then stops working). How can I get the element without destroying it so I can manipulate later?
This is the code:
int biggestIndex = 0, biggestValue = -1;
int i = 0;
list< vector<Data> >::iterator it;
for (it = (myList).begin(); it!= (myList).end(); it++) {
// cast to int so the comparison with the initial -1 is signed
if ((int)(*it).size() > biggestValue) {
biggestIndex = i;
biggestValue = (int)(*it).size();
}
i++;
}
it = (myList).begin();
advance(it,biggestIndex);
vector<Data> partition = (vector<Data>) *it;
auto biggest = thelist.begin();
for (auto itr = thelist.begin() ; itr != thelist.end(); itr++) {
if (itr->size() > biggest->size()) {
biggest = itr;
}
}
vector<int> thebiggest = *biggest;
This of course needs to be compiled with at least C++11 extensions enabled, so add -std=c++11 or higher to your g++ command.
auto longest = *max_element(begin(myList),
end(myList),
[](const vector<Data>& v1, const vector<Data>& v2)
{return v1.size() < v2.size();});
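The snippets above copy the biggest vector but leave it in the list. To also remove it, as the question asks, the element can be moved out before the node is erased. The following is a minimal sketch under stated assumptions: take_biggest is a made-up name, T stands in for the question's Data type, and an empty list simply yields an empty vector.

```cpp
#include <algorithm>
#include <list>
#include <utility>
#include <vector>

// Removes the largest vector from `lists` and returns it by value.
// Returns an empty vector if `lists` is empty.
template <typename T>
std::vector<T> take_biggest(std::list<std::vector<T>>& lists)
{
    if (lists.empty()) {
        return std::vector<T>();
    }
    auto biggest = std::max_element(
        lists.begin(), lists.end(),
        [](const std::vector<T>& a, const std::vector<T>& b) { return a.size() < b.size(); });
    std::vector<T> result = std::move(*biggest);  // steal the contents, no element copies
    lists.erase(biggest);                         // drop the now moved-from node
    return result;
}
```

Because the result is a separate object, it stays valid after the erase and can be manipulated freely, for example vector<Data> partition = take_biggest(myList);.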

rewrite access to collection to avoid "double" finding

I have such code:
std::unordered_map<int64_t /*id_ord*/, LimitOrder> futOrders;
auto i = futOrders.find(orderId);
if (i == futOrders.end()) {
LimitOrder& newOrder = futOrders[orderId];
// work
} else {
LimitOrder& futOrder = i->second;
// another work
}
Here I execute "find" twice:
first time: auto i = futOrders.find(orderId);
second time: LimitOrder& newOrder = futOrders[orderId];
Can I rewrite it somehow to avoid the "double find"?
You can perform an emplace, and check the return value to know whether the item was inserted or not:
std::unordered_map<int64_t /*id_ord*/, LimitOrder> futOrders;
auto i = futOrders.emplace(
std::piecewise_construct, std::tie(orderId), std::make_tuple());
if (i.second) {
LimitOrder& newOrder = i.first->second;
// work
} else {
LimitOrder& futOrder = i.first->second;
// another work
}
How about using size() to detect whether an element was inserted, like this:
auto old_size = futOrders.size();
LimitOrder& order = futOrders[orderId];
if (old_size < futOrders.size()) {
LimitOrder& newOrder = order;
// work
} else {
LimitOrder& futOrder = order;
// another work
}
Assuming there is a way to "determine if an order is empty", you could do:
LimitOrder& anOrder = futOrders[orderId];
if (anOrder.empty())
{
// New order, do stuff that only new orders need.
}
else
{
// Old order, update it.
}
The empty method could of course be something like if (anOrder.name == "") or if (anOrder.orderId == 0), etc.
You can use this overload of insert instead:
std::pair<iterator,bool> insert( const value_type& value );
Example:
std::unordered_map<int, std::string> m { {0, "A"}, {1, "B"}, {2, "C"} };
int orderId = 1;
// attempt to insert with key you have and default constructed value type
auto p = m.insert( std::make_pair(orderId, std::string()) );
if (p.second) {
// the element was inserted
} else {
// the element was not inserted
std::cout << p.first->second; // will print "B"
}
In both cases, p.first is the iterator to the element you searched for (or that was just inserted).
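If C++17 is available (an assumption, the question does not say which standard is in use), std::unordered_map::try_emplace expresses the same single-lookup idea without building a temporary pair. In the sketch below, LimitOrder is a hypothetical stand-in with a single quantity field and add_quantity is an illustrative name.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for the question's LimitOrder type.
struct LimitOrder {
    int quantity = 0;
};

// Adds `quantity` to the order with the given id, creating the order if needed.
// Returns true if a new order was created, false if an existing one was updated.
// Requires C++17 for try_emplace.
bool add_quantity(std::unordered_map<std::int64_t, LimitOrder>& futOrders,
                  std::int64_t orderId, int quantity)
{
    // Single lookup: inserts a value-initialized LimitOrder only if the key is absent.
    auto result = futOrders.try_emplace(orderId);
    LimitOrder& order = result.first->second;
    order.quantity += quantity;
    return result.second;
}
```

As with emplace and insert, the bool in the returned pair tells the "new order" branch apart from the "existing order" branch.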

Top 5 values from std::map

I already have a function that finds the key with the largest mapped value.
// Function for finding the occurrences of colors, or in this case hex values
void findOccurrances(double * mostNumTimes, map<string, int> &hexmap, string * colorlist)
{
map<string,int>::iterator it = hexmap.begin();
for( ;it != hexmap.end(); it ++)
{
if(*mostNumTimes <= it->second)
{
*mostNumTimes = it->second;
*colorlist = it->first;
}
}
}
Is there an easy way to expand it to show the top five results?
I know you can copy it to a vector and whatnot, but I want an easier way of doing it.
Copying into a vector isn't that difficult:
typedef std::pair<string, int> Pair;
std::vector<Pair> contents(hexmap.begin(), hexmap.end());
Done.
But to find the top 5, believe it or not <algorithm> has a function template that does exactly what you want. In C++11, which usefully has lambdas:
std::vector<Pair> results(5);
std::partial_sort_copy(
hexmap.begin(), hexmap.end(),
results.begin(), results.end(),
[](const Pair &lhs, const Pair &rhs) { return lhs.second > rhs.second; }
);
results now contains the top 5 entries in descending order.
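One detail worth guarding against: if the map holds fewer than five entries, a results vector of fixed size 5 keeps default-constructed pairs at the end. A small wrapper can size the output to whatever is available. This is a sketch only; top_n is a made-up name, and the extra tie-break on the key is an addition to make the order deterministic when counts are equal.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> Pair;

// Returns up to n entries of hexmap, ordered by descending count.
// Entries with equal counts are ordered by key.
std::vector<Pair> top_n(const std::map<std::string, int>& hexmap, std::size_t n)
{
    // Never ask for more results than the map can provide.
    std::vector<Pair> results(std::min(n, hexmap.size()));
    std::partial_sort_copy(
        hexmap.begin(), hexmap.end(),
        results.begin(), results.end(),
        [](const Pair& lhs, const Pair& rhs) {
            return lhs.second != rhs.second ? lhs.second > rhs.second
                                            : lhs.first < rhs.first;
        });
    return results;
}
```

Calling top_n(hexmap, 5) then gives the top five colors, or all of them if the map is smaller.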
I'm a little confused about the arguments and why the function is written this way, but I tried this:
string colorlist;
for (int i = 0; i < 5 && !hexmap.empty(); ++i)
{
double max_val = 0; // reset on every pass, otherwise only the first maximum is ever found
findOccurrances(&max_val, hexmap, &colorlist);
cout << colorlist << " " << max_val << endl; // to do something
hexmap.erase(colorlist); // remove the current maximum so the next pass finds the runner-up
}
I'd interchange the key and value and create a new map:
std::map<int,std::string> dst;
std::transform(hexmap.begin(), hexmap.end(),
std::inserter(dst, dst.begin()),
[](const std::pair<std::string,int> &p )
{
return std::pair<int,std::string>(p.second, p.first);
}
);
And now print the top five values of dst in the usual way, walking from the largest count down:
typedef std::map<int, std::string> Mymap;
Mymap::reverse_iterator it = dst.rbegin();
size_t count = 5;
for( ; ( it != dst.rend() ) && ( count-- > 0 ); ++it)
std::cout << it->second << " " << it->first << std::endl;
Edit:
Use a std::multimap if the same int (value) is mapped by more than one std::string (key) in your hexmap.