How to use map::equal_range without a copy of the object? - c++

I have a performance-sensitive function which uses a map<string, ...> to store some data.
I need to be able to look up values with any substring of some other string as the key, without creating an intermediate string (i.e., the goal is to prevent a heap allocation from happening merely because I want to look something up).
The obvious solution is to hold two separate data structures (perhaps with another map on the side, to map from some key to each string) -- one for the strings, and one for references to those strings.
But I'm wondering, is there a better way to do this with just a map alone, or do I need another data structure? I'd like to avoid creating too many extra indirections if possible.

Sorry if I misunderstood, but would your problem be solved if you could use a "substring view" of the query string to search the multi-map, instead of an ordinary std::string object?
In that case something along the lines below would work (using C++11-based coding):
Define a substring-view object type. It is constructed from a string and (from,to) offsets, but does not make a copy of the substring:
class substrview
{
std::string::const_iterator _from;
std::string::const_iterator _to;
public:
substrview(
const std::string &s,
const std::size_t from,
const std::size_t to)
: _from(s.begin()+from), _to(s.begin()+to)
{ }
std::string::const_iterator begin() const
{ return _from; }
std::string::const_iterator end() const
{ return _to; }
};
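To search the multi-map using the substring view, I suggest the std::lower_bound and std::upper_bound algorithms from <algorithm>. (Note that on map iterators, which are bidirectional rather than random-access, these perform a logarithmic number of comparisons but still a linear number of iterator steps.)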
int main()
{
std::multimap<std::string,int> map {
{ "hello" , 1 },
{ "world" , 2 },
{ "foo" , 3 },
{ "foobar" , 4 },
{ "foo" , 5 },
};
std::string query { "barfoo" };
/* Search for all suffixes of "barfoo", one after the other: */
for (std::size_t i = 0 ; i < query.size() ; ++i) {
substrview subquery { query,i,query.size() };
auto found_from = std::lower_bound(begin(map),end(map),subquery,cmpL);
auto found_to = std::upper_bound(begin(map),end(map),subquery,cmpU);
/* Now [found_from,found_to) is the match range in the multi-map.
Printing the matches: */
while (found_from != found_to) {
std::cout << found_from->first << ", " << found_from->second << '\n';
++found_from;
}
}
}
For this to work, we only need to define the comparison functions cmpL and cmpU (one for lower_bound, the other for upper_bound; we need two because the comparison is asymmetric: cmpL compares a multi-map entry to a substrview, and cmpU compares a substrview to a multi-map entry):
inline bool cmpL(
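// must match the multimap's value_type exactly; with a non-const key every call would copy the entry
const std::pair<const std::string,int> &entry,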
const substrview &val)
{
return std::lexicographical_compare
(entry.first.begin(),entry.first.end(),val.begin(),val.end());
}
inline bool cmpU(
const substrview &val,
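// must match the multimap's value_type exactly; with a non-const key every call would copy the entry
const std::pair<const std::string,int> &entry)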
{
return std::lexicographical_compare
(val.begin(),val.end(),entry.first.begin(),entry.first.end());
}
Working gist of the complete code: https://gist.github.com/4070189

You need a string_ref type which participates in the < relation with std::string. In the proposal N3442, Jeffrey Yasskin proposes introducing a string_ref type influenced by Google's StringPiece and LLVM's StringRef. If you can use either of those then you're pretty much done; otherwise writing your own to the proposed interface should be fairly easy, especially as you only need a subset of the functionality.
Note that if you have an implicit constructor from std::string:
string_ref(const std::string &s): begin(s.begin()), end(s.end()) {}
then the < relation with std::string comes for free.
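As a hedged illustration of the idea: std::string_view (C++17) is the standardized descendant of the string_ref proposal, and a map declared with the transparent comparator std::less<> accepts it directly in equal_range. The sketch below assumes a C++17 compiler; Index and count_suffix_matches are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <map>
#include <string>
#include <string_view>

// std::less<> is a transparent comparator, so equal_range accepts any type
// that can be compared with std::string, including std::string_view.
using Index = std::multimap<std::string, int, std::less<>>;

// Counts the entries whose key equals the suffix of `query` starting at
// offset `from`, without building a temporary std::string for the lookup.
std::size_t count_suffix_matches(const Index& index, const std::string& query,
                                 std::size_t from) {
    assert(from <= query.size());
    // A view into query's own buffer: no allocation, no copy.
    std::string_view key = std::string_view(query).substr(from);
    auto range = index.equal_range(key);  // heterogeneous lookup
    return static_cast<std::size_t>(std::distance(range.first, range.second));
}
```

With the multimap from the answer above and the query "barfoo", offset 3 selects the view "foo" and the range covers both "foo" entries. The view must not outlive the string it points into.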

Related

Unable to find a user-defined type in a c++ unordered set with custom operator==()

Problem Statement: Iterate over an array of objects and check if the object exists in an unordered_set.
Goal: I could have thousands of objects in one container and need to check their existence among millions of objects in another container. I chose unordered_set for its constant-time lookup and vector for iterating. I'm new to this, so if you have an alternative approach, I'd really appreciate it.
Issue: unordered_set find isn't working as expected or I got the concept wrong!
Main:
int main() {
std::vector<std::unique_ptr<Block>> vertices;
vertices.push_back(std::make_unique<Block>("mod1", "work"));
vertices.push_back(std::make_unique<Block>("mod2", "work"));
vertices.push_back(std::make_unique<Block>("mod3", "work"));
std::unordered_set<std::unique_ptr<Block>> undefs;
undefs.insert(std::make_unique<Block>("mod1", "work"));
undefs.insert(std::make_unique<Block>("mod2", "work"));
for(auto& vertex : vertices) {
auto search = undefs.find(vertex);
if(search != undefs.end()){
std::cout << "Block: " << vertex->getName() << "\n";
}
}
}
Block Class Overload:
bool Block::operator==(std::unique_ptr<Block>& block) const {
return block->getName() == mName;
}
Expected Output:
mod1
mod2
Block:
#pragma once
#include <string>
#include <memory>
using std::string;
class Block {
private:
string mName;
string mLib;
public:
Block(string const& name, string const& lib);
string getName() const;
string getLib() const;
bool operator==(std::unique_ptr<Block>& block) const;
};
You are trying to compare pointers, not values.
You need to specify hashing function for class Block.
For example, if you want to use mName as key the code will be the following:
class Block {
private:
string mName;
string mLib;
public:
Block(string const& name, string const& lib)
{
mName = name;
mLib = lib;
}
string getName() const {
return mName;
};
string getLib() const {
return mLib;
}
bool operator==(const Block & block) const;
};
template<> struct std::hash<Block> {
std::size_t operator()(const Block & block) const noexcept {
return std::hash<std::string>{}(block.getName());
}
};
bool Block::operator==(const Block & block) const {
return block.getName() == mName;
}
int main() {
std::vector<Block> vertices;
vertices.emplace_back(Block("mod1", "work"));
vertices.emplace_back(Block("mod2", "work"));
vertices.emplace_back(Block("mod3", "work"));
std::unordered_set<Block> undefs;
undefs.emplace(Block("mod1", "work"));
undefs.emplace(Block("mod2", "work"));
for (auto& vertex : vertices) {
auto search = undefs.find(vertex);
if (search != undefs.end()) {
std::cout << "Block: " << vertex.getName() << "\n";
}
}
}
An unordered_set requires a hashing function and a comparison function. You are using the existing hashing and comparison functions for std::unique_ptr, which is definitely not what you want.
I would not suggest trying to change the behavior of std::unique_ptr<Block> because that will lead to confusion in other code that wants normal semantics for pointers. Instead, add normal hashing and comparison functions for Block and pass customized ones to the constructor of the unordered_set.
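A minimal sketch of that suggestion, assuming names are the identity of a Block and that the set never holds null pointers. BlockPtrHash, BlockPtrEqual and BlockSet are hypothetical names, and the Block class is a cut-down stand-in for the one in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_set>
#include <utility>

// Cut-down stand-in for the question's Block class.
class Block {
public:
    Block(std::string name, std::string lib)
        : mName(std::move(name)), mLib(std::move(lib)) {}
    const std::string& getName() const { return mName; }
    const std::string& getLib() const { return mLib; }
private:
    std::string mName;
    std::string mLib;
};

// Hashes the pointed-to Block by name instead of hashing the pointer value.
struct BlockPtrHash {
    std::size_t operator()(const std::unique_ptr<Block>& p) const {
        assert(p && "this sketch does not support null pointers");
        return std::hash<std::string>{}(p->getName());
    }
};

// Two pointers compare equal when the Blocks they own have the same name.
struct BlockPtrEqual {
    bool operator()(const std::unique_ptr<Block>& a,
                    const std::unique_ptr<Block>& b) const {
        return a->getName() == b->getName();
    }
};

// The set keeps std::unique_ptr's normal semantics everywhere else;
// only this container looks through the pointer.
using BlockSet =
    std::unordered_set<std::unique_ptr<Block>, BlockPtrHash, BlockPtrEqual>;
```

With this alias, the loop from the question can stay as written: undefs.find(vertex) then matches on the name of the pointed-to Block rather than on its address.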
The problem there is that you are trying to compare pointers, which are different!
I don't know the reasons behind using unique_ptr<>, but by doing that you are actually comparing identities instead of states, and states are what you want.
So you can see what I mean, let's say the first Block object is at position 100 in your memory. That would be its identity. So we have object1 whose state is "mod1, work" and whose identity is 100. Then we have object2, whose identity is 150 but its state is the same as object1, "mod1, work".
All you have inside both the vector and the unordered_set are pointers, that is, memory positions. When inserting into the vector you inserted, let's say, position 100, but into the unordered_set you inserted 150. They have the same state, but the find method is looking for a memory position.
I hope my answer was helpful. If you find any mistakes here or think differently please let me know. Good luck! :)

Pattern for dynamically setting an order based on contents of a std::vector<string>

I am currently struggling to come up with an optimized method for dynamic ordering. I have a vector that looks like this somewhere in my code
std::vector<std::string> vec {
"optionB",
"optionA",
"optionC"
};
The items in the above vector can be shuffled. They are inserted in a specific order, so the order shown above can differ. For simplicity's sake I added the items at declaration. The actual case has about 9 items; I am using only 3 string items here.
Now somewhere else in my code I have something like this.
void filter()
{
bool _optionA,_optionB,_optionC;
...
//These boolean variables get assigned values
...
...
/*
Todo : I would like to change the ordering of the
following code based on the ordering of items in the
vector. Currently its in the order _optionA _optionB,
_optionC. I would like this ordering to be based
on the order of the strings as in the above vector.
so it should be _optionB,_optionA,_optionC ,
I understand the items in the vector are string
and the following are boolean types
*/
if(_optionA){
}
if(_optionB) {
}
if(_optionC){
}
}
The simplest approach that comes to my mind is
for(auto str : vec)
{
if( (str=="optionA" && _optionA))
{
//This was optionA
}
else if( (str=="optionB" && _optionB)) {
}
else if( (str=="optionC" && _optionC)) {
}
}
I want to know what would be the most optimized way to accomplish the above task. I am looking for a solution that avoids iterating through a vector, since this is a performance-critical piece of code. Is there a way to integrate bitwise operations or something like array indexing to accomplish this task? Please let me know if something is unclear.
It sounds like you want to map a string to an actual process. Could you create an interface option class and have instances of options mapped to the string that should cause them to occur? That way you could use the string as a key to get back an Option object and call something like myOption.execute().
The downside to this method is that you need to create a new option class and have it inherit from the interface each time you need a new option.
Edit: Sorry, I think I may have misunderstood the question. But I think the premise still applies: you could have a map of string to boolean and just use the string as a key to get back whether the option is toggled on or off.
Assuming you load the vector on start-up, you can sort it at that point to your liking, for example in alphabetical order. This means you know the order of the vector, so you can simply reference the vector by index when checking in your filter function.
Load the data into the vector: std::vector<string> data = {"optionA", "optionB"};.
Sort using std::sort(data.begin(), data.end()); or any other sort method of your choice.
Then in your filter function check the vector based on index: if (data.at(0) == "optionA") { }
If I understand your problem correctly, you need to apply an order_by to the boolean variables/predicates.
In the below program I will refer your (_optionA, _optionB, _optionC) as predicates even though they are bool, since we can upgrade this problem to work with predicates as well.
Based on the above assumption, I am going ahead with an implementation.
You should pass an ordered_predicates to your filter function.
ordered_predicates is sorted according to your desired criteria.
filter()'s job is just to execute them in the order defined.
auto filter(std::vector<bool> const & ordered_predicates)
-> void
{
for (auto const & condition : ordered_predicates) {
if (condition) {
// ... do your usual stuff here
}
}
}
So how should we go ahead to achieve this ordered_predicates?
We will create a function called order_by that will take an order_by_criteria and a mapping, which will help it in creating ordered_predicates.
With this function, creating ordered_predicates is just a one time cost.
auto order_by(std::vector<std::string> const & order_by_criteria,
std::map<std::string, bool> const & mapping)
-> std::vector<bool>
{
std::vector<bool> ordered_predicates;
for (auto const & item : order_by_criteria)
ordered_predicates.push_back(mapping.at(item));
return ordered_predicates;
}
Where order_by_criteria is your std::vector<std::string> and mapping is just a map which tells which string and predicates are associated.
std::vector<std::string> order_by_criteria { "optionB", "optionA", "optionC" };
std::map<std::string, bool> mapping = { {"optionA", _optionA },
{"optionB", _optionB },
{"optionC", _optionC } };
Here is a complete working program for your reference.
#include <iostream>
#include <map>
#include <string>
#include <vector>
auto order_by(std::vector<std::string> const & order_by_criteria,
std::map<std::string, bool> const & mapping)
-> std::vector<bool>
{
std::vector<bool> ordered_predicates;
for (auto const & item : order_by_criteria)
ordered_predicates.push_back(mapping.at(item));
return ordered_predicates;
}
auto filter(std::vector<bool> const & ordered_predicates)
-> void
{
for (auto const & condition : ordered_predicates) {
if (condition) {
// ... do your usual stuff here
}
}
}
int main()
{
bool _optionA = true, _optionB = false, _optionC = true;
std::vector<std::string> order_by_criteria { "optionB", "optionA", "optionC" };
std::map<std::string, bool> mapping = { {"optionA", _optionA },
{"optionB", _optionB },
{"optionC", _optionC } };
auto ordered_predicates = order_by(order_by_criteria, mapping);
filter(ordered_predicates);
filter(ordered_predicates); // call as many times as you want, with pre-decided order
return 0;
}
If I got the problem correctly, sorting is a way to go. Just sort the vector together with bool flags, using std::vector values as keys, and then simply check bool flags in fixed, lexicographic, order.
Suppose we have a vector {"optB", "optC", "optA"}. After sorting, the indices {0, 1, 2} will rearrange: std::size_t perm[] = {2, 0, 1}. Using this information, that can be precomputed (outside filter(...)), we can rearrange the bool flags:
bool options[N];
// populate options...
bool new_options[N];
for (std::size_t i = 0; i < N; ++i)
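// perm[i] is the original index of the i-th string in sorted order,
// so this puts the flags into the same (lexicographic) order
new_options[i] = options[perm[i]];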
Now we simply check new_options successively:
if (new_options[0]) {
...
}
if (new_options[1]) {
...
}
To precompute perm array use std::map:
std::map<std::string, std::size_t> map;
for (std::size_t i = 0; i < N; ++i)
map.emplace(vec[i], i);
std::size_t perm[N];
auto m = map.begin();
for (std::size_t i = 0; i < N; ++i, ++m)
perm[i] = m->second;

C++ unordered_map<string, ...> lookup without constructing string

I have C++ code that investigates a BIG string and matches lots of substrings. As much as possible, I avoid constructing std::strings, by encoding substrings like this:
char* buffer, size_t bufferSize
At some point, however, I'd like to look up a substring in one of these:
std::unordered_map<std::string, Info> stringToInfo = {...
So, to do that, I go:
stringToInfo.find(std::string(buffer, bufferSize))
That constructs a std::string for the sole purpose of the lookup.
I feel like there's an optimization I could do here, by... changing the key-type of the unordered_map to some kind of temporary string imposter, a class like this...
class SubString
{
char* buffer;
size_t bufferSize;
// ...
};
... that does the same logic as std::string to hash and compare, but then doesn't deallocate its buffer when it's destroyed.
So, my question is: is there a way to get the standard classes to do this, or do I write this class myself?
What you're wanting to do is called heterogeneous lookup. Since C++14 it's been supported for std::map::find and std::set::find (note versions (3) and (4) of the functions, which are templated on the lookup value type). It's more complicated for unordered containers because they need to be told of or find hash functions for all key types that will produce the same hash value for the same text. There's a proposal under consideration for a future Standard: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0919r0.html
Meanwhile, you could use another library that already supports heterogeneous lookup, e.g. boost::unordered_map::find.
If you want to stick to std::unordered_map, you could avoid creating so many string temporaries by storing a std::string member alongside your unordered_map that you can reassign values to, then pass that string to find. You could encapsulate this in a custom container class.
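The reusable-string idea can be sketched as follows. InfoTable, its members and the Info payload are hypothetical names for illustration; the point is that std::string::assign can reuse the scratch string's existing capacity, so once the buffer has grown to the longest key looked up so far, later lookups normally do not allocate.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

struct Info { int value; };  // hypothetical payload type

// Wraps the map together with one scratch string that every lookup reuses.
class InfoTable {
public:
    void add(const std::string& key, Info info) { map_[key] = info; }

    // Looks up the characters [buffer, buffer + size).
    // Returns nullptr when the key is absent.
    // Not thread-safe: all lookups share the same scratch string.
    const Info* find(const char* buffer, std::size_t size) {
        assert(buffer != nullptr);
        scratch_.assign(buffer, size);  // copies characters, keeps capacity
        auto it = map_.find(scratch_);
        return it == map_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::string, Info> map_;
    std::string scratch_;
};
```

This still copies the characters on every lookup; it only avoids the repeated allocation and deallocation of a temporary std::string.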
Another route is to write a custom class to use as your unordered container key:
struct CharPtrOrString
{
const char* p_;
std::string s_;
explicit CharPtrOrString(const char* p) : p_{p} { }
CharPtrOrString(std::string s) : p_{nullptr}, s_{std::move(s)} { }
bool operator==(const CharPtrOrString& x) const
{
return p_ ? x.p_ ? std::strcmp(p_, x.p_) == 0
: p_ == x.s_
: x.p_ ? s_ == x.p_
: s_ == x.s_;
}
struct Hash
{
size_t operator()(const CharPtrOrString& x) const
{
std::string_view sv{x.p_ ? x.p_ : x.s_.c_str()};
return std::hash<std::string_view>()(sv);
}
};
};
You can then construct CharPtrOrString from std::strings for use in the unordered container keys, but construct one cheaply from your const char* each time you call find. Note that operator== above has to work out which you did (convention used is that if the pointer's nullptr then the std::string member's in use) so it compares the in-use members. The hash function has to make sure a std::string with a particular textual value will produce the same hash as a const char* (which it doesn't by default with GCC 7.3 and/or Clang 6 - I work with both and remember one had an issue but not which).
In C++20, you can now do this:
// struct is from "https://www.cppstories.com/2021/heterogeneous-access-cpp20/"
struct string_hash {
using is_transparent = void;
[[nodiscard]] size_t operator()(const char *txt) const {
return std::hash<std::string_view>{}(txt);
}
[[nodiscard]] size_t operator()(std::string_view txt) const {
return std::hash<std::string_view>{}(txt);
}
[[nodiscard]] size_t operator()(const std::string &txt) const {
return std::hash<std::string>{}(txt);
}
};
// Declaration of map
std::unordered_map<std::string, Info, string_hash, std::equal_to<>> map;
std::string_view key = "foo";
if (map.find(key) != map.end()) // find returns an iterator, which is not convertible to bool
{
// do something here
}
Just note that you will still need a std::string when using []. There may be a way around that, but I'm not too sure.

Binary search on the vector of structs

I have a vector of structs that contains struct with this architecture
struct Main{
int mainID;
string mainDIV;
string mainNAME;
};
Is it possible to use binary search on a struct? I know it's easy to use on a plain value using
binary_search( vector.begin() , vector.end() , 5 )
But is there a way to pass a callback or something to actually search by an attribute of the struct? I failed to find anything related to this topic.
Yes, it's possible. The value that std::binary_search takes is only meaningful when compared to the elements of the container. In the simple case (if Main supports operator< somewhere), you would provide an element of type Main as the value:
// check if this specific Main exists
bool yes = std::binary_search(v.begin(), v.end(), Main{0, "some", "strings"});
// does exactly the same thing as above
bool yes = std::binary_search(v.begin(), v.end(), Main{0, "some", "strings"}
, std::less<Main>{});
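If it doesn't support operator< (or your container is ordered by something else, e.g. mainID), then you will have to provide a comparator yourself. Note that std::binary_search calls the comparator with the arguments in both orders, so a lambda that only accepts (Main, int) will not compile with it; std::lower_bound only needs that one direction:
// check if there is a Main with mainID 5 (v must be sorted by mainID)
auto it = std::lower_bound(v.begin(), v.end(), 5,
[](const Main& element, const int value) {
return element.mainID < value;
});
bool yes = it != v.end() && it->mainID == 5;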
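For completeness, std::binary_search itself can search by a bare ID if the comparator accepts both argument orders. A minimal sketch, where ByMainID and contains_id are hypothetical names and the vector is assumed to be sorted by mainID already:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct Main {
    int mainID;
    std::string mainDIV;
    std::string mainNAME;
};

// Compares a Main with a bare ID in either argument order, which is what
// std::binary_search needs when the searched value is not itself a Main.
struct ByMainID {
    bool operator()(const Main& element, int id) const { return element.mainID < id; }
    bool operator()(int id, const Main& element) const { return id < element.mainID; }
};

// Returns true if some element has the given mainID.
// Precondition: v is sorted by mainID in ascending order.
bool contains_id(const std::vector<Main>& v, int id) {
    assert(std::is_sorted(v.begin(), v.end(),
                          [](const Main& a, const Main& b) { return a.mainID < b.mainID; }));
    return std::binary_search(v.begin(), v.end(), id, ByMainID{});
}
```

The same functor can be passed to std::equal_range when the matching element itself is needed rather than a yes/no answer.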
You have to provide information to binary_search() to tell it how to compare your objects. The two most common ways, are to either add an operator<() to the struct if that is possible, or provide a helper function that can compare two structs.
The first form would look something like this:
struct Main {
int mainID ;
string mainDIV ;
string mainNAME ;
bool operator<(const Main & other) const
{
return mainID < other.mainID ;
}
};
This will only compare on mainID, but you can expand it from there.
Also, this only teaches the compiler how to compare two struct Main, while @Barry's answer above will match an int and a struct Main. But let's keep going with this answer.
Now to find the record for 5, we have to make it into a struct Main:
struct Main search_key = { 5 } ;
bool yes = std::binary_search( v.begin(), v.end(), search_key ) ;
Now, this isn't very elegant, and besides if you have a constructor for struct Main ( and haven't put it in your example ), this won't even work. So we add another constructor just for int.
struct Main
{
Main(int id, const string & a_div, const string & a_name ) : id(id), div(a_div), name(a_name) { }
Main(int id) : id(id) { }
int id ;
string div, name ;
bool operator<(const Main &o) const { return id < o.id ; }
} ;
Now we can do a slightly shorter form:
bool has_3 = std::binary_search( v.begin(), v.end(), Main( 3) ) ;
Historical note: Bjarne has been trying for quite some time to get default comparison operators into the standard, but not everyone was excited about it at the standards meetings. I thought there was some progress on it at the last meeting and so it may eventually appear when C++17 is a thing.

Find array element by member value - what are "for" loop/std::map/Compare/for_each alternatives?

Example routine:
const Armature* SceneFile::findArmature(const Str& name){
for (int i = 0; i < (int)armatures.size(); i++)
if (name == armatures[i].name)
return &armatures[i];
return 0;
}
Routine's purpose is (obviously) to find a value within an array of elements, based on element's member variable, where comparing member variable with external "key" is search criteria.
One way to do it is to iterate through array in loop. Another is to use some kind of "map" class (std::map, some kind of vector sorted values + binarySearch, etc, etc). It is also possible to make a class for std::find or for std::for_each and use it to "wrap" the iteration loop.
What are other ways to do that?
I'm looking for alternative ways/techniques to extract the required element.
Ideally - I'm looking for a language construct, or a template "combo", or a programming pattern I don't know of that would collapse entire loop or entire function into one statement. Preferably using standard C++/STL features (no C++0x, until it becomes a new standard) AND without having to write additional helper classes (i.e. if helper classes exist, they should be generated from existing templates).
I.e. something like std::find where comparison is based on class member variable, and a variable is extracted using standard template function, or if variable (the one compared against "key"("name")) in example can be selected as parameter.
The purpose of the question is to discover a language feature or programming technique I don't know yet. I suspect that there may be an applicable construct/template/function/technique similar to for_each, and knowing this technique may be useful. That is the main reason for asking.
Ideas?
If you have access to Boost or another tr1 implementation, you can use bind to do this:
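const Armature * SceneFile::findArmature(const char * name) {
// Assumes armatures is a std::vector<Armature> and Armature::name is a std::string.
// Comparing a bind expression with == 0 relies on boost::bind's operator overloads.
std::vector<Armature>::const_iterator it = find_if(armatures.begin(), armatures.end(),
bind(_stricmp, name, bind(&string::c_str, bind(&Armature::name, _1))) == 0);
return it == armatures.end() ? 0 : &*it;
}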
Caveat: I suspect many would admit that this is shorter, but claim it fails on the more elegant/simpler criteria.
Sure looks like a case for std::find_if -- as the predicate, you could use e.g. a suitable bind1st. I'm reluctant to say more as this smacks of homework a lot...;-).
Why 5 lines? Clean doesn't have a number attached to it. In fact, clean code might take more lines in the utility classes, which can then be reused over and over. Don't restrict yourself unnecessarily.
class by_name
{
public:
by_name(const std::string& pName) :
mName(pName)
{}
template <typename T>
bool operator()(const T& pX) const
{
return pX.name == mName;
}
private:
std::string mName;
};
Then:
const Armature* SceneFile::findArmature(const char* name)
{
// whatever the iterator type name is
auto iter = std::find_if(armatures.begin(), armatures.end(), by_name(name));
return iter == armatures.end() ? 0 : &(*iter);
}
Within restriction:
class by_name { public: by_name(const std::string& pName) : mName(pName) {} template <typename T> bool operator()(const T& pX) const { return pX.name == mName; } private: std::string mName; };
Then:
const Armature* SceneFile::findArmature(const char* name)
{
// whatever the iterator type name is
auto iter = std::find_if(armatures.begin(), armatures.end(), by_name(name));
return iter == armatures.end() ? 0 : &(*iter);
}
:)
C++0x has range-based for-loops, which I think would make the most elegant solution:
const Armature* SceneFile::findArmature(const std::string& pName) const
{
// iterate by const reference: with a by-value loop variable, &a would point to a temporary copy
for (const auto& a : armatures)
{
if (a.name == pName) return &a;
}
return 0;
}
You would probably need to use STL map. It gives you possibility to get the element using keys. Your key would be the name of armature.
http://www.cplusplus.com/reference/stl/map/
EDIT: :D
one liner B-)
const Armature* SceneFile::findArmature(const Str& name){for (int i = 0; i < (int)armatures.size(); i++) if(name == armatures[i].name) return &armatures[i]; return 0;}
Holy shiz, you're using _stricmp? FAIL. Also, you didn't actually tell us the type of the vectors or any of the variables involved, so this is just guesswork.
const Armature* SceneFile::findArmature(const std::string& lols) {
for(auto it = armatures.begin(); it != armatures.end(); it++) {
if (boost::iequals(lols, (*it).name))
return &(*it);
}
return NULL;
}
Ultimately, if you need this, you should put the armatures or pointers to them in a std::map. A vector is the wrong container if you're searching into it, they're best for when the collection is what's important rather than any finding behaviour.
Edited to use a std::string reference.
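A rough sketch of the std::map suggestion, under the assumption that names are unique and that lookups are case-sensitive (unlike the iequals version above). The Armature struct and the addArmature helper are hypothetical stand-ins, since the question does not show the real types.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in; the real Armature is not shown in the question.
struct Armature {
    std::string name;
    int boneCount;
};

class SceneFile {
public:
    // Stores the armature under its name; a later one with the same name replaces it.
    void addArmature(const Armature& a) {
        assert(!a.name.empty());
        armatures[a.name] = a;
    }

    // Logarithmic-time lookup by exact name; returns 0 when nothing matches.
    const Armature* findArmature(const std::string& name) const {
        std::map<std::string, Armature>::const_iterator it = armatures.find(name);
        return it == armatures.end() ? 0 : &it->second;
    }

private:
    std::map<std::string, Armature> armatures;
};
```

The explicit iterator type keeps the sketch within the question's "no C++0x" restriction. Pointers returned by findArmature stay valid across later insertions, because std::map does not relocate its elements.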