C++ unordered_map<string, ...> lookup without constructing string - c++

I have C++ code that investigates a BIG string and matches lots of substrings. As much as possible, I avoid constructing std::strings, by encoding substrings like this:
char* buffer, size_t bufferSize
At some point, however, I'd like to look up a substring in one of these:
std::unordered_map<std::string, Info> stringToInfo = {...
So, to do that, I go:
stringToInfo.find(std::string(buffer, bufferSize))
That constructs a std::string for the sole purpose of the lookup.
I feel like there's an optimization I could do here, by... changing the key-type of the unordered_map to some kind of temporary string imposter, a class like this...
class SubString
{
char* buffer;
size_t bufferSize;
// ...
};
... that does the same logic as std::string to hash and compare, but then doesn't deallocate its buffer when it's destroyed.
So, my question is: is there a way to get the standard classes to do this, or do I write this class myself?

What you're wanting to do is called heterogeneous lookup. Since C++14 it's been supported for std::map::find and std::set::find (note versions (3) and (4) of the functions, which are templated on the lookup value type). It's more complicated for unordered containers because they need to be told of or find hash functions for all key types that will produce the same hash value for the same text. There's a proposal under consideration for a future Standard: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0919r0.html
Meanwhile, you could use another library that already supports heterogeneous lookup, e.g. boost::unordered_map::find.
If you want to stick to std::unordered_map, you could avoid creating so many string temporaries by storing a std::string member alongside your unordered_map that you can reassign values to, then pass that string to find. You could encapsulate this in a custom container class.
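A minimal sketch of the reusable-string idea, assuming a wrapper class of your own (the name StringLookup and its interface are hypothetical, not part of any library). The scratch string is reassigned on every lookup, so once it has grown to the longest key it has seen, later lookups usually do not allocate:

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical wrapper: keeps one scratch std::string next to the map so that
// repeated lookups reuse its capacity instead of building a new temporary.
template <typename Info>
class StringLookup
{
public:
    void insert(std::string key, Info info)
    {
        map_.emplace(std::move(key), std::move(info));
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    // Not const and not thread-safe, because it writes to the scratch string.
    const Info* find(const char* buffer, std::size_t bufferSize)
    {
        scratch_.assign(buffer, bufferSize);  // reuses capacity when it is large enough
        auto it = map_.find(scratch_);
        return it == map_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::string, Info> map_;
    std::string scratch_;
};
```

The trade-off is that the characters are still copied on each lookup; only the allocation is avoided.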
Another route is to write a custom class to use as your unordered container key:
struct CharPtrOrString
{
const char* p_;
std::string s_;
explicit CharPtrOrString(const char* p) : p_{p} { }
CharPtrOrString(std::string s) : p_{nullptr}, s_{std::move(s)} { }
bool operator==(const CharPtrOrString& x) const
{
return p_ ? x.p_ ? std::strcmp(p_, x.p_) == 0
: p_ == x.s_
: x.p_ ? s_ == x.p_
: s_ == x.s_;
}
struct Hash
{
size_t operator()(const CharPtrOrString& x) const
{
std::string_view sv{x.p_ ? x.p_ : x.s_.c_str()};
return std::hash<std::string_view>()(sv);
}
};
};
You can then construct CharPtrOrString from std::strings for use in the unordered container keys, but construct one cheaply from your const char* each time you call find. Note that operator== above has to work out which you did (convention used is that if the pointer's nullptr then the std::string member's in use) so it compares the in-use members. The hash function has to make sure a std::string with a particular textual value will produce the same hash as a const char* (which it doesn't by default with GCC 7.3 and/or Clang 6 - I work with both and remember one had an issue but not which).

In C++20, you can now do this:
// struct is from "https://www.cppstories.com/2021/heterogeneous-access-cpp20/"
struct string_hash {
using is_transparent = void;
[[nodiscard]] size_t operator()(const char *txt) const {
return std::hash<std::string_view>{}(txt);
}
[[nodiscard]] size_t operator()(std::string_view txt) const {
return std::hash<std::string_view>{}(txt);
}
[[nodiscard]] size_t operator()(const std::string &txt) const {
return std::hash<std::string>{}(txt);
}
};
// Declaration of map
std::unordered_map<std::string, Info, string_hash, std::equal_to<>> map;
std::string_view key = "foo";
if (map.find(key) != map.end()) // find returns an iterator, so compare it against end()
{
// do something here
}
Just note that you will still need std::string when using []. There may be a way around that, but I'm not too sure
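One possible way around the operator[] limitation is to look the key up first and build a std::string only when an insertion is actually needed. The sketch below is an assumption about how that could look, not an established API: sv_hash, CountMap and get_or_insert are hypothetical names, and the feature-test macro guards the heterogeneous find so the code still compiles on pre-C++20 toolchains (where it falls back to a temporary string):

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <unordered_map>

// Transparent hasher, same idea as the string_hash struct above.
struct sv_hash {
    using is_transparent = void;
    std::size_t operator()(std::string_view txt) const {
        return std::hash<std::string_view>{}(txt);
    }
};

using CountMap = std::unordered_map<std::string, int, sv_hash, std::equal_to<>>;

// Hypothetical stand-in for operator[]: finds the key heterogeneously and
// constructs a std::string only when the key has to be inserted.
int& get_or_insert(CountMap& map, std::string_view key) {
#if defined(__cpp_lib_generic_unordered_lookup)
    auto it = map.find(key);               // C++20: no temporary std::string
#else
    auto it = map.find(std::string(key));  // older standards: falls back to a temporary
#endif
    if (it == map.end()) {
        it = map.emplace(std::string(key), 0).first;
    }
    return it->second;
}
```

With this helper, get_or_insert(map, key) += 1 behaves like map[key] += 1 for an int value, but only allocates on the first occurrence of each key.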

Related

returning a std::string from a variant which can hold std::string or double

I have the following code:
#include <variant>
#include <string>
#include <iostream>
using Variant = std::variant<double, std::string>;
// helper type for the visitor
template<class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
// explicit deduction guide (not needed as of C++20)
template<class... Ts> overloaded(Ts...) -> overloaded<Ts...>;
std::string string_from(const Variant& v)
{
return std::visit(overloaded {
[](const double arg) { return std::to_string(arg); },
[](const std::string& arg) { return arg; },
}, v);
}
int main()
{
Variant v1 {"Hello"};
Variant v2 {1.23};
std::cout << string_from(v1) << '\n';
std::cout << string_from(v2) << '\n';
return 0;
}
I have a function called string_from() which takes a variant and converts its inner value to a string.
The variant can hold either a std::string or a double.
In case of a std::string, I just return it.
In case of a double, I create a std::string from the double and then return it.
The problem is, I don't like the fact that I'm returning a copy of the std::string in case of a string-variant. Ideally, I would return a std::string_view or another kind of string observer.
However, I cannot return a std::string_view because in case of a double-variant I need to create a new temporary std::string and std::string_view is non-owning.
I cannot return a std::string& for the same reason.
I'm wondering if there's a way to optimize the code so that I can avoid the copy in case of a string-variant.
Note in my actual use case, I obtain strings from string-variants very frequently, but very rarely from double-variants.
But I still want to be able to obtain a std::string from a double-variant.
Also, in my actual use case, I usually just observe the string, so I don't really need the copy every time. std::string_view or some other string-observer would be perfect in this case, but it is impossible due to the reasons above.
I've considered several possible solutions, but I don't like any of them:
return a char* instead of a std::string and allocate the c-string somewhere on the heap in case of a double. In this case, I would also need to wrap the whole thing in a class which owns the heap-allocated strings to avoid memory leaks.
return a std::unique_ptr<std::string> with a custom deleter which would cleanup the heap-allocated strings, but would do nothing in case the string resides in the variant. Not sure how this custom deleter would be implemented.
Change the variant so it holds a std::shared_ptr<std::string> instead. Then when I need a string from the string-variant I just return a copy of the shared_ptr and when I need a string from the double-variant I call std::make_shared().
The third solution has an inherent problem: the std::string no longer resides in the variant, which means chasing pointers and losing performance.
Can you propose any other solutions to this problem? Something which performs better than copying a std::string every time I call the function.
You can return a proxy object. (this is like your unique_ptr method)
struct view_as_string{
view_as_string(const std::variant<double, std::string>& v){
auto s = std::get_if<std::string>(&v);
if(s) ref = s;
else temp = std::to_string(std::get<double>(v));
}
const std::string& data(){return ref?*ref:temp;}
const std::string* ref = nullptr;
std::string temp;
};
Use
int main()
{
std::variant<double, std::string> v1 {"Hello"};
std::variant<double, std::string> v2 {1.23};
std::cout << view_as_string(v1).data() << '\n';
view_as_string v2s(v2);
std::cout << v2s.data() << '\n';
}
The problem is, a variant holds different types, but you're trying to find a way to represent all of them in a single type. A string representation is useful for generic logging, but it has the downsides you describe.
For variants, I don't like trying to consolidate the values back into a single common thing, because if that was easily possible then there would be no need for the variant in the first place.
Better, I think, is to defer the conversion as late as possible, and keep forwarding it on to other functions that make use of the value as it is, or convert and forward until it's used--rather than trying to extract a single value and trying to use that.
A fairly generic function might look like this:
template <typename Variant, typename Handler>
auto with_string_view(Variant const & variant, Handler && handler) {
return std::visit(overloaded{
[&](auto const & obj) {
using std::to_string;
return handler(to_string(obj));
},
[&](std::string const & str) {return handler(str); },
[&](std::string_view str) { return handler(str); },
[&](char const * str) { return handler(str); }
}, variant);
}
Since the temporary created in the generic version outlives the call to the handler, this is safe and efficient. It also shows the "forward it on" technique that I've found to be very useful with variants (and visiting in general, even for non-variants.)
Also, I don't explicitly convert to string_view, but the function could add requirements that the handler accepts string views (if that helps document the usage.)
With the above helper function you might use it like this:
using V = std::variant<std::string, double>;
V v1{4.567};
V v2{"foo"};
auto print = [](std::string_view sv) { std::cout << sv << "\n";};
with_string_view(v1, print);
with_string_view(v2, print);
Here's a full live example, expanded out a little too: https://godbolt.org/z/n7KhEW7vY
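If you do want the string_view requirement on the handler spelled out, one way is a static_assert. The following is only a sketch of that variation: the name with_string_view_checked is made up, and it assumes every alternative in the variant is either arithmetic or convertible to std::string_view:

```cpp
#include <string>
#include <string_view>
#include <type_traits>
#include <variant>

// Variation on with_string_view (hypothetical name): the static_assert documents
// that the handler must be callable with a std::string_view.
template <typename Variant, typename Handler>
auto with_string_view_checked(Variant const& variant, Handler&& handler) {
    static_assert(std::is_invocable_v<Handler&, std::string_view>,
                  "handler must accept a std::string_view");
    return std::visit([&](auto const& obj) {
        using T = std::decay_t<decltype(obj)>;
        if constexpr (std::is_arithmetic_v<T>) {
            std::string const tmp = std::to_string(obj);  // lives until handler returns
            return handler(std::string_view{tmp});
        } else {
            return handler(std::string_view{obj});        // no copy for string-like alternatives
        }
    }, variant);
}
```

A handler such as [](std::string_view sv) { return sv.size(); } can then be passed for both the double and the string alternative, and the string case hands it a view straight into the variant.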
If thread safety is not an issue, you could simply use a static std::string as the backing storage when returning a double value. Then you would be able to return a std::string_view, eg:
std::string_view string_from(const Variant& v)
{
static std::string buffer;
return std::visit(overloaded {
[](const double arg) -> std::string_view { buffer = std::to_string(arg); return buffer; }, // no capture needed (or allowed): buffer has static storage duration
[](const std::string& arg) -> std::string_view { return arg; },
}, v);
}
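If several threads may call the function, the static buffer above could be made thread_local instead. This is a sketch under that assumption (string_from_tl is a hypothetical name); the view returned for a double is still only valid until the next call on the same thread:

```cpp
#include <string>
#include <string_view>
#include <variant>

using Variant = std::variant<double, std::string>;

// Same idea as the static buffer, but with one buffer per thread.
std::string_view string_from_tl(const Variant& v)
{
    thread_local std::string buffer;
    if (const std::string* s = std::get_if<std::string>(&v)) {
        return *s;  // view into the variant itself, no copy
    }
    buffer = std::to_string(std::get<double>(v));
    return buffer;  // view into this thread's buffer
}
```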
I've come up with my own solution inspired by apple apple's solution with the view_as_string class.
Here it is:
class owning_string_view : public std::string_view
{
public:
explicit owning_string_view(const char* str) : std::string_view{str}, m_string_buffer{} {}
explicit owning_string_view(const std::string& str) : std::string_view{str}, m_string_buffer{} {}
explicit owning_string_view(std::string&& str) : std::string_view{}, m_string_buffer{std::move(str)}
{
static_cast<std::string_view&>(*this) = m_string_buffer;
}
private:
std::string m_string_buffer;
};
Instead of taking a Variant I made it more generic and it takes strings instead.
For lvalue strings it just creates a std::string_view of the string.
For rvalue strings it moves the string into the buffer.
It extends from std::string_view so it can be used in std::string_view contexts seamlessly.
Of course, you have to be careful not to slice off the std::string_view part from the object when creating an rvalue owning_string_view, but this is true for std::string_view as well: you have to be careful not to take a std::string_view from an rvalue std::string.
Passing an owning_string_view as a std::string_view parameter to a function is safe for the same reason it is safe to pass an rvalue std::string as a std::string_view parameter to a function: the rvalue lives for the duration of the function call.
I also realized a deeper problem when returning a string_view from my Variant class.
If I try to extract a std::string_view or an owning_string_view from an rvalue Variant, I'm still going to end up with a dangling string_view, so I added 2 functions for taking a string from the Variant:
one accepts lvalue variants only and it returns owning_string_view.
the other accepts rvalue variants only and it returns a std::string, which is moved from the variant (since the variant is an rvalue).
One more observation: Ideally, I would make the first 2 constructors of owning_string_view constexpr but I can't because the default constructor of std::string is not constexpr. I hope this is changed in the future.

CPP: Use of deleted function

I am trying to sort a vector that contains custom struct entries using a lambda function in C++, but I get the following error message:
error: use of deleted function ‘dummy_struct& dummy_struct::operator=(const dummy_struct&)’
The code looks like the following:
#include <regex>
struct dummy_struct
{
dummy_struct(std::string name, int64_t value_a) :
name(name),
value_a(value_a)
{}
const std::string name;
const int64_t value_a;
int ExtractNumberFromName(std::regex key)
{
int retval;
std::cmatch match;
std::regex_search(this->name.c_str(),match,key);
retval=std::stoi(match[0],nullptr);
return retval;
}
};
void SortByCustomKey(const std::vector<dummy_struct> collection, std::regex key)
{
auto compare = [key](dummy_struct a, dummy_struct b)
{
return a.ExtractNumberFromName(key) > b.ExtractNumberFromName(key);
};
std::sort(std::begin(collection),std::end(collection),compare);
}
int main()
{
std::vector<dummy_struct> test;
test.push_back(dummy_struct("Entry[1]",1));
test.push_back(dummy_struct("Entry[2]",2));
test.push_back(dummy_struct("Entry[3]",3));
SortByCustomKey(test,std::regex("[0-9]+"));
}
What am I missing here?
std::sort sorts the vector by moving and swapping its elements in place.
This requires your class to have a copy (or move) assignment operator, which the compiler won't generate for you because of the const fields in the class. For your example, the only solution seems to be to remove the const qualifiers from the fields. If you don't want them to be modified, just make them private and don't provide (public) setters.
If they absolutely must stay there and you just want to get your values in sorted order you can use a different structure or store pointers in the vector.
Another solution is to write a custom swap implementation for your class that would const_cast away the qualifiers of the fields for the purpose of the assignment, but modifying an object that was declared const is undefined behavior, so this is best avoided.
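As a rough sketch of the "private fields, no public setters" suggestion (the names Entry and SortByValueDescending are illustrative, and the regex-based key from the question is replaced by a plain value comparison to keep the example short):

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// The members are not const, so the compiler still generates the assignment
// operators std::sort needs; callers can only read them through the getters.
class Entry
{
public:
    Entry(std::string name, std::int64_t value) : name_(std::move(name)), value_(value) {}
    const std::string& name() const { return name_; }
    std::int64_t value() const { return value_; }

private:
    std::string name_;
    std::int64_t value_;
};

// Sorts in descending order of value. The vector is taken by non-const
// reference so that the caller's vector is the one that gets sorted.
void SortByValueDescending(std::vector<Entry>& collection)
{
    std::sort(collection.begin(), collection.end(),
              [](const Entry& a, const Entry& b) { return a.value() > b.value(); });
}
```

The comparator takes its arguments by const reference, which avoids copying each element on every comparison.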

Is std::string an object?

just looking in optimizing some std::map code. The map contains objects, accessed via the string-identifier.
Example:
std::map<std::string, CVeryImportantObject> theMap;
...
theMap["second"] = new CVeryImportantObject();
Now, when using the find-function as theMap->find("second"), the String is converted into std::string("second"), which causes new string allocations (over all when using IDL=2 with Visual Studio).
1. Is there a possibility to use a string-only class to avoid such allocations?
Intentionally I've tried to use another String-Class as well:
std::map<CString, CVeryImportantObject> theMap;
This code works also. But CString indeed is an object.
And: If you remove an object from the map, I'll need to release both the related object and the key, do I?
Any suggestions?
Now, when using the find-function as theMap->find("second"), the
String is converted into std::string("second"), which causes new
string allocations (over all when using IDL=2 with Visual Studio).
This is a Standard issue, which is fixed in C++14 for ordered containers. The newest version of VS, VS 14 CTP (which is a pre-release) contains a fix for this issue, as will new versions of other implementations.
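For illustration, the C++14 fix is opted into by giving the map a transparent comparator such as std::less<>. In this sketch int stands in for CVeryImportantObject, and the alias ObjectMap and the helper contains are hypothetical names:

```cpp
#include <functional>
#include <map>
#include <string>

// std::less<> is a transparent comparator, so find() can compare the stored
// std::string keys against a const char* directly (C++14 and later).
using ObjectMap = std::map<std::string, int, std::less<>>;

bool contains(const ObjectMap& m, const char* key)
{
    return m.find(key) != m.end();  // no temporary std::string is created
}
```

With the default comparator std::less<std::string>, the same call would first convert the literal to a std::string.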
If you need to avoid allocations, you can try a class like llvm::StringRef, which can refer to std::string or string literals interchangeably, but then you will be left trying to handle the ownership externally.
You can try something like unique_ptr<char[], maybe_delete> that sometimes deletes the contents. This is a bit of a mess to interface with though.
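A sketch of what such a deleter might look like. maybe_delete is not a standard type, so its definition here is an assumption, as are the helper names borrow and copy_of; the element type is const char[] rather than char[] so that string literals can be borrowed:

```cpp
#include <cstring>
#include <memory>

// Hypothetical deleter: frees the array only when it was told it owns it.
struct maybe_delete
{
    bool owns = false;
    void operator()(const char* p) const
    {
        if (owns) delete[] p;
    }
};

using maybe_owned_str = std::unique_ptr<const char[], maybe_delete>;

// Refers to an existing string (e.g. a literal) without taking ownership.
maybe_owned_str borrow(const char* s)
{
    return maybe_owned_str(s, maybe_delete{false});
}

// Makes a heap copy that is released when the pointer is destroyed.
maybe_owned_str copy_of(const char* s)
{
    char* p = new char[std::strlen(s) + 1];
    std::strcpy(p, s);
    return maybe_owned_str(p, maybe_delete{true});
}
```

Using this as a map key would still need a comparator that compares the pointed-to characters, which is part of why it is awkward to interface with.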
And: If you remove an object from the map, I'll need to release both
the related object and the key, do I?
The map will automatically destruct the key and value for you. For a class that frees its own resources, like std::string (which is the only sane way to write C++), you can erase without worrying about resource cleanup.
If you always use string constants as keys, you can use const char * as key type in map when you use proper comparator:
struct PCharCompare {
bool operator()( const char *s1, const char *s2 ) const { return strcmp( s1, s2 ) < 0; }
};
std::map< const char *, CVeryImportantObject, PCharCompare> theMap;
Note: you have to be careful and need to understand how it works, as it can easily lead to UB:
void foo() {
char buffer[256];
snprintf( buffer, sizeof( buffer ), "blah" );
theMap.insert( std::make_pair( buffer, Object ) );
} // oops: the map now holds a dangling pointer to buffer
As for optimization, it is very unlikely that std::string creation is the culprit. You may try std::unordered_map or something similar instead.
Now, when using the find-function as theMap->find("second"), the
String is converted into std::string("second"), which causes new
string allocations
Not necessarily. VC uses Small-String Optimisation (SSO). This means that for a string as short as "second", no allocation on the heap should take place at all; the characters will instead be stored directly in the temporarily created std::string object.
This is still not free (because the std::string has to be created, albeit without any dynamic allocation happening inside), but should be good enough. Is it really a concern for you? Chances are very high that it does not cause any measurable performance decrease.
Is there a possibility to use a string-only class to avoid such allocations?
Not really, except for the C++14 fix mentioned in other answers. Using char const * as the key type is very dangerous, because std::map will only store the actual addresses, not copies of the keys.
If I were you and if I really experienced performance problems, I'd just not use std::map directly but create my own container class to wrap a std::map<char const *, T, CustomComparison> and do the hard pointer work inside.
template <class ValueType>
class FastStringMap
{
private:
struct Comparison
{
bool operator()(char const *lhs, char const *rhs) const
{
return strcmp(lhs, rhs) > 0;
}
};
typedef std::map<char const *, ValueType, Comparison> WrappedMap;
WrappedMap m_map;
public:
typedef typename WrappedMap::iterator iterator;
typedef typename WrappedMap::const_iterator const_iterator;
bool insert(char const *key, ValueType const &value)
{
if (m_map.find(key) != m_map.end())
{
return false;
}
else
{
char *copy = new char[strlen(key) + 1];
strcpy(copy, key);
try
{
return m_map.insert(std::make_pair(copy, value)).second;
}
catch (...)
{
delete[] copy; // allocated with new[], so it must be released with delete[]
throw;
}
}
}
~FastStringMap()
{
for (iterator iter = m_map.begin(); iter != m_map.end(); ++iter)
{
delete[] iter->first;
}
}
iterator find(char const *key)
{
return m_map.find(key);
}
const_iterator find(char const *key) const
{
return m_map.find(key);
}
// further operations
};
To be used like this:
FastStringMap<int> m;
m.insert("AAA", 1);
m.insert("BBB", 2);
m.insert("CCC", 3);
std::cout << m.find("AAA")->second;
Note that you can possibly make this more sophisticated by templatising also on the character type (for std::wstring support) or by providing "real" iterator classes (using Boost Iterator Facade).
And: If you remove an object from the map, I'll need to release both
the related object and the key, do I?
If you use std::string, no. If you use char const * and if the pointers point to memory allocated dynamically (as in my example), then yes.

Avoiding Helper Functions for Doing Comparisons

Say I have a type with a member function:
class Thing {
std::string m_name;
public:
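std::string & getName() {
return m_name;
}
// const overload so that getName() can be called through a Thing const&
std::string const & getName() const {
return m_name;
}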
};
And say I have a collection of that type:
std::vector<Thing> things;
And I want to keep the things in order by name. To do that, I use std::lower_bound to figure out where to put it:
bool thingLessThan(Thing const& thing, std::string const& name) {
return thing.getName() < name;
}
void addThing(std::string const& name) {
vector<Thing>::iterator position = lower_bound(
things.begin(), things.end(),
name,
thingLessThan);
if (position == things.end() || position->getName() != name) {
position = things.insert(position, Thing());
position->getName() = name;
}
}
Is there a way to do the same thing as the thingLessThan function without actually creating a function, perhaps using std::mem_fun, std::less, etc?
Other than a lambda, you can simply define an operator< that adheres to strict weak ordering, so that a container of your objects can be compared by the STL algorithms with the default predicate std::less:
class whatever
{
public:
bool operator<(const whatever& rhs) const { return x < rhs.x; }
private:
int x;
};
std::vector<whatever> v;
std::sort(v.begin(), v.end());
Sure. You can use a lambda expression (assuming your compiler supports it):
vector<Thing>::iterator position = lower_bound(
things.begin(), things.end(),
name,
[](Thing const& thing, std::string const& name) { return thing.getName() < name; });
Of course, an alternative option is just to define operator< for the class, then it will be used by default, if you don't specify another comparer function for std::lower_bound.
It depends on what your purpose is. If you just like the syntactic niceness of not declaring something to be used in one place, use lambda expressions to create an anonymous function.
You can overload operator<() and use std::less<T> if you don't want to write predicates constantly. You can also use lambda expressions, which would be much nicer, because operator<() is logically connected only with things that can be put in some order in obvious ways, like numbers or strings.
If you use a std::map, the strings will be placed in alphabetical order automatically. If you want to modify the ordering further, create your own key comparison function. I think this would be the simplest option.
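As an illustration of supplying your own key comparison, here is a sketch that orders names case-insensitively. CaseInsensitiveLess and ThingsByName are made-up names, and int stands in for whatever data a Thing carries besides its name:

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Hypothetical comparison: orders names without regard to case.
struct CaseInsensitiveLess
{
    bool operator()(const std::string& a, const std::string& b) const
    {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) { return std::tolower(x) < std::tolower(y); });
    }
};

// The map keeps its entries sorted by name using the comparison above.
using ThingsByName = std::map<std::string, int, CaseInsensitiveLess>;
```

With this comparator, "apple" sorts before "Banana", and "APPLE" is treated as the same key as "apple".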
To use a std::list, you can write your own comparison code inside of the addThing() function that goes through the list looking at each string and inserts the new one at the appropriate place.

Find array element by member value - what are "for" loop/std::map/Compare/for_each alternatives?

Example routine:
const Armature* SceneFile::findArmature(const Str& name){
for (int i = 0; i < (int)armatures.size(); i++)
if (name == armatures[i].name)
return &armatures[i];
return 0;
}
Routine's purpose is (obviously) to find a value within an array of elements, based on element's member variable, where comparing member variable with external "key" is search criteria.
One way to do it is to iterate through array in loop. Another is to use some kind of "map" class (std::map, some kind of vector sorted values + binarySearch, etc, etc). It is also possible to make a class for std::find or for std::for_each and use it to "wrap" the iteration loop.
What are other ways to do that?
I'm looking for alternative ways/techniques to extract the required element.
Ideally - I'm looking for a language construct, or a template "combo", or a programming pattern I don't know of that would collapse entire loop or entire function into one statement. Preferably using standard C++/STL features (no C++0x, until it becomes a new standard) AND without having to write additional helper classes (i.e. if helper classes exist, they should be generated from existing templates).
I.e. something like std::find where comparison is based on class member variable, and a variable is extracted using standard template function, or if variable (the one compared against "key"("name")) in example can be selected as parameter.
The purpose of the question is to discover/find language feature/programming technique I don't know yet. I suspect that there may be an applicable construct/template/function/technique similar to for_each, and knowing this technique may be useful. Which is the main reason for asking.
Ideas?
If you have access to Boost or another tr1 implementation, you can use bind to do this:
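const Armature * SceneFile::findArmature(const char * name) {
// Assumes armatures is a std::vector<Armature>; the == on the bind result relies on boost::bind.
std::vector<Armature>::const_iterator it = find_if(armatures.begin(), armatures.end(),
bind(_stricmp, name, bind(&string::c_str, bind(&Armature::name, _1))) == 0);
return it == armatures.end() ? 0 : &*it;
}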
Caveat: I suspect many would admit that this is shorter, but claim it fails on the more elegant/simpler criteria.
Sure looks like a case for std::find_if -- as the predicate, you could use e.g. a suitable bind1st. I'm reluctant to say more as this smacks of homework a lot...;-).
Why 5 lines? Clean doesn't have a number attached to it. In fact, clean code might take more lines in the utility classes, which can then be reused over and over. Don't restrict yourself unnecessarily.
class by_name
{
public:
by_name(const std::string& pName) :
mName(pName)
{}
template <typename T>
bool operator()(const T& pX)
{
return pX.name == mName; // compare against the stored member; pName only exists in the constructor
}
private:
std::string mName;
};
Then:
const Armature* SceneFile::findArmature(const char* name)
{
// whatever the iterator type name is
auto iter = std::find_if(armatures.begin(), armatures.end(), by_name(name));
return iter == armatures.end() ? 0 : &(*iter);
}
Within restriction:
class by_name { public: by_name(const std::string& pName) : mName(pName) {} template <typename T> bool operator()(const T& pX) { return pX.name == mName; } private: std::string mName; };
Then:
const Armature* SceneFile::findArmature(const char* name)
{
// whatever the iterator type name is
auto iter = std::find_if(armatures.begin(), armatures.end(), by_name(name));
return iter == armatures.end() ? 0 : &(*iter);
}
:)
C++0x has range-based for-loops, which I think would make the most elegant solution:
const Armature* SceneFile::findArmature(const std::string& pName) const
{
// iterate by reference so the returned pointer refers to the stored element, not a copy
for (const auto& a : armatures)
{
if (a.name == pName) return &a;
}
return 0;
}
You would probably need to use STL map. It gives you possibility to get the element using keys. Your key would be the name of armature.
http://www.cplusplus.com/reference/stl/map/
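A minimal sketch of the map-based lookup, written without C++0x features. The definitions of Armature and SceneFile are stand-ins, since the question does not show the real ones, and addArmature is a hypothetical helper:

```cpp
#include <map>
#include <string>

// Minimal stand-ins for the types in the question.
struct Armature
{
    std::string name;
};

class SceneFile
{
public:
    void addArmature(const Armature& a) { armatures[a.name] = a; }

    // Logarithmic lookup by name; returns 0 when there is no such armature.
    const Armature* findArmature(const std::string& name) const
    {
        std::map<std::string, Armature>::const_iterator it = armatures.find(name);
        return it == armatures.end() ? 0 : &it->second;
    }

private:
    std::map<std::string, Armature> armatures;
};
```

The cost is that the name is stored twice (as key and as member) and the armatures are no longer kept in insertion order.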
EDIT: :D
one liner B-)
const Armature* SceneFile::findArmature(const Str& name){for (int i = 0; i < (int)armatures.size(); i++) if(name == armatures[i].name) return &armatures[i]; return 0;}
Holy shiz, you're using _stricmp? FAIL. Also, you didn't actually tell us the type of the vectors or any of the variables involved, so this is just guesswork.
const Armature* SceneFile::findArmature(const std::string& lols) {
for(auto it = armatures.begin(); it != armatures.end(); it++) {
if (boost::iequals(lols, (*it).name))
return &(*it);
}
return NULL; // only reached when no armature matched
}
Ultimately, if you need this, you should put the armatures or pointers to them in a std::map. A vector is the wrong container if you're searching into it, they're best for when the collection is what's important rather than any finding behaviour.
Edited to use a std::string reference.