It is easy to find a string in a set of strings using std::set::find, or the first of a set of strings in a set of strings using std::find_first_of. But I think the STL doesn't handle the case of finding the first of a set of strings (substrings) in a string. For low-latency reasons I use parallel execution. Could you please let me know whether this implementation is idiomatic modern C++:
#include <string>
#include <list>
#include <atomic>
#include <execution>
#include <iostream>
class Intent{
const std::list<std::string> m_Context;
const std::string m_Name;
std::atomic_bool m_Found;
public:
Intent(const std::list<std::string> context, const std::string name)
: m_Context(context)
, m_Name(name)
, m_Found(false)
{}
Intent(const Intent & intent) = delete;
Intent & operator=(const Intent & intent) = delete;
Intent(Intent && intent) : m_Context(std::move(intent.m_Context))
, m_Name(std::move(intent.m_Name))
, m_Found(static_cast< bool >(intent.m_Found))
{}
bool find(const std::string & sentence)
{
for_each( std::execution::par
, std::begin(m_Context)
, std::end(m_Context)
, [& m_Found = m_Found, & sentence](const std::string & context_element){
//
// Maybe after launching thread per context_element one of them make intent Found
// so no need to run string::find in the remaining threads.
//
if(!m_Found){
if(sentence.find(context_element) != std::string::npos)
{
m_Found = true;
}
}
}
);
return m_Found;
}
const bool getFound() const {return m_Found;}
const std::string & getName() const {return m_Name;}
};
int main()
{
Intent intent({"hello", "Hi", "Good morning"}, "GREETING");
std::cout << intent.find("Hi my friend.");
}
I think the idiomatic way of doing it would be to use std::find_if. Then you don't need the atomic<bool> either.
// return iterator to found element or end()
auto find(const std::string & sentence)
{
return std::find_if( std::execution::par
, std::begin(m_Context)
, std::end(m_Context)
, [&sentence](const std::string & context_element) {
return sentence.find(context_element) != std::string::npos;
}
);
}
If you really only want a bool you could use std::any_of:
bool find(const std::string & sentence)
{
return std::any_of( std::execution::par
, std::begin(m_Context)
, std::end(m_Context)
, [&sentence](const std::string & context_element) {
return sentence.find(context_element) != std::string::npos;
}
);
}
You may also want to consider using a std::vector instead of a std::list. A vector provides random access iterators, while a list only provides bidirectional iterators, which generally makes it harder for a parallel algorithm to split the work.
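The two suggestions above (std::any_of plus a std::vector) can be combined in a small sketch. The free function name contains_any is hypothetical, and the execution policy is left out here so the snippet builds without a parallel backend; std::execution::par can be passed as the first argument exactly as in the answer.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): returns true if any keyword
// occurs as a substring of `sentence`.
bool contains_any(const std::vector<std::string>& keywords,
                  const std::string& sentence)
{
    return std::any_of(keywords.begin(), keywords.end(),
                       [&sentence](const std::string& keyword) {
                           return sentence.find(keyword) != std::string::npos;
                       });
}
```

Called with the keywords from the question and the sentence "Hi my friend.", this reports a match because "Hi" is a substring of the sentence.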
I have a header file header.hpp in which is declared this function:
#ifndef COMMON_HPP
#define COMMON_HPP
#include <string>
#include <map>
extern std::string func( const std::map <std::string, std::string>& generic_map, const std::string& feat_string );
#endif
and that function is defined in a .cpp file:
#include "header.hpp"
#include <string>
#include <map>
#include <stdexcept>
std::string func( const std::map <std::string, std::string>& map_s, const std::string& str )
{
if( map_s.find( str ) == map_s.end() )
{
throw std::runtime_error( "Error!" );
}
return map_s.at( str );
}
Benchmarking registered an execution time of 292 ns in 2684415 iterations. Do you know if there is any way to improve the function in order to make it faster?
You're doing two lookups into the map to get the element you seek: one in find() and another in at().
Here's an immediate improvement:
std::string func( const std::map <std::string, std::string>& map_s, const std::string& str )
{
auto itor = map_s.find(str);
if (itor == map_s.end())
{
throw std::runtime_error( "Error!" );
}
return itor->second;
}
The other improvement you can make is to use std::unordered_map instead of std::map. If you don't need to enumerate the keys in sorted order, unordered_map gives average O(1) insertion and lookup, whereas the ordered std::map is O(log N) for both (hash table versus balanced binary tree). The two containers have nearly identical interfaces, so it is usually an easy drop-in replacement.
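As a rough illustration of that drop-in replacement, the single-lookup version might look like this with a hash table. The name func_unordered is made up for this sketch; whether it is actually faster depends on the key set and the hash function, so it is worth benchmarking.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical variant of func() that takes an unordered_map.
// One hash lookup; the iterator is reused for the result.
std::string func_unordered(
    const std::unordered_map<std::string, std::string>& map_s,
    const std::string& str)
{
    auto itor = map_s.find(str);
    if (itor == map_s.end())
    {
        throw std::runtime_error("Error!");
    }
    return itor->second;
}
```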
Given how small the code is, there are really not many changes you can make to it, but there are a few:
Use the iterator that map::find() returns; you don't need to call map::at() afterwards. You are actually performing the same search twice when you only need to perform it once.
Return a reference to the found value to avoid making a copy of it (map::at() returns a reference anyway). The exception you are throwing ensures that the returned reference is never invalid.
If you don't need the map to be sorted, use std::unordered_map instead, which is usually faster than std::map for inserts and lookups.
Try this:
const std::string& func( const std::map <std::string, std::string>& map_s, const std::string& str )
{
auto iter = map_s.find( str );
if (iter == map_s.end() )
throw std::runtime_error( "Error!" );
return iter->second;
}
That being said, if you don't mind what kind of error message is thrown, you can replace map::find() with map::at(), since map::at() throws its own std::out_of_range exception if the key is not found (that is the whole purpose of map::at() vs map::operator[]), e.g.:
const std::string& func( const std::map <std::string, std::string>& map_s, const std::string& str )
{
return map_s.at( str );
}
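A caller of the map::at() version would then catch std::out_of_range rather than std::runtime_error. The following is a small sketch of that; the describe() helper and its messages are invented for illustration.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Same as the at()-based func() above.
const std::string& func(const std::map<std::string, std::string>& map_s,
                        const std::string& str)
{
    return map_s.at(str);
}

// Hypothetical caller: turns the out_of_range exception into a message.
std::string describe(const std::map<std::string, std::string>& map_s,
                     const std::string& key)
{
    try
    {
        return "found: " + func(map_s, key);
    }
    catch (const std::out_of_range&)
    {
        return "missing: " + key;
    }
}
```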
First, it returns std::string instead of const std::string&, which is slow because it involves a copy.
Also, throwing an exception is expensive with current compilers. A possible way around this is to return a const std::string* and return nullptr if the element is not found (this version also removes the double lookup):
const std::string* func( const std::map <std::string, std::string>& map_s, const std::string& str )
{
auto it = map_s.find( str );
if( it == map_s.end() )
{
return nullptr;
}
return &(it->second);
}
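On the calling side, the pointer version is typically used with an if-with-initializer style check. Here is a minimal sketch under that assumption; find_ptr repeats the function above under a different name, and value_or is a hypothetical helper. The returned pointer is only valid while the map is alive and unmodified.

```cpp
#include <map>
#include <string>

// Pointer-returning lookup, as in the answer above (renamed for the sketch).
const std::string* find_ptr(const std::map<std::string, std::string>& map_s,
                            const std::string& str)
{
    auto it = map_s.find(str);
    return it == map_s.end() ? nullptr : &it->second;
}

// Hypothetical caller: falls back to a default instead of throwing.
std::string value_or(const std::map<std::string, std::string>& map_s,
                     const std::string& key,
                     const std::string& fallback)
{
    if (const std::string* value = find_ptr(map_s, key))
    {
        return *value;  // found: copy out of the map
    }
    return fallback;    // not found: no exception involved
}
```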
An advanced technique is continuation-passing style, which can further optimize your code (and save you a branch). In this case, instead of returning something, you pass in what should be done with the result:
template<typename Found, typename NotFound>
void func( const std::map <std::string, std::string>& map_s, const std::string& str, Found found, NotFound notFound)
{
auto it = map_s.find( str );
if( it == map_s.end() )
{
return notFound();
}
return found(it->second);
}
Usage:
std::map<std::string, std::string> m;
std::string s;
func(m, s, [&](const std::string& elem) {
/* found - do something with elem */
}, [&]() {
/* not found - error handler */
});
This is as if you had your own control structures. Of course, you could make the return type something other than void (in case you'd like to 'tunnel' the result of found(elem) and notFound()). If you'd like, you can share e.g. the error handler among multiple calls to the function (in which case you store it in a variable, like const auto onNotFound = [&]() { /* ... */ };). And when you're optimizing, you might limit the lambda's capture to what's necessary.
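To make the 'tunnel the result' remark concrete, here is one possible sketch that uses C++14 return type deduction. The names func_cps and value_length are hypothetical, and both callbacks must return the same type for the deduction to work.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Continuation-passing lookup that forwards whatever the callbacks return.
// Assumption: found(...) and notFound() have the same return type.
template <typename Found, typename NotFound>
auto func_cps(const std::map<std::string, std::string>& map_s,
              const std::string& str, Found found, NotFound notFound)
{
    auto it = map_s.find(str);
    if (it == map_s.end())
    {
        return notFound();
    }
    return found(it->second);
}

// Hypothetical usage: length of the mapped value, or 0 if the key is absent.
std::size_t value_length(const std::map<std::string, std::string>& map_s,
                         const std::string& key)
{
    return func_cps(map_s, key,
                    [](const std::string& elem) { return elem.size(); },
                    []() { return std::size_t{0}; });
}
```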
I have the following small and easy code:
int main(int argc, char *argv[]) {
std::vector<std::string> in;
std::set<std::string> out;
in.push_back("Hi");
in.push_back("Dear");
in.push_back("Buddy");
for (const auto& item : in) {
std::transform(item.begin(),item.end(),item.begin(), ::tolower);
out.insert(item);
}
return 0;
}
I'd like to copy all items of in into out, converting each one to lowercase on the way, preferably without an extra temporary variable.
So this is the required content of out at the end:
hi
dear
buddy
Please note that const auto& item is fixed, meaning I can't remove the const requirement (this is part of a bigger library, over-simplified here for the demo).
How should I do this? (If I remove the "const" modifier, it works, but if modifications are not allowed on item, how can I still insert the item into the set, while also transforming it to lowercase?)
Note that you have to copy, since the items in the original in container cannot be moved into the out container. The code below copies each element exactly once.
...
in.push_back("Hi");
in.push_back("Dear");
in.push_back("Buddy");
std::transform(in.begin(), in.end(), std::inserter(out, out.end()),
[] (std::string str) { boost::algorithm::to_lower(str); return str;}
);
return 0;
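If Boost is not available, the same single-copy idea can be sketched with the standard library only. The function name to_lower_set is made up here, and std::tolower handles only single-byte characters in the current C locale, so this is an assumption-laden sketch rather than a full Unicode solution.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: copies every string from `in` into a set, lowercased.
// The input is only read, so it can stay const.
std::set<std::string> to_lower_set(const std::vector<std::string>& in)
{
    std::set<std::string> out;
    std::transform(in.begin(), in.end(), std::inserter(out, out.end()),
                   [](std::string str) {  // taken by value: this is the one copy
                       std::transform(str.begin(), str.end(), str.begin(),
                                      [](unsigned char c) {
                                          return static_cast<char>(std::tolower(c));
                                      });
                       return str;
                   });
    return out;
}
```

Taking the character as unsigned char before calling std::tolower avoids undefined behavior for negative char values.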
You need a lambda function with transform, and you shouldn't have a const & to your strings, or transform can't modify them.
#include <algorithm>
#include <set>
#include <string>
#include <vector>
int main()
{
std::vector<std::string> in;
std::set<std::string> out;
in.push_back("Hi");
in.push_back("Dear");
in.push_back("Buddy");
for (/*const*/ auto& item : in) // you can't have a const & if you're going to modify it.
{
std::transform(item.begin(), item.end(), item.begin(), [](const char c)
{
return static_cast<char>(::tolower(c));
});
out.insert(item);
}
return 0;
}
I'm using a very nice and simple std::vector<std::string> initializer which takes an input string and a regex. It's similar to a basic split, except that it works with the regex's Group 1 matches:
static std::vector<std::string> match(const std::string& str, const std::regex& re) {
return { std::sregex_token_iterator(str.begin(), str.end(), re, 1), std::sregex_token_iterator() };
}
Construction of a vector is done like below:
std::string input = "aaa(item0,param0);bbb(item1,param1);cc(item2,param2);";
std::vector<std::string> myVector = match(input, std::regex(R"(\(([^,]*),)"));
This results in a vector containing item0, item1, item2 extracted from the input string with the regex.
Now my match function uses the first group results of the regex and (I believe) utilizes the vector's initialization form of:
std::vector<std::string> myVector = { ... };
I'd like to create a similar match function to construct a std::map<std::string,std::string>. Map also has the above initializer form:
std::map<std::string,std::string> myMap = { {...}, {...} };
My idea is to modify the regex so that it captures more groups, and to modify the match function above so that, with the modified regex (\(([^,]*),([^)]*)), it creates a nice map for me, equivalent to this:
std::map<std::string,std::string> myMap = { {"item0", "param0"}, {"item1", "param1"}, {"item2", "param2"}, };
What I've tried?
static std::map<std::string, std::string> match(const std::string& str, const std::regex& re) {
return { std::sregex_token_iterator(str.begin(), str.end(), re, {1,2}), std::sregex_token_iterator() };
}
This one (in the case of a vector) would put both the Group 1 and Group 2 results into the vector, but it cannot initialize a map.
How can I still do that easily? Is it not possible with sregex_token_iterator?
I don't know exactly what 'easily' means here, so here is a simple solution:
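#include <iostream>
#include <map>
#include <regex>
#include <string>
static std::map<std::string, std::string> match(const std::string& str, const std::regex& re) {
    std::map<std::string, std::string> retVal;
    const std::sregex_token_iterator end;
    // The token iterator yields group 1 and group 2 alternately for each match.
    for (std::sregex_token_iterator it(str.begin(), str.end(), re, {1, 2}); it != end; ++it) {
        const std::string key = it->str();  // group 1 of the current match
        if (++it == end) break;             // defensive: no value token left
        retVal.emplace(key, it->str());     // group 2 of the same match
    }
    return retVal;
}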
int main() {
std::string input = "aaa(item0,param0);bbb(item1,param1);cc(item2,param2);";
auto myVector = match(input, std::regex(R"(\(([^,]*),([^)]*))"));
for (const auto& item : myVector)
std::cout<<item.first<<'\t'<<item.second<<std::endl;
}
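An alternative that avoids pairing up tokens by hand is std::sregex_iterator, which yields one std::smatch per match so both groups can be read by index. This is only a sketch; the name match_pairs is hypothetical, and it assumes the regex defines at least two capture groups.

```cpp
#include <map>
#include <regex>
#include <string>

// Hypothetical alternative to match(): one map entry per regex match,
// with group 1 as the key and group 2 as the value.
std::map<std::string, std::string> match_pairs(const std::string& str,
                                               const std::regex& re)
{
    std::map<std::string, std::string> result;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(str.begin(), str.end(), re); it != end; ++it)
    {
        const std::smatch& m = *it;
        result.emplace(m[1].str(), m[2].str());
    }
    return result;
}
```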
You could also try to use Boost and a homemade generic algorithm.
Consider the following code:
#include <boost/range.hpp>
#include <boost/range/any_range.hpp>
#include <boost/range/join.hpp>
#include <iostream>
#include <algorithm>
#include <string>
#include <vector>
#include <list>
struct TestData {
TestData() : m_strMem01("test"), m_intMem02(42), m_boolMem03(true) {}
std::string m_strMem01;
int m_intMem02;
bool m_boolMem03;
};
struct IntComp {
bool operator()(const TestData &s, int i) { return s.m_intMem02 < i; }
bool operator()(int i, const TestData &s) { return i < s.m_intMem02; }
bool operator()(const TestData &i, const TestData &s) {
return i.m_intMem02 < s.m_intMem02;
}
};
struct StrComp {
bool operator()(const TestData &s, const std::string &str) {
return s.m_strMem01 < str;
}
bool operator()(const std::string &str, const TestData &s) {
return str < s.m_strMem01;
}
bool operator()(const TestData &i, const TestData &s) {
return i.m_strMem01 < s.m_strMem01;
}
};
typedef boost::any_range<TestData, boost::forward_traversal_tag,
const TestData &, std::ptrdiff_t> TestRange;
std::vector<TestData> vecData(10);
std::list<TestData> listData(20);
TestRange foo() {
TestRange retVal;
auto tmp1 = std::equal_range(vecData.cbegin(), vecData.cend(), 42, IntComp());
retVal = boost::join(retVal, tmp1);
auto tmp2 =
std::equal_range(listData.cbegin(), listData.cend(), "test", StrComp());
retVal = boost::join(retVal, tmp2);
return retVal;
}
int main(int argc, char *argv[]) {
auto res = foo();
for (auto a : res) {
std::cout << a.m_strMem01 << std::endl;
}
//std::cout << res[4].m_intMem02 << std::endl;
}
If you uncomment the last line, the code fails because distance_to is not implemented for any_forward_iterator_interface. I'm not sure what exactly I'm missing here. Do I need to implement operator[] or distance_to, and if so, for what? My own version of the traversal tag? And why doesn't it work in the first place?
I would say the answer depends on your performance needs and on how much effort you want to put into implementing a new iterator abstraction. The core reason your [] operator does not work is that std::list<...> does not provide a random access traversal iterator. Had you chosen a container that provides such an iterator, your any_range<...> could have taken the random_access_traversal_tag and everything would be fine.
I think it's fair to say that it is not such a big deal to implement a random access iterator on top of a list by simply encapsulating the current index and count forward and backward within the list whenever a specific position is meant to be accessed, but it's clearly against the nature of the list performance-wise.
Is there a good reason to hold one of the collections in a list?
Is there a good reason to access the resulting any_range randomly?
Is it worth the effort to provide an inefficient random access interface for std::list?
Of course any_iterator (which underlies the any_range implementation) doesn't gratuitously emulate RandomAccess iterators for any odd iterator you pass.
If you want that, just make an iterator adaptor that does this (making it very slow to random access elements in a list - so don't do this).
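If indexed access into list-backed data is needed only occasionally, a standard-library helper built on std::next at least makes the linear cost explicit instead of hiding it behind operator[]. This is a sketch only; the name nth is hypothetical and it is not part of Boost.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <stdexcept>

// Hypothetical helper: returns the element at `index` by walking the list.
// Each call costs O(index) steps, unlike indexing into a vector.
template <typename T>
const T& nth(const std::list<T>& items, std::size_t index)
{
    if (index >= items.size())
    {
        throw std::out_of_range("nth: index out of range");
    }
    return *std::next(items.begin(), static_cast<std::ptrdiff_t>(index));
}
```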
This is basically what I want to do:
bool special_compare(const string& s1, const string& s2)
{
// match with wild card
}
std::vector<string> strings;
strings.push_back("Hello");
strings.push_back("World");
// I want this to find "Hello"
find(strings.begin(), strings.end(), "hell*", special_compare);
// And I want this to find "World"
find(strings.begin(), strings.end(), "**rld", special_compare);
But std::find doesn't work like that unfortunately. So using only the STL, how can I do something like this?
Based on your comments, you're probably looking for this:
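// Note: std::unary_function was deprecated in C++11 and removed in C++17;
// on newer compilers the base class can simply be dropped.
struct special_compare : public std::unary_function<std::string, bool>
{
explicit special_compare(const std::string &baseline) : baseline(baseline) {}
bool operator() (const std::string &arg) const
{ return somehow_compare(arg, baseline); }
std::string baseline;
};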
std::find_if(strings.begin(), strings.end(), special_compare("hell*"));
The function you need is std::find_if, because std::find doesn't take a comparison function.
But then std::find_if doesn't take a value. You're trying to pass both a value and a comparison function, which confuses me. Anyway, look at the documentation and compare the usage:
auto it1 = std::find(strings.begin(), strings.end(), "hell*");
auto it2 = std::find_if(strings.begin(), strings.end(), special_compare);
Hope that helps.
You'll need std::find_if(), which is awkward to use unless you're on a C++11 compiler. With C++11 you don't need to hardcode the value to search for in some comparator function or implement a functor object; you can put it in a lambda expression:
vector<string> strings;
strings.push_back("Hello");
strings.push_back("World");
find_if(strings.begin(), strings.end(), [](const string& s) {
return matches_wildcard(s, "hell*");
});
Then you write a matches_wildcard() somewhere.
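One way matches_wildcard() could look is sketched below. It rests on assumptions the question does not spell out: '*' matches any run of characters (including none), every other pattern character must match literally, and the comparison ignores case so that "hell*" finds "Hello".

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Sketch of a wildcard matcher: '*' matches any (possibly empty) sequence,
// comparison is case-insensitive for single-byte characters.
bool matches_wildcard(const std::string& text, const std::string& pattern)
{
    auto lower = [](char c) { return std::tolower(static_cast<unsigned char>(c)); };
    std::size_t t = 0, p = 0;
    std::size_t star = std::string::npos;  // index of the most recent '*'
    std::size_t mark = 0;                  // text position tried for that '*'
    while (t < text.size())
    {
        if (p < pattern.size() && pattern[p] == '*')
        {
            star = p++;   // remember the star, first try matching nothing
            mark = t;
        }
        else if (p < pattern.size() && lower(pattern[p]) == lower(text[t]))
        {
            ++p;
            ++t;
        }
        else if (star != std::string::npos)
        {
            p = star + 1; // backtrack: let the last '*' absorb one more char
            t = ++mark;
        }
        else
        {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*') ++p;  // trailing stars
    return p == pattern.size();
}
```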
Since nobody has mentioned std::bind yet, I'll propose this one
#include <functional>
bool special_compare(const std::string& s, const std::string& pattern)
{
// match with wild card
}
std::vector<std::string> strings;
auto i = find_if(strings.begin(), strings.end(), std::bind(special_compare, std::placeholders::_1, "hell*"));
With C++11 lambdas:
auto found = find_if(strings.begin(), strings.end(), [] (const std::string& s) {
return /* you can use "hell*" here! */;
});
If you can't use C++11 lambdas, you can just make a function object yourself. Make a type and overload operator ().
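A hand-written function object of that kind might look like the following sketch. The names StartsWith and find_with_prefix are hypothetical, and a simple case-sensitive prefix test stands in for the real wildcard logic.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical function object: stores the value to compare against and
// exposes operator() so it can be used as a unary predicate.
class StartsWith
{
public:
    explicit StartsWith(std::string prefix) : prefix_(std::move(prefix)) {}

    bool operator()(const std::string& s) const
    {
        // compare() on the first prefix_.size() characters of s
        return s.compare(0, prefix_.size(), prefix_) == 0;
    }

private:
    std::string prefix_;
};

// Returns an iterator to the first string starting with `prefix`, or end().
std::vector<std::string>::const_iterator
find_with_prefix(const std::vector<std::string>& strings, const std::string& prefix)
{
    return std::find_if(strings.begin(), strings.end(), StartsWith(prefix));
}
```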
I wanted to have an example with custom class having custom find logic but didn't find any answer like that. So I wrote this answer which uses custom comparator function (C++11) to find an object.
class Student {
private:
long long m_id;
// private fields
public:
long long getId() const { return m_id; }
};
Now suppose I want to find the Student object whose m_id matches a given id. I can use std::find_if like this:
// studentList is a std::vector<Student>
long long x_id = 3; // local variable
auto itr = std::find_if(studentList.begin(), studentList.end(),
[x_id](Student& std_val)
{ return std_val.getId() == x_id; }
);
if(itr == studentList.end())
printf("nothing found");