Finding in a std map using regex - c++

I want to know how I can look up for an item in a map using a regex function. In my case, I have a map with expressions such as en*, es*, en-AU and so on and I have a string of possible values like en, en-US, en-GB, es-CL and so on.
I want to search using that string to find that item in the map.
Looking for the key without the wildcard first then looking for the key with the wildcard as the 2nd priority.
Please help me out with this problem or if this is an inefficient or if any one has a different approach please tell me another method for that. I use C++ with boost and stl.

If the map is small or the search is executed rarely, then just iterate through the map and match each key with the regular expression.
Otherwise: If the regular expression is used only for some kind of prefix search, you can use the member function lower_bound to efficiently find all entries with the given prefix. For example the following function first looks for an entry that matches exactly. If no such entry exists, the function returns the range of all entries with a matching prefix.
using items = std::map<std::string, item>;
auto lookup(const items& items, const std::string& key)
-> std::pair<items::const_iterator, items::const_iterator>
{
auto p = items.lower_bound(key);
auto q = items.end();
if (p != q && p->first == key) {
return std::make_pair(p, std::next(p));
} else {
auto r = p;
while (r != q && r->first.compare(0, key.size(), key) == 0) {
++r;
}
return std::make_pair(p, r);
}
}
Otherwise: If you have to cope with regular expressions or wildcards, then you can combine the two approaches. First search for an entry that matches exactly with the member function find. If no such entry exists, then extract the constant prefix from the regular expression. The prefix may be empty. Use the member function lower_bound to find the first entry with that prefix. Iterate through all entries with that prefix and test if the regular expression matches.

Related

Iterating over a map, starting at a specific key

I am implementing a function meant to find any strings that share a prefix with a given string. All of the possible strings to compare to are in a map already, and I would like to iterate over that map starting at where the given string is. Currently, I am looping over the entire map, but for performance reasons, I need to make this more efficient. This is how I am currently iterating over the map:
for(auto const& it : lookup_map){ //performs code }
I would like this to not start from the beginning of the map, but wherever the given string is in the map.
Just use good old iterators:
for( auto it = lookup_map.find( your_string ); it != lookup_map.end(); ++it ) {
// using it
}
std::map has its own member function lower_bound for what you want to do. You can use std::map::lower_bound to get your starting position and then do a compare of the prefix for subsequent iterations:
for (auto it = lookup_map.lower_bound(prefix);
it != std::end(lookup_map) && it->first.compare(0, prefix.size(), prefix) == 0;
++it)
{
...
}
This will be more efficient than iterating every single key. The advantage in this case of using lower_bound() is that it will return the first item that is equivalent or after the search term. So if your search term is "aa" and you have an entry "aab" in your map, lower_bound() will return an iterator to "aab". I think this will be more useful in your case because you want to search on the prefix.
In C++20, std::string has a starts_with() function. So we can use this function to check the prefix and simplify our code a little:
for (auto it = lookup_map.lower_bound(prefix);
it != lookup_map.end() && it->first.starts_with(prefix); ++it)
{
...
}
Demo

How to match a vector of regular expression with a one string?

If I want to verify one string is completely matches with the any one in the vector of strings then i will use
std::find(vectOfStrings.begin(), vectOfStrings.end(), "<targetString>") != v.end()
If the target string matches with any of the string in the vector then it will return true.
But what if i want to check one string is matches with any one of the vector of regular expressions?
Is there any standard library i can use to make it work like
std::find(vectOfRegExprsns.begin(), vectOfRegExprsns.end(), "<targetString>") != v.end()?
Any suggestions would be highly appreciated.
How about using std::find_if() with a lambda?
std::find_if(
vectOfRegExprsns.begin(), vectOfRegExprsns.end(),
[](const std::string& item) { return regex_match(item, std::regex(targetString))});

Prebuilt function to find character sequence in a string?

I'm working on a multithreading project where for one segment of the project I need to find if a given character sequence exists within a string. Im wondering if C++/C have any pre-built functions which can handle this, but am having trouble figuring out the exact 'definition' to search for.
I know about 'strtr' and 'find', the issue is the function needs to be able to find a sequence which is SPLIT across a string.
Given the string 'Hello World', I need a function that returns true if the sequence 'H-W-l' exists. Is there anything prebuilt which can handle this?
As far as I know, subsequence searching as such is not part of either the standard C library or the standard C++ library.
However, you can express subsequence searching as either a regular expression or a "glob". Posix mandates both regex and glob matching functions, while the C++ standard library includes regular expressions since C++11. Both of these techniques require modifying the search string:
Regular expression: HWl ⇒ H.*W.*l. regexec will do a search for the regular expression (unless anchored, which this one is not); in C++, you would want to use std::regex_search rather than std::regex_match.
Glob: HWl ⇒ *H*W*l*. Glob matching is always a complete match, although in all the implementations I know of a trailing * is optimized. This is available as the fnmatch function in the Posix header fnmatch.h. For this application, provide 0 for the flags parameter.
If you don't like any of the above, you can use the standard C strchr function in a simple loop:
bool has_subsequence(const char* haystack, const char* needle) {
const char* p;
for (p = haystack; *needle && (p = strchr(p, *needle)); ++needle) {
}
return p != NULL;
}
If I understand correctly, then you're trying to search for chars in a given order but aren't necessarily contiguous. If you're in C++, I don't see why you couldn't use the std::find function under the <algorithm> system header. I would load both into a string and then search as follows:
bool has_noncontig_sequence(const std::string& str, const std::string& subStr)
{
typedef std::string::const_iterator iter;
iter start = str.begin();
// loop over substr and save iterator position;
for (iter i = subStr.begin(); i != subStr.end(); ++i)
start = std::find(start, str.end(), *i);
// check position, if at end, then false;
return start != str.end() ? true : false;
}
The std::find function will position start over the first correct character in str if it can find it and then search for the next. If it can't, then start will be positioned at the end, indicating failure.

std::map - Element access without exception and without insertion

I have a recurrent pattern with the use of std::map.
I want to retrieve the value only when the key is present, otherwise I don't want to insert element. Currently I'm using count(key) or find(key) (which one is better? from the documentation the complexity seems to be the same) and if them returns a positive value that I access the map. However I would like to avoid the use of two operations on the map. Something like:
map<string, int> myMap;
int returnvalue;
boole result = myMap.get("key1",returnValue)
if(result){
\\ use returnValue
}
Reading the std::map documentation on cplusplus.com I found two functions for accessing map elements:
at(): which throws an excpetion if the key is not present
[]: which insert a new value if the key is not present
None of them satisfy my necessity.
Use map::find:
auto it = myMap.find(key);
if (it != myMap.end())
{
// use it->second
}
else
{
// not found
}
This part was easy. The harder problem is when you want to look up if an element exists and return it if it does, but otherwise insert a new element at that key, all without searching the map twice. For that you need to use lower_bound followed by hinted insertion.
using count() for sure the key is exists
then uses find() to get the k/v pair
if (myMap.count(key))
{
auto it = myMap.find(key)
}
else
{
// not found
}

Get all the keys which matches a query in a map

Say I have more than one key with the same value in a map. Then in that case how do I retrieve all keys that matches a query.
Or, Is there any possibility to tell find operation to search after a specific value.
I am using an std::map, C++.
Would something like this work for you:
void FindKeysWithValue(Value aValue, list<Key>& aList)
{
aList.clear();
for_each(iMap.begin(), iMap.end(), [&] (const pair<Key, Value>& aPair)
{
if (aPair.second == aValue)
{
aList.push_back(aPair.first);
}
});
}
The associative containers probably won't help you too much because for std::map<K, V> the key happens to be unique and chances that your chosen query matches the ordering relation you used may not be too high. If the order matches, you can use the std::map<K, V> members lower_bound() and upper_bound(). For std::multimap<K, V> you can also use equal_range().
In general, i.e., if you query isn't really related to the order, you can use std::copy_if() to get a sequence of objects matching a predicate:
Other other;
// ...
std::vector<Other::value_type> matches;
std::copy_if(other.begin(), other.end(),
std::back_inserter(matches), predicate);
When copying the elements is too expensive, you should probably consider using std:find_if() instead:
for (auto it(other.begin());
other.end() != (it = std::find_if(it, other.end(), predicate));
++it) {
// do something with it
}
The only way is to iterate over map.
this link may be useful: Reverse map lookup
Provided you want quick access and you don't mind using some more space, then you maintain another map that gets stored as value, key. In your case, you would need to handle the duplicate values (that you will be storing as keys).
Not a great idea but definitely an option.
A map is meant for efficient lookup of keys. Lookup based on values is not efficient, and you basically have to iterate through the map, extracting matches yourself:
for(map<A,B>::iterator i = m.begin(); i != m.end(); i++)
if(i->second == foo)
you_found_a_match();
If you intend to do this often, you can build up a multimap mapping the other way, so you can efficiently perform a value-based lookup:
multimap<B,A> reverse;
for(map<A,B>::iterator i = m.begin(); i != m.end(); i++)
reverse.insert(pair<B,A>(i->second,i->first));
You can now easily find the keys with a given value value:
matches = reverse.equal_range(value);
for(multimap<B,A>::iterator i = matches.first; i != matches.second; i++)
A & key = i->second;
If these maps aren't going to grow continuously, it may be more efficient to simply maintain a vector > instead, define a comparator for it based on the value, and use equal_range on that instead.