Checking if a string contains more than just keywords C++


Thank you for clicking on my question.
After countless hours of searching, I have not come across a solution, and it's quite difficult to search for something you don't know how to phrase properly. Please help me out; I would appreciate it.
The data in the string would look like this:
std::string keyword1 = "Hello";
std::string keyword2 = "Ola";
std::string test = keyword1 + keyword2 + keyword2;
Here is what I'm trying to achieve, as pseudocode:
if (test.contains(more than the 2 keywords))
I want to make sure the string has other text than just the keywords above.

You can remove all instances of these keywords from your data and see what's left. It's not terribly efficient but shouldn't matter for reasonably sized inputs.
bool contains_more_than(std::vector<std::string> const& keywords, std::string sample) {
    for (std::string const& keyword: keywords) {
        size_t pos;
        while ((pos = sample.find(keyword)) != sample.npos) {
            sample.replace(pos, keyword.size(), "");
        }
    }
    return !sample.empty();
}
Note that this might fail if one keyword is a substring of another:
contains_more_than({"123", "12345"}, "12345") returns true, because removing "123" first leaves "45" behind.
To avoid this, first sort your keywords by size, longest first:
std::sort(keywords.begin(), keywords.end(),
    [](std::string const& s1, std::string const& s2) {
        return s1.size() > s2.size();
    });
Now:
contains_more_than({"12345", "123"}, "12345") returns false.

A possible solution: expressed as a regular expression, you are testing whether the string matches ^(Hello|Ola)*$. That is, does the whole string match any number of repeats of "Hello" and/or "Ola" (and with nothing else)? You can use the regex standard library to match regular expressions in C++.
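A minimal sketch of the regular-expression approach, assuming the two keywords from the question are hard-coded. The function name has_text_besides_keywords is made up for illustration. Because std::regex_match only succeeds when the entire string matches, the ^ and $ anchors can be dropped.

```cpp
#include <regex>
#include <string>

// Returns true when `text` contains something other than back-to-back
// repetitions of "Hello" and "Ola". (Hypothetical helper name.)
bool has_text_besides_keywords(const std::string& text) {
    // std::regex_match requires the whole string to match the pattern.
    static const std::regex only_keywords("(Hello|Ola)*");
    return !std::regex_match(text, only_keywords);
}
```

If the keywords come from user input, any regex metacharacters in them would need escaping before they are joined with |.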

Related

How to match a vector of regular expressions against one string?

If I want to verify that one string completely matches any one of the strings in a vector, I use
std::find(vectOfStrings.begin(), vectOfStrings.end(), "<targetString>") != vectOfStrings.end()
If the target string matches any of the strings in the vector, this returns true.
But what if I want to check whether one string matches any one of a vector of regular expressions?
Is there a standard library facility I can use to make it work like
std::find(vectOfRegExprsns.begin(), vectOfRegExprsns.end(), "<targetString>") != vectOfRegExprsns.end()?
Any suggestions would be highly appreciated.

How about using std::find_if() with a lambda?
std::find_if(
    vectOfRegExprsns.begin(), vectOfRegExprsns.end(),
    [&targetString](const std::string& item) {
        // Each item is a pattern; test the target string against it.
        return std::regex_match(targetString, std::regex(item));
    });
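As a variation on the same idea, here is a sketch that assumes the patterns are stored as precompiled std::regex objects rather than strings, so no regex is rebuilt for every comparison. The name matches_any is illustrative, and std::any_of is swapped in for std::find_if because only a yes/no answer is needed.

```cpp
#include <algorithm>
#include <regex>
#include <string>
#include <vector>

// True when `target` fully matches at least one of the patterns.
// (Hypothetical helper; assumes the regexes were compiled up front.)
bool matches_any(const std::vector<std::regex>& patterns, const std::string& target) {
    return std::any_of(patterns.begin(), patterns.end(),
                       [&target](const std::regex& re) {
                           return std::regex_match(target, re);
                       });
}
```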

Parsing key/value pairs from a string in C++

I'm working in C++11, no Boost. I have a function that takes as input a std::string that contains a series of key-value pairs, delimited with semicolons, and returns an object constructed from the input. All keys are required, but may be in any order.
Here is an example input string:
Top=0;Bottom=6;Name=Foo;
Here's another:
Name=Bar;Bottom=20;Top=10;
There is a corresponding concrete struct:
struct S
{
    const uint8_t top;
    const uint8_t bottom;
    const std::string name;
};
I've implemented the function by repeatedly running a regular expression on the input string, once per member of S, and assigning the captured group of each to the relevant member of S, but this smells wrong. What's the best way to handle this sort of parsing?

For an easily readable solution, you can use, for example, std::regex_token_iterator and a sorted container to separate the attribute-value pairs (alternatively, use an unsorted container and std::sort).
std::regex r{R"([^;]+;)"};
std::set<std::string> tokens{std::sregex_token_iterator{std::begin(s), std::end(s), r}, std::sregex_token_iterator{}};
Now the attribute value strings are sorted lexicographically in the set tokens, i.e. the first is Bottom, then Name and last Top.
Lastly use a simple std::string::find and std::string::substr to extract the desired parts of the string.

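Do you care about performance or readability? If readability is good enough, then pick your favorite implementation of split (one that returns a std::vector<std::string>) and away we go:
std::map<std::string, std::string> tag_map;
for (const std::string& tag : split(input, ';')) {
    auto key_val = split(tag, '=');  // split the current tag, not the whole input
    tag_map.insert(std::make_pair(key_val[0], key_val[1]));
}
// Keys are case-sensitive and must match the input ("Top", not "top").
// The casts avoid a narrowing error when brace-initializing uint8_t from int.
S s{static_cast<uint8_t>(std::stoi(tag_map["Top"])),
    static_cast<uint8_t>(std::stoi(tag_map["Bottom"])),
    tag_map["Name"]};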
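For reference, a self-contained sketch of the map-based approach that avoids the external split helper by using std::getline with ';' as the delimiter. The function name parse_s and the error handling are assumptions, not part of the original answers, and the struct members are left non-const here to keep the example short.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

struct S {
    std::uint8_t top;
    std::uint8_t bottom;
    std::string name;
};

// Parses "Key=Value;Key=Value;..." into S (hypothetical helper).
// Throws std::invalid_argument when a pair has no '=' or a number is
// malformed, and std::out_of_range when a required key is missing.
S parse_s(const std::string& input) {
    std::map<std::string, std::string> fields;
    std::istringstream stream(input);
    std::string pair;
    while (std::getline(stream, pair, ';')) {
        if (pair.empty()) continue;  // tolerate empty segments such as ";;"
        const std::string::size_type eq = pair.find('=');
        if (eq == std::string::npos)
            throw std::invalid_argument("missing '=' in: " + pair);
        fields[pair.substr(0, eq)] = pair.substr(eq + 1);
    }
    // map::at throws std::out_of_range for a missing key.
    return S{static_cast<std::uint8_t>(std::stoi(fields.at("Top"))),
             static_cast<std::uint8_t>(std::stoi(fields.at("Bottom"))),
             fields.at("Name")};
}
```

Values outside 0..255 are silently truncated by the cast; a range check would be needed if that matters.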

Parse a string into an unknown number of regex groups in C++

I know the exact format of the text I should be getting. In particular, it should match a regex with a variable number of groups.
I want to use the C++ regex library to determine (a) if it is valid text, and (b) to parse those groups into a vector. How can I do this? I can find examples online to do (a), but not (b).
#include <string>
#include <regex>
#include <vector>
bool parse_this_text(std::string & text, std::vector<std::string> & group) {
// std::string text_regex = "^([a-z]*)(,[0-9]+)*$"
// if the text matches the regex, return true and parse each group into the vector
// else return false
???
}
Such that the following lines of code return the expected results.
std::vector<std::string> group;
parse_this_text("green,1", group);
// should return true with group = {"green", ",1"};
parse_this_text("yellow", group);
// should return true with group = {"yellow"};
parse_this_text("red,1,2,3", group);
// should return true with group = {"red", ",1", ",2", ",3"};
parse_this_text("blue,1.0,3.0,1,a", group);
// should return false (since it doesn't match the regex)
Thanks!

(?=^([a-zA-Z]*)(?:\,\d+)+$)^.*?(?:((?:\,\d+)+)).*?$
You can use this. It first validates the input using a lookahead and then returns 2 groups:
1) the name
2) all the remaining integers (this can easily be split afterwards, or collected with a find-all function such as Python's re.findall)
Though it does not answer your question fully, it might be of help.
Have a look:
http://regex101.com/r/wE3dU7/3

One option is to scan the string twice, the first time to check for validity and the second time to split it into fields. With the example in the OP, you don't really need regexen to split the line, once you know that it is correct; you can simply split on commas. But for the sake of exposition, you could use a std::regex_token_iterator (assuming you have a C++ library which supports those), something like this:
bool parse_this_text(const std::string& s, std::vector<std::string>& result) {
    // One or more digits per field, so that values such as ",12" are accepted.
    static const std::regex check("[[:alpha:]][[:alnum:]]*(,[[:digit:]]+)*",
                                  std::regex_constants::nosubs);
    static const std::regex split(",");
    if (!std::regex_match(s, check))
        return false;
    // Submatch index -1 yields the text between the separators.
    std::sregex_token_iterator tokens(s.begin(), s.end(), split, -1);
    result.clear();
    std::copy(tokens, std::sregex_token_iterator(), std::back_inserter(result));
    return true;
}
For more complicated cases, or applications in which the double scan is undesired, you can tokenize using successive calls to std::regex_search(), supplying the end of the previous match as the starting point and std::regex_constants::match_continuous as the match flag; that anchors each search to the character after the previous match. You could, in that case, use a std::regex_iterator, but I'm not convinced that the resulting code is any simpler.
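A sketch of that single-pass variant, assuming the grammar from the question (a lowercase name followed by zero or more ",<digits>" fields). The two regexes are an illustrative split of the question's pattern, not code from the original answer.

```cpp
#include <regex>
#include <string>
#include <vector>

// Single-pass variant: consume the name, then one ",<digits>" field at a time.
bool parse_this_text(const std::string& text, std::vector<std::string>& groups) {
    static const std::regex name("[a-z]*");
    static const std::regex field(",[0-9]+");
    groups.clear();

    std::smatch m;
    auto pos = text.cbegin();
    // match_continuous: the match must start exactly at `pos`.
    if (!std::regex_search(pos, text.cend(), m, name,
                           std::regex_constants::match_continuous))
        return false;
    groups.push_back(m.str());
    pos = m[0].second;  // continue right after the name

    while (pos != text.cend()) {
        if (!std::regex_search(pos, text.cend(), m, field,
                               std::regex_constants::match_continuous)) {
            groups.clear();  // leftover text that is not a valid field
            return false;
        }
        groups.push_back(m.str());
        pos = m[0].second;
    }
    return true;
}
```

Unlike the split-on-comma version, each field keeps its leading comma, which is the output format the question asked for.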

Having a std::set with only file names (a, f/a, f/b, f/f/c, etc) how to list a directory by given f/?

So we have a set of file names/URLs like file, folder/file, folder/file2, folder/file3, folder/folder2/fileN, etc. We are given a string like folder/. We want to find folder/file, folder/file2, folder/file3, and most interestingly folder/folder2/ (we do not want to list the contents of folder2, just show that it exists and can be searched). Is such a thing possible with the STL and Boost, and how would you do it?
Oops, I just found out that I already asked about this a while ago here... but I haven't found a correct answer yet...

A relatively simple C++11 implementation. This could easily be modified for C++03. (Caveat: I have not compiled or tested this.)
std::set<std::string> urls; // The set of values you have
std::string key_search = "folder/"; // text to search for
std::for_each(
urls.begin(),
urls.end(),
[&key_search] (const std::string& value)
{
// use std::string::find, this will only display
// strings that match from the beginning of the
// stored value:
if(0 == value.find(key_search))
std::cout << value << "\n"; // display
});

This sounds like a great opportunity to use regex stuff in Boost/C++11
Something like
std::set<std::string> theSet;
// Get stuff into theSet somehow
const std::string searchFor= "folder/";
std::set<std::string> matchingSet;
std::for_each(std::begin(theSet), std::end(theSet),
[&matchingSet, &searchFor] (const std::string & s)
{
if (/* the appropriate code to do regex matching... */)
matchingSet.insert(s); // or the match that was found instead of s
});
Sorry I can't provide the regex syntax... I need to study that more.

The ordered containers have a pair of methods that are quite useful for finding a range of iterators: lower_bound and upper_bound. In your case, the range of entries starting with "folder/" is:
std::for_each(
    path_set.lower_bound("folder/"),
    path_set.lower_bound("folder0"), // "folder" + char('/' + 1); lower_bound keeps an entry named exactly "folder0" out of the range
    ...);
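Building on the range lookup, here is a sketch that also collapses deeper paths into a single subfolder entry, which is what the question asks for with folder/folder2/. The function name list_directory and the choice to return a std::set are assumptions made for illustration.

```cpp
#include <set>
#include <string>

// Lists the direct children of `dir` (which must end with '/').
// Files are returned as full paths, subfolders as "dir/sub/".
// (Hypothetical helper.)
std::set<std::string> list_directory(const std::set<std::string>& paths,
                                     const std::string& dir) {
    std::set<std::string> entries;
    if (dir.empty()) return entries;

    // Smallest string that sorts after everything starting with `dir`:
    // bump the last character, e.g. "folder/" -> "folder0".
    std::string upper = dir;
    ++upper.back();

    const auto last = paths.lower_bound(upper);
    for (auto it = paths.lower_bound(dir); it != last; ++it) {
        const std::string::size_type slash = it->find('/', dir.size());
        if (slash == std::string::npos)
            entries.insert(*it);                       // a file directly in dir
        else
            entries.insert(it->substr(0, slash + 1));  // collapse to "dir/sub/"
    }
    return entries;
}
```

Only the entries inside the prefix range are visited, so the cost depends on the size of the folder rather than on the size of the whole set.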

boost::regex_search - boost kills my brain cells, again

Good programmers keep simple things easy right?
And it's not like the boost documentation makes your life less uneasy...
All I want is an implementation for:
// fulfils the function of a regex matching where the pattern may match a
// substring instead of the entire string
bool search( std::string, std::string, SomeResultsType )
So it can be used as in:
std::string text, pattern;
SomeResultsType match;
if( search( text, pattern, match ) )
{
std::string result = match[0];
if( match[1].matched )
// where this is the second capture group, not recapturing the same group
std::string secondMatch = match[1];
}
I want my client code not to be bothered with templates and iterators... I know, I'm a wuss. After peering for an hour over the template spaghetti in the boost docs for doing something so simple, I feel like my productivity is seriously getting hampered and I don't feel like I've learned anything from it.
boost::regex_match makes this pretty simple with boost::cmatch, except that it only matches the whole string, so I've been adapting all my patterns to match whole strings. That feels like a dirty hack, and I would prefer a more proper solution. If I had known it would take this long, I would have stuck with regex_match.
Also welcome, a copy of Reading boost documentation for dummies
Next week in Keep it simple and easy with boost, function binders! No, just kidding, I wouldn't do that to anyone.
Thanks for all help

I think you want regex_search: http://www.boost.org/doc/libs/1_44_0/libs/regex/doc/html/boost_regex/ref/regex_search.html
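This overload is probably the one you want:
template <class ST, class SA, class Allocator, class charT, class traits>
bool regex_search(const basic_string<charT, ST, SA>& s,
                  match_results<typename basic_string<charT, ST, SA>::const_iterator, Allocator>& m,
                  const basic_regex<charT, traits>& e,
                  match_flag_type flags = match_default);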
That seems to match what you wanted - SomeResultsType is smatch, and you need to convert your pattern to a regex first.
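A minimal sketch of the wrapper the question asks for. It is written against std::regex so that it compiles without Boost; the assumption is that swapping in boost::regex, boost::smatch and boost::regex_search works the same way, since the two interfaces mirror each other for this call.

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper: true when `pattern` matches anywhere inside `text`.
// On success, match[0] is the whole match and match[1], match[2], ... are
// the capture groups.
bool search(const std::string& text, const std::string& pattern, std::smatch& match) {
    const std::regex re(pattern);  // throws std::regex_error on a bad pattern
    return std::regex_search(text, match, re);
}
```

The match object stores iterators into text, so the string passed in must outlive any use of match; passing a temporary string would leave it dangling.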

On Windows, you can use the .NET Regex class from C++/CLI.
Example (copied from the linked page):
#using <System.dll>
using namespace System;
using namespace System::Text::RegularExpressions;
int main()
{
// Define a regular expression for repeated words.
Regex^ rx = gcnew Regex( "\\b(?<word>\\w+)\\s+(\\k<word>)\\b",static_cast<RegexOptions>(RegexOptions::Compiled | RegexOptions::IgnoreCase) );
// Define a test string.
String^ text = "The the quick brown fox fox jumped over the lazy dog dog.";
// Find matches.
MatchCollection^ matches = rx->Matches( text );
// Report the number of matches found.
Console::WriteLine( "{0} matches found.", matches->Count );
// Report on each match.
for each (Match^ match in matches)
{
String^ word = match->Groups["word"]->Value;
int index = match->Index;
Console::WriteLine("{0} repeated at position {1}", word, index);
}
}
