c++ Is there a way to find sentences within strings? - c++

I'm trying to recognise certain phrases within a user defined string but so far have only been able to get a single word.
For example, if I have the sentence:
"What do you think of stack overflow?"
is there a way to search for "What do you" within the string?
I know you can retrieve a single word with the find function but when attempting to get all three it gets stuck and can only search for the first.
Is there a way to search for the whole string in another string?

Use str.find()
size_t find (const string& str, size_t pos = 0)
Its return value is the starting position of the substring. You can test if the string you are looking for is contained in the main string by performing the simple boolean test of returning str::npos:
string str = "What do you think of stack overflow?";
if (str.find("What do you") != str::npos) // is contained
The second argument can be used to limit your search from certain string position.
The OP question mentions it gets stuck in the attempt to find a three word string. Actually, I believe you are misinterpreting the return value. It happens that the return for the single word search "What" and the string "What do you" have coincidental starting positions, therefore str.find() returns the same. To search for individual words positions, use multiple function calls.

Use regular expressions
#include <iostream>
#include <string>
#include <regex>
int main ()
{
std::string s ("What do you think of stack overflow?");
std::smatch m;
std::regex e ("\\bWhat do you think\\b");
std::cout << "The following matches and submatches were found:" << std::endl;
while (std::regex_search (s,m,e)) {
for (auto x:m) std::cout << x << " ";
std::cout << std::endl;
s = m.suffix().str();
}
return 0;
}
Also you can find wildcards implementing with boost (regex in std library was boost::regex library before c++11) there

Related

How can I replace all words in a string except one

So, I would like to change all words in a string except one, that stays in the middle.
#include <boost/algorithm/string/replace.hpp>
int main()
{
string test = "You want to join player group";
string find = "You want to join group";
string replace = "This is a test about group";
boost::replace_all(test, find, replace);
cout << test << endl;
}
The output was expected to be:
This is a test about player group
But it doesn't work, the output is:
You want to join player group
The problem is on finding out the words, since they are a unique string.
There's a function that reads all words, no matter their position and just change what I want?
EDIT2:
This is the best example of what I want to happen:
char* a = "This is MYYYYYYYYY line in the void Translate"; // This is the main line
char* b = "This is line in the void Translate"; // This is what needs to be find in the main line
char* c = "Testing - is line twatawtn thdwae voiwd Transwlate"; // This needs to replace ALL the words in the char* b, perserving the MYYYYYYYYY
// The output is expected to be:
Testing - is MYYYYYYYY is line twatawtn thdwae voiwd Transwlate
You need to invert your thinking here. Instead of matching "All words but one", you need to try to match that one word so you can extract it and insert it elsewhere.
We can do this with Regular Expressions, which became standardized in C++11:
std::string test = "You want to join player group";
static const std::regex find{R"(You want to join (\S+) group)"};
std::smatch search_result;
if (!std::regex_search(test, search_result, find))
{
std::cerr << "Could not match the string\n";
exit(1);
}
else
{
std::string found_group_name = search_result[1];
auto replace = boost::format("This is a test about %1% group") % found_group_name;
std::cout << replace;
}
Live Demo
To match the word "player" I used a pretty simply regular expression (\S+) which means "match one or more non-whitespace characters (greedily) and put that into a group"
"Groups" in regular expressions are enclosed by parentheses. The 0th group is always the entire match, and since we only have one set of parentheses, your word is therefore in group 1, hence the resulting access of the match result at search_result[1].
To create the regular expression, you'll notice I used the perhaps-unfamiliar string literal syntaxR"(...)". This is called a raw string literal and was also standardized in C++11. It was basically made for describing regular expressions without needing to escape backslashes. If you've used Python, it's the same as r'...'. If you've used C#, it's the same as #"..."
I threw in some boost::format to print the result because you were using Boost in the question and I thought you'd like to have some fun with it :-)
In your example, find is not a substring of test, so boost::replace_all(test, find, replace); has no effect.
Removing group from find and replace solves it:
#include <boost/algorithm/string/replace.hpp>
#include <iostream>
int main()
{
std::string test = "You want to join player group";
std::string find = "You want to join";
std::string replace = "This is a test about";
boost::replace_all(test, find, replace);
std::cout << test << std::endl;
}
Output: This is a test about player group.
In this case, there is just one replace of the beginning of the string because the end of the string is already the right one. You could have another call of replace_all to change the end if needed.
Some other options:
one is in the other answer.
split the strings into a vector (or array) of words, then insert the desired word (player) at the right spot of the replace vector, then build your output string from it.

How to search a string for multiple substrings

I need to check a short string for matches with a list of substrings. Currently, I do this like shown below (working code on ideone)
bool ContainsMyWords(const std::wstring& input)
{
if (std::wstring::npos != input.find(L"white"))
return true;
if (std::wstring::npos != input.find(L"black"))
return true;
if (std::wstring::npos != input.find(L"green"))
return true;
// ...
return false;
}
int main() {
std::wstring input1 = L"any text goes here";
std::wstring input2 = L"any text goes here black";
std::cout << "input1 " << ContainsMyWords(input1) << std::endl;
std::cout << "input2 " << ContainsMyWords(input2) << std::endl;
return 0;
}
I have 10-20 substrings that I need to match against an input. My goal is to optimize code for CPU utilization and reduce time complexity for an average case. I receive input strings at a rate of 10 Hz, with bursts to 10 kHz (which is what I am worried about).
There is agrep library with source code written in C, I wonder if there is a standard equivalent in C++. From a quick look, it may be a bit difficult (but doable) to integrate it with what I have.
Is there a better way to match an input string against a set of predefined substrings in C++?
The best thing is to use a regular expression search, if you use the following regular expression:
"(white)|(black)|(green)"
that way, with only one pass over the string, you'll get in group 1 if a match was found for the "white" substring (and beginning and end points), in group 2 if a match of the "black" substring (and beginning and end points), and in group 3 if a match of the "green" substring. As you get, from group 0 the position of the end of the match, you can begin a new search to look for more matches, and everything in one pass over the string!!!
You could use one big if, instead of several if statements. However, Nathan's Oliver solution with std::any_of is faster than that though, when making the array of the substrings static (so that they do not get to be recreated again and again), as shown below.
bool ContainsMyWordsNathan(const std::wstring& input)
{
// do not forget to make the array static!
static std::wstring keywords[] = {L"white",L"black",L"green", ...};
return std::any_of(std::begin(keywords), std::end(keywords),
[&](const std::wstring& str){return input.find(str) != std::string::npos;});
}
PS: As discussed in Algorithm to find multiple string matches:
The "grep" family implement the multi-string search in a very efficient way. If you can use them as external programs, do it.

Determining the location of C++11 regular expression matches

How do I efficiently determine the location of a capture group inside a searched string? Getting the location of the entire match is easy, but I see no obvious ways to get at capture groups beyond the first.
This is a simplified example, lets presume "a*" and "b*" are complicated regexes that are expensive to run.
#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main()
{
regex matcher("a*(needle)b*");
smatch findings;
string haystack("aaaaaaaaneedlebbbbbbbbbbbbbb");
if( regex_match(haystack, findings, matcher) )
{
// What do I put here to know how the offset of "needle" in the
// string haystack?
// This is the position of the entire, which is
// always 0 with regex_match, with regex_search
cout << "smatch::position - " << findings.position() << endl;
// Is this just a string or what? Are there member functions
// That can be called?
cout << "Needle - " << findings[1] << endl;
}
return 0;
}
If it helps I built this question in Coliru: http://coliru.stacked-crooked.com/a/885a6b694d32d9b5
I will not mark this as and answer until 72 hours have passed and no better answers are present.
Before asking this I presumed smatch::position took no arguments I cared about, because when I read the cppreference page the "sub" parameter was not obviously an index into the container of matches. I thought it had something to do with "sub"strings and the offset value of the whole match.
So my answer is:
cout << "Needle Position- " << findings.position(1) << endl;
Any explanation on this design, or other issues my line of thinking may have caused would be appreciated.
According to the documentation, you can access the iterator pointing to the beginning and the end of the captured text via match[n].first and match[n].second. To get the start and end indices, just do pointer arithmetic with haystack.begin().
if (findings[1].matched) {
cout << "[" << findings[1].first - haystack.begin() << "-"
<< findings[1].second - haystack.begin() << "] "
<< findings[1] << endl;
}
Except for the main match (index 0), capturing groups may or may not capture anything. In such cases, first and second will point to the end of the string.
I also demonstrate the matched property of sub_match object. While it's unnecessary in this case, in general, if you want to print out the indices of the capturing groups, it's necessary to check whether the capturing group matches anything first.

I need to swap out two values in a string for one (C++)

I am making a roman numeral converter. I have everything figured out except there is one problem at the end.
The string looks like IVV
I need to make it IX
I have split the string at each new letter, then appended them back on, then using an if statement to see if it contains 2 "V"s. I want to know if there is a simpler way to do this.
Using std::string should help you tremendously as you can leverage its search and replace functionality. You'll want to start with the find function which allows you to search for a character or a string and returns an index where what you are searching for exists or npos if the search fails.
You can then call replace passing it the index returned by find, the number of characters you want to replace and what replace the range with.
The code below should help you get started.
#include <string>
#include <iostream>
int main()
{
std::string roman("IVV");
// Search for the string you want to replace
std::string::size_type loc = roman.find("VV");
// If the substring is found replace it.
if (loc != std::string::npos)
{
// replace 2 characters staring at position loc with the string "X"
roman.replace(loc, 2, "X");
}
std::cout << roman << std::endl;
return 0;
}
You could use std string find and rfind operations, these find the position of the first and the last occurrence of the entered parameter, check if these are not equal and you will know
Answer updated
#include <string>
int main()
{
std::string x1 = "IVV";
if (x1.find('V') !=x1.rfind('V'))
{
x1.replace(x1.find('V'), 2, 'X');
}
return 0;
}

Getting sub-match_results with boost::regex

Hey, let's say I have this regex: (test[0-9])+
And that I match it against: test1test2test3test0
const bool ret = boost::regex_search(input, what, r);
for (size_t i = 0; i < what.size(); ++i)
cout << i << ':' << string(what[i]) << "\n";
Now, what[1] will be test0 (the last occurrence). Let's say that I need to get test1, 2 and 3 as well: what should I do?
Note: the real regex is extremely more complex and has to remain one overall match, so changing the example regex to (test[0-9]) won't work.
I think Dot Net has the ability to make single capture group Collections so that (grp)+ will create a collection object on group1. The boost engine's regex_search() is going to be just like any ordinary match function. You sit in a while() loop matching the pattern where the last match left off. The form you used does not use a bid-itterator, so the function won't start the next match where the last match left off.
You can use the itterator form:
(Edit - you can also use the token iterator, defining what groups to iterate over. Added in the code below).
#include <boost/regex.hpp>
#include <string>
#include <iostream>
using namespace std;
using namespace boost;
int main()
{
string input = "test1 ,, test2,, test3,, test0,,";
boost::regex r("(test[0-9])(?:$|[ ,]+)");
boost::smatch what;
std::string::const_iterator start = input.begin();
std::string::const_iterator end = input.end();
while (boost::regex_search(start, end, what, r))
{
string stest(what[1].first, what[1].second);
cout << stest << endl;
// Update the beginning of the range to the character
// following the whole match
start = what[0].second;
}
// Alternate method using token iterator
const int subs[] = {1}; // we just want to see group 1
boost::sregex_token_iterator i(input.begin(), input.end(), r, subs);
boost::sregex_token_iterator j;
while(i != j)
{
cout << *i++ << endl;
}
return 0;
}
Output:
test1
test2
test3
test0
Boost.Regex offers experimental support for exactly this feature (called repeated captures); however, since it's huge performance hit, this feature is disabled by default.
To enable repeated captures, you need to rebuild Boost.Regex and define macro BOOST_REGEX_MATCH_EXTRA in all translation units; the best way to do this is to uncomment this define in boost/regex/user.hpp (see the reference, it's at the very bottom of the page).
Once compiled with this define, you can use this feature by calling/using regex_search, regex_match and regex_iterator with match_extra flag.
Check reference to Boost.Regex for more info.
Seems to me like you need to create a regex_iterator, using the (test[0-9]) regex as input. Then you can use the resulting regex_iterator to enumerate the matching substrings of your original target.
If you still need "one overall match" then perhaps that work has to be decoupled from the task of finding matching substrings. Can you clarify that part of your requirement?