I'm using the following code which is similar to Stroustrup's C++ 4th Edition Page 127&128. Per output log below, it prints the first match, however not the match for the trailing -XXXX digits.
Does anyone know why the trailing digits are not matched and/or printed??
Thanks
#include <iostream>
#include <regex>
using namespace std;
int main(int argc, char *argv[])
{
// ZIP code pattern: XXddddd-dddd and variants
regex pat (R"(\w{2}\s*\d{5}(−\d{4})?)");
int lineno = 0;
for (string line; getline(cin,line);) {
++lineno;
smatch matches; // matched strings go here
if (regex_search(line, matches, pat)) // search for pat in line
for (auto p : matches) {
cout << p << " ";
}
cout << endl;
// cout << lineno << ": " << matches[0] << '\n';
}
return 0;
}
Output log:
$ ./a.out
AB00000-0000
AB00000
− is not -. That are two different symbols. You have − in the code and - in the input. Here I fixed the code:
#include <iostream>
#include <regex>
using namespace std;
int main(int argc, char *argv[])
{
// ZIP code pattern: XXddddd-dddd and variants
regex pat (R"(\w{2}\s*\d{5}(-\d{4})?)");
int lineno = 0;
for (string line; getline(cin,line);) {
++lineno;
smatch matches; // matched strings go here
if (regex_search(line, matches, pat)) // search for pat in line
for (auto p : matches) {
cout << p << " ";
}
cout << endl;
// cout << lineno << ": " << matches[0] << '\n';
}
return 0;
}
Related
It is supposed to match "abababab" since "ab" is repeated more than two times consecutively but the code isn't printing any output.
Is there some other trick in using regex in C++.
I tried with other languages and it works just fine.
#include<bits/stdc++.h>
int main(){
std::string s ("xaxababababaxax");
std::smatch m;
std::regex e ("(.+)\1\1+");
while (std::regex_search (s,m,e)) {
for (auto x:m) std::cout << x << " ";
std::cout << std::endl;
s = m.suffix().str();
}
return 0;
}
Your problem is your backslashes are escaping the '1''s in your string. You need to inform std::regex to treat them as '\' 's. You can do this by using a raw string R"((.+)\1\1+)", or by escaping the slashes, as shown here:
#include <regex>
#include <string>
#include <iostream>
int main(){
std::string s ("xaxababababaxax");
std::smatch m;
std::regex e ("(.+)\\1\\1+");
while (std::regex_search (s,m,e)) {
for (auto x:m) std::cout << x << " ";
std::cout << std::endl;
s = m.suffix().str();
}
return 0;
}
Which produces the output
abababab ab
I have this sample code
// regex_search example
#include <iostream>
#include <string>
#include <regex>
int main ()
{
std::string s ("eritueriotu3498 \"pi656\" sdfs3646df");
std::smatch m;
std::string reg("\\(?<=pi\\)\\(\\d+\\)\\(?=\"\\)");
std::regex e (reg);
std::cout << "Target sequence: " << s << std::endl;
std::cout << "The following matches and submatches were found:" << std::endl;
while (std::regex_search (s,m,e)) {
for (auto x:m) std::cout << x << " ";
std::cout << std::endl;
s = m.suffix().str();
}
return 0;
}
I need to get number between pi and " -> (piMYNUMBER")
In online regex service my regex works fine (?<=pi)(\d+)(?=") but c++ regex don't match anything.
Who knows what is wrong with my expression?
Best regards
That is correct, C++ std::regex flavors do not support lookbehinds. You need to capture the digits between pi and ":
#include <iostream>
#include <vector>
#include <regex>
int main() {
std::string s ("eritueriotu3498 \"pi656\" sdfs3646df");
std::smatch m;
std::string reg("pi(\\d+)\""); // Or, with a raw string literal:
// std::string reg(R"(pi(\d+)\")");
std::regex e (reg);
std::vector<std::string> results(std::sregex_token_iterator(s.begin(), s.end(), e, 1),
std::sregex_token_iterator());
// Demo printing the results:
std::cout << "Number of matches: " << results.size() << std::endl;
for( auto & p : results ) std::cout << p << std::endl;
return 0;
}
See the C++ demo. Output:
Number of matches: 1
656
Here, pi(\d+)" pattern matches
pi - a literal substring
(\d+) - captures 1+ digits into Group 1
" - consumes a double quote.
Note the fourth argument to std::sregex_token_iterator, it is 1 because you need to collect only Group 1 values.
I am trying to match a literal number, e.g. 1600442 using a set of regular expressions in Microsoft Visual Studio 2010. My regular expressions are simply:
1600442|7654321
7895432
The problem is that both of the above matches the string.
Implementing this in Python gives the expected result:
import re
serial = "1600442"
re1 = "1600442|7654321"
re2 = "7895432"
m = re.match(re1, serial)
if m:
print "found for re1"
print m.groups()
m = re.match(re2, serial)
if m:
print "found for re2"
print m.groups()
Gives output
found for re1
()
Which is what I expected. Using this code in C++ however:
#include <string>
#include <iostream>
#include <regex>
int main(){
std::string serial = "1600442";
std::tr1::regex re1("1600442|7654321");
std::tr1::regex re2("7895432");
std::tr1::smatch match;
std::cout << "re1:" << std::endl;
std::tr1::regex_search(serial, match, re1);
for (auto i = 0;i <match.length(); ++i)
std::cout << match[i].str().c_str() << " ";
std::cout << std::endl << "re2:" << std::endl;
std::tr1::regex_search(serial, match, re2);
for (auto i = 0;i <match.length(); ++i)
std::cout << match[i].str().c_str() << " ";
std::cout << std::endl;
std::string s;
std::getline (std::cin,s);
}
gives me:
re1:
1600442
re2:
1600442
which is not what I expected. Why do I get match here?
The smatch does not get overwritten by the second call to regex_search thus, it is left intact and contains the first results.
You can move the regex searching code to a separate method:
void FindMeText(std::regex re, std::string serial)
{
std::smatch match;
std::regex_search(serial, match, re);
for (auto i = 0;i <match.length(); ++i)
std::cout << match[i].str().c_str() << " ";
std::cout << std::endl;
}
int main(){
std::string serial = "1600442";
std::regex re1("^(?:1600442|7654321)");
std::regex re2("^7895432");
std::cout << "re1:" << std::endl;
FindMeText(re1, serial);
std::cout << "re2:" << std::endl;
FindMeText(re2, serial);
std::cout << std::endl;
std::string s;
std::getline (std::cin,s);
}
Result:
Note that Python re.match searches for the pattern match at the start of string only, thus I suggest using ^ (start of string) at the beginning of each pattern.
Below is my following code
#include <iostream>
#include <stdlib.h>
#include <boost/regex.hpp>
#include <string>
using namespace std;
using namespace boost;
int main() {
std::string s = "Hello my name is bob";
boost::regex re("name");
boost::cmatch matches;
try{
// if (boost::regex_match(s.begin(), s.end(), re))
if (boost::regex_match(s.c_str(), matches, re)){
cout << matches.size();
// matches[0] contains the original string. matches[n]
// contains a sub_match object for each matching
// subexpression
for (int i = 1; i < matches.size(); i++){
// sub_match::first and sub_match::second are iterators that
// refer to the first and one past the last chars of the
// matching subexpression
string match(matches[i].first, matches[i].second);
cout << "\tmatches[" << i << "] = " << match << endl;
}
}
else{
cout << "No Matches(" << matches.size() << ")\n";
}
}
catch (boost::regex_error& e){
cout << "Error: " << e.what() << "\n";
}
}
It always outputs with no Matches.
Im sure that regex should work.
I used this example
http://onlamp.com/pub/a/onlamp/2006/04/06/boostregex.html?page=3
from boost regex:
Important
Note that the result is true only if the expression matches the whole of the input sequence. If you want to search for an expression somewhere within the sequence then use regex_search. If you want to match a prefix of the character string then use regex_search with the flag match_continuous set.
Try boost::regex re("(.*)name(.*)"); if you want to use the expression with regex_match.
From the following code, I expect to get this output from the corresponding input:
Input: FOO Output: Match
Input: FOOBAR Output: Match
Input: BAR Output: No Match
Input: fOOBar Output: No Match
But why it gives "No Match" for input FOOBAR?
#include <iostream>
#include <vector>
#include <fstream>
#include <sstream>
#include <boost/regex.hpp>
using namespace std;
using namespace boost;
int main ( int arg_count, char *arg_vec[] ) {
if (arg_count !=2 ) {
cerr << "expected one argument" << endl;
return EXIT_FAILURE;
}
string InputString = arg_vec[1];
string toMatch = "FOO";
const regex e(toMatch);
if (regex_match(InputString, e,match_partial)) {
cout << "Match" << endl;
} else {
cout << "No Match" << endl;
}
return 0;
}
Update:
Finally it works with the following approach:
#include <iostream>
#include <vector>
#include <fstream>
#include <sstream>
#include <boost/regex.hpp>
using namespace std;
using namespace boost;
bool testSearchBool(const boost::regex &ex, const string st) {
cout << "Searching " << st << endl;
string::const_iterator start, end;
start = st.begin();
end = st.end();
boost::match_results<std::string::const_iterator> what;
boost::match_flag_type flags = boost::match_default;
return boost::regex_search(start, end, what, ex, flags);
}
int main ( int arg_count, char *arg_vec[] ) {
if (arg_count !=2 ) {
cerr << "expected one argument" << endl;
return EXIT_FAILURE;
}
string InputString = arg_vec[1];
string toMatch = "FOO*";
static const regex e(toMatch);
if ( testSearchBool(e,InputString) ) {
cout << "MATCH" << endl;
}
else {
cout << "NOMATCH" << endl;
}
return 0;
}
Use regex_search instead of regex_match.
Your regular expression has to account for characters at the beginning and end of the sub-string "FOO".
I'm not sure but "FOO*" might do the trick
match_partial would only return true if the partial string was found at the end of the text input, not the beginning.
A partial match is one that matched
one or more characters at the end of
the text input, but did not match all
of the regular expression (although it
may have done so had more input been
available)
So FOOBAR matched with "FOO" would return false.
As the other answer suggests, using regex.search would allow you to search for sub-strings more effectively.