C++11 regex to match nothing [duplicate] - c++

This question already has answers here:
A Regex that will never be matched by anything
(30 answers)
Closed 9 years ago.
I encountered a task which reuires a regex matching nothing.
C++ reference happily states that it already has such thing:
http://en.cppreference.com/w/cpp/regex/basic_regex/basic_regex
1) Default constructor. Constructs an empty regular expression which will match nothing.
But in reality (clang 3.3) it happens not to be the case:
#include <string>
#include <regex>
int main(int argc, const char *argv[]) {
std::regex re1;
std::regex re2("");
std::smatch rt1, rt2;
bool r1 = std::regex_match(std::string(""), rt1, re1);
bool r2 = std::regex_match(std::string(""), rt2, re2);
std::cerr << "r1:" << r1 << ", r2:" << r2 << std::endl;
}
This program prints: r1:1, r2:1
What should mean that both regexes matched empty string.
Any idea what is wrong here and how to create "match nothing" regex ?

The default constructor for std::basic_regex constructs a regular expression that "does not match any character sequence". [re.regex.construct]/1. If your implementation matches an empty character sequence it's wrong.

Related

regex_match not returning true [duplicate]

This question already has an answer here:
Regex not working as expected with C++ regex_match
(1 answer)
Closed 4 days ago.
I am very confused why this regex match in C++ not working.
#include <iostream>
#include <regex>
#include <string>
void test_code(){
const std::string test_string("this is a test of test");
const std::regex match_regex("test");
std::cout<<test_string<<std::endl;
std::smatch match;
if (std::regex_match(test_string, match, match_regex)){
std::cout<<match.size()<<std::endl;
}
}
int main() {
test_code();
}
I read the CPP reference documentation and tried to write a simple regex check. I am not sure why this is not working (i.e. it s not returning true for std::regex_match(...) call.
As stated in documentation for std::regex_match() (emphasis is mine):
Determines if the regular expression e matches the entire target character sequence, which may be specified as std::string, a C-string, or an iterator pair.
and your regex pattern does not obviously match the whole string. So you either need to change your regex to something like ".*test.*" or use std::regex_search() If you want to check substring for matching:
Determines if there is a match between the regular expression e and some subsequence in the target character sequence.

Validating an NMEA sentence using C++ [duplicate]

This question already has answers here:
Regex statement in C++ isn't working as expected [duplicate]
(3 answers)
Closed 3 years ago.
I need help with creating regular expressions for NMEA sentence. The reason for this because I want to validate the data whether it is a correct form of NMEA sentence. Using C++. Below is some example of NMEA sentence in the form of GLL. If it's possible I would also like to get a sample of c++ that will validate the code.
$GPGLL,5425.32,N,106.92,W,82808*64
$GPGLL,5425.33,N,106.91,W,82826*6a
$GPGLL,5425.32,N,106.9,W,82901*5e
$GPGLL,5425.32,N,106.89,W,82917*61
I have also included the expression I have tried that I found it online. But when I run it, it says unknown escape sequence.
#include <iostream>
#include <regex>
#include<string.h>
using namespace std;
int main()
{
// Target sequence
string s = "$GPGLL, 54 30.49, N, 1 06.74, W, 16 39 58 *5E";
// An object of regex for pattern to be searched
regex r("[A-Z] \w+,\d,\d,(?:\d{1}|),[A-B],[^,]+,0\*([A-Za-z0-9]{2})");
// flag type for determining the matching behavior
// here it is for matches on 'string' objects
smatch m;
// regex_search() for searching the regex pattern
// 'r' in the string 's'. 'm' is flag for determining
// matching behavior.
regex_search(s, m, r);
// for each loop
for (auto x : m)
cout << "The nmea sentence is correct ";
return 0;
}
The C++ compiler interprets \d and friends as a character escape code.
Either double the backslashes:
regex r("[A-Z] \\w+,\\d,\\d,(?:\\d{1}|),[A-B],[^,]+,0\\*([A-Za-z0-9]{2})");
or use a raw literal:
regex r(R"re([A-Z] \w+,\d,\d,(?:\d{1}|),[A-B],[^,]+,0\*([A-Za-z0-9]{2}))re");

Is regex match guaranteed to always only look out for the last pattern? C++ [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 4 years ago.
Assume I have a string like this:
"a-b-c-d"
n = 4 sequences seperated by "-".
Now I want to receive the first n - 1 sequences ("a-b-c") and the last sequence - ("d").
I can achieve this with the following code:
std::string string{ "a-b-c-d" };
std::regex reg{ "^(.*)-(.*)$" };
std::smatch match;
std::regex_match(string, match, reg);
std::cout << match.str(1) << '\n';
std::cout << match.str(2) << '\n';
producing the excpected output:
a-b-c
d
However, following the pure logical grammar of this regex ("^(.*)-(.*)$")
a
b-c-d
or
a-b
c-d
could also be valid matches of the string. Afterall (.*) could be interpreted differently here and the first (.*) could decide to stop at the first character sequence or the second etc.
So my question: is std::smatch guaranteed to behave this way? Does std::smatch always explicitly look for the last patterns when giving the option to capture with (.*)? Is there a way to tell std::smatch to look for the first occurrence rather than the last?
* is greedy. So the first (.*) matches as much as it can while the second (.*) still has something left to match. There is only one correct match, and it is the one you want.
If you want the first group to be matched non-greedily, add a ? after the *:
^(.*?)-(.*)$
For your example input a-b-c-d this leaves you with a in the first capture group and b-c-d in the second.

C++ std::regex confusion

While working on a solution to this question, I came up with the following c++ regex:
#include <regex>
#include <string>
#include <iostream>
std::string remove_password(std::string const& input)
{
// I think this should work for skipping escaped quotes in the password.
// It works in javascript, but not in the standard library implementation.
// anyone have any ideas?
// (.*password\(("|'))(?:\\\2|[^\2])*?(\2.*)
// const char prog[] = R"__regex((.*password\(')([^']*)('.*)))__regex";
const char prog[] = R"__regex((.*password\(("|'))(?:\\\2|[^\2])*?(\2.*))__regex";
auto reg = std::regex(prog, std::regex_constants::syntax_option_type::ECMAScript);
std::smatch match;
std::regex_match(input, match, reg);
// match[0] is the entire string
// match[1] is pre-password
// match[2] is the password
// match[3] is post-password
return match[1].str() + "********" + match[3].str();
}
int main()
{
using namespace std::literals;
auto test_string = R"__(select * from run_on_hive(server('hdp230m2.labs.teradata.com'),username('vijay'),password('vijay'),dbname('default'),query('analyze table default.test01 compute statistics'));)__";
std::cout << remove_password(test_string);
}
I wanted to capture passwords, even if they contained an escaped quote or double-quote.
However the regex does not compile in clang or gcc.
It compiles correctly in regex101.com when using the javascript syntax.
Am I wrong, or is the implementation incorrect?
Note that ECMAScript is the default flavor in C++ std::regex, you do not have to specify it explicitly. At any rate, std::regex_constants::syntax_option_type::ECMAScript causes one error here since the compiler expects a std::regex_constants value here, and the simplest fix is to remove it or use std::regex(prog, std::regex_constants::ECMAScript).
The [^\2] pattern causes the second issue, Unexpected character in bracket expression. You cannot use backreferences inside bracket expressions, but you may use a negative lookahead to restrict a . / [^] pattern to match anything but what Group 2 holds.
Use
const char prog[] = R"((.*password\((["']))(?:\\\2|(?!\2)[^])*?(\2.*))";
See your fixed C++ demo.
However, it seems you may use a "cleaner" approach using std::regex_replace:
std::string remove_password(std::string const& input)
{
const char prog[] = R"((.*password\((["']))(?:\\\2|(?!\2)[^])*?(\2.*))";
auto reg = std::regex(prog);
return std::regex_replace(input, reg, "$1********$3");
}
See another C++ demo. The $1 and $3 are the placeholders for Group 1 and 3 values.

Difference between std::regex_match & std::regex_search?

Below program has been written to fetch the "Day" information using the C++11 std::regex_match & std::regex_search. However, using the first method returns false and second method returns true(expected). I read the documentation and already existing SO question related to this, but I do not understand the difference between these two methods and when we should use either of them? Can they both be used interchangeably for any common problem?
Difference between regex_match and regex_search?
#include<iostream>
#include<string>
#include<regex>
int main()
{
std::string input{ "Mon Nov 25 20:54:36 2013" };
//Day:: Exactly Two Number surrounded by spaces in both side
std::regex r{R"(\s\d{2}\s)"};
//std::regex r{"\\s\\d{2}\\s"};
std::smatch match;
if (std::regex_match(input,match,r)) {
std::cout << "Found" << "\n";
} else {
std::cout << "Did Not Found" << "\n";
}
if (std::regex_search(input, match,r)) {
std::cout << "Found" << "\n";
if (match.ready()){
std::string out = match[0];
std::cout << out << "\n";
}
}
else {
std::cout << "Did Not Found" << "\n";
}
}
Output
Did Not Found
Found
25
Why first regex method returns false in this case?. The regex seems to be correct so ideally both should have been returned true. I ran the above program by changing the std::regex_match(input,match,r) to std::regex_match(input,r) and found that it still returns false.
Could somebody explain the above example and, in general, use cases of these methods?
regex_match only returns true when the entire input sequence has been matched, while regex_search will succeed even if only a sub-sequence matches the regex.
Quoting from N3337,
§28.11.2/2 regex_match [re.alg.match]
Effects: Determines whether there is a match between the regular expression e, and all of the character sequence [first,last). ... Returns true if such a match exists, false otherwise.
The above description is for the regex_match overload that takes a pair of iterators to the sequence to be matched. The remaining overloads are defined in terms of this overload.
The corresponding regex_search overload is described as
§28.11.3/2 regex_search [re.alg.search]
Effects: Determines whether there is some sub-sequence within [first,last) that matches the regular expression e. ... Returns true if such a sequence exists, false otherwise.
In your example, if you modify the regex to r{R"(.*?\s\d{2}\s.*)"}; both regex_match and regex_search will succeed (but the match result is not just the day, but the entire date string).
Live demo of a modified version of your example where the day is being captured and displayed by both regex_match and regex_search.
It's very simple. regex_search looks through the string to find if any portion of the string matches the regex. regex_match checks if the whole string is a match for the regex. As a simple example, given the following string:
"one two three four"
If I use regex_search on that string with the expression "three", it will succeed, because "three" can be found in "one two three four"
However, if I use regex_match instead, it will fail, because "three" is not the whole string, but only a part of it.