Difference between std::regex_match & std::regex_search? - c++

The program below was written to extract the "day" from a date string using the C++11 functions std::regex_match and std::regex_search. However, the first function returns false while the second returns true (as expected). I have read the documentation and the existing SO question on this, but I still do not understand the difference between the two functions or when to use each of them. Can they be used interchangeably for any common problem?
Difference between regex_match and regex_search?
#include <iostream>
#include <string>
#include <regex>
int main()
{
    std::string input{ "Mon Nov 25 20:54:36 2013" };
    // Day: exactly two digits surrounded by whitespace on both sides
    std::regex r{R"(\s\d{2}\s)"};
    //std::regex r{"\\s\\d{2}\\s"};
    std::smatch match;
    if (std::regex_match(input,match,r)) {
        std::cout << "Found" << "\n";
    } else {
        std::cout << "Did Not Found" << "\n";
    }
    if (std::regex_search(input, match,r)) {
        std::cout << "Found" << "\n";
        if (match.ready()){
            std::string out = match[0];
            std::cout << out << "\n";
        }
    }
    else {
        std::cout << "Did Not Found" << "\n";
    }
}
Output
Did Not Found
Found
25
Why does the first call return false in this case? The regex seems to be correct, so ideally both calls should have returned true. I also changed std::regex_match(input,match,r) to std::regex_match(input,r) and found that it still returns false.
Could somebody explain the above example and, in general, use cases of these methods?

regex_match only returns true when the entire input sequence has been matched, while regex_search will succeed even if only a sub-sequence matches the regex.
Quoting from N3337,
§28.11.2/2 regex_match [re.alg.match]
Effects: Determines whether there is a match between the regular expression e, and all of the character sequence [first,last). ... Returns true if such a match exists, false otherwise.
The above description is for the regex_match overload that takes a pair of iterators to the sequence to be matched. The remaining overloads are defined in terms of this overload.
The corresponding regex_search overload is described as
§28.11.3/2 regex_search [re.alg.search]
Effects: Determines whether there is some sub-sequence within [first,last) that matches the regular expression e. ... Returns true if such a sequence exists, false otherwise.
In your example, if you modify the regex to r{R"(.*?\s\d{2}\s.*)"}; both regex_match and regex_search will succeed (but the match result is not just the day, but the entire date string).
Live demo of a modified version of your example where the day is being captured and displayed by both regex_match and regex_search.

It's very simple. regex_search looks through the string to find if any portion of the string matches the regex. regex_match checks if the whole string is a match for the regex. As a simple example, given the following string:
"one two three four"
If I use regex_search on that string with the expression "three", it will succeed, because "three" can be found in "one two three four"
However, if I use regex_match instead, it will fail, because "three" is not the whole string, but only a part of it.

Related

regex_match not returning true [duplicate]

I am very confused why this regex match in C++ not working.
#include <iostream>
#include <regex>
#include <string>
void test_code(){
    const std::string test_string("this is a test of test");
    const std::regex match_regex("test");
    std::cout<<test_string<<std::endl;
    std::smatch match;
    if (std::regex_match(test_string, match, match_regex)){
        std::cout<<match.size()<<std::endl;
    }
}
int main() {
    test_code();
}
I read the CPP reference documentation and tried to write a simple regex check. I am not sure why this is not working (i.e. the std::regex_match(...) call does not return true).
As stated in documentation for std::regex_match() (emphasis is mine):
Determines if the regular expression e matches the entire target character sequence, which may be specified as std::string, a C-string, or an iterator pair.
and your regex pattern obviously does not match the whole string. So you either need to change your regex to something like ".*test.*" or, if you want to check whether a substring matches, use std::regex_search():
Determines if there is a match between the regular expression e and some subsequence in the target character sequence.

Replace single backslash with double in a string c++

I am trying to replace one backslash with two. To do that I tried using the following code
str = "d:\test\text.txt"
str.replace("\\","\\\\");
The code does not work. The whole idea is to pass str to the deletefile function, which requires double backslashes.
Since C++11, you can use a regex for this:
#include <regex>
#include <string>
#include <iostream>
int main() {
    auto s = std::string(R"(\tmp\)");
    // The pattern \\ matches one backslash; the replacement text is two backslashes
    s = std::regex_replace(s, std::regex(R"(\\)"), R"(\\)");
    std::cout << s << std::endl;
}
A bit overkill, but it does the trick if you want a "quick" solution.
There are two errors in your code.
First line: you forgot to double the \ in the literal string.
It happens that \t is a valid escape representing the tab character, so you get no compiler error, but your string doesn't contain what you expect.
Second line: according to the reference of string::replace,
you can replace a substring by another substring based on the substring position.
However, there is no version that performs a substitution, i.e. replaces all occurrences of a given substring with another one.
This doesn't exist in the standard library. It exists for example in the boost library, see boost string algorithms. The algorithm you are looking for is called replace_all.

Is regex match guaranteed to always only look out for the last pattern? C++ [duplicate]

Assume I have a string like this:
"a-b-c-d"
n = 4 sequences separated by "-".
Now I want to get the first n - 1 sequences ("a-b-c") and the last sequence ("d").
I can achieve this with the following code:
std::string string{ "a-b-c-d" };
std::regex reg{ "^(.*)-(.*)$" };
std::smatch match;
std::regex_match(string, match, reg);
std::cout << match.str(1) << '\n';
std::cout << match.str(2) << '\n';
producing the expected output:
a-b-c
d
However, following the pure logical grammar of this regex ("^(.*)-(.*)$")
a
b-c-d
or
a-b
c-d
could also be valid matches of the string. After all, (.*) could be interpreted differently here, and the first (.*) could decide to stop at the first character sequence, or the second, etc.
So my question: is std::smatch guaranteed to behave this way? Does std::smatch always look for the last occurrence of the pattern when capturing with (.*)? Is there a way to tell std::smatch to look for the first occurrence rather than the last?
* is greedy, so the first (.*) consumes as much as it can while still allowing the rest of the pattern, -(.*)$, to match. The result is therefore deterministic, and it is the one you want.
If you want the first group to be matched non-greedily, add a ? after the *:
^(.*?)-(.*)$
For your example input a-b-c-d this leaves you with a in the first capture group and b-c-d in the second.

Multiple string replaces in one line

I have a sql statement and for debugging I want to print it. The statement contains placeholders and I want to fill the placeholders in one instruction line before I print. Is this valid or UB?
std::string query("SELECT A, B FROM C WHERE D = ? and E = ?;");
std::cout << query.replace(query.find("?"), 1, "123").replace(query.find("?"), 1, "234") << std::endl;
Is the order of the instructions
Find position of first question mark
Replace first string in query
Find position of second question mark after first replacement
Replace second string in query
guaranteed or is it possible that both find operations can be called before both replace operations like
Find position of first question mark
Find position of second question mark before first replacement
Replace first string in query
Replace second string in query
I'm asking because:
Order of evaluation of the operands of almost all C++ operators
(including the order of evaluation of function arguments in a
function-call expression and the order of evaluation of the
subexpressions within any expression) is unspecified. The compiler can
evaluate operands in any order, and may choose another order when the
same expression is evaluated again.
EDIT:
It's not possible to use third party dependencies in this project.
In query.replace(query.find("?"), 1, "123").replace(query.find("?"), 1, "234"), the two query.find("?") calls are not sequenced relative to each other under the pre-C++17 rules, so the result depends on the order in which the compiler happens to evaluate them.
I cannot find anything in the rules of order of evaluation that strictly specifies the ordering of the function arguments of chained functions. That is to say that in your case you can know that:
The first replace is sequenced before the second one, because the second one operates on its return value
Each find call is sequenced before the replace that uses its return value as an argument
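But what you want is for the first replace to be sequenced before the second find, and there is no such guarantee. For reference, see the rules here. (C++17 tightened these rules: the expression that names the function is now sequenced before its arguments, so from C++17 on the first replace does complete before the second find is evaluated.)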
You can use boost::algorithm::replace_first multiple times:
#include <boost/algorithm/string/replace.hpp>
#include <iostream>
#include <string>
int main() {
    std::string query("SELECT A, B FROM C WHERE D = ? and E = ?;");
    for(auto replacement : {"123", "1"})
        boost::algorithm::replace_first(query, "?", replacement);
    std::cout << query << '\n';
}
Note that this simple string replacement won't work for replacement strings that need quoting.

What is returned in std::smatch and how are you supposed to use it?

string "I am 5 years old"
regex "(?!am )\d"
If you go to http://regexr.com/ and apply the regex to the string, you get 5.
I would like to get this result with std::regex, but I do not understand how to use match results and probably regex has to be changed as well.
std::regex expression("(?!am )\\d");
std::smatch match;
std::string what("I am 5 years old.");
if (regex_search(what, match, expression))
{
    //???
}
std::smatch is an instantiation of the match_results class template for matches on string objects (with string::const_iterator as its iterator type). The members of this class are those described for match_results, but using string::const_iterator as its BidirectionalIterator template parameter.
std::match_results supports an operator[]:
If n > 0 and n < size(), returns a reference to the std::sub_match representing the part of the target sequence that was matched by the nth captured marked subexpression.
If n == 0, returns a reference to the std::sub_match representing the part of the target sequence matched by the entire matched regular expression.
If n >= size(), returns a reference to a std::sub_match representing an unmatched sub-expression (an empty subrange of the target sequence).
In your case, regex_search finds the first match only, and then match[0] holds the entire match text, match[1] would contain the text captured with the first capturing group (the first parenthesized pattern part), etc. In this case, though, your regex does not contain any capturing groups.
You need a capturing group here, since std::regex does not support lookbehind. You used a lookahead, which checks the text that immediately follows the current position, so the regex you have is not doing what you think it is.
So, use the following code:
#include <regex>
#include <string>
#include <iostream>
using namespace std;
int main() {
    std::regex expression(R"(am\s+(\d+))");
    std::smatch match;
    std::string what("I am 5 years old.");
    if (regex_search(what, match, expression))
    {
        cout << match.str(1) << endl;
    }
    return 0;
}
Here, the pattern is am\s+(\d+). It matches am, one or more whitespace characters, and then captures one or more digits with (\d+). Inside the code, match.str(1) gives access to the value captured by a capturing group. As there is only one (...) in the pattern, i.e. one capturing group, its ID is 1. So str(1) returns the text captured by this group.
The raw string literal (R"(...)") allows using a single backslash for regex escapes (like \d, \s, etc.).