C++ Regex always matching entire string - c++

Whenever I use a regex function it matches the entire string for some reason.
#include <iostream>
#include <regex>
int main() {
std::string text = "This (is a) test";
std::regex pattern("\(.+\)");
std::cout << std::regex_replace(text, pattern, "isnt") << std::endl;
return 0;
}
Output: isnt

Your pattern is unfortunately not what it seems to be. Here is the problem.
Imagine that for some reason you want to match tabs with your regex. You might try this:
std::regex my_regex("\t");
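This would work, but the string your std::regex object sees is an actual tab character, not the two characters \t. This is because of how C++ treats escape sequences in string literals: the compiler processes them before the regex engine ever sees the string. To pass a literal \t, you have to double the backslash: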
std::regex my_regex("\\t");
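The same applies to your pattern: "\(" is not a recognized C++ escape sequence, so most compilers emit a warning and drop the backslash. The regex engine then sees (.+), a capturing group that matches the whole string. The correct syntax for your regex is: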
std::regex pattern("\\(.+\\)");
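A raw string literal avoids the double backslashes altogether. The following is a minimal sketch, not part of the original answer; the helper name replace_parenthesized is made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (name chosen for this sketch): replaces every
// parenthesized run in `text` with `replacement`.
std::string replace_parenthesized(const std::string& text,
                                  const std::string& replacement) {
    // R"(...)" is a raw string literal, so the regex engine receives
    // \(.+\) exactly as written, with no C++ escape processing.
    static const std::regex pattern(R"(\(.+\))");
    return std::regex_replace(text, pattern, replacement);
}
```

Calling replace_parenthesized("This (is a) test", "isnt") should leave the text outside the parentheses untouched and replace only "(is a)".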

Related

regex_replace invalid open parenthesis

DEMO
#include <iostream>
#include <regex>
int main() {
std::wstring str = LR"(
bst.enable_adb_access="1"
)";
std::wregex re(L"(?<=bst\\.enable_adb_access.*?)\\d+");
str = std::regex_replace(str, re, L"0");
std::wcout << str << std::endl;
}
error:
terminate called after throwing an instance of 'std::regex_error'
what(): Invalid special open parenthesis.
https://regex101.com/r/a33eFL/1
What's wrong with the parenthesis?
Well, this is one illustration why the plural of "regex" is "regrets"...
C++ accepts several flavours of regexes, but none of them seems to understand lookbehinds. The default modified ECMAScript flavour only accepts lookaheads. I'm not 100% sure about the POSIX, awk and grep flavours, but none of them seems to have any lookarounds whatsoever.
Fortunately, you can get the same effect without lookarounds by using a capturing group. I had to change the format string rules to sed, because the default ECMAScript rules allow two-digit backreferences, so a reference to group 1 followed by a literal 0 could be read as group 10.
#include <iostream>
#include <regex>
int main() {
std::wstring str = LR"(
bst.enable_adb_access="1"
)";
std::wregex re(L"(bst\\.enable_adb_access.*?)\\d+");
str = std::regex_replace(str, re, L"\\10", std::regex_constants::format_sed);
std::wcout << str << std::endl;
}
You don't need to use a lookbehind for this situation. Simply use a normal capturing group and include it in the replacement string:
#include <iostream>
#include <regex>
using namespace std;
int main() {
std::wstring str = LR"(
bst.enable_adb_access="1"
)";
std::wregex re(L"(bst\\.enable_adb_access.*?)\\d+");
str = std::regex_replace(str, re, L"$010");
std::wcout << str << std::endl;
}
Output:
bst.enable_adb_access="0"
Note that because the substitution for the capturing group is followed by a digit, we need to use the $nn format for the group number (hence $010); otherwise $10 could, depending on the implementation, be interpreted as a reference to capture group 10.

How to match complex strings with regular expressions

I am a newbie in C++. I am using the regular expression functions, but I have not been able to get the results I want.
C++ code:
#include <regex>
std::string str = "[game.exe+009E820C]+338";
std::smatch result;
std::regex pattern("\\[([^\\[\\]]+)\\]");
std::regex_match(str, result, pattern);
// no result
std::cout << result[1] << std::endl;
I am familiar with javascript regular expressions, so I can get the value I want:
'[game.exe+009E820C]+338'.match(/\[([^\[\]]+)\]/)[1] => game.exe+009E820C
Is my C++ code doing something wrong?
regex_match only succeeds when the pattern matches the entire input, so to access the capture groups with it you need a pattern that covers the whole string. Also, to avoid getting bogged down by a negated character class that includes a closing square bracket, I recommend using the Perl-style lazy dot instead. Putting all this together:
std::string str = "[game.exe+009E820C]+338";
std::smatch result;
std::regex pattern(".*\\[(.*?)\\].*");
std::regex_match(str, result, pattern);
std::cout << result[1] << std::endl;
This prints:
game.exe+009E820C
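An alternative that keeps the asker's original pattern is std::regex_search, which accepts a match anywhere in the input, much like JavaScript's match. This is a sketch under that assumption; the helper name first_bracketed is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (name chosen for this sketch): returns the text
// inside the first [...] pair, or an empty string when there is none.
std::string first_bracketed(const std::string& input) {
    // Same pattern as in the question, written as a raw string literal.
    static const std::regex pattern(R"(\[([^\[\]]+)\])");
    std::smatch result;
    // regex_search succeeds on a partial match; regex_match would not.
    if (std::regex_search(input, result, pattern)) {
        return result[1].str();
    }
    return {};
}
```

With this helper, first_bracketed("[game.exe+009E820C]+338") should yield the bracketed part without needing the surrounding .* padding.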

Regexp matching fails with invalid special open parenthesis

I am trying to use regexps in C++11, but my code always throws a std::regex_error with the message "Invalid special open parenthesis". A minimal example that tries to find the first duplicate character in a string:
std::string regexp_string("(?P<a>[a-z])(?P=a)"); // Nothing to be escaped here, right?
std::regex regexp_to_match(regexp_string);
std::string target("abbab");
std::smatch matched_regexp;
std::regex_match(target, matched_regexp, regexp_to_match);
for(const auto& m: matched_regexp)
{
std::cout << m << std::endl;
}
Why do I get an error and how do I fix this example?
There are two issues here:
1. std::regex flavors do not support named capturing groups or named backreferences; you need to use numbered capturing groups and backreferences.
2. You should use regex_search rather than regex_match, which requires a full string match.
Use:
std::string regexp_string(R"(([a-z])\1)");
std::regex regexp_to_match(regexp_string);
std::string target("abbab");
std::smatch matched_regexp;
if (std::regex_search(target, matched_regexp, regexp_to_match)) {
std::cout << matched_regexp.str() << std::endl;
}
// => bb
The R"(([a-z])\1)" raw string literal defines the ([a-z])\1 regex that matches any lowercase ASCII letter and then matches the same letter again.
http://en.cppreference.com/w/cpp/regex/ecmascript says that ECMAScript (the default type for std::regex) requires (?= for positive lookahead.
The reason your regex throws is that named groups are not supported by std::regex. However, you can still use what is available to find the first duplicate character in a string:
#include <iostream>
#include <regex>
int main()
{
std::string s = "abc def cde";
std::smatch m;
std::regex r("(\\w).*?(?=\\1)");
if (std::regex_search(s, m, r))
std::cout << m[1] << std::endl;
return 0;
}
Prints
c

Replacing tokens that match pieces of a regex

I would like to use a regex both as a pattern to search with and as a template to construct a string. (I'm using boost::regex because I'm on gcc 4.8.4, where std::regex is apparently not fully supported until 4.9.)
That is, I want to construct a regex, pass it to a function, use the regex to match some files, then construct an output file name following the same pattern. For example:
Regex: "file_.*\.txt"
to match things like "file_1.txt", "file_2.txt", etc.
and then would like to construct from it
Output: "file_all.txt"
That is, I want to match files starting with "file_" and ending with ".txt", then I want to fill in "all" between the "file_" and the ".txt", all from a single regex object.
We'll skip the matching to the regex as that is straightforward, but rather focus on the replacement:
#include <iostream>
#include <iterator>
#include <string>
#include <boost/regex.hpp>
std::string constructOutput(const boost::regex& myRegex)
{
// How to replace the match to the center of the filenames here?
// return boost::regex_replace(?, myRegex, "all");
}
int main()
{
// We can do something like this, but it requires us to manually separate the "center" of the regex from the string, as well as keep around a string object and a regex object:
// std::string myText = "File_.*.txt";
// boost::regex myRegex("_.*\\.");
// std::cout << '\n' << boost::regex_replace(myText, myRegex, "_all.") << '\n';
// Want to do this:
boost::regex myRegex("File_.*\\.txt");
std::string outputString = constructOutput(myRegex);
std::cout << outputString << std::endl;
}
Is something like this possible?

Need help constructing Regular expression pattern

I'm failing to create a pattern for the STL regex_match function and need some help understanding why the pattern I created doesn't work and what would fix it.
I think the regex should match dl.boxcloud.com, but it does not.
Update: still looking for input. I updated the program to reflect the suggestions. There are two matches when I think there should be one.
#include <string>
#include <regex>
using namespace std;
wstring GetBody();
int _tmain(int argc, _TCHAR* argv[])
{
wsmatch m;
wstring regex(L"(dl\\.boxcloud\\.com|api-content\\.dropbox\\.com)");
regex_search(GetBody(), m, wregex(regex));
printf("%d matches.\n", m.size());
return 0;
}
wstring GetBody() {
wstring body(L"ABOUTLinkedIn\r\n\r\nwall of textdl.boxcloud.com/this/file/bitbyte.zip sent you a message.\r\n\r\nDate: 12/04/2012\r\n\r\nSubject: RE: Reference Ask\r\n\r\nOn 12/03/12 2:02 PM, wall of text wrote:\r\n--------------------\r\nRuba,\r\n\r\nI am looking for a n.");
return body;
}
There is no problem with the code itself. You mistake m.size() for the number of matches, when in fact it is the number of groups in a single match: the whole match plus one entry per capturing group.
The std::match_results::size reference does not help with understanding that:
Returns the number of matches and sub-matches in the match_results object.
Here size() is 2 (the whole match plus the capturing group you defined around the two alternatives), and there is 1 match all in all.
The following demo shows this:
#include <regex>
#include <string>
#include <iostream>
#include <time.h>
using namespace std;
int main()
{
string data("ABOUTLinkedIn\r\n\r\nwall of textdl.boxcloud.com/this/file/bitbyte.zip sent you a message.\r\n\r\nDate: 12/04/2012\r\n\r\nSubject: RE: Reference Ask\r\n\r\nOn 12/03/12 2:02 PM, wall of text wrote:\r\n--------------------\r\nRuba,\r\n\r\nI am looking for a n.");
std::regex pattern("(dl\\.boxcloud\\.com|api-content\\.dropbox\\.com)");
std::smatch result;
while (regex_search(data, result, pattern)) {
std::cout << "Match: " << result[0] << std::endl;
std::cout << "Captured text 1: " << result[1] << std::endl;
std::cout << "Size: " << result.size() << std::endl;
data = result.suffix().str();
}
}
It outputs:
Match: dl.boxcloud.com
Captured text 1: dl.boxcloud.com
Size: 2
As you can see, the captured text equals the whole match.
To "fix" that, you may use a non-capturing group, or remove the grouping altogether:
std::regex pattern("(?:dl\\.boxcloud\\.com|api-content\\.dropbox\\.com)");
// or
std::regex pattern("dl\\.boxcloud\\.com|api-content\\.dropbox\\.com");
Also, consider using a raw string literal when declaring a regex (to avoid backslash hell):
std::regex pattern(R"(dl\.boxcloud\.com|api-content\.dropbox\.com)");
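If the goal is to count matches rather than groups, one option is std::sregex_iterator, which walks over every non-overlapping match. The sketch below is an illustration only; count_matches is a hypothetical helper name.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper (name chosen for this sketch): counts the
// non-overlapping matches of `pattern` in `text`.
std::size_t count_matches(const std::string& text, const std::regex& pattern) {
    std::sregex_iterator first(text.begin(), text.end(), pattern);
    std::sregex_iterator last;  // a default-constructed iterator marks the end
    return static_cast<std::size_t>(std::distance(first, last));
}
```

Unlike match_results::size(), the result here does not change when capturing groups are added to or removed from the pattern.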
You need to add another "\" before each ".". I think that should fix it. A backslash in a C++ string literal must itself be escaped, so your regex looks like this:
wstring regex(L"(dl\\.boxcloud\\.com|api-content\\.dropbox\\.com)");
Update:
As @user3494744 also said, you have to use
std::regex_search
instead of
std::regex_match.
I tested and it works now.
The problem is that you use regex_match instead of regex_search. To quote from the manual:
Note that regex_match will only successfully match a regular expression to an entire character sequence, whereas std::regex_search will successfully match subsequences
This fix will give a match, but the pattern will match too much unless you also replace \. with \\. in the string literal, as shown in the answer before mine. Otherwise the string "dlXboxcloud.com" will also match.
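The difference between an escaped and an unescaped dot can be checked with a small sketch like the one below. The helper found_in is hypothetical and only wraps std::regex_search for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (name chosen for this sketch): true when
// `pattern` matches somewhere inside `text`.
bool found_in(const std::string& text, const std::string& pattern) {
    // This regex_search overload only reports success, so passing a
    // temporary std::regex is fine.
    return std::regex_search(text, std::regex(pattern));
}
```

With an unescaped dot, found_in("dlXboxcloud.com", "dl.boxcloud.com") succeeds because "." matches any character, whereas the escaped form R"(dl\.boxcloud\.com)" only accepts a literal dot.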