regex visual studio - c++

I was planning to use the following regex to capture path and name of a file:
std::regex capture_path_name_file("(.+)\\([^\\]+)\\.[^\\]+$");
but when running it (I'm using Visual Studio), I get the regex error
error_brack: the expression contained mismatched [ and ]
Trying to pinpoint the cause, I tried the following regex:
std::regex test("[^\\]");
and I got the same error.
I have tested my regex on regex101.com (with the slight difference that I had to use \. instead of \\.).

The issue is that \\ in a regular (non-raw) string literal denotes a single literal \ character. As Biffen explained in his comment, [^\\] therefore reaches the regex engine as [^\]: the \] is an escaped, literal ] rather than the delimiter that closes the character class, and no later ] closes the class, which is exactly what error_brack reports.
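The right answer is to use _splitpath_s (part of the Microsoft C runtime), which splits a path into drive, directory, file name and extension.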
If you still want to play with regexes, you can fix it like this:
#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main() {
std::regex rex1(R"((.+?)([^\\.]+\.[^\\.]+)$)");
std::smatch m;
std::string str = "c:\\Python27\\REGEX\\test_regex.py";
if (regex_search(str, m, rex1)) {
std::cout << "Path: " << m[1] << std::endl;
std::cout << "File name: " << m[2] << std::endl;
}
return 0;
}
Raw string literals let you avoid most escaping issues. The pattern R"((.+?)([^\\.]+\.[^\\.]+)$)" captures the folder path (including the trailing backslash) into Group 1 and the file name with its extension into Group 2. Note that the extension must be present.

Related

C++ Regex always matching entire string

Whenever I use a regex function it matches the entire string for some reason.
#include <iostream>
#include <regex>
int main() {
std::string text = "This (is a) test";
std::regex pattern("\(.+\)");
std::cout << std::regex_replace(text, pattern, "isnt") << std::endl;
return 0;
}
Output: isnt
Your pattern is, unfortunately, not what it seems to be. Here is the problem.
Imagine that, for some reason, you want to match tabs with your regex. You might try this:
std::regex my_regex("\t");
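This works, but the string your std::regex object receives contains an actual tab character, not the two characters \t. This is because the compiler processes escape sequences in string literals before std::regex ever sees them. To pass a literal \t (a backslash followed by t), you have to write the following: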
std::regex my_regex("\\t");
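The same thing happens in your pattern: \( and \) are not valid C++ escape sequences, so compilers typically warn and drop the backslash. std::regex then receives (.+), a capturing group that matches the whole string, which is why the entire text is replaced. To pass \(.+\) to the regex engine, escape the backslashes: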
std::regex pattern("\\(.+\\)");

regex_replace invalid open parenthesis

#include <iostream>
#include <regex>
int main() {
std::wstring str = LR"(
bst.enable_adb_access="1"
)";
std::wregex re(L"(?<=bst\\.enable_adb_access.*?)\\d+");
str = std::regex_replace(str, re, L"0");
std::wcout << str << std::endl;
}
error:
terminate called after throwing an instance of 'std::regex_error'
what(): Invalid special open parenthesis.
https://regex101.com/r/a33eFL/1
What's wrong with the parenthesis?
Well, this is one illustration of why the plural of "regex" is "regrets"...
None of the grammars supported by std::regex understands lookbehinds. The default modified ECMAScript grammar only accepts lookaheads, (?=...) and (?!...), and the POSIX-based grammars (basic, extended, awk, grep and egrep) have no lookarounds at all.
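Fortunately, you can get the same effect without lookarounds by using a capturing group. I switched the format string rules to sed (std::regex_constants::format_sed) because the default ECMAScript rules allow two-digit references such as $10; under the sed rules, \1 is always a single-digit reference, so \10 means group 1 followed by a literal 0.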
#include <iostream>
#include <regex>
#include <string>
int main() {
std::wstring str = LR"(
bst.enable_adb_access="1"
)";
std::wregex re(L"(bst\\.enable_adb_access.*?)\\d+");
str = std::regex_replace(str, re, L"\\10", std::regex_constants::format_sed);
std::wcout << str << std::endl;
}
You don't need to use a lookbehind for this situation. Simply use a normal capturing group and include it in the replacement string:
#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main() {
std::wstring str = LR"(
bst.enable_adb_access="1"
)";
std::wregex re(L"(bst\\.enable_adb_access.*?)\\d+");
str = std::regex_replace(str, re, L"$010");
std::wcout << str << std::endl;
}
Output:
bst.enable_adb_access="0"
Note that because the reference to the capturing group is followed by a digit, we need to use the two-digit $nn form for the group number (hence $010); otherwise $10 could, depending on the standard library implementation, be interpreted as a reference to capture group 10.

Regexp matching fails with invalid special open parenthesis

I am trying to use regexps in C++11, but my code always throws a std::regex_error with the message "Invalid special open parenthesis.". Here is a minimal example that tries to find the first duplicate character in a string:
std::string regexp_string("(?P<a>[a-z])(?P=a)"); // Nothing to be escaped here, right?
std::regex regexp_to_match(regexp_string);
std::string target("abbab");
std::smatch matched_regexp;
std::regex_match(target, matched_regexp, regexp_to_match);
for(const auto& m: matched_regexp)
{
std::cout << m << std::endl;
}
Why do I get an error and how do I fix this example?
There are 2 issues here:
std::regex grammars do not support named capturing groups or named backreferences; you need to use numbered capturing groups and backreferences instead.
You should use regex_search rather than regex_match, because regex_match requires the entire string to match.
Use
#include <iostream>
#include <regex>
#include <string>
int main() {
    std::string regexp_string(R"(([a-z])\1)");
    std::regex regexp_to_match(regexp_string);
    std::string target("abbab");
    std::smatch matched_regexp;
    if (std::regex_search(target, matched_regexp, regexp_to_match)) {
        std::cout << matched_regexp.str() << std::endl; // => bb
    }
}
The R"(([a-z])\1)" raw string literal defines the ([a-z])\1 regex, which matches any lowercase ASCII letter followed by the same letter again.
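According to http://en.cppreference.com/w/cpp/regex/ecmascript, after an opening parenthesis the ECMAScript grammar (the default for std::regex) only recognizes (?: for a non-capturing group and (?= or (?! for a lookahead; Python-style (?P<name>...) groups are not part of it.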
Your regex throws because std::regex does not support named groups. However, you can still use what is available to find the first duplicate character in a string:
#include <iostream>
#include <regex>
int main()
{
std::string s = "abc def cde";
std::smatch m;
std::regex r("(\\w).*?(?=\\1)");
if (std::regex_search(s, m, r))
std::cout << m[1] << std::endl;
return 0;
}
Prints
c

unchecked exception while running regex - get file name without extension from file path

I have this simple program
string str = "D:\Praxisphase 1 project\test\Brainstorming.docx";
regex ex("[^\\]+(?=\.docx$)");
if (regex_match(str, ex)){
cout << "match found"<< endl;
}
I expect the result to be true, and my regex works when I try it online, but when I run it in C++, the app throws an unhandled exception.
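First of all, use raw string literals when defining a regex to avoid problems with backslashes: \. is not a valid C++ escape sequence (you need "\\." or R"(\.)"), and "[^\\]" reaches the regex engine as [^\], an unclosed character class. The same applies to the input path, where the \t in \test becomes a tab character. Second, regex_match requires a full string match, so use regex_search instead.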
#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main() {
string str = R"(D:\Praxisphase 1 project\test\Brainstorming.docx)";
// OR
// string str = "D:\\Praxisphase 1 project\\test\\Brainstorming.docx";
regex ex(R"([^\\]+(?=\.docx$))");
if (regex_search(str, ex)){
cout << "match found"<< endl;
}
return 0;
}
Note that R"([^\\]+(?=\.docx$))" is the same pattern as "[^\\\\]+(?=\\.docx$)". In the raw literal, every \ is a literal backslash (and the regex needs two backslashes, \\, to match one \ character); in the ordinary literal, you need four backslashes to produce those two, which then match a single \ in the input text.

Need help constructing Regular expression pattern

I'm failing to create a pattern for the STL regex_match function and need some help understanding why the pattern I created doesn't work and what would fix it.
I think the regex should find dl.boxcloud.com, but it does not.
Update: I'm still looking for input. I updated the program to reflect the suggestions. Now there are two matches when I think there should be one.
#include <string>
#include <regex>
using namespace std;
wstring GetBody();
int _tmain(int argc, _TCHAR* argv[])
{
wsmatch m;
wstring regex(L"(dl\\.boxcloud\\.com|api-content\\.dropbox\\.com)");
regex_search(GetBody(), m, wregex(regex));
printf("%d matches.\n", m.size());
return 0;
}
wstring GetBody() {
wstring body(L"ABOUTLinkedIn\r\n\r\nwall of textdl.boxcloud.com/this/file/bitbyte.zip sent you a message.\r\n\r\nDate: 12/04/2012\r\n\r\nSubject: RE: Reference Ask\r\n\r\nOn 12/03/12 2:02 PM, wall of text wrote:\r\n--------------------\r\nRuba,\r\n\r\nI am looking for a n.");
return body;
}
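There is no problem with the pattern itself. You mistake m.size() for the number of matches, when in fact it is the number of sub-matches: the whole match plus one entry for each capturing group. (Separately, since C++14, regex_search cannot be called with a temporary string such as GetBody() together with a match_results object, because m would refer into a destroyed string; store the body in a variable first. Also, m.size() returns a size_t, so print it with %zu rather than %d.)
The std::match_results::size reference is not very helpful in understanding that: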
Returns the number of matches and sub-matches in the match_results object.
All in all, there is 1 match, and size() returns 2: group 0 (the whole match) plus group 1 (the capturing group you put around the two alternatives).
#include <regex>
#include <string>
#include <iostream>
using namespace std;
int main()
{
string data("ABOUTLinkedIn\r\n\r\nwall of textdl.boxcloud.com/this/file/bitbyte.zip sent you a message.\r\n\r\nDate: 12/04/2012\r\n\r\nSubject: RE: Reference Ask\r\n\r\nOn 12/03/12 2:02 PM, wall of text wrote:\r\n--------------------\r\nRuba,\r\n\r\nI am looking for a n.");
std::regex pattern("(dl\\.boxcloud\\.com|api-content\\.dropbox\\.com)");
std::smatch result;
while (regex_search(data, result, pattern)) {
std::cout << "Match: " << result[0] << std::endl;
std::cout << "Captured text 1: " << result[1] << std::endl;
std::cout << "Size: " << result.size() << std::endl;
data = result.suffix().str();
}
}
It outputs:
Match: dl.boxcloud.com
Captured text 1: dl.boxcloud.com
Size: 2
As you can see, the captured text equals the whole match.
To "fix" that, you can use a non-capturing group, or remove the grouping altogether:
std::regex pattern("(?:dl\\.boxcloud\\.com|api-content\\.dropbox\\.com)");
// or
std::regex pattern("dl\\.boxcloud\\.com|api-content\\.dropbox\\.com");
Also, consider using a raw string literal when declaring a regex (to avoid backslash hell):
std::regex pattern(R"(dl\.boxcloud\.com|api-content\.dropbox\.com)");
You need to add another \ before each ., because in an ordinary string literal the backslash itself has to be escaped. Your regex should look like this:
wstring regex(L"(dl\\.boxcloud\\.com|api-content\\.dropbox\\.com)");
Update: As @user3494744 also said, you have to use std::regex_search instead of std::regex_match. I tested it and it works now.
The problem is that you use regex_match instead of regex_search. To quote from the manual:
Note that regex_match will only successfully match a regular expression to an entire character sequence, whereas std::regex_search will successfully match subsequences
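This fix gives a match, but the pattern stays too permissive unless you also replace \. with \\., as shown in the earlier answer. Otherwise \. in an ordinary string literal typically ends up as a plain ., which matches any character, so a string such as "dlXboxcloud.com" would match too.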