C++ regex crashing during initialization - c++

I'm currently working on matching registry paths using regex.
I have initialized the regex as
regex regx("HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\\\{0398BFBC-913B-3275-9463-D2BF91B3C80B\\}")
and the program throws a std::tr1::regex_error exception.
I tried to escape the curly braces using "\\\\" but it still didn't work.
Any idea on how to fix it?
I'm on Windows 10, Visual Studio 2010.

Let's look at a C++ string literal (a slightly shorter one that we can read):
"A\\B\\C"
This, taking account of the literal escaping, is really the string:
A\B\C
Now you're passing this string to the regex engine, which has its own layer of escaping. There are no escape sequences \B or \C that mean what you want here (some letters do form valid escapes, but not the ones in your actual path).
Hence the regex is invalid and trying to instantiate it throws an exception.
You will need an extra layer of escaping:
"A\\\\B\\\\C"
Or, if your compiler supports C++11 raw string literals (Visual Studio 2010 does not; VS2013 and later do), use one:
R"(A\\B\\C)"
In other words:
regex regx(R"(HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\\{0398BFBC-913B-3275-9463-D2BF91B3C80B\})");
(Yuck!)
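As a rough sketch of both spellings, assuming the goal is to match the key path literally: the function and constant names below are made up for illustration, and the second function needs a compiler with raw string literal support.

```cpp
#include <regex>
#include <string>

// The key path as it exists at run time. In an ordinary C++ literal,
// each "\\" below is ONE backslash in the resulting string.
const std::string kKeyPath =
    "HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion"
    "\\Uninstall\\{0398BFBC-913B-3275-9463-D2BF91B3C80B}";

// Ordinary literal: "\\\\" in source -> "\\" in the string -> one literal
// backslash for the regex engine. The braces need a regex-level escape too,
// which costs another "\\" in source.
bool matches_escaped_literal(const std::string& s) {
    static const std::regex re(
        "HKEY_LOCAL_MACHINE\\\\SOFTWARE\\\\Microsoft\\\\Windows\\\\CurrentVersion"
        "\\\\Uninstall\\\\\\{0398BFBC-913B-3275-9463-D2BF91B3C80B\\}");
    return std::regex_match(s, re);
}

// Raw literal: only the regex-level escaping remains.
bool matches_raw_literal(const std::string& s) {
    static const std::regex re(
        R"(HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion)"
        R"(\\Uninstall\\\{0398BFBC-913B-3275-9463-D2BF91B3C80B\})");
    return std::regex_match(s, re);
}
```

Both functions should accept `kKeyPath`; the two literals are just different source spellings of the same regex.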

Related

Unexpected end of regex when ascii character

Minimal Verifiable Example
#include<regex>
int main(){
std::regex re("\\u_nic400_ib_ext_m_ib_ar_fifo_wr_mux/mux_0_1_out [0]");
}
Why is this giving me a regex_error? My debugger's error message is "unexpected end of regex when ascii character", but I'm just trying to match the literal above and I don't see where the issue is.
\u is the beginning of the escape sequence for a Unicode code point, so you need to escape the backslash. Also, [...] is a character set match; the brackets need to be escaped if you want to match them literally.
std::regex re("\\\\u_nic400_ib_ext_m_ib_ar_fifo_wr_mux/mux_0_1_out \\[0\\]");
If you're using C++11 or newer, it's helpful to use raw strings when writing regular expressions, so you don't have to double the backslashes.
std::regex re(R"(\\u_nic400_ib_ext_m_ib_ar_fifo_wr_mux/mux_0_1_out \[0\])");
The doubling is only relevant if you're writing the regexp as a string literal. If you're constructing it dynamically at run time, you don't need to double the escapes: the string is fed directly to the regexp engine and is never parsed as C++ source code. The regex-level escapes themselves are still required.
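For the run-time case, a minimal sketch of building a regex that matches arbitrary text literally. `escape_for_regex` and `literal_regex` are hypothetical helpers, not part of `<regex>`, and the metacharacter list assumes the default ECMAScript grammar.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: prefixes every ECMAScript metacharacter in `text`
// with a backslash so that it matches literally.
std::string escape_for_regex(const std::string& text) {
    static const std::regex metachars(R"([.^$|()\[\]{}*+?\\])");
    // In the replacement format, "$&" stands for the matched character;
    // the backslash in front of it is copied through unchanged.
    return std::regex_replace(text, metachars, R"(\$&)");
}

// Builds a regex at run time that matches `text` exactly.
std::regex literal_regex(const std::string& text) {
    return std::regex(escape_for_regex(text));
}
```

Only one level of escaping is applied here, because the string never passes through the C++ compiler as a literal.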

Recursive regular expression match with boost

I have a problem with the C++ standard regex library: it fails to compile a recursive regex.
Looking around on the internet, I found out it's a well-known limitation and that people suggest using the Boost library instead. This is the offending regex:
\\((?>[^()]|(?R))*\\)|\\w+
What I'm trying to do is use this regex to split statements on spaces and brackets (including balanced brackets nested inside brackets), but none of the code I've found showing how to do this with Boost works properly, and I don't know why. Thanks in advance.
You may declare the regex using a raw string literal, using R"(...)" syntax. This way, you won't have to escape backslashes twice.
For example, these two declarations are equivalent:
std::string my_pattern("\\w+");
std::string my_pattern(R"(\w+)");
The outer parentheses are not part of the regex pattern; they are part of the raw string literal's delimiters.
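A quick way to convince yourself of the equivalence is to compare the two strings directly. This is only a sketch; the variable and function names are illustrative.

```cpp
#include <string>

// The same three-character pattern (backslash, 'w', '+') spelled two ways.
const std::string escaped_pattern = "\\w+";
const std::string raw_pattern = R"(\w+)";

// True when both spellings produce the identical three-character string.
bool spellings_agree() {
    return escaped_pattern == raw_pattern && raw_pattern.size() == 3;
}
```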
However, your regex is not quite correct: you need to recurse only the first alternative and not the whole regex.
Here is the fix:
std::string my_pattern(R"((\((?:[^()]++|(?1))*\))|\w+)");
Here, (\((?:[^()]++|(?1))*\)) matches a (, then zero or more repetitions of either one or more chars other than ( and ) or a recursion into the whole Group 1 pattern via the (?1) regex subroutine call, and finally a closing ).
See the regex demo.

regex_match failure in C++11 using VS12

I need a function that returns true if a string is a partial comment in C++ (let's say the only condition is that it starts with /*), and I thought a simple regex would solve my problem quickly. I wrote it from scratch, tested it online at http://regex101.com/ and it worked like a charm. But in C++, using the C++11 regex_match, it fails to print anything. Here is the regex in code:
regex partialCommReg("(^[\/][\*][\S\s]*$)");
if (regex_match ("/* ", partialCommReg) )
cout<<"ok";
edit: I'm using VS12 as my compiler.
You need to escape the backslashes within the string literal. A better solution is to use raw string literals to avoid having to escape them.
regex partialCommReg(R"((^[\/][\*][\S\s]*$))");
// The R prefix and the outermost ( ) are raw string literal syntax, not part of the regex.
Live example
Also, your regex can be made a little simpler, this works too:
regex partialCommReg(R"((^/\*[\S\s]*$))");
There seems to be a bug in the VS regex implementation, I was able to reproduce the behavior you're seeing on VS2013. First off, you do have to escape the backslashes, and if you turn the warning level up high enough VS will warn you about illegal escape sequences in the string literal you've posted.
Assuming that's done, your code still won't find a match, and it looks like the part VS doesn't like is this: [\\S\\s]*. If you replace that part with .*, the code works. All 3 versions below will print OK.
regex partialCommReg("(^[\\/][\\*].*$)");
regex partialCommReg("(^/\\*.*$)"); // simplified version of the one above
regex partialCommReg(R"((^/\*.*$))"); // uses raw string literals, needs VS2013 or later
regex partialCommReg("(^[\\/][\\*][\\S\\s]*$)");
Notice the escape sequences. Additionally, if you are using g++, <regex> is not properly supported until GCC 4.9. With earlier versions the code compiles, but it throws an exception as soon as you try to do anything with it.
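The simplified pattern from the answers above can be wrapped up roughly as follows. This is a sketch that assumes a compiler with raw string literal support and a working `<regex>`; `is_partial_comment` and the constant names are illustrative only.

```cpp
#include <regex>
#include <string>

// Escaped and raw spellings of the simplified pattern from the answers.
const std::string kEscapedPattern = "(^/\\*.*$)";
const std::string kRawPattern = R"((^/\*.*$))";

// Returns true if `s` starts with "/*". Note that '.' does not match line
// terminators in the default ECMAScript grammar, so this sketch only
// handles single-line input.
bool is_partial_comment(const std::string& s) {
    static const std::regex re(kRawPattern);
    return std::regex_match(s, re);
}
```

For multi-line input, a plain prefix comparison on the first two characters would avoid the regex engine altogether.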

Porting from C# to Delphi, Regex incompatibility

I am new to HTTP and Regex. I have a piece of code which I have ported to Delphi which works partially. The exception 'lookbehind not of fixed length' is raised on a particular statement:
'(?<=image\\?c=)[^\"]+'
The statement is there to extract an image link from an HTML form. After some research here and on the web, I have come to understand that the '+' at the end causes this in some implementations of Regex. What I couldn't find was how to change it so that it works in Delphi's implementation. As the code works in C#, can somebody help and explain?
The lookbehind section doesn't have fixed length. That has nothing to do with the + at the end. The lookbehind portion is (?<=image\\?c=). You copied that from C#. In C#, the regex wants to look for a literal question mark. That's a special character in regex, so it needs a backslash in front of it. Backslash is special in C# strings, though, so that backslash needs another backslash, all just to represent a single question mark.
In Delphi strings, backslashes aren't special, so the two of them are treated as a literal backslash to search for in the regex. The question mark isn't escaped, so the Delphi regex treats it as an instruction to make the literal backslash optional. The optional character makes the lookbehind have variable length.
To solve this, simply remove one backslash, so that the lookbehind becomes (?<=image\?c=).
You can also remove the one before the quotation mark, but it should have no effect since quotation marks aren't special in regex.
Even if you use an HTML parser to identify HTML element that contains this URL fragment, you may still need the right regex to recognize which HTML element is your target.

What's wrong in regex format for filenames in <regex> VS10?

I'm trying to parse file paths with <regex> in Visual Studio 2010.
But program crashes with
Microsoft C++ exception: std::tr1::regex_error at memory location 0x001ef120..
on
regex myRegEx("^([a-zA-Z]\\:)(\\\\[^\\\\/:*?<>\"|]*(?<![ ]))*(\\.[a-zA-Z]\\{2,6\\})$");
Regular expression is ^([a-zA-Z]\:)(\\[^\\/:*?<>"|]*(?<![ ]))*(\.[a-zA-Z]{2,6})$
What's wrong with regex format?
You can perhaps narrow things down by slicing it into chunks. Evaluate the atoms separately, and see where the error turns up:
"^([a-zA-Z]\\:).*$"
"^([a-zA-Z]\\:)(\\\\[^\\\\/:*?<>\"|]*(?<![ ]))*[.]*$"
"^([a-zA-Z]\\:)(\\\\[^\\\\/:*?<>\"|]*(?<![ ]))*(\\.[a-zA-Z]+)$"
"^([a-zA-Z]\\:)(\\\\[^\\\\/:*?<>\"|]*(?<![ ]))*(\\.[a-zA-Z]\\{2,6\\})$"
One possible gotcha is the range, "\{2,6\}". If you really want "two to six letters", then you don't want backslashes in the middle of the range. The real answer depends on your parser.
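Also, if there's confusion as to what's being escaped with backslashes, remember that you can often escape special characters by putting them into a character class. For example, \. is certainly equivalent to [.]. Whether \\ is equivalent to [\] depends on the grammar: it works where a backslash is literal inside brackets (POSIX-style bracket expressions), but in the default ECMAScript grammar the backslash still escapes inside a class, so you would need [\\] there.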
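The chunk-by-chunk approach from this answer can be automated with a small sketch like the one below. `first_rejected` and `kChunks` are hypothetical names; the patterns are the four chunks listed above, written as ordinary C++ literals.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Returns the index of the first pattern that std::regex refuses to
// compile, or -1 if every pattern compiles.
int first_rejected(const std::vector<std::string>& patterns) {
    for (std::size_t i = 0; i < patterns.size(); ++i) {
        try {
            std::regex re(patterns[i]);
            (void)re;  // only construction matters here
        } catch (const std::regex_error&) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// The four chunks from the answer, shortest first.
const std::vector<std::string> kChunks = {
    "^([a-zA-Z]\\:).*$",
    "^([a-zA-Z]\\:)(\\\\[^\\\\/:*?<>\"|]*(?<![ ]))*[.]*$",
    "^([a-zA-Z]\\:)(\\\\[^\\\\/:*?<>\"|]*(?<![ ]))*(\\.[a-zA-Z]+)$",
    "^([a-zA-Z]\\:)(\\\\[^\\\\/:*?<>\"|]*(?<![ ]))*(\\.[a-zA-Z]\\{2,6\\})$",
};
```

Calling `first_rejected(kChunks)` points at the first chunk that introduces a construct the engine cannot parse, which narrows the search to whatever that chunk added over the previous one.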
First of all, I don't know C++ regex syntax well, but it seems to me that \\[ means "escape the [ character".
I guess you should write just [ if you want a negated character class [^\\/:*?<>"|]:
^([a-zA-Z]\:)([^\\/:*?<>"|]*(?<![ ]))*(\.[a-zA-Z]{2,6})$
tr1::regex doesn't support lookbehind, so it's choking on "(?<![ ])".
Unfortunately, I'm not enough of a regex user to give you guidance on what you might use instead.
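One possible lookbehind-free rewrite, offered as a sketch rather than a drop-in replacement: it assumes the intent of (?<![ ]) is to reject path components that end in a space, and expresses that by requiring each non-empty component to end in an allowed character other than a space. The helper names are hypothetical, and the pattern uses a raw string literal, so it needs a newer compiler than VS2010.

```cpp
#include <regex>
#include <string>

// Returns true if `pattern` compiles under the default ECMAScript grammar.
bool compiles(const std::string& pattern) {
    try {
        std::regex re(pattern);
        (void)re;
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}

// Hypothetical rewrite without lookbehind: after each backslash, a path
// component is either empty or ends in an allowed non-space character.
bool looks_like_file_path(const std::string& s) {
    static const std::regex re(
        R"re(^[a-zA-Z]:(?:\\(?:[^\\/:*?<>"|]*[^\\/:*?<>"| ])?)*\.[a-zA-Z]{2,6}$)re");
    return std::regex_match(s, re);
}
```

`compiles("(?<![ ])")` gives a quick check of whether a given standard library accepts lookbehind at all, while `looks_like_file_path` avoids the construct entirely.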