regex_match failure in C++11 using VS12 - c++

I need a function that returns true if a string is a partial C++ comment (let's say the only condition is that it starts with /*), and I thought a simple regex would solve my problem quickly. I wrote it from scratch, tested it online at http://regex101.com/, and it worked like a charm. But in C++, using the C++11 regex_match, it fails and prints nothing. Here is the regex in code:
regex partialCommReg("(^[\/][\*][\S\s]*$)");
if (regex_match ("/* ", partialCommReg) )
cout<<"ok";
edit: I'm using VS12 as my compiler.

You need to escape the backslashes within the string literal. A better solution is to use raw string literals to avoid having to escape them.
regex partialCommReg(R"((^[\/][\*][\S\s]*$))");
Also, your regex can be made a little simpler, this works too:
regex partialCommReg(R"((^/\*[\S\s]*$))");
There seems to be a bug in the VS regex implementation, I was able to reproduce the behavior you're seeing on VS2013. First off, you do have to escape the backslashes, and if you turn the warning level up high enough VS will warn you about illegal escape sequences in the string literal you've posted.
Assuming that's done, your code still won't find a match, and it looks like the part VS doesn't like is this: [\\S\\s]*. If you replace that part with .*, the code works. All 3 versions below will print OK.
regex partialCommReg("(^[\\/][\\*].*$)");
regex partialCommReg("(^/\\*.*$)"); // simplified version of the one above
regex partialCommReg(R"((^/\*.*$))"); // uses raw string literals, VS2013 only

regex partialCommReg("(^[\\/][\\*][\\S\\s]*$)");
Notice the escape sequences. Additionally, if you are using g++, <regex> is not supported until GCC 4.9. Before that, the code compiles but throws an exception as soon as you try to do anything with it.
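As a minimal sketch of the points above (not the asker's final code), the check can sidestep both the escaping pitfall and the [\S\s] problem by matching only the leading /*. The function name starts_block_comment is hypothetical; match_continuous is the standard flag that forces a match to begin at the first character.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `text` begins with "/*".
// The raw string literal R"(/\*)" holds the three characters / \ *,
// so the regex engine sees an escaped asterisk without doubled backslashes.
bool starts_block_comment(const std::string& text) {
    static const std::regex opener(R"(/\*)");
    // match_continuous anchors the search at the first character,
    // so nothing after the opener has to be matched at all.
    return std::regex_search(text, opener,
                             std::regex_constants::match_continuous);
}
```

With this, starts_block_comment("/* ") is true and starts_block_comment("x /* ") is false. Because the rest of the string is never matched, multi-line input needs no special handling.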

Related

Recursive regular expression match with boost

I have a problem: the C++ standard regex library does not compile a recursive regex.
Looking it up on the internet, I found out it's a well-known problem and people suggest using the boost library. This is the offending regex:
\\((?>[^()]|(?R))*\\)|\\w+
What I'm trying to do is use this regex to split statements on spaces and brackets (including balanced brackets nested inside brackets), but every piece of code I found showing how to do it with boost doesn't work properly, and I don't know why. Thanks in advance.
You may declare the regex using a raw string literal, using R"(...)" syntax. This way, you won't have to escape backslashes twice.
Compare these two equivalent declarations:
std::string my_pattern("\\w+");
std::string my_pattern(R"(\w+)");
The outer parentheses are not part of the regex pattern; they are part of the raw string literal's delimiters.
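The equivalence can be checked directly. The sketch below uses hypothetical function names; the third form shows the optional custom delimiter (here "re"), which helps when the pattern itself contains )".

```cpp
#include <string>

// All three functions return the same three-character pattern: \w+
std::string escaped_pattern() { return "\\w+"; }        // backslash doubled for the compiler
std::string raw_pattern()     { return R"(\w+)"; }      // ( and ) delimit the literal
std::string tagged_pattern()  { return R"re(\w+)re"; }  // custom delimiter "re"
```

Comparing the three results with == yields true in each case, and each string has length 3.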
However, your regex is not quite correct: you need to recurse only the first alternative and not the whole regex.
Here is the fix:
std::string my_pattern(R"((\((?:[^()]++|(?1))*\))|\w+)");
Here, (\((?:[^()]++|(?1))*\)) matches a (, then zero or more repetitions of either 1+ chars other than ( and ) or a recursion into the whole Group 1 pattern via the (?1) regex subroutine, and then a closing ).

Regex c++ crashing while initialization

I'm currently working on finding registry paths match using regex.
I have initialized the regex as
regex regx("HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\\\{0398BFBC-913B-3275-9463-D2BF91B3C80B\\}")
and the program throws a std::tr1::regex_error exception.
I tried to escape the curly braces using "\\\\" but it still didn't work.
Any idea on how to fix it?
I'm on Windows 10, Visual Studio 2010.
Let's look at a C++ string literal (a slightly shorter one that we can read):
"A\\B\\C"
This, taking account of the literal escaping, is really the string:
A\B\C
Now you're passing this string to the regex engine. But regex has its own escaping, so the engine reads \B and \C as escape sequences, and those are not what you meant (some such sequences do exist, but not for all of your actual characters).
Hence the regex is invalid and trying to instantiate it throws an exception.
You will need an extra layer of escaping:
"A\\\\B\\\\C"
Or use a raw string literal:
R"(A\\B\\C)"
In other words:
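regex regx(R"(HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\\{0398BFBC-913B-3275-9463-D2BF91B3C80B\})");
(Yuck!)
Note that each brace needs a single regex escape (\{ and \}), separate from the \\ that stands for the preceding backslash. Also, raw string literals are not available in Visual Studio 2010, so there the doubled-backslash form is the only option.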
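When the whole path is meant literally, one way to avoid hand-escaping is to add the regex layer of escaping in code. This is only a sketch, assuming a C++11 compiler and the default ECMAScript grammar; escape_for_regex and matches_registry_path are hypothetical helper names, not library functions.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: backslash-escape every ECMAScript regex metacharacter
// so that `literal` is matched character for character.
std::string escape_for_regex(const std::string& literal) {
    static const std::string special = R"x(\^$.|?*+()[]{})x";
    std::string out;
    out.reserve(literal.size() * 2);
    for (char c : literal) {
        if (special.find(c) != std::string::npos) out += '\\';
        out += c;
    }
    return out;
}

// True if `candidate` is exactly the registry path `path`.
bool matches_registry_path(const std::string& candidate,
                           const std::string& path) {
    const std::regex re(escape_for_regex(path));
    return std::regex_match(candidate, re);
}
```

The path is then written with only the C++ layer of escaping (or as a raw string), and the helper supplies the second layer. If the pattern is purely literal, a plain string comparison would of course do the same job without a regex.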

Boost regex does not match

I made a python regular expression and now I'm supposed to code the program in C++.
I was told to use boost's regex by the respective person.
It is supposed to match a group of 1 to 80 lowercase alphanumeric characters (including underscore), followed by a slash, then another group of 1 to 80 lowercase alphanumeric characters (again including underscore), and finally a question mark. The total string must be at least 1 character long and is not allowed to exceed 256.
Here is my python regex:
^((?P<grp1>[a-z0-9_]{1,80})/(?P<grp2>[a-z0-9_]{1,80})([?])){1,256}$
My current boost regex is:
^(([a-z0-9_]{1,80})\/([a-z0-9_]{1,80})([?])){1,256}$
Cut down basically my code would look like this:
boost::cmatch match;
bool isMatch;
boost::regex myRegex = "^(([a-z0-9_]{1,80})\/([a-z0-9_]{1,80})([?])){1,256}$";
isMatch = boost::regex_match(str.c_str(), match, myRegex);
Edit: whoops totally forgot the question xDD. My problem is quite simple: The regex doesn't match though it's supposed to.
Example matches would be:
some/more?
object/value?
devel42/version_number?
The last requirement
The total string must be at least 1 character long and is not allowed to exceed 256.
is always true as your string is already limited from 3 to 162 characters. You have only to keep the first part of your regex:
^[a-z0-9_]{1,80}/[a-z0-9_]{1,80}\?$
My g++ gives me the warning "unknown escape sequence: '\/'"; that means you should use "\\/" instead of "\/". You need a backslash character stored in the string so that the regex parser can consume it as an escape character.
By the way, my boost also requires a constructor invocation, so
boost::regex myRegex("^(([a-z0-9_]{1,80})\\/([a-z0-9_]{1,80})([?])){1,256}$");
seems to work.
You can also use C++11 raw string literal to avoid C++ escaping:
boost::regex myRegex(R"(^(([a-z0-9_]{1,80})\/([a-z0-9_]{1,80})([?])){1,256}$)");
By the way, testing <regex> in libstdc++ svn is welcome. It should come with GCC 4.9 ;)
The actual error was a newline that the client sent to the server along with the entered string, which was later compared against the regex.
Funny how the error's root is rarely where you expect it to be.
Anyways, thank you all for your answers. They gave me the ability to clean up my regular expressions.
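The root cause described above can be guarded against by stripping the line ending before matching. The sketch below uses std::regex instead of boost::regex purely to stay dependency-free (an assumption of this example, not what the asker used), and the helper names are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: drop trailing '\n' and '\r' characters, such as the
// line ending a client sends after the user presses Enter.
std::string strip_line_ending(std::string s) {
    while (!s.empty() && (s.back() == '\n' || s.back() == '\r')) {
        s.pop_back();
    }
    return s;
}

// Validates "name/name?" after removing the line ending.
// regex_match must consume the whole string, so ^ and $ are not needed.
bool is_valid_request(const std::string& raw) {
    static const std::regex re(R"([a-z0-9_]{1,80}/[a-z0-9_]{1,80}\?)");
    return std::regex_match(strip_line_ending(raw), re);
}
```

Without the stripping step, regex_match rejects "some/more?\n" because the trailing newline is part of the string it has to match.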

What's wrong in regex format for filenames in <regex> VS10?

I'm trying to parse file paths with <regex> in Visual Studio 2010.
But program crashes with
Microsoft C++ exception: std::tr1::regex_error at memory location 0x001ef120..
on
regex myRegEx("^([a-zA-Z]\\:)(\\\\[^\\\\/:*?<>\"|]*(?<![ ]))*(\\.[a-zA-Z]\\{2,6\\})$");
Regular expression is ^([a-zA-Z]\:)(\\[^\\/:*?<>"|]*(?<![ ]))*(\.[a-zA-Z]{2,6})$
What's wrong with regex format?
You can perhaps narrow things down by slicing it into chunks. Evaluate the atoms separately, and see where the error turns up:
"^([a-zA-Z]\\:).*$"
"^([a-zA-Z]\\:)(\\\\[^\\\\/:*?<>\"|]*(?<![ ]))*[.]*$"
"^([a-zA-Z]\\:)(\\\\[^\\\\/:*?<>\"|]*(?<![ ]))*(\\.[a-zA-Z]+)$"
"^([a-zA-Z]\\:)(\\\\[^\\\\/:*?<>\"|]*(?<![ ]))*(\\.[a-zA-Z]\\{2,6\\})$"
One possible gotcha is the range, "\{2,6\}". If you really want "two to six letters", then you don't want backslashes in the middle of the range. The real answer depends on your parser.
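Also, if there's confusion about what's being escaped with backslashes, remember that you can often escape a special character by putting it into a character class. For example, \. is certainly equivalent to [.]; whether \\ can be written as [\] depends on the grammar (in the default ECMAScript grammar a backslash still has to be escaped inside a class).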
First of all, I don't know C++ regex syntax well, but it seems to me that \\[ escapes the [ character.
I guess you should write just [ if you want a negated character class [^\\/:*?<>"|]
^([a-zA-Z]\:)([^\\/:*?<>"|]*(?<![ ]))*(\.[a-zA-Z]{2,6})$
tr1::regex doesn't support lookbehind, so it's choking on "(?<![ ])".
Unfortunately, I'm not enough of a regex user to give you guidance on what you might use instead.
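Two things may help here, sketched below under the assumption of a standard C++11 <regex> with the default ECMAScript grammar. First, catching std::regex_error turns the crash into a diagnosable failure. Second, the lookbehind can be replaced by requiring the last character of each path component to come from a class that also excludes the space. Both function names are hypothetical, and the rewritten pattern is one possible substitute, not the asker's confirmed fix.

```cpp
#include <regex>
#include <string>

// Returns true if `pattern` compiles. A std::regex_error is caught here
// instead of being allowed to terminate the program.
bool compiles(const std::string& pattern) {
    try {
        std::regex re(pattern);
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}

// Lookbehind-free variant of the path pattern: each component after a
// backslash is either empty or ends in an allowed character other than ' '.
bool is_windows_file_path(const std::string& path) {
    static const std::regex re(
        R"re(^([a-zA-Z]:)(\\(?:[^\\/:*?<>"|]*[^\\/:*?<>"| ])?)*(\.[a-zA-Z]{2,6})$)re");
    return std::regex_match(path, re);
}
```

The custom raw string delimiter "re" is used because the pattern contains a double quote. Note that {2,6} is written without backslashes, as suggested above.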

Need assistance with Regular Expressions in Qt (QRegExp) [bad repetition syntax?]

void MainWindow::whatever(){
QRegExp rx ("<span(.*?)>");
//QString line = ui->txtNet1->toHtml();
QString line = "<span>Bar</span><span style='baz'>foo</span>";
while(line.contains(rx)){
qDebug()<<"Found rx!";
line.remove (rx);
}
}
I've tested the regular expression online using this tool. With the given regex string and a sample text of <span style="foo">Bar</span>, the tool says that the regular expression should be found in the string. In my Qt code, however, I never get into my while loop.
I've really never used regex before, in Qt or any other language. Can someone provide some help? Thanks!
[edit]
So I just found that QRegExp has a function errorString() to use if the regex is invalid. I output this and see: "bad repetition syntax". Not really sure what this means. Of course, googling for "bad repetition syntax" brings up... this post. Damn google, you fast.
The problem is that QRegExp only supports greedy quantifiers. More precisely, it supports either greedy or reluctant quantifiers, but not both. Thus, <span(.*?)> is invalid, since there is no *? operator. Instead, you can use
QRegExp rx("<span(.*)>");
rx.setMinimal(true);
This will give every *, +, and ? in the QRegExp the behavior of *?, +?, and ??, respectively, rather than their default behavior. The difference, as you may or may not be aware, is that the minimal versions match as few characters as possible, rather than as many.
In this case, you can also write
QRegExp rx("<span([^>]*)>");
This is probably what I would do, since it has the same effect: match until you see a >. Yours is more general, yes (if you have a multi-character ending token), but I think this is slightly nicer in the simple case. Either will work, of course.
Also, be very, very careful about parsing HTML with regular expressions. You can't actually do it, and recognizing tags is—while (I believe) possible—much harder than just this. (Comments, CDATA blocks, and processing instructions throw a wrench in the works.) If you know the sort of data you're looking at, this can be an acceptable solution; even so, I'd look into an HTML parser instead.
What are you trying to achieve? If you want to remove the opening tag and its elements, then the pattern
<span[^>]*>
is probably the simplest.
The syntax .*? requests a non-greedy match. It is widely supported, but it may be confusing the Qt regex engine.
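The negated-class pattern does not depend on any non-greedy syntax, so it behaves the same in other engines too. As an illustration only, here it is with std::regex rather than QRegExp (an assumption made so the sketch has no Qt dependency); remove_span_open_tags is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, written with std::regex rather than QRegExp:
// remove every opening <span ...> tag. [^>]* cannot run past the first '>',
// so no non-greedy quantifier is needed.
std::string remove_span_open_tags(const std::string& html) {
    static const std::regex open_tag(R"(<span[^>]*>)");
    return std::regex_replace(html, open_tag, "");
}
```

Applied to the sample line from the question, this leaves the text and the closing tags in place. The earlier caveat about parsing HTML with regular expressions still applies.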