How to ignore backslash character in boost::regex_search() function?

I'm working in C++ and reading a regular expression from an XML file, which I then have to search for in a long string.
For example, my regular expression is: ".+myFunction"
To put this regular expression in the XML file, I need to add a backslash character '\' before the '.' in the expression above,
i.e. "\.+myFunction"
I'm using boost::regex_search() to search for this regular expression, but because of the additional backslash the function returns false.
So how do I ignore the backslash character when using boost::regex_search()?
Sample code is as follows:
string longString = "hdh::dfjdj::dfuhgj::myFunction.devide.and";
string regularExp = "\\.+myFunction";  // the regex \.+myFunction, as read from the XML file
const boost::regex searchPattern(regularExp);
if (boost::regex_search(longString, searchPattern))
{
    cout << "Regular expression is found" << std::endl;
}

It's not really clear what you're asking.
In the XML, ".+myFunction" is perfectly legal as is.
If you're trying to match that exact sequence, you'll need to escape
the backslash twice: once because it has a special meaning in a
string literal, and a second time because it has a special meaning for
regular expressions. You'll also need to escape the quotes, if
they're part of what you're looking for: "\"\\\\.+myFunction\"".
(The '.' and '+' in that literal are still regex metacharacters; a
strictly literal match would need those escaped as well.)
But if you're trying to match an exact sequence, you don't need
regular expressions: std::search is largely sufficient.
If you're trying to define a regular expression which matches a
sequence of one or more characters other than a newline, followed by
the sequence "myFunction", the string literal to initialize the
regular expression would be ".+myFunction".
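
To see the difference between the two patterns, here is a minimal sketch. It uses std::regex instead of boost::regex so that it builds without Boost; that swap and the helper name found_in are assumptions made for illustration, and boost::regex_search would be expected to give the same answers for these two patterns.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of Boost or the question's code):
// returns true if `pattern` matches anywhere inside `text`.
// std::regex stands in for boost::regex here.
bool found_in(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern);
    return std::regex_search(text, re);
}

// ".+myFunction"    : one or more of any character, then "myFunction".
// "\\.+myFunction"  : the regex \.+myFunction, i.e. one or more literal
//                     dots, then "myFunction".
```

With the string from the question, found_in(longString, ".+myFunction") is true, while found_in(longString, "\\.+myFunction") is false, because "myFunction" is preceded by "::" and not by a dot. So the backslash does not need to be ignored by the search; it should not be in the pattern in the first place.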

Related

JLEX lexical generator error: unterminated string at the end of the line

I am generating a lexical analyzer with JLex. I found a regular expression for string literals at this link and used it in my .jflex file in the same way as the other expressions, but it gives me this error:
unterminated string at the end of the line
StringLiteral = \"(\\.|[^"\\])*\"
Can anyone help me, please? Thanks.

The regular expression you copied is for (f)lex, which uses a slightly different syntax for regular expressions. In particular, (f)lex treats " as an ordinary character inside bracketed classes, so [^"\\] is a character class matching anything other than (^) a quotation mark (") or a backslash (\\).
However, in JFlex, the " is a quoting character, whether outside or inside brackets. So the " in the character class is unterminated. Hence the error message.
So you need to backslash-escape it:
StringLiteral = \"(\\.|[^\"\\])*\"
See the JFlex manual chapter on regular expressions for details.

escape apostrophe in a regex string that starts and ends with apostrophe in dart

I'm trying to create a regular expression that matches an email address, and I intend to use it in a Dart application.
I found the following regex for that:
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")#(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
I'm really new to Dart, but I understand that I can create regular expression strings with r'' or r"".
In Dart I can escape characters with \, so if I want to escape an apostrophe in a string that starts and ends with an apostrophe, I can just do this:
final String a = 'foo\'bar';
But with final String a = r'foo\'bar' I get an error. How can I properly escape that?
Thank you.

No, r'' does not mean "regular expression". It means "raw", so backslash is interpreted as a literal backslash, and not as an escape character.
Not having to escape each backslash is useful for the kind of strings which often contain a lot of backslashes, such as regular expression patterns.
Regular expressions are created as instances of the RegExp class.
You can concatenate raw strings that use different delimiters to create a single string for the whole pattern. In your case, this should work:
String pattern = r"(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|" + r'"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")#(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])';
RegExp exp = new RegExp(pattern);

kotlin String::replace removing escape sequences?

I'm trying some string manipulation using regexes, but I'm not getting the expected output:
var myString = "/api/<user_id:int>/"
myString.replace(Regex("<user_id:int>"), "(\\d+)")
This should give me something like /api/(\d+)/, but instead I get /api/(d+)/.
However, if I create an escaped string directly, like var a = "\\d+",
I get the correct output \d+ (which I can then use to create a regex Pattern).
Is this due to the way String::replace works?
If so, isn't this a bug? Why is it removing my escape sequences?

To make the replace a literal string, use:
myString.replace(Regex("<user_id:int>"), Regex.escapeReplacement("(\\d+)"))
For details, this is what kotlin Regex.replace is doing:
Pattern nativePattern = Pattern.compile("<user_id:int>");
String m = nativePattern.matcher("/api/<user_id:int>/").replaceAll("(\\d+)");
-> m = /api/(d+)/
From Matcher.replaceAll() javadoc:
Note that backslashes (\) and dollar signs ($) in the replacement
string may cause the results to be different than if it were being
treated as a literal replacement string. Dollar signs may be treated
as references to captured subsequences as described above, and
backslashes are used to escape literal characters in the replacement
string.
The call to Regex.escapeReplacement above does exactly that, turning (\\d+) to (\\\\d+)

You are using a .replace overload that takes a regex as the first argument, so the second argument is parsed as a regex replacement pattern. Inside a regex replacement pattern, the \ char is special: it can escape a dollar symbol so that it is treated as a literal dollar sign. A literal backslash inside a regex replacement pattern therefore has to be doubled.
You might use
myString.replace(Regex("<user_id:int>"), """(\\d+)""")
Whenever you have to search and replace with a regex and your replacement pattern is a dynamic value, you should use Regex.escapeReplacement (see GUIDO's answer).
However, you are replacing a literal value with another literal value, you do not have to use a regex here:
myString.replace("<user_id:int>", """(\d+)""")
See this Kotlin demo yielding /api/(\d+)/.
Note the use of raw string literals where a backslash is parsed as a literal backslash.

The replacement, as the regex engine sees it, is interpolated like a double-quoted string.
This is true of every regex engine.
This is to distinguish control codes such as tab, newline, or carriage return.
Nothing special here.
So the replacement as the engine wants to see it is (\\d+).
The language interpolates the same.
Final result repl_str = "(\\\\d+)"

Lex/Flex :Regular expression for string literals in C/C++?

I was looking at this ANSI C grammar.
The page includes a lot of Lex/Flex regular expressions for ANSI C.
I'm having a problem understanding the regular expression for string literals.
They give the regular expression as \"(\\.|[^\\"])*\"
As far as I understand, \" is used for double quotes, \\ is for the escape character, . is for any character except the escape character, and * is for zero or more times.
[^\\"] implies characters except \ , " .
So, in my opinion, regular expression should be \"(\\.)*\".
Can you give some strings where above regular expression will fail?
or
Why have they used [^\\"]?

The regex \"(\\.)*\" that you proposed matches strings that consist of \ symbols alternating with any characters like:
"\z\x\p\r"
This regular expression would therefore fail to match a string like:
"hello"
The string "hello" would be matched by the regex \".*\" but that would also match the string """" or "\" both of which are invalid.
To get rid of these invalid matches we can use \"[^\\"]*\", but this will now fail to match a string like "\a\a\a" which is a valid string.
As we saw \"(\\.)*\" does match this string, so all we need to do is combine these two to get \"(\\.|[^\\"])*\".
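
The three steps above can be checked with a small sketch. It uses std::regex in its default ECMAScript syntax rather than flex, so the quotes are written without a backslash; the helper matches_whole and the variable names are assumptions made for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` matches the whole of `text`.
bool matches_whole(const std::string& text, const std::string& pattern)
{
    return std::regex_match(text, std::regex(pattern));
}

// Raw string literals keep the regex backslashes readable.
// Each comment shows the flex spelling discussed above.
const std::string only_escapes = R"re("(\\.)*")re";         // \"(\\.)*\"
const std::string no_escapes   = R"re("[^\\"]*")re";        // \"[^\\"]*\"
const std::string combined     = R"re("(\\.|[^\\"])*")re";  // \"(\\.|[^\\"])*\"
```

Here only_escapes accepts "\a\a\a" but rejects "hello", no_escapes does the opposite, and combined accepts both while still rejecting "\" and """".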

What does it mean "you can’t hide the terminating delimiter of a pattern inside a regex construct" in the "Programming Perl"?

Sorry, but once again I need help understanding a rather complicated snippet from the "Programming Perl" book. Here it is (the part that is obscure to me is marked in bold):
patterns are parsed like double-quoted strings, all the normal double-quote conventions will work, including variable interpolation (unless you use single quotes
as the delimiter) and special characters indicated with backslash escapes. These are applied before the string is interpreted as a regular expression (This is one of the
few places in the Perl language where a string undergoes more than one pass of
processing). ...
Another consequence of this two-pass parsing is that the ordinary Perl tokener
finds the end of the regular expression first, just as if it were looking for the
terminating delimiter of an ordinary string. Only after it has found the end of the
string (and done any variable interpolation) is the pattern treated as a regular
expression. Among other things, this means you can’t “hide” the terminating
delimiter of a pattern inside a regex construct (such as a bracketed character class
or a regex comment, which we haven’t covered yet). Perl will see the delimiter
wherever it is and terminate the pattern at that point.
First, why does it say "Only after it has found the end of the string" rather than the end of the regular expression, which is what it was looking for, as stated before?
Second, what does "you can’t “hide” the terminating delimiter of a pattern inside a regex construct" mean? Why can't I hide the terminating delimiter /, when I can place it wherever I want, either in the regexp directly (/A\/C/) or in an interpolated variable (even without \):
my $s = 'A/';
my $p = 'A/C';
say $p =~ /$s/;
outputs 1.
While writing and re-reading my question, I thought that this snippet might be talking about using a single quote as the regexp delimiter; then it all seems quite coherent. Is my assumption correct?
My appreciation.

It says "end of the string" instead of "end of the regular expression" because at that point it's treating the regex as if it were just a string.
It's trying to say that this does not work:
/foo[-/_]/
Even though normal regex metacharacters are not special inside [], Perl will see the regex as /foo[-/ and complain about an unterminated class.
It's trying to say that Perl does not parse the regex as it reads it. First it finds the end of the regex in your source code as if it were a quoted string, so the only special character is \. Then it interpolates any variables. Then it parses the result as a regular expression.
You can hide the terminating delimiter with \ because that works in ordinary strings. You can hide the delimiter inside an interpolated variable, because interpolation happens after the delimiter is found. If you use a bracketing delimiter (e.g. { } or [ ]), you can nest matching pairs of delimiters inside the regex, because q{} works like that too.
But you can't hide it inside any other regex construct.
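
The first pass described above can be modelled with a toy sketch. This is not Perl's real tokenizer, and the function name pattern_text is an assumption made for illustration; it only shows that a scan which knows about backslashes, but not about brackets or other regex syntax, stops at the first unescaped delimiter. The escaped pair is copied as written; what Perl later does with that backslash is not modelled.

```cpp
#include <cstddef>
#include <string>

// Toy model, not Perl's actual implementation: `src` is the source text
// that follows the opening delimiter. Returns the text up to the first
// unescaped closing delimiter (or all of `src` if there is none).
// Only backslash is special in this pass.
std::string pattern_text(const std::string& src, char delim)
{
    std::string out;
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src[i] == '\\' && i + 1 < src.size()) {
            // A backslash protects the next character, delimiter included.
            out += src[i];
            out += src[++i];
        } else if (src[i] == delim) {
            break;  // the first unescaped delimiter ends the pattern
        } else {
            out += src[i];
        }
    }
    return out;
}
```

For the source text foo[-/_]/ this returns foo[- (the bracket does not protect the slash), for A\/C/ it returns A\/C, and for $s/ it returns $s, which is only interpolated afterwards.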

Say you want to match a *. You would use
m/\*/
But what if you used * as your delimiter? The following doesn't work:
m*\**
because it's interpreted as
m/*/
as seen in the following:
$ perl -e'm*\**'
Quantifier follows nothing in regex; marked by <-- HERE in m/* <-- HERE / at -e line 1.
Take the string literal
"a\"b"
It produces the string
a"b
Similarly, the match operator
m*a\*b*
produces the regex pattern
a*b
If you want to match a literal *, you have to use other means. In other words:
m*a\*b* === m/a*b/ matches pattern a*b
m*a\x{2A}b* === m/a\*b/ matches pattern a\*b
