Why does this Swift RegEx match "e1234"? - regex

Here's the regular expression:
let legalStr = "(?:[eE][\\+\\-]?[0-9]{1,3})?$"
Here's the invocation:
if let match = sender.stringValue.rangeOfString(legalStr, options: .RegularExpressionSearch) {
    print("\(sender.stringValue) is legal")
}
else {
    print("\(sender.stringValue) is not legal")
}
If I type garbage, like "abcd", it reports the string as not legal.
If I type something like "e123", it reports the string as legal.
(Note that the empty string is also legal.)
However, if I type "e1234" it still returns "legal". I'd expect it to return "not legal". Am I missing something here? BTW, note the "$" at the end of the regular expression. The three digits should appear at the end of the string.
If it's not immediately clear, the source of the string is a text edit box.

Your pattern is anchored only at the end, and the whole group is optional, so the pattern can match the empty string. Any string at all therefore matches: the engine just matches the empty string at the very end of the input.
Add a ^ to the front to anchor it on that side, too.
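A minimal sketch of the difference, written in Python purely for illustration (the question uses Swift; the names END_ONLY, BOTH_ENDS and is_legal are made up for this example). Python's re.search is an unanchored search, so it shows the same effect:

```python
import re

# The pattern from the question: the whole group is optional and only the end is anchored.
END_ONLY = re.compile(r"(?:[eE][+-]?[0-9]{1,3})?$")
# The same pattern with ^ added, so a match has to span the entire input.
BOTH_ENDS = re.compile(r"^(?:[eE][+-]?[0-9]{1,3})?$")

def is_legal(pattern, text):
    """Return True if the pattern matches anywhere in text."""
    return pattern.search(text) is not None

for text in ["", "e123", "e1234", "abcd"]:
    print(repr(text), is_legal(END_ONLY, text), is_legal(BOTH_ENDS, text))
```

With the end-only pattern every input is accepted, because the empty match at the end always succeeds. With both anchors only "" and "e123" are accepted.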

Related

MarkLogic matches function with regular expression

I am new to XQuery. Could you please help me understand what § and §.*$ mean in the MarkLogic XQuery below?
if (matches($cite, '§'))
replace($cite,'§.*$','')
here $cite := "HI CONST Preamble"
In a regular expression, $ is an anchor that matches at the end of the input string.
§ is the section sign (in XQuery source it may be written as the numeric character reference &#167;), . is a wildcard that matches any character, and * is a quantifier meaning zero or more.
The matches() expression is testing whether the $cite contains the § character. If it does, then it attempts to replace() the § and all of the characters following it until the end of the input with nothing.
For example:
let $cite := "HI CONST Preamble"
return
if (matches($cite, '§'))
then replace($cite,'§.*$','')
else "no match"
returns: "no match" because it doesn't contain the § at all.
However, this:
let $cite := "HI CONST Preamble §foo bar baz."
return
if (matches($cite, '§'))
then replace($cite,'§.*$','')
else "no match"
Returns: "HI CONST Preamble " (the space before the § is kept) because the string does contain §, so "§foo bar baz." is replaced with "".
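The same logic can be sketched in Python, assuming its re module as a stand-in for the XQuery matches() and replace() functions (the function name strip_from_section_sign is hypothetical):

```python
import re

SECTION_SIGN = "\u00a7"  # the § character, code point 167

def strip_from_section_sign(cite):
    """If cite contains §, drop the § and everything after it; otherwise report no match."""
    if re.search(SECTION_SIGN, cite):
        # § followed by any characters up to the end of the input is replaced with nothing.
        return re.sub(SECTION_SIGN + ".*$", "", cite)
    return "no match"

print(repr(strip_from_section_sign("HI CONST Preamble")))
print(repr(strip_from_section_sign("HI CONST Preamble \u00a7foo bar baz.")))
```

Note that only the § and what follows it are removed, so whitespace before the § stays in the result.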

How can I use regular expressions to find all lines of source code that define default arguments for a function?

I want to find lines of code which declare functions with default arguments, such as:
int sum(int a, int b=10, int c=20);
I was thinking I would look for:
The first part of the matched pattern is exactly one left-parenthesis "("
The second part of string is one or more of any character excluding "="
exactly one equals-sign "="
a non-equal-sign
one or more characters except right parenthesis ")"
")"
The following is my attempt:
([^=]+=[^=][^)]+)
I would like to avoid matching condition-clauses for if-statements and while-loops.
For example,
int x = 5;
if (x = 10) {
x = 7;
}
The regex should find functions with default arguments in any of Python, Java, or C++. Let us not assume that function declarations end with a semicolon or begin with a data type.
Try this:
\([^)]*\w+\s+\w+\s*=[^),][^)]*\)
See live demo.
It looks for word chars (the param type), space(s), word chars (the param name), optional space(s), then an equals sign that is not immediately followed by a comma or a closing parenthesis.
Add ".*" to each end to match the whole line.
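A quick way to try that pattern against the lines from the question, sketched in Python (DEFAULT_ARG and has_default_argument are names invented for this example):

```python
import re

# The pattern from the answer above, unchanged.
DEFAULT_ARG = re.compile(r"\([^)]*\w+\s+\w+\s*=[^),][^)]*\)")

def has_default_argument(line):
    """Return True if the line contains a parameter list with a 'type name = value' entry."""
    return DEFAULT_ARG.search(line) is not None

samples = [
    "int sum(int a, int b=10, int c=20);",  # typed parameters with defaults
    "if (x = 10) {",                         # assignment inside a condition
    "x = 7;",                                # plain assignment, no parentheses
    "def f(a, b=10):",                       # Python style, no type before the name
]
for line in samples:
    print(has_default_argument(line), line)
```

Because the pattern requires two words separated by whitespace before the equals sign, it accepts the typed declaration and rejects the if condition, but it also rejects untyped Python parameters such as b=10.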
Please check this one:
\(((?:\w+\s+[\w][\w\s=]*,*\s*){1,})\)
The above expression matches the parameter list and returns it as $1 (Group 1), in case it is needed for further processing.
demo here
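To see what Group 1 holds, here is a small Python sketch (PARAM_LIST and parameter_list are hypothetical names used only here):

```python
import re

# The pattern from the answer above, unchanged.
PARAM_LIST = re.compile(r"\(((?:\w+\s+[\w][\w\s=]*,*\s*){1,})\)")

def parameter_list(line):
    """Return the captured parameter list (Group 1), or None if the pattern does not match."""
    match = PARAM_LIST.search(line)
    return match.group(1) if match else None

print(parameter_list("int sum(int a, int b=10, int c=20);"))
print(parameter_list("if (x = 10) {"))
print(parameter_list("void f(int a, int b)"))
```

One thing to be aware of: the equals sign is optional in this pattern, so a typed parameter list without any default values is matched as well. A follow-up check for "=" in Group 1 would be needed to keep only declarations with defaults.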

recursive matching for string delimiter with regular expression

In Verilog, statements are enclosed in begin/end delimiters instead of braces.
always @(*) begin
    if (condA) begin
        a = c
    end
    else begin
        b = d
    end
end
I'd like to parse the outermost begin-end block together with its statements, in order to check coding rules in Python. Using a regular expression, I want a result like:
if (condA) begin
    a = c
end
else begin
    b = d
end
I found a similar answer for brace delimiters.
int funcA() {
    if (condA) {
        b = a
    }
}
regular expression:
/({(?>[^{}]+|(?R))*})/g
However, I don't know how to modify the character class inside the atomic group ([^{}]) for "begin-end":
/(begin(?>[??????]+|(?R))*end)/g
The point of the [^{}]+ part is to match any text that contains neither delimiter. With multi-character delimiters a character class cannot express that.
So, in your case, you need to match one char at a time, as long as it does not start either a begin or an end substring:
/begin(?>(?!begin|end).|(?R))*end/gs
See the regex demo
The . here will match any char including line break chars due to the s modifier. Note that the actual implementation might need adjustments (e.g. in PHP, the g modifier should not be used as there are specific functions/features for that).
Also, since you recurse the whole pattern, you need no outer parentheses.
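Since the goal is to run this from Python, note that the built-in re module has no (?R) recursion, so the pattern above cannot be used there as written. The sketch below swaps in a different technique, a depth counter over begin/end tokens, rather than the recursive regex. The function name outermost_blocks is hypothetical, and unlike the regex it uses \b word boundaries so that identifiers such as endmodule are not counted:

```python
import re

def outermost_blocks(source):
    """Return each outermost begin...end block, delimiters included."""
    blocks = []
    depth = 0
    start = None
    for token in re.finditer(r"\b(begin|end)\b", source):
        if token.group(1) == "begin":
            if depth == 0:
                start = token.start()  # an outermost block opens here
            depth += 1
        elif depth > 0:
            depth -= 1
            if depth == 0:
                # the matching end of the outermost block
                blocks.append(source[start:token.end()])
    return blocks

verilog = """always @(*) begin
    if (condA) begin
        a = c
    end
    else begin
        b = d
    end
end
"""
for block in outermost_blocks(verilog):
    print(block)
```

Like the regex, this returns the block with its begin and end keywords. Comments or strings that contain the words begin or end are not handled.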

How to match "{" using regex in c++

There may already be a similar question on Stack Overflow, but my question is this:
First I tried to match every x in the string, so I wrote the following code, which works well:
string str = line;
regex rx("x");
vector<int> index_matches; // results saved here
for (auto it = std::sregex_iterator(str.begin(), str.end(), rx);
     it != std::sregex_iterator();
     ++it)
{
    index_matches.push_back(it->position());
}
Next I tried to match every { by replacing regex rx("x"); with regex rx("{"); and then with regex rx("\{");.
Both attempts threw an exception. I think that is expected, because { has a special meaning in regular expressions and the engine expects a matching } later in the pattern.
So, first: is my explanation correct?
Second: I need to match every { using the same code as above. What should I change regex rx("{"); to?
You need to escape characters that have a special meaning in regular expressions, i.e. use the regular expression \{. But \ also has a special meaning in C++ string literals, so it has to be escaped there as well, i.e. write:
regex rx("\\{");
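The two levels of escaping (one for the regex engine, one for the string literal) are not specific to C++. As an illustration only, here is the same idea sketched in Python, where a raw string removes the need for the second level (brace_positions is a made-up helper name):

```python
import re

def brace_positions(text):
    """Return the index of every '{' in text."""
    # r"\{" is a raw string, so the regex engine receives a backslash followed by a brace.
    return [match.start() for match in re.finditer(r"\{", text)]

# Without the raw-string prefix the backslash itself has to be doubled,
# which is the same situation as "\\{" in a C++ string literal.
assert "\\{" == r"\{"

print(brace_positions("a{b{c}"))
```

In C++11 and later a raw string literal plays the same role, so the escaped form shown above and a raw literal containing \{ describe the same pattern.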

C++ TR1 regex - multiline option

I thought that $ indicates the end of string. However, the following piece of code gives "testbbbccc" as a result, which is quite astonishing to me... This means that $ actually matches end of line, not end of the whole string.
#include <iostream>
#include <regex>
#include <string>
#include <vector>
using namespace std;
int main()
{
    tr1::regex r("aaa([^]*?)(ogr|$)");
    string test("bbbaaatestbbbccc\nddd");
    vector<int> captures;
    captures.push_back(1); // only emit capture group 1
    const std::tr1::sregex_token_iterator end;
    for (std::tr1::sregex_token_iterator iter(test.begin(), test.end(), r, captures); iter != end; ++iter)
    {
        // str() returns a temporary, so copy it instead of binding a non-const reference
        const string t1 = iter->str();
        cout << t1;
    }
}
I have been trying to find a "multiline" switch (which actually can be easily found in PCRE), but without success... Can someone point me to the right direction?
Regards,
R.P.
As Boost::Regex was selected for tr1, try the following:
From Boost::Regex
Anchors:
A '^' character shall match the start of a line when used as the first character of an expression, or the first character of a sub-expression.
A '$' character shall match the end of a line when used as the last character of an expression, or the last character of a sub-expression.
So the behavior you observed is correct.
From: Boost Regex as well:
\A Matches at the start of a buffer only (the same as \`).
\z Matches at the end of a buffer only (the same as \').
\Z Matches an optional sequence of newlines at the end of a buffer: equivalent to the regular expression \n*\z
I hope that helps.
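The difference between an end-of-line anchor and an end-of-buffer anchor can be tried out in Python. This is only an illustration under Python's own rules, which differ from TR1 and Boost: there $ is line-aware only when re.MULTILINE is set, and Python's \Z (end of string only) plays the role that the quoted Boost documentation gives to \z. The character class [\s\S] stands in for [^] from the question:

```python
import re

text = "bbbaaatestbbbccc\nddd"

# With re.MULTILINE, $ also matches just before each newline,
# which reproduces the result reported in the question.
line_end = re.search(r"aaa([\s\S]*?)(ogr|$)", text, re.MULTILINE).group(1)

# \Z matches only at the very end of the string.
buffer_end = re.search(r"aaa([\s\S]*?)(ogr|\Z)", text).group(1)

print(repr(line_end))
print(repr(buffer_end))
```

The first search stops at the end of the first line, while the second one runs through the newline to the end of the whole string.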
There is no multiline switch in TR1 regexes. It's not exactly the same thing, but you can get the same effect with a sub-pattern that matches every character:
(.|\r|\n)*?
This matches non-greedily every character, including new line and carriage return.
Note: Remember to escape the backslashes '\' like this '\\' if your pattern is a C++ string in code.
Note 2: If you don't want to capture the matched contents, add '?:' right after the opening parenthesis:
(?:.|\r|\n)*?
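A short Python sketch of why the alternation helps, assuming an engine where . does not match line breaks (true for Python's re by default; the sample text is invented for this example):

```python
import re

text = "aaa line one\nline two ccc"

# '.' alone does not cross the newline, so this search finds nothing.
dot_only = re.search(r"aaa(.*?)ccc", text)

# The alternation from the answer also accepts \r and \n.
any_char = re.search(r"aaa((?:.|\r|\n)*?)ccc", text)

print(dot_only)
print(repr(any_char.group(1)))
```

The non-capturing form keeps the group numbering unchanged, so the outer parentheses here are still group 1.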