Validator.matchesRegex in Mule blows up with basic pattern

I am receiving a date in a message as a string. The following regular expression confirms that it is at least in a format I know I can handle:
^[0-9]{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01])$
But when I pass this regex to the validator.matchesRegex method in the Mule Expression Language (MEL), like so:
<when expression="#[validator.matchesRegex(payload.DateOfBirth,'^[0-9]{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01])$') == false]">
<set-variable variableName="validation_message" value="{&quot;error&quot;: &quot;Invalid DateOfBirth&quot;}" doc:name="invalid DateOfBirth"/>
</when>
I receive the following error:
org.mule.api.MessagingException: [Error: illegal escape sequence: -]
[Near : {... eOfBirth,'^[0-9]{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[1 ....}]
^
[Line: 1, Column: 55] (org.mule.api.expression.InvalidExpressionException). Message payload is of type: HashMap
Update:
I've tried two new alterations:
First, doubling up my curly brackets, like so:
^[0-9]{{4}}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01])$
But I get the same error, just a few characters further along:
[Error: illegal escape sequence: -]
[Near : {... fBirth,'^[0-9]{{4}}\-(0?[1-9]|1[012])\-(0?[1-9]|[1 ....}]
^
[Line: 1, Column: 57] (org.mule.api.expression.InvalidExpressionException). Message payload is of type: HashMap
Second, unescaping the dashes between the numbers:
^[0-9]{{4}}-(0?[1-9]|1[012])-(0?[1-9]|[12][0-9]|3[01])$
but I get the following error:
org.mule.api.MessagingException: Execution of the expression "validator.matchesRegex(payload.DateOfBirth,'^[0-9]{{4}}-(0?[1-9]|1[012])-(0?[1-9]|[12][0-9]|3[01])$') == false" failed. (org.mule.api.expression.ExpressionRuntimeException). Message payload is of type: HashMap
As a workaround, I changed the pattern so that it no longer uses a backslash-escaped dash:
^[0-9]{4}.(0?[1-9]|1[012]).(0?[1-9]|[12][0-9]|3[01])$
That is, I have replaced '-' with '.', which is not really what I want, but it at least validates the number portions. I have confirmed that this works; it just allows invalid values such as '2016!02_23' when it really should only allow '2016-02-23'.
TL;DR: Is there a bug with regular expressions in MEL when you try to escape the dash character?

Mule (as this page indicates) provides no way to avoid the evil escaped escape. If you use a \ in your regex, you must escape it as \\. Java works the same way.
Also, you need to understand that certain regex symbols do not always need to be escaped. This is very important when you work in Mule/Java, because leaving them unescaped lets you avoid the evil escaped escape.
Depending on where they appear in the regex, characters may gain or lose meaning as metacharacters. The - character only has special meaning when it is sandwiched inside a [] character class. This means that you can just use it as a normal character instead of escaping it in your regular expression.
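A quick way to sanity-check this is to run both versions of the date pattern from the question through a regex engine. The sketch below uses Python's re module as a stand-in for the Java engine behind MEL (an assumption: the two agree on these simple constructs, but they are not the same engine), and is_valid is a hypothetical helper. It shows that the unescaped dash accepts only '-', while the '.' workaround also lets '2016!02_23' through.

```python
import re

# The question's pattern with the dashes left unescaped: outside a [] class,
# '-' is an ordinary character, so no backslash (and no escaped escape) is needed.
DATE_DASH = re.compile(r"^[0-9]{4}-(0?[1-9]|1[012])-(0?[1-9]|[12][0-9]|3[01])$")

# The '.' workaround from the question: '.' matches any character.
DATE_DOT = re.compile(r"^[0-9]{4}.(0?[1-9]|1[012]).(0?[1-9]|[12][0-9]|3[01])$")


def is_valid(pattern, value):
    """Hypothetical helper: True if the anchored pattern matches the value."""
    return pattern.search(value) is not None


for value in ["2016-02-23", "2016!02_23"]:
    print(value, is_valid(DATE_DASH, value), is_valid(DATE_DOT, value))
# 2016-02-23 True True
# 2016!02_23 False True
```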
I suggest that you read up on regexes.
There will be times when you need to use the evil escaped escape, which can get confusing. Personally, I usually use this site to convert my regexes into escaped strings.
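The conversion such a site performs can be sketched in Python as an illustration. to_escaped_string is a hypothetical helper, and it assumes only backslashes need escaping (it does nothing about quote characters): every \ in the regex becomes \\ in the string literal, and reading the literal back gives the original regex.

```python
import ast


def to_escaped_string(regex: str) -> str:
    """Hypothetical helper: double every backslash so the regex survives
    a Java-style string literal. Quote characters are NOT handled."""
    return regex.replace("\\", "\\\\")


original = r"^[0-9]{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01])$"
escaped = to_escaped_string(original)
print(escaped)  # ^[0-9]{4}\\-(0?[1-9]|1[012])\\-(0?[1-9]|[12][0-9]|3[01])$

# Parsing the escaped text as a string literal restores the original regex.
assert ast.literal_eval('"' + escaped + '"') == original
```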

Related

Invalid regular expression:Lone quantifier brackets

I have an HTML phone pattern that should accept these formats:
+61 x xxxx xxxx,
+61xxxxxxxxx,
0x xxxx xxxx,
0xxxxxxxxx,
xxxx xxxx,
xxxxxxxx,
+xx xxx xxx xxx,
+xxxxxxxxxxx,
0xxx xxx xxx,
0xxxxxxxxx
It was working a few months ago, but now my phone fields suddenly aren't validating. I'm getting this error:
Pattern attribute value ^(?:0|\(?\+61\)?\s?|0061\s?)[1-79](?:[\.\-\s]?\d\d){4}|(\d{4}[\s]\d{4})|(\d{8})|(\d{4}[\s]\d{3}[\s]\d{3})|(\+61\[\s]\d{3}[\s]\d{3}[\s]\d{3})|(\+61\s\d{3}\s\d{3}\s\d{3})$ is not a valid regular expression: Uncaught SyntaxError: Invalid regular expression: /^(?:0|\(?\+61\)?\s?|0061\s?)[1-79](?:[\.\-\s]?\d\d){4}|(\d{4}[\s]\d{4})|(\d{8})|(\d{4}[\s]\d{3}[\s]\d{3})|(\+61\[\s]\d{3}[\s]\d{3}[\s]\d{3})|(\+61\s\d{3}\s\d{3}\s\d{3})$/: Lone quantifier brackets
So far, no one has bothered to show where in your pattern the error is.
…|(\+61\[\s]\d{3}[\s]\d{3}[\s]\d{3})|(\+61\s\d{3}\s\d{3}\s\d{3})$
^
There, you mistakenly inserted a backslash that escapes the opening bracket, making it an ordinary character and leaving the closing bracket lone. (Sadly, the error message is somewhat misleading, since those brackets are of course not quantifier brackets.)
Similar to the request above, I received the same error in Visual Studio Code when using the following regex to capture words wrapped in brackets:
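Python's re module is more forgiving than the browser here: it accepts a lone ] as a literal, so the broken alternative compiles, but it then demands actual square brackets in the input. A stricter engine, such as JavaScript's in Unicode mode, rejects it instead. This sketch isolates the broken alternative from the question next to the intended one (the sample inputs are made up):

```python
import re

# Broken alternative from the question: "\[" is a literal '[', so "[\s]" is no
# longer a character class and the ']' after \s stands alone.
broken = r"\+61\[\s]\d{3}"
# Intended alternative: "[\s]" is a class matching one whitespace character.
fixed = r"\+61[\s]\d{3}"

print(re.fullmatch(broken, "+61 123"))          # None
print(bool(re.fullmatch(broken, "+61[ ]123")))  # True: literal brackets required
print(bool(re.fullmatch(fixed, "+61 123")))     # True
```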
\[([^\\[\r\n]*(?:\\.[^\\]\r\n]*)*)\] (which is incorrect)
on this SQL text:
SELECT d.[fname]
,d.[lname],d.[address1],d.[address2],d.[City],d.[zip]
FROM my_table d
WHERE [hh_id] IN (SELECT [HhId]
FROM other_table)
However, online regex tools such as https://regex101.com/r/HGycR9/1 show that it is a valid regex.
The correct RegEx pattern is \[([^[\r\n]*(?:\\.[^\r\n]*)*)\], which will highlight:
SELECT d.[fname] ,d.[lname],d.[address1],d.[address2],d.[City],d.[zip]
FROM my_table d WHERE [hh_id] IN (SELECT [HhId] FROM other_table)
Then you can use $1 in the replace field of VS Code to remove the brackets and leave the words contained within them.
SELECT d.fname
,d.lname,d.address1,d.address2,d.City,d.zip
FROM my_table d
WHERE hh_id IN (SELECT HhId
FROM other_table)
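Outside the editor, the same find-and-replace can be sketched with Python's re.sub, where r"\1" plays the role of VS Code's $1. This assumes Python's engine treats the corrected pattern the way VS Code does for this input; the '[' inside the negated class is escaped here for readability (inside a class, both spellings mean a literal '[').

```python
import re

sql = """SELECT d.[fname]
,d.[lname],d.[address1],d.[address2],d.[City],d.[zip]
FROM my_table d
WHERE [hh_id] IN (SELECT [HhId]
FROM other_table)"""

# Corrected pattern from the answer: '[', then the bracketed name as group 1, then ']'.
BRACKETED = re.compile(r"\[([^\[\r\n]*(?:\\.[^\r\n]*)*)\]")

# Replace each [name] with just the name captured in group 1.
print(BRACKETED.sub(r"\1", sql))
```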
Just to spell it out a bit more explicitly, the difference between VSCode (and apparently Visual Studio) and other Java/ECMAScript regex interpreters is that the VS products are picky about matching brackets, even if they should be interpreted as literal characters.
For example, the regex below, with just the first bracket escaped, matches a string surrounded by square brackets (e.g. [string]) in most JS-based (and PCRE) regex interpreters:
\[\w+]
^
Although only the "opening" square bracket is escaped, the "closing" square bracket is also interpreted as a literal character. Semantically, with the first bracket escaped, you haven't actually started a character class (or a group or quantifier, if it's parentheses () or curly braces {} you're matching).
In VSCode, you must also escape the "closing" bracket, or else you get the "Lone quantifier brackets" error.
\[\w+\]
^ ^
Another irritating aspect is that the .NET regex interpreter behaves much like a standard JavaScript interpreter, so it also doesn't care whether the closing bracket is escaped as long as the opening one is. Something of this nature that's fine in .NET or PowerShell won't work the same way in VS Code.
The error means exactly what it says: the pattern is invalid.
If you want to match Australian phone numbers, you could use:
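Python's re module happens to behave like the lenient engines described here, so it can demonstrate the first behavior (it cannot reproduce VS Code's stricter check). In this sketch both forms match the sample [string] from the text:

```python
import re

text = "[string]"

# Only the opening bracket escaped: a lenient engine reads the bare ']' as a literal.
one_escaped = re.fullmatch(r"\[\w+]", text)
# Both brackets escaped: the portable spelling, accepted by strict engines too.
both_escaped = re.fullmatch(r"\[\w+\]", text)

print(one_escaped.group(0), both_escaped.group(0))  # [string] [string]
```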
pattern="^(?:0|\(?\+61\)?\s?|0061\s?)[1-79](?:[\.\-\s]?\d\d){4}$"
Pattern found here.
Example:
https://jsfiddle.net
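A quick Python check of this pattern suggests it accepts the +61 and 0-prefixed forms but not the bare "xxxx xxxx" format from the question. Here fullmatch approximates the pattern attribute's whole-value matching, and the sample numbers are made up:

```python
import re

# The pattern from the answer, unchanged.
AU_PHONE = re.compile(r"^(?:0|\(?\+61\)?\s?|0061\s?)[1-79](?:[\.\-\s]?\d\d){4}$")

samples = ["+61 2 1234 5678", "+61412345678", "0412345678", "1234 5678"]
for s in samples:
    print(repr(s), bool(AU_PHONE.fullmatch(s)))
# '+61 2 1234 5678' True
# '+61412345678' True
# '0412345678' True
# '1234 5678' False
```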

Asterisk regular expression : Invalid preceding regular expression

I am trying to verify whether the inbound CLI matches one of these patterns:
CLI STARTING WITH:
+39
0039
3
0[1-9]
So I wrote the following:
exten => s,n,Set(isita=${REGEX("^(+39|0039|3|0[1-9])" ${cli})})
However, I am getting this error:
Malformed input REGEX(): Invalid preceding regular expression
What is wrong with my regular expression?
You need to escape the +, use this RegEx instead:
^(\\+39|0039|3|0[1-9])
You can see the error when you test it on RegExr.
Normally in a regex (in JavaScript, for example, where it is enclosed in /), you only need one \. However, when the regex is stored in a string (as it is here), you need two.
With only one \, the string parser tries to create a character from \+ (just as \n becomes a newline). The second \ tells it that the first \ should not be converted.
New RegEx on RegExr
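To see the string-literal effect outside Asterisk, here is a Python sketch. Python's re is only a stand-in: Asterisk's REGEX function runs on a different engine, so its exact error text differs. In an ordinary (non-raw) Python string, "\\+" is the two characters \ and +, which is what the engine needs to see. The sample CLIs are made up; note that the bare 3 alternative accepts any CLI starting with 3.

```python
import re

# Non-raw string: "\\+" reaches the regex engine as \+ (a literal plus sign).
CLI_PREFIX = re.compile("^(\\+39|0039|3|0[1-9])")

for cli in ["+39021234567", "0039021234567", "3331234567", "0212345678", "0012345"]:
    print(cli, bool(CLI_PREFIX.match(cli)))
# The last one fails: "00" is neither 0039 nor 0 followed by 1-9.

# Without the escape, '+' has nothing to repeat and compilation fails.
try:
    re.compile("^(+39|0039|3|0[1-9])")
except re.error as exc:
    print("error:", exc)
```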
The answer is correct, but using REGEX inside the dialplan is not a great idea. The dialplan itself already does pattern matching, and it has a form for matching based on the CLI:
exten => _s/_39.,n,Noop(do something for cli starting with 39)
So the more Asterisk-like way is to use dialplan patterns, not REGEX.

Parsing variables within a string using a Regular Expression

I've got a bit of a problem with regular expressions in ColdFusion.
I have a string:
Hi my name is {firstname}. and i live in {towncity} my email address is {email}
What I would like to know is how I would go about finding all substrings within my string that are enclosed in a set of {} brackets. I would like to put all the matching strings into an array so that I can fill them in with query data.
Also, is this a commonly used pattern for merging variable data into strings?
Any help greatly appreciated.
Simple Answer
To find all the brace-encased strings, you can use rematch and the simple expression \{[^{}]+\}
Explanation
The backslashes \ before each brace are to escape them, and have them act as literal braces (they carry special meaning otherwise).
The [^...] is a negated character class, which matches any single character that is NOT one of those contained within it, and the greedy + quantifier tells it to match as many as possible, but at least one, of the preceding item.
Thus, using [^{}]+ between the braces means it will not match nested or unmatched braces. (By contrast, \{.*?\} could match two opening braces. Note: *? is a lazy quantifier; it matches as few characters as possible, none if it can, but as many as required.)
Extended Answer
However, since you say that the results come from a query, a way to only match the values you're dealing with is to use the query's ColumnList to form an expression:
`\{(#ListChangeDelims(QueryName.ColumnList,'|')#)\}`
This changes ColumnList into a pipe-delimited list - a set of alternatives, grouped by the parentheses - i.e. the generated pattern will be like:
\{(first_name|towncity|email)\}
(with the contents of that group going into capture group 1).
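A Python sketch of the same idea, with a hypothetical column list standing in for QueryName.ColumnList: "|".join plays the part of ListChangeDelims, and re.escape is an extra precaution that the CF version does not take.

```python
import re

# Hypothetical column names standing in for QueryName.ColumnList.
columns = ["firstname", "towncity", "email"]

# Build \{(firstname|towncity|email)\}; the alternatives form capture group 1.
placeholder = re.compile(r"\{(" + "|".join(map(re.escape, columns)) + r")\}")

text = "Hi my name is {firstname}. and i live in {towncity} my email address is {email} {other}"
# findall returns group 1 only; {other} is not a known column, so it is skipped.
print(placeholder.findall(text))  # ['firstname', 'towncity', 'email']
```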
To actually populate the text (rather than just matching) you could do something similar, except there is no need for a regex here, just a straight replace whilst looping through columns:
<cfloop index="CurColumn" list="#QueryName.ColumnList#">
<cfset text = replace( text , '{#CurColumn#}' , QueryName[CurColumn][CurrentRow] , 'all' ) />
</cfloop>
(Since this is a standard replace, there's no need to escape the braces with backslashes; they have no special meaning here.)
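The straight-replace loop translates to Python roughly as below. row is a hypothetical dictionary standing in for one row of the query (QueryName[CurColumn][CurrentRow]), and its values are made up.

```python
# Hypothetical query row: column name -> value.
row = {"firstname": "John", "towncity": "Springfield", "email": "john@example.com"}

text = "Hi my name is {firstname}. and i live in {towncity} my email address is {email}"

# Plain string replacement, no regex: the braces have no special meaning here.
for column, value in row.items():
    text = text.replace("{" + column + "}", str(value))

print(text)
# Hi my name is John. and i live in Springfield my email address is john@example.com
```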
Use the reMatch(reg_expression, string_to_search) function.
The details on regular expressions in ColdFusion 10 are here. (I believe regexes in CF8 work roughly the same way.)
Use the following code.
<cfset str = "Hi my name is {firstname}. And I live in {towncity} my email address is {email}.">
<cfoutput>Search string: <b>#str#</b><br />Search result:<br /></cfoutput>
<cfset ret = reMatch("\{[\w\s\(\)\+\.##-]+\}", str)>
<cfdump var="#ret#">
This returns an array with the following entries.
{firstname}
{towncity}
{email}
The [] brackets in CF regular expressions define a character set to match a single character. You put + after the brackets to match one or more characters from the character set defined inside the []. For example, to match one or more upper case letters you could write [A-Z]+.
As detailed in the link above, CF defines shortcuts to match various characters. The ones I used in the code are \w, to match an alphanumeric character or an underscore, and \s, to match a whitespace character (including space, tab, newline, etc.).
To match any of the special characters +*?.[^$({|\ literally, escape it by writing a backslash \ before it.
An exception is the dash - character, which cannot be escaped with a backslash. To use it as a literal, place it at the very end of the character set, as I did above.
Using the above regular expression you can extract characters from the following string, for example.
<cfset str = "Hi my name is { John Galt}. And I live in {St. Peters-burg } my email address is {john##exam_ple.com}.">
The result would be an array with the following entries.
{ John Galt}
{St. Peters-burg }
{john#exam_ple.com}
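For a quick check outside ColdFusion, the same character set can be tried with Python's re.findall, assuming it behaves like reMatch for this class (Python also accepts the backslash escapes inside []). The sample string is the one from the answer.

```python
import re

text = ("Hi my name is { John Galt}. And I live in {St. Peters-burg } "
        "my email address is {john#exam_ple.com}.")

# Word characters, whitespace, ( ) + . # and a literal '-' placed last in the set.
matches = re.findall(r"\{[\w\s\(\)\+\.#-]+\}", text)
print(matches)
# ['{ John Galt}', '{St. Peters-burg }', '{john#exam_ple.com}']
```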
There may be much better ways to do this, but using something like rematch( '{.*?}', yourstring ) would give you an array of all the matches.
For future reference, I did this with the excellent RegExr, a really nice online regex checker. Full disclosure, it's not specifically for ColdFusion, but it's a great way to test things out.

parsing url for specific param value

I'm looking to use a regular expression to extract a specific section of a URL, and to get nothing if the pattern isn't found.
An example URL is:
/te/file/value/jifle?uil=testing-cdas-feaw:jilk:&jklfe=https://value-value.jifels/temp.html/topic?id=e997aad4-92e0-j30e-a3c8-jfkaliejs5#c452fds-634d-f424fds-cdsa&bf_action=jildape
I wish to get the bolded text in it (the value of the id parameter, e997aad4-92e0-j30e-a3c8-jfkaliejs5).
Currently I'm using the regex "d=([^#]*)", but the problem is that I'm also running across URLs of this pattern, and the regex matches the bold section of them too:
/te/file/value/jifle?uil=testing-cdas-feaw:jilk:&jklfe=https://value-value.jifels/temp.html/topic?id=e997aad4-92e0-j30e-a3c8-jfkaliejs5&bf_action=jildape
I would prefer this URL to produce no match, because it doesn't contain the #.
Regexes are not a magic tool that you should always use just because the problem involves a string. In this case, your language probably has a tool to break apart URLs for you. In PHP, this is parse_url(). In Perl, it's the URI::URL module.
You should almost always prefer an existing, well-tested solution to a common problem like this rather than writing your own.
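As an illustration of the parser approach in Python (the question doesn't name a language, so this is an assumption), urllib.parse can split off the fragment and the query. The sketch assumes, from the examples only, that id lives inside the URL carried by the jklfe parameter; id_if_fragment is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlsplit


def id_if_fragment(url):
    """Hypothetical helper: return the id value only when the URL has a '#'."""
    outer = urlsplit(url)
    if not outer.fragment:  # no '#' part: the asker wants no match
        return None
    inner = parse_qs(outer.query).get("jklfe")  # the nested URL (assumed)
    if not inner:
        return None
    ids = parse_qs(urlsplit(inner[0]).query).get("id")
    return ids[0] if ids else None


with_hash = ("/te/file/value/jifle?uil=testing-cdas-feaw:jilk:&jklfe=https://value-value.jifels"
             "/temp.html/topic?id=e997aad4-92e0-j30e-a3c8-jfkaliejs5#c452fds-634d-f424fds-cdsa"
             "&bf_action=jildape")
without_hash = ("/te/file/value/jifle?uil=testing-cdas-feaw:jilk:&jklfe=https://value-value.jifels"
                "/temp.html/topic?id=e997aad4-92e0-j30e-a3c8-jfkaliejs5&bf_action=jildape")

print(id_if_fragment(with_hash))     # e997aad4-92e0-j30e-a3c8-jfkaliejs5
print(id_if_fragment(without_hash))  # None
```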
So you want to match the value of the id parameter, but only if it has a trailing section containing a '#' symbol (without matching the '#' or what's after it)?
Not knowing the specifics of what style of regexes you're using, how about something like:
id=([^#&]*)#
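A quick check of this pattern against both URLs from the question, using Python's re as the regex flavor (an assumption, since the asker's flavor is unknown). Excluding & from the class stops the match at the next parameter, so the URL without a # produces no match:

```python
import re

ID_BEFORE_HASH = re.compile(r"id=([^#&]*)#")

with_hash = ("/te/file/value/jifle?uil=testing-cdas-feaw:jilk:&jklfe=https://value-value.jifels"
             "/temp.html/topic?id=e997aad4-92e0-j30e-a3c8-jfkaliejs5#c452fds-634d-f424fds-cdsa"
             "&bf_action=jildape")
without_hash = ("/te/file/value/jifle?uil=testing-cdas-feaw:jilk:&jklfe=https://value-value.jifels"
                "/temp.html/topic?id=e997aad4-92e0-j30e-a3c8-jfkaliejs5&bf_action=jildape")

m = ID_BEFORE_HASH.search(with_hash)
print(m.group(1) if m else None)  # e997aad4-92e0-j30e-a3c8-jfkaliejs5
m = ID_BEFORE_HASH.search(without_hash)
print(m.group(1) if m else None)  # None
```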
regex = "id=([\\w-]+?)#"
This will grab everything in the character class [a-zA-Z_0-9-] between 'id=' and '#', assuming everything between 'id=' and '#' is in that character class (i.e. if an '&' is in there, the regex will fail).
id=
-Self-explanatory: this looks for an exact match of 'id='.
[\\w-]
-This defines a character class. The \\w is an escaped \w; '\w' is a predefined character class in Java that is equal to [a-zA-Z_0-9]. I added '-' to this class because of the assumed pattern from your examples.
+?
-This is a reluctant quantifier that looks for the shortest possible match of the regex.
( )
-The parentheses capture the matched run in group 1. The +? has to sit inside them: written as ([\\w-])+?, the group would hold only the last character matched.
#
-The end of the regex: the last character we are looking for to match the pattern.
If you want to grab every character between 'id=' and the first '#' following it, the following works; it uses the same logic as above, but replaces the character class [\\w-] with ., which matches any character (except a line terminator).
regex = "id=(.+?)#"
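Running these Java-style patterns through Python's re shows what each version captures (an assumption that the two engines agree on these constructs; raw strings stand in for Java's doubled backslashes, so r"[\w-]" is Java's "[\\w-]"). Note how putting +? outside the parentheses, as in ([\\w-])+?, leaves only the last character in group 1.

```python
import re

url = ("/te/file/value/jifle?uil=testing-cdas-feaw:jilk:&jklfe=https://value-value.jifels"
       "/temp.html/topic?id=e997aad4-92e0-j30e-a3c8-jfkaliejs5#c452fds-634d-f424fds-cdsa"
       "&bf_action=jildape")

quantifier_outside = re.search(r"id=([\w-])+?#", url).group(1)
quantifier_inside = re.search(r"id=([\w-]+?)#", url).group(1)
any_char = re.search(r"id=(.+?)#", url).group(1)

print(quantifier_outside)  # 5 (a repeated group keeps only its last repetition)
print(quantifier_inside)   # e997aad4-92e0-j30e-a3c8-jfkaliejs5
print(any_char)            # e997aad4-92e0-j30e-a3c8-jfkaliejs5
```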

URL Regular Expression in Racket

I'm trying to use the URL regular expression to match URLs in Racket like this:
(regexp-match #rx"((mailto\:|(news|(ht|f)tp(s?))\:\/\/){1}\S+)" "www.test.com/")
The problem is that I'm getting this error: read: unknown escape sequence \: in string. What should I do to correct this?
Now I'm trying this:
(regexp-match #px"((mailto:|(news|(ht|f)tp(s?))://){1}\S+)" "www.youtube.com/watch?v=I0r4Wo2Q3l4")
And now I'm getting this error: read: unknown escape sequence \S in string
There are a number of issues with your code. First, as others have pointed out, you don't need to escape the colon character.
Second, you need to use #px to start a regular expression that uses Perl-style regexp extensions such as \S, as you've done. Because the regexp is written as a string literal, its backslash must itself be escaped, so \S has to be written as \\S.
Finally, your input is missing the "http://" that the pattern requires in order to match.
Here's an example that works:
#lang racket
(regexp-match #px"((mailto:|(news|(ht|f)tp(s?))://){1}\\S+)"
"http://www.test.com/")
Running this code produces:
'("http://www.test.com/" "http://www.test.com/" "http://" "http" "ht" "")
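For comparison, here is the same pattern in Python's re (an assumption that it behaves like Racket's #px for this pattern). A raw string lets \S reach the engine without the doubling that Racket's string literal needs. regexp-match returns the whole match followed by the groups, which corresponds to m.group(0) plus m.groups().

```python
import re

URL_RX = re.compile(r"((mailto:|(news|(ht|f)tp(s?))://){1}\S+)")

m = URL_RX.search("http://www.test.com/")
print(m.group(0), m.groups())
# http://www.test.com/ ('http://www.test.com/', 'http://', 'http', 'ht', '')

# Without a scheme such as "http://", nothing matches.
print(URL_RX.search("www.test.com/"))  # None
```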
\: is an invalid escape sequence because : isn't a special character. Did you mean to write . instead?