What is the output of this regex [duplicate] - regex

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 3 years ago.
What is the output of this regular expression?
/\s+(\d+)\s+/
In particular what is the meaning of /\s

In your regex, \s+ matches one or more consecutive whitespace characters and \d+ matches one or more consecutive digits.
\s and \d each match a single whitespace character and a single digit, respectively; the + quantifier makes each of them match one or more times in a row. The leading / is not part of the pattern: the slashes are only the delimiters that mark where the regex starts and ends.
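To see what the pattern actually captures, here is a minimal sketch in Python. The sample string is made up for illustration, and Python writes the pattern without the surrounding slashes.

```python
import re

# The slashes in /\s+(\d+)\s+/ are delimiters (as in Perl or JavaScript),
# so only the part between them is the pattern.
pattern = re.compile(r"\s+(\d+)\s+")

sample = "order  42\tshipped"  # made-up input: two spaces, digits, one tab
match = pattern.search(sample)

whole = match.group(0)   # whitespace + digits + whitespace
digits = match.group(1)  # only the digits captured by (\d+)
print(repr(whole), repr(digits))

# Digits that are not surrounded by whitespace on both sides do not match.
no_match = pattern.search("order42shipped")
print(no_match)
```

Group 0 is the whole match including the surrounding whitespace, while group 1 holds only the digits.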

You can find a full explanation at regex101.com.
/\s+(\d+)\s+/
\s+ matches any whitespace character (equal to [\r\n\t\f\v ])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
1st Capturing Group (\d+)
\d+ matches a digit (equal to [0-9])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
\s+ matches any whitespace character (equal to [\r\n\t\f\v ])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)

https://regex101.com/
might be useful :D
/\s+(\d+)\s+/
\s+ matches any whitespace character (equal to [\r\n\t\f\v ])
+ Quantifier: Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
1st Capturing Group (\d+)
\d+ matches a digit (equal to [0-9])
+ Quantifier: Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
\s+ matches any whitespace character (equal to [\r\n\t\f\v ])
+ Quantifier: Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)

Related

BASH -- grep'ing for perl-regex [duplicate]

Beginner here and I'm trying to understand this. Can someone please break down the part in between the single quotes and describe what it does?
grep -oP '(?<=\S\/1\.\d.\s)[345]\d+'
Many thanks in advance!
Positive Lookbehind (?<=\S\/1\.\d.\s) Assert that the Regex below matches
\S matches any non-whitespace character (equal to [^\r\n\t\f\v ])
\/ matches the character / literally (case sensitive)
1 matches the character 1 literally (case sensitive)
\. matches the character . literally (case sensitive)
\d matches a digit (equal to [0-9])
. matches any character (except for line terminators)
\s matches any whitespace character (equal to [\r\n\t\f\v ])
Match a single character present in the list below [345]
345 matches a single character in the list 345 (case sensitive)
\d+ matches a digit (equal to [0-9])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
Output simply copied from https://regex101.com/r/HfJSNm/1 : very handy for testing and sharing regexes, and for getting automatic explanations of them.
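The breakdown above can be tried out with a small sketch in Python. The log lines are hypothetical (the question does not show its input), and Python's re module stands in for grep -oP here.

```python
import re

# Same pattern as in the grep -oP command. The lookbehind is fixed-width
# (7 characters), which Python's re module requires.
pattern = re.compile(r"(?<=\S\/1\.\d.\s)[345]\d+")

# Hypothetical access-log lines, made up for illustration.
lines = [
    '10.0.0.1 "GET /a HTTP/1.1" 404 512',
    '10.0.0.2 "GET /b HTTP/1.0" 200 1024',
    '10.0.0.3 "POST /c HTTP/1.1" 500 87',
]

# findall mimics grep -o: only the matched text is kept, not the whole line.
codes = [m for line in lines for m in pattern.findall(line)]
print(codes)
```

On input like this, the lookbehind anchors the match right after something like `P/1.1" `, so only numbers starting with 3, 4 or 5 in that position are returned.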

Need this regex to check for 3 decimal places?

I have :
^-?[0-9]\d*(\.\d+)?$
But need it to allow only up to 3 decimal places. So allowed values are:
+10.123
-10.123
10.123
10
+10
-10
10.1
10.12
Not allowed:
10.1234
10.123%
Advice / suggested expression mods appreciated.
Thanks in advance.
In addition to * and + metacharacters, which specify unlimited repetition, regex allows you to place specific limits on the number of matches with the {a,b} construct. Here, a is the minimum required number of matches, and b is the maximum. Both a and b are inclusive.
Since you need to match at least one and at most three digits, you need to replace \d+ with \d{1,3}:
^[+-]?[0-9]\d*(\.\d{1,3})?$
Optimization: With a working regex in hand, you can optimize by replacing [0-9] with another \d, and "folding" it into \d* by using \d+:
^[+-]?\d+(\.\d{1,3})?$
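A quick check of that expression against the values listed in the question, as a minimal Python sketch:

```python
import re

pattern = re.compile(r"^[+-]?\d+(\.\d{1,3})?$")

# Values taken from the question.
allowed = ["+10.123", "-10.123", "10.123", "10", "+10", "-10", "10.1", "10.12"]
rejected = ["10.1234", "10.123%"]

for value in allowed + rejected:
    # bool(...) turns the match object (or None) into True/False.
    print(f"{value!r}: {bool(pattern.match(value))}")
```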
^[+-]?\d+(\.\d{1,3})?$
Explanation:
See it here: https://www.debuggex.com/r/BbCBL5pQWLxsD4a6
^ asserts position at start of a line
Match a single character present in the list below [+-]?
? Quantifier — Matches between zero and one times, as many times as possible,
giving back as needed (greedy)
+- matches a single character in the list +- (case sensitive)
\d+ matches a digit (equal to [0-9])
+ Quantifier — Matches between one and unlimited times, as many times as possible,
giving back as needed (greedy)
1st Capturing Group (\.\d{1,3})?
? Quantifier — Matches between zero and one times, as many times as possible,
giving back as needed (greedy)
\. matches the character . literally (case sensitive)
\d{1,3} matches a digit (equal to [0-9])
{1,3} Quantifier — Matches between 1 and 3 times, as many times as possible,
giving back as needed (greedy)
$ asserts position at the end of a line
Explanation From: [https://regex101.com/]
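^[+-]{0,1}\d*?(\.{0,1}\d{0,3})?$ should work for the listed values, but note that every part of it is optional, so it also accepts an empty string, a lone sign, and a trailing dot such as 10.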
see https://regex101.com/r/P6DBrW/1/ for Explanation of the regexp
^(?!0\d)\d*(\.\d{1,4})?$

Regular expression in XSLT [duplicate]

Can somebody help me understand the regular expression below, used in XSLT?
regexp:match(test-graph.api.example.com, '(?=CN).*\.(.*)(\.)(.*)(?<=com)', 'i')
What would be the output and how to interpret this regular expression.
Please let me know
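It should be read this way:
(?=CN).*\.(.*)(\.)(.*)(?<=com)
explanation:
Positive Lookahead (?=CN)
Assert that the Regex below matches
CN matches the characters CN literally
.* matches any character (except for line terminators)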
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\. matches the character . literally (case sensitive)
1st Capturing Group (.*)
.* matches any character (except for line terminators)
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
2nd Capturing Group (\.)
\. matches the character . literally
3rd Capturing Group (.*)
.* matches any character (except for line terminators)
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
Positive Lookbehind (?<=com)
Assert that the Regex below matches
com matches the characters com literally
The i modifier means that the regex is case-insensitive.
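As for the output, here is a rough sketch using Python's re module as a stand-in for the XSLT regexp:match function. That substitution is an assumption: the regex flavour of the XSLT processor may differ in details, and the second input string is hypothetical.

```python
import re

# Python's re stands in for the EXSLT engine here (an assumption).
pattern = re.compile(r"(?=CN).*\.(.*)(\.)(.*)(?<=com)", re.IGNORECASE)

# The literal input from the question contains no "CN", so the lookahead
# can never succeed and nothing matches.
no_cn = pattern.search("test-graph.api.example.com")

# A hypothetical input that does start with "CN".
with_cn = pattern.search("CN=test-graph.api.example.com")
groups = with_cn.groups()
print(no_cn, groups)
```

Because the first .* is greedy, it swallows everything up to the second-to-last dot, so the three groups end up holding the last two labels and the dot between them.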

Use RegEx to match URL [closed]

I'm writing a regex to pull a URL out of an auto-generated email from my monitoring system. For example:
https://mon.contoso.com/mon/call.py?fn=edit&num=1389896156
I need a regex to match:
https://mon.contoso.com/mon/call.py?fn=edit&num=XXXXXXXXX
whereby the "x"'s always change. I run into an issue with the "?". The point of this is to append the URL to a field in JIRA.
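import java.util.regex.*;
// "." and "?" are regex metacharacters, so escape them; backslashes are doubled in a Java string.
Pattern p = Pattern.compile("https://mon\\.contoso\\.com/mon/call\\.py\\?fn=edit&num=(\\d+)");
Matcher m = p.matcher(inputEmail);
// find() searches inside the email; matches() would require the whole input to be the URL.
return m.find() ? m.group(1) : "";
This returns num if it is numeric; if it can contain other characters, you might want to use \w instead of \d. If you want the whole URL, call m.group() with no argument.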
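The issue with the "?" comes from it being a quantifier in a regex. A minimal Python sketch of the same idea follows; the email text is made up, and re.escape is used so the fixed part of the URL does not have to be escaped by hand.

```python
import re

# re.escape backslash-escapes regex metacharacters such as "." and "?",
# so the fixed part of the URL is matched literally.
prefix = re.escape("https://mon.contoso.com/mon/call.py?fn=edit&num=")
pattern = re.compile(prefix + r"(\d+)")

# Made-up email body for illustration.
email = (
    "Alert raised. See "
    "https://mon.contoso.com/mon/call.py?fn=edit&num=1389896156 for details."
)

match = pattern.search(email)
url = match.group(0) if match else ""   # the whole URL
num = match.group(1) if match else ""   # only the digits after num=
print(url, num)

# Without escaping, "py?" means "p followed by an optional y",
# so the literal "?" in the URL is never matched.
unescaped = re.search(r"call.py?fn=edit", email)
print(unescaped)
```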
You don't indicate what language you're working in.
In Python and JavaScript, this regex will identify a variety of URLs:
/\[[^\]\n]+\](?:\([^\)\n]+\)|\[[^\]\n]+\])|(?:\/\w+\/|.:\\|\w*:\/\/|\.+\/[./\w\d]+|(?:\w+\.\w+){2,})[./\w\d:/?#\[\]#!$&'()*+,;=\-~%]*/gi
You can refer to this regex101 test for examples of the regex in use.
Explanation:
/\[[^\]\n]+\](?:\([^\)\n]+\)|\[[^\]\n]+\])|(?:\/\w+\/|.:\\|\w*:\/\/|\.+\/[./\w\d]+|(?:\w+\.\w+){2,})[./\w\d:/?#\[\]#!$&'()*+,;=\-~%]*/gi
1st Alternative: \[[^\]\n]+\](?:\([^\)\n]+\)|\[[^\]\n]+\])
\[ matches the character [ literally
[^\]\n]+ match a single character not present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\] matches the character ] literally
\n matches a line-feed (newline) character (ASCII 10)
\] matches the character ] literally
(?:\([^\)\n]+\)|\[[^\]\n]+\]) Non-capturing group
1st Alternative: \([^\)\n]+\)
\( matches the character ( literally
[^\)\n]+ match a single character not present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\) matches the character ) literally
\n matches a line-feed (newline) character (ASCII 10)
\) matches the character ) literally
2nd Alternative: \[[^\]\n]+\]
\[ matches the character [ literally
[^\]\n]+ match a single character not present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\] matches the character ] literally
\n matches a line-feed (newline) character (ASCII 10)
\] matches the character ] literally
2nd Alternative: (?:\/\w+\/|.:\\|\w*:\/\/|\.+\/[./\w\d]+|(?:\w+\.\w+){2,})[./\w\d:/?#\[\]#!$&'()*+,;=\-~%]*
(?:\/\w+\/|.:\\|\w*:\/\/|\.+\/[./\w\d]+|(?:\w+\.\w+){2,}) Non-capturing group
1st Alternative: \/\w+\/
\/ matches the character / literally
\w+ match any word character [a-zA-Z0-9_]
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\/ matches the character / literally
2nd Alternative: .:\\
. matches any character (except newline)
: matches the character : literally
\\ matches the character \ literally
3rd Alternative: \w*:\/\/
\w* match any word character [a-zA-Z0-9_]
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
: matches the character : literally
\/ matches the character / literally
\/ matches the character / literally
4th Alternative: \.+\/[./\w\d]+
\.+ matches the character . literally
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\/ matches the character / literally
[./\w\d]+ match a single character present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
./ a single character in the list ./ literally
\w match any word character [a-zA-Z0-9_]
\d match a digit [0-9]
5th Alternative: (?:\w+\.\w+){2,}
(?:\w+\.\w+){2,} Non-capturing group
Quantifier: {2,} Between 2 and unlimited times, as many times as possible, giving back as needed [greedy]
\w+ match any word character [a-zA-Z0-9_]
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\. matches the character . literally
\w+ match any word character [a-zA-Z0-9_]
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
[./\w\d:/?#\[\]#!$&'()*+,;=\-~%]* match a single character present in the list below
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
./ a single character in the list ./ literally
\w match any word character [a-zA-Z0-9_]
\d match a digit [0-9]
:/?# a single character in the list :/?# literally
\[ matches the character [ literally
\] matches the character ] literally
#!$&'()*+,;= a single character in the list #!$&'()*+,;= literally (case insensitive)
\- matches the character - literally
~% a single character in the list ~% literally
g modifier: global. All matches (don't return on first match)
i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])

Regex to match URL end-of-line or "/" character

I have a URL, and I'm trying to match it to a regular expression to pull out some groups. The problem I'm having is that the URL can either end or continue with a "/" and more URL text. I'd like to match URLs like this:
http://server/xyz/2008-10-08-4
http://server/xyz/2008-10-08-4/
http://server/xyz/2008-10-08-4/123/more
But not match something like this:
http://server/xyz/2008-10-08-4-1
So, I thought my best bet was something like this:
/(.+)/(\d{4}-\d{2}-\d{2})-(\d+)[/$]
where the character class at the end contained either the "/" or the end-of-line. The character class doesn't seem to be happy with the "$" in there though. How can I best discriminate between these URLs while still pulling back the correct groups?
To match either / or end of content, use (/|\z)
This only applies if you are not using multi-line matching (i.e. you're matching a single URL, not a newline-delimited list of URLs).
To put that with an updated version of what you had:
/(\S+?)/(\d{4}-\d{2}-\d{2})-(\d+)(/|\z)
Note that I've changed the start to be a non-greedy match for non-whitespace ( \S+? ) rather than matching anything and everything ( .+ ).
You've got a couple regexes now which will do what you want, so that's adequately covered.
What hasn't been mentioned is why your attempt won't work: Inside a character class, $ (as well as ., /, and ^ when it is not the first character) has no special meaning, so [/$] matches either a literal / or a literal $ rather than terminating the regex (/) or matching end-of-line ($).
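That difference is easy to demonstrate. The following is a small Python sketch with made-up one-character inputs, not the full URL pattern:

```python
import re

# Inside the class, "$" is just a dollar sign.
in_class = re.compile(r"\d+[/$]")

ends_with_slash = in_class.search("4/")    # matches: literal "/"
ends_with_dollar = in_class.search("4$")   # matches: literal "$"
at_end_of_input = in_class.search("4")     # no match: "$" is not an anchor here

# Alternation keeps "$" as an end-of-line anchor.
alternation = re.compile(r"\d+(/|$)")
anchored = alternation.search("4")
print(ends_with_slash, ends_with_dollar, at_end_of_input, anchored)
```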
/(.+)/(\d{4}-\d{2}-\d{2})-(\d+)(/.*)?$
1st Capturing Group (.+)
.+ matches any character (except for line terminators)
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
2nd Capturing Group (\d{4}-\d{2}-\d{2})
\d{4} matches a digit (equal to [0-9])
{4} Quantifier — Matches exactly 4 times
- matches the character - literally (case sensitive)
\d{2} matches a digit (equal to [0-9])
{2} Quantifier — Matches exactly 2 times
- matches the character - literally (case sensitive)
\d{2} matches a digit (equal to [0-9])
{2} Quantifier — Matches exactly 2 times
- matches the character - literally (case sensitive)
3rd Capturing Group (\d+)
\d+ matches a digit (equal to [0-9])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
4th Capturing Group (/.*)?
? Quantifier: Matches between zero and one times, as many times as possible, giving back as needed (greedy)
/ matches the character / literally (case sensitive)
.* matches any character (except for line terminators)
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of the string
In Ruby and Bash, you can use $ inside parentheses.
/(\S+?)/(\d{4}-\d{2}-\d{2})-(\d+)(/|$)
(This solution is similar to Pete Boughton's, but preserves the usage of $, which means end of line, rather than using \z, which means end of string.)
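For what it's worth, the (/|$) variant can be checked against the URLs from the question with a short Python sketch. Python's re module is used here as an assumption, since the question does not name its language; only the date and number groups are collected.

```python
import re

pattern = re.compile(r"/(\S+?)/(\d{4}-\d{2}-\d{2})-(\d+)(/|$)")

# URLs taken from the question; the last one should be rejected.
urls = [
    "http://server/xyz/2008-10-08-4",
    "http://server/xyz/2008-10-08-4/",
    "http://server/xyz/2008-10-08-4/123/more",
    "http://server/xyz/2008-10-08-4-1",
]

results = {}
for url in urls:
    m = pattern.search(url)
    # Keep the date (group 2) and the number (group 3), or None if no match.
    results[url] = (m.group(2), m.group(3)) if m else None
    print(url, "->", results[url])
```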