This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 6 years ago.
Can somebody help me to understand this below regular expreesion in XSLT.
regexp:match(test-graph.api.example.com, '(?=CN).*\.(.*)(\.)(.*)(?<=com)', 'i')
What would be the output and how to interpret this regular expression.
Please let me know
Is should be read this way:
(?=CN).*\.(.*)(\.)(.*)(?<=com)
explanation:
.* matches any character (except for line terminators)
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\. matches the character . literally (case sensitive)
1st Capturing Group (.*)
.* matches any character (except for line terminators)
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
2nd Capturing Group (\.)
\. matches the character . literally
3rd Capturing Group (.*)
.* matches any character (except for line terminators)
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
Positive Lookbehind (?<=com)
Assert that the Regex below matches
com matches the characters com literally
The i modifiers means that the regex is case insentive.
Related
Very much a rookie question - have the following which I am using to look for any urls without the following extensions (seems to work):
href="\S*(?i)(?<!\.html)(?<!\.pdf)(?<!\.doc)(?<!\.docx)(?<!\.ppt)(?<!\.pptx)(?<!\.xls)(?<!\.xlsx)(?<!\.jpg)(?<!\.jpeg)(?<!\.eps)"
Where I am struggling is trying to figure out how to also find file names with extensions and exclude such as:
test.html#help
test.html?help
test.html?help&please
Not sure how to take something like this (?<!\.html) and add a wildcard to handle anything after .html
Did some more testing via an online regex tester site and this seems to work - matches any of the file extensions including test.html#help etc :
href="\S*(?i)(((?<=\.html)\S*)|((?<=\.pdf)\S*)|((?<=\.doc)\S*)|((?<=\.ppt)\S*)|((?<=\.xls)\S*)|((?<=\.jpg)\S*)|((?<=\.jpeg)\S*)|((?<=\.eps)\S*))"
but this does not work at all:
href="\S*(?i)((?<!\.html)\S*)"
Any help greatly appreciated.
Does this work for you?
href="(?<url>.+?\.(?!(html|xlsm|pdf|doc|ppt|jpg|jpeg|eps)).*?)"
https://regex101.com/r/SPjvkR/1
href=" matches the characters href=" literally (case sensitive)
Named Capture Group url (?<url>.+?\.(?!(html|xlsm|pdf|doc|ppt|jpg|jpeg|eps)).*?)
(?<url> is what gives the capturing group the name. It can be omitted if you don't want the capturing group to have a name or can be replaced with ?: to make the it a non-capturing group. Naming can just make it more convenient to get the group's value in later code if needed, but in your case I don't think it matters
.+? matches any character (except for line terminators) between one and unlimited times, as few times as possible, expanding as needed (lazy)
\. matches the character . with index 4610 (2E16 or 568) literally (case sensitive)
Negative Lookahead (?!(html|xlsm|pdf|doc|ppt|jpg|jpeg|eps))
Assert that the Regex below does not match
2nd Capturing Group (html|xlsm|pdf|doc|ppt|jpg|jpeg|eps)
.*? matches any character (except for line terminators) between zero and unlimited times, as few times as possible, expanding as needed (lazy)
" matches the character " with index 3410 (2216 or 428) literally (case sensitive)
Update
New regex based on comments
href="((?!.*\.(?:html|xlsm|pdf|doc|ppt|jpg|jpeg|eps)).*?)"
Regex Demo
href=" matches the characters href=" literally (case sensitive)
1st Capturing Group ((?!.*\.(?:html|xlsm|pdf|doc|ppt|jpg|jpeg|eps)).*?)
Negative Lookahead (?!.*\.(?:html|xlsm|pdf|doc|ppt|jpg|jpeg|eps))
Assert that the Regex below does not match
. matches any character (except for line terminators)
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\. matches the character . with index 4610 (2E16 or 568) literally (case sensitive)
Non-capturing group (?:html|xlsm|pdf|doc|ppt|jpg|jpeg|eps)
. matches any character (except for line terminators)
*? matches the previous token between zero and unlimited times, as few times as possible, expanding as needed (lazy)
" matches the character " with index 3410 (2216 or 428) literally (case sensitive)
This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 3 years ago.
What is the output of this regular expression?
/\s+(\d+)\s+/
In particular what is the meaning of /\s
In your regex \s+ matches any number of whitespaces sequentially and /d+ matches any number of digits sequentially .
\s and \d matches a single whitespace and single digit respectively the + makes it match any number of sequential whitespaces and digits respectively.
You can find a full explanation at regex101.com.
/\s+(\d+)\s+/
\s+ matches any whitespace character (equal to [\r\n\t\f\v ])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
1st Capturing Group (\d+)
\d+ matches a digit (equal to [0-9])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
\s+ matches any whitespace character (equal to [\r\n\t\f\v ])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
https://regex101.com/
might be useful :D
\s+(\d+)\s+ / ↵ matches the character
↵ literally (case sensitive)
\s+
matches any whitespace character (equal to [\r\n\t\f\v ])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy) 1st Capturing Group
(\d+)
\d+ matches a digit (equal to [0-9])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
\s+ matches any whitespace
character (equal to [\r\n\t\f\v ])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed
This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 5 years ago.
Beginner here and I'm trying to understand this. Can someone please break down the part in between the single quotes and describe what it does?
grep -oP '(?<=\S\/1\.\d.\s)[345]\d+'
Many thanks in advance!
Positive Lookbehind (?<=\S/1.\d.\s) Assert that the Regex below matches
\S matches any non-whitespace character (equal to [^\r\n\t\f\v ])
\/ matches the character / literally (case sensitive)
1 matches the character 1 literally (case sensitive)
\. matches the character . literally (case sensitive)
\d matches a digit (equal to [0-9])
. matches any character (except for line terminators)
\s matches any whitespace character (equal to [\r\n\t\f\v ])
Match a single character present in the list below [345]
345 matches a single character in the list 345 (case sensitive)
\d+ matches a digit (equal to [0-9])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
Output simply copied from https://regex101.com/r/HfJSNm/1 : very handy to test/share/have automatic explications on regexes.
Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 5 years ago.
Improve this question
How can I remove all spaces into braces with Notepad++ and RegEx?
For example:
I have string [Word1 Word2 Word3]
I need: [Word1Word2Word3]
Thanks
\s++(?=[^[]*])
\s++
matches any whitespace character (equal to [\r\n\t\f\v ])
++ Quantifier — Matches between one and unlimited times, as many times as possible, without giving back (possessive)
Positive Lookahead (?=[^[]*])
Assert that the Regex below matches
Match a single character not present in the list below [^[]*
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
[ matches the character [ literally (case sensitive)
] matches the character ] literally (case sensitive)
(?:\[|\G(?!^))[^]\s]*\K\s+
Non-capturing group (?:\[|\G(?!^))
1st Alternative \[
\[ matches the character [ literally (case sensitive)
2nd Alternative \G(?!^)
\G asserts position at the end of the previous match or the start of the string for the first match
Negative Lookahead (?!^)
Assert that the Regex below does not match
^ asserts position at start of the string
Match a single character not present in the list below [^]\s]*
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
] matches the character ] literally (case sensitive)
\s matches any whitespace character (equal to [\r\n\t\f\v ])
\K resets the starting point of the reported match. Any previously consumed characters are no longer included in the final match
\s+
matches any whitespace character (equal to [\r\n\t\f\v ])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
I have a URL, and I'm trying to match it to a regular expression to pull out some groups. The problem I'm having is that the URL can either end or continue with a "/" and more URL text. I'd like to match URLs like this:
http://server/xyz/2008-10-08-4
http://server/xyz/2008-10-08-4/
http://server/xyz/2008-10-08-4/123/more
But not match something like this:
http://server/xyz/2008-10-08-4-1
So, I thought my best bet was something like this:
/(.+)/(\d{4}-\d{2}-\d{2})-(\d+)[/$]
where the character class at the end contained either the "/" or the end-of-line. The character class doesn't seem to be happy with the "$" in there though. How can I best discriminate between these URLs while still pulling back the correct groups?
To match either / or end of content, use (/|\z)
This only applies if you are not using multi-line matching (i.e. you're matching a single URL, not a newline-delimited list of URLs).
To put that with an updated version of what you had:
/(\S+?)/(\d{4}-\d{2}-\d{2})-(\d+)(/|\z)
Note that I've changed the start to be a non-greedy match for non-whitespace ( \S+? ) rather than matching anything and everything ( .* )
You've got a couple regexes now which will do what you want, so that's adequately covered.
What hasn't been mentioned is why your attempt won't work: Inside a character class, $ (as well as ^, ., and /) has no special meaning, so [/$] matches either a literal / or a literal $ rather than terminating the regex (/) or matching end-of-line ($).
/(.+)/(\d{4}-\d{2}-\d{2})-(\d+)(/.*)?$
1st Capturing Group (.+)
.+ matches any character (except for line terminators)
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
2nd Capturing Group (\d{4}-\d{2}-\d{2})
\d{4} matches a digit (equal to [0-9])
{4} Quantifier — Matches exactly 4 times
- matches the character - literally (case sensitive)
\d{2} matches a digit (equal to [0-9])
{2} Quantifier — Matches exactly 2 times
- matches the character - literally (case sensitive)
\d{2} matches a digit (equal to [0-9])
{2} Quantifier — Matches exactly 2 times
- matches the character - literally (case sensitive)
3rd Capturing Group (\d+)
\d+ matches a digit (equal to [0-9])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
4th Capturing Group (.*)?
? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)
.* matches any character (except for line terminators)
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of the string
In Ruby and Bash, you can use $ inside parentheses.
/(\S+?)/(\d{4}-\d{2}-\d{2})-(\d+)(/|$)
(This solution is similar to Pete Boughton's, but preserves the usage of $, which means end of line, rather than using \z, which means end of string.)