Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
What does this ((\d\d\d)\s)? regex match?
\d matches digits; exactly which characters count as digits depends on the language you are using.
In Python 3, [0-9] matches only the characters 0123456789, while \d (in a str pattern without the re.ASCII flag) matches [0-9] and other Unicode digit characters, for example the Eastern Arabic numerals ٠١٢٣٤٥٦٧٨٩.
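A minimal sketch of that difference, using only Python's standard re module (the variable names are illustrative):

```python
import re

eastern_arabic = "٠١٢٣٤٥٦٧٨٩"

# In a str pattern, \d matches any Unicode decimal digit by default.
unicode_digits = re.fullmatch(r"\d+", eastern_arabic)

# The explicit class [0-9] only ever matches the ten ASCII digits.
ascii_class = re.fullmatch(r"[0-9]+", eastern_arabic)

# With re.ASCII, \d is restricted to [0-9] as well.
ascii_flag = re.fullmatch(r"\d+", eastern_arabic, flags=re.ASCII)

print(unicode_digits is not None)  # True
print(ascii_class is None)         # True
print(ascii_flag is None)          # True
```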
\s matches any whitespace character
\d matches digits from [0-9].
\s matches white-space characters like [ \t\n\r]
? means optional: the element before it (here the whole group) may match once or not at all.
() are used for grouping.
Now the question is what does ((\d\d\d)\s)? match?
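Groups are numbered by their opening parenthesis, so the outer group ((\d\d\d)\s) is $1: it matches 3 consecutive digits followed by one whitespace character.
The inner group (\d\d\d) is $2: it captures just the 3 digits.
Since we have ? at the end of the regex, the whole group is optional: the regex matches 3 digits followed by whitespace, and it also succeeds when they are not present.
In that case the match is empty (zero-length) at the position where it was attempted, for example the start of the line, and both groups are unset.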
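A quick way to check the group numbering and the empty match is to run the pattern in Python. This is only a sketch; Python writes $1 and $2 as group(1) and group(2), and the sample strings are made up:

```python
import re

pattern = re.compile(r"((\d\d\d)\s)?")

# Three digits followed by whitespace: both groups are set.
with_digits = pattern.match("123 abc")
print(repr(with_digits.group(0)))  # '123 '
print(repr(with_digits.group(1)))  # '123 '  (outer group, digits plus whitespace)
print(repr(with_digits.group(2)))  # '123'   (inner group, digits only)

# No digits at the start: the optional group is skipped, the match is empty.
without_digits = pattern.match("abc")
print(repr(without_digits.group(0)))  # ''
print(without_digits.group(1))        # None
```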
The regex expression :
The first backslash escapes the open parenthesis that follows, as it is a special character, so the regex will search for an open and a close parenthesis in the input string
Example : (111)
have a look at this site
https://regex101.com/r/yS5fU8/2
1st Capturing Group (\d\d\d)
\d matches a digit (equal to [0-9])
\d matches a digit (equal to [0-9])
\d matches a digit (equal to [0-9])
and
- \s matches any whitespace character (equal to [\r\n\t\f\v ])
This question already has answers here:
Greedy vs. Reluctant vs. Possessive Qualifiers
(7 answers)
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
I'm trying to understand the behavior of regex when using \d and \w consecutively to match words and numbers in a sentence. I searched for similar questions but I couldn't find a good match (please let me know if this is somehow duplicate).
# Example sentence
"Adam has 100 friends. Bill has 23 friends. Cindy has 5 friends."
When I use regex [A-Za-z]+\s\w+\s\d+\w, it returns matches for:
Adam has 100
Bill has 23
BUT NOT FOR
Cindy has 5
I would have expected no matches at all since the greedily searched digits (\d+) are not followed by any word character (\w); they are followed by a white space instead. I think, somehow \w is matching digits following the first occurrence of any digit. I thought \d+ would have exhausted the stretch of digits in the search. Can you help me understand what is going on here?
Thanks
I thought \d+ would have exhausted the stretch of digits in the search
No, that is not the case. \d+ first matches as many digits as it can, but the following \w (which also matches digits, i.e. [a-zA-Z_0-9]) then has nothing left to match, so the regex engine backtracks \d+ by one position to let \w match the last digit.
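The backtracking becomes visible if \d+ and \w are each wrapped in a capturing group. The groups are added here only for illustration and are not part of the original pattern:

```python
import re

sentence = "Adam has 100 friends. Bill has 23 friends. Cindy has 5 friends."

# Same pattern as in the question, with \d+ and \w captured separately.
pattern = re.compile(r"[A-Za-z]+\s\w+\s(\d+)(\w)")

splits = [(m.group(0), m.group(1), m.group(2)) for m in pattern.finditer(sentence)]
for whole, digits, word_char in splits:
    print(whole, "| \\d+ kept:", digits, "| \\w took:", word_char)
# Adam has 100 | \d+ kept: 10 | \w took: 0
# Bill has 23 | \d+ kept: 2 | \w took: 3
```

"Cindy has 5" is missing because \d+ needs at least one digit, so there is nothing it can give back to \w.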
If you don't want this backtracking to happen then use possessive quantifier ++:
[A-Za-z]+\s\w+\s\d++\w
However, note that the \d++\w pattern will fail for all 3 cases, because \d++ won't backtrack and \w is never left a digit to match.
This pattern succeeds only if a non-digit word character follows the digits, as in Chapter is 23A.
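Python's re module only accepts the ++ syntax from version 3.11 on, so the sketch below swaps in a well-known emulation instead: a lookahead captures the whole digit run and a backreference consumes it, which leaves nothing to backtrack into. This substitution is an assumption made for portability, not the pattern from the answer:

```python
import re

sentence = "Adam has 100 friends. Bill has 23 friends. Cindy has 5 friends."

# (?=(\d+))\1 behaves like \d++ : the lookahead grabs every digit and,
# once it has succeeded, the engine never retries it with fewer digits.
possessive_like = re.compile(r"[A-Za-z]+\s\w+\s(?=(\d+))\1\w")

in_sentence = [m.group(0) for m in possessive_like.finditer(sentence)]
in_chapter = [m.group(0) for m in possessive_like.finditer("Chapter is 23A")]

print(in_sentence)  # []
print(in_chapter)   # ['Chapter is 23A']
```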
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
I want regex to parse for four digits, with only one comma or nothing after those digits being considered valid.
Valid examples:
1970
1970 hello
1970, hello
hello ,1970
Invalid examples:
1970hello
1970,hello
1970,,
hello,1970
I only want the digits (e.g. 1970) to actually be parsed.
I currently have: (?<![^\s,])(\d{4})(?![^\s,]), but that matches with the bottom three invalid strings. Any ideas?
If you want only one comma or nothing after the 4 digits, you could use a positive lookahead (?=,?(?!\S)): it matches an optional comma and then uses a nested negative lookahead to assert that what follows is not a non-whitespace char.
If what comes before the 4 digits may be a comma, but only when no non-whitespace char precedes that comma, you can use a negative lookbehind (?<!\S\S) to exclude 2 consecutive non-whitespace chars.
You also need to rule out any char other than a comma or whitespace directly before the digits, using (?<![^,\s]), so that for example $1970 is not matched.
(?<!\S\S)(?<![^,\s])\d{4}(?=,?(?!\S))
(?<! Negative lookbehind, assert what is on the left is not
\S\S Match 2 consecutive non whitespace chars
) Close lookbehind
(?<! Negative lookbehind, assert what is on the left is not
[^,\s] Match any char except , or a whitespace char
) Close lookbehind
\d{4} Match 4 digits
(?= Positive lookahead, assert what is on the right is
,?(?!\S) Match an optional , not followed by a non whitespace char
) Close lookahead
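As a sanity check, the pattern can be run against the examples from the question. This sketch uses Python's re module, which accepts these fixed-width lookbehinds; the helper name find_year is made up for the example:

```python
import re

pattern = re.compile(r"(?<!\S\S)(?<![^,\s])\d{4}(?=,?(?!\S))")

valid = ["1970", "1970 hello", "1970, hello", "hello ,1970"]
invalid = ["1970hello", "1970,hello", "1970,,", "hello,1970"]


def find_year(text):
    """Return the matched digits, or None when the pattern does not match."""
    m = pattern.search(text)
    return m.group(0) if m else None


valid_results = [find_year(s) for s in valid]
invalid_results = [find_year(s) for s in invalid]

print(valid_results)    # ['1970', '1970', '1970', '1970']
print(invalid_results)  # [None, None, None, None]
print(find_year("$1970"))  # None
```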
Note that if you need the match only you can omit the capturing group.
Your articulation of the conditions doesn't seem to agree with your examples. One possible and plausible generalization from your examples would be to require the number to be space-separated, but permit a single adjacent comma before or after the number. Of course, we can't really know if that's what you actually mean.
(?:(?:^|\s),?)(\d{4})(?=,?(?:\s|$))
The capturing parentheses contain the number; there will be a non-capturing match before it.
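A short sketch of how that looks in Python, assuming the same example strings as in the question; group(1) holds the number, while group(0) also includes the non-capturing prefix:

```python
import re

pattern = re.compile(r"(?:(?:^|\s),?)(\d{4})(?=,?(?:\s|$))")

valid = ["1970", "1970 hello", "1970, hello", "hello ,1970"]
invalid = ["1970hello", "1970,hello", "1970,,", "hello,1970"]

# group(1) is the number itself for every valid example.
captured = [pattern.search(s).group(1) for s in valid]
rejected = [pattern.search(s) is None for s in invalid]

# The overall match still contains the leading whitespace and comma.
prefix_match = pattern.search("hello ,1970")

print(captured)                     # ['1970', '1970', '1970', '1970']
print(rejected)                     # [True, True, True, True]
print(repr(prefix_match.group(0)))  # ' ,1970'
```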
The comma is the problem. Use (?<!\S\S)(\d{4})(?!\S\S) so that the invalid examples no longer match.
This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 5 years ago.
Beginner here and I'm trying to understand this. Can someone please break down the part in between the single quotes and describe what it does?
grep -oP '(?<=\S\/1\.\d.\s)[345]\d+'
Many thanks in advance!
Positive Lookbehind (?<=\S\/1\.\d.\s) Assert that the Regex below matches
\S matches any non-whitespace character (equal to [^\r\n\t\f\v ])
\/ matches the character / literally (case sensitive)
1 matches the character 1 literally (case sensitive)
\. matches the character . literally (case sensitive)
\d matches a digit (equal to [0-9])
. matches any character (except for line terminators)
\s matches any whitespace character (equal to [\r\n\t\f\v ])
Match a single character present in the list below [345]
345 matches a single character in the list 345 (case sensitive)
\d+ matches a digit (equal to [0-9])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
Output simply copied from https://regex101.com/r/HfJSNm/1 : very handy for testing and sharing regexes and getting automatic explanations of them.
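To see the pattern at work, here is a sketch in Python. The two log lines are made-up input in the style of a web server access log, which is only a guess at what the grep command was meant to process:

```python
import re

# Same pattern as in the grep command; the lookbehind has a fixed width
# of 7 characters, which Python's re module requires.
pattern = re.compile(r"(?<=\S\/1\.\d.\s)[345]\d+")

# Hypothetical input lines, not taken from the question.
not_found = '10.0.0.1 "GET /index.html HTTP/1.1" 404 512'
success = '10.0.0.1 "GET /index.html HTTP/1.1" 200 1024'

not_found_hits = pattern.findall(not_found)
success_hits = pattern.findall(success)

print(not_found_hits)  # ['404']
print(success_hits)    # []
```

Only numbers starting with 3, 4 or 5 that directly follow something like HTTP/1.1" and a space are reported, so 200 and the byte counts are skipped.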
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
How can I remove all spaces inside square brackets with Notepad++ and RegEx?
For example:
I have the string [Word1 Word2 Word3]
I need: [Word1Word2Word3]
Thanks
\s++(?=[^[]*])
\s++
matches any whitespace character (equal to [\r\n\t\f\v ])
++ Quantifier — Matches between one and unlimited times, as many times as possible, without giving back (possessive)
Positive Lookahead (?=[^[]*])
Assert that the Regex below matches
Match a single character not present in the list below [^[]*
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
[ matches the character [ literally (case sensitive)
] matches the character ] literally (case sensitive)
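Outside Notepad++, the same lookahead idea can be tried in Python. In this sketch the possessive \s++ is swapped for a plain \s+ (an assumption, so that it also runs on Python versions before 3.11) and the [ inside the character class is escaped; for this replacement the result is the same:

```python
import re

# Whitespace that is followed by a ] with no [ in between.
pattern = re.compile(r"\s+(?=[^\[]*\])")

inside_only = pattern.sub("", "[Word1 Word2 Word3]")
mixed = pattern.sub("", "a b [c d] e f")

print(inside_only)  # [Word1Word2Word3]
print(mixed)        # a b [cd] e f
```

Spaces outside the brackets are kept because the lookahead cannot reach a ] without first crossing a [ or running out of text.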
(?:\[|\G(?!^))[^]\s]*\K\s+
Non-capturing group (?:\[|\G(?!^))
1st Alternative \[
\[ matches the character [ literally (case sensitive)
2nd Alternative \G(?!^)
\G asserts position at the end of the previous match or the start of the string for the first match
Negative Lookahead (?!^)
Assert that the Regex below does not match
^ asserts position at start of the string
Match a single character not present in the list below [^]\s]*
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
] matches the character ] literally (case sensitive)
\s matches any whitespace character (equal to [\r\n\t\f\v ])
\K resets the starting point of the reported match. Any previously consumed characters are no longer included in the final match
\s+
matches any whitespace character (equal to [\r\n\t\f\v ])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 8 years ago.
Can someone please explain what this regexp matches?
#\b(https://exampleurl.com/)([^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))#
I have no experience with regexp and I need to know what this one does.
Try it with this link. It explains everything:
/[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|))/
[^\s()<>]+ match a single character not present in the list below
Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\s match any white space character [\r\n\t\f ]
()<> a single character in the list ()<> literally (case sensitive)
(?:\([\w\d]+\)|([^[:punct:]\s]|)) Non-capturing group
1st Alternative: \([\w\d]+\)
\( matches the character ( literally
[\w\d]+ match a single character present in the list below
Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\w match any word character [a-zA-Z0-9_]
\d match a digit [0-9]
\) matches the character ) literally
2nd Alternative: ([^[:punct:]\s]|)
1st Capturing group ([^[:punct:]\s]|)
1st Alternative: [^[:punct:]\s]
[^[:punct:]\s] match a single character not present in the list below
[:punct:] matches punctuation characters [POSIX]
\s match any white space character [\r\n\t\f ]
2nd Alternative: (null, matches any position)
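To see what the whole expression from the question does in practice, here is a rough Python sketch. Python's re module has no [:punct:] class, so the sketch substitutes the ASCII punctuation from string.punctuation; that substitution, the # delimiters being dropped, and the sample texts are all assumptions made for illustration:

```python
import re
import string

# Stand-in for [:punct:] : ASCII punctuation only, escaped for use in a class.
punct = re.escape(string.punctuation)

pattern = re.compile(
    rf"\b(https://exampleurl.com/)([^\s()<>]+(?:\([\w\d]+\)|([^{punct}\s]|/)))"
)

# A trailing comma is not part of the match, because the last character
# must be non-punctuation, a slash, or a parenthesised word.
plain = pattern.search("see https://exampleurl.com/path/page.html, ok")
with_parens = pattern.search("see https://exampleurl.com/wiki/Foo(bar) ok")

print(plain.group(0))        # https://exampleurl.com/path/page.html
print(with_parens.group(0))  # https://exampleurl.com/wiki/Foo(bar)
```

In short, the regex finds URLs under https://exampleurl.com/ and avoids swallowing punctuation that merely follows the URL in running text.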