I'm trying to write a regex that picks out all words made of a-z, with or without the ' symbol.
the word needs to be at least 2 characters long
it can't start with the ' symbol
two ' symbols can't be next to each other
and "two character" words can't end with the ' symbol
I have been working on this regex for hours and I can't make it work:
/\b[a-z]([a-z(\')](?!\1))+\b/
It does not work and I don't know why (the check for two ' symbols next to each other is the part that fails).
Any ideas?
([a-z](?:[a-z]|'(?!'))+[a-z']|[a-z]{2})
Live demo at RegExPal
You probably do not need \b, because the quantifiers are greedy and will consume each word as a whole.
This version can't be tested with RegexPal (it does not support lookbehind), but it has custom word borders:
(?<![a-z'])([a-z](?:[a-z]|'(?!'))+[a-z']|[a-z]{2})(?![a-z'])
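As a quick check of the lookaround-delimited pattern above, here is a minimal Python sketch. The sample sentence and the name WORD are made up for illustration, and Python's re module is only one engine that happens to accept this single-character lookbehind.

```python
import re

# Pattern from the answer above; the lookarounds act as custom word borders.
WORD = re.compile(r"(?<![a-z'])([a-z](?:[a-z]|'(?!'))+[a-z']|[a-z]{2})(?![a-z'])")

# Hypothetical sample input, not taken from the question.
sample = "ab abc a' a''b don't"

# findall returns the content of the single capture group for each match.
matches = WORD.findall(sample)
print(matches)  # ['ab', 'abc', "don't"]
```

Here a' fails the length rule and a''b fails the double-quote rule, so neither is picked up.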
This should work (disclaimer: untested)
/\b(?![a-z]{2}'\b)[a-z]((?!'')['a-z])+\b/
Yours does not because you are attempting to nest a parenthesized expression inside a character class. That only adds ( and ) to the class as literal characters; it does not create a capture group there, so it will not set the value of your following \1.
(Edit) Added the constraint on aa'.
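To illustrate the point about parentheses inside a character class, here is a small Python sketch. It only tests the bracket expression from the question on its own; the variable name is made up.

```python
import re

# Inside [...], "(" and ")" are ordinary characters, not a group.
char_class = re.compile(r"[a-z(\')]")

matches_paren = bool(char_class.fullmatch("("))   # "(" is a member of the class
matches_quote = bool(char_class.fullmatch("'"))   # so is the quote
group_count = char_class.groups                   # number of capture groups defined

print(matches_paren, matches_quote, group_count)  # True True 0
```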
Assuming words are delimited by spaces:
(?:^|\s)((?:[a-z]{2})|(?:[a-z](?!.*'')[a-z']{2,}))(?:$|\s)
In action in a Perl script:
use strict;
use warnings;
use feature 'say';

my $re = qr/(?:^|\s)((?:[a-z]{2})|(?:[a-z](?!.*'')[a-z']{2,}))(?:$|\s)/;
while (<DATA>) {
    chomp;
    say(/$re/ ? "OK: $_" : "KO: $_");
}
__DATA__
ab
abc
a'
ab''
abc'
a''b
:!ù
output:
OK: ab
OK: abc
KO: a'
KO: ab''
OK: abc'
KO: a''b
KO: :!ù
Explanation:
The regular expression (shown here with \b word boundaries in place of the (?:^|\s) and (?:$|\s) delimiters):
(?-imsx:\b((?:[a-z]{2})|(?:[a-z](?!.*'')[a-z']{2,}))\b)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  \b                     the boundary between a word char (\w) and
                         something that is not a word char
----------------------------------------------------------------------
  (                      group and capture to \1:
----------------------------------------------------------------------
    (?:                  group, but do not capture:
----------------------------------------------------------------------
      [a-z]{2}           any character of: 'a' to 'z' (2 times)
----------------------------------------------------------------------
    )                    end of grouping
----------------------------------------------------------------------
   |                     OR
----------------------------------------------------------------------
    (?:                  group, but do not capture:
----------------------------------------------------------------------
      [a-z]              any character of: 'a' to 'z'
----------------------------------------------------------------------
      (?!                look ahead to see if there is not:
----------------------------------------------------------------------
        .*               any character except \n (0 or more
                         times (matching the most amount
                         possible))
----------------------------------------------------------------------
        ''               '\'\''
----------------------------------------------------------------------
      )                  end of look-ahead
----------------------------------------------------------------------
      [a-z']{2,}         any character of: 'a' to 'z', ''' (at
                         least 2 times (matching the most
                         amount possible))
----------------------------------------------------------------------
    )                    end of grouping
----------------------------------------------------------------------
  )                      end of \1
----------------------------------------------------------------------
  \b                     the boundary between a word char (\w) and
                         something that is not a word char
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
Related
I'm trying to override Parsedown's markup to only allow <h2> headings.
What regex would escape all heading types except <h2>?
#Heading -> \#Heading
##Heading -> ##Heading
###Heading -> \###Heading
####Heading -> \####Heading
#####Heading -> \#####Heading
######Heading -> \######Heading
You can use this regex
^(?!##\w)(?=#)
Regex Breakdown
^ #Start of string (or of each line with the multiline flag)
(?! #Negative lookahead: the match fails if what follows matches this
##\w #two # followed by a word character
)
(?= #Positive lookahead
# #the next character must be a #
)
NOTE
\w denotes any character from [A-Za-z0-9_].
[..] denotes a character class: it matches any single character (not a string) listed inside it.
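Since the pattern is zero-width, the escaping happens by inserting a backslash at the matched position. A minimal Python sketch of that idea, assuming multiline mode so ^ matches at each line start (the names ESCAPE_NON_H2 and sample are made up here, and this is not Parsedown's own API):

```python
import re

# ^ matches at every line start because of re.MULTILINE.
ESCAPE_NON_H2 = re.compile(r"^(?!##\w)(?=#)", re.MULTILINE)

sample = "#Heading\n##Heading\n###Heading\n######Heading"

# The match is empty, so sub() inserts a backslash in front of the first #.
escaped = ESCAPE_NON_H2.sub(r"\\", sample)
print(escaped)
```

Only the ##Heading line is left untouched; every other heading level gets a leading backslash.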
Use look aheads for headings, but not double hashes:
^(?!##\w)(?=#+)
Description
^((?:#|#{3,})[^#])
Replace with: \$1
This regular expression matches, at the start of a line:
a single hash followed by a non-hash character, or
three or more hashes followed by a non-hash character
Example
Live Demo
https://regex101.com/r/kE4oK6/1
Sample text
#Heading
##Heading
###Heading
####Heading
#####Heading
######Heading
Sample Matches
\#Heading
##Heading
\###Heading
\####Heading
\#####Heading
\######Heading
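The same replacement can be tried outside the demo site. This is a rough Python sketch, assuming the sample lines above as input; in Python the replacement string is written \\\1 (a literal backslash followed by group 1) instead of \$1.

```python
import re

# One hash, or three or more hashes, followed by a non-hash character.
HEADING = re.compile(r"^((?:#|#{3,})[^#])", re.MULTILINE)

lines = ["#Heading", "##Heading", "###Heading",
         "####Heading", "#####Heading", "######Heading"]

# Prefix the captured text with a backslash; non-matching lines pass through.
escaped = [HEADING.sub(r"\\\1", line) for line in lines]
for line in escaped:
    print(line)
```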
Explanation
NODE                     EXPLANATION
----------------------------------------------------------------------
^                        the beginning of a "line"
----------------------------------------------------------------------
(                        group and capture to \1:
----------------------------------------------------------------------
  (?:                    group, but do not capture:
----------------------------------------------------------------------
    #                    '#'
----------------------------------------------------------------------
   |                     OR
----------------------------------------------------------------------
    #{3,}                '#' (at least 3 times (matching the
                         most amount possible))
----------------------------------------------------------------------
  )                      end of grouping
----------------------------------------------------------------------
  [^#]                   any character except: '#'
----------------------------------------------------------------------
)                        end of \1
----------------------------------------------------------------------
I need to capture all Upper and Lower case character strings other than "FOO" and "BAR". How to do this?
I tried [^(^FOO$)(^BAR$)] but it doesn't work.
Update:
Actually I'm using this in a context, I am concatenating it with another regex
["(\w)+": _this_regex_ ]
For example ["abc":FOO] shouldn't be matched
All other types say ["abc":BAZ] should match
You want a negative look ahead:
\["(\w+)"\s*:\s*(?!FOO\b|BAR\b)(\w+)]
The (\w+) are capturing groups: they store the key/value pair in variables (I guess that's what you want to do?)
(?!...) is a negative lookahead: it will cause the regex to fail if what's inside matches.
\b is a word boundary: here it makes the lookahead match (and so fail the regex) only if FOO or BAR is followed by a non-word character (so ["foo": FOOLISH] will be accepted by the regex)
\s is short for all types of whitespace (spaces, tabs, newlines etc.)
Demo: http://regex101.com/r/fM3uZ7
What you tried, [^...], is a negated character class: it matches any one character (and only one character) that is not listed inside the brackets. And keep in mind that inside a character class only ], \, ^ and - are special characters (so $ means a literal $ and so on).
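Both points can be checked with a short Python sketch. The helper name parse and the sample strings are assumptions for illustration; the pattern itself is the one from this answer.

```python
import re

# Key/value pattern with a negative lookahead that rejects the values FOO and BAR.
PAIR = re.compile(r'\["(\w+)"\s*:\s*(?!FOO\b|BAR\b)(\w+)]')

def parse(text):
    """Return (key, value) if the text matches, otherwise None."""
    m = PAIR.search(text)
    return m.groups() if m else None

samples = ['["abc":FOO]', '["abc":BAZ]', '["foo": FOOLISH]']
results = {s: parse(s) for s in samples}
print(results)

# The original attempt is a negated class: it matches single characters
# that are not one of ( ^ F O $ ) B A R.
leftovers = re.findall(r"[^(^FOO$)(^BAR$)]", "FOO BAZ")
print(leftovers)  # [' ', 'Z']
```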
Have a try with:
(?i)^(?!.*foo)(?!.*bar)
In action within a Perl script:
use strict;
use warnings;
use feature 'say';

my $re = qr~(?i)^(?!.*foo)(?!.*bar)~;
while (<DATA>) {
    chomp;
    say(/$re/ ? "OK : $_" : "KO : $_");
}
__DATA__
["abc":FOO]
["abc":BAZ]
output:
KO : ["abc":FOO]
OK : ["abc":BAZ]
Explanation:
The regular expression:
(?i)^(?!.*foo)(?!.*bar)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?i)                     set flags for this block (case-
                         insensitive) (with ^ and $ matching
                         normally) (with . not matching \n)
                         (matching whitespace and # normally)
----------------------------------------------------------------------
^                        the beginning of the string
----------------------------------------------------------------------
(?!                      look ahead to see if there is not:
----------------------------------------------------------------------
  .*                     any character except \n (0 or more times
                         (matching the most amount possible))
----------------------------------------------------------------------
  foo                    'foo'
----------------------------------------------------------------------
)                        end of look-ahead
----------------------------------------------------------------------
(?!                      look ahead to see if there is not:
----------------------------------------------------------------------
  .*                     any character except \n (0 or more times
                         (matching the most amount possible))
----------------------------------------------------------------------
  bar                    'bar'
----------------------------------------------------------------------
)                        end of look-ahead
----------------------------------------------------------------------
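One thing worth knowing about this pattern: the lookaheads scan the whole line, case-insensitively, so foo or bar anywhere in the input causes a rejection. A small Python sketch with made-up sample strings shows the effect:

```python
import re

# Same pattern as above: reject any line containing "foo" or "bar" in any case.
NO_FOO_BAR = re.compile(r"(?i)^(?!.*foo)(?!.*bar)")

samples = ['["abc":FOO]', '["abc":BAZ]', '["foo": BAZ]', '["abc":FOOLISH]']

# A match object (even for an empty match) is truthy; None means rejected.
results = {s: bool(NO_FOO_BAR.search(s)) for s in samples}
for text, ok in results.items():
    print("OK :" if ok else "KO :", text)
```

Note that ["foo": BAZ] and ["abc":FOOLISH] are both rejected here, which may or may not be what you want.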
I want any path string that leads to a file with an extension of '.log' or a path that contains the directory 'tmp' to be excluded from the match
I'm nearly there:
(?!tmp).+?\.(?!log|tmp).+
http://rubular.com/r/Ubkz7MIEGH
What I want is for
tmp/hello.jpg
to be excluded in the same way that
hello.log
hmm.tmp
are excluded.
Just try the following regex:
^(?!(?:.*log$)|tmp).*$
How about:
^(?!.*\btmp\b)(?!.+\.log\b)(.+)$
Explanation:
The regular expression:
(?-imsx:^(?!.*\btmp\b)(?!.+\.log\b)(.+)$)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                      the beginning of the string
----------------------------------------------------------------------
  (?!                    look ahead to see if there is not:
----------------------------------------------------------------------
    .*                   any character except \n (0 or more times
                         (matching the most amount possible))
----------------------------------------------------------------------
    \b                   the boundary between a word char (\w)
                         and something that is not a word char
----------------------------------------------------------------------
    tmp                  'tmp'
----------------------------------------------------------------------
    \b                   the boundary between a word char (\w)
                         and something that is not a word char
----------------------------------------------------------------------
  )                      end of look-ahead
----------------------------------------------------------------------
  (?!                    look ahead to see if there is not:
----------------------------------------------------------------------
    .+                   any character except \n (1 or more times
                         (matching the most amount possible))
----------------------------------------------------------------------
    \.                   '.'
----------------------------------------------------------------------
    log                  'log'
----------------------------------------------------------------------
    \b                   the boundary between a word char (\w)
                         and something that is not a word char
----------------------------------------------------------------------
  )                      end of look-ahead
----------------------------------------------------------------------
  (                      group and capture to \1:
----------------------------------------------------------------------
    .+                   any character except \n (1 or more times
                         (matching the most amount possible))
----------------------------------------------------------------------
  )                      end of \1
----------------------------------------------------------------------
  $                      before an optional \n, and the end of the
                         string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
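To see the two lookaheads at work on the paths from the question, here is a minimal Python sketch. The extra paths img/hello.jpg and tmpfiles/a.txt are made up to show what the \b around tmp does.

```python
import re

# Reject paths containing the word "tmp" or a ".log" extension; keep the rest.
KEEP = re.compile(r"^(?!.*\btmp\b)(?!.+\.log\b)(.+)$")

paths = ["tmp/hello.jpg", "hello.log", "hmm.tmp",
         "img/hello.jpg", "tmpfiles/a.txt"]

kept = [p for p in paths if KEEP.match(p)]
print(kept)  # ['img/hello.jpg', 'tmpfiles/a.txt']
```

tmpfiles/a.txt survives because \btmp\b only matches tmp as a whole word, not as a prefix of a longer name.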
^(?!tmp).*(?<!\.tmp|log)$
It's just a negative lookahead for the tmp prefix plus a negative lookbehind for the extensions.
I am trying to write a regular expression for a valid SharePoint folder name, which has these conditions:
Cannot begin or end with a dot,
Cannot contain consecutive dots and
Cannot contain any of the following characters: ~ " # % & * : < > ? / \ { | }.
I wrote a regex for the 1st and 3rd points:
[^\.]([^~ " # % & * : < > ? / \ { | }]+) [^\.]$
and for the second one (?!.*\.\.).*)$ but they are not working properly, and I have to integrate them into one expression.
Please help.
What about just
^\w(?:\w+\.?)*\w+$
I made a small test here
EDIT
This also works
^\w(?:\w\.?)*\w+$
How about:
/^(?!^\.)(?!.*\.$)(?!.*\.\.)(?!.*[~"#%&*:<>?\/\\{|}]+).+$/
explanation:
The regular expression:
(?-imsx:^(?!^\.)(?!.*\.$)(?!.*\.\.)(?!.*[~"#%&*:<>?/\\{|}]+).+$)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                      the beginning of the string
----------------------------------------------------------------------
  (?!                    look ahead to see if there is not:
----------------------------------------------------------------------
    ^                    the beginning of the string
----------------------------------------------------------------------
    \.                   '.'
----------------------------------------------------------------------
  )                      end of look-ahead
----------------------------------------------------------------------
  (?!                    look ahead to see if there is not:
----------------------------------------------------------------------
    .*                   any character except \n (0 or more times
                         (matching the most amount possible))
----------------------------------------------------------------------
    \.                   '.'
----------------------------------------------------------------------
    $                    before an optional \n, and the end of
                         the string
----------------------------------------------------------------------
  )                      end of look-ahead
----------------------------------------------------------------------
  (?!                    look ahead to see if there is not:
----------------------------------------------------------------------
    .*                   any character except \n (0 or more times
                         (matching the most amount possible))
----------------------------------------------------------------------
    \.                   '.'
----------------------------------------------------------------------
    \.                   '.'
----------------------------------------------------------------------
  )                      end of look-ahead
----------------------------------------------------------------------
  (?!                    look ahead to see if there is not:
----------------------------------------------------------------------
    .*                   any character except \n (0 or more times
                         (matching the most amount possible))
----------------------------------------------------------------------
    [~"#%&*:<>?/\\{|}]   any character of: '~', '"', '#', '%',
    +                    '&', '*', ':', '<', '>', '?', '/', '\\',
                         '{', '|', '}' (1 or more times (matching
                         the most amount possible))
----------------------------------------------------------------------
  )                      end of look-ahead
----------------------------------------------------------------------
  .+                     any character except \n (1 or more times
                         (matching the most amount possible))
----------------------------------------------------------------------
  $                      before an optional \n, and the end of the
                         string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
In action (Perl script):
use strict;
use warnings;
use feature 'say';

my $re = qr/^(?!^\.)(?!.*\.$)(?!.*\.\.)(?!.*[~"#%&*:<>?\/\\{|}]+).+$/;
while (<DATA>) {
    chomp;
    say /$re/ ? "OK : $_" : "KO : $_";
}
__DATA__
.abc
abc.
a..b
abc
output:
KO : .abc
KO : abc.
KO : a..b
OK : abc
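For comparison, the lookahead pattern and the shorter \w-based pattern from the other answer can be run side by side. This is only a sketch in Python with made-up folder names; it suggests the \w-based pattern is stricter than the stated rules, since it also rejects names containing spaces.

```python
import re

# Lookahead-based pattern from this answer.
LOOKAHEADS = re.compile(r'^(?!^\.)(?!.*\.$)(?!.*\.\.)(?!.*[~"#%&*:<>?/\\{|}]+).+$')
# Shorter pattern from the other answer: word characters and single dots only.
WORD_ONLY = re.compile(r"^\w(?:\w\.?)*\w+$")

names = [".abc", "abc.", "a..b", "abc", "ab.c", "my folder", "a:b"]

# Map each name to (accepted by LOOKAHEADS, accepted by WORD_ONLY).
results = {n: (bool(LOOKAHEADS.match(n)), bool(WORD_ONLY.match(n))) for n in names}
for name, (a, b) in results.items():
    print(f"{name!r:12} lookaheads={a} word_only={b}")
```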
Can anyone decode what this regular expression means in Perl:
while (/([0-9a-zA-Z\-]+(?:'[a-zA-Z0-9\-]+)*)/g)
Here is a breakdown of the regex:
( # start a capturing group (1)
[0-9a-zA-Z-]+ # one or more digits or letters or hyphens
(?: # start a non-capturing group
' # a literal single quote character
[a-zA-Z0-9-]+ # one or more digits or letters or hyphens
)* # repeat non-capturing group zero or more times
) # end of capturing group 1
The regex is in the form /.../g and in a while loop, which means that the code inside of the while will be run for each non-overlapping match of the regex.
There's a tool for that: YAPE::Regex::Explain
The regular expression:
(?-imsx:([0-9a-zA-Z\-]+(?:'[a-zA-Z0-9\-]+)*))
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                      group and capture to \1:
----------------------------------------------------------------------
    [0-9a-zA-Z\-]+       any character of: '0' to '9', 'a' to
                         'z', 'A' to 'Z', '\-' (1 or more times
                         (matching the most amount possible))
----------------------------------------------------------------------
    (?:                  group, but do not capture (0 or more
                         times (matching the most amount
                         possible)):
----------------------------------------------------------------------
      '                  '\''
----------------------------------------------------------------------
      [a-zA-Z0-9\-]+     any character of: 'a' to 'z', 'A' to
                         'Z', '0' to '9', '\-' (1 or more times
                         (matching the most amount possible))
----------------------------------------------------------------------
    )*                   end of grouping
----------------------------------------------------------------------
  )                      end of \1
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
F.J's answer is a perfect breakdown. But... he left out an important piece, which is the /g at the end. It tells the regex engine to continue from where the last match left off, so the while loop keeps iterating over the string until it reaches the point where nothing else matches.
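For readers more familiar with Python, a rough equivalent of the while (/.../g) loop is re.finditer, which also yields successive non-overlapping matches. The sample text below is made up for illustration.

```python
import re

# Same pattern as in the Perl loop: letters, digits and hyphens,
# optionally continued by ' plus more of the same.
TOKEN = re.compile(r"([0-9a-zA-Z\-]+(?:'[a-zA-Z0-9\-]+)*)")

text = "don't stop-now, it's 42"

# Each iteration resumes scanning where the previous match ended.
tokens = [m.group(1) for m in TOKEN.finditer(text)]
print(tokens)  # ["don't", 'stop-now', "it's", '42']
```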