Regex Class With Exceptions - regex

I know that the regex class \D matches "all characters that are non-numeric" but I would like to match on all characters that are non-numeric and are not / or -
How might I do this? Thanks!

You can negate character sets by putting ^ inside:
[^\d\/-]
Will match any one character, which is not a digit, forward slash or dash.

You already know how to find non-numeric characters with \D. You may lay a restriction on the \D to exclude / and - and any other non-numeric characters with a negative lookahead:
(?![\/-])\D
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
[\/-] any character of: '\/', '-'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
\D non-digits (all but 0-9)

Related

Regex - Find string that has 5 or more mentions in it

Trying to detect whether a message has 5 or more mentions in it.
For example:
#Boddy is doing great with #shirly #rebecca #jimmy and #mom
Above will count as one match. This will not count as a match:
#Boddy is doing great with #shirly #rebecca #jimmy and ...
Preferably, ##### should not count either, but not too important!
I've tried
#([^# ]+){5,}
But no luck, it highlights all 5 instead of the whole string.
Use a pattern which covers the entire string:
^.*#.*#.*#.*#.*#.*$
Demo
Assumptions:
Looking at your data you'd maybe want to assert that the '#' is preceded with either the start-line anchor or a space;
You'd like to avoid concatenated '#'s to prevent false positives.
With these in mind, maybe you could try:
^(?:[^#]*(?<!\S)#\w+){5}.*$
Seen an online demo.
^ - Start-line anchor;
(?:- Open non-capture group;
[^#]* - 0+ (Greedy) characters other than '#';
(?<!\S) - Negative lookbehind to assert position is not preceded by a non-whitespace;
#\w+ - A literal '#' with 1+ (Greedy) word-characters;
){4} - Close non-capture group and match 4 more times;
.* - Any 0+ (Greedy) characters;
$ - An end-line anchor.
You can change your pattern by adding the # after the negated character class, and also end the pattern with the negated character class.
Using the negated character class also prevents unnecessary backtracking.
If you don't need the capture group, you can use a non capture group (?:
Note that [^#]* can also match a newline
^([^#]*#){5,}[^#]*$
See a regex demo.
If you want to match mentions, ##### should not match and you don't want to match crossing newlines, you can prepend \B before the # to assert a non word boundary.
Then match at least a single char other than a whitespace char or # after matching the #.
^[^#\n\r]*(\B#[^#\s][^#\n\r]*){5,}$
See another regex demo.

regex if (text contain this text) match this

I have these two sentence
TAGGING ODP:-7.160792, 113.496069
TAGGING pel:-7.160792, 113.496069
I want to match -7.160792 part only if the full sentence contain "odp" in it.
I tried the following (?(?=odp)-\d+.\d+) but it doesn't work, i don't know why.
Any help is appreciated.
(?(?=odp)-\d+\.\d+) won't work because (?=odp) is a positive lookahead that imposes a constraint on the pattern on the right, -\d+\.\d+. Namely, it requires odp string to occur exactly at the same location where - and a number are expected.
Use
(?<=ODP:)-\d+\.\d+
ODP:(-\d+\.\d+)
If lookbehinds are supported, the first variant is more viable.
Otherwise, another option with capturing groups is good to use.
And if odp can appear anywhere, even after the number:
(?i)^(?=.*odp).*(-\d+\.\d+)
This will capture the value into a group.
EXPLANATION
--------------------------------------------------------------------------------
(?i) set flags for this block (case-
insensitive) (with ^ and $ matching
normally) (with . not matching \n)
(matching whitespace and # normally)
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
odp 'odp'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
- '-'
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
) end of \1
You can use the regex, (?i)(?<=odp:)[^,]*.
Explanation:
(?i): Case-insenstitive flag
(?<=odp:): Positive lookbehind for odp:
[^,]*: Anything but ,
👉 If you want the match to be restricted to numbers only, you can use the regex, (?i)(?<=odp:)(?:-\d+.\d+)
Explanation:
(?i): Case-insenstitive flag
(?<=odp:): Positive lookbehind for odp:
(?:: Start non capturing group
-: Literal, -
\d+: 1+ digit(s)
.\d+: . followed by 1+ digit(s)
): End non capturing group
👉 If the sign can be either + or -, you can use the regex, (?i)(?<=odp:)(?:[+-]\d+.\d+)
The pattern (?(?=odp)\-\d+\.\d+) is using a conditional (? stating in the if clause:
If what is directly to the right from the current position is odp,
then match -\d+.\d+
That can not match.
What you also could do is match odp followed by any char other than a digit using \D* and capture the digit part in a group.
\bodp\b\D*(-\d+\.\d+)\b
The pattern matches:
\bodp\b match odp between word boundaries to prevent a partial match
\D* Optionally match any char other than a digit
(-\d+\.\d+) Capture - and 1+ digits with a decimal part in group 1
\b A word boundary
Regex demo
(?<=ODP:)(-\d+.\d+)
You can try using the negative look behind.
This should solve for the code you ve provided.

Regex to pick the alias from email address

I need to identify all email addresses in a given cell enclosed in any special character, written in any number of multiple lines.
This is something that I built.
"(!\s<,;-)[a-zA-Z0-9]*#"
Is there any improvement?
The pattern (!\s<,;-)[a-zA-Z0-9]*# starts with capturing !\s<,;- literally. If you want to match 1 of the listed characters, you can use a character class [!\s<,;-] instead.
If you want to match xyz123 in xyz123#gmail.com you can use:
[a-zA-Z0-9]+(?=#)
The pattern matches
[a-zA-Z0-9]+ Match 1+ occurrences of any of the listed ranges
(?=#) Assert (not match) an # directly to the right of the current position
See a regex demo.
Use
([a-zA-Z0-9]\w*)#
See regex proof
EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
[a-zA-Z0-9] any character of: 'a' to 'z', 'A' to
'Z', '0' to '9'
--------------------------------------------------------------------------------
\w* word characters (a-z, A-Z, 0-9, _) (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
# '#'

Regex multiple lookbehinds order

I've got a regex that I want to use to match word characters after a - if there are not any proceeding word characters.
(?<!\w)(?<=-)\w+
For the string
I want -to match a word-if it has a '-' bofore it and only-if '-' is not preceded by a word character.
I would expect it to only match to. However, it actually matches to, if, and if.
Demo
If I take the positive lookbehind out
(?<!\w)-\w+
In the same string, it only matches -to as expected but I don't want the - in the match information.
Is it possible to chain positive and negative lookbehinds so they happen in order?
The pattern that you tried (?<!\w)(?<=-)\w+ makes 2 assertions the current position:
(?<!\w) is there not a word character directly to the left
(?<=-) is there a - directly to the left
This can also be written as just (?<=-)\w+ as the positive lookbehind asserts that the exact match should be at the left.
You get the matches to, if, and if because that assertion is true at multiple places.
You can use (?<=\W-) to assert what is directly to the left is a non word character \W followed by -
(?<=\W-)\w+
Regex demo
Use
(?<=\B-)\w+
See proof
Explanation
--------------------------------------------------------------------------------
(?<= look behind to see if there is:
--------------------------------------------------------------------------------
\B the boundary between two word chars (\w)
or two non-word chars (\W)
--------------------------------------------------------------------------------
- '-'
--------------------------------------------------------------------------------
) end of look-behind
--------------------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1 or
more times (matching the most amount
possible))

Trying to match what is before /../ but after / with regular expressions

I am trying to match what is before /../ but after / with a regular expressions, but I want it to look back and stop at the first /
I feel like I am close but it just looks at the first slash and then takes everything after it like... input is this:
this/is/a/./path/that/../includes/face/./stuff/../hat
and my regular expression is:
#\/(.*)\.\.\/#
matching /is/a/./path/that/../includes/face/./stuff/../ instead of just that/../ and stuff/../
How should I change my regex to make it work?
.* means "match any number of any character at all[1]". This is not what you want. You want to match any number of non-/ characters, which is written [^/]*.
Any time you are tempted to use .* or .+ in a regex, be very suspicious. Stop and ask yourself whether you really mean "any character at all[1]" or not - most of the time you don't. (And, yes, non-greedy quantifiers can help with this, but character classes are both more efficient for the regex engine to match against and more clear in their communication of your intent to human readers.)
[1] OK, OK... . isn't exactly "any character at all" - it doesn't match newline (\n) by default in most regex flavors - but close enough.
Change your pattern that only characters other than / ([^/]) get matched:
#([^/]*)/\.\./#
Alternatively, you can use a lookahead.
#(\w+)(?=/\.\./)#
Explanation
NODE EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
/ '/'
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
/ '/'
--------------------------------------------------------------------------------
) end of look-ahead
I think you're essentially right, you just need to make the match non-greedy, or change the (.*) to not allow slashes: #/([^/]*)/\.\./#
In your favourite language, do a few splits and string manipulation eg Python
>>> s="this/is/a/./path/that/../includes/face/./stuff/../hat"
>>> a=s.split("/../")[:-1] # the last item is not required.
>>> for item in a:
... print item.split("/")[-1]
...
that
stuff
In python:
>>> test = 'this/is/a/./path/that/../includes/face/./stuff/../hat'
>>> regex = re.compile(r'/\w+?/\.\./')
>>> regex.findall(me)
['/that/..', '/stuff/..']
Or if you just want the text without the slashes:
>>> regex = re.compile(r'/(\w+?)/\.\./')
>>> regex.findall(me)
['that', 'stuff']
([^/]+) will capture all the text between slashes.
([^/]+)*/\.\. matches that\.. and stuff\.. in you string of this/is/a/./path/that/../includes/face/./stuff/../hat It captures that or stuff and you can change that, obviously, by changing the placement of the capturing parens and your program logic.
You didn't state if you want to capture or just match. The regex here will only capture that last occurrence of the match (stuff) but is easily changed to return that then stuff if used global in a global match.
NODE EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1 (0 or more times
(matching the most amount possible)):
--------------------------------------------------------------------------------
[^/]+ any character except: '/' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
)* end of \1 (NOTE: because you're using a
quantifier on this capture, only the LAST
repetition of the captured pattern will be
stored in \1)
--------------------------------------------------------------------------------
/ '/'
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
\. '.'