Regex matching a word - regex

I need to match only the first two files out of the four listed below:
ABD_DEF_GHIJ_20150611
ABD_DEF_GHIJ
ABD_DEF_GHIJ_FX_20150611
ABD_DEF_GHIJ_FX
I am using the regex ABD_DEF_GHIJ(_\d{8}|\b) and it works fine. I would like to know whether my solution is OK or whether there is a better alternative.

You could use a negative lookahead, which excludes names that have _FX after the initial alphabetic string:
^ABD_DEF_GHIJ(?!_FX)(?:_\d{8})?$

Use anchors and make the numeric part optional:
^ABD_DEF_GHIJ(?:_\d{8})?$

It seems you don't want to include files with FX, so use a negative lookahead. You can also append the optional (_\d{8})? if you think it's necessary:
^ABD_DEF_GHIJ(?!_FX)

Try this RegEx:
ABD_DEF_GHIJ(?!_FX_?)(_\d{8})?
This also works:
\bABD_DEF_GHIJ(?!_FX_?)(_\d{8}|\b)
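A quick way to compare the asker's pattern with the anchored one is to run both over the four names. This is only a sketch in Python's re module; the extra name with a 9-digit suffix is a hypothetical input, not one from the question. It shows where the unanchored pattern is looser.

```python
import re

# The four file names from the question.
names = [
    "ABD_DEF_GHIJ_20150611",
    "ABD_DEF_GHIJ",
    "ABD_DEF_GHIJ_FX_20150611",
    "ABD_DEF_GHIJ_FX",
]

# Hypothetical extra name (not in the question) with a 9-digit suffix.
extra = "ABD_DEF_GHIJ_201506119"

original = re.compile(r"ABD_DEF_GHIJ(_\d{8}|\b)")    # the asker's pattern
anchored = re.compile(r"^ABD_DEF_GHIJ(?:_\d{8})?$")  # anchored alternative

matched_original = [n for n in names if original.search(n)]
matched_anchored = [n for n in names if anchored.search(n)]

print(matched_original)
print(matched_anchored)

# The unanchored pattern accepts the first 8 digits of the longer suffix;
# the anchored one rejects the name because of the trailing digit.
print(bool(original.search(extra)), bool(anchored.search(extra)))
```

Both patterns select the first two names, so the original solution is fine for this exact list. The anchors only matter if other names can contain the prefix followed by unexpected text.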

Related

Regex to exclude a string from anywhere, but match another expression

I'm quite new to regex. I've looked at other questions but still can't work out how to handle my scenario. I want to match a string that starts with "AB" but not "ABC", or a string that contains "DE" but not "DEF". For example, sDEN23, DET and DE should match, and AB3 should match. I've tried the pattern below, but it doesn't work as expected. Could someone please help? Many thanks.
Edited: How can this be achieved without lookahead and lookbehind, since these are not supported by Impala?
.*AB?[^C].*|.*DE?[^DEF]
You can use a negative lookahead pattern to avoid matches followed by certain characters:
^(?:AB(?!C)|(?!.*DEF).*DE).*
Demo: https://regex101.com/r/DH1WTf/3
EDIT: Since you've updated the question by replacing the python tag with impala, whose regex engine does not support lookarounds, you can instead use multiple LIKE operators to achieve what you want:
SELECT * FROM table_name WHERE (col LIKE 'AB%' OR col LIKE '%DE%') AND NOT (col LIKE 'ABC%' OR col LIKE '%DEF%')
Try with
(AB)([^C^\n\r])+|(DE)([^F^\n\r])+
Use a negative look ahead for your restrictions, and a match for either of your targets:
^(?!ABC|DEF)(AB|DE).*
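The lookahead pattern from the first answer can be checked against the examples in the question. The sketch below uses Python's re module (the question was originally tagged python); the samples "ABC" and "xDEF" are hypothetical non-matches added for illustration. It also runs the last pattern above, which only accepts strings that begin with AB or DE.

```python
import re

# Pattern from the first answer: starts with AB not followed by C,
# or contains DE while containing no DEF anywhere.
pattern = re.compile(r"^(?:AB(?!C)|(?!.*DEF).*DE).*")

# First four samples come from the question; "ABC" and "xDEF" are
# hypothetical inputs that should be rejected.
samples = ["sDEN23", "DET", "DE", "AB3", "ABC", "xDEF"]
results = {s: bool(pattern.match(s)) for s in samples}
print(results)

# Pattern from the last answer: anchored at the start, so "DE" must be
# the first two characters for the second alternative to apply.
prefix_only = re.compile(r"^(?!ABC|DEF)(AB|DE).*")
print(bool(prefix_only.match("sDEN23")))
```

Because the second pattern requires AB or DE at the very start, it does not cover the "contains DE" case such as sDEN23.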

Regex to match path containing one of two strings

I need a regex that matches one of two strings in the third path segment, i.e., in pseudocode:
/content/au/(boomer or millenial)/...
Example matches
/content/au/boomer
/content/au/boomer/male/31
/content/au/millenial/female/29/M
/content/au/millenial/male/18/UM
Example non-matches
/content/au
/content/nz/millenial/male/18/UM
/content/au/genz/male
I've tried this, but to no avail:
^/content/au/(?![^/]*/(?:millenial|boomer))([^/]*)
Don't use a look ahead; just use the plain alternation millenial|boomer then a word-boundary:
^/content/au/(?:millenial|boomer)\b(?:/.*)?
You should probably spell millennial correctly too (two "n"s, not one).
What's with the negative lookahead? This is a simple, if not trivial, positive match.
^/content/au/(?:millenial|boomer)(?:/|$)
The final group says the match needs to be followed by a slash or nothing, so as to exclude paths which begin with one of the alternatives, but contain additional text.
You can use the following regex:
content/au/(?:boomer|millenial)
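The anchored pattern with the trailing (?:/|$) group can be run against the example paths from the question. This is a minimal sketch using Python's re module; "/content/au/boomerang" is a hypothetical path added to show why the trailing group is there.

```python
import re

path_re = re.compile(r"^/content/au/(?:millenial|boomer)(?:/|$)")

# Example paths taken from the question.
should_match = [
    "/content/au/boomer",
    "/content/au/boomer/male/31",
    "/content/au/millenial/female/29/M",
    "/content/au/millenial/male/18/UM",
]
should_not_match = [
    "/content/au",
    "/content/nz/millenial/male/18/UM",
    "/content/au/genz/male",
    "/content/au/boomerang",  # hypothetical: segment merely starts with "boomer"
]

hits = [p for p in should_match if path_re.search(p)]
misses = [p for p in should_not_match if not path_re.search(p)]
print(len(hits), len(misses))
```

The unanchored pattern in the last answer would also accept the hypothetical "/content/au/boomerang", since nothing constrains what follows the alternation.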

Regex Pattern to extract url links from two string

I have two strings that contain shortened URLs, and I want a regex pattern to extract them:
https://l.facebook.com/l.php?u=http%3A%2F%2Febay.to%2F2EyH7Nq&h=ATNHM5kACc4rh_z68Ytw__cNCzJ63_iPezd_whc0PjcsN4qj1PfdJgFXyrOKM3-biqPm7eAXTOc5LD6r-7JXhRsqsqEHUs0jaJkjvm_QWf6sqbHQmS63q6P0_NcQoUa86Az5EttNT9xJb_evKBaiFxW7e7v2afJQn2zNxz5lQ8xgxhMcEFuJ3hUiSYUMEemKFB2LSIgAZFibRv4GeRrTk8hxFaArkBuAhQaXQFd4jX-aQuUYhjD0ErV5FY-D4gFMpb0lFCU7SyBlRpkUuOcHVjwjxN-_g6reMYwo8loAJnJD
/redirect?q=http%3A%2F%2Fgoo.gl%2FIW7ct&redir_token=PV5sR8F7GuXT9PgPO_nkBFLABQx8MTUxNjA3OTY5MEAxNTE1OTkzMjkw&v=7wmIyD1fM4M&event=video_description
The expected output from the first and second links is:
http%3A%2F%2Febay.to%2F2EyH7Nq
http%3A%2F%2Fgoo.gl%2FIW7ct
Please help me out.
I have already tried:
(http|https).*?&
but it does not work on the first URL.
You can try this:
=(https?[^&]*)
If lookbehind is available in your flavour of regex, you can also try this, which keeps the equals sign out of the match:
(?<==)(https?[^&]*)
Try this regex:
http%3A%2F%2F(.*)%2F(.*[^&])(?=&)
You can use this pattern to only capture goo.gl and ebay.to links:
(http%3A%2F%2F(ebay\.to|goo\.gl)%2F[^&]*)&
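As a sketch of how the =(https?[^&]*) pattern behaves, the snippet below applies it in Python to both inputs. The long token values are shortened here for readability, which is an assumption of this example and does not affect the match. The decoding step with urllib.parse.unquote is an optional extra that the question did not ask for.

```python
import re
from urllib.parse import unquote

# Inputs from the question, with the long token parameters shortened.
inputs = [
    "https://l.facebook.com/l.php?u=http%3A%2F%2Febay.to%2F2EyH7Nq&h=ATNHM5kACc4rh",
    "/redirect?q=http%3A%2F%2Fgoo.gl%2FIW7ct&redir_token=PV5sR8F7&v=7wmIyD1fM4M",
]

# Capture everything after "=" that starts with http/https, up to the next "&".
pattern = re.compile(r"=(https?[^&]*)")

encoded = [pattern.search(s).group(1) for s in inputs]
print(encoded)

# Optional: percent-decode the captured values.
decoded = [unquote(e) for e in encoded]
print(decoded)
```

The asker's (http|https).*?& fails on the first input because the string itself begins with https, so the match starts at the outer Facebook URL. Requiring a preceding "=" skips past it.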

Regex: Find a string without a particular character in a particular spot

I am trying to find a single-line string using the following regex:
=949.+\$h[^1]\$.+
However, the \$h[^1]\$ string can contain 10 and 11 in addition to a 1. I want to find the tens and elevens but NOT the ones.
So I want to find:
$h10 and $h11 but NOT $h1
Thoughts?
To clarify, I'd like it to find 2, 3, 4, 5, 6, 7, 8, 9, 10 and 11, but not 1.
You can use this:
\$h(2|3|4|5|6|7|8|9|10|11)
DEMO:
http://regex101.com/r/lX4mS5
If you want to match everything except $h1, you can use a negative look-behind, like this:
\$h[\d]{1,2}(?<!(\$h1))
DEMO:
http://regex101.com/r/bS8mA2
Another option (without looking around):
\$h([2-9]|\d{2,})
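The alternation-based pattern can be tried on a few subfield fragments. This is a small sketch in Python; the sample strings are hypothetical and simply follow the $hN$ shape described in the question.

```python
import re

# Either a single digit 2-9, or any run of two or more digits.
pattern = re.compile(r"\$h([2-9]|\d{2,})")

# Hypothetical fragments shaped like the ones in the question.
samples = ["$h1$", "$h2$", "$h9$", "$h10$", "$h11$"]

found = {}
for s in samples:
    m = pattern.search(s)
    found[s] = m.group(1) if m else None

print(found)
```

Note that \d{2,} accepts any number with two or more digits, not just 10 and 11. If only those two are allowed, the explicit alternation from the first answer is the stricter choice.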

Regular Expression Filter for values in brackets

I've been trying to get the correct filter for:
{0}{1/2}{R/G}{X}{Y}{Z}{R}{R}
I've tried this on rubular.com (http://rubular.com/r/niCiKoUfmN):
\{([0-Z])\}
I get:
{0}{X}{Y}{Z}{R}{R}
But I do not get:
{1/2}{R/G}
How can I write the regular expression so it gets all of it?
\{(\w)(?:\/(\w))?\}
A more radical approach is to use a negated character class containing the character you want to avoid:
\{([^}]*)\}
[^}] means all characters except }
* means zero or more times
You don't have the slash (/) in your character class. You also have to add a quantifier to tell the parser that more than one character is allowed between the braces:
\{([0-Z/]+)\}
You can do so by adding an optional /[0-Z]
Which will give you:
\{([0-Z](\/[0-Z])?)\}
Rubular: http://rubular.com/r/3D0VPCaJX7
This should do it:
\{[0-Z\/]+\}
You don't need the parentheses unless you're wanting to use a subset of the match for something else.
You need to allow zero or more occurrences of the / part:
\{([0-Z][\/0-Z]*)\}
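To see the effect on the input from the question, the sketch below runs the asker's single-character pattern, the negated character class, and the [0-Z/]+ variant through Python's re.findall. Rubular uses Ruby's engine, so treating Python as equivalent here is an assumption, though these particular constructs are common to both.

```python
import re

text = "{0}{1/2}{R/G}{X}{Y}{Z}{R}{R}"

# The asker's pattern: exactly one character in the range 0-Z.
single = re.findall(r"\{([0-Z])\}", text)

# Negated class: anything up to the closing brace.
negated = re.findall(r"\{([^}]*)\}", text)

# Explicit class: one or more characters from 0-Z or a slash.
explicit = re.findall(r"\{([0-Z/]+)\}", text)

print(single)
print(negated)
print(explicit)
```

The range 0-Z spans the ASCII characters between the digits and the uppercase letters, so it also admits : ; < = > ? and @. The \w-based answer avoids that, at the cost of also accepting lowercase letters and underscores.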