Match any except list of values - Oracle regex

I need an Oracle regex that will match a file-name in the format ABCD_EFG_YYYYMMDD_HH(24)MISS.csv, except if the time-part is one of three specific values: 110000, 140000, or 180000.
So, for example, it should match the file name ABCD_EFG_20120925_110001.csv, but not the file name ABCD_EFG_20120925_110000.csv.
The following non-Oracle regex works:
^ABCD_EFG_[0-9]*_(?!110000|140000|180000)[0-9]*\.csv$
but I don't know how to write it as an Oracle regex.

Oracle doesn't support lookahead assertions, so you'll have to spell out all the valid matches:
^ABCD_EFG_[0-9]*_([02-9]|1[0235679]|1[148]0{0,3}[1-9])[0-9]*\.csv$
should work (assuming that the time part is always 6 digits long).
Explanation:
ABCD_EFG_ # Match ABCD_EFG_
[0-9]*_ # Match first number (date part) and _
( # Match a number that starts with
[02-9] # 0 or 2-9
| # or
1[0235679] # 1, followed by 0, 2, 3, 5, 6, 7, or 9
| # or
1[148] # 11, 14, or 18
0{0,3} # followed by up to three zeroes
[1-9] # but then one digit 1-9
) # End of alternation
[0-9]* # Fill the rest with any digits
\.csv # Match .csv (mind the backslash!)
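As a sanity check (not part of the original answer), Python's `re` module accepts both the lookahead pattern from the question and the lookahead-free pattern above, so the two can be compared on every valid HHMISS time of day. This sketch assumes a fixed date part (20120925) and assumes Oracle evaluates the lookahead-free pattern the same way Python does, which should hold because it only uses POSIX-style constructs (classes, alternation, bounded repetition).

```python
import re

# Lookahead version from the question: fine in Python/PCRE, not available in Oracle.
LOOKAHEAD = re.compile(r"^ABCD_EFG_[0-9]*_(?!110000|140000|180000)[0-9]*\.csv$")

# Lookahead-free version from the answer: spells out every allowed prefix instead.
NO_LOOKAHEAD = re.compile(
    r"^ABCD_EFG_[0-9]*_([02-9]|1[0235679]|1[148]0{0,3}[1-9])[0-9]*\.csv$"
)


def matches(pattern, name):
    """True if the file name matches the given pattern."""
    return pattern.search(name) is not None


def disagreements():
    """Return every file name (one per valid time of day) where the two patterns differ."""
    bad = []
    for h in range(24):
        for m in range(60):
            for s in range(60):
                name = f"ABCD_EFG_20120925_{h:02d}{m:02d}{s:02d}.csv"
                if matches(LOOKAHEAD, name) != matches(NO_LOOKAHEAD, name):
                    bad.append(name)
    return bad


if __name__ == "__main__":
    print("disagreements:", len(disagreements()))
```

An empty result means the hand-expanded alternation excludes exactly 110000, 140000 and 180000 for 6-digit times, which is what the lookahead did.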

Related

Extract a sub-string from a matched string

I am attempting to extract a substring from a string after matching 24 at the beginning of the string. The substring is a MAC ID starting at position 6 and running to the end of the string. I am aware that a substring method can do the job, but I am curious how to do it with a regex.
String = 2410100:80:a3:bf:72:d45
After much trial and error, this is the regex I have, which I think is convoluted:
[^24*$](?<=^\S{6}).*$
How can this regex be modified to match 24 and then extract the substring from position 6 to the end of the line?
https://regex101.com/r/vcvfMx/2
Expected Results: 00:80:a3:bf:72:d45
You can use:
(?<=^24\S{3}).*$
Here's a demo: https://regex101.com/r/HqT0RV/1/
This will get you the result you expect (i.e., 00:80:a3:bf:72:d45). However, that doesn't seem to be a valid MAC address (the trailing 5 does not appear to be part of the MAC). If so, you should use something like this instead:
(?<=^24\S{3})(?:[0-9a-f]{2}:){5}[0-9a-f]{2}
Demo: https://regex101.com/r/HqT0RV/2
Breakdown:
(?<= # Start of a positive Lookbehind.
^ # Asserts position at the beginning of the string.
24 # Matches `24` literally.
\S{3} # Matches any three non-whitespace characters.
) # End of the Lookbehind (five characters so far).
(?: # Start of a non-capturing group.
[0-9a-f] # A number between `0` and `9` or a letter between `a` and `f` (at pos. #6).
{2} # Matches the previous character class exactly two times.
: # Matches `:` literally.
) # End of the non-capturing group.
{5} # Matches the previous group exactly five times.
[0-9a-f] # Any number between `0` and `9` or any letter between `a` and `f`.
{2} # Matches the previous character class exactly two times.
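Both answers can be tried quickly with Python's `re` module, which supports fixed-width lookbehind such as `(?<=^24\S{3})`. The input is the string from the question. The last pattern is an alternative not taken from the answers above: it matches the prefix normally and captures the rest in group 1, which sidesteps lookbehind for engines that do not support it.

```python
import re

s = "2410100:80:a3:bf:72:d45"

# First answer: everything after "24" plus three more non-space characters.
rest = re.search(r"(?<=^24\S{3}).*$", s).group()

# Stricter answer: exactly six colon-separated lowercase hex pairs.
mac = re.search(r"(?<=^24\S{3})(?:[0-9a-f]{2}:){5}[0-9a-f]{2}", s).group()

# Alternative without lookbehind: consume the 5-character prefix, capture the rest.
rest_via_group = re.match(r"^24\S{3}(.*)$", s).group(1)

print(rest)
print(mac)
print(rest_via_group)
```

Note that the strict pattern stops after the sixth pair, so it returns `00:80:a3:bf:72:d4` and silently drops the trailing `5`.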

Selecting if no delimiter, and no selecting if it is

I have strings like "smth 2sg. smth", and sometimes "smth 2sg.| smth.".
What regex should I use to select "2sg." if the string does not contain "|", and to select nothing if it does contain "|"?
I have 2 methods. They both use something called a Negative Lookahead, which is used like so:
(?!data)
When this is inserted into a regex, it means that if data matches at that position, the regex will not match there.
More info on the Negative Lookahead can be found here
Method 1 (shorter)
Just capture 2sg.
Try this RegEx:
(\dsg\.)(?!\|)
Use \d+ instead of \d if the number could be longer than 1 digit
Live Demo on RegExr
How it works:
( # To capture (2sg.)
\d # Digit (2)
sg # (sg)
\. # . (Dot)
)
(?!\|) # Do not match if followed by |
Method 2 (longer but safer)
Match the whole string and capture 2sg.
Try this RegEx:
^\w+\s*(\dsg\.)(?!\|)\s*\w+\.?$
Use \d+sg instead of \dsg if the number could be longer than 1 digit
Live Demo on RegExr
How it works:
^ # String starts with ...
\w+\s* # Letters then Optional Whitespace (smth )
( # To capture (2sg.)
\d # Digit (2)
sg # (sg)
\. # . (Dot)
)
(?!\|) # Do not match if followed by |
\s* # Optional Whitespace
\w+ # Letters (smth)
\.? # Optional . (Dot)
$ # ... string ends with
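A minimal Python sketch of both methods, run against the two example strings from the question. The helper name `select` is hypothetical; it simply returns capture group 1, or `None` when nothing matches.

```python
import re

# Method 1: capture "2sg." only when the very next character is not "|".
METHOD_1 = re.compile(r"(\dsg\.)(?!\|)")

# Method 2: match the whole "word 2sg. word" string and capture "2sg.".
METHOD_2 = re.compile(r"^\w+\s*(\dsg\.)(?!\|)\s*\w+\.?$")


def select(pattern, text):
    """Return the captured token, or None when the pattern does not match."""
    m = pattern.search(text)
    return m.group(1) if m else None


for text in ["smth 2sg. smth", "smth 2sg.| smth."]:
    print(repr(text), select(METHOD_1, text), select(METHOD_2, text))
```

One difference worth knowing: the lookahead only inspects the character right after the dot, so Method 1 still matches `smth 2sg. | smth` (space before the `|`), while Method 2 rejects it because `\s*\w+` cannot consume the `|`.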
Something like this might work for you:
(\d*sg\.)(?!\|)
It assumes there is an optional number (\d* also allows no digits) followed by sg. that is not immediately followed by |.
^.*(\dsg\.)[^\|]*$
Explanation:
^ : starts from the beginning of the string
.* : accepts any number of initial characters (even nothing)
(\dsg\.) : looks for the group of digit + "sg."
[^\|]* : considers any number of following characters except for |
$ : stops at the end of the string
You can then get "2sg." from the first capturing group of the match.
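The same idea in Python, reading the result from group 1. The function name `select_token` is hypothetical. Because `[^\|]*$` only constrains what comes after the token, a `|` that appears earlier in the string does not prevent a match.

```python
import re

# Digit + "sg." followed only by non-"|" characters up to the end of the string.
PATTERN = re.compile(r"^.*(\dsg\.)[^\|]*$")


def select_token(text):
    """Return the first capturing group, or None if the pattern does not match."""
    m = PATTERN.search(text)
    return m.group(1) if m else None


print(select_token("smth 2sg. smth"))
print(select_token("smth 2sg.| smth."))
```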
Try:
(\d+sg\.(?!\|))
Depending on your programming environment, the syntax may be a little different, but this should get you the result.
For more information see Negative Lookahead

Regex to Extract Last Part of URL that Contains User ID Strings

I'm having a hard time figuring this one out and could use some help.
I'm using Google Analytics filters to reduce the number of unique pages being reported in our app by stripping out ID strings from the URLs that are coming in.
What I need is a regex that will look for URLs that have these IDs in the URL. Here's what sets them apart from the rest of the URL:
ID strings are always the last part of the URL
ID strings always contain both letters and numbers
ID strings are always either 16 or 32 characters long
ID strings can show up twice in a URL
ID strings may or may not be followed by a trailing "/"
Here are some example URLs that show how they appear in our reporting:
/app/6be031b9672be9b5/
/app/admin/client/settings/6be031b9672be9b5
/app/subscribers/ea33fb38c9efc4dc0367819f23434f99/
/app/subscribers/customfieldsettings/0359c487066727ae/
/app/reports/6fa92d36be0e6c16/dc5aa096fba9cbb97eea1dae616d4b3c/
The second part of my question is that this regex should also group everything before these ID strings into a capturing group so that I can call that group later on in the filter, effectively stripping out these ID strings to look like the following:
/app/6be031b9672be9b5/ --> /app/
/app/subscribers/ea33fb38c9efc4dc0367819f23434f99/ --> /app/subscribers/
etc.
I've tried a couple different approaches but none seem to work perfectly, so I could really use the help, thank you!
Here's a solution:
^(.*?)(?:\/[a-zA-Z0-9]{16}|\/[a-zA-Z0-9]{32}){0,2}\/?$
Demo
This will remove the last one or two path segments if they are 16 or 32 characters long and contain only letters and digits.
You can make sure these parts contain both letters and numbers like this, if the tool supports lookaheads:
^(.*?)(?:\/(?=.{0,15}?\d)(?=.{0,15}?[a-zA-Z])[a-zA-Z0-9]{16}|\/(?=.{0,31}?\d)(?=.{0,31}?[a-zA-Z])[a-zA-Z0-9]{32}){0,2}\/?$
Demo
This adds lookahead assertions to the pattern.
Breakdown:
^(.*?) # Start of URL
(?:
\/ # a slash
(?=.{0,15}?\d) # check there's a digit at most 16 chars ahead
(?=.{0,15}?[a-zA-Z]) # check there's a letter at most 16 chars ahead
[a-zA-Z0-9]{16} # check the next 16 chars are digits or letters
| # .. or:
\/ # a slash
(?=.{0,31}?\d) # check there's a digit at most 32 chars ahead
(?=.{0,31}?[a-zA-Z]) # check there's a letter at most 32 chars ahead
[a-zA-Z0-9]{32} # check the next 32 chars are digits or letters
){0,2} # .. at most 2 times
\/?$ # optional slash at end
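A quick Python comparison of the two patterns, using capture group 1 as the stripped URL. The `\/` escapes are unnecessary in Python, so plain `/` is used. The URL `/app/x/abcdefghijklmnop` is a hypothetical example of a 16-letter segment with no digits: the basic pattern strips it, while the stricter one keeps it.

```python
import re

# First answer: strip one or two trailing 16- or 32-char alphanumeric segments.
BASIC = re.compile(r"^(.*?)(?:/[a-zA-Z0-9]{16}|/[a-zA-Z0-9]{32}){0,2}/?$")

# Stricter variant: each stripped segment must contain a digit and a letter.
STRICT = re.compile(
    r"^(.*?)(?:/(?=.{0,15}?\d)(?=.{0,15}?[a-zA-Z])[a-zA-Z0-9]{16}"
    r"|/(?=.{0,31}?\d)(?=.{0,31}?[a-zA-Z])[a-zA-Z0-9]{32}){0,2}/?$"
)


def strip_ids(url, pattern):
    """Return capture group 1, i.e. the URL without its trailing ID segments."""
    m = pattern.match(url)
    return m.group(1) if m else url


for url in [
    "/app/6be031b9672be9b5/",
    "/app/subscribers/ea33fb38c9efc4dc0367819f23434f99/",
    "/app/reports/6fa92d36be0e6c16/dc5aa096fba9cbb97eea1dae616d4b3c/",
    "/app/x/abcdefghijklmnop",  # hypothetical: letters only, no digits
]:
    print(url, "->", strip_ids(url, BASIC), "|", strip_ids(url, STRICT))
```

Group 1 does not include the slash before the ID, so `/app/6be031b9672be9b5/` becomes `/app`; append `/` in the filter output if you want `/app/` as in the question.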
This will do it:
([a-z0-9]+)(?:\/?$)
Demo
Explanation:
([a-z0-9]+) matches and captures the alphanumeric part
(?:\/?$) matches (without capturing) the optional final / and then asserts the end of the string ($)
Modified: I completely missed that there can be one or two IDs at the end.
Oh well, revised for what it's worth.
# (?i)^(.*?)/((?:(?=[^/]{0,31}[a-f])(?=[^/]{0,31}[0-9])(?:[a-f0-9]{16}|[a-f0-9]{32})(?:(?:/[a-z])?/?$|/)){1,2})$
(?i) # Case insensitive modifier
^ # BOS, begin the ride ..
( .*? ) # (1), Creep up on the first ID
/ # Trim this / junk
( # (2 start), 1-2 ID's separated by a /
(?:
(?= [^/]{0,31} [a-f] ) # Use largest range (32); must contain a letter AND a digit
(?= [^/]{0,31} [0-9] )
(?: # One of 16 or 32 length
[a-f0-9]{16}
| [a-f0-9]{32}
)
(?:
(?: / [a-z] )? # optional / letter
/? $ # /? EOS for end of 1 or 2
| # or,
/ # / between 2 only
)
){1,2}
) # (2 end)
$ # EOS, rides over !!
Sample output:
** Grp 0 - ( pos 195 , len 63 )
/app/reports/6fa92d36be0e6c16/dc5aa096fba9cbb97eea1dae616d4b3c/
** Grp 1 - ( pos 195 , len 12 )
/app/reports
** Grp 2 - ( pos 208 , len 50 )
6fa92d36be0e6c16/dc5aa096fba9cbb97eea1dae616d4b3c/
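The revised pattern can be reproduced in Python using its compact form (the commented line above without the leading `# `). This sketch only checks the group contents; the `pos` values in the sample output come from the answerer's own test buffer and are not reproduced here.

```python
import re

# Compact form of the revised pattern; (?i) makes [a-f] and [a-z] case-insensitive.
ID_TAIL = re.compile(
    r"(?i)^(.*?)/((?:(?=[^/]{0,31}[a-f])(?=[^/]{0,31}[0-9])"
    r"(?:[a-f0-9]{16}|[a-f0-9]{32})(?:(?:/[a-z])?/?$|/)){1,2})$"
)

url = "/app/reports/6fa92d36be0e6c16/dc5aa096fba9cbb97eea1dae616d4b3c/"
m = ID_TAIL.match(url)
print(m.group(1))  # the path before the IDs (group 1)
print(m.group(2))  # the one or two IDs, slashes included (group 2)
```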

Match specific length x or y

I'd like a regex that matches a string that is either X or Y characters long. For example, match a string that is either 8 or 11 characters long. I have currently implemented this like so: ^([0-9]{8}|[0-9]{11})$.
I could also implement it as: ^[0-9]{8}([0-9]{3})?$
My question is: can I write this regex without duplicating the [0-9] part (which in reality is more complex than this simple [0-9] example)?
There is one way:
^(?=[0-9]*$)(?:.{8}|.{11})$
or alternatively, if you want to do the length check first,
^(?=(?:.{8}|.{11})$)[0-9]*$
That way, you have the complicated part only once and a generic . for the length check.
Explanation:
^ # Start of string
(?= # Assert that the following regex can be matched here:
[0-9]* # any number of digits (and nothing but digits)
$ # until end of string
) # (End of lookahead)
(?: # Match either
.{8} # 8 characters
| # or
.{11} # 11 characters
) # (End of alternation)
$ # End of string
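A small Python check that the two lookahead forms accept exactly the same inputs as the original duplicated pattern. Here `[0-9]` stands in for the "more complex" class from the question; the helper name `which_match` is hypothetical.

```python
import re

# Original approach: the character class is written twice.
DUPLICATED = re.compile(r"^([0-9]{8}|[0-9]{11})$")

# Lookahead answers: the class appears once, "." does the length check.
CLASS_FIRST = re.compile(r"^(?=[0-9]*$)(?:.{8}|.{11})$")
LENGTH_FIRST = re.compile(r"^(?=(?:.{8}|.{11})$)[0-9]*$")


def which_match(s):
    """Return [duplicated, class_first, length_first] match results for s."""
    return [bool(p.match(s)) for p in (DUPLICATED, CLASS_FIRST, LENGTH_FIRST)]


for s in ["12345678", "12345678901", "1234567", "123456789", "1234567a"]:
    print(repr(s), which_match(s))
```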
With Perl, you could do:
my $re = qr/here_is_your_regex_part/;
my $full_regex = qr/^(?:$re){8}(?:(?:$re){3})?$/;
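The same composition idea can be sketched in Python with string formatting. `build_length_pattern` and the `[0-9]` sub-pattern are hypothetical stand-ins; wrapping the sub-pattern in `(?:...)` makes each quantifier apply to the whole sub-pattern rather than to its last token.

```python
import re


def build_length_pattern(part: str):
    """Compile ^(?:part){8}(?:(?:part){3})?$ from a sub-pattern string."""
    # {{8}} and {{3}} are escaped braces in the f-string, i.e. literal {8} and {3}.
    return re.compile(rf"^(?:{part}){{8}}(?:(?:{part}){{3}})?$")


DIGIT = r"[0-9]"  # hypothetical stand-in for the complicated sub-pattern
pattern = build_length_pattern(DIGIT)

for s in ["12345678", "12345678901", "1234567890"]:
    print(s, bool(pattern.match(s)))
```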
If you want to match any length that is a multiple of a fixed number, try this:
^(?:[0-9]{32})+$
where 32 is the base length, so it matches strings of length 32, 64, 96, and so on.

Match letter followed by specific numeric range

I am writing a regular expression in which the string can be 2-3 characters long.
The first character has to be an uppercase letter between A and H. It has to be followed by a number between 1 and 12.
I wrote
[A-H]{1}[1-12]{1,2}
This works when I enter A12, but not when I enter A6.
Please suggest a fix.
You can't specify a range of digits like that because it is implemented as a range between characters, so [1-12] is equivalent to [12], which would only match either a 1 or a 2. Instead, try the following:
[A-H](?:1[012]|[1-9])
Here is an explanation:
[A-H] # one letter from A to H
(?: # start non-capturing group
1[012] # 1 followed by 0, 1, or 2 (10, 11, 12)
| # OR
[1-9] # one digit from 1 to 9
) # end non-capturing group
Note that the {1} after [A-H] in your original regex is unnecessary; [A-H]{1} and [A-H] are equivalent.
You may want to consider adding anchors to the regex, otherwise you would also get a partial match on a string like A20. If you are trying to match an entire string then you should use the following:
\A[A-H](?:1[012]|[1-9])\z
If it is within a larger text you could use word boundaries instead:
\b[A-H](?:1[012]|[1-9])\b
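A Python sketch of both variants. Python's `re` writes the end-of-string anchor as `\Z`, so the whole-string check below uses `re.fullmatch` in place of `\A...\z`. The sample text passed to `findall` is made up for illustration.

```python
import re

# Whole-string check: fullmatch plays the role of the \A ... \z anchors.
CELL = re.compile(r"[A-H](?:1[012]|[1-9])")

# Inside larger text: word boundaries instead of anchors.
CELL_IN_TEXT = re.compile(r"\b[A-H](?:1[012]|[1-9])\b")


def is_cell(s):
    """True if s is exactly one letter A-H followed by a number 1-12."""
    return CELL.fullmatch(s) is not None


# The group is non-capturing, so findall returns the whole matches.
found = CELL_IN_TEXT.findall("A6 B12 A20 H13 C0 D10")
print(found)
```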
Here you go:
^[A-H]([1-9]|1[0-2])$
There is no need for the {1} in your question.
The regex is anchored with ^ and $, meaning it must match the entire input (or the entire line, in multiline mode).
It will not match A60, for example.