Regex - Capturing a group based on a pattern - regex

I'm trying to match patterns like:
AA/8G+8G+8G+8G/WITHOUT *
AA/8*G+8*G+8*G+8*G/WITH *
AA/8G+8*G+8*G+8*G/MIXED, THIS IS NOT SUPPORTED YET
using the following regular expression:
https://regex101.com/r/zemJ8H/1
but it matches only 8G+8G+8G+
because each repetition is identified as 8G+ (with the trailing +).
Is there any way to include the last 8G (without +) in the group?

This expression takes care of all three cases, including the mixed one:
(?<=/)([0-9]{1,2})[*]*([GM])[+]?(\1[*]*\2[+]?)+
The idea is to separate out the capturing of digits (new \1) from letters (the \2) and use both captures in the repeating group (\1[\*]*\2[\+]?)+ at the end of the expression.
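As a quick check, the expression above can be run against the three sample strings from the question. This is only a minimal sketch that assumes Python's re module (the question does not name a language); the variable names are illustrative.

```python
import re

# Pattern from the answer: the digits are captured as \1, the letter as \2,
# and both captures are reused in the repeating group at the end.
pattern = re.compile(r"(?<=/)([0-9]{1,2})[*]*([GM])[+]?(\1[*]*\2[+]?)+")

samples = [
    "AA/8G+8G+8G+8G/WITHOUT *",
    "AA/8*G+8*G+8*G+8*G/WITH *",
    "AA/8G+8*G+8*G+8*G/MIXED, THIS IS NOT SUPPORTED YET",
]

# group(0) is the whole match, including the last item that has no trailing +.
matches = [pattern.search(s).group(0) for s in samples]
for m in matches:
    print(m)
```

Because the trailing + is optional in every repetition, the final 8G (or 8*G) is part of the overall match.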

AA/(8\*?G\+?)+/
https://regex101.com/r/zemJ8H/2
What about something like this? AA followed by a slash, followed by one or more sequences of an 8, optionally a *, a G, and optionally a trailing +, finally followed by a /.
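One thing to keep in mind with a quantified capturing group is that, in most engines, the group keeps only its last repetition. The sketch below assumes Python's re module; the second pattern, which wraps the repetition in an outer group, is an assumed variant and not part of the answer.

```python
import re

text = "AA/8G+8G+8G+8G/WITHOUT *"

# Expression from the answer: group 1 is overwritten on every repetition,
# so it ends up holding only the last item.
repeated = re.search(r"AA/(8\*?G\+?)+/", text)
whole_match = repeated.group(0)
last_item = repeated.group(1)

# Assumed variant: a non-capturing inner group wrapped in one capturing
# group, so that group 1 holds the entire run of items.
wrapped = re.search(r"AA/((?:8\*?G\+?)+)/", text)
full_run = wrapped.group(1)

print(whole_match, last_item, full_run)
```

If the whole run is needed as a single capture, the wrapped form is one way to get it; otherwise the overall match can be used directly.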

Related

BigQuery - Alternative method to Positive Lookahead for RegExes

I've written a RegEx pattern that identifies alpha-characters that are immediately followed by a numeric character, with the intention that it would be used in BigQuery's REGEXP_EXTRACT function.
Here's the pattern: ([A-Z]|[a-z])*(?=[0-9])
However, because BigQuery uses the RE2 regular expression library, positive lookahead is not supported. What's an alternative method of identifying the numeric character without including it in the extracted string/match?
Use case:
To extract the first 1 or 2 alpha-characters of a UK postcode, e.g.
NW9 9KL
M1 0TE
ph3 2ee
N10 10KE
You can use
REGEXP_EXTRACT(col, '^[A-Za-z]+')
The ^[A-Za-z]+ regex matches
^ - start of string
[A-Za-z]+ - one or more letters.
Also, if you MUST check for a digit right after the initial letters, you can use
REGEXP_EXTRACT(col, '^([A-Za-z]+)[0-9]')
The ^([A-Za-z]+)[0-9] regex matches and captures into Group 1 the initial letters, and then just matches a digit (with [0-9]). The REGEXP_EXTRACT function returns the captured substring if there is a capturing group.
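The capture-group approach can be tried locally before running it in BigQuery. The sketch below uses Python's re module purely as a stand-in: it is not RE2, but this pattern uses no lookaround, so the syntax is the same in both. The helper name extract_area is hypothetical.

```python
import re

# Sample postcodes from the question.
postcodes = ["NW9 9KL", "M1 0TE", "ph3 2ee", "N10 10KE"]

# Leading letters are captured; the following digit is matched but not captured.
pattern = re.compile(r"^([A-Za-z]+)[0-9]")

def extract_area(postcode):
    """Return the leading letters if a digit follows them, else None."""
    m = pattern.search(postcode)
    return m.group(1) if m else None

areas = [extract_area(p) for p in postcodes]
print(areas)
```

Returning group 1 here mirrors what the answer describes for REGEXP_EXTRACT when the pattern contains a capturing group.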

Regex for alphanumeric with / or -

The regex should match letters or digits with / or - in between them, but the string should not start or end with / or -
I tried this using RegExr but it does not work
[a-zA-Z0-9]+[/|-]*[a-zA-Z0-9]+$
Your current regex has the following problems :
it matches multiple / and -, but only in one spot (e.g. it will match 0123/-/-456 but not 0123/456/789)
it also matches |, which you don't need to use in a [character class]
it is anchored at the end of the string with $ but not at the start with ^ (e.g. it would find a match in -0123/456, although it wouldn't match 0123/456-)
You can use the following regex that Avinash Raj proposed :
^[a-zA-Z0-9]+(?:[/-][a-zA-Z0-9]+)*$
The first point is fixed by putting both the character class matching slashes and dashes and the one matching alnum characters inside a (?:non-capturing group), which we can quantify with * to specify that it can occur any number of times. Each repetition of this group matches a single slash or dash followed by one or more alnum characters.
The other two points are straightforward: we remove the useless | and add a ^ at the start of the regex.
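To see how the corrected regex behaves, here is a small sketch assuming Python's re module (the question only mentions RegExr); the helper name is_valid and the test strings are illustrative.

```python
import re

# Regex from the answer: alnum runs separated by a single / or -.
pattern = re.compile(r"^[a-zA-Z0-9]+(?:[/-][a-zA-Z0-9]+)*$")

def is_valid(s):
    """True if s is made of alnum runs joined by single / or - characters."""
    return pattern.match(s) is not None

candidates = ["abc", "0123/456/789", "0123/456-789",
              "0123/-/-456", "-0123/456", "0123/456-"]
results = {s: is_valid(s) for s in candidates}
for s, ok in results.items():
    print(s, ok)
```

The first three strings are accepted; the last three are rejected because of consecutive separators or a separator at the start or end.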

Elastic search regex to get last 7 digits from right

I have data indexed in this format: 676767 2343423 2344444 32494444. I need a regular expression for a pattern analyzer that extracts the last 7 digits from the right. Example output: 2494444. The pattern we have tried, [0-9]{7}, is not working.
In ElasticSearch, the pattern is anchored by default. That means, you cannot rely on partial matches, you need to match the entire string and capture the last consecutive 7 digits.
Use
.*([0-9]{7})
where
.* - will match any 0+ chars other than newline (as many as possible) and then will backtrack to match...
([0-9]{7}) - 7 digits placed into Capture group 1.
The Sense plug-in returns the captured value if a capturing group is defined in the regular expression pattern, so, no additional extraction work (or group accessing work) needs to be done.
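The backtracking behaviour can be illustrated outside ElasticSearch. The sketch below assumes Python's re module as a stand-in and uses re.fullmatch to imitate the anchored matching described above; it does not call any ElasticSearch API.

```python
import re

data = "676767 2343423 2344444 32494444"

# Anchored match: .* grabs as much as possible, then backtracks until the
# final 7 characters can be matched as digits by the capturing group.
last_seven = re.fullmatch(r".*([0-9]{7})", data).group(1)

# For contrast, an unanchored search with the original pattern returns the
# first run of 7 digits it finds, not the last one.
first_seven = re.search(r"[0-9]{7}", data).group(0)

print(last_seven, first_seven)
```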

Regular expression only 2 consecutive specific characters

I'm trying to build a regular expression for an abstract filesystem. It should:
Start with letters [a-zA-Z], '/', or '.'
Only allow one consecutive occurrence of '/'
Only allow two consecutive occurrences of '.'
Here's what I have so far (it correctly rejects three consecutive '.'s, but it still accepts a single '.'). Any input is greatly appreciated. I tried positive and negative lookaheads for the second group but it still has the same problem.
(?!.*\/{2})(?!.*\.{3})^[A-Za-z\/\.]*$
My Regex101 link:
https://regex101.com/r/xM8oY5/1
I have added a negative lookahead, that matches a dot . surrounded by two not-dot characters.
/(?!(.*[^.])?\.([^.].*)?$)(?!.*\/{2})(?!.*\.{3})^[A-Za-z\/\.]*$/
 ^^^^^^^^^^^^^^^^^^^^^^^^^
(.*[^.])? -> some arbitrary characters and at least one not-dot
\. -> the dot
([^.].*)?$ -> one not-dot and some arbitrary characters
Both blocks - before and after the dot - are optional, if the single dot comes at start or end of the string.
Test it on regex101.
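The combined lookaheads can also be checked with a few paths. This is a sketch assuming Python's re module, with the surrounding / delimiters from the answer dropped; the helper name is_valid_path and the sample paths are illustrative.

```python
import re

# Three negative lookaheads: no isolated single dot, no "//", no "...".
pattern = re.compile(
    r"(?!(.*[^.])?\.([^.].*)?$)(?!.*\/{2})(?!.*\.{3})^[A-Za-z\/\.]*$"
)

def is_valid_path(path):
    """True if the path satisfies all constraints of the pattern."""
    return pattern.match(path) is not None

candidates = ["a/../b", "..", "a/b", "a/./b", ".", "a.b", "a//b", "a/.../b"]
results = {p: is_valid_path(p) for p in candidates}
for p, ok in results.items():
    print(p, ok)
```

The first three paths are accepted; the rest are rejected for containing a single dot, a double slash, or three consecutive dots.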

Regular expression to match string only if trailed by a character

I need help creating a regular expression.
Here are two sample strings:
/path/to/file.jpg
/path/to/file.type.jpg
Respectively, I'm trying to capture:
file.jpg
file.type.jpg
But I want to capture the parts as separate strings.
file,jpg
file,type,jpg
Note that I'm not capturing the periods.
I thought something like this could work (excluding the new lines):
([a-z]+)\.
[([a-z]+)[\.]{1}]?
([a-z]{3})
Guidance would be appreciated.
I'm wondering if there is another modifier I would need to use to have it capture properly.
The above expression errors out, by the way :(
I suggest using the pattern
\/([^./]+)\.?([^./]+|)\.([^./]+)$
and you will have 3 groups: file, type (which will be empty if not present) and extension. Excluding / from each class keeps the directory part out of the first group.
You'd have to use:
/(\w+)(\.(\w+))?\.(\w{3,4})\b
Then capturing groups 1, 3 and 4 would be your file (1), type (3) and extension, jpg/png or whatever (4).
Groups taken apart:
(\w+) - matches 1 or more word characters (+ is equivalent to {1,})
(\.(\w+))? - matches a dot followed by the 3rd group, and makes the whole group optional ( ? )
(\w+) - the 3rd group, same as group 1
(\w{3,4})\b - matches 3 or 4 word characters ( {3,4} ) and ensures that no other word characters follow them (word boundary - \b - if supported!)
You can use: "\/(?:\w+\/)+(\w+)\.?(\w+)?\.(\w+)" as regex.
Edit: I didn't read the part about not matching dots.
This regex should work:
/(\w+)\.(\w+)(?:\.(\w+))?$/
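As a rough check of this last expression, the sketch below assumes Python's re module and drops the surrounding / delimiters; the helper name split_name is hypothetical. Note that with only two parts the extension lands in group 2 and group 3 is empty, so the unmatched group is filtered out.

```python
import re

# Last expression from above, without the / delimiters.
pattern = re.compile(r"(\w+)\.(\w+)(?:\.(\w+))?$")

def split_name(path):
    """Return the dot-separated parts of the file name, without the dots."""
    m = pattern.search(path)
    if m is None:
        return []
    # Drop the optional third group when it did not participate in the match.
    return [g for g in m.groups() if g is not None]

two_parts = split_name("/path/to/file.jpg")
three_parts = split_name("/path/to/file.type.jpg")
print(two_parts, three_parts)
```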