I have a simple string separated by underscores, from which I need to pull all the values after a specific underscore using a regular expression with the REGEXP_EXTRACT formula in Google Data Studio.
The strings look like this:
ABC123_DEF456_GHI789-JKL274
Basically the values after the second underscore can be alphanumeric or symbols as well.
I need to pull the values after the second underscore. In the case of the example I gave, it would be:
GHI789-JKL274
Any ideas would be greatly appreciated.
With your shown samples, please try the following regex.
^(?:.*?_){2}([^_]*)
OR
REGEXP_EXTRACT(yourField, "^(?:.*?_){2}([^_]*)")
Here is the Online Demo for used regex.
Explanation: Adding a detailed explanation for used regex here.
^ ##Matching from starting of the value here.
(?: ##Opening 1 non-capturing group here.
.*?_ ##Using Lazy match to match till next occurrence of _ here.
){2} ##Closing non-capturing group here and matching its 2 occurrences.
( ##Creating 1 and only capturing group here.
[^_]* ##Matching zero or more characters other than _ (everything up to the next _ or the end of the value) here.
) ##Closing capturing group here.
You need to use
REGEXP_EXTRACT(some_field, "^(?:[^_]*_){2}([^_]*)")
See the regex demo.
Details:
^ - start of string
(?:[^_]*_){2} - two occurrences of any zero or more chars other than _ and then a _
([^_]*) - Capturing group #1: zero or more chars other than _.
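As a rough cross-check outside Data Studio, the pattern can be exercised in Python. This is only a sketch: Python's re module stands in here for the engine behind REGEXP_EXTRACT (an assumption that holds for this simple pattern, not a guarantee in general), and extract_after_second_underscore is a hypothetical helper name.

```python
import re
from typing import Optional

# Pattern from the answer above: skip two "chars + underscore" segments,
# then capture everything up to the next underscore (or the end of the value).
PATTERN = re.compile(r"^(?:[^_]*_){2}([^_]*)")

def extract_after_second_underscore(value: str) -> Optional[str]:
    """Return capture group 1, or None if the value has fewer than two underscores."""
    match = PATTERN.search(value)
    return match.group(1) if match else None

if __name__ == "__main__":
    print(extract_after_second_underscore("ABC123_DEF456_GHI789-JKL274"))
```

Note that the capture stops at a third underscore, so a value like A_B_C_D would yield only C. If underscores can appear in the tail, the capturing group would need to be widened (for example to (.*)).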
I want a regex pattern that could match the following cases:
0, 1, 0.1, .1, 1g, 0.1g, .1g, 1(g/100ml), .1(g/ml)
If the regex matches the pattern, I want to capture only the numerical part(0,1,0.1..)
I tried using following regex but it matches many cases:
((?=\.\d|\d)(?:\d+)?(?:\.?\d*))|((?=\.\d|\d)(?:\d+)?(?:\.?\d*))[a-zA-Z]+?|\([^)]*\)
How to achieve above with single regex pattern?
Edit:
To make the question solution more generic
What would be a single regex that would match below
Any numerical ( 0, 1, 0.1, ...)
Any numerical followed by g, mg any characters (0.1g, .1mg, 100kg)
Any numerical followed by anything in parentheses - .1(g/100ml), 100(mg/1kg)
And just capture the numerical part
You could make the pattern a bit more specific and use a capture group for the digits, then optionally match what follows or (updated per the comment from @anubhava) add a word boundary to prevent another partial match.
(\d*\.?\d+)(?:\(g\/\d*ml\)|g?\b)
(\d*\.?\d+) Capture group 1, match optional digits, optional . and 1+ digits
(?: Non capture group for the alternation
\(g\/\d*ml\) Match (g/ optional digits and ml)
| Or
g?\b Match an optional g followed by a word boundary
) Close non capture group
Regex demo
If the values should match in the comma separated string, you can assert either a , or the end of the string to the right.
(\d*\.?\d+)(?:\(g\/\d*ml\)|g)?(?=,|$)
Regex demo
Edit
A broad pattern to match anything between parentheses or optional chars a-zA-Z after the digits:
(\d*\.?\d+)(?:\([^()]*\)|[a-zA-Z]*\b)
(\d*\.?\d+) Capture group 1, match optional digits, optional . and 1+ digits
(?: Non capture group
\([^()]*\) Match from opening till closing parenthesis
| Or
[a-zA-Z]*\b Optionally match chars in the ranges a-zA-Z followed by a word boundary
) Close non capture group
Regex demo
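To see what the broad pattern captures on the sample list from the question, here is a minimal Python sketch. The helper name extract_numbers is hypothetical, and Python's re module is assumed to behave like the demo's engine for this pattern.

```python
import re
from typing import List

# Broad pattern from the answer above: group 1 is the number; the unit
# (letters or a parenthesised part) is matched but not captured.
NUMBER_WITH_OPTIONAL_UNIT = re.compile(r"(\d*\.?\d+)(?:\([^()]*\)|[a-zA-Z]*\b)")

def extract_numbers(text: str) -> List[str]:
    """Return the group 1 value (the numeric part) of every match in text."""
    # With exactly one capture group, findall returns the group 1 strings.
    return NUMBER_WITH_OPTIONAL_UNIT.findall(text)

if __name__ == "__main__":
    print(extract_numbers("0, 1, 0.1, .1, 1g, 0.1g, .1g, 1(g/100ml), .1(g/ml)"))
```

Because the parenthesised part is consumed by the match, the digits inside it (such as the 100 in 1(g/100ml)) are not reported as separate numbers.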
EDIT2: With OP's edited samples (to match 0, 1, 0.1 OR 0.1g, .1mg, 100kg OR .1(g/100ml), 100(mg/1kg)), adding the following solution here. The explanation is the same as for the very first solution; the only difference is that instead of matching specific strings, I have changed the regex to match any letters here.
(\d*\.?\d+)(?:[a-zA-Z]+|\([a-zA-Z]+(?:\/\d*(?:[a-zA-Z]+))?\)|(?:,\s+|$))
Online Demo for above regex
EDIT1: As per OP's comments to match .01c and 100(g/1000L) kind of examples adding following regex, which is small edit to 1st solution here.
(\d*\.?\d+)(?:g|cc|\(g(?:\/\d*(?:ml|L))?\)|(?:,\s+|$))
Online demo for above regex
With your shown samples, please try following regex here.
(\d*\.?\d+)(?:g|\(g(?:\/\d*ml)?\)|(?:,\s+|$))
Online demo for above regex
Explanation: Adding detailed explanation for above.
(\d*\.?\d+) ##Matching 0 or more digits, followed by an optional ., followed by 1 or more digits here.
(?: ##Starting a non-capturing group here.
g| ##matching only g here OR.
\(g(?:\/\d*ml)?\)| ##Matching (g) OR (g/digits ml) here OR.
(?:,\s+|$) ##Matching comma followed by 1 or more spaces occurrences OR end of value here.
) ##Closing non-capturing group here.
try this:
[\d]?\.?\d+(?:g|(?<p>\()(?(p)g\/(?:\d+)?ml\)))?
Demo
For example, if I have the following strings:
99%89 (should match)
99%? (should match)
?%99 (should match)
?%? (should not match)
?%99%99 (should match)
99%99%99%? (should match)
essentially the first or second element can be a ? or a number, but both elements cannot be ?. I tried thinking of something like:
[0-9]*|[?](?!\?)[%][0-9]*|[?]
But this does not yield the correct answer, any help would be appreciated
With your shown samples, could you please try following.
^(?:(?:\?(?:(?:%\d+){1,})?)|(?:(?:(?:\d+%){1,})?\?(?:(?:%\d+){1,})?)|(?:\d+%\d+))$
Online demo for above regex
Explanation: Adding detailed explanation for above.
^(?: ##Matching from starting of the value, starting a non-capturing group from here.
(?:\? ##Starting non-capturing group(one for understanding purposes) matching literal ? here.
(?:(?:%\d+){1,})? ##In a non capturing group looking for % with 1 or more occurrences of digits and matching this group match keeping it optional.
)| ##Closing one non-capturing group here, with OR condition here.
(?: ##Starting non-capturing group(two) here.
(?:(?:\d+%){1,})?\? ##Looking for digits with % one or more occurrences in a non-capturing group keeping it optional followed by ?
(?:(?:%\d+){1,})? ##Checking for % followed by digits, one or more occurrences, in a non-capturing group, keeping it optional.
)| ##Closing two non-capturing group here, with OR condition here.
(?:\d+%\d+) ##In a non-capturing group looking for 1 or more digits % one or more digits
)$ ##Closing 1st non-capturing group at the end of value.
Not sure if I am reading the question right, but as you tried using a negative lookahead, you could assert that the string does not consist only of % and/or ?
^(?![%?]+$)[\d?%]+$
Regex demo
Or without a lookahead:
^[%?]*\d[%?\d]*$
Regex demo
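A quick way to compare the two patterns against the samples from the question is a small Python check. This is just a sketch: the SAMPLES table and the matches helper are my own scaffolding, not part of either answer.

```python
import re

# The two patterns from the answer above.
WITH_LOOKAHEAD = re.compile(r"^(?![%?]+$)[\d?%]+$")
WITHOUT_LOOKAHEAD = re.compile(r"^[%?]*\d[%?\d]*$")

# Sample strings from the question, mapped to whether they should match.
SAMPLES = {
    "99%89": True,
    "99%?": True,
    "?%99": True,
    "?%?": False,
    "?%99%99": True,
    "99%99%99%?": True,
}

def matches(pattern: "re.Pattern[str]", text: str) -> bool:
    """Return True when the anchored pattern matches the whole text."""
    return pattern.search(text) is not None

if __name__ == "__main__":
    for text, expected in SAMPLES.items():
        print(text, matches(WITH_LOOKAHEAD, text), matches(WITHOUT_LOOKAHEAD, text), expected)
```

Both patterns only require at least one digit somewhere among the %, ? and digit characters, so they are looser than a strict element%element grammar: a lone 9 or a string like %9 is accepted as well.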
I have a string that looks like this:
#123##1234###2356####69
It starts with # followed by one or more digits, and every time the # appears, the number of # characters increases: 1 the first time, 2 the second time, etc.
It's similar to this regex, but since I don't know how long the pattern goes on, it's not very useful.
^#\d+##\d+###\d+$
I'm using PCRE regex engine, it allows recursion (?R) and conditions (?(1)...) etc.
Is there a regex to validate this pattern?
Valid
#123
#12##235
#1234##12###368
#1234##12###368####22235#####723356
Invalid
##123
#123###456
#123##456##789
I tried ^(?(1)(?|(#\1)|(#))\d+)+$ but it doesn't seem to work at all
You can do this using PCRE conditional sub-pattern matching:
^(?:((?(1)\1)#)\d+)++$
RegEx Demo
RegEx Details:
^: Start
(?:: Start non-capture group
(: Start capture group #1
(?(1)\1): if/then/else directive that means match back-reference \1 only if 1st capture group is available otherwise match null
#: Match an additional #
): End capture group #1
\d+: Match 1+ digits
)++: End non-capture group. Match 1+ of this non-capture group.
$: End
One option could be optionally matching a backreference to group 1 inside group 1 using a possessive quantifier \1?+# adding # on every iteration.
^(?:(\1?+#)\d+)++$
^ Start of string
(?: Non capture group
(\1?+#)\d+ Capture group 1, match an optional possessive backreference to what is already captured in group 1 and add matching a # followed by 1+ digits
)++ Close the non capture group and repeat 1+ times possessively
$ End of string
Regex demo
I think you can use forward-referencing here:
^(?:((?:\1(?!^)|^)#)\d+)+$
See the regex demo.
Details:
^ - start of string
(?:((?:\1(?!^)|^)#)\d+)+ - one or more occurrences of
((?:\1(?!^)|^)#) - Group 1 (the \1 value): start of string or an occurrence of the Group 1 value if it is not at the string start position
\d+ - one or more digits
$ - end of string.
NOTE: This technique does not work in regex flavors that do not support forward referencing, like ECMAScript based flavors (e.g. JavaScript, VBA, C++ std::regex)
Although there are already working answers, I was inspired by Wiktor's answer and came up with this idea:
(?:(^#|#\1)\d+)+$
It is also quite short and effective (and also works in non-PCRE environments).
See the test cases
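For cross-checking the test cases without a PCRE engine, here is a Python sketch that uses a different, two-step technique rather than a port of the patterns above (as far as I know, Python's re refuses a backreference to a group that is still open, so they cannot be pasted in directly). The function name is_valid is hypothetical.

```python
import re

# Step 1: overall shape, i.e. one or more "(hashes)(digits)" runs and nothing else.
SHAPE = re.compile(r"(?:#+\d+)+")
# Step 2: pull out each run of hashes so that their lengths can be compared.
HASH_RUNS = re.compile(r"(#+)\d+")

def is_valid(text: str) -> bool:
    """Return True when the n-th run of # (counting from 1) has exactly n characters."""
    if not SHAPE.fullmatch(text):
        return False
    runs = HASH_RUNS.findall(text)
    return all(len(run) == index for index, run in enumerate(runs, start=1))

if __name__ == "__main__":
    for sample in ["#123", "#12##235", "##123", "#123###456", "#123##456##789"]:
        print(sample, is_valid(sample))
```

This trades the single-regex elegance for plain counting, which can be handy as a reference implementation when comparing the regex answers against the valid and invalid lists from the question.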
I'm trying to formulate a regex that captures everything after last period, up until (not including) underscore number 3 AFTER the period.
For example:
ABC_Simple_DEF.dbo.GDE_1_1_Contact_test
should return GDE_1_1.
I've tried using [^.]+$ which includes everything after the last period.
The expression _[^_]+$ includes last underscore and everything after, which is close, but not exactly what I'm looking for.
Kinda stuck here and would appreciate any help
You may use
[^._]+(?:_[^._]+){2}(?=_[^.]*$)
Or, capturing approach (you will need to grab Group 1 value from the result):
([^._]+(?:_[^._]+){2})_[^.]*$
See regex demo #1 and regex demo #2.
Details
[^._]+ - 1+ chars other than . and _
(?:_[^._]+){2} - two repetitions of
_ - an underscore
[^._]+ - 1+ chars other than . and _
(?=_[^.]*$) - a positive lookahead that requires _ and 0+ chars other than . up to the end of string immediately to the right of the current position.
If a lookbehind is supported, one option could be to use a positive lookbehind to assert that what is on the left is a dot, and a negative lookahead to assert that no more dots follow the matched one:
(?<=\.)(?!.*\.)(?:[^_]+_){2}[^_]+
Explanation
(?<=\.) Positive lookbehind, assert what is directly on the left is a dot
(?!.*\.) Negative lookahead, assert no more dots follow
(?: Non capturing group
[^_]+_ match 1+ times not an underscore, then an _
){2} Close non capturing group and repeat 2 times
[^_]+ Match 1+ times not an _
Regex demo
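Both the lookahead pattern and the lookbehind pattern above can be tried in Python, which supports the fixed-width lookbehind used here. A minimal sketch, with first_match as a hypothetical helper name:

```python
import re
from typing import Optional

# Lookahead variant: three segments that must be followed by "_" and a dot-free tail.
LOOKAHEAD = re.compile(r"[^._]+(?:_[^._]+){2}(?=_[^.]*$)")
# Lookbehind variant: start right after the last dot, then take three segments.
LOOKBEHIND = re.compile(r"(?<=\.)(?!.*\.)(?:[^_]+_){2}[^_]+")

def first_match(pattern: "re.Pattern[str]", text: str) -> Optional[str]:
    """Return the first whole match of pattern in text, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None

if __name__ == "__main__":
    sample = "ABC_Simple_DEF.dbo.GDE_1_1_Contact_test"
    print(first_match(LOOKAHEAD, sample))
    print(first_match(LOOKBEHIND, sample))
```

The two are not interchangeable on every input: the lookahead variant insists on a further underscore after the third segment, while the lookbehind variant also accepts a tail with exactly three segments, such as a.b_c_d.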
A slight variation over Wiktor's answer, that requires a last period and captures everything until the third underscore, or until the end if there are less than three (non-capturing groups dropped for clarity, test here) :
\.([^._]*(_[^._]*){0,2})[^.]*$
The target capture group is 1. To better visualize, suppose your input contains only underscores, periods, and the character c, then it becomes :
\.(c*(_c*){0,2})c*$
The straight "dumb" regex is:
([^.]*\.)*([^_]*_[^_]*_[^_]*).*
and you need group 2 (\2), since group 1 only holds the last dot-terminated prefix.
Test here.
I need to extract 1234567 from below URLs
http://www.test.in/some--wonders-1234567---2
http://www.test.in/some--wonders-1234567
I tried with .*\-([0-9]+)(?:-{2,}2)?.
For the first URL it returned 2, even though that part is in a non-capturing group.
Please give me a solution. I have been digging into this for a long time and am not getting any ideas.
Try this one:
.*?\-([0-9]+)(?:-{2,}2|$)
It makes the first .* lazy. You can also remove that part entirely, with the same effect:
\-([0-9]+)(?:-{2,}2|$)
If your regex engine supports variable-length negative lookbehinds (many do not), you can do it this way:
(?<!\d+-+)\d+
It gives you any non-empty digit string that is not preceded by digits followed by minuses.
Big advantage is that you don't have to use groups here - regex itself returns what you want.
You could match a - followed by one or more digits which you could capture in a group ([0-9]+). This group will contain the value you want to extract.
Then an optional part (?:-{2,}[0-9]+)? that would match ---2 followed by asserting the end of the line $.
-(\d+)(?:-{2,}\d+)?$
Explanation
- Match a literal -
(\d+) Capture one or more digits in a group
(?: Non capturing group
-{2,} Match 2 or more times -
\d+ Match one or more digits
)? close non capturing group and make it optional
$ Assert position at the end of the line
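The pattern explained above can be checked against both URLs from the question with a short Python sketch. The helper name extract_id is hypothetical, and the pattern is assumed to be applied with a search rather than a full match.

```python
import re
from typing import Optional

# A literal "-", the digits to capture, an optional "--...digits" suffix,
# then the end of the string.
ID_PATTERN = re.compile(r"-(\d+)(?:-{2,}\d+)?$")

def extract_id(url: str) -> Optional[str]:
    """Return the captured digits (group 1), or None when the URL has no such tail."""
    match = ID_PATTERN.search(url)
    return match.group(1) if match else None

if __name__ == "__main__":
    for url in (
        "http://www.test.in/some--wonders-1234567---2",
        "http://www.test.in/some--wonders-1234567",
    ):
        print(url, extract_id(url))
```

Anchoring at $ is what keeps the trailing 2 out of group 1: the optional non-capturing group has to consume ---2 before the end of the string can be reached.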