I have the two strings below, to which I need to apply a regex function in Google BigQuery to get the desired outputs.
Input:
MERCURE ENGAGEMENT_LaL_FB_TALENT:HENRIQUE_PORTUGAL_WEEK 4_IMAGE CAROUSEL_I19
MERCURE ENGAGEMENT_LaL_FB_UGC:_ENGLAND_TBC_WEEK 4_IMAGE CAROUSEL_I25
Output:
HENRIQUE
ENGLAND
I cannot use a lookbehind or a positive lookahead in BigQuery.
The closest I have gotten is the following:
:\D*
Which matches the word after the colon but before the white space.
Any ideas would be helpful.
You might also use a capturing group with REGEXP_EXTRACT.
:_?([^\s_]+)
Explanation
:_? Match : and an optional underscore
( Capture group 1
[^\s_]+ Match 1+ times any char other than a whitespace char or an underscore (Omit \s if there can also be spaces in between)
) Close group 1
Regex demo
You could also exclude matching an underscore from a word character which narrows down the range of accepted characters.
:_?([^\W_]+)
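As a quick sanity check outside BigQuery, both patterns can be tried against the two sample strings. This is only a sketch: Python's re module stands in for BigQuery's RE2 engine here (these particular patterns use nothing engine-specific), and the helper name extract is made up for illustration.

```python
import re

# Sample inputs copied from the question.
rows = [
    "MERCURE ENGAGEMENT_LaL_FB_TALENT:HENRIQUE_PORTUGAL_WEEK 4_IMAGE CAROUSEL_I19",
    "MERCURE ENGAGEMENT_LaL_FB_UGC:_ENGLAND_TBC_WEEK 4_IMAGE CAROUSEL_I25",
]

# The two patterns suggested above; neither needs a lookaround.
no_space_or_underscore = re.compile(r":_?([^\s_]+)")
word_chars_only = re.compile(r":_?([^\W_]+)")


def extract(pattern, text):
    """Return capture group 1, or None when nothing matches
    (REGEXP_EXTRACT returns NULL in that case)."""
    match = pattern.search(text)
    return match.group(1) if match else None


first = [extract(no_space_or_underscore, row) for row in rows]
second = [extract(word_chars_only, row) for row in rows]
print(first)   # ['HENRIQUE', 'ENGLAND']
print(second)  # ['HENRIQUE', 'ENGLAND']
```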
One approach uses REGEXP_REPLACE:
SELECT REGEXP_REPLACE(col, r'^.*:_?([^_]+)_.*$', r'\1') AS output
FROM yourTable;
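One thing worth knowing about the replace approach: when the pattern does not match, the input comes back unchanged instead of NULL. A rough Python equivalent (re.sub standing in for REGEXP_REPLACE, with a made-up helper name) shows both cases:

```python
import re

# Same pattern as in the REGEXP_REPLACE call above.
PATTERN = re.compile(r"^.*:_?([^_]+)_.*$")


def replace_with_group(text):
    """Replace the whole string with capture group 1.
    If the pattern does not match, the text is returned unchanged."""
    return PATTERN.sub(r"\1", text)


a = replace_with_group(
    "MERCURE ENGAGEMENT_LaL_FB_TALENT:HENRIQUE_PORTUGAL_WEEK 4_IMAGE CAROUSEL_I19"
)
b = replace_with_group(
    "MERCURE ENGAGEMENT_LaL_FB_UGC:_ENGLAND_TBC_WEEK 4_IMAGE CAROUSEL_I25"
)
c = replace_with_group("NO COLON IN THIS ROW")  # hypothetical non-matching row
print(a, b, c, sep="\n")
```

If rows without a colon are possible, the extract-based answers may be easier to filter on.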
Use
REGEXP_EXTRACT(column_name, r":[^a-zA-Z]*([a-zA-Z]+)")
See regex proof
Explanation
--------------------------------------------------------------------------------
: ':'
--------------------------------------------------------------------------------
[^a-zA-Z]* any character except: 'a' to 'z', 'A' to
'Z' (0 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
[a-zA-Z]+ any character of: 'a' to 'z', 'A' to 'Z'
(1 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
) end of \1
Related
I need a regex that detects only the words in a sentence that consist of a combination of numbers and letters.
I am using the regex ^[a-zA-Z0-9]* (https://regex101.com/r/eSlu2I/1).
Here, the last two should be excluded.
Can anyone help me with this?
Use
^(?![a-zA-Z]+\b)[a-zA-Z0-9]*
See regex proof.
EXPLANATION
NODE EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
[a-zA-Z]+ any character of: 'a' to 'z', 'A' to 'Z'
(1 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
\b the boundary between a word char (\w)
and something that is not a word char
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
[a-zA-Z0-9]* any character of: 'a' to 'z', 'A' to 'Z',
'0' to '9' (0 or more times (matching the
most amount possible))
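To see what the lookahead version accepts, here is a small Python sketch. The sample strings are my own (the ones from the regex101 link are not reproduced here), and leading_token is a made-up helper name.

```python
import re

# Pattern from the answer above.
PATTERN = re.compile(r"^(?![a-zA-Z]+\b)[a-zA-Z0-9]*")

# Hypothetical test strings.
samples = ["abc123", "123abc", "12345", "abcdef", "hello world"]


def leading_token(text):
    """Return the matched prefix, or None if the lookahead rejects the string."""
    match = PATTERN.match(text)
    return match.group(0) if match else None


results = {s: leading_token(s) for s in samples}
for sample, token in results.items():
    print(f"{sample!r:15} -> {token!r}")
```

Strings that start with a letters-only word are rejected, while digit-only strings such as 12345 still pass, since the lookahead only rules out pure letters.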
You can use the following regex:
\w*\d\w*
Explanation:
\w*: zero or more word characters (letters, digits, or underscore)
\d: digit
\w*: zero or more word characters (letters, digits, or underscore)
Try it here.
EDIT: In case you require the presence of at least one letter together with the number, you can instead use the following regex:
\w*(\d[A-Za-z]|[A-Za-z]\d)\w*
Explanation:
\w*: zero or more word characters (letters, digits, or underscore)
(\d[A-Za-z]|[A-Za-z]\d):
\d[A-Za-z]|: digit + alphabetical character or
[A-Za-z]\d: alphabetical character + digit
\w*: zero or more word characters (letters, digits, or underscore)
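The difference between the two patterns shows up on a digits-only word. A small Python sketch, using a sentence I made up for illustration:

```python
import re

# Hypothetical sentence with a mixed word, a digits-only word and plain words.
sentence = "order ab12 shipped 2024 via x9 today"

# First pattern: any word that contains at least one digit.
with_digit = re.findall(r"\w*\d\w*", sentence)

# Second pattern: a digit must sit directly next to a letter.
# finditer + group(0) is used because findall would return only the inner group.
letter_and_digit = [
    m.group(0)
    for m in re.finditer(r"\w*(\d[A-Za-z]|[A-Za-z]\d)\w*", sentence)
]

print(with_digit)        # ['ab12', '2024', 'x9']
print(letter_and_digit)  # ['ab12', 'x9']
```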
I am trying to write a regex expression in PCRE which captures the first part of a word and excludes the second portion. The first portion needs to accommodate different values depending upon where the transaction is initiated from. Here is an example:
Raw Text:
.controller.CustomerDemographicsController
Regex Pattern Attempted:
\.controller\.(?P<Controller>\w+)
Result I'm trying to achieve (CustomerDemographics is the only content I want to save in the named capture group):
.controller.CustomerDemographicsController
NOTE: I've attempted to exclude it using ^, lookbehind, and lookahead.
Any help is greatly appreciated.
You can match word chars in the Controller group up to the last uppercase letter:
\.controller\.(?P<Controller>\w+)(?=\p{Lu})
See the regex demo. Details:
\.controller\. - a .controller. string
(?P<Controller>\w+) - Named capturing group "Controller": one or more word chars as many as possible
(?=\p{Lu}) - the next char must be an uppercase letter.
Note that (?=\p{Lu}) makes the \w+ stop before the last uppercase letter because the \w+ pattern is greedy due to the + quantifier.
Alternatively, use
\.controller\.(?P<Controller>[A-Za-z]+)[A-Z]
See proof.
EXPLANATION:
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
controller 'controller'
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
(?P<Controller> group and capture to Controller:
--------------------------------------------------------------------------------
[A-Za-z]+ any character of: 'A' to 'Z', 'a' to 'z'
(1 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
) end of Controller group
--------------------------------------------------------------------------------
[A-Z] any character of: 'A' to 'Z'
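Both suggestions can be compared in a short Python sketch. Note one assumption: Python's built-in re module does not support \p{Lu}, so [A-Z] is swapped in for the lookahead variant here. That is fine for ASCII class names but is not the same as the PCRE pattern.

```python
import re

raw = ".controller.CustomerDemographicsController"

# Consuming variant: the trailing [A-Z] becomes part of the overall match.
consuming = re.compile(r"\.controller\.(?P<Controller>[A-Za-z]+)[A-Z]")

# Lookahead variant, with [A-Z] standing in for \p{Lu}.
lookahead = re.compile(r"\.controller\.(?P<Controller>\w+)(?=[A-Z])")

m1 = consuming.search(raw)
m2 = lookahead.search(raw)

# The named group is the same in both cases.
print(m1.group("Controller"))  # CustomerDemographics
print(m2.group("Controller"))  # CustomerDemographics

# The overall match differs by the one consumed uppercase letter.
print(m1.group(0))  # .controller.CustomerDemographicsC
print(m2.group(0))  # .controller.CustomerDemographics
```

If only the named group is read, either variant works; the lookahead matters only when the full match is used.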
I have these two sentences:
TAGGING ODP:-7.160792, 113.496069
TAGGING pel:-7.160792, 113.496069
I want to match the -7.160792 part only if the full sentence contains "odp".
I tried (?(?=odp)-\d+.\d+) but it doesn't work, and I don't know why.
Any help is appreciated.
(?(?=odp)-\d+\.\d+) won't work because (?=odp) is a positive lookahead that imposes a constraint on the pattern on the right, -\d+\.\d+. Namely, it requires the odp string to occur at exactly the same location where - and a number are expected.
Use
(?<=ODP:)-\d+\.\d+
ODP:(-\d+\.\d+)
If lookbehinds are supported, the first variant is more viable.
Otherwise, use the second variant with a capturing group.
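A minimal Python sketch of the two variants on the sentences from the question (first_match is a made-up helper name):

```python
import re

sentences = [
    "TAGGING ODP:-7.160792, 113.496069",
    "TAGGING pel:-7.160792, 113.496069",
]

# Variant 1: the lookbehind keeps "ODP:" out of the match itself.
lookbehind = re.compile(r"(?<=ODP:)-\d+\.\d+")
# Variant 2: "ODP:" is matched, the number is read from group 1.
capturing = re.compile(r"ODP:(-\d+\.\d+)")


def first_match(pattern, text, group=0):
    """Return the requested group of the first match, or None."""
    match = pattern.search(text)
    return match.group(group) if match else None


via_lookbehind = [first_match(lookbehind, s) for s in sentences]
via_group = [first_match(capturing, s, group=1) for s in sentences]
print(via_lookbehind)  # ['-7.160792', None]
print(via_group)       # ['-7.160792', None]
```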
And if odp can appear anywhere, even after the number:
(?i)^(?=.*odp).*(-\d+\.\d+)
This will capture the value into a group.
EXPLANATION
--------------------------------------------------------------------------------
(?i) set flags for this block (case-
insensitive) (with ^ and $ matching
normally) (with . not matching \n)
(matching whitespace and # normally)
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
odp 'odp'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
- '-'
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
) end of \1
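Here is a short Python check of the "odp anywhere" pattern. The second sample, with odp after the number, is a made-up input to exercise that case.

```python
import re

# Pattern from above: odp may appear anywhere on the line, in any case.
anywhere = re.compile(r"(?i)^(?=.*odp).*(-\d+\.\d+)")

samples = [
    "TAGGING ODP:-7.160792, 113.496069",
    "-7.160792, 113.496069 odp",          # hypothetical: odp after the number
    "TAGGING pel:-7.160792, 113.496069",
]

captured = []
for text in samples:
    match = anywhere.search(text)
    captured.append(match.group(1) if match else None)

print(captured)  # ['-7.160792', '-7.160792', None]
```

Because the .* before the group is greedy, a line with several negative numbers would yield the last one.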
You can use the regex, (?i)(?<=odp:)[^,]*.
Explanation:
(?i): Case-insensitive flag
(?<=odp:): Positive lookbehind for odp:
[^,]*: Anything but ,
👉 If you want the match to be restricted to numbers only, you can use the regex, (?i)(?<=odp:)(?:-\d+\.\d+)
Explanation:
(?i): Case-insensitive flag
(?<=odp:): Positive lookbehind for odp:
(?:: Start non capturing group
-: Literal, -
\d+: 1+ digit(s)
\.\d+: Literal . followed by 1+ digit(s)
): End non capturing group
👉 If the sign can be either + or -, you can use the regex, (?i)(?<=odp:)(?:[+-]\d+\.\d+)
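A side note on the dot in these number patterns: an unescaped . matches any character, so it also accepts numbers without a decimal point. A small Python sketch (the second input is made up to show the difference):

```python
import re

# Same lookbehind pattern, once with a bare dot and once with an escaped dot.
unescaped = re.compile(r"(?i)(?<=odp:)(?:-\d+.\d+)")
escaped = re.compile(r"(?i)(?<=odp:)(?:-\d+\.\d+)")

good = "TAGGING ODP:-7.160792, 113.496069"
odd = "TAGGING ODP:-7160792"  # hypothetical input with no decimal point


def matched_text(pattern, text):
    """Return the matched text, or None when there is no match."""
    match = pattern.search(text)
    return match.group(0) if match else None


print(matched_text(unescaped, good))  # -7.160792
print(matched_text(escaped, good))    # -7.160792
print(matched_text(unescaped, odd))   # -7160792 (the bare dot matched a digit)
print(matched_text(escaped, odd))     # None
```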
The pattern (?(?=odp)\-\d+\.\d+) is using a conditional (? stating in the if clause:
If what is directly to the right from the current position is odp,
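then match -\d+\.\d+
That cannot match, because the text at that position cannot be odp and start with - at the same time.
What you could also do is match odp, followed by any chars other than digits using \D*, and capture the number in a group (enable case-insensitive matching if the text contains ODP in upper case).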
\bodp\b\D*(-\d+\.\d+)\b
The pattern matches:
\bodp\b match odp between word boundaries to prevent a partial match
\D* Optionally match any char other than a digit
(-\d+\.\d+) Capture - and 1+ digits with a decimal part in group 1
\b A word boundary
Regex demo
(?<=ODP:)(-\d+\.\d+)
You can try using a positive lookbehind.
This should work for the input you've provided.
I need to identify all email addresses in a given cell enclosed in any special character, written in any number of multiple lines.
This is something that I built.
"(!\s<,;-)[a-zA-Z0-9]*#"
Is there any improvement?
The pattern (!\s<,;-)[a-zA-Z0-9]*# starts with capturing !\s<,;- literally. If you want to match 1 of the listed characters, you can use a character class [!\s<,;-] instead.
If you want to match xyz123 in xyz123#gmail.com you can use:
[a-zA-Z0-9]+(?=#)
The pattern matches
[a-zA-Z0-9]+ Match 1+ occurrences of any of the listed ranges
(?=#) Assert (not match) a # directly to the right of the current position
See a regex demo.
Use
([a-zA-Z0-9]\w*)#
See regex proof
EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
[a-zA-Z0-9] any character of: 'a' to 'z', 'A' to
'Z', '0' to '9'
--------------------------------------------------------------------------------
\w* word characters (a-z, A-Z, 0-9, _) (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
# '#'
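The two suggestions differ once the part before # contains an underscore. A short Python sketch on a made-up cell value:

```python
import re

# Hypothetical cell content with two addresses written with '#'.
cell = "contact: <xyz123#gmail.com>; abc_9#mail.org"

# Lookahead version: only letters and digits directly before '#'.
before_hash = re.findall(r"[a-zA-Z0-9]+(?=#)", cell)

# Capturing version: \w also allows '_' after the first character.
# findall returns group 1 because the pattern has exactly one group.
captured = re.findall(r"([a-zA-Z0-9]\w*)#", cell)

print(before_hash)  # ['xyz123', '9']
print(captured)     # ['xyz123', 'abc_9']
```

Which one is right depends on whether underscores are expected in the local part.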
I'm a beginner in Qt and C++ programming. I want to use a regular expression validator in my line edit that doesn't allow a dot (.) to be typed right after another dot (.). This is the regex that I've used:
QRegExp reName("[a-zA-Z][a-zA-Z0-9. ]+ ")
But this is not enough for my task. Please someone help me.
I'm looking for something like this - for example :
"camp.new." (accepted)
"camp..new" (not accepted)
"ca.mp.n.e.w" (accepted)
How about:
^[a-zA-Z](?:\.?[a-zA-Z0-9 ]+)+$
Explanation:
The regular expression:
^[a-zA-Z](?:\.?[a-zA-Z0-9 ]+)+$
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
[a-zA-Z] any character of: 'a' to 'z', 'A' to 'Z'
----------------------------------------------------------------------
(?: group, but do not capture (1 or more times
(matching the most amount possible)):
----------------------------------------------------------------------
\.? '.' (optional (matching the most amount
possible))
----------------------------------------------------------------------
[a-zA-Z0-9 ]+ any character of: 'a' to 'z', 'A' to
'Z', '0' to '9', ' ' (1 or more times
(matching the most amount possible))
----------------------------------------------------------------------
)+ end of grouping
----------------------------------------------------------------------
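Checking this pattern against the three examples from the question (in Python rather than QRegExp, so treat it as a sketch) shows that it rejects a trailing dot, so "camp.new." does not pass. The second pattern below is my own hypothetical tweak, not part of the answer: it adds an optional \. before the $.

```python
import re

# Pattern from the answer above.
original = re.compile(r"^[a-zA-Z](?:\.?[a-zA-Z0-9 ]+)+$")
# Hypothetical variant (an assumption, not from the answer): allow one final dot.
with_trailing_dot = re.compile(r"^[a-zA-Z](?:\.?[a-zA-Z0-9 ]+)+\.?$")

names = ["camp.new.", "camp..new", "ca.mp.n.e.w"]

original_results = {n: bool(original.search(n)) for n in names}
variant_results = {n: bool(with_trailing_dot.search(n)) for n in names}

print(original_results)
# {'camp.new.': False, 'camp..new': False, 'ca.mp.n.e.w': True}
print(variant_results)
# {'camp.new.': True, 'camp..new': False, 'ca.mp.n.e.w': True}
```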
Generally speaking, what you want to do is to say that at each point you've got a ., it is not followed by another ., and otherwise everything is fine. A negative lookahead assertion is all you need here from the big bag of trickiness, but bear in mind that . is an RE metacharacter so there will be some backslashes too.
^(?:[^.]|\.(?!\.))*$
You might want to adjust that further, of course.
In expanded form:
^ # Anchor at start
(?: # Start sub-RE
[^.] # Not a “.”
| # or...
\. (?! \. ) # a “.” if not followed by a “.”
)* # As many of the sub-RE as necessary
$ # Anchor at end
If your RE engine anchors things anyway, you can simplify a little:
(?:[^.]|\.(?!\.))*
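For what it's worth, the lookahead version handles all three examples from the question. A minimal Python sketch, where re.fullmatch plays the role of an engine that anchors implicitly:

```python
import re

anchored = re.compile(r"^(?:[^.]|\.(?!\.))*$")
unanchored = re.compile(r"(?:[^.]|\.(?!\.))*")

names = ["camp.new.", "camp..new", "ca.mp.n.e.w"]

# Explicit anchors with search().
with_anchors = {n: bool(anchored.search(n)) for n in names}
# No anchors in the pattern; fullmatch() requires the whole string to match.
with_fullmatch = {n: bool(unanchored.fullmatch(n)) for n in names}

print(with_anchors)
# {'camp.new.': True, 'camp..new': False, 'ca.mp.n.e.w': True}
print(with_fullmatch)
# {'camp.new.': True, 'camp..new': False, 'ca.mp.n.e.w': True}
```

Note that this pattern only forbids consecutive dots; it also accepts an empty string and any other characters, so it may need tightening for a real validator.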