I need a regex that detects only the words in a sentence that consist of a combination of numbers and letters.
I am using the regex ^[a-zA-Z0-9]* (see https://regex101.com/r/eSlu2I/1).
The last two test strings there should be excluded.
Can anyone help me with this?
Use
^(?![a-zA-Z]+\b)[a-zA-Z0-9]*
See regex proof.
EXPLANATION
NODE                     EXPLANATION
--------------------------------------------------------------------------------
^                        the beginning of the string
--------------------------------------------------------------------------------
(?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
[a-zA-Z]+                any character of: 'a' to 'z', 'A' to 'Z' (1 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
\b                       the boundary between a word char (\w) and something that is not a word char
--------------------------------------------------------------------------------
)                        end of look-ahead
--------------------------------------------------------------------------------
[a-zA-Z0-9]*             any character of: 'a' to 'z', 'A' to 'Z', '0' to '9' (0 or more times (matching the most amount possible))
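As a quick check of the lookahead pattern above, here is a minimal Python sketch. The sample strings and the helper name leading_token are made up for illustration (the original test strings are only in the regex101 link), and Python's re module stands in for whatever engine you actually use. Because the pattern ends in *, it can also match the empty string, so the helper treats an empty match as no match. Note that a digits-only token such as 123 still passes, since the lookahead only rejects letters-only words.

```python
import re

# Pattern from the answer above: reject a leading letters-only word,
# then take the leading run of letters and digits.
PATTERN = re.compile(r"^(?![a-zA-Z]+\b)[a-zA-Z0-9]*")

def leading_token(text):
    """Return the leading alphanumeric token, or None when the string
    starts with a letters-only word or with no alphanumeric character."""
    m = PATTERN.match(text)
    return m.group(0) if m and m.group(0) else None

for sample in ["abc123", "123", "1abc", "hello", "hello world"]:
    print(sample, "->", leading_token(sample))
```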
You can use the following regex:
\w*\d\w*
Explanation:
\w*: zero or more word characters (letters, digits, or underscore)
\d: a digit
\w*: zero or more word characters (letters, digits, or underscore)
Try it here.
EDIT: In case you require the presence of at least one letter together with the number, you can instead use the following regex:
\w*(\d[A-Za-z]|[A-Za-z]\d)\w*
Explanation:
\w*: zero or more word characters (letters, digits, or underscore)
(\d[A-Za-z]|[A-Za-z]\d): either of
\d[A-Za-z]: a digit followed by a letter, or
[A-Za-z]\d: a letter followed by a digit
\w*: zero or more word characters (letters, digits, or underscore)
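To see how the two patterns differ, here is a small Python sketch. The sample sentence is invented for illustration, and the group in the second pattern is written as non-capturing, (?:...), only so that findall returns whole matches; that is a convenience change, not part of the answer.

```python
import re

# Words containing at least one digit.
HAS_DIGIT = re.compile(r"\w*\d\w*")
# Words containing a digit next to a letter (in either order).
HAS_DIGIT_AND_LETTER = re.compile(r"\w*(?:\d[A-Za-z]|[A-Za-z]\d)\w*")

text = "abc 123 abc123 a1 hello"  # hypothetical sample

digit_words = HAS_DIGIT.findall(text)
mixed_words = HAS_DIGIT_AND_LETTER.findall(text)
print(digit_words)  # ['123', 'abc123', 'a1']
print(mixed_words)  # ['abc123', 'a1']
```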
I am trying to write a regex expression in PCRE which captures the first part of a word and excludes the second portion. The first portion needs to accommodate different values depending upon where the transaction is initiated from. Here is an example:
Raw Text:
.controller.CustomerDemographicsController
Regex Pattern Attempted:
\.controller\.(?P<Controller>\w+)
Result I am trying to achieve (the only content I want to save in the named capture group is CustomerDemographics):
.controller.CustomerDemographicsController
NOTE: I've attempted to exclude using ^, lookbehind, and lookahead.
Any help is greatly appreciated.
You can match word chars in the Controller group up to the last uppercase letter:
\.controller\.(?P<Controller>\w+)(?=\p{Lu})
See the regex demo. Details:
\.controller\. - a .controller. string
(?P<Controller>\w+) - Named capturing group "Controller": one or more word chars as many as possible
(?=\p{Lu}) - the next char must be an uppercase letter.
Note that (?=\p{Lu}) makes the \w+ stop before the last uppercase letter because the \w+ pattern is greedy due to the + quantifier.
Also, use
\.controller\.(?P<Controller>[A-Za-z]+)[A-Z]
See proof.
EXPLANATION:
--------------------------------------------------------------------------------
\.                       '.'
--------------------------------------------------------------------------------
controller               'controller'
--------------------------------------------------------------------------------
\.                       '.'
--------------------------------------------------------------------------------
(?P<Controller>          group and capture to Controller:
--------------------------------------------------------------------------------
[A-Za-z]+                any character of: 'A' to 'Z', 'a' to 'z' (1 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
)                        end of Controller group
--------------------------------------------------------------------------------
[A-Z]                    any character of: 'A' to 'Z'
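A minimal Python sketch of the greedy-then-backtrack idea follows. Python's built-in re module does not support \p{Lu}, so the sketch uses the ASCII-only variant; the helper name controller_prefix and the second and third sample strings are hypothetical.

```python
import re

# [A-Za-z]+ first grabs the whole word, then backtracks until the next
# character is an uppercase letter, i.e. up to the last capitalized part.
PATTERN = re.compile(r"\.controller\.(?P<Controller>[A-Za-z]+)[A-Z]")

def controller_prefix(text):
    """Return the part of the controller name before its last capitalized word."""
    m = PATTERN.search(text)
    return m.group("Controller") if m else None

print(controller_prefix(".controller.CustomerDemographicsController"))
# CustomerDemographics
```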
I need to identify all email addresses in a given cell, enclosed in any special character and written across any number of lines.
This is something that I built.
"(!\s<,;-)[a-zA-Z0-9]*#"
Is there any improvement?
The pattern (!\s<,;-)[a-zA-Z0-9]*# starts with a capture group that matches the exact sequence !, a whitespace character, then <,;- in that order. If you want to match any one of the listed characters, you can use a character class [!\s<,;-] instead.
If you want to match xyz123 in xyz123#gmail.com you can use:
[a-zA-Z0-9]+(?=#)
The pattern matches
[a-zA-Z0-9]+ Match 1+ occurrences of any of the listed ranges
(?=#) Assert (not match) an # directly to the right of the current position
See a regex demo.
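Here is a short Python sketch of the lookahead approach on a multi-line cell. The cell contents are invented for illustration and keep the # separator used in the question.

```python
import re

# One or more letters/digits that are immediately followed by '#'.
# The lookahead asserts the '#' without including it in the match.
LOCAL_PART = re.compile(r"[a-zA-Z0-9]+(?=#)")

cell = "Contact <xyz123#gmail.com>;\nabc#example.org"  # hypothetical cell value

local_parts = LOCAL_PART.findall(cell)
print(local_parts)  # ['xyz123', 'abc']
```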
Use
([a-zA-Z0-9]\w*)#
See regex proof
EXPLANATION
--------------------------------------------------------------------------------
(                        group and capture to \1:
--------------------------------------------------------------------------------
[a-zA-Z0-9]              any character of: 'a' to 'z', 'A' to 'Z', '0' to '9'
--------------------------------------------------------------------------------
\w*                      word characters (a-z, A-Z, 0-9, _) (0 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
)                        end of \1
--------------------------------------------------------------------------------
#                        '#'
Basically I want to match this:
So this. So that. [this should match]
Yes this. No that. [this shouldn't match]
I thought this would work:
(\b(\w+)\1\b.*){2,}
But right now, it's matching the second line too: https://regexr.com/5jhag
Why is this and how to fix it?
Match if the line has two or more of the same capitalized word
Because you want to match capitalized words only, \w alone is not right, since it matches any of the [a-zA-Z0-9_] characters. Also, placing \1 immediately after the capture group matches consecutive repeats only. Finally, \b is required around the matches.
You may use this regex:
\b([A-Z]\w*)\b.*\b\1\b
RegEx Demo
RegEx Details:
\b: Word boundary
([A-Z]\w*): Match a capitalized word that starts with an uppercase letter followed by 0 or more word characters
\b: Word boundary
.*: Match 0 or more of any characters
\b\1\b: Match same word as what we captured in group #1 surrounded with word boundaries
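The two lines from the question can be checked with a small Python sketch. The helper name has_repeated_capitalized_word and the third test sentence are hypothetical; the pattern is the one given above.

```python
import re

# A capitalized word, any text, then the same word again as a whole word.
REPEATED_CAPITALIZED = re.compile(r"\b([A-Z]\w*)\b.*\b\1\b")

def has_repeated_capitalized_word(line):
    """True if some capitalized word occurs at least twice on the line."""
    return REPEATED_CAPITALIZED.search(line) is not None

print(has_repeated_capitalized_word("So this. So that."))   # True
print(has_repeated_capitalized_word("Yes this. No that."))  # False
```

The word boundaries matter: without them, "So" would also be found inside a longer word such as "Sofa".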
(\b(\w+)\1\b.*){2,} is a repeated capturing group. Here \1 is a backreference to the very group it is defined in, so in this engine it holds an empty string at each iteration. Note: if you were to test with the PCRE engine, there would be no match (see proof), because there \1 is not empty but unset, and a backreference to an unset group fails.
Your regex matches Yes this. No that. because the current expression is effectively (\b(\w+)\b.*){2,}, which matches any word, then any text, two or more times.
Use
.*\b([A-Z][a-zA-Z]+)\b.*\b\1\b.*
See proof.
Unicode version:
.*\b(\p{Lu}\p{L}+)\b.*\b\1\b.*
See another proof.
Explanation
--------------------------------------------------------------------------------
.*                       any character except \n (0 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
\b                       the boundary between a word char (\w) and something that is not a word char
--------------------------------------------------------------------------------
(                        group and capture to \1:
--------------------------------------------------------------------------------
[A-Z]                    any character of: 'A' to 'Z'
--------------------------------------------------------------------------------
[a-zA-Z]+                any character of: 'a' to 'z', 'A' to 'Z' (1 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
)                        end of \1
--------------------------------------------------------------------------------
\b                       the boundary between a word char (\w) and something that is not a word char
--------------------------------------------------------------------------------
.*                       any character except \n (0 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
\b                       the boundary between a word char (\w) and something that is not a word char
--------------------------------------------------------------------------------
\1                       what was matched by capture \1
--------------------------------------------------------------------------------
\b                       the boundary between a word char (\w) and something that is not a word char
--------------------------------------------------------------------------------
.*                       any character except \n (0 or more times (matching the most amount possible))
I have two strings below to which I need to apply a regex function in Google BigQuery, with the desired outputs shown:
Input:
MERCURE ENGAGEMENT_LaL_FB_TALENT:HENRIQUE_PORTUGAL_WEEK 4_IMAGE CAROUSEL_I19
MERCURE ENGAGEMENT_LaL_FB_UGC:_ENGLAND_TBC_WEEK 4_IMAGE CAROUSEL_I25
Output:
HENRIQUE
ENGLAND
I cannot use a lookbehind or a positive lookahead within BigQuery.
The closest I have gotten is the following:
:\D*
Which matches the word after the colon but before the white space.
Any ideas would be helpful.
You might also use a capturing group with REGEXP_EXTRACT.
:_?([^\s_]+)
Explanation
:_? Match : and an optional underscore
( Capture group 1
[^\s_]+ Match 1+ times any char other than a whitespace char or an underscore (Omit \s if there can also be spaces in between)
) Close group 1
Regex demo
You could also exclude matching an underscore from a word character which narrows down the range of accepted characters.
:_?([^\W_]+)
One approach uses REGEXP_REPLACE:
SELECT REGEXP_REPLACE(col, r'^.*:_?([^_]+)_.*$', r'\1') AS output
FROM yourTable;
Use
REGEXP_EXTRACT(column_name, r":[^a-zA-Z]*([a-zA-Z]+)")
See regex proof
Explanation
--------------------------------------------------------------------------------
:                        ':'
--------------------------------------------------------------------------------
[^a-zA-Z]*               any character except: 'a' to 'z', 'A' to 'Z' (0 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
(                        group and capture to \1:
--------------------------------------------------------------------------------
[a-zA-Z]+                any character of: 'a' to 'z', 'A' to 'Z' (1 or more times (matching the most amount possible))
--------------------------------------------------------------------------------
)                        end of \1
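The patterns suggested for BigQuery can be compared side by side with a rough Python sketch. Python's re module is only a stand-in here (BigQuery uses RE2), but none of these patterns rely on lookarounds, so the same expressions should carry over. The helper name extract is hypothetical and only mimics REGEXP_EXTRACT with a single capture group.

```python
import re

rows = [
    "MERCURE ENGAGEMENT_LaL_FB_TALENT:HENRIQUE_PORTUGAL_WEEK 4_IMAGE CAROUSEL_I19",
    "MERCURE ENGAGEMENT_LaL_FB_UGC:_ENGLAND_TBC_WEEK 4_IMAGE CAROUSEL_I25",
]

def extract(pattern, text):
    """Mimic REGEXP_EXTRACT with one capture group: return group 1 or None."""
    m = re.search(pattern, text)
    return m.group(1) if m else None

# Capture-group patterns (REGEXP_EXTRACT style).
via_extract = [extract(r":_?([^\s_]+)", r) for r in rows]
via_word_chars = [extract(r":_?([^\W_]+)", r) for r in rows]
via_letters = [extract(r":[^a-zA-Z]*([a-zA-Z]+)", r) for r in rows]
# Whole-string replacement (REGEXP_REPLACE style).
via_replace = [re.sub(r"^.*:_?([^_]+)_.*$", r"\1", r) for r in rows]

print(via_extract)  # ['HENRIQUE', 'ENGLAND']
```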
This is the first question I post so sorry for anything I might screw up.
I've spent the past hour experimenting and searching for a way to replace "not 8 character (digit/a-z/A-Z) long word" with blank space (or in other words delete anything but those words) in Notepad++ through regular expressions.
I managed to bookmark the lines containing them but I'm stuck with the whole line that has that word, I just want those specific words. I'd appreciate any help, thanks a lot!
Edit2: A better way to phrase this is:
To remove anything that isn't: 8 character long that starts with an S and only contains digits and letters. In other words, remove anything that isn't S******* where *=digit,letter
Edit: I realized that's not enough to understand the situation so here's an example. I want to process this:
Here's your first code: S284JF2B
Here's your second code: SKE093JF
Here's your third code: S28fka30
And get this output:
S284JF2B
SD34EQ5M
SASFKA30
The actual file has lots of other characters that are not just digits/letters and the codes I want on the output are always 8 character long (digits/Uppercase letters) always starting with an S.
I have two possible solutions. Both solutions require the string to be 8 characters long and begin with an S.
Given this sample text:
the problem is it's not the words that do not contain any words that I don't want
but actually any string that isn't a string that starts with an S and is 8 character long.
Example: S294KS12 this is the type of string I want on the document. Contains 8 characters
that are either digits or letters and starts with an S
SOMETIME
S294KS12
S1234567
S123456A
Option 1
This solution only finds strings which are 8 characters long and start with an S.
\bS[A-Z0-9]{7}\b
Live Demo
https://regex101.com/r/lK0aO9/1
Matches from Sample
S294KS12
SOMETIME
S294KS12
S1234567
S123456A
Explanation
NODE                     EXPLANATION
----------------------------------------------------------------------
\b                       the boundary between a word char (\w) and something that is not a word char
----------------------------------------------------------------------
S                        'S'
----------------------------------------------------------------------
[A-Z0-9]{7}              any character of: 'A' to 'Z', '0' to '9' (7 times)
----------------------------------------------------------------------
\b                       the boundary between a word char (\w) and something that is not a word char
----------------------------------------------------------------------
Option 2
This solution does additional checking to ensure there is at least one additional letter and one number.
\bS(?=[A-Z]*[0-9])(?=[0-9]*[A-Z])[A-Z0-9]{7}\b
Live Demo
https://regex101.com/r/vH4lX2/3
Matches from Sample
S294KS12
S294KS12
S123456A
Explanation
NODE                     EXPLANATION
----------------------------------------------------------------------
\b                       the boundary between a word char (\w) and something that is not a word char
----------------------------------------------------------------------
S                        'S'
----------------------------------------------------------------------
(?=                      look ahead to see if there is:
----------------------------------------------------------------------
[A-Z]*                   any character of: 'A' to 'Z' (0 or more times (matching the most amount possible))
----------------------------------------------------------------------
[0-9]                    any character of: '0' to '9'
----------------------------------------------------------------------
)                        end of look-ahead
----------------------------------------------------------------------
(?=                      look ahead to see if there is:
----------------------------------------------------------------------
[0-9]*                   any character of: '0' to '9' (0 or more times (matching the most amount possible))
----------------------------------------------------------------------
[A-Z]                    any character of: 'A' to 'Z'
----------------------------------------------------------------------
)                        end of look-ahead
----------------------------------------------------------------------
[A-Z0-9]{7}              any character of: 'A' to 'Z', '0' to '9' (7 times)
----------------------------------------------------------------------
\b                       the boundary between a word char (\w) and something that is not a word char
----------------------------------------------------------------------
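The difference between the two options can be reproduced with a short Python sketch, using the four standalone tokens from the sample text. Python's re module is used here as a stand-in for the Notepad++ engine.

```python
import re

# Option 1: any 8-character uppercase/digit token starting with S.
OPTION_1 = re.compile(r"\bS[A-Z0-9]{7}\b")
# Option 2: additionally require a digit and a letter after the S.
OPTION_2 = re.compile(r"\bS(?=[A-Z]*[0-9])(?=[0-9]*[A-Z])[A-Z0-9]{7}\b")

sample = "SOMETIME S294KS12 S1234567 S123456A"

option_1_matches = OPTION_1.findall(sample)
option_2_matches = OPTION_2.findall(sample)
print(option_1_matches)  # ['SOMETIME', 'S294KS12', 'S1234567', 'S123456A']
print(option_2_matches)  # ['S294KS12', 'S123456A']
```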
Putting all together
To replace everything else, I'd incorporate the regular expression into ( ... )\s?|. (the trailing dot is part of the pattern). This matches every character, with the desired strings captured in group 1.
If you then use $1 in the Replace with option in Notepad++, then you'll be left with just your desired strings.
I recommend using option 2 above, and inserting that into the expression so it looks like this:
(\bS(?=[A-Z]*[0-9])(?=[0-9]*[A-Z])[A-Z0-9]{7}\b)\s?|.
Replace with: $1
Live Demo
https://regex101.com/r/gO7zV7/1
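The "match everything, keep only group 1" trick can be sketched in Python as follows. This is an illustration under assumptions, not the Notepad++ behaviour itself: the helper name keep_only_codes is hypothetical, the replacement is written \1 instead of $1, and the substitution is applied line by line so that the \s? after a code does not swallow the line break between two codes.

```python
import re

# Group 1 captures a wanted code; the "." alternative consumes any other
# single character without capturing it.
KEEP_CODES = re.compile(
    r"(\bS(?=[A-Z]*[0-9])(?=[0-9]*[A-Z])[A-Z0-9]{7}\b)\s?|."
)

def keep_only_codes(line):
    # Every match is replaced by group 1. For the "." branch group 1 is
    # unset, which Python 3.5+ substitutes as an empty string.
    return KEEP_CODES.sub(r"\1", line)

lines = [
    "Here's your first code: S284JF2B",
    "Here's your second code: SKE093JF",
]
cleaned = [keep_only_codes(line) for line in lines]
print(cleaned)  # ['S284JF2B', 'SKE093JF']
```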
Description
Lacking any proper examples, this will find substrings that are 8 characters long and contain no letters, digits, or whitespace. The substring must be preceded by whitespace or the beginning of the string, and followed by whitespace or the end of the string.
(?<=\s|^)[^a-zA-Z0-9\s]{8}(?=\s|$)
Example
Live Demo
https://regex101.com/r/gS9uN7/1
Sample text
I've spent the past hour experimenting and searching for a way to replace "not 8 character (digit/a-z/A-Z) $##!#$>< fd long word" with blank space (or in other words delete anything but those words) in Notepad++ through regular expressions.
Sample Matches
$##!#$><
After Replacement
I've spent the past hour experimenting and searching for a way to replace "not 8 character (digit/a-z/A-Z) fd long word" with blank space (or in other words delete anything but those words) in Notepad++ through regular expressions.
Explanation
NODE                     EXPLANATION
----------------------------------------------------------------------
(?<=                     look behind to see if there is:
----------------------------------------------------------------------
\s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
|                        OR
----------------------------------------------------------------------
^                        the beginning of the string
----------------------------------------------------------------------
)                        end of look-behind
----------------------------------------------------------------------
[^a-zA-Z0-9\s]{8}        any character except: 'a' to 'z', 'A' to 'Z', '0' to '9', whitespace (\n, \r, \t, \f, and " ") (8 times)
----------------------------------------------------------------------
(?=                      look ahead to see if there is:
----------------------------------------------------------------------
\s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
|                        OR
----------------------------------------------------------------------
$                        before an optional \n, and the end of a "line"
----------------------------------------------------------------------
)                        end of look-ahead
----------------------------------------------------------------------
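For anyone trying this outside Notepad++, here is a hedged Python sketch. Python's re module rejects (?<=\s|^) because its look-behind must have a fixed width, so the sketch swaps in (?<!\S), which expresses the same condition (preceded by whitespace or by nothing). The shortened sample text is adapted from the one above.

```python
import re

# (?<!\S) replaces (?<=\s|^): not preceded by a non-whitespace character.
SYMBOL_RUN = re.compile(r"(?<!\S)[^a-zA-Z0-9\s]{8}(?=\s|$)")

text = 'replace "not 8 character (digit/a-z/A-Z) $##!#$>< fd long word" with blank space'

symbol_runs = SYMBOL_RUN.findall(text)
without_runs = SYMBOL_RUN.sub("", text)
print(symbol_runs)  # ['$##!#$><']
```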
To match everything but the tokens from your example:
(^|\s)(?!S\w{7}\b)\S*
For a live demo, see https://regex101.com/r/rW8mF0/4
To match any non-8 character word:
\b\w{1,7}\b|\b\w{9,}\b
It matches words of length 1 - 7 OR words of length 9 and more.
For a live demo, see https://regex101.com/r/fX2sE5/1
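A quick Python sketch of the "any word that is not 8 characters long" pattern, on a made-up three-word sample:

```python
import re

# Whole words of length 1-7, or of length 9 and more.
NOT_EIGHT = re.compile(r"\b\w{1,7}\b|\b\w{9,}\b")

text = "abc S284JF2B abcdefghi"  # hypothetical sample

non_eight = NOT_EIGHT.findall(text)
remaining = NOT_EIGHT.sub("", text).split()
print(non_eight)  # ['abc', 'abcdefghi']
print(remaining)  # ['S284JF2B']
```

Note that this keeps every 8-character word, not only those starting with S.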
I tried the following text, which reproduces your problem with additional non-alphanumeric characters.
<protocol="toto" john="doe" Here's your first code: S284JF2B sign="+" />
<protocol="toto" john="doe" Here's your second code: SKE093JF sign="+" />
<protocol="toto" john="doe" 8char="s2345678" Here's your third code: S28fka30 sign="+" />
I used the following regular expression
\b(\w{1,7}|(?=[^S])\w{8}|S(?![A-Za-z0-9]{7})\w{7}|\w{9,})\b|[^\w\r\n]
Replacing with nothing, with the "Match case" option selected in the Replace window, left only the codes.
With a live demo here: https://regex101.com/r/wW3eB1
Explanations:
\b(...)\b: word between boundaries
\w{1,7}: word of 1 to 7 word characters
(?=[^S])\w{8}: word of 8 characters not starting with 'S'
S(?![A-Za-z0-9]{7})\w{7}: word of 8 characters starting with 'S' but containing something other than alphanumerics (i.e. an underscore)
\w{9,}: word of 9 characters or more
[^\w\r\n]: any character that is neither a word character nor an EOL character
I'd use the following regex ^.*(S[A-Z0-9]{7})(?![A-Z0-9]).*$:
Ctrl+H
Find what: ^.*(S[A-Z0-9]{7})(?![A-Z0-9]).*$
Replace with: $1
DO NOT check . matches newline
Replace all
Explanation:
^ : beginning of line
.* : any character 0 or more times
( : start group 1
S[A-Z0-9]{7}: S followed by 7 alphanumeric characters
) : end group
(?![A-Z0-9]) : negative lookahead to make sure there are no alphanum after
.* : any character 0 or more times
$ : end of line
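The same line-by-line replacement can be sketched in Python, with the negative lookahead written as (?![A-Z0-9]). This is only an approximation of the Notepad++ steps: re.MULTILINE makes ^ and $ work per line, the replacement is \1 instead of $1, matching is case-sensitive, and the sample lines (including the last one) are for illustration.

```python
import re

# Replace each line that contains a code with just the code (group 1).
LINE_TO_CODE = re.compile(r"^.*(S[A-Z0-9]{7})(?![A-Z0-9]).*$", re.MULTILINE)

text = (
    "Here's your first code: S284JF2B\n"
    "Here's your second code: SKE093JF\n"
    "no code here"
)

result = LINE_TO_CODE.sub(r"\1", text)
print(result)
```

Lines that contain no code are left unchanged by this approach, so they would need a separate clean-up step.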