I am learning about regular expressions and I'm trying to solve this question: https://regex.sketchengine.co.uk/cgi/ex1.cgi
So far, I've come up with:
^[psr][^ta|?!ea].*$
But instead of checking if it doesn't match 'ea' as a substring, it tries to not match 'e' and 'a' as a second character. What is my error in this?
Your regex is wrong, see its description:
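NODE EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
[psr] any character of: 'p', 's', 'r'
--------------------------------------------------------------------------------
[^ta|?!ea] any character except: 't', 'a', '|', '?', '!', 'e', 'a'
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string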
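To see the character-class behaviour from the description in action, here is a minimal Python sketch that runs the original pattern against a few words. The sample words are illustrative assumptions (only respite is mentioned in this thread); they are not the exercise's actual data.

```python
import re

# The asker's pattern: the second token is a negated character class,
# so it rejects any ONE of the characters t, a, |, ?, !, e in position two.
pattern = re.compile(r"^[psr][^ta|?!ea].*$")

# Hypothetical sample words, chosen only to illustrate the behaviour.
words = ["pet", "set", "respite", "sit", "p|x"]

results = {word: bool(pattern.match(word)) for word in words}
for word, matched in results.items():
    print(f"{word!r}: {matched}")
```

Every word whose second character is e is rejected, whatever the first letter is, because | ? and ! have no special meaning inside square brackets.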
Use
.*p[ioa ]t.*
EXPLANATION
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
p 'p'
--------------------------------------------------------------------------------
[ioa ] any character of: 'i', 'o', 'a', ' '
--------------------------------------------------------------------------------
t 't'
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
Your pattern rejects pe, se, and re alike, which rules out respite, but you only want to disallow pe.
You could use a negative lookahead to rule out a p directly followed by one of your characters in the character class.
^(?!p[tea])[psr].*
The pattern matches:
^ Start of the string
(?!p[tea]) Negative lookahead, assert not pt or pe or pa directly to the right
[psr].* Match p, s, or r, followed by any character 0 or more times
Note that there are no | ? or ! in the example data.
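The negative-lookahead pattern can be checked with a short Python sketch. The word list is a hypothetical sample (apart from respite, which is mentioned above), so treat it as an illustration rather than the exercise's data.

```python
import re

# Negative lookahead: refuse strings starting with pt, pe, or pa,
# then require the first character to be p, s, or r.
lookahead = re.compile(r"^(?!p[tea])[psr].*")

# Hypothetical sample words.
words = ["pet", "pat", "pit", "set", "respite", "tea"]

accepted = [word for word in words if lookahead.match(word)]
print(accepted)
```

Here set and respite are accepted because the lookahead only restricts what follows a p.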
I need a regex that detects only the words in a sentence that consist of a combination of numbers and letters.
I am using the regex ^[a-zA-Z0-9]* (https://regex101.com/r/eSlu2I/1).
Here the last two should be excluded.
Can anyone help me with this?
Use
^(?![a-zA-Z]+\b)[a-zA-Z0-9]*
EXPLANATION
NODE EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
[a-zA-Z]+ any character of: 'a' to 'z', 'A' to 'Z'
(1 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
\b the boundary between a word char (\w)
and something that is not a word char
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
[a-zA-Z0-9]* any character of: 'a' to 'z', 'A' to 'Z',
'0' to '9' (0 or more times (matching the
most amount possible))
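As a rough check of the lookahead pattern explained above, the following Python sketch applies it to a handful of inputs. The inputs are assumptions made up for illustration, since the linked test data is not reproduced here.

```python
import re

# Lookahead rejects inputs whose leading run of letters ends at a word
# boundary, i.e. inputs that start with a letters-only word.
pattern = re.compile(r"^(?![a-zA-Z]+\b)[a-zA-Z0-9]*")

# Hypothetical inputs.
candidates = ["abc123", "123", "a1b2", "hello", "hello world"]

matches = {}
for text in candidates:
    m = pattern.match(text)
    matches[text] = m.group(0) if m else None

print(matches)
```

Note that digits-only input such as 123 still matches, and because of the * quantifier an empty string matches as well.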
You can use the following regex:
\w*\d\w*
Explanation:
\w*: optional run of word characters (letters, digits, underscore)
\d: digit
\w*: optional run of word characters (letters, digits, underscore)
EDIT: In case you require the presence of at least one letter together with the number, you can instead use the following regex:
\w*(\d[A-Za-z]|[A-Za-z]\d)\w*
Explanation:
\w*: optional run of word characters (letters, digits, underscore)
(\d[A-Za-z]|[A-Za-z]\d):
\d[A-Za-z]|: digit + alphabetical character or
[A-Za-z]\d: alphabetical character + digit
\w*: optional run of word characters (letters, digits, underscore)
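The difference between the two patterns can be seen in a small Python sketch. The sentence is a made-up example, not the asker's data.

```python
import re

# Hypothetical sentence mixing plain words, a number, and a mixed token.
sentence = "order A12 shipped with 77 and xyz"

# Any word containing at least one digit.
with_digit = re.findall(r"\w*\d\w*", sentence)

# Words containing a letter directly next to a digit. The pattern has a
# capture group, so finditer + group(0) is used to get the whole word.
letter_and_digit = [
    m.group(0)
    for m in re.finditer(r"\w*(\d[A-Za-z]|[A-Za-z]\d)\w*", sentence)
]

print(with_digit)
print(letter_and_digit)
```

The first pattern keeps the purely numeric 77, while the second keeps only the token that mixes letters and digits.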
I'm working on adding a regex that determines whether a given input is valid. The input should be alphanumeric (underscores, dashes, and periods are also allowed) and between 1 and 60 characters long. It should also contain a certain substring (let's just say "foo.bar"). This is my attempt:
^.[a-zA-Z0-9_.-]{1,60}$
That does what I need, aside from the substring part. I'm not sure how to add the "the string must contain the substring foo.bar" requirement. FWIW I'm doing this in Ruby so I understand this means PCRE is being used.
As an example, this string should be valid:
aGreatStringWithfoo.barInIt1111
This one shouldn't be:
aBadStringWithoutTheSubstringInIt
Use
^(?=.{1,60}$)[a-zA-Z0-9_.-]*foo\.bar[a-zA-Z0-9_.-]*$
EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
.{1,60} any character except \n (between 1 and
60 times (matching the most amount
possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of
the string
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
[a-zA-Z0-9_.-]* any character of: 'a' to 'z', 'A' to 'Z',
'0' to '9', '_', '.', '-' (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
foo 'foo'
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
bar 'bar'
--------------------------------------------------------------------------------
[a-zA-Z0-9_.-]* any character of: 'a' to 'z', 'A' to 'Z',
'0' to '9', '_', '.', '-' (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string
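A minimal sketch of the pattern above, written in Python rather than Ruby for illustration. The helper name is_valid is an assumption introduced here; the two example strings come from the question.

```python
import re

# Lookahead enforces the 1..60 length; the rest enforces the allowed
# characters and the literal substring foo.bar (dot escaped).
VALID = re.compile(r"^(?=.{1,60}$)[a-zA-Z0-9_.-]*foo\.bar[a-zA-Z0-9_.-]*$")


def is_valid(candidate: str) -> bool:
    """Hypothetical helper: True if the candidate satisfies all three rules."""
    return VALID.search(candidate) is not None


print(is_valid("aGreatStringWithfoo.barInIt1111"))    # example from the question
print(is_valid("aBadStringWithoutTheSubstringInIt"))  # example from the question
print(is_valid("x" * 53 + "foo.bar"))                 # exactly 60 characters
print(is_valid("x" * 54 + "foo.bar"))                 # 61 characters
```

When porting this to Ruby, keep in mind that ^ and $ match at line boundaries there, so \A and \z are the usual choice for validating a whole string.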
I need to identify all email addresses in a given cell. Each address may be enclosed in any special character, and the cell may span any number of lines.
This is something that I built.
"(!\s<,;-)[a-zA-Z0-9]*#"
Is there any improvement?
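The pattern (!\s<,;-)[a-zA-Z0-9]*# starts with a capture group that matches the exact sequence !, a whitespace character, <, a comma, ; and - in that order, rather than any one of them. If you want to match one of the listed characters, use a character class [!\s<,;-] instead.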
If you want to match xyz123 in xyz123#gmail.com you can use:
[a-zA-Z0-9]+(?=#)
The pattern matches
[a-zA-Z0-9]+ Match 1+ occurrences of any of the listed ranges
(?=#) Assert (not match) an # directly to the right of the current position
Use
([a-zA-Z0-9]\w*)#
EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
[a-zA-Z0-9] any character of: 'a' to 'z', 'A' to
'Z', '0' to '9'
--------------------------------------------------------------------------------
\w* word characters (a-z, A-Z, 0-9, _) (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
# '#'
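Both suggested patterns can be compared with a short Python sketch. The cell content is a hypothetical multi-line value that reuses the xyz123#gmail.com example; the second address is invented.

```python
import re

# Hypothetical cell content spanning two lines, with addresses
# wrapped in assorted special characters.
cell = "<xyz123#gmail.com>;\n!abc#example.org"

# Lookahead version: the # is asserted but not part of the match.
with_lookahead = re.findall(r"[a-zA-Z0-9]+(?=#)", cell)

# Capture-group version: the # is consumed, group 1 holds the name.
with_group = re.findall(r"([a-zA-Z0-9]\w*)#", cell)

print(with_lookahead)
print(with_group)
```

The two differ only when the name contains an underscore: \w accepts it, while [a-zA-Z0-9]+ does not.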
I have the two strings below, to which I need to apply a regex function in Google BigQuery to get the desired outputs. Input:
MERCURE ENGAGEMENT_LaL_FB_TALENT:HENRIQUE_PORTUGAL_WEEK 4_IMAGE CAROUSEL_I19
MERCURE ENGAGEMENT_LaL_FB_UGC:_ENGLAND_TBC_WEEK 4_IMAGE CAROUSEL_I25
Output:
HENRIQUE
ENGLAND
I cannot use a reverse or positive lookahead within BigQuery.
The closest I have gotten is the following:
:\D*
Which matches the word after the colon but before the white space.
Any ideas would be helpful.
You might also use a capturing group with REGEXP_EXTRACT.
:_?([^\s_]+)
Explanation
:_? Match : and an optional underscore
( Capture group 1
[^\s_]+ Match 1+ times any char other than a whitespace char or an underscore (Omit \s if there can also be spaces in between)
) Close group 1
You could also exclude matching an underscore from a word character which narrows down the range of accepted characters.
:_?([^\W_]+)
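Outside BigQuery, the two capture-group patterns can be tried with Python's re module, which treats these particular patterns the same way. This is only a sketch on the two input strings from the question; the variable names are assumptions.

```python
import re

rows = [
    "MERCURE ENGAGEMENT_LaL_FB_TALENT:HENRIQUE_PORTUGAL_WEEK 4_IMAGE CAROUSEL_I19",
    "MERCURE ENGAGEMENT_LaL_FB_UGC:_ENGLAND_TBC_WEEK 4_IMAGE CAROUSEL_I25",
]

# Colon, optional underscore, then capture up to the next whitespace or underscore.
no_space_or_underscore = [re.search(r":_?([^\s_]+)", row).group(1) for row in rows]

# Same idea, but the capture is limited to word characters other than underscore.
word_chars_only = [re.search(r":_?([^\W_]+)", row).group(1) for row in rows]

print(no_space_or_underscore)
print(word_chars_only)
```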
One approach uses REGEXP_REPLACE:
SELECT REGEXP_REPLACE(col, r'^.*:_?([^_]+)_.*$', r'\1') AS output
FROM yourTable;
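The replace-based approach can be mimicked in Python with re.sub to see what the pattern keeps. This is a sketch on the question's two inputs, not BigQuery itself.

```python
import re

rows = [
    "MERCURE ENGAGEMENT_LaL_FB_TALENT:HENRIQUE_PORTUGAL_WEEK 4_IMAGE CAROUSEL_I19",
    "MERCURE ENGAGEMENT_LaL_FB_UGC:_ENGLAND_TBC_WEEK 4_IMAGE CAROUSEL_I25",
]

# The pattern spans the whole string, so replacing the match with group 1
# leaves only the token that follows the colon.
cleaned = [re.sub(r"^.*:_?([^_]+)_.*$", r"\1", row) for row in rows]

print(cleaned)
```

One thing to watch with this approach: in Python, re.sub returns the input unchanged when the pattern does not match, so a row without a colon would pass through whole.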
Use
REGEXP_EXTRACT(column_name, r":[^a-zA-Z]*([a-zA-Z]+)")
Explanation
--------------------------------------------------------------------------------
: ':'
--------------------------------------------------------------------------------
[^a-zA-Z]* any character except: 'a' to 'z', 'A' to
'Z' (0 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
[a-zA-Z]+ any character of: 'a' to 'z', 'A' to 'Z'
(1 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
) end of \1
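The pattern explained above can be sketched in Python as follows. The helper name first_word_after_colon and the third test row are assumptions added for illustration.

```python
import re
from typing import Optional

# Colon, then skip anything that is not a letter, then capture the letters.
PATTERN = re.compile(r":[^a-zA-Z]*([a-zA-Z]+)")


def first_word_after_colon(text: str) -> Optional[str]:
    """Hypothetical helper: the first run of letters after a colon, or None."""
    m = PATTERN.search(text)
    return m.group(1) if m else None


rows = [
    "MERCURE ENGAGEMENT_LaL_FB_TALENT:HENRIQUE_PORTUGAL_WEEK 4_IMAGE CAROUSEL_I19",
    "MERCURE ENGAGEMENT_LaL_FB_UGC:_ENGLAND_TBC_WEEK 4_IMAGE CAROUSEL_I25",
    "NO COLON HERE",  # invented row to show the no-match case
]

extracted = [first_word_after_colon(row) for row in rows]
print(extracted)
```

Because [^a-zA-Z]* skips any non-letters, this version also tolerates digits or several underscores between the colon and the word.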
I'm a beginner in Qt and C++ programming. I want to use a regular expression validator in my line edit that doesn't allow a dot (.) to be typed right after another dot (.). This is the regex I've used:
QRegExp reName("[a-zA-Z][a-zA-Z0-9. ]+ ")
But this is not enough for my task. Could someone please help me?
I'm looking for something like this, for example:
"camp.new." (accepted)
"camp..new" (not accepted)
"ca.mp.n.e.w" (accepted)
How about:
^[a-zA-Z](?:\.?[a-zA-Z0-9 ]+)+$
Explanation:
The regular expression:
^[a-zA-Z](?:\.?[a-zA-Z0-9 ]+)+$
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
[a-zA-Z] any character of: 'a' to 'z', 'A' to 'Z'
----------------------------------------------------------------------
(?: group, but do not capture (1 or more times
(matching the most amount possible)):
----------------------------------------------------------------------
\.? '.' (optional (matching the most amount
possible))
----------------------------------------------------------------------
[a-zA-Z0-9 ]+ any character of: 'a' to 'z', 'A' to
'Z', '0' to '9', ' ' (1 or more times
(matching the most amount possible))
----------------------------------------------------------------------
)+ end of grouping
----------------------------------------------------------------------
$ before an optional \n, and the end of the
string
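Running this pattern against the three examples from the question (in Python here, as a sketch rather than Qt code) shows that it rejects the trailing dot in "camp.new.", which the question lists as accepted. The variant with an extra optional \.? before $ is an assumption added for comparison, not part of the answer.

```python
import re

# Pattern from the answer: each optional dot must be followed by
# at least one letter, digit, or space.
answer = re.compile(r"^[a-zA-Z](?:\.?[a-zA-Z0-9 ]+)+$")

# Hypothetical variant: additionally allow a single trailing dot.
variant = re.compile(r"^[a-zA-Z](?:\.?[a-zA-Z0-9 ]+)+\.?$")

samples = ["camp.new.", "camp..new", "ca.mp.n.e.w"]

answer_results = {s: bool(answer.match(s)) for s in samples}
variant_results = {s: bool(variant.match(s)) for s in samples}

print(answer_results)
print(variant_results)
```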
Generally speaking, what you want to do is to say that at each point you've got a ., it is not followed by another ., and otherwise everything is fine. A negative lookahead assertion is all you need here from the big bag of trickiness, but bear in mind that . is an RE metacharacter so there will be some backslashes too.
^(?:[^.]|\.(?!\.))*$
You might want to adjust that further, of course.
In expanded form:
^ # Anchor at start
(?: # Start sub-RE
[^.] # Not a “.”
| # or...
\. (?! \. ) # a “.” if not followed by a “.”
)* # As many of the sub-RE as necessary
$ # Anchor at end
If your RE engine anchors things anyway, you can simplify a little:
(?:[^.]|\.(?!\.))*
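A quick way to try the unanchored form is Python's re.fullmatch, which anchors the pattern at both ends by itself. This is a sketch with the question's examples plus two extra inputs added as assumptions; the helper name accepts is hypothetical.

```python
import re

# Either a non-dot character, or a dot that is not followed by another dot.
no_double_dot = re.compile(r"(?:[^.]|\.(?!\.))*")


def accepts(text: str) -> bool:
    """Hypothetical helper: True if the text contains no two consecutive dots."""
    return no_double_dot.fullmatch(text) is not None


for sample in ["camp.new.", "camp..new", "ca.mp.n.e.w", "", ".camp"]:
    print(f"{sample!r}: {accepts(sample)}")
```

The pattern only forbids consecutive dots, so the empty string and a leading dot are accepted; combine it with a first-character check if the name must start with a letter.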