Regex to contain one of three characters?

I need to write a regex that matches strings that contain at least one of three characters, say x, y and z. I tried "[xyz]^" but it doesn't work. The string may contain any other characters but must contain at least one of the three given characters, in any order or position.

\b\w*(x|y|z)\w*\b
\b assert position at a word boundary (^\w|\w$|\W\w|\w\W)
\w* match any word character [a-zA-Z0-9_]
Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
1st Capturing group (x|y|z)
1st Alternative: x
x matches the character x literally (case sensitive)
2nd Alternative: y
y matches the character y literally (case sensitive)
3rd Alternative: z
z matches the character z literally (case sensitive)
\w* match any word character [a-zA-Z0-9_]
Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
\b assert position at a word boundary (^\w|\w$|\W\w|\w\W)
g modifier: global. All matches (don't return on first match)
m modifier: multi-line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
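As a quick check of the word-level pattern above, here is a minimal Python sketch. It swaps the capturing group (x|y|z) for the equivalent character class [xyz], because re.findall would otherwise return only the captured letter. The function name and the sample sentence are made up for illustration.

```python
import re

# Same idea as \b\w*(x|y|z)\w*\b, with [xyz] instead of a capturing group
# so that findall returns whole words rather than just the captured letter.
WORD_WITH_XYZ = re.compile(r"\b\w*[xyz]\w*\b")

def words_with_xyz(text):
    """Return every word in text that contains at least one of x, y, z."""
    return WORD_WITH_XYZ.findall(text)

print(words_with_xyz("apple lazy box sky"))
```

Note that this picks out individual words, which is a narrower task than testing whether the whole string contains one of the characters.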

This might be what you are looking for:
^.*[xyz].*$

The following regex should match:
^.*[xyz].*$
In Python:
>>> import re
>>> re.match(r'^.*[xyz].*$', 'AzE')
<_sre.SRE_Match object at 0x2643718>
>>> re.match(r'^.*[xyz].*$', 'AEz')
<_sre.SRE_Match object at 0x2643cc8>
>>> re.match(r'^.*[xyz].*$', 'AE')
>>>
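If the goal is only to test whether a string contains x, y or z, re.search with the bare character class should be enough, with no anchors or .* needed. A small sketch (the helper name is hypothetical):

```python
import re

def contains_xyz(text):
    """True if text contains at least one of the characters x, y, z."""
    return re.search(r"[xyz]", text) is not None

for sample in ("AzE", "AEz", "AE"):
    print(sample, contains_xyz(sample))
```

Unlike ^.*[xyz].*$ used with re.match, this also finds the character when it appears after a newline, since . does not match \n by default.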

Related

Validating emails in file with batch

I have a file with emails and I need to validate them.
The sequence is:
First name.
Dot.
Last name.
Number (optional - for same names).
static string domain(#utp.ac.pa).
I wrote this:
egrep -E [a-z]\.+[a-z][0-9]*#["utp.ac.pa"] test.txt
It should match this email: "anell.zheng#utp.ac.pa"
But it is also matching:
test4#utp.ac.pa
2anell#utp.ac.pa
Although they don't follow the sequence. What am I doing wrong?
Your regex doesn't even match the first email. If I understand your requirements correctly, this should work:
[A-Za-z]+\.[A-Za-z]+[0-9]*#utp\.ac\.pa
Note that to match a dot, it needs to be escaped (i.e., \.) because . matches any character.
You can get rid of A-Z if you don't want to match upper-case letters.
Let me know if this isn't what you want.
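To see how the pattern above behaves on the addresses from the question, here is a rough Python sketch. It uses re.fullmatch so the whole string has to match. With grep the pattern is unanchored, so you would probably want -x or explicit ^ and $ to get the same effect. The helper name is invented for the example.

```python
import re

# firstname.lastname, optional digits, then the fixed "#utp.ac.pa" suffix.
EMAIL = re.compile(r"[A-Za-z]+\.[A-Za-z]+[0-9]*#utp\.ac\.pa")

def is_valid(address):
    """True only if the entire address follows the required sequence."""
    return EMAIL.fullmatch(address) is not None

for address in ("anell.zheng#utp.ac.pa", "test4#utp.ac.pa", "2anell#utp.ac.pa"):
    print(address, is_valid(address))
```

Without the full-string requirement, an address such as 2anell.zheng#utp.ac.pa would still be reported, because the pattern can start matching after the leading digit.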
Regex: ^[A-Za-z]+\.[A-Za-z]+(?:_\d+)*#utp\.ac\.pa$
Regex Details:
^ asserts position at start of a line
Match a single character present in the list below [A-Za-z]+
\. matches the character . literally (case sensitive)
Match a single character present in the list below [A-Za-z]+
Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
Non-capturing group (?:_\d+)*
Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
_ matches the character _ literally (case sensitive)
\d+ matches a digit (equal to [0-9])
Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
#utp matches the characters #utp literally (case sensitive)
\. matches the character . literally (case sensitive)
ac matches the characters ac literally (case sensitive)
\. matches the character . literally (case sensitive)
pa matches the characters pa literally (case sensitive)
$ asserts position at the end of a line
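Note that this second pattern expects an underscore before the optional number, which the question does not mention. A short Python sketch of what it accepts and rejects (every sample address other than the first is made up):

```python
import re

ANCHORED = re.compile(r"^[A-Za-z]+\.[A-Za-z]+(?:_\d+)*#utp\.ac\.pa$")

samples = [
    "anell.zheng#utp.ac.pa",    # plain first.last
    "anell.zheng_2#utp.ac.pa",  # underscore followed by a number
    "anell.zheng2#utp.ac.pa",   # number without an underscore
    "test4#utp.ac.pa",          # no dot-separated last name
]

# Map each sample to whether the anchored pattern accepts it.
results = {s: bool(ANCHORED.match(s)) for s in samples}
for s, ok in results.items():
    print(s, ok)
```

If numbers are meant to follow the last name directly, [0-9]* in place of (?:_\d+)* would be the closer fit to the stated sequence.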

Regex / grep match on: "not this" and "that"

From a Linux command line, I would like to find all the instances in multiple files where I reference a figure without prefacing the reference with Fig.
So I'm checking each line for a \ref{fig that is not immediately preceded by exactly Fig. and a space.
(1) Fig. \ref{fig:myFigure}
(2) A sentence with Fig. \ref{fig:myFigure} there.
(3) \ref{fig:myFigure}
(4) A sentence with \ref{fig:myFigure} there.
The regex should ignore cases (1) and (2), but find cases (3) and (4).
You can use Negative Lookahead like:
^((?!Fig\. {0,1}\\ref\{fig).)*$
https://regex101.com/r/wSw9iI/2
Negative Lookahead (?!Fig\.\s*\\ref\{fig) (shown here with \s* where the pattern above uses  {0,1})
Assert that the Regex below does not match
Fig matches the characters Fig literally (case sensitive)
\. matches the character . literally (case sensitive)
\s* matches any whitespace character (equal to [\r\n\t\f\v ])
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\\ matches the character \ literally (case sensitive)
ref matches the characters ref literally (case sensitive)
\{ matches the character { literally (case sensitive)
fig matches the characters fig literally (case sensitive)
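For a quick test outside grep, the sketch below runs the answer's pattern over the four sample lines in Python. It also includes an alternative that is not part of the answer: a negative lookbehind that looks for a \ref{fig not directly preceded by "Fig. ". The names are illustrative only.

```python
import re

LINES = [
    r"Fig. \ref{fig:myFigure}",
    r"A sentence with Fig. \ref{fig:myFigure} there.",
    r"\ref{fig:myFigure}",
    r"A sentence with \ref{fig:myFigure} there.",
]

# Pattern from the answer: the line contains no "Fig. \ref{fig" anywhere.
NO_PREFIXED_REF = re.compile(r"^((?!Fig\. {0,1}\\ref\{fig).)*$")

# Alternative (an assumption, not from the answer): a \ref{fig that is
# not directly preceded by "Fig. ".
BARE_REF = re.compile(r"(?<!Fig\. )\\ref\{fig")

def lines_without_prefix(lines):
    return [line for line in lines if NO_PREFIXED_REF.match(line)]

def lines_with_bare_ref(lines):
    return [line for line in lines if BARE_REF.search(line)]

print(lines_without_prefix(LINES))
print(lines_with_bare_ref(LINES))
```

The two differ on lines that contain no \ref at all: the lookahead pattern reports them, while the lookbehind version skips them.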

regex findall to retrieve a substring based on start and end character

I have the following string:
6[Sup. 1e+02]
I'm trying to retrieve a substring of just 1e+02. The variable first refers to the above specified string. Below is what I have tried.
re.findall(' \d*]', first)
You need to use the following regex:
\b\d+e\+\d+\b
Explanation:
\b - Word boundary
\d+ - Digits, 1 or more
e - Literal e
\+ - Literal +
\d+ - Digits, 1 or more
\b - Word boundary
See demo
Sample code:
import re
# The ur'' prefix is Python 2 only; a plain raw string works in Python 3.
p = re.compile(r'\b\d+e\+\d+\b')
test_str = "6[Sup. 1e+02]"
print(p.findall(test_str))
See IDEONE demo
import re
first = "6[Sup. 1e+02]"
result = re.findall(r"\s+(.*?)\]", first)
print(result)
Output:
['1e+02']
Demo
http://ideone.com/Kevtje
regex Explanation:
\s+(.*?)\]
Match a single character that is a “whitespace character” (ASCII space, tab, line feed, carriage return, vertical tab, form feed) «\s+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regex below and capture its match into backreference number 1 «(.*?)»
Match any single character that is NOT a line break character (line feed) «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “]” literally «\]»
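The two answers take different approaches: one describes the shape of the number, the other grabs whatever sits between the whitespace and the closing bracket. A small Python 3 sketch comparing them (the second input string is invented to show where they diverge):

```python
import re

NUMBER = re.compile(r"\b\d+e\+\d+\b")    # digits, e, +, digits
AFTER_SPACE = re.compile(r"\s+(.*?)\]")  # anything between whitespace and ]

def extract(pattern, text):
    """Return all matches (or captured groups) of pattern in text."""
    return pattern.findall(text)

first = "6[Sup. 1e+02]"
print(extract(NUMBER, first), extract(AFTER_SPACE, first))

# The matched text can be converted to a number directly.
print(float(extract(NUMBER, first)[0]))

# On non-numeric content only the positional pattern returns something.
other = "6[Sup. abc]"
print(extract(NUMBER, other), extract(AFTER_SPACE, other))
```

Which one is preferable probably depends on whether the bracket contents are always in this exponent form.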

How does this regex work when finding the last occurrence of a word?

I came across a regex like the following:
foo(?!.*foo)
If it is fed foo bar bar foo, it will find the last occurrence of foo. I know it uses a mechanism called negative lookahead, which means it matches a word that is not followed by whatever comes after the ?!. But how does the regex here work?
Slightly different answer from sshashank (because the word containing in his answer doesn't work for me and in regex you have to be pedantic—it's all about precision.) I'm 100% sure sshashank knows this and only phrased it that way for brevity.
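The regex matches foo, not followed (i.e., negative lookahead, (?!...)) by this:
{{{any number of any characters (i.e., .*) then the characters foo}}}
If the lookahead fails at a given foo, it is because .*foo did match there, meaning another foo comes later on the line. Only at the last foo does the text after it contain no further foo, so the overall match succeeds there.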
See this automatic translation:
NODE                     EXPLANATION
--------------------------------------------------------------------------------
  foo                      'foo'
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    foo                      'foo'
--------------------------------------------------------------------------------
  )                        end of look-ahead
The same in different words from regex101:
/foo(?!.*foo)/
foo matches the characters foo literally (case sensitive)
(?!.*foo) Negative Lookahead - Assert that it is impossible to match the regex below
.* matches any character (except newline)
Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
foo matches the characters foo literally (case sensitive)
What does RegexBuddy have to say?
foo(?!.*foo)
Match the character string “foo” literally (case sensitive) «foo»
Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!.*foo)»
Match any single character that is NOT a line break character (line feed, carriage return, next line, line separator, paragraph separator) «.*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the character string “foo” literally (case sensitive) «foo»
It matches foo only if it is not followed (?!) by any more text (.*) containing foo in it.
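One way to watch the lookahead at work is to test every foo separately and ask whether another foo follows it. The Python sketch below does that by hand and then compares with the regex itself. The function names are made up for the example.

```python
import re

def last_foo_start(text):
    """Index where foo(?!.*foo) matches, or None if there is no match."""
    m = re.search(r"foo(?!.*foo)", text)
    return m.start() if m else None

def foo_positions(text):
    """For each foo, report (start index, is another foo found by .*foo?)."""
    out = []
    for m in re.finditer(r"foo", text):
        followed = re.match(r".*foo", text[m.end():]) is not None
        out.append((m.start(), followed))
    return out

text = "foo bar bar foo"
print(foo_positions(text))
print(last_foo_start(text))
```

Because . does not match a newline by default, "last" here means last on its line: in "foo\nfoo" the first foo already satisfies the lookahead.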
Negative lookahead is essential if you want to match something not followed by something else.
Short explanation:
foo(?!.*foo) matches foo when it is not followed, later on the same line, by another foo (.* stands for any run of characters except \n).
For example, say you have the following two strings.
foobar
barfoo
And the regular expression:
foo(?!bar)
This matches foo when not followed by bar, so it would match the foo in barfoo here but not the one in foobar.
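The simpler foo(?!bar) case can be checked in a few lines of Python (a sketch; the helper name and the third test string are hypothetical):

```python
import re

NOT_BEFORE_BAR = re.compile(r"foo(?!bar)")

def find_foo(text):
    """Start index of the first foo not followed by bar, or None."""
    m = NOT_BEFORE_BAR.search(text)
    return m.start() if m else None

for sample in ("foobar", "barfoo", "foobar foo"):
    print(sample, find_foo(sample))
```

The lookahead consumes nothing, so the match itself is always just the three characters foo.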

Perl regexp specific letters in string

The input strings consist of the letters I N P U Y X.
- I have to verify that a string contains only these letters and nothing else, using a Perl regexp.
- I also have to verify that the input contains at least 2 occurrences of "NP" (without quotes).
Example string:
INPYUXNPININNPXX
The strings are all in uppercase.
You can use this lookahead based regex in PCRE:
^(?=(?:.*?NP){2})[INPUYX]+$
Online Demo: http://regex101.com/r/zH3jQ3
Explanation:
^ assert position at start of a line
(?=(?:.*?NP){2}) Positive Lookahead - Assert that the regex below can be matched
(?:.*?NP){2} Non-capturing group
Quantifier: Exactly 2 times
.*? matches any character (except newline)
Quantifier: Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
NP matches the characters NP literally (case sensitive)
[INPUYX]+ match a single character present in the list below
Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
INPUYX a single character in the list INPUYX literally (case sensitive)
$ assert position at end of a line
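The same lookahead pattern also works in Python's re module. A minimal sketch, using re.fullmatch so that a trailing newline is not silently accepted (the function name and the extra test strings are assumptions for illustration):

```python
import re

# Lookahead: two occurrences of NP somewhere ahead; body: only allowed letters.
VALID = re.compile(r"^(?=(?:.*?NP){2})[INPUYX]+$")

def is_valid_input(s):
    """True if s uses only I, N, P, U, Y, X and contains NP at least twice."""
    return VALID.fullmatch(s) is not None

for sample in ("INPYUXNPININNPXX", "INPXX", "NPaNP"):
    print(sample, is_valid_input(sample))
```

The lookahead checks the NP count without consuming anything, after which [INPUYX]+ validates the alphabet over the whole string.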
Use this:
^[INPUYX]*NP[INPUYX]*?NP[INPUYX]*$
See it in action: http://regex101.com/r/vI2xQ6
Effectively what we're doing here is allowing 0 or more of your character class, matching the first (required) occurrence of NP, then ensuring that NP occurs at least once more before the end of the string.
Hypothetically if you wanted to capture out the middle, you could do:
^(?=(?:(.*?)NP){2})[INPUYX]+$
Or as @ikegami points out (matching ONLY the single line) \A(?=(?:(.*?)NP){2})[INPUYX]+\z.
The cleanest solution is:
/^[INPUXY]*\z/ && /NP.*NP/s
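The two-check approach translates to Python roughly as follows. This is a sketch: re.fullmatch stands in for the ^...\z anchoring and re.DOTALL for the /s flag, and the function name is invented.

```python
import re

def is_valid_two_step(s):
    """Check the alphabet and the NP count as two separate tests."""
    only_allowed = re.fullmatch(r"[INPUXY]*", s) is not None
    two_np = re.search(r"NP.*NP", s, re.DOTALL) is not None
    return only_allowed and two_np

for sample in ("INPYUXNPININNPXX", "INPXX", "NPaNP"):
    print(sample, is_valid_two_step(sample))
```

Keeping the two conditions separate makes each one easy to read and to change independently.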
The following is the most efficient as it avoids matching the string twice and it prevents backtracking on failure:
/
^
(?: (?:[IPUXY]|N(?!P))* NP ){2}   # N(?!P) is an N not followed by P, so NNP still matches
[INPUXY]*
\z
/x
See in action
To capture what's between the two NP, you can use
/
^
(?:[IPUXY]|N(?!P))* NP            # N(?!P) is an N not followed by P, so NNP still matches
( (?:[IPUXY]|N(?!P))* ) NP
[INPUXY]*
\z
/x
See in action
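A Python sketch of the single-pass idea, written with N(?!P) for the "N not followed by P" branch so that inputs such as NNPNP are handled. The capture group picks up the text between the first two NP. re.fullmatch replaces the ^ and \z anchors, and all names are illustrative.

```python
import re

TWO_NP = re.compile(r"""
    (?: [IPUXY] | N(?!P) )* NP       # text up to and including the first NP
    ( (?: [IPUXY] | N(?!P) )* ) NP   # group 1: text between the two NP
    [INPUXY]*                        # remainder, allowed letters only
""", re.VERBOSE)

def between_first_two_np(s):
    """Text between the first two NP, or None if s is not valid."""
    m = TWO_NP.fullmatch(s)
    return m.group(1) if m else None

for sample in ("INPYUXNPININNPXX", "NNPNP", "INPXX"):
    print(sample, repr(between_first_two_np(sample)))
```

The two branches inside each loop are mutually exclusive, so each character can only be consumed one way, which is the property the answer relies on to limit backtracking.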