Using ?=. in regular expression

I came across the pattern
^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])[A-Za-z0-9_##%\*\-]{8,24}$
in a regex used as a password-checking mechanism. I have read a few courses about regular expressions, but I have never seen the combination ?=. explained.
I want to know how it works. In the example it checks for at least one capital letter, one lowercase letter, and one digit. I guess it's something like an "if".

(?=regex_here) is a positive lookahead. It is a zero-width assertion, meaning that it matches a location that is followed by the regex contained within (?= and ). To quote from the linked page:
lookaround actually matches characters, but then gives up the match,
returning only the result: match or no match. That is why they are
called "assertions". They do not consume characters in the string, but
only assert whether a match is possible or not. Lookaround allows you
to create regular expressions that are impossible to create without
them, or that would get very longwinded without them.
The . is not part of the lookahead syntax itself. It is the first token of the pattern inside the lookahead, and it matches any single character that is not a line terminator.

Although I am new to regex, here is what I understand about the regex above:
1- ?= is a positive lookahead, i.e. it looks ahead from the current position to check whether a pattern such as [A-Z] can be matched, without consuming any characters.
2- .* allows zero or more characters before the part you are looking for, so the lookahead can scan all the way to the end of the input string to find a match.
In short, * is a quantifier that means "zero or more".
For instance, if you replaced * with ? in the [A-Z] lookahead, the expression would succeed only if the first or second character is a capital letter. If you replaced it with +, the expression would succeed only if some character other than the first is a capital letter.
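The effect of swapping the quantifier can be checked directly. The following is a minimal sketch in Python (an assumption, since the question does not name a language); the helper name capital_found is hypothetical.

```python
import re

def capital_found(quantifier: str, text: str) -> bool:
    """Test a lookahead of the form (?=.<quantifier>[A-Z]) at the start of text."""
    pattern = r"^(?=." + quantifier + r"[A-Z])"
    return re.search(pattern, text) is not None

# .* : zero or more characters may precede the capital, so any position works.
print(capital_found("*", "abcdE"))   # True
print(capital_found("*", "Abcde"))   # True

# .? : at most one character may precede it, so only position 1 or 2 works.
print(capital_found("?", "aBcde"))   # True
print(capital_found("?", "abCde"))   # False

# .+ : at least one character must precede it, so a capital in first place is not enough.
print(capital_found("+", "Abcde"))   # False
print(capital_found("+", "aBcde"))   # True
```

The lookahead itself consumes nothing, so the rest of the pattern still starts matching at the beginning of the string.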

^ asserts position at start of the string
Positive Lookahead (?=\D*\d)
Assert that the Regex below matches
\D matches any character that's not a digit (equivalent to [^0-9])
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\d matches a digit (equivalent to [0-9])
Positive Lookahead (?=[^a-z]*[a-z])
Assert that the Regex below matches
Match a single character not present in the list below [^a-z]
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
Match a single character present in the list below [a-z]
a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
Positive Lookahead (?=[^A-Z]*[A-Z])
Assert that the Regex below matches
Match a single character not present in the list below [^A-Z]
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
Match a single character present in the list below [A-Z]
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
. matches any character (except for line terminators)
{8,30} matches the previous token between 8 and 30 times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of the string, or before the line terminator right at the end of the string (if any)
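Reading the tokens of this breakdown in order gives ^(?=\D*\d)(?=[^a-z]*[a-z])(?=[^A-Z]*[A-Z]).{8,30}$. That assembled pattern is an assumption, since the answer does not show it in one piece. A rough Python sketch of how it behaves, with a hypothetical helper is_valid:

```python
import re

# Assembled from the token-by-token breakdown (assumption: the tokens
# appear in this order in the pattern being described).
PASSWORD_RE = re.compile(r"^(?=\D*\d)(?=[^a-z]*[a-z])(?=[^A-Z]*[A-Z]).{8,30}$")

def is_valid(candidate: str) -> bool:
    """Return True if candidate has a digit, a lowercase and an uppercase letter, length 8 to 30."""
    return PASSWORD_RE.search(candidate) is not None

print(is_valid("Passw0rdOK"))   # True: digit, lowercase, uppercase, 10 characters
print(is_valid("password1"))    # False: no uppercase letter
print(is_valid("PASSWORD1"))    # False: no lowercase letter
print(is_valid("Password"))     # False: no digit
print(is_valid("Pw0"))          # False: shorter than 8 characters
```

Each lookahead here skips over a negated class (\D*, [^a-z]*, [^A-Z]*) instead of .*, which checks the same condition but usually needs less backtracking.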

Related

Validating emails in file with batch

I have a file with emails and I need to validate them.
The sequence is:
First name.
Dot.
Last name.
Number (optional - for same names).
Static domain string (#utp.ac.pa).
I wrote this:
egrep -E [a-z]\.+[a-z][0-9]*#["utp.ac.pa"] test.txt
It should match this email: "anell.zheng#utp.ac.pa"
But it is also matching:
test4#utp.ac.pa
2anell#utp.ac.pa
Although they don't follow the sequence. What am I doing wrong?

Your regex doesn't even match the first email. If I understand your requirements correctly, this should work:
[A-Za-z]+\.[A-Za-z]+[0-9]*#utp\.ac\.pa
Note that to match a dot, it needs to be escaped (i.e., \.) because . matches any character.
You can get rid of A-Z if you don't want to match upper-case letters.
Let me know if this isn't what you want.
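As a quick check of the suggested pattern, here is a small Python sketch (Python is an assumption; the question uses egrep). It keeps the # separator exactly as it appears in the sample data, and adds two extra sample addresses that are made up for illustration.

```python
import re

# Pattern from the answer; '#' is kept as it appears in the sample data.
EMAIL_RE = re.compile(r"[A-Za-z]+\.[A-Za-z]+[0-9]*#utp\.ac\.pa")

samples = [
    "anell.zheng#utp.ac.pa",    # from the question: should match
    "anell.zheng2#utp.ac.pa",   # hypothetical: optional trailing number
    "test4#utp.ac.pa",          # from the question: no dot-separated last name
    "2anell#utp.ac.pa",         # from the question: starts with a digit, no dot
]

# fullmatch requires the whole string to fit the pattern, not just a part of it.
results = {s: EMAIL_RE.fullmatch(s) is not None for s in samples}
for address, ok in results.items():
    print(address, ok)
```

With an unanchored search, a string such as 2anell.zheng#utp.ac.pa would still be accepted because a substring fits; anchoring with ^ and $ (or grep's -x option) closes that gap.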

Regex: ^[A-Za-z]+\.[A-Za-z]+(?:_\d+)*#utp\.ac\.pa$
Regex Details:
^ asserts position at start of a line
Match a single character present in the list below [A-Za-z]+
\. matches the character . literally (case sensitive)
Match a single character present in the list below [A-Za-z]+
Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
Non-capturing group (?:_\d+)*
Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
_ matches the character _ literally (case sensitive)
\d+ matches a digit (equal to [0-9])
Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
#utp matches the characters #utp literally (case sensitive)
\. matches the character . literally (case sensitive)
ac matches the characters ac literally (case sensitive)
\. matches the character . literally (case sensitive)
pa matches the characters pa literally (case sensitive)
$ asserts position at the end of a line

How to exclude comma (,) in regex?

I have a scenario where I only want to accept digits (0-9) and the dot (.). For that I used this regex:
[0-9.]$
This regex accepts 0-9 and . (dot for decimals). But when I enter something like this
1,1
it also accepts the comma (,). How can I avoid this?

Since you are parsing numbers (you said the dot is for decimals), you probably don't want the string to start or end with a dot, and you want to accept at most one dot. If that is the case, try:
^(\d+\.?\d+|\d)$
where:
\d+ stands for any digit (one or more)
\.? stands for zero or one of literal dot
\d stands for any digit (just one)
Or maybe you'd like to accept strings that start with a dot, which normally implies 0 as the integer part. In that case you can use ^\d*\.?\d+$.
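The two patterns from this answer can be compared side by side. A minimal Python sketch (Python is an assumption; the names STRICT and LENIENT and the sample inputs are made up for illustration):

```python
import re

STRICT = re.compile(r"^(\d+\.?\d+|\d)$")   # no leading or trailing dot
LENIENT = re.compile(r"^\d*\.?\d+$")       # also allows a leading dot

def check(text: str) -> tuple:
    """Return (strict_result, lenient_result) for one input string."""
    return (STRICT.search(text) is not None, LENIENT.search(text) is not None)

for text in ["1.1", "12", "5", "1,1", ".5", "5.", "1.2.3"]:
    print(repr(text), check(text))
```

Both patterns reject 1,1, 5. and 1.2.3; they differ only on .5, which the second pattern accepts.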

The regex [0-9.]$ consists of a character class that matches a single digit or dot, followed by $, which asserts the end of the line. Only the last character is checked, which is why 1,1 is accepted.
To allow nothing but a single digit or dot, add ^ to assert the position at the start of the line:
^[0-9.]$
If you want to match one or more digits, a dot and one or more digits you could use:
^[0-9]+\.[0-9]+$
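A short Python sketch of the anchoring point (Python is an assumption, and the helper accepts is hypothetical). The ^[0-9.]+$ variant is not from the answer; it is added only to show a whole-string check.

```python
import re

def accepts(pattern: str, text: str) -> bool:
    """Return True if pattern is found anywhere in text."""
    return re.search(pattern, text) is not None

# Only the final character has to be a digit or dot.
print(accepts(r"[0-9.]$", "1,1"))            # True

# Anchored at both ends: exactly one digit or dot.
print(accepts(r"^[0-9.]$", "1,1"))           # False
print(accepts(r"^[0-9.]$", "7"))             # True

# Variation (not from the answer): every character must be a digit or dot.
print(accepts(r"^[0-9.]+$", "1,1"))          # False

# Digits, one dot, digits.
print(accepts(r"^[0-9]+\.[0-9]+$", "1.1"))   # True
print(accepts(r"^[0-9]+\.[0-9]+$", "1,1"))   # False
```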

This regex may help you:
/[0-9.]+/g
It accepts the digits 0 to 9 and the dot (.).
Explanation:
Match a single character present in the list below [0-9.]+
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
0-9 a single character in the range between 0 (index 48) and 9 (index 57) (case sensitive)
. matches the character . literally (case sensitive)

How to remove specific data along with the angle brackets

I need to remove specific elements from an XML file, angle brackets included. I have tried a lot but couldn't find the right approach. Please help me out.
Example Input:
<isOurAccount>false</isOurAccount><maturityDate/><openedDate/><valuationAmount>0<valuationAmount><value>0</value>
Expected output:
<isOurAccount>false</isOurAccount><valuationAmount>0<valuationAmount><value>0</value>
The same applies to every other element of the form <somevalue/>.
I couldn't work out the right regular expression.
Thanks

A simple regex replace that substitutes the empty string "" for every occurrence of <[a-zA-Z]+ *\/> should suffice.
RegEx description retrieved from https://regex101.com
< matches the characters < literally
[a-zA-Z]+ match a single character present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
a-z a single character in the range between a and z (case sensitive)
A-Z a single character in the range between A and Z (case sensitive)
(space)* matches the space character literally
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
\/ matches the escaped character / literally
> matches the characters > literally
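Applied to the example input from the question, the replacement could look like this in Python (an assumption, since the question names no language; strip_empty_elements is a hypothetical helper name):

```python
import re

# Pattern from the answer: an empty self-closing tag such as <maturityDate/>.
SELF_CLOSING = re.compile(r"<[a-zA-Z]+ *\/>")

def strip_empty_elements(xml_text: str) -> str:
    """Remove every self-closing element whose name consists only of letters."""
    return SELF_CLOSING.sub("", xml_text)

source = ("<isOurAccount>false</isOurAccount><maturityDate/><openedDate/>"
          "<valuationAmount>0<valuationAmount><value>0</value>")
cleaned = strip_empty_elements(source)
print(cleaned)
```

The pattern only covers tag names made of letters with no attributes; names containing digits, namespaces, or attributes would need a wider pattern or an XML parser.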

Regular Expression for alphabets,numbers,spaces and underscores

How can I create a regular expression that matches only letters, numbers, and underscores, with a single space between words?
Good Examples:
Vamshi1
vamshi_pendota
vamshi pendota
Bad Examples:
vam shi1
vam_shi pendota

You should use a regex tester site like http://regex101.com/
You can enter in your examples, and use the quick reference to help you construct the correct regular expression.

With this simple regex:
^[a-zA-Z0-9]+(?:[ _][a-zA-Z0-9]+)?$
Option 2 for capitalization
If only the first letter of each word can be a capital letter, use
^[A-Z]?[a-z0-9]+(?:[ _][A-Z]?[a-z0-9]+)?$
What it means
^[a-zA-Z0-9]+(?:[ _][a-zA-Z0-9]+)?$
Assert position at the beginning of the string ^
Match a single character present in the list below [a-zA-Z0-9]+
Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
A character in the range between “a” and “z” (case sensitive) a-z
A character in the range between “A” and “Z” (case sensitive) A-Z
A character in the range between “0” and “9” 0-9
Match the regular expression below (?:[ _][a-zA-Z0-9]+)?
Between zero and one times, as many times as possible, giving back as needed (greedy) ?
Match a single character from the list “ _” [ _]
Match a single character present in the list below [a-zA-Z0-9]+
Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
A character in the range between “a” and “z” (case sensitive) a-z
A character in the range between “A” and “Z” (case sensitive) A-Z
A character in the range between “0” and “9” 0-9
Assert position at the end of the string, or before the line break at the end of the string, if any (line feed) $
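Running the simple pattern against the examples from the question gives a useful sanity check. A minimal Python sketch (Python is an assumption; the variable names are made up):

```python
import re

SIMPLE = re.compile(r"^[a-zA-Z0-9]+(?:[ _][a-zA-Z0-9]+)?$")

examples = [
    "Vamshi1", "vamshi_pendota", "vamshi pendota",   # listed as good
    "vam shi1", "vam_shi pendota",                   # listed as bad
]
matches = {s: SIMPLE.search(s) is not None for s in examples}
for text, ok in matches.items():
    print(repr(text), ok)
```

The three good examples match and vam_shi pendota is rejected because it contains two separators. vam shi1 also matches, since it is one word, one space, and one word; rejecting it would need a rule that the question does not state.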

Unless you provide any further information, I suspect that what you are after cannot be achieved through a regular expression.
Regular expressions are used to match patterns of strings. In your case, the Good and Bad cases you want to match look the same from a pattern perspective.
Assuming that Vamshi is a valid name but Vam shi is not (despite both having alpha numeric characters and one white space) in your language, I suspect you need to look at a dictionary implementation and not simply a regular expression one.
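EDIT: After seeing your change, something like this should work for you: ^[a-z0-9_]+(\s[a-z0-9_]+)*$. The regular expression expects the string to start with one or more lowercase letters, digits, and/or underscores, optionally followed by further groups of one whitespace character and more such characters. As written it is lowercase-only, so Vamshi1 would need A-Z added to both character classes.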

Regex to match URL end-of-line or "/" character

I have a URL, and I'm trying to match it to a regular expression to pull out some groups. The problem I'm having is that the URL can either end or continue with a "/" and more URL text. I'd like to match URLs like this:
http://server/xyz/2008-10-08-4
http://server/xyz/2008-10-08-4/
http://server/xyz/2008-10-08-4/123/more
But not match something like this:
http://server/xyz/2008-10-08-4-1
So, I thought my best bet was something like this:
/(.+)/(\d{4}-\d{2}-\d{2})-(\d+)[/$]
where the character class at the end contained either the "/" or the end-of-line. The character class doesn't seem to be happy with the "$" in there though. How can I best discriminate between these URLs while still pulling back the correct groups?

To match either / or end of content, use (/|\z)
This only applies if you are not using multi-line matching (i.e. you're matching a single URL, not a newline-delimited list of URLs).
To put that with an updated version of what you had:
/(\S+?)/(\d{4}-\d{2}-\d{2})-(\d+)(/|\z)
Note that I've changed the start to be a non-greedy match for non-whitespace ( \S+? ) rather than matching anything and everything ( .+ ).

You've got a couple regexes now which will do what you want, so that's adequately covered.
What hasn't been mentioned is why your attempt won't work: Inside a character class, $ (as well as ., /, and ^ anywhere but the first position) has no special meaning, so [/$] matches either a literal / or a literal $ rather than terminating the regex (/) or matching end-of-line ($).
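The difference between the character class and an alternation can be seen in a few lines. A minimal Python sketch (Python is an assumption; the helper name matches and the shortened sample strings are made up):

```python
import re

DATE_ID = r"(\d{4}-\d{2}-\d{2})-(\d+)"

def matches(tail: str, text: str) -> bool:
    """Search for the date-and-number part followed by the given tail pattern."""
    return re.search(DATE_ID + tail, text) is not None

# [/$] needs an actual character: a slash or a literal dollar sign.
print(matches(r"[/$]", "xyz/2008-10-08-4"))     # False
print(matches(r"[/$]", "xyz/2008-10-08-4$"))    # True

# (/|$) accepts a slash or the end of the input.
print(matches(r"(/|$)", "xyz/2008-10-08-4"))    # True
print(matches(r"(/|$)", "xyz/2008-10-08-4/"))   # True
print(matches(r"(/|$)", "xyz/2008-10-08-4-1"))  # False
```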

/(.+)/(\d{4}-\d{2}-\d{2})-(\d+)(/.*)?$
1st Capturing Group (.+)
.+ matches any character (except for line terminators)
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
2nd Capturing Group (\d{4}-\d{2}-\d{2})
\d{4} matches a digit (equal to [0-9])
{4} Quantifier — Matches exactly 4 times
- matches the character - literally (case sensitive)
\d{2} matches a digit (equal to [0-9])
{2} Quantifier — Matches exactly 2 times
- matches the character - literally (case sensitive)
\d{2} matches a digit (equal to [0-9])
{2} Quantifier — Matches exactly 2 times
- matches the character - literally (case sensitive)
3rd Capturing Group (\d+)
\d+ matches a digit (equal to [0-9])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
4th Capturing Group (/.*)?
? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)
/ matches the character / literally (case sensitive)
.* matches any character (except for line terminators)
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of the string

In Ruby and Bash, you can use $ inside parentheses.
/(\S+?)/(\d{4}-\d{2}-\d{2})-(\d+)(/|$)
(This solution is similar to Pete Boughton's, but preserves the usage of $, which means end of line, rather than using \z, which means end of string.)
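The same pattern also works in Python, which is used here only as an illustration (the answer mentions Ruby and Bash). The helper name parse is hypothetical, and the four URLs are the ones from the question.

```python
import re

URL_RE = re.compile(r"/(\S+?)/(\d{4}-\d{2}-\d{2})-(\d+)(/|$)")

def parse(url: str):
    """Return (date, number) if the URL fits the pattern, otherwise None."""
    m = URL_RE.search(url)
    return (m.group(2), m.group(3)) if m else None

urls = [
    "http://server/xyz/2008-10-08-4",
    "http://server/xyz/2008-10-08-4/",
    "http://server/xyz/2008-10-08-4/123/more",
    "http://server/xyz/2008-10-08-4-1",
]
for url in urls:
    print(url, parse(url))
```

The first three URLs yield ('2008-10-08', '4') and the last one yields None. In Python, $ matches at the end of the string or just before a trailing newline by default.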
