How to test url string using regular expression - regex

Below is my code:
/config\/info\/newplan/.test(string)
which returns true when config/info/newplan is found in the string.
However, I would like to test several conditions at the same time, like this:
/config\/info\/newplan/.test(string) || /config\/info\/oldplan/.test(string) || /config\/info\/specplan/.test(string)
which returns true if the path segment after config/info/ is "newplan", "oldplan" or "specplan".
My question is: how can I write this better, without repeating /config\/info\/xxxx/ so many times?

Use an alternation group:
/config\/info\/(?:new|old|spec)plan/.test(string)
               ^^^^^^^^^^^^^^^^
See the regex demo.
Pattern details:
config\/info\/ - a literal config/info/ substring
(?:new|old|spec) - a non-capturing group (where | separates alternatives) matching any one of the substrings: new, old or spec
plan - a literal plan substring
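For readers who want to try the same idea outside JavaScript, here is a minimal sketch in Python's re module. The names PLAN_PATH and is_plan_path are made up for illustration and are not from the question.

```python
import re

# One shared prefix, a non-capturing alternation, and one shared suffix.
PLAN_PATH = re.compile(r"config/info/(?:new|old|spec)plan")


def is_plan_path(url: str) -> bool:
    """Return True if url contains config/info/ followed by newplan, oldplan or specplan."""
    return PLAN_PATH.search(url) is not None


print(is_plan_path("/config/info/oldplan/"))    # True
print(is_plan_path("/config/info/otherplan/"))  # False
```

Because the pattern is not anchored, it behaves like the original .test() calls: it looks for the path anywhere in the string.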

This would be your best bet:
config\/info\/(newplan|oldplan|specplan)\/
or, as a complete test:
/config\/info\/(newplan|oldplan|specplan)\//.test(string)
See the example at https://regex101.com/r/NyP1HP/1. Because the pattern requires the trailing slash, it rejects other possibilities such as the following:
/config/info/new1plan/
/config/info/newoldplan/
/config/info/specplan1/
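The effect of the trailing slash can be checked with a small Python sketch. It assumes Python's re module in place of JavaScript, and STRICT_PLAN_PATH and matched_plan are hypothetical names.

```python
import re
from typing import Optional

# The capturing group records which plan matched; the trailing slash
# forces the plan name to be a complete path segment.
STRICT_PLAN_PATH = re.compile(r"config/info/(newplan|oldplan|specplan)/")


def matched_plan(url: str) -> Optional[str]:
    """Return the plan name found in url, or None if there is no match."""
    m = STRICT_PLAN_PATH.search(url)
    return m.group(1) if m else None


for candidate in ("/config/info/oldplan/", "/config/info/new1plan/",
                  "/config/info/newoldplan/", "/config/info/specplan1/"):
    print(candidate, matched_plan(candidate))
```

Only the first candidate yields a plan name; the other three return None.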

Related

Regex, search for prefix, excluding suffix [duplicate]

How do I write a regular expression that checks whether a string starts with a certain pattern and does NOT end with another pattern?
Example:
Must StartsWith: "US.INR.USD.CONV"
Should not end with: ".VALUE"
Passes Regex: "US.INR.USD.CONV.ABC.DEF.FACTOR"
Fails Regex Check: "US.INR.USD.CONV.ABC.DEF.VALUE"
I am using C#.
You can use this regex based on negative lookahead:
^US\.INR\.USD\.CONV(?!.*?\.VALUE$).*$
Explanation:
^US\.INR\.USD\.CONV - Match US.INR.USD.CONV at start of input
(?!.*?\.VALUE$) - Negative lookahead to make sure the line does not end with .VALUE
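The question uses C#, but the lookahead pattern can be tried unchanged in Python's re module. This is only a sketch; KEY_PATTERN and is_valid_key are illustrative names.

```python
import re

# The prefix must match at the start; the negative lookahead then rejects
# any input whose remainder ends in ".VALUE".
KEY_PATTERN = re.compile(r"^US\.INR\.USD\.CONV(?!.*?\.VALUE$).*$")


def is_valid_key(key: str) -> bool:
    """Return True if key has the required prefix and does not end in .VALUE."""
    return KEY_PATTERN.match(key) is not None


print(is_valid_key("US.INR.USD.CONV.ABC.DEF.FACTOR"))  # True
print(is_valid_key("US.INR.USD.CONV.ABC.DEF.VALUE"))   # False
```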
^US\.INR\.USD\.CONV.*(?<!\.VALUE)$
Try this. See the demo:
https://regex101.com/r/fA6wE2/26
Just use a negative lookbehind to make sure .VALUE does not come right before $, the end of the string.
(?<!\.VALUE)$ - makes the regex engine look behind when it reaches the end of the string and check that .VALUE is not there.
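A sketch of the lookbehind variant in Python follows. It also rejects the .PERCENT suffix mentioned elsewhere in this thread, by chaining a second fixed-width lookbehind. KEY_PATTERN and is_valid_key are illustrative names, and Python stands in for C# here.

```python
import re

# Each lookbehind is checked at the end of the string; both must succeed,
# so neither ".VALUE" nor ".PERCENT" may be the final suffix.
KEY_PATTERN = re.compile(r"^US\.INR\.USD\.CONV.*(?<!\.VALUE)(?<!\.PERCENT)$")


def is_valid_key(key: str) -> bool:
    """Return True if key has the prefix and ends in neither forbidden suffix."""
    return KEY_PATTERN.match(key) is not None


print(is_valid_key("US.INR.USD.CONV.ABC.DEF.FACTOR"))   # True
print(is_valid_key("US.INR.USD.CONV.ABC.DEF.VALUE"))    # False
print(is_valid_key("US.INR.USD.CONV.ABC.DEF.PERCENT"))  # False
```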
You don't need regular expressions for that. You can just use String.StartsWith and String.EndsWith
if (val.StartsWith("US.INR.USD.CONV") && !val.EndsWith(".VALUE"))
{
    // valid
}
And, as you mention in your comment on anubhava's answer, you can do this to check for ".PERCENT" at the end as well:
if (val.StartsWith("US.INR.USD.CONV") &&
    !val.EndsWith(".VALUE") &&
    !val.EndsWith(".PERCENT"))
{
    // valid
}
IMHO this makes the code much more readable and will almost definitely perform faster as well.
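The same regex-free check can be sketched in Python, where str.endswith accepts a tuple of suffixes. The function name is_valid_key is an assumption made for this example.

```python
def is_valid_key(key: str) -> bool:
    """Prefix and suffix checks with plain string methods, no regex involved."""
    forbidden_suffixes = (".VALUE", ".PERCENT")
    return key.startswith("US.INR.USD.CONV") and not key.endswith(forbidden_suffixes)


print(is_valid_key("US.INR.USD.CONV.ABC.DEF.FACTOR"))  # True
print(is_valid_key("US.INR.USD.CONV.ABC.DEF.VALUE"))   # False
```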

Match two regex pattern in single regex [duplicate]

Obviously, you can use the | (pipe?) to represent OR, but is there a way to represent AND as well?
Specifically, I'd like to match paragraphs of text that contain ALL of a certain set of phrases, but in no particular order.
Use a non-consuming regular expression.
The typical (i.e. Perl/Java) notation is:
(?=expr)
This means "match expr but after that continue matching at the original match-point."
You can do as many of these as you want, and this will be an "and." Example:
(?=match this expression)(?=match this too)(?=oh, and this)
You can even add capture groups inside the non-consuming expressions if you need to save some of the data therein.
You need to use lookahead as some of the other responders have said, but the lookahead has to account for other characters between its target word and the current match position. For example:
(?=.*word1)(?=.*word2)(?=.*word3)
The .* in the first lookahead lets it match however many characters it needs to before it gets to "word1". Then the match position is reset and the second lookahead seeks out "word2". Reset again, and the final part matches "word3"; since it's the last word you're checking for, it isn't necessary that it be in a lookahead, but it doesn't hurt.
In order to match a whole paragraph, you need to anchor the regex at both ends and add a final .* to consume the remaining characters. Using Perl-style notation, that would be:
/^(?=.*word1)(?=.*word2)(?=.*word3).*$/m
The 'm' modifier is for multiline mode; it lets the ^ and $ match at paragraph boundaries ("line boundaries" in regex-speak). It's essential in this case that you not use the 's' modifier, which lets the dot metacharacter match newlines as well as all other characters.
Finally, you want to make sure you're matching whole words and not just fragments of longer words, so you need to add word boundaries:
/^(?=.*\bword1\b)(?=.*\bword2\b)(?=.*\bword3\b).*$/m
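Here is a hedged sketch of the same construction in Python, where re.MULTILINE plays the role of the 'm' modifier and the dot does not match newlines by default. The helper lines_with_all_words is a made-up name, and it builds one lookahead per word.

```python
import re
from typing import List


def lines_with_all_words(text: str, words: List[str]) -> List[str]:
    """Return every line of text that contains all of the given whole words."""
    # One lookahead per word; each scans from the start of the line.
    lookaheads = "".join(rf"(?=.*\b{re.escape(w)}\b)" for w in words)
    pattern = re.compile(rf"^{lookaheads}.*$", re.MULTILINE)
    return [m.group(0) for m in pattern.finditer(text)]


sample = "quick fox\nslow fox\nfox is quick\nfoxes are quick"
print(lines_with_all_words(sample, ["fox", "quick"]))
# ['quick fox', 'fox is quick']
```

The last sample line is rejected because of the word boundaries: "foxes" is not the whole word "fox".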
Look at this example:
We have 2 regexps A and B and we want to match both of them, so in pseudo-code it looks like this:
pattern = "/A AND B/"
It can be written without using the AND operator like this:
pattern = "/NOT (NOT A OR NOT B)/"
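in PCRE, where negation is written as a negative lookahead (a bare ^ outside a character class is an anchor, not a NOT operator):
"/^(?!(?!.*A)|(?!.*B))/"
regexp_match(pattern,data)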
The AND operator is implicit in the RegExp syntax.
The OR operator has instead to be specified with a pipe.
The following RegExp:
var re = /ab/;
means the letter a AND the letter b.
It also works with groups:
var re = /(co)(de)/;
it means the group co AND the group de.
Replacing the (implicit) AND with an OR would require the following lines:
var re = /a|b/;
var re = /(co)|(de)/;
You can do that with a regular expression, but you'll probably want to do something else. For example, use several regexps and combine them in an if clause.
You can enumerate all possible permutations with a standard regexp, like this (matches a, b and c in any order):
(abc)|(bca)|(acb)|(bac)|(cab)|(cba)
However, this makes a very long and probably inefficient regexp if you have more than a couple of terms.
If you are using some extended regexp version, like Perl's or Java's, they have better ways to do this. Other answers have suggested using positive lookahead operation.
Is it not possible in your case to AND several match results together? In pseudocode:
regexp_match(pattern1, data) && regexp_match(pattern2, data) && ...
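That pseudocode maps directly onto Python's built-in all(). A minimal sketch, with matches_all as a hypothetical helper name:

```python
import re
from typing import Iterable


def matches_all(patterns: Iterable[str], data: str) -> bool:
    """Return True only if every pattern is found somewhere in data."""
    return all(re.search(p, data) is not None for p in patterns)


print(matches_all([r"\bcat\b", r"\d+"], "1 cat"))  # True
print(matches_all([r"\bcat\b", r"\d+"], "a cat"))  # False
```

Each pattern stays small and readable, and the order in which the patterns occur in the data does not matter.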
Why not use awk?
With awk, combining regexes with AND and OR is simple:
awk '/WORD1/ && /WORD2/ && /WORD3/' myfile
The order is always implied in the structure of the regular expression. To accomplish what you want, you'll have to match the input string multiple times against different expressions.
What you want to do is not possible with a single regexp.
If you use Perl regular expressions, you can use positive lookahead:
For example
(?=[1-9][0-9]{2})[0-9]*[05]\b
would match numbers of at least 100 (three or more digits, no leading zero) that are divisible by 5
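To see which inputs this pattern accepts, here is a small Python sketch that applies it to whole tokens with fullmatch. The names BIG_MULTIPLE_OF_5 and is_big_multiple_of_5 are illustrative only. Note that 100 itself satisfies the pattern.

```python
import re

# Lookahead: at least three digits with no leading zero (so the value is >= 100).
# Main part: any digits, ending in 0 or 5 (so the value is divisible by 5).
BIG_MULTIPLE_OF_5 = re.compile(r"(?=[1-9][0-9]{2})[0-9]*[05]\b")


def is_big_multiple_of_5(token: str) -> bool:
    """Return True if the whole token is a number >= 100 that ends in 0 or 5."""
    return BIG_MULTIPLE_OF_5.fullmatch(token) is not None


for token in ("95", "100", "105", "1234", "2000"):
    print(token, is_big_multiple_of_5(token))
```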
In addition to the accepted answer, I will provide some practical examples that should make things clearer. For example, let's say we have these three lines of text:
[12/Oct/2015:00:37:29 +0200] // only this + will get selected
[12/Oct/2015:00:37:x9 +0200]
[12/Oct/2015:00:37:29 +020x]
See demo here DEMO
What we want to do here is select the + sign, but only if it comes after two digits and a space and before four digits. Those are the only constraints. We would use this regular expression to achieve it:
'~(?<=\d{2} )\+(?=\d{4})~g'
Note that if you split the expression into separate parts, it will give you different results.
Or perhaps you want to select some text between tags... but not the tags! Then you could use:
'~(?<=<p>).*?(?=<\/p>)~g'
for this text:
<p>Hello !</p> <p>I wont select tags! Only text with in</p>
See demo here DEMO
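Both lookaround patterns above can be reproduced in Python, assuming its re module, which supports fixed-width lookbehind. The constant names are invented for this sketch, and the sample data is copied from the answer.

```python
import re

# "+" only when preceded by two digits and a space and followed by four digits.
PLUS_SIGN = re.compile(r"(?<=\d{2} )\+(?=\d{4})")
# Text between <p> and </p>, without the tags themselves.
PARAGRAPH_TEXT = re.compile(r"(?<=<p>).*?(?=</p>)")

log_lines = [
    "[12/Oct/2015:00:37:29 +0200]",
    "[12/Oct/2015:00:37:x9 +0200]",
    "[12/Oct/2015:00:37:29 +020x]",
]
for line in log_lines:
    print(bool(PLUS_SIGN.search(line)))  # True, then False, then False

html = "<p>Hello !</p> <p>I wont select tags! Only text with in</p>"
print(PARAGRAPH_TEXT.findall(html))
# ['Hello !', 'I wont select tags! Only text with in']
```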
You could pipe your output to another regex. Using grep, you could do this:
grep A | grep B
((yes).*(no))|((no).*(yes))
will match a sentence that contains both yes and no, regardless of the order in which they appear:
Do i like cookies? **Yes**, i do. But milk - **no**, definitely no.
**No**, you may not have my phone. **Yes**, you may go f yourself.
Both will match when the pattern is applied case-insensitively.
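A minimal Python sketch of this two-order alternation follows, with the capture groups dropped and the IGNORECASE flag added. The names are illustrative. Without word boundaries the pattern also matches "no" inside longer words such as "nothing".

```python
import re

# Either order is accepted: yes ... no, or no ... yes.
BOTH_ANSWERS = re.compile(r"yes.*no|no.*yes", re.IGNORECASE)


def has_yes_and_no(sentence: str) -> bool:
    """Return True if the sentence contains both 'yes' and 'no' on one line."""
    return BOTH_ANSWERS.search(sentence) is not None


print(has_yes_and_no("Do i like cookies? Yes, i do. But milk - no, definitely no."))  # True
print(has_yes_and_no("Yes, please."))  # False
```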
Use AND outside the regular expression. In PHP the lookahead operator did not seem to work for me, so I used this instead:
if (preg_match("/^.{3,}$/", $pass1) && !preg_match("/\s{1}/", $pass1))
    return true;
else
    return false;
The above regex will match if the password length is 3 characters or more and there are no spaces in the password.
Here is a possible "form" for "and" operator:
Take the following regex for an example:
If we want to match words without the "e" character, we could do this:
/\b[^\We]+\b/g
\W means NOT a "word" character.
^\W (inside a character class, i.e. [^\W]) means a "word" character.
[^\We] means a "word" character, but not an "e".
see it in action: word without e
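The negated-class trick can be checked with a short Python sketch; WORD_WITHOUT_E and words_without_e are made-up names. Since \w also covers digits and the underscore, those count as "word" characters here too.

```python
import re
from typing import List

# [^\We] = "not a non-word character" AND "not e", i.e. a word character other than e.
WORD_WITHOUT_E = re.compile(r"\b[^\We]+\b")


def words_without_e(text: str) -> List[str]:
    """Return the whole words in text that contain no lowercase 'e'."""
    return WORD_WITHOUT_E.findall(text)


print(words_without_e("the quick brown fox jumps over the lazy dog"))
# ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']
```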
"and" Operator for Regular Expressions
I think this pattern can be used as an "and" operator for regular expressions.
In general, if:
A = not a
B = not b
then:
[^AB] = not(A or B)
      = not(A) and not(B)
      = a and b
Difference Set
So, if we want to implement the concept of difference set in regular expressions, we could do this:
a - b = a and not(b)
      = a and B
      = [^Ab]

Regex working in regex engine but not in postgresql

I tried to match number 13 in pipe separated string like the one below:
13 - match
1|2|13 - match
13|1|2 - match
1|13|2 - match
1345|1|2 - should fail
1|1345|2 - should fail
1|2|1345 - should fail
1|4513|2 - should fail
4513|1|2 - should fail
2|3|4|4513 - should fail
So it should match only if 13 occurs as a whole value at the beginning, at the end, or in the middle of the string.
For that I wrote the following regex:
^13$|(\|13\|)?(?(1)|(^13\||\|13$))
In Regex101 it works as expected.
But in PostgreSQL it throws an error for the following query:
SELECT * FROM tbl_privilage WHERE user_id = 24 and show_id ~ '^13$|(\|13\|)?(?(1)|(^13\||\|13$))';
Error:
ERROR: invalid regular expression: quantifier operand invalid
SQL state: 2201B
Don't use a regex; using an array is more robust (and maybe more efficient as well):
select *
from the_table
where '13' = any (string_to_array(the_column, '|'));
This assumes that there is no whitespace between the values and the delimiter. You can even index that expression, which probably makes searching a lot faster.
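The split-and-compare idea is not specific to SQL. As a rough illustration, the same check in Python might look like the sketch below, where has_value is a hypothetical helper and the sample strings come from the question.

```python
def has_value(pipe_separated: str, wanted: str) -> bool:
    """Split on '|' and compare whole values, mirroring string_to_array + ANY."""
    return wanted in pipe_separated.split("|")


for row in ("13", "1|2|13", "13|1|2", "1|13|2", "1345|1|2", "1|4513|2"):
    print(row, has_value(row, "13"))
```

Like the SQL version, this assumes there is no whitespace around the values.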
But I agree with Frank: you should really fix your data model.
The documentation is quite clear: the ~ operator implements POSIX regular expressions. In Regex101 you're using PCRE (Perl-compatible) regular expressions. The two are very different.
If you need PCRE regular expressions in PostgreSQL, you can set up an extension such as pgpcre.
You need to match 13 within word boundaries.
You need
[[:<:]]13[[:>:]]
This solution should work even if you have spaces around the numeric values.
See documentation:
There are two special cases of bracket expressions: the bracket
expressions [[:<:]] and [[:>:]] are constraints, matching empty
strings at the beginning and end of a word respectively.
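For comparison, the word-boundary idea can be tried in Python, whose re module writes both boundaries as \b instead of [[:<:]] and [[:>:]]. This sketch only illustrates the matching logic on the sample rows from the question. It is not PostgreSQL syntax, and contains_13 is an invented name.

```python
import re

# \b13\b: "13" with no word character (digit, letter, underscore) on either side.
WHOLE_13 = re.compile(r"\b13\b")


def contains_13(row: str) -> bool:
    """Return True if 13 appears as a whole value in the pipe-separated row."""
    return WHOLE_13.search(row) is not None


for row in ("13", "1|2|13", "13 | 1 | 2", "1|4513|2", "2|3|4|4513"):
    print(row, contains_13(row))
```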

Refining a regex repeating group

I'm trying to extract two sides of a string delimited by a hyphen
abc - def
At the moment I have
([^-]*)-([^-]*)
Match 1 would be abc and match 2 would be def.
Is there a more elegant way of writing this regular expression so that there are no repeated elements? That is, so that ([^-]*) is not written twice.
Simply use [^-]+ and iterate over the results.
Illustration in Java:
// yours
Matcher m1 = Pattern.compile("([^-]*)-([^-]*)").matcher("abc - def");
if (m1.find()) {
    System.out.println(m1.group(1));
    System.out.println(m1.group(2));
}
// mine
Matcher m2 = Pattern.compile("[^-]+").matcher("abc - def");
while (m2.find()) {
    System.out.println(m2.group());
}
Outputs are identical.
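The same "match the piece and iterate" approach can be sketched in Python with re.findall. The helper name split_sides is made up, and the strip() call is an addition that removes the spaces around the hyphen.

```python
import re
from typing import List


def split_sides(text: str) -> List[str]:
    """Collect every run of non-hyphen characters, trimmed of surrounding spaces."""
    return [part.strip() for part in re.findall(r"[^-]+", text)]


print(re.findall(r"[^-]+", "abc - def"))  # ['abc ', ' def']
print(split_sides("abc - def"))           # ['abc', 'def']
```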
Use a non-greedy match:
(.*?)-(.*)
See a live demo showing it working.
I don't think it can be done more simply than this.
You could just match (.*)-(.*); the hyphen still has to be matched, so it splits the two expressions.
By the way, you can test patterns online on sites such as http://regexpal.com/
You can do it like this:
(?:[^-]*-?){2}
If your regex is more complex, you could split it up into smaller chunks and then reuse those.
For your example this could look like this (Java):
String side = "([^-]*)";
String regex = side + "-" + side;
However, while this is useful for repeated complex regexes (think e-mail validation and such), in your case the version with repetitions is perfectly okay.
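The idea of building the pattern from a reusable fragment carries over to Python unchanged. In this sketch SIDE, PAIR and split_pair are illustrative names, and fullmatch is used so that inputs with more than one hyphen are rejected.

```python
import re
from typing import Optional, Tuple

SIDE = r"([^-]*)"                      # the fragment is written once
PAIR = re.compile(SIDE + "-" + SIDE)   # and reused on both sides of the hyphen


def split_pair(text: str) -> Optional[Tuple[str, str]]:
    """Return the two trimmed sides of 'left - right', or None if it does not fit."""
    m = PAIR.fullmatch(text)
    if m is None:
        return None
    return m.group(1).strip(), m.group(2).strip()


print(split_pair("abc - def"))  # ('abc', 'def')
print(split_pair("a-b-c"))      # None
```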
You can refer to what was matched in an earlier group by using ([^-]*)-\1, but this will only match if the two sides are equal, not if they match the same pattern, i.e., it would match "abc-abc", but not "abc-def".
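A quick Python check of the backreference behaviour described above (SAME_SIDES is a hypothetical name) shows that \1 repeats the matched text rather than the sub-pattern:

```python
import re

# \1 must equal whatever group 1 actually captured.
SAME_SIDES = re.compile(r"([^-]*)-\1")

print(bool(SAME_SIDES.fullmatch("abc-abc")))  # True
print(bool(SAME_SIDES.fullmatch("abc-def")))  # False
```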