Regex for URL rewrite with if-then-else

Given these URLs:
1: http://site/page-name-one-123/
2: http://site/page-name-set2/
3: http://site/set20
I wrote this expression, which will be applied to the last URL segment:
(?(?<=set[\d])([\d]+)|([^/]+))
What I want is to capture the digits preceded by 'set', but only if the URL segment starts with 'set' immediately followed by a digit; otherwise I want to use the whole segment (excluding slashes).
As written, this regex matches any character that is not a '/'. I think I'm doing something wrong in the test (condition) part.
Could anyone point me in the right direction?
Thanks
UPDATE
Thanks to Josh's input I played around for a bit and found that this one fits my needs better:
set-(?P<number>[0-9]+)|(?P<segment>[^/]+)
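The alternation with named groups can be tried out quickly. Below is a minimal sketch in Python, not the asker's actual setup: the helper name `classify` is hypothetical, and the pattern drops the hyphen after `set` (an assumption, made so that the sample URL ending in `set20` takes the numeric branch). Using `fullmatch` on the last segment means the `number` branch only wins when the whole segment is `set` plus digits.

```python
import re

# Assumption: "set" is followed directly by digits (as in "set20");
# the pattern in the update uses "set-" instead.
PATTERN = re.compile(r"set(?P<number>[0-9]+)|(?P<segment>[^/]+)")


def classify(url):
    """Return ('number', digits) or ('segment', text) for the last URL segment."""
    last = url.rstrip("/").rsplit("/", 1)[-1]
    m = PATTERN.fullmatch(last)
    if m is None:
        return None
    if m.group("number") is not None:
        return ("number", m.group("number"))
    return ("segment", m.group("segment"))


for url in (
    "http://site/page-name-one-123/",
    "http://site/page-name-set2/",
    "http://site/set20",
):
    print(url, "->", classify(url))
```

Only the third URL yields a `number`; the other two fall through to `segment` with the whole segment text.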

I hope this pattern helps; I put it together based on your requirements. You may want to make some of the groups non-capturing so that you only get the segments you need. However, it does separately capture your set URLs without set at the start.
((?<=/{1})(((?<!set)[\w|-]*?)(\d+(?=/?))|((?:set)\d+)))
I suggest using RegExr to pick it apart if you need to.

Try this:
((?<=/)set\d+|(?<=/)[^/]+?set\d+)
Explanation
Options: ^ and $ match at line breaks
Match the regular expression below and capture its match into backreference number 1 «((?<=/)set\d+|(?<=/)[^/]+?set\d+)»
Match either the regular expression below (attempting the next alternative only if this one fails) «(?<=/)set\d+»
Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=/)»
Match the character “/” literally «/»
Match the characters “set” literally «set»
Match a single digit 0..9 «\d+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Or match regular expression number 2 below (the entire group fails if this one fails to match) «(?<=/)[^/]+?set\d+»
Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=/)»
Match the character “/” literally «/»
Match any character that is NOT a “/” «[^/]+?»
Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
Match the characters “set” literally «set»
Match a single digit 0..9 «\d+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
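To see what this pattern actually returns for the three URLs in the question, here is a small Python sketch. The pattern is copied unchanged; the helper name `find_set_segment` is made up for illustration.

```python
import re

# Pattern from the answer above, unchanged.
SET_SEGMENT = re.compile(r"((?<=/)set\d+|(?<=/)[^/]+?set\d+)")

urls = [
    "http://site/page-name-one-123/",
    "http://site/page-name-set2/",
    "http://site/set20",
]


def find_set_segment(url):
    """Return the matched segment text, or None if there is no match."""
    m = SET_SEGMENT.search(url)
    return m.group(1) if m else None


for url in urls:
    print(url, "->", find_set_segment(url))
```

Note that the match is the whole segment up to and including `set` plus digits, not the digits alone, and that a segment without `set` followed by digits produces no match at all.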


Regular expression to match string in brackets

I'm building a regular expression that has to extract strings from brackets. This is an example string:
((?X is parent ?Y)(?X is child ?Z))
I need to get the strings '?X is parent ?Y' and '?X is child ?Z'. This is what I have so far:
^(\((.*?)\))+$
The problem is that it matches only the string in the second bracket. Could anybody help me improve the expression so that it matches both strings in brackets?
Note: the brackets can contain any content, like ((AAA)(BBB)). In this case 'AAA' and 'BBB' should be matched.
Thanks in advance.
Based on your comments, it seems that you just want to match anything inside the brackets. For that you can use:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String sample1 = "((something)(world)(example))";
// Optional outer "(", then "(", lazily captured content, ")", optional outer ")"
Pattern regex = Pattern.compile("\\(?\\((.*?)\\)\\)?");
Matcher regexMatcher = regex.matcher(sample1);
while (regexMatcher.find()) {
    // Prints something, world and example, one per line
    System.out.println(regexMatcher.group(1));
}
Regex Explanation
Match the character “(” literally «\(?»
Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match the character “(” literally «\(»
Match the regular expression below and capture its match into backreference number 1 «(.*?)»
Match any single character that is not a line break character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “)” literally «\)»
Match the character “)” literally «\)?»
Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
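The same pattern can be checked against the strings from the question. This is a rough Python port, assuming `re.findall` as the stand-in for the `find()` loop; `bracket_contents` is a hypothetical helper name.

```python
import re

# Python port of the pattern explained above; a raw string avoids
# the doubled backslashes needed in a Java string literal.
BRACKETS = re.compile(r"\(?\((.*?)\)\)?")


def bracket_contents(text):
    """Return the text of every bracketed group, in order of appearance."""
    return BRACKETS.findall(text)


print(bracket_contents("((?X is parent ?Y)(?X is child ?Z))"))
# ['?X is parent ?Y', '?X is child ?Z']
print(bracket_contents("((AAA)(BBB))"))
# ['AAA', 'BBB']
```

Because the capture is lazy and stops at the first closing bracket, this only works for one level of inner groups; deeper nesting would need a different approach.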
This seems to work:
Pattern.compile("[\\(]{0,1}(\\((.*?)\\))")
Thanks all for replies and comments.

Trying to build regex need inputs on that

I have a string:
<sip:a39pbx#47.168.156.141:5060;maddr=47.168.156.141>;expires=703,<sip:739pbxast25#47.168.156.141:5060;maddr=47.168.156.141>;expires=826;
I want to extract each expires and its value. I have tried using
(.*)(expires=\d*);(.*)
But it gives me only the last one, expires=826. I also want to select the other one, which ends with a comma.
Any input is appreciated.
Try this:
(expires=(\d+)),
The trailing comma lets you match the value that ends with a comma. It gives two captures: one with the key and value, and another with the value only.
(expires=\d+)
Live demo: http://regex101.com/r/sQ5bJ7
Explanation
Match the regular expression below and capture its match into backreference number 1 «(expires=\d+)»
Match the characters “expires=” literally «expires=»
Match a single digit 0..9 «\d+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
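A short Python sketch comparing the three patterns from this thread on the string in the question. The variable names are illustrative only. It shows why the original attempt returns only the last value (the leading greedy `(.*)` consumes as much as it can) and how a global search returns every occurrence.

```python
import re

contact = (
    "<sip:a39pbx#47.168.156.141:5060;maddr=47.168.156.141>;expires=703,"
    "<sip:739pbxast25#47.168.156.141:5060;maddr=47.168.156.141>;expires=826;"
)

# Original attempt: the greedy (.*) backtracks only as far as the LAST
# "expires=...;" so a single search yields one value.
greedy = re.search(r"(.*)(expires=\d*);(.*)", contact)
print(greedy.group(2))  # expires=826

# Searching repeatedly for the key itself returns every value.
every_value = re.findall(r"expires=(\d+)", contact)
print(every_value)  # ['703', '826']

# Requiring a trailing comma finds only the occurrence followed by ",".
before_comma = re.findall(r"(expires=(\d+)),", contact)
print(before_comma)  # [('expires=703', '703')]
```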

regex to match a word with unique (non-repeating) characters

I'm looking for a regex that will match a word only if all its characters are unique, meaning, every character in the word appears only once.
Example:
abcdefg -> will return MATCH
abcdefgbh -> will return NO MATCH (because the letter b appears more than once)
Try this; it might work:
^(?:([A-Za-z])(?!.*\1))*$
Explanation
Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
Match the regular expression below «(?:([A-Za-z])(?!.*\1))*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the regular expression below and capture its match into backreference number 1 «([A-Za-z])»
Match a single character in the range between “A” and “Z” or between “a” and “z” «[A-Za-z]»
Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!.*\1)»
Match any single character that is not a line break character «.*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the same text as most recently matched by capturing group number 1 «\1»
Assert position at the end of a line (at the end of the string or before a line break character) «$»
You can check whether there are two instances of a character in the string:
^.*(.).*\1.*$
(I simply capture one character and check, with a backreference, whether it has a copy elsewhere. The remaining .* are don't-cares.)
If the regex above matches, the string has a repeating character. If it doesn't match, all the characters are unique.
The advantage of the regex above is that it works even when the regex engine doesn't support lookaround.
John Woo's solution is a beautiful way to check for uniqueness directly. It asserts at every character that the rest of the string does not contain the current character.
This one would also provide a full match to any length word with non-repeating letters:
^(?!.*(.).*\1)[a-z]+$
I slightly revised the answer provided by @Bohemian to another question a while ago to get this.
It has been a while since the question above was asked, but I thought it would be nice to have this regex pattern here as well.
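The three patterns in this thread can be compared side by side. The sketch below uses Python's `re` module and adds a plain `set`-based check (`all_unique`, a hypothetical helper) as a reference; the constant names are made up for illustration.

```python
import re

# Per-character negative lookahead (first answer).
LOOKAHEAD_PER_CHAR = re.compile(r"^(?:([A-Za-z])(?!.*\1))*$")
# Inverted test: matches when some character repeats (no lookaround needed).
HAS_REPEAT = re.compile(r"^.*(.).*\1.*$")
# Single negative lookahead up front, lowercase letters only.
SINGLE_LOOKAHEAD = re.compile(r"^(?!.*(.).*\1)[a-z]+$")


def all_unique(word):
    """Reference check without regex: no character occurs twice."""
    return len(set(word)) == len(word)


for word in ("abcdefg", "abcdefgbh"):
    print(
        word,
        bool(LOOKAHEAD_PER_CHAR.match(word)),
        HAS_REPEAT.match(word) is None,
        bool(SINGLE_LOOKAHEAD.match(word)),
        all_unique(word),
    )
```

All four checks agree on the two sample words. One difference worth knowing: the first pattern also accepts the empty string because of its `*`, while the third requires at least one letter.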

What would be the regex pattern for a set of numbers separated with a comma

The possible values are...
1 (it will always start with a number)
1,2
4,6,10
You can try something like this:
^[0-9]+(,[0-9]+)*
This should do it:
(\d+,?)+
This will do:
-?[0-9]+(,-?[0-9]+)*
Or, if you want to be pedantic and disallow numbers starting with 0 (other than 0 itself):
(0|-?[1-9][0-9]*)(,(0|-?[1-9][0-9]*))*
Floating-point numbers are left as an exercise to the reader.
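For validating a whole input, anchoring matters: an unanchored pattern will happily match a prefix of a bad string. A minimal sketch in Python, assuming unsigned integers only and using `fullmatch` in place of explicit `^` and `$`; `parse_number_list` is a hypothetical helper.

```python
import re

# One or more digits, then zero or more ",digits" groups.
NUMBER_LIST = re.compile(r"[0-9]+(?:,[0-9]+)*")


def parse_number_list(text):
    """Return the numbers as ints, or None if the text is not a valid list."""
    if NUMBER_LIST.fullmatch(text) is None:
        return None
    return [int(part) for part in text.split(",")]


for sample in ("1", "1,2", "4,6,10", "1,", ",1", "1,,2"):
    print(repr(sample), "->", parse_number_list(sample))
```

By contrast, `(\d+,?)+` makes the comma optional after every number, so under the same full-match test it also accepts a trailing comma such as `1,`.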
You'll want
(?<=(?:,|^))\d+(?=(?:$|,))
RegexBuddy explains it as...
Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=(?:,|^))»
Match the regular expression below «(?:,|^)»
Match either the regular expression below (attempting the next alternative only if this one fails) «,»
Match the character "," literally «,»
Or match regular expression number 2 below (the entire group fails if this one fails to match) «^»
Assert position at the start of the string «^»
Match a single digit 0..9 «\d+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=(?:$|,))»
Match the regular expression below «(?:$|,)»
Match either the regular expression below (attempting the next alternative only if this one fails) «$»
Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
Or match regular expression number 2 below (the entire group fails if this one fails to match) «,»
Match the character "," literally «,»
I would explain it as: "match any string of digits, confirming that before it comes either the start of the string or a comma, and that after it comes either the end of the string or a comma." Nothing else.
The important thing is to use non-capturing groups (?:) instead of simply () to help overall performance.
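The lookaround approach extracts the individual numbers rather than validating the whole string. A sketch in Python follows; since Python's `re` module requires lookbehinds to have a fixed width, the `(?<=(?:,|^))` part is rewritten here as `(?:(?<=,)|^)`, which I believe expresses the same condition. The name `NUMBERS` is illustrative.

```python
import re

# Digits preceded by a comma or the start of the string,
# and followed by a comma or the end of the string.
NUMBERS = re.compile(r"(?:(?<=,)|^)\d+(?=,|$)")

print(NUMBERS.findall("4,6,10"))    # ['4', '6', '10']
print(NUMBERS.findall("4,a6,10"))   # ['4', '10']
print(NUMBERS.findall("4,66x,10"))  # ['4', '10']
```

Items that are not purely digits between separators are skipped rather than rejected, so this is a complement to whole-string validation, not a replacement for it.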

How do you understand regular expressions that are written in one line?

This is a neat, well-documented regular expression that is easy to understand, maintain and modify.
text = text.replace(/
    (                       // Wrap whole match in $1
        (
            ^[ \t]*>[ \t]?  // '>' at the start of a line
            .+\n            // rest of the first line
            (.+\n)*         // subsequent consecutive lines
            \n*             // blanks
        )+
    )
/gm,
But how do you go about working with these?
text = text.replace(/((^[ \t]*>[ \t]?.+\n(.+\n)*\n*)+)/gm,
Is there a beautifier of some sort that makes sense of it and describes its functionality?
It's worth the effort to become adept at reading regexes in the one-line form. Most of the time they are written this way.
RegexBuddy will "translate" any regex for you. When fed your example regex, it outputs:
((^[ \t]*>[ \t]?.+\n(.+\n)*\n*)+)
Options: ^ and $ match at line breaks
Match the regular expression below and capture its match into backreference number 1 «((^[ \t]*>[ \t]?.+\n(.+\n)*\n*)+)»
Match the regular expression below and capture its match into backreference number 2 «(^[ \t]*>[ \t]?.+\n(.+\n)*\n*)+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Note: You repeated the capturing group itself. The group will capture only the last iteration.
Put a capturing group around the repeated group to capture all iterations. «+»
Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
Match a single character present in the list below «[ \t]*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
The character “ ” « »
A tab character «\t»
Match the character “>” literally «>»
Match a single character present in the list below «[ \t]?»
Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
The character “ ” « »
A tab character «\t»
Match any single character that is not a line break character «.+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match a line feed character «\n»
Match the regular expression below and capture its match into backreference number 3 «(.+\n)*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Note: You repeated the capturing group itself. The group will capture only the last iteration.
Put a capturing group around the repeated group to capture all iterations. «*»
Match any single character that is not a line break character «.+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match a line feed character «\n»
Match a line feed character «\n*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
This does look rather intimidating in text form, but it's much more readable in HTML form (which can't be reproduced here) or in RegexBuddy itself. It also points out common gotchas (such as repeating capturing groups which is probably not wanted here).
I like Expresso.
After a while, I've gotten used to reading the things. There is not much to most regexes, and I recommend the site http://www.regular-expressions.info/ if you want to use them more often.
Regular expressions are just a way to express masks, etc. At the end it's just a "language" with its own syntax.
Commenting every bit of your regular expression would be the same as commenting every line of your project.
Of course it would help people who don't understand your code, but it's useless if you (the developer) understand the meaning of the regex.
For me, reading regular expressions is the same thing as reading code. If the expression is really complex an explanation below could be useful but most of the time it isn't necessary.
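For the cases where an expression is complex enough to deserve an explanation, some regex engines let the comments live inside the pattern itself. Here is a sketch in Python, assuming its `re.VERBOSE` flag, with the blockquote regex from the question written both ways. The names `COMPACT` and `VERBOSE` and the sample text are made up for illustration.

```python
import re

# One-line form, as in the question.
COMPACT = re.compile(r"((^[ \t]*>[ \t]?.+\n(.+\n)*\n*)+)", re.MULTILINE)

# Same expression in verbose mode: whitespace outside character classes
# is ignored and "#" starts a comment.
VERBOSE = re.compile(
    r"""
    (                       # wrap whole match in group 1
        (
            ^[ \t]*>[ \t]?  # '>' at the start of a line
            .+\n            # rest of the first line
            (.+\n)*         # subsequent consecutive lines
            \n*             # blank lines
        )+
    )
    """,
    re.MULTILINE | re.VERBOSE,
)

text = "intro\n\n> quoted line\ncontinued\n\nafter\n"
print(repr(COMPACT.search(text).group(1)))
print(repr(VERBOSE.search(text).group(1)))
```

Both forms match the same block here, so the choice is purely about readability for whoever maintains the pattern.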