Python regex lookbehind and lookahead

I need to match the string "foo/boo" from a string with this format:
string = "/foo/boo/poo"
I tried this code:
poo = "poo"
foo = re.match('.*(?=/' + re.escape(poo) + ')', string).group(0)
and it gives me /foo/boo as the content of the variable foo (instead of just foo/boo).
I tried this code:
poo = "poo"
foo = re.match('(?=/).*(?=/' + re.escape(poo) + ')', string).group(0)
and I'm getting the same output (/foo/boo instead of foo/boo).
How can I match only the foo/boo part?

Hey try the following regex:
(?<=/).*(?=/poo)
^^^^^^
It will not take into account your first slash in the result.
Tested regex101: https://regex101.com/r/yzMkTg/1
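Transform your code in the following way and it should work. Note that it uses re.search rather than re.match: re.match only tries the pattern at the start of the string, where the lookbehind (?<=/) cannot succeed because nothing precedes index 0:
poo = "poo"
foo = re.search('(?<=/).*(?=/' + re.escape(poo) + ')', string).group(0)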
Have a quick look at this link for more information about the behavior of positive lookahead and positive lookbehind:
http://www.rexegg.com/regex-quickstart.html

You are missing a < in your lookbehind!
Lookbehinds look like this:
(?<=...)
not like this:
(?=...)
That would be a lookahead!
So,
(?<=/).*(?=/poo)
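To make the difference concrete, here is a minimal Python sketch comparing the lookahead-only attempt from the question with the lookbehind version. The variable names lookahead_only, with_lookbehind, first, second and anchored are only illustrative. It uses re.search, since re.match only tries the pattern at index 0.

```python
import re

string = "/foo/boo/poo"
poo = "poo"

lookahead_only = "(?=/).*(?=/" + re.escape(poo) + ")"
with_lookbehind = "(?<=/).*(?=/" + re.escape(poo) + ")"

# (?=/) only checks that a slash comes next; the slash is then consumed by .*
first = re.search(lookahead_only, string).group(0)

# (?<=/) checks that a slash comes just before, so the match starts after it
second = re.search(with_lookbehind, string).group(0)

# re.match only tries index 0, where nothing precedes, so the lookbehind fails
anchored = re.match(with_lookbehind, string)

print(first)     # /foo/boo
print(second)    # foo/boo
print(anchored)  # None
```

So the lookbehind version has to be run with re.search (or the pattern has to consume the leading slash some other way).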

Related

Find specific char inside delimiter

I have this string:
(40.959953710949506, -74.18210638344726),(40.95891663745299, -74.10606039345703),(40.917472246121065, -74.09582940498359),(40.921752754230255, -74.16397897163398),(40.95248644043785, -74.21067086616523)
I need to grab the commas inside the parentheses for further processing, and I want the commas splitting the groups to remain.
Let's say I want to replace the target commas by FOO, the result should be:
(40.959953710949506 FOO -74.18210638344726),(40.95891663745299 FOO -74.10606039345703),(40.917472246121065 FOO -74.09582940498359),(40.921752754230255 FOO -74.16397897163398),(40.95248644043785 FOO -74.21067086616523)
I want a Regular Expression that is not language specific.
You can just use a lookaround to find all , that are not preceded by a ) like this:
(?<!\)),
You said: "I don't want some language specific functions for this."
The format of the above regex is not language specific, as can be seen in the following code snippet or this regex101 snippet:
const x = '(40.959953710949506, -74.18210638344726),(40.95891663745299, -74.10606039345703),(40.917472246121065, -74.09582940498359),(40.921752754230255, -74.16397897163398),(40.95248644043785, -74.21067086616523)';
const rgx = /(?<!\)),/g;
console.log(x.replace(rgx, ' XXX'));
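The same negative lookbehind can be tried in Python as well. This is just a sketch on a shortened sample string (the coordinates are truncated here for readability, and the names sample and result are made up for the example):

```python
import re

# Shortened stand-in for the string in the question
sample = "(40.95, -74.18),(40.96, -74.10),(40.91, -74.09)"

# Replace every comma that is NOT directly preceded by a closing parenthesis
result = re.sub(r"(?<!\)),", " FOO", sample)

print(result)  # (40.95 FOO -74.18),(40.96 FOO -74.10),(40.91 FOO -74.09)
```

The commas between the groups are left alone because each of them directly follows a ).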
For example:
import re
s = "(40.959953710949506, -74.18210638344726),(40.95891663745299, -74.10606039345703),(40.917472246121065, -74.09582940498359),(40.921752754230255, -74.16397897163398),(40.95248644043785, -74.21067086616523)"
s = re.sub(r",(?=[^()]+\))", " FOO", s)
print(s)
# (40.959953710949506 FOO -74.18210638344726),(40.95891663745299 FOO -74.10606039345703),(40.917472246121065 FOO -74.09582940498359),(40.921752754230255 FOO -74.16397897163398),(40.95248644043785 FOO -74.21067086616523)
We use a positive lookahead to only replace commas where ) comes before ( ahead in the string.
Use re.sub with a callback function:
import re
inp = "(40.959953710949506, -74.18210638344726),(40.95891663745299, -74.10606039345703),(40.917472246121065, -74.09582940498359),(40.921752754230255, -74.16397897163398),(40.95248644043785, -74.21067086616523)"
output = re.sub(r'\((-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?)\)', lambda m: r'(' + m.group(1) + r' FOO ' + m.group(2) + r')', inp)
print(output)
This prints:
(40.959953710949506 FOO -74.18210638344726),(40.95891663745299 FOO -74.10606039345703),(40.917472246121065 FOO -74.09582940498359),(40.921752754230255 FOO -74.16397897163398),(40.95248644043785 FOO -74.21067086616523)
The strategy here is to capture the two numbers in each tuple in separate groups. Then, we replace by connecting the two numbers with FOO instead of the original comma.

RegEx for matching the first {N} chars and last {M} chars

I'm having an issue filtering tags in Grafana with an InfluxDB backend. I'm trying to filter out the first 8 characters and last 2 of the tag but I'm running into a really weird issue.
Here are some of the names...
GYPSKSVLMP2L1HBS135WH
GYPSKSVLMP2L2HBS135WH
RSHLKSVLMP1L1HBS045RD
RSHLKSVLMP35L1HBS135WH
RSHLKSVLMP35L2HBS135WH
I only want to return something like this:
MP8L1HBS225
MP24L2HBS045
I first started off using this expression:
[MP].*
But it only returns the following out of 148:
PAYNKSVLMP27L1HBS045RD
PAYNKSVLMP27L1HBS135WH
PAYNKSVLMP27L1HBS225BL
PAYNKSVLMP27L1HBS315BR
The pattern [MP].* matches either an M or a P (a single character, not the sequence MP) and then matches any character until the end of the string, without taking any specific character, digit, or quantified number afterwards into account.
If you want the match to start at MP and end on the last digit, leaving out the trailing non-digit characters, you could use:
MP[A-Z0-9]+[0-9]
If lookaheads are supported you might also use:
MP[A-Z0-9]+(?=[A-Z0-9]{2}$)
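As a quick check outside Grafana, both patterns can be run against a few of the tag names from the question. This is only a Python sketch (the names ends_on_digit, before_last_two, by_digit and by_lookahead are invented here), and Python's re module may not behave exactly like the regex engine behind Grafana/InfluxDB:

```python
import re

tags = ["GYPSKSVLMP2L1HBS135WH", "RSHLKSVLMP1L1HBS045RD", "RSHLKSVLMP35L2HBS135WH"]

# Variant 1: start at MP, backtrack until the last matched character is a digit
ends_on_digit = re.compile(r"MP[A-Z0-9]+[0-9]")

# Variant 2: start at MP, stop where exactly two characters remain before the end
before_last_two = re.compile(r"MP[A-Z0-9]+(?=[A-Z0-9]{2}$)")

by_digit = [ends_on_digit.search(t).group(0) for t in tags]
by_lookahead = [before_last_two.search(t).group(0) for t in tags]

print(by_digit)      # ['MP2L1HBS135', 'MP1L1HBS045', 'MP35L2HBS135']
print(by_lookahead)  # ['MP2L1HBS135', 'MP1L1HBS045', 'MP35L2HBS135']
```

For these samples the two variants agree; they would differ for a tag whose last two characters include a digit.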
You may not even want to touch MP. You can simply define a left and a right boundary, just as your question asks, and capture everything in between, which might be faster. An expression similar to this one might do:
(\w{8})(.*)(\w{2})
You can then refer to the middle part as $2, the second capturing group, which makes it easy to use in a replacement.
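For comparison, here is roughly how the three-group idea looks in Python, where the second group is written \2 in the replacement instead of $2. This is a sketch with the anchors ^ and $ added; the names boundaries and middles are made up for the example:

```python
import re

tags = ["GYPSKSVLMP2L1HBS135WH", "RSHLKSVLMP35L2HBS135WH"]

# Group 1: first 8 characters, group 2: the middle, group 3: last 2 characters
boundaries = re.compile(r"^(\w{8})(.*)(\w{2})$")

# Replace the whole tag with just the middle group
middles = [boundaries.sub(r"\2", tag) for tag in tags]

print(middles)  # ['MP2L1HBS135', 'MP35L2HBS135']
```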
Performance
This JavaScript snippet measures the performance of this expression with a simple for loop of 1 million iterations.
const repeat = 1000000;
const start = Date.now();
let match;
for (let i = 0; i < repeat; i++) {
    const string = "RSHLKSVLMP35L2HBS135WH";
    // The pattern is anchored, so the g flag is not needed
    const regex = /^(\w{8})(.*)(\w{2})$/;
    match = string.replace(regex, "$2");
}
const end = Date.now() - start;
console.log("YAAAY! \"" + match + "\" is a match 💚 ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test. 😳 ");
Try Regex: (?<=\w{8})\w+(?=\w{2})
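A small Python sketch of the lookaround version, assuming the engine supports fixed-width lookbehind (Python's re does; whether the Grafana/InfluxDB side does is not something this example shows). The name trimmed is invented here:

```python
import re

tags = ["GYPSKSVLMP2L1HBS135WH", "RSHLKSVLMP35L2HBS135WH"]

# (?<=\w{8}) requires 8 word characters before the match,
# (?=\w{2}) requires 2 word characters after it; neither is consumed
trimmed = [re.search(r"(?<=\w{8})\w+(?=\w{2})", tag).group(0) for tag in tags]

print(trimmed)  # ['MP2L1HBS135', 'MP35L2HBS135']
```

Because \w+ is greedy, the match starts at the first position that has 8 characters before it and extends until only 2 characters are left.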

Match anything before certain character

I have the following strings
/search?checkin=2018-10-25&checkout=2018-10-27&id=bandung-108001534490276290&page=1&room=1&sort=popularity&type=CITY
/search?checkin=2018-12-09&checkout=2018-12-13&id=singapore-108001534490299035&maxPrice=&minPrice=&room=1&type=REGION
/search?checkin=2018-10-22&checkout=2018-10-23&lat=-6.1176043&long=106.7767146&maxPrice=&minPrice=&room=1&type=COORDINATE
/search?page=1&room=1&type=POI&id=taman-mini-indonesia-indah-110001539700828313&checkin=2018-11-14&checkout=2018-11-16&sort=distance
I want to get the part of each string that starts at id= and runs until the next &, so that they return:
id=bandung-108001534490276290
id=singapore-108001534490299035
id=taman-mini-indonesia-indah-110001539700828313
When I tried the regex \&id=.*\& it didn't meet my requirement.
How do I resolve this?
I'd go with [?&](id=[^&]+).
[?&] - ? or &, because order of GET parameters is usually not guaranteed and you can get the id in the first place – something like /search?id=something-123456&checkin=2018-10-25&…
[^&]+ - at least one character that's not &
() marks a capturing group
Demo in JS:
const strings = [
    "/search?checkin=2018-10-25&checkout=2018-10-27&id=bandung-108001534490276290&page=1&room=1&sort=popularity&type=CITY",
    "/search?checkin=2018-12-09&checkout=2018-12-13&id=singapore-108001534490299035&maxPrice=&minPrice=&room=1&type=REGION",
    "/search?checkin=2018-10-22&checkout=2018-10-23&lat=-6.1176043&long=106.7767146&maxPrice=&minPrice=&room=1&type=COORDINATE",
    "/search?page=1&room=1&type=POI&id=taman-mini-indonesia-indah-110001539700828313&checkin=2018-11-14&checkout=2018-11-16&sort=distance"
]
const regex = /[?&](id=[^&]+)/
strings.forEach(string => {
const match = regex.exec(string)
if (match) {
console.log(match[1])
}
})
Demo and explanation at Regex101: https://regex101.com/r/FBeNDN/1/
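The same capturing-group pattern works unchanged in Python. Here is a minimal sketch; the URLs are shortened versions of the ones in the question, plus one with id as the first parameter, and the names id_param and found are made up for the example:

```python
import re

urls = [
    "/search?checkin=2018-10-25&checkout=2018-10-27&id=bandung-108001534490276290&page=1",
    "/search?id=something-123456&checkin=2018-10-25",  # id as the first parameter
    "/search?checkin=2018-10-22&lat=-6.1176043&long=106.7767146",  # no id at all
]

# [?&] anchors the parameter name, group 1 captures id=... up to the next &
id_param = re.compile(r"[?&](id=[^&]+)")

found = []
for url in urls:
    m = id_param.search(url)
    found.append(m.group(1) if m else None)

print(found)  # ['id=bandung-108001534490276290', 'id=something-123456', None]
```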
Positive Lookahead (?=)
Try a positive lookahead:
/&id=.+?(?=&)|&id=.+?$/gm
This part: (?=&) means: the match is accepted only if an & comes right after it, and the & itself is not included in the match.
The alternation | (a logical OR) is an update in response to a comment from Nick about the case where &id=... is the last parameter in the string.
The second alternative is the same match, but instead of looking for an & it looks for the end of the line $. Note that the multi-line flag is used to make $ represent EOL.
Demo
var str = `/search?checkin=2018-10-25&checkout=2018-10-27&id=bandung-108001534490276290&page=1&room=1&sort=popularity&type=CITY
/search?checkin=2018-12-09&checkout=2018-12-13&id=singapore-108001534490299035&maxPrice=&minPrice=&room=1&type=REGION
/search?page=1&room=1&type=POI&id=indo-1999999051158
/search?checkin=2018-10-22&checkout=2018-10-23&lat=-6.1176043&long=106.7767146&maxPrice=&minPrice=&room=1&type=COORDINATE
/search?page=1&room=1&type=POI&id=taman-mini-indonesia-indah-110001539700828313&checkin=2018-11-14&checkout=2018-11-16&sort=distance
/search?page=1&room=1&type=POI&id=indonesia-1100055689`;
var rgx = /&id=.+?(?=&)|&id=.+?$/gm;
var res = rgx.exec(str);
while (res != null) {
console.log(res[0]);
res = rgx.exec(str);
}
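For reference, a Python sketch of the same lookahead-or-end-of-line idea, using re.MULTILINE in place of the m flag. The URLs are shortened from the ones above and the names id_pattern and matches are only illustrative:

```python
import re

text = "\n".join([
    "/search?checkin=2018-10-25&id=bandung-108001534490276290&page=1&room=1",
    "/search?page=1&room=1&type=POI&id=indo-1999999051158",
    "/search?checkin=2018-10-22&lat=-6.1176043&long=106.7767146",
])

# First alternative: stop right before the next &.
# Second alternative: the id is the last parameter, so stop at end of line.
id_pattern = re.compile(r"&id=.+?(?=&)|&id=.+?$", re.MULTILINE)

matches = id_pattern.findall(text)
print(matches)  # ['&id=bandung-108001534490276290', '&id=indo-1999999051158']
```

Note that the matches still include the leading &, which would have to be stripped separately if only id=... is wanted.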

Exclude quantifier from regular expression

I have a quantifier regular expression that matches a 5-digit code: [0-9]{5}.
How can I exclude any match of the above quantifier?
I tried [^([0-9]{5})] but it seems it doesn't work.
Test data follows:
including:
12345678875645 (will be matched)
pppppaaaaa (will be matched)
52p26 (will be matched)
123 (will be matched)
excluding:
12345 (won't be matched)
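Try this:
^(\d{1,4}|\d{6,})$
This won't match numbers with exactly 5 digits. Note that it only matches strings made up entirely of digits, so pppppaaaaa and 52p26 from the test data are not matched either.
Demo here: https://regex101.com/r/sHvRMA/1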
You can use a negative look ahead:
/(?!^[0-9]{5}$)^.+$/
var rexp = /(?!^[0-9]{5}$)^.+$/;
var str = ['12345', '12345678875645', 'pppppaaaaa', '52p26', '123'];
for (var i = 0; i < str.length; i++) {
console.log(str[i] + ' - ' + (rexp.test(str[i]) ? 'matched' : 'did not match'));
}
I assume that you need a regex that matches everything except a string of exactly 5 digits.
You simply need a negative lookahead assertion to exclude the 5 digits, that's it:
\b(?!\d{5}).+|.{6,}\b
It excludes only 5 digits, not anything else.
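To compare the suggestions against the test data from the question, here is a short Python sketch. It checks the anchored negative lookahead and the digits-only pattern from the answers above; the names not_five_digits, digits_only, lookahead_hits and digits_only_hits are invented for the example:

```python
import re

samples = ["12345", "12345678875645", "pppppaaaaa", "52p26", "123"]

# Negative lookahead: reject the string only if it is exactly five digits
not_five_digits = re.compile(r"(?!^[0-9]{5}$)^.+$")

# Digits-only alternative: 1-4 digits or 6 and more digits
digits_only = re.compile(r"^(\d{1,4}|\d{6,})$")

lookahead_hits = [s for s in samples if not_five_digits.search(s)]
digits_only_hits = [s for s in samples if digits_only.search(s)]

print(lookahead_hits)    # ['12345678875645', 'pppppaaaaa', '52p26', '123']
print(digits_only_hits)  # ['12345678875645', '123']
```

Only the lookahead version accepts the non-numeric samples while still rejecting 12345.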

String Replacing in Regex

I am trying to replace text in a string using regex. I accomplished it in C# using the same pattern, but in Swift it's not working as needed.
Here is my code:
var pattern = "\\d(\\()*[x]"
let oldString = "2x + 3 + x2 +2(x)"
let newString = oldString.stringByReplacingOccurrencesOfString(pattern, withString:"*" as String, options:NSStringCompareOptions.RegularExpressionSearch, range:nil)
print(newString)
What I want after replacement is :
"2*x + 3 +x2 + 2*(x)"
What I am getting is :
"* + 3 + x2 +*)"
Try this:
(?<=\d)(?=x)|(?<=\d)(?=\()
This pattern does not match any characters in the given string; it matches zero-width positions between characters.
For example, (?<=\d)(?=x) matches the position between a digit and 'x'.
(?<= opens a lookbehind assertion and (?= opens a lookahead.
(?<=\d)(?=\() matches the position between a digit and '('.
So the pattern before escaping:
(?<=\d)(?=x)|(?<=\d)(?=\()
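The pattern as a Swift string literal, with each backslash doubled (the parentheses of the lookaround groups must not be escaped, otherwise they would be treated as literal characters):
(?<=\\d)(?=x)|(?<=\\d)(?=\\()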
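The zero-width replacement is easy to try out in Python (used here instead of Swift just as a quick sketch; the raw string avoids any extra escaping). The names new_string and compact are made up, and the character-class variant is an equivalent shorthand rather than part of the original answer:

```python
import re

old_string = "2x + 3 + x2 +2(x)"

# Matches the empty position between a digit and an x, or a digit and a "("
pattern = r"(?<=\d)(?=x)|(?<=\d)(?=\()"
new_string = re.sub(pattern, "*", old_string)

# Equivalent shorthand: one lookahead with a character class
compact = re.sub(r"(?<=\d)(?=[x(])", "*", old_string)

print(new_string)  # 2*x + 3 + x2 +2*(x)
print(compact)     # 2*x + 3 + x2 +2*(x)
```

Since nothing is consumed by the match, the "*" is inserted and no existing character is removed.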