Regular Expression to get a query string value without using lookbehind

I want to extract the "en" from the following URLs so they can be rewritten.
contact/default.aspx?lang=en
/contact/default.aspx?lang=en-us&id=1
/contact/default.aspx?id=1111&lang=en
The above examples should be rewritten as:
/contact/en/default.aspx
Unfortunately, IIS7 does not support lookbehinds, so this piece of regex cannot be used:
(?<=lang\=)(.+)
Any ideas how I can match the value part of the query string?
Thanks

I would do
.*?(&|\?)lang=([^&]+).*
and use capture group 2 (group 1 only holds the ? or & in front of lang).
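The JavaScript sketch below shows what the groups hold. It only illustrates the captures and is not IIS rewrite configuration. It uses the answer's pattern unchanged. getLang and rewriteUrl are hypothetical helpers, and the rewrite assumes the language folder goes directly in front of the file name.

```javascript
// Pattern from the answer: group 1 is the "?" or "&" delimiter,
// group 2 is the lang value (everything up to the next "&").
const LANG_PATTERN = /.*?(&|\?)lang=([^&]+).*/;

// Hypothetical helper: returns the lang value, or null if there is none.
function getLang(url) {
  const m = url.match(LANG_PATTERN);
  return m ? m[2] : null;
}

// Hypothetical rewrite: /contact/default.aspx?... -> /contact/<lang>/default.aspx
function rewriteUrl(url) {
  const lang = getLang(url);
  if (lang === null) return url;
  const path = url.split("?")[0];      // drop the query string
  const slash = path.lastIndexOf("/"); // insert the value before the file name
  return path.slice(0, slash + 1) + lang + "/" + path.slice(slash + 1);
}

for (const u of [
  "/contact/default.aspx?lang=en",
  "/contact/default.aspx?lang=en-us&id=1",
  "/contact/default.aspx?id=1111&lang=en",
]) {
  console.log(u, "->", rewriteUrl(u));
}
```

For lang=en-us the capture is en-us, not en. If only the part before a hyphen is wanted, using ([^&-]+) instead of ([^&]+) would stop at the hyphen.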

Related

Regex PCRE capture multiple occurences query string in URL

I am trying to capture multiple occurrences of utm tags in a URL and append them when rewriting the URL. However, I only want the utm key/value pairs and want to skip the others.
This is a sample URL
https://example.com/dl/?screen=page&title=SABC&page_id=4063&myvalue=Noidea&utm_source=sourceTest19&utm_medium=mediumTest19&utm_campaign=campaignTest19&utm_term=termTest19&test=value&utm_content=contentTest19
I tried this:
(\?.*)(page_id=([^&]*))(\?|&)(.*[&?]utm_[a-z]+=([^&]+).*)
and unfortunately, it doesn't produce the result I expect.
I need to capture both the page ID and the utm tags, but I do not want test=value or myvalue=Noidea. Apart from page_id, I only want the query-string parameters that are utm tags.
The expected result is the URL below:
https://example.com/dl/page_id/4063?utm_source=sourceTest19&utm_medium=mediumTest19&utm_campaign=campaignTest19&utm_term=termTest19&utm_content=contentTest19
one group with pageid=<somenumber/text>
one group with all utm tags with key and value
Help will be appreciated.
You can use a regex like this with global matching to get each wanted parameter. The key is in group 1 and the value is in group 2:
[?&](page_id|utm_[a-z]+)=([^&]*)
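Below is a minimal JavaScript sketch of this matching approach. It assumes the wanted keys are page_id plus any utm_ key made of lowercase letters. It uses [?&](page_id|utm_[a-z]+)=([^&]*) with the g flag, so each match gives the key in group 1 and the value in group 2. The function name rewriteUtmUrl is hypothetical, and the output format is taken from the question's expected URL.

```javascript
// Assumption: wanted keys are page_id and utm_<lowercase letters>.
const WANTED = /[?&](page_id|utm_[a-z]+)=([^&]*)/g;

// Hypothetical helper: builds <base>page_id/<id>?<utm pairs>.
function rewriteUtmUrl(url) {
  const [base] = url.split("?"); // assumes the path ends with "/"
  let pageId = null;
  const utm = [];
  for (const [, key, value] of url.matchAll(WANTED)) {
    if (key === "page_id") pageId = value;
    else utm.push(`${key}=${value}`); // keep utm pairs in their original order
  }
  if (pageId === null) return url; // nothing to rewrite
  const query = utm.length ? "?" + utm.join("&") : "";
  return `${base}page_id/${pageId}${query}`;
}

console.log(rewriteUtmUrl(
  "https://example.com/dl/?screen=page&title=SABC&page_id=4063&myvalue=Noidea" +
  "&utm_source=sourceTest19&utm_medium=mediumTest19&utm_campaign=campaignTest19" +
  "&utm_term=termTest19&test=value&utm_content=contentTest19"
));
```

A single regex match cannot return a variable number of repeated groups. For that reason, the utm pairs are collected across all matches and joined afterwards.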
You can instead replace any parameter that does not match the desired ones with the empty string. The pattern for this is
(?:[?&](?!(?:page_id|utm_[^=&]++)=)[^&]*+)++$|(?<=[?&])(?!(?:page_id|utm_[^=&]++)=)[^&]*+(?:&|$)
Here's a working demo: https://regex101.com/r/L5xcl4/2 (it has an extra \s only so that it works on the multiline input in the tester; you shouldn't need it, since you'll be working on a string that contains only a URL, without whitespace).
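JavaScript has no possessive quantifiers, so the pattern cannot be used there as is. The sketch below is a JavaScript version with ++ and *+ replaced by ordinary greedy quantifiers. As far as I can tell, this gives the same matches on inputs like the one in the question, because the possessive forms mainly prevent needless backtracking. stripUnwanted is a hypothetical helper name, and the lookbehind requires an ES2018+ engine.

```javascript
// JavaScript translation of the PCRE pattern (possessive quantifiers dropped).
const UNWANTED = new RegExp(
  // Alternative 1: a trailing run of unwanted parameters, with their ? or &.
  "(?:[?&](?!(?:page_id|utm_[^=&]+)=)[^&]*)+$" +
  "|" +
  // Alternative 2: one unwanted parameter plus the & that follows it.
  "(?<=[?&])(?!(?:page_id|utm_[^=&]+)=)[^&]*(?:&|$)",
  "g"
);

// Hypothetical helper: removes every parameter except page_id and utm_*.
function stripUnwanted(url) {
  return url.replace(UNWANTED, "");
}

console.log(stripUnwanted(
  "https://example.com/dl/?screen=page&title=SABC&page_id=4063&myvalue=Noidea" +
  "&utm_source=sourceTest19&utm_medium=mediumTest19&utm_campaign=campaignTest19" +
  "&utm_term=termTest19&test=value&utm_content=contentTest19"
));
```

The result still has page_id as a query parameter. Turning it into /page_id/4063 is a separate rewrite step.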

Why did my regex not give the desired result

I need to extract specific URLs (the ones with an image extension) from a string, and I have the following regex:
ITEMIMAGEURL\d+=(http://.*?)(,|$|\n)
and the string that I've to extract from is:
ITEMIMAGEURL0 = http://images.example.com/xyz/l/dasda/test-image-6af8af8afa9.jpg,
ITEMIMAGEURL1 = http://images.example.com/xyz/l/dasda/test-image-,
ITEMIMAGEURL2 = http://images.example.com/abc/as/test/test-image-abrd23lg9.jpg
My regex works fine, but I want to extract only the URLs ending in .jpg, .gif or another image extension, so I tried:
ITEMIMAGEURL\d+=(http://.*?(?(?=.[a-zA-Z]{3,4})))(,|$|\n)
But it didn't work as expected.
My expected result is:
http://images.example.com/xyz/l/dasda/test-image-6af8af8afa9.jpg
http://images.example.com/abc/as/test/test-image-abrd23lg9.jpg
You can use this regex to extract image URLs:
ITEMIMAGEURL\d+=(http://[^,\s]+?\.(?:jpe?g|gif|png))
Your image URL is captured in group #1. This assumes your URL doesn't contain commas.
If commas are allowed in image URLs, use this regex with a negative lookahead instead:
ITEMIMAGEURL\d+=(http://(?:(?!,ITEMIMAGEURL\d).)+\.(?:jpe?g|gif|png))
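Here is a quick JavaScript check of the two patterns above on the question's data. It assumes the entries have no spaces around =, since both patterns expect ITEMIMAGEURL0=http.... The imageUrls helper is hypothetical. With the g flag, matchAll returns every match, and group 1 holds the URL.

```javascript
// Assumption: no spaces around "=" (the patterns expect "ITEMIMAGEURL0=http...").
const input = [
  "ITEMIMAGEURL0=http://images.example.com/xyz/l/dasda/test-image-6af8af8afa9.jpg,",
  "ITEMIMAGEURL1=http://images.example.com/xyz/l/dasda/test-image-,",
  "ITEMIMAGEURL2=http://images.example.com/abc/as/test/test-image-abrd23lg9.jpg",
].join("\n");

// First pattern: the URL may not contain commas or whitespace.
const noComma = /ITEMIMAGEURL\d+=(http:\/\/[^,\s]+?\.(?:jpe?g|gif|png))/g;
// Second pattern: commas allowed; never run into the next ",ITEMIMAGEURL<digit>".
const tempered = /ITEMIMAGEURL\d+=(http:\/\/(?:(?!,ITEMIMAGEURL\d).)+\.(?:jpe?g|gif|png))/g;

// Hypothetical helper: collect group 1 from every match.
const imageUrls = (re) => [...input.matchAll(re)].map((m) => m[1]);

console.log(imageUrls(noComma));
console.log(imageUrls(tempered));
```

Both patterns return the two .jpg URLs from the expected result. The entry ending in test-image-, is skipped because no image extension appears before that entry ends.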
ITEMIMAGEURL\d+=(http:\/(?:\/[\w\.-]+)+\.(?:jpe?g|gif|png))
I assume you know the basics of regular expressions, so just one note. (?:\/[\w\.-]+) matches one /segment of the URL path, and repeating it with + matches the whole path. It is not the only valid choice; you could use any segment pattern you like, e.g. (?:\/[^\s,]+).

regular expression - excluding results if string is present

I'm trying to create a regular expression that excludes results if a substring is present.
Data Set:
http://www.cnn.com/test1
http://www.cnn.com/test3
http://www.cnn.com/test5
http://www.stackflow.com/test4
http://www.cnn.com/test3
http://www.cnn.com/test4
exclude:
find all cnn.com sites
that don't have /test3
Results:
http://www.cnn.com/test1
http://www.cnn.com/test5
http://www.cnn.com/test4
Figured it out: (www\.cnn\.com)(?!/test3)
If you also want to avoid matching strings like http://www.cnn.com/test/test3, you can use a negative lookbehind at the end of the string:
cnn\.com.*(?<!test3)$
I'm guessing this would be fastest:
cnn\.com(?!\/test3)[a-zA-Z0-9-._~:?##!$&'*+,;=`.\/\(\)\[\]]*
because you restrict the URL to allowed characters only.
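The JavaScript sketch below compares the lookahead and lookbehind approaches. It runs them on the question's data set plus one hypothetical extra line, http://www.cnn.com/test/test3, where test3 sits deeper in the path. JavaScript supports lookbehind from ES2018 onwards, and the variable names are illustrative.

```javascript
// Question's data, plus a hypothetical extra line with test3 deeper in the path.
const urls = [
  "http://www.cnn.com/test1",
  "http://www.cnn.com/test3",
  "http://www.cnn.com/test5",
  "http://www.stackflow.com/test4",
  "http://www.cnn.com/test3",
  "http://www.cnn.com/test4",
  "http://www.cnn.com/test/test3",
];

// Lookahead right after the host (dots escaped so "." is literal).
const lookahead = /www\.cnn\.com(?!\/test3)/;
// Negative lookbehind at the end: reject anything that ends in "test3".
const lookbehind = /cnn\.com.*(?<!test3)$/;

const viaLookahead = urls.filter((u) => lookahead.test(u));
const viaLookbehind = urls.filter((u) => lookbehind.test(u));

console.log(viaLookahead);  // still includes .../test/test3
console.log(viaLookbehind); // excludes it
```

The lookahead only rejects /test3 immediately after the host, so the extra line slips through. The end-anchored lookbehind rejects any URL ending in test3.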

Using a wildcard in Regex at the end of a URL in GA

I'm a newbie at regex. I'm trying to get a report in Google Analytics (GA) that returns all pages after a certain point in the URL.
For example:
http://www.essentialibiza.com/ibiza-club-tickets/carl-cox/14-June-2016/
I want to see all dates so: http://www.essentialibiza.com/ibiza-club-tickets/carl-cox/*
Here's what I've got so far in my regex:
^https:\/\/www\.essentialibiza\.com\/ibiza-club-tickets\/carl-cox(?=(?:\/.*)?$)
You can try this:
https?:\/\/www\.essentialibiza\.com\/ibiza-club-tickets\/carl-cox[\w/_-]*
The RE2 regex engine used by GA does not allow lookarounds (not even lookaheads) in the pattern, and you have defined one: (?=(?:\/.*)?$).
If you need all links having www.essentialibiza.com/ibiza-club-tickets/carl-cox/, you can use a simple regex:
www\.essentialibiza\.com/ibiza-club-tickets/carl-cox/
If you want to specify the protocol as well:
https?://www\.essentialibiza\.com/ibiza-club-tickets/carl-cox(/|$)
The ? makes the s optional (0 or 1 occurrences), and (/|$) matches either a / after cox or the end of the URL, so URLs ending with cox are matched too. Replace (/|$) with / if you only want URLs that have a / after cox.
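One quick way to sanity-check the pattern outside GA is to run it in JavaScript. This only approximates GA: it assumes GA's regex filter, like RegExp.prototype.test, looks for a match anywhere in the string. Because the pattern has no lookarounds, RE2 and JavaScript should treat it the same way. The carl-cox-live URL is a made-up example.

```javascript
// The pattern from the answer; test() does an unanchored search.
const pattern = new RegExp(
  "https?://www\\.essentialibiza\\.com/ibiza-club-tickets/carl-cox(/|$)"
);

const samples = [
  "http://www.essentialibiza.com/ibiza-club-tickets/carl-cox/14-June-2016/",
  "https://www.essentialibiza.com/ibiza-club-tickets/carl-cox",
  "https://www.essentialibiza.com/ibiza-club-tickets/carl-cox-live/", // hypothetical
];

for (const s of samples) {
  console.log(pattern.test(s), s); // true, true, false
}
```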

Regex to extract host

I've searched all over the net for this, but does anyone have a regular expression to extract the host from this text?
Host: my.domain.com
Check if this helps:
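function fnGetDomain(url)
{
  // Expects a full URL such as "http://www.example.com/path";
  // returns null if there is no "://" (e.g. for "Host: my.domain.com").
  var m = url.match(/:\/\/([^/]+)/);
  return m ? m[1].replace(/^www\./, '') : null;
}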
(([a-zA-Z0-9\-]+\.)+[a-zA-Z0-9\-]+$)
With a capturing group (you have to retrieve the value from group 1 afterwards):
Host:\s*(.*)$
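With a lookbehind, the match itself is the value you want. However, engines such as PCRE and Python's re module reject this pattern, because \s* makes the lookbehind variable-length; .NET and JavaScript (ES2018+) accept it. Starting the match with \S keeps the space after the colon out of the result:
(?<=Host:\s*)\S.*$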
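Here is a small JavaScript check of the capturing-group and lookbehind versions (JavaScript accepts variable-length lookbehind from ES2018 onwards). A plain (?<=Host:\s*).*$ already succeeds right after the colon, with \s* matching nothing, so its match includes the leading space. Starting the match with \S avoids that.

```javascript
const text = "Host: my.domain.com";

// Capturing-group version: the host is in group 1.
const viaGroup = text.match(/Host:\s*(.*)$/)[1];

// Plain lookbehind: succeeds right after the colon (\s* matching nothing),
// so the match keeps the leading space.
const naive = text.match(/(?<=Host:\s*).*$/)[0];

// Requiring the match to start with \S skips that space.
const viaLookbehind = text.match(/(?<=Host:\s*)\S.*$/)[0];

console.log(JSON.stringify([viaGroup, naive, viaLookbehind]));
// ["my.domain.com"," my.domain.com","my.domain.com"]
```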