Regex to extract host - regex

I've searched all over the net for this, but does anyone have a regular expression to extract the host from this text?
Host: my.domain.com

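Check if this helps you:
function fnGetDomain(url)
{
    // Capture everything between "://" and the next "/"; match() returns null when there is no "://".
    var m = url.match(/:\/\/([^\/]+)/);
    // Strip only a leading "www." and return null when nothing matched.
    return m ? m[1].replace(/^www\./, '') : null;
}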

(([a-zA-Z0-9\-]+\.)+[a-zA-Z0-9\-]+$)

With capturing group (you have to retrieve the value from group 1 afterwards):
Host:\s*(.*)$
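With lookbehind (doesn't work in most regex engines because the lookbehind is variable-length; where it is supported, the match itself is the value you want, and the \S keeps the match from starting on the whitespace after the colon):
(?<=Host:\s*)\S.*$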
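For what it's worth, here is a rough sketch of the capturing-group variant in JavaScript. The helper name extractHost, the sample request text, and the added m flag (so that $ stops at the end of the Host line rather than the end of the whole text) are assumptions for illustration, not part of the answer above.

```javascript
// Hypothetical helper: returns the value of the first "Host:" line, or null.
function extractHost(text) {
  // Same pattern as above; the m flag (an assumption here) makes $ match at each line end.
  const match = /Host:\s*(.*)$/m.exec(text);
  // Group 1 holds everything after "Host:" and any whitespace, up to the end of the line.
  return match ? match[1] : null;
}

// Sample input made up for this sketch.
const request = "GET / HTTP/1.1\nHost: my.domain.com\nAccept: text/html";
console.log(extractHost(request));
```

Returning null instead of indexing the match result directly avoids an exception when the text has no Host line.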

Related

Using a wildcard in Regex at the end of a URL in GA

I'm a newbie at Regex. I'm trying to get a report in GA that returns all pages after a certain point in the URL.
For example:
http://www.essentialibiza.com/ibiza-club-tickets/carl-cox/14-June-2016/
I want to see all dates so: http://www.essentialibiza.com/ibiza-club-tickets/carl-cox/*
Here's what I've got so far in my regex:
^https:\/\/www\.essentialibiza\.com\/ibiza-club-tickets\/carl-cox(?=(?:\/.*)?$)
You can try this:
https?:\/\/www\.essentialibiza\.com\/ibiza-club-tickets\/carl-cox[\w/_-]*
The RE2 regex engine used by GA does not allow lookarounds (not even lookaheads), and your pattern contains one: (?=(?:\/.*)?$).
If you need all links having www.essentialibiza.com/ibiza-club-tickets/carl-cox/, you can use a simple regex:
www\.essentialibiza\.com/ibiza-club-tickets/carl-cox/
If you want to specify the protocol:
https?://www\.essentialibiza\.com/ibiza-club-tickets/carl-cox(/|$)
The ? makes the s optional (1 or 0 occurrences), and (/|$) also matches a URL that ends with cox (replace this group with a plain / if you only want URLs that have a / after cox).
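A quick way to sanity-check the lookaround-free pattern outside GA is a small JavaScript sketch like the one below. JavaScript's engine is not RE2, but the pattern only uses an optional character, a group, alternation and $, which both support. The second and third sample URLs are made up for the test.

```javascript
// Lookaround-free pattern from above; the slashes are escaped only because of JS literal syntax.
const carlCoxPages = /https?:\/\/www\.essentialibiza\.com\/ibiza-club-tickets\/carl-cox(\/|$)/;

// Sample URLs; the last two are made up for this sketch.
const urls = [
  "http://www.essentialibiza.com/ibiza-club-tickets/carl-cox/14-June-2016/",
  "https://www.essentialibiza.com/ibiza-club-tickets/carl-cox",
  "http://www.essentialibiza.com/ibiza-club-tickets/carl-coxon/",
];

for (const url of urls) {
  // Prints true for the first two URLs and false for the look-alike path.
  console.log(carlCoxPages.test(url), url);
}
```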

Regex - find all hosts inside a url

Given the following url:
http://clk.atdmt.com/FLO/go/364329512/direct/01/?href=http://www.****123****.com/refer.do?r=linkshare&lsid=vl0mfKZlvKU-I%2AKKCkbqWO7Zb9aqRSVLEw&lsurl=http%3A%2F%2F****123****%2Fcollection.do%3Fdataset%3D12905%26cm_mmc%3DIM_AFFILIATES-_-Linkshare-vl0mfKZlvKU-_-10003079-_-3
Is there any expression that will match all the hosts above (e.g. http://clk.atdmt.com, http://*123*.com, ...)?
I need the same expression to work even if the string to match is just http://clk.atdmt.com
Thanks
You can use this (javascript flavour) to extract the host (capture group 1):
/http:\/\/(?:www\.)?([^\/]+)/g
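Because of the g flag, a plain match() call returns the full matches rather than the groups, so one way to collect capture group 1 is matchAll. The sketch below assumes a reasonably recent JavaScript runtime; findHosts is a hypothetical helper, and the sample URL is a shortened version of the one in the question with example.com standing in for the masked host.

```javascript
// Pattern from the answer; the g flag lets matchAll walk through every occurrence.
const hostPattern = /http:\/\/(?:www\.)?([^\/]+)/g;

// Hypothetical helper: collects capture group 1 of every match.
function findHosts(url) {
  return Array.from(url.matchAll(hostPattern), (m) => m[1]);
}

// Shortened sample; example.com is a stand-in host, not from the question.
const sample =
  "http://clk.atdmt.com/FLO/go/364329512/direct/01/?href=http://www.example.com/refer.do?r=linkshare";
console.log(findHosts(sample));
console.log(findHosts("http://clk.atdmt.com"));
```

Note that a percent-encoded URL such as http%3A%2F%2F... contains no literal http://, so this pattern skips it; the parameter would probably need decoding (for example with decodeURIComponent) before matching.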

Getting rid of the parentheses with regular expression group matching

I'm trying to analyze logs using splunk and I need to parse lines that look like this:
2012-06-20 20:35:13,980 INFO [http-bio-8080-exec-72] (b50f3a81-f9e0-4ebf-b9e2-b007c8dd4cbf) interceptor.CustomLoggingOutInterceptor (AbstractLoggingInterceptor.java:149) - Outbound Message
I've got this regex which matches:
(?i)^[^\]]*\]\s+(?P<FIELDNAME>[^ ]+)
this part :
2012-06-20 20:35:13,980 INFO [http-bio-8080-exec-72] (b50f3a81-f9e0-4ebf-b9e2-b007c8dd4cbf)
Using groups I can extract the real information that I need and that is :
(b50f3a81-f9e0-4ebf-b9e2-b007c8dd4cbf)
The only problem is that I don't need the parentheses. I've tried a few negative lookahead/lookbehind ideas from Google searches, but I don't really know regex that well.
So my final goal is to capture b50f3a81-f9e0-4ebf-b9e2-b007c8dd4cbf. Thanks.
(?i)^[^\]]*\]\s+\((?P<FIELDNAME>[^ ]+)\)
That matches the parentheses literally, outside the named group, so the group captures only the ID without the ().
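If you want to try this outside Splunk, here is a rough JavaScript sketch. JavaScript does not accept (?P<name>...) or the inline (?i), so the pattern is translated to (?<name>...) with the i flag; that translation is an assumption of this sketch, while the log line is the one from the question.

```javascript
// JavaScript translation (an assumption of this sketch): (?P<name>...) becomes (?<name>...)
// and the inline (?i) becomes the i flag.
const idPattern = /^[^\]]*\]\s+\((?<FIELDNAME>[^ ]+)\)/i;

const line =
  "2012-06-20 20:35:13,980 INFO [http-bio-8080-exec-72] (b50f3a81-f9e0-4ebf-b9e2-b007c8dd4cbf) " +
  "interceptor.CustomLoggingOutInterceptor (AbstractLoggingInterceptor.java:149) - Outbound Message";

const match = idPattern.exec(line);
// The literal \( and \) sit outside the named group, so they are matched but not captured.
console.log(match ? match.groups.FIELDNAME : null);
```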

Regular expression - Negative look-ahead

I'm trying to use Perl's negative look-ahead in a regular expression
to exclude certain strings from the matched lines. Please give me your advice.
I want to get the strings which do not contain -sm, -sp, or -sa.
REGEX:
hostname .+-(?!sm|sp|sa).+
INPUT
hostname 9amnbb-rp01c
hostname 9tlsys-eng-vm-r04-ra01c
hostname 9tlsys-eng-vm-r04-sa01c
hostname 9amnbb-sa01
hostname 9amnbb-aaa-sa01c
Expected Output:
hostname 9amnbb-rp01c - SELECTED
hostname 9tlsys-eng-vm-r04-ra01c - SELECTED
hostname 9tlsys-eng-vm-r04-sa01c
hostname 9amnbb-sa01
hostname 9amnbb-aaa-sa01c
However, I got this actual Output below:
hostname 9amnbb-rp01c - SELECTED
hostname 9tlsys-eng-vm-r04-ra01c - SELECTED
hostname 9tlsys-eng-vm-r04-sa01c - SELECTED
hostname 9amnbb-sa01
hostname 9amnbb-aaa-sa01c - SELECTED
Please help me.
P.S.: I used Regex Coach to visualize my result.
Move the .+- inside of the lookahead:
hostname (?!.+-(?:sm|sp|sa)).+
Rubular: http://www.rubular.com/r/OuSwOLHhEy
Your current expression does not work because, with the .+- outside the lookahead, the engine can backtrack to an earlier - where the lookahead no longer fails. For example, with the string hostname 9amnbb-aaa-sa01c and the regex hostname .+-(?!sm|sp|sa).+, the first .+ initially tries 9amnbb-aaa, where the lookahead sees sa and fails; it then backtracks to 9amnbb, where the lookahead sees aa as the next two characters and passes, and the second .+ matches aaa-sa01c.
An alternative to my current regex would be the following:
hostname .+-(?!sm|sp|sa)[^-]+?$
This prevents the backtracking because no - can occur after the lookahead, so the lookahead is only ever checked at the last -. The non-greedy ? is there so that this also works correctly in multiline global mode.
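To see the difference, the three patterns can be run against the five inputs from the question. This is only a sketch in JavaScript rather than Perl (the lookahead syntax used here is the same in both); the property names and the selected helper are made up for illustration.

```javascript
const inputs = [
  "hostname 9amnbb-rp01c",
  "hostname 9tlsys-eng-vm-r04-ra01c",
  "hostname 9tlsys-eng-vm-r04-sa01c",
  "hostname 9amnbb-sa01",
  "hostname 9amnbb-aaa-sa01c",
];

const patterns = {
  // Pattern from the question: backtracking lets it slip past -sa when an earlier - exists.
  original: /hostname .+-(?!sm|sp|sa).+/,
  // Lookahead scans the whole remainder for any -sm, -sp or -sa before consuming anything.
  lookaheadFirst: /hostname (?!.+-(?:sm|sp|sa)).+/,
  // Lookahead is checked only at the last -, since no - may follow it.
  lastSegmentOnly: /hostname .+-(?!sm|sp|sa)[^-]+?$/,
};

// Hypothetical helper: returns the inputs a pattern selects.
function selected(pattern) {
  return inputs.filter((line) => pattern.test(line));
}

for (const [name, pattern] of Object.entries(patterns)) {
  console.log(name, selected(pattern));
}
```

The original pattern selects four of the five lines, as in the actual output from the question, while both alternatives select only the first two.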
The following passes your testcases:
hostname [^-]+(-(?!sm|sp|sa)[^-]+)+$
I think it is a little easier to read than F.J.'s answer.
To answer Rudy: the question was posed as an exclusion-of-cases situation. That seems to fit negative lookahead well. :)

Regular Expression to get a query string value without using lookbehind

I want to extract the "en" from the following url so it can be re-written.
contact/default.aspx?lang=en
/contact/default.aspx?lang=en-us&id=1
/contact/default.aspx?id=1111&lang=en
The above examples should be rewritten as:
/contact/en/default.aspx
Unfortunately IIS7 does not support lookbehinds, so this piece of regex cannot be used:
(?<=lang\=)(.+)
Any ideas how I can match the value part of the query string?
Thanks
I would do
.*?(&|\?)lang=([^&]+).*
and use capture group 2 (group 1 only holds the ? or & separator in front of lang).
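A quick check of that pattern against the three URLs from the question, sketched in JavaScript rather than as an IIS rewrite rule. With the pattern as written, the first group is the ? or & separator and the lang value lands in the second group; getLang is a hypothetical helper name.

```javascript
// Pattern from the answer; group 1 is the "?" or "&" separator, group 2 is the lang value.
const langPattern = /.*?(&|\?)lang=([^&]+).*/;

// Hypothetical helper: returns the lang value, or null when the parameter is absent.
function getLang(url) {
  const m = langPattern.exec(url);
  return m ? m[2] : null;
}

const urls = [
  "contact/default.aspx?lang=en",
  "/contact/default.aspx?lang=en-us&id=1",
  "/contact/default.aspx?id=1111&lang=en",
];
for (const url of urls) {
  console.log(url, "->", getLang(url));
}
```

For the second URL this yields en-us rather than en. If only the language part is wanted, changing the character class to [^&-]+ might be one option, though that is an assumption about the desired rewrite.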