Regular expression to match a domain

I need a regular expression that extracts the domain from each of the following inputs:
http://www.cnn.com/fred = www.cnn.com
cnn.com = cnn.com
www.cnn.com:8080 = www.cnn.com
I have the following regular expression (using PCRE):
([^/]+://)?([^:/]+)
This works fine for cases 2 and 3, but with case 1 the http:// is still included in the matched string. Is there a regular expression option I can use to skip the http:// part?
Many thanks in advance.

This one should suit your needs:
^(?:(?:f|ht)tps?://)?([^/:]+)
The first group will contain what you're looking for.
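Here is a minimal Python sketch of the pattern above, run against the three inputs from the question. Python's re module stands in for PCRE here (an assumption, although the pattern only uses features both engines support). extract_domain is a hypothetical helper name.

```python
import re

# Optional ftp://, ftps://, http:// or https:// prefix (non-capturing),
# then group 1 takes the host up to the first "/" or ":".
DOMAIN_RE = re.compile(r'^(?:(?:f|ht)tps?://)?([^/:]+)')


def extract_domain(url):
    """Return the host captured in group 1, or None when nothing matches."""
    m = DOMAIN_RE.match(url)
    return m.group(1) if m else None


for url in ["http://www.cnn.com/fred", "cnn.com", "www.cnn.com:8080"]:
    print(url, "->", extract_domain(url))
```

The scheme sits in a non-capturing group, so the whole match (group 0) still starts with http://. Reading group 1 instead of group 0 is what drops the prefix.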

This is the closest I could get to what I want. It's not perfect, but it seems to get the job done:
www?([^/:]+)
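A quick check in Python (an assumed stand-in for PCRE) shows what "not perfect" means in practice. The whole match is www.cnn.com for the first and third inputs, but group 1 loses the leading w's. The input cnn.com does not match at all, because the pattern needs a literal "ww".

```python
import re

# "ww", an optional third "w", then group 1 takes everything up to "/" or ":".
WWW_RE = re.compile(r'www?([^/:]+)')

for url in ["http://www.cnn.com/fred", "cnn.com", "www.cnn.com:8080"]:
    m = WWW_RE.search(url)  # search, not match: the pattern is unanchored
    # group(0) is the whole match; group(1) starts after the w's
    print(url, "->", (m.group(0), m.group(1)) if m else None)
```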

Related

Regex ignore first 12 characters from string

I'm trying to create a custom filter in Google Analytics to remove the query parts of the URL that I don't want to see. The URL has the following structure:
[domain]/?p=899:2000:15018702722302::NO:::
I would like to create a regex that skips the first 12 characters (that is, up to /?p=899:2000) and replaces whatever comes after them with nothing.
So I made this one: https://regex101.com/r/Xgbfqz/1 (which could be simplified to .{0,12}). However, I would actually like to skip those characters and have the regex match only whatever comes after them, so that I can tell Google Analytics to replace that part with "".
The part of the URL that is always the same is:
?p=[3numbers]:[0-4numbers]
Thank you.
Your regular expression:
\/\?p=\d{3}\:\d{0,4}(.*)
Tested with Go's regexp package (RE2 syntax) and on regex101.
It searches for /?p= followed by three digits, a colon and up to four optional digits, and captures the rest of the string to the right in group 1.
(extra) JavaScript:
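// Group 1 captures everything after /?p=###: and up to four more digits
const paragraf = '[domain]/?p=899:2000:15018702722302::NO:::';
const regex = /\/\?p=\d{3}\:\d{0,4}(.*)/;
const match = regex.exec(paragraf);
if (match) {
  alert('The rest of the right side of the string: ' + match[1]);
}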
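Here is the same capture in Python, as a sketch. Python's re is an assumption here, but the pattern only uses escapes and quantifiers that JavaScript and RE2 also accept. Group 1 is exactly the tail that the question wants replaced with nothing.

```python
import re

url = '[domain]/?p=899:2000:15018702722302::NO:::'

# "/?p=", three digits, ":", up to four digits; group 1 captures the rest.
pattern = re.compile(r'/\?p=\d{3}:\d{0,4}(.*)')

m = pattern.search(url)
rest = m.group(1) if m else None
print('The rest of the right side of the string:', rest)
```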
Or simply use "[domain]/?p=899:2000:15018702722302::NO:::".substring(12)
You can try this:
/\?p\=\d{3}:\d{0,4}
It matches just the ?p=[3numbers]:[0-4numbers] part.
Not sure about the replacing part, though.
https://regex101.com/r/Xgbfqz/1
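On the replacing part, one common approach is to capture the piece you want to keep and write it back with a backreference, so everything after it disappears. Below is a minimal Python sketch of that idea. Whether a given Google Analytics filter field accepts backreferences is an assumption that has not been checked here.

```python
import re

url = '[domain]/?p=899:2000:15018702722302::NO:::'

# Group 1 is the part to keep; .* swallows the tail. Writing back only \1
# drops everything after ?p=###: and its optional digits.
trimmed = re.sub(r'(/\?p=\d{3}:\d{0,4}).*', r'\1', url)
print(trimmed)  # [domain]/?p=899:2000
```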

Regex, how to match all urls but one?

My regex skills are minimal, and I have been trying for a while now to get this to work:
I need to match all URLs on one domain except one (the login page).
Example:
Match: domain.com/ANYTHING-GOES-HERE
but
Not Match: domain.com/login
I don't actually need to match the domain.com part, because that's always the same. I only need to match what comes after it.
I have tried:
(?!\/login)\/.*
\/.*[^login]
Neither one seems to work as desired.
Update:
I should have explained that this is done in PHP. I don't have control over the actual code that runs the regex, but I do have control over how many regexes I can use. So I could have one regex that matches everything, and then another one that does or does not match "/login".
You're almost there:
// javascript
r = /domain\.com\/(?!login).+/
r.test("domain.com/ANYTHING-GOES-HERE") // true
r.test("domain.com/login") // false
This also rejects "domain.com/login/foobar". If you want that URL to be accepted, change the regex to:
r = /domain\.com\/(?!login$).+/
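Both variants can be checked side by side in Python. Python's re is an assumed stand-in here for PHP's PCRE functions, and re.search plays the role of r.test. The three test paths come from the question and the answer.

```python
import re

# Fails for any path that starts with "login", including /login/foobar.
no_login_prefix = re.compile(r'domain\.com/(?!login).+')
# Fails only for exactly /login: the lookahead also requires end of string.
no_login_exact = re.compile(r'domain\.com/(?!login$).+')

paths = ["domain.com/ANYTHING-GOES-HERE", "domain.com/login", "domain.com/login/foobar"]
for p in paths:
    print(p, bool(no_login_prefix.search(p)), bool(no_login_exact.search(p)))
```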

MATLAB 2012 regular expression

I have a set of strings, all in the following format, that I'd like to parse in MATLAB 2012:
string-int-int-int-int-string
I'd like to pluck out the third integer (the rest are 'don't cares'), but I haven't used MATLAB in ages and need a refresher on regular expressions. I tried the regular expression '(.*)-(.*)-(.*)-\d-(.*)', but no dice. I did check the MATLAB regexp documentation page, but I wasn't able to figure out how to apply it to this case.
Does anyone know how I might get the desired result? If so, could you explain what your expression does to get that result, so that others can apply the answer to their own situations?
Thanks in advance!
str = 'XyzStr-1-2-1000-56789-ILoveStackExchange.txt';
[tok] = regexp(str, '^.+?-.+?-.+?-(\d+?)-.+?-.+?', 'tokens');
tok{:}
ans =
'1000'
Update
Explanation, upon request.
^ - "Anchor", or match beginning of string.
.+? - Wildcard match, one or more, non-greedy.
- - Literal dash/hyphen.
(\d+?) - Digits match, one or more, non-greedy, captured into a token.
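For readers outside MATLAB, here is a sketch of the same pattern and token extraction in Python. Python's re is substituted for MATLAB's regexp, and group 1 plays the role of the token.

```python
import re

s = 'XyzStr-1-2-1000-56789-ILoveStackExchange.txt'

# Three lazy "something-dash" pieces, then the wanted digits in group 1,
# then two more lazy pieces to confirm the overall shape.
m = re.search(r'^.+?-.+?-.+?-(\d+?)-.+?-.+?', s)
third_int = m.group(1) if m else None
print(third_int)  # 1000
```

Even though (\d+?) is non-greedy, it still grows to all four digits, because it must be followed by a literal dash.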
^.*?-.*?-.*?-(\d+)-.*?-.*?$
OR
^(?:[^-]*?-){3}(\d+)(?:.*?)$
Group 1 then contains the required value.
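Here are both alternatives from this answer, checked in Python as a sketch (Python's re is an assumed stand-in for MATLAB's regexp). Both patterns count dashes, so they assume the leading string contains no dash of its own.

```python
import re

s = 'XyzStr-1-2-1000-56789-ILoveStackExchange.txt'

patterns = [
    r'^.*?-.*?-.*?-(\d+)-.*?-.*?$',
    # (?:[^-]*?-){3} skips exactly three dash-terminated fields
    r'^(?:[^-]*?-){3}(\d+)(?:.*?)$',
]

results = []
for p in patterns:
    m = re.search(p, s)
    results.append(m.group(1) if m else None)
print(results)  # ['1000', '1000']
```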

Regular expression quantifier questions

I'm trying to find a regular expression that matches this kind of URL:
http://sub.domain.com/selector/F/13/K/100546/sampletext/654654/K/sampletext_sampletext.html
but does not match this one:
http://sub.domain.com/selector/F/13/K/10546/sampletext/5987/K/sample/K/101/sample_text.html
It should match only if /K/ appears at least once and at most twice (something with a quantifier like {1,2}).
So far I have the following regex:
http://sub\.domain\.com/selector/F/[0-9]{1,2}/[a-z0-9_-]+/
Now I need a hand adding a condition like:
Match this if in the text appears the /K/ from 1 to 2 times at most.
Thanks in advance.
Best Regards.
Josema
Do you need to do this all in one line?
The approach I would take is to run a regex for /K/ and then count the number of matches.
I think Boost is a C++ library, right? In C# I would do it like this:
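using System.Text.RegularExpressions;

string url = "http://sub.domain.com/selector/F/13/K/100546/sampletext/654654/K/sampletext_sampletext.html";
// Count the non-overlapping "/K/" occurrences and require one or two of them
int kCount = Regex.Matches(url, "/K/").Count;
if (kCount >= 1 && kCount <= 2)
{
    // good url found
}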
UPDATE
This regex matches everything up to the second K and then allows only the file name (ending in .html) after it:
^http://sub\.domain\.com/selector/F/[\d]+/[a-zA-Z]+/[\d]+/[a-zA-Z]+/[\d]+/K/[a-zA-Z_]+\.html$
This regex matches anything after /F/[0-9]{1,2} that contains one or two /K/ segments. For example, it also matches http://sub.domain.com/selector/F/13/K/100546/stuff/21515/stuff/sampletext/654654/K/stuff/sampletext_sampletext.html:
^http://sub\.domain\.com/selector/F/[0-9]{1,2}(?:/K(?=/)(?:(?!/K/)/[a-z0-9_.-]+)*){1,2}$
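Here is a Python check of the single-regex answer against the question's two URLs, with the counting idea from the C# answer alongside for comparison. Python's re stands in for Boost/PCRE here (an assumption), and k_count_ok is a hypothetical helper. One limitation is worth knowing: the path-segment class is lowercase only, so a segment such as /SampleText would make the regex fail.

```python
import re

# One or two "/K" segments, each followed by any number of lowercase
# path segments that do not themselves start a new "/K/".
K_RE = re.compile(
    r'^http://sub\.domain\.com/selector/F/[0-9]{1,2}'
    r'(?:/K(?=/)(?:(?!/K/)/[a-z0-9_.-]+)*){1,2}$'
)


def k_count_ok(url):
    """Counting alternative: accept 1 or 2 non-overlapping "/K/" occurrences."""
    return 1 <= url.count('/K/') <= 2


two_k = ("http://sub.domain.com/selector/F/13/K/100546/sampletext/654654/K/"
         "sampletext_sampletext.html")
three_k = ("http://sub.domain.com/selector/F/13/K/10546/sampletext/5987/K/sample/K/"
           "101/sample_text.html")

for url in (two_k, three_k):
    print(bool(K_RE.match(url)), k_count_ok(url))
```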

Regular Expression for some email rules

I was using a regular expression for email formats that I thought was OK, but the customer is complaining that it is too strict. They have come back with the following requirement:
The email must contain a "#" symbol and end with either .xx or .xxx (e.g. .nl or .com). They are happy for anything meeting these two rules to pass validation. I have started the expression by checking whether the string contains a "#" symbol:
^(?=.*[#])
This seems to work, but how do I add the last requirement (must end with .xx or .xxx)?
A regex simply enforcing your two requirements is:
^.+#.+\.[a-zA-Z]{2,3}$
However, there are email validation libraries for most languages that will generally work better than a regex.
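Here is a quick Python check of the two-requirement regex. Python's re is assumed here, and the separator is the "#" from the question.

```python
import re

# Anything, "#", anything, then a final dot followed by 2 or 3 letters.
SIMPLE_RE = re.compile(r'^.+#.+\.[a-zA-Z]{2,3}$')

for addr in ["example#domain.nl", "example#domain.com",
             "example#domain.info", "exampledomain.com"]:
    print(addr, bool(SIMPLE_RE.match(addr)))
```

Since .+ accepts any character except a newline, spaces and extra "#" signs also pass. That fits the deliberately loose requirement.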
I always use this for emails:
^([a-zA-Z0-9_\-\.]+)#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$
Try http://www.ultrapico.com/Expresso.htm as well!
It is not possible to validate every email address with a regex, but for your requirements this simple regex works. It is neither complete nor does it check for errors in any way, but it meets the spec exactly:
[^#]+#.+\.\w{2,3}$
Explanation:
[^#]+: Match one or more characters that are not #
#: Match the #
.+: Match one or more of any character
\.: Match a .
\w{2,3}: Match 2 or 3 word characters (letters, digits or underscore)
$: End of string
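Because \w covers digits and underscore as well as letters, this pattern is slightly looser than one using [a-zA-Z]. Here is a small Python comparison. Python's re is assumed here, and note that in Python 3, \w in a str pattern also matches non-ASCII letters.

```python
import re

W_RE = re.compile(r'[^#]+#.+\.\w{2,3}$')           # pattern from this answer
LETTERS_RE = re.compile(r'^.+#.+\.[a-zA-Z]{2,3}$')  # letters-only ending

for addr in ["user#example.com", "user#example.c0"]:
    # search() because W_RE has no ^ anchor
    print(addr, bool(W_RE.search(addr)), bool(LETTERS_RE.match(addr)))
```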
Try this:
([\w-\.]+)#((?:[\w]+\.)+)([a-zA-Z]{2,4})\be(\w*)s\b
A good tool for testing regular expressions:
http://gskinner.com/RegExr/
You could use:
[#].+\.[a-z0-9]{2,3}$
This should work:
^[^#\r\n\s]+[^.#]#[^.#][^#\r\n\s]+\.(\w){2,}$
I tested it against these invalid emails:
#exampleexample#domaincom.com
example#domaincom
exampledomain.com
exampledomain#.com
exampledomain.#com
example.domain#.#com
e.x+a.1m.5e#em.a.i.l.c.o
some-user#internal-email.company.c
some-user#internal-ema#il.company.co
some-user##internal-email.company.co
#test.com
test#asdaf
test#.com
test.#com.co
And these valid emails:
example#domain.com
e.x+a.1m.5e#em.a.i.l.c.om
some-user#internal-email.company.co
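The two lists above can be turned into a small regression check. Here is a Python sketch, assuming Python's re behaves like the poster's engine for these constructs.

```python
import re

P = re.compile(r'^[^#\r\n\s]+[^.#]#[^.#][^#\r\n\s]+\.(\w){2,}$')

invalid = [
    "#exampleexample#domaincom.com", "example#domaincom", "exampledomain.com",
    "exampledomain#.com", "exampledomain.#com", "example.domain#.#com",
    "e.x+a.1m.5e#em.a.i.l.c.o", "some-user#internal-email.company.c",
    "some-user#internal-ema#il.company.co", "some-user##internal-email.company.co",
    "#test.com", "test#asdaf", "test#.com", "test.#com.co",
]
valid = [
    "example#domain.com", "e.x+a.1m.5e#em.a.i.l.c.om",
    "some-user#internal-email.company.co",
]

# Every invalid address should be rejected and every valid one accepted.
rejected = [a for a in invalid if not P.match(a)]
accepted = [a for a in valid if P.match(a)]
print(len(rejected), "of", len(invalid), "invalid rejected;",
      len(accepted), "of", len(valid), "valid accepted")
```

Note that (\w){2,} also accepts endings longer than three characters, such as .info. That goes beyond the customer's .xx/.xxx rule.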
Edit:
This one appears to accept the addresses from that Wikipedia page (except the IPv6 literal, which ends in "]" rather than a word character), though it probably allows some invalid emails as well. The parentheses split it into everything before and after the #:
^([^\r\n]+)#([^\r\n]+\.?\w{2,})$
niceandsimple#example.com
very.common#example.com
a.little.lengthy.but.fine#dept.example.com
disposable.style.email.with+symbol#example.com
other.email-with-dash#example.com
user#[IPv6:2001:db8:1ff::a0b:dbd0]
"much.more unusual"#example.com
"very.unusual.#.unusual.com"#example.com
"very.(),:;<>[]\".VERY.\"very#\\ \"very\".unusual"#strange.example.com
postbox#com
admin#mailserver1
!#$%&'*+-/=?^_`{}|~#example.org
"()<>[]:,;#\\\"!#$%&'*+-/=?^_`{}| ~.a"#example.org
" "#example.org
üñîçøðé#example.com
üñîçøðé#üñîçøðé.com
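Here is a Python spot check of this lax pattern on a few entries from the list, with Python's re assumed as the engine. It shows the two capture groups for the addresses that match.

```python
import re

# Group 1: everything before the last usable "#"; group 2: the rest,
# which must end in two or more word characters.
LAX_RE = re.compile(r'^([^\r\n]+)#([^\r\n]+\.?\w{2,})$')

samples = [
    "niceandsimple#example.com",
    "postbox#com",
    "admin#mailserver1",
    "user#[IPv6:2001:db8:1ff::a0b:dbd0]",  # ends in "]", so \w{2,}$ cannot match
]
for s in samples:
    m = LAX_RE.match(s)
    print(s, "->", m.groups() if m else None)
```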