I am new to regular expressions. Given this list, I need to find the matches:
a.com
b.com
c.com
aa.com
admin.com
www.com
mail.com
vg.com
As a result, I want a regular expression that matches all domains except admin / www / mail.
I wrote this:
[a-zA-Z0-9]+.com
But how do I exclude admin, mail, and www?
I tried this:
^(www|mail|admin)[a-zA-Z0-9]+.com
But it doesn't work.
Try this
\w+(?<!admin|mail|www)\.com
Here it is with some tests
http://www.rubular.com/r/frRl1ucR8J
Further reading on Regular Expressions: http://www.regular-expressions.info/tutorial.html
And the trick I used is called Negative LookBehind http://www.regular-expressions.info/lookaround.html
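For engines that only accept fixed-width lookbehind (Python's `re` module is one of them), the same exclusion can be sketched with a negative lookahead instead. This is an alternative to the lookbehind above, not the same pattern. It assumes each domain is tested as a whole string, and the name `ALLOWED` is made up for illustration.

```python
import re

# Reject the exact names admin, mail and www; accept any other alphanumeric label.
# Assumption: each domain is checked on its own, hence the ^ and $ anchors.
ALLOWED = re.compile(r"^(?!(?:admin|mail|www)\.com$)[a-zA-Z0-9]+\.com$")

domains = ["a.com", "b.com", "c.com", "aa.com",
           "admin.com", "www.com", "mail.com", "vg.com"]

matches = [d for d in domains if ALLOWED.match(d)]
print(matches)  # ['a.com', 'b.com', 'c.com', 'aa.com', 'vg.com']
```

Because the lookahead checks the whole name up to `.com`, a domain such as admins.com would still be accepted; only the three exact names are rejected.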
It is not simple to exclude some things, but here is a link to help:
http://www.codinghorror.com/blog/2005/10/excluding-matches-with-regular-expressions.html
Is it possible to use a replace first? You could first do a find/replace to eliminate the lines that match the things you want to skip, then apply your regular expression.
You would do this to search for a string that doesn't contain admin:
^((?!admin).)*$
I'm not sure how to do it for multiple strings...
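For several strings, one option is to put an alternation inside the same negative lookahead. The following is a small sketch in Python under that assumption; the helper name `lacks_reserved` is hypothetical.

```python
import re

# At every position, none of the listed words may start there.
NO_RESERVED = re.compile(r"^(?:(?!admin|mail|www).)*$")

def lacks_reserved(line: str) -> bool:
    """Return True if the line contains none of admin, mail, www."""
    return NO_RESERVED.match(line) is not None

for line in ["a.com", "admin.com", "www.com", "mail.com", "vg.com"]:
    print(line, lacks_reserved(line))
```

Note that this rejects the words anywhere in the line, so something like webmail.com is rejected as well.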
I use this, which is somewhat similar to what has already been answered.
/^[A-Za-z0-9._'%+-]+@(\[(\d{1,3}\.){3}|(?!hotmail|gmail|yahoo|live|msn|outlook|comcast|verizon)(([a-zA-Z\d-]+\.)+))([a-zA-Z]{2,4}|\d{1,3})(\]?)$/i
Related
I am trying to create a PCRE regex that sanitizes URLs containing multiple slashes, like the following:
https://www.domin.com/test1/////test2/somemoretests_67142
https://www.domin.com/test1/test2/somemoretests_67142/////
https://www.domin.com/test1/test2///somemoretests_67142
The idea is to replace it with https://\2\4 so that the resulting link looks like this: https://www.domin.com/test1/test2/somemoretests_67142
I have been struggling with it for the past couple of days, so any regex guru help is more than welcome :)
I have tried the following and more:
(http|https):\/\/(.*)(\/\/+)(.*)
(http|https):\/\/(.*)(\/\/){2,}(.*)
(http|https):\/\/(.*)(\/\/{2})(.*)
I am going to use these in Akamai to sanitize our URLs through a cloudlet.
You can try:
(?<!https:\/)(?<!http:\/)(\/+$|(?<=\/)\/+)
Then substitute the first group with an empty string.
Regex demo.
This will produce this output:
https://www.domin.com/test1/test2/somemoretests_67142
https://www.domin.com/test1/test2/somemoretests_67142
https://www.domin.com/test1/test2/somemoretests_67142
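To check the substitution outside a PCRE tool, here is a minimal sketch using Python's `re` module, which accepts the same pattern because each lookbehind is fixed-width. The name `SLASHES` is invented for the example.

```python
import re

# Same pattern as above, written without the escaped slashes:
# skip the "//" after the scheme, then match trailing slashes or
# any run of slashes that follows another slash.
SLASHES = re.compile(r"(?<!https:/)(?<!http:/)(/+$|(?<=/)/+)")

urls = [
    "https://www.domin.com/test1/////test2/somemoretests_67142",
    "https://www.domin.com/test1/test2/somemoretests_67142/////",
    "https://www.domin.com/test1/test2///somemoretests_67142",
]

cleaned = [SLASHES.sub("", u) for u in urls]
for u in cleaned:
    print(u)
```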
I need to fix my url pattern:
/^((http(s)?(\:\/\/)){1}(www\.)?([\w\-\.\/])*(\.[a-zA-Z]{2,4}\/?)[^\\\/#?])[^\s\b\n|]*[^\.,;:\?\!\#\^\$ -]/
I thought this regex was ok, but it is not working for urls like: https://xx.xx (without www). 'www' should be optional ((www.)?). Where is the bug?
The problem is not in the (www\.)? part but in the parts after it.
Take a look at the [^\\\/#?] and the [^\.,;:\?\!\#\^\$ -] parts.
So a valid URL has to be https://xx.xx followed by one character that is none of \/#? and then one character that is none of .,;:?!#^$, space, or -. The URL only matches if you add such characters, for example https://xx.xx11.
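A quick way to see this is to run the pattern against a few inputs. The sketch below uses Python's `re` module, assuming it treats this pattern the same way as the asker's engine; `PATTERN` is just a name for the example.

```python
import re

# The pattern from the question, split over two lines for readability.
PATTERN = re.compile(
    r"^((http(s)?(\:\/\/)){1}(www\.)?([\w\-\.\/])*(\.[a-zA-Z]{2,4}\/?)[^\\\/#?])"
    r"[^\s\b\n|]*[^\.,;:\?\!\#\^\$ -]"
)

for url in ["https://xx.xx", "https://xx.xx11", "https://www.xx.xx"]:
    print(url, bool(PATTERN.search(url)))
```

The first URL fails because nothing is left for the two trailing character classes. The version with www matches only because `www` can be consumed as the host part and `.xx` as the TLD, which leaves `.xx` over for the trailing classes.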
I advise you not to write your own regex for this, because you are missing a lot!
For example, tlds like .amsterdam are valid. And why are you capturing so many groups?
Your regex as an image made with https://www.debuggex.com/:
I am trying to write a regular expression to compare URLs.
Any URL Matches
http://*.xyz.com
Except or Excluding
http://m.xyz.com and http://m.product.xyz.com
So far I have been trying to do it with an if/else in RegExp, but I couldn't get it right...
(^http:\/\/)(((1)<!(m|m\.product))\.xyz\.co\.jp)?
You can try this:
^http:\/\/(?!m\.xyz\.com|m\.product\.xyz\.com).*\.xyz\.com$
Regex101 Demo
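As a rough check, the pattern can be run against a handful of hosts in Python. The hosts www.xyz.com and shop.xyz.com are made-up examples, and `RULE` is a hypothetical name.

```python
import re

# Negative lookahead right after the scheme rejects the two excluded hosts.
RULE = re.compile(r"^http:\/\/(?!m\.xyz\.com|m\.product\.xyz\.com).*\.xyz\.com$")

candidates = [
    "http://www.xyz.com",        # assumed example host
    "http://shop.xyz.com",       # assumed example host
    "http://m.xyz.com",
    "http://m.product.xyz.com",
]

accepted = [u for u in candidates if RULE.match(u)]
print(accepted)  # ['http://www.xyz.com', 'http://shop.xyz.com']
```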
https?:\/\/(?!m\.|m\.product\.).*\.xyz\..*
This regex accepts all *.xyz.* domains except m.xyz.* and m.product.xyz.*. It also handles both http and https.
Demo
What would the regular expression look like to include/exclude a specific URL? I posted two URLs below. I need a regex that will distinguish between the two. The only difference between the two URLs is the ending: type vs hcat.
https://post.craigslist.org/k/WDEDan6W4xGILKcEW036_A/w7TH4?s=type
https://post.craigslist.org/k/WDEDan6W4xGILKcEW036_A/w7TH4?s=hcat
I hope I understood your question right.
If you want to match exactly the given URLs, this should do:
"https://post\.craigslist\.org/k/WDEDan6W4xGILKcEW036_A/w7TH4\?s=(type|hcat)"
With this, capture group 1 contains either type or hcat whenever the URL matches.
If you only want to check that the URL is on this domain and ends with the parameter s set to type or hcat, use this:
"https://post\.craigslist\.org/.*?s=(type|hcat)"
Note: The ? now marks the * as not greedy, it is not the escaped \? from above.
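To tell the two URLs apart in code, one can read capture group 1 from the exact pattern. This is a small Python sketch; the function name `ending` is hypothetical, and `fullmatch` is used here on the assumption that nothing may follow the parameter.

```python
import re
from typing import Optional

# Exact pattern from above; group 1 captures the value of the s parameter.
EXACT = re.compile(
    r"https://post\.craigslist\.org/k/WDEDan6W4xGILKcEW036_A/w7TH4\?s=(type|hcat)"
)

def ending(url: str) -> Optional[str]:
    """Return 'type' or 'hcat' for a matching URL, otherwise None."""
    m = EXACT.fullmatch(url)
    return m.group(1) if m else None

print(ending("https://post.craigslist.org/k/WDEDan6W4xGILKcEW036_A/w7TH4?s=type"))
print(ending("https://post.craigslist.org/k/WDEDan6W4xGILKcEW036_A/w7TH4?s=hcat"))
```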
I have the following regex that attempts to match URLs:
/((http|https):(([A-Za-z0-9$_.+!*(),;/?:#&~=-])|%[A-Fa-f0-9]{2}){2,}(#([a-zA-Z0-9][a-zA-Z0-9$_.+!*(),;/?:#&~=%-]*))?([A-Za-z0-9$_+!*();/?:~-]))/g
How can I modify this regex to only match URLs of a single domain?
For example, I only want to match URLs that begin with http://www.google.com?
This should simplify my regex, but I'm too much of a regex noob to get it working (after all these years...)
Did you write that RegEx? I don't know what it's trying to do, but it certainly doesn't match URLs correctly. Here's something it matches:
http:###9#?~
which I'm pretty sure isn't a valid URL.
You shouldn't be using RegEx to match URLs like this. You haven't said what language you're working in, but use whatever its equivalent of urlparse is.
Here's a relevant question: How do you validate a URL with a regular expression in Python?
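Since the language was not stated, here is what the parser-based approach could look like in Python with `urllib.parse.urlparse`. The function name `is_google_url` is made up, and accepting both http and https is an assumption; the question only mentions http.

```python
from urllib.parse import urlparse

def is_google_url(url: str) -> bool:
    """Check the scheme and host instead of pattern-matching the whole URL."""
    parts = urlparse(url)
    # Assumption: https is accepted as well as http.
    return parts.scheme in ("http", "https") and parts.hostname == "www.google.com"

print(is_google_url("http://www.google.com/search?q=regex"))
print(is_google_url("http://www.google.com.evil.example/"))
print(is_google_url("http:###9#?~"))
```

Comparing the parsed hostname avoids accepting hosts that merely start with www.google.com.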