Regex to look for url start value and end value - regex

I'm using regex to look for URLs that start with http or https and contain a specific value.
^http|https\:\/\/www
This regex looks at the http/https in a URL and this works.
/[\/]\bvalue?\b[\/]/g
This regex looks for "value" in a url and this currently matches with
http://www.test.co.uk/value/
http://www.test.co.uk/folder/value/
Is it possible to combine these two regexes? Basically I need to display URLs that don't contain http/https or /value/ in the URL path.

You're looking to do this: /(?=^(https|http))|(\bvalue\b)/g
First half: (?=^(https|http)), which tries https first and then http. In my opinion you can reduce this to just http, since any string that starts with https also starts with http. It may look like that won't work, but logically it does. You can try it and see what happens.
Second half: (\bvalue\b). You can be more specific, for example by requiring it to sit between forward slashes, or not. I used the \b word boundary so that it is not matched as part of a longer word, and it worked quite well.
The important part here is to unite them, so use the | operator and it yields the above result.
Test strings:
http://www.helloworldvalue/value/values/
https://www.helloworldvalue/values/svalue/value/value/vaaluevalue/
Try it and let me know if you have any questions in the comments below.
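To tie this back to the original goal (showing only the URLs that match neither condition), here is a minimal Python sketch. It uses the pattern from the answer unchanged. The function name is_excluded and the sample list are assumptions made up for illustration, not part of the question.

```python
import re

# Pattern copied from the answer: "starts with http/https" OR "contains the word value".
COMBINED = re.compile(r"(?=^(https|http))|(\bvalue\b)")


def is_excluded(url: str) -> bool:
    """Return True if the URL starts with http/https or contains the word 'value'."""
    return COMBINED.search(url) is not None


# Hypothetical sample data for illustration only.
candidates = [
    "http://www.test.co.uk/value/",
    "www.test.co.uk/folder/value/",
    "www.test.co.uk/folder/values/",
]

# Keep only the URLs that match neither half of the pattern.
kept = [u for u in candidates if not is_excluded(u)]
print(kept)
```

Note that \b treats "values" as a different word, so the last URL is kept. If you need the stricter "/value/" form, swap the second half for \/value\/.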

Related

Regex to match a URL with parameters but reject URL with subfolder

Short Question: What regex statement will match the parameters on a URL but not match a subfolder? For example match google.com?parameter but not match google.com/subdomain
Long Question: I am re-directing a few URLs on a site.
I want a request to ilovestarwars.com/page2 to re-direct to ilovestarwars.com/forceawakens
I setup this re-direct and it works great most of the time. The problem is when there are URL parameters. For example if someone sends the URL using an email program that tracks links. Then ilovestarwars.com/page2 becomes ilovestarwars.com/page2?parameter=trackingcode123 after they send it which results in a 404 on my site because it is looking for the exact URL.
No problem, I will just use Regex. So I now re-direct using ilovestarwars.com/page2(.*) and it works great: it accepts all the parameters, no more 404s.
However, trying to future proof my work, I am worried, what happens if someone adds content inside the page2 folder? For example ilovestarwars.com/page2/mistake
They shouldn't, but if they do, it will take them forever to figure out why it is redirecting.
So my question is, how can I create a regex statement that will match the parameters but reject a subfolder?
I tried page2(.*?)/ as is suggested in this answer, but https://www.regex101.com/ says the slash is an unescaped delimiter.
Background info as suggested here, I am using Wordpress and the Redirection plugin. This is the article that goes over the initial redirect I setup.
A direct answer to your question would be something like this: ^/([^?&/\\]*)(.*)$
This assumes the string starts at the first / (if it doesn't, remove the / that follows the ^). In the first capture group you will get the page name (page2, in the case of your example URL) and in the second capture group, you will get the remaining part of the url (anything following one of these chars: ?, &, /, \). Note that the backslash has to be escaped inside the character class, otherwise \] is read as a literal ] and the class is never closed. If you don't care about the second capture group, use ^/([^?&/\\]*).*$
An indirect answer would be that you don't do it this way. Instead, there should be an index page in folder page2 that uses a 301 redirect to redirect to the proper page. It would make much more sense to do it statically. I understand that you may not have that much control over your webpage, though, since it is Wordpress, in which case the former answer should work with the given plugin.
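As a rough illustration of how the two capture groups could be used to accept parameters but reject a subfolder, here is a small Python sketch. It uses the answer's pattern with the backslash in the character class written as \\. The should_redirect helper and the rule "redirect only when the remainder is empty or starts with ?" are my assumptions, and this is not how the Redirection plugin itself evaluates rules.

```python
import re

# Answer's pattern: group 1 is the page name, group 2 is whatever follows it.
SPLIT = re.compile(r"^/([^?&/\\]*)(.*)$")


def should_redirect(path: str) -> bool:
    """Hypothetical check: redirect /page2 with or without a query string only."""
    match = SPLIT.match(path)
    if match is None:
        return False
    page, rest = match.groups()
    # Accept the bare page or a query string; reject subfolders such as /page2/mistake.
    return page == "page2" and (rest == "" or rest.startswith("?"))


print(should_redirect("/page2?parameter=trackingcode123"))
print(should_redirect("/page2/mistake"))
```

The same idea can be written as a single pattern, for example ^/page2(\?.*)?$, if the tool you are using only accepts one regex.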

regex for url without protocol

I need a regex to check a url with this format: "www.stackoverflow.com".
I do not want to allow http or https: "http://www.stackoverflow.com"
I've been looking for a good 45 minutes and can't find anything, only regexes that allow both or that require "http".
The closest I've seen is "^([a-zA-Z0-9]+(.[a-zA-Z0-9]+)+.*)$" but this allows anything as long as it includes "."
Acceptable expression: www.example.com
Unacceptable expression: http://www.example.com, example.com etc.
Basically something that makes sure it starts with "www.". If possible I also want to make sure it ends with ".something". And all the other URL regex attributes like not allowing "!" etc.
Try using this pattern:
^(?!https?).*$
with the i modifier for case-insensitive matching.
Per comment below use this pattern
^(?!https?)www\..*$
or simply
^www\..*$
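For comparison, here is a short Python sketch that checks the last pattern against the examples from the question, next to a stricter variant. The STRICT pattern is my own assumption (letters, digits and hyphens only, with at least one further ".something" after the first label). It is not a complete hostname validator.

```python
import re

# Pattern from the answer: anything that starts with "www.".
ANSWER = re.compile(r"^www\..*$", re.IGNORECASE)

# Assumed stricter variant: www. + label + one or more ".label" parts, no other characters.
STRICT = re.compile(r"www\.[a-z0-9-]+(?:\.[a-z0-9-]+)+", re.IGNORECASE)


def is_bare_www(text: str) -> bool:
    """Return True only if the whole string looks like www.name.tld."""
    return STRICT.fullmatch(text) is not None


for sample in ["www.example.com", "http://www.example.com", "example.com", "www.exa!mple.com"]:
    print(sample, bool(ANSWER.match(sample)), is_bare_www(sample))
```

The answer's pattern accepts "www.exa!mple.com" because .* allows any character, while the stricter variant rejects it.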

Regex to match url but not urlMvc

I'm going through my Android app at the minute making sure all my HTTP calls point to the same place. I want to run a search so I don't have to manually look through each file and possibly miss one. I've seen I can do a find using regex. What I need is a regex that matches the string url but ignores urlMvc and urlProcedural (as they're the variables the calls should be made to). Is this something that's possible with a regex, or will I have to go through all the files manually?
Copy from comment: You can use negative lookahead: url(?!Mvc|Procedural)
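If you want to see the lookahead in action outside the IDE search box, here is a quick Python sketch. The three source lines are invented for illustration and are not from the asker's app.

```python
import re

# "url" not immediately followed by "Mvc" or "Procedural".
PATTERN = re.compile(r"url(?!Mvc|Procedural)")

# Hypothetical lines of Java source, for illustration only.
lines = [
    'String a = url + "/login";',
    'String b = urlMvc + "/users";',
    'String c = urlProcedural + "/report";',
]

# Only lines that still use the plain "url" variable are reported.
hits = [line for line in lines if PATTERN.search(line)]
print(hits)
```

The match is case-sensitive, so a variable such as baseUrl would not be reported unless you search case-insensitively.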

Regex pattern to format url

I have this pattern ^(?:http://)?(?:www.)?(.*?)/?(.*?)$ but it's still not perfect.
Let's say we have these urls to test against it:
example.com
example.com/
www.example.com/
http://example.com/
example.com/param
http://example.com/params/
The final output should be example.com/ if there are no parameters and example.com/params/ if there are. My problem is that it matches only the second group. It doesn't look like /? is working, otherwise it would stop at the slash character. Is it possible to achieve what I want using only one pattern?
So you want the host name in $1? Your regex is ambiguous, there are many ways to match it; the regex engine takes the leftmost match and, because (.*?) is lazy, lets the first group match as little as possible, so everything ends up in the second group. If you don't want slashes in the first part, then say so. Explicitly. (?:http://)?(?:www\.)?([^/]*)?/?(.*)?$
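Here is a small Python sketch of that idea applied to the test URLs from the question. It is a simplified form of the pattern above (I added the ^ anchor and dropped the redundant ? after each group), and the normalize function name is an assumption for illustration.

```python
import re

# Optional http://, optional www., host without slashes, optional slash, then the rest.
NORMALIZE = re.compile(r"^(?:http://)?(?:www\.)?([^/]*)/?(.*)$")


def normalize(url: str) -> str:
    """Return host + "/" + remaining path, e.g. example.com/params/."""
    match = NORMALIZE.match(url)
    if match is None:
        # Only happens for input containing newlines; leave it untouched.
        return url
    host, rest = match.group(1), match.group(2)
    return f"{host}/{rest}"


for sample in ["example.com", "www.example.com/", "http://example.com/params/"]:
    print(sample, "->", normalize(sample))
```

Because [^/]* cannot cross a slash, the host always lands in the first group and whatever follows the first slash lands in the second.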
One that I've used is:
((?:(?:https?://)?[\w\d:##%/;$()~_?\+\-=&]+|www|ftp)\.[\w\d:##%/;$()~_?\+\-=&\.]+)
The problem with URLs is that there are SO many ways one can be written, which is why the above code looks so congested. This will match all your examples above, but it will also match things like:
alkasi.jaias
Hopefully this will get you headed to where you need or want to go, and perhaps someone might be able to come up behind me and clean it up some (it's early morning, I'm getting ready for work, and am exhausted. :P)

Regex to find bad URLs in a database field

We had an issue with the text editor on our website that was doubling up the URL. So for example, the text field may contain:
This is a description for a media item, and here in a link.
So pretty much I need a regex to detect any string that begins with http and has another http before a closing quote, as in "http://www.example.com/apage.htmlhttp://www.example.com/apage.html"
"http[^"]+http
http://www.example.com/apage.htmlhttp://www.example.com/apage.html
This is actually a valid URL! So you'd want to be a bit careful not to munge any other URLs that happen to have ‘http://’ in the middle of them. To detect only a ‘doubled’ URL you could use backreferences:
"(https?://[^"]*)\1"
(This is a non-standard regex feature, but most modern implementations have it.)
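To show how the backreference behaves, here is a minimal Python sketch that collapses a doubled URL and leaves a legitimate URL with an embedded http:// alone. The fix_doubled name and the sample strings are assumptions for illustration. If the field lives in a database, you would still need to read and write the rows yourself.

```python
import re

# A quoted URL whose text is immediately repeated before the closing quote.
DOUBLED = re.compile(r'"(https?://[^"]*)\1"')


def fix_doubled(html: str) -> str:
    """Replace "XX" with "X" when X is an http(s) URL repeated back to back."""
    return DOUBLED.sub(r'"\1"', html)


broken = '<a href="http://www.example.com/apage.htmlhttp://www.example.com/apage.html">link</a>'
legit = '<a href="http://a.example/?next=http://b.example/">link</a>'

print(fix_doubled(broken))
print(fix_doubled(legit))
```

The second string is unchanged because its two halves are not identical, which is exactly the case the plain http...http patterns would wrongly flag.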
Using regex to process HTML is a bad idea. HTML cannot reliably be parsed by regex.
If you can use the .*? syntax, you can just look for the following:
http(.*?)http
and if it's present, reject the URL.
The string that begins with http and has another http before a quote is:
^http[^"]*http
But, although this answers exactly your question I suspect you may want Uh Clem's answer instead ;-)
You will probably want something like this:
("http[^"]+)(http)
Then compare the two and if \1 === " + \2 then replace them.
One thought: do you have any query strings in any of your URLs? If you do, are any of them like this "http://someurl.com?http=somemoredatahttp://someurl.com?http=somemoredata"?
If so, you will want something far more complicated.