Check if URL is in a sentence with regex

I need to check whether a URL is in a sentence.
Some text. This is good.
https://stackoverflow.com
More text
More text https://stackoverflow.com. More text. This is bad
I can find the URLs after some research, but I'm stuck on finding them in sentences.
https://regex101.com/r/AmuFIX/5
((http|ftp|https):\/\/)?[-a-zA-Z0-9#:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9#:%_\+.~#?&//=]*)[\r\n]

Based on the comments, it sounds like you're looking for cases where a URL is mixed with other text on a line, not necessarily a sentence. For that, I would use something like this:
.+\b((http|ftp|https):\/\/)[-a-zA-Z0-9#:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9#:%_\+.~#?&//=]*)\b.+
This changes your pattern by asserting that there must be some characters, followed by a word boundary, followed by a URL, followed by a word boundary, followed by some other characters. This won't match a URL at the start or end of a line that also has other content; for that you'd likely need two separate matches: one for a URL with something before it, one for a URL with something after it.
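To see what that pattern does and does not flag, here is a small Python sketch that runs it, copied verbatim, over the sample lines from the question. The helper name has_text_on_both_sides and the last sample line are my own assumptions, not part of the original post.

```python
import re

# The answer's pattern, copied verbatim.
MIXED_URL = re.compile(
    r".+\b((http|ftp|https):\/\/)[-a-zA-Z0-9#:%._\+~#=]{2,256}"
    r"\.[a-z]{2,6}\b([-a-zA-Z0-9#:%_\+.~#?&//=]*)\b.+"
)

def has_text_on_both_sides(line: str) -> bool:
    """Hypothetical helper: True if a URL has other text before and after it."""
    return MIXED_URL.search(line) is not None

samples = [
    "Some text. This is good.",
    "https://stackoverflow.com",
    "More text",
    "More text https://stackoverflow.com. More text. This is bad",
    "https://stackoverflow.com is a site",  # URL at line start: not flagged
]

for line in samples:
    print(has_text_on_both_sides(line), repr(line))
```

Only the fourth line is flagged. The last line shows the limitation mentioned above: a URL at the very start of a line slips through, so a second pattern would be needed for that case.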

Related

URL regex that skips ending periods

I'm trying to create a regex that matches url strings within normal text. I have this:
http[s]?://[^\s]+
This seems to work well with the exception that if the url is at the end of a sentence it will grab the period as well. For example for this string:
I am typing some text with the url http://something.com/some-thing?args=someargs. This is another sentence.
it matches:
http://something.com/some-thing?args=someargs.
I would like it to match:
http://something.com/some-thing?args=someargs
Obviously I can't exclude periods because they are in the url previously but I can't figure out how to tell it to exclude the last period if there is one. I could potentially use a negative lookahead for end of line or whitespace, but if it's in the middle of the line (without a period after it) that would leave off the last character of the url.
Most of the ones I have seen online have the same issue that they match the ending dot so maybe it's not possible? I know basic regex but certainly not a genius with it so if someone has a solution I would be very grateful :).
Also, I can do some post-process in this case to remove the dot if I need to, just seems like there should be a Regex solution...
Try this one
http[s]?://[^\s]+[^. ]
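As a rough check, here is a Python sketch of the same idea: let the match backtrack until its last character is not a period. Note that it uses [^\s.] for the final character instead of [^. ], so that a tab or newline after the URL is also left out; that tweak and the helper name find_urls are my assumptions, not part of the answer.

```python
import re

# Greedy run of non-whitespace, but the final character must be
# neither whitespace nor a period, so a sentence-ending dot is dropped.
URL_NO_TRAILING_DOT = re.compile(r"https?://\S*[^\s.]")

def find_urls(text: str) -> list[str]:
    """Hypothetical helper: return every URL found in text."""
    return URL_NO_TRAILING_DOT.findall(text)

sentence = (
    "I am typing some text with the url "
    "http://something.com/some-thing?args=someargs. This is another sentence."
)
print(find_urls(sentence))
print(find_urls("See http://something.com/page for details"))
```

Periods inside the URL are kept because only the last character is restricted. Other trailing punctuation, such as a comma or closing parenthesis, would still be captured unless it is added to the final character class.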

What's the right regular expression to match the exact word at the end of a string and excluding all other urls with more chars at the end?

I have to match an exact string at the end of a url, but not match all other urls that have more characters after that string
I can explain better with an example.
I need to match the URL that has the string 'white' at its end: http://mysite.com/white
But I also need to not match URLs that have one or more characters appended to it, like http://mysite.com/white__blue or http://mysite.com/white/yellow or http://mysite.com/white/
How to do that?
Thanks
Regex to match any URL:
^(https?:\/\/)?([\da-z\.-]+\.[a-z\.]{2,6}|[\d\.]+)([\/:?=&#]{1}[\da-z\.-]+)*[\/\?]?$
Regex to match a URL ending in white:
^(https?:\/\/)?([\da-z\.-]+\.[a-z\.]{2,6}|[\d\.]+)([\/:?=&#]{1}[\da-z\.-]+)*[\/\?]?white$
You can check the regex here
From regexr.com
It does not match URLs (which are not valid anyway) like
httpabrakadabra.co//
http:google.com
http://no-tld-here-folks.a
http://potato.54.211.192.240/
Based on your limited sample inputs, I'd say you could get away with this very minimal pattern:
^http[^\s]+white$
However, depending on what you are truly trying to achieve, what language/function you are implementing this pattern with, and what the full input string looks like, this pattern may need to be refined.
It would be best if you would improve your question to include all of the above relevant information.
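For what it's worth, here is a quick Python sketch of that minimal pattern against the URLs from the question. The stricter variant, which requires 'white' to be a whole path segment, is my own assumption about what the asker might want; it is not from either answer.

```python
import re

# Minimal pattern from the answer: starts with http, no whitespace,
# and the string ends immediately after "white".
ENDS_WITH_WHITE = re.compile(r"^http[^\s]+white$")

# Hypothetical stricter variant: "white" must be the entire last segment.
LAST_SEGMENT_WHITE = re.compile(r"^http[^\s]+/white$")

urls = [
    "http://mysite.com/white",
    "http://mysite.com/white__blue",
    "http://mysite.com/white/yellow",
    "http://mysite.com/white/",
    "http://mysite.com/offwhite",
]

for url in urls:
    loose = bool(ENDS_WITH_WHITE.match(url))
    strict = bool(LAST_SEGMENT_WHITE.match(url))
    print(f"{url}: loose={loose} strict={strict}")
```

Both patterns accept only the first URL among the question's four examples. They differ on the last one: the loose pattern accepts /offwhite because it only checks the final five characters.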

Regex: matching string with 2 specific characters

I'm working in Google Analytics and trying to use the RegEx advanced filter option to display page names that contain two /, but not three /. The text string within the first section will always be products; however, after the second / it is random.
For example,
I want to include these page name strings:
/products/skis
/products/snowboards
/products/skates
I want to exclude these page name strings:
/products/skis/mens
/products/snowboards/womens
/products/skates/red
Again, the products part is consistent...but the second text section is random.
Appreciate any help -- thanks!
One possibility would be this:
^\/products\/[a-zA-Z]+$
This would match the first slash, followed by 'products', followed by a second slash, and then one or more letters (no digits, hyphens, or other special characters). Nothing else would come after.
To match page names starting with /products/ and not containing a third slash, you can use this regex:
^\/products\/[^\/]+$
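Google Analytics filters can't be run locally, but the pattern itself is easy to sanity-check. Here is a Python sketch that applies it to the page names from the question; the helper name is_second_level_product is made up for illustration.

```python
import re

# /products/ followed by one or more non-slash characters, then end of string.
TWO_SLASHES = re.compile(r"^/products/[^/]+$")

def is_second_level_product(page: str) -> bool:
    """Hypothetical helper: True for /products/<name> with no third slash."""
    return TWO_SLASHES.match(page) is not None

include = ["/products/skis", "/products/snowboards", "/products/skates"]
exclude = ["/products/skis/mens", "/products/snowboards/womens", "/products/skates/red"]

for page in include + exclude:
    print(is_second_level_product(page), page)
```

Unlike the [a-zA-Z]+ version, [^/]+ also accepts names with digits or hyphens, such as /products/ski-boots.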

Perl regex to match only if not followed by both patterns

I am trying to write a pattern match to only match when a string is not followed by both following patterns. Right now I have a pattern that I've tried to manipulate but I can't seem to get it to match correctly.
Current pattern:
/(address|alias|parents|members|notes|host|name)(?!(\t{5}|\S+))/
I am trying to match when a string is not spaced correctly but not if it is part of a larger word.
For example I want it to match,
host \t{4} something
but not,
hostgroup \t{5} something
In the above example it will match hostgroup and end up separating it into 2 separate words "host" and "group"
Match:
notes \t{4} something
but not,
notes_url \t{5} something
Using my pattern it ends up turning into:
notes \t{5} _url
Hopefully that makes a bit more sense.
I'm not at all clear what you want, but word boundaries will probably do what you ask.
Does this work for you?
/\b(address|alias|parents|members|notes|host|name)\b(?!\t{5})/
Update
Having understood your problem better, does this do what you want?
/\b(address|alias|parents|members|notes|host|name)\b(?!\t{5}(?!\t))/
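The nested lookahead is easier to follow with a few inputs. This is a Python sketch rather than Perl (Python's re module accepts the same pattern), and it assumes the \t{4} and \t{5} in the question stand for literal runs of tab characters. The helper name is_misspaced is hypothetical.

```python
import re

# Keyword as a whole word, NOT followed by exactly five tabs.
# The inner (?!\t) rejects a sixth tab, so six or more tabs still count as wrong.
MISSPACED = re.compile(
    r"\b(address|alias|parents|members|notes|host|name)\b(?!\t{5}(?!\t))"
)

def is_misspaced(line: str) -> bool:
    """Hypothetical helper: True if a keyword lacks exactly five tabs after it."""
    return MISSPACED.search(line) is not None

tests = [
    "host" + "\t" * 4 + "something",       # too few tabs: flagged
    "host" + "\t" * 5 + "something",       # correct spacing: not flagged
    "host" + "\t" * 6 + "something",       # too many tabs: flagged
    "hostgroup" + "\t" * 5 + "something",  # part of a larger word: not flagged
    "notes_url" + "\t" * 5 + "something",  # underscore is a word character: not flagged
]

for line in tests:
    print(is_misspaced(line), repr(line))
```

The word boundaries are what keep hostgroup and notes_url from being split: there is no boundary between host and g, or between notes and _.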

Replacing Or Stripping Special Characters With Regex (Smarty)

In a part of my website I have a URL that looks like this:
http://www.webizzi.com/mp3/search.html?q=+Hush+Hush+-+(Avril)++Lavigne's+
I would like a cleaner URL, with every special character stripped except +. I also do not want ++ anywhere, or a + at the beginning or end of the query. The URL should look like the one below:
http://www.webizzi.com/mp3/search.html?q=Hush+Hush+Avril+Lavigne
What I have to process the URL at the moment is:
{$config.siteurl}search.html?q={$tags[row].tag|regex_replace:"/\s+/":"+"|stripslashes}
Assuming your input is everything after ?q=, try:
s/(^\++|\++$|\+\++|[\(\)]+)//g
In that last pair of brackets, you put any other characters you want stripped.
This matches one or more leading +'s, one or more trailing +'s, two or more +'s anywhere, or one or more of the special characters inside the brackets (so far, just parentheses) and replaces each match with an empty string.
I don't know jack about Smarty, but I think you should try something like
{$config.siteurl}search.html?q={$tags[row].tag|regex_replace:"/(^\++|\++$|\+\++|[\(\)]+)/":""|stripslashes}
I'm not quite sure if you need to escape the parentheses here, so if it doesn't work, lose some backslashes.
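One thing to watch, as far as I can tell: deleting runs of two or more + also removes the separator between words, so "(Avril)++Lavigne" would come out as "AvrilLavigne". A step-by-step alternative is to strip the unwanted characters, collapse repeated + into one, then trim the ends. Here is a Python sketch of that logic; the function name clean_query is hypothetical, and the same three steps would still need to be expressed as chained Smarty modifiers.

```python
import re

def clean_query(raw: str) -> str:
    """Hypothetical helper: normalise a '+'-separated search query."""
    # 1. Drop every character that is not a letter, digit, or '+'.
    cleaned = re.sub(r"[^A-Za-z0-9+]", "", raw)
    # 2. Collapse each run of '+' into a single '+'.
    cleaned = re.sub(r"\++", "+", cleaned)
    # 3. Trim any '+' left at the start or end.
    return cleaned.strip("+")

print(clean_query("+Hush+Hush+-+(Avril)++Lavigne's+"))
```

This prints Hush+Hush+Avril+Lavignes. The target URL in the question ends in Lavigne, so dropping the possessive 's would need an extra rule that this sketch does not attempt.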
<a href="{$site_url}tests/tests/view/{$test.test_id}/{strtolower($test.test_name|replace:' ':'-'|regex_replace:'/(^\++|\++$|\+\++|[()]+)/':'')}">Test</a>