Replacing Or Stripping Special Characters With Regex (Smarty) - regex

In one part of my website I have a URL that looks like this:
http://www.webizzi.com/mp3/search.html?q=+Hush+Hush+-+(Avril)++Lavigne's+
I would like a cleaner URL, with every special character stripped except +. I also do not want ++ anywhere, or a + at the beginning or end of the query. The URL should look like the one below:
http://www.webizzi.com/mp3/search.html?q=Hush+Hush+Avril+Lavigne
What I currently use to build the URL is:
{$config.siteurl}search.html?q={$tags[row].tag|regex_replace:"/\s+/":"+"|stripslashes}

Assuming your input is everything after ?q=
s/(^\++|\++$|\+\++|[\(\)]+)//g
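In that last pair of brackets, you put any other characters you want stripped.
This matches one or more leading +'s, one or more trailing +'s, two or more +'s anywhere, or one or more of the special characters inside the brackets (so far, just parentheses), and replaces each match with an empty string. Be aware that because a run of two or more +'s is replaced with nothing, the words on either side of it are joined together; if you want to keep a single + as the separator, collapse those runs in a separate replacement (\++ replaced with +).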
I don't know jack about Smarty, but I think you should try something like
{$config.siteurl}search.html?q={$tags[row].tag|regex_replace:"/(^\++|\++$|\+\++|[\(\)]+)/":""|stripslashes}
I'm not quite sure if you need to escape the parentheses here, so if it doesn't work, lose some backslashes.
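The same cleanup can be sketched outside Smarty as three small steps, which may be easier to reason about than one combined pattern. This is a minimal Python sketch, not Smarty code; the function name clean_query is made up for illustration, and it assumes that only ASCII letters, digits and + should survive.

```python
import re


def clean_query(raw_query: str) -> str:
    """Strip special characters and tidy the '+' separators (hypothetical helper)."""
    # 1. Drop everything except ASCII letters, digits, and '+'.
    kept = re.sub(r"[^A-Za-z0-9+]", "", raw_query)
    # 2. Collapse each run of '+' to a single '+', so words stay separated.
    collapsed = re.sub(r"\++", "+", kept)
    # 3. Remove any '+' left at the start or end.
    return collapsed.strip("+")


print(clean_query("+Hush+Hush+-+(Avril)++Lavigne's+"))
# prints Hush+Hush+Avril+Lavignes
```

Note that the apostrophe is simply removed, so the result ends in Lavignes rather than the Lavigne shown in the question; dropping the possessive entirely would need an extra rule.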

href="{$site_url}tests/tests/view/{$test.test_id}/{strtolower($test.test_name|replace:' ':'-'|regex_replace:'/(^\++|\++$|\+\++|[\(\)]+)/':'')}">Test

Related

Check if URL is in a sentence with regex

I need to check if a URL is in a sentence.
Some text. This is good.
https://stackoverflow.com
More text
More text https://stackoverflow.com. More text. This is bad
I can find the URLs after some research, but I'm stuck on finding them in sentences.
https://regex101.com/r/AmuFIX/5
((http|ftp|https):\/\/)?[-a-zA-Z0-9#:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9#:%_\+.~#?&//=]*)[\r\n]
Based on the comments, it sounds like you're looking for cases where a URL is mixed with other text on a line, not necessarily a sentence. For that, I would use something like this:
.+\b((http|ftp|https):\/\/)[-a-zA-Z0-9#:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9#:%_\+.~#?&//=]*)\b.+
This changes your query by asserting that there must be some characters, followed by a word boundary, followed by a URL, followed by a word boundary, followed by some other characters. This won't match a URL at the start or end of a line that also has other content; for that you'd likely need to do two separate matches: one for a URL with something before it, one for a URL with something after it.
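To see how that pattern behaves on the sample lines from the question, here is a small Python sketch. The pattern is copied from the answer as written; the names MIXED_URL_LINE and has_url_inside_text are made up for illustration.

```python
import re

# Pattern taken verbatim from the answer; the constant name is hypothetical.
MIXED_URL_LINE = re.compile(
    r".+\b((http|ftp|https):\/\/)[-a-zA-Z0-9#:%._\+~#=]{2,256}"
    r"\.[a-z]{2,6}\b([-a-zA-Z0-9#:%_\+.~#?&//=]*)\b.+"
)


def has_url_inside_text(line: str) -> bool:
    """Return True when a URL has other text both before and after it."""
    return MIXED_URL_LINE.search(line) is not None


lines = [
    "Some text. This is good.",
    "https://stackoverflow.com",
    "More text",
    "More text https://stackoverflow.com. More text. This is bad",
]
for line in lines:
    print(has_url_inside_text(line), "|", line)
```

Only the last line is reported as True; the line holding the bare URL is not, because the leading .+ needs at least one character before the URL.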

URL regex that skips ending periods

I'm trying to create a regex that matches url strings within normal text. I have this:
http[s]?://[^\s]+
This seems to work well with the exception that if the url is at the end of a sentence it will grab the period as well. For example for this string:
I am typing some text with the url http://something.com/some-thing?args=someargs. This is another sentence.
it matches:
http://something.com/some-thing?args=someargs.
I would like it to match:
http://something.com/some-thing?args=someargs
Obviously I can't exclude periods because they are in the url previously but I can't figure out how to tell it to exclude the last period if there is one. I could potentially use a negative lookahead for end of line or whitespace, but if it's in the middle of the line (without a period after it) that would leave off the last character of the url.
Most of the ones I have seen online have the same issue that they match the ending dot so maybe it's not possible? I know basic regex but certainly not a genius with it so if someone has a solution I would be very grateful :).
Also, I can do some post-process in this case to remove the dot if I need to, just seems like there should be a Regex solution...
Try this one, which requires the match to end on a character that is neither whitespace nor a period:
http[s]?://[^\s]+[^\s.]
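A quick way to check this idea is to run it on the sentence from the question. The sketch below is in Python and uses [^\s.] as the final character class, on the assumption that the match should end on a character that is neither whitespace (including a newline) nor a period; the names are made up for illustration.

```python
import re

# The last character of the match must be neither whitespace nor a period.
URL_NO_TRAILING_DOT = re.compile(r"https?://[^\s]+[^\s.]")


def find_urls(text: str) -> list[str]:
    """Return every URL-like match in text (hypothetical helper)."""
    return URL_NO_TRAILING_DOT.findall(text)


text = (
    "I am typing some text with the url "
    "http://something.com/some-thing?args=someargs. This is another sentence."
)
print(find_urls(text))
# prints ['http://something.com/some-thing?args=someargs']
```

The greedy [^\s]+ first swallows the trailing period, then backtracks until the final class can match, so a URL in the middle of a line keeps its last character.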

Regex Expressions, URL and end part

I'm trying to write a regex to detect a URL with a dynamic ending in a message. For example, it would be something like this:
"http://localhost/something/randomstring example text example text example text"
So the "http://localhost/something/" will always be the same but the random string part won't, and I want to grab "http://localhost/something/randomstring" only...
I've tried doing this expression
"/http://localhost/something/(.*) "
The thing is, it selects the whole text. I've tried looking up online but can't find anything. Would love some help :)
The .* will keep 'eating up' characters. You probably want something like
/http:\/\/localhost\/something\/([^\s]*)/
to make it 'stop' at a white-space character. Or
/http:\/\/localhost\/something\/([a-z0-9]*)/
if you are sure that randomstring only contains lowercase letters and digits.
Example: https://regex101.com/r/U12o53/1
You need to modify the (.*) part of the url so it only contains valid url characters, e.g.
/http:\/\/localhost\/something\/([\d\w\-_]*)/
You can modify it as you need based on the characters that can be in randomstring.
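The difference between the greedy (.*) and a restricted character class is easy to see side by side. This is a small Python sketch on the message from the question; the variable names are made up for illustration.

```python
import re

message = (
    "http://localhost/something/randomstring "
    "example text example text example text"
)

# Greedy: .* runs to the end of the line, then backs up to the LAST space.
greedy = re.search(r"http://localhost/something/(.*) ", message)

# Bounded: [^\s]* stops at the first whitespace character.
bounded = re.search(r"http://localhost/something/([^\s]*)", message)

print(repr(greedy.group(1)))
print(bounded.group(0))
# second line prints http://localhost/something/randomstring
```

The greedy capture still contains most of the trailing text, while the bounded one ends exactly at randomstring.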

Unable to match csrf-token in Lua

Gone around the houses on this one tonight. All I want to do is pull out the csrf-token in the following script, but it returns nil:
local html = '<meta content="authenticity_token" name="csrf-param" /><meta content="ndcZ+Vp8MuM/hF6LizdrvJqgcRh22zF8w/DnIX0DvR0=" name="csrf-token" />'
local csrf_token = string.match(html, 'content="(.*)" name="csrf-token"')
If I modify the script and take off the "-token" part it matches something, but not the right thing of course.
I know it is the hyphen because if I change the string to "csrftoken", the match works as expected.
I attempted to escape the - like so \- but that threw an error...
Help...
There are two problems:
The - does need to be escaped, but Lua uses % instead of \.
Further, the reason you get something odd is that . can match anything, including across tags (or attributes), and it tries to take as much as possible (since the engine will return the left-most possible match, ungreedy quantifiers wouldn't help either). What you should do is restrict the allowed characters so that the capture cannot go outside the attribute quotes, for example with [^"] (any character except a quote).
Taking all of that together:
local csrf_token = string.match(html, 'content="([^"]*)" name="csrf%-token"')
In any case, you shouldn't actually be matching HTML with regular expressions.
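For comparison, here is what the parser-based route could look like. This is a minimal sketch in Python rather than Lua, using the standard library's html.parser; the class MetaContentFinder and the function find_meta_content are made-up names, and the sketch assumes the token always lives in a meta tag with a name and a content attribute.

```python
from html.parser import HTMLParser


class MetaContentFinder(HTMLParser):
    """Remember the content attribute of the <meta> tag with a given name."""

    def __init__(self, wanted_name: str) -> None:
        super().__init__()
        self.wanted_name = wanted_name
        self.content = None

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if attr_map.get("name") == self.wanted_name:
            self.content = attr_map.get("content")


def find_meta_content(html: str, name: str):
    finder = MetaContentFinder(name)
    finder.feed(html)
    finder.close()
    return finder.content


html = (
    '<meta content="authenticity_token" name="csrf-param" />'
    '<meta content="ndcZ+Vp8MuM/hF6LizdrvJqgcRh22zF8w/DnIX0DvR0=" name="csrf-token" />'
)
print(find_meta_content(html, "csrf-token"))
```

Because the parser hands over attributes as name/value pairs, neither attribute order nor the hyphen in the name needs any special handling.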
name="csrf-token'"
You have an extra apostrophe at the end of this line.
I would also escape " = and the hyphen, though this may not be necessary for all these characters.

How to capture text between two markers?

For clarity, I have created this:
http://rubular.com/r/ejYgKSufD4
My strings:
http://blablalba.com/foo/bar_soap/foo/dir2
http://blablalba.com/foo/bar_soap/dir
http://blablalba.com/foo/bar_soap
My Regular expression:
\/foo\/(.*)
This returns:
/foo/bar_soap/dir/dir2
/foo/bar_soap/dir
/foo/bar_soap
But I only want
/foo/bar_soap
Any ideas how I can achieve this? As illustrated above, I want everything after foo up until the first forward slash.
Thanks in advance.
Edit. I only want the text after foo up to the next forward slash. Some directories may also be named foo, and this would produce incorrect results. Thanks
. will match anything, so you should change it to [^/] (not slash) instead:
\/foo\/([^\/]*)
Some of the other answers use + instead of *. That might be correct depending on what you want to do. Using + forces the regex to match at least one non-slash character, so this URL would not match since there isn't a trailing character after the slash:
http://blablalba.com/foo/
Using * instead would allow that to match since it matches "zero or more" non-slash characters. So, whether you should use + or * depends on what matches you want to allow.
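The + versus * distinction can be checked directly. A small Python sketch follows; the names STAR, PLUS and first_segment are made up for illustration.

```python
import re

STAR = re.compile(r"/foo/([^/]*)")  # zero or more non-slash characters
PLUS = re.compile(r"/foo/([^/]+)")  # at least one non-slash character


def first_segment(pattern, url):
    """Return the captured segment after /foo/, or None if there is no match."""
    match = pattern.search(url)
    return match.group(1) if match else None


print(repr(first_segment(STAR, "http://blablalba.com/foo/")))  # prints ''
print(repr(first_segment(PLUS, "http://blablalba.com/foo/")))  # prints None
print(first_segment(PLUS, "http://blablalba.com/foo/bar_soap/foo/dir2"))
# prints bar_soap
```

With *, the URL ending in /foo/ still matches but captures an empty string; with +, it does not match at all.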
Update
If you want to filter out query strings too, you could also filter against ?, which must come at the front of all query strings. (I think the examples you posted below are actually missing the leading ?):
\/foo\/([^?\/]*)
However, rather than rolling your own solution, it might be better to just use split from the URI module. You could use URI::split to get the path part of the URL, then use String#split to split it up by /, and grab the first one. This would handle all the weird cases for URLs. One that you probably haven't thought of yet is a URL with a specified fragment, e.g.:
http://blablalba.com/foo#bar
You would need to add # to your filtered-character class to handle those as well.
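The parse-then-split idea is not specific to Ruby. As a rough analogue, here is a Python sketch using urllib.parse.urlsplit from the standard library, which separates the path from the query string and fragment before any splitting happens. The function name segment_after and its marker parameter are made up for illustration.

```python
from urllib.parse import urlsplit


def segment_after(url: str, marker: str = "foo"):
    """Return the path segment right after the first `marker` segment, or None."""
    # urlsplit() keeps the query string and fragment out of .path.
    segments = [part for part in urlsplit(url).path.split("/") if part]
    try:
        index = segments.index(marker)
    except ValueError:
        return None  # the marker is not a path segment at all
    if index + 1 < len(segments):
        return segments[index + 1]
    return None  # nothing follows the marker


print(segment_after("http://blablalba.com/foo/bar_soap/foo/dir2"))  # prints bar_soap
print(segment_after("http://blablalba.com/foo/bar_soap?x=1"))       # prints bar_soap
print(segment_after("http://blablalba.com/foo#bar"))                # prints None
```

Since the fragment and query are removed by the URL parser, no extra characters have to be added to a character class to handle them.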
You can try this regular expression
/\/foo\/([^\/]+)/
\/foo\/([^\/]+)
[^\/]+ gives you a series of characters that are not a forward slash.
The parentheses cause the regex engine to store the matched contents in a group ([^\/]+), so you can get bar_soap out of the entire match of /foo/bar_soap
For example, in javascript you would get the matched group as follows:
// Capture the first path segment after /foo/
const regexp = /\/foo\/([^\/]+)/;
const match = regexp.exec("/foo/bar_soap/dir");
console.log(match[1]); // prints bar_soap