Regex Expressions, URL and end part - regex

I'm trying to write a regex to detect a URL with a dynamic ending in a message. For example, it would be something like this:
"http://localhost/something/randomstring example text example text example text"
So the "http://localhost/something/" part will always be the same but the random string part won't, and I want to grab only "http://localhost/something/randomstring".
I've tried doing this expression
"/http://localhost/something/(.*) "
The thing is, it selects the whole text. I've tried looking up online but can't find anything. Would love some help :)

The .* is greedy, so it will keep 'eating up' characters as far as it can. You probably want something like
/http:\/\/localhost\/something\/([^\s]*)/
to make it 'stop' at a whitespace character. Or
/http:\/\/localhost\/something\/([a-z0-9]*)/
if you are sure that randomstring only contains lowercase letters and digits.
Example: https://regex101.com/r/U12o53/1

You need to modify the (.*) part of the pattern so it only matches valid URL characters, e.g.
/http:\/\/localhost\/something\/([\d\w\-_]*)/
You can modify it as you need based on the characters that can be in randomstring. (\w already covers digits and the underscore, so [\w-] would match the same set.)
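To see the difference between the greedy and the bounded version, here is a minimal Python sketch. It uses the sample message from the question; Python's re module is my choice for illustration only, the question does not say which language is in use.

```python
import re

message = "http://localhost/something/randomstring example text example text example text"

# The original attempt: (.*) is greedy, so it runs as far as it can and only
# backtracks to the last space in the message.
greedy = re.search(r"http://localhost/something/(.*) ", message)

# Bounded version: \S* (same as [^\s]*) stops at the first whitespace character.
bounded = re.search(r"http://localhost/something/(\S*)", message)

print(greedy.group(0))   # almost the whole message
print(bounded.group(0))  # just the URL
print(bounded.group(1))  # just the dynamic part
```

The only change is the character set inside the group; the fixed prefix stays the same.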

Related

How to reverse regex to not match

I have a regular expression that selects URLs. I want it to select only plain words (like admin, hello) and not the URLs. How can I make it skip the URLs?
Regex
((.*?\w+|\W):\/\/[\w\-\.]+.*?\/*.*?\w\W+.*\/.*?\w\W+.*?\/{0,})
Text
htt$ps://b24-56kck1.$bitr%ix24.kz/com#pany/pe#rsonal/us^&er/19/k/roce/
https://1.tesssst1.ru/ororo
admin
hello
##$#$$#w_svccx354V2346Vf
SendAjaxFilterToServer(quiz_questions);
Alex, it is very hard to invert a regular expression, so you want to think in terms of the attributes of what you want to match. One thing that jumps out to me is you just want the line to contain letters. For that, you can use ^[a-zA-Z]+$
Another approach is to build an inverted list of characters: the ones you don't want present. This can be harder, but for the simple example input you give, you don't want ":", "/" or "#" to be in the line. That would be ^[^:/#]+$.
These are examples of how you need to think about the problem.
Try this, then trim the surrounding whitespace (because of lack of support for lookaround in Go):
(^|[\n\s])[a-zA-Z]+([\n\s]|$)
https://regex101.com/r/MqyDWC/3
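A quick way to compare the two line-based ideas (allow-list of letters versus deny-list of URL characters) is to run both over the sample text. This is only a sketch in Python, not Go; the one change I made is adding \n to the negated class, because otherwise a negated class also matches newlines and one match can span several lines.

```python
import re

text = "\n".join([
    "htt$ps://b24-56kck1.$bitr%ix24.kz/com#pany/pe#rsonal/us^&er/19/k/roce/",
    "https://1.tesssst1.ru/ororo",
    "admin",
    "hello",
    "##$#$$#w_svccx354V2346Vf",
    "SendAjaxFilterToServer(quiz_questions);",
])

# Allow-list: the whole line must consist of ASCII letters.
letters_only = re.findall(r"^[a-zA-Z]+$", text, flags=re.MULTILINE)

# Deny-list: the line must not contain ':', '/' or '#'.
# '\n' is excluded too so that a match cannot run across line breaks.
no_url_chars = re.findall(r"^[^:/#\n]+$", text, flags=re.MULTILINE)

print(letters_only)
print(no_url_chars)
```

The deny-list version also lets the SendAjaxFilterToServer line through, which shows why the allow-list is the tighter fit for this input.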

I need a RegExp that's special

I need a custom RegExp. In a big text I want to remove every a href tag that points to a certain URL. The catch is that those URLs are generated by a server and end with an extra segment made of upper-case letters, lower-case letters and digits.
So I would like Notepad++ to search for, and replace with nothing, every string of the form a href+http://www.gymglish.com/workbook/show-lesson/+extrastring like xwSzAdM45jL6+</a>
With http://www.gymglish.com/workbook/show-lesson/[a-zA-Z0-9/] Notepad++ finds the string but only replaces up to the first character of the extra segment (e.g. xDghdS5jkA becomes DghdS5jkA).
My simple reasoning was: if it replaces up to the first character, I must repeat the character class for the next 14 characters, thus
http://www.gymglish.com/workbook/show-lesson/[a-zA-Z0-9\/][a-zA-Z0-9\/][a-zA-Z0-9\/][a-zA-Z0-9\/][a-zA-Z0-9\/][a-zA-Z0-9\/][a-zA-Z0-9\/][a-zA-Z0-9\/][a-zA-Z0-9\/][a-zA-Z0-9\/]/[a-zA-Z0-9\/][a-zA-Z0-9\/][a-zA-Z0-9\/][a-zA-Z0-9\/]>*</[a|A]>
:-) however that's a dumb regexp.
This should do the trick: (edited to use the new URL)
<[aA] (href|HREF)=['"]http:\/\/www\.gymglish\.com\/workbook\/show-lesson[\/a-zA-Z0-9]*['"]>[a-zA-Z0-9 ]*<\/[aA]>
Debuggex Demo
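For anyone who wants to try the same idea outside Notepad++, here is a rough Python sketch. The sample HTML string is made up for illustration, and I loosened the link text to [^<]* so it is not limited to letters, digits and spaces; treat both as assumptions rather than part of the answer above.

```python
import re

# A quantifier (+) after the character class consumes the whole generated
# segment, instead of repeating the class once per character.
LESSON_LINK = re.compile(
    r"""<a\s+href=["']http://www\.gymglish\.com/workbook/show-lesson/[A-Za-z0-9/]+["']>[^<]*</a>""",
    re.IGNORECASE,
)

# Hypothetical input: one lesson link to remove, one unrelated link to keep.
html = (
    'Intro <a href="http://www.gymglish.com/workbook/show-lesson/xwSzAdM45jL6">Lesson 1</a>'
    ' outro <a href="http://example.com/">keep</a>'
)

cleaned = LESSON_LINK.sub("", html)
print(cleaned)
```

The point is the + (or *) after the class: that is what replaces the fourteen hand-repeated copies in the question.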

Replacing Or Stripping Special Characters With Regex (Smarty)

In one part of my website I have a URL that looks like this:
http://www.webizzi.com/mp3/search.html?q=+Hush+Hush+-+(Avril)++Lavigne's+
I would like a cleaner URL: strip every special character that appears in it except +, but without leaving ++ anywhere or a + at the beginning or end of the query. The URL should look like the one below:
http://www.webizzi.com/mp3/search.html?q=Hush+Hush+Avril+Lavigne
What I use to build the URL at the moment is:
{$config.siteurl}search.html?q={$tags[row].tag|regex_replace:"/\s+/":"+"|stripslashes}
Assuming your input is everything after ?q=
s/(^\++|\++$|\+\++|[\(\)]+)//g
In those last pair of brackets, you put any other characters you want stripped.
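This matches one or more opening +'s, one or more closing +'s, two or more +'s anywhere, or one or more of the special characters inside the brackets (so far, just parentheses) and replaces each match with nothing, i.e. an empty string. Note that a run of two or more +'s is removed outright rather than collapsed to a single +, so the words on either side end up joined.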
I don't know jack about Smarty, but I think you should try something like
{$config.siteurl}search.html?q={$tags[row].tag|regex_replace:"/(^\++|\++$|\+\++|[\(\)]+)/":""|stripslashes}
I'm not quite sure if you need to escape the parentheses here, so if it doesn't work, lose some backslashes.
href="{$site_url}tests/tests/view/{$test.test_id}/{strtolower($test.test_name|replace:' ':'-'|regex_replace:'/(^\++|\++$|\+\++|[\(\)]+)/':'')}">Test
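If it helps to reason about the cleanup separately from Smarty, here is a small Python sketch. Note that it is a different, multi-pass approach from the single regex above (strip, then collapse, then trim), and clean_query is a hypothetical helper name, not something from Smarty or the site in question.

```python
import re

def clean_query(q: str) -> str:
    """Hypothetical helper: tidy the part of the URL after ?q=."""
    # 1. Drop everything except ASCII letters, digits and '+'.
    q = re.sub(r"[^A-Za-z0-9+]", "", q)
    # 2. Collapse any run of '+' into a single '+', so words stay separated.
    q = re.sub(r"\++", "+", q)
    # 3. Remove a leading or trailing '+'.
    return q.strip("+")

print(clean_query("+Hush+Hush+-+(Avril)++Lavigne's+"))
```

With the sample input this gives Hush+Hush+Avril+Lavignes. The apostrophe is stripped but the trailing s stays, so getting exactly "Lavigne" as in the question would need an extra rule for possessives.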

replacing all open tags with a string

Before somebody points me to that question, I know that one can't parse html with regex :) And this is not what I am trying to do.
What I need is:
Input: a string containing html.
Output: replace all opening tags
***<tag>
So if I get
<a><b><c></a></b></c>, I want
***<a>***<b>***<c></a></b></c>
as output.
I've tried something like:
(<[~/].+>)
and replace it with
***$1
But it doesn't really seem to work the way I want it to. Any pointers?
Clarification: it's guaranteed that there are no self closing tags nor comments in the input.
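You just have two problems: ^ is the character that negates a character class, not ~; and the .+ is greedy, so it will match as many characters as possible before the final >. Make the quantifier lazy, and use * rather than + so that one-letter tags such as <a> still match on their own. Change it to:
(<[^/].*?>)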
You can also probably drop the parentheses and replace with $0 or $&, depending on the language.
Try using: (<[^/].*?>) and replace it with ***$1
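Here is a short Python check of the greedy and lazy variants against the input from the question. Python is just my choice for the demo (the question does not name a language), and in Python the backreference in the replacement is written \1 rather than $1.

```python
import re

html = "<a><b><c></a></b></c>"

# Greedy: .+ runs to the last '>' in the input, so there is a single match.
greedy = re.sub(r"(<[^/].+>)", r"***\1", html)

# Lazy: .*? stops at the first '>', so each opening tag matches separately.
# [^/] right after '<' is what skips closing tags such as </a>.
lazy = re.sub(r"(<[^/].*?>)", r"***\1", html)

print(greedy)
print(lazy)
```

This relies on the guarantee from the question that there are no comments or self-closing tags in the input.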

Regex matching in ColdFusion OR condition

I am attempting to write a CF component that will parse wikiCreole text. I am having trouble getting the correct matches with some of my regular expressions, though. I feel like if I can just get my head around the first one, the rest will click. Here is an example:
The following is sample input:
You can make things **bold** or //italic// or **//both//** or //**both**//.
Character formatting extends across line breaks: **bold,
this is still bold. This line deliberately does not end in star-star.
Not bold. Character formatting does not cross paragraph boundaries.
My first attempt was:
<cfset out = REreplace(out, "\*\*(.*?)\*\*", "<strong>\1</strong>", "all") />
Then I realized that it would not match where the ** is not given, and it should end where there are two carriage returns.
So I tried this:
<cfset out = REreplace(out, "\*\*(.*?)[(\*\*)|(\r\n\r\n)]", "<strong>\1</strong>", "all") />
and it is close but for some reason it gives you this:
You can make things <strong>bold</strong>* or //italic// or <strong>//both//</strong>* or //<strong>both</strong>*//.
Character formatting extends across line breaks: <strong>bold,</strong>
this is still bold. This line deliberately does not end in star-star.
Not bold. Character formatting does not cross paragraph boundaries.
Any ideas?
PS: If anyone has any suggestions for better tags, or a better title for this post I am all ears.
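The [...] represents a character class, so this:
[(\*\*)|(\r\n\r\n)]
Is effectively the same as this:
[()*|\r\n]
i.e. it matches a single character (for example one "*"), the parentheses do not group, and the "|" is a literal pipe rather than an alternation. That is why one star is left behind after each </strong>.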
Another problem is that you replace the double linefeed. Even if your match succeeded you would end up merging paragraphs. You need to either restore it or not consume it in the first place. I'd use a positive lookahead to do the latter.
In Perl I'd write it this way:
$string =~ s/\*\*(.*?)(?:\*\*|(?=\n\n))/<strong>$1<\/strong>/sg;
Taking a wild guess, the ColdFusion probably looks like this:
REreplace(out, "\*\*(.*?)(?:\*\*|(?=\r\n\r\n))", "<strong>\1</strong>", "all")
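To check the behaviour without a ColdFusion server, here is the same idea as a Python sketch. The sample text is a shortened version of the input in the question and uses \n\n as the paragraph break; if your text has \r\n line endings, use \r\n\r\n in the lookahead as in the ColdFusion guess above.

```python
import re

text = (
    "You can make things **bold** or //italic//.\n"
    "Formatting extends across line breaks: **bold,\n"
    "this is still bold.\n"
    "\n"
    "Not bold."
)

# Character class version from the question: the class matches ONE character,
# so only the first closing star is consumed and the second is left behind.
broken = re.sub(r"\*\*(.*?)[(\*\*)|(\r\n\r\n)]", r"<strong>\1</strong>", "**bold**")

# Alternation version: end at '**' or just before a blank line. The lookahead
# does not consume the blank line, so paragraphs are not merged.
# re.DOTALL lets '.' cross line breaks (the /s flag in the Perl version).
fixed = re.compile(r"\*\*(.*?)(?:\*\*|(?=\n\n))", re.DOTALL)
html = fixed.sub(r"<strong>\1</strong>", text)

print(broken)
print(html)
```

The first result reproduces the stray star from the question, and the second closes the unterminated bold run at the end of its paragraph.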
You really should change your
(.*?)
to something like
[^*]*?
to match any character except the *. I don't know if that is the problem, but it could be that the any-character . is eating one of your stars. It is also a generally accepted "best practice", when trying to balance matching characters like the double star or HTML start/end tags, to explicitly exclude them from your match set for the inner text.
*Disclaimer, I didn't test this in ColdFusion for the nuances of the regex engine - but the idea should hold true.
I know this is an older question but in response to where Ryan Guill said "I tried the $1 but it put a literal $1 in there instead of the match" for ColdFusion you should use \1 instead of $1
I always use a regex web page. It seems like I start from scratch every time I use regex.
Try using '$1' instead of \1 for this one - the replace is slightly different... but I think the pattern is what you need to get working.
Getting closer with this:
\*\*(.*?)\*\*|//(.*?)//
The tricky part is the //** or **//
OK, first checking for **//bold//**, then //**bold**//, then **bold**, then //bold//:
\*\*//(.*?)//\*\*|//\*\*(.*?)\*\*//|\*\*(.*?)\*\*|//(.*?)//
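The ordering matters because alternatives are tried left to right, so the combined markers have to come before the single ones. A rough Python sketch of that idea follows; the TEMPLATES table and the to_html helper are made-up names for illustration, and the stars are escaped because a bare * is a quantifier.

```python
import re

# Alternatives are tried left to right: combined markers first, single ones last.
MARKUP = re.compile(
    r"\*\*//(.*?)//\*\*"    # group 1: **//both//**
    r"|//\*\*(.*?)\*\*//"   # group 2: //**both**//
    r"|\*\*(.*?)\*\*"       # group 3: **bold**
    r"|//(.*?)//"           # group 4: //italic//
)

# Hypothetical mapping from the group that matched to its HTML wrapper.
TEMPLATES = {
    1: "<strong><em>{}</em></strong>",
    2: "<em><strong>{}</strong></em>",
    3: "<strong>{}</strong>",
    4: "<em>{}</em>",
}

def to_html(match):
    # Only one group takes part in any match, so lastindex identifies it.
    group = match.lastindex
    return TEMPLATES[group].format(match.group(group))

line = "You can make things **bold** or //italic// or **//both//** or //**both**//."
print(MARKUP.sub(to_html, line))
```

This only handles markers that open and close on the same line; it does not cover the paragraph-boundary rule discussed earlier, and a // inside a URL would also be treated as an italic marker.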
I find this app immensely helpful when I'm doing anything with regex:
http://www.gskinner.com/RegExr/desktop/
Still doesn't help with your actual issue, but could be useful going forward.