Django - Accepting full sentence in query parameters - regex

I'm putting together a small API that just converts the message coming in to Spongebob mockcase.
I've got everything working, but I had only been testing with a single word, and I just noticed that the following URL entry cannot accept spaces/%20.
url(r'^mock/(?P<message>\w+)/$', mock, name='mock'),
I've looked all over, but I'm not sure how to phrase what I'm looking for well enough to find anything useful. What should I be looking for to accept a full sentence?
Worth noting: this will be coming from a chat message, so it will be sent as is, not percent-encoded.

You don't really want to put things like that as URL parameters. Instead it should be in the querystring: for example mysite.com/mock/?message=Message+goes+here.
The URL should just be:
url('^mock/$', ...)
and the view then just gets the data from request.GET['message'].
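A minimal sketch of how a full sentence travels in the querystring, using only Python's standard library rather than Django itself. The host name and message text are illustrative; in a Django view, request.GET['message'] would hand back the same decoded string.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Client side: encode the raw chat message into a querystring.
message = "Message goes here"
url = "http://mysite.com/mock/?" + urlencode({"message": message})

# Server side: pull the querystring out of the URL and decode it again.
query = urlsplit(url).query
decoded = parse_qs(query)["message"][0]

print(url)      # spaces are sent as "+"
print(decoded)  # the original sentence, spaces restored
```

Because the sentence lives in the querystring, the URL pattern itself never has to cope with spaces.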

You may fix your immediate issue using
r'^mock/(?P<message>[^/]+)/?$'
Here, [^/]+ matches any one or more chars other than / and the /? matches an optional / at the end of the string ($).
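As a quick check outside Django, the two patterns can be compared with Python's re module. This is only a sketch: the helper name extract_message and the sample paths are made up for illustration.

```python
import re

# The pattern from the question: \w+ stops at the first space.
ORIGINAL = re.compile(r'^mock/(?P<message>\w+)/$')
# The suggested pattern: anything except "/" counts as part of the message.
SUGGESTED = re.compile(r'^mock/(?P<message>[^/]+)/?$')

def extract_message(pattern, path):
    """Return the captured message, or None if the path does not match."""
    m = pattern.match(path)
    return m.group('message') if m else None

print(extract_message(ORIGINAL, 'mock/hello world/'))   # no match
print(extract_message(SUGGESTED, 'mock/hello world/'))  # captures the sentence
```

Note that [^/]+ still rejects messages that themselves contain a slash.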

Related

Reliably match an url inside a line

I'm having some trouble figuring out what I thought to be pretty simple regex. I'm trying to make a Twitter bot in Python that tweets quotes from some author.
I need it to:
read a quote and a URL from a file
parse the quote and the URL apart so that it can add quote marks around the quote part and use the URL part to determine which book the quote is from and add the relevant book cover
I also need to get the URL apart to calculate the tweet length after Twitter has shortened the URL
One last thing: some quotes might not have a URL. I need it to identify that and add a random picture as a fallback.
After some trial and error, I came up with this regex that seemed to do the job when I tested it: r'(?P<quote>.*)(?P<link>https.*)?'
Since I don't need to validate the URL, I don't think I need a complicated regex like the ones I came across in my research.
But when I tried to fire up the bot, I realized it doesn't parse the quote correctly. Instead it catches the whole line as "quote" and fails to identify the URL.
What puzzles me is that it doesn't fail consistently: sometimes it works, and sometimes it doesn't.
Here is an example of what I'm trying to do that fails unreliably: https://regex101.com/r/mODPUq/1/
Here is the whole function I've written:
import re
def parseText(text):
    # Separate the quote from the link
    tweet = {}
    regex = r'(?P<quote>.*)?(?P<link>https.*)?'
    m = re.search(regex, text)
    # Groups that did not take part in the match default to ""
    tweet = m.groupdict("")
    return tweet
[EDIT] OK, I didn't quite solve the problem this way, but I found a workaround that might not be very elegant but at least seems to do the job:
I have 2 separate functions, one to get the URL, the other to split the URL out of the line and return the quote alone.
I first call getUrl(), and only if it returns something other than None do I call getQuote(). If url is None, I can tweet the whole line directly.
This way the regex part became very straightforward, and it seems to work so far with or without a URL. I just have one minor issue: when there's no URL, the newline character must still be there even though I use str.split('/n') to cut it out, because when I add quote marks the last one ends up on a new line.
I'm leaving the issue open for now since technically it's not resolved. Thanks to those who gave me answers, but they don't seem to work.
You can also change the regex string to r'(?P<quote>.*)?.(?P<link>https.*)', which also takes care of any extra characters between the quote and the link.
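For comparison, here is a hedged sketch of a different approach that keeps the link optional: make the quote group lazy and anchor the pattern to the whole line. The pattern and the name parse_text are illustrative, not taken from the thread. It assumes one quote per line, with the URL (if any) at the end and no whitespace inside it.

```python
import re

# Pattern from the question: the greedy quote group swallows the whole line,
# and because the link group is optional the engine never needs to backtrack.
GREEDY = re.compile(r'(?P<quote>.*)?(?P<link>https.*)?')

# Illustrative alternative: lazy quote, optional link, anchored to the line end.
LAZY = re.compile(r'^(?P<quote>.*?)\s*(?P<link>https\S*)?$')

def parse_text(text):
    """Split a single line into quote and link; link is "" when absent."""
    m = LAZY.search(text)
    return m.groupdict("") if m else {"quote": "", "link": ""}

print(GREEDY.search("To be or not to be https://example.com/hamlet").groupdict(""))
print(parse_text("To be or not to be https://example.com/hamlet"))
print(parse_text("No link here\n"))
```

The \s* before the link and the $ anchor also absorb a trailing newline, so the quote comes back without it.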

Retrieve characters after nth occurrence of an another with Regex

I'm writing a simple bot that broadcasts messages to clients based on messages from a server. This will be done in JavaScript but I am trying to understand Regex. I've been Googling for the past hour and I've come so close but I am simply unable to solve this one.
Basically I need to retrieve everything between the second / and the first [. It sounds really simple but I cannot figure out how to do this.
Here's some sample code:
192.168.1.1:33291/76561198014386231/testName joined [linux/76561198014386231]
Here's the Regex I've come up with:
\/(.*?)\[
I've found lots of similar questions here on StackOverflow but most of them seem specific to a particular language or end up being too complex and I'm unable to whittle down the query.
I know this is a simple one, but I am totally stumped.
Instead of .*?, you could match everything but a forward slash by using [^\/]*:
([^\/]*)\s*\[
If it needs to be after the second slash specifically (that is, if the contents between the second slash and the square bracket can themselves contain slashes), then you could do:
(?:.*?\/){2}(.*)\s*\[
Remove the \s* if you want to. I'm just assuming you don't care about that whitespace.
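The question targets JavaScript, but both patterns behave the same way in Python's re module, so here is a small sketch against the sample line. The third pattern, with a lazy (.*?), is an added variation rather than part of the answer. It shows one way to actually drop the whitespace before the bracket, since a greedy group consumes it before \s* gets a chance.

```python
import re

line = "192.168.1.1:33291/76561198014386231/testName joined [linux/76561198014386231]"

# First suggestion: a run of non-slash characters directly before "[".
no_slash = re.search(r'([^\/]*)\s*\[', line)

# Second suggestion: skip two slash-terminated chunks, then capture up to "[".
after_second = re.search(r'(?:.*?\/){2}(.*)\s*\[', line)

# Variation (an assumption, not from the answer): a lazy group lets \s* eat the space.
trimmed = re.search(r'(?:.*?\/){2}(.*?)\s*\[', line)

print(repr(no_slash.group(1)))
print(repr(after_second.group(1)))
print(repr(trimmed.group(1)))
```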

Regex to look for url start value and end value

I'm using regex to look for URLs that start with http or https and contain a specific value.
^http|https\:\/\/www
This regex looks at the http/https in a URL and this works.
/[\/]\bvalue?\b[\/]/g
This regex looks for "value" in a url and this currently matches with
http://www.test.co.uk/value/
http://www.test.co.uk/folder/value/
Is it possible to put those two regexes together? Basically, I need to display URLs that don't contain http/https or /value/ in the URL path.
You're looking to do this: /(?=^(https|http))|(\bvalue\b)/g
First half: (?=^(https|http)), which looks first for https and then for http at the start of the string. In my opinion you can reduce this to look only for http, since any string that starts with https also starts with http. You can try that and see what happens.
Second half: (\bvalue\b). You can be more specific, such as requiring it to sit between slashes, or not. I used the \b word boundary so that it isn't matched as part of a longer word, and it worked quite well.
The important part is to unite them, so use the | operator, which yields the result above.
Test strings:
http://www.helloworldvalue/value/values/
https://www.helloworldvalue/values/svalue/value/value/vaaluevalue/
Try it and let me know if you have any questions in the comments below.
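If a single combined pattern is not a hard requirement, the two checks can also be kept separate and combined with ordinary boolean logic. The sketch below is in Python and assumes "value" must appear as a whole path segment; the function name has_scheme_and_value is made up for illustration.

```python
import re

SCHEME = re.compile(r'^https?://')       # http or https at the very start
VALUE_SEGMENT = re.compile(r'/value/')   # "value" as a complete path segment

def has_scheme_and_value(url):
    """True when the URL starts with http(s) and contains /value/ in its path."""
    return bool(SCHEME.search(url)) and bool(VALUE_SEGMENT.search(url))

urls = [
    "http://www.test.co.uk/value/",
    "http://www.test.co.uk/folder/value/",
    "http://www.helloworldvalue/values/",
    "www.test.co.uk/value/",
]

# URLs that lack the scheme, the /value/ segment, or both.
rejected = [u for u in urls if not has_scheme_and_value(u)]
print(rejected)
```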

Regex to match a URL with parameters but reject URL with subfolder

Short Question: What regex statement will match the parameters on a URL but not match a subfolder? For example match google.com?parameter but not match google.com/subdomain
Long Question: I am re-directing a few URLs on a site.
I want a request to ilovestarwars.com/page2 to re-direct to ilovestarwars.com/forceawakens
I setup this re-direct and it works great most of the time. The problem is when there are URL parameters. For example if someone sends the URL using an email program that tracks links. Then ilovestarwars.com/page2 becomes ilovestarwars.com/page2?parameter=trackingcode123 after they send it which results in a 404 on my site because it is looking for the exact URL.
No problem, I will just use Regex. So I now re-direct using ilovestarwars.com/page2(.*) and it works great accepts all the parameters, no more 404s.
However, trying to future proof my work, I am worried, what happens if someone adds content inside the page2 folder? For example ilovestarwars.com/page2/mistake
They shouldn't, but if they do, it will take them forever to figure out why it is redirecting.
So my question is, how can I create a regex statement that will match the parameters but reject a subfolder?
I tried page2(.*?)/ as is suggested in this answer, but https://www.regex101.com/ says the slash is an unescaped delimiter.
Background info as suggested here, I am using Wordpress and the Redirection plugin. This is the article that goes over the initial redirect I setup.
A direct answer to your question would be something like this: ^/([^?&/\\]*)(.*)$
This assumes the string starts at the first / (if it doesn't, remove the / that follows the ^). In the first capture group you will get the page name (page2, in the case of your example URL), and in the second capture group you will get the remaining part of the URL (everything from the first of these chars onward: ?, &, /, \). If you don't care about the second capture group, use ^/([^?&/\\]*).*$
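A small Python sketch of how that pattern splits the example URLs, together with a stricter variant that accepts a querystring but rejects a subfolder outright. The STRICT pattern is an illustration and an assumption, not something the Redirection plugin is confirmed to need in this exact form.

```python
import re

# The capture pattern from the answer: page name first, the rest second.
CAPTURE = re.compile(r'^/([^?&/\\]*)(.*)$')

# Illustrative stricter pattern: /page2, an optional trailing slash,
# an optional querystring, and nothing else.
STRICT = re.compile(r'^/page2/?(\?.*)?$')

def should_redirect(path):
    """True for /page2 with or without parameters, False for subfolders."""
    return STRICT.match(path) is not None

print(CAPTURE.match('/page2?parameter=trackingcode123').groups())
print(CAPTURE.match('/page2/mistake').groups())
print(should_redirect('/page2?parameter=trackingcode123'))
print(should_redirect('/page2/mistake'))
```

With the capture pattern, the caller still has to inspect the second group to tell a querystring from a subfolder. The stricter pattern makes that decision inside the regex.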
An indirect answer would be that you don't do it this way. Instead, there should be an index page in folder page2 that uses a 301 redirect to redirect to the proper page. It would make much more sense to do it statically. I understand that you may not have that much control over your webpage, though, since it is Wordpress, in which case the former answer should work with the given plugin.

Regular Expression to match a specific URL broken up by arbitrary characters

I run a Django-based forum (the framework is probably not important to the question, but still) and it has been increasingly getting spammed with posts that link to a specific website constantly (www.solidwoodkitchen.co.uk - these people are apparently the worst).
I've implemented a string blocking system that stops them posting to the forum if the URL of the website is included in the post, but as spam bots usually do, it has figured out a way around that by breaking up the URL with other characters (eg. w_w_w.s*olid_wood*kit_ch*en._*co.*uk .). So a couple of questions:
Is it even possible to build a regex capable of finding the specific URL within a block of text even when it has been modified like that?
If it is, would this cause a performance hit?
Description
You could break the url into a string of characters, then join them together with [^a-z0-9]*?. So in this case with www.solidwoodkitchen.co.uk the resulting regex would look like:
w[^a-z0-9]*?w[^a-z0-9]*?w[^a-z0-9]*?[.][^a-z0-9]*?s[^a-z0-9]*?o[^a-z0-9]*?l[^a-z0-9]*?i[^a-z0-9]*?d[^a-z0-9]*?w[^a-z0-9]*?o[^a-z0-9]*?o[^a-z0-9]*?d[^a-z0-9]*?k[^a-z0-9]*?i[^a-z0-9]*?t[^a-z0-9]*?c[^a-z0-9]*?h[^a-z0-9]*?e[^a-z0-9]*?n[^a-z0-9]*?[.][^a-z0-9]*?c[^a-z0-9]*?o[^a-z0-9]*?[.][^a-z0-9]*?u[^a-z0-9]*?k
This would basically search for the entire string of characters separated by zero or more non-alphanumeric characters.
Or you could take the input text and strip out all punctuation then simply search for wwwsolidwoodkitchencouk.
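Both ideas can be sketched in Python without writing the long pattern out by hand. The function names and sample post below are made up for illustration. The second function also strips whitespace, which is an assumption that may produce false positives across word boundaries.

```python
import re

def obfuscation_pattern(url):
    """Build a regex matching url with non-alphanumeric junk between characters."""
    separator = r'[^a-z0-9]*?'
    parts = ['[.]' if ch == '.' else re.escape(ch) for ch in url]
    return re.compile(separator.join(parts), re.IGNORECASE)

def normalized_contains(text, url):
    """Strip everything but letters and digits, then do a plain substring search."""
    def strip(s):
        return re.sub(r'[^a-z0-9]', '', s.lower())
    return strip(url) in strip(text)

blocked = "www.solidwoodkitchen.co.uk"
post = "Great deals at w_w_w.s*olid_wood*kit_ch*en._*co.*uk . today"

pattern = obfuscation_pattern(blocked)
print(bool(pattern.search(post)))
print(normalized_contains(post, blocked))
```

Generating the pattern from the plain URL keeps the block list readable and makes it easy to add further domains.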