Urlpattern regular expression not working - django

So I'm trying to make a URL pattern like so:
re_path(r'^product_list/(?P<category_slug>[\w-]+)/(?:(?P<filters>[\w~#=]+)?)$', views.ProductListView.as_view(), name='filtered_product_list'),
and at this point it works with things like:
/product_list/sdasdadsad231/bruh=1~3~10#nobruh=1~4
Here bruh=1~3~10#nobruh=1~4 is the filters part.
Later I want to implement search-by-word functionality,
so I want it to recognize things like:
/product_list/sdasdadsad231/?filters=bruh-1~3~10&nobruh-1~4&search_for=athing
/product_list/sdasdadsad231/?filters=bruh-1~3~10&nobruh-1~4
/product_list/sdasdadsad231/?search_for=athing
/product_list/sdasdadsad231/
so in different situations it will get filters and/or search_for or nothing at all

You might write the pattern as:
^product_list/(?P<category_slug>[\w-]+)/(?:\??(?P<filters>[\w~#=&-]+)?)$
If you want to match the leading / from the example data, you can append that in the pattern after the ^
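As a quick sanity check outside Django, the pattern above can be run against the sample strings with Python's re module. This is only a sketch: the samples are passed in as raw text without the leading slash, and the helper name named_groups is made up for illustration. It says nothing about what Django actually hands to the pattern.

```python
import re

# The pattern from the answer above, compiled unchanged.
pattern = re.compile(
    r'^product_list/(?P<category_slug>[\w-]+)/(?:\??(?P<filters>[\w~#=&-]+)?)$'
)

samples = [
    'product_list/sdasdadsad231/?filters=bruh-1~3~10&nobruh-1~4&search_for=athing',
    'product_list/sdasdadsad231/?search_for=athing',
    'product_list/sdasdadsad231/bruh=1~3~10#nobruh=1~4',
    'product_list/sdasdadsad231/',
]


def named_groups(text):
    """Return the named groups for text, or None if the pattern does not match."""
    match = pattern.match(text)
    return match.groupdict() if match else None


results = [named_groups(sample) for sample in samples]
for sample, groups in zip(samples, results):
    print(sample, '->', groups)
```

Everything after the optional question mark lands in the single filters group, so it would still have to be split by hand.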

The part after the question mark is the query string [wiki] and does not belong to the path. Django will construct a QueryDict for this, and this will be available through request.GET [Django-doc]. Indeed, if the path is for example:
/product_list/sdasdadsad231/?filters=bruh-1~3~10&nobruh-1~4&search_for=athing
Then the ?filters=bruh-1~3~10&nobruh-1~4&search_for=athing is not part of the path, and it will be wrapped in request.GET as a QueryDict that looks like:
>>> QueryDict('filters=bruh-1~3~10&nobruh-1~4&search_for=athing')
<QueryDict: {'filters': ['bruh-1~3~10'], 'nobruh-1~4': [''], 'search_for': ['athing']}>
You thus cannot capture the part after (and including) the question mark: it has already been stripped from the path by the time Django tries to match the re_path(…) and path(…) definitions.
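The split between path and query string can be illustrated with the standard library alone. This is a rough stand-in for what happens before URL matching, not Django's own code: urlsplit and parse_qs are used here instead of QueryDict so the snippet runs without a configured Django project.

```python
from urllib.parse import urlsplit, parse_qs

url = '/product_list/sdasdadsad231/?filters=bruh-1~3~10&nobruh-1~4&search_for=athing'

parts = urlsplit(url)

# Only this part is relevant for URL pattern matching.
path = parts.path

# keep_blank_values=True keeps keys without a value, such as 'nobruh-1~4'.
query = parse_qs(parts.query, keep_blank_values=True)

print(path)
print(query)
```

Note that the & in the filters value splits it into two keys, so a different separator inside the filters value would be needed if it is meant to stay in one parameter.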

Related

combine two URLs REGEX

I have data from two URLS that I need to combine using REGEX
/online-teaching
/online-teaching?fbclid
I have /(online-teaching)|(online teaching)
I can't figure out how to include the url with the ? and the one without.
Thanks!
How about something as simple as:
online-teaching(?:.+)?
Match online-teaching and anything that follows, if it exists (you might need to constrain this to specific characters instead of matching everything with . to get a valid URL, but I'll leave that up to you).
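A small sketch comparing the pattern above with one possible constrained variant. The stricter pattern and the third sample path are assumptions added for illustration: it assumes only a query string may follow the path.

```python
import re

# Pattern from the answer: online-teaching plus anything after it.
loose = re.compile(r'online-teaching(?:.+)?')

# Hypothetical stricter variant: the exact path, optionally followed by a query string.
strict = re.compile(r'^/online-teaching(?:\?.*)?$')

paths = ['/online-teaching', '/online-teaching?fbclid', '/online-teaching-jobs']

loose_hits = [p for p in paths if loose.search(p)]
strict_hits = [p for p in paths if strict.search(p)]

print(loose_hits)
print(strict_hits)
```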

regular expression in python for match url

I need to use python to match url in my text file.
However, there is a special case:
i like 🤣pic.twitter.com/Sex8JaP5w5/a7htvq🤣
In this case I would like to keep the emoji next to the url and just match the url in the middle.
Ideally, I would like to have result like this:
i like 🤣<url>🤣
Since I am new to this, this is what I have so far.
pattern = re.compile("([:///a-zA-Z////\.])+(.com)+([:///a-zA-Z////\.])")
but the return result is something unsatisfied like this:
i like 🤣<url>Sex8JaP5w5/a7htvq🤣
Would you please help me with this? Thank you so much
A solution using existing packages:
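from urlextract import URLExtract
import emoji
def remove_emoji(text):
    # get_emoji_regexp() exists in emoji < 2.0; newer releases provide
    # emoji.replace_emoji(text, replace='') instead
    return emoji.get_emoji_regexp().sub(r'', text)
extractor = URLExtract()
source = "i like 🤣pic.twitter.com/Sex8JaP5w5/a7htvq🤣 "
# the extracted URLs may still carry the adjacent emoji, so strip them afterwards
urlsWithEmojis = extractor.find_urls(source)
urls = list(map(remove_emoji, urlsWithEmojis))
print(urls)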
output
['pic.twitter.com/Sex8JaP5w5/a7htvq']
Inspired by How do you extract a url from a string using python? and removing emojis from a string in Python
It looks like you are missing * or + on the last matching group, so it only matches one character. So you want "([:///a-zA-Z////\.])+(.com)+([:///a-zA-Z////\.])*" or "([:///a-zA-Z////\.])+(.com)+([:///a-zA-Z////\.])+".
Now I don't know if this regex is simplified for your case, but it does not match all urls. For an example of that check out https://www.regextester.com/20
If you are attempting to match any url I would recommend rethinking your problem and trying to simplify down to more specific types of urls, like the example you provided.
EDIT: Also, why (.com)+? Is there really a case where multiple ".com"s appear, like .com.com.com?
Also, I think you have a small typo and it is supposed to be (\.com). Since you have ([:///a-zA-Z////\.])+ it could be reduced to (com); however, I think the explicit (\.com) makes the expression easier to read.
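For the specific example in the question, a narrower regex-only sketch might look like the following. The character classes are assumptions chosen for this one sample (ASCII letters, digits and a few punctuation marks, a .com host), not a general URL matcher.

```python
import re

# Host made of ASCII letters/digits/dots/hyphens ending in ".com",
# optionally followed by a path. Emoji are outside these classes,
# so they are left untouched on both sides.
url_pattern = re.compile(r'[A-Za-z0-9.-]+\.com(?:/[A-Za-z0-9_./-]*)?')

text = 'i like 🤣pic.twitter.com/Sex8JaP5w5/a7htvq🤣'
replaced = url_pattern.sub('<url>', text)

print(replaced)
```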

Django URL issue with regular expressions

I am new to Python, Django 1.9 and overall regular expressions. So I am trying to write something like this within urls.py
search/doc_name/language/?id
where doc_name, allow for any name/case/length etc. like so: 'My Fave Doc 12'
where language, allow two letters like so: 'en'
where id, allows only numbers.
This is what I have, can someone point out where I went wrong?
url(r'^search/[\w-]+/[a-z]{2}+/(?P<id>[0-9]+)$', '....
The doc_name part doesn't allow spaces. Add a space to the character set if you want one, and make sure you put it before the dash ([\w -]+). If other whitespace is allowed, use \s instead ([\w\s-]+).
Also, the language part would currently match any even number of letters. Remove the + and leave only [a-z]{2}. + means repeat one or more times; anything is matched only once by default.
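The two fixes above can be checked with plain re. This is a sketch under assumptions: the group names doc_name and language are added here for readability (the question only names id), and the sample paths are invented.

```python
import re

# Space added to the doc_name character set, and the stray + removed after {2}.
pattern = re.compile(
    r'^search/(?P<doc_name>[\w -]+)/(?P<language>[a-z]{2})/(?P<id>[0-9]+)$'
)

match = pattern.search('search/My Fave Doc 12/en/42')
groups = match.groupdict()

# A language segment longer than two letters is rejected.
no_match = pattern.search('search/My Fave Doc 12/english/42')

print(groups)
print(no_match)
```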
You should really avoid having spaces in your URL. I suggest the following:
url format: /search/<doc_name>/<id>/?lang=<language>
in urls.py:
url(r'^search/(?P<doc_name>[\w]+)/(?P<id>[0-9]+)/$', your_view)
in views.py:
def your_view(request, doc_name, id):
    # doc_name and id are captured by the URL pattern and passed in as keyword arguments
    lang = request.GET.get('lang', 'en')

How do I match the question mark character in a Django URL?

In my Django application, I have a URL I would like to match which looks a little like this:
/mydjangoapp/?parameter1=hello&parameter2=world
The problem here is the '?' character being a reserved regex character.
I have tried a number of ways to match this... This was my first attempt:
(r'^pbanalytics/log/\?parameter1=(?P<parameter1>[\w0-9-]+)&parameter2=(?P<parameter2>[\w0-9-]+), 'mydjangoapp.myFunction')
This was my second attempt:
(r'^pbanalytics/log/\\?parameter1=(?P<parameter1>[\w0-9-]+)&parameter2=(?P<parameter2>[\w0-9-]+), 'mydjangoapp.myFunction')
but still no luck!
Does anyone know how I might match a '?' exactly in a Django URL?
Don't. You shouldn't match query string with URL Dispatcher.
You can access all values using request.GET dictionary.
urls
(r'^pbanalytics/log/$', 'mydjangoapp.myFunction')
function
def myFunction(request):
    param1 = request.GET.get('parameter1')
Django's URL patterns only match the path component of a URL. You're trying to match on the querystring as well, which is why you're having trouble. Your first regex does what you wanted, except that you should only ever be matching the path component.
In your view you can access the querystring via request.GET
The ? character is a reserved symbol in regex, yes. Your first attempt looks like proper escaping of it.
However, ? in a URL is also the end of the path and the beginning of the query part (like this: protocol://host/path/?query#hash).
Django's URL dispatcher doesn't let you dispatch URLs based on the query part, AFAIK.
My suggestion would be writing a django view that does the dispatching based on the request.GET parameter to your view function.
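A minimal sketch of that dispatching idea, written without Django so it runs on its own. Everything here is a stand-in: the handler names are hypothetical, SimpleNamespace plays the role of the request object, and the handlers return plain strings where a real view would return an HttpResponse.

```python
from types import SimpleNamespace


def handle_hello(request):
    # Hypothetical handler for parameter1=hello.
    return 'hello handler'


def handle_default(request):
    # Hypothetical fallback when parameter1 is missing or unknown.
    return 'default handler'


# Map values of the query parameter to handler functions.
HANDLERS = {'hello': handle_hello}


def dispatch_view(request):
    """Pick a handler based on the 'parameter1' query parameter."""
    key = request.GET.get('parameter1')
    handler = HANDLERS.get(key, handle_default)
    return handler(request)


# Stand-in for a request to /pbanalytics/log/?parameter1=hello&parameter2=world
fake_request = SimpleNamespace(GET={'parameter1': 'hello', 'parameter2': 'world'})
empty_request = SimpleNamespace(GET={})

print(dispatch_view(fake_request))
print(dispatch_view(empty_request))
```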
The way to do what the original question asked, i.e. a catch-all variable in the URL dispatcher, is:
url(r'^mens/(?P<pl_slug>.+)/$', 'main.views.mens',),
or
url(r'^mens/(?P<pl_slug>\?+)/$', 'main.views.mens',),
As for why this is needed: GET URLs don't exactly provide good "permalinks" or good presentation in general for customers and clients.
Clients often request that the URL be formatted like this:
www.example-clothing-site.com/mens/tops/shirts/t-shirts/Big_Brown_Shirt3XL
This is a far more readable interface for the end user and provides a better overall presentation for the client.

Regular expression that uses an "OR" conditional

I could use some help writing a regular expression. In my Django application, users can hit the following URL:
http://www.example.com/A1/B2/C3
I'd like to create a regular expression that accepts any of the following as a valid URL:
http://www.example.com/A1
http://www.example.com/A1/B2
http://www.example.com/A1/B2/C3
I'm guessing I need to use the "OR" conditional, but I'm having trouble getting my regex to validate. Any thoughts?
UPDATE: Here is the regex so far. Note that I have not included the "http://www.example.com" portion -- Django handles that for me. I'm just concerned with validating 1,2, or 3 subdirectories.
^(\w{1,20})|((\w{1,20})/(\w{1,20}))|((\w{1,20})/(\w{1,20})/(\w{1,20}))$
Skip the |, use the ? and ()
http://www\.example\.com/A1(/B2(/C3)?)?
And if you replace the A1-C3 with a pattern:
http://www\.example\.com/[^/]*(/[^/]*(/[^/]*)?)?
Explanation:
it matches every string that starts with http://www.example.com/A1
it can match an additional /B2 and even an additional /C3, but /C3 is only matched, when there is a /B2
[^/]* (as many non slashes as possible)
if you need the A1-C3 in special capture groups, you can use this:
http://www\.example\.com/([^/]*)(/([^/]*)(/([^/]*))?)?
Will give (groupnumber: content):
matches: 0: (http://www.example.com/dir1/dir2/dir3)
1: (dir1)
2: (/dir2/dir3)
3: (dir2)
4: (/dir3)
5: (dir3)
You can check it out online here or get this tool (yes it's free, and it's even written in Lisp...).
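The group numbering listed above can be reproduced with Python's re module. This is a small check using the same pattern and the dir1/dir2/dir3 sample; the shorter second URL is added here for comparison.

```python
import re

pattern = re.compile(r'http://www\.example\.com/([^/]*)(/([^/]*)(/([^/]*))?)?')

full = pattern.fullmatch('http://www.example.com/dir1/dir2/dir3')
short = pattern.fullmatch('http://www.example.com/dir1')

# groups() returns groups 1..5; optional groups that did not take part are None.
full_groups = full.groups()
short_groups = short.groups()

print(full_groups)
print(short_groups)
```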
There's a much more Django way to do this:
urlpatterns = patterns('',
    url(r'^(?P<object_slug1>\w{2})/(?P<object_slug2>\w{2})/(?P<object_slug3>\w{2})$', direct_to_template, {"template": "two_levels_deep.html"}, name="two_deep"),
    url(r'^(?P<object_slug1>\w{2})/(?P<object_slug2>\w{2})$', direct_to_template, {"template": "one_level_deep.html"}, name="one_deep"),
    url(r'^(?P<object_slug1>\w{2})$', direct_to_template, {"template": "homepage.html"}, name="home"),
)
The other methods don't take advantage of Django's power to pass variables.
Edit: I switched the order of the urlpattern to be more obvious for the parser (i.e. bottom up is more defined than top down).
http://www\.example\.com/A1(/B2(/C3)?)?
^(\w{1,20})(/\w{1,20})*
This allows as many subdirectories as you like. If you only want up to 2 additional ones:
^(\w{1,20})(/\w{1,20}){0,2}
If I'm understanding, I think you just need another set of parens around the whole OR statement:
^((\w{1,20})|((\w{1,20})/(\w{1,20}))|((\w{1,20})/(\w{1,20})/(\w{1,20})))$
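Why the extra parentheses matter can be seen by comparing the patterns with re.search. In the question's regex, ^ binds only to the first alternative and $ only to the last, so a too-deep path still matches. This is a sketch: the sample paths are invented, and the trailing $ on the {0,2} variant is an addition made here so that it rejects longer paths.

```python
import re

# Regex from the question: the anchors apply to single alternatives only.
original = re.compile(
    r'^(\w{1,20})|((\w{1,20})/(\w{1,20}))|((\w{1,20})/(\w{1,20})/(\w{1,20}))$'
)
# Whole alternation wrapped in one group, so ^ and $ apply to every branch.
grouped = re.compile(
    r'^((\w{1,20})|((\w{1,20})/(\w{1,20}))|((\w{1,20})/(\w{1,20})/(\w{1,20})))$'
)
# Counted variant with a trailing $ added (assumption, see lead-in).
counted = re.compile(r'^(\w{1,20})(/\w{1,20}){0,2}$')

valid = ['A1', 'A1/B2', 'A1/B2/C3']
too_deep = 'A1/B2/C3/D4'

original_accepts_too_deep = original.search(too_deep) is not None
grouped_ok = all(grouped.search(p) for p in valid) and grouped.search(too_deep) is None
counted_ok = all(counted.search(p) for p in valid) and counted.search(too_deep) is None

print(original_accepts_too_deep, grouped_ok, counted_ok)
```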
Be aware that Django's reverse URL matching (permalinks, reverse() and {% url %}) can handle a limited subset of regular expressions. To be able to use them, it's sometimes necessary to split complex regexes into separate URL dispatcher rules.