Regular expression that uses an "OR" conditional - regex

I could use some help writing a regular expression. In my Django application, users can hit the following URL:
http://www.example.com/A1/B2/C3
I'd like to create a regular expression that accepts any of the following as a valid URL:
http://www.example.com/A1
http://www.example.com/A1/B2
http://www.example.com/A1/B2/C3
I'm guessing I need to use the "OR" conditional, but I'm having trouble getting my regex to validate. Any thoughts?
UPDATE: Here is the regex so far. Note that I have not included the "http://www.example.com" portion; Django handles that for me. I'm only concerned with validating 1, 2, or 3 subdirectories.
^(\w{1,20})|((\w{1,20})/(\w{1,20}))|((\w{1,20})/(\w{1,20})/(\w{1,20}))$

Skip the |, use the ? and ()
http://www\.example\.com/A1(/B2(/C3)?)?
And if you replace the A1-C3 with a pattern:
http://www\.example\.com/[^/]*(/[^/]*(/[^/]*)?)?
Explanation:
It matches every string that starts with http://www.example.com/A1.
It can match an additional /B2 and even an additional /C3, but /C3 is only matched when there is a /B2.
[^/]* matches as many non-slash characters as possible.
If you need A1-C3 in separate capture groups, you can use this:
http://www\.example\.com/([^/]*)(/([^/]*)(/([^/]*))?)?
Will give (groupnumber: content):
matches: 0: (http://www.example.com/dir1/dir2/dir3)
1: (dir1)
2: (/dir2/dir3)
3: (dir2)
4: (/dir3)
5: (dir3)
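To see the group numbering from Python, here is a minimal sketch using the standard re module. The names URL_RE and segments are made up for illustration, and re.fullmatch is used (an assumption about intent) so that deeper paths are rejected rather than matched as a prefix:

```python
import re

# Pattern from the answer above: one mandatory segment, two optional nested ones.
URL_RE = re.compile(r'http://www\.example\.com/([^/]*)(/([^/]*)(/([^/]*))?)?')


def segments(url):
    """Return (A1, B2, C3), with None for missing parts, or None if the URL is rejected."""
    m = URL_RE.fullmatch(url)  # the whole string must fit the pattern
    if m is None:
        return None
    # Groups 2 and 4 include the leading slash, so the bare names are in 1, 3 and 5.
    return m.group(1), m.group(3), m.group(5)


print(segments('http://www.example.com/dir1/dir2/dir3'))
print(segments('http://www.example.com/dir1'))
print(segments('http://www.example.com/dir1/dir2/dir3/dir4'))
```

Groups that belong to an optional part that did not take part in the match come back as None, so the caller can tell how deep the URL was.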

There's a much more Django way to do this:
urlpatterns = patterns('',
url(r'^(?P<object_slug1>\w{2})/(?P<object_slug2>\w{2})/(?P<object_slug3>\w{2})$', direct_to_template, {"template": "two_levels_deep.html"}, name="two_deep"),
url(r'^(?P<object_slug1>\w{2})/(?P<object_slug2>\w{2})$', direct_to_template, {"template": "one_level_deep.html"}, name="one_deep"),
url(r'^(?P<object_slug1>\w{2})$', direct_to_template, {"template": "homepage.html"}, name="home"),
)
The other methods don't take advantage of Django's power to pass variables.
Edit: I switched the order of the urlpatterns so that the most specific pattern comes first, which is more obvious for the parser.

http://www\.example\.com/A1(/B2(/C3)?)?

^(\w{1,20})(/\w{1,20})*
This allows as many subdirectories as you like. If you want at most two more after the first:
^(\w{1,20})(/\w{1,20}){0,2}
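A quick check with Python's re module shows how the bounded quantifier behaves. This is only a sketch: the ANCHORED variant with a trailing $ is an addition, not part of the pattern above, and is there to show that without an end anchor a longer path still matches as a prefix.

```python
import re

# Pattern as written above (no end anchor).
UNANCHORED = re.compile(r'^(\w{1,20})(/\w{1,20}){0,2}')
# Same pattern with an added $ (assumption: the whole path must be validated).
ANCHORED = re.compile(r'^(\w{1,20})(/\w{1,20}){0,2}$')

for path in ['A1', 'A1/B2', 'A1/B2/C3', 'A1/B2/C3/D4']:
    # Print whether each variant accepts the path.
    print(path, bool(UNANCHORED.search(path)), bool(ANCHORED.search(path)))
```

Both variants accept one to three segments; only the anchored one rejects A1/B2/C3/D4.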

If I'm understanding, I think you just need another set of parens around the whole OR statement:
^((\w{1,20})|((\w{1,20})/(\w{1,20}))|((\w{1,20})/(\w{1,20})/(\w{1,20})))$
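The reason the extra parentheses matter is operator precedence: | binds more loosely than ^ and $, so in the ungrouped version ^ applies only to the first alternative and $ only to the last. A small sketch with Python's re module (the helper name accepts is made up for illustration):

```python
import re

# Original attempt: ^ belongs to the first branch only, $ to the last branch only.
UNGROUPED = re.compile(
    r'^(\w{1,20})|((\w{1,20})/(\w{1,20}))|((\w{1,20})/(\w{1,20})/(\w{1,20}))$'
)
# Grouped version: ^ and $ now wrap the whole alternation.
GROUPED = re.compile(
    r'^((\w{1,20})|((\w{1,20})/(\w{1,20}))|((\w{1,20})/(\w{1,20})/(\w{1,20})))$'
)


def accepts(pattern, path):
    """True if the pattern matches somewhere in path (search semantics)."""
    return pattern.search(path) is not None


for path in ['A1', 'A1/B2', 'A1/B2/C3', 'A1/B2/C3/D4']:
    print(path, accepts(UNGROUPED, path), accepts(GROUPED, path))
```

The ungrouped pattern accepts A1/B2/C3/D4 because its first branch is satisfied by the leading A1 alone; the grouped one rejects it.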

Be aware that Django's reverse URL matching (permalinks, reverse() and {% url %}) can handle a limited subset of regular expressions. To be able to use them, it's sometimes necessary to split complex regexes into separate URL dispatcher rules.

Related

Urlpattern regular expression not working

I'm trying to create a URL pattern like this:
re_path(r'^product_list/(?P<category_slug>[\w-]+)/(?:(?P<filters>[\w~#=]+)?)$', views.ProductListView.as_view(), name='filtered_product_list'),
At this point it works with paths like:
/product_list/sdasdadsad231/bruh=1~3~10#nobruh=1~4
where bruh=1~3~10#nobruh=1~4 are the filters.
Later I want to implement search-by-word functionality,
so I want it to recognize things like:
/product_list/sdasdadsad231/?filters=bruh-1~3~10&nobruh-1~4&search_for=athing
/product_list/sdasdadsad231/?filters=bruh-1~3~10&nobruh-1~4
/product_list/sdasdadsad231/?search_for=athing
/product_list/sdasdadsad231/
So in different situations it will get filters and/or search_for, or nothing at all.
You might write the pattern as:
^product_list/(?P<category_slug>[\w-]+)/(?:\??(?P<filters>[\w~#=&-]+)?)$
If you want to match the leading / from the example data, you can append that in the pattern after the ^
The part after the question mark is the query string and does not belong to the path. Django will construct a QueryDict for this, and this will be available through request.GET. Indeed, if the path is for example:
/product_list/sdasdadsad231/?filters=bruh-1~3~10&nobruh-1~4&search_for=athing
Then the ?filters=bruh-1~3~10&nobruh-1~4&search_for=athing is not part of the path, and it will be wrapped in request.GET as a QueryDict that looks like:
>>> QueryDict('filters=bruh-1~3~10&nobruh-1~4&search_for=athing')
<QueryDict: {'filters': ['bruh-1~3~10'], 'nobruh-1~4': [''], 'search_for': ['athing']}>
You thus cannot capture the part after (and including) the question mark: it has already been stripped from the path by the time Django tries to match the re_path(…) and path(…) definitions.
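The same split can be illustrated without Django, using only the standard library. This is a sketch, not what Django runs internally: split_request_target is a hypothetical helper, and urllib.parse.parse_qs with keep_blank_values=True is used here to mimic how the QueryDict above keeps the valueless nobruh-1~4 key.

```python
from urllib.parse import urlsplit, parse_qs


def split_request_target(target):
    """Split a request target into its path and a dict of query parameters."""
    parts = urlsplit(target)
    # keep_blank_values=True keeps keys that have no "=value" part.
    query = parse_qs(parts.query, keep_blank_values=True)
    return parts.path, query


path, query = split_request_target(
    '/product_list/sdasdadsad231/?filters=bruh-1~3~10&nobruh-1~4&search_for=athing'
)
print(path)   # only this part is what a URL pattern gets to see
print(query)  # this part corresponds to request.GET in a view
```

In a view, the equivalent lookups would be request.GET.get('filters') and request.GET.get('search_for'), so the URL pattern only needs to cover /product_list/<category_slug>/.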

combine two URLs REGEX

I have data from two URLs that I need to combine using a regex:
/online-teaching
/online-teaching?fbclid
I have /(online-teaching)|(online teaching)
I can't figure out how to include the url with the ? and the one without.
Thanks!
How about something as simple as:
online-teaching(?:.+)?
This matches online-teaching and anything that follows, if it exists. You might need to constrain it to specific characters instead of matching everything with . to get a valid URL, but I'll leave that up to you.
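One possible way to constrain it, sketched in Python: allow only an optional query string after the page name. The anchors, the leading slash and the name PAGE_RE are assumptions for illustration; the tool the question is about may use a different regex dialect.

```python
import re

# Loose version from the answer: anything may follow the page name.
LOOSE_RE = re.compile(r'online-teaching(?:.+)?')
# Stricter sketch: the page itself, optionally followed by "?" and a query string.
PAGE_RE = re.compile(r'^/online-teaching(?:\?.*)?$')

for url in ['/online-teaching', '/online-teaching?fbclid', '/online-teaching-jobs']:
    print(url, bool(LOOSE_RE.search(url)), bool(PAGE_RE.search(url)))
```

The loose pattern also accepts /online-teaching-jobs, while the stricter one accepts only the two URLs from the question.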

Regular expression in URL for Django slug

I have two URLs with a slug field in the URL.
url(r'^genres/(?P<slug>.+)/$', views.genre_view, name='genre_view'),
url(r'^genres/(?P<slug>.+)/monthly/$', views.genre_month, name='genre_month'),
The first one opens fine but the second one gives a DoesNotExist error saying Genres matching query does not exist.
Here is how I'm accessing the 2nd URL in my HTML
<li>Monthly Top Songs</li>
I tried to print the slug in the view. It is passed as genre_name/monthly instead of genre_name.
I think the problem is with the regex in the URLs. Any idea what's wrong here?
Django always uses the first pattern that matches. For urls similar to genres/genre_name/monthly your first pattern matches, so the second one is never used. The truth is the regex is not specific enough, allowing all characters - which doesn't seem to make sense.
You could reverse the order of those patterns, but what you should do is to make them more specific (compare: urls.py example in generic class-based views docs):
url(r'^genres/(?P<slug>[-\w]+)/$', views.genre_view, name='genre_view'),
url(r'^genres/(?P<slug>[-\w]+)/monthly/$', views.genre_month, name='genre_month'),
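The effect can be reproduced outside Django with a small first-match-wins loop. This is only a sketch of the dispatch idea, not Django's resolver; first_match and the two lists are hypothetical names.

```python
import re


def first_match(patterns, path):
    """Return (name, slug) for the first pattern that matches, like first-match-wins dispatch."""
    for name, regex in patterns:
        m = re.search(regex, path)
        if m:
            return name, m.group('slug')
    return None


# Patterns from the question: .+ is greedy and also swallows slashes.
LOOSE = [
    ('genre_view', r'^genres/(?P<slug>.+)/$'),
    ('genre_month', r'^genres/(?P<slug>.+)/monthly/$'),
]
# More specific patterns: the slug may not contain a slash.
SPECIFIC = [
    ('genre_view', r'^genres/(?P<slug>[-\w]+)/$'),
    ('genre_month', r'^genres/(?P<slug>[-\w]+)/monthly/$'),
]

print(first_match(LOOSE, 'genres/rock/monthly/'))
print(first_match(SPECIFIC, 'genres/rock/monthly/'))
```

With the loose patterns the first rule wins and the slug comes out as rock/monthly, which is exactly the value the view was seeing; with the specific patterns the request falls through to genre_month with the slug rock.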
Edit 2020:
These days (since Django 2.0), you can (and should) use path instead of url. It provides built-in path converters, including slug:
path('genres/<slug:slug>/', views.genre_view, name='genre_view'),
path('genres/<slug:slug>/monthly/', views.genre_month, name='genre_month'),
I believe that you can also drop the _ from the pattern that @Ludwik has suggested and revise it to this version (which is one character simpler :) ):
url(r'^genres/(?P<slug>[-\w]+)/$', views.genre_view, name='genre_view'),
url(r'^genres/(?P<slug>[-\w]+)/monthly/$', views.genre_month, name='genre_month'),
Note that \w stands for "word character". It always matches the ASCII characters [A-Za-z0-9_], including the underscore and digits. In Python 3 it also matches Unicode word characters unless the re.ASCII flag is set.
In Django >= 2.0, you can capture a slug in the URL with the built-in slug path converter, like below.
from django.urls import path
urlpatterns = [
...
path('articles/<slug:some_title>/', myapp.views.blog_detail, name='blog_detail'),
...
]
Source: https://docs.djangoproject.com/en/2.0/ref/urls/#django.urls.path

Django url reverse: Non-reversible reg-exp portion: '(?='

Django version: 1.5 (trunk)
I'm using a positive look-ahead assertion in url pattern A, which works fine by itself. But when I try to reverse url pattern B, which is completely unrelated, I get:
ValueError: Non-reversible reg-exp portion: '(?='
Example urls:
url(r'^foo(?=bar)/', test, name= 'bla'),
url(r'bar/', test, name= 'bli'),
Triggering the error:
from django.core.urlresolvers import reverse
reverse('bli')
I found this related ticket, but sadly it didn't make me any wiser:
https://code.djangoproject.com/ticket/17492
Can anyone tell me what's wrong with the code?
Your code is OK. The problem is that Django can't reverse every possible regular expression. Currently, Django's regex normalizer can't handle at least two things: disjunction (|) and zero-width assertions (look-ahead, look-behind).
So, to solve your problem, just avoid using look-ahead in your URL patterns and you're good to go. This should always be possible: plain regular expressions, without those extensions, can already represent any regular language.

Django urlpatterns frustrating problem with trailing slashes

All of the examples I can find of urlpatterns for django sites have a separate entry for incoming urls that have no leading slash, or the root folder. Then they handle subfolders on each individual line. I don't understand why a simple
/?
regular expression doesn't permit these to be on one simple line.
Consider the following: let's call the Django project Baloney, and the app name is Cheese. So in the project urls.py we have something like this to allow the app's urls.py to handle its requests...
urlpatterns = patterns('',
(r'^cheese/', include('Baloney.Cheese.urls')),
)
Then, inside the Cheese app's urls.py, I don't understand why this one simple line would not match all incoming url subpaths, including a blank value...
urlpatterns = patterns('',
(r'^(?P<reqPath>.*)/?$', views.cheeseapp_views),
)
Instead, it matches the blank case, but not the case of a value present. So...
http://baloneysite.com/cheese/ --> MATCHES THE PATTERN
http://baloneysite.com/cheese/swiss --> DOES NOT MATCH
Basically I want to capture the reqPath variable to include whatever is there (even blank or '') but not including any trailing slash if there is one.
The urls are dynamic slugs pulled from the DB so I do all the matching up to content in my views and just need the url patterns to forward the values along. I know that the following works, but don't understand why this can't all be placed on one line with the /? regular expression before the ending $ sign.
(r'^$', views.cheeseapp_views, {'reqPath':''}),
(r'^(?P<reqPath>.*)/$', views.cheeseapp_views),
Appreciate any insights.
I just tried a similar sample and it worked as you wrote it. No need for /?, .* would match that anyway. What is the exact error you are getting? Maybe you have your view without the request parameter? I.e. views.cheeseapp_views should be something like:
def cheeseapp_views(request, reqPath):
...
Edit:
The pattern that you suggested captures the trailing slash into reqPath because the * operator is greedy (take a look at docs.python.org/library/re.html). Try this instead:
(r'^(?P<reqPath>.*?)/?$', views.cheeseapp_views)
Note that it's .*? instead of .* to make it non-greedy.
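The difference between the greedy and non-greedy versions is easy to check with Python's re module. A minimal sketch (req_path is a made-up helper; the paths are given without the cheese/ prefix, on the assumption that include() has already consumed it):

```python
import re

GREEDY = re.compile(r'^(?P<reqPath>.*)/?$')   # .* grabs the trailing slash too
LAZY = re.compile(r'^(?P<reqPath>.*?)/?$')    # .*? leaves the slash for /?


def req_path(pattern, path):
    """Return what the named group reqPath captures for the given path."""
    return pattern.match(path).group('reqPath')


for path in ['', 'swiss', 'swiss/', 'swiss/aged/']:
    print(repr(path), repr(req_path(GREEDY, path)), repr(req_path(LAZY, path)))
```

Both patterns match every one of these paths; only the captured value differs, with the non-greedy version dropping a single trailing slash.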