Django-TinyMCE Validation Rules - django

I am using TinyMCE in my Django Admin site. I need to validate that no disallowed HTML Tags get submitted. This is what I tried:
1) Validation Method
def check_for_invalid_html_tags(value):
    compiled_regex = re.compile('<(?!/?(p|div|ul|li)(>|\s))[^<]+?>')
    if compiled_regex.match(value):
        raise ValidationError('Invalid Tags')
2) Validation Rule
content = tinymce_models.HTMLField(validators=[check_for_invalid_html_tags])
This does not seem to work, as any submission is let through as valid. When I change the tinymce_models.HTMLField to models.TextField, the rule works perfectly. Thus I believe that the issue is as a result of TinyMCE.
Can anybody help?

I read the doc and there is a slight difference between match and search
match:
If zero or more characters at the beginning of string ...
search:
Scan through string looking for the first location ...
search() vs. match()
Since what you are looking for may appear anywhere in your string, you need to use search instead of match. Another point: if a pattern relies on '.', you might need to set the flag re.S (re.DOTALL), since your input may contain newlines.
Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.
(The pattern here uses [^<] rather than '.', and [^<] already matches newlines, so the flag is optional for it.)
Here is check_for_invalid_html_tags written as a functor, with a working solution.
import re
class CheckForInvalidHtmlTags(object):
    # Raw string, so \s reaches the regex engine unchanged.
    compiled_regex = re.compile(r'<(?!/?(p|div|ul|li)(>|\s))[^<]+?>')
    def __call__(self, value):
        # search() scans the whole string; match() only checks the start.
        if self.compiled_regex.search(value):
            print('error')
        else:
            print('ok')
c = CheckForInvalidHtmlTags()
c('test test <a>test<a> test')  # prints error
c('test <p> test</p>')  # prints ok
c('test<a> test</a><p>test</p>test')  # prints error
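To make the match/search difference concrete, here is a minimal sketch using the same pattern. The sample string is made up for illustration. It may also hint at why the TextField case appeared to work: if a test value starts with the bad tag, match() catches it, whereas editor output that begins with an allowed tag such as <p> would not be caught (that explanation is an assumption, not something confirmed in the question).

```python
import re

# Same pattern as above: any tag other than p, div, ul, or li.
DISALLOWED_TAG = re.compile(r'<(?!/?(p|div|ul|li)(>|\s))[^<]+?>')

# Hypothetical input: the disallowed tag is not at the start of the string.
sample = 'intro text <script>alert(1)</script>'

# match() only tries position 0, where the string starts with "intro".
found_by_match = DISALLOWED_TAG.match(sample)

# search() scans the whole string and finds the <script> tag.
found_by_search = DISALLOWED_TAG.search(sample)

# match() does succeed when the disallowed tag is the very first thing.
found_at_start = DISALLOWED_TAG.match('<script>alert(1)</script>')

print(found_by_match)            # None
print(found_by_search.group(0))  # <script>
print(found_at_start.group(0))   # <script>
```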

A Django validator is a callable that takes a value and raises ValidationError if the value is invalid. Rather than writing your own, you can use one of Django's core validators, such as the regex validator.
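from django.core.validators import RegexValidator
check_for_invalid_html_tags = RegexValidator(
    regex=r'<(?!/?(p|div|ul|li)(>|\s))[^<]+?>',
    message='Invalid Tags',
    code='invalid_content',
    # By default RegexValidator rejects values that do NOT match;
    # inverse_match=True (Django 1.7+) rejects values that DO match.
    inverse_match=True,
)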
Then in your model:
content = tinymce_models.HTMLField(validators=[check_for_invalid_html_tags])

Related

Regex validator does not work with Django admin forms

I try to use a RegexValidator with a CharField, but I can't make it work...
class Configuration(models.Model):
    name = models.CharField(verbose_name=u'Name', validators=[
        RegexValidator(regex="[a-z]", message="Not cool", code="nomatch")])
I then just register it with
admin.site.register(Configuration)
But then in the admin forms, it accepts any possible name... Is the validation system supposed to work like that, or am I missing something?
Your current regex checks that your value contains a single character from a-z. So it allows a, but it also allows a1.
Try changing the regex to:
regex=r"^[a-z]+$"
By including ^ and $ to mark the beginning and end of string, you make sure that your string contains only characters from a-z. The + allows multiple characters.
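A quick way to see the difference is to run both patterns through re.search, which is what RegexValidator uses under the hood. This is only a sketch; the candidate names are made up.

```python
import re

loose = re.compile(r"[a-z]")       # at least one lowercase letter anywhere
strict = re.compile(r"^[a-z]+$")   # lowercase letters only, start to end

# Hypothetical names to try against both patterns.
candidates = ["a", "a1", "ABC", "abc"]

# Map each candidate to (accepted by loose, accepted by strict).
results = {
    name: (bool(loose.search(name)), bool(strict.search(name)))
    for name in candidates
}

for name, (loose_ok, strict_ok) in results.items():
    print(name, loose_ok, strict_ok)
# a True True
# a1 True False
# ABC False False
# abc True True
```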

How to query by string

In my Product model I want to query by food title, but it returns a NoReverseMatch error.
html url:
some text
views.py:
def product(request, food_name):
    product = Catalog.objects.get(food_name=food_name)
    return render(request, 'food/product.html', {'product': product})
url.py
url(r'^product/(?P<food_name>\w+)/', food_views.product, name='product'),
Trace
NoReverseMatch: Reverse for 'product' with arguments '()' and keyword arguments '{u'food_name': u'%D9%86%D8%A7%D9%86%20%D8%A8%D9%88%D8%B1%DA%A9'}' not found. 1 pattern(s) tried: [u'product/(?P<food_name>\\w+)/']
Remove the urlencode, you don't need it
some text
urlencode is used when you need to encode a string in a way that will allow it to be used inside a url (such as when you're adding get parameters). Above, you are just passing a string parameter to a function that is constructing a url.
You seem to be trying to put Arabic characters into your URL, which are not matched by \w, so you need to update your URL pattern to support them:
^product/(?P<food_name>[\w\u0600-\u06FF]+)/
This will handle most of them (see this regexr example), but I'm not familiar enough with Arabic to know what the Unicode code point for ک is.
I believe it's because \w+ doesn't match a URL-encoded string. Try changing it temporarily to .* (just to check that there are no other issues). If that works, change \w+ to a pattern that better matches URL-encoded strings.
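Both suggestions can be checked with plain re. The sketch below assumes Python 3, where \w is Unicode-aware for str patterns. The encoded value is the one from the traceback, and the "relaxed" pattern is a hypothetical variant, not something from the answers.

```python
import re
from urllib.parse import unquote

# Value copied from the traceback in the question.
encoded = '%D9%86%D8%A7%D9%86%20%D8%A8%D9%88%D8%B1%DA%A9'
decoded = unquote(encoded)  # two Arabic words separated by a space

original = re.compile(r'^product/(?P<food_name>\w+)/')
# Hypothetical variant that also allows spaces inside the name.
relaxed = re.compile(r'^product/(?P<food_name>[\w ]+)/')

encoded_result = original.search('product/' + encoded + '/')
decoded_result = original.search('product/' + decoded + '/')
relaxed_result = relaxed.search('product/' + decoded + '/')

print(encoded_result)  # None: '%' is not a word character
print(decoded_result)  # None: the space stops \w+ before the slash
print(relaxed_result.group('food_name') == decoded)  # True
```

So with this particular value, dropping urlencode alone would still leave the space to deal with.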

django 1.7 how to pass arguments to function regular expressions

I am trying to pass an ID of a table to my function but I am not sure what's going on.
If I hard-code the ID number, it works. If I use (?P<rest_id>d+), with d+ so that it accepts as many digits as needed, like in the tutorials, it doesn't work. Should this be different?
thanks guys.
my urls
from django.conf.urls import patterns, include, url
from polls import views
urlpatterns = patterns('',
    #url(r'^main_site/$', views.main_site),
    url(r'^vote/$', views.vote),
    url(r'^stadistics/$', views.stadistics),
    # using it like this doesn't work
    url(r'^vote/Restaurant_Info/(?P<rest_id>d+)/$', views.restaurant_menu),
    # testing the info of the restaurant
    # hard coding the id of the restaurant does work
    url(r'^vote/Restaurant_Info/4/$', views.restaurant_menu),
)
my view
def restaurant_menu(request, rest_id="0"):
    response = HttpResponse()
    try:
        p = Restaurant.objects.get(id=rest_id)
        response.write("<html><body>")
        response.write("<p>name of the restaurant</p>")
        response.write(p.name)
        response.write("</body></html>")
    except Restaurant.DoesNotExist:
        response.write("restaurant not found")
    return response
You're missing a backslash in your expression. Currently, d+ matches the literal character d one or more times. A backslash combined with certain literal characters creates a regular expression token with special meaning.
Therefore, \d+ will match the digits 0 to 9 one or more times.
url(r'^vote/Restaurant_Info/(?P<rest_id>\d+)/$', views.restaurant_menu)
You're missing a backslash. It should be (?P<rest_id>\d+):
url(r"^vote/Restaurant_Info/(?P<rest_id>\d+)/$", views.restaurant_menu),
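You can confirm this outside Django with the re module. A minimal sketch using the two patterns from the question and the answers; the test paths are made up.

```python
import re

without_backslash = re.compile(r'^vote/Restaurant_Info/(?P<rest_id>d+)/$')
with_backslash = re.compile(r'^vote/Restaurant_Info/(?P<rest_id>\d+)/$')

# d+ wants literal "d" characters, so a numeric id never matches.
numeric_miss = without_backslash.search('vote/Restaurant_Info/4/')
literal_hit = without_backslash.search('vote/Restaurant_Info/ddd/')

# \d+ matches one or more digits.
numeric_hit = with_backslash.search('vote/Restaurant_Info/4/')

print(numeric_miss)                  # None
print(literal_hit.group('rest_id'))  # ddd
print(numeric_hit.group('rest_id'))  # 4
```

Note that the captured value is the string '4', not an integer.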

Regular expression in URL for Django slug

I have 2 URL's with a slug field in the URL.
url(r'^genres/(?P<slug>.+)/$', views.genre_view, name='genre_view'),
url(r'^genres/(?P<slug>.+)/monthly/$', views.genre_month, name='genre_month'),
The first one opens fine but the second one gives a DoesNotExist error saying Genres matching query does not exist.
Here is how I'm accessing the 2nd URL in my HTML
<li>Monthly Top Songs</li>
I tried to print the slug in the view. It is passed as genre_name/monthly instead of genre_name.
I think the problem is with the regex in the URLs. Any idea what's wrong here?
Django always uses the first pattern that matches. For urls similar to genres/genre_name/monthly your first pattern matches, so the second one is never used. The truth is the regex is not specific enough, allowing all characters - which doesn't seem to make sense.
You could reverse the order of those patterns, but what you should do is to make them more specific (compare: urls.py example in generic class-based views docs):
url(r'^genres/(?P<slug>[-\w]+)/$', views.genre_view, name='genre_view'),
url(r'^genres/(?P<slug>[-\w]+)/monthly/$', views.genre_month, name='genre_month'),
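Here is a small sketch of why the first pattern swallows the monthly URL and the more specific ones do not. It uses only the re module; "rock" is a made-up slug.

```python
import re

greedy_detail = re.compile(r'^genres/(?P<slug>.+)/$')
strict_detail = re.compile(r'^genres/(?P<slug>[-\w]+)/$')
strict_monthly = re.compile(r'^genres/(?P<slug>[-\w]+)/monthly/$')

path = 'genres/rock/monthly/'  # hypothetical request path

# .+ also matches "/", so the detail pattern captures too much.
greedy_slug = greedy_detail.search(path).group('slug')

# [-\w]+ cannot cross a "/", so only the monthly pattern matches.
strict_detail_result = strict_detail.search(path)
monthly_slug = strict_monthly.search(path).group('slug')

print(greedy_slug)           # rock/monthly
print(strict_detail_result)  # None
print(monthly_slug)          # rock
```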
Edit 2020:
These days (since Django 2.0), you can (and should) use path instead of url. It provides built-in path converters, including slug:
path('genres/<slug:slug>/', views.genre_view, name='genre_view'),
path('genres/<slug:slug>/monthly/', views.genre_month, name='genre_month'),
I believe that you can also drop the _ from the pattern that @Ludwik suggested and revise it to this version (which is one character simpler :) ):
url(r'^genres/(?P<slug>[-\w]+)/$', views.genre_view, name='genre_view'),
url(r'^genres/(?P<slug>[-\w]+)/monthly/$', views.genre_month, name='genre_month'),
Note that \w stands for "word character". It always matches the ASCII characters [A-Za-z0-9_]; notice that this includes the underscore and digits.
In Django >= 2.0, slug is included in URL by doing it like below.
from django.urls import path
urlpatterns = [
    ...
    path('articles/<slug:some_title>/', myapp.views.blog_detail, name='blog_detail'),
    ...
]
Source: https://docs.djangoproject.com/en/2.0/ref/urls/#django.urls.path

Can you explain this simple example from chapter 3 of the Django book (it is about views, url patterns, and regular expressions)?

I am trying to follow what is being explained here: http://www.djangobook.com/en/2.0/chapter03/ (the confusing example is about 4/5 of the way down the page).
The idea is to have a URL in which any one of /time/plus/1, /time/plus/2, /time/plus/3 etc -- all the way up to /time/plus/99 -- could be matched. The book says:
How, then do we design our application to handle arbitrary hour
offsets? The key is to use wildcard URLpatterns. As we mentioned
previously, a URLpattern is a regular expression; hence, we can use
the regular expression pattern \d+ to match one or more digits:
Since we want to stop at 99, the book suggests using the following:
urlpatterns = patterns('',
    # ...
    (r'^time/plus/\d{1,2}/$', hours_ahead),
    # ...
)
But now we are faced with the problem of capturing exactly which number the regular expression matches and using it in our calculations. The book's explanation proceeds:
Now that we’ve designated a wildcard for the URL, we need a way of
passing that wildcard data to the view function, so that we can use a
single view function for any arbitrary hour offset. We do this by
placing parentheses around the data in the URLpattern that we want to
save. In the case of our example, we want to save whatever number was
entered in the URL, so let’s put parentheses around the \d{1,2}, like
this:
(r'^time/plus/(\d{1,2})/$', hours_ahead),
If you’re familiar with regular expressions, you’ll be right at home
here; we’re using parentheses to capture data from the matched text.
Okay, I understand that the data is being captured -- but where is it being stored? How does Django know that it needs to pass the captured data to the hours_ahead function? Indeed, one commentator on the books website even asks the following question:
It's not clear from the description how saving the number entered in
the URL - by putting parentheses around d{1,2} - allows this value to
be passed as a parameter to hours_ahead.
Can you explain how the captured data from the URL gets passed to the hours_ahead function?
In case you're interested, here is the function from the views.py file:
from django.http import Http404, HttpResponse
import datetime
def hours_ahead(request, offset):
    try:
        offset = int(offset)
    except ValueError:
        raise Http404()
    dt = datetime.datetime.now() + datetime.timedelta(hours=offset)
    html = "<html><body>In %s hour(s), it will be %s.</body></html>" % (offset, dt)
    return HttpResponse(html)
The parentheses form what is called a regular expression capturing group, as you allude to in your question.
I don't have the Django code in front of me, but you could actually do something like this yourself using the re module and an asterisk.
import re
def myfunc1arg(arg1):
    print("I got passed the argument", arg1)
def myfunc2args(arg1, arg2):
    print("I got passed the arguments", arg1, "and", arg2)
# groups() returns the captured strings as a tuple; * unpacks them into arguments.
myfunc1arg(*re.match("args/(.*)", "args/hello").groups())
myfunc2args(*re.match("args/(.*)/(.*)", "args/hello/world").groups())
The first argument to re.match is the regular expression (the pattern you are putting in your code) and the second argument is the "url". Django uses a line of code like the above to pull out the groups in parentheses (the capturing groups) and pass them to your function.
ETA: If you're interested, here (lines 195-209) is the specific Django code that captures the regular expressions from the URL path:
def resolve(self, path):
    match = self.regex.search(path)
    if match:
        # If there are any named groups, use those as kwargs, ignoring
        # non-named groups. Otherwise, pass all non-named arguments as
        # positional arguments.
        kwargs = match.groupdict()
        if kwargs:
            args = ()
        else:
            args = match.groups()
        # In both cases, pass any extra_kwargs as **kwargs.
        kwargs.update(self.default_args)
        return ResolverMatch(self.callback, args, kwargs, self.name)
The ResolverMatch holds args, a tuple of positional arguments, and kwargs, a dictionary of keyword arguments (populated from named capturing groups).
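To tie this back to the book's example, here is a simplified sketch of the same named-versus-unnamed rule. The dispatch helper is hypothetical and is not Django's API. The view is also simplified to take only the offset, with no request object.

```python
import re

def dispatch(pattern, path, view):
    """Hypothetical helper: call view with the groups captured from path."""
    match = re.search(pattern, path)
    if match is None:
        return None  # no match, so the view is never called
    kwargs = match.groupdict()
    # Named groups win; otherwise fall back to positional groups.
    args = () if kwargs else match.groups()
    return view(*args, **kwargs)

def hours_ahead(offset):
    # Simplified stand-in for the book's view (no request argument).
    return 'offset=%d' % int(offset)

positional = dispatch(r'^time/plus/(\d{1,2})/$', 'time/plus/7/', hours_ahead)
named = dispatch(r'^time/plus/(?P<offset>\d{1,2})/$', 'time/plus/12/', hours_ahead)
too_long = dispatch(r'^time/plus/(\d{1,2})/$', 'time/plus/123/', hours_ahead)

print(positional)  # offset=7
print(named)       # offset=12
print(too_long)    # None
```

With a named group, the group name has to match the view's parameter name, because the value arrives as a keyword argument.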
For regular expressions in general, brackets () indicate that the portion of the string that matches the regex inside the brackets (in this case (\d{1,2})) is captured.
By captured, I mean that it is saved so that it can be retrieved later.
This is commonly used in (e.g.) find/replace operations: say I wanted to turn "I am 39 years old" to "Ewww, you're 39!". We have to not only match I am xx years old, but also capture the 39 so that we can use it later.
Typically then, you could do a find/replace with find being I am (\d{1,2}) years old, and replace with Ewww, you're \1!. The brackets in the find expression mean "save whatever is in these brackets because we want to use it later", and the \1 in the replace expression means "put the first thing we saved back in here."
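In Python, that find/replace can be written with re.sub. A minimal sketch of exactly the example above:

```python
import re

sentence = "I am 39 years old"

# (\d{1,2}) captures the age; \1 in the replacement puts it back.
result = re.sub(r"I am (\d{1,2}) years old", r"Ewww, you're \1!", sentence)

print(result)  # Ewww, you're 39!
```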
For Django, (r'some.regex.here.including.parentheses', FUNCTION) means that anything that is saved (i.e. surrounded by brackets) in the regex is passed as an argument to the function.