Django: safe and unsafe HTML <a> tags with template-supplied URLs

Since HTML5 does not require attribute values to be enclosed in double quotes, I used to omit them for simplicity. For example, I use:
<a href=/someURL/someArgs/>Link to some URL</a>
instead of
<a href="/someURL/someArgs/">Link to some URL</a>
However, I happened to read a document written by a security hacker, which indicates that if the URL is supplied by a Django template, there may be security problems. That is:
<a href={{ someURL }}>Link to some URL</a> <!-- Unsafe -->
<a href="{{ someURL }}">Link to some URL</a> <!-- Safe -->
Is that true? What kinds of security problems are there?

In my opinion, both usages provide a similar way to insert malicious code. It all depends on whether you always control what's in someURL or whether it is based on input from your web users.
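One concrete difference is worth checking, though. Django's autoescaping (built on Python's html.escape since Django 3.0) escapes &, <, >, " and ', but not spaces. The sketch below uses html.escape as a stand-in for {{ someURL }} and Python's html.parser to list the attributes each form produces; the payload and helper names are hypothetical. Neither form stops a value like javascript:alert(1), which is why the origin of someURL matters, but only the unquoted form lets a space start a brand-new attribute.

```python
from html import escape
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collects the attribute names an HTML parser sees on each <a> tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.names.extend(name for name, _ in attrs)


def render_link(url, quoted):
    # Stand-in for {{ someURL }} with autoescaping on:
    # & < > " ' are escaped, spaces are not.
    value = escape(url, quote=True)
    href = f'"{value}"' if quoted else value
    return f"<a href={href}>Link to some URL</a>"


def attribute_names(markup):
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.names


if __name__ == "__main__":
    payload = "/x onmouseover=alert(1)"  # hypothetical attacker-controlled value
    for quoted in (False, True):
        markup = render_link(payload, quoted)
        print(markup, attribute_names(markup))
```

With the unquoted template the parser sees both href and onmouseover; with quotes, the whole payload stays inside href.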

Related

Django view getting called twice (double GET request)

I'm creating a classifieds website in Django. A single view function handles global listings, city-wise listings, barter-only global listings and barter-only city-wise listings. This view is called ads.
The url patterns are written in the following order (note that each has a unique name although it's tied to the same ads view):
from django.conf.urls import patterns, url  # patterns() exists only before Django 1.10
from .views import ads  # assumed location of the ads view

urlpatterns = patterns('',
    url(r'^buy_and_sell/$', ads, name='classified_listing'),
    url(r'^buy_and_sell/barter/$', ads, name='barter_classified_listing'),
    url(r'^buy_and_sell/barter/(?P<city>[\w.#+-]+)/$', ads, name='city_barter_classified_listing'),
    url(r'^buy_and_sell/(?P<city>[\w.#+-]+)/$', ads, name='city_classified_listing'),
)
The problem is that when I hit the URL named classified_listing in the list above, the ads function gets called twice. Here's what I see in my terminal:
[14/Jul/2017 14:31:08] "GET /buy_and_sell/ HTTP/1.1" 200 53758
[14/Jul/2017 14:31:08] "GET /buy_and_sell/None/ HTTP/1.1" 200 32882
This means double the processing. I thought urls.py uses the first URL pattern that matches. What am I doing wrong, and what's the best way to fix this? All other URLs work as expected, i.e. the view is called only once.
Note: Ask for more information in case I've missed something.
A great explanation for understanding this type of occurrence: https://groups.google.com/d/msg/django-users/CRMMYWix_60/KEIkguUcqxYJ
This issue has nothing to do with how url patterns are ordered in urls.py.
As pointed out in the comments under the question, this has to do with problematic asset references in the HTML template.
What does that mean?
For instance, run curl -i http://localhost:8000/example/ >> output.txt in your terminal, then open output.txt in your editor of choice. Search for href or src attributes whose values are None (or otherwise malformed). That is one way a double call gets created, and it was the cause in my case: once I removed these, the double call disappeared.
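The manual search step can be automated with Python's standard library. This is a minimal sketch, not a complete checker: find_suspicious_refs, RefScanner and AUTO_FETCH are hypothetical names, and it flags only the two patterns reported in this thread, a rendered None inside an href or src, and an auto-fetched URL (img/script/iframe src, link href) that resolves back to the page itself, such as src="#". Plain <a href="#"> links are not flagged because the browser follows them only on click.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

# Elements whose URL attribute the browser fetches automatically on page load.
# An <a href> is only followed on click, so it is not checked for self-references.
AUTO_FETCH = {"img": "src", "script": "src", "iframe": "src", "link": "href"}


class RefScanner(HTMLParser):
    def __init__(self, page_url):
        super().__init__()
        self.page = urldefrag(page_url).url  # page URL without any #fragment
        self.suspicious = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Skip non-URL attributes and empty or missing values.
            if name not in ("href", "src") or not value:
                continue
            # Resolve relative to the page and drop the fragment, so "#"
            # resolves to the page URL itself.
            target = urldefrag(urljoin(self.page, value)).url
            rendered_none = "None" in value  # e.g. {{ city }} rendered as None
            fetches_self = AUTO_FETCH.get(tag) == name and target == self.page
            if rendered_none or fetches_self:
                self.suspicious.append((tag, name, value, target))


def find_suspicious_refs(html, page_url):
    scanner = RefScanner(page_url)
    scanner.feed(html)
    scanner.close()
    return scanner.suspicious
```

Feeding it the saved output.txt together with the page URL, e.g. http://localhost:8000/buy_and_sell/, reports an entry like src="None/" with its resolved target http://localhost:8000/buy_and_sell/None/, which matches the second request in the question's log.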
There's this old - but relevant - writeup about how to comprehensively diagnose this problem on your machine here: https://groups.google.com/forum/#!msg/django-users/CRMMYWix_60/KEIkguUcqxYJ
Happy testing.
Since I can't comment on other answers, I'll add this for future wanderers: in my case the "problem" was a correctly formed <iframe src="#"...> tag that nonetheless instructed the browser to load the page again. On the Django server the view was rendering twice: once for the original request and again for the hidden iframe element that I used for some modal popups later in the page.
After emptying the src attribute, as in <iframe src=""...>, a second request is no longer made and my modals work fine.
The solution actually comes from the link already posted in earlier answers, https://groups.google.com/forum/#!msg/django-users/CRMMYWix_60/KEIkguUcqxYJ, where it is explained:
Note that it's a URI. That means something that is retrieved. Since
you've used the value "#fff", that will be interpreted by the browser as
a reference to the current page (#fff being an anchor, and not passed to
the server). Ergo, a second request is made.
In other words, the iframe's src="#" (an anchor) instructs the browser to load the same URL again, in my case for the iframe element.
I did have several style elements with #fff colors and whatnot, but those weren't the cause, as browsers are smart enough to recognize that these are not anchors.
Using only the browser, I found these request-initiating href/src attributes easy to track down in the Network tab of the developer tools. In Chrome, just click the Initiator link of the corresponding row; it gives you the exact line in the page source that initiated the request to the same URL.
I struggled with the same problem and just wanted to share my experience with it. I had double requests all over my application, but apart from that everything seemed to work as expected.
What Daniel Rossman pointed out in the comments was also true for my problem. I had a <link rel="shortcut icon" href="#"> in my base template, which caused the double request because # is a reference to the page itself. Once I removed it, I had no more double requests.
Hope this answer can save someone some debugging time.
I got a double request in my view function; in my case, this was the culprit:
<img id="profile-img" src="#" alt="" class="profile-cover">
Setting src="" got rid of the double request. It was a silly thing: I assumed that what applies to <a> must also apply to <img>, but an <img> actually sends another request.

Modify (potentially) many URLs within an HTML document in C++

I'm given a string which contains the contents of an HTML document, and I need to modify some of the URLs contained within the document. The URLs which need modification begin with the form:
<script src="https://foo.com/some/variable/path/to/file.js" ...
And must be modified to:
<script src="https://foo.com/some/variable/path/to/NEW/file.js" ...
My current approach has been to use the GlobalReplace function from Google's RE2 with the regexp:
"(?i)(<script\\s+(?:[^>]+\\s+)?src=[\"']https://foo\\.com/"
"(?:.*?/)*?)(.*?\\.js[\"'][^>]*>)"
This almost works, but I then realized that the HTML I'm given might already have some of the URLs modified and some not; the already-modified ones should be left alone.
Question: What's the easiest way to go about modifying the URLs without modifying the ones that have already been modified upstream?
A single pass approach is essential.
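One single-pass idea, sketched here with Python's re rather than RE2: make the pattern optionally consume an existing NEW/ segment and always write NEW/ back, so URLs that were already modified come out unchanged. The pattern avoids lookaround and backreferences, which RE2 does not support, so the same expression with the rewrite string "\\1NEW/\\2" should carry over to RE2::GlobalReplace, but that port is an untested assumption. foo.com and NEW are the placeholders from the question; add_new_dir is a hypothetical name.

```python
import re

# foo.com and NEW are the placeholders used in the question.
SCRIPT_SRC = re.compile(
    r"""(?i)(<script\s+(?:[^>]+\s+)?src=["']https://foo\.com/(?:[^"'>/]*/)*?)"""
    r"""(?:NEW/)?"""  # swallow an existing NEW/ so it is written back exactly once
    r"""([^/"'>]*\.js["'][^>]*>)"""
)


def add_new_dir(html: str) -> str:
    """Insert NEW/ before the file name of matching script URLs, idempotently."""
    return SCRIPT_SRC.sub(r"\1NEW/\2", html)
```

Because the lazy directory group stops at the shortest prefix that still lets the file name match, the optional (?:NEW/)? gets the first chance to consume an existing NEW/. Each path segment excludes "/", which keeps backtracking linear.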

Making sure custom tag does not have subtags

I am building a custom tag to wrap Glyphicons.
<b:icon binding="i" />
Part of the Glyphicon spec includes:
Only for use on empty elements
Icon classes should only be used on elements that contain no text
content and have no child elements.
I want to make sure no one does something like
<b:icon binding="i">
<cfset myVariable++>
</b:icon>
Is there a way to make sure a custom tag does not have any inner tags?
Well you have two options that I can see.
First, throw an exception if thisTag.executionMode is anything other than "start". Or one could likewise throw an exception if thisTag.hasEndTag is true. However this will restrict tag usage to:
<b:icon binding="i">
And not:
<b:icon binding="i" />
Because /> is shorthand for an end-tag. This is less than ideal, and you perhaps won't accept that as an approach.
Secondly, you can check whether there's any generatedContent, but this is a bit haphazard, because it's entirely possible to have something between the start and end tags that is careful not to generate content:
<b:icon binding="i"><cfset foo="bar"></b:icon>
(note: even the new lines and indentation would count as generatedContent if there were any).
Bottom line: whilst JSP custom tags allow for control of this sort of thing, I cannot see how it can be controlled by the CFML implementation. The closest you can get is to prohibit closing tags entirely.

Ember: is there an alternative way of binding element attributes?

I have dynamically created Ember views that are assembled from sub-parts stored in the DB. I'm also using Jsoup to modify the template to include some other non-Ember parts. Unfortunately, when a stored Ember part contains an attribute binding written the usual way:
<li {{bindAttr class="isCompleted:completed isEditing:editing"}}>
Jsoup tries to "fix" it by adding empty quotes, which is of course expected behavior:
<li {{bindattr="" class="isCompleted:completed isEditing:editing" }}="">
Is there any way of binding the attributes, e.g. by wrapping the binding in some valid HTML such as data-ember='{{bindAttr "something"}}', or at least a way to prevent Jsoup from making these changes?
The problem here is (as you surely already know) that Jsoup parses your HTML markup including the Handlebars expressions, and in doing so it checks for valid HTML. In the case of bindAttr, the expression is interpreted as an attribute of your <li> tag, and because a valid attribute looks like class="foo", Jsoup converts it to bindAttr="".
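The same effect can be reproduced with any spec-following HTML parser. The sketch below uses Python's html.parser purely as a stand-in (it is not Jsoup, and StartTagEcho/reserialize are hypothetical helpers): the Handlebars tokens come back as valueless attributes, and writing every attribute with a quoted value yields exactly the markup from the question.

```python
from html import escape
from html.parser import HTMLParser


class StartTagEcho(HTMLParser):
    """Re-serializes start tags, giving every attribute an explicit quoted value."""

    def __init__(self):
        super().__init__()
        self.attrs = []
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.attrs.extend(attrs)
        # Valueless attributes arrive with value None and are written as name="".
        rendered = " ".join(f'{name}="{escape(value or "")}"' for name, value in attrs)
        self.tags.append(f"<{tag} {rendered}>" if rendered else f"<{tag}>")


def reserialize(markup):
    parser = StartTagEcho()
    parser.feed(markup)
    parser.close()
    return parser.tags, parser.attrs


tags, attrs = reserialize('<li {{bindAttr class="isCompleted:completed isEditing:editing"}}>')
print(tags[0])
```

The parser sees three attributes: {{bindattr (lowercased, no value), class, and }} (no value). Any tool that round-trips the markup through a DOM will therefore struggle to preserve the expression as written.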
Unfortunately, there is no built-in way of telling Jsoup to leave attributes without values alone. I guess you should try another tool that fits your needs.
Hope it helps.

Markdown and XSS

Ok, so I have been reading about Markdown here on SO and elsewhere, and the steps between user input and the DB are usually given as:
convert markdown to html
sanitize html (w/whitelist)
insert into database
but to me it makes more sense to do the following:
sanitize markdown (remove all tags - no exceptions)
convert to html
insert into database
Am I missing something? This seems to me to be pretty nearly XSS-proof.
Please see this link:
http://michelf.com/weblog/2010/markdown-and-xss/
> hello <a name="n"
> href="javascript:alert('xss')">*you*</a>
Becomes
<blockquote>
<p>hello <a name="n"
href="javascript:alert('xss')"><em>you</em></a></p>
</blockquote>
∴ you must sanitize after converting to HTML.
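A whitelist pass after conversion can be sketched with Python's standard library. This is a minimal illustration, not a vetted sanitizer: ALLOWED_TAGS, ALLOWED_ATTRS and ALLOWED_SCHEMES are hypothetical policy choices, and production code should rely on a maintained sanitizing library. Fed the converted HTML above, it keeps the <a> element but drops both the name attribute and the javascript: href.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical policy; adjust to the Markdown features you support.
ALLOWED_TAGS = {"p", "em", "strong", "a", "blockquote", "ul", "ol", "li", "code", "pre"}
ALLOWED_ATTRS = {"a": {"href", "title"}}
ALLOWED_SCHEMES = {"", "http", "https", "mailto"}  # "" = relative URL


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag; any text inside it is still escaped below
        kept = []
        for name, value in attrs:  # values arrive already entity-decoded
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            if name == "href" and urlsplit(value.strip()).scheme not in ALLOWED_SCHEMES:
                continue  # e.g. javascript: or data: URLs
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text (including the body of a dropped <script>) is emitted as inert text.
        self.out.append(escape(data, quote=False))


def sanitize(html):
    parser = WhitelistSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Comments are dropped because the parser's default handle_comment does nothing, and everything that survives is rebuilt from scratch rather than copied through, which is the point of sanitizing the HTML instead of the Markdown.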
There are two issues with what you've proposed:
I don't see a way for your users to be able to format posts. You took advantage of Markdown to provide nice numbered lists, for example. In the proposed no-tags-no-exceptions world, I'm not seeing how the end user would be able to do such a thing.
Considerably more important: when using Markdown as the "native" formatting language and whitelisting the other available tags, you are limiting not just the input side of the world, but the output as well. In other words, if your display engine expects Markdown and only allows whitelisted content out, even if (God forbid) somebody gets to the database and injects some nasty malware-laden code into a bunch of posts, the actual site and its users are protected because you are sanitizing it upon display, as well.
There are some good resources on the web about output sanitization:
Sanitizing user data: Where and how to do it
Output sanitization (One of my clients, who shall remain nameless and whose affected system was not developed by me, was hit with this exact worm. We have since secured those systems, of course.)
BizTech: Best Practices: Never heard of XSS?
Well, certainly removing or escaping all tags would make a markup language more secure. However, the whole point of Markdown is that it allows users to include arbitrary HTML tags as well as its own forms of markup(*). When you are allowing HTML, you have to clean/whitelist the output anyway, so you might as well do it after the Markdown conversion to catch everything.
*: It's a design decision I don't agree with at all, and one that I think has not proven useful at SO, but it is a design decision and not a bug.
Incidentally, step 3 should be ‘output to page’; this normally takes place at the output stage, with the database containing the raw submitted text.
insert into database
convert markdown to html
sanitize html (w/whitelist)
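use strict;
use warnings;
use Text::Markdown ();
use HTML::StripScripts::Parser ();

# Convert Markdown to HTML, then filter the HTML against a whitelist.
sub markdown_to_safe_html {
    my ($markdown) = @_;
    my $hss = HTML::StripScripts::Parser->new(
        {
            Context        => 'Document',
            AllowSrc       => 0,    # no auto-fetched src attributes (e.g. <img src>)
            AllowHref      => 1,    # allow link hrefs
            AllowRelURL    => 1,    # allow relative URLs as well as absolute ones
            AllowMailto    => 1,    # allow mailto: links
            EscapeFiltered => 1,    # show rejected tags as escaped text
        },
        strict_comment => 1,        # passed through to HTML::Parser
        strict_names   => 1,
    );
    return $hss->filter_html( Text::Markdown::markdown($markdown) );
}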
convert markdown to html
sanitize html (w/whitelist)
insert into database
Here, the assumptions are:
Given dangerous HTML, the sanitizer can produce safe HTML.
The definition of safe HTML will not change, so if it is safe when I insert it into the DB, it is safe when I extract it.
sanitize markdown (remove all tags - no exceptions)
convert to html
insert into database
Here, the assumptions are:
Given dangerous markdown, the sanitizer can produce markdown that when converted to HTML by a different program will be safe.
The definition of safe HTML will not change, so if it is safe when I insert it into the DB, it is safe when I extract it.
The markdown sanitizer has to know not just about dangerous HTML and dangerous markdown, but how the markdown->HTML converter does its job. That makes it more complex, and more likely to be wrong than the simpler unsafeHTML->safeHTML function above.
As a concrete example, "remove all tags" assumes you can identify tags, and would not work against UTF-7 attacks. There might be other encoding attacks out there that render this assumption moot, or there might be a bug that causes the markdown->HTML program to convert the sequence (full-width '<', exotic white-space characters stripped by markdown, SCRIPT) into a <script> tag.
The most secure would be:
sanitize markdown (remove all tags - no exceptions)
convert markdown to HTML
sanitize HTML
insert into a DB column marked risky
re-sanitize HTML every time you fetch that column from the DB
That way, when you update your HTML sanitizer, you get protection against any newly discovered attacks. This is often inefficient, but you can get pretty good security by storing a timestamp with the inserted HTML, so that you can tell which rows might have been inserted while someone knew about an attack that gets past your sanitizer.
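The timestamp idea might look like the following. This is a sketch under stated assumptions: StoredPost, fetch_html and the exposure windows (when an attack became known, when the sanitizer was patched) are hypothetical, and sanitize stands for whatever updated HTML sanitizer you use. always=True corresponds to re-sanitizing on every fetch; otherwise only rows inserted inside a known exposure window are re-sanitized.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List, Tuple

# (attack became known, sanitizer was patched) for each bypass you learn about.
ExposureWindow = Tuple[datetime, datetime]


@dataclass
class StoredPost:
    html: str              # HTML that passed the sanitizer in use when it was inserted
    inserted_at: datetime  # timestamp stored alongside the risky column


def inserted_during_exposure(post: StoredPost, windows: List[ExposureWindow]) -> bool:
    return any(known <= post.inserted_at < patched for known, patched in windows)


def fetch_html(
    post: StoredPost,
    sanitize: Callable[[str], str],
    windows: List[ExposureWindow],
    always: bool = False,
) -> str:
    # Rows inserted while a known bypass was unpatched go through the
    # current sanitizer again; always=True re-sanitizes every row.
    if always or inserted_during_exposure(post, windows):
        return sanitize(post.html)
    return post.html
```

The trade-off is the one described above: re-sanitizing everything is simplest and safest, while the windowed check skips work for rows that predate any known bypass.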