I'm sending emails, using Django framework.
And I wonder is it possible to make subject line red, using <span style="color:red">Some subject line</span>
No, HTML tags will not be rendered in the subject field of an email by RFC2822 compliant clients.
The RFC defines lexical tokens (HTML tags etc.) to be used in the body, not in the header fields. The subject is part of the header fields.
Note that this is not a limitation of Django.
If you want fanciness, you might want to look into including unicode characters, which is becoming more and more popular these days.
Related
I've been trawling the internet trying to find a solution to this issue. Basically I am using a web service provided by the company that runs our support software to retrieve customer tickets and output them (dependent on filtering) through our system so that customers can see from their dashboard which current support tickets they have active. I've managed to get the desired tags from the XML that is returned via the web service and place their content in a html table (therefore listing the active tickets row by row in the table) however, as the ticket description tag is populated with the content from emails sent by clients, there is lots of nasty redundant css and styling that has been applied to the Email that I would like to remove.
So far I have managed to use the 'replace' function to replace some of the redundant content from this email content ->
l_html_build := replace(l_html_build,'<','<');
l_html_build := replace(l_html_build,'>','>');
l_html_build := replace(l_html_build,'<','');
l_html_build := replace(l_html_build,'>','');
l_html_build := replace(l_html_build,' ',' ');
However I now need to overwrite the p tags which have all sorts of garbage added to them so that they just become standard p tags->
From this:
<p 0in;"="" 3.0pt="" padding:="" 1.0pt;="" solid="" border-top:="" none;="" _mce_style=""border:" 0in"="" 0in="" 1.0pt;padding:3.0pt="" #b5c4df="" style=""border:none;border-top:solid">
To this:
<p>
I've looked into using the regEXP function listed here psoug however this appears to require a select statement that is performed each time. The data I need to manipulate is stored in a CLOB called l_html_build so is there any way of adapting the regEXP function to be used in a similar way to the replace function above or is there an alternative method that I am not aware of?
I apologise if this is a noob question. My expertise lies in front end development, PHP and MySQL but unfortunately I'm now required to bits of PL/SQL in my new role.
Any help would be greatly appreciated.
Knowing that:
There is no standard PL/SQL package that parses HTML.
You can't reliably parse HTML with regex. Furthermore, Oracle only support basic regular expressions, restricting its capabilities.
You want to stay in PL/SQL
You are left with few options (that I can think of):
Write a simple procedure yourself that will work in most of the cases (but there will be many exceptions that will break your parser).
Use a java parser, load class in database, call java from PL/SQL. Oracle comes with its integrated jvm, so this involves no extra setup.
I would go with option (2) if you want reliability, or option (1) if infrequent but inevitable losses are acceptable.
Since your content will be coming from email client, we can assume that only a tiny (negligible?) fraction will have very obscure HTML.
In that case you could start with simple regex expressions that may need some tweaking:
SQL> SELECT regexp_replace(
2 '<p1 3.0pt="" padding:="" #b5c4df="">
3 text
4 </p>',
5 '<([[:alpha:]]+)[^>]*>',
6 '<\1>') remove_attr_simple
7 FROM dual;
REMOVE_ATTR_SIMPLE
------------------
<p>
text
</p>
This will fail to catch tricky valid HTML (such as <P attr=">">) but since your input is somewhat standard this should be fine often enough. You may need to remove HTML comments with another procedure -- I'm not sure it can be done with regex.
SQL is really not the best tool for this job. Nor will regexes be able to perform this kind of task reliably. You would be better off extracting the data and processing it in another language using an XML parser.
Presumably Oracle itself is not sending these emails. What program does the sending, and can you add some programmatic processing at that point?
Since you already know PHP, here is a discussion of parsing HTML/XML in PHP. Similar tools are available in most other languages.
Pass a []byte into a template as the body of a message post on a forum-style web app. In the template, call a method to convert to string and along the way, switch out all newlines for line breaks:
<p>{{.BodyString}}</p>
...
func (p *Post) BodyString() string {
nl := regexp.MustCompile(`\n`)
return nl.ReplaceAllString(string(p.Body), `<br>`)
}
What you'll end up with:
paragraphs <br> <br>in <br> <br>this <br> <br>post
I don't want to pass the entire post in with HTML(p.Body), as it represents third party data from potentially untrustworthy sources. Is there a way to whitelist only some tags for formatting purposes using the vanilla Go1 template package?
I do think you want to parse the HTML. The HTML parser in exp/html was deemed incomplete and so removed from Go 1, although the exp tree is still in the Go source tree and can be accessed by weekly tag, for example. I don't know exactly what is incomplete. I used it for a simple task once and it met my needs.
Also of course, check the dashboard and see related SO post, Any smart method to get exp/html back after Go1?, mostly for the recomendation of http://code.google.com/p/go-html-transform/
I'm affraid the template package cannot help with this too much. If you want to remove specific (black-listed) tags (resp. the sub-tree enclosed by such tags) or allow to pass only specific tags (white-listed) then I think probably nothing less than parsing and rewriting the html AST can be a good solution. That said, one can see here and there some crazy REs trying to do the same, but I don't consider that a "good solution" and I doubt they can be a "correct" solution in the general case of a specs conforming HTML, including several legal irregularities, as it is probably ruled out of a regular grammar category problem.
I'm using lib-cURL as a HTTP client to retrieve various pages (can be any URL for that matter).
Usually the data comes as a UTF-8 string and then I just call "MultiByteToWideChar" and it works well.
However, some web-pages still use code-page encoding and I see gibberish if i try to convert those pages to UTF-8.
Is there an easy way to retrieve the code page from the data? or I'll have to scan it manually (for "encoding=") and then translate it accordingly.
If so, how do i get the code-page id from name (Code Page Identifiers)?
Thanks,
Omer
There are several location where a document can state its encoding:
the Content-Type HTTP header
the (optional) XML declaration
the Content-Type meta tag inside the document header
for HTML5 documents the charset meta tag.
There are probably even more I've forgotten.
In the end, detecting the actual encoding is rather hard. You really shouldn't do this yourself but use high-level libraries for retrieving and parsing HTML content. I'm sure they are available even for C++, even if they have to be thiefed from the a browser environment. :)
I used DetectInputCodepage in IMultiLanguage2 interface and it worked great !
In the admin, if you enter in a slug two things are applied through JS:
The string is made slug-friendly
The string is transliterated if the language is not English, so for example Cyrillic Russian text gets converted into Transliterated Russian ( typed out in English )
I'm basically inserting a couple thousand rows and I need to access this. Does django provide a non-js server-side version of this transliterator which I can access to somehow do the insertion?
Looks like I have to port over the usr/lib/pymodules/python2.5/django/contrib/admin/media/js/urlify.js code unless I can figure out a way to programmatically load all articles on the client side and slugify themselves automatically.
Ok, so I have been reading about markdown here on SO and elsewhere and the steps between user-input and the db are usually given as
convert markdown to html
sanitize html (w/whitelist)
insert into database
but to me it makes more sense to do the following:
sanitize markdown (remove all tags -
no exceptions)
convert to html
insert into database
Am I missing something? This seems to me to be pretty nearly xss-proof
Please see this link:
http://michelf.com/weblog/2010/markdown-and-xss/
> hello <a name="n"
> href="javascript:alert('xss')">*you*</a>
Becomes
<blockquote>
<p>hello <a name="n"
href="javascript:alert('xss')"><em>you</em></a></p>
</blockquote>
∴ you must sanitize after converting to HTML.
There are two issues with what you've proposed:
I don't see a way for your users to be able to format posts. You took advantage of Markdown to provide nice numbered lists, for example. In the proposed no-tags-no-exceptions world, I'm not seeing how the end user would be able to do such a thing.
Considerably more important: When using Markdown as the "native" formatting language, and whitelisting the other available tags,you are limiting not just the input side of the world, but the output as well. In other words, if your display engine expects Markdown and only allows whitelisted content out, even if (God forbid) somebody gets to the database and injects some nasty malware-laden code into a bunch of posts, the actual site and its users are protected because you are sanitizing it upon display, as well.
There are some good resources on the web about output sanitization:
Sanitizing user data: Where and how to do it
Output sanitization (One of my clients, who shall remain nameless and whose affected system was not developed by me, was hit with this exact worm. We have since secured those systems, of course.)
BizTech: Best Practices: Never heard of XSS?
Well certainly removing/escaping all tags would make a markup language more secure. However the whole point of Markdown is that it allows users to include arbitrary HTML tags as well as its own forms of markup(*). When you are allowing HTML, you have to clean/whitelist the output anyway, so you might as well do it after the markdown conversion to catch everything.
*: It's a design decision I don't agree with at all, and one that I think has not proven useful at SO, but it is a design decision and not a bug.
Incidentally, step 3 should be ‘output to page’; this normally takes place at the output stage, with the database containing the raw submitted text.
insert into database
convert markdown to html
sanitize html (w/whitelist)
perl
use Text::Markdown ();
use HTML::StripScripts::Parser ();
my $hss = HTML::StripScripts::Parser->new(
{
Context => 'Document',
AllowSrc => 0,
AllowHref => 1,
AllowRelURL => 1,
AllowMailto => 1,
EscapeFiltered => 1,
},
strict_comment => 1,
strict_names => 1,
);
$hss->filter_html(Text::Markdown::markdown(shift))
convert markdown to html
sanitize html (w/whitelist)
insert into database
Here, the assumptions are
Given dangerous HTML, the sanitizer can produce safe HTML.
The definition of safe HTML will not change, so if it is safe when I insert it into the DB, it is safe when I extract it.
sanitize markdown (remove all tags - no exceptions)
convert to html
insert into database
Here the assumptions are
Given dangerous markdown, the sanitizer can produce markdown that when converted to HTML by a different program will be safe.
The definition of safe HTML will not change, so if it is safe when I insert it into the DB, it is safe when I extract it.
The markdown sanitizer has to know not just about dangerous HTML and dangerous markdown, but how the markdown->HTML converter does its job. That makes it more complex, and more likely to be wrong than the simpler unsafeHTML->safeHTML function above.
As a concrete example, "remove all tags" assumes you can identify tags, and would not work against UTF-7 attacks. There might be other encoding attacks out there that render this assumption moot, or there might be a bug that causes the markdown->HTML program to convert (full-width '<', exotic white-space characters stripped by markdown, SCRIPT) into a <script> tag.
The most secure would be:
sanitize markdown (remove all tags - no exceptions)
convert markdown to HTML
sanitize HTML
insert into a DB column marked risky
re-sanitize HTML every time you fetch that column from the DB
That way, when you update your HTML sanitizer you get protection against any newly discovered attacks. This is often inefficient, but you can get pretty good security by storing a timestamp with HTML inserted so that you can tell which might have been inserted during the time when someone knew about an attack that gets past your sanitizer.