I'm currently working on my blog site. One of the most important things for its appearance is a multi-language syntax highlighter, so I decided to use the Pygments library and wrote some code:
from django import template
from pygments import highlight
from pygments.formatters.html import HtmlFormatter
from pygments.lexers import get_lexer_by_name, guess_lexer
from django.utils.safestring import mark_safe
from bs4 import BeautifulSoup

register = template.Library()

@register.filter(is_safe=True)
def highlighter(content):
    soup = BeautifulSoup(unicode(content))
    codeBlocks = soup.findAll(u'code')
    for i,block in enumerate(codeBlocks):
        if block.has_attr(u'class'):
            language = block[u'class']
        else:
            language = u'text'
        try:
            lexer = get_lexer_by_name(language[0])
        except ValueError:
            try:
                lexer = guess_lexer(unicode(block))
            except ValueError:
                lexer = get_lexer_by_name(language[0])
        highlighting = highlight(unicode(block), lexer, HtmlFormatter())
        block.replaceWith(highlighting)
    return mark_safe(unicode(soup))
In my template I use something like this:
<p>{{ post.en_post_content|highlighter|safe|linebreaks}}</p>
Highlighting works well, but I can't make the output safe; this is what I receive:
http://i.gyazo.com/a2557c861e20a826b28cb5c261e6020f.png
I'm also worried about special characters like "'".
I need advice on how to deal with this. Thanks in advance for any replies.
The code below is copied from a snippet on Django Snippets. It may have worked fine for the Django version it was written for, but I now want to use it with the latest version of Django (2.0) and Python 3. This part of the snippet raises the following error:
return template.mark_safe(''.join(map(template.force_unicode,
AttributeError: module 'django.template' has no attribute 'mark_safe'
def render(self, context):
    return template.mark_safe(''.join(map(template.force_unicode,
                                          _render_nodelist_items(self,context))))

template.NodeList.render = render
If possible, please make it work with Django 2.0, as I need to use it in multiple places in my project.
Try the following:
from django.utils.encoding import force_text
from django.utils.safestring import mark_safe

def render(self, context):
    # Pass force_text itself to map(); do not call it here.
    return mark_safe(
        ''.join(map(force_text, _render_nodelist_items(self, context)))
    )
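The important point in the answer above is that map() receives the conversion function itself, not the result of calling it. A minimal standalone sketch of that pattern, written for Python 3 and assuming a hypothetical to_text helper in place of Django's force_text and a plain list in place of the rendered nodes:

```python
def to_text(value):
    """Hypothetical stand-in for force_text: coerce any value to str."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return str(value)


def join_rendered(items):
    # Pass the function object to map(); map() calls it once per item.
    return "".join(map(to_text, items))


rendered = join_rendered(["<p>", 42, b"</p>"])
print(rendered)
```

Writing map(to_text(items), ...) instead would call the helper once, up front, and hand map() a string rather than a callable.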
As seems to frequently happen here, I am quite new to Python 2.7 and Scrapy. Our project has us scraping website data, following some links and more scraping, and so on. This was all working fine. Then I updated Scrapy.
Now when I launch my spider, I get the following message:
This wasn't coming up anywhere previously (none of my prior error messages looked anything like this). I am now running scrapy 1.1.0 on Python 2.7. And none of the spiders that had previously worked on this project are working.
I can provide some example code if need be, but my (admittedly limited) knowledge of Python suggests to me that it's not even getting to my script before bombing out.
EDIT:
OK, so this code is supposed to start at the first authors page for Deakin University academics on The Conversation, and go through and scrape how many articles they have written and comments they have made.
import scrapy
from ltuconver.items import ConversationItem
from ltuconver.items import WebsitesItem
from ltuconver.items import PersonItem
from scrapy import Spider
from scrapy.selector import Selector
from scrapy.http import Request
import bs4

class ConversationSpider(scrapy.Spider):
    name = "urls"
    allowed_domains = ["theconversation.com"]
    start_urls = [
        'http://theconversation.com/institutions/deakin-university/authors']

    # URL grabber
    def parse(self, response):
        requests = []
        people = Selector(response).xpath('//*[@id="experts"]/ul[*]/li[*]')
        for person in people:
            item = WebsitesItem()
            item['url'] = 'http://theconversation.com/'+str(person.xpath('a/@href').extract())[4:-2]
            self.logger.info('parseURL = %s',item['url'])
            requests.append(Request(url=item['url'], callback=self.parseMainPage))
        soup = bs4.BeautifulSoup(response.body, 'html.parser')
        try:
            nexturl = 'https://theconversation.com'+soup.find('span',class_='next').find('a')['href']
            requests.append(Request(url=nexturl))
        except:
            pass
        return requests

    # go to the URLs and grab the info
    def parseMainPage(self, response):
        person = Selector(response)
        item = PersonItem()
        item['name'] = str(person.xpath('//*[@id="outer"]/header/div/div[2]/h1/text()').extract())[3:-2]
        item['occupation'] = str(person.xpath('//*[@id="outer"]/div/div[1]/div[1]/text()').extract())[11:-15]
        item['art_count'] = int(str(person.xpath('//*[@id="outer"]/header/div/div[3]/a[1]/h2/text()').extract())[3:-3])
        item['com_count'] = int(str(person.xpath('//*[@id="outer"]/header/div/div[3]/a[2]/h2/text()').extract())[3:-3])
And in my Settings, I have:
BOT_NAME = 'ltuconver'
SPIDER_MODULES = ['ltuconver.spiders']
NEWSPIDER_MODULE = 'ltuconver.spiders'
DEPTH_LIMIT=1
Apparently my six.py file was corrupt (or something like that). After swapping it out with the same file from a colleague, it started working again 8-\
My objective is to use pyquery with Scrapy. However, from scrapy.selector import PyQuerySelector raises ImportError: cannot import name PyQuerySelector when I run the spider.
I followed this specific gist https://gist.github.com/joehillen/795180 to implement pyquery.
Any suggestions or tutorials that can help me get this job done?
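Declare a spider class, define your rules, and set the callback attribute of each rule to parse_item (by default Scrapy sends responses to the parse() function). Inside the callback you can build a PyQuery object directly from the response body:
from pyquery import PyQuery

# Both functions below are methods of the spider class.
def parse_item(self, response):
    pyquery_obj = PyQuery(response.body)
    header = self.get_header(pyquery_obj)
    return {
        'header': header,
    }

def get_header(self, pyquery_obj):
    return pyquery_obj('#page_head').text()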
Let's say we have a message like this:
messages.add_message(request, messages.SUCCESS,
    _('Document %(doc_type_name)s %(name)s (%(fname)s) created.') % {
        'doc_type_name': conditional_escape(document.doc_type.name),
        'name': conditional_escape(document.title),
        'fname': conditional_escape(document.name),
        'url': document.get_absolute_url()
    })
This works only if we display the message with {{ message|safe }}, but we don't want that: if there is any markup or script in %(name)s, it will be rendered and executed too.
If I use:
messages.add_message(request, messages.SUCCESS,
    mark_safe(_('Document %(doc_type_name)s %(name)s (%(fname)s) created.') % {
        'doc_type_name': conditional_escape(document.doc_type.name),
        'name': conditional_escape(document.title),
        'fname': conditional_escape(document.name),
        'url': document.get_absolute_url()
    }))
The mark_safe doesn't work.
I read a solution here: https://stackoverflow.com/a/12600388/186202
But it is the reverse that I need here:
_('Document %s created.') % mark_safe('')
And as soon as the string goes through the ugettext function, it is not safe anymore.
How should I do this?
You are trying to mix view and logic by placing HTML inside Python code. Sometimes you just have to do this, but this is not one of those cases.
mark_safe() returns a SafeString object, which Django templates treat specially. If a SafeString is passed through ugettext or the % operator, you get a plain string back; this is expected behaviour. You cannot mark only the format string as safe: either the complete output, including the document name, title and so on, is marked safe, or nothing is. In other words, it will not work this way.
You can put the HTML into a template and use render_to_string(), which is probably the best option.
Are the document title, name and doc_type.name set by the user? If not, you can skip mark_safe and document the use of HTML in document properties as a feature.
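The behaviour described in this answer, where a string marked as safe turns back into a plain string after % formatting, can be reproduced without Django. The following is only a sketch for Python 3: SafeText and build_message are hypothetical names, SafeText is loosely modelled on Django's SafeString rather than being its implementation, and html.escape stands in for conditional_escape.

```python
from html import escape


class SafeText(str):
    """Hypothetical marker type, loosely modelled on Django's SafeString."""


def build_message(template, **values):
    # Escape every interpolated value first, then mark the finished string.
    escaped = {key: escape(str(value)) for key, value in values.items()}
    return SafeText(template % escaped)


marked_template = SafeText("Document %(name)s created.")
formatted = marked_template % {"name": "report"}
# The % operator returns a plain str, so the marker on the template is lost.
print(type(formatted) is str)

message = build_message("Document <b>%(name)s</b> created.", name="<script>")
print(isinstance(message, SafeText), message)
```

The order matters: values are escaped before interpolation, and the marker is applied last, to the complete output. Django's own django.utils.html.format_html follows the same escape-then-mark idea and may be worth considering for real code.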
The previous responses are correct: you should avoid mixing Python and HTML as much as possible.
To solve your issue:
from django.utils import six # Python 3 compatibility
from django.utils.functional import lazy
from django.utils.safestring import mark_safe
from django.utils.translation import ugettext_lazy as _
mark_safe_lazy = lazy(mark_safe, six.text_type)
then:
lazy_string = mark_safe_lazy(_("<p>My <strong>string!</strong></p>"))
Found in the Django documentation:
https://docs.djangoproject.com/en/1.9/topics/i18n/translation/#s-other-uses-of-lazy-in-delayed-translations
I have a model, Order, that has an action in the admin panel that lets an admin send information about the order to certain persons listed on that order. Each person has a language set, and that is the language the message is supposed to be sent in.
A short version of what I'm using:
from django.utils.translation import ugettext as _
from django.core.mail import EmailMessage
lang = method_that_gets_customer_language()
body = _("Dear mister X, here is the information you requested\n")
body += some_order_information
subject = _("Order information")
email = EmailMessage(subject, body, 'customer@example.org', ['admin@example.org'])
email.send()
The customer's language is available in lang. The default language is en-us; the translations are in French (fr) and German (de).
Is there a way to use the translation for the language specified in lang for body and subject then switch back to en-us? For example: lang is 'de'. The subject and body should get the strings specified in the 'de' translation files.
edit:
Found a solution.
from django.utils import translation
from django.utils.translation import ugettext as _
body = "Some text in English"
translation.activate('de')
print "%s" % _(body)
translation.activate('en')
What this does is take the body variable, translate it to German, print it, then return the language to English.
Something like
body = _("Some text in English")
translation.activate('de')
print "%s" % body
prints the text in English though.
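The difference between the two snippets above comes down to when the catalogue lookup happens: at the moment _() is called, or later when the text is actually used. Here is a small Python 3 sketch of that timing, assuming a hypothetical in-memory CATALOG and a hand-written LazyText class; it only imitates the idea behind ugettext and ugettext_lazy and is not how Django implements them.

```python
# Hypothetical catalogue: language code -> {source text: translation}.
CATALOG = {"de": {"Some text in English": "Etwas Text auf Deutsch"}}
_active = "en"


def activate(lang):
    """Switch the (hypothetical) active language."""
    global _active
    _active = lang


def gettext(message):
    # Eager lookup: uses whichever language is active right now.
    return CATALOG.get(_active, {}).get(message, message)


class LazyText:
    """Defers the lookup until the text is converted to a string."""

    def __init__(self, message):
        self.message = message

    def __str__(self):
        return gettext(self.message)


eager = gettext("Some text in English")  # looked up now, while "en" is active
lazy = LazyText("Some text in English")  # looked up only when str() is called

activate("de")
eager_result = str(eager)
lazy_result = str(lazy)
activate("en")

print(eager_result)
print(lazy_result)
```

The eager value was fixed while English was active, so switching languages afterwards cannot change it; the lazy value picks up whichever language is active at the time it is rendered.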
If you're using Python 2.6 (or Python 2.5 after importing with_statement from __future__) you can use the following context manager for convenience.
from contextlib import contextmanager
from django.utils import translation

@contextmanager
def language(lang):
    switched = bool(lang and translation.check_for_language(lang))
    if switched:
        old_lang = translation.get_language()
        translation.activate(lang)
    try:
        yield
    finally:
        # Only restore if the language was actually switched above.
        if switched:
            translation.activate(old_lang)
Example of usage:
# _ must be ugettext_lazy here, so the lookup happens inside the with block.
message = _('English text')
with language('fr'):
    print unicode(message)
This has the benefit of being safe in case something throws an exception, as well as restoring the thread's old language instead of the Django default.
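The two properties mentioned above (restoring the previous language rather than a fixed default, and doing so even when the body raises) can be checked in isolation. This is a Python 3 sketch with a hypothetical _state dictionary standing in for Django's per-thread active language; no Django call is involved. Django also provides django.utils.translation.override, which serves a similar purpose.

```python
from contextlib import contextmanager

# Hypothetical stand-in for the per-thread language that Django tracks.
_state = {"language": "en"}


def current_language():
    return _state["language"]


@contextmanager
def language(lang):
    old_lang = _state["language"]  # remember whatever was active before
    _state["language"] = lang
    try:
        yield
    finally:
        # Runs on normal exit and when the body raises an exception.
        _state["language"] = old_lang


with language("fr"):
    inside = current_language()

try:
    with language("de"):
        raise RuntimeError("sending failed")
except RuntimeError:
    pass

after_error = current_language()
print(inside, after_error)
```

Because the restore sits in a finally clause, an exception raised while sending a message does not leave the process stuck in the customer's language.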
I'm not sure that activating and deactivating translations is the proper way to solve this problem.
If I were facing it, I would try to build a model that stores the subject, body, language and type fields. A draft:
from django.db import models

class ClientMessageTemplate(models.Model):
    language = models.CharField(choices=AVAILABLE_LANGUAGES,...)
    subject = models.CharField(...)
    body = models.CharField(...)
    type = models.CharField(choices=AVAILABLE_MESSAGE_TYPES)
Then you can easily retrieve the ClientMessageTemplate you need based on the type and the client's language.
The advantage of this solution is that all the data is maintainable via the admin interface, and you do not need to recompile the message files each time something changes.
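The lookup by type and language could work roughly as follows. This Python 3 sketch uses a dataclass and an in-memory list instead of the Django model, so MessageTemplate, TEMPLATES, find_template, the "order_info" type and the German text are all illustrative assumptions; the fallback to a default language is also an addition that the draft above does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageTemplate:
    """Hypothetical in-memory counterpart of ClientMessageTemplate."""
    language: str
    type: str
    subject: str
    body: str


# Illustrative data only; in the real project these rows live in the database.
TEMPLATES = [
    MessageTemplate("en-us", "order_info", "Order information",
                    "Dear mister X, here is the information you requested\n"),
    MessageTemplate("de", "order_info", "Bestellinformationen",
                    "Sehr geehrter Herr X, hier sind die angeforderten Informationen\n"),
]


def find_template(templates, message_type, language, default_language="en-us"):
    # Index by (type, language) so each lookup is a single dictionary access.
    by_key = {(t.type, t.language): t for t in templates}
    found = by_key.get((message_type, language))
    if found is None:
        # Fall back to the default language when no translation is stored.
        found = by_key.get((message_type, default_language))
    return found


template = find_template(TEMPLATES, "order_info", "de")
print(template.subject)
```

With the model itself, the same selection would presumably be a filter on the type and language fields, with the fallback handled in the calling code.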