Django - Decoding MIME Header with Base64 and UTF-8 - django

I am creating a web email interface to read IMAP accounts. I'm having problems decoding a certain email header.
I obtain the following From header (specific example from an event email):
('"=?UTF-8?B?QmVubnkgQmVuYXNzaQ==?=" <NOREPLY#NOREPLY.LOCKNLOADEVENTS.COM>', None)
I separate the first part:
=?UTF-8?B?QmVubnkgQmVuYXNzaQ==?=
According to some research, it's aparently a Base64-encoded UTF-8 header.
I tried to decode it using the Base64 decoder:
# Separate sender name from email itself
first_part = header_text[1:header_text.index('" <')]
print "First part:", first_part
import base64
decoded_first_part = base64.urlsafe_b64decode(first_part)
print decoded_first_part
But I obtain a
TypeError: Incorrect padding.
Can anybody help me figure out what's wrong?
Thank you

>>> import base64
>>> base64.decodestring('QmVubnkgQmVuYXNzaQ==')
'Benny Benassi'
But you probably want to use a proper IMAP library for doing this stuff.

Related

Decoding and encoding JSON in Django

I was following some django rest framework tutorials and found some obscure codes. This snippet is from the customised user model, the project from which uses jwt for authentication.
As I commented in the snippet, I can't notice the reason Why they first encodes data and decode it again. I thought this kind of pattern is not only specific to this tutorial, but quite a general pattern. Could anyone explain me please?
def _generate_jwt_token(self):
"""
Generates a JSON Web Token that stores this user's ID and
has an expiry date set to 60 days into the future.
"""
dt = datetime.now() + timedelta(days=60)
token = jwt.encode({ #first encode here
'id': self.pk,
'exp': int(dt.strftime('%s'))
}, settings.SECRET_KEY, algorithm='HS256')
return token.decode('utf-8') #returns decoded object
“Encoding” usually refers to converting data to its binary representation (bytes).
JWT (JSON Web Token) encoding uses a specific data structure and cryptographic signing to allow secure, authenticated exchanges.
The steps to encode data as JWT are as follows :
The payload is converted to json and encoded using base64.
A header, specifying the token type (eg. jwt) and the signature algorithm to use (eg. HS256), is encoded similarly.
A signature is derived from your private key and the two previous values.
Result is obtained by joining header, payload and signature with dots. The output is a binary string.
More informations here.
Decoding it using UTF-8 transforms this binary string into an Unicode string :
>>> encoded_bin = jwt.encode({'some': 'data'}, 'secret_sig', algorithm='HS256')
>>> type(encoded_bin)
<class 'bytes'>
>>> encoded_string = encoded_bin.decode('utf-8')
>>> type(encoded_string)
<class 'str'>
Notes:
It is not always possible to decode bytes to string. Base64-encoding your data allows you to store any bytes as a text representation, but the encoded form requires more space (+33%) than it's raw representation.
A binary string is prefixed by a b in your Python interpreter (eg. b"a binary string")

scraping chinese characters python

I learnt how to scrap website from https://automatetheboringstuff.com. I wanted to scrap http://www.piaotian.net/html/3/3028/1473227.html in which the contents is in chinese and write its contents into a .txt file. However, the .txt file contains random symbols which I assume is a encoding/decoding problem.
I've read this thread "how to decode and encode web page with python?" and figured the encoding method for my site is "gb2312" and "windows-1252". I tried decoding in those two encoding methods but failed.
Can someone kindly explain to me the problem with my code? I'm very new to programming so please let me know my misconceptions as well!
Also, when I remove the "html.parser" from the code, the .txt file turns out to be empty instead of having at least symbols. Why is this the case?
import bs4, requests, sys
reload(sys)
sys.setdefaultencoding("utf-8")
novel = requests.get("http://www.piaotian.net/html/3/3028/1473227.html")
novel.raise_for_status()
novelSoup = bs4.BeautifulSoup(novel.text, "html.parser")
content = novelSoup.select("br")
novelFile = open("novel.txt", "w")
for i in range(len(content)):
novelFile.write(str(content[i].getText()))
novel = requests.get("http://www.piaotian.net/html/3/3028/1473227.html")
novel.raise_for_status()
novel.encoding = "GBK"
novelSoup = bs4.BeautifulSoup(novel.text, "html.parser")
out:
<br>
一元宗,坐落在青峰山上,绵延极长,现在是盛夏时节,天空之中,太阳慢慢落了下去,夕阳将影子拉的很长。<br/>
<br/>
一片不是很大的小湖泊边上,一个约莫着十七八岁的青衣少年坐在湖边,抓起湖边的一块石头扔出,顿时在湖边打出几朵浪花。<br/>
<br/>
叶希文有些茫然,他没想到,他居然穿越了,原本叶希文只是二十一世纪的地球上一个普通的大学生罢了,一个月了,他才后知后觉的反应过来,这不是有人和他进行恶作剧,而是,他真的穿越了。<br/>
Requests will automatically decode content from the server. Most
unicode charsets are seamlessly decoded.
When you make a request, Requests makes educated guesses about the
encoding of the response based on the HTTP headers. The text encoding
guessed by Requests is used when you access r.text. You can find out
what encoding Requests is using, and change it, using the r.encoding
property:
>>> r.encoding
'utf-8'
>>> r.encoding = 'ISO-8859-1'
If you change the encoding, Requests will use the new value of
r.encoding whenever you call r.text.

Multipart httpresponse with django

I would like some help about my code. My goal is to send at the same time string variables as a ini plain text and a bmp file in an httpResponse.
For the moment I insert the decoded bytes of the bmp file in an ini parameter, take into account that I communicate with an interphone which is only client but not server so I can only make httpresponses but no requests.
If I base64 encode my image, I'll need to change the software of our interphone to decode it, for the moment I can't, can you tell me if base64 encode bytes is mandatory in my case ?
I made some researches on the web and I saw that people base64 encode their images or they make multipart response.
Could you help me to implement a multipart response please, even hand made, that would interest me ?
I show you how I do for the moment, I put the image in the "string" ini parameter:
def send_bmp():
outputConfig = io.StringIO()
outputConfig.write('[RETURN_INFO]\r\n')
outputConfig.write('config_id=255\r\n')
outputConfig.write('config_type=2\r\n')
outputConfig.write('action=3\r\n')
outputConfig.write('[DATABASE]\r\n')
file = open(django_settings.TMP_DIR+'/qrcode.bmp', 'rb').read()
outputConfig.write('size_all='+str(len(file))+'\r\n')
outputConfig.write('string='+file.decode('iso-8859-1')+'\r\n')
outputConfig.write('csum='+str(sum(file))+'\r\n')
body = outputConfig.getvalue()
httpR = HttpResponse(body, content_type='text/plain;charset=iso-8859-1')
httpR['Content-Length'] = len(body)
return httpR
Here is the response I get :
https://gist.github.com/Ezekiah/e6fd50f13c05f338f27a
If you need to mix the image file content with the rest of the response I think you have to use Base64 encoding. If it is possible to return the ini parameters in one request and the file in another Django provides a FileResponse class(subclass of StreamingHttpResponse) that you can use to return the bmp file in chunks, like this:
from django.http import FileResponse
def send_bmp(request):
file = open(django_settings.TMP_DIR+'/qrcode.bmp', 'rb')
return FileResponse(file)

Tweepy location on Twitter API filter always throws 406 error

I'm using the following code (from django management commands) to listen to the Twitter stream - I've used the same code on a seperate command to track keywords successfully - I've branched this out to use location, and (apparently rightly) wanted to test this out without disrupting my existing analysis that's running.
I've followed the docs and have made sure the box is in Long/Lat format (in fact, I'm using the example long/lat from the Twitter docs now). It looks broadly the same as the question here, and I tried using their version of the code from the answer - same error. If I switch back to using 'track=...', the same code works, so it's a problem with the location filter.
Adding a print debug inside streaming.py in tweepy so I can see what's happening, I print out the self.parameters self.url and self.headers from _run, and get:
{'track': 't,w,i,t,t,e,r', 'delimited': 'length', 'locations': '-121.7500,36.8000,-122.7500,37.8000'}
/1.1/statuses/filter.json?delimited=length and
{'Content-type': 'application/x-www-form-urlencoded'}
respectively - seems to me to be missing the search for location in some way shape or form. I don't believe I'm/I'm obviously not the only one using tweepy location search, so think it's more likely a problem in my use of it than a bug in tweepy (I'm on 2.3.0), but my implementation looks right afaict.
My stream handling code is here:
consumer_key = 'stuff'
consumer_secret = 'stuff'
access_token='stuff'
access_token_secret_var='stuff'
import tweepy
import json
# This is the listener, resposible for receiving data
class StdOutListener(tweepy.StreamListener):
def on_data(self, data):
# Twitter returns data in JSON format - we need to decode it first
decoded = json.loads(data)
#print type(decoded), decoded
# Also, we convert UTF-8 to ASCII ignoring all bad characters sent by users
try:
user, created = read_user(decoded)
print "DEBUG USER", user, created
if decoded['lang'] == 'en':
tweet, created = read_tweet(decoded, user)
print "DEBUG TWEET", tweet, created
else:
pass
except KeyError,e:
print "Error on Key", e
pass
except DataError, e:
print "DataError", e
pass
#print user, created
print ''
return True
def on_error(self, status):
print status
l = StdOutListener()
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret_var)
stream = tweepy.Stream(auth, l)
#locations must be long, lat
stream.filter(locations=[-121.75,36.8,-122.75,37.8], track='twitter')
The issue here was the order of the coordinates.
Correct format is:
SouthWest Corner(Long, Lat), NorthEast Corner(Long, Lat). I had them transposed. :(
The streaming API doesn't allow to filter by location AND keyword simultaneously.
you must refer to this answer i had the same problem earlier
https://stackoverflow.com/a/22889470/4432830

How to parse the email message and extract data from that message using Python?

I want to parse an email message and want to extract the data from that email using Python, can anyone suggest the code to achieve that.
Thanks in advance.
You can import the email and email.policy using
import email
import email.policy
and then use this statement
email.parser.BytesParser(policy=email.policy.default).parse(f)
where f is the email file directory.