Problem reading a file uploaded to a Django view with pandas - django

I'm uploading some Excel and CSV files to a Django view using axios, then passing those files to a function that processes them with pandas' read_csv and read_excel functions. The first problem I had was with some Excel files containing non-UTF-8 characters that pandas was unable to read; the only solution I found was to set engine='python' when reading the file (changing the encoding to utf-8-sig or utf-16 didn't work).
This works when I test the script from my terminal, but when I use the same script in a view I get the following error: ValueError("The 'python' engine cannot iterate through this file buffer.")
This is the code i'm using:
import sys
import pandas

try:
    data = pandas.read_csv(request.FILES['file'], engine="python")
except:
    print("Oops!", sys.exc_info(), "occurred.")
Running the same function from the terminal works fine:
pandas.read_csv("file.csv", engine="python")

Related

Uploading Arabic files in Django not working, returning codes

I have a view in my Django project that should be able to read the content of a .txt file uploaded through an input type="file" element. The problem is that with Arabic content it doesn't print the actual text, but a series of codes like "\xd9\x88\xd9\x82\xd8\xa7\xd9\x84", and I couldn't find any solution for this, since the file is perfectly viewable on my PC and my website is the one exporting that file in UTF-8. Any help here?
Uploaded_File = request.FILES["Doc"]
for chunk in Uploaded_File.chunks(chunk_size=None):
    print(chunk)
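A likely fix, assuming the file really is UTF-8 as described: chunks() yields bytes objects, and printing bytes shows their escape codes rather than the text. Decoding before printing should give the actual Arabic; joining the chunks first avoids splitting a multi-byte character across a chunk boundary.

Uploaded_File = request.FILES["Doc"]
# chunks() yields bytes; join them first so a multi-byte character
# can't be cut in half at a chunk boundary, then decode once.
raw = b"".join(Uploaded_File.chunks(chunk_size=None))
print(raw.decode("utf-8"))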

How to create and write to an XLSX file on Google Cloud Storage

I'm having trouble with my XLSX file process on Google Cloud Storage. The following code is what I have so far:
import cloudstorage
mime = 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
filehandle = cloudstorage.open('/default/temp_export.xlsx', 'w', content_type=mime)
filehandle.write('some data1,some data2\n')
filehandle.write('some data3, somedata4\n')
filehandle.close()
This creates a file temp_export.xlsx in my storage bucket with the XLSX mime type. When I try reading the file back with the following command, it works fine:
import cloudstorage
filehandle = cloudstorage.open('/default/temp_export.xlsx')
print filehandle.read()
# Output:
# some data1,some data2
# some data3, somedata4
But when I go to my storage bucket, download temp_export.xlsx, and try to open it, Excel throws this error:
Excel cannot open this file.
The file format or file extension is not valid.
Verify that the file has not been corrupted and that
the file extension matches the format of the file.
Anyone know what I'm doing wrong or how I can fix it? Thanks.
Any reason for not using the latest library for Cloud Storage? That one is only for App Engine with Python 2.7.
There is an example in the documentation on how to upload an object and how to download it, and then you must write and read your XLSX file with a proper library for the format.
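The underlying problem is that the code writes plain CSV text into an object that merely has an .xlsx name and MIME type; Excel rejects it because the contents are not actual XLSX (zip) data. A minimal sketch of the suggested approach, assuming openpyxl and the google-cloud-storage client library are installed (the bucket name 'default' is taken from the path above and may differ):

from openpyxl import Workbook
from google.cloud import storage

# Build a real XLSX workbook instead of writing CSV text by hand.
wb = Workbook()
ws = wb.active
ws.append(['some data1', 'some data2'])
ws.append(['some data3', 'somedata4'])
wb.save('temp_export.xlsx')

# Upload the finished file as an object.
client = storage.Client()
bucket = client.bucket('default')  # bucket name assumed from the path above
bucket.blob('temp_export.xlsx').upload_from_filename(
    'temp_export.xlsx',
    content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')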

django csv import encoding

I am using the Django csv import (https://pypi.python.org/pypi/django-csvimport) to populate some models. The problem is that my csv files are encoded in ANSI (Windows-1252) format and contain words with special characters, e.g. JOSÉ; when I import into my models the word becomes JOSи.
Could you help me with this?
P.S.:
1 - I have filled in the encoding field of the csv import with many options (ansi, utf-8...) but it seems to have no effect.
2 - I have tried converting my csv files to many different formats (using vb.net) like utf-8, utf-32, unicode... but all of them cause some error in the Django csv import.
After some tries I found the solution:
While trying to convert my text file using vb.net, I was opening it with OpenText(), which opens the file with UTF8 encoding. So I opened it with something like "Using SR As StreamReader = New StreamReader(Fl.FullName, System.Text.Encoding.GetEncoding("Windows-1252"), True)" and wrote it back out as UTF8. This solved the problem.
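For anyone without vb.net handy, the same re-encoding can be sketched in Python (the file names here are placeholders): read the file as Windows-1252 and write it back as UTF-8.

import io

# Hypothetical one-off converter: decode the original as Windows-1252,
# then re-encode it as UTF-8 for the importer.
with io.open('people.csv', 'r', encoding='cp1252') as src:
    text = src.read()
with io.open('people_utf8.csv', 'w', encoding='utf-8') as dst:
    dst.write(text)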

Python program to extend short urls that integrates with Stata

I have a dataset containing thousands of tweets. Some of them contain URLs, but most are in the shortened forms used on Twitter. I need something that resolves the full URLs so that I can check for the presence of some particular websites. I have solved the problem in Python like this:
import urllib2

url_filename = 'C:\Users\Monica\Documents\Pythonfiles\urlstrial.txt'
url_filename2 = 'C:\Users\Monica\Documents\Pythonfiles\output_file.txt'
url_file = open(url_filename, 'r')
out = open(url_filename2, 'w')
for line in url_file:
    tco_url = line.strip('\n')
    req = urllib2.urlopen(tco_url)
    print >>out, req.url
url_file.close()
out.close()
This works, but it requires exporting my URLs from Stata to a .txt file and then re-importing the full URLs. Is there some version of my Python script that would let me integrate the task into Stata using the shell command? I have quite a lot of different .dta files and would ideally like to avoid appending them all just to run this task.
Thanks in advance for any answer!
Sure, this is possible without leaving Stata. I am using a Mac running OS X. The details might differ on your operating system, which I am guessing is Windows.
Python and Stata Method
Say we have the following trivial Python program, called hello.py:
#!/usr/bin/env python
import csv

data = [['name', 'message'], ['Monica', 'Hello World!']]
with open('data.csv', 'w') as wsock:
    wtr = csv.writer(wsock)
    for i in data:
        wtr.writerow(i)
This "program" just writes some fake data to a file called data.csv in the script's working directory. Now make sure the script is executable: chmod 755 hello.py.
From within Stata, you can do the following:
! ./hello.py
* The above line called the Python program, which created a data.csv file.
insheet using data.csv, comma clear names case
list
+-----------------------+
| name message |
|-----------------------|
1. | Monica Hello World! |
+-----------------------+
This is a simple example. The full process for your case will be:
1. Write the URLs to a file on disk, using outsheet or some other command.
2. Use ! to call a Python script that resolves them (a sketch follows below).
3. Read the output into Stata using insheet or infile or some other command.
4. Clean up by deleting files with capture erase my_file_on_disk.csv.
Let me know if that is not clear. It works fine on *nix; as I said, Windows might be a little different. If I had a Windows box I would test it.
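For step 2, here is a sketch of your own script rewritten to take the input and output filenames as command-line arguments, so it needs no hard-coded paths (the name expand.py and the filenames are placeholders):

#!/usr/bin/env python
# Hypothetical expand.py: read short URLs (one per line) from the file
# named in sys.argv[1], write the resolved URLs to the file in sys.argv[2].
import sys
import urllib2

with open(sys.argv[1]) as url_file, open(sys.argv[2], 'w') as out:
    for line in url_file:
        req = urllib2.urlopen(line.strip())
        out.write(req.url + '\n')

From Stata the call would then be something like ! ./expand.py urls.txt long_urls.txt, with urls.txt produced by outsheet and long_urls.txt read back in with insheet.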
Pure Stata Solution (kind of a hack)
Also, I think what you want to accomplish can be done completely in Stata, but it's a hack. Here are two programs. The first simply opens a log file and makes a request for the url (which is the first argument). The second reads that log file and uses regular expressions to find the url that Stata was redirected to.
capture program drop geturl
program define geturl
    * pass short url as first argument (e.g. http://bit.ly/162VWRZ)
    capture erase temp_log.txt
    log using temp_log.txt
    copy `1' temp_web_file
end
The above program will not finish because the copy command will fail (intentionally). It also doesn't clean up after itself (intentionally). So I created the next program to read what happened (and get the URL redirect).
capture program drop longurl
program define longurl, rclass
    * find the url in the log file created by geturl
    capture log close
    loc long_url = ""
    file open urlfile using temp_log.txt , read
    file read urlfile line
    while r(eof) == 0 {
        if regexm("`line'", "server says file permanently redirected to (.+)") == 1 {
            loc long_url = regexs(1)
        }
        file read urlfile line
    }
    file close urlfile
    return local url "`long_url'"
end
You can use it like this:
geturl http://bit.ly/162VWRZ
longurl
di "The long url is: `r(url)'"
* The long url is: http://www.ciwati.it/2013/06/10/wdays/?utm_source=twitterfeed&utm_medium=twitter
You should run them one after the other. Things might get ugly using this solution, but it does find the URL you are looking for. May I suggest another approach: contact the shortening service and ask nicely for some data?
If someone at Stata is reading this, it would be nice to have copy return HTTP response header information. Doing this entirely in Stata is a little out there. Personally, I would use Python entirely for this sort of thing and use Stata for the analysis of the data once I had everything I needed.

How can I programmatically generate PDFs using LaTeX?

I'm trying to generate LaTeX code from which to then generate PDF documents.
Currently I'm using the Django templating system to create the code dynamically, but I have no idea how to move on from here. I understand that I could save the code in a .tex file and use subprocess to run pdflatex to generate the PDF, but I had so much trouble escaping the LaTeX code in "plain" Python that I decided to use the Django templating system instead. Is there a way I could somehow pipe the output produced by Django to pdflatex? The code produced works properly; I just don't know what to do with it.
Thanks in advance
I was tackling the same issue in a project I had previously worked on, and instead of piping the output I created temporary files in a temporary folder, since I was worried about handling the intermediate files LaTeX produces. This is the code I used (note that it's a few years old, from when I was still new to Python/Django; I'm sure I could come up with something better if I were writing this today, but it definitely worked for me):
import os
from subprocess import call
from tempfile import mkdtemp, mkstemp
from django.template.loader import render_to_string

def build_pdf(dest_folder):  # illustrative wrapper; the snippet was originally a function body
    # In a temporary folder, make a temporary file
    tmp_folder = mkdtemp()
    os.chdir(tmp_folder)
    texfile, texfilename = mkstemp(dir=tmp_folder)
    # Pass the TeX template through the Django templating engine and into the temp file
    os.write(texfile, render_to_string('tex/base.tex', {'var': 'whatever'}).encode('utf-8'))
    os.close(texfile)
    # Compile the TeX file with pdflatex
    call(['pdflatex', texfilename])
    # Move the resulting PDF to a more permanent location
    pdf_name = os.path.basename(texfilename) + '.pdf'
    os.rename(texfilename + '.pdf', os.path.join(dest_folder, pdf_name))
    # Remove intermediate files
    os.remove(texfilename)
    os.remove(texfilename + '.aux')
    os.remove(texfilename + '.log')
    os.rmdir(tmp_folder)
    return os.path.join(dest_folder, pdf_name)
The dest_folder variable is usually set to somewhere in the media directory, so that the PDF can then be served statically. The value returned is the path to the file on disk. The logic of what its URL would be is handled by whatever function sets the dest_folder.
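As for the piping part of the question: pdflatex can read TeX source from standard input when no input file is given, so a sketch along these lines (an assumption to verify, not part of the answer above) would skip the intermediate .tex file entirely:

import subprocess
from django.template.loader import render_to_string

tex_source = render_to_string('tex/base.tex', {'var': 'whatever'})
# With no input file argument, pdflatex reads the document from stdin;
# -jobname sets the output name, and nonstopmode keeps compile errors
# from blocking while waiting for interactive input.
subprocess.run(['pdflatex', '-interaction=nonstopmode', '-jobname=report'],
               input=tex_source.encode('utf-8'), cwd='/tmp', check=True)
# If it compiles, the result is /tmp/report.pdf

The intermediate .aux and .log files still land in the working directory, so the cleanup step from the answer above applies here as well.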