Python read a file in zip archives from api call - python-2.7

I have a RESTful endpoint that my REST API makes a GET request to, and the response is a zip file. The zip archive contains 2 files, and I only want to read the content of 1 of them. When I tested it, my code seemed to get stuck on the line file = zipfile.ZipFile(io.BytesIO(response_object.content)).
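import io
import re
import zipfile

class ZipFileResponseHandler:
    def __init__(self, **args):
        self.csv_file_to_index = args['csv_file_to_index']

    def __call__(self, response_object, raw_response_output, response_type, req_args, endpoint):
        # Wrap the downloaded bytes so zipfile can read them like a file on disk
        file = zipfile.ZipFile(io.BytesIO(response_object.content))
        for name in file.namelist():
            # re.match takes the pattern first, then the string to test
            if re.match(self.csv_file_to_index, name):
                data = file.read(name)
                print_xml_stream(repr(data))  # helper defined elsewhere in the author's code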

So I found the answer to my own question. Because I'm using Python 2.7, the class to wrap response_object.content in is StringIO, not BytesIO. So the line:
file = zipfile.ZipFile(io.BytesIO(response_object.content))
should be (with import StringIO at the top of the module):
file = zipfile.ZipFile(StringIO.StringIO(response_object.content))
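For comparison, here is a minimal sketch of the same idea on Python 3, where response.content is bytes and io.BytesIO is the matching in-memory wrapper. The archive is built in memory to stand in for response_object.content; the member names data.csv and readme.txt and the helper read_member are hypothetical, and an exact name check replaces the regex match for simplicity.

```python
import io
import zipfile

def read_member(zip_bytes, wanted_name):
    """Return the bytes of `wanted_name` from an in-memory zip archive, or None."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        if wanted_name in archive.namelist():
            return archive.read(wanted_name)
    return None

# Build a two-file archive in memory to stand in for response_object.content.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    archive.writestr("data.csv", "a,b\n1,2\n")
    archive.writestr("readme.txt", "ignore me")
payload = buf.getvalue()

print(read_member(payload, "data.csv"))
```

Only the requested member is decompressed; the other file in the archive is never read.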

Related

flask send_from_directory function keeps sending the same old file

I have a Flask app that contains a link to download a file from the server. The file is updated by another callback function. The send_from_directory part looks like this:
import dash
import flask

app = flask.Flask(__name__)
dash_app = dash.Dash(__name__, server=app, url_base_pathname="/", external_stylesheets=external_stylesheets)
...
@dash_app.server.route('/download/', methods=["GET", "POST"])
def download_data():
    return flask.send_from_directory("../data/",
                                     filename='result.csv',
                                     as_attachment=True,
                                     attachment_filename='result.csv',
                                     cache_timeout=0)
I have 2 problems:
1) The downloaded file is always the same old file, even though I have set cache_timeout to 0.
2) The downloaded file is always named "download" instead of the file name I specified, "result.csv".

Django, Store jpg file received as string in http POST

I am receiving an HTTP request with a screenshot from a desktop application. I cannot speak with the developer or see the source code, so all I have is the HTTP request I am getting.
The file isn't in request.FILES, it is in request.POST.
from django.views.decorators.csrf import csrf_exempt

@csrf_exempt
def create_contract_event_handler(request, contract_id, event_type):
    keyboard_events_count = request.POST.get('keyboard_events_count')
    mouse_events_count = request.POST.get('mouse_events_count')
    screenshot_file = request.POST.get('screenshot_file')
    barr2 = bytes(screenshot_file.encode(encoding='utf8'))
    # the with block closes the file automatically
    with open('.test/output.jpeg', 'wb') as f:
        f.write(barr2)
The file is corrupted.
The binary starts like this, in case that helps:
����JFIFHH��C
%# , #&')*)-0-(0%()(��C
(((((((((((((((((((((((((((((((((((((((((((((((((((�� `"��
Also, if I try to open the image with PIL, I get the following error:
from PIL import Image
im = Image.open('./test/output.jpg')
#OSError: cannot identify image file './test/output.jpg'
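The � characters in that dump are U+FFFD replacement characters, which suggests (an assumption, not confirmed in the question) that the JPEG bytes were decoded as UTF-8 text before they reached request.POST. A minimal sketch of why encoding that text back to UTF-8 cannot recover the original file; the sample bytes are just the first few bytes of a typical JFIF header.

```python
# First bytes of a typical JPEG/JFIF file (SOI marker, APP0 marker, "JFIF").
original = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"

# Decoding binary data as UTF-8 with errors="replace" turns every invalid
# byte into U+FFFD, the character that is displayed as "�".
as_text = original.decode("utf-8", errors="replace")

# Encoding the text again does not restore the bytes: each U+FFFD becomes
# the three-byte sequence EF BF BD, so the JPEG header is destroyed.
round_tripped = as_text.encode("utf-8")

print(as_text.count("\ufffd"))
print(round_tripped == original)
```

The ASCII part ("JFIF") survives, which is why the corrupted file still looks almost right in a text dump.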
In the end, I managed to get access to the code on the other side (the desktop application): the filename was missing from the part's header (Content-Disposition), and that is why I was getting the file in request.POST instead of in the request.FILES dictionary.
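To illustrate what was missing, here is a minimal sketch (standard library only) of a multipart/form-data body with one file part. The field name screenshot_file comes from the question; the file name screenshot.jpg and the helper build_multipart are hypothetical. The filename parameter in the part's Content-Disposition header is what lets Django put the part in request.FILES; without it, Django treats the part as an ordinary form field in request.POST.

```python
import uuid

def build_multipart(field_name, filename, payload, content_type="image/jpeg"):
    """Build a multipart/form-data body containing a single file part.

    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    head = (
        "--{b}\r\n"
        'Content-Disposition: form-data; name="{n}"; filename="{f}"\r\n'
        "Content-Type: {ct}\r\n"
        "\r\n"
    ).format(b=boundary, n=field_name, f=filename, ct=content_type)
    tail = "\r\n--{b}--\r\n".format(b=boundary)
    # The payload stays raw bytes; it is never decoded as text.
    body = head.encode("ascii") + payload + tail.encode("ascii")
    return body, "multipart/form-data; boundary=" + boundary

body, content_type = build_multipart("screenshot_file", "screenshot.jpg",
                                     b"\xff\xd8\xff\xe0")
```

The client would send body with its Content-Type header set to content_type; on the Django side the upload is then available as request.FILES['screenshot_file'].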

how to get text from .doc file using python 2.7

I am extracting text from .docx files using the following code:
import docx

def getText(filename):
    doc = docx.Document(filename)
    fullText = []
    for para in doc.paragraphs:
        fullText.append(para.text)
    return '\n'.join(fullText)

data = getText(file_path)
Now I also want to extract text from .doc files in my Django REST API, which is hosted on PythonAnywhere. Because the API is on PythonAnywhere, I am unable to install the textract library or antiword. So how can I do it?
abiword is installed on PythonAnywhere:
abiword --to=txt myfile.doc
will produce a file called myfile.txt.
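A minimal sketch of calling that command from Python with subprocess (it should work on both Python 2.7 and 3). The function names are hypothetical, and the sketch assumes abiword writes the text as UTF-8. It runs abiword from the .doc file's own folder so that myfile.txt ends up next to myfile.doc.

```python
import io
import os
import subprocess

def txt_path_for(doc_path):
    """Path of the .txt file that `abiword --to=txt` produces for doc_path."""
    return os.path.splitext(os.path.abspath(doc_path))[0] + ".txt"

def doc_to_text(doc_path):
    """Convert a .doc file to plain text with abiword and return the text."""
    doc_path = os.path.abspath(doc_path)
    folder, name = os.path.split(doc_path)
    # Run inside the document's folder so the output lands beside the input.
    subprocess.check_call(["abiword", "--to=txt", name], cwd=folder)
    # Assumption: abiword writes the text file as UTF-8.
    with io.open(txt_path_for(doc_path), encoding="utf-8") as f:
        return f.read()
```

In the Django view you could then call data = doc_to_text(file_path) for .doc uploads and keep getText for .docx files.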

Upload an mp3 files to soundcloud using Python (file name is random)

I'd like to upload an mp3 file from a hot folder without knowing the file name in advance (i.e. any *.mp3).
Here's what I tried (it uploads a specific, known file name):
import soundcloud

# create client object with app and user credentials
client = soundcloud.Client(client_id='***',
                           client_secret='***',
                           username='***',
                           password='***')

# print authenticated user's username
print client.get('/me').username

mp3_file = 'test.mp3'

# upload audio file
track = client.post('/tracks', track={
    'title': 'Test Sound',
    'asset_data': open(mp3_file, 'rb')
})

# print track link
print track.permalink_url
How can I make the script upload any mp3 file in that folder? (The script and the files are located in the same folder.)
From the language as written here, it's not precisely clear what you mean by "upload any mp3 file in that folder." Does uploading the first file in the folder satisfy your need, or does it need to be a different file each time the script executes? If the latter, my suggestion is to get a list of files and then randomly select one of them.
To get a list of all files in a folder in Python:
from os import listdir
from os.path import isfile, join

mypath = '.'  # folder to scan; here the current working directory
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
and then to randomly select one of them:
import random
print(random.choice(onlyfiles))
Hope this helps
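Since the question asks for "any mp3 file", a variation on the same listdir approach is to filter by extension and upload every match. A minimal sketch; find_mp3s is a hypothetical helper, and the upload is the client.post(...) call from the question, shown only as a comment.

```python
import os

def find_mp3s(folder):
    """Return full paths of the .mp3 files directly inside `folder`, sorted by name."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        # lower() so that files ending in .MP3 are picked up too
        if name.lower().endswith(".mp3") and os.path.isfile(os.path.join(folder, name))
    )

# Example usage, reusing the client from the question:
# for mp3_file in find_mp3s("."):
#     track = client.post('/tracks', track={
#         'title': os.path.splitext(os.path.basename(mp3_file))[0],
#         'asset_data': open(mp3_file, 'rb'),
#     })
```

Sorting makes the upload order predictable; swap in random.choice(find_mp3s(".")) if only one random file should be uploaded per run.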

PyPDF watermarking returns an error

Hi, I'm trying to watermark a PDF file using PyPDF2, but I get an error and can't figure out what goes wrong.
I get the following error:
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    page.mergePage(watermark.getPage(0))
  File "C:\Python27\site-packages\PyPDF2\pdf.py", line 1594, in mergePage
    self._mergePage(page2)
  File "C:\Python27\site-packages\PyPDF2\pdf.py", line 1651, in _mergePage
    page2Content, rename, self.pdf)
  File "C:\Python27\site-packages\PyPDF2\pdf.py", line 1547, in _contentStreamRename
    op = operands[i]
KeyError: 0
I'm using Python 2.7.6 with PyPDF2 1.19 on 32-bit Windows.
Hopefully someone can tell me what I'm doing wrong.
My Python file:
from PyPDF2 import PdfFileWriter, PdfFileReader
output = PdfFileWriter()
input = PdfFileReader(open("test.pdf", "rb"))
watermark = PdfFileReader(open("watermark.pdf", "rb"))
# print how many pages each file has:
print("test.pdf has %d pages." % input.getNumPages())
print("watermark.pdf has %d pages." % watermark.getNumPages())
# add page 0 from input, but first add a watermark from another PDF:
page = input.getPage(0)
page.mergePage(watermark.getPage(0))
output.addPage(page)
# finally, write "output" to outputs.pdf
outputStream = file("outputs.pdf", "wb")
output.write(outputStream)
outputStream.close()
Try writing to a StringIO object instead of a disk file. So, replace this:
outputStream = file("outputs.pdf", "wb")
output.write(outputStream)
outputStream.close()
with this:
import StringIO

outputStream = StringIO.StringIO()
output.write(outputStream)  # write merged output to the StringIO object
outputStream.close()
If the above code works, you may have a file-write permission issue. For reference, look at the working PyPDF example in my article.
I encountered this error when using PyPDF2 to merge in a page generated by ReportLab with canvas.drawInlineImage(...), which stores the image inline in the page's content stream. Other PDFs that store images the same way may be affected too: effectively, the content stream contains a block of image data where PyPDF2 doesn't expect it.
If you can, regenerate the source PDF without inline images in the content stream, e.g. by using canvas.drawImage(...) instead in ReportLab.
Here's an issue about this on PyPDF2.
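Inline images appear in a content stream as a BI ... ID <binary data> EI sequence. A rough heuristic sketch to check whether a page's (already decompressed) content stream contains one before merging it; has_inline_image is a hypothetical helper, it is not a real PDF tokenizer, and getting the decompressed bytes out of PyPDF2 is left out.

```python
import re

# Inline images in a PDF content stream are written as:
#   BI <image dictionary> ID <binary data> EI
# Look for a standalone BI operator followed later by a standalone ID operator.
_INLINE_IMAGE = re.compile(br"(?:^|\s)BI\s.*?\sID\s", re.DOTALL)

def has_inline_image(content_stream):
    """Heuristically report whether content-stream bytes contain an inline image."""
    return _INLINE_IMAGE.search(content_stream) is not None

with_inline = b"q 10 0 0 10 0 0 cm\nBI /W 1 /H 1 /BPC 8 /CS /G ID \x80 EI\nQ"
with_xobject = b"q 10 0 0 10 0 0 cm /Im0 Do Q"
print(has_inline_image(with_inline), has_inline_image(with_xobject))
```

A page drawn with canvas.drawImage(...) references the image as an XObject (the /Im0 Do form above) instead, which is the case PyPDF2 handles.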