Requests Gzip HTTP download and write to disk - python-2.7

I'm using the requests library and Python 2.7 to download a gzipped text file from a web API. Using the code below, I'm able to successfully send a GET request and, judging from the headers, receive a response in the form of a gzip file.
I know Requests decompresses these files for you automatically if it detects from the header that the response is gzipped. I wanted to take that download in the form of a file stream and write the contents to disk for storage and future analysis.
When I open the resulting file in my working directory, however, I get characters like this: —}}¶— Q#Ï 'õ
For reference, some of the response headers include 'Content-Encoding': 'gzip', 'Content-Type': 'application/download', 'Accept-Encoding,User-Agent'
Am I wrong to write in binary? Am I not decoding the text correctly (i.e., could it be ASCII vs. UTF-8)? No character encoding is noted in the response headers.
try:
    response = requests.get(url, paramDict, stream=True)
except Exception as e:
    print(e)
with open(outName, 'wb') as out_file:
    for chunk in response.iter_content(chunk_size=1024):
        out_file.write(chunk)
EDIT 3.30.2016:
I've now changed my code a little to use the gzipstream library. I tried using the stream to read the entire gzipped text file that is in my response content:
with open(outName, 'wb') as out_file, GzipStreamFile(response.content) as fileStream:
    streamContent = fileStream.read()
    out_file.write(streamContent)
I then received this error:
out_file.write(streamContent)
AttributeError: '_GzipStreamFile' object has no attribute 'close'
The output was an empty text file with the file name as anticipated. Do I need to initialize my streamContent variable outside of the with block so that it doesn't automatically try to call a close method at the end of the block?
EDIT 4.1.2016: Just thought I'd clarify that this DOES NOT have to be a stream; that was just one solution I encountered. I just want to make a daily request for this gzipped file and have it saved locally in plaintext.

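import zlib
import requests

try:
    response = requests.get(url, paramDict)
except Exception as e:
    print(e)
# MAX_WBITS | 32 lets zlib auto-detect the gzip (or zlib) header
data = zlib.decompress(response.content, zlib.MAX_WBITS|32)
# data is a byte string, so write it in binary mode
with open('outFileName.txt','wb') as outFile:
    outFile.write(data)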
Here is the code that I wrote that ended up working. It is as sigmavirus said: the file was gzipped to begin with. I knew this, but apparently did not describe it clearly enough, as I kept reading and writing the gzipped bytes.
Using the zlib module, I was able to decompress the content of the response all at one time into the data variable; I then wrote that variable containing the decompressed data into a file.
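As a rough, self-contained illustration of that zlib call, the sketch below builds a small gzip payload in memory (a stand-in for response.content; the helper names make_gzip_bytes and decompress_gzip_or_zlib are made up for this example) and decompresses it the same way. It assumes the whole payload fits in memory.

```python
import gzip
import io
import zlib


def make_gzip_bytes(payload):
    """Build an in-memory gzip stream (hypothetical stand-in for response.content)."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
        gz.write(payload)
    return buf.getvalue()


def decompress_gzip_or_zlib(data):
    """Decompress data in one call, letting zlib detect a gzip or zlib header."""
    # Adding 32 to the window size turns on automatic header detection.
    return zlib.decompress(data, zlib.MAX_WBITS | 32)


sample = make_gzip_bytes(b"line one\nline two\n")
plain = decompress_gzip_or_zlib(sample)
```

The same helper should also accept plain zlib-wrapped data, since the header type is detected rather than assumed.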
I'm not sure if this is the best or most Pythonic way to do this, but it worked. If anyone can enlighten me as to why I cannot gzip.open this content (perhaps I needed to use an alternative method; I tried the gzipstream library to no avail), I would appreciate any explanations, but I do consider this question answered.
Thanks to everyone who helped me, even if you didn't have the solution, you helped encourage me to persevere!
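On the gzip.open question: one likely explanation is that gzip.open expects a file name (or, on newer Python versions, a file object), not the compressed bytes themselves. A minimal sketch of the usual workaround, wrapping the bytes in io.BytesIO and handing that to gzip.GzipFile, is shown below. The function name gunzip_bytes and the sample payload are assumptions made for illustration.

```python
import gzip
import io


def gunzip_bytes(data):
    """Decompress gzip data that is already held in memory as bytes."""
    # GzipFile reads from any file-like object passed as fileobj,
    # so the bytes are wrapped in a BytesIO buffer first.
    with gzip.GzipFile(fileobj=io.BytesIO(data), mode='rb') as gz:
        return gz.read()


# Build a small gzip payload to stand in for response.content.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
    gz.write(b"some plaintext report\n")
compressed = buf.getvalue()

text = gunzip_bytes(compressed)
```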

So the combination here of stream=True and iter_content is what is causing your problems. What you might want to do is something akin to this (to preserve the streaming behaviour):
try:
    response = requests.get(url, params=paramDict, stream=True)
except Exception as e:
    print(e)
raw = response.raw
with open(outName, 'wb') as out_file:
    while True:
        # decode_content=True asks urllib3 to undo the gzip content encoding
        chunk = raw.read(1024, decode_content=True)
        if not chunk:
            break
        out_file.write(chunk)
Note that you still want to write bytes, because you haven't determined the character encoding of the content. You still have bytes, but you're no longer dealing with the gzipped bytes.

You are requesting the raw socket stream, which strips off the chunked transfer encoding but leaves the content coding intact. In other words, what you've got there is almost certainly the gzipped content. The presence of the Content-Encoding: gzip header is a strong indicator of that, as HTTP clients are required to remove it if they remove the content coding.
One way to avoid this would be to send an empty Accept-Encoding header with the request to indicate that no encoding is acceptable. If the API is RFC compliant, you should receive an uncompressed response. The other way would be to decompress the stream yourself. I believe this cannot be done natively by the gzip module on a stream. However, the gzipstream lib should give you a start.
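For the decompress-it-yourself route, the standard library's zlib.decompressobj appears to handle gzip data incrementally, so it may be worth trying before reaching for a third-party package. The sketch below is only an illustration under that assumption: iter_decompressed is a made-up helper, and the list of 16-byte chunks stands in for whatever still-compressed chunks come off the wire.

```python
import gzip
import io
import zlib


def iter_decompressed(chunks):
    """Yield decompressed pieces from an iterable of gzip-compressed chunks."""
    # Adding 16 to the window size tells zlib to expect a gzip header.
    decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
    for chunk in chunks:
        piece = decompressor.decompress(chunk)
        if piece:
            yield piece
    # Emit anything still buffered once the input is exhausted.
    tail = decompressor.flush()
    if tail:
        yield tail


# Stand-in for compressed chunks arriving from the network.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
    gz.write(b"hello streaming world\n" * 100)
compressed = buf.getvalue()
chunks = [compressed[i:i + 16] for i in range(0, len(compressed), 16)]

result = b"".join(iter_decompressed(chunks))
```

Each piece could be written to the output file as it is produced, so the full response never has to sit in memory.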

Related

Redmine Rest API - file attachment, upload token is not complete

I'm trying to add an issue with a file attachment but the response token is not complete.
It is the same error as http://www.redmine.org/boards/2/topics/42425 (a five-year-old question), but there is no answer there.
The Redmine version used is 3.2.1.stable.
I'm using https://www.redmine.org/projects/redmine/wiki/Rest_api#Attaching-files to know how to upload files, but when I do a POST to /uploads.json?filename=myFileName, the response is something like {"upload":{"token":"6898."}}. The response code is still a 201, so it doesn't seem like there is an error.
The response to the API call should be something like {"upload":{"token":"7167.ed1ccdb093229ca1bd0b043618d88743"}}.
I tried using the partial token returned, to no avail.
Anyone have an idea as to why the token is not okay / how to correct the problem?
For some reason Redmine can't copy the uploaded file to its persistent (final) location. Maybe there's a lack of disk space, or maybe there is an issue with the file name. Check your environment.log for
"Saving attachment '#{self.diskfile}' (#{@temp_file.size} bytes)"
when the file is being uploaded. Maybe this will indicate the reason.
In fact, the problem was that the file sent was empty.
This answer may well be of help to someone.
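Since the root cause turned out to be an empty file, a small client-side guard before the upload may save some debugging. The sketch below is not tied to Redmine's API in any way: read_upload_body is a hypothetical helper that simply refuses to return the body of a zero-byte file.

```python
import os


def read_upload_body(path):
    """Return the file's bytes, refusing to proceed if the file is empty."""
    if os.path.getsize(path) == 0:
        raise ValueError("refusing to upload empty file: %s" % path)
    with open(path, 'rb') as f:
        return f.read()
```

The returned bytes would then be what gets sent as the body of the POST to /uploads.json.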

HttpQueryInfo to get File Size

Why does this function work on a direct URL to a download but fail on a PHP page that echoes out a file for download? (GetLastError returns 0.)
Not all HTTP requests will have a content length field in the response. Dynamic pages generated by PHP scripts might not know how large the content actually is.
In these cases you just need to read a little at a time until no more data is returned from the server.
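The read-until-empty pattern can be sketched as follows. This is a Python illustration of the idea, not WinINet code: with WinINet the equivalent loop would call InternetReadFile until it reports zero bytes read. The read_all helper and the in-memory stream are assumptions made for this example.

```python
import io


def read_all(stream, chunk_size=4096):
    """Read a stream of unknown length chunk by chunk until it is exhausted."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read means there is no more data
            break
        chunks.append(chunk)
        total += len(chunk)
    return b"".join(chunks), total


# An in-memory stream stands in for a response with no Content-Length.
body, size = read_all(io.BytesIO(b"x" * 10000), chunk_size=1024)
```

The size is only known after the last read, which is why a progress bar for such downloads cannot show a percentage.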

JMeter - How to extract values from a response which has been decoded from base64 and stored in a variable? All under the same sampler

I am trying to test a web service's performance, and I'm having a few issues with using and passing variables. There are multiple sequential requests, which depend on some data coming from a previous response. All requests need to be encoded to Base64 and placed in a SOAP envelope namespace before being sent to the endpoint. It returns an encoded response, which needs to be decoded to see the XML values that are needed for the next request. What I have done so far is:
1) Beanshell preprocessor added to first sample to encode the payload which is called from a file.
2) Regex to pull the encoded response bit from whole response.
3) Beanshell post processor to decode the response and write to a file (just in case). I have stored the decoded response in a variable 'Output' and I know this works since it writes the response to file correctly.
4) After this, I have added 4 regex extractors and tried various things such as apply to different parts, check different fields, check JMeter variable etc. However, it doesn't seem to work.
This is what my tree is looking like.
JMeter Tree
I am storing the decoded response to 'Output' variable like this and it works since it's writing to file properly:
import org.apache.commons.io.FileUtils;
import org.apache.commons.codec.binary.Base64;
String Createresponse= vars.get("Createregex");
vars.put("response",new String(Base64.decodeBase64(Createresponse.getBytes("UTF-8"))));
Output = vars.get("response");
f = new FileOutputStream("filepath/Createresponse.txt");
p = new PrintStream(f);
this.interpreter.setOut(p);
print(Output);
f.close();
And this is how I am using the Regex extractors after that; I have tried different options:
Regex settings
Unfortunately, though, the regex is not picking up these values from the 'Output' variable. I basically need them saved so I can use ${docID} in the payload file for the next request.
Any help on this is appreciated! Also happy to provide more detail if needed.
EDIT:
I had a follow up question. I am trying to run this with multiple users. I have a field ${searchuser} in my payload xml file called in the pre-processor here.
The CSV Data set above it looks like this:
However, it is not picking up the values from CSV and substituting in the payload file. Any help is appreciated!
You have 2 problems with your Regular Expression Extractor configuration:
Apply to: needs to be response
Field to check: needs to be Body; Body as a Document is used for binary file formats like PDF or Word.
By the way, you can do Base64 decoding and encoding using the __base64Decode() and __base64Encode() functions available via JMeter Plugins. The plugins, in turn, can be installed in one click using the Plugin Manager.
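To check the decode-then-extract logic outside JMeter, a quick Python sketch like the one below may help. It is only an illustration: the XML shape, the extract_value helper, and the sample value are assumptions, with docID borrowed from the variable name in the question.

```python
import base64
import re


def extract_value(encoded_response, tag):
    """Decode a Base64 payload and return the text of the first <tag>...</tag>."""
    decoded = base64.b64decode(encoded_response).decode("utf-8")
    pattern = r"<%s>(.*?)</%s>" % (re.escape(tag), re.escape(tag))
    match = re.search(pattern, decoded)
    return match.group(1) if match else None


# Hypothetical response body, encoded the way the service would return it.
sample_xml = "<result><docID>12345</docID></result>"
encoded = base64.b64encode(sample_xml.encode("utf-8"))

doc_id = extract_value(encoded, "docID")
```

If the same pattern matches here but not in JMeter, the extractor is probably being applied to the wrong source rather than the regex being wrong.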

pydrive: Losing file content during upload()

I currently have a 34x22 .xlsx spreadsheet. I am downloading it via pydrive, filling in some of the blank values, and uploading the file back via pydrive. When I upload the file back, all cells with formulas are blank (any cell that starts with =). I have a local copy of the file I want to upload, and it looks fine so I'm pretty sure the issue must be with pydrive.
My code:
import datetime

def upload_r1masterfile(filename='temp.xlsx'):
    """
    Upload a given file to drive as our master file
    :param filename: name of local file to upload
    :return:
    """
    # Get the file we want
    master_file = find_r1masterfile()
    try:
        master_file.SetContentFile(filename)
        master_file.Upload()
        print('Master file updated. ' + str(datetime.datetime.now()))
    except Exception as e:
        print("Warning: Something wrong with file R1 Master File.")
        print(str(e))
        return e
The only hint I have is that if I add the param={'convert': True} argument to Upload, there is no loss. However, that means I am now working in Google Sheets format, and I would rather not do that, not only because it's not the preferred format to work with here, but also because if I try master_file.GetContentFile(filename) I get the error: No downloadLink/exportLinks for mimetype found in metadata
Any hints? Is there another attribute on upload that I am not aware of?
Thanks!
Robin was able to help me answer this question at the GitHub repository. Both suggested solutions worked:
1) When you upload the file, did you close Excel first? IIRC MS Office writes a lot of the content to a temporary file, so that may explain why some parts are missing. If you tried the non converting upload first, the full file may have been saved to disk between the two tries, and thus the second converting upload attempt worked.
2) GetContentFile takes a second argument called mimetype, which should allow you to download the file. Could you try .GetContentFile(filename, mimetype="application/vnd.ms-excel")? If that mimetype doesn't work as anticipated, there is a great StackOverflow post here which lists a bunch of different types you can try.
Thanks again Robin!

How to get the file-name before downloading the file

I am trying to download a binary file from an HTTP server. I am using the functions InternetOpenUrl() and then InternetReadFile() to download the file. Is it possible to know the file name before downloading?
What I am doing now to get the file name is this: once the download is complete, I call GetFileVersionInfo(), read the OriginalFilename from the buffer, and then rename the file to the OriginalFilename.
Is there any other way to get the file name before downloading?
Thanks
Vinod
Look at HttpQueryInfo. Look at the Content-Type and Content-Disposition headers.
You may have to use HTTP_QUERY_CUSTOM to get raw content-type if it just returns e.g. "text/plain".
To get all the headers (and thereby work out which one contains the information you want) you can use HTTP_QUERY_RAW_HEADERS_CRLF.
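Once the Content-Disposition header value is in hand, the file name still has to be parsed out of it. The sketch below shows that parsing step in Python rather than WinINet, using the standard library's email.message.Message. The helper name filename_from_content_disposition and the sample header are assumptions for illustration, and a server is free to omit the header entirely.

```python
from email.message import Message


def filename_from_content_disposition(header_value):
    """Return the filename parameter of a Content-Disposition value, or None."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    # get_filename() looks up the 'filename' parameter and unquotes it.
    return msg.get_filename()


name = filename_from_content_disposition('attachment; filename="setup.exe"')
```

When the function returns None, falling back to the last path segment of the URL is a common alternative.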