python mechanize retrieving files larger than 1GB - python-2.7

I am trying to download some files via mechanize. Files smaller than 1GB are downloaded without any trouble. However, if a file is bigger than 1GB, the script runs out of memory:
mechanize_response.py throws an out-of-memory error at the following line:
self.__cache.write(self.wrapped.read())
__cache is a cStringIO.StringIO; it seems that it cannot handle more than 1GB.
How can I download files larger than 1GB?
Thanks

It sounds like you are trying to download the file into memory, but you don't have enough. Try using the retrieve method with a file name to stream the downloaded file to disk.
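Something along these lines (an untested sketch; the URL and file name are placeholders, and if I remember the API correctly, retrieve mirrors urllib.urlretrieve and returns the local file name plus the response headers):

import mechanize

br = mechanize.Browser()
# retrieve() streams the response into the named file instead of
# buffering the whole body in memory.
filename, headers = br.retrieve('http://example.com/big.iso', filename='big.iso')
print(filename)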

I finally figured out a workaround.
Instead of using browser.retrieve or browser.open, I used mechanize.urlopen, which returns the underlying urllib2 handler. This allowed me to download files larger than 1GB.
I am still interested in figuring out how to make retrieve work for files larger than 1GB.
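For reference, a rough sketch of that workaround (Python 2.7; the URL, output path, and chunk size are placeholders):

import mechanize

CHUNK = 1024 * 1024  # read 1 MB at a time

response = mechanize.urlopen('http://example.com/big.iso')
with open('big.iso', 'wb') as out:
    while True:
        block = response.read(CHUNK)
        if not block:
            break
        out.write(block)
response.close()

Because the body is read and written in fixed-size chunks, memory use stays flat no matter how large the file is.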

Related

cPickle.load() doesn't accept non-.gz files, what can I use for .pkl files?

I am trying to run an example of an LSTM recurrent neural network presented in this git repository: https://github.com/mesnilgr/is13.
I've installed Theano and everything, and when I got to the point of running the code I noticed the data was not being downloaded, so I opened an issue on GitHub (https://github.com/mesnilgr/is13/issues/12) and this guy came up with a solution that consisted of:
1. Getting the data from the Dropbox link he provides.
2. Changing the code of the 'load.py' file to download and read the data properly.
The only issue is that the data in the Dropbox folder (https://www.dropbox.com/s/3lxl9jsbw0j7h8a/atis.pkl?dl=0) is not a compressed .gz file as, I suppose, the data from the original repository was. I don't have enough skill to change the code so that it does with the uncompressed data exactly what it would do with the compressed one. Can someone help me?
The modification suggested and the changes I've made are described in the issue I opened on GitHub (https://github.com/mesnilgr/is13/issues/12).
It looks like your code is using
gzip.open(...)
But if the file is not gzipped then you probably just need to remove the gzip. prefix and use
open(...)
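A minimal sketch of the change, assuming load.py currently unpickles the gzipped archive (file names are illustrative, and the structure of the unpickled object is whatever the original load.py expects):

import cPickle

# original (compressed): f = gzip.open('atis.pkl.gz', 'rb')
f = open('atis.pkl', 'rb')   # the uncompressed file from the Dropbox link
data = cPickle.load(f)
f.close()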

Django FileUploadHandler: force file storage to disk

I understand that Django's file upload handler by default stores files smaller than 2.5MB in memory and larger ones in a temp folder on disk.
In my models, where I have a file field, I have specified the upload_to folder where I expect the files to be written.
However, when I try reading these files from that folder, I get an error implying that the files do not yet exist there.
How can I force Django to write the files to the folder specified in upload_to before another procedure starts reading from them?
I know I can read the files directly from memory via request.FILES['file'].name, but I would rather force the files to be written from memory to the folder before I read them.
Any insights will be highly appreciated.
The FILE_UPLOAD_MAX_MEMORY_SIZE setting tells Django the maximum size of file to keep in memory. Set it to 0 and uploads will always be written to disk.
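A minimal sketch of that setting (it goes in your project's settings.py):

# Force every uploaded file to be streamed to a temporary file on disk;
# a value of 0 means "never keep an upload in memory".
FILE_UPLOAD_MAX_MEMORY_SIZE = 0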

Chunk download with OneDrive REST API

This is the first time I have written on Stack Overflow. My question is the following.
I am trying to write a OneDrive C++ API based on the cpprest SDK (Casablanca) project:
https://casablanca.codeplex.com/
In particular, I am currently implementing read operations on OneDrive files.
So far, I have been able to download a whole file with the following code:
http_client api(U("https://apis.live.net/v5.0/"), m_http_config);
api.request(methods::GET, file_id + L"/content").then([=](http_response response) {
    return response.body();
}).then([=](istream is) {
    streambuf<uint8_t> rwbuf = file_buffer<uint8_t>::open(L"test.txt").get();
    is.read_to_end(rwbuf).get();
    rwbuf.close();
}).wait();
This code basically downloads the whole file to the computer (file_id is the id of the file I am trying to download). Of course, I can then open an input stream on the local file and use it to read the contents.
However, this could give me issues if the file is big. What I had in mind was to download a part of the file while the caller is reading it (and to cache that part in case he comes back to it).
My question, then, is:
Is it possible, using the OneDrive REST API + cpprest, to download only a part of a file stored on OneDrive? I have found that uploading files in chunks seems not to be possible (Chunked upload (resumable upload) for OneDrive?). Is the same true for downloads?
Thank you in advance for your time.
Best regards,
Giuseppe
OneDrive supports byte-range reads, so you should be able to request chunks of whatever size you want by adding a Range header.
For example,
GET /v5.0/<fileid>/content
Range: bytes=0-1023
This will fetch the first KB of the file.
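Not cpprest, but as a quick illustration of the same Range mechanics, here is a small Python sketch (using the requests library; the file id and access token placeholders stand in for whatever your app already has):

import requests

url = 'https://apis.live.net/v5.0/<fileid>/content'
headers = {
    'Authorization': 'Bearer <access_token>',
    'Range': 'bytes=0-1023',  # ask only for the first KB
}
resp = requests.get(url, headers=headers)
print(resp.status_code)  # expect 206 Partial Content if the range was honoured
chunk = resp.content     # at most the first 1024 bytes of the file

In cpprest you would add the same Range header to the http_request before sending it.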

Library for extracting zip on the fly

I have a rather large ZIP file, which gets downloaded (I cannot change the file). The task now is to unzip the file while it is downloading, instead of having to wait until the end of the central directory is received.
Does such a library exist?
I wrote "pinch" a while back. It's in Objective-C but the method to decode files from a zip might be a way to get it in C++? Yeah, some coding will be necessary.
http://forrst.com/posts/Now_in_ObjC_Pinch_Retrieve_a_file_from_inside-I54
https://github.com/epatel/pinch-objc
I'm not sure such a library exists. Unless you are on a very fast line [or have a very slow processor], it's unlikely to save you a huge amount of time. Decompressing several gigabytes only takes a few seconds if all the data is in RAM [it may then take a while to write the uncompressed data to disk, and loading it from disk may add to the total time].
However, assuming the sending end supports "range" downloading, you could possibly write something that downloads the directory first [by reading the fixed header first, then reading the directory, and then downloading the rest of the file from start to finish], roughly along the lines of the sketch below. Presumably that's how "pinch", linked in epatel's answer, works.
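A rough Python sketch of that range-based idea (not a full library; it assumes the server honours the Range header, and the URL is a placeholder):

import urllib2
import zipfile

class HttpRangeFile(object):
    """Minimal read-only, seekable file-like object backed by HTTP Range requests."""

    def __init__(self, url, size):
        self.url = url
        self.size = size
        self.pos = 0

    def seek(self, offset, whence=0):
        if whence == 0:
            self.pos = offset
        elif whence == 1:
            self.pos += offset
        else:                      # whence == 2: relative to end of file
            self.pos = self.size + offset

    def tell(self):
        return self.pos

    def read(self, n=-1):
        if n < 0:
            n = self.size - self.pos
        n = min(n, self.size - self.pos)
        if n <= 0:
            return ''
        req = urllib2.Request(self.url)
        req.add_header('Range', 'bytes=%d-%d' % (self.pos, self.pos + n - 1))
        data = urllib2.urlopen(req).read()
        self.pos += len(data)
        return data

url = 'http://example.com/big.zip'           # placeholder
head = urllib2.Request(url)
head.get_method = lambda: 'HEAD'             # ask only for the headers
size = int(urllib2.urlopen(head).info()['Content-Length'])

# zipfile seeks to the end, reads the central directory through a few small
# range requests, and only fetches a member's bytes when it is actually read.
zf = zipfile.ZipFile(HttpRangeFile(url, size))
print(zf.namelist())

The same structure carries over to C++: a seekable stream whose read() is implemented with ranged GETs, fed to whatever zip reader you use.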

Compressing KML files

Is there any way to compress a KML file?
My KML files are created once per night and are always named the same. The biggest file takes up almost 10MB, which takes about 10 seconds to download on a computer with high-speed internet. I would like this file to be smaller, or, if there are other ways to minimize the file content without losing any features, I would like to know about them :)
I'm using OpenLayers to view the map and load the KML files.
The first thing to try is gzip compression - check out this article: https://developers.google.com/speed/articles/gzip
KMZ files are zipped KML files.
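If you go the KMZ route, a minimal sketch of packaging the nightly file (file names are placeholders; by convention the main entry inside a KMZ is named doc.kml):

import zipfile

with zipfile.ZipFile('map.kmz', 'w', zipfile.ZIP_DEFLATED) as kmz:
    kmz.write('map.kml', arcname='doc.kml')

Since XML compresses very well, a 10MB KML typically shrinks to a fraction of its original size this way.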