When will NSURLConnection decompress a compressed resource?

I've read that NSURLConnection will automatically decompress a compressed (zipped) resource, but I cannot find Apple documentation or official word anywhere that specifies the logic that determines when this decompression occurs. I'm also curious to know how this relates to streamed data.
The Problem
I have a server that streams files to my app using chunked encoding, I believe. It is a WCF service. Incidentally, we're going with streaming because it should alleviate server load during heavy use and also because our files are going to be very large (hundreds of MB). The files could be compressed or uncompressed. I think that because we're streaming the data, the Content-Encoding header is not available, nor is Content-Length. I only see "Transfer-Encoding" = Identity in my response.
I am using the AFNetworking library to write these files to disk with AFHTTPRequestOperation's inputStream and outputStream. I have also tried AFDownloadRequestOperation, with similar results.
Now, the AFNetworking docs state that compressed files will automatically be decompressed (via NSURLConnection, I believe) after download, and this is not happening. I write them to my documents directory with no problems, yet they are still zipped. I can unzip them manually as well, so the files are not corrupted. Do they not auto-unzip because I'm streaming the data and because Content-Encoding is not specified?
What I'd like to know:
Why are my compressed files not decompressing automatically? Is it because of streaming? I know I could use another library to decompress afterward, but I'd like to avoid that if possible.
How exactly does NSURLConnection know when to decompress a downloaded file automatically? I can't find this anywhere in the docs. Is it tied to a header value?
Any help would be greatly appreciated.

NSURLConnection will decompress automatically when the appropriate Content-Encoding (e.g. gzip) is available in the response header. That's down to your server to arrange.
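For illustration, here is roughly what the exchange has to look like for the automatic inflation to kick in (the path and values are hypothetical; the key line is the Content-Encoding response header):

GET /files/someData.dat HTTP/1.1
Accept-Encoding: gzip, deflate

HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Encoding: gzip

If the server instead serves an already-zipped file (for example a .gz archive) without a Content-Encoding header, NSURLConnection treats the bytes as opaque payload and hands them over still compressed, which matches the behaviour described in the question.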

Related

Requests issue decoding gzip

I'm trying to pull a large number of text files from a website using the requests package where some of the files are available outright as text and others are compressed text files.
tmpHtml = 'https://website.com/csv/pwr/someData.dat.gz'
tmpReq = requests.get(tmpHtml, proxies = proxy_w_auth, auth = (usr, pw))
When I pull the uncompressed files everything works well; however, when I pull one of the compressed files I get lots of the following:
'\x1f\x8b\x08\x08\xe5\xc6\xd9A\x00\x03someData.dat\x00\xa5\x9d\xcbn\x1c\xb9\x19\x85\xf7\x01\xf2\x0e\xfd\x00Q\xa9X,^j\xa9\xc8\x9a\xb1\x9dX\x16dM\x12/\r\x8c\x0712\x19\x0f\xb2\t\x02\xf4\xc3\xa7\xba\xeeM\x9e\x9f<\xa46s\x93\xf1\r\x8b\xfd\x7fl\x9e\xe2E/\xcfwo\x1eNo\xee^\x1e\xceo\x7f\xfa\xf3\xf9\xe9\xf9\xe3\x9b\x9f\xee_\xce\x9f^\x9e\xdf=\x9d\xef?>\xbe<\xdf\x8d\xff\xba\xfe\xc3\xe9\xe5\xf3\xd3\xc3\xf4\xc3\xbf\x8c\x7f{xy\xf9\xeb\xc3\x87\x87\xc7\x97\xd3\xd3\xf3\xbb\xfb\x87\xf3\xe3\xc3\xcb\xe9\xfe\xed\xdd\xe3\x8f\x0f\xe7\x87\x7f<\xbd{\xbe{y\xf7\xf1qb\xff\xf1\x0f\xeaV\xdfvmk\xce\xf7\xdf~;\xff\xf0\xed\xb7\xd3\xa7\xff~\xf9\xfd\xe6\xe9\xeb\x97\x7f\xfd\xe9\xf4\xc3\xd3\xe9\x97\xef\xff9]\x10\xeaV-\x7f\xec\xdd\xe3\xf9\x87\xf3\xb9W\x8d\xf6\xe7\x1b\xd3\xf4n\xfc\x99\x9e\x7fH\xd3\xba\x90f\x1ak\xce7\xbaQ\xe3\x8f:_\x06\xd31ldu\xe3_tq\xc3z\x91\xd5\xdfvC\x19\xcb\x84,\xdd\xb8\x11\xa6\x9a\xce\x8c?+m\x99\ri\xf6\xc2\xb9i\xc7\xa6\xd9[\xdd\x96\xc1\\\x003vn\xda\xf8\x83\xd2\xa7\xf4\x12\xca\x17?\xe2\x10u\xd8\xe5\xf9\xc6\xa7\x1c\x8a\x1fP\xb5
I can see the file name at the beginning of the string that is returned, but I'm not sure how I can actually extract the content. According to the requests documentation, it should be automatically decompressing gz files?
http://requests.readthedocs.org/en/latest/community/faq/
The response object looks like it has gzip in the headers as well:
{'Accept': '*/*', 'Connection': 'keep-alive', 'Accept-Encoding': 'gzip, deflate', 'User-Agent': 'python-requests/2.7.0 CPython/2.7.10 Windows/7'}
Any suggestions would be much appreciated.
Sometimes web clients ask the server to compress a file on the fly before sending it (not .gz files, mind you, since you wouldn't compress something twice). This cuts down the transfer size, especially for large text files, and it is negotiated through the Accept-Encoding/Content-Encoding headers; the client then decompresses the body automatically before handing it to the user. That is what the requests docs in your question describe, and you do not have to worry about it for your use case.
A file that is itself a .gz archive is just opaque payload bytes to requests. To decompress it, you have to either decompress it in memory using the gzip module (part of the standard library) or write it to disk in 'wb' mode and use the gzip utility.

Chunk download with OneDrive Rest API

This is the first time I have written on Stack Overflow. My question is the following.
I am trying to write a OneDrive C++ API based on the cpprest SDK (Casablanca) project:
https://casablanca.codeplex.com/
In particular, I am currently implementing read operations on OneDrive files.
Actually, I have been able to download a whole file with the following code:
http_client api(U("https://apis.live.net/v5.0/"), m_http_config);
api.request(methods::GET, file_id + L"/content").then([=](http_response response){
    return response.body();
}).then([=](istream is){
    streambuf<uint8_t> rwbuf = file_buffer<uint8_t>::open(L"test.txt").get();
    is.read_to_end(rwbuf).get();
    rwbuf.close();
}).wait();
This code basically downloads the whole file to the computer (file_id is the id of the file I am trying to download). Of course, I can then open an input stream on the file and use it to read its contents.
However, this could give me issues if the file is big. What I had in mind was to download a part of the file while the caller is reading it (and cache that part in case the caller comes back to it).
Then, my question would be:
Is it possible, using the OneDrive REST API + cpprest, to download only part of a file stored on OneDrive? I have found that uploading files in chunks apparently is not possible (Chunked upload (resumable upload) for OneDrive?). Is this also true for downloads?
Thank you in advance for your time.
Best regards,
Giuseppe
OneDrive supports byte-range reads, so you should be able to request chunks of whatever size you want by adding a Range header.
For example,
GET /v5.0/<fileid>/content
Range: bytes=0-1023
This will fetch the first KB of the file.
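A minimal sketch of how that could look with the cpprest client from the question (the 1 MB range, the chunk.part file name, and the 206 check are illustrative assumptions, not OneDrive-specific requirements):

http_request req(methods::GET);
req.set_request_uri(file_id + L"/content");
req.headers().add(U("Range"), U("bytes=0-1048575"));   // first 1 MB of the file
api.request(req).then([=](http_response response){
    // A 206 Partial Content status indicates the server honoured the range.
    return response.body();
}).then([=](istream is){
    streambuf<uint8_t> rwbuf = file_buffer<uint8_t>::open(L"chunk.part").get();
    is.read_to_end(rwbuf).get();
    rwbuf.close();
}).wait();

Issuing several such requests with successive byte ranges gives you the chunked, cache-as-you-go download described in the question.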

Is there any compression available in libcurl

I need to transfer a huge file from a local machine to a remote machine using libcurl with C++. Is there any built-in compression option available in libcurl? As the data to be transferred is large (100 MB to 1 GB in size), it would be better if such an option were available in libcurl itself. I know we can compress the data ourselves and send it via libcurl, but I just want to know whether there is a better way of doing so.
Note: In my case, many client machines transfer such huge data to the remote server at regular intervals.
thanks,
Prabu
According to the documentation for curl_easy_setopt() and the CURLOPT_ENCODING option (named CURLOPT_ACCEPT_ENCODING in newer libcurl releases), you may specify:
The contents of the "Accept-Encoding: " header. This enables decoding of the response. Supported encodings are "identity", "deflate", and "gzip". If an empty string, "", is set, a header containing all supported encoding types is sent.
Here are some examples (just hit search in your browser and type in compression), but I don't know exactly how it works and whether it expects already-gzipped data.
Note, though, that this option only affects decompression of responses you download; it does not compress the data you upload. You may still compress the data yourself (for example with zlib) and send compressed chunks on your own (and I would do the task this way... you'll have better control over what's actually going on and you'll be able to change the algorithm used).
You need to send your file with zlib compression yourself, and perhaps some modifications are needed on the server side to decompress it again.
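A rough sketch of that do-it-yourself approach, assuming zlib is available and the upload endpoint knows how to gunzip what it receives (the file names and URL are placeholders):

#include <zlib.h>
#include <curl/curl.h>
#include <cstdio>

// Compress src into a gzip file dst using zlib's gz* convenience API.
static bool gzip_file(const char *src, const char *dst) {
    std::FILE *in = std::fopen(src, "rb");
    gzFile out = gzopen(dst, "wb9");              // 9 = best compression
    if (!in || !out) return false;
    char buf[16384];
    size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, in)) > 0)
        gzwrite(out, buf, static_cast<unsigned>(n));
    std::fclose(in);
    gzclose(out);
    return true;
}

int main() {
    gzip_file("huge.dat", "huge.dat.gz");          // placeholder file names
    std::FILE *f = std::fopen("huge.dat.gz", "rb");
    CURL *curl = curl_easy_init();
    curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/upload");  // placeholder URL
    curl_easy_setopt(curl, CURLOPT_UPLOAD, 1L);    // upload (PUT) the data read below
    curl_easy_setopt(curl, CURLOPT_READDATA, f);   // default read callback fread()s from f
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);
    std::fclose(f);
    return 0;
}

The server side then has to gunzip the body itself; libcurl won't do anything compression-related for uploads.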

How to hook custom file parser to Gstreamer Decoder?

The HTTP file and its contents are already downloaded and are present in memory. I just have to pass on the content to a decoder in gstreamer and play the content. However, I am not able to find the connecting link between the two.
After reading the documentation, I understood that GStreamer uses souphttpsrc for downloading and parsing of HTTP files. But in my case I have my own parser as well as file downloader to do the same. It takes the URL and returns the data in parts to be used by the decoder. I am not sure how to bypass souphttpsrc and use my parser instead, or how to link it to the decoder.
Please let me know if anyone knows how things can be done.
You can use appsrc and push chunks of your data into the pipeline as they become available.
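A minimal sketch of that wiring, assuming GStreamer 1.x (the pipeline string, element names, and video sink are illustrative; your own downloader would call push_data() for each chunk it produces):

#include <gst/gst.h>
#include <gst/app/gstappsrc.h>

// Wrap one chunk of already-downloaded bytes in a GstBuffer and hand it to appsrc.
static void push_data(GstElement *appsrc, const guint8 *data, gsize size) {
    GstBuffer *buf = gst_buffer_new_allocate(NULL, size, NULL);
    gst_buffer_fill(buf, 0, data, size);
    gst_app_src_push_buffer(GST_APP_SRC(appsrc), buf);   // appsrc takes ownership of buf
}

int main(int argc, char *argv[]) {
    gst_init(&argc, &argv);
    // appsrc stands in for souphttpsrc; decodebin picks the right parser/decoder.
    GstElement *pipeline = gst_parse_launch(
        "appsrc name=mysrc ! decodebin ! autovideosink", NULL);
    GstElement *appsrc = gst_bin_get_by_name(GST_BIN(pipeline), "mysrc");
    gst_element_set_state(pipeline, GST_STATE_PLAYING);

    // Your downloader/parser would call push_data(appsrc, chunk, chunk_size)
    // for every piece of data and finish with:
    //     gst_app_src_end_of_stream(GST_APP_SRC(appsrc));

    GstBus *bus = gst_element_get_bus(pipeline);
    gst_bus_timed_pop_filtered(bus, GST_CLOCK_TIME_NONE,
        (GstMessageType)(GST_MESSAGE_ERROR | GST_MESSAGE_EOS));
    gst_element_set_state(pipeline, GST_STATE_NULL);
    gst_object_unref(bus);
    gst_object_unref(appsrc);
    gst_object_unref(pipeline);
    return 0;
}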

Read csv file from website into c++

I'd like to read the contents of a .csv file from a website into a C++ program. Specifically, it is financial data of the form provided by Google Finance.
http://www.google.com/finance/historical?cid=22144&startdate=Nov+1%2C+2011&enddate=Nov+14%2C+2011
(If you append "&output=csv" to the above link it will download the data as a csv file)
I know that I can use something like libcurl to download the file and then read it in from there, but I wanted to read it directly into the program without having to write it to a file first.
Can I get some suggestions on the best way to do this? I was thinking Boost.Asio, but I have no experience with it (or network programming in general).
If you are trying to download it from a web resource you will need to implement at least some part of the HTTP protocol. libcurl will do this for you.
You don't need to save it as a file. This example will show you how to download and store it in a memory buffer.
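A minimal sketch of that approach, in the spirit of libcurl's getinmemory.c example: the response body is appended to a std::string by a write callback and then parsed line by line, with no temporary file (the URL is the one from the question; error handling is omitted):

#include <curl/curl.h>
#include <iostream>
#include <sstream>
#include <string>

// libcurl calls this for every block of data received; append it to a std::string.
static size_t write_cb(char *ptr, size_t size, size_t nmemb, void *userdata) {
    static_cast<std::string *>(userdata)->append(ptr, size * nmemb);
    return size * nmemb;
}

int main() {
    std::string csv;
    CURL *curl = curl_easy_init();
    curl_easy_setopt(curl, CURLOPT_URL,
        "http://www.google.com/finance/historical?cid=22144&startdate=Nov+1%2C+2011"
        "&enddate=Nov+14%2C+2011&output=csv");
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &csv);
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);

    // The whole CSV now lives in memory; split it into lines (and fields) as needed.
    std::istringstream lines(csv);
    for (std::string line; std::getline(lines, line); )
        std::cout << line << '\n';
    return 0;
}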