Is there any compression available in libcurl - c++

I need to transfer a huge file from a local machine to a remote machine using libcurl with C++. Is there any compression option built into libcurl? As the data to be transferred is large (100 MB to 1 GB), it would be better if such an option were available in libcurl itself. I know we can compress the data ourselves and then send it via libcurl, but I want to know whether there is a better way of doing so.
Note: In my case, many client machines transfer such huge data to the remote server at regular intervals.
thanks,
Prabu

According to the documentation for curl_setopt() and its CURLOPT_ENCODING option, you may specify:
The contents of the "Accept-Encoding: " header. This enables decoding
of the response. Supported encodings are "identity", "deflate", and
"gzip". If an empty string, "", is set, a header containing all
supported encoding types is sent.
Here are some examples (just hit search in your browser and type in "compression"), but I don't know exactly how it works or whether it expects already-gzipped data.
You may still use gzcompress() and send compressed chunks on your own (and I would do the task this way... you'll have better control over what's actually going on and you'll be able to change the algorithm used).

You need to compress your file with zlib and send it yourself. And there may be some modifications needed on the server side.
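To make the "compress it yourself" advice concrete, here is a minimal sketch, assuming a hypothetical endpoint (https://example.com/upload) and a server configured to decompress gzip request bodies: it gzips a buffer with zlib and POSTs it with libcurl, labelling the body with a Content-Encoding: gzip request header. libcurl sends exactly the bytes you give it, so the compression step is entirely on you.

```cpp
// Minimal sketch (hypothetical URL): gzip a buffer with zlib, then POST it
// with libcurl and a "Content-Encoding: gzip" request header. libcurl does
// not compress uploads for you, and the server must know how to inflate.
#include <curl/curl.h>
#include <zlib.h>
#include <stdexcept>
#include <string>
#include <vector>

// Compress `input` into gzip format; windowBits 15+16 selects the gzip wrapper.
static std::vector<unsigned char> gzip_compress(const std::string& input) {
    z_stream zs{};
    if (deflateInit2(&zs, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
                     15 + 16, 8, Z_DEFAULT_STRATEGY) != Z_OK)
        throw std::runtime_error("deflateInit2 failed");

    std::vector<unsigned char> out(deflateBound(&zs, input.size()));
    zs.next_in   = reinterpret_cast<Bytef*>(const_cast<char*>(input.data()));
    zs.avail_in  = static_cast<uInt>(input.size());
    zs.next_out  = out.data();
    zs.avail_out = static_cast<uInt>(out.size());

    // Single-shot deflate: the output buffer was sized with deflateBound().
    if (deflate(&zs, Z_FINISH) != Z_STREAM_END) {
        deflateEnd(&zs);
        throw std::runtime_error("deflate failed");
    }
    out.resize(zs.total_out);
    deflateEnd(&zs);
    return out;
}

int main() {
    // Stand-in for the real file contents (read your file into memory, or
    // better, stream it; see the note after this example).
    std::string payload(8 * 1024 * 1024, 'x');
    std::vector<unsigned char> gz = gzip_compress(payload);

    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL* curl = curl_easy_init();

    struct curl_slist* headers = nullptr;
    headers = curl_slist_append(headers, "Content-Encoding: gzip");
    headers = curl_slist_append(headers, "Content-Type: application/octet-stream");

    curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/upload");  // placeholder
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, gz.data());
    curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE_LARGE,
                     static_cast<curl_off_t>(gz.size()));

    CURLcode res = curl_easy_perform(curl);

    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return res == CURLE_OK ? 0 : 1;
}
```

For files in the 100 MB to 1 GB range you would normally compress and upload in a streaming fashion (a CURLOPT_READFUNCTION callback feeding chunks produced by deflate()) rather than holding everything in memory, but the headers and options stay the same.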

Related

libcurl inbuilt compression support or not

I found an answer which told me that libcurl doesn't support compression.
libcurl (linux, C) - Is there a built-in way to POST/PUT a gzipped request?
But, somewhere else, I also found this:
Is there any compression available in libcurl
My question is, do I still have to compress strings on my own and then send them, or is that not necessary using libcurl?
Yes, if you send data you must compress it yourself before sending. There is no support for doing that "automatically" in, for example, HTTP (neither 1.1 nor HTTP/2).
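For contrast, the direction libcurl does handle automatically is the response. A minimal sketch, assuming a placeholder URL: setting CURLOPT_ACCEPT_ENCODING (the modern name for CURLOPT_ENCODING) makes libcurl advertise its supported encodings and transparently decompress whatever the server sends back, while anything you upload still has to be compressed by hand, as the answer says.

```cpp
// Minimal sketch (placeholder URL) of the direction libcurl automates:
// CURLOPT_ACCEPT_ENCODING makes it send an Accept-Encoding header and
// transparently decompress the response body. An empty string "" means
// "advertise every encoding this libcurl build supports".
#include <curl/curl.h>
#include <cstdio>

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL* curl = curl_easy_init();

    curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/large.json");  // placeholder
    curl_easy_setopt(curl, CURLOPT_ACCEPT_ENCODING, "");  // was CURLOPT_ENCODING in older libcurl

    // With no write callback set, the (already decompressed) body goes to stdout.
    CURLcode res = curl_easy_perform(curl);
    if (res != CURLE_OK)
        std::fprintf(stderr, "curl failed: %s\n", curl_easy_strerror(res));

    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return res == CURLE_OK ? 0 : 1;
}
```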

When to use raw binary and when base64?

I want to develop a service which receives files from users. At first I was planning to implement uploads using raw binary in order to save time (base64 increases file size by about 33%), but reading about base64 it seems to be very useful if you don't want problems uploading files.
The question is: what are the downsides of implementing raw binary uploads, and in which cases does it make sense? In this case I will develop both the client and the server, so I will have control over the two ends, but what about routers or the network: can they corrupt data that is not base64-encoded?
I'm trying to investigate what Dropbox or Google Drive do and why, but I can't find an article.
You won't have any problems using raw binary for file uploads. All Internet Protocol networking hardware is required to be 8-bit clean, that is, to transmit all 8 bits of every byte/octet.
If you choose to use the TCP protocol, it guarantees reliable transmission of octets (bytes). Encoding with base64 would be a waste of time and bandwidth.
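The "about 33%" figure quoted in the question falls straight out of the encoding: base64 maps every 3 input bytes to 4 output characters (rounded up, plus padding). A quick sanity check for file sizes like the ones discussed earlier on this page:

```cpp
// Quick check of the ~33% figure: base64 turns every 3 input bytes into
// 4 output characters, so the encoded size is 4 * ceil(n / 3).
#include <cstdint>
#include <cstdio>

int main() {
    const std::uint64_t sizes[] = {100ull << 20, 1ull << 30};  // 100 MB, 1 GB
    for (std::uint64_t n : sizes) {
        std::uint64_t encoded = 4 * ((n + 2) / 3);
        std::printf("%llu raw bytes -> %llu base64 bytes (+%.1f%%)\n",
                    static_cast<unsigned long long>(n),
                    static_cast<unsigned long long>(encoded),
                    100.0 * static_cast<double>(encoded - n) / static_cast<double>(n));
    }
    return 0;
}
```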

When will NSURLConnection decompress a compressed resource?

I've read how NSURLConnection will automatically decompress a compressed (zipped) resource, however I can not find Apple documentation or official word anywhere that specifies the logic that defines when this decompression occurs. I'm also curious to know how this would relate to streamed data.
The Problem
I have a server that streams files to my app using chunked encoding, I believe. This is a WCF service. Incidentally, we're going with streaming because it should alleviate server load during high use and also because our files are going to be very large (hundreds of MB). The files could be compressed or uncompressed. I think that because we're streaming the data, the Content-Encoding header is not available, nor is Content-Length. I only see "Transfer-Encoding" = Identity in my response.
I am using the AFNetworking library to write these files to disk with AFHTTPRequestOperation's inputStream and outputStream. I have also tried AFDownloadRequestOperation, with similar results.
Now, the AFNetworking docs state that compressed files will automatically be decompressed (via NSURLConnection, I believe) after download, and this is not happening. I write them to my documents directory with no problems, yet they are still zipped. I can unzip them manually as well, so the files are not corrupted. Do they not auto-unzip because I'm streaming the data and because Content-Encoding is not specified?
What I'd like to know:
Why are my compressed files not decompressing automatically? Is it because of streaming? I know I could use another library to decompress afterward, but I'd like to avoid that if possible.
How exactly does NSURLConnection know when to decompress a downloaded file automatically? I can't find this anywhere in the docs. Is it tied to a header value?
Any help would be greatly appreciated.
NSURLConnection will decompress automatically when the appropriate Content-Encoding (e.g. gzip) is available in the response header. That's down to your server to arrange.

Transferring large files with web services

What is the best way to transfer large files with web services? Presently we are using the straightforward option of transferring the binary data by converting it to base64 and embedding the base64 encoding in the SOAP envelope itself. But it slows down application performance considerably. Please suggest something for performance improvement.
In my opinion the best way to do this is to not do this!
Web services are not designed to transfer large files. You should really transfer a URL to the file and let the receiver of the message pull the file itself.
IMHO that would be a better way to do this than encoding and sending it.
Check out MTOM, a W3C standard designed to transfer binary files through SOAP.
From Wikipedia:
MTOM provides a way to send the binary data in its original binary form, avoiding any increase in size due to encoding it in text.
Related resources:
SOAP Message Transmission Optimization Mechanism
Message Transmission Optimization Mechanism (Wikipedia)

Multi-part gzip file random access (in Java)

This may fall in the realm of "not really feasible" or "not really worth the effort" but here goes.
I'm trying to randomly access records stored inside a multi-part gzip file. Specifically, the files I'm interested in are compressed Heritrix ARC files. (In case you aren't familiar with multi-part gzip files, the gzip spec allows multiple gzip streams to be concatenated in a single gzip file. They do not share any dictionary information; it is simple binary appending.)
I'm thinking it should be possible to do this by seeking to a certain offset within the file, then scanning for the gzip magic header bytes (i.e. 0x1f8b, as per the RFC), and attempting to read the gzip stream from the following bytes. The problem with this approach is that those same bytes can appear inside the actual data as well, so scanning for those bytes can lead to an invalid position to start reading a gzip stream from. Is there a better way to handle random access, given that the record offsets aren't known a priori?
The BGZF file format, which is compatible with gzip, was developed in the bioinformatics community.
(...) The advantage of BGZF over conventional gzip is that BGZF allows for seeking without having to scan through the entire file up to the position being sought.
In http://picard.svn.sourceforge.net/viewvc/picard/trunk/src/java/net/sf/samtools/util/ , have a look at BlockCompressedOutputStream and BlockCompressedInputStream.java
The design of GZIP, as you have realized, is not friendly to random access.
You can do as you describe, and then if you run into an error in the decompressor, conclude that the signature you found was actually compressed data.
If you finish decompressing, then it's easy to verify the validity of the stream just decompressed, via the CRC32.
If the files are not so big, you might consider just de-compressing all of the entries in series, and retaining the offsets of the signatures so as to build a directory. As you decompress, dump the bytes to a bit bucket. At that point you will have generated a directory, and you can then support random access based on filename, date, or other metadata.
This will be reasonably fast for files below 100k. Just as a guess, if you had 10 files of around 100k each, it would probably be done in 2 s on a modern CPU. This is what I mean by "reasonably fast". But only you know the performance requirements of your application.
Do you have a GZIPInputStream class? If so, you are halfway there.
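The scan-and-try approach the question and answers describe is language-neutral; in Java it would pair a RandomAccessFile (to seek) with a GZIPInputStream or Inflater over the candidate bytes. Since the rest of this page leans toward C/C++, here is a hypothetical, self-contained sketch of the same idea using zlib: it builds a toy buffer containing some junk (including a false 0x1f 0x8b pair) followed by one real gzip member, then scans for the magic bytes and lets the decompressor reject the false positive when the rest of the header fails to parse.

```cpp
// Hypothetical sketch of the scan-and-try idea with zlib: build a buffer that
// contains junk (including a false 0x1f 0x8b pair) followed by one real gzip
// member, then scan for the magic bytes and try to inflate at each match.
#include <zlib.h>
#include <cstdio>
#include <string>
#include <vector>

// Gzip-compress `text`; windowBits 15+16 selects the gzip wrapper.
static std::vector<unsigned char> gzip(const std::string& text) {
    z_stream zs{};
    deflateInit2(&zs, Z_DEFAULT_COMPRESSION, Z_DEFLATED, 15 + 16, 8, Z_DEFAULT_STRATEGY);
    std::vector<unsigned char> out(deflateBound(&zs, text.size()));
    zs.next_in   = reinterpret_cast<Bytef*>(const_cast<char*>(text.data()));
    zs.avail_in  = static_cast<uInt>(text.size());
    zs.next_out  = out.data();
    zs.avail_out = static_cast<uInt>(out.size());
    deflate(&zs, Z_FINISH);
    out.resize(zs.total_out);
    deflateEnd(&zs);
    return out;
}

// Try to inflate one gzip member starting at buf[start]. zlib validates the
// header and the trailing CRC32, so Z_STREAM_END means a real, complete
// member; anything else means the match was a false positive (or truncated).
static bool try_gzip_member(const std::vector<unsigned char>& buf, size_t start) {
    z_stream zs{};
    if (inflateInit2(&zs, 15 + 16) != Z_OK)
        return false;
    std::vector<unsigned char> out(1 << 16);
    zs.next_in  = const_cast<Bytef*>(buf.data() + start);
    zs.avail_in = static_cast<uInt>(buf.size() - start);
    int rc;
    do {
        zs.next_out  = out.data();                      // output is discarded;
        zs.avail_out = static_cast<uInt>(out.size());   // we only test validity
        rc = inflate(&zs, Z_NO_FLUSH);
    } while (rc == Z_OK);
    inflateEnd(&zs);
    return rc == Z_STREAM_END;
}

int main() {
    // Toy stand-in for "a window of bytes read at some random file offset".
    std::vector<unsigned char> window = {0x00, 0x1f, 0x8b, 0xff, 0x42};  // junk + fake magic
    std::vector<unsigned char> member = gzip("one ARC record");
    window.insert(window.end(), member.begin(), member.end());

    for (size_t i = 0; i + 1 < window.size(); ++i) {
        if (window[i] == 0x1f && window[i + 1] == 0x8b && try_gzip_member(window, i)) {
            std::printf("gzip member starts at offset %zu\n", i);
            break;
        }
    }
    return 0;
}
```

A real implementation over Heritrix ARC files would read a window starting at the chosen offset (pread or a memory map), widen the window when a candidate member runs past its end, and, as the last answer suggests, record the verified offsets into a directory so that later lookups become cheap.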