How much does gzipping burden the client? - web-services

I'm optimizing our web service and have heard about gzip.
It would be good if we could reduce the network load using gzip, but I'm a little worried about how much unpacking overhead it will bring to the client.
In particular, our service uses JavaScript heavily, which means that page rendering in the web browser already costs CPU time.
I'm not sure whether spending CPU time decompressing gzip responses (instead of running JavaScript) would still have a net positive effect on our service.

Things like HTML and JavaScript libraries, particularly static files, are good candidates for compression. Images aren't - they're already compressed.
Decompression of gzip-compressed data is very fast compared to most internet connections - a quick test on my PC (AMD Phenom, 2.8 GHz) gives a decompression rate of about 170 MB/second on a single core. So a ~200 KB JavaScript file would be decompressed by a modern browser on a modern PC in about 2 milliseconds, and JavaScript typically compresses to about 25% of its original size (~35% if it is already minified).
Of course, just what proportion of your network load is made up of such compressible JavaScript is another matter.
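If you want numbers for your own files, a quick test is easy to script. A minimal sketch in Python (the bundle file name is a placeholder - point it at one of your own static files; your timings will differ):

    import gzip
    import time

    # Placeholder path - substitute one of your own JavaScript bundles.
    with open("app.bundle.js", "rb") as f:
        original = f.read()

    compressed = gzip.compress(original, 6)

    start = time.perf_counter()
    restored = gzip.decompress(compressed)
    elapsed_ms = (time.perf_counter() - start) * 1000

    print(f"original:   {len(original) / 1024:.1f} KB")
    print(f"compressed: {len(compressed) / 1024:.1f} KB "
          f"({100 * len(compressed) / len(original):.0f}% of original)")
    print(f"decompress: {elapsed_ms:.2f} ms")
    assert restored == original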

Related

Is InnoDB compression compatible with full-text search, and is the memory compressed too?

I want to know whether full-text search can be used on compressed InnoDB tables, whether the compression reduces both memory and disk usage or only disk, and whether there is a performance impact from using compression.
"Compatibility" is easily answered by trying in a tiny table. I think it is compatible because the data is uncompressed whenever it comes into the buffer_pool.
"Compressed" is likely to save disk space, but the numbers I have heard are only 2x. Ordinary text usually compresses 3x, but InnoDB has headers, etc, that are not compressed. (JPG does not compress.)
As for reducing memory (buffer_pool) -- It is likely to consume extra memory because both the compressed and uncompressed copies of the data are in memory at least some of the time.
A reference: https://dev.mysql.com/doc/refman/8.0/en/innodb-compression-internals.html , plus pages around it.
My opinion is that InnoDB's compression is rarely useful. Instead, I recommend compressing and decompressing individual columns in the client, thereby offloading that CPU task from the server. But that would not work with FULLTEXT, so maybe InnoDB's compression would be useful for your application after all.
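For the non-FULLTEXT case, the client-side approach is straightforward. A minimal sketch in Python using zlib (the table and column names are made up, and the commented-out database calls are only indicative):

    import zlib

    def pack_column(text: str) -> bytes:
        """Compress a text value before sending it to MySQL (store it in a BLOB/VARBINARY column)."""
        return zlib.compress(text.encode("utf-8"), 6)

    def unpack_column(blob: bytes) -> str:
        """Decompress a value fetched from MySQL back into text."""
        return zlib.decompress(blob).decode("utf-8")

    # Illustrative usage with a DB-API cursor; 'articles' and 'body_compressed' are hypothetical names.
    # cursor.execute("INSERT INTO articles (id, body_compressed) VALUES (%s, %s)",
    #                (42, pack_column(article_text)))
    # cursor.execute("SELECT body_compressed FROM articles WHERE id = %s", (42,))
    # article_text = unpack_column(cursor.fetchone()[0])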

Streaming File Delta Encoding/Decoding

Here's the problem - I want to generate the delta of a binary file (> 1 MB in size) on a server and send the delta to a memory-constrained (low on RAM and no dynamic memory) embedded device over HTTP. Deltas are preferred (as opposed to sending the full binary file from the server) because of the high cost involved in transmitting data over the wire.
Trouble is, the embedded device cannot decode deltas and create the contents of the new file in memory. I have looked into various binary delta encoding/decoding algorithms like bsdiff, VCDiff etc. but was unable to find libraries that supported streaming.
Perhaps, rather than asking if there are suitable libraries out there, are there alternate approaches I can take that will still solve the original problem (send minimal data over the wire)? Although it would certainly help if there are suitable delta libraries out there that support streaming decode (written in C or C++ without using dynamic memory).
Maintain a copy on the server of the current file as held by the embedded device. When you want to send an update, XOR the new version of the file with the old version and compress the resultant stream with any sensible compressor. (Algorithms which allow high-cost encoding in exchange for low-cost decoding would be particularly helpful here.) Send the compressed stream to the embedded device, which reads the stream, decompresses it on the fly, and XORs the result directly into (a copy of) the target file.
If your updates are such that the file content changes little over time and retains a fixed structure, the XOR stream will be predominantly zeroes, and will compress extremely well: number of bytes transmitted will be small, effort to decompress will be low, memory requirements on the embedded device will be minimal. The further your model is from these assumptions, the less this approach will gain you.
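A minimal sketch of that idea in Python (file names and the block size are placeholders, and gzip stands in for whichever compressor you pick; the device side works in fixed-size blocks so memory use stays bounded):

    import gzip

    CHUNK = 4096  # process the files in fixed-size blocks

    def make_xor_delta(old_path: str, new_path: str, delta_path: str) -> None:
        """Server side: XOR the old and new file block by block and gzip the result."""
        with open(old_path, "rb") as old, open(new_path, "rb") as new, \
             gzip.open(delta_path, "wb") as out:
            while True:
                a = old.read(CHUNK)
                b = new.read(CHUNK)
                if not b:
                    break
                a = a.ljust(len(b), b"\x00")  # pad if the new file is longer than the old one
                out.write(bytes(x ^ y for x, y in zip(a, b)))

    def apply_xor_delta(target_path: str, delta_path: str) -> None:
        """Device side: decompress the stream on the fly and XOR it into the target file in place."""
        with open(target_path, "r+b") as target, gzip.open(delta_path, "rb") as delta:
            offset = 0
            while True:
                d = delta.read(CHUNK)
                if not d:
                    break
                target.seek(offset)
                current = target.read(len(d)).ljust(len(d), b"\x00")
                target.seek(offset)
                target.write(bytes(x ^ y for x, y in zip(current, d)))
                offset += len(d)
            target.truncate(offset)  # trim if the new file is shorter than the old one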
Since you said the delta could be arbitrarily random (from zero delta to a completely different file), compression of the delta may be a lost cause. Lossless compression of random binary data is theoretically impossible. Also, since the embedded device has limited memory anyway, using a sophisticated - and therefore computationally expensive - library for compression/decompression of the occasional "simple" delta will probably be infeasible.
I would recommend simply sending the new file to the device in raw byte format, and overwriting the existing old file.
As Kevin mentioned, compressing random data should not be your goal. A few more comments about the type of data you're working with would be helpful. Context is key in compression.
You used the term image, which makes it sound like the classic video codec challenge. If you've ever seen weird video aliasing effects that affect the portion of the frame that has changed, and then suddenly everything clears up, you've likely witnessed a key frame followed by a series of delta frames where the delta frames were not properly applied.
In this model, the server decides what's cheaper:
complete key frame
delta commands
The delta commands are communicated as a series of write instructions that can overlay the client's existing buffer.
Example Format:
[Address][Length][Repeat][Delta Payload]
[Address][Length][Repeat][Delta Payload]
[Address][Length][Repeat][Delta Payload]
There are likely a variety of methods for computing these delta commands. A brute force method would be:
Perform Smith-Waterman alignment between the two images.
Compress the resulting transform into delta commands.
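As a minimal sketch of how the device might apply such commands (Python; the 4-byte address / 2-byte length / 2-byte repeat layout, and the reading of Repeat as "write the payload that many times back to back", are just one plausible interpretation):

    import struct

    HEADER = struct.Struct(">IHH")  # illustrative layout: 4-byte address, 2-byte length, 2-byte repeat

    def encode_command(address: int, payload: bytes, repeat: int = 1) -> bytes:
        """Server side: pack one delta command as [Address][Length][Repeat][Delta Payload]."""
        return HEADER.pack(address, len(payload), repeat) + payload

    def apply_commands(buffer: bytearray, stream) -> None:
        """Device side: read commands from a stream and overlay them onto the existing buffer."""
        while True:
            header = stream.read(HEADER.size)
            if not header:
                break
            address, length, repeat = HEADER.unpack(header)
            payload = stream.read(length)
            for i in range(repeat):
                offset = address + i * length
                buffer[offset:offset + length] = payload

    # Example: overwrite 4 bytes at offset 16, twice in a row.
    # import io
    # buf = bytearray(64)
    # apply_commands(buf, io.BytesIO(encode_command(16, b"\xde\xad\xbe\xef", repeat=2)))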

When to compress HTTP POST data?

I am writing a client that frequently sends small portions of data via HTTPS. The data can be anything from 50 UTF-8 chars to 10k chars - mostly human readable log data. I am using RFC standard HTTP compression.
I need to optimise for CPU consumption. I wonder if there is any threshold, something like: if a string is more than 100 chars, then it is worth doing compression.
Should I always apply compression to the HTTP payload, or only when it's worth doing?
In my opinion, the CPU overhead caused by compression on very small files is close to zero. So, I don't think it's worth doing a test on the file size.
The most worrying aspect of your description is actually that you apparently have many small requests over HTTPS. If it's not already the case, I would recommend enabling SSL session caching in order to avoid too many SSL handshakes, which are likely to consume more CPU than compressing 100 bytes or so.
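As a rough sketch of the client side (Python with the requests library; the URL is a placeholder, and it assumes your server accepts Content-Encoding: gzip on request bodies, which not all servers do), reusing one session keeps the TLS connection alive between the small requests:

    import gzip
    import requests

    # Reusing one Session keeps the underlying TLS connection alive, so the expensive
    # handshake is not repeated for every small request.
    session = requests.Session()

    def post_log(url: str, text: str) -> requests.Response:
        """POST a small log payload, gzip-compressed."""
        body = gzip.compress(text.encode("utf-8"))
        headers = {
            "Content-Type": "text/plain; charset=utf-8",
            "Content-Encoding": "gzip",
        }
        return session.post(url, data=body, headers=headers)

    # resp = post_log("https://example.com/logs", "a few hundred bytes of log data")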

When to use raw binary and when base64?

I want to develop a service which receives files from users. At first I was planning to implement uploads using raw binary in order to save time (base64 increases file size by about 33%), but reading about base64, it seems to be very useful if you don't want problems uploading files.
The question is: what are the downsides of implementing raw binary uploads, and in which cases does it make sense? In this case I will develop both the client and the server, so I will have control over these two, but what about routers or the network - can they corrupt data that is not base64-encoded?
I'm trying to investigate what Dropbox or Google Drive do and why, but I can't find an article.
You won't have any problems using raw binary for file uploads. All Internet Protocol networking hardware is required to be 8-bit clean - that is, to transmit all 8 bits of every byte/octet.
If you choose to use the TCP protocol, it guarantees reliable transmission of octets (bytes). Encoding using base64 would be a waste of time and bandwidth.
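For illustration, here is what the two options look like with Python's requests library (the endpoint is a placeholder); the only real difference is the roughly 33% size penalty of the base64 variant:

    import base64
    import requests

    def upload_raw(url: str, path: str) -> requests.Response:
        """Send the file bytes as-is; TCP already guarantees they arrive intact."""
        with open(path, "rb") as f:
            data = f.read()
        return requests.post(url, data=data,
                             headers={"Content-Type": "application/octet-stream"})

    def upload_base64(url: str, path: str) -> requests.Response:
        """Same file, base64-encoded: about 33% more bytes on the wire for no benefit here."""
        with open(path, "rb") as f:
            data = base64.b64encode(f.read())
        return requests.post(url, data=data,
                             headers={"Content-Type": "text/plain"})

    # upload_raw("https://example.com/upload", "photo.jpg")  # placeholder endpoint and file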

How large does a file need to be to benefit from gzip compression?

Since gzip takes time to pack on the server side, and more time to unpack on the client side, how large does a file need to be in order to benefit from it?
Are there any real numbers out there that demonstrate the efficacy of gzip at common download speeds?
It will depend heavily on the nature of the data to be transferred (i.e., how compressible the data you are working with is).
If you are concerned about the time it takes to get the original file on the client side you should compare:
a) time taken to compress the file in the server + time taken to transfer the compressed file from the server to the client + time taken to decompress the file in the client
b) time taken to transfer the original (uncompressed) file from the server to the client.
I believe you would have to try and measure these figures using actual sample data of your application.
For example, if you were dealing with video files (already compressed, so effectively incompressible), then it would probably be better just to send the file without compressing it.
However, if, for example, you were dealing with text files (highly compressible), then the overall time for a) might be lower than for b).
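A rough way to compare a) and b) on your own sample data is to time the compression locally and estimate the transfer at your link speed. A minimal sketch in Python (the assumed bandwidth and the file path are placeholders):

    import gzip
    import time

    LINK_BYTES_PER_SEC = 1_000_000  # assumed ~8 Mbit/s link; substitute your real download speed

    def compare(path: str) -> None:
        with open(path, "rb") as f:
            original = f.read()

        t0 = time.perf_counter()
        compressed = gzip.compress(original, 6)
        compress_s = time.perf_counter() - t0

        t0 = time.perf_counter()
        gzip.decompress(compressed)
        decompress_s = time.perf_counter() - t0

        a = compress_s + len(compressed) / LINK_BYTES_PER_SEC + decompress_s
        b = len(original) / LINK_BYTES_PER_SEC

        print(f"a) compress + transfer + decompress: {a:.3f} s")
        print(f"b) transfer uncompressed:            {b:.3f} s")

    # compare("page.html")  # placeholder; use actual sample data from your application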
Not very large. gzip compresses text very well, even small files. CPU is much cheaper than transfer: a 1 MB file compressed to 100 KB will download ten times faster. You should not gzip JPGs, MP3s, or any other already-compressed data.