I understand that GZIP is a combination of LZ77 and Huffman coding and can be configured with a level between 1-9 where 1 indicates the fastest compression (less compression) and 9 indicates the slowest compression method (best compression).
My question is, does the choice of level only impact the compression process or is there an additional cost also incurred in decompression depending on the level used to compress?
I ask because many web servers will typically GZIP responses on the fly if the client supports it, e.g. Accept-Encoding: gzip. I appreciate that when doing this on the fly a level such as 6 might be a good choice for the average case, since it gives a good balance between speed and compression.
However, if I have a bunch of static assets that I can GZIP just once ahead of time - and never need to do this again - would there be any downside to using the highest but slowest compression level? That is, would the client now incur additional overhead that it would not have incurred had a lower compression level been used?
Great question, and an underexposed issue. Your intuition is solid – for some compression algorithms, choosing the max level of compression can require more work from the decompressor when it's unpacked.
Luckily, that's not true for gzip – there's no extra overhead for the client/browser to decompress more heavily compressed gzip files (e.g. choosing 9 for compression instead of 6, assuming the standard zlib codebase that most servers use). The best measure for this is decompression rate, which for present purposes is in units of MB/sec, while also monitoring overhead like memory and CPU. Simply going by decompression time is no good because the file is smaller at higher compression settings, and we're not controlling for that factor if we're only using a stopwatch.
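To make the "rate, not stopwatch time" point concrete, here is a minimal sketch of how one might measure it with Python's standard gzip module. The function name, the sample data and the repeat count are my own assumptions for illustration; the numbers it prints depend entirely on your machine and input.

```python
import gzip
import time

def decompression_rates(data: bytes, levels=(1, 6, 9), repeats: int = 5) -> dict:
    """Return {level: (compressed_size, MB_per_second)} for each gzip level."""
    results = {}
    for level in levels:
        blob = gzip.compress(data, compresslevel=level)
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            out = gzip.decompress(blob)
            best = min(best, time.perf_counter() - start)
        assert out == data  # the round trip must be lossless
        # The rate is measured against the *uncompressed* size, so a smaller
        # compressed file does not get an unfair advantage.
        rate = len(data) / (1024 * 1024) / best if best > 0 else float("inf")
        results[level] = (len(blob), rate)
    return results

if __name__ == "__main__":
    # Assumed sample input; use your real assets for a meaningful comparison.
    sample = b"The quick brown fox jumps over the lazy dog. " * 20_000
    for level, (size, rate) in decompression_rates(sample).items():
        print(f"level {level}: {size} bytes compressed, {rate:.0f} MB/s")
```

Taking the best of several runs is a design choice to reduce noise from the OS scheduler; it does not measure memory use, which would need a separate tool.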
gzip decompression quickly gets asymptotic in terms of both time-to-decompress and memory usage once you get past level 6 compressed content. The time-to-decompress flatlines for levels 7, 8, and 9 in the test results linked by Marcus Müller, though that's coarse-grained data given in whole seconds.
You'll also notice in those results that the memory requirements for decompression are flat for all levels of compression at 0.1 MiB. That's almost unbelievable, just a degree of excellence in software that we rarely see. Mark Adler and colleagues deserve massive props for what they achieved. gzip is a very nice format.
The memory use gets at your question about overhead. There really is none. You don't gain much with level 9 in terms of browser decompression speed, but you don't lose anything.
Now, check out these test results for a bit more texture. You'll see how the gzip decompression rate is slightly faster with level 9 compressed content than with lower levels (at level 9, decomp rate is about 0.9% faster than at level 6, for example). That is interesting and surprising. I wouldn't expect the rate to increase. That was just one set of test results – it may not hold for other scenarios (and the difference is quite small in any case).
Parting note: Precompressing static files is a good idea, but I don't recommend gzip at level 9. You'll get smaller files than gzip-9 by instead using zopfli or libdeflate. Zopfli is a well-established gzip compressor from Google. libdeflate is new but quite excellent. In my testing it consistently beats gzip-9, but still trails zopfli. You can also use 7-Zip to create gzip files, and it will consistently beat gzip-9. (In the foregoing, gzip-9 refers to using the canonical gzip or zlib application that Apache and nginx use).
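For the precompression step itself, a rough sketch follows. It uses Python's built-in gzip module (zlib underneath) purely as a stand-in, since zopfli and libdeflate are not in the standard library; the suffix list, the function name and the ".gz sibling" naming are assumptions for illustration.

```python
import gzip
from pathlib import Path

# Assumed list of file types worth compressing; adjust for your site.
TEXT_SUFFIXES = {".html", ".css", ".js", ".svg", ".json", ".txt"}

def precompress(root: Path, level: int = 9) -> list[Path]:
    """Write a .gz sibling next to each compressible file under root."""
    written = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in TEXT_SUFFIXES:
            continue
        raw = path.read_bytes()
        # mtime=0 keeps the output byte-for-byte reproducible between runs.
        packed = gzip.compress(raw, compresslevel=level, mtime=0)
        if len(packed) < len(raw):  # keep the .gz only if it saves space
            target = path.with_name(path.name + ".gz")
            target.write_bytes(packed)
            written.append(target)
    return written
```

Swapping in a stronger compressor would only change the one line that produces `packed`; the output is still an ordinary gzip stream that any client can decode.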
No, there is no downside on the decompression side when using the maximum compression level. In fact, there is a slight upside, in that better-compressed data decompresses faster. The reason is simply fewer compressed bits that the decompressor has to process.
Actually, in real world measurements a higher compression level yields lower decompression times (which might be primarily caused by the fact that you need to handle less permanent storage and less RAM access).
Since, actually, most things that happen at a client with the data are rather expensive compared to gunzipping, you shouldn't really care about that, at all.
Also be advised that static image assets have usually already had Huffman/zlib coding applied (PNG simply uses zlib!), so you won't gain much by gzipping them. In fact, small images (icons, for example) often fit into a single TCP packet (ignoring the HTTP header, which is sometimes bigger than the image itself), so you get no speed gain, though you would save money on transfer volume if you deliver terabytes of small images. Now, may I presume you're not Google itself...
Also, I'd like to point you to higher-level optimization, like tools that can transform your JavaScript code into a more compact form (e.g. removing whitespace, renaming private variables from my_mother_really_likes_this_number_of_unicorns to m1); also, things like jQuery come in a "precompressed" form. The same exists for HTML. It doesn't make things easier to debug, but since you seem to be interested in ultimate space saving...
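A quick way to see why gzipping already-compressed assets gains little: compress some text once with zlib (a stand-in for a PNG-style payload, which is an assumption made for illustration, not a real image) and then gzip the result again. The word list and seed are made up.

```python
import gzip
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
words = [b"unicorn", b"gzip", b"asset", b"image",
         b"packet", b"header", b"icon", b"server"]
text = b" ".join(rng.choice(words) for _ in range(20000))  # compressible input

already = zlib.compress(text, 9)  # stand-in for an already-deflated payload

gz_text = gzip.compress(text, compresslevel=9)
gz_already = gzip.compress(already, compresslevel=9)

print(len(text), "->", len(gz_text))        # big saving on plain text
print(len(already), "->", len(gz_already))  # little or no saving the second time
```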
Related
I know that gzip supports 9 compression levels, from fast to strong.
The decompression algorithm does not care about the compression level at all.
Is it possible to reach a "higher" level than 9 by another tool than the common gzip application?
I mean, someone could have created a modified gzip compressor which is more effective than gzip level 9.
The background is that I have a webserver which hosts compressed gz files. It would be nice to reduce the sizes of those files and I do not care how long my server has to work in order to reduce those files even by 1 byte at the end. It is a one-time task, so it does not matter.
Is there something like a hacked version of gzip supporting higher levels or offering higher compression?
Yes. It's called zopfli. It is painfully slow, but will compress about 5% better than zlib level 9. zopfli is built in to pigz, which is a gzip equivalent that makes use of multiple processors and cores. Compression level 11 in pigz invokes the zopfli compressor. (pigz goes up to 11. Get it?) Using multiple cores on large inputs helps mitigate the slowness of zopfli.
How could one predict the execution time and/or resulting compression ratio when compressing a file with a given lossless compression algorithm? I am mostly concerned with local compression, since if you know the time and compression ratio for local compression, you can easily calculate the time for network compression based on the currently available network throughput.
Let's say you have some information about file such as size, redundancy, type (we can say text to keep it simple). Maybe we have some statistical data from actual prior measurements. What else would be needed to perform prediction for execution time and/or compression ratio (even if a very rough one).
For local compression alone, the size of the file would have an effect, since actually reading and writing data to/from storage media (SD card, hard drive) would take up the dominant portion of the total execution time.
The actual compression portion will probably depend on redundancy/type, since most compression algorithms work by compressing small blocks of data (100 KB or so). For example, larger HTML/JavaScript files compress better since they have higher redundancy.
I guess there is also a problem of scheduling, but this could probably be ignored for rough estimation.
This is a question that has been in my head for quite some time. I have been wondering whether some low-overhead code (say, on the server) can predict how long it would take to compress a file before performing the actual compression.
Sample the file by taking 10-100 small pieces from random locations. Compress them individually. This should give you a lower bound on compression ratio.
This only returns meaningful results if the chunks are not too small. The compression algorithm must be able to make use of a certain size of history to predict the next bytes.
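A minimal sketch of that sampling idea, using zlib from Python's standard library. The function name and the defaults (20 pieces of 64 KiB, level 6) are assumptions picked for illustration; the piece size is deliberately larger than DEFLATE's 32 KiB window so each sample has enough history to work with.

```python
import random
import zlib

def estimate_ratio(data: bytes, pieces: int = 20, piece_size: int = 64 * 1024,
                   level: int = 6, seed: int = 0) -> float:
    """Estimate the uncompressed/compressed ratio from randomly placed samples."""
    if len(data) <= piece_size:
        # Too small to sample: just compress the whole thing.
        return len(data) / max(1, len(zlib.compress(data, level)))
    rng = random.Random(seed)
    raw_total = packed_total = 0
    for _ in range(pieces):
        start = rng.randrange(0, len(data) - piece_size + 1)
        piece = data[start:start + piece_size]
        raw_total += len(piece)
        # Each piece is compressed on its own, so it cannot exploit matches
        # elsewhere in the file; the estimate therefore tends to be pessimistic.
        packed_total += len(zlib.compress(piece, level))
    return raw_total / packed_total
```

Timing the same sample compressions would give a rough throughput figure in the same way, though I/O cost is not captured by this sketch.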
It depends on the data, but with images you can take small samples. Downsampling would change the result. Here is an example: PHP - Compress Image to Meet File Size Limit.
The compression ratio can be calculated with these formulas:
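The formulas themselves are not shown here; the commonly used definitions, which I am assuming are the ones meant, are:

```latex
\[
\text{compression ratio} = \frac{\text{uncompressed size}}{\text{compressed size}},
\qquad
\text{space saving} = 1 - \frac{\text{compressed size}}{\text{uncompressed size}}
\]
```

For example, a 10 MB file that compresses to 2 MB has a compression ratio of 5 (often written 5:1) and a space saving of 0.8, i.e. 80%.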
And the performance benchmarking can be done using V8 or Sunspider.
You can also use algorithms like DEFLATE or LZMA to compute the mechanism. PPM (Prediction by Partial Matching) can be used for prediction.
I am trying to pack and compress game client resource data using zlib. If I compress the data, disk I/O is reduced because the files are smaller, but CPU usage increases when decompressing.
Question1
If a resource used for rendering is compressed, both rendering and decompressing use the CPU, so I think it would be rather slow. Is that right?
Without compression, disk I/O is unchanged and no additional CPU usage occurs. And if you read only a portion of the file, disk I/O can be reduced by using the CreateFileMapping() and MapViewOfFile() functions.
Question2
In the case of a resource such as an uncompressed image (for example TGA, not PNG), where we have to read the whole file, we can't take advantage of CreateFileMapping() and MapViewOfFile(), so I think compressing the resource is better. What do you think?
Question3
What do you think about compressing resource data when packing?
Resources for games are not only packed to reduce size, but also to reduce the number of seeks by collapsing many small files into one, which matters a lot more than the size on disk. A single unnecessary seek on a conventional hard disk (on the order of 10 ms) costs about as much time as reading roughly a megabyte of data sequentially. Even if your "compression" consists of only concatenating small files together, you already gain performance.
As a small bonus, having resources packed in an archive somewhat obscures them from less computer-savvy people, deterring them from modifying game assets (though admittedly, this is not a very big hurdle!).
Q1: Depending on what compression algorithm you use, you can easily get upwards of 1 GB/s decompression (close to 2 GB/s with a fast CPU). Sequential disk I/O is still around 300-400 MB/s maximum even on solid state (and usually less). Random access disk I/O is 5-20 times slower, depending on the disk and the access pattern.
On the other hand, you can get as little as a few dozen kilobytes per second in decompression speed if you choose a slow algorithm, which is much worse than just loading more data from disk. The secret is to choose an algorithm that compresses reasonably well (not perfectly, just reasonably) and runs at good decompression speed. Compression speed usually does not matter, since this is done offline once. Candidate algorithms are for example LZF, Snappy, or LZ4.
File mapping can generally be used regardless of whether the contents are compressed. Also, file mapping is not only an advantage for very small portions; on the contrary, the larger your reads, the more advantageous it becomes (very small views may actually be faster using conventional reads).
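As a small illustration of reading just a portion of a file through a mapping, here is a sketch using Python's mmap module, which to my knowledge sits on top of the same file-mapping facilities on Windows that the question names. The helper name is made up, and whether the bytes in the view are compressed makes no difference to the mapping itself.

```python
import mmap

def read_slice(path: str, offset: int, length: int) -> bytes:
    """Return `length` bytes starting at `offset` using a read-only mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; only the pages actually touched are
        # read from disk. Note that mapping an empty file raises ValueError.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[offset:offset + length]
```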
Q2: Uncompressed images do not normally occur in a game. Most of the time you will want to use DXT compression, not so much to reduce disk I/O but to reduce memory and PCIe bandwidth requirements and GPU memory consumption. DXT is a very poor compression, but it works in hardware and has an exactly predictable compression ratio. You can compress DXT-compressed textures again with a conventional general-purpose compressor (with varying rates, depending on what compressor you used, there are some that are especially optimized for that purpose).
Q3: Packing resources is definitely advisable for any non-trivial game.
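To show what "collapsing many small files into one" can look like, here is a minimal sketch of a packed archive with an index. The layout (4-byte index length, JSON index, then the blobs) and the function names are invented for illustration, and zlib is used only because it ships with Python and is what the question mentions; a faster codec such as LZ4 would slot into the same place.

```python
import json
import struct
import zlib

def pack(files: dict[str, bytes]) -> bytes:
    """Assumed layout: 4-byte index length, JSON index, concatenated blobs."""
    index, blobs, offset = {}, [], 0
    for name, raw in files.items():
        blob = zlib.compress(raw, 9)  # compression speed is irrelevant offline
        index[name] = (offset, len(blob))
        blobs.append(blob)
        offset += len(blob)
    header = json.dumps(index).encode("utf-8")
    return struct.pack("<I", len(header)) + header + b"".join(blobs)

def unpack_one(archive: bytes, name: str) -> bytes:
    """Look up one entry in the index and decompress only that entry."""
    (header_len,) = struct.unpack_from("<I", archive, 0)
    index = json.loads(archive[4:4 + header_len])
    offset, size = index[name]
    start = 4 + header_len + offset
    return zlib.decompress(archive[start:start + size])
```

In a real game the archive would stay on disk and each entry would be fetched with a single read at a known offset, instead of one file open and seek per asset.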
A bit more detail: we're already trying to take the most advantage of zipmaps, ziplists, etc, and I'm wondering whether these representations are already compressed, or are just serialized hashes and lists; does compression significantly reduce memory usage?
Also, does compression overhead at the app server layer get offset by lower network usage? StackOverflow's experience suggests it does, any other opinions?
In brief, does it make sense - for both short and longer strings?
Redis does not compress your values, and whether you should compress them yourself depends a lot on the size of the strings you are going to store. For big strings (hundreds of kilobytes and more) it's probably worth the extra CPU cycles on the client side, just like it is when you serve web pages, but for shorter strings it's likely a waste of time. Short strings generally don't compress much, so the gain would be too small.
There's a practical way to get good compression, even for very small strings (50 bytes!) -
If your values are somewhat similar to each other - for example, they're JSON representations of a few related classes of objects - you can precompute a compressor/decompressor dictionary based on some example text.
It sounds complicated, but it's simple in practice - and simpler still with the right wrapper code to handle it.
Here's a Python implementation:
https://github.com/internetarchive/openlibrary/blob/master/openlibrary/utils/compress.py
and here's a wrapper for compressing a specific class of strings: (short JSON records)
https://github.com/internetarchive/openlibrary/blob/master/openlibrary/utils/olcompress.py
One catch: to do this efficiently, your compression library must support 'cloning' the internal state. (The Python library does) You can implement something similar by prepending the example text when compressing, but this means paying an extra computation cost.
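For a feel of how this works, here is a minimal sketch using the preset-dictionary support in Python's zlib module (the zdict parameter) rather than the state-cloning approach mentioned above; it is not the code from the linked repository. The sample dictionary text and the function names are hypothetical.

```python
import zlib

# Hypothetical example text resembling the values to be stored. In practice
# you would build this from real records, with the most common substrings
# placed towards the end of the dictionary.
SAMPLE = b'{"type": "edition", "title": "", "authors": [], "key": "/books/"}'

def compress(value: bytes) -> bytes:
    c = zlib.compressobj(level=9, zdict=SAMPLE)
    return c.compress(value) + c.flush()

def decompress(blob: bytes) -> bytes:
    # The decompressor must be given exactly the same dictionary.
    d = zlib.decompressobj(zdict=SAMPLE)
    return d.decompress(blob) + d.flush()
```

The catch is operational: the dictionary becomes part of your data format, so it has to be versioned and kept for as long as any value compressed with it exists.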
Thanks to solrize for this awesome trick.
Redis and its clients are typically I/O bound, and the I/O costs are typically at least two orders of magnitude greater than the rest of the request/reply sequence. Smaller payloads will give you higher throughput and lower latencies.
I do not believe there are any hard and fast rules beyond: cost of compression << I/O gains. You should benchmark it and find the sweet spot when setting the lower bound, but the MTU of your network is not a bad starting point for the lower bound.
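One way to apply a lower bound like that on the client side is sketched below. The 1400-byte threshold (roughly one Ethernet MTU of payload) and the one-byte marker format are assumptions for illustration, not anything Redis defines; the encoded bytes are what you would pass to SET and get back from GET.

```python
import zlib

MIN_SIZE = 1400                   # assumed threshold; benchmark your own
RAW, DEFLATED = b"\x00", b"\x01"  # hypothetical one-byte format marker

def encode(value: bytes) -> bytes:
    """Compress only values that are large enough and actually shrink."""
    if len(value) >= MIN_SIZE:
        packed = zlib.compress(value, 6)
        if len(packed) < len(value):
            return DEFLATED + packed
    return RAW + value

def decode(stored: bytes) -> bytes:
    marker, body = stored[:1], stored[1:]
    return zlib.decompress(body) if marker == DEFLATED else body
```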
When reading files off of a hard drive, mmap is generally regarded as a good way to quickly get data into memory. When working with optical drives, accesses take more time and you have a higher latency to worry about. What approach/abstraction do you use to hide/eliminate as much latency and/or overall load time of the optical drive as possible?
There's no real abstraction you can employ. Optical drives have very specific characteristics that must be optimized for to get the best performance.
Some tips:
The biggest killer on optical drives is seek time. Where possible make sure all the files you are reading are sequential on disc and as closely packed as possible. If you must seek then seek in one direction and as infrequently as possible.
Asynchronous reading can also massively improve performance. If you need to load and process files A, B and C, then before processing A you should start reading file B, and while processing B you should be reading file C, and so on.
Generally, the more data you can read in one go the better, e.g. avoid lots of little read() calls. You will only get the theoretical throughput of a disc while reading large amounts of data. Some OSes/drivers will minimize the penalty of reading lots of little files by caching sectors; some will not.
Doing lots of exists(filename) checking can also be detrimental on some filesystems / OSs where only parts of the TOC are cached.
In our applications we usually pack files into one or more "lumped" files and have them ordered sequentially based on their access order. Some files (and directories) are compressed and read in their entirety before being decompressed in memory. This can be a win if you have a directory that contains a multitude of small files (e.g. XML or scripts).
Basically lots of benchmarking and tweaking :)
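The "read B while processing A" tip above can be sketched roughly like this with a single background reader thread. The function name is made up, and a real game would more likely use the platform's native asynchronous I/O; this only shows the overlap pattern.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def load_and_process(paths, process):
    """Process files in order while the next one is read in the background."""
    results = []
    if not paths:
        return results
    # A single worker keeps the reads strictly one after another, so the
    # drive is never asked to seek back and forth between two files.
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(Path(paths[0]).read_bytes)
        for nxt in list(paths[1:]) + [None]:
            data = pending.result()        # wait for the current file
            if nxt is not None:
                # Start reading the next file before processing this one.
                pending = pool.submit(Path(nxt).read_bytes)
            results.append(process(data))  # overlaps with the read above
    return results
```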
Minimize or eliminate seeks by reading in giant chunks of data sequentially from a few files (optimally one).
First, keep in mind that modern optical drives are quite fast at reading sequential data, but seeking is still a lot slower than on hard disks. So if you must seek a lot within a big file (e.g. jump around randomly within a 500+ MB file), it might actually be faster to first copy the whole 500 MB to the hard disk (into a temporary file), which is done with fast sequential reads, perform the operation on the temporary file (much faster because of the hard disk's much shorter access times), and delete the file when you are done with it.
The same applies to a few big files versus many small files. Working with a couple of big files is much faster than working with many small ones, since every time you switch from one small file to another the huge seek time will give you headaches again. This is why many games that ship on optical media pack game data into huge archive files (e.g. all textures of one level are in one huge file instead of one small file per texture), so try to keep data well structured in big files that you can read as sequentially as possible.
HD caching itself is a good technique. There is a game I remember, though I forgot the title, that always kept the 3D data of your environment on the hard disk. While you were moving through the world, it was constantly copying data from DVD to HD. Thus the surrounding 3D landscape was always available on HD for fast access; however, the whole DVD was not copied, only about 200-300 MB were temporarily cached on HD to save space. The only annoying thing was that you often had DVD access "noise" while playing, but most of the time the whole process happened only during CPU idle times, so it did not really affect gameplay. Only if you ran very fast in the same direction for a long time could the DVD drive fall behind, and all of a sudden the game stopped with a loading indicator for a couple of seconds. However, I played this game for days and saw the loading indicator maybe three times in a single week. If you were moving slowly, or not constantly in the same direction, there was never a loading indicator.
Slow drives are going to be slow. Sorry. However, optical drive hardware will normally be optimized to do sequential reads, so if you can make your code work that way you might see some improvement. I doubt you'll see much difference between mmap(), fread(), et al., for sequential access. You might also be able to tune your read buffer size to be a multiple of the drive's block size, if your OS isn't already doing that for you. Optical drives can have large block sizes compared to hard drives, and if your buffers aren't large enough you're paying a price.
I'm not sure that there is a lot that you can do by the time that you are reading it. You could look at the CreateFile API: you can pass hints (the FILE_FLAG_SEQUENTIAL_SCAN and FILE_FLAG_RANDOM_ACCESS flags) that tell Windows you are opening the file for sequential or random access. That is supposed to allow Windows to optimize the caching strategy used for the file.
You can tune the "chunks" that you bite off when reading your file to make them larger or smaller. You might get a slight improvement if you read in chunks that are multiples of the allocation unit size on the disk.
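A small sketch of reading in tunable chunks that are a whole multiple of the block size. The default of 2048 bytes matches the usual data sector size on CD/DVD media, but treat both defaults as assumptions to benchmark, not recommendations; the function name is made up.

```python
def read_in_chunks(path: str, block_size: int = 2048, blocks_per_read: int = 512):
    """Yield the file in large chunks that are a multiple of block_size."""
    chunk = block_size * blocks_per_read  # 1 MiB with the defaults
    # buffering=0 bypasses Python's own buffer so each read() maps to one
    # request of the chosen size; the OS may still cache underneath.
    with open(path, "rb", buffering=0) as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            yield data
```

Timing a full pass over the same file with a few different `blocks_per_read` values is an easy way to find out whether the chunk size matters on a given drive and OS.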
The hardware and media can make a difference. Say you have a DVD drive that reads at 16x. It will require media that is rated at 16x or higher, and some drives don't work well with some media brands. So even if the media meets the ratings, you might not be reading at the maximum speed. (usually a good hardware review on an optical drive will include details like this).
The layout of the files on the optical disk could be important. Was it burned all at once? Was it just mounted as a disk (like a packet-mode R/W?). I don't have experience with this, but given the longer seek times on an optical drive, fragmented files might have a greater impact than they do with a modern hard drive.