I'm trying to pack and compress game client resource data using zlib. Compressing the data reduces disk I/O because the files are smaller, but it increases CPU usage during decompression.
Question 1
If a resource used for rendering is compressed, both decompressing and rendering use the CPU, so I think loading may actually end up slower. Is that right?
Without compression, disk I/O stays the same and there is no additional CPU cost. And if you only need to read a portion of a file, disk I/O can be reduced by using the CreateFileMapping() and MapViewOfFile() functions.
Question 2
In the case of resources such as uncompressed images (for example TGA, not PNG), where the whole file has to be read anyway, we can't take advantage of CreateFileMapping()/MapViewOfFile(), so I think compressing the resource is better. What do you think?
Question 3
What do you think about compressing resource data when packing?
Resources for games are not only packed to reduce size, but also to reduce the number of seeks by collapsing many small files into one, which matters a lot more than the size on disk. A single unnecessary seek on a conventional hard disk costs as much time as reading roughly a megabyte of data. Even if your "compression" consists of nothing more than concatenating small files together, you already gain performance.
As a small bonus, having resources packed in an archive somewhat obscures them from computer-unsavvy people, deterring them from modifying game assets (though admittedly, this is not a very big hurdle!).
Q1: Depending on what compression algorithm you use, you can easily get upwards of 1 GB/s decompression (close to 2 GB/s with a fast CPU). Sequential disk I/O is still around 300-400 MB/s maximum even on solid state (and usually less). Random access disk I/O is 5-20 times slower, depending on the disk and the access pattern.
On the other hand, you can get as little as a few dozen kilobytes per second in decompression speed if you choose a slow algorithm, which is much worse than just loading more data from disk. The secret is to choose an algorithm that compresses reasonably well (not perfectly, just reasonably) and runs at good decompression speed. Compression speed usually does not matter, since this is done offline once. Candidate algorithms are for example LZF, Snappy, or LZ4.
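To put those numbers in context, here is a minimal sketch of an LZ4 round trip, assuming liblz4 is available and that the archive stores each entry's original size next to its compressed block; that layout and the helper names are assumptions for illustration, not a prescribed archive format:

```cpp
// Minimal LZ4 round-trip sketch (assumes liblz4 is available; link with -llz4).
#include <lz4.h>
#include <stdexcept>
#include <vector>

// Compress one resource blob; the caller is expected to store raw.size()
// alongside the compressed bytes so it can size the output buffer later.
std::vector<char> compressResource(const std::vector<char>& raw)
{
    std::vector<char> out(LZ4_compressBound(static_cast<int>(raw.size())));
    const int written = LZ4_compress_default(raw.data(), out.data(),
                                             static_cast<int>(raw.size()),
                                             static_cast<int>(out.size()));
    if (written <= 0) throw std::runtime_error("LZ4 compression failed");
    out.resize(written);
    return out;
}

// Decompress at load time; originalSize comes from the archive's entry header.
std::vector<char> decompressResource(const std::vector<char>& packed, int originalSize)
{
    std::vector<char> out(originalSize);
    const int got = LZ4_decompress_safe(packed.data(), out.data(),
                                        static_cast<int>(packed.size()),
                                        originalSize);
    if (got < 0) throw std::runtime_error("LZ4 decompression failed");
    return out;
}
```

In practice you would benchmark decompressResource() over your real asset set rather than trusting published throughput figures.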
File mapping can generally be used regardless of whether the contents are compressed. Also, file mapping is not only an advantage for very small portions; on the contrary, the larger your reads, the more advantageous it becomes (very small views may actually be faster using conventional reads).
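As a rough illustration of the CreateFileMapping()/MapViewOfFile() route, here is a sketch that maps a whole read-only archive into memory; the helper name and the bare-bones error handling are mine, not part of any particular engine:

```cpp
// Map an entire read-only file into the address space (Win32).
#include <windows.h>

void* mapWholeFile(const wchar_t* path, HANDLE* outFile, HANDLE* outMapping, SIZE_T* outSize)
{
    HANDLE file = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE) return nullptr;

    LARGE_INTEGER size{};
    GetFileSizeEx(file, &size);

    HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
    if (!mapping) { CloseHandle(file); return nullptr; }

    // A zero offset/length maps the whole file; nonzero values would map only a view.
    void* view = MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
    if (!view) { CloseHandle(mapping); CloseHandle(file); return nullptr; }

    *outFile = file;
    *outMapping = mapping;
    *outSize = static_cast<SIZE_T>(size.QuadPart);
    return view;  // caller unmaps with UnmapViewOfFile() and closes both handles
}
```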
Q2: Uncompressed images do not normally occur in a game. Most of the time you will want to use DXT compression, not so much to reduce disk I/O as to reduce memory and PCIe bandwidth requirements and GPU memory consumption. DXT is a very poor compression scheme, but it works in hardware and has an exactly predictable compression ratio. You can compress DXT-compressed textures again with a conventional general-purpose compressor (with varying results depending on which compressor you use; some are especially optimized for that purpose).
Q3: Packing resources is definitely advisable for any non-trivial game.
How would one be able to predict execution time and/or resulting compression ratio when compressing a file with a certain lossless compression algorithm? I am mainly concerned with local compression, since if you know the time and compression ratio for local compression, you can easily calculate the time for network compression based on the currently available network throughput.
Let's say you have some information about the file, such as size, redundancy, and type (say text, to keep it simple). Maybe we have some statistical data from actual prior measurements. What else would be needed to predict execution time and/or compression ratio (even a very rough prediction)?
For local compression alone, the size of the file would have an effect, since actually reading and writing the data to/from the storage media (SD card, hard drive) would take the dominant portion of the total execution time.
The actual compression portion will probably depend on redundancy/type, since most compression algorithms work by compressing small blocks of data (100 KB or so). For example, larger HTML/JavaScript files compress better since they have higher redundancy.
I guess there is also the problem of scheduling, but this could probably be ignored for a rough estimate.
This is a question that has been in my head for quite some time. I have been wondering whether some low-overhead code (say, on the server) could predict how long it would take to compress a file before performing the actual compression.
Sample the file by taking 10-100 small pieces from random locations. Compress them individually. This should give you a lower bound on compression ratio.
This only returns meaningful results if the chunks are not too small. The compression algorithm must be able to make use of a certain size of history to predict the next bytes.
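A sketch of that sampling idea using zlib: compress a handful of fixed-size chunks taken from random offsets and average the result. The chunk size, sample count, and seed are arbitrary assumptions:

```cpp
// Estimate a file's compression ratio by compressing random 128 KB chunks with zlib.
#include <zlib.h>
#include <fstream>
#include <random>
#include <string>
#include <vector>

double estimateCompressionRatio(const std::string& path,
                                int samples = 32, size_t chunkSize = 128 * 1024)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    const std::streamoff fileSize = in.tellg();
    if (fileSize <= static_cast<std::streamoff>(chunkSize)) return 1.0;

    std::mt19937_64 rng(12345);
    std::uniform_int_distribution<std::streamoff> pick(0, fileSize - chunkSize);

    std::vector<char> chunk(chunkSize);
    std::vector<Bytef> packed(compressBound(static_cast<uLong>(chunkSize)));

    uLong rawTotal = 0, packedTotal = 0;
    for (int i = 0; i < samples; ++i) {
        in.seekg(pick(rng));
        in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));

        uLongf packedLen = static_cast<uLongf>(packed.size());
        if (compress2(packed.data(), &packedLen,
                      reinterpret_cast<const Bytef*>(chunk.data()),
                      static_cast<uLong>(chunk.size()), Z_DEFAULT_COMPRESSION) != Z_OK)
            continue;

        rawTotal += static_cast<uLong>(chunk.size());
        packedTotal += packedLen;
    }
    return packedTotal ? static_cast<double>(rawTotal) / packedTotal : 1.0;
}
```

The per-chunk ratio tends to be a bit pessimistic, because each chunk is compressed without the history of the rest of the file, which is exactly the caveat above.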
It depends on the data, but with images you can take small samples. Downsampling would change the result. Here is an example: PHP - Compress Image to Meet File Size Limit.
The compression ratio can be calculated with the usual formulas: compression ratio = uncompressed size / compressed size, and space savings = 1 - (compressed size / uncompressed size).
Performance benchmarking can be done using V8 or SunSpider.
You can also experiment with algorithms like DEFLATE or LZMA to compare their behavior. PPM (Prediction by Partial Matching) can be used for prediction.
I am looking for the fastest way to read a sequential file from disk.
I read in some posts that if I compressed the file using, for example, LZ4, I could achieve better performance than reading the flat file, because I would minimize the I/O operations.
But when I try this approach, scanning an LZ4-compressed file gives me worse performance than scanning the flat file. I didn't try the lz4demo above, but looking at it, my code is very similar.
I have found these benchmarks:
http://skipperkongen.dk/2012/02/28/uncompressed-versus-compressed-read/
http://code.google.com/p/lz4/source/browse/trunk/lz4demo.c?r=75
Is it really possible to improve performance reading a compressed sequential file over an uncompressed one? What am I doing wrong?
Yes, it is possible to improve disk read by using compression.
This effect is most likely to happen if you use a multi-threaded reader: while one thread reads compressed data from disk, the other decodes the previous compressed block in memory.
Considering the speed of LZ4, the decoding operation is likely to finish before the other thread completes reading the next block. This way, you'll achieve a bandwidth improvement proportional to the compression ratio of the tested file.
Obviously, there are other effects to consider when benchmarking. For example, seek times on an HDD are several orders of magnitude larger than on an SSD, and under bad circumstances they can become the dominant part of the timing, reducing any bandwidth advantage to zero.
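Here is a rough sketch of that two-thread arrangement with LZ4. The on-disk block format (a 4-byte compressed length and a 4-byte raw length before each block), the file name, and the unbounded queue are assumptions for illustration only:

```cpp
// One thread reads compressed blocks from disk while the main thread decodes them.
#include <lz4.h>
#include <condition_variable>
#include <cstdint>
#include <fstream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct Block { std::vector<char> packed; uint32_t rawSize = 0; };

int main()
{
    std::queue<Block> blocks;        // a real implementation would bound this queue
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    std::thread reader([&] {
        std::ifstream in("data.lz4blocks", std::ios::binary);   // placeholder file name
        uint32_t packedSize = 0, rawSize = 0;
        while (in.read(reinterpret_cast<char*>(&packedSize), 4) &&
               in.read(reinterpret_cast<char*>(&rawSize), 4)) {
            Block b;
            b.packed.resize(packedSize);
            b.rawSize = rawSize;
            in.read(b.packed.data(), packedSize);
            {
                std::lock_guard<std::mutex> lock(m);
                blocks.push(std::move(b));
            }
            cv.notify_one();
        }
        { std::lock_guard<std::mutex> lock(m); done = true; }
        cv.notify_one();
    });

    // Consumer: decompress blocks as they arrive, overlapping with the reads above.
    std::vector<char> raw;
    for (;;) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return !blocks.empty() || done; });
        if (blocks.empty()) break;
        Block b = std::move(blocks.front());
        blocks.pop();
        lock.unlock();

        raw.resize(b.rawSize);
        LZ4_decompress_safe(b.packed.data(), raw.data(),
                            static_cast<int>(b.packed.size()),
                            static_cast<int>(raw.size()));
        // ... hand `raw` to whatever consumes the data ...
    }
    reader.join();
}
```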
It depends on the speed of the disk vs. the speed and space savings of decompression. I'm sure you can put this into a formula.
Is it really possible to improve performance reading a compressed sequential file over an uncompressed one? What am I doing wrong?
Yes, it is possible (for example, a 1 KB ZIP file could contain 1 GB of data; it would most likely be faster to read and decompress the ZIP).
Benchmark different algorithms and their decompression speeds. There are compression benchmark websites for that. There are also special-purpose high-speed compression algorithms.
You could also try to change the data format itself. Maybe switch to protobuf which might be faster and smaller than CSV.
I'm writing streams of images to a hard disk using std::fstream. Since most hard disk drives have a 32MB cache, is it more efficient to create a buffer to accumulate image data up to 32MB and then write to disk, or is it as efficient to just write every image onto the disk?
The cache is used as a read/write cache to alleviate problems due to queuing. Here are my experiences with disks:
If the disk is not an SSD, it's better to write serially than to seek between files. Seeking is a killer for I/O performance.
Disks typically write in sector-sized units; sector sizes are usually 512 B, or 4 KB on newer disks. Try to write data one sector at a time.
Bunching I/O is always faster than multiple small I/Os. The simple reason is that the processor on the disk has a smaller queue to flush.
Whatever you can serve from memory, serve. Use the disk only if necessary. You can always do a modify/invalidate of a cache entry on write, depending on your reliability policy. Make sure you don't swap, so your memory cache size must be reasonable to begin with.
If you're doing this I/O management yourself, make sure you don't double-buffer with your OS page cache. Use O_DIRECT for this.
Use non-blocking I/O (O_NONBLOCK) if reliability isn't an issue.
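Putting the "bunch your I/O" advice into code, here is a minimal sketch of a writer that accumulates images into one large buffer and flushes it in big chunks; the 32 MB threshold simply mirrors the cache size from the question and is otherwise an arbitrary choice:

```cpp
// Accumulate small writes in memory and hand them to the OS in large chunks.
#include <cstddef>
#include <fstream>
#include <vector>

class BatchedWriter {
public:
    explicit BatchedWriter(const char* path, size_t flushThreshold = 32u * 1024 * 1024)
        : out_(path, std::ios::binary), threshold_(flushThreshold) {}

    void append(const char* data, size_t size)
    {
        buffer_.insert(buffer_.end(), data, data + size);
        if (buffer_.size() >= threshold_) flush();
    }

    void flush()
    {
        if (buffer_.empty()) return;
        out_.write(buffer_.data(), static_cast<std::streamsize>(buffer_.size()));
        buffer_.clear();
    }

    ~BatchedWriter() { flush(); }   // make sure the tail of the data gets written

private:
    std::ofstream out_;
    std::vector<char> buffer_;
    size_t threshold_;
};
```

Whether this beats plain std::fstream (which already buffers, as the next answer points out) is something to measure, not assume.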
Every part of your system, from fstream down to the disk driver, knows more about efficient buffering than your application even has access to.
You couldn't improve upon the various buffering schemes if you tried, so don't bother.
I'm processing data from one large file on a hard disk (the processing is fast and doesn't add much overhead) and then have to write the results back (hundreds of thousands of files).
I started by writing the results straight to files, one at a time, which was the slowest option. I figured it would get a lot faster if I built up a vector of a certain number of files and then wrote them all at once, then went back to processing while the hard disk is occupied writing all the stuff I poured into it (that at least seems to be what happens).
My question is, can I somehow estimate a convergence value for the amount of data I should write from the hardware constraints? To me it seems to be a hard disk buffer thing; I have a 16 MB buffer on that hard disk and get these values (all for ~100,000 files):
Buffer size    time (minutes)
-----------    --------------
no buffer      ~ 8:30
1 MB           ~ 6:15
10 MB          ~ 5:45
50 MB          ~ 7:00
Or is this just a coincidence?
I would also be interested in experience/rules of thumb about how write performance can be optimized in general, for example whether larger hard disk blocks are helpful, etc.
Edit:
Hardware is a pretty standard consumer drive (I'm a student, not a data center): a WD 3.5" 1 TB / 7200 rpm / 16 MB / USB 2 drive, HFS+ journaled, OS is Mac OS X 10.5. I'll soon give it a try on ext3/Linux and on an internal rather than an external disk.
Can I somehow estimate a convergence value for the amount of data that I should write from the hardware constraints?
Not in the long term. The problem is that your write performance is going to depend heavily on at least four things:
Which filesystem you're using
What disk-scheduling algorithm the kernel is using
The hardware characteristics of your disk
The hardware interconnect you're using
For example, USB is slower than IDE, which is slower than SATA. It wouldn't surprise me if XFS were much faster than ext2 for writing many small files. And kernels change all the time. So there are just too many factors here to make simple predictions easy.
If I were you I'd take these two steps:
Split my program into multiple threads (or even processes) and use one thread to deliver system calls open, write, and close to the OS as quickly as possible. Bonus points if you can make the number of threads a run-time parameter.
Instead of trying to estimate performance from hardware characteristics, write a program that tries a bunch of alternatives and finds the fastest one for your particular combination of hardware and software on that day. Save the fastest alternative in a file or even compile it into your code. This strategy was pioneered by Matteo Frigo for FFTW and it is remarkably effective.
Then when you change your disk, your interconnect, your kernel, or your CPU, you can just re-run the configuration program and presto! Your code will be optimized for best performance.
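A small sketch of that measure-don't-predict approach: time the same write workload with a few candidate batch sizes and keep the winner. The candidate sizes, the temp file name, and the per-batch flush are assumptions:

```cpp
// Try a few batch sizes for the same write workload and return the fastest one.
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <fstream>
#include <vector>

size_t pickBestBatchSize(const std::vector<char>& sampleData)
{
    const size_t candidates[] = { 1u << 20, 10u << 20, 50u << 20 };  // 1, 10, 50 MB
    size_t best = candidates[0];
    double bestSeconds = 1e300;

    for (size_t batch : candidates) {
        const auto start = std::chrono::steady_clock::now();
        std::ofstream out("tune_probe.tmp", std::ios::binary | std::ios::trunc);
        for (size_t off = 0; off < sampleData.size(); off += batch) {
            const size_t len = std::min(batch, sampleData.size() - off);
            out.write(sampleData.data() + off, static_cast<std::streamsize>(len));
            out.flush();   // push each batch out of the stream buffer before timing the next
        }
        out.close();
        const double secs = std::chrono::duration<double>(
            std::chrono::steady_clock::now() - start).count();
        if (secs < bestSeconds) { bestSeconds = secs; best = batch; }
    }
    return best;   // persist this value and reuse it until the hardware or OS changes
}
```

Note that the OS page cache can flatter the later runs, so for serious tuning you would use larger workloads or drop caches between runs.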
The important thing here is to get as many outstanding writes as possible, so the OS can optimize hard disk access. This means using async I/O, or using a task pool to actually write the new files to disk.
That being said, you should also look at optimizing your read access. The OS (at least Windows) is already really good at helping write access via buffering under the hood, but if you're reading serially there isn't too much it can do to help. If you use async I/O or (again) a task pool to process/read multiple parts of the file at once, you'll probably see increased performance.
Parsing XML should be doable at practically disk read speed, tens of MB/s. Your SAX implementation might not be doing that.
You might want to use some dirty tricks. Writing 100,000s of files is not going to be efficient with the normal API.
Test this by writing sequentially to a single file first, not 100,000 files. Compare the performance. If the difference is interesting, read on.
If you really understand the file system you're writing to, you can make sure you're writing a contiguous block you just later split into multiple files in the directory structure.
You want smaller blocks in this case, not larger ones, as your files are going to be small. All free space in a block is going to be zeroed.
[edit] Do you really have an external need for those 100K files? A single file with an index could be sufficient.
Expanding on Norman's answer: if your files are all going into one filesystem, use only one helper thread.
Communication between the read thread and write helper(s) consists of a two-std::vector double-buffer per helper (one buffer owned by the write process and one by the read process). The read thread fills the buffer until a specified limit, then blocks. The write thread times the write speed with gettimeofday or whatever, and adjusts the limit: if writing went faster than last time, increase the buffer by X%; if it went slower, adjust by -X%. X can be small.
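A sketch of that double-buffer helper, writer side only (the reader that fills `filling` up to `limit` under the same mutex is omitted, and so is the final flush of a partly filled buffer). The starting limit and the 10% step stand in for the small X above:

```cpp
// Writer half of a two-std::vector double-buffer with an adaptive fill limit.
#include <chrono>
#include <condition_variable>
#include <fstream>
#include <mutex>
#include <vector>

struct DoubleBuffer {
    std::vector<char> filling, writing;  // reader fills one, writer drains the other
    size_t limit = 4u * 1024 * 1024;     // current fill limit (arbitrary start value)
    double lastSpeed = 0.0;              // bytes per second of the previous write
    std::mutex m;
    std::condition_variable cv;
    bool ready = false, done = false;
};

void writerThread(DoubleBuffer& db, const char* path)
{
    std::ofstream out(path, std::ios::binary);
    for (;;) {
        std::unique_lock<std::mutex> lock(db.m);
        db.cv.wait(lock, [&] { return db.ready || db.done; });
        if (!db.ready && db.done) break;
        db.writing.swap(db.filling);     // take the full buffer
        db.ready = false;
        lock.unlock();
        db.cv.notify_one();              // unblock the reader so it can keep filling

        const auto start = std::chrono::steady_clock::now();
        out.write(db.writing.data(), static_cast<std::streamsize>(db.writing.size()));
        out.flush();
        const double secs = std::chrono::duration<double>(
            std::chrono::steady_clock::now() - start).count();
        const double bytesPerSec = db.writing.size() / (secs + 1e-9);
        db.writing.clear();

        // Adjust the limit under the lock (the reader reads it under db.m as well):
        // grow it if this write was faster than the previous one, shrink it otherwise.
        lock.lock();
        db.limit = (bytesPerSec >= db.lastSpeed) ? db.limit + db.limit / 10
                                                 : db.limit - db.limit / 10;
        db.lastSpeed = bytesPerSec;
        lock.unlock();
    }
}
```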
When reading files off of a hard drive, mmap is generally regarded as a good way to quickly get data into memory. When working with optical drives, accesses take more time and you have a higher latency to worry about. What approach/abstraction do you use to hide/eliminate as much latency and/or overall load time of the optical drive as possible?
There's no real abstraction you can employ. Optical drives have very specific characteristics that must be optimized for to get the best performance.
Some tips:
The biggest killer on optical drives is seek time. Where possible make sure all the files you are reading are sequential on disc and as closely packed as possible. If you must seek then seek in one direction and as infrequently as possible.
Asynchronous reading can also massively improve performance. If you need to load and process files A, B, and C, then before processing A you should start reading file B, and while processing B you should be reading file C, and so on (see the sketch at the end of this answer).
Generally, the more data you can read in one go the better, e.g. avoid lots of little read()s. You will only get the theoretical throughput of a disc while reading large amounts of data. Some OSes/drivers will minimize the penalty of reading lots of little files by caching sectors; some will not.
Doing lots of exists(filename) checks can also be detrimental on some filesystems/OSes where only parts of the TOC are cached.
In our applications we usually pack files into one or more "lumped" files and have them ordered sequentially based on their access order. Some files (and directories) are compressed and read in their entirety before being decompressed in memory. This can be a win if you have a directory that contains a multitude of small files (e.g. XML or scripts).
Basically lots of benchmarking and tweaking :)
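Here is the minimal read-ahead sketch referred to above: kick off the read of the next file with std::async while the current one is being processed. The file list, readWholeFile() helper, and process() callback are placeholders:

```cpp
// Overlap reading file N+1 with processing file N.
#include <fstream>
#include <future>
#include <string>
#include <vector>

static std::vector<char> readWholeFile(const std::string& path)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    std::vector<char> data(static_cast<size_t>(in.tellg()));
    in.seekg(0);
    in.read(data.data(), static_cast<std::streamsize>(data.size()));
    return data;
}

void processAll(const std::vector<std::string>& paths,
                void (*process)(const std::vector<char>&))
{
    if (paths.empty()) return;
    auto pending = std::async(std::launch::async, readWholeFile, paths[0]);
    for (size_t i = 0; i < paths.size(); ++i) {
        std::vector<char> current = pending.get();
        if (i + 1 < paths.size())                  // start reading the next file now...
            pending = std::async(std::launch::async, readWholeFile, paths[i + 1]);
        process(current);                          // ...while this one is processed
    }
}
```

This only pays off if the files are laid out so that the background read stays sequential; otherwise you are just adding seeks.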
Minimize or eliminate seeks by reading in giant chunks of data sequentially from a few files (optimally one).
First, keep in mind that modern optical drives are quite fast at reading sequential data, but seeking is still a lot slower than on HDs. So if you must seek a lot within a big file (e.g. jump randomly around within a 500+ MB file), it might actually be faster to first copy the whole 500 MB to the HD (into a temporary file), which happens as fast sequential reads, perform the operation on the temp file (much faster thanks to the much shorter access times on the HD), and delete the file again when you are done with it.
The same applies to a few big files vs. many small files. Working with a couple of big files is much faster than with many small files, since every time you switch from one small file to another the huge seek time will give you headaches again. This is the reason why many games that ship on optical media pack their game data into huge archive files (e.g. all textures of one level in one huge file instead of one small file per texture), so try to keep data well structured in big files you can read as sequentially as possible.
HD caching itself is a good technique. There is a game I remember, though I forget the title, that always kept the 3D data of your surroundings on the HD. While you were moving through the world, it was constantly copying data from DVD to HD, so the surrounding 3D landscape was always available on the HD for fast access; however, not the whole DVD was copied, only about 200-300 MB were temporarily cached on the HD to save space. The only annoying thing was that you often heard DVD access noise while playing, but most of the time the whole process happened during CPU idle times, so it did not really affect gameplay. Only if you ran very fast, constantly in the same direction, could it happen that the DVD drive fell behind and the game suddenly stopped with a loading indicator for a couple of seconds. However, I played this game for days and maybe saw that loading indicator three times within a single week. If you were moving slowly or not constantly in the same direction, there never was a loading indicator.
Slow drives are going to be slow. Sorry. However, optical drive hardware will normally be optimized for sequential reads, so if you can make your code work that way you might see some improvement. I doubt you'll see much difference between mmap(), fread(), et al. for sequential access. You might also be able to tune your read buffer size to be a multiple of the drive's block size, if your OS isn't already doing that for you. Optical drives can have large block sizes compared to hard drives, and if your buffers aren't large enough you're paying a price.
I'm not sure that there is a lot you can do by the time you are reading it. You could look at the CreateFile API: you can pass hints to Windows that tell it you are opening the file for sequential or random access, which is supposed to allow Windows to optimize the caching strategy used for the file.
You can tune the "chunks" that you bite off when reading your file to make them larger or smaller. You might get a slight improvement if you read in chunks that are multiples of the allocation unit size on the disk.
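For the CreateFile hint and the chunked reads mentioned above, a small sketch; FILE_FLAG_SEQUENTIAL_SCAN is the actual Win32 hint, while the 1 MB chunk size is an arbitrary choice you would tune against the drive's block/allocation size:

```cpp
// Read a file front to back with a sequential-access hint and large chunks (Win32).
#include <windows.h>
#include <vector>

bool readSequentially(const wchar_t* path, std::vector<char>& out)
{
    HANDLE file = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                              OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, nullptr);
    if (file == INVALID_HANDLE_VALUE) return false;

    std::vector<char> chunk(1u << 20);   // 1 MB per ReadFile call
    DWORD read = 0;
    while (ReadFile(file, chunk.data(), static_cast<DWORD>(chunk.size()), &read, nullptr)
           && read > 0)
        out.insert(out.end(), chunk.data(), chunk.data() + read);

    CloseHandle(file);
    return true;
}
```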
The hardware and media can make a difference. Say you have a DVD drive that reads at 16x. It will require media rated at 16x or higher, and some drives don't work well with some media brands. So even if the media meets the rating, you might not be reading at the maximum speed. (Usually a good hardware review of an optical drive will include details like this.)
The layout of the files on the optical disc could be important. Was it burned all at once? Was it just mounted as a disk (like packet-mode R/W)? I don't have experience with this, but given the longer seek times on an optical drive, fragmented files might have a greater impact than they do on a modern hard drive.