I'm developing a "remote screencasting" application (much like VNC, but not exactly), where I transfer updated tiles of screen pixels over the network. I'd like to implement a caching mechanism, and I'd like to hear your recommendations...
Here is how I think it should be done. For each tile coordinate there is a fixed-size stack (cache) to which I add updated tiles. When saving, I calculate some kind of checksum (probably CRC-16 would suffice, right?) of the tile data (i.e. the pixels). When getting a new tile (from a new screenshot of the desktop), I calculate its checksum and compare it to the checksums of all items in the stack for that tile coordinate. If a checksum matches, instead of sending the tile I send a special message, e.g. "get tile from cache stack at position X". This means I need to keep identical cache stacks on the server and on the client, as sketched below.
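A minimal sketch of that scheme in C++ (all names here are illustrative, not from any particular library; both ends must apply the same insertions so slot indices stay in agreement):

```cpp
#include <array>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical per-coordinate cache: a fixed-depth ring of recent tiles.
struct TileStack {
    static const int kDepth = 10;                   // cache depth per coordinate
    std::array<uint32_t, kDepth> checksums{};       // checksum of each slot
    std::array<std::vector<uint8_t>, kDepth> tiles; // tile payloads
    int next = 0;                                   // next slot to overwrite

    // Returns the slot index if an identical checksum is cached, -1 otherwise.
    int find(uint32_t checksum) const {
        for (int i = 0; i < kDepth; ++i)
            if (!tiles[i].empty() && checksums[i] == checksum) return i;
        return -1;
    }

    // Overwrites the oldest slot and returns the slot index that was used.
    int insert(uint32_t checksum, std::vector<uint8_t> tile) {
        int slot = next;
        checksums[slot] = checksum;
        tiles[slot] = std::move(tile);
        next = (next + 1) % kDepth;
        return slot;
    }
};

// One stack per tile coordinate (x, y on the tile grid).
using TileCache = std::map<std::pair<int, int>, TileStack>;
```

On a hit the server would send only (x, y, slot); on a miss it sends the tile and both sides insert it, keeping the stacks in lock-step.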
Here come my questions:
What should the default stack size (depth) be? Say the stack size is 5: that means the last 5 tiles for a given coordinate are saved, and the total cache size is 5 times the raw pixel size of the screen. For big screens the raw RGB buffer of the screen is roughly 5 megabytes, so a 10-level stack means a 50 MB cache, right? So what should the cache depth be? I'm thinking maybe 10, but I'd like your suggestions.
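To make that arithmetic concrete (assuming, say, a 1680x1050 desktop at 24-bit color): one full frame is 1680 × 1050 × 3 bytes ≈ 5.3 MB, so a depth-10 cache of raw tiles costs about 53 MB, in line with the estimate above.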
I'm compressing the tiles to JPEG before sending them over the network. Should I cache the JPEG tiles, or the raw RGB tiles before compression? The logical choice would be caching raw tiles, as that would avoid unnecessary JPEG encoding for tiles that are found in the cache. But saving RGB pixels requires a much bigger cache. So what's the better option: caching before or after compression?
Is a CRC-16 checksum alone enough for comparing new screen tiles with the tiles in the cache stack? I mean, should I additionally do a byte-by-byte comparison of the tiles when the CRC matches, or is that redundant? Is the collision probability low enough to be ignored?
In general, what do you think about the scheme I described? What would you change in it? Any suggestions would be appreciated!
I like the way you explained everything; this is certainly a nice idea to implement.
I implemented a similar approach for a similar application a couple of months ago, and I'm now looking for different schemes to either work alongside it or replace it.
I used a cache stack size equal to the number of tiles present on a screen, and I didn't restrict a tile to matching only the previous tile at the same location. I assume this is very helpful while the user is moving a window. Cache size is a trade-off between processing power, memory, and bandwidth: the more tiles you have in the cache, the more bandwidth you may save, at the cost of memory and processing.
I used CRC-16 too, but it is not ideal: when a lookup hits a colliding CRC in the cache it produces a very odd image, which was quite annoying, though rare. The best thing is to match pixel by pixel if you can afford it in terms of processing power. In my case I couldn't.
Caching the JPEG is the better idea for saving memory, because if we create a bitmap from the JPEG the damage has already been done in terms of quality, and I assume the probability of hitting a wrong CRC is the same in both cases. In my case, I cached the JPEG.
I'd use a faster hash algorithm, MurmurHash2 or Jenkins for example; they promise much better uniqueness.
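For scale: a 16-bit checksum has only 65,536 possible values, so probing a depth-10 stack with a tile that isn't really cached has a false-match chance of roughly 10/65536 ≈ 0.015% per lookup. Assuming 32x32 tiles on a 1680x1050 desktop, a full-screen change is about 1,700 lookups, so you'd expect a wrong tile every few full-screen refreshes; a 64-bit hash pushes that below any practical concern. As a concrete stand-in (MurmurHash2 and Jenkins are fine too; this is just the shortest well-known 64-bit hash to show inline):

```cpp
#include <cstddef>
#include <cstdint>

// 64-bit FNV-1a: simple, reasonable distribution, vastly fewer collisions
// than CRC-16 thanks to the 64-bit output space.
uint64_t fnv1a64(const uint8_t* data, size_t len) {
    uint64_t h = 0xcbf29ce484222325ULL;  // FNV offset basis
    for (size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 0x100000001b3ULL;           // FNV prime
    }
    return h;
}
```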
See the Spice remote protocol (www.spice-space.org) for an example of caching.
The cache should be as big as it can be (on the client, or in an intermediate proxy).
You might check out x11vnc's cache implementation.
Voxel engine (like Minecraft) optimization suggestions?
As a fun project (and to get my Minecraft-addicted son excited about programming) I am building a 3D Minecraft-like voxel engine using C# .NET 4.5.1, OpenGL and GLSL 4.x.
Right now my world is built using chunks. Chunks are stored in a dictionary, where I can select them based on a 64-bit X | Z<<32 key. This allows an 'infinite' world that can cache chunks in and out.
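Sketched in C++ rather than the poster's C# (the bit packing is identical); the casts through uint32_t keep negative coordinates from smearing sign bits into the upper half of the key:

```cpp
#include <cstdint>
#include <unordered_map>

struct Chunk;  // per-chunk block data, elided here

// Pack signed chunk coordinates into one 64-bit key: X | Z << 32.
uint64_t chunkKey(int32_t x, int32_t z) {
    return (uint64_t)(uint32_t)x | ((uint64_t)(uint32_t)z << 32);
}

// 'Infinite' world: chunks are cached in and out of this map by key.
std::unordered_map<uint64_t, Chunk*> world;
```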
Every chunk consists of an array of 16x16x16 block segments. Starting from level 0, bedrock, it can go as high as you want (unlike Minecraft, where the limit is 256, I think).
Chunks are queued for generation on a separate thread when they come into view and need to be rendered. This means that chunks might not show right away. In practice you will not notice this. NOTE: I am not waiting for them to be generated; they just will not be visible immediately.
When a chunk needs to be rendered for the first time, a VBO (glGenBuffers, GL_STREAM_DRAW, etc.) is generated for that chunk containing the possibly visible/outside faces (neighboring chunks are checked as well). [This means that a chunk potentially needs to be re-tessellated when a neighbor has been modified.] When tessellating, the opaque faces are tessellated first for every segment, and then the transparent ones. Every segment knows where it starts within that vertex array and how many vertices it has, for both opaque and transparent faces.
Textures are taken from an array texture.
When rendering:
I first take the bounding box of the frustum and map that onto the chunk grid. Using that knowledge I pick every chunk that is within the frustum and within a certain distance of the camera.
Now I do a distance sort on the chunks.
After that I determine the ranges (index, length) of the chunk segments that are actually visible. Now I know exactly which segments (and which vertex ranges) are at least partially in view. The only excess segments I have are the ones hidden behind mountains or, sometimes, deep underground.
Then I start rendering: first I render the opaque faces [culling and depth test enabled, alpha test and blend disabled] front to back using the known vertex ranges; then I render the transparent faces back to front [blend enabled].
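A compressed C++ sketch of those steps (the frustum helpers are stubs standing in for the poster's real tests; nothing here is the poster's actual code):

```cpp
#include <algorithm>
#include <vector>

struct Vec3 { float x, y, z; };

struct Chunk {
    Vec3 center;  // world-space center; VBO handle and vertex ranges elided
};

// Stubs for brevity: the real versions map the frustum's bounding box onto
// the chunk grid and test each chunk's AABB against the six frustum planes.
std::vector<Chunk*> chunksNearFrustum() { return {}; }
bool intersectsFrustum(const Chunk&) { return true; }

void renderVisible(const Vec3& cam) {
    std::vector<Chunk*> visible;
    for (Chunk* c : chunksNearFrustum())
        if (intersectsFrustum(*c)) visible.push_back(c);

    // Sort by squared distance to the camera (no sqrt needed for ordering).
    auto dist2 = [&](const Chunk* c) {
        float dx = c->center.x - cam.x, dy = c->center.y - cam.y,
              dz = c->center.z - cam.z;
        return dx * dx + dy * dy + dz * dz;
    };
    std::sort(visible.begin(), visible.end(),
              [&](const Chunk* a, const Chunk* b) { return dist2(a) < dist2(b); });

    for (Chunk* c : visible) { /* draw opaque ranges, front to back */ (void)c; }
    for (auto it = visible.rbegin(); it != visible.rend(); ++it) {
        /* draw transparent ranges, back to front */
    }
}
```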
Now... does anyone know a way of improving this while still allowing dynamic generation of an infinite world? I am currently reaching ~80 fps @ 1920x1080 and ~120 fps @ 1024x768 (screenshots: http://i.stack.imgur.com/t4k30.jpg, http://i.stack.imgur.com/prV8X.jpg) on an average 2.2 GHz i7 laptop with an ATI HD8600M gfx card. I think it must be possible to increase the number of frames, and I think I have to, as I want to add entity AI, sound, and bump and specular mapping. Could using occlusion queries help me out? ...though I can't really picture it, given the nature of the segments. I have already minimized the creation of objects, so there is no 'new Object' all over the place. Also, as the performance doesn't really change between Debug and Release mode, I don't think it's the code but rather the approach to the problem.
edit: I have been thinking of using GL_SAMPLE_ALPHA_TO_COVERAGE, but it doesn't seem to be working:
gl.Enable(GL.DEPTH_TEST);
gl.Disable(GL.BLEND);                    // blending must be OFF for alpha-to-coverage
gl.Enable(GL.MULTISAMPLE);               // requires a multisampled framebuffer
gl.Enable(GL.SAMPLE_ALPHA_TO_COVERAGE);  // alpha now drives per-sample coverage
To render a lot of similar objects, I strongly suggest you take a look at instanced drawing: glDrawArraysInstanced and/or glDrawElementsInstanced.
It made a huge difference for me; I'm talking going from 2 fps to over 60 fps rendering 100,000 similar icosahedrons.
You can parametrize your cubes using attribs (glVertexAttribDivisor and friends) to make them different. Hope this helps.
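For reference, the core calls look roughly like this (GL 3.3+; attribute location 3, the buffer, and the offsets array are illustrative, not from the answerer's code):

```cpp
#include <GL/glew.h>

// Sketch: set up a per-instance attribute and draw N instances in one call.
// 'offsets' holds one xyz per instance; the vertex shader declares
// layout(location = 3) in vec3 offset; and positions each instance with it.
void drawInstanced(GLuint vao, GLsizei indexCount,
                   const float* offsets, GLsizei numInstances) {
    glBindVertexArray(vao);

    GLuint instanceVbo;
    glGenBuffers(1, &instanceVbo);
    glBindBuffer(GL_ARRAY_BUFFER, instanceVbo);
    glBufferData(GL_ARRAY_BUFFER, numInstances * 3 * sizeof(float),
                 offsets, GL_STATIC_DRAW);

    glEnableVertexAttribArray(3);
    glVertexAttribPointer(3, 3, GL_FLOAT, GL_FALSE, 0, nullptr);
    glVertexAttribDivisor(3, 1);  // advance this attribute per INSTANCE, not per vertex

    // One call renders every instance.
    glDrawElementsInstanced(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT,
                            nullptr, numInstances);
}
```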
It's at ~200 fps currently, which should be OK. The 3 main things that I've done are:
1) generating the chunks on a separate thread.
2) tessellating the chunks on a separate thread.
3) using a Deferred Rendering Pipeline.
Don't really think the last one contributed much to the overall performance, but I had to start using it because of some of the shaders. Now the CPU is sort of falling asleep at ~11%.
This question is pretty old, but I'm working on a similar project. I approached it almost exactly the same way as you; however, I added one additional optimization that helped a lot.
For each chunk, I determine which sides are completely opaque. I then use that information to do a flood fill through the chunks to cull out the ones that are underground. Note that I'm not checking individual blocks when I do the flood fill, only a precomputed bitmask for each chunk.
When I'm computing the bitmask, I also check to see if the chunk is entirely empty, since empty chunks can obviously be ignored.
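Sketched in C++, the flood fill might look like this (the `chunkAt` and `neighborKey` callbacks are placeholders for whatever world lookup the engine already has; face indices are assumed to pair up as +X/-X, +Y/-Y, +Z/-Z):

```cpp
#include <cstdint>
#include <functional>
#include <queue>
#include <unordered_set>

struct ChunkVis {
    bool faceOpaque[6];  // precomputed: is each face a fully opaque wall?
};

// BFS outward from the camera's chunk. A neighbor is reachable unless the
// shared boundary is sealed by a fully opaque face on either side; chunks
// never reached (e.g. sealed underground) are culled without block checks.
std::unordered_set<uint64_t> reachableChunks(
    uint64_t cameraChunk,
    const std::function<const ChunkVis*(uint64_t)>& chunkAt,
    const std::function<uint64_t(uint64_t, int)>& neighborKey) {
    std::unordered_set<uint64_t> visited{cameraChunk};
    std::queue<uint64_t> frontier;
    frontier.push(cameraChunk);
    while (!frontier.empty()) {
        uint64_t key = frontier.front();
        frontier.pop();
        const ChunkVis* c = chunkAt(key);
        if (!c) continue;                          // not loaded: stop here
        for (int dir = 0; dir < 6; ++dir) {        // +X,-X,+Y,-Y,+Z,-Z
            if (c->faceOpaque[dir]) continue;      // sealed on our side
            uint64_t nk = neighborKey(key, dir);
            const ChunkVis* n = chunkAt(nk);
            if (n && n->faceOpaque[dir ^ 1]) continue;  // sealed on theirs
            if (visited.insert(nk).second) frontier.push(nk);
        }
    }
    return visited;
}
```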
I have a universal iOS game built with cocos2d-iphone that has a large number of small images (amongst other things). For these small images the game works fine with a 1:2:4 ratio for iPhone : iPad/iPhone-Retina : iPad-Retina. I have two approaches to enabling this in the game:
A) Have three sets of sprites/spritesheets for the three form factors required, name them appropriately, and have the right images picked up.
B) Have one set of highest-resolution images that are then scaled depending on the device and its resolution: aSprite.scale = [self getScaleAccordingToDevice];
Option A has the advantage of lower runtime overhead, at the cost of a larger on-disk footprint (an important consideration, as the app is currently ~94 MB).
Option B has the advantage of a smaller on-disk footprint, but the cost is that iPad Retina images will be loaded into memory even on an iPhone 3GS (the lowest supported device).
Can someone provide arguments that will help me decide one way or the other?
Thanks
There is no argument: use option A.
Option B is absolutely out of the question, because you would be loading images that may be 6-8 times larger in memory (as a texture) on a device (the 3GS) that has a quarter of the memory of an iPad 3 or 4 (256 MB vs 1 GB). Not to mention the additional processing power needed to render a scaled-down version of such a large image. There's a good chance it won't work at all, due to running out of memory and running too slowly (have you tried?).
Next, it stands to reason that at 95 MB you might still not get your app below 50 MB with option B. The large Retina textures make up two thirds to three quarters of your bundle size; the SD textures don't weigh in much. This is the only bundle-size target you should ever consider, because below 50 MB users can download your app over the air, while above 50 MB they have to sync via Wi-Fi or connected to a computer. If you can't get below 50 MB, it really doesn't matter whether your bundle size is 55 MB or 155 MB.
Finally, there are better options for decreasing bundle size. Read my article, and especially the second part.
If your images are PNG, the first thing you should try is converting them all to .pvr.ccz as NPOT texture atlases (easiest way to do that: TexturePacker). You may be able to cut bundle size by as much as 30-50% without losing image quality. And if you can afford to lose some image quality, there are even greater savings possible (plus additional loading and performance improvements).
Well, at 94 MB, your app is already way beyond the over-the-air download limit for cellular networks, i.e. it will only ever be downloaded when a Wi-Fi connection is available. So... is it really an issue? The other big factor you need to consider is the memory footprint when running. If you take 4x assets on a 3GS and scale them down, the memory requirement will still be that of the full-size sprite (i.e. 16x the amount of memory). So the other question you have to ask yourself is whether the game is likely to run with a high memory footprint on older devices. Also, the load time for the textures could be enough to affect the usability of your app on older devices. You need to measure these things and decide based on hard data (unfortunately).
The first test you need to do is to see whether your scaled-down sprites will look OK on standard-resolution iPhones. Scaling down sometimes falls short of expectations when rendered. If your graphic designer turns option B down, you don't have a decision to make: he or she has the onus of providing all 3 formats. After that, if option B is still an option, I would start with option B and measure on a 3GS (a small-scale project, not the complete implementation). If all is well, you are done.
PS: for app size, consider using the .pvr.ccz format (I use TexturePacker). Smaller textures and much faster load times (because of the PVR format). The load-time improvement may be smaller on the 3GS because of its generally slower processor, which has to decompress the files.
I have an application that contains many millions of 3D RGB points that form an image when plotted. What is the fastest way of getting them to the screen in an MFC application? I've tried CDC::SetPixelV in conjunction with a bitmap, which seems quite slow, and am looking toward a Direct3D or OpenGL window in my MFC view class. Any other good places to look?
Double buffering is your solution. There are many examples on CodeProject; check this one, for example.
Sounds like a point cloud. You might find some good information searching on that term.
3D hardware is the fastest way to take 3D points and get them into a 2D display, so either Direct3D or OpenGL seem like the obvious choices.
If the number of points is much greater than the number of pixels in your display, then you'll probably first want to cull points that are trivially outside the view. You put all your points in some sort of spatial partitioning structure (like an octree) and omit the points inside any node that's completely outside the viewing frustum. This reduces the amount of data you have to push from system memory to GPU memory, which will likely be the bottleneck. (If your point cloud is static, you're just building a fly-through, and your GPU has enough memory, you could skip the culling, send all the data at once, and then just update the transforms for each frame.)
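A bare-bones C++ illustration of that culling (the frustum test is a stub here; a real one classifies the node's box against the six frustum planes):

```cpp
#include <vector>

struct Point { float x, y, z; unsigned char r, g, b; };

struct OctreeNode {
    float min[3], max[3];        // node bounding box
    std::vector<Point> points;   // points stored at this node (leaves)
    OctreeNode* child[8] = {};   // null when absent
};

enum class Cull { Outside, Intersects };

// Stub: a real test checks the AABB against all six frustum planes.
Cull classify(const OctreeNode& /*node*/) { return Cull::Intersects; }

// Collect only points whose containing nodes touch the frustum, so entire
// off-screen subtrees are skipped without ever visiting their points.
void collectVisible(const OctreeNode& n, std::vector<const Point*>& out) {
    if (classify(n) == Cull::Outside) return;  // whole subtree culled
    for (const Point& p : n.points) out.push_back(&p);
    for (const OctreeNode* ch : n.child)
        if (ch) collectVisible(*ch, out);
}
```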
If you don't want to use the GPU and instead write a software renderer, you'll want to render to a bitmap that's in the same pixel format as your display (to eliminate the chance of the blit needing to do any pixel-format conversion as it blasts the bitmap to the display). For reasonable window sizes, blitting at 30 frames per second is feasible, but it might not leave much time for the CPU to do the rendering.
I am building an application which takes a great many screenshots during the process of "recording" operations performed by the user on the Windows desktop.
For obvious reasons I'd like to store this data in as efficient a manner as possible.
At first I thought about using the PNG format to get this done. But I stumbled upon this: http://www.olegkikin.com/png_optimizers/
The best algorithms only managed a 3 to 5 percent improvement on an image of GUI icons. This is highly discouraging, and it reveals that I'm going to need to do better, because just using PNG will not let me use previous frames to improve the compression ratio. The file size will continue to grow linearly with time.
I thought about solving this with a bit of a hack: just save the frames in groups of some number, side by side. For example, I could store the content of ten 1280x1024 captures in a single 1280x10240 image; then the compression should be able to take advantage of repetitions across adjacent images.
But the problem with this is that the algorithms used to compress PNG are not designed for it. I am arbitrarily placing images at 1024-pixel intervals from each other, and only 10 of them can be grouped together at a time. From what I gathered after a few minutes scanning the PNG spec, the scanlines are filtered and the stream is then DEFLATE-compressed with a 32 KB sliding window, which at this image width covers only a handful of rows, so there is effectively no way that data from 1024 pixels above can be referenced from down below.
So I've found the MNG format which extends PNG to allow animations. This is much more appropriate for what I am doing.
One thing I am worried about is how much support there is for "extending" an image/animation with new frames. The nature of data generation in my application is that new frames get added to a list periodically. But I have a simple semi-solution to this problem: cache a chunk of recently generated data and incrementally produce an "animation", say, every 10 frames. This allows me to tie up only 10 frames' worth of uncompressed image data in RAM. It's not as good as offloading it to the filesystem immediately, but it's not terrible. After the entire process is complete (or even using free cycles on a spare thread during execution) I can easily go back and stitch the groups of 10 together, if it's even worth the effort.
Here is my actual question that everything has been leading up to: is MNG the best format for my requirements? Those requirements are: 1. a C/C++ implementation available with a permissive license, 2. 24/32-bit color at 4+ megapixel resolution (some folks run 30-inch monitors), 3. lossless or near-lossless (retains text clarity) compression, with provisions to reference previous frames to aid that compression.
For example, here is another option I have considered: video codecs. I'd like lossless quality, but I have seen examples of H.264/x264 reproducing remarkably sharp stills, and its performance is such that I can capture at a much faster interval. I suspect I will just need to implement both and do my own benchmarking to satisfy my curiosity.
If you have access to a PNG compression implementation, you can get most of this benefit without the MNG format by simply preprocessing each "next" image as a difference from the previous one. This is naive but effective if the screenshots don't change much, and the compression of those "almost empty" PNGs will greatly reduce the storage space required.
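That preprocessing is only a few lines; a minimal sketch, assuming same-sized raw RGB buffers and storing the first frame as-is:

```cpp
#include <cstdint>
#include <vector>

// Replace each frame with its byte-wise difference from the previous frame.
// Unchanged regions become runs of zeros, which DEFLATE (PNG) compresses
// extremely well. Subtraction wraps mod 256, so the round trip is lossless.
std::vector<uint8_t> diffFrame(const std::vector<uint8_t>& prev,
                               const std::vector<uint8_t>& cur) {
    std::vector<uint8_t> out(cur.size());
    for (size_t i = 0; i < cur.size(); ++i)
        out[i] = (uint8_t)(cur[i] - prev[i]);
    return out;
}

// Inverse: reconstruct a frame from the previous frame plus its diff.
std::vector<uint8_t> undiffFrame(const std::vector<uint8_t>& prev,
                                 const std::vector<uint8_t>& diff) {
    std::vector<uint8_t> out(diff.size());
    for (size_t i = 0; i < diff.size(); ++i)
        out[i] = (uint8_t)(prev[i] + diff[i]);
    return out;
}
```

The diffed buffer is then fed to the ordinary PNG encoder in place of the raw frame, and decoding adds each diff back onto the previously reconstructed frame.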
Pretty simple but specific question here:
I'm not entirely familiar with the JPEG standard for compressing images. Does it create a better image (that is, a smaller file size at similar quality) when the X dimension (width) is very large and the Y dimension (height) is very small, vice versa, or when the two are nearly equal?
The practical use I have for this is CSS sprites. If a website consists of hundreds of CSS sprites, it is ideal to minimize the size of the sprite file to help users on slower connections and also to reduce server load. If the JPEG standard operates really well on a single horizontal line but moving vertically requires a lot more complexity, it would make sense for an image of 100 16x16 CSS sprites to be 1600x16.
On the other hand, if the JPEG standard has a lot of complexity working horizontally but moves from row to row easily, you could make a smaller file, or have higher quality, by making the image 16x1600.
If the best compression occurs when the image is a perfect square, you would want the final image to be 160x160.
The MPEG/JPEG blocking mechanism would (very slightly) favor an image size that is an exact multiple of the compression block size in each dimension. Beyond that, though, the format doesn't care whether the blocks run vertically or horizontally.
So the direct answer to your question is "square is as good as anything", as long as your sprites divide evenly into JPEG compression blocks (just make sure they are 8, 16, 24 or 32 pixels wide and you'll be fine).
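One detail worth spelling out (a property of baseline JPEG, not of any particular tool): with the common 4:2:0 chroma subsampling the minimum coded unit is 16x16 pixels, and 8x8 without subsampling, so keeping sprites 16-pixel-aligned guarantees no sprite ever shares a coded block with its neighbor; each one compresses independently.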
However, I would go a bit further and say that for most sprites you are going to get a smaller image size and clearer resolution if the master image is a GIF instead of a JPEG, even more so if you can use a reduced color palette. Consider whether you need JPEG at all for "hundreds of sprites".
It looks like JPEG's compression ratio isn't affected by image dimensions. It does look like your dimensions should be multiples of 8, but all your examples use multiples of 16, so you should be fine there.
http://en.wikipedia.org/wiki/JPEG#JPEG_codec_example
If I remember correctly, PNG (being lossless) does much better when the same color appears in a horizontal stretch rather than a vertical one. Why are you making your sprites JPEG in the first place? If they use a limited color set (which is likely with 16x16 sprites, animated or not), PNG might actually yield better file sizes with perfect image quality.