I want a high-performance field of view on a grid, like in a roguelike. I know I could use the field of view algorithms out there, but since this is for a server, I think I can afford to use some more resources; a pre-calculated field of view would only be wasteful on a client or in single player.
What I'm thinking of is putting all the possibilities (I'm going with a 21x21 map, so there are 441 spots: 441! / (2! (441 - 2)!) = 97020 pairs) into a file. The server reads the file and puts the data into an unordered map of vectors of pairs of ints (the coords would be based on the position of the client, i.e. pair.first + client.x - 11). Then when a client sends a movement, the server gets the tiles around it, hashes them using Boost, and sends the visible tiles (with delta compression, of course!).
I would probably mess with bits and use 9-bit values instead of 8, because the max index is 441: 8 bits only reach 255, while 9 bits reach 511.
What I am wondering is whether the hashing will take more time than a simple algorithm (I don't have much experience with hashing...), or whether this will take too much RAM. If not, then I can flat out use the most complex and accurate algorithm without the processing cost!
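To make the lookup concrete, here is a minimal sketch of the table described above. combine() mirrors what boost::hash_combine does; all names here (hashWindow, fovTable, Offsets) are made up for illustration, not an existing API.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sketch: precomputed visibility keyed by a hash of the
// 21x21 window of blocking tiles around the client.
using Offsets = std::vector<std::pair<int, int>>; // visible tiles, relative to client

std::size_t combine(std::size_t seed, std::size_t v) {
    // same mixing step boost::hash_combine uses
    return seed ^ (v + 0x9e3779b9 + (seed << 6) + (seed >> 2));
}

std::size_t hashWindow(const std::vector<std::uint8_t>& blockers) { // 441 entries, 0 or 1
    std::size_t seed = 0;
    for (std::uint8_t b : blockers) seed = combine(seed, b);
    return seed;
}

// Precomputed table loaded from the file: window hash -> visible offsets.
std::unordered_map<std::size_t, Offsets> fovTable;
```

On a movement, the server would hash the client's window and translate each stored offset by (client.x - 11, client.y - 11), as described above.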
I have a video file, which consists of many successive frames of binary data. Each frame also has a unique timestamp (which is NOT its sequential number in the file, but rather a value provided by the camera at recording time). On the other hand, I've got an API function which retrieves a frame based on its sequential number. To make things a bit more complicated, I have a player which is provided with the timestamp and should get the binary data for that frame.
Another sad thing here: timestamps are NOT sequential. They can be sequential, but it is not guaranteed, as a wraparound may occur at the max unsigned short value.
So a sequence of timestamps could be either
54567, 54568, ... , 65535, 65536 , ... or
54567, 54568, ..., 65535, 0, 1, ...
So it might look like the following:
Frame 0
timestamp 54567
binary data
........
Frame 1
timestamp 54569
binary data
........
Frame 2
timestamp 54579
binary data
.
.
.
Frame n
timestamp m
binary data
0 <= n <= 65535 (MAX_UNSIGNED_SHORT)
0 <= m <= MAX_UNSIGNED_INT
The clip player API should be able to get the binary frame by the timestamp. However, internally, I can get the frame only by its frame sequential number. So if I am asked for timestamp m, I need to iterate over n frames, to find the frame with timestamp m.
To optimize it, I chose to create an index file, which would give me a match between timestamp and the frame sequential number. And here is my question:
Currently my index file is composed of binary pairs of size 2*sizeof(unsigned int), each containing a timestamp and a frame sequential number. The player later builds from that file an STL map with key == timestamp and value == frame sequential number.
Is there any way to do it more efficiently? Should I create my index file as a dump of some data structure, so it could later be loaded into memory by the clip player while opening the clip, so I would have an O(1) access to frames? Do you have other suggestions?
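For reference, a minimal sketch of the current approach: the index file is a flat run of (timestamp, frame number) pairs of unsigned ints, loaded into an std::map for O(log N) lookup by timestamp. loadIndex() is an illustrative name, not part of any existing API.

```cpp
#include <cstdint>
#include <fstream>
#include <map>

// Read binary (timestamp, frame number) pairs and build the lookup map.
std::map<std::uint32_t, std::uint32_t> loadIndex(const char* path) {
    std::map<std::uint32_t, std::uint32_t> index;
    std::ifstream in(path, std::ios::binary);
    std::uint32_t pair[2]; // pair[0] = timestamp, pair[1] = frame number
    while (in.read(reinterpret_cast<char*>(pair), sizeof(pair)))
        index[pair[0]] = pair[1];
    return index;
}
```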
UPD:
I have updated the names and requirements (timestamps are not necessarily sequential, and the number of frames is bounded by MAX_UNSIGNED_SHORT). Also, I wanted to thank everyone who already took the time to answer. Interpolation search is an interesting idea, although I have never tried it myself. I guess the question is the runtime delta between O(1) and O(log log N).
It would seem that we should be able to make the following assumptions:
a) the video file itself will not be modified after it is created
b) the player may want to find successive frames i.e. when it is doing normal playback
c) the player may want to find random frames i.e. when it is doing FF, REW or skip by or to chapter
Given this, why not just use a HashMap associating the Frame Id with the Frame Index? You can create it once; the player can read it and then do an easy, time-bounded lookup of the requested Frame.
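In C++ terms that could look like the sketch below: build the hash map once when the clip is opened, then look up a frame by timestamp in expected O(1). The names (frameIndex, frameFor) are illustrative.

```cpp
#include <cstdint>
#include <unordered_map>

// Built once from the index file: timestamp -> frame sequential number.
std::unordered_map<std::uint32_t, std::uint32_t> frameIndex;

// Expected O(1) lookup; returns UINT32_MAX if the timestamp is unknown.
std::uint32_t frameFor(std::uint32_t timestamp) {
    auto it = frameIndex.find(timestamp);
    return it != frameIndex.end() ? it->second : UINT32_MAX;
}
```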
There are a series of tradeoffs to make here.
Your index file is already a dump of a data structure: an array. If you don't plan on often inserting or deleting frames, and keep this array in a sorted order, it's easy to do a binary search (using std::binary_search) on the array. Insertion and deletion take O(N), but searching is still O(log N). The array will occupy less space in memory, and will be faster to read and write from your index file.
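A sketch of that sorted-array search: note std::binary_search only answers yes/no, so std::lower_bound is the call that actually yields the match. Entry and findFrame are illustrative names, and `index` must already be sorted by timestamp.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Entry { std::uint32_t timestamp, frame; };

// Binary search the sorted index; returns the frame number, or -1 if absent.
long findFrame(const std::vector<Entry>& index, std::uint32_t ts) {
    auto it = std::lower_bound(index.begin(), index.end(), ts,
        [](const Entry& e, std::uint32_t t) { return e.timestamp < t; });
    if (it != index.end() && it->timestamp == ts) return it->frame;
    return -1;
}
```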
If you're doing a lot of inserting and removing of frames, then converting to a std::map structure will give you better performance. If the number of frames is large, or you want to store more metadata with them, you might want to look at a B-tree structure, or just use an embedded database like SQLite or Berkeley DB. Both of these implement B-tree indexing and are well-tested pieces of code.
Simply store the frame data in an array where indices represent frame numbers. Then create a hash map from camera indices to frame numbers. You can get the frame belonging to either a frame number or camera index in O(1) while barely using more memory than your current approach.
Alternatively, you can maintain an array, indexed by frame number, that stores (camera index, data) pairs, and perform an O(log n) binary search on it when you need access by camera index. This takes advantage of the fact that the camera indices are sorted.
In C++'s standard library, hash maps are available as std::unordered_map (if your compiler/STL supports them, which might not be the case since they were only recently added to the C++ standard), although the tree-based std::map (with O(log n) lookup) is probably good enough for this purpose.
A binary search implementation is available as std::binary_search.
I'm looking to filter a 1 bit per pixel image using a 3x3 filter: for each input pixel, the corresponding output pixel is set to 1 if the weighted sum of the pixels surrounding it (with weights determined by the filter) exceeds some threshold.
I was hoping that this would be more efficient than converting to 8 bpp and then filtering that, but I can't think of a good way to do it. A naive method is to keep track of nine pointers to bytes (three consecutive rows and also pointers to either side of the current byte in each row, for calculating the output for the first and last bits in these bytes) and for each input pixel compute
sum = filter[0] * (lastRowPtr & aMask > 0) + filter[1] * (lastRowPtr & bMask > 0) + ... + filter[8] * (nextRowPtr & hMask > 0),
with extra faff for bits at the edge of a byte. However, this is slow and seems really ugly. You're not gaining any parallelism from the fact that you've got eight pixels in each byte and instead are having to do tonnes of extra work masking things.
Are there any good sources for how to best do this sort of thing? A solution to this particular problem would be amazing, but I'd be happy being pointed to any examples of efficient image processing on 1 bpp images in C/C++. I'd like to replace some more 8 bpp stuff with 1 bpp algorithms in future to avoid image conversions and copying, so any general resources on this would be appreciated.
I found a number of years ago that unpacking the bits to bytes, doing the filter, then packing the bytes back to bits was faster than working with the bits directly. It seems counter-intuitive because it's 3 loops instead of 1, but the simplicity of each loop more than made up for it.
I can't guarantee that it's still the fastest; compilers and especially processors are prone to change. However simplifying each loop not only makes it easier to optimize, it makes it easier to read. That's got to be worth something.
A further advantage to unpacking to a separate buffer is that it gives you flexibility for what you do at the edges. By making the buffer 2 bytes larger than the input, you unpack starting at byte 1 then set byte 0 and n to whatever you like and the filtering loop doesn't have to worry about boundary conditions at all.
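A minimal sketch of the unpack and pack steps, assuming MSB-first bit order and one padding byte at each end of the unpacked row so the filter loop never tests boundaries. The function names are illustrative.

```cpp
#include <cstdint>
#include <vector>

// Expand a packed 1 bpp row to one byte per pixel, with a zero padding
// byte at each end ([0] and [width+1]) so the 3-wide filter needs no
// boundary checks.
std::vector<std::uint8_t> unpackRow(const std::uint8_t* packed, int widthPixels) {
    std::vector<std::uint8_t> row(widthPixels + 2, 0);
    for (int x = 0; x < widthPixels; ++x)
        row[x + 1] = (packed[x / 8] >> (7 - x % 8)) & 1; // MSB-first bit order
    return row;
}

// Pack a byte-per-pixel row (without padding) back to 1 bpp.
void packRow(const std::uint8_t* bytes, int widthPixels, std::uint8_t* packed) {
    for (int i = 0; i < (widthPixels + 7) / 8; ++i) packed[i] = 0;
    for (int x = 0; x < widthPixels; ++x)
        if (bytes[x]) packed[x / 8] |= 0x80 >> (x % 8);
}
```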
Look into separable filters. Among other things, they allow massive parallelism in the cases where they work.
For example, in your 3x3 sample-weight-and-filter case:
Sample 1x3 (horizontal) pixels into a buffer. This can be done in isolation for each pixel, so a 1024x1024 image can run 1024^2 simultaneous tasks, all of which perform 3 samples.
Sample 3x1 (vertical) pixels from the buffer. Again, this can be done on every pixel simultaneously.
Use the contents of the buffer to cull pixels from the original texture.
The advantage to this approach, mathematically, is that it cuts the number of sample operations from n^2 to 2n, although it requires a buffer of equal size to the source (if you're already performing a copy, that can be used as the buffer; you just can't modify the original source for step 2). In order to keep memory use at 2n, you can perform steps 2 and 3 together (this is a bit tricky and not entirely pleasant); if memory isn't an issue, you can spend 3n on two buffers (source, hblur, vblur).
Because each operation works in complete isolation from an immutable source, you can perform the filter on every pixel simultaneously if you have enough cores. Or, in a more realistic scenario, you can take advantage of paging and caching to load and process a single column or row. This is convenient when working with odd strides, padding at the end of a row, etc. The second round of samples (vertical) may screw with your cache, but at the very worst, one round will be cache-friendly and you've cut processing from quadratic to linear.
Now, I've yet to touch on the case of storing data in bits specifically. That does make things slightly more complicated, but not terribly much so. Assuming you can use a rolling window, something like:
d = s[x-1] + s[x] + s[x+1]
works. Interestingly, if you were to rotate the image 90 degrees during the output of step 1 (trivial, sample from (y,x) when reading), you can get away with loading at most two horizontally adjacent bytes for any sample, and only a single byte something like 75% of the time. This plays a little less friendly with cache during the read, but greatly simplifies the algorithm (enough that it may regain the loss).
Pseudo-code:
buffer source, dest, vbuf, hbuf;
for_each (y, x) // Loop over each row, then each column. Generally works better wrt paging
{
hbuf(x, y) = (source(y, x-1) + source(y, x) + source(y, x+1)) / 3 // swap x and y to spin 90 degrees
}
for_each (y, x)
{
vbuf(x, 1-y) = (hbuf(y, x-1) + hbuf(y, x) + hbuf(y, x+1)) / 3 // 1-y to reverse the 90 degree spin
}
for_each (y, x)
{
dest(x, y) = threshold(vbuf(x, y)) // vbuf holds the result of both passes
}
Accessing bits within the bytes (source(x, y) indicates access/sample) is relatively simple to do, but kind of a pain to write out here, so is left to the reader. The principle, particularly implemented in this fashion (with the 90 degree rotation), only requires 2 passes of n samples each, and always samples from immediately adjacent bits/bytes (never requiring you to calculate the position of the bit in the next row). All in all, it's massively faster and simpler than any alternative.
Rather than expanding the entire image to 1 bit/byte (or 8bpp, essentially, as you noted), you can simply expand the current window - read the first byte of the first row, shift and mask, then read out the three bits you need; do the same for the other two rows. Then, for the next window, you simply discard the left column and fetch one more bit from each row. The logic and code to do this right isn't as easy as simply expanding the entire image, but it'll take a lot less memory.
As a middle ground, you could just expand the three rows you're currently working on. Probably easier to code that way.
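A sketch of that three-row middle ground, assuming the three rows around the current one have already been unpacked to one byte per pixel with a zero padding byte at each end (so indices x..x+2 are always valid). The 3x3 weighted sum then becomes plain array arithmetic. filterRow is an illustrative name; outPacked must be zeroed by the caller.

```cpp
#include <cstdint>
#include <vector>

// One output row of the 3x3 threshold filter, reading three unpacked
// rows (each width+2 bytes including padding) and writing MSB-first
// 1 bpp output.
void filterRow(const std::vector<std::uint8_t>& above,
               const std::vector<std::uint8_t>& cur,
               const std::vector<std::uint8_t>& below,
               const int weight[9], int threshold,
               int width, std::uint8_t* outPacked) {
    for (int x = 0; x < width; ++x) {
        int sum = weight[0]*above[x] + weight[1]*above[x+1] + weight[2]*above[x+2]
                + weight[3]*cur[x]   + weight[4]*cur[x+1]   + weight[5]*cur[x+2]
                + weight[6]*below[x] + weight[7]*below[x+1] + weight[8]*below[x+2];
        if (sum > threshold)
            outPacked[x / 8] |= 0x80 >> (x % 8);
    }
}
```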
I've run into some nasty problem with my recorder. Some people are still using it with analog tuners, and analog tuners have a tendency to spit out 'snow' if there is no signal present.
The problem is that when noise is fed into the encoder, it goes completely crazy, first consuming all CPU and ultimately freezing. Since the main point of the recorder is to stay up and running no matter what, I have to figure out how to proceed, so the encoder won't be exposed to data it can't handle.
So, the idea is to create an 'entropy detector': a simple, small routine that will go through the frame buffer data and calculate an entropy index, i.e. how random the data in the picture actually is.
The result of the routine would be a number that is 0 for a completely black picture and 1 for a completely random picture - snow, that is.
The routine itself should be forward-scanning only, with a few local variables that fit nicely into registers.
I could use the zlib or 7z API for such a task, but I would really like to cook up something of my own.
Any ideas?
PNG works this way (approximately): for each pixel, replace its value by its value minus the value of the pixel to its left. Do this from right to left, so each subtraction uses the unmodified left neighbor.
Then you can calculate the entropy (bits per character) by making a table of how often each value now appears, turning those absolute counts into relative frequencies p, and summing -p * log2(p) over all values.
Oh, and you have to do this for each color channel (r, g, b) separately.
For the result, take the average of the bits per character over the channels and divide it by 8 (the maximum entropy, assuming 8 bits per color), so it lands in 0..1.
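A minimal sketch of those steps for a single 8-bit channel: delta against the left neighbor (like PNG's Sub filter, wrapping mod 256), histogram the residuals, then Shannon entropy in bits per symbol, normalized to 0..1. channelEntropy is an illustrative name.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns ~0 for a flat picture and approaches 1 for pure noise.
double channelEntropy(const std::vector<std::uint8_t>& pixels) {
    std::vector<std::uint8_t> residual(pixels.size());
    for (std::size_t i = pixels.size(); i-- > 1; )
        residual[i] = pixels[i] - pixels[i - 1]; // wraps mod 256
    if (!pixels.empty()) residual[0] = pixels[0];

    double hist[256] = {0};
    for (std::uint8_t r : residual) hist[r] += 1.0;

    double bits = 0.0, n = static_cast<double>(residual.size());
    for (double h : hist)
        if (h > 0) { double p = h / n; bits -= p * std::log2(p); }
    return bits / 8.0; // max entropy for 8-bit values is 8 bits
}
```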
I have an array of point data, the values of points are represented as x co-ordinate and y co-ordinate.
These points could range from 500 up to 2000 points or more.
The data represents a motion path which could range from the simple to very complex and can also have cusps in it.
Can I represent this data as one spline, a collection of splines, or some other format with very tight compression?
I have tried representing them as a collection of Béziers, but at best I am getting a saving of 40%.
For instance, if I have an array of 500 points, that gives me 500 x and 500 y values, so I have 1000 pieces of data.
I get around 100 quadratic Béziers from this; each Bézier is represented as controlx, controly, anchorx, anchory,
which gives me 100 x 4 = 400 pieces of data.
So input = 1000 pieces, output = 400 pieces.
I would like to tighten this further; any suggestions?
By its nature, a spline is an approximation, so you can reduce the number of splines you use to reach a higher compression ratio.
You can also achieve lossless compression with some kind of encoding scheme. I am making this up as I type, using the range example from the previous answer (0..1000 for x and 0..400 for y):
Each full point only needs 19 bits (10 for x, 9 for y), so with a prefix it fits in 3 bytes.
Use 2 bytes to represent a displacement of up to +/-63 on each axis.
Use 1 byte to represent a short displacement of up to +/-7 for x and +/-3 for y.
To decode the sequence properly, you need a prefix identifying the type of encoding. Let's say we use 110 for a full point, 10 for a displacement, and 0 for a short displacement.
The bit layout will look like this,
Coordinates: 110xxxxxxxxxxxyyyyyyyyyy
Displacement: 10xxxxxxxyyyyyyy
Short Displacement: 0xxxxyyy
Unless your sequence is totally random, you can easily achieve high compression ratio with this scheme.
Let's see how it works using a short example.
3 points: A(500, 400), B(550, 380), C(545, 381)
Let's say you were using 2 bytes for each value. It would take 12 bytes to encode these three points without compression.
To encode the sequence using the compression scheme,
A is first point so full coordinate will be used. 3 bytes.
B's displacement from A is (50, -20) and can be encoded as displacement. 2 bytes.
C's displacement from B is (-5, 1) and it fits the range of short displacement 1 byte.
So you save 6 of the 12 bytes. The real compression ratio depends entirely on the data pattern; it works best on points forming a motion path. If the points are random, only a 25% saving can be achieved.
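A sketch of the size side of this scheme: pick the cheapest of the three encodings per point and total the bytes. The ranges follow the bit layouts above (signed 4/3 bits for a short displacement, signed 7 bits otherwise); a real encoder would also emit the prefixed bit patterns. It reproduces the 3 + 2 + 1 = 6 bytes of the worked example.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

std::size_t encodedSize(const std::vector<std::pair<int, int>>& pts) {
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        if (i == 0) { bytes += 3; continue; }        // first point: full coordinate
        int dx = pts[i].first - pts[i - 1].first;
        int dy = pts[i].second - pts[i - 1].second;
        if (dx >= -8 && dx <= 7 && dy >= -4 && dy <= 3)
            bytes += 1;                              // 0xxxxyyy
        else if (dx >= -64 && dx <= 63 && dy >= -64 && dy <= 63)
            bytes += 2;                              // 10xxxxxxxyyyyyyy
        else
            bytes += 3;                              // 110 + full x and y
    }
    return bytes;
}
```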
If, for example, you use 32-bit integers for point coords and there is a range limit, like x: 0..1000, y: 0..400, you can pack (x, y) into a single 32-bit variable.
That way you achieve another 50% compression.
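A minimal sketch of that packing: with x in 0..1000 (10 bits) and y in 0..400 (9 bits), both fit easily in one 32-bit word; a simple 16/16 split keeps the shifts obvious. Function names are illustrative.

```cpp
#include <cstdint>

// Pack x into the high 16 bits and y into the low 16 bits.
std::uint32_t pack(std::uint32_t x, std::uint32_t y) { return (x << 16) | (y & 0xFFFF); }
std::uint32_t unpackX(std::uint32_t p)               { return p >> 16; }
std::uint32_t unpackY(std::uint32_t p)               { return p & 0xFFFF; }
```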
You could do a frequency analysis of the numbers you are trying to encode and use varying bit lengths to represent them; of course, here I am vaguely describing Huffman coding.
Firstly, only keep as many decimal places in your data as you actually need. Removing them reduces your accuracy, but it's a calculated loss. To do that, try converting your number to a string, locating the dot's position, and cutting off that many characters from the end. That could process faster than math, IMO. Lastly, you can convert it back to a number.
150.234636746 -> "150.234636746" -> "150.23" -> 150.23
Secondly, try storing your data relative to the last number ("relative values"): basically, subtract the previous number from the current one. Later, to "decompress" it, you keep an accumulator variable and add the deltas back up.
A    A    A          A    R    R
150, 200, 250   ->   150,  50,  50
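A minimal sketch of both steps: round to two decimal places, then store each value as a delta from the previous one; decoding is a running sum. The first value stays absolute (its "delta" from an implicit 0). Function names are illustrative.

```cpp
#include <cmath>
#include <vector>

// Keep two decimal places (numeric equivalent of the string trick above).
double round2(double v) { return std::round(v * 100.0) / 100.0; }

// Encode: each value becomes its difference from the previous value.
std::vector<double> toDeltas(const std::vector<double>& vals) {
    std::vector<double> out;
    double prev = 0.0;
    for (double v : vals) { out.push_back(v - prev); prev = v; }
    return out;
}

// Decode: running sum via an accumulator.
std::vector<double> fromDeltas(const std::vector<double>& deltas) {
    std::vector<double> out;
    double acc = 0.0;
    for (double d : deltas) { acc += d; out.push_back(acc); }
    return out;
}
```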