Finding the most efficient data structure to create an index file - C++

I have a video file, which consists of many successive frames of binary data. Each frame has also a unique timestamp (which is NOT its sequential number in file, but rather a value, provided by the camera at the recording time). On the other hand, I've got an API function which retrieves that frame based on the sequential number of that frame. To make things a bit more complicated - I have a player, who is provided with the timestamp, and should get the binary data for that frame.
Another sad thing here: timestamps are NOT sequential. They can be sequential, but it is not guaranteed, as a wraparound may occur around max unsigned short size.
So a sequence of timestamps could be either
54567, 54568, ... , 65535, 65536 , ... or
54567, 54568, ..., 65535, 0, 1, ...
So it might look like the following:
Frame 0
timestamp 54567
binary data
........
Frame 1
timestamp 54569
binary data
........
Frame 2
timestamp 54579
binary data
.
.
.
Frame n
timestamp m
binary data
0 <= n <= 65535 (MAX_UNSIGNED_SHORT)
0 <= m <= MAX_UNSIGNED_INT
The clip player API should be able to get the binary frame by the timestamp. However, internally, I can get the frame only by its frame sequential number. So if I am asked for timestamp m, I need to iterate over n frames, to find the frame with timestamp m.
To optimize it, I chose to create an index file, which would give me a match between timestamp and the frame sequential number. And here is my question:
Currently my index file is composed of binary pairs of size 2*sizeof(unsigned int), each containing a timestamp and a frame sequential number. The player later builds an STL map from that file, with key == timestamp and value == frame sequential number.
Is there any way to do it more efficiently? Should I create my index file as a dump of some data structure, so it could later be loaded into memory by the clip player while opening the clip, so I would have an O(1) access to frames? Do you have other suggestions?
UPD:
I have updated the names and requirements (timestamps are not necessarily sequential, and the number of frames is bounded by MAX_UNSIGNED_SHORT). Also, I wanted to thank everyone who has already taken the time to answer. The interpolation search is an interesting idea, although I have never tried it myself. I guess the question is the runtime delta between O(1) and O(log log N).

It would seem that we should be able to make the following assumptions:
a) the video file itself will not be modified after it is created
b) the player may want to find successive frames i.e. when it is doing normal playback
c) the player may want to find random frames i.e. when it is doing FF, REW or skip by or to chapter
Given this, why not just use a HashMap associating the frame ID with the frame index? You can create that once; the player can read it and then do an easy, time-bounded lookup of the requested frame.

There are a series of tradeoffs to make here.
Your index file is already a dump of a data structure: an array. If you don't plan on frequently inserting or deleting frames, and you keep this array sorted, it's easy to do a binary search on it (using std::binary_search, or std::lower_bound if you need the matching element rather than a yes/no). Insertion and deletion take O(N), but searching is still O(log N). The array will occupy less space in memory than a map, and will be faster to read from and write to your index file.
If you're doing a lot of inserting and removing of frames, then converting to a std::map will give you better performance. If the number of frames is large, or you want to store more metadata with them, you might want to look at a B-tree structure, or just use an embedded database like SQLite or BerkeleyDB. Both of these implement B-tree indexing and are well-tested pieces of code.

Simply store the frame data in an array where indices represent frame numbers. Then create a hash map from camera indices to frame numbers. You can get the frame belonging to either a frame number or camera index in O(1) while barely using more memory than your current approach.
Alternatively, you can maintain an array, indexed by frame number, that stores a (camera index, data) pair, and perform an O(log n) binary search on it when you need to access it by camera index. This takes advantage of the fact that the camera indices are sorted.
In C++'s standard library, hash maps are available as std::unordered_map (if your compiler/STL supports them, which might not be the case since they have only recently been added to the C++ standard), although the tree-based std::map (with O(log n) lookup) is probably good enough for this purpose.
A binary search implementation is available as std::binary_search.

Related

Pre-calculated field of view for efficiency

I want a high-performance field of view on a grid, as in a roguelike. I know I could use the existing field-of-view algorithms, but since this is for a server, I think it can afford to use some more resources; a pre-calculated field of view would be very wasteful on a client or in single player.
What I'm thinking of is putting all the possibilities (I'm going with a 21x21 map, so there are 441 spots: 441! / (2!(441 - 2)!) = 97020) into a file; the server reads the file and puts the data into an unordered map of vectors of pairs of ints (the coordinate would be based on the position of the client, i.e. pair.first + client.x - 11). Then when a client sends a movement, the server gets the tiles around it, hashes them using Boost, and sends the visible tiles (with delta compression, of course!).
I would probably mess with bits and use 9-bit values instead of 8-bit bytes, because the maximum value is 441, and 8 bits can only hold up to 255, while 9 bits goes up to 511.
What I am wondering is whether the hashing will take more time than using a simple algorithm (I don't have much experience with hashing...), or whether this will use too much RAM; if not, then I can flat out use the most complex and accurate algorithm without the processing time!

FFT of large data (16 GB) using MATLAB

I am trying to compute a fast Fourier transform of a large chunk of data imported from a text file that is around 16 GB in size. I was trying to think of a way to compute its FFT in MATLAB, but due to my computer's memory (8 GB) it gives me an out-of-memory error. I tried using memmap and textscan, but was not able to apply them to get the FFT of the combined data.
Can anyone kindly guide me on how I should approach getting the Fourier transform? I am also trying to compute the Fourier transform (from the definition) using C++ code on a remote server, but it's taking a long time to execute. Can anyone give me proper insight into how I should handle this much data?
It depends on the resolution of the FFT that you require. If you only need an FFT of, say, 1024 points, then you can reshape your data to, or sequentially read it as N x 1024 blocks. Once you have it in this format, you can then add the output of each FFT result to a 1024 point complex accumulator.
If you need the same resolution after the FFT, then you need more memory, or a special FFT routine that is not included in MATLAB (though I'm not sure whether it is even mathematically possible to do a partial FFT by buffering small chunks through at full resolution).
It may be better to implement the FFT with your own code.
The FFT algorithm has a "butterfly" operation, so you can split the whole computation into smaller blocks.
The file is too large for a typical PC to hold in memory, but the FFT doesn't need all the data at once. It can always start with a 2-point (maybe 8-point is better) FFT, and you can build up by cascading the stages. That means you can read only a few points at a time, do some calculation, and save the result to disk. The next time you do another iteration, you read the saved data back from disk.
Depending on how you build the data structure, you can either store all the data in one single file, and read/save it with pointers (in Matlab it's merely a number); or you can store every single point in one individual file, generating billions of files and distinguishing them by file names.
The idea is you can dump your calculation to disk, instead of memory. Of course it requires such amount of disk space, which is more feasible.
I can show you a piece of pseudo-code. Depending on the data structure of your original data (that 16 GB text file), the implementation will be different, but you can operate on the file however you like since you own it. I will start with a 2-point FFT and work through the 8-point example in this Wikipedia picture.
1. Do a 2-point FFT on x, generating y, the 3rd column of white circles from the left:
read x[0], x[4] from file 'origin'
y[0] = x[0] + x[4]*W(N,0);
y[1] = x[0] - x[4]*W(N,0);
save y[0], y[1] to file 'temp'
remove x[0], x[4], y[0], y[1] from memory
read x[2], x[6] from file 'origin'
y[2] = x[2] + x[6]*W(N,0);
y[3] = x[2] - x[6]*W(N,0);
save y[2], y[3] to file 'temp'
remove x[2], x[6], y[2], y[3] from memory
....
2. Do a 2-point FFT on y, generating z, the 5th column of white circles.
3. Do a 2-point FFT on z, generating the final result, X.
Basically, the Cooley–Tukey FFT algorithm is designed to let you cut up the data and calculate piece by piece, so it is possible to handle large amounts of data. I know it's not a standard approach, but if you take a look at the Chinese version of that Wikipedia page, you will find a number of pictures that may help you understand how it splits up the points.
I've encountered this same problem. I ended up finding a solution in a paper:
Extending sizes of effective convolution algorithms. It essentially involves loading shorter chunks, multiplying by a phase factor and FFT-ing, then loading the next chunk in the series. This gives a subsampling of the total FFT of the full signal. The process is then repeated a number of times with different phase factors to fill in the remaining points. I will attempt to summarize here (adapted from Table II in the paper):
For a total signal f(j) of length N, decide on a number m of shorter chunks, each of length N/m, that you can store in memory (if needed, zero-pad the signal so that N is a multiple of m).
For beta = 0, 1, 2, ... ,m - 1 do the following:
Divide the new series into m subintervals of N/m successive points.
For each subinterval, multiply each jth element by exp(i*2*pi*j*beta/N). Here, j is indexed according to the position of the point relative to the first in the whole data stream.
Sum the first elements of each subinterval to produce a single number, sum the second elements, and so forth. This can be done as points are read from file, so there is no need to have the full set of N points in memory.
Fourier transform the resultant series, which contains N/m points.
This will give F(k) for k = ml + beta, for l = 0, ..., N/m-1. Save these values to disk.
Go to 2, and proceed with the next value of beta.

Filtering 1bpp images

I'm looking to filter a 1 bit per pixel image using a 3x3 filter: for each input pixel, the corresponding output pixel is set to 1 if the weighted sum of the pixels surrounding it (with weights determined by the filter) exceeds some threshold.
I was hoping that this would be more efficient than converting to 8 bpp and then filtering that, but I can't think of a good way to do it. A naive method is to keep track of nine pointers to bytes (three consecutive rows and also pointers to either side of the current byte in each row, for calculating the output for the first and last bits in these bytes) and for each input pixel compute
sum = filter[0] * (lastRowPtr & aMask > 0) + filter[1] * (lastRowPtr & bMask > 0) + ... + filter[8] * (nextRowPtr & hMask > 0),
with extra faff for bits at the edge of a byte. However, this is slow and seems really ugly. You're not gaining any parallelism from the fact that you've got eight pixels in each byte and instead are having to do tonnes of extra work masking things.
Are there any good sources for how best to do this sort of thing? A solution to this particular problem would be amazing, but I'd be happy to be pointed to any examples of efficient image processing on 1bpp images in C/C++. I'd like to replace some more 8 bpp stuff with 1 bpp algorithms in future to avoid image conversions and copying, so any general resources on this would be appreciated.
I found a number of years ago that unpacking the bits to bytes, doing the filter, then packing the bytes back to bits was faster than working with the bits directly. It seems counter-intuitive because it's 3 loops instead of 1, but the simplicity of each loop more than made up for it.
I can't guarantee that it's still the fastest; compilers and especially processors are prone to change. However simplifying each loop not only makes it easier to optimize, it makes it easier to read. That's got to be worth something.
A further advantage to unpacking to a separate buffer is that it gives you flexibility for what you do at the edges. By making the buffer 2 bytes larger than the input, you unpack starting at byte 1 then set byte 0 and n to whatever you like and the filtering loop doesn't have to worry about boundary conditions at all.
Look into separable filters. Among other things, they allow massive parallelism in the cases where they work.
For example, in your 3x3 sample-weight-and-filter case:
Sample 1x3 (horizontal) pixels into a buffer. This can be done in isolation for each pixel, so a 1024x1024 image can run 1024^2 simultaneous tasks, all of which perform 3 samples.
Sample 3x1 (vertical) pixels from the buffer. Again, this can be done on every pixel simultaneously.
Use the contents of the buffer to cull pixels from the original texture.
The advantage to this approach, mathematically, is that it cuts the number of sample operations from n^2 to 2n, although it requires a buffer of equal size to the source (if you're already performing a copy, that can be used as the buffer; you just can't modify the original source for step 2). In order to keep memory use at 2n, you can perform steps 2 and 3 together (this is a bit tricky and not entirely pleasant); if memory isn't an issue, you can spend 3n on two buffers (source, hblur, vblur).
Because each operation works in complete isolation from an immutable source, you can perform the filter on every pixel simultaneously if you have enough cores. Or, in a more realistic scenario, you can take advantage of paging and caching to load and process a single column or row. This is convenient when working with odd strides, padding at the end of a row, etc. The second round of samples (vertical) may screw with your cache, but at the very worst, one round will be cache-friendly and you've cut processing from quadratic to linear.
Now, I've yet to touch on the case of storing data in bits specifically. That does make things slightly more complicated, but not terribly much so. Assuming you can use a rolling window, something like:
d = s[x-1] + s[x] + s[x+1]
works. Interestingly, if you were to rotate the image 90 degrees during the output of step 1 (trivial, sample from (y,x) when reading), you can get away with loading at most two horizontally adjacent bytes for any sample, and only a single byte something like 75% of the time. This plays a little less friendly with cache during the read, but greatly simplifies the algorithm (enough that it may regain the loss).
Pseudo-code:
buffer source, dest, vbuf, hbuf;
for_each (y, x) // Loop over each row, then each column. Generally works better wrt paging
{
hbuf(x, y) = (source(y, x-1) + source(y, x) + source(y, x+1)) / 3 // swap x and y to spin 90 degrees
}
for_each (y, x)
{
vbuf(x, 1-y) = (hbuf(y, x-1) + hbuf(y, x) + hbuf(y, x+1)) / 3 // 1-y to reverse the 90 degree spin
}
for_each (y, x)
{
dest(x, y) = threshold(vbuf(x, y))
}
Accessing bits within the bytes (source(x, y) indicates access/sample) is relatively simple to do, but kind of a pain to write out here, so is left to the reader. The principle, particularly implemented in this fashion (with the 90 degree rotation), only requires 2 passes of n samples each, and always samples from immediately adjacent bits/bytes (never requiring you to calculate the position of the bit in the next row). All in all, it's massively faster and simpler than any alternative.
Rather than expanding the entire image to 1 bit/byte (or 8bpp, essentially, as you noted), you can simply expand the current window - read the first byte of the first row, shift and mask, then read out the three bits you need; do the same for the other two rows. Then, for the next window, you simply discard the left column and fetch one more bit from each row. The logic and code to do this right isn't as easy as simply expanding the entire image, but it'll take a lot less memory.
As a middle ground, you could just expand the three rows you're currently working on. Probably easier to code that way.

Storing Tile Data In Excess of 100 Million Tiles Per Layer Multiple Layers

Problem: I am trying to store tile data for my map class. I had the idea of using a palette per layer; the palette would describe the data in the layer, which would be an array of bytes with each byte representing a tile type.
This means one layer of 100 million tiles would be ~96 MB. However, I overlooked how much data I can actually store in a byte: only 256 tile types, of course. With 16-pixel tiles, that limits each palette texture to sqrt(256) * 16 = 256 pixels per side, and since each palette can only have one texture, a 256x256 texture is too small, severely limiting the tiles I can have in a layer.
I am now stuck in a bind: if I use 2 bytes (a short) instead of 1 byte per tile, I double my memory usage to ~192 MB per layer, and I want 4 layers at a minimum, inflating the end product to 768 MB of RAM. I also cannot describe the data within the data, as the array offset of each byte is also a description of its location.
Is there a way I could store this data more efficiently? The worst-case scenario would involve saving all of this to disk and buffering it into memory from there, but I would prefer to keep it in memory.
I guess I could come up with something smart in a few hours, but I thought I would ask to see if there are any common methods I am unaware of for combating this problem.
I suggest representing your data in an array which maps to the two dimensional plane using a space filling curve such as the Hilbert curve.
Then, compress this using a combination of Huffman coding and run-length encoding. This will be particularly effective if your data is often locally repeated (i.e. there are lots of sections that are all the same tile next to each other).
Do this compression in blocks of, say, 256 tiles. Then have an array of offsets that indicates where in the compressed data each block starts.
For example, the second block (starting at tile 256) might begin at byte 103, and the third block (starting at tile 512) might begin at byte 192.
Then, to access the 400th tile, you work out that it is in the second block, so you decompress that block (in this case bytes 103 to 191) and from it get tile 400 - 256 = 144. Save (cache) this decompressed data for the moment; if you're getting nearby tiles, they'll likely also be in this decompressed block. Your array of offsets could perhaps also record which blocks have been recently cached, and where in the cache they are.
If you wanted to allow modifications, you'd probably have to change your data structure from one large array to a vector of vectors, with an indicator for each vector saying whether it is compressed or not. When making modifications, uncompress blocks and modify them, and recompress the least recently modified blocks when memory is running out.
Or, you could just dump the whole structure to a file and memory map the file. This is much simpler but may be slower depending on the compressibility of your data and your access patterns due to additional I/O.

What is the best way to get the hash of a QPixmap?

I am developing a graphics application using Qt 4.5 and am putting images in the QPixmapCache. I wanted to optimise this so that if a user inserts an image which is already in the cache, that cached copy is used.
Right now each image has a unique id, which helps optimise it on paint events. However, I realise that if I could calculate a hash of the image, I could look it up in the cache to see whether it already exists and use that (it would help more for duplicate objects, of course).
My problem is that if it's a large QPixmap, will a hash calculation slow things down, or is there a quicker way?
A couple of comments on this:
If you're going to be generating a hash/cache key of a pixmap, then you may want to skip the QPixmapCache and use QCache directly. This would eliminate some overhead of using QStrings as keys (unless you also want to use the file path to locate the items)
As of Qt 4.4, QPixmap has a "hash" value associated with it (see QPixmap::cacheKey()). The documentation claims "Distinct QPixmap objects can only have the same cache key if they refer to the same contents." However, since Qt uses shared-data copying, this may only apply to copied pixmaps and not to two distinct pixmaps loaded from the same image. A bit of testing would tell you whether it works, and if it does, it would let you easily get a hash value.
If you really want to do a good, fairly quick cache with removing duplications, you might want to look at your own data structure that sorts according to sizes, color depths, image types, and things such as that. Then you would only need to hash the actual image data after you find the same type of image with the same dimensions, bit-depths, etc. Of course, if your users generally open a lot of images with those things the same, it wouldn't help at all.
Performance: Don't forget about the benchmarking stuff Qt added in 4.5, which would let you compare your various hashing ideas and see which one runs the fastest. I haven't checked it out yet, but it looks pretty neat.
Just in case anyone comes across this problem (and isn't too terribly experienced with hashing things, particularly something like an image), here's a VERY simple solution I used for hashing QPixmaps and entering them into a lookup table for later comparison:
qint32 HashClass::hashPixmap(const QPixmap& pix)
{
    QImage image = pix.toImage();
    qint32 hash = 0;
    for (int y = 0; y < image.height(); y++)
    {
        for (int x = 0; x < image.width(); x++)
        {
            // Simple one-at-a-time hash step over each pixel value
            QRgb pixel = image.pixel(x, y);
            hash += pixel;
            hash += (hash << 10);
            hash ^= (hash >> 6);
        }
    }
    return hash;
}
Here is the hashing function itself (you can have it hash into a qint64 if you desire less collisions). As you can see I convert the pixmap into a QImage, and simply walk through its dimensions and perform a very simple one-at-a-time hash on each pixel and return the final result. There are many ways to improve this implementation (see the other answers to this question), but this is the basic gist of what needs to be done.
The OP mentioned how he would use this hashing function to then construct a lookup table for later comparing images. This would require a very simple lookup initialization function -- something like this:
void HashClass::initializeImageLookupTable()
{
imageTable.insert(hashPixmap(QPixmap(":/Image_Path1.png")), "ImageKey1");
imageTable.insert(hashPixmap(QPixmap(":/Image_Path2.png")), "ImageKey2");
imageTable.insert(hashPixmap(QPixmap(":/Image_Path3.png")), "ImageKey3");
// Etc...
}
I'm using a QMap here called imageTable which would need to be declared in the class as such:
QMap<qint32, QString> imageTable;
Then, finally, when you want to compare an image against the images in your lookup table (i.e. "what image, out of the images I know it can be, is this particular image?"), you just call the hashing function on the image (which I'm assuming will also be a QPixmap), and the returned QString value will let you figure that out. Something like this would work:
void HashClass::compareImage(const QPixmap& pixmap)
{
QString value = imageTable[hashPixmap(pixmap)];
// Do whatever needs to be done with the QString value and pixmap after this point.
}
That's it. I hope this helps someone -- it would have saved me some time, although I was happy to have the experience of figuring it out.
Hash calculations should be pretty quick (somewhere above 100 MB/s if no disk I/O is involved), depending on which algorithm you use. Before hashing, you could also do some quick tests to sort out potential candidates; e.g. images must have the same width and height, otherwise it's useless to compare their hash values.
Of course, you should also keep the hash values for inserted images so you only have to calculate a hash for new images and won't have to calculate it again for the cached images.
If the images are different enough, it would perhaps be sufficient to hash not the whole image but a smaller thumbnail or a part of the image (e.g. the first and last 10 lines); this will be faster, but will lead to more collisions.
I'm assuming you're talking about actually calculating a hash over the data of the image rather than getting the unique id generated by Qt.
Depending on your images, you probably don't need to go over the whole image to calculate a hash. Maybe only read the first 10 pixels? The first scan line?
Maybe a pseudo-random selection of pixels from the entire image (with a known seed so that you can repeat the sequence)? Don't forget to add the size of the image to the hash as well.