Where to alter reference code to extract motion vectors from HEVC-encoded video - C++

So this question has been asked a few times, but I think my C++ skills are too deficient to really appreciate the answers. What I need is a way to start with an HEVC-encoded video and end with a CSV file that has all the motion vectors. So far, I've compiled and run the reference decoder, and everything seems to be working fine. I'm not sure if this matters, but I'm interested in the motion vectors as a convenient way to analyze motion in a video. My plan at first is to average the MVs in each frame to get a value expressing something about the average amount of movement in that frame.
The discussion here tells me about the TComDataCU class methods I need to interact with to get the MVs and talks about how to iterate over CTUs. But I still don't really understand the following:
1) what information is returned by these MV methods, and in what format? With my limited knowledge, I assume there will be something like 7 values associated with each MV: the frame number, an index identifying a macroblock in that frame, the size of the macroblock, the x coordinate of the macroblock (probably the top-left corner?), the y coordinate of the macroblock, the x coordinate of the vector, and the y coordinate of the vector.
2) where in the code do I need to put new statements that save the data? I thought there must be some spot in TComDataCU.cpp where I can put in lines that print the data I want to a file, but I'm confused about when the values are actually determined and what they are. The variable declarations look like this:
// create motion vector fields
m_pCtuAboveLeft = NULL;
m_pCtuAboveRight = NULL;
m_pCtuAbove = NULL;
m_pCtuLeft = NULL;
But I can't make much sense of those names. AboveLeft, AboveRight, Above, and Left seem like an asymmetric mix of directions?
Any help would be great! I think I would most benefit from seeing some example code. An explanation of the variables I need to pay attention to would also be very helpful.

In TEncSlice.cpp, you can access every CTU in this loop:
for( UInt ctuTsAddr = startCtuTsAddr; ctuTsAddr < boundingCtuTsAddr; ++ctuTsAddr )
Then you can pick out an exact CTU by its address, via pCtu->getCtuRsAddr() (pCtu is a TComDataCU*).
After that,
pCtu->getCUMvField( REF_PIC_LIST_0 )
returns the CTU's motion vector field (a TComCUMvField*), and you can extract the CTU's MVs from that object.
For example,
pCtu->getCUMvField( REF_PIC_LIST_0 )->getMv(g_auiRasterToZscan[y * 16 + x]).getHor()
returns the horizontal component of a specific 4x4 block's MV (note that getMv() returns a TComMv reference, not a pointer, so use . rather than ->).
You can save these data after m_pcCuEncoder->compressCtu( pCtu ), because compressCtu determines all the data of the CTU, such as the CU partitioning, motion estimation, etc.
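For a concrete starting point, here is a hedged sketch of what such a dump could look like, placed immediately after the m_pcCuEncoder->compressCtu( pCtu ) call in TEncSlice::compressSlice(). It assumes HM-16.x names, a 64x64 CTU (a 16x16 grid of 4x4 blocks), and list-0 motion only; intra-coded blocks will simply report whatever is stored (typically zero), and you'd need #include <fstream> at the top of the file:

std::ofstream mvFile("motion_vectors.csv", std::ios::app);
const TComCUMvField* mvField = pCtu->getCUMvField(REF_PIC_LIST_0);
for (Int y = 0; y < 16; y++)
{
  for (Int x = 0; x < 16; x++)
  {
    const TComMv& mv = mvField->getMv(g_auiRasterToZscan[y * 16 + x]);
    // frame number (POC), CTU address, block position within the CTU in
    // pixels, then the MV itself (HEVC stores MVs in quarter-pel units)
    mvFile << pCtu->getSlice()->getPOC() << ","
           << pCtu->getCtuRsAddr() << ","
           << x * 4 << "," << y * 4 << ","
           << mv.getHor() << "," << mv.getVer() << "\n";
  }
}

Averaging the per-frame MV magnitudes for the motion measure mentioned in the question is then a simple post-processing pass over the CSV.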
I hope this information helps you and other people!

Related

How does this FFT smoother work in C++/openFrameworks?

I'm doing some tutorials for openFrameworks (I'm kind of a noob when it comes to coding, but have a bit of experience so far with tutorials and learning what's going on over the past few years), and a major part of the code involves grabbing the sound spectrum of an audio sample and throwing the values into an array to control a float value. But I can't seem to wrap my head around what's going on here.
This is the relevant code (it's a VJ shaper that rotates and changes the size of shapes according to input from the sound spectrum):
header:
float * fftSmooth;
int bands;
cpp setup:
fftSmooth = new float[8192];
for (int i = 0; i < 8192; i++) {
    fftSmooth[i] = 0;
}
bands = 64;
cpp update:
float* value = ofSoundGetSpectrum(bands);
for (int i = 0; i < bands; i++) {
    fftSmooth[i] *= release;  // "release" is a float
    if (fftSmooth[i] < value[i]) {
        fftSmooth[i] = value[i];
    }
}
If anyone could walk me through the steps of what's going on, that would be great. I understand (sort of) that in the setup an array called "fftSmooth" is being created with 8192 floats in it, then filled with zeros in the for loop, after which the int "bands" is assigned a value of 64. Then in the update, another array called "value" is created with 64 floats in it by looking at "bands", which is also the number of bands passed to ofSoundGetSpectrum, which grabs the frequency levels from a sound file as it plays. I've looked at the openFrameworks reference page for the sound spectrum and didn't really get any more clues as to what it's doing in this context, and I have no idea what the for loops and if statements in the update section are doing either.
Not knowing what's going on really isn't going to impact whether I can actually use the code or not, but I feel like if I want to actually build on this code (grabbing different frequency ranges, etc.) I need to know what the for loops and if statements in the update are doing.
ofSoundGetSpectrum(...)
Gets a frequency spectrum sample, taking all current sound players into account.
Each band will be represented as a float between 0 and 1.
This appears to be taking an instantaneous FFT, and returning the "strength" of each of the frequency bands.
I assume the second half of the code is run in a loop. The first time through, it is just going to copy the current band strength into fftSmooth. In subsequent passes, the multiply by release is designed to reduce the value in fftSmooth by some percentage. Then any new band strength greater than the filtered one will overwrite the old value.
If you animate plots of fftSmooth, you should see each band jump up with the audio and then fall away smoothly, like a peak-hold spectrum display.
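Since the asker mentions wanting to grab different frequency ranges later, here's a minimal hedged sketch of how that could build on the smoothed array (the helper name is made up; fftSmooth and bands are from the question's code):

float rangeStrength(const float* fftSmooth, int lo, int hi) {
    // Average the smoothed bands in [lo, hi) into one strength value.
    float sum = 0.0f;
    for (int i = lo; i < hi; i++) {
        sum += fftSmooth[i];
    }
    return sum / float(hi - lo);
}

With bands = 64 as in the question, rangeStrength(fftSmooth, 0, 8) would give a bass-ish value from the lowest 8 bands, and rangeStrength(fftSmooth, 32, 64) the upper half of the spectrum.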

How to use buildOpticalFlowPyramid?

I'm using OpenCV 3.3.1. I want to do a semi-dense optical flow operation using cv::calcOpticalFlowPyrLK, but I've been getting some really noticeable slowdown whenever my ROI is pretty big (partly due to the fact that I am letting the user decide what the winSize should be, ranging from 10 to 100). Anyway, it seems like cv::buildOpticalFlowPyramid can mitigate the slowdown by building image pyramids? I'm sorta familiar with what image pyramids are, but in the context of this function I'm especially confused about what parameters I pass in, and how they impact my call to cv::calcOpticalFlowPyrLK. With that in mind, I now have this set of questions:
The output, according to the documentation, is an OutputArrayOfArrays, which I take it can be a vector of cv::Mat objects. If so, what do I pass in to cv::calcOpticalFlowPyrLK for prevImg and nextImg (assuming that I need to make image pyramids for both)?
According to the docs for cv::buildOpticalFlowPyramid, you need to pass in a winSize parameter in order to calculate required padding for pyramid levels. If so, do you pass in the same winSize value when you eventually call cv::calcOpticalFlowPyrLK?
What exactly are the arguments for pyrBorder and derivBorder doing?
Lastly, and apologies if it sounds newbish, but what is the purpose of this function? I always assumed that cv::calcOpticalFlowPyrLK internally builds the image pyramids. Is it just to speed up the optical flow operation?
I hope my questions were clear, I'm still very new to OpenCV, and computer vision, but this topic is very interesting.
Thank you for your time.
EDIT:
I used the function to see if my guess was correct; so far it has worked, but I've seen no noticeable speed-up. Below is how I used it:
// Building pyramids
int maxLvl = 3;
maxLvl = cv::buildOpticalFlowPyramid(imgPrev, imPyr1, cv::Size(searchSize, searchSize), maxLvl, true);
maxLvl = cv::buildOpticalFlowPyramid(tmpImg, imPyr2, cv::Size(searchSize, searchSize), maxLvl, true);
// LK optical flow call
cv::calcOpticalFlowPyrLK(imPyr1, imPyr2, currentPoints, nextPts, status, err,
                         cv::Size(searchSize, searchSize), maxLvl, termCrit, 0, 0.00001);
So now I'm wondering what's the purpose of preparing the image pyramids if calcOpticalFlowPyrLK does it internally?
So the point of your question is that you are trying to improve the speed of optical flow tracking by tuning your input parameters.
If you want the quick and dirty answer, here it is:
KLT (OpenCV's calcOpticalFlowPyrLK) defines a residual function, a sum over the image gradients of the points inside the search window.
Its main job is to find the displacement vector that minimizes that residual function.
So if you increase the search window size (winSize), finding that minimizing displacement becomes more expensive.
If you really want the details, please read the official paper, section 2.4:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.185.585&rep=rep1&type=pdf
I took it from the official documentation:
https://docs.opencv.org/2.4/modules/video/doc/motion_analysis_and_object_tracking.html#bouguet00
Hope that helps!
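As for the EDIT's question of why you would pre-build pyramids at all when calcOpticalFlowPyrLK can do it internally: a common reason (an assumption worth verifying, not something the answer above states) is pyramid reuse in a video loop. calcOpticalFlowPyrLK accepts pre-built pyramids in place of prevImg and nextImg, so each frame's pyramid can be built exactly once and then reused as the "previous" pyramid of the next iteration, instead of being rebuilt internally on every call. A minimal sketch, reusing the question's searchSize, maxLvl, and termCrit:

#include <opencv2/opencv.hpp>
#include <vector>

// Sketch of pyramid reuse across a video; the loop structure is illustrative.
void trackVideo(cv::VideoCapture& cap, std::vector<cv::Point2f>& points,
                int searchSize, int maxLvl, const cv::TermCriteria& termCrit)
{
    cv::Mat frame, gray;
    std::vector<cv::Mat> prevPyr, nextPyr;
    std::vector<cv::Point2f> nextPts;
    std::vector<uchar> status;
    std::vector<float> err;

    cap >> frame;  // assumes the capture is open and delivers frames
    cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
    // tryReuseInputImage = false so the pyramid owns its data and stays
    // valid after `gray` is overwritten by the next cvtColor call.
    cv::buildOpticalFlowPyramid(gray, prevPyr, cv::Size(searchSize, searchSize), maxLvl,
                                true, cv::BORDER_REFLECT_101, cv::BORDER_CONSTANT, false);

    while (cap.read(frame)) {
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        cv::buildOpticalFlowPyramid(gray, nextPyr, cv::Size(searchSize, searchSize), maxLvl,
                                    true, cv::BORDER_REFLECT_101, cv::BORDER_CONSTANT, false);

        // The pyramids go where the plain images would normally go.
        cv::calcOpticalFlowPyrLK(prevPyr, nextPyr, points, nextPts, status, err,
                                 cv::Size(searchSize, searchSize), maxLvl, termCrit);

        std::swap(prevPyr, nextPyr);  // this frame's pyramid is next iteration's "prev"
        points = nextPts;
    }
}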

Find coordinates in a vector c++

I'm creating a game in Qt in C++, and I store every coordinate of a specific size in a vector, like this:
std::vector<std::unique_ptr<Tile>> all_tiles = createWorld(bgTile);
for (auto& tile : all_tiles) {
    tiles.push_back(std::move(tile));
}
Each level also has some healthpacks, which are stored in a vector as well.
std::vector<std::unique_ptr<Enemy>> all_enemies = getEnemies(nrOfEnemies);
for (auto& healthPackUniquePtr : all_healthpacks) {
    std::shared_ptr<Tile> healthPackPtr{std::move(healthPackUniquePtr)};
    int x = healthPackPtr->getXPos();
    int y = healthPackPtr->getYPos();
    int newYpos = checkOverlapPos(healthPackPtr->getXPos(), healthPackPtr->getYPos());
    newYpos = checkOverlapEnemy(healthPackPtr->getXPos(), newYpos);
    auto healthPack = std::make_shared<HealthPack>(healthPackPtr->getXPos(), newYpos, healthPackPtr->getValue());
    healthPacks.push_back(healthPack);
}
But now I'm searching for the fastest way to check if my player position is at a healthpack position. So I have to search on two values in a vector: the x and y positions. Does anyone have a suggestion for how to do this?
Your 'real' question:
"I have to search on 2 values in a vector: x and y position. Anyone a suggestion how to do this?"
is a classic XY question, so I'm ignoring it!
"I'm searching for the fastest way to check if my player position is at a healthpack position."
Now we're talking. The approach you are using now won't scale well as the number of items increases, and you'll need to do something similar for every pair of object types you are interested in. Not good.
Thankfully this problem has been solved (and improved upon) for decades: you need to use a spatial partitioning scheme such as a BSP, BVH, quadtree/octree, etc. The beauty of these schemes is that a single data structure can hold the entire world, making arbitrary item intersection queries trivial (and fast).
You can implement a callback system: when the player moves to a tile, fire a callback on the tile the player is now standing on. Each tile knows its own state and can add health to the player, or do nothing if there is nothing on it. Using this technique you don't need any searching at all; a rough sketch is below.
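A minimal hedged sketch of that idea (Player and the method names are made up for illustration):

#include <memory>
#include <vector>

struct Player {
    int health = 100;
    void addHealth(int v) { health += v; }
};

class Tile {
public:
    virtual ~Tile() {}
    virtual void onPlayerEnter(Player&) {}  // default: nothing happens here
};

class HealthPackTile : public Tile {
public:
    explicit HealthPackTile(int value) : value_(value) {}
    void onPlayerEnter(Player& player) override {
        player.addHealth(value_);  // give the health, then the pack is spent
        value_ = 0;
    }
private:
    int value_;
};

// In the movement code, after the player's position is updated:
//   tiles[newTileIndex]->onPlayerEnter(player);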
If all_healthpacks has fewer than ~50 elements, I wouldn't bother improving anything; a simple loop is going to be fast enough.
Otherwise you can split the vector into sectors and check only the elements in the same sector as your player (and maybe a few around it if the player is close to an edge).
If you need something that's better for memory, you can use a k-d tree to index the healthpacks and search them quickly (O(log N) time). For the simplest constant-time option, see the hash map sketch below.
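Here is a hedged sketch of that constant-time option: index the packs by their (x, y) position in a hash map once, then each move is a single lookup. HealthPack and healthPacks are from the question's code; everything else is made up for illustration:

#include <cstdint>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Pack (x, y) into one 64-bit key so std::unordered_map can hash it.
struct PosHash {
    std::size_t operator()(const std::pair<int, int>& p) const {
        std::uint64_t key =
            (static_cast<std::uint64_t>(static_cast<std::uint32_t>(p.first)) << 32) |
            static_cast<std::uint32_t>(p.second);
        return std::hash<std::uint64_t>()(key);
    }
};

using PackIndex =
    std::unordered_map<std::pair<int, int>, std::shared_ptr<HealthPack>, PosHash>;

// Build the index once, after the healthpacks are created.
PackIndex buildPackIndex(const std::vector<std::shared_ptr<HealthPack>>& healthPacks) {
    PackIndex index;
    for (const auto& pack : healthPacks) {
        index[std::make_pair(pack->getXPos(), pack->getYPos())] = pack;
    }
    return index;
}

// Each player move is then a single O(1) lookup instead of a scan.
std::shared_ptr<HealthPack> packAt(const PackIndex& index, int x, int y) {
    auto it = index.find(std::make_pair(x, y));
    return it == index.end() ? nullptr : it->second;
}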

Excluding fields with a certain state from a 2D array; Game of Life

I have a 2D array (100 x 100 in this case) with some states bounded by borders, as shown in this picture:
http://tinypic.com/view.php?pic=mimiw5&s=5#.UkK8WIamiBI
Each cell has its own id (a color; for example, green is id=1) and an isBorder flag (marked white in the picture if true). What I am trying to do is extract each set of cells with one state that is bounded by borders (a grain), so I can work on each grain separately, which means I need to store all the indexes for each grain.
Anyone got an idea how to solve this?
Now that I've read your question again... the algorithm is essentially the same as filling a contiguous area with color. The most common way to do it is a BFS algorithm.
Simply start at some point you are sure lies inside the current area, then gradually move in every direction, marking traversed fields and putting them into a vector; a minimal sketch follows.
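A sketch of that BFS, under assumptions about the data layout (grid[y][x] holds the state id and border[y][x] the isBorder flag, both 100 x 100; adapt to your actual cell type):

#include <queue>
#include <utility>
#include <vector>

std::vector<std::pair<int, int>> collectGrain(
        const std::vector<std::vector<int>>& grid,
        const std::vector<std::vector<bool>>& border,
        int startX, int startY)
{
    const int h = static_cast<int>(grid.size());
    const int w = static_cast<int>(grid[0].size());
    const int id = grid[startY][startX];  // the grain's state id
    std::vector<std::vector<bool>> seen(h, std::vector<bool>(w, false));
    std::vector<std::pair<int, int>> grain;  // all indexes of this grain
    std::queue<std::pair<int, int>> todo;

    todo.push(std::make_pair(startX, startY));
    seen[startY][startX] = true;

    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};
    while (!todo.empty()) {
        std::pair<int, int> cell = todo.front();
        todo.pop();
        grain.push_back(cell);
        for (int d = 0; d < 4; d++) {
            int nx = cell.first + dx[d];
            int ny = cell.second + dy[d];
            // stay in bounds, don't cross borders, stay within one state id
            if (nx >= 0 && nx < w && ny >= 0 && ny < h && !seen[ny][nx] &&
                !border[ny][nx] && grid[ny][nx] == id) {
                seen[ny][nx] = true;
                todo.push(std::make_pair(nx, ny));
            }
        }
    }
    return grain;  // run once per grain, seeded from any unvisited inner cell
}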
// Edit: A bunch of other insights, made before I understood the question.
I can possibly imagine an algorithm working like this:
vector<Coord2D> result = data.filter(DataType::Green);
for (Coord2D c : result) {
    // do some operations on data[c]
}
The implementation of filter in a simple unoptimized way would be to scan the whole array and push_back matching fields to the vector.
If you need more complicated queries, lazily-evaluated proxy objects can work miracles:
data.filter(DataType::Green)
    .filter_having_neighbours(DataType::Red)
    .closest(/*first*/ 100, /*from*/ Coord2D(x, y))
    .apply([](DataField& field) {
        // processing here
    });

What is the best way to get the hash of a QPixmap?

I am developing a graphics application using Qt 4.5 and am putting images in the QPixmapCache. I wanted to optimise this so that if a user inserts an image which is already in the cache, it will use that.
Right now each image has a unique id, which helps optimise things on paint events. However, I realise that if I could calculate a hash of the image, I could look it up in the cache to see if it already exists and use that (it would help more for duplicate objects, of course).
My problem is: if it's a large QPixmap, will a hash calculation of it slow things down, or is there a quicker way?
A couple of comments on this:
If you're going to be generating a hash/cache key of a pixmap, then you may want to skip the QPixmapCache and use QCache directly. This would eliminate some overhead of using QStrings as keys (unless you also want to use the file path to locate the items)
As of Qt4.4, QPixmap has a "hash" value associated with it (see QPixmap::cacheKey() ). The documentation claims "Distinct QPixmap objects can only have the same cache key if they refer to the same contents." However, since Qt uses shared-data copying, this may only apply to copied pixmaps and not to two distinct pixmaps loaded from the same image. A bit of testing would tell you if it works, and if it does, it would let you easily get a hash value.
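That test only takes a few lines; this is a hedged guess at the outcome (two pixmaps loaded separately from the same file most likely get different keys, since they don't share data):

#include <QDebug>
#include <QPixmap>

// Run after QApplication is constructed (pixmaps need a GUI context).
QPixmap a(":/Image_Path1.png");
QPixmap b(":/Image_Path1.png");
qDebug() << (a.cacheKey() == b.cacheKey());  // expected: false (distinct data)
QPixmap c = a;                               // shared-data copy
qDebug() << (a.cacheKey() == c.cacheKey());  // expected: true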
If you really want a good, fairly quick cache with duplicate removal, you might want to look at your own data structure that sorts according to sizes, color depths, image types, and things like that. Then you would only need to hash the actual image data after you find an image of the same type with the same dimensions, bit depth, etc. Of course, if your users generally open a lot of images with all of those the same, it wouldn't help at all.
Performance: Don't forget about the benchmarking stuff Qt added in 4.5, which would let you compare your various hashing ideas and see which one runs the fastest. I haven't checked it out yet, but it looks pretty neat.
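One more option worth including in such a benchmark: hashing the raw pixel bytes with QCryptographicHash, which has been in Qt since 4.3. A minimal sketch, assuming you normalize the image format first so identical-looking images produce identical bytes (numBytes() is the Qt 4 name; later versions call it byteCount()):

#include <QByteArray>
#include <QCryptographicHash>
#include <QImage>
#include <QPixmap>

QByteArray pixmapMd5(const QPixmap& pix)
{
    // Normalize to one pixel format so equal images yield equal bytes.
    QImage image = pix.toImage().convertToFormat(QImage::Format_ARGB32);
    // Wrap the pixel buffer without copying it, then hash.
    QByteArray data = QByteArray::fromRawData(
        reinterpret_cast<const char*>(image.bits()), image.numBytes());
    return QCryptographicHash::hash(data, QCryptographicHash::Md5);
}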
Just in case anyone comes across this problem (and isn't too terribly experienced with hashing things, particularly something like an image), here's a VERY simple solution I used for hashing QPixmaps and entering them into a lookup table for later comparison:
qint32 HashClass::hashPixmap(QPixmap pix)
{
    QImage image = pix.toImage();
    qint32 hash = 0;
    for (int y = 0; y < image.height(); y++)
    {
        for (int x = 0; x < image.width(); x++)
        {
            QRgb pixel = image.pixel(x, y);
            // one-at-a-time style mixing of each pixel value into the hash
            hash += pixel;
            hash += (hash << 10);
            hash ^= (hash >> 6);
        }
    }
    return hash;
}
Here is the hashing function itself (you can have it hash into a qint64 if you want fewer collisions). As you can see, I convert the pixmap into a QImage and simply walk through its dimensions, performing a very simple one-at-a-time hash on each pixel, and return the final result. There are many ways to improve this implementation (see the other answers to this question), but this is the basic gist of what needs to be done.
The OP mentioned how he would use this hashing function to construct a lookup table for later comparing images. This would require a very simple lookup initialization function, something like this:
void HashClass::initializeImageLookupTable()
{
    imageTable.insert(hashPixmap(QPixmap(":/Image_Path1.png")), "ImageKey1");
    imageTable.insert(hashPixmap(QPixmap(":/Image_Path2.png")), "ImageKey2");
    imageTable.insert(hashPixmap(QPixmap(":/Image_Path3.png")), "ImageKey3");
    // Etc...
}
I'm using a QMap here called imageTable which would need to be declared in the class as such:
QMap<qint32, QString> imageTable;
Then, finally, when you want to compare an image against the images in your lookup table (i.e., "which image, out of the images I know it can be, is this particular image?"), you just call the hashing function on the image (which I'm assuming will also be a QPixmap) and the returned QString value will let you figure that out. Something like this would work:
void HashClass::compareImage(const QPixmap& pixmap)
{
    QString value = imageTable[hashPixmap(pixmap)];
    // Do whatever needs to be done with the QString value and pixmap after this point.
}
That's it. I hope this helps someone -- it would have saved me some time, although I was happy to have the experience of figuring it out.
Hash calculations should be pretty quick (somewhere above 100 MB/s if no disk I/O is involved), depending on which algorithm you use. Before hashing, you could also do some quick tests to sort out potential candidates; e.g., images must have the same width and height, otherwise it's useless to compare their hash values.
Of course, you should also keep the hash values of inserted images, so you only have to calculate a hash for new images and won't have to recalculate it for the cached ones.
If the images are different enough, it might even suffice to hash not the whole image but a smaller thumbnail or a part of it (e.g. the first and last 10 lines); this will be faster, but will lead to more collisions.
I'm assuming you're talking about actually calculating a hash over the data of the image rather than getting the unique id generated by Qt.
Depending on your images, you probably don't need to go over the whole image to calculate a hash. Maybe only read the first 10 pixels? The first scan line?
Maybe a pseudo-random selection of pixels from the entire image (with a known seed so that you can repeat the sequence)? Don't forget to add the size of the image to the hash as well. A sketch of the sampling idea is below.
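A hedged sketch of that sampling idea (the constants and helper name are made up; a small LCG stands in for the "known seed"):

#include <QImage>
#include <QPixmap>
#include <QtGlobal>

// Hash a fixed number of pseudo-randomly chosen pixels plus the image size.
// The seed is fixed, so the same pixel sequence is sampled on every call.
quint32 sampledHash(const QPixmap& pix, int samples = 256)
{
    QImage image = pix.toImage();
    if (image.isNull())
        return 0;

    quint32 hash = quint32(image.width()) * 31u + quint32(image.height());
    quint32 state = 12345u;  // the known seed
    for (int i = 0; i < samples; i++) {
        state = state * 1664525u + 1013904223u;  // LCG step for x
        int x = int(state % quint32(image.width()));
        state = state * 1664525u + 1013904223u;  // LCG step for y
        int y = int(state % quint32(image.height()));
        hash = hash * 31u + image.pixel(x, y);   // mix in the sampled pixel
    }
    return hash;
}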