I have a video source that produces many streams for different devices (HD televisions, tablets, smartphones, etc.), each of which has to be checked against the others for similarity. Each video stream releases 50 images per second, one image every 20 milliseconds.
Let's take, for instance, img1 coming from stream1 at time ts1=1, img2 coming from stream2 at ts2=1, and img1.1 taken from stream1 at ts=2 (20 milliseconds later than ts=1). The comparison results should look something like this:
compare(img1, img1) = 1 (same image, same size)
compare(img1, img2) = 0.9 (same image, different size)
compare(img1, img1.1) = 0.8 (different image, same size)
Ideally this should be done in real time, i.e. within 20 milliseconds. The goal is to understand whether the streams are out of synchronization. I already implemented some comparison methods (none of them works for this case yet):
1) histogram (SSE and OpenCV CUDA); result: compare(img1, img2) ~= compare(img1, img1.1)
2) PSNR (SSE and OpenCV CUDA); result: compare(img1, img2) < compare(img1, img1.1)
3) SSIM (SSE and OpenCV CUDA); same result as PSNR
Maybe I get bad results because of the resize interpolation method?
Is it possible to build a comparison method that fulfills my requirements? Any ideas?
I'm afraid you're running into a Real Problem (TM). This is not a trivial let's-give-it-to-the-intern problem.
The main challenge is that you can't do a brute-force comparison. HD images are 3 MB or more, and you're talking about O(N*M) comparisons (across time and across streams).
What you essentially need is a fingerprint that's robust against resizing but time-variant. As you didn't realize that (the histogram idea, for instance, is quite time-stable), you didn't include the necessary information in this question.
So this isn't a C++ question, really. You need to understand your inputs.
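To give a flavor of what such a fingerprint might look like: a difference hash over a tiny thumbnail (a sketch using OpenCV; the 9x8 thumbnail size and BGR input are assumptions, not a prescription):

#include <bitset>
#include <cstdint>
#include <opencv2/opencv.hpp>

// Shrink the frame to a tiny grayscale thumbnail so the per-device
// resolution drops out, then record whether each pixel is brighter
// than its right-hand neighbour. The same content at any size gives
// a nearly identical 64-bit hash; the frame 20 ms later gives a
// different one.
uint64_t dhash(const cv::Mat& frame) {
    cv::Mat gray, tiny;
    cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
    cv::resize(gray, tiny, cv::Size(9, 8), 0, 0, cv::INTER_AREA);
    uint64_t hash = 0;
    for (int r = 0; r < 8; ++r)
        for (int c = 0; c < 8; ++c)
            hash = (hash << 1) | (tiny.at<uchar>(r, c) > tiny.at<uchar>(r, c + 1));
    return hash;
}

// Similarity in [0,1]: 1 means identical hashes, 0 means all 64 bits differ.
double compare(uint64_t a, uint64_t b) {
    return 1.0 - std::bitset<64>(a ^ b).count() / 64.0;
}

Hashing each frame once turns every subsequent comparison into a 64-bit XOR plus popcount, which sidesteps the O(N*M) brute force and fits easily into the 20 ms budget.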
I'm looking for an audio or image compression algorithm that can compress a torrent of 16-bit samples
by a fairly predictable amount (2-3x)
at very high speed (say, 60 cycles per sample at most: >100MB/s)
with lossiness being acceptable but, of course, undesirable
My data has characteristics of both images and audio (2-dimensional, correlated in both dimensions, and audio-like in one dimension), so algorithms for audio or images might both be appropriate.
An obvious thing to try would be this one-dimensional algorithm:
break up the data into segments of 64 samples
measure the range of values among those samples (as an example, the samples might be between 3101 and 9779 in one segment, a difference of 6678)
use 2 to 4 additional bytes to encode the range
linearly downsample each 16-bit sample to 8 bits in that segment.
For example, I could store 3101 in 16 bits, store a scaling factor ceil(6678/256) = 27 in 8 bits, then convert each 16-bit sample to 8 bits as s8 = (s16 - base) / scale, where base = 3101 + (27 >> 1) and scale = 27, with the obvious decompression "algorithm" of s16 = s8 * 27 + 3101. Compression ratio: 128/67 = 1.91.
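In code, I imagine the scheme looking roughly like this (an untested sketch; I widen the scale field to 16 bits so packed values always fit a byte, which makes the header 4 bytes, still inside the 2-4 byte budget, for 128 bytes in / 68 bytes out, about 1.88x):

#include <algorithm>
#include <cstdint>

struct Segment {
    uint16_t base;       // minimum sample of the segment
    uint16_t scale;      // (range >> 8) + 1, so packed values fit a byte
    uint8_t  packed[64]; // downsampled payload
};

Segment compress64(const uint16_t s[64]) {
    Segment seg;
    const uint16_t lo = *std::min_element(s, s + 64);
    const uint16_t hi = *std::max_element(s, s + 64);
    seg.base  = lo;
    seg.scale = static_cast<uint16_t>(((hi - lo) >> 8) + 1);
    for (int i = 0; i < 64; ++i)
        seg.packed[i] = static_cast<uint8_t>((s[i] - lo) / seg.scale);
    return seg;
}

uint16_t decompress(const Segment& seg, int i) {
    // Reconstruct at the middle of each quantization bucket.
    uint32_t v = seg.packed[i] * seg.scale + seg.base + (seg.scale >> 1);
    return static_cast<uint16_t>(std::min<uint32_t>(v, 65535));
}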
I've played with some ideas to avoid the division operation, but surely someone has by now invented a superfast algorithm that preserves fidelity better than this one?
Note: this page says that FLAC compresses at 22 million samples per second (44 MB/s) at -q6, which is pretty darn good (assuming its implementation is still single-threaded), if not quite enough for my application. Another page reports FLAC performance (40 MB/s on a 3.4 GHz i3-3240 at -q5) similar to three other codecs, depending on quality level.
Take a look at the PNG filters for examples of how to tease out your correlations. The most obvious filter is "sub", which simply subtracts successive samples. The differences should be more clustered around zero. You can then run that through a fast compressor like lz4. Other filter choices may result in even better clustering around zero, if they can find advantage in the correlations in your other dimension.
For lossy compression, you can decimate the differences before compressing them, dropping a few low bits until you get the compression you want, and still retain the character of the data that you would like to preserve.
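To illustrate (an untested sketch, not tuned code): the filter plus decimation step could look like this, with drop_bits = 0 giving the lossless variant:

#include <cstddef>
#include <cstdint>

// "Sub" filter plus lossy decimation: replace each sample with the
// difference from its predecessor (deltas cluster around zero for
// correlated data), then drop a few low bits. Hand the result to a
// fast byte compressor (lz4, zstd, ...). To decode: shift left, then
// take a running sum.
void sub_filter(const int16_t* in, int16_t* out, size_t n, int drop_bits) {
    int16_t prev = 0;
    for (size_t i = 0; i < n; ++i) {
        int16_t delta = static_cast<int16_t>(in[i] - prev);  // wraps mod 2^16
        prev = in[i];
        out[i] = static_cast<int16_t>(delta >> drop_bits);   // the lossy step
    }
}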
I'm looking for some image compression operations, preferably simple in nature, that provide moderate compression ratios while preserving the edges in the images.
Please note that algorithms like JPEG which pack multiple operations are not applicable (unfortunately).
If you're using numpy, I suggest you take a look at the scipy.misc.imsave method
https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.misc.imsave.html
You can easily store your data as PNG without any loss and with compression ratios in the ranges you mentioned in your comment, e.g.:
import numpy as np
from scipy.misc import imsave

# Build a simple RGB gradient test image.
rgb = np.zeros((255, 255, 3), dtype=np.uint8)
rgb[..., 0] = np.arange(255)        # red ramps up across each row
rgb[..., 1] = 55                    # constant green
rgb[..., 2] = 255 - np.arange(255)  # blue ramps down

imsave('/tmp/rgb_gradient.png', rgb)
Edit after comment 1:
It is really difficult to answer this question because of the lack of specifics.
Retaining a compressed version of the image in memory will certainly slow down your processing, as you will either need to decode and encode relevant parts of the image in each operation, or you'll need to use very specific algorithms that allow you to access and modify pixel values in the compressed domain (e.g., http://ieeexplore.ieee.org/document/232097/).
Now, to answer your question: the simplest way I can think of is to use Huffman coding (https://www.geeksforgeeks.org/greedy-algorithms-set-3-huffman-coding/) and store the codewords in memory. You will probably need to encode groups of pixels together so that each byte of codewords covers more than one pixel (otherwise you could hardly get any real compression). Failing that, you'd need to find a way to efficiently pack small codewords (say, 2 or 3 bits) together, which will certainly hinder your ability to read and write individual pixel values.
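As a rough, untested sketch of the code-building step (the symbols here are assumed to be pixel groups, e.g. pairs of 8-bit pixels, so freq can have up to 65536 entries; emitting the packed bitstream is omitted):

#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

struct Node {
    uint64_t freq;
    int symbol;              // >= 0 for leaves, -1 for internal nodes
    int left = -1, right = -1;
};

// Returns the Huffman code length (in bits) assigned to each symbol.
std::vector<int> huffman_lengths(const std::vector<uint64_t>& freq) {
    std::vector<Node> pool;
    using Item = std::pair<uint64_t, int>;  // (frequency, index into pool)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    for (int s = 0; s < (int)freq.size(); ++s)
        if (freq[s]) {
            pool.push_back({freq[s], s});
            pq.push({freq[s], (int)pool.size() - 1});
        }
    while (pq.size() > 1) {  // repeatedly merge the two rarest subtrees
        auto [fa, a] = pq.top(); pq.pop();
        auto [fb, b] = pq.top(); pq.pop();
        pool.push_back({fa + fb, -1, a, b});
        pq.push({fa + fb, (int)pool.size() - 1});
    }
    std::vector<int> len(freq.size(), 0);
    std::function<void(int, int)> walk = [&](int n, int depth) {
        if (pool[n].symbol >= 0) { len[pool[n].symbol] = depth; return; }
        walk(pool[n].left, depth + 1);
        walk(pool[n].right, depth + 1);
    };
    if (!pool.empty()) walk((int)pool.size() - 1, 0);
    return len;
}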
How can I draw a spectrum for a given audio file with the BASS library?
I mean a chart similar to the one Audacity generates.
I know that I can get the FFT data for a given time t (while I play the audio) with:
float fft[1024];
BASS_ChannelGetData(chan, fft, BASS_DATA_FFT2048); // get the FFT data
That way I get 1024 values in the array for each time t. Am I right that the values in that array are signal amplitudes (dB)? If so, how is the frequency (Hz) associated with those values? By the index?
I am a programmer, but I am not experienced with audio processing at all, so I don't know what to do with the data I have in order to plot the needed spectrum.
I am working with C++ version, but examples in other languages are just fine (I can convert them).
From the documentation, that flag will cause the FFT magnitude to be computed, and from the sounds of it, it is the linear magnitude.
dB = 10 * log10(intensity);
dB = 20 * log10(pressure);
(I'm not sure whether audio file samples are a measurement of intensity or pressure. What's a microphone output linearly related to?)
Also, it indicates that the length of the input and the length of the FFT match, but that half of the FFT (the half corresponding to negative frequencies) is discarded. The highest FFT frequency is therefore one half of the sampling frequency, and it occurs at bin N/2. The docs actually say:
For example, with a 2048 sample FFT, there will be 1024 floating-point values returned. If the BASS_DATA_FIXED flag is used, then the FFT values will be in 8.24 fixed-point form rather than floating-point. Each value, or "bin", ranges from 0 to 1 (can actually go higher if the sample data is floating-point and not clipped). The 1st bin contains the DC component, the 2nd contains the amplitude at 1/2048 of the channel's sample rate, followed by the amplitude at 2/2048, 3/2048, etc.
That seems pretty clear.
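Putting that together, here is a minimal sketch of producing one column of the spectrum as (frequency, dB) pairs; the 44100 Hz sample rate is an assumption, query the real one with BASS_ChannelGetInfo:

#include <cmath>
#include <cstdio>
#include "bass.h"

void print_spectrum_column(DWORD chan) {
    float fft[1024];
    BASS_ChannelGetData(chan, fft, BASS_DATA_FFT2048);  // linear magnitudes
    const float sampleRate = 44100.0f;                  // assumed rate
    for (int bin = 0; bin < 1024; ++bin) {
        float hz = bin * sampleRate / 2048.0f;              // index -> Hz
        float dB = 20.0f * std::log10(fft[bin] + 1e-12f);   // magnitude -> dB
        printf("%8.1f Hz  %7.1f dB\n", hz, dB);
    }
}

Repeating this for successive time positions, mapping dB to a color, and drawing each result as one vertical line of pixels gives you the Audacity-style spectrogram.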
My application is C++ and OpenCV based; it needs to detect an object in an image by threshold filtering. For performance reasons I divided the image into small strips, so that I only scan the areas I need to. The image is 2400x1800 pixels and the strips are 1000x50; the image color space is HSV. As the desired object can be one of a few colors (for example 8), I run the filter 8 times per strip, so the application runs the filter a few tens of times in total.
The application is time critical.
For most of the runs, the strip filter takes <<1 millisecond.
The problem: every few filter runs (anywhere between 10 and 40, depending on the strip size), one run takes 15 milliseconds (always the same 15 milliseconds)!
The total run, which should take 1-2 milliseconds, therefore takes between 50 and 100 milliseconds, depending on how many 15-millisecond runs occurred.
The heart of the code, which accesses the Mat and is where the time is lost, looks like this:
for (int i = 0; i < img_hsv.cols; i++) {        // columns
    for (int j = 0; j < img_hsv.rows; j++) {    // rows
        p1i = img_hsv.at<uchar>(j, i*3 + 0);    // H
        p2i = img_hsv.at<uchar>(j, i*3 + 1);    // S
        p3i = img_hsv.at<uchar>(j, i*3 + 2);    // V
    }
}
Again, the rate at which this happens increases with the strip size. I assume it has something to do with how the PC's memory resources are accessed. I already tried changing the page size and declaring the code a critical section, with no success.
The application runs on Win32, XP or 7.
Appreciate your help.
Many thanks,
HBR.
It is normally not necessary to access pixels individually for filtering operations. You left out the details of your algorithm, but maybe you can implement it with an OpenCV function like threshold and its relatives, which operate on the whole image. These functions are optimized for memory access, so you wouldn't have to spend time tracking down timing issues like this.
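To illustrate the idea (a sketch; the strip geometry and HSV bounds are placeholders): one strip can be thresholded against one target color in a single call.

#include <opencv2/opencv.hpp>

// Threshold one 1000x50 strip of the HSV image against one target
// color in a single cv::inRange call, instead of reading H, S and V
// pixel by pixel.
cv::Mat filter_strip(const cv::Mat& img_hsv, int strip_top) {
    cv::Mat strip = img_hsv(cv::Rect(0, strip_top, 1000, 50)); // view, no copy
    cv::Scalar lower(30, 80, 80), upper(50, 255, 255);         // example bounds
    cv::Mat mask;
    cv::inRange(strip, lower, upper, mask);  // mask = 255 where in range
    return mask;
}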
As an educational exercise I'm writing an application that can average a bunch of images. This is often used in astrophotography to reduce noise.
The library I'm using is Magick++, and I've succeeded in actually writing the application. But, unfortunately, it's slow. This is the code I'm using:
for (row = 0; row < rows; row++)
{
    for (column = 0; column < columns; column++)
    {
        red.clear(); blue.clear(); green.clear();
        for (i = 0; i < 10; i++)  // all 10 images
        {
            ColorRGB rgb(image[i].pixelColor(column, row));
            red.push_back(rgb.red());
            green.push_back(rgb.green());
            blue.push_back(rgb.blue());
        }
        redVal = avg(red);
        greenVal = avg(green);
        blueVal = avg(blue);
        redVal *= MaxRGB; greenVal *= MaxRGB; blueVal *= MaxRGB;
        Color newRGB(redVal, greenVal, blueVal);
        stackedImage.pixelColor(column, row, newRGB);
    }
}
The code averages 10 images by going through each pixel and pushing each channel's intensity into a vector of doubles. The function avg then takes the vector as a parameter and returns its average. This average is then used at the corresponding pixel in stackedImage, which is the resulting image. It works just fine, but as I mentioned, I'm not happy with the speed: it takes 2 minutes and 30 seconds on a Core i5 machine. The images are 8-megapixel, 16-bit TIFFs. I understand that it's a lot of data, but I have seen it done faster in other applications.
Is it my loop that's slow, or is pixelColor(x,y) a slow way to access pixels in an image? Is there a faster way?
Why use vectors/arrays at all?
Why not
double red = 0.0, blue = 0.0, green = 0.0;
for (i = 0; i < 10; i++)  // all 10 images
{
    ColorRGB rgb(image[i].pixelColor(column, row));
    red += rgb.red();
    blue += rgb.blue();
    green += rgb.green();
}
red /= 10;
blue /= 10;
green /= 10;
This avoids 36 function calls on vector objects per pixel.
And you may get even better performance by using a PixelCache of the whole image instead of the original Image objects. See the "Low-Level Image Pixel Access" section of the online Magick++ documentation for Image
Then the inner loop becomes
// assuming cache[i] holds image[i]'s PixelPacket array, fetched once
PixelPacket* pix = cache[i] + row*columns + column;
red += pix->red;
blue += pix->blue;
green += pix->green;
Now you have also removed 10 calls to pixelColor, 10 ColorRGB constructions, and 30 accessor calls per pixel.
Note: this is all theory; I haven't tested any of it.
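For concreteness, an end-to-end sketch of the cached approach (equally untested; it assumes the ImageMagick 6 PixelPacket API and that all images have the same dimensions):

#include <Magick++.h>

void stack(Magick::Image image[10], Magick::Image& stackedImage,
           size_t columns, size_t rows) {
    // Fetch each image's pixel cache once, up front.
    const Magick::PixelPacket* cache[10];
    for (int i = 0; i < 10; ++i)
        cache[i] = image[i].getConstPixels(0, 0, columns, rows);

    stackedImage.modifyImage();
    Magick::PixelPacket* out = stackedImage.getPixels(0, 0, columns, rows);

    for (size_t p = 0; p < columns * rows; ++p) {
        unsigned long red = 0, green = 0, blue = 0;
        for (int i = 0; i < 10; ++i) {  // sum raw Quantum values
            red   += cache[i][p].red;
            green += cache[i][p].green;
            blue  += cache[i][p].blue;
        }
        out[p].red   = red / 10;
        out[p].green = green / 10;
        out[p].blue  = blue / 10;
    }
    stackedImage.syncPixels();  // push the modified cache back to the image
}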
Comments:
Why do you use vectors for red, blue and green? push_back can trigger reallocations and become the bottleneck of the processing. You could instead allocate three arrays of 10 values just once.
Couldn't you declare rgb outside of the loops, to spare the stack unnecessary constructions and destructions?
Doesn't Magick++ have a way to average images?
Just in case anyone else wants to average images to reduce noise, and doesn't feel like too much "educational exercise" ;-)
ImageMagick can do averaging of a sequence of images like this:
convert image1.tif image2.tif ... image32.tif -evaluate-sequence mean result.tif
You can also do median filtering and others by changing the word mean in the above command to whatever you want, e.g.:
convert image1.tif image2.tif ... image32.tif -evaluate-sequence median result.tif
You can get a list of the available operations with:
identify -list evaluate
Output
Abs
Add
AddModulus
And
Cos
Cosine
Divide
Exp
Exponential
GaussianNoise
ImpulseNoise
LaplacianNoise
LeftShift
Log
Max
Mean
Median
Min
MultiplicativeNoise
Multiply
Or
PoissonNoise
Pow
RightShift
RMS
RootMeanSquare
Set
Sin
Sine
Subtract
Sum
Threshold
ThresholdBlack
ThresholdWhite
UniformNoise
Xor