Picture entropy calculation - c++

I've run into some nasty problem with my recorder. Some people are still using it with analog tuners, and analog tuners have a tendency to spit out 'snow' if there is no signal present.
The Problem is that when noise is fed into the encoder, it goes completely crazy and first consumes all CPU then ultimately freezes. Since main point od the recorder is to stay up and running no matter what, I have to figure out how to proceed with this, so encoder won't be exposed to the data it can't handle.
So, idea is to create 'entropy detector' - a simple and small routine that will go through the frame buffer data and calculate entropy index i.e. how the data in the picture is actually random.
Result from the routine would be a number, that will be 0 for completely back picture, and 1 for completely random picture - snow, that is.
Routine in itself should be forward scanning only, with few local variables that would fit into registers nicely.
I could use zlib or 7z api for such task, but I would really want to cook something on my own.
Any ideas?

PNG works this way (approximately): For each pixel, replace its value by the value that it had minus the value of the pixel left to it. Do this from right to left.
Then you can calculate the entropy (bits per character) by making a table of how often which value appears now, making relative values out of these absolute ones and adding the results of log2(n)*n for each element.
Oh, and you have to do this for each color channel (r, g, b) seperately.
For the result, take the average of the bits per character for the channels and divide it by 2^8 (assuming that you have 8 bit per color).

Related

Drawing audio spectrum with Bass library

How can I draw an spectrum for an given audio file with Bass library?
I mean the chart similar to what Audacity generates:
I know that I can get the FFT data for given time t (when I play the audio) with:
float fft[1024];
BASS_ChannelGetData(chan, fft, BASS_DATA_FFT2048); // get the FFT data
That way I get 1024 values in array for each time t. Am I right that the values in that array are signal amplitudes (dB)? If so, how the frequency (Hz) is associated with those values? By the index?
I am an programmer, but I am not experienced with audio processing at all. So I don't know what to do, with the data I have, to plot the needed spectrum.
I am working with C++ version, but examples in other languages are just fine (I can convert them).
From the documentation, that flag will cause the FFT magnitude to be computed, and from the sounds of it, it is the linear magnitude.
dB = 10 * log10(intensity);
dB = 20 * log10(pressure);
(I'm not sure whether audio file samples are a measurement of intensity or pressure. What's a microphone output linearly related to?)
Also, it indicates the length of the input and the length of the FFT match, but half the FFT (corresponding to negative frequencies) is discarded. Therefore the highest FFT frequency will be one-half the sampling frequency. This occurs at N/2. The docs actually say
For example, with a 2048 sample FFT, there will be 1024 floating-point values returned. If the BASS_DATA_FIXED flag is used, then the FFT values will be in 8.24 fixed-point form rather than floating-point. Each value, or "bin", ranges from 0 to 1 (can actually go higher if the sample data is floating-point and not clipped). The 1st bin contains the DC component, the 2nd contains the amplitude at 1/2048 of the channel's sample rate, followed by the amplitude at 2/2048, 3/2048, etc.
That seems pretty clear.

find the same area between 2 images

I want to merge 2 images. How can i remove the same area between 2 images?
Can you tell me an algorithm to solve this problem. Thanks.
Two image are screenshoot image. They have the same width and image 1 always above image 2.
When two images have the same width and there is no X-offset at the left side this shouldn't be too difficult.
You should create two vectors of integer and store the CRC of each pixel row in the corresponding vector element. After doing this for both pictures you find the CRC of the first line of the lower image in the first vector. This is the offset in the upper picture. Then you check that all following CRCs from both pictures are identical. If not, you have to look up the next occurrence of the initial CRC in the upper image again.
After checking that the CRCs between both pictures are identical when you apply the offset you can use the bitblit function of your graphics format and build the composite picture.
I haven't come across something similar before but I think the following might work:
Convert both to grey-scale.
Enhance the contrast, the grey box might become white for example and the text would become more black. (This is just to increase the confidence in the next step)
Apply some threshold, converting the pictures to black and white.
afterwards, you could find the similar areas (and thus the offset of overlap) with a good degree of confidence. To find the similar parts, you could harper's method (which is good but I don't know how reliable it would be without the said filtering), or you could apply some DSP operation(s) like convolution.
Hope that helps.
If your images are same width and image 1 is always on top. I don't see how that hard could it be..
Just store the bytes of the last line of image 1.
from the first line to the last of the image 2, make this test :
If the current line of image 2 is not equal to the last line of image 1 -> continue
else -> break the loop
you have to define a new byte container for your new image :
Just store all the lines of image 1 + all the lines of image 2 that start at (the found line + 1).
What would make you sweat here is finding the libraries to manipulate all these data structures. But after a few linkage and documentation digging, you should be able to easily implement that.

C++: How to interpret a byte array representation of an image?

I'm trying to work with this camera SDK, and let's say the camera has this function called CameraGetImageData(BYTE* data), which I assume takes in a byte array, modifies it with the image data, and then returns a status code based on success/failure. The SDK provides no documentation whatsoever (not even code comments) so I'm just guestimating here. Here's a code snippet on what I think works
BYTE* data = new BYTE[10000000]; // an array of an arbitrary large size, I'm not
// sure what the exact size needs to be so I
// made it large
CameraGetImageData(data);
// Do stuff here to process/output image data
I've run the code w/ breakpoints in Visual Studio and can confirm that the CameraGetImageData function does indeed modify the array. Now my question is, is there a standard way for cameras to output data? How should I start using this data and what does each byte represent? The camera captures in 8-bit color.
Take pictures of pure red, pure green and pure blue. See what comes out.
Also, I'd make the array 100 million, not 10 million if you've got the memory, at least initially. A 10 megapixel camera using 24 bits per pixel is going to use 30 million bytes, bigger than your array. If it does something crazy like store 16 bits per colour it could take up to 60 million or 80 million bytes.
You could fill this big array with data before passing it. For example fill it with '01234567' repeated. Then it's really obvious what bytes have been written and what bytes haven't, so you can work out the real size of what's returned.
I don't think there is a standard but you can try to identify which values are what by putting some solid color images in front of the camera. So all pixels would be approximately the same color. Having an idea of what color should be stored in each pixel you may understand how the color is represented in your array. I would go with black, white, reg, green, blue images.
But also consider finding a better SDK which has the documentation, because making just a big array is really bad design
You should check the documentation on your camera SDK, since there's no "standard" or "common" way for data output. It can be raw data, it can be RGB data, it can even be already compressed. If the camera vendor doesn't provide any information, you could try to find some libraries that handle most common formats, and try to pass the data you have to see what happens.
Without even knowing the type of the camera, this question is nearly impossible to answer.
If it is a scientific camera, chances are good that it adhers to the IEEE 1394 (aka IIDC or DCAM) standard. I have personally worked with such a camera made by Hamamatsu using this library to interface with the camera.
In my case the camera output was just raw data. The camera itself was monochrome and each pixel had a depth-resolution of 12 bit. Therefore, each pixel intensity was stored as 16-bit unsigned value in the result array. The size of the array was simply width * height * 2 bytes, where width and height are the image dimensions in pixels the factor 2 is for 16-bit per pixel. The width and height were known a-priori from the chosen camera mode.
If you have the dimensions of the result image, try to dump your byte array into a file and load the result either in Python or Matlab and just try to visualize the content. Another possibility is to load this raw file with an image editor such as ImageJ and hope to get anything out from it.
Good luck!
I hope this question's solution will helps you: https://stackoverflow.com/a/3340944/291372
Actually you've got an array of pixels (assume 1 byte per pixel if you camera captires in 8-bit). What you need - is just determine width and height. after that you can try to restore bitmap image from you byte array.

Filtering 1bpp images

I'm looking to filter a 1 bit per pixel image using a 3x3 filter: for each input pixel, the corresponding output pixel is set to 1 if the weighted sum of the pixels surrounding it (with weights determined by the filter) exceeds some threshold.
I was hoping that this would be more efficient than converting to 8 bpp and then filtering that, but I can't think of a good way to do it. A naive method is to keep track of nine pointers to bytes (three consecutive rows and also pointers to either side of the current byte in each row, for calculating the output for the first and last bits in these bytes) and for each input pixel compute
sum = filter[0] * (lastRowPtr & aMask > 0) + filter[1] * (lastRowPtr & bMask > 0) + ... + filter[8] * (nextRowPtr & hMask > 0),
with extra faff for bits at the edge of a byte. However, this is slow and seems really ugly. You're not gaining any parallelism from the fact that you've got eight pixels in each byte and instead are having to do tonnes of extra work masking things.
Are there any good sources for how to best do this sort of thing? A solution to this particular problem would be amazing, but I'd be happy being pointed to any examples of efficient image processing on 1bpp images in C/C++. I'd like to replace some more 8 bpp stuff with 1 bpp algorithms in future to avoid image conversions and copying, so any general resouces on this would be appreciated.
I found a number of years ago that unpacking the bits to bytes, doing the filter, then packing the bytes back to bits was faster than working with the bits directly. It seems counter-intuitive because it's 3 loops instead of 1, but the simplicity of each loop more than made up for it.
I can't guarantee that it's still the fastest; compilers and especially processors are prone to change. However simplifying each loop not only makes it easier to optimize, it makes it easier to read. That's got to be worth something.
A further advantage to unpacking to a separate buffer is that it gives you flexibility for what you do at the edges. By making the buffer 2 bytes larger than the input, you unpack starting at byte 1 then set byte 0 and n to whatever you like and the filtering loop doesn't have to worry about boundary conditions at all.
Look into separable filters. Among other things, they allow massive parallelism in the cases where they work.
For example, in your 3x3 sample-weight-and-filter case:
Sample 1x3 (horizontal) pixels into a buffer. This can be done in isolation for each pixel, so a 1024x1024 image can run 1024^2 simultaneous tasks, all of which perform 3 samples.
Sample 3x1 (vertical) pixels from the buffer. Again, this can be done on every pixel simultaneously.
Use the contents of the buffer to cull pixels from the original texture.
The advantage to this approach, mathematically, is that it cuts the number of sample operations from n^2 to 2n, although it requires a buffer of equal size to the source (if you're already performing a copy, that can be used as the buffer; you just can't modify the original source for step 2). In order to keep memory use at 2n, you can perform steps 2 and 3 together (this is a bit tricky and not entirely pleasant); if memory isn't an issue, you can spend 3n on two buffers (source, hblur, vblur).
Because each operation is working in complete isolation from an immutable source, you can perform the filter on every pixel simultaneously if you have enough cores. Or, in a more realistic scenario, you can take advantage of paging and caching to load and process a single column or row. This is convenient when working with odd strides, padding at the end of a row, etc. The second round of samples (vertical) may screw with your cache, but at the very worst, one round will be cache-friendly and you've cut processing from exponential to linear.
Now, I've yet to touch on the case of storing data in bits specifically. That does make things slightly more complicated, but not terribly much so. Assuming you can use a rolling window, something like:
d = s[x-1] + s[x] + s[x+1]
works. Interestingly, if you were to rotate the image 90 degrees during the output of step 1 (trivial, sample from (y,x) when reading), you can get away with loading at most two horizontally adjacent bytes for any sample, and only a single byte something like 75% of the time. This plays a little less friendly with cache during the read, but greatly simplifies the algorithm (enough that it may regain the loss).
Pseudo-code:
buffer source, dest, vbuf, hbuf;
for_each (y, x) // Loop over each row, then each column. Generally works better wrt paging
{
hbuf(x, y) = (source(y, x-1) + source(y, x) + source(y, x+1)) / 3 // swap x and y to spin 90 degrees
}
for_each (y, x)
{
vbuf(x, 1-y) = (hbuf(y, x-1) + hbuf(y, x) + hbuf(y, x+1)) / 3 // 1-y to reverse the 90 degree spin
}
for_each (y, x)
{
dest(x, y) = threshold(hbuf(x, y))
}
Accessing bits within the bytes (source(x, y) indicates access/sample) is relatively simple to do, but kind of a pain to write out here, so is left to the reader. The principle, particularly implemented in this fashion (with the 90 degree rotation), only requires 2 passes of n samples each, and always samples from immediately adjacent bits/bytes (never requiring you to calculate the position of the bit in the next row). All in all, it's massively faster and simpler than any alternative.
Rather than expanding the entire image to 1 bit/byte (or 8bpp, essentially, as you noted), you can simply expand the current window - read the first byte of the first row, shift and mask, then read out the three bits you need; do the same for the other two rows. Then, for the next window, you simply discard the left column and fetch one more bit from each row. The logic and code to do this right isn't as easy as simply expanding the entire image, but it'll take a lot less memory.
As a middle ground, you could just expand the three rows you're currently working on. Probably easier to code that way.

Using ImageMagick++ to modify image contrast/brightness

I'm trying to apply contrast and brightness to a bitmap in memory and I'm completely lost. Currently I'm trying to use Magick++ to do it, but if one of the other APIs would work better I'm all ears. I managed to find Magick::Image::sigmoidalContrast() for applying the contrast, but I can't figure out how to get it to work. I'm creating an image, passing it the buffer pointer, then calling that function, but it doesn't seem like it's changing anything so my first though was that it's making a copy and modifying that. Even so, I have no idea how to get the data out of the Magick::Image object.
Here's what I got so far.
Magick::Image image(fBitmapData->mGetTextureWidth(), fBitmapData->mGetTextureHeight(), "RGBA", MagickCore::CharPixel, pixels);
image.sigmoidalContrast(1, 20.0);
The documentation is useless and after searching I could only find hints that the first parameter is actually a boolean, even though it takes a size_t, that specifies whether to add or subtract the contrast, and the second value is something I have no idea what to pass so I'm just using 20.0 to test.
So does anyone know if this will work for contrast, and if not, then how do you apply contrast? And likewise I still have no idea how to apply brightness either and can't find any functions that look like they would work.
Figured it out; The function for contrast I was using was correct, and for brightness I ended up using image.modulate(brightness, 100.0, 100.0);. To get the data out of the image object you can grab the pixels of the entire image by doing
const MagickCore::PixelPacket * magickPixels = image.getConstPixels(0, 0, image.columns(), image.rows());
And then copy the magickPixels data back into the original pixels that were passed into the image constructor. An important thing to note is that the member MagickCore::PixelPacket::opacity is not what you would think it would be. If the pixel is completely transparent you'd think the value would be 0, right? Well for some reason ImageMagick is doing it opposite. So for full transparency the value would be 255. This means you need to do 255 - opacity to get the correct value.
Also be careful of the MAGICKCORE_QUANTUM_DEPTH that ImageMagick was compiled with, as this will change the values drastically. For my code MAGICKCORE_QUANTUM_DEPTH just happened to be defined as 16 so all of the values were a range of 0 to 65535, which I just fixed by doing realValue = magickValue >> 8 when copying the data back over since the texture data is unsigned char values.
Just for clarification on how to use these functions, since the documentation is horrible and completely wrong, the first parameter to signmoidalContrast() is actually a boolean, even though the type is a size_t, that specifies whether to increase the contrast (true) or reduce it (false), and the second is a range from 0.00001 to 20.0. I say 0.00001 because 0.0 is an invalid value so it just needs to be some decimal that is close to but not exactly 0.0.
For modulate() the documentation says that each value should be specified as 1.0 for no change, which is completely wrong. The values are actually a percentage so for no change you would specify 100.0.
I hope that helps someone because it took me all damn day to figure this stuff out.
According to the Imagemagick website - for the command line but may be the same?
-sigmoidal-contrast contrastxmid-point
increase the contrast without saturating highlights or shadows.
Increase the contrast of the image using a sigmoidal transfer function without saturating highlights or shadows. Contrast indicates how much to increase the contrast. For example, near 0 is none, 3 is typical and 20 is a lot. Note that exactly zero is invalid, but 0.0001 is negligibly different from no change in contrast. mid-point indicates where midtones fall in the resultant image (0 is white; 50% is middle-gray; 100% is black). By default the image contrast is increased, use +sigmoidal-contrast to decrease the contrast.
To achieve the equivalent of a sigmoidal brightness change, use -sigmoidal-contrast brightnessx0% to increase brightness and class="arg">+sigmoidal-contrast brightnessx0% to decrease brightness.
On the command line there is a new brightness contrast setting that may be in later versions of magic++?
-brightness-contrast brightness{xcontrast}{%}}
Adjust the brightness and/or contrast of the image.
Brightness and Contrast values apply changes to the input image. They are not absolute settings. A brightness or contrast value of zero means no change. The range of values is -100 to +100 on each. Positive values increase the brightness or contrast and negative values decrease the brightness or contrast. To control only contrast, set the brightness=0. To control only brightness, set contrast=0 or just leave it off.
You may also use -channel to control which channels to apply the brightness and/or contrast change. The default is to apply the same transformation to all channels.
Brightness and Contrast arguments are converted to offset and slope of a linear transform and applied using -function polynomial "slope,offset".
The slope varies from 0 at contrast=-100 to almost vertical at contrast=+100. For brightness=0 and contrast=-100, the result are totally midgray. For brightness=0 and contrast=+100, the result will approach but not quite reach a threshold at midgray; that is the linear transformation is a very steep vertical line at mid gray.
Negative slopes, i.e. negating the image, are not possible with this function. All achievable slopes are zero or positive.
The offset varies from -0.5 at brightness=-100 to 0 at brightness=0 to +0.5 at brightness=+100. Thus, when contrast=0 and brightness=100, the result is totally white. Similarly, when contrast=0 and brightness=-100, the result is totally black.
As the range of values for the arguments are -100 to +100, adding the '%' symbol is no different than leaving it off.
If magick++ is like Imagick it may be lagging a long way behind the Imagemagick options