quantization of dct image for steganography - compression

I hav a greyscale image. I did 8x8 blocks and computed each of their DCTs. I want to quantize the DCT coefficients and then replace their LSBs with my secret message bits. How exactly do I quantize the coefficients? Should I use the quantization matrix used by JPEG? How to determine the values of such a quantization matrix?

You will probably want to set the quality level to the highest (smallest values in the quantization matrix) so that the modified LSB of each coefficient perturbs the image data the least.
For encoding:
You will need access to the DCT values after quantization and before entropy coding. There you can modify the LSB's. You should probably only modify the non-zero coefficient values or you will make the compressed image file much larger and more distorted. This way, you will probably be able to encode 20-30 bits per DCT block.
For decoding:
You will need to do the reverse and get access to the DCT values immediately after the entropy decode and before the dequantization step.
To calculate the total number of bits available for your message, use the following example:
For a VGA sized image (640x480) which is encoded as 4:2:0 (subsampled color in both dimensions), you will have 40 x 30 = 1200 MCUs. Each MCU has 6 DCT blocks (4Y, 1Cr, 1Cb). This is a total of 7200 DCT blocks. If each block encodes an average of 25 coefficients (a reasonable quality level), then your message can be a total of 7200x25 = 180000 bits.

Related

Size of prototype mask and Mask produced in Yolact

I read the paper explaining Yolact and Yolact++. I'm confused with the mask size and prototype mask. There is an illustration of protonet and the output from protonet is of size 138 * 138 * 32. Is this the size of protomask? I have read in the paper saying that the algorithm produces an image sized mask. So Please clarify the size of the mask produced.
Take for example an input with the following size:
(H,W,C) = (512,512,3)
The protonet will give you the following output size (a.k.a proto-masks): (128,128,32) - where 32=Number of Protos. It is 1/4 of the input size.
The protos are being used for getting the mask by a linear combination of them, with the corresponding coefficients predicted by the prediction module.
Therefore you will have a mask, with the size (128,128). Then a crop is being done on this mask, the cropping is done according to the bbox prediction (after NMS).
The bbox values can be related as relative to the image size, therefore (0.5,0.5,1.,1.) which corresponds to (256.,256.,512.,512.) in the input image, are (64.,64.,128.,128.) in the mask created by the protos

Compression of an image

I have been calculating the uncompressed and compressed file sizes of an image. This for me has always resulted in the compressed image being smaller than the uncompressed image which I would expect. If an image contains a large number of different colours, then storing the palette takes up a significant amount of space, and more bits are also needed to store each code. However my question is, would it be possible the compression method could potentially result in a larger file than the uncompressed RGB image. What would the size (in pixels) of the smallest square RGB image, containing a total of k different colours, for which this compression method is still useful? So we want to find, for a given value of k, find the smallest integer number n for which an image of size n×n takes up less storage space after compression than the original RGB image.
Let's begin by making a small simplification -- the size of the encoded output depends on the number of pixels (the actual proportion of width vs. height doesn't really matter). Hence, let's generalize the problem to number of pixels N, from which we can always calculate n by taking a square root.
To further simplify the problem, we will also ignore the overhead of any image headers/metadata, such as width, height, size of the palette, etc. In practice, this would generally be some relatively small constant.
Problem Statement
Given that we have
N representing the number of pixels in an image
k representing the number of distinct colours in an image
24 bits per pixel RGB encoding
LRGB representing the length of a RGB image
LP representing the length of a palette image
our goal is to solve the following inequality
in terms of N.
Size of RGB Image
RGB image is just an array of N pixels, each pixel taking up a fixed number of bits given by the RGB encoding. Hence,
Size of Palette Image
Palette image consists of two parts: a palette, and the pixels.
A palette is an array of k colours, each colour taking up a fixed number of bits given by the RGB encoding. Therefore,
In this case, each pixel holds an index to a palette entry, rather than an actual RGB colour. The number of bits required to represent k values is
However, unless we can encode fractional bits (which I consider outside the scope of this question), we need to round this up. Therefore, the number of bits required to encode a palette index is
Since there are N such palette indices, the size of the pixel data is
and the total size of the palette image is
Solving the Inequality
And finally
In Python, we could express this in the following way:
import math
def limit_size(k):
return (k * 24.) / (24. - math.ceil(math.log(k, 2)))
def size_rgb(N):
return (N * 24.)
def size_pal(N, k):
return (N * math.ceil(math.log(k, 2))) + (k * 24.)
In general no, but your question is not precise.
If we compress normal files, they could be larger. E.g. if you compress a random generated sequence of bytes, there is not much to compress, and so you get the header of compression program, which tell which compression method is used, and some versioning. This will enlarge the file, and ev. some escaping. Good compression program will see that compression will not shrink the size, and so they should just not compress, and tell in the header that it is a flat file. Possibly this is done by region of program.
But your question is about images. Compression is done inside the file, and often not all file, but just the image bits. In this case program will see that there is no need to compress, and so they would keep the file uncompressed. But because the image headers are always present, this change only a flag, and so no increase of size.
But this could depends also on file format. You wrote about "palette", but this is not much used nowadays: compression is done finding similar pattern on file. But again: this depends on the image format. If you look in Wikipedia, for particular file format, you may see a table with headers parameters (e.g. bit depth or number of colours (palette), definitions of colours, and methods used to compress).
Then, for palette like image, the answer of Dan Mašek (https://stackoverflow.com/a/58683948/2758823) has some nice mathematical explanation, but one should not forget that compression is much heuristic and test of real examples: real images have patterns.

ITU-R 2k filter implementation

I have an array coming from a digitizer. I do an fft on it and then I calculate the frequency bins and apply a 20kHz low pass filter. The next step would be to apply an ITU-R 2k filter on this array and the filter behaves like the curve in the picture. I know I am supposed to do a multiplication one by one of the samples but I am not sure how to start with it. I know the 0 dB point is at 2 kHz and the maximum of 6 dB is located at 7 kHz. The implementation has to done in C++.
itu-r 468 filter behavior
An LTI filter like this is a straightforward multiplication in the frequency domain. Put the filter coefficients in an array of the same length, multiply the two: std::transform(std::begin(fftbins), std::end(fftbins), std::begin(filtercoeff), std::multiplies<std::complex<double>>()); and perform the IFFT.

Length of FFT and IFFT

I have some signals which I add up to a larger signal, where each signal is located in a different frequency region.
Now, I perform the FFT operation on the big signal with FFTW and cut the concrete FFT bins (where the signals are located) out.
For example: The big signal is FFT transformed with 1024 points,
the sample rate of the signal is fs=200000.
I calculate the concrete bin positions for given start and stop frequencies in the following way:
tIndex.iStartPos = (int64_t) ((tFreqs.i64fstart) / (mSampleRate / uFFTLen));
and e.g. I get for the first signal to be cut out 16 bins.
Now I do the IFFT transformation again with FFTW and get the 16 complex values back (because I reserved the vector for 16 bins).
But when I compare the extracted signal with the original small signal in MATLAB, then I can see that the original signal (is a wav-File) has xxxxx data and my signal (which I saved as raw binary file) has only 16 complex values.
So how do I obtain the length of the IFFT operation to be correctly transformed? What is wrong here?
EDIT
The logic itself is split over 3 programs, each line is in a multithreaded environment. For that reason I post here some pseudo-code:
ReadWavFile(); //returns the signal data and the RIFF/FMT header information
CalculateFFT_using_CUFFTW(); //calculates FFT with user given parameters, like FFT length, polyphase factor, and applies polyphased window to reduce leakage effect
GetFFTData(); //copy/get FFT data from CUDA device
SendDataToSignalDetector(); //detects signals and returns center frequency and bandwith for each sigal
Freq2Index(); // calculates positions with the returned data from the signal detector
CutConcreteBins(position);
AddPaddingZeroToConcreteBins(); // adds zeros till next power of 2
ApplyPolyphaseAndWindow(); //appends the signal itself polyphase-factor times and applies polyphased window
PerformIFFT_using_FFTW();
NormalizeFFTData();
Save2BinaryFile();
-->Then analyse data in MATLAB (is at the moment in work).
If you have a real signal consisting of 1024 samples, the contribution from the 16 frequency bins of interest could be obtained by multiplying the frequency spectrum by a rectangular window then taking the IFFT. This essentially amounts to:
filling a buffer with zeros before and after the frequency bins of interest
copying the frequency bins of interest at the same locations in that buffer
if using a full-spectrum representation (if you are using fftw_plan_dft_1d(..., FFTW_BACKWARD,... for the inverse transform), computing the Hermitian symmetry for the upper half of the spectrum (or simply use a half-spectrum representation and perform the inverse transform through fftw_plan_dft_c2r_1d).
That said, you would get a better frequency decomposition by using specially designed filters instead of just using a rectangular window in the frequency domain.
The output length of the FT is equal to the input length. I don't know how you got to 16 bins; the FT of 1024 inputs is 1024 bins. Now for a real input (not complex) the 1024 bins will be mirrorwise identical around 512/513, so your FFT library may return only the lower 512 bins for a real input. Still, that's more than 16 bins.
You'll probably need to fill all 1024 bins when doing the IFFT, as it generally doesn't assume that its output will become a real signal. But that's just a matter of mirroring the lower 512 bins then.

Drawing audio spectrum with Bass library

How can I draw an spectrum for an given audio file with Bass library?
I mean the chart similar to what Audacity generates:
I know that I can get the FFT data for given time t (when I play the audio) with:
float fft[1024];
BASS_ChannelGetData(chan, fft, BASS_DATA_FFT2048); // get the FFT data
That way I get 1024 values in array for each time t. Am I right that the values in that array are signal amplitudes (dB)? If so, how the frequency (Hz) is associated with those values? By the index?
I am an programmer, but I am not experienced with audio processing at all. So I don't know what to do, with the data I have, to plot the needed spectrum.
I am working with C++ version, but examples in other languages are just fine (I can convert them).
From the documentation, that flag will cause the FFT magnitude to be computed, and from the sounds of it, it is the linear magnitude.
dB = 10 * log10(intensity);
dB = 20 * log10(pressure);
(I'm not sure whether audio file samples are a measurement of intensity or pressure. What's a microphone output linearly related to?)
Also, it indicates the length of the input and the length of the FFT match, but half the FFT (corresponding to negative frequencies) is discarded. Therefore the highest FFT frequency will be one-half the sampling frequency. This occurs at N/2. The docs actually say
For example, with a 2048 sample FFT, there will be 1024 floating-point values returned. If the BASS_DATA_FIXED flag is used, then the FFT values will be in 8.24 fixed-point form rather than floating-point. Each value, or "bin", ranges from 0 to 1 (can actually go higher if the sample data is floating-point and not clipped). The 1st bin contains the DC component, the 2nd contains the amplitude at 1/2048 of the channel's sample rate, followed by the amplitude at 2/2048, 3/2048, etc.
That seems pretty clear.