Video codec: dealing with delta values - compression

Assume I have a reference (key) frame I that I have encoded. Now I want to inter-predict the next frame, the delta frame P. To do that I calculate motion vectors and find the diff between the compensated P and I. However, this diff can be quite large in some cases. Given that my input color values are in the range [0, 255], the diff can be in the [-255, 255] range, which is one more bit of information. How is this usually approached? Is this range scaled down to [-127, 127] or something like that? Or do we just compress the higher-precision delta data?
EDIT:
Okay, so I did a few experiments:
a) compressing ref frame - resulting compression ratio x26
Then I took two similar frames and tested and got this:
b) before converting to luma-chroma, I take both frames in RGB, compute their xor and run luma-chroma and DCT - compression ratio x39 - severe artifacts show up
c) I take the difference frame1-frame2 and output clamp(frame1-frame2, -0.5, 0.5) + 0.5 - compression ratio x76
d) I compute DCT of both frame1 and frame2 and compute xor on that - compression ratio x72
So I don't see how b) would work - XORing the input RGB doesn't make much sense to me. I think the most efficient is c), as it also allows blurring frame1 and frame2 before taking the diff and thus greatly enhances compression.
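For reference, a minimal sketch of option c) in Python/NumPy, assuming the frames are already floats in [0, 1] as the clamp range implies (the names frame1/frame2 come from the experiment above):

    import numpy as np

    def residual_c(frame1, frame2):
        """Option c): clamp the frame difference to [-0.5, 0.5] and re-center to [0, 1]."""
        diff = frame1.astype(np.float32) - frame2.astype(np.float32)
        return np.clip(diff, -0.5, 0.5) + 0.5

    def reconstruct(frame2, residual):
        """Invert the mapping (exact only where the clamp did not saturate)."""
        return frame2 + (residual - 0.5)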

Related

Image sensor linear matrix coefficients (color reproduction), how are they applied?

I have some raw images to debayer then apply colour corrections/transforms to. I use OpenCV and C++, and for the image sensor used the linear matrix coefficients are:
1.32 -0.46 0.14
-0.36 1.25 0.11
0.08 -1.96 1.88
I am not sure how to apply these to the image. It's not clear to me what I am supposed to do with them and why.
Can anyone explain what these colour reproduction or colour matrix values are, and how to use them to process an image?
Thank you!
Your question is not clear because it seems you also don't know what to do.
"what I am supposed to do with them"
The first thing that comes to my mind: you could convolve the image with that matrix using filter2D. According to the filter2D documentation:
Convolves an image with the kernel.
The function applies an arbitrary linear filter to an image. In-place
operation is supported. When the aperture is partially outside the
image, the function interpolates outlier pixel values according to the
specified border mode.
Here is an example code snippet showing how to use it:
Mat output;
Mat kernelMatrix = (Mat_<double>(3, 3) << 1.32, -0.46, 0.14,
                                          -0.36,  1.25, 0.11,
                                           0.08, -1.96, 1.88);
filter2D(rawImage, output, -1, kernelMatrix);
Before debayering you have an array B (-ayer) of MxN filtered "graylevel" values. They are physically filtered in the sense that the number of photons measured by each one of them is affected by the color filter on top of each sensor site.
After debayering you have an array C (-olor) of MxNx3 BGR values, obtained by (essentially) reindexing the B array. However, each of the 3 values at a (row, col) image location represents 3 physical measurements. This is not the final image because we still need to "convert" the physical measurements to numbers that are representative of color channels as perceived by a human (or, more generally, by the intended user, which could also be some kind of image processing software). That is, the physical values need to be mapped to a color space.
The 3x3 "color correction" matrix you have represents one possible mapping - a simple linear one. You need to apply it in turn to each BGR triple at all (row, col) pixel locations. For example (in python/numpy/cv2):
import numpy as np

def colorCorrect(img, M):
    """Applies a color correction M to a BGR image img"""
    rows, cols, depth = img.shape
    assert depth == 3
    assert M.shape == (3, 3)
    img_corr = np.zeros((rows, cols, 3), dtype=img.dtype)
    for r in range(rows):
        for c in range(cols):
            img_corr[r, c, :] = M.dot(img[r, c, :])
    return img_corr
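As a side note, the per-pixel double loop can be replaced by a single vectorized matrix product; a minimal sketch, assuming the same img and M as above (it returns floats, which you may want to clip and cast back to the input dtype):

    import numpy as np

    def colorCorrectVectorized(img, M):
        """Apply the 3x3 color matrix M to every BGR triple at once."""
        rows, cols, _ = img.shape
        # (rows*cols, 3) @ M.T applies M to each pixel, same as M.dot(pixel)
        corrected = img.reshape(-1, 3).astype(np.float64) @ M.T
        return corrected.reshape(rows, cols, 3)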

Performing threshold operation on an RGB image

I need to perform a threshold operation on an RGB image. The thresholding that I intend to do should behave as follows.
If the greyscale equivalent of a pixel (calculated as 0.299 * R' + 0.587 * G' + 0.114 * B') is Y, then the pixel value of the output image will be:
P = Threshold_color, if Y < threshold_value
  = (R, G, B), the original value, otherwise
where Threshold_color is an RGB color value.
I wanted to perform this operation using the Intel IPP library. There I found a few APIs related to thresholding of images (e.g. ippiThreshold_LTVal_8u_C3R).
But these methods seem to work only on one data point at a time, while the thresholding I want to do depends on the combination of three different values (R, G, B).
Is there a way to achieve this through IPP library?
Suggested approach:
1. Copy the image into a greyscale image
2. Create a binary mask 0/1 (same size as the greyscale image) using the threshold
3. Multiply this mask with the replacement color you want, to generate an overlay
4. Apply the overlay to the original image.
Note that you're generating images of different types here: first greyscale, then black&white, and finally color images again (although in step 3 it's a monochromatic image)
Yes you can implement this using IPP but I'm not aware of any standard function that does what you want.
All IPP threshold operations I can find in the reference use a global threshold.
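For illustration, a minimal sketch of this mask-and-overlay approach in Python/NumPy rather than IPP (threshold_value and threshold_color are assumed inputs; the luma weights are the ones from the question):

    import numpy as np

    def threshold_replace(img_bgr, threshold_value, threshold_color):
        """Replace pixels whose greyscale equivalent Y falls below the threshold."""
        b, g, r = img_bgr[..., 0], img_bgr[..., 1], img_bgr[..., 2]
        grey = 0.299 * r + 0.587 * g + 0.114 * b              # step 1: greyscale copy
        mask = (grey < threshold_value)[..., None]            # step 2: binary mask
        overlay = mask * np.array(threshold_color, np.uint8)  # step 3: mask times color
        return np.where(mask, overlay, img_bgr)               # step 4: apply overlay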

interpolation for smooth downscale of image in OpenCV

I noticed that, of the two methods below for scaling an image down by N halvings, the first produced a smoother image that looks more appealing to the eye.
while (lod-- > Payload->MaxZoom)
{
    cv::resize(img, img, cv::Size(), 0.5, 0.5, cv::INTER_LINEAR);
}
vs
double scale = 1.0 / (1 << (lod - Payload->MaxZoom));
cv::resize(img, img, cv::Size(), scale, scale, cv::INTER_LINEAR);
I am interested in knowing if there is an interpolation mode that would produce a similar result to the first approach without having to loop N times.
Any mathematical insight into why doing the resize in multiple steps can give a better result would also be interesting.
The latter method above gives a very pixelated result (for N=5), whereas the first is very smooth (which makes sense, since it averages 4 pixels per step over N steps).
This happens because OpenCV's implementation of linear interpolation is rather simplistic.
A simple implementation of linear interpolation takes the values of four pixels closest to the interpolated point and interpolates between them. This is all right for upscaling, but for downscaling, this will ignore the values of many pixels - if there are N pixels in the output image, then it depends on at most 4N pixels of the input. This cannot give good results when the product of scaling factors is lower than 0.25.
The correct thing to do is to consider all input pixels that correspond to an output pixel after the transformation, and compute an average over them (or more generally, compute a convolution with a suitable resampling filter).
OpenCV seems to have an interpolation mode called cv::INTER_AREA, which should do the thing you want.
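As an illustration, a one-shot equivalent using area interpolation (a sketch in Python/OpenCV rather than the C++ API used in the question; n_halvings plays the role of lod - Payload->MaxZoom):

    import cv2

    def downscale_area(img, n_halvings):
        """Downscale by 2**n_halvings in a single call using area averaging."""
        scale = 1.0 / (1 << n_halvings)
        return cv2.resize(img, None, fx=scale, fy=scale, interpolation=cv2.INTER_AREA)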

Determine difference in stops between images with no EXIF data

I have a set of images of the same scene but shot with different exposures. These images have no EXIF data so there is no way to extract useful info like f-stop, shutter speed etc.
What I'm trying to do is determine the difference in stops between the images, e.g. Image1 is +1.3 stops relative to Image0.
My current approach is to first calculate luminance from the image's RGB values using the equation
L = 0.2126 * R + 0.7152 * G + 0.0722 * B
I've seen different numbers being used in the equation but generally it should not affect the end result L too much.
After that I derive the log-average luminance of the image.
exp(avg of log(luminance of image))
But somehow the log-average luminance doesn't seem to give much indication of the exposure difference between the images.
Any ideas on how to determine exposure difference?
edit: in C/C++
You have to generally solve two problems:
1. Linearize your image data
(In case it's not obvious what is meant: two times more light collected by your pixel shall result in two times the intensity value in your linearized image.)
Your image input might be (sufficiently) linearized already -> you may skip to part 2. If your content came from a camera and it's a JPEG, then this will most certainly not be the case.
The real 'solution' to this problem is finding the camera response function, which you want to invert and apply to your image data to get linear intensity values. This is by no means a trivial task. The EMoR model is widely used in all sorts of software (Photoshop, PTGui, Photomatix, etc.) to describe camera response functions. Some open source software solving this problem (but using a different model iirc) is PFScalibrate.
Having said that, you may get away with a simple inverse gamma application. A rough 'guesstimate' of the right gamma value might be found by doing this (a code sketch follows the steps below):
capture an evenly lit, static scene with two exposure times e and e/2
apply a couple of inverse gamma transforms (e.g. for 1.8 to 2.4 in 0.1 steps) on both images
multiply all the short exposure images with 2.0 and subtract them from the respective long exposure images
pick the gamma that leads to the smallest overall difference
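A minimal sketch of that search in Python/NumPy, assuming img_long and img_short are the two captures as floats in [0, 1], shot with exposure times e and e/2:

    import numpy as np

    def estimate_gamma(img_long, img_short, gammas=np.arange(1.8, 2.45, 0.1)):
        """Pick the inverse gamma that makes 2 * linearized(short) best match linearized(long)."""
        best_gamma, best_err = None, np.inf
        for gamma in gammas:
            lin_long = img_long ** gamma       # inverse gamma: linearize
            lin_short = img_short ** gamma
            err = np.abs(lin_long - 2.0 * lin_short).sum()
            if err < best_err:
                best_gamma, best_err = gamma, err
        return best_gamma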
2. Find the actual difference of irradiation in stops, i.e. log2(scale factor)
Presuming the scene was static (no moving objects or camera), this is relatively easy:
sum1 = sum2 = 0
foreach pixel pair (p1, p2) from the two images:
    if p1 or p2 is close to 0 or 255:
        skip this pair
    sum1 += p1 and sum2 += p2
return log2(sum1 / sum2)
On large images this will work just as well, and a lot faster, if you sub-sample the images.
If the camera was static but the scene was not (moving objects), this starts to work less well. I produced acceptable results in this case by simply repeating the above procedure several times, using the output of the previous run as an estimate for the correct scale factor and then discarding pixel pairs whose quotient is too far away from the current estimate. So basically, replace the above if line with the following:
if <see above> or if abs(log2(p1/p2) - estimate) > 0.5:
I'd stop the repetition after a fixed number of iterations or if two consecutive estimates are sufficiently close to each other.
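For concreteness, a sketch of both the basic comparison and the robust refinement in Python/NumPy, assuming img1 and img2 are already linearized single-channel arrays in [0, 255]; the cutoffs 5/250 stand in for "close to 0 or 255" and the 0.5-stop rejection follows the description above:

    import numpy as np

    def stops_between(img1, img2, iterations=5):
        """Estimate log2 of the exposure ratio between two linearized images."""
        p1 = img1.astype(np.float64).ravel()
        p2 = img2.astype(np.float64).ravel()
        # Step 2: drop pairs that are close to black or saturation, then compare sums.
        valid = (p1 > 5) & (p1 < 250) & (p2 > 5) & (p2 < 250)
        p1, p2 = p1[valid], p2[valid]
        estimate = np.log2(p1.sum() / p2.sum())
        # Robust refinement for non-static scenes: discard pairs far from the estimate.
        for _ in range(iterations):
            keep = np.abs(np.log2(p1 / p2) - estimate) < 0.5
            estimate = np.log2(p1[keep].sum() / p2[keep].sum())
        return estimate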
EDIT: A note about conversion to luminance
You don't need to do that at all (as Tony D mentioned already) and if you insist, then do it after the linearization step (as Mark Ransom noted). In a perfect setting (static scene, no noise, no de-mosaicing, no quantization) every channel of every pixel would have the same ratio p1/p2 (if neither is saturated). Therefore the relative weighting of the different channels is irrelevant. You may sum over all pixels/channels (weighing R, G and B equally) or maybe only use the green channel.

Cement Effect - Artistic Effect

I wish to give an effect to images, where the resultant image would appear as if it is painted on a rough cemented background, and the cemented background customizes itself near the edges to highlight them... Please help me in writing an algorithm to generate such an effect.
The first image is the original image
and the second image is the output I'm looking for.
Please note that the edges are detected and the mask changes near the edges to indicate them clearly.
You need to read up on Bump Mapping. There are plenty of bump mapping algorithms.
The basic algorithm is:
for each pixel:
1. Look up the position on the bump map texture that corresponds to the position on the bumped image.
2. Calculate the surface normal of the bump map.
3. Add the surface normal from step 2 to the geometric surface normal (in the case of an image it's a vector pointing up) so that the normal points in a new direction.
4. Calculate the interaction of the new 'bumpy' surface with lights in the scene using, for example, Phong shading -- light placement is up to you, and decides where the shadows will lie.
Finally, here's a plain C implementation for 2D images.
Starting with
1) the input image as R, G, B, and
2) a texture image, grayscale.
The images are likely in bytes, 0 to 255. Divide them by 255.0 so they range from 0.0 to 1.0. This makes the math easier. For performance you wouldn't actually do this but instead use clever fixed-point math, an implementation matter I leave to you.
First, to get the edge effects between different colored areas, add or subtract some fraction of the R, G, and B channels to the texture image:
texture_mod = texture - 0.2*R - 0.3*B
You could get fancier with nonlinear formulas, e.g. thresholding the R, G and B channels, or computing some mathematical expression involving them. This is always fun to experiment with; I'm not sure what would work best to recreate your example.
Next, compute an embossed version of texture_mod to create the lighting effect. This is the difference between the texture slid up and right by one pixel (or however much you like) and the same texture slid down and left. This gives the 3D lighting effect.
emboss = shift(texture_mod, 1,1) - shift(texture_mod, -1, -1)
(Should you use texture_mod or the original texture data in this formula? Experiment and see.)
Here's the power step. Convert the input image to HSV space. (LAB or other colorspaces may work better, or not - experiment and see.) Note that in your desired final image, the cracks between the "mesas" are darker, so we will use the original texture_mod and the emboss difference to alter the V channel, with coefficients to control the strength of the effect:
Vmod = V * ( 1.0 + C_depth * texture_mod + C_light * emboss)
Both C_depth and C_light should be between 0 and 1, probably smaller fractions like 0.2 to 0.5 or so. You will need a fudge factor to keep Vmod from overflowing or clamping at its maximum - divide by (1+C_depth+C_light). Some clamping at the bright end may help the highlights look brighter. As always experiment and see...
As a fine point, you could also modify the Saturation channel in some way, perhaps decreasing it where texture_mod is lower.
Finally, convert (H, S, Vmod) back to RGB color space.
If memory is tight or performance critical, you could skip the HSV conversion, and apply the Vmod formula instead to the individual R,G, B channels, but this will cause shifts in hue and saturation. It's a tradeoff between speed and good looks.
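A condensed sketch of these steps in Python/NumPy/OpenCV; the 0.2/0.3 channel weights and the C_depth/C_light values are just the illustrative numbers from the text, and everything assumes a BGR image and a same-sized grayscale texture:

    import cv2
    import numpy as np

    def cement_effect(img_bgr, texture, c_depth=0.3, c_light=0.4):
        """Rough-cement look: modulate a texture by the image colors, emboss it,
        and use both to scale the V channel in HSV space."""
        img = img_bgr.astype(np.float32) / 255.0
        tex = texture.astype(np.float32) / 255.0             # grayscale texture, same size
        b, g, r = img[..., 0], img[..., 1], img[..., 2]

        texture_mod = tex - 0.2 * r - 0.3 * b                 # edge effect between colored areas
        emboss = (np.roll(texture_mod, (1, 1), axis=(0, 1))   # slide up-right minus down-left
                  - np.roll(texture_mod, (-1, -1), axis=(0, 1)))

        hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
        h, s, v = cv2.split(hsv)
        v_mod = v * (1.0 + c_depth * texture_mod + c_light * emboss)
        v_mod = np.clip(v_mod / (1.0 + c_depth + c_light), 0.0, 1.0).astype(np.float32)
        out = cv2.cvtColor(cv2.merge((h, s, v_mod)), cv2.COLOR_HSV2BGR)
        return (out * 255.0).astype(np.uint8)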
This is called bump mapping. It is used to give a non-flat appearance to a surface.