Convert from linear RGB to XYZ - c++

This matrix is supposed to convert from linear RGBA to XYZ, preserving the alpha channel as it is:
vec4 M[4]=
{
vec4{0.4124564f,0.3575761f,0.1804375f,0.0f}
,vec4{0.2126729f,0.7151522f,0.0721750f,0.0f}
,vec4{0.0193339f,0.1191920f,0.9503041f,0.0f}
,vec4{0.0f,0.0f,0.0f,1.0f}
};
Is that correct? Where can I find the values in double precision? I am asking, because the second row is very close to the luma formula, which from what I understood is associated with non-linear sRGB values:
vec4 weights{0.2126f,0.7152f,0.0722f,0.0f};
auto temp=m_data*weights; //vectorized multiplication
return temp[0] + temp[1] + temp[2] + temp[3]; //Sum it up to compute the dot product (weighted average)
Other questions: Should the weights discussed actually be identical? Should conversion to Y'CbCr use the same weights? Should it be performed in linear or sRGB space?

This matrix converts from an sRGB flavour to CIE XYZ D65. It is, however, not the official sRGB matrix published in IEC 61966-2-1:1999, which is rounded to 4 digits as follows:
[[ 0.4124 0.3576 0.1805]
[ 0.2126 0.7152 0.0722]
[ 0.0193 0.1192 0.9505]]
Depending on the context in which you are performing your conversions, it might be important to use the official IEC 61966-2-1:1999 matrix to get results that match third-party datasets, as they are likely to use the canonical matrix.
For reference here is a double precision conversion matrix computed with Colour:
[[0.412390799265960 0.357584339383878 0.180480788401834]
[0.212639005871510 0.715168678767756 0.072192315360734]
[0.019330818715592 0.119194779794626 0.950532152249661]]
And the code used to generate it:
import numpy as np
import colour
print(colour.models.sRGB_COLOURSPACE.RGB_to_XYZ_matrix)
np.set_printoptions(formatter={'float': '{:0.15f}'.format})
colour.models.sRGB_COLOURSPACE.use_derived_transformation_matrices(True)
print(colour.models.sRGB_COLOURSPACE.RGB_to_XYZ_matrix)
Should the weights discussed actually be identical?
For consistency in your computations you might want to use weights matching your matrix or you will get issues when doing conversions back and forth.
Should conversion to Y'CbCr use the same weights?
Y'CbCr has many variants and nobody will be able to answer properly if you don't know which variant you need.
Should it be performed in linear or sRGB space?
Y'CbCr conversions are pretty much always performed on gamma-encoded values, ITU-R BT.2020 YcCbcCrc being a notable exception as it is based on linearly encoded values. It is also important to understand that the sRGB colourspace is also linear: as a matter of fact, the matrix central to the discussion here is meant to be applied to linearly encoded sRGB values.
Those two last questions should probably be asked in another new question.
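As a sketch of how the pieces discussed above fit together, the following C++ decodes a gamma-encoded sRGB value to a linear one and applies the rounded IEC 61966-2-1:1999 matrix quoted in the answer. The names `Vec4`, `srgb_to_linear` and `linear_rgba_to_xyza` are made up for illustration; the piecewise decoding function is the standard sRGB one, and alpha is passed through unchanged.

```cpp
#include <array>
#include <cmath>

using Vec4 = std::array<double, 4>;  // hypothetical alias: (R, G, B, A) or (X, Y, Z, A)

// Standard sRGB decoding: gamma-encoded value in [0, 1] -> linear value.
inline double srgb_to_linear(double c) {
    return c <= 0.04045 ? c / 12.92 : std::pow((c + 0.055) / 1.055, 2.4);
}

// Linear sRGB (R, G, B, A) -> CIE XYZ D65 (X, Y, Z, A), alpha untouched.
inline Vec4 linear_rgba_to_xyza(const Vec4& c) {
    static const double M[3][3] = {
        {0.4124, 0.3576, 0.1805},
        {0.2126, 0.7152, 0.0722},
        {0.0193, 0.1192, 0.9505},
    };
    Vec4 out{};
    for (int row = 0; row < 3; ++row) {
        out[row] = M[row][0] * c[0] + M[row][1] * c[1] + M[row][2] * c[2];
    }
    out[3] = c[3];
    return out;
}
```

With this matrix, linear white (1, 1, 1) maps to roughly (0.9505, 1.0, 1.089), the D65 white point. Gamma-encoded input would go through `srgb_to_linear` per channel first.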

Related

Image sensor linear matrix coefficients (color reproduction), how are they applied?

I have some raw images to debayer then apply colour corrections/transforms to. I use OpenCV and C++, and for the image sensor used the linear matrix coefficients are:
1.32 -0.46 0.14
-0.36 1.25 0.11
0.08 -1.96 1.88
I am not sure how to apply these to the image. It's not clear to me what I am supposed to do with them and why.
Can anyone explain what these colour reproduction or colour matrix values are, and how to use them to process an image?
Thank you!
Your question is not clear because it seems you also don't know what to do.
"what I am supposed to do with them"
First thing coming to my mind, you can convolve image with that matrix by using filter2D. According to documentation filter2D:
Convolves an image with the kernel.
The function applies an arbitrary linear filter to an image. In-place
operation is supported. When the aperture is partially outside the
image, the function interpolates outlier pixel values according to the
specified border mode.
Here is an example code snippet showing how to use it:
Mat output;
Mat kernelMatrix = (Mat_<double>(3, 3) << 1.32, -0.46, 0.14,
-0.36, 1.25, 0.11,
0.08, -1.96, 1.88);
filter2D(rawImage, output, -1, kernelMatrix);
Before debayering you have an array B (-ayer) of MxN filtered "graylevel" values. They are physically filtered in the sense that the number of photons measured by each one of them is affected by the color filter on top of each sensor site.
After debayering you have an array C (-olor) of MxNx3 BGR values, obtained by (essentially) reindexing the B array. However, each of the 3 values at a (row, col) image location represents 3 physical measurements. This is not the final image because we still need to "convert" the physical measurements to numbers that are representative of color channels as perceived by a human (or, more generally, by the intended user, which could also be some kind of image processing software). That is, the physical values need to be mapped to a color space.
The 3x3 "color correction" matrix you have represents one possible mapping - a simple linear one. You need to apply it in turn to each BGR triple at all (row, col) pixel locations. For example (in python/numpy/cv2):
import numpy as np

def colorCorrect(img, M):
    """Applies a color correction M to a BGR image img"""
    rows, cols, depth = img.shape
    assert depth == 3
    assert M.shape == (3, 3)
    img_corr = np.zeros((rows, cols, 3), dtype=img.dtype)
    for r in range(rows):
        for c in range(cols):
            img_corr[r, c, :] = M.dot(img[r, c, :])
    return img_corr
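Since the question asks about C++, the same per-pixel multiplication can be sketched without any library, assuming the image is stored as interleaved floating-point triples. `apply_color_matrix` and the buffer layout are assumptions for illustration. The channel order of the buffer must match the channel order the matrix was specified for (OpenCV stores BGR by default, while sensor matrices are usually given for RGB). In OpenCV itself, `cv::transform` performs this kind of per-element matrix transform.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Multiplies every interleaved 3-channel pixel by the row-major 3x3 matrix M.
// Hypothetical helper: the name and the flat float layout are assumptions.
inline std::vector<float> apply_color_matrix(const std::vector<float>& pixels,
                                             const std::array<float, 9>& M) {
    std::vector<float> out(pixels.size());
    for (std::size_t i = 0; i + 2 < pixels.size(); i += 3) {
        const float a = pixels[i], b = pixels[i + 1], c = pixels[i + 2];
        out[i]     = M[0] * a + M[1] * b + M[2] * c;
        out[i + 1] = M[3] * a + M[4] * b + M[5] * c;
        out[i + 2] = M[6] * a + M[7] * b + M[8] * c;
    }
    return out;
}
```

The results can be negative or exceed the input range, so clamping before converting back to an integer pixel type is usually needed.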

Robust atan(y,x) on GLSL for converting XY coordinate to angle

In GLSL (specifically 3.00, which I'm using), there are two versions of atan(): atan(y_over_x) can only return angles between -PI/2 and PI/2, while atan(y, x) takes all four quadrants into account, so its range covers everything from -PI to PI, much like atan2() in C++.
I would like to use the second atan to convert XY coordinates to angle.
However, atan() in GLSL, besides not being able to handle x = 0, is not very stable. Especially where x is close to zero, the division can overflow, resulting in the opposite angle (you get something close to -PI/2 where you are supposed to get approximately PI/2).
What is a good, simple implementation that we can build on top of GLSL atan(y,x) to make it more robust?
I'm going to answer my own question to share my knowledge. We first notice that the instability happens when x is near zero. However, we can also translate that as abs(x) << abs(y). So first we divide the plane (assuming we are on a unit circle) into two regions: one where |x| <= |y| (green in the original figure) and another where |x| > |y| (orange).
We know that atan(x,y) is much more stable in the green region -- when x is close to zero we simply have something close to atan(0.0) which is very stable numerically, while the usual atan(y,x) is more stable in the orange region. You can also convince yourself that this relationship:
atan(x,y) = PI/2 - atan(y,x)
holds (up to a multiple of 2*PI) for all (x,y) except the origin, where both sides are undefined. Here we are talking about the atan(y,x) that returns angles in the entire range from -PI to PI, not atan(y_over_x), which only returns angles between -PI/2 and PI/2. Therefore, our robust atan2() routine for GLSL is quite simple:
// PI is not built into GLSL; define it first, e.g. const float PI = 3.14159265358979;
float atan2(in float y, in float x)
{
    bool s = (abs(x) > abs(y));
    return mix(PI/2.0 - atan(x,y), atan(y,x), s);
}
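To check the region split on the CPU, here is a rough C++ port of the routine above, with `std::atan2` standing in for GLSL's two-argument `atan`. The function names are made up for this sketch, and `angle_distance` is a helper added only so the result can be compared against `std::atan2` while ignoring multiples of 2*PI.

```cpp
#include <cmath>

// Region-split two-argument arctangent, mirroring the GLSL routine above.
inline double robust_atan2(double y, double x) {
    const double half_pi = std::acos(-1.0) / 2.0;
    if (std::fabs(x) > std::fabs(y)) {
        return std::atan2(y, x);        // |x| > |y|: the usual argument order
    }
    return half_pi - std::atan2(x, y);  // |x| <= |y|: swapped arguments
}

// Smallest absolute difference between two angles, ignoring multiples of 2*pi.
inline double angle_distance(double a, double b) {
    const double two_pi = 2.0 * std::acos(-1.0);
    const double d = std::fmod(std::fabs(a - b), two_pi);
    return d > two_pi / 2.0 ? two_pi - d : d;
}
```

One thing the port makes visible: for x < 0, y < 0 with |x| <= |y|, the swapped branch returns a value in [5*PI/4, 3*PI/2) rather than in [-3*PI/4, -PI/2). It is the same direction, but outside the usual -PI to PI range, which matters if the caller compares angles directly.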
As a side note, the identity for mathematical function atan(x) is actually:
atan(x) + atan(1/x) = sgn(x) * PI/2
which is true because its range is (-PI/2, PI/2).
Depending on your targeted platform, this might be a solved problem. The OpenGL spec for atan(y, x) specifies that it should work in all quadrants, leaving behavior undefined only when x and y are both 0.
So one would expect any decent implementation to be stable near all axes, as this is the whole purpose behind 2-argument atan (or atan2).
The questioner/answerer is correct in that some implementations do take shortcuts. However, the accepted solution makes the assumption that a bad implementation will always be unstable when x is near zero: on some hardware (my Galaxy S4 for example) the value is stable when x is near zero, but unstable when y is near zero.
To test your GLSL renderer's implementation of atan(y,x), here's a WebGL test pattern. Follow the link below and as long as your OpenGL implementation is decent, you should see something like this:
Test pattern using native atan(y,x): http://glslsandbox.com/e#26563.2
If all is well, you should see 8 distinct colors (ignoring the center).
The linked demo samples atan(y,x) for several values of x and y, including 0, very large, and very small values. The central box is atan(0.,0.)--undefined mathematically, and implementations vary. I've seen 0 (red), PI/2 (green), and NaN (black) on hardware I've tested.
Here's a test page for the accepted solution. Note: the host's WebGL version lacks mix(float,float,bool), so I added an implementation that matches the spec.
Test pattern using atan2(y,x) from accepted answer: http://glslsandbox.com/e#26666.0
Your proposed solution still fails in the case x=y=0. Here both of the atan() functions return NaN.
Further I would not rely on mix to switch between the two cases. I am not sure how this is implemented/compiled, but IEEE float rules for x*NaN and x+NaN result again in NaN. So if your compiler really used mix/interpolation the result should be NaN for x=0 or y=0.
Here is another fix which solved the problem for me:
float atan2(in float y, in float x)
{
    return x == 0.0 ? sign(y)*PI/2.0 : atan(y, x);
}
When x=0 the angle can be ±π/2. Which of the two depends on y only. If y=0 too, the angle can be arbitrary (vector has length 0). sign(y) returns 0 in that case which is just ok.
Sometimes the best way to improve the performance of a piece of code is to avoid calling it in the first place. For example, one of the reasons you might want to determine the angle of a vector is so that you can use this angle to construct a rotation matrix using combinations of the angle's sine and cosine. However, the sine and cosine of a vector (relative to the origin) are already hidden in plain sight inside the vector itself. All you need to do is to create a normalized version of the vector by dividing each vector coordinate by the total length of the vector. Here's the two-dimensional example to calculate the sine and cosine of the angle of vector [ x y ]:
double length = sqrt(x*x + y*y);
double cos = x / length;
double sin = y / length;
Once you have the sine and cosine values, you can now directly populate a rotation matrix with these values to perform a clockwise or counterclockwise rotation of arbitrary vectors by the same angle, or you can concatenate a second rotation matrix to rotate to an angle other than zero. In this case, you can think of the rotation matrix as "normalizing" the angle to zero for an arbitrary vector. This approach is extensible to the three-dimensional (or N-dimensional) case as well, although for example you will have three angles and six sin/cos pairs to calculate (one angle per plane) for 3D rotation.
In situations where you can use this approach, you get a big win by bypassing the atan calculation completely, which is possible since the only reason you wanted to determine the angle was to calculate the sine and cosine values. By skipping the conversion to angle space and back, you not only avoid worrying about division by zero, but you also improve precision for angles which are near the poles and would otherwise suffer from being multiplied/divided by large numbers. I've successfully used this approach in a GLSL program which rotates a scene to zero degrees to simplify a computation.
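A minimal C++ sketch of this idea, assuming a plain 2D vector struct: the normalised components of a direction vector are used directly as the cosine and sine of a rotation matrix. `Vec2`, `rotate_by_direction` and `rotate_to_x_axis` are names invented for this example, and the direction vector is assumed to be non-zero.

```cpp
#include <cmath>

struct Vec2 { double x, y; };

// Rotates p by the angle that `dir` makes with the +x axis, using the
// normalised components of `dir` as cosine and sine (no atan, sin or cos).
inline Vec2 rotate_by_direction(Vec2 p, Vec2 dir) {
    const double len = std::sqrt(dir.x * dir.x + dir.y * dir.y);
    const double c = dir.x / len;
    const double s = dir.y / len;
    return {c * p.x - s * p.y, s * p.x + c * p.y};
}

// Inverse rotation: brings `dir` itself onto the +x axis ("angle zero").
inline Vec2 rotate_to_x_axis(Vec2 p, Vec2 dir) {
    const double len = std::sqrt(dir.x * dir.x + dir.y * dir.y);
    const double c = dir.x / len;
    const double s = dir.y / len;
    return {c * p.x + s * p.y, -s * p.x + c * p.y};
}
```

For example, rotating (1, 0) by the direction (0, 2) gives (0, 1), and `rotate_to_x_axis` applied to (3, 4) with itself as the direction gives (5, 0).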
It can be easy to get so caught up in an immediate problem that you can lose sight of why you need this information in the first place. Not that this works in every case, but sometimes it helps to think out of the box...
A formula that gives an angle in the four quadrants for any value
of coordinates x and y. For x=y=0 the result is undefined.
f(x,y)=pi()-pi()/2*(1+sign(x))* (1-sign(y^2))-pi()/4*(2+sign(x))*sign(y)
-sign(x*y)*atan((abs(x)-abs(y))/(abs(x)+abs(y)))
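The formula above translates fairly directly to C++. This sketch assumes `sign` returns -1, 0 or +1 (as GLSL's `sign` does); `sign_of` and `angle_from_formula` are names chosen for the example. Working the formula through by hand for the axes and each quadrant suggests the result lies in [0, 2*PI) rather than -PI to PI.

```cpp
#include <cmath>

// -1, 0 or +1 depending on the sign of v.
inline double sign_of(double v) { return (v > 0.0) - (v < 0.0); }

// Four-quadrant angle using a single one-argument atan call.
// Not meaningful for x == 0 && y == 0 (0/0 inside atan).
inline double angle_from_formula(double x, double y) {
    const double pi = std::acos(-1.0);
    const double ax = std::fabs(x), ay = std::fabs(y);
    return pi
         - pi / 2.0 * (1.0 + sign_of(x)) * (1.0 - sign_of(y * y))
         - pi / 4.0 * (2.0 + sign_of(x)) * sign_of(y)
         - sign_of(x * y) * std::atan((ax - ay) / (ax + ay));
}
```

Because the atan argument (|x| - |y|) / (|x| + |y|) always stays within [-1, 1], no division blows up near either axis.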

Determine difference in stops between images with no EXIF data

I have a set of images of the same scene but shot with different exposures. These images have no EXIF data so there is no way to extract useful info like f-stop, shutter speed etc.
What I'm trying to do is to determine the difference in stops between the images i.e. Image1 is +1.3 stops of Image0.
My current approach is to first calculate luminance from the image's RGB values using the equation
L = 0.2126 * R + 0.7152 * G + 0.0722 * B
I've seen different numbers being used in the equation but generally it should not affect the end result L too much.
After that I derive the log-average luminance of the image.
exp(avg of log(luminance of image))
But somehow the log-average luminance doesn't seem to give much indication of the exposure difference between the images.
Any ideas on how to determine exposure difference?
edit: on c/c++
You have to generally solve two problems:
1. Linearize your image data
(In case it's not obvious what is meant: two times more light collected by your pixel shall result in two times the intensity value in your linearized image.)
Your image input might be (sufficiently) linearized already -> you may skip to part 2. If your content came from a camera and it's a JPEG, then this will most certainly not be the case.
The real 'solution' to this problem is finding the camera response function, which you want to invert and apply to your image data to get linear intensity values. This is by no means a trivial task. The EMoR model is widely used in all sorts of software (Photoshop, PTGui, Photomatix, etc.) to describe camera response functions. Some open source software solving this problem (but using a different model iirc) is PFScalibrate.
That said, you may get away with a simple inverse gamma application. A rough 'guesstimate' of the right gamma value might be found by doing this:
capture an evenly lit, static scene with two exposure times e and e/2
apply a couple of inverse gamma transforms (e.g. for 1.8 to 2.4 in 0.1 steps) on both images
multiply all the short exposure images with 2.0 and subtract them from the respective long exposure images
pick the gamma that led to the smallest overall difference
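The search described in these steps could look roughly like this in C++. It is a sketch under assumptions: pixel values are gamma-encoded and scaled to [0, 1], both images have the same size and pixel order, "inverse gamma" is taken to mean raising the value to the power gamma, and the overall difference is a sum of absolute differences. `estimate_gamma` is a hypothetical name.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Tries gamma = 1.8, 1.9, ..., 2.4 and returns the value for which the
// linearised long exposure best matches twice the linearised short exposure.
inline double estimate_gamma(const std::vector<double>& long_exp,
                             const std::vector<double>& short_exp) {
    double best_gamma = 1.8;
    double best_error = std::numeric_limits<double>::infinity();
    for (int step = 0; step <= 6; ++step) {
        const double gamma = 1.8 + 0.1 * step;
        double error = 0.0;
        for (std::size_t i = 0; i < long_exp.size() && i < short_exp.size(); ++i) {
            const double lin_long = std::pow(long_exp[i], gamma);
            const double lin_short = std::pow(short_exp[i], gamma);
            error += std::fabs(lin_long - 2.0 * lin_short);
        }
        if (error < best_error) {
            best_error = error;
            best_gamma = gamma;
        }
    }
    return best_gamma;
}
```

Saturated or near-black pixels would distort the error sum, so in practice they should be filtered out before calling a function like this.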
2. Find the actual difference of irradiation in stops, i.e. log2(scale factor)
Presuming the scene was static (no moving objects or camera), this is relatively easy:
sum1 = sum2 = 0
foreach pixel pair (p1,p2) from the two images:
    if p1 or p2 is close to 0 or 255:
        skip this pair
    sum1 += p1 and sum2 += p2
return log2(sum1 / sum2)
On large images this will certainly work just as well and a lot faster if you sub-sample the images.
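Since the question asks for C/C++, the pseudocode above might be written as follows. This is a sketch assuming both images are already linearised, 8-bit, and of equal size. The function name and the `margin` parameter (how close to 0 or 255 counts as "close") are assumptions, not part of the original answer.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns log2(sum1 / sum2) over pixel pairs that are neither near black
// nor near white. A positive result means img1 received more light.
inline double exposure_difference_stops(const std::vector<std::uint8_t>& img1,
                                        const std::vector<std::uint8_t>& img2,
                                        int margin = 5) {
    double sum1 = 0.0, sum2 = 0.0;
    for (std::size_t i = 0; i < img1.size() && i < img2.size(); ++i) {
        const int p1 = img1[i], p2 = img2[i];
        const bool usable = p1 > margin && p1 < 255 - margin &&
                            p2 > margin && p2 < 255 - margin;
        if (usable) {
            sum1 += p1;
            sum2 += p2;
        }
    }
    return std::log2(sum1 / sum2);  // NaN if no pair was usable
}
```

If every usable pixel in `img1` is twice as bright as its partner in `img2`, the function returns 1.0, i.e. one stop.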
If the camera was static but the scene was not (moving objects), this starts to work less well. I produced acceptable results in this case by simply repeating the above procedure several times, using the output of the previous run as an estimate for the correct scale factor and then discarding pixel pairs whose quotient is too far away from the current estimate. So basically replacing the above if line with the following:
if <see above> or if abs(log2(p1/p2) - estimate) > 0.5:
I'd stop the repetition after a fixed number of iterations or if two consecutive estimates are sufficiently close to each other.
EDIT: A note about conversion to luminance
You don't need to do that at all (as Tony D mentioned already) and if you insist, then do it after the linearization step (as Mark Ransom noted). In a perfect setting (static scene, no noise, no de-mosaicing, no quantization) every channel of every pixel would have the same ratio p1/p2 (if neither is saturated). Therefore the relative weighting of the different channels is irrelevant. You may sum over all pixels/channels (weighing R, G and B equally) or maybe only use the green channel.

Cement Effect - Artistic Effect

I wish to give an effect to images, where the resultant image would appear as if it is painted on a rough cemented background, and the cemented background customizes itself near the edges to highlight them... Please help me in writing an algorithm to generate such an effect.
The first image is the original image
and the second image is the output im looking for.
please note the edges are detected and the mask changes near the edges to indicate the edges clearly
You need to read up on Bump Mapping. There are plenty of bump mapping algorithms.
The basic algorithm is, for each pixel:
1. Look up the position on the bump map texture that corresponds to the position on the bumped image.
2. Calculate the surface normal of the bump map.
3. Add the surface normal from step 2 to the geometric surface normal (in case of an image it's a vector pointing up) so that the normal points in a new direction.
4. Calculate the interaction of the new 'bumpy' surface with lights in the scene using, for example, Phong shading. Light placement is up to you, and decides where the shadows will lie.
Finally, here's a plain C implementation for 2D images.
Starting with
1) the input image as R, G, B, and
2) a texture image, grayscale.
The images are likely in bytes, 0 to 255. Divide it by 255.0 so we have them as being from 0.0 to 1.0. This makes the math easier. For performance, you wouldn't actually do this but instead use clever fixed-point math, an implementation matter I leave to you.
First, to get the edge effects between different colored areas, add or subtract some fraction of the R, G, and B channels to the texture image:
texture_mod = texture - 0.2*R - 0.3*B
You could get fancier with nonlinear formulas, e.g. thresholding the R, G and B channels, or computing some mathematical expression involving them. This is always fun to experiment with; I'm not sure what would work best to recreate your example.
Next, compute an embossed version of texture_mod to create the lighting effect. This is the difference between the texture slid up and right by one pixel (or however much you like) and the same texture slid down and left by the same amount. This gives the 3D lighting effect.
emboss = shift(texture_mod, 1,1) - shift(texture_mod, -1, -1)
(Should you use texture_mod or the original texture data in this formula? Experiment and see.)
Here's the power step. Convert the input image to HSV space. (LAB or other colorspaces may work better, or not - experiment and see.) Note that in your desired final image, the cracks between the "mesas" are darker, so we will use the original texture_mod and the emboss difference to alter the V channel, with coefficients to control the strength of the effect:
Vmod = V * ( 1.0 + C_depth * texture_mod + C_light * emboss)
Both C_depth and C_light should be between 0 and 1, probably smaller fractions like 0.2 to 0.5 or so. You will need a fudge factor to keep Vmod from overflowing or clamping at its maximum - divide by (1+C_depth+C_light). Some clamping at the bright end may help the highlights look brighter. As always experiment and see...
As a fine point, you could also modify the Saturation channel in some way, perhaps decreasing it where texture_mod is lower.
Finally, convert (H, S, Vmod) back to RGB color space.
If memory is tight or performance critical, you could skip the HSV conversion, and apply the Vmod formula instead to the individual R,G, B channels, but this will cause shifts in hue and saturation. It's a tradeoff between speed and good looks.
This is called bump mapping. It is used to give a non flat appearance to a surface.

Converting an FFT to a spectrogram

I have an audio file and I am iterating through the file and taking 512 samples at each step and then passing them through an FFT.
I have the data out as a block 514 floats long (Using IPP's ippsFFTFwd_RToCCS_32f_I) with real and imaginary components interleaved.
My problem is what do I do with these complex numbers once I have them? At the moment I'm doing for each value
const float realValue = buffer[(y * 2) + 0];
const float imagValue = buffer[(y * 2) + 1];
const float value = sqrt( (realValue * realValue) + (imagValue * imagValue) );
This gives something slightly usable but I'd rather some way of getting the values out in the range 0 to 1. The problem with the above is that the peaks end up coming back as around 9 or more. This means things get viciously saturated and then there are other parts of the spectrogram that barely show up despite the fact that they appear to be quite strong when I run the audio through Audition's spectrogram. I fully admit I'm not 100% sure what the data returned by the FFT is (other than that it represents the frequency values of the 512 sample long block I'm passing in). Especially my understanding is lacking on what exactly the complex number represents.
Any advice and help would be much appreciated!
Edit: Just to clarify. My big problem is that the FFT values returned are meaningless without some idea of what the scale is. Can someone point me towards working out that scale?
Edit2: I get really nice looking results by doing the following:
size_t count2 = 0;
size_t max2 = kFFTSize + 2;
while( count2 < max2 )
{
    const float realValue = buffer[(count2) + 0];
    const float imagValue = buffer[(count2) + 1];
    const float value = (log10f( sqrtf( (realValue * realValue) + (imagValue * imagValue) ) * rcpVerticalZoom ) + 1.0f) * 0.5f;
    buffer[count2 >> 1] = value;
    count2 += 2;
}
To my eye this even looks better than most other spectrogram implementations I have looked at.
Is there anything MAJORLY wrong with what I'm doing?
The usual thing to do to get all of an FFT visible is to take the logarithm of the magnitude.
So, the position of the output buffer tells you what frequency was detected. The magnitude (L2 norm) of the complex number tells you how strong the detected frequency was, and the phase (arctangent) gives you information that is a lot more important in image space than audio space. Because the FFT is discrete, the frequencies run from 0 to the Nyquist frequency. In images, the first term (DC) is usually the largest, and so a good candidate for use in normalization if that is your aim. I don't know if that is also true for audio (I doubt it)
For each window of 512 samples, you compute the magnitude of the FFT as you did. Each value represents the magnitude of the corresponding frequency present in the signal.
mag
 /\
 |
 |      !         !
 |      !    !    !
 +--!---!----!----!---!--> freq
 0          Fs/2      Fs
Now we need to figure out the frequencies.
Since the input signal is of real values, the FFT is symmetric around the middle (Nyquist component) with the first term being the DC component. Knowing the signal sampling frequency Fs, the Nyquist frequency is Fs/2. And therefore for the index k, the corresponding frequency is k*Fs/512
So for each window of length 512, we get the magnitudes at specified frequency. The group of those over consecutive windows form the spectrogram.
Just so people know I've done a LOT of work on this whole problem. The main thing I've discovered is that the FFT requires normalisation after doing it.
To do this you average all the values of your window vector together to get a value somewhat less than 1 (or 1 if you are using a rectangular window). You then divide that number by the number of frequency bins you have post the FFT transform.
Finally you divide the actual number returned by the FFT by the normalisation number. Your amplitude values should now be in the -Inf to 1 range. Log, etc, as you please. You will still be working with a known range.
There are a few things that I think you will find helpful.
The forward FT will tend to give larger numbers in the output than in the input. You can think of it as all of the intensity at a certain frequency being displayed at one place rather than being distributed through the dataset. Does this matter? Probably not because you can always scale the data to fit your needs. I once wrote an integer based FFT/IFFT pair and each pass required rescaling to prevent integer overflow.
The real data that are your input are converted into something that is almost complex. As it turns out buffer[0] and buffer[n/2] are real and independent. There is a good discussion of it here.
The input data are sound intensity values taken over time, equally spaced. They are said to be, appropriately enough, in the time domain. The output of the FT is said to be in the frequency domain because the horizontal axis is frequency. The vertical scale remains intensity. Although it isn't obvious from the input data, there is phase information in the input as well. Although all of the sound is sinusoidal, there is nothing that fixes the phases of the sine waves. This phase information appears in the frequency domain as the phases of the individual complex numbers, but often we don't care about it (and often we do too!). It just depends upon what you are doing. The calculation
const float value = sqrt((realValue * realValue) + (imagValue * imagValue));
retrieves the intensity information but discards the phase information. Taking the logarithm essentially just dampens the big peaks.
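One possible way to get spectrogram values into a known 0 to 1 range is to scale the magnitude, convert to decibels, and map a fixed dB window onto [0, 1]. The sketch below assumes the forward FFT is unnormalised (so a full-scale sine in an N-point frame peaks at N/2; check the scaling flag of your FFT library), no window function, and an arbitrary display floor of -96 dB. The function name and parameters are made up for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Converts interleaved (re, im) pairs into display values in [0, 1].
inline std::vector<float> spectrum_to_unit_range(const std::vector<float>& interleaved,
                                                 std::size_t fft_size,
                                                 float floor_db = -96.0f) {
    std::vector<float> out(interleaved.size() / 2);
    const float scale = 2.0f / static_cast<float>(fft_size);
    for (std::size_t k = 0; k < out.size(); ++k) {
        const float re = interleaved[2 * k];
        const float im = interleaved[2 * k + 1];
        const float magnitude = std::sqrt(re * re + im * im) * scale;
        // Clamp before the logarithm so that silence does not produce -inf.
        const float db = 20.0f * std::log10(std::max(magnitude, 1e-12f));
        // 0 dB maps to 1, floor_db and below map to 0.
        out[k] = std::min(1.0f, std::max(0.0f, 1.0f - db / floor_db));
    }
    return out;
}
```

With a 512-point frame, a bin holding (256, 0) maps to 1.0 and an empty bin maps to 0.0. The DC and Nyquist bins scale by N rather than N/2, so they can read up to 6 dB high with this simple scaling.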
Hope this is helpful.
If you are getting strange results then one thing to check is the documentation for the FFT library to see how the output is packed. Some routines use a packed format where real/imaginary values are interleaved, or they may begin at the N/2 element and wrap around.
For a sanity check I would suggest creating sample data with known characteristics, e.g. Fs/2, Fs/4 (Fs = sample frequency) and compare the output of the FFT routine with what you'd expect. Try creating both a sine and cosine at the same frequency, as these should have the same magnitude in the spectrum, but have different phases (i.e. the realValue/imagValue will differ, but the sum of squares should be the same).
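Such a sanity check can be sketched in C++ with a slow reference DFT that evaluates one bin directly, so the expected numbers do not depend on any FFT library's packing or scaling. `dft_bin` and `make_tone` are hypothetical helpers written for this example.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Direct evaluation of one DFT bin: X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N).
inline std::complex<double> dft_bin(const std::vector<double>& x, std::size_t k) {
    const double two_pi = 2.0 * std::acos(-1.0);
    const double n_total = static_cast<double>(x.size());
    std::complex<double> sum(0.0, 0.0);
    for (std::size_t n = 0; n < x.size(); ++n) {
        const double phase =
            -two_pi * static_cast<double>(k) * static_cast<double>(n) / n_total;
        sum += x[n] * std::complex<double>(std::cos(phase), std::sin(phase));
    }
    return sum;
}

// Unit-amplitude tone centred on bin k of an n-sample frame.
inline std::vector<double> make_tone(std::size_t n, std::size_t k, bool use_sine) {
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<double> x(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double arg =
            two_pi * static_cast<double>(k) * static_cast<double>(i) / static_cast<double>(n);
        x[i] = use_sine ? std::sin(arg) : std::cos(arg);
    }
    return x;
}
```

For a 64-sample frame and a tone at Fs/4 (bin 16), both the cosine and the sine give a magnitude of 32 (N/2) at bin 16. The cosine lands in the real part and the sine in the imaginary part, which is the phase difference mentioned above.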
If you're intending on using the FFT though then you really need to know how it works mathematically, otherwise you're likely to encounter other strange problems such as aliasing.