fast Linear into sRGB - c++

Is it a good way to convert a color from linear space (0.0 to 1.0) into sRGB space (0 to 255) by using a lookup table in this manner?
Example, in Java:
byte[] table;
void initializeTable()
{
table = new byte[65536];
for(int i = 0; i < table.length; i++){
float Lin = i / (float) (table.length-1);
if(Lin<=0.0031308f)
table[i] = (byte)(255*(Lin*12.92f));
else
table[i] = (byte)(255*( (1+0.055f)*Math.pow(Lin,1/2.4f)-0.055f) );
}
}
int sRGB(float Linear/*in range 0..1*/) // Will return 0..255 integer sRGB
{
return 255 & table[(int)( Linear*(table.length-1) )];
}

Yes, that looks fine to me. You can probably use a smaller table, in fact.
If I were feeling pedantic, I'd say that your conversion from float to int always makes the value slightly darker. You can improve that by changing the line that computes Lin:
float Lin = (i+0.5f) / (float) (table.length-1);
or by changing the return statement in sRGB:
return 255 & table[(int)( Linear*(table.length-1) + 0.5f )];
but not both. Either fix centres the rounding error at zero. However, the rounding error still exists, and it is so small that it's not really worth mentioning, so I think your code is fine.
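Since the question title asks about C++, here is a minimal sketch of the same lookup-table idea with a smaller table. The table size of 4096 and the names kTableSize, buildSrgbTable and linearToSrgb are assumptions for illustration, not something from the question. At this size the steepest part of the curve (slope 12.92) moves by less than one 8-bit step per table entry, but whether that is accurate enough for your use is something to check yourself.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>

// Hypothetical table size; 4096 is an assumption, not a measured optimum.
constexpr int kTableSize = 4096;

// Builds the linear -> sRGB table once, rounding each entry to the nearest 8-bit value.
std::array<std::uint8_t, kTableSize> buildSrgbTable() {
    std::array<std::uint8_t, kTableSize> table{};
    for (int i = 0; i < kTableSize; ++i) {
        const float lin = i / static_cast<float>(kTableSize - 1);
        const float s = (lin <= 0.0031308f)
            ? lin * 12.92f
            : 1.055f * std::pow(lin, 1.0f / 2.4f) - 0.055f;
        table[i] = static_cast<std::uint8_t>(255.0f * s + 0.5f);
    }
    return table;
}

// Maps a linear value to 0..255; input outside 0..1 is clamped.
int linearToSrgb(float linear) {
    static const std::array<std::uint8_t, kTableSize> table = buildSrgbTable();
    linear = std::min(1.0f, std::max(0.0f, linear));
    // Adding 0.5 rounds to the nearest table entry instead of truncating.
    const int index = static_cast<int>(linear * (kTableSize - 1) + 0.5f);
    return table[index];
}
```

Here the rounding is applied in the lookup only, in line with the "but not both" advice above.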

Related

Weird but close fft and ifft of image in c++

I wrote a program that loads, saves, and performs the fft and ifft on black and white png images. After much debugging headache, I finally got some coherent output only to find that it distorted the original image.
input:
fft:
ifft:
As far as I have tested, the pixel data in each array is stored and converted correctly. Pixels are stored in two arrays, 'data' which contains the b/w value of each pixel and 'complex_data' which is twice as long as 'data' and stores real b/w value and imaginary parts of each pixel in alternating indices. My fft algorithm operates on an array structured like 'complex_data'. After code to read commands from the user, here's the code in question:
if (cmd == "fft")
{
if (height > width) size = height;
else size = width;
N = (int)pow(2.0, ceil(log((double)size)/log(2.0)));
temp_data = (double*) malloc(sizeof(double) * width * 2); //array to hold each row of the image for processing in FFT()
for (i = 0; i < (int) height; i++)
{
for (j = 0; j < (int) width; j++)
{
temp_data[j*2] = complex_data[(i*width*2)+(j*2)];
temp_data[j*2+1] = complex_data[(i*width*2)+(j*2)+1];
}
FFT(temp_data, N, 1);
for (j = 0; j < (int) width; j++)
{
complex_data[(i*width*2)+(j*2)] = temp_data[j*2];
complex_data[(i*width*2)+(j*2)+1] = temp_data[j*2+1];
}
}
transpose(complex_data, width, height); //tested
free(temp_data);
temp_data = (double*) malloc(sizeof(double) * height * 2);
for (i = 0; i < (int) width; i++)
{
for (j = 0; j < (int) height; j++)
{
temp_data[j*2] = complex_data[(i*height*2)+(j*2)];
temp_data[j*2+1] = complex_data[(i*height*2)+(j*2)+1];
}
FFT(temp_data, N, 1);
for (j = 0; j < (int) height; j++)
{
complex_data[(i*height*2)+(j*2)] = temp_data[j*2];
complex_data[(i*height*2)+(j*2)+1] = temp_data[j*2+1];
}
}
transpose(complex_data, height, width);
free(temp_data);
free(data);
data = complex_to_real(complex_data, image.size()/4); //tested
image = bw_data_to_vector(data, image.size()/4); //tested
cout << "*** fft success ***" << endl << endl;
void FFT(double* data, unsigned long nn, int f_or_b){ // f_or_b is 1 for fft, -1 for ifft
unsigned long n, mmax, m, j, istep, i;
double wtemp, w_real, wp_real, wp_imaginary, w_imaginary, theta;
double temp_real, temp_imaginary;
// reverse-binary reindexing to separate even and odd indices
// and to allow us to compute the FFT in place
n = nn<<1;
j = 1;
for (i = 1; i < n; i += 2) {
if (j > i) {
swap(data[j-1], data[i-1]);
swap(data[j], data[i]);
}
m = nn;
while (m >= 2 && j > m) {
j -= m;
m >>= 1;
}
j += m;
};
// here begins the Danielson-Lanczos section
mmax = 2;
while (n > mmax) {
istep = mmax<<1;
theta = f_or_b * (2 * M_PI/mmax);
wtemp = sin(0.5 * theta);
wp_real = -2.0 * wtemp * wtemp;
wp_imaginary = sin(theta);
w_real = 1.0;
w_imaginary = 0.0;
for (m = 1; m < mmax; m += 2) {
for (i = m; i <= n; i += istep) {
j = i + mmax;
temp_real = w_real * data[j-1] - w_imaginary * data[j];
temp_imaginary = w_real * data[j] + w_imaginary * data[j-1];
data[j-1] = data[i-1] - temp_real;
data[j] = data[i] - temp_imaginary;
data[i-1] += temp_real;
data[i] += temp_imaginary;
}
wtemp = w_real;
w_real += w_real * wp_real - w_imaginary * wp_imaginary;
w_imaginary += w_imaginary * wp_real + wtemp * wp_imaginary;
}
mmax=istep;
}}
My ifft is the same only with the f_or_b set to -1 instead of 1. My program calls FFT() on each row, transposes the image, calls FFT() on each row again, then transposes back. Is there maybe an error with my indexing?
Not an actual answer as this question is Debug only so some hints instead:
your results are really bad
it should look like this:
first line is the actual DFFT result
Re,Im,Power is amplified by a constant otherwise you would see a black image
the last image is the IDFFT of the original, non-amplified Re,Im result
the second line is the same, but the DFFT result is wrapped by half the image size in both x and y to match the common presentation in most DIP/CV texts
As you can see, if you IDFFT the wrapped result back, the output is not correct (checkerboard mask)
You have just single image as DFFT result
is it power spectrum?
or did you forget to include the imaginary part? Only for viewing, or perhaps somewhere in the computation as well?
is your 1D DFFT working?
for real data the result should be symmetric
check the links from my comment and compare the results for some sample 1D array
debug/repair your 1D FFT first and only then move to the next level
do not forget to test Real and complex data ...
your IDFFT looks BW (no gray), i.e. saturated
so did you amplify the DFFT results to see the image and then use that for the IDFFT instead of the original DFFT result?
also check that you do not round to integers somewhere along the computation
beware of (I)DFFT overflows/underflows
If your image pixel intensities are large and the image resolution is too, then your computation could lose precision. I have never seen this in images, but if your image is HDR then it is possible. This is a common problem with convolution computed by DFFT for big polynomials.
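One way to carry out the "is your 1D DFFT working?" check is to compare against a slow reference transform and test the symmetry property for real input. The sketch below is only a reference, not the asker's FFT: naiveDft and symmetryError are hypothetical names, and the sign convention (-1 for forward) is an assumption that may differ from the f_or_b convention in the question.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive O(N^2) DFT, unscaled. sign = -1 for forward, +1 for inverse (assumed convention).
std::vector<std::complex<double>> naiveDft(const std::vector<std::complex<double>>& x, int sign) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<std::complex<double>> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = sign * 2.0 * pi * static_cast<double>((k * t) % n) / static_cast<double>(n);
            out[k] += x[t] * std::polar(1.0, angle);
        }
    }
    return out;
}

// Largest deviation from conjugate symmetry X[k] == conj(X[N-k]).
// For purely real input this should be close to zero.
double symmetryError(const std::vector<std::complex<double>>& X) {
    const std::size_t n = X.size();
    double worst = 0.0;
    for (std::size_t k = 1; k < n; ++k) {
        worst = std::max(worst, std::abs(X[k] - std::conj(X[n - k])));
    }
    return worst;
}
```

Feeding the same small array to your own FFT and to naiveDft, then comparing bin by bin, should show quickly whether the 1D stage is at fault.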
Thank you everyone for your opinions. All that stuff about memory corruption, while it makes a point, is not the root of the problem. The sizes of data I'm mallocing are not overly large, and I am freeing them in the right places. I had a lot of practice with this while learning C. The problem was not the fft algorithm either, nor even my 2D implementation of it.
All I missed was the scaling by 1/(M*N) at the very end of my ifft code. Because the image is 512x512, I needed to scale my ifft output by 1/(512*512). Also, my fft looks like white noise because the pixel data was not rescaled to fit between 0 and 255.
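To illustrate the 1/(M*N) point, here is a small sketch that uses a slow reference DFT rather than the FFT from the question. The names dft1d and dft2d are hypothetical, and the sign convention is an assumption. The forward pass is left unscaled and the inverse pass divides by rows*cols, which is what makes the round trip return the original pixels.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Unscaled 1D DFT over n elements starting at data, spaced stride apart, written back in place.
void dft1d(cd* data, std::size_t n, std::size_t stride, int sign) {
    const double pi = std::acos(-1.0);
    std::vector<cd> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = sign * 2.0 * pi * static_cast<double>((k * t) % n) / static_cast<double>(n);
            out[k] += data[t * stride] * std::polar(1.0, angle);
        }
    }
    for (std::size_t k = 0; k < n; ++k) data[k * stride] = out[k];
}

// 2D transform of a row-major image: every row, then every column.
// The inverse additionally scales by 1/(rows*cols).
void dft2d(std::vector<cd>& img, std::size_t rows, std::size_t cols, bool inverse) {
    const int sign = inverse ? +1 : -1;
    for (std::size_t r = 0; r < rows; ++r) dft1d(&img[r * cols], cols, 1, sign);
    for (std::size_t c = 0; c < cols; ++c) dft1d(&img[c], rows, cols, sign);
    if (inverse) {
        const double scale = 1.0 / static_cast<double>(rows * cols);
        for (cd& v : img) v *= scale;
    }
}
```

Without the final scaling step, every pixel of a 512x512 round trip would come back 262144 times too large, which saturates to pure black and white once it is clamped to 0..255.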
Suggest you look at the article http://www.yolinux.com/TUTORIALS/C++MemoryCorruptionAndMemoryLeaks.html
Christophe has a good point, but he is wrong about it not being related to the problem, because using malloc instead of new does not initialise memory or select the best data type, which could result in all of the problems listed below:
Possible causes are:
The sign of a number changing somewhere. I have seen similar issues when a platform invoke has been used on a DLL and a value is passed by value instead of by reference. It is caused by memory not necessarily being empty, so when your image data enters it will have boolean maths performed on its values. I would suggest that you make sure memory is empty before you put your image data there.
Memory rotating right (ROR in assembly language) or left (ROL). This will occur if data types are being used which do not necessarily match, e.g. a signed value entering an unsigned data type, or if the number of bits differs from one variable to another.
Data being lost due to an unsigned value entering a signed variable. The outcome is 1 bit being lost because it is used to determine negative or positive, or at the extremes, if two's complement takes place, the number will become inverted in meaning. Look up two's complement on Wikipedia.
Also see how memory should be cleared/assigned before use. http://www.cprogramming.com/tutorial/memory_debugging_parallel_inspector.html

Implementing FFT low-pass filter in C with FFTW

I am trying to create a very simple C++ program that, given an argument in the range [0-100], applies a low-pass filter to a grayscale image that should "compress" it proportionally to the value of the given argument.
I am using the FFTW library.
I have some doubts about how I define the frequency threshold, cut. Is there any more effective way to define such value?
//fftw_complex *fft
//double[] magnitude
// . . .
int percent = 100;
if (percent < 0 || percent > 100) {
cerr << "Compression rate must be a value between 0 and 100." << endl;
return -1;
}
double cut =(double)(w*h) * ((double)percent / (double)100);
for (i = 0; i < (w * h); i++) {
magnitude[i] = sqrt(pow(fft[i][0], 2.0) + pow(fft[i][1], 2.0));
if (magnitude[i] < cut) {
fft[i][0] = 0.0;
fft[i][1] = 0.0;
}
}
Update1:
I've changed my code to this, but again I'm not sure this is a proper way to filter frequencies. The image is surely compressed, but non-square images are messed up and setting compression to 100% isn't the real maximum compression available (I can go up to ~140%).
Here you can find an image of what I see now.
int cX = w/2;
int cY = h/2;
cout<<"TEST "<<((double)percent/(double)100)*h<<endl;
for(i = 0; i<(w*h);i++){
int row = i/s;
int col = i%s;
int distance = sqrt((col-cX)*(col-cX)+(row-cY)*(row-cY));
if(distance<((double)percent/(double)100)*min(cX,cY)){
fft[i][0] = 0.0;
fft[i][1] = 0.0;
}
}
This is not a low-pass filter at all. A low-pass filter passes low frequencies, i.e. it removes fine details (blurring). You obviously need a 2D FFT for that.
This code just removes random bits, essentially.
[edit]
The new code looks a lot more like a low-pass filter. The 141% setting is expected: the diagonal of a square is sqrt(2)=1.41 times its side. Converting an index into a row/column pair should use the image width, not some random unexplained s.
I don't know where your zero frequency is located. That should be easy to spot (largest value) but it might be in (0,0) instead of (w/2,h/2)
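If the zero frequency does turn out to sit at (0,0), the distance has to be measured with wrap-around, because bins past the midpoint hold negative frequencies. A minimal sketch under that assumption follows. It uses std::complex<double> as a stand-in for the FFTW output array, lowPass is a hypothetical name, and keepPercent is assumed to mean the share of the frequency radius that is kept (100 keeps everything), which may be the opposite of the "compression" percentage in the question.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Zeroes every bin whose frequency radius exceeds keepPercent% of the largest radius.
// Assumes a row-major w*h spectrum with the zero frequency at index 0 (unshifted layout).
void lowPass(std::vector<std::complex<double>>& fft, int w, int h, double keepPercent) {
    const double maxRadius = std::hypot(static_cast<double>(w / 2), static_cast<double>(h / 2));
    const double cutoff = keepPercent / 100.0 * maxRadius;
    for (int row = 0; row < h; ++row) {
        for (int col = 0; col < w; ++col) {
            // Bins past the midpoint represent negative frequencies, so wrap them.
            const int fy = (row <= h / 2) ? row : row - h;
            const int fx = (col <= w / 2) ? col : col - w;
            const double radius = std::hypot(static_cast<double>(fx), static_cast<double>(fy));
            if (radius > cutoff) {
                fft[static_cast<std::size_t>(row) * w + col] = 0.0;
            }
        }
    }
}
```

Note that the row and column come from the image width w, and that the maximum radius is the half-diagonal, so 100% covers the corners and the 141% effect goes away.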

Floyd Steinberg Dithering gray(pgm ascii) to black-white (pbm ascii)

I have an image in PGM format.
After using this function:
void convertWithDithering(array_type& pixelgray)
{
int oldpixel;
int newpixel;
int quant_error;
for (int y = 0; y< HEIGHT-1; y++){
for (int x = 1; x<WIDTH-1; x++){
oldpixel = pixelgray[x][y];
newpixel = (oldpixel > 128) ? 0 : 1;
pixelgray[x][y] = newpixel;
quant_error = oldpixel - newpixel;
pixelgray[x+1][y] = pixelgray[x+1][y] + 7/16 * quant_error;
pixelgray[x-1][y+1] = pixelgray[x-1][y+1] + 3/16 * quant_error;
pixelgray[x ][y+1]=pixelgray[x ][y+1]+ 5/16 * quant_error;
pixelgray[x+1][y+1] = pixelgray[x+1][y+1]+ 1/16 * quant_error;
}
}
}
I get this.
I want to get the same image, only in black and white.
Last time I had a similar smearing with a PGM file, it was because I saved data in a file opened by fopen(filename,"w");
The file had a lot of \r\n line endings (OS: Windows), whereas it needed only \n. Maybe your issue is something like that. Save the file in binary format, with
fopen(filename,"wb");
Edit: Aside from the smearing, your implementation of Floyd–Steinberg dithering is incorrect.
First, your error propagation should be XXX * quant_error / 16 instead of XXX/16 * quant_error (which will always be equal to 0, because XXX/16 is an integer division).
Then, you are mixing up the two color spaces (0/1 and 0->255). A correct way to handle it is to always use the 0->255 space by changing the test line to
newpixel = (oldpixel > 128) ? 255 : 0;
(note that the order 255 : 0 is important; if you use 0 : 255, the algorithm won't work)
At the end of the function, your array will be full of 0 or 255. If you want, you can iterate one more time to convert it to 0-1, but I think it is easier to do it when you write your PBM file.
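Putting both corrections together, a minimal sketch could look like the following. It assumes a row-major std::vector<int> instead of the array_type from the question, floydSteinberg is a hypothetical name, and it also processes the border pixels (skipping neighbours that fall outside the image), which the original loops did not.

```cpp
#include <cstddef>
#include <vector>

// In-place Floyd-Steinberg dithering of a row-major grayscale image (values 0..255).
// Afterwards every pixel is either 0 or 255.
void floydSteinberg(std::vector<int>& px, int width, int height) {
    // Adds delta to a neighbour if it lies inside the image.
    auto add = [&](int x, int y, int delta) {
        if (x >= 0 && x < width && y < height) {
            px[static_cast<std::size_t>(y) * width + x] += delta;
        }
    };
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t idx = static_cast<std::size_t>(y) * width + x;
            const int oldPixel = px[idx];
            const int newPixel = (oldPixel > 128) ? 255 : 0;  // stay in the 0..255 space
            px[idx] = newPixel;
            const int err = oldPixel - newPixel;
            // Multiply first, divide last, so integer division does not zero the weights.
            add(x + 1, y,     err * 7 / 16);
            add(x - 1, y + 1, err * 3 / 16);
            add(x,     y + 1, err * 5 / 16);
            add(x + 1, y + 1, err * 1 / 16);
        }
    }
}
```

When writing the PBM file, remember that in PBM a 1 means black, so 255 maps to 0 and 0 maps to 1.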
It looks like the conversion to PBM outside of this function is doing something wrong. It looks like it is converting one PGM pixel to several PBM pixels, which results in this 'smearing' effect. Just a wild guess though.
The function itself looks okay to me, apart from one little thing: because 5/16 is an integer division, all your 5/16 * quant_error terms will be zero. Rather use floats or doubles and make it 5.0/16.0.

Map float values(0.0, 100.0) into RGB

I have around 1000 float values in the range(0.0, 100.0) and I want to map these values into a color(RGB). What I did so far is to create a colormap with 1000 color(RGB) values, use the float values to index the colormap and get an RGB value.
But the problem is, I'm losing precision since I cast float values into int before using them as indices to my colormap. What is the best way to do this float to RGB conversion?
EDIT:
color color_list[100];
float float_values[1000]
for(i = 0 to 999)
{
int colormap_idx = float_values[i]; // Note that the float is converted into an int
color current_color = color_list[colormap_idx];
}
The total number of RGB values you can have is 256^3. It would be nice if you could utilize all of them, but sometimes it can be hard to come up with a nice intuitive mapping. Since there are a total of 256^4 possible floats (more than the possible RGB values) you will lose precision no matter what you do, but you can still do much, much better than what you currently do.
I don't know exactly what you are doing with the pre-defined color map, but consider defining only a few intermediate colors that correspond to a few intermediate floating values and interpolating each input floating point value. In the code below, fsample and csample are your corresponding points. For example:
fsample[0] = 0.0 -> csample[0] = (0, 0, 0)
fsample[1] = 0.25 -> csample[1] = (0, 0, 100)
fsample[2] = 0.5 -> csample[2] = (0, 170, 170)
fsample[3] = 0.75 -> csample[3] = (170, 170, 0)
fsample[4] = 1.0 -> csample[4] = (255, 255, 255)
This will allow you to cover a lot more ground in RGB space with floats, allowing a higher precision conversion, while still giving you some power to flexibly define intermediate colors. This is a fairly common method to convert grayscale to color.
There are a few optimizations and error checks you can apply to this code, but I left it unoptimized for the sake of clarity:
int N = float_values.size();
color colormap[N];
for(i = 0 to N-1)
{
colormap[i] = RGBFromFloat(float_values[i], fsample, csample, num_samples);
}
color RGBFromFloat(float in, float fsample[], color csample[], int num_samples)
{
// default covers the case where 'in' equals the last sample value
color out = csample[num_samples-1];
// find the interval that the input 'in' lies in
// this is a simple search on an ordered array...
// consider replacing with a better algorithm for a large number of samples
for(i = 0 to num_samples-2)
{
if(fsample[i] <= in && in < fsample[i+1])
{
out = interpolate(fsample[i], fsample[i+1], csample[i], csample[i+1], in);
break;
}
}
return out;
}
color interpolate(float flow, float fhigh, color clow, color chigh, float in)
{
float t = (in-flow)/(fhigh-flow);
return clow*(1 - t) + chigh*t;
}
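For reference, the pseudocode above could be turned into compilable C++ roughly as follows. This is a sketch under assumptions: Color, Stop, lerpChannel and colorFromFloat are hypothetical names, the stops are expected to be non-empty and sorted by value, and inputs outside the sampled range are clamped to the first or last color.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Color { std::uint8_t r, g, b; };

// One sample point: an input value and the color assigned to it (hypothetical type).
struct Stop { float value; Color color; };

// Linear interpolation of a single 8-bit channel, rounded to nearest.
std::uint8_t lerpChannel(std::uint8_t lo, std::uint8_t hi, float t) {
    return static_cast<std::uint8_t>(std::lround(lo + (hi - lo) * t));
}

// Maps 'in' to a color by interpolating between the two surrounding stops.
// Assumes 'stops' is non-empty and sorted by ascending value.
Color colorFromFloat(float in, const std::vector<Stop>& stops) {
    if (in <= stops.front().value) return stops.front().color;
    if (in >= stops.back().value) return stops.back().color;
    for (std::size_t i = 0; i + 1 < stops.size(); ++i) {
        if (in < stops[i + 1].value) {
            const Stop& a = stops[i];
            const Stop& b = stops[i + 1];
            const float t = (in - a.value) / (b.value - a.value);
            return { lerpChannel(a.color.r, b.color.r, t),
                     lerpChannel(a.color.g, b.color.g, t),
                     lerpChannel(a.color.b, b.color.b, t) };
        }
    }
    return stops.back().color;
}
```

For values in the range 0 to 100, either divide the input by 100 first or define the stop values on that scale.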
I don't know if this is the best method (since you gave us no optimality criteria), but if by "I'm losing precision" you mean that once converted to int, you only have a maximum of 100 different color combinations, then you can do this:
// this code is C99
#define MAX_FLOAT_VAL 100.0
#define N_COLORS 2000
#define N_FLOAT_SAMPLES 1000
color color_list[N_COLORS];
float float_values[N_FLOAT_SAMPLES];
// the following loop must be placed in some function
for( int i = 0; i < N_FLOAT_SAMPLES; i++ )
{
// the following assignment will map
// linearly a float in the range [0 ... MAX_FLOAT_VAL]
// into an int in the range [0 ... (N_COLORS-1)]
int colormap_idx = (float_values[i] / MAX_FLOAT_VAL) * (N_COLORS - 1);
color current_color = color_list[colormap_idx];
// ... do something with current_color ...
}
Of course you still have to generate the entries in color_list with a suitable algorithm (I advise against doing that by hand :-). This is a whole different problem, since it involves more "degrees of freedom": you are trying to map a 1-D space (the values of colormap_idx) to a 3-D space (the set of all the possible RGB triples).
P.S: the requirements you seem to have remind me of the computations needed to colorize a fractal like the graphic representation of the Mandelbrot set.
Hope this helps.

Modifying data from uint8_t array very slow?

I'm currently trying to optimize my code to run a bit faster. Currently it is taking over 30ms to update about 3776000 bytes. If I remove the outPx updates inside my function it runs at about 3ms, meaning that the updates to outPx are what is making the function slower.
Any potential feedback on how to improve the speed of my function below would be greatly appreciated.
uint8_t* outPx = (uint8_t*)out.data;
for (int px=0; px<pxSize; px+=4)
{
newTopAlpha = (alpha*inPx[px+3]);
if (0xff == newTopAlpha)
{
// top is opaque covers entire bottom
// set copy over BGR colors
outPx[px] = inPx[px];
outPx[px+1] = inPx[px+1];
outPx[px+2] = inPx[px+2];
outPx[px+3] = 0xff; //Fully opaque
}
else if (0x00 != newTopAlpha)
{
// top is not completely transparent
topAlpha = newTopAlpha/(float)0xff;
bottomAlpha = outPx[px+3]/(float)0xff;
newAlpha = topAlpha + bottomAlpha*(1-topAlpha);
alphaChange = bottomAlpha*(1-topAlpha);
outPx[px] = (uint8_t)((inPx[px]*topAlpha + outPx[px]*alphaChange)/newAlpha);
outPx[px+1] = (uint8_t)((inPx[px+1]*topAlpha + outPx[px+1]*alphaChange)/newAlpha);
outPx[px+2] = (uint8_t)((inPx[px+2]*topAlpha + outPx[px+2]*alphaChange)/newAlpha);
outPx[px+3] = (uint8_t)(newAlpha*0xff);
}
}
uint8_t is an exact width integer type, meaning that you demand the compiler to allocate exactly that much memory for your type. If your system has an alignment requirement, this may cause the code to run slower.
Change uint8_t to uint_fast8_t. This tells the compiler that you want this variable to be 8 bits if possible, but that it is ok to use a larger size if it makes the code faster.
Apart from that, there are lots of things that could cause bad performance, in which case you need to state what system and compiler you are using.
Your code is doing floating point divides, and conversion from byte to float and back again. If you use integer math, it is highly likely more efficient.
Even doing this simple conversion to multiply instead of divide may help quite a bit:
newAlpha = 1/(topAlpha + bottomAlpha*(1-topAlpha));
...
outPx[px] = (uint8_t)((inPx[px]*topAlpha + outPx[px]*alphaChange)*newAlpha);
Multiply tends to be much faster than divide.
OK, if this really is the bottleneck, and you can't use the GPU / built-in methods for some random reason, then there is a lot you can do:
uint8_t *outPx = (uint8_t*) out.data;
const int cAlpha = (int) (alpha * 256.0f + 0.5f);
for( int px = 0; px < pxSize; px += 4 ) {
const int topAlpha = (cAlpha * (int) inPx[px|3]) >> 8; // note | not + for tiny speed boost
if( topAlpha == 255 ) {
memcpy( &outPx[px], &inPx[px], 4 ); // might be slower than per-component copying; benchmark!
} else if( topAlpha ) {
const int bottomAlpha = (int) outPx[px|3];
const int alphaChange = (bottomAlpha * (255 - topAlpha)) / 255;
const int newAlpha = topAlpha + alphaChange;
outPx[px ] = (uint8_t) ((inPx[px ]*topAlpha + outPx[px ]*alphaChange) / newAlpha);
outPx[px|1] = (uint8_t) ((inPx[px|1]*topAlpha + outPx[px|1]*alphaChange) / newAlpha);
outPx[px|2] = (uint8_t) ((inPx[px|2]*topAlpha + outPx[px|2]*alphaChange) / newAlpha);
outPx[px|3] = (uint8_t) newAlpha;
}
}
The main change is that there is no floating point arithmetic any more (I might have missed a /255 or something, but you get the idea). I also removed repeated calculations and used bit operators where possible. Another optimisation would be to use fixed-precision arithmetic to change the 3 divides into a single divide and 3 multiply/bitshifts. But you'd have to benchmark to confirm that actually helps. The memcpy might be faster. Again, you need to benchmark.
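The "single divide and 3 multiply/bitshifts" idea could look something like the sketch below for the partially transparent branch. blendPixel is a hypothetical helper, topAlpha is assumed to be in 1..254, and the 16.16 fixed-point reciprocal is truncated, so a channel can come out one lower than with a real division. Treat it as something to benchmark and compare, not as a drop-in replacement.

```cpp
#include <cstdint>

// Blends one 4-byte pixel 'in' over 'out' using integer math only.
// topAlpha is the already scaled source alpha, assumed to be in 1..254.
void blendPixel(const std::uint8_t* in, std::uint8_t* out, int topAlpha) {
    const int bottomAlpha = out[3];
    const int alphaChange = bottomAlpha * (255 - topAlpha) / 255;
    const int newAlpha = topAlpha + alphaChange;  // 1..255 because topAlpha >= 1
    // The only divide per pixel: a 16.16 fixed-point reciprocal of newAlpha.
    const std::uint32_t inv = (1u << 16) / static_cast<std::uint32_t>(newAlpha);
    for (int c = 0; c < 3; ++c) {
        // num <= 255 * newAlpha, so num * inv fits in 32 bits.
        const std::uint32_t num =
            static_cast<std::uint32_t>(in[c] * topAlpha + out[c] * alphaChange);
        out[c] = static_cast<std::uint8_t>((num * inv) >> 16);
    }
    out[3] = static_cast<std::uint8_t>(newAlpha);
}
```

Whether this beats three integer divides depends on the compiler and CPU, so measure it with optimisations enabled.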
Finally, if you know something about the images, you could give the compiler hints about the branching. For example, in GCC you can say if( __builtin_expect( topAlpha == 255, 1 ) ) if you know that most of the image is solid colour, and alpha is 1.0.
Update based on comments:
And for the love of sanity, never (never) benchmark with optimisations turned off.