How to calculate the RGB values of a pixel from the luminance? - c++

I want to compute the RGB values from the luminance.
The data I know are:
the new luminance (the value that I want to apply)
the old luminance
the old RGB values.
We can compute the luminance from the RGB values like this:
uint8_t luminance = R * 0.21 + G * 0.71 + B * 0.07;
My code is:
// We create a function to set the luminance of a pixel
void jpegImage::setLuminance(uint8_t newLuminance, unsigned int x, unsigned int y) {
    // If the X or Y value is out of range, we throw an error
    if(x >= width) {
        throw std::runtime_error("Error : in jpegImage::setLuminance : The X value is out of range");
    }
    else if(y >= height) {
        throw std::runtime_error("Error : in jpegImage::setLuminance : The Y value is out of range");
    }

    // If the image is monochrome
    if(pixelSize == 1) {
        // We set the pixel value to the luminance
        pixels[y][x] = newLuminance;
    }
    // Else if the image is colored
    else if(pixelSize == 3) {
        // I don't know how to proceed
        // My image is stored in a std::vector<std::vector<uint8_t>> pixels;
        // This is a list that contains the lines of the image
        // Each line contains the RGB values of its pixels
        // For example, an image with 2 columns and 3 lines:
        // [[R, G, B, R, G, B], [R, G, B, R, G, B], [R, G, B, R, G, B]]
        // For example, the R value with x = 23, y = 12 is:
        // pixels[12][23 * pixelSize];
        // For example, the B value with x = 23, y = 12 is:
        // pixels[12][23 * pixelSize + 2];
        // (If the image is colored, the pixelSize will be 3 (R, G and B))
        // (If the image is monochrome, the pixelSize will be 1 (just the luminance value))
    }
}
How can I proceed?
Thanks!

You don't need the old luminance if you have the original RGB.
I'm referencing https://www.fourcc.org/fccyvrgb.php for the YUV to RGB conversion.
Compute U and V from original RGB:
```
V = (0.439 * R) - (0.368 * G) - (0.071 * B) + 128
U = -(0.148 * R) - (0.291 * G) + (0.439 * B) + 128
```
Y is the new luminance normalized to a value between 0 and 255
Then just convert back to RGB:
```
B = 1.164 * (Y - 16) + 2.018 * (U - 128)
G = 1.164 * (Y - 16) - 0.813 * (V - 128) - 0.391 * (U - 128)
R = 1.164 * (Y - 16) + 1.596 * (V - 128)
```
Make sure you clamp the computed result of each equation to the range 0..255. Some of these formulas can convert a YUV or RGB value to something less than 0 or higher than 255.
There are also multiple formulas for converting between YUV and RGB (different constants). I noticed the page listed above has a different computation for Y than the one you cited. They are all relatively close, with different precisions and adjustments. For just changing the brightness of a pixel, almost any formula will do.
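Applied to the layout described in the question, the whole round trip for one pixel could be wrapped in a small helper along these lines (a minimal, untested sketch using the constants above; setPixelLuminance is a hypothetical name, and it assumes the interleaved R, G, B row layout the question describes):
```
#include <cstdint>
#include <vector>
#include <algorithm>

// Hypothetical helper: overwrite the luminance of one RGB pixel stored as
// interleaved bytes (R, G, B) in a row, keeping its chrominance.
void setPixelLuminance(std::vector<uint8_t>& row, unsigned int x, uint8_t newLuminance)
{
    double R = row[x * 3];
    double G = row[x * 3 + 1];
    double B = row[x * 3 + 2];

    // Chrominance computed from the old RGB (left unchanged)
    double V =  (0.439 * R) - (0.368 * G) - (0.071 * B) + 128;
    double U = -(0.148 * R) - (0.291 * G) + (0.439 * B) + 128;

    // The new luminance is the Y value, already in 0..255
    double Y = newLuminance;

    // Convert back to RGB and clamp each channel to 0..255
    auto clamp = [](double v) { return static_cast<uint8_t>(std::min(255.0, std::max(0.0, v))); };
    row[x * 3]     = clamp(1.164 * (Y - 16) + 1.596 * (V - 128));                      // R
    row[x * 3 + 1] = clamp(1.164 * (Y - 16) - 0.813 * (V - 128) - 0.391 * (U - 128));  // G
    row[x * 3 + 2] = clamp(1.164 * (Y - 16) + 2.018 * (U - 128));                      // B
}
```
In the question's pixelSize == 3 branch this would be called as setPixelLuminance(pixels[y], x, newLuminance).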
Updated
I originally deleted this answer after the OP suggested it wasn't working for him. I was too busy for the last few days to investigate, but I wrote some sample code to confirm my hypothesis. At the bottom of this answer is a snippet of GDI+ based code that increases the luminance of an image by a variable amount. Along with the code is the image I tested this on and two conversions: one at 130% brightness, another at 170% brightness.
Here's a sample conversion
Original Image
Updated Image (at 130% Y)
Updated Image (at 170% Y)
Source:
#define CLAMP(val) {val = (val > 255) ? 255 : ((val < 0) ? 0 : val);}
void Brighten(Gdiplus::BitmapData& dataIn, Gdiplus::BitmapData& dataOut, const double YMultiplier=1.3)
{
if ( ((dataIn.PixelFormat != PixelFormat24bppRGB) && (dataIn.PixelFormat != PixelFormat32bppARGB)) ||
((dataOut.PixelFormat != PixelFormat24bppRGB) && (dataOut.PixelFormat != PixelFormat32bppARGB)))
{
return;
}
if ((dataIn.Width != dataOut.Width) || (dataIn.Height != dataOut.Height))
{
// images sizes aren't the same
return;
}
const size_t incrementIn = dataIn.PixelFormat == PixelFormat24bppRGB ? 3 : 4;
const size_t incrementOut = dataOut.PixelFormat == PixelFormat24bppRGB ? 3 : 4;
const size_t width = dataIn.Width;
const size_t height = dataIn.Height;
for (size_t y = 0; y < height; y++)
{
auto ptrRowIn = (BYTE*)(dataIn.Scan0) + (y * dataIn.Stride);
auto ptrRowOut = (BYTE*)(dataOut.Scan0) + (y * dataOut.Stride);
for (size_t x = 0; x < width; x++)
{
uint8_t B = ptrRowIn[0];
uint8_t G = ptrRowIn[1];
uint8_t R = ptrRowIn[2];
uint8_t A = (incrementIn == 3) ? 0xFF : ptrRowIn[3];
auto Y = (0.257 * R) + (0.504 * G) + (0.098 * B) + 16;
auto V = (0.439 * R) - (0.368 * G) - (0.071 * B) + 128;
auto U = -(0.148 * R) - (0.291 * G) + (0.439 * B) + 128;
Y *= YMultiplier;
auto newB = 1.164*(Y - 16) + 2.018*(U - 128);
auto newG = 1.164*(Y - 16) - 0.813*(V - 128) - 0.391*(U - 128);
auto newR = 1.164*(Y - 16) + 1.596*(V - 128);
CLAMP(newR);
CLAMP(newG);
CLAMP(newB);
ptrRowOut[0] = newB;
ptrRowOut[1] = newG;
ptrRowOut[2] = newR;
if (incrementOut == 4)
{
ptrRowOut[3] = A; // keep original alpha
}
ptrRowIn += incrementIn;
ptrRowOut += incrementOut;
}
}
}
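For completeness, a call site for Brighten might look roughly like this (a hedged sketch, not part of the original answer; it assumes GDI+ has already been initialized with GdiplusStartup, and it omits error handling and saving the result):
```
#include <windows.h>
#include <gdiplus.h>

// Sketch: lock a source and a destination bitmap and run Brighten() over them.
void BrightenFile(const wchar_t* path)
{
    Gdiplus::Bitmap in(path);
    Gdiplus::Bitmap out(in.GetWidth(), in.GetHeight(), Gdiplus::PixelFormat24bppRGB);

    Gdiplus::Rect rect(0, 0, in.GetWidth(), in.GetHeight());
    Gdiplus::BitmapData dataIn = {}, dataOut = {};

    in.LockBits(&rect, Gdiplus::ImageLockModeRead, Gdiplus::PixelFormat24bppRGB, &dataIn);
    out.LockBits(&rect, Gdiplus::ImageLockModeWrite, Gdiplus::PixelFormat24bppRGB, &dataOut);

    Brighten(dataIn, dataOut, 1.3); // 130% luminance

    in.UnlockBits(&dataIn);
    out.UnlockBits(&dataOut);
    // ... display or save `out` here
}
```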

Related

BGR -> YCbCr conversion not working correctly

I am trying to manually convert an image from RGB (BGR in OpenCV) to the YCbCr color space.
My image is a PNG color image, 800 pixels wide and 600 pixels high, 3 channels, 16-bit depth.
Here's how I tried solving this.
cv::Mat convertToYCbCr(cv::Mat image) {
    // converts an RGB image to YCbCr
    // cv::Mat: B-G-R
    std::cout << "Converting image to YCbCr color space." << std::endl;
    int i, j;
    for (i = 0; i <= image.cols; i++) {
        for (j = 0; j <= image.rows; j++) {
            // R, G, B values
            auto R = image.at<cv::Vec3d>(j, i)[2];
            auto G = image.at<cv::Vec3d>(j, i)[1];
            auto B = image.at<cv::Vec3d>(j, i)[0];
            // Y'
            auto Y = image.at<cv::Vec3d>(j, i)[0] = 0.299 * R + 0.587 * G + 0.114 * B + 16;
            // Cb
            auto Cb = image.at<cv::Vec3d>(j, i)[1] = 128 + (-0.169 * R - 0.331 * G + 0.5 * B);
            // Cr
            auto Cr = image.at<cv::Vec3d>(j, i)[2] = 128 + (0.5 * R - 0.419 * G - 0.081 * B);
            std::cout << "At conversion: Y = " << Y << ", Cb = " << Cb << ", "
                      << Cr << std::endl;
        }
    }
    std::cout << "Converting finished." << std::endl;
    return image;
}
The image I receive looks like this:
What I am expecting is this (using OpenCV method):
Maybe the vertical lines hint at something? Is my loop wrong? Can I even just "replace" the RGB values with YCbCr values and expect the image to look like the example? typeid() returns the same value for both images, N2cv3MatE.
The primary reason for the incorrect results is the incorrect data type used to access the image. The correct type for accessing 16-bit unsigned pixels is cv::Vec3w (not cv::Vec3d).
The next issue is that the coefficients being used for the conversion are designed for analog signals (YPbPr). For digital images, we have to use coefficients designed for digital images (YCbCr). You can find more details in the Wikipedia article on YCbCr, in the section ITU-R BT.601 conversion.
The piece of information missing from the article is how the coefficients change if the image has 16-bit unsigned or 32-bit floating-point depth. The answer is that we have to scale the coefficients according to the bit depth of the image.
For images with 16 bit unsigned depth, the scaling should be performed as follows:
auto Y = (R * 65.481f * scale) + (G * 128.553f * scale) + (B * 24.966f * scale) + (16.0f * offset);
auto Cb = (R * -37.797f * scale) + (G * -74.203f * scale) + (B * 112.0f * scale) + (128.0f * offset);
auto Cr = (R * 112.0f * scale) + (G * -93.786f * scale) + (B * -18.214f * scale) + (128.0f * offset);
where scale is equal to 257.0/65535.0 and offset is equal to 257.0.
This conversion technique has been adapted from the MATLAB source code of the rgb2ycbcr function, which references the following book describing the scaling:
C.A. Poynton, "A Technical Introduction to Digital Video", John Wiley
& Sons, Inc., 1996, Chapter 9, Page 175
Now that the conversion is done, the third issue is visualizing the image the same way OpenCV does. When we perform the color conversion with OpenCV, the output image is stored in the order YCrCb instead of the usual YCbCr. So, to get the same image with our custom conversion logic, we have to store the values in that order.
A sample conversion code may look like this:
if(image.type() == CV_16UC3)
{
    const float scale = 257.0f / 65535.0f;
    const float offset = 257.0f;
    for (int i = 0; i < image.cols; i++)
    {
        for (int j = 0; j < image.rows; j++)
        {
            auto R = image.at<cv::Vec3w>(j, i)[2];
            auto G = image.at<cv::Vec3w>(j, i)[1];
            auto B = image.at<cv::Vec3w>(j, i)[0];

            auto Y = (R * 65.481f * scale) + (G * 128.553f * scale) + (B * 24.966f * scale) + (16.0f * offset);
            auto Cb = (R * -37.797f * scale) + (G * -74.203f * scale) + (B * 112.0f * scale) + (128.0f * offset);
            auto Cr = (R * 112.0f * scale) + (G * -93.786f * scale) + (B * -18.214f * scale) + (128.0f * offset);

            image.at<cv::Vec3w>(j, i)[0] = (unsigned short)Y;
            image.at<cv::Vec3w>(j, i)[1] = (unsigned short)Cr;
            image.at<cv::Vec3w>(j, i)[2] = (unsigned short)Cb;
        }
    }
}
You should use cv::cvtColor
cvtColor(src, target_image, cv::COLOR_RGB2YCrCb);
Then just flip the second and third channels.
Though you could be getting that error because you're not casting the resulting values to ints.
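A minimal sketch of that approach might look like this (it assumes src is a BGR image as produced by cv::imread, hence the cv::COLOR_BGR2YCrCb flag; use cv::COLOR_RGB2YCrCb as in the line above if your data is RGB-ordered):
```
#include <opencv2/opencv.hpp>
#include <vector>
#include <utility>

// Sketch: convert with cvtColor, then swap the 2nd and 3rd channels to get Y, Cb, Cr.
cv::Mat toYCbCr(const cv::Mat& src)
{
    cv::Mat ycrcb;
    cv::cvtColor(src, ycrcb, cv::COLOR_BGR2YCrCb);  // OpenCV outputs Y, Cr, Cb

    std::vector<cv::Mat> ch;
    cv::split(ycrcb, ch);        // ch[0] = Y, ch[1] = Cr, ch[2] = Cb
    std::swap(ch[1], ch[2]);     // flip the second and third channels

    cv::Mat ycbcr;
    cv::merge(ch, ycbcr);
    return ycbcr;
}
```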

Bilinear interpolation in 2D transformation Qt

I'm currently working on 2D transformations (translation, scaling, shearing and rotation) in Qt. I have a problem with bilinear interpolation, which I want to use to cover the 'black pixels' in the output image. I'm using matrix calculations to get the new coordinates of the pixels of the input image. Then I use the reverse matrix calculation to check which pixel of the input image corresponds to each output pixel. The result of that is a floating-point coordinate, which I use for the interpolation: I check the four neighbouring points and calculate the value (color) of the output pixel. I have checked my calculations by hand and they seem to be correct.
Can anyone find any bug in this code? (I cut out the parts of the code that are responsible for the interface, such as sliders.)
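For reference, the four-neighbour weighting described above looks roughly like this in isolation (a minimal sketch with hypothetical names; it is not part of the project code below):
```
#include <cmath>

// Hypothetical pixel type for the sketch
struct RGB { float r, g, b; };

// Bilinearly interpolate an image at the floating-point position (fx, fy).
// getPixel(x, y) is assumed to return the pixel at integer coordinates.
template <typename GetPixel>
RGB bilinearSample(float fx, float fy, GetPixel getPixel)
{
    int x0 = (int)std::floor(fx), y0 = (int)std::floor(fy);
    float a = fx - x0;   // horizontal weight
    float b = fy - y0;   // vertical weight

    RGB c1 = getPixel(x0,     y0);      // top-left
    RGB c2 = getPixel(x0 + 1, y0);      // top-right
    RGB c3 = getPixel(x0,     y0 + 1);  // bottom-left
    RGB c4 = getPixel(x0 + 1, y0 + 1);  // bottom-right

    auto mix = [&](float v1, float v2, float v3, float v4) {
        return b * ((1.0f - a) * v3 + a * v4) + (1.0f - b) * ((1.0f - a) * v1 + a * v2);
    };
    return { mix(c1.r, c2.r, c3.r, c4.r),
             mix(c1.g, c2.g, c3.g, c4.g),
             mix(c1.b, c2.b, c3.b, c4.b) };
}
```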
Geometric::Geometric(QWidget* parent) : QWidget(parent) {
    resize(1000, 800);
    displayLogoDefault = true;
    a = shx = shy = x0 = y0 = 0;
    scx = scy = 1;
    tx = ty = 0;
    x = 200, y = 200;
    paintT = paintSc = paintR = paintShx = paintShy = false;
    img = new QImage(600, 600, QImage::Format_RGB32);
    img2 = new QImage("logo.jpeg");
}

Geometric::~Geometric() {
    delete img;
    delete img2;
    img = NULL;
    img2 = NULL;
}

void Geometric::makeChange() {
    displayLogoDefault = false;

    // iteration through the whole input image
    for(int i = 0; i < img2->width(); i++) {
        for(int j = 0; j < img2->height(); j++) {
            // calculate new coordinates based on the given 2D transformation values
            // (I calculated that formula earlier by multiplying/adding matrices)
            x = cos(a)*scx*(i-x0) - sin(a)*scy*(j-y0) + shx*sin(a)*scx*(i-x0) + shx*cos(a)*scy*(j-y0);
            y = shy*(x) + sin(a)*scx*(i-x0) + cos(a)*scy*(j-y0);
            // tx and ty go for translation, scx and scy for scaling,
            // shx and shy for shearing, and a is the angle for rotation
            x += (x0 + tx);
            y += (y0 + ty);
            if(x >= 0 && y >= 0 && x < img->width() && y < img->height()) {
                // reverse matrix calculation formula to find the proper pixel of the input image
                float tmx = x - x0 - tx;
                float tmy = y - y0 - ty;
                float recX = 1/scx * ( cos(-a)*( (tmx + shx*shy*tmx - shx*tmx) ) + sin(-a)*( shy*tmx - tmy ) ) + x0;
                float recY = 1/scy * ( sin(-a)*(tmx + shx*shy*tmx - shx*tmx) - cos(-a)*(shy*tmx-tmy) ) + y0;
                // here the interpolation starts; I calculate the color based on four points from the input image
                // those points are taken from the reverse matrix calculation
                float a = recX - floorf(recX);
                float b = recY - floorf(recY);
                if(recX + 1 > img2->width()) recX -= 1;
                if(recY + 1 > img2->height()) recY -= 1;
                QColor c1 = QColor(img2->pixel(recX, recY));
                QColor c2 = QColor(img2->pixel(recX + 1, recY));
                QColor c3 = QColor(img2->pixel(recX, recY + 1));
                QColor c4 = QColor(img2->pixel(recX + 1, recY + 1));
                float colR = b * ((1.0 - a) * (float)c3.red() + a * (float)c4.red()) + (1.0 - b) * ((1.0 - a) * (float)c1.red() + a * (float)c2.red());
                float colG = b * ((1.0 - a) * (float)c3.green() + a * (float)c4.green()) + (1.0 - b) * ((1.0 - a) * (float)c1.green() + a * (float)c2.green());
                float colB = b * ((1.0 - a) * (float)c3.blue() + a * (float)c4.blue()) + (1.0 - b) * ((1.0 - a) * (float)c1.blue() + a * (float)c2.blue());
                if(colR > 255) colR = 255;
                if(colG > 255) colG = 255;
                if(colB > 255) colB = 255;
                if(colR < 0) colR = 0;
                if(colG < 0) colG = 0;
                if(colB < 0) colB = 0;
                paintPixel(x, y, colR, colG, colB);
            }
        }
    }
    // x0 and y0 are the starting point of the image
    x0 = abs(x-tx);
    y0 = abs(y-ty);
    repaint();
}

// function painting a pixel; it works directly on memory
void Geometric::paintPixel(int i, int j, int r, int g, int b) {
    unsigned char *ptr = img->bits();
    ptr[4 * (img->width() * j + i)] = b;
    ptr[4 * (img->width() * j + i) + 1] = g;
    ptr[4 * (img->width() * j + i) + 2] = r;
}

void Geometric::paintEvent(QPaintEvent*) {
    QPainter p(this);
    p.drawImage(0, 0, *img);
    if (displayLogoDefault == true) p.drawImage(0, 0, *img2);
}

Merge two Mat-Objects in OpenCV

I wrote a little function in order to be able to "stick" the pixels of one image on top of another, but it somehow doesn't work: while the "shapes" of my pasted image are right, the colors are not.
The example flower is the first image, and as a second image I had a black trapezoid png. As you can see, there are multiple problems:
1. The colors are shown weirdly. Actually there are no colors, just greyscale, and some weird stripes as an overlay.
2. Alpha values are not respected. The white part of the overlay image is transparent in the png.
Here is my code:
void mergeMats(Mat mat1, Mat mat2, int x, int y){
    //unsigned char * pixelPtr = (unsigned char *)mat2.data;
    //unsigned char * pixelPtr2 = (unsigned char *)mat1.data;
    //int cn = mat2.channels();
    //int cn2 = mat2.channels();
    //Scalar_<unsigned char> bgrPixel;
    for (int i = 0; i < mat2.cols; i++){
        for (int j = 0; j < mat2.rows; j++){
            if (x + i < mat1.cols && y + j < mat1.rows){
                Vec3b &intensity = mat1.at<Vec3b>(j+y, i+x);
                Vec3b intensity2 = mat2.at<Vec3b>(j, i);
                for (int k = 0; k < mat1.channels(); k++) {
                    intensity.val[k] = saturate_cast<uchar>(intensity2.val[k]);
                }
                //pixelPtr2[(i + x)*mat1.cols*cn2 + (j + y)*cn2 + 0] = pixelPtr[(i + x)*mat2.cols*cn + (j + y)*cn + 0];
                //pixelPtr2[(i + x)*mat1.cols*cn2 + (j + y)*cn2 + 1] = pixelPtr[(i + x)*mat2.cols*cn + (j + y)*cn + 1];
                //pixelPtr2[(i + x)*mat1.cols*cn2 + (j + y)*cn2 + 2] = pixelPtr[(i + x)*mat2.cols*cn + (j + y)*cn + 2];
            }
        }
    }
}
The commented code was another approach, but had the same results.
So, here are my questions:
1. How do I solve the two problems (1. the colors..., 2. alpha...)?
2. How is the pixel array of a Mat object actually organized? I guess it would be easier for me to manipulate those arrays if I knew what's in them.
Because you are iterating mat2 with the wrong type. Change Vec3b intensity2 = mat2.at<Vec3b>(j, i); to:
Vec4b intensity2 = mat2.at<Vec4b>(j, i);
and the weird stripes are eliminated. And use intensity2[3] to deal with the alpha channel.
Assume that you are reading the black trapezoid png file using the -1 flag:
auto trapezoidImg = cv::imread("trapezoid.png", -1);
where the -1 flag specifies that the alpha channel is read. Then trapezoidImg is organized in the following format:
[B, G, R, A, B, G, R, A, ......;
B, G, R, A, B, G, R, A, ......;
......
B, G, R, A, B, G, R, A, ......]
You can print out trapezoidImg, for example using std::cout, to figure out this format.
If you read trapezoidImg using at<Vec3b>, what you get in fact is (B, G, R), (A, B, G), (R, A, B), ......, and this is where the weird stripes come from. Therefore, use at<Vec4b> to read the (B, G, R, A) intensity correctly.
Next, you should define what to do with the alpha channel. You can mix the two Mats, or let one override the other, whatever you need. One simple method is to override mat1 only when the alpha channel in mat2 is large enough:
cv::Vec3b &intensity = mat1.at<cv::Vec3b>(j + y, i + x);
cv::Vec4b intensity2 = mat2.at<cv::Vec4b>(j, i);
for (int k = 0; k < mat1.channels(); k++) {
    if (intensity2.val[3] > 250){ // 3 for alpha channel
        intensity.val[k] = cv::saturate_cast<uchar>(intensity2.val[k]);
    }
}
This is enough to deal with your black trapezoid png with a transparent background. Or further extend the rule by mixing the two Mats:
cv::Vec3b &intensity = mat1.at<cv::Vec3b>(j + y, i + x);
cv::Vec4b intensity2 = mat2.at<cv::Vec4b>(j, i);
auto alphaValue = cv::saturate_cast<uchar>(intensity2.val[3]);
auto alpha = alphaValue / 255.0;
for (int k = 0; k < 3; k++) { // BGR channels only
    intensity.val[k] = cv::saturate_cast<uchar>(
        intensity2.val[k] * alpha + intensity.val[k] * (1.0 - alpha));
}

How to understand this RayTracer code [closed]

Closed. This question needs details or clarity. It is not currently accepting answers. Closed 9 years ago.
So this ray tracer (RT) code creates a 3D image, with blur, through raw code. How is that actually done without any modelling tools?
I am currently working to understand how ray tracers work and the different ways to implement them, so it was kind of cool to see such a small amount of code producing a pretty impressive 3D image.
#include <stdlib.h> // card > aek.ppm
#include <stdio.h>
#include <math.h>
#include <fstream>
typedef int i;
typedef float f;
struct v {
f x, y, z;
v operator+(v r) {
return v(x + r.x, y + r.y, z + r.z);
}
v operator*(f r) {
return v(x * r, y * r, z * r);
}
f operator%(v r) {
return x * r.x + y * r.y + z * r.z;
}
v() {}
v operator^(v r) {
return v(y * r.z - z * r.y, z * r.x - x * r.z, x * r.y - y * r.x);
}
v(f a, f b, f c) {x = a; y = b; z = c;}
v operator!() {
return*this * (1 / sqrt(*this % *this));
}
};
i G[] = {247570, 280596, 280600, 249748, 18578, 18577, 231184, 16, 16};
f R()
{
return(f)rand() / RAND_MAX;
}
i T(v o, v d, f&t, v&n)
{
t = 1e9; i m = 0;
f p = -o.z / d.z;
if(.01 < p)t = p, n = v(0, 0, 1), m = 1;
for(i k = 19; k--;)
for(i j = 9; j--;)if(G[j] & 1 << k) {
v p = o + v(-k, 0, -j - 4);
f b = p % d, c = p % p - 1, q = b * b - c;
if(q > 0) {
f s = -b - sqrt(q);
if(s < t && s > .01)
t = s, n = !(p + d * t), m = 2;
}
}
return m;
} v S(v o, v d)
{
f t;
v n;
i m = T(o, d, t, n);
if(!m)return v(.7, .6, 1) * pow(1 - d.z, 4);
v h = o + d * t, l = !(v(9 + R(), 9 + R(), 16) + h * -1), r = d + n * (n % d * -2);
f b = l % n; if(b < 0 || T(h, l, t, n))b = 0;
f p = pow(l % r * (b > 0), 99);
if(m & 1) {
h = h * .2;
return((i)(ceil(h.x) + ceil(h.y)) & 1 ? v(3, 1, 1) : v(3, 3, 3)) * (b * .2 + .1);
} return v(p, p, p) + S(h, r) * .5;
} i
main()
{
FILE * pFile;
pFile = fopen("d:\\myfile3.ppm", "w");
fprintf(pFile,"P6 512 512 255 ");
v g = !v(-6, -16, 0), a = !(v(0, 0, 1) ^ g) * .002, b = !(g ^ a) * .002, c = (a + b) * -256 + g;
for(i y = 512; y--;)
for(i x = 512; x--;) {
v p(13, 13, 13);
for(i r = 64; r--;) {
v t = a * (R() - .5) * 99 + b * (R() - .5) * 99;
p = S(v(17, 16, 8) + t, !(t * -1 + (a * (R() + x) + b * (y + R()) + c) * 16)) * 3.5 + p;
}
fprintf(pFile, "%c%c%c", (i)p.x, (i)p.y, (i)p.z);
}
}
My dear friend, that's Paul Heckbert's code, right?
You could at least mention it.
For people thinking this code is unreadable, here is why:
this guy made code that could fit on a credit card; that was the goal :)
His website: http://www.cs.cmu.edu/~ph/
Edit: knowing the source of this code may help you understand it, even if it's not your main motivation...
If you are really interested in raytracing, start with other sources.
Take a look at this website http://www.scratchapixel.com/lessons/3d-basic-lessons/lesson-1-writing-a-simple-raytracer/source-code/ (plus, it talks about your code).
This code is not really special. It is basically a ray tracer that was obfuscated into a form that makes it fit on a business card (see https://www.cs.cmu.edu/~ph/).
How is that actually done without any modelling tools?
You don't need tools to render anything. You could even create a complete game like WoW (or whatever is hip at the moment) without any modelling tool. Modelling tools just make your life easier w.r.t. certain kinds of scenes (read: very complex ones).
You could always hardcode these data, or hack them manually into some external file.
You could also use parametric generators; Perlin Noise is one of the more popular examples thereof.
In a ray tracer, it happens to be very simple to start out without modelling tools, as it is very simple to calculate geometric intersections between the rendering primitive "ray" and any finite geometric primitive. E.g., intersecting a non-approximated, "perfect" sphere is just a few lines of code.
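For instance, a readable ray/unit-sphere intersection might look like this (a minimal sketch; the obfuscated T() function above packs essentially the same math into one line):
```
#include <cmath>

struct Vec3 { float x, y, z; };

float dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Returns the distance t along the ray (origin o, normalized direction d) to a
// unit sphere centered at c, or a negative value if the ray misses it.
float intersectSphere(const Vec3& o, const Vec3& d, const Vec3& c)
{
    Vec3 p{o.x - c.x, o.y - c.y, o.z - c.z};  // ray origin relative to the sphere
    float b = dot(p, d);
    float q = b * b - (dot(p, p) - 1.0f);     // discriminant of the quadratic
    if (q < 0.0f) return -1.0f;               // no real root: the ray misses
    return -b - std::sqrt(q);                 // nearest intersection distance
}
```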
tl;dr: Data is just data. How you create and crunch it is completely up to you.

SSE optimization of Gaussian blur

I'm working on a school project; I have to optimize part of the code with SSE, but I've been stuck on one part for a few days now.
I don't see any smart way of using vector SSE instructions (inline assembler / intrinsic functions) in this code (it's part of a Gaussian blur algorithm). I would be glad if somebody could give me just a small hint.
for (int x = x_start; x < x_end; ++x) // vertical blur...
{
    float sum = image[x + (y_start - radius - 1)*image_w];
    float dif = -sum;
    for (int y = y_start - 2*radius - 1; y < y_end; ++y)
    {   // inner vertical Radius loop
        float p = (float)image[x + (y + radius)*image_w]; // next pixel
        buffer[y + radius] = p; // buffer pixel
        sum += dif + fRadius*p;
        dif += p; // accumulate pixel blur
        if (y >= y_start)
        {
            float s = 0, w = 0; // border blur correction
            sum -= buffer[y - radius - 1]*fRadius; // addition for fraction blur
            dif += buffer[y - radius] - 2*buffer[y]; // sum up differences: +1, -2, +1

            // cut off accumulated blur area of pixel beyond the border
            // assume: added pixel values beyond border = value at border
            p = (float)(radius - y); // top part to cut off
            if (p > 0)
            {
                p = p*(p-1)/2 + fRadius*p;
                s += buffer[0]*p;
                w += p;
            }
            p = (float)(y + radius - image_h + 1); // bottom part to cut off
            if (p > 0)
            {
                p = p*(p-1)/2 + fRadius*p;
                s += buffer[image_h - 1]*p;
                w += p;
            }
            new_image[x + y*image_w] = (unsigned char)((sum - s)/(weight - w)); // set blurred pixel
        }
        else if (y + radius >= y_start)
        {
            dif -= 2*buffer[y];
        }
    } // for y
} // for x
One more feature you can use is logical operations and masks:
for example instead of:
// processes only 1 float
if (p > 0)
    p = p*(p-1)/2 + fRadius*p;
you can write
// processes 4 floats at once (p is a __m128 holding four p values)
const __m128 zero  = _mm_setzero_ps();
const __m128 one   = _mm_set1_ps(1.0f);
const __m128 half  = _mm_set1_ps(0.5f);
const __m128 mask  = _mm_cmpgt_ps(p, zero); // lanes where p > 0
const __m128 p_tmp = _mm_add_ps(_mm_mul_ps(_mm_mul_ps(p, _mm_sub_ps(p, one)), half),
                                _mm_mul_ps(_mm_set1_ps(fRadius), p)); // p*(p-1)/2 + fRadius*p
p = _mm_or_ps(_mm_and_ps(p_tmp, mask), _mm_andnot_ps(mask, p)); // p_tmp where p > 0, else the old p
Also, I can recommend using a special library which overloads the instructions, for example: http://code.compeng.uni-frankfurt.de/projects/vc
The dif variable makes the iterations of the inner loop dependent on each other. You should try to parallelize the outer loop instead. But without instruction overloading, the code will become unmanageable.
Also consider rethinking the whole algorithm. The current one doesn't look parallel. Maybe you can sacrifice a bit of precision, or accept slightly more scalar work?
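As a rough illustration of parallelizing the outer loop, the running sum/dif accumulators of four adjacent columns can be carried in one SSE register (a hedged sketch: it assumes image has been converted to a float buffer, that x_end - x_start is a multiple of 4, and it leaves the border-correction and store steps out):
```
#include <xmmintrin.h>

// Sketch: four neighbouring x columns per iteration.
// Adjacent x positions are contiguous in memory, so unaligned loads work directly.
for (int x = x_start; x < x_end; x += 4)
{
    __m128 fR  = _mm_set1_ps(fRadius);
    __m128 sum = _mm_loadu_ps(&image[x + (y_start - radius - 1) * image_w]);
    __m128 dif = _mm_sub_ps(_mm_setzero_ps(), sum);   // dif = -sum

    for (int y = y_start - 2*radius - 1; y < y_end; ++y)
    {
        __m128 p = _mm_loadu_ps(&image[x + (y + radius) * image_w]); // next 4 pixels
        // sum += dif + fRadius * p;  dif += p;
        sum = _mm_add_ps(sum, _mm_add_ps(dif, _mm_mul_ps(fR, p)));
        dif = _mm_add_ps(dif, p);
        // ... the border-correction and store steps would follow here,
        //     either vectorized the same way or handled per lane
    }
}
```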