adjust bitmap image brightness/contrast using c++

adjust image brightness/contrast using c++ without using any other 3rd party library or dependency

Image brightness is here - use the mean of the RGB values and shift them.
Contrast is here, with solutions in other languages available as well.
Edit in case the above links die:
The answer given by Jerry Coffin below covers the same topic and has links that still live.
But, to adjust brightness, you add a constant value to each of the R, G, B fields of an image. Make sure to use saturating math - don't allow values to go below 0 or above the maximum allowed by your bit depth (8 bits per channel for 24-bit color).
RGB_struct color = GetPixelColor(x, y);
int newRed   = truncate(color.red + brightAdjust);
int newGreen = truncate(color.green + brightAdjust);
int newBlue  = truncate(color.blue + brightAdjust);
For contrast, I have taken and slightly modified code from this website:
float factor = (259.0 * (contrast + 255.0)) / (255.0 * (259.0 - contrast));
RGB_struct color = GetPixelColor(x, y);
int newRed   = truncate((int)(factor * (color.red - 128) + 128));
int newGreen = truncate((int)(factor * (color.green - 128) + 128));
int newBlue  = truncate((int)(factor * (color.blue - 128) + 128));
Where truncate(int value) makes sure the value stays between 0 and 255 for 8-bit color. Note that many CPUs have intrinsic functions to do this in a single cycle.
int truncate(int value)
{
    if (value < 0) return 0;
    if (value > 255) return 255;
    return value;
}
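Putting the two together, here is a minimal sketch of a full pass over a packed 24-bit RGB buffer (3 bytes per pixel), using the truncate helper above; brightAdjust and contrast are the parameters described in the text:

void adjustImage(unsigned char* rgb, int width, int height,
                 int brightAdjust, int contrast)
{
    float factor = (259.0f * (contrast + 255.0f)) / (255.0f * (259.0f - contrast));
    for (int i = 0; i < width * height * 3; ++i) {
        int v = rgb[i] + brightAdjust;       // brightness: shift the channel
        v = (int)(factor * (v - 128) + 128); // contrast: spread values about 128
        rgb[i] = (unsigned char)truncate(v); // clamp to [0, 255]
    }
}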

Read in the image with a library such as the Independent JPEG Group's library. When you have raw data, you can convert it from RGB to HSL or (preferably) CIE L*a*b*. Both contrast and brightness will basically just involve adjustments to the L channel: to adjust brightness, just adjust all the L values up or down by an appropriate amount. To adjust contrast, you basically adjust the difference between a particular value and the center value. You'll generally want to do this non-linearly, so values near the middle of the range are adjusted quite a bit, but values close to the ends of the range aren't affected nearly as much (and any that are at the very ends aren't changed at all).
Once you've done that, you can convert back to RGB, and then back to a normal format such as JPEG.
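To illustrate the non-linear part (a toy sketch, not code from the linked material), a contrast curve over a lightness value L normalized to [0,1] could fade the adjustment toward the ends of the range:

// L in [0,1]; amount in roughly [-0.5,0.5] for modest contrast changes.
// The fade term is 1 at the center and 0 at the ends, so mid-range values
// move the most and the extremes stay fixed.
float contrastCurve(float L, float amount)
{
    float d = L - 0.5f;                 // signed distance from the center
    float fade = 1.0f - 4.0f * d * d;   // 1 at L=0.5, 0 at L=0 and L=1
    return L + amount * d * fade;
}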


no suitable conversion function from "Magick::Color" to "MagickCore::Quantum" exists

I already know why it gives that error, code-wise.
The problem is, I started using the library itself today, and I ran into this while following the tutorial.
I installed the "ImageMagick-7.0.9-1-Q16-x64-dll" version of the library and tried to find the shortest code that gives that error, which is:
#include <Magick++.h>
int main(){
    Magick::Quantum result = Magick::Color("black");
}
Given the tutorial (which follows), a method that converts from Magick::Color to Magick::Quantum should exist:
// Example of using an image pixel cache
Image my_image("640x480", "white"); // we'll use the 'my_image' object in this example
my_image.modifyImage(); // Ensure that there is only one reference to
// underlying image; if this is not done, then the
// image pixels *may* remain unmodified. [???]
Pixels my_pixel_cache(my_image); // allocate an image pixel cache associated with my_image
Quantum* pixels; // 'pixels' is a pointer to a Quantum array
// define the view area that will be accessed via the image pixel cache
int start_x = 10, start_y = 20, size_x = 200, size_y = 100;
// return a pointer to the pixels of the defined pixel cache
pixels = my_pixel_cache.get(start_x, start_y, size_x, size_y);
// set the color of the first pixel from the pixel cache to black (x=10, y=20 on my_image)
*pixels = Color("black");
// set to green the pixel 200 from the pixel cache:
// this pixel is located at x=0, y=1 in the pixel cache (x=10, y=21 on my_image)
*(pixels+200) = Color("green");
// now that the operations on my_pixel_cache have been finalized
// ensure that the pixel cache is transferred back to my_image
my_pixel_cache.sync();
which gives that error ( no suitable conversion function from "Magick::Color" to "MagickCore::Quantum" exists ) at the following lines
*pixels = Color("black");
*(pixels+200) = Color("green");
I believe you are confusing a data type with a structure. The pixels pointer refers to a contiguous list of Quantum parts.
Assuming that we're working in an RGB colorspace, you would need to set each color part.
Color black("black");
*(pixels + 0) = black.quantumRed();
*(pixels + 1) = black.quantumGreen();
*(pixels + 2) = black.quantumBlue();
To set the 200th pixel, you would need to multiply the offset by the parts-per-pixel count.
Color green("green");
int offset = 199 * 3; // First pixel starts at 0, and 3 parts (Red, Green, Blue)
*(pixels + offset + 0) = green.quantumRed();
*(pixels + offset + 1) = green.quantumGreen();
*(pixels + offset + 2) = green.quantumBlue();
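For context, here is an untested sketch of how those corrected writes might fit into the tutorial's flow; the 3-parts-per-pixel assumption only holds if the image really has plain RGB channels (no alpha), which depends on the image:

#include <Magick++.h>

int main(int argc, char** argv)
{
    Magick::InitializeMagick(argv[0]);

    Magick::Image my_image("640x480", "white");
    my_image.modifyImage();

    Magick::Pixels my_pixel_cache(my_image);
    Magick::Quantum* pixels = my_pixel_cache.get(10, 20, 200, 100);

    // First pixel of the cache -> black (assuming 3 parts per pixel: R, G, B).
    Magick::Color black("black");
    pixels[0] = black.quantumRed();
    pixels[1] = black.quantumGreen();
    pixels[2] = black.quantumBlue();

    // 200th pixel of the cache -> green.
    Magick::Color green("green");
    int offset = 199 * 3;
    pixels[offset + 0] = green.quantumRed();
    pixels[offset + 1] = green.quantumGreen();
    pixels[offset + 2] = green.quantumBlue();

    my_pixel_cache.sync();   // transfer the cache back to my_image
    my_image.write("out.png");
}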

Fast, good quality pixel interpolation for extreme image downscaling

In my program, I am downscaling an image of 500px or larger to an extreme level of approx 16px-32px. The source image is user-specified so I do not have control over its size. As you can imagine, few pixel interpolations hold up and inevitably the result is heavily aliased.
I've tried bilinear, bicubic and square average sampling. The square average sampling actually provides the most decent results but the smaller it gets, the larger the sampling radius has to be. As a result, it gets quite slow - slower than the other interpolation methods.
I have also tried an adaptive square average sampling so that the smaller it gets the greater the sampling radius, while the closer it is to its original size, the smaller the sampling radius. However, it produces problems and I am not convinced this is the best approach.
So the question is: What is the recommended type of pixel interpolation that is fast and works well on such extreme levels of downscaling?
I do not wish to use a library so I will need something that I can code by hand and isn't too complex. I am working in C++ with VS 2012.
Here's some example code I've tried as requested (hopefully without errors from my pseudo-code cut and paste). This performs a 7x7 average downscale and although it's a better result than bilinear or bicubic interpolation, it also takes quite a hit:
// Sizing control
ctl(0): "Resize",Range=(0,800),Val=100
// Variables
int Xnew,Ynew,Calc;
int x,y,z,p1,q1,i,j;
// New image dimensions
Xnew=image->width*ctl(0)/100;
Ynew=image->height*ctl(0)/100;
for (y=0; y<Ynew; y++){ // rows of the output
    for (x=0; x<Xnew; x++){ // columns of the output
        p1=(int)x*image->width/Xnew;
        q1=(int)y*image->height/Ynew;
        for (z=0; z<3; z++){ // channels
            Calc=0; // reset the accumulator for each channel
            for (i=-3;i<=3;i++) {
                for (j=-3;j<=3;j++) {
                    Calc += (int)(src(p1-i,q1-j,z));
                } //j
            } //i
            Calc /= 49;
            pset(x, y, z, Calc);
        } // channels
    } // columns
} // rows
Thanks!
The first point is to use pointers to your data. Never index into the image at every pixel. When you write src(p1-i,q1-j,z) or pset(x, y, z, Calc), how much computation is being done? Use pointers to the data and manipulate those.
Second: your algorithm is wrong. You don't want an average filter; you want to lay a grid over your source image and, for every grid cell, compute the average and put it in the corresponding pixel of the output image.
The specific solution should be tailored to your data representation, but it could be something like this:
std::vector<uint32_t> accum(Xnew);
std::vector<uint32_t> count(Xnew);
uint32_t *paccum, *pcount;
uint8_t* pin = /*pointer to input data*/;
uint8_t* pout = /*pointer to output data*/;
for (int dr = 0, sr = 0, w = image->width, h = image->height; sr < h; ++dr) {
    memset(paccum = accum.data(), 0, Xnew*4);
    memset(pcount = count.data(), 0, Xnew*4);
    while (sr * Ynew / h == dr) {
        paccum = accum.data();
        pcount = count.data();
        for (int dc = 0, sc = 0; sc < w; ++sc) {
            *paccum += *pin;
            *pcount += 1;
            ++pin;
            if (sc * Xnew / w > dc) {
                ++dc;
                ++paccum;
                ++pcount;
            }
        }
        sr++;
    }
    std::transform(begin(accum), end(accum), begin(count), pout, std::divides<uint32_t>());
    pout += Xnew;
}
This was written using my own library (still in development) and it seems to work, but later I changed the variable names in order to make it simpler here, so I don't guarantee anything!
The idea is to have a local buffer of 32 bit ints which can hold the partial sum of all pixels in the rows which fall in a row of the output image. Then you divide by the cell count and save the output to the final image.
The first thing you should do is set up a performance evaluation system to measure how much any change impacts performance.
As said previously, you should not use indexes but pointers, for a (probably) substantial speed-up, and you should not simply average: a basic averaging of pixels is essentially a blur filter.
I would highly advise you to rework your code to use "kernels". A kernel is the matrix representing the weight of each pixel used. That way, you will be able to test different strategies and optimize quality.
Example of kernels:
https://en.wikipedia.org/wiki/Kernel_(image_processing)
Upsampling/downsampling kernel:
http://www.johncostella.com/magic/
Note: from the code it seems you now apply a 7x7 kernel, though it was initially done with a 3x3 kernel. That 3x3 average, written as a kernel, would be:
[1 1 1]
[1 1 1] * 1/9
[1 1 1]
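For illustration, here is a sketch of how such a kernel could be applied to a single 8-bit channel; the src and dst buffers and the normalized kernel are assumptions, and borders are simply skipped:

void applyKernel3x3(const unsigned char* src, unsigned char* dst,
                    int width, int height, const float kernel[3][3])
{
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            float sum = 0.0f;
            for (int ky = -1; ky <= 1; ++ky)
                for (int kx = -1; kx <= 1; ++kx)
                    sum += kernel[ky + 1][kx + 1] * src[(y + ky) * width + (x + kx)];
            int v = (int)(sum + 0.5f); // round to nearest
            dst[y * width + x] = (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
        }
    }
}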

How to most efficiently modify R / G / B values?

So I wanted to implement lighting in my pixel-based rendering system, googled, and found out that to display R/G/B values lighter or darker I have to multiply each red, green and blue value by a number < 1 to display it darker and by a number > 1 to display it lighter.
So I implemented it like this, but it's really dragging down my performance, since I have to do this for each pixel:
void PixelRenderer::applyLight(Uint32& color){
    Uint32 alpha = color >> 24;
    alpha = alpha << 24;
    alpha = alpha >> 24;
    Uint32 red = color >> 16;
    red = red << 24;
    red = red >> 24;
    Uint32 green = color >> 8;
    green = green << 24;
    green = green >> 24;
    Uint32 blue = color;
    blue = blue << 24;
    blue = blue >> 24;
    red = red * 0.5;
    green = green * 0.5;
    blue = blue * 0.5;
    color = alpha << 24 | red << 16 | green << 8 | blue;
}
Any ideas or examples on how to improve the speed?
Try this: (EDIT: as it turns out, this is only a readability improvement, but read on for more insights.)
void PixelRenderer::applyLight(Uint32& color)
{
    Uint32 alpha = color >> 24;
    Uint32 red = (color >> 16) & 0xff;
    Uint32 green = (color >> 8) & 0xff;
    Uint32 blue = color & 0xff;
    red = red * 0.5;
    green = green * 0.5;
    blue = blue * 0.5;
    color = alpha << 24 | red << 16 | green << 8 | blue;
}
That having been said, you should understand that performing operations of that sort using a general-purpose processor such as the CPU of your computer is bound to be extremely slow. That's why hardware-accelerated graphics cards were invented.
EDIT
If you insist on operating this way, then you will probably have to resort to hacks in order to improve efficiency. One type of hack which is very often used when dealing with 8-bit channel values is lookup tables. With a lookup table, instead of multiplying each individual channel value by a float, you precompute an array of 256 values where the index into the array is a channel value, and the value at that index is the precomputed result of multiplying the channel value by that float. Then, when converting your image, you just use channel values to look up entries of the array instead of performing actual float multiplication. This is much, much faster. (But still not nearly as fast as programming dedicated, massively parallel hardware to do that stuff for you.)
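For instance, the lookup-table idea applied to the applyLight method above might look like this (a sketch, assuming the factor is at most 1.0 so no clamping is needed):

Uint32 lut[256];

// Build once whenever the light factor changes.
void buildLut(float factor){
    for (int i = 0; i < 256; ++i)
        lut[i] = (Uint32)(i * factor);
}

// Three table lookups per pixel instead of three float multiplies.
void PixelRenderer::applyLight(Uint32& color){
    color = (color & 0xff000000)
          | (lut[(color >> 16) & 0xff] << 16)
          | (lut[(color >> 8) & 0xff] << 8)
          |  lut[color & 0xff];
}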
EDIT
As others have already pointed out, if you are not planning to operate on the alpha channel, then you do not need to extract it and then later apply it, you can just leave it unaltered. So, you can just do color = (color & 0xff000000) | red << 16 | green << 8 | blue;
Shifts and masks like this are generally very fast on a modern processor. I might look at a few other things:
Follow the first rule of optimisation - profile your code. You can do this simply by calling the method millions of times and timing it. Are your calculations slow, or is it something else? What is slow? Try omitting part of the method - do things speed up?
Make sure that this function is declared inline (and make sure it has actually been inlined). The function call overhead will massively outweigh the pixel manipulations (particularly if it is virtual).
Consider declaring your method as Uint32 PixelRenderer::applyLight(Uint32 color) and returning the modified value (as sketched after this list); that may help avoid some dereferences and give the compiler some additional optimisation opportunities.
Avoid fp to integer conversions, they can be very expensive. If a plain integer divide is insufficient, look at using fixed-point math.
Finally, look at the assembler to see what the compiler has generated (with optimisations on). Are there any branches or conversions? Has your method actually been inlined?
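For example, points 2-4 combined might look like this (a sketch, substituting an integer shift for the 0.5 multiply):

inline Uint32 applyLight(Uint32 color)
{
    Uint32 red   = ((color >> 16) & 0xff) >> 1;  // * 0.5 as an integer shift
    Uint32 green = ((color >> 8)  & 0xff) >> 1;
    Uint32 blue  =  (color        & 0xff) >> 1;
    return (color & 0xff000000) | (red << 16) | (green << 8) | blue;
}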
To preserve the alpha value in the front use:
(color>>1)&0x7F7F7F | (color&0xFF000000)
(A tweak on what Wimmel offered in the comments).
I think the 'learning curve' here is that you were using shift and shift back to mask out bits. You should use & with a masking value.
For a more general solution (where 0.0<=factor<=1.0) :
void PixelRenderer::applyLight(Uint32& color, double factor){
    Uint32 alpha = color & 0xFF000000;
    Uint32 red   = (color & 0x00FF0000) * factor;
    Uint32 green = (color & 0x0000FF00) * factor;
    Uint32 blue  = (color & 0x000000FF) * factor;
    color = alpha | (red & 0x00FF0000) | (green & 0x0000FF00) | (blue & 0x000000FF);
}
Notice there is no need to shift the components down to the low order bits before performing the multiplication.
Ultimately you may find that the bottleneck is floating-point conversions and arithmetic.
To reduce that you should consider either:
Reduce it to an integer scaling factor, for example in the range 0-256.
Precompute factor*component as a 256-element array and 'pick' the components out of it.
I'm proposing a range of 257 values (0-256) because, with the >>8 below, a factor of 256 represents exactly 1.0.
For a more general solution (where 0 <= factor <= 256):
void PixelRenderer::applyLight(Uint32& color, Uint32 factor){
    Uint32 alpha = color & 0xFF000000;
    Uint32 red   = ((color & 0x00FF0000) * factor) >> 8;
    Uint32 green = ((color & 0x0000FF00) * factor) >> 8;
    Uint32 blue  = ((color & 0x000000FF) * factor) >> 8;
    color = alpha | (red & 0x00FF0000) | (green & 0x0000FF00) | (blue & 0x000000FF);
}
Here's a runnable program illustrating the first example:
#include <stdio.h>
#include <inttypes.h>

typedef uint32_t Uint32;

Uint32 make(Uint32 alpha, Uint32 red, Uint32 green, Uint32 blue){
    return (alpha<<24)|(red<<16)|(green<<8)|blue;
}

void output(Uint32 color){
    printf("alpha=%" PRIu32 " red=%" PRIu32 " green=%" PRIu32 " blue=%" PRIu32 "\n",
           (color>>24), (color&0xFF0000)>>16, (color&0xFF00)>>8, color&0xFF);
}

Uint32 applyLight(Uint32 color, double factor){
    Uint32 alpha = color&0xFF000000;
    Uint32 red   = (color&0x00FF0000)*factor;
    Uint32 green = (color&0x0000FF00)*factor;
    Uint32 blue  = (color&0x000000FF)*factor;
    return alpha|(red&0x00FF0000)|(green&0x0000FF00)|(blue&0x000000FF);
}

int main(void) {
    Uint32 color1 = make(156,100,50,20);
    Uint32 result1 = applyLight(color1, 0.9);
    output(result1);
    Uint32 color2 = make(255,255,255,255);
    Uint32 result2 = applyLight(color2, 0.1);
    output(result2);
    Uint32 color3 = make(78,220,200,100);
    Uint32 result3 = applyLight(color3, 0.05);
    output(result3);
    return 0;
}
Expected Output is:
alpha=156 red=90 green=45 blue=18
alpha=255 red=25 green=25 blue=25
alpha=78 red=11 green=10 blue=5
One thing that I don't see anyone else mentioning is parallelizing your code. There are at least 2 ways to do this: SIMD instructions, and multiple threads.
SIMD instructions (like SSE, AVX, etc.) perform the same math on multiple pieces of data at the same time. So you could, for example, multiply the red, green, blue, and alpha of a pixel by the same values in 1 instruction, like this:
vec4 lightValue = vec4(0.5, 0.5, 0.5, 1.0);
vec4 result = vec_Mult(inputPixel, lightValue);
That's the equivalent of:
lightValue.red = 0.5;
lightValue.green = 0.5;
lightValue.blue = 0.5;
lightValue.alpha = 1.0;
result.red = inputPixel.red * lightValue.red;
result.green = inputPixel.green * lightValue.green;
result.blue = inputPixel.blue * lightValue.blue;
result.alpha = inputPixel.alpha * lightValue.alpha;
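As a concrete (hedged) example, SSE2 intrinsics can halve the R, G, B bytes of four packed ARGB pixels at once, using the same shift-and-mask trick shown earlier:

#include <emmintrin.h> // SSE2
#include <cstdint>

// Halve R, G and B of four packed ARGB pixels; alpha is preserved.
// (x >> 1) & 0x7F7F7F halves each color byte without bleed between bytes.
void halfBright4(uint32_t* px)
{
    __m128i v       = _mm_loadu_si128(reinterpret_cast<__m128i*>(px));
    __m128i rgbHalf = _mm_and_si128(_mm_srli_epi32(v, 1), _mm_set1_epi32(0x007F7F7F));
    __m128i alpha   = _mm_and_si128(v, _mm_set1_epi32((int)0xFF000000));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(px), _mm_or_si128(rgbHalf, alpha));
}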
You can also cut your image into tiles and perform the lightening operation on several tiles at once using threads run on multiple cores. If you're using C++11, you can use std::thread to start multiple threads. Otherwise your OS probably has functionality for threading, such as WinThreads, Grand Central Dispatch, pthreads, boost threads, Threading Building Blocks, etc.
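Here is a sketch of that tiling idea with std::thread, where applyLightRow is a hypothetical worker that lightens one scanline:

#include <cstdint>
#include <thread>
#include <vector>

void applyLightRow(uint32_t* row, int width); // per-scanline worker (assumed)

void applyLightParallel(uint32_t* pixels, int width, int height)
{
    unsigned hw = std::thread::hardware_concurrency();
    int n = hw ? (int)hw : 4;                   // fall back if unknown
    std::vector<std::thread> workers;
    for (int t = 0; t < n; ++t)
        workers.emplace_back([=] {
            for (int y = t; y < height; y += n) // interleave rows across threads
                applyLightRow(pixels + (size_t)y * width, width);
        });
    for (auto& w : workers) w.join();
}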
You can combine both of the above and have multithreaded code that operates on whole pixels at a time.
If you want to take it even further, you can do your processing on the GPU of your machine using OpenGL, OpenCL, DirectX, Metal, Mantle, CUDA, or one of the other GPGPU technologies. GPUs are generally hundreds of cores that can very quickly process many tiles in parallel, each of which processes whole pixels (rather than just channels) at a time.
But an even better option may be not to write any code at all. It's extremely likely that someone has already done this work and you can leverage it. For example, on MacOS there's CoreImage and the Accelerate framework. On iOS you also have CoreImage, and there's also GPUImage. I'm sure there are similar libraries on Windows, Linux, and other OSes you might be working with.
Another solution, without using bit shifts, is to convert your 32-bit uint into a struct.
Try to keep your implementation in the .h include file, so that it can be inlined
If you don't want to have the implementation inlined (see above), modify your applyLight method to accept an array of pixels. Method call overhead can be significant for such a small method
Enable "loop unroll" optimisation on your compiler, which will enable the usage of SIMD instructions
Implementation:
#include <algorithm> // std::min, std::max
#include <cstdint>

class brightness {
private:
    struct pixel { uint8_t b, g, r, a; };
    float factor;
    static inline void apply(uint8_t& p, float f) {
        p = std::max(std::min(int(p * f), 255), 0);
    }
public:
    brightness(float factor) : factor(factor) { }
    void apply(uint32_t& color){
        pixel& p = (pixel&)color;
        apply(p.b, factor);
        apply(p.g, factor);
        apply(p.r, factor);
    }
};
Implementation with a lookup table (slower when you use "loop unroll"):
class brightness {
    struct pixel { uint8_t b, g, r, a; };
    uint8_t table[256];
public:
    brightness(float factor) {
        for(int i = 0; i < 256; i++)
            table[i] = std::max(std::min(int(i * factor), 255), 0);
    }
    void apply(uint32_t& color){
        pixel& p = (pixel&)color;
        p.b = table[p.b];
        p.g = table[p.g];
        p.r = table[p.r];
    }
};
// usage
brightness half_bright(0.5);
uint32_t pixel = 0xffffffff;
half_bright.apply(pixel);

OpenCV: Calculating new red pixel value

I'm currently aiming to adjust the red pixels in an image (more specifically, in an eye region, to remove red eyes caused by flash), and this works well, but the issue I'm getting is that sometimes green patches appear on the skin.
This is a good result (before and after):
I realize why this is happening, but when I adjust the threshold to a higher value (meaning the red intensity must be stronger), fewer red pixels are picked up and changed, i.e.:
The lower the threshold, the more green shows up on the skin.
I was wondering if there was an alternate method to what I'm currently doing to change the red pixels?
int lcount = 0;
for(int y=0;y<lcroppedEye.rows;y++)
{
    for(int x=0;x<lcroppedEye.cols;x++)
    {
        double b = lcroppedEye.at<cv::Vec3b>(y, x)[0];
        double g = lcroppedEye.at<cv::Vec3b>(y, x)[1];
        double r = lcroppedEye.at<cv::Vec3b>(y, x)[2];
        double redIntensity = r / ((g + b) / 2);
        //currently causes issues with non-red-eye images
        if (redIntensity >= 1.8)
        {
            double newRedValue = (g + b) / 2;
            cv::Vec3b pixelColor(b, g, newRedValue); // Vec3b order is (B, G, R)
            lroi.at<cv::Vec3b>(cv::Point(x,y)) = pixelColor;
            lcount++;
        }
    }
}
EDIT: I could possibly add a check to ensure the new RGB values are low enough, with the R, G, B values similar/close to each other, so that only black/grey pixels are written out... or have a range of (greenish) RGB values which aren't allowed... would that work?
Adjusting color in RGB space has caveats like the greenish areas you faced. Convert the R, G, B values to a better color space, like HSV or LUV.
I suggest you go for HSV to detect and change the red-eye colors. R/(G+B) is not a good way to calculate red intensity: it means you are calling (R=10, G=1, B=0) a very red color, but it is practically black. Take a look at the comparison below:
So, you'd better check that Saturation and Value are high, which is the case for a red-eye color. If you encounter other high-intensity colors, you may also check that the Hue is in a range of something like [0-20] and [340-359]. But even without this, you are still safe against white itself, as it has a very low saturation, so you won't select white areas anyway.
That was for selecting; for changing the color, it is again better not to use RGB, as changes in that space are not linear in the way we perceive colors. Looking at the image above, you can see that lowering both the saturation and value would be a good start. But you may experiment with it and see what looks better. Maybe you'll be fine with a dark gray always; that would mean setting Saturation to zero and lowering the Value a bit. If you think a dark brown would be better, go for a low saturation and value but set Hue to something around 30 degrees.
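A rough sketch of that HSV route (the thresholds are illustrative guesses to tune, not values from this answer):

cv::Mat hsv;
cv::cvtColor(lcroppedEye, hsv, cv::COLOR_BGR2HSV);
for (int y = 0; y < hsv.rows; y++) {
    for (int x = 0; x < hsv.cols; x++) {
        cv::Vec3b& p = hsv.at<cv::Vec3b>(y, x);
        int h = p[0], s = p[1], v = p[2];   // OpenCV stores H in [0,180)
        bool redHue = (h <= 10 || h >= 170);
        if (redHue && s > 120 && v > 80) {  // strong, bright red
            p[1] = 0;                       // drop saturation -> gray
            p[2] = (uchar)(v * 0.6);        // and darken a bit
        }
    }
}
cv::cvtColor(hsv, lcroppedEye, cv::COLOR_HSV2BGR);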
References that may help you:
Converting color values in OpenCV
An online tool to experiment with RGB and HSV colors
It may be better to change
double redIntensity = r / ((g + b) / 2);
to
double redIntensity = r / ((g+b+1) / 2);
because g+b can be equal to 0, and you'll get NaN.
Also take a look at the cv::floodFill method.
It may be better to ignore the color information in red zones entirely, since the color information in an overly red area is too distorted by the extra red values. So the new values could be:
newRedValue = (g+b)/2; newGreenValue = newRedValue; newBlueValue = newRedValue;
Even if you detect a wrong red area, desaturating it will give a better result than a greenish area.
You can also use a morphological closing operation (with a circle structuring element) to avoid gaps in your red-area mask. So you will need to perform 3 steps: 1. find the red areas and create a mask for them; 2. morphologically close the red-area mask; 3. desaturate the image using this mask.
And yes, don't use r/((g+b)/2), as it can lead to a division-by-zero error.
Prepare a mask of the same size as your lcroppedEye image, which is initially all black (I'll call this image maskImage from here on).
For every pixel in lcroppedEye(row, col) that pass your (redIntensity >= 1.8) condition, set the maskImage(row, col) pixel to white.
When you are done with all the pixels in lcroppedEye, maskImage will have all redeye-like pixels in white.
If you perform a connected component analysis on this maskImage, you should be able to filter out other regions by considering circle or disk-like-features etc.
Now you can use this maskImage as a mask to apply the color transformation to the ROI of the original image.
(You may have to do some preprocessing on maskImage before moving on to connected component analysis. Also, you can replace the code segment in the question with the split, divide and threshold functions, as sketched below, unless there's a special reason to iterate through pixels.)
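For instance, the mask construction could be vectorized along those lines (a sketch; the 1.8 threshold comes from the question, the kernel size is a guess):

std::vector<cv::Mat> bgr;
cv::split(lcroppedEye, bgr);
cv::Mat b, g, r, redIntensity, mask;
bgr[0].convertTo(b, CV_32F);
bgr[1].convertTo(g, CV_32F);
bgr[2].convertTo(r, CV_32F);
cv::divide(r, (g + b) / 2 + 1, redIntensity);  // +1 avoids division by zero
cv::threshold(redIntensity, mask, 1.8, 255, cv::THRESH_BINARY);
mask.convertTo(mask, CV_8U);
cv::morphologyEx(mask, mask, cv::MORPH_CLOSE,
                 cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(5, 5)));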
The problem seems to be that you replace the pixels regardless of the presence of any red eye, so you must somehow test whether there are any high red values (more red than your skin).
My guess is that in the areas where there is reflection there will also be specific blue and green values, either high or low, that should be checked, so that you for example require high red values combined with low blue and/or low green values.
// first pass, getting the highest red value
int highRed = 0;
cv::Point redPos = cv::Point(0,0);
for(int y=0;y<lcroppedEye.rows;y++)
{
    for(int x=0;x<lcroppedEye.cols;x++)
    {
        int r = lcroppedEye.at<cv::Vec3b>(y, x)[2];
        if (r > highRed)
        {
            highRed = r;
            redPos = cv::Point(x,y);
        }
    }
}
// decide if it's red enough; need to find a good minRed value.
if (highRed < minRed)
    return;
Original code here with the following changes.
// avoid division by zero, code from #AndreySmorodov
double redIntensity = r / ((g+b+1) / 2);
// add check for actual red colour.
if (redIntensity >= 1.8 && r > highRed*0.75)
// potentially add a check for low absolute b/g values.
{
    double newRedValue = (g + b) / 2;
    cv::Vec3b pixelColor(b, g, newRedValue); // Vec3b order is (B, G, R)
    lroi.at<cv::Vec3b>(cv::Point(x,y)) = pixelColor;
    lcount++;
}

Vertically flipping a char array: is there a more efficient way?

Let's start with some code:
QByteArray OpenGLWidget::modifyImage(QByteArray imageArray, const int width, const int height){
    if (vertFlip){
        /* Each pixel consists of four unsigned chars: Red Green Blue Alpha.
         * The field is normally 640*480, which means the whole picture is in fact 640*4 uChars wide.
         * The whole ByteArray is one-dimensional, so index 640*4 is the red of the first pixel of the second row.
         * This function is EXTREMELY SLOW
         */
        QByteArray tempArray = imageArray;
        for (int h = 0; h < height; ++h){
            for (int w = 0; w < width/2; ++w){
                for (int i = 0; i < 4; ++i){
                    imageArray.data()[h*width*4 + 4*w + i] = tempArray.data()[h*width*4 + 4*(width-1-w) + i];
                    imageArray.data()[h*width*4 + 4*(width-1-w) + i] = tempArray.data()[h*width*4 + 4*w + i];
                }
            }
        }
    }
    return imageArray;
}
This is the code I use right now to vertically flip an image which is 640*480 (the image is actually not guaranteed to be 640*480, but it mostly is). The color encoding is RGBA, which means that the total array size is 640*480*4. I get the images at 30 FPS, and I want to show them on the screen at the same FPS.
On an older CPU (Athlon X2) this code is just too much: the CPU races to keep up with the 30 FPS, so the question is: can I do this more efficiently?
I am also working with OpenGL; does that have a gimmick I am not aware of that can flip images with relatively low CPU/GPU usage?
According to this question, you can flip an image in OpenGL by scaling it by (1,-1,1). This question explains how to do transformations and scaling.
You can improve at least by doing it blockwise, making use of the cache architecture. In your example one of the accesses (either the read OR the write) will be off-cache.
For a start it can help to "capture scanlines" if you're using two loops to loop through the pixels of an image, like so:
for (int y = 0; y < height; ++y)
{
    // Capture scanline.
    char* scanline = imageArray.data() + y*width*4;
    for (int x = 0; x < width/2; ++x)
    {
        const int flipped_x = width - x - 1;
        for (int i = 0; i < 4; ++i)
            std::swap(scanline[x*4 + i], scanline[flipped_x*4 + i]);
    }
}
Another thing to note is that I used swap instead of a temporary image. That'll tend to be more efficient since you can just swap using registers instead of loading pixels from a copy of the entire image.
But it also generally helps to use a 32-bit integer instead of working one byte at a time if you're going to be doing anything like this. If you're working with pixels with 8-bit types but know that each pixel is 32 bits, e.g., as in your case, you can generally get away with a cast to uint32_t*, e.g.
for (int y = 0; y < height; ++y)
{
    uint32_t* scanline = (uint32_t*)imageArray.data() + y*width;
    std::reverse(scanline, scanline + width);
}
At this point you might parallelize the y loop. Flipping an image horizontally (it should be "horizontal" if I understood your original code correctly) in this way is a little bit tricky with the access patterns, but you should be able to get quite a decent boost using the above techniques.
I am also working with OpenGL, does that have a gimmick I am not aware of that can flip images with relatively low CPU/GPU usage?
Naturally the fastest way to flip images is to not touch their pixels at all and just save the flipping for the final part of the pipeline when you render the result. For this you might render a texture in OGL with negative scaling instead of modifying the pixels of a texture.
Another thing that's really useful in video and image processing is to represent an image to process like this for all your image operations:
struct Image32
{
    uint32_t* pixels;
    int32_t width;
    int32_t height;
    int32_t x_stride;
    int32_t y_stride;
};
The stride fields are what you use to get from one scanline (row) of an image to the next vertically, and from one column to the next horizontally. When you use this representation, you can use negative values for the strides and offset the pixels pointer accordingly. You can also use the stride fields to, say, render only every other scanline of an image for fast interactive half-res scanline previews, by using y_stride=width*2 and height/=2. You can quarter-res an image by setting the x stride to 2 and the y stride to 2*width and then halving the width and height. You can render a cropped image without making your blit functions accept a boatload of parameters by just modifying these fields, keeping the y stride at the original width to get from one row of the cropped section of the image to the next:
// Using the stride representation of Image32, this can now
// blit a cropped source, a horizontally flipped source,
// a vertically flipped source, a source flipped both ways,
// a half-res source, a quarter-res source, a quarter-res
// source that is horizontally flipped and cropped, etc,
// and all without modifying the source image in advance
// or having to accept all kinds of extra drawing parameters.
void blit(int dst_x, int dst_y, Image32 dst, Image32 src);

// We don't have to do things like this (and I think I lost
// some capabilities with this version below but it hurts my
// brain too much to think about what capabilities were lost):
void blit_gross(int dst_x, int dst_y, int dst_w, int dst_h, uint32_t* dst,
                int src_x, int src_y, int src_w, int src_h,
                const uint32_t* src, bool flip_x, bool flip_y);
By using negative stride values and passing the image to an image operation (e.g., a blit operation), the result will naturally be flipped without having to actually flip the image. It'll end up being "drawn flipped", so to speak, just as in the case of using OGL with a negative scaling transformation matrix.
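As a sketch of that last point (assuming y_stride is measured in pixels and blit() steps by y_stride per row), a vertically flipped view costs only a little pointer arithmetic:

// Return a view of 'img' that draws vertically flipped; no pixels are touched.
Image32 flipVertical(Image32 img)
{
    img.pixels += (img.height - 1) * img.y_stride; // start at the last row
    img.y_stride = -img.y_stride;                  // walk rows backwards
    return img;
}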