Fast, good quality pixel interpolation for extreme image downscaling - c++

In my program, I am downscaling an image of 500px or larger to an extreme level of approx 16px-32px. The source image is user-specified so I do not have control over its size. As you can imagine, few pixel interpolations hold up and inevitably the result is heavily aliased.
I've tried bilinear, bicubic and square average sampling. The square average sampling actually provides the most decent results but the smaller it gets, the larger the sampling radius has to be. As a result, it gets quite slow - slower than the other interpolation methods.
I have also tried an adaptive square average sampling so that the smaller it gets the greater the sampling radius, while the closer it is to its original size, the smaller the sampling radius. However, it produces problems and I am not convinced this is the best approach.
So the question is: What is the recommended type of pixel interpolation that is fast and works well on such extreme levels of downscaling?
I do not wish to use a library so I will need something that I can code by hand and isn't too complex. I am working in C++ with VS 2012.
Here's some example code I've tried as requested (hopefully without errors from my pseudo-code cut and paste). This performs a 7x7 average downscale and although it's a better result than bilinear or bicubic interpolation, it also takes quite a hit:
// Sizing control
ctl(0): "Resize",Range=(0,800),Val=100
// Variables
float fracx,fracy;
int Xnew,Ynew,p,q,Calc;
int x,y,p1,q1,i,j;
//New image dimensions
Xnew=image->width*ctl(0)/100;
Ynew=image->height*ctl(0)/100;
for (y=0; y<image->height; y++){ // rows
for (x=0; x<image->width; x++){ // columns
p1=(int)x*image->width/Xnew;
q1=(int)y*image->height/Ynew;
for (z=0; z<3; z++){ // channels
for (i=-3;i<=3;i++) {
for (j=-3;j<=3;j++) {
Calc += (int)(src(p1-i,q1-j,z));
} //j
} //i
Calc /= 49;
pset(x, y, z, Calc);
} // channels
} // columns
} // rows
Thanks!

The first point is to use pointers to your data. Never use indexes at every pixel. When you write: src(p1-i,q1-j,z) or pset(x, y, z, Calc) how much computation is being made? Use pointers to data and manipulate those.
Second: your algorithm is wrong. You don't want an average filter, but you want to make a grid on your source image and for every grid cell compute the average and put it in the corresponding pixel of the output image.
The specific solution should be tailored to your data representation, but it could be something like this:
std::vector<uint32_t> accum(Xnew);
std::vector<uint32_t> count(Xnew);
uint32_t *paccum, *pcount;
uint8_t* pin = /*pointer to input data*/;
uint8_t* pout = /*pointer to output data*/;
for (int dr = 0, sr = 0, w = image->width, h = image->height; sr < h; ++dr) {
memset(paccum = accum.data(), 0, Xnew*4);
memset(pcount = count.data(), 0, Xnew*4);
while (sr * Ynew / h == dr) {
paccum = accum.data();
pcount = count.data();
for (int dc = 0, sc = 0; sc < w; ++sc) {
*paccum += *pin;
*pcount += 1;
++pin;
if ((sc + 1) * Xnew / w > dc) { // the next source pixel belongs to the next cell
++dc;
++paccum;
++pcount;
}
}
sr++;
}
std::transform(begin(accum), end(accum), begin(count), pout, std::divides<uint32_t>());
pout += Xnew;
}
This was written using my own library (still in development) and it seems to work, but I later changed the variable names to make it simpler here, so I don't guarantee anything!
The idea is to have a local buffer of 32 bit ints which can hold the partial sum of all pixels in the rows which fall in a row of the output image. Then you divide by the cell count and save the output to the final image.
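The grid idea described above can also be written as a small standalone function. This is only a sketch under stated assumptions: downscaleBox is a hypothetical name, the image is a single 8-bit channel stored row by row, and for simplicity it keeps a sum and a count for every output cell instead of the one-row buffer used above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Downscale a single-channel 8-bit image by averaging all source pixels that
// fall into each destination cell. downscaleBox and the row-major layout are
// assumptions made for this sketch.
std::vector<std::uint8_t> downscaleBox(const std::vector<std::uint8_t>& src,
                                       int srcW, int srcH, int dstW, int dstH)
{
    assert(dstW > 0 && dstH > 0 && dstW <= srcW && dstH <= srcH);
    assert(src.size() == static_cast<std::size_t>(srcW) * srcH);

    const std::size_t cells = static_cast<std::size_t>(dstW) * dstH;
    std::vector<std::uint32_t> sum(cells, 0);    // per-cell running sum
    std::vector<std::uint32_t> count(cells, 0);  // per-cell pixel count

    const std::uint8_t* in = src.data();
    for (int sy = 0; sy < srcH; ++sy) {
        // First cell of the destination row this source row maps to.
        const std::size_t rowBase =
            static_cast<std::size_t>(sy * dstH / srcH) * dstW;
        for (int sx = 0; sx < srcW; ++sx) {
            const std::size_t cell = rowBase + sx * dstW / srcW;
            sum[cell] += *in++;
            count[cell] += 1;
        }
    }

    std::vector<std::uint8_t> dst(cells);
    for (std::size_t i = 0; i < cells; ++i) {
        // Every cell holds at least one pixel because we only downscale.
        dst[i] = static_cast<std::uint8_t>((sum[i] + count[i] / 2) / count[i]);
    }
    return dst;
}
```

Each source pixel is read exactly once, so the cost depends on the source size only and not on the scale factor. For an RGB image the same loop could be run per channel.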
The first thing you should do is to set up a performance evaluation system to measure how much any change impacts on the performance.

As said previously, you should use pointers rather than indexes for a (probably) substantial speed-up, and you should not simply average over a fixed window, since a basic average of neighbouring pixels is essentially a blur filter.
I would highly advise you to rework your code to be using "kernels". This is the matrix representing the ratio of each pixel used. That way, you will be able to test different strategies and optimize quality.
Example of kernels:
https://en.wikipedia.org/wiki/Kernel_(image_processing)
Upsampling/downsampling kernel:
http://www.johncostella.com/magic/
Note: the loops in your code run from -3 to 3, so what you apply is a 7x7 kernel in which every weight is 1/49. The equivalent 3x3 kernel would be:
[1 1 1]
[1 1 1] * 1/9
[1 1 1]
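To make the kernel idea concrete, here is a minimal sketch of applying a square kernel to a single-channel float image. The function name applyKernel, the row-major float layout and the clamp-to-edge border handling are all assumptions chosen for illustration. The kernel is applied without flipping, which gives the same result as a true convolution for symmetric kernels such as the box above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Apply a k x k kernel (k odd) to a single-channel float image.
// Coordinates outside the image are clamped to the nearest edge pixel.
std::vector<float> applyKernel(const std::vector<float>& img, int w, int h,
                               const std::vector<float>& kernel, int k)
{
    assert(w > 0 && h > 0 && k > 0 && k % 2 == 1);
    assert(img.size() == static_cast<std::size_t>(w) * h);
    assert(kernel.size() == static_cast<std::size_t>(k) * k);

    const int r = k / 2;  // kernel radius
    std::vector<float> out(img.size());
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float acc = 0.0f;
            for (int ky = -r; ky <= r; ++ky) {
                const int sy = std::min(std::max(y + ky, 0), h - 1);
                for (int kx = -r; kx <= r; ++kx) {
                    const int sx = std::min(std::max(x + kx, 0), w - 1);
                    acc += kernel[(ky + r) * k + (kx + r)] *
                           img[static_cast<std::size_t>(sy) * w + sx];
                }
            }
            out[static_cast<std::size_t>(y) * w + x] = acc;
        }
    }
    return out;
}
```

Swapping in a different weight matrix (for example a tent or Gaussian-like kernel) then only requires changing the kernel vector, not the loop.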

Related

Logistic regression for fault detection in an image

Basically, I want to detect a fault in an image using logistic regression. I'm hoping to get some feedback on my approach, which is as follows:
For training:
Take a small section of the image marked "bad" and "good"
Greyscale them, then break them up into a series of 5*5 pixel segments
Calculate the histogram of pixel intensities for each of these segments
Pass the histograms along with the labels to the Logistic Regression class for training
Break the whole image into 5*5 segments and predict "good"/"bad" for each segment.
Using the sigmoid function, the logistic regression hypothesis is:
1 / (1 + e^(-xθ))
Where x is the input values and theta (θ) is the weights. I use gradient descent to train the network. My code for this is:
void LogisticRegression::Train(float **trainingSet,float *labels, int m)
{
float tempThetaValues[m_NumberOfWeights];
for (int iteration = 0; iteration < 10000; ++iteration)
{
// Reset the temp values for theta.
memset(tempThetaValues,0,m_NumberOfWeights*sizeof(float));
float error = 0.0f;
// For each training set in the example
for (int trainingExample = 0; trainingExample < m; ++trainingExample)
{
float * x = trainingSet[trainingExample];
float y = labels[trainingExample];
// Partial derivative of the cost function.
float h = Hypothesis(x) - y;
for (int i =0; i < m_NumberOfWeights; ++i)
{
tempThetaValues[i] += h*x[i];
}
float cost = h-y; //Actual J(theta), Cost(x,y), keeps giving NaN use MSE for now
error += cost*cost;
}
// Update the weights using batch gradient descent.
for (int theta = 0; theta < m_NumberOfWeights; ++theta)
{
m_pWeights[theta] = m_pWeights[theta] - 0.1f*tempThetaValues[theta];
}
printf("Cost on iteration[%d] = %f\n",iteration,error);
}
}
Where sigmoid and the hypothesis are calculated using:
float LogisticRegression::Sigmoid(float z) const
{
return 1.0f/(1.0f+exp(-z));
}
float LogisticRegression::Hypothesis(float *x) const
{
float z = 0.0f;
for (int index = 0; index < m_NumberOfWeights; ++index)
{
z += m_pWeights[index]*x[index];
}
return Sigmoid(z);
}
And the final prediction is given by:
int LogisticRegression::Predict(float *x)
{
return Hypothesis(x) > 0.5f;
}
As we are using a histogram of intensities the input and weight arrays are 255 elements. My hope is to use it on something like a picture of an apple with a bruise and use it to identify the bruised parts. The (normalized) histograms for the whole bruised and apple training sets look something like this:
For the "good" sections of the apple (y=0):
For the "bad" sections of the apple (y=1):
I'm not 100% convinced that using the intensities alone will produce the results I want, but even so, using it on a clearly separable data set isn't working either. To test it I passed it a labeled, completely white image and a completely black image. I then ran it on the small image below:
Even on this image it fails to identify any segments as being black.
Using MSE I see that the cost converges downwards to a point where it remains: for the black and white test it starts at a cost of about 250 and settles at 100. The apple chunks start at about 4000 and settle at 1600.
What I can't tell is where the issues are.
Is the approach sound but the implementation broken? Is logistic regression the wrong algorithm to use for this task? Is gradient descent not robust enough?
I forgot to answer this... Basically the problem was in my histograms which when generated weren't being memset to 0. As to the overall problem of whether or not logistic regression with greyscale images was a good solution, the answer is no. Greyscale just didn't provide enough information for good classification. Using all colour channels was a bit better but I think the complexity of the problem I was trying to solve (bruises in apples) was a bit much for simple logistic regression on its own. You can see the results on my blog here.
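The histogram bug described above (bins not cleared before counting) is easy to avoid by value-initialising the bins. The following is a rough sketch, not the original code: segmentHistogram is a hypothetical helper, the image is assumed to be 8-bit greyscale stored row by row, and 256 bins are assumed because an 8-bit intensity has 256 possible values.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Normalised intensity histogram of the segW x segH segment whose top-left
// corner is (x0, y0). All names and the layout are assumptions for this sketch.
std::array<float, 256> segmentHistogram(const std::vector<std::uint8_t>& grey,
                                        int width, int height,
                                        int x0, int y0, int segW, int segH)
{
    assert(segW > 0 && segH > 0 && x0 >= 0 && y0 >= 0);
    assert(x0 + segW <= width && y0 + segH <= height);
    assert(grey.size() == static_cast<std::size_t>(width) * height);

    std::array<float, 256> hist{};  // value-initialised: every bin starts at 0
    for (int y = y0; y < y0 + segH; ++y) {
        for (int x = x0; x < x0 + segW; ++x) {
            hist[grey[static_cast<std::size_t>(y) * width + x]] += 1.0f;
        }
    }

    // Normalise so the bins sum to 1 regardless of segment size.
    const float total = static_cast<float>(segW * segH);
    for (float& bin : hist) {
        bin /= total;
    }
    return hist;
}
```

Because the array is returned by value and initialised inside the function, no stale counts from a previous segment can leak into the next one.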

colorbalance in an image using c++ and opencv

I'm trying to score the color balance of an image using C++ and OpenCV.
The easiest way to do this is to count the number of pixels in each color and then see if one of the colors is more prevalent.
I figured I should probably use calcHist, and with the split function I can split an image into R, G, and B histograms. However, I am unsure about what to do next. I could probably walk through all the bins and just see how many pixels are in there, but this seems like a lot of work (I currently use 256 bins).
Is there a faster way to count the pixels in a color range? Also, I am not sure how it would work if white or black are the more prevalent colors.
Automatic color balance algorithm is described in this link http://web.stanford.edu/~sujason/ColorBalancing/simplestcb.html
For C++ Code you can refer to this link : https://www.morethantechnical.com/2015/01/14/simplest-color-balance-with-opencv-wcode/
/// perform the Simplest Color Balancing algorithm
void SimplestCB(Mat& in, Mat& out, float percent) {
assert(in.channels() == 3);
assert(percent > 0 && percent < 100);
float half_percent = percent / 200.0f;
vector<Mat> tmpsplit; split(in,tmpsplit);
for(int i=0;i<3;i++) {
//find the low and high percentile values (based on the input percentile)
Mat flat; tmpsplit[i].reshape(1,1).copyTo(flat);
cv::sort(flat,flat,CV_SORT_EVERY_ROW + CV_SORT_ASCENDING);
int lowval = flat.at<uchar>(cvFloor(((float)flat.cols) * half_percent));
int highval = flat.at<uchar>(cvCeil(((float)flat.cols) * (1.0 - half_percent)));
cout << lowval << " " << highval << endl;
//saturate below the low percentile and above the high percentile
tmpsplit[i].setTo(lowval,tmpsplit[i] < lowval);
tmpsplit[i].setTo(highval,tmpsplit[i] > highval);
//scale the channel
normalize(tmpsplit[i],tmpsplit[i],0,255,NORM_MINMAX);
}
merge(tmpsplit,out);
}
// Usage example
int main() {
Mat tmp,im = imread("lily.png");
SimplestCB(im,tmp,1);
imshow("orig",im);
imshow("balanced",tmp);
waitKey(0);
return 0;
}
Colour balance is normally looking at a white (or gray) surface and checking the ratios of red/blue to green. A perfectly balanced system would have equal signal levels in red/blue.
You can then simply work out the average red/blue from the test gray card image and apply the same scaling to your real image.
Doing it on a live image with no reference is trickier: you have to find areas that are probably white (i.e. bright and nearly r=g=b) and use them as the reference.
There's no definitive algorithm for colour balance, so anything you might implement, however good it is, will probably fail in some conditions.
One of the simplest algorithms is called Grey World, and assumes that statistically the average colour of a scene should be grey. And if it isn't, it means that it needs to be corrected to grey. So, very simply (in pseudo-python), if you have an image RGB:
cc[0] = np.mean(RGB[:,0]) # calculating channel-wise average
cc[1] = np.mean(RGB[:,1])
cc[2] = np.mean(RGB[:,2])
cc = cc / np.sqrt((cc**2).sum()) # normalise the light (you might want to
# play with this a bit)
RGB /= cc # divide every pixel by the estimated light
Note that here I'm assuming that RGB is an array of floats with values between 0 and 1. Something else that helps is to exclude from the average pixels that contain values below and above certain thresholds (e.g., below 0.05 and above 0.95). This way you ignore pixels whose value is heavily influenced by noise (small values) and pixels that saturated the camera sensor and whose colour may not be reliable (large values).
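Since the question is about C++, here is a rough C++ sketch of Grey World including the threshold exclusion just described. It assumes interleaved float RGB in the range 0 to 1, and greyWorld, lo and hi are illustrative names. Note that it uses a different normalisation from the snippet above: each channel is scaled so that its mean matches the overall grey mean, which keeps the brightness roughly unchanged.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Grey World balance on interleaved float RGB in [0, 1].
// Pixels with any channel outside [lo, hi] are left out of the averages.
void greyWorld(std::vector<float>& rgb, float lo = 0.05f, float hi = 0.95f)
{
    assert(rgb.size() % 3 == 0);

    double sum[3] = {0.0, 0.0, 0.0};
    std::size_t used = 0;
    for (std::size_t i = 0; i < rgb.size(); i += 3) {
        const float mn = std::min({rgb[i], rgb[i + 1], rgb[i + 2]});
        const float mx = std::max({rgb[i], rgb[i + 1], rgb[i + 2]});
        if (mn < lo || mx > hi) continue;  // likely noisy or saturated
        for (int c = 0; c < 3; ++c) sum[c] += rgb[i + c];
        ++used;
    }
    // Nothing reliable to estimate the light from: leave the image as is.
    if (used == 0 || sum[0] <= 0.0 || sum[1] <= 0.0 || sum[2] <= 0.0) return;

    // Target grey level: the mean of the three channel sums.
    const double grey = (sum[0] + sum[1] + sum[2]) / 3.0;
    for (std::size_t i = 0; i < rgb.size(); i += 3) {
        for (int c = 0; c < 3; ++c) {
            const double v = rgb[i + c] * grey / sum[c];
            rgb[i + c] = static_cast<float>(std::min(std::max(v, 0.0), 1.0));
        }
    }
}
```

Comparing the three channel sums before correction would also give a crude score of how unbalanced the image is, under the same grey-world assumption.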

How can I pixelate a 1d array

I want to pixelate an image stored in a 1D array, but I am not sure how to do it. This is what I have come up with so far.
The value of pixelation is currently 3 for testing purposes.
Currently it just creates a section of randomly coloured pixels along the left third of the image. If I increase the value of pixelation, the number of randomly coloured pixels decreases, and vice versa. What am I doing wrong?
I have already implemented rotating, reading and saving the image; this is a separate function that I need assistance with.
picture pixelate( const std::string& file_name, picture& tempImage, int& pixelation /* TODO: OTHER PARAMETERS HERE */)
{
picture pixelated = tempImage;
RGB tempPixel;
tempPixel.r = 0;
tempPixel.g = 0;
tempPixel.b = 0;
int counter = 0;
int numtimesrun = 0;
for (int x = 1; x<tempImage.width; x+=pixelation)
{
for (int y = 1; y<tempImage.height; y+=pixelation)
{
//RGB tempcol;
//tempcol for pixelate
for (int i = 1; i<pixelation; i++)
{
for (int j = 1; j<pixelation; j++)
{
tempPixel.r +=tempImage.pixel[counter+pixelation*numtimesrun].colour.r;
tempPixel.g +=tempImage.pixel[counter+pixelation*numtimesrun].colour.g;
tempPixel.b +=tempImage.pixel[counter+pixelation*numtimesrun].colour.b;
counter++;
//read colour
}
}
for (int k = 1; k<pixelation; k++)
{
for (int l = 1; l<pixelation; l++)
{
pixelated.pixel[numtimesrun].colour.r = tempPixel.r/pixelation;
pixelated.pixel[numtimesrun].colour.g = tempPixel.g/pixelation;
pixelated.pixel[numtimesrun].colour.b = tempPixel.b/pixelation;
//set colour
}
}
counter = 0;
numtimesrun++;
}
cout << x << endl;
}
cout << "Image successfully pixelated." << endl;
return pixelated;
}
I'm not too sure what you really want to do with your code, but I can see a few problems.
For one, you use for() loops with variables starting at 1. That's certainly wrong. Arrays in C/C++ start at 0.
The other main problem I can see is the pixelation parameter. You use it to increase x and y without knowing (at least in that function) whether width and height are multiples of it. If not, you will definitely be missing pixels on the right edge and at the bottom (which edges will depend on the orientation, of course). Again, it very much depends on what you're trying to achieve.
Also the i and j loops start at the position defined by counter and numtimesrun which means that the last line you want to hit is not tempImage.width or tempImage.height. With that you are rather likely to have many overflows. Actually that would also explain the problems you see on the edges. (see update below)
Another potential problem (I cannot tell for sure without seeing the structure declaration) is that the sum using tempPixel.c += <value> may overflow. If the RGB components are defined as unsigned char (rather common) then you will definitely get overflows, so your sum is broken if that's the case. If that structure uses floats, then you're good.
Note also that your average is wrong. You are adding source data for pixelation x pixelation pixels and your average is calculated as sum / pixelation. So you get a result which is pixelation times too large. You probably wanted sum / (pixelation * pixelation).
Your first loop with i and j computes a sum. The math is most certainly wrong. The counter + pixelation * numtimesrun expression will start reading at the second line, it seems. However, you are reading i * j values. That being said, it may be what you are trying to do (i.e. a moving average) in which case it could be optimized but I'll leave that out for now.
Update
If I understand what you are doing, a representation would be something like a filter. There is a picture of a 3x3:
.+. *
+*+ =>
.+.
What is on the left is what you are reading. This means the source needs to be at least 3x3. What I show on the right is the result. As we can see, the result needs to be 1x1. From what I see in your code you do not take that in account at all. (the varied characters represent varied weights, in your case all weights are 1.0).
You have two ways to handle that problem:
The resulting image has a size of width - pixelation * 2 + 1 by height - pixelation * 2 + 1; in this case you keep one result and do not care about the edges...
You rewrite the code to handle edges. This means you use less source data to compute the resulting edges. Another way is to compute the edge cases and save that in several output pixels (i.e. duplicate the pixels on the edges).
Update 2
Hmmm... looking at your code again, it seems that you compute the average of the 3x3 and save it in the 3x3:
.+. ***
+*+ => ***
.+. ***
Then the problem is different. The numtimesrun is wrong. In your k and l loops you save the average pixelation * pixelation times to the SAME pixel, and that index advances by one each time... so you are doing what I showed in my first update, but it looks like you were trying to do what is shown in my 2nd update.
The numtimesrun could be increased by pixelation each time:
numtimesrun += pixelation;
However, that's not enough to fix your k and l loops. There you probably need to calculate the correct destination. Maybe something like this (also requires a reset of the counter before the loop):
counter = 0;
... for loops ...
pixelated.pixel[counter+pixelation*numtimesrun].colour.r = ...;
... (take care of g and b)
++counter;
Yet again, I cannot tell for sure what you are trying to do, so I do not know why you'd want to copy the same pixel pixelation x pixelation times. But that explains why you get data only at the left (or top) of the image (very much depends on the orientation, one side for sure. And if that's 1/3rd then pixelation is probably 3.)
WARNING: if you implement the save properly, you'll experience crashes if you do not take care of the overflows mentioned earlier.
Update 3
As explained by Mark in the comment below, you have an array representing a 2d image. In that case, your counter variable is completely wrong since this is 100% linear whereas the 2d image is not. The 2nd line is width further away. At this point, you read the first 3 pixels at the top-left, then the next 3 pixels on the same, and finally the next 3 pixels still on the same line. Of course, it could be that your image is thus defined and these pixels are really one after another, although it is not very likely...
Mark's answer is concise and gives you the information necessary to access the correct pixels. However, you will still be hit by the overflow and possibly the fact that the width and height parameters are not a multiple of pixelation...
I don't do a lot of C++, but here's a pixelate function I wrote for Processing. It takes an argument of the width/height of the pixels you want to create.
void pixelateImage(int pxSize) {
// use ratio of height/width...
float ratio;
if (width < height) {
ratio = height/width;
}
else {
ratio = width/height;
}
// ... to set pixel height
int pxH = int(pxSize * ratio);
noStroke();
for (int x=0; x<width; x+=pxSize) {
for (int y=0; y<height; y+=pxH) {
fill(p.get(x, y));
rect(x, y, pxSize, pxH);
}
}
}
Without the built-in rect() function you'd have to write pixel-by-pixel using another two for loops:
for (int px=0; px<pxSize; px++) {
for (int py=0; py<pxH; py++) {
pixelated.pixel[py * tempImage.width + px].colour.r = tempPixel.r;
pixelated.pixel[py * tempImage.width + px].colour.g = tempPixel.g;
pixelated.pixel[py * tempImage.width + px].colour.b = tempPixel.b;
}
}
Generally when accessing an image stored in a 1D buffer, each row of the image will be stored as consecutive pixels and the next row will follow immediately after. The way to address into such a buffer is:
image[y*width+x]
For your purposes you want both inner loops to generate coordinates that go from the top and left of the pixelation square to the bottom right.
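Putting the addressing rule together with the earlier remarks about overflow, the divisor and the image edges, a pixelate routine might look roughly like this. It is a sketch under assumptions: Rgb is a hypothetical stand-in for the asker's pixel type, the image is a flat row-major vector, and blocks at the right and bottom edges are simply allowed to be smaller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Rgb { unsigned char r, g, b; };  // hypothetical pixel type

// Replace every block x block square by its average colour, in place.
void pixelate(std::vector<Rgb>& image, int width, int height, int block)
{
    assert(width > 0 && height > 0 && block > 0);
    assert(image.size() == static_cast<std::size_t>(width) * height);

    for (int by = 0; by < height; by += block) {
        const int yEnd = std::min(by + block, height);  // clip at the bottom
        for (int bx = 0; bx < width; bx += block) {
            const int xEnd = std::min(bx + block, width);  // clip at the right

            // Wide accumulators so the sums cannot overflow an unsigned char.
            unsigned long sumR = 0, sumG = 0, sumB = 0;
            for (int y = by; y < yEnd; ++y) {
                for (int x = bx; x < xEnd; ++x) {
                    const Rgb& p = image[static_cast<std::size_t>(y) * width + x];
                    sumR += p.r;
                    sumG += p.g;
                    sumB += p.b;
                }
            }

            // Divide by the number of pixels actually read in this block.
            const unsigned long n =
                static_cast<unsigned long>(yEnd - by) * (xEnd - bx);
            const Rgb avg = {static_cast<unsigned char>(sumR / n),
                             static_cast<unsigned char>(sumG / n),
                             static_cast<unsigned char>(sumB / n)};

            for (int y = by; y < yEnd; ++y) {
                for (int x = bx; x < xEnd; ++x) {
                    image[static_cast<std::size_t>(y) * width + x] = avg;
                }
            }
        }
    }
}
```

Working in place is safe here because the blocks never overlap, so a block is fully read before any of its pixels are overwritten.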

Template Matching with Mask

I want to perform template matching with a mask. In general, template matching can be made faster by converting the image from the spatial domain into the frequency domain. But is there any method I can apply if I want to do the same with a mask? I'm using OpenCV with C++. Is there a matching function already in OpenCV for this task?
My current Approach:
Bitwise Xor Image A & Image B with Mask.
Count the Non-Zero Pixels.
Fill the Resultant matrix with this count.
Search for maxima.
Few parameters I'm guessing now are:
Skip the Tile position if the matches are less than 25%.
Skip the tile position if the previous tile has fewer than 50% matches.
My question: is there any algorithm to do this matching already? Is there any mathematical operation which can speed up this process?
With binary images, you can directly use Hu moments and the Mahalanobis distance to find out whether image A is similar to image B. If the distance tends to 0, then the images are the same.
Of course you can also use feature detectors to see what matches, but for pictures like these Hu moments and feature detectors will give approximately the same results, and Hu moments are more efficient.
Using findContours, you can extract the black regions inside the white star and fill them, in order to have image A = image B.
Other approach: using findContours on your mask and apply the result to Image A (extracting the Region of Interest), you can extract what's inside the star and count how many black pixels you have (the mismatching ones).
I have the same requirement and tried almost the same approach. As in the image, I want to match the castle. The castle has a different shield image, a variable-length clan name and a grass background (the image comes from the game Clash of Clans). The normal OpenCV matchTemplate does not work, so I wrote my own.
I follow the way matchTemplate creates a result image, but with a different algorithm.
The core idea is to count the matched pixels under the mask. The code below is simple.
This works fine, but the time cost is high. As you can see, it takes 457 ms.
Now I am working on the optimization.
The source and template images are both CV_8UC3 and the mask image is CV_8U. Matching a single channel is sufficient and is faster, but the cost is still high.
Mat tmp(matTempl.rows, matTempl.cols, matTempl.type()); // cv::Mat takes (rows, cols, type)
int matchCount = 0;
float maxVal = 0;
double areaInvert = 1.0 / countNonZero(matMask);
for (int j = 0; j < resultRows; j++)
{
float* data = imgResult.ptr<float>(j);
for (int i = 0; i < resultCols; i++)
{
Mat matROI(matSource, Rect(i, j, matTempl.cols, matTempl.rows));
tmp.setTo(Scalar(0));
bitwise_xor(matROI, matTempl, tmp);
bitwise_and(tmp, matMask, tmp);
data[i] = 1.0f - float(countNonZero(tmp) * areaInvert);
if (data[i] > matchingDegree)
{
SRect rc;
rc.left = i;
rc.top = j;
rc.right = i + imgTemplate.cols;
rc.bottom = j + imgTemplate.rows;
rcOuts.push_back(rc);
if ( data[i] > maxVal)
{
maxVal = data[i];
maxIndex = rcOuts.size() - 1;
}
if (++matchCount == maxMatchs)
{
Log_Warn("Too many matches, stopped at: " << matchCount);
return true;
}
}
}
}
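Stripped of the OpenCV calls, the core idea of counting matching pixels under the mask can be sketched in plain C++ as follows. This is an illustration only: maskedMatchScore is a hypothetical name, all three images are assumed to be single-channel 8-bit buffers stored row by row, and a pixel counts as a match only when it is exactly equal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fraction of masked template pixels that equal the source pixels when the
// template's top-left corner is placed at (ox, oy) in the source.
double maskedMatchScore(const std::vector<std::uint8_t>& src, int srcW, int srcH,
                        const std::vector<std::uint8_t>& tmpl,
                        const std::vector<std::uint8_t>& mask,
                        int tW, int tH, int ox, int oy)
{
    assert(tW > 0 && tH > 0 && ox >= 0 && oy >= 0);
    assert(ox + tW <= srcW && oy + tH <= srcH);
    assert(src.size() == static_cast<std::size_t>(srcW) * srcH);
    assert(tmpl.size() == static_cast<std::size_t>(tW) * tH);
    assert(mask.size() == tmpl.size());

    std::size_t masked = 0;   // pixels selected by the mask
    std::size_t matched = 0;  // selected pixels that are equal
    for (int y = 0; y < tH; ++y) {
        for (int x = 0; x < tW; ++x) {
            const std::size_t t = static_cast<std::size_t>(y) * tW + x;
            if (mask[t] == 0) continue;  // ignored by the mask
            ++masked;
            const std::size_t s =
                static_cast<std::size_t>(oy + y) * srcW + (ox + x);
            if (src[s] == tmpl[t]) ++matched;
        }
    }
    return masked == 0 ? 0.0 : static_cast<double>(matched) / masked;
}
```

Evaluating this score at every valid (ox, oy) produces the same kind of result map as the loop above. Testing only a subset of the masked pixels first would be one way to reject poor positions early.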
It says I have not enough reputations to post image....
http://i.stack.imgur.com/mJrqU.png
New added:
I succeeded in optimizing the algorithm by using key points. Calculating all the points is costly; it is much faster to calculate only several key points. See the picture: the cost decreases greatly, and it is now about 7 ms.
I still can not post image, please visit: http://i.stack.imgur.com/ePcD9.png
There is a technical formulation for template matching with mask in OpenCV Documentation, which works well. It can be used by calling cv::matchTemplate and its source code is also available under the Intel License.

Vertically flipping an Char array: is there a more efficient way?

Lets start with some code:
QByteArray OpenGLWidget::modifyImage(QByteArray imageArray, const int width, const int height){
if (vertFlip){
/* Each pixel consists of four unsigned chars: Red Green Blue Alpha.
* The field is normally 640*480, which means that the whole picture is in fact 640*4 uChars wide.
* The whole ByteArray is one-dimensional, which means that index 640*4 is the red of the first pixel of the second row.
* This function is EXTREMELY SLOW
*/
QByteArray tempArray = imageArray;
for (int h = 0; h < height; ++h){
for (int w = 0; w < width/2; ++w){
for (int i = 0; i < 4; ++i){
imageArray.data()[h*width*4 + 4*w + i] = tempArray.data()[h*width*4 + (4*width - 4*w) + i ];
imageArray.data()[h*width*4 + (4*width - 4*w) + i] = tempArray.data()[h*width*4 + 4*w + i];
}
}
}
}
return imageArray;
}
This is the code I use right now to vertically flip an image which is 640*480 (the image is not guaranteed to be 640*480, but it mostly is). The color encoding is RGBA, which means that the total array size is 640*480*4. I get the images at 30 FPS, and I want to show them on the screen at the same rate.
On an older CPU (Athlon X2) this code is just too much: the CPU is racing to keep up with the 30 FPS, so the question is: can I do this more efficiently?
I am also working with OpenGL. Does it have a gimmick I am not aware of that can flip images with relatively low CPU/GPU usage?
According to this question, you can flip an image in OpenGL by scaling it by (1,-1,1). This question explains how to do transformations and scaling.
You can improve at least by doing it blockwise, making use of the cache architecture. In your example one of the accesses (either the read OR the write) will be off-cache.
For a start it can help to "capture scanlines" if you're using two loops to loop through the pixels of an image, like so:
for (int y = 0; y < height; ++y)
{
// Capture scanline.
char* scanline = imageArray.data() + y*width*4;
for (int x = 0; x < width/2; ++x)
{
const int flipped_x = width - x-1;
for (int i = 0; i < 4; ++i)
swap(scanline[x*4 + i], scanline[flipped_x*4 + i]);
}
}
Another thing to note is that I used swap instead of a temporary image. That'll tend to be more efficient since you can just swap using registers instead of loading pixels from a copy of the entire image.
It also generally helps to use a 32-bit integer instead of working one byte at a time if you're going to be doing anything like this. If you're working with pixels stored as 8-bit types but know that each pixel is 32 bits, as in your case, you can generally get away with a cast to uint32_t*, e.g.
for (int y = 0; y < height; ++y)
{
uint32_t* scanline = (uint32_t*)imageArray.data() + y*width;
std::reverse(scanline, scanline + width);
}
At this point you might parallelize the y loop. Flipping an image horizontally (it should be "horizontal" if I understood your original code correctly) in this way is a little bit tricky with the access patterns, but you should be able to get quite a decent boost using the above techniques.
"I am also working with OpenGL, does that have a gimmick I am not aware of that can flip images with relatively low CPU/GPU usage?"
Naturally the fastest way to flip images is to not touch their pixels at all and just save the flipping for the final part of the pipeline when you render the result. For this you might render a texture in OGL with negative scaling instead of modifying the pixels of a texture.
Another thing that's really useful in video and image processing is to represent an image to process like this for all your image operations:
struct Image32
{
uint32_t* pixels;
int32_t width;
int32_t height;
int32_t x_stride;
int32_t y_stride;
};
The stride fields are what you use to get from one scanline (row) of an image to the next vertically and from one column to the next horizontally. When you use this representation, you can use negative values for the stride and offset the pixels pointer accordingly. You can also use the stride fields to, say, render only every other scanline of an image for fast interactive half-res scanline previews by using y_stride=width*2 and height/=2. You can quarter-res an image by setting the x stride to 2 and the y stride to 2*width and then halving the width and height. You can render a cropped image without making your blit functions accept a boatload of parameters by just modifying these fields and keeping the y stride at the full image width to get from one row of the cropped section of the image to the next:
// Using the stride representation of Image32, this can now
// blit a cropped source, a horizontally flipped source,
// a vertically flipped source, a source flipped both ways,
// a half-res source, a quarter-res source, a quarter-res
// source that is horizontally flipped and cropped, etc,
// and all without modifying the source image in advance
// or having to accept all kinds of extra drawing parameters.
void blit(int dst_x, int dst_y, Image32 dst, Image32 src);
// We don't have to do things like this (and I think I lost
// some capabilities with this version below but it hurts my
// brain too much to think about what capabilities were lost):
void blit_gross(int dst_x, int dst_y, int dst_w, int dst_h, uint32_t* dst,
int src_x, int src_y, int src_w, int src_h,
const uint32_t* src, bool flip_x, bool flip_y);
By using negative values and passing it to an image operation (ex: a blit operation), the result will naturally be flipped without having to actually flip the image. It'll end up being "drawn flipped", so to speak, just as with the case of using OGL with a negative scaling transformation matrix.
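As a small illustration of the stride trick, the sketch below builds flipped views of a buffer without copying or touching any pixel. The struct mirrors the Image32 above; pixelAt, flippedTopToBottom and mirroredLeftToRight are hypothetical helper names, and strides are assumed to be counted in pixels rather than bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Image32 {
    std::uint32_t* pixels;  // address of the pixel at (0, 0) of this view
    std::int32_t width;
    std::int32_t height;
    std::int32_t x_stride;  // step, in pixels, to the next column
    std::int32_t y_stride;  // step, in pixels, to the next row
};

// Access pixel (x, y) of a view through its strides.
inline std::uint32_t& pixelAt(const Image32& img, int x, int y)
{
    assert(x >= 0 && x < img.width && y >= 0 && y < img.height);
    return img.pixels[static_cast<std::ptrdiff_t>(y) * img.y_stride +
                      static_cast<std::ptrdiff_t>(x) * img.x_stride];
}

// View with the row order reversed: start at the last row, step backwards.
inline Image32 flippedTopToBottom(Image32 img)
{
    img.pixels += static_cast<std::ptrdiff_t>(img.height - 1) * img.y_stride;
    img.y_stride = -img.y_stride;
    return img;
}

// View with the column order reversed: start at the last column.
inline Image32 mirroredLeftToRight(Image32 img)
{
    img.pixels += static_cast<std::ptrdiff_t>(img.width - 1) * img.x_stride;
    img.x_stride = -img.x_stride;
    return img;
}
```

Any routine written against pixelAt (or against the strides directly) then works on the original image and on a flipped view alike, which is the point of passing views instead of flags.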