MFC visibile function - mfc

I have an assignment using MFC which is a completely foreign language to me.
I have to be able to upload Image1, and Image2 into 2 picture controls. And using a slider: when it is on the far left, you see Image1 in a third picture control, when it is on the far right you see Image2. Anywhere in between you should see a transition.
I have most of the assignment down, the only thing I have left is this transition. I have an idea of what I need to do and I'm using a function similar to Allegro. I just can't seem to find the syntax for MFC.
This is probably as wrong as it gets so any help at all would be appreciated enormously. Thanks!
I have:
int nPos = m_Slider.GetPos();
int nWidth1 = Image1.GetWidth();
int nHeight1 = Image1.GetHeight();
int nWidth2 = Image2.GetWidth();
int nHeight2 = Image2.GetHeight();
int nWidth3 = (nWidth1 +nWidth2)/2;
int nHeight3 = (nHeight1 + nHeight2)/2;
int nPixel1;
int nPixel2;
int nPixel3;
int i1, i2, i3, j1, j2, j3;
Image3.Create(nWidth3, nHeight3, 24);
for(i3=0; i3 < nWidth3; i3++){
for(j3=0; j3 < nHeight3; j3++){
i1 = i3 * nWidth1 / nWidth3;
i2 = i3 * nWidth2 / nWidth3;
j1 = j3 * nHeight1 / nHeight3;
j2 = j3 * nHeight2 / nHeight3;
getpixel(nPixel1, i1, j1);
getpixel(nPixel2, i2, j2);
putpixel(nPixel3, i3, j3);
nPixel3 = (nPixel1 * (100-nPos) + nPixel2*nPos) *Image3.visible/100;
}
}

You need a device context (DC) for the pictures and the transition. Load the pictures into the DC (you can do this in the background using a CMemDC), then you can calculate the transition and paint it into the third DC. The DC supports the functionality you want (GetPixel etc.)

Related

Producing a JPG via Ray Tracing(Ray Tracing in one Weekend)

I'm following the book Ray Tracing in on Weekend in which the author produces a small Ray Tracer using plain C++ and the result is a PPM image.
The author's code
Which produces this PPM image.
So the author suggests as an exercise to make it so the program produces a JPG image via the stb_image library. So far I tried by changing the original code like this :
#include <fstream>
#define STB_IMAGE_WRITE_IMPLEMENTATION
#include "stb_image_write.h"
struct RGB{
unsigned char R;
unsigned char G;
unsigned char B;
};
int main(){
int nx = 200;
int ny = 100;
struct RGB data[nx][ny];
for(int j = ny - 1 ; j >= 0 ; j-- ){
for(int i = 0; i < nx ; i++){
float r = float(i) / float(nx);
float g = float(j) / float(ny);
float b = 0.2;
int ir = int(255.99 * r);
int ig = int(255.99 * g);
int ib = int(255.99 * b);
data[i][j].R = ir;
data[i][j].G = ig;
data[i][j].B = ib;
}
}
stbi_write_jpg("image.jpg", nx, ny, 3, data, 100);
}
And this is the result:
As you can see my result is slightly different and I don't know why.
The main problems are:
That the black color shows at the top left of the screen, and in general the colors don't show in the correct order left to right, top to bottom.
The image is "split" in half and the result is actually the original image of the author but produced in a pair ????
Probably I'm misunderstanding something about the way STB_IMAGE_WRITE is supposed to be used so if anyone experienced with this library can tell me what's going on I would be grateful.
EDIT 1 I implemented the changes suggested by #1201ProgramAlarm in comments plus I changed struct RGB data[nx][ny] to struct RGB data[ny][nx] ,so the result now is this.
The library should be working as intended. The problem is how you give the data and what you should do is invert the y axis. So, when you are at index 4 from the start, you should give the color of the index 4 from the end.
Taking the result from your edit, just change the line:
float g = float(j) / float(ny);
to
float g = float(ny - 1 - j) / float(ny);
You've got your indexes wrong for data. The inner loop variable should be the second subscript (data[j][i]).

Poor performance of large number of viz::Widget3D in OpenCV

Using several thousands of viz::Widget3D in OpenCV is very slow. I tried v3.4.3 and v4.0.0 on Windows with Visual Studio 2017. This code snippet takes over 5s to execute the timed part (t0 to t1) and viewing is very choppy afterwards:
using namespace std;
using namespace cv;
int main()
{
constexpr double n = 100;
viz::Viz3d window("Viz3d");
window.setFullScreen();
window.showWidget("Coordinate Widget", viz::WCoordinateSystem());
window.spinOnce();
auto t0 = chrono::high_resolution_clock::now();
for (double x = 0; x < n; x += 1)
for (double y = 0; y < n; y += 1)
window.showWidget(to_string(x+y*n), viz::WArrow({x, y, 0}, {x+1, y+1, 0}, 0.02, viz::Color::bluberry()));
auto t1 = chrono::high_resolution_clock::now();
window.spin();
fmt::print("\nTime: {}ms", chrono::duration_cast<chrono::milliseconds>(t1-t0).count());
fmt::print("\nVersion {}.{}.{}{}\n", CV_VERSION_MAJOR, CV_VERSION_MINOR, CV_VERSION_REVISION, CV_VERSION_STATUS);
return 0;
}
It seems that widget management imposes a huge overhead. Is there any other way to display thousands of widgets (text, lines, arrow) with low latency? I tried viz::WWidgetMerger and it's even slower.
EDIT
BTW, I need only "immediate" mode rendering. I'm not modifying the widgets after they are shown.
If you have thousands of widgets, creating a single WidgetMerger will be ungodly slow when adding all the widgets together (if you were to visualize the time of each individual addition you will see the exponential slowdown).
However, if you instead have multiple mergers it should be smooth with absolutely no choppiness when displaying, while only being a little bit slower on creation.
You can try doing something similar to this:
std::vector<viz::WWidgetMerger> mergers(n);
for (double x = 0; x < n; x += 1) {
viz::WWidgetMerger merger = mergers[x];
for (double y = 0; y < n; y += 1) {
merger.addWidget(viz::WArrow({x, y, 0}, {x+1, y+1, 0}, 0.02, viz::Color::bluberry()));
}
merger.finalize();
window.showWidget("merger_" + std::to_string(x), merger);
}
Performance differences:
Your code as is: Time: 2522ms extremely choppy to the point of being almost unusable.
Using multiple mergers: Time: 6097ms buttery smooth, no choppiness
Using a single merger: Time > 5 minutes. I didn't even wait for it to finish it was taking so long.
So while using multiple mergers is a slight slowdown initially when creating the widgets, for the display performance, I find it to be worth it.

Drawing circle, OpenGL style

I have a 13 x 13 array of pixels, and I am using a function to draw a circle onto them. (The screen is 13 * 13, which may seem strange, but its an array of LED's so that explains it.)
unsigned char matrix[13][13];
const unsigned char ON = 0x01;
const unsigned char OFF = 0x00;
Here is the first implementation I thought up. (It's inefficient, which is a particular problem as this is an embedded systems project, 80 MHz processor.)
// Draw a circle
// mode is 'ON' or 'OFF'
inline void drawCircle(float rad, unsigned char mode)
{
for(int ix = 0; ix < 13; ++ ix)
{
for(int jx = 0; jx < 13; ++ jx)
{
float r; // Radial
float s; // Angular ("theta")
matrix_to_polar(ix, jx, &r, &s); // Converts polar coordinates
// specified by r and s, where
// s is the angle, to index coordinates
// specified by ix and jx.
// This function just converts to
// cartesian and then translates by 6.0.
if(r < rad)
{
matrix[ix][jx] = mode; // Turn pixel in matrix 'ON' or 'OFF'
}
}
}
}
I hope that's clear. It's pretty simple, but then I programmed it so I know how it's supposed to work. If you'd like more info / explanation then I can add some more code / comments.
It can be considered that drawing several circles, eg 4 to 6, is very slow... Hence I'm asking for advice on a more efficient algorithm to draw the circles.
EDIT: Managed to double the performance by making the following modification:
The function calling the drawing used to look like this:
for(;;)
{
clearAll(); // Clear matrix
for(int ix = 0; ix < 6; ++ ix)
{
rad[ix] += rad_incr_step;
drawRing(rad[ix], rad[ix] - rad_width);
}
if(rad[5] >= 7.0)
{
for(int ix = 0; ix < 6; ++ ix)
{
rad[ix] = rad_space_step * (float)(-ix);
}
}
writeAll(); // Write
}
I added the following check:
if(rad[ix] - rad_width < 7.0)
drawRing(rad[ix], rad[ix] - rad_width);
This increased the performance by a factor of about 2, but ideally I'd like to make the circle drawing more efficient to increase it further. This checks to see if the ring is completely outside of the screen.
EDIT 2: Similarly adding the reverse check increased performance further.
if(rad[ix] >= 0.0)
drawRing(rad[ix], rad[ix] - rad_width);
Performance is now pretty good, but again I have made no modifications to the actual drawing code of the circles and this is what I was intending to focus on with this question.
Edit 3: Matrix to polar:
inline void matrix_to_polar(int i, int j, float* r, float* s)
{
float x, y;
matrix_to_cartesian(i, j, &x, &y);
calcPolar(x, y, r, s);
}
inline void matrix_to_cartesian(int i, int j, float* x, float* y)
{
*x = getX(i);
*y = getY(j);
}
inline void calcPolar(float x, float y, float* r, float* s)
{
*r = sqrt(x * x + y * y);
*s = atan2(y, x);
}
inline float getX(int xc)
{
return (float(xc) - 6.0);
}
inline float getY(int yc)
{
return (float(yc) - 6.0);
}
In response to Clifford that's actually a lot of function calls if they are not inlined.
Edit 4: drawRing just draws 2 circles, firstly an outer circle with mode ON and then an inner circle with mode OFF. I am fairly confident that there is a more efficient method of drawing such a shape too, but that distracts from the question.
You're doing a lot of calculations that aren't really needed. For example, you're calculating the angle of the polar coordinates, but never use it. The square root can also easily be avoided by comparing the square of the values.
Without doing anything fancy, something like this should be a good start:
int intRad = (int)rad;
int intRadSqr = (int)(rad * rad);
for (int ix = 0; ix <= intRad; ++ix)
{
for (int jx = 0; jx <= intRad; ++jx)
{
if (ix * ix + jx * jx <= radSqr)
{
matrix[6 - ix][6 - jx] = mode;
matrix[6 - ix][6 + jx] = mode;
matrix[6 + ix][6 - jx] = mode;
matrix[6 + ix][6 + jx] = mode;
}
}
}
This does all the math in integer format, and takes advantage of the circle symmetry.
Variation of the above, based on feedback in the comments:
int intRad = (int)rad;
int intRadSqr = (int)(rad * rad);
for (int ix = 0; ix <= intRad; ++ix)
{
for (int jx = 0; ix * ix + jx * jx <= radSqr; ++jx)
{
matrix[6 - ix][6 - jx] = mode;
matrix[6 - ix][6 + jx] = mode;
matrix[6 + ix][6 - jx] = mode;
matrix[6 + ix][6 + jx] = mode;
}
}
Don't underestimate the cost of even basic arithmetic using floating point on a processor with no FPU. It seems unlikely that floating point is necessary, but the details of its use are hidden in your matrix_to_polar() implementation.
Your current implementation considers every pixel as a candidate - that is also unnecessary.
Using the equation y = cy ± √[rad2 - (x-cx)2] where cx, cy is the centre (7, 7 in this case), and a suitable integer square root implementation, the circle can be drawn thus:
void drawCircle( int rad, unsigned char mode )
{
int r2 = rad * rad ;
for( int x = 7 - rad; x <= 7 + rad; x++ )
{
int dx = x - 7 ;
int dy = isqrt( r2 - dx * dx ) ;
matrix[x][7 - dy] = mode ;
matrix[x][7 + dy] = mode ;
}
}
In my test I used the isqrt() below based on code from here, but given that the maximum r2 necessary is 169 (132, you could implement a 16 or even 8 bit optimised version if necessary. If your processor is 32 bit, this is probably fine.
uint32_t isqrt(uint32_t n)
{
uint32_t root = 0, bit, trial;
bit = (n >= 0x10000) ? 1<<30 : 1<<14;
do
{
trial = root+bit;
if (n >= trial)
{
n -= trial;
root = trial+bit;
}
root >>= 1;
bit >>= 2;
} while (bit);
return root;
}
All that said, on such a low resolution device, you will probably get better quality circles and faster performance by hand generating bitmap lookup tables for each radius required. If memory is an issue, then a single circle needs only 7 bytes to describe a 7 x 7 quadrant that you can reflect to all three quadrants, or for greater performance you could use 7 x 16 bit words to describe a semi-circle (since reversing bit order is more expensive than reversing array access - unless you are using an ARM Cortex-M with bit-banding). Using semi-circle look-ups, 13 circles would need 13 x 7 x 2 bytes (182 bytes), quadrant look-ups would be 7 x 8 x 13 (91 bytes) - you may find that is fewer bytes that the code space required to calculate the circles.
For a slow embedded device with only a 13x13 element display, you should really just make a look-up table. For example:
struct ComputedCircle
{
float rMax;
char col[13][2];
};
Where the draw routine uses rMax to determine which LUT element to use. For example, if you have 2 elements with one rMax = 1.4f, the other = 1.7f, then any radius between 1.4f and 1.7f will use that entry.
The column elements would specify zero, one, or two line segments per row, which can be encoded in the lower and upper 4 bits of each char. -1 can be used as a sentinel value for nothing-at-this-row. It is up to you how many look-up table entries to use, but with a 13x13 grid you should be able to encode every possible outcome of pixels with well under 100 entries, and a reasonable approximation using only 10 or so. You can also trade off compression for draw speed as well, e.g. putting the col[13][2] matrix in a flat list and encoding the number of rows defined.
I would accept MooseBoy's answer if only he explained the method he proposes better. Here's my take on the lookup table approach.
Solve it with a lookup table
The 13x13 display is quite small, and if you only need circles which are fully visible within this pixel count, you will get around with a quite small table. Even if you need larger circles, it should be still better than any algorithmic way if you need it to be fast (and have the ROM to store it).
How to do it
You basically need to define how each possible circle looks like on the 13x13 display. It is not sufficient to just produce snapshots for the 13x13 display, as it is likely you would like to plot the circles at arbitrary positions. My take for a table entry would look like this:
struct circle_entry_s{
unsigned int diameter;
unsigned int offset;
};
The entry would map a given diameter in pixels to offsets in a large byte table containing the shape of the circles. For example for diameter 9, the byte sequence would look like this:
0x1CU, 0x00U, /* 000111000 */
0x63U, 0x00U, /* 011000110 */
0x41U, 0x00U, /* 010000010 */
0x80U, 0x80U, /* 100000001 */
0x80U, 0x80U, /* 100000001 */
0x80U, 0x80U, /* 100000001 */
0x41U, 0x00U, /* 010000010 */
0x63U, 0x00U, /* 011000110 */
0x1CU, 0x00U, /* 000111000 */
The diameter specifies how many bytes of the table belong to the circle: one row of pixels are generated from (diameter + 7) >> 3 bytes, and the number of rows correspond to the diameter. The output code of these can be made quite fast, while the lookup table is sufficiently compact to get even larger than the 13x13 display circles defined in it if needed.
Note that defining circles this way for odd and even diameters may or may not appeal you when output by a centre location. The odd diameter circles will appear to have a centre in the "middle" of a pixel, while the even diameter circles will appear to have their centre on the "corner" of a pixel.
You may also find it nice later to refine the overall method so having multiple circles of different apparent sizes, but having the same pixel radius. Depends on what is your goal: if you want some kind of smooth animation, you may get there eventually.
Algorithmic solutions I think mostly will perform poorly here, since with this limited display surface really every pixel's state counts for the appearance.

How to efficiently render a 24-bpp image on a 32-bpp display?

First of all, I'm programming in the kernel context so no existing libraries exist. In fact this code is going to go into a library of my own.
Two questions, one more important than the other:
As the title suggests, how can I efficiently render a 24-bpp image onto a 32-bpp device, assuming that I have the address of the frame buffer?
Currently I have this code:
void BitmapImage::Render24(uint16_t x, uint16_t y, void (*r)(uint16_t, uint16_t, uint32_t))
{
uint32_t imght = Math::AbsoluteValue(this->DIB->GetBitmapHeight());
uint64_t ptr = (uint64_t)this->ActualBMP + this->Header->BitmapArrayOffset;
uint64_t rowsize = ((this->DIB->GetBitsPerPixel() * this->DIB->GetBitmapWidth() + 31) / 32) * 4;
uint64_t oposx = x;
uint64_t posx = oposx;
uint64_t posy = y + (this->DIB->Type == InfoHeaderV1 && this->DIB->GetBitmapHeight() < 0 ? 0 : this->DIB->GetBitmapHeight());
for(uint32_t d = 0; d < imght; d++)
{
for(uint32_t w = 0; w < rowsize / (this->DIB->GetBitsPerPixel() / 8); w++)
{
r(posx, posy, (*((uint32_t*)ptr) & 0xFFFFFF));
ptr += this->DIB->GetBitsPerPixel() / 8;
posx++;
}
posx = oposx;
posy--;
}
}
r is a function pointer to a PutPixel-esque thing that accepts x, y, and colour parameters.
Obviously this code is terribly slow, since plotting pixels one at a time is never a good idea.
For my 32-bpp rendering code (which I also have a question about, more on that later) I can easily Memory::Copy() the bitmap array (I'm loading bmp files here) to the frame buffer.
However, how do I do this with 24bpp images? On a 24bpp display this would be fine but I'm working with a 32bpp one.
One solution I can think of right now is to create another bitmap array which essentially contains values of 0x00(colour) and the use that to draw to the screen -- I don't think this is very good though, so I'm looking for a better alternative.
Next question:
2. Given, for obvious reasons, one cannot simply Memory::Copy() the entire array at once onto the frame buffer, the next best thing would be to copy them row by row.
Is there a better way?
Basically something like this:
for (uint32_t l = 0; l < h; ++l) // l line index in pixels
{
// srcPitch is distance between lines in bytes
char* srcLine = (char*)srcBuffer + l * srcPitch;
unsigned* trgLine = ((unsigned*)trgBuffer) + l * trgPitch;
for (uint32_t c = 0; c < w; ++c) // c is column index in pixels
{
// build target pixel. arrange indexes to fit your render target (0, 1, 2)
++(*trgLine) = (srcLine[0] << 16) | (srcLine[1] << 8)
| srcLine[2] | (0xff << 24);
srcLine += 3;
}
}
A few notes:
- better to write to a different buffer than the render buffer so the image is displayed at once.
- using functions for pixel placement like you did is very (very very) slow.

exchanging 2 memory positions

I am working with OpenCV and Qt, Opencv use BGR while Qt uses RGB , so I have to swap those 2 bytes for very big images.
There is a better way of doing the following?
I can not think of anything faster but looks so simple and lame...
int width = iplImage->width;
int height = iplImage->height;
uchar *iplImagePtr = (uchar *) iplImage->imageData;
uchar buf;
int limit = height * width;
for (int y = 0; y < limit; ++y) {
buf = iplImagePtr[2];
iplImagePtr[2] = iplImagePtr[0];
iplImagePtr[0] = buf;
iplImagePtr += 3;
}
QImage img((uchar *) iplImage->imageData, width, height,
QImage::Format_RGB888);
We are currently dealing with this issue in a Qt application. We've found that the Intel Performance Primitives to be be fastest way to do this. They have extremely optimized code. In the html help files at Intel ippiSwapChannels Documentation they have an example of exactly what you are looking for.
There are couple of downsides
Is the size of the library, but you can link static link just the library routines you need.
Running on AMD cpus. Intel libs run VERY slow by default on AMD. Check out www.agner.org/optimize/asmlib.zip for details on how do a work around.
I think this looks absolutely fine. That the code is simple is not something negative. If you want to make it shorter you could use std::swap:
std::swap(iplImagePtr[0], iplImagePtr[2]);
You could also do the following:
uchar* end = iplImagePtr + height * width * 3;
for ( ; iplImagePtr != end; iplImagePtr += 3) {
std::swap(iplImagePtr[0], iplImagePtr[2]);
}
There's cvConvertImage to do the whole thing in one line, but I doubt it's any faster either.
Couldn't you use one of the following methods ?
void QImage::invertPixels ( InvertMode mode = InvertRgb )
or
QImage QImage::rgbSwapped () const
Hope this helps a bit !
I would be inclined to do something like the following, working on the basis of that RGB data being in three byte blocks.
int i = 0;
int limit = (width * height); // / 3;
while(i != limit)
{
buf = iplImagePtr[i]; // should be blue colour byte
iplImagePtr[i] = iplImagaePtr[i + 2]; // save the red colour byte in the blue space
iplImagePtr[i + 2] = buf; // save the blue color byte into what was the red slot
// i++;
i += 3;
}
I doubt it is any 'faster' but at end of day, you just have to go through the entire image, pixel by pixel.
You could always do this:
int width = iplImage->width;
int height = iplImage->height;
uchar *start = (uchar *) iplImage->imageData;
uchar *end = start + width * height;
for (uchar *p = start ; p < end ; p += 3)
{
uchar buf = *p;
*p = *(p+2);
*(p+2) = buf;
}
but a decent compiler would do this anyway.
Your biggest overhead in these sorts of operations is going to be memory bandwidth.
If you're using Windows then you can probably do this conversion using the BitBlt and two appropriately set up DIBs. If you're really lucky then this could be done in the graphics hardware.
I hate to ruin anyone's day, but if you don't want to go the IPP route (see photo_tom) or pull in an optimized library, you might get better performance from the following (modifying Andreas answer):
uchar *iplImagePtr = (uchar *) iplImage->imageData;
uchar buf;
size_t limit = height * width;
for (size_t y = 0; y < limit; ++y) {
std::swap(iplImagePtr[y * 3], iplImagePtr[y * 3 + 2]);
}
Now hold on, folks, I hear you yelling "but all those extra multiplies and adds!" The thing is, this form of the loop is far easier for a compiler to optimize, especially if they get smart enough to multithread this sort of algorithm, because each pass through the loop is independent of those before or after. In the other form, the value of iplImagePtr was dependent on the value in previous pass. In this form, it is constant throughout the whole loop; only y changes, and that is in a very, very common "count from 0 to N-1" loop construct, so it's easier for an optimizer to digest.
Or maybe it doesn't make a difference these days because optimizers are insanely smart (are they?). I wonder what a benchmark would say...
P.S. If you actually benchmark this, I'd also like to see how well the following performs:
uchar *iplImagePtr = (uchar *) iplImage->imageData;
uchar buf;
size_t limit = height * width;
for (size_t y = 0; y < limit; ++y) {
uchar *pixel = iplImagePtr + y * 3;
std::swap(pix[0], pix[2]);
}
Again, pixel is defined in the loop to limit its scope and keep the optimizer from thinking there's a cycle-to-cycle dependency. If the compiler increments and decrements the stack pointer each time through the loop to "create" and "destroy" pixel, well, it's stupid and I'll apologize for wasting your time.
cvCvtColor(iplImage, iplImage, CV_BGR2RGB);