ImageMagick slow drawing - C++

I am trying to draw a bunch of lines on an image using the ImageMagick library (Magick++ API), and the total execution time turns out to be quite large.
Are there any ways to optimize Magick++ drawing performance?
int SIZE = 700, LINES_NUM = 6000;
Image outputImage(Geometry(SIZE, SIZE), Color("white"));
for (int i = 0; i < LINES_NUM; i++) {
    outputImage.draw(DrawableLine(lines[i].x1, lines[i].y1,
                                  lines[i].x2, lines[i].y2));
}

Try to avoid repeated Magick::Image::draw calls.
std::vector<Magick::Drawable> drawList;
for (int i = 0; i < LINES_NUM; i++) {
    drawList.push_back(DrawableLine(lines[i].x1, lines[i].y1,
                                    lines[i].x2, lines[i].y2));
}
outputImage.draw(drawList);
Also ensure that the ImageMagick libraries have been compiled with OpenMP support. If you're going for speed rather than quality, I would recommend recompiling without High Dynamic Range Imagery (--enable-hdri=no) and with a low quantum depth (--with-quantum-depth=8).
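If you want to verify what the ImageMagick you link against was actually built with, a quick runtime check is possible. This is a minimal sketch; it assumes Magick++ exposes the underlying MagickCore version functions, which it does in the builds I have seen:

#include <Magick++.h>
#include <iostream>

int main(int argc, char** argv) {
    Magick::InitializeMagick(argv[0]);
    size_t depth = 0;
    // The features string lists e.g. "OpenMP" and "HDRI" when those are enabled.
    std::cout << "Features: " << MagickCore::GetMagickFeatures() << "\n"
              << "Quantum:  " << MagickCore::GetMagickQuantumDepth(&depth)
              << " (" << depth << "-bit)" << std::endl;
    return 0;
}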

Related

SDL_gpu: Why is blitting two images in two separate for loops much faster?

I am currently trying out some things in SDL_gpu/C++ and I have the following setup. The images are 32 by 32 pixels each, and the second image is transparent.
//..sdl init..//
GPU_Image* image = GPU_LoadImage("path");
GPU_Image* image2 = GPU_LoadImage("otherpath");
for (int i = 0; i < screenheight; i += 32) {
    for (int j = 0; j < screenwidth; j += 32) {
        GPU_Blit(image, NULL, screen, j, i);
        GPU_Blit(image2, NULL, screen, j, i);
    }
}
This code runs at ~20 FPS on a WQHD-sized screen. However, when I do the following
for (int i = 0; i < screenheight; i += 32) {
    for (int j = 0; j < screenwidth; j += 32) {
        GPU_Blit(image, NULL, screen, j, i);
    }
}
for (int i = 0; i < screenheight; i += 32) {
    for (int j = 0; j < screenwidth; j += 32) {
        GPU_Blit(image2, NULL, screen, j, i);
    }
}
i.e. separate the two blit calls into two different for loops, I get 300 FPS.
Can someone explain this to me, or does anyone have an idea what might be going on here?
While cache locality might have an impact, I don't think it is the main issue here, especially considering the drop in frame time from 50 ms to 3.3 ms.
The call of interest is of course GPU_Blit, which is defined here as doing some checks followed by a call to _gpu_current_renderer->impl->Blit. This Blit function seems to be the same one regardless of the renderer; it's defined here.
A lot of the code in there uses the image parameter, but two functions in particular, prepareToRenderImage and bindTexture, call FlushBlitBuffer several times if you are not rendering the same thing as in the previous blit. That looks to me like an expensive operation. I haven't used SDL_gpu before, so I can't guarantee anything, but rendering something other than what you rendered previously necessarily produces more glDraw* calls than rendering the same thing again and again, and glDraw* calls are usually the most expensive API calls in an OpenGL application.
It's relatively well known in 3D graphics that making as few changes to the context (in this case, the image to blit) as possible can improve performance, simply because it makes better use of the bandwidth between CPU and GPU. A typical example is grouping together all the rendering that uses some particular set of textures (e.g. materials). In your case, it's grouping all the rendering of one image, and then of the other image.
While both examples issue the same number of blits, the first one forces the GPU to make hundreds or thousands (depending on screen size) of texture binds, while the second makes only 2.
The cost of rendering from a texture is very cheap on modern GPUs, while texture binds (switching to another texture) are quite expensive.
Note that you can use a texture atlas to alleviate the texture-bind bottleneck while retaining the desired render order, as sketched below.
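A rough sketch of the atlas idea follows. The atlas image, its path, and the tile coordinates are hypothetical; it assumes both 32x32 tiles were packed side by side into a single 64x32 image:

GPU_Image* atlas = GPU_LoadImage("atlas.png");  // hypothetical packed image
GPU_Rect tile1 = {  0.0f, 0.0f, 32.0f, 32.0f }; // first tile in the atlas
GPU_Rect tile2 = { 32.0f, 0.0f, 32.0f, 32.0f }; // second tile

for (int i = 0; i < screenheight; i += 32) {
    for (int j = 0; j < screenwidth; j += 32) {
        // Same interleaved draw order as the slow version, but both blits
        // sample from the same texture, so no rebind happens between them.
        GPU_Blit(atlas, &tile1, screen, j, i);
        GPU_Blit(atlas, &tile2, screen, j, i);
    }
}

The draw order stays exactly as in the slow version; only the number of distinct textures changes.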

How can I make it faster in C++11 with std::vector?

I have cv::Mat Mat_A and cv::Mat Mat_B, both (800000 x 512) floats, and the code below looks slow.
int rows = Mat_B.rows;
cv::repeat(img, rows, 1, Mat_A); // the 4-argument cv::repeat writes into its last argument and returns void
Mat_A = Mat_A - Mat_B;
cv::pow(Mat_A, 2, Mat_A);
cv::reduce(Mat_A, Mat_A, 1, CV_REDUCE_SUM);
cv::minMaxLoc(Mat_A, &dis, 0, &point, 0);
How can I do this with std::vector? I think it should be faster.
On my 2.4 GHz MacBook Pro it takes 4 seconds. Very slow.
I don't think you should use std::vector for these operations. Image processing (CV, aka computer vision) algorithms tend to be computationally heavy because there is so much data to deal with. The OpenCV 2.0 C++ API is highly optimized for this kind of operation. For example, cv::Mat has a header, and whenever a cv::Mat is copied with copy assignment or the copy constructor, only the header and a pointer to the shared data are copied; reference counting keeps track of the instances. So memory management is done for you, and that's a good thing.
https://docs.opencv.org/2.4/doc/tutorials/core/mat_the_basic_image_container/mat_the_basic_image_container.html
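As a minimal illustration of that header-copy behavior (a sketch I am adding, not from the original answer):

#include <opencv2/opencv.hpp>
#include <cassert>

int main() {
    cv::Mat a = cv::Mat::zeros(2, 2, CV_32F);
    cv::Mat b = a;                      // shallow copy: header only, pixel data is shared
    b.at<float>(0, 0) = 5.0f;
    assert(a.at<float>(0, 0) == 5.0f);  // the write is visible through a as well
    cv::Mat c = a.clone();              // deep copy: c owns its own pixel buffer
    c.at<float>(0, 0) = 9.0f;
    assert(a.at<float>(0, 0) == 5.0f);  // a is unaffected
    return 0;
}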
You could try to compile without debug symbols, i.e. a release build instead of a debug build. You can also compile with optimization flags, e.g. -O3 for gcc, which should reduce the size of your binary and speed up runtime operations. It might make a difference.
https://www.rapidtables.com/code/linux/gcc/gcc-o.html
Another thing you could try is to give your process a higher priority: the higher the priority, the less the process yields the CPU. Again, that might not make a lot of difference; it all depends on the other processes and their priorities, etc.
https://superuser.com/questions/42817/is-there-any-way-to-set-the-priority-of-a-process-in-mac-os-x
I hope that helps a bit.
Well, your thinking is wrong.
Why your program is slow:
Your CPU has to loop through a lot of numbers and do arithmetic on them, so the computational complexity is high. That's why it's slow. Your program's runtime is proportional to the size of Mat A and B; you can check this by reducing or increasing their size.
Can we accelerate it with std::vector?
Sorry, but no. Using std::vector will not reduce the computational complexity. OpenCV's matrix arithmetic is about as good as it gets; rewriting it yourself will only lead to slower code.
How to accelerate the calculation: you need to enable the acceleration options for OpenCV.
You can see them at https://github.com/opencv/opencv/wiki/CPU-optimizations-build-options . Intel provides the MKL library to accelerate matrix calculations. You could try it first.
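A quick way to see which of those options your OpenCV build actually enables (a small sketch using OpenCV's own introspection calls):

#include <opencv2/opencv.hpp>
#include <iostream>

int main() {
    // Prints compiler flags and whether IPP, LAPACK/MKL, SIMD, CUDA, ... are on.
    std::cout << cv::getBuildInformation() << std::endl;
    std::cout << "useOptimized: " << cv::useOptimized() << std::endl;
    return 0;
}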
Personally, the easiest approach would be to use the GPU, but your machine doesn't have one, so that's out of scope here.
You keep iterating over the data again and again to perform independent operations on it.
Something like the following iterates over the data only once.
// assumes Mat_B and img are cv::Mat; requires <opencv2/opencv.hpp> and <limits>
using px_t = float; // you mentioned float, so I'll assume both img and Mat_B hold floats
int rows = Mat_B.rows;
cv::Mat output(1, rows, Mat_B.type());
auto output_ptr = output.ptr<px_t>(0);
auto img_ptr = img.ptr<px_t>(0);
int min_idx = 0;
int max_idx = 0;
px_t min_ele = std::numeric_limits<px_t>::max();
px_t max_ele = std::numeric_limits<px_t>::lowest(); // min() would be the smallest positive float
for (int i = 0; i < rows; ++i)
{
    output_ptr[i] = 0;
    auto mat_row = Mat_B.ptr<px_t>(i);
    for (int j = 0; j < Mat_B.cols; ++j)
    {
        output_ptr[i] += (img_ptr[j] - mat_row[j]) * (img_ptr[j] - mat_row[j]);
    }
    if (output_ptr[i] < min_ele)
    {
        min_idx = i;
        min_ele = output_ptr[i];
    }
    if (output_ptr[i] > max_ele)
    {
        max_idx = i;
        max_ele = output_ptr[i];
    }
}
While I am also not sure whether it is faster, you can do this, assuming Mat_B contains uchar:
std::vector<uchar> array_B;
if (Mat_B.isContinuous())
    array_B.assign(Mat_B.data, Mat_B.data + Mat_B.rows * Mat_B.cols);

SFML - optimize copying from GPU to RAM

My application contains one short function which copies an SFML GPU buffer (sf::RenderTexture converted to sf::Image) into a two-dimensional array of colors (stored in RAM and processed by the CPU). Here is the code:
const sf::Image image = renderTexture.getTexture().copyToImage();
for (Point_t y = 0; y < totalHeight; ++y)
{
    for (Point_t x = 0; x < totalWidth; ++x)
    {
        const sf::Color& c = image.getPixel(x, totalHeight - y - 1);
        // here processing this c variable
    }
}
The problem is: with a 256x64 px screen I am getting around 20 FPS. That's too low; I need around 50 FPS in my application. How can I improve the performance of this process?
Maybe I should use an additional library to speed it up?
EDIT:
Someone suggested that I should use a real imaging library instead of SFML. But the point is that SFML is the perfect library for things like rotating objects in real time, so I will stick with SFML; I just need an optimization or another way to copy the buffer from GPU to CPU.
You don't need to do that; SFML optimizes all of this itself. Do you really need an sf::Image? Can't you just work with sf::Texture and sf::Sprite?
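If CPU-side processing really is unavoidable, one smaller optimization (a sketch I am adding, not part of the original answer; it assumes SFML 2.x) is to read the raw RGBA buffer once via getPixelsPtr() instead of calling getPixel() per pixel. Note this does not remove the expensive copyToImage() transfer itself:

const sf::Image image = renderTexture.getTexture().copyToImage();
const sf::Uint8* pixels = image.getPixelsPtr(); // tightly packed RGBA, row-major
const unsigned int w = image.getSize().x;

for (unsigned int y = 0; y < totalHeight; ++y)
{
    const sf::Uint8* row = pixels + (totalHeight - y - 1) * w * 4;
    for (unsigned int x = 0; x < totalWidth; ++x)
    {
        sf::Color c(row[x * 4], row[x * 4 + 1], row[x * 4 + 2], row[x * 4 + 3]);
        // here processing this c variable
    }
}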
For those interested, I have found a solution.
The GPU-to-CPU copy speedup is provided by this lib: https://github.com/adafruit/rpi-fb-matrix
https://github.com/adafruit/rpi-fb-matrix/blob/master/rpi-fb-matrix.cpp (line 70)

OpenCV execution speed (for loops and meanStdDev)

I'm fairly new to OpenCV and C++ (learning them now after doing a fair share of image processing in MATLAB and LabVIEW).
I'm having a weird issue I wanted to ask your opinion about.
I'm trying to do a fairly simple thing: a 1x9 moving-window standard deviation on a grayscale image (~4500x2000 px).
Here is the heart of the code:
Mat src = imread("E:\\moon project\\Photos\\Skyline testing\\IMGP6043 sourse.jpg");
Mat src_gray;
Scalar roi_mean, roi_stdev;
Mat stdev_map(src.rows, src.cols, CV_64FC1, Scalar(0));
cvtColor(src, src_gray, CV_BGR2GRAY);

int t = clock();
for (int i = 0; i < src_gray.cols - 1; i++)
{
    for (int j = 0; j < src_gray.rows - 8; j++)
    {
        meanStdDev(src_gray.col(i).rowRange(j, j + 9), roi_mean, roi_stdev);
        stdev_map.at<double>(j, i) = roi_stdev[0];
    }
}
t = clock() - t;
cout << "stdev calc : " << t << " msec" << endl;
Now, on the aforementioned image it takes 35 seconds to run the double loop (the delta-t value), and even if I throw away the meanStdDev call and just assign a constant to stdev_map.at<double>(j, i), it still takes 14 seconds to run the double loop.
I'm pretty sure I'm doing something wrong, since in LabVIEW it takes only 2.5 seconds to chew through this baby with the exact same math.
Please help me.
To answer your question and some of the comments: compiling the library in release mode will surely reduce the computation time. By how much depends on your setup; for example, if you are using Eigen it will probably speed things up a lot.
If you really want to do the loop yourself, consider getting a pointer to the data directly via mat.data, or a row pointer via mat.ptr<cv::Vec3b>.
If you want to speed up the task of computing the mean/stdDev on any part of your image, then use integral images. The documentation is pretty clear about them, and I'm pretty sure it will take less than 2.5 s, probably even in debug mode.
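For the 1x9 vertical window from the question, the integral-image version could look roughly like this (a sketch: cv::integral is the real API, while the window bookkeeping around it is mine):

#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cmath>

cv::Mat windowedStdev(const cv::Mat& src_gray) // expects a CV_8UC1 image
{
    cv::Mat sum, sqsum;
    cv::integral(src_gray, sum, sqsum, CV_64F, CV_64F); // both are (rows+1) x (cols+1)

    const int win = 9;    // window height (9 rows x 1 column)
    const double n = win; // window area
    cv::Mat stdev_map(src_gray.rows, src_gray.cols, CV_64FC1, cv::Scalar(0));

    for (int j = 0; j + win <= src_gray.rows; ++j)
    {
        const double* sTop = sum.ptr<double>(j);
        const double* sBot = sum.ptr<double>(j + win);
        const double* qTop = sqsum.ptr<double>(j);
        const double* qBot = sqsum.ptr<double>(j + win);
        double* out = stdev_map.ptr<double>(j);
        for (int i = 0; i < src_gray.cols; ++i)
        {
            // Sum and sum of squares over rows [j, j+9) of column i,
            // each computed in O(1) with the 4-corner integral-image trick.
            double s = sBot[i + 1] - sTop[i + 1] - sBot[i] + sTop[i];
            double q = qBot[i + 1] - qTop[i + 1] - qBot[i] + qTop[i];
            double mean = s / n;
            double var = q / n - mean * mean;
            out[i] = std::sqrt(std::max(var, 0.0)); // clamp tiny negative rounding error
        }
    }
    return stdev_map;
}

This makes the cost per pixel constant instead of proportional to the window height, which is where the speedup comes from.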

Increasing the FPS

I'm trying to develop a 3D graphics engine. I use a framebuffer class of my own creation, but the FPS is too low, and I think it's because I use the putpixel() function from the winbgim library.
My function to show the framebuffer on screen is:
void framebuffer::showonscreen() // from buffer to screen (1D space to 2D space)
{
    int i;
    for (int y = 0; y < length; y++)
    {
        for (int x = 0; x < width; x++)
        {
            i = x + screeny[y];
            putpixel(x, y, colbuf[i]);
        }
    }
}
Is there any alternative to this putpixel function, or a technique to speed it up, or any other manual way (without using libraries)?
I heard about getting direct access to memory blocks, or using the VRAM.
Would anyone know how to help me with this problem?
Please don't try to reinvent the wheel; there are nice, open source, cross-platform wheels for you to use. You can't really expect to get much performance if you're setting each pixel individually on the CPU. It's much better to use the GPU, which basically requires you to use the nice, open source, cross-platform wheels I was referring to earlier.
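As one concrete example of such a wheel (SDL2 here, chosen purely as an illustration; the answer above doesn't name a specific library), a CPU-side framebuffer can be uploaded to the GPU and presented in one call per frame instead of pixel by pixel:

#include <SDL.h>
#include <vector>

int main(int argc, char* argv[]) {
    const int W = 640, H = 480;
    SDL_Init(SDL_INIT_VIDEO);
    SDL_Window* win = SDL_CreateWindow("fb", SDL_WINDOWPOS_CENTERED,
                                       SDL_WINDOWPOS_CENTERED, W, H, 0);
    SDL_Renderer* ren = SDL_CreateRenderer(win, -1, SDL_RENDERER_ACCELERATED);
    SDL_Texture* tex = SDL_CreateTexture(ren, SDL_PIXELFORMAT_ARGB8888,
                                         SDL_TEXTUREACCESS_STREAMING, W, H);
    std::vector<Uint32> colbuf(W * H, 0xFF000000); // your CPU-side framebuffer

    // ... fill colbuf each frame, then upload and present it in one shot:
    SDL_UpdateTexture(tex, NULL, colbuf.data(), W * sizeof(Uint32));
    SDL_RenderClear(ren);
    SDL_RenderCopy(ren, tex, NULL, NULL);
    SDL_RenderPresent(ren);

    SDL_Delay(2000);
    SDL_DestroyTexture(tex);
    SDL_DestroyRenderer(ren);
    SDL_DestroyWindow(win);
    SDL_Quit();
    return 0;
}

The key difference from per-pixel putpixel() is that the whole buffer crosses to the GPU in a single SDL_UpdateTexture call, and the GPU does the scaling and presentation.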