I have a grayscale image (648*488 pixels) and I want to get the coordinates of all pixels above a threshold. This needs to be really fast, so I would like to know whether OpenCV has a function for it.
You could start with the OpenCV forEach method in the Mat class.
It uses one of several parallel frameworks for multithreading.
If it has to be even faster, access the memory directly and compute the threshold for several pixels at a time with SIMD instructions.
You can also use GPUs (Cuda / OpenCL).
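As a sketch of the basic idea (pure Python on a nested list standing in for the Mat; in actual OpenCV code the natural route would be cv::threshold followed by cv::findNonZero, which returns exactly these coordinates):

```python
# Illustrative stand-in for "coordinates of pixels above a threshold".
# A real OpenCV solution would use cv::threshold + cv::findNonZero;
# here a plain nested list plays the role of the grayscale Mat.

def coords_above_threshold(image, threshold):
    """Return (x, y) pairs for every pixel strictly above `threshold`."""
    coords = []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value > threshold:
                coords.append((x, y))
    return coords

# Tiny 3x3 "grayscale image"
img = [
    [10, 200, 30],
    [40,  50, 220],
    [90, 100, 15],
]
print(coords_above_threshold(img, 128))  # -> [(1, 0), (2, 1)]
```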
I am blurring the background of an image using the blur method. All the tutorials I have seen show the highest kernel size of (7,7). But that is not blurred enough for what I need it for.
I have used Size(33,33) and it works alright but I would like to go higher so currently I am using Size(77,77). Is this the most efficient way of blurring an image in OpenCV? And is it okay to go that high at all?
Another idea is to run the blur method more than once with a kernel size of (7,7), but that doesn't seem any more efficient.
EDIT:
OpenCV version 3.2
Try cv::stackBlur().
It was added in v4.7.0. Its performance is almost flat, i.e. independent of kernel size. The pull request contains performance figures: https://github.com/opencv/opencv/pull/20379
GaussianBlur(sigmaX=22) (30 ms)
stackBlur(ksize=(101,101)) (0.4 ms)
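The reason the cost can stay flat as the kernel grows is a running-sum (sliding window) accumulation: each output pixel updates the previous window sum instead of re-summing all the taps. A rough 1D sketch of that trick (pure Python, a plain box filter rather than stackBlur's stack-shaped kernel, edges clamped):

```python
# Sliding-window box blur in 1D: O(1) work per sample regardless of
# kernel size, which is the trick behind box-style and stackBlur-like
# filters. Purely illustrative, not OpenCV code.

def box_blur_1d(signal, ksize):
    assert ksize % 2 == 1, "odd kernel size expected"
    r = ksize // 2
    n = len(signal)
    clamp = lambda i: min(max(i, 0), n - 1)
    # Initial window sum centred on index 0
    window = sum(signal[clamp(i)] for i in range(-r, r + 1))
    out = []
    for i in range(n):
        out.append(window / ksize)
        # Slide the window: add the entering sample, drop the leaving one
        window += signal[clamp(i + r + 1)] - signal[clamp(i - r)]
    return out

print(box_blur_1d([0, 0, 9, 0, 0], 3))  # -> [0.0, 3.0, 3.0, 3.0, 0.0]
```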
I am currently planning on writing a function that extracts overlapping image patches from a 2D image (width x height) into a 3D batch of these patches (batch_id x patch_width x patch_height). As far as I know, there are no utilities in CUDA or OpenCV CUDA which make that very easy. (Please correct me if I am wrong here)
Since I need to resort to writing my own CUDA kernel for this task I need to decide how to tackle this approach. As far as I see there are two ways how to write the kernel:
Create a GPU thread for each pixel and map this pixel to potentially multiple locations in the 3D batch.
Create a GPU thread for each pixel in the 3D batch and let it fetch its corresponding pixel from the image.
I didn't find a clear answer in the CUDA Programming Guide to whether any of these approaches has specific advantages or disadvantages. Would you favour one of these approaches or is there an even easier way of doing this?
I think 1 is better, because it can minimize memory transactions. Memory transactions happen in fixed-size units (e.g. a 128-byte L1 cache line), so grouping data loads to issue as few transactions as possible can reduce processing time.
Of course, it's possible that the memory transactions are the same both ways. Although I'm not sure about my choice, consider this when you write the kernel.
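Whichever direction you pick, the core of the kernel is the index arithmetic between a batch location (batch_id, patch_y, patch_x) and an image pixel. A host-side Python sketch of that mapping (patch size and stride are assumed parameters, not anything stated in the question):

```python
# Host-side stand-in for the patch-extraction kernel's index mapping:
# the patch at grid position (by, bx) with stride s reads image pixel
# (by * s + py, bx * s + px). In CUDA each thread would compute one
# such read; `stride` and the patch layout are assumptions.

def extract_patches(image, patch_h, patch_w, stride):
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h - patch_h + 1, stride):
        for left in range(0, w - patch_w + 1, stride):
            patch = [row[left:left + patch_w]
                     for row in image[top:top + patch_h]]
            patches.append(patch)
    return patches  # shape: batch x patch_h x patch_w

img = [[r * 4 + c for c in range(4)] for r in range(4)]
ps = extract_patches(img, 2, 2, 2)
print(len(ps))  # 4 patches
print(ps[0])    # [[0, 1], [4, 5]]
```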
I need to render vector graphics very fast to use it in OpenCV (in nodejs).
Fastest way to render simple shapes like oval is to use OpenCV drawing functions.
In my multithreaded test program I can render ~625 one-channel 512*512 Mats per second, each with one randomly filled oval.
With the fastest SVG-to-PNG renderer available for Node.js, 'librsvg', I get only ~277 of the same Mats per second. That is not fast enough for my purposes.
I found another SVG renderer based on OpenGL, SVGL, but I haven't tested its performance; there are no bindings for Node, it's C++ only.
I will need to render much more complicated vector graphics than just one ellipse.
So I expect a lot of work if I try to implement all the drawing functions I need with OpenCV, and I am not sure whether OpenCV's performance would still be acceptable for complicated vector images.
By "complicated" I mean some hundreds of semi-transparent arcs, Béziers, or some kind of rounded polygons, unfilled or filled with a solid semi-transparent color or possibly with gradients. And I want to render to a fairly large Mat, maybe 1024*768 or so.
SVG already has everything I need, but I don't know C++,
so it would probably also take a lot of time to implement bindings for SVGL, while I still don't know its performance.
Maybe there are some alternative open-source options?
I am using Dlib's frontal face detector to detect faces in images; however, it cannot detect faces smaller than 80 by 80 pixels.
Dlib's example in face_detection_ex.cpp upsamples the input image using pyramid_up() to increase the face sizes. However, it makes the algorithm much slower because it will have to search in a larger image.
I wonder if anyone knows a solution for this problem.
Dlib's face detector is trained to process 80x80 faces. If you want to detect smaller faces, you have two options:
increase the resolution to make the faces bigger. You can use pyramid_up or any other way like cv::resize. And you don't have to increase the resolution 2x; maybe 1.5x will be enough - that's up to you
train a new face detector that works on small faces - dlib has samples for the training process
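As an illustration of the first option, here is a minimal nearest-neighbour 1.5x upscale sketch (pure Python on nested lists; real code would use cv::resize or dlib's pyramid_up, with better interpolation quality):

```python
# Stand-in for the "moderate upscale before detection" idea:
# nearest-neighbour resize of a grayscale image by a given factor.
# Purely illustrative; cv::resize / pyramid_up do this properly.

def resize_nearest(image, scale):
    h, w = len(image), len(image[0])
    nh, nw = int(h * scale), int(w * scale)
    return [[image[min(int(y / scale), h - 1)][min(int(x / scale), w - 1)]
             for x in range(nw)]
            for y in range(nh)]

img = [[1, 2],
       [3, 4]]
up = resize_nearest(img, 1.5)
print(len(up), len(up[0]))  # 3 3
print(up)                   # [[1, 1, 2], [1, 1, 2], [3, 3, 4]]
```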
Your next question is about the performance of the face detector. Yes, it depends on resolution, and if you want to detect 20x20 faces in a 13 MP image, it will be slow. To make it fast you have these options:
reduce amount of pixels that should be processed by detector - use the right scale and region of interest
use greyscale images
reduce the amount of scale changes at scanning process
use the recommendations from the dlib FAQ. I can only add that MinGW/GCC code runs about 20% faster than MSVC, and the Android/ARM build does not use SIMD instructions
for videos: apply motion detection and detect faces only in changed areas (crop them manually and run detection on the cropped region), and also process frames in separate threads to use all CPU cores
I'm trying to track something in some frames. I know calcOpticalFlowPyrLK is supposed to be used for sparse tracking problems. However, I thought it wouldn't really hurt if I just try to track all pixels in the frames.
So my video frames are actually very stable (motion is barely visible to the eye), and calcOpticalFlowPyrLK works well for most pixels. But for some pixels it returns really big flow vectors (like [200,300]), which doesn't make sense.
And I also found a Matlab implementation that's using the same Pyramidal Lucas-Kanade algorithm, but this Matlab version doesn't return any crazy values.
So I'm wondering what is causing opencv function to return huge non-reasonable values. Is it because the matrix inversion is done differently?