In my trials with images of 1409x900 and 960x696, it takes 2.5 ms on average to split the channels of an RGB image using OpenCV on my 64-bit, 6-core, 3.2 GHz Windows machine.
vector<cv::Mat> channels;
cv::split(img, channels);
I found that this takes almost as long as the rest of my image processing (boolean operations + morphological opening).
Considering that my code only uses a single channel from the split, I wonder if there is a faster way of extracting a single channel from an RGB image, preferably with OpenCV.
UPDATE
As @DanMašek pointed out, there is another function, mixChannels, that can extract a single-channel image from a multi-channel one. I've tested about 2000 images with the same sizes; mixChannels took about 1 ms on average. For now, I am satisfied with the result, but post your answer if you can make it faster.
cv::Mat channel(img.rows, img.cols, CV_8UC1);
int from_to[] = { sel_channel,0 };
mixChannels(&img, 1, &channel, 1, from_to, 1);
Two simple options come to mind here.
You mention that you perform this operation repeatedly on images captured from a camera. Therefore it is safe to assume that the images are always the same size.
Allocations of cv::Mat have a non-negligible overhead, so in this case it would be beneficial to reuse the channel Mats (i.e. allocate the destination images when you receive the first frame, and then just overwrite their contents for subsequent frames).
An additional benefit of this approach is (quite likely) reduced memory fragmentation, which can become a real problem for 32-bit code.
You mention that you're interested in only one specific channel (which the user may select arbitrarily). That means you could use cv::mixChannels, which gives you flexibility in selecting which channels to extract and how.
This means you can extract the data for only a single channel, theoretically (depending on the implementation -- study the source code for more details) avoiding the overhead of extracting and/or copying the data for the channels you're not interested in.
Let's make a test program evaluating the 4 possible combinations of the approaches outlined above.
Variant 0: cv::split without reuse
Variant 1: cv::split with reuse
Variant 2: cv::mixChannels without reuse
Variant 3: cv::mixChannels with reuse
NB: I just use static for simplicity here; usually I'd make this a member variable of a class that wraps the algorithm.
#include <opencv2/opencv.hpp>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <vector>
#define SELECTED_CHANNEL 1
cv::Mat variant_0(cv::Mat const& img)
{
std::vector<cv::Mat> channels;
cv::split(img, channels);
return channels[SELECTED_CHANNEL];
}
cv::Mat variant_1(cv::Mat const& img)
{
static std::vector<cv::Mat> channels;
cv::split(img, channels);
return channels[SELECTED_CHANNEL];
}
cv::Mat variant_2(cv::Mat const& img)
{
// NB: output Mat must be preallocated
cv::Mat channel(img.rows, img.cols, CV_8UC1);
int from_to[] = { SELECTED_CHANNEL, 0 };
cv::mixChannels(&img, 1, &channel, 1, from_to, 1);
return channel;
}
cv::Mat variant_3(cv::Mat const& img)
{
// NB: output Mat must be preallocated
static cv::Mat channel(img.rows, img.cols, CV_8UC1);
int from_to[] = { SELECTED_CHANNEL, 0 };
cv::mixChannels(&img, 1, &channel, 1, from_to, 1);
return channel;
}
template<typename T>
void timeit(std::string const& title, T f)
{
using std::chrono::high_resolution_clock;
using std::chrono::duration_cast;
using std::chrono::microseconds;
cv::Mat img(1024,1024, CV_8UC3);
cv::randu(img, 0, 256);
int32_t const STEPS(1024);
high_resolution_clock::time_point t1 = high_resolution_clock::now();
for (uint32_t i(0); i < STEPS; ++i) {
cv::Mat result = f(img);
}
high_resolution_clock::time_point t2 = high_resolution_clock::now();
auto duration = duration_cast<microseconds>(t2 - t1).count();
double t_ms(static_cast<double>(duration) / 1000.0);
std::cout << title << "\n"
<< "Total = " << t_ms << " ms\n"
<< "Iteration = " << (t_ms / STEPS) << " ms\n"
<< "FPS = " << (STEPS / t_ms * 1000.0) << "\n"
<< "\n";
}
int main()
{
for (uint8_t i(0); i < 2; ++i) {
timeit("Variant 0", variant_0);
timeit("Variant 1", variant_1);
timeit("Variant 2", variant_2);
timeit("Variant 3", variant_3);
std::cout << "--------------------------\n\n";
}
return 0;
}
Output for the second pass (so we avoid any warmup costs).
Note: Running this on i7-4930K, using OpenCV 3.1.0 (64-bit, MSVC12.0), Windows 10 -- YMMV, especially with CPUs that have AVX2
Variant 0
Total = 1518.69 ms
Iteration = 1.48309 ms
FPS = 674.267
Variant 1
Total = 359.048 ms
Iteration = 0.350633 ms
FPS = 2851.99
Variant 2
Total = 820.223 ms
Iteration = 0.800999 ms
FPS = 1248.44
Variant 3
Total = 427.089 ms
Iteration = 0.417079 ms
FPS = 2397.63
Interestingly, cv::split with reuse wins here. Feel free to edit the answer and add timings from different platforms/CPU generations (especially if the proportions differ radically).
It also seems that, with my setup, none of this is parallelized particularly well, so parallelization may be another possible path to speeding this up (something like cv::parallel_for_); a rough sketch of that idea follows.
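For illustration, here is a sketch of that idea -- my addition, untimed, and assuming OpenCV 3.3+ where cv::parallel_for_ accepts a lambda: extract a single channel by hand and let parallel_for_ distribute the rows across threads, reusing the destination as in Variant 3.
cv::Mat extract_channel_parallel(cv::Mat const& img, int sel_channel)
{
    CV_Assert(img.type() == CV_8UC3);
    // Reuse the destination across calls, as in Variant 3.
    static cv::Mat channel(img.rows, img.cols, CV_8UC1);
    cv::parallel_for_(cv::Range(0, img.rows), [&](cv::Range const& range) {
        for (int r = range.start; r < range.end; ++r) {
            const uchar* src = img.ptr<uchar>(r) + sel_channel;
            uchar* dst = channel.ptr<uchar>(r);
            for (int c = 0; c < img.cols; ++c, src += 3, ++dst) {
                *dst = *src; // copy only the selected channel
            }
        }
    });
    return channel;
}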
Related
I am trying to make a visual odometry algorithm work in real time (using my stereo camera). The camera feed is returned as a single image (I420 pixel format), which I have to manually split into a left and a right frame. One of the problems I am running into occurs when I call cv::triangulatePoints: the function gives me an error saying that the input matrices (the left and right frames) are not continuous.
When I receive the input image from the camera, using:
// Read camera feed
IMAGE_FORMAT fmt = {IMAGE_ENCODING_I420, 50};
BUFFER *buffer = arducam_capture(camera_instance, &fmt, 3000);
if (!buffer)
return -1;
// Store feed in image
cv::Mat image = cv::Mat(cv::Size(width,(int)(height * 1.5)), CV_8UC1, buffer->data);
arducam_release_buffer(buffer);
// Change image to grayscale (grayscale increases FPS)
cv::cvtColor(image, image, cv::COLOR_YUV2GRAY_I420);
if (!image.isContinuous())
std::cout << "image is not continuous" << std::endl;
The image passes the continuity check fine (meaning the image is continuous).
However, after I resize and split the image into a left and right frame, using:
double scale_factor = 640.0 / width;
int custom_width = int(width * scale_factor);
int custom_height = int(height * scale_factor);
// OpenCV resize
cv::Mat frame = cv::Mat(cv::Size(custom_width, (int)(custom_height * 1.5)), CV_8UC1);
cv::resize(image, frame, frame.size(), 0, 0);
// Split image into left and right frame
cv::Mat frame_left = frame(cv::Rect(0, 0, custom_width / 2, (int)(custom_height * 1.5)));
cv::Mat frame_right = frame(cv::Rect(custom_width / 2, 0, custom_width / 2, (int)(custom_height * 1.5)));
if (!frame.isContinuous())
std::cout << "frame is not continuous" << std::endl;
if (!frame_right.isContinuous())
std::cout << "right frame is not continuous" << std::endl;
if (!frame_left.isContinuous())
std::cout << "left frame is not continuous" << std::endl;
The resized image (frame) is continuous, but the left and right frames fail the continuity check (meaning they are not continuous).
So I guess my question is how can I split the image into two different images, while keeping them continuous?
The solution to this problem is actually quite simple:
if (!frame_right.isContinuous()) {
    frame_right = frame_right.clone();
}
if (!frame_left.isContinuous()) {
    frame_left = frame_left.clone();
}
By using the clone() function, you copy the image data into a newly allocated Mat, which OpenCV considers a new (and therefore continuous) image. This way the right and left frames become continuous.
So splitting the image into sub-images (ROIs) breaks continuity, and cloning restores it.
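As an alternative sketch (my addition, not from the original answer): if you process frames repeatedly, you can copy the ROIs into preallocated continuous Mats with copyTo() instead of cloning on every frame. Here frame, custom_width and custom_height are the variables from the question.
int half_width  = custom_width / 2;
int full_height = (int)(custom_height * 1.5);

// Allocate the continuous destinations once (e.g. outside the per-frame loop).
cv::Mat frame_left_cont(full_height, half_width, CV_8UC1);
cv::Mat frame_right_cont(full_height, half_width, CV_8UC1);

// Copy the ROIs; the destinations keep their own (continuous) storage.
frame(cv::Rect(0, 0, half_width, full_height)).copyTo(frame_left_cont);
frame(cv::Rect(half_width, 0, half_width, full_height)).copyTo(frame_right_cont);

// frame_left_cont and frame_right_cont are continuous and can be passed to cv::triangulatePoints.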
I decided to try to start learning OpenCL. I spent a lot of time compiling and the like, and finally I have a Qt project with OpenCV embedded and OpenCL working. The information on the internet about the next steps is rather scarce, though. Using other Stack Overflow posts, I cobbled together this kernel, which should swap image color channels.
This is my kernel:
__kernel void shift(
read_only image2d_t input,
float shift_x,
float shift_y,
write_only image2d_t output,
int dst_step, int dst_offset, int dst_rows, int dst_cols)
{
int2 coord = (get_global_id(1), get_global_id(0));
uint4 pixel = read_imageui(input, samplerLN, coord);
// create pixel with swapped channels
uint4 pixel2;
pixel2.s0 = pixel.s1;
pixel2.s1 = pixel.s2;
pixel2.s2 = pixel.s0;
write_imageui(output, coord, pixel2);
}
And this is how I try to run it:
//! run gpu operation
cv::ocl::Device(context.device(0));
cv::Mat imageOpenCL = cv::imread("D:\\images\\20200424_162602.jpg", cv::IMREAD_GRAYSCALE);
imageOpenCL.convertTo(imageOpenCL, CV_32F, 1.0 / 255);
cv::UMat umat_src = imageOpenCL.getUMat(cv::ACCESS_READ, cv::USAGE_ALLOCATE_DEVICE_MEMORY);
cv::UMat umat_dst(imageOpenCL.size(), CV_32F, cv::ACCESS_WRITE, cv::USAGE_ALLOCATE_DEVICE_MEMORY);
cv::ocl::ProgramSource program(source);
cv::ocl::Image2D imageCL(umat_src);
cv::ocl::Image2D imageCLOut(umat_dst);
float shift_x = 100.5;
float shift_y = -50.0;
cv::ocl::Kernel kernel("shift", program);
kernel.args(imageCL, shift_x, shift_y, imageCLOut);
size_t globalThreads[3] = { (size_t)imageOpenCL.cols, (size_t)imageOpenCL.rows, 1 };
//size_t localThreads[3] = { 16, 16, 1 };
bool success = kernel.run(3, globalThreads, NULL, true);
if (!success){
std::cout << "Failed running the kernel..." << std::endl;
return;
}
// Download the dst data from the device (?)
cv::Mat mat_dst = umat_dst.getMat(cv::ACCESS_READ);
cv::imshow("src", imageOpenCL);
cv::imshow("dst", mat_dst);
I'm probably copying the data wrong, but I'm not sure what to do. I also tried different types instead of CV_32F for the image, such as CV_8U and CV_8UC3.
Your kernel has 8 arguments, but you are only setting 4 of them; hence the CL_INVALID_KERNEL_ARGS error. It does not appear that you are using the last 4 arguments in the kernel, so the fix seems to be to remove them from the kernel's argument list.
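A sketch of what that fix might look like (my reconstruction, not verbatim from the answer): the kernel keeps only the four parameters that the host actually sets with kernel.args(). The sampler is declared explicitly, since the original snippet did not show its definition, and coord is built as an (int2) literal, which the original expression did not do.
static const char* source = R"CLC(
__constant sampler_t samplerLN = CLK_NORMALIZED_COORDS_FALSE |
                                 CLK_ADDRESS_CLAMP_TO_EDGE |
                                 CLK_FILTER_NEAREST;

__kernel void shift(
    read_only  image2d_t input,
    float shift_x,
    float shift_y,
    write_only image2d_t output)
{
    int2 coord = (int2)(get_global_id(0), get_global_id(1));
    uint4 pixel = read_imageui(input, samplerLN, coord);
    uint4 swapped = (uint4)(pixel.s1, pixel.s2, pixel.s0, pixel.s3);
    write_imageui(output, coord, swapped);
}
)CLC";

// The host-side call from the question then matches the signature:
// kernel.args(imageCL, shift_x, shift_y, imageCLOut);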
How can we get the mean of the channels of an input RGB image (a 3-dimensional Mat object) so that we get a gray image? The cvtColor() function of OpenCV converts the image to gray based on a pre-existing formula. I want to take the mean of all three channels and store the resulting image in another matrix. The cv::mean() function in OpenCV returns the scalar mean of all input channels.
Were this Python, with img being an RGB image, img.mean(2) would get me what I want. Successive calls to the addWeighted() function, as well as gray = blue/3.0 + red/3.0 + green/3.0 (after splitting the channels), yielded different results when compared with Python.
Is there anything analogous to img.mean(2) in C++ or the OpenCV library of C++?
Is there anything analogous to img.mean(2) in C++ or the OpenCV library of C++?
No, but you can easily compute that. There are a few ways of doing it:
Loop over the whole image and set each output value to the mean of the corresponding input pixel values. Take care to compute the intermediate values for the mean in a type with more capacity and accuracy than uchar (here I used double), or you may end up with wrong results. You can also optimize the code further, e.g. see this question and its answers; you just need to change the function computed in the inner loop so that it computes the mean.
Use reduce. You can reshape your 3-channel matrix of size rows x cols into a matrix of shape (rows*cols) x 3, and then use the reduce operation with the REDUCE_AVG parameter to compute the average row-wise. Then reshape the matrix back to the correct size. The reshape operation is very fast, since you just modify the header without affecting the stored data.
Use matrix operations to sum channels. You can use split to get the matrix for each channel, and sum them. Take care to not saturate your values while summing up! (Thanks to beaker for this one.)
You can see that the first approach is faster with small matrices, but as soon as the size increases, the second approach performs much better, since you take advantage of OpenCV optimizations.
The third approach works surprisingly well (thanks to matrix expressions).
Some numbers (time in ms). Times may vary on your computer depending on which OpenCV optimizations are enabled. Run in release mode!
Size : 10x10 100x100 1000x1000 10000x10000
Loop : 0.0077 0.3625 34.82 3456.71
Reduce: 1.44 1.42 8.88 716.75
Split : 0.1158 0.0656 2.26304 246.476
Code:
#include <opencv2/opencv.hpp>
#include <iostream>
using namespace std;
using namespace cv;
int main()
{
Mat3b img(1000, 1000);
randu(img, Scalar(0, 0, 0), Scalar(10, 10, 10));
{
double tic = double(getTickCount());
Mat1b mean_img(img.rows, img.cols, uchar(0));
for (int r = 0; r < img.rows; ++r) {
for (int c = 0; c < img.cols; ++c) {
const Vec3b& v = img(r, c);
mean_img(r, c) = static_cast<uchar>(round((double(v[0]) + double(v[1]) + double(v[2])) / 3.0));
}
}
double toc = (double(getTickCount()) - tic) * 1000.0 / getTickFrequency();
cout << "Loop: " << toc << endl;
}
{
double tic = double(getTickCount());
Mat1b mean_img2 = img.reshape(1, img.rows*img.cols);
reduce(mean_img2, mean_img2, 1, REDUCE_AVG);
mean_img2 = mean_img2.reshape(1, img.rows);
double toc = (double(getTickCount()) - tic) * 1000.0 / getTickFrequency();
cout << "Reduce: " << toc << endl;
}
{
double tic = double(getTickCount());
vector<Mat1b> planes;
split(img, planes);
Mat1b mean_img3;
if (img.channels() == 3) {
mean_img3 = (planes[0] + planes[1] + planes[2]) / 3.0;
}
double toc = (double(getTickCount()) - tic) * 1000.0 / getTickFrequency();
cout << "Split: " << toc << endl;
}
getchar();
return 0;
}
mean()
Calculates an average (mean) of array elements.
C++: Scalar mean(InputArray src, InputArray mask=noArray())
Python: cv2.mean(src[, mask]) → retval
C: CvScalar cvAvg(const CvArr* arr, const CvArr* mask=NULL )
Python: cv.Avg(arr, mask=None) → scalar
Parameters:
src – input array that should have from 1 to 4 channels so that the result can be stored in Scalar_.
mask – optional operation mask.
The function mean calculates the mean value M of the array elements, independently for each channel, and returns it:
N = sum_{I: mask(I) != 0} 1
M_c = ( sum_{I: mask(I) != 0} src(I)_c ) / N
When all the mask elements are 0's, the function returns Scalar::all(0).
Also check this answer on how to calculate and use the cv::Mat mean value.
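A minimal usage sketch (my addition; the file name is a placeholder): cv::mean returns one value per channel as a cv::Scalar, so averaging those values gives a single number, not the per-pixel gray image asked about above.
cv::Mat img = cv::imread("image.png");        // BGR image (placeholder file name)
cv::Scalar m = cv::mean(img);                 // m[0], m[1], m[2] = per-channel means
double overall = (m[0] + m[1] + m[2]) / 3.0;  // single mean over all three channels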
I found some interesting results regarding the performance of the cv::dft function on cv::UMat vs cv::Mat. Essentially, I found that UMats are actually much slower for image sizes up to 4096x4096; until that size, cv::Mat consistently wins. Is this just because dft is not implemented for the T-API and only has a cv::Mat implementation? The test I ran looks something like this (I used the Celero project to create the benchmark):
constexpr int num_samples = 2;
constexpr int num_iterations = 10;
constexpr int num_rows = 4096;
constexpr int num_cols = 4096;
cv::UMat a = cv::UMat(num_rows, num_cols, CV_32F);
cv::Mat b = cv::Mat(num_rows, num_cols, CV_32F);
void CreateUMat() { cv::randu(a, 0, 256); }
void CreateMat() { cv::randu(b, 0, 256); }
void DftUMat() {
CreateUMat();
cv::dft(a, a);
cv::idft(a, a, cv::DFT_SCALE | cv::DFT_INVERSE);
}
void DftMat() {
CreateMat();
cv::dft(b, b);
cv::idft(b, b, cv::DFT_SCALE | cv::DFT_INVERSE);
}
BASELINE(UMatBenchmarks, Baseline, num_samples, num_iterations) { DftUMat(); }
BENCHMARK(UMatBenchmarks, NoGPU, num_samples, num_iterations) { DftMat(); }
I got the following results:
cv::UMat iterations/sec = 4.51
cv::Mat iterations/sec = 4.70
For a smaller image, say 1024x1024, I got the following results:
cv::UMat iterations/sec = 63.21
cv::Mat iterations/sec = 85.83
From these results, you can see that there is almost no advantage to using UMat for large image sizes, and there is especially no advantage for smaller images. This surprises me because I got significant speedups with cv::matchTemplate when switching to the OpenCV T-API. My guess is that cv::dft has not been implemented with OpenCL, but is this truly the case? Is the DFT just not a good algorithm to offload to the GPU? Thanks!
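As a small diagnostic sketch (my addition, not part of the question): before comparing timings it is worth confirming that the OpenCL path is actually available and enabled, otherwise the UMat calls silently fall back to the CPU implementation.
#include <opencv2/core/ocl.hpp>
#include <iostream>

void report_opencl_status() {
    std::cout << "haveOpenCL: " << cv::ocl::haveOpenCL() << "\n"
              << "useOpenCL:  " << cv::ocl::useOpenCL() << "\n";
    if (cv::ocl::haveOpenCL())
        std::cout << "device:     " << cv::ocl::Device::getDefault().name() << "\n";
}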
Say I need to find a specific element in a cv::Mat which can be a row vector in my case (though for more general case Mat can be more than one dimension).
The data type of the target can be as simple as char, int, double etc.
There is an existing post, How to find if an item is present in a std::vector?, which explains how to find an element in a std::vector. Therefore, one way to do this could be: 1) convert the cv::Mat to a std::vector; 2) use the method in the post to find the element.
However, I need to do this searching operation hundreds of times per row. When there are hundreds of rows to process, performance can become an issue.
I am wondering how the above method (convert + search) performs, and whether there is a more efficient way to do this (maybe searching for the element directly in the cv::Mat without the conversion)?
P.S.: Here is a post on Converting a row of cv::Mat to std::vector.
Combining those two answers and depending on the mat type (here CV_64F) you get:
bool findValue(const cv::Mat &mat, double value) {
for(int i = 0;i < mat.rows;i++) {
const double* row = mat.ptr<double>(i);
if(std::find(row, row + mat.cols, value) != row + mat.cols)
return true;
}
return false;
}
(See the std::find documentation for more information.) Of course, first converting a mat row to a vector and then using std::find on that vector is slower than using find directly on a pointer to the row array.
EDIT: After some more research, it is not that hard to develop a generic version:
template <class T>
bool findValue(const cv::Mat &mat, T value) {
for(int i = 0;i < mat.rows;i++) {
const T* row = mat.ptr<T>(i);
if(std::find(row, row + mat.cols, value) != row + mat.cols)
return true;
}
return false;
}
I tested it on more complex data types:
cv::Mat matDouble = cv::Mat::zeros(10, 10, CV_64F);
cv::Mat matRGB = cv::Mat(10, 10, CV_8UC3, cv::Scalar(255, 255, 255));
std::cout << findValue(matDouble, 0.0) << std::endl;
std::cout << findValue(matDouble,1.0) << std::endl;
std::cout << findValue(matRGB, cv::Scalar(255, 255, 255)) << std::endl;
std::cout << findValue(matRGB, cv::Scalar(255, 255, 254)) << std::endl;
And, to my surprise, the output is:
1
0
0 // should be 1, right?
0
The problem is with the size of the cv::Scalar structure. No matter which version of the constructor we use (i.e. one, two, three, or four arguments), the size is... constant. This is not so surprising, because it is still the same structure; on my machine the size is 32 bytes (by default cv::Scalar holds doubles, and on my machine a double is 8 bytes, so 4 * 8 = 32). So the find goes completely wrong, because it assumes the size of an element in the array is 32 bytes when it should be 3 bytes.
So don't use std::find with cv::Scalar! However, it works remarkably well and efficiently with primitive data types.
EDIT2 (after berak's comment):
Yes, you can use cv::Vec3b with find, and it seems to work well, although I have not done more testing than this simple correctness test:
cv::Mat matRGB = cv::Mat(10, 10, CV_8UC3, cv::Scalar(255, 255, 255));
std::cout << findValue(matRGB, cv::Vec3b(255, 255, 255)) << std::endl;
std::cout << findValue(matRGB, cv::Vec3b(255, 255, 254)) << std::endl;
(You still have to use Scalar in the Mat constructor, but that does not matter; the Mat is properly initialized.) Now the output is as expected:
1
0