I'm trying to apply thrust algorithms to the data in cuda::GpuMats. Unfortunately, OpenCV basically never produces continuous GpuMats, which breaks virtually all of my algorithms, code, and performance. Normally, when I encounter discontinuous host matrices, I just clone the matrix; host-side matrices typically become discontinuous when they were created as a rectangular view of another matrix.
This... doesn't work with GpuMats. They literally never seem to come out continuous. I'm not sure what the heck is going on with OpenCV. All I'm doing is the following:
cv::Mat host(600, 400, CV_32F); // e.g. a float matrix
cv::cuda::GpuMat device;
device.upload(host);
cv::cuda::GpuMat continuous;
if (device.isContinuous()) {
    continuous = device;
} else {
    continuous = device.clone();
}
// always prints...
if (!continuous.isContinuous()) {
    std::cout << "isn't Continuous\n";
}
As you can see, the mere act of uploading data produces discontinuous data...
To generate a continuous GpuMat, you can use one of the methods below. The root cause, by the way, is that GpuMat's default allocator uses cudaMallocPitch, which pads each row for memory alignment, so uploads and even clone() produce pitched, non-continuous matrices.
1. Use cv::cuda::createContinuous(int rows, int cols, int type, GpuMat& arr) or one of its overloads (see the sketch after this list).
2. Allocate a continuous CUDA buffer yourself with the cudaMalloc API call (or a similar function), and then construct a GpuMat header for this continuous buffer:
// elem_size depends on the data type, e.g. sizeof(float) for CV_32F
int alloc_size = rows * cols * elem_size;
uchar* data = nullptr;
cudaError_t status = cudaMalloc((void**)&data, alloc_size);
assert(status == cudaSuccess);
// the GpuMat header does not own this memory, so free it yourself
continuous_gpumat = cv::cuda::GpuMat(rows, cols, type, data);
// in the destructor:
status = cudaFree(data);
assert(status == cudaSuccess);
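For the first method, a minimal sketch, reusing the 600x400 float upload from the question:

#include <opencv2/core/cuda.hpp>

// Allocate a continuous destination, then copy the pitched upload into it.
cv::cuda::GpuMat continuous;
cv::cuda::createContinuous(600, 400, CV_32F, continuous);
device.copyTo(continuous);          // create() is a no-op here, the flat allocation survives
assert(continuous.isContinuous());  // rows are now densely packed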
I got the following code (example) to create a mask, which uses cv::Mat:
int v;
cv::Mat m1; // being a submat
cv::Mat mask = (m1==v);
These lines are derived from the python prototype
mask = np.where(m1[x1:x2, y1:y2] == v, 255, 0)
In the c++ version I'd like to use UMat instead of Mat because there's a larger processing pipeline around this one line. Sadly, it seems to me that MatExpr operations (like the m1==v above) are not implemented for cv::UMat in OpenCV 3.4.1. Is that correct?
Are there operations available on cv::UMat with which I could efficiently mimic mask = (m1==v) to obtain the same mask?
My current code (converting from UMat to Mat, i.e. copying from graphics memory to main memory and then doing the cv::Mat operation) is not efficient.
Using C++11, gcc 5.4.0, OpenCV 3.4.1.
NB: The question is not about possibly different values in the mask between the python and c++ versions.
As @DanMašek correctly pointed out, cv::compare is my friend in this case:
// having some UMat m1 and some (let's say) double v
cv::UMat mask;
cv::compare( m1, cv::Scalar{v}, mask, cv::CMP_EQ );
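For completeness, a minimal self-contained sketch; the ROI rectangle here is just a stand-in for the python slice m1[x1:x2, y1:y2]:

#include <opencv2/opencv.hpp>

int main() {
    cv::UMat m1 = cv::UMat::ones(480, 640, CV_8UC1); // some data
    double v = 1.0;
    cv::UMat roi = m1(cv::Rect(10, 10, 100, 100));   // a submat, still a UMat
    cv::UMat mask;
    cv::compare(roi, cv::Scalar{v}, mask, cv::CMP_EQ); // 255 where equal, 0 elsewhere
    return 0;
}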
How can I reshape TF_Tensor* using Tensorflow's C_api as it's being done in C++?
TensorShape inputShape({1,1,80,80});
Tensor inputTensor;
Tensor newTensor;
bool result = inputTensor.CopyFrom(newTensor, inputShape);
I don't see a similar method using the tensorflow's c_api.
The Tensorflow C API operates with a (data, dims) model: the data is treated as a flat raw array, supplied together with the needed dimensions.
Step 1: Allocating a new Tensor
Have a look at TF_AllocateTensor (declared in tensorflow/c/c_api.h):
TF_CAPI_EXPORT extern TF_Tensor* TF_AllocateTensor(TF_DataType,
const int64_t* dims,
int num_dims, size_t len);
Here:
TF_DataType: the TF equivalent of the data type you need, e.g. TF_FLOAT for float.
dims: array holding the dimensions of the tensor to be allocated, e.g. {1, 1, 80, 80}.
num_dims: the length of dims (4 above).
len: the total size in bytes, i.e. the product of all entries of dims times the element size: 1*1*80*80*sizeof(DataType) = 6400*sizeof(DataType).
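Putting that together for the 1x1x80x80 float example, a minimal sketch:

const int64_t dims[4] = {1, 1, 80, 80};
TF_Tensor* reshaped = TF_AllocateTensor(TF_FLOAT, dims, 4, 1 * 1 * 80 * 80 * sizeof(float));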
Step 2: Copying data
// Get the flat tensor buffer (DataType stands in for your element type)
auto buff = (DataType*)TF_TensorData(output_of_tf_allocate);
// then std::memcpy() the source tensor's bytes into buff ...
Here is some sample code from a project I did a while back on writing a very light Tensorflow C-API Wrapper.
So, essentially your reshape will involve allocating your new tensor and copying the data from the original tensor into buff.
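As a concrete illustration, here is a sketch of such a helper; ReshapeTensor is a hypothetical name, and it assumes the new dims describe exactly the same number of elements as the source tensor:

#include <cassert>
#include <cstring>
#include <tensorflow/c/c_api.h>

TF_Tensor* ReshapeTensor(TF_Tensor* src, const int64_t* dims, int num_dims) {
    // Allocate a tensor of the same type and byte size, but with the new dims.
    TF_Tensor* dst = TF_AllocateTensor(TF_TensorType(src), dims, num_dims,
                                       TF_TensorByteSize(src));
    assert(dst != nullptr);
    // Same element type and total byte count, so a flat copy suffices.
    std::memcpy(TF_TensorData(dst), TF_TensorData(src), TF_TensorByteSize(src));
    return dst; // caller owns dst; release it with TF_DeleteTensor
}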
The Tensorflow C API isn't meant for regular usage, so it is harder to learn and lacks documentation; I figured a lot of this out by experimentation. Any suggestions from the more experienced developers out there?
My goal is to augment my pre-existing image processing pipeline (written in Halide) with OpenCV functions such as NL-means denoising. OpenCV functions will not be capable of using Halide's scheduling functionality, so my plan is to realize each Halide Func before each OpenCV stage. The remaining question is how to best convert from a Halide Image (the result of the Func realization) to an OpenCV Mat (as input to an OpenCV function), and from an OpenCV Mat back to a Halide Image when done. My Halide Images are of type float and have 3 channels.
One obvious solution to this is to write functions which copy the data from one data type to the other, but this strikes me as wasteful. Not only will it take precious time to copy over the data, but it will also waste memory since the image will then be stored as two different data types. Is there a way to use pointers or data buffers to simply re-wrap the image data in a new format? Hopefully this process would be reversible so I can go from Halide to OpenCV, and then after the OpenCV function is done back to Halide.
buffer_t is gone now, so I should update this answer. The current way to make a buffer that wraps an OpenCV Mat (which uses an interleaved storage layout) is:
Halide::Runtime::Buffer<float>::make_interleaved((float*)image.data, image.cols, image.rows, image.channels());
If the OpenCV matrix has padding between the rows, the longer form is:
halide_dimension_t shape[3] = {{0, image.cols, (int)image.step1(1)},
                               {0, image.rows, (int)image.step1(0)},
                               {0, image.channels(), 1}};
Halide::Runtime::Buffer<float> buffer((float*)image.data, 3, shape);
A halide_dimension_t is the min coordinate, the extent, and then the stride/step in that dimension. (The casts are needed because image.data is a uchar* and step1() returns size_t, while the buffer expects float* and 32-bit strides.)
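Going the other way is a similar no-copy wrap. A sketch, assuming buffer is one of the interleaved float buffers constructed above:

// Wrap the Halide buffer's host memory in a Mat header; the step argument
// is the row stride in bytes.
cv::Mat view(buffer.height(), buffer.width(), CV_32FC3,
             buffer.data(), buffer.dim(1).stride() * sizeof(float));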
Yes, you can avoid copying data. I see two possible approaches: either allocate memory yourself and refer to that memory in both an OpenCV Mat instance and a Halide buffer_t structure; or let OpenCV's Mat class allocate the memory and refer to that memory in a buffer_t structure.
For the first approach, you can use a Mat constructor that takes a data pointer:
float* data = new float[3 * width * height];
cv::Mat image(height, width, CV_32FC3, data); // the step defaults to cv::Mat::AUTO_STEP
For the second approach, you can use the usual constructor or Mat::create method:
cv::Mat image(height, width, CV_32FC3);
Either way, you can use something like the following code to wrap the memory in a Halide buffer_t structure:
buffer_t buffer;
memset(&buffer, 0, sizeof(buffer));
buffer.host = image.data;
buffer.elem_size = image.elemSize1();  // bytes per channel element
buffer.extent[0] = image.cols;         // x
buffer.extent[1] = image.rows;         // y
buffer.extent[2] = image.channels();   // c
buffer.stride[0] = image.step1(1);     // x stride = channels (interleaved)
buffer.stride[1] = image.step1(0);     // y stride = elements per row
buffer.stride[2] = 1;                  // adjacent channels are contiguous
Now you should be able to operate on the same memory with both OpenCV and Halide functions.
I have a function which requires me to pass a fairly large matrix (which I created using Eigen), with dimensions ranging from 200x200 to 1000x1000. The function is more complex than this, but the bare bones of it are:
#include <Eigen/Dense>
using Eigen::MatrixXi;

int main()
{
    MatrixXi mIndices = MatrixXi::Zero(1000, 1000);
    MatrixXi* pMatrix = &mIndices;
    MatrixXi mTest;
    for (int i = 0; i < 10000; i++)
    {
        mTest = pMatrix[0]; // dereferences the pointer and copies the matrix
        // Then do stuff to the copy
    }
}
Is the reason it takes much longer to run with a larger matrix that it takes longer to find available space in RAM for the array when I assign it to mTest? When I switch to a sparse matrix, this seems to be quite a lot quicker.
If I need to pass around large matrices and want to minimise the incremental effect of matrix size on runtime, what is best practice here? At the moment the same program runs slower in C++ than in Matlab, and obviously I would like to speed it up!
In the code you show, you are copying a 1,000,000-element matrix 10,000 times: the assignment in the loop creates a full copy.
Generally if you're passing an Eigen matrix to another function, it can be beneficial to accept the argument by reference.
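A minimal sketch of what that looks like (the function names are just illustrative):

#include <Eigen/Dense>

// Read-only access: pass by const reference, no copy is made.
int sumFirstColumn(const Eigen::MatrixXi& m) {
    return m.col(0).sum();
}

// In-place modification: pass by non-const reference, still no copy.
void zeroDiagonal(Eigen::MatrixXi& m) {
    m.diagonal().setZero();
}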
It's not really clear from your code what you're trying to achieve, however.
I have a question concerning the access speed of OpenCV matrices.
I currently need two channels of unsigned char to contain my data.
But at one point I need to split the data to process the channels separately (which probably results in a matrix copy):
// fill the two-channel ROI
for (auto ptr = ROI.begin<cv::Vec2b>(); ptr != ROI.end<cv::Vec2b>(); ptr++) {
    // insert values
}
std::vector<cv::Mat> channels_vector;
cv::split(ROI, channels_vector); // copies each channel into its own Mat
process(channels_vector[0]);
process(channels_vector[1]);
more_stuff(ROI);
My question is the following:
Should I use two separate matrices from the beginning to avoid the split, or leave it like this?
Or, since it may depend on my computation: what is the difference in cost between accessing a matrix twice and copying it?