OpenCV: Accessing elements of 5D Matrix - c++

I have a problem accessing elements of a 5D Matrix in OpenCV. I create my Matrix using
int sizes[5] = { height_, width_, range_, range_, range_ };
Mat w_i_ = Mat(2 + channels, sizes, CV_16UC(channels), Scalar(0));
where channels = 3. Then I'm trying to access and modify the matrix elements using for loops:
for (UINT Y = 0; Y < height; ++Y) {
    for (UINT X = 0; X < width; ++X) {
        // a) Compute the homogeneous vector (wi, w)
        Vec3b wi = image.at<Vec3b>(Y, X);
        // b) Compute the downsampled coordinates
        UINT y = round(Y / sigmaSpatial);
        UINT x = round(X / sigmaSpatial);
        Vec3b zeta = round((image.at<Vec3b>(Y, X) - min) / sigmaRange);
        // round() here is overloaded for vectors
        // c) Update the downsampled S×R space
        int idx[5] = { y, x, zeta[0], zeta[1], zeta[2] };
        w_i_.at<Vec3b>(idx) = wi;
    }
}
I am getting an assertion-failed error produced by Mat::at() when I run the code. Specifically, the message I get is:
OpenCV Error: Assertion failed (elemSize() == (((((DataType<_Tp>::type) & ((512 - 1) << 3)) >> 3) + 1) << ((((sizeof(size_t)/4+1)*16384|0x3a50) >> ((DataType<_Tp>::type) & ((1 << 3) - 1))*2) & 3))) in cv::Mat::at, file c:\opencv\build\include\opencv2\core\mat.inl.hpp, line 1003
I have searched the web but can't seem to find any topics on 5D matrices (similar topics were of no help).
Thanks in advance

You initialize the zeta variable but never check its values.
Most likely you get an out-of-range value in the zeta[0], zeta[1], or zeta[2] indices, and thus the internal checking in the at() function fails.
To prevent such crashes, add at least some manual range checking before calling at():

bool inRange = true;
for (int i = 0; i < 3; ++i)
    if (zeta[i] < 0 || zeta[i] >= range_)
        inRange = false;
if (!inRange)
    continue; // skip this pixel
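Note also what the failed assertion actually compares: elemSize() against the element size implied by the template type given to at(). The matrix was created as CV_16UC3 (three 16-bit channels, 6 bytes per element) but is accessed as Vec3b (three 8-bit channels, 3 bytes), and that mismatch alone trips exactly this assertion. A minimal sketch of a matching access, assuming the data really is meant to be 16-bit:

// Vec3w = Vec<ushort, 3> matches CV_16UC3 (2 bytes per channel)
int idx[5] = { y, x, zeta[0], zeta[1], zeta[2] };
w_i_.at<Vec3w>(idx) = Vec3w(wi[0], wi[1], wi[2]); // widen the 8-bit pixel

Alternatively, create the matrix with CV_8UC(channels) if 8 bits per channel are enough, and keep the Vec3b access.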

Related

Using Eigen matrix multiplication for efficient pointcloud calculation

I am attempting to calculate a pointcloud from an OpenCV Mat depth image and an intrinsic matrix. Currently I do it as follows (the K matrix values were extracted into focal_length_x, focal_length_y, cx, cy earlier):
for (int i = 0; i < depth.rows; i++)
{
    const float* row_ptr = depth.ptr<float>(i);
    for (int j = 0; j < depth.cols; j++)
    {
        // Only add valid depth points
        if (row_ptr[j] != 0)
        {
            const float x = ((j - cx) * row_ptr[j] / focal_length_x);
            const float y = ((i - cy) * row_ptr[j] / focal_length_y);
            pointcloud[cnt] = pcl::PointXYZ(x / 1000, y / 1000, row_ptr[j] / 1000);
            cnt++;
        }
    }
}
However, I am wondering: is it possible to turn this into a matmul operation and use Eigen for better performance? I am aware that

    [x, y, z] = depth value * inv(K) * [u, v, 1]

with

        [fx  0  cx]
    K = [ 0  fy  cy]
        [ 0   0   1]
How would I go about turning this into a full matrix multiplication? My depth image is 1280x800, and obviously directly multiplying a 1280x800 matrix by a 3x3 and then a 3x1 won't work, so in what ways, if any, can this be done?
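One way that seems to fit (a sketch under assumptions, not a drop-in for the code above; backproject and the buffer types are illustrative names) is to pack all valid pixels into one 3xN matrix of depth-scaled homogeneous coordinates, so the whole cloud comes out of a single 3x3 by 3xN product:

#include <Eigen/Dense>
#include <opencv2/core.hpp>

// Sketch: back-project every valid depth pixel in one matrix product.
Eigen::Matrix3Xf backproject(const Eigen::Matrix3f &K, const cv::Mat &depth)
{
    Eigen::Matrix3Xf pts(3, depth.rows * depth.cols);
    int n = 0;
    for (int i = 0; i < depth.rows; i++) {
        const float *row_ptr = depth.ptr<float>(i);
        for (int j = 0; j < depth.cols; j++)
            if (row_ptr[j] != 0)                    // only valid depth points
                pts.col(n++) = row_ptr[j] * Eigen::Vector3f(j, i, 1.0f);
    }
    pts.conservativeResize(3, n);                   // drop unused columns
    return K.inverse() * pts;                       // columns are (x, y, z)
}

The per-pixel loop only gathers data; all the arithmetic collapses into the final product, which Eigen can vectorize. Dividing by 1000 and filling the pcl cloud would then be a single pass over the columns.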

Image Rotation without cropping

Dear all,
With the code below I rotate my cv::Mat object (I'm not using any of OpenCV's functions apart from load/save/color conversion, as this is an academic project), and I receive a cropped image.
The rotation function:
float rads = angle * 3.1415926 / 180.0;
float _cos = cos(-rads);
float _sin = sin(-rads);
float xcenter = (float)(src.cols) / 2.0;
float ycenter = (float)(src.rows) / 2.0;
for (int i = 0; i < src.rows; i++)
    for (int j = 0; j < src.cols; j++) {
        int x = ycenter + ((float)(i) - ycenter) * _cos - ((float)(j) - xcenter) * _sin;
        int y = xcenter + ((float)(i) - ycenter) * _sin + ((float)(j) - xcenter) * _cos;
        if (x >= 0 && x < src.rows && y >= 0 && y < src.cols) {
            dst.at<cv::Vec4b>(i, j) = src.at<cv::Vec4b>(x, y);
        }
        else {
            dst.at<cv::Vec4b>(i, j)[3] = 0;
        }
    }
I would like to know how I can keep the full image every time I rotate it.
Am I missing something in my function, maybe?
Thanks in advance
The rotated image usually has to be larger than the old image to store all pixel values.
Each point (x,y) is translated to
(x', y') = (x*cos(rads) - y*sin(rads), x*sin(rads) + y*cos(rads))
An image with height h and width w, center at (0,0) and corners at
(h/2, w/2)
(h/2, -w/2)
(-h/2, w/2)
(-h/2, -w/2)
has a new height of
h' = 2*y' = 2 * (w/2*sin(rads) + h/2*cos(rads))
and a new width of
w' = 2*x' = 2 * (w/2*cos(rads) + h/2*sin(rads))
for 0 <= rads <= pi/4. The areas satisfy h*w <= h'*w', and for rads != k*pi/2 with k = 0, 1, 2, ... it is strictly h*w < h'*w'.
In any case the area of the rotated image is the same size as or larger than the area of the old image.
If you use the old size, you cut off the corners.
Example:
Your image has h=1, w=1 and rads=pi/4. You need a new image with h'=sqrt(2)=1.41421356237 and w'=sqrt(2)=1.41421356237 to store all pixel values. The pixel from (1,1) is translated to (0, sqrt(2)).
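If it helps, here is a small sketch (the function name is mine) of the canvas-size formula in its general form, with absolute values so it holds for any angle:

#include <cmath>

// Smallest canvas that holds a w x h image rotated by rads.
void rotatedCanvasSize(int w, int h, float rads, int &newW, int &newH)
{
    newW = (int)std::ceil(w * std::fabs(std::cos(rads)) + h * std::fabs(std::sin(rads)));
    newH = (int)std::ceil(w * std::fabs(std::sin(rads)) + h * std::fabs(std::cos(rads)));
}

Allocate dst with these dimensions, shift both centers accordingly, and the corners survive the rotation.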

SIFT orientations in OpenCV implementation

In the OpenCV implementation of SIFT, keypoints have angles in degrees (ranging from -180 to 180) that represent the calculated orientations of these keypoints. Since SIFT assigns the dominant orientation of a keypoint using 10-degree bins in a histogram, how can we get this range of angles? Shouldn't the values come in 10-degree steps?
Is that because of the histogram smoothing?
This is the code where keypoint.angle is assigned a value; can you help me understand how we get this value?
float omax = calcOrientationHist(gauss_pyr[o*(nOctaveLayers+3) + layer],
                                 Point(c1, r1),
                                 cvRound(SIFT_ORI_RADIUS * scl_octv),
                                 SIFT_ORI_SIG_FCTR * scl_octv,
                                 hist, n);
float mag_thr = (float)(omax * SIFT_ORI_PEAK_RATIO);
for( int j = 0; j < n; j++ )
{
    int l = j > 0 ? j - 1 : n - 1;
    int r2 = j < n-1 ? j + 1 : 0;
    if( hist[j] > hist[l] && hist[j] > hist[r2] && hist[j] >= mag_thr )
    {
        float bin = j + 0.5f * (hist[l]-hist[r2]) / (hist[l] - 2*hist[j] + hist[r2]);
        bin = bin < 0 ? n + bin : bin >= n ? bin - n : bin;
        kpt.angle = 360.f - (float)((360.f/n) * bin);
        if(std::abs(kpt.angle - 360.f) < FLT_EPSILON)
            kpt.angle = 0.f;
        keypoints.push_back(kpt);
    }
}
I think I found the answer to my question.
A parabola is fit to the three histogram values closest to each peak to interpolate the peak position for better accuracy. That's why we get a continuous range of values instead of values in 10-degree steps.
Here is a link on how to fit a parabola to 3 points:
Curve fitting
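For reference, a tiny standalone check of that interpolation (my own sketch, not OpenCV code): fitting a parabola through the three bins around a peak and taking its vertex gives exactly the offset term used for bin in the code above.

#include <cstdio>

// Vertex offset of the parabola through (-1, yl), (0, yc), (+1, yr);
// identical to the 0.5f*(hist[l]-hist[r2])/(hist[l]-2*hist[j]+hist[r2]) term.
float parabolicPeakOffset(float yl, float yc, float yr)
{
    return 0.5f * (yl - yr) / (yl - 2.0f * yc + yr);
}

int main()
{
    // Bins 3, 5, 4: the right neighbor is larger, so the true peak
    // sits slightly to the right of the center bin.
    std::printf("%f\n", parabolicPeakOffset(3.0f, 5.0f, 4.0f)); // prints 0.166667
}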

Nested loop summation in 2D using OpenCL

I have recently started working with OpenCL in C++ and I'm trying to fully understand how to use 2D and 3D NDRange. I'm currently implementing Inverse Distance Weighting in OpenCL, but my problem is general.
Below is the serial function to compute the weights and it consists of a nested loop.
void computeWeights(int nGrids, int nPoints, double *distances, double *weightSum, const double p) {
    for (int i = 0; i < nGrids; ++i) {
        double sum = 0;
        for (int j = 0; j < nPoints; ++j) {
            double weight = 1 / pow(distances[i * nPoints + j], p);
            distances[i * nPoints + j] = weight;
            sum += weight;
        }
        weightSum[i] = sum;
    }
}
What I want is to implement the above function using a 2D NDRange, with the first dimension over nGrids and the second over nPoints. What I don't understand, though, is how to handle the summation of the weights into weightSum[i]. I understand that I may have to use a parallel sum reduction somehow.
When dispatching a kernel with a 2D global workspace, OpenCL creates a grid of work-items. Each work-item executes the kernel and gets unique ids in both those dimensions.
(x,y)|________________________
| (0,0) (0,1) (0,2) ...
| (1,0) (1,1) (1,2)
| (2,0) (2,1) (2,2)
| ...
The work-items are also divided into groups and get unique ids within those work-groups. E.g. for work-groups of size (2,2):
(x,y)|________________________
| (0,0) (0,1) (0,0) ...
| (1,0) (1,1) (1,0)
| (0,0) (0,1) (0,0)
| ...
You can arrange the work-groups, so that each one of them performs a reduction.
Your SDK probably has samples, and a parallel reduction will be one of them.
To get you started, here is a kernel that solves your problem. It's in its simplest form: it works for a single work-group per row and assumes nPoints is a power of two (which the halving reduction below requires).
// cl::NDRange global(nPoints, nGrids);
// cl::NDRange local(nPoints, 1);
// arg 2: cl::Local(nPoints * sizeof(double))
kernel void computeWeights(global double *distances, global double *weightSum, local double *data, double p)
{
    uint nPoints = get_global_size(0);
    uint j = get_global_id(0);
    uint i = get_global_id(1);
    uint lX = get_local_id(0);

    double weight = 1.0 / pow(distances[i * nPoints + j], p);
    distances[i * nPoints + j] = weight;
    data[lX] = weight;

    for (uint d = get_local_size(0) >> 1; d > 0; d >>= 1)
    {
        barrier(CLK_LOCAL_MEM_FENCE);
        if (lX < d)
            data[lX] += data[lX + d];
    }

    if (lX == 0)
        weightSum[i] = data[0];
}
Each row of work-items (i.e. each work-group) computes the weights (and their sum) for grid i. Each work-item computes a weight, stores it back to distances, and loads it onto local memory. Then each work-group performs a reduction in local memory, and finally the result gets stored in weightSum.
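For completeness, a hedged host-side sketch (assuming the cl.hpp C++ wrapper; program, queue, and the two buffers are presumed to already exist) showing how the ranges in the kernel's comments map onto the dispatch:

cl::Kernel kern(program, "computeWeights");
kern.setArg(0, distancesBuf);                        // global double *distances
kern.setArg(1, weightSumBuf);                        // global double *weightSum
kern.setArg(2, cl::Local(nPoints * sizeof(double))); // local double *data
kern.setArg(3, p);                                   // double p
queue.enqueueNDRangeKernel(kern, cl::NullRange,
                           cl::NDRange(nPoints, nGrids), // global: one row per grid
                           cl::NDRange(nPoints, 1));     // local: one group per row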

SSE optimization of Gaussian blur

I'm working on a school project where I have to optimize part of the code with SSE, but I've been stuck on one part for a few days now.
I don't see any smart way of using SSE vector instructions (inline assembler / intrinsics) in this code (it's part of a Gaussian blur algorithm). I'd be glad if somebody could give me just a small hint:
for (int x = x_start; x < x_end; ++x) // vertical blur...
{
    float sum = image[x + (y_start - radius - 1)*image_w];
    float dif = -sum;
    for (int y = y_start - 2*radius - 1; y < y_end; ++y)
    {   // inner vertical radius loop
        float p = (float)image[x + (y + radius)*image_w]; // next pixel
        buffer[y + radius] = p;                           // buffer pixel
        sum += dif + fRadius*p;
        dif += p;                                         // accumulate pixel blur
        if (y >= y_start)
        {
            float s = 0, w = 0;                           // border blur correction
            sum -= buffer[y - radius - 1]*fRadius;        // addition for fraction blur
            dif += buffer[y - radius] - 2*buffer[y];      // sum up differences: +1, -2, +1
            // cut off accumulated blur area of pixel beyond the border
            // assume: added pixel values beyond border = value at border
            p = (float)(radius - y);                      // top part to cut off
            if (p > 0)
            {
                p = p*(p-1)/2 + fRadius*p;
                s += buffer[0]*p;
                w += p;
            }
            p = (float)(y + radius - image_h + 1);        // bottom part to cut off
            if (p > 0)
            {
                p = p*(p-1)/2 + fRadius*p;
                s += buffer[image_h - 1]*p;
                w += p;
            }
            new_image[x + y*image_w] = (unsigned char)((sum - s)/(weight - w)); // set blurred pixel
        }
        else if (y + radius >= y_start)
        {
            dif -= 2*buffer[y];
        }
    } // for y
} // for x
One more feature you can use is logical operations and masks. For example, instead of:

// processes only 1 float
if (p > 0)
    p = p*(p-1)/2 + fRadius*p;
you can write
// processes 4 floats at once (fRadius4 = _mm_set1_ps(fRadius))
const __m128 mask  = _mm_cmpgt_ps(p, _mm_setzero_ps());  // lanes where p > 0
const __m128 p_tmp = _mm_add_ps(                         // p*(p-1)/2 + fRadius*p
    _mm_mul_ps(_mm_mul_ps(p, _mm_sub_ps(p, _mm_set1_ps(1.0f))), _mm_set1_ps(0.5f)),
    _mm_mul_ps(fRadius4, p));
p = _mm_or_ps(_mm_and_ps(mask, p_tmp), _mm_andnot_ps(mask, p)); // p_tmp where p > 0, else p
I can also recommend using special libraries that overload the vector instructions with operators, for example: http://code.compeng.uni-frankfurt.de/projects/vc
The dif variable makes the iterations of the inner loop dependent on each other, so you should try to parallelize the outer loop instead. Without operator overloading, though, the code will become unmanageable.
Also consider rethinking the whole algorithm; the current one doesn't look parallel. Maybe you can sacrifice some precision, or accept a bit more scalar work?
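To make the outer-loop idea concrete, here is a hypothetical sketch (assuming float buffers for brevity; names mirror the original): four adjacent columns advance through the y loop together, one SSE lane per column, so the serial dependency on dif stays inside each lane.

#include <xmmintrin.h>

// Sketch: vertical blur over 4 columns at once; buffer4 holds 4 floats per y.
// The border-correction branches would follow the scalar version lane-wise.
void blurColumns4(const float *image, float *buffer4, int image_w,
                  int x_start, int x_end, int y_start, int y_end,
                  int radius, float fRadius)
{
    const __m128 fr = _mm_set1_ps(fRadius);
    for (int x = x_start; x + 4 <= x_end; x += 4)
    {
        __m128 sum = _mm_loadu_ps(&image[x + (y_start - radius - 1) * image_w]);
        __m128 dif = _mm_sub_ps(_mm_setzero_ps(), sum);      // dif = -sum
        for (int y = y_start - 2 * radius - 1; y < y_end; ++y)
        {
            __m128 p = _mm_loadu_ps(&image[x + (y + radius) * image_w]);
            _mm_storeu_ps(&buffer4[4 * (y + radius)], p);    // 4-wide buffer row
            sum = _mm_add_ps(sum, _mm_add_ps(dif, _mm_mul_ps(fr, p)));
            dif = _mm_add_ps(dif, p);
            // ... the if (y >= y_start) correction goes here, per lane ...
        }
    }
}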