Issue with reading pixel RGB values in OpenCL - c++

I need to read pixels from two regions of an image that have the same width and height (e.g. the squares ([0,0], [300,300]) and ([400,0], [700,300])) and compute the difference for each pair of pixels.
This is C (pseudo)code:
/**
* @param img Input image
* @param pos Integer position of top left corner of the second square (in this case 400)
*/
double getSum(Image& img, int pos)
{
const int width_of_cut = 300;
int right_bottom = pos + width_of_cut;
Rgb first, second;
double ret_val = 0.0;
for(int i=0; i < width_of_cut; i++)
{
for(int j=0; j < width_of_cut; j++)
{
first = img.getPixel( i, j );
second = img.getPixel( i + pos, j );
ret_val += ( first.R - second.R ) +
( first.G - second.G ) +
( first.B - second.B );
}
}
return ret_val;
}
But my kernel (with the same arguments, and with the __global float* output set to 0.0 in host code) gives me completely different values:
__constant sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE |
CLK_ADDRESS_CLAMP_TO_EDGE |
CLK_FILTER_NEAREST;
__kernel void getSum( __read_only image2d_t input,
const int x_coord,
__global float* output )
{
int width = get_image_width( input );
int height = get_image_height( input );
int2 pixelcoord = (int2) (get_global_id(0), get_global_id(1)); // image coordinates
const int width_of_cut = 300;
const int right_bottom = x_coord + width_of_cut;
int a,b;
a = (int)(pixelcoord.x + x_coord);
b = pixelcoord.y;
if( a < right_bottom && b < width_of_cut )
{
float4 first = read_imagef(input, sampler, pixelcoord);
float4 second = read_imagef(input, sampler, (int2)(a,b));
output[get_global_id(0)] += ((first.x - second.x) +
(first.y - second.y) +
(first.z - second.z));
}
}
I am new to OpenCL and I have no idea what I am doing wrong.
Update (1d image):
I changed the kernel code. Now I'm reading a 1D image in a single loop, but I'm still not getting the correct values. I'm not sure I know how to read pixels from a 1D image correctly.
__kernel void getSum( __read_only image1d_t input,
const int x_coord,
__global float* output,
const int img_width )
{
const int width_of_cut = 300;
int i = (int)(get_global_id(0));
for(int j=0; j < width_of_cut; j++)
{
int f = ( img_width*i + j );
int s = f + x_coord;
float4 first = read_imagef( input, sampler, f ); //pixel from 1st sq.
float4 second = read_imagef( input, sampler, s ); //pixel from 2nd sq.
output[get_global_id(0)] += ((first.x - second.x) +
(first.y - second.y) +
(first.z - second.z));
}
}

Race condition.
All vertical work items access the same output element (output[get_global_id(0)] +=), and they do not do so atomically. The results are therefore likely to be incorrect (e.g. two work items read the same value, add something to it, and write it back; only one write wins).
If your device supports it, you could make this an atomic operation, but it would be slow. You'd be better off running a 1D kernel that has a loop accumulating these vertically (so, the j loop from your C example).
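To make the suggested layout concrete, here is a minimal CPU sketch in plain C++ (not OpenCL) of the 1D decomposition: one work item per column, each with a private accumulator and a single write to its own output element, followed by a reduction on the host. The Rgb struct and the functions columnSums and totalSum are names made up for this illustration, and a row-major std::vector stands in for the image.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical RGB pixel type, used only for this sketch.
struct Rgb { float r, g, b; };

// One "work item" per column i: each owns output[i] and loops over the rows j,
// so no two work items ever write to the same element.
std::vector<float> columnSums(const std::vector<Rgb>& img, std::size_t imgWidth,
                              std::size_t pos, std::size_t cut)
{
    assert(imgWidth > 0 && pos + cut <= imgWidth && img.size() >= cut * imgWidth);
    std::vector<float> output(cut, 0.0f);
    for (std::size_t i = 0; i < cut; ++i) {        // would be get_global_id(0)
        float acc = 0.0f;                          // private accumulator
        for (std::size_t j = 0; j < cut; ++j) {    // the j loop from the C code
            const Rgb& first  = img[j * imgWidth + i];
            const Rgb& second = img[j * imgWidth + i + pos];
            acc += (first.r - second.r) + (first.g - second.g) + (first.b - second.b);
        }
        output[i] = acc;                           // single write per work item
    }
    return output;
}

// Final reduction of the per-column partial sums on the host.
double totalSum(const std::vector<float>& partial)
{
    double total = 0.0;
    for (float v : partial) total += v;
    return total;
}
```

In a real kernel the outer loop disappears (i becomes get_global_id(0)), and the global work size would be 300 rather than 300x300.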

Related

convolution implementation in c++

I want to implement 2D convolution function in C++ by myself, without using filter2D(). I'm trying to iterate all pixels of input image and kernel, then, assign new value to each pixel of dst.
However, I got this error.
Thread 1: EXC_BAD_ACCESS (code=1, address=0x0)
I found that this error means I'm dereferencing a null pointer, but I could not solve the problem. Here is my C++ code.
cv::Mat_<float> spatialConvolution(const cv::Mat_<float>& src, const cv::Mat_<float>& kernel)
{
// declare variables
Mat_<float> dst;
Mat_<float> flipped_kernel;
float tmp = 0.0;
// flip kernel
flip(kernel, flipped_kernel, -1);
// multiply and integrate
// input rows
for(int i=0;i<src.rows;i++){
// input columns
for(int j=0;j<src.cols;j++){
// kernel rows
for(int k=0;k<flipped_kernel.rows;k++){
// kernel columns
for(int l=0;l<flipped_kernel.cols;l++){
tmp += src.at<float>(i,j) * flipped_kernel.at<float>(k,l);
}
}
dst.at<float>(i,j) = tmp;
}
}
return dst.clone();
}
To simplify, let's suppose you have a 3x3 kernel:
k(0,0) k(0,1) k(0,2)
k(1,0) k(1,1) k(1,2)
k(2,0) k(2,1) k(2,2)
To calculate the convolution, you scan the input image (marked as I) from left to right, from top to bottom,
and for every pixel of the input image you assign one value calculated from the formula below:
newValue(y,x) = I(y-1,x-1) * k(0,0) + I(y-1,x) * k(0,1) + I(y-1,x+1) * k(0,2)
+ I(y,x-1) * k(1,0) + I(y,x) * k(1,1) + I(y,x+1) * k(1,2)
+ I(y+1,x-1) * k(2,0) + I(y+1,x) * k(2,1) + I(y+1,x+1) * k(2,2)
------------------x------------>
|
|
| [k(0,0) k(0,1) k(0,2)]
y [k(1,0) k(1,1) k(1,2)]
| [k(2,0) k(2,1) k(2,2)]
|
The point (y,x) of the input image (I) is the anchor point of the kernel. To compute the new value at (y,x)
you need to multiply every k coefficient by the corresponding point of I; your code doesn't do that.
First, you need to create the dst matrix with the same dimensions and pixel type as the original image.
Then you need to rewrite your loops to reflect the formula described above:
cv::Mat_<float> spatialConvolution(const cv::Mat_<float>& src, const cv::Mat_<float>& kernel)
{
Mat dst(src.rows,src.cols,src.type());
Mat_<float> flipped_kernel;
flip(kernel, flipped_kernel, -1);
const int dx = kernel.cols / 2;
const int dy = kernel.rows / 2;
for (int i = 0; i<src.rows; i++)
{
for (int j = 0; j<src.cols; j++)
{
float tmp = 0.0f;
for (int k = 0; k<flipped_kernel.rows; k++)
{
for (int l = 0; l<flipped_kernel.cols; l++)
{
int x = j - dx + l;
int y = i - dy + k;
if (x >= 0 && x < src.cols && y >= 0 && y < src.rows)
tmp += src.at<float>(y, x) * flipped_kernel.at<float>(k, l);
}
}
dst.at<float>(i, j) = saturate_cast<float>(tmp);
}
}
return dst.clone();
}
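For anyone who wants to check the index arithmetic without OpenCV, here is a small dependency-free sketch of the same idea on a flat std::vector. The Grid type and the convolve function are hypothetical helpers written for this illustration; taps that fall outside the image are skipped, which amounts to zero padding, and the kernel is flipped by reading it from its far corner instead of calling flip().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major image or kernel stored in a flat vector (hypothetical helper type).
struct Grid {
    int rows, cols;
    std::vector<float> data;
    float at(int r, int c) const { return data[static_cast<std::size_t>(r) * cols + c]; }
};

// Convolution with zero padding: the kernel is flipped, anchored at its centre,
// and taps that fall outside the image are skipped.
Grid convolve(const Grid& src, const Grid& kernel)
{
    assert(src.rows > 0 && src.cols > 0 && kernel.rows > 0 && kernel.cols > 0);
    Grid dst{src.rows, src.cols, std::vector<float>(src.data.size(), 0.0f)};
    const int dy = kernel.rows / 2;
    const int dx = kernel.cols / 2;
    for (int i = 0; i < src.rows; ++i) {
        for (int j = 0; j < src.cols; ++j) {
            float acc = 0.0f;  // reset for every output pixel
            for (int k = 0; k < kernel.rows; ++k) {
                for (int l = 0; l < kernel.cols; ++l) {
                    const int y = i - dy + k;
                    const int x = j - dx + l;
                    if (y < 0 || y >= src.rows || x < 0 || x >= src.cols) continue;
                    // Flip in both axes by reading the kernel from its far corner.
                    acc += src.at(y, x) * kernel.at(kernel.rows - 1 - k, kernel.cols - 1 - l);
                }
            }
            dst.data[static_cast<std::size_t>(i) * src.cols + j] = acc;
        }
    }
    return dst;
}
```

A quick sanity check is a kernel with a single 1 in the centre, which should return the input unchanged; a single 1 to the right of the centre should shift every row one pixel to the right.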
Your memory access error is presumably happening due to the line:
dst.at<float>(i,j) = tmp;
because dst is not initialized. You can't assign something to that index of the matrix if it has no size/data. Instead, initialize the matrix first: Mat_<float> dst; only declares an empty matrix and does not allocate any data. Use one of the Mat constructors that takes a cv::Size or the rows/columns (see the docs). For example, you can initialize dst with:
Mat dst{src.size(), src.type()};

list of white pixels indices in image using CUDA

Given a binary image, I want to return the list of indices of its white pixels using the GPU (CUDA). How do I determine the index into the points vector?
Here is the CUDA kernel.
//copy only active pixel locations
__global__ void get_white_pixels_kernel(unsigned char* bin_image,
float * points,
int width,
int height,
int grayWidthStep)
{
int row_index = threadIdx.y+ blockIdx.y*blockDim.y;
int col_index = threadIdx.x+blockIdx.x*blockDim.x;
if ((col_index < width) && (row_index < height))
{
//Location of gray pixel in output
const int gray_tid = row_index * grayWidthStep + col_index;
if(bin_image[gray_tid]==255)
points[--here is the index]= Point2f(row_index,col_index);
}
}
The following is a naive method to achieve the desired functionality:
1. Generate a mask of pixel indices, with dummy values for pixels whose value is zero.
2. Count the number of non-zero pixels.
3. Create an output vector with length equal to the non-zero count.
4. Copy the non-zero pixel indices from the generated mask to the output vector (a process known as stream compaction).
The following is sample code for the above-mentioned process.
Code
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <thrust/count.h>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <opencv2/opencv.hpp>
static void _check_err(cudaError_t err, const char* file, int line)
{
if(err)
{
const char* err_str = cudaGetErrorString(err);
printf("CUDA Error: %s\nFile: %s\nLine: %d\n", err_str, file, line);
exit(EXIT_FAILURE);
}
}
#define CHECK_ERR(err) _check_err((err), __FILE__, __LINE__)
__global__ void kernel_find_indices(const unsigned char* input, int width, int height, int step, int2* indices)
{
const int x = blockIdx.x * blockDim.x + threadIdx.x;
const int y = blockIdx.y * blockDim.y + threadIdx.y;
if(x < width && y < height)
{
const int tidPixel = y * step + x;
const int tidIndex = y * width + x;
unsigned char value = input[tidPixel];
int2 index_to_write;
if(value)
{
//Write actual index to pixels with non-zero value
index_to_write.x = x;
index_to_write.y = y;
}
else
{
//Write dummy index to pixels with zero value
index_to_write.x = -1;
index_to_write.y = -1;
}
indices[tidIndex] = index_to_write;
}
}
//Operator to check whether an index is of a non-zero pixel
struct isNonZeroIndex
{
__host__ __device__ bool operator()(const int2 &idx)
{
return (idx.x != -1) && (idx.y != -1);
}
};
std::vector<cv::Point> getIndicesOfNonZeroPixels(cv::Mat input)
{
std::vector<int2> output_int2;
std::vector<cv::Point> output;
int pixelCount = input.cols * input.rows;
size_t imageBytes= input.step * input.rows;
unsigned char* image_d;
thrust::device_vector<int2> index_buffer_d(pixelCount);
//Allocate device memory for input image
CHECK_ERR(cudaMalloc(&image_d, imageBytes));
//Copy input image to device
CHECK_ERR(cudaMemcpy(image_d, input.ptr(), imageBytes, cudaMemcpyHostToDevice));
dim3 block(16,16);
dim3 grid;
grid.x = (input.cols + block.x - 1) / block.x;
grid.y = (input.rows + block.y - 1) / block.y;
//Generate an index mask with dummy values for indices with zero pixel value
kernel_find_indices<<<grid, block>>>(image_d, input.cols, input.rows, input.step, thrust::raw_pointer_cast(index_buffer_d.data()));
CHECK_ERR(cudaDeviceSynchronize());
int nonZeroCount = thrust::count_if(index_buffer_d.begin(), index_buffer_d.end(), isNonZeroIndex());
//Keep only those indices whose pixel value is non-zero (stream compaction)
thrust::device_vector<int2> compacted(nonZeroCount);
thrust::copy_if(index_buffer_d.begin(), index_buffer_d.end(), compacted.begin(), isNonZeroIndex());
//Copy non-zero pixel indices to host
output_int2.resize(nonZeroCount);
thrust::copy(compacted.begin(), compacted.end(), output_int2.begin());
CHECK_ERR(cudaFree(image_d));
//Convert vector<int2> to vector<cv::Point>
output.resize(nonZeroCount);
for(size_t i=0; i<nonZeroCount; i++)
output[i] = cv::Point(output_int2[i].x, output_int2[i].y);
return output;
}
void run_test()
{
//Generate a sample test image
cv::Mat test = cv::Mat::zeros(100,100, CV_8UC1);
cv::rectangle(test, cv::Rect(5,5,20,20), cv::Scalar::all(255), CV_FILLED);
//Get pixel indices of non-zero pixels
std::vector<cv::Point> indices = getIndicesOfNonZeroPixels(test);
//Display those indices
for(size_t i=0; i<indices.size(); i++)
{
printf("%d, %d\n", indices[i].x, indices[i].y);
}
//Show image
cv::imshow("Sample", test);
cv::waitKey();
}
int main(int argc, char** argv)
{
run_test();
return 0;
}
Compilation Command
nvcc -o nz nz.cu -arch=sm_61 -L/usr/local/lib -lopencv_core
-lopencv_highgui -lopencv_imgproc
Please keep in mind that this code is for image of type 8UC1 (8 bit, single channel) only. You can easily extend it to other data-types as required.
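For comparison, the same mask, count, allocate and compact steps can be sketched on the CPU with the standard library, which is handy for validating the GPU output on small images. Index2 is a hypothetical stand-in for CUDA's int2 and nonZeroIndices is a name invented for this sketch; std::count_if and std::copy_if play the roles of their Thrust counterparts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's int2, so the sketch compiles without CUDA.
struct Index2 { int x, y; };

std::vector<Index2> nonZeroIndices(const std::vector<unsigned char>& image,
                                   int width, int height, int step)
{
    assert(width > 0 && height > 0 && step >= width);
    assert(image.size() >= static_cast<std::size_t>(step) * height);

    // Step 1: one mask entry per pixel; zero pixels get the dummy index (-1, -1).
    std::vector<Index2> mask(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const bool set = image[static_cast<std::size_t>(y) * step + x] != 0;
            mask[static_cast<std::size_t>(y) * width + x] = set ? Index2{x, y} : Index2{-1, -1};
        }
    }

    const auto isNonZero = [](const Index2& i) { return i.x != -1 && i.y != -1; };

    // Step 2: count the valid entries.
    const auto count = std::count_if(mask.begin(), mask.end(), isNonZero);

    // Step 3: size the output. Step 4: compact the valid entries into it.
    std::vector<Index2> out(static_cast<std::size_t>(count));
    std::copy_if(mask.begin(), mask.end(), out.begin(), isNonZero);
    return out;
}
```

Note that the pixel is read with the row step while the mask is indexed with the width, exactly as in the kernel above, so padding bytes at the end of each row are never inspected.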

Implementing For loop in open cv?

I want to implement a similar loop in OpenCV. This code was written in MATLAB. As I am new to OpenCV, I don't know how to proceed. Can anyone give me ideas on how to do this in C++?
for m=1:10
for n=1:20
for l=1:Ns
for k=1:Ns
Y(l,k)=image1(m-Ns+l-1,n-Ns+k-1);
DD(l,k)=image2(m-Ns+l-1,n-Ns+k-1);
end
end
e=Y-DD ;
end
end
Here image1 and image2 are 300x300 pixels in size. Y, DD, image1 and image2 are all Mat images.
In OpenCV, the images can be represented as either Mat or IplImage. Your question does not specify the type of the image.
If IplImage:
IplImage *img;
unsigned char *image = (unsigned char*)(img->imageData);
int imageStride = img->widthStep;
unsigned char pixData = image[xCount + yCount*imageStride];
If Mat:
Mat img;
unsigned char *image = (unsigned char*)(img.data);
int imageStride = img.step;
unsigned char pixData = image[xCount + yCount*imageStride];
pixData will contain the data at (xCount, yCount) for a single-channel 8-bit image. You can use this in your for loops.
Since you already know the logic, I am only showing how to access the data at a particular point in an image.
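The reason for multiplying by the stride rather than the width is that rows may be padded. A minimal sketch of that addressing rule without OpenCV, assuming a single-channel 8-bit image held in a std::vector (pixelAt is a hypothetical helper, not an OpenCV function):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reads the byte at column x, row y of a single-channel 8-bit image whose rows
// are `stride` bytes apart (stride >= width because rows may be padded).
unsigned char pixelAt(const std::vector<unsigned char>& image,
                      std::size_t stride, std::size_t x, std::size_t y)
{
    assert(x < stride && y * stride + x < image.size());
    return image[x + y * stride];
}
```

For example, with 3-pixel rows padded to 4 bytes, the first pixel of the second row lives at offset 4, not 3; indexing with the width would return a padding byte instead.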
An efficient way in OpenCV to access pixels in a for loop is to walk each row through a pointer:
cv::Mat rgbImage;
cv::Mat grayImage;
for ( int i = 0; i < rgbImage.rows; ++i )
{
const uint8_t* rowRgbI = rgbImage.ptr<uint8_t> ( i );
const uint8_t* rowGrayI = grayImage.ptr<uint8_t> ( i );
for ( int j = 0; j < rgbImage.cols; ++j )
{
uint8_t redChannel = *rowRgbI++;
uint8_t greenChannel = *rowRgbI++;
uint8_t blueChannel = *rowRgbI++;
uint8_t grayChannel = *rowGrayI++;
}
}
Depending on whether your image has one or more channels, you can modify the code above.
If you want to implement window sliding, you could do something like this:
cv::Mat img;
int windowWidth = 5;
int windowHeight = 5;
for ( int i = 0; i < img.rows - windowHeight; ++i )
{
for ( int j = 0; j < img.cols - windowWidth; ++j )
{
// either this (row range first, then column range; no pixel data is copied)
cv::Mat currentWindow = img(cv::Range(i, i + windowHeight), cv::Range(j, j + windowWidth));
// perform some operations on the currentWindow
// or do this (getRectSubPix takes the centre of the window, not its top-left corner)
cv::Mat patch;
cv::getRectSubPix(img, cv::Size(windowWidth, windowHeight), cv::Point2f(j + (windowWidth - 1) * 0.5f, i + (windowHeight - 1) * 0.5f), patch);
// perform some operations on the patch
}
}
You can read more about getRectSubPix().
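To tie this back to the MATLAB loop in the question, here is a rough sketch of the inner part (copy an Ns x Ns window from both images and subtract) in plain C++ without OpenCV. The Gray type and windowDifference are hypothetical names for this illustration, and the window is addressed by an explicit zero-based top-left corner, which is an assumption on my part rather than a literal translation of the m-Ns+l-1 offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major single-channel image (hypothetical helper type for this sketch).
struct Gray {
    int rows, cols;
    std::vector<float> data;
    float at(int r, int c) const { return data[static_cast<std::size_t>(r) * cols + c]; }
};

// Takes the Ns x Ns window whose top-left corner is (top, left) in both images
// and returns the element-wise difference e = Y - DD in row-major order.
std::vector<float> windowDifference(const Gray& image1, const Gray& image2,
                                    int top, int left, int Ns)
{
    assert(image1.rows == image2.rows && image1.cols == image2.cols);
    assert(Ns > 0 && top >= 0 && left >= 0);
    assert(top + Ns <= image1.rows && left + Ns <= image1.cols);
    std::vector<float> e(static_cast<std::size_t>(Ns) * Ns);
    for (int l = 0; l < Ns; ++l) {
        for (int k = 0; k < Ns; ++k) {
            const float y  = image1.at(top + l, left + k);   // Y(l,k)
            const float dd = image2.at(top + l, left + k);   // DD(l,k)
            e[static_cast<std::size_t>(l) * Ns + k] = y - dd;
        }
    }
    return e;
}
```

The two outer MATLAB loops over m and n then become ordinary loops that call this function for each window position, keeping in mind that MATLAB indices start at 1 and C++ indices at 0.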

Image Rotation in a C++ Method

I had posted before on Stack Overflow for the question how exactly to rotate a BMP image in a c++ program. Now, however, I have more to show in regards of my progress.
I was wondering how (or why) my program will not output the image after I do the image calculation:
void BMPImage::Rotate45Left(float point1, float point2, float point3)
{
float radians = (2 * 3.1416*45) / 360;
float cosine = (float)cos(radians);
float sine = (float)sin(radians);
float point1Xtreme = 0;
float point1Yearly = 0;
float point2Xtreme = 0;
float point2Yearly = 0;
float point3Xtreme = 0;
float point3Yearly = 0;
int SourceBitmapHeight = m_BIH.biHeight;
int SourceBitmapWidth = m_BIH.biWidth;
point1Xtreme = (-m_BIH.biHeight*sine);
point1Yearly = (m_BIH.biHeight*cosine);
point2Xtreme = (m_BIH.biWidth*cosine - m_BIH.biHeight*sine);
point2Yearly = (m_BIH.biHeight*cosine + m_BIH.biWidth*sine);
point3Xtreme = (m_BIH.biWidth*cosine);
point3Yearly = (m_BIH.biWidth*sine);
float Minx = min(0, min(point1Xtreme, min(point2Xtreme, point3Xtreme)));
float Miny = min(0, min(point1Yearly, min(point2Yearly, point3Yearly)));
float Maxx = max(point1Xtreme, max(point2Xtreme, point3Xtreme));
float Maxy = max(point1Yearly, max(point2Yearly, point3Yearly));
int FinalBitmapWidth = (int)ceil(fabs(Maxx) - Minx);
int FinalBitmapHeight = (int)ceil(fabs(Maxy) - Miny);
FinalBitmapHeight = m_BIH.biHeight;
FinalBitmapWidth = m_BIH.biWidth;
int finalBitmap;
If anyone has any helpful pointers, that would be great.
I should mention that:
I can't use other outside libraries for the purpose of this program
It is a small image processing program, which has a menu system
Image transformation is usually done by projecting a target pixel onto a source pixel then calculating the value for that target pixel. This way you can easily incorporate different interpolation methods.
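#include <algorithm>
#include <cmath>
#include <cstddef>
template <typename T>
struct Image {
Image(T* data, size_t rows, size_t cols) :
data_(data), rows_(rows), cols_(cols) {}
T* data_;
size_t rows_;
size_t cols_;
T& operator()(size_t row, size_t col) {
return data_[col + row * cols_];
}
// Read-only access, needed because the source image is passed as const.
const T& operator()(size_t row, size_t col) const {
return data_[col + row * cols_];
}
};
template <typename T>
T clamp(T value, T lower_bound, T upper_bound) {
return std::min(std::max(value, lower_bound), upper_bound);
}
template <typename T>
void rotate_image(Image<T> const &src, Image<T> &dst, float ang) {
// Affine transformation matrix
// H = [a, b, c]
// [d, e, f]
// Remember, we are transforming from destination to source,
// thus the negated angle.
const float c = std::cos(-ang);
const float s = std::sin(-ang);
// The translation terms map the centre of dst onto the centre of src.
const float dcx = dst.cols_ / 2.0f, dcy = dst.rows_ / 2.0f;
const float scx = src.cols_ / 2.0f, scy = src.rows_ / 2.0f;
const float H[] = {c, -s, scx - (c * dcx - s * dcy),
s, c, scy - (s * dcx + c * dcy)};
for (size_t row = 0; row < dst.rows_; ++row) {
for (size_t col = 0; col < dst.cols_; ++col) {
int src_col = static_cast<int>(std::round(H[0] * col + H[1] * row + H[2]));
src_col = clamp(src_col, 0, static_cast<int>(src.cols_) - 1);
int src_row = static_cast<int>(std::round(H[3] * col + H[4] * row + H[5]));
src_row = clamp(src_row, 0, static_cast<int>(src.rows_) - 1);
dst(row, col) = src(src_row, src_col);
}
}
}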
The above method rotates an image by an arbitrary angle using nearest-neighbour interpolation. I typed it directly into Stack Overflow, so it may still contain bugs; nonetheless, the concept is there.
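The question also tries to work out how large the rotated canvas has to be from the rotated corner points. A minimal sketch of that step, assuming rotation about the origin and an angle in radians; Size2 and rotatedBounds are hypothetical names, and this is only one way to size the destination image:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical result type for this sketch.
struct Size2 { int width, height; };

// Bounding box of a width x height rectangle rotated by `radians` about the origin.
Size2 rotatedBounds(int width, int height, double radians)
{
    assert(width > 0 && height > 0);
    const double c = std::cos(radians);
    const double s = std::sin(radians);
    // The corners (0,0), (w,0), (0,h) and (w,h) after rotation.
    const double xs[] = {0.0, width * c, -height * s, width * c - height * s};
    const double ys[] = {0.0, width * s, height * c, width * s + height * c};
    const double minX = *std::min_element(xs, xs + 4);
    const double maxX = *std::max_element(xs, xs + 4);
    const double minY = *std::min_element(ys, ys + 4);
    const double maxY = *std::max_element(ys, ys + 4);
    // The extent is max - min; no fabs is needed once all four corners are included.
    return Size2{static_cast<int>(std::ceil(maxX - minX)),
                 static_cast<int>(std::ceil(maxY - minY))};
}
```

For a 100x100 image rotated by 45 degrees this gives a 142x142 canvas, since 100 * sqrt(2) is roughly 141.42. Note that the code in the question overwrites FinalBitmapWidth and FinalBitmapHeight with the source dimensions right after computing them, so the rotated corners would be cut off.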

FFTW and OpenCV's C++ interface, real and imaginary part in Mat output

I'm trying to code a FFT/IFFT function with FFTW 3.3 and OpenCV 2.1 using the C++ interface. I've seen a lot of examples using the old OpenCV formats and I did a direct conversion, but something doesn't work.
The objective of my function is to return a Mat object with the real part and the imaginary part of the FFT, like OpenCV's default dft function does. Here is the code of the function. The program stops with a memory problem at the lines that copy im_data to data_in.
Does somebody know what I am doing wrong? Thank you
Mat fft_sr(Mat& I)
{
double *im_data;
double *realP_data;
double *imP_data;
fftw_complex *data_in;
fftw_complex *fft;
fftw_plan plan_f;
int width = I.cols;
int height = I.rows;
int step = I.step;
int i, j, k;
Mat realP=Mat::zeros(height,width,CV_64F); // Real Part FFT
Mat imP=Mat::zeros(height,width,CV_64F); // Imaginary Part FFT
im_data = ( double* ) I.data;
realP_data = ( double* ) realP.data;
imP_data = ( double* ) imP.data;
data_in = ( fftw_complex* )fftw_malloc( sizeof( fftw_complex ) * width * height );
fft = ( fftw_complex* )fftw_malloc( sizeof( fftw_complex ) * width * height );
// Problem Here
for( i = 0, k = 0 ; i < height ; i++ ) {
for( j = 0 ; j < width ; j++ ) {
data_in[k][0] = ( double )im_data[i * step + j];
data_in[k][1] = ( double )0.0;
k++;
}
}
plan_f = fftw_plan_dft_2d( height, width, data_in, fft, FFTW_FORWARD, FFTW_ESTIMATE );
fftw_execute( plan_f );
// Copy real and imaginary data
for( i = 0, k = 0 ; i < height ; i++ ) {
for( j = 0 ; j < width ; j++ ) {
realP_data[i * step + j] = ( double )fft[k][0];
imP_data[i * step + j] = ( double )fft[k][1];
k++;
}
}
Mat fft_I(I.size(),CV_64FC2);
Mat fftplanes[] = {Mat_<double>(realP), Mat_<double>(imP)};
merge(fftplanes, 2, fft_I);
fftw_destroy_plan(plan_f);
fftw_free(data_in);
fftw_free(fft);
return fft_I;
}
You are using step wrong. It is the row stride in bytes and is meant to index into Mat::data, which is a uchar*. Since you already cast Mat::data to double* when assigning it to im_data, you can index into im_data "normally" (for a continuous matrix):
data_in[k][0] = im_data[i * width + j];
When using step, which counts bytes, the correct way to index is to move to the start of the row first and only then cast:
data_in[k][0] = ( ( double* )( I.data + i * step ) )[j];
Update:
Try to access your images row-wise. That way you avoid running into problems with stride/step, while still exploiting fast access:
for (int i = 0; i < I.rows; i++)
{
double* row = I.ptr<double>(i);
for (int j = 0; j < I.cols; j++)
{
// Do something with the current pixel.
double someValue = row[j];
}
}
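To spell out why mixing a byte step with a double* goes out of bounds, here is a small sketch without OpenCV that reads a double from a raw byte buffer whose rows are stepBytes apart. elementAt is a hypothetical helper; it mirrors what a row pointer does: advance in bytes to the row, then index by element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Reads element (row, col) of a matrix of doubles stored in a raw byte buffer
// whose rows are `stepBytes` bytes apart, which is how Mat::step is measured.
double elementAt(const unsigned char* data, std::size_t stepBytes,
                 std::size_t row, std::size_t col)
{
    assert(data != nullptr && (col + 1) * sizeof(double) <= stepBytes);
    double value;
    // Move to the start of the row in bytes first, then index by element.
    std::memcpy(&value, data + row * stepBytes + col * sizeof(double), sizeof(double));
    return value;
}
```

With im_data[i * step + j] the row offset is multiplied by sizeof(double) a second time, because pointer arithmetic on a double* already counts in elements; for a 64-bit image that lands 8 times too far into memory.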
I know this is old, but when you are using FFTW you should initialize fftw_complex *data_in
only after creating the plan for the FFT. If I recall correctly, creating a plan with FFTW_MEASURE
overwrites the input and output arrays (FFTW_ESTIMATE, as used here, leaves them untouched).
So allocate before the plan and initialize after!
The statement
im_data = ( double* ) I.data;
defines im_data as a double pointer to the image data.
I think this requires I to be an image of double values (CV_64F); otherwise the cast reinterprets the pixel bytes incorrectly.