I have an OpenCL program that calculates pixel RGB values. The kernel is declared as follows.
__kernel void pixel_kernel( __global uchar* r,
__global uchar* g,
__global uchar* b,
__global uchar* a,
__global int* width,
__global int* height)
Inside the kernel a float4 variable col is computed. I want to extract its components and return them through the r, g, b and a uchar buffers.
At the end of the code I have
r[x]=255;
g[x]=0;
b[x]=0;
This compiles happily and the returned color is red.
If I try to convert the float4 values into RGB, I cannot work out how to cast them. For example, the following results in a compilation error and the kernel does not run:
r[x]=(uchar)(col[0]*255);
g[x]=(uchar)(col[1]*255);
b[x]=(uchar)(col[2]*255);
What am I missing? How should this cast be declared so it correctly converts the float RGB components into uchar values between 0 and 255?
Must be a simple fix, but I have tried all permutations of casting I can think of and none of them seem to want to work. Thanks for any tips.
The OpenCL float4 data type contains four floats. To address these components, you can use either .x, .y, .z, .w or .s0, .s1, .s2, .s3:
float4 col = (float4)(0.1f, 0.2f, 0.3f, 0.4f);
r[x]=(uchar)(col.s0*255);
g[x]=(uchar)(col.s1*255);
b[x]=(uchar)(col.s2*255);
a[x]=(uchar)(col.s3*255);
float4 col; is not the same as a 4-element array float col[4]; it is more like a C99 struct, which is why indexing such as col[0] does not work with float4. See also the OpenCL 1.2 Reference Card, page 3.
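As a side note on the conversion itself: a plain (uchar) cast of col.s0*255 truncates toward zero, and inputs outside 0..1 are not guaranteed to land in 0..255. Below is a minimal sketch in plain C (not OpenCL) of a clamped, rounded conversion. The helper name to_u8 is made up for illustration; inside a kernel, the built-in convert_uchar_sat should cover the clamping part.

```c
/* Hypothetical helper: map a normalized float channel (nominally 0..1)
   to an 8-bit value, clamping out-of-range input and rounding to nearest. */
static unsigned char to_u8(float channel)
{
    float scaled = channel * 255.0f;
    if (!(scaled > 0.0f)) return 0;      /* negative, zero, or NaN */
    if (scaled >= 255.0f) return 255;    /* clamp the upper end */
    return (unsigned char)(scaled + 0.5f);
}
```

With this helper, the assignment would read r[x] = to_u8(col_r); for a red component col_r held as a plain float.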
Related
I'm trying to port a piece of OpenCL kernel code over to SideFX Houdini using
its internal scripting language called VEX (which stands for vector expression).
However, I'm having trouble understanding what those indexes do and how they work.
I understand that get_global_id() returns the index of a given work item (I read it somewhere), but I don't really understand what exactly that is. (Perhaps something to do with the computer cores, I guess?)
So, assuming the input is a 2D grid of 500 pixels in x and y,
and assuming every pixel has some attributes (the ones I pass in as kernel arguments named *_in, while the *_out arguments update the same attribute values), what is the kernel doing with those index operations?
How exactly does it work, and how could I do the same in C, for example?
Many thanks in advance,
Alessandro
__kernel void rd_compute(__global float4 *a_in, __global float4 *b_in, __global float4 *c_in, __global float4 *d_in, __global float4 *e_in, __global float4 *f_in, __global float4 *g_in, __global float4 *h_in, __global float4 *i_in, __global float4 *a_out, __global float4 *b_out, __global float4 *c_out, __global float4 *d_out, __global float4 *e_out, __global float4 *f_out, __global float4 *g_out, __global float4 *h_out, __global float4 *i_out)
{
const int index_x = get_global_id(0);
const int index_y = get_global_id(1);
const int index_z = get_global_id(2);
const int X = get_global_size(0);
const int Y = get_global_size(1);
const int Z = get_global_size(2);
const int index_here = X*(Y*index_z + index_y) + index_x;
Please study many of the great introductory tutorials.
In serial code if you used a loop (e.g., for (int i=0; i<10; i++)) then int i = get_global_id(0) replaces that so you can get the index of the current work item. The runtime ensures that all work items are run. They might be in parallel, in serial, or in groups (some combination).
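To make the loop analogy concrete, here is a minimal serial sketch in plain C of the indexing in the kernel above. The names linear_index, work_item and run_grid are invented for illustration, and the loop order is arbitrary, since the OpenCL runtime makes no promise about the order in which work items execute.

```c
#include <stddef.h>

/* Mirrors the kernel's index_here = X*(Y*index_z + index_y) + index_x. */
static size_t linear_index(size_t x, size_t y, size_t z, size_t X, size_t Y)
{
    return X * (Y * z + y) + x;
}

/* Hypothetical stand-in for the kernel body: called once per work item. */
static void work_item(size_t x, size_t y, size_t z,
                      size_t X, size_t Y, int *visits)
{
    visits[linear_index(x, y, z, X, Y)] += 1;
}

/* Serial emulation of an X*Y*Z global range: the loop counters play the
   role of get_global_id(0..2), the bounds that of get_global_size(0..2). */
static void run_grid(size_t X, size_t Y, size_t Z, int *visits)
{
    for (size_t z = 0; z < Z; z++)
        for (size_t y = 0; y < Y; y++)
            for (size_t x = 0; x < X; x++)
                work_item(x, y, z, X, Y, visits);
}
```

For the 500 by 500 grid in the question, this would correspond to X = 500, Y = 500, Z = 1, so every pixel gets exactly one slot in the flat attribute arrays.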
I am trying to optimize a simple OpenCL kernel using float4 instead of float.
This is the example code without float4:
__kernel void Substract (
__global const float* data,
const float val,
__global float* result
){
size_t gi = get_global_id(0);
float input_val = data[gi];
result[gi] = val - input_val;
}
My idea for float4:
__kernel void substract (
__global const float* data,
const float val,
__global float* result
){
size_t gi = get_global_id(0);
float4 val2 = float4 (val,val,val,val);
float4 input_val = data[gi*4];
result[gi] = val2 - input_val;
}
However, this does not work, because we cannot write a float4 result back into a float array. Is there a performant way to write a float4 back to a normal float array in OpenCL? The simple idea would be a for loop with 4 iterations.
I want to optimize the kernel for both GPU and CPU.
So if I have one variant with float4 and one without, both should run with the exact same kernel arguments. Is this possible?
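You can just declare your arguments as float4 pointers; the buffers and kernel arguments on the host stay the same. Each work item then handles four floats, so the global work size should be a quarter of the element count (which needs to be a multiple of 4). Also, the compiler will automatically widen scalar values when they are used in expressions containing vectors, so you don't need to manually create a float4 version of val: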
__kernel void Substract (
__global const float4* data,
const float val,
__global float4* result
){
size_t gi = get_global_id(0);
float4 input_val = data[gi];
result[gi] = val - input_val;
}
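To see why the same host buffers work for both variants, here is a small serial sketch in plain C of what the float4 kernel does to a flat float array. The function name subtract4 and the n_items parameter are assumptions made for this illustration, and the float count is assumed to be a multiple of 4.

```c
#include <stddef.h>

/* Serial C stand-in for the float4 kernel: work item gi handles the four
   consecutive floats starting at data[4 * gi]. n_items is the number of
   work items, i.e. a quarter of the float count. */
static void subtract4(const float *data, float val, float *result,
                      size_t n_items)
{
    for (size_t gi = 0; gi < n_items; gi++) {
        /* The scalar val is applied to every lane, which is what the
           implicit scalar widening does in the kernel. */
        for (size_t lane = 0; lane < 4; lane++) {
            size_t i = 4 * gi + lane;
            result[i] = val - data[i];
        }
    }
}
```

The memory layout is identical to the scalar version; only the number of work items and the amount of data per item change.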
I have a problem downsampling an image with bilinear interpolation. I've read almost all the relevant articles on Stack Overflow and searched Google, trying to solve or at least locate the problem in my OpenCL kernel. This is my main source for the theory. I implemented the following code in OpenCL:
__kernel void downsample(__global uchar* image, __global uchar* outputImage, __global int* width, __global int* height, __global float* factor){
//image vector containing original RGB values
//outputImage vector containing "downsampled" RGB mean values
//factor - downsampling factor, downscaling the image by factor: 1024*1024 -> 1024/factor * 1024/factor
int r = get_global_id(0);
int c = get_global_id(1); //current coordinates
int oWidth = get_global_size(0);
int olc, ohc, olr, ohr; //coordinates of the original image used for bilinear interpolation
int index; //linearized index of the point
uchar q11, q12, q21, q22;
float accurate_c, accurate_r; //the exact scaled point
int k;
accurate_c = convert_float(c*factor[0]);
olc=convert_int(accurate_c);
ohc=olc+1;
if(!(ohc<width[0]))
ohc=olc;
accurate_r = convert_float(r*factor[0]);
olr=convert_int(accurate_r);
ohr=olr+1;
if(!(ohr<height[0]))
ohr=olr;
index= (c + r*oWidth)*3; //3 bytes per pixel
//Compute RGB values: take a central mean RGB values among four points
for(k=0; k<3; k++){
q11=image[(olc + olr*width[0])*3+k];
q12=image[(olc + ohr*width[0])*3+k];
q21=image[(ohc + olr*width[0])*3+k];
q22=image[(ohc + ohr*width[0])*3+k];
outputImage[index+k] = convert_uchar((q11*(ohc - accurate_c)*(ohr - accurate_r) +
q21*(accurate_c - olc)*(ohr - accurate_r) +
q12*(ohc - accurate_c)*(accurate_r - olr) +
q22*(accurate_c - olc)*(accurate_r - olr)));
}
}
The kernel works with factor = 2, 4, 5, 6 but not with factor = 3, 7 (I get missing pixels, and the image appears a little skewed), whereas the "identical" code written in C++ works fine with all factor values. I can't explain to myself why that happens in OpenCL. I attach my full code project here
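For comparison, here is a minimal reference sketch of bilinear sampling in plain C for a single 8-bit channel. It is not a diagnosis of the factor 3 and 7 problem; the helper name bilinear_sample, the single-channel row-major layout, and the assumption that fx and fy are non-negative and inside the image are all choices made for this illustration. One detail differs from the kernel above: the weights come from the fractional offsets tx and ty, so they still sum to 1 when a neighbour is clamped at the image edge.

```c
/* Hypothetical helper: bilinearly sample one 8-bit channel of a row-major,
   single-channel image at the fractional position (fx, fy).
   Assumes 0 <= fx <= width - 1 and 0 <= fy <= height - 1. */
static unsigned char bilinear_sample(const unsigned char *img,
                                     int width, int height,
                                     float fx, float fy)
{
    int x0 = (int)fx;
    int y0 = (int)fy;
    /* Clamp the right and bottom neighbours to the last column and row. */
    int x1 = (x0 + 1 < width)  ? x0 + 1 : x0;
    int y1 = (y0 + 1 < height) ? y0 + 1 : y0;
    /* Fractional offsets inside the cell, each in [0, 1). */
    float tx = fx - (float)x0;
    float ty = fy - (float)y0;

    float q00 = img[y0 * width + x0];
    float q10 = img[y0 * width + x1];
    float q01 = img[y1 * width + x0];
    float q11 = img[y1 * width + x1];

    float value = (1.0f - tx) * (1.0f - ty) * q00
                + tx          * (1.0f - ty) * q10
                + (1.0f - tx) * ty          * q01
                + tx          * ty          * q11;
    return (unsigned char)(value + 0.5f);  /* round to nearest */
}
```

Comparing the output of such a reference against the kernel for a single factor (say 3) on a tiny test image is one way to find which pixels diverge.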
I am trying to write some code in C++, and after some searching on the internet I found OpenCL code that does exactly what I want to do in C++. But since this is the first time I have seen OpenCL code, I don't know how to translate the following into C++:
const __global float4 *in_buf;
int x = get_global_id(0);
int y = get_global_id(1);
float result = y * get_global_size(0);
Is 'const __global float4 *in_buf' equivalent to 'const float *in_buf' in C++? And how should I translate the other calls above? Could anyone help? Thanks.
In general, you should take a look at the OpenCL specification (I'm assuming it's written in OpenCL 1.x) to better understand functions, types and how a kernel works.
Specifically for your question:
get_global_id returns the id of the current work item, and get_global_size returns the total number of work items. Since an OpenCL work-item is roughly equivalent to a single iteration in a sequential language, the equivalent of OpenCL's:
int x = get_global_id(0);
int y = get_global_id(1);
// do something with x and y
float result = y * get_global_size(0);
would, in C, be:
for (int x = 0; x < dim0; x++) {
for (int y = 0; y < dim1; y++) {
// do something with x and y
float result = y * dim0;
}
}
As for float4, it's a vector type of 4 floats, roughly equivalent to C's float[4] (except that it supports many additional operators, such as vector arithmetic). Of course in this case it's a buffer, so an appropriate type would be float** or a pointer to float[4] (float (*)[4] in C); better yet, just pack them together into a float* buffer and then load 4 at a time.
Feel free to ignore the __global modifier.
const __global float4 *in_buf is not equivalent to const float *in_buf.
OpenCL uses vector types, e.g. floatN, where N is e.g. 2, 4 or 8. So float4 is in effect struct { float x, y, z, w; } with many extra operations available to express vector arithmetic.
get_global_id(0) gives you the iterator variable, so essentially replace every get_global_id(dim) with a loop such as for (int x = 0; x < max[dim]; x++), where max[dim] stands for the global size in that dimension.
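As a rough illustration of the float4 part, here is a minimal sketch in plain C that treats a flat float buffer as a sequence of 4-component vectors. The names float4_t, load4 and add4 are invented for this example; OpenCL provides the type and the arithmetic natively.

```c
#include <stddef.h>

/* Illustrative C stand-in for OpenCL's float4 (the name float4_t is made up). */
typedef struct { float x, y, z, w; } float4_t;

/* Load the i-th group of four floats from a flat buffer, the way
   in_buf[i] would read one float4 in the kernel. */
static float4_t load4(const float *buf, size_t i)
{
    float4_t v = { buf[4 * i], buf[4 * i + 1], buf[4 * i + 2], buf[4 * i + 3] };
    return v;
}

/* Component-wise addition, which OpenCL writes simply as a + b. */
static float4_t add4(float4_t a, float4_t b)
{
    float4_t r = { a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w };
    return r;
}
```

In C++ the same idea could be wrapped in a small struct with overloaded operators, but the flat float buffer underneath stays the same.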
A function that I am trying to conform to requires three 1-Dimensional arrays of type double[19200]. The following arrays are RGB arrays such that:
double r[19200]; // r
double g[19200]; // g
double b[19200]; // b
So far, I can extract pixel information from a QImage and populate the above arrays.
The problem is with testing. I don't know how to do the inverse: given the three 1-Dimensional arrays how do I create a new QImage from this data?
I would like to verify that I am indeed getting the correct values. (Things like column-major vs. row-major order are giving me doubts.) As a result, I am trying to construct a QImage from these three 1-dimensional arrays.
I don't really understand why you're having a problem if you managed to do it one way. The process is essentially the same:
for (int x=0; x<w; x++)
for (int y=0; y<h; y++)
image.setPixel(x, y, convertToRGB(r[x*h+y], ...)); // column-major: column x starts at index x*h
Here convertToRGB is the inverse of the transform you used to convert an RGB value to your stored values, and the image is assumed to have dimensions w*h. If you discover this is the wrong row-major/column-major variant, just swap it.
Since you gave no info about how you do the color space conversion, and we don't know if it's row-major or column-major either, can't help you much more than that.
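The two layouts differ only in how (x, y) maps to a flat index. A minimal sketch in plain C, assuming for illustration that the 19200 values form a 160 by 120 image (the question does not state the dimensions); the function names are made up:

```c
#include <stddef.h>

/* Row-major: rows are stored one after another, so row y starts at y * w. */
static size_t row_major_index(size_t x, size_t y, size_t w)
{
    return y * w + x;
}

/* Column-major: columns are stored one after another,
   so column x starts at x * h. */
static size_t col_major_index(size_t x, size_t y, size_t h)
{
    return x * h + y;
}
```

If the reconstructed image looks transposed or sheared, trying the other function in the setPixel loop is a quick check of which layout the extraction step used.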
Well it looks like QImage supports a couple of ways to load from pixel arrays.
QImage(const uchar *data, int width, int height, Format format)
bool QImage::loadFromData(const uchar *buf, int len, const char *format=0)
Using the first one, if you have the arrays you mention, then you will likely want to use the format QImage::Format_RGB888 (from qimage.h). (loadFromData expects encoded image data such as PNG, so it is not suited to raw pixel arrays.)
You will need to know the width and height yourself.
Finally you will want to repack your arrays into a single uchar* array
uchar* rgb_array = new uchar[19200+19200+19200];
for( int i = 0, j = 0; j < 19200; ++j )
{
// here we convert from the double range 0..1 to the integer range 0..255
rgb_array[i++] = r[j] * 255;
rgb_array[i++] = g[j] * 255;
rgb_array[i++] = b[j] * 255;
}
{
QImage my_image( rgb_array, width, height, width * 3, QImage::Format_RGB888 ); // width * 3 = bytes per scanline, since rows are tightly packed here
// do stuff with my_image...
}
delete[] rgb_array; // note you need to hold onto this array while the image still exists