Using halide with HDR images represented as float array - c++

This is my first post here, so sorry if I do something wrong :). I will try to do my best.
I am currently working on my HDR image processing program, and I want to implement some basic TMOs (tone mapping operators) using Halide. The problem is that all my images are represented as float arrays (with an order like: b1,g1,r1,a1, b2,g2,r2,a2, ... ). Processing an image with Halide requires the Halide::Image class, and I don't know how to pass my data to it.
Can anyone help, or has anyone had the same problem and found the answer?
Edit:
Finally got it! I needed to set the strides on the input and output buffers in the generator. Thanks all for the help :-)
Edit:
I tried two different ways:
int halideOperations( float data[] , int size, int width, int height )
{
buffer_t input_buf = { 0 };
input_buf.host = &data[0];
}
or:
int halideOperations( float data[] , int size, int width, int height )
{
Halide::Image(Halide::Type::Float, x, y, 0, 0, data);
}
I was thinking about editing the Halide.h file and changing uint8_t * host to float_t * host, but I don't think that's a good idea.
Edit:
I tried using code below with my float image (RGBA):
AOT function generation:
int main(int arg, char ** argv)
{
Halide::ImageParam img(Halide::type_of<float>(), 3);
Halide::Func f;
Halide::Var x, y, c;
f(x, y, c) = Halide::pow(img(x,y,c), 2.f);
std::vector<Halide::Argument> arguments = { img };
f.compile_to_file("function", arguments);
return 0;
}
Proper code calling:
int halideOperations(float data[], int size, int width, int height)
{
buffer_t output_buf = { 0 };
buffer_t buf = { 0 };
buf.host = (uint8_t *)data;
float * output = new float[width * height * 4];
output_buf.host = (uint8_t*)(output);
output_buf.extent[0] = buf.extent[0] = width;
output_buf.extent[1] = buf.extent[1] = height;
output_buf.extent[2] = buf.extent[2] = 4;
output_buf.stride[0] = buf.stride[0] = 4;
output_buf.stride[1] = buf.stride[1] = width * 4;
output_buf.elem_size = buf.elem_size = sizeof(float);
function(&buf, &output_buf);
delete[] output; // allocated with new[], so it must be released with delete[]
return 1;
}
Unfortunately, I got a crash with the message:
Error: Constraint violated: f0.stride.0 (4) == 1 (1)
I think something is wrong with this line: output_buf.stride[0] = buf.stride[0] = 4, but I'm not sure what I should change. Any tips?

If you are using buffer_t directly, you must cast the pointer assigned to host to a uint8_t * :
buf.host = (uint8_t *)&data[0]; // Often, can be just "(uint8_t *)data"
This is what you want to do if you are using Ahead-Of-Time (AOT) compilation and the data is not being allocated as part of the code which directly calls Halide. (Other methods discussed below control the storage allocation so they cannot take a pointer that is passed to them.)
If you are using either Halide::Image or Halide::Tools::Image, then the type casting is handled internally. The constructor used above for Halide::Image doesn't exist, as Halide::Image is a template class where the underlying data type is a template parameter:
Halide::Image<float> image_storage(width, height, channels);
Note this will store the data in planar layout. Halide::Tools::Image is similar but has an option to do interleaved layout. (Personally, I try not to use either of these outside of small test programs. There is a long term plan to rationalize all of this which will proceed after the arbitrary dimension buffer_t branch is merged. Note also Halide::Image requires libHalide.a to be linked where Halide::Tools::Image does not and is header file only via including common/halide_image.h .)
There is also the Halide::Buffer class which is a wrapper on buffer_t that is useful in Just-In-Time (JIT) compilation. It can reference passed in storage and is not templated. However my guess is you want to use buffer_t directly and simply need the type cast to assign host. Also be sure to set the elem_size field of buffer_t to "sizeof(float)".
For an interleaved float buffer, you'll end up with something like:
buffer_t buf = {0};
buf.host = (uint8_t *)float_data; // Might also need const_cast
// If the buffer doesn't start at (0, 0), then assign mins
buf.extent[0] = width; // In elements, not bytes
buf.extent[1] = height; // In elements, not bytes
buf.extent[2] = 3; // Assuming RGB
// No need to assign additional extents as they were init'ed to zero above
buf.stride[0] = 3; // RGB interleaved
buf.stride[1] = width * 3; // Assuming no line padding
buf.stride[2] = 1; // Channel interleaved
buf.elem_size = sizeof(float);
You will also need to pay attention to the bounds in the Halide code itself. Probably best to look at the set_stride and bound calls in tutorial/lesson_16_rgb_generate.cpp for information on that.
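To make the stride values above concrete, here is a minimal sketch in plain C++ (no Halide required) of how extents and strides map a coordinate (x, y, c) to an element offset. Layout, interleaved, planar and offset are made-up names for illustration only and are not part of the Halide API; the sketch assumes no line padding.

```cpp
#include <cstddef>

// Hypothetical stand-in for the extent/stride fields of buffer_t.
// All values are counted in elements, not bytes.
struct Layout {
    int extent[3];
    int stride[3];
};

// Interleaved layout (b1,g1,r1,a1, b2,g2,r2,a2, ...): the channels of one
// pixel are adjacent, so stepping in x skips over all channels.
inline Layout interleaved(int width, int height, int channels) {
    return Layout{{width, height, channels}, {channels, width * channels, 1}};
}

// Planar layout: each channel is stored as its own width*height plane,
// so stepping in x moves by a single element.
inline Layout planar(int width, int height, int channels) {
    return Layout{{width, height, channels}, {1, width, width * height}};
}

// Offset (in elements) of sample (x, y, c) from the start of the buffer.
inline std::size_t offset(const Layout& l, int x, int y, int c) {
    return static_cast<std::size_t>(x) * l.stride[0]
         + static_cast<std::size_t>(y) * l.stride[1]
         + static_cast<std::size_t>(c) * l.stride[2];
}
```

For a 4-channel interleaved image, stride[0] comes out as 4 rather than 1, which is the value the "f0.stride.0 (4) == 1" constraint in the question complains about until the pipeline is told to expect it.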

In addition to Zalman's answer above you also have to specify the strides for the inputs and outputs when defining your Halide function like below:
int main(int arg, char ** argv)
{
Halide::ImageParam img(Halide::type_of<float>(), 3);
Halide::Func f;
Halide::Var x, y, c;
f(x, y, c) = Halide::pow(img(x,y,c), 2.f);
// You need the following
f.output_buffer().set_stride(0, f.output_buffer().extent(2));
f.output_buffer().set_stride(1, f.output_buffer().extent(0) * f.output_buffer().extent(2));
img.set_stride(0, img.extent(2));
img.set_stride(1, img.extent(2) *img.extent(0));
// <- up to here
std::vector<Halide::Argument> arguments = { img };
f.compile_to_file("function", arguments);
return 0;
}
Then your code should run.

Related

Get certain pixels with their coordinates from an Image with Qt/QML

We are creating a game with maps. Players can walk on those maps, but to know where they can walk, we have a second image on which the walkable path is painted.
The player moves by clicking on the map. If the click matches the collider image, the character should go to the clicked point using a pathfinder. If not, the character doesn't move.
For example, here is a map and its collision path image:
How can I know if I've clicked on the collider (this is a PNG with one color and transparency) in Qt?
I'm using QML and Felgo for rendering, so if there is already a way to do it with QML, that's even better, but I can implement it in C++ too.
My second question is: how can I implement a pathfinder? I know the algorithms for that, but should I move pixel by pixel?
I've seen the QPainterPath class, which could be what I'm looking for. How can I read all pixels with a certain color from my image and get their coordinates?
Thanks
The QML interface doesn't provide an efficient way to do this. It should be done on the C++ side.
To get the image data you can:
Use QImage to load the image.
Call QImage::constScanLine N times, each time reading K pixels, where N is the image height in pixels and K is the width.
How do you deal with the uchar* returned by QImage::constScanLine?
You should call QImage::format() to determine the pixel format behind the uchar*. Or you can call QImage::convertToFormat(QImage::Format_RGB32) and always cast the pixel data from uchar* to a custom struct like PixelData. Format_RGB32 stores each pixel as a 32-bit 0xffRRGGBB value in native byte order, so on a little-endian machine the bytes in memory are B, G, R, followed by the unused 0xff byte:
#pragma pack(push, 1)
struct PixelData {
uint8_t b;
uint8_t g;
uint8_t r;
uint8_t padding; // always 0xff in Format_RGB32
};
#pragma pack(pop)
according to this documentation: https://doc.qt.io/qt-5/qimage.html#Format-enum
Here is a compilable solution for loading the image into RAM so that you can work with its data efficiently:
#include <QImage>
#include <cstdint>
#include <cstring>
#pragma pack(push, 1)
struct PixelData {
uint8_t b;
uint8_t g;
uint8_t r;
uint8_t padding; // always 0xff in Format_RGB32
};
#pragma pack(pop)
void loadImage(const char* path, int& w, int& h, PixelData** data) {
Q_ASSERT(data);
QImage initialImage;
initialImage.load(path);
auto image = initialImage.convertToFormat(QImage::Format_RGB32);
w = image.width();
h = image.height();
*data = new PixelData[w * h];
PixelData* outData = *data;
for (int y = 0; y < h; y++) {
auto scanLine = image.constScanLine(y);
memcpy(outData, scanLine, sizeof(PixelData) * w);
outData += w;
}
}
void pathfinder(const PixelData* data, int w, int h) {
// Your algorithm here
}
void cleanupData(PixelData* data) {
delete[] data;
}
int main(int argc, char *argv[])
{
int width, height;
PixelData* data;
loadImage("D:\\image.png", width, height, &data);
pathfinder(data, width, height);
cleanupData(data);
return 0;
}
You can access each pixel by calling this function
inline const PixelData& getPixel(int x, int y, const PixelData* data, int w) {
return *(data + (w * y) + x);
}
... or use this formula somewhere in your pathfinding algorithm, where it could be more efficient.
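As for the "Your algorithm here" part, one possible shape is a breadth-first search over a walkability mask. The sketch below is plain C++ and only an illustration: Point and shortestPathLength are hypothetical names, and it assumes you have already turned the collider image into one byte per pixel (1 = walkable). If you derive that mask from transparency, note that Format_RGB32 does not keep an alpha channel, so you would need a format that does, such as QImage::Format_ARGB32, or test the color instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

struct Point { int x; int y; };

// Breadth-first search over a row-major walkability mask (non-zero = walkable).
// Assumes walkable.size() == w * h. Returns the number of 4-connected steps
// from start to goal, or -1 if the goal cannot be reached.
int shortestPathLength(const std::vector<std::uint8_t>& walkable,
                       int w, int h, Point start, Point goal) {
    auto inside = [&](int x, int y) { return x >= 0 && y >= 0 && x < w && y < h; };
    auto index = [&](int x, int y) { return static_cast<std::size_t>(y) * w + x; };

    if (!inside(start.x, start.y) || !inside(goal.x, goal.y)) return -1;
    if (!walkable[index(start.x, start.y)] || !walkable[index(goal.x, goal.y)]) return -1;

    std::vector<int> dist(walkable.size(), -1);  // -1 marks "not visited yet"
    std::queue<Point> frontier;
    dist[index(start.x, start.y)] = 0;
    frontier.push(start);

    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};
    while (!frontier.empty()) {
        Point p = frontier.front();
        frontier.pop();
        if (p.x == goal.x && p.y == goal.y) return dist[index(p.x, p.y)];
        for (int i = 0; i < 4; ++i) {
            int nx = p.x + dx[i];
            int ny = p.y + dy[i];
            if (!inside(nx, ny)) continue;
            std::size_t n = index(nx, ny);
            if (!walkable[n] || dist[n] != -1) continue;
            dist[n] = dist[index(p.x, p.y)] + 1;
            frontier.push(Point{nx, ny});
        }
    }
    return -1;
}
```

Searching pixel by pixel works but can be slow on large maps; downsampling the mask into coarser cells before searching is a common way to cut the cost.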

ITK Filter on an array of floats

I want to use ITK to Filter a 3D volume that is in a contiguous float array. I want to apply the Curvature Flow Filter as shown in this example.
I don't know how to interface my float array with ITK and then to get a float array out on the other side. Will it modify the data in place? How does it work?
Here is some code to demonstrate what I am trying to do.
#include "itkCurvatureFlowImageFilter.h"
int main(int argc, char *argv[])
{
float *my_array; // I have an array I have generated elsewhere
// This is a 3D volume in order XYZ with these dimensions along each axis
size_t num_x = 125;
size_t num_y = 250;
size_t num_z = 125;
size_t num_elements = num_x * num_y * num_z;
float voxel_size = 8e-3; // 8 mm voxels
constexpr unsigned int Dimension = 3;
// convert to an itk image
// itkimage = ???
using InputPixelType = float;
using InputImageType = itk::Image<InputPixelType, Dimension>;
const int numberOfIterations = 10;
const InputPixelType timeStep = 0.05;
using FilterType = itk::CurvatureFlowImageFilter<InputImageType, InputImageType>;
FilterType::Pointer filter = FilterType::New();
filter->SetInput(itkimage);
filter->SetNumberOfIterations(numberOfIterations);
filter->SetTimeStep(timeStep);
// now I want to put the filter result back in the array
filter->GetOutput();
// Is the data modified in place? How can I get a regular float array out??
return EXIT_SUCCESS;
}
itk::ImportImageFilter is used for the purpose of representing some array as an image. Look at the examples linked in the documentation.
As CurvatureFlowImageFilter derives from InPlaceImageFilter, and both your input and output pixel type are same (float), it can run in-place. But you still need to request that:
itkimage = import...;
...
filter->SetInput(itkimage );
// set other parameters
filter->SetInPlace(true);
filter->Update();
itkimage = filter->GetOutput();

Flatten array of structs efficiently

I'm looking for the most efficient way to flatten an array of structs in C++ so that I can pass the flattened 1D array data as input to a cv::Mat. The struct looks as follows:
struct Color3
{
uint8_t red, green, blue;
};
My code then looks like this:
// Update color frame
cv::Mat colorMat = cv::Mat::zeros(cv::Size(1920, 1080), CV_8UC3);
const Color3* colorPtr = colorFrame->getData(); // Get Frame from Library
std::vector<uchar> vecColorData;
vecColorData.reserve(1920 * 1080 * 3);
for (int i = 0; i < 1920 * 1080; ++i)
{
auto color = *colorPtr;
vecColorData.push_back(color.red);
vecColorData.push_back(color.green);
vecColorData.push_back(color.blue);
colorPtr++; // advance to the next pixel
}
colorMat.data = vecColorData.data();
Is there a more efficient way than creating an intermediate std::vector and looping over the entire array? I guess I'm looking for something like:
colorMat.data = colorFrame->getData()
However, I'm getting the following error: a value of type Color3* cannot be assigned to an entity of type uchar*.
You don't need an intermediate vector.
If I understood correctly, you want to assign the same RGB triple to all the data.
It is also unclear to me whether you have to allocate colorMat.data on your own or not.
If this is the case, once colorMat.data is allocated with a size of 1920 * 1080 * 3, you can do something like the following:
uchar * data = colorMat.data;
for (int i = 0; i < 1920 * 1080; ++i)
{
*data++ = (uchar)colorPtr->red;
*data++ = (uchar)colorPtr->green;
*data++ = (uchar)colorPtr->blue;
}
The following answer is not technically portable but will work on the vast majority of platforms you will encounter in real life.
It is extremely likely that your Color3 struct has no padding. You can verify this by using a static_assert:
static_assert(sizeof(Color3) == sizeof(uint8_t) * 3);
With this confirmed you can cast an array of Color3 to an array of uint8_t and pass it directly to colorMat.data (assuming that member actually accepts uint8_t).
Your code therefore becomes:
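cv::Mat colorMat = cv::Mat::zeros(cv::Size(1920, 1080), CV_8UC3);
const Color3* colorPtr = colorFrame->getData(); // Get Frame from Library
// cv::Mat::data is a non-const uchar*, so the const qualifier has to be cast away as well
colorMat.data = const_cast<uint8_t*>(reinterpret_cast<const uint8_t*>(colorPtr));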
Bear in mind I have never used the cv library and know nothing about the ownership requirements of the data pointer. The above just replicates what you're doing without the unnecessary std::vector.
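To illustrate the idea without pulling in OpenCV, here is a small self-contained sketch of both options: viewing the struct array as bytes without copying, and copying it with a single memcpy. asBytes and flatten are hypothetical helper names, and Color3 is redefined here only so the snippet compiles on its own.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

struct Color3 {
    std::uint8_t red, green, blue;
};

// The flat view is only valid if the struct is exactly three bytes
// (no padding) and can be copied bytewise.
static_assert(sizeof(Color3) == 3, "Color3 must be tightly packed");
static_assert(std::is_trivially_copyable<Color3>::value,
              "Color3 must be trivially copyable");

// Zero-copy: view the pixel array as raw bytes (R, G, B, R, G, B, ...).
inline const std::uint8_t* asBytes(const Color3* pixels) {
    return reinterpret_cast<const std::uint8_t*>(pixels);
}

// Copying alternative: one memcpy instead of a per-pixel loop.
inline std::vector<std::uint8_t> flatten(const Color3* pixels, std::size_t count) {
    std::vector<std::uint8_t> out(count * sizeof(Color3));
    if (!out.empty()) {
        std::memcpy(out.data(), pixels, out.size());
    }
    return out;
}
```

On the OpenCV side, cv::Mat also has a constructor that takes an external data pointer and wraps it without copying; the Mat does not take ownership in that case, so the frame memory has to outlive the Mat.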

Halide with GPU (OpenGL) as Target - benchmarking and using HalideRuntimeOpenGL.h

I am new to Halide. I have been playing around with the tutorials to get a feel for the language. Now, I am writing a small demo app to run from command line on OSX.
My goal is to perform a pixel-by-pixel operation on an image, schedule it on the GPU and measure the performance. I have tried a couple things which I want to share here and have a few questions about the next steps.
First approach
I scheduled the algorithm on GPU with Target being OpenGL, but because I could not access the GPU memory to write to a file, in the Halide routine, I copied the output to the CPU by creating Func cpu_out similar to the glsl sample app in the Halide repo
pixel_operation_cpu_out.cpp
#include "Halide.h"
#include <stdio.h>
using namespace Halide;
const int _number_of_channels = 4;
int main(int argc, char** argv)
{
ImageParam input8(UInt(8), 3);
input8
.set_stride(0, _number_of_channels) // stride in dimension 0 (x) is the number of channels (four)
.set_stride(2, 1); // stride in dimension 2 (c) is one
Var x("x"), y("y"), c("c");
// algorithm
Func input;
input(x, y, c) = cast<float>(input8(clamp(x, input8.left(), input8.right()),
clamp(y, input8.top(), input8.bottom()),
clamp(c, 0, _number_of_channels - 1))) / 255.0f;
Func pixel_operation;
// calculate the corresponding value for input(x, y, c) after doing a
// pixel-wise operation on each pixel. This gives us pixel_operation(x, y, c).
// This operation is not location dependent, eg: brighten
Func out;
out(x, y, c) = cast<uint8_t>(pixel_operation(x, y, c) * 255.0f + 0.5f);
out.output_buffer()
.set_stride(0, _number_of_channels)
.set_stride(2, 1);
input8.set_bounds(2, 0, _number_of_channels); // Dimension 2 (c) starts at 0 and has extent _number_of_channels.
out.output_buffer().set_bounds(2, 0, _number_of_channels);
// schedule
out.compute_root();
out.reorder(c, x, y)
.bound(c, 0, _number_of_channels)
.unroll(c);
// Schedule for GLSL
out.glsl(x, y, c);
Target target = get_target_from_environment();
target.set_feature(Target::OpenGL);
// create a cpu_out Func to copy over the data in Func out from GPU to CPU
std::vector<Argument> args = {input8};
Func cpu_out;
cpu_out(x, y, c) = out(x, y, c);
cpu_out.output_buffer()
.set_stride(0, _number_of_channels)
.set_stride(2, 1);
cpu_out.output_buffer().set_bounds(2, 0, _number_of_channels);
cpu_out.compile_to_file("pixel_operation_cpu_out", args, target);
return 0;
}
Since I compile this AOT, I make a function call in my main() for it. main() resides in another file.
main_file.cpp
Note: the Image class used here is the same as the one in this Halide sample app
int main()
{
char *encoded_jpeg_input_buffer = read_from_jpeg_file("input_image.jpg");
unsigned char *pixelsRGBA = decompress_jpeg(encoded_jpeg_input_buffer);
Image input(width, height, channels, sizeof(uint8_t), Image::Interleaved);
Image output(width, height, channels, sizeof(uint8_t), Image::Interleaved);
input.buf.host = &pixelsRGBA[0];
unsigned char *outputPixelsRGBA = (unsigned char *)malloc(sizeof(unsigned char) * width * height * channels);
output.buf.host = &outputPixelsRGBA[0];
double best = benchmark(100, 10, [&]() {
pixel_operation_cpu_out(&input.buf, &output.buf);
});
char* encoded_jpeg_output_buffer = compress_jpeg(output.buf.host);
write_to_jpeg_file("output_image.jpg", encoded_jpeg_output_buffer);
}
This works just fine and gives me the output I expect. From what I understand, cpu_out makes the values in out available on the CPU memory, which is why I am able to access these values by accessing output.buf.host in main_file.cpp
Second approach:
The second thing I tried was to not do the copy to host from device in the Halide schedule by creating Func cpu_out, instead using copy_to_host function in main_file.cpp.
pixel_operation_gpu_out.cpp
#include "Halide.h"
#include <stdio.h>
using namespace Halide;
const int _number_of_channels = 4;
int main(int argc, char** argv)
{
ImageParam input8(UInt(8), 3);
input8
.set_stride(0, _number_of_channels) // stride in dimension 0 (x) is the number of channels (four)
.set_stride(2, 1); // stride in dimension 2 (c) is one
Var x("x"), y("y"), c("c");
// algorithm
Func input;
input(x, y, c) = cast<float>(input8(clamp(x, input8.left(), input8.right()),
clamp(y, input8.top(), input8.bottom()),
clamp(c, 0, _number_of_channels - 1))) / 255.0f;
Func pixel_operation;
// calculate the corresponding value for input(x, y, c) after doing a
// pixel-wise operation on each pixel. This gives us pixel_operation(x, y, c).
// This operation is not location dependent, eg: brighten
Func out;
out(x, y, c) = cast<uint8_t>(pixel_operation(x, y, c) * 255.0f + 0.5f);
out.output_buffer()
.set_stride(0, _number_of_channels)
.set_stride(2, 1);
input8.set_bounds(2, 0, _number_of_channels); // Dimension 2 (c) starts at 0 and has extent _number_of_channels.
out.output_buffer().set_bounds(2, 0, _number_of_channels);
// schedule
out.compute_root();
out.reorder(c, x, y)
.bound(c, 0, _number_of_channels)
.unroll(c);
// Schedule for GLSL
out.glsl(x, y, c);
Target target = get_target_from_environment();
target.set_feature(Target::OpenGL);
std::vector<Argument> args = {input8};
out.compile_to_file("pixel_operation_gpu_out", args, target);
return 0;
}
main_file.cpp
#include "pixel_operation_gpu_out.h"
#include "runtime/HalideRuntime.h"
int main()
{
char *encoded_jpeg_input_buffer = read_from_jpeg_file("input_image.jpg");
unsigned char *pixelsRGBA = decompress_jpeg(encoded_jpeg_input_buffer);
Image input(width, height, channels, sizeof(uint8_t), Image::Interleaved);
Image output(width, height, channels, sizeof(uint8_t), Image::Interleaved);
input.buf.host = &pixelsRGBA[0];
unsigned char *outputPixelsRGBA = (unsigned char *)malloc(sizeof(unsigned char) * width * height * channels);
output.buf.host = &outputPixelsRGBA[0];
double best = benchmark(100, 10, [&]() {
pixel_operation_gpu_out(&input.buf, &output.buf);
});
int status = halide_copy_to_host(NULL, &output.buf);
char* encoded_jpeg_output_buffer = compress_jpeg(output.buf.host);
write_to_jpeg_file("output_image.jpg", encoded_jpeg_output_buffer);
return 0;
}
So, now, what I think is happening is that pixel_operation_gpu_out is keeping output.buf on the GPU and when I do copy_to_host, that's when I get the memory copied over to the CPU. This program gives me the expected output as well.
Questions:
The second approach is much slower than the first approach. The slow part is not in the benchmarked part though. For example, for first approach, I get 17ms as benchmarked time for a 4k image. For the same image, in the second approach, I get the benchmarked time as 22us and the time taken for copy_to_host is 10s. I'm not sure if this behavior is expected since both approach 1 and 2 are essentially doing the same thing.
The next thing I tried was to use HalideRuntimeOpenGL.h and link textures to the input and output buffers so that I could draw directly to an OpenGL context from main_file.cpp instead of saving to a JPEG file. However, I could find no examples showing how to use the functions in HalideRuntimeOpenGL.h, and everything I tried on my own gave me runtime errors that I could not figure out how to solve. If anyone has any resources they can point me to, that would be great.
Also, any feedback on the code I have above are welcome too. I know it works and is doing what I want but it could be the completely wrong way of doing it and I wouldn't know any better.
Most likely the reason for the 10s to copy memory back is that the GPU API has queued all the kernel invocations and then waits for them to finish when halide_copy_to_host is called. You can call halide_device_sync inside the benchmark timing, after running all the compute calls, to get the compute time inside the loop without the copy-back time.
I cannot tell from the code how many times the kernel is being run. (My guess is 100, but it may be that those arguments to benchmark set up some sort of parameterization where it tries to run it as many times as needed to get significance. If so, that is a problem because the queuing call is really fast but the compute is of course async. If this is the case, you can do things like queue ten calls and then call halide_device_sync, and play with the number "10" to get a real picture of how long it takes.)

glReadPixels store x, y values

I'm trying to store pixel data by using glReadPixels, but so far I managed to only store it one pixel at a time. I'm not sure if this is the way to go. I currently have this:
unsigned char pixels[3];
glReadPixels(50,50, 1, 1, GL_RGB, GL_UNSIGNED_BYTE, pixels);
What would be a good way to store it in an array, so that I can get the values like this:
pixels[20][50][0]; // x=20 y=50 -> R value
pixels[20][50][1]; // x=20 y=50 -> G value
pixels[20][50][2]; // x=20 y=50 -> B value
I guess I could simply put it in a loop:
for ( all pixels on Y axis )
{
for ( all pixels in X axis )
{
unsigned char pixels[width][height][3];
glReadPixels(x,y, 1, 1, GL_RGB, GL_UNSIGNED_BYTE, pixels[x][y]);
}
}
But I have the feeling that there must be a much better way to do this. But I do however need my array to be like I described above the code. So would the for loop idea be good, or is there a better way?
glReadPixels simply returns bytes in the order R, G, B, R, G, B, ... (based on your setting of GL_RGB) from the bottom left of the screen going up to the top right. From the OpenGL documentation:
glReadPixels returns pixel data from the frame buffer, starting with
the pixel whose lower left corner is at location (x, y), into client
memory starting at location data. Several parameters control the
processing of the pixel data before it is placed into client memory.
These parameters are set with three commands: glPixelStore,
glPixelTransfer, and glPixelMap. This reference page describes the
effects on glReadPixels of most, but not all of the parameters
specified by these three commands.
The overhead of calling glReadPixels thousands of times will most likely take a noticeable amount of time (depends on the window size, I wouldn't be surprised if the loop took 1-2 seconds).
It is recommended that you only call glReadPixels once and store the result in a byte array of size (width - x) * (height - y) * 3. From there you can reference a pixel's component with data[(py * width + px) * 3 + component], where px and py are the pixel coordinates you want to look up and component is 0, 1, or 2 for the R, G, or B component of the pixel.
If you absolutely must have it in a 3-dimensional array, you can write some code to rearrange the 1d array after the glReadPixels call.
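One detail worth checking before relying on the index formula: glReadPixels pads each row to GL_PACK_ALIGNMENT bytes, which is 4 by default, so for RGB data the rows are only tightly packed when width * 3 is a multiple of 4 or the alignment has been set to 1 with glPixelStorei. The sketch below computes the index with that padding taken into account; rowStride and componentIndex are hypothetical helper names, and the code is plain C++ with no OpenGL calls.

```cpp
#include <cstddef>

// Bytes per row as written by glReadPixels for byte-sized components,
// rounded up to the pack alignment (GL_PACK_ALIGNMENT).
inline std::size_t rowStride(int width, int components, int packAlignment) {
    std::size_t raw = static_cast<std::size_t>(width) * components;
    std::size_t a = static_cast<std::size_t>(packAlignment);
    return (raw + a - 1) / a * a;
}

// Index of one component of pixel (px, py), where (0, 0) is the first pixel
// read, i.e. the lower-left corner of the requested rectangle.
inline std::size_t componentIndex(int px, int py, int component,
                                  int width, int components, int packAlignment) {
    return static_cast<std::size_t>(py) * rowStride(width, components, packAlignment)
         + static_cast<std::size_t>(px) * components
         + static_cast<std::size_t>(component);
}
```

With an alignment of 1 this reduces to the (py * width + px) * 3 + component formula given above.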
If you define the pixel array like this:
unsigned char pixels[MAX_Y][MAX_X][3];
And then access it like this:
pixels[y][x][0] = r;
pixels[y][x][1] = g;
pixels[y][x][2] = b;
Then you'll be able to read the pixels with one glReadPixels call (the arguments are x, y, width, height, and the origin is the lower-left corner):
glReadPixels(left, bottom, MAX_X, MAX_Y, GL_RGB, GL_UNSIGNED_BYTE, pixels);
What you can do is declare a simple one-dimensional array in a struct and use operator overloading for convenient subscript notation:
struct Pixel2d
{
static const int SIZE = 50;
unsigned char& operator()( int nCol, int nRow, int RGB)
{
// glReadPixels writes the data row by row, so the row index selects the stride
return pixels[ ( nRow * SIZE + nCol) * 3 + RGB];
}
unsigned char pixels[SIZE * SIZE * 3 ];
};
int main()
{
Pixel2d p2darray;
// Rows of 50 RGB pixels are 150 bytes, which is not a multiple of the default pack alignment of 4
glPixelStorei(GL_PACK_ALIGNMENT, 1);
glReadPixels(50, 50, Pixel2d::SIZE, Pixel2d::SIZE, GL_RGB, GL_UNSIGNED_BYTE, p2darray.pixels);
for( int i = 0; i < Pixel2d::SIZE ; ++i )
{
for( int j = 0; j < Pixel2d::SIZE ; ++j )
{
unsigned char rpixel = p2darray(i , j , 0);
unsigned char gpixel = p2darray(i , j , 1);
unsigned char bpixel = p2darray(i , j , 2);
}
}
}
Here you are reading a 50*50 block of pixels in one shot, and operator()( int nCol, int nRow, int RGB) provides the needed convenience. For performance reasons you don't want to make too many glReadPixels calls.