Halide: How to process image in (overlapping) blocks? - c++

I'm discovering Halide and have had some success with a pipeline doing various
transformations. Most of these are based on the examples within the sources (color-transformations, various filters, hist-eq).
My next step needs to process the image in blocks. In a more general form,
partially-overlapping blocks.
Examples
Input:
[ 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24,
25, 26, 27, 28, 29, 30, 31, 32]
Non-overlapping blocks:
Size: 2x4
[ 1, 2, 3, 4,
9, 10, 11, 12]
[ 5, 6, 7, 8,
13, 14, 15, 16]
[ 17, 18, 19, 20,
25, 26, 27, 28]
[ 21, 22, 23, 24,
29, 30, 31, 32]
Overlapping blocks:
Size: 2x4 with 50% overlap (both axes)
[ 1, 2, 3, 4,
9, 10, 11, 12]
[ 3, 4, 5, 6,
11, 12, 13, 14]
[ 5, 6, 7, 8,
13, 14, 15, 16]
-
[ 9, 10, 11, 12,
17, 18, 19, 20]
[11, 12, 13, 14,
19, 20, 21, 22]
...
I suspect there should be a nice way to express these, as those are also quite common
in many algorithms (e.g. macroblocks).
What I checked out
I tried to gather ideas from the tutorial and example apps and found the following,
which seem somewhat connected to what I want to implement:
Halide tutorial lesson 6: Realizing Funcs over arbitrary domains
// We start by creating an image that represents that rectangle
Image<int> shifted(5, 7); // In the constructor we tell it the size
shifted.set_min(100, 50); // Then we tell it the top-left corner
The problem I have: how to generalize this to multiple shifted domains without looping?
Halide tutorial lesson 9: Multi-pass Funcs, update definitions, and reductions
Here RDom is introduced which looks nice to create a block-view
Most examples using RDom seem to be sliding-window like approaches where there are no jumps
Target
So in general I'm asking how to implement a block-based view which can then be processed by
other steps.
It would be nice if the approach were general enough to handle both overlapping and non-overlapping blocks.
Somehow generating the top-left indices first?
In my case, the image dimensions are known at compile time, which simplifies this.
But I would still like a compact form which is nice to work with from Halide's perspective (no hand-coded stuff like those examples with small filter boxes).
The approach used may depend on the output per block, which is a scalar in my case.
Maybe someone can give me some ideas and/or some examples (which would be very helpful).
I'm sorry for not providing code, as I don't think I could produce anything helpful.
Edit: Solution
After dsharlet's answer and some minor debugging and discussion, the following very simplified, self-contained code works (assuming a 1-channel 64x128 input like the one I created).
#include "Halide.h"
#include "Halide/tools/halide_image_io.h"
#include <iostream>

int main(int argc, char **argv) {
    Halide::Buffer<uint8_t> input = Halide::Tools::load_image("TestImages/block_example.png");

    // This is a simple example assuming an input of 64x128
    std::cout << "dim 0: " << input.width() << std::endl;
    std::cout << "dim 1: " << input.height() << std::endl;

    // The "outer" (block) and "inner" (pixel) indices that describe a pixel in a tile.
    Halide::Var xo, yo, xi, yi, x, y;

    // The distance between the start of each tile in the input.
    int tile_stride_x = 32;
    int tile_stride_y = 64;
    int tile_size_x = 32;
    int tile_size_y = 64;

    Halide::Func tiled_f;
    tiled_f(xi, yi, xo, yo) = input(xo * tile_stride_x + xi, yo * tile_stride_y + yi);

    Halide::RDom tile_dom(0, tile_size_x, 0, tile_size_y);
    Halide::Func tile_means;
    tile_means(xo, yo) = sum(Halide::cast<uint32_t>(tiled_f(tile_dom.x, tile_dom.y, xo, yo))) / (tile_size_x * tile_size_y);

    Halide::Func output;
    output(xo, yo) = Halide::cast<uint8_t>(tile_means(xo, yo));

    Halide::Buffer<uint8_t> output_(2, 2);
    output.realize(output_);

    Halide::Tools::save_image(output_, "block_based_stuff.png");
}

Here's an example that breaks a Func into blocks of arbitrary stride and size:
Func f = ... // The thing being blocked
// The "outer" (block) and "inner" (pixel) indices that describe a pixel in a tile.
Var xo, yo, xi, yi;
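// The distance between the start of each tile in the input, and the size of each tile.
// Set these to the values you need; they are left unset here.
int tile_stride_x, tile_stride_y;
int tile_size_x, tile_size_y;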
Func tiled_f;
tiled_f(xi, yi, xo, yo) = f(xo * tile_stride_x + xi, yo * tile_stride_y + yi);
Func tiled_output;
tiled_output(xi, yi, xo, yo) = ... // Your tiled processing here
To compute some reduction (like statistics) on each block, you can do the following:
RDom tile_dom(0, tile_size_x, 0, tile_size_y);
Func tile_means;
tile_means(xo, yo) = sum(tiled_output(tile_dom.x, tile_dom.y, xo, yo)) / (tile_size_x * tile_size_y);
To flatten the tiles back into a result is a bit tricky. It probably depends on your method of combining the results in overlapped areas. If you want to add up the overlapping tiles, the simplest way is probably to use an RDom:
RDom tiles_dom(
0, tile_size_x,
0, tile_size_y,
min_tile_xo, extent_tile_xo,
min_tile_yo, extent_tile_yo);
Func output;
Expr output_x = tiles_dom[2] * tile_stride_x + tiles_dom[0];
Expr output_y = tiles_dom[3] * tile_stride_y + tiles_dom[1];
Var x, y;
output(x, y) = 0;
output(output_x, output_y) += tiled_output(tiles_dom[0], tiles_dom[1], tiles_dom[2], tiles_dom[3]);
Note that in the above two blocks of code, tile_stride_x and tile_size_x are independent parameters, allowing for any tile size and overlap.
In both of your examples, tile_size_x = 4, and tile_size_y = 2. To get non-overlapping tiles, set the tile strides equal to the tile size. To get 50% overlapping tiles, set tile_stride_x = 2, and tile_stride_y = 1.
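The index mapping `xo * tile_stride_x + xi` can be checked outside Halide with a few lines of plain C++. This is only a sketch for building intuition, not part of the Halide pipeline: the `Tiling` struct and the helpers `extract_tile` and `example_image` are hypothetical names introduced here, and the image is assumed to be stored row-major.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: tile size and stride, both in pixels.
struct Tiling {
    int tile_size_x, tile_size_y;
    int tile_stride_x, tile_stride_y;
};

// Copy tile (xo, yo) out of a row-major image.
// Pixel (xi, yi) of the tile is image pixel (xo * stride_x + xi, yo * stride_y + yi).
std::vector<int> extract_tile(const std::vector<int>& image, int width, int height,
                              const Tiling& t, int xo, int yo) {
    const int x0 = xo * t.tile_stride_x;
    const int y0 = yo * t.tile_stride_y;
    // The tile must lie completely inside the image.
    assert(x0 >= 0 && y0 >= 0);
    assert(x0 + t.tile_size_x <= width && y0 + t.tile_size_y <= height);

    std::vector<int> tile;
    for (int yi = 0; yi < t.tile_size_y; ++yi) {
        for (int xi = 0; xi < t.tile_size_x; ++xi) {
            tile.push_back(image[static_cast<std::size_t>((y0 + yi) * width + (x0 + xi))]);
        }
    }
    return tile;
}

// The input from the question: 8 columns, 4 rows, values 1..32.
std::vector<int> example_image() {
    std::vector<int> image(8 * 4);
    for (std::size_t i = 0; i < image.size(); ++i) {
        image[i] = static_cast<int>(i) + 1;
    }
    return image;
}
```

With `Tiling{4, 2, 2, 1}` (the 50% overlap case), tile (xo = 1, yo = 0) of the example image is 3, 4, 5, 6, 11, 12, 13, 14, which matches the second overlapping block in the question.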
A useful schedule for an algorithm like this is:
// Compute tiles as needed by the output.
tiled_output.compute_at(output, tiles_dom[2]);
// or
tiled_output.compute_at(tile_means, xo);
There are other options, like using a pure func (no update/RDom) that uses the mod operator to figure out tile inner and outer indices. However, this approach can be difficult to schedule efficiently with overlapping tiles (depending on the processing you do at each tile). I use the RDom approach when this problem comes up.
Note that with the RDom approach, you have to supply the bounds of the tile indices you want computed (min_tile_xo, extent_tile_xo, ...), which can be tricky for overlapped tiles.
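One possible way to pick the tile-index extents, assuming the tile indices start at 0 and every tile must lie fully inside the image, is to count how many tile start positions fit along each axis. The helper name `tile_extent` is hypothetical; the result would be used for `extent_tile_xo` and `extent_tile_yo` with `min_tile_xo = min_tile_yo = 0`.

```cpp
#include <cassert>

// Number of tiles of size tile_size, spaced tile_stride apart, that fit
// entirely inside an axis of length image_extent (first tile starts at 0).
int tile_extent(int image_extent, int tile_size, int tile_stride) {
    assert(tile_size > 0 && tile_stride > 0);
    if (image_extent < tile_size) {
        return 0;  // not even one full tile fits
    }
    return (image_extent - tile_size) / tile_stride + 1;
}
```

For the 8x4 example from the question with 4x2 tiles, this gives 2 x 2 tiles without overlap (strides 4 and 2) and 3 x 3 tiles with 50% overlap (strides 2 and 1). If partial tiles at the border are wanted, the input needs a boundary condition and the count changes accordingly.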

Related

Controlling Mutation in 39bit string as a candidate solution in genetic algorithm

I am working on an optimization problem. I have X number of ambulance locations, where X ranges from 1-39.
There are 39 numbers [Ambulance Locations] to choose from (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39); we choose 3 of them since I have 3 ambulances.
I can only put my ambulance in three locations among 1-39 locations (Restriction). Assume that I want to put my Ambulance on the 5th, 19th, and 31 positions. -- Chromosome 1= [000010000000000000100000000000100000000]. In the above presentation, I am turning on 5-bit, 19-bit, and 31-bit.
Is it possible to flip a bit close to the original solution? For example, keeping 2 bits on in their original positions and randomly moving the 3rd bit to a position close to those 2 bits. It is important for me to keep exactly 3 bits on among the 39 bits. I want a controlled mutation that produces a small change.
My goal is to make small changes, since each bit represents a location. The purpose of the mutation is to make small changes and evaluate the results. Therefore, the code should do something like this. For CS1: (111000000000000000000000000000000000000), I want something like (011010000000000000000000000000000000000), or (011001000000000000000000000000000000000), or (010110000000000000000000000000000000000) or (101010000000000000000000000000000000000), or (00101100000000000000000000000000000000), etc.
To achieve this mutation, what is a good way to randomly move the currently set positions to other positions while staying within locations 1-39 (Restriction)?
You could use numpy and do something like:
import numpy

s = "1110000000000000000000000000"

def mutate(s):
    arr = numpy.array(list(s))
    mask = arr == "1"
    indices_of_ones = numpy.argwhere(mask).flatten()
    # pick one of the '1' bits to move
    pick_one_1_index = numpy.random.choice(indices_of_ones)
    potential_swaps = numpy.argwhere(~mask).flatten()
    distances = numpy.abs(pick_one_1_index - potential_swaps)
    probabilities = 1 / distances  # the smaller the distance from the original position, the higher the probability
    # probabilities = 1 / distances**2  # favours positions near the original even more strongly
    pick_one_0_index = numpy.random.choice(potential_swaps, p=probabilities / probabilities.sum())
    arr[pick_one_1_index] = '0'
    arr[pick_one_0_index] = '1'
    return "".join(arr)
There is likely a more optimal solution.
Alternatively, you can apply a power to the distances to penalize distance more strongly.
If you want to test different powers for the probabilities, you could use something like:
def score_solution(s1, s2):
    ix1 = set([i for i, v in enumerate(s1) if v == "1"])
    ix2 = set([i for i, v in enumerate(s2) if v == "1"])
    # exactly one bit moved, so the symmetric difference has two elements
    a, b = ix1 ^ ix2
    return numpy.abs(a - b)

def get_solution_score_quantiles(sample_size=100, quantiles=[0.25, 0.5, 0.75]):
    scores = []
    for i in range(sample_size):
        s1 = mutate(s)
        scores.append(score_solution(s, s1))
    return numpy.quantile(scores, quantiles)

print(get_solution_score_quantiles(50))

Determining custom Yolov4 output layer shape for 2 classes

I've recently trained a darknet yolov4 model to detect 2 objects, converted it to tensorflow and then onnx using the following tutorial.
https://github.com/onnx/models/blob/master/vision/object_detection_segmentation/yolov4/dependencies/Conversion.ipynb
I ended up with a model with the following input and output layer dimensions
How can I determine the shape of the three output layers that have the unknown numbers?
I need them so I can use the model in ML.Net.
This is easy:
unk__2241 is the batch size, so it is unknown for now.
After CSPDarknet53, your output shape is (unk__2241, 13, 13, 512), as the reduction factor is 32. Then after SPP you have (unk__2241, 13, 13, 2048), with kernel sizes [1, 5, 9, 13].
You have 3 heads for detecting large, medium and small objects in the image, and 3 anchors for each object size. These 3 heads use 3 feature maps that come from the modified PANet. The feature maps are (unk__2241, 52, 52, 256), (unk__2241, 26, 26, 512) and (unk__2241, 13, 13, 1024), as the reduction factors are 8, 16 and 32 (these grid sizes correspond to a 416x416 input).
Then, before each yolo layer, there is a convolution that produces the final feature map. Its output shapes are (unk__2241, 52, 52, 21), (unk__2241, 26, 26, 21) and (unk__2241, 13, 13, 21), and the kernel shape is (input channels, (number of classes + bbox coords + confidence) * number of anchors per size, 1, 1) -> (256, 21, 1, 1), (512, 21, 1, 1), (1024, 21, 1, 1).
In the yolo head you will have (unk__2241, 52, 52, 3, 7), (unk__2241, 26, 26, 3, 7), (unk__2241, 13, 13, 3, 7), where the yolo head splits the last axis of its input into the number of anchors and (number of classes + bbox coords + confidence).
As a result:
unk__2241 = unk__2242 = unk__2245 = unk__2248 -> They are all the batch size.
YOLO head output shapes: (unk__2242, 52, 52, 3, 7), (unk__2245, 26, 26, 3, 7), (unk__2248, 13, 13, 3, 7).
The batch size can be anything, up to the size of the whole dataset.
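The arithmetic behind these shapes can be sketched as a small function. This is an illustration only: `yolo_head_shape` is a hypothetical helper, and the 416x416 input size is an assumption inferred from the 13x13 grid at reduction factor 32. The batch dimension is left out because it stays dynamic.

```cpp
#include <array>
#include <cassert>

// Shape of one YOLO head output without the batch dimension:
// (grid, grid, anchors per scale, 4 bbox coords + 1 confidence + num_classes).
std::array<int, 4> yolo_head_shape(int input_size, int stride,
                                   int num_classes, int anchors_per_scale = 3) {
    // The input size is assumed to be a multiple of the reduction factor.
    assert(stride > 0 && input_size % stride == 0);
    const int grid = input_size / stride;
    return {grid, grid, anchors_per_scale, 4 + 1 + num_classes};
}
```

For 2 classes and strides 8, 16 and 32 this yields (52, 52, 3, 7), (26, 26, 3, 7) and (13, 13, 3, 7). A different input resolution changes only the grid sizes.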

Copy a chunk of one tensor into another one in C++ API

I need to copy a row of one tensor (in the C++ API) into some part of another tensor, for which the begin and end indexes are available. Within C++ we can use something like:
int myints[] = {10, 20, 30, 40, 50, 60, 70};
std::vector<int> myvector(18);
std::copy(myints, myints + 3, myvector.begin() + 4);
to copy three values from myints into myvector, starting at the fourth index.
I was wondering if there is a similar API in libtorch (i.e., C++)?
The C++ API provides the Python slice equivalent function
at::Tensor at::Tensor::slice(int64_t dim, int64_t start, int64_t end, int64_t step);
You can thus do something like:
auto myints = torch::tensor({10, 20, 30, 40, 50, 60, 70});
auto myvector = torch::ones({18});
// copy the first 3 values of myints into positions 4, 5 and 6 of myvector
myvector.slice(0, 4, 7) = myints.slice(0, 0, 3);
In your case, dim=0 selects the first dimension.
Pytorch 1.5 using Tensor::index and Tensor::index_put_
using namespace torch::indexing;
auto myints = torch::tensor({10, 20, 30, 40, 50, 60, 70});
auto myvector = torch::ones({18});
myvector.index_put_({Slice(4, 7)}, myints.index({Slice(0, 3)}));
General translation for Tensor::index and Tensor::index_put_
Python             C++ (assuming `using namespace torch::indexing`)
-------------------------------------------------------------------
0                  0
None               None
...                "..." or Ellipsis
:                  Slice()
start:stop:step    Slice(start, stop, step)
True / False       true / false
[[1, 2]]           torch::tensor({{1, 2}})
Pytorch 1.4 alternative functions
Tensor Tensor::narrow(int64_t dim, int64_t start, int64_t length)
Tensor & Tensor::copy_(const Tensor & src, bool non_blocking=false)
narrow is almost exactly like slice (it takes a length instead of an end index), and copy_ performs the assignment:
auto myints = torch::tensor({10, 20, 30, 40, 50, 60, 70});
auto myvector = torch::ones({18});
myvector.narrow(0, 4, 3).copy_(myints.narrow(0, 0, 3));

Efficient Eigen Matrix SubIndexing + Concatenation

I'm using Eigen for easy optimization of some of my matrix math. I'm currently trying to make the following operation more efficient:
Given Matrix A:
1, 2, 3
4, 5, 6
Matrix B:
7, 11, 13, 19, 26, 7, 11
8, 9, 15, 6, 8, 4, 1
and "index map" column vector IM:
0, 1, 3, 6
I'd like to append the columns of Matrix B mapping to the indexes in IM, to Matrix A as such:
1, 2, 3, 7, 11, 19, 11
4, 5, 6, 8, 9, 6, 1
I'm currently able to do this with a massive for loop, but this is the bottleneck in my code and I'd like to avoid this:
#pragma unroll
for (int i = 0; i < 25088; i++) {
block.noalias() += _features.col(ff[i]);
}
I've seen the discussion here and pored over the docs but can't seem to figure out the right syntax relating to Eigen matrices: http://eigen.tuxfamily.org/bz/show_bug.cgi?id=329
Any thoughts/tips would be much appreciated!

How to detect a shake gesture, given a list of touch points?

Detecting a shake gesture, from a collection of points, is basically looking for three changes in direction:
Example: (We need to look only at x-coordinates, as we are looking only for horizontal shakes, not vertical shakes)
1,2,3,4,5,6,7,8,[9],8,7,[6],[7]
In the above sequence of x-coords, I have marked the changes in direction with [].
The problem is, in the above case, we would detect even tiny unintentional shakes - for example, if you ask a person to drag his finger from the bottom of the screen to the top in a straight line, his hand may move a little left and right unintentionally, and we would regard this as a "shake"
Example:
1,2,[3],[2],[3].... (unintentional shake)
To avoid this, we need some kind of threshold, only above which we regard the movement as a shake. For example, the gap between changes in direction should be at least 3 points, and the difference in value should be at least 4.
So we should have something like:
1,2,3,4,5,6,7,8,[9],8,7,6,[5],6,7,8,[9]..... detected shake
1,2,3,4,5,6,7,8,[9],8,7,6,6,7,8,9..... ignored shake
1,2,3,2,1.... ignored shake...
This seems tricky to implement, as one would probably have to keep track of three indices. Rather than implement this myself, I was wondering if this is a known algorithm with a solution that I can look up ?
Since the derivative describes how a function changes, you can use the derivative to solve this problem easily.
Let us take the first example:
1, 2, 3, 4, 5, 6, 7, 8, [9], 8, 7, [6], [7]
By finding the derivative of this sequence:
1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, 1
+ + + + + + + + - - - +
Now it is easy to see where the shakes happened: wherever the sign of the derivative changes.
Another example:
1, 12, 15, 8, 3, 1, 0, 5, 17, 30
1st derivative:
11, 3, -7, -5, -2, -1, 5, 12, 13
+ + - - - - + + +
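Simple implementation (untested, not optimized):
#include <cstddef>
#include <vector>

// True when x and y have the same sign (zero counts as positive).
// From http://stackoverflow.com/a/67020/4523099
template <typename T>
bool same_sign(T x, T y) {
    return (x >= 0) ^ (y < 0);
}

// First difference of the sequence: result[i] = vec_x[i + 1] - vec_x[i].
template <typename T>
std::vector<T> get_derivative(std::vector<T> vec_x) {
    if (vec_x.empty()) return vec_x;  // avoids unsigned underflow in size() - 1
    for (std::size_t i = 0; i + 1 < vec_x.size(); ++i) {
        vec_x[i] = vec_x[i + 1] - vec_x[i];
    }
    vec_x.pop_back();
    return vec_x;
}

int main() {
    std::vector<int> x{1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 7, 6, 7};
    auto first_derivative = get_derivative(x);
    // Index i is stored when the derivative changes sign between i and i + 1,
    // which means x[i + 1] is a turning point.
    std::vector<std::size_t> indices_of_shakes;
    for (std::size_t i = 0; i + 1 < first_derivative.size(); ++i) {
        if (!same_sign(first_derivative[i], first_derivative[i + 1])) {
            indices_of_shakes.emplace_back(i);
        }
    }
}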
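The sign-change test above reports every direction change, including the tiny unintentional ones the question wants to ignore. One possible way to add the thresholds from the question is sketched below. It is an assumption-laden illustration, not a known standard algorithm: the function name `count_significant_turns` and both parameters are hypothetical, plateaus are treated as "no change in direction", and small jitter inside a large swing is not merged, so it splits that swing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Count turning points whose incoming and outgoing swings both span at least
// min_gap samples and change the value by at least min_delta.
int count_significant_turns(const std::vector<int>& xs, int min_delta, std::size_t min_gap) {
    assert(min_delta >= 0);
    if (xs.size() < 3) return 0;

    // Stage 1: collect the start point, every turning point, and the end point.
    std::vector<std::size_t> points{0};
    int direction = 0;  // sign of the last non-zero difference seen
    for (std::size_t i = 0; i + 1 < xs.size(); ++i) {
        const int diff = xs[i + 1] - xs[i];
        if (diff == 0) continue;  // a plateau does not change direction
        const int sign = diff > 0 ? 1 : -1;
        if (direction != 0 && sign != direction) points.push_back(i);
        direction = sign;
    }
    points.push_back(xs.size() - 1);

    // Stage 2: keep only turning points with a large enough swing on both sides.
    auto big_enough = [&](std::size_t from, std::size_t to) {
        return to - from >= min_gap && std::abs(xs[to] - xs[from]) >= min_delta;
    };
    int turns = 0;
    for (std::size_t k = 1; k + 1 < points.size(); ++k) {
        if (big_enough(points[k - 1], points[k]) && big_enough(points[k], points[k + 1])) {
            ++turns;
        }
    }
    return turns;
}
```

The caller then compares the count against the number of direction changes that should qualify as a shake (three in the question). The final sample is treated as the end of the data rather than as a turning point, so a trailing peak is only counted once the movement continues past it.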