bitmap row size calculation - c++

I found this source, which works quite well. I just want to ask about this piece of code, which I don't get:
//calculate total size of RGBQUAD scanlines (DWORD aligned)
bih.biSizeImage = (((bih.biWidth * 3) + 3) & 0xFFFC) * bih.biHeight ;
I get why there is "*3", but I don't get the "+3" and the bitwise AND with 0xFFFC. Could someone explain why the author calculates the size of the image this way?
Thanks

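If you try that out for various values, you'll see it forces (width * 3) to round up to the smallest multiple of 4 that will contain it: adding 3 pushes any value that is not already a multiple of 4 past the next one, and the AND with 0xFFFC then clears the two lowest bits. The author is doing this because each scanline has to be DWORD (32-bit) aligned, as the comment in the code says. Note that 0xFFFC also clears every bit above bit 15, so the expression only works for rows shorter than 65536 bytes; masking with ~3 avoids that limit.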
Using python:
>>> f = lambda x: ((x * 3) + 3) & 0xFFFC
>>> [f(x) for x in range(1, 20)]
[4, 8, 12, 12, 16, 20, 24, 24, 28, 32, 36, 36, 40, 44, 48, 48, 52, 56, 60]
The following shows the difference between just doing the straight multiplication and rounding upwards towards a multiple of 4
>>> [(3*x, f(x)) for x in range(1, 8)]
[(3, 4), (6, 8), (9, 12), (12, 12), (15, 16), (18, 20), (21, 24)]
I'm surprised the code doesn't actually document this fact. Bit twiddling is a wonderful thing, but it can seem very arbitrary.
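For reference, the same rounding can be written without the 16-bit mask. The sketch below stays in Python like the session above; `row_stride` and `masked_stride` are hypothetical helper names, and 3 bytes per pixel is assumed as in the question.

```python
def row_stride(width, bytes_per_pixel=3):
    """Bytes per scanline, rounded up to the next multiple of 4 (DWORD aligned)."""
    raw = width * bytes_per_pixel
    # Adding 3 and clearing the two lowest bits rounds up to a multiple of 4.
    # ~3 keeps all higher bits, unlike the 0xFFFC mask in the question.
    return (raw + 3) & ~3


def masked_stride(width):
    """The expression from the question, kept for comparison."""
    return ((width * 3) + 3) & 0xFFFC


if __name__ == "__main__":
    print([row_stride(w) for w in range(1, 8)])     # [4, 8, 12, 12, 16, 20, 24]
    print(row_stride(21846), masked_stride(21846))  # 65540 4
```

The two functions agree for every row below 65536 bytes; the last line shows where the 0xFFFC mask starts to truncate.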

Related

Determining custom Yolov4 output layer shape for 2 classes

I've recently trained a darknet yolov4 model to detect 2 objects, converted it to tensorflow and then onnx using the following tutorial.
https://github.com/onnx/models/blob/master/vision/object_detection_segmentation/yolov4/dependencies/Conversion.ipynb
I ended up with a model with the following input and output layer dimensions
How can I determine the shape of the three output layers that have the unknown numbers?
I need them so I can use the model in ML.Net.
This is easy:
unk__2241 is the batch size, so it's unknown for now.
After CSPDarknet53, your output shape is (unk__2241, 13, 13, 512), as the reduction factor is 32. Then after SPP you have (unk__2241, 13, 13, 2048) with kernel sizes [1, 5, 9, 13].
You have 3 heads for detecting large, medium, and small objects in the image, and 3 anchors for each object size. These 3 heads use 3 feature maps that come from the modified PANet: (unk__2241, 52, 52, 256), (unk__2241, 26, 26, 512), and (unk__2241, 13, 13, 1024), as the reduction factors are 8, 16, and 32.
Then, before each YOLO layer, there is a convolution that produces the final feature map. Its output shapes are (unk__2241, 52, 52, 21), (unk__2241, 26, 26, 21), and (unk__2241, 13, 13, 21), and the kernel shape is (input channels, (number of classes + bbox coordinates + confidence) * number of anchors per size, 1, 1). With 2 classes that is (2 + 4 + 1) * 3 = 21 output channels: (256, 21, 1, 1), (512, 21, 1, 1), (1024, 21, 1, 1).
In the YOLO head you will have (unk__2241, 52, 52, 3, 7), (unk__2241, 26, 26, 3, 7), and (unk__2241, 13, 13, 3, 7), because the head splits the last axis of its input into the number of anchors (3) and the number of classes + bbox coordinates + confidence (7).
As a result:
unk__2241 = unk__2242 = unk__2245 = unk__2248 -> They are all batch sizes.
YOLO head output shapes: (unk__2242, 52, 52, 3, 7), (unk__2245, 26, 26, 3, 7), (unk__2248, 13, 13, 3, 7).
The batch size can be anything up to the size of the whole dataset.
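The arithmetic above can be checked with a small Python sketch. `yolo_head_shapes` is a hypothetical helper, not part of any YOLO or ONNX library, and the 416x416 input size is an assumption inferred from the 13x13 grid at reduction factor 32. The batch dimension is left as None because it is dynamic.

```python
def yolo_head_shapes(input_size=416, num_classes=2, anchors_per_scale=3,
                     strides=(8, 16, 32)):
    """Return the expected YOLO head output shapes, one per detection scale."""
    # Each anchor predicts the class scores, 4 bbox coordinates and 1 confidence.
    per_anchor = num_classes + 4 + 1
    shapes = []
    for stride in strides:
        grid = input_size // stride  # spatial size after downsampling
        shapes.append((None, grid, grid, anchors_per_scale, per_anchor))
    return shapes


if __name__ == "__main__":
    for shape in yolo_head_shapes():
        print(shape)
    # (None, 52, 52, 3, 7)
    # (None, 26, 26, 3, 7)
    # (None, 13, 13, 3, 7)
```

If the model was exported with a different input resolution, only the grid sizes change; the last two axes depend solely on the anchor and class counts.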

Halide: How to process image in (overlapping) blocks?

I'm discovering Halide and got some success with a pipeline doing various
transformations. Most of these are based on the examples within the sources (color-transformations, various filters, hist-eq).
My next step needs to process the image in blocks. In a more general form,
partially-overlapping blocks.
Examples
Input:
[ 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24,
25, 26, 27, 28, 29, 30, 31, 32]
Non-overlapping blocks:
Size: 2x4
[ 1, 2, 3, 4,
9, 10, 11, 12]
[ 5, 6, 7, 8,
13, 14, 15, 16]
[ 17, 18, 19, 20,
25, 26, 27, 28]
[ 21, 22, 23, 24,
29, 30, 31, 32]
Overlapping blocks:
Size: 2x4 with 50% overlap (both axes)
[ 1, 2, 3, 4,
9, 10, 11, 12]
[ 3, 4, 5, 6,
11, 12, 13, 14]
[ 5, 6, 7, 8,
13, 14, 15, 16]
-
[ 9, 10, 11, 12,
17, 18, 19, 20]
[11, 12, 13, 14,
19, 20, 21, 22]
...
I suspect there should be a nice way to express these, as those are also quite common
in many algorithms (e.g. macroblocks).
What i checked out
I tried to gather ideas from the tutorial and example apps and found the following,
which seem somewhat connected to what i want to implement:
Halide tutorial lesson 6: Realizing Funcs over arbitrary domains
// We start by creating an image that represents that rectangle
Image<int> shifted(5, 7); // In the constructor we tell it the size
shifted.set_min(100, 50); // Then we tell it the top-left corner
The problem i have: how to generalize this to multiple shifted domains without looping?
Halide tutorial lesson 9: Multi-pass Funcs, update definitions, and reductions
Here RDom is introduced which looks nice to create a block-view
Most examples using RDom seem to be sliding-window like approaches where there are no jumps
Target
So in general i'm asking how to implement a block-based view which can then be processed by
other steps.
It would be nice if the approach will be general enough to realize both, overlapping & no overlapping
Somehow generating the top-left indices first?
In my case, the image-dimension is known at compile-time which simplifies this
But i still would like some compact form which is nice to work with from Halide's perspective (no handcoded stuff like those examples with small filter-boxes)
The approach used might be depending on the output per block, which is a scalar in my case
Maybe someone can give me some ideas and/or some examples (which would be very helpful).
I'm sorry for not providing code, as i don't think i could produce anything helpful.
Edit: Solution
After dsharlet's answer and a little debugging/discussion, the following very simplified, self-contained code works (assuming a 1-channel 64x128 input).
#include "Halide.h"
#include "Halide/tools/halide_image_io.h"
#include <iostream>
int main(int argc, char **argv) {
Halide::Buffer<uint8_t> input = Halide::Tools::load_image("TestImages/block_example.png");
// This is a simple example assuming an input of 64x128
std::cout << "dim 0: " << input.width() << std::endl;
std::cout << "dim 1: " << input.height() << std::endl;
// The "outer" (block) and "inner" (pixel) indices that describe a pixel in a tile.
Halide::Var xo, yo, xi, yi, x, y;
// The distance between the start of each tile in the input.
int tile_stride_x = 32;
int tile_stride_y = 64;
int tile_size_x = 32;
int tile_size_y = 64;
Halide::Func tiled_f;
tiled_f(xi, yi, xo, yo) = input(xo * tile_stride_x + xi, yo * tile_stride_y + yi);
Halide::RDom tile_dom(0, tile_size_x, 0, tile_size_y);
Halide::Func tile_means;
tile_means(xo, yo) = sum(Halide::cast<uint32_t>(tiled_f(tile_dom.x, tile_dom.y, xo, yo))) / (tile_size_x * tile_size_y);
Halide::Func output;
output(xo, yo) = Halide::cast<uint8_t>(tile_means(xo, yo));
Halide::Buffer<uint8_t> output_(2, 2);
output.realize(output_);
Halide::Tools::save_image(output_, "block_based_stuff.png");
}
Here's an example that breaks a Func into blocks of arbitrary stride and size:
Func f = ... // The thing being blocked
// The "outer" (block) and "inner" (pixel) indices that describe a pixel in a tile.
Var xo, yo, xi, yi;
// The distance between the start of each tile in the input.
int tile_stride_x, tile_stride_y;
Func tiled_f;
tiled_f(xi, yi, xo, yo) = f(xo * tile_stride_x + xi, yo * tile_stride_y + yi);
Func tiled_output;
tiled_output(xi, yi, xo, yo) = ... // Your tiled processing here
To compute some reduction (like statistics) on each block, you can do the following:
RDom tile_dom(0, tile_size_x, 0, tile_size_y);
Func tile_means;
tile_means(xo, yo) = sum(tiled_output(tile_dom.x, tile_dom.y, xo, yo)) / (tile_size_x * tile_size_y);
To flatten the tiles back into a result is a bit tricky. It probably depends on your method of combining the results in overlapped areas. If you want to add up the overlapping tiles, the simplest way is probably to use an RDom:
RDom tiles_dom(
0, tile_size_x,
0, tile_size_y,
min_tile_xo, extent_tile_xo,
min_tile_yo, extent_tile_yo);
Var x, y;
Func output;
Expr output_x = tiles_dom[2] * tile_stride_x + tiles_dom[0];
Expr output_y = tiles_dom[3] * tile_stride_y + tiles_dom[1];
output(x, y) = 0;
output(output_x, output_y) += tiled_output(tiles_dom[0], tiles_dom[1], tiles_dom[2], tiles_dom[3]);
Note that in the above two blocks of code, tile_stride_x and tile_size_x are independent parameters, allowing for any tile size and overlap.
In both of your examples, tile_size_x = 4, and tile_size_y = 2. To get non-overlapping tiles, set the tile strides equal to the tile size. To get 50% overlapping tiles, set tile_stride_x = 2, and tile_stride_y = 1.
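To see that these stride settings reproduce the blocks listed in the question, here is a plain-Python sketch of the same index arithmetic. It is not Halide code; `extract_tiles` is a hypothetical helper that only mirrors the mapping input(xo * tile_stride_x + xi, yo * tile_stride_y + yi), and it assumes every tile must fit entirely inside the image.

```python
def extract_tiles(image, size_x, size_y, stride_x, stride_y):
    """Map (xo, yo) tile indices to tiles cut from a row-major 2D list."""
    height, width = len(image), len(image[0])
    tiles = {}
    # Only keep tiles that lie fully inside the image.
    for yo in range((height - size_y) // stride_y + 1):
        for xo in range((width - size_x) // stride_x + 1):
            tiles[(xo, yo)] = [
                [image[yo * stride_y + yi][xo * stride_x + xi]
                 for xi in range(size_x)]
                for yi in range(size_y)
            ]
    return tiles


if __name__ == "__main__":
    # The 4-row, 8-column input from the question (values 1..32).
    image = [[row * 8 + col + 1 for col in range(8)] for row in range(4)]

    plain = extract_tiles(image, 4, 2, stride_x=4, stride_y=2)
    print(len(plain), plain[(1, 0)])      # 4 [[5, 6, 7, 8], [13, 14, 15, 16]]

    overlapped = extract_tiles(image, 4, 2, stride_x=2, stride_y=1)
    print(len(overlapped), overlapped[(1, 0)])  # 9 [[3, 4, 5, 6], [11, 12, 13, 14]]
```

The loop bounds here play the role of min_tile_xo/extent_tile_xo in the RDom version: they are what you would have to supply by hand for overlapped tiles.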
A useful schedule for an algorithm like this is:
// Compute tiles as needed by the output.
tiled_output.compute_at(output, tiles_dom[2]);
// or
tiled_output.compute_at(tile_means, xo);
There are other options, like using a pure func (no update/RDom) that uses the mod operator to figure out tile inner and outer indices. However, this approach can be difficult to schedule efficiently with overlapping tiles (depending on the processing you do at each tile). I use the RDom approach when this problem comes up.
Note that with the RDom approach, you have to supply the bounds of the tile indices you want computed (min_tile_xo, extent_tile_xo, ...), which can be tricky for overlapped tiles.

C++ Pointers Tricky Summation Check

This is my first post, but I've frequented SO for help plenty of times. Still, please bear with me if I am not following proper etiquette. So to be brief, I'm working on a tricky problem involving pointers. I'll copy and paste it here because I'm worried I'd otherwise leave out important info:
Given a PHB list (1000 elements--ints from 1-100) of in_len (the pointer towards the last element) elements without any duplicates and pointed by in_list, the function will:
(1) determine if it is possible to generate the sum of 31 within 5 consecutive elements;
(2) if it is possible, return one list pointed by out_list that consists of positions of the list whose associated values add up to the number 31;
(3) if it is possible, return the list length pointed by out_len.
Note that each element can only be used once. In addition, you need to return only one list if multiple cases exist. For example, if a PHB list is {9, 5, 16, 22, 1, 37, 26, 14}, then the returning value of the function is true, and out_list contains {9, 5, 16, 1} and its length is 4, or it contains {9, 22} and its length is 2. Note that the list {16, 1, 14} is not qualified since the set of three values spans more than 5 elements. As another example, if a PHB list is {9, 5, 34}, it returns false. out_list will be empty with a length of 0.
That's it. At first, I attempted hard coding in tons of options (along the lines of
if *in_list, *(in_list + 1), *(in_list + 2), *(in_list + 3), *(in_list+4)
add up to 31,
//not actual code-just odd formatting
or *in_list, *(in_list + 1), *(in_list + 2), *(in_list + 3), add up to 31
or *in_list, *(in_list + 1), *(in_list + 2), *(in_list + 4), add up to 31
or *in_list, *(in_list + 1), *(in_list + 3), *(in_list + 4), add to 31
or *in_list, *(in_list + 2), *(in_list + 3), *(in_list + 4) add to 31
or *in_list, *(in_list + 1), *(in_list + 2), add up to 31
etc.,
but quickly realized I just don't know what I'm doing here. Fortunately, the out_list only needs one list that adds up to 31. I'd appreciate any sort of help.
Thanks for your time!
The Solution (thank you all for your advice):
http://pastebin.com/2sSShWty
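The linked solution is not reproduced here. As an illustration of the idea only, the hard-coded cases above can be replaced by a brute-force search over every subset of each window of 5 consecutive elements. The sketch is in Python rather than C++ pointers; `find_sum_positions` is a hypothetical name, and it returns positions (indices), which is one reading of the problem statement.

```python
from itertools import combinations


def find_sum_positions(values, target=31, window=5):
    """Return positions within one window whose values sum to target, else None."""
    n = len(values)
    # Slide a window of `window` consecutive positions over the list.
    # A list shorter than the window is treated as a single window.
    for start in range(max(1, n - window + 1)):
        positions = range(start, min(start + window, n))
        # Try every non-empty subset of the window, smallest subsets first.
        for size in range(1, len(positions) + 1):
            for combo in combinations(positions, size):
                if sum(values[i] for i in combo) == target:
                    return list(combo)
    return None


if __name__ == "__main__":
    print(find_sum_positions([9, 5, 16, 22, 1, 37, 26, 14]))  # [0, 3]
    print(find_sum_positions([9, 5, 34]))                     # None
```

Each window has at most 31 non-empty subsets, so even for 1000 elements the search stays small. Positions 0 and 3 hold the values 9 and 22, matching one of the accepted answers in the problem statement.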

Efficient Division by Least Significant Bit set in C/C++

I would like to perform the following operation as quickly as possible
x / LSB(x)
where x is an integral value unknown at compile time and LSB(x) = x & -x.
(Alternatively, the operation is equivalent to an exact division by the highest power of 2 that divides x.) I am looking for a reasonably portable solution (without compiler intrinsics/builtins like GCC's __builtin_clz or the like).
My concern is that the following simple implementation
x / (x & -x)
would still result in an expensive division, as the compiler might fail to realize that the division is in fact equivalent to a right shift by the number of trailing zeroes in the divisor.
If my concerns are reasonable, what would be a more efficient way to implement it?
I would appreciate a solution that is easily extendible to integral types of sizes 32-bit, 64-bits, 128-bits, ...
How about
x >>= ffs(x)-1;
The ffs function conforms to 4.3BSD, POSIX.1-2001.
It won't work if x is 0.
If you don't want to rely on CTZ (count trailing zeros) hardware instructions, you can count trailing zeros as described in this answer. It's very fast, using a look-up table and a multiplication by a magic number. I'll re-post the code here:
unsigned x; // input: 32-bit value whose trailing zeros are counted
unsigned c; // output: number of trailing zero bits in x
static const unsigned MultiplyDeBruijnBitPosition[32] =
{
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
c = MultiplyDeBruijnBitPosition[((unsigned)((x & -x) * 0x077CB531U)) >> 27];
Once you have counted the trailing zeros, you no longer need to use a division instruction. Instead, you can just shift the value right by c. That is (eliminating an unneeded temporary value), the code becomes this:
static const unsigned MultiplyDeBruijnBitPosition[32] =
{
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
x >>= MultiplyDeBruijnBitPosition[((unsigned)((x & -x) * 0x077CB531U)) >> 27]; // x /= LSB(x)
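To check that the shift really matches the division, here is a Python port of the same table lookup. It is a sketch for 32-bit values only: Python integers are unbounded, so the product is masked to 32 bits explicitly to imitate unsigned overflow, and the function names are hypothetical.

```python
# Same De Bruijn lookup table as in the C code above.
DEBRUIJN_POSITIONS = [
    0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
    31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9,
]


def count_trailing_zeros32(x):
    """Number of trailing zero bits of a nonzero 32-bit value."""
    if not 0 < x < 2 ** 32:
        raise ValueError("x must be a nonzero 32-bit unsigned value")
    lsb = x & -x  # isolate the lowest set bit
    # Mask to 32 bits to imitate C unsigned wrap-around, then keep the top 5 bits.
    index = ((lsb * 0x077CB531) & 0xFFFFFFFF) >> 27
    return DEBRUIJN_POSITIONS[index]


def divide_by_lsb(x):
    """Compute x / LSB(x) with a shift instead of a division."""
    return x >> count_trailing_zeros32(x)


if __name__ == "__main__":
    print(divide_by_lsb(40), divide_by_lsb(12), divide_by_lsb(7))  # 5 3 7
```

As with the ffs version, zero has to be handled separately; the sketch rejects it instead of returning a meaningless result.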

How to count Combinations

I was wondering how I would go about counting combinations in a list. To be more precise, I have a list made up of smaller lists, each containing 6 randomly chosen numbers, and I want to count how many times each combination occurs within the bigger list and then display the least frequent combination. So far I have tried using Counter(), but it seems it can't count lists.
Here's an example of what I want to do:
list = [[1,2,3,4,5,6],[1,5,16,35,55,22],[1,2,3,4,5,6],[5,25,35,45,55,10],[1,5,16,35,55,22],[1,2,3,4,5,6],[9,16,21,22,23,6],[9,16,21,22,23,6]]
So after counting the combinations it should print the combination [5,25,35,45,55,10],
since it only occurred once in the list.
FYI: the list is going to be randomly generated with around 1 billion combinations stored, but given the range of numbers, there are only 175 million possible combinations.
FYI 2: I'm extremely new to Python.
When you construct the Counter instance you can convert your lists to tuples; the latter are hashable, which is the property an object needs to be able to serve as a key of a dict.
>>> from collections import Counter
>>> l = [[1,2,3,4,5,6],[1,5,16,35,55,22],[1,2,3,4,5,6],[5,25,35,45,55,10],[1,5,16,35,55,22],[1,2,3,4,5,6],[9,16,21,22,23,6],[9,16,21,22,23,6]]
>>> c = Counter(tuple(e) for e in l)
>>> c
Counter({(1, 2, 3, 4, 5, 6): 3, (1, 5, 16, 35, 55, 22): 2, (9, 16, 21, 22, 23, 6): 2, (5, 25, 35, 45, 55, 10): 1})
>>> list(c.most_common()[-1][0])
[5, 25, 35, 45, 55, 10]
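One caveat: tuples compare by order, so (1, 2, 3) and (3, 2, 1) would be counted separately. If the draws are true combinations where order does not matter (an assumption, the question does not say), sorting each one before converting it gives a single key per combination. A minimal sketch, with `least_common_combination` as a hypothetical helper name:

```python
from collections import Counter


def least_common_combination(draws):
    """Return (combination, count) for the rarest draw, ignoring order."""
    # Sorting makes [1, 2, 3] and [3, 2, 1] map to the same hashable key.
    counts = Counter(tuple(sorted(draw)) for draw in draws)
    # min() scans the counts once instead of sorting them all.
    combo, count = min(counts.items(), key=lambda item: item[1])
    return list(combo), count


if __name__ == "__main__":
    draws = [[1, 2, 3, 4, 5, 6], [1, 5, 16, 35, 55, 22], [1, 2, 3, 4, 5, 6],
             [5, 25, 35, 45, 55, 10], [1, 5, 16, 35, 55, 22], [1, 2, 3, 4, 5, 6],
             [9, 16, 21, 22, 23, 6], [9, 16, 21, 22, 23, 6]]
    print(least_common_combination(draws))  # ([5, 10, 25, 35, 45, 55], 1)
```

Because Counter accepts any iterable, the generator expression avoids building a second full list, which matters at the sizes mentioned in the question.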