Pack smaller Rectangle in Bigger One with highest Repeat Count? - c++

i have a canvas rectangle ( constant width and height ) , i have child rectangle (also constant width and height) .
i want to fit the smaller rectangle in canvas with highest repeat count (or least wastage space Or maximize occupancy ratio ) .
when i tried famous algorithms like GuillotineBinPack or MaxRectsBinPack to fit lets say 25*20 rectangle in 70*100 all of them give me maximum of 13 rectangle instead of optimal result of 14 ( 5 first row + 5 second row + 4 third row).
Note : i tried all possible Heuristic permutations available with algorithm and even failed to achieve my optimal goal .
any small hint will highly appreciated.

Related

Disparity Map Block Matching

I am writing a disparity matching algorithm using block matching, but I am not sure how to find the corresponding pixel values in the secondary image.
Given a square window of some size, what techniques exist to find the corresponding pixels? Do I need to use feature matching algorithms or is there a simpler method, such as summing the pixel values and determining whether they are within some threshold, or perhaps converting the pixel values to binary strings where the values are either greater than or less than the center pixel?
I'm going to assume you're talking about Stereo Disparity, in which case you will likely want to use a simple Sum of Absolute Differences (read that wiki article before you continue here). You should also read this tutorial by Chris McCormick before you read more here.
side note: SAD is not the only method, but it's really common and should solve your problem.
You already have the right idea. Make windows, move windows, sum pixels, find minimums. So I'll give you what I think might help:
To start:
If you have color images, first you will want to convert them to black and white. In python you might use a simple function like this per pixel, where x is a pixel that contains RGB.
def rgb_to_bw(x):
return int(x[0]*0.299 + x[1]*0.587 + x[2]*0.114)
You will want this to be black and white to make the SAD easier to computer. If you're wondering why you don't loose significant information from this, you might be interested in learning what a Bayer Filter is. The Bayer Filter, which is typically RGGB, also explains the multiplication ratios of the Red, Green, and Blue portions of the pixel.
Calculating the SAD:
You already mentioned that you have a window of some size, which is exactly what you want to do. Let's say this window is n x n in size. You would also have some window in your left image WL and some window in your right image WR. The idea is to find the pair that has the smallest SAD.
So, for each left window pixel pl at some location in the window (x,y) you would the absolute value of difference of the right window pixel pr also located at (x,y). you would also want some running value, which is the sum of these absolute differences. In sudo code:
SAD = 0
from x = 0 to n:
from y = 0 to n:
SAD = SAD + absolute_value|pl - pr|
After you calculate the SAD for this pair of windows, WL and WR you will want to "slide" WR to a new location and calculate another SAD. You want to find the pair of WL and WR with the smallest SAD - which you can think of as being the most similar windows. In other words, the WL and WR with the smallest SAD are "matched". When you have the minimum SAD for the current WL you will "slide" WL and repeat.
Disparity is calculated by the distance between the matched WL and WR. For visualization, you can scale this distance to be between 0-255 and output that to another image. I posted 3 images below to show you this.
Typical Results:
Left Image:
Right Image:
Calculated Disparity (from the left image):
you can get test images here: http://vision.middlebury.edu/stereo/data/scenes2003/

Choose rectangles for maximizing the area

I've got a 2D-binary matrix of arbitrary size. I want to find a set of rectangles in this matrix, showing a maximum area. The constraints are:
Rectangles may only cover "0"-fields in the matrix and no "1"-fields.
Each rectangle has to have a given distance from the next rectangle.
So let me illustrate this a bit further by this matrix:
1 0 0 1
0 0 0 0
0 0 1 0
0 0 0 0
0 1 0 0
Let the minimal distance between two rectangles be 1. Consequently, the optimal solution would be by choosing the rectangles with corners (1,0)-(3,1) and (1,3)-(4,3). These rectangles are min. 1 field apart from each other and they do not lie on "1"-fields. Additionally, this solution got the maximum area (6+4=10).
If the minimal distance would be 2, the optimum would be (1,0)-(4,0) and (1,3)-(4,3) with area 4+4=8.
Till now, I achieved to find out rectangles analogous to this post:
Find largest rectangle containing only zeros in an N×N binary matrix
I saved all these rectangles in a list:
list<rectangle> rectangles;
with
struct rectangle {
int i,j; // bottom left corner of rectangle
int width,length; // width=size in neg. i direction, length=size in pos. j direction
};
Till now, I only thought about brute-force-methods but of course, I am not happy with this.
I hope you can give me some hints and tips of how to find the corresponding rectangles in my list and I hope my problem is clear to you.
The following counterexample shows that even a brute-force checking of all combinations of maximal-area rectangles can fail to find the optimum:
110
000
110
In the above example, there are 2 maximal-area rectangles, each of area 3, one vertical and one horizontal. You can't pick both, so if you are restricted to choosing a subset of these rectangles, the best you can do is to pick (either) one for a total area of 3. But if you instead picked the vertical area-3 rectangle, and then also took the non-maximal 1x2 rectangle consisting of just the leftmost two 0s, you could get a better total area of 5. (That's for a minimum separation distance of 0; if the minimum separation distance is 1, as in your own example, then you could instead pick just the leftmost 0 as a 1x1 rectangle for a total area of 4, which is still better than 3.)
For the special case when the separation distance is 0, there's a trivial algorithm: you can simply put a 1x1 rectangle on every single 0 in the matrix. When the separation distance is strictly greater than 0, I don't yet see a fast algorithm, though I'm less sure that the problem is NP-hard now than I was a few minutes ago...

HOG: What is done in the contrast-normalization step?

According to the HOG process, as described in the paper Histogram of Oriented Gradients for Human Detection (see link below), the contrast normalization step is done after the binning and the weighted vote.
I don't understand something - If I already computed the cells' weighted gradients, how can the normalization of the image's contrast help me now?
As far as I understand, contrast normalization is done on the original image, whereas for computing the gradients, I already computed the X,Y derivatives of the ORIGINAL image. So, if I normalize the contrast and I want it to take effect, I should compute everything again.
Is there something I don't understand well?
Should I normalize the cells' values?
Is the normalization in HOG not about contrast anyway, but is about the histogram values (counts of cells in each bin)?
Link to the paper:
http://lear.inrialpes.fr/people/triggs/pubs/Dalal-cvpr05.pdf
The contrast normalization is achieved by normalization of each block's local histogram.
The whole HOG extraction process is well explained here: http://www.geocities.ws/talh_davidc/#cst_extract
When you normalize the block histogram, you actually normalize the contrast in this block, if your histogram really contains the sum of magnitudes for each direction.
The term "histogram" is confusing here, because you do not count how many pixels has direction k, but instead you sum the magnitudes of such pixels. Thus you can normalize the contrast after computing the block's vector, or even after you computed the whole vector, assuming that you know in which indices in the vector a block starts and a block ends.
The steps of the algorithm due to my understanding - worked for me with 95% success rate:
Define the following parameters (In this example, the parameters are like HOG for Human Detection paper):
A cell size in pixels (e.g. 6x6)
A block size in cells (e.g. 3x3 ==> Means that in pixels it is 18x18)
Block overlapping rate (e.g. 50% ==> Means that both block width and block height in pixels have to be even. It is satisfied in this example, because the cell width and cell height are even (6 pixels), making the block width and height also even)
Detection window size. The size must be dividable by a half of the block size without remainder (so it is possible to exactly place the blocks within with 50% overlapping). For example, the block width is 18 pixels, so the windows width must be a multiplication of 9 (e.g. 9, 18, 27, 36, ...). Same for the window height. In our example, the window width is 63 pixels, and the window height is 126 pixels.
Calculate gradient:
Compute the X difference using convolution with the vector [-1 0 1]
Compute the Y difference using convolution with the transpose of the above vector
Compute the gradient magnitude in each pixel using sqrt(diffX^2 + diffY^2)
Compute the gradient direction in each pixel using atan(diffY / diffX). Note that atan will return values between -90 and 90, while you will probably want the values between 0 and 180. So just flip all the negative values by adding to them +180 degrees. Note that in HOG for Human Detection, they use unsigned directions (between 0 and 180). If you want to use signed directions, you should make a little more effort: If diffX and diffY are positive, your atan value will be between 0 and 90 - leave it as is. If diffX and diffY are negative, again, you'll get the same range of possible values - here, add +180, so the direction is flipped to the other side. If diffX is positive and diffY is negative, you'll get values between -90 and 0 - leave them the same (You can add +360 if you want it positive). If diffY is positive and diffX is negative, you'll again get the same range, so add +180, to flip the direction to the other side.
"Bin" the directions. For example, 9 unsigned bins: 0-20, 20-40, ..., 160-180. You can easily achieve that by dividing each value by 20 and flooring the result. Your new binned directions will be between 0 and 8.
Do for each block separately, using copies of the original matrix (because some blocks are overlapping and we do not want to destroy their data):
Split to cells
For each cell, create a vector with 9 members (one for each bin). For each index in the bin, set the sum of all the magnitudes of all the pixels with that direction. We have totally 6x6 pixels in a cell. So for example, if 2 pixels have direction 0 while the magnitude of the first one is 0.231 and the magnitude of the second one is 0.13, you should write in index 0 in your vector the value 0.361 (= 0.231 + 0.13).
Concatenate all the vectors of all the cells in the block into a large vector. This vector size should of course be NUMBER_OF_BINS * NUMBER_OF_CELLS_IN_BLOCK. In our example, it is 9 * (3 * 3) = 81.
Now, normalize this vector. Use k = sqrt(v[0]^2 + v[1]^2 + ... + v[n]^2 + eps^2) (I used eps = 1). After you computed k, divide each value in the vector by k - thus your vector will be normalized.
Create final vector:
Concatenate all the vectors of all the blocks into 1 large vector. In my example, the size of this vector was 6318

Calculate scrollbar height in grid with varied row height

I've grid with a lot of rows (e.g. 1 000 000). Height of each row may be unique. But most of rows has same height. So it's not possible to determine height of each row and get total grid height.
I need implement smooth vertical scrolling over this grid, not only jump over row, because row can be higher than visible area.
My solution is:
get number of rows
each row is divided into 10 parts
=> scroll bar max value is (number of rows)*10
from scroll position I get :
first visible row = (scroll position) / 10
first visible row shift = (scroll position) % 10
This work fine, if all rows has +- same height. If there is one row with height 500 px and other has 25 px scroll looks awful.
Has anybody suggestion how to better solve this problem?
Grid is here :
http://img560.imageshack.us/img560/7775/scroll.png
Let the scroll be in pixel units:
Sum the total height of all the rows and set the scrollbar max value to that value.
Cache the first visible row index in a variable.
When the user scrolls up or down, you can scan sequentially from the current first visible row to find the new one. This gives amortized constant-time work per update for sequential read.
You won't do random access (e.g. scroll to row number N) frequently so doing a linear search when you do is fine. If you need something faster (I doubt that) then you can pre compute the partial sums of row heights and do binary search.

How to create data fom image like "Letter Image Recognition Dataset" from UCI

I am using letter_regcog example from OpenCV, it used dataset from UCI which have structure like this:
Attribute Information:
1. lettr capital letter (26 values from A to Z)
2. x-box horizontal position of box (integer)
3. y-box vertical position of box (integer)
4. width width of box (integer)
5. high height of box (integer)
6. onpix total # on pixels (integer)
7. x-bar mean x of on pixels in box (integer)
8. y-bar mean y of on pixels in box (integer)
9. x2bar mean x variance (integer)
10. y2bar mean y variance (integer)
11. xybar mean x y correlation (integer)
12. x2ybr mean of x * x * y (integer)
13. xy2br mean of x * y * y (integer)
14. x-ege mean edge count left to right (integer)
15. xegvy correlation of x-ege with y (integer)
16. y-ege mean edge count bottom to top (integer)
17. yegvx correlation of y-ege with x (integer)
example:
T,2,8,3,5,1,8,13,0,6,6,10,8,0,8,0,8
I,5,12,3,7,2,10,5,5,4,13,3,9,2,8,4,10
now I have segmented image of letter and want to transform it into data like this to put recognize it but I don't understand the mean of all value like "6. onpix total # on pixels" what is it mean ? Can you please explain the mean of these value. thanks.
I am not familiar with OpenCV's letter_recog example, but this appears to be a feature vector, or set of statistics about the image of a letter that is used to classify the future occurrences of the letter. The results of your segmentation should leave you with a binary mask with 1's on the letter and 0's everywhere else. onpix is simply the total count of pixels that fall on the letter, or in other words, the sum of your binary mask.
Most of the rest values in the list need to be calculated based on the set of pixels with a value of 1 in your binary mask. x and y are just the position of the pixel. For instance, x-bar is just the sample mean of all of the x positions of all pixels that have a 1 in the mask. You should be able to easily find references on the web for mathematical definitions of mean, variance, covariance and correlation.
14-17 are a little different since they are based on edge pixels, but the calculations should be similar, just over a different set of pixels.
My name is Antonio Bernal.
In page 3 of this article you will find a good description for each value.
Letter Recognition Using Holland-Style Adaptive Classifiers.
If you have any doubt let me know.
I am trying to make this algorithm work, but my problem is that I do not know how to scale the values to fit them to the range 0-15.
Do you have any idea how to do this?
Another Link from Google scholar -> Letter Recognition Using Holland-Style Adaptive Classifiers