Boolean operations on 2d rectangles - c++

I have written my own rectangle class and it includes a method to subtract one rectangle from another. The algorithm simply determines which edge the source rectangle is overlapping on the destination rectangle and then chugs through all possible cases, including being completely inside, just on the edge, completely enclosing and so on. In fact there are so many cases I'm looking at the code and wondering if there are algorithms or examples of boolean operations on rectangles already available.
I know there are generalised clipping algorithms for 2d polytopes but I was looking for something specific to 2d rectangles, with the appropriate concomitant optimisations and simplifications.
Can anyone point me in the right direction, or is Weiler-Atherton the last word on this general class of problem of which the rectangle is just a single case?

If you separate the two directions, you have only a few base cases, which you can then combine in a nested loop.
The base cases are sketched below:
          |              |
  XXXXX   |..............|            1 section
          |              |
       XXXXXXX...........|            2 sections
          |              |
          |...XXXXXXX....|            3 sections
          |              |
          |..........XXXXXXXX         2 sections
          |              |
          |..............|   XXXX     1 section
          |              |
       XXXXXXXXXXXXXXXXXXXXXX         nothing
          |              |
The vertical bars are the edges of the original rectangle, the X is the rectangle that is to be subtracted, and the dots mark sections. An X between the vertical bars is also a section that is kept, except when combined with an X section of the other direction. (If that sounds too complicated: the hole left behind is designated by the X section in both directions.)
We can separate the directions by redesigning the rectangle's properties left, top, right and bottom into arrays of min/max values:
typedef struct Rect Rect;
struct Rect {
int min[2];
int max[2];
};
(The code is C, not C++, I'm afraid.)
Then we can find the sections for each direction:
int rect_sub_dir(int sec[], int *skip, Rect a, Rect b, int dir)
{
int n = 0;
sec[n++] = a.min[dir];
if (b.min[dir] > a.min[dir] && b.min[dir] < a.max[dir]) {
sec[n++] = b.min[dir];
}
*skip = n - 1;
if (b.max[dir] < a.max[dir] && b.max[dir] > a.min[dir]) {
sec[n++] = b.max[dir];
}
sec[n] = a.max[dir];
// Backpatch if rectangles don't overlap (edges that merely touch don't overlap either)
if (b.max[dir] <= a.min[dir]) *skip = -1;
if (b.min[dir] >= a.max[dir]) *skip = -1;
return n;
}
This creates an array of n + 1 boundaries, between n sections. The skip value is the index of the section marked X between the vertical bars, or -1 if the rectangles do not overlap in that direction.
You can then combine the sections of the two directions:
int rect_sub(Rect res[], Rect a, Rect b)
{
int hor[4];
int ver[4];
int hskip, nhor;
int vskip, nver;
int h, v;
int n = 0;
nhor = rect_sub_dir(hor, &hskip, a, b, 0);
nver = rect_sub_dir(ver, &vskip, a, b, 1);
for (h = 0; h < nhor; h++) {
for (v = 0; v < nver; v++) {
// Skip the hole: the X section in both directions
if (h == hskip && v == vskip) continue;
// rect() is assumed to build a Rect from (left, top, right, bottom)
res[n++] = rect(hor[h], ver[v], hor[h + 1], ver[v + 1]);
}
}
return n;
}
This solution is not optimal. It will create eight rectangles when the second rectangle is contained in the first one, which may not be what you are looking for. You could always try to merge adjacent rectangles afterwards. Or you could rewrite the code to split the rectangles more intelligently.
I have tested the code with some cases, but because there are many possible arrangements, the code is not fully tested.
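As a rough illustration of splitting the rectangle more intelligently, here is a C++ sketch (the question asks about C++) that clips b to a and then emits at most four pieces: a full-width band on each side of the hole along axis 1, and two side pieces along axis 0. The names Rect and subtract are hypothetical and exist only for this sketch. It assumes the same min/max layout as above and treats edges that merely touch as no overlap.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical rectangle type for this sketch: extents [min, max) on each axis.
struct Rect {
    int min[2];
    int max[2];
};

// Returns the parts of `a` not covered by `b`, as at most four rectangles.
std::vector<Rect> subtract(const Rect& a, const Rect& b) {
    assert(a.min[0] <= a.max[0] && a.min[1] <= a.max[1]);

    // Clip b to a. An empty clipped extent on either axis means nothing is removed.
    Rect c;
    for (int d = 0; d < 2; ++d) {
        c.min[d] = std::max(a.min[d], b.min[d]);
        c.max[d] = std::min(a.max[d], b.max[d]);
        if (c.min[d] >= c.max[d]) return {a};
    }

    std::vector<Rect> out;
    // Full-width bands on either side of the hole along axis 1.
    if (a.min[1] < c.min[1]) out.push_back({{a.min[0], a.min[1]}, {a.max[0], c.min[1]}});
    if (c.max[1] < a.max[1]) out.push_back({{a.min[0], c.max[1]}, {a.max[0], a.max[1]}});
    // Side pieces along axis 0, limited to the hole's extent on axis 1.
    if (a.min[0] < c.min[0]) out.push_back({{a.min[0], c.min[1]}, {c.min[0], c.max[1]}});
    if (c.max[0] < a.max[0]) out.push_back({{c.max[0], c.min[1]}, {a.max[0], c.max[1]}});
    return out;
}
```

When b lies strictly inside a this yields four rectangles instead of eight. When b covers a the result is empty, and when the two do not overlap the result is a itself.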

Related

How to respond to a limitless possibility of outcomes?

So let's say that there is an imaginary 2 by 2 grid made up of 4 numbers ...
1 2
3 4
You can either flip the grid horizontally or vertically down the middle by inputting either H or V respectively. You can also flip the grid as many times as you wish, with the previous choice affecting your future outcome.
For example, you could flip the grid horizontally down the middle, and then vertically.
While solving this problem, I got enough code written down so that the program works, except for the part where the "flipping" happens. Since you can enter as many H's and V's as you would like, I have some trouble writing code that would support this action.
Since the program input could contain as many horizontal or vertical flips as the user would prefer, that prevents me from manually using if-statements; in other words, I can't say "if the 1st letter is H, flip horizontally, if the 2nd letter is V, flip vertically, etc.".
This is just a short snippet of what I have figured out so far...
void flipGrid(string str, int letterPlace)
{
while (letterPlace < str.length())
{
if (str.at(letterPlace) == 'H')
{
// flip grid horizontally
}
else if (str.at(letterPlace) == 'V')
{
// flip grid vertically
}
letterPlace += 1;
}
}
int main()
{
int increment = 0;
string userInput;
cin >> userInput;
flipGrid(userInput, increment);
return 0;
}
As you can probably tell, I need help with the parts specified by the comments. If the code were to run as planned, it should look something like this...
Input (example 1)
H
Output
3 4
1 2
Input (example 2)
HVVH
Output (the two H's and the two V's cancel out, leaving us with the original)
1 2
3 4
I feel like there should be an easier way to solve this problem, or is the method I'm currently working on the right way to approach this problem? Please let me know if I'm on the right track or not. Thanks!
I would do a few things. First, I would simply count the H's and V's and, when done, modulo 2 each count. This will leave you flipCountH and flipCountV each having 0 or 1. There's no need to do multiple flips, right? Then you'll at most do each action once.
void flipCounts(string str, int &flipCountH, int &flipCountV)
{
for (char c: str) {
if (c == 'H')
{
++flipCountH;
}
else if (c == 'V')
{
++flipCountV;
}
}
}
Use that method, then:
flipCountH %= 2;
flipCountV %= 2;
if (flipCountH > 0) {
performHorizontalFlip();
}
if (flipCountV > 0) {
performVerticalFlip();
}
Now, HOW you flip is based on how you store the data. For this very specific problem, I would store it in an int[2][2].
void performVerticalFlip() {
int topLine[2];
topLine[0] = grid[0][0];
topLine[1] = grid[0][1];
grid[0][0] = grid[1][0];
grid[0][1] = grid[1][1];
grid[1][0] = topLine[0];
grid[1][1] = topLine[1];
}
Now, you can probably make use of C++ move semantics, but that's an advanced topic. You could also make a swap method that swaps two integers (the standard library already provides one as std::swap). That's not so advanced.
void swap(int &a, int &b) {
int tmp = a;
a = b;
b = tmp;
}
Then the code above is simpler:
swap(grid[0][0], grid[1][0]);
swap(grid[0][1], grid[1][1]);
Horizontal flip is similar.
From the comments:
I don't know how to flip it in each statement
So, flipping a 2x2 grid vertically is simple:
int tmp = grid[0][0];
grid[0][0] = grid[1][0];
grid[1][0] = tmp;
tmp = grid[0][1];
grid[0][1] = grid[1][1];
grid[1][1] = tmp;
If you have a grid bigger than a 2x2, this will work as well:
// for half the height of the grid
for(unsigned int i = 0;i<Height/2;i++) {
// for the width of the grid
for(unsigned int j = 0; j < Width; j++) {
// store a copy of the old value
int tmp = grid[i][j];
// put the new value in
grid[i][j] = grid[Height-1-i][j]; // note, we are flipping this vertically,
// so we want something an equal distance away
// from the other end as us
// replace the value we were grabbing from with the saved value
grid[Height-1-i][j] = tmp;
}
}
In case this is homework, I'm going to leave a horizontal flip for you to figure out (hint, it's the same thing, but with the width and height reversed).
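Putting the two ideas together (only the parity of each letter matters, and a flip is a pair of swaps), a minimal sketch could look like the following. The names Grid and applyFlips are hypothetical. One assumption is taken from the question's first example, where H turns 1 2 / 3 4 into 3 4 / 1 2: here H swaps the two rows (what the answers above call a vertical flip) and V swaps the two columns. If your definition is the other way round, exchange the two swaps.

```cpp
#include <array>
#include <string>
#include <utility>

using Grid = std::array<std::array<int, 2>, 2>;

// Applies a sequence of 'H'/'V' commands to a 2x2 grid and returns the result.
// Assumption: 'H' swaps the rows, 'V' swaps the columns; other characters are ignored.
Grid applyFlips(Grid grid, const std::string& commands) {
    bool flipH = false;
    bool flipV = false;
    for (char c : commands) {
        if (c == 'H') flipH = !flipH;       // two H flips cancel out
        else if (c == 'V') flipV = !flipV;  // likewise for V
    }
    if (flipH) {
        std::swap(grid[0], grid[1]);        // exchange top and bottom rows
    }
    if (flipV) {
        for (auto& row : grid) {
            std::swap(row[0], row[1]);      // exchange left and right within each row
        }
    }
    return grid;
}
```

Because the grid is passed and returned by value, the caller keeps the original and can print the result row by row.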

Alive neighbour cells not correctly counted

I know my title isn't very specific but that's because I have no idea where the problem comes from. I've been stuck on this problem for 2 or 3 hours and in theory everything should be working, but it's not.
This piece of code:
for ( int x = -1; x <= 1; x++ ) { //Iterate through the 8 neighbour cells plus the one indicated
for ( int y = -1; y <= 1; y++ ) {
neighbour = coords(locX + x, locY + y, width); //Get the cell index in the array
if (existsInOrtho(ortho, neighbour)) { //If the index exists in the array
if (ortho[neighbour] == 0) { //Cell is dead
cnt--; //Remove one from the number of alive neighbour cells
}
} else { //Cell is not in the zone
cnt--; //Remove one from the number of alive neighbour cells
}
}
}
Iterates through all the neighbour cells to get their value in the array (1 for alive, 0 for dead). The "coords" function, shown here:
int coords(int locX, int locY, int width)
{
int res = -1;
locX = locX - 1; //Remove one from both coordinates, since an index starts at 0 (and the zone starts at (1;1) )
locY = locY - 1;
res = locX * width + locY; //Small calculation to get the index of the pixel in the array
return res;
}
Gets the index of the cell in the array. But when I run the code, it doesn't work: the number of neighbour cells is not correct (it's as if one cell goes uncounted whenever there are live cells in the neighbourhood). I tried working through everything by hand, and it works, so I don't know what breaks in the final code... Here is the complete code. Sorry if I made any English mistakes, it's not my native language.
This code ...
for ( int x = -1; x <= 1; x++ ) { //Iterate through the 8 neighbour cells plus the one indicated
for ( int y = -1; y <= 1; y++ ) {
Actually checks 9 cells. Perhaps you forgot that it checks (x,y) = (0,0). That would include the cell itself as well as its neighbours.
A simple fix is:
for ( int x = -1; x <= 1; x++ ) { //Iterate through the 8 neighbour cells plus the one indicated
for ( int y = -1; y <= 1; y++ ) {
if (x || y) {
Also, the simulate function (from your link) makes the common mistake of updating the value of the cell in the same array before processing state changes required for the cells beside it. The easiest fix is to keep two arrays -- two complete copies of the grid (two ortho arrays, in your code). When reading from orthoA, update orthoB. And then on the next generation, flip. Read from orthoB and write to orthoA.
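Both points (skipping the cell itself and writing into a second array) can be sketched in one function. This is an illustration under my own assumptions, not your code: the name step is hypothetical, coordinates are 0-based, the grid is stored row by row as index = y * width + x, and cells outside the grid count as dead. Bounds are checked on the coordinates before flattening, because a flattened index alone cannot tell a real neighbour from a cell at the far end of the adjacent row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes one generation of Conway's Game of Life.
// Reads only from `current` and writes only to the returned grid.
std::vector<int> step(const std::vector<int>& current, int width, int height) {
    assert(current.size() == static_cast<std::size_t>(width) * height);
    std::vector<int> next(current.size(), 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int alive = 0;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    if (dx == 0 && dy == 0) continue;  // skip the cell itself
                    int nx = x + dx;
                    int ny = y + dy;
                    // Cells outside the grid are treated as dead.
                    if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
                    alive += current[ny * width + nx];
                }
            }
            bool wasAlive = current[y * width + x] != 0;
            // Standard rules: birth on 3 neighbours, survival on 2 or 3.
            next[y * width + x] = (alive == 3 || (wasAlive && alive == 2)) ? 1 : 0;
        }
    }
    return next;
}
```

Calling grid = step(grid, width, height) once per generation has the same effect as alternating between orthoA and orthoB.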

Tallest tower with stacked boxes in the given order

Given N boxes, how can I find the tallest tower made with them in the given order? (Given order means that the first box must be at the base of the tower and so on.) All boxes must be used to make a valid tower.
It is possible to rotate the box on any axis in a way that any of its 6 faces gets parallel to the ground, however the perimeter of such face must be completely restrained inside the perimeter of the superior face of the box below it. In the case of the first box it is possible to choose any face, because the ground is big enough.
To solve this problem I've tried the following:
- First, the code generates the rotations for each rectangle (just a permutation of the dimensions)
- Second, it builds a dynamic programming solution over each box and each possible rotation
- Finally, it searches the DP table for the highest tower
But my algorithm gets a wrong answer on unknown test cases. What is wrong with it? Is dynamic programming the best approach to this problem?
Here is my code:
#include <cstdio>
#include <vector>
#include <algorithm>
#include <cstdlib>
#include <cstring>
struct rectangle{
int coords[3];
rectangle(){ coords[0] = coords[1] = coords[2] = 0; }
rectangle(int a, int b, int c){coords[0] = a; coords[1] = b; coords[2] = c; }
};
bool canStack(rectangle &current_rectangle, rectangle &last_rectangle){
for (int i = 0; i < 2; ++i)
if(current_rectangle.coords[i] > last_rectangle.coords[i])
return false;
return true;
}
//six is the number of rotations for each rectangle
int dp(std::vector< std::vector<rectangle> > &v){
int memoization[6][v.size()];
memset(memoization, -1, sizeof(memoization));
//all rotations of the first rectangle can be used
for (int i = 0; i < 6; ++i) {
memoization[i][0] = v[0][i].coords[2];
}
//for each rectangle
for (int i = 1; i < v.size(); ++i) {
//for each possible permutation of the current rectangle
for (int j = 0; j < 6; ++j) {
//for each permutation of the previous rectangle
for (int k = 0; k < 6; ++k) {
rectangle &prev = v[i - 1][k];
rectangle &curr = v[i][j];
//is possible to put the current rectangle with the previous rectangle ?
if( canStack(curr, prev) ) {
memoization[j][i] = std::max(memoization[j][i], curr.coords[2] + memoization[k][i-1]);
}
}
}
}
//what is the best solution ?
int ret = -1;
for (int i = 0; i < 6; ++i) {
ret = std::max(memoization[i][v.size()-1], ret);
}
return ret;
}
int main ( void ) {
int n;
scanf("%d", &n);
std::vector< std::vector<rectangle> > v(n);
for (int i = 0; i < n; ++i) {
rectangle r;
scanf("%d %d %d", &r.coords[0], &r.coords[1], &r.coords[2]);
//generate all rotations with the given rectangle (all combinations of the coordinates)
for (int j = 0; j < 3; ++j)
for (int k = 0; k < 3; ++k)
if(j != k) //micro optimization disease
for (int l = 0; l < 3; ++l)
if(l != j && l != k)
v[i].push_back( rectangle(r.coords[j], r.coords[k], r.coords[l]) );
}
printf("%d\n", dp(v));
}
Input Description
A test case starts with an integer N, representing the number of boxes (1 ≤ N ≤ 10^5).
Following there will be N rows, each containing three integers, A, B and C, representing the dimensions of the boxes (1 ≤ A, B, C ≤ 10^4).
Output Description
Print one row containing one integer, representing the maximum height of the stack if it’s possible to pile all the N boxes, or -1 otherwise.
Sample Input
2
5 2 2
1 3 4
Sample Output
6
Sample image for the given input and output.
Usually you're given the test case that made you fail. Otherwise, finding the problem is a lot harder.
You can always approach it from a different angle! I'm going to leave out the boring parts that are easily replicated.
struct Box { unsigned int dim[3]; };
Box will store the dimensions of each... box. When it comes time to read the dimensions, it needs to be sorted so that dim[0] >= dim[1] >= dim[2].
The idea is to loop and read the next box each iteration. It then compares the second largest dimension of the new box with the second largest dimension of the last box, and same with the third largest. If in either case the newer box is larger, it adjusts the older box to compare the first largest and third largest dimension. If that fails too, then the first and second largest. This way, it always prefers using a larger dimension as the vertical one.
If it had to rotate a box, it goes to the next box down and checks that the rotation doesn't need to be adjusted there too. It continues until there are no more boxes or it didn't need to rotate the next box. If at any time, all three rotations for a box failed to make it large enough, it stops because there is no solution.
Once all the boxes are in place, it just sums up each one's vertical dimension.
int main()
{
unsigned int size; //num boxes
std::cin >> size;
std::vector<Box> boxes(size); //all boxes
std::vector<unsigned char> pos(size, 0); //index of vertical dimension
//gets the index of dimension that isn't vertical
//largest indicates if it should pick the larger or smaller one
auto get = [](unsigned char x, bool largest) { if (largest) return x == 0 ? 1 : 0; return x == 2 ? 1 : 2; };
//check will compare the dimensions of two boxes and return true if the smaller one is under the larger one
auto check = [&boxes, &pos, &get](unsigned int x, bool largest) { return boxes[x - 1].dim[get(pos[x - 1], largest)] < boxes[x].dim[get(pos[x], largest)]; };
unsigned int x = 0, y; //indexing variables
unsigned char change; //detects box rotation change
bool fail = false; //if it cannot be solved
for (x = 0; x < size && !fail; ++x)
{
//read in the next three dimensions
//make sure dim[0] >= dim[1] >= dim[2]
//simple enough to write
//mine was too ugly and I didn't want to be embarrassed
y = x;
while (y && !fail) //when y == 0, no more boxes to check
{
change = pos[y - 1];
while (check(y, true) || check(y, false)) //while invalid rotation
{
if (++pos[y - 1] == 3) //rotate, when pos == 3, no solution
{
fail = true;
break;
}
}
if (change != pos[y - 1]) //if rotated box
--y;
else
break;
}
}
if (fail)
{
std::cout << -1;
}
else
{
unsigned long long max = 0;
for (x = 0; x < size; ++x)
max += boxes[x].dim[pos[x]];
std::cout << max;
}
return 0;
}
It works for the test cases I've written, but given that I don't know what caused yours to fail, I can't tell you what mine does differently (assuming it also doesn't fail your test conditions).
If you are allowed, this problem might benefit from a tree data structure.
First, define the three possible cases of block:
1) Cube - there is only one possible option for orientation, since every orientation results in the same height (applied toward total height) and the same footprint (applied to the restriction that the footprint of each block is completely contained by the block below it).
2) Square Rectangle - there are three possible orientations for this rectangle with two equal dimensions (for example, a 4x4x1 or a 4x4x7 would both fit this).
3) All Different Dimensions - there are six possible orientations for this shape, where each side is different from the rest.
For the first box, choose how many orientations its shape allows, and create corresponding nodes at the first level (a root node with zero height will allow using simple binary trees, rather than requiring a more complicated type of tree that allows multiple elements within each node). Then, for each orientation, choose how many orientations the next box allows but only create nodes for those that are valid for the given orientation of the current box. If no orientations are possible given the orientation of the current box, remove that entire unique branch of orientations (the first parent node with multiple valid orientations will have one orientation removed by this pruning, but that parent node and all of its ancestors will be preserved otherwise).
By doing this, you can check for sets of boxes that have no solution by checking whether there are any elements below the root node, since an empty tree indicates that all possible orientations have been pruned away by invalid combinations.
If the tree is not empty, then just walk the tree to find the highest sum of heights within each branch of the tree, recursively up the tree to the root. That sum is your maximum height, as in the following pseudocode:
std::size_t maximum_height() const{
// a missing child contributes nothing, so a leaf returns just its own height
std::size_t leftheight = leftnode ? leftnode->maximum_height() : 0;
std::size_t rightheight = rightnode ? rightnode->maximum_height() : 0;
// keep the taller of the two branches and add this node's box
return this_node_box_height + std::max(leftheight, rightheight);
}
The benefits of using a tree data structure are
1) You will greatly reduce the number of possible combinations you have to store and check, because in a tree, the invalid orientations will be eliminated at the earliest possible point - for example, using your 2x2x5 first box, with three possible orientations (as a Square Rectangle), only two orientations are possible because there is no possible way to orient it on its 2x2 end and still fit the 4x3x1 block on it. If on average only two orientations are possible for each block, you will need a much smaller number of nodes than if you compute every possible orientation and then filter them as a second step.
2) Detecting sets of blocks where there is no solution is much easier, because the data structure will only contain valid combinations.
3) Working with the finished tree will be much easier - for example, to find the sequence of orientations of the highest tower, rather than just its height, you could pass an empty std::vector to a modified maximum_height() implementation, and let it append the orientation of each highest node as it walks the tree, in addition to returning the height.
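Since a box only has to fit on the one directly below it, the same pruning can also be sketched as a table with six states per box, which is essentially the DP from the question. The sketch below differs from the question's code in two ways. It skips previous orientations that are unreachable: the original adds memoization[k][i-1] even while it is still -1, which may be one cause of the wrong answers. It also keeps only the previous row, so there is no large array on the stack. The names Box and tallestTower are hypothetical, and "fits" is taken to mean less than or equal, as in the question's canStack.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

using Box = std::array<int, 3>;

// Returns the maximum tower height with the boxes stacked in the given order,
// or -1 if some box cannot be placed. An empty input gives 0.
long long tallestTower(const std::vector<Box>& boxes) {
    // Each row picks which dimensions act as (width, depth, height).
    static const int perm[6][3] = {{0, 1, 2}, {1, 0, 2}, {0, 2, 1},
                                   {2, 0, 1}, {1, 2, 0}, {2, 1, 0}};
    if (boxes.empty()) return 0;

    // best[j]: tallest valid tower so far whose top box uses orientation j, or -1.
    std::array<long long, 6> best{};
    for (int j = 0; j < 6; ++j) best[j] = boxes[0][perm[j][2]];  // the ground fits any face

    for (std::size_t i = 1; i < boxes.size(); ++i) {
        std::array<long long, 6> cur;
        cur.fill(-1);
        for (int j = 0; j < 6; ++j) {
            for (int k = 0; k < 6; ++k) {
                if (best[k] < 0) continue;  // previous orientation unreachable
                bool fits = boxes[i][perm[j][0]] <= boxes[i - 1][perm[k][0]] &&
                            boxes[i][perm[j][1]] <= boxes[i - 1][perm[k][1]];
                if (fits) cur[j] = std::max(cur[j], best[k] + boxes[i][perm[j][2]]);
            }
        }
        best = cur;
    }
    return *std::max_element(best.begin(), best.end());
}
```

For the sample input (5 2 2 followed by 1 3 4) this returns 6: the first box lies on a 5x2 face with height 2, and the second stands on its 3x1 face with height 4.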

Combining overlapping groups in an image

I am using opencv_contrib to detect textual regions in an image.
This is the original image
This is the image after textual regions are found:
As can be seen, there are overlapping groups in the image. For example, there seem to be two groups around Hello World and two around Some more sample text
Question
In scenarios like these, how can I keep the widest possible box by merging the two boxes? For these examples that would be one starting with H and ending in d so that it covers Hello World. My reason for doing this is that I would like to crop part of this image and send it to tesseract.
Here is the relevant code that draws the boxes.
void groups_draw(Mat &src, vector<Rect> &groups)
{
for (int i=(int)groups.size()-1; i>=0; i--)
{
if (src.type() == CV_8UC3)
rectangle(src,groups.at(i).tl(),groups.at(i).br(),Scalar( 0, 255, 255 ), 2, 8 );
}
}
Here is what I've tried. My ideas are in comments.
void groups_draw(Mat &src, vector<Rect> &groups)
{
int previous_tl_x = 0;
int previous_tl_y = 0;
int prevoius_br_x = 0;
int previous_br_y = 0;
//sort the groups from lowest to largest.
for (int i=(int)groups.size()-1; i>=0; i--)
{
//if previous_tl_x is smaller than current_tl_x then keep the current one.
//if previous_br_x is smaller than current_br_x then keep the current one.
if (src.type() == CV_8UC3) {
//crop the image
Mat cropedImage = src(Rect(Point(groups.at(i).tl().x, groups.at(i).tl().y),Point(groups.at(i).br().x, groups.at(i).br().y)));
imshow("cropped",cropedImage);
waitKey(-1);
}
}
}
Update
I'm trying to use groupRectangles to accomplish this:
void groups_draw(Mat &src, vector<Rect> &groups)
{
vector<Rect> rects;
for (int i=(int)groups.size()-1; i>=0; i--)
{
rects.push_back(groups.at(i));
}
groupRectangles(rects, 1, 0.2);
}
However, this is giving me an error:
textdetection.cpp:106:5: error: use of undeclared identifier 'groupRectangles'
groupRectangles(rects, 1, 0.2);
^
1 error generated.
First, the reason you get overlapping bounding boxes is that the text detector module is working on inverted channels (e.g: gray and inverted gray) and because of that the inner regions of some characters such as o's and g's are wrongly detected and grouped as characters. So if you want to detect only one mode of text (white text on dark background) just pass the inverted channels.
Replace:
for (int c = 0; c < cn-1; c++)
channels.push_back(255-channels[c]);
With:
for (int c = 0; c < cn-1; c++)
channels[c] = (255-channels[c]);
Now for your question, rectangles have defined intersection and combining operators:
rect = rect1 & rect2 (rectangle intersection)
rect = rect1 | rect2 (minimum area rectangle containing rect1 and rect2)
rect &= rect1, rect |= rect1 (and the corresponding augmenting operations)
You can use those operators while iterating over rectangles to detect intersected rectangles and combine them, as follows:
if ((rect1 & rect2).area() != 0)
rect1 |= rect2;
Edit:
First, sort rectangle groups by area from largest to smallest:
std::sort(groups.begin(), groups.end(),
[](const cv::Rect &rect1, const cv::Rect &rect2) -> bool {return rect1.area() > rect2.area();});
Then, iterate over the rectangles, when two rectangles intersect add the smaller to the larger and then delete it:
for (int i = 0; i < groups.size(); i++)
{
for (int j = i + 1; j < groups.size(); j++)
{
if ((groups[i] & groups[j]).area() != 0)
{
groups[i] |= groups[j];
groups.erase(groups.begin() + j--);
}
}
}
One approach would be to compare every rectangle with every other rectangle to see if they overlap or intersect. If they overlap by a sufficient amount, you can combine them into one larger rectangle.
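One detail worth keeping in mind with pairwise merging is that a rectangle which has just grown may now overlap rectangles it was already compared against, so a single pass can leave overlaps behind. A sketch that rescans after every merge is shown below. To keep it compilable without OpenCV it uses a hypothetical Box struct and helper functions (overlaps, unite, mergeOverlapping) in place of cv::Rect and its & and | operators; none of these names come from OpenCV.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cv::Rect so the sketch compiles without OpenCV.
struct Box {
    int x, y, width, height;
};

// True when the two boxes share a region of non-zero area.
bool overlaps(const Box& a, const Box& b) {
    return a.x < b.x + b.width && b.x < a.x + a.width &&
           a.y < b.y + b.height && b.y < a.y + a.height;
}

// Smallest box containing both a and b.
Box unite(const Box& a, const Box& b) {
    int x0 = std::min(a.x, b.x);
    int y0 = std::min(a.y, b.y);
    int x1 = std::max(a.x + a.width, b.x + b.width);
    int y1 = std::max(a.y + a.height, b.y + b.height);
    return {x0, y0, x1 - x0, y1 - y0};
}

// Merges overlapping boxes until no two remaining boxes overlap.
std::vector<Box> mergeOverlapping(std::vector<Box> boxes) {
    bool merged = true;
    while (merged) {
        merged = false;
        for (std::size_t i = 0; i < boxes.size() && !merged; ++i) {
            for (std::size_t j = i + 1; j < boxes.size(); ++j) {
                if (overlaps(boxes[i], boxes[j])) {
                    boxes[i] = unite(boxes[i], boxes[j]);
                    boxes.erase(boxes.begin() + j);
                    merged = true;  // the grown box may reach others: rescan from the start
                    break;
                }
            }
        }
    }
    return boxes;
}
```

Restarting after each merge is simple rather than fast, which should be acceptable for the handful of text groups in an image like this one.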

C++ R-Tree Library - Do I have to compute the bounding boxes?

I'm trying to use the following library here (the templated version) but in the example shown in the library the user defines the bounding boxes. In my problem I have data of unknown dimensionality each time, so I don't know how to use it. Apart from this, shouldn't the R-Tree be able to calculate the bounding boxes each time there is an insertion?
This is the sample code of the library, as you can see the user defines the bounding boxes each time:
#include <stdio.h>
#include "RTree.h"
struct Rect
{
Rect() {}
Rect(int a_minX, int a_minY, int a_maxX, int a_maxY)
{
min[0] = a_minX;
min[1] = a_minY;
max[0] = a_maxX;
max[1] = a_maxY;
}
int min[2];
int max[2];
};
struct Rect rects[] =
{
Rect(0, 0, 2, 2), // xmin, ymin, xmax, ymax (for 2 dimensional RTree)
Rect(5, 5, 7, 7),
Rect(8, 5, 9, 6),
Rect(7, 1, 9, 2),
};
int nrects = sizeof(rects) / sizeof(rects[0]);
Rect search_rect(6, 4, 10, 6); // search will find above rects that this one overlaps
bool MySearchCallback(int id, void* arg)
{
printf("Hit data rect %d\n", id);
return true; // keep going
}
void main()
{
RTree<int, int, 2, float> tree;
int i, nhits;
printf("nrects = %d\n", nrects);
for(i=0; i<nrects; i++)
{
tree.Insert(rects[i].min, rects[i].max, i); // Note, all values including zero are fine in this version
}
nhits = tree.Search(search_rect.min, search_rect.max, MySearchCallback, NULL);
printf("Search resulted in %d hits\n", nhits);
// Iterator test
int itIndex = 0;
RTree<int, int, 2, float>::Iterator it;
for( tree.GetFirst(it);
!tree.IsNull(it);
tree.GetNext(it) )
{
int value = tree.GetAt(it);
int boundsMin[2] = {0,0};
int boundsMax[2] = {0,0};
it.GetBounds(boundsMin, boundsMax);
printf("it[%d] %d = (%d,%d,%d,%d)\n", itIndex++, value, boundsMin[0], boundsMin[1], boundsMax[0], boundsMax[1]);
}
// Iterator test, alternate syntax
itIndex = 0;
tree.GetFirst(it);
while( !it.IsNull() )
{
int value = *it;
++it;
printf("it[%d] %d\n", itIndex++, value);
}
getchar(); // Wait for keypress on exit so we can read console output
}
An example of what I want to save in an R-Tree is:
-------------------------------
| ID | dimension1 | dimension2|
-------------------------------
| 1 | 8 | 9 |
| 2 | 3 | 5 |
| 3 | 2 | 1 |
| 4 | 6 | 7 |
-------------------------------
Dimensionality
There will be some limit in your requirements to the dimensionality. This is because computers only have finite storage, so they cannot store an infinite number of dimensions. Really it is a decision for you how many dimensions you wish to support. The most common numbers of course are two and three. Do you actually need to support eleven? When are you going to use it?
You can do this either by always using an R-tree with the maximum number you support, and passing zero as the other coordinates, or preferably you would create several code paths, one for each supported number of dimensions. I.e. you would have one set of routines for two-dimensional data and another for three dimensional, and so on.
Calculating the bounding box
The bounding box is the rectangle or cuboid which is aligned to the axes, and completely surrounds the object you wish to add.
So if you are inserting axis-aligned rectangles/cuboids etc, then the shape is the bounding box.
If you are inserting points, the min and max of each dimension are just the point value of that dimension.
Any other shape, you have to calculate the bounding box. E.g. if you are inserting a triangle, you need to calculate the rectangle which completely surrounds the triangle as the bounding box.
The library can't do this for you because it doesn't know what you are inserting. You might be inserting spheres stored as centre + radius, or complex triangle mesh shapes. The R-Tree can provide the spatial index but needs you to provide that little bit of information to fill in the gaps.
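For shapes that are described by a set of points (a single point, a triangle, a mesh), computing the box is a per-dimension min/max. A minimal sketch, with the hypothetical names Bounds and boundingBox and the number of dimensions D fixed at compile time to match how the tree is declared:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Axis-aligned box in D dimensions.
template <std::size_t D>
struct Bounds {
    std::array<int, D> min;
    std::array<int, D> max;
};

// Bounding box of a non-empty set of D-dimensional points.
template <std::size_t D>
Bounds<D> boundingBox(const std::vector<std::array<int, D>>& points) {
    assert(!points.empty());
    // Start with a degenerate box at the first point, then grow it.
    Bounds<D> b{points.front(), points.front()};
    for (const auto& p : points) {
        for (std::size_t d = 0; d < D; ++d) {
            if (p[d] < b.min[d]) b.min[d] = p[d];
            if (p[d] > b.max[d]) b.max[d] = p[d];
        }
    }
    return b;
}
```

For a row of your table such as ID 1 with values (8, 9), the point is the only input, so min and max both come out as (8, 9). The result could then be handed to the tree along the lines of tree.Insert(b.min.data(), b.max.data(), id), assuming Insert accepts plain arrays as in the sample above.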