Sliding window with ROI in C++

I have a little indexing problem with the following for loop. I am simply scanning the image with an ROI, but I am not able to scan the whole image: some unscanned regions are left in the last rows and columns. Any suggestions?
Sorry for the simple question.
// Sliding window for scanning the image
for (int rowIndex = 0; rowIndex <= lBPIIImage2.rows - roih; rowIndex = getNextIndex(rowIndex, lBPIIImage2.rows, roih, steprow))
{
    for (int colindex = 0; colindex <= lBPIIImage2.cols - roiw; colindex = getNextIndex(colindex, lBPIIImage2.cols, roiw, stepcol))
    {
        searchRect = cvRect(colindex, rowIndex, roiw, roih);
        frameSearchRect = lBPIIImage2(searchRect);
        LoopDummy = frameSearchRect.clone();
        rectangle(frame, searchRect, CV_RGB(255, 0, 0), 1, 8, 0);
        //normalize(LoopDummy, LoopDummy, 0, 255, NORM_MINMAX, CV_8UC1);
        //imshow("Track", LoopDummy);
        //waitKey(30);
        images.push_back(LoopDummy);
        Coordinate.push_back(make_pair(rowIndex, colindex));
    }
}

Instead of this
for (int rowindex = 0; rowindex <= frameWidth - roiw; rowindex += StepRow)
try something like this
for (int rowIndex = 0; rowIndex < frameWidth - roiw;
     rowIndex = getNextIndex(rowIndex, frameWidth, roiw, StepRow))
where the function getNextIndex looks like this:
int getNextIndex(int currentIndex, int frameSize, int roi, int step)
{
    int temp = min(currentIndex + step, frameSize - roi - 1);
    return currentIndex != temp ? temp : frameSize;
}
Note that I replaced <= with < in the condition part of the loop. It makes sense to me to change that bit. In my comments the index values should be 0, 5, 6 instead of 0, 5, 7. If it really should be <=, then just remove the - 1 part above.
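To sanity-check the sequence of indices this produces, here is a small driver (the frameSize = 12, roi = 5, step = 5 values are just assumed for illustration); it should print 0, 5, 6:

#include <algorithm>
#include <iostream>

// same function as above, repeated so the example is self-contained
int getNextIndex(int currentIndex, int frameSize, int roi, int step)
{
    // advance by step, but clamp to the last valid start position;
    // once the index stops moving, jump past the end to terminate the loop
    int temp = std::min(currentIndex + step, frameSize - roi - 1);
    return currentIndex != temp ? temp : frameSize;
}

int main()
{
    const int frameSize = 12, roi = 5, step = 5;
    for (int i = 0; i < frameSize - roi; i = getNextIndex(i, frameSize, roi, step))
        std::cout << i << '\n'; // prints 0, 5, 6
}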

Dialecticus has your problem covered, but I'd like to expand a little. The problem, as you've noticed, affects both rows and columns, but I'll only talk about width since height handling is analogous.
Parts of an array or range not being iterated over by a loop, especially at the end, are a typical symptom that the loop's condition and/or increment do not cover the entire array/range and need adjustment. In your case, to scan the entire image width, you'd need frameWidth - roiw to be an exact multiple of StepRow, i.e. (frameWidth - roiw) % StepRow == 0. I strongly suspect that's not true in your case.
For example, if your image is 14 pixels wide and your region-of-interest width is 8, in which case your step would be 4 pixels (I assume you meant roiw * 0.5), then this would happen:
Iteration #1: rowindex = 0, scanning pixels 0:7.
Iteration #2: rowindex = 4, scanning pixels 4:11.
Iteration #3: rowindex = 8 would fail the rowindex <= frameWidth - roiw test (8 > 6) and exit the loop without scanning the last 2 pixels.
I see three options, each with its disadvantages:
Disclaimer: All code below is not tested and may contain typos and logical errors. Please test and adjust accordingly.
Change the offset of the last region so it fits the end of the image. In this case, the last rowindex increment would not be StepRow but the number of pixels between the end of your last scanning region and the end of the image. The drawback is that the last two regions may overlap more than those before them - in our case, region 1 would cover pixels 0:7, region 2 4:11, and region 3 6:13.
To do this, you can modify your loop increment to check whether this is the last iteration (please think of how to write it more prettily, I just used the ternary operator to keep the change small):
for (int rowindex = 0;
     /* Note that a region of (roiw) pixels ending at (frameWidth - 1) should start
        at (frameWidth - roiw), so the <= operator should be correct. To verify that,
        check the single-region case where rowindex == 0 and frameWidth == roiw. */
     rowindex <= frameWidth - roiw;
     /* The max(1, ...) guards against a zero increment - and thus an infinite
        loop - once the last region already touches the end of the image. */
     rowindex += ((rowindex + roiw + StepRow <= frameWidth) ?
                  StepRow :
                  max(1, frameWidth - (rowindex + roiw))))
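As a quick, untested sanity check with the example numbers from above, this standalone loop should print the regions 0:7, 4:11 and 6:13:

#include <algorithm>
#include <iostream>

int main()
{
    const int frameWidth = 14, roiw = 8, StepRow = 4;
    for (int rowindex = 0;
         rowindex <= frameWidth - roiw;
         rowindex += ((rowindex + roiw + StepRow <= frameWidth)
                          ? StepRow
                          : std::max(1, frameWidth - (rowindex + roiw))))
    {
        // each region covers roiw pixels starting at rowindex
        std::cout << rowindex << ':' << rowindex + roiw - 1 << '\n';
    }
}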
Clamp the size of the last region to the end of the image. The drawback would be that your last region is smaller than those before it - in our case, regions 1 and 2 would have a size of 8, while region 3 would be 6 pixels.
To do this, you can put a check when constructing the cvRect to see if it needs to be smaller. You should also change your loop condition somehow, so that you actually enter it in this case:
/* The (StepRow - 1) term should make sure the loop is entered when the
   last unscanned region has a width between 1 and StepRow, inclusive */
for (int rowindex = 0; rowindex <= frameWidth - roiw + (StepRow - 1); rowindex += StepRow)
    /* ... column loop ... */
    {
        /* use a local copy so the clamped width does not persist into later iterations */
        int currentRoiw = roiw;
        if (rowindex > frameWidth - roiw)
        {
            currentRoiw = frameWidth - rowindex;
        }
        searchRect = cvRect(rowindex, colindex, currentRoiw, roih);
Use a varying step so that your regions are of equal width and overlap as equally as possible. In our case, the three regions would overlap equally, covering pixels 0:7, 3:10 and 6:13, but obviously it won't work out that neatly in most cases.
To do that, you can calculate how many iterations you expect to make with your current step, and then divide the difference between your frame and ROI widths by that number, which gives you a fractional step. Since a pixel offset is an integer, you need an integer value to construct the cvRect. I suggest keeping and incrementing the rowindex and step as floats and converting to int just before use in the cvRect constructor - that way, your index moves as evenly as possible.
Note that I've used an epsilon value to (hopefully) avoid rounding errors in the float-to-int conversion when the float value is very close to an int. See this blog post and its comments for an example and an explanation of the technique. However, keep in mind that the code is not tested and there may still be rounding errors, especially if your int values are big!
/* To see how these calculations work out, experiment with
   roiw == 8, StepRow == 4 and frameWidth == 14, 15, 16, 17 */
/* N.B.: Does not cover the case when frameWidth < roiw! */
int stepRowCount = (frameWidth - roiw + (StepRow - 1)) / StepRow + 1;
float step = ((float)(frameWidth - roiw)) / (stepRowCount - 1);
float eps = 0.005f;
/* note: the increment is the fractional step, not StepRow */
for (float rowindex = 0.0f; (int)(rowindex + eps) <= frameWidth - roiw; rowindex += step)
    /* ... column loop ... */
    {
        searchRect = cvRect((int)(rowindex + eps), colindex, roiw, roih);
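Again as an untested check of those calculations (using the values suggested in the comment above), this driver should print start indices 0, 3, 6 for width 14; 0, 3, 7 for 15; 0, 4, 8 for 16; and 0, 3, 6, 9 for 17:

#include <iostream>

int main()
{
    const int roiw = 8, StepRow = 4;
    for (int frameWidth = 14; frameWidth <= 17; frameWidth++)
    {
        int stepRowCount = (frameWidth - roiw + (StepRow - 1)) / StepRow + 1;
        float step = ((float)(frameWidth - roiw)) / (stepRowCount - 1);
        float eps = 0.005f;
        std::cout << "width " << frameWidth << ':';
        for (float rowindex = 0.0f; (int)(rowindex + eps) <= frameWidth - roiw; rowindex += step)
            std::cout << ' ' << (int)(rowindex + eps);
        std::cout << '\n';
    }
}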
P.S.: Shouldn't the index that iterates over the image width be called colindex, and vice versa? In a 1920x1080 image you have 1920 columns and 1080 rows (hence terms like 1080p).

Related

How to determine pixel intensity with respect to pixel range in x-axis?

I want to see the distribution of a color with respect to image width. That is, if a (black and white) image has a width of 720 px, I want to conclude that a specific range (e.g. pixels [500,720]) contains more white than the rest of the image. What I thought is: I need a slice of the image of 720x1 px, then I need to check the values and distribute them with respect to the width of 720 px. But I don't know how to apply this in a suitable way.
edit: I use OpenCV 4.0.0 with C++.
Example case: in the first image, it is obvious that the right-hand-side pixels are white. I want to get approximate coordinates of this dense line or zone. The light pink zone is what I am interested in, and the red borders are the range where I want to find it.
If you want to find the minimum continuous range of image columns that contains more white than the rest of the image, then you first need to calculate the number of white pixels in each column. Let's assume we have a 720x500 image (500 pixels high and 720 pixels wide). Then you get an array Arr of 720 elements, where each element holds the number of white pixels in the corresponding (1x500) column.
const int Width = img.cols;
std::vector<int> Arr(Width, 0); // a vector avoids the leak of a raw new[]
for (int x = 0; x < Width; x++) {
    for (int y = 0; y < img.rows; y++) {
        if (img.at<cv::Vec3b>(y, x) == cv::Vec3b(255, 255, 255)) {
            Arr[x]++;
        }
    }
}
You then need to find the minimum range [A;B] in this array that satisfies the condition Sum(Arr[0 to A-1]) + Sum(Arr[B+1 to Width-1]) < Sum(Arr[A to B]).
// the minimum range width is guaranteed to be less than or equal to (Width/2 + 1)
int bestA = 0, minimumWidth = Width/2 + 1;
int total = RangeSum(Arr, 0, Width - 1);
for (int i = 0; i < Width; i++) {
    for (int j = i; j < Width && j < i + minimumWidth; j++) {
        int rangeSum = RangeSum(Arr, i, j);
        if (rangeSum > total - rangeSum) {
            bestA = i;
            minimumWidth = j - i + 1;
            break;
        }
    }
}
std::cout << "Most white minimum range - [" << bestA << ";" << bestA + minimumWidth - 1 << "]\n";
You can optimize the code if you precalculate the prefix sums over [0; i] for every i from 0 to Width - 1. Then you can calculate RangeSum(Arr, A, B) as PrecalculatedSums[B] - PrecalculatedSums[A-1] (treating PrecalculatedSums[-1] as 0), in O(1) time.
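A minimal sketch of that optimization (this RangeSum helper over prefix sums is my own illustration, not code from the answer):

#include <vector>

// P[i] holds Sum(Arr[0 .. i-1]), so P has Width + 1 entries and P[0] == 0.
std::vector<long long> buildPrefixSums(const std::vector<int>& Arr) {
    std::vector<long long> P(Arr.size() + 1, 0);
    for (size_t i = 0; i < Arr.size(); i++)
        P[i + 1] = P[i] + Arr[i];
    return P;
}

// Sum of Arr[A .. B], inclusive, in O(1).
long long RangeSum(const std::vector<long long>& P, int A, int B) {
    return P[B + 1] - P[A];
}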

Dynamic programming state calculations

Question:
Fox Ciel is writing an AI for the game Starcraft and she needs your help.
In Starcraft, one of the available units is a mutalisk. Mutalisks are very useful for harassing Terran bases. Fox Ciel has one mutalisk. The enemy base contains one or more Space Construction Vehicles (SCVs). Each SCV has some amount of hit points.
When the mutalisk attacks, it can target up to three different SCVs.
The first targeted SCV will lose 9 hit points.
The second targeted SCV (if any) will lose 3 hit points.
The third targeted SCV (if any) will lose 1 hit point.
If the hit points of an SCV drop to 0 or lower, the SCV is destroyed. Note that you may not target the same SCV twice in the same attack.
You are given an int[] HP containing the current hit points of your enemy's SCVs. Return the smallest number of attacks in which you can destroy all these SCVs.
Constraints:
- HP will contain between 1 and 3 elements, inclusive.
- Each element in HP will be between 1 and 60, inclusive.
And the solution is:
int minimalAttacks(vector<int> x)
{
    int dist[61][61][61];
    memset(dist, -1, sizeof(dist));
    dist[0][0][0] = 0;
    for (int total = 1; total <= 180; total++) {
        for (int i = 0; i <= 60 && i <= total; i++) {
            for (int j = max(0, total - i - 60); j <= 60 && i + j <= total; j++) {
                // j >= max(0, total - i - 60) ensures that k <= 60
                int k = total - (i + j);
                // res is a reference: writing to it writes into dist[i][j][k]
                int & res = dist[i][j][k];
                res = 1000000;
                // one way to avoid doing repetitive work when enumerating
                // all options is to use C++'s next_permutation;
                // we first create a vector:
                vector<int> curr = {i, j, k};
                sort(curr.begin(), curr.end()); // needs to be sorted
                // which will be permuted
                do {
                    int ni = max(0, curr[0] - 9);
                    int nj = max(0, curr[1] - 3);
                    int nk = max(0, curr[2] - 1);
                    // every attack strictly lowers the total, so
                    // dist[ni][nj][nk] was filled in an earlier pass
                    res = std::min(res, 1 + dist[ni][nj][nk]);
                } while (next_permutation(curr.begin(), curr.end()));
            }
        }
    }
    // get the case's respective hit points:
    while (x.size() < 3) {
        x.push_back(0); // add zeros for missing SCVs
    }
    int a = x[0], b = x[1], c = x[2];
    return dist[a][b][c];
}
As far as I understand, this solution first calculates the best outcome for every possible state, then simply looks up the queried position and returns the result. But I don't understand the way this code is written. I can't see anywhere that the dist[i][j][k] values are edited; by default they are -1. So how come when I query any dist[i][j][k] I get a different value?
Can someone explain the code to me, please?
Thank you!

How to filter lines of a given width in an image?

I need to filter lines of a given width in an image.
I am writing a program that will detect lines in a road image. I found something that works, but I can't understand its logic. My function has to do the following:
I will pass in an image and a line width in pixels (e.g. 30 pixels wide), and the function will filter just those lines in the image.
I found this code:
void filterWidth(Mat quad, Mat quadDst, int tau) // tau = width of the line I want to filter
{
    int aux = 0;
    for (int j = 0; j < quad.rows; ++j)
    {
        unsigned char *ptRowSrc = quad.ptr<uchar>(j);
        unsigned char *ptRowDst = quadDst.ptr<uchar>(j);
        for (int i = tau; i < quad.cols - tau; ++i)
        {
            if (ptRowSrc[i] != 0)
            {
                aux = 2 * ptRowSrc[i];
                aux += -ptRowSrc[i - tau];
                aux += -ptRowSrc[i + tau];
                aux += -abs((int)(ptRowSrc[i - tau] - ptRowSrc[i + tau]));
                aux = (aux < 0) ? (0) : (aux);
                aux = (aux > 255) ? (255) : (aux);
                ptRowDst[i] = (unsigned char)aux;
            }
        }
    }
}
What is the mathematical explanation of that code? And how does that work?
Read up on convolution filters. This code is a particular case of a one-dimensional convolution filter (it only convolves with other pixels on the currently processed line).
The value of aux starts at 2 * the current pixel value, then the pixels at distance tau on either side of it are subtracted from that value. Next, the absolute difference of those two pixels is also subtracted from it. Finally, it is clamped to the range 0...255 before being stored in the output image.
If you have an image row (and tau = 2):
0011100
this convolution will cause the centre 1 to gain the value:
2 * 1
- 0
- 0
- abs(0 - 0)
= 2
The first '1' will become:
2 * 1
- 0
- 1
- abs(0 - 1)
= 0
And so will the third '1' (it's a mirror image).
And of course the 0 values will always stay zero or become negative, which will be capped back to 0.
This is a rather weird filter. It takes the pixel values three at a time on the same line, with a spacing of tau. Let these values be Vl, V and Vr.
The filter computes -Vl + 2V - Vr, which can be seen as a second derivative, and deducts |Vl - Vr|, which can be seen as a first derivative (also called a gradient). The second derivative gives a maximum response for a peak configuration (Vl < V > Vr); the first derivative gives a minimum response for a symmetric configuration (Vl = Vr).
So the overall filter will give a maximum response for a symmetric maximum (like a light road on a dark background, vertical, with a width less than 2·tau).
By rearranging the terms, you can see that the filter also equals twice the smaller of the left and right gradients, V - Vl and V - Vr (clamped to zero).
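As a quick, untested check of that identity on the 0011100 row above (tau = 2; this snippet is my own illustration, not code from the answer), both formulations should print 0, 2, 0:

#include <algorithm>
#include <cstdio>
#include <cstdlib>

int main()
{
    const int row[] = {0, 0, 1, 1, 1, 0, 0}, tau = 2;
    for (int i = tau; i < 7 - tau; i++) {
        int Vl = row[i - tau], V = row[i], Vr = row[i + tau];
        // the filter as written in the code above, clamped at zero
        int direct = std::max(0, 2 * V - Vl - Vr - std::abs(Vl - Vr));
        // the rearranged form: twice the smaller one-sided gradient
        int viaMin = std::max(0, 2 * std::min(V - Vl, V - Vr));
        std::printf("i=%d direct=%d viaMin=%d\n", i, direct, viaMin);
    }
}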

efficient way of accessing opencv Mat elements

I'm trying to play around with some OpenCV and thought up an interesting little scenario to work on.
Basically, I want to take a pixel, add the colour values from its 3 neighbouring pixels (so (x, y), (x+1, y), (x, y+1) and (x+1, y+1)) and divide the result by 4 to get an average colour value. The next set of pixels I process is then (x+2, y+2) with its 3 neighbours.
I then also want to be able to do a similar thing with 9 pixels (with the chosen coordinate as the centre).
Initially I started with a Gaussian-blur-type masking, but that's not the result I want to achieve: from those calculations I just want to get 1 pixel value, so the output image will be 1/4 or 1/9 of the size. For now I've got it working by literally writing out the calculation in a for loop:
for (int i = 1; i < myImage.rows - 1; i++)
{
    b = 0;
    for (int k = 1; k < myImage.cols - 1; k++)
    {
        // 9 pixel (3x3) neighbourhood
        Result.at<Vec3b>(a, b)[1] = (myImage.at<Vec3b>(i-1, k-1)[1] + myImage.at<Vec3b>(i-1, k)[1] + myImage.at<Vec3b>(i-1, k+1)[1]
                                   + myImage.at<Vec3b>(i,   k-1)[1] + myImage.at<Vec3b>(i,   k)[1] + myImage.at<Vec3b>(i,   k+1)[1]
                                   + myImage.at<Vec3b>(i+1, k-1)[1] + myImage.at<Vec3b>(i+1, k)[1] + myImage.at<Vec3b>(i+1, k+1)[1]) / 9;
        Result.at<Vec3b>(a, b)[2] = (myImage.at<Vec3b>(i-1, k-1)[2] + myImage.at<Vec3b>(i-1, k)[2] + myImage.at<Vec3b>(i-1, k+1)[2]
                                   + myImage.at<Vec3b>(i,   k-1)[2] + myImage.at<Vec3b>(i,   k)[2] + myImage.at<Vec3b>(i,   k+1)[2]
                                   + myImage.at<Vec3b>(i+1, k-1)[2] + myImage.at<Vec3b>(i+1, k)[2] + myImage.at<Vec3b>(i+1, k+1)[2]) / 9;
        Result.at<Vec3b>(a, b)[0] = (myImage.at<Vec3b>(i-1, k-1)[0] + myImage.at<Vec3b>(i-1, k)[0] + myImage.at<Vec3b>(i-1, k+1)[0]
                                   + myImage.at<Vec3b>(i,   k-1)[0] + myImage.at<Vec3b>(i,   k)[0] + myImage.at<Vec3b>(i,   k+1)[0]
                                   + myImage.at<Vec3b>(i+1, k-1)[0] + myImage.at<Vec3b>(i+1, k)[0] + myImage.at<Vec3b>(i+1, k+1)[0]) / 9;
        // 4 pixel radius
        // Result.at<Vec3b>(a, b)[1] = (myImage.at<Vec3b>(i, k)[1] + myImage.at<Vec3b>(i + 1, k)[1] + myImage.at<Vec3b>(i, k + 1)[1] + myImage.at<Vec3b>(i, k - 1)[1] + myImage.at<Vec3b>(i - 1, k)[1]) / 5;
        // Result.at<Vec3b>(a, b)[2] = (myImage.at<Vec3b>(i, k)[2] + myImage.at<Vec3b>(i + 1, k)[2] + myImage.at<Vec3b>(i, k + 1)[2] + myImage.at<Vec3b>(i, k - 1)[2] + myImage.at<Vec3b>(i - 1, k)[2]) / 5;
        // Result.at<Vec3b>(a, b)[0] = (myImage.at<Vec3b>(i, k)[0] + myImage.at<Vec3b>(i + 1, k)[0] + myImage.at<Vec3b>(i, k + 1)[0] + myImage.at<Vec3b>(i, k - 1)[0] + myImage.at<Vec3b>(i - 1, k)[0]) / 5;
        b++;
    }
    a++;
}
Obviously it's possible to set the two options up as separate functions to call, but I'm just wondering if there's a more efficient way of achieving this that would let the size of the mask be changed.
Thanks for any help!
I'm assuming that you want to do this all without built-in functions (like resize, mean, or filter2D) and just want to directly address the image using at. There are further optimizations that can be made, but this is intended as a reasonable and understandable improvement on the original code.
Also, it should be noted that I ignore any extra rows/columns when the image size is not exactly divisible by the scale factor. You'll need to specify the expected behavior if you want something different.
The first thing I'd do is change what you think of as the target pixel. Assume you have a 3x3 neighborhood like so:
1 2 3
4 5 6
7 8 9
We're going to take the mean value of all of these pixels anyway, so whether we call pixel 5 the target or pixel 1 makes no difference to the resulting image. I'm going to call pixel 1 the target because it makes the math cleaner.
The 1 pixel will always be on coordinates divisible by the scaling factor. If the scaling factor is 2, the coordinates of 1 will always be even.
Second, rather than loop over the original image dimensions, which actually results in recalculating the same pixel in Result numerous times, I'm going to loop over the dimensions of Result and figure out which pixels in the original image contribute to each pixel in the result.
So, to find the neighborhood in the original image that corresponds to pixel (x, y) in the result image, we just have to look for pixel 1 of that neighborhood. Since its coordinates are multiples of the scaling factor, it's just
(x * scaleFactor, y * scaleFactor)
Finally, we need to add two more nested loops to iterate over the scaleFactor x scaleFactor window. This is the part that avoids having to type out those long calculations.
In the 3x3 example above, for example, pixel 9 in the neighborhood of (x, y) will be:
(x * scaleFactor + 2, y * scaleFactor + 2)
I also do the mean calculation directly on a vector rather than doing each channel individually. This means that our sums will overflow a uchar, so I use a Vec3i and cast it back to a Vec3b after the division. This is one place where you should consider using the built-in function mean to calculate the average over the window, as it would remove the need for these new loops.
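For example, a sketch of what the body of the two outer loops below might look like with cv::mean (the window geometry is my own; note that cv::Rect takes the x coordinate, i.e. the column, first):

cv::Rect window(k * scaleFactor, i * scaleFactor, scaleFactor, scaleFactor);
cv::Scalar avg = cv::mean(myImage(window));
Result.at<cv::Vec3b>(i, k) = cv::Vec3b(cv::saturate_cast<uchar>(avg[0]),
                                       cv::saturate_cast<uchar>(avg[1]),
                                       cv::saturate_cast<uchar>(avg[2]));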
So, if our original image is myImage, we have:
int scaleFactor = 3;
Mat Result(myImage.rows/scaleFactor, myImage.cols/scaleFactor, // note: cols, not rows, for the width
           myImage.type(), Scalar::all(0));
for (int i = 0; i < Result.rows; i++)
{
    for (int k = 0; k < Result.cols; k++)
    {
        // make the sum an int vector so it can hold
        // values up to scaleFactor x scaleFactor x 255
        Vec3i areaSum = Vec3i(0,0,0);
        for (int m = 0; m < scaleFactor; m++)
        {
            for (int n = 0; n < scaleFactor; n++)
            {
                areaSum += myImage.at<Vec3b>(i*scaleFactor + m, k*scaleFactor + n);
            }
        }
        Result.at<Vec3b>(i,k) = Vec3b(areaSum / (scaleFactor*scaleFactor));
    }
}
Here are a couple of samples...
Original:
scaleFactor = 2:
scaleFactor = 3:
scaleFactor = 5:

detection of the darkest fixed-size square from a picture

I have a picture of 2600x2600 in grayscale.
It can also be seen as a matrix of unsigned short.
I would like to find the darkest (or, by computing the inverse picture, the brightest) square area of a fixed size N. N could be parametrized (and if there is more than one darkest square, I would like all of them).
I read detection-of-rectangular-bright-area-in-a-image-using-opencv
but it needs a threshold value I don't have, and furthermore I am searching for a fixed size.
Does anyone have a way to find it in C++ or Python?
For each row of the image,
add up each run of N consecutive pixels, so you get W - N + 1 sums per row.
For each column of the new image,
for each consecutive sequence of N of those sums (there are H - N + 1 of them),
add them up and compare to the current best.
To add up each consecutive sequence of pixels, you can subtract the pixel leaving the window and add the one entering it (a rolling sum).
You could also reuse the image array as storage, if it can be modified. If not, a memory optimization would be to store just the latest column, and go through it for each step in the first loop.
Runtime: O(w·h)
Here is some code in C# to demonstrate this (ignoring the pixel format and any potential overflows):
List<Point> FindBrightestSquare(int[,] image, int N, out int squareSum)
{
    int width = image.GetLength(0);
    int height = image.GetLength(1);
    squareSum = 0;
    if (width < N || height < N)
    {
        return new List<Point>(); // no N x N square fits in the image
    }

    // Pass 1 (per row): rolling horizontal sums, stored in-place.
    // After this pass, image[x,y] holds the sum of the N original pixels
    // starting at (x,y); the cell about to be replaced is saved into
    // 'overwritten' because its original value is still needed one step later.
    for (int y = 0; y < height; y++)
    {
        int rowSum = 0;
        int overwritten = 0;
        for (int x = 0; x < width; x++)
        {
            rowSum += image[x, y];
            if (x >= N)
            {
                rowSum -= overwritten;
            }
            if (x >= N - 1)
            {
                overwritten = image[x - N + 1, y];
                image[x - N + 1, y] = rowSum; // sum of original pixels [x-N+1 .. x]
            }
        }
    }

    // Pass 2 (per column of row sums): rolling vertical sums. Adding N row
    // sums vertically gives the sum of the N x N square whose top-left
    // corner is (x, y-N+1).
    int? bestSum = null;
    List<Point> bestCandidates = new List<Point>();
    for (int x = 0; x <= width - N; x++)
    {
        int colSum = 0;
        for (int y = 0; y < height; y++)
        {
            colSum += image[x, y];
            if (y >= N)
            {
                colSum -= image[x, y - N];
            }
            if (y >= N - 1)
            {
                if (bestSum == null || colSum > bestSum)
                {
                    bestSum = colSum;
                    bestCandidates.Clear();
                    bestCandidates.Add(new Point(x, y - N + 1));
                }
                else if (colSum == bestSum)
                {
                    bestCandidates.Add(new Point(x, y - N + 1));
                }
            }
        }
    }
    squareSum = bestSum.Value;
    return bestCandidates;
}
You could increment the threshold until you find a square, and use a 2D FSM to detect the square.
This will produce a match in O(width * height * bpp) (binary search on the lowest possible threshold, assuming a power-of-two range):
- set threshold to its maximum value
- for every bit of the threshold
- clear the bit in the threshold
- if there is a match
- record the set of matches as a result
- else
- set the bit
- if nothing was recorded, then the threshold is its maximum value.
to detect a square:
- for every pixel:
- if the pixel is too bright, set its line-len to 0
- else if it's the first column, set its line-len to 1
- else set its line-len to the line-len of the pixel to the left, plus one
- if the pixel line-len is less than N, set its rect-len to 0
- else if it's the first row, set its rect-len to 1
- else set its rect-len to the rect-len of the pixel above, plus one
- if the rect-len is at least N, record a match.
line-len represents the number of consecutive pixels that are dark enough.
rect-len represents the number of consecutive rows of dark pixels that are long enough and aligned.
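A minimal, untested C++ sketch of that square detection at a single threshold (the row-major std::vector layout and all names here are my own assumptions, not from the answer):

#include <utility>
#include <vector>

// Returns the bottom-right corners of all N x N squares whose pixels
// are all <= threshold, in one O(width * height) pass.
std::vector<std::pair<int, int>> findDarkSquares(
    const std::vector<unsigned short>& img, int width, int height,
    int N, unsigned short threshold)
{
    std::vector<int> lineLen(width * height, 0);
    std::vector<int> rectLen(width * height, 0);
    std::vector<std::pair<int, int>> matches;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int i = y * width + x;
            // line-len: consecutive dark-enough pixels ending here on this row
            if (img[i] > threshold)
                lineLen[i] = 0;
            else
                lineLen[i] = (x == 0) ? 1 : lineLen[i - 1] + 1;
            // rect-len: consecutive rows ending here whose line-len is >= N
            if (lineLen[i] < N)
                rectLen[i] = 0;
            else
                rectLen[i] = (y == 0) ? 1 : rectLen[i - width] + 1;
            if (rectLen[i] >= N)
                matches.push_back({x, y}); // bottom-right corner of a match
        }
    }
    return matches;
}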
For video-capture, replace the binary search by a linear search from the threshold for the previous frame.
Obviously you can't do better than Θ(width/N · height/N) even in the best case (as you'll have to rule out every possible position for a darker square), and the bit depth can be assumed constant, so this algorithm is asymptotically optimal for a fixed N. It's probably asymptotically optimal with N as part of the input as well, since (intuitively) you have to consider almost every pixel in the average case.