I'm trying to binarise a picture, having first converted it to grayscale.
My method is to find the maximum and minimum grayscale values, take the value halfway between them as my threshold, and then iterate over all the pixels, comparing each one with the threshold: if its grayscale value is larger than the threshold I put 0 in a matrix, otherwise I put 1.
But now I'm facing a problem. Usually I'm binarising images with a white background, and the rest of my algorithm relies on that. When I get an image with a black background everything falls apart, even though I can still see the number clearly (the 0's and 1's simply switch places).
How can I solve this problem and make my program more general?
Maybe I'd better look for another method of binarization?
P.S. I looked for an understandable explanation of Otsu's threshold method, but it seems either I'm not ready for this level of difficulty or I keep finding overly complicated explanations; either way, I can't write it in C. If anyone could help here, it'd be wonderful.
Sorry for not answering the questions earlier; I just didn't see them.
First, the code:
for (int y = 1; y < Source->Picture->Height; y++)
    for (int x = 1; x < Source->Picture->Width; x++)
    {
        unsigned green = GetGValue(Source->Canvas->Pixels[x][y]);
        unsigned red   = GetRValue(Source->Canvas->Pixels[x][y]);
        unsigned blue  = GetBValue(Source->Canvas->Pixels[x][y]);
        // `threshold` here holds the grey value (luminance) of the current pixel
        threshold = (0.2125 * red + 0.7154 * green + 0.0721 * blue);
        if (min > threshold)
            min = threshold;
        if (max < threshold)
            max = threshold;
    }
middle = (max + min) / 2;
Then I iterate through the image again:
for (int y = 1; y < Source->Picture->Height; y++)
{
    for (int x = 1; x < Source->Picture->Width; x++)
    {
        // `threshold` again holds the grey value of the current pixel,
        // computed exactly as in the first pass
        if (threshold < middle)
        {
            picture[x][y] = 1;
            fprintf(fo, "1");
        } else {
            picture[x][y] = 0;
            fprintf(fo, "0");
        }
    }
    fprintf(fo, "\n");
}
fclose(fo);
So I get a file, something like this:
000000000
000001000
000001000
000011000
000101000
000001000
000001000
000001000
000000000
Here you can see an example of a '1'.
Then I can interpolate it, or do something else (recognition), based on the zeros and ones.
But if the colors are switched, the output for the same digit won't be the same, so the recognition will not work. I wonder if there's an algorithm that can help me out.
I've never heard of Otsu's method, but I understand some of the Wikipedia page, so I'll try to simplify it.
1. Count how many pixels are at each level of darkness.
2. "Guess" a threshold.
3. Calculate the variance of the counts of darkness less than the threshold.
4. Calculate the variance of the counts of darkness greater than the threshold.
5. If the variance of the darker side is greater, guess a darker threshold; otherwise guess a higher threshold. Do this like a binary search so that it terminates.
6. Turn all pixels darker than the threshold black, and the rest white.
Otsu's method is actually "maximizing inter-class variance", but I don't understand that part of the math.
The concept of variance is "how far apart are the values from each other." A low variance means everything is similar. A high variance means the values are far apart. The variance of a rainbow is very high: lots of colors. The variance of the background of Stack Overflow is 0, since it's all perfectly white, with no other colors. Variance is calculated more or less like this:
#include <cmath> // for std::abs

// Variance of the grey values on one side of the threshold.
// counts[i] holds how many pixels have grey value i.
double variance(unsigned int* counts, int size, int threshold, bool above) {
    // this is a quick trick to turn the "upper" case into the lower one and save code:
    // the upper part of the histogram starts at counts + threshold
    if (above) return variance(counts + threshold, size - threshold, size - threshold, false);
    // first we calculate the average value
    unsigned long long atotal = 0;
    unsigned long long acount = 0;
    for (int i = 0; i < threshold; ++i) {
        atotal += counts[i] * i; // number of px times value
        acount += counts[i];
    }
    if (acount == 0) return 0.0; // no pixels on this side
    // finish calculating the average
    double average = double(atotal) / acount;
    // next we calculate the variance
    double vtotal = 0;
    for (int i = 0; i < threshold; ++i) {
        // to do so we get each value's difference from the average
        double t = std::abs(i - average);
        // and square it (I hate mathematicians)
        vtotal += counts[i] * t * t;
    }
    // and return the average of those squared values
    return vtotal / acount;
}
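Since the question specifically asks for something that can be written in C, here is a minimal sketch of the standard Otsu threshold (the "maximize between-class variance" formulation mentioned above) computed over a 256-bin grey-level histogram. The function name otsuThreshold and the assumption that you already have such a histogram are mine, not part of the original post.

// Hypothetical helper: given a 256-bin grey-level histogram, return the
// threshold that maximizes the between-class variance (Otsu's criterion).
int otsuThreshold(const unsigned int hist[256]) {
    unsigned long long total = 0, sumAll = 0;
    for (int i = 0; i < 256; ++i) {
        total  += hist[i];
        sumAll += (unsigned long long)i * hist[i];
    }
    double bestSigma = -1.0;
    int bestT = 0;
    unsigned long long wB = 0, sumB = 0;        // weight and sum of the "dark" class
    for (int t = 0; t < 256; ++t) {
        wB += hist[t];
        if (wB == 0) continue;                  // no dark pixels yet
        unsigned long long wF = total - wB;     // weight of the "bright" class
        if (wF == 0) break;                     // no bright pixels left
        sumB += (unsigned long long)t * hist[t];
        double mB = (double)sumB / wB;              // mean of the dark class
        double mF = (double)(sumAll - sumB) / wF;   // mean of the bright class
        double sigma = (double)wB * (double)wF * (mB - mF) * (mB - mF);
        if (sigma > bestSigma) { bestSigma = sigma; bestT = t; }
    }
    return bestT;   // binarise with: pixel <= bestT ? dark : bright
}

Because the threshold depends only on the shape of the histogram, this behaves the same whether the background is white or black.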
I would tackle this problem with another approach:
Compute the cumulative histogram of the greyscale values of the image.
Use as threshold the grey value at which this cumulative histogram reaches half of the total number of pixels in the image.
The algorithm would go as follows:
int bin[256] = {0};
foreach pixel in image
    bin[pixelValue]++;
endfor // this computes the histogram of the image
int thresholdCount = ImageWidth * ImageHeight / 2;
int count = 0;
for int i = 0 to 255
    count = count + bin[i];
    if (count > thresholdCount)
        threshold = i;
        break; // we are done
    endif
endfor
This algorithm does not compute the cumulative histogram itself but rather uses the image histogram to do what I said earlier.
If your algorithm works properly for white backgrounds but fails for black backgrounds, you simply need to detect when you have a black background and invert the values. If you assume the background value is the more common one, you can simply count the number of 1s and 0s in the result; if the 1s outnumber the 0s, the background came out as 1, so invert the result.
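As a rough illustration, such a post-processing pass could look like the sketch below. The function name and the width/height parameters are assumptions; picture is the 0/1 matrix from the question.

// Hypothetical post-processing step: assume the background is the most common
// value; if it came out as 1, flip everything so the background is 0 again,
// matching the white-background convention above.
void normalizeBackground(int** picture, int width, int height) {
    long ones = 0, zeros = 0;
    for (int x = 0; x < width; ++x)
        for (int y = 0; y < height; ++y)
            if (picture[x][y]) ++ones; else ++zeros;

    if (ones > zeros)                       // background was mapped to 1
        for (int x = 0; x < width; ++x)
            for (int y = 0; y < height; ++y)
                picture[x][y] = 1 - picture[x][y];
}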
Instead of using the mean of the min and max, you should use the median of all points as the threshold. More generally, the kth percentile (where k is the percentage of points you want to come out black) is more appropriate.
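A minimal sketch of the percentile idea, assuming the grey values have already been collected into a vector (the function name and parameters are mine):

#include <algorithm>
#include <cstddef>
#include <vector>

// Pick the k-th percentile of the grey values as the threshold, so that roughly
// k percent of the pixels end up below it (k = 50 gives the median).
unsigned char percentileThreshold(std::vector<unsigned char> grey, double k) {
    if (grey.empty()) return 0;
    std::size_t idx = (std::size_t)(grey.size() * k / 100.0);   // index of the k-th percentile
    if (idx >= grey.size()) idx = grey.size() - 1;
    std::nth_element(grey.begin(), grey.begin() + idx, grey.end());
    return grey[idx];
}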
Another solution is to cluster the data into two clusters.
Related
I have a stack of images in which I want to calculate the mean of each pixel down the stack.
For example, let (x_n,y_n) be the (x,y) pixel in the nth image. Thus, the mean of pixel (x,y) for three images in the image stack is:
mean-of-(x,y) = (1/3) * ((x_1,y_1) + (x_2,y_2) + (x_3,y_3))
My first thought was to load all pixel intensities from each image into a data structure with a single linear buffer like so:
|All pixels from image 1| All pixels from image 2| All pixels from image 3|
To find the sum of a pixel down the image stack, I perform a series of nested for loops like so:
for(int col=0; col<img_cols; col++)
{
for(int row=0; row<img_rows; row++)
{
for(int img=0; img<num_of_images; img++)
{
sum_of_px += px_buffer[(img*img_rows*img_cols)+col*img_rows+row];
}
}
}
Basically img*img_rows*img_cols gives the buffer element of the first pixel in the nth image and col*img_rows+row gives the (x,y) pixel that I want to find for each n image in the stack.
Is there a data structure or algorithm that will help me sum up pixel intensities down an image stack that is faster and more organized than my current implementation?
I am aiming for portability so I will not be using OpenCV and am using C++ on linux.
The problem with the nested loop in the question is that it's not very cache friendly. You go skipping through memory with a long stride, effectively rendering your data cache useless. You're going to spend a lot of time just accessing the memory.
If you can spare the memory, you can create an extra image-sized buffer to accumulate totals for each pixel as you walk through all the pixels in all the images in memory order. Then you do a single pass through the buffer for the division.
Your accumulation buffer may need to use a larger type than you use for individual pixel values, since it has to accumulate many of them. If your pixel values are, say, 8-bit integers, then your accumulation buffer might need 32-bit integers or floats.
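For illustration, a sketch of that accumulation-buffer approach might look like this. The function name and the 8-bit pixel type are assumptions; the buffer layout and the other names follow the question.

#include <cstddef>
#include <cstdint>
#include <vector>

// Sum all images into one image-sized accumulation buffer, walking memory in
// order, then divide once at the end.
std::vector<float> meanOfStack(const std::vector<uint8_t>& px_buffer,
                               int img_rows, int img_cols, int num_of_images) {
    const std::size_t img_size = (std::size_t)img_rows * img_cols;
    std::vector<uint32_t> acc(img_size, 0);     // wider type than the 8-bit pixels

    for (int img = 0; img < num_of_images; ++img) {
        const uint8_t* src = px_buffer.data() + (std::size_t)img * img_size;
        for (std::size_t i = 0; i < img_size; ++i)
            acc[i] += src[i];                   // straight, cache-friendly sweep
    }

    // one final pass for the division
    std::vector<float> mean(img_size);
    for (std::size_t i = 0; i < img_size; ++i)
        mean[i] = (float)acc[i] / num_of_images;
    return mean;
}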
Usually, a stack of pixels
(x_1,y_1),...,(x_n,y_n)
is conditionally independent from a stack
(a_1,b_1),...,(a_n,b_n)
And even if they weren't (for a particular dataset), modeling their interactions is a complex task and would only give you an estimate of the mean. So, if you want to compute the exact mean of each stack, you have no choice but to iterate through the three loops you show. Languages such as Matlab/Octave and libraries such as Theano (Python) or Torch7 (Lua) all parallelize these iterations. If you are using C++, what you are doing is well suited to CUDA or OpenMP. As for portability, I think OpenMP is the easier solution.
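A sketch of what the OpenMP version could look like, keeping the buffer layout from the question (the 8-bit pixel type and function name are my assumptions; compile with -fopenmp or your compiler's equivalent):

#include <cstdint>

// Each output pixel is independent, so the outer loop parallelizes trivially.
void meanOfStackOMP(const uint8_t* px_buffer, float* mean_image,
                    int img_rows, int img_cols, int num_of_images) {
    const long img_size = (long)img_rows * img_cols;
    #pragma omp parallel for
    for (long i = 0; i < img_size; ++i) {
        float s = 0.0f;
        for (int img = 0; img < num_of_images; ++img)
            s += px_buffer[(long)img * img_size + i];
        mean_image[i] = s / num_of_images;
    }
}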
A portable, fast data structure specifically for the average calculation could be:
std::vector<std::vector<std::vector<sometype> > > VoVoV;
VoVoV.resize(img_cols);
int i,j;
for (i=0 ; i<img_cols ; ++i)
{
VoVoV[i].resize(img_rows);
for (j=0 ; j<img_rows ; ++j)
{
VoVoV[i][j].resize(num_of_images);
// The values of all images at this pixel are stored contiguously,
// therefore should be fast to access.
}
}
VoVoV[col][row][img] = foo;
As a side note, 1/3 in your example is integer division and will evaluate to 0, which is not what you want.
For fast summation/averaging you can now do:
sometype sum = 0;
std::vector<sometype>::iterator it = VoVoV[col][row].begin();
std::vector<sometype>::iterator it_end = VoVoV[col][row].end();
for ( ; it != it_end ; ++it)
sum += *it;
sometype avg = sum / num_of_images; // or similar for integers; check for num_of_images==0
Basically, you should not rely on the compiler to optimize away the repeated calculation of the same offsets.
I'm writing some code in OpenCV and want to find the median value of a very large matrix (single channel grayscale, float).
I tried several methods, such as sorting the array (using std::sort) and picking the middle entry, but it is extremely slow compared with the median function in Matlab. To be precise: what takes 0.25 seconds in Matlab takes over 19 seconds in OpenCV.
My input image is originally a 12-bit greyscale image with the dimensions 3840x2748 (~10.5 megapixels), converted to float (CV_32FC1) where all the values are now mapped to the range [0,1] and at some point in the code I request the median value by calling:
double myMedianValue = medianMat(Input);
Where the function medianMat is:
double medianMat(cv::Mat Input){
Input = Input.reshape(0,1); // spread Input Mat to single row
std::vector<double> vecFromMat;
Input.copyTo(vecFromMat); // Copy Input Mat to vector vecFromMat
std::sort( vecFromMat.begin(), vecFromMat.end() ); // sort vecFromMat
if (vecFromMat.size()%2==0) {return (vecFromMat[vecFromMat.size()/2-1]+vecFromMat[vecFromMat.size()/2])/2;} // in case of even-numbered matrix
return vecFromMat[(vecFromMat.size()-1)/2]; // odd-number of elements in matrix
}
I timed the function medianMat by itself and also its various parts; as expected, the bottleneck is in:
std::sort( vecFromMat.begin(), vecFromMat.end() ); // sort vecFromMat
Does anyone here have an efficient solution?
Thanks!
EDIT
I have tried using std::nth_element given in the answer of Adi Shavit.
The function medianMat now reads as:
double medianMat(cv::Mat Input){
Input = Input.reshape(0,1); // spread Input Mat to single row
std::vector<double> vecFromMat;
Input.copyTo(vecFromMat); // Copy Input Mat to vector vecFromMat
std::nth_element(vecFromMat.begin(), vecFromMat.begin() + vecFromMat.size() / 2, vecFromMat.end());
return vecFromMat[vecFromMat.size() / 2];
}
The runtime has dropped from over 19 seconds to 3.5 seconds. This is still nowhere near the 0.25 seconds in Matlab using the median function...
Sorting and taking the middle element is not the most efficient way to find a median. It requires O(n log n) operations.
With C++ you should use std::nth_element() and take the middle iterator. This is an O(n) operation:
nth_element is a partial sorting algorithm that rearranges elements in [first, last) such that:
The element pointed at by nth is changed to whatever element would occur in that position if [first, last) was sorted.
All of the elements before this new nth element are less than or equal to the elements after the new nth element.
Also, your original data is 12 bit integers. Your implementation does a few things that make the comparison to Matlab problematic:
You converted to floating point (CV_32FC1, or double, or both); this is costly and takes time.
The code has an extra copy to a vector<double>.
Operations on floats and especially doubles cost more than on integers.
Assuming your image is continuous in memory, as is the default for OpenCV, you should use CV_16UC1 and work directly on the data array after reshape().
Another option which should be very fast is to simply build a histogram of the image - this is a single pass on the image. Then, working on the histogram, find the bin that corresponds to half the pixels on each side - this is at most a single pass over the bins.
The OpenCV docs have several tutorials on how to build a histogram. Once you have the histogram, accumulate the bin values until you get past 3840x2748/2. That bin is your median.
OK.
I actually tried this before posting the question and due to some silly mistakes I disqualified it as a solution... anyway here it is:
I basically create a histogram of values for my original input with 2^12 = 4096 bins, compute the CDF and normalize it so it is mapped from 0 to 1, and find the smallest index in the CDF that is equal to or larger than 0.5. I then divide this index by 2^12 and thus find the median value requested. Now it runs in 0.11 seconds (and that's in debug mode without heavy optimizations), which is less than half the time required in Matlab.
Here's the function (nVals = 4096 in my case corresponding with 12-bits of values):
double medianMat(cv::Mat Input, int nVals){
// COMPUTE HISTOGRAM OF SINGLE CHANNEL MATRIX
float range[] = { 0, nVals };
const float* histRange = { range };
bool uniform = true; bool accumulate = false;
cv::Mat hist;
calcHist(&Input, 1, 0, cv::Mat(), hist, 1, &nVals, &histRange, uniform, accumulate);
// COMPUTE CUMULATIVE DISTRIBUTION FUNCTION (CDF)
cv::Mat cdf;
hist.copyTo(cdf);
for (int i = 1; i <= nVals-1; i++){
cdf.at<float>(i) += cdf.at<float>(i - 1);
}
cdf /= Input.total();
// COMPUTE MEDIAN
double medianVal;
for (int i = 0; i <= nVals-1; i++){
if (cdf.at<float>(i) >= 0.5) { medianVal = i; break; }
}
return medianVal/nVals; }
It's probably faster to find it from the original data. Since the original data has 12-bit values, there are only 4096 different possible values; that's a nice and small table. Go through all the data in one pass and count how many of each value you have. That is an O(n) operation. Then it's easy to find the median: just count size/2 items from either end of the table.
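A sketch of that counting approach; the function and parameter names are mine, and it assumes the 12-bit samples are stored one per 16-bit integer.

#include <cstddef>
#include <cstdint>

// Histogram the 12-bit values, then walk the 4096-entry table until half the
// pixels have been counted; that bin is the median.
uint16_t median12bit(const uint16_t* values, std::size_t n) {
    std::size_t counts[4096] = {0};
    for (std::size_t i = 0; i < n; ++i)
        ++counts[values[i] & 0x0FFF];

    const std::size_t half = n / 2;
    std::size_t seen = 0;
    for (int v = 0; v < 4096; ++v) {
        seen += counts[v];
        if (seen > half)
            return (uint16_t)v;
    }
    return 4095;    // only reached for n == 0
}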
I have two images and I want to get a sense for how much they differ on a pixel-by-pixel level. My basic idea was to take the two images, apply absdiff, then go over each pixel in the difference, take its norm, and store the norm in another array. The problem with this approach is that it is too slow for my application. Does anyone know any alternatives?
Many Thanks,
Hillary
The code for calculating the normed difference:
uchar* row_pointer_image_difference;
double* row_pointer_normed_difference;
Vec3b bgrPixel;
double pixel_distance;
for (long int r = 0; r < rows; r++){
row_pointer_image_difference = image_difference.ptr<uchar>(r);
row_pointer_normed_difference = normed_difference.ptr<double>(r);
for (long int c = 0; c < columns; c++){
//calculate pixel distance
bgrPixel = row_pointer_image_difference[c];
pixel_distance = norm(bgrPixel);
row_pointer_normed_difference[c] = pixel_distance;
}
}
You need to clarify your use case better, in order to see what shortcuts are available. Ask yourself: What do you use the difference for? Can you live with an approximate difference? Do you only need to tell if the images are exactly equal?
Also, what computation time do you want to optimize? Worst case? Average? Can you live with a large variance in computation times?
For example, if you are only interested in testing for exact equality, early termination at the first difference is very fast, and will have low expected time if most images are different from each other.
If the fraction of duplicates is expected to be large, random pixel sampling may be a viable approach, and from the sample rate you can quantify the likelihood of false positives and negatives.
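For the exact-equality case, an early-terminating comparison could look roughly like the sketch below; it compares raw rows with memcmp and is only one of several reasonable ways to do this.

#include <cstring>
#include <opencv2/core.hpp>

// Returns true only if the two images are byte-for-byte identical; bails out
// at the first differing row, so mismatches are usually detected very quickly.
bool imagesExactlyEqual(const cv::Mat& a, const cv::Mat& b) {
    if (a.size() != b.size() || a.type() != b.type())
        return false;
    for (int r = 0; r < a.rows; ++r)
        if (std::memcmp(a.ptr(r), b.ptr(r), a.cols * a.elemSize()) != 0)
            return false;
    return true;
}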
I'm implementing the Viola Jones algorithm for face detection. I'm having issues with the first part of the AdaBoost learning part of the algorithm.
The original paper states
The weak classifier selection algorithm proceeds as follows. For each feature, the examples are sorted based on feature value.
I'm currently working with a relatively small training set of 2000 positive images and 1000 negative images. The paper describes having data sets as large as 10,000.
The main purpose of AdaBoost is to decrease the number of features in a 24x24 window, which totals 160,000+. The algorithm works on these features and selects the best ones.
The paper describes that for each feature, it calculates its value on each image, and then sorts them based on value. What this means is I need to make a container for each feature and store the values of all the samples.
My problem is that my program runs out of memory after evaluating only 10,000 of the features (only 6% of them). The overall size of all the containers will end up being 160,000 * 3,000 entries, which is nearly half a billion. How am I supposed to implement this algorithm without running out of memory? I've increased the heap size, which got me from 3% to 6%, but I don't think increasing it much more will work.
The paper implies that these sorted values are needed throughout the algorithm, so I can't discard them after each feature.
Here's my code so far
public static List<WeakClassifier> train(List<Image> positiveSamples, List<Image> negativeSamples, List<Feature> allFeatures, int T) {
List<WeakClassifier> solution = new LinkedList<WeakClassifier>();
// Initialize Weights for each sample, whether positive or negative
float[] positiveWeights = new float[positiveSamples.size()];
float[] negativeWeights = new float[negativeSamples.size()];
float initialPositiveWeight = 0.5f / positiveWeights.length;
float initialNegativeWeight = 0.5f / negativeWeights.length;
for (int i = 0; i < positiveWeights.length; ++i) {
positiveWeights[i] = initialPositiveWeight;
}
for (int i = 0; i < negativeWeights.length; ++i) {
negativeWeights[i] = initialNegativeWeight;
}
// Each feature's value for each image
List<List<FeatureValue>> featureValues = new LinkedList<List<FeatureValue>>();
// For each feature get the values for each image, and sort them based off the value
for (Feature feature : allFeatures) {
List<FeatureValue> thisFeaturesValues = new LinkedList<FeatureValue>();
int index = 0;
for (Image positive : positiveSamples) {
int value = positive.applyFeature(feature);
thisFeaturesValues.add(new FeatureValue(index, value, true));
++index;
}
index = 0;
for (Image negative : negativeSamples) {
int value = negative.applyFeature(feature);
thisFeaturesValues.add(new FeatureValue(index, value, false));
++index;
}
Collections.sort(thisFeaturesValues);
// Add this feature to the list
featureValues.add(thisFeaturesValues);
++currentFeature;
}
... rest of code
This should be the pseudocode for the selection of one of the weak classifiers:
normalize the per-example weights // one float per example
for feature j from 1 to 45,396:
// Training a weak classifier based on feature j.
- Extract the feature's response from each training image (1 float per example)
// This threshold selection and error computation is where sorting the examples
// by feature response comes in.
- Choose a threshold to best separate the positive from negative examples
- Record the threshold and weighted error for this weak classifier
choose the best feature j and threshold (lowest error)
update the per-example weights
Nowhere do you need to store hundreds of millions of feature values. Just extract the feature responses on the fly on each iteration. You're using integral images, so extraction is fast. The integral images are the main memory cost, and that isn't much: just one integer for every pixel in every image, basically the same amount of storage as the images themselves.
Even if you did compute all the feature responses for all images and save them so you don't have to recompute them every iteration, that's still only:
45396 * 3000 * 4 bytes =~ 520 MB, or if you're convinced there are 160000 possible features,
160000 * 3000 * 4 bytes =~ 1.78 GB, or if you use 10000 training images,
160000 * 10000 * 4 bytes =~ 5.96 GB
Basically, you shouldn't be running out of memory even if you do store all the feature values.
The basic problem is this:
I have a cv::Mat of type CV_8UC1 which is mostly filled with integers (well, chars, actually, but whatever) between 1 and 100 inclusive. The remaining elements are zeros.
In this case, 0 basically means "unknown". I want to fill in the unknown elements with, essentially, the average of its nearest neighbors... i.e. if this matrix were representing a 3d surface with a bunch of holes in it, I want to smoothly fill in the holes.
Keeping in mind, of course, that it's possible there are some rather big holes.
Efficiency isn't super important, as this operation is only going to be happening once, and the matrix in question isn't bigger than around 1000x1000.
Here's the code I need to finish:
for(int x=0; x<heightMatrix.cols; x++) {
for (int y=0; y<heightMatrix.rows; y++) {
if (heightMatrix.at<char>(x,y) == 0) {
// ???
}
}
}
Thanks!!
How about this instead:
put your data in an image and use image closing with a large kernel (or with a lot of iterations):
http://opencv.willowgarage.com/documentation/image_filtering.html#morphologyex
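For example, a sketch of that suggestion; the kernel size and shape are guesses you would tune, and filled is just a name for the output.

#include <opencv2/imgproc.hpp>

cv::Mat filled;
// Closing = dilation followed by erosion: it fills dark holes smaller than the kernel.
cv::Mat kernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(15, 15));
cv::morphologyEx(heightMatrix, filled, cv::MORPH_CLOSE, kernel);
// Or keep a smaller kernel and raise the iteration count instead:
// cv::morphologyEx(heightMatrix, filled, cv::MORPH_CLOSE, kernel, cv::Point(-1, -1), 5);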
What about this?
int sum = 0;
... paste the following part inside the loop ...
sum += heightMatrix.at<char>(x - 1,y);
sum += heightMatrix.at<char>(x + 1,y);
sum += heightMatrix.at<char>(x,y - 1);
sum += heightMatrix.at<char>(x,y + 1);
heightMatrix.at<char>(x,y) = sum / 4;
Since you deal with a CV_8UC1 Mat you have in practice a 2d array and each pixel has just 4 nearest neighbors.
There are some caveats however:
1) Put your averaged pixels in a Mat of floats to avoid round-off!
2) Filling the whole Mat with this average may not be what you are looking for if the non-zero pixels are quite sparse: when there are a lot of empty pixels and very few non-zero pixels, the further you move away from a non-zero pixel, the more the average converges to 0. This can happen in as few as 3-4 iterations (another good reason not to store the values in a Mat of integers).
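One way to put both caveats together is to work in a float Mat and average only over the known (non-zero) neighbours, repeating until nothing changes. A rough sketch (border pixels are left alone for brevity, and note that OpenCV's at<>() takes (row, column)):

#include <opencv2/core.hpp>

cv::Mat filled;
heightMatrix.convertTo(filled, CV_32F);     // caveat 1: floats, no integer round-off

bool changed = true;
while (changed) {
    changed = false;
    for (int y = 1; y < filled.rows - 1; ++y) {
        for (int x = 1; x < filled.cols - 1; ++x) {
            if (filled.at<float>(y, x) != 0.0f) continue;       // already known
            const int dx[] = {-1, 1, 0, 0}, dy[] = {0, 0, -1, 1};
            float sum = 0.0f; int known = 0;
            for (int k = 0; k < 4; ++k) {
                float v = filled.at<float>(y + dy[k], x + dx[k]);
                if (v != 0.0f) { sum += v; ++known; }           // caveat 2: skip unknowns
            }
            if (known > 0) { filled.at<float>(y, x) = sum / known; changed = true; }
        }
    }
}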