How to remove small shapes from an image? - c++

I have this image:
I have the following functions:
void DrawPixel(int x, int y, unsigned char r, unsigned char g, unsigned char b);
void ReadPixel(GLint x, GLint y, unsigned char &r, unsigned char &g, unsigned char &b);
Objective
Remove the four small shapes from the image. One way to do this would be to count how many red pixels each shape has: if a shape has fewer than 200 red pixels, for example, remove it from the image by painting it black. This is just one solution I imagined; any other alternative is welcome.
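For reference, a minimal C++17 sketch of that counting idea using a breadth-first flood fill, assuming the ReadPixel/DrawPixel interface above on the NewImage object used later in this question (RemoveSmallShapes and minPixels are illustrative names, not from any library):

#include <queue>
#include <utility>
#include <vector>

// Sketch only: labels each 4-connected region of "red" pixels (r == 255)
// and paints regions smaller than minPixels black.
void RemoveSmallShapes(int minPixels /* e.g. 200 */)
{
    const int w = NewImage.SizeX(), h = NewImage.SizeY();
    const int dx[] = {1, -1, 0, 0};
    const int dy[] = {0, 0, 1, -1};
    std::vector<char> visited(w * h, 0);
    unsigned char r, g, b;

    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            NewImage.ReadPixel(x, y, r, g, b);
            if (r != 255 || visited[y * w + x])
                continue;

            // BFS flood fill: collect every pixel of this connected shape.
            std::vector<std::pair<int,int>> shape;
            std::queue<std::pair<int,int>> q;
            q.push({x, y});
            visited[y * w + x] = 1;
            while (!q.empty()) {
                auto [cx, cy] = q.front(); q.pop();
                shape.push_back({cx, cy});
                for (int i = 0; i < 4; ++i) {
                    const int nx = cx + dx[i], ny = cy + dy[i];
                    if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                    if (visited[ny * w + nx]) continue;
                    NewImage.ReadPixel(nx, ny, r, g, b);
                    if (r == 255) {
                        visited[ny * w + nx] = 1;
                        q.push({nx, ny});
                    }
                }
            }

            // Small shape: paint it black, as described above.
            if ((int)shape.size() < minPixels)
                for (auto [sx, sy] : shape)
                    NewImage.DrawPixel(sx, sy, 0, 0, 0);
        }
    }
}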
What I've tried
void RemoveLastNoises(){
    unsigned char r, g, b;
    for (int y = 0; y < NewImage.SizeY(); y++){
        for (int x = 0; x < NewImage.SizeX(); x++){
            NewImage.ReadPixel(x, y, r, g, b);
            if (r == 255){
                // Count the length of this horizontal run of red pixels.
                int cont = 0;
                while (r == 255 && x + cont < NewImage.SizeX()){
                    NewImage.ReadPixel(x + cont, y, r, g, b);
                    cont = cont + 1;
                }
                // Erase runs shorter than the threshold.
                if (cont < 300){
                    for (int k = 0; k < cont; k++)
                        NewImage.DrawPixel(x + k, y, 255, 255, 255);
                }
                x = x + cont; // skip past the run we just processed
            }
        }
    }
}
This works, but it only counts how many red pixels there are in a single row (along x); I found it interesting to include here as a reference. Anyway, any idea for removing the small shapes is welcome.
Note: the larger shapes must not be modified. The real image is larger in height and width; I reduced the dimensions to keep the question readable.

There is a set of nonlinear filters called morphological filters.
They combine the pixels under a filter mask with "and" and "or" operations.
What you need to do is implement a closing. OpenGL does not provide such functions, so you have to write the code on your own. To do so:
1. If the shapes are always red, just use the red channel as a grey-scale image.
2. Create a binary image from the image created in step 1.
3. Invert the image from step 2.
4. Create a filter mask of '1's that is large enough to cover the small parts you want to erase.
5. Do a dilation (https://en.wikipedia.org/wiki/Dilation_(morphology)) on the image with the filter mask.
6. Do an erosion (https://en.wikipedia.org/wiki/Erosion_(morphology)) on the image with the same filter mask or a slightly smaller one.
7. Invert the image again.
Steps 5 and 6 together are a "closing" (https://en.wikipedia.org/wiki/Closing_(morphology)); a sketch of the dilation/erosion pair is given below.
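As a minimal illustration of steps 4 to 6, assuming the binary image is a row-major std::vector<uint8_t> of 0/1 values and the filter mask is a square of '1's with the given radius (all names here are illustrative, not from any library):

#include <cstdint>
#include <vector>

// Binary dilation with a (2*radius+1) x (2*radius+1) square mask of '1's.
// A pixel becomes 1 if ANY pixel under the mask is 1 (the "or" combination).
std::vector<uint8_t> dilate(const std::vector<uint8_t>& img,
                            int w, int h, int radius)
{
    std::vector<uint8_t> out(w * h, 0);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            uint8_t v = 0;
            for (int ky = -radius; ky <= radius && !v; ++ky)
                for (int kx = -radius; kx <= radius && !v; ++kx) {
                    const int nx = x + kx, ny = y + ky;
                    if (nx >= 0 && ny >= 0 && nx < w && ny < h)
                        v = img[ny * w + nx];
                }
            out[y * w + x] = v;
        }
    return out;
}

// Binary erosion: a pixel stays 1 only if ALL pixels under the mask are 1
// (the "and" combination). Out-of-bounds counts as 0 here.
std::vector<uint8_t> erode(const std::vector<uint8_t>& img,
                           int w, int h, int radius)
{
    std::vector<uint8_t> out(w * h, 0);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            uint8_t v = 1;
            for (int ky = -radius; ky <= radius && v; ++ky)
                for (int kx = -radius; kx <= radius && v; ++kx) {
                    const int nx = x + kx, ny = y + ky;
                    if (nx < 0 || ny < 0 || nx >= w || ny >= h || !img[ny * w + nx])
                        v = 0;
                }
            out[y * w + x] = v;
        }
    return out;
}

// Closing = dilation followed by erosion (steps 5 and 6).
std::vector<uint8_t> closing(const std::vector<uint8_t>& img,
                             int w, int h, int radius)
{
    return erode(dilate(img, w, h, radius), w, h, radius);
}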

Related

C++ Image Processing: Uniform Smoothing Operation Makes Image Darker

I'm trying to reduce noise in an image by implementing a 2D uniform smoothing algorithm in C++. The function I'm using computes, for each pixel, the average value of the neighboring pixels within an odd-sized square window and uses that as the new value.
Whenever I run the code, however, the pixels in the new image become darker (a pixel value of 255=white, and 0=black). Here is the function for obtaining the new pixel value:
int utility::windowAverage (image &src, int x, int y, int window_size)
{
    int sum = 0;
    int avg;
    for(int i = x-(window_size/2); i < x+(window_size/2); ++i)
    {
        for(int j = y-(window_size/2); j < y+(window_size/2); ++j)
        {
            sum += src.getPixel(i,j);
        }
    }
    avg = sum/(window_size*window_size);
    return avg;
}
The parameter image &src is the source image, and the function src.getPixel(i,j) returns an integer from 0 to 255 representing the brightness of the pixel at the specified coordinates (i,j).
I am running the code over gray-level images of the .pgm format.
How can I smooth the image while maintaining the same brightness?
The problem is that you are not actually summing the pixels of a window with dimensions window_size*window_size: you are missing the last pixel in each dimension when computing sum.
You can fix this by using <= instead of < in both of your for loops.
Example of what goes wrong for window_size = 3, x = 0 and y = 0:
The integer division by 2 in your loop bounds is floored, so your loops become for (int i = -1; i < 1; i++). This only covers the (two) pixels -1 and 0 in the given direction, but you still divide by the full window_size of 3, which makes the image darker (in this case by one third, if it has constant color values).
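For reference, the function with that fix applied; everything else is unchanged from the question (it still assumes an odd window_size and that src.getPixel tolerates border coordinates, as the original does):

int utility::windowAverage (image &src, int x, int y, int window_size)
{
    int sum = 0;
    // <= makes each loop cover the full window_size pixels per dimension;
    // with < the last row and column of the window were skipped.
    for(int i = x-(window_size/2); i <= x+(window_size/2); ++i)
    {
        for(int j = y-(window_size/2); j <= y+(window_size/2); ++j)
        {
            sum += src.getPixel(i,j);
        }
    }
    return sum/(window_size*window_size);
}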

Plot an audio waveform in C++/Qt

I have a university assignment which consists of displaying the waveform of an audio file using C++/Qt. We should be able to modify the scale used to display it (expressed in audio samples per screen pixel).
So far, I am able to:
open the audio file
read the samples
plot the samples at a given scale
To plot the samples at a given scale, I have tried two strategies. Let's assume that N is the value of the scale:
for i going from 0 to the width of my window, plot the (i * N)th audio sample at screen pixel i. This is very fast and constant in time, because we always access the same number of audio data points.
However, it does not represent the waveform correctly, as we use the value of only one point to represent N points.
for i going from 0 to N * width, plot the ith audio sample at the screen position i / (N * width) and let Qt figure out how to represent that correctly on physical screen pixels.
That plots very beautiful waveforms, but it takes a huge amount of time to access the data. For instance, if I want to display 500 samples per pixel and the width of my window is 100 px, I have to access 50,000 points, which Qt then plots as 100 physical points (pixels).
So, how can I get a correct plot of my audio data, which can be calculated fast? Should I calculate the average of N samples for each physical pixel? Should I do some curve fitting?
In other words, what kind of operation is involved when Qt/Matplotlib/Matlab/etc plot thousands of data point to a very limited amount of physical pixels?
Since I know how to do this and already asked something similar on Stack Overflow, I'll reference that here; I'll provide code below.
Drawing waveforms is a real problem. I tried to figure this out for more than half a year!
To sum this up:
According to the Audacity documentation:
The waveform view uses two shades of blue, one darker and one lighter.
The dark blue part of the waveform displays the tallest peak in the area that pixel represents. At the default zoom level Audacity will display many samples within that pixel width, so this pixel represents the value of the loudest sample in the group.
The light blue part of the waveform displays the average RMS (Root Mean Square) value for the same group of samples. This is a rough guide to how loud this area might sound, but there is no way to extract or use this RMS part of the waveform separately.
So you simply extract the important information out of each chunk of data. If you do this over and over, at coarser and coarser resolutions, you end up with multiple stages that can be used for drawing at different zoom levels.
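For instance, one reduction stage over raw samples might look like this (a plain illustration with std::vector<float>, not the Audacity code; Bucket and reduce are made-up names):

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Reduce each chunk of n samples to its peak (min/max) and its RMS,
// one Bucket per screen pixel.
struct Bucket { float min, max, rms; };

std::vector<Bucket> reduce(const std::vector<float>& samples, std::size_t n)
{
    std::vector<Bucket> out;
    for (std::size_t i = 0; i < samples.size(); i += n) {
        const std::size_t end = std::min(i + n, samples.size());
        Bucket b { samples[i], samples[i], 0.0f };
        double sumSq = 0.0;
        for (std::size_t j = i; j < end; ++j) {
            b.min = std::min(b.min, samples[j]);   // dark blue: peak
            b.max = std::max(b.max, samples[j]);
            sumSq += double(samples[j]) * samples[j];
        }
        b.rms = float(std::sqrt(sumSq / double(end - i))); // light blue: RMS
        out.push_back(b);
    }
    return out;
}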
I'll provide some code here, please bear with me it's in development:
template<typename T>
class CacheHandler {
public:
    std::vector<T> data;
    int sampleSizeInBits;
    vector2d<T> min, max, rms; // vector2d is the author's own 2D container type
    CacheHandler(std::vector<T>& data, int sampleSizeInBits);
    void addData(std::vector<T>& samples);
    /*
     * Irreversibly removes data.
     * Fails if the end index is greater than the data length.
     */
    void removeData(int endIndex);
    void removeData(int startIndex, int endIndex);
};
using this:
template<typename T>
inline WaveformPane::CacheHandler<T>::CacheHandler(std::vector<T>& data, int sampleSizeInBits)
{
    this->data = data;
    this->sampleSizeInBits = sampleSizeInBits;
    int N = log(data.size()) / log(2); // number of power-of-two reduction stages
    rms.resize(N); min.resize(N); max.resize(N);
    rms[0] = calcRMSSegments(data, 2);
    min[0] = getMinPitchSegments(data, 2);
    max[0] = getMaxPitchSegments(data, 2);
    for (int i = 1; i < N; i++) {
        rms[i] = calcRMSSegments(rms[i - 1], 2);
        min[i] = getMinPitchSegments(min[i - 1], 2);
        max[i] = getMaxPitchSegments(max[i - 1], 2);
    }
}
What I'd suggest is something like this:
Given totalNumSamples audio samples in your audio file, and widgetWidth pixels of width in your display widget, you can calculate which samples are to be represented by each pixel:
// Given an x value (in pixels), returns the appropriate corresponding
// offset into the audio-samples array that represents the
// first sample that should be included in that pixel.
int GetFirstSampleIndexForPixel(int x, int widgetWidth, int totalNumSamples)
{
    return (totalNumSamples*x)/widgetWidth;
}
virtual void paintEvent(QPaintEvent * e)
{
    QPainter p(this);
    for (int x = 0; x < widgetWidth; x++)
    {
        const int firstSampleIndexForPixel = GetFirstSampleIndexForPixel(x, widgetWidth, totalNumSamples);
        const int lastSampleIndexForPixel = GetFirstSampleIndexForPixel(x+1, widgetWidth, totalNumSamples)-1;
        const int largestSampleValueForPixel = GetMaximumSampleValueInRange(firstSampleIndexForPixel, lastSampleIndexForPixel);
        const int smallestSampleValueForPixel = GetMinimumSampleValueInRange(firstSampleIndexForPixel, lastSampleIndexForPixel);

        // draw a vertical line spanning all sample values that are contained in this pixel
        p.drawLine(x, GetYValueForSampleValue(largestSampleValueForPixel),
                   x, GetYValueForSampleValue(smallestSampleValueForPixel));
    }
}
Note that I didn't include source code for GetMinimumSampleValueInRange(), GetMaximumSampleValueInRange(), or GetYValueForSampleValue(), since hopefully what they do is obvious from their names, but if not, let me know and I can explain them.
Once you have the above working reasonably well (i.e. drawing a waveform that shows the entire file into your widget), you can start working on adding in zoom-and-pan functionality. Horizontal zoom can be implemented by modifying the behavior of GetFirstSampleIndexForPixel(), e.g.:
int GetFirstSampleIndexForPixel(int x, int widgetWidth, int sampleIndexAtLeftEdgeOfWidget, int sampleIndexAfterRightEdgeOfWidget)
{
    int numSamplesToDisplay = sampleIndexAfterRightEdgeOfWidget - sampleIndexAtLeftEdgeOfWidget;
    return sampleIndexAtLeftEdgeOfWidget + ((numSamplesToDisplay*x)/widgetWidth);
}
With that, you can zoom/pan simply by passing in different values for sampleIndexAtLeftEdgeOfWidget and sampleIndexAfterRightEdgeOfWidget that together indicate the subrange of the file you want to display.
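For example, a hypothetical handler that zooms in 2x around the center of the current view would just shrink that subrange:

// Illustrative only: halve the visible sample range around its center.
void zoomIn2x(int &sampleIndexAtLeftEdgeOfWidget,
              int &sampleIndexAfterRightEdgeOfWidget)
{
    const int center = (sampleIndexAtLeftEdgeOfWidget
                      + sampleIndexAfterRightEdgeOfWidget) / 2;
    const int half   = (sampleIndexAfterRightEdgeOfWidget
                      - sampleIndexAtLeftEdgeOfWidget) / 4;
    sampleIndexAtLeftEdgeOfWidget     = center - half;
    sampleIndexAfterRightEdgeOfWidget = center + half;
}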

Vertically flipping a char array: is there a more efficient way?

Let's start with some code:
QByteArray OpenGLWidget::modifyImage(QByteArray imageArray, const int width, const int height){
    if (vertFlip){
        /* Each pixel consists of four unsigned chars: Red Green Blue Alpha.
         * The image is normally 640*480, which means each row is in fact 640*4 uChars wide.
         * The whole ByteArray is one-dimensional; this means that index 640*4 is the red of the first pixel of the second row.
         * This function is EXTREMELY SLOW
         */
        QByteArray tempArray = imageArray;
        for (int h = 0; h < height; ++h){
            for (int w = 0; w < width/2; ++w){
                for (int i = 0; i < 4; ++i){
                    imageArray.data()[h*width*4 + 4*w + i] = tempArray.data()[h*width*4 + 4*(width - 1 - w) + i];
                    imageArray.data()[h*width*4 + 4*(width - 1 - w) + i] = tempArray.data()[h*width*4 + 4*w + i];
                }
            }
        }
    }
    return imageArray;
}
This is the code I use right now to vertically flip an image which is 640*480 (the image is actually not guaranteed to be 640*480, but it mostly is). The color encoding is RGBA, which means the total array size is 640*480*4. I receive the images at 30 FPS and I want to show them on screen at the same rate.
On an older CPU (Athlon X2) this code is just too much: the CPU is racing to keep up with the 30 FPS. So the question is: can I do this more efficiently?
I am also working with OpenGL; does it have a gimmick I am not aware of that can flip images with relatively low CPU/GPU usage?
According to this question, you can flip an image in OpenGL by scaling it by (1,-1,1). This question explains how to do transformations and scaling.
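In legacy fixed-function OpenGL that looks roughly like the following sketch; drawTexturedQuad is a hypothetical helper that draws the image as a textured quad:

// Sketch, legacy OpenGL: draw the image mirrored without touching its pixels.
void drawFlipped()
{
    glMatrixMode(GL_MODELVIEW);
    glPushMatrix();
    glScalef(1.0f, -1.0f, 1.0f); // negative Y scale flips vertically
    drawTexturedQuad();          // hypothetical: draws the image as a textured quad
    glPopMatrix();
}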
You can improve things at least by doing the flip blockwise, making use of the cache architecture. In your example, one of the accesses (either the read or the write) will miss the cache.
For a start it can help to "capture scanlines" if you're using two loops to loop through the pixels of an image, like so:
for (int y = 0; y < height; ++y)
{
    // Capture scanline.
    char* scanline = imageArray.data() + y*width*4;
    for (int x = 0; x < width/2; ++x)
    {
        const int flipped_x = width - x - 1;
        for (int i = 0; i < 4; ++i)
            std::swap(scanline[x*4 + i], scanline[flipped_x*4 + i]);
    }
}
Another thing to note is that I used swap instead of a temporary copy of the image. That tends to be more efficient, since you can swap through registers instead of loading pixels from a copy of the entire image.
It also generally helps to work with a 32-bit integer instead of one byte at a time for anything like this. If your channels are 8-bit types but you know each pixel is 32 bits, as in your case, you can generally get away with a cast to uint32_t*, e.g.:
for (int y = 0; y < height; ++y)
{
    uint32_t* scanline = (uint32_t*)imageArray.data() + y*width;
    std::reverse(scanline, scanline + width);
}
At this point you might parallelize the y loop. Flipping an image this way (it should be "horizontally", if I understood your original code correctly) is a little bit tricky with the access patterns, but you should be able to get quite a decent boost using the above techniques.
I am also working with OpenGL; does it have a gimmick I am not aware of that can flip images with relatively low CPU/GPU usage?
Naturally the fastest way to flip images is to not touch their pixels at all and just save the flipping for the final part of the pipeline when you render the result. For this you might render a texture in OGL with negative scaling instead of modifying the pixels of a texture.
Another thing that's really useful in video and image processing is to represent an image to process like this for all your image operations:
struct Image32
{
    uint32_t* pixels;
    int32_t width;
    int32_t height;
    int32_t x_stride;
    int32_t y_stride;
};
The stride fields are what you use to get from one scanline (row) of an image to the next vertically, and from one column to the next horizontally. When you use this representation, you can use negative stride values and offset the pixels pointer accordingly. You can also use the stride fields to, say, render only every other scanline of an image for fast interactive half-res previews, by using y_stride=width*2 and height/=2. You can quarter-res an image by setting x_stride to 2 and y_stride to 2*width and then halving the width and height. And you can render a cropped image without making your blit functions accept a boatload of parameters: just modify these fields, keeping the y stride at the full image width to get from one row of the cropped section to the next:
// Using the stride representation of Image32, this can now
// blit a cropped source, a horizontally flipped source,
// a vertically flipped source, a source flipped both ways,
// a half-res source, a quarter-res source, a quarter-res
// source that is horizontally flipped and cropped, etc,
// and all without modifying the source image in advance
// or having to accept all kinds of extra drawing parameters.
void blit(int dst_x, int dst_y, Image32 dst, Image32 src);
// We don't have to do things like this (and I think I lost
// some capabilities with this version below but it hurts my
// brain too much to think about what capabilities were lost):
void blit_gross(int dst_x, int dst_y, int dst_w, int dst_h, uint32_t* dst,
                int src_x, int src_y, int src_w, int src_h,
                const uint32_t* src, bool flip_x, bool flip_y);
By using negative stride values (offsetting the pixels pointer accordingly) and passing the image to an operation such as a blit, the result will naturally come out flipped without the image ever actually being flipped in memory. It ends up being "drawn flipped", so to speak, just as in the OGL case of a negative scaling transformation matrix.
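For instance, a vertically flipped view of an Image32 might be built like this (a sketch against the struct above, assuming the strides are measured in pixels):

// Build a vertically flipped *view* of an image: no pixel data is copied
// or modified, the view just starts at the last row and walks rows upwards.
Image32 flippedVertically(Image32 img)
{
    img.pixels  += (img.height - 1) * img.y_stride; // point at the last row
    img.y_stride = -img.y_stride;                   // step backwards per row
    return img;
}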

Manipulating JPEG images pixel-per-pixel using Mini Jpeg Decoder

I want to manipulate JPEG images with C++ using the decoder Mini Jpeg Decoder.
The problem is that I want to read it pixel by pixel, but the decoder only returns an imageData array, similar to what libjpeg does.
I don't know how to write a method like this:
char getPixel(char x, char y, unsigned char* imageData)
{
    //...???
}
The returned char should contain the luminance of the pixel (I work with grayscale images).
How can I solve this problem?
As far as I can tell, the Decoder class delivers a byte array of color values with the GetImage() method. So you could write a function that looks like this:
char getLuminance(Decoder* dec, int x, int y) {
    if(x < 0 || y < 0 || x >= dec->GetWidth() || y >= dec->GetHeight()) {
        throw "out of bounds";
    }
    return dec->GetImage()[x + y * dec->GetWidth()];
}
I'm uncertain of the pixel layout, so the array access may not be right. Also, this only works for grey-scale images; otherwise you would only get the luminance of the red channel at that position. HTH
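If the decoder handed back interleaved 8-bit RGB instead (an assumption; check the decoder's actual pixel layout before relying on this), a variant could compute the luminance with the usual Rec. 601 weights:

// Sketch assuming GetImage() returns interleaved 8-bit RGB (3 bytes/pixel).
char getLuminanceRGB(Decoder* dec, int x, int y) {
    if(x < 0 || y < 0 || x >= dec->GetWidth() || y >= dec->GetHeight()) {
        throw "out of bounds";
    }
    const unsigned char* p = dec->GetImage() + 3 * (x + y * dec->GetWidth());
    // Rec. 601 luma weights for R, G, B.
    return (char)(0.299 * p[0] + 0.587 * p[1] + 0.114 * p[2]);
}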

OpenCV: Accessing And Taking The Square Root Of Pixels

I'm using OpenCV for object detection and one of the operations I would like to be able to perform is a per-pixel square root. I imagine the loop would be something like:
IplImage* img_;
...
for (int y = 0; y < img_->height; y++) {
    for (int x = 0; x < img_->width; x++) {
        // Take pixel square root here
    }
}
My question is how can I access the pixel value at coordinates (x, y) in an IplImage object?
Assuming img_ is of type IplImage*, and assuming 16-bit unsigned integer data, I would say:
unsigned short pixel_value = ((unsigned short *)&(img_->imageData[img_->widthStep * y]))[x];
See also here for IplImage definition.
OpenCV's IplImage stores its data as a one-dimensional array, so you must compute a single index to get at the image data. The position of your pixel depends on the color depth and the number of channels in your image.
// width step (bytes per image row)
int ws = img_->widthStep;
// the number of channels (colors)
int nc = img_->nChannels;
// the depth in bytes of one color component
int d = (img_->depth & 0x0000ffff) >> 3;
// assuming the depth is the size of a short
unsigned short* pixel_value = (unsigned short*)(img_->imageData + (y*ws) + (x*nc*d));
// this gives you a pointer to the first color component in a pixel;
// if you are rolling grayscale, just dereference the pointer.
You can pick a channel (color) by advancing the pixel pointer, pixel_value++. I would suggest using a lookup table for square roots of pixels if this is going to be any sort of real-time application.
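Such a table is cheap to build, since an 8-bit pixel can only take 256 values. A sketch, with the result scaled back to the 0..255 range (one possible choice; drop the scaling if you want the raw root):

#include <cmath>

// Precompute sqrt for every possible 8-bit value. Scaling by sqrt(255)
// maps the result back onto 0..255 (sqrt(v) * sqrt(255) = sqrt(255*v)).
unsigned char sqrtLUT[256];

void buildSqrtLUT() {
    for (int v = 0; v < 256; ++v)
        sqrtLUT[v] = (unsigned char)(std::sqrt((double)v) * std::sqrt(255.0) + 0.5);
}
// Then, per pixel: *pixel = sqrtLUT[*pixel]; -- a single array lookup.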
Please use the CV_IMAGE_ELEM macro.
Also, consider using cvPow with power=0.5 instead of working on the pixels yourself, which should be avoided anyway.
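A sketch of both suggestions against the legacy C API, assuming an 8-bit single-channel img_ (cvPow needs floating-point data, hence the temporary conversion):

#include <cmath>
#include <opencv/cv.h> // legacy OpenCV C headers

// Option 1: CV_IMAGE_ELEM does the row/step arithmetic for you.
void sqrtImage(IplImage* img_)
{
    for (int y = 0; y < img_->height; y++)
        for (int x = 0; x < img_->width; x++) {
            unsigned char &p = CV_IMAGE_ELEM(img_, unsigned char, y, x);
            p = (unsigned char)(std::sqrt((double)p) + 0.5); // note: maps 0..255 into 0..16
        }
}

// Option 2: whole-image cvPow, letting OpenCV do the loop.
void sqrtImageCvPow(IplImage* img_)
{
    IplImage* tmp = cvCreateImage(cvGetSize(img_), IPL_DEPTH_32F, 1);
    cvConvertScale(img_, tmp, 1.0, 0.0); // uchar -> float
    cvPow(tmp, tmp, 0.5);                // per-pixel square root
    cvConvertScale(tmp, img_, 1.0, 0.0); // float -> uchar (same 0..16 range caveat)
    cvReleaseImage(&tmp);
}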
You may find several ways of reaching image elements in Gady Agam's nice OpenCV tutorial here.