I had been reading a webpage on Image Compression (Lossy and Non-lossy).
Here is my problem: I built a face detection project using OpenCV, but my project guide is not satisfied. The project simply captures frames from a capture device (a webcam), passes each frame to a function that detects the faces in it, and displays the detected frames in windows.
My project guide wants me to implement some algorithm myself, such as image compression or morphing, and was not happy to see such heavy use of the library.
So what I would like to know is: is it possible to code image compression algorithms in C or C++? If yes, would the code size not be huge? (My project is supposed to be a minor one.)
Please help me out. Suppose I want to use RLE compression in C++; how should I go about it?
You want to invent your own image compression or implement one of the standard ones?
( I assume this is for some sort of class/assignment, you wouldn't do this in the real world!)
You can compress simple images a little using something like run-length encoding, especially if you can reduce the number of colours (i.e. a cartoon or graphic), but for a real photo-style image it isn't going to work. That's why complex lossy techniques like JPEG or wavelets were invented.
It's very possible, and RLE compression is quite easy. If you want to look at a relatively straight-forward approach to RLE that won't use a lot of code, look at implementing a version of packbits.
Here's another link as well: http://michael.dipperstein.com/rle/index.html (includes an implementation with source-code for both traditional RLE and packbits)
BTW, keep in mind that with noisy data you could actually end up with more data than the uncompressed original when using RLE schemes. For most "real-world" images that have some form of low-pass filtering applied and a relatively good signal-to-noise ratio (i.e., above 40 dB), you should expect compression ratios of around 1.5:1 to 1.7:1.
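To make the expansion problem concrete, here is a minimal sketch of a plain (count, value) byte RLE. This is the naive scheme, not packbits, and the function names rleEncode and rleDecode are just made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode bytes as (count, value) pairs. Each run is capped at 255 so the
// count always fits in a single byte.
std::vector<std::uint8_t> rleEncode(const std::vector<std::uint8_t>& in) {
    std::vector<std::uint8_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const std::uint8_t value = in[i];
        std::size_t run = 1;
        while (i + run < in.size() && in[i + run] == value && run < 255) {
            ++run;
        }
        out.push_back(static_cast<std::uint8_t>(run));
        out.push_back(value);
        i += run;
    }
    return out;
}

// Expand (count, value) pairs back into the original byte sequence.
std::vector<std::uint8_t> rleDecode(const std::vector<std::uint8_t>& in) {
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i + 1 < in.size(); i += 2) {
        out.insert(out.end(), static_cast<std::size_t>(in[i]), in[i + 1]);
    }
    return out;
}
```

Ten identical bytes shrink to two, but four distinct bytes grow to eight, which is exactly the noisy-data case. Packbits avoids most of that growth by also allowing literal (uncompressed) runs.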
Another option for lossless compression would be Huffman encoding. That algorithm is more tolerant of noisy images, in that it generally prevents the data expansion that can occur when such images are encoded with an RLE compression algorithm.
Finally, you didn't mention whether you were working with color or grayscale images ... if it's a color image, remember that you will find much greater redundancy if you compress each color-channel in a planar-color-channel image, rather than trying to compress contiguous RGB data.
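A small sketch of why planar data helps run-based schemes, assuming 8-bit interleaved RGB input. The helpers splitPlanes and countRuns are hypothetical names used only for this illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Split interleaved RGBRGB... bytes into three separate planes (R, G, B).
std::array<std::vector<std::uint8_t>, 3> splitPlanes(
        const std::vector<std::uint8_t>& interleaved) {
    std::array<std::vector<std::uint8_t>, 3> planes;
    const std::size_t pixels = interleaved.size() / 3;
    for (auto& plane : planes) {
        plane.reserve(pixels);
    }
    for (std::size_t i = 0; i < pixels; ++i) {
        for (std::size_t c = 0; c < 3; ++c) {
            planes[c].push_back(interleaved[3 * i + c]);
        }
    }
    return planes;
}

// Count runs of identical consecutive bytes; fewer runs means RLE does better.
std::size_t countRuns(const std::vector<std::uint8_t>& data) {
    if (data.empty()) {
        return 0;
    }
    std::size_t runs = 1;
    for (std::size_t i = 1; i < data.size(); ++i) {
        if (data[i] != data[i - 1]) {
            ++runs;
        }
    }
    return runs;
}
```

For four pixels of the same colour (200, 100, 50), the interleaved bytes form 12 runs of length one, while each plane is a single run of four.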
RLE is the best way to go here. Even the "simplest" of the other compression algorithms are non-trivial and require in-depth knowledge of color space transforms, discrete sine/cosine transforms, entropy, etc.
Back to RLE... to loop through the pixels, use something like this:
cv::Mat img = cv::imread("lenna.png");
for (int i = 0; i < img.rows; i++) {
    for (int j = 0; j < img.cols; j++) {
        // Access the pixel as a cv::Vec3b; cast to int so the channels print as numbers, not characters
        const cv::Vec3b& px = img.at<cv::Vec3b>(i, j);
        std::cout << int(px[0]) << " " << int(px[1]) << " " << int(px[2]) << std::endl;
    }
}
Count the number of identical consecutive pixels in a row and store the runs in any data structure (maybe a vector of < #Occurrences, Vec3b > tuples?). Once you have your final vector, don't forget to store the size of your image along with it (maybe in a simple compressedImage struct) and voilà, you have just compressed an image. To store it in a file, I suggest you use Boost.Serialization or something similar.
Your final struct may look something similar to:
struct compressedImage {
    int height;
    int width;
    std::vector< std::pair<int, cv::Vec3b> > data;  // (run length, pixel value)
};
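As a rough sketch of the run counting itself, here is a version that does not need OpenCV. Pixel is a stand-in for cv::Vec3b, and CompressedImage, compress and decompress are illustrative names, not part of any library.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Pixel = std::array<std::uint8_t, 3>;  // stand-in for cv::Vec3b

struct CompressedImage {
    int height;
    int width;
    std::vector<std::pair<int, Pixel>> data;  // (run length, pixel value)
};

// Walk the pixels in row-major order and merge equal neighbours into runs.
CompressedImage compress(const std::vector<Pixel>& pixels, int height, int width) {
    CompressedImage out{height, width, {}};
    for (const Pixel& p : pixels) {
        if (!out.data.empty() && out.data.back().second == p) {
            ++out.data.back().first;
        } else {
            out.data.emplace_back(1, p);
        }
    }
    return out;
}

// Expand the runs back into a flat row-major pixel list.
std::vector<Pixel> decompress(const CompressedImage& img) {
    std::vector<Pixel> pixels;
    pixels.reserve(static_cast<std::size_t>(img.height) * static_cast<std::size_t>(img.width));
    for (const auto& run : img.data) {
        pixels.insert(pixels.end(), static_cast<std::size_t>(run.first), run.second);
    }
    return pixels;
}
```

Runs are allowed to continue across row boundaries here, which is why the width and height have to be stored alongside the vector.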
Happy coding!
You want to implement a compression based on colour reduction with a space-filling curve or a spatial index. A spatial index reduces the 2D complexity to a 1D complexity; it looks like a quadtree and a bit like a fractal. You want to look for Nick's Hilbert curve quadtree spatial index blog!
Here is another interesting RLE encoding idea: Lossless hierarchical run length encoding. Maybe that's something for you?
if you need to abstract the raster type, you can use GDAL C++ library. Here is the list of supported by default or on request raster formats:
http://gdal.org/formats_list.html
I am using Visual Studio and looking to find a useful image processing library that will take care of basic image processing functions such as rotation so that I don't have to keep coding them manually. I came across CImg and it supports this, as well as many other useful functions, along with interpolation.
However, all the examples I've seen show CImg being used by loading and using full images. I want to work with pixel data. So my loops are the typical:
for (x=0;x<width; x++)
for (y=0;y<height; y++)
I want to perform bilinear or bicubic rotation in this instance and I see CImg supports this. It provides a rotate() and get_rotate function, among others.
I can't find any examples online that show how to use this with pixel data. Ideally, I could simply pass it the pixel color, x, y, and interpolation method, and have it return the result.
Could anyone provide any helpful suggestions? If CImg is not the right library for this type of thing, could anyone recommend a simple, light-weight, easy-to-use one?
Thank you!
You can copy pixel data to CImg class using iterators, and copy it back when you are done.
std::vector<uint8_t> pixels_src, pixels_dst;
size_t width, height, n_colors;
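// Copy from pixel data (note: CImg stores each channel as a separate plane, so pixels_src must be planar, not interleaved RGBRGB...)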
cimg_library::CImg<uint8_t> image(width, height, 1, n_colors);
std::copy(pixels_src.begin(), pixels_src.end(), image.begin());
// Do image processing
// Copy to pixel data
pixels_dst.resize(width * height * n_colors);
std::copy(image.begin(), image.end(), pixels_dst.begin());
I'm looking for some image compression operations, preferably simple in nature, that provide moderate compression ratios while preserving the edges in the images.
Please note that algorithms like JPEG which pack multiple operations are not applicable (unfortunately).
If you're using numpy, I suggest you take a look at the scipy.misc.imsave method
https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.misc.imsave.html
You can easily store your data in png without any loss and with compression ratios along the ranges you mentioned in your comment, e.g.,
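import numpy as np
from scipy.misc import imsave  # removed in SciPy 1.2; on newer versions use imageio.imwrite instead

rgb = np.zeros((255, 255, 3), dtype=np.uint8)
rgb[..., 0] = np.arange(255)
rgb[..., 1] = 55
rgb[..., 2] = 1 - np.arange(255)
imsave('/tmp/rgb_gradient.png', rgb)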
Edit after comment 1:
It is really difficult to answer this question because of the lack of specifics.
Retaining a compressed version of the image in memory will certainly slow down your processing, as you will either need to decode and encode relevant parts of the image in each operation, or you'll need to use very specific algorithms that allow you to access and modify pixel values in the compressed domain (e.g., http://ieeexplore.ieee.org/document/232097/).
Now, to answer your question, the simplest way I can think of is to use Huffman coding (https://www.geeksforgeeks.org/greedy-algorithms-set-3-huffman-coding/) and store the codewords in memory. You will probably need to encode groups of pixels together so that each byte of codewords covers more than one pixel (otherwise you get no real compression). Alternatively, you'd need to find a way to efficiently pack small codewords (say 2 or 3 bits) together, which will certainly hinder your ability to read and write individual pixel values.
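To get a feel for how much Huffman coding could save, here is a rough sketch that only computes the code length per byte value and the resulting total bit count. It does not build the actual bit stream, and the names huffmanCodeLengths and huffmanEncodedBits are made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <queue>
#include <utility>
#include <vector>

// Return the Huffman code length (in bits) for every byte value in data.
std::map<std::uint8_t, int> huffmanCodeLengths(const std::vector<std::uint8_t>& data) {
    std::map<std::uint8_t, std::size_t> freq;
    for (std::uint8_t b : data) {
        ++freq[b];
    }
    std::map<std::uint8_t, int> lengths;
    if (freq.empty()) {
        return lengths;
    }
    if (freq.size() == 1) {  // a single symbol still needs one bit per occurrence
        lengths[freq.begin()->first] = 1;
        return lengths;
    }
    // A node is (weight, symbols in its subtree); the lightest node is popped first.
    using Node = std::pair<std::size_t, std::vector<std::uint8_t>>;
    std::priority_queue<Node, std::vector<Node>, std::greater<Node>> heap;
    for (const auto& kv : freq) {
        heap.push(Node(kv.second, std::vector<std::uint8_t>{kv.first}));
    }
    while (heap.size() > 1) {
        Node a = heap.top(); heap.pop();
        Node b = heap.top(); heap.pop();
        // Merging two subtrees pushes every symbol in them one level deeper.
        for (std::uint8_t s : a.second) ++lengths[s];
        for (std::uint8_t s : b.second) ++lengths[s];
        a.second.insert(a.second.end(), b.second.begin(), b.second.end());
        heap.push(Node(a.first + b.first, a.second));
    }
    return lengths;
}

// Total number of bits the data would occupy with those code lengths.
std::size_t huffmanEncodedBits(const std::vector<std::uint8_t>& data) {
    const std::map<std::uint8_t, int> lengths = huffmanCodeLengths(data);
    std::size_t bits = 0;
    for (std::uint8_t b : data) {
        bits += static_cast<std::size_t>(lengths.at(b));
    }
    return bits;
}
```

For the seven bytes "aaaabbc" this gives lengths 1, 2 and 2, so 10 bits instead of 56. The code table itself also has to be stored, which this estimate ignores.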
I want to plot the waveform of a .wav file for a specific plotting width.
Which method should I use to display a correct waveform plot?
Any suggestions, tutorials or links are welcome.
Basic algorithm:
Find number of samples to fit into draw-window
Determine how many samples should be presented by each pixel
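Calculate the RMS (or peak) value for each pixel from its block of samples. Plain averaging does not work for audio signals, because the samples swing around zero and the mean of a block comes out close to zero.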
Draw the values.
Let's assume that n(number of samples)=44100, w(width)=100 pixels:
then each pixel should represent 44100/100 == 441 samples (blocksize)
for (x = 0; x < w; x++)
draw_pixel(x_offset + x,
y_baseline - rms(&mono_samples[x * blocksize], blocksize));
Stuff to try for a different visual appearance:
rms vs max value from block
overlapping blocks (blocksize x but advance x/2 for each pixel etc)
Downsampling would probably not work, as you would lose peak information.
Either use RMS (BlockSize depends on how far you are zoomed in):
float RMS = 0;
for (int a = 0; a < BlockSize; a++)
{
RMS += Samples[a]*Samples[a];
}
RMS = sqrt(RMS/BlockSize);
or Min/Max (this is what Cool Edit/Audition uses):
float Max = -10000000;
float Min = 1000000;
for (int a = 0; a < BlockSize; a++)
{
if (Samples[a] > Max) Max = Samples[a];
if (Samples[a] < Min) Min = Samples[a];
}
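Putting the two loops above together, here is a minimal sketch that reduces a mono buffer to one min/max/RMS triple per pixel column. It assumes float samples and non-overlapping blocks; Column and reduceToColumns are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Column {
    float min;
    float max;
    float rms;
};

// Reduce mono samples to `width` columns, one block of samples per column.
// Trailing samples that do not fill a whole block are ignored.
std::vector<Column> reduceToColumns(const std::vector<float>& samples, std::size_t width) {
    std::vector<Column> columns;
    if (width == 0 || samples.size() < width) {
        return columns;
    }
    const std::size_t blockSize = samples.size() / width;
    columns.reserve(width);
    for (std::size_t x = 0; x < width; ++x) {
        const float* block = &samples[x * blockSize];
        Column c{block[0], block[0], 0.0f};
        double sumSquares = 0.0;
        for (std::size_t a = 0; a < blockSize; ++a) {
            c.min = std::min(c.min, block[a]);
            c.max = std::max(c.max, block[a]);
            sumSquares += static_cast<double>(block[a]) * block[a];
        }
        c.rms = static_cast<float>(std::sqrt(sumSquares / static_cast<double>(blockSize)));
        columns.push_back(c);
    }
    return columns;
}
```

For drawing, a min/max display would typically use a vertical line from min to max in each column, while an RMS display draws a bar of height rms around the baseline.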
Almost any kind of plotting is platform specific. That said, .wav files are most commonly used on Windows, so it's probably a fair guess that you're interested primarily (or exclusively) in code for Windows as well. In this case, it mostly depends on your speed requirements. If you want a fairly static display, you can just draw with MoveTo and (mostly) LineTo. If that's not fast enough, you can gain a little speed by using something like Polyline.
If you want it substantially faster, chances are that your best bet is to use something like OpenGL or DirectX graphics. Either of these does the majority of real work on the graphics card. Given that you're talking about drawing a graph of sound waves, even a low-end graphics card with little or no work on optimizing the drawing will probably keep up quite easily with almost anything you're likely to throw at it.
Edit: As far as reading the .wav file itself goes, the format is pretty simple. Most .wav files are uncompressed PCM samples, so drawing them is a simple matter of reading the headers to figure out the sample size and number of channels, then scaling the data to fit in your window.
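As a sketch of the header reading, the following assumes the simplest "canonical" layout: a 44-byte header with a 16-byte PCM fmt chunk immediately followed by the data chunk. Real files can carry extra chunks, so treat this as a starting point only; WavInfo and parseCanonicalWavHeader are made-up names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct WavInfo {
    bool valid;
    std::uint16_t channels;
    std::uint32_t sampleRate;
    std::uint16_t bitsPerSample;
    std::uint32_t dataBytes;
};

// WAV fields are little-endian; assemble them byte by byte to stay portable.
static std::uint16_t readLE16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

static std::uint32_t readLE32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) | (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Parse the first 44 bytes of a file; returns valid == false for anything
// that is not uncompressed PCM in the canonical layout.
WavInfo parseCanonicalWavHeader(const std::vector<std::uint8_t>& bytes) {
    WavInfo info{false, 0, 0, 0, 0};
    if (bytes.size() < 44) {
        return info;
    }
    const std::uint8_t* p = bytes.data();
    if (std::memcmp(p, "RIFF", 4) != 0 || std::memcmp(p + 8, "WAVE", 4) != 0 ||
        std::memcmp(p + 12, "fmt ", 4) != 0 || std::memcmp(p + 36, "data", 4) != 0) {
        return info;
    }
    if (readLE16(p + 20) != 1) {  // audio format 1 means uncompressed PCM
        return info;
    }
    info.channels = readLE16(p + 22);
    info.sampleRate = readLE32(p + 24);
    info.bitsPerSample = readLE16(p + 34);
    info.dataBytes = readLE32(p + 40);
    info.valid = true;
    return info;
}
```

The number of sample frames is then dataBytes / (channels * bitsPerSample / 8), and the samples start at byte 44.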
Edit2: You have a couple of choices for handling left and right channels. One is to draw them in two separate plots, typically one above the other. Another is to draw them superimposed, but in different colors. Which is more suitable depends on what you're trying to accomplish -- if it's mostly to look cool, a superimposed, multi-color plot will probably work nicely. If you want to allow the user to really examine what's there in detail, you'll probably want two separate plots.
What exactly do you mean by a waveform? Are you trying to plot the level of the frequency components in the signal, a.k.a. the spectrum, most commonly seen in music visualizers, car stereos and boomboxes? If so, you should use the Fast Fourier Transform. FFT is a standard technique to split a time-domain signal into its individual frequencies. There are tons of good FFT library routines available.
In C++, you can use the openFrameworks library to set up a music player for wav, extract the FFT and draw it.
You can also use Processing with the Minim library to do the same. I have tried it and it is pretty straightforward.
Processing even has support for OpenGL and it is a snap to use.
I have a bitmap image context and want to make it appear blurry. The best thing I can think of is a Gaussian blur, but I have no real idea what this kind of algorithm looks like. Do you know any good tutorials or examples? The language does not matter much, as long as it's done by hand without relying too much on language-specific APIs. In Cocoa the lucky guys don't need to think about it; they just use an image filter that's already there. But I don't have anything like that in Cocoa Touch (Objective-C, iPhone OS).
This is actually quite simple. You have a filter pattern (also known as filter kernel) - a (small) rectangular array with coefficients - and just calculate the convolution of the image and the pattern.
for y = 1 to ImageHeight
    for x = 1 to ImageWidth
        newValue = 0
        for j = 1 to PatternHeight
            for i = 1 to PatternWidth
                newValue += OldImage[x-PatternWidth/2+i,y-PatternHeight/2+j] * Pattern[i,j]
        NewImage[x,y] = newValue
The pattern is just a Gauss curve in two dimensions or any other filter pattern you like. You have to take care at the edges of the image because the filter pattern will be partially outside of the image. You can just assume that these pixels are black, or use a mirrored version of the image, or whatever seems reasonable.
As a final note, there are faster ways to calculate a convolution using Fourier transforms, but this simple version should be sufficient for a first test.
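The pseudocode above could look roughly like this in C++. This is only a sketch for a greyscale image stored as row-major floats; it handles the edges by clamping to the nearest pixel (one of the "whatever seems reasonable" options), and the names gaussianKernel and gaussianBlur are my own.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Build a normalised (2*radius+1) x (2*radius+1) Gaussian kernel, row-major.
std::vector<float> gaussianKernel(int radius, float sigma) {
    const int size = 2 * radius + 1;
    std::vector<float> kernel(static_cast<std::size_t>(size) * static_cast<std::size_t>(size));
    float sum = 0.0f;
    for (int j = -radius; j <= radius; ++j) {
        for (int i = -radius; i <= radius; ++i) {
            const float w = std::exp(-static_cast<float>(i * i + j * j) / (2.0f * sigma * sigma));
            kernel[static_cast<std::size_t>((j + radius) * size + (i + radius))] = w;
            sum += w;
        }
    }
    for (float& w : kernel) {
        w /= sum;  // weights sum to 1 so overall brightness is preserved
    }
    return kernel;
}

// Direct 2D convolution; coordinates outside the image clamp to the border.
std::vector<float> gaussianBlur(const std::vector<float>& image, int width, int height,
                                int radius, float sigma) {
    const std::vector<float> kernel = gaussianKernel(radius, sigma);
    const int size = 2 * radius + 1;
    std::vector<float> out(image.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float acc = 0.0f;
            for (int j = -radius; j <= radius; ++j) {
                for (int i = -radius; i <= radius; ++i) {
                    const int sx = std::min(std::max(x + i, 0), width - 1);
                    const int sy = std::min(std::max(y + j, 0), height - 1);
                    acc += image[static_cast<std::size_t>(sy * width + sx)] *
                           kernel[static_cast<std::size_t>((j + radius) * size + (i + radius))];
                }
            }
            out[static_cast<std::size_t>(y * width + x)] = acc;
        }
    }
    return out;
}
```

For a colour bitmap you would run this once per channel. A common rule of thumb is a radius of about three times sigma, since the weights beyond that are negligible.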
The Wikipedia article has a sample matrix in addition to some standard information on the subject.
Best place for image processing is THIS. You can get matlab codes there.
And this Wolfram demo should clear any doubts about doing it by hand.
And if you don't want to learn too many things learn PIL(Python Imaging Library).
"Here" is exactly what you need.
Code copied from above link:
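from PIL import Image, ImageFilter

def filterBlur(im, ext=".png"):
    # Apply PIL's built-in blur filter and save the result
    im1 = im.filter(ImageFilter.BLUR)
    im1.save("BLUR" + ext)

filterBlur(Image.open("input.png"))  # "input.png" is a placeholder path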
I am developing a graphics application using Qt 4.5 and am putting images in the QPixmapCache, I wanted to optimise this so that if a user inserts an image which is already in the cache it will use that.
Right now each image has a unique id, which helps optimise paint events. However, I realise that if I could calculate a hash of the image, I could look it up in the cache to see if it already exists and use that (it would help more for duplicate objects, of course).
My problem is this: if it's a large QPixmap, will calculating a hash of it slow things down, or is there a quicker way?
A couple of comments on this:
If you're going to be generating a hash/cache key of a pixmap, then you may want to skip the QPixmapCache and use QCache directly. This would eliminate some overhead of using QStrings as keys (unless you also want to use the file path to locate the items)
As of Qt4.4, QPixmap has a "hash" value associated with it (see QPixmap::cacheKey() ). The documentation claims "Distinct QPixmap objects can only have the same cache key if they refer to the same contents." However, since Qt uses shared-data copying, this may only apply to copied pixmaps and not to two distinct pixmaps loaded from the same image. A bit of testing would tell you if it works, and if it does, it would let you easily get a hash value.
If you really want to do a good, fairly quick cache with removing duplications, you might want to look at your own data structure that sorts according to sizes, color depths, image types, and things such as that. Then you would only need to hash the actual image data after you find the same type of image with the same dimensions, bit-depths, etc. Of course, if your users generally open a lot of images with those things the same, it wouldn't help at all.
Performance: Don't forget about the benchmarking stuff Qt added in 4.5, which would let you compare your various hashing ideas and see which one runs the fastest. I haven't checked it out yet, but it looks pretty neat.
Just in case anyone comes across this problem (and isn't too terribly experienced with hashing things, particularly something like an image), here's a VERY simple solution I used for hashing QPixmaps and entering them into a lookup table for later comparison:
qint32 HashClass::hashPixmap(QPixmap pix)
{
QImage image = pix.toImage();
qint32 hash = 0;
for(int y = 0; y < image.height(); y++)
{
for(int x = 0; x < image.width(); x++)
{
QRgb pixel = image.pixel(x,y);
hash += pixel;
hash += (hash << 10);
hash ^= (hash >> 6);
}
}
return hash;
}
Here is the hashing function itself (you can have it hash into a qint64 if you desire fewer collisions). As you can see, I convert the pixmap into a QImage, simply walk through its dimensions, perform a very simple one-at-a-time hash on each pixel and return the final result. There are many ways to improve this implementation (see the other answers to this question), but this is the basic gist of what needs to be done.
The OP mentioned how he would use this hashing function to then construct a lookup table for later comparing images. This would require a very simple lookup initialization function -- something like this:
void HashClass::initializeImageLookupTable()
{
imageTable.insert(hashPixmap(QPixmap(":/Image_Path1.png")), "ImageKey1");
imageTable.insert(hashPixmap(QPixmap(":/Image_Path2.png")), "ImageKey2");
imageTable.insert(hashPixmap(QPixmap(":/Image_Path3.png")), "ImageKey3");
// Etc...
}
I'm using a QMap here called imageTable which would need to be declared in the class as such:
QMap<qint32, QString> imageTable;
Then, finally, when you want to compare an image to the images in your lookup table (ie: "what image, out of the images I know it can be, is this particular image?"), you just call the hashing function on the image (which I'm assuming will also be a QPixmap) and the return QString value will allow you to figure that out. Something like this would work:
void HashClass::compareImage(const QPixmap& pixmap)
{
QString value = imageTable.value(hashPixmap(pixmap)); // value() returns an empty QString for an unknown hash instead of inserting a new entry
// Do whatever needs to be done with the QString value and pixmap after this point.
}
That's it. I hope this helps someone -- it would have saved me some time, although I was happy to have the experience of figuring it out.
Hash calculations should be pretty quick (somewhere above 100 MB/s if no disk I/O is involved), depending on which algorithm you use. Before hashing, you could also do some quick tests to sort out potential candidates, e.g. images must have the same width and height, otherwise it's useless to compare their hash values.
Of course, you should also keep the hash values for inserted images so you only have to calculate a hash for new images and won't have to calculate it again for the cached images.
If the images are different enough, it would perhaps be enough to hash not the whole image but a smaller thumbnail or a part of the image (e.g. the first and last 10 lines). This will be faster, but will lead to more collisions.
I'm assuming you're talking about actually calculating a hash over the data of the image rather than getting the unique id generated by Qt.
Depending on your images, you probably don't need to go over the whole image to calculate a hash. Maybe only read the first 10 pixels? first scan line?
Maybe a pseudo random selection of pixels from the entire image? (with a known seed so that you could repeat the sequence) Don't forget to add the size of the image to the hash as well.
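A rough, Qt-free sketch of that idea: mix the image size into the hash first, then hash a fixed pseudo-random selection of pixels. It assumes the pixels are available as 32-bit values (e.g. from QImage::pixel()) and uses FNV-1a as the hash; fnv1aMix and sampledImageHash are made-up names, and the seed and sample count are arbitrary.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Feed the four bytes of `value` into a 64-bit FNV-1a hash state.
std::uint64_t fnv1aMix(std::uint64_t hash, std::uint32_t value) {
    for (int byte = 0; byte < 4; ++byte) {
        hash ^= (value >> (8 * byte)) & 0xFFu;
        hash *= 0x100000001b3ULL;  // FNV-1a 64-bit prime
    }
    return hash;
}

// Hash the image size plus `sampleCount` pseudo-randomly chosen pixels.
// The fixed seed makes every image get sampled at the same positions.
std::uint64_t sampledImageHash(const std::vector<std::uint32_t>& pixels,
                               std::uint32_t width, std::uint32_t height,
                               std::size_t sampleCount = 64) {
    std::uint64_t hash = 0xcbf29ce484222325ULL;  // FNV-1a 64-bit offset basis
    hash = fnv1aMix(hash, width);
    hash = fnv1aMix(hash, height);
    if (pixels.empty()) {
        return hash;
    }
    std::mt19937 rng(12345u);  // known seed so the sequence is repeatable
    for (std::size_t n = 0; n < sampleCount; ++n) {
        const std::size_t index = static_cast<std::size_t>(rng()) % pixels.size();
        hash = fnv1aMix(hash, pixels[index]);
    }
    return hash;
}
```

Two images that differ only in pixels that are never sampled will collide, so a match should still be confirmed with a full comparison before reusing the cached pixmap.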