I have images that I am using for a computer vision task. The task is sensitive to image quality. I'd like to remove all images that are below a certain threshold, but I am unsure if there is any method/heuristic to automatically detect images that are heavily compressed via JPEG. Anyone have an idea?
Image Quality Assessment is a rapidly developing research field. As you don't mention being able to access the original (uncompressed) images, you are interested in no-reference image quality assessment. This is actually a pretty hard problem, but here are some points to get you started:
Since you mention JPEG, there are two major degradation features that manifest themselves in JPEG-compressed images: blocking and blurring
No-reference image quality assessment metrics typically look for those two features
Blocking is fairly easy to pick up, as it appears only on macroblock boundaries. Macroblocks are a fixed size -- 8x8 or 16x16 depending on what the image was encoded with
Blurring is a bit more difficult. It occurs because higher frequencies in the image have been attenuated (removed). You can break up the image into blocks, DCT (Discrete Cosine Transform) each block and look at the high-frequency components of the DCT result. If the high-frequency components are lacking for a majority of blocks, then you are probably looking at a blurry image
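For illustration, here is a minimal Python/OpenCV sketch of that block-DCT idea; the block size, the cutoff for what counts as "high frequency", and the energy threshold are assumptions you would need to tune for your data:

import cv2
import numpy as np

def dct_blur_score(path, block=8, hf_cutoff=4, energy_thresh=0.01):
    # Fraction of blocks whose high-frequency DCT energy is negligible
    # relative to the DC term; closer to 1.0 suggests a blurrier image.
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE).astype(np.float32)
    h, w = img.shape
    h, w = h - h % block, w - w % block  # crop to a multiple of the block size
    lacking = total = 0
    for y in range(0, h, block):
        for x in range(0, w, block):
            coeffs = cv2.dct(img[y:y + block, x:x + block])
            hf = np.abs(coeffs[hf_cutoff:, hf_cutoff:]).sum()  # high-frequency corner
            dc = abs(coeffs[0, 0]) + 1e-6
            total += 1
            if hf / dc < energy_thresh:
                lacking += 1
    return lacking / total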
Another approach to blur detection is to measure the average width of edges of the image. Perform Sobel edge detection on the image and then measure the distance between local minima/maxima on each side of the edge. Google for "A no-reference perceptual blur metric" by Marziliano -- it's a famous approach. "No Reference Block Based Blur Detection" by Debing is a more recent paper
Regardless of what metric you use, think about how you will deal with false positives/negatives. As opposed to simple thresholding, I'd use the metric result to sort the images and then snip the end of the list that looks like it contains only blurry images.
Your task will be a lot simpler if your image set contains fairly similar content (e.g. faces only). This is because image quality assessment metrics can often be influenced by image content, unfortunately.
Google Scholar is truly your friend here. I wish I could give you a concrete solution, but I don't have one yet -- if I did, I'd be a very successful Masters student.
UPDATE:
Just thought of another idea: for each image, re-compress the image with JPEG and examine the change in file size before and after re-compression. If the file size after re-compression is significantly smaller than before, then it's likely the image is not heavily compressed, because it had some significant detail that was removed by re-compression. Otherwise (very little difference or file size after re-compression is greater) it is likely that the image was heavily compressed.
The use of the quality setting during re-compression will allow you to determine what exactly heavily compressed means.
If you're on Linux, this shouldn't be too hard to implement using bash and ImageMagick's convert utility.
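Here's a rough sketch of that file-size heuristic in Python with Pillow instead of bash/convert; the quality setting and the interpretation of the ratio are assumptions to tune:

import os
from io import BytesIO
from PIL import Image

def recompression_ratio(path, quality=85):
    # Size of a re-compressed copy divided by the original file size.
    # Values close to (or above) 1.0 hint that the original was already
    # heavily compressed; the quality value defines what "heavily" means.
    buf = BytesIO()
    Image.open(path).convert("RGB").save(buf, format="JPEG", quality=quality)
    return buf.tell() / os.path.getsize(path)

# e.g. sort a folder so the most suspect images end up at one end of the list:
# paths = sorted(glob.glob("*.jpg"), key=recompression_ratio)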
You can try other variations of this approach:
Instead of JPEG compression, try another form of degradation, such as Gaussian blurring
Instead of merely comparing file-sizes, try a full reference metric such as SSIM -- there's an OpenCV implementation freely available. Other implementations (e.g. Matlab, C#) also exist, so look around.
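As a sketch of the SSIM variation (using the scikit-image implementation rather than the OpenCV one mentioned above; the quality setting is again an assumption): compare each image against a re-compressed copy of itself, and treat a very high similarity as a hint that re-compression had nothing left to remove.

import cv2
from skimage.metrics import structural_similarity

def ssim_after_recompression(path, quality=85):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    ok, buf = cv2.imencode(".jpg", img, [cv2.IMWRITE_JPEG_QUALITY, quality])
    recompressed = cv2.imdecode(buf, cv2.IMREAD_GRAYSCALE)
    # SSIM close to 1.0 => re-compression barely changed the image,
    # i.e. it probably had little high-frequency detail left to lose.
    return structural_similarity(img, recompressed)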
Let me know how you go.
I had many photos shot of an ancient book (so similar layout, two pages per image), but some were so blurred that the text could not be read. I searched for a ready-made batch script to find the most blurred ones, but I didn't find anything useful, so I took another script found on the net (based on ImageMagick, but the original is no longer available; I couldn't retrieve the author for the credits!) that assesses the blur level of a single image, tweaked it, and automated it over a whole folder. I uploaded it here:
https://gist.github.com/888239
hoping it will be useful for someone else. It works on a Linux system and uses ImageMagick (plus some commonly installed command-line tools, such as gawk, sort, grep, etc.).
One simple heuristic could be to look at width * height * color depth < sigma * file size. You would have to determine a good value for sigma, of course. sigma would be dependent on the expected entropy of the images you are looking at.
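A minimal sketch of that heuristic, assuming 8 bits per channel and a guessed sigma (both would need calibration against your own images):

import os
from PIL import Image

def passes_size_heuristic(path, sigma=20):
    # Keep images where raw pixel data < sigma * file size, i.e. the file
    # is "big enough" for its dimensions; sigma = 20 is only a guess.
    with Image.open(path) as im:
        raw_bytes = im.width * im.height * len(im.getbands())
    return raw_bytes < sigma * os.path.getsize(path)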
My objective is to recognize the footprints of buildings on aerial photos. Having heard about recent progress in machine vision (the ImageNet Large Scale Visual Recognition Challenges), I thought I could (at least) try to use neural networks for this task.
Can anybody give me the idea what should be the topology of such a network? I guess it should have as many outputs as inputs (which means all the pixels in picture) since I want to recognize the outlines of buildings with their (at least approximate) placement on the picture.
I guess the input pictures should be of a standard size, with each pixel normalized to grey scale or YUV color space (1 value per color), and maybe with normalized resolution (each pixel should represent a fixed real-world size). I am not sure if the picture could be preprocessed in any other way before feeding it into the net, maybe by extracting the edges first?
The tricky part is how the outputs should be represented and how to train the net. Using just, e.g., output = 0 for pixels within a building footprint and 1 for pixels outside it might not be the best idea. Maybe I should teach the network to recognize the edges of buildings instead, so that pixels which represent building edges have 1's and the rest of the pixels have 0's?
Can anybody throw in some suggestions about network topology/inputs/outputs formats?
Or maybe this task is hopelessly difficult and I have 0 chances to solve it?
I think we need a better definition of "buildings". If you want to do building "detection", that is detect the presence of a building of any shape/size, this is difficult for a cascade classifier. You can try the following, though:
Partition a set of known images into fixed-size blocks.
Label each block as "building", "not building", or "boundary" (includes portions of both).
Extract basic features like intensity histograms, edges, Hough lines, HOG, etc.
Train SVM classifiers based on these features (you can try others, too, but I recommend SVM by experience).
Now you can partition your images again and use the trained classifier to get the results. The results will have to be combined to identify buildings.
This will still need some testing to get the parameters (size of histograms, parameters of the SVM classifier, etc.) right.
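A rough sketch of this block/feature/SVM pipeline in Python, using scikit-image HOG features and scikit-learn; the block size, the HOG parameters, and the placeholder names (labelled_blocks, labels, new_image) are all assumptions:

import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def block_features(block):
    # HOG descriptor for one fixed-size grayscale block; intensity histograms,
    # edge maps, Hough-line counts, etc. could be concatenated here as well.
    return hog(block, pixels_per_cell=(16, 16), cells_per_block=(2, 2))

def blocks(image, size=64):
    # Cut a grayscale image into non-overlapping size x size blocks.
    h, w = image.shape
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            yield image[y:y + size, x:x + size]

# Training (labels: 0 = not building, 1 = building, 2 = boundary):
# X = np.array([block_features(b) for b in labelled_blocks])
# clf = SVC(kernel="rbf").fit(X, labels)
# Prediction on a new aerial image, then merge adjacent "building" blocks:
# preds = clf.predict(np.array([block_features(b) for b in blocks(new_image)]))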
I have used this approach to detect "food" regions on images. The accuracy was below 70%, but my guess is that it will be better for buildings.
I am doing a project on face recognition, for which I have already used different methods like eigenfaces, Fisherfaces, LBP histograms and SURF. But these methods are not giving me accurate results. SURF gives good matches for exactly the same images, but I need to match one image against different poses of the same person (wearing glasses, side pose, partially covered face, etc.). LBP compares histograms of the images, i.e. only colour information, so when there is high variation in lighting conditions it does not give good results. So I heard about neural networks, but I don't know much about them. Is it possible to train the system very accurately by using neural networks? If possible, how can we do that?
According to this OpenCV page, there does seem to be some support for machine learning. That being said, the support does seem to be a bit limited.
What you could do, would be to:
Use OpenCV to extract the face of the person.
Change the image to grey scale.
Try to manipulate the image so that the face is always the same size.
All the above should be doable with OpenCV itself (could be wrong, haven't messed with OpenCV in a while) so that should save you some time.
Next, you take the image, as a bitmap maybe, and feed the bitmap as a vector to the neural network. Alternatively, as @MatthiasB recommended, you could feed the features instead of individual pixels. This would simplify the data being passed, thus making the network easier to train.
As for training, you manipulate these images as above, and then feed them to the network. If a person uses glasses occasionally, you could have cases of the same person with and without glasses, etc.
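To make steps 1-3 concrete, here is one possible OpenCV sketch (a Haar cascade is only one way to do the face extraction, and the 96x96 target size is an arbitrary assumption):

import cv2

# Stock frontal-face Haar cascade shipped with opencv-python
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def preprocess_face(path, size=(96, 96)):
    # Detect the face, convert to grey scale, normalise the size.
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    face = cv2.resize(gray[y:y + h, x:x + w], size)
    # face.flatten() / 255.0 could then be fed to the network as a vector,
    # or replaced by extracted features as suggested above.
    return face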
I am using FCam to take pictures, and right now, without modification, the pictures are on par with a typical smartphone camera. FCam advertises HDR and low-light performance, but I don't see any examples of how to use that when taking pictures.
How do I take HDR pictures? From my experience with SLRs, you normally take 3 pictures, 1 under, 1 over, and 1 exposed properly for the scene.
Since I will be taking many pictures, how should I blend those pixels together? An average?
Walter
The FCam project page includes a complete camera app, FCamera (check the FCam Download Page, last item). Its "HDR Viewfinder" simply averages a long and a short exposure together, and its "HDR Capture" automatically records a burst of suitably-exposed shots. See src/CameraThread.cpp in the sources; I'm not sure how appropriate it is to quote from that here, but you'll find both pieces in CameraThread::run().
It doesn't average the HDR images for you; it records them as a sequence. I guess that's by intent: much of the "HDR" appeal comes from carefully tuning the tone mapping after the averaging process, i.e. adjusting how exactly the dynamic range compression back to 8 bits is performed. If you did that in a hardcoded way on camera, you'd restrict the photographer's options for achieving the optimal output. The MPI has a research group on HDR imaging techniques that provides source code for this purpose.
In short, a "poor man's HDR" would just be an average. A "proper HDR" will never be 8-bit JPEG because that throws away the "high" bit in "high dynamic range" - for that reason, the conversion from HDR (which will have 16bit/color or even more) to e.g. JPEG is usually done as postprocessing (off-camera) step, from the HDR image sequence.
Note on HDR video
For HDR video, if you're recording with a single sensor on a hand-held you'll normally introduce motion between the images that form the "HDR sequence" (your total exposure time equals the sum of all subexposures, plus latency from sensor data reads and camera controller reprogramming).
That means image registration should be attempted before the actual overlay and final tone mapping operation as well, unless you're ok with the blur. Registration is somewhat compute intensive and another good reason to record the image stream first and perform the HDR video creation later (with some manual adjustment offered). The OpenCV library provides registration / matching functions.
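As one possible choice for that registration step (an assumption on my part; the answer only notes that OpenCV provides such functions), OpenCV's MTB alignment was designed for exposure-bracketed stacks:

import cv2

# The bracketed frames from a hand-held burst (placeholder file names)
images = [cv2.imread(p) for p in ["under.jpg", "normal.jpg", "over.jpg"]]

# Median Threshold Bitmap alignment; shifts the frames in place so they
# can be merged / tone mapped afterwards
cv2.createAlignMTB().process(images, images)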
The abovementioned MPI software is PFSTools, particularly the Tone Mapping operators (PFStmo) library. The research papers by one of the authors provide a good starting point; as to your question on how to perform the postprocessing, PFSTools are command-line utilities that interoperate/pass data via UNIX pipes; on Maemo / the N900, their use is straightforward thanks to the full Linux environment; just spawn a shell script via system().
Is there a table that gives the compression ratio of a jpeg image at a given quality?
Something like the table given on the wiki page, except for more values.
A formula could also do the trick.
Bonus: Are the [compression ratio] values on the wiki page roughly true for all images? Does the ratio depend on what the image is and the size of the image?
Purpose of these questions: I am trying to determine the upper bound of the size of a compressed image for a given quality.
Note: I am not looking to make a table myself(I already have). I am looking for other data to check with my own.
I had exactly the same question and I was disappointed that no one had created such a table (studies based on a single classic Lena image or a JPEG tombstone look ridiculous). That's why I made my own study. I cannot say that it is perfect, but it is definitely better than the others.
I took 60 real-life photos from different devices with different dimensions. I created a script which compresses them with different JPEG quality values (it uses our company's imaging library, but it is based on libjpeg, so it should be fine for other software as well) and saves the results to a CSV file. After some Excel magic, I came to the following values (note: I did not calculate anything for JPEG qualities lower than 55 as they seemed useless to me):
Quality   Avg. compression ratio
Q=55      43.27
Q=60      36.90
Q=65      34.24
Q=70      31.50
Q=75      26.00
Q=80      25.06
Q=85      19.08
Q=90      14.30
Q=95       9.88
Q=100      5.27
To tell the truth, the dispersion of the values is significant (e.g. for Q=55 the minimum compression ratio is 22.91 while the maximum is 116.55) and the distribution is not normal. So it is not so easy to decide which value should be taken as typical for a specific JPEG quality. But I think these values are good as a rough estimate.
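For anyone who wants to reproduce the measurement on their own photo set, here is a rough Python/Pillow sketch of the kind of script described above (the folder name and the assumption of 24-bit uncompressed size are mine, not the author's):

import glob
import os
from PIL import Image

qualities = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
for q in qualities:
    ratios = []
    for path in glob.glob("photos/*.jpg"):
        with Image.open(path) as im:
            rgb = im.convert("RGB")
            raw_size = rgb.width * rgb.height * 3   # uncompressed 24-bit size
            rgb.save("tmp.jpg", format="JPEG", quality=q)
        ratios.append(raw_size / os.path.getsize("tmp.jpg"))
    print(f"Q={q}\t{sum(ratios) / len(ratios):.2f}")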
I wrote a blog post which explains how I received these numbers.
http://www.graphicsmill.com/blog/2014/11/06/Compression-ratio-for-different-JPEG-quality-values
Hopefully someone will find it useful.
Browsing Wikipedia a little more led to http://en.wikipedia.org/wiki/Standard_test_image and Kodak's test suite. Although they're a little outdated and small, you could make your own table.
Alternately, pictures of stars and galaxies from NASA.gov should stress the compressor well, being large, almost exclusively composed of tiny speckled detail, and distributed in uncompressed format. In other words, HUBBLE GOTCHOO!
The compression you get will depend on what the image is of as well as the size. Obviously a larger image will produce a larger file even if it's of the same scene.
As an example, a random set of photos from my digital camera (a Canon EOS 450) range from 1.8MB to 3.6MB. Another set has even more variation - 1.5MB to 4.6MB.
If I understand correctly, one of the key mechanisms for attaining compression in JPEG is using frequency analysis on every 8x8 pixel block of the image and scaling the resulting amplitudes with a "quantization matrix" that varies with the specified compression quality.
The scaling of high-frequency components often results in the block containing many zeros, which can be encoded at negligible cost.
From this we can deduce that in principle there is no relation between the quality and the final compression ratio that will be independent of the image. The number of frequency components that can be dropped from a block without perceptually altering its content significantly will necessarily depend on the intensity of those components, i.e. whether the block contains a sharp edge, highly variable content, noise, etc.
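To see this mechanism in action, here is a small Python/OpenCV sketch that quantizes one 8x8 block with the standard JPEG luminance table; the input file name is a placeholder, and a real encoder scales this table up or down according to the quality setting:

import cv2
import numpy as np

# One 8x8 grayscale block, level-shifted as JPEG does before the DCT
block = cv2.imread("photo.png", cv2.IMREAD_GRAYSCALE)[:8, :8].astype(np.float32) - 128

# Standard JPEG luminance quantization table (ITU T.81, Annex K)
Q50 = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99]], dtype=np.float32)

coeffs = cv2.dct(block)
quantized = np.round(coeffs / Q50)   # larger divisors (lower quality) => more zeros
print("zero coefficients:", int(np.count_nonzero(quantized == 0)), "out of 64")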
I've found myself evaluating both of these libs. Apart from what the GraphicsMagick comparison says, I see that ImageMagick still got updates and it seems that the two are almost identical.
I'm just looking to do basic image manipulation in C++ (i.e. image load, filters, display); are there any differences I should be aware of when choosing between these libraries?
As with many things in life, different people have different ideas about what is best. If you ask a landscape photographer who wanders around in the rain in Scotland's mountains which is the best camera in the world, he's going to tell you a light-weight, weather-sealed camera. Ask a studio photographer, and he'll tell you the highest resolution one with the best flash sync speed. And if you ask a sports photographer he'll tell you the one with the fastest autofocus and highest frame rate. So it is with ImageMagick and GraphicsMagick.
Having answered around 2,000 StackOverflow questions on ImageMagick over the last 5+ years, I make the following observations...
In terms of popularity...
ImageMagick questions on SO outnumber GraphicsMagick questions by a factor of 12:1 (7,375 questions vs 611 as of May 2019), and
ImageMagick followers on SO outnumber GraphicsMagick followers by 15:1 (387 followers versus 25 as of May 2019).
In terms of performance...
I am happy to concede that GraphicsMagick may be faster for some, but not all problems. However, if speed is your most important consideration, I think you should probably be using either libvips, or parallel code on today's multi-core CPUs or heavily SIMD-optimised (or GPU-optimised) libraries like OpenCV.
In terms of features and flexibility...
There is one very clear winner here - ImageMagick. My experience is that there are many features missing from GraphicsMagick which are present in ImageMagick and I list some of these below, in no particular order.
I freely admit I am not as familiar with GraphicsMagick as I am with ImageMagick, but I made my very best effort to find any mention of the features in the most recent GraphicsMagick source code. So, for Canny Edge Detector, I ran the following command on the GM source code:
find . -type f -exec grep -i Canny {} \;
and found nothing.
Canny Edge detector
This appears to be completely missing in GM. See -canny radiusxsigma{+lower-percent}{+upper-percent} in IM.
See an example here, with a sample of edge detection on the Lena image.
Parenthesised processing, sophisticated re-sequencing
This is a killer feature of ImageMagick that I frequently sorely miss when having to use GM. IM can load, or create, or clone a whole series of images and apply different processing selectively to specific images and re-sequence, duplicate and re-order them very simply and conveniently. It is hard to convey the incredible flexibility this affords you in a short answer.
Imagine you want to do something fairly simple like load image A and blur it, load image B and make it greyscale and then place the images side-by-side with Image B on the left. That looks like this with ImageMagick:
magick imageA.png -blur x3 \( imageB.png -colorspace gray \) +swap +append result.png
You can't even get started with GM; it will complain about the parentheses. If you remove them, it will complain about swapping the image order. If you remove that, it will apply the greyscale conversion to both images (because it doesn't understand parentheses) and place imageA on the left.
See the following sequencing commands in IM:
-swap
-clone
-duplicate
-delete
-insert
-reverse
fx DIY Image Processing Operator
IM has the -fx operator which allows you to create and experiment with incredibly sophisticated image processing. You can have a function evaluated for every single pixel in an image. The function can be as complicated as you like (save it in a file if you want to) and can use all mathematical operations, ternary-style if statements, references to pixels even in other images and their brightness or saturation, and so on.
Here are a couple of examples:
magick rose: -channel G -fx 'sin(pi*i/w)' -separate fx_sine_gradient.gif
magick -size 80x80 xc: -channel G -fx 'sin((i-w/2)*(j-h/2)/w)/2+.5' -separate fx_2d_gradient.gif
A StackOverflow answer that uses this feature to great effect in processing green-screen (chroma-keyed) images is here.
Fourier (frequency domain) Analysis
There appears to be no mention of forward or reverse Fourier Analysis in GM, nor the High Dynamic Range support (see later) that is typically required to support it. See -fft in IM.
Connected Component Analysis / Labelling/ Blob Analysis
There appears to be no "Connected Component Analysis" in GM - also known as "labelling" and "Blob Analysis". See -connected-components connectivity for 4- and 8-connected blob analysis.
This feature alone has provided 60+ answers - see here.
Hough Line Detection
There appears to be no Hough Line Detection in GM. See -hough-lines widthxheight{+threshold} in IM.
See a description of the feature here, with an example of detected lines.
Moments and Perceptual Hash (pHash)
There appears to be no support for image moments calculation (centroids and higher orders), nor Perceptual Hashing in GM. See -moments in IM.
Morphology
There appears to be no support for Morphological processing in GM. In IM there is sophisticated support for:
dilation
erosion
morphological opening and closing
skeletonisation
distance morphology
top hat and bottom hat morphology
Hit and Miss morphology - line ends, line junctions, peaks, ridges, Convex Hulls etc
See all the sophisticated processing you can do with this great tutorial.
Contrast Limited Adaptive Histogram Equalisation - CLAHE
There appears to be no support for Contrast Limited Adaptive Histogram Equalisation in GM. See -clahe widthxheight{%}{+}number-bins{+}clip-limit{!} in IM.
HDRI - High Dynamic Range Imaging
There appears to be no support for High Dynamic Range Imaging in GM - just 8, 16, and 32-bit integer types.
Convolution
ImageMagick supports many types of convolution:
Difference of Gaussians DoG
Laplacian
Sobel
Compass
Prewitt
Roberts
Frei-Chen
None of these are mentioned in the GM source code.
Magick Persistent Register (MPR)
This is an invaluable feature present in ImageMagick that allows you to write intermediate processing results to named chunks of memory during processing without the overhead of writing to disk. For example, you can prepare a texture or pattern and then tile it over an image, or prepare a mask and then alter it and apply it later in the same processing without going to disk.
Here's an example:
magick tree.gif -flip -write mpr:tree +delete -size 64x64 tile:mpr:tree mpr_tile.gif
Broader Colourspace Support
IM supports the following colourspaces not found in GM:
CIELab
HCL
HSI
LMS
others.
Pango Support
IM supports Pango Text Markup Language which is similar to HTML and allows you to annotate images with text that changes:
font, colour, size, weight, italics
subscript, superscript, strike-through
justification
mid-sentence and much, much more. There is a great example here.
Shrink-on-load with JPEG
This invaluable feature allows the library to shrink JPEG images as they are read from disk, so that only the necessary coefficients are read, I/O is reduced, and memory consumption is minimised. It can massively improve performance when down-scaling images.
See example here.
Defined maximum JPEG size when writing
IM supports the much-requested option to specify a maximum filesize when writing JPEG files, -define jpeg:extent=400KB for example.
Polar coordinate transforms
IM supports conversion between cartesian and polar coordinates, see -distort polar and -distort depolar.
Statistics and operations on customisable areas
With its -statistic MxN operator, ImageMagick can generate many useful kinds of statistics and effects. For example, you can set each pixel in an image to the gradient (difference between brightest and darkest) of its 5x3 neighbourhood:
magick image.png -statistic gradient 5x3 result.png
Or you can set each pixel to the median of its 1x200 neighbourhood:
magick image.png -statistic median 1x200 result.png
See example of application here.
Sequences of images
ImageMagick supports sequences of images, so if you have a set of very noisy images shot at high ISO, you can load up the entire sequence and, for example, take the median or average of all images to reduce noise. See the -evaluate-sequence operator. I do not mean the median over a surrounding neighbourhood in a single image; I mean the median of all images at each pixel position.
The above is not an exhaustive list by any means; they are just the first few things that came to mind when I thought about the differences. I didn't even mention support for HEIC (Apple's format for iPhone images), increasingly common High Dynamic Range formats such as EXR, or any others. In fact, if you compare the file formats supported by the two products (gm convert -list format and magick identify -list format) you will find that IM supports 261 formats and GM supports 192.
As I said, different people have different opinions. Choose the one that you like and enjoy using it.
As always, I am indebted to Anthony Thyssen for his excellent insights and discourse on ImageMagick at https://www.imagemagick.org/Usage/ Thanks also to Fred Weinhaus for his examples.
From what I have read, GraphicsMagick is more stable and faster.
I did a couple of unscientific tests and found gm to be twice as fast as im (doing a resize).
I found ImageMagick to be incredibly slow for processing TIFF group-4 images (B&W document images), mainly due to the fact that it converts from 1-bit-per-pixel to 8 and back again to do any image manipulation. The GraphicsMagick group overhauled the TIFF format support with their version 1.2, and it is much faster at processing these types of images than the original ImageMagick was. The current GraphicsMagick stable release is at 1.3.5.
I use ImageMagick when speed isn't a factor. However on the server side, where tens of thousands of images are being processed daily, GraphicsMagick is quite noticeably faster - in some cases up to 50% faster in benchmarks!
History
GraphicsMagick was forked from ImageMagick back in 2002 due to disputes between the founding developers; thus they share the same codebase.
Ref : https://en.wikipedia.org/wiki/GraphicsMagick
Goal
GraphicsMagick
focuses on a simple, stable, and clearer codebase / architecture
ImageMagick
focuses on rolling out new features and extending a wider toolbase
Other than speed, ImageMagick adds a number of CLI tools to the terminal shell, whereas GraphicsMagick is a single tool which you call.
CLI interface design
GraphicsMagick
gm <command> <options> <file>
ImageMagick
convert <options> <file>
compare <options> <file>
IMHO, I prefer (in fact, only use) GraphicsMagick (gm) over ImageMagick, as the latter has a higher chance of tool-name clashes, which cause lots of issues in finding out why certain tools are not running, especially during server-side automation tasks. In summary, GraphicsMagick has a much clearer design.
Imagine a binary called convert in a project: is it ImageMagick's convert or your own hand-rolled tool in the project that will be called?
list of imagemagick tools (including convert, compare, display) : https://imagemagick.org/script/command-line-tools.php
list of graphicsmagick commands :
http://www.graphicsmagick.org/utilities.html
Note: as of v7, as mentioned by Mark S, ImageMagick is now distributed as a single binary, which also supports the older v6 commands.
Performance
A simple memory consumption test can be found here:
https://coderwall.com/p/1l7h-a/imagemagick-bloat-graphicsmagick
Dependencies
GraphicsMagick depends on 36 libraries whereas ImageMagick requires 64. Ref : http://www.graphicsmagick.org/1.3/FAQ.html
Note that GraphicsMagick provides API and ABI stability, which isn't part of the guarantee for ImageMagick. This would be important in the long run unless you are vendoring all your dependencies.
GraphicsMagick was an early fork of ImageMagick. You can read about ImageMagick's history and the fork to GraphicsMagick at https://imagemagick.org/script/history.php. It seems that ImageMagick has continued to be developed rather extensively, while GraphicsMagick has remained more or less stagnant since the fork.