Minor image differences in image manipulation program - c++

I've implemented a method that takes an input image, flips it around a vertical line through its center, and saves the result to an output image file, so whatever was on the left ends up on the right and vice versa. The output looks like a perfect flip. However, we are given the reference flipped image that the output is supposed to match, and when I compared the two with the diff utility in the terminal, it reported differences. Using a program called Kaleidoscope, I found the difference: a handful of pixels are colored differently than they should be. I'm not sure why, since my code doesn't even manipulate RGB values.

What image format did you save as? If you used a lossy compression, such as JPEG, then the image colours will always be slightly different, as they have been re-compressed. You should use a non-lossy format such as PNG.
You should also not use 'diff' to look at images. I don't know what Kaleidoscope is, but the ImageMagick 'compare' utility is good for looking at the difference between two images. 'diff' compares the bytes of the files, so it will almost always report a difference between two images, even if their pixels are identical and you used a lossless format, because the encoder that re-saved the file may have used different compression settings than the original.
Also, you say you were given the flipped image file (assuming this is a homework thing). In that case, it's possible that the person who generated that file made the mistake (e.g., using a lossy compression). I would not worry about minor pixel differences in that case.

If your function works well, it should be its own inverse (as a necessary condition, although not enough to prove correctness).
Check if
a == flipleft(flipleft(a))
What if your trusted third-party testing software has a bug?
HTH!
Edit
Also check the vertical center of the image, to be sure that when the number of horizontal pixels is even you are swapping the two in the middle.
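To make that check concrete, here is a minimal sketch. The Pixel and Image types are hypothetical stand-ins for whatever your assignment provides; only the name flipleft comes from the expression above. Looping over x < width / 2 is one way to get the middle columns right for both even and odd widths.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal image type: row-major RGB pixels.
struct Pixel {
    std::uint8_t r, g, b;
    bool operator==(const Pixel& o) const { return r == o.r && g == o.g && b == o.b; }
};

struct Image {
    std::size_t width = 0;
    std::size_t height = 0;
    std::vector<Pixel> pixels;  // size == width * height

    Pixel& at(std::size_t x, std::size_t y) { return pixels[y * width + x]; }
    const Pixel& at(std::size_t x, std::size_t y) const { return pixels[y * width + x]; }
    bool operator==(const Image& o) const {
        return width == o.width && height == o.height && pixels == o.pixels;
    }
};

// Mirror the image around its vertical center line.
Image flipleft(const Image& in) {
    Image out = in;
    for (std::size_t y = 0; y < in.height; ++y) {
        // x < width / 2 swaps the two middle columns when the width is even
        // and leaves the single middle column untouched when it is odd.
        for (std::size_t x = 0; x < in.width / 2; ++x) {
            std::swap(out.at(x, y), out.at(in.width - 1 - x, y));
        }
    }
    return out;
}

// Necessary (but not sufficient) condition: flipping twice is the identity.
bool isOwnInverse(const Image& a) { return a == flipleft(flipleft(a)); }
```

Passing isOwnInverse does not prove the flip is right (a function that does nothing also passes), so it is worth checking a few known pixels as well.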

There doesn't seem to be anything obviously wrong with your code.
Since you have a reference image, you can tell the precise pixel positions where there is a difference. I suggest that you step through your program with a debugger (gdb if you're a Linux user, or Visual Studio if that's what floats your boat) and put breakpoints in the inner loop of your code at the problem positions. Using those breakpoints, look for the first point in the program where the problem manifests itself. This will help you find the cause.
Working with smaller images (something that you can print to the command-line at each iteration, for example 8x8 pixels) may save you some time when debugging.
It may be good to post images -- your result and the expected reference.

Modify your code so that you read the input image into an Image and then write it into an output file (without reversal) and compare the input file to the output file.
If they do not match then either the file->Image->file process is corrupting the data (perhaps with a pixel member that is the wrong size, leading to roundoff or use of uninitialized memory) or the comparator (e.g. Kaleidoscope) is wrong, and you can test by just copying the input file and comparing.
If they do match then either your reversal procedure is wrong (which seems unlikely) or the reference file (which the output "is supposed to look like") is wrong, and you can test by altering the code to read in the reference file as well, and report the first disagreement-- that is, construct three Images, Before (read from the input file), After (which will be written to the output file) and Reference (which was read from the reference file), then iterate over x and y, comparing After(x,y) to Reference(x,y). As soon as you find a disagreement, see which one matches Before(width-x-1, y); if Reference matches then your reversal routine is wrong, and if After matches then the reference file is wrong (and you can point to a pixel that proves it).
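The three-image comparison described above could be sketched roughly as follows. Raster, Verdict, Diagnosis and diagnose are names made up for this illustration, and the sketch assumes each pixel has already been decoded into one packed 0xRRGGBB value.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical image type: one packed 0xRRGGBB value per pixel, row-major.
struct Raster {
    std::size_t width = 0;
    std::size_t height = 0;
    std::vector<std::uint32_t> px;
    std::uint32_t at(std::size_t x, std::size_t y) const { return px[y * width + x]; }
};

enum class Verdict { AllMatch, SizeMismatch, FlipRoutineWrong, ReferenceWrong, BothWrong };

struct Diagnosis {
    Verdict verdict;
    std::size_t x;  // position of the first disagreement, if any
    std::size_t y;
};

// Compare After to Reference and use Before to decide who is wrong.
Diagnosis diagnose(const Raster& before, const Raster& after, const Raster& reference) {
    if (before.width != after.width || before.height != after.height ||
        before.width != reference.width || before.height != reference.height) {
        return {Verdict::SizeMismatch, 0, 0};
    }
    for (std::size_t y = 0; y < after.height; ++y) {
        for (std::size_t x = 0; x < after.width; ++x) {
            if (after.at(x, y) == reference.at(x, y)) continue;
            // What a correct horizontal flip should have put at (x, y).
            const std::uint32_t expected = before.at(before.width - x - 1, y);
            if (reference.at(x, y) == expected) return {Verdict::FlipRoutineWrong, x, y};
            if (after.at(x, y) == expected) return {Verdict::ReferenceWrong, x, y};
            return {Verdict::BothWrong, x, y};
        }
    }
    return {Verdict::AllMatch, 0, 0};
}
```

If the verdict is ReferenceWrong, the reported (x, y) is exactly the pixel you can point to as proof.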

Related

C++: What can I use to generate sine wave patterns onto an image?

I am looking for any C++ tools that will help me generate sine wave like fringe patterns onto a loaded image like so:
Any ideas using other programming modes (scripts?) would also be useful. If any more information is requested, please let me know.
You might want to look into OpenCV:
http://opencv.itseez.com/doc/tutorials/core/basic_linear_transform/basic_linear_transform.html#brightness-and-contrast-adjustments
Looks like it might be of use, though I don't know if it is sufficient for your specific use case. You should be able to do it manually though.
A sine-wave pattern results from adjusting the local brightness by the sine of the image position relative to the period (e.g. period == image width). I don't have any real knowledge of the library, but judging from previous experience with Matlab and similar tools, the brightness would be calculated pixel by pixel as
local_brightness = sin(2*pi*cur_pos/width) * local_brightness
If you know the color space and the format of the image you might as well do it manually, pixel for pixel like described above. In that case you could read in the image with http://libav.org/ and recalculate it.
Oh and one last general idea, given you know the image format and color space:
Generate a vector that fits the width of the target image, then calculate the sine signal along the x-axis and multiply the resulting vector with the target image brightness?
I admit it's a long shot, but it might work for you :P
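For what it's worth, the per-column idea could look roughly like this in plain C++ on an 8-bit grayscale buffer. The function name applySineFringes and the buffer layout are assumptions for illustration, not part of any library. One deliberate deviation from the formula above: a raw sine goes negative, so this sketch rescales it to the range 0..1 before multiplying.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Multiply each column of a row-major 8-bit grayscale image by a sine-based
// gain. "period" is the fringe period in pixels (e.g. the image width).
void applySineFringes(std::vector<std::uint8_t>& gray, std::size_t width,
                      std::size_t height, double period) {
    const double twoPi = 2.0 * std::acos(-1.0);

    // One gain per column, rescaled from [-1, 1] to [0, 1] (assumption: the
    // fringes should darken the image, never produce negative brightness).
    std::vector<double> gain(width);
    for (std::size_t x = 0; x < width; ++x) {
        gain[x] = 0.5 * (1.0 + std::sin(twoPi * static_cast<double>(x) / period));
    }

    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            const double v = gain[x] * gray[y * width + x];
            gray[y * width + x] = static_cast<std::uint8_t>(std::lround(v));
        }
    }
}
```

Loading and saving the image is left out on purpose; any decoder that gives you a grayscale byte buffer would do.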
You'll have to be more specific about exactly what you're looking for. Magick++, the C++ bindings for the ImageMagick library, has a lot of tools for doing various types of image processing, but depending on your needs it may or may not be able to do what you want.

Extracting part of a scanned document (personal ID) - which library and method to choose?

I have to process a lot of scanned IDs and I need to extract photos from them for further processing.
Here's a fictional example:
The problem is that the scans are not perfectly aligned (rotated up to 10 degrees). So I need to find their position, rotate them and cut out the photo. This turned out to be a lot harder than I originally thought.
I checked OpenCV, and the only thing I found was rectangle detection, but it didn't give me good results: the detected rectangle doesn't always match well enough on my samples. Also, its image matching algorithm only works for non-rotated images, since it's just a brute-force comparison.
So I thought about using ARToolkit (an augmented reality library) because I know it can locate a given marker in an image very precisely. But it seems that the markers have to be very simple, so I can't use a constant part of the document for this purpose (please correct me if I'm wrong). I also found it super hard to compile on Ubuntu 11.10.
OCR - I haven't tried this one yet, and before I start my research I'd be thankful for any suggestions on what to look for.
I'm looking for a C (preferably) or C++ solution. Python is an option too.
If you don't find another ideal solution, one method I ended up using for OCR preprocessing in the past was to convert the source images to PPM and use unpaper in Ubuntu. You can attempt to deskew the image based on whichever sides you specify as having clearly-defined edges, and there is an option to bypass the filters that would normally be applied to black and white text. You probably don't want those for images.
Example for images skewed no more than 15 degrees, using the bottom and right edges to detect rotation:
unpaper -n -dn bottom,right -dr 15 input.ppm output.ppm
unpaper was written in C, if the source is any help to you.

Simple image analysis

I am looking for a method, software or library for simple image analysis.
The input image will be a white-colored background, and some random small black dots on it.
I need to generate a .txt file that represents these dots' coordinates. That is, if there are three dots in the image the output will be a text file that includes somehow a representation of three coordinates, (x1,y1), (x2,y2), and (x3,y3).
I have searched the web for hours and didn't find something appropriate, all I found was complex programs for image processing.
I've been told that it's easy to write code for this mission in MATLAB, but I'm unfamiliar with MATLAB.
Can this be done easily with C++, Java or C#?
Any good libraries?
It is quite simple in any language. Depending on the form of your input, you probably need to go over all of it (assuming it is a simple matrix - simply have two nested loops, one for the x coordinate and one for the y coordinate), whenever you encounter a black dot - simply output the current indexes which would be the x and y coordinates for the dot.
As to libraries, anything other than something to decode your input to the form of such a matrix (e.g. a JPEG decoder) would be overkill.
I don't think you would need image processing libraries for this kind of problem (somebody correct me if I am wrong) since these libraries may focus on image manipulation and not recognition. What you will need is a knowledge of the image format that you are supporting (how are they stored, how are they interpreted, etc) and basic C file system functions.
For example, if you are expecting an uncompressed format such as BMP (a compressed format such as JPG has to be decoded first), you simply calculate the padding for each scanline and read each scan line one by one, and each pixel in the line one by one. You'd have to use two counters, one for the row and one for the column. If the pixel is not white, then you have your coordinate.
This is something which should be very easy for you to do without any external software; something like
for (int y = 0; y < height; ++y) {          // each row
    for (int x = 0; x < width; ++x) {       // each column in that row
        if (pixels[y][x].color == BLACK)
            printf("(%d, %d)\n", x, y);     // one coordinate pair per line
    }
}
would work.
The bitmap file format is quite easy to read.
http://en.wikipedia.org/wiki/BMP_file_format
You could just stream the bytes into an array using this info. I've written a few BMP readers; it is a trivial matter.
Also, although I cannot vouch for its ease of use as I've never used it before, I've heard that EasyBMP works fine too.
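To give an idea of how little is involved, here is a rough sketch of a reader for the most common case only: an uncompressed 24 bpp BMP with the standard 14-byte file header followed by a 40-byte info header. The names Bmp24 and parseBmp24 are made up for this example, and palettes, RLE compression and other bit depths are simply rejected.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

struct Bmp24 {
    std::size_t width = 0;
    std::size_t height = 0;
    std::vector<std::uint8_t> rgb;  // top-down rows, 3 bytes (R, G, B) per pixel
};

static std::uint16_t readLE16(const std::vector<std::uint8_t>& b, std::size_t off) {
    return static_cast<std::uint16_t>(b[off] | (b[off + 1] << 8));
}

static std::uint32_t readLE32(const std::vector<std::uint8_t>& b, std::size_t off) {
    return static_cast<std::uint32_t>(b[off]) |
           static_cast<std::uint32_t>(b[off + 1]) << 8 |
           static_cast<std::uint32_t>(b[off + 2]) << 16 |
           static_cast<std::uint32_t>(b[off + 3]) << 24;
}

// Parses the bytes of a whole BMP file that was read into memory.
Bmp24 parseBmp24(const std::vector<std::uint8_t>& file) {
    if (file.size() < 54 || file[0] != 'B' || file[1] != 'M')
        throw std::runtime_error("not a BMP file");

    const std::size_t dataOffset = readLE32(file, 10);  // start of pixel data
    const auto rawWidth = static_cast<std::int32_t>(readLE32(file, 18));
    const auto rawHeight = static_cast<std::int32_t>(readLE32(file, 22));
    const std::uint16_t bitsPerPixel = readLE16(file, 28);
    const std::uint32_t compression = readLE32(file, 30);
    if (bitsPerPixel != 24 || compression != 0 || rawWidth <= 0 || rawHeight == 0)
        throw std::runtime_error("only uncompressed 24 bpp BMPs are handled");

    // A positive height means the rows are stored bottom-up.
    const bool bottomUp = rawHeight > 0;
    const std::size_t width = static_cast<std::size_t>(rawWidth);
    const std::size_t height = static_cast<std::size_t>(
        bottomUp ? std::int64_t{rawHeight} : -std::int64_t{rawHeight});
    const std::size_t rowSize = (width * 3 + 3) / 4 * 4;  // rows padded to 4 bytes
    if (file.size() < dataOffset + rowSize * height)
        throw std::runtime_error("truncated pixel data");

    Bmp24 img;
    img.width = width;
    img.height = height;
    img.rgb.resize(width * height * 3);
    for (std::size_t y = 0; y < height; ++y) {
        const std::size_t srcRow = bottomUp ? height - 1 - y : y;
        const std::uint8_t* src = file.data() + dataOffset + srcRow * rowSize;
        for (std::size_t x = 0; x < width; ++x) {
            const std::size_t dst = (y * width + x) * 3;
            img.rgb[dst] = src[x * 3 + 2];      // BMP stores blue, green, red
            img.rgb[dst + 1] = src[x * 3 + 1];
            img.rgb[dst + 2] = src[x * 3];
        }
    }
    return img;
}
```

With the pixels in a plain top-down RGB array, finding the dark ones is just the nested loop from the other answers.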
The CImg library should help you. From the CImg FAQ:
1.1. What is the CImg Library ?
The CImg Library is an open-source C++ toolkit for image processing.
It mainly consists in a (big) single header file CImg.h providing a
set of C++ classes and functions that can be used in your own sources,
to load/save, manage/process and display generic images. It's actually
a very simple and pleasant toolkit for coding image processing stuffs
in C++ : Just include the header file CImg.h, and you are ready to
handle images in your C++ programs.

How did this person code "Hello World" with Microsoft Paint?

I have just seen this within the past few days and cannot figure out how it works. The video I talk about is here:
It's the top rated answer from this Stack Overflow question: Why was this program rejected by three compilers?
How is this bitmap able to show a C++ program for "Hello World"?
A BMP (DIB) image is composed of a header followed by uncompressed [1] color data (for 24 bpp images that is 3 bytes per pixel, stored in reverse row order, with each row padded to a multiple of 4 bytes).
The bytes of the color data are used only to represent colors (i.e. none of them are "mandated" by the file format [2]; they all come from the color of each pixel), and there's a perfect 1:1 correspondence between pixel colors and bytes written to the file. Thus, by carefully choosing the colors, you can write anything you want into the file (with the exception of the header).
When you open the generated file in Notepad, the color data is shown as text; you can still clearly see the header (the part from BM to the start of the text), which is mandated by the file format.
In my opinion this video was done this way: first the author calculated the size needed for the bitmap, and created a DIB file of the correct size filled with a color that expands to a simple pattern (e.g. all bytes 65 => 'A'); then replaced that pattern with the "payload" code, as shown in the video.
Notice however that it's not impossible to hand-craft the whole thing with notepad - with the color chooser dialog, an ASCII table and a basic knowledge of the DIB format it can be done, but it would be much much slower and error-prone.
More info about the DIB format
[1] There are RLE-compressed DIBs (they are used really rarely anyway), but in this case uncompressed bitmaps are used.
[2] With the exception of the stride padding, which was avoided by using rows that are a multiple of 4 bytes long.
I assume you're referring to the answer to one of the April Fools questions.
My guess is that each pixel has a binary representation for it. And that each character in source code has a binary representation for it.
The person who created the program must have worked out the color for each pixel that'd have a binary representation that'd correspond to each character.
From a theoretical computer science point of view, it would be interesting to ask whether every program can be written in such a way that, viewed as a bitmap, it shows the source code of a program that does the same thing. If you are seriously interested in such results, read e.g. about Kleene's fixed-point theorem.
Program-as-an-image can also be viewed as a form of code obfuscation. Not that it would be particularly practical...

Extracting basic info from animation file

I'm writing an application that handles metadata for images and all kinds of animations, so I'm looking for a way to find basic info about an animation file, e.g:
length (in minutes/seconds/frames)
aspect ratio of pixels
resolution of individual frames
framerate
Right now, I let my program execute
mplayer -identify animfile.avi
and parse its console output, which contains all the info I need in a machine-readable format. This works fine, but I know that some potential users of the program prefer vlc as a media player so I'd rather avoid having a hard dependence on mplayer being installed.
I've tried
vlc -vv animfile.avi
which prints an ungodly amount of junk on the console, sometimes containing the stuff I'm looking for. The formatting and what data gets printed seems to vary depending on the file format of the animation though.
Is there an easier way to extract basic info from an animation of any format one has a decoder for (especially the length of the animation) using vlc or some other app/library that is usually available on a typical Linux installation?
Edit: I'd rather use another program to do the dirty work, as this is supposed to work for any animation format, e.g avi, mpg, mov, wmv, vob etc.
Edit: totem-video-indexer seems more promising, and it was included with the standard installation. Enough codecs to make it useful, however, were not; that could be fixed by installing the "non-free-codecs" package from medibuntu.
The output of totem-video-indexer is very easy to parse:
TOTEM_INFO_DURATION=5217
TOTEM_INFO_HAS_VIDEO=True
TOTEM_INFO_VIDEO_WIDTH=720
TOTEM_INFO_VIDEO_HEIGHT=480
TOTEM_INFO_VIDEO_CODEC=XVID MPEG-4
TOTEM_INFO_FPS=30
TOTEM_INFO_HAS_AUDIO=True
TOTEM_INFO_AUDIO_BITRATE=50
TOTEM_INFO_AUDIO_CODEC=MPEG 1 Audio, Layer 3 (MP3)
TOTEM_INFO_AUDIO_SAMPLE_RATE=48000
TOTEM_INFO_AUDIO_CHANNELS=Stereo
mediainfo is a pretty useful program. It's LGPL, and is just a frontend for libmediainfo, which should be exactly what you want.
http://mediainfo.sf.net/
This is a little more difficult question than you may realize. The AVI file format grew over time, and often has nearly the same information in two or three different places. In some cases those are really supposed to agree (but sometimes don't) and in other cases they're subtly different.
Just for example, you asked about the width and height. There are actually four different width/height specs for a single frame: the screen width/height, the pixel width/height (from which you derive the pixel aspect ratio), the active width/height, and the compressed width/height. The screen width/height is the (theoretical) size of the screen. The active width/height excludes the overscan area; it tells you whether (for example) some pixels at the border should be ignored. The compressed width/height takes into account rounding -- for example, JPEG compresses in blocks of 8x8 pixels, so the compressed width and height have to be multiples of 8 for a motion JPEG file.
In any case, since your question is tagged C++, I'm going to guess you'd rather read the file and get the data directly than depend on spawning something else to do the dirty work. If so, you probably want to look at the OpenDML AVI file spec. You can get at least some idea of the length, resolution, and framerate just from reading the basic AVI header, which is in a fixed spot at the beginning of the file, so that much is trivial to get. It'll take a bit more work to get to the pixel aspect ratio though...
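As a rough sketch of the "basic AVI header" part: the code below assumes the common layout in which the file starts with RIFF/AVI, immediately followed by the hdrl list and its avih chunk, and it refuses anything else instead of searching for the chunk. AviBasicInfo and readAviMainHeader are made-up names, and as noted above the values found here do not always agree with what is stored elsewhere in the file.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

struct AviBasicInfo {
    std::uint32_t microSecPerFrame = 0;
    std::uint32_t totalFrames = 0;
    std::uint32_t width = 0;
    std::uint32_t height = 0;

    double framesPerSecond() const {
        return microSecPerFrame ? 1e6 / microSecPerFrame : 0.0;
    }
    double durationSeconds() const {
        return totalFrames * (microSecPerFrame / 1e6);
    }
};

static std::uint32_t readLE32(const std::vector<std::uint8_t>& b, std::size_t off) {
    return static_cast<std::uint32_t>(b[off]) |
           static_cast<std::uint32_t>(b[off + 1]) << 8 |
           static_cast<std::uint32_t>(b[off + 2]) << 16 |
           static_cast<std::uint32_t>(b[off + 3]) << 24;
}

static bool tagAt(const std::vector<std::uint8_t>& b, std::size_t off, const char* tag) {
    return std::memcmp(b.data() + off, tag, 4) == 0;
}

// "head" holds at least the first 72 bytes of the file.
AviBasicInfo readAviMainHeader(const std::vector<std::uint8_t>& head) {
    if (head.size() < 72) throw std::runtime_error("need at least 72 bytes");
    if (!tagAt(head, 0, "RIFF") || !tagAt(head, 8, "AVI ") ||
        !tagAt(head, 12, "LIST") || !tagAt(head, 20, "hdrl") ||
        !tagAt(head, 24, "avih")) {
        throw std::runtime_error("main AVI header not at the expected position");
    }
    // The avih payload starts at byte 32; all fields are little-endian DWORDs.
    AviBasicInfo info;
    info.microSecPerFrame = readLE32(head, 32);
    info.totalFrames = readLE32(head, 48);
    info.width = readLE32(head, 64);
    info.height = readLE32(head, 68);
    return info;
}
```

This only covers AVI, of course; for mpg, mov, wmv and the rest you would need a separate reader per container, which is a good argument for delegating to an existing tool or library.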