I have parsed the bitstream to extract each block. I have counted the correct number of blocks based on the size of my image (3440) and it brings me nicely to the EOI marker ([255, 217]), so I am confident I have done this much correctly.
However there are ~1000 more bytes of data after this EOI marker, followed by another EOI. My image is greyscale and consists only of a Luminance component, as confirmed by the header. What is this mystery data??
It seems to be important in some way, because the image is an unintelligible mess if it is removed...
The only way I can think of to get more than one EOI marker would be a second one inside a thumbnail. If you are scanning for markers, you need to skip over those that carry lengths. It is possible for something that looks like an EOI to appear inside a marker segment that has a length.
That said, I don't know how you can calculate the location of the EOI marker because scans have no length indicator.
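That said, here is a minimal sketch of how the scan's end can still be found by walking the marker segments: segments with a length are skipped wholesale, and inside the entropy-coded data a 0xFF is only ever followed by 0x00 (byte stuffing) or a restart marker, so the first "real" marker after the scan is the true EOI. This assumes a baseline JPEG with a single scan and leaves out error handling.

#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal sketch: find the EOI that actually terminates the compressed scan,
// skipping marker segments that carry a length (which may contain 0xFF 0xD9
// by coincidence, e.g. inside an APP1 thumbnail).
// Assumes `jpeg` holds the whole file; returns 0 if nothing sensible is found.
size_t findRealEoi(const std::vector<uint8_t>& jpeg)
{
    size_t i = 2;                       // skip SOI (0xFF 0xD8)
    while (i + 4 <= jpeg.size()) {
        if (jpeg[i] != 0xFF) return 0;  // lost sync
        uint8_t marker = jpeg[i + 1];
        if (marker == 0xD8 || marker == 0x01 ||
            (marker >= 0xD0 && marker <= 0xD7)) {
            i += 2;                     // standalone markers carry no length
            continue;
        }
        size_t len = (jpeg[i + 2] << 8) | jpeg[i + 3];  // length includes its own 2 bytes
        if (marker != 0xDA) {           // not SOS: just skip the whole segment
            i += 2 + len;
            continue;
        }
        // SOS: entropy-coded data follows the SOS header. Inside it, 0xFF is
        // always followed by 0x00 (byte stuffing) or 0xD0..0xD7 (RSTn).
        i += 2 + len;
        while (i + 1 < jpeg.size()) {
            if (jpeg[i] == 0xFF && jpeg[i + 1] != 0x00 &&
                !(jpeg[i + 1] >= 0xD0 && jpeg[i + 1] <= 0xD7)) {
                return (jpeg[i + 1] == 0xD9) ? i : 0;   // real EOI (baseline, single scan)
            }
            ++i;
        }
    }
    return 0;
}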
If I use the Image::Save() function to save two images as JPEG (with the same quality), and these two images have the same dimensions (width and height), will these two images have an identical header (which I noticed to be 623 bytes)?
Note: my testing shows that they will indeed have an identical header, but is this guaranteed?
I would never rely on the headers (or first 623 bytes) being the same in a JPEG image (except for the first 2 bytes, being 0xff and 0xd8 - Start Of Image / SOI). Even if this is the case now, it could change in the future.
So basically you would be removing information from the image, which you would not be able to reassemble if that header changes.
There could be anything in those first 623 bytes. Even the order of APPn can be random.
It would be better to give each distinct block of header data a unique ID (or MD5), and store that ID where the header data was. That way, if the header changes in the future, you just store a different ID and you will always be able to reassemble the image. Or in case you're storing the images as BLOBs in a database, add a column with the header ID.
Also make sure you split the data on full blocks - blocks starting with markers (0xff 0xnn) - the next two bytes contain the size of the block.
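For illustration, a rough sketch of that splitting, assuming the whole file is already in memory; each segment's bytes could then be hashed (MD5 or similar) to produce the ID described above:

#include <cstddef>
#include <cstdint>
#include <vector>

struct Segment { uint8_t marker; size_t offset; size_t size; };

// Minimal sketch: split everything up to and including the SOS header into
// marker segments. Error handling is omitted; assumes the file starts with SOI.
std::vector<Segment> splitHeader(const std::vector<uint8_t>& jpeg)
{
    std::vector<Segment> segments;
    size_t i = 2;                                       // skip SOI (0xFF 0xD8)
    while (i + 4 <= jpeg.size() && jpeg[i] == 0xFF) {
        uint8_t marker = jpeg[i + 1];
        size_t len = (jpeg[i + 2] << 8) | jpeg[i + 3];  // includes the 2 length bytes
        segments.push_back({marker, i, 2 + len});
        i += 2 + len;
        if (marker == 0xDA) break;                      // SOS: compressed scan data follows
    }
    return segments;
}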
JPEG format: http://www.w3.org/Graphics/JPEG/itu-t81.pdf
I want to merge 2 images. How can I remove the area that is the same between the 2 images?
Can you tell me an algorithm to solve this problem? Thanks.
The two images are screenshots. They have the same width, and image 1 is always above image 2.
When two images have the same width and there is no X-offset at the left side this shouldn't be too difficult.
You should create two vectors of integers and store the CRC of each pixel row in the corresponding vector element. After doing this for both pictures, find the CRC of the first line of the lower image in the first vector; its position is the offset in the upper picture. Then check that all following CRCs from both pictures are identical. If not, look up the next occurrence of the initial CRC in the upper image and try again.
After checking that the CRCs between both pictures are identical when you apply the offset you can use the bitblit function of your graphics format and build the composite picture.
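A rough sketch of that idea, assuming both images are raw buffers with the same width and bytes per row; a small FNV-1a hash stands in here for whatever CRC routine you already have:

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: hash one pixel row (FNV-1a stands in for the CRC).
static uint64_t rowHash(const uint8_t* row, size_t bytesPerRow)
{
    uint64_t h = 1469598103934665603ull;
    for (size_t i = 0; i < bytesPerRow; ++i) { h ^= row[i]; h *= 1099511628211ull; }
    return h;
}

// Returns the row of `upper` where `lower` starts to overlap, or -1 if none.
// Assumes both images have the same width / bytes per row and pixel format.
int findOverlap(const uint8_t* upper, int upperRows,
                const uint8_t* lower, int lowerRows, size_t bytesPerRow)
{
    std::vector<uint64_t> up(upperRows), lo(lowerRows);
    for (int y = 0; y < upperRows; ++y) up[y] = rowHash(upper + y * bytesPerRow, bytesPerRow);
    for (int y = 0; y < lowerRows; ++y) lo[y] = rowHash(lower + y * bytesPerRow, bytesPerRow);

    for (int off = 0; off < upperRows; ++off) {
        if (up[off] != lo[0]) continue;              // candidate start of the overlap
        bool match = true;
        int n = std::min(upperRows - off, lowerRows);
        for (int k = 1; k < n; ++k)
            if (up[off + k] != lo[k]) { match = false; break; }
        if (match) return off;                       // all overlapping rows agree
    }
    return -1;
}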
I haven't come across something similar before but I think the following might work:
Convert both to grey-scale.
Enhance the contrast; the grey box might become white, for example, and the text would become darker. (This is just to increase the confidence in the next step.)
Apply some threshold, converting the pictures to black and white.
Afterwards, you could find the similar areas (and thus the offset of the overlap) with a good degree of confidence. To find the similar parts, you could use harper's method (which is good, but I don't know how reliable it would be without the said filtering), or you could apply some DSP operation(s) like convolution.
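As a rough illustration of the grey-scale and threshold steps (the contrast enhancement is left out), assuming interleaved 8-bit RGB input and an arbitrary threshold of 128:

#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal sketch: grayscale conversion followed by a fixed threshold,
// producing a black-and-white image (0 or 255 per pixel).
std::vector<uint8_t> toBlackAndWhite(const uint8_t* rgb, int width, int height)
{
    std::vector<uint8_t> bw(static_cast<size_t>(width) * height);
    for (size_t i = 0; i < bw.size(); ++i) {
        const uint8_t* p = rgb + i * 3;
        int gray = (299 * p[0] + 587 * p[1] + 114 * p[2]) / 1000;  // luma approximation
        bw[i] = (gray > 128) ? 255 : 0;                            // arbitrary threshold
    }
    return bw;
}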
Hope that helps.
If your images are the same width and image 1 is always on top, I don't see how hard it could be..
Just store the bytes of the last line of image 1.
From the first line of image 2 to the last, make this test:
If the current line of image 2 is not equal to the last line of image 1 -> continue
else -> break the loop
Then you have to define a new byte container for your new image:
Just store all the lines of image 1 + all the lines of image 2 starting at (the found line + 1).
What would make you sweat here is finding the libraries to manipulate all these data structures, but after some linking and documentation digging you should be able to implement this easily.
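Something like the following, assuming both images are already available as raw buffers with identical width, bytes per row and pixel format:

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Rough sketch of the approach above: find where the last line of image 1
// appears in image 2, then append the rest of image 2 below image 1.
std::vector<uint8_t> mergeVertically(const uint8_t* img1, int rows1,
                                     const uint8_t* img2, int rows2,
                                     size_t bytesPerRow)
{
    const uint8_t* last = img1 + (rows1 - 1) * bytesPerRow;    // last line of image 1

    int found = -1;
    for (int y = 0; y < rows2; ++y) {                          // scan image 2 top-down
        if (std::memcmp(img2 + y * bytesPerRow, last, bytesPerRow) == 0) {
            found = y;                                         // matching line found
            break;
        }
    }

    std::vector<uint8_t> out(img1, img1 + rows1 * bytesPerRow);  // all lines of image 1
    int start = (found < 0) ? 0 : found + 1;                     // then image 2 after the match
    out.insert(out.end(), img2 + start * bytesPerRow, img2 + rows2 * bytesPerRow);
    return out;
}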
I am reading the official WebP lossless bitstream spec, and I have a feeling that the document is missing some explanation.
Let me describe some fragments of the specification:
1. Introduction - clear
2. Riff header - clear
3. Transformations
The transformations are used only for the main level ARGB image: the
subresolution images have no transforms, not even the 0 bit indicating
the end-of-transforms.
Nowhere earlier was it mentioned that the container holds some sub-resolution images. What are they? Where are they described, if not in the specification? How do they add to the final image?
Then, in the Predictor transform paragraph:
We divide the image into squares...
..what image? The main image or sub-resolution image? What if the image cannot be divided into squares (apart from pixel-size squares)?
The first 4 bits of prediction data define the block width and height
in number of bits. The number of block columns, block_xsize, is used
in indexing two-dimensionally.
Does this mean that the image width is block_xsize * block_width ?
The transform data contains the prediction mode for each block of the image.
In what way, what format?
I don't know why I am having a hard time understanding this. Maybe it's because I am not a native English speaker, or because the description is too laconic.
I'd appreciate any help in decoding this specification :)
It was mentioned earlier. Right at the top of the document it says:
The format uses subresolution images, recursively embedded into the
format itself, for storing statistical data about the images, such as
the used entropy codes, spatial predictors, color space conversion,
and color table.
These are arrays (or a vector in the case of the color table) of data where each element applies to a block of pixels in the actual image, e.g. a 16x16 block. These "subresolution images" are not themselves subsamples of the image being compressed.
The format description calls them images because they are stored exactly like the main image is in the format. The transforms are instructions to the decoder to apply to the decompressed main image data. The entropy image is used to decompress the main image, by virtue of providing the Huffman codes for each block.
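As I read the spec, block_xsize is the image width divided by the block width and rounded up, so block_xsize * block_width can be slightly larger than the image width (the last column of blocks is simply narrower). Here is a sketch, with names of my own choosing, of how a per-block value such as the predictor mode would be looked up:

#include <cstdint>

// Rough sketch of how a per-block value (e.g. the predictor mode) is looked up.
// `size_bits` is the value decoded from the prediction data: the block side is
// 1 << size_bits, and block_xsize is ceil(image_width / block_width).
struct PredictorTransform {
    int size_bits;                 // block width/height in number of bits
    int block_xsize;               // number of block columns
    const uint32_t* modes;         // "subresolution image": one ARGB value per block
};

// Predictor mode for the pixel at (x, y) of the main image; the mode is stored
// in the green channel of the corresponding subresolution pixel.
static inline int predictorMode(const PredictorTransform& t, int x, int y)
{
    int bx = x >> t.size_bits;                       // block column
    int by = y >> t.size_bits;                       // block row
    uint32_t block = t.modes[by * t.block_xsize + bx];
    return (block >> 8) & 0xFF;                      // green channel
}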
I'm trying to work with this camera SDK, and let's say the camera has this function called CameraGetImageData(BYTE* data), which I assume takes in a byte array, modifies it with the image data, and then returns a status code based on success/failure. The SDK provides no documentation whatsoever (not even code comments), so I'm just guesstimating here. Here's a code snippet of what I think works:
BYTE* data = new BYTE[10000000]; // an array of an arbitrary large size, I'm not
// sure what the exact size needs to be so I
// made it large
CameraGetImageData(data);
// Do stuff here to process/output image data
I've run the code w/ breakpoints in Visual Studio and can confirm that the CameraGetImageData function does indeed modify the array. Now my question is, is there a standard way for cameras to output data? How should I start using this data and what does each byte represent? The camera captures in 8-bit color.
Take pictures of pure red, pure green and pure blue. See what comes out.
Also, I'd make the array 100 million, not 10 million if you've got the memory, at least initially. A 10 megapixel camera using 24 bits per pixel is going to use 30 million bytes, bigger than your array. If it does something crazy like store 16 bits per colour it could take up to 60 million or 80 million bytes.
You could fill this big array with data before passing it. For example fill it with '01234567' repeated. Then it's really obvious what bytes have been written and what bytes haven't, so you can work out the real size of what's returned.
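A small sketch of that trick; the CameraGetImageData prototype is the one guessed at in the question, and the 100 MB size is just a generous upper bound:

#include <cstddef>
#include <cstdio>

typedef unsigned char BYTE;                 // as used by the SDK in the question

// Assumed SDK function from the question; the real prototype may differ.
extern int CameraGetImageData(BYTE* data);

int main()
{
    const size_t size = 100000000;          // 100 MB, generously sized
    BYTE* data = new BYTE[size];
    for (size_t i = 0; i < size; ++i)       // fill with a recognizable pattern
        data[i] = (BYTE)('0' + (i % 8));    // '0','1',...,'7' repeated

    CameraGetImageData(data);

    // Scan backwards for the last byte that no longer matches the pattern,
    // which gives an upper bound on how many bytes the SDK actually wrote.
    size_t written = 0;
    for (size_t i = size; i > 0; --i) {
        if (data[i - 1] != (BYTE)('0' + ((i - 1) % 8))) { written = i; break; }
    }
    printf("approx. bytes written: %zu\n", written);

    delete[] data;
    return 0;
}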
I don't think there is a standard, but you can try to identify which values are what by putting some solid color images in front of the camera, so that all pixels are approximately the same color. Knowing what color should be stored in each pixel, you may work out how the color is represented in your array. I would go with black, white, red, green and blue images.
But also consider finding a better SDK that has documentation, because just allocating an arbitrarily big array is really bad design.
You should check the documentation on your camera SDK, since there's no "standard" or "common" way for data output. It can be raw data, it can be RGB data, it can even be already compressed. If the camera vendor doesn't provide any information, you could try to find some libraries that handle most common formats, and try to pass the data you have to see what happens.
Without even knowing the type of the camera, this question is nearly impossible to answer.
If it is a scientific camera, chances are good that it adheres to the IEEE 1394 (aka IIDC or DCAM) standard. I have personally worked with such a camera made by Hamamatsu, using this library to interface with the camera.
In my case the camera output was just raw data. The camera itself was monochrome and each pixel had a depth resolution of 12 bits. Therefore, each pixel intensity was stored as a 16-bit unsigned value in the result array. The size of the array was simply width * height * 2 bytes, where width and height are the image dimensions in pixels and the factor 2 accounts for the 16 bits per pixel. The width and height were known a priori from the chosen camera mode.
If you have the dimensions of the result image, try to dump your byte array into a file and load the result in either Python or Matlab and just try to visualize the content. Another possibility is to load this raw file with an image editor such as ImageJ and see whether you can get anything out of it.
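For example, a minimal dump routine might look like the following; the file can then be opened with ImageJ's raw import or numpy.fromfile, given the known width, height and bit depth:

#include <cstddef>
#include <cstdio>

// Minimal sketch: dump the raw buffer to disk so it can be inspected elsewhere.
// `data` and `size` are whatever came back from the camera call; with the
// 16-bit example above, size would be width * height * 2.
void dumpRaw(const unsigned char* data, size_t size, const char* path)
{
    FILE* f = fopen(path, "wb");
    if (!f) return;
    fwrite(data, 1, size, f);
    fclose(f);
}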
Good luck!
I hope this question's solution helps you: https://stackoverflow.com/a/3340944/291372
Actually you've got an array of pixels (assume 1 byte per pixel if your camera captures in 8-bit). What you need is just to determine the width and height. After that you can try to restore a bitmap image from your byte array.
I've run into some nasty problem with my recorder. Some people are still using it with analog tuners, and analog tuners have a tendency to spit out 'snow' if there is no signal present.
The problem is that when noise is fed into the encoder, it goes completely crazy, first consuming all CPU and then ultimately freezing. Since the main point of the recorder is to stay up and running no matter what, I have to figure out how to proceed so that the encoder won't be exposed to data it can't handle.
So, the idea is to create an 'entropy detector' - a simple, small routine that will go through the frame buffer data and calculate an entropy index, i.e. how random the data in the picture actually is.
The result of the routine would be a number that is 0 for a completely black picture and 1 for a completely random picture - snow, that is.
The routine itself should be forward-scanning only, with a few local variables that fit nicely into registers.
I could use the zlib or 7z API for such a task, but I would really like to cook up something of my own.
Any ideas?
PNG works this way (approximately): for each pixel, replace its value with its original value minus the value of the pixel to its left. Do this from right to left, so each subtraction still uses the original left neighbour.
Then you can calculate the entropy (bits per byte) by making a table of how often each value now appears, turning those absolute counts into relative frequencies p, and summing -p*log2(p) over each value that occurs.
Oh, and you have to do this for each color channel (r, g, b) separately.
For the result, take the average of the bits per byte over the channels and divide it by 8, the maximum entropy for 8-bit values, so it lands in the 0..1 range you want.
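A sketch of the whole recipe for a single 8-bit channel (for greyscale frames this is all you need; for RGB, run it per channel and average):

#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of the recipe above for one 8-bit channel: left-difference each row,
// build a histogram, and compute the entropy in bits per byte, normalized to
// 0..1 by dividing by 8 (the maximum for 8-bit values).
double entropyIndex(const uint8_t* pixels, int width, int height)
{
    uint32_t hist[256] = {0};
    for (int y = 0; y < height; ++y) {
        const uint8_t* row = pixels + (size_t)y * width;
        hist[row[0]]++;                                   // first pixel has no left neighbour
        for (int x = width - 1; x > 0; --x)               // right to left, as above
            hist[(uint8_t)(row[x] - row[x - 1])]++;       // difference wraps modulo 256
    }
    const double total = (double)width * height;
    double bits = 0.0;
    for (int v = 0; v < 256; ++v) {
        if (!hist[v]) continue;
        double p = hist[v] / total;                       // relative frequency
        bits -= p * std::log2(p);                         // sum of -p*log2(p)
    }
    return bits / 8.0;                                    // ~0 = flat picture, ~1 = pure snow
}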