Recovering image header from a corrupt PNG - crc

I'm working on a pentesting challenge in which a corrupt PNG is provided with the eight dimension bytes in the IHDR chunk all set to zero. The CRC checksum and the rest of the file are still intact. I was wondering whether there is a way to recover the image dimensions by somehow reversing the CRC, since as I understand it, the CRC is calculated from the chunk's bytes. If this is not possible, is there some other way to find the dimensions based on the image data? Any help would be much appreciated.

In general, no, you cannot recover 62 bits of information from a 32-bit CRC. (It's 62 and not 64 because the specification limits the range of each dimension to 1..2^31-1.)
However, if you assume that the image width and height each fit in 16 bits, say 1..65535, then it can be done with just the CRC. spoof will do this for you: you give it those bit locations and the exclusive-or of two CRCs, the CRC of the header as it stands, with the zeroed-out width and height, and the CRC stored in the file, asserted to be the CRC of the header when it still contained the original width and height. spoof then solves a system of 32 linear equations in 32 unknowns over GF(2).
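If you don't have spoof handy and the 16-bit assumption holds, a plain brute-force search over the stored CRC also works. A minimal sketch in Python (the full 65535 x 65535 search is slow in pure Python, so in practice you would narrow the bounds; the slicing offsets assume a standard PNG with IHDR as the first chunk):

```python
# Brute-force recovery of PNG IHDR width/height from the stored CRC.
import struct
import zlib

def find_dimensions(png_bytes, max_dim=65535):
    # Layout: 8-byte signature, 4-byte length, 4-byte type "IHDR",
    # 13 bytes of IHDR data, 4-byte CRC.
    ihdr_type_and_data = png_bytes[12:29]          # b"IHDR" + 13 data bytes
    stored_crc = struct.unpack(">I", png_bytes[29:33])[0]

    prefix = ihdr_type_and_data[:4]                # b"IHDR"
    tail = ihdr_type_and_data[12:]                 # bit depth, colour type, etc.
    crc_prefix = zlib.crc32(prefix)                # CRC of the fixed prefix, reused

    for width in range(1, max_dim + 1):
        for height in range(1, max_dim + 1):
            candidate = struct.pack(">II", width, height) + tail
            if zlib.crc32(candidate, crc_prefix) & 0xFFFFFFFF == stored_crc:
                return width, height
    return None
```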
This can be solved in general, even for very large images, if you also make use of the image data, asserted to be intact. Using the rest of the header information and decoding the image data, you would factor the total number of bytes in the decompressed data into its prime decomposition. You will then have a small number of possible factorizations into rows and columns, and you can try each of those back in the header to see which matches the CRC. Some may be ruled out even before checking the CRC, since the number of bytes in a row has to be one (the filter byte) plus a multiple of the bytes per pixel, e.g. three for an RGB image or four for RGBA. (In fact, for the corrupted image originally provided in the question, there is only one factorization that meets that constraint, which is the answer.)
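For illustration, a rough sketch of that enumeration, assuming a non-interlaced image so that each row is one filter byte plus width * bytes-per-pixel bytes (the function name is just illustrative):

```python
# Enumerate (width, height) pairs consistent with the decompressed data size.
def candidate_dimensions(total_bytes, bytes_per_pixel):
    candidates = []
    for height in range(1, total_bytes + 1):
        if total_bytes % height:
            continue
        row_bytes = total_bytes // height            # filter byte + pixel data
        if (row_bytes - 1) % bytes_per_pixel:
            continue                                  # row length rules this pair out
        width = (row_bytes - 1) // bytes_per_pixel
        if width > 0:
            candidates.append((width, height))
    return candidates
```

Each surviving pair can then be written back into the IHDR and checked against the stored CRC as described above.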
In the incredibly unlikely case that more than one of those matches the CRC, you can use each to decompress the image and see which one looks right. The others will likely look terribly skewed, like an old television that is unable to lock onto the horizontal sync from the received signal.

Related

DCT based Video Encoding Process

I am having some issues that I am hoping you will be able to clarify. I have taught myself a video encoding process similar to MPEG-2. The process is as follows:
Split an RGBA image into 4 separate channel memory blocks, so an array of all R values, a separate array of all G values, etc.
Take a channel array and grab an 8x8 block of pixel data, to transform it using the Discrete Cosine Transform (DCT).
Quantize this 8x8 block using a pre-calculated quantization matrix.
Zigzag encode the output of the quantization step. So I should get a trail of consecutive numbers.
Run Length Encode (RLE) the output from the zigzag algorithm.
Huffman-code the data after the RLE stage, substituting values from a pre-computed Huffman table.
Go back to step 2 and repeat until every block of every channel has been encoded.
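For concreteness, a minimal sketch of what steps 2-5 might look like for a single 8x8 block of one channel, assuming numpy/scipy are available (the quantization matrix is just a placeholder, not a tuned one, and the RLE here uses the JPEG-style "zeros preceding each nonzero value" pairs):

```python
import numpy as np
from scipy.fft import dctn

QUANT = np.full((8, 8), 16)          # placeholder quantization matrix

def zigzag_indices(n=8):
    # Standard zigzag scan: walk anti-diagonals, alternating direction.
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda ij: (ij[0] + ij[1],
                                  ij[0] if (ij[0] + ij[1]) % 2 else -ij[0]))

def encode_block(block):             # block: 8x8 array of one channel, 0..255
    coeffs = dctn(block.astype(float) - 128, norm='ortho')   # step 2: DCT (JPEG-style level shift)
    quantized = np.round(coeffs / QUANT).astype(int)          # step 3: quantize
    zz = [quantized[i, j] for i, j in zigzag_indices()]       # step 4: zigzag
    rle, run = [], 0                                          # step 5: RLE of zero runs
    for v in zz:
        if v == 0:
            run += 1
        else:
            rle.append((run, v))     # (number of preceding zeros, value)
            run = 0
    rle.append('EOB')                # end-of-block marker covers the trailing zeros
    return rle
```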
First question: do I need to convert the RGBA values to YUV+A (YCbCr+A) values for the process to work, or can it continue using RGBA? I ask because the RGBA->YUVA conversion is a heavy workload that I would like to avoid if possible.
Next question: should the RLE store runs for just 0s, or can that be extended to all the values in the array? See the examples below:
440000000111 == [2,4][7,0][3,1] // RLE for all values
or
440000000111 == 44[7,0]111 // RLE for 0's only
The final question is what a single symbol would be in the Huffman stage. Would a symbol to be replaced be a single value like 2 or 4, or would a symbol be the run-level pair [2,4], for example?
Thanks for taking the time to read and help me out here. I have read many papers and watched many YouTube videos, which have aided my understanding of the individual algorithms but not how they all link together to form the encoding process in code.
(this seems more like JPEG than MPEG-2 - video formats are more about compressing differences between frames, rather than just image compression)
If you work in RGB rather than YUV, you're probably not going to get the same compression ratio and/or quality, but you can do that if you want. Colour-space conversion is hardly a heavy workload compared to the rest of the algorithm.
Typically in this sort of application you RLE the zeros, because that's the element you get a lot of repetitions of (and hopefully also a good number at the end of each block, which can be replaced with a single marker value), whereas the other coefficients are not so repetitive. If you expect repetitions of other values, I guess YMMV.
And yes, you can encode the RLE pairs as single symbols in the Huffman encoding.
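For example, a minimal sketch of building a Huffman code where each (run, value) pair is treated as one symbol (pure Python, heap-based; the sample pairs at the bottom are made up):

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    freq = Counter(symbols)
    # Each heap entry: (weight, tie_breaker, {symbol: code_so_far})
    heap = [(w, i, {sym: ''}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                       # degenerate case: only one symbol
        return {sym: '0' for sym in heap[0][2]}
    tie = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in c1.items()}
        merged.update({s: '1' + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

# Example: the (run, value) pairs from encoded blocks become the symbols.
pairs = [(2, 4), (7, 0), (3, 1), (7, 0), (7, 0)]
table = huffman_code(pairs)
bitstream = ''.join(table[p] for p in pairs)
```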
1) Yes you'll want to convert to YUV... to achieve higher compression ratios, you need to take advantage of the human eye's ability to "overlook" significant loss in color. Typically, you'll keep your Y plane the same resolution (presumably the A plane as well), but downsample the U and V planes by 2x2. E.g. if you're doing 640x480, the Y is 640x480 and the U and V planes are 320x240. Also, you might choose different quantization for the U/V planes. The cost for this conversion is small compared to DCT or DFT.
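A rough sketch of that conversion and the 2x2 chroma downsample, using the usual BT.601/JFIF coefficients (function name and array layout are illustrative):

```python
import numpy as np

def rgba_to_ycbcr_a(rgba):                 # rgba: (H, W, 4) uint8 array, H and W even
    r, g, b, a = [rgba[..., i].astype(float) for i in range(4)]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128

    def down2(plane):                      # average each 2x2 block
        h, w = plane.shape
        return plane.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

    return y, down2(cb), down2(cr), a      # Y and A stay at full resolution
```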
2) You don't have to RLE it, you could just Huffman Code it directly.

Why JPEG compression processes image by 8x8 blocks?

Why does JPEG compression process the image in 8x8 blocks instead of applying the Discrete Cosine Transform to the whole image?
8 x 8 was chosen after numerous experiments with other sizes.
The conclusions of those experiments are:
1. Matrices larger than 8 x 8 are harder to do mathematical operations on (like transforms etc.), are not well supported by hardware, or take longer to process.
2. Matrices smaller than 8 x 8 don't hold enough information to continue along the pipeline, which results in bad quality of the compressed image.
Because that would take "forever" to decode. I don't remember fully now, but I think you need at least as many coefficients as there are pixels in the block, and if you code the whole image as a single block, then for every pixel you need to iterate through all the DCT coefficients.
I'm not very good at big-O calculations, but I guess the complexity would be O("forever"). ;-)
For modern video codecs I think they've started using 16x16 blocks instead.
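To put rough numbers on that "forever": a naive 2D DCT computes each coefficient as a sum over every pixel of the block, so a whole-image block scales with the square of the pixel count, while fixed 8x8 blocks scale linearly. A quick comparison for a 1920x1080 frame (ignoring fast/separable DCT algorithms, which change the constants but not the gap):

```python
width, height = 1920, 1080
pixels = width * height

full_image = pixels * pixels                  # N^2 coefficients x N^2 pixels
per_block  = 8 * 8 * 8 * 8                    # 64 coefficients x 64 pixels
blocked    = (pixels // 64) * per_block       # number of blocks x cost per block

print(f"full image: ~{full_image:.2e} multiply-adds")   # ~4.3e12
print(f"8x8 blocks: ~{blocked:.2e} multiply-adds")       # ~1.3e8
```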
One good reason is that images (or at least the kind of images humans like to look at) have a high degree of information correlation locally, but not globally.
Every relatively smooth patch of skin, or piece of sky or grass or wall eventually ends in a sharp edge and is replaced by something entirely different. This means you still need a high frequency cutoff in order to represent the image adequately rather than just blur it out.
Now, because Fourier-like transforms such as the DCT "jumble" all the spatial information, you wouldn't be able to throw away intermediate coefficients either, nor the high-frequency components "you don't like".
There are of course other ways to try to discard visual noise and reconstruct edges at the same time, by preserving high-frequency components only when needed, or doing some iterative reconstruction of the image at finer levels of detail. You might want to look into scale-space representation and wavelet transforms.

MagickQuantizeImage usage

I am processing some images using ImageMagick library. As part of the processing I want to minimize the number of colors if this doesn't affect image quality (too much).
For this I have tried to use the MagickQuantizeImage function. Can someone explain how I should choose the parameters?
treedepth:
Normally, this integer value is zero or one. A zero or one tells Quantize to choose an optimal tree depth of Log4(number_colors). A tree of this depth generally allows the best representation of the reference image with the least amount of memory and the fastest computational speed. In some cases, such as an image with low color dispersion (a small number of colors), a value other than Log4(number_colors) is required. To expand the color tree completely, use a value of 8.
dither:
A value other than zero distributes the difference between an original image and the corresponding color reduced algorithm to neighboring pixels along a Hilbert curve.
measure_error:
A value other than zero measures the difference between the original and quantized images. This difference is the total quantization error. The error is computed by summing over all pixels in an image the distance squared in RGB space between each reference pixel value and its quantized value.
PS: I have made some tests, but sometimes the quality of the images is severely affected, and I don't want to find a result by trial and error.
This is a really good description of the algorithm
http://www.imagemagick.org/www/quantize.html
They are referencing the command-line version, but the concepts are the same.
The parameter measure_error is meant to give you an indication of how good an answer you got. Set it to non-zero, then look at the Image object's mean_error_per_pixel field after you quantize to see how good a quantization you got.
If it's not good enough, increase the number of colors.
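If you want to experiment from Python rather than the C API, a rough stand-in sketch (Pillow's quantize here instead of MagickQuantizeImage, with the same kind of summed-squared-RGB-distance error that measure_error reports; the file name is a placeholder):

```python
import numpy as np
from PIL import Image

def mean_quantization_error(path, colors):
    original = Image.open(path).convert("RGB")
    quantized = original.quantize(colors=colors).convert("RGB")
    a = np.asarray(original, dtype=float)
    b = np.asarray(quantized, dtype=float)
    # Squared RGB distance per pixel, averaged over the image.
    return ((a - b) ** 2).sum(axis=-1).mean()

# Increase the colour count until the error drops below a level you accept.
for n in (16, 32, 64, 128, 256):
    print(n, mean_quantization_error("input.png", n))
```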

detecting pauses in a spoken word audio file using pymad, pcm, vad, etc

First I am going to broadly state what I'm trying to do and ask for advice. Then I will explain my current approach and ask for answers to my current problems.
Problem
I have an MP3 file of a person speaking. I'd like to split it up into segments roughly corresponding to a sentence or phrase. (I'd do it manually, but we are talking hours of data.)
If you have advice on how to do this programmatically, or know of some existing utilities, I'd love to hear it. (I'm aware of voice activity detection and I've looked into it a bit, but I didn't see any freely available utilities.)
Current Approach
I thought the simplest thing would be to scan the MP3 at certain intervals and identify places where the average volume was below some threshold. Then I would use some existing utility to cut up the mp3 at those locations.
I've been playing around with pymad and I believe that I've successfully extracted the PCM (pulse code modulation) data for each frame of the mp3. Now I am stuck because I can't really seem to wrap my head around how the PCM data translates to relative volume. I'm also aware of other complicating factors like multiple channels, big endian vs little, etc.
Advice on how to map a group of pcm samples to relative volume would be key.
Thanks!
PCM is a time-frame-based encoding of sound. For each time frame, you get a peak level. (If you want a physical reference for this: the peak level corresponds to the distance the microphone membrane was moved out of its resting position at that given time.)
Let's forget that PCM can use unsigned values for 8-bit samples, and focus on signed values. If the value is > 0, the membrane was on one side of its resting position; if it is < 0, it was on the other side. The bigger the displacement from rest (no matter to which side), the louder the sound.
Most voice classification methods start with one very simple step: They compare the peak level to a threshold level. If the peak level is below the threshold, the sound is considered background noise.
Looking at the parameters in Audacity's Silence Finder, the silence level should be that threshold. The next parameter, Minimum silence duration, is obviously the length of a silence period that is required to mark a break (or in your case, the end of a sentence).
If you want to code a similar tool yourself, I recommend the following approach:
Divide your sound sample into discrete sets of a specific duration. I would start with 1/10, 1/20 or 1/100 of a second.
For each of these sets, compute the maximum peak level.
Compare this maximum peak to a threshold (the silence level in Audacity). The threshold is something you have to determine yourself, based on the specifics of your sound sample (loudness, background noise etc.). If the max peak is below your threshold, this set is silence.
Now analyse the series of classified sets: calculate the length of each run of silence in your recording (length = number of silent sets * length of a set). If it is above your Minimum silence duration, assume that you have the end of a sentence there.
The main point in coding this yourself instead of continuing to use Audacity is that you can improve your classification by using more advanced analysis methods. One very simple metric you can apply is called the zero crossing rate: it just counts how often the sign switches in your given set of peak levels (i.e. how often your values cross the 0 line). There are many more, all of them more complex, but it may be worth the effort. Have a look at discrete cosine transformations, for example...
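A minimal sketch of the windowed peak-threshold approach above, assuming the MP3 has already been decoded (e.g. via pymad) into a mono numpy array of signed samples; the window size and threshold are illustrative and will need tuning:

```python
import numpy as np

def find_silences(samples, sample_rate, window_s=0.05,
                  threshold=1000, min_silence_s=0.5):
    window = int(sample_rate * window_s)
    n_windows = len(samples) // window
    # Peak level per window: max |sample|; silent if below threshold.
    peaks = np.abs(samples[:n_windows * window].reshape(n_windows, window)).max(axis=1)
    silent = peaks < threshold

    min_windows = int(min_silence_s / window_s)
    silences, run_start = [], None
    for i, is_silent in enumerate(silent):
        if is_silent and run_start is None:
            run_start = i
        elif not is_silent and run_start is not None:
            if i - run_start >= min_windows:
                silences.append((run_start * window_s, i * window_s))  # seconds
            run_start = None
    if run_start is not None and n_windows - run_start >= min_windows:
        silences.append((run_start * window_s, n_windows * window_s))
    return silences          # list of (start, end) times to cut at
```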
Just wanted to update this. I'm having moderate success using Audacity's Silence Finder. However, I'm still interested in this problem. Thanks.
PCM encodes the waveform as a series of samples taken at a fixed rate, where each sample is a number giving the instantaneous amplitude of the signal. (A stream of single bits that each indicate an increase or decrease in the signal is delta modulation, a different scheme.)
To estimate loudness, look at the magnitude of the samples over short stretches of time; once you've done that, you should be able to pick out the spots where the amplitude stays low.
You may also try to use a Fourier transform to estimate where the signals are most distinct.

How can I scale down an array of raw rgb data on a 16 bit display

I have an array of raw RGB data on a 16-bit display with dimensions of 320 * 480. The size of the array is 320*480*4 = 614400.
I would like to know how I can scale this down to 80 * 120 without losing image quality.
I found this link about scaling an image in a 2D array, but how can I apply that to my array for a 16-bit display? It is not a simple 2D array (because it has 16-bit color).
Image scaling and rotating in C/C++
Thank you.
If you are scaling down a big image to a smaller one, you WILL lose image quality.
The question, then, is how to minimize that loss.
There are many algorithms that do this, each with strengths and weaknesses.
Typically you will apply some sort of filter to your image, such as Bilinear or Nearest Neighbor. Here is a discussion of such filters in the context of ImageMagick.
Also, if the output is going to be less than 16 bits per pixel, you need to do some form of Color Quantization.
I assume that you mean a 16-bit RGB display, not a display that has each color (red, green, and blue) as 16 bits. I also assume you know how your r, g, and b values are encoded in that 16-bit space, because there are two common possibilities (5-6-5, or 5-5-5 with one bit unused).
So, assuming you know how to split your color space up, you can now use a series of byte arrays to represent your data. The tricky decision is whether to go with byte arrays, which give you a body of existing algorithms that can already do the work on those arrays but cost you a few extra bits per channel that you may not be able to spend, or to keep everything crammed into that 16-bit format and do the work on the appropriate bits of each 16-bit pixel. Only you can really answer that question; if you have the memory, I'd opt for the byte array approach, because it's probably faster and you'll get a little extra precision to make the images look smooth(er) in the end.
Given those assumptions, the question is really answerable by how much time you have on your device. If you have a very fast device, you can implement a Lanczos resampling. If you have a less fast device, bicubic interpolation works very well as well. If you have an even slower device, bilinear interpolation is your friend.
If you really have no speed to spare, I'd do the rescaling in some external application, like Photoshop, and save a series of bitmaps that you load as you need them.
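If you do that preprocessing in Python instead, Pillow's resize filters map directly onto the options above (requires Pillow >= 9.1 for the Resampling enum; the file name is a placeholder):

```python
from PIL import Image

img = Image.open("input.png")                                    # 320x480 source
small_fast   = img.resize((80, 120), Image.Resampling.NEAREST)   # cheapest, blockiest
small_ok     = img.resize((80, 120), Image.Resampling.BILINEAR)  # good speed/quality balance
small_better = img.resize((80, 120), Image.Resampling.BICUBIC)
small_best   = img.resize((80, 120), Image.Resampling.LANCZOS)   # highest quality, slowest
```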
There are plenty of methods of scaling down images, but none can guarantee not losing "quality". Ultimately information is lost during the rescaling process.
You have 16-bit colors = 2 bytes per pixel, but in your size calculation you use a multiplier of 4.
Maybe you don't actually need to reduce the image size?
In general it is impossible to scale a raster image without losing quality. Some algorithms, however, scale with almost no visible quality loss.
Since you are scaling down by a factor of 4, each 4x4 block of pixels in your original image will correspond to a single pixel in your output image. You can then loop through each 4x4 block in the original image and reduce it to a single pixel. A simple way (perhaps not the best way) to do this reduction could be to take the average or median of the RGB components.
You should note that you cannot do image scaling without losing image quality unless, for every block in the original image, each pixel is the exact same colour (which is unlikely).
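A minimal sketch of that 4x4 block averaging for RGB565 data, assuming the common 5-6-5 packing and dimensions divisible by 4 (this is a plain box filter, chosen for simplicity rather than maximum quality):

```python
import numpy as np

def downscale_rgb565(pixels, width, height, factor=4):
    # pixels: sequence of width*height packed 5-6-5 uint16 values.
    px = np.asarray(pixels, dtype=np.uint16).reshape(height, width)
    # Unpack 5-6-5 into separate channels.
    r = (px >> 11) & 0x1F
    g = (px >> 5) & 0x3F
    b = px & 0x1F

    def box(channel):
        h, w = channel.shape
        blocks = channel.reshape(h // factor, factor, w // factor, factor)
        return blocks.mean(axis=(1, 3)).astype(np.uint16)

    r, g, b = box(r), box(g), box(b)
    # Repack into 16-bit 5-6-5 pixels.
    return (r << 11) | (g << 5) | b

# e.g. 320x480 -> 80x120:
# small = downscale_rgb565(raw, width=320, height=480)
```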