JPEG compression fundamentals - compression

I was reading about JPEG compression, but I have some trouble understanding the basics.
Please see this schema:
http://www.cs.cf.ac.uk/Dave/Multimedia/Topic5.fig_29.gif
My problem is with the last steps. Consider a 16*16 pixel grayscale image, so we have 4 blocks of size 8*8. After the zigzag scan we have 4 arrays of size 1*64, where the first index of each array is the DC value and the remaining 63 values are the AC components. Let's assume they are like this:
BLOCK-1: 150, -1, 6, 0, -3, ...
BLOCK-2: -38, 4, -6, -1, 1, ...
BLOCK-3: 18, -2, 3, 4, 1, ...
BLOCK-4: 45, 3, 5, -1, 1, ...
I know that DPCM encodes the difference from the previous 8*8 block, but how? Something like this:
150, 150-(-38), -38-18, 45-18 >>
150, 188, -56, 27
Then according to the JPEG coefficient coding table we have:
10010110-111110, 10111100-111110, 01100011-111110, 11011-110
For the AC components of, for example, the first block (-1, 6, 0, -3, ...) we use RLE, so we have:
(0,-1), (0,6), (1,-3), ...
Then according to the JPEG default AC code table we have:
00-0, 100-110, 111001-10
If my calculations are correct, what happens next? Do we put the coded DC of the first block and after it the RLE codes of the 63 remaining values, and so on? I mean, for the first block we would have 10010110-111110, 00-0, 100-110, 111001-10, ...
I'm a bit confused and I couldn't find the answer anywhere :(

First of all, I highly recommend referring to jpec, a tiny JPEG encoder written in C (grayscale only, baseline DCT-based JPEG, 8x8 blocks only).
You can find the main compression steps here. In particular, please refer to the line that corresponds to the entropy coding step of the current block.
DPCM encodes the difference from the previous 8*8 block, but how?
The entropy coding operates block after block. Assuming the current block is not empty, the DC coefficient encoding is done by first computing the difference between the current and previous DC values:
int val, bits, nbits;
/* DC coefficient encoding */
if (block->len > 0) {
  val = block->zz[0] - s->dc;
  s->dc = block->zz[0];
}
Note: s represents the entropy coder state. Also, for the first block, s->dc is initialized to 0.
So val represents the current DC difference:
the size of this difference (= number of bits) is encoded by looking up the corresponding DC code in the Huffman table,
then its amplitude (= value) is encoded.
If the difference is negative, the one's complement of its magnitude is used (note the ~val below).
bits = val;
if (val < 0) {
  val = -val;    /* keep the magnitude for the size computation */
  bits = ~val;   /* one's complement of the magnitude encodes the negative amplitude */
}
JPEC_HUFF_NBITS(nbits, val);                                       /* size = number of bits */
jpec_huff_write_bits(s, jpec_dc_code[nbits], jpec_dc_len[nbits]);  /* (1) Huffman code for the size */
if (nbits) jpec_huff_write_bits(s, (unsigned int) bits, nbits);    /* (2) amplitude bits */
For the full version please refer to this code section.
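To make the bitstream question from the original post concrete: in baseline JPEG each block is emitted as the Huffman-coded DC difference followed immediately by the Huffman-coded (run, value) pairs of its AC coefficients, then the next block follows in the same way. Below is a small standalone C++ sketch (not taken from jpec) that applies the same DPCM and zero-run-length logic to the example blocks from the question; the Huffman tables themselves are left out.

#include <cstdio>
#include <vector>

int main() {
    // DC values of the four 8x8 blocks from the question, in scan order.
    const int dc[4] = {150, -38, 18, 45};

    // DPCM: each block stores current DC minus previous DC; the predictor
    // starts at 0, so the example gives 150, -188, 56, 27.
    int prev = 0;
    for (int i = 0; i < 4; ++i) {
        int diff = dc[i] - prev;
        prev = dc[i];
        std::printf("block %d: DC diff = %d\n", i + 1, diff);
    }

    // AC coefficients of the first block (after its DC), from the question.
    // Each RLE pair is (number of preceding zeros, next non-zero value).
    const std::vector<int> ac = {-1, 6, 0, -3};
    int run = 0;
    for (int v : ac) {
        if (v == 0) { ++run; continue; }
        std::printf("(%d,%d)\n", run, v);   // prints (0,-1) (0,6) (1,-3)
        run = 0;
    }
    return 0;
}

Note that the DC differences come out as 150, -188, 56, 27 rather than the values guessed in the question, because each difference is the current DC minus the previous one.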

Related

Arduino eInk Image2LCD - Size of c-array

This Image2LCD software (https://www.buydisplay.com/default/image2lcd) converts images to C arrays. I want to write this basic operation myself, but I don't understand why the software outputs an array of length 5000 for an input image of size 200x200. For 400x400 the array size is 20000. It seems like it's always 1/8 of the number of pixels.
The output array for the square 200x200 image begins and ends like this:
const unsigned char gImage_test[5000] = { /* 0X00,0X01,0XC8,0X00,0XC8,0X00, */
0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,
0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,
0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,
0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,
0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X60,0X00,0X00,0X00,0X00,
0X3C,0X60,0X00,0X0C,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,
0X00,0X00,0X00,0X00,0X70,0X00,0X00,0X00,0X00,0X7E,0X70,0X00,0X0E,0X00,0X00,0X00,
0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X78,0X00,0X00,
0X00,0X00,0X7F,0X78,0X00,0X0F,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,
0X00,0X00,0X00,0X00,0X00,0X00,0X7F,0XFC,0X3C,0X3E,0X3C,0X3F,0XF8,0X3C,0X7F,0X00,
0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X7F,
...
,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,
0X00,0X00,0X00,0X00,0X00,0X00,0X00,0X00,};
(Yes there is a lot of white in the image.)
Why don't you need one value for each pixel?
Shooting from the hip here, but if you're using monochrome, you only need one bit per pixel (and a byte is 8 bits). These bits can be packed into bytes for storage efficiency. Say the first 8 pixels of your image are these:
0 1 0 0 0 0 0 1
If we interpret these eight bits as one binary number, this is 01000001, which is 65 in decimal - so just storing 65 in an 8-bit integer, taking up only one byte, will store all 8 monochrome pixels. The downside is that it's not as intuitive as having each pixel as a separate value in the array.
I may be wrong, but 1/8th points straight to this kind of compression.
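For what it's worth, here is a minimal C++ sketch of that packing (MSB-first bit order within each byte is assumed; Image2LCD has settings for bit order and scan direction, so the real layout should be checked against them):

#include <cstdint>
#include <cstdio>
#include <vector>

// Pack a row of monochrome pixels (0 or 1) into bytes, MSB first.
std::vector<uint8_t> packRow(const std::vector<int>& pixels) {
    std::vector<uint8_t> out((pixels.size() + 7) / 8, 0);
    for (size_t i = 0; i < pixels.size(); ++i) {
        if (pixels[i])
            out[i / 8] |= static_cast<uint8_t>(0x80 >> (i % 8));
    }
    return out;
}

int main() {
    std::vector<int> row = {0, 1, 0, 0, 0, 0, 0, 1};  // the example pixels
    std::vector<uint8_t> packed = packRow(row);
    std::printf("0x%02X\n", packed[0]);               // prints 0x41 (65 decimal)
    return 0;
}

For a 200x200 monochrome image this gives 200*200/8 = 5000 bytes, which matches the observed array length.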

Combining two part of a word with a part of another word and using a scale factor

I am trying to extract data from a GPS receiver. The way they transmit their information is shown in the following figure.
I am trying to get roota. I have word 8 and word 9 in separate bitsets. How do I combine the relevant bits into one bitset?
I also need to use this scale factor information to get a double precision number. The scale factor information is shown below.
I have tried
std::bitset<32> word8_binary; // is filled with word 8 data.
std::bitset<32> word9_binary; // is filled with word 9 data.
std::bitset<32> roota_binary;
for (int i=13; i>5; i--)
{
roota_binary = word8_binary[i] << 32 | word9_binary[i];
}
But this is not giving me the result I want. I also don't know how to use the scale factor.
Your help would be much appreciated.
This might not be "c++" enough, but I'd do bit manipulation on the corresponding integers:
unsigned long roota_binary = (((word8_binary.to_ulong() >> 6) & 0xFF) << 24) +
                             ((word9_binary.to_ulong() >> 6) & 0xFFFFFF);
The >>6 gets rid of the parity bits, the masks isolate the bits you need, and the <<24 puts the high ones on top.
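Putting that together with the scale factor, here is a small self-contained sketch. The scale-factor table isn't visible here, so 2^-19 is used purely as a placeholder exponent, and roota is treated as an unsigned field; substitute the exponent and signedness your table actually gives for roota.

#include <bitset>
#include <cmath>
#include <cstdio>

int main() {
    // Placeholders: in the real program these hold word 8 and word 9 of the subframe.
    std::bitset<32> word8_binary(0x00000000);  // hypothetical content
    std::bitset<32> word9_binary(0x00000000);  // hypothetical content

    // Drop the 6 parity bits of each word, keep 8 data bits from word 8 and
    // 24 data bits from word 9, and concatenate them (word 8 bits on top).
    unsigned long roota_raw =
        ((word8_binary.to_ulong() >> 6) & 0xFFul) << 24 |
        ((word9_binary.to_ulong() >> 6) & 0xFFFFFFul);

    // Apply the scale factor: scaled value = raw integer * 2^exponent.
    // 2^-19 below is only a placeholder; take the exponent from your table.
    double roota = static_cast<double>(roota_raw) * std::pow(2.0, -19);
    std::printf("roota = %.9f\n", roota);
    return 0;
}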

C++ - Reading number of bits per pixel from BMP file

I am trying to get the number of bits per pixel in a BMP file. According to Wikipedia, it is supposed to be at the 28th byte. So after opening the file:
// Seek to the byte where the number of bits per pixel is stored
plik.seekg(28, ios::beg);
// Read the number of bits used per pixel
int liczbaBitow;
plik.read((char*)&liczbaBitow, 2);
cout << "liczba bitow " << liczbaBitow << endl;
But liczbaBitow (the variable that is supposed to hold the number of bits per pixel) is -859045864. I don't know where that comes from... I'm pretty lost.
Any ideas?
To clarify @TheBluefish's answer, this code has a bug:
// Read the number of bits used per pixel
int liczbaBitow;
plik.read((char*)&liczbaBitow, 2);
When you use (char*)&liczbaBitow, you're taking the address of a 4-byte integer and telling the code to put 2 bytes there.
The other two bytes of that integer are unspecified and uninitialized. In this case they're 0xCC, because that's the stack fill value used by your compiler's debug runtime.
But if you're calling this from another function or repeatedly, you can expect the stack to contain other bogus values.
If you initialize the variable, you'll get the value you expect.
But there's another bug: byte order matters here too. This code assumes that the machine's native byte order exactly matches the byte order in the file specification. There are a number of different bitmap formats, but from your reference, the Wikipedia article says:
All of the integer values are stored in little-endian format (i.e. least-significant byte first).
That happens to match your machine, which is obviously an x86 little-endian one, but other fields aren't necessarily defined to be little-endian, so as you proceed to decode the image you'll have to watch for it.
Ideally, you'd read into a byte array and put the bytes where they belong.
See Convert Little Endian to Big Endian
int liczbaBitow;
unsigned char bpp[2];
plik.read(reinterpret_cast<char*>(bpp), 2);   // read the two raw bytes
liczbaBitow = bpp[0] | (bpp[1] << 8);         // assemble as little-endian
-859045864 is 0xCCCC0018 in hexadecimal.
The low-order two bytes give 0x0018 = 24 bpp, which is the value you are actually after.
What is most likely happening here is that liczbaBitow starts out filled with 0xCCCCCCCC, while your plik.read only writes the lower 16 bits and leaves the upper 16 bits unchanged. Changing that line should fix the issue:
int liczbaBitow = 0;
Though, especially with something like this, it's best to use a datatype that exactly matches your data:
int16_t liczbaBitow = 0;
This can be found in <cstdint>.
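Putting the pieces together, here is a small self-contained sketch of the endian-safe version (the file name is just a placeholder):

#include <cstdint>
#include <fstream>
#include <iostream>

int main() {
    // "test.bmp" is a placeholder path.
    std::ifstream plik("test.bmp", std::ios::binary);
    if (!plik) return 1;

    // The bits-per-pixel field sits at offset 28: 14 bytes of file header
    // plus 14 bytes into the BITMAPINFOHEADER.
    plik.seekg(28, std::ios::beg);

    unsigned char bpp[2];
    plik.read(reinterpret_cast<char*>(bpp), 2);

    // Assemble the 16-bit value explicitly as little-endian,
    // independent of the host byte order.
    uint16_t liczbaBitow = static_cast<uint16_t>(bpp[0] | (bpp[1] << 8));
    std::cout << "bits per pixel: " << liczbaBitow << std::endl;
    return 0;
}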

Coding an enhanced LSB reverser

I've stumbled upon a steganographied PNG image with an IDAT structure divided into 12 chunks (the last one slightly smaller). I'll elaborate a bit on the structure of the issue before I get to the real point of my question, since I need to clarify some things, so please do not mark it as off-topic; I just have to explain the notion behind the script so that I can get to the issue itself.

The image definitely has data embedded into it. The data seems to have been concealed by altering the enhanced LSB values, eliminating all the higher bits of each pixel and keeping only the least significant bit. So all bytes are going to be 0 or 1, since 0 or 1 on a 0-255 range won't give any visible color; basically, in the enhanced view a 0 stays at 0 and a 1 becomes the maximum value, 255. I've been analyzing this image in many different ways, but I don't see anything odd beyond the utter lack of one value in any of the three color channels (RGB) and the heightened presence of another value in a third of the color values. Studying these and replacing bytes has given me nothing, however, and I am at a loss as to whether this avenue is even worth pursuing.
Hence, I'm looking into developing a script, in Python, PHP or C/C++, that would reverse the process and 'restore' the enhanced LSBs.
I've converted it to a 24-bit .BMP, and tracking the red curve from a chi-square steganalysis, it's clear that there is steganographied data within the file.
First, there are a little more than 8 vertical zones, which suggests a little more than 8 kB of hidden data. One pixel can be used to hide three bits (one in the LSB of each RGB color channel), so we can hide (98x225)x3 bits. To get the number of kilobytes, we divide by 8 and by 1024: ((98x225)x3)/(8x1024), which comes to around 8.1 kilobytes. But that doesn't seem to be the case here.
The analysis of the APP0 and APP1 markers of a .JPG version of the file also gives some awkward output:
Start Offset: 0x00000000
*** Marker: SOI (xFFD8) ***
OFFSET: 0x00000000
*** Marker: APP0 (xFFE0) ***
OFFSET: 0x00000002
length = 16
identifier = [JFIF]
version = [1.1]
density = 96 x 96 DPI (dots per inch)
thumbnail = 0 x 0
*** Marker: APP1 (xFFE1) ***
OFFSET: 0x00000014
length = 58
Identifier = [Exif]
Identifier TIFF = x[4D 4D 00 2A 00 00 00 08 ]
Endian = Motorola (big)
TAG Mark x002A = x[002A]
EXIF IFD0 # Absolute x[00000026]
Dir Length = x[0003]
[IFD0.x5110 ] =
[IFD0.x5111 ] = 0
[IFD0.x5112 ] = 0
Offset to Next IFD = [00000000]
*** Marker: DQT (xFFDB) ***
Define a Quantization Table.
OFFSET: 0x00000050
Table length = 67
----
Precision=8 bits
Destination ID=0 (Luminance)
DQT, Row #0: 2 1 1 2 3 5 6 7
DQT, Row #1: 1 1 2 2 3 7 7 7
DQT, Row #2: 2 2 2 3 5 7 8 7
DQT, Row #3: 2 2 3 3 6 10 10 7
DQT, Row #4: 2 3 4 7 8 13 12 9
DQT, Row #5: 3 4 7 8 10 12 14 11
DQT, Row #6: 6 8 9 10 12 15 14 12
DQT, Row #7: 9 11 11 12 13 12 12 12
Approx quality factor = 94.02 (scaling=11.97 variance=1.37)
I'm nearly convinced that no encryption algorithm was applied, and therefore that no key is involved in the concealment. My idea is to code a script that would shift the LSB values back and return the originals. I've run the file through several structure analyses, statistical attacks, BPCS, and so on.
The histogram of the image shows one specific color with an unusual spike. I've manipulated that as best I can to try to view any hidden data, but to no avail. (The histograms of the RGB values were attached as images.)
Then there are the multiple IDAT chunks. But I've put together a similar image by defining random color values at each pixel location, and I too wound up with several of these, and so far I've found very little inside them. Even more interesting is the way that color values are repeated in the image. It seems that the frequency of reused colors could hold some clue, but I have yet to fully understand that relationship, if one exists. Additionally, there is only a single column and a single row of pixels that do not have a full value of 255 on their alpha channel. I've even interpreted the X, Y, A, R, G, and B values of every pixel in the image as ASCII, but wound up with nothing legible. Even the green curve of the average of the LSBs cannot tell us anything; there is no evident break. (Several other histograms, showing the weird curve of the blue value from the RGB, were attached as images.)
But the red curve, the output of the chi-square analysis, shows some difference: it can see something that we cannot see. Statistical detection is more sensitive than our eyes, and I guess that was my final point. However, there is also a sort of latency in the red curve. Even without hidden data, it starts at the maximum and stays there for some time; it's close to a false positive. The LSB plane of the image is very close to random, and the algorithm needs a large population (remember the analysis is done on an incrementing population of pixels) before reaching a threshold where it can decide that the bits are actually not random after all, at which point the red curve starts to go down. The same sort of latency happens with hidden data: you hide 1 or 2 kB, but the red curve does not go down right after this amount of data; it waits a little bit, here at around 1.3 kB and 2.6 kB respectively. Here is a representation of the data types from a hex editor:
byte = 166
signed byte = -90
word = 40,358
signed word = -25,178
double word = 3,444,481,446
signed double word = -850,485,850
quad = 3,226,549,723,063,033,254
signed quad = 3,226,549,723,063,033,254
float = -216652384.
double = 5.51490063721e-093
word motorola = 42,653
double word motorola = 2,795,327,181
quad motorola = 12,005,838,827,773,085,484
Here's another spectrum to confirm the behavior of the blue (RGB) value.
Please note that I needed to go through all of this in order to clarify the situation and the programming matter that I'm in pursuit of. This by itself makes my question NOT off-topic so I'd be glad if it doesn't get marked as such. Thank you.
In the case of an image with LSB enhancement applied, I cannot think of a way to reverse it back to its original state, because there is no clue about the original RGB values: they are set to either 255 or 0 depending on their least significant bit. The other possibility I see here is that this is some sort of protocol involving quantum steganography.
MATLAB and some steganalysis techniques could be the key to your issue, though.
Here's a Java chi-square class for some statistical analysis:
private long[] pov = new long[256];
and three methods:
public double[] getExpected() {
  double[] result = new double[pov.length / 2];
  for (int i = 0; i < result.length; i++) {
    // average each pair of adjacent histogram bins (2.0 avoids integer division)
    double avg = (pov[2 * i] + pov[2 * i + 1]) / 2.0;
    result[i] = avg;
  }
  return result;
}

public void incPov(int i) {
  pov[i]++;
}

public long[] getPov() {
  long[] result = new long[pov.length / 2];
  for (int i = 0; i < result.length; i++) {
    result[i] = pov[2 * i + 1];
  }
  return result;
}
or try some bitwise shift operations, such as:
int pRGB = image.getRGB(x, y);      // packed as 0xAARRGGBB
int alpha = (pRGB >> 24) & 0xFF;
int red   = (pRGB >> 16) & 0xFF;
int green = (pRGB >> 8) & 0xFF;
int blue  = pRGB & 0xFF;
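Since the question asks for a script in Python, PHP or C/C++, here is a minimal C++ sketch of extracting the hidden bit stream from the enhanced image, assuming the pixels have already been decoded into a raw buffer of 8-bit channel values (for example via the 24-bit BMP conversion mentioned in the question). The MSB-first bit order and the R,G,B channel order are guesses and may need to be swapped; the original cover image itself cannot be reconstructed this way, only the embedded bits.

#include <cstdint>
#include <cstdio>
#include <vector>

// In an "LSB enhanced" image a channel value of 255 stands for a hidden bit
// of 1 and 0 stands for 0, so the embedded bit stream can be read back out.
std::vector<uint8_t> extractBits(const std::vector<uint8_t>& channels) {
    std::vector<uint8_t> bytes((channels.size() + 7) / 8, 0);
    for (size_t i = 0; i < channels.size(); ++i) {
        uint8_t bit = channels[i] >= 128 ? 1 : 0;                    // 255 -> 1, 0 -> 0
        bytes[i / 8] |= static_cast<uint8_t>(bit << (7 - (i % 8)));  // MSB-first (a guess)
    }
    return bytes;
}

int main() {
    // Eight hypothetical channel values spelling out one recovered byte.
    std::vector<uint8_t> channels = {0, 255, 0, 0, 0, 0, 0, 255};
    std::vector<uint8_t> hidden = extractBits(channels);
    std::printf("0x%02X\n", hidden[0]);  // prints 0x41
    return 0;
}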

How can I assign RGB color codes to a WORD?

I am assigning color values to the display frame buffer, and that buffer pointer's type is BYTE*, but I am not able to assign an RGB color value to it. I am doing this to set a pixel using DirectDraw on the WinCE platform. Here is a snippet of the code:
BYTE* pDisplayMemOffset = (BYTE*) ddsd.lpSurface;
int x = 100;
int y = 100;
pDisplayMemOffset += x * ddsd.lXPitch + y * ddsd.lPitch;
*(WORD*)pDisplayMemOffset = 0x0f00;
But how can I assign the RGB(100,150,100) combination this way? I have tried using DWORD instead of WORD in the assignment, but it doesn't work. I know I need a hex color value in 0x000000 (RGB) format, but I think a BYTE can't store such a large value.
Can anyone tell me how to do this?
How this assignment can be done depends heavily on the pixel format you specified when acquiring ddsd. See the field ddpfPixelFormat, and specifically dwRGBBitCount inside it.
Maybe you can provide this pixel format information so that I can improve my answer. However, I can easily give you an example of how to do this pixel-color assignment if, for example, the pixel format is:
[1 byte red] [1 byte green] [1 byte blue] [1 byte unused]
Here's the example:
*(pDisplayMemOffset+0) = 0x10;  // assigning 0x10 to the red value of the first pixel
*(pDisplayMemOffset+1) = 123;   // assigning 123 to the green value of the first pixel
                                // (no need for hex)
*(pDisplayMemOffset+4) = 200;   // assigning 200 to the red value of the second pixel
                                // (BYTE is unsigned)
If you have to extract the color values from an integer, it largely depends on which byte ordering and color ordering that integer was given in, but you can try it out easily.
First I would try this:
*(((unsigned int*)pDisplayMemOffset)+0) = 0x1A2A3A4A;
*(((unsigned int*)pDisplayMemOffset)+1) = 0x1B2B3B4B;
If this works, then the pixel format has either an unused 4th byte (like in my example above) or an alpha value that is now set to one of those values. Again: aside from the pixel format, the ordering of the bytes in your integer also decides whether this works directly or whether you have to do some byte swapping.
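Since the original code writes a 16-bit WORD per pixel, here is a small sketch for that case, assuming the surface uses the common RGB565 layout (5 bits red, 6 bits green, 5 bits blue). Whether this matches your surface must be checked against ddsd.ddpfPixelFormat (dwRGBBitCount and the dwRBitMask/dwGBitMask/dwBBitMask fields) before relying on it.

#include <cstdint>
#include <cstdio>

// Pack an RGB triplet into a 16-bit RGB565 pixel by dropping the low bits
// of each channel and shifting them into place.
static uint16_t packRGB565(uint8_t r, uint8_t g, uint8_t b) {
    return static_cast<uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

int main() {
    uint16_t pixel = packRGB565(100, 150, 100);   // the color from the question
    std::printf("0x%04X\n", pixel);               // would be written as: *(WORD*)pDisplayMemOffset = pixel;
    return 0;
}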