One of my major projects is a display library for microcontrollers. As part of it I have a collection of fonts (bitmap) and icons (alpha channel).
Since resources (flash memory and RAM) are limited in microcontrollers I am looking at better ways of storing the data for these fonts and icons.
I am leaning towards a separated-plane arrangement for the data (like the ILBM format on the Amiga used) - that is, instead of storing all the bits of each pixel together, you store the first bits of every pixel in the image together, followed by all the second bits, and so on. That is more efficient when working with image depths that aren't a power of two (have you ever tried packing 3-bit data into an 8-bit data stream?).
I'd also then like to compress each of those bitplanes. RLE seems to be the most sensible. However, since I am now working with streams of bits, and not integer numbers, I am wondering what the best way of implementing the RLE would be.
I could stick to the traditional method of treating the bits in blocks of 8 and looking for repeated bytes (two or more the same are replaced by two the same followed by a count of how many are in the run), but I can't see that working well on the bit-wise data that makes up a single bitplane. (Incidentally, ILBM uses a variant of this byte-wise method, treating the data purely as bytes, with "header" bytes defining how the following byte(s) are to be treated.)
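For concreteness, here is a sketch of that byte-wise scheme in Python (the helper name and the exact layout, a doubled byte followed by a count of extra repeats, are my own assumptions, not ILBM's actual format):

    def rle_bytes(data):
        # a doubled byte signals a run; the byte after it says how many
        # EXTRA copies follow beyond the two already written (0-255)
        out = bytearray()
        i = 0
        while i < len(data):
            b = data[i]
            run = 1
            while i + run < len(data) and data[i + run] == b and run < 257:
                run += 1
            if run >= 2:
                out += bytes([b, b, run - 2])
            else:
                out.append(b)
            i += run
        return bytes(out)

Note that a run of exactly two bytes expands from 2 to 3 bytes, which is exactly the kind of inefficiency I expect on bit-wise plane data.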
An alternative would be an alternating-bit-count method. That is, start by assuming the bit is 0 and record the length of the run of that bit value. Then switch to 1 and record the length of the run of 1 bits. Then switch back to 0 again and record the run length. Etc.
Again, that's great if you have long runs of the same bit, but as soon as the bits alternate rapidly you get a massive increase in the space taken up (8 bits, say 01010101, could end up as 8 bytes of [1,1,1,1,1,1,1,1]).
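And a sketch of the alternating-bit-count method for comparison (again Python, names mine; runs longer than 255 are split by emitting a zero-length run of the opposite bit):

    def rle_bitplane(bits):
        # encode a list of 0/1 values as alternating run lengths,
        # starting from an assumed initial bit value of 0
        out = bytearray()
        current, run = 0, 0
        for b in bits:
            if b == current:
                if run == 255:              # split an over-long run with a
                    out += bytes([255, 0])  # zero-length run of the other bit
                    run = 0
                run += 1
            else:
                out.append(run)
                current ^= 1
                run = 1
        out.append(run)
        return bytes(out)

    def unrle_bitplane(data):
        # decoding needs no working buffer beyond the output itself
        bits, current = [], 0
        for count in data:
            bits.extend([current] * count)
            current ^= 1
        return bits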
The main caveat for anything here is that it has to be efficient - both in CPU to decompress it, and in memory to hold any working buffers while it decompresses. That's why I am thinking RLE rather than any of the other methods.
So I guess I'm looking for the ideas that I have missed. What would be the best implementation for compressing a stream of single bits and representing that compressed data in a byte-centric system?
An example glyph (decimal):
00 00 02 14 03 00 00 00
00 00 09 13 10 00 00 00
00 00 13 05 13 00 00 00
00 05 12 00 12 06 00 00
00 11 15 15 15 11 00 00
00 14 02 00 01 14 00 00
08 12 00 00 00 12 08 00
11 07 00 00 00 07 12 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
Bitplanes 0-3 would, therefore, be:
plane 0  plane 1  plane 2  plane 3
00001000 00111000 00010000 00010000
00111000 00001000 00010000 00111000
00111000 00000000 00111000 00101000
01000000 00000100 01101100 00101000
01111100 01111100 00111000 01111100
00001000 01100100 01000100 01000100
00000000 00000000 01000100 11000110
11000100 11000100 01000110 10000010
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
A glyph this size I would be unlikely even to attempt to compress; it's small enough to make that pointless. It does, however, illustrate the layering of the bitplanes and how the bitstreams relate to the original data.
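For reference, the plane separation itself is trivial; a Python sketch (function name mine), with plane 0 holding the least significant bits as in the table above:

    def split_bitplanes(pixels, depth=4):
        # pixels is a flat, row-major list of values;
        # returns one list of 0/1 bits per plane
        return [[(p >> plane) & 1 for p in pixels] for plane in range(depth)]

    # e.g. the first glyph row 00 00 02 14 03 00 00 00 yields
    # plane 0 = [0, 0, 0, 0, 1, 0, 0, 0], i.e. the bitstream 00001000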
The compression of bitmap images has been the subject of research for decades, so you could just use the result of that research: JBIG2. You can google for open-source JBIG2 code.
Is it possible to generate a WAV file in Python with 24-bit depth and a sample width of 4, not 3 (3x8=24)? The idea is to use a 32-bit depth, so that a sample width of 4 (4x8=32) can be used, but I would try to make the upper bits all ones (1), so that it behaves like 24-bit depth.
I'm open to suggestions.
Thank you.
I found a solution. The trick is to build the WAV file the same way you would for 32-bit depth, but set the LOWER (not upper) 8 bits (the LSBs) of each sample to zero. So in hex the data looks like 00 xx xx xx 00 xx xx xx ... where xx are arbitrary hex values.
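A minimal sketch of that approach with Python's standard wave module (the sample values are made up for illustration):

    import struct, wave

    samples = [0, 1000000, -1000000, 8388607, -8388608]  # signed 24-bit values

    with wave.open("out.wav", "wb") as w:
        w.setnchannels(1)        # mono
        w.setsampwidth(4)        # 4 bytes per sample = 32-bit container
        w.setframerate(44100)
        # shift each 24-bit value up by 8 bits so the low byte is zero;
        # written little-endian, this gives the 00 xx xx xx pattern
        w.writeframes(b"".join(struct.pack("<i", s << 8) for s in samples))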
I have a set of 2D points stored in an array. I need to compress it as much as I can. Preferably fast, but that's not a deal breaker; the compression ratio is the goal. The rules are:
a point = a 32-bit structure, stored as (x,y), 2 bytes for each coordinate
a coordinate = a "float" with 8 bits integer part, 8 bits fractional part
Special properties:
I may change the order of the points as I see fit
I'm given the points ordered by the integer parts of their x and y; maybe I can exploit that, but from what I've seen the fractional parts are pretty much random
the array I receive is contiguous (from a memory standpoint)
What I've researched so far:
consider them plain 32-bit integers, sort them (the order is mine to choose), then compress them as in this question.
consider my array as a plain char string, then apply a Burrows-Wheeler transform (BWT) with run-length encoding, or Huffman
treat my array as plain binary data, then apply an LZW
I have only managed to implement Huffman and BWT so far, but neither gives me a good compression ratio (or uses the main property of my data set). I'm going to try the first option today.
I'm sure there are better ideas. Do you have any? Did you come across anything similar and implemented something really good?
Dataset example, in hex:
00 0A 00 77 00 55 00 80 00 2B 00 B9 00 7A 00 5B
00 F6 00 76 00 B4 00 25 00 47 00 D3 00 F6 00 7D
...
01 05 00 A9 01 B8 00 10 01 4F 00 6A 01 E6 00 DF
01 1F 00 F0 01 BE 00 C3 01 6C 00 87 01 CE 00 44
...
...
15 06 03 F4 15 1E 03 29 15 35 03 10 15 B9 03 22
15 67 03 73 15 EF 03 5C 15 29 03 B8 15 4C 03 2F
...
where e.g. the particle 15 67 03 73 (last row) means a particle at x = 15 + 67/256, y = 3 + 73/256 (in hex). As you can see, the data is somewhat ordered, but the fractional parts are in total disarray.
The first option from the OP is the most promising, but it can be improved further:
1. Reinterpret the coordinates as 16-bit integers.
2. Transform the point positions into distances along a Hilbert curve (or any other space-filling curve).
3. Sort the distances, then apply delta encoding (compute the differences between adjacent distances).
4. Depending on compression/speed preferences, (a) use something like Elias or Golomb codes (fastest), (b) use Huffman encoding, or (c) use something like arithmetic encoding (best compression ratio).
If there is some pattern in the points' distribution, you could try more advanced compressors for step #4: LZ*, BWT, or PPM.
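As an illustration of steps 2 and 3, here is a sketch in Python using the well-known iterative xy-to-distance algorithm for the Hilbert curve (helper names are mine):

    def hilbert_d(x, y, order=16):
        # map a point with `order`-bit coordinates to its distance
        # along the Hilbert curve
        n = 1 << order
        d, s = 0, n >> 1
        while s > 0:
            rx = 1 if x & s else 0
            ry = 1 if y & s else 0
            d += s * s * ((3 * rx) ^ ry)
            if ry == 0:                     # rotate/flip the quadrant
                if rx == 1:
                    x, y = n - 1 - x, n - 1 - y
                x, y = y, x
            s >>= 1
        return d

    def deltas(points):                     # steps 2 and 3 combined
        ds = sorted(hilbert_d(x, y) for x, y in points)
        return [ds[0]] + [b - a for a, b in zip(ds, ds[1:])]

The resulting small deltas are what you feed to the entropy coder of step 4.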
Here are the results of an experimental comparison of the methods for step 4. A worst-case scenario is assumed: the points are randomly and uniformly distributed in the range 00.00 .. FF.FF (so the only compression opportunity is to lose the information about their ordering). All results are computed for 250000 points:
method         compressed size
------         ---------------
Uncompressed:  1000000
Elias4:        522989
Elias3:        495371
Elias2:        505376
Golomb12:      479802
Golomb13:      472238
Golomb14:      479431
Golomb15:      501422
FSE1:          455367
FSE2:          454120
FSE3:          453862
I didn't try Huffman encoding. FSE is a method similar to arithmetic coding. The number after each method name is a configuration parameter: for Elias coding, how many bits are used to encode each number's bit length; for Golomb coding, how many least-significant bits are left uncompressed; for FSE, how many most-significant bits are compressed (along with the bit length).
All the results were produced by this source: http://ideone.com/ulmPzO
Interleave the bits representing the X and Y coordinates of every point, sort and compress.
For instance, if you have the point (X, Y) represented by the two 16 bit numbers
(X15X14X13X12X11X10X9X8X7X6X5X4X3X2X1X0, Y15Y14Y13Y12Y11Y10Y9Y8Y7Y6Y5Y4Y3Y2Y1Y0)
Convert it into the following 32 bit number:
X15Y15X14Y14X13Y13X12Y12X11Y11X10Y10X9Y9X8Y8X7Y7X6Y6X5Y5X4Y4X3Y3X2Y2X1Y1X0Y0
That takes advantage of any clustering that may appear in the data, as physically near points will appear in nearby positions in the sorted list, and their representations will share their leading bits.
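A sketch of the interleaving itself, using the usual bit-spreading trick (a Morton / Z-order code; here X lands in the more significant bit of each pair, matching the layout above; function names are mine):

    def spread(v):
        # insert a zero bit between each of the 16 bits of v
        v &= 0xFFFF
        v = (v | (v << 8)) & 0x00FF00FF
        v = (v | (v << 4)) & 0x0F0F0F0F
        v = (v | (v << 2)) & 0x33333333
        v = (v | (v << 1)) & 0x55555555
        return v

    def interleave(x, y):
        return (spread(x) << 1) | spread(y)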
Update: The point is to have near points sort into near positions. If you mix the X and Y bits, you get that, resulting in long sequences of 32-bit integers with identical values in their leading bits. If you then take deltas, you will have smaller values than if you just sorted on X and then on Y (or vice versa).
You can then consider the data as a k-d tree, where every bit partitions the space (left/right or up/down). For the first levels, you can compress by just recording how many elements there are on one side, until you get down to cells with just a few elements, which can be represented by stating their remaining few bits explicitly. For the best compression you will have to use arithmetic coding.
In the sample image below, the yellow border is just for display purposes.
The actual .png file is a simple black/white image, 3 pixels by 3 pixels. I was originally thinking of trying a 2x2, but that would not have helped me tell a low/high drawing stream from a high/low one. At least this way I get two black and one white from the top, or one white and two black from the bottom.
So I read the chunks of data, get to the IDAT chunk, decode that (zlib) and come up with 12 bytes as follows
00 20 00 40 00 80
So, my question: how does the above get broken down into the 3x3 black and white sample? Also, the image is saved in palette format, and I properly read a bit depth of 1 and a palette of 2 colors: palette[0] is RGBA all zeros, and palette[1] has RGBA of 255, 255, 255, 0.
I'll eventually get into the multiple other depth formats later; I just wanted to start with what I expect to be the easiest.
Part II: Any guidance on handling the other depth formats would help, especially anything special to be considered regarding the alpha channel (which I am already looking for in the palette) that might trip me up.
It would be easier if you used libpng, so I guess this is for learning purposes.
The thing is, if you decompress the IDAT chunk directly you get data that is not meant to be displayed as-is and/or may need to be transformed (because a filter was applied) to get the actual bytes. In the PNG format, each line starts with an extra byte that tells you which filter was applied to that line; the remaining bytes contain the line's pixels.
BTW, 00 20 00 40 00 80 is only 6 bytes (not 12, as you think). Now if you view this data as binary, your 3 lines look like this:
00000000 00100000
00000000 01000000
00000000 10000000
Now, your image is 1 bit per pixel, so 1 byte is enough to hold a line of 3 pixels. Only the 3 highest bits are actually used (the 5 lower bits are ignored). I've replaced the ignored bits with an x, so it's easier to see the actual pixels (0 is black, 1 is white):
00000000 001xxxxx
00000000 010xxxxx
00000000 100xxxxx
In this case, no filter was applied to any line, because the first byte of each line is zero (0 means no filter; values from 1 to 4 mean a filter was applied).
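To make that concrete, here is a sketch that walks your 6 decompressed bytes exactly as described (it assumes, as in your case, that every filter byte is 0):

    data = bytes.fromhex("00 20 00 40 00 80")
    width = 3                        # pixels per line, 1 bit per pixel
    stride = 1 + (width + 7) // 8    # filter byte + packed pixel bytes

    for row in range(3):
        line = data[row * stride:(row + 1) * stride]
        filter_type, packed = line[0], line[1]
        pixels = [(packed >> (7 - i)) & 1 for i in range(width)]
        print(filter_type, pixels)   # 0 [0,0,1] / 0 [0,1,0] / 0 [1,0,0]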
An H.264 file is a stream of NAL (Network Abstraction Layer) units, each encoding a frame (I, B, or P). What is the best way to parse this file, extract the size and detect the end of each NAL unit, and detect the type of frame the NAL unit contains?
If you're not actually trying to decode the frames, you can write a simple 'parser' by reading the h.264 byte stream and looking for the NAL unit signature.
Here's what you need to know:
NAL unit start code: 00 00 01 (sometimes 00 00 00 01), followed by a one-byte NAL unit header.
Header values such as 25, 45, 65 (hex) indicate IDR picture NAL units.
Header values such as 01, 21, 41, 61 (hex) indicate non-IDR picture NAL units.
So, if you find the 3 bytes [00 00 01] in sequence, it's very likely the beginning of a NAL unit. You then need to parse the header byte that follows to find out the type of frame: the low 5 bits are the nal_unit_type (5 = IDR slice, 1 = non-IDR slice). Please refer to the spec for more details.
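A sketch of such a scanner in Python (for Annex B byte streams; the function name is mine, and a real parser would also have to cope with the 4-byte 00 00 00 01 form and with emulation-prevention bytes):

    def nal_units(data):
        # yield (offset, nal_unit_type) for every start code found;
        # unit sizes follow from the gaps between consecutive offsets
        i = data.find(b"\x00\x00\x01")
        while i != -1 and i + 3 < len(data):
            header = data[i + 3]       # the byte right after 00 00 01
            yield i, header & 0x1F     # 5 = IDR, 1 = non-IDR, 7 = SPS, 8 = PPS
            i = data.find(b"\x00\x00\x01", i + 3)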
When I use Photoshop's Save As function and pick the JPEG file format, I get the following window:
As you can see, I select the Baseline ("Standard") format and maximum picture quality. When I open this picture in a hex editor, I see several FF DB markers (which mark the start of quantization tables). No problem yet, but let's look at the next picture:
As you can see from the picture above, an FFDB marker starts at address row BDA. The first two bytes are 00 84, which means this marker holds 132 bytes of data. Doing some math (132 minus the 2 length bytes leaves 130 = 2 x 65, and each 8-bit-precision table takes 1 ID byte plus 64 value bytes), we can conclude that two quantization tables are defined by this marker. The values of the first table are: 0C 08 08 08 09 etc...
In the same file there is another FFDB marker, starting at address row 2885, as you can see in the picture:
Again, the first two bytes are 00 84, which means 132 bytes of data. But this time the values of the first quantization table are: 01 01 01 etc...
How do I know which of the FF DB markers I should use, and why are there several FFDB markers in the file?
Without seeing the entire file, it's hard to say with certainty, but it looks like your first quantization table is for an embedded thumbnail which is compressed with a lower quality. The second quantization table is for the main image and has values of 01,01,01,... because you chose the highest quality and therefore the coefficient values are quantized the least possible amount.
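If it helps, here is a sketch that lists the DQT segments of the main image (the function name is mine; it walks only the top-level marker segments and stops at start-of-scan, so tables belonging to a thumbnail embedded inside an APPn segment are skipped):

    def list_dqt(jpeg):
        i = 2                                 # skip SOI (FF D8)
        while i + 4 <= len(jpeg) and jpeg[i] == 0xFF:
            marker = jpeg[i + 1]
            if marker == 0xDA:                # SOS: compressed data follows
                break
            length = (jpeg[i + 2] << 8) | jpeg[i + 3]  # includes these 2 bytes
            if marker == 0xDB:                # a DQT segment
                seg = jpeg[i + 4:i + 2 + length]
                j = 0
                while j < len(seg):
                    precision = seg[j] >> 4   # 0 = 8-bit, 1 = 16-bit values
                    table_id = seg[j] & 0x0F
                    size = 64 * (2 if precision else 1)
                    print("DQT at 0x%X: table %d, %d bytes of values"
                          % (i, table_id, size))
                    j += 1 + size
            i += 2 + length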