My problem is related to data compression: I have 17 bytes of data, and each byte holds 5 bits of data plus 3 unused bits. What I can't figure out is how to compress this data so that the unused bits are filled starting from byte 0, leaving a few of the bytes towards the end unused and hence reducing space. Any ideas on how to proceed? Please note: a small chunk of code would also be helpful.
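A minimal sketch of the bit packing described above: 17 values of 5 bits each is 85 bits, which fits in 11 bytes instead of 17. The function names `pack5`/`unpack5` are my own; this assumes the 5 significant bits sit in the low bits of each source byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pack n source bytes (5 significant low bits each) into the smallest
 * number of output bytes: ceil(n * 5 / 8). Returns the packed length. */
static size_t pack5(const uint8_t *src, size_t n, uint8_t *dst)
{
    size_t bitpos = 0;
    memset(dst, 0, (n * 5 + 7) / 8);
    for (size_t i = 0; i < n; i++) {
        uint8_t v = src[i] & 0x1F;            /* keep only the 5 data bits */
        size_t byte = bitpos / 8, off = bitpos % 8;
        dst[byte] |= (uint8_t)(v << off);      /* low part of the value    */
        if (off > 3)                           /* value straddles a byte   */
            dst[byte + 1] |= (uint8_t)(v >> (8 - off));
        bitpos += 5;
    }
    return (bitpos + 7) / 8;
}

/* Reverse operation: expand the packed stream back to one byte per value. */
static void unpack5(const uint8_t *src, size_t n, uint8_t *dst)
{
    size_t bitpos = 0;
    for (size_t i = 0; i < n; i++) {
        size_t byte = bitpos / 8, off = bitpos % 8;
        uint16_t w = src[byte];
        if (off > 3)                           /* need bits from next byte */
            w |= (uint16_t)(src[byte + 1] << 8);
        dst[i] = (w >> off) & 0x1F;
        bitpos += 5;
    }
}
```

For 17 input bytes `pack5` returns 11, so bytes 11-16 of the original buffer become free.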
I am implementing a clipboard, and I want to allocate memory for a PNG image one time. Is there some way to predict the maximum size of a PNG file?
A PNG image includes several things:
Signature and basic metadata (image size and type)
Palette (only if the image is indexed)
Raw pixel data
Optional metadata (ancillary chunks)
End of image chunk
Size of item 1 is fixed: 8 + 12 + 13 = 33 bytes (8-byte signature plus the IHDR chunk: 12 bytes of chunk overhead and 13 bytes of data)
Size of item 2 (if required) is at most 12 + 3 * 256 = 780 bytes
Size of item 5 is fixed: 12 bytes
Item 3, raw pixel data, is usually the most important one. The filtered-uncompressed data amounts to
FUD = (W*C*BPC/8 + 1) * H bytes
where W = width in pixels, H = height in pixels, C = channels (3 if RGB, 1 if palette or grayscale, 4 if RGBA, 2 if grayscale+alpha), BPC = bits per channel (normally 8). The +1 accounts for the filter-type byte at the start of each row.
That is compressed with zlib. It's hard to bound the worst-case compression ratio precisely, but in practice one may assume that in the worst case the compressed stream is only a few bytes larger than the original (zlib's stored blocks add roughly 5 bytes per 64 KB block, plus a 6-byte header/trailer).
Then the item 3 size would be approximately bounded (again assuming a fairly small IDAT chunk size of 8192 bytes, each IDAT chunk adding 12 bytes of overhead) by
(FUD + 6)(1 + 12/8192) ~ FUD
Item 4 (ancillary chunk data) is practically impossible to bound.
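Putting the fixed parts and the FUD formula together, a conservative allocation can be computed up front. `png_max_size` is a hypothetical helper name; the 8192-byte IDAT chunking and the inclusion of a full 780-byte palette are assumptions carried over from the accounting above, and ancillary chunks are ignored.

```c
#include <stdint.h>

#define IDAT_CHUNK 8192  /* assumed IDAT payload size, not a PNG requirement */

/* Rough upper bound, in bytes, for a whole PNG file:
 * 33 (signature + IHDR) + 780 (worst-case PLTE) + IDAT data + 12 (IEND). */
static uint64_t png_max_size(uint64_t w, uint64_t h,
                             uint64_t channels, uint64_t bpc)
{
    uint64_t fud  = (w * channels * bpc / 8 + 1) * h;   /* filtered raw data */
    uint64_t zlib = fud + 6;                            /* header + trailer  */
    uint64_t idat = zlib + 12 * ((zlib + IDAT_CHUNK - 1) / IDAT_CHUNK);
    return 33 + 780 + idat + 12;
}
```

For a 100x100 RGBA8 image this gives about 40 KB, only a few hundred bytes above the raw filtered size.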
I need to implement a compression algorithm for binary data that needs to work on embedded, constrained devices (256 KB ROM, 48 KB RAM).
I'm thinking of the RLE compression algorithm. Rather than implementing it from scratch, I've found a lot of C implementations (for example: http://sourceforge.net/projects/bcl/?source=typ_redirect ), but they apply the RLE algorithm over the byte sequence (the dictionary words are 1 to 255, that is, 8-bit encoding).
I'm looking for an implementation that, starting from a sequence of bytes, applies RLE encoding over the corresponding bit sequence (0s and 1s). Note that another algorithm could also work (I need a compression ratio < 0.9, so I think any algorithm can achieve it), but the implementation needs to work on a bit basis, not bytes.
Can anyone help me? Thank you!
I think you can efficiently encode data made of bytes such as 0, 1, 2, 3, 255, etc. (i.e. with lots of consecutive 0 and 1 bits).
Let's encode this bit sequence:
000000011111110
1. Shift bits in and increment a counter while each bit equals the previous one.
2. When the run ends, write a control bit 1, then the run length (here 3 bits, e.g. 111), then the repeated bit, to the output buffer.
3. If a run is too short to be worth packing, write a control bit 0 followed by the literal bit.
Output:
111101110100
To decompress, read the first control bit:
If 0, copy the next bit to the output buffer.
If 1, read 3 bits (the length field can be any fixed width) and convert them to a number; then read the next bit, which tells you which bit value to repeat, and loop to reproduce the original run.
But this compression method will only work well on files that have lots of 0 and 255 bytes (00000000 and 11111111 in binary), such as BMP files with a black or white background.
Hope I helped you!
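The scheme above is loosely specified, so here is a minimal C sketch of one possible bit-level RLE variant. The exact field layout is my own choice, not necessarily the one in the answer: runs of 2-8 identical bits become `1 <length-1 as 3 bits> <bit>`, and a lone bit becomes `0 <bit>`. Output buffers must be zero-initialized because the bit writer only ORs bits in.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal bit writer/reader over a byte buffer (MSB first). */
typedef struct { uint8_t *buf; size_t bits; } bitio;

static void put_bit(bitio *b, int v)
{
    if (v) b->buf[b->bits / 8] |= (uint8_t)(0x80 >> (b->bits % 8));
    b->bits++;
}
static int get_bit(bitio *b)
{
    int v = (b->buf[b->bits / 8] >> (7 - b->bits % 8)) & 1;
    b->bits++;
    return v;
}

/* Encode nbits of input. Returns the number of output bits written. */
static size_t rle_encode(bitio *in, size_t nbits, bitio *out)
{
    size_t i = 0;
    while (i < nbits) {
        int bit = get_bit(in);
        size_t run = 1;
        while (run < 8 && i + run < nbits) {   /* measure the run, max 8 */
            if (get_bit(in) != bit) { in->bits--; break; }
            run++;
        }
        if (run >= 2) {                        /* 1 + 3-bit (len-1) + bit */
            put_bit(out, 1);
            for (int k = 2; k >= 0; k--)
                put_bit(out, (int)((run - 1) >> k) & 1);
            put_bit(out, bit);
        } else {                               /* 0 + literal bit         */
            put_bit(out, 0);
            put_bit(out, bit);
        }
        i += run;
    }
    return out->bits;
}

/* Decode nbits of encoded input. Returns the number of bits produced. */
static size_t rle_decode(bitio *in, size_t nbits, bitio *out)
{
    while (in->bits < nbits) {
        if (get_bit(in)) {
            size_t run = 0;
            for (int k = 0; k < 3; k++) run = run * 2 + (size_t)get_bit(in);
            run++;
            int bit = get_bit(in);
            while (run--) put_bit(out, bit);
        } else {
            put_bit(out, get_bit(in));
        }
    }
    return out->bits;
}
```

On the 15-bit example above (7 zeros, 7 ones, 1 zero) this variant emits 12 bits, a ratio of 0.8; it only shrinks input whose average run length exceeds the 5-bit cost of a run token.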
I'm sending base64 strings from the server to clients, with coordinates of some objects encoded inside. There are ~20 objects, each with an x;y pair, 2 bytes per integer. These coordinates change over time, but some of them keep the same values for about 2-3 consecutive sends.
Is there a way to calculate the difference and send that instead of the full base64 string each time? Network traffic is very important here.
Here is the example of 2 strings made with 100ms pause:
AFg7IP+SAAJg/ana/zAA52BJO/D/9wAxIFkAIABIABQBSADtAFEAMGlLctX/
AFo7IP+SAAJgAKnb/0EA6GBJO/D//wA0IFkAIABIABQBSADtAEoAYmlLctX/
First, pack the data efficiently into bytes, then encode if necessary. As @twall says, you should try hard to eliminate the base64, because it expands the size of the data by at least 33%. Here is one way to pack it, if there are exactly 20 x,y pairs:
Bytes 1-3: bitset. Each bit represents an x,y pair. If set, there is an updated value for that pair in this message. The last 4 bits of the 3rd byte are unused.
Bytes 4&5: x coord of first updated point
Bytes 6&7: y coord of first updated point
... up to 19 more points
Max of 83 bytes (3 + 20*4, all coords updated), min of 3 bytes (no coords updated)
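The layout above can be sketched in a few lines of C. `delta_pack` and the `point` struct are illustrative names of mine; big-endian byte order for the coordinates is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NPOINTS 20

typedef struct { uint16_t x, y; } point;

/* Build a delta message: a 3-byte bitset (bit i set -> point i changed),
 * followed by the changed pairs, big-endian. Returns the length in bytes. */
static size_t delta_pack(const point *prev, const point *cur, uint8_t *msg)
{
    size_t len = 3;
    memset(msg, 0, 3);
    for (int i = 0; i < NPOINTS; i++) {
        if (cur[i].x != prev[i].x || cur[i].y != prev[i].y) {
            msg[i / 8] |= (uint8_t)(1u << (i % 8));
            msg[len++] = (uint8_t)(cur[i].x >> 8);
            msg[len++] = (uint8_t)(cur[i].x & 0xFF);
            msg[len++] = (uint8_t)(cur[i].y >> 8);
            msg[len++] = (uint8_t)(cur[i].y & 0xFF);
        }
    }
    return len;
}
```

The receiver reads the 3-byte bitset first, then pops 4 bytes per set bit and leaves the other points unchanged.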
I'm trying to understand building a bmp based on raw data in C++, and I have a few questions.
My bmp can be black and white, so I figured that in the bits-per-pixel field I should go with 1. However, in a lot of guides I see that the padding field adds bits to keep 32-bit alignment, which would seem to make my bmp the same file size as a 24-bit-per-pixel bmp.
Is this understanding correct, or is a 1-bit-per-pixel bmp in fact smaller than 24-bit, 32-bit, etc.?
Thanks
Monochrome bitmaps are aligned too, but they do not take as much space as 24/32-bpp ones.
A row of a 5-pixel-wide 24-bit bitmap takes 16 bytes: 5*3 = 15 for pixels, plus 1 byte of padding.
A row of a 5-pixel-wide 32-bit bitmap takes 20 bytes: 5*4 = 20 for pixels, no padding needed.
A row of a 5-pixel-wide monochrome bitmap takes 4 bytes: 1 byte for pixels (it is not possible to use less than a whole byte, so the whole byte is taken even though 3 of its 8 bits are unused), plus 3 bytes of padding.
So a monochrome bitmap will of course be smaller than a 24-bit one.
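The row sizes above all come from one formula: round the row's bit count up to a multiple of 32 bits. A small helper (my own name, not a Windows API) makes this concrete:

```c
#include <stdint.h>

/* BMP rows are padded to a multiple of 4 bytes (DWORD alignment):
 * stride = ceil(width_px * bpp / 32) * 4 bytes. */
static uint32_t bmp_stride(uint32_t width_px, uint32_t bpp)
{
    return ((width_px * bpp + 31) / 32) * 4;
}
```

This reproduces the three cases in the answer: 16, 20, and 4 bytes for a 5-pixel row at 24, 32, and 1 bpp respectively.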
The answer is already given above (bitmap rows are aligned/padded to a 32-bit boundary), but if you want more information, you might want to read DIBs and Their Uses, the "DIB Header" section - it explains this in detail.
Every scanline is DWORD-aligned. The scanline is buffered to alignment; the buffering is not necessarily 0.
The scanlines are stored upside down, with the first scan (scan 0) in memory being the bottommost scan in the image. This is another artifact of Presentation Manager compatibility. GDI automatically inverts the image during the Set and Get operations. (Figure 1 in the linked article shows the memory and screen representations.)
If I load an image such as 98x*** with 3 bytes per pixel, 2 bytes of padding are added to each row to make it fit in 4-byte sequences.
Is it possible to use IMG_Load() without the padding bytes being generated in the ->pixels raw data?
At the moment I use this to detect how many bytes of padding there are:
int pad = img->pitch - (img->w * img->format->BytesPerPixel);
And if it's > 0, I rebuild a new image without the padding bytes... but this is inefficient, so I'm hoping there's a better fix?
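For what it's worth, the rebuild step doesn't need a whole new SDL surface: a row-by-row copy into a tightly packed buffer is enough. This sketch is SDL-agnostic (pass `img->pixels`, `img->pitch`, `img->w`, `img->h`, and `BytesPerPixel` as the arguments); `unpad_rows` is an illustrative name of mine.

```c
#include <stdint.h>
#include <string.h>

/* Copy pixel rows from a padded buffer (row stride = pitch bytes) into a
 * tightly packed one (row stride = w * bpp bytes), dropping the padding. */
static void unpad_rows(const uint8_t *src, int pitch,
                       uint8_t *dst, int w, int h, int bpp)
{
    int row_bytes = w * bpp;
    for (int y = 0; y < h; y++)
        memcpy(dst + (size_t)y * row_bytes,
               src + (size_t)y * pitch,
               (size_t)row_bytes);
}
```

One `memcpy` per row is about as cheap as the copy can get; the padding exists in the surface because the library aligns rows, so some copy is needed if a tightly packed buffer is required.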