LZW Decompression - c++

I am implementing an LZW algorithm in C++.
The dictionary size is a user input, but the minimum is 256, so it should work with binary files. When the dictionary is full, it wraps around to index 0 and overwrites entries from there upward.
For example, if I compress the Alice in Wonderland script with a dictionary size of 512, I get this dictionary.
But I have a problem with decompression: the dictionary produced while decompressing the compressed file looks like this.
My decompression code looks like this:
struct dictionary
vector<unsigned char> entry;
vector<bool> bits;
void decompress(dictionary dict[], vector<bool> file, int dictionarySize, int numberOfBits)
//in this example
//dictionarySize = 512, the maximum number of entries; next wraps back to 0 once it goes past index 511
//numberOfBits = log2(512) = 9
//dictionary dict[] contains bits and strings (strings can be empty)
// dict[0] =
// entry = (unsigned char)0
// bits = (if numberOfBits = 9) 000000000
// dict[255] =
// entry = (unsigned char)255
// bits = (if numberOfBits = 9) 011111111
// so the next entry will be dict[next] (next is currently 256)
// dict[256] =
// entry = what gets added in the code below
// bits = 100000000
// all the bits are already set previously (dictionary size is int dictionarySize) so in this case all the bits from 0 to 511 are already set, entries are set from 0 to 255, so extended ASCII
vector<bool> currentCode;
vector<unsigned char> currentString;
vector<unsigned char> temp;
int next=256;
bool found=false;
for(int i=0;i<file.size();i+=numberOfBits)
for(int j=0;j<numberOfBits;j++)
for(int j=0;j<dictionarySize;j++)
// when the currentCode (size numberOfBits) gets found in the dictionary
currentString = dict[j].entry;
// if the current string isn't empty, then it means it found the character in the dictionary
found = true;
//if the currentCode in the dictionary has a string value attached to it
for(int j=0;j<currentString.size();j++)
// so it doesnt just push 1 character into the dictionary
// example, if first read character is 'r', it is already in the dictionary so it doesnt get added
// if next is more than 511, writing to that index would cause an error, so it resets back to 0 and goes back up
if(next>dictionarySize-1) //next > 512-1
next = 0;
dict[next].entry = temp;
//temp = currentString;
currentString = temp;
for(int j=0;j<currentString.size();j++)
// if next is more than 511, writing to that index would cause an error, so it resets back to 0 and goes back up
next = 0;
dict[next].entry = currentString;
temp = currentString;
// currentCode gets cleared, and written into in the next iteration
found = false;
I am currently stuck and don't know what to change to fix the output.
I have also noticed that if I choose a dictionary big enough that it never wraps around (it never reaches the end and restarts at 0), decompression works.

Start small
You are using files, which is too much data to debug. Start small with strings. I took this nice example from Wiki:
Input: "abacdacacadaad"
step  input           match  output  new_entry  new_index
                                     a          0
                                     b          1
                                     c          2
                                     d          3
1     abacdacacadaad  a      0       ab         4
2     bacdacacadaad   b      1       ba         5
3     acdacacadaad    a      0       ac         6
4     cdacacadaad     c      2       cd         7
5     dacacadaad      d      3       da         8
6     acacadaad       ac     6       aca        9
7     acadaad         aca    9       acad       10
8     daad            da     8       daa        11
9     ad              a      0       ad         12
10    d               d      3
Output: "0102369803"
So you can debug your code step by step with cross matching both input/output and dictionary contents. Once that is done correctly then you can do the same for decoding:
Input: "0102369803"
step  input  output  new_entry  new_index
                     a          0
                     b          1
                     c          2
                     d          3
1     0      a
2     1      b       ab         4
3     0      a       ba         5
4     2      c       ac         6
5     3      d       cd         7
6     6      ac      da         8
7     9      aca     aca        9
8     8      da      acad       10
9     0      a       daa        11
10    3      d       ad         12
Output: "abacdacacadaad"
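The two tables can be reproduced with a few lines of code, which gives you something to step through in a debugger. The following is only a minimal sketch under my own assumptions: the function names `lzwEncodeSmall` and `lzwDecodeSmall` are hypothetical, the dictionary is unbounded (no wrap-around or clearing), and codes are kept as plain integers instead of bits.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Encode `input`, starting from one dictionary entry per character of
// `alphabet` (entry i holds alphabet[i]). No size limit, no clearing.
std::vector<int> lzwEncodeSmall(const std::string& input, const std::string& alphabet) {
    std::map<std::string, int> dict;
    for (std::size_t i = 0; i < alphabet.size(); ++i)
        dict[std::string(1, alphabet[i])] = static_cast<int>(i);

    std::vector<int> codes;
    std::string match;                          // longest dictionary match so far
    for (char c : input) {
        assert(dict.count(std::string(1, c)) && "character outside the alphabet");
        const std::string extended = match + c;
        if (dict.count(extended)) { match = extended; continue; }
        codes.push_back(dict[match]);           // "output" column
        const int newIndex = static_cast<int>(dict.size());
        dict[extended] = newIndex;              // "new_entry" / "new_index" columns
        match = std::string(1, c);
    }
    if (!match.empty()) codes.push_back(dict[match]);
    return codes;
}

// Decode a code sequence produced by lzwEncodeSmall with the same alphabet.
std::string lzwDecodeSmall(const std::vector<int>& codes, const std::string& alphabet) {
    std::vector<std::string> dict;
    for (char c : alphabet) dict.push_back(std::string(1, c));

    std::string out, prev;
    for (std::size_t k = 0; k < codes.size(); ++k) {
        const std::size_t code = static_cast<std::size_t>(codes[k]);
        std::string entry;
        if (code < dict.size()) {
            entry = dict[code];
        } else {
            // Code not in the dictionary yet (step 7 of the decoding table):
            // it can only be the entry that is about to be created.
            assert(k > 0 && code == dict.size());
            entry = prev + prev[0];
        }
        if (k > 0) dict.push_back(prev + entry[0]);   // decoder lags one step behind
        out += entry;
        prev = entry;
    }
    return out;
}
```

With the alphabet "abcd", encoding "abacdacacadaad" should give the codes 0 1 0 2 3 6 9 8 0 3, and decoding those codes should give the original string back.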
Only then move on to files and to clearing the dictionary.
Once LZW works on the small alphabet, you can try the full alphabet and bit encoding. An LZW stream can be encoded at any bit length (not just 8/16/32/64 bits), which can greatly affect the compression ratio (depending on the properties of the data). So I would implement universal access to the data at a variable (or predefined) bit length.
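To illustrate what access at an arbitrary bit length can look like, here is a small sketch that packs fixed-width codes into bytes and unpacks them again. The names `packCodes` and `unpackCodes` are hypothetical, the bit order (most significant bit first) is my assumption, and the number of codes is passed to the reader explicitly because the zero padding in the last byte could otherwise be mistaken for an extra code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack each code into `bits` bits, MSB first; the last byte is zero padded.
std::vector<std::uint8_t> packCodes(const std::vector<std::uint32_t>& codes, int bits) {
    assert(bits >= 1 && bits <= 32);
    const std::uint64_t mask = (std::uint64_t{1} << bits) - 1;
    std::vector<std::uint8_t> out;
    std::uint64_t acc = 0;   // pending bits live in the low `filled` bits
    int filled = 0;
    for (std::uint32_t code : codes) {
        acc = (acc << bits) | (code & mask);
        filled += bits;
        while (filled >= 8) {
            out.push_back(static_cast<std::uint8_t>((acc >> (filled - 8)) & 0xFF));
            filled -= 8;
        }
        acc &= (std::uint64_t{1} << filled) - 1;   // drop bits already written
    }
    if (filled > 0)
        out.push_back(static_cast<std::uint8_t>((acc << (8 - filled)) & 0xFF));
    return out;
}

// Read back `count` codes of `bits` bits each (fewer if the data runs out).
std::vector<std::uint32_t> unpackCodes(const std::vector<std::uint8_t>& bytes,
                                       int bits, std::size_t count) {
    assert(bits >= 1 && bits <= 32);
    const std::uint64_t mask = (std::uint64_t{1} << bits) - 1;
    std::vector<std::uint32_t> codes;
    std::uint64_t acc = 0;
    int filled = 0;
    std::size_t pos = 0;
    while (codes.size() < count) {
        while (filled < bits && pos < bytes.size()) {
            acc = (acc << 8) | bytes[pos++];
            filled += 8;
        }
        if (filled < bits) break;                  // not enough data left
        codes.push_back(static_cast<std::uint32_t>((acc >> (filled - bits)) & mask));
        filled -= bits;
        acc &= (std::uint64_t{1} << filled) - 1;
    }
    return codes;
}
```

With 4-bit codes, the sequence 0 1 0 2 3 6 9 8 0 3 packs into the bytes 01 02 36 98 03, which is a convenient case to check by hand.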
I was a bit curious, so I coded a simple C++/VCL example of the compression:
// LZW
const int LZW_bits=12; // encoded bitstream size
const int LZW_size=1<<LZW_bits; // dictionary size
// bitstream R/W
DWORD bitstream_tmp=0;
// return LZW_bits from dat[adr,bit] and increment position (adr,bit)
DWORD bitstream_read(BYTE *dat,int siz,int &adr,int &bit,int bits)
DWORD a=0,m=(1<<bits)-1;
// save tmp if enough bits
if (bit>=bits){ a=(bitstream_tmp>>(bit-bits))&m; bit-=bits; return a; }
for (;;)
// insert byte
adr++; bit+=8;
// save tmp if enough bits
if (bit>=bits){ a=(bitstream_tmp>>(bit-bits))&m; bit-=bits; return a; }
// end of data
if (adr>=siz) return 0;
// write LZW_bits from a to dat[adr,bit] and increment position (adr,bit)
// return true if buffer is full
bool bitstream_write(BYTE *dat,int siz,int &adr,int &bit,int bits,DWORD a)
a<<=32-bits; // align to MSB
// save tmp if aligned
if ((adr<siz)&&(bit==32)){ dat[adr]=(bitstream_tmp>>24)&255; adr++; bit-=8; }
if ((adr<siz)&&(bit==24)){ dat[adr]=(bitstream_tmp>>16)&255; adr++; bit-=8; }
if ((adr<siz)&&(bit==16)){ dat[adr]=(bitstream_tmp>> 8)&255; adr++; bit-=8; }
if ((adr<siz)&&(bit== 8)){ dat[adr]=(bitstream_tmp )&255; adr++; bit-=8; }
// process all bits of a
for (;bits;bits--)
// insert bit
a<<=1; bit++;
// save tmp if aligned
if ((adr<siz)&&(bit==32)){ dat[adr]=(bitstream_tmp>>24)&255; adr++; bit-=8; }
if ((adr<siz)&&(bit==24)){ dat[adr]=(bitstream_tmp>>16)&255; adr++; bit-=8; }
if ((adr<siz)&&(bit==16)){ dat[adr]=(bitstream_tmp>> 8)&255; adr++; bit-=8; }
if ((adr<siz)&&(bit== 8)){ dat[adr]=(bitstream_tmp )&255; adr++; bit-=8; }
return (adr>=siz);
bool str_compare(char *s0,int l0,char *s1,int l1)
if (l1<l0) return false;
for (;l0;l0--,s0++,s1++)
if (*s0!=*s1) return false;
return true;
AnsiString LZW_encode(AnsiString raw)
AnsiString lzw="";
int i,j,k,l;
int adr,bit;
const int siz=32; // bitstream buffer
BYTE buf[siz];
AnsiString dict[LZW_size]; // dictionary
int dicts=0; // actual size of dictionary
// init dictionary
for (dicts=0;dicts<256;dicts++) dict[dicts]=char(dicts); // full 8bit binary alphabet
// for (dicts=0;dicts<4;dicts++) dict[dicts]=char('a'+dicts); // test alphabet "a,b,c,d"
adr=0; bit=0;
for (i=0;i<l;)
// find match in dictionary
for (j=dicts-1;j>=0;j--)
if (str_compare(dict[j].c_str(),dict[j].Length(),raw.c_str()+i,l-i))
if (i<l) // add new entry in dictionary (if not end of input)
// clear dictionary if full
if (dicts>=LZW_size) dicts=256; // full 8bit binary alphabet
// if (dicts>=LZW_size) dicts=4; // test alphabet "a,b,c,d"
dict[dicts]=dict[j]+AnsiString(raw[i+1]); // AnsiString index starts from 1 hence the +1
a=j; j=-1; break; // full binary output
// a='0'+j; j=-1; break; // test ASCII output
// store result to bitstream
if (bitstream_write(buf,siz,adr,bit,LZW_bits,a))
// append buf to lzw
for (j=0;j<adr;j++) lzw[j+k+1]=buf[j];
// reset buf
if (bit)
// store the remainding bits with zeropad
if (adr)
// append buf to lzw
for (j=0;j<adr;j++) lzw[j+k+1]=buf[j];
return lzw;
AnsiString LZW_decode(AnsiString lzw)
AnsiString raw="";
int adr,bit,siz,ix;
AnsiString dict[LZW_size]; // dictionary
int dicts=0; // actual size of dictionary
// init dictionary
for (dicts=0;dicts<256;dicts++) dict[dicts]=char(dicts); // full 8bit binary alphabet
// for (dicts=0;dicts<4;dicts++) dict[dicts]=char('a'+dicts); // test alphabet "a,b,c,d"
adr=0; bit=0; ix=-1;
for (adr=0;(adr<siz)||(bit>=LZW_bits);)
// a-='0'; // test ASCII input
// clear dictionary if full
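if (dicts>=LZW_size){ dicts=256; ix=-1; } // full 8bit binary alphabet
// if (dicts>=LZW_size){ dicts=4; ix=-1; } // test alphabet "a,b,c,d"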
// new dictionary entry
if (ix>=0)
if (a>=dicts){ dict[dicts]=dict[ix]+AnsiString(dict[ix][1]); dicts++; }
else { dict[dicts]=dict[ix]+AnsiString(dict[a ][1]); dicts++; }
} ix=a;
// update decoded output
return raw;
and output using // test ASCII input lines:
where AnsiString is the only VCL feature I used. It is just a self-allocating string type; beware that its indexes start at 1.
AnsiString s;
s[5] // character access (1 is first character)
s.Length() // returns size
s.c_str() // returns char*
s.SetLength(size) // resize
So just use any string type you have ...
In case you do not have BYTE and DWORD, use unsigned char and unsigned int instead ...
It looks like it works for long texts too (bigger than the dictionary and/or the bitstream buffer). However, beware that the clearing can be done in a few different places in the code, but it must be synchronized between the encoder and the decoder; otherwise the data decoded after a clear will be corrupted.
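To make the synchronization point concrete, here is a portable sketch of an encoder/decoder pair that clear the dictionary at matching moments. It is not the VCL code from this answer: it uses std::string, keeps codes as integers, and the names `lzwEncodeWithClear`, `lzwDecodeWithClear` and the `maxSize` parameter are hypothetical. The rule assumed here is "clear right before adding an entry when the dictionary already holds `maxSize` entries"; the decoder applies the same rule before it looks up each code, because it creates every entry one step later than the encoder.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Encoder: a full dictionary is cleared back to the 256 single-byte entries
// just before the next entry is added.
std::vector<int> lzwEncodeWithClear(const std::string& input, int maxSize) {
    assert(maxSize > 256);
    std::unordered_map<std::string, int> dict;
    auto clear = [&dict]() {
        dict.clear();
        for (int i = 0; i < 256; ++i) dict[std::string(1, static_cast<char>(i))] = i;
    };
    clear();
    std::vector<int> codes;
    std::string match;
    for (char c : input) {
        const std::string extended = match + c;
        if (dict.count(extended)) { match = extended; continue; }
        codes.push_back(dict[match]);
        if (static_cast<int>(dict.size()) >= maxSize) clear();   // clearing point
        const int newIndex = static_cast<int>(dict.size());
        dict[extended] = newIndex;
        match = std::string(1, c);
    }
    if (!match.empty()) codes.push_back(dict[match]);
    return codes;
}

// Decoder: mirrors the encoder's clearing point before looking up each code.
std::string lzwDecodeWithClear(const std::vector<int>& codes, int maxSize) {
    assert(maxSize > 256);
    std::vector<std::string> dict;
    auto clear = [&dict]() {
        dict.clear();
        for (int i = 0; i < 256; ++i) dict.push_back(std::string(1, static_cast<char>(i)));
    };
    clear();
    std::string out, prev;
    for (std::size_t k = 0; k < codes.size(); ++k) {
        const std::size_t code = static_cast<std::size_t>(codes[k]);
        if (k == 0) { prev = dict.at(code); out = prev; continue; }
        if (static_cast<int>(dict.size()) >= maxSize) clear();   // same clearing point
        std::string entry;
        if (code < dict.size()) {
            entry = dict[code];
        } else {
            assert(code == dict.size());   // entry that is about to be created
            entry = prev + prev[0];
        }
        dict.push_back(prev + entry[0]);
        out += entry;
        prev = entry;
    }
    return out;
}
```

A round trip with a deliberately tiny `maxSize` (for example 260) forces many clears on a short input, which makes a desynchronized pair fail quickly. The same idea should carry over to a wrap-around scheme: whatever the encoder overwrites, the decoder has to overwrite at the same step.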
The example can use either just the "a,b,c,d" alphabet or the full 8bit one. Currently it is set for 8bit. If you want to change it, just un-rem the // test ASCII input lines and rem out the // full 8bit binary alphabet lines in the code.
To test crossing buffers and boundary you can play with:
const int LZW_bits=12; // encoded bitstream size
const int LZW_size=1<<LZW_bits; // dictionary size
and also with:
const int siz=32; // bitstream buffer
constants... They also affect performance, so tweak them to your liking.
Beware that bitstream_write is not optimized and can be sped up considerably ...
Also in order to debug 4bit aligned coding I am using hex print of encoded data (hex string is twice as long as its ASCII version) like this (ignore the VCL stuff):
AnsiString txt="abacdacacadaadddddddaaaaaaaabcccddaaaaaaaaa",enc,dec,hex;
// convert to hex
hex=""; for (int i=1,l=enc.Length();i<=l;i++) hex+=AnsiString().sprintf("%02X",enc[i]);
mm_log->Lines->Add(AnsiString().sprintf("ratio: %i%%",(100*enc.Length()/dec.Length())));
and result:
ratio: 81%


Arduino, ambilight main loop for displaying LEDs and error handling

I have built an ambilight on an Arduino and now I'm trying to figure out how it works. This is the main loop of the program, which drives the LEDs.
Can somebody tell me what the first loop does (what is the magic word?), and what the Hi, Lo, Checksum part and the "If checksum does not match go back to wait" part do?
void loop() {
// Wait for first byte of Magic Word
for(i = 0; i < sizeof prefix; ++i) {
waitLoop: while (!Serial.available()) ;;
// Check next byte in Magic Word
if(prefix[i] == Serial.read()) continue;
// otherwise, start over
i = 0;
goto waitLoop;
// Hi, Lo, Checksum
while (!Serial.available()) ;;
while (!Serial.available()) ;;
while (!Serial.available()) ;;
// If checksum does not match go back to wait
if (chk != (hi ^ lo ^ 0x55)) {
goto waitLoop;
memset(leds, 0, NUM_LEDS * sizeof(struct CRGB));
// Read the transmission data and set LED values
for (uint8_t i = 0; i < NUM_LEDS; i++) {
byte r, g, b;
r = Serial.read();
g = Serial.read();
b = Serial.read();
leds[i].r = r;
leds[i].g = g;
leds[i].b = b;
// Shows new values
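This code decodes what is generally called the "Adalight protocol". Each frame starts with a 3-byte prefix, the "magic word" {'A', 'd', 'a'} ("Ada"), followed by a uint16_t value in big-endian format (the Hi and Lo bytes) that holds the number of LEDs - 1, followed by a one-byte checksum, which the code computes as hi ^ lo ^ 0x55. The LED data follows, 3 bytes per LED in the order R, G, B (where 0 = off and 255 = max brightness). The first loop waits until the three prefix bytes arrive in sequence and starts over on any mismatch; if the checksum does not match, the header is rejected and the code goes back to waiting for the prefix.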
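The header check can be written without goto as a small function over a buffer. This is only a sketch in plain C++ (no Arduino Serial calls): `parseAdalightHeader` is a hypothetical name, and it assumes the six header bytes have already been collected into `buf`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Validates a 6-byte header: 'A' 'd' 'a' hi lo chk, with chk == hi ^ lo ^ 0x55.
// On success stores the LED count (transmitted value + 1) in ledCount.
bool parseAdalightHeader(const std::uint8_t* buf, std::size_t len, std::uint32_t& ledCount) {
    assert(buf != nullptr);
    static const std::uint8_t prefix[] = {'A', 'd', 'a'};   // the magic word
    if (len < 6) return false;
    for (std::size_t i = 0; i < sizeof prefix; ++i)
        if (buf[i] != prefix[i]) return false;
    const std::uint8_t hi = buf[3], lo = buf[4], chk = buf[5];
    if (chk != static_cast<std::uint8_t>(hi ^ lo ^ 0x55)) return false;
    ledCount = ((static_cast<std::uint32_t>(hi) << 8) | lo) + 1u;   // big endian
    return true;
}
```

For example, the header bytes 'A' 'd' 'a' 0x00 0x09 0x5C pass the check (0x00 ^ 0x09 ^ 0x55 is 0x5C) and describe 10 LEDs.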
By the way, wherever you copied this code from, it is not well written. You can find better implementations online.

Parsing Message with Varying Fields

I have a byte stream that represents a message in my application. There are 5 fields in the message for demonstration. The first byte in the stream indicates which message fields are present for the current stream. For instance 0x2 in the byte-0 means only the Field-1 is present for the current stream.
The mask field can take 2^5=32 different values. To parse messages of varying width, I wrote the example structure and parser below. My question is: is there any other way to parse such dynamically changing fields? If the message had 64 fields, I would have to write 64 cases, which is cumbersome.
#include <iostream>
typedef struct
uint8_t iDummy0;
int iDummy1;
}__attribute__((packed, aligned(1)))Field4;
typedef struct
int iField0;
uint8_t ui8Field1;
short i16Field2;
long long i64Field3;
Field4 stField4;
}__attribute__((packed, aligned(1)))MessageStream;
char* constructIncomingMessage()
char* cpStream = new char[1+sizeof(MessageStream)]; // Demonstrative message byte array
// 1 byte for Mask, 20 bytes for messageStream
cpStream[0] = 0x1F; // the 0-th byte is a mask marking
// which fields are present for the messageStream
// all 5 fields are present for the example
return cpStream;
void deleteMessage( char* cpMessage)
delete[] cpMessage;
int main() {
MessageStream messageStream; // Local storage for messageStream
uint8_t ui8FieldMask; // Mask to indicate which fields of messageStream
// are present for the current incoming message
const uint8_t ui8BitIsolator = 0x01;
uint8_t ui8FieldPresent; // ANDed result of Mask and Isolator
std::size_t szParsedByteCount = 0; // Total number of parsed bytes
const std::size_t szMaxMessageFieldCount = 5; // There can be maximum 5 fields in
// the messageStream
char* cpMessageStream = constructIncomingMessage();
ui8FieldMask = (uint8_t)cpMessageStream[0];
szParsedByteCount += 1;
for(std::size_t i = 0; i<szMaxMessageFieldCount; ++i)
ui8FieldPresent = ui8FieldMask & ui8BitIsolator;
case 0:
memcpy(&messageStream.iField0, cpMessageStream+szParsedByteCount, sizeof(messageStream.iField0));
szParsedByteCount += sizeof(messageStream.iField0);
case 1:
memcpy(&messageStream.ui8Field1, cpMessageStream+szParsedByteCount, sizeof(messageStream.ui8Field1));
szParsedByteCount += sizeof(messageStream.ui8Field1);
case 2:
memcpy(&messageStream.i16Field2, cpMessageStream+szParsedByteCount, sizeof(messageStream.i16Field2));
szParsedByteCount += sizeof(messageStream.i16Field2);
case 3:
memcpy(&messageStream.i64Field3, cpMessageStream+szParsedByteCount, sizeof(messageStream.i64Field3));
szParsedByteCount += sizeof(messageStream.i64Field3);
case 4:
memcpy(&messageStream.stField4, cpMessageStream+szParsedByteCount, sizeof(messageStream.stField4));
szParsedByteCount += sizeof(messageStream.stField4);
std::cerr << "Undefined Message field number: " << i << '\n';
ui8FieldMask >>= 1; // shift the mask
deleteMessage(cpMessageStream);
return 0;
The first thing I'd change is to drop the __attribute__((packed, aligned(1))) on Field4. This is a hack to create structures which mirror a packed wire-format, but that's not the format you're dealing with anyway.
Next, I'd make MessageStream a std::tuple of std::optional<T> fields.
You now know that there are std::tuple_size<MessageStream> possible bits in the mask. Obviously you can't fit 64 bits in a ui8FieldMask but I'll assume that's a trivial problem to solve.
You can write a for-loop from 0 to std::tuple_size<MessageStream> to extract the bits from ui8FieldMask to see which bits are set. The slight problem with that logic is that you'll need compile-time constants I for std::get<size_t I>(MessageStream), and a for-loop only gives you run-time variables.
Hence, you'll need a recursive template <size_t I> extract(char const*& cpMessageStream, MessageStream&), and of course a specialization extract<0>. In extract<I>, you can use typename std::tuple_element<I, MessageStream>::type to get the std::optional<T> at the I'th position in your MessageStream.
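As a rough illustration of this idea, here is a C++17 sketch. It swaps the recursive template for a fold expression over std::index_sequence, which serves the same purpose of turning the field index into a compile-time constant. The four-field `Message` type and the names `extractField`, `extractAll` and `parseMessage` are made up for the example, the fields are assumed to be stored in the host's byte order, and there is no bounds checking on the input buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <tuple>
#include <utility>

// Hypothetical message: every field is optional, bit I of the mask marks field I.
using Message = std::tuple<std::optional<std::int32_t>,
                           std::optional<std::uint8_t>,
                           std::optional<std::int16_t>,
                           std::optional<std::int64_t>>;
static_assert(std::tuple_size_v<Message> <= 8, "the mask is a single byte here");

// Copies field I out of the stream if its mask bit is set, advancing the cursor.
template <std::size_t I>
void extractField(std::uint8_t mask, const unsigned char*& cursor, Message& msg) {
    using Field = typename std::tuple_element_t<I, Message>::value_type;
    if (mask & (1u << I)) {
        Field value;
        std::memcpy(&value, cursor, sizeof value);
        cursor += sizeof value;
        std::get<I>(msg) = value;
    }
}

// Expands to extractField<0>(...), extractField<1>(...), ... in order.
template <std::size_t... Is>
void extractAll(std::uint8_t mask, const unsigned char*& cursor, Message& msg,
                std::index_sequence<Is...>) {
    (extractField<Is>(mask, cursor, msg), ...);
}

// data[0] is the mask; the present fields follow back to back.
Message parseMessage(const unsigned char* data) {
    assert(data != nullptr);
    Message msg;
    const std::uint8_t mask = data[0];
    const unsigned char* cursor = data + 1;
    extractAll(mask, cursor, msg, std::make_index_sequence<std::tuple_size_v<Message>>{});
    return msg;
}
```

Adding a field then means adding one element to the tuple; no new case has to be written. A wider mask type would be needed once there are more than 8 fields.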

Send 4bytes floating point value through tcp From Dart to C++

I am having trouble sending a 4-byte floating-point value through a TCP socket from a Dart client to a C++ server.
Below is the Flutter (Dart) code.
class DataPacket extends object{
String message = "some";
int ID = 1;
double x = 1.38;
String toString() {
String value = message;
value += getCharCodeStringFromInt(ID);
value += getCharCodeStringFromDouble(x);
return value;
Uint8List _getInt16LittleEndianBytes(int value) =>
Uint8List(2)..buffer.asByteData().setInt16(0, value, Endian.little);
String getCharCodeStringFromInt(int value){
Uint8List message_id_list = _getInt16LittleEndianBytes(value);
return getCharCodeStringFromUint8List(message_id_list);
String getCharCodeStringFromDouble(double value) {
// List<double> temp = List<double>() ;
// temp.add(value);
// Float32List floatlist = Float32List.fromList(temp);
// Uint8List list = Uint8List.view(floatlist.buffer);
// Uint8List list = binaryCodec.encode(value);
Uint8List list = Uint8List(4)..buffer.asByteData().setFloat32(0, value);
print("Uint8List from double : ${list}");
print("Uint8List length from double : ${list.length}");
print("CharCodeString from double Length :
return getCharCodeStringFromUint8List(list);
String getCharCodeStringFromUint8List(Uint8List list){
String charCodeString = "";
list.forEach((charCode) => charCodeString += String.fromCharCode(charCode));
return charCodeString;
//Some Class
void sendMessage(){
List<int> data = _socket.encoding.encode(DataPacket().toString());
I can parse the String and the int in C++ with memcpy,
but not the double value.
When I checked the contents and length of the byte data,
the Uint8List obtained from the double had grown in the socket's encoding method.
That is, the Uint8List from the double had length 4 before encoding,
but the returned List has length 7 after encoding,
so DataPacket().toString().length and data.length print different values.
I can't parse a 7-byte float in C++.
The commented-out lines are the approaches I tried.
Is there any way to do this?
Thank you.
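The problem isn't how you are encoding the float as bytes, but what you do next when you convert those bytes to a string. When the string is encoded for the socket, any char code above 127 becomes a multi-byte sequence if the encoding is UTF-8 (which I assume it is here), which would explain 4 bytes growing to 7.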
If you want to send 4 bytes through the socket (assuming you are referring to a dart:io Socket), just use the add method.
Socket s;
var floatValue = 1.38;
var bytes = Uint8List(4)
..buffer.asByteData().setFloat32(0, floatValue, Endian.little);
print(bytes); // prints [215, 163, 176, 63]
In this example, you start with the target byte array and then view it as ByteData to set one value. This is a good way to build up a struct of mixed types (ints, floats, etc.). If you just need to convert an array of floats to bytes, you can do it slightly more simply with:
var bytes = Float32List.fromList([floatValue]).buffer.asUint8List();
I'm not sure you really need to be doing anything with strings. Is what you really want:
Socket s;
var message = 'some';
var id = 1;
var x = 1.38;
s.add(ascii.encode(message)); // choose the appropriate codec: ascii, utf8
// rather than adding each in turn, you could also form a longer byte array
// of the 3 elements and add that
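On the C++ side, the four bytes written with Endian.little can then be turned back into a float. The following is only a sketch: `floatFromLittleEndian` is a hypothetical helper, and it assumes the server's float is a 32-bit IEEE 754 value. Assembling the bits with shifts keeps it independent of the server's own byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "this sketch assumes a 32-bit float");

// Rebuilds a float from 4 bytes sent in little-endian order.
float floatFromLittleEndian(const std::uint8_t* bytes) {
    assert(bytes != nullptr);
    const std::uint32_t bits = static_cast<std::uint32_t>(bytes[0])
                             | static_cast<std::uint32_t>(bytes[1]) << 8
                             | static_cast<std::uint32_t>(bytes[2]) << 16
                             | static_cast<std::uint32_t>(bytes[3]) << 24;
    float value;
    std::memcpy(&value, &bits, sizeof value);   // reinterpret the bit pattern
    return value;
}
```

Feeding it the bytes [215, 163, 176, 63] printed above should give back a value of about 1.38.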

Comparing an usart received uint8_t* data with a constant string

I'm working on an Arduino Due, trying to use DMA functions as I'm working on a project where speed is critical. I found the following function to receive through serial:
uint8_t DmaSerial::get(uint8_t* bytes, uint8_t length) {
// Disable receive PDC
// Wait for PDC disable to take effect
while (uart->UART_PTSR & UART_PTSR_RXTEN);
// Modulus needed if RNCR is zero and RPR counts to end of buffer
rx_tail = (uart->UART_RPR - (uint32_t)rx_buffer) % DMA_SERIAL_RX_BUFFER_LENGTH;
// Make sure RPR follows (actually only needed if RRP is counted to the end of buffer and RNCR is zero)
uart->UART_RPR = (uint32_t)rx_buffer + rx_tail;
// Update fill counter
// No bytes in buffer to retrieve
if (rx_count == 0) { uart->UART_PTCR = UART_PTCR_RXTEN; return 0; }
uint8_t i = 0;
while (length--) {
bytes[i++] = rx_buffer[rx_head];
// If buffer is wrapped, increment RNCR, else just increment the RCR
if (rx_tail > rx_head) { uart->UART_RNCR++; } else { uart->UART_RCR++; }
// Increment head and account for wrap around
rx_head = (rx_head + 1) % DMA_SERIAL_RX_BUFFER_LENGTH;
// Decrement counter keeping track of amount data in buffer
// Buffer is empty
if (rx_count == 0) { break; }
// Turn on receiver
return i;
So, as far as I understand, this function writes the received bytes through the bytes pointer, up to length bytes. So I'm calling it this way:
dma_serial1.get(data, 8);
without assigning its return value to a variable. I think the received data is stored in the uint8_t* data, but I might be wrong.
Finally, what I want to do is check whether the received data is a certain char in order to make decisions, like this:
if (data == "t"){
//do something//}
How could I make this work?
For comparing strings like intended by if (data == "t"), you'll need a string comparison function like, for example, strcmp. For this to work, you must ensure that the arguments are actually (0-terminated) C-strings:
uint8_t data[9];
uint8_t size = dma_serial1.get(data, 8);
data[size] = '\0'; // terminate: get() does not append the 0 byte
if (strcmp(data,"t")==0) {
Because data is a uint8_t* rather than a char*, it cannot be passed directly to the string functions in C++; a cast to const char* is needed:
if (strcmp(reinterpret_cast<const char*>(data),"t")==0) {
So a complete MVCE could look as follows:
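#include <cstdint>
#include <cstring>
#include <iostream>
using namespace std;
int get(uint8_t *data, int size) {
    data[0] = 't';
    return 1;
}
int main()
{
    uint8_t data[9];
    uint8_t size = get(data, 8);
    data[size] = '\0'; // terminate so that strcmp sees a C-string
    if (strcmp(reinterpret_cast<const char*>(data),"t")==0) {
        cout << "found 't'" << endl;
    }
}
Output:
found 't'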
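Since get() already returns how many bytes it copied, another option is to compare by length with memcmp, which needs no terminator at all. This is a sketch with a hypothetical helper name, `bytesEqual`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// True when the `received` bytes in `data` are exactly the characters of `expected`.
bool bytesEqual(const std::uint8_t* data, std::size_t received, const char* expected) {
    assert(data != nullptr && expected != nullptr);
    const std::size_t expectedLen = std::strlen(expected);
    return received == expectedLen && std::memcmp(data, expected, expectedLen) == 0;
}
```

It could be used as `if (bytesEqual(data, size, "t")) { ... }`, where `size` is the value returned by get(). If only a single character is expected, checking `size == 1 && data[0] == 't'` is simpler still.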

ifstream fails without any reason?

I have a list of 3d spheres, when I save the list, I loop through:
void Facade::Save(std::ostream& fs)
fs<<x<<" "<<y<<" "<<z<<" "<<r<<" "; //save fields
fs<<color[0]<<" "<<color[1]<<" "<<color[2]<<std::endl;
and when I restore the list, I use:
void Facade::Load(std::ifstream& fs, BallList* blist)
GLfloat c[3];
//fails there, why
I don't know what goes wrong, but when reading the last line, the color components of the last sphere cannot be read, the stream fails after reading the radius of the last sphere. I checked the sphere list file:
7.05008 8.99167 -7.16849 2.31024 1 0 0
3.85784 -3.93902 -1.46886 0.640751 1 0 0
9.33226 3.66375 -6.93533 2.25451 1 0 0
6.43361 1.64098 -6.17298 0.855785 1 0 0
6.34388 -0.494705 -6.88894 1.50784 1 0 0
This looks good. Can somebody tell me why this is happening? Is this a bug in ifstream?
I'm using Unicode by the way.
The loops are attached below:
void BallList::Load(std::istream& fs)
Facade ball1;
while(!fs.fail() && !fs.eof())
ball1.Load(fs, this);
balls.pop_back(); //this is a work around, get rid of the last one
void BallList::Save(std::ostream& fs)
vector<Facade>::iterator itero = this->originalballs.begin();
while (itero != this->originalballs.end())
//work around the ifstream problem: the color of the last sphere cannot be read
//add a dummy item as the last
itero = this->originalballs.begin();
if(itero != this->originalballs.end())
I would expect this to fail after reading 5 balls (spheres) correctly.
The loop is designed so that attempting to read the 6th ball will fail but Add() is still called!!
You should redefine your code a bit:
std::istream& Facade::Load(std::istream& fs, BallList* blist)
GLfloat c[3];
fs>>x>>y>>z>>r; // This will fail if there is no input.
// Once all 5 balls have been read
// There is only a new line character on the stream.
// Thus the above line will fail and the fail bit is now set.
return fs; // returned so it can be tested in loop.
void BallList::Load(std::istream& fs)
Facade ball1;
while(ball1.Load(fs, this)) // Only enter loop if the load worked
{ // Load worked if the stream is in a good state.
// Only call Add() if Load() worked.
PS. White space is your friend. Personally I think this is easier to read:
fs >> x >> y >> z >> r;
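The same pattern, reduced to a self-contained sketch: the `Ball` struct and `loadBalls` function are stand-ins I made up for Facade and BallList, and the fields are assumed to be in the order the Save() function writes them. The read itself is the loop condition, so a ball is only stored when all seven values were read.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <vector>

// Stand-in for the sphere type: position, radius, RGB color.
struct Ball {
    float x, y, z, r;
    float color[3];
};

// Reads balls until extraction fails (end of input or malformed data).
std::vector<Ball> loadBalls(std::istream& in) {
    std::vector<Ball> balls;
    Ball b;
    while (in >> b.x >> b.y >> b.z >> b.r >> b.color[0] >> b.color[1] >> b.color[2])
        balls.push_back(b);   // only reached when the whole record was read
    return balls;
}
```

With this loop, the trailing newline after the last record no longer produces an extra element, so neither the pop_back() nor the dummy item written by Save() should be needed.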