I'm trying to understand how deflate and inflate work.
Here is a simple program with a dummyFields struct.
// Here I give the total memory size for the output buffer used by deflate
#define CHUNK 16384
struct dummyFields
{
long a;
char b;
long c;
float d;
float e;
float f;
float g;
float h;
char i;
unsigned int j;
};
Bytef *dataOriginal = (Bytef*)malloc( sizeof(dummyFields) );
Bytef *dataCompressed = (Bytef*)malloc( CHUNK );
z_stream s;
s.zalloc = Z_NULL;
s.zfree = Z_NULL;
s.opaque = Z_NULL;
deflateInit(&s, Z_DEFAULT_COMPRESSION);
s.avail_out = CHUNK;
s.next_out = dataCompressed;
int compressSize = 0;
int decompSize = 0;
dummyFields para;
// set all fields to 0
memset( &para, 0, sizeof(dummyFields) );
// Insert values into the struct fields
para.a = 31272;
para.b = 'z';
para.c = 66.54;
para.e = 123;
para.f = 66.54;
// ...
para.j = 123;
// copy these values into the Bytef* buffer
memcpy( dataOriginal, &para, sizeof(dummyFields));
s.avail_in = sizeof(dummyFields);
s.next_in = dataOriginal;
int response = deflate(&s, Z_FINISH);
// no errors here
if( response == Z_STREAM_END ){
compressSize = CHUNK - s.avail_out;
}
deflateEnd(&s);
// here I get 45 bytes, but the actual struct size is 40
printf("Total bytes after compress %d\n",compressSize);
// Trying to get my data back
Bytef *decomp = (Bytef*)malloc( sizeof(dummyFields) );
z_stream s_inflate;
s_inflate.zalloc = Z_NULL;
s_inflate.zfree = Z_NULL;
s_inflate.opaque = Z_NULL;
inflateInit(&s_inflate);
// the compressed bytes produced by deflate above
s_inflate.avail_in = compressSize;
s_inflate.next_in = dataCompressed;
s_inflate.avail_out = sizeof(dummyFields);
s_inflate.next_out = decomp;
int response = inflate( &s_inflate, Z_NO_FLUSH );
if( response == Z_STREAM_END ){
decompSize = sizeof(dummyFields) - s_inflate.avail_out;
}
// Here I get 40 bytes, which is correct because the actual struct size is 40
printf("Total bytes after decompress %d\n",decompSize);
inflateEnd( &s_inflate );
dummyFields data;
memset( &data, 0, sizeof(data) );
memcpy( &data, decomp, sizeof(data));
When I get my data back from the inflate output, I get the actual values (which is correct). The deflate and inflate functions work fine.
When I take the size of the struct with sizeof(dummyFields), it gives me 40 bytes.
Problem
The actual size of the struct is 40 bytes, but when I compress the data I get 45 bytes. How is that possible?
My data is 30 to 40 bytes. Is there another library that will compress it to 10 to 20 bytes (when I give it 30 to 40 bytes)?
Is there some way to GUARANTEE that the output compressed data will
be SMALLER than the input data?
Note
When I increase the number of struct fields or the data size from 40 to 100 bytes, the compression result is OK.
When I decrease the number of fields or the size from 100 bytes to 40 bytes, the compression result is not good.
There is no GUARANTEE that the compressed output will be SMALLER. Compression carries some overhead for storing structural information that describes the packed data (a dictionary, in the primitive case) and allows it to be represented in less space. You may get a larger compressed output for a single compressed structure, but you will most likely get a much smaller compressed output for an array of these structures (especially when they do not differ much from each other).
If your compressed output turns out to be larger than the uncompressed input, just store the uncompressed version instead.
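To make the fallback concrete, here is a minimal sketch using zlib's one-shot compress2() with a one-byte flag in front of the payload (the helper name and the flag layout are made up for illustration):
#include <string.h>
#include <zlib.h>
/* Hypothetical helper: writes a 1-byte flag followed by either the deflated
   payload (flag = 1) or a raw copy (flag = 0), whichever is smaller. dstCap
   should be at least compressBound(srcLen) + 1. Returns the bytes written. */
size_t pack_maybe_compressed(const Bytef *src, uLong srcLen,
                             Bytef *dst, uLong dstCap)
{
    uLongf outLen = dstCap - 1; /* reserve 1 byte for the flag */
    if (compress2(dst + 1, &outLen, src, srcLen, Z_DEFAULT_COMPRESSION) == Z_OK
        && outLen < srcLen) {
        dst[0] = 1; /* deflated payload */
        return (size_t)outLen + 1;
    }
    dst[0] = 0; /* deflate did not help: store raw */
    memcpy(dst + 1, src, srcLen);
    return (size_t)srcLen + 1;
}
The receiver checks the flag byte and either inflates the payload or copies it verbatim, so the stored size never exceeds the input size by more than one byte.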
Related
I have written an 8b10b encoder that generates a stream of bytes intended to be sent to a serial transmitter which sends the bytes as-is, LSb first.
What I'm doing here is basically laying down groups of 10 bits (encoded from the input stream of bytes) on groups of 8, so a varying number of bits gets carried over from one output byte to the next - kind of like in music/rhythm.
The program has been successfully tested, but it is about 4-5x too slow for my application. I think this comes from the fact that every bit has to be looked up in an array. My gut tells me this could be made faster with some sort of rolling mask, but I can't yet see how to do that, even by swapping the 3D array of booleans for a 2D array of integers.
Any pointers or other ideas?
Here is the code. Please ignore most of the macros and some of the code related to deciding which byte is to be written as this is application-specific.
Header:
#ifndef TX_BYTESTREAM_GEN_H_INCLUDED
#define TX_BYTESTREAM_GEN_H_INCLUDED
#include <stdint.h> //for standard portable types such as uint16_t
#define MAX_USB_TRANSFER_SIZE 1016 //Bytes, size of the max payload in a USB transaction. Determined using FT4222_GetMaxTransferSize()
#define MAX_USB_PACKET_SIZE 62 //Bytes, max size of the payload of a single USB packet
#define MANDATORY_TX_PACKET_BLOCK 5 //Bytes, constant - equal to the minimum number of bytes of TX packet necessary to exactly transfer blocks of 10 bits of encoded data (LCM of 8 and 10 is 40 bits = 5 bytes)
#define SYNC_CHARS_MAX_INTERVAL 172 //Target number of payload bytes between sync chars. Max is 188 before desynchronisation
#define ROUND_UP(N, S) ((((N) + (S) - 1) / (S)) * (S)) //Macro to round up the integer N to the largest multiple of the integer S
#define ROUND_DOWN(N,S) ((N / S) * S) //Same rounding down
#define N_SYNC_CHAR_PAIRS_IN_PCKT(pcktSz) (ROUND_UP((pcktSz*1000/(SYNC_CHARS_MAX_INTERVAL+2)),1000)/1000) //Number of sync (K28.5) character/byte pairs in a given packet
#define TX_PAYLOAD_SIZE(pcktSz) ((pcktSz*4/5)-2*N_SYNC_CHAR_PAIRS_IN_PCKT(pcktSz)) //Size in bytes of the payload data before encoding in a single TX packet
#define MAX_TX_PACKET_SIZE (ROUND_DOWN((MAX_USB_TRANSFER_SIZE-MAX_USB_PACKET_SIZE),(MAX_USB_PACKET_SIZE*MANDATORY_TX_PACKET_BLOCK))) //Maximum size in bytes of a TX packet
#define DEFAULT_TX_PACKET_SIZE (MAX_TX_PACKET_SIZE-MAX_USB_PACKET_SIZE*MANDATORY_TX_PACKET_BLOCK) //Default size in bytes of a TX packet with some margin
#define MAX_TX_PAYLOAD_SIZE (TX_PAYLOAD_SIZE(MAX_TX_PACKET_SIZE)) //Maximum size in bytes of the payload in a TX packet
#define DEFAULT_TX_PAYLOAD_SIZE (TX_PAYLOAD_SIZE(DEFAULT_TX_PACKET_SIZE))//Default size in bytes of the payload in a TX packet with some margin
//See string descriptors below for definitions. Error codes are individual bits so can be combined.
enum ErrCode
{
NO_ERR = 0,
INVALID_DIN_SIZE = 1,
INVALID_DOUT_SIZE = 2,
NULL_DIN_PTR = 4,
NULL_DOUT_PTR = 8
};
char const * const ERR_CODE_DESC[] = {
"No error",
"Invalid size of input data",
"Invalid size of output buffer",
"Input data pointer is NULL",
"Output buffer pointer is NULL"
};
/** @brief Generates the bytestream to the transmitter by encoding the incoming data using 8b10b encoding
and inserting K28.5 synchronisation characters to maintain the synchronisation with the demodulator (LVDS passthrough mode)
@arg din is a pointer to an allocated array of bytes which contains the data to encode
@arg dinSize is the size of din in bytes. This size must be equal to TX_PAYLOAD_SIZE(doutSize)
@arg dout is a pointer to an allocated array of bytes which is intended to contain the output bytestream to the transmitter
@arg doutSize is the size of dout in bytes. This size must meet the conditions at the top of this function's implementation. Use DEFAULT_TX_PACKET_SIZE if in doubt.
@return error code (c.f. ErrCode) **/
int TX_gen_bytestream(uint8_t *din, uint16_t dinSize, uint8_t *dout, uint16_t doutSize);
#endif // TX_BYTESTREAM_GEN_H_INCLUDED
Source file:
#include "TX_bytestream_gen.h"
#include <cstddef> //NULL
#define N_BYTE_VALUES (256+1) //256 possible data values + 1 special character (only accessible to this module)
#define N_ENCODED_BITS 10 //Number of bits corresponding to the 8b10b encoding of a byte
//Map the current running disparity, the desired value to encode to the array of encoded bits for 8b10b encoding.
//The Last value is the K28.5 sync character, only accessible to this module
//Notation = MSb to LSb
bool const encodedBits[2][N_BYTE_VALUES][N_ENCODED_BITS] =
{
//Long table (see appendix)
};
//New value of the running disparity after encoding with the specified previous running disparity and requested byte value (c.f. above)
bool const encodingDisparity[2][N_BYTE_VALUES] =
{
//Long table (see appendix)
};
int TX_gen_bytestream(uint8_t *din, uint16_t dinSize, uint8_t *dout, uint16_t doutSize)
{
static bool RDp = false; //Running disparity is initially negative
int ret = 0;
//If the output buffer size is not a multiple of the mandatory payload block or of the USB packet size, or if it cannot be held in a single USB transaction
//return an invalid output buffer size error
if(doutSize == 0 || (doutSize % MANDATORY_TX_PACKET_BLOCK) || (doutSize % MAX_USB_PACKET_SIZE) || (doutSize > MAX_TX_PACKET_SIZE)) //Temp
ret |= INVALID_DOUT_SIZE;
//If the input data size is not consistent with the output buffer size, return the appropriate error code
if(dinSize == 0 || dinSize != TX_PAYLOAD_SIZE(doutSize))
ret |= INVALID_DIN_SIZE;
if(din == NULL)
ret |= NULL_DIN_PTR;
if(dout == NULL)
ret |= NULL_DOUT_PTR;
//If everything checks out, carry on
if(ret == NO_ERR)
{
uint16_t iByteIn = 0; //Index of the byte of input data currently being processed
uint16_t iByteOut = 0; //Index of the output byte currently being written to
uint8_t iBitOut = 0; //Starts with LSb
int16_t nBytesUntilSync = 0; //Countdown of bytes until a sync marker needs to be sent. Cyclic.
//For all output bytes to generate
while(iByteOut < doutSize)
{
bool sync = false; //Initially this byte is not considered a sync byte (in which case the next byte of data will be processed)
//If the maximum interval between sync characters has been reached, mark the two next bytes as sync bytes and reset the counter
if(nBytesUntilSync <= 0)
{
sync = true;
if(nBytesUntilSync == -1) //After the second SYNC is written, the counter is reset
{
nBytesUntilSync = SYNC_CHARS_MAX_INTERVAL;
}
}
//Append bit by bit the encoded data of the byte to write to the output bitstream (carried over from byte to byte) - LSb first
//The byte to write is either the last byte of the encodedBits map (the sync character K28.5) if sync is set, or the next byte of
//input data if it isn't
uint16_t const byteToWrite = (sync?(N_BYTE_VALUES-1):din[iByteIn]);
for(int8_t iEncodedBit = N_ENCODED_BITS-1 ; iEncodedBit >= 0 ; --iEncodedBit, iBitOut++)
{
//If the current output byte is complete, reset the bit index and select the next one
if(iBitOut >= 8)
{
iByteOut++;
iBitOut = 0;
}
//Effectively sets the iBitOut'th bit of the iByteOut'th byte out to the encoded value of the byte to write
bool bitToWrite = encodedBits[RDp][byteToWrite][iEncodedBit]; //Temp
dout[iByteOut] ^= (-bitToWrite ^ dout[iByteOut]) & (1 << iBitOut);
}
//The running disparity is also updated as per the standard (to achieve DC balance)
RDp = encodingDisparity[RDp][byteToWrite]; //Update the running disparity
//If sync was not set, this means a byte of the input data has been processed, in which case take the next one in
//Also decrement the synchronisation counter
if(!sync) {
iByteIn++;
}
//In any case, decrease the synchronisation counter. Even sync characters decrease it (c.f. top of while loop)
nBytesUntilSync--;
}
}
return ret;
}
Testbench:
#include <iostream>
#include "TX_bytestream_gen.h"
#define PACKET_DURATION 0.000992 //In seconds, time of continuous data stream corresponding to one packet (5MHz output, default packet size)
#define TIME_TO_SIMULATE 10 //In seconds
#define PACKET_SIZE DEFAULT_TX_PACKET_SIZE
#define PAYLOAD_SIZE DEFAULT_TX_PAYLOAD_SIZE
#define N_ITERATIONS (TIME_TO_SIMULATE/PACKET_DURATION)
#include <chrono>
using namespace std;
//Testbench: measure the time taken to simulate TIME_TO_SIMULATE seconds of continuous encoding
int main()
{
uint8_t toEncode[PAYLOAD_SIZE] = {100}; //Dummy data, doesn't matter
uint8_t out[PACKET_SIZE] = {0};
std::chrono::time_point<std::chrono::system_clock> start, end;
start = std::chrono::system_clock::now();
for(unsigned int i = 0 ; i < N_ITERATIONS ; i++)
{
TX_gen_bytestream(toEncode, PAYLOAD_SIZE, out, PACKET_SIZE);
}
end = std::chrono::system_clock::now();
std::chrono::duration<double> elapsed_seconds = end - start;
std::cout << "Task execution time: " << elapsed_seconds.count()/TIME_TO_SIMULATE*100 << "% (for " << TIME_TO_SIMULATE << "s simulated)\n";
return 0;
}
Appendix: lookup tables. I don't have enough characters to paste it here, but it looks like so:
bool const encodedBits[2][N_BYTE_VALUES][N_ENCODED_BITS] =
{
//Running disparity = RD-
{
{1,0,0,1,1,1,0,1,0,0},
//...
},
//Running disparity = RD+
{
{0,1,1,0,0,0,1,0,1,1},
//...
}
};
bool const encodingDisparity[2][N_BYTE_VALUES] =
{
//Previous running disparity was RD-
{
0,
//...
},
//Previous running disparity was RD+
{
1,
//...
}
};
This will be a lot faster if you do everything a byte at a time instead of a bit at a time.
First change the way you store your lookup tables. You should have something like:
// conversion from (RD, byte) to (RD, 10-bit code)
// in each word, the lower 10 bits are the code,
// and bit 10 (the 11th bit) is the new RD
// The first 256 values are for RD -1, the next
// for RD 1
static const uint16_t BYTE_TO_CODE[512] = {
...
};
Then you need to change your encoding loop to write a byte at a time. You can use a uint16_t to store the leftover bits from each byte you output.
Something like this (I didn't figure out your sync byte logic, but presumably you can put that in the input or output byte loop):
// returns next isRD1
bool TX_gen_bytestream(uint8_t *dest, const uint8_t *src, size_t src_len, bool isRD1)
{
// bits generated, but not yet written, LSB first
uint16_t bits = 0;
// number of bits in bits
unsigned numbits = 0;
// current RD, either 0 or 256
uint16_t rd = isRD1 ? 256 : 0;
for (const uint8_t *end = src + src_len; src < end; ++src) {
// lookup code and next rd
uint16_t code = BYTE_TO_CODE[rd + *src];
// new rd from code bit 10
rd = (code>>2) & 256;
// store bits
bits |= (code & (uint16_t)0x03FF) << numbits;
numbits+=10;
// write out any complete bytes
while(numbits >= 8) {
*dest++ = (uint8_t)bits;
bits >>=8;
numbits-=8;
}
}
// If src_len isn't divisible by 4, then we have some extra bits
if (numbits) {
*dest = (uint8_t)bits;
}
return !!rd;
}
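As a side note, the packed table does not have to be typed in by hand: it can be generated once at startup from the question's bool tables (dropping the const on BYTE_TO_CODE so it can be written). This sketch assumes, per the question, that encodedBits stores each code MSb (index 0) to LSb (index 9), so that table index 9 becomes bit 0 of the word, the first bit sent out; the indexing rd * 256 + byte matches the rd + *src lookup above with rd being 0 or 256:
#include <cstdint>
static uint16_t BYTE_TO_CODE[512];
// One-time initialiser: pack the bool lookup tables into 16-bit words.
// Bits 0..9 hold the code (bit 0 is transmitted first), bit 10 holds
// the new running disparity. Only the 256 data values are packed here;
// the K28.5 entry (N_BYTE_VALUES - 1) would need its own slot if sync
// characters are inserted in the same loop.
static void init_byte_to_code()
{
    for (int rd = 0; rd < 2; ++rd) {
        for (int b = 0; b < 256; ++b) {
            uint16_t code = 0;
            for (int i = 0; i < N_ENCODED_BITS; ++i)
                code |= (uint16_t)encodedBits[rd][b][N_ENCODED_BITS - 1 - i] << i;
            code |= (uint16_t)encodingDisparity[rd][b] << 10;
            BYTE_TO_CODE[rd * 256 + b] = code;
        }
    }
}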
I am trying to pack a message buffer vector with smaller messages of char* interleaved along with their respective headers. These headers are cast to char* from a struct. A receiver unpacks the buffers by jumping from header to header: using the header, it can identify the size of the corresponding message and the position at which the next header starts.
struct header_t {
uint64_t src : 16;
uint64_t dst : 16;
uint64_t len : 32;
} __attribute__((packed));
This is a fixed-size header struct. The headers are converted to a vector using:
std::vector<char> pack_header(int src, int dest, int data_size){
header_t hdr_struct{(uint64_t)src, (uint64_t)dest, (uint64_t)data_size};
char* ptr = reinterpret_cast<char*>(&hdr_struct);
std::vector<char> hdr = std::vector<char>(ptr, ptr + sizeof(header_t));
return hdr;
}
The main() function packs 3 messages along with their respective headers by appending them to the buffer. For simplicity I also call the unpacking function here, but in the real app the buffers are unpacked in another process.
int main()
{
auto header1 = pack_header(2, 4, 43);
auto header2 = pack_header(2, 4, 29);
auto header3 = pack_header(2, 4, 23);
//Test messages of the given sizes
char* data1 = new char[43]();
char* data2 = new char[29]();
char* data3 = new char[23]();
//messages interleaved with headers
std::vector<char> buffer;
buffer.insert(buffer.end(), header1.begin(),header1.end());
buffer.insert(buffer.end(), data1, data1+43);
buffer.insert(buffer.end(), header2.begin(),header2.end());
buffer.insert(buffer.end(), data2, data2+29);
buffer.insert(buffer.end(), header3.begin(),header3.end());
buffer.insert(buffer.end(), data3, data3+23);
unpack(buffer);
return 0;
}
This unpack function iterates through the buffer by taking steps of header length (8 bytes) + message length.
void unpack(std::vector<char> &stream)
{
char* bitr = &stream[0];
int step =0;
std::vector<header_t*> hdr_stream;
while(step <= stream.size())
{
std::cout<<"\nstep: "<<step;
header_t header;
memcpy(&header, bitr, sizeof(header_t));
std::cout<<"\nlen_after : "<<header.len<<"\n";
step+=header.len + sizeof(header_t);
bitr+=step;
}
}
The unpack function steps through the first 2 messages correctly but loses its way after that and doesn't infer the headers. The output looks like this:
step: 0
message length : 43
step: 51
message length : 29
step: 88
message length : 0
step: 96
message length : 0
step: 104
message length : 0
step: 112
message length : 0
Any idea why it can't correctly infer the size from the headers after the second message? It just reads the size as 0 and steps over the headers rather than header + message.
With every iteration, you increment step by the size of the current packet; it is an index into stream of where the next packet starts. However, you then add this whole index to bitr, resulting in undefined behavior, because bitr is advanced past the end of the buffer.
What you get is:
start of 1st loop: bitr = 0
start of 2nd loop: bitr = 51 (size of first packet)
start of 3rd loop: bitr = 139 (51 + 51 + 37, 2 * size of first packet + size of second packet)
What you want to do is increment bitr by the same amount you increase step (bitr += header.len + sizeof(header_t);), or set bitr to the newly computed index (bitr = &stream[step];).
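To make that concrete, here is a minimal corrected sketch of unpack: it indexes the buffer with step directly, so there is no separate pointer that can drift, and it only loops while a full header still fits in the remaining bytes:
#include <cstring>
#include <iostream>
#include <vector>
void unpack(std::vector<char> &stream)
{
    std::size_t step = 0;
    while (step + sizeof(header_t) <= stream.size())
    {
        header_t header;
        std::memcpy(&header, &stream[step], sizeof(header_t));
        std::cout << "\nstep: " << step;
        std::cout << "\nmessage length : " << header.len << "\n";
        // advance by exactly one header plus its payload
        step += header.len + sizeof(header_t);
    }
}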
I'm trying to send a packet of data via UDP from my Qt GUI application to a Mathworks Simulink model where it is unpacked. I am using a union to save the data to and convert it to a char array of bytes to send. My first 58 bytes write and read as expected, but the last part doesn't.
I've tried multiple datatypes (int, unsigned int, float, int32_t) and none of them seem to be writing to the proper bytes in m_tx_data.myBytes[].
My guess is that either something is slightly wrong in my code, or Matlab and Qt read/write bytes to ints differently and I can't figure out how.
My defined union
const int GUI2DEVICE_NUM_DOUBLE = 7;
const int GUI2DEVICE_NUM_BOOL = 2;
const int GUI2DEVICE_NUM_INT32 = 3;
const int GUI2DEVICE_DATA_SIZE = (GUI2DEVICE_NUM_DOUBLE*sizeof(double)) +
(GUI2DEVICE_NUM_BOOL*sizeof(bool)) +
(GUI2DEVICE_NUM_INT32*sizeof(unsigned int));
union GuiToDeviceDataType
{
char myBytes[GUI2DEVICE_DATA_SIZE];
struct
{
double doub[GUI2DEVICE_NUM_DOUBLE];
bool boolean[GUI2DEVICE_NUM_BOOL];
int32_t int32[GUI2DEVICE_NUM_INT32];
} part;
};
Assigning values to the union. Every variable matches the type of the struct member it is assigned to. m_tx_data.myBytes is initialized to 0.
m_tx_data.part.doub[iAmplitude] = amplitude;
m_tx_data.part.doub[iStartFrequency] = startHz;
m_tx_data.part.doub[iStopFrequency] = stopHz;
m_tx_data.part.doub[iFrequencyRampTime] = FreqRampTime;
m_tx_data.part.doub[iAmpRampUpTime] = AmpRampUpTime;
m_tx_data.part.doub[iAmpRampDownTime] = AmpRampDownTime;
m_tx_data.part.doub[iAutoScaleDecrementPercent] = ASDecPercent;
m_tx_data.part.boolean[0] = 1;
m_tx_data.part.boolean[1] = 1;
m_tx_data.part.int32[iSweepSteps] = SweepSteps;
m_tx_data.part.int32[iPeriodsToAverage] = PeriodsToAverage;
m_tx_data.part.int32[iPeriodsToIgnore] = PeriodsToSkip;
The values being assigned to m_tx_data.part.int32[] are equal to 1 (assigning them to 2 or 3 return a similar result). I'm expecting m_tx_data.myBytes[58:61] to be (0 0 0 1) or (1 0 0 0) (endianness), but instead it's returning (0 0 1 0), which is being read as 65536 by Matlab (little endian).
However, my doubles are saved and read as expected: m_tx_data.myBytes[0:7] = (0 0 0 0 0 0 240 63), which is read as 1 when cast to double by Matlab.
When debugging and looking at m_tx_data.myBytes, I can see the value stored in each byte and watch as each variable is written to the structure.
Add the check static_assert(sizeof(GuiToDeviceDataType::part) == GUI2DEVICE_DATA_SIZE). It should fail in your case because of alignment padding. Let's check:
size_t sz1 = sizeof(GuiToDeviceDataType::part);
// sz1 == 72 and GUI2DEVICE_DATA_SIZE is 70
Reorder the struct so that the bool fields move to the end; then you'll get the results you expect, because the padding will only be at the end of the struct. You can check this by casting the addresses to char* and printing the difference:
GuiToDeviceDataType data;
size_t sz1 = sizeof(GuiToDeviceDataType::part);
char* p1 = (char*)(void*)&data.part.boolean[0];
char* p2 = (char*)(void*)&data.part.int32[0];
size_t diff = p2 - p1;
std::cout << diff; // shows 4 for your code
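For illustration, a sketch of the reordered layout (the inner struct is named here only so that its offsets can be checked with static_assert; everything else is as in the question). Note that sizeof(part) is still 72, because the struct is padded up to the 8-byte alignment of double, but now all 70 payload bytes sit contiguously at the front of myBytes, which is what gets sent:
#include <cstddef>
#include <cstdint>
struct GuiToDevicePart
{
    double  doub[GUI2DEVICE_NUM_DOUBLE];   // offsets  0..55
    int32_t int32[GUI2DEVICE_NUM_INT32];   // offsets 56..67
    bool    boolean[GUI2DEVICE_NUM_BOOL];  // offsets 68..69, padding after
};
union GuiToDeviceDataType
{
    char myBytes[GUI2DEVICE_DATA_SIZE];
    GuiToDevicePart part;
};
// the payload members must land exactly where the receiver expects them
static_assert(offsetof(GuiToDevicePart, int32) == 56, "unexpected padding");
static_assert(offsetof(GuiToDevicePart, boolean) == 68, "unexpected padding");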
Hi, I am trying to write a TCP connection in Poco. The client sends a packet with these fields:
packetSize : int
date : int
ID : int
So the first 4 bytes contain the packet size. On the receive side I have this code:
int packetSize = 0;
char *bufferHeader = new char[4];
// receive 4 bytes that contains packetsize here
socket().receiveBytes(bufferHeader, sizeof(bufferHeader), MSG_WAITALL);
Poco::MemoryInputStream *inStreamHeader = new Poco::MemoryInputStream(bufferHeader, sizeof(bufferHeader));
Poco::BinaryReader *BinaryReaderHeader = new Poco::BinaryReader(*inStreamHeader);
(*BinaryReaderHeader) >> packetSize; // now we have the full packet size
Now I am trying to store all remaining incoming bytes into one array for future binary reading:
int ID = 0;
int date = 0;
int availableBytes = 0;
int readedBytes = 0;
char *body = new char[packetSize - 4];
do
{
char *bytes = new char[packetSize - 4];
availableBytes = socket().receiveBytes(bytes, sizeof(bytes), MSG_WAITALL);
if (availableBytes == 0)
break;
memcpy(body + readedBytes, bytes, availableBytes);
readedBytes += availableBytes;
} while (availableBytes > 0);
Poco::MemoryInputStream *inStream = new Poco::MemoryInputStream(body, sizeof(body));
Poco::BinaryReader *BinaryReader = new Poco::BinaryReader(*inStream);
(*BinaryReader) >> date;
(*BinaryReader) >> ID;
cout << "date :" << date << endl;
cout << "ID :" << ID << endl;
The problem is that the byte block body is not storing the remaining bytes; it always contains only the first 4 bytes (date). So in the output the date is correct but the ID is not as expected. I tried streaming it without the block copy, manually receiving each field without the loop, and it was fine and had the expected data. But when I try to store the incoming bytes into one array and then pass that array to a MemoryInputStream to read it, only the first block is correct and as expected!
I really need to store all incoming bytes into one array and then read that whole array. How should I change my code?
Thanks a lot
I see two errors in your code. Or, more precisely, one error that you make twice.
You confuse the sizeof of a char[] with the sizeof of a char *; the first is the number of characters in the array, the second is the size of the pointer: typically 4 or 8 bytes, depending on the memory model.
So, when you write
availableBytes = socket().receiveBytes(bytes, sizeof(bytes), MSG_WAITALL);
you are asking for only 4 (I suppose) bytes. This is not serious, as you continue to ask for more bytes until the message is finished.
The real problem is the following instruction
Poco::MemoryInputStream *inStream = new Poco::MemoryInputStream(body, sizeof(body));
where you transfer only sizeof(char *) bytes into inStream.
You should substitute sizeof(body) and sizeof(bytes) with packetSize - 4.
P.S.: sorry for my bad English
Edit: I've seen another error. In this instruction
char *bytes = new char[packetSize - 4];
you allocate packetSize - 4 chars. This memory is never deleted, and it is allocated inside the do ... while() loop.
You can allocate bytes outside of the loop (together with body).
Edit 2016.03.17
Proposed solution (caution: not tested)
size_t toRead = packetSize - 4U;
size_t totRead = 0U;
size_t nowRead = 0U;
char * body = new char[toRead];
do
{
nowRead = socket().receiveBytes(body+totRead, toRead-totRead,
MSG_WAITALL);
if ( 0 == nowRead )
throw std::runtime_error("shutdown from receiveBytes()");
totRead += nowRead;
} while ( totRead < toRead );
Poco::MemoryInputStream *inStream = new Poco::MemoryInputStream(body,
toRead);
// MemoryInputStream does not copy the buffer, so read everything from
// inStream before releasing body
delete[] body;
body = NULL;
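The same sizeof confusion affects the header read at the top. A sketch of that part with a plain array, which makes sizeof genuinely 4 and also avoids the needless new:
char bufferHeader[4];
// bufferHeader is an array here, not a pointer, so sizeof really is 4
socket().receiveBytes(bufferHeader, sizeof(bufferHeader), MSG_WAITALL);
Poco::MemoryInputStream inStreamHeader(bufferHeader, sizeof(bufferHeader));
Poco::BinaryReader binaryReaderHeader(inStreamHeader);
int packetSize = 0;
binaryReaderHeader >> packetSize; // now we have the full packet size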
I have been stuck for 2 days on a seemingly simple calculation.
But I just don't get it.
I am encoding an audio file with a compressing algorithm.
The entire audio file is separated into "chunks" of 960 bytes.
Each chunk is compressed to 60 bytes.
My uncompressed file is 1480320 bytes long.
My encoded file is 46320 bytes long.
Something seems to be wrong.
I tried to calculate the theoretic uncompressed file size from the file size of the encoded audio.
Here is how the file is encoded:
short *m_in;
short *m_out;
unsigned char *m_data;
unsigned char *m_fbytes;
int m_max_frame_size;
int m_frame_size;
int m_sampling_rate;
int m_max_payload_bytes;
int m_bitrate_bps;
int m_iByteLen1FrameEncoded;
int m_iByteLen1FrameDecoded;
m_sampling_rate=48000;
m_max_frame_size = 960*6;
m_max_payload_bytes=1500;
m_bitrate_bps= 24000;
m_iByteLen1FrameEncoded=60;
m_iByteLen1FrameDecoded=960;
m_in = (short*)malloc(m_max_frame_size*sizeof(short));
m_out = (short*)malloc(m_max_frame_size*sizeof(short));
m_data = (unsigned char*)calloc(m_max_payload_bytes,sizeof(char));
m_fbytes = (unsigned char*)malloc(m_iByteLen1FrameDecoded*sizeof(short));
FILE *fin= fopen(uPathInput.c_str(), "rb");
FILE *fout=fopen(uPathOutput.c_str(), "wb");
int curr_read=0;
int stop=0;
while (!stop)
{
int err;
err = fread(m_fbytes, sizeof(short), 960, fin);
curr_read = err;
for(int i=0;i<curr_read;i++)
{
opus_int32 s;
s=m_fbytes[2*i+1]<<8|m_fbytes[2*i];
s=((s&0xFFFF)^0x8000)-0x8000;
m_in[i]=s;
}
if (curr_read < 960)
{
for (int i=curr_read;i<960;i++)
{
m_in[i] = 0;
}
stop = 1;
}
// iLen is always 60, so I guess the 960 bytes are compressed to 60 bytes, right?
int iLen = opus_encode(m_enc, m_in, m_iByteLen1FrameDecoded, m_data, m_max_payload_bytes);
if (fwrite(m_data, 1, iLen, fout) !=iLen)
{
fprintf(stderr, "Error writing.\n");
}
}
fclose(fin);
fclose(fout);
The compression ratio seems to be
960/60 = 16
So I calculated 46320 bytes * 16.
But that gets me to 741120 bytes.
And that doesn't fit. I expected it to be 1480320 bytes.
I am trying to find the error in my calculation, but I just can't find it.
Does anybody see where I went wrong?
Thank you very much for any help!
Right, to expand on my comments. The problem is found here:
fread(m_fbytes, sizeof(short), 960, fin);
You're reading 960 shorts, which are 2 bytes wide, so you're really reading 1920 bytes. If opus_encode() returns the compressed size in bytes, that makes the compression ratio 32, as Robert observed.
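That also accounts for the exact sizes you measured: 1480320 / 1920 = 771 full chunks. After the last full read, the loop runs once more, fread returns 0, the whole frame is zero-padded and one extra frame is encoded before stop is set, giving (771 + 1) * 60 = 46320 bytes, exactly the size of your encoded file.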
I'd also simplify the code processing the chunks:
size_t ITEM_SIZE = sizeof(short);
int ITEM_COUNT = 960;
// fread should first return a short item count, then zero
size_t shorts_read = 0;
while (shorts_read = fread(m_fbytes, ITEM_SIZE, ITEM_COUNT, fin)) {
size_t i = 0;
for (; i < shorts_read; i++) {
opus_int32 s;
// etc.
}
for (; i < ITEM_COUNT; i++) {
m_in[i] = 0;
}
// opus_encode() etc
}
You get rid of the useless stop flag and a level of nesting, and the construct is idiomatic for "read until you can't." (See this SO question.)
I retract what I said about the code being hokey; I thought fread returned the bytes read, not the items read.