Decompressing a vector into another vector using miniz in C++

I'm trying to do the simplest thing here. I want to create a function that takes a byte (char) array, inflates it using miniz's tinfl_decompress function, and returns a byte array containing the inflated data.
First things first. The arrays given will never be bigger than 100 kB, and the vast majority will be smaller than 50 kB. Hence, I don't think I need to use any kind of buffer for it. Anyway, this is what I've got:
std::vector<unsigned char> unzip(std::vector<unsigned char> data)
{
unsigned char *outBuffer = new unsigned char[1024 * 1024];
tinfl_decompressor inflator;
tinfl_status status;
tinfl_init(&inflator);
size_t inBytes = data.size() - 9;
size_t outBytes = 1024 * 1024;
status = tinfl_decompress(&inflator, (const mz_uint8 *)&data[9], &inBytes, outBuffer, (mz_uint8 *)outBuffer, &outBytes, 0);
return ???
}
I know the output I want begins at outBuffer, but I don't know how long it is (I do happen to know it will be less than 1 MB), so I cannot pack it into a vector and send it on its way. I had hoped that outBytes would hold the size of the output, but it is set to 1 after the decompression. I know that decompression didn't fail, since the status returned is TINFL_STATUS_DONE (0).
Is this even the right way of doing it? This is a method that will be called a lot in my program, so I want something that is as fast as possible.
How do I get the vector out of it? Should I use a different data type? An array (the [] type)? The decompressed data will be read sequentially only once, after which it will be discarded.
EDIT:
It seems that the file I was trying to decompress was not in the proper format: it was a zip file, but this code expects zlib data.

Caveat: Totally untested code.
It should go something like this. Replace
unsigned char *outBuffer = new unsigned char[1024 * 1024];
with
std::vector<unsigned char> outBuffer(1024 * 1024);
to get a vector. Then call tinfl_decompress, using the data method to get the vector's underlying buffer. It should look something like
status = tinfl_decompress(&inflator,
(const mz_uint8 *)&data[9],
&inBytes,
(mz_uint8 *)outBuffer.data(),
(mz_uint8 *)outBuffer.data(),
&outBytes,
0);
And then resize the vector to the number of bytes actually written to it, for convenience later.
outBuffer.resize(outBytes);
Note that this does NOT free any memory: size() shrinks, but the vector still has a capacity of 1 MiB. If this is a problem, an additional call to std::vector::shrink_to_fit is required, which asks the implementation to release the unused capacity.
Finally
return outBuffer;
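The allocate, fill, then resize pattern described above can be tried out without miniz. The following is only a sketch: fill_buffer is a hypothetical stand-in for a C-style call such as tinfl_decompress that writes into a caller-supplied buffer and reports the byte count through a pointer, and produce is likewise an illustrative name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a C-style API: writes at most *out_size bytes
// into out and stores the number of bytes actually written in *out_size.
void fill_buffer(unsigned char* out, std::size_t* out_size)
{
    static const unsigned char payload[] = {'h', 'e', 'l', 'l', 'o'};
    const std::size_t n = std::min(*out_size, sizeof(payload));
    std::memcpy(out, payload, n);
    *out_size = n;
}

std::vector<unsigned char> produce()
{
    std::vector<unsigned char> out(1024 * 1024); // worst-case size, zero-filled
    std::size_t written = out.size();            // in: room available, out: bytes written
    fill_buffer(out.data(), &written);
    assert(written <= out.size());
    out.resize(written);   // size() now matches the payload; capacity is unchanged
    out.shrink_to_fit();   // optional, non-binding request to release the excess
    return out;
}
```

Returning the vector by value is cheap here, since the compiler can move or elide the copy.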

Related

Copy multiple void* into one vector

I am coming back to C++ after many years (and never went so deep before), so please bear with me :)
I have, as a field of a struct, a void* which points to some data. The memory it points to is filled with different data after every call of a given function, so I'd like to "cache" the results after every function call in a vector, in order to have all the data at the end. How can I achieve this?
I declared a static vector<unsigned char> vectorBuffer; and tried vectorBuffer.insert(vectorBuffer.end(), (unsigned char*)myStruct->thePointer) and vectorBuffer.push_back((unsigned char*)myStruct->thePointer), but obviously I'm getting errors. What is the correct way to do this?
Thanks
EDIT: I also know the size of the data behind the void*, since I have another field in my struct that is updated along with the data.
Something along these lines is what you want to do to buffer the data fragmented over multiple callbacks:
#include <vector>
std::vector<char> buffer;
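const char *bytes = static_cast<const char *>(data); // a void* needs a cast before pointer arithmetic
buffer.insert(buffer.end(), bytes, bytes + length);
This assumes that data is the pointer you receive in the callback and that length, its size in bytes, is available too.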
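As a slightly fuller sketch of the same idea, the following appends each chunk to one growing vector. The Result struct and the append_chunk name are hypothetical; they only mirror the thePointer and bufferLength fields mentioned in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical struct mirroring the fields described in the question.
struct Result {
    void* thePointer;
    std::size_t bufferLength;
};

// Appends the bytes currently behind r.thePointer to the end of cache.
void append_chunk(std::vector<unsigned char>& cache, const Result& r)
{
    assert(r.thePointer != nullptr || r.bufferLength == 0);
    // A void* cannot be used for pointer arithmetic, so cast it first.
    const unsigned char* bytes = static_cast<const unsigned char*>(r.thePointer);
    cache.insert(cache.end(), bytes, bytes + r.bufferLength);
}
```

Because the bytes are copied, the source buffer can be overwritten by the next call without affecting what is already cached.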
You should declare the vector with
static vector<unsigned char *> vectorBuffer;
(it's an array of unsigned character pointers, not unsigned characters).
To save the data (assuming you know the size),
unsigned char *p = new unsigned char[myStruct->bufferLength];
memcpy((void *) p, myStruct->thePointer, myStruct->bufferLength);
vectorBuffer.push_back(p);
You can then keep track of the lengths with
static vector<size_t> bufferLength;
bufferLength.push_back(myStruct->bufferLength);
Note that you will need to free the memory afterwards by calling delete[] on each stored pointer.
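If the chunks need to stay separate, one alternative to raw pointers plus a parallel length vector is a vector of vectors, where each inner vector owns its bytes and knows its own length. This is a different technique from the one above, shown only as a sketch; chunk_t and store_chunk are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<unsigned char> chunk_t;

// Copies length bytes from data into a new chunk stored at the end of chunks.
void store_chunk(std::vector<chunk_t>& chunks, const void* data, std::size_t length)
{
    assert(data != nullptr || length == 0);
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    chunks.push_back(chunk_t(bytes, bytes + length));
}
```

With this layout there is nothing to delete by hand: the memory is released when the outer vector is destroyed.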

How do I combine multiple char reads into a std::vector?

I'm reading multiple reports from a HID device into an unsigned char, then trying to copy the data to a std::vector. I'm also writing the data out to a file for hex analysis, whose content appears to be correct when I view it. However, the std::vector doesn't appear to contain the correct data when I dump it to the console.
This is the code:
typedef vector<unsigned char> buffer_t;
buffer_t sendCommand (hid_device *devh, const unsigned char cmd[], int reports) {
unsigned char outbuf[0x40];
buffer_t retbuf(0x40 * reports);
hid_write(devh, cmd, 0x41);
int i;
FILE *file = fopen("test.out", "w+b");
while (i++ < reports) {
hid_read(devh, outbuf, 0x40);
fwrite(outbuf, 1, sizeof(outbuf), file);
retbuf.push_back(*outbuf);
}
fclose(file);
cout << &retbuf[0];
return retbuf;
}
I have a feeling I'm way off the mark here; I'm fairly new to C/C++, and I've been stuck with this for a while now. Can anyone tell me what I'm doing wrong, or point me in a better direction?
You want to add multiple unsigned char objects to your vector, but push_back only adds one.
So, replace retbuf.push_back(*outbuf); with either:
for (size_t i = 0; i < sizeof(outbuf); ++i) {
retbuf.push_back(outbuf[i]);
}
or
std::copy(outbuf, outbuf+sizeof(outbuf), std::back_inserter(retbuf));
or
retbuf.insert(retbuf.end(), outbuf, outbuf+sizeof(outbuf));
which all do the same thing.
You create your vector with a certain size:
buffer_t retbuf(0x40 * reports);
but push_back increases the size of the vector by adding an element at the end. You should create it empty:
buffer_t retbuf;
Optionally, you could arrange for the vector to have enough space allocated, ready for the elements you're going to add:
retbuf.reserve(0x40 * reports);
This is purely a performance issue, but sometimes it's a significant issue for large vectors, or vectors of types that (unlike unsigned char) are expensive to copy/move when the vector runs out of internal space and has to allocate more.
A note on style: you repeat the literal value 0x40 a few times, and also use sizeof(outbuf). It's often best to define a constant, and use the name throughout:
const int report_size = 0x40;
This is partly in case the number changes in future, but also it's about the readability of your code -- if someone sees 0x40 they may or may not immediately understand why that is the correct value. If someone sees report_size then they don't know what value that actually is until they look it up, but they do know why you're using that value.
The problem is in this line: buffer_t retbuf(0x40 * reports); It creates a vector with 0x40 * reports elements, each filled with the default value for unsigned char (zero). push_back() then adds new elements to the end of the vector and doesn't touch the existing ones.
You need to rewrite it this way:
buffer_t retbuf; // Empty vector
retbuf.reserve(0x40 * reports); // Preallocate memory for known element count
This way push_back() will work as expected and add elements to the empty vector from the beginning.
And of course you should push_back() all elements of outbuf, not only the first one (*outbuf).
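Putting these points together, a minimal sketch might look as follows. It does not use the real HID library: read_report is a hypothetical stand-in for hid_read that fills the buffer with a recognisable value, and collect_reports is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

typedef std::vector<unsigned char> buffer_t;
const std::size_t report_size = 0x40;

// Hypothetical stand-in for hid_read: fills buf with report_size bytes
// whose value is the report number.
void read_report(int report_number, unsigned char* buf)
{
    std::memset(buf, report_number, report_size);
}

buffer_t collect_reports(int reports)
{
    assert(reports >= 0);
    unsigned char outbuf[report_size];
    buffer_t retbuf;                        // starts empty
    retbuf.reserve(report_size * reports);  // allocates capacity only; size() stays 0
    for (int i = 0; i < reports; ++i) {
        read_report(i, outbuf);
        // Append the whole report, not just its first byte.
        retbuf.insert(retbuf.end(), outbuf, outbuf + report_size);
    }
    return retbuf;
}
```

Note that the loop counter is initialised explicitly, which the original while (i++ < reports) loop did not do.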
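To copy multiple values into a vector in one call, you can use std::vector's assign function. Be aware that assign replaces whatever the vector already holds; to append to the existing contents, use insert at end() instead. For example:
std::vector<char>vec1;
char array[3] = {'a', 'b', 'c'};
vec1.assign(array, array+3);
I am currently working on a project where I had to do this.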
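The difference between assign and insert matters when the vector is filled over several reads. The two helper functions below are hypothetical and exist only to show the contrast.

```cpp
#include <cassert>
#include <vector>

// assign discards the old contents and keeps only [first, last).
std::vector<char> replace_with(std::vector<char> v, const char* first, const char* last)
{
    v.assign(first, last);
    return v;
}

// insert at end() keeps the old contents and appends [first, last).
std::vector<char> append_to(std::vector<char> v, const char* first, const char* last)
{
    v.insert(v.end(), first, last);
    return v;
}
```

For accumulating several reports into one vector, the appending form is the one that preserves earlier reads.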
Your vector is of a type unsigned char, which means every element of it is of this type. Your outbuf is an array of unsigned chars.
The push_back() only appends one item to the end of the vector, so push_back(*outbuf) will only add the first element of the outbuf to the vector, not all of them.
To put all the data into the vector, you will need to push_back them one-by-one, or use std::copy.
Note that since outbuf is a char array, *outbuf is its first element, because an array decays to a pointer to its first element.
I think you probably wanted to do:
typedef vector<string> buffer_t; // one string per report
...
retbuf.push_back(string(outbuf, outbuf + sizeof(outbuf)));
...
Or
typedef vector<unsigned char> buffer_t;
...
for (size_t i = 0; i < sizeof(outbuf); i++)
retbuf.push_back(outbuf[i]);
...

Copy unsigned char * to unsigned char*

I need to save packet state for a while.
So I read the packet data, which is represented as unsigned char*, then create a record with this data and save the record in a list for a while.
Which is the better way to represent the packet in the record: as char* or as char[]?
How do I copy the data that was read (unsigned char) to both options:
to unsigned char[] and to unsigned char*?
I need to copy the data because each packet is read into the same char*, so before I save a packet for a while I need to copy its data first.
If the packet data is binary, I'd prefer using std::vector to store the data, as opposed to one of the C strXXX functions, to avoid issues with a potential NUL character in the data stream. Most strXXX functions look for a NUL character and stop there. Since the data is not a string, I'd also avoid std::string for this task.
std::vector<unsigned char> v( buf, buf + datalen );
The vector constructor will copy all the data from buf[0] to buf[datalen - 1] and will deallocate the memory when the vector goes out of scope. You can get a pointer to the underlying buffer using v.data() or &v[0].
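For the use case in the question (keeping packets in a list for a while), a record could simply hold such a vector. This is only a sketch; PacketRecord and save_packet are hypothetical names, and std::list is assumed as the container.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical record type; a real one would likely carry more fields.
struct PacketRecord {
    std::vector<unsigned char> data;
};

// Deep-copies datalen bytes from buf into a new record at the end of saved.
void save_packet(std::list<PacketRecord>& saved, const unsigned char* buf, std::size_t datalen)
{
    assert(buf != nullptr || datalen == 0);
    PacketRecord rec;
    rec.data.assign(buf, buf + datalen); // embedded zero bytes are copied like any other
    saved.push_back(rec);
}
```

Since each record owns its copy, the read buffer can be reused for the next packet immediately.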
So, it sounds like you need to save the data from multiple packets in a list until some point in the future.
If it was me, I'd use std::string or std::vector normally because that removes allocation issues and is generally plenty fast.
If you do intend to use char* or char[], then you'd want to use char*. Declaring a variable like "char buf[1024];" allocates it on the stack, which means that when that function returns it goes away. To save it in a list, you'd need to dynamically allocate it, so you would do something like "char *buf = new char[packet.size];" and then copy the data and store the pointer and the length of the data in your list (or, as I said before, use std::string which avoids keeping the length separately).
How do you copy the data?
Probably memcpy. The strcpy function would have problems with data which can have nul characters in it, which is common in networking situations. So, something like:
char *buf = new char[packet_length];
memcpy(buf, packet_data, packet_length);
// Put buf and packet_length into a structure in your list.

Avoid overwriting on array

Is there any way to avoid overwriting an array? In my implementation I have to write data to a buffer/array of fixed size, say buff[100], and I reuse the same buff[100] whenever I want to output data. The next time I write to buff[100], the new data should be appended instead of overwriting what is already there.
Maintain an index into the array. When the length of the data you want to write plus the index is greater than or equal to 100, write out the buffer and the data. Otherwise, shove the data into the buffer at that offset and add the length of the data to the index.
For example, assuming that the following variables are in scope:
#define BUFFER_LENGTH 100
char buffer[BUFFER_LENGTH];
int buffer_index;
int output_fd;
You could have a function like this:
void write_buffered(char *data, int data_length)
{
if (data_length + buffer_index >= BUFFER_LENGTH) {
write(output_fd, buffer, buffer_index);
write(output_fd, data, data_length);
buffer_index = 0;
return;
}
memcpy(&buffer[buffer_index], data, data_length);
buffer_index += data_length;
}
This is written C-style because I know C better than C++, but the basic principles are sound. Obviously, avoid the use of global variables and alter the write() calls to whatever call you are already using.
Since you mention C++, why don't you use a std::vector or similar container? It would be much simpler and less error-prone.
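A rough C++ counterpart of the buffered-write idea, using std::vector, might look like this. It is a sketch under assumptions: BufferedWriter and demo are hypothetical names, output goes to a std::ostream rather than a file descriptor, and, unlike the C version, new data is always buffered after the old contents have been flushed.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

class BufferedWriter {
public:
    BufferedWriter(std::ostream& out, std::size_t limit) : out_(out), limit_(limit)
    {
        buffer_.reserve(limit);
    }

    // Appends data after what is already buffered; if that would reach the
    // limit, the pending bytes are written out first.
    void write(const char* data, std::size_t length)
    {
        assert(data != nullptr || length == 0);
        if (buffer_.size() + length >= limit_) {
            flush();
        }
        buffer_.insert(buffer_.end(), data, data + length);
    }

    // Writes the pending bytes to the stream and empties the buffer.
    void flush()
    {
        if (!buffer_.empty()) {
            out_.write(buffer_.data(), static_cast<std::streamsize>(buffer_.size()));
        }
        buffer_.clear();
    }

    std::size_t pending() const { return buffer_.size(); }

private:
    std::ostream& out_;
    std::size_t limit_;
    std::vector<char> buffer_;
};

// Usage sketch: returns everything that reached the stream.
std::string demo()
{
    std::ostringstream sink;
    BufferedWriter w(sink, 100);
    w.write("hello ", 6);
    w.write("world", 5);
    w.flush();
    return sink.str();
}
```

The vector tracks its own fill level through size(), so no separate index variable is needed.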

Reinterpret float vector as unsigned char array and back

I've searched and searched stackoverflow for the answer, but have not found what I needed.
I have a routine that takes an unsigned char array as a parameter in order to encode it as Base64. I would like to encode an STL float vector (vector<float>) in Base64, and therefore need to reinterpret the bytes in the float vector as an array of unsigned chars in order to pass it to the encode routine. I have tried a number of things, from reinterpret and static casts to memory copies, but none of them seem to work (at least not the way I implemented them).
Likewise, I'll need to do the exact opposite when decoding the encoded data back to a float array. The decode routine will provide the decoded data as an unsigned char array, and I will need to reinterpret that array of bytes, converting it to a float vector again.
Here is a stripped down version of my C++ code to do the encoding:
std::string
EncodeBase64FloatVector( const vector<float>& p_vector )
{
unsigned char* sourceArray;
// SOMEHOW FILL THE sourceArray WITH THE FLOAT VECTOR DATA BITS!!
char* target;
size_t targetSize = p_vector.size() * sizeof(float);
target = new char[ targetSize ];
int result = EncodeBase64( sourceArray, floatArraySizeInUChars, target, targetSize );
string returnResult;
if( result != -1 )
{
returnResult = target;
}
delete target;
delete sourceArray;
return returnResult;
}
Any help would be greatly appreciated. Thanks.
Raymond.
std::vector guarantees the data will be contiguous, and you can get a pointer to the first element in the vector by taking the address of the first element (assuming it's not empty).
typedef unsigned char byte;
std::vector<float> original_data;
...
if (!original_data.empty()) {
const float *p_floats = &(original_data[0]); // parens for clarity
Now, to treat that as an array of unsigned char, you use a reinterpret_cast:
const byte *p_bytes = reinterpret_cast<const byte *>(p_floats);
// pass p_bytes to your base-64 encoder
}
You might want to encode the length of the vector before the rest of the data, in order to make it easier to decode.
CAUTION: You still have to worry about endianness and representation details. This will only work if you read back on the same platform (or a compatible one) that you wrote with.
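For the decoding direction, copying the bytes back with memcpy avoids any alignment concerns about the decoded buffer. The following round-trip sketch uses hypothetical helper names (floats_to_bytes, bytes_to_floats) and, as cautioned above, assumes both sides run on the same or a compatible platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copies the raw bytes of the floats into a byte vector.
std::vector<unsigned char> floats_to_bytes(const std::vector<float>& values)
{
    std::vector<unsigned char> bytes(values.size() * sizeof(float));
    if (!values.empty()) {
        std::memcpy(bytes.data(), values.data(), bytes.size());
    }
    return bytes;
}

// Rebuilds the float vector from bytes produced by floats_to_bytes.
std::vector<float> bytes_to_floats(const std::vector<unsigned char>& bytes)
{
    assert(bytes.size() % sizeof(float) == 0);
    std::vector<float> values(bytes.size() / sizeof(float));
    if (!bytes.empty()) {
        std::memcpy(values.data(), bytes.data(), bytes.size());
    }
    return values;
}
```

The byte vector returned by floats_to_bytes is what would be handed to the Base64 encoder, and the decoder's output would go through bytes_to_floats.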
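const unsigned char* sourceArray = reinterpret_cast<const unsigned char*>(&(p_vector[0]));
Note that sourceArray has to be declared const here, and it must not be deleted afterwards, because the vector still owns that memory.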
I would highly recommend checking out Google's protobuf to solve your problem. Floats and doubles can vary in size and layout between platforms and that package has solved all those problems for you. Additionally, it can easily handle your data structure should it ever become more complicated than a simple array of floats.
If you do use that, you will have to do your own base64 encoding still as protobuf encodes data assuming you have an 8-bit clean channel to work with. But that's fairly trivial.