C++: compress with libzpaq to a char* buffer

I have a std::vector<short> and would like to compress (and later decompress) with the libzpaq from https://github.com/zpaq/zpaq/ to something like char* buffer.
However I don't get the concept of this Reader and Writer class mentioned in the header file. How do I put my std::vector in to get a compressed buffer out?
Currently I have something like the following code.
#include <vector>
#include <string>
#include <stdio.h>
#include "libzpaq.h"
struct writer: public libzpaq::Writer {
void put(int c) {
}
};
struct reader: public libzpaq::Reader {
int get() {
}
};
void libzpaq::error(const char* msg) {
fprintf(stderr, "Oops: %s\n", msg);
exit(1);
}
int main() {
short a[] = {2,5,8,2,4,2,2,2,6,5,4,3,4,2,2};
std::vector<short> v(a, a+15);
char* buffer;
reader in;
writer out;
libzpaq::compress(&in, &out, "5");
}
I want to compress the vector v into buffer (and later decompress it again), but I don't understand the concept of the Reader and Writer classes.
The documentation (http://mattmahoney.net/dc/libzpaq.3.html) also mentions the functions virtual int read(char* buf, int n) and virtual void write(const char* buf, int n) for Reader and Writer. How can I cast a std::vector<short> to char* buf and get the length n of this buffer in bytes?
Edit 1: I found a class StringBuffer in libzpaq.h line 1376. But something like
buffer = reinterpret_cast<char*> (&v[0]);
length = sizeof(short)*v.size();
libzpaq::StringBuffer inString, outString;
inString.read(buffer, length);
libzpaq::compress(&inString, &outString, "5");
std::cout << "size outstring: " << outString.size() << std::endl;
std::cout << "size instring: " << inString.size() << std::endl;
always gives me
size outstring: 0
size instring: 0
Even if I try it with a much larger vector v of some thousand random elements.

With Reader you provide byte by byte access to the data you want to compress. So with std::vector<short> it would look like this.
struct reader : public libzpaq::Reader {
reader(const std::vector<short>& v) :
m_v(v),
m_offset(0) {
}
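int get() {
if (m_offset < m_v.size() * sizeof (short)) {
// Go through unsigned char so every byte is returned as 0..255;
// a negative char value of -1 would be mistaken for end of input.
return *((const unsigned char*) m_v.data() + m_offset++);
} else {
return -1; // end of input
}
}
size_t m_offset;
std::vector<short> m_v;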
};
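A minimal standalone sketch of the same byte-by-byte idea, written without libzpaq so that it compiles on its own. The names byte_source and count_bytes are hypothetical and not part of the library; get() follows the convention used above (a value in 0..255 for each byte, -1 at the end of input). The sketch reads the bytes through unsigned char because a plain char may be signed, in which case a byte 0xFF would come out as -1 and look like the end of input.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a libzpaq::Reader subclass: hands out the
// bytes of a std::vector<short> one at a time.
struct byte_source {
    explicit byte_source(const std::vector<short>& v) : m_v(v), m_offset(0) {}

    // Returns the next byte as a value in 0..255, or -1 when exhausted.
    int get() {
        const std::size_t total = m_v.size() * sizeof(short);
        assert(m_offset <= total);  // invariant: never past the end
        if (m_offset == total) {
            return -1;
        }
        const unsigned char* bytes =
            reinterpret_cast<const unsigned char*>(m_v.data());
        return bytes[m_offset++];
    }

    const std::vector<short>& m_v;  // not owned; must outlive the source
    std::size_t m_offset;           // index of the next byte to return
};

// Drains a source and counts the bytes it produced.
inline std::size_t count_bytes(byte_source& src) {
    std::size_t n = 0;
    while (src.get() != -1) {
        ++n;
    }
    return n;
}
```

Unlike the class above, this sketch keeps a reference to the vector instead of a copy, so the vector has to stay alive for as long as the source is in use.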
The Writer collects the compressed output. If you want to collect it in a char array, I would recommend doing it like this.
struct writer : public libzpaq::Writer {
void put(int c) {
m_buffer.push_back(c);
}
size_t size() const {
return m_buffer.size();
}
void copy_to(char* dst) {
memcpy(dst, m_buffer.data(), m_buffer.size());
}
std::vector<char> m_buffer;
};
Then call it:
writer w;
reader r(v); // v is the std::vector<short> from the question
libzpaq::compress(&r, &w, "5");
char* buffer = new char[w.size()];
w.copy_to(buffer);
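If you want to use StringBuffer, you have to write the source data into it before compressing. In your snippet, inString.read(buffer, length) reads from the still-empty StringBuffer instead of filling it, which is why both sizes are 0. Look at this example: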
char* buffer = reinterpret_cast<char*> (&v[0]);
int length = sizeof (short)*v.size();
libzpaq::StringBuffer in, out1, out2;
// fill buffer with source data
in.write(buffer, length);
// compress to out1
libzpaq::compress(&in, &out1, "5");
// decompress out1 to out2
libzpaq::decompress(&out1, &out2);
// check result
short* b = (short*)out2.data();
for(int i = 0; i < 15; ++i) {
std::cout << b[i] << std::endl;
}
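The check above hard-codes 15 elements. The following sketch shows the conversion in both directions, assuming the decompressed bytes are available as a pointer and a length. The helper names to_bytes and from_bytes are hypothetical. It copies with memcpy rather than casting the pointer to short*, so the byte buffer does not need any particular alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copies the raw bytes of a vector<short> into a byte buffer.
inline std::vector<char> to_bytes(const std::vector<short>& v) {
    std::vector<char> bytes(v.size() * sizeof(short));
    if (!bytes.empty()) {
        std::memcpy(bytes.data(), v.data(), bytes.size());
    }
    return bytes;
}

// Rebuilds a vector<short> from n raw bytes. Trailing bytes that do not
// form a whole short are ignored.
inline std::vector<short> from_bytes(const char* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::vector<short> v(n / sizeof(short));
    if (!v.empty()) {
        std::memcpy(v.data(), data, v.size() * sizeof(short));
    }
    return v;
}
```

The element count then follows from the byte count (n / sizeof(short)), so nothing about the original vector size has to be known in advance. Both sides must use the same byte order and the same sizeof(short).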

Related

How do I deserialise a const byte * to a structure in cpp?

I have a structure like this
struct foo {
string str1;
uint16_t int1;
string str2;
uint32_t int2;
string str3;
};
The strings str1, str2 and str3 have fixed lengths of 12 bytes, 3 bytes, etc., and are left-padded with spaces.
I have a function
void func(const byte* data, const size_t len) which is supposed to convert the bytes in data to a foo structure; len is the length of data. What are the ways in which I can do this?
Again, data is a const pointer to byte, and there are no null characters in between to separate the members.
Should I use character arrays instead of string for str1, str2 and str3?
The easiest (but most error-prone) way is to just reinterpret_cast / std::memcpy, if the strings have a fixed length:
// no padding
#pragma pack(push, 1)
struct foo {
char str1[12];
uint16_t int1;
char str2[3];
uint32_t int2;
char str3[4];
};
#pragma pack(pop)
void func(const byte* data, const size_t len) {
assert(len == sizeof(foo));
// non owning
const foo* reinterpreted = reinterpret_cast<const foo*>(data);
// owning
foo reinterpreted_val = *reinterpret_cast<const foo*>(data);
foo copied;
memcpy(&copied, data, len);
}
Notes:
Make sure that you're allowed to use reinterpret_cast here:
https://en.cppreference.com/w/cpp/language/reinterpret_cast#Type_aliasing
If you try to use strlen or another string operation on any of the strings, you will most likely get UB, since the strings are not null-terminated.
Slightly better approach:
struct foo {
char str1[13];
uint16_t int1;
char str2[4];
uint32_t int2;
char str3[5];
};
void func(const char* data, const size_t len) {
foo f;
memcpy(f.str1, data, 12);
f.str1[12] = '\0';
data+=12;
memcpy(&f.int1, data, sizeof(uint16_t));
data+=sizeof(uint16_t);
memcpy(f.str2, data, 3);
f.str2[3] = '\0';
data+=3;
memcpy(&f.int2, data, sizeof(uint32_t));
data+=sizeof(uint32_t);
memcpy(f.str3, data, 4);
f.str3[4] = '\0';
data+=4;
}
Notes:
You could combine both approaches to get rid of the pointer arithmetic. That would also account for any padding in your struct you might have.
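If the struct should keep its std::string members, the fields can also be read one at a time through a small bounds-checked cursor. This is only a sketch under assumptions: field_reader and parse_foo are hypothetical names, byte is taken to be unsigned char, str3 is assumed to be 4 bytes wide (the question only says "etc."), and the integers are assumed to be stored in the byte order of the host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>

struct foo {
    std::string str1;    // 12 bytes in the input
    std::uint16_t int1;
    std::string str2;    // 3 bytes in the input
    std::uint32_t int2;
    std::string str3;    // 4 bytes in the input (assumed width)
};

// Hypothetical cursor over the input; every read is bounds-checked.
class field_reader {
public:
    field_reader(const unsigned char* data, std::size_t len)
        : m_data(data), m_len(len), m_pos(0) {
        assert(data != nullptr || len == 0);
    }

    // Reads a fixed-width text field and strips the left padding.
    std::string text(std::size_t width) {
        require(width);
        std::string s(reinterpret_cast<const char*>(m_data + m_pos), width);
        m_pos += width;
        s.erase(0, s.find_first_not_of(' '));
        return s;
    }

    // Reads an integer in host byte order (an assumption of this sketch).
    template <typename T>
    T number() {
        require(sizeof(T));
        T value;
        std::memcpy(&value, m_data + m_pos, sizeof(T));
        m_pos += sizeof(T);
        return value;
    }

private:
    void require(std::size_t n) const {
        if (m_len - m_pos < n) throw std::length_error("input too short");
    }
    const unsigned char* m_data;
    std::size_t m_len;
    std::size_t m_pos;
};

inline foo parse_foo(const unsigned char* data, std::size_t len) {
    field_reader in(data, len);
    foo f;
    f.str1 = in.text(12);
    f.int1 = in.number<std::uint16_t>();
    f.str2 = in.text(3);
    f.int2 = in.number<std::uint32_t>();
    f.str3 = in.text(4);
    return f;
}
```

Because each field is copied out separately, struct padding and alignment of the input pointer do not matter, and a short input raises an exception instead of reading past the end.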
I think the easiest way to do this is to change the strings inside the structure to char arrays. Then you can copy objects of this structure according to their size.
You will still have to deal with the byte order if the data is exchanged between machines with different byte orders.
struct foo {
char str1[12];
uint16_t int1;
char str2[3];
uint32_t int2;
char str3[5];
};
byte* Encode(foo* p, int Size) {
int FullSize = Size * sizeof(foo);
byte* Returner = new byte[FullSize];
memcpy_s(Returner, FullSize, p, FullSize);
return Returner;
}
foo * func(const byte* data, const size_t len) {
int ArrSize = len/sizeof(foo);
if (!ArrSize || (ArrSize* sizeof(foo)!= len))
return nullptr;
foo* Returner = new foo[ArrSize];
memcpy_s(Returner, len, data, len);
return Returner;
}
int main()
{
const size_t ArrSize = 3;
foo Test[ArrSize] = { {"Test1",1000,"TT",2000,"cccc"},{"Test2",1001,"YY",2001,"vvvv"},{"Test1",1002,"UU",2002,"bbbb"}};
foo* Test1 = nullptr;
byte* Data = Encode(Test, ArrSize);
Test1 = func(Data, ArrSize * sizeof(foo));
if (Test1 == nullptr) {
std::cout << "Error extracting data!" << std::endl;
delete [] Data;
return -1;
}
std::cout << Test1[0].str1 << " " << Test1[1].str1 << " " << Test1[2].str3 << std::endl;
delete [] Data;
delete[] Test1;
return 0;
}
output
Test1 Test2 bbbb
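The byte-order caveat mentioned in this answer can be handled by decoding the integers byte by byte instead of copying them as a block. A sketch, assuming the data stores integers least-significant byte first (the question does not say which order is used); load_le16, load_le32 and store_le32 are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Decodes a 16-bit unsigned integer stored least-significant byte first.
inline std::uint16_t load_le16(const unsigned char* p) {
    assert(p != nullptr);
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Decodes a 32-bit unsigned integer stored least-significant byte first.
inline std::uint32_t load_le32(const unsigned char* p) {
    assert(p != nullptr);
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Encodes a 32-bit unsigned integer least-significant byte first.
inline void store_le32(std::uint32_t v, unsigned char* p) {
    assert(p != nullptr);
    p[0] = static_cast<unsigned char>(v & 0xFF);
    p[1] = static_cast<unsigned char>((v >> 8) & 0xFF);
    p[2] = static_cast<unsigned char>((v >> 16) & 0xFF);
    p[3] = static_cast<unsigned char>((v >> 24) & 0xFF);
}
```

These functions give the same result on little-endian and big-endian machines, because they only rely on arithmetic shifts and never on how the host lays out an integer in memory.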

C++ Exception "Access violation reading location"

#include <iostream>
#include <fstream>
using namespace std;
struct review {
string text;
string date;
};
void getRegistry(int i) {
review* reg = new review;
ifstream file;
file.open("test.txt", ios::binary);
if (file) {
file.seekg(i * sizeof(review), ios::beg);
file.read(reinterpret_cast<char*>(reg), sizeof(review));
cout << reg->text;
file.close();
}
delete reg;
}
void generateBinary()
{
ofstream arq("test.txt", ios::binary);
review x;
x.text = "asdasdasd";
x.date = "qweqweqwe";
for (int i = 1; i <= 1000000; i++)
{
arq.write(reinterpret_cast<const char*>(&x), sizeof(review));
}
arq.close();
}
int main() {
generateBinary();
getRegistry(2);
return 0;
}
Hello, I'm trying to make a program which writes several "reviews" to a binary file, then reads a certain registry. The program seems to work, but, in the end, it always throws an exception: "Exception thrown at 0x00007FF628E58C95 in trabalho.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF." How can I solve this? Thank you!
The problem is that you can't read/write std::string objects the way you are. std::string holds a pointer to variable-length character data that is stored elsewhere in memory. Your code is not accounting for that fact.
To be able to seek to a specific object in a file of objects the way you are attempting, you have to use fixed-sized objects, eg:
#include <iostream>
#include <fstream>
#include <cstring>
using namespace std;
struct review {
char text[12];
char date[12];
};
void getRegistry(int i) {
ifstream file("test.txt", ios::binary);
if (file) {
if (!file.seekg(i * sizeof(review), ios::beg)) throw ...;
review reg;
if (!file.read(reinterpret_cast<char*>(&reg), sizeof(reg))) throw ...;
cout << reg.text;
}
}
void generateBinary()
{
ofstream arq("test.txt", ios::binary);
review x = {};
strncpy(x.text, "asdasdasd", sizeof(x.text)-1);
strncpy(x.date, "qweqweqwe", sizeof(x.date)-1);
for (int i = 1; i <= 1000000; ++i) {
if (!arq.write(reinterpret_cast<char*>(&x), sizeof(x))) throw ...;
}
}
int main() {
generateBinary();
getRegistry(2);
return 0;
}
Otherwise, to deal with variable-length data, you need to (de)serialize each object instead, eg:
#include <iostream>
#include <fstream>
#include <cstdint>
using namespace std;
struct review {
string text;
string date;
};
string readStr(istream &is) {
string s;
uint32_t len;
if (!is.read(reinterpret_cast<char*>(&len), sizeof(len))) throw ...;
if (len > 0) {
s.resize(len);
if (!is.read(s.data(), len)) throw ...;
}
return s;
}
void skipStr(istream &is) {
uint32_t len;
if (!is.read(reinterpret_cast<char*>(&len), sizeof(len))) throw ...;
if (len > 0) {
if (!is.ignore(len)) throw ...;
}
}
void writeStr(ostream &os, const string &s) {
uint32_t len = s.size();
if (!os.write(reinterpret_cast<char*>(&len), sizeof(len))) throw ...;
if (!os.write(s.c_str(), len)) throw ...;
}
review readReview(istream &is) {
review r;
r.text = readStr(is);
r.date = readStr(is);
return r;
}
void skipReview(istream &is) {
skipStr(is);
skipStr(is);
}
void writeReview(ostream &os, const review &r) {
writeStr(os, r.text);
writeStr(os, r.date);
}
void getRegistry(int i) {
ifstream file("test.txt", ios::binary);
if (file) {
while (i--) skipReview(file);
review reg = readReview(file);
cout << reg.text;
}
}
void generateBinary()
{
ofstream arq("test.txt", ios::binary);
review x;
x.text = "asdasdasd";
x.date = "qweqweqwe";
for (int i = 1; i <= 1000000; ++i) {
writeReview(arq, x);
}
}
int main() {
generateBinary();
getRegistry(2);
return 0;
}
sizeof(review) does not include the length of the strings the struct contains. The string class holds a pointer to the actual character data, which is allocated dynamically in a separate location in memory. You should write each string explicitly, together with its length, instead of writing the struct as raw bytes. The same applies when reading from the file: read the strings first, then assign them to the review.
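The advice above (write the length, then the characters, and read them back in the same order) can be tried out in memory with a std::stringstream before touching any file. This is a sketch with hypothetical helper names (write_string, read_string, round_trip); the 32-bit length prefix in host byte order is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <sstream>
#include <stdexcept>
#include <string>

// Writes a 32-bit length followed by the characters of s.
inline void write_string(std::ostream& os, const std::string& s) {
    assert(s.size() <= std::numeric_limits<std::uint32_t>::max());
    const std::uint32_t len = static_cast<std::uint32_t>(s.size());
    os.write(reinterpret_cast<const char*>(&len), sizeof(len));
    os.write(s.data(), len);
    if (!os) throw std::runtime_error("write failed");
}

// Reads a string that was written by write_string.
inline std::string read_string(std::istream& is) {
    std::uint32_t len = 0;
    if (!is.read(reinterpret_cast<char*>(&len), sizeof(len)))
        throw std::runtime_error("cannot read length");
    std::string s(len, '\0');
    if (len > 0 && !is.read(&s[0], len))
        throw std::runtime_error("cannot read characters");
    return s;
}

// Serializes s into an in-memory stream and reads it back.
inline std::string round_trip(const std::string& s) {
    std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
    write_string(ss, s);
    return read_string(ss);
}
```

The same two functions work unchanged on an ofstream and ifstream opened in binary mode, since they only depend on the std::ostream and std::istream interfaces.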

C++ object containing an array of char using unique_ptr

I am looking for a way to use unique_ptr to allocate a structure that contains a char array whose size in bytes is set dynamically, in order to support different types of messages.
Assuming:
struct MyMessage
{
uint32_t id;
uint32_t data_size;
char data[4];
};
How can I convert send_message() below to use a smart pointer?
void send_message(void* data, const size_t data_size)
{
const auto message_size = sizeof(MyMessage) - 4 + data_size;
const auto msg = reinterpret_cast<MyMessage*>(new char[message_size]);
msg->id = 3;
msg->data_size = data_size;
memcpy(msg->data, data, data_size);
// Sending the message
// ...
delete[] msg;
}
My attempt to use a smart pointer with the code below does not compile:
const auto message_size = sizeof(MyMessage) - 4 + data_size;
const auto msg = std::unique_ptr<MyMessage*>(new char[message_size]);
Below is a complete working example:
#include <iostream>
#include <iterator>
#include <memory>
using namespace std;
struct MyMessage
{
uint32_t id;
uint32_t data_size;
char data[4];
};
void send_message(void* data, const size_t data_size)
{
const auto message_size = sizeof(MyMessage) - 4 + data_size;
const auto msg = reinterpret_cast<MyMessage*>(new char[message_size]);
if (msg == nullptr)
{
throw std::domain_error("Not enough memory to allocate space for the message to sent");
}
msg->id = 3;
msg->data_size = data_size;
memcpy(msg->data, data, data_size);
// Sending the message
// ...
delete[] msg;
}
struct MyData
{
int page_id;
char point_name[8];
};
void main()
{
try
{
MyData data{};
data.page_id = 7;
strcpy_s(data.point_name, sizeof(data.point_name), "ab332");
send_message(&data, sizeof(data));
}
catch (std::exception& e)
{
std::cout << "Error: " << e.what() << std::endl;
}
}
The data type that you pass to delete[] needs to match what new[] returns. In your example, you are new[]ing a char[] array, but are then delete[]ing a MyMessage object instead. That will not work.
The simple fix would be to change this line:
delete[] msg;
To this instead:
delete[] reinterpret_cast<char*>(msg);
However, you should use a smart pointer to manage the memory for you. The pointer that you give to std::unique_ptr needs to match the template parameter that you specify. In your example, you are declaring a std::unique_ptr whose template parameter is MyMessage*, so the constructor is expecting a MyMessage**, but you are passing it a char* instead.
Try this instead:
// if this struct is being sent externally, consider
// setting its alignment to 1 byte, and setting the
// size of the data[] member to 1 instead of 4...
struct MyMessage
{
uint32_t id;
uint32_t data_size;
char data[4];
};
void send_message(void* data, const size_t data_size)
{
const auto message_size = offsetof(MyMessage, data) + data_size;
std::unique_ptr<char[]> buffer = std::make_unique<char[]>(message_size);
MyMessage *msg = reinterpret_cast<MyMessage*>(buffer.get());
msg->id = 3;
msg->data_size = data_size;
std::memcpy(msg->data, data, data_size);
// Sending the message
// ...
}
Or this:
using MyMessage_ptr = std::unique_ptr<MyMessage, void(*)(MyMessage*)>;
void send_message(void* data, const size_t data_size)
{
const auto message_size = offsetof(MyMessage, data) + data_size;
MyMessage_ptr msg(
reinterpret_cast<MyMessage*>(new char[message_size]),
[](MyMessage *m){ delete[] reinterpret_cast<char*>(m); }
);
msg->id = 3;
msg->data_size = data_size;
std::memcpy(msg->data, data, data_size);
// Sending the message
// ...
}
This should work, but it is still not clear whether accessing msg->data out of bounds is legal (at least it is no worse than in your original code):
const auto message_size = sizeof(MyMessage) + ( data_size < 4 ? 0 : data_size - 4 );
auto rawmsg = std::make_unique<char[]>( message_size );
auto msg = new (rawmsg.get()) MyMessage;
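A different technique from the unique_ptr variants above, shown only as a sketch: if the message just has to end up as a contiguous block of bytes, it can be assembled in a std::vector<char> with memcpy, which sidesteps both the manual delete[] and the question of accessing data out of bounds. The name build_message and the layout (id, data_size, then the payload, all in host byte order and without padding) are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed layout: id, data_size, then data_size payload bytes.
constexpr std::size_t header_size = 2 * sizeof(std::uint32_t);

// Builds the complete message in a buffer that frees itself.
inline std::vector<char> build_message(std::uint32_t id,
                                       const void* data,
                                       std::size_t data_size) {
    assert(data != nullptr || data_size == 0);
    std::vector<char> msg(header_size + data_size);
    const std::uint32_t size32 = static_cast<std::uint32_t>(data_size);
    std::memcpy(msg.data(), &id, sizeof(id));
    std::memcpy(msg.data() + sizeof(id), &size32, sizeof(size32));
    if (data_size > 0) {
        std::memcpy(msg.data() + header_size, data, data_size);
    }
    return msg;
}
```

The resulting msg.data() and msg.size() can be handed to whatever send function is in use, and the memory is released when the vector goes out of scope.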

Store more data in iov in C++

I am writing a C++ program (see below). My goal is to store data in an iovec struct. I allocate a buffer of fixed length in the constructor. Every time that buffer fills up, I want to hand its data over to iov and allocate a new buffer of fixed length. Finally, when the data processing is done, I want to return the iov struct. My intention is to keep all of this data in iov so that I can easily send it later if required. I have written sample code, but it does not work: I get a "Bus error: 10". Can someone help me?
Sample code:
#include <iostream>
#include <string>
#include <sys/uio.h>
#include <cstdlib>
using namespace std;
#define MAX_LEN 1000
#define MIN_LEN 20
class MyClass
{
public:
MyClass();
~MyClass();
void fillData(std::string &data);
private:
struct iovec *iov;
unsigned int count;
unsigned int len;
char *buf;
unsigned int total_len;
unsigned int tmp_len;
};
MyClass::MyClass()
{
cout << "Inside constructor" << endl;
total_len = MIN_LEN;
buf = (char *)malloc(MAX_LEN);
if (buf == NULL) {
cout << "Error: can’t allocate buf" << endl;
exit(EXIT_FAILURE);
}
}
MyClass::~MyClass()
{
free(buf);
}
void MyClass::fillData(std::string &data)
{
unsigned int d_len, tmp_len, offset;
d_len = data.size();
const char* t = data.c_str();
total_len += d_len;
tmp_len += d_len;
if (total_len > MAX_LEN) {
/* Allocate memory and assign to iov */
tmp_len = d_len;
}
memcpy(buf + offset, t, d_len);
/* Adjust offset */
}
int main()
{
MyClass my_obj;
int i;
std::string str = "Hey, welcome to my first class!";
for (i = 0; i < 10; i++) {
my_obj.fillData(str);
}
return 0;
}
Without understanding the intent of your program in detail, it is very clear that you forgot to reserve memory for the iov objects themselves.
For example, in your constructor you write iov[0].iov_base = buf, yet iov has not been allocated before.
To overcome this, somewhere in your code, before the first access to iov, you should write something like iov = (struct iovec *)calloc(100, sizeof(struct iovec)) or a C++ equivalent using new[].
Consider the following program:
struct myStruct {
const char *buf; // const, because it is pointed at string literals below
int len;
};
int main() {
struct myStruct *myStructPtr;
myStructPtr->buf = "Herbert"; // Illegal, since myStructPtr is not initialized; So even if "Herbert" is valid, there is no place to store the pointer to literal "Herbert".
myStructPtr[0].buf = "Herbert"; // Illegal, since myStructPtr is not initialized
// but:
struct myStruct *myStructObj = new (struct myStruct);
myStructObj->buf = "Herbert"; // OK, because myStructObj can store the pointer to literal "Herbert"
myStructObj->buf = "Something else"; // OK; myStructObj can hold a pointer, so just let it point to a different portion of memory. No need for an extra "new (struct myStruct)" here
}
I took your code, which didn't actually use the iovec for anything, and modified it a little.
I am not sure why developers prefer char* buffers over std::string, or a pointer that has to be allocated and then deleted over a std::vector.
I also added a function that uses the iovec, void MyClass::print_data(). It prints all the data referenced by the vector of iovecs.
#include <iostream>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>
#include <unistd.h>
#include <sys/uio.h>
using namespace std;
class MyClass
{
vector<struct iovec> iovs;
vector<string> bufs;
public:
MyClass();
~MyClass();
void fill_data(const string &data);
void print_data();
};
MyClass::MyClass()
{
cout << "Inside constructor" << endl;
}
MyClass::~MyClass()
{
}
void MyClass::fill_data(const string &data)
{
stringstream stream;
stream << setw(2) << setfill(' ') << (this->bufs.size() + 1) << ". "
<< data << endl;
this->bufs.push_back(stream.str());
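// Note: each iovec points into a string owned by bufs. This relies on the
// character data staying put when bufs grows, which is not guaranteed for
// short strings that are stored inside the string object itself.
iovec iov = {&(bufs.back()[0]), bufs.back().size()};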
this->iovs.push_back(iov);
}
void MyClass::print_data() {
writev(STDOUT_FILENO, iovs.data(), iovs.size());
}
int main() {
MyClass my_obj;
string str = "Hey, welcome to my first class!";
for (int i = 0; i < 10; ++i) {
my_obj.fill_data(str);
}
my_obj.print_data();
return 0;
}
Compile it like so: g++ test.cpp
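A sketch of the "one fixed block per chunk" idea from the question, assuming a POSIX system for struct iovec. The class name chunk_store and its members are hypothetical. Each chunk lives in its own heap block owned by a unique_ptr, so the addresses recorded in the iovec entries stay valid even when the vectors themselves grow.

```cpp
#include <sys/uio.h>  // struct iovec (POSIX)

#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container: owns the chunks and exposes them as iovecs.
class chunk_store {
public:
    // Copies data into its own heap block and records an iovec for it.
    void add(const std::string& data) {
        std::unique_ptr<char[]> block(new char[data.size()]);
        data.copy(block.get(), data.size());
        iovec iov;
        iov.iov_base = block.get();
        iov.iov_len = data.size();
        m_blocks.push_back(std::move(block));
        m_iovs.push_back(iov);
    }

    // The iovec array, e.g. for writev(fd, iovs().data(), iovs().size()).
    const std::vector<iovec>& iovs() const { return m_iovs; }

    // Sum of the lengths of all stored chunks.
    std::size_t total_bytes() const {
        assert(m_blocks.size() == m_iovs.size());
        std::size_t total = 0;
        for (const iovec& iov : m_iovs) {
            total += iov.iov_len;
        }
        return total;
    }

private:
    std::vector<std::unique_ptr<char[]>> m_blocks;  // owns the memory
    std::vector<iovec> m_iovs;                      // points into m_blocks
};
```

The iovec entries are only valid while the chunk_store that produced them is alive, so any call that consumes them has to happen before the store is destroyed.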

Get a char* from an ostream without copying

I have an ostream and data has been written to it. Now I want that data in the form of a char array. Is there a way to get the char buffer and its size without copying all of the bytes? I mean, I know I can use ostringstream and call str().c_str() on it but that produces a temporary copy.
I guess this is what you're looking for - a stream buffer that returns a pointer to its buffer:
#include <iostream>
#include <vector>
#include <string>
class raw_buffer : public std::streambuf
{
public:
raw_buffer(std::ostream& os, int buf_size = 256);
int_type overflow(int_type c) override;
std::streamsize showmanyc() override;
std::streamsize xsputn(const char_type*, std::streamsize) override;
int sync() override;
bool flush();
std::string const& str() const;
private:
std::ostream& os_;
std::vector<char> buffer;
std::string aux;
};
Now str() is simple. It returns a const reference to the auxiliary buffer:
std::string const& raw_buffer::str() const
{
return aux;
}
The rest of the functions are the usual implementations for a stream buffer. showmanyc() should return the size of the auxiliary buffer (aux accumulates everything written so far, whereas buffer has the fixed size specified at construction).
For example, here is overflow(), which should update both buffers at same time but still treat buffer as the primary buffer:
raw_buffer::int_type raw_buffer::overflow(raw_buffer::int_type c) // no "override" on an out-of-class definition
{
if (os_ && !traits_type::eq_int_type(c, traits_type::eof()))
{
aux += *this->pptr() = traits_type::to_char_type(c);
this->pbump(1);
if (flush())
{
this->pbump(-(this->pptr() - this->pbase()));
this->setp(this->buffer.data(),
this->buffer.data() + this->buffer.size());
return c;
}
}
return traits_type::eof();
}
flush() is used to copy the contents of buffer to the stream (os_), and sync() should be overridden to call flush() too.
xsputn also needs to be overridden to write to aux as well:
std::streamsize raw_buffer::xsputn(const raw_buffer::char_type* str, std::streamsize count) // no "override" on an out-of-class definition
{
for (int i = 0; i < count; ++i)
{
if (traits_type::eq_int_type(this->sputc(str[i]), traits_type::eof()))
return i;
else
aux += str[i];
}
return count;
}
Now we can put this together with a customized stream:
class raw_ostream : private virtual raw_buffer
, public std::ostream
{
public:
raw_ostream(std::ostream& os) : raw_buffer(os)
, std::ostream(this)
{ }
std::string const& str() const
{
return this->raw_buffer::str();
}
std::streamsize count()
{
return this->str().size();
}
};
It can be used like this:
int main()
{
raw_ostream rostr(std::cout);
rostr << "Hello, World " << 123 << true << false;
auto& buf = rostr.str();
std::cout << buf;
}
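A smaller sketch of the same idea, for the case where the data does not need to be forwarded to another stream: a stream buffer that appends everything to a std::vector<char> and exposes the pointer and size directly. The class name vector_buffer is hypothetical, and this is a simplified alternative, not the raw_buffer class from above.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <vector>

// Hypothetical stream buffer that stores all output in a vector<char>.
class vector_buffer : public std::streambuf {
public:
    // Direct access to the collected bytes, without copying them.
    const char* data() const { return m_bytes.data(); }
    std::size_t size() const { return m_bytes.size(); }

protected:
    // Called for single characters, since no put area is set up.
    int_type overflow(int_type c) override {
        if (!traits_type::eq_int_type(c, traits_type::eof())) {
            m_bytes.push_back(traits_type::to_char_type(c));
        }
        return traits_type::not_eof(c);
    }

    // Called for character sequences; appends them in one step.
    std::streamsize xsputn(const char_type* s, std::streamsize n) override {
        assert(n >= 0);
        m_bytes.insert(m_bytes.end(), s, s + n);
        return n;
    }

private:
    std::vector<char> m_bytes;
};
```

It is used by constructing a std::ostream on top of it, for example vector_buffer buf; std::ostream os(&buf); and then writing to os. The pointer returned by data() is only valid until the next write, because the vector may reallocate when it grows.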