LZ4 not compressing strings correctly - c++

Recently, I tried to integrate lossless data compression into my game engine for loading assets, but this simple compression example does not seem to work correctly. Any suggestions? Here is my code:
const char *srcData = "Hi ! This is a really really really long test string !";
const int dstBufferSize = LZ4_compressBound(sizeof(srcData));
char *dstData = new char[dstBufferSize];
int bytesPassed = LZ4_compress_default(srcData, dstData,
sizeof(srcData),
dstBufferSize); // compress data
BOOST_LOG_TRIVIAL(info) << dstData << std::endl; // print compressed data
delete[] dstData;
This is the output. As you can see, it's wrong (part of the string is missing):
[2016-02-24 15:56:47.986366] [0x00000b0c] [info] #Hi !═══════════════²²²²À▀WÏÇ0
EDIT
When decompressing the data, only the 'Hi' part appears; the rest is random characters or nothing at all.
EDIT 2 After Simon's suggestion, I changed the code, but after decompressing the data I only get Hi ! (nothing after it). Here is the updated code:
const char *srcData = "Hi ! This is a really really really long test string !";
const int dstBufferSize = LZ4_compressBound(strlen(srcData) + 1);
char *dstData = new char[dstBufferSize];
int bytesPassed = LZ4_compress_default(srcData, dstData,
sizeof(srcData),
dstBufferSize);
BOOST_LOG_TRIVIAL(info) << dstData << std::endl;
std::ofstream fWriter("test.bin", std::ofstream::binary);
fWriter << dstData;
fWriter.close();
char* decStr = new char[strlen(srcData) + 1];
LZ4_decompress_fast(dstData, decStr, strlen(srcData) + 1);
std::cout << decStr << std::endl; // only Hi appearing
delete[] dstData;

You are using sizeof(srcData), which gives you the size of the pointer and not of the data it points to.
You should use strlen(srcData)+1 instead (+1 for the \0).
Or use std::string and std::string::size() (also with +1 for the null terminator).
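To make the difference concrete, here is a minimal sketch that does not depend on LZ4 at all. The helper names payload_size and pointer_size are hypothetical and only illustrate which number each expression produces. Note that the updated code in the question still passes sizeof(srcData) as the source size to LZ4_compress_default, so only a pointer's worth of bytes is compressed; on a 32-bit build that would be 4 bytes, which is consistent with only "Hi !" surviving.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Number of bytes to hand to a compressor for a C string, including the
// terminating '\0' (hypothetical helper, not part of LZ4).
std::size_t payload_size(const char* text)
{
    assert(text != nullptr);
    return std::strlen(text) + 1;
}

// Same idea for std::string: size() excludes the terminator, so add 1.
std::size_t payload_size(const std::string& text)
{
    return text.size() + 1;
}

// What the question's code measured: the size of the pointer itself,
// which is independent of the string it points to.
std::size_t pointer_size(const char* text)
{
    return sizeof(text);
}
```

With the question's string, payload_size returns 55 (54 characters plus the terminator), while pointer_size returns sizeof(char*) regardless of the text.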

Related

How to convert a SINGLE char to system string C++?

I'm working on a program that displays the following chars/integers in a console app.
The code that I wrote works in the console app, but doesn't work in a form.
I also want to display these values in my form (textBox->Text).
My myfunctions.h file:
typedef struct{
char Header [23]; //the char I want to display is 23 characters long
int Version [4]; //4 characters...
char Projectname [21];
char Developer [8];
char email [16];
char Description [132];
char Unknown [1];
}PackedContent;
void BinaryReader(){
system("CLS");
PackedContent t;
fstream myFile;
myFile.open("PackedContent.opf");
if(!myFile){
cout<<"An unknown error occured! Cannot read .opf file... exiting now... \n"<<endl;
}else{
cout <<"Reading packed data content... done!\n"<<endl;
myFile.read((char *)&t, sizeof(PackedContent));
cout<<"\nHeader :" <<t.Header <<endl; // Header info [ok]
//cout<<"\nVersion :" <<t.Version <<endl; // Project Version [err]
cout<<"\nProject name:" <<t.Projectname <<endl; // Project name
cout<<"\nDeveloper name:" <<t.Developer<<endl;
cout<<"\nEmail :" <<t.email <<endl; // Developer email
cout<<"\nDescription :" <<t.Description <<endl; // Project description [ok]
cout<<"Unknown" <<t.Unknown <<endl;
}
}
Form:
Binary Reader.H (form)
PackedContent t;
BinaryReader();
textBox1->Text = t.Header; // doesnt work...
I've also tried:
textBox1->Text = Convert::ToString(t.Header); //doesn't work...
If your char array were null-terminated, like a C string, you could pass it as is to a std::string constructor:
textBox1->Text = std::string(t.Header);
Your char array is not null-terminated, so you should also provide the size, like this:
int headerSize = 1; // This variable is just for the example. Instead you can pass the size right in the function below
textBox1->Text = std::string(t.Header, headerSize);
Or:
textBox1->Text = std::string(t.Header, std::size(t.Header));
You forgot about the terminating zero, and this is why you have problems.
If you want to store text of up to 23 characters in a C array, you need char[24] (23 + 1) to include the trailing zero too.
Anyway, using a C array to store text is C-style code, not C++, and should be avoided.
If you do not include the terminating zero then, for example, the call textBox1->Text = t.Header; will search for it in order to perform the conversion to String. This leads to undefined behavior: the resulting string will contain some trash at the end, or the program will crash.
If your code does store a terminating zero in this structure but the text reaches the size limit, you will have a buffer overrun.
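The two answers above can be combined into one small helper. This is only a sketch under the assumption that a field read from the file may or may not contain a terminating zero; field_to_string is a hypothetical name, and in a C++/CLI form the resulting std::string would still have to be converted to a managed string.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Convert a fixed-size char field that may or may not contain a '\0'
// into a std::string without ever reading past the end of the field.
template <std::size_t N>
std::string field_to_string(const char (&field)[N])
{
    // Stop at the first '\0' if there is one, otherwise at the end of the array.
    const char* end = std::find(field, field + N, '\0');
    return std::string(field, end);
}
```

Because the array size N is deduced from the argument, a call such as field_to_string(t.Header) needs no separate length parameter.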

A risk of string copy

I am studying C/C++ strings with Microsoft Visual Studio 2013 and have a question about the function strcpy_s.
I find that even if the destination char* is not given enough memory, strcpy_s can still be used successfully. Is there any risk in using it like this?
code:
const char* s5 = "hello!";
char* cs6 = new char[1];
strcpy_s(cs6, strlen(s5) + 1, s5);
cs6[2] = 'g';
string s7(cs6);
cout << "----------cs6------------" << endl;
cout << s7 << endl;
display in console:
----------cs6------------
heglo!
And I have a question about the function strcpy_s. I find that even if one doesn't give the dest char* enough memory, strcpy_s can be used successfully. Is there any risk ...?
Not if you use strcpy_s correctly, like this:
int buffer_size = 1; // this is silly since null terminated buffer
// of size 1 can only fit a string of length 0
char* cs6 = new char[buffer_size];
strcpy_s(cs6, buffer_size, s5);
cs6[buffer_size - 1] = '\0';
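The code above cannot overflow the buffer, because strcpy_s is told the real buffer size and refuses to copy a string that does not fit (in Visual C++ it reports the failure through the invalid-parameter handler instead of writing past the end). However: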
Is there any risk using like this?
const char* s5 = "hello!";
char* cs6 = new char[1];
strcpy_s(cs6, strlen(s5) + 1, s5);
Yes, there is risk. The behaviour is undefined. Best case scenario: The program crashes. Worse scenario: A blackhat hacker exploits the behaviour and steals your data that you were bound by law to not leak.
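For readers without strcpy_s (it is not available on every toolchain), the same principle can be sketched in portable C++: the size passed in must be the real size of the destination, and the copy is refused when the source does not fit. The function copy_checked is a hypothetical helper written for this illustration, not a library function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copy src into dest, which holds dest_size bytes. Returns false (and leaves
// dest as an empty string) when src plus its terminator does not fit.
bool copy_checked(char* dest, std::size_t dest_size, const char* src)
{
    assert(dest != nullptr && src != nullptr);
    if (dest_size == 0) {
        return false; // no room even for the terminator
    }
    const std::size_t needed = std::strlen(src) + 1; // characters + '\0'
    if (needed > dest_size) {
        dest[0] = '\0';
        return false;
    }
    std::memcpy(dest, src, needed);
    return true;
}
```

With a one-byte destination, copying "hello!" is rejected instead of silently overwriting memory that belongs to something else.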

C/C++ HDF5 Read string attribute

A colleague of mine used LabVIEW to write an ASCII string as an attribute in an HDF5 file. I can see that the attribute exists, and I can read it, but I can't print it.
The attribute is, as shown in HDF Viewer:
Date = 2015\07\09
So "Date" is its name.
I'm trying to read the attribute with this code
hsize_t sz = H5Aget_storage_size(dateAttribHandler);
std::cout<<sz<<std::endl; //prints 16
hid_t atype = H5Aget_type(dateAttribHandler);
std::cout<<atype<<std::endl; //prints 50331867
std::cout<<H5Aread(dateAttribHandler,atype,(void*)date)<<std::endl; //prints 0
std::cout<<date<<std::endl; //prints messy characters!
//even with an std::string
std::string s(date);
std::cout<<s<<std::endl; //also prints a mess
Why is this happening? How can I get this string as a const char* or std::string?
I also tried using the type atype = H5Tcopy (H5T_C_S1);, and that didn't work either...
EDIT:
Here I provide a full, self-contained program as it was requested:
#include <string>
#include <iostream>
#include <fstream>
#include <hdf5/serial/hdf5.h>
#include <hdf5/serial/hdf5_hl.h>
std::size_t GetFileSize(const std::string &filename)
{
std::ifstream file(filename.c_str(), std::ios::binary | std::ios::ate);
return file.tellg();
}
int ReadBinFileToString(const std::string &filename, std::string &data)
{
std::fstream fileObject(filename.c_str(),std::ios::in | std::ios::binary);
if(!fileObject.good())
{
return 1;
}
size_t filesize = GetFileSize(filename);
data.resize(filesize);
fileObject.read(&data.front(),filesize);
fileObject.close();
return 0;
}
int main(int argc, char *argv[])
{
std::string filename("../Example.hdf5");
std::string fileData;
std::cout<<"Success read file into memory: "<<
ReadBinFileToString(filename.c_str(),fileData)<<std::endl;
hid_t handle;
hid_t magFieldsDSHandle;
hid_t dateAttribHandler;
htri_t dateAtribExists;
handle = H5LTopen_file_image((void*)fileData.c_str(),fileData.size(),H5LT_FILE_IMAGE_DONT_COPY | H5LT_FILE_IMAGE_DONT_RELEASE);
magFieldsDSHandle = H5Dopen(handle,"MagneticFields",H5P_DEFAULT);
dateAtribExists = H5Aexists(magFieldsDSHandle,"Date");
if(dateAtribExists)
{
dateAttribHandler = H5Aopen(magFieldsDSHandle,"Date",H5P_DEFAULT);
}
std::cout<<"Reading file done."<<std::endl;
std::cout<<"Open handler: "<<handle<<std::endl;
std::cout<<"DS handler: "<<magFieldsDSHandle<<std::endl;
std::cout<<"Attributes exists: "<<dateAtribExists<<std::endl;
hsize_t sz = H5Aget_storage_size(dateAttribHandler);
std::cout<<sz<<std::endl;
char* date = new char[sz+1];
std::cout<<"mem bef: "<<date<<std::endl;
hid_t atype = H5Aget_type(dateAttribHandler);
std::cout<<atype<<std::endl;
std::cout<<H5Aread(dateAttribHandler,atype,(void*)date)<<std::endl;
fprintf(stderr, "Attribute string read was '%s'\n", date);
date[sz] = '\0';
std::string s(date);
std::cout<<"mem aft: "<<date<<std::endl;
std::cout<<s<<std::endl;
H5Dclose(magFieldsDSHandle);
H5Fclose(handle);
return 0;
}
Printed output of this program:
Success read file into memory: 0
Reading file done.
Open handler: 16777216
DS handler: 83886080
Attributes exists: 1
16
mem bef:
50331867
0
Attribute string read was '�P7'
mem aft: �P7
�P7
Press <RETURN> to close this window...
Thanks.
It turned out that H5Aread has to be called with the address of the char pointer, that is, a pointer to a pointer:
H5Aread(dateAttribHandler,atype,&date);
Keep in mind that one doesn't have to reserve memory for that. The library allocates the memory, and you can then free it with H5free_memory(date).
This worked fine.
EDIT:
I learned that this is the case only when the string to be read has variable length. If the string has a fixed length, one has to allocate a buffer of size length+1 manually and set the last char to null to get a null-terminated string. The HDF5 library provides a function, H5Tis_variable_str, that checks whether a string type is variable-length.
I discovered that if you do not allocate date and pass the &date to H5Aread, then it works. (I use the C++ and python APIs, so I do not know the C api very well.) Specifically change:
char* date = 0;
// std::cout<<"mem bef: "<<date<<std::endl;
std::cout << H5Aread(dateAttribHandler, atype, &date) << std::endl;
And you should see 2015\07\09 printed.
You may want to consider using the C++ API. Using the C++ API, your example becomes:
std::string filename("c:/temp/Example.hdf5");
H5::H5File file(filename, H5F_ACC_RDONLY);
H5::DataSet ds_mag = file.openDataSet("MagneticFields");
if (ds_mag.attrExists("Date"))
{
H5::Attribute attr_date = ds_mag.openAttribute("Date");
H5::StrType stype = attr_date.getStrType();
std::string date_str;
attr_date.read(stype, date_str);
std::cout << "date_str= <" << date_str << ">" << std::endl;
}
As a simpler alternative to existing APIs, your use-case could be solved as follows in C using HDFql:
// declare variable 'value'
char *value;
// register variable 'value' for subsequent use (by HDFql)
hdfql_variable_register(&value);
// read 'Date' (from 'MagneticFields') and populate variable 'value' with it
hdfql_execute("SELECT FROM Example.hdf5 MagneticFields/Date INTO MEMORY 0");
// display value stored in variable 'value'
printf("Date=%s\n", value);
FYI, besides C, the code above can be used in C++, Python, Java, C#, Fortran or R with minimal changes.

Gzip compress/uncompress a long char array

I need to compress a large byte array. I'm already using the Crypto++ library in the application, so having the compression/decompression part in the same library would be great.
This little test works as expected:
///
string test = "bleachbleachtestingbiatchbleach123123bleachbleachtestingb.....more";
string compress(string input)
{
string result ("");
CryptoPP::StringSource(input, true, new CryptoPP::Gzip(new CryptoPP::StringSink(result), 1));
return result;
}
string decompress(string _input)
{
string _result ("");
CryptoPP::StringSource(_input, true, new CryptoPP::Gunzip(new CryptoPP::StringSink(_result), 1));
return _result;
}
void main()
{
string compressed = compress(test);
string decompressed = decompress(compressed);
cout << "orginal size :" << test.length() << endl;
cout << "compressed size :" << compressed.length() << endl;
cout << "decompressed size :" << decompressed.length() << endl;
system("PAUSE");
}
I need to compress something like this:
unsigned char long_array[194506]
{
0x00,0x00,0x02,0x00,0x00,0x04,0x00,0x00,0x00,
0x01,0x00,0x02,0x00,0x00,0x04,0x02,0x00,0x04,
0x04,0x00,0x02,0x00,0x01,0x04,0x02,0x00,0x04,
0x01,0x00,0x02,0x02,0x00,0x04,0x02,0x00,0x00,
0x03,0x00,0x02,0x00,0x00,0x04,0x01,0x00,0x04,
....
};
I tried to use long_array as const char * and as byte and feed it to the compress function. It seems to be compressed, but the decompressed result has a size of 4 and is clearly incomplete. Maybe it's too long.
How could I rewrite those compress/uncompress functions to work with that byte array?
Thank you all. :)
I tried to use the array as const char * and as byte and feed it to the compress function. It seems to be compressed, but the decompressed result has a size of 4 and is clearly incomplete.
Use the alternate StringSource constructor that takes a pointer and a length. It is immune to embedded NULs.
CryptoPP::StringSource ss(long_array, sizeof(long_array), true,
new CryptoPP::Gzip(
new CryptoPP::StringSink(result), 1)
);
Or, you can use:
Gzip zipper(new StringSink(result), 1);
zipper.Put(long_array, sizeof(long_array));
zipper.MessageEnd();
Crypto++ added an ArraySource at 5.6. You can use it too (but it's really a typedef for a StringSource):
CryptoPP::ArraySource as(long_array, sizeof(long_array), true,
new CryptoPP::Gzip(
new CryptoPP::StringSink(result), 1)
);
The 1 passed as the second argument to Gzip is the deflate level. 1 is one of the lowest compression levels. You might consider using 9 or Gzip::MAX_DEFLATE_LEVEL (which is 9). The default log2 window size is the maximum, so there's no need to turn any knobs on it.
Gzip zipper(new StringSink(result), Gzip::MAX_DEFLATE_LEVEL);
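You should also name your declarations (for example, StringSource ss(...) rather than an unnamed temporary). I've seen GCC generate bad code when using anonymous declarations.
Finally, use long_array (or similar) rather than array as the variable name, because array is the name of a standard library container (std::array) in C++11.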
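Why the pointer-and-length form matters can be shown without Crypto++. The sketch below uses two hypothetical helpers to contrast a length-aware copy with a C-string style copy of the same bytes; the second one stops at the first zero byte, which is how binary data gets silently truncated.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Build a string from raw bytes with an explicit length; embedded zero bytes are kept.
std::string bytes_to_string(const unsigned char* data, std::size_t size)
{
    return std::string(reinterpret_cast<const char*>(data), size);
}

// Build a string the C-string way; copying stops at the first zero byte,
// so the input must contain one.
std::string c_string_to_string(const unsigned char* data)
{
    return std::string(reinterpret_cast<const char*>(data));
}
```

For the five bytes 0x01, 0x02, 0x00, 0x04, 0x00 the first helper keeps all five, while the second keeps only the two bytes before the first zero.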

how to read a particular string from a buffer

I have a buffer
char buffer[size];
which I am using to store the contents of a stream (pStream here):
HRESULT hr = pStream->Read(buffer, size, &cbRead );
Now I have all the contents of this stream in buffer, which holds size bytes. I know that two strings,
"<!doctortype html" and ".html>"
are present somewhere inside the buffer (we don't know their locations), and I want to store just the contents of the buffer from
"<!doctortype html" to ".html>"
in another buffer, buffer2[SizeWeDontKnow].
How can I do that? (The contents between these two locations are an HTML file, and I want to store only that HTML file.) Any ideas?
You can use the strnstr function (a BSD extension; where it is unavailable, strstr works on a null-terminated buffer) to find the right position in your buffer. After you've found the starting and ending tags, you can extract the text in between using strncpy, or use it in place if performance is an issue.
You can calculate the needed size from the positions of the tags and the length of the first tag: nLength = nPosEnd - nPosStart - nStartTagLength
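The length calculation above can be sketched with std::search, which takes explicit begin and end pointers and therefore does not require the buffer to be null-terminated. The function extract_between is a hypothetical helper, and returning an empty string for a missing tag is an assumption made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Return the bytes between the first start_tag and the first end_tag that
// follows it, or an empty string when either tag is missing.
std::string extract_between(const char* buffer, std::size_t size,
                            const char* start_tag, const char* end_tag)
{
    const char* buffer_end = buffer + size;
    const std::size_t start_len = std::strlen(start_tag);
    const std::size_t end_len = std::strlen(end_tag);

    const char* start = std::search(buffer, buffer_end, start_tag, start_tag + start_len);
    if (start == buffer_end) {
        return std::string();
    }

    // Search for the end tag only after the start tag.
    const char* content = start + start_len;
    const char* end = std::search(content, buffer_end, end_tag, end_tag + end_len);
    if (end == buffer_end) {
        return std::string();
    }

    // Length is end - start - start_len, i.e. nPosEnd - nPosStart - nStartTagLength.
    return std::string(content, end);
}
```

With the question's variables, a call could look like extract_between(buffer, cbRead, "<!doctortype html", ".html>"), using the number of bytes actually read rather than the full buffer size.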
Look for HTML parsers for C/C++.
Another way is to walk a char pointer from the start of the buffer and check each character in turn to see whether it matches what you are looking for.
If that's the only operation on HTML code in your app, you could use the solution below. However, if you are going to do more complicated parsing, I suggest using an external library.
#include <iostream>
#include <cstdio>
#include <cstring>
using namespace std;
int main()
{
const char* beforePrefix = "asdfasdfasdfasdf";
const char* prefix = "<!doctortype html";
const char* suffix = ".html>";
const char* postSuffix = "asdasdasd";
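const unsigned size = 1024; // const, so buf and newBuf are ordinary fixed-size arrays
char buf[size];
snprintf(buf, size, "%s%sTHE STRING YOU WANT TO GET%s%s", beforePrefix, prefix, suffix, postSuffix);
cout << "Before: " << buf << endl;
const char* firstOccurenceOfPrefixPtr = strstr(buf, prefix);
// look for the suffix only after the prefix, so an earlier ".html>" is not matched
const char* firstOccurenceOfSuffixPtr = firstOccurenceOfPrefixPtr
? strstr(firstOccurenceOfPrefixPtr + strlen(prefix), suffix) : nullptr;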
if (firstOccurenceOfPrefixPtr && firstOccurenceOfSuffixPtr)
{
unsigned textLen = (unsigned)(firstOccurenceOfSuffixPtr - firstOccurenceOfPrefixPtr - strlen(prefix));
char newBuf[size];
strncpy(newBuf, firstOccurenceOfPrefixPtr + strlen(prefix), textLen);
newBuf[textLen] = 0;
cout << "After: " << newBuf << endl;
}
return 0;
}
EDIT
I get it now :). You should use strstr to find the first occurrence of the prefix then. I edited the code above.
Are you limited to C, or can you use C++?
In the C library reference there are plenty of useful ways of tokenising strings and comparing for matches (string.h):
http://www.cplusplus.com/reference/cstring/
Using C++ I would do the following (using buffer and size variables from your code):
// copy char array to std::string
std::string text(buffer, buffer + size);
// define what we're looking for
std::string begin_text("<!doctortype html");
std::string end_text(".html>");
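// find the start of the text we need to extract
size_t begin_pos = text.find(begin_text);
if (begin_pos != std::string::npos)
{
begin_pos += begin_text.length();
// find the end marker, searching only after the start marker
size_t end_pos = text.find(end_text, begin_pos);
if (end_pos != std::string::npos)
{
// substr takes a start position and a length, not an end position
std::string extract = text.substr(begin_pos, end_pos - begin_pos);
// test that we got the extract
std::cout << extract << std::endl;
}
}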
If you need C string compatibility you can use:
const char* tmp = extract.c_str();