So I'm trying to use zlib in C++ with Visual Studio 2019 to extract the contents of a specific file format. According to the documentation I'm following, the format consists mainly of values stored as "32-bit (4-byte) little-endian signed integers", and several sections of the file also contain blocks of data that are compressed with zlib to save space.
But I believe that's not relevant to my problem; I'm having trouble with simply using zlib.
I should note that I'm unfamiliar with fstream and, more specifically, with zlib. My guess is that uncompress() is the function I'm looking for, since the number of compressed bytes is read before I call it. It's not unlikely that the issue is related to fstream rather than zlib.
But I do believe I'm not passing the buffer in properly (or maybe not even reading it from the file properly), as I'm getting compiler errors for incorrect types, the program crashes, and, most importantly, I can't get the blocks of uncompressed data. I can tell it's not working because the call returns Z_STREAM_ERROR (-2) or Z_DATA_ERROR (-3) instead of Z_OK (0). The program does at least read the 32-bit data correctly.
#include <iostream>
#include <fstream>
#include "zlib.h"
using namespace std;
//Basically it works like this.
int main()
{
streampos size;
unsigned char memblock;
char* memblock2;
Bytef memblock3;
Bytef memblock_res;
int ret;
int res=0;
uint32_t a;
ifstream file("not_a_real_file.lol", ios::in | ios::binary | ios::ate);
if (file.is_open())
{
size = file.tellg();
file.seekg(0, ios::beg);
file.read(reinterpret_cast<char*>(&a), sizeof(a));
std::cout << "Format Identifier: " << a << "\n";
file.read(reinterpret_cast<char*>(&a), sizeof(a));
std::cout << "File Version: " << a << "\n";
//A bunch of other 32-bit values are read here; it would be redundant to show them all.
file.read(reinterpret_cast<char*>(&a), sizeof(a));
std::cout << "Length of Zlib Block: " << a << "\n";
//Anyways, this is where things get really weird. I'm using 'a' to determine the length in bytes, and I know it should be stored in its own variable.
char* membuffer = new char[a];
file.read(membuffer, a);
uLongf zaz;
res=uncompress(&memblock_res, &zaz, (unsigned char*)(&membuffer), a);
if (res==Z_OK)
std::cout << "Good!\n";
std::cout << "This resulted in " << (int)res << ", it's got this many bytes: " << zaz << "\n";
//It should be Z_DATA_ERROR with 0 bytes returned; it's obviously not the desired results.
file.read(reinterpret_cast<char*>(&a), sizeof(a));
std::cout << "Value after Block: " << a << "\n";
//At least it seems the 32-bit value that comes after the block is correctly read.
file.close();
}
}
Either I'm using read() incorrectly, or I don't know how to properly convert the data for use with uncompress(). Or maybe I'm using the wrong functions; I honestly have no clue. I spent hours trying to figure this out by looking things up, but to no avail.
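A few things stand out in the code above, offered as a guess rather than a confirmed diagnosis: (unsigned char*)(&membuffer) passes the address of the pointer variable instead of the bytes it points to, memblock_res is a single Bytef rather than an output buffer, and zaz is never set before the call (uncompress() expects the capacity of the destination buffer in it on entry). Below is a minimal sketch of reading the block into a contiguous buffer. readLengthPrefixedBlock and expectedSize are hypothetical names, and the uncompress() call appears only as a comment because the uncompressed size has to come from the file format, which is not shown here.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads a 4-byte length followed by that many bytes.
// Returns false if the stream ends before the whole block has been read.
bool readLengthPrefixedBlock(std::istream& in, std::vector<unsigned char>& block) {
    std::uint32_t length = 0;
    // Same raw read as in the question; assumes a little-endian host.
    if (!in.read(reinterpret_cast<char*>(&length), sizeof(length))) {
        return false;
    }
    block.resize(length);
    if (length == 0) {
        return true;
    }
    // block.data() is the buffer itself, not the address of a pointer variable.
    return static_cast<bool>(
        in.read(reinterpret_cast<char*>(block.data()), length));
}

// With the block in memory, the zlib call would look roughly like this
// (expectedSize is an assumption; it must be known from the file format):
//
//   std::vector<unsigned char> out(expectedSize);
//   uLongf outLen = static_cast<uLongf>(out.size()); // in: capacity, out: bytes written
//   int res = uncompress(out.data(), &outLen,
//                        block.data(), static_cast<uLong>(block.size()));
```

Using std::vector instead of new char[a] also means the buffer is released automatically, and the length read from the file no longer has to live in the reused variable a.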
Related
I am assigned a task where I have to explain why PrintStream and DataOutputStream produce two different kinds of output files (which I know: the first writes a string representation byte by byte, whilst the second writes the raw binary data). To elaborate on the background, I wanted to write a small C++ program that reads the written data back from the file and prints it to stdout.
The idea is simple: write short values from 20,000 to 32,000 to a file using DataOutputStream and its writeShort(int) method. According to the Java documentation, those values are written as two bytes each.
Now... I did try to implement this with std::ifstream on the C++ side, and I believe I ran into some endianness-related issues. According to what I have gathered from various SO questions, Java writes in "network format", which is another name for big-endian. But as far as I am aware, my Mac (MacBook, mid-2014) is little-endian, so the bytes are in the wrong order.
This is what I have come up with so far:
#include <iostream>
#include <fstream>
using namespace std;
int main(int argc, char** argv) {
ifstream fh("./out.DataOutputStream.dat", ios::in|ios::binary);
if(!fh.is_open()) {
cerr << "Error while opening file." << endl;
cerr << "Are you in the same directory as <out.DataOutputStream.dat>?" << endl;
return 1;
}
cout << "--- Begin of data ---" << endl;
char num1, num2;
#define SWAP(b) ( (b >> 8) | (b << 8) )
while(!fh.eof()) {
fh.read(&num1, 1); // read one byte
fh.read(&num2, 1); // read the next byte
cout << (unsigned short)SWAP(num2) << (unsigned short)SWAP(num1);
}
cout << flush;
cout << "--- End of data ---" << endl;
return 0;
}
This result does print 32000 at the (very) end...but it prints that twice, and everything else is completely off... Any idea on how I can get this to work with the STL only?
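A minimal sketch of one way to do this with the standard library only, assuming the file holds nothing but 2-byte values written by writeShort(), which the Java documentation describes as high byte first. readBigEndianShorts is a hypothetical helper name. It assembles each value from two unsigned bytes, so it does not depend on the host's byte order, and it tests the read itself instead of eof(), which is a likely reason the last value shows up twice in the loop above.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads consecutive 2-byte big-endian signed values
// until the stream can no longer supply a full pair of bytes.
std::vector<std::int16_t> readBigEndianShorts(std::istream& in) {
    std::vector<std::int16_t> values;
    unsigned char bytes[2];
    // The loop condition is the read itself, so a failed read ends the loop
    // before any stale data is used.
    while (in.read(reinterpret_cast<char*>(bytes), 2)) {
        // High byte first: shift it up and combine with the low byte.
        const std::uint16_t raw =
            static_cast<std::uint16_t>((bytes[0] << 8) | bytes[1]);
        values.push_back(static_cast<std::int16_t>(raw));
    }
    return values;
}
```

With a real file, the same function can be called on a std::ifstream opened with ios::binary, printing each element followed by a newline or space so the numbers do not run together.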
I need to read in an mp3 file so that I can run the hash(). I do not need to parse the mp3 tag data out of this so I can just read the whole thing all together.
Currently I am using ifstream() to open the file in binary mode. I then get the size of the file, allocate enough space with a char* and read it all at once.
I know that when I run cout on this data I can only see "ID3 and some gibberish." I opened the mp3 file up in a hex editor and ID3 and the gibberish was what was at the beginning of the file. The next binary data I believe is being interpreted as end of line/string and does not print.
This is okay because I don't need to print it. I need to get the data into a format that I can run the hash function on. Any ideas on a type I can convert it to that will not treat a zero byte a few bytes in as the end of the data?
Here is code of what I have so far.
bool Sender::openSoundFile(){
streampos size;
soundSampleStream.open(soundFilePath.c_str(), ios::in|ios::binary|ios::ate);
if(!soundSampleStream.is_open()){
return false;
}
size = soundSampleStream.tellg();
cout << "Size of MP3: " << size << endl;
soundFileInMemory = new char [size];
soundSampleStream.seekg (0, ios::beg);
soundSampleStream.read(soundFileInMemory, size);
cout << "Error is: " << strerror(errno) << endl;
cout << "gcount: " << soundSampleStream.gcount() << endl;
soundSampleStream.close();
cout << soundFileInMemory << endl;
return true;
}
I get no error on reading the file and gcount() comes back with the correct numbers of bytes for the file.
Edit 1:
To add some more on this: hash() seems to hash the char* itself and not the data it points to, because the hash value changes between program runs. This is why I need to convert to some other type. I also don't think that a vector is supported by the C++11 std::hash.
std::string has a constructor that takes a char * and a size_t. See the fourth item in http://en.cppreference.com/w/cpp/string/basic_string/basic_string.
std::string file_contents(soundFileInMemory, size);
That will convert your char array to a string.
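As a rough sketch of how that could feed the hash: std::hash<std::string> hashes the characters of the string, including embedded zero bytes, rather than a pointer value. hashBuffer is a hypothetical helper; it copies the buffer once, which is assumed to be acceptable for a file of this size.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: hashes `size` bytes starting at `data`.
// The (pointer, length) constructor copies every byte, so a zero byte
// in the middle of the buffer does not cut the string short.
std::size_t hashBuffer(const char* data, std::size_t size) {
    const std::string contents(data, size);
    return std::hash<std::string>{}(contents);
}
```

Two buffers with the same bytes give the same value within one program run, regardless of where they sit in memory. The standard does not promise that std::hash results are stable across runs or implementations, so this is not a substitute for a checksum that has to be stored or sent elsewhere.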
I am currently reading a binary file whose structure I know, and I am trying to place it into a struct. When I print the struct members individually they seem to come out right, but on the fourth read the new data seems to get appended to the member from the previous read.
Here is the code, which probably makes more sense than my explanation:
Struct
#pragma pack(push, r1, 1)
struct header
{
char headers[13];
unsigned int number;
char date[19];
char fws[16];
char collectversion[12];
unsigned int seiral;
char gain[12];
char padding[16];
};
Main
header head;
int index = 0;
fstream data;
data.open(argv[1], ios::in | ios::binary);
if(data.fail())
{
cout << "Unable to open the data file!!!" << endl;
cout << "It looks Like Someone Has Deleted the file!"<<endl<<endl<<endl;
return 0;
}
//check the size of head
cout << "Size:" << endl;
cout << sizeof(head) << endl;
data.seekg(0,std::ios::beg);
data.read( (char*)(&head.headers), sizeof(head.headers));
data.read( (char*)(&head.number), sizeof(head.number));
data.read( (char*)(&head.date), sizeof(head.date));
data.read( (char*)head.fws, sizeof(head.fws));
//Here im just testing to see if the correct data went in.
cout<<head.headers<< endl;
cout<<head.number<< endl;
cout<<head.date<< endl;
cout<<head.fws<< endl;
data.close();
return 0;
Output
Size:
96
CF001 D 01.00
0
15/11/2013 12:16:56CF10001001002000
CF10001001002000
For some reason fws seems to get added onto head.date? But when I take out the line that reads head.fws, I get a date that doesn't have anything added.
I also know there is more data to read for the header, but I wanted to check that the data up to this point is correct.
Cheers
1. Your date is declared as:
char date[19];
2. Your date format is exactly 19-characters long:
15/11/2013 12:16:56
3. And you print it this way:
cout<<head.date
In short, you are printing a fixed-size char[] through its address, which means it will be interpreted as a null-terminated C string. Is it null-terminated? No.
To solve this problem, declare date as:
char date[20];
And after you fill it (still reading only 19 bytes from the file, not sizeof(date)), append a null terminator:
date[19] = 0;
This applies to all members that will be interpreted as C strings.
You have char date[19] filled with 15/11/2013 12:16:56, which is exactly 19 valid characters. This leaves no space for a terminating null, so cout << head.date outputs your 19 valid characters and then keeps going into whatever follows in memory (here, the adjacent fws member) until it happens to hit a zero byte.
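An alternative that keeps the struct layout unchanged, sketched under the assumption that the fields are fixed-width and not null-terminated in the file: build a std::string with an explicit length instead of relying on a terminator. fieldToString is a hypothetical helper name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copies exactly N characters out of a fixed-size
// field, so no null terminator is needed and nothing past the array is read.
template <std::size_t N>
std::string fieldToString(const char (&field)[N]) {
    return std::string(field, N);
}
```

With this, cout << fieldToString(head.date) prints only the 19 characters of the date. If a field is padded with zero bytes in the file, those bytes end up in the string as well and may need trimming.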
I have a c++ program that computes populations within a given radius by reading gridded population data from an ascii file into a large 8640x3432-element vector of doubles. Reading the ascii data into the vector takes ~30 seconds (looping over each column and each row), while the rest of the program only takes a few seconds. I was asked to speed up this process by writing the population data to a binary file, which would supposedly read in faster.
The ascii data file has a few header rows that give some data specs like the number of columns and rows, followed by population data for each grid cell, which is formatted as 3432 rows of 8640 numbers, separated by spaces. The population data numbers are mixed formats and can be just 0, a decimal value (0.000685648), or a value in scientific notation (2.687768e-05).
I found a few examples of reading/writing structs containing vectors to binary, and tried to implement something similar, but am running into problems. When I both write and read the vector to/from the binary file in the same program, it seems to work and gives me all the correct values, but then it ends with either a "segment fault: 11" or a memory allocation error that a "pointer being freed was not allocated". And if I try to just read the data in from the previously written binary file (without re-writing it in the same program run), then it gives me the header variables just fine but gives me a segfault before giving me the vector data.
Any advice on what I might have done wrong, or on a better way to do this would be greatly appreciated! I am compiling and running on a mac, and I don't have boost or other non-standard libraries at present. (Note: I am extremely new at coding and am having to learn by jumping in the deep end, so I may be missing a lot of basic concepts and terminology -- sorry!).
Here is the code I came up with:
# include <stdio.h>
# include <stdlib.h>
# include <string.h>
# include <fstream>
# include <iostream>
# include <vector>
# include <string.h>
using namespace std;
//Define struct for population file data and initialize one struct variable for reading in ascii (A) and one for reading in binary (B)
struct popFileData
{
int nRows, nCol;
vector< vector<double> > popCount; //this will end up having 3432x8640 elements
} popDataA, popDataB;
int main() {
string gridFname = "sample";
double dum;
vector<double> tempVector;
//open ascii population grid file to stream
ifstream gridFile;
gridFile.open(gridFname + ".asc");
int i = 0, j = 0;
if (gridFile.is_open())
{
//read in header data from file
string fileLine;
gridFile >> fileLine >> popDataA.nCol;
gridFile >> fileLine >> popDataA.nRows;
popDataA.popCount.clear();
//read in vector data, point-by-point
for (i = 0; i < popDataA.nRows; i++)
{
tempVector.clear();
for (j = 0; j<popDataA.nCol; j++)
{
gridFile >> dum;
tempVector.push_back(dum);
}
popDataA.popCount.push_back(tempVector);
}
//close ascii grid file
gridFile.close();
}
else
{
cout << "Population file read failed!" << endl;
}
//create/open binary file
ofstream ofs(gridFname + ".bin", ios::trunc | ios::binary);
if (ofs.is_open())
{
//write struct to binary file then close binary file
ofs.write((char *)&popDataA, sizeof(popDataA));
ofs.close();
}
else cout << "error writing to binary file" << endl;
//read data from binary file into popDataB struct
ifstream ifs(gridFname + ".bin", ios::binary);
if (ifs.is_open())
{
ifs.read((char *)&popDataB, sizeof(popDataB));
ifs.close();
}
else cout << "error reading from binary file" << endl;
//compare results of reading in from the ascii file and reading in from the binary file
cout << "File Header Values:\n";
cout << "Columns (ascii vs binary): " << popDataA.nCol << " vs. " << popDataB.nCol << endl;
cout << "Rows (ascii vs binary):" << popDataA.nRows << " vs." << popDataB.nRows << endl;
cout << "Spot Check Vector Values: " << endl;
cout << "Index 0,0: " << popDataA.popCount[0][0] << " vs. " << popDataB.popCount[0][0] << endl;
cout << "Index 3431,8639: " << popDataA.popCount[3431][8639] << " vs. " << popDataB.popCount[3431][8639] << endl;
cout << "Index 1600,4320: " << popDataA.popCount[1600][4320] << " vs. " << popDataB.popCount[1600][4320] << endl;
return 0;
}
Here is the output when I both write and read the binary file in the same run:
File Header Values:
Columns (ascii vs binary): 8640 vs. 8640
Rows (ascii vs binary):3432 vs.3432
Spot Check Vector Values:
Index 0,0: 0 vs. 0
Index 3431,8639: 0 vs. 0
Index 1600,4320: 25.2184 vs. 25.2184
a.out(11402,0x7fff77c25310) malloc: *** error for object 0x7fde9821c000: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6
And here is the output I get if I just try to read from the pre-existing binary file:
File Header Values:
Columns (binary): 8640
Rows (binary):3432
Spot Check Vector Values:
Segmentation fault: 11
Thanks in advance for any help!
When you write popDataA to the file, you are writing the binary representation of the vector of vectors. However this really is quite a small object, consisting of a pointer to the actual data (itself a series of vectors, in this case) and some size information.
When it's read back in to popDataB, it kinda works! But only because the raw pointer that was in popDataA is now in popDataB, and it points to the same stuff in memory. Things go crazy at the end, because when the memory for the vectors is freed, the code tries to free the data referenced by popDataA twice (once for popDataA, and once again for popDataB.)
The short version is, it's not a reasonable thing to write a vector to a file in this fashion.
So what to do? The best approach is to first decide on your data representation. It will, like the ASCII format, specify what value gets written where, and will include information about the matrix size, so that you know how large a vector you will need to allocate when reading them in.
In semi-pseudo code, writing will look something like:
int nrow=...;
int ncol=...;
ofs.write((char *)&nrow,sizeof(nrow));
ofs.write((char *)&ncol,sizeof(ncol));
for (int i=0;i<nrow;++i) {
for (int j=0;j<ncol;++j) {
double val=data[i][j];
ofs.write((char *)&val,sizeof(val));
}
}
And reading will be the reverse:
ifs.read((char *)&nrow,sizeof(nrow));
ifs.read((char *)&ncol,sizeof(ncol));
// allocate data-structure of size nrow x ncol
// ...
for (int i=0;i<nrow;++i) {
for (int j=0;j<ncol;++j) {
double val;
ifs.read((char *)&val,sizeof(val));
data[i][j]=val;
}
}
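The semi-pseudo code above could be fleshed out roughly as follows. This is a sketch under a few assumptions: the grid is rectangular (every row has the same number of columns), the file is read on the same kind of machine that wrote it, and the names Grid, writeGrid and readGrid are made up for illustration. Each row is written in a single call instead of value by value, which tends to be faster for large grids.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

using Grid = std::vector<std::vector<double>>;

// Writes the dimensions, then the raw doubles of each row.
// Assumes every row has the same length as the first one.
void writeGrid(std::ostream& out, const Grid& grid) {
    const std::int32_t nrow = static_cast<std::int32_t>(grid.size());
    const std::int32_t ncol =
        grid.empty() ? 0 : static_cast<std::int32_t>(grid[0].size());
    out.write(reinterpret_cast<const char*>(&nrow), sizeof(nrow));
    out.write(reinterpret_cast<const char*>(&ncol), sizeof(ncol));
    for (const auto& row : grid) {
        out.write(reinterpret_cast<const char*>(row.data()),
                  static_cast<std::streamsize>(row.size() * sizeof(double)));
    }
}

// Reads the dimensions, allocates the rows, then fills them.
// Returns false if the stream is too short or the header is invalid.
bool readGrid(std::istream& in, Grid& grid) {
    std::int32_t nrow = 0;
    std::int32_t ncol = 0;
    if (!in.read(reinterpret_cast<char*>(&nrow), sizeof(nrow))) return false;
    if (!in.read(reinterpret_cast<char*>(&ncol), sizeof(ncol))) return false;
    if (nrow < 0 || ncol < 0) return false;
    grid.assign(static_cast<std::size_t>(nrow),
                std::vector<double>(static_cast<std::size_t>(ncol)));
    for (auto& row : grid) {
        if (!in.read(reinterpret_cast<char*>(row.data()),
                     static_cast<std::streamsize>(row.size() * sizeof(double)))) {
            return false;
        }
    }
    return true;
}
```

The same two functions work with std::ofstream and std::ifstream opened with ios::binary. Only the doubles and the two dimensions reach the file; the vectors' internal pointers never do, which is what avoids the double free described above.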
All that said though, you should consider not writing things into a binary file like this. These sorts of ad hoc binary formats tend to live on, long past their anticipated utility, and tend to suffer from:
Lack of documentation
Lack of extensibility
Format changes without versioning information
Issues when using saved data across different machines, including endianness problems, different default sizes for integers, etc.
Instead, I would strongly recommend using a third-party library. For scientific data, HDF5 and netcdf4 are good choices which address all of the above issues for you, and come with tools that can inspect the data without knowing anything about your particular program.
Lighter-weight options include the Boost serialization library and Google's protocol buffers, but these address only some of the issues listed above.
I have a function defined as follows:
void AddHeadCode(std::ofstream &ostream, size_t length){
ostream.write((char*)length, sizeof(length));
ostream.seekp(0x10L, std::ios::beg);
}
Now when this executes, it will obviously fail, as the char pointer will point nowhere.
But I want the actual value of length written into the file.
For example, length = 19152.
Then when I open the file in a hex editor, I should find d0 4a there.
How can this be achieved in C++? I'm kinda lost here...
Take the address of your length variable, and pass that address to .write:
void AddHeadCode(std::ofstream &ostream, size_t length){
// by popular demand, removed C-style cast
ostream.write(reinterpret_cast<char*>(&length), sizeof(length));
ostream.seekp(0x10L, std::ios::beg);
}
But this is not usually referred to as writing an integer "as hex". It is sometimes referred to as writing an integer "as binary", or simply writing an integer.
Note that what you have done is not a portable practice. If you read the value back in on a different computer (or even on the same computer, but with a different compiler), you might not read in the same value.
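If portability matters, one common approach is to fix both the width and the byte order yourself instead of dumping the in-memory representation. A minimal sketch, assuming a 4-byte little-endian field is what the file format wants; writeUint32LE is a hypothetical name and the field width is an assumption, since size_t differs between platforms.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes `value` as exactly four bytes, lowest byte
// first, regardless of the byte order or integer sizes of the host.
void writeUint32LE(std::ostream& out, std::uint32_t value) {
    const unsigned char bytes[4] = {
        static_cast<unsigned char>(value & 0xFF),
        static_cast<unsigned char>((value >> 8) & 0xFF),
        static_cast<unsigned char>((value >> 16) & 0xFF),
        static_cast<unsigned char>((value >> 24) & 0xFF)};
    out.write(reinterpret_cast<const char*>(bytes), sizeof(bytes));
}
```

For length = 19152 (0x4AD0) this writes the bytes d0 4a 00 00. A reader on any machine can rebuild the value by shifting the bytes back into place in the same order.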
File streams are tricky, so I am uncertain about those, but I use this for stringstreams:
std::basic_stringstream<wchar_t> oTmpStream;
oTmpStream << L"0x" << std::nouppercase << std::setfill( L'0' ) << std::hex << std::setw( 8 ) << iSomeValue;
// or without fancy formatting:
oTmpStream << std::hex << iSomeValue;