I have been assigned a task where I have to explain why PrintStream and DataOutputStream produce two different kinds of output files (which I know: the first writes a string representation character by character, whilst the second writes the raw binary data). To elaborate on the background of this, I wanted to write a small C++ program that reads the written data back off the file and prints it to stdout.
The idea is simple: write short values from 20,000 to 32,000 to a file with DataOutputStream, using its writeShort(int) method. According to the Java documentation, each value is written as two bytes.
Now... I did try to implement this with std::ifstream on the C++ side, and I believe I ran into some endianness-related issues. From what I have gathered on various SO questions, Java writes in "network format", which I took to be another name for "little endian". But as far as I am aware, my Mac (MacBook, mid-2014) uses "big endian", so the bytes end up in the wrong order.
This is what I have come up with so far:
#include <iostream>
#include <fstream>

using namespace std;

int main(int argc, char** argv) {
    ifstream fh("./out.DataOutputStream.dat", ios::in|ios::binary);
    if(!fh.is_open()) {
        cerr << "Error while opening file." << endl;
        cerr << "Are you in the same directory as <out.DataOutputStream.dat>?" << endl;
        return 1;
    }
    cout << "--- Begin of data ---" << endl;
    char num1, num2;
#define SWAP(b) ( (b >> 8) | (b << 8) )
    while(!fh.eof()) {
        fh.read(&num1, 1); // read one byte
        fh.read(&num2, 1); // read the next byte
        cout << (unsigned short)SWAP(num2) << (unsigned short)SWAP(num1);
    }
    cout << flush;
    cout << "--- End of data ---" << endl;
    return 0;
}
This does print 32000 at the (very) end... but it prints it twice, and everything else is completely off. Any idea how I can get this to work with the STL only?
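A minimal sketch of reading such a file back with only the standard library might look like the following; it assumes the file really does contain consecutive 16-bit values in big-endian (network) byte order, which is what DataOutputStream.writeShort produces, and it reassembles each value from two unsigned bytes so the host's own byte order never matters:
#include <cstdint>
#include <fstream>
#include <iostream>

int main() {
    std::ifstream fh("./out.DataOutputStream.dat", std::ios::binary);
    if (!fh) {
        std::cerr << "Could not open out.DataOutputStream.dat" << std::endl;
        return 1;
    }
    char bytes[2];
    while (fh.read(bytes, 2)) { // only use the bytes when both reads succeeded
        // writeShort stores the high byte first, so rebuild the value explicitly
        std::uint16_t value = static_cast<std::uint16_t>(
            (static_cast<unsigned char>(bytes[0]) << 8) |
             static_cast<unsigned char>(bytes[1]));
        std::cout << value << "\n";
    }
    return 0;
}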
So I'm trying to make use of zlib in C++ with Visual Studio 2019, to extract the contents of a specific file format. According to the documentation I'm following, this file format consists mainly of "32-bit (4-byte) little-endian signed integers", and within several sections of the file there are also blocks of data compressed with zlib to save space.
But I believe that's not relevant to my problem; I'm having trouble simply using zlib.
I should note that I'm unfamiliar with fstream and, more specifically, with zlib. I'm guessing uncompress() may be the function I'm looking for, since the number of compressed bytes is known before I even call it. It's not unlikely that the issue is related to the former library (fstream).
But I do believe I'm not filling the buffer properly (or maybe not even reading it from the file properly), as I'm getting either syntax errors for incorrect types, crashes, and, most importantly, no blocks of uncompressed data. I can tell it's not working properly because the call returns Z_STREAM_ERROR (-2) or Z_DATA_ERROR (-3), not Z_OK (0). The program does at least read the 32-bit data correctly.
#include <iostream>
#include <fstream>
#include <cstdint>
#include "zlib.h"

using namespace std;

//Basically it works like this.
int main()
{
    streampos size;
    unsigned char memblock;
    char* memblock2;
    Bytef memblock3;
    Bytef memblock_res;
    int ret;
    int res=0;
    uint32_t a;
    ifstream file("not_a_real_file.lol", ios::in | ios::binary | ios::ate);
    if (file.is_open())
    {
        size = file.tellg();
        file.seekg(0, ios::beg);
        file.read(reinterpret_cast<char*>(&a), sizeof(a));
        std::cout << "Format Identifier: " << a << "\n";
        file.read(reinterpret_cast<char*>(&a), sizeof(a));
        std::cout << "File Version: " << a << "\n";
        //A bunch of other 32-bit values go here; it would be redundant to list them all.
        file.read(reinterpret_cast<char*>(&a), sizeof(a));
        std::cout << "Length of Zlib Block: " << a << "\n";
        //Anyways, this is where things get really weird. I'm using 'a' to determine the length in bytes, and I know it should be stored in its own variable.
        char* membuffer = new char[a];
        file.read(membuffer, a);
        uLongf zaz;
        res=uncompress(&memblock_res, &zaz, (unsigned char*)(&membuffer), a);
        if (res==Z_OK)
            std::cout << "Good!\n";
        std::cout << "This resulted in " << (int)res << ", it's got this many bytes: " << zaz << "\n";
        //The output here should be Z_DATA_ERROR with 0 bytes returned; obviously not the desired result.
        file.read(reinterpret_cast<char*>(&a), sizeof(a));
        std::cout << "Value after Block: " << a << "\n";
        //At least it seems the 32-bit value that comes after the block is correctly read.
        file.close();
    }
}
Either I'm using read() incorrectly, or I don't know how to properly convert the data for use with uncompress(). Or maybe I'm using the wrong functions; I honestly have no clue. I've spent hours trying to figure this out by looking things up, but to no avail.
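For contrast, a sketch of how uncompress() is usually called: the destination must be a pre-allocated buffer rather than a single Bytef, the length argument must be initialised to that buffer's capacity, and the source argument is the buffer pointer itself, not the address of the pointer variable. The helper name and the idea of an expected uncompressed size are assumptions on my part; the real file format may or may not record that size:
#include <iostream>
#include <vector>
#include "zlib.h"

// Hypothetical helper: inflate one zlib-compressed block in a single call.
// expectedSize is an assumed upper bound on the uncompressed length.
std::vector<Bytef> inflateBlock(const std::vector<Bytef>& src, uLong expectedSize)
{
    std::vector<Bytef> dest(expectedSize);
    uLongf destLen = static_cast<uLongf>(dest.size()); // in: capacity, out: actual size
    int res = uncompress(dest.data(), &destLen, src.data(), static_cast<uLong>(src.size()));
    if (res != Z_OK) {
        std::cerr << "uncompress failed with code " << res << "\n";
        dest.clear();
        return dest;
    }
    dest.resize(destLen); // shrink to the real uncompressed length
    return dest;
}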
I am reading a file header using ifstream.
Edit: I was asked to put the full minimal program, so here it is.
#include <iostream>
#include <fstream>
#include <string>

using namespace std;

#pragma pack(push,2)
struct Header
{
    char label[20];
    char st[11];
    char co[7];
    char plusXExtends[9];
    char minusXExtends[9];
    char plusYExtends[9];
};
#pragma pack(pop)

int main(int argc, char* argv[])
{
    string fileName;
    fileName = "test";
    string fileInName = fileName + ".dst";
    ifstream fileIn(fileInName.c_str(), ios_base::binary|ios_base::in);
    if (!fileIn)
    {
        cout << "File Not Found" << endl;
        return 0;
    }
    Header h={};
    if (fileIn.is_open()) {
        cout << "\n" << endl;
        fileIn.read(reinterpret_cast<char *>(&h.label), sizeof(h.label));
        cout << "Label: " << h.label << endl;
        fileIn.read(reinterpret_cast<char *>(&h.st), sizeof(h.st));
        cout << "Stitches: " << h.st << endl;
        fileIn.read(reinterpret_cast<char *>(&h.co), sizeof(h.co));
        cout << "Colour Count: " << h.co << endl;
        fileIn.read(reinterpret_cast<char *>(&h.plusXExtends), sizeof(h.plusXExtends));
        cout << "Extends: " << h.plusXExtends << endl;
        fileIn.read(reinterpret_cast<char *>(&h.minusXExtends), sizeof(h.minusXExtends));
        cout << "Extends: " << h.minusXExtends << endl;
        fileIn.read(reinterpret_cast<char *>(&h.plusYExtends), sizeof(h.plusYExtends));
        cout << "Extends: " << h.plusYExtends << endl;
        // This will output corrupted
        cout << endl << endl;
        cout << "Label: " << h.label << endl;
        cout << "Stitches: " << h.st << endl;
        cout << "Colour Count: " << h.co << endl;
        cout << "Extends: " << h.plusXExtends << endl;
        cout << "Extends: " << h.minusXExtends << endl;
        cout << "Extends: " << h.plusYExtends << endl;
    }
    fileIn.close();
    cout << "\n";
    //cin.get();
    return 0;
}
ifstream fileIn(fileInName.c_str(), ios_base::binary|ios_base::in);
Then I use a struct to store the header items
The actual struct is longer than this. I shortened it because I didn't need the whole struct for the question.
Anyway as I read the struct I do a cout to see what I am getting. This part is fine.
As expected my cout shows the Label, Stitches, Colour Count no problem.
The problem is that if I do another set of couts after the header has been read, I get corruption in the output. That is what the second block of couts near the end of the program above demonstrates.
Instead of seeing Label, Stitches and Colour Count, I get strange symbols and corrupt output. Sometimes you can see the output of h.label with some corruption, but the later fields such as Stitches are written over, sometimes with strange symbols and sometimes with text from the previous cout. I think either the data in the struct is getting corrupted or the cout output is getting corrupted, and I don't know why. The longer the header, the more apparent the problem becomes. I would really like to do all the couts at the end of the header, but if I do that I see a big mess instead of what should be output.
My question is why is my cout becoming corrupted?
Using arrays to store strings is dangerous because if you allocate 20 characters to store the label and the label happens to be 20 characters long, then there is no room to store a NUL (0) terminating character. Once the bytes are stored in the array there's nothing to tell functions that are expecting null-terminated strings (like cout) where the end of the string is.
Your label has 20 chars. That's enough to store the first 20 letters of the alphabet:
ABCDEFGHIJKLMNOPQRST
But this is not a null-terminated string. This is just an array of characters. In fact, in memory, the byte right after the T will be the first byte of the next field, which happens to be your 11-character st array. Let's say those 11 characters are: abcdefghijk.
Now the bytes in memory look like this:
ABCDEFGHIJKLMNOPQRSTabcdefghijk
There's no way to tell where label ends and st begins. When you pass a pointer to the first byte of the array to something that, by convention, interprets it as a null-terminated string, the implementation will happily scan forward until it finds a null terminating character (0), which, once the later fields have been filled in, it may never find within the array. There's a serious risk of overrunning the buffer (reading past its end), and potentially even past the end of your virtual memory block, ultimately causing an access violation / segmentation fault.
When your program first ran, the memory of the header structure was all zeros (because you initialized with {}) and so after reading the label field from disk, the bytes after the T were already zero, so your first cout worked correctly. There happened to be a terminating null character at st[0]. You then overwrite this when you read the st field from disk. When you come back to output label again, the terminator is gone, and some characters of st will get interpreted as belonging to the string.
To fix the problem you probably want to use a different, more practical data structure to store your strings that allows for convenient string functions. And use your raw header structure just to represent the file format.
You can still read the data from disk into memory using fixed-size buffers; that is just for staging purposes (to get it into memory). Then store the data in a different structure that uses std::string members, for convenience and for later use by your program.
For this you'll want these two structures:
#pragma pack(push,2)
struct RawHeader // only for file IO
{
    char label[20];
    char st[11];
    char co[7];
    char plusXExtends[9];
    char minusXExtends[9];
    char plusYExtends[9];
};
#pragma pack(pop)

struct Header // A much more practical Header struct than the raw one
{
    std::string label;
    std::string st;
    std::string co;
    std::string plusXExtends;
    std::string minusXExtends;
    std::string plusYExtends;
};
After you read the first structure, you'll transfer the fields by assigning the variables. Here's a helper function to do it.
#include <string>
#include <string.h>
template <int n> std::string arrayToString(const char(&raw)[n]) {
return std::string(raw, strnlen_s(raw, n));
}
In your function:
Header h;
RawHeader raw;
fileIn.read((char*)&raw, sizeof(raw));
// Now marshal all the fields from the raw header over to the practical header.
h.label = arrayToString(raw.label);
h.st = arrayToString(raw.st);
h.co = arrayToString(raw.co);
h.plusXExtends = arrayToString(raw.plusXExtends);
h.minusXExtends = arrayToString(raw.minusXExtends);
h.plusYExtends = arrayToString(raw.plusYExtends);
It's worth mentioning that you also have the option of keeping the raw structure around and not copying your raw char arrays to std::strings when you read the file. But you must then be certain that, whenever you use the data, you always compute and pass the lengths of the strings to the functions that treat those buffers as string data. (Similar to what my arrayToString helper does anyway.)
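As a rough sketch of that second option, a small helper can compute the length on the spot and hand both pointer and count to the stream, so no terminator is ever needed (the helper name is mine, just for illustration):
#include <algorithm>
#include <cstddef>
#include <iostream>

// Print a fixed-size char field that may or may not be null-terminated:
// stop at the first NUL, or at the end of the array if there is none.
template <std::size_t N>
void printField(const char (&raw)[N])
{
    const char* end = std::find(raw, raw + N, '\0');
    std::cout.write(raw, end - raw); // explicit count, so no scanning past the array
    std::cout << '\n';
}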
I'm trying to output the plaintext contents of this .exe file. It's got plaintext in it like "Changing the code in this way will not affect the quality of the resulting optimized code." and all the other stuff Microsoft puts into .exe files. When I run the following code I get the output M Z E followed by a heart and a diamond. What am I doing wrong?
ifstream file;
char inputCharacter;
file.open("test.exe", ios::binary);
while ((inputCharacter = file.get()) != EOF)
{
cout << inputCharacter << "\n";
}
file.close();
I would use something like std::isprint to make sure the character is printable and not some weird control code before printing it.
Something like this:
#include <cctype>
#include <fstream>
#include <iostream>

int main()
{
    std::ifstream file("test.exe", std::ios::binary);
    char c;
    while (file.get(c)) // don't loop on EOF
    {
        if (std::isprint(static_cast<unsigned char>(c))) // check if printable (the cast avoids UB for negative chars)
            std::cout << c;
    }
}
You have opened the stream in binary, which is good for the intended purpose. However, you print every byte as it is, and some of those bytes are not printable characters, giving the weird output.
Potential solutions:
If you want to print the content of an exe, you'll get more non-printable chars than printable ones. So one approach could be to print the hex value instead:
while ( file.get(inputCharacter ) )
{
cout << setw(2) << setfill('0') << hex << (int)(inputCharacter&0xff) << "\n";
}
Or you could use the debugger approach of displaying the hex value, and then display the char if it's printable or '.' if not:
while (file.get(inputCharacter)) {
cout << setw(2) << setfill('0') << hex << (int)(inputCharacter&0xff)<<" ";
if (isprint(inputCharacter & 0xff))
cout << inputCharacter << "\n";
else cout << ".\n";
}
Well, for the sake of readability, if the exe file contains a real executable, you'd be better off displaying several characters per line ;-)
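A sketch of that idea, printing 16 bytes per line in hex with a column of printable characters at the end of each row (the width and the file name are arbitrary choices):
#include <cctype>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <string>

int main()
{
    std::ifstream file("test.exe", std::ios::binary);
    char c;
    std::string ascii; // printable view of the current row
    int count = 0;
    while (file.get(c)) {
        unsigned int byte = static_cast<unsigned char>(c);
        std::cout << std::setw(2) << std::setfill('0') << std::hex << byte << ' ';
        ascii += (std::isprint(byte) ? c : '.');
        if (++count % 16 == 0) { // end of a 16-byte row: append the text column
            std::cout << ' ' << ascii << '\n';
            ascii.clear();
        }
    }
    if (!ascii.empty()) // flush a final, partial row
        std::cout << ' ' << ascii << '\n';
}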
A binary file is a collection of bytes, and a byte has a range of values 0..255. The printable characters that can safely be "printed" form a much narrower range. Assuming the most basic ASCII encoding, that range is roughly:
32..63
64..95
96..126
plus, maybe, some characters above 128, if your codepage has them
(see an ASCII table).
Every character that falls outside that range may, at least:
print out as invisible
print out as some weird trash
be in fact a control character that will change settings of your terminal
Some terminals support an "end of text" character and will simply stop printing any text afterwards. Maybe you hit that.
I'd say, if you are interested only in the text, then print only the printable characters and ignore the others. Or, if you want everything, write it out in hex form instead.
This worked:
ifstream file;
char inputCharacter;
string Result;
file.open("test.exe", ios::binary);
while (file.get(inputCharacter))
{
if ((inputCharacter > 31) && (inputCharacter < 127))
Result += inputCharacter;
}
cout << Result << endl;
cout << "These are the ascii characters in the exe file" << endl;
file.close();
I have a C++ program that computes populations within a given radius by reading gridded population data from an ASCII file into a large 8640x3432-element vector of doubles. Reading the ASCII data into the vector takes ~30 seconds (looping over each column and each row), while the rest of the program only takes a few seconds. I was asked to speed up this process by writing the population data to a binary file, which would supposedly read in faster.
The ASCII data file has a few header rows that give some data specs like the number of columns and rows, followed by population data for each grid cell, formatted as 3432 rows of 8640 numbers separated by spaces. The population numbers are in mixed formats and can be just 0, a decimal value (0.000685648), or a value in scientific notation (2.687768e-05).
I found a few examples of reading/writing structs containing vectors to binary and tried to implement something similar, but I am running into problems. When I both write and read the vector to/from the binary file in the same program run, it seems to work and gives me all the correct values, but then it ends with either a "Segmentation fault: 11" or a memory allocation error saying a "pointer being freed was not allocated". And if I try to just read the data in from the previously written binary file (without re-writing it in the same program run), it gives me the header variables just fine but segfaults before giving me the vector data.
Any advice on what I might have done wrong, or on a better way to do this, would be greatly appreciated! I am compiling and running on a Mac, and I don't have Boost or other non-standard libraries at present. (Note: I am extremely new at coding and am having to learn by jumping in at the deep end, so I may be missing a lot of basic concepts and terminology -- sorry!)
Here is the code I came up with:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <string>
#include <fstream>
#include <iostream>
#include <vector>

using namespace std;

//Define struct for population file data and initialize one struct variable for reading in ascii (A) and one for reading in binary (B)
struct popFileData
{
    int nRows, nCol;
    vector< vector<double> > popCount; //this will end up having 3432x8640 elements
} popDataA, popDataB;

int main() {
    string gridFname = "sample";
    double dum;
    vector<double> tempVector;

    //open ascii population grid file to stream
    ifstream gridFile;
    gridFile.open(gridFname + ".asc");
    int i = 0, j = 0;
    if (gridFile.is_open())
    {
        //read in header data from file
        string fileLine;
        gridFile >> fileLine >> popDataA.nCol;
        gridFile >> fileLine >> popDataA.nRows;
        popDataA.popCount.clear();
        //read in vector data, point-by-point
        for (i = 0; i < popDataA.nRows; i++)
        {
            tempVector.clear();
            for (j = 0; j < popDataA.nCol; j++)
            {
                gridFile >> dum;
                tempVector.push_back(dum);
            }
            popDataA.popCount.push_back(tempVector);
        }
        //close ascii grid file
        gridFile.close();
    }
    else
    {
        cout << "Population file read failed!" << endl;
    }

    //create/open binary file
    ofstream ofs(gridFname + ".bin", ios::trunc | ios::binary);
    if (ofs.is_open())
    {
        //write struct to binary file then close binary file
        ofs.write((char *)&popDataA, sizeof(popDataA));
        ofs.close();
    }
    else cout << "error writing to binary file" << endl;

    //read data from binary file into popDataB struct
    ifstream ifs(gridFname + ".bin", ios::binary);
    if (ifs.is_open())
    {
        ifs.read((char *)&popDataB, sizeof(popDataB));
        ifs.close();
    }
    else cout << "error reading from binary file" << endl;

    //compare results of reading in from the ascii file and reading in from the binary file
    cout << "File Header Values:\n";
    cout << "Columns (ascii vs binary): " << popDataA.nCol << " vs. " << popDataB.nCol << endl;
    cout << "Rows (ascii vs binary):" << popDataA.nRows << " vs." << popDataB.nRows << endl;
    cout << "Spot Check Vector Values: " << endl;
    cout << "Index 0,0: " << popDataA.popCount[0][0] << " vs. " << popDataB.popCount[0][0] << endl;
    cout << "Index 3431,8639: " << popDataA.popCount[3431][8639] << " vs. " << popDataB.popCount[3431][8639] << endl;
    cout << "Index 1600,4320: " << popDataA.popCount[1600][4320] << " vs. " << popDataB.popCount[1600][4320] << endl;
    return 0;
}
Here is the output when I both write and read the binary file in the same run:
File Header Values:
Columns (ascii vs binary): 8640 vs. 8640
Rows (ascii vs binary):3432 vs.3432
Spot Check Vector Values:
Index 0,0: 0 vs. 0
Index 3431,8639: 0 vs. 0
Index 1600,4320: 25.2184 vs. 25.2184
a.out(11402,0x7fff77c25310) malloc: *** error for object 0x7fde9821c000: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6
And here is the output I get if I just try to read from the pre-existing binary file:
File Header Values:
Columns (binary): 8640
Rows (binary):3432
Spot Check Vector Values:
Segmentation fault: 11
Thanks in advance for any help!
When you write popDataA to the file, you are writing the binary representation of the vector of vectors. However this really is quite a small object, consisting of a pointer to the actual data (itself a series of vectors, in this case) and some size information.
When it's read back in to popDataB, it kinda works! But only because the raw pointer that was in popDataA is now in popDataB, and it points to the same stuff in memory. Things go crazy at the end, because when the memory for the vectors is freed, the code tries to free the data referenced by popDataA twice (once for popDataA, and once again for popDataB.)
The short version is, it's not a reasonable thing to write a vector to a file in this fashion.
So what to do? The best approach is to first decide on your data representation. It will, like the ASCII format, specify what value gets written where, and will include information about the matrix size, so that you know how large a vector you will need to allocate when reading them in.
In semi-pseudo code, writing will look something like:
int nrow=...;
int ncol=...;
ofs.write((char *)&nrow,sizeof(nrow));
ofs.write((char *)&ncol,sizeof(ncol));
for (int i=0;i<nrow;++i) {
    for (int j=0;j<ncol;++j) {
        double val=data[i][j];
        ofs.write((char *)&val,sizeof(val));
    }
}
And reading will be the reverse:
ifs.read((char *)&nrow,sizeof(nrow));
ifs.read((char *)&ncol,sizeof(ncol));
// allocate data-structure of size nrow x ncol
// ...
for (int i=0;i<nrow;++i) {
    for (int j=0;j<ncol;++j) {
        double val;
        ifs.read((char *)&val,sizeof(val));
        data[i][j]=val;
    }
}
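To fill in the allocation step in the reading sketch, assuming the destination is a nested vector like the popCount member from the question, one line is enough:
// nrow rows, each holding ncol doubles, value-initialised to 0.0
std::vector<std::vector<double>> data(nrow, std::vector<double>(ncol));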
All that said though, you should consider not writing things into a binary file like this. These sorts of ad hoc binary formats tend to live on, long past their anticipated utility, and tend to suffer from:
Lack of documentation
Lack of extensibility
Format changes without versioning information
Issues when using saved data across different machines, including endianness problems, different default sizes for integers, etc.
Instead, I would strongly recommend using a third-party library. For scientific data, HDF5 and netcdf4 are good choices which address all of the above issues for you, and come with tools that can inspect the data without knowing anything about your particular program.
Lighter-weight options include the Boost serialization library and Google's protocol buffers, but these address only some of the issues listed above.
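For illustration only, here is roughly what the Boost serialization route looks like; this is a sketch that assumes Boost is installed (the question notes it currently is not), and Boost's binary archives are themselves not portable across platforms, so this mainly buys convenience rather than solving the portability issues above:
#include <fstream>
#include <string>
#include <vector>
#include <boost/archive/binary_iarchive.hpp>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/serialization/vector.hpp>

// Write and read the whole matrix in one statement each; the archive records the sizes.
void save(const std::vector<std::vector<double>>& data, const std::string& path)
{
    std::ofstream ofs(path, std::ios::binary);
    boost::archive::binary_oarchive oa(ofs);
    oa << data;
}

void load(std::vector<std::vector<double>>& data, const std::string& path)
{
    std::ifstream ifs(path, std::ios::binary);
    boost::archive::binary_iarchive ia(ifs);
    ia >> data;
}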
I did some searching on this site and on Google as well, but I couldn't understand a lot of the code I've seen, and I'm hoping to find more direct help here. I'm only two semesters into C++, and I have a side project I'd like to do for my boss.
He generates a CSV file of call logs, and I want to be able to retrieve certain lines from the log and then calculate and display data from them.
I'm not sure of the exact questions I need to ask, but here's my code where I tried to start getting at the data and ran into problems (my programming knowledge is fairly limited due to lack of time and experience :)
#include <iostream>
#include <fstream>
#include <iomanip>

using namespace std;

int main()
{
    //opens the csv file.
    ifstream gwfile;
    gwfile.open("log.csv");
    if(!gwfile) { // file couldn't be opened
        cout << "FAILED: file could not be opened" << endl << "Press enter to close.";
        cin.get();
        return 0;
    } else
        cout << "SUCCESSFULLY opened file!\n";
    cout << "-------------------------------------------------------------------------------\n\n\n";

    long int SIZE = 0;
    char data[SIZE];
    cout << "This is data SIZE:" << data[SIZE] << endl;

    //in the csv I'm trying to read only the lines that start with "Voice", as those are the only valid data we need.
    //I would also like to display the headings in the very first line.
    while( !gwfile.eof() ){
        //This is where I'm trying to accept only the lines starting with "Voice"
        //if(data[SIZE] == "Voice"){
        for( int i=0; i!=","; i++){
            cout << "This is i: " << i << endl; //testing purposes.
        }
        //}
        // getline(gwfile, data, '');
        // cout << data[0];
    }
    return 0;
}
Let’s begin with the obvious
long int SIZE = 0;
char data[SIZE];
cout << "This is data SIZE:" << data[SIZE] << endl;
You are creating an array of size 0, then reaching for its first member: data[0]. This cannot work. Give your array a size that is large enough to hold the data you want to process, or use a dynamically resizable container (such as std::vector) to deal with it.
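To sketch the getline-based approach the question is aiming at (the "Voice" prefix comes from the comments in the question's code; the column handling is an assumption):
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

int main()
{
    std::ifstream gwfile("log.csv");
    if (!gwfile) {
        std::cout << "FAILED: file could not be opened\n";
        return 0;
    }
    std::string line;
    while (std::getline(gwfile, line)) { // read one full line at a time
        if (line.compare(0, 5, "Voice") != 0) // keep only lines starting with "Voice"
            continue;
        std::istringstream row(line);
        std::string field;
        std::vector<std::string> fields;
        while (std::getline(row, field, ',')) // split the line on commas
            fields.push_back(field);
        // fields[0], fields[1], ... now hold the columns of this call-log entry
        std::cout << "columns in this line: " << fields.size() << '\n';
    }
}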