ifstream doesn't load whole file - c++

The title of this topic is probably inaccurate, but I have no idea how else to name this issue.
Some background: I am programming a game whose 3D surface is divided into chunks. I came up with a saving mechanism in which all chunk objects, with their properties, are saved in compressed form to an unordered map, which is then serialized to a file, so that parts of the world can be loaded and saved efficiently as needed.
When loading, the file is read, deserialized to an unordered map, and the strings are converted to chunk objects in real time.
That is the plan, but I have run into serious problems implementing it.
I searched everywhere I could, without result. While experimenting, I wrote a small test program like this:
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
int main()
{
std::ifstream reader("output.dat", std::ios::binary);
std::string data;
reader>>data;
reader.close();
std::cout<<data.size()<<std::endl;
std::stringstream ss;
ss.str(data);
unsigned char id_prefix=0, zone_prefix=1;
while (ss.peek()!=EOF)
{
unsigned char type;
ss>>type;
if (type==id_prefix)
{
unsigned char tempx, tempy, tempz;
unsigned short tempid;
if (!(ss>>tempx)) std::cout<<"reading of x failed."<<std::endl;
if (!(ss>>tempy)) std::cout<<"Reading of y failed"<<std::endl;
if (!(ss>>tempz)) std::cout<<"Reading of z failed."<<std::endl;
if (!(ss>>tempid)) std::cout<<"Reading of id failed, position is "+std::to_string(ss.tellg())+", values are "+std::to_string(type)+" "+std::to_string(tempx)+" "+std::to_string(tempy)+" "+std::to_string(tempz)<<std::endl;
std::cout<<(int)tempx<<" "<<(int)tempy<<" "<<(int)tempz<<" "<<(int)tempid<<std::endl;
}
else if (type==zone_prefix)
{
unsigned char tempx, tempy, tempz;
unsigned int tempzone;
ss>>tempx;
ss>>tempy;
ss>>tempz;
ss>>tempzone;
std::cout<<(int)tempx<<" "<<(int)tempy<<" "<<(int)tempz<<" "<<(int)tempzone<<std::endl;
}
}
}
output.dat is a file containing one experimental decompressed chunk, used to reproduce the parsing process in the game.
You can download it from:
https://www.dropbox.com/s/mljsb0t6gvfedc5/output.dat?dl=1
if you want; it is about 160 kB in size. And here is the first problem.
It is probably just my own mistake, but I thought that when I open an ifstream with std::ios::binary and then extract its content to a string, it would load the whole file. Instead it loaded only the first 46 bytes.
That is the first problem. Next, in the game I used a different method to load the data, which worked, but then the stringstream processing shown in the lower part of the code also failed at around this position.
I guess there are also problems with the data. As you can see, the format is: a uchar type (indicating whether the following bytes refer to an id or a zone), the coordinates (each a uchar), and then a ushort in the case of an id or a uint in the case of a zone.
But when I looked into the file with my own binary editor, the id was shown as one byte only, not two as I expected for a short value. Saving was also done with a stringstream, in the form:
unsigned short tempid=3; //example value
ss<<tempid;
and in the resulting file this was represented as 51 (in one byte), which is the ASCII code for '3', so I am a little confused, or more than a little.
Can you please help me with this? I am using MinGW g++ 4.9.3 on Windows 7 64-bit.
Thanks much!
Edit from 1.1.2017
Now the whole file is read into the stringstream, but extraction of values still fails.
If >> extraction reads up to the next whitespace, how does that work for extraction to an unsigned short, for example?
I played with the code a bit, for example changing unsigned short tempid to unsigned char tempid.
And the output does not make sense to me.
In short, bytes like:
0;1;0;0;51
were read as type 0, x 1, y 0, z 0 and id 3, which is correct, even though I don't understand why there is a 51 here instead of a 3.
The earlier write to the stream looked like this:
unsigned short idtowrite=3;
ss<<idtowrite;
But when I changed unsigned short tempid to unsigned char tempid, it was read as type 0, x 1, y 0, z 0 and id 51, which is not correct, although it is what I would expect from the written file.
I wouldn't worry about it if the whole stream were read correctly, but for some reason everything is correct up to 0;8;0;0;51, and the very next record, 0;9;0;0;51, fails, with x read as 0, y as 0 and z as 51, and EOF is set.
I wonder whether the read skipped a byte, but I don't see any reason why it would.
Can you please recommend an efficient and working way to store values in a stringstream?
Thanks in advance!

std::ios::binary only has the effect of suppressing end-of-line conversion (so that e.g. \r\n in file is not converted to just \n in memory). It is certainly correct to supply this when dealing with binary files.
However, >> is still a formatted input function, which skips leading whitespace, terminates at whitespace and so on.
If you want to actually read the file as binary data, you must use the read function on the stream object.
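A minimal sketch of the difference, assuming a record layout like the one in the question (one type byte, three coordinate bytes, then a two-byte id). The names Record, write_record, read_record and read_whole_file are made up for illustration, and storing the id in little-endian order is an arbitrary choice. Note that byte value 9 is a tab character, which >> treats as whitespace and skips, whereas read() returns it like any other byte.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <istream>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical record layout: type, x, y, z as single bytes, then a 16-bit id.
struct Record {
    std::uint8_t type, x, y, z;
    std::uint16_t id;
};

// Unformatted output: writes exactly 6 bytes, id in little-endian order.
void write_record(std::ostream& out, const Record& r) {
    const char bytes[6] = {
        static_cast<char>(r.type), static_cast<char>(r.x),
        static_cast<char>(r.y),    static_cast<char>(r.z),
        static_cast<char>(r.id & 0xFF), static_cast<char>(r.id >> 8)};
    out.write(bytes, sizeof bytes);
}

// Unformatted input: reads exactly 6 bytes and never skips "whitespace" bytes.
bool read_record(std::istream& in, Record& r) {
    unsigned char bytes[6];
    if (!in.read(reinterpret_cast<char*>(bytes), sizeof bytes)) return false;
    r.type = bytes[0];
    r.x = bytes[1];
    r.y = bytes[2];
    r.z = bytes[3];
    r.id = static_cast<std::uint16_t>(bytes[4] | (bytes[5] << 8));
    return true;
}

// Loads an entire file into a string, byte for byte (empty if it cannot be opened).
std::string read_whole_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

With this, a stringstream filled by write_record can be drained with a loop such as `while (read_record(ss, r)) { ... }`, and an id of 3 is stored as the two bytes 3 and 0 rather than as the character '3'.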

Related

What is the best solution for writing numbers into a file and then reading them back?

I have 640*480 numbers. I need to write them into a file. I will need to read them later. What is the best solution? Numbers are between 0 - 255.
I think the best solution is to write them in binary (8 bits each). I wrote the numbers into a text file and it now looks like 1011111010111110 ..... so there is no ambiguity about where each number starts and ends.
How am I supposed to read them from the file?
I am using C++.
It's not a good idea to write bit values as the characters 1 and 0 to a text file. The file will be 8 times bigger than necessary, because every character in a text file takes at least 1 byte, and 1 byte = 8 bits. A number in the range 0-255 fits in exactly one byte, so store bytes: your file will then be 640*480 bytes instead of 640*480*8. If you need the individual bits, use the bitwise operators of your programming language. Reading bytes is also much easier. Use a binary file to save your data.
Presumably you have some sort of data structure representing your image, which somewhere inside holds the actual data:
class pixmap
{
public:
// stuff...
private:
std::unique_ptr<std::uint8_t[]> data;
};
So you can add a new constructor which takes a filename and reads bytes from that file:
pixmap(const std::string& filename)
{
constexpr int SIZE = 640 * 480;
// Open an input file stream and set it to throw exceptions:
std::ifstream file;
file.exceptions(std::ios_base::badbit | std::ios_base::failbit);
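// Open in binary mode so no end-of-line translation is applied to the pixel bytes
file.open(filename.c_str(), std::ios::binary);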
// Create a unique ptr to hold the data: this will be cleaned up
// automatically if file reading throws
std::unique_ptr<std::uint8_t[]> temp(new std::uint8_t[SIZE]);
// Read SIZE bytes from the file
file.read(reinterpret_cast<char*>(temp.get()), SIZE);
// If we get to here, the read worked, so we move the temp data we've just read
// into where we'd like it
data = std::move(temp); // or std::swap(data, temp) if you prefer
}
I realise I've assumed some implementation details here (you might not be using a std::unique_ptr to store the underlying image data, though you probably should be) but hopefully this is enough to get you started.
You can write each number between 0 and 255 to the file as a single char value.
See the code below. In this example I am printing the integer 70 as a char.
This prints 'F' on the console.
Similarly, you can read it back as a char and then convert that char to an integer.
#include <stdio.h>
int main()
{
int i = 70;
char dig = (char)i;
printf("%c", dig);
return 0;
}
This way you can restrict the file size.
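The same idea applied to the whole 640*480 data set might look like the sketch below, which writes every value as one raw byte and reads the bytes back. The function names save_bytes and load_bytes and the use of std::vector are assumptions for illustration, not part of the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Writes each value as one raw byte; returns false on I/O failure.
bool save_bytes(const std::string& path, const std::vector<std::uint8_t>& values) {
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(values.data()),
              static_cast<std::streamsize>(values.size()));
    out.flush();  // surface write errors before reporting success
    return static_cast<bool>(out);
}

// Reads exactly `count` bytes; returns an empty vector if the file is too short
// or cannot be opened.
std::vector<std::uint8_t> load_bytes(const std::string& path, std::size_t count) {
    std::vector<std::uint8_t> values(count);
    std::ifstream in(path, std::ios::binary);
    if (!in.read(reinterpret_cast<char*>(values.data()),
                 static_cast<std::streamsize>(count))) {
        values.clear();
    }
    return values;
}
```

Calling `load_bytes(path, 640 * 480)` then yields one number per element, with no parsing needed to find where each number starts and ends.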

Reading in raw encoded nrrd data file into double

Does anyone know how to read in a file with raw encoding? So stumped.... I am trying to read in floats or doubles (I think). I have been stuck on this for a few weeks. Thank you!
File that I am trying to read from:
http://www.sci.utah.edu/~gk/DTI-data/gk2/gk2-rcc-mask.raw
Description of raw encoding:
hello://teem.sourceforge.net/nrrd/format.html#encoding (change hello to http to go to page)
- "raw" - The data appears on disk exactly the same as in memory, in terms of byte values and byte ordering. Produced by write() and fwrite(), suitable for read() or fread().
Info of file:
http://www.sci.utah.edu/~gk/DTI-data/gk2/gk2-rcc-mask.nhdr - I think the only things that matter here are the big endian (still trying to understand what that means from google) and raw encoding.
My current approach, uncertain if it's correct:
//Function ripped off from example of c++ ifstream::read reference page
void scantensor(string filename){
ifstream tdata(filename, ifstream::binary); // not sure if I should put ifstream::binary here
// other things I tried
// ifstream tdata(filename) ifstream tdata(filename, ios::in)
if(tdata){
tdata.seekg(0, tdata.end);
int length = tdata.tellg();
tdata.seekg(0, tdata.beg);
char* buffer = new char[length];
tdata.read(buffer, length);
tdata.close();
double* d;
d = (double*) buffer;
} else cerr << "failed" << endl;
}
/* P.S. I attempted to print the first 100 elements of the array.
Then I print 100 other elements at some arbitrary array indices (i.e. 9,900 - 10,000). I actually kept increasing the number of 0's until I ran out of bound at 100,000,000 (I don't think that's how it works lol but I was just playing around to see what happens)
Here's the part that makes me suspicious: ifstream has different constructors, like the ones I tried above.
the first 100 values are always the same.
if I use ifstream::binary, then I get some values for the 100 arbitrary printing
if I use the other two options, then I get -6.27744e+066 for all 100 of them
So for now I am going to assume that ifstream::binary is the correct one. The thing is, I am not sure if the file I provided is how binary files actually look like. I am also unsure if these are the actual numbers that I am supposed to read in or just casting gone wrong. I do realize that my casting from char* to double* can be unsafe, and I got that from one of the threads.
*/
I really appreciate it!
Edit 1: Right now the data being read in using the above method is apparently "incorrect" since in paraview the values are:
Dxx,Dxy,Dxz,Dyy,Dyz,Dzz
[0, 1], [-15.4006, 13.2248], [-5.32436, 5.39517], [-5.32915, 5.96026], [-17.87, 19.0954], [-6.02961, 5.24771], [-13.9861, 14.0524]
It's a 3 x 3 symmetric matrix, so 7 distinct values, 7 ranges of values.
The floats that I am currently parsing from the file right now are very large (i.e. -4.68855e-229, -1.32351e+120).
Perhaps somebody knows how to extract the floats from Paraview?
Since you want to work with doubles, I recommend reading the data from the file into a buffer of doubles:
const long machineMemory = 0x40000000; // 1 GB
FILE* file = fopen("c:\\data.bin", "rb");
if (file)
{
int size = machineMemory / sizeof(double);
if (size > 0)
{
double* data = new double[size];
int read(0);
while (read = fread(data, sizeof(double), size, file))
{
// Process data here (read = number of doubles)
}
delete [] data;
}
fclose(file);
}
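The header linked in the question says the data is big endian, so on a little-endian machine (x86, for example) the bytes of each value have to be reordered after reading, otherwise the numbers come out as garbage. A rough sketch follows, assuming the elements are 32-bit IEEE 754 floats; that is an assumption, and the actual element type should be checked against the type field of the .nhdr header. The names decode_be_float and decode_be_floats are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Decodes one big-endian IEEE 754 single-precision value from 4 bytes.
// Assumes the host's float is a 32-bit IEEE 754 type.
float decode_be_float(const unsigned char* p) {
    const std::uint32_t bits = (static_cast<std::uint32_t>(p[0]) << 24) |
                               (static_cast<std::uint32_t>(p[1]) << 16) |
                               (static_cast<std::uint32_t>(p[2]) << 8) |
                                static_cast<std::uint32_t>(p[3]);
    float value;
    static_assert(sizeof value == sizeof bits, "float must be 32 bits");
    std::memcpy(&value, &bits, sizeof value);  // avoids casting char* to float*
    return value;
}

// Decodes a whole buffer of big-endian floats; a trailing partial value is ignored.
std::vector<float> decode_be_floats(const std::vector<unsigned char>& raw) {
    std::vector<float> out;
    out.reserve(raw.size() / 4);
    for (std::size_t i = 0; i + 4 <= raw.size(); i += 4) {
        out.push_back(decode_be_float(&raw[i]));
    }
    return out;
}
```

If the header turns out to specify doubles, the same approach applies with eight bytes assembled into a std::uint64_t.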

C++ reading leftover data at the end of a file

I am taking input from a file in binary mode using C++; I read the data into unsigned ints, process them, and write them to another file. The problem is that sometimes, at the end of the file, there might be a little bit of data left that isn't large enough to fit into an int; in this case, I want to pad the end of the file with 0s and record how much padding was needed, until the data is large enough to fill an unsigned int.
Here is how I am reading from the file:
std::ifstream fin;
fin.open("filename.whatever", std::ios::in | std::ios::binary);
if(fin) {
unsigned int m;
while(fin >> m) {
//processing the data and writing to another file here
}
//TODO: read the remaining data and pad it here prior to processing
} else {
//output to error stream and exit with failure condition
}
The TODO in the code is where I'm having trouble. After the file input finishes and the loop exits, I need to read in the remaining data at the end of the file that was too small to fill an unsigned int. I need to then pad the end of that data with 0's in binary, recording enough about how much padding was done to be able to un-pad the data in the future.
How is this done, and is this already done automatically by C++?
NOTE: I cannot read the data into anything but an unsigned int, as I am processing the data as if it were an unsigned integer for encryption purposes.
EDIT: It was suggested that I simply read what remains into an array of chars. Am I correct in assuming that this will read in ALL remaining data from the file? It is important to note that I want this to work on any file that C++ can open for input and/or output in binary mode. Thanks for pointing out that I failed to include the detail of opening the file in binary mode.
EDIT: The files my code operates on are not created by anything I have written; they could be audio, video, or text. My goal is to make my code format-agnostic, so I can make no assumptions about the amount of data within a file.
EDIT: ok, so based on constructive comments, this is something of the approach I am seeing, documented in comments where the operations would take place:
std::ifstream fin;
fin.open("filename.whatever", std::ios::in | std::ios::binary);
if(fin) {
unsigned int m;
while(fin >> m) {
//processing the data and writing to another file here
}
//1: declare Char array
//2: fill it with what remains in the file
//3: fill the rest of it until it's the same size as an unsigned int
} else {
//output to error stream and exit with failure condition
}
The question, at this point, is this: is this truly format-agnostic? In other words, are bytes used to measure file size as discrete units, or can a file be, say, 11.25 bytes in size? I should know this, I know, but I've got to ask it anyway.
Are bytes used to measure file size as discrete units, or can a file be, say, 11.25 bytes in size?
No data type can be less than a byte, and your file is represented as an array of char meaning each character is one byte. Thus it is impossible to not get a whole number measure in bytes.
Here is step one, two, and three as per your post:
while (fin >> m)
{
// ...
}
std::ostringstream buffer;
buffer << fin.rdbuf();
std::string contents = buffer.str();
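// Pad with zero bytes up to a multiple of sizeof(unsigned int); keep `padding`
// so the extra bytes can be stripped again later
std::size_t padding = (sizeof(unsigned int) - contents.size() % sizeof(unsigned int)) % sizeof(unsigned int);
contents.append(padding, '\0');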
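Note that formatted extraction with >> parses text digits rather than raw bytes, so for a format-agnostic tool the unformatted read() is the usual choice. The sketch below reads a stream in blocks of sizeof(unsigned int) bytes, uses gcount() to detect a short final block, pads it with zero bytes and records how many were added. The Blocks struct and read_blocks function are hypothetical names, and each word keeps the byte order of the file as laid out in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Blocks {
    std::vector<unsigned int> words;  // stream contents, one word per block
    std::size_t padding = 0;          // zero bytes appended to the last word
};

// Reads the stream in sizeof(unsigned int) byte blocks using unformatted input.
Blocks read_blocks(std::istream& in) {
    Blocks result;
    char buf[sizeof(unsigned int)];
    for (;;) {
        in.read(buf, sizeof buf);
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) break;  // clean end of data, nothing left over
        if (got < sizeof buf) {
            // Short final block: fill the tail with zero bytes and record how many.
            std::memset(buf + got, 0, sizeof buf - got);
            result.padding = sizeof buf - got;
        }
        unsigned int word;
        std::memcpy(&word, buf, sizeof word);
        result.words.push_back(word);
        if (got < sizeof buf) break;  // that was the last block
    }
    return result;
}
```

Storing `padding` alongside the output (in a header or trailer, for example) is what makes it possible to strip the extra bytes again later.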

Endian-ness in a char array containing binary characters

I'm building some code to read a RIFF wav file and I've bumped into something odd.
The first 4 bytes of the file header are the word RIFF in big-endian ascii coding:
0x5249 0x4646
I read this first element using:
char fileID[5] = {0}; // one extra byte so the 4 characters are NUL-terminated for printing
filestream.read(fileID,4);
When I write this to screen the results are as expected:
std::cout << fileID << std::endl;
>> RIFF
Now, the next 4 bytes give the size of the file, but crucially they're little-endian.
So, I write a little function to flip the bytes, based on a union:
int flip4bytes(char* input){
union {int flip_int; char flip_char[4];} flip; // declare a variable, not just a type
flip.flip_char[0] = input[3];
flip.flip_char[1] = input[2];
flip.flip_char[2] = input[1];
flip.flip_char[3] = input[0];
return flip.flip_int;
}
This looks good to me, except when I call it, the value returned is totally wrong. Interestingly, the following code (where the bytes are not reversed!) works correctly:
int flip4bytes(char* input){
union {int flip_int; char flip_char[4];} flip; // declare a variable, not just a type
flip.flip_char[0] = input[0];
flip.flip_char[1] = input[1];
flip.flip_char[2] = input[2];
flip.flip_char[3] = input[3];
return flip.flip_int;
}
This has thoroughly confused me. Is the union somehow reversing the bytes for me?! If not, how are the bytes being converted to int correctly without being reversed?
I think there's some facet of endianness here that I'm ignorant of.
You are simply on a little-endian machine. The "RIFF" string is just a sequence of chars, so it is neither little- nor big-endian. The size field is stored little-endian, so on a little-endian machine you don't need to reverse the bytes; you only need to when running on a big-endian machine.
You need to figure out the endianness of your machine. #include <sys/param.h> will help you do that.
You could also use the fact that network byte order is big-endian. In that case, reverse the bytes into big-endian order as in your first function and pass the result through ntohl (ntohs is the 16-bit version), which converts from network to host order. That should work on any machine you compile the code on.
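Another option that avoids detecting the host's endianness at all is to build the value arithmetically from the individual bytes. This is only a sketch; read_le32 is an illustrative name, not a standard function.

```cpp
#include <cassert>
#include <cstdint>

// Interprets 4 bytes as a little-endian unsigned 32-bit integer.
// Gives the same result on little- and big-endian hosts because it does
// arithmetic on byte values instead of reinterpreting memory.
std::uint32_t read_le32(const char* input) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(input);
    return  static_cast<std::uint32_t>(p[0])        |
           (static_cast<std::uint32_t>(p[1]) << 8)  |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}
```

For a RIFF header, the 4 bytes following "RIFF" can be passed straight to this function after filestream.read().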

How do you extract bytes from a file into simple data types?

I've tried everything I can think of and can't get anything to work. I have a binary file I've written in VB.Net which basically consists of an integer, (in binary of course) that tells me the array size for the following data, then the floats as binary data. The file writes just fine from VB.Net, and I can read it back in through Visual C++ just fine using the following code:
ifstream output("c:\\out.ipv", ios::in | ios::binary);
UInt32 len;
UInt32 *ptr2 = (UInt32*)&len;
output.read((char*)ptr2, 4);
This returns the correct value of 456780, bytes are: 76, 248, 6, 0. When I run the exact same code on my iPad, I get 1043089572. If I use the alternate method below:
NSData *data = [[NSData alloc] initWithContentsOfFile:filePath];
UInt32 num;
const NSRange numV = {0, 4};
[data getBytes:&num range:numV];
This code returns a different value, 124724, and I'm not sure how to read what the exact bytes are that are getting pulled from the file. That's something else I was trying to figure out but couldn't get working. Any idea why the same method that works in Visual C++ won't work on the iPad? I'm really at a loss on this one.
This sounds like an endian issue. You can use any of the functions in <libkern/OSByteOrder.h> to read data in a specified endianness. In your case, you may want to do something like
NSInputStream *istream = [NSInputStream inputStreamWithFileAtPath:filePath];
[istream open]; // the stream must be opened before it can be read
UInt32 num = 0;
if (istream) {
uint8_t buffer[4];
if ([istream read:buffer maxLength:4] == 4) {
num = OSReadLittleInt32(buffer, 0);
} else {
// there weren't 4 bytes in the file
}
} else {
// the file could not be opened
}
OK, something really strange is going on with my data. I just looked at the raw byte values in both Visual c++ and objective-c, and they don't agree at all. I'm only reading the first four bytes of the file and looking at their values. I'm assuming at this point that I'm not reading them in correctly, but I don't know what I'm missing here. The Visual C++ code I'm using to look at the byte values is below:
ifstream input("c:\\out.ipv", ios::in | ios::binary);
Byte tmp[4];
input.read((char*)&tmp[0], 4);
The values in the tmp array are:
76
248
6
0
If I do the same thing in objective-c:
ifstream input([filePath UTF8String], ios::in | ios::binary);
Byte tmp[4];
input.read((char*)&tmp[0], 4);
I get:
164
72
44
62
What gives? I would have at least expected to get the same byte values. The file containing the four bytes I am having trouble with is here: newout1.ipv
EDIT:
I realized where the 164,72,44,62 byte values are coming from: those are the initial values the Byte array has before I put anything in it. For some reason the line:
input.read((char*)&tmp[0], 4);
isn't doing anything. Any ideas why it's not reading from the file like it should?
FINAL EDIT:
OK, I probably shouldn't post the answer to this since it makes me look really dumb, but I don't want anyone reading these posts to get confused. So the arrays and objects were always returning the same values no matter what, which also happened to be whatever values they had when they were allocated. I had one too many .'s in my filename, so it was trying to read in out..ipv rather than out.ipv. Once I fixed the filename, everything worked exactly how I expected it to. Sorry for the confusion, and thanks for everyone's help.
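A failure like this is easier to spot if the stream state is checked after opening and after reading, since read() on a stream that failed to open leaves the buffer untouched. A small sketch of that check, assuming the file starts with a 4-byte little-endian length as described above; read_length_prefix is a made-up helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Reads the leading 4-byte little-endian length field.
// Returns false if the file cannot be opened or holds fewer than 4 bytes,
// in which case `length` is left untouched.
bool read_length_prefix(const std::string& path, std::uint32_t& length) {
    std::ifstream input(path, std::ios::in | std::ios::binary);
    if (!input) return false;  // e.g. a misspelled file name
    unsigned char bytes[4];
    input.read(reinterpret_cast<char*>(bytes), sizeof bytes);
    if (input.gcount() != 4) return false;  // file too short
    length =  static_cast<std::uint32_t>(bytes[0])        |
             (static_cast<std::uint32_t>(bytes[1]) << 8)  |
             (static_cast<std::uint32_t>(bytes[2]) << 16) |
             (static_cast<std::uint32_t>(bytes[3]) << 24);
    return true;
}
```

With the bytes 76, 248, 6, 0 from the question this yields 456780, and with a wrong path such as out..ipv the function reports failure instead of silently returning stale memory.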