Reading data from hard drive into a class - C++

Every time I try to read a file from the hard drive and cast the data into a structure, I end up with the data not mapping properly. Is there a requirement that the number of bytes in a structure used with reinterpret_cast() be a multiple of 4? If not, what am I doing wrong? If so, how do I get around that?
My structure looks like this (the file stores the records in 50-byte chunks):
class stlFormat
{
public:
float normalX, normalY, normalZ;
float x1,y1,z1;
float x2,y2,z2;
float x3,y3,z3;
char byte1, byte2;
};
Rest of my code:
void main()
{
int size;
int numTriangles;
int * header = new int [21]; // size of header
ifstream stlFile ("tetrahedron binary.STL", ios::in|ios::binary|ios::ate);
size = stlFile.tellg(); // get the size of file
stlFile.seekg(0, ios::beg); //read the number of triangles in the file
stlFile.read(reinterpret_cast<char*>(header), 84);
numTriangles = header[20];
stlFormat * triangles = new stlFormat [numTriangles]; //create data array to hold vertex data
stlFile.seekg (84, ios::beg); //read vertex data and put them into data array
stlFile.read(reinterpret_cast<char*>(triangles), (numTriangles * 50));
cout << "number of triangles: " << numTriangles << endl << endl;
for (int i = 0; i < numTriangles; i++)
{
cout << "triangle " << i + 1 << endl;
cout << triangles[i].normalX << " " << triangles[i].normalY << " " << triangles[i].normalZ << endl;
cout << triangles[i].x1 << " " << triangles[i].y1 << " " << triangles[i].z1 << endl;
cout << triangles[i].x2 << " " << triangles[i].y2 << " " << triangles[i].z2 << endl;
cout << triangles[i].x3 << " " << triangles[i].y3 << " " << triangles[i].z3 << endl << endl;
}
stlFile.close();
getchar();
}
Just for you, John, although it's rather incomprehensible. It's in hex format.
73 6f 6c 69 64 20 50 61 72 74 33 20 20 20 20 20
20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
04 00 00 00 ec 05 51 bf ab aa aa 3e ef 5b f1 be
00 00 00 00 00 00 00 00 f3 f9 2f 42 33 33 cb 41
80 e9 25 42 9a a2 ea 41 33 33 cb 41 00 00 00 00
00 00 00 00 00 00 00 00 00 00 ab aa aa 3e ef 5b
71 3f 33 33 4b 42 00 00 00 00 f3 f9 2f 42 33 33
cb 41 80 e9 25 42 9a a2 ea 41 00 00 00 00 00 00
00 00 f3 f9 2f 42 00 00 ec 05 51 3f ab aa aa 3e
ef 5b f1 be 33 33 cb 41 00 00 00 00 00 00 00 00
33 33 cb 41 80 e9 25 42 9a a2 ea 41 33 33 4b 42
00 00 00 00 f3 f9 2f 42 00 00 00 00 00 00 00 00
80 bf 00 00 00 00 33 33 cb 41 00 00 00 00 00 00
00 00 33 33 4b 42 00 00 00 00 f3 f9 2f 42 00 00
00 00 00 00 00 00 f3 f9 2f 42 00 00

Most likely, float has an alignment of four bytes on your system. Because your structure contains floats, the compiler ensures that any normally allocated instance starts at an address that is a multiple of four bytes. The raw size of your structure is 4*12 + 2 = 50 bytes, so it gets rounded up to the next multiple of four; otherwise the second element of an array of this structure would be misaligned. Your struct therefore ends up 52 bytes, throwing off your parsing.
If you need to parse a binary format, it's often a good idea to either use compiler-specific directives to disable alignment, or read one field at a time, to avoid these problems.
For example, on MSVC++ you might try __declspec(align(1)). Edit: Actually, __declspec(align(X)) can only increase alignment restrictions. Oops. You'll need to either load one field at a time, use a packing directive such as #pragma pack (see the answer below), or make the padding part of the binary format.
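A quick way to confirm this on your system is to print the structure's size (a minimal sketch using the class from the question; the value 52 assumes 4-byte float alignment as described above):
#include <iostream>
class stlFormat
{
public:
float normalX, normalY, normalZ;
float x1, y1, z1;
float x2, y2, z2;
float x3, y3, z3;
char byte1, byte2;
};
int main()
{
// With 4-byte float alignment this prints 52, not 50:
// the compiler appends two padding bytes to the structure.
std::cout << "sizeof(stlFormat) = " << sizeof(stlFormat) << std::endl;
return 0;
}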

I used my favorite text editor (EditPad Pro) to save the file you posted in the OP as a binary file called "c:\work\test.bin", edited your code to the following, and it (apparently) produced the correct, expected output. Please try it out.
#include <cstdlib>
#include <iostream>
#include <fstream>
using namespace std;
#pragma pack( push, 1 )
class stlFormat
{
public:
float normalX, normalY, normalZ;
float x1,y1,z1;
float x2,y2,z2;
float x3,y3,z3;
char byte1, byte2;
};
#pragma pack( pop )
struct foo
{
char c, d, e;
};
void main()
{
size_t sz = sizeof(foo); // leftover debug check (three chars, so sz == 3); unused
int size;
int numTriangles;
int * header = new int [21]; // size of header
ifstream stlFile ("c:\\work\\test.bin", ios::in|ios::binary|ios::ate);
size = stlFile.tellg(); // get the size of file
stlFile.seekg(0, ios::beg); //read the number of triangles in the file
stlFile.read(reinterpret_cast<char*>(header), 84);
numTriangles = header[20];
stlFormat * triangles = new stlFormat [numTriangles]; //create data array to hold vertex data
stlFile.seekg (84, ios::beg); //read vertex data and put them into data array
stlFile.read(reinterpret_cast<char*>(triangles), (numTriangles * 50));
cout << "number of triangles: " << numTriangles << endl << endl;
for (int i = 0; i < numTriangles; i++)
{
cout << "triangle " << i + 1 << endl;
cout << triangles[i].normalX << " " << triangles[i].normalY << " " << triangles[i].normalZ << endl;
cout << triangles[i].x1 << " " << triangles[i].y1 << " " << triangles[i].z1 << endl;
cout << triangles[i].x2 << " " << triangles[i].y2 << " " << triangles[i].z2 << endl;
cout << triangles[i].x3 << " " << triangles[i].y3 << " " << triangles[i].z3 << endl << endl;
}
stlFile.close();
getchar();
}

Instead of fiddling with padding and differences between platforms, maybe have a look at serialization to/from binary files? It might be somewhat less performant than reading data straight into memory, but it's far more extensible.

You should be aware that you are throwing portability out the window with that kind of code: your files may be incompatible with new versions of your program if you compile with a different compiler or for a different system.
That said, you might fix this by using sizeof( int[21] ) and numTriangles * sizeof( stlFormat ) rather than hardcoded byte counts. The reason being, as others noted, the alignment bytes your compiler may or may not add.
If this is a program that other people may use or files might be shared, look up serialization.
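For example (a sketch against the code in the question; it keeps the in-memory layout and the byte counts consistent for files written by the same build, while the fixed 50-byte STL record still needs the packing discussed in the other answers):
int header[21];
stlFile.read(reinterpret_cast<char*>(header), sizeof header); // 84 bytes, computed
int numTriangles = header[20];
stlFormat * triangles = new stlFormat [numTriangles];
stlFile.read(reinterpret_cast<char*>(triangles), numTriangles * sizeof(stlFormat));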

IMO you really ought to read the triangles field by field (explicit deserialization) instead of casting bytes. Doing so will help you avoid portability and performance problems. If you're doing a lot of calculations with those triangles after you read them, the performance hit from a packed (unaligned) memory layout can be non-trivial.
Replace the line "stlFile.read(reinterpret_cast<char*>(triangles), (numTriangles * 50));" with this:
for (int i = 0; i < numTriangles; i++)
{
stlFile.read((char*)&triangles[i].normalX, sizeof(float));
stlFile.read((char*)&triangles[i].normalY, sizeof(float));
stlFile.read((char*)&triangles[i].normalZ, sizeof(float));
stlFile.read((char*)&triangles[i].x1, sizeof(float));
stlFile.read((char*)&triangles[i].y1, sizeof(float));
stlFile.read((char*)&triangles[i].z1, sizeof(float));
stlFile.read((char*)&triangles[i].x2, sizeof(float));
stlFile.read((char*)&triangles[i].y2, sizeof(float));
stlFile.read((char*)&triangles[i].z2, sizeof(float));
stlFile.read((char*)&triangles[i].x3, sizeof(float));
stlFile.read((char*)&triangles[i].y3, sizeof(float));
stlFile.read((char*)&triangles[i].z3, sizeof(float));
stlFile.read(&triangles[i].byte1, 1);
stlFile.read(&triangles[i].byte2, 1);
}
It takes a little more code and a little more time to read in the triangles, but you'll avoid a few potential headaches.
Note that writing triangles also requires similar code to avoid inadvertently writing out some padding.
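A sketch of what that write loop could look like ("output.STL" is a hypothetical file name; copying the floats into a plain array first guarantees no padding bytes reach the file):
ofstream out ("output.STL", ios::out|ios::binary); // hypothetical output file
for (int i = 0; i < numTriangles; i++)
{
const stlFormat& t = triangles[i];
const float fields[12] = { t.normalX, t.normalY, t.normalZ,
t.x1, t.y1, t.z1,
t.x2, t.y2, t.z2,
t.x3, t.y3, t.z3 };
out.write(reinterpret_cast<const char*>(fields), sizeof fields); // exactly 48 bytes, no padding
out.write(&t.byte1, 1);
out.write(&t.byte2, 1);
}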

I think the problem is not so much the reading of each individual triangle as that the triangle array isn't laid out as you think. There appear to be 50 bytes in each struct, but the allocated memory is almost certainly laid out as if the structs were 52 bytes. Consider reading in each struct individually.
Two more points:
First, there is no such thing as void main in C++. Use int main().
Second, you seem to be leaking memory: nothing you allocate with new[] is ever deleted. You'd be better off in general using std::vector.
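A sketch of the question's allocations done with std::vector instead (the packing caveats from the other answers still apply to the read):
#include <vector>
std::vector<int> header(21);
stlFile.read(reinterpret_cast<char*>(header.data()), 84);
int numTriangles = header[20];
std::vector<stlFormat> triangles(numTriangles);
stlFile.read(reinterpret_cast<char*>(triangles.data()), numTriangles * 50);
// No delete[] anywhere: both vectors release their memory automatically.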

Storing a struct all at once isn't portable unless you take great care with compiler-specific packing directives, and even then not every compiler and architecture can produce the same binary layout. Storing one field at a time (e.g. a floating-point number) is better, but still isn't portable because of endianness issues and possibly differing data types (e.g. what is sizeof(long) on your system?).
In order to save integers safely and portably, you have to format them a byte at a time into a char buffer that is then written out to a file. E.g.
char buf[100]; // Extra space for more values (instead of only 4 bytes)
// Write a 32 bit integer value into buf, using big endian order
buf[0] = value >> 24; // The most significant byte
buf[1] = value >> 16;
buf[2] = value >> 8;
buf[3] = value; // The least significant byte
Similarly, reading back has to be done a byte at a time:
// Converting the pointer to unsigned to avoid sign extension issues
unsigned char* ubuf = reinterpret_cast<unsigned char*>(buf);
value = ubuf[0] << 24 | ubuf[1] << 16 | ubuf[2] << 8 | ubuf[3];
If little endian order is desired, invert the indexing order of buf and ubuf.
Because no pointer casting between integer types and char is done, the code is fully portable. Doing the same for floating-point types requires extra caution so that the value can be handled as an integer and bit shifting works; I won't cover that in detail here.
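For completeness, a sketch of one common approach for floats (assuming 32-bit IEEE-754 float, and using memcpy rather than a pointer cast to sidestep aliasing problems):
#include <cstdint>
#include <cstring>
void write_float_be(char* buf, float f)
{
uint32_t bits;
std::memcpy(&bits, &f, sizeof bits); // grab the float's bit pattern
buf[0] = static_cast<char>(bits >> 24); // most significant byte first
buf[1] = static_cast<char>(bits >> 16);
buf[2] = static_cast<char>(bits >> 8);
buf[3] = static_cast<char>(bits);
}
float read_float_be(const char* buf)
{
const unsigned char* u = reinterpret_cast<const unsigned char*>(buf);
uint32_t bits = (uint32_t(u[0]) << 24) | (uint32_t(u[1]) << 16)
| (uint32_t(u[2]) << 8) | uint32_t(u[3]);
float f;
std::memcpy(&f, &bits, sizeof f); // rebuild the float from its bits
return f;
}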
While this solution seems extremely painful to use, you only need to write a few helper functions to make it tolerable. Alternatively, especially if the exact format used does not matter to you, you can use an existing serialization library. Boost.Serialization is a rather nice library for that.

Related

How can I display an integer variable as its intended ASCII letters in `cout` when parsing a binary file?

When I use hexdump -C on the command line to examine this MIDI file, I can see that some bytes of this binary file are ASCII letters meant to be human-readable text.
00000000 4d 54 68 64 00 00 00 06 00 01 00 08 00 78 4d 54 |MThd.........xMT|
000024f0 2f 00 4d 54 72 6b 00 00 00 19 00 ff 21 01 00 00 |/.MTrk......!...|
00002500 ff 03 0c 62 79 20 42 65 65 74 68 6f 76 65 6e 00 |...by Beethoven.|
00002510 ff 2f 00 4d 54 72 6b 00 00 00 18 00 ff 21 01 00 |./.MTrk......!..|
For debugging purposes, when I have a variable that holds an integer that I know represents a string of ASCII characters, I would like to simply display the ASCII characters in cout.
In this case, the first four bytes of the file are 0x4d546864, which represent the letters MThd.
uint32_t n32Bits = 0;
ifs.read((char*)&n32Bits, sizeof(uint32_t));
n32Bits = BigEndianToLittleEndian(n32Bits); // reverse byte order
This is the integer:
std::cout << "n32Bits: " << n32Bits <<std::endl; // 1297377380
I can easily display it as hex:
std::cout << "n32Bits: " << std::hex << n32Bits <<std::endl; // 4d546864
Now, I want this line to output the letters MThd, just like hexdump does:
std::cout << "n32Bits: " << std::ascii << n32Bits <<std::endl; // compile error.
Isn't there some simple built-in way to dump ASCII letters from integers that represent ASCII letters?
There is no formatting spec like std::ascii, but there is a string constructor you can use:
std::string int2str((char*)&n32Bits, 4);
std::cout << "n32Bits: " << int2str << std::endl;
This constructor takes a char buffer and a length. (Build the string from the value as it was read from the file: on a little-endian machine the bytes of the endian-swapped value sit in memory in reverse order, so the letters would come out as "dhTM".)
There is no built-in function to print raw bytes as an ASCII string the way a hex dump does. You will have to do that yourself manually, e.g.:
#include <algorithm>
#include <iterator>
#include <cctype>
#include <cstring>
char buffer[sizeof(n32Bits)];
std::memcpy(buffer, &n32Bits, sizeof(n32Bits));
std::transform(std::begin(buffer), std::end(buffer), std::begin(buffer),
[](unsigned char ch){ return std::isprint(ch) ? static_cast<char>(ch) : '.'; }
);
std::cout << "n32Bits: ";
std::cout.write(buffer, sizeof(buffer));
std::cout << std::endl;
int a = 0x4d546864;
// swap_bytes(a); // may be needed, depending on the host byte order
int b[] = {a, 0}; // the second element supplies a null terminator
cout << (char*)b << endl;

converting a string read from binary file to integer

I have a binary file. I am reading 16 bytes at a time using fstream.
I want to convert them to an integer. I tried atoi, but it didn't work.
In Python this can be done by converting to a byte string with stringobtained.encode('utf-8') and then converting to int with int(bytestring.hex(), 16). Do we have to follow such elaborate steps as in Python, or is there a way to convert it directly?
ifstream file(binfile, ios::in | ios::binary | ios::ate);
if (file.is_open())
{
size = file.tellg();
memblock = new char[size];
file.seekg(0, ios::beg);
while (!file.eof())
{
file.read(memblock, 16);
int a = atoi(memblock); // doesnt work 0 always
cout << a << "\n";
memset(memblock, 0, sizeof(memblock));
}
file.close();
Edit:
This is the sample contents of the file.
53 51 4C 69 74 65 20 66 6F 72 6D 61 74 20 33 00
04 00 01 01 00 40 20 20 00 00 05 A3 00 00 00 47
00 00 00 2E 00 00 00 3B 00 00 00 04 00 00 00 01
I need to read it as 16 byte i.e. 32 hex digits at a time.(i.e. one row in the sample file content) and convert it to integer.
so when reading 53 51 4C 69 74 65 20 66 6F 72 6D 61 74 20 33 00, i should get, 110748049513798795666017677735771517696
But I couldn't do it. I always get 0, even after trying strtoull. Am I reading the file wrong, or what am I missing?
You have a number of problems here. First is that C++ doesn't have a standard 128-bit integer type. You may be able to find a compiler extension, see for example Is there a 128 bit integer in gcc? or Is there a 128 bit integer in C++?.
Second is that you're trying to decode raw bytes instead of a character string. atoi will stop at the first non-digit character it runs into, which 246 times out of 256 will be the very first byte, thus it returns zero. If you're very unlucky you will read 16 valid digits and atoi will start reading uninitialized memory, leading to undefined behavior.
You don't need atoi anyway, your problem is much simpler than that. You just need to assemble 16 bytes into an integer, which can be done with shifting and or operators. The only complication is that read wants a char type which will probably be signed, and you need unsigned bytes.
ifstream file(binfile, ios::in | ios::binary);
char memblock[16];
while (file.read(memblock, 16))
{
uint128_t a = 0; // stand-in type; e.g. unsigned __int128 with the GCC/Clang extension above
for (int i = 0; i < 16; ++i)
{
a = (a << 8) | (static_cast<unsigned int>(memblock[i]) & 0xff);
}
cout << a << "\n";
}
file.close();
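Note that if you use a compiler extension such as GCC/Clang's unsigned __int128 for uint128_t, cout cannot print it directly; there is no operator<< for 128-bit integers. A sketch of a manual conversion (assumes GCC or Clang):
#include <string>
std::string u128_to_string(unsigned __int128 v)
{
if (v == 0) return "0";
std::string s;
while (v != 0)
{
s.insert(s.begin(), static_cast<char>('0' + static_cast<int>(v % 10))); // prepend the low digit
v /= 10;
}
return s;
}
// Usage: cout << u128_to_string(a) << "\n";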
If the number is binary, what you want is:
short value;
file.read(reinterpret_cast<char*>(&value), sizeof(value));
Depending upon how the file was written and your processor, you may have to reverse the bytes in value using bit operations.
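For a 16-bit value, that reversal could be written as (a sketch):
// Swap the two bytes of a 16-bit value read in the "wrong" endianness.
unsigned short u = static_cast<unsigned short>(value);
value = static_cast<short>((u << 8) | (u >> 8));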

Disable alignment on a 64-bit structure

I'm trying to align my structure and make it as small as possible using bit fields. I have to send this data back to a client, which will examine the fields to set a few data members.
The size of the structure is indeed the same, but when I set members it does not work at all.
Here's some example code:
#pragma pack(push, 1)
struct PW_INFO
{
char hash[16]; //Does not matter
uint32_t number; //Does not matter
uint32_t salt_id : 30; //Position: 0 bits
uint32_t enc_level : 7; //Position: 30 bits
uint32_t delta : 27; //Position: 37 bits
}; //Total size: 28 bytes
#pragma pack(pop)
void int64shrl(uint64_t& base, uint32_t to_shift, uint32_t position)
{
uint64_t res = static_cast<uint64_t>(to_shift);
res = Int64ShllMod32(res, position);
base |= res;
}
int32_t main()
{
std::cout << "Size of PW_INFO: " << sizeof(PW_INFO) << "\n"; //Returns 28 as expected (16 + sizeof(uint32_t) + 8)
PW_INFO pw = { "abc123", 0, 0, 0, 0 };
pw.enc_level = 105;
uint64_t base{ 0 };
&base; //debug purposes
int64shrl(base, 103, 30);
return 0;
}
Here's where it gets weird: setting the "enc_level" field (which starts 30 bits into the bit field) will yield the following result in memory:
0x003FFB8C 61 62 63 31 32 33 00 00 abc123..
0x003FFB94 00 00 00 00 00 00 00 00 ........
0x003FFB9C 00 00 00 00 00 00 00 00 ........
0x003FFBA4 69 00 00 00 i...
(Only the last 8 bytes are of concern since they represent the bit field.)
But Int64ShllMod32 returns a correct result (the remote client understands it perfectly):
0x003FFB7C 00 00 00 c0 19 00 00 00 ...À....
I'm guessing it has to do with alignment; if so, how would I completely get rid of it? It seems that even though the overall size is correct, the compiler still aligns the bit fields somehow, despite the 1-byte packing the #pragma directive requests.
More information:
I use Visual Studio 2015 and its compiler.
I am not trying to write those in a different format, the reason I'm asking this is that I do NOT want to use my own format. They are reading from 64 bit bitfields everywhere, I don't have access to the source code but I see a lot of calls to Int64ShrlMod32 (from what I read, this is what the compiler produces when dealing with 8 byte structures).
The actual bitfield starts at "salt_id". 30 + 7 + 27 = 64 bits, I hope it is clearer now.
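For what it's worth, a sketch of one layout that would match those expected positions (an assumption, not verified against the OP's client): MSVC does not let a bit field straddle two allocation units of its declared type, so with uint32_t units the 7-bit enc_level cannot start at bit 30 and gets moved to a fresh uint32_t. Declaring the fields with a 64-bit type puts all 30 + 7 + 27 = 64 bits in one unit:
#pragma pack(push, 1)
struct PW_INFO
{
char hash[16];
uint32_t number;
uint64_t salt_id : 30;   // bits 0..29
uint64_t enc_level : 7;  // bits 30..36
uint64_t delta : 27;     // bits 37..63
}; // still 28 bytes: 16 + 4 + 8
#pragma pack(pop)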

Bit reading puzzle (reading a binary file in C++)

I am trying to read the file 'train-images-idx3-ubyte', which can be found here along with the corresponding file format description (at the bottom of the webpage). When I look at the bytes with od -t x1 train-images-idx3-ubyte | less (hexadecimal, bytewise), I get the following output:
address bytes
0000000 00 00 08 03 00 00 ea 60 00 00 00 1c 00 00 00 1c
0000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
...
This is what I expected according to the format description. But when I try to read the data with C++, I run into a problem. What I do is this:
std::fstream trainingData("minst/train-images-idx3-ubyte",
std::ios::in | std::ios::binary);
int8_t zero = 0, encoding = 0, dimension = 0;
int32_t samples = -1;
trainingData >> zero >> zero >> encoding >> dimension;
trainingData >> samples;
debugLogger << "training set image file, encoding = "
<< (int) encoding << ", dimension = "
<< (int) dimension << ", items = " << (int) samples << "\n";
But the output of these few lines of code is:
training set image file, encoding = 8, dimension = 3, items = 0
Everything but the number of instances (items, samples) is correct. I tried reading the next 4 bytes as int8_t and that gave me at least the same result as od. I cannot imagine how samples can be 0. What I actually wanted to read here was 10,000. Maybe you've got a clue?
As mentioned in other answers, you need to use unformatted input, i.e. istream::read(...) instead of operator>>. Translating your code above to use read yields:
trainingData.read(reinterpret_cast<char*>(&zero), sizeof(zero));
trainingData.read(reinterpret_cast<char*>(&zero), sizeof(zero));
trainingData.read(reinterpret_cast<char*>(&encoding), sizeof(encoding));
trainingData.read(reinterpret_cast<char*>(&dimension), sizeof(dimension));
trainingData.read(reinterpret_cast<char*>(&samples), sizeof(samples));
Which gets you most of the way there - but 00 00 ea 60 looks like it's in big-endian format, so you'll have to pass it through ntohl (declared in <arpa/inet.h> on POSIX systems, <winsock2.h> on Windows) to make sense of it if you're running on an Intel-based machine:
samples = ntohl(samples);
which gives encoding = 8, dimension = 3, items = 60000.
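If you'd rather not depend on ntohl, a sketch of an endian-independent alternative that assembles the count from the four bytes explicitly:
unsigned char b[4];
trainingData.read(reinterpret_cast<char*>(b), sizeof(b));
samples = (int32_t(b[0]) << 24) | (int32_t(b[1]) << 16)
| (int32_t(b[2]) << 8) | int32_t(b[3]); // big-endian bytes to native int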
You are using formatted input (operator>>), which interprets the bytes as text and will give you wrong results. Reading with unformatted input (istream::read) will produce the correct results.

Binary File interpretation

I am reading in a binary file (in C++), and the header looks something like this (printed in hexadecimal):
43 27 41 1A 00 00 00 00 23 00 00 00 00 00 00 00 04 63 68 72 31 FFFFFFB4 01 00 00 04 63 68 72 32 FFFFFFEE FFFFFFB7
when printed out using:
std::cout << hex << (int)mem[c];
Is there an efficient way to store 23 which is the 9th byte(?) into an integer without using stringstream? Or is stringstream the best way?
Something like
int n= mem[8]
I want to store 23 in n not 35.
You did store 23 in n. You only see 35 because you are outputting it with a routine that converts it to decimal for display. If you could look at the binary data inside the computer, you would see that it is in fact a hex 23.
You will get the same result as if you did:
int n=0x23;
(What you might think you want is impossible. What number should be stored in n for 1E? The only corresponding number is 30, which is what you are getting.)
Do you mean you want to treat the value as binary-coded decimal? In that case, you could convert it using something like:
unsigned char bcd = mem[8];
unsigned char ones = bcd % 16;
unsigned char tens = bcd / 16;
if (ones > 9 || tens > 9) {
// handle error
}
int n = 10*tens + ones;
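For the header above, mem[8] is 0x23, so tens is 2 and ones is 3, giving n = 23, which matches the hex digits shown in the dump.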