I'm trying to read an unsigned long number from a binary file.
I'm doing it this way:
infile.open("file.bin", std::ios::in | std::ios::binary);
char* U=new char[sizeof(unsigned long)];
unsigned long out=0;
infile.read(U, sizeof(unsigned long));
out=static_cast<unsigned long>(*U);
delete[] U;
U=NULL;
infile.close();
But the result is not correct.
My data is 6A F2 6B 58 00 00 00 00, which should be read as 1483469418, but out is 106 in my code, which is just the first byte of the data.
What is the problem?
How should I correctly read an unsigned long from a file?
That is because you are casting a dereferenced value, i.e. only a single char rather than all sizeof(unsigned long) bytes. *U is 106 (0x6A), the first byte.
You can read the data in without the intermediate buffer:
infile.read(reinterpret_cast<char*>(&out), sizeof out);
The difference is that here you are reinterpreting the pointer, not the value under it.
If you still want to use the buffer, it should be *reinterpret_cast<unsigned long*>(U);. This also reinterprets the pointer first and then dereferences it. The key is to dereference a pointer of the proper type: the type of the pointer determines how many bytes are used for the value.
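A side note on the buffer variant: dereferencing a char buffer through an unsigned long* can run into alignment and aliasing trouble on some platforms. Below is a minimal sketch of the same idea with std::memcpy, assuming the file was written in the host's byte order and with the host's sizeof(unsigned long). The helper name load_unsigned_long is made up for illustration.

```cpp
#include <cstring>   // std::memcpy

// Hypothetical helper: copy sizeof(unsigned long) bytes out of a raw buffer.
// Assumes the buffer holds at least sizeof(unsigned long) bytes, stored in
// the host's byte order.
unsigned long load_unsigned_long(const char* bytes)
{
    unsigned long value = 0;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}
```

With the buffer from the question this would be used as out = load_unsigned_long(U); after the read() call.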
out=static_cast<unsigned long>(*U); should be out=*(unsigned long *)(U);
It can be much simpler:
infile.open("file.bin", std::ios::in | std::ios::binary);
unsigned long out=0;
infile.read((char *)&out, sizeof(out));
infile.close();
Try out=*reinterpret_cast<unsigned long *>(U);
You need to know whether the file (not the program) is big endian or little endian. Then read the bytes with fgetc() and reconstitute the number,
so
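#include <stdio.h>
unsigned long read32be(FILE *fp)
{
    unsigned long ch0, ch1, ch2, ch3;
    /* fgetc() returns EOF on failure; check feof()/ferror() after the call */
    ch0 = fgetc(fp);
    ch1 = fgetc(fp);
    ch2 = fgetc(fp);
    ch3 = fgetc(fp);
    /* first byte read is the most significant one */
    return (ch0 << 24) | (ch1 << 16) | (ch2 << 8) | ch3;
}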
Now it will work regardless of whether long is 32 or 64 bits, and whether the machine is big endian or little endian. If the file is little endian, swap the order of the fgetc()s.
Reading binary files portably is surprisingly tricky. I've put some code on github
https://github.com/MalcolmMcLean/ieee754
I'm reading a bunch of bit values from a text file which are in binary form because I stored them using fwrite. The problem is that the first value in the file is 5 bytes in size and the next 4800 values are 2 bytes in size. So when I try to cycle through the file and read the values it will give me the wrong results because my program does not know that it should take 5 bytes the first time and then 2 bytes the remaining 4800 times.
Here is how I'm cycling through the file:
long lSize;
unsigned short * buffer;
size_t result;
pFile = fopen("dataValues.txt", "rb");
lSize = ftell(pFile);
buffer = (unsigned short *) malloc (sizeof(unsigned short)*lSize);
size_t count = lSize/sizeof(short);
for(size_t i = 0; i < count; ++i)
{
result = fread(buffer+i, sizeof(unsigned short), 1, pFile);
printf("%u\n", buffer[i]);
}
I'm pretty sure I'm going to need to change my fread statement because the first value is of type time_t so I'll probably need a statement that looks like this:
result = fread(buffer+i, sizeof(time_t), 1, pFile);
However, this did not work when I tried it, and I think it's because I am not changing the starting position properly. I think that while I do read 5 bytes worth of data, I don't move the starting position enough.
Does anyone here have a good understanding of fread? Can you please let me know what I can change to make my program accomplish what I need?
EDIT:
This is how I'm writing to the file.
fwrite(&timer, sizeof(timer), 1, pFile);
fwrite(ptr, sizeof(unsigned short), rawData.size(), pFile);
EDIT2:
I tried to read the file using ifstream
int main()
{
time_t x;
ifstream infile;
infile.open("binaryValues.txt", ios::binary | ios::in);
infile.read((char *) &x, sizeof(x));
return 0;
}
However, now it doesn't compile and just gives me a bunch of "undefined reference to" errors for code that I haven't even written.
I don't see the problem:
uint8_t five_byte_buffer[5];
uint8_t two_byte_buffer[2];
//...
ifstream my_file(/*...*/);
my_file.read(reinterpret_cast<char*>(&five_byte_buffer[0]), 5); // read() expects a char*
my_file.read(reinterpret_cast<char*>(&two_byte_buffer[0]), 2);
So, what is your specific issue?
Edit 1: Reading in a loop
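while (my_file.read(reinterpret_cast<char*>(&five_byte_buffer[0]), 5))
{
    // the second buffer only holds 2 bytes, so read exactly 2
    my_file.read(reinterpret_cast<char*>(&two_byte_buffer[0]), 2);
    Process_Data();
}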
You can't. Streams are byte, almost always octet (8 bit byte) oriented.
You can easily enough build a bit-oriented stream on top of that. You just keep a few bytes in a buffer and keep track of which bit is current. Watch out for getting the last few bits, and attempts to mix byte access with bit access.
Untested but this is the general idea.
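#include <stdio.h>
struct bitstream
{
    unsigned long long rack; // 64-bit rack
    FILE *fp; // file opened for reading
    int rackpos; // 0 - 64, number of rack bits already read; start at 64 so the first call fills the rack
};
/* Nbits must be small enough to fit in an int (at most 31) */
int getbits(struct bitstream *bs, int Nbits)
{
    unsigned long long mask = 0x8000000000000000ULL;
    int answer = 0;
    int i;
    /* refill: drop fully consumed bytes from the top, append new ones at the bottom */
    while(bs->rackpos >= 8)
    {
        bs->rack <<= 8;
        bs->rack |= fgetc(bs->fp);
        bs->rackpos -= 8;
    }
    mask >>= bs->rackpos;
    for(i=0;i<Nbits;i++)
    {
        answer <<= 1;
        answer |= (bs->rack & mask) ? 1 : 0; /* copy one bit, not the masked value */
        mask >>= 1;
    }
    bs->rackpos += Nbits;
    return answer;
}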
You need to decide how you know when the stream is terminated. As is, you'll corrupt the last few bits with the EOF value returned by fgetc().
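One way to sidestep the termination problem is to load the bytes into memory first and keep an explicit count of the bits that are left. The following is a sketch of that variant, not a drop-in replacement for the code above: BitReader is a made-up class, it reads most significant bit first, and it assumes the whole input fits in memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical MSB-first bit reader over an in-memory byte buffer.
class BitReader {
public:
    explicit BitReader(std::vector<std::uint8_t> bytes)
        : bytes_(std::move(bytes)) {}

    // Number of bits that have not been consumed yet.
    std::size_t bits_left() const { return bytes_.size() * 8 - pos_; }

    // Reads `nbits` (0..32) bits, most significant bit first.
    // Returns false, consuming nothing, if fewer than `nbits` bits remain.
    bool get_bits(unsigned nbits, std::uint32_t& out)
    {
        if (nbits > 32 || nbits > bits_left())
            return false;
        std::uint32_t value = 0;
        for (unsigned i = 0; i < nbits; ++i, ++pos_) {
            const std::uint8_t byte = bytes_[pos_ / 8];
            const unsigned bit = (byte >> (7 - pos_ % 8)) & 1u;
            value = (value << 1) | bit;
        }
        out = value;
        return true;
    }

private:
    std::vector<std::uint8_t> bytes_;
    std::size_t pos_ = 0;   // index of the next unread bit
};
```

The caller can check bits_left() or the return value of get_bits() to detect the end of the data, instead of relying on EOF showing up inside the bit buffer.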
What is the most suitable type of vector to keep the bytes of a file?
I'm considering using the int type, because the bits "00000000" (1 byte) are interpreted as 0!
The goal is to save this data (bytes) to a file and retrieve from this file later.
NOTE: The files contain null bytes ("00000000" in bits)!
I'm a bit lost here. Help me! =D Thanks!
UPDATE I:
To read the file I'm using this function:
char* readFileBytes(const char *name){
std::ifstream fl(name);
fl.seekg( 0, std::ios::end );
size_t len = fl.tellg();
char *ret = new char[len];
fl.seekg(0, std::ios::beg);
fl.read(ret, len);
fl.close();
return ret;
}
NOTE I: I need to find a way to ensure that bits "00000000" can be recovered from the file!
NOTE II: Any suggestions for a safe way to save those bits "00000000" to a file?
NOTE III: When using char array I had problems converting bits "00000000" for that type.
Code Snippet:
int bit8Array[] = {0, 0, 0, 0, 0, 0, 0, 0};
char charByte = (bit8Array[7] ) |
(bit8Array[6] << 1) |
(bit8Array[5] << 2) |
(bit8Array[4] << 3) |
(bit8Array[3] << 4) |
(bit8Array[2] << 5) |
(bit8Array[1] << 6) |
(bit8Array[0] << 7);
UPDATE II:
Following @chqrlie's recommendations:
#include <iostream>
#include <fstream>
#include <sstream>
#include <vector>
#include <algorithm>
#include <random>
#include <cstring>
#include <iterator>
std::vector<unsigned char> readFileBytes(const char* filename)
{
// Open the file.
std::ifstream file(filename, std::ios::binary);
// Stop eating new lines in binary mode!
file.unsetf(std::ios::skipws);
// Get its size
std::streampos fileSize;
file.seekg(0, std::ios::end);
fileSize = file.tellg();
file.seekg(0, std::ios::beg);
// Reserve capacity.
std::vector<unsigned char> unsignedCharVec;
unsignedCharVec.reserve(fileSize);
// Read the data.
unsignedCharVec.insert(unsignedCharVec.begin(),
std::istream_iterator<unsigned char>(file),
std::istream_iterator<unsigned char>());
return unsignedCharVec;
}
int main(){
std::vector<unsigned char> unsignedCharVec;
// txt file contents "xz"
unsignedCharVec=readFileBytes("xz.txt");
// Letters -> UTF8/HEX -> bits!
// x -> 78 -> 0111 1000
// z -> 7a -> 0111 1010
for(unsigned char c : unsignedCharVec){
printf("%c\n", c);
for(int o=7; o >= 0; o--){
printf("%i", ((c >> o) & 1));
}
printf("%s", "\n");
}
// Prints...
// x
// 01111000
// z
// 01111010
return 0;
}
UPDATE III:
This is the code I am using to write to a binary file:
void writeFileBytes(const char* filename, std::vector<unsigned char>& fileBytes){
std::ofstream file(filename, std::ios::out|std::ios::binary);
file.write(fileBytes.size() ? (char*)&fileBytes[0] : 0,
std::streamsize(fileBytes.size()));
}
writeFileBytes("xz.bin", fileBytesOutput);
UPDATE IV:
Further reading about UPDATE III:
c++ - Save the contents of a "std::vector<unsigned char>" to a file
CONCLUSION:
The solution to the problem of the "00000000" bits (1 byte) was definitely to change the type that stores the bytes of the file to std::vector<unsigned char>, following the guidance given here. std::vector<unsigned char> is a universal type (it exists in all environments) and will accept any octet (unlike char* in "UPDATE I")!
In addition, changing from an array (char) to a vector (unsigned char) was crucial for success! With a vector I can manipulate my data more safely and completely independently of its content (with a char array I had problems with this).
Thanks a lot!
Use std::vector<unsigned char>. Don't use std::uint8_t: it won't exist on systems that don't have a native hardware type of exactly 8 bits. unsigned char will always exist; it will usually be the smallest addressable type that the hardware supports, and it's required to be at least 8 bits wide, so if you're trafficking in 8-bit bytes, it will handle the bits that you need.
If you really, really, really like the fixed-width types, you might consider std::uint_least8_t, which will always exist and has at least eight bits, or std::uint_fast8_t, which also has at least eight bits. But file I/O traffics in char types, and mixing char and its variants with vaguely specified "least" and "fast" types may well get confusing.
There are 3 problems in your code:
You use the char type and return a char *. Yet the return value is not a proper C string as you do not allocate an extra byte for the '\0' terminator nor null terminate it.
If the file may contain null bytes, you should probably use type unsigned char or uint8_t to make it explicit that the array does not contain text.
You do not return the array size to the caller. The caller has no way to tell how long the array is. You should probably use a std::vector<uint8_t> or std::vector<unsigned char> instead of an array allocated with new.
uint8_t is the winner in my eyes:
it's exactly 8 bits, or 1 byte, long;
it's unsigned without requiring you to type unsigned every time;
it's exactly the same on all platforms;
it's a generic type that does not imply any specific use, unlike char / unsigned char, which is associated with characters of text even if it can technically be used for any purpose just the same as uint8_t.
Bottom line: uint8_t is functionally equivalent to unsigned char, but does a better job of saying this is some data of unspecified nature in the source code.
So use std::vector<uint8_t>.
#include <stdint.h> to make the uint8_t definition available.
P. S. As pointed out in the comments, the C++ standard defines char as 1 byte, and byte is not, strictly speaking, required to be the same as octet (8 bits). On such a hypothetical system, char will still exist and will be 1 byte long, but uint8_t is defined as 8 bits (octet) and thus may not exist (due to implementation difficulties / overhead). So char is more portable, theoretically speaking, but uint8_t is more strict and has wider guarantees of expected behavior.
I am reading in binary data from a file:
char* buffIn = new char[8];
ifstream inFile(path, ifstream::binary);
inFile.read(buffIn, 8);
I then want to convert the char* read in (as binary) to an unsigned long but I am having problems - I am not quite sure what is going on, but for instance 0x00000000000ACD gets interpreted as 0xFFFFFFFFFFFFCD - I suspect all the 0x00 bytes are causing some sort of problem when converting from char* to unsigned long...
unsigned long number = *(buffIn);
How do I do this properly?
Since buffIn is of type char pointer, when you do *(buffIn) you are just grabbing one character. You have to reinterpret the memory address as an unsigned long pointer and then dereference it.
unsigned long number = *((unsigned long*)buffIn);
In addition to recasting the char[8] (which will only read the first unsigned long, which may be just 32 bits long), you can also use some simple bit-wise operations. Convert each char to unsigned char first, otherwise bytes of 0x80 and above are sign-extended:
unsigned long value = (((unsigned long)(unsigned char)buffIn[0]) << 24) | (((unsigned long)(unsigned char)buffIn[1]) << 16) | (((unsigned long)(unsigned char)buffIn[2]) << 8) | (unsigned long)(unsigned char)buffIn[3];
Try something like
unsigned long* buffInL = new unsigned long[2];
char* buffIn=(char*)buffInL;
ifstream inFile(path, ifstream::binary);
inFile.read(buffIn, 8);
Unlike other types, char* is allowed to alias.
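The 0xFF...CD result in the question looks like sign extension: on platforms where char is signed, the byte 0xCD is a negative char, and widening it to unsigned long fills the upper bits with ones. A minimal sketch of a conversion that avoids this, assuming the file stores the least significant byte first (which is what getting 0xCD from the first byte suggests). load_u64_le is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical helper: combine 8 bytes, least significant byte first, into
// a 64-bit value. Going through unsigned char keeps bytes >= 0x80 from
// being sign-extended when they are widened.
std::uint64_t load_u64_le(const char* bytes)
{
    std::uint64_t value = 0;
    for (int i = 7; i >= 0; --i)
        value = (value << 8) | static_cast<unsigned char>(bytes[i]);
    return value;
}
```

With the buffer from the question this would be called as load_u64_le(buffIn) after the read. If the file turns out to be big endian, the loop would run from index 0 to 7 instead.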
Hi everyone, I have an issue while reading binary data from a binary file, as follows:
File Content:
D3 EE EE 00 00 01 D7 C4 D9 40
char * afpContentBlock = new char[10];
ifstream inputStream(sInputFile, ios::in|ios::binary);
if (inputStream.is_open())
{
inputStream.read(afpContentBlock, 10);
int n = sizeof(afpContentBlock)/sizeof(afpContentBlock[0]); // Print 4
// Here i would like to check every byte, but no matter how i convert the
// char[] afpContentBlock, it always cut at first byte 0x00.
}
I know this happens because of the byte 0x00. Is there a way to manage it somehow?
I have tried to write it with an ofstream object, and it works fine since it writes out the whole 10 bytes. Anyway, I would like to loop through the whole byte array to check the byte values.
Thank you very much.
It's much easier to just get how many bytes you read from the ifstream like so:
if (inputStream.is_open())
{
inputStream.read(afpContentBlock, 10);
int bytesRead = (int)inputStream.gcount();
for( int i = 0; i < bytesRead; i++ )
{
// check each byte however you want
// access with afpContentBlock[i]
}
}
I have a binary file in big-endian format from which I am retrieving 2-byte and 4-byte integer data. The machine I'm running on is little-endian.
Does anyone have any suggestions or a best-practice on pulling integer data from a known format binary and switching endianness on the fly? I'm not sure that my current solution is even correct:
int myInt;
ifstream dataFile(dataFileLocation, ios::in | ios::binary);
dataFile.seekg(99, ios::beg); //Pull data starting at byte 100;
//For 4-byte value:
char chunk[4];
dataFile.read(chunk, 4);
myInt = (int)(chunk[0] << 24 | chunk[1] << 16 | chunk[2] << 8 | chunk[3]);
//For 2-byte value:
char chunk[2];
dataFile.read(chunk, 2);
myInt = (int)(chunk[0] << 8 | chunk[1]);
This seems to work fine for 2-byte data but gives what I believe are incorrect values on 4-byte data. I've read about htonl() but from what I've read that's not a smart way to go for flexibility.
Use unsigned integral types only and you'll be fine:
unsigned char buf[4];
infile.read(reinterpret_cast<char*>(buf), 4);
unsigned int b4 = (buf[0] << 24) + ... + (buf[3]);
unsigned int b2 = (buf[0] << 8) + (buf[1]);
Shifting involves type promotions, and indefinite sign extensions (given the implementation-defined nature of char). Basically you always want everything to be unsigned when manipulating bits.