This is my first time working with binary files, and I'm already tearing my hair out. Anyway, I have the following defined:
unsigned int cols, rows;
Those variables can be anywhere from 1 to about 500. When I get to writing them to a binary file, I'm doing this:
myFile.write(reinterpret_cast<const char *>(&cols), sizeof(cols));
myFile.write(reinterpret_cast<const char *>(&rows), sizeof(rows));
When I go back to read the file, on cols = 300, I get this as result:
44
1
0
0
Can someone please explain to me why I'm getting that result? I can't say that something is wrong, as I honestly think it's me who doesn't understand things. What I'd LIKE to do is store the value, as is, in the file so that when I read it back, I get that as well. And maybe I do, I just don't know it.
I'd like some explanation of how this works and how to read back the data I put in.
You are simply looking at the four bytes of a 32 bit integer, interpreted on a little-endian platform.
300 base 10 = 0x12C
So little-endian byte order gives you 0x2C 0x01 0x00 0x00, and of course 0x2C = 44.
Each byte in the file has 8 bits, so can represent values from 0 to 255. It's written in little-endian order, with the low byte first. So, starting at the other end, treat the numbers as digits in base 256. The value is 0 * 256^3 + 0 * 256^2 + 1 * 256^1 + 44 * 256^0 (where ^ means exponentiation, not xor).
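To make the base-256 arithmetic concrete, here is a minimal sketch (not from the original post; the byte values are the ones printed above) that reassembles the four bytes into the original value:

#include <iostream>

int main()
{
    unsigned char bytes[4] = {44, 1, 0, 0};   // the four bytes printed above

    // Base-256 positional reconstruction, low byte first (little-endian):
    // 44*256^0 + 1*256^1 + 0*256^2 + 0*256^3 = 300
    unsigned int value = bytes[0]
                       + bytes[1] * 256u
                       + bytes[2] * 256u * 256u
                       + bytes[3] * 256u * 256u * 256u;

    std::cout << value << '\n';   // prints 300
}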
You have not (yet) shown how you unmarshal the data, nor how you printed the text you've cited. 44 01 00 00 looks like the bytewise decimal representation of the little-endian bytes of the data you've written (decimal "300").
If you read the data back like so, it should give you the effect you want (presuming that you're okay with the limitation that the computer which writes this file is the same endianness as the one which reads it back):
unsigned int colsReadFromFile = 0;
myOtherFile.read(reinterpret_cast<char *>(&colsReadFromFile), sizeof(colsReadFromFile));
if (!myOtherFile)
{
std::cerr << "Oh noes!" << std::endl;
}
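For completeness, here is a minimal round-trip sketch built from the write and read snippets above (the filename grid.bin and the rows value are made up for illustration):

#include <fstream>
#include <iostream>

int main()
{
    unsigned int cols = 300, rows = 42;

    {
        std::ofstream myFile("grid.bin", std::ios::binary);
        myFile.write(reinterpret_cast<const char *>(&cols), sizeof(cols));
        myFile.write(reinterpret_cast<const char *>(&rows), sizeof(rows));
    }   // destructor flushes and closes the file

    unsigned int colsRead = 0, rowsRead = 0;
    std::ifstream in("grid.bin", std::ios::binary);
    in.read(reinterpret_cast<char *>(&colsRead), sizeof(colsRead));
    in.read(reinterpret_cast<char *>(&rowsRead), sizeof(rowsRead));

    std::cout << colsRead << ' ' << rowsRead << '\n';   // prints 300 42
}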
300 in binary is 100101100, which is 9 bits long.
But a char holds only one byte (8 bits), so when you look at the data one char at a time, the first char you see is just the low byte:
the low 8 bits of (1 00101100) are 00101100, which is 44;
the leftover high bit, 1, lands in the next byte.
I've recently needed to convert the MNIST data set to images and labels. It is binary, and its structure is described in the previous link. I did a little research, and since I'm a fan of C++ I read up on binary I/O in C++; after that I found this link on Stack Overflow. The code there works well, but it has no comments and no explanation of the algorithm, so I got confused, and that raised some questions I need a professional C++ programmer to answer.
1- What is the algorithm to convert the data set in C++ with the help of ifstream?
I've figured out how to read a file as binary with file.read and move on to the next record, but in C we would define a struct and move it over the file, and I can't see any struct in the C++ program. For example, how would I read this:
[offset] [type]          [value]            [description]
0000     32 bit integer  0x00000803 (2051)  magic number
0004     32 bit integer  60000              number of images
0008     32 bit integer  28                 number of rows
0012     32 bit integer  28                 number of columns
0016     unsigned byte   ??                 pixel
How can we go to a specific offset, for example 0004, read a 32-bit integer there, and put it into an integer variable?
2- What is the function reverseInt doing? (It is obviously not simply reversing the digits of an integer.)
int ReverseInt(int i)
{
    unsigned char ch1, ch2, ch3, ch4;
    ch1 = i & 255;           // lowest byte
    ch2 = (i >> 8) & 255;    // second byte
    ch3 = (i >> 16) & 255;   // third byte
    ch4 = (i >> 24) & 255;   // highest byte
    // reassemble the bytes in the opposite order
    return ((int)ch1 << 24) + ((int)ch2 << 16) + ((int)ch3 << 8) + ch4;
}
I did a little debugging with cout, and when it reversed, for example, 270991360, it returned 10000, and I cannot find any relation between the two. I understand that it ANDs the number with 255 and shifts, but why?
PS:
1- I already have the MNIST images converted, but I want to understand the algorithm.
2- I've already unzipped the gz files, so the file is pure binary.
1- What is the algorithm to convert the data set in C++ with the help of ifstream?
This function reads a file (t10k-images-idx3-ubyte.gz) as follows:
Read a magic number and adjust endianness
Read number of images and adjust endianness
Read number of rows and adjust endianness
Read number of columns and adjust endianness
Read all the images x rows x columns pixel bytes (but then lose them).
The function uses plain int and always swaps endianness; that means it targets a very specific architecture and is not portable.
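Putting those steps together, a minimal sketch (assuming a little-endian host and the unzipped t10k-images-idx3-ubyte file; the helper name swapBytes is made up, and it does the same job as the question's ReverseInt but on a fixed-width unsigned type):

#include <cstdint>
#include <fstream>
#include <iostream>

// Reverse the four bytes of a 32-bit value (big-endian <-> little-endian).
uint32_t swapBytes(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

int main()
{
    std::ifstream file("t10k-images-idx3-ubyte", std::ios::binary);
    if (!file) { std::cerr << "cannot open file\n"; return 1; }

    uint32_t magic = 0, images = 0, rows = 0, cols = 0;
    file.read(reinterpret_cast<char *>(&magic),  sizeof magic);
    file.read(reinterpret_cast<char *>(&images), sizeof images);
    file.read(reinterpret_cast<char *>(&rows),   sizeof rows);
    file.read(reinterpret_cast<char *>(&cols),   sizeof cols);

    // The file stores big-endian values; on a little-endian host each
    // field must be byte-swapped before use.
    magic  = swapBytes(magic);
    images = swapBytes(images);
    rows   = swapBytes(rows);
    cols   = swapBytes(cols);

    std::cout << magic << ' ' << images << ' '
              << rows << ' ' << cols << '\n';   // e.g. 2051 10000 28 28
}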
How can we go to a specific offset, for example 0004, read a 32-bit integer there, and put it into an integer variable?
ifstream provides a function to seek to a given position:
file.seekg( posInBytes, std::ios_base::beg);
At the given position, you could read the 32-bit integer:
int32_t val;
file.read((char*)&val, sizeof(int32_t));
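For example, to jump to offset 0004 and read the number of images (a sketch; the filename is taken from the question, and on a little-endian host the value still needs the ReverseInt treatment):

#include <cstdint>
#include <fstream>
#include <iostream>

int main()
{
    std::ifstream file("t10k-images-idx3-ubyte", std::ios::binary);

    file.seekg(4, std::ios_base::beg);   // offset 0004: number of images

    int32_t val = 0;
    file.read((char*)&val, sizeof(int32_t));

    // val still holds the file's big-endian byte order; pass it through
    // ReverseInt (from the question) before using it on a little-endian host.
    std::cout << val << '\n';
}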
2- What is the function reverseInt doing?
This function reverses the order of the bytes of an int value:
Considering a 32-bit integer like aaaaaaaabbbbbbbbccccccccdddddddd, it returns the integer ddddddddccccccccbbbbbbbbaaaaaaaa. That also explains your debugging output: 10000 is 0x00002710, and reversing its bytes gives 0x10270000, which is 270991360.
This is useful for normalizing endianness; however, it is probably not very portable, as int might not be 32 bits (it could be, e.g., 16 or 64 bits).
I'm programming in C++ and I have to store big numbers in one of my exercises.
The biggest number I have to store is 9 780 321 563 842.
Each time I try to print the number (contained in a variable), it gives me a wrong result (not that number).
A 32-bit type isn't enough, since 2^32 is a 10-digit number and I have to store a 13-digit number. With 64 bits you can represent a number that has 20 digits, so I tried the type uint64_t, but that didn't work for me and I really don't understand why.
I then searched the internet for a type big enough for my variable. I saw people on this forum with the same problem who solved it using long long int or long double as the type, but neither worked for me (nor did long float).
I really don't know which other type could store that number; everything I tried failed.
Thanks for your help! :)
--
EDIT: The code is a bit long and complex and doesn't really matter for the question, so this is essentially what I do with the variable containing that number:
string barcode_s = "9780321563842";
uint64_t barcode = atoi(barcode_s.c_str());
cout << "Barcode is : " << barcode << endl;
Of course I don't literally put that number in a string variable "barcode_s" just to convert it straight back to a number, but that's what happens in my program: I read text from an input file into barcode_s (the text I read is always a number) and then convert that string to a number (using atoi).
So I presume the problem comes from the atoi function?
Thanks for your help!
The problem is indeed atoi: it returns an int, which on most platforms is a 32-bit integer. Converting the result to uint64_t will not magically restore the information that has already been lost.
There are several solutions, though. In C++03, you could use stringstream to handle the conversion:
std::istringstream stream(barcode_s);
unsigned long long barcode = 0; // unsigned long may be only 32 bits, too small for this value
if (not (stream >> barcode)) { std::abort(); }
In C++11, you can simply use stoul or stoull:
unsigned long long const barcode = std::stoull(barcode_s);
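Note that stoull throws if the string contains no digits or the value doesn't fit; a more defensive version might look like this sketch:

#include <iostream>
#include <stdexcept>
#include <string>

int main()
{
    std::string barcode_s = "9780321563842";
    try {
        unsigned long long barcode = std::stoull(barcode_s);
        std::cout << "Barcode is : " << barcode << std::endl;   // 9780321563842
    } catch (const std::invalid_argument&) {
        std::cerr << "no digits to convert" << std::endl;
    } catch (const std::out_of_range&) {
        std::cerr << "value does not fit in unsigned long long" << std::endl;
    }
}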
Your number 9 780 321 563 842 is hex 0x8E52897B4C2, which fits into 44 bits (4 bits per hex digit), so any 64-bit integer, signed or unsigned, will have space to spare. uint64_t will work, and the value will even fit into a double with no loss of precision (a double carries 53 bits of mantissa).
It follows that the remaining issue is a mistake in your code; usually that is either an accidental conversion of the 64-bit number to another type somewhere, or a call to the wrong function to print a 64-bit integer.
Edit: just saw your code. atoi returns int, as in int32_t. Converting that to uint64_t will not reconstruct the 64-bit number. Have a look at this: http://msdn.microsoft.com/en-us/library/czcad93k.aspx
The atoll() function converts a char* to a long long.
If you don't have that function available, write your own in the meantime:
uint64_t result = 0;
for (unsigned int ii = 0; str.c_str()[ii] != 0; ++ii)
{
    result *= 10;                      // shift the accumulated digits one decimal place
    result += str.c_str()[ii] - '0';   // append the next digit
}
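Wrapped up as a self-contained function for testing (a sketch; the name parseU64 is made up, and it assumes str contains only the digits 0-9):

#include <cstdint>
#include <iostream>
#include <string>

uint64_t parseU64(const std::string& str)
{
    uint64_t result = 0;
    for (char c : str) {
        result *= 10;        // shift the accumulated digits one decimal place
        result += c - '0';   // append the next digit
    }
    return result;
}

int main()
{
    std::cout << parseU64("9780321563842") << '\n';   // prints 9780321563842
}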
I'm reading values from an accelerometer and saving them in a buffer called 'values'. Each accelerometer reading is 10 bits long, but the values are read in as bytes, so each accelerometer reading is actually two bytes, i.e. two entries in the 'values' buffer. This is sample code for combining those two bytes into the one value:
x = ((int)values[1]<<8)|(int)values[0];
I get that I'm combining values[1] and values[0], and I'm pretty sure the (int) part is casting those parts to integers (although I'm not sure why). The parts that have me really confused are <<8 and the vertical bar |. What are these two parts doing?
Thanks for any explanation and help you can give!
It's bit manipulation.
You are left-shifting (<<) the value in values[1] by 8 bit positions and then ORing (|) it with the value in values[0].
Please take some values and try to work through them. You will understand it better.
Here's a link for more reading and bit-manipulation examples.
This line of code combines two chars into an int, with the first char shifted up by 8 bits.
For example, if value[0] = 5 and value[1] = 1, then the combined value is 256 + 5 = 261, because the 1 in the high byte is worth 256. Another way to look at it is:
x = ((int)values[1]<<8) + (int)values[0];
Replace the or with +, and it becomes more readable. Hope this helps.
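A minimal runnable sketch of that example, for anyone who wants to check the arithmetic:

#include <iostream>

int main()
{
    unsigned char values[2] = {5, 1};   // low byte 5, high byte 1

    // (1 << 8) | 5  =  256 + 5  =  261
    int x = ((int)values[1] << 8) | (int)values[0];

    std::cout << x << std::endl;   // prints 261
}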
Take for example a 10-bit reading of 0101010111 in binary.
The lower 8 bits go to values[0] = 01010111 in binary (= 87 decimal).
The higher 2 bits go to values[1] = 01.
To recover the original 10-bit number from values:
(int)values[1] << 8 --> 01 << 8 --> 0100000000
values[1] is converted to a int (typically 32 bits) and then shifted left << 8 bits.
((int)values[1]<<8) | (int)values[0] --> 0100000000 | 01010111
or in vertical notation to express a bitwise-or:
0100000000
| 01010111
------------
0101010111
QED
The << operator shifts the bits in the second byte left by 8 bits, so for example 0000000011111111 becomes 1111111100000000. The | is the binary "or" operator that combines the two values bit by bit, making each result bit 1 if either or both input bits are 1.
You have 2 bytes (1 byte = 8 bits) and you are trying to read in a 10-bit value, which is why you need 2 bytes instead of just 1. When you read in the value you need to cast the 2 bytes to int so you can treat them as an integer value, but there is an issue: if value[1] is 3 (00000011) and the next byte value[0] is 227 (11100011), you can't get a proper reading just by adding them; you need to bit-shift value[1] left by 8 first.
If the shift happened in an 8-bit type, shifting a byte left by 8 would leave you with 0; that's why both value[1] and value[0] are cast to int first. The shift then yields 768 (00000011 00000000); now you | that with value[0] and you end up with
(00000011 00000000 | 00000000 11100011) = (00000011 11100011) = 995
Note: I am only using 16-bit ints so the example isn't full of a bunch of 0s.
If you have access to a programming calculator, it can help you understand why you need to cast these byte values to ints, and it can also help with casting in general. I would suggest playing around with the Windows Calculator app for a bit if you have access to it; to get it into programmer view, go to View -> Programmer.
I haven't found a question answering this exact behaviour, and somehow I just don't understand what is going on:
I read the contents of a Windows bitmap (BMP) file into an array and use this array later to extract the required information:
char biHeader[40];
// ...
source.read(biHeader,40);
// ...
int biHeight = biHeader[8] | (biHeader[9] << 8) | (biHeader[10] << 16) | (biHeader[11] << 24);
After this, biHeight shows as -112, which is totally wrong because it should be 400.
So, I took a look at a hexdump of the file. The contents read are:
90 01 00 00
Changing the byte order to big endian gives 0x190 which is 400 in decimal, as expected.
If I change above code to:
unsigned char biHeader[40];
// ...
source.read((char*)biHeader,40);
// ...
int biHeight = ... (same as before)
... then I get the expected value. What is going on here?
And: How would you read this data?
As a signed 8-bit two's-complement integer, 0x90 is -112. When that is converted to int for the |, its value is preserved: the sign is extended, so all bits from the seventh upwards are set. A bitwise or with values shifted left by at least eight bits therefore doesn't change the value any more.
As an unsigned 8-bit integer, the value of 0x90 is 144, a positive number with no bits beyond the 2^7 bit set. Then, a bitwise or with biHeader[9] << 8 changes the value to the desired 144 + 256 = 400.
When working with bitwise operators, (almost) always use unsigned types; signed types often lead to unpleasant surprises (and undefined behaviour if the shift result is out of range or a negative integer is shifted left).
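To see both behaviours side by side, here is a small sketch (assuming plain char is signed on your platform, as it is on x86; the byte values are the ones from the question's hexdump):

#include <iostream>

int main()
{
    char s[4]          = {'\x90', '\x01', '\x00', '\x00'};
    unsigned char u[4] = {0x90, 0x01, 0x00, 0x00};

    // With plain (signed) char, 0x90 becomes -112 and sign-extends to
    // 0xFFFFFF90 when promoted to int; the OR cannot clear those high bits.
    int bad  = s[0] | (s[1] << 8) | (s[2] << 16) | (s[3] << 24);

    // With unsigned char, 0x90 stays 144 and the bytes combine correctly.
    int good = u[0] | (u[1] << 8) | (u[2] << 16) | (u[3] << 24);

    std::cout << bad << ' ' << good << std::endl;   // prints -112 400
}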
What I must do is open a file in binary mode that contains stored data intended to be interpreted as integers. I have seen other examples, such as Stackoverflow-Reading “integer” size bytes from a char* array, but I want to try a different approach (I may just be stubborn, or stupid :/). I first created a simple binary file in a hex editor; it reads as follows.
00 00 00 47 00 00 00 17 00 00 00 41
This (should) equal 71, 23, and 65 if the 12 bytes are divided into 3 integers.
After opening this file in binary mode and reading 4 bytes into an array of chars, how can I use bitwise operations to make char[0]'s bits the first 8 bits of the int, and so on, until the bits of each char are part of the int?
My integer = 00      00      00      00
             ^       ^       ^       ^
Chars        Char[0] Char[1] Char[2] Char[3]
             00      00      00      47

So my integer (hex) = 00 00 00 47 = numerical value of 71
Also, I don't know how the endianness of my system comes into play here, so is there anything that I need to keep in mind?
Here is a code snippet of what I have so far, I just don't know the next steps to take.
std::fstream myfile;
myfile.open("C:\\Users\\Jacob\\Desktop\\hextest.txt", std::ios::in | std::ios::out | std::ios::binary);
if(myfile.is_open() == false)
{
std::cout << "Error" << std::endl;
}
char* mychar;
std::cout << myfile.is_open() << std::endl;
mychar = new char[4];
myfile.read(mychar, 4);
I eventually plan on reading floats and maybe a custom data type from a file, but first I just need to get more familiar with using bitwise operations.
Thanks.
You want the bitwise left shift operator:
typedef unsigned char u8; // in case char is signed by default on your platform
unsigned num = ((u8)chars[0] << 24) | ((u8)chars[1] << 16) | ((u8)chars[2] << 8) | (u8)chars[3];
What it does is shift the left argument a specified number of bits to the left, adding zeros from the right as stuffing. For example, 2 << 1 is 4, since 2 is 10 in binary and shifting one to the left gives 100, which is 4.
This can be written in a more general loop form:
unsigned num = 0;
for (int i = 0; i != 4; ++i) {
num |= (u8)chars[i] << (24 - i * 8); // += could have also been used
}
The endianness of your system doesn't matter here; you know the endianness of the representation in the file, which is constant (and therefore portable), so when you read in the bytes you know what to do with them. The internal representation of the integer in your CPU/memory may be different from that of the file, but the logical bitwise manipulation of it in code is independent of your system's endianness; the least significant bits are always at the right, and the most at the left (in code). That's why shifting is cross-platform -- it operates at the logical bit level :-)
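Applying the loop to the whole 12-byte file from the question gives a sketch like this (path taken from the question):

#include <fstream>
#include <iostream>

typedef unsigned char u8;

int main()
{
    std::ifstream myfile("C:\\Users\\Jacob\\Desktop\\hextest.txt",
                         std::ios::in | std::ios::binary);
    char chars[4];
    while (myfile.read(chars, 4)) {
        unsigned num = 0;
        for (int i = 0; i != 4; ++i)
            num |= (unsigned)(u8)chars[i] << (24 - i * 8);   // big-endian decode
        std::cout << num << '\n';   // prints 71, then 23, then 65
    }
}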
Have you thought of using Boost.Spirit to make a binary parser? You might hit a bit of a learning curve when you start, but if you want to expand your program later to read floats and structured types, you'll have an excellent base to start from.
Spirit is very well-documented and is part of Boost. Once you get around to understanding its ins and outs, it's really mind-boggling what you can do with it, so if you have a bit of time to play around with it, I'd really recommend taking a look.
Otherwise, if you want your binary to be "portable" - i.e. you want to be able to read it on a big-endian and a little-endian machine, you'll need some sort of byte-order mark (BOM). That would be the first thing you'd read, after which you can simply read your integers byte by byte. Simplest thing would probably be to read them into a union (if you know the size of the integer you're going to read), like this:
union U
{
unsigned char uc_[4];
unsigned long ui_;
};
Read the data into the uc_ member, swap the bytes around if you need to change endianness, and read the value from the ui_ member. There's no shifting etc. to be done, except for the swapping if you want to change endianness.
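A sketch of that approach (assuming a little-endian host and an unsigned long of at least 32 bits; note that reading a union member other than the one last written is technically undefined behaviour in C++, though it is widely supported in practice):

#include <algorithm>
#include <fstream>
#include <iostream>

union U
{
    unsigned char uc_[4];
    unsigned long ui_;
};

int main()
{
    std::ifstream file("C:\\Users\\Jacob\\Desktop\\hextest.txt",
                       std::ios::binary);
    U u;
    u.ui_ = 0;   // ensure any bytes beyond the four we read are zero
    file.read(reinterpret_cast<char *>(u.uc_), 4);

    // The file is big-endian; on a little-endian host, reverse the bytes
    // before reading the integer member.
    std::swap(u.uc_[0], u.uc_[3]);
    std::swap(u.uc_[1], u.uc_[2]);

    std::cout << u.ui_ << '\n';   // prints 71 for the first four bytes
}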
HTH
rlc