How to read little endian integers from file in C++?

Say I have a binary file that contains positive numbers written as 32-bit little-endian integers.
How do I read this file? This is what I have right now:
int main() {
FILE * fp;
char buffer[4];
int num = 0;
fp=fopen("file.txt","rb");
while ( fread(&buffer, 1, 4,fp) != 0) {
// I think buffer should be 32 bit integer I read,
// how can I let num equal to 32 bit little endian integer?
}
// Say I just want to get the sum of all these binary little endian integers,
// is there an another way to make read and get sum faster since it's all
// binary, shouldnt it be faster if i just add in binary? not sure..
return 0;
}

This is one way to do it that works on either big-endian or little-endian architectures:
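#include <cstdint>
#include <cstdio>
int main() {
unsigned char bytes[4];
std::uint32_t sum = 0;
FILE *fp=fopen("file.txt","rb");
if (!fp) return 1; // could not open the file
while ( fread(bytes, 4, 1,fp) != 0) {
// cast before shifting: bytes[3]<<24 on a promoted int can overflow into the sign bit
sum += bytes[0] | (bytes[1]<<8) | (bytes[2]<<16) | ((std::uint32_t)bytes[3]<<24);
}
fclose(fp);
return 0;
}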
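On the question of speed, one option is to read a large block of the file into memory with a single fread and decode the integers from that buffer. The sketch below shows only the decoding half; the names load_le32 and sum_le32 and the 64-bit accumulator are assumptions made for illustration, not part of the answer above.

```cpp
#include <cstddef>
#include <cstdint>

// Decode one 32-bit little-endian value from four consecutive bytes.
// Casting before shifting keeps the arithmetic unsigned.
std::uint32_t load_le32(const unsigned char* p)
{
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Sum every complete 4-byte group in a buffer; a trailing partial group is ignored.
// The 64-bit accumulator is an assumption made here to avoid overflow on long files.
std::uint64_t sum_le32(const unsigned char* data, std::size_t size)
{
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i + 4 <= size; i += 4)
        sum += load_le32(data + i);
    return sum;
}
```

A caller would fill the buffer with something like fread(buf, 1, sizeof buf, fp) and pass the number of bytes actually read as size.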

If you are using Linux, look at the byte-order helpers declared in <endian.h>.
They include useful functions such as le32toh, which converts a 32-bit little-endian value to host byte order.
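A minimal sketch of how le32toh might be used, assuming a glibc-style <endian.h> is available (it is not part of standard C++). The wrapper name le_bytes_to_host is hypothetical.

```cpp
#include <endian.h>   // Linux/glibc header that provides le32toh; not standard C++
#include <cstdint>
#include <cstring>

// Turn four bytes taken from a little-endian file into a host-order integer.
std::uint32_t le_bytes_to_host(const unsigned char* p)
{
    std::uint32_t raw;
    std::memcpy(&raw, p, sizeof raw);  // raw now holds the bytes in file order
    return le32toh(raw);               // no-op on little-endian hosts, byte swap on big-endian ones
}
```

The same header also offers be32toh for big-endian data, plus 16-bit and 64-bit variants.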

From CodeGuru:
inline void endian_swap(unsigned int& x)
{
x = (x>>24) |
((x<<8) & 0x00FF0000) |
((x>>8) & 0x0000FF00) |
(x<<24);
}
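So, you can read directly into an unsigned int and then call this. The swap is only needed when the host's byte order differs from the file's; on a little-endian machine reading a little-endian file, skip it.
unsigned int num; // must match endian_swap's parameter type
while ( fread(&num, 4, 1,fp) == 1) {
endian_swap(num);
// conversion done; then use num
}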
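Because an unconditional swap gives the wrong result on a little-endian host, one possible refinement is to check the host's byte order at run time and swap only when needed. This is a sketch under that assumption; host_is_little_endian, swap32 and le32_to_host are illustrative names.

```cpp
#include <cstdint>
#include <cstring>

// Runtime check: on a little-endian host the low-order byte is stored first.
bool host_is_little_endian()
{
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Same byte reversal as endian_swap above, written for a fixed-width type.
std::uint32_t swap32(std::uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) | ((x << 8) & 0x00FF0000u) | (x << 24);
}

// Interpret a value that was read raw from a little-endian file.
std::uint32_t le32_to_host(std::uint32_t raw)
{
    return host_is_little_endian() ? raw : swap32(raw);
}
```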

If you are working with short files, a simple option is the stringstream class together with the stoul function. The code below reads byte by byte (in this case 2 bytes) from an ifstream and writes them in hex into a string stream. stoul then converts the string into a 16-bit integer:
#include <cstdint>
#include <fstream>
#include <iostream>
#include <sstream>
#include <iomanip>
#include <string>
using namespace std;
ifstream is("filename.bin", ios::binary);
if(!is) { /*Error*/ }
is.unsetf(ios_base::skipws);
stringstream ss;
uint8_t byte1, byte2;
uint16_t val;
is >> byte1; is >> byte2;
ss << setw(2) << setfill('0') << hex << static_cast<size_t>(byte1);
ss << setw(2) << setfill('0') << hex << static_cast<size_t>(byte2);
val = static_cast<uint16_t>(stoul(ss.str(), nullptr, 16));
cout << val << endl;
For example, if you have to read a 16-bit integer stored in big endian (00 f3) from a binary file, you put it inside a stringstream ("00f3") and then convert it into an integer (243). The example writes the value in hex, but it could be decimal or octal, or even binary using the bitset class. The iomanip functions (setw, setfill) give the string stream contents the correct format.
The downside of this method is that it's tremendously slow if you have to work with large files.
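For comparison, the same two bytes can be combined directly with a shift, which avoids the string round trip. This is only a sketch; the helper name be16 is made up for illustration.

```cpp
#include <cstdint>

// Combine two bytes read from a big-endian file: the first byte is the high-order one.
std::uint16_t be16(std::uint8_t high, std::uint8_t low)
{
    return static_cast<std::uint16_t>((static_cast<unsigned>(high) << 8) | low);
}
```

With the bytes from the example, be16(0x00, 0xf3) gives 243.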

You read the data normally. However, when you go to interpret it you need to make the proper conversions.
This can be a pain if you want your code to be portable, i.e. to run on both little- and big-endian machines: you need to handle every combination (little to big, big to little, little to little and big to big). The last two cases are no-ops.
Fortunately this can all be automated with the boost::endian library. An example from their documentation:
#include <iostream>
#include <cstdio>
#include <boost/endian/arithmetic.hpp>
#include <boost/static_assert.hpp>
using namespace boost::endian;
namespace
{
// This is an extract from a very widely used GIS file format.
// Why the designer decided to mix big and little endians in
// the same file is not known. But this is a real-world format
// and users wishing to write low level code manipulating these
// files have to deal with the mixed endianness.
struct header
{
big_int32_t file_code;
big_int32_t file_length;
little_int32_t version;
little_int32_t shape_type;
};
const char* filename = "test.dat";
}
int main(int, char* [])
{
header h;
BOOST_STATIC_ASSERT(sizeof(h) == 16U); // reality check
h.file_code = 0x01020304;
h.file_length = sizeof(header);
h.version = 1;
h.shape_type = 0x01020304;
// Low-level I/O such as POSIX read/write or <cstdio>
// fread/fwrite is sometimes used for binary file operations
// when ultimate efficiency is important. Such I/O is often
// performed in some C++ wrapper class, but to drive home the
// point that endian integers are often used in fairly
// low-level code that does bulk I/O operations, <cstdio>
// fopen/fwrite is used for I/O in this example.
std::FILE* fi = std::fopen(filename, "wb"); // MUST BE BINARY
if (!fi)
{
std::cout << "could not open " << filename << '\n';
return 1;
}
if (std::fwrite(&h, sizeof(header), 1, fi) != 1)
{
std::cout << "write failure for " << filename << '\n';
return 1;
}
std::fclose(fi);
std::cout << "created file " << filename << '\n';
return 0;
}
After compiling and executing endian_example.cpp, a hex dump of test.dat shows:
01020304 00000010 01000000 04030201

Related

How to access range of bits in a bitset?

I have a bitset which is very large, say, 10 billion bits.
What I'd like to do is write this to a file. However using .to_string() actually freezes my computer.
What I'd like to do is iterate over the bits and take 64 bits at a time, turn it into a uint64 and then write it to a file.
However I'm not aware how to access different ranges of the bitset. How would I do that? I am new to c++ and wasn't sure how to access the underlying bitset::reference so please provide an example for an answer.
I tried using a pointer but did not get what I expected. Here's an example of what I'm trying so far.
#include <iostream>
#include <bitset>
#include <cstring>
using namespace std;
int main()
{
bitset<50> bit_array(302332342342342323);
cout<<bit_array << "\n";
bitset<50>* p;
p = &bit_array;
p++;
int some_int;
memcpy(&some_int, p , 2);
cout << &bit_array << "\n";
cout << &p << "\n";
cout << some_int << "\n";
return 0;
}
the output
10000110011010100111011101011011010101011010110011
0x7ffe8aa2b090
0x7ffe8aa2b098
17736
The last number seems to change on each run which is not what I expect.
There are a couple of errors in the program. The maximum value bitset<50> can hold is 1125899906842623, which is much less than the value bit_array is initialized with in the program. Also, p++ moves the pointer past bit_array, so memcpy reads unrelated memory; that is why the last number changes on each run.
some_int has to be defined as unsigned long; verify that unsigned long has 64 bits on your platform.
After this, test each bit of bit_array in a loop, apply the appropriate bitwise operations (OR and shift), and store the result in some_int.
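unsigned long some_int = 0; // accumulates the extracted bits
unsigned long mask = 1; // selects the bit of some_int being filled
std::size_t start_bit = 0;
std::size_t end_bit = 64; // must not exceed the size of the bitset
for (std::size_t i = start_bit; i < end_bit; i++) {
if (bit_array[i])
some_int |= mask;
mask <<= 1;
}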
You can change the values of start_bit and end_bit appropriately as you navigate through the large bitset.
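The loop above can be wrapped in a small helper so that it can be called once per 64-bit chunk. This is a sketch rather than the answerer's code; the name extract_chunk and the choice to read positions past the end as 0 are assumptions.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>

// Collect up to 64 bits starting at bit index `start` into one integer.
// Bit `start` of the bitset becomes bit 0 of the result; positions past the end read as 0.
template <std::size_t N>
std::uint64_t extract_chunk(const std::bitset<N>& bits, std::size_t start)
{
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < 64 && start + i < N; ++i) {
        if (bits[start + i])
            value |= std::uint64_t{1} << i;
    }
    return value;
}
```

A caller would loop over start = 0, 64, 128, ... for (N + 63) / 64 chunks and write each value to the file.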
For accessing ranges of a bitset, you should look at the provided interface. The lack of something like bitset::data() indicates that you should not try to access the underlying data directly. Doing so, even if it had seemed to work, is fragile, hacky, and probably undefined behavior of some sort.
I see two possibilities for converting a massive bitset into more manageable pieces. A fairly straight-forward approach is to just go through bit-by-bit and collect these into an integer of some sort (or write them directly to a file as '0' or '1' if you're not that concerned about file size). Looks like P.W already provided code for this, so I'll skip an example for now.
The second possibility is to use bitwise operators and to_ullong(). The downside of this approach is that it nominally uses auxiliary storage space, specifically two additional bitsets the same size as your original. I say "nominally", though, because a compiler might be clever enough to optimize them away. Might. Maybe not. And you are dealing with sizes over a gigabyte each. Realistically, the bit-by-bit approach is probably the way to go, but I think this example is interesting at a theoretical level.
#include <iostream>
#include <iomanip>
#include <bitset>
#include <cstdint>
using namespace std;
constexpr size_t FULL_SIZE = 120; // Some large number
constexpr size_t CHUNK_SIZE = 64; // Currently the mask assumes 64. Otherwise, this code just
// assumes CHUNK_SIZE is nonzero and at most the number of
// bits in long long (which is at least 64).
int main()
{
// Generate some large bitset. This is just test data, so don't read too much into this.
bitset<FULL_SIZE> bit_array(302332342342342323);
bit_array |= bit_array << (FULL_SIZE/2);
cout << "Source: " << bit_array << "\n";
// The mask avoids overflow in to_ullong().
// The mask should have exactly its CHUNK_SIZE low-order bits set.
// As long as we're dealing with 64-bit chunks, there's a handy constant to handle this.
constexpr bitset<FULL_SIZE> mask64(UINT64_MAX);
cout << "Mask: " << mask64 << "\n";
// Extract chunks.
const size_t num_chunks = (FULL_SIZE + CHUNK_SIZE - 1)/CHUNK_SIZE; // Round up.
for ( size_t i = 0; i < num_chunks; ++i ) {
// Extract the next CHUNK_SIZE bits, then convert to an integer.
const bitset<FULL_SIZE> chunk_set{(bit_array >> (CHUNK_SIZE * i)) & mask64};
unsigned long long chunk_val = chunk_set.to_ullong();
// NOTE: as long as CHUNK_SIZE <= 64, chunk_val can be converted safely to the desired uint64_t.
cout << "Chunk " << dec << i << ": 0x" << hex << setfill('0') << setw(16) << chunk_val << "\n";
}
return 0;
}
The output:
Source: 010000110010000110011010100111011101011011010101011010110011010000110010000110011010100111011101011011010101011010110011
Mask: 000000000000000000000000000000000000000000000000000000001111111111111111111111111111111111111111111111111111111111111111
Chunk 0: 0x343219a9dd6d56b3
Chunk 1: 0x0043219a9dd6d56b

Reading bytes in c++

I'm trying to read bytes from a binary file but with no success.
I've tried many solutions, but I get no result.
Structure of the file:
[offset] [type]          [value]           [description]
0000     32 bit integer  0x00000803(2051)  magic number
0004     32 bit integer  60000             number of images
0008     32 bit integer  28                number of rows
0012     32 bit integer  28                number of columns
0016     unsigned byte   ??                pixel
0017     unsigned byte   ??                pixel
........
xxxx     unsigned byte   ??                pixel
How I tried (doesn't work):
auto myfile = fopen("t10k-images.idx3-ubyte", "r");
char buf[30];
auto x = fread(buf, 1, sizeof(int), myfile);
Read the bytes as unsigned char:
ifstream in; // "if" is a keyword and cannot be used as a variable name
in.open("filename", ios::binary);
if (in.fail())
{
//error
}
vector<unsigned char> bytes;
char byte;
// get() is unformatted, so bytes that look like whitespace are not skipped
while (in.get(byte))
{
bytes.push_back(static_cast<unsigned char>(byte));
}
in.close();
Then to turn multiple bytes into a 32-bit integer for example:
uint32_t number;
number = ((static_cast<uint32_t>(byte3) << 24)
| (static_cast<uint32_t>(byte2) << 16)
| (static_cast<uint32_t>(byte1) << 8)
| (static_cast<uint32_t>(byte0)));
This should cover endian issues. It doesn't matter if int shows up as B0B1B2B3 or B3B2B1B0 on the system, since the conversion is handled by bit shifts. The code doesn't assume any particular order in memory.
The C++ stream library function read() can be used for binary file I/O. Given the code example from the link, I would start like this:
std::ifstream myfile("t10k-images.idx3-ubyte", std::ios::binary);
std::uint32_t magic, numim, numro, numco;
myfile.read(reinterpret_cast<char*>(&magic), 4);
myfile.read(reinterpret_cast<char*>(&numim), 4);
myfile.read(reinterpret_cast<char*>(&numro), 4);
myfile.read(reinterpret_cast<char*>(&numco), 4);
// Changing byte order if necessary
//endswap(&magic);
//endswap(&numim);
//endswap(&numro);
//endswap(&numco);
if (myfile) {
std::cout << "Magic = " << magic << std::endl
<< "Images = " << numim << std::endl
<< "Rows = " << numro << std::endl
<< "Cols = " << numco << std::endl;
}
If the byte order (Endianness) should be reversed you could write a simple reverse function like this one: endswap()
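The function originally linked is not reproduced here, so the following is only a stand-in sketch of what an endswap with that call shape might look like, assuming it is used on trivially copyable types such as the fixed-width integers above.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

// Reverse the byte order of a trivially copyable value in place.
template <typename T>
void endswap(T* value)
{
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, value, sizeof(T));      // copy out the object representation
    std::reverse(bytes, bytes + sizeof(T));    // flip the byte order
    std::memcpy(value, bytes, sizeof(T));      // copy the reversed bytes back
}
```

It should only be called when the file's byte order differs from the host's; calling it twice restores the original value.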
Knowing the endianness of your file layout when reading multi-byte numerics is important. Assuming big-endian is always the written format, and assuming the value is indeed a 32-bit unsigned value:
uint32_t magic = 0;
unsigned char bytes[4];
if (1 == fread(bytes, sizeof(bytes), 1, f))
{
magic = (uint32_t)(((uint32_t)bytes[0] << 24) |
(bytes[1] << 16) |
(bytes[2] << 8) |
bytes[3]);
}
Note: this will work regardless of whether the reader (your program) is little endian or big-endian. I'm sure I missed at least one cast in there, but hopefully you get the point. The only safe, and portable way of reading multi-byte numerics is to (a) know the endianness they were written with, and (b) read-and-assemble them byte by byte.
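Applied to the 16-byte header from the question, the read-and-assemble approach could look like the sketch below. The struct IdxHeader and the function names are hypothetical, and the code assumes all four fields are big-endian, as stated above.

```cpp
#include <cstdint>

// Hypothetical container for the four header fields listed in the question.
struct IdxHeader {
    std::uint32_t magic;
    std::uint32_t images;
    std::uint32_t rows;
    std::uint32_t cols;
};

// Assemble one big-endian 32-bit value: the first byte is the most significant.
std::uint32_t load_be32(const unsigned char* p)
{
    return (static_cast<std::uint32_t>(p[0]) << 24)
         | (static_cast<std::uint32_t>(p[1]) << 16)
         | (static_cast<std::uint32_t>(p[2]) << 8)
         |  static_cast<std::uint32_t>(p[3]);
}

// `p` must point at the first 16 bytes of the file.
IdxHeader parse_idx_header(const unsigned char* p)
{
    return IdxHeader{load_be32(p), load_be32(p + 4), load_be32(p + 8), load_be32(p + 12)};
}
```

The pixel data then starts at offset 16 and can be used as unsigned bytes without any conversion.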
This is how you read a uint32_t from a file (the value arrives in the file's byte order, so convert it if that differs from the host's):
auto f = fopen("", "rb"); // note the b: for binary files you need to specify 'b'
std::uint32_t magic = 0;
fread (&magic, sizeof(std::uint32_t), 1, f);
Hope this helps.

Write BitSet of 8 bits to file (C++)

I have a BitSet of 8 bits.
How would I convert those 8 bits to a byte then write to file?
I have looked everywhere and only find converting the other way.
Thanks alot!
Assuming that you are talking about C++ STL bitsets, the answer is to convert the bitset to an integer (unsigned long, to be precise) and cast the result to a char.
Example:
#include <bitset>
#include <iostream>
using namespace std;
int main()
{
bitset<8> x;
char byte;
cout << "Enter a 8-bit bitset in binary: " << flush;
cin >> x;
cout << "x = " << x << endl;
byte = (char) x.to_ulong();
cout << "As byte: " << (int) byte << endl;
}
http://www.cplusplus.com/reference/stl/bitset/
They can also be directly inserted and extracted from streams in binary format.
You don't need to convert anything, you just write them to the output stream.
Aside from that, if you really wanted to extract them into something you're used to, to_ulong and to_string methods are provided.
If you have more bits in the set than an unsigned long can hold and don't want to write them out directly to the stream, then you're either going to have to convert to a string and go that route, or access each bit using the [] operator and shift them into bytes that you're writing out.
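The second route, shifting individual bits into bytes, might be sketched as follows. The function name pack_bits and the bit layout (bit i of the set goes to bit i % 8 of byte i / 8) are assumptions chosen for illustration.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

// Pack a bitset into bytes, lowest-numbered bits first.
// Unused high bits of the last byte are left at 0.
template <std::size_t N>
std::vector<unsigned char> pack_bits(const std::bitset<N>& bits)
{
    std::vector<unsigned char> bytes((N + 7) / 8, 0);
    for (std::size_t i = 0; i < N; ++i) {
        if (bits[i])
            bytes[i / 8] |= static_cast<unsigned char>(1u << (i % 8));
    }
    return bytes;
}
```

The resulting vector can be written in one call with ofstream::write, casting bytes.data() to const char*.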
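You could use std::ofstream:
#include <bitset>
#include <fstream>
std::ofstream os("myfile.txt", std::ofstream::binary);
// put() writes the raw byte; operator<< on an integer type could format it as text instead
os.put(static_cast<char>(std::bitset<8>("01101001").to_ulong()));
os.close();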

How to output the Binary value of a variable in C++

I've got a homework assignment in my C++ programming class to write a function that outputs the binary value of a variable's value.
So for example, if I set a value of "a" to a char I should get the binary value of "a" output.
My C++ professor isn't the greatest in the whole world and I'm having trouble getting my code to work using the cryptic examples he gave us. Right now, my code just outputs a binary value of 11111111 no matter what I set it to (unless it's NULL, then I get 00000000).
Here is my code:
#include <iostream>
#define useavalue 1
using namespace std;
void GiveMeTehBinary(char bin);
int main(){
#ifdef useavalue
char b = 'a';
#else
char b = '\0';
#endif
GiveMeTehBinary(b);
system("pause");
return 0;
}
void GiveMeTehBinary(char bin){
long s;
for (int i = 0; i < 8; i++){
s = bin >> i;
cout << s%2;
}
cout << endl << endl;
}
Thanks a ton in advance guys. You're always extremely helpful :)
Edit: Fixed now - thanks a bunch :D The problem was that I was not storing the value from the bit shift. I've updated the code to its working state above.
The compiler should warn you about certain statements in your code that have no effect [1]. Consider
bin >> i;
This does nothing, since you don’t store the result of this operation anywhere.
Also, why did you declare tehbinary as an array? All you ever use is one element (the current one). It would be enough to store just the current bit.
Some other things:
NULL must only be used with pointer values. Your usage works but it’s not the intended usage. What you really want is a null character, i.e. '\0'.
Please use real, descriptive names. I vividly remember myself using variables called tehdataz etc. but this really makes the code hard to read and once the initial funny wears off it’s annoying both for you when you try to read your code, and for whoever is grading your code.
Formatting the code properly helps understanding a lot: make the indentation logical and consistent.
[1] If you’re using g++, always pass the compiler flags -Wall -Wextra to get useful diagnostics about your code.
Try this:
#include <bitset>
#include <iostream>
int main()
{
std::bitset<8> x('a');
std::cout << x << std::endl;
}
It's actually really simple. To convert from decimal to binary, include <bitset> in your program. It provides the std::bitset class, which prints as binary digits when written to a stream:
std::cout << std::bitset<8>(0b01000101) << std::endl;
The template argument 8 is the number of bits, and therefore the length of the output string. The constructor argument is the value you want to convert. By the way, you can write an integer literal in binary form by prefixing it with 0b. Binary literals were added in C++14, so they won't work with earlier standards. Here is the full code if you want to test it out:
#include <iostream>
#include <bitset>
int main()
{
std::cout << std::bitset<8>(0b01000101) << std::endl;
}
Note that the value does not have to be written as a binary literal:
#include <iostream>
#include <bitset>
int main()
{
std::cout << std::bitset<8>(34) << std::endl;
}
output:
00100010
Why not just check each bit in the unsigned char variable?
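unsigned char b=0x80|0x20|0x01; //some test data
int bitbreakout[8] = {0}; // zero-initialise so unset bits print as 0
if(b&0x80) bitbreakout[7]=1;
//repeat above for 0x40, 0x20, etc.
// print element by element; streaming the array name would only show an address
for(int i=7;i>=0;i--) cout << bitbreakout[i];
There are a TON of ways to optimize this, but this should give you an idea of what to do.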
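One way to avoid repeating the test for each mask is a loop that shifts the value instead. This is a sketch, and the helper name to_binary is made up for illustration.

```cpp
#include <string>

// Build the 8-character binary representation of a byte, most significant bit first.
std::string to_binary(unsigned char value)
{
    std::string out;
    for (int bit = 7; bit >= 0; --bit)
        out += ((value >> bit) & 1) ? '1' : '0';
    return out;
}
```

For the character 'a' (ASCII 97) this yields "01100001".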
#include <iostream>
#include <limits>
using namespace std;
int main(){
unsigned int x = 255; // unsigned, so shifting into the top bit is well defined
for(int i = numeric_limits<unsigned int>::digits - 1; i >=0; i--){
cout << ((x >> i) & 1u);
}
}
It's actually really simple. If you know how to convert decimal to binary by hand, you can code it easily in C++. In fact, I have created a header file that converts not only from decimal to binary but from decimal to any base up to 36. Here's the code:
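#pragma once
#include <cstdint>
#include <string>
// Map a digit value to its character: 0-9, then A-Z; '?' for anything larger.
inline char valToChar(const uint32_t val)
{
if (val <= 9)
return static_cast<char>('0' + val);
if (val <= 35)
return static_cast<char>('A' + val - 10);
return '?';
}
// base must be at least 2, otherwise the loop below never terminates.
inline std::string baseConverter(uint32_t num, const uint32_t &base)
{
if (num == 0)
return "0"; // the loop below would produce an empty string
std::string result;
while (num != 0)
{
result = valToChar(num % base) + result;
num /= base;
}
return result;
}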
now, here is how you can use it.
int main()
{
std::cout << baseConverter(2021, 2) << "\n";
}
output:
11111100101

Reading/Writing integer values on a string object

I have the contents of a file assigned into a string object. For simplicity the file only has 5 bytes, which is the size of 1 integer plus another byte.
What I want to do is get the first four bytes of the string object and somehow store it into a valid integer variable by the program.
Then the program will do various operations on the integer, changing it.
Afterward I want the changed integer stored back into the first four bytes of the string object.
Could anyone tell me how I could achieve this? I would prefer to stick with the standard C++ library exclusively for this purpose. Thanks in advance for any help.
The following code snippet should illustrate a handful of things. Beware of endian differences. Play around with it. Try to understand what's going on. Add some file operations (binary read & write). The only way to really understand how to do this, is to experiment and create some tests.
#include <cstring>
#include <iostream>
#include <string>
using namespace std;
int main(int argc, char *argv[]) {
int a = 108554107; // some random number for example sake
char c[4]; // simulate std::string containing a binary int
memcpy(c, &a, sizeof a); // copy the bytes; casting c to int* would break aliasing rules
// reassemble a into b, using indexed bytes from c (assumes a little-endian host)
unsigned int u = 0;
u |= (c[3] & 0xffu) << 24;
u |= (c[2] & 0xffu) << 16;
u |= (c[1] & 0xffu) << 8;
u |= c[0] & 0xffu;
int b = static_cast<int>(u);
// read the bytes back into a third int for comparison
int d;
memcpy(&d, c, sizeof d);
// show that all three are equivalent
cout << "a: " << a << " b: " << b
<< " c: " << d << endl;
return 0;
}
std::string itself can hold zero bytes, but if you read the file through an interface that treats the data as a C string, the first zero byte would signal the end of the string, so you might end up with a string that is shorter than 5 bytes. Use binary I/O with C++ streams (for example istream::read) to avoid this.
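Using only the standard library, the round trip asked about in the question could be sketched like this. The helper names get_le32 and put_le32 are hypothetical, and the code assumes the integer is stored little-endian in the first four bytes of the string.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Read four bytes starting at `pos` as a little-endian 32-bit value.
// The caller must make sure pos + 4 <= s.size().
std::uint32_t get_le32(const std::string& s, std::size_t pos = 0)
{
    std::uint32_t v = 0;
    for (int i = 3; i >= 0; --i)
        v = (v << 8) | static_cast<unsigned char>(s[pos + i]);
    return v;
}

// Store `v` back into the same four bytes, low-order byte first.
void put_le32(std::string& s, std::uint32_t v, std::size_t pos = 0)
{
    for (int i = 0; i < 4; ++i)
        s[pos + i] = static_cast<char>((v >> (8 * i)) & 0xFFu);
}
```

The fifth byte of the string is never touched, so the rest of the file contents stay as they were.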