I have the contents of a file assigned into a string object. For simplicity the file only has 5 bytes, which is the size of 1 integer plus another byte.
What I want to do is take the first four bytes of the string object and store them in an integer variable that the program can use.
Then the program will do various operations on the integer, changing it.
Afterward I want the changed integer stored back into the first four bytes of the string object.
Could anyone tell me how I could achieve this? I would prefer to stick with the standard C++ library exclusively for this purpose. Thanks in advance for any help.
The following code snippet should illustrate a handful of things. Beware of endian differences. Play around with it. Try to understand what's going on. Add some file operations (binary read & write). The only way to really understand how to do this, is to experiment and create some tests.
#include <cstring>
#include <iostream>
#include <string>
using namespace std;
int main() {
    int a = 108554107; // some random number for example's sake
    char c[4]; // simulate std::string containing a binary int (assumes a 4-byte int)
    memcpy(c, &a, sizeof a); // copy the raw bytes; avoids the alignment and aliasing problems of a pointer cast
    // reassemble a into b, using indexed bytes from c (c[0] is the low byte on a little-endian machine)
    int b = 0;
    b |= (c[3] & 0xff) << 24;
    b |= (c[2] & 0xff) << 16;
    b |= (c[1] & 0xff) << 8;
    b |= c[0] & 0xff;
    // read the bytes back into a third int
    int d = 0;
    memcpy(&d, c, sizeof d);
    // show that all three are equivalent (a == b only holds on little-endian machines)
    cout << "a: " << a << " b: " << b
         << " c: " << d << endl;
    return 0;
}
If you fill the std::string through a C-string interface (for example by constructing it from a const char*), any zero byte is treated as the end of the string, so you might end up with a string that is shorter than 5 bytes. std::string itself can hold zero bytes, so read the file with binary I/O on C++ streams instead.
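As a minimal sketch of the round trip asked about in the question, the two helpers below copy an integer out of the first four bytes of a std::string and back again with std::memcpy. The names load_int and store_int are made up for this illustration, the string is assumed to hold at least four bytes, and the value is interpreted in the host's byte order.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Read a 32-bit integer from the first four bytes of s (host byte order).
// Assumes s.size() >= sizeof(std::int32_t).
std::int32_t load_int(const std::string& s) {
    std::int32_t value;
    std::memcpy(&value, s.data(), sizeof value);
    return value;
}

// Write value back into the first four bytes of s (host byte order).
// Assumes s.size() >= sizeof(std::int32_t); the remaining bytes are untouched.
void store_int(std::string& s, std::int32_t value) {
    std::memcpy(&s[0], &value, sizeof value);
}
```

A caller would do something like `std::int32_t v = load_int(s); v += 1; store_int(s, v);`. If the file has a fixed byte order that differs from the host's, the bytes have to be assembled with shifts instead.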
I am trying to create bitmapped data. Here is the code I used, but I am not able to figure out the right logic.
bool a=1;
bool b=0;
bool c=1;
bool d=0;
uint8_t output = a|b|c|d;
printf("output = %X", output);
I want my output to be "1010", which is equivalent to hex "0x0A". How do I do it?
The bitwise or operator ors the bits in each position. The result of a|b|c|d will be 1 because you're bitwise oring 0 and 1 in the least significant position.
You can shift (<<) the bits to the correct positions like this:
uint8_t output = a << 3 | b << 2 | c << 1 | d;
This will result in
00001000 (a << 3)
00000000 (b << 2)
00000010 (c << 1)
| 00000000 (d; d << 0)
--------
00001010 (output)
Strictly speaking, the calculation happens with ints and the intermediate results have more leading zeroes, but in this case we do not need to care about that.
If you're interested in setting, clearing, or accessing specific bits very simply, you could consider std::bitset:
bitset<8> s; // bit set of 8 bits
s[3]=a; // access individual bits, as if it was an array
s[2]=b;
s[1]=c;
s[0]=d; // the first bit is the least significant bit
cout << s <<endl; // streams the bitset as a string of '0' and '1'
cout << "0x"<< hex << s.to_ulong()<<endl; // convert the bitset to unsigned long
cout << s[3] <<endl; // access a specific bit
cout << "Number of bits set: " << s.count()<<endl;
The advantage is that the code is easier to read and maintain, especially if you're modifying bitmapped data. Setting specific bits using binary arithmetic with a combination of the << and | operators, as explained by Anttii, is a workable solution. But clearing specific bits in an existing bitmap, by combining << and ~ (to create a bit mask) with &, is a little more tricky.
Another advantage is that you can easily manage large bitsets of hundreds of bits, much larger than the largest built-in type unsigned long long (although doing so will not allow you to convert as easily to an unsigned long or an unsigned long long: you'll have to go via a string).
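For comparison, here is a rough sketch of what the manual approach with masks looks like on a plain byte. The helper names set_bit, clear_bit and test_bit are invented for this example, and pos is assumed to be in the range 0 to 7.

```cpp
#include <cstdint>

// Set bit number pos (0 = least significant) in value.
std::uint8_t set_bit(std::uint8_t value, unsigned pos) {
    return static_cast<std::uint8_t>(value | (1u << pos));
}

// Clear bit number pos: build a mask with only that bit set, invert it with ~,
// then AND it with the value so that every other bit is kept.
std::uint8_t clear_bit(std::uint8_t value, unsigned pos) {
    return static_cast<std::uint8_t>(value & ~(1u << pos));
}

// Report whether bit number pos is set.
bool test_bit(std::uint8_t value, unsigned pos) {
    return ((value >> pos) & 1u) != 0;
}
```

With std::bitset the same operations are s.set(pos), s.reset(pos) and s.test(pos), which is why it tends to read better.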
C only
I would use bit-fields. I know that their layout is not portable, but for a particular embedded target (especially microcontrollers) it is well defined by the compiler.
#include <string.h>
#include <stdio.h>
#include <stdbool.h>
typedef union
{
struct
{
bool a:1;
bool b:1;
bool c:1;
bool d:1;
bool e:1;
bool f:1;
};
unsigned char byte;
}mydata;
int main(void)
{
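mydata d;
d.byte = 0; // clear every bit first, including the unused ones
d.a=1;
d.b=0;
d.c=1;
d.d=0;
// bit order within the byte is implementation-defined, so check the result on your compiler
printf("output = %hhX", d.byte);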
}
I have a bitset which is very large, say, 10 billion bits.
What I'd like to do is write this to a file. However using .to_string() actually freezes my computer.
What I'd like to do is iterate over the bits and take 64 bits at a time, turn it into a uint64 and then write it to a file.
However, I don't know how to access different ranges of the bitset. How would I do that? I am new to C++ and wasn't sure how to access the underlying bitset::reference, so please provide an example in your answer.
I tried using a pointer but did not get what I expected. Here's an example of what I'm trying so far.
#include <iostream>
#include <bitset>
#include <cstring>
using namespace std;
int main()
{
bitset<50> bit_array(302332342342342323);
cout<<bit_array << "\n";
bitset<50>* p;
p = &bit_array;
p++;
int some_int;
memcpy(&some_int, p , 2);
cout << &bit_array << "\n";
cout << &p << "\n";
cout << some_int << "\n";
return 0;
}
the output
10000110011010100111011101011011010101011010110011
0x7ffe8aa2b090
0x7ffe8aa2b098
17736
The last number seems to change on each run which is not what I expect.
There are a couple of errors in the program. The maximum value bitset<50> can hold is 1125899906842623 and this is much less than what bit_array has been initialized with in the program.
some_int has to be defined as unsigned long, and you should verify that unsigned long has 64 bits on your platform.
After this, test each bit of bit_array in a loop and then do the appropriate bitwise (OR and shift) operations and store the result into some_int.
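std::size_t start_bit = 0;
std::size_t end_bit = 64;    // must not exceed bit_array.size()
unsigned long some_int = 0;  // assumed to be 64 bits wide
unsigned long mask = 1;      // selects the output bit that corresponds to bit i
for (std::size_t i = start_bit; i < end_bit; i++) {
    if (bit_array[i])
        some_int |= mask;
    mask <<= 1;
}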
You can change the values of start_bit and end_bit appropriately as you navigate through the large bitset.
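Putting the loop above into a reusable form, a sketch might look like the following. The function name to_words is an assumption made for this example, and the choice that word 0 holds bits 0..63 is just one possible convention.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack a bitset into 64-bit words; word 0 holds bits 0..63, word 1 holds
// bits 64..127, and so on. Unused high bits of the last word stay zero.
template <std::size_t N>
std::vector<std::uint64_t> to_words(const std::bitset<N>& bits) {
    std::vector<std::uint64_t> words((N + 63) / 64, 0);
    for (std::size_t i = 0; i < N; ++i) {
        if (bits[i])
            words[i / 64] |= std::uint64_t{1} << (i % 64);
    }
    return words;
}
```

The resulting words could then be written with std::ofstream::write on a stream opened in binary mode, keeping in mind that this stores them in the host's byte order. For a bitset of billions of bits it is probably better to write each word as soon as it is complete rather than collect them all in a vector first.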
For accessing ranges of a bitset, you should look at the provided interface. The lack of something like bitset::data() indicates that you should not try to access the underlying data directly. Doing so, even if it had seemed to work, is fragile, hacky, and probably undefined behavior of some sort.
I see two possibilities for converting a massive bitset into more manageable pieces. A fairly straight-forward approach is to just go through bit-by-bit and collect these into an integer of some sort (or write them directly to a file as '0' or '1' if you're not that concerned about file size). Looks like P.W already provided code for this, so I'll skip an example for now.
The second possibility is to use bitwise operators and to_ullong(). The downside of this approach is that it nominally uses auxiliary storage space, specifically two additional bitsets the same size as your original. I say "nominally", though, because a compiler might be clever enough to optimize them away. Might. Maybe not. And you are dealing with sizes over a gigabyte each. Realistically, the bit-by-bit approach is probably the way to go, but I think this example is interesting at a theoretical level.
#include <iostream>
#include <iomanip>
#include <bitset>
#include <cstdint>
using namespace std;
constexpr size_t FULL_SIZE = 120; // Some large number
constexpr size_t CHUNK_SIZE = 64; // Currently the mask assumes 64. Otherwise, this code just
// assumes CHUNK_SIZE is nonzero and at most the number of
// bits in long long (which is at least 64).
int main()
{
// Generate some large bitset. This is just test data, so don't read too much into this.
bitset<FULL_SIZE> bit_array(302332342342342323);
bit_array |= bit_array << (FULL_SIZE/2);
cout << "Source: " << bit_array << "\n";
// The mask avoids overflow in to_ullong().
// The mask should have exactly its CHUNK_SIZE low-order bits set.
// As long as we're dealing with 64-bit chunks, there's a handy constant to handle this.
constexpr bitset<FULL_SIZE> mask64(UINT64_MAX);
cout << "Mask: " << mask64 << "\n";
// Extract chunks.
const size_t num_chunks = (FULL_SIZE + CHUNK_SIZE - 1)/CHUNK_SIZE; // Round up.
for ( size_t i = 0; i < num_chunks; ++i ) {
// Extract the next CHUNK_SIZE bits, then convert to an integer.
const bitset<FULL_SIZE> chunk_set{(bit_array >> (CHUNK_SIZE * i)) & mask64};
unsigned long long chunk_val = chunk_set.to_ullong();
// NOTE: as long as CHUNK_SIZE <= 64, chunk_val can be converted safely to the desired uint64_t.
cout << "Chunk " << dec << i << ": 0x" << hex << setfill('0') << setw(16) << chunk_val << "\n";
}
return 0;
}
The output:
Source: 010000110010000110011010100111011101011011010101011010110011010000110010000110011010100111011101011011010101011010110011
Mask: 000000000000000000000000000000000000000000000000000000001111111111111111111111111111111111111111111111111111111111111111
Chunk 0: 0x343219a9dd6d56b3
Chunk 1: 0x0043219a9dd6d56b
I am trying to read in a binary file in a known format. I want to find the most efficient way to extract values from it. My ideas are:
Method 1: Read each value into a new char array then get it into the correct data type. For the first 4 byte positive int, I bitshift the values accordingly and assign to an integer as below.
Method 2: Keep the whole file in a char array, then create pointers to different parts of it. In the code below I am trying to point to these first 4 bytes and use reinterpret_cast to interpret them as an integer when I dereference the variable 'bui'.
But the output from this code is:
11000000001100000000110000000011
3224374275
00000011000011000011000011000000
51130560
My questions are
why does the endianness get swapped using my method 2 and how do I point to it correctly?
which method is more efficient? I need all of the file, and the file contains other data types too so I will need to write different methods to interpret them if using method 1. I was assuming I could just define different type pointers if using method 2 without doing extra work!
Thanks
#include <iostream>
#include <bitset>
int main(void){
unsigned char b[4];
//ifs.read((char*)b,sizeof(b));
//let's pretend the following 4 bytes are read in representing the number 3224374275:
b[0] = 0b11000000;
b[1] = 0b00110000;
b[2] = 0b00001100;
b[3] = 0b00000011;
//method 1:
unsigned int a = 0; //4 byte capacity
a = b[0] << 24 | b[1] << 16 | b[2] << 8 | b[3];
std::bitset<32> xm1(a);
std::cout << xm1 << std::endl;
std::cout << a << std::endl;
//method 2;
unsigned int* bui = reinterpret_cast<unsigned int*>(b);
std::bitset<32> xm2(*bui);
std::cout << xm2 << std::endl;
std::cout << *bui << std::endl;
}
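The difference between the two methods can be reproduced with the sketch below. Method 1 assembles the value with the first byte as the most significant one (big-endian) no matter where it runs, whereas reinterpreting the memory uses the host's byte order, which is little-endian on x86. The function names are made up for this illustration, and std::memcpy stands in for the reinterpret_cast to sidestep alignment and aliasing concerns.

```cpp
#include <cstdint>
#include <cstring>

// Method 1 style: the first byte is the most significant one (big-endian),
// regardless of the machine the code runs on.
std::uint32_t from_big_endian(const unsigned char* p) {
    return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
           (std::uint32_t{p[2]} << 8)  |  std::uint32_t{p[3]};
}

// Method 2 style: copy the bytes as they are, so the result depends on
// the host's byte order.
std::uint32_t from_host_order(const unsigned char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}
```

For the four bytes in the question, from_big_endian gives 3224374275 everywhere, while from_host_order gives 51130560 on a little-endian machine, matching the output shown above.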
Say I have a binary file; it contains positive binary numbers, written in little endian as 32-bit integers.
How do I read this file? I have this right now.
int main() {
FILE * fp;
char buffer[4];
int num = 0;
fp=fopen("file.txt","rb");
while ( fread(&buffer, 1, 4,fp) != 0) {
// I think buffer should be 32 bit integer I read,
// how can I let num equal to 32 bit little endian integer?
}
// Say I just want to get the sum of all these binary little endian integers,
// is there an another way to make read and get sum faster since it's all
// binary, shouldnt it be faster if i just add in binary? not sure..
return 0;
}
This is one way to do it that works on either big-endian or little-endian architectures:
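#include <stdio.h>
#include <stdint.h>
int main() {
    unsigned char bytes[4];
    uint32_t sum = 0; // unsigned, so overflow wraps instead of being undefined
    FILE *fp=fopen("file.txt","rb");
    if (!fp) return 1;
    while ( fread(bytes, 4, 1,fp) == 1) {
        // the casts avoid shifting a byte into the sign bit of int
        sum += (uint32_t)bytes[0] | ((uint32_t)bytes[1]<<8) | ((uint32_t)bytes[2]<<16) | ((uint32_t)bytes[3]<<24);
    }
    fclose(fp);
    printf("%lu\n", (unsigned long)sum);
    return 0;
}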
If you are using Linux, look at the byte-order functions declared in <endian.h>, such as le32toh.
From CodeGuru:
inline void endian_swap(unsigned int& x)
{
x = (x>>24) |
((x<<8) & 0x00FF0000) |
((x>>8) & 0x0000FF00) |
(x<<24);
}
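So you can read directly into an unsigned int and then call this whenever the host's byte order differs from the file's (for a little-endian file, that means on a big-endian machine).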
while ( fread(&num, 1, 4,fp) != 0) {
endian_swap(num);
// conversion done; then use num
}
If you are working with short files, I recommend simply using the stringstream class and then the stoul function. The code below reads byte by byte (in this case 2 bytes) from an ifstream and writes them in hex into a string stream. Then stoul converts the string into a 16-bit integer:
#include <cstdint>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
ifstream is("filename.bin", ios::binary);
if(!is) { /*Error*/ }
is.unsetf(ios_base::skipws);
stringstream ss;
uint8_t byte1, byte2;
uint16_t val;
is >> byte1; is >> byte2;
ss << setw(2) << setfill('0') << hex << static_cast<size_t>(byte1);
ss << setw(2) << setfill('0') << hex << static_cast<size_t>(byte2);
val = static_cast<uint16_t>(stoul(ss.str(), nullptr, 16));
cout << val << endl;
For example, if you have to read a 16-bit integer stored in big-endian order (00 f3) from a binary file, you put it inside a stringstream ("00f3") and then convert it into an integer (243). The example writes the value in hex, but it could be decimal or octal, or even binary using the bitset class. The iomanip functions (setw, setfill) give the contents of the stringstream the correct format.
The drawback of this method is that it is tremendously slow if you have to work with large files.
You read the data normally. However, when you go to interpret the data, you need to make the proper conversions.
This can be a pain because, if you want to make your code portable, i.e. able to run on both little- and big-endian machines, you need to handle all the combinations: little to big, big to little, little to little, and big to big. The last two cases are no-ops.
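Done by hand, handling those combinations could look roughly like this sketch. All of the function names here are invented for the example. The idea is to detect the host's byte order once and to swap only when it differs from the order used in the file.

```cpp
#include <cstdint>
#include <cstring>

// True when the least significant byte of an integer is stored first in memory.
bool host_is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Reverse the order of the four bytes of x.
std::uint32_t byte_swap32(std::uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

// Convert a value read raw from a little-endian file to host order
// (a no-op on little-endian hosts).
std::uint32_t le32_to_host(std::uint32_t raw) {
    return host_is_little_endian() ? raw : byte_swap32(raw);
}

// Convert a value read raw from a big-endian file to host order
// (a no-op on big-endian hosts).
std::uint32_t be32_to_host(std::uint32_t raw) {
    return host_is_little_endian() ? byte_swap32(raw) : raw;
}
```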
Fortunately this all can be automated with the boost::endian library. An example from their documentation:
#include <iostream>
#include <cstdio>
#include <boost/endian/arithmetic.hpp>
#include <boost/static_assert.hpp>
using namespace boost::endian;
namespace
{
// This is an extract from a very widely used GIS file format.
// Why the designer decided to mix big and little endians in
// the same file is not known. But this is a real-world format
// and users wishing to write low level code manipulating these
// files have to deal with the mixed endianness.
struct header
{
big_int32_t file_code;
big_int32_t file_length;
little_int32_t version;
little_int32_t shape_type;
};
const char* filename = "test.dat";
}
int main(int, char* [])
{
header h;
BOOST_STATIC_ASSERT(sizeof(h) == 16U); // reality check
h.file_code = 0x01020304;
h.file_length = sizeof(header);
h.version = 1;
h.shape_type = 0x01020304;
// Low-level I/O such as POSIX read/write or <cstdio>
// fread/fwrite is sometimes used for binary file operations
// when ultimate efficiency is important. Such I/O is often
// performed in some C++ wrapper class, but to drive home the
// point that endian integers are often used in fairly
// low-level code that does bulk I/O operations, <cstdio>
// fopen/fwrite is used for I/O in this example.
std::FILE* fi = std::fopen(filename, "wb"); // MUST BE BINARY
if (!fi)
{
std::cout << "could not open " << filename << '\n';
return 1;
}
if (std::fwrite(&h, sizeof(header), 1, fi) != 1)
{
std::cout << "write failure for " << filename << '\n';
return 1;
}
std::fclose(fi);
std::cout << "created file " << filename << '\n';
return 0;
}
After compiling and executing endian_example.cpp, a hex dump of test.dat shows:
01020304 00000010 01000000 04030201
I have a BitSet of 8 bits.
How would I convert those 8 bits to a byte then write to file?
I have looked everywhere and only find converting the other way.
Thanks a lot!
Assuming that you are talking about C++ STL bitsets, the answer is to convert the bitset to an integer (unsigned long, to be precise) and cast the result to a char.
Example:
#include <bitset>
#include <iostream>
using namespace std;
int main()
{
bitset<8> x;
char byte;
cout << "Enter a 8-bit bitset in binary: " << flush;
cin >> x;
cout << "x = " << x << endl;
byte = (char) x.to_ulong();
cout << "As byte: " << (int) byte << endl;
}
http://www.cplusplus.com/reference/stl/bitset/
They can also be inserted into and extracted from streams directly, as a sequence of '0' and '1' characters.
If that text form is what you want, you don't need to convert anything; you just write them to the output stream.
Aside from that, if you really want to extract them into something you're used to, the to_ulong and to_string methods are provided.
If you have more bits in the set than an unsigned long can hold and don't want to write them out directly to the stream, then you're either going to have to convert to a string and go that route, or access each bit using the [] operator and shift them into bytes that you then write out.
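The last option, shifting individual bits into bytes, might be sketched as follows. The name to_bytes is an assumption for this example, as is the convention that byte 0 holds bits 0..7 with bit 0 in its least significant position.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

// Pack a bitset into raw bytes; byte 0 holds bits 0..7, with bit 0 as its
// least significant bit. Unused high bits of the last byte stay zero.
template <std::size_t N>
std::vector<unsigned char> to_bytes(const std::bitset<N>& bits) {
    std::vector<unsigned char> bytes((N + 7) / 8, 0);
    for (std::size_t i = 0; i < N; ++i) {
        if (bits[i])
            bytes[i / 8] |= static_cast<unsigned char>(1u << (i % 8));
    }
    return bytes;
}
```

The bytes can then be written to a stream opened in binary mode, for instance with os.write(reinterpret_cast<const char*>(bytes.data()), bytes.size()).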
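You could use std::ofstream from <fstream>:
#include <bitset>
#include <fstream>
std::ofstream os("myfile.txt", std::ofstream::binary);
// put() writes the raw byte; operator<< would format a non-character integer type as text
os.put(static_cast<char>(std::bitset<8>("01101001").to_ulong()));
os.close();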