I have a bitset which is very large, say, 10 billion bits.
What I'd like to do is write this to a file. However, using .to_string() actually freezes my computer.
Instead, I'd like to iterate over the bits, take 64 bits at a time, turn them into a uint64 and then write that to a file.
However, I don't know how to access different ranges of the bitset. How would I do that? I am new to C++ and wasn't sure how to access the underlying bitset::reference, so please provide an example in your answer.
I tried using a pointer but did not get what I expected. Here's an example of what I'm trying so far.
#include <iostream>
#include <bitset>
#include <cstring>
using namespace std;
int main()
{
bitset<50> bit_array(302332342342342323);
cout<<bit_array << "\n";
bitset<50>* p;
p = &bit_array;
p++;
int some_int;
memcpy(&some_int, p , 2);
cout << &bit_array << "\n";
cout << &p << "\n";
cout << some_int << "\n";
return 0;
}
the output
10000110011010100111011101011011010101011010110011
0x7ffe8aa2b090
0x7ffe8aa2b098
17736
The last number seems to change on each run which is not what I expect.
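There are a couple of errors in the program. The maximum value bitset<50> can hold is 1125899906842623 (2^50 - 1), which is much less than the value bit_array is initialized with, so the high-order bits are dropped. In addition, p++ makes p point just past bit_array, so memcpy reads memory that does not belong to the bitset, and it copies only 2 bytes into the uninitialized some_int. That is why the printed number varies.
some_int has to be an unsigned 64-bit integer. unsigned long is only guaranteed to have 32 bits, so either verify that it has 64 bits on your platform or use unsigned long long or uint64_t.
After this, test each bit of bit_array in a loop and use bitwise OR and shift operations to store the result in some_int.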
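unsigned long long some_int = 0; // accumulates the extracted bits
unsigned long long mask = 1;     // selects which bit of some_int to set next
std::size_t start_bit = 0;
std::size_t end_bit = 64;        // must not exceed bit_array.size()
for (std::size_t i = start_bit; i < end_bit; i++) {
    if (bit_array[i])
        some_int |= mask;
    mask <<= 1;
}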
You can change the values of start_bit and end_bit appropriately as you navigate through the large bitset.
For accessing ranges of a bitset, you should look at the provided interface. The lack of something like bitset::data() indicates that you should not try to access the underlying data directly. Doing so, even if it had seemed to work, is fragile, hacky, and probably undefined behavior of some sort.
I see two possibilities for converting a massive bitset into more manageable pieces. A fairly straight-forward approach is to just go through bit-by-bit and collect these into an integer of some sort (or write them directly to a file as '0' or '1' if you're not that concerned about file size). Looks like P.W already provided code for this, so I'll skip an example for now.
The second possibility is to use bitwise operators and to_ullong(). The downside of this approach is that it nominally uses auxiliary storage space, specifically two additional bitsets the same size as your original. I say "nominally", though, because a compiler might be clever enough to optimize them away. Might. Maybe not. And you are dealing with sizes over a gigabyte each. Realistically, the bit-by-bit approach is probably the way to go, but I think this example is interesting at a theoretical level.
#include <iostream>
#include <iomanip>
#include <bitset>
#include <cstdint>
using namespace std;
constexpr size_t FULL_SIZE = 120; // Some large number
constexpr size_t CHUNK_SIZE = 64; // Currently the mask assumes 64. Otherwise, this code just
// assumes CHUNK_SIZE is nonzero and at most the number of
// bits in long long (which is at least 64).
int main()
{
// Generate some large bitset. This is just test data, so don't read too much into this.
bitset<FULL_SIZE> bit_array(302332342342342323);
bit_array |= bit_array << (FULL_SIZE/2);
cout << "Source: " << bit_array << "\n";
// The mask avoids overflow in to_ullong().
// The mask should have exactly its CHUNK_SIZE low-order bits set.
// As long as we're dealing with 64-bit chunks, there's a handy constant to handle this.
constexpr bitset<FULL_SIZE> mask64(UINT64_MAX);
cout << "Mask: " << mask64 << "\n";
// Extract chunks.
const size_t num_chunks = (FULL_SIZE + CHUNK_SIZE - 1)/CHUNK_SIZE; // Round up.
for ( size_t i = 0; i < num_chunks; ++i ) {
// Extract the next CHUNK_SIZE bits, then convert to an integer.
const bitset<FULL_SIZE> chunk_set{(bit_array >> (CHUNK_SIZE * i)) & mask64};
unsigned long long chunk_val = chunk_set.to_ullong();
// NOTE: as long as CHUNK_SIZE <= 64, chunk_val can be converted safely to the desired uint64_t.
cout << "Chunk " << dec << i << ": 0x" << hex << setfill('0') << setw(16) << chunk_val << "\n";
}
return 0;
}
The output:
Source: 010000110010000110011010100111011101011011010101011010110011010000110010000110011010100111011101011011010101011010110011
Mask: 000000000000000000000000000000000000000000000000000000001111111111111111111111111111111111111111111111111111111111111111
Chunk 0: 0x343219a9dd6d56b3
Chunk 1: 0x0043219a9dd6d56b
I am trying to create bitmapped data, but I am not able to figure out the right logic. Here's my code:
bool a=1;
bool b=0;
bool c=1;
bool d=0;
uint8_t output = a|b|c|d;
printf("outupt = %X", output);
I want my output to be "1010", which is equivalent to hex "0x0A". How do I do it?
The bitwise OR operator ORs the bits in each position. Each bool is promoted to an int with the value 0 or 1, so all four values occupy only the least significant position, and the result of a|b|c|d is 1.
You can shift (<<) the bits to the correct positions like this:
uint8_t output = a << 3 | b << 2 | c << 1 | d;
This will result in
00001000 (a << 3)
00000000 (b << 2)
00000010 (c << 1)
| 00000000 (d; d << 0)
--------
00001010 (output)
Strictly speaking, the calculation happens with ints and the intermediate results have more leading zeroes, but in this case we do not need to care about that.
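The shifting approach above can be wrapped in a small function. This is only a sketch; the name pack_bits is hypothetical and not part of any library.

```cpp
#include <cstdint>

// Hypothetical helper: place four flags into the low nibble of a byte,
// with a as the most significant of the four bits.
std::uint8_t pack_bits(bool a, bool b, bool c, bool d)
{
    // Each bool is promoted to int (0 or 1) before shifting, so the
    // intermediate values are ints; the cast narrows the result back.
    return static_cast<std::uint8_t>((a << 3) | (b << 2) | (c << 1) | d);
}
```

Calling pack_bits(true, false, true, false) yields 0x0A, the value the question asks for.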
If you want a simple way to set, clear, or access specific bits, you could consider std::bitset:
bitset<8> s; // bit set of 8 bits
s[3]=a; // access individual bits, as if it was an array
s[2]=b;
s[1]=c;
s[0]=d; // the first bit is the least significant bit
cout << s <<endl; // streams the bitset as a string of '0' and '1'
cout << "0x"<< hex << s.to_ulong()<<endl; // convert the bitset to unsigned long
cout << s[3] <<endl; // access a specific bit
cout << "Number of bits set: " << s.count()<<endl;
The advantage is that the code is easier to read and maintain, especially if you're modifying bitmapped data. Setting specific bits with a combination of the << and | operators, as explained by Anttii, is a workable solution. But clearing specific bits in an existing bitmap, which combines << and ~ (to create a bit mask) with &, is a little more tricky.
Another advantage is that you can easily manage large bitsets of hundreds of bits, much larger than the largest built-in type unsigned long long (although doing so will not allow you to convert as easily to an unsigned long or an unsigned long long: you'll have to go via a string).
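To make the comparison concrete, here is a rough sketch of setting and clearing a single bit with masks, next to the std::bitset equivalent. The function names are invented for this example.

```cpp
#include <bitset>
#include <cstdint>

// Hypothetical helpers operating on bit n (0 = least significant) of a byte.
std::uint8_t set_bit(std::uint8_t value, unsigned n)
{
    return static_cast<std::uint8_t>(value | (1u << n));
}

std::uint8_t clear_bit(std::uint8_t value, unsigned n)
{
    // ~(1u << n) has every bit set except bit n, so & keeps all other bits.
    return static_cast<std::uint8_t>(value & ~(1u << n));
}

// The same clear operation expressed with std::bitset.
std::uint8_t clear_bit_bitset(std::uint8_t value, unsigned n)
{
    std::bitset<8> bits(value);
    bits.reset(n); // throws std::out_of_range if n >= 8
    return static_cast<std::uint8_t>(bits.to_ulong());
}
```

The mask versions assume n is less than 8. The bitset version checks the index for you, at the cost of an exception on misuse.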
C only
I would use bit-fields. I know that they are not portable, but for a particular embedded target (especially microcontrollers) the layout is well defined by the compiler.
#include <string.h>
#include <stdio.h>
#include <stdbool.h>
typedef union
{
struct
{
bool a:1;
bool b:1;
bool c:1;
bool d:1;
bool e:1;
bool f:1;
};
unsigned char byte;
}mydata;
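int main(void)
{
    mydata d;
    d.byte = 0; // clear every bit first; e, f and the unused bits are otherwise indeterminate
    d.a=1;
    d.b=0;
    d.c=1;
    d.d=0;
    // Bit-field allocation order is implementation-defined: compilers that
    // start at the least significant bit store a in bit 0 and print 5, not A.
    printf("output = %hhX\n", d.byte);
}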
What is the most efficient way to convert a 16-bit value to a 32-bit value in C++ by padding the extra 2 bytes with 0 (i.e. changing only the size of the variable, not its value)?
Include the <cstdint> header and initialize your 32-bit integer with your 16-bit value. Be sure to pay attention to your signs. In the example below I'm converting all integer values (signed or not) to an unsigned 32-bit integer.
Example Conversion
#include <iostream>
#include <cstdint>
#include <iomanip>
using namespace std;
void dumpVar(const uint32_t value)
{
cout << setfill('0') << setw(8) << hex << value << endl;
}
int main()
{
uint16_t test1 = 0xffff;
int16_t test2 = 32767;
uint16_t test3 = 0xf33e;
int16_t test4 = -32768;
dumpVar(test1);
dumpVar(test2);
dumpVar(test3);
dumpVar(test4);
return 0;
}
Sample Output
Notice that the negative value is not zero-padded as you might expect. Converting an int16_t to uint32_t sign-extends it, so the upper 16 bits are filled with copies of the sign bit.
0000ffff
00007fff
0000f33e
ffff8000
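If the goal is to zero-pad the 16-bit pattern even for negative inputs, one option is to convert through uint16_t first. The following sketch contrasts the two conversions; the function names are illustrative only.

```cpp
#include <cstdint>

// Hypothetical helper: widen by value. Negative inputs wrap modulo 2^32,
// which sets the upper 16 bits (sign extension).
std::uint32_t widen_by_value(std::int16_t v)
{
    return static_cast<std::uint32_t>(v);
}

// Hypothetical helper: widen the bit pattern. Converting to uint16_t first
// keeps the low 16 bits, and the second conversion pads with zeros.
std::uint32_t widen_bit_pattern(std::int16_t v)
{
    return static_cast<std::uint32_t>(static_cast<std::uint16_t>(v));
}
```

For -32768 the first function returns 0xffff8000, matching the last line of the output above, while the second returns 0x00008000.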
C and C++ handle this sort of operation automatically.
For example:
unsigned short small_number = 0xbeef; // unsigned, so widening pads with zeros
unsigned int large_number = small_number;
// large_number is now 0x0000beef
Say I have a binary file; it contains positive binary numbers, but written in little endian as 32-bit integers
How do I read this file? I have this right now.
int main() {
FILE * fp;
char buffer[4];
int num = 0;
fp=fopen("file.txt","rb");
while ( fread(&buffer, 1, 4,fp) != 0) {
// I think buffer should be 32 bit integer I read,
// how can I let num equal to 32 bit little endian integer?
}
// Say I just want to get the sum of all these binary little endian integers,
// is there an another way to make read and get sum faster since it's all
// binary, shouldnt it be faster if i just add in binary? not sure..
return 0;
}
This is one way to do it that works on either big-endian or little-endian architectures:
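#include <stdio.h>
#include <stdint.h>
int main() {
    unsigned char bytes[4];
    uint64_t sum = 0; // wider than 32 bits so that adding many values does not overflow
    FILE *fp=fopen("file.txt","rb");
    if (fp == NULL) return 1;
    while ( fread(bytes, 4, 1,fp) == 1) {
        // Convert each byte to uint32_t before shifting to avoid signed overflow.
        sum += (uint32_t)bytes[0] | ((uint32_t)bytes[1]<<8)
             | ((uint32_t)bytes[2]<<16) | ((uint32_t)bytes[3]<<24);
    }
    fclose(fp);
    printf("%llu\n", (unsigned long long)sum);
    return 0;
}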
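The reason this works on any host is that the shifts operate on values, not on the memory layout of an int. As a sketch, the decoding step can be isolated in a helper; load_le32 is a name chosen here for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: decode 4 bytes stored least significant byte first.
// The casts to uint32_t happen before the shifts, so a high byte of 0x80
// or more cannot overflow a signed int.
std::uint32_t load_le32(const unsigned char* bytes)
{
    return  static_cast<std::uint32_t>(bytes[0])
         | (static_cast<std::uint32_t>(bytes[1]) << 8)
         | (static_cast<std::uint32_t>(bytes[2]) << 16)
         | (static_cast<std::uint32_t>(bytes[3]) << 24);
}
```

Given the bytes 78 56 34 12 in file order, load_le32 returns 0x12345678 regardless of the byte order of the machine running the code.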
If you are using Linux, look at the conversion functions declared in <endian.h> ;-)
It provides useful functions such as le32toh.
From CodeGuru:
inline void endian_swap(unsigned int& x)
{
x = (x>>24) |
((x<<8) & 0x00FF0000) |
((x>>8) & 0x0000FF00) |
(x<<24);
}
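So you can read directly into an unsigned int and then call this. The swap is only needed when the host is big-endian; on a little-endian host, the value read from a little-endian file is already correct.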
while ( fread(&num, 1, 4,fp) != 0) {
endian_swap(num);
// conversion done; then use num
}
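If the same code has to run on both kinds of host, the swap can be made conditional. The sketch below detects the host byte order at run time. All three function names are hypothetical, and swap32 repeats the byte reversal of endian_swap above as a pure function.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if the least significant byte is stored first.
bool host_is_little_endian()
{
    const std::uint32_t probe = 1;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

// Reverse the order of the four bytes of x.
std::uint32_t swap32(std::uint32_t x)
{
    return (x >> 24) | ((x << 8) & 0x00FF0000u)
         | ((x >> 8) & 0x0000FF00u) | (x << 24);
}

// Hypothetical helper: interpret a value that was read raw from a
// little-endian file, swapping only when the host is big-endian.
std::uint32_t le32_to_host(std::uint32_t raw)
{
    return host_is_little_endian() ? raw : swap32(raw);
}
```

This assumes the host is either little-endian or big-endian; mixed byte orders are not handled.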
If you are working with short files, I recommend simply using the stringstream class and the stoul function. The code below reads byte by byte (in this case 2 bytes) from an ifstream and writes the bytes in hex into a string stream. Then stoul converts the string into a 16-bit integer:
#include <cstdint>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
ifstream is("filename.bin", ios::binary);
if(!is) { /*Error*/ }
is.unsetf(ios_base::skipws);
stringstream ss;
uint8_t byte1, byte2;
uint16_t val;
is >> byte1; is >> byte2;
ss << setw(2) << setfill('0') << hex << static_cast<size_t>(byte1);
ss << setw(2) << setfill('0') << hex << static_cast<size_t>(byte2);
val = static_cast<uint16_t>(stoul(ss.str(), nullptr, 16));
cout << val << endl;
For example, if you have to read a 16-bit integer stored in big endian (00 f3) from a binary file, you put it inside a stringstream ("00f3") and then convert it to an integer (243). The example writes the value in hex, but it could be decimal or octal, or even binary using the bitset class. The iomanip functions (setw, setfill) give the string stream the correct format.
The drawback of this method is that it is tremendously slow if you have to work with large files.
You read the file normally. However, when you interpret the data, you need to make the proper conversions.
This can be a pain in the butt: if you want your code to be portable, i.e. to run on both little- and big-endian machines, you need to handle every combination: little to big, big to little, little to little and big to big. The last two cases are no-ops.
Fortunately this all can be automated with the boost::endian library. An example from their documentation:
#include <iostream>
#include <cstdio>
#include <boost/endian/arithmetic.hpp>
#include <boost/static_assert.hpp>
using namespace boost::endian;
namespace
{
// This is an extract from a very widely used GIS file format.
// Why the designer decided to mix big and little endians in
// the same file is not known. But this is a real-world format
// and users wishing to write low level code manipulating these
// files have to deal with the mixed endianness.
struct header
{
big_int32_t file_code;
big_int32_t file_length;
little_int32_t version;
little_int32_t shape_type;
};
const char* filename = "test.dat";
}
int main(int, char* [])
{
header h;
BOOST_STATIC_ASSERT(sizeof(h) == 16U); // reality check
h.file_code = 0x01020304;
h.file_length = sizeof(header);
h.version = 1;
h.shape_type = 0x01020304;
// Low-level I/O such as POSIX read/write or <cstdio>
// fread/fwrite is sometimes used for binary file operations
// when ultimate efficiency is important. Such I/O is often
// performed in some C++ wrapper class, but to drive home the
// point that endian integers are often used in fairly
// low-level code that does bulk I/O operations, <cstdio>
// fopen/fwrite is used for I/O in this example.
std::FILE* fi = std::fopen(filename, "wb"); // MUST BE BINARY
if (!fi)
{
std::cout << "could not open " << filename << '\n';
return 1;
}
if (std::fwrite(&h, sizeof(header), 1, fi) != 1)
{
std::cout << "write failure for " << filename << '\n';
return 1;
}
std::fclose(fi);
std::cout << "created file " << filename << '\n';
return 0;
}
After compiling and executing endian_example.cpp, a hex dump of test.dat shows:
01020304 00000010 01000000 04030201
I've got a homework assignment in my C++ programming class to write a function that outputs the binary value of a variable's value.
So for example, if I set a value of "a" to a char I should get the binary value of "a" output.
My C++ professor isn't the greatest in the whole world, and I'm having trouble getting my code to work using the cryptic examples he gave us. Right now, my code just outputs a binary value of 11111111 no matter what I set it to (unless it's NULL, in which case I get 00000000).
Here is my code:
#include <iostream>
#define useavalue 1
using namespace std;
void GiveMeTehBinary(char bin);
int main(){
#ifdef useavalue
char b = 'a';
#else
char b = '\0';
#endif
GiveMeTehBinary(b);
system("pause");
return 0;
}
void GiveMeTehBinary(char bin){
long s;
for (int i = 0; i < 8; i++){
s = bin >> i;
cout << s%2;
}
cout << endl << endl;
}
Thanks a ton in advance guys. You're always extremely helpful :)
Edit: Fixed now - thanks a bunch :D The problem was that I was not storing the value from the bit shift. I've updated the code to its working state above.
The compiler should warn you about certain statements in your code that have no effect (see footnote 1). Consider
bin >> i;
This does nothing, since you don’t store the result of this operation anywhere.
Also, why did you declare tehbinary as an array? All you ever use is one element (the current one). It would be enough to store just the current bit.
Some other things:
NULL must only be used with pointer values. Your usage works but it’s not the intended usage. What you really want is a null character, i.e. '\0'.
Please use real, descriptive names. I vividly remember myself using variables called tehdataz etc. but this really makes the code hard to read and once the initial funny wears off it’s annoying both for you when you try to read your code, and for whoever is grading your code.
Formatting the code properly helps understanding a lot: make the indentation logical and consistent.
1 If you’re using g++, always pass the compiler flags -Wall -Wextra to get useful diagnostics about your code.
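Putting these points together, here is one possible sketch of such a function with a descriptive name, returning the text instead of printing it. The name to_binary is invented here, an 8-bit char is assumed, and the bits are produced most significant first.

```cpp
#include <string>

// Hypothetical helper: the 8 bits of value as text, most significant first.
std::string to_binary(char value)
{
    // Work on unsigned char so that a negative char does not sign-extend
    // when shifted.
    const unsigned char bits = static_cast<unsigned char>(value);
    std::string text;
    for (int i = 7; i >= 0; --i) {
        // Store the result of the shift; the shift alone has no effect.
        const int current_bit = (bits >> i) & 1;
        text += current_bit ? '1' : '0';
    }
    return text;
}
```

For 'a' (ASCII 97) this returns "01100001".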
Try this:
#include <bitset>
#include <iostream>
int main()
{
std::bitset<8> x('a');
std::cout << x << std::endl;
}
It's actually really simple. To convert from decimal to binary, include <bitset> in your program. It provides the std::bitset class template, which prints its value in binary form when streamed:
std::cout << std::bitset<8>(0b01000101) << std::endl;
The template argument 8 is the number of bits, and therefore the length of the output string. The constructor argument is the value you want to convert. By the way, you can write a literal in binary form by putting 0b in front of it. Binary literals were added in C++14, so they won't work with earlier standards. Here is the full code if you want to test it out.
#include <iostream>
#include <bitset>
int main()
{
std::cout << std::bitset<8>(0b01000101) << std::endl;
}
Note that you don't have to input a binary number to do this.
#include <iostream>
#include <bitset>
int main()
{
std::cout << std::bitset<8>(34) << std::endl;
}
output:
00100010
Why not just check each bit in the unsigned char variable?
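unsigned char b=0x80|0x20|0x01; //some test data
int bitbreakout[8] = {0}; //one entry per bit, index 0 = least significant
for(int i=0; i<8; i++)
    if(b&(1<<i)) bitbreakout[i]=1; //same test as b&0x80 for i=7, b&0x40 for i=6, etc.
for(int i=7; i>=0; i--)
    cout << bitbreakout[i]; //print the bits, most significant first
There are a TON of ways to optimize this, but this should give you an idea of what to do.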
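#include <iostream>
#include <limits>
using namespace std;
int main(){
    unsigned int x = 255; // unsigned, so shifting into the top bit is well defined
    // digits is the number of value bits; the highest bit index is digits - 1
    for(int i = numeric_limits<unsigned int>::digits - 1; i >= 0; i--){
        cout << ((x >> i) & 1u);
    }
    cout << "\n";
}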
It's actually really simple. If you know how to convert decimal to binary by hand, you can code it easily in C++. In fact, I have created a header file that converts not only from decimal to binary but from decimal to any number system up to base 36. Here's the code.
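#pragma once
#include <cstdint>
#include <string>
// Map a digit value to its character: 0-9, then A-Z; '?' if out of range.
inline char valToChar(const uint32_t val)
{
    if (val <= 9)
        return static_cast<char>('0' + val);
    if (val <= 35)
        return static_cast<char>('A' + val - 10);
    return '?';
}
// Assumes base >= 2; digit values above 35 are shown as '?'.
inline std::string baseConverter(uint32_t num, const uint32_t &base)
{
    if (num == 0)
        return "0"; // the loop below would otherwise return an empty string
    std::string result;
    while (num != 0)
    {
        result = valToChar(num % base) + result;
        num /= base;
    }
    return result;
}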
Now, here is how you can use it.
#include <iostream>
// also include the header file shown above
int main()
{
    std::cout << baseConverter(2021, 2) << "\n";
}
output:
11111100101
I have the contents of a file assigned into a string object. For simplicity the file only has 5 bytes, which is the size of 1 integer plus another byte.
What I want to do is get the first four bytes of the string object and somehow store it into a valid integer variable by the program.
Then the program will do various operations on the integer, changing it.
Afterward I want the changed integer stored back into the first four bytes of the string object.
Could anyone tell me how I could achieve this? I would prefer to stick with the standard C++ library exclusively for this purpose. Thanks in advance for any help.
The following code snippet should illustrate a handful of things. Beware of endian differences. Play around with it. Try to understand what's going on. Add some file operations (binary read & write). The only way to really understand how to do this, is to experiment and create some tests.
#include <cstring>
#include <iostream>
#include <string>
using namespace std;
int main(int argc, char *argv[]) {
    int a = 108554107; // some random number for example sake
    char c[sizeof(int)]; // simulate std::string containing a binary int
    memcpy(c, &a, sizeof a); // copy the bytes; casting c to int* would break the aliasing rules
    // reassemble a into b, using indexed bytes from c
    // (this part assumes a 4-byte int and a little-endian host)
    int b = 0;
    b |= (c[3] & 0xff) << 24;
    b |= (c[2] & 0xff) << 16;
    b |= (c[1] & 0xff) << 8;
    b |= c[0] & 0xff;
    // read the bytes back the same way they were written
    int d = 0;
    memcpy(&d, c, sizeof d);
    // show that all three are equivalent
    cout << "a: " << a << " b: " << b
         << " c: " << d << endl;
    return 0;
}
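If you read from that file into a std::string through a C-string interface, any zero byte signals the end of the string, so you might end up with a string that is shorter than 5 bytes. std::string itself can hold zero bytes, provided the file is read in binary mode with an explicit length. Take a look here for how to do binary I/O with C++ streams.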
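As a rough illustration of the whole round trip using only the standard library, the sketch below reads a file in binary mode and copies an int out of and back into the string. The three function names are made up for this example, and the int is stored in host byte order.

```cpp
#include <cstring>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: read a whole file into a string, zero bytes included.
std::string read_file_binary(const char* path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: copy the first sizeof(int) bytes out of the string.
// Assumes data.size() >= sizeof(int).
int load_int(const std::string& data)
{
    int value = 0;
    std::memcpy(&value, data.data(), sizeof value);
    return value;
}

// Hypothetical helper: overwrite the first sizeof(int) bytes of the string.
// Assumes data.size() >= sizeof(int).
void store_int(std::string& data, int value)
{
    std::memcpy(&data[0], &value, sizeof value);
}
```

A typical sequence would be to call read_file_binary, then load_int, change the value, and call store_int. The bytes after the first sizeof(int) are left untouched.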