How to convert from bitset back to int - c++

So I currently working on file compression so to compress the file I am converting my string of ints using huffman encoding
example:
1000011011010100011010100010001101111001001111010011000011011100001010010110011011
1001110011111000010110110101111111
to bit set like this :
int i = 0;
while (i < str.length())
{
bitset<8>set(str.substr(i, i + 7));
outputFile << char(set.to_ulong());
i = i + 8;
}
when I am retrieving the contents of that file I cant figure out how to convert it back to a string of ints. Once i get back the string of ints I can retrieve the original contents I encoded its just getting the string back thats the problem

If I correctly understand your question here is what you can do to perform a reversed operation:
constexpr int bits_num = sizeof(unsigned char) * CHAR_BIT; //somewhere upper in the file
//...
std::string outputStr;
unsigned char c;
while (inputFile.get(c)) //the simplest function to read "character by character"
{
std::bitset<bits_num> bset(c); //initialize bitset with a char value
outputStr += bset.to_string(); //append to output string
}
Of course I assume that inputFile stream you will declare and set up for yourself.
sizeof(unsigned char) * CHAR_BIT is an elegant way of expressing a number of bits in a variables's type (like here unsigned char). To use the constant CHAR_BIT (which is basically 8 on all modern architectures) add #include <climits> header.
You have also an error in your code:
bitset<8>set(str.substr(i, i + 7));
You're increasing the cut out substring each iteration (i + 7), change it simply to 8 (or better and less error-prone to the suggested constant):
constexpr int bits_num = sizeof(unsigned char) * CHAR_BIT;
//...
std::bitset<bits_num> set( str.substr(i, bits_num) );
outputFile << static_cast<unsigned char>( set.to_ulong() );
i += bits_num;

Related

Fill fixed size char array with integers in C++

I want to store integers in char array ex: 0 up to 1449, I checked other posts and I tried memset, sprinf etc. but either I get gibberish characters or unreadable symbols when I print inside of char array. Can anyone help please?
I checked the duplicate link however I am not trying to print int to char, I want to store int in char array. But I tried buf[i] = static_cast<char>(i); inside for loop but it didn't work. Casting didn't work.
The last one I tried is like this:
char buf[1450];
for (int i = 0; i < 1449; i++)
{
memset(buf, ' '+ i, 1450);
cout << buf[i];
}
The output is:
I'm not sure what you trying to do! You should say your objective.
A char (usually 8 bit) in c++ doesn't hold an int (usually 32 bit), If you want to store an int you should use an int array:
int buf[1500];
The memset(buf, ' '+ i, 1450); will actually write the sum of ' ' ascii number plus i always at beginning of the buffer (buffer address is never incremented).
something like this, maybe is what you want:
int buf[1500] = 0;
for (int i = 0; i < 1449; i++)
{
buf[i] = i;
cout << buf[i] << ' ';
}
consider using c++11 containers like std::vector to hold the int or chars, would much safer to use.
You are going to have to explain better what it is you want, because "store integers in char array" is exactly what this code does:
char buf[1450];
for (int i = 0; i < 1450; i++)
{
buf[i] = static_cast<char>(i);
std::cout << buf[i];
}
Yes, the output is similar to what your picture shows, but that is also the correct output.
When you use a debugger to look at buf after the loop, then it does contain: 0, 1, 2, 3, ..., 126, 127, -128, -127, ..., 0, 1, 2, 3, ... and so on, which is the expected contents given that we are trying to put the numbers 0-1449 into an integer type that (in this case*) can contain the range [-128;127].
If this is not the behavior you are looking for (it sounds like it isn't), then you need to describe your requirements in more detail or we won't be able to help you.
(*) Char must be able to contain a character representative. On many/most systems it is 8 bits, but the size is system dependent and it may also be larger.
New answer.
Thank you for the clarification, I believe that what you need is something like this:
int32_t before = 1093821061; // Int you want to transmit
uint8_t buf[4];
buf[0] = static_cast<uint8_t>((before >> 0) & 0xff);
buf[1] = static_cast<uint8_t>((before >> 8) & 0xff);
buf[2] = static_cast<uint8_t>((before >> 16) & 0xff);
buf[3] = static_cast<uint8_t>((before >> 24) & 0xff);
// Add buf to your UDP packet and send it
// Stuff...
// After receiving the packet on the other end:
int32_t after = 0;
after += buf[0] << 0;
after += buf[1] << 8;
after += buf[2] << 16;
after += buf[3] << 24;
std::cout << before << ", " << after << std::endl;
Your problem (as I see it), is that you want to store 32bit numbers in the 8bit buffers that you need for UDP packets. The way to do this is to pick the larger number apart, and convert it into individual bytes, transmit those bytes, and then assemble the big number again from the bytes.
The above code should allow you to do this. Note that I have changed types to int32_t and uint8_t to ensure that I know the exact size of my types - depending on the library you use, you may have to use plain int and char types, just be aware that then the exact sizes of your types are not guaranteed (most likely it will still be 32 and 8 bits, but they can change in size if you change compiler or compile for a different target system). If you want you can add some sizeof checks to ensure that your types conform to what you expect.

Convert string of hex to char array c++ [duplicate]

This question already has answers here:
Converting a hex string to a byte array
(22 answers)
Closed 3 years ago.
I have a string that represents an hexadecimal number:
std::string hex = "3371";
I want to convert it to a char array:
char hex[2] {0x33, 0x71};
Is there any convenient way to do it? I can use c++11 features, if it may help.
Motivation:
I need to save an integer (4 bytes), using 2 bytes char array.
The way how I thought it can be done is to convert it to string using std::hex, and then convert the string to the char array, but this is the point where I cannot continue..
If there is another simple way - I would like to hear :)
Important: I can assume that the hex number is less than 0xFFFF, and a positive number.
Just use std::stoi():
std::string hex = "3371";
uint16_t num = std::stoi( hex, nullptr, 16 );
uint8_t array[sizeof(num)];
memcpy( array, &num, sizeof( num ) );
note order of bytes will depend of endianness of your platform. If you need network order (as shown on your example) use htons() function:
uint16_t num = htons( std::stoi( hex, nullptr, 16 ) );
I found out a better way to convert a number to char array, as I needed. This solution works only for unsigned types!
template <typename T> bool ToByteArray(T num, unsigned char* ret, size_t size) {
if (ret == nullptr) {
std::cout << "Error in: " << __func__ << ": nullptr" << std::endl;
return false;
}
// drop the right-most bytes and convert to new right most byte.
for (int i = 0; i < size; i++) {
ret[i] = (int)((num >> (24 - 8*i)) & 0xFF);
}
return true;
}
This is more elegant way to do it.
If you want to convert it back - you might use:
long FormatBlock(const unsigned char* arr, size_t size) {
long num = 0;
for (int i = 0; i < size; i++) {
num += ((long)arr[i] << (24 - i*8));
}
return num;
}

Integer into char array

I need to convert integer value into char array on bit layer. Let's say int has 4 bytes and I need to split it into 4 chunks of length 1 byte as char array.
Example:
int a = 22445;
// this is in binary 00000000 00000000 1010111 10101101
...
//and the result I expect
char b[4];
b[0] = 0; //first chunk
b[1] = 0; //second chunk
b[2] = 87; //third chunk - in binary 1010111
b[3] = 173; //fourth chunk - 10101101
I need this conversion make really fast, if possible without any loops (some tricks with bit operations perhaps). The goal is thousands of such conversions in one second.
I'm not sure if I recommend this, but you can #include <stddef.h> and <sys/types.h> and write:
*(u32_t *)b = htonl((u32_t)a);
(The htonl is to ensure that the integer is in big-endian order before you store it.)
int a = 22445;
char *b = (char *)&a;
char b2 = *(b+2); // = 87
char b3 = *(b+3); // = 173
Depending on how you want negative numbers represented, you can simply convert to unsigned and then use masks and shifts:
unsigned char b[4];
unsigned ua = a;
b[0] = (ua >> 24) & 0xff;
b[1] = (ua >> 16) & 0xff;
b[2] = (ua >> 8) & 0xff
b[3] = ua & 0xff;
(Due to the C rules for converting negative numbers to unsigned, this will produce the twos complement representation for negative numbers, which is almost certainly what you want).
To access the binary representation of any type, you can cast a pointer to a char-pointer:
T x; // anything at all!
// In C++
unsigned char const * const p = reinterpret_cast<unsigned char const *>(&x);
/* In C */
unsigned char const * const p = (unsigned char const *)(&x);
// Example usage:
for (std::size_t i = 0; i != sizeof(T); ++i)
std::printf("Byte %u is 0x%02X.\n", p[i]);
That is, you can treat p as the pointer to the first element of an array unsigned char[sizeof(T)]. (In your case, T = int.)
I used unsigned char here so that you don't get any sign extension problems when printing the binary value (e.g. through printf in my example). If you want to write the data to a file, you'd use char instead.
You have already accepted an answer, but I will still give mine, which might suit you better (or the same...). This is what I tested with:
int a[3] = {22445, 13, 1208132};
for (int i = 0; i < 3; i++)
{
unsigned char * c = (unsigned char *)&a[i];
cout << (unsigned int)c[0] << endl;
cout << (unsigned int)c[1] << endl;
cout << (unsigned int)c[2] << endl;
cout << (unsigned int)c[3] << endl;
cout << "---" << endl;
}
...and it works for me. Now I know you requested a char array, but this is equivalent. You also requested that c[0] == 0, c[1] == 0, c[2] == 87, c[3] == 173 for the first case, here the order is reversed.
Basically, you use the SAME value, you only access it differently.
Why haven't I used htonl(), you might ask?
Well since performance is an issue, I think you're better off not using it because it seems like a waste of (precious?) cycles to call a function which ensures that bytes will be in some order, when they could have been in that order already on some systems, and when you could have modified your code to use a different order if that was not the case.
So instead, you could have checked the order before, and then used different loops (more code, but improved performance) based on what the result of the test was.
Also, if you don't know if your system uses a 2 or 4 byte int, you could check that before, and again use different loops based on the result.
Point is: you will have more code, but you will not waste cycles in a critical area, which is inside the loop.
If you still have performance issues, you could unroll the loop (duplicate code inside the loop, and reduce loop counts) as this will also save you a couple of cycles.
Note that using c[0], c[1] etc.. is equivalent to *(c), *(c+1) as far as C++ is concerned.
typedef union{
byte intAsBytes[4];
int int32;
}U_INTtoBYTE;

How does one store a vector<bool> or a bitset into a file, but bit-wise?

How to write bitset data to a file?
The first answer doesn't answer the question correctly, since it takes 8 times more space than it should.
How would you do it ? I really need it to save a lot of true/false values.
Simplest approach : take consecutive 8 boolean values, represent them as a single byte, write that byte to your file. That would save lot of space.
In the beginning of file, you can write the number of boolean values you want to write to the file; that number will help while reading the bytes from file, and converting them back into boolean values!
If you want the bitset class that best supports converting to binary, and your bitset is more than the size of unsigned long, then the best option to use is boost::dynamic_bitset. (I presume it is more than 32 and even 64 bits if you are that concerned about saving space).
From dynamic_bitset you can use to_block_range to write the bits into the underlying integral type. You can construct the dynamic_bitset back from the blocks by using from_block_range or its constructor from BlockInputIterator or by making append() calls.
Now you have the bytes in their native format (Block) you still have the issue of writing it to a stream and reading it back.
You will need to store a bit of "header" information first: the number of blocks you have and potentially the endianness. Or you might use a macro to convert to a standard endianness (eg ntohl but you will ideally use a macro that is no-op for your most common platform so if that is little-endian you probably want to store that way and convert only for big-endian systems).
(Note: I am assuming that boost::dynamic_bitset standardly converts integral types the same way regardless of underlying endianness. Their documentation does not say).
To write numbers binary to a stream use os.write( &data[0], sizeof(Block) * nBlocks ) and to read use is.read( &data[0], sizeof(Block) * nBlocks ) where data is assumed to be vector<Block> and before read you must do data.resize(nBlocks) (not reserve()). (You can also do weird stuff with istream_iterator or istreambuf_iterator but resize() is probably better).
Here is a try with two functions that will use a minimal number of bytes, without compressing the bitset.
template<int I>
void bitset_dump(const std::bitset<I> &in, std::ostream &out)
{
// export a bitset consisting of I bits to an output stream.
// Eight bits are stored to a single stream byte.
unsigned int i = 0; // the current bit index
unsigned char c = 0; // the current byte
short bits = 0; // to process next byte
while(i < in.size())
{
c = c << 1; //
if(in.at(i)) ++c; // adding 1 if bit is true
++bits;
if(bits == 8)
{
out.put((char)c);
c = 0;
bits = 0;
}
++i;
}
// dump remaining
if(bits != 0) {
// pad the byte so that first bits are in the most significant positions.
while(bits != 8)
{
c = c << 1;
++bits;
}
out.put((char)c);
}
return;
}
template<int I>
void bitset_restore(std::istream &in, std::bitset<I> &out)
{
// read bytes from the input stream to a bitset of size I.
/* for debug */ //for(int n = 0; n < I; ++n) out.at(n) = false;
unsigned int i = 0; // current bit index
unsigned char mask = 0x80; // current byte mask
unsigned char c = 0; // current byte in stream
while(in.good() && (i < I))
{
if((i%8) == 0) // retrieve next character
{ c = in.get();
mask = 0x80;
}
else mask = mask >> 1; // shift mask
out.at(i) = (c & mask);
++i;
}
}
Note that probably using a reinterpret_cast of the portion of memory used by the bitset as an array of chars could also work, but it is maybe not portable accross systems because you don't know what the representation of the bitset is (endianness?)
How about this
#include <sys/time.h>
#include <unistd.h>
#include <algorithm>
#include <fstream>
#include <vector>
...
{
std::srand(std::time(nullptr));
std::vector<bool> vct1, vct2;
vct1.resize(20000000, false);
vct2.resize(20000000, false);
// insert some data
for (size_t i = 0; i < 1000000; i++) {
vct1[std::rand() % 20000000] = true;
}
// serialize to file
std::ofstream ofs("bitset", std::ios::out | std::ios::trunc);
for (uint32_t i = 0; i < vct1.size(); i += std::_S_word_bit) {
auto vct1_iter = vct1.begin();
vct1_iter += i;
uint32_t block_num = i / std::_S_word_bit;
std::_Bit_type block_val = *(vct1_iter._M_p);
if (block_val != 0) {
// only write not-zero block
ofs.write(reinterpret_cast<char*>(&block_num), sizeof(uint32_t));
ofs.write(reinterpret_cast<char*>(&block_val), sizeof(std::_Bit_type));
}
}
ofs.close();
// deserialize
std::ifstream ifs("bitset", std::ios::in);
ifs.seekg(0, std::ios::end);
uint64_t file_size = ifs.tellg();
ifs.seekg(0);
uint64_t load_size = 0;
while (load_size < file_size) {
uint32_t block_num;
ifs.read(reinterpret_cast<char*>(&block_num), sizeof(uint32_t));
std::_Bit_type block_value;
ifs.read(reinterpret_cast<char*>(&block_value), sizeof(std::_Bit_type));
load_size += sizeof(uint32_t) + sizeof(std::_Bit_type);
auto offset = block_num * std::_S_word_bit;
if (offset >= vct2.size()) {
std::cout << "error! already touch end" << std::endl;
break;
}
auto iter = vct2.begin();
iter += offset;
*(iter._M_p) = block_value;
}
ifs.close();
// check result
int count_true1 = std::count(vct1.begin(), vct1.end(), true);
int count_true2 = std::count(vct2.begin(), vct2.end(), true);
std::cout << "count_true1: " << count_true1 << " count_true2: " << count_true2 << std::endl;
}
One way might be:
std::vector<bool> data = /* obtain bits somehow */
// Reserve an appropriate number of byte-sized buckets.
std::vector<char> bytes((int)std::ceil((float)data.size() / CHAR_BITS));
for(int byteIndex = 0; byteIndex < bytes.size(); ++byteIndex) {
for(int bitIndex = 0; bitIndex < CHAR_BITS; ++bitIndex) {
int bit = data[byteIndex * CHAR_BITS + bitIndex];
bytes[byteIndex] |= bit << bitIndex;
}
}
Note that this assumes you don't care what the bit layout ends up being in memory, because it makes no adjustments for anything. But as long as you also serialize out the number of bits that were actually stored (to cover cases where you have a bit count that isn't a multiple of CHAR_BITS) you can deserialize exactly the same bitset or vector as you had originally like this.
(I'm not happy with that bucket size computation but it's 1am and I'm having trouble thinking of something more elegant).
#include "stdio"
#include "bitset"
...
FILE* pFile;
pFile = fopen("output.dat", "wb");
...
const unsigned int size = 1024;
bitset<size> bitbuffer;
...
fwrite (&bitbuffer, 1, size/8, pFile);
fclose(pFile);
Two options:
Spend the extra pounds (or pence, more likely) for a bigger disk.
Write a routine to extract 8 bits from the bitset at a time, compose them into bytes, and write them to your output stream.

How to convert an int to a binary string representation in C++

I have an int that I want to store as a binary string representation. How can this be done?
Try this:
#include <bitset>
#include <iostream>
int main()
{
std::bitset<32> x(23456);
std::cout << x << "\n";
// If you don't want a variable just create a temporary.
std::cout << std::bitset<32>(23456) << "\n";
}
I have an int that I want to first convert to a binary number.
What exactly does that mean? There is no type "binary number". Well, an int is already represented in binary form internally unless you're using a very strange computer, but that's an implementation detail -- conceptually, it is just an integral number.
Each time you print a number to the screen, it must be converted to a string of characters. It just so happens that most I/O systems chose a decimal representation for this process so that humans have an easier time. But there is nothing inherently decimal about int.
Anyway, to generate a base b representation of an integral number x, simply follow this algorithm:
initialize s with the empty string
m = x % b
x = x / b
Convert m into a digit, d.
Append d on s.
If x is not zero, goto step 2.
Reverse s
Step 4 is easy if b <= 10 and your computer uses a character encoding where the digits 0-9 are contiguous, because then it's simply d = '0' + m. Otherwise, you need a lookup table.
Steps 5 and 7 can be simplified to append d on the left of s if you know ahead of time how much space you will need and start from the right end in the string.
In the case of b == 2 (e.g. binary representation), step 2 can be simplified to m = x & 1, and step 3 can be simplified to x = x >> 1.
Solution with reverse:
#include <string>
#include <algorithm>
std::string binary(unsigned x)
{
std::string s;
do
{
s.push_back('0' + (x & 1));
} while (x >>= 1);
std::reverse(s.begin(), s.end());
return s;
}
Solution without reverse:
#include <string>
std::string binary(unsigned x)
{
// Warning: this breaks for numbers with more than 64 bits
char buffer[64];
char* p = buffer + 64;
do
{
*--p = '0' + (x & 1);
} while (x >>= 1);
return std::string(p, buffer + 64);
}
AND the number with 100000..., then 010000..., 0010000..., etc. Each time, if the result is 0, put a '0' in a char array, otherwise put a '1'.
int numberOfBits = sizeof(int) * 8;
char binary[numberOfBits + 1];
int decimal = 29;
for(int i = 0; i < numberOfBits; ++i) {
if ((decimal & (0x80000000 >> i)) == 0) {
binary[i] = '0';
} else {
binary[i] = '1';
}
}
binary[numberOfBits] = '\0';
string binaryString(binary);
http://www.phanderson.com/printer/bin_disp.html is a good example.
The basic principle of a simple approach:
Loop until the # is 0
& (bitwise and) the # with 1. Print the result (1 or 0) to the end of string buffer.
Shift the # by 1 bit using >>=.
Repeat loop
Print reversed string buffer
To avoid reversing the string or needing to limit yourself to #s fitting the buffer string length, you can:
Compute ceiling(log2(N)) - say L
Compute mask = 2^L
Loop until mask == 0:
& (bitwise and) the mask with the #. Print the result (1 or 0).
number &= (mask-1)
mask >>= 1 (divide by 2)
I assume this is related to your other question on extensible hashing.
First define some mnemonics for your bits:
const int FIRST_BIT = 0x1;
const int SECOND_BIT = 0x2;
const int THIRD_BIT = 0x4;
Then you have your number you want to convert to a bit string:
int x = someValue;
You can check if a bit is set by using the logical & operator.
if(x & FIRST_BIT)
{
// The first bit is set.
}
And you can keep an std::string and you add 1 to that string if a bit is set, and you add 0 if the bit is not set. Depending on what order you want the string in you can start with the last bit and move to the first or just first to last.
You can refactor this into a loop and using it for arbitrarily sized numbers by calculating the mnemonic bits above using current_bit_value<<=1 after each iteration.
There isn't a direct function, you can just walk along the bits of the int (hint see >> ) and insert a '1' or '0' in the string.
Sounds like a standard interview / homework type question
Use sprintf function to store the formatted output in the string variable, instead of printf for directly printing. Note, however, that these functions only work with C strings, and not C++ strings.
There's a small header only library you can use for this here.
Example:
std::cout << ConvertInteger<Uint32>::ToBinaryString(21);
// Displays "10101"
auto x = ConvertInteger<Int8>::ToBinaryString(21, true);
std::cout << x << "\n"; // displays "00010101"
auto x = ConvertInteger<Uint8>::ToBinaryString(21, true, "0b");
std::cout << x << "\n"; // displays "0b00010101"
Solution without reverse, no additional copy, and with 0-padding:
#include <iostream>
#include <string>
template <short WIDTH>
std::string binary( unsigned x )
{
std::string buffer( WIDTH, '0' );
char *p = &buffer[ WIDTH ];
do {
--p;
if (x & 1) *p = '1';
}
while (x >>= 1);
return buffer;
}
int main()
{
std::cout << "'" << binary<32>(0xf0f0f0f0) << "'" << std::endl;
return 0;
}
This is my best implementation of converting integers(any type) to a std::string. You can remove the template if you are only going to use it for a single integer type. To the best of my knowledge , I think there is a good balance between safety of C++ and cryptic nature of C. Make sure to include the needed headers.
template<typename T>
std::string bstring(T n){
std::string s;
for(int m = sizeof(n) * 8;m--;){
s.push_back('0'+((n >> m) & 1));
}
return s;
}
Use it like so,
std::cout << bstring<size_t>(371) << '\n';
This is the output in my computer(it differs on every computer),
0000000000000000000000000000000000000000000000000000000101110011
Note that the entire binary string is copied and thus the padded zeros which helps to represent the bit size. So the length of the string is the size of size_t in bits.
Lets try a signed integer(negative number),
std::cout << bstring<signed int>(-1) << '\n';
This is the output in my computer(as stated , it differs on every computer),
11111111111111111111111111111111
Note that now the string is smaller , this proves that signed int consumes less space than size_t. As you can see my computer uses the 2's complement method to represent signed integers (negative numbers). You can now see why unsigned short(-1) > signed int(1)
Here is a version made just for signed integers to make this function without templates , i.e use this if you only intend to convert signed integers to string.
std::string bstring(int n){
std::string s;
for(int m = sizeof(n) * 8;m--;){
s.push_back('0'+((n >> m) & 1));
}
return s;
}