I've written the below code to convert and store the data from a string (array of chars) called strinto an array of 16-bit integers called arr16bit
The code works. However, i'd say that there's a better or cleaner way to implement this logic, using less variables etc.
I don't want to use index i to get the modulus % 2, because if using little endian, I have the same algorithm but i starts at the last index of the string and counts down instead of up. Any recommendations are appreciated.
// assuming str had already been initialised before this ..
int strLength = CalculateStringLength(str); // function implementation now shown
uint16_t* arr16bit = new uint16_t[ (strLength /2) + 1]; // The only C++ feature used here , so I didn't want to tag it
int indexWrite = 0;
int counter = 0;
for(int i = 0; i < strLength; ++i)
{
arr16bit[indexWrite] <<= 8;
arr16bit[indexWrite] |= str[i];
if ( (counter % 2) != 0)
{
indexWrite++;
}
counter++;
}
Yes, there are some redundant variables here.
You have both counter and i which do exactly the same thing and always hold the same value. And you have indexWrite which is always exactly half (per integer division) of both of them.
You're also shifting too far (16 bits rather than 8).
const std::size_t strLength = CalculateStringLength(str);
std::vector<uint16_t> arr16bit((strLength/2) + 1);
for (std::size_t i = 0; i < strLength; ++i)
{
arr16bit[i/2] <<= 8;
arr16bit[i/2] |= str[i];
}
Though I'd probably do it more like this to avoid N redundant |= operations:
const std::size_t strLength = CalculateStringLength(str);
std::vector<uint16_t> arr16bit((strLength/2) + 1);
for (std::size_t i = 0; i < strLength+1; i += 2)
{
arr16bit[i/2] = (str[i] << 8);
arr16bit[(i/2)+1] |= str[i+1];
}
You may also wish to consider a simple std::copy over the whole dang buffer, if your endianness is right for it.
Related
I have a base64 string containing bits, I have alredy decoded it with the code in here. But I'm unable to transform the resultant string in bits I could work with. Is there a way to convert the bytes contained in the code to a vector of bools containing the bits of the string?
I have tried converting the char with this code but it failed to conver to a proper char
void DecodedStringToBit(std::string const& decodedString, std::vector<bool> &bits) {
int it = 0;
for (int i = 0; i < decodedString.size(); ++i) {
unsigned char c = decodedString[i];
for (unsigned char j = 128; j > 0; j <<= 1) {
if (c&j) bits[++it] = true;
else bits[++it] = false;
}
}
}
Your inner for loop is botched: it's shifting j the wrong way. And honestly, if you want to work with 8-bit values, you should use the proper <stdint.h> types instead of unsigned char:
for (uint8_t j = 128; j; j >>= 1)
bits.push_back(c & j);
Also, remember to call bits.reserve(decodedString.size() * 8); so your program doesn't waste a bunch of time on resizing.
I'm assuming the bit order is MSB first. If you want LSB first, the loop becomes:
for (uint8_t j = 1; j; j <<= 1)
In OP's code, it is not clear if the vector bits is of sufficient size, for example, if it is resized by the caller (It should not be!). If not, then the vector does not have space allocated, and hence bits[++it] may not work; the appropriate thing might be to push_back. (Moreover, I think the code might need the post-increment of it, i.e. bits[it++] to start from bits[0].)
Furthermore, in OP's code, the purpose of unsigned char j = 128 and j <<= 1 is not clear. Wouldn't j be all zeros after the first iteration? If so, the inner loop would always run for only one iteration.
I would try something like this (not compiled):
void DecodedStringToBit(std::string const& decodedString,
std::vector<bool>& bits) {
for (auto charIndex = 0; charIndex != decodedString.size(); ++charIndex) {
const unsigned char c = decodedString[charIndex];
for (int bitIndex = 0; bitIndex != CHAR_BIT; ++bitIndex) {
// CHAR_BIT = bits in a char = 8
const bool bit = c & (1 << bitIndex); // bitwise-AND with mask
bits.push_back(bit);
}
}
}
I have a Constructor that creates a BitArray object, which asks a user for how many 'bits' they would like to use. It then uses unsigned chars to store the Bytes needed to hold the many. I then wish to create methods that allow for a user to 'Set' a certain bit, and also to Display the full set of Bytes at the end. However, my Set method does not seem to be changing the bit, that, or my Print function (The Overload) does not seem to actually be printing the actual bit(s). Can somebody point out the problem please?
Constructor
BitArray::BitArray(unsigned int n)
{
//Now let's find the minimum 'bits' needed
n++;
//If it does not "perfectly" fit
//------------------------------------ehhhh
if( (n % BYTE) != 0)
arraySize =(n / BYTE);
else
arraySize = (n / BYTE) + 1;
//Now dynamically create the array with full byte size
barray = new unsigned char[arraySize];
//Now intialize bytes to 0
for(int i = 0; i < arraySize; i++)
{
barray[i] = (int) 0;
}
}
Set Method:
void BitArray::Set(unsigned int index)
{
//Set the Indexed Bit to ON
barray[index/BYTE] |= 0x01 << (index%BYTE);
}
Print Overload:
ostream &operator<<(ostream& os, const BitArray& a)
{
for(int i = 0; i < (a.Length()*BYTE+1); i++)
{
int curNum = i/BYTE;
char charToPrint = a.barray[curNum];
os << (charToPrint & 0X01);
charToPrint >>= 1;
}
return os;
}
for(int i = 0; i < (a.Length()*BYTE+1); i++)
{
int curNum = i/BYTE;
char charToPrint = a.barray[curNum];
os << (charToPrint & 0X01);
charToPrint >>= 1;
}
Each time you run your loop, you are fetching a new value for charToPrint. That means that the operation charToPrint >>= 1; is useless, since that modification is not going to be carried out to the next time the loop runs. Therefore, you will always print only the first bit of each char of your array.
I want to convert an integer to binary string and then store each bit of the integer string to an element of a integer array of a given size. I am sure that the input integer's binary expression won't exceed the size of the array specified. How to do this in c++?
Pseudo code:
int value = ???? // assuming a 32 bit int
int i;
for (i = 0; i < 32; ++i) {
array[i] = (value >> i) & 1;
}
template<class output_iterator>
void convert_number_to_array_of_digits(const unsigned number,
output_iterator first, output_iterator last)
{
const unsigned number_bits = CHAR_BIT*sizeof(int);
//extract bits one at a time
for(unsigned i=0; i<number_bits && first!=last; ++i) {
const unsigned shift_amount = number_bits-i-1;
const unsigned this_bit = (number>>shift_amount)&1;
*first = this_bit;
++first;
}
//pad the rest with zeros
while(first != last) {
*first = 0;
++first;
}
}
int main() {
int number = 413523152;
int array[32];
convert_number_to_array_of_digits(number, std::begin(array), std::end(array));
for(int i=0; i<32; ++i)
std::cout << array[i] << ' ';
}
Proof of compilation here
You could use C++'s bitset library, as follows.
#include<iostream>
#include<bitset>
int main()
{
int N;//input number in base 10
cin>>N;
int O[32];//The output array
bitset<32> A=N;//A will hold the binary representation of N
for(int i=0,j=31;i<32;i++,j--)
{
//Assigning the bits one by one.
O[i]=A[j];
}
return 0;
}
A couple of points to note here:
First, 32 in the bitset declaration statement tells the compiler that you want 32 bits to represent your number, so even if your number takes fewer bits to represent, the bitset variable will have 32 bits, possibly with many leading zeroes.
Second, bitset is a really flexible way of handling binary, you can give a string as its input or a number, and again you can use the bitset as an array or as a string.It's a really handy library.
You can print out the bitset variable A as
cout<<A;
and see how it works.
You can do like this:
while (input != 0) {
if (input & 1)
result[index] = 1;
else
result[index] =0;
input >>= 1;// dividing by two
index++;
}
As Mat mentioned above, an int is already a bit-vector (using bitwise operations, you can check each bit). So, you can simply try something like this:
// Note: This depends on the endianess of your machine
int x = 0xdeadbeef; // Your integer?
int arr[sizeof(int)*CHAR_BIT];
for(int i = 0 ; i < sizeof(int)*CHAR_BIT ; ++i) {
arr[i] = (x & (0x01 << i)) ? 1 : 0; // Take the i-th bit
}
Decimal to Binary: Size independent
Two ways: both stores binary represent into a dynamic allocated array bits (in msh to lsh).
First Method:
#include<limits.h> // include for CHAR_BIT
int* binary(int dec){
int* bits = calloc(sizeof(int) * CHAR_BIT, sizeof(int));
if(bits == NULL) return NULL;
int i = 0;
// conversion
int left = sizeof(int) * CHAR_BIT - 1;
for(i = 0; left >= 0; left--, i++){
bits[i] = !!(dec & ( 1u << left ));
}
return bits;
}
Second Method:
#include<limits.h> // include for CHAR_BIT
int* binary(unsigned int num)
{
unsigned int mask = 1u << ((sizeof(int) * CHAR_BIT) - 1);
//mask = 1000 0000 0000 0000
int* bits = calloc(sizeof(int) * CHAR_BIT, sizeof(int));
if(bits == NULL) return NULL;
int i = 0;
//conversion
while(mask > 0){
if((num & mask) == 0 )
bits[i] = 0;
else
bits[i] = 1;
mask = mask >> 1 ; // Right Shift
i++;
}
return bits;
}
I know it doesn't add as many Zero's as you wish for positive numbers. But for negative binary numbers, it works pretty well.. I just wanted to post a solution for once :)
int BinToDec(int Value, int Padding = 8)
{
int Bin = 0;
for (int I = 1, Pos = 1; I < (Padding + 1); ++I, Pos *= 10)
{
Bin += ((Value >> I - 1) & 1) * Pos;
}
return Bin;
}
This is what I use, it also lets you give the number of bits that will be in the final vector, fills any unused bits with leading 0s.
std::vector<int> to_binary(int num_to_convert_to_binary, int num_bits_in_out_vec)
{
std::vector<int> r;
// make binary vec of minimum size backwards (LSB at .end() and MSB at .begin())
while (num_to_convert_to_binary > 0)
{
//cout << " top of loop" << endl;
if (num_to_convert_to_binary % 2 == 0)
r.push_back(0);
else
r.push_back(1);
num_to_convert_to_binary = num_to_convert_to_binary / 2;
}
while(r.size() < num_bits_in_out_vec)
r.push_back(0);
return r;
}
i know there is a lot of questions about it, but most of them uses fixed sized converters, like 4 bytes to int and etc.
I have an templated functions to convert bytes to numbers and etc, but have a problem :D
template <typename IntegerType>
static IntegerType bitsToInt(BYTE* bits, bool little_endian = true)
{
IntegerType result = 0;
if (little_endian)
for (int n = sizeof(IntegerType); n >= 0; n--)
result = (result << 8) + bits[n];
else
for (int n = 0; n < sizeof(IntegerType); n++)
result = (result << 8) + bits[n];
return result;
}
template <typename IntegerType>
static BYTE *intToBits(IntegerType value)
{
BYTE result[sizeof(IntegerType)] = { 0 };
for (int i = 0; i < sizeof(IntegerType); i++)
result = (value >> (i * 8));
return result;
}
static void TestConverters()
{
short int test = 12345;
BYTE *bytes = intToBits<short int>(test);
short int test2 = bitsToInt<short int>(bytes); //<--i getting here different number, then 12345, so something goes wrong at conversion
}
So, could anyone say what's wrong here?
static BYTE *intToBits(IntegerType value)
This is returning a pointer to locally allocated memory, which once the function returns goes out of scope and is no longer valid.
There are several bugs in the function intsToBits
1. Insted of
result = (value >> (i * 8));
there should be
result[i] = 0xFF & (value >> (i * 8));
More serious one you return the pointer to the memory on the stack, which is generally incorrect after you exit the function. You shoul allocate the memory with the new operator.
BYTE * result = new BYTE[sizeof(IntegerType)];
The you'll be needed to release the memory
This may not be your only problem, but intToBits is returning a pointer to a local variable, which is undefined behavior.
Try using new to allocate the byte array you return
I have an array of integers, lets assume they are of type int64_t. Now, I know that only every first n bits of every integer are meaningful (that is, I know that they are limited by some bounds).
What is the most efficient way to convert the array in the way that all unnecessary space is removed (i.e. I have the first integer at a[0], the second one at a[0] + n bits and so on) ?
I would like it to be general as much as possible, because n would vary from time to time, though I guess there might be smart optimizations for specific n like powers of 2 or sth.
Of course I know that I can just iterate value over value, I just want to ask you StackOverflowers if you can think of some more clever way.
Edit:
This question is not about compressing the array to take as least space as possible. I just need to "cut" n bits from every integer and given the array I know the exact n of bits I can safely cut.
Today I released: PackedArray: Packing Unsigned Integers Tightly (github project).
It implements a random access container where items are packed at the bit-level. In other words, it acts as if you were able to manipulate a e.g. uint9_t or uint17_t array:
PackedArray principle:
. compact storage of <= 32 bits items
. items are tightly packed into a buffer of uint32_t integers
PackedArray requirements:
. you must know in advance how many bits are needed to hold a single item
. you must know in advance how many items you want to store
. when packing, behavior is undefined if items have more than bitsPerItem bits
PackedArray general in memory representation:
|-------------------------------------------------- - - -
| b0 | b1 | b2 |
|-------------------------------------------------- - - -
| i0 | i1 | i2 | i3 | i4 | i5 | i6 | i7 | i8 | i9 |
|-------------------------------------------------- - - -
. items are tightly packed together
. several items end up inside the same buffer cell, e.g. i0, i1, i2
. some items span two buffer cells, e.g. i3, i6
I agree with keraba that you need to use something like Huffman coding or perhaps the Lempel-Ziv-Welch algorithm. The problem with bit-packing the way you are talking about is that you have two options:
Pick a constant n such that the largest integer can be represented.
Allow n to vary from value to value.
The first option is relatively easy to implement, but is really going to waste a lot of space unless all integers are rather small.
The second option has the major disadvantage that you have to convey changes in n somehow in the output bitstream. For instance, each value will have to have a length associated with it. This means you are storing two integers (albeit smaller integers) for every input value. There's a good chance you'll increase the file size with this method.
The advantage of Huffman or LZW is that they create codebooks in such a way that the length of the codes can be derived from the output bitstream without actually storing the lengths. These techniques allow you to get very close to the Shannon limit.
I decided to give your original idea (constant n, remove unused bits and pack) a try for fun and here is the naive implementation I came up with:
#include <sys/types.h>
#include <stdio.h>
int pack(int64_t* input, int nin, void* output, int n)
{
int64_t inmask = 0;
unsigned char* pout = (unsigned char*)output;
int obit = 0;
int nout = 0;
*pout = 0;
for(int i=0; i<nin; i++)
{
inmask = (int64_t)1 << (n-1);
for(int k=0; k<n; k++)
{
if(obit>7)
{
obit = 0;
pout++;
*pout = 0;
}
*pout |= (((input[i] & inmask) >> (n-k-1)) << (7-obit));
inmask >>= 1;
obit++;
nout++;
}
}
return nout;
}
int unpack(void* input, int nbitsin, int64_t* output, int n)
{
unsigned char* pin = (unsigned char*)input;
int64_t* pout = output;
int nbits = nbitsin;
unsigned char inmask = 0x80;
int inbit = 0;
int nout = 0;
while(nbits > 0)
{
*pout = 0;
for(int i=0; i<n; i++)
{
if(inbit > 7)
{
pin++;
inbit = 0;
}
*pout |= ((int64_t)((*pin & (inmask >> inbit)) >> (7-inbit))) << (n-i-1);
inbit++;
}
pout++;
nbits -= n;
nout++;
}
return nout;
}
int main()
{
int64_t input[] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20};
int64_t output[21];
unsigned char compressed[21*8];
int n = 5;
int nbits = pack(input, 21, compressed, n);
int nout = unpack(compressed, nbits, output, n);
for(int i=0; i<=20; i++)
printf("input: %lld output: %lld\n", input[i], output[i]);
}
This is very inefficient because is steps one bit at a time, but that was the easiest way to implement it without dealing with issues of endianess. I have not tested this either with a wide range of values, just the ones in the test. Also, there is no bounds checking and it is assumed the output buffers are long enough. So what I am saying is that this code is probably only good for educational purposes to get you started.
Most any compression algorithm will get close to the minimum entropy needed to encode the integers, for example, Huffman coding, but accessing it like an array will be non-trivial.
Starting from Jason B's implementation, I eventually wrote my own version which processes bit-blocks instead of single bits. One difference is that it is lsb: It starts from lowest output bits going to highest. This only makes it harder to read with a binary dump, like Linux xxd -b. As a detail, int* can be trivially changed to int64_t*, and it should even better be unsigned. I have already tested this version with a few million arrays and it seems solid, so I share will the rest:
int pack2(int *input, int nin, unsigned char* output, int n)
{
int obit = 0;
int ibit = 0;
int ibite = 0;
int nout = 0;
if(nin>0) output[0] = 0;
for(int i=0; i<nin; i++)
{
ibit = 0;
while(ibit < n) {
ibite = std::min(n, ibit + 8 - obit);
output[nout] |= (input[i] & (((1 << ibite)-1) ^ ((1 << ibit)-1))) >> ibit << obit;
obit += ibite - ibit;
nout += obit >> 3;
if(obit & 8) output[nout] = 0;
obit &= 7;
ibit = ibite;
}
}
return nout;
}
int unpack2(int *oinput, int nin, unsigned char* ioutput, int n)
{
int obit = 0;
int ibit = 0;
int ibite = 0;
int nout = 0;
for(int i=0; i<nin; i++)
{
oinput[i] = 0;
ibit = 0;
while(ibit < n) {
ibite = std::min(n, ibit + 8 - obit);
oinput[i] |= (ioutput[nout] & (((1 << (ibite-ibit+obit))-1) ^ ((1 << obit)-1))) >> obit << ibit;
obit += ibite - ibit;
nout += obit >> 3;
obit &= 7;
ibit = ibite;
}
}
return nout;
}
I know this might seem like the obvious thing to say as I'm sure there's actually a solution, but why not use a smaller type, like uint8_t (max 255)? or uint16_t (max 65535)?. I'm sure you could bit-manipulate on an int64_t using defined values and or operations and the like, but, aside from an academic exercise, why?
And on the note of academic exercises, Bit Twiddling Hacks is a good read.
If you have fixed sizes, e.g. you know your number is 38bit rather than 64, you can build structures using bit specifications. Amusing you also have smaller elements to fit in the remaining space.
struct example {
/* 64bit number cut into 3 different sized sections */
uint64_t big_num:38;
uint64_t small_num:16;
uint64_t itty_num:10;
/* 8 bit number cut in two */
uint8_t nibble_A:4;
uint8_t nibble_B:4;
};
This isn't big/little endian safe without some hoop-jumping, so can only be used within a program rather than in a exported data format. It's quite often used to store boolean values in single bits without defining shifts and masks.
I don't think you can avoid iterating across the elements.
AFAIK Huffman encoding requires the frequencies of the "symbols", which unless you know the statistics of the "process" generating the integers, you will have to compute (by iterating across every element).