I've written the code below to convert and store the data from a string (an array of chars) called str into an array of 16-bit integers called arr16bit.
The code works. However, I'd say there's a better or cleaner way to implement this logic, using fewer variables etc.
I don't want to use the index i to get the modulus % 2, because in the little-endian case I have the same algorithm but i starts at the last index of the string and counts down instead of up. Any recommendations are appreciated.
// assuming str has already been initialised before this ...
int strLength = CalculateStringLength(str); // function implementation not shown
uint16_t* arr16bit = new uint16_t[(strLength / 2) + 1]; // the only C++ feature used here, so I didn't want to tag it
int indexWrite = 0;
int counter = 0;

for (int i = 0; i < strLength; ++i)
{
    arr16bit[indexWrite] <<= 8;
    arr16bit[indexWrite] |= str[i];

    if ((counter % 2) != 0)
    {
        indexWrite++;
    }
    counter++;
}
Yes, there are some redundant variables here.
You have both counter and i, which do exactly the same thing and always hold the same value, and you have indexWrite, which is always exactly half of both of them (by integer division).
Also make sure you only shift by 8 bits each time, not 16; shifting a uint16_t left by 16 would throw away the byte you had just stored.
const std::size_t strLength = CalculateStringLength(str);
std::vector<uint16_t> arr16bit((strLength / 2) + 1);

for (std::size_t i = 0; i < strLength; ++i)
{
    arr16bit[i/2] <<= 8;
    arr16bit[i/2] |= str[i];
}
(Note that std::vector value-initialises its elements to zero, so the shift-and-OR pattern is safe here, unlike with the uninitialised new[] buffer in your version.) Though I'd probably do it more like this, to avoid N redundant |= operations:
const std::size_t strLength = CalculateStringLength(str);
std::vector<uint16_t> arr16bit((strLength / 2) + 1);

for (std::size_t i = 0; i + 1 < strLength; i += 2)
{
    arr16bit[i/2] = (str[i] << 8);
    arr16bit[i/2] |= str[i+1];
}
if (strLength % 2 != 0)   // odd length: the last char only fills the high byte
    arr16bit[strLength/2] = (str[strLength-1] << 8);
You may also wish to consider a simple std::copy over the whole dang buffer, if your endianness is right for it.
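For illustration, a minimal sketch of that bulk-copy idea (packByCopy is a made-up helper name); it is only correct when the byte order already in memory matches the layout you want in the 16-bit values:

#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: reinterpret pairs of chars as uint16_t by copying raw bytes.
// Only valid when the in-memory byte order already matches the layout you need.
std::vector<uint16_t> packByCopy(const char* str, std::size_t strLength)
{
    std::vector<uint16_t> arr16bit((strLength / 2) + 1);
    std::memcpy(arr16bit.data(), str, strLength);
    return arr16bit;
}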
I'm doing an assignment on DES encryption and I can't seem to convert a string, let alone a single char, into a bitset. Can anyone show me how to convert a single char into a bitset in C++?
The following:
char c = 'A';
std::bitset<8> b(c); // implicit cast to unsigned long long
should work.
See http://ideone.com/PtSFvz
Converting an arbitrary-length string to a bitset is harder, if at all possible. The size of a bitset must be known at compile-time, so there's not really a way of converting a string to one.
However, if you know the length of your string at compile-time (or can bound it at compile time), you can do something like:
const size_t N = 50; // bound on string length
bitset<N * 8> b;

for (size_t i = 0; i < str.length(); ++i) {
    char c = str[i];
    for (int j = 7; j >= 0 && c; --j) {
        if (c & 0x1) {
            b.set(8 * i + j);
        }
        c >>= 1;
    }
}
That may be a bit inefficient but I don't know if there's a better work-around.
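Another option, if you don't mind that it packs the bits in the opposite order to the loop above, is to build a '0'/'1' text string first and use bitset's string constructor. A sketch, under the same compile-time bound on the length:

#include <bitset>
#include <string>

// given std::string str, as above
const size_t N = 50;                 // bound on string length, as above
std::string bits;
for (char ch : str)
    bits += std::bitset<8>(static_cast<unsigned char>(ch)).to_string(); // 8 '0'/'1' chars per byte
std::bitset<N * 8> b(bits);          // the last character of `bits` becomes bit 0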
I have a constructor that creates a BitArray object; it asks the user how many 'bits' they would like to use, then uses unsigned chars to store the number of bytes needed to hold that many. I then wish to create methods that allow a user to Set a certain bit, and to Display the full set of bytes at the end. However, either my Set method does not seem to be changing the bit, or my Print function (the operator<< overload) does not seem to actually print the bit(s). Can somebody point out the problem please?
Constructor
BitArray::BitArray(unsigned int n)
{
    // Now let's find the minimum 'bits' needed
    n++;
    // If it does not "perfectly" fit
    //------------------------------------ehhhh
    if( (n % BYTE) != 0)
        arraySize = (n / BYTE);
    else
        arraySize = (n / BYTE) + 1;

    // Now dynamically create the array with full byte size
    barray = new unsigned char[arraySize];

    // Now initialize bytes to 0
    for(int i = 0; i < arraySize; i++)
    {
        barray[i] = (int) 0;
    }
}
Set Method:
void BitArray::Set(unsigned int index)
{
    // Set the indexed bit to ON
    barray[index/BYTE] |= 0x01 << (index % BYTE);
}
Print Overload:
ostream &operator<<(ostream& os, const BitArray& a)
{
    for(int i = 0; i < (a.Length()*BYTE+1); i++)
    {
        int curNum = i/BYTE;
        char charToPrint = a.barray[curNum];
        os << (charToPrint & 0X01);
        charToPrint >>= 1;
    }
    return os;
}
for(int i = 0; i < (a.Length()*BYTE+1); i++)
{
    int curNum = i/BYTE;
    char charToPrint = a.barray[curNum];
    os << (charToPrint & 0X01);
    charToPrint >>= 1;
}
Each time your loop runs, you fetch a fresh value for charToPrint. That means the operation charToPrint >>= 1; is useless, since that modification is not carried over to the next iteration. Therefore, you will always print only the first bit of each char in your array.
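As an illustration (one possible fix, assuming BYTE is 8 and Length() returns the number of bytes), a sketch that extracts every bit of a fetched byte before moving on to the next one, keeping the LSB-first order the original code seems to intend:

for(int i = 0; i < a.Length(); i++)        // one byte at a time
{
    char charToPrint = a.barray[i];
    for(int bit = 0; bit < BYTE; bit++)    // print each bit of this byte
    {
        os << (charToPrint & 0x01);
        charToPrint >>= 1;
    }
}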
The code shown below is in Visual C++ (it uses C++/CLI managed arrays).
array<Byte>^ b = gcnew array<Byte>(filesize);
fs->Read(b, 0, b->Length);

unsigned char *pb;
pb = (byte*)malloc(b->Length); // pb is unmanaged here.

for(int i = 0; i < b->Length; i++)
{
    *(pb+i) = InverseByte(b+i);
}
I want to call the function below to reverse each byte. How can I do this?
I want to take the inverse of each byte of the managed array b and put it in the unmanaged array pb.
unsigned char InverseByte(unsigned char* PbByte)
{
    //something;
}
Fix the declaration of InverseByte:
unsigned char InverseByte(unsigned char value)
So you can then use it like this:
for (int i = 0; i < b->Length; i++)
{
    pb[i] = InverseByte(b[i]);
}
You mean bitwise negation I presume.
unsigned char InverseByte(unsigned char c)
{
    return ~c;
}
Note that I changed the parameter to pass by value rather than passing a pointer to the value.
I also fail to see why you are using pointer arithmetic instead of indexing. That just makes your code harder to read. The loop should be written like this:
for(int i = 0; i < b->Length; i++)
{
    pb[i] = InverseByte(b[i]);
}
unsigned char InverseByte(unsigned char c)
{
    // reverse the bit order: bit 7 <-> bit 0, bit 6 <-> bit 1, and so on
    return (c>>7) | ((c&64)>>5) | ((c&32)>>3) | ((c&16)>>1) | ((c&8)<<1) | ((c&4)<<3) | ((c&2)<<5) | ((c&1)<<7);
}
From the comments it's clear you are reversing the bits of each byte (i.e. reordering them front-to-back) rather than inverting (i.e. subtracting a value from its maximum) or negating (i.e. flipping 1's and 0's), so you should name the function accordingly.
Here's a neat little snippet from Seander's bithacks:
unsigned int v;     // input bits to be reversed
unsigned int r = v; // r will be reversed bits of v; first get LSB of v
int s = sizeof(v) * CHAR_BIT - 1; // extra shift needed at end

for (v >>= 1; v; v >>= 1)
{
    r <<= 1;
    r |= v & 1;
    s--;
}
r <<= s; // shift when v's highest bits are zero
return r;
If you combine Hans Passant's answer with this, you should have all the pieces to put together your function.
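For example, a sketch that narrows the snippet above to a single byte and uses Hans Passant's pass-by-value signature (ReverseByte is just an illustrative name):

#include <limits.h>

// Reverse the bit order of one byte using the loop from the snippet above.
unsigned char ReverseByte(unsigned char v)
{
    unsigned char r = v;          // r will hold the reversed bits; starts with the LSB of v
    int s = CHAR_BIT - 1;         // extra shift needed at the end
    for (v >>= 1; v; v >>= 1)
    {
        r <<= 1;
        r |= v & 1;
        s--;
    }
    return r << s;                // shift when v's highest bits are zero
}

// Then, as in the other answer:
// for (int i = 0; i < b->Length; i++)
//     pb[i] = ReverseByte(b[i]);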
I have an array of integers; let's assume they are of type int64_t. Now, I know that only the first n bits of every integer are meaningful (that is, I know they are limited by some bounds).
What is the most efficient way to convert the array so that all unnecessary space is removed (i.e. the first integer is at a[0], the second one starts at a[0] + n bits, and so on)?
I would like it to be as general as possible, because n will vary from time to time, though I guess there might be smart optimisations for specific n, like powers of 2.
Of course I know I can just iterate value by value; I just want to ask you StackOverflowers whether you can think of a cleverer way.
Edit:
This question is not about compressing the array to take as little space as possible. I just need to "cut" every integer down to n bits, and for a given array I know the exact n that is safe to use.
Today I released: PackedArray: Packing Unsigned Integers Tightly (github project).
It implements a random access container where items are packed at the bit level. In other words, it acts as if you were able to manipulate an array of, e.g., uint9_t or uint17_t:
PackedArray principle:
. compact storage of <= 32 bits items
. items are tightly packed into a buffer of uint32_t integers
PackedArray requirements:
. you must know in advance how many bits are needed to hold a single item
. you must know in advance how many items you want to store
. when packing, behavior is undefined if items have more than bitsPerItem bits
PackedArray general in memory representation:
|-------------------------------------------------- - - -
| b0 | b1 | b2 |
|-------------------------------------------------- - - -
| i0 | i1 | i2 | i3 | i4 | i5 | i6 | i7 | i8 | i9 |
|-------------------------------------------------- - - -
. items are tightly packed together
. several items end up inside the same buffer cell, e.g. i0, i1, i2
. some items span two buffer cells, e.g. i3, i6
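To illustrate the idea (this is not PackedArray's actual API, just a sketch of the addressing maths behind such a bit-packed container):

#include <stdint.h>

// Read item `index` from a buffer of uint32_t cells, `bitsPerItem` bits per item (<= 32).
// An item may straddle two cells, so read a 64-bit window and shift/mask.
// Assumes the buffer has one spare trailing cell so buffer[cell + 1] is always readable.
uint32_t packed_get(const uint32_t* buffer, uint32_t bitsPerItem, uint32_t index)
{
    uint64_t bitOffset = (uint64_t)index * bitsPerItem;   // first bit of the item
    uint32_t cell      = (uint32_t)(bitOffset / 32);      // cell holding that bit
    uint32_t shift     = (uint32_t)(bitOffset % 32);      // offset within the cell
    uint64_t window    = buffer[cell] | ((uint64_t)buffer[cell + 1] << 32);
    uint64_t mask      = (bitsPerItem == 32) ? 0xFFFFFFFFull : ((1ull << bitsPerItem) - 1);
    return (uint32_t)((window >> shift) & mask);
}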
I agree with keraba that you need to use something like Huffman coding or perhaps the Lempel-Ziv-Welch algorithm. The problem with bit-packing the way you are talking about is that you have two options:
Pick a constant n such that the largest integer can be represented.
Allow n to vary from value to value.
The first option is relatively easy to implement, but is really going to waste a lot of space unless all integers are rather small.
The second option has the major disadvantage that you have to convey changes in n somehow in the output bitstream. For instance, each value will have to have a length associated with it. This means you are storing two integers (albeit smaller integers) for every input value. There's a good chance you'll increase the file size with this method.
The advantage of Huffman or LZW is that they create codebooks in such a way that the length of the codes can be derived from the output bitstream without actually storing the lengths. These techniques allow you to get very close to the Shannon limit.
I decided to give your original idea (constant n, remove unused bits and pack) a try for fun and here is the naive implementation I came up with:
#include <sys/types.h>
#include <stdint.h>
#include <stdio.h>

int pack(int64_t* input, int nin, void* output, int n)
{
    int64_t inmask = 0;
    unsigned char* pout = (unsigned char*)output;
    int obit = 0;
    int nout = 0;
    *pout = 0;

    for(int i=0; i<nin; i++)
    {
        inmask = (int64_t)1 << (n-1);
        for(int k=0; k<n; k++)
        {
            if(obit>7)
            {
                obit = 0;
                pout++;
                *pout = 0;
            }
            *pout |= (((input[i] & inmask) >> (n-k-1)) << (7-obit));
            inmask >>= 1;
            obit++;
            nout++;
        }
    }
    return nout;
}
int unpack(void* input, int nbitsin, int64_t* output, int n)
{
    unsigned char* pin = (unsigned char*)input;
    int64_t* pout = output;
    int nbits = nbitsin;
    unsigned char inmask = 0x80;
    int inbit = 0;
    int nout = 0;

    while(nbits > 0)
    {
        *pout = 0;
        for(int i=0; i<n; i++)
        {
            if(inbit > 7)
            {
                pin++;
                inbit = 0;
            }
            *pout |= ((int64_t)((*pin & (inmask >> inbit)) >> (7-inbit))) << (n-i-1);
            inbit++;
        }
        pout++;
        nbits -= n;
        nout++;
    }
    return nout;
}
int main()
{
    int64_t input[] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20};
    int64_t output[21];
    unsigned char compressed[21*8];
    int n = 5;

    int nbits = pack(input, 21, compressed, n);
    int nout = unpack(compressed, nbits, output, n);

    for(int i=0; i<=20; i++)
        printf("input: %lld output: %lld\n", input[i], output[i]);
}
This is very inefficient because it steps one bit at a time, but that was the easiest way to implement it without dealing with issues of endianness. I have not tested it with a wide range of values either, just the ones in the test. Also, there is no bounds checking and it is assumed the output buffers are long enough. So what I am saying is that this code is probably only good for educational purposes to get you started.
Most any compression algorithm will get close to the minimum entropy needed to encode the integers, for example, Huffman coding, but accessing it like an array will be non-trivial.
Starting from Jason B's implementation, I eventually wrote my own version which processes bit-blocks instead of single bits. One difference is that it is LSB-first: it starts from the lowest output bits and works up to the highest. This only makes it harder to read with a binary dump, like Linux xxd -b. As a detail, int* can be trivially changed to int64_t*, and it would be even better as unsigned. I have already tested this version on a few million arrays and it seems solid, so I will share it with the rest:
#include <algorithm> // for std::min

int pack2(int *input, int nin, unsigned char* output, int n)
{
    int obit = 0;
    int ibit = 0;
    int ibite = 0;
    int nout = 0;
    if(nin > 0) output[0] = 0;

    for(int i=0; i<nin; i++)
    {
        ibit = 0;
        while(ibit < n) {
            ibite = std::min(n, ibit + 8 - obit);
            output[nout] |= (input[i] & (((1 << ibite)-1) ^ ((1 << ibit)-1))) >> ibit << obit;
            obit += ibite - ibit;
            nout += obit >> 3;
            if(obit & 8) output[nout] = 0;
            obit &= 7;
            ibit = ibite;
        }
    }
    return nout;
}
int unpack2(int *oinput, int nin, unsigned char* ioutput, int n)
{
    int obit = 0;
    int ibit = 0;
    int ibite = 0;
    int nout = 0;

    for(int i=0; i<nin; i++)
    {
        oinput[i] = 0;
        ibit = 0;
        while(ibit < n) {
            ibite = std::min(n, ibit + 8 - obit);
            oinput[i] |= (ioutput[nout] & (((1 << (ibite-ibit+obit))-1) ^ ((1 << obit)-1))) >> obit << ibit;
            obit += ibite - ibit;
            nout += obit >> 3;
            obit &= 7;
            ibit = ibite;
        }
    }
    return nout;
}
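For reference, a small round-trip example of how these might be called (the values and n = 5 are arbitrary):

#include <cstdio>

int main()
{
    int input[8] = {0, 1, 2, 3, 4, 5, 6, 31};    // all values fit in 5 bits
    int output[8] = {0};
    unsigned char packed[8 * sizeof(int)] = {0};
    const int n = 5;

    pack2(input, 8, packed, n);                  // 8 items * 5 bits = 40 bits -> 5 bytes
    unpack2(output, 8, packed, n);

    for (int i = 0; i < 8; i++)
        std::printf("input: %d output: %d\n", input[i], output[i]);
    return 0;
}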
I know this might seem like the obvious thing to say, and I'm sure there's actually a solution, but why not use a smaller type, like uint8_t (max 255) or uint16_t (max 65535)? I'm sure you could bit-manipulate an int64_t using defined values, OR operations and the like, but, aside from an academic exercise, why?
And on the note of academic exercises, Bit Twiddling Hacks is a good read.
If you have fixed sizes, e.g. you know your number is 38-bit rather than 64, you can build structures using bit-field specifications, assuming you also have smaller elements to fit in the remaining space.
struct example {
    /* 64bit number cut into 3 different sized sections */
    uint64_t big_num:38;
    uint64_t small_num:16;
    uint64_t itty_num:10;

    /* 8 bit number cut in two */
    uint8_t nibble_A:4;
    uint8_t nibble_B:4;
};
This isn't big/little-endian safe without some hoop-jumping, so it can only be used within a program rather than in an exported data format. It's quite often used to store boolean values in single bits without defining shifts and masks.
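A short usage sketch, assuming the struct example above (the exact sizeof is implementation-dependent):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    struct example e = {0};
    e.big_num   = (1ULL << 37);   /* fits: 38 bits available */
    e.small_num = 65535;          /* fits: 16 bits */
    e.itty_num  = 1023;           /* fits: 10 bits */
    e.nibble_A  = 0xA;
    e.nibble_B  = 0x5;
    printf("big=%llu small=%u itty=%u size=%zu\n",
           (unsigned long long)e.big_num, (unsigned)e.small_num,
           (unsigned)e.itty_num, sizeof(struct example));
    return 0;
}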
I don't think you can avoid iterating across the elements.
AFAIK Huffman encoding requires the frequencies of the "symbols", which, unless you know the statistics of the "process" generating the integers, you will have to compute by iterating over every element.