I'm writing a program that saves a large bool array (usually around 512*512 bool variables) to a file. It would make great use of saving it in a smart way; I'm thinking about saving the booleans 8 by 8, coding each group of 8 into one byte of the form:
byte = bit0 * boolean0 | ... | bit7 * boolean7
But I'm not sure how to handle this conversion, though I know to write and read a file byte by byte.
I'm using C++. I've no background in CS, but this seems close to the topic of serialization, though everything I searched on the subject is really unclear to me. Has it already been implemented, or is there a really simple way to implement it? (I mean saving as much CPU time as possible; my program will write and open millions of these files per instance.)
Cheers.
Edit:
With the help of Sean (thank you btw!) I managed to get a bit further but it is still not working, a test of the data after saving and reading tells me that it gets corrupted (as in not reconstructed correctly and so not identical to the initial data) somewhere in the writing, reading or both...
My code will probably help.
Here are the writing lines:
typedef char byte;
ofstream ofile("OUTPUT_FILE");
for(int i=0; i<N/8; i++){
    byte encoded = 0;
    for(int j=0; j<8; j++){
        byte bit = (byte)(tab[(i*8+j)/h][(i*8+j)%h]==1);
        encoded = (encoded << 1) | bit;
    }
    ofile << encoded;
}
and the reading lines:
for(int i=0; i<N/8; i++){ //N is the total number of entries I have in my final array
    temp = ifile.get(); //reading the byte in a char
    for(int j=0; j<8; j++){ // trying to read each bit in there
        if(((temp >> j) & 1) ? '1' : '0' ){
            tab[(i*8+j)/h][(i*8+j)%h]=1;
        }
        else{
            tab[(i*8+j)/h][(i*8+j)%h]=-1; //my program manipulates +1 (TRUE) and -1 (FALSE), making most of the operations easier
        }
    }
}
ifile.close();
Edit2:
I finally managed to do it using bitset<8> objects, which are much clearer to me than manipulating bits inside a char. I'll probably update my post later with my working code. I'm still concerned with efficiency: do you think raw bit manipulation is much quicker to work with than bitset?
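For reference, a minimal sketch of what the bitset<8> round trip might look like (untested; it assumes the same tab, N and h as in my code above, and uses the same bit index j when packing and unpacking so the order stays consistent):

#include <bitset>
#include <fstream>

// Writing: pack 8 flags into a bitset, then emit it as one raw byte.
std::ofstream ofile("OUTPUT_FILE", std::ios::binary);
for (int i = 0; i < N / 8; i++) {
    std::bitset<8> b;
    for (int j = 0; j < 8; j++)
        b[j] = (tab[(i * 8 + j) / h][(i * 8 + j) % h] == 1);
    ofile.put(static_cast<char>(b.to_ulong()));
}

// Reading: rebuild each bitset from one byte and unpack it.
std::ifstream ifile("OUTPUT_FILE", std::ios::binary);
for (int i = 0; i < N / 8; i++) {
    std::bitset<8> b(static_cast<unsigned long>(static_cast<unsigned char>(ifile.get())));
    for (int j = 0; j < 8; j++)
        tab[(i * 8 + j) / h][(i * 8 + j) % h] = b[j] ? 1 : -1;
}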
If you don't need to determine the size of your bit array at runtime, you can use std::bitset:
http://en.cppreference.com/w/cpp/utility/bitset
You can encode the bools to an int in a loop like this:
bool flags[8];
// populate flags from somewhere
byte encoded = 0;
for(int i=0; i<8; i++)
{
    byte bit = (byte)flags[i];
    encoded = (encoded << 1) | bit;
}
The code uses the fact that casting a bool to a number yields 1 for true and 0 for false.
Alternatively you can unroll it:
byte encoded = 0;
encoded |= ((byte)flags[0]) << 7;
encoded |= ((byte)flags[1]) << 6;
encoded |= ((byte)flags[2]) << 5;
encoded |= ((byte)flags[3]) << 4;
encoded |= ((byte)flags[4]) << 3;
encoded |= ((byte)flags[5]) << 2;
encoded |= ((byte)flags[6]) << 1;
encoded |= ((byte)flags[7]);
To convert the byte back to an array of flags you can do something like this:
bool flags[8];
byte encoded = /* some value */;
for(int i=0; i<8; i++)
{
    bool flag = (bool)(encoded & 1);
    flags[7-i] = flag;
    encoded >>= 1;
}
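Putting the two loops together into a file round trip might look like this (a sketch; write_flags and read_flags are hypothetical helper names, nbytes counts whole bytes, and the important details are opening the streams in binary mode, using unformatted put()/get(), and reading bits back in the same MSB-first order they were written):

#include <fstream>

typedef unsigned char byte; // unsigned avoids sign-extension surprises when shifting

// Pack each group of 8 flags into one byte, most significant bit first.
void write_flags(const bool* flags, int nbytes, const char* path)
{
    std::ofstream ofile(path, std::ios::binary);
    for (int i = 0; i < nbytes; i++) {
        byte encoded = 0;
        for (int j = 0; j < 8; j++)
            encoded = (encoded << 1) | (byte)flags[i * 8 + j];
        ofile.put((char)encoded);
    }
}

// Undo the packing, reading bits in the same MSB-first order.
void read_flags(bool* flags, int nbytes, const char* path)
{
    std::ifstream ifile(path, std::ios::binary);
    for (int i = 0; i < nbytes; i++) {
        byte encoded = (byte)ifile.get();
        for (int j = 0; j < 8; j++)
            flags[i * 8 + j] = ((encoded >> (7 - j)) & 1) != 0;
    }
}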
Related
I want to store integers in a char array, e.g. 0 up to 1449. I checked other posts and I tried memset, sprintf etc., but either I get gibberish characters or unreadable symbols when I print the contents of the char array. Can anyone help please?
I checked the duplicate link, however I am not trying to print an int as a char; I want to store ints in a char array. I tried buf[i] = static_cast<char>(i); inside a for loop but it didn't work. Casting didn't work.
The last one I tried is like this:
char buf[1450];
for (int i = 0; i < 1449; i++)
{
    memset(buf, ' ' + i, 1450);
    cout << buf[i];
}
The output is: (screenshot of gibberish characters omitted)
I'm not sure what you're trying to do! You should state your objective.
A char (usually 8 bits) in C++ doesn't hold an int (usually 32 bits). If you want to store ints you should use an int array:
int buf[1500];
The memset(buf, ' ' + i, 1450); actually fills the whole buffer with the ASCII value of ' ' plus i on every iteration (the buffer address is never incremented).
Something like this may be what you want:
int buf[1500] = {0};
for (int i = 0; i < 1449; i++)
{
    buf[i] = i;
    cout << buf[i] << ' ';
}
Consider using standard containers like std::vector to hold the ints or chars; they would be much safer to use.
You are going to have to explain better what it is you want, because "store integers in char array" is exactly what this code does:
char buf[1450];
for (int i = 0; i < 1450; i++)
{
    buf[i] = static_cast<char>(i);
    std::cout << buf[i];
}
Yes, the output is similar to what your picture shows, but that is also the correct output.
When you use a debugger to look at buf after the loop, then it does contain: 0, 1, 2, 3, ..., 126, 127, -128, -127, ..., 0, 1, 2, 3, ... and so on, which is the expected contents given that we are trying to put the numbers 0-1449 into an integer type that (in this case*) can contain the range [-128;127].
If this is not the behavior you are looking for (it sounds like it isn't), then you need to describe your requirements in more detail or we won't be able to help you.
(*) A char must be able to hold a character. On many/most systems it is 8 bits, but the size is system-dependent and it may also be larger.
New answer.
Thank you for the clarification, I believe that what you need is something like this:
int32_t before = 1093821061; // Int you want to transmit
uint8_t buf[4];
buf[0] = static_cast<uint8_t>((before >> 0) & 0xff);
buf[1] = static_cast<uint8_t>((before >> 8) & 0xff);
buf[2] = static_cast<uint8_t>((before >> 16) & 0xff);
buf[3] = static_cast<uint8_t>((before >> 24) & 0xff);
// Add buf to your UDP packet and send it
// Stuff...
// After receiving the packet on the other end:
int32_t after = 0;
after += buf[0] << 0;
after += buf[1] << 8;
after += buf[2] << 16;
after += buf[3] << 24;
std::cout << before << ", " << after << std::endl;
Your problem (as I see it), is that you want to store 32bit numbers in the 8bit buffers that you need for UDP packets. The way to do this is to pick the larger number apart, and convert it into individual bytes, transmit those bytes, and then assemble the big number again from the bytes.
The above code should allow you to do this. Note that I have changed types to int32_t and uint8_t to ensure that I know the exact size of my types. Depending on the library you use, you may have to use plain int and char types; just be aware that then the exact sizes of your types are not guaranteed (most likely they will still be 32 and 8 bits, but they can change if you switch compiler or compile for a different target system). If you want, you can add some sizeof checks to ensure that your types conform to what you expect.
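If you go down that route, a minimal sketch of such checks (static_assert is C++11; the exact messages are up to you):

#include <climits>
#include <cstdint>

// Fail the build early if the platform doesn't match our assumptions.
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
static_assert(sizeof(int32_t) == 4, "int32_t must be exactly 4 bytes");
static_assert(sizeof(uint8_t) == 1, "uint8_t must be exactly 1 byte");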
Let us say that we have a double, say, x = 4.3241;
Quite simply, I would like to know: in C++, how can one retrieve an int for each bit in the representation of a number?
I have seen other questions and read the page on bitset, but I'm afraid I still do not understand how to retrieve those bits.
So, for example, I would like the input to be x = 4.53, and if the bit representation was 10010101, then I would like 8 ints, each one representing each 1 or 0.
Something like:
double doubleValue = /* ...whatever... */;
uint8_t *bytePointer = (uint8_t *)&doubleValue;
for(size_t index = 0; index < sizeof(double); index++)
{
    uint8_t byte = bytePointer[index];
    for(int bit = 0; bit < 8; bit++)
    {
        printf("%d", byte & 1);
        byte >>= 1;
    }
}
... will print the bits out, ordered from least significant to most significant within bytes and reading the bytes from first to last. Depending on your machine architecture that means the bytes may or may not be in order of significance. Intel is strictly little endian so you should get all bits from least significant to most; most CPUs use the same endianness for floating point numbers as for integers but even that's not guaranteed.
Just allocate an array and store the bits instead of printing them.
(an assumption made: that there are eight bits in a byte; not technically guaranteed in C but fairly reliable on any hardware you're likely to encounter nowadays)
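For instance, the storing variant might look like this (a sketch; double_bits is a hypothetical helper name, and the byte and bit order is the same as in the loop above):

#include <cstdint>
#include <vector>

// Collect one int per bit instead of printing, least significant bit of each byte first.
std::vector<int> double_bits(double value)
{
    std::vector<int> bits;
    const uint8_t *bytePointer = (const uint8_t *)&value;
    for (size_t index = 0; index < sizeof(double); index++) {
        uint8_t byte = bytePointer[index];
        for (int bit = 0; bit < 8; bit++) {
            bits.push_back(byte & 1);
            byte >>= 1;
        }
    }
    return bits;
}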
This is extremely architecture-dependent. After gathering the following information
The endianness of your target architecture
The floating point representation (e.g. IEEE 754)
The size of your double type
you should be able to get the bit representation you're searching for. An example tested on an x86_64 system:
#include <iostream>
#include <climits>

int main()
{
    double v = 72.4;
    // Boilerplate to circumvent the fact bitwise operators can't be applied to double
    union {
        double value;
        char array[sizeof(double)];
    };
    value = v;
    for (int i = 0; i < sizeof(double) * CHAR_BIT; ++i) {
        int relativeToByte = i % CHAR_BIT;
        bool isBitSet = (array[sizeof(double) - 1 - i / CHAR_BIT] &
                         (1 << (CHAR_BIT - relativeToByte - 1))) == (1 << (CHAR_BIT - relativeToByte - 1));
        std::cout << (isBitSet ? "1" : "0");
    }
    return 0;
}
The output is
0100000001010010000110011001100110011001100110011001100110011010
which, split into sign, exponent and significand (or mantissa), is
0 10000000101 (1.)0010000110011001100110011001100110011001100110011010
(Diagram of the IEEE 754 double layout, originally from Wikipedia, omitted.)
Anyway you're required to know how your target representation works, otherwise these numbers will pretty much be useless to you.
Since your question doesn't make clear whether you want those integers in the order that makes sense with regard to the internal representation of your number, or to simply dump out the bytes at that address as you encounter them, I'm adding another, easier method that just dumps out every byte at that address (and shows another way of dealing with bit operators and double):
double v = 72.4;
uint8_t *array = reinterpret_cast<uint8_t*>(&v);
for (int i = 0; i < sizeof(double); ++i) {
    uint8_t byte = array[i];
    for (int bit = CHAR_BIT - 1; bit >= 0; --bit) // Print each byte, MSB first
        std::cout << ((byte & (1 << bit)) == (1 << bit));
}
The above code will simply print each byte from the one at lower address to the one with higher address.
Edit: since it seems you're just interested in how many 1s and 0s there are (i.e. the order totally doesn't matter), in this specific instance I agree with the other answers and would also just go for a counting solution:
uint8_t *array = reinterpret_cast<uint8_t*>(&v);
for (int i = 0; i < sizeof(double); ++i) {
    uint8_t byte = array[i];
    for (int j = 0; j < CHAR_BIT; ++j) {
        std::cout << (byte & 0x1);
        byte >>= 1;
    }
}
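If a single count really is all that's wanted, here is a sketch that tallies the set bits instead of printing them (count_one_bits is a hypothetical name; memcpy sidesteps the aliasing question):

#include <climits>
#include <cstring>

// Count the set bits of a double's object representation.
int count_one_bits(double v)
{
    unsigned char bytes[sizeof(double)];
    std::memcpy(bytes, &v, sizeof(double));
    int ones = 0;
    for (size_t i = 0; i < sizeof(double); ++i)
        for (int j = 0; j < CHAR_BIT; ++j)
            ones += (bytes[i] >> j) & 1;
    return ones; // the zero count is sizeof(double) * CHAR_BIT - ones
}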
What is the best way to implement a bitwise memmove? The method should take an additional destination and source bit-offset and the count should be in bits too.
I saw that ARM provides a non-standard _membitmove, which does exactly what I need, but I couldn't find its source.
BIND's bitset code includes isc_bitstring_copy, but it's not efficient.
I'm aware that the C standard library doesn't provide such a method, but I also couldn't find any third-party code providing a similar method.
Assuming "best" means "easiest", you can copy bits one by one. Conceptually, an address of a bit is an object (struct) that has a pointer to a byte in memory and an index of a bit in the byte.
struct pointer_to_bit
{
    uint8_t* p;
    int b;
};

void membitmovebl(
    void *dest,
    const void *src,
    int dest_offset,
    int src_offset,
    size_t nbits)
{
    // Create pointers to bits
    struct pointer_to_bit d = {dest, dest_offset};
    struct pointer_to_bit s = {src, src_offset};
    // Bring the bit offsets to range (0...7)
    d.p += d.b / 8; // replace division by right-shift if bit offset can be negative
    d.b %= 8;       // replace "%=8" by "&=7" if bit offset can be negative
    s.p += s.b / 8;
    s.b %= 8;
    // Determine whether it's OK to loop forward
    if (d.p < s.p || (d.p == s.p && d.b <= s.b))
    {
        // Copy bits one by one
        for (size_t i = 0; i < nbits; i++)
        {
            // Read 1 bit
            int bit = (*s.p >> s.b) & 1;
            // Write 1 bit
            *d.p &= ~(1 << d.b);
            *d.p |= bit << d.b;
            // Advance pointers
            if (++s.b == 8)
            {
                s.b = 0;
                ++s.p;
            }
            if (++d.b == 8)
            {
                d.b = 0;
                ++d.p;
            }
        }
    }
    else
    {
        // Copy stuff backwards - essentially the same code but ++ replaced by --
    }
}
If you want to write a version optimized for speed, you will have to do copying by bytes (or, better, words), unroll loops, and handle a number of special cases (memmove does that; you will have to do more because your function is more complicated).
P.S. Oh, seeing that you call isc_bitstring_copy inefficient, you probably want the speed optimization. You can use the following idea:
Start copying bits individually until the destination is byte-aligned (d.b == 0). Then, it is easy to copy 8 bits at once, doing some bit twiddling. Do this until there are less than 8 bits left to copy; then continue copying bits one by one.
// Copy 8 bits from s to d and advance pointers
*d.p = *s.p++ >> s.b;
*d.p++ |= *s.p << (8 - s.b);
P.P.S Oh, and seeing your comment on what you are going to use the code for, you don't really need to implement all the versions (byte/halfword/word, big/little-endian); you only want the easiest one - the one working with words (uint32_t).
Here is a partial implementation (not tested). There are obvious efficiency and usability improvements.
Copy n bytes from src to dest (not overlapping src), and shift bits at dest rightwards by bit bits, 0 <= bit <= 7. This assumes that the least significant bits are at the right of the bytes
#include <string.h> /* memcpy */

void memcpy_with_bitshift(unsigned char *dest, unsigned char *src, size_t n, int bit)
{
    size_t i;
    memcpy(dest, src, n);
    for (i = 0; i < n; i++) {
        dest[i] >>= bit; /* was "dest[i] >> bit", which had no effect */
    }
    for (i = 0; i < n - 1; i++) { /* stop at n-1 so dest[i+1] stays in bounds */
        dest[i+1] |= (src[i] << (8 - bit));
    }
}
Some improvements to be made:
Don't overwrite the first bit bits (the shift amount) at the beginning of dest.
Merge loops
Have a way to copy a number of bits not divisible by 8
Fix for >8 bits in a char
How to write bitset data to a file?
The first answer doesn't answer the question correctly, since it takes 8 times more space than it should.
How would you do it ? I really need it to save a lot of true/false values.
Simplest approach: take 8 consecutive boolean values, represent them as a single byte, and write that byte to your file. That will save a lot of space.
At the beginning of the file, you can write the number of boolean values you want to store; that number will help when reading the bytes back from the file and converting them into boolean values again!
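A minimal sketch of that layout (save_bools is a hypothetical helper; the count goes first, then the packed bytes, least significant bit first within each byte):

#include <cstdint>
#include <fstream>
#include <vector>

void save_bools(const std::vector<bool>& flags, const char* path)
{
    std::ofstream out(path, std::ios::binary);
    uint32_t count = (uint32_t)flags.size();
    out.write((const char*)&count, sizeof(count)); // header: how many booleans follow
    unsigned char byte = 0;
    for (size_t i = 0; i < flags.size(); ++i) {
        byte |= (unsigned char)flags[i] << (i % 8);
        if (i % 8 == 7 || i + 1 == flags.size()) { // flush each full byte and the final partial one
            out.put((char)byte);
            byte = 0;
        }
    }
}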
If you want the bitset class that best supports converting to binary, and your bitset is more than the size of unsigned long, then the best option to use is boost::dynamic_bitset. (I presume it is more than 32 and even 64 bits if you are that concerned about saving space).
From dynamic_bitset you can use to_block_range to write the bits into the underlying integral type. You can construct the dynamic_bitset back from the blocks by using from_block_range or its constructor from BlockInputIterator or by making append() calls.
Now that you have the bytes in their native format (Block), you still have the issue of writing them to a stream and reading them back.
You will need to store a bit of "header" information first: the number of blocks you have and potentially the endianness. Or you might use a macro to convert to a standard endianness (e.g. ntohl, but ideally a macro that is a no-op for your most common platform; so if that is little-endian you probably want to store little-endian and convert only on big-endian systems).
(Note: I am assuming that boost::dynamic_bitset standardly converts integral types the same way regardless of underlying endianness. Their documentation does not say).
To write the blocks in binary to a stream use os.write( reinterpret_cast<const char*>(&data[0]), sizeof(Block) * nBlocks ) and to read use is.read( reinterpret_cast<char*>(&data[0]), sizeof(Block) * nBlocks ), where data is assumed to be vector<Block>, and before the read you must do data.resize(nBlocks) (not reserve()). (You can also do weird stuff with istream_iterator or istreambuf_iterator but resize() is probably better.)
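As a concrete sketch of the above (save_bits and load_bits are hypothetical names; the header here is just the bit count, and endianness handling is omitted as discussed):

#include <boost/dynamic_bitset.hpp>
#include <cstdint>
#include <istream>
#include <ostream>
#include <vector>

typedef boost::dynamic_bitset<> Bits;  // Block defaults to unsigned long
typedef Bits::block_type Block;

void save_bits(const Bits& bits, std::ostream& os)
{
    uint64_t nbits = bits.size();
    os.write((const char*)&nbits, sizeof(nbits));  // header: number of bits
    std::vector<Block> blocks(bits.num_blocks());
    boost::to_block_range(bits, blocks.begin());   // copy out the raw blocks
    os.write((const char*)&blocks[0], sizeof(Block) * blocks.size());
}

Bits load_bits(std::istream& is)
{
    uint64_t nbits = 0;
    is.read((char*)&nbits, sizeof(nbits));
    Bits bits((Bits::size_type)nbits);
    std::vector<Block> blocks(bits.num_blocks());
    is.read((char*)&blocks[0], sizeof(Block) * blocks.size());
    boost::from_block_range(blocks.begin(), blocks.end(), bits);
    return bits;
}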
Here is a try with two functions that will use a minimal number of bytes, without compressing the bitset.
template<int I>
void bitset_dump(const std::bitset<I> &in, std::ostream &out)
{
    // export a bitset consisting of I bits to an output stream.
    // Eight bits are stored to a single stream byte.
    unsigned int i = 0;  // the current bit index
    unsigned char c = 0; // the current byte
    short bits = 0;      // number of bits processed in the current byte
    while(i < in.size())
    {
        c = c << 1;
        if(in.test(i)) ++c; // adding 1 if bit is true (std::bitset has test(), not at())
        ++bits;
        if(bits == 8)
        {
            out.put((char)c);
            c = 0;
            bits = 0;
        }
        ++i;
    }
    // dump remaining
    if(bits != 0) {
        // pad the byte so that first bits are in the most significant positions.
        while(bits != 8)
        {
            c = c << 1;
            ++bits;
        }
        out.put((char)c);
    }
    return;
}
template<int I>
void bitset_restore(std::istream &in, std::bitset<I> &out)
{
    // read bytes from the input stream to a bitset of size I.
    /* for debug */ //out.reset();
    unsigned int i = 0;        // current bit index
    unsigned char mask = 0x80; // current byte mask
    unsigned char c = 0;       // current byte in stream
    while(in.good() && (i < I))
    {
        if((i % 8) == 0) // retrieve next character
        {
            c = in.get();
            mask = 0x80;
        }
        else mask = mask >> 1; // shift mask
        out[i] = (c & mask) != 0; // std::bitset has operator[], not at()
        ++i;
    }
}
Note that probably using a reinterpret_cast of the portion of memory used by the bitset as an array of chars could also work, but it is maybe not portable across systems because you don't know what the representation of the bitset is (endianness?).
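A usage sketch for the pair (arbitrary file name; the streams must be opened in binary mode):

#include <bitset>
#include <fstream>
#include <string>

int main()
{
    std::bitset<12> original(std::string("101100111000"));
    {
        std::ofstream ofs("bits.bin", std::ios::binary);
        bitset_dump(original, ofs);
    }
    std::bitset<12> restored;
    {
        std::ifstream ifs("bits.bin", std::ios::binary);
        bitset_restore(ifs, restored);
    }
    return original == restored ? 0 : 1; // expect a successful round trip
}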
How about this
#include <sys/time.h>
#include <unistd.h>
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <iostream>
#include <vector>
...
{
    std::srand(std::time(nullptr));
    std::vector<bool> vct1, vct2;
    vct1.resize(20000000, false);
    vct2.resize(20000000, false);
    // insert some data
    for (size_t i = 0; i < 1000000; i++) {
        vct1[std::rand() % 20000000] = true;
    }
    // serialize to file
    // (std::_S_word_bit, std::_Bit_type and the iterator's _M_p member are
    // libstdc++ internals, so this is not portable across standard libraries)
    std::ofstream ofs("bitset", std::ios::out | std::ios::trunc | std::ios::binary);
    for (uint32_t i = 0; i < vct1.size(); i += std::_S_word_bit) {
        auto vct1_iter = vct1.begin();
        vct1_iter += i;
        uint32_t block_num = i / std::_S_word_bit;
        std::_Bit_type block_val = *(vct1_iter._M_p);
        if (block_val != 0) {
            // only write non-zero blocks
            ofs.write(reinterpret_cast<char*>(&block_num), sizeof(uint32_t));
            ofs.write(reinterpret_cast<char*>(&block_val), sizeof(std::_Bit_type));
        }
    }
    ofs.close();
    // deserialize
    std::ifstream ifs("bitset", std::ios::in | std::ios::binary);
    ifs.seekg(0, std::ios::end);
    uint64_t file_size = ifs.tellg();
    ifs.seekg(0);
    uint64_t load_size = 0;
    while (load_size < file_size) {
        uint32_t block_num;
        ifs.read(reinterpret_cast<char*>(&block_num), sizeof(uint32_t));
        std::_Bit_type block_value;
        ifs.read(reinterpret_cast<char*>(&block_value), sizeof(std::_Bit_type));
        load_size += sizeof(uint32_t) + sizeof(std::_Bit_type);
        auto offset = block_num * std::_S_word_bit;
        if (offset >= vct2.size()) {
            std::cout << "error! already touched the end" << std::endl;
            break;
        }
        auto iter = vct2.begin();
        iter += offset;
        *(iter._M_p) = block_value;
    }
    ifs.close();
    // check result
    int count_true1 = std::count(vct1.begin(), vct1.end(), true);
    int count_true2 = std::count(vct2.begin(), vct2.end(), true);
    std::cout << "count_true1: " << count_true1 << " count_true2: " << count_true2 << std::endl;
}
One way might be:
std::vector<bool> data = /* obtain bits somehow */;
// Reserve an appropriate number of byte-sized buckets.
std::vector<char> bytes((int)std::ceil((float)data.size() / CHAR_BIT));
for(int byteIndex = 0; byteIndex < bytes.size(); ++byteIndex) {
    for(int bitIndex = 0; bitIndex < CHAR_BIT; ++bitIndex) {
        if(byteIndex * CHAR_BIT + bitIndex >= data.size()) break; // last byte may be partial
        int bit = data[byteIndex * CHAR_BIT + bitIndex];
        bytes[byteIndex] |= bit << bitIndex;
    }
}
Note that this assumes you don't care what the bit layout ends up being in memory, because it makes no adjustments for anything. But as long as you also serialize out the number of bits that were actually stored (to cover cases where you have a bit count that isn't a multiple of CHAR_BIT), you can deserialize exactly the same bitset or vector as you had originally.
(I'm not happy with that bucket size computation but it's 1am and I'm having trouble thinking of something more elegant).
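For what it's worth, a purely integral way to round up the bucket count (same data as above):

// Equivalent to ceil(size / CHAR_BIT) without the float detour.
std::vector<char> bytes((data.size() + CHAR_BIT - 1) / CHAR_BIT);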
#include "stdio"
#include "bitset"
...
FILE* pFile;
pFile = fopen("output.dat", "wb");
...
const unsigned int size = 1024;
bitset<size> bitbuffer;
...
fwrite (&bitbuffer, 1, size/8, pFile);
fclose(pFile);
Two options:
Spend the extra pounds (or pence, more likely) for a bigger disk.
Write a routine to extract 8 bits from the bitset at a time, compose them into bytes, and write them to your output stream.
I have an array of integers; let's assume they are of type int64_t. Now, I know that only the first n bits of every integer are meaningful (that is, I know that they are limited by some bounds).
What is the most efficient way to convert the array so that all unnecessary space is removed (i.e. I have the first integer at a[0], the second one at a[0] + n bits and so on)?
I would like it to be as general as possible, because n would vary from time to time, though I guess there might be smart optimizations for specific n, like powers of 2 or similar.
Of course I know that I can just iterate value by value; I just want to ask you StackOverflowers if you can think of some more clever way.
Edit:
This question is not about compressing the array to take as least space as possible. I just need to "cut" n bits from every integer and given the array I know the exact n of bits I can safely cut.
Today I released: PackedArray: Packing Unsigned Integers Tightly (github project).
It implements a random access container where items are packed at the bit level. In other words, it acts as if you were able to manipulate an array of e.g. uint9_t or uint17_t:
PackedArray principle:
. compact storage of <= 32 bits items
. items are tightly packed into a buffer of uint32_t integers
PackedArray requirements:
. you must know in advance how many bits are needed to hold a single item
. you must know in advance how many items you want to store
. when packing, behavior is undefined if items have more than bitsPerItem bits
PackedArray general in memory representation:
|-------------------------------------------------- - - -
| b0 | b1 | b2 |
|-------------------------------------------------- - - -
| i0 | i1 | i2 | i3 | i4 | i5 | i6 | i7 | i8 | i9 |
|-------------------------------------------------- - - -
. items are tightly packed together
. several items end up inside the same buffer cell, e.g. i0, i1, i2
. some items span two buffer cells, e.g. i3, i6
I agree with keraba that you need to use something like Huffman coding or perhaps the Lempel-Ziv-Welch algorithm. The problem with bit-packing the way you are talking about is that you have two options:
Pick a constant n such that the largest integer can be represented.
Allow n to vary from value to value.
The first option is relatively easy to implement, but is really going to waste a lot of space unless all integers are rather small.
The second option has the major disadvantage that you have to convey changes in n somehow in the output bitstream. For instance, each value will have to have a length associated with it. This means you are storing two integers (albeit smaller integers) for every input value. There's a good chance you'll increase the file size with this method.
The advantage of Huffman or LZW is that they create codebooks in such a way that the length of the codes can be derived from the output bitstream without actually storing the lengths. These techniques allow you to get very close to the Shannon limit.
I decided to give your original idea (constant n, remove unused bits and pack) a try for fun and here is the naive implementation I came up with:
#include <stdint.h>
#include <stdio.h>
int pack(int64_t* input, int nin, void* output, int n)
{
    int64_t inmask = 0;
    unsigned char* pout = (unsigned char*)output;
    int obit = 0;
    int nout = 0;
    *pout = 0;
    for(int i=0; i<nin; i++)
    {
        inmask = (int64_t)1 << (n-1);
        for(int k=0; k<n; k++)
        {
            if(obit>7)
            {
                obit = 0;
                pout++;
                *pout = 0;
            }
            *pout |= (((input[i] & inmask) >> (n-k-1)) << (7-obit));
            inmask >>= 1;
            obit++;
            nout++;
        }
    }
    return nout;
}
int unpack(void* input, int nbitsin, int64_t* output, int n)
{
    unsigned char* pin = (unsigned char*)input;
    int64_t* pout = output;
    int nbits = nbitsin;
    unsigned char inmask = 0x80;
    int inbit = 0;
    int nout = 0;
    while(nbits > 0)
    {
        *pout = 0;
        for(int i=0; i<n; i++)
        {
            if(inbit > 7)
            {
                pin++;
                inbit = 0;
            }
            *pout |= ((int64_t)((*pin & (inmask >> inbit)) >> (7-inbit))) << (n-i-1);
            inbit++;
        }
        pout++;
        nbits -= n;
        nout++;
    }
    return nout;
}
int main()
{
    int64_t input[] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20};
    int64_t output[21];
    unsigned char compressed[21*8];
    int n = 5;
    int nbits = pack(input, 21, compressed, n);
    int nout = unpack(compressed, nbits, output, n);
    for(int i=0; i<=20; i++)
        printf("input: %lld output: %lld\n", (long long)input[i], (long long)output[i]);
}
This is very inefficient because it steps one bit at a time, but that was the easiest way to implement it without dealing with issues of endianness. I have not tested this with a wide range of values, just the ones in the test. Also, there is no bounds checking, and it is assumed the output buffers are long enough. So what I am saying is that this code is probably only good for educational purposes to get you started.
Most any compression algorithm will get close to the minimum entropy needed to encode the integers, for example, Huffman coding, but accessing it like an array will be non-trivial.
Starting from Jason B's implementation, I eventually wrote my own version which processes bit blocks instead of single bits. One difference is that it is LSB-first: it starts from the lowest output bits and goes to the highest. This only makes it harder to read with a binary dump, like Linux's xxd -b. As a detail, int* can be trivially changed to int64_t*, and it should even better be unsigned. I have already tested this version with a few million arrays and it seems solid, so I'll share it with the rest:
#include <algorithm> // for std::min

int pack2(int *input, int nin, unsigned char* output, int n)
{
    int obit = 0;
    int ibit = 0;
    int ibite = 0;
    int nout = 0;
    if(nin > 0) output[0] = 0;
    for(int i=0; i<nin; i++)
    {
        ibit = 0;
        while(ibit < n) {
            ibite = std::min(n, ibit + 8 - obit);
            output[nout] |= (input[i] & (((1 << ibite)-1) ^ ((1 << ibit)-1))) >> ibit << obit;
            obit += ibite - ibit;
            nout += obit >> 3;
            if(obit & 8) output[nout] = 0;
            obit &= 7;
            ibit = ibite;
        }
    }
    return nout;
}
int unpack2(int *oinput, int nin, unsigned char* ioutput, int n)
{
    int obit = 0;
    int ibit = 0;
    int ibite = 0;
    int nout = 0;
    for(int i=0; i<nin; i++)
    {
        oinput[i] = 0;
        ibit = 0;
        while(ibit < n) {
            ibite = std::min(n, ibit + 8 - obit);
            oinput[i] |= (ioutput[nout] & (((1 << (ibite-ibit+obit))-1) ^ ((1 << obit)-1))) >> obit << ibit;
            obit += ibite - ibit;
            nout += obit >> 3;
            obit &= 7;
            ibit = ibite;
        }
    }
    return nout;
}
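A quick usage sketch (my own example values, chosen to fit in n = 5 bits; the buffer is sized generously):

#include <cstdio>

int main()
{
    int input[6] = {3, 9, 14, 27, 1, 30};
    int output[6] = {0};
    unsigned char buffer[8] = {0};
    int n = 5;                    // bits per value
    pack2(input, 6, buffer, n);   // 6 * 5 = 30 bits, so 4 bytes are touched
    unpack2(output, 6, buffer, n);
    for (int i = 0; i < 6; i++)
        std::printf("%d -> %d\n", input[i], output[i]);
    return 0;
}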
I know this might seem like the obvious thing to say, and I'm sure there's actually a solution, but why not use a smaller type, like uint8_t (max 255) or uint16_t (max 65535)? I'm sure you could bit-manipulate an int64_t using defined values, OR operations and the like but, aside from an academic exercise, why?
And on the note of academic exercises, Bit Twiddling Hacks is a good read.
If you have fixed sizes, e.g. you know your number is 38 bits rather than 64, you can build structures using bit specifications. Assuming you also have smaller elements to fit in the remaining space:
struct example {
    /* 64bit number cut into 3 different sized sections */
    uint64_t big_num:38;
    uint64_t small_num:16;
    uint64_t itty_num:10;
    /* 8 bit number cut in two */
    uint8_t nibble_A:4;
    uint8_t nibble_B:4;
};
This isn't big/little endian safe without some hoop-jumping, so it can only be used within a program rather than in an exported data format. It's quite often used to store boolean values in single bits without defining shifts and masks.
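A small usage sketch (the total size is implementation-defined, which is part of the portability caveat above):

#include <cstdio>

int main()
{
    struct example e = {0, 0, 0, 0, 0};
    e.big_num = (1ULL << 37) + 5; // fits in 38 bits
    e.nibble_A = 0xA;
    e.nibble_B = 0x5;
    std::printf("sizeof(example) = %zu\n", sizeof(struct example));
    return 0;
}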
I don't think you can avoid iterating across the elements.
AFAIK Huffman encoding requires the frequencies of the "symbols", which, unless you know the statistics of the "process" generating the integers, you will have to compute (by iterating across every element).