As part of my master's thesis, I get a number (e.g. 5 bits) with 2 significant bits (the 2nd and 4th). This means, for example, x1x0x, where $x \in \{0,1\}$ (x could be 0 or 1) and 1, 0 are bits with fixed values.
My first task is to compute all the combinations of the number given above, $2^3 = 8$ in total. This is called the S_1 group.
Then I need to compute the S_2 group, which is all the combinations of the two numbers x0x0x and x1x1x (this means one mismatch in the significant bits). This should give $\binom{2}{1} \cdot 2^3 = 2 \cdot 2^3 = 16$ numbers.
EDIT
Each number, x1x1x and x0x0x, is different from the Original number, x1x0x, at one significant bit.
The last group, S_3, has of course two mismatches in the significant bits, meaning all the numbers of the form x0x1x: 8 possibilities.
The computation could be done recursively or independently; that is not a problem.
I would be happy if someone could give me a starting point for these computations, since what I have is not very efficient.
EDIT
Maybe I chose my words poorly in saying significant bits. What I meant is that at specific places in a five-bit number the bits are fixed. Those places are what I call specific bits.
EDIT
I have already seen 2 answers and it seems I should have been clearer. What I am more interested in is finding the numbers x0x0x, x1x1x and x0x1x, bearing in mind that this is just a simple example. In reality, the group S_1 (in this example x1x0x) would be built from numbers at least 12 bits long and could contain 11 significant bits. Then I would have 12 groups...
If something is still not clear please ask ;)
#include <vector>
#include <string>
#include <iostream>
#include <iomanip>
using namespace std;
int main()
{
string format = "x1x0x";
unsigned int sigBits = 0;
unsigned int sigMask = 0;
unsigned int numSigBits = 0;
for (unsigned int i = 0; i < format.length(); ++i)
{
sigBits <<= 1;
sigMask <<= 1;
if (format[i] != 'x')
{
sigBits |= (format[i] - '0');
sigMask |= 1;
++numSigBits;
}
}
unsigned int numBits = format.length();
unsigned int maxNum = (1 << numBits);
vector<vector<unsigned int> > S;
for (unsigned int i = 0; i <= numSigBits; i++)
S.push_back(vector<unsigned int>());
for (unsigned int i = 0; i < maxNum; ++i)
{
unsigned int changedBits = (i & sigMask) ^ sigBits;
unsigned int distance = 0;
for (unsigned int j = 0; j < numBits; j++)
{
if (changedBits & 0x01)
++distance;
changedBits >>= 1;
}
S[distance].push_back(i);
}
for (unsigned int i = 0; i <= numSigBits; ++i)
{
cout << dec << "Set with distance " << i << endl;
vector<unsigned int>::iterator iter = S[i].begin();
while (iter != S[i].end())
{
cout << hex << showbase << *iter << endl;
++iter;
}
cout << endl;
}
return 0;
}
sigMask has a 1 where all your specific bits are. sigBits has a 1 wherever your specific bits are 1. changedBits has a 1 wherever the current value of i is different from sigBits. distance counts the number of bits that have changed. This is about as efficient as you can get without precomputing a lookup table for the distance calculation.
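The lookup table mentioned above could look roughly like the sketch below. The names buildBitCountTable and mismatchCount are made up for illustration; the sketch assumes values of at most 32 bits and keeps the sigMask/sigBits meaning from the code above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Builds a table where entry b holds the number of 1 bits in the byte b.
// (Hypothetical helper, not part of the answer's original code.)
std::array<std::uint8_t, 256> buildBitCountTable()
{
    std::array<std::uint8_t, 256> table{};  // table[0] starts at 0
    for (unsigned b = 1; b < 256; ++b)
        table[b] = static_cast<std::uint8_t>(table[b >> 1] + (b & 1u));
    return table;
}

// Counts the fixed positions (marked in sigMask) at which value differs
// from sigBits, using one table lookup per byte instead of a bit loop.
unsigned mismatchCount(std::uint32_t value, std::uint32_t sigMask,
                       std::uint32_t sigBits,
                       const std::array<std::uint8_t, 256>& table)
{
    assert((sigBits & ~sigMask) == 0);  // sigBits may only use masked positions
    const std::uint32_t changed = (value & sigMask) ^ sigBits;
    return table[changed & 0xFF] + table[(changed >> 8) & 0xFF]
         + table[(changed >> 16) & 0xFF] + table[changed >> 24];
}
```

For the format x1x0x, sigMask is 0x0A and sigBits is 0x08; the value returned for each i is the index of the group that i belongs to.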
Of course, it doesn't actually matter what the fixed-bit values are, only that they're fixed. xyxyx, where y is fixed and x isn't, will always yield 8 potentials. The potential combinations of the two groups where y varies between them will always be a simple multiplication: for each state that the first may be in, the second may be in each state.
Use bit logic.
// x1x1x
if ((test_byte & 0x0A) == 0x0A) // 0x0A is binary 01010: true when both fixed positions hold a 1
There's probably a number-theoretic solution, but, this is very simple.
This needs to be done with a fixed-width integer type. Some dynamic languages (Python, for example) will extend the number of bits if they think it's a good idea.
This is not hard, but it is time consuming, and TDD would be particularly appropriate here.
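As a rough sketch of how the numbers for one pattern (say x0x1x) could be produced directly instead of testing every candidate, the free positions can be walked with the usual submask enumeration trick. The function name enumeratePattern and the string format are assumptions made for this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns every value matching `pattern`, where '0' and '1' are fixed bits
// and any other character (e.g. 'x') is a free bit. The leftmost character
// is the most significant bit. Hypothetical helper for illustration.
std::vector<unsigned> enumeratePattern(const std::string& pattern)
{
    assert(pattern.size() < 32);
    unsigned fixedBits = 0;  // 1 wherever the pattern has a fixed '1'
    unsigned freeMask = 0;   // 1 wherever the pattern has a free bit
    for (char c : pattern) {
        fixedBits <<= 1;
        freeMask <<= 1;
        if (c == '1')
            fixedBits |= 1u;
        else if (c != '0')
            freeMask |= 1u;
    }
    std::vector<unsigned> result;
    unsigned sub = freeMask;
    while (true) {
        result.push_back(fixedBits | sub);  // one assignment of the free bits
        if (sub == 0)
            break;
        sub = (sub - 1) & freeMask;         // next smaller subset of freeMask
    }
    return result;
}
```

Calling it once per pattern (x0x0x, x1x1x, x0x1x, ...) yields each group without scanning all 2^n numbers; the values come out in descending order.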
My task is to design a function that fulfils those requirements:
Function shall sum members of given one-dimensional array. However, it should sum only members whose number of ones in the binary representation is higher than defined threshold (e.g. if the threshold is 4, number 255 will be counted and 15 will not)
The array length is arbitrary
The function shall utilize as little memory as possible and shall be written in an efficient way
The production function code (‘sum_filtered(){..}’) shall not use any standard C library functions (or any other libraries)
The function shall return 0 on success and error code on error
The array elements are of a type 16-bit signed integer and an overflow during calculation shall be regarded as a failure
Use data types that ensure portability between different CPUs (so the calculations will be the same on 8/16/32-bit MCU)
The function code should contain a reasonable amount of comments in doxygen annotation
Here is my solution:
#include <iostream>
using namespace std;
int sum_filtered(short array[], int treshold)
{
// return 1 if invalid input parameters
if((treshold < 0) || (treshold > 16)){return(1);}
int sum = 0;
int bitcnt = 0;
for(int i=0; i < sizeof(array); i++)
{
// Count one bits of integer
bitcnt = 0;
for (int pos = 0 ; pos < 16 ; pos++) {if (array[i] & (1 << pos)) {bitcnt++;}}
// Add integer to sum if bitcnt>treshold
if(bitcnt>treshold){sum += array[i];}
}
return(0);
}
int main()
{
short array[5] = {15, 2652, 14, 1562, -115324};
int result = sum_filtered(array, 14);
cout << result << endl;
short array2[5] = {15, 2652, 14, 1562, 15324};
result = sum_filtered(array2, -2);
cout << result << endl;
}
However I'm not sure whether this code is portable between different CPUs.
I also don't know how an overflow can occur during the calculation, or what other errors can happen while processing arrays with this function.
Can somebody more experienced give me his opinion?
Well, I can foresee one problem:
for(int i=0; i < sizeof(array); i++)
Here array is a pointer, so sizeof(array) will likely be 4 on 32-bit systems or 8 on 64-bit systems, regardless of how many elements the caller's array holds. You really do want to pass a count variable (in this case 5) into the sum_filtered function (the caller can compute the count as sizeof(array) / sizeof(short)).
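A small sketch of the pointer-plus-count idea, with the count deduced at the call site where the array type is still known. The names elementCount and sumAll are invented for this example and are not part of the question's code.

```cpp
#include <cassert>
#include <cstddef>

// Deduces the element count of a real array at compile time.
// Passing a plain pointer does not compile, which catches the mistake early.
template <typename T, std::size_t N>
std::size_t elementCount(const T (&)[N])
{
    return N;
}

// Receives a pointer and an explicit count, because inside the function
// the array has already decayed to a pointer.
long sumAll(const short* values, std::size_t count)
{
    assert(values != nullptr || count == 0);
    long sum = 0;
    for (std::size_t i = 0; i < count; ++i)
        sum += values[i];
    return sum;
}
```

A caller would write sumAll(array, elementCount(array)), or equivalently sumAll(array, sizeof(array) / sizeof(array[0])) in the scope where array is declared.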
Anyhow, this code:
// Count one bits of integer
bitcnt = 0;
for (int pos = 0 ; pos < 16 ; pos++) {if (array[i] & (1 << pos)) {bitcnt++;}}
Effectively you are doing a popcount here, which can be done using __builtin_popcount on gcc/clang, or __popcnt on MSVC. They are compiler specific, but usually boil down to a single popcount CPU instruction on most CPUs.
If you do want to do this the slow way, then an efficient approach is to treat the computation as a form of bitwise SIMD operation:
#include <cstdint> // or stdint.h if you have a rubbish compiler :)
uint16_t popcount(uint16_t s)
{
// perform 8x 1bit adds
uint16_t a0 = s & 0x5555;
uint16_t b0 = (s >> 1) & 0x5555;
uint16_t s0 = a0 + b0;
// perform 4x 2bit adds
uint16_t a1 = s0 & 0x3333;
uint16_t b1 = (s0 >> 2) & 0x3333;
uint16_t s1 = a1 + b1;
// perform 2x 4bit adds
uint16_t a2 = s1 & 0x0F0F;
uint16_t b2 = (s1 >> 4) & 0x0F0F;
uint16_t s2 = a2 + b2;
// perform 1x 8bit adds
uint16_t a3 = s2 & 0x00FF;
uint16_t b3 = (s2 >> 8) & 0x00FF;
return a3 + b3;
}
I know it says you can't use stdlib functions (your 4th point), but that shouldn't apply to the standardised integer types surely? (e.g. uint16_t) If it does, well then there is no way to guarantee portability across platforms. You're out of luck.
Personally I'd just use a 64-bit integer for the sum. That should reduce the risk of any overflow: if the threshold is zero and all the values are -32768 (the most negative int16_t), you'd overflow only once the array size exceeded 0x1000000000000 elements (281,474,976,710,656 in decimal).
#include <cstdint>
int64_t sum_filtered(int16_t array[], uint16_t threshold, size_t array_length)
{
// changing the type on threshold to be unsigned means we don't need to test
// for negative numbers.
if(threshold > 16) { return 1; }
int64_t sum = 0;
for(size_t i=0; i < array_length; i++)
{
if (popcount(array[i]) > threshold)
{
sum += array[i];
}
}
return sum;
}
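The function above returns the sum, so a sum of 1 cannot be told apart from the error code. One possible way to meet the "return 0 on success" requirement is sketched below: the sum goes through an out-parameter and the overflow check happens before each addition. The names sumFiltered, SumStatus and countOnes, and the choice of a 32-bit sum, are assumptions of this sketch rather than anything the task prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical status codes for this sketch.
enum SumStatus { SUM_OK = 0, SUM_BAD_ARGUMENT = 1, SUM_OVERFLOW = 2 };

// Counts 1 bits by clearing the lowest set bit until none remain.
static unsigned countOnes(std::uint16_t v)
{
    unsigned n = 0;
    while (v != 0) {
        v = static_cast<std::uint16_t>(v & (v - 1));
        ++n;
    }
    return n;
}

// Sums the elements having more than `threshold` one-bits into *result.
// Returns SUM_OK (0) on success; *result is only written on success.
int sumFiltered(const std::int16_t* values, std::size_t count,
                unsigned threshold, std::int32_t* result)
{
    if (result == nullptr || (values == nullptr && count != 0) || threshold > 16)
        return SUM_BAD_ARGUMENT;
    const std::int32_t maxSum = 2147483647;
    const std::int32_t minSum = -2147483647 - 1;
    std::int32_t sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        const std::int16_t v = values[i];
        if (countOnes(static_cast<std::uint16_t>(v)) > threshold) {
            // Check before adding, so the signed sum itself never overflows.
            if ((v > 0 && sum > maxSum - v) || (v < 0 && sum < minSum - v))
                return SUM_OVERFLOW;
            sum += v;
        }
    }
    *result = sum;
    return SUM_OK;
}
```

The fixed-width types come from a header, not from a library function, so this should still be compatible with the "no library functions" rule; whether a 32-bit or 64-bit accumulator is wanted is a design choice the task leaves open.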
I have the following snippet:
int n = 10;
int k = n>>1;
std::cout<<k;
This prints 5.
I want k to be the last digit in binary representation of n.
Like bin(n) = 1010
So, I want k to be 0.
I understand long methods are possible. Please suggest a one liner if possible.
Edit:
After going through the comments and answers, I discovered that there are various ways of doing that.
Some of them are:
k = n%2
k = n&1
Thanks to all those who answered the question. :)
#include <iostream>
int main( )
{
unsigned int val= 0x1010; // hexadecimal literal; its lowest bit is 0
//so you just want the least significant bit?
//and assign it to another int?
unsigned int assign= val & 0x1;
std::cout << assign << std::endl;
val= 0x1001;
assign= val & 0x1;
std::cout << assign << std::endl;
return 0;
}
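To tie this back to the question's n = 10 example, here is a minimal sketch with two helper functions whose names (lastBinaryDigit, binaryDigitAt) are made up for illustration. One detail worth knowing: for negative odd values n % 2 yields -1, while n & 1 yields 1 on two's complement machines.

```cpp
#include <cassert>
#include <climits>

// Returns the least significant binary digit (0 or 1) of n.
int lastBinaryDigit(int n)
{
    return n & 1;
}

// Returns the binary digit at `position`, where 0 is the least significant bit.
int binaryDigitAt(unsigned int n, unsigned int position)
{
    assert(position < sizeof(unsigned int) * CHAR_BIT);
    return static_cast<int>((n >> position) & 1u);
}
```

With n = 10 (binary 1010), lastBinaryDigit(n) is 0 and binaryDigitAt(n, 1) is 1.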
UPDATE:
I would add that bit masking is not uncommon in C. I often use ints to hold states:
#define STATE_MOTOR_RUNNING 0x0001
#define STATE_UPDATE_DISPLAY 0x0002
#define STATE_COUNTER_READY 0x0004
Then:
unsigned int state= STATE_COUNTER_READY;
if( state & STATE_COUNTER_READY )
{
start_motor( );
state|= STATE_MOTOR_RUNNING;
}
//etc...
You aren't going to be able to avoid some calculation.
int k = n % 10;
will get you the last decimal digit, as that assignment gives k the remainder of division by 10. For the last binary digit, take the remainder of division by 2 instead: n % 2.
Let us say that we have a double, say, x = 4.3241;
Quite simply, I would like to know how, in C++, one can retrieve an int for each bit in the representation of a number.
I have seen other questions and read the page on bitset, but I'm afraid I still do not understand how to retrieve those bits.
So, for example, I would like the input to be x = 4.53, and if the bit representation was 10010101, then I would like 8 ints, each one representing each 1 or 0.
Something like:
double doubleValue = ...whatever...;
uint8_t *bytePointer = (uint8_t *)&doubleValue;
for(size_t index = 0; index < sizeof(double); index++)
{
uint8_t byte = bytePointer[index];
for(int bit = 0; bit < 8; bit++)
{
printf("%d", byte&1);
byte >>= 1;
}
}
... will print the bits out, ordered from least significant to most significant within bytes and reading the bytes from first to last. Depending on your machine architecture that means the bytes may or may not be in order of significance. Intel is strictly little endian so you should get all bits from least significant to most; most CPUs use the same endianness for floating point numbers as for integers but even that's not guaranteed.
Just allocate an array and store the bits instead of printing them.
(an assumption made: that there are eight bits in a byte; not technically guaranteed in C but fairly reliable on any hardware you're likely to encounter nowadays)
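Storing the bits instead of printing them could look like the sketch below. It copies the double into a 64-bit integer first, so the bits come out ordered by significance as long as the platform uses the same byte order for doubles and integers. The function name bitsOfDouble is invented here, and the sketch assumes a 64-bit double.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Returns the 64 bits of `value` as ints (each 0 or 1), most significant first.
// Assumes sizeof(double) == 8; the meaning of the bits depends on the
// floating point format (IEEE 754 on most current hardware).
std::vector<int> bitsOfDouble(double value)
{
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch assumes a 64-bit double");
    std::uint64_t raw = 0;
    std::memcpy(&raw, &value, sizeof raw);  // well-defined way to reinterpret the bytes
    std::vector<int> bits;
    bits.reserve(64);
    for (int position = 63; position >= 0; --position)
        bits.push_back(static_cast<int>((raw >> position) & 1u));
    assert(bits.size() == 64);
    return bits;
}
```

On an IEEE 754 machine, element 0 is the sign bit, elements 1 to 11 are the exponent and the remaining 52 are the significand.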
This is extremely architecture-dependent. Once you have gathered the following information:
- the endianness of your target architecture
- the floating point representation (e.g. IEEE 754)
- the size of your double type
you should be able to get the bit representation you're looking for. An example tested on an x86_64 system:
#include <iostream>
#include <climits>
int main()
{
double v = 72.4;
// Boilerplate to circumvent the fact bitwise operators can't be applied to double
union {
double value;
char array[sizeof(double)];
};
value = v;
for (int i = 0; i < sizeof(double) * CHAR_BIT; ++i) {
int relativeToByte = i % CHAR_BIT;
bool isBitSet = (array[sizeof(double) - 1 - i / CHAR_BIT] &
(1 << (CHAR_BIT - relativeToByte - 1))) == (1 << (CHAR_BIT - relativeToByte - 1));
std::cout << (isBitSet ? "1" : "0");
}
return 0;
}
The output is
0100000001010010000110011001100110011001100110011001100110011010
which, split into sign, exponent and significand (or mantissa), is
0 10000000101 (1.)0010000110011001100110011001100110011001100110011010
Anyway you're required to know how your target representation works, otherwise these numbers will pretty much be useless to you.
Since your question is unclear about whether you want those integers in the order that makes sense with regard to the internal representation of your number, or whether you simply want to dump out the bytes at that address as you encounter them, I'm adding another, easier method that just dumps out every byte at that address (and shows another way of dealing with bit operators and double):
double v = 72.4;
uint8_t *array = reinterpret_cast<uint8_t*>(&v);
for (int i = 0; i < sizeof(double); ++i) {
uint8_t byte = array[i];
for (int bit = CHAR_BIT - 1; bit >= 0; --bit) // Print each byte
std::cout << ((byte & (1 << bit)) == (1 << bit));
}
The above code will simply print each byte from the one at lower address to the one with higher address.
Edit: since it seems you're just interested in how many 1s and 0s there are (i.e. the order doesn't matter at all), in this specific instance I agree with the other answers and I would also just go for a counting solution:
uint8_t *array = reinterpret_cast<uint8_t*>(&v);
for (int i = 0; i < sizeof(double); ++i) {
uint8_t byte = array[i];
for (int j = 0; j < CHAR_BIT; ++j) {
std::cout << (byte & 0x1);
byte >>= 1;
}
}
I have an integer (i) occupying 4 bytes, and I am assuming that it is stored in memory with one decimal digit per byte (1, 5, 2, 6), starting at address 1000.
If I write int*p=&i;
p now stores the starting address, which is 1000 here.
If I increment p, it points to address 1004.
Is there any way to traverse the addresses 1000, 1001, 1002 and 1003 so that I can separate and print the digits 1, 5, 2, 6 using pointers?
Please help (I'm a newbie).
My assumption about the storage may be wrong. Can anyone please help me correct it?
EDIT 1
According to the answer given by Mohit Jain below and suggestions by others,
unsigned char *cp = reinterpret_cast<unsigned char *>(&i);
for(size_t idx = 0; idx < sizeof i; ++idx) {
cout << static_cast<int>(cp[idx]);
}
I am getting the answer as
246 5 0 0 .
I realized that the way I was assuming the memory structure was wrong,
So is there no way to get the actual digits using pointers??
An int with the value 1526 will not normally be stored as four bytes with the values 1, 5, 2 and 6.
Instead, it'll be stored in binary. Assuming a big-endian machine, the bytes will have the values 0, 0, 5, 246 (and if it's little-endian, you'll get the same values in the reverse order: 246, 5, 0, 0). The reason for those numbers is that each byte can store a value from 0 to 255, so 1526 is stored as 5 * 256 + 246. When dealing with values in memory like this, it's often convenient (and quite common) to use hexadecimal instead of decimal, in which case you'd be looking at it as 0x05F6.
The usual way to get decimal digits involves more math than pointers. For example, the least significant digit will be the remainder after dividing the value by 10.
To list the memory contents
Using pointer (endian-ness dependent output)
unsigned char *cp = reinterpret_cast<unsigned char *>(&i);
for(size_t idx = 0; idx < sizeof i; ++idx) {
cout << static_cast<int>(cp[idx]);
}
Without using pointer (endian-ness independent output), because digits are not stored the way you assume.
int copy = i;
unsigned int mask = (1U << CHAR_BIT) - 1U;
for(size_t idx = 0; idx < sizeof i; ++idx) {
cout << (copy & mask);
copy >>= CHAR_BIT;
}
To list the digits
If you want the digits of integer using pointer you should first convert the integer to a string:
std::string digits = std::to_string(i); // You can alternatively use stringstream
const char *p = digits.c_str(); // c_str() returns a pointer to const char
for(size_t idx = 0; idx < digits.length(); ++idx) cout << (*p++);
You can cast the pointer to (char *) and increment that pointer to point to beginning of individual bytes. However, your assumption of storage is wrong, so you will not get the digits like that.
As far as I can see, you want to extract each digit of a number.
To achieve that you need to:
1. Get the remainder of i divided by 10, like this: const int r = i % 10;
2. Divide i by 10: i /= 10;
3. If i is not 0, go to step 1.
Implementation (not tested) could be like this:
do
{
const int r = i % 10;
// do anything you need with r
i /= 10;
} while (i > 0);
This will give you each digit, starting from the least significant.
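If the digits are wanted in reading order (1, 5, 2, 6 for 1526), one option is to collect them with the same loop and reverse the result. This is a minimal sketch; the function name decimalDigits is made up, and it assumes a non-negative input.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns the decimal digits of a non-negative value, most significant first.
std::vector<int> decimalDigits(int value)
{
    assert(value >= 0);  // negative input is outside the scope of this sketch
    std::vector<int> digits;
    do {
        digits.push_back(value % 10);  // least significant digit first
        value /= 10;
    } while (value > 0);
    std::reverse(digits.begin(), digits.end());
    return digits;
}
```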
I have an array of integers; let's assume they are of type int64_t. I know that only the first n bits of each integer are meaningful (that is, I know that the values are limited by some bounds).
What is the most efficient way to convert the array so that all unnecessary space is removed (i.e. I have the first integer at a[0], the second one at a[0] + n bits and so on)?
I would like it to be as general as possible, because n will vary from time to time, though I guess there might be smart optimizations for specific n, such as powers of 2.
Of course I know that I can just iterate value by value; I just want to ask you StackOverflowers if you can think of some more clever way.
Edit:
This question is not about compressing the array to take as little space as possible. I just need to "cut" n bits from every integer, and given the array I know the exact number n of bits I can safely cut.
Today I released: PackedArray: Packing Unsigned Integers Tightly (github project).
It implements a random access container where items are packed at the bit level. In other words, it acts as if you were able to manipulate e.g. a uint9_t or uint17_t array:
PackedArray principle:
. compact storage of <= 32 bits items
. items are tightly packed into a buffer of uint32_t integers
PackedArray requirements:
. you must know in advance how many bits are needed to hold a single item
. you must know in advance how many items you want to store
. when packing, behavior is undefined if items have more than bitsPerItem bits
PackedArray general in memory representation:
|-------------------------------------------------- - - -
|       b0       |       b1       |       b2       |
|-------------------------------------------------- - - -
| i0 | i1 | i2 | i3 | i4 | i5 | i6 | i7 | i8 | i9 |
|-------------------------------------------------- - - -
. items are tightly packed together
. several items end up inside the same buffer cell, e.g. i0, i1, i2
. some items span two buffer cells, e.g. i3, i6
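To make the layout above concrete, here is a minimal sketch of the same idea: fixed-width items stored back to back in uint32_t cells, with random access get and set. It is written from the description only and is not the PackedArray project's actual API; the class name PackedItems and its members are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stores itemCount unsigned items of bitsPerItem bits each (1..32),
// tightly packed into 32-bit cells. Hypothetical illustration class.
class PackedItems {
public:
    PackedItems(unsigned bitsPerItem, std::size_t itemCount)
        : bits_(bitsPerItem),
          mask_(bitsPerItem == 32 ? 0xFFFFFFFFu : ((1u << bitsPerItem) - 1u)),
          cells_((static_cast<std::size_t>(bitsPerItem) * itemCount + 31) / 32, 0u)
    {
        assert(bitsPerItem >= 1 && bitsPerItem <= 32);
    }

    void set(std::size_t index, std::uint32_t value)
    {
        value &= mask_;  // drop any bits beyond the item width
        const std::size_t bitPos = index * bits_;
        const std::size_t cell = bitPos / 32;
        const unsigned offset = static_cast<unsigned>(bitPos % 32);
        cells_[cell] = (cells_[cell] & ~(mask_ << offset)) | (value << offset);
        if (offset + bits_ > 32) {  // item spills into the next cell
            const unsigned shift = 32 - offset;
            cells_[cell + 1] = (cells_[cell + 1] & ~(mask_ >> shift)) | (value >> shift);
        }
    }

    std::uint32_t get(std::size_t index) const
    {
        const std::size_t bitPos = index * bits_;
        const std::size_t cell = bitPos / 32;
        const unsigned offset = static_cast<unsigned>(bitPos % 32);
        std::uint64_t window = cells_[cell];
        if (offset + bits_ > 32)  // pull in the spilled high part
            window |= static_cast<std::uint64_t>(cells_[cell + 1]) << 32;
        return static_cast<std::uint32_t>(window >> offset) & mask_;
    }

    std::size_t cellCount() const { return cells_.size(); }

private:
    unsigned bits_;
    std::uint32_t mask_;
    std::vector<std::uint32_t> cells_;
};
```

With 9-bit items, ten items need 90 bits and therefore three cells, and item 3 straddles the first two cells, much like i3 in the diagram.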
I agree with keraba that you need to use something like Huffman coding or perhaps the Lempel-Ziv-Welch algorithm. The problem with bit-packing the way you are talking about is that you have two options:
Pick a constant n such that the largest integer can be represented.
Allow n to vary from value to value.
The first option is relatively easy to implement, but is really going to waste a lot of space unless all integers are rather small.
The second option has the major disadvantage that you have to convey changes in n somehow in the output bitstream. For instance, each value will have to have a length associated with it. This means you are storing two integers (albeit smaller integers) for every input value. There's a good chance you'll increase the file size with this method.
The advantage of Huffman or LZW is that they create codebooks in such a way that the length of the codes can be derived from the output bitstream without actually storing the lengths. These techniques allow you to get very close to the Shannon limit.
I decided to give your original idea (constant n, remove unused bits and pack) a try for fun and here is the naive implementation I came up with:
#include <stdint.h> // int64_t
#include <stdio.h>
int pack(int64_t* input, int nin, void* output, int n)
{
int64_t inmask = 0;
unsigned char* pout = (unsigned char*)output;
int obit = 0;
int nout = 0;
*pout = 0;
for(int i=0; i<nin; i++)
{
inmask = (int64_t)1 << (n-1);
for(int k=0; k<n; k++)
{
if(obit>7)
{
obit = 0;
pout++;
*pout = 0;
}
*pout |= (((input[i] & inmask) >> (n-k-1)) << (7-obit));
inmask >>= 1;
obit++;
nout++;
}
}
return nout;
}
int unpack(void* input, int nbitsin, int64_t* output, int n)
{
unsigned char* pin = (unsigned char*)input;
int64_t* pout = output;
int nbits = nbitsin;
unsigned char inmask = 0x80;
int inbit = 0;
int nout = 0;
while(nbits > 0)
{
*pout = 0;
for(int i=0; i<n; i++)
{
if(inbit > 7)
{
pin++;
inbit = 0;
}
*pout |= ((int64_t)((*pin & (inmask >> inbit)) >> (7-inbit))) << (n-i-1);
inbit++;
}
pout++;
nbits -= n;
nout++;
}
return nout;
}
int main()
{
int64_t input[] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20};
int64_t output[21];
unsigned char compressed[21*8];
int n = 5;
int nbits = pack(input, 21, compressed, n);
int nout = unpack(compressed, nbits, output, n);
for(int i=0; i<=20; i++)
printf("input: %lld output: %lld\n", (long long)input[i], (long long)output[i]);
}
This is very inefficient because it steps one bit at a time, but that was the easiest way to implement it without dealing with issues of endianness. I have not tested it with a wide range of values either, just the ones in the test. Also, there is no bounds checking and it is assumed that the output buffers are long enough. So what I am saying is that this code is probably only good for educational purposes, to get you started.
Most any compression algorithm will get close to the minimum entropy needed to encode the integers, for example, Huffman coding, but accessing it like an array will be non-trivial.
Starting from Jason B's implementation, I eventually wrote my own version which processes bit-blocks instead of single bits. One difference is that it is LSB-first: it starts from the lowest output bits and goes to the highest. This only makes it harder to read with a binary dump, like Linux xxd -b. As a detail, int* can be trivially changed to int64_t*, and it would be even better as an unsigned type. I have already tested this version with a few million arrays and it seems solid, so I will share it with the rest:
#include <algorithm> // for std::min
int pack2(int *input, int nin, unsigned char* output, int n)
{
int obit = 0;
int ibit = 0;
int ibite = 0;
int nout = 0;
if(nin>0) output[0] = 0;
for(int i=0; i<nin; i++)
{
ibit = 0;
while(ibit < n) {
ibite = std::min(n, ibit + 8 - obit);
output[nout] |= (input[i] & (((1 << ibite)-1) ^ ((1 << ibit)-1))) >> ibit << obit;
obit += ibite - ibit;
nout += obit >> 3;
if(obit & 8) output[nout] = 0;
obit &= 7;
ibit = ibite;
}
}
return nout;
}
int unpack2(int *oinput, int nin, unsigned char* ioutput, int n)
{
int obit = 0;
int ibit = 0;
int ibite = 0;
int nout = 0;
for(int i=0; i<nin; i++)
{
oinput[i] = 0;
ibit = 0;
while(ibit < n) {
ibite = std::min(n, ibit + 8 - obit);
oinput[i] |= (ioutput[nout] & (((1 << (ibite-ibit+obit))-1) ^ ((1 << obit)-1))) >> obit << ibit;
obit += ibite - ibit;
nout += obit >> 3;
obit &= 7;
ibit = ibite;
}
}
return nout;
}
I know this might seem like the obvious thing to say, as I'm sure there's actually a solution, but why not use a smaller type, like uint8_t (max 255) or uint16_t (max 65535)? I'm sure you could bit-manipulate an int64_t using defined values, OR operations and the like, but, aside from an academic exercise, why?
And on the note of academic exercises, Bit Twiddling Hacks is a good read.
If you have fixed sizes, e.g. you know your number is 38 bits rather than 64, you can build structures using bit-field specifications, assuming you also have smaller elements to fit in the remaining space.
struct example {
/* 64bit number cut into 3 different sized sections */
uint64_t big_num:38;
uint64_t small_num:16;
uint64_t itty_num:10;
/* 8 bit number cut in two */
uint8_t nibble_A:4;
uint8_t nibble_B:4;
};
This isn't big/little endian safe without some hoop-jumping, so it can only be used within a program rather than in an exported data format. It's quite often used to store boolean values in single bits without defining shifts and masks.
I don't think you can avoid iterating across the elements.
AFAIK Huffman encoding requires the frequencies of the "symbols", which unless you know the statistics of the "process" generating the integers, you will have to compute (by iterating across every element).