I have data that is sometimes best viewed as an array of 10 bytes, sometimes as an array of 80 bits. Maybe a job for a union?
After filling the array with 10 bytes, I scan through the 80 bits and test whether each is set. The scan advances bit-by-bit in an ISR, so efficiency is key.
Right now I do this at each interrupt:
volatile uint8_t bit_array[10]; // external to ISR
volatile uint8_t bit_idx;
volatile uint8_t byte_idx;

// ----- ISR ---------
static uint8_t abyte;           // temp byte from array

if (bit_idx == 0) {             // at each new byte
    bit_idx = 1;                // begin at the lowest bit
    abyte = bit_array[byte_idx];
}
if (abyte & bit_idx) {
    // << do the thing >>
}
if ((bit_idx *= 2) == 0) {      // idx << and test for done
    if (++byte_idx > 9) {       // try next byte
        byte_idx = 0;
        fill_array_again();
    }
}
I have a sense that there's a way to create a union that would allow a straightforward scan of the bits using a single index 0..79, but I don't know enough to try it.
The questions are: can I do that? and: can it be efficient?
You can use the 0 .. 79 range for your index without the need for a union¹. You can get the byte index into your array using index / 8 and the bit position (within that byte) using index % 8.
This would certainly simplify your code; however, whether it will be significantly more efficient will depend on a number of factors, like what the target CPU is and how smart your compiler is. But note that the division and remainder operations with 8 as their RHS are trivial for most compilers/architectures and reduce to a bit-shift and a simple mask, respectively.
Here's a possible outline implementation:
uint8_t data[10];  // The 10 bytes
uint8_t index = 0; // index of bits in 0 .. 79 range

void TestISR()
{
    // Test the indexed bit using a combination of division and remainder ...
    if (data[index / 8] & (1 << (index % 8))) {
        // Do something
    }

    // Increment index ...
    if (++index > 79) {
        index = 0;
        refill_array();
    }
}
For any compiler that fails to implement the optimized division and remainder operations, the if statement can be re-written thus:
if (data[index >> 3] & (1 << (index & 7))) {
    // ...
}
¹ Note that any attempt to actually use a union will likely exhibit undefined behaviour. In C++, reading from a member of a union that wasn't the last one written is UB (although it's acceptable and well-defined in C).
Hello everybody out there! I have a homework assignment where I need to build a high-precision calculator that will operate on very large numbers. The whole point of the assignment is that storing the values with one digit per array cell is not allowed.
That is, representing the number
335897294593872
in memory like so
int number[] = {3, 3, 5, 8, 9, 7, 2, 9, 4, 5, 9, 3, 8, 7, 2};
is not legit,
nor
char number[] = {3, 3, 5, 8, 9, 7, 2, 9, 4, 5, 9, 3, 8, 7, 2};
nor
std::string number("335897294593872");
What I want to do is to split the whole number into 32-bit chunks and store each chunk in a separate array cell whose data type is uint32_t.
Since I get the input from the keyboard, I store all values in a std::string initially and later put them into integer arrays to perform operations.
How do I put the binary representation of a large number into an integer array, filling in all the bits properly?
Thank you in advance.
EDIT: Using standard C++ libraries only
EDIT2: I want to be able to add, subtract, multiply, divide those arrays with large numbers so I mean not to merely cut the string up and store decimal representation in integer array, but rather preserve bits order of the number itself to be able to calculate carry.
This is a rather naïve solution:
1. If the last digit in the string is odd, store a 1 in the result (otherwise leave it 0).
2. Divide the digits in the string by 2 (considering carries).
3. If 32 bits have been written, add another element to the result vector.
4. Repeat until the string contains only 0s.
Source Code:
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <string>
#include <vector>
std::vector<uint32_t> toBigInt(std::string text)
{
    // convert string to BCD-like
    for (char &c : text) c -= '0';
    // build result vector
    std::vector<uint32_t> value(1, 0);
    uint32_t bit = 1;
    for (;;) {
        // set next bit if last digit is odd
        if (text.back() & 1) value.back() |= bit;
        // divide BCD-like by 2
        bool notNull = false; int carry = 0;
        for (char &c : text) {
            const int carryNew = c & 1;
            c /= 2; c += carry * 5;
            carry = carryNew;
            notNull |= c;
        }
        if (!notNull) break;
        // shift bit
        bit <<= 1;
        if (!bit) {
            value.push_back(0); bit = 1;
        }
    }
    // done
    return value;
}
std::ostream& operator<<(std::ostream &out, const std::vector<uint32_t> &value)
{
    std::ios fmtOld(0); fmtOld.copyfmt(out); // save the stream's formatting state
    for (size_t i = value.size(); i--;) {
        // setw() applies only to the next insertion, so the most significant
        // element prints unpadded and every following element is zero-padded.
        out << std::hex << value[i] << std::setfill('0') << std::setw(sizeof (uint32_t) * 2);
    }
    out.copyfmt(fmtOld); // restore formatting state
    return out;
}
int main()
{
    std::string tests[] = {
        "0", "1",
        "4294967295",           // 0xffffffff
        "4294967296",           // 0x100000000
        "18446744073709551615", // 0xffffffffffffffff
        "18446744073709551616", // 0x10000000000000000
    };
    for (const std::string &test : tests) {
        std::cout << test << ": " << toBigInt(test) << '\n';
    }
    return 0;
}
Output:
0: 0
1: 1
4294967295: ffffffff
4294967296: 100000000
18446744073709551615: ffffffffffffffff
18446744073709551616: 10000000000000000
Notes:
The output is little-endian. (The least significant element is first.)
For the tests, I used numbers where hex-code is simple to check by eyes.
Using an array to store the different parts of a big number is a common way to do the work. Another thing to think about is how different architectures represent signed ints; the usual sacrifice (it is what the normal big-integer libraries make) is to work with unsigned parts and handle signed-to-unsigned conversions explicitly (you have several options here), and to decide how you are going to implement the different arithmetic operations between the parts of your number.
I don't generally recommend the long long integer types for the array cells, as they are generally not the native word size of the architecture. To give the architecture a chance to do things efficiently, use a somewhat reduced standard unsigned type, smaller by at least one bit so you are able to see the carries out of one extended digit into the next (for example, GNU libgmp used 24-bit integers in each array cell, the last time I checked). It's also common to make the cell size a multiple of the char size, so displacements and reallocations of numbers are easier than making, say, 31-bit displacements on a full array of bits.
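For instance, here is a minimal sketch of adding two such numbers limb by limb, just to make the carry handling concrete (my own naming; it assumes little-endian limb order, i.e. element 0 is least significant, and equal-length inputs):

#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<uint32_t> add(const std::vector<uint32_t> &a,
                          const std::vector<uint32_t> &b)
{
    std::vector<uint32_t> sum(a.size(), 0);
    uint64_t carry = 0; // one bit wider than a limb, so the carry is visible
    for (size_t i = 0; i < a.size(); ++i) {
        uint64_t t = (uint64_t)a[i] + b[i] + carry;
        sum[i] = (uint32_t)t; // keep the low 32 bits
        carry = t >> 32;      // carry into the next limb
    }
    if (carry) sum.push_back((uint32_t)carry);
    return sum;
}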
It's common, when you work with money or similarly delicate numbers, to use integers, because you can guarantee many things that way. So my recommendation is: whenever you work with big numbers like these, simulate fixed-point or floating-point arithmetic with two ints, so you can "watch" how everything executes (a minimal sketch of this follows below); you could check the IEEE 754 standard for the floating-point side.
If you store the number in an array, make sure all the operations you perform while manipulating it take a constant number of steps, which can be tricky.
I recommend you trust the integers, but fix the size in bits.
But if you really want to go for interesting stuff, try the bitwise operators, and maybe you can get something interesting out of it.
You can check the details of the data types here, in particular signed short int and long long int, and to confirm the sizes of the data types, check this.
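For illustration, a minimal fixed-point sketch along those lines, with two integers per value (the Money name and layout are mine, purely to make the idea concrete):

#include <cstdint>
#include <cstdio>

// Fixed-point money value: 'units' whole units plus 'cents' in 0..99.
struct Money {
    int64_t units;
    int32_t cents;
};

Money add(Money a, Money b)
{
    Money r;
    r.cents = a.cents + b.cents;
    r.units = a.units + b.units + r.cents / 100; // carry into the units
    r.cents %= 100;
    return r;
}

int main()
{
    Money a{1, 75}, b{2, 50};
    Money c = add(a, b);
    std::printf("%lld.%02d\n", (long long)c.units, (int)c.cents); // 4.25
    return 0;
}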
I am reading about sets representing as bits at following location
http://www.brpreiss.com/books/opus4/html/page395.html
class SetAsBitVector : public Set
{
    typedef unsigned int Word;
    enum { wordBits = bitsizeof (Word) };
    Array<Word> vector;
public:
    SetAsBitVector (unsigned int);
    // ...
};

SetAsBitVector::SetAsBitVector (unsigned int n) :
    Set (n),
    vector ((n + wordBits - 1U) / wordBits)
{
    // Question here?
    for (unsigned int i = 0; i < vector.Length (); ++i)
        vector [i] = 0;
}

void SetAsBitVector::Insert (Object& object)
{
    unsigned int const item = dynamic_cast<Element&> (object);
    vector [item / wordBits] |= 1 << item % wordBits;
    // Question here
}
To insert an item into the set, we need to change the appropriate bit in the array of bits to one. The ith bit of the bit array is bit i mod w of word ⌊i/w⌋. Thus, the Insert function is implemented using a bitwise OR operation to change the ith bit to one, as shown in the program above. Even though it is slightly more complicated than the corresponding operation for the SetAsArray class, the running time for this operation is still O(1). Since w = wordBits is a power of two, it is possible to replace the division and modulo operations, / and %, with shifts and masks like this:
vector [item >> shift] |= 1 << (item & mask);
Depending on the compiler and machine architecture, doing so may improve the performance of the Insert operation by a constant factor.
Questions
My first question: in the constructor, why does the author add wordBits to n and then subtract 1, instead of directly using n / wordBits?
Second question: what does the author mean by the statement "Since w = wordBits is a power of two, it is possible to replace the division and modulo operations, / and %, with shifts and masks like this:
vector [item >> shift] |= 1 << (item & mask);"
Request: please give an example for the above scenario; what are the values of shift and mask?
Third, why does the author say that, depending on the architecture and compiler, there is a performance improvement?
I re-tagged this as C++, since it's clearly not C.
To round up. Consider what happens if you call it with n smaller than wordBits, for instance: plain n / wordBits would allocate zero words. The generic formula is exactly the one being used, i.e. b = (a + Q - 1) / Q makes sure b * Q is at least a. For example, with n = 10 and wordBits = 32, (10 + 32 - 1) / 32 = 1, whereas 10 / 32 would give 0.
Basic binary arithmetic: division by two is equivalent to shifting to the right, and so on.
On some machines, bitwise operations like shifts and masks are faster than divisions and modulos.
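To answer the request directly, a small runnable sketch (assuming Word is a 32-bit unsigned int, so wordBits = 32, shift = 5 because 2^5 = 32, and mask = 31):

#include <cstdio>

int main()
{
    const unsigned int shift = 5, mask = 31; // for 32-bit words
    unsigned int item = 70;
    std::printf("%u %u\n", item >> shift, item & mask); // 2 6
    std::printf("%u %u\n", item / 32, item % 32);       // 2 6 (same results)
    return 0;
}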
So I can't figure out how to do this in C++. I need to do a modulus operation and integer conversion on data that is 96 bits in length.
Example:
struct Hash96bit
{
    char x[12];
};

int main()
{
    Hash96bit n;
    // set n to something
    int size = 23;
    int result = n % size;
}
Edit: I'm trying to have a 96-bit hash because I have 3 floats which, when combined, create a unique combination. Thought that would be best to use as the hash because you don't really have to process it at all.
Edit: Okay... so at this point I might as well explain the bigger issue. I have a 3D world that I want to subdivide into sectors, so that groups of objects can be placed in sectors, which would make frustum culling and physics iterations take less time. So at the beginning, let's say you are at sector 0,0,0. Sure, we store them all in an array, cool, but what happens when we get far away from 0,0,0? We don't care about those sectors there anymore. So we use a hashmap, since memory isn't an issue and because we will be accessing data with sector values rather than handles. Now a sector is 3 floats; hashing that could easily be done with any number of algorithms. I thought it might be better if I could just say the 3 floats together are the key and go from there; I just needed a way to mod a 96-bit number to fit it in the data segment. Anyway, I think I'm just gonna take the bottom bits of each of these floats and use a 64-bit hash unless anyone comes up with something brilliant. Thank you for the advice so far.
UPDATE: Having just read your second edit to the question, I'd recommend you use David's Jenkins-hash approach (which I upvoted a while back)... just point it at the lowest byte in your struct of three floats.
Regarding "Anyway I think i'm just gonna take the bottom bits of each of these floats": again, the idea with a hash function used by a hash table is not just to map each bit in the input (still less some subset of them) to a bit in the hash output. You could easily end up with a lot of collisions that way, especially if the number of buckets is not a prime number. For example, if you take 21 bits from each float, and the number of buckets happens to be 1024 currently, then after % 1024 only 10 bits from one of the floats will be used, with no regard to the values of the other floats... hash(a,b,c) == hash(d,e,c) for all c. (It's actually a little worse than that: values like 5.5, 2.75 etc. will only use a couple of bits of the mantissa.)
Since you're insisting on this (though it's very likely not what you need, and a misnomer to boot):
// Note: uint128_t is not a standard type; this assumes something like
// typedef unsigned __int128 uint128_t; (available on GCC/Clang).
struct Hash96bit
{
    union {
        float f[3];
        char x[12];
        uint32_t u[3];
    };

    Hash96bit(float a, float b, float c)
    {
        f[0] = a;
        f[1] = b;
        f[2] = c;
    }

    // the operator will support your "int result = n % size;" usage...
    operator uint128_t() const
    {
        return u[0] * ((uint128_t)1 << 64) + // arbitrary ordering
               u[1] * ((uint128_t)1 << 32) +
               u[2];
    }
};
You can use the Jenkins one-at-a-time hash.

uint32_t jenkins_one_at_a_time_hash(char *key, size_t len)
{
    uint32_t hash, i;
    for (hash = i = 0; i < len; ++i)
    {
        hash += key[i];
        hash += (hash << 10);
        hash ^= (hash >> 6);
    }
    hash += (hash << 3);
    hash ^= (hash >> 11);
    hash += (hash << 15);
    return hash;
}
I need to find the smallest power of two that's greater than or equal to a given value. So far, I have this:
int value = 3221; // 3221 is just an example, could be any number
int result = 1;
while (result < value) result <<= 1;
It works fine, but feels kind of naive. Is there a better algorithm for that problem?
EDIT. There were some nice assembler suggestions, so I'm adding those tags to the question.
A related question, Rounding up to next power of 2 has some C answers where C++20 std::bit_ceil() isn't available.
Most of the answers to this question predate C++20, but could still be useful if implementing a C++ standard library or compiler.
Here's my favorite. Other than the initial check for whether it's invalid (<0, which you could skip if you knew you'd only have >=0 numbers passed in), it has no loops or conditionals, and thus will outperform most other methods. This is similar to erickson's answer, but I think that my decrementing x at the beginning and adding 1 at the end is a little less awkward than his answer (and also avoids the conditional at the end).
/// Round up to next higher power of 2 (return x if it's already a power
/// of 2).
inline int
pow2roundup (int x)
{
    if (x < 0)
        return 0;
    --x;
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x + 1;
}
ceil(log2(value))
ilog2() can be calculated in 3 asm instructions e.g., http://www.asterisk.org/doxygen/1.4/log2comp_8h-source.html
On Intel hardware the BSR instruction is close to what you want - it finds the most-significant-set-bit. If you need to be more precise you can then wonder if the remaining bits are precisely zero or not.
I tend to assume that other CPUs will have something like BSR; this is a question you want answered to normalize a number.
If your number is more than 32 bits then you would do a scan from your most-significant-DWORD to find the first DWORD with ANY bits set.
Edsger Dijkstra would likely remark that the above "algorithms" assume that your computer uses Binary Digits, while from his kind of lofty "algorithmic" perspective you should think about Turing machines or something - obviously I am of the more pragmatic style.
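For the multi-word case, here's a rough sketch of that scan (assuming GCC/Clang's __builtin_clz as a stand-in for BSR, 32-bit words, least significant word first; the helper name is mine):

#include <cstdint>

// Returns the index of the most-significant set bit, or -1 if no bits are set.
int top_set_bit(const uint32_t *words, int nwords)
{
    for (int w = nwords - 1; w >= 0; --w) // scan from the most-significant DWORD down
        if (words[w])
            return w * 32 + (31 - (int)__builtin_clz(words[w]));
    return -1;
}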
In the spirit of Quake II's 0x5f3759df and the Bit Twiddling Hacks' IEEE version - this solution reaches into a double to extract the exponent as a means to calculate floor(lg2(n)). It's a bit faster than the accepted solution and much faster than the Bit Twiddling IEEE version since it avoids floating point math. As coded, it assumes a double is a real*8 IEEE float on a little endian machine.
int nextPow2(int n)
{
    if ( n <= 1 ) return n;
    double d = n-1;
    return 1 << ((((int*)&d)[1]>>20)-1022);
}
Edit: Add optimized x86 assembly version with the help of a co-worker. A 4% speed gain but still about 50% slower than a bsr version (6 sec vs 4 on my laptop for n=1..2^31-2).
int nextPow2(int n)
{
    if ( n <= 1 ) return n;
    double d;
    n--;
    __asm {
        fild n
        mov eax,4
        fstp d
        mov ecx, dword ptr d[eax]
        sar ecx,14h
        rol eax,cl
    }
}
Here's a template version of the bit shifting technique.
#include <climits> // for CHAR_BIT
#include <cstddef> // for size_t

template<typename T> T next_power2(T value)
{
    --value;
    for (size_t i = 1; i < sizeof(T) * CHAR_BIT; i *= 2)
        value |= value >> i;
    return value + 1;
}
Since the loop only uses constants it gets flattened by the compiler. (I checked) The function is also future proof.
Here's one that uses __builtin_clz. (Also future proof)
template<typename T> T next_power2(T value)
{
    // Caveats: value must be > 1 (__builtin_clz(0) is undefined), and
    // __builtin_clz counts within an unsigned int, so T should be no wider.
    return T(1) << ((sizeof(T) * CHAR_BIT) - __builtin_clz(value - 1));
}
Your implementation is not naive; it's actually the logical one, except that it's wrong: it returns a negative number for inputs greater than half the maximum integer value.
Assuming you can restrict numbers to the range 0 through 2^30 (for 32-bit ints), it'll work just fine, and a lot faster than any mathematical functions involving logarithms.
Unsigned ints would work better, but you'd end up with an infinite loop (for numbers greater than 2^31) since you can never reach 2^32 with the << operator.
pow(2, ceil(log2(value)));
log2(value) = log(value) / log(2);
An exploration of the possible solutions to a closely related problem (rounding down instead of up), many of which are significantly faster than the naive approach, is available on the Bit Twiddling Hacks page, an excellent resource for the kind of optimization you are looking for. The fastest solution there is a lookup table with 256 entries, which reduces the total operation count to around 7, from an average of 62 (by a similar operation-counting methodology) for the naive approach. Adapting those solutions to your problem is a matter of a single comparison and increment.
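For reference, here's a sketch of that 256-entry table approach, adapted to round up rather than down (the table fill and the helper names are mine, not copied from the page):

#include <cstdint>

static uint8_t log_table[256]; // log_table[i] = floor(log2(i)) for i >= 1

static void init_log_table()
{
    for (int i = 2; i < 256; ++i)
        log_table[i] = 1 + log_table[i / 2]; // log_table[1] stays 0
}

static int floor_log2(uint32_t v) // assumes v != 0
{
    uint32_t t;
    if ((t = v >> 24)) return 24 + log_table[t];
    if ((t = v >> 16)) return 16 + log_table[t];
    if ((t = v >> 8))  return  8 + log_table[t];
    return log_table[v];
}

uint32_t next_pow2(uint32_t v) // the single comparison and increment mentioned above
{
    return (v <= 1) ? 1 : (uint32_t)1 << (floor_log2(v - 1) + 1);
}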
You don't really say what you mean by "better algorithm", but since the one you present is perfectly clear (if somewhat flawed), I'll assume you are after a more efficient algorithm.
Larry Gritz has given what is probably the most efficient C/C++ algorithm without the overhead of a lookup table, and it would suffice in most cases (see http://www.hackersdelight.org for similar algorithms).
As mentioned elsewhere, most CPUs these days have machine instructions to count the number of leading zeroes (or, equivalently, return the most-significant set bit), but their use is non-portable and, in most cases, not worth the effort.
However, most compilers have "intrinsic" functions that allow the use of these machine instructions in a more portable way.
Microsoft C++ has _BitScanReverse() and gcc provides __builtin_clz() to do the bulk of the work efficiently.
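Here's a sketch of a small wrapper over those two intrinsics (my own helper; assumes 32-bit values and MSVC or GCC/Clang):

#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

inline uint32_t bit_ceil32(uint32_t v)
{
    if (v <= 1) return 1;
#if defined(_MSC_VER)
    unsigned long ms; // receives the index of the most-significant set bit
    _BitScanReverse(&ms, v - 1);
    return 1u << (ms + 1);
#else
    return 1u << (32 - __builtin_clz(v - 1));
#endif
}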
How about a recursive template version to generate a compile constant:
template<uint32_t A, uint8_t B = 16>
struct Pow2RoundDown { enum{ value = Pow2RoundDown<(A | (A >> B)), B/2>::value }; };
template<uint32_t A>
struct Pow2RoundDown<A, 1> { enum{ value = (A | (A >> 1)) - ((A | (A >> 1)) >> 1) }; };
template<uint32_t A, uint8_t B = 16>
struct Pow2RoundUp { enum{ value = Pow2RoundUp<((B == 16 ? (A-1) : A) | ((B == 16 ? (A-1) : A) >> B)), B/2>::value }; };
template<uint32_t A >
struct Pow2RoundUp<A, 1> { enum{ value = ((A | (A >> 1)) + 1) }; };
Can be used like so:
Pow2RoundDown<3221>::value, Pow2RoundUp<3221>::value
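For instance, the results can be checked at compile time (static_assert is C++11; the expected values are my own arithmetic):

static_assert(Pow2RoundUp<3221>::value == 4096, "3221 rounds up to 4096");
static_assert(Pow2RoundDown<3221>::value == 2048, "3221 rounds down to 2048");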
The code below repeatedly strips the lowest bit off until the number is a power of two, then doubles the result unless the number is a power of two to begin with. It has the advantage of running in a time proportional to the number of bits set. Unfortunately, it has the disadvantage of requiring more instructions in almost all cases than either the code in the question or the assembly suggestions. I include it only for completeness.
int nextPow(int x) {
    int y = x;
    while (x &= (x ^ (~x + 1))) // strip the lowest set bit
        y = x << 1;
    return y;
}
I know this is downvote-bait, but if the number is small enough (like 8 or 16-bits) a direct lookup might be fastest.
static unsigned short tab[65536]; // tab[n] = highest power of 2 <= n (tab[0] stays 0)
for (unsigned int n = 1; n < 65536; ++n)
    tab[n] = tab[n >> 1] ? (unsigned short)(tab[n >> 1] << 1) : 1;
unsigned short bit = tab[i]; // i is the 16-bit input value
It might be possible to extend it to 32 bits by first doing the high word and then the low.
// extend to 32 bits: look up the high word first
unsigned long bitHigh = ((unsigned long)tab[(unsigned short)(i >> 16)]) << 16;
unsigned long bitLow = 0;
if (bitHigh == 0) {
    bitLow = tab[(unsigned short)(i & 0xffff)];
}
unsigned long answer = bitHigh | bitLow;
It's probably no better than the shift-or methods, but maybe it could be extended to larger word sizes.
(Actually, this gives the highest 1-bit. You'd have to shift it left by 1 to get the next higher power of 2.)
My version of the same:
int pwr2Test(size_t x) {
    return (x & (x - 1)) ? 0 : 1;
}

size_t pwr2Floor(size_t x) {
    // A lookup table for rounding 4-bit numbers down to
    // the nearest power of 2.
    static const unsigned char pwr2lut[] = {
        0x00, 0x01, 0x02, 0x02, //  0,  1,  2,  3
        0x04, 0x04, 0x04, 0x04, //  4,  5,  6,  7
        0x08, 0x08, 0x08, 0x08, //  8,  9, 10, 11
        0x08, 0x08, 0x08, 0x08  // 12, 13, 14, 15
    };
    size_t pwr2 = 0;              // The return value
    unsigned int i = 0;           // The nybble iterator
    for (i = 0; x != 0; ++i) {    // Iterate through nybbles, rounding the
        pwr2 = pwr2lut[x & 0x0f]; // lowest one down to a power of 2.
        x >>= 4;                  // After the loop, (i - 1) is the
    }                             // highest non-zero nybble index.
    i = i ? (i - 1) : i;
    pwr2 <<= (i * 4);
    return pwr2;
}

size_t pwr2Size(size_t x) {
    if (pwr2Test(x)) { return x; }
    return pwr2Floor(x) * 2;
}
In standard C++20 this is included in <bit>.
The answer is simply
#include <bit>

unsigned long upper_power_of_two(unsigned long v)
{
    return std::bit_ceil(v);
}
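A quick usage check for the question's example value (C++20):

#include <bit>
#include <cassert>

int main()
{
    assert(std::bit_ceil(3221u) == 4096u);
    assert(std::bit_ceil(4096u) == 4096u); // a power of two maps to itself
    return 0;
}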
I love the shift.
I'll settle for
int bufferPow = 1;
while (bufferPow < bufferSize && bufferPow > 0) bufferPow <<= 1;
That way the loop always terminates, and the part after && is almost never evaluated.
And I do not think two lines are worth a function call. Also you can make it a long, or a short, depending on your judgment, and it is very readable. (If bufferPow becomes negative, hopefully your main code will exit fast.)
Usually you compute a power of two only once at the start of an algorithm, so optimizing would be silly anyway. However, I would be interested if anyone bored enough cared for a speed contest... using the above examples and 255 256 257 .. 4195 4196 4197
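For anyone tempted, a minimal timing harness along those lines (my own sketch; it times the question's naive loop over the suggested 255..4197 range, and other candidates can be dropped in its place):

#include <chrono>
#include <cstdio>

int main()
{
    volatile int sink = 0; // keeps the optimizer from deleting the loop
    auto t0 = std::chrono::steady_clock::now();
    for (int rep = 0; rep < 100000; ++rep)
        for (int v = 255; v <= 4197; ++v) {
            int p = 1;
            while (p < v && p > 0) p <<= 1;
            sink = p;
        }
    auto t1 = std::chrono::steady_clock::now();
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0);
    std::printf("%lld ms (sink=%d)\n", (long long)ms.count(), sink);
    return 0;
}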
An arbitrary log function can be converted to a log base 2 by dividing by the log of 2:
>>>> import math
>>>> print math.log(65535)/math.log(2)
15.9999779861
>>>> print math.log(65536)/math.log(2)
16.0
It of course won't be 100% precise, since there is floating point arithmetic involved.
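The same identity carries over to C++ (the question's language); a quick check with <cmath>, where C++11 also offers log2 directly:

#include <cmath>
#include <cstdio>

int main()
{
    std::printf("%f\n", std::log(65536.0) / std::log(2.0)); // 16.000000
    std::printf("%f\n", std::log2(65536.0));                // 16.000000
    return 0;
}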
This works and is really fast (on my 2.66 GHz Intel Core 2 Duo 64-bit processor).
#include <cstdio>
#include <iostream>

int main(void) {
    int testinput, counter = 0; // counter must be initialized
    std::cin >> testinput;
    while (testinput > 1) {
        testinput = testinput >> 1;
        counter++;
    }
    int finalnum = testinput << (counter + 1);
    printf("Is %i\n", finalnum);
    return 0;
}
I tested it on 3, 6, and 65496, and correct answers (4, 8, and 65536) were given. (Note that none of those are exact powers of two; an exact power of two comes back doubled, since the loop only stops once testinput reaches 1.)
Sorry if this seems a bit arcane, I was under the influence of a couple of hours of Doom just before writing. :)