Calculate and Store Power of very large Number - c++

I am finding pow(2,i) where i can range: 0<=i<=100000.
Apart i have MOD=1000000007
powers[100000];
powers[0]=1;
for (i = 1; i <=100000; ++i)
{
powers[i]=(powers[i-1]*2)%MOD;
}
for i=100000 won't power value become greater than MOD ?
How do I store the power correctly?
The operation doesn't look feasible to me.
I am getting correct value up to i=70 max I guess.
I have to find sum+= ar[i]*power(2,i) and finally print sum%1000000007 where ar[i] is an additional array with some big numbers up to 10^5

As long as your modulus value is less than half the capacity of your data type, it will never be exceeded. That's because you take the previous value in the range 0..1000000006, double it, then re-modulo it bringing it back to that same range.
However, I can't guarantee that higher values won't cause you troubles, it's more mathematical analysis than I'm prepared to invest given the simple alternative. You could spend a lot of time analysing, checking and debugging, but it's probably better just to not allow the problem to occur in the first place.
The alternative? I'd tend to use the pre-generation method (having a program do the gruntwork up front, inserting the pre-generated values into an array easily and speedily accessible from your real program).
With this method, you can use tools that are well tested and known to work with massive values. Since this data is not going to change, it's useless calculating it every time your program starts.
If you want an easy (and efficient) way to do this, the following bash script in conjunction with bc and awk can do this:
#!/usr/bin/bash
bc >nums.txt <<EOF
i = 1;
for (x = 0;x <= 10000; x++) {
i % 1000000007;
i = i * 2;
}
EOF
awk 'BEGIN { printf "static int array[] = {" }
{ if (NR % 5 == 1) printf "\n ";
printf "%s, ",$0;
next
}
END { print "\n};" }' nums.txt
The bc part is the "meat" of the matter, it creates the large powers of two and outputs them modulo the number you provided. The awk part is simply to format them in C-style array elements, five per line.
Just take the output of that and put it into your code and, voila, there you have it, a compile-time-expensed array that you can use for fast lookup.
It takes only a second and a half on my box to generate the array and then you never need to do it again. You also won't have to concern yourself with the vagaries of modulo math :-)
static int array[] = {
1,2,4,8,16,
32,64,128,256,512,
1024,2048,4096,8192,16384,
32768,65536,131072,262144,524288,
1048576,2097152,4194304,8388608,16777216,
33554432,67108864,134217728,268435456,536870912,
73741817,147483634,294967268,589934536,179869065,
359738130,719476260,438952513,877905026,755810045,
511620083,23240159,46480318,92960636,185921272,
371842544,743685088,487370169,974740338,949480669,
898961331,797922655,595845303,191690599,383381198,
766762396,533524785,67049563,134099126,268198252,
536396504,72793001,145586002,291172004,582344008,
164688009,329376018,658752036,317504065,635008130,
270016253,540032506,80065005,160130010,320260020,
640520040,281040073,562080146,124160285,248320570,
:
861508356,723016705,446033403,892066806,784133605,
568267203,136534399,273068798,546137596,92275185,
184550370,369100740,738201480,476402953,952805906,
905611805,
};

If you notice that your modulo can be stored in int. MOD=1000000007(decimal) is equivalent of 0b00111011100110101100101000000111 and can be stored in 32 bits.
- i pow(2,i) bit representation
- 0 1 0b00000000000000000000000000000001
- 1 2 0b00000000000000000000000000000010
- 2 4 0b00000000000000000000000000000100
- 3 8 0b00000000000000000000000000001000
- ...
- 29 536870912 0b00100000000000000000000000000000
Tricky part starts when pow(2,i) is grater than your MOD=1000000007, but if you know that current pow(2,i) will be greater than your MOD, you can actually see how bits look like after MOD
- i pow(2,i) pow(2,i)%MOD bit representation
- 30 1073741824 73741817 0b000100011001010011000000000000
- 31 2147483648 147483634 0b001000110010100110000000000000
- 32 4294967296 294967268 0b010001100101001100000000000000
- 33 8589934592 589934536 0b100011001010011000000000000000
So if you have pow(2,i-1)%MOD you can do *2 actually on pow(2,i-1)%MOD till you're next pow(2,i) will be greater than MOD.
In example for i=34 you will use (589934536*2) MOD 1000000007 instead of (8589934592*2) MOD 1000000007, because 8589934592 can't be stored in int.
Additional you can try bit operations instead of multiplication for pow(2,i).
Bit operation same as multiplication for 2 is bit shift left.

Related

Calculating Probability C++ Bernoulli Trials

The program asks the user for the number of times to flip a coin (n; the number of trials).
A success is considered a heads.
Flawlessly, the program creates a random number between 0 and 1. 0's are considered heads and success.
Then, the program is supposed to output the expected values of getting x amount of heads. For example if the coin was flipped 4 times, what are the following probabilities using the formula
nCk * p^k * (1-p)^(n-k)
Expected 0 heads with n flips: xxx
Expected 1 heads with n flips: xxx
...
Expected n heads with n flips: xxx
When doing this with "larger" numbers, the numbers come out to weird values. It happens if 15 or twenty are put into the input. I have been getting 0's and negative values for the value that should be xxx.
Debugging, I have noticed that the nCk has come out to be negative and not correct towards the upper values and beleive this is the issue. I use this formula for my combination:
double combo = fact(n)/fact(r)/fact(n-r);
here is the psuedocode for my fact function:
long fact(int x)
{
int e; // local counter
factor = 1;
for (e = x; e != 0; e--)
{
factor = factor * e;
}
return factor;
}
Any thoughts? My guess is my factorial or combo functions are exceeding the max values or something.
You haven't mentioned how is factor declared. I think you are getting integer overflows. I suggest you use double. That is because since you are calculating expected values and probabilities, you shouldn't be concerned much about precision.
Try changing your fact function to.
double fact(double x)
{
int e; // local counter
double factor = 1;
for (e = x; e != 0; e--)
{
factor = factor * e;
}
return factor;
}
EDIT:
Also to calculate nCk, you need not calculate factorials 3 times. You can simply calculate this value in the following way.
if k > n/2, k = n-k.
n(n-1)(n-2)...(n-k+1)
nCk = -----------------------
factorial(k)
You're exceeding the maximum value of a long. Factorial grows so quickly that you need the right type of number--what type that is will depend on what values you need.
Long is an signed integer, and as soon as you pass 2^31, the value will become negative (it's using 2's complement math).
Using an unsigned long will buy you a little time (one more bit), but for factorial, it's probably not worth it. If your compiler supports long long, then try an "unsigned long long". That will (usually, depends on compiler and CPU) double the number of bits you're using.
You can also try switching to use double. The problem you'll face there is that you'll lose accuracy as the numbers increase. A double is a floating point number, so you'll have a fixed number of significant digits. If your end result is an approximation, this may work okay, but if you need exact values, it won't work.
If none of these solutions will work for you, you may need to resort to using an "infinite precision" math package, which you should be able to search for. You didn't say if you were using C or C++; this is going to be a lot more pleasant with C++ as it will provide a class that acts like a number and that would use standard arithmetic operators.

Sieve of eratosthenes : bit wise optimized

After searching the net I came to know that the bit-wise version of the sieve of eratosthenes is pretty efficient.
The problem is I am unable to understand the math/method it is using.
The version that I have been busy with looks like this:
#define MAX 100000000
#define LIM 10000
unsigned flag[MAX>>6]={0};
#define ifc(n) (flag[n>>6]&(1<<((n>>1)&31))) //LINE 1
#define isc(n) (flag[n>>6]|=(1<<((n>>1)&31))) //LINE 2
void sieve() {
unsigned i, j, k;
for(i=3; i<LIM; i+=2)
if(!ifc(i))
for(j=i*i, k=i<<1; j<LIM*LIM; j+=k)
isc(j);
}
Points that I understood (Please correct me if I am wrong):
Statement in line 1 checks if the number is composite.
Statement in line 2 marks the number 'n' as composite.
The program is storing the value 0 or 1 at a bit of an int. This tends to reduce the memory usage to x/32. (x is the size that would have been used had an int been used to store the 0 or 1 instead of a bit like in my solution above)
Points that are going above my head as of now :
How is the finction in LINE 1 functioning.How is the function making sure that the number is composite or not.
How is function in LINE 2 setting the bit.
I also came to know that the bitwise sieve is timewise efficient as
well. Is it because of the use of bitwise operators only or
something else is contributing to it as well.
Any ideas or suggestions?
Technically, there is a bug in the code as well:
unsigned flag[MAX>>6]={0};
divides MAX by 64, but if MAX is not an exact multiple of 64, the array is one element short.
Line 1: Let's pick it apart:
(flag[n>>6]&(1<<((n>>1)&31)))
The flag[n>>6] (n >> 6 = n / 64) gives the 32-bit integer that holds the bit value for n / 2.
Since only "Odd" numbers are possible primes, divide n by two: (n>>1).
The 1<<((n>>1)&31) gives us the bit corresponding to n/2 within the 0..31 - (& 31 makes sure that it's "in range").
Finally, use & to combine the value on the left with the value on the right.
So, the result is true if element for n has bit number n modulo 32 set.
The second line is essentially the same concept, just that it uses |= (or equal) to set the bit corresponding to the multiple.

Analysis of the usage of prime numbers in hash functions

I was studying hash-based sort and I found that using prime numbers in a hash function is considered a good idea, because multiplying each character of the key by a prime number and adding the results up would produce a unique value (because primes are unique) and a prime number like 31 would produce better distribution of keys.
key(s)=s[0]*31(len–1)+s[1]*31(len–2)+ ... +s[len–1]
Sample code:
public int hashCode( )
{
int h = hash;
if (h == 0)
{
for (int i = 0; i < chars.length; i++)
{
h = MULT*h + chars[i];
}
hash = h;
}
return h;
}
I would like to understand why the use of even numbers for multiplying each character is a bad idea in the context of this explanation below (found on another forum; it sounds like a good explanation, but I'm failing to grasp it). If the reasoning below is not valid, I would appreciate a simpler explanation.
Suppose MULT were 26, and consider
hashing a hundred-character string.
How much influence does the string's
first character have on the final
value of 'h'? The first character's value
will have been multiplied by MULT 99
times, so if the arithmetic were done
in infinite precision the value would
consist of some jumble of bits
followed by 99 low-order zero bits --
each time you multiply by MULT you
introduce another low-order zero,
right? The computer's finite
arithmetic just chops away all the
excess high-order bits, so the first
character's actual contribution to 'h'
is ... precisely zero! The 'h' value
depends only on the rightmost 32
string characters (assuming a 32-bit
int), and even then things are not
wonderful: the first of those final 32
bytes influences only the leftmost bit
of `h' and has no effect on the
remaining 31. Clearly, an even-valued
MULT is a poor idea.
I think it's easier to see if you use 2 instead of 26. They both have the same effect on the lowest-order bit of h. Consider a 33 character string of some character c followed by 32 zero bytes (for illustrative purposes). Since the string isn't wholly null you'd hope the hash would be nonzero.
For the first character, your computed hash h is equal to c[0]. For the second character, you take h * 2 + c[1]. So now h is 2*c[0]. For the third character h is now h*2 + c[2] which works out to 4*c[0]. Repeat this 30 more times, and you can see that the multiplier uses more bits than are available in your destination, meaning effectively c[0] had no impact on the final hash at all.
The end math works out exactly the same with a different multiplier like 26, except that the intermediate hashes will modulo 2^32 every so often during the process. Since 26 is even it still adds one 0 bit to the low end each iteration.
This hash can be described like this (here ^ is exponentiation, not xor).
hash(string) = sum_over_i(s[i] * MULT^(strlen(s) - i - 1)) % (2^32).
Look at the contribution of the first character. It's
(s[0] * MULT^(strlen(s) - 1)) % (2^32).
If the string is long enough (strlen(s) > 32) then this is zero.
Other people have posted the answer -- if you use an even multiple, then only the last characters in the string matter for computing the hash, as the early character's influence will have shifted out of the register.
Now lets consider what happens when you use a multiplier like 31. Well, 31 is 32-1 or 2^5 - 1. So when you use that, your final hash value will be:
\sum{c_i 2^{5(len-i)} - \sum{c_i}
unfortunately stackoverflow doesn't understad TeX math notation, so the above is hard to understand, but its two summations over the characters in the string, where the first one shifts each character by 5 bits for each subsequent character in the string. So using a 32-bit machine, that will shift off the top for all except the last seven characters of the string.
The upshot of this is that using a multiplier of 31 means that while characters other than the last seven have an effect on the string, its completely independent of their order. If you take two strings that have the same last 7 characters, for which the other characters also the same but in a different order, you'll get the same hash for both. You'll also get the same hash for things like "az" and "by" other than in the last 7 chars.
So using a prime multiplier, while much better than an even multiplier, is still not very good. Better is to use a rotate instruction, which shifts the bits back into the bottom when they shift out the top. Something like:
public unisgned hashCode(string chars)
{
unsigned h = 0;
for (int i = 0; i < chars.length; i++) {
h = (h<<5) + (h>>27); // ROL by 5, assuming 32 bits here
h += chars[i];
}
return h;
}
Of course, this depends on your compiler being smart enough to recognize the idiom for a rotate instruction and turn it into a single instruction for maximum efficiency.
This also still has the problem that swapping 32-character blocks in the string will give the same hash value, so its far from strong, but probably adequate for most non-cryptographic purposes
would produce a unique value
Stop right there. Hashes are not unique. A good hash algorithm will minimize collisions, but the pigeonhole principle assures us that perfectly avoiding collisions is not possible (for any datatype with non-trivial information content).

C/C++ Bit Array or Bit Vector

I am learning C/C++ programming & have encountered the usage of 'Bit arrays' or 'Bit Vectors'. Am not able to understand their purpose? here are my doubts -
Are they used as boolean flags?
Can one use int arrays instead? (more memory of course, but..)
What's this concept of Bit-Masking?
If bit-masking is simple bit operations to get an appropriate flag, how do one program for them? is it not difficult to do this operation in head to see what the flag would be, as apposed to decimal numbers?
I am looking for applications, so that I can understand better. for Eg -
Q. You are given a file containing integers in the range (1 to 1 million). There are some duplicates and hence some numbers are missing. Find the fastest way of finding missing
numbers?
For the above question, I have read solutions telling me to use bit arrays. How would one store each integer in a bit?
I think you've got yourself confused between arrays and numbers, specifically what it means to manipulate binary numbers.
I'll go about this by example. Say you have a number of error messages and you want to return them in a return value from a function. Now, you might label your errors 1,2,3,4... which makes sense to your mind, but then how do you, given just one number, work out which errors have occured?
Now, try labelling the errors 1,2,4,8,16... increasing powers of two, basically. Why does this work? Well, when you work base 2 you are manipulating a number like 00000000 where each digit corresponds to a power of 2 multiplied by its position from the right. So let's say errors 1, 4 and 8 occur. Well, then that could be represented as 00001101. In reverse, the first digit = 1*2^0, the third digit 1*2^2 and the fourth digit 1*2^3. Adding them all up gives you 13.
Now, we are able to test if such an error has occured by applying a bitmask. By example, if you wanted to work out if error 8 has occured, use the bit representation of 8 = 00001000. Now, in order to extract whether or not that error has occured, use a binary and like so:
00001101
& 00001000
= 00001000
I'm sure you know how an and works or can deduce it from the above - working digit-wise, if any two digits are both 1, the result is 1, else it is 0.
Now, in C:
int func(...)
{
int retval = 0;
if ( sometestthatmeans an error )
{
retval += 1;
}
if ( sometestthatmeans an error )
{
retval += 2;
}
return retval
}
int anotherfunc(...)
{
uint8_t x = func(...)
/* binary and with 8 and shift 3 plaes to the right
* so that the resultant expression is either 1 or 0 */
if ( ( ( x & 0x08 ) >> 3 ) == 1 )
{
/* that error occurred */
}
}
Now, to practicalities. When memory was sparse and protocols didn't have the luxury of verbose xml etc, it was common to delimit a field as being so many bits wide. In that field, you assign various bits (flags, powers of 2) to a certain meaning and apply binary operations to deduce if they are set, then operate on these.
I should also add that binary operations are close in idea to the underlying electronics of a computer. Imagine if the bit fields corresponded to the output of various circuits (carrying current or not). By using enough combinations of said circuits, you make... a computer.
regarding the usage the bits array :
if you know there are "only" 1 million numbers - you use an array of 1 million bits. in the beginning all bits will be zero and every time you read a number - use this number as index and change the bit in this index to be one (if it's not one already).
after reading all numbers - the missing numbers are the indices of the zeros in the array.
for example, if we had only numbers between 0 - 4 the array would look like this in the beginning: 0 0 0 0 0.
if we read the numbers : 3, 2, 2
the array would look like this: read 3 --> 0 0 0 1 0. read 3 (again) --> 0 0 0 1 0. read 2 --> 0 0 1 1 0. check the indices of the zeroes: 0,1,4 - those are the missing numbers
BTW, of course you can use integers instead of bits but it may take (depends on the system) 32 times memory
Sivan
Bit Arrays or Bit Vectors can be though as an array of boolean values. Normally a boolean variable needs at least one byte storage, but in a bit array/vector only one bit is needed.
This gets handy if you have lots of such data so you save memory at large.
Another usage is if you have numbers which do not exactly fit in standard variables which are 8,16,32 or 64 bit in size. You could this way store into a bit vector of 16 bit a number which consists of 4 bit, one that is 2 bit and one that is 10 bits in size. Normally you would have to use 3 variables with sizes of 8,8 and 16 bit, so you only have 50% of storage wasted.
But all these uses are very rarely used in business aplications, the come to use often when interfacing drivers through pinvoke/interop functions and doing low level programming.
Bit Arrays of Bit Vectors are used as a mapping from position to some bit value. Yes it's basically the same thing as an array of Bool, but typical Bool implementation is one to four bytes long and it uses too much space.
We can store the same amount of data much more efficiently by using arrays of words and binary masking operations and shifts to store and retrieve them (less overall memory used, less accesses to memory, less cache miss, less memory page swap). The code to access individual bits is still quite straightforward.
There is also some bit field support builtin in C language (you write things like int i:1; to say "only consume one bit") , but it is not available for arrays and you have less control of the overall result (details of implementation depends on compiler and alignment issues).
Below is a possible way to answer to your "search missing numbers" question. I fixed int size to 32 bits to keep things simple, but it could be written using sizeof(int) to make it portable. And (depending on the compiler and target processor) the code could only be made faster using >> 5 instead of / 32 and & 31 instead of % 32, but that is just to give the idea.
#include <stdio.h>
#include <errno.h>
#include <stdint.h>
int main(){
/* put all numbers from 1 to 1000000 in a file, except 765 and 777777 */
{
printf("writing test file\n");
int x = 0;
FILE * f = fopen("testfile.txt", "w");
for (x=0; x < 1000000; ++x){
if (x == 765 || x == 777760 || x == 777791){
continue;
}
fprintf(f, "%d\n", x);
}
fprintf(f, "%d\n", 57768); /* this one is a duplicate */
fclose(f);
}
uint32_t bitarray[1000000 / 32];
/* read file containing integers in the range [1,1000000] */
/* any non number is considered as separator */
/* the goal is to find missing numbers */
printf("Reading test file\n");
{
unsigned int x = 0;
FILE * f = fopen("testfile.txt", "r");
while (1 == fscanf(f, " %u",&x)){
bitarray[x / 32] |= 1 << (x % 32);
}
fclose(f);
}
/* find missing number in bitarray */
{
int x = 0;
for (x=0; x < (1000000 / 32) ; ++x){
int n = bitarray[x];
if (n != (uint32_t)-1){
printf("Missing number(s) between %d and %d [%x]\n",
x * 32, (x+1) * 32, bitarray[x]);
int b;
for (b = 0 ; b < 32 ; ++b){
if (0 == (n & (1 << b))){
printf("missing number is %d\n", x*32+b);
}
}
}
}
}
}
That is used for bit flags storage, as well as for parsing different binary protocols fields, where 1 byte is divided into a number of bit-fields. This is widely used, in protocols like TCP/IP, up to ASN.1 encodings, OpenPGP packets, and so on.

Is there any alternative to using % (modulus) in C/C++?

I read somewhere once that the modulus operator is inefficient on small embedded devices like 8 bit micro-controllers that do not have integer division instruction. Perhaps someone can confirm this but I thought the difference is 5-10 time slower than with an integer division operation.
Is there another way to do this other than keeping a counter variable and manually overflowing to 0 at the mod point?
const int FIZZ = 6;
for(int x = 0; x < MAXCOUNT; x++)
{
if(!(x % FIZZ)) print("Fizz\n"); // slow on some systems
}
vs:
The way I am currently doing it:
const int FIZZ = 6;
int fizzcount = 1;
for(int x = 1; x < MAXCOUNT; x++)
{
if(fizzcount >= FIZZ)
{
print("Fizz\n");
fizzcount = 0;
}
}
Ah, the joys of bitwise arithmetic. A side effect of many division routines is the modulus - so in few cases should division actually be faster than modulus. I'm interested to see the source you got this information from. Processors with multipliers have interesting division routines using the multiplier, but you can get from division result to modulus with just another two steps (multiply and subtract) so it's still comparable. If the processor has a built in division routine you'll likely see it also provides the remainder.
Still, there is a small branch of number theory devoted to Modular Arithmetic which requires study if you really want to understand how to optimize a modulus operation. Modular arithmatic, for instance, is very handy for generating magic squares.
So, in that vein, here's a very low level look at the math of modulus for an example of x, which should show you how simple it can be compared to division:
Maybe a better way to think about the problem is in terms of number
bases and modulo arithmetic. For example, your goal is to compute DOW
mod 7 where DOW is the 16-bit representation of the day of the
week. You can write this as:
DOW = DOW_HI*256 + DOW_LO
DOW%7 = (DOW_HI*256 + DOW_LO) % 7
= ((DOW_HI*256)%7 + (DOW_LO % 7)) %7
= ((DOW_HI%7 * 256%7) + (DOW_LO%7)) %7
= ((DOW_HI%7 * 4) + (DOW_LO%7)) %7
Expressed in this manner, you can separately compute the modulo 7
result for the high and low bytes. Multiply the result for the high by
4 and add it to the low and then finally compute result modulo 7.
Computing the mod 7 result of an 8-bit number can be performed in a
similar fashion. You can write an 8-bit number in octal like so:
X = a*64 + b*8 + c
Where a, b, and c are 3-bit numbers.
X%7 = ((a%7)*(64%7) + (b%7)*(8%7) + c%7) % 7
= (a%7 + b%7 + c%7) % 7
= (a + b + c) % 7
since 64%7 = 8%7 = 1
Of course, a, b, and c are
c = X & 7
b = (X>>3) & 7
a = (X>>6) & 7 // (actually, a is only 2-bits).
The largest possible value for a+b+c is 7+7+3 = 17. So, you'll need
one more octal step. The complete (untested) C version could be
written like:
unsigned char Mod7Byte(unsigned char X)
{
X = (X&7) + ((X>>3)&7) + (X>>6);
X = (X&7) + (X>>3);
return X==7 ? 0 : X;
}
I spent a few moments writing a PIC version. The actual implementation
is slightly different than described above
Mod7Byte:
movwf temp1 ;
andlw 7 ;W=c
movwf temp2 ;temp2=c
rlncf temp1,F ;
swapf temp1,W ;W= a*8+b
andlw 0x1F
addwf temp2,W ;W= a*8+b+c
movwf temp2 ;temp2 is now a 6-bit number
andlw 0x38 ;get the high 3 bits == a'
xorwf temp2,F ;temp2 now has the 3 low bits == b'
rlncf WREG,F ;shift the high bits right 4
swapf WREG,F ;
addwf temp2,W ;W = a' + b'
; at this point, W is between 0 and 10
addlw -7
bc Mod7Byte_L2
Mod7Byte_L1:
addlw 7
Mod7Byte_L2:
return
Here's a liitle routine to test the algorithm
clrf x
clrf count
TestLoop:
movf x,W
RCALL Mod7Byte
cpfseq count
bra fail
incf count,W
xorlw 7
skpz
xorlw 7
movwf count
incfsz x,F
bra TestLoop
passed:
Finally, for the 16-bit result (which I have not tested), you could
write:
uint16 Mod7Word(uint16 X)
{
return Mod7Byte(Mod7Byte(X & 0xff) + Mod7Byte(X>>8)*4);
}
Scott
If you are calculating a number mod some power of two, you can use the bit-wise and operator. Just subtract one from the second number. For example:
x % 8 == x & 7
x % 256 == x & 255
A few caveats:
This only works if the second number is a power of two.
It's only equivalent if the modulus is always positive. The C and C++ standards don't specify the sign of the modulus when the first number is negative (until C++11, which does guarantee it will be negative, which is what most compilers were already doing). A bit-wise and gets rid of the sign bit, so it will always be positive (i.e. it's a true modulus, not a remainder). It sounds like that's what you want anyway though.
Your compiler probably already does this when it can, so in most cases it's not worth doing it manually.
There is an overhead most of the time in using modulo that are not powers of 2.
This is regardless of the processor as (AFAIK) even processors with modulus operators are a few cycles slower for divide as opposed to mask operations.
For most cases this is not an optimisation that is worth considering, and certainly not worth calculating your own shortcut operation (especially if it still involves divide or multiply).
However, one rule of thumb is to select array sizes etc. to be powers of 2.
so if calculating day of week, may as well use %7 regardless
if setting up a circular buffer of around 100 entries... why not make it 128. You can then write % 128 and most (all) compilers will make this & 0x7F
Unless you really need high performance on multiple embedded platforms, don't change how you code for performance reasons until you profile!
Code that's written awkwardly to optimize for performance is hard to debug and hard to maintain. Write a test case, and profile it on your target. Once you know the actual cost of modulus, then decide if the alternate solution is worth coding.
#Matthew is right. Try this:
int main() {
int i;
for(i = 0; i<=1024; i++) {
if (!(i & 0xFF)) printf("& i = %d\n", i);
if (!(i % 0x100)) printf("mod i = %d\n", i);
}
}
x%y == (x-(x/y)*y)
Hope this helps.
Do you have access to any programmable hardware on the embedded device? Like counters and such? If so, you might be able to write a hardware based mod unit, instead of using the simulated %. (I did that once in VHDL. Not sure if I still have the code though.)
Mind you, you did say that division was 5-10 times faster. Have you considered doing a division, multiplication, and subtraction to simulated the mod? (Edit: Misunderstood the original post. I did think it was odd that division was faster than mod, they are the same operation.)
In your specific case, though, you are checking for a mod of 6. 6 = 2*3. So you could MAYBE get some small gains if you first checked if the least significant bit was a 0. Something like:
if((!(x & 1)) && (x % 3))
{
print("Fizz\n");
}
If you do that, though, I'd recommend confirming that you get any gains, yay for profilers. And doing some commenting. I'd feel bad for the next guy who has to look at the code otherwise.
You should really check the embedded device you need. All the assembly language I have seen (x86, 68000) implement the modulus using a division.
Actually, the division assembly operation returns the result of the division and the remaining in two different registers.
In the embedded world, the "modulus" operations you need to do are often the ones that break down nicely into bit operations that you can do with &, | and sometimes >>.
#Jeff V: I see a problem with it! (Beyond that your original code was looking for a mod 6 and now you are essentially looking for a mod 8). You keep doing an extra +1! Hopefully your compiler optimizes that away, but why not just test start at 2 and go to MAXCOUNT inclusive? Finally, you are returning true every time that (x+1) is NOT divisible by 8. Is that what you want? (I assume it is, but just want to confirm.)
For modulo 6 you can change the Python code to C/C++:
def mod6(number):
while number > 7:
number = (number >> 3 << 1) + (number & 0x7)
if number > 5:
number -= 6
return number
Not that this is necessarily better, but you could have an inner loop which always goes up to FIZZ, and an outer loop which repeats it all some certain number of times. You've then perhaps got to special case the final few steps if MAXCOUNT is not evenly divisible by FIZZ.
That said, I'd suggest doing some research and performance profiling on your intended platforms to get a clear idea of the performance constraints you're under. There may be much more productive places to spend your optimisation effort.
The print statement will take orders of magnitude longer than even the slowest implementation of the modulus operator. So basically the comment "slow on some systems" should be "slow on all systems".
Also, the two code snippets provided don't do the same thing. In the second one, the line
if(fizzcount >= FIZZ)
is always false so "FIZZ\n" is never printed.