How to fix Hamming weight invariants - bit-manipulation

I am learning Dafny and attempting to write a specification for the Hamming weight problem, i.e. counting the number of 1 bits in a number. I believe I have gotten the specification correct, but it still doesn't verify. For speed of verification I limited it to 8-bit numbers:
problem definition: https://leetcode.com/problems/number-of-1-bits/
function method twoPow(x: bv16): bv16
  requires 0 <= x <= 16
{
  1 << x
}

function method oneMask(n: bv16): bv16
  requires 0 <= n <= 16
  ensures oneMask(n) == twoPow(n)-1
{
  twoPow(n)-1
}

function countOneBits(n:bv8): bv8 {
  if n == 0 then 0 else (n & 1) + countOneBits(n >> 1)
}
method hammingWeight(n: bv8) returns (count: bv8)
  ensures count == countOneBits(n)
{
  count := 0;
  var i := 0;
  var n' := n;
  assert oneMask(8) as bv8 == 255; // passes
  while i < 8
    invariant 0 <= i <= 8
    invariant n' == n >> i
    invariant count == countOneBits(n & oneMask(i) as bv8);
  {
    count := count + n' & 1;
    n' := n' >> 1;
    i := i + 1;
  }
}
I have written the same code in JavaScript to test the behavior and examine the invariant values before and after the loop. I don't see any problems.
function twoPow(x) {
  return 1 << x;
}

function oneMask(n) {
  return twoPow(n) - 1;
}

function countOneBits(n) {
  return n === 0 ? 0 : (n & 1) + countOneBits(n >> 1);
}

function hammingWeight(n) {
  if (n < 0 || n > 256) throw new Error("out of range");
  console.log(`n: ${n} also ${n.toString(2)}`);
  let count = 0;
  let i = 0;
  let nprime = n;
  console.log("beforeloop", `i: ${i}`, `n' = ${nprime}`, `count: ${count}`, `oneMask: ${oneMask(i)}`, `cb: ${countOneBits(n & oneMask(i))}`);
  console.log("invariants", i >= 0 && i <= 8, nprime == n >> i, count == countOneBits(n & oneMask(i)));
  while (i < 8) {
    console.log("");
    console.log('before', `i: ${i}`, `n' = ${nprime}`, `count: ${count}`, `oneMask: ${oneMask(i)}`, `cb: ${countOneBits(n & oneMask(i))}`);
    console.log("invariants", i >= 0 && i <= 8, nprime == n >> i, count == countOneBits(n & oneMask(i)));
    count += nprime & 1;
    nprime = nprime >> 1;
    i++;
    console.log('Afterloop', `i: ${i}`, `n' = ${nprime}`, `count: ${count}`, `oneMask: ${oneMask(i)}`, `cb: ${countOneBits(n & oneMask(i))}`);
    console.log("invariants", i >= 0 && i <= 8, nprime == n >> i, count == countOneBits(n & oneMask(i)));
  }
  return count;
}

hammingWeight(128);
All invariants evaluate as true, so I must be missing something. Dafny says invariant count == countOneBits(n & oneMask(i) as bv8); might not be maintained by the loop, yet running the JavaScript shows that they are all true. Is it due to the cast of oneMask to bv8?
edit:
I replaced the mask function with one that didn't require casting, but that still did not resolve the problem.
function method oneMaskOr(n: bv8): bv8
  requires 0 <= n <= 8
  ensures oneMaskOr(n) as bv16 == oneMask(n as bv16)
{
  if n == 0 then 0 else (1 << (n-1)) | oneMaskOr(n-1)
}
One interesting thing I found is that it shows me a counterexample where it has reached the end of the loop and the final bit of the input variable n is set, i.e. values 128 or greater. But when I add an assertion above the loop that the value equals the count at the end of the loop, it then shows me another value of n.
assert 1 == countOneBits(128 & oneMaskOr(8)); // counterexample -> 192
assert 2 == countOneBits(192 & oneMaskOr(8)); // counterexample -> 160
So it seems like it isn't evaluating the loop invariant after the end of the loop? I thought the whole point of the invariants was that they are evaluated after the end of the loop.
Edit 2:
I figured it out: apparently adding an explicit decreases clause to the while loop fixed it. I don't get it, though; I thought Dafny could figure this out.
while i < 8
  invariant 0 <= i <= 8
  invariant n' == n >> i
  invariant count == countOneBits(n & oneMask(i) as bv8);
  decreases 8 - i
{
I see one line in the docs for loop termination saying
If the decreases clause of a loop specifies *, then no termination check will be performed. Use of this feature is sound only with respect to partial correctness.
So if the decreases clause is missing, does it default to *?

After playing around, I did find a version which passes, though it required reworking countOneBits() so that its recursion followed the order of iteration:
function countOneBits(n: bv8, i: int, j: int): bv8
  requires i >= 0 && i <= j && j <= 8
  decreases 8 - i
{
  if i == j then 0
  else (n & 1) + countOneBits(n >> 1, i+1, j)
}

method hammingWeight(n: bv8) returns (count: bv8)
  ensures count == countOneBits(n, 0, 8)
{
  count := 0;
  var i := 0;
  var n' := n;
  //
  assert count == countOneBits(n, 0, i);
  //
  while i < 8
    invariant 0 <= i <= 8;
    invariant n' == n >> i;
    invariant count == countOneBits(n, 0, i);
  {
    count := (n' & 1) + count;
    n' := n' >> 1;
    i := i + 1;
  }
}
The intuition here is that countOneBits(n,i,j) returns the number of 1 bits between i (inclusive) and j (exclusive). This then reflects what the loop is doing as we increase i.
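As a quick sanity check of the reworked function, in the same spirit as the JavaScript harness above, here is a small C++ sketch that mirrors the three-argument countOneBits and the loop invariant. The names are simply copied from the Dafny code; this is only for experimenting outside the verifier, not part of the proof.

#include <cassert>
#include <cstdint>

// Mirrors the three-argument Dafny countOneBits: counts the 1 bits of n
// from position i (inclusive) to position j (exclusive).
uint8_t countOneBits(uint8_t n, int i, int j) {
    return i == j ? 0 : (n & 1) + countOneBits(n >> 1, i + 1, j);
}

int main() {
    uint8_t n = 0b10110010;          // any 8-bit test value
    uint8_t count = 0, nprime = n;
    for (int i = 0; i < 8; ++i) {
        assert(count == countOneBits(n, 0, i));  // the loop invariant
        count += nprime & 1;
        nprime >>= 1;
    }
    assert(count == countOneBits(n, 0, 8));      // the postcondition
}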

Related

How to invoke lemma in dafny in bubblesort example?

How do I invoke a lemma as the reasoning for an equality to be true? Consider the following example in Dafny, where I've created a lemma saying that the sum from 0 to n is n choose 2. Dafny has no problem verifying this lemma. I want to use it to say that the number of swaps in bubble sort has an upper bound of the sum from 0 to n, which is equivalent to n choose 2. That being the case, I currently get an error when I try to say it's true by the lemma. Why is this happening, and how can I say that the equality is true given a lemma?
method BubbleSort(a: array?<int>) returns (n: nat)
  modifies a
  requires a != null
  ensures n <= (a.Length * (a.Length - 1))/2
{
  var i := a.Length - 1;
  n := 0;
  while (i > 0)
    invariant 0 <= i < a.Length
  {
    var j := 0;
    while (j < i)
      invariant j <= i
      invariant n <= SumRange(i, a.Length)
    {
      if (a[j] > a[j+1])
      {
        a[j], a[j+1] := a[j+1], a[j];
        n := n + 1;
      }
      j := j + 1;
    }
    i := i - 1;
  }
  assert n <= SumRange(i, a.Length) == (a.Length * (a.Length - 1))/2 by {SumRangeNChoose2(a.Length)};
  assert n <= (a.Length * (a.Length - 1))/2
}

function SumRange(lo: int, hi: int): int
  decreases hi - lo
{
  if lo == hi then hi
  else if lo >= hi then 0
  else SumRange(lo, hi - 1) + hi
}

lemma SumRangeNChoose2(n: nat)
  ensures SumRange(0, n) == (n * (n - 1))/2
{}
You just have a syntax error. There should be no semicolon after the } on the second to last line of BubbleSort. Also, there should be a semicolon after the assertion on the following line.
After you fix the syntax errors, there are several deeper errors in the code about missing invariants, etc. But they can all be fixed using the annotations from my answer to your other question.

How do I reduce the repeated use of the % operator for faster execution in C

This is the code:
for (i = 1; i <= 1000000; i++) {
    for (j = 1; j <= 1000000; j++) {
        for (k = 1; k <= 1000000; k++) {
            if (i % j == k && j % k == 0)
                count++;
        }
    }
}
Or is it better to reduce any % operation that runs up to a million times in any program?
edit - I am sorry, I initialized with 0; let's say i = 1, OK!
Now, if I reduce the third loop as in #darshan's answer, then both the first and second loops can still run up to N times, and it still calculates % about n*n times, e.g. 2021 mod 2022, then 2021 mod 2023, and so on.
So my question is: % (modulus) is twice (and maybe more) as heavy as + and -, so is there any other logic that can be used here as an alternative, giving the same answer as this logic?
Thank you so much for the knowledgeable comments & help.
The question is:
A triple of integers (A, B, C) is considered to be special if it satisfies the
following properties for a given integer N:
A mod B=C
B mod C=0
1≤A,B,C≤N
I'm curious whether there is any smarter solution which greatly reduces the time complexity.
A much more efficient version is the one below, but I think it can be optimized much more.
First of all, the modulo (%) operator is quite expensive, so try to avoid it on a large scale.
for (i = 0; i <= 1000000; i++)
    for (j = 0; j <= 1000000; j++)
    {
        a = i % j;
        for (k = 0; k <= j; k++)
            if (a == k && j % k == 0)
                count++;
    }
We placed a = i%j in the second loop because there is no need to recalculate it every time k changes, as it is independent of k; and for the condition j%k == 0 to be true, k must be <= j, hence the changed loop bound.
First of all, your code has undefined behavior due to division by zero: when k is zero then j%k is undefined, so I assume that all your loops should start with 1 and not 0.
Usually the % and the / operators are much slower to execute than any other operation. It is possible to get rid of most invocations of the % operators in your code by several simple steps.
First, look at the if line:
if (i % j == k && j%k == 0)
The i % j == k has a very strict constraint on k which plays into your hands. It means that it is pointless to iterate over k at all, since there is only one value of k that passes this condition.
for (i = 1; i <= 1000000; i++) {
    for (j = 1; j <= 1000000; j++) {
        k = i % j;
        // Constrain k to the range of the original loop.
        if (k <= 1000000 && k > 0 && j % k == 0)
            count++;
    }
}
To get rid of i % j, switch the loops. This change is possible since the result is affected only by which combinations of i, j are tested, not by the order in which they are visited.
for (j = 1; j <= 1000000; j++) {
    for (i = 1; i <= 1000000; i++) {
        k = i % j;
        // Constrain k to the range of the original loop.
        if (k <= 1000000 && k > 0 && j % k == 0)
            count++;
    }
}
Here it is easy to observe how k behaves, and to use that in order to iterate over k directly, without iterating over i, thus getting rid of i%j. k iterates from 1 to j-1 and then does it again and again. So all we have to do is iterate over k directly in the loop of i. Note that i%j for j == 1 is always 0, and since k==0 does not pass the condition of the if, we can safely start with j=2, skipping 1:
for (j = 2; j <= 1000000; j++) {
    for (i = 1, k = 1; i <= 1000000; i++, k++) {
        if (k == j)
            k = 0;
        // Constrain k to the range of the original loop.
        if (k <= 1000000 && k > 0 && j % k == 0)
            count++;
    }
}
It is still wasteful to run j%k repeatedly for the same values of j, k (remember that k repeats several times in the inner loop). For example, for j=3 the values of i and k go {1,1}, {2,2}, {3,0}, {4,1}, {5,2}, {6,0}, ..., {n*3, 0}, {n*3+1, 1}, {n*3+2, 2}, ... (for any value of n in the range 0 < n <= (1000000-2)/3).
The values beyond n= floor((1000000-2)/3)== 333332 are tricky - let's have a look. For this value of n, i=333332*3=999996 and k=0, so the last iteration of {i,k}: {n*3,0},{n*3+1,1},{n*3+2, 2} becomes {999996, 0}, {999997, 1}, {999998, 2}. You don't really need to iterate over all these values of n since each of them does exactly the same thing. All you have to do is to run it only once and multiply by the number of valid n values (which is 999996+1 in this case - adding 1 to include n=0).
Since that did not cover all elements, you need to continue the remainder of the values: {999999, 0}, {1000000, 1}. Notice that unlike other iterations, there is no third value, since it would set i out-of-range.
for (int j = 2; j <= 1000000; j++) {
    if (j % 1000 == 0) std::cout << std::setprecision(2) << (double)j*100/1000000 << "% \r" << std::flush;

    int innerCount = 0;
    for (int k = 1; k < j; k++) {
        if (j % k == 0)
            innerCount++;
    }
    int innerLoopRepeats = 1000000 / j;
    count += innerCount * innerLoopRepeats;

    // complete the remainder:
    for (int k = 1, i = j * innerLoopRepeats + 1; i <= 1000000; k++, i++) {
        if (j % k == 0)
            count++;
    }
}
This is still extremely slow, but at least it completes in less than a day.
It is possible to have a further speed up by using an important property of divisibility.
Consider the first inner loop (it's almost the same for the second inner loop),
and notice that it does a lot of redundant work, and does it expensively.
Namely, if j%k==0, it means that k divides j and that there is pairK such that pairK*k==j.
It is trivial to calculate the pair of k: pairK=j/k.
Obviously, for k > sqrt(j) there is pairK < sqrt(j). This implies that any k > sqrt(j) can be extracted simply
by scanning all k < sqrt(j). This feature lets you loop over only a square root of all interesting values of k.
Searching only about sqrt(j) values gives a huge performance boost, and the whole program can finish in seconds.
Here is a view of the second inner loop:
// complete the remainder:
for (int k = 1, i = j * innerLoopRepeats + 1; i <= 1000000 && k*k <= j; k++, i++) {
    if (j % k == 0)
    {
        count++;
        int pairI = j * innerLoopRepeats + j / k;
        if (pairI != i && pairI <= 1000000) {
            count++;
        }
    }
}
The first inner loop has to go over a similar transformation.
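For illustration, here is one way that transformation of the first inner loop might look (a sketch along the same lines as the code above: it counts the divisors of j that are strictly less than j, scanning only up to sqrt(j)):

// Count k in [1, j-1] with j % k == 0, but scan only k*k <= j.
int innerCount = 0;
for (int k = 1; k * k <= j; k++) {
    if (j % k == 0) {
        innerCount++;                  // k itself (k <= sqrt(j) < j for j >= 2)
        int pairK = j / k;             // the matching divisor
        if (pairK != k && pairK < j)   // skip double-counting and pairK == j (from k == 1)
            innerCount++;
    }
}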
Just reorder the iteration and calculate A based on the constraints:
void findAllSpecial(int N, void (*f)(int A, int B, int C))
{
    // 1 ≤ A,B,C ≤ N
    for (int C = 1; C < N; ++C) {
        // B mod C = 0
        for (int B = C; B < N; B += C) {
            // A mod B = C
            for (int A = C; A < N; A += B) {
                f(A, B, C);
            }
        }
    }
}
No divisions or modulo at all; just for loops and addition.
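If only the count is needed rather than the triples themselves, the callback can simply increment a counter. A minimal usage sketch (assuming the findAllSpecial above; note its loops use < N, so pass N + 1 if the bounds are meant to be inclusive):

#include <iostream>

// Defined above.
void findAllSpecial(int N, void (*f)(int A, int B, int C));

long long specialCount = 0;

// Callback that just counts triples instead of printing them.
void countTriple(int, int, int) { ++specialCount; }

int main() {
    findAllSpecial(1000, countTriple);   // small N keeps the demo fast
    std::cout << specialCount << " special triples\n";
}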
Below is the obvious optimization: the third loop with k is really not needed, as there is already a many-to-one mapping from (i, j) -> k.
What I understand from the code is that you want to calculate the number of (i,j) pairs such that the (i%j) is a factor of j. Is this correct or am I missing something?

C++. Logical expression

I am trying to solve the following task: given an integer x, I want to compute two numbers a and b which are
1) divisible by 2 or 3 (e.g. a % 2 == 0 or a % 3 == 0), and
2) such that x = a + b.
I wrote the following code to find such a and b.
cin >> x;
if (x % 2 == 0) {
    a = b = x / 2;
} else {
    a = x / 2;
    b = a + 1;
}
while ((a % 3 != 0 || a % 2 != 0) && (b % 2 != 0 || b % 3 != 0)) {
    a++;
    b--;
}
However, it does not work. For example, when x is 13 it prints out that a = 6 and b = 7, but 7 is not divisible by 2 or 3. What is the problem?
Examine your continuation condition carefully (where n is some arbitrary integer, possibly different in each use, so for example a != 3n simply means that a is not a multiple of three). I'll show the process:
while((a % 3 != 0 || a % 2 != 0) && (b % 2 != 0 || b % 3 != 0))
( a != 3n OR a != 2n ) AND ( b != 2n OR b != 3n )
( a != 6n ) AND ( b != 6n )
It says: continue while a is not a multiple of both two and three, and b is not a multiple of both two and three. In other words, it will only continue while neither a nor b is a multiple of six; on the flip side, it will exit as soon as either a or b is a multiple of six.
Since an input value of 13 sets a = 6 and b = 7, the continuation condition is false on the very first check (six is a multiple of six), so the loop exits immediately and never fixes up b.
Perhaps it would be better to rethink the way you figure out whether certain number combinations are valid (a). For example (assuming the numbers have to be between 1 and N - 1 since, otherwise, your solution space is likely to be infinite), you could use something like:
#include <iostream>

int main() {
    // Get the number.
    int num;
    std::cout << "Number? ";
    std::cin >> num;

    // Check all i + j = n for 1 <= i,j < n.
    for (int i = 1, j = num - 1; i < j; ++i, --j) {
        // Disregard if either number not a multiple of 2 or 3.
        if ((i % 2) != 0 && (i % 3) != 0) continue;
        if ((j % 2) != 0 && (j % 3) != 0) continue;

        std::cout << num << " => " << i << ", " << j << "\n";
        return 0;
    }

    std::cout << num << " => no solution\n";
    return 0;
}
Note my use of i < j as the for continuation condition; this assumes they have to be distinct numbers. If they're allowed to be the same number, change that to i <= j.
(a) Using all of and, or, and not (even implicitly, by the reversal of continuation and exit conditions) is sometimes more trouble than it's worth, since De Morgan's theorems tend to come into play:
¬(A ∩ B) ⇔ ¬A ∪ ¬B : (not(A and B)) is ((not A) or (not B))
¬(A ∪ B) ⇔ ¬A ∩ ¬B : (not(A or B)) is ((not A) and (not B))
In cases like that, code can become more readable if you break out the individual checks.
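For example, a small hypothetical helper (not in the original code) keeps the intent visible and sidesteps the De Morgan confusion entirely:

// Hypothetical helper: true when v is divisible by 2 or by 3.
bool isMultipleOf2Or3(int v) {
    return v % 2 == 0 || v % 3 == 0;
}

// The loop condition then reads directly as "keep going until both are valid":
// while (!isMultipleOf2Or3(a) || !isMultipleOf2Or3(b)) { a++; b--; }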
Interestingly, if you run that code with quite a few input values, you'll see the pattern that, if a solution exists, one of the numbers in that solution is always two or three.
That's because, other than the pathological cases of sums less than five (or less than four if the solution is allowed to have identical numbers):
every even number 2n, n > 1 is the sum of 2 and 2n - 2, both multiples of two (2n - 2 = 2(n - 1)); and
every odd number 2n + 1, n > 2 is the sum of 3 and 2n + 1 - 3, the first a multiple of three and the second a multiple of two (2n + 1 - 3 = 2n - 2 = 2(n - 1)).
So, in reality, no loop is required:
if (num < 5) { // 4 if allowing duplicates.
    std::cout << num << " => no solution\n";
} else {
    int first = (num % 2) == 0 ? 2 : 3;
    std::cout << num << " => " << first << ", " << (num - first) << "\n";
}
This actually gives a different result for some numbers, such as 17 = 2 + 15 = 3 + 14 but both solutions are still correct.

Can someone thoroughly explain this IsItPrime Function?

This is a function I found on the Code Review site; it determines whether a number is prime, while also handling negative numbers. There are a few things I can't quite follow.
1) Why is the first condition <= 3 when it's supposed to deal with negatives?
2) What does return n > 1 actually return? And does it, in any way, affect the other conditions?
bool IsItPrime(int n)
{
    if (n <= 3) {
        return n > 1;
    } else if (n % 2 == 0 || n % 3 == 0) {
        return false;
    } else {
        for (int i(5); i * i <= n; i += 6) {
            if (n % i == 0 || n % (i + 2) == 0) {
                return false;
            }
        }
        return true;
    }
}
if (n <= 3) {
    return n > 1;
}
This is clever, if not the easiest thing to reason out. First, you enter the if body if you have a number of 3 or less. This covers the first 2 prime numbers plus all negative numbers, 0, and 1. It then goes on to return n > 1; this means that if n is greater than 1 it returns true, otherwise false. So, if n is 2 or 3 the function returns true; if it is less than 2 it returns false. It would be the same as:
if (n <= 1) return false;
else if (n == 2 || n == 3) return true;
else if ...
But as you can see that is more typing and adds an if statement.
According to Wikipedia:
A prime number (or a prime) is a natural number greater than 1.
return n > 1 means: return true for any number greater than 1 (i.e. 2 or 3, because this check appears after the test if n <= 3); 0, 1, or a negative number will make it return false.
1) Why is the first condition <= 3 when it's supposed to deal with
negatives? 2) What does return n > 1 actually returns?
These questions are linked:
The function can take a negative argument, but as you can see the first condition tests whether the argument is at most 3, and within that branch whether it is greater than 1. Since 2 and 3 are prime, it returns true for them; if the argument is 1, 0, or negative, it returns false (so it doesn't do anything special for negative numbers; they simply come out as not prime).
if (n <= 3) {
    return n > 1;
}
This will return false for all numbers less than or equal to 3, except for 2 and 3. This handles negative numbers as well (any negative number will be less than 3).
else if (n % 2 == 0 || n % 3 == 0) {
    return false;
}
This does a modulus operation on your argument and returns false if the number is divisible by 2 or 3 (which would make it not prime). We're safe to do this even though 2 and 3 divide themselves, since they were already handled above.
else {
    for (int i(5); i * i <= n; i += 6) {
        if (n % i == 0 || n % (i + 2) == 0) {
            return false;
        }
    }
    return true;
}
This handles all other numbers by looping until i exceeds the square root of the number. It also does modulus operations, this time with 5 and 7, and then with each subsequent pair (i + 6, i + 8), i.e. (11, 13), (17, 19), etc. We can step by 6 because multiples of 2 and 3 were already ruled out above.
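To see it in action (assuming the IsItPrime above), a tiny driver like this exercises the interesting cases; for n = 97, for instance, the loop only tests 5 and 7 and then stops, because 11 * 11 > 97:

#include <iostream>

bool IsItPrime(int n);  // the function shown above

int main() {
    // Negatives, 0 and 1 fall into the n <= 3 branch and report false.
    for (int n : {-7, 0, 1, 2, 3, 25, 91, 97}) {
        std::cout << n << (IsItPrime(n) ? " is prime\n" : " is not prime\n");
    }
}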

Reducing time complexity

#include <iostream>

int main()
{
    int n;
    std::cin >> n; // or scanf("%d", &n);
    int temp;
    if (n == 1) temp = 1; // if n is 1 the number is a power of 2, so temp = 1
    if (n % 2 != 0 && n != 1) temp = 0; // if n is odd it can't be a power of two
    else
    {
        for (; n && n % 2 == 0; n /= 2);
        if (n > 0 && n != 1) temp = 0; // if the loop exits because of the second condition
        else temp = 1;
    }
    std::cout << temp; // or printf("%d", temp);
}
The above code checks whether a number is a power of two. The worst-case runtime complexity is O(log n). How can I optimize the code by reducing the time complexity?
Try if (n && (n & (n-1)) == 0) temp = 1; to check whether a number is a power of two or not.
For example :
n = 16;

  1 0 0 0 0   (n)
& 0 1 1 1 1   (n - 1)
_____________
  0 0 0 0 0   (yes)
A number which is a power of 2 has only one bit set.
n & (n - 1) unsets the rightmost set bit.
Running time O(1) ;-)
As #GMan noticed, n needs to be an unsigned integer; bitwise operations on negative signed integers are implementation-defined.
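A sketch of the same check with an unsigned parameter, which stays clear of that implementation-defined territory (just one possible way to write it):

// Power-of-two test on an unsigned value: true when exactly one bit is set.
bool isPowerOfTwoUnsigned(unsigned int n) {
    return n != 0 && (n & (n - 1)) == 0;
}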
How about this?
bool IsPowerOfTwo(int value)
{
    return value && ((value & (value - 1)) == 0);
}
Try this: bool isPowerOfTwo = n && !(n & (n - 1));
Instead of dividing a number by 2, you can right-shift it by 1. This is a universal optimization rule for division by 2, 4, 8, 16, 32 and so on: such a division can be replaced by a right shift of 1, 2, 3, 4, 5 and so on. For unsigned values they are mathematically equivalent.
Shift is better, because the assembler division instruction is terribly inefficient on most CPU architectures. A logical shift right instruction is executed much faster.
However, the compiler should be able to do this optimization for you, or it has a pretty bad optimizer. Style-wise it may be better to write division and modulo operators in the C code, but verify that the compiler actually optimizes them to shift operations.
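For comparison, the division/modulo loop from the question rewritten with shifts and masks might look like this (same logic, just different operators):

// Same check as the original loop, using >> and & instead of / and %.
bool isPowerOfTwoByShifting(unsigned int n) {
    if (n == 0) return false;
    while ((n & 1u) == 0)   // n % 2 == 0  <=>  lowest bit is clear
        n >>= 1;            // n /= 2
    return n == 1;          // a power of two reduces to exactly 1
}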
#include <cmath>

bool ifpower(int n)
{
    if (log2(n) == ceil(log2(n))) return true;
    else return false;
}