int a = 1/2 == 0.25 * 2;
I'm not sure why I'm not seeing this. Am I missing something with precedence?
Let's dig in:
int a = 1/2 == 0.25 * 2;
First, 1/2 evaluates to 0 (type int), and 0.25 * 2 evaluates to 0.5 (type double). Does 0 equal 0.5? No. So a receives the value 0 (false).
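The evaluation order described above can be checked with a small sketch. The helper name evaluate_expression is hypothetical and only wraps the expression from the question; the static_asserts state the operand types the explanation relies on.

```cpp
#include <type_traits>

// 1/2 has two int operands, so it is integer division and truncates to 0.
static_assert(std::is_same<decltype(1 / 2), int>::value, "1/2 is an int");
// 0.25 * 2 converts 2 to double, so the product is the double 0.5.
static_assert(std::is_same<decltype(0.25 * 2), double>::value,
              "0.25 * 2 is a double");

// Hypothetical helper: evaluates the expression from the question.
inline int evaluate_expression()
{
    // Parsed as (1/2) == (0.25*2): the int 0 is converted to 0.0
    // and compared with 0.5.
    int a = 1/2 == 0.25 * 2;
    return a;   // 0, because 0.0 != 0.5
}
```

Writing 1.0/2 instead of 1/2 would make both sides 0.5, and the comparison would then yield 1.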
Related
I have the following code to compute modulo between two floating point numbers:
auto mod(float x, float denom)
{
return x>= 0 ? std::fmod(x, denom) : denom + std::fmod(x + 1.0f, denom) - 1.0f;
}
It only works partially for negative x:
-8 0
-7.75 0.25
-7.5 0.5
-7.25 0.75
-7 1
-6.75 1.25
-6.5 1.5
-6.25 1.75
-6 2
-5.75 2.25
-5.5 2.5
-5.25 2.75
-5 3
-4.75 -0.75 <== should be 3.25
-4.5 -0.5 <== should be 3.5
-4.25 -0.25 <== should be 3.75
-4 0
-3.75 0.25
-3.5 0.5
-3.25 0.75
-3 1
-2.75 1.25
-2.5 1.5
-2.25 1.75
-2 2
-1.75 2.25
-1.5 2.5
-1.25 2.75
-1 3
-0.75 3.25
-0.5 3.5
-0.25 3.75
0 0
How can I fix it for negative x? denom is assumed to be an integer greater than 0. Note: fmod as is provided by the standard library is broken for x < 0.0f.
x is in the left column, and the output is in the right column, like so:
for(size_t k = 0; k != 65; ++k)
{
auto x = 0.25f*(static_cast<float>(k) - 32);
printf("%.8g %.8g\n", x, mod(x, 4));
}
Note: fmod as is provided by the standard library is broken for x < 0.0f
I guess you want the result to always be a non-negative value(1):
In mathematics, the result of the modulo operation is an equivalence class, and any member of the class may be chosen as representative; however, the usual representative is the least positive residue, the smallest non-negative integer that belongs to that class (i.e., the remainder of the Euclidean division).
The usual workaround was shown in Igor Tandetnik's comment, but that seems not to be enough.
@IgorTandetnik That worked. Pesky signed zero though, but I guess you cannot do anything about that.
Well, consider this(2, 3):
auto mod(double x, double denom)
{
auto const r{ std::fmod(x, denom) };
return std::copysign(r < 0 ? r + denom : r, 1);
}
1) https://en.wikipedia.org/wiki/Modulo
2) https://en.cppreference.com/w/cpp/numeric/math/copysign
3) https://godbolt.org/z/fdr9cbsYT
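As a rough check of the answer above, the sketch below repeats the same idea under the hypothetical name wrap_mod and can be tested against the rows the question reports as wrong. It assumes denom > 0, as stated in the question.

```cpp
#include <cmath>

// Same idea as the answer above, under the hypothetical name wrap_mod.
// std::fmod returns a remainder with the sign of x, so negative
// remainders are shifted up by denom. Assumes denom > 0.
inline double wrap_mod(double x, double denom)
{
    double const r = std::fmod(x, denom);
    // copysign clears the sign bit, so a remainder of -0.0 becomes +0.0.
    return std::copysign(r < 0 ? r + denom : r, 1.0);
}
```

For example, std::fmod(-4.75, 4.0) is -0.75, while wrap_mod(-4.75, 4.0) is 3.25, the value the question expects for that row.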
int sum_down(int x)
{
if (x >= 0)
{
x = x - 1;
int y = x + sum_down(x);
return y + sum_down(x);
}
else
{
return 1;
}
}
What is the smallest integer value of the parameter x such that the returned value is greater than 1.000.000?
Right now I am just doing it by trial and error, but since this question is asked on paper, I don't think I will have enough time for that. How do you visualise this quickly so that it can be solved easily? I am new to programming, so thanks in advance!
The recursion logic:
x = x - 1;
int y = x + sum_down(x);
return y + sum_down(x);
can be simplified to:
x = x - 1;
int y = x + sum_down(x) + sum_down(x);
return y;
which can be simplified to:
int y = (x-1) + sum_down(x-1) + sum_down(x-1);
return y;
which can be simplified to:
return (x-1) + 2*sum_down(x-1);
Put in mathematical form,
f(N) = (N-1) + 2*f(N-1)
with the recursion terminating when N is -1. f(-1) = 1.
Hence,
f(0) = -1 + 2*1 = 1
f(1) = 0 + 2*1 = 2
f(2) = 1 + 2*2 = 5
...
f(18) = 17 + 2*f(17) = 524269
f(19) = 18 + 2*524269 = 1048556
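The table above can also be produced by iterating the recurrence instead of recursing. This is a minimal sketch; the helper name first_n_exceeding and its limit parameter are hypothetical.

```cpp
// Iterates f(N) = (N-1) + 2*f(N-1), starting from f(-1) = 1, and returns
// the first N whose value exceeds `limit`. Hypothetical helper name.
inline int first_n_exceeding(long long limit)
{
    long long f = 1;   // f(-1)
    int n = -1;
    while (f <= limit) {
        ++n;
        f = (n - 1) + 2 * f;   // f(n) computed from f(n-1)
    }
    return n;
}
```

With a limit of 1000000 the loop stops at N = 19, matching f(18) = 524269 and f(19) = 1048556 above.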
Your program can be written this way (sorry about c#):
public static void Main()
{
int i = 0;
int j = 0;
do
{
i++;
j = sum_down(i);
Console.Out.WriteLine("j:" + j);
} while (j < 1000000);
Console.Out.WriteLine("i:" + i);
}
static int sum_down(int x)
{
if (x >= 0)
{
return x - 1 + 2 * sum_down(x - 1);
}
else
{
return 1;
}
}
So at the first iteration you'll get 2, then 5, then 12... So you can neglect the x-1 part, since it stays small compared to the multiplication.
So we have:
i = 1 => sum_down ~= 4 (real is 2)
i = 2 => sum_down ~= 8 (real is 5)
i = 3 => sum_down ~= 16 (real is 12)
i = 4 => sum_down ~= 32 (real is 27)
i = 5 => sum_down ~= 64 (real is 58)
So we can say that sum_down(x) ~= 2^(x+1). Then it's just basic math: the smallest x with 2^(x+1) > 1 000 000 is 19.
A bit late, but it's not that hard to get an exact non-recursive formula.
Write it up mathematically, as explained in other answers already:
f(-1) = 1
f(x) = 2*f(x-1) + x-1
This is the same as
f(-1) = 1
f(x+1) = 2*f(x) + x
(just switched from x and x-1 to x+1 and x, difference 1 in both cases)
The first few x and f(x) are:
x: -1 0 1 2 3 4
f(x): 1 1 2 5 12 27
And while there are many arbitrary complicated ways to transform this into a non-recursive formula, with easy ones it often helps to write up what the difference is between each two elements:
x: -1 0 1 2 3 4
f(x): 1 1 2 5 12 27
0 1 3 7 15
So, for some x
f(x+1) - f(x) = 2^(x+1) - 1
f(x+2) - f(x) = (f(x+2) - f(x+1)) + (f(x+1) - f(x)) = 2^(x+2) + 2^(x+1) - 2
f(x+n) - f(x) = sum[0<=i<n](2^(x+1+i)) - n
Inserting e.g. x=0, to turn f(x+n) into f(n):
f(x+n) - f(x) = sum[0<=i<n](2^(x+1+i)) - n
f(0+n) - f(0) = sum[0<=i<n](2^(0+1+i)) - n
f(n) - 1 = sum[0<=i<n](2^(i+1)) - n
f(n) = sum[0<=i<n](2^(i+1)) - n + 1
f(n) = sum[0<i<=n](2^i) - n + 1
f(n) = (2^(n+1) - 2) - n + 1
f(n) = 2^(n+1) - n - 1
No recursion anymore.
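A quick way to gain confidence in the closed form is to compare it with the recursive definition. In the sketch below, sum_down is the question's function condensed to a single return, and sum_down_closed is a hypothetical name for the formula derived above.

```cpp
// Recursive definition from the question, condensed to one return.
inline long long sum_down(int x)
{
    return x >= 0 ? (x - 1) + 2 * sum_down(x - 1) : 1;
}

// Closed form derived above: f(n) = 2^(n+1) - n - 1.
// Hypothetical helper name; valid for -1 <= n <= 61 with a 64-bit long long.
inline long long sum_down_closed(int n)
{
    return (1LL << (n + 1)) - n - 1;
}
```

Both functions return 27 for n = 4 and 1048556 for n = 19.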
How about this :
int x = 0;
while (sum_down(x) <= 1000000)
{
x++;
}
The loop increments x until the result of sum_down(x) is greater than 1.000.000.
Edit : The result would be 19.
While trying to understand and simplify the recursion logic behind the sum_down() function is enlightening and informative, this snippet is pragmatic: it does not try to solve the problem by analysing it, but simply computes the result.
Two lines of Python code to answer your question:
>>> from itertools import * # no code but needed for dropwhile() and count()
Define the recursive function (See R Sahu's answer)
>>> f = lambda x: 1 if x<0 else (x-1) + 2*f(x-1)
Then use the dropwhile() function to remove elements from the list [0, 1, 2, 3, ....] for which f(x)<=1000000, resulting in a list of integers for which f(x) > 1000000. Note: count() returns an infinite "list" of [0, 1, 2, ....]
The dropwhile() function returns a Python generator so we use next() to get the first value of the list:
>>> next(dropwhile(lambda x: f(x)<=1000000, count()))
19
I am looking for a solution for cycling through consecutive numbers based on an input value. It is similar to modulo, but different for negative numbers. Is there a better solution than the inefficient code below? Here are some input/output examples:
Numbers range 0 to 2
-2 -> 1
-1 -> 2
0 -> 0
1 -> 1
2 -> 2
3 -> 0
4 -> 1
//Inefficient Code example
int getConsecutiveVal(int min, int max, int input) //Inclusive in this scenario
{
while (input>max)
input -= (1+max-min);
while (input<min)
input += (1+max-min);
return input;
}
//Incorrect Code example since func(0,2,-1) returns -1, not 2
int getConsecutiveVal(int min, int max, int input)
{
return (input % (1+max-min))+min;
}
To be able to increment or decrement, I used the following function. It's more than 1 line, but fewer math operations. It's similar in spirit to the original poster's format. Tested for positive and negative cases.
int16_t cycleIncDec(int16_t x, int16_t dir, int16_t xmin, int16_t xmax) {
// inc/dec with constrained range
// the supplied xmax must be greater than xmin
x += dir;
if (x > xmax) x = xmin;
else if (x < xmin) x = xmax;
return x;
}
Output of cycleIncDec() with various start values and step sizes
x: 11: +1 0 1 2 3 4 5 6 0 1 2 3
x: 4: -1 3 2 1 0 -1 -2 -3 -4 -5 -6 -7
x: -8: -1 -13 -12 -11 -10 -9 -8 -13 -12 -11 -10 -9
x:-190: -2 -192 -194 -196 -198 -200 -170 -172 -174 -176 -178 -180
In principle, you need the modulo operator. The problem is that in C it doesn't work as expected for negative numbers.
If you know the minimum input value, you can just add a positive number x big enough to transform all negative numbers to positive. It won't affect the result if x % R = 0 (in your example R=3.)
In your example, if you add, say, 3*10 to all inputs and perform the modulo operation you'll get the desired result:
mod(3*10+[-2 -1 0 1 2 3 4], 3)
= 1 2 0 1 2 0 1
(the above is matlab notation and is specialized to the example you have presented. I'll leave it to you to extend it to arbitrary min/max)
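Extended to an arbitrary min/max, the offset idea might look like the following C++ sketch. The function name wrap_with_offset and the lowest_input parameter (a lower bound the caller is assumed to know for all inputs) are hypothetical.

```cpp
// Wraps input into [min, max] using a single %, assuming the caller knows
// a lower bound `lowest_input` for every input (hypothetical parameter).
inline int wrap_with_offset(int min, int max, int input, int lowest_input)
{
    int const range_len = 1 + max - min;
    // How far the lowest possible input lies below min.
    int const shortfall = min - lowest_input;
    // Smallest multiple of range_len that is >= shortfall, so that
    // (input - min + offset) is never negative.
    int const offset = shortfall > 0
        ? ((shortfall + range_len - 1) / range_len) * range_len
        : 0;
    // Adding a multiple of range_len does not change the remainder.
    return (input - min + offset) % range_len + min;
}
```

For the range 0 to 2 with lowest_input = -2, the inputs -2, -1, 0, 3, 4 map to 1, 2, 0, 0, 1, as in the question's table.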
A specific formula for the case you have presented:
You have suggested using
((input+abs(input)*(1+max-min)) % (1+max-min))+min
However, this formula does not work. For two reasons:
First, if input=0, the abs() returns 0 and you get the minimum value as output (This is not always what your explicit while-based loop produces)
Second, you forgot to subtract min from the input before the operation.
So the correct formula is the following (using x for input):
(x - xmin + (1+abs(x))*(1+xmax-xmin)) % (1+xmax-xmin) + xmin
You can call % twice to get the right behaviour, since a%b, for positive b, is guaranteed to lie in [-b+1, b-1].
int getConsecutiveVal(int min, int max, int input)
{
int range_len = (1 + max - min);
input -= min;
return (((input % range_len) + range_len) % range_len) + min;
}
Can someone explain why 1.000000 <= 1.0f is false?
The code:
#include <iostream>
#include <stdio.h>
using namespace std;
int main(int argc, char **argv)
{
float step = 1.0f / 10;
float t;
for(t = 0; t <= 1.0f; t += step)
{
printf("t = %f\n", t);
cout << "t = " << t << "\n";
cout << "(t <= 1.0f) = " << (t <= 1.0f) << "\n";
}
printf("t = %f\n", t );
cout << "t = " << t << "\n";
cout << "(t <= 1.0f) = " << (t <= 1.0f) << "\n";
cout << "\n(1.000000 <= 1.0f) = " << (1.000000 <= 1.0f) << "\n";
}
The result:
t = 0.000000
t = 0
(t <= 1.0f) = 1
t = 0.100000
t = 0.1
(t <= 1.0f) = 1
t = 0.200000
t = 0.2
(t <= 1.0f) = 1
t = 0.300000
t = 0.3
(t <= 1.0f) = 1
t = 0.400000
t = 0.4
(t <= 1.0f) = 1
t = 0.500000
t = 0.5
(t <= 1.0f) = 1
t = 0.600000
t = 0.6
(t <= 1.0f) = 1
t = 0.700000
t = 0.7
(t <= 1.0f) = 1
t = 0.800000
t = 0.8
(t <= 1.0f) = 1
t = 0.900000
t = 0.9
(t <= 1.0f) = 1
t = 1.000000
t = 1
(t <= 1.0f) = 0
(1.000000 <= 1.0f) = 1
As correctly pointed out in the comments, the value of t is not actually the same 1.00000 that you are defining in the line below.
Printing t with higher precision with std::setprecision(20) will reveal its actual value: 1.0000001192092895508.
The common way to avoid these kinds of issues is to compare not with 1, but with 1 + epsilon, where epsilon is a very small number, perhaps one or two orders of magnitude greater than your floating-point precision.
So you would write your for loop condition as
for(t = 0; t <= 1.000001f; t += step)
Note that in your case, epsilon should be at least ten times greater than the maximum possible floating-point error, as the float is added ten times.
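The effect of the epsilon can be seen by counting loop iterations. In the sketch below, count_steps reproduces the question's loop for a given upper bound. last_t_indexed is a different technique that does not come from the answer: it drives the loop with an integer counter so that rounding errors cannot accumulate. Both names are hypothetical.

```cpp
// Counts the iterations of the question's loop for a given upper bound.
inline int count_steps(float upper)
{
    float const step = 1.0f / 10;
    int count = 0;
    for (float t = 0; t <= upper; t += step)
        ++count;
    return count;
}

// Variant not taken from the answers: drive the loop with an integer
// counter and derive t from it, so rounding errors do not accumulate.
inline float last_t_indexed(int steps)
{
    float t = 0.0f;
    for (int i = 0; i <= steps; ++i)
        t = static_cast<float>(i) / static_cast<float>(steps);
    return t;   // i == steps gives steps/steps, which is exactly 1.0f
}
```

With 32-bit IEEE 754 floats, count_steps(1.0f) gives 10 iterations (t = 0 through 0.9), while count_steps(1.000001f) gives the intended 11.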
As pointed out by Muepe and Alain, the reason for t != 1.0f is that 1/10 can not be precisely represented in binary floating point numbers.
Floating point types in C++ (and most other languages) are implemented using an approach that uses the available bytes (for example 4 or 8) for the following 3 components:
Sign
Exponent
Mantissa
Let's have a look at it for a 32 bit (4 byte) type, which is often what you have in C++ for float.
The sign is just a single bit, being 1 or 0, where 0 means positive and 1 means negative. If you set aside every existing standard, you could also say 0 -> negative, 1 -> positive.
The exponent uses 8 bits. Unlike in daily life, this exponent is not meant to be applied to base 10 but to base 2. That means 1 as an exponent does not correspond to 10 but to 2, and the exponent 2 means 4 (=2^2) and not 100 (=10^2).
Another important point is that for floating point variables we also want negative exponents, like 2^-1 being 0.5, 2^-2 being 0.25 and so on. Thus we define a bias value that gets subtracted from the stored exponent and yields the real value. With 8 bits the bias is 127, so a stored exponent of 0 would give 2^-127 and a stored exponent of 255 would give 2^128. But there is an exception: two values of the stored exponent are reserved. In IEEE 754, 0 marks zero and subnormal numbers, and 255 marks NaN and infinity. Therefore the stored exponent of a normal number runs from 1 to 254, giving a range from 2^-126 to 2^127.
The mantissa now fills up the remaining 23 bits. If we see the mantissa as a series of 0 and 1, you can imagine its value to be like 1.m, where m is the series of those bits, not in powers of 10 but in powers of 2. So 1.1 would be 1 * 2^0 + 1 * 2^-1 = 1 + 0.5 = 1.5. As an example, let's have a look at the following mantissa (a very short one):
m = 100101 -> 1.100101 to base 2 -> 1 * 2^0 + 1 * 2^-1 + 0 * 2^-2 + 0 * 2^-3 + 1 * 2^-4 + 0 * 2^-5 + 1 * 2^-6 = 1 * 1 + 1 * 0.5 + 1 * 1/16 + 1 * 1/64 = 1.578125
The final value of a float is then calculated using (with e being the stored exponent):
2^(e - 127) * 1.m * (sign ? -1 : 1)
What exactly is going wrong in your loop: your step is 0.1! 0.1 is a very bad number for base-2 floating point numbers. Let's have a look at why:
sign -> 0 (as its non-negative)
exponent -> The first value smaller than 0.1 is 2^-4. So the exponent should be -4 + 127 = 123
mantissa -> For this we check how many times 2^-4 fits into 0.1 and then try to convert the fraction to a mantissa. 0.1 / (2^-4) = 0.1/0.0625 = 1.6. Considering the mantissa gives 1.m, our mantissa should be 0.6. So let's convert that to binary:
0.6 = 1 * 2^-1 + 0.1 -> m = 1
0.1 = 0 * 2^-2 + 0.1 -> m = 10
0.1 = 0 * 2^-3 + 0.1 -> m = 100
0.1 = 1 * 2^-4 + 0.0375 -> m = 1001
0.0375 = 1 * 2^-5 + 0.00625 -> m = 10011
0.00625 = 0 * 2^-6 + 0.00625 -> m = 100110
0.00625 = 0 * 2^-7 + 0.00625 -> m = 1001100
0.00625 = 1 * 2^-8 + 0.00234375 -> m = 10011001
We could continue like this until we have our 23 mantissa bits, but I can tell you that you get:
m = 10011001100110011001...
Therefore 0.1 in a binary floating point environment is like 1/3 in a base 10 system: it is an infinitely repeating number. As the space in a float is limited, the number simply has to be cut off at the 23rd bit. The stored value is therefore a tiny bit greater than 0.1: not all of the infinitely many digits fit into the float, and the 23rd bit would be a 0 but gets rounded up to a 1.
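The three components can be read straight out of a float to check the walkthrough above. This is a minimal sketch that assumes float is a 32-bit IEEE 754 type (enforced by the static_assert); the type FloatParts and the function decompose are hypothetical names.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "sketch assumes a 32-bit IEEE 754 float");

struct FloatParts {            // hypothetical helper type
    std::uint32_t sign;        // 1 bit
    std::uint32_t exponent;    // 8 bits, stored with a bias of 127
    std::uint32_t mantissa;    // 23 bits, the m in 1.m
};

inline FloatParts decompose(float value)
{
    std::uint32_t bits;
    // memcpy is a well-defined way to reinterpret the float's bytes.
    std::memcpy(&bits, &value, sizeof bits);
    return { bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
}
```

For 0.1f this yields sign 0, exponent 123 and mantissa 0x4CCCCD, which is 10011001100110011001101 in binary: the repeating 1001 pattern from above, with the last bit rounded up.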
The reason is that 1.0/10.0 = 0.1 can not be represented exactly in binary, just as 1.0/3.0 = 0.333.. can not be represented exactly in decimals.
If we use
float step = 1.0f / 8;
for example, the result is as expected.
To avoid such problems, use a small offset as shown in the answer of mic_e.
I'm trying to write code for modulus, but for negative numbers I don't get the right result. My code:
double mod (double X, double Y)
{
double result = X;
if (X>0){
do
{
result = result - Y;
}while(result >= Y);
}
if (X<0){
do
{
result = result + Y;
}while(0 >= result);
}
}
When you do something like mod(-5,2) it should return -1 but it returns 1, why does it return 1 when it can't be greater than 0?
In my mind I thought it works like -5 + 2 = -3 + 2 = -1. For positive it would be 5 - 2 = 3 - 2 = 1.
Thanks.
EDIT: I am trying to do this without using CMATH using my own math library.
EDIT: My return result is in a later part of the program and does show output. This is just a block of the entire program itself.
Examine this part of code only:
if (X<0){
do
{
result = result + Y;
}while(0 >= result);
}
Let's say that X is -5 and Y is 2, so result starts at -5.
The do loop will be executed:
1st pass - result = -3
2nd pass - result = -1
3rd pass - result = -1 + 2 = 1
The 3rd pass is executed because the result from the 2nd pass is still less than zero.
You need to change your loop condition to while(0 >= result + Y)
You are missing
return result;
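Putting the fixes from the answers together, one possible version is sketched below. It is a variant rather than the asker's exact code: it uses plain while loops instead of do/while so that nothing is added or subtracted when |X| < Y, it assumes Y > 0, and the name trunc_mod is hypothetical.

```cpp
// Remainder that keeps the sign of X, assuming Y > 0.
// Hypothetical name; does not depend on <cmath>.
inline double trunc_mod(double X, double Y)
{
    double result = X;
    while (result >= Y)      // positive X: subtract Y until below Y
        result -= Y;
    while (result <= -Y)     // negative X: same test as 0 >= result + Y
        result += Y;
    return result;           // the return the original block lacked
}
```

With this version trunc_mod(-5, 2) gives -1 and trunc_mod(5, 2) gives 1, as the question expects.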
Consider the integer sequence
-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7
When taking mod(5, 2), we take the largest multiple of 2 that is less than or equal to 5 and return its difference from 5, i.e. 1 (5 - 4).
For negative numbers, the same concept applies.
For mod(-5, 2), the largest multiple of 2 that is less than or equal to -5 is -6 (not -4, since -4 > -5), and the difference is (-5 - (-6)), which is 1.
That is what is happening in your code.
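The "largest multiple not greater than x" rule described above corresponds to a floored remainder. A minimal integer sketch, assuming y > 0 and using the hypothetical name floor_mod:

```cpp
// Floored remainder for y > 0: x minus the largest multiple of y
// that is less than or equal to x. Hypothetical helper name.
inline int floor_mod(int x, int y)
{
    int const r = x % y;        // C++ % truncates toward zero: sign of x
    return r < 0 ? r + y : r;   // shift negative remainders into [0, y)
}
```

Here floor_mod(-5, 2) is 1, whereas the built-in -5 % 2 is -1, which is the result the question expected.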