So in C++ I can do something like:
DWORD count;
count = 3 / 1.699999;
cout << count;
which will result in:
1
Delphi, however, complains about a Cardinal and Extended mismatch.
var
  count: DWORD;
begin
  count := 3 / 1.6;
  Writeln(IntToStr(count));
So I either have to round, count := Round(3 / 1.6), which results in:
2
or truncate, count := Trunc(3 / 1.6), which results in:
1
Is trunc really the way to go?
Is there maybe a compiler switch I would have to toggle?
You would think it's easy to google something like that but trust me it isn't.
Thanks for your time!
C/C++ only has one arithmetic division operator - / - but its behavior depends on the type of operands you pass to it. It can perform both integer division and floating point division.
Delphi has two arithmetic division operations - div for integer division, and / for floating point division.
Your C++ code is performing floating point division, and then assigning the result to a DWORD, which is not a floating point type, so the assignment truncates off the decimal point:
3 / 1.699999 is 1.764706920415836, which truncates to 1.
In Delphi, the / operator returns an Extended, which is a floating-point type. Unlike C/C++, Delphi does not allow a floating-point type to be assigned directly to an integral type. You have to use Round() or Trunc().
In this case, the Delphi equivalent of your C++ code is to use Trunc():
var
  count: DWORD;
begin
  count := Trunc(3 / 1.699999);
  Write(IntToStr(count));
The easiest is to use Trunc(3 / 1.699999).
Another way is to multiply before dividing, so the whole computation stays in integer arithmetic.
var
  count: DWORD;
begin
  count := 3;
  count := (count * 1000000) div 1699999;
  Writeln(IntToStr(count));
Of course, to avoid overflow, count should be < maxInt div 1000000.
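For comparison, the same scaled-integer idea in C++ might look like this (a rough sketch; the cast to a 64-bit type is mine, added to sidestep the overflow limit mentioned above):

#include <cstdint>
#include <iostream>

int main() {
    uint32_t count = 3;
    // Multiply first, then divide; both operations stay in integer arithmetic,
    // and the division truncates just like Delphi's div (result: 1).
    count = static_cast<uint64_t>(count) * 1000000 / 1699999;
    std::cout << count << std::endl;
}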
Related
I am being paranoid that one of these functions may give an incorrect result like this:
std::floor(2000.0 / 1000.0) --> std::floor(1.999999999999) --> 1
or
std::ceil(18 / 3) --> std::ceil(6.000000000001) --> 7
Can something like this happen? If there is indeed a risk like this, I'm planning to use the functions below in order to work safely. But, is this really necessary?
#include <cmath>    // std::floor, std::ceil
#include <cstdint>  // intmax_t

constexpr long double EPSILON = 1e-10;

intmax_t GuaranteedFloor(const long double & Number)
{
    if (Number > 0)
    {
        return static_cast<intmax_t>(std::floor(Number) + EPSILON);
    }
    else
    {
        return static_cast<intmax_t>(std::floor(Number) - EPSILON);
    }
}

intmax_t GuaranteedCeil(const long double & Number)
{
    if (Number > 0)
    {
        return static_cast<intmax_t>(std::ceil(Number) + EPSILON);
    }
    else
    {
        return static_cast<intmax_t>(std::ceil(Number) - EPSILON);
    }
}
(Note: I'm assuming that the given 'long double' argument will fit in the 'intmax_t' return type.)
People often get the impression that floating point operations produce results with small, unpredictable, quasi-random errors. This impression is incorrect.
Floating point arithmetic computations are as exact as possible. 18/3 will always produce exactly 6. The result of 1/3 won't be exactly one third, but it will be the closest number to one third that is representable as a floating point number.
So the examples you showed are guaranteed to always work. As for your suggested "guaranteed floor/ceil", it's not a good idea. Certain sequences of operations can easily blow the error far above 1e-10, and certain other use cases will require 1e-10 to be correctly recognized (and ceil'ed) as nonzero.
As a rule of thumb, hardcoded epsilon values are bugs in your code.
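To see this concretely, here is a minimal check of the two examples from the question (assuming IEEE 754 doubles):

#include <cmath>
#include <iostream>

int main() {
    // 2000.0, 1000.0, 18.0 and 3.0 are all exactly representable,
    // so the quotients are exact and floor/ceil return exactly 2 and 6.
    std::cout << std::floor(2000.0 / 1000.0) << '\n';  // 2
    std::cout << std::ceil(18.0 / 3.0) << '\n';        // 6
}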
In the specific examples you're listing, I don't think those errors would ever occur.
std::floor(2000.0 /*Exactly Representable in 32-bit or 64-bit Floating Point Numbers*/ / 1000.0 /*Also exactly representable*/) --> std::floor(2.0 /*Exactly Representable*/) --> 2
std::ceil(18 / 3 /*both treated as ints, might not even compile if ceil isn't properly overloaded....?*/) --> 6
std::ceil(18.0 /*Exactly Representable*/ / 3.0 /*Exactly Representable*/) --> 6
Having said that, if you have math that depends on these functions behaving exactly correctly for floating point numbers, that may illuminate a design flaw you need to reconsider/reexamine.
As long as the floating-point values x and y exactly represent integers within the limits of the type you're using, there's no problem--x / y will always yield a floating-point value that exactly represents the integer result. Casting to int as you're doing will always work.
However, once the floating-point values go outside the integer-representable range for the type (Representing integers in doubles), epsilons don't help.
Consider this example. 16777217 is the smallest integer not exactly representable as a 32-bit float:
#include <stdio.h>

int main() {
    int ix = 16777217, iy = 97;
    printf("%d / %d = %d\n", ix, iy, ix / iy);
    // yields "16777217 / 97 = 172961", which is accurate
    float x = ix, y = iy;
    printf("%f / %f = %f\n", x, y, x / y);
    // yields "16777216.000000 / 97.000000 = 172960.989691"
}
In this case, the error is negative; in other cases (try 16777219 / 1549), the error is positive.
While it's tempting to add an epsilon to make floor work, it won't extend the accuracy much. When the values differ by more orders of magnitude, the error becomes greater than 1 and integer accuracy can't be guaranteed. Specifically, once x/y exceeds the largest integer the type can represent exactly, the error can exceed 1.0, so the epsilon is no help.
If this is coming into play, you will have to consider changing your mathematical approach--order of operations, work with logarithms, etc.
Such results are likely to appear when working with doubles. You can use round, or you can subtract 0.5 and then use the std::ceil function.
Below is the code I've tested in both 64-bit and 32-bit environments. The result is off by one each time. The expected result is 1180000000, with the actual result being 1179999999. I'm not sure exactly why, and I was hoping someone could educate me:
#include <stdint.h>
#include <iostream>

using namespace std;

int main() {
    double odds = 1.18;
    int64_t st = 1000000000;
    int64_t res = st * odds;
    cout << "result: " << res << endl;
    return 1;
}
I appreciate any feedback.
1.18, or 118/100, can't be exactly represented in binary; it has a repeating binary expansion. The same happens if you write 1/3 in decimal.
So let's go over a similar case in decimal, let's calculate (1 / 3) × 30000, which of course should be 10000:
odds = 1 / 3 and st = 30000
Since computers have only a limited precision we have to truncate this number to a limited number of decimals, let's say 6, so:
odds = 0.333333
0.333333 × 30000 = 9999.99. The cast (which in your program is implicit) will truncate this number to 9999.
There is no 100% reliable way to work around this. float and double just have only limited precision. Dealing with this is a hard problem.
Your program contains an implicit cast from double to an integer on the line int64_t res = st * odds;. Many compilers will warn you about this. It can be the source of bugs of the type you are describing. This cast, which can be explicitly written as (int64_t) some_double, rounds the number towards zero.
An alternative is rounding to the nearest integer with round(some_double);. That will—in this case—give the expected result.
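A minimal sketch of that alternative (std::llround is one way to spell the rounding here; whether the truncated value comes out as 1179999999 or 1180000000 depends on the platform's intermediate precision):

#include <cmath>
#include <cstdint>
#include <iostream>

int main() {
    double odds = 1.18;
    int64_t st = 1000000000;
    int64_t truncated = st * odds;              // implicit conversion, rounds toward zero
    int64_t rounded = std::llround(st * odds);  // rounds to the nearest integer: 1180000000
    std::cout << truncated << ' ' << rounded << std::endl;
}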
First of all - 1.18 is not exactly representable in double. Mathematically the result of:
double odds = 1.18;
is 1.17999999999999993782751062099 (according to an online calculator).
So, mathematically, odds * st is 1179999999.99999993782751062099.
But in C++, odds * st is an expression with type double. So your compiler has two options for implementing this:
Do the computation in double precision
Do the computation in higher precision and then round the result to double
Apparently, doing the computation in double precision in IEEE754 results in exactly 1180000000.
However, doing it in long double precision produces something more like 1179999999.99999993782751062099
Converting this to double is now implementation-defined as to whether it selects the next-highest or next-lowest value, but I believe it is typical for the next-lowest to be selected.
Then converting this next-lowest result to integer will truncate the fractional part.
There is an interesting blog post here where the author describes the behaviour of GCC:
It uses long double intermediate precision for x86 code (due to the x87 FPU's long double registers)
It uses actual types for x64 code (because the SSE/SSE2 FPU supports this more naturally)
According to the C++11 standard you should be able to inspect which intermediate precision is being used by outputting FLT_EVAL_METHOD from <cfloat>. 0 would mean actual values, 2 would mean long double is being used.
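A quick way to check which mode your compiler uses is simply to print the macro:

#include <cfloat>
#include <iostream>

int main() {
    // 0: expressions are evaluated in the precision of their type,
    // 2: intermediate results are kept in long double precision.
    std::cout << "FLT_EVAL_METHOD = " << FLT_EVAL_METHOD << std::endl;
}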
I have found an algorithm to multiply under a modulus. The following pseudocode is taken from Wikipedia, page Modular exponentiation, section Right-to-left binary method.
The full pseudocode is
function modular_pow(base, exponent, modulus)
    Assert :: (modulus - 1) * (modulus - 1) does not overflow base
    result := 1
    base := base mod modulus
    while exponent > 0
        if (exponent mod 2 == 1):
            result := (result * base) mod modulus
        exponent := exponent >> 1
        base := (base * base) mod modulus
    return result
I don't understand the line Assert :: (modulus - 1) * (modulus - 1) does not overflow base. What does it mean, and how would it best be programmed in C++?
In most computer programming languages numbers can only be stored with a limited precision or over a certain range.
For example, a C++ integer will often be a 32-bit signed int, capable of storing values up to 2^31 - 1.
If you multiply two numbers and the true result would be greater than 2^31 - 1, you do not get the result you were expecting: it has overflowed.
Assert is (roughly speaking) a way to check preconditions; "this must be true to proceed". In C++ you'll code it using the assert macro, or your own hand-rolled assertion system.
'does not overflow' means that the statement shouldn't be too large to fit into whatever integer type base is; it's a multiplication so it's quite possible. Integer overflow in C++ is undefined behaviour, so it's wise to guard against it! There are plenty of resources out there to explain integer overflow, such as this article on Wikipedia
To do the check in C++, a nice simple approach is to store the intermediate result in a larger integer type and check that it's small enough to fit in the destination type. For example, if base is int32_t use int64_t and check that it's lower than static_cast<int64_t>(std::numeric_limits<int32_t>::max()):
const int64_t intermediate = (static_cast<int64_t>(modulus) - 1) * (static_cast<int64_t>(modulus) - 1);
assert(intermediate < static_cast<int64_t>(std::numeric_limits<int32_t>::max()));
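Putting the two together, a sketch of the whole routine in C++ might look like this (my translation of the pseudocode, not code from the question; here the assert encodes the precondition that (modulus - 1)^2 fits in the 64-bit type used for the intermediate products):

#include <cassert>
#include <cstdint>

uint64_t modular_pow(uint64_t base, uint64_t exponent, uint64_t modulus) {
    // Precondition: (modulus - 1) * (modulus - 1) must not overflow uint64_t,
    // i.e. modulus - 1 must fit in 32 bits.
    assert(modulus != 0 && modulus - 1 <= UINT32_MAX);
    uint64_t result = 1;
    base %= modulus;
    while (exponent > 0) {
        if (exponent % 2 == 1)
            result = (result * base) % modulus;
        exponent >>= 1;
        base = (base * base) % modulus;
    }
    return result;
}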
I am doing Arithmetic Coding now, and I have got the final start position and distance, then I add them.
How can I convert the result to binary mode?
For example, how can I convert 0.125 decimal to 0.001 binary in C++?
void CArithmeticCoding::Encode()
{
    if ( 0 == m_input )
        return;

    printf("The input is [%s].\n", this->m_input);

    while (*m_input)
    {
        if ( *m_input == m_MPS )
        {
            DOMPS();
        }
        else
        {
            DOLPS();
        }
        ++m_input;
    }

    double ret = m_start + m_dis;
    return;
}
Arithmetic coding is done with integer data types for efficiency and predictability. There are no advantages and only disadvantages to using floating point types. You can simply consider an integer of n bits to be an n-bit fraction. As you take bits off the top, you renormalize the fraction to use those bits.
See Practical Implementations of Arithmetic Coding, and Introduction to Arithmetic Coding - Theory and Practice.
Converting anything to binary means finding out how many of each power of 2 is involved. In the case of a fractional number, the powers involved are negative.
For .125, the sequence is like this:
.125 x 2 = .250 (< 1)
.250 x 2 = .500 (< 1)
.500 x 2 = 1.000 (>= 1)
.000 = 0 done
So, the binary representation is 0×2^-1 + 0×2^-2 + 1×2^-3 = .001 binary. As an exercise, contrast this technique with converting a normal integer into binary representation.
Just as regular decimals can have non-terminating patterns (like 1/3 or pi/4), the same can happen for the binary representations. In those cases, you have to stop the calculation when you reach your desired precision.
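A small sketch of that doubling method in C++ (a standalone helper added for illustration, not part of the CArithmeticCoding class):

#include <iostream>
#include <string>

// Repeatedly double the fraction and peel off the integer bit,
// stopping at max_bits in case the expansion does not terminate.
std::string fraction_to_binary(double frac, int max_bits) {
    std::string bits = "0.";
    for (int i = 0; i < max_bits && frac > 0.0; ++i) {
        frac *= 2.0;
        if (frac >= 1.0) {
            bits += '1';
            frac -= 1.0;
        } else {
            bits += '0';
        }
    }
    return bits;
}

int main() {
    std::cout << fraction_to_binary(0.125, 16) << std::endl;  // prints 0.001
}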
You should investigate IEEE 754
This is the standard for binary representation of floating point formats, both single and double precision
Suppose I have some code such as:
float a, b = ...; // both positive
int s1 = ceil(sqrt(a/b));
int s2 = ceil(sqrt(a/b)) + 0.1;
Is it ever possible that s1 != s2? My concern is when a/b is a perfect square. For example, perhaps a=100.0 and b=4.0, then the output of ceil should be 5.00000 but what if instead it is 4.99999?
Similar question: is there a chance that 100.0/4.0 evaluates to say 5.00001 and then ceil will round it up to 6.00000?
I'd prefer to do this in integer math but the sqrt kinda screws that plan.
EDIT: suggestions on how to better implement this would be appreciated too! The a and b values are integer values, so actual code is more like: ceil(sqrt(float(a)/b))
EDIT: Based on levis501's answer, I think I will do this:
float a, b = ...; // both positive
int s = sqrt(a/b);
while (s*s*b < a) ++s;
Thank you all!
I don't think it's possible. Regardless of the value of sqrt(a/b), what it produces is some value N that we use as:
int s1 = ceil(N);
int s2 = ceil(N) + 0.1;
Since ceil always produces an integer value (albeit represented as a double), we will always have some value X, for which the first produces X.0 and the second X.1. Conversion to int will always truncate that .1, so both will result in X.
It might seem like there would be an exception if X was so large that X.1 overflowed the range of double. I don't see where this could be possible though. Except close to 0 (where overflow isn't a concern) the square root of a number will always be smaller than the input number. Therefore, before ceil(N)+0.1 could overflow, the a/b being used as an input in sqrt(a/b) would have to have overflowed already.
You may want to write an explicit function for your case. e.g.:
/* return the smallest positive integer whose square is at least x */
int isqrt(double x) {
    int y1 = ceil(sqrt(x));
    int y2 = y1 - 1;
    if ((y2 * y2) >= x) return y2;
    return y1;
}
This will handle the odd case where the square root of your ratio a/b is within the precision of double.
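For the original ratio it would be used as something like this (a hypothetical call, just to show the intent):

int s = isqrt(double(a) / b);   // e.g. isqrt(25.0) == 5, isqrt(26.0) == 6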
Equality of floating point numbers is indeed an issue, but IMHO not if we deal with integer numbers.
If you have the case of 100.0/4.0, it should evaluate to exactly 25.0, as 25.0 is exactly representable as a float, as opposed to e.g. 25.1.
Yes, it's entirely possible that s1 != s2. Why is that a problem, though?
It seems natural enough that s1 != (s1 + 0.1).
BTW, if you would prefer to have 5.00001 rounded to 5.00000 instead of 6.00000, use rint instead of ceil.
And to answer the actual question (in your comment) - you can use sqrt to get a starting point and then just find the correct square using integer arithmetic.
int min_dimension_greater_than(int items, int buckets)
{
    double target = double(items) / buckets;
    int min_square = ceil(target);
    int dim = floor(sqrt(target));
    int square = dim * dim;
    while (square < min_square) {
        dim += 1;
        square = dim * dim;
    }
    return dim;
}
And yes, this can be improved a lot, it's just a quick sketch.
s1 will always equal s2.
The C and C++ standards do not say much about the accuracy of math routines. Taken literally, it is impossible for the standard to be implemented, since the C standard says sqrt(x) returns the square root of x, but the square root of two cannot be exactly represented in floating point.
Implementing routines with good performance that always return a correctly rounded result (in round-to-nearest mode, this means the result is the representable floating-point number that is nearest to the exact result, with ties resolved in favor of a low zero bit) is a difficult research problem. Good math libraries target accuracy less than 1 ULP (so one of the two nearest representable numbers is returned), perhaps something slightly more than .5 ULP. (An ULP is the Unit of Least Precision, the value of the low bit given a particular value in the exponent field.) Some math libraries may be significantly worse than this. You would have to ask your vendor or check the documentation for more information.
So sqrt may be slightly off. If the exact square root is an integer (within the range in which integers are exactly representable in floating-point) and the library guarantees errors are less than 1 ULP, then the result of sqrt must be exactly correct, because any result other than the exact result is at least 1 ULP away.
Similarly, if the library guarantees errors are less than 1 ULP, then ceil must return the exact result, again because the exact result is representable and any other result would be at least 1 ULP away. Additionally, the nature of ceil is such that I would expect any reasonable math library to always return an integer, even if the rest of the library were not high quality.
As for overflow cases, if ceil(x) were beyond the range where all integers are exactly representable, then ceil(x)+.1 is closer to ceil(x) than it is to any other representable number, so the rounded result of adding .1 to ceil(x) should be ceil(x) in any system implementing the floating-point standard (IEEE 754). That is provided you are in the default rounding mode, which is round-to-nearest. It is possible to change the rounding mode to something like round-toward-infinity, which could cause ceil(x)+.1 to be an integer higher than ceil(x).
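To make that last point concrete, here is a small demonstration (assuming IEEE 754 doubles and the default round-to-nearest mode):

#include <cmath>
#include <iostream>

int main() {
    // 1e17 is far beyond 2^53, so neighbouring doubles are 16 apart here;
    // adding 0.1 cannot move the value to a different representable number.
    double x = 1e17;
    double c = std::ceil(x);
    std::cout << std::boolalpha << (c + 0.1 == c) << std::endl;  // true
}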