How to convert Real to Int? - sml

Let's say I have 0.0 how do I convert it to 0?
I know that I can use Real.fromInt(0) to do the opposite (0 -> 0.0) but what about Real to Int?
In the SML documentation I read about a function toInt, but there was no example so I probably use it in the wrong way.
I tried this:
Real.toInt(a)
val a Real.toInt;
Both are wrong...

Real.toInt has type IEEEReal.rounding_mode -> real -> int so it requires that you specify a rounding mode. For example:
Real.toInt IEEEReal.TO_NEAREST 1.2;
evaluates to 1.
There is almost no reason to use Real.toInt in everyday programming. Instead, just use one of round, floor, ceil, trunc -- functions which correspond to the 4 rounding modes of TO_NEAREST, TO_NEGINF, TO_POSINF, TO_ZERO respectively. In other words, rather than use Real.toInt IEEEReal.TO_NEAREST 1.2, just use round 1.2. Of the four functions, trunc corresponds to how int() works in Python (and other languages), so is probably the most familiar one.
The only time that I can see for using Real.toInt is if you want to experiment with how different rounding modes affect a calculation. Unless you use a variable for the rounding mode which takes on different values in different calls, it is more readable to use one of the four conversion functions which have a hard-wired rounding mode.

Related

Why do lots of (old) programs use floor(0.5 + input) instead of round(input)?

I believe the difference lies in the value returned for inputs near a tie, as in this code:
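#include <cmath>
#include <iostream>

int main()
{
std::cout.precision(100);
// Largest double below 0.05, divided by 0.1: a value just under one half.
double input = std::nextafter(0.05, 0.0) / 0.1;
double x1 = std::floor(0.5 + input);
double x2 = std::round(input);
std::cout << x1 << std::endl;
std::cout << x2 << std::endl;
}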
which outputs:
1
0
But they are just different results in the end, one chooses its preferred one. I see lots of "old" C/C++ programs using floor(0.5 + input) instead of round(input).
Is there any historic reason? Cheapest on the CPU?
std::round was introduced in C++11. Before that, only std::floor was available, so programmers used it.
There is no historic reason whatsoever. This kind of deviance has been around since year dot. It's an abuse of floating point arithmetic, and many experienced professional programmers fall for it. Even the Java bods did up to version 1.7. Funny guys.
My conjecture is that a decent out-of-the-box rounding function was not formally available until C++11 (despite C getting theirs in C99), but that really is no excuse for adopting the so-called alternative.
Here's the thing: floor(0.5 + input) does not always recover the same result as the corresponding std::round call!
The reason is quite subtle: the cutoff point for rounding, a.5 for an integer a is a dyadic rational. As this can be represented exactly in an IEEE754 floating point up to the 52nd power of 2, and thereafter rounding is a no-op anyway, std::round always works properly. For other floating point schemes, consult the documentation.
But adding 0.5 to a double can introduce imprecision causing a slight under or overshoot for some values. If you think about it, adding two double values together - that are the inception of unwitting denary conversions - and applying a function that is a very strong function of the input (such as a rounding function), is bound to end in tears.
Don't do it.
Reference: Why does Math.round(0.49999999999999994) return 1?
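The disagreement described above can be reproduced with a short C++ sketch. The names round_via_floor and check_disagreements are made up for illustration, and the checks assume IEEE 754 doubles with the default round-to-nearest mode:

```cpp
#include <cassert>
#include <cmath>

// The idiom under discussion: add one half, then round down.
double round_via_floor(double x) {
    return std::floor(x + 0.5);
}

// Checks two inputs where the idiom and std::round disagree.
void check_disagreements() {
    // Largest double below 0.5: the sum x + 0.5 is rounded up to exactly 1.0.
    const double below_half = std::nextafter(0.5, 0.0);
    assert(std::round(below_half) == 0.0);
    assert(round_via_floor(below_half) == 1.0);

    // 2^52 + 1 is already an integer, but x + 0.5 is a tie that rounds to
    // the even neighbour 2^52 + 2 before floor ever runs.
    const double odd = std::ldexp(1.0, 52) + 1.0;
    assert(std::round(odd) == odd);
    assert(round_via_floor(odd) == odd + 1.0);
}
```

Calling check_disagreements() from main() should pass silently on such a platform. For ordinary inputs like 2.3 or 2.5 the two approaches agree, which is probably why the idiom survives.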
I think this is where you err:
But they are just different results in the end, one chooses its
preferred one. I see lots of "old" C/C++ programs using floor(0.5 +
input) instead of round(input).
That is not the case. You must select the right rounding scheme for the domain. In a financial application, you'll round using banker's rules (and not using float, by the way). When sampling, however, rounding up using static_cast<int>(floor(f + .5)) yields less sampling noise, which increases the dynamic range. When aligning pixels, i.e. converting a position to screen coordinates, using any other rounding method will yield holes, gaps, and other artifacts.
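To make the difference between these schemes concrete, here is a small C++ sketch of three tie-breaking rules. The function names are invented for this example, and round_half_even assumes the floating-point environment is still in its default FE_TONEAREST mode:

```cpp
#include <cassert>
#include <cmath>

// Ties go toward +infinity: 2.5 -> 3, -2.5 -> -2.
double round_half_up(double x) { return std::floor(x + 0.5); }

// Ties go away from zero: 2.5 -> 3, -2.5 -> -3.
double round_half_away(double x) { return std::round(x); }

// Ties go to the even neighbour (banker's rounding), assuming the
// default FE_TONEAREST rounding mode is in effect.
double round_half_even(double x) { return std::nearbyint(x); }

// The three rules only differ on exact ties.
void check_tie_breaking() {
    assert(round_half_up(2.5) == 3.0);
    assert(round_half_away(2.5) == 3.0);
    assert(round_half_even(2.5) == 2.0);

    assert(round_half_up(-2.5) == -2.0);
    assert(round_half_away(-2.5) == -3.0);
    assert(round_half_even(-2.5) == -2.0);
}
```

Which rule is right depends on the domain, as argued above; the sketch only shows that they are not interchangeable.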
A simple reason could be that there are different methods of rounding numbers, so unless you knew the method used, you could get different results.
With floor(), you can be consistent with the results. If the fractional part is .5 or greater, adding 0.5 bumps the value up to the next int. But .49999 just has its decimal part dropped.
Many programmers adopt idioms that they learned when programming in other languages. Not all languages have a round() function, and in those languages it's normal to use floor(x + 0.5) as a substitute. When these programmers start using C++, they don't always realize that there's a built-in round(), so they continue to use the style they're used to.
In other words, just because you see lots of code that does something, it doesn't mean there's a good reason to do it. You can find examples of this in every programming language. Remember Sturgeon's Law:
ninety percent of everything is crap

Fortran - want to round to one decimal point

In fortran I have to round latitude and longitude to one digit after decimal point.
I am using gfortran compiler and the nint function but the following does not work:
print *, nint( 1.40 * 10. ) / 10. ! prints 1.39999998
print *, nint( 1.49 * 10. ) / 10. ! prints 1.50000000
Looking for both general and specific solutions here. For example:
How can we display numbers rounded to one decimal place?
How can we store such rounded numbers in fortran. It's not possible in a float variable, but are there other ways?
How can we write such numbers to NetCDF?
How can we write such numbers to a CSV or text file?
As others have said, the issue is the use of floating point representation in the NetCDF file. Using nco utilities, you can change the latitude/longitude to short integers with scale_factor and add_offset. Like this:
ncap2 -s 'latitude=pack(latitude, 0.1, 0); longitude=pack(longitude, 0.1, 0);' old.nc new.nc
There is no way to do what you are asking. The underlying problem is that the rounded values you desire are not necessarily able to be represented using floating point.
For example, if you had a value 10.58, this is stored as approximately 1.3225000 x 2^3 = 10.580000 in IEEE754 float32 (the stored value is about 10.5799999, not exactly 10.58).
When you round this value to one decimal place (however you choose to do so), the result would be 10.6; however, 10.6 does not have an exact representation. The nearest representation is 1.3250000477 x 2^3 = 10.6000003815 in float32. So no matter how you deal with the rounding, there is no way to store 10.6 exactly in a float32 value, and no way to write it as a floating point value into a netCDF file.
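The same float32 behaviour can be checked outside Fortran. The following C++ sketch (round_to_tenths, format_tenths and check_tenths are hypothetical helper names, and float is assumed to be IEEE 754 binary32) rounds a value to one decimal place, stores it back in a float, and compares it with the decimal value 10.6:

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <string>

// Round to one decimal place and store the result back in a float32.
float round_to_tenths(float x) {
    return std::round(x * 10.0f) / 10.0f;
}

// Format with one digit after the decimal point; here the rounding
// happens only in the text, not in the stored value.
std::string format_tenths(float x) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.1f", x);
    return buf;
}

void check_tenths() {
    const float r = round_to_tenths(10.58f);
    assert(r == 10.6f);                      // best float32 approximation
    assert(static_cast<double>(r) != 10.6);  // still not the double closest to 10.6
    assert(format_tenths(10.58f) == "10.6"); // but the printed text is as desired
}
```

This mirrors the distinction made in the answers here: the stored binary value can only approximate 10.6, while formatted output can show exactly one decimal digit.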
YES, IT CAN BE DONE! The "accepted" answer above is correct in its limited range, but is wrong about what you can actually accomplish in Fortran (or various other HGL's).
The only question is: what price are you willing to pay if something like a Write with F6.1 fails?
From one perspective, your problem is a particularly trivial variation on the subject of "Arbitrary Precision" computing. How do you imagine cryptography is handled when you need to store, manipulate, and perform "math" with, say, 1024 bit numbers, with exact precision?
A simple strategy in this case would be to separate each number into its constituent "LHSofD" (Left Hand Side of Decimal), and "RHSofD" values. For example, you might have an RLon(i,j) = 105.591, and would like to print 105.6 (or any manner of rounding) to your netCDF (or any normal) file. Split this into RLonLHS(i,j) = 105, and RLonRHS(i,j) = 591.
... at this point you have choices that increase generality, but at some expense. To save "money" the RHS might be retained as 0.591 (but you lose generality if you need to do fancier things).
For simplicity, assume the "cheap and cheerful" second strategy.
The LHS is easy (Int()).
Now, for the RHS, multiply by 10 (if you wish to round to 1 DEC), e.g. to arrive at RLonRHS(i,j) = 5.91, and then apply the Fortran "round to nearest Int" NInt() intrinsic ... leaving you with RLonRHS(i,j) = 6.0.
... and Bob's your uncle:
Now you print the LHS and RHS to your netCDF using a suitable Write statement concatenating the "duals", and will create an EXACT representation as per the required objectives in the OP.
... of course later reading-in those values returns to the same issues as illustrated above, unless the read-in also is ArbPrec aware.
... we wrote our own ArbPrec lib, but there are several about, also in VBA and other HGL's ... but be warned a full ArbPrec bit of machinery is a non-trivial matter ... luckily your problem is so simple.
There are several aspects one can consider in relation to "rounding to one decimal place". These relate to: internal storage and manipulation; display and interchange.
Display and interchange
The simplest aspects cover how we report a stored value, regardless of the internal representation used. As covered in depth in other answers and elsewhere, we can use a numeric edit descriptor with a single fractional digit:
print '(F0.1,2X,F0.1)', 10.3, 10.17
end
How the output is rounded is a changeable mode:
print '(RU,F0.1,2X,RD,F0.1)', 10.17, 10.17
end
In this example we've chosen to round up and then down, but we could also round to zero or round to nearest (or let the compiler choose for us).
For any formatted output, whether to screen or file, such edit descriptors are available. A G edit descriptor, such as one may use to write CSV files, will also do this rounding.
For unformatted output this concept of rounding is not applicable as the internal representation is referenced. Equally for an interchange format such as NetCDF and HDF5 we do not have this rounding.
For NetCDF your attribute convention may specify something like FORTRAN_format which gives an appropriate format for ultimate display of the (default) real, non-rounded, variable.
Internal storage
Other answers and the question itself mention the impossibility of accurately representing (and working with) decimal digits. However, nothing in the Fortran language requires this to be impossible:
integer, parameter :: rk = SELECTED_REAL_KIND(radix=10)
real(rk) x
x = 0.1_rk
print *, x
end
is a Fortran program which has a radix-10 variable and literal constant. See also IEEE_SELECTED_REAL_KIND(radix=10).
Now, you are exceptionally likely to see that selected_real_kind(radix=10) gives you the value -5, but if you want something positive that can be used as a type parameter you just need to find someone offering you such a system.
If you aren't able to find such a thing then you will need to work accounting for errors. There are two parts to consider here.
The intrinsic real numerical types in Fortran are floating point ones. To use a fixed point numeric type, or a system like binary-coded decimal, you will need to resort to non-intrinsic types. Such a topic is beyond the scope of this answer, but pointers are made in that direction by DrOli.
These efforts will not be computationally/programmer-time cheap. You will also need to take care of managing these types in your output and interchange.
Depending on the requirements of your work, you may find simply scaling by (powers of) ten and working on integers suits. In such cases, you will also want to find the corresponding NetCDF attribute in your convention, such as scale_factor.
Relating to our internal representation concerns we have similar rounding issues to output. For example, if my input data has a longitude of 10.17... but I want to round it in my internal representation to (the nearest representable value to) a single decimal digit (say 10.2/10.1999998) and then work through with that, how do I manage that?
We've seen how nint(10.17*10)/10. gives us this, but we've also learned something about how numeric edit descriptors do this nicely for output, including controlling the rounding mode:
character(10) :: intermediate
real :: rounded
write(intermediate, '(RN,F0.1)') 10.17
read(intermediate, *) rounded
print *, rounded ! This may look not "exact"
end
We can track the accumulation of errors here if this is desired.
The 'round_x = nint(x*10d0)/10d0' statement rounds x (for abs(x) < 2**31/10; for larger numbers use dnint()) and assigns the rounded value to the round_x variable for further calculations.
As mentioned in the answers above, not all numbers with one significant digit after the decimal point have an exact representation, for example, 0.3 does not.
print *, 0.3d0
Output:
0.29999999999999999
To output a rounded value to a file, to the screen, or to convert it to a string with a single significant digit after the decimal point, use edit descriptor 'Fw.1' (w - width w characters, 0 - variable width). For example:
print '(5(1x, f0.1))', 1.30, 1.31, 1.35, 1.39, 345.46
Output:
1.3 1.3 1.4 1.4 345.5
@JohnE, using 'G10.2' is incorrect: it rounds the result to two significant digits, not to one digit after the decimal point. E.g.:
print '(g10.2)', 345.46
Output:
0.35E+03
P.S.
For NetCDF, rounding should be handled by NetCDF viewer, however, you can output variables as NC_STRING type:
write(NetCDF_out_string, '(F0.1)') 1.49
Or, alternatively, get "beautiful" NC_FLOAT/NC_DOUBLE numbers:
beautiful_float_x = nint(x*10.)/10. + epsilon(1.)*nint(x*10.)/10./2.
beautiful_double_x = dnint(x*10d0)/10d0 + epsilon(1d0)*dnint(x*10d0)/10d0/2d0
P.P.S. @JohnE
The preferred solution is not to round intermediate results in memory or in files. Rounding is performed only when the final human-readable output is produced.
1. Use print with edit descriptor 'Fw.1', see above.
2. There are no simple and reliable ways to accurately store rounded numbers (numbers with a decimal fixed point):
2.1. Theoretically, some Fortran implementations can support decimal arithmetic, but I am not aware of any implementation in which 'selected_real_kind(4, 4, 10)' returns a value other than -5;
2.2. It is possible to store rounded numbers as strings;
2.3. You can use a Fortran binding of the GMP library. Functions with the mpq_ prefix are designed to work with rational numbers.
3. There are no simple and reliable ways to write rounded numbers to a netCDF file while preserving their properties for the reader of this file:
3.1. netCDF supports 'Packed Data Values', i.e. you can set an integer type with the attributes 'scale_factor' and 'add_offset' and save arrays of integers. But in the file 'scale_factor' will be stored as a single- or double-precision floating-point number, i.e. the value will differ from 0.1. Accordingly, when the netCDF library computes unpacked_data_value = packed_data_value*scale_factor + add_offset on reading, there will be a rounding error. (You can set scale_factor=0.1*(1.+epsilon(1.)) or scale_factor=0.1d0*(1d0+epsilon(1d0)) to avoid a long run of '9' digits.);
3.2. There are C_format and FORTRAN_format attributes. But it is quite difficult to predict which reader will use which attribute and whether they will use them at all;
3.3. You can store rounded numbers as strings or user-defined types.
4. Use write() with edit descriptor 'Fw.1', see above.

How come some people append f to the end of variables?

In the tutorial I'm reading for OGRE3d here the programmer is constantly adding f at the end of any variable he initializes, like 200.00f or 0.00f so I decided to erase f and see if it compiles and it compiles just fine, what is the point of adding f at the end of the variable?
EDIT: So you're saying if I initialize a variable with 200.03 it won't initialize it as a floating point but if I were to do so with 200.03f it would? If not where does the f become useful then?
It's a way to specify that the number is to be interpreted as a "float", not a "double" (which is the default type for C++ floating-point literals and typically uses twice the memory).
This discussion could be of help:
http://www.cplusplus.com/forum/beginner/24483/
Quoted from http://msdn.microsoft.com/en-us/library/w9bk1wcy.aspx
A floating-point constant without an f, F, l, or L suffix has type
double. If the letter f or F is the suffix, the constant has type
float. If suffixed by the letter l or L, it has type long double. For
example:
200.00f is not a variable. It can't vary.
It's a compile-time constant, with float representation. The f signifies that it's a float.
By comparison, 200.00 would be interpreted as a double.
The C standard states that unsuffixed floating-point constants are doubles, which promotes the whole operation to double.
float a,b,c;
...
a = b+7.1; // this is a double precision operation
...
a = b+7.1f; // this is a single precision operation
...
c = 7.1; // double constant, converted to single on assignment
a = b + c; // single all the way
The double-precision version requires more storage for the constant, plus a conversion from single to double for the variable operand, then a conversion from double to single to assign the result. With all these conversions going on, if you are not in tune with how floating point works (rounding and such), you might not get the result you were expecting. The compiler may at some point optimize some of this behavior away, and the FPU in the hardware might accept mixed-mode operands; both make it harder to see what is really going on.
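The typing rules described here can be checked at compile time. This is a minimal C++11 sketch; the variable b is a placeholder standing in for the operand in the snippet above:

```cpp
#include <type_traits>

float b = 1.0f;  // placeholder operand, value is irrelevant

// Literal types depend only on the suffix.
static_assert(std::is_same<decltype(7.1), double>::value,
              "an unsuffixed literal is a double");
static_assert(std::is_same<decltype(7.1f), float>::value,
              "the f suffix gives a float");
static_assert(std::is_same<decltype(7.1L), long double>::value,
              "the L suffix gives a long double");

// Mixing float with a double literal promotes the expression to double.
static_assert(std::is_same<decltype(b + 7.1), double>::value,
              "float + double has type double");
static_assert(std::is_same<decltype(b + 7.1f), float>::value,
              "float + float has type float");
```

If the file compiles, every claim in the assertions holds for that compiler. Note that this only shows the type of each expression; whether the hardware evaluates intermediate results in a wider format is a separate question.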
It is not just a speed problem but also an accuracy problem. There was a recent SO question on pretty much the same problem: why does this comparison work with one number and not another? Take the fraction 5/11 for example, 0.454545.... Let's say, hypothetically, you had a base-10 FPU with a single precision of 3 significant digits and a double precision of 6 significant digits.
float a = 0.45454545454;
...
if(a>0.4545454545) b=1;
...
Well, in our hypothetical system we can only store three digits in a, so a = 0.455 because we are using a round-up rounding mode by default. But our comparison will be done in double precision because we didn't put the f at the end of the number. The double version of the constant is 0.454545. a is converted to a double, which results in 0.455000, so:
if(0.455000>0.454545) b = 1;
0.455 is greater than 0.454545 so b would be a 1.
float a = 0.45454545454;
...
if(a>0.4545454545f) b=1;
...
so now the comparison is single precision so we are comparing 0.455 to 0.455 which is not greater, so b=1 does not happen.
When you write floating-point constants, they are in base 10, but the floating-point numbers in the computer are base 2, and the two don't always convert cleanly, just as 5/11 would work fine in base 11 but gives an infinitely repeating digit pattern in base 10. 0.1 in decimal, for example, creates a repeating pattern in binary. Depending on where the mantissa cuts off, rounding can make the least significant bit of the mantissa round up or not (this also depends on the rounding mode you are using, if the floating-point format you are using even has rounding). That in itself creates problems depending on how you use the variable, as the comparison above shows.
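The hypothetical decimal example has a direct counterpart on real binary hardware. In this C++ sketch (function names invented for illustration, IEEE 754 float and double assumed), 0.1 stored in a float compares differently depending on whether the literal carries the f suffix:

```cpp
#include <cassert>

// a is promoted to double and compared with the double nearest to 0.1.
bool greater_than_double_literal(float a) { return a > 0.1; }

// Both sides are float, so the same rounding was applied to each.
bool greater_than_float_literal(float a) { return a > 0.1f; }

void check_suffix_comparison() {
    float a = 0.1f;  // the float nearest to 0.1 lies slightly above 0.1
    assert(greater_than_double_literal(a));   // "0.1 > 0.1" is true here
    assert(!greater_than_float_literal(a));   // with the suffix it is false
    assert(a == 0.1f);
    assert(a != 0.1);
}
```

The float nearest to 0.1 happens to round upward, which is why the unsuffixed comparison reports "greater"; for other constants the float may round downward instead.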
For non-floating-point values the compiler usually saves you, but sometimes doesn't:
unsigned long long a;
...
a = ~3;
a = ~(3ULL);
...
Depending on the compiler and computer, the two assignments can give you different results: one MIGHT give you 0x00000000FFFFFFFC, the other MIGHT give 0xFFFFFFFFFFFFFFFC.
If you want something specific, you should be quite clear when you tell the compiler what you want; otherwise the compiler takes a guess and doesn't always make the guess that you wanted.
It means that the value is to be interpreted as a single-precision floating-point value (type float). Without the f suffix, it is interpreted as a double-precision floating-point value (type double).
This is usually done to silence compiler warnings about possible loss of precision when assigning a double value to a float variable. If you didn't receive such a warning, you may have switched off warnings in your compiler settings (which is bad!).
But it can also have a subtle semantic effect. As you know, C++ allows functions which have the same name but differ in the types of their parameters. In that case the f suffix can determine which function is called.
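A minimal sketch of that overload effect, using two made-up overloads named describe:

```cpp
#include <cassert>
#include <string>

// Two hypothetical overloads that differ only in parameter type.
std::string describe(float)  { return "float"; }
std::string describe(double) { return "double"; }

void check_overload_selection() {
    assert(describe(200.03f) == "float");   // f suffix selects the float overload
    assert(describe(200.03) == "double");   // unsuffixed literal is a double
}
```

Because each literal matches one overload exactly, no conversion is involved and the suffix alone decides which function runs.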

What does the compiler do when it converts a float variable to an integer variable?

What does the compiler do? The aim is to get the number after the point as an integer. I did it like this:
float a = 0;
cin >> a;
int b = (a - (int)a)*10;
Now my problem is this: when I enter for example 3.2, I get 2, which is what I want. It also works with .4, .5 and .7. but when I enter for example 2.3, I get 2. For 2.7 I get 6 and so on. But when I do it without variables, for example:
(2.3 - (int)2.3)*10;
I get the correct result.
I couldn't figure out what the compiler does. I always thought that when I cast a float to an integer, it simply cuts at the point. This is what the compiler actually does when I use constant numbers. However, when I use variables, the compiler reduces some of them, but not all.
You are most likely not having problems with the compiler, but with the fact that floating point numbers cannot be represented exactly on a binary computer.
So, when you do:
float f = 2.7f;
..what might actually be stored in the computer is:
2.6999999999999999
This is a very well-known characteristic of floating points on binary computers. There are many posts on SO that discuss this.
Basically, the problem comes from the fact that binary has different "infinitely repeating" values than base 10 does. For instance, 1/10 in decimal is 0.1; in binary, it's 0.000110011001100110011001100... Floating point cannot hold 2.3 exactly because it needs an infinite number of binary digits, so it stores a close approximation, probably 2.2999999. For most math, that's close enough. But be wary of truncation.
One solution is to round before you truncate.
int b = (int)((a - (int)a)*10 + .5);
Also note that floating-point values may have different sizes in memory than in registers, which means you have to allow for rounding when comparing two floating-point values for equality as well.
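As a rough illustration of truncating versus rounding, the sketch below extracts the first digit after the decimal point both ways. The function names are hypothetical, and the rounding version assumes a non-negative input that was typed with a single decimal digit:

```cpp
#include <cassert>
#include <cmath>

// Truncating version, as in the question: 2.3f is stored slightly below
// 2.3, so the scaled fraction is just under 3 and the cast drops it to 2.
int tenths_digit_truncating(float a) {
    return static_cast<int>((a - static_cast<int>(a)) * 10);
}

// Rounding version: round the scaled fraction to the nearest integer.
// Assumes a >= 0 and that a was entered with one decimal digit.
int tenths_digit_rounding(float a) {
    float fraction = a - std::trunc(a);
    return static_cast<int>(std::lround(fraction * 10.0f));
}

void check_tenths_digit() {
    assert(tenths_digit_truncating(2.3f) == 2);  // the surprising result
    assert(tenths_digit_rounding(2.3f) == 3);
    assert(tenths_digit_rounding(2.7f) == 7);
    assert(tenths_digit_rounding(3.2f) == 2);
}
```

For inputs with more decimal digits (say 2.37), rounding would report 4 rather than 3, so this is only a fix under the stated assumption.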
The reason for the discrepancy is that by default, floating point literals are doubles, which have higher accuracy, and are more closely able to represent the value you're looking for.
Why don't you do it like this?
b = (int)(a*10) % 10;
I find it a lot easier.

Preventing Rounding Errors

I was just reading about rounding errors in C++. So, if I'm making a math intense program (or any important calculations) should I just drop floats all together and use only doubles or is there an easier way to prevent rounding errors?
Obligatory lecture: What Every Programmer Should Know About Floating-Point Arithmetic.
Also, try reading IEEE Floating Point standard.
You'll always get rounding errors. Unless you use an infinite arbitrary precision library, like gmplib. You have to decide if your application really needs this kind of effort.
Or, you could use integer arithmetic, converting to floats only when needed. This is still hard to do, you have to decide if it's worth it.
Lastly, you can use float or double, taking care not to make assumptions about values at the limit of the representation's precision. I wish this Valgrind plugin was implemented (grep for float)...
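The integer-arithmetic option can be sketched as follows in C++. The function names are made up for the example, and the quantity is assumed to be measured in tenths so that it fits an integer count:

```cpp
#include <cassert>
#include <cstdint>

// Accumulate 0.1 repeatedly in binary floating point; each addition
// can introduce a small rounding error.
double sum_tenths_double(int n) {
    double total = 0.0;
    for (int i = 0; i < n; ++i) total += 0.1;
    return total;
}

// The same quantity kept as an integer count of tenths stays exact.
std::int64_t sum_tenths_scaled(int n) {
    std::int64_t total_tenths = 0;
    for (int i = 0; i < n; ++i) total_tenths += 1;
    return total_tenths;
}

void check_scaled_sum() {
    assert(sum_tenths_double(10) != 1.0);   // ten times 0.1 is not exactly 1
    assert(sum_tenths_scaled(10) == 10);    // ten tenths, exactly
}
```

The integer result is converted to a floating-point or decimal string only when it has to be displayed, which keeps the accumulated state free of rounding error.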
The rounding errors are normally very insignificant, even using floats. Mathematically-intense programs like games, which do very large numbers of floating-point computations, often still use single-precision.
This might work if your highest number is less than 10 billion and you're using C++ double precision.
if ( ceil(10000*(x + 0.00001)) > ceil(100000*(x - 0.00001))) {
x = ceil(10000*(x + 0.00004)) / 10000;
}
This should allow at least the last digit to be off +/- 9. I'm assuming dividing by 1000 will always just move a decimal place. If not, then maybe it could be done in binary.
You would have to apply it after every operation that is not +, -, *, or a comparison. For example, you can't do two divisions in the same formula because you'd have to apply it to each division.
If that doesn't work, you could work in integers by scaling the numbers up and always using integer division. If you need advanced functions, maybe there is a package that does deterministic integer math. Integer division is required in a lot of financial settings because round-off error can be exploited, as in the movie "Office Space".