Why can't you add int and float literals in F#? - casting

In F#, if you write 3 + 2.5 you get a type error. You have to write e.g. 3. + 2.5 to make it work, which can get annoying in math-heavy domains with many numeric literals.
Seeing as many other languages (e.g. C#) handle this just fine, is there a particular reason why F# doesn't implicitly convert int literals to float (which is a lossless conversion as far as I know) when performing arithmetic operations?

It's true that int to float is "safe". However, the lack of implicit conversion between types in general is considered to be a good feature of F# by many, as others have mentioned.
F# has much more extensive type inference than C#. The types inferred from usage can be propagated all the way through a large codebase. Implicit conversion between numeric types could complicate that inference, make it harder to understand type errors, and increase the maintenance burden of the compiler itself. In fact, F# performs none of the implicit conversions that C# defines.
By eliminating unnecessary casts, implicit conversions can improve source code readability. However, because implicit conversions do not require programmers to explicitly cast from one type to the other, care must be taken to prevent unexpected results.
Again, this decreases convenience but reduces the chance of incorrect behaviour, which can be a much bigger inconvenience later on, or for someone else.
Basically, this approach trades one convenience (implicit conversions) for another (not having to write type names everywhere), plus some added safety and explicitness. I personally think it's a good trade-off for F#.

F# is a functional-first language, and one of the core values in functional languages is being able to reason about your code, which is a fancy way of saying it's easy to understand what your code is doing. Explicit operations mean that your code will be easier to reason about. Don't believe me?
Here is some Python code that takes a number, turns it into a string, then back into a number. Guess what it returns:
float(str(0.47000000000000003))
Did you guess 0.47000000000000003? Sorry, it's actually 0.46999999999999997! There is all sorts of weirdness like that when converting between double, decimal, and float! Best to pick a type and stick to it. Constantly having to specify the type might seem annoying at first, but the value of never having to worry about what types your functions are using vs. the types being sent in... god help you if a library chose types for you as well... well, let's just say you will appreciate the explicitness as time goes on ;)

Related

Do type conversions slow program running?

The title is quite obvious.
In my case, and for the sake of simplicity, I avoid using unsigned int where int would do, as it makes coding faster and simpler.
(BTW, I'm using an Android IDE, CppDroid.)
Yet, the IDE frequently alerts me to implicit conversions in, for example, for loops where the incremented variable (int) is compared with the size of a vector (size_t/unsigned int).
My questions are:
Do type conversions take time?
If so, how long do they take compared to other common operations?
If conversions do take some time, is it worth defining variables correctly in order to avoid conversions?
Your question is valid, although the goal is misconstrued. It is paramount to define variables correctly, but not because of some mysterious performance cost.
It is to ensure correctness. Comparing an unsigned integer with a signed one is a ticking bomb, as is (usually) comparing size_t with int.
For example, consider the following snippet:
for (int i = 0; i < vec.size(); ++i) { }
For all you know, this code can lead to undefined behavior! If the size of the vector is bigger than the maximum value a signed int can hold (which is entirely possible on 64-bit systems, where size_t is wider than int), your counter will overflow, and signed overflow is undefined. The compiler might even remove the loop altogether if it can prove that the size of the vector is bigger than the maximum int!
The similar-looking (and also incorrect) line
for (unsigned int i = 0; i < vec.size(); ++i) { }
is not going to cause undefined behaviour, but it will hang the program when the vector size is greater than the maximum value of unsigned int. Not a good thing either.
And of course, the correct way of doing this is
for (decltype(vec.size()) i = 0; i < vec.size(); ++i) { }
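In modern C++, a range-based for loop or an explicit std::size_t counter sidesteps the signed/unsigned question entirely; a minimal sketch:

#include <cstddef>
#include <vector>

void process(const std::vector<int>& vec)
{
    // No index at all, so there is no conversion to get wrong.
    for (int value : vec) { (void)value; }

    // Or an explicit counter whose type matches vec.size()
    // (std::size_t for the default allocator).
    for (std::size_t i = 0; i < vec.size(); ++i) { }
}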
It depends on what you convert to what.
That particular signed/unsigned mismatch warning points at something with zero runtime overhead, but you may end up treating a negative number as a huge unsigned one (or the other way around) - so as long as you are using int and don't expect to break into 2^31 territory, you are safe.
As safe as the people writing file I/O routines around 1990, who never expected to see a 3 GiB file in their lives... which is not very funny nowadays (so much software is still broken on 2+ GiB file sizes).
Some other conversions, like int to uint8_t, may have a tiny overhead, so it's better to avoid them if possible (by designing the code to use the desired data type all around).
I would first address the clarity and functionality of the code, which usually leads to using one particular data type for a particular value everywhere, without any conversions.
After the code works, you can measure the performance and consider which optimizations make sense (including the use of mismatched data types with conversions between them).
Conclusion: just fix it and use the proper data type.
Type conversions might give performance hits (when signedness, bitness or conversions between floating-point types are involved), but as a general rule the type identity of most things in a program is merely a conceptual, front-end language feature. When such hits do happen, it is because the types involved mean genuinely different things, and code must be emitted to actually perform the conversion.
Another thing, completely different from the above, is the invocation of type conversion operators in C++, which can run arbitrary code and thus obviously influence the final program's behavior (not only its performance).
As mentioned by others, correct use of the type system matters most for program correctness, especially in languages such as C and C++. Using mismatched types can affect program behavior in some corner cases, while having no impact whatsoever on execution time otherwise.
It depends.
Actually converting the data between types will require extra calculations. That much should probably be obvious. Usually those calculations take extra time, so they will have a performance impact. However, there are several factors that mitigate the actual impact of this:
The compiler can optimize types in some cases to minimize conversions.
Some platforms implement certain conversions in hardware.
The primary concern surrounding calculations with unlike types typically has much less to do with performance and more to do with safety and producing expected results. That is why the compiler is warning you; the vast majority of compilers will not tell you that you are doing something inefficient, but they will tell you that you are doing something dangerous.
For example, comparing an int with an unsigned int is asking for trouble. The int has a negative range; the unsigned int has a larger positive range. On conversion to unsigned int, the negative values of the int become values larger than its entire positive range. It is very easy to generate endless loops or out-of-bounds errors with such a construct.
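As a small illustrative sketch (made up for this answer), a countdown loop with an unsigned counter never terminates, because the counter wraps around instead of going negative:

#include <cstdio>

int main()
{
    // i is unsigned, so "i >= 0" is always true: when i reaches 0,
    // --i wraps around to a huge positive value and the loop never ends.
    for (unsigned int i = 10; i >= 0; --i)
        std::printf("%u\n", i);
}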
You should normally only worry about type conversion performance if you are dealing with huge amounts of data - like large vectors/arrays that need to be converted between formats. You would have to loop over the data in some way, so it would be a conscious action. For example, converting a 10000-element vector of chars to ints. In these cases, you might need to consider whether you have a design flaw that is requiring needless conversion of data.
It is worth pointing out that in the above example, even if the conversion itself were instant, the iteration and copying are not.
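For illustration only, such a bulk conversion might look like the following sketch (the function name is made up):

#include <vector>

std::vector<int> widen(const std::vector<char>& bytes)
{
    // Each element is converted char -> int and copied. The conversion itself
    // is cheap, but iterating over and copying 10000 elements is real work.
    return std::vector<int>(bytes.begin(), bytes.end());
}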
As for an example of platforms where this can be done to an extent in hardware, most video cards are able to interpret integers as floats on a normalized range, of the sort 255 --> 1.0. However, many other conversions, like conversions between image formats, are still done in software.
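Done in software, that kind of normalisation is just a conversion plus a divide; a minimal sketch:

// Map an 8-bit channel value in [0, 255] onto the normalised range [0.0f, 1.0f].
float normalise(unsigned char value)
{
    return static_cast<float>(value) / 255.0f;
}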
Given the platform and optimization details vary greatly, answering how long a given conversion takes relative to other operations is effectively impossible. If you are dealing with enough data that a conversion is creating a noticeable performance bottleneck, then profile that conversion.
It is worth it to make sure your types match to the best of your ability if you are dealing with enough data for it to matter; that is a subjective measurement of value, though, and will depend on what you are doing.
It is always worth it to make sure implicit type conversions do not cause errors, as errors due to them can be some of the worst possible in C/C++ (memory leaks, buffer overflows, access violations, etc.).

Is implicit casting considered to be a bad concept? [closed]

Closed. This question is opinion-based and is not currently accepting answers. (Closed 8 years ago.)
Why is implicit casting even allowed? I mean, what is the benefit of casting a float to an int implicitly? Doesn't explicit casting make for more readable and easier-to-debug code?
Answer: Yes, it is, and here is an example:
#include <stdio.h>

int main()
{
    unsigned int a = 1;
    int b = -1;

    if (b > a)
        printf("-1 > 1 \n");
    else
        printf("boring!\n");

    return 0;
}
If you execute this code you will get
-1 > 1
This is due to the implicit conversion of the variable b: b is converted to unsigned int, which turns -1 into 4294967295 (for a 32-bit unsigned int), and that is bigger than 1. This sometimes causes problems, so it is a good habit to cast explicitly in order to make things clear for you and for the programmers working on the same project!
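A sketch of one explicit alternative (just one way to write it), which makes the mixed comparison visible and performs it in the signed domain:

#include <stdio.h>

int main()
{
    unsigned int a = 1;
    int b = -1;

    /* The cast makes the signed/unsigned mix obvious to the reader and
       forces a signed comparison, so -1 really compares as -1. */
    if (b > (int)a)
        printf("-1 > 1 \n");
    else
        printf("boring!\n");

    return 0;
}

This version prints boring!, because the comparison now happens between two signed ints.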
Question:
Why is implicit casting even allowed? I mean what is the benefit of
casting a float to an int implicitly?
The maintainer of the C FAQ, Steve Summit, says in a tutorial:
The default conversion rules serve two purposes. One is purely selfish
on the compiler's part: it does not want to have to know how to
generate code to add, say, a floating-point number to an integer. The
compiler would much prefer if all operations operated on two values of
the same type: two integers, two floating-point numbers, etc. (Indeed,
few processors have an instruction for adding a floating-point number
to an integer; most have instructions for adding two integers, or two
floating-point numbers.) The other purpose for the default conversions
is the programmer's convenience: the mentality that "the computer and
the compiler are stupid, we programmers must specify everything in
excruciating detail" can be carried too far, and it's reasonable to
define the language such that certain conversions are performed
implicitly and automatically by the compiler, when it's unambiguous
and safe to do so.
Question 2:
Doesn't explicit casting make for more readable and easier-to-debug code?
Answer:
As explained above, that is why implicit conversions happen; but yes, explicit conversions do add readability.
First: the large number of implicit conversions in C++ is due to historical reasons, and nothing else. I don't think anyone considers all of them a good idea. On the other hand, there are many different kinds of implicit conversions, and some of them are almost essential to the language: you wouldn't like it if you needed an explicit conversion to pass a MyType x to a function taking a MyType const&. I'm pretty sure there is a consensus that conversions adding const, like this one, should be implicit.
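A minimal sketch of that case (MyType and the function name are just placeholders):

struct MyType {};

void takesConstRef(MyType const&) {}

int main()
{
    MyType x;
    takesConstRef(x);   // binds x to MyType const& implicitly, adding const; no cast needed
}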
With regards to conversions where there isn't a consensus:
Almost no one seems to have a problem with non-lossy conversions: things like int to long, or float to double. Most people also seem to accept conversions from integral types to floating point (e.g. int to double), although these can lose precision in some cases. (For example: int i = 123456789; float f = i;.)
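Spelled out, that precision loss looks like this (a small sketch; the stored value assumes the usual 32-bit IEEE 754 float):

#include <iostream>

int main()
{
    int i = 123456789;
    float f = i;   // implicit int -> float conversion, accepted without complaint

    // A 32-bit float carries only about 7 decimal digits of precision,
    // so f holds 123456792 rather than 123456789.
    std::cout << i << '\n' << static_cast<int>(f) << '\n';
}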
There was a proposal during the standardization of C++98 to deprecate
narrowing conversions, like float to int. (The author of the
proposal was Stroustrup; if you don't like such conversions, you're in
good company.) It didn't pass; I don't know why exactly, but I suspect
that it was a question of breaking too much from the traditions of C.
In C++11, such conversions are forbidden in some newer constructs,
like the new initialization sequences. So it sounds to me like there is
a consensus that these implicit conversions aren't really a good idea,
but that they can't be removed for fear of breaking code or maybe just
breaking with the tradition in C. (I know that more than a few people
don't like the fact that someString += 3.14159; is a legal statement,
adding an ETX character to the end of the string.)
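For what it's worth, that statement really is accepted (at most with a conversion warning); a minimal demonstration:

#include <iostream>
#include <string>

int main()
{
    std::string someString = "pi = ";
    someString += 3.14159;   // double -> char conversion: appends char(3), the ETX control character

    std::cout << someString.size() << '\n';   // prints 6: five visible characters plus one ETX
}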
The original proposal for bool proposed deprecating all of the
conversions of numeric and pointer types to bool. This was removed;
it soon became apparent that the proposal wouldn't pass if it made
things like if ( somePointer ) (as opposed to
if ( somePointer != NULL )) illegal. There is still a large body of
people (myself included) who consider such conversions "bad", and avoid
them.
Finally: a compiler is free to issue a warning for anything it feels
like. If the market insisted on warnings for such conversions, compilers
would implement them (probably as an option). I suspect that the
reason they don't is that the warnings have a bad reputation, due to
the initial implementations generating too many warnings. Integral
promotion leads to a number of narrowing conversions that no one wants
to eliminate:
char ch = '0' + v % 10;
for example, involves an int to char conversion (which is
narrowing); in C++11:
char ch{ '0' + v % 10 };
is illegal (but both VC++ and g++ accept it, g++ with a warning). I
suspect that to be usable, banning narrowing conversions would at least
have to make exceptions for cases where the wider type is itself due to
integral promotion, mixed type arithmetic and cases where the source
expression is a compile time constant which "fits" in the target type.
Obviously, breaking old code is what prevents new versions of the languages (C and C++) from changing the rules. So the question is why this was permitted in the first place, when C was conceived. The fundamental reason is, I understand, that C was modeled to be close to the hardware, and hardware doesn't (often, fundamentally) distinguish between addresses, integers and boolean types. Thus code like int i=10; while(i--) doSomething(i); or int *p, offset; ... if(p) doSomethingElse(p+offset); is almost directly translatable to machine code. In fact, it is not far from a macro assembler, most differences being the niceties around function calls. In my opinion it is also extremely readable. Any additional casts or explicit comparisons would compromise the bare-bones visibility of the logic. But that, of course, is a matter of taste and programming socialization.
And then yes, experience not available in the '70s has shown that some of the implicit conversions are sources of errors. If K&R could conceive C again, they would probably change a few (literally, a few) things. The world being as it is, though, we have to make do with compiler warnings.

Performance of Initialization from Different Type

I'm porting some code, and the original author was evidently quite concerned with squeezing as much performance as possible out of the code.
Throughout (and there's hundreds of source files), there are lots of things like this:
float f = (float)(6);
type_float tf = (type_float)(0); //type_float is a typedef of float xor double
In short, the author tried to make the type of the RHS of each assignment equal to that of the variable being assigned into. The aim, I presume, was to coerce the compiler into turning e.g. the 6 in the first example into 6.0f, so that no conversion overhead happens when that value is copied into the variable.
This would actually be useful for something like the second example, where the proper form of the literal (one of {0.0f, 0.0}) isn't known locally and can be changed by a typedef far away. However, I can see it being problematic if the literal is converted and stored into a temporary and then copied, instead of the conversion happening on the copy.
Is this author onto something here? Are all these literals actually being stored with the intended type? Or is this just a massive waste of source file bits? What is the best way to handle these sorts of cases in modern code?
Note: I believe this applies to both C and C++, so I have applied both tags.
This is a complete waste. No modern optimizing compiler will keep track of intermediate values; it will directly initialize the variable with the final, correct value. There is really no point in it; the default conversion should always do the right thing here. And yes, this applies to both C and C++, and they shouldn't differ much in behavior.
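In other words, with any reasonable compiler the two spellings below are initialised from the same compile-time constant 6.0f, with no run-time conversion for either one (a small sketch):

float plain  = 6;          // int literal converted to 6.0f at compile time
float casted = (float)(6); // exactly the same result, just more typing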

Use of Literals, yay/nay in C++

I've recently heard that in some cases, programmers believe that you should never use literals in your code. I understand that in some cases, assigning a variable name to a given number can be helpful (especially in terms of maintenance if that number is used elsewhere). However, consider the following case studies:
Case Study 1: Use of Literals for "special" byte codes.
Say you have an if statement that checks for a specific value stored in (for the sake of argument) a uint16_t. Here are the two code samples:
Version 1:
// Descriptive comment as to why I'm using 0xBEEF goes here
if (my_var == 0xBEEF) {
    // do something
}
Version 2:
const uint16_t kSuperDescriptiveVarName = 0xBEEF;
if (my_var == kSuperDescriptiveVarName) {
    // do something
}
Which is the "preferred" method in terms of good coding practice? I can fully understand why you would prefer version 2 if kSuperDescriptiveVarName is used more than once. Also, does the compiler do any optimizations to make both versions effectively the same executable code? That is, are there any performance implications here?
Case Study 2: Use of sizeof
I fully understand that using sizeof versus a raw literal is preferred for portability and also readability concerns. Take the two code examples into account. The scenario is that you are computing the offset into a packet buffer (an array of uint8_t) where the first part of the packet is stored as my_packet_header, which let's say is a uint32_t.
Version 1:
const int offset = sizeof(my_packet_header);
Version 2:
const int offset = 4; // good comment telling reader where 4 came from
Clearly, version 1 is preferred, but what about for cases where you have multiple data fields to skip over? What if you have the following instead:
Version 1:
const int offset = sizeof(my_packet_header) + sizeof(data_field1) + sizeof(data_field2) + ... + sizeof(data_fieldn);
Version 2:
const int offset = 47;
Which is preferred in this case? Does it still make sense to show all the steps involved in computing the offset, or does the literal usage make sense here?
Thanks for the help in advance as I attempt to better my code practices.
Which is the "preferred" method in terms of good coding practice? I can fully understand why you would prefer version 2 if kSuperDescriptiveVarName is used more than once.
Sounds like you understand the main point... factoring values (and their comments) that are used in multiple places. Further, it can sometimes help to have a group of constants in one place - so their values can be inspected, verified, modified etc. without concern for where they're used in the code. Other times, there are many constants used in proximity and the comments needed to properly explain them would obfuscate the code in which they're used.
Countering that, having a const variable means all the programmers studying the code will be wondering whether it's used anywhere else, keeping it in mind as they inspect the rest of the scope in which it's declared, etc. - the fewer unnecessary things there are to remember, the surer the understanding of the important parts of the code will be.
Like so many things in programming, it's "an art" balancing the pros and cons of each approach, and best guided by experience and knowledge of the way the code's likely to be studied, maintained, and evolved.
Also, does the compiler do any optimizations to make both versions effectively the same executable code? That is, are there any performance implications here?
There are no performance implications in optimised code.
I fully understand that using sizeof versus a raw literal is preferred for portability and also readability concerns.
And other reasons too. A big factor in good programming is reducing the points of maintenance when changes are done. If you can modify the type of a variable and know that all the places using that variable will adjust accordingly, that's great - saves time and potential errors. Using sizeof helps with that.
Which is preferred [for calculating offsets in a struct]? Does it still make sense to show all the steps involved with computing the offset or does the literal usage make sense here?
The offsetof macro (#include <cstddef>) is better for this... again reducing the maintenance burden. With the this + that approach you illustrate, if the compiler decides to insert any padding your offset will be wrong, and further you have to fix it every time you add or remove a field.
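A sketch of that approach, using a made-up packet layout purely for illustration:

#include <cstddef>
#include <cstdint>

// Hypothetical packet layout, named for the example only.
struct Packet {
    std::uint32_t my_packet_header;
    std::uint16_t data_field1;
    std::uint16_t data_field2;
    std::uint8_t  payload[32];
};

// Follows any padding the compiler inserts and any future changes to the fields.
const std::size_t offset = offsetof(Packet, payload);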
Ignoring the offsetof issues and just considering your this + that example as an illustration of a more complex value to assign, again it's a balancing act. You'd definitely want some explanation/comment/documentation re intent here (are you working out the binary size of earlier fields? calculating the offset of the next field?, deliberately missing some fields that might not be needed for the intended use or was that accidental?...). Still, a named constant might be enough documentation, so it's likely unimportant which way you lean....
In every example you list, I would go with the name.
In your first example, you almost certainly used that special 0xBEEF number at least twice - once to write it and once to do your comparison. If you didn't write it, that number is still part of a contract with someone else (perhaps a file format definition).
In the last example, it is especially useful to show the computation that yielded the value. That way, if you encounter trouble down the line, you can easily see either that the number is trustworthy, or what you missed and fix it.
There are some cases where I prefer literals over named constants though. These are always cases where a name is no more meaningful than the number. For example, you have a game program that plays a dice game (perhaps Yahtzee), where there are specific rules for specific die rolls. You could define constants for One = 1, Two = 2, etc. But why bother?
Generally it is better to use a name instead of a value. After all, if you need to change it later, you can find it more easily. Also, it is not always clear why a particular number is used when you read the code, so having a meaningful name assigned to it makes this immediately clear to a programmer.
Performance-wise there is no difference, because the optimizer should take care of it. And even if an extra instruction were generated, it is rather unlikely that this would cause you trouble. If your code were that tight, you probably shouldn't rely on an optimizer effect anyway.
I can fully understand why you would prefer version 2 if kSuperDescriptiveVarName is used more than once.
I think kSuperDescriptiveVarName will definitely be used more than once: once for the check and at least once for the assignment, maybe in a different part of your program.
There will be no difference in performance, since an optimization called Constant Propagation exists in almost all compilers. Just enable optimization for your compiler.

`short int` vs `int`

Should I bother using short int instead of int? Is there any useful difference? Any pitfalls?
short vs int
Don't bother with short unless there is a really good reason such as saving memory on a gazillion values, or conforming to a particular memory layout required by other code.
Using lots of different integer types just introduces complexity and possible wrap-around bugs.
On modern computers it might also introduce needless inefficiency.
const
Sprinkle const liberally wherever you can.
const constrains what might change, making it easier to understand the code: you know that this beastie is not gonna change, so it can be ignored and your thinking directed at more useful/relevant things.
Top-level const for formal arguments is, however, by convention omitted, possibly because the gain is not enough to outweigh the added verbosity.
Also, in a pure declaration of a function, top-level const for an argument is simply ignored by the compiler. On the other hand, some other tools may not be smart enough to ignore it when comparing pure declarations to definitions, and one person cited that in an earlier debate on the issue in the comp.lang.c++ Usenet group. So it depends to some extent on the toolchain, but happily I've never used tools that place any significance on those consts.
Cheers & hth.,
Absolutely not in function arguments. Few calling conventions are going to make any distinction between short and int. If you're making giant arrays you could use short if your data fits in short to save memory and increase cache effectiveness.
What Ben said. You will actually create less efficient code since all the registers need to strip out the upper bits whenever any comparisons are done. Unless you need to save memory because you have tons of them, use the native integer size. That's what int is for.
EDIT: Didn't even see your sub-question about const. Using const on intrinsic types (int, float) is useless, but any pointers/references should absolutely be const whenever applicable. Same for class methods as well.
The question "Should I use short int?" is technically malformed as asked. The only good answer will be "I don't know, what are you trying to accomplish?".
But let's consider some scenarios:
You know the definite range of values that your variable can take.
The ranges for signed integers are:
signed char: -2⁷ to 2⁷-1
short: -2¹⁵ to 2¹⁵-1
int: -2¹⁵ to 2¹⁵-1
long: -2³¹ to 2³¹-1
long long: -2⁶³ to 2⁶³-1
We should note here that these are the guaranteed ranges; they can be larger in your particular implementation, and often are. You are also guaranteed that the previous range cannot be larger than the next, but they can be equal.
You will quickly note that short and int actually have the same guaranteed range, which gives you very little incentive to use short. The only reason to use it in this situation is to give other coders a hint that the values will not be too large, but that can be done via a comment.
It does, however, make sense to use signed char if you know that you can fit every potential value in the range -128 to 127.
You don't know the exact range of potential values.
In this case you are in a rather bad position to attempt to minimise memory usage, and should probably use at least int. Although it has the same minimum range as short, on many platforms it may be larger, and this will help you out.
But the bigger problem is that you are trying to write a piece of software that operates on values, the range of which you do not know. Perhaps something wrong has happened before you have started coding (when requirements were being written up).
You have an idea about the range, but realise that it can change in the future.
Ask yourself how close to the boundary you are. If we are talking about something that goes from -1000 to +1000 and could potentially change to -1500 to 1500, then by all means use short. The specific architecture may pad your value, which will mean you won't save any space, but you won't lose anything. However, if we are dealing with some quantity that is currently -14000 to 14000 and can grow unpredictably (perhaps it's some financial value), then don't just switch to int, go to long right away. You will lose some memory, but will save yourself a lot of headaches catching roll-over bugs.
short vs int - If your data will fit in a short, use a short. Save memory. Make it easier for the reader to know how large the values in your variable may get.
use of const - Great programming practice. If your data should be a const then make it const. It is very helpful when someone reads your code.