Why can't Crystal resolve the type of the assignment of 1 + 1?

What's the reason why Crystal can't/won't resolve the type of this? (I see that the documentation doesn't mention that the compiler can infer types from instance method calls, but what's the rationale behind it, especially when only stdlib functions are involved? Compile time?)
class Something
  def blah
    @result = 1 + 1
  end
end

Something.new.blah
Compiler error:
Showing last frame. Use --error-trace for full trace.
error in line 3
Error: can't infer the type of instance variable '@result' of Something

The type of an instance variable, if not declared explicitly with
`@result : Type`, is inferred from assignments to it across
the whole program.

The assignments must look like this:

  1. `@result = 1` (or other literals), inferred to the literal's type
  2. `@result = Type.new`, type is inferred to be Type
  3. `@result = Type.method`, where `method` has a return type
     annotation, type is inferred from it
  4. `@result = arg`, with 'arg' being a method argument with a
     type restriction 'Type', type is inferred to be Type
  5. `@result = arg`, with 'arg' being a method argument with a
     default value, type is inferred using rules 1, 2 and 3 from it
  6. `@result = uninitialized Type`, type is inferred to be Type
  7. `@result = LibSome.func`, and `LibSome` is a `lib`, type
     is inferred from that fun.
  8. `LibSome.func(out @result)`, and `LibSome` is a `lib`, type
     is inferred from that fun argument.

Other assignments have no effect on its type.

can't infer the type of instance variable '@result' of Something

If I'm not mistaken, Crystal used to be more aggressive about type deduction. While that is elegant in small examples, it creates more issues in bigger projects: compilation times became a problem (incremental compilation is hard or impossible), and errors in the code (e.g. caused by a typo) could be harder to track down when everything propagates.
In the end, keeping the inference rules simpler and falling back to explicit types for such expressions was considered more practical. Here is a discussion about the change from 2015. I'm not involved in the language design, but reading through the thread, I think its arguments apply to your question (why 1 + 1 needs to be explicitly typed). Note that 1 + 1 is a simple case, but expressions can become arbitrarily complex once you allow them; in general, the compiler would have to work through the whole program to do the analysis.
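For completeness, one way to satisfy the compiler is the explicit declaration it asks for. A minimal sketch (the Int32 annotation and the 0 default are just one choice):

class Something
  @result : Int32 = 0  # explicit type (plus a default), so no whole-program inference is needed

  def blah
    @result = 1 + 1
  end
end

Something.new.blah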

Related

AnyLogic - dynamic schedule for resourcePool cast from Integer to TimeUnits

I refer to the following SO post and the answer of Stuart Rossiter.
I thought it was right to open a new thread about this, as the problem can be looked at a little differently after all these years. Now I get the following error: "The method create_ShiftChange(double, TimeUnits) in the Main type is not applicable for the arguments (int, Integer)."
As I noted in my comment on Stuart Rossiter's solution, I believe the function create_ShiftChange(...) had different input arguments a few years ago.
The cast from getTimeoutToNextValue() to double is not a problem. However, the cast of the second argument getNextValue() from Integer to TimeUnits presents me with a challenge.
Does anyone have a solution for my problem or do I have to look for a detour, since the "old" create_ShiftChange(...) also has a different meaning due to the other input arguments? Thanks for the help!
There hasn't been a change in the create_* functions (methods) for dynamic events. There are two forms:
One where you explicitly specify the time units for when it should be scheduled (so with 2 initial arguments of type double and TimeUnits). TimeUnits is a Java enum (effectively what an AnyLogic option list is under the covers) with values like TimeUnits.MINUTE; auto-complete will show you the alternatives.
One where you implicitly assume the time units of the model as a whole, as in its properties (so with 1 initial argument of type double).
The dynamic event in question has a single int argument (i.e., its 'event-specific' data comprises just an integer), so the relevant create_* function variants have this as their final argument (i.e., they have 3 and 2 arguments respectively).
In your case, you are not using a dynamic event with a single argument (otherwise the method create_ShiftChange(double, TimeUnits) it's complaining about wouldn't exist; it would be create_ShiftChange(double, TimeUnits, int) instead) and, since you've called it with two integers, the compiler (incorrectly) assumes you were trying to use the 2-argument form, hence the error message.
So either add the argument to the dynamic event or, if you're using a different set of arguments (or no arguments) for your dynamic event, change the call accordingly.
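As a rough illustration (the argument values here are hypothetical), the two generated forms for a dynamic event ShiftChange carrying a single int argument would be called like this:

create_ShiftChange(5.0, TimeUnits.MINUTE, 2); // explicit-units form: double, TimeUnits, then the event's int argument
create_ShiftChange(5.0, 2);                   // implicit form using the model's time unit: double, then the int argument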
You simply need to type TimeUnits. (note the dot!) and then use code-complete. This shows you all the options available; choose the one you need.
Background: this is an enum defined by AnyLogic for time units. When you see things like that, always type it out and try code-complete.

How are quantities referenced in Fortran?

I was told a long time ago that in FORTRAN, everything is passed by reference. Therefore I would need to do this (provided mySubroutine is suitably defined elsewhere):
double precision :: myArray(2)
myArray(1:2) = (/ 2.3d0, 1.5d0 /)
CALL mySubroutine(myArray)
However, I also found that the program compiles and runs as expected if I do this
CALL mySubroutine((/ 2.3d0, 1.5d0 /))
without needing to define an intermediary array myArray. I thought that I was passing myArray into mySubroutine by reference. What is going on under the hood in the second version? Is the compiler unpacking the subroutine call, declaring a temporary variable only to pass it by reference?
To a large extent, trying to classify Fortran procedure calling as pass-by-reference or pass-by-value is not too helpful. You can find more detail in the answers to questions like this one and this one.
In short, generally procedure references are such that changes to a variable in a procedure are reflected in the variable where the procedure was referenced. In some cases a compiler may choose to do copy-in/copy-out, and in others it effectively must. Equally, the value attribute of a dummy argument specifies that an anonymous copy be made.
Where this question adds something a little different is in the use of an expression such as in
call mySubroutine([2.3d0, 1.5d0]) ! Using F2003 array constructor syntax
Is the compiler creating a temporary variable?
Admittedly, this is perhaps just a looseness in terminology, but it's worth saying that there is certainly no variable involved: [2.3d0, 1.5d0] is an expression, not a variable. Crucially, this means that it cannot be modified (appear in a variable definition context) in the procedure. Restrictions that apply when using an expression rather than a (temporary) variable include:
the dummy argument associated with an expression may not have the intent(inout) or the intent(out) attribute;
if the dummy argument hasn't an intent attribute then that argument may not be modified if the associated actual argument is an expression.
Now, if the dummy argument has the value attribute the effect of the procedure is the same whichever way it is referenced.
To conclude, the program may work just as well with an expression instead of an intermediate variable. If it doesn't, that's because some aspect of Fortran has been violated. How it works is a problem for the compiler, not the programmer.
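To make that concrete, here is a minimal sketch (the subroutine body is assumed, not taken from the question) where the expression form is legal precisely because the dummy argument is intent(in):

program demo
  implicit none
  call mySubroutine([2.3d0, 1.5d0])  ! an expression as actual argument: fine for intent(in)
contains
  subroutine mySubroutine(arr)
    double precision, intent(in) :: arr(2)
    ! declaring intent(inout) or intent(out) here would make the call above invalid
    print *, arr(1) + arr(2)
  end subroutine mySubroutine
end program demo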

typed vs untyped vs expr vs stmt in templates and macros

I've lately been using templates and macros, but I have to say I've found barely any information about these important types. This is my superficial understanding:
typed/expr is something that must exist previously, but you can use .immediate. to overcome that.
untyped/stmt is something that doesn't need to be defined previously / one or more statements.
This is a very vague notion of the types. I'd like a better explanation of them, including which types should be used as return types.
The goal of these different parameter types is to give you several increasing levels of precision in specifying what the compiler should accept as a parameter to the macro.
Let's imagine a hypothetical macro that can solve mathematical equations. It will be used like this:
solve(x + 10 = 25) # figures out that the correct value for x is 15
Here, the macro just cares about the structure of the supplied AST tree. It doesn't require that the same tree is a valid expression in the current scope (i.e. that x is defined and so on). The macro just takes advantage of the Nim parser that already can decode most of the mathematical equations to turn them into easier to handle AST trees. That's what untyped parameters are for. They don't get semantically checked and you get the raw AST.
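As a minimal sketch (the macro name showAst is made up), an untyped parameter delivers the raw AST even when the identifiers in it don't exist in scope:

import macros

macro showAst(e: untyped): untyped =
  echo e.treeRepr         # the raw, unchecked AST, printed at compile time
  result = newEmptyNode() # generate no code

showAst(x + 10)           # compiles even though `x` is never declared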
On the next step in the precision ladder are the typed parameters. They allow us to write a generic kind of macro that will accept any expression, as long as it has a proper meaning in the current scope (i.e. its type can be determined). Besides catching errors earlier, this also has the advantage that we can now work with the type of the expression within the macro body (using the macros.getType proc).
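A similar sketch for typed (again with a made-up name): the argument must now make sense in the current scope, and in exchange its type is available inside the macro:

import macros

macro showType(e: typed): untyped =
  echo e.getType.repr  # the expression was sem-checked, so its type is known
  result = e           # pass the expression through unchanged

echo showType(1 + 1)   # should print "int" at compile time, then 2 at run time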
We can get even more precise by requiring an expression of a specific type (either a concrete type or a type class/concept). The macro will now be able to participate in overload resolution like a regular proc. It's important to understand that the macro will still receive an AST tree, as it will accept both expressions that can be evaluated at compile-time and expressions that can only be evaluated at run-time.
Finally, we can require that the macro receives a value of a specific type that is supplied at compile-time. The macro can work with this value to parametrise the code generation. This is the realm of static parameters. Within the body of the macro, they are no longer AST trees, but ordinary well-typed values.
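For example (names hypothetical), a static[int] parameter arrives as a plain integer that can drive the code generation:

import macros

macro repeatTimes(n: static[int]; body: untyped): untyped =
  # here `n` is an ordinary int, not an AST node
  result = newStmtList()
  for i in 0 ..< n:
    result.add copyNimTree(body)

repeatTimes(3):
  echo "hello"  # the block is expanded three times at compile time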
So far, we've only talked about expressions, but Nim's macros also accept and produce blocks and this is the second axis, which we can control. expr generally means a single expression, while stmt denotes a list of expressions (historically, its name comes from StatementList, which existed as a separate concept before expressions and statements were unified in Nim).
The distinction is most easily illustrated with the return types of templates. Consider the newException template from the system module:
template newException*(exceptn: typedesc, message: string): expr =
  ## creates an exception object of type ``exceptn`` and sets its ``msg`` field
  ## to `message`. Returns the new exception object.
  var
    e: ref exceptn
  new(e)
  e.msg = message
  e
Even though it takes several steps to construct an exception, by specifying expr as the return type of the template we tell the compiler that only the last expression will be considered the return value of the template. The rest of the statements will be inlined, but cleverly hidden from the calling code.
As another example, let's define a special assignment operator that can emulate the semantics of C/C++, allowing assignments within if statements:
template `:=` (a: untyped, b: typed): bool =
  var a = b
  a != nil

if f := open("foo"):
  ...
Specifying a concrete type has the same semantics as using expr. If we had used the default stmt return type instead, the compiler wouldn't have allowed us to pass a "list of expressions", because the if statement obviously expects a single expression.
.immediate. is a legacy from a long-gone past, when templates and macros didn't participate in overload resolution. When we first made them aware of the type system, plenty of code needed the current untyped parameters, but it was too hard to refactor the compiler to introduce them from the start, so instead we added the .immediate. pragma as a way to force the backward-compatible behaviour for the whole macro/template.
With typed/untyped, you have more granular control over the individual parameters of the macro, and the .immediate. pragma will be gradually phased out and deprecated.

Widening of integral types?

Imagine you have this function:
void foo(long l) { /* do something with l */}
Now you call it like so at the call site:
foo(65); // here 65 is of type int
Why, technically, when you specify in the declaration of your function that you are expecting a long and you pass just a number without the L suffix, is it treated as an int?
Now, I know it is because the C++ Standard says so; however, what is the technical reason that this 65 isn't just promoted to being of type long, which would save us the silly error of forgetting the L suffix to make it a long explicitly?
I have found this in the C++ Standard:
4.7 Integral conversions [conv.integral]
5 The conversions allowed as integral promotions are excluded from the set of integral conversions.
I can understand why a narrowing conversion isn't done implicitly, but here the destination type is obviously wider than the source type.
EDIT
This question is based on a question I saw earlier, which had funny behavior when you didn't specify the L suffix (example), but perhaps that's a C thing more than a C++ one?
In C++, objects and values have a type that is independent of how you use them. When you use them, if a different type is needed, they are converted appropriately.
The problem in the linked question is that varargs is not type-safe. It assumes that you pass in the correct types and that you decode them for what they are. While compiling the caller, the compiler does not know how the callee is going to decode each of the arguments, so it cannot possibly convert them for you. Effectively, varargs is as type-safe as converting to a void* and back to a different type: if you get it right you get what you pushed in; if you get it wrong you get trash.
Also note that in this particular case, with inlining, the compiler has enough information, but this is just a small instance of a general family of errors. Consider the printf family of functions: depending on the contents of the first argument, each of the remaining arguments is processed as a different type. Trying to fix this case at the language level would lead to inconsistencies, where in some cases the compiler does the right thing and in others the wrong one, and it would not be clear to the user when to expect which. It could even do the right thing today and the wrong one tomorrow, if during refactoring the function definition is moved out of reach of inlining, or if the logic of the function changes so that an argument is processed as one type or another based on some previous parameter.
The function in this instance does receive a long, not an int. The compiler automatically converts any argument to the required parameter type if it's possible without losing any information (as here). That's one of the main reasons function prototypes are important.
It's essentially the same as with an expression like (1L + 1) - because the integer 1 is not the right type, it's implicitly converted to a long to perform the calculation, and the result is a long.
If you pass 65L in this function call, no type conversion is necessary, but there's no practical difference - 65L is used either way.
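A short sketch of that point; the printf format is chosen to match the parameter type:

#include <cstdio>

void foo(long l) { std::printf("%ld\n", l); }

int main() {
    foo(65);  // 65 has type int; the prototype converts it to long at the call
    foo(65L); // already a long; no conversion needed, same result
}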
Although not C++, this is the relevant part of the C99 standard, which also explains the var args note:
If the expression that denotes the called function has a type that does include a prototype, the arguments are implicitly converted, as if by assignment, to the types of the corresponding parameters, taking the type of each parameter to be the unqualified version of its declared type. The ellipsis notation in a function prototype declarator causes argument type conversion to stop after the last declared parameter. The default argument promotions are performed on trailing arguments.
Why, technically, when you specify in the declaration of your function that you are expecting a long and you pass just a number without the L suffix, is it treated as an int?
Because the type of a literal is specified only by the form of the literal, not the context in which it is used. For an integer, that is int unless the value is too large for that type, or a suffix is used to specify another type.
Now, I know it is because the C++ Standard says so, however, what is the technical reason that this 65 isn't just promoted to being of type long and so save us the silly error of forgetting L suffix to make it a long explicitly?
The value should be promoted to long whether or not you specify that type explicitly, since the function is declared to take an argument of type long. If that's not happening, perhaps you could give an example of code that fails, and describe how it fails?
UPDATE: the example you give passes the literal to a function taking untyped ellipsis (...) arguments, not a typed long argument. In that case, the function caller has no idea what type is expected, and only the default argument promotions are applied. Specifically, a value of type int remains an int when passed through ellipsis arguments.
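A hedged sketch of the failure mode from the linked question: through an ellipsis only the default argument promotions apply, so an int stays an int:

#include <cstdio>

int main() {
    std::printf("%ld\n", 65L); // fine: the suffix makes the literal a long
    std::printf("%ld\n", 65);  // undefined behaviour: %ld expects long, but an int is passed
}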
The C standard states:
"The type of an integer constant is the first of the corresponding list in which its value can be represented."
In C89, this list is:
int, long int, unsigned long int
C99 extends that list to include:
long long int, unsigned long long int
As such, when your code is compiled, the literal 65 fits in an int, and so its type is accordingly int. The int is then converted to long when the function is called.
If, for instance, sizeof(int) == 2 and your literal is something like 64000, the type of the value will be long (assuming sizeof(long) > sizeof(int)).
The suffixes are used to override the default behavior and force the specified literal value to be of a certain type. This can be particularly useful when the integer promotion would be expensive (e.g. as part of an equation in a tight loop).
We have to have a standard meaning for types because for lower-level applications, the type REALLY matters, especially for integral types. Low-level operators (such as bit-shift and add) rely on the type of the input to determine where overflow occurs ((65 << 2) as an int is 260 (0x104), but truncated to a single 8-bit char it is 4 (0x04)). Sometimes you want this behavior, sometimes you don't. As a programmer, you just need to be able to always know what the compiler is going to do. Thus the design decision was made to have the human explicitly declare the integral types of their constants, with "undecorated" meaning the most commonly used type, int.
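Spelling out the shift example above as a sketch (assuming an 8-bit char; the truncation back to char is what produces 4):

#include <cstdio>

int main() {
    int  i = 65;
    char c = 65;
    std::printf("%d\n", i << 2);                    // 260 (0x104): plain int arithmetic
    std::printf("%d\n", static_cast<char>(c << 2)); // 4 (0x04): the result no longer fits in a char
}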
The compiler does automatically convert your constant expressions at compile time, so the effective value passed to the function is a long, but up until that conversion it is considered an int, for the reason above.

Explain ML type inference to a C++ programmer

How does ML perform the type inference in the following function definition:
let add a b = a + b
Is it like C++ templates, where no type-checking is performed until the point of template instantiation, after which, if the type supports the necessary operations, the function works, or else a compilation error is thrown?
i.e. for example, the following function template
template <typename NumType>
NumType add(NumType a, NumType b) {
return a + b;
}
will work for
add<int>(23, 11);
but won't work for
add<ostream>(cout, fout);
Is what I am guessing correct, or does ML type inference work differently?
PS: Sorry for my poor English; it's not my native language.
I suggest you have a look at this article: What is Hindley-Milner? (and why is it cool)
Here is the simplest example they use to explain type inference (it's not ML, but the idea is the same):
def foo(s: String) = s.length
// note: no explicit types
def bar(x, y) = foo(x) + y
Just looking at the definition of bar, we can easily see that its type must be (String, Int)=>Int. That's type inference in a nutshell. Read the whole article for more information and examples.
I'm not a C++ expert, but I think templates are closer to genericity/parametricity, which is a different mechanism.
I think trying to relate ML type inference to almost anything in C++ is more likely to lead to confusion than understanding. C++ just doesn't have anything that's much like type inference at all.
The only part of C++ that doesn't make typing explicit is templates, but (for the most part) they support generic programming. A C++ function template like you've given might apply equally to an unbounded set of types -- just for example, the code you have uses NumType as the template parameter, but would work with strings. A single program could instantiate your add to add two strings in one place, and two numbers in another place.
ML type inference isn't for generic programming. In C or C++, you explicitly define the type of a parameter, and then the compiler checks that everything you try to do with that parameter is allowed by that type. ML reverses that: it looks at the things you do with the parameter, and figures out what the type has to be for you to be able to do those things. If you've tried to do things that contradict each other, it'll tell you there is no type that can satisfy the constraints.
This would be pretty close to impossible in C or C++, largely because of all the implicit type conversions that are allowed. Just for example, if I have something like a + b in ML, it can immediately conclude that a and b must be ints -- but in C or C++, they could be almost any combination of integer or pointer or floating point types (with the constraint that they can't both be pointers) or user-defined types that overload operator+ (e.g., std::string). In ML, finding types can be exponential in the worst case, but is almost always pretty fast. In C++, I'd estimate it being exponential much more often, and in a lot of cases it would probably be under-constrained, so a given function could have any of a number of different signatures.
ML uses Hindley-Milner type inference. In this simple case all it has to do is look at the body of the function and see that it uses + with the arguments and returns that. Thus it can infer that the arguments must be the type of arguments that + accepts (i.e. ints) and the function returns the type that + returns (also int). Thus the inferred type of add is int -> int -> int.
Note that in SML (but not Caml) + is also defined for types other than int, but it will still infer int when there are multiple possibilities (i.e. the add function you defined cannot be used to add two floats).
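The question's let add a b = a + b is actually OCaml (Caml) syntax, where + is defined only for int, so the inference is easy to observe in the toplevel (a minimal sketch):

let add a b = a + b  (* inferred: val add : int -> int -> int *)
let five = add 2 3   (* ok *)
(* add 2.0 3.0 would be rejected: OCaml uses +. for floats *)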
F# and ML are somewhat similar with regard to type inference, so you might find Overview of type inference in F# helpful.