typed vs untyped vs expr vs stmt in templates and macros

I've lately been using templates and macros, but I have to say I've barely found information about these important types. This is my superficial understanding:
typed/expr is something that must exist previously, but you can use .immediate. to overcome that.
untyped/stmt is something that doesn't need to be defined previously / one or more statements.
This is a very vague notion of the types. I'd like a better explanation of them, including which types should be used as return types.

The goal of these different parameter types is to give you several increasing levels of precision in specifying what the compiler should accept as a parameter to the macro.
Let's imagine a hypothetical macro that can solve mathematical equations. It will be used like this:
solve(x + 10 = 25) # figures out that the correct value for x is 15
Here, the macro just cares about the structure of the supplied AST tree. It doesn't require that the same tree is a valid expression in the current scope (i.e. that x is defined and so on). The macro just takes advantage of the Nim parser that already can decode most of the mathematical equations to turn them into easier to handle AST trees. That's what untyped parameters are for. They don't get semantically checked and you get the raw AST.
On the next step in the precision ladder are the typed parameters. They allow us to write a generic kind of macro that will accept any expression, as long as it has a proper meaning in the current scope (i.e. its type can be determined). Besides catching errors earlier, this also has the advantage that we can now work with the type of the expression within the macro body (using the macros.getType proc).
We can get even more precise by requiring an expression of a specific type (either a concrete type or a type class/concept). The macro will now be able to participate in overload resolution like a regular proc. It's important to understand that the macro will still receive an AST tree, as it will accept both expressions that can be evaluated at compile-time and expressions that can only be evaluated at run-time.
Finally, we can require that the macro receives a value of a specific type that is supplied at compile-time. The macro can work with this value to parametrise the code generation. This is the realm of static parameters. Within the body of the macro, they are no longer AST trees, but rather ordinary, well-typed values.
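To make the ladder concrete, here is a minimal sketch (hypothetical macros, not from the standard library): a typed parameter whose type we can inspect with getType, and a static parameter that is an ordinary value inside the macro body:
import macros

# `typed` parameter: the argument is semantically checked,
# so we can ask for its type.
macro showType(e: typed): untyped =
  echo e.getType.repr
  result = e

# `static` parameter: `n` is a plain int inside the macro,
# not an AST node.
macro repeatCode(n: static[int], body: untyped): untyped =
  result = newStmtList()
  for i in 0 ..< n:
    result.add copyNimTree(body)

showType(1 + 2)    # prints "int" at compile time
repeatCode(3):
  echo "hi"        # expands to three echo statements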
So far, we've only talked about expressions, but Nim's macros also accept and produce blocks and this is the second axis, which we can control. expr generally means a single expression, while stmt denotes a list of expressions (historically, its name comes from StatementList, which existed as a separate concept before expressions and statements were unified in Nim).
The distinction is most easily illustrated with the return types of templates. Consider the newException template from the system module:
template newException*(exceptn: typedesc, message: string): expr =
  ## creates an exception object of type ``exceptn`` and sets its ``msg`` field
  ## to `message`. Returns the new exception object.
  var
    e: ref exceptn
  new(e)
  e.msg = message
  e
Even though it takes several steps to construct an exception, by specifying expr as the return type of the template we tell the compiler that only the last expression will be considered the return value of the template. The rest of the statements will be inlined, but cleverly hidden from the calling code.
As another example, let's define a special assignment operator that can emulate the semantics of C/C++, allowing assignments within if statements:
template `:=` (a: untyped, b: typed): bool =
  var a = b
  a != nil

if f := open("foo"):
  ...
Specifying a concrete type has the same semantics as using expr. If we had used the default stmt return type instead, the compiler wouldn't have allowed us to pass a "list of expressions", because the if statement obviously expects a single expression.
.immediate. is a legacy from a long-gone past, when templates and macros didn't participate in overload resolution. When we first made them aware of the type system, plenty of code needed what are now the untyped parameters, but it was too hard to refactor the compiler to introduce them from the start, so instead we added the .immediate. pragma as a way to force the backward-compatible behaviour for the whole macro/template.
With typed/untyped, you have more granular control over the individual parameters of the macro, and the .immediate. pragma will be gradually phased out and deprecated.

Related

In OCaml Menhir, how to write a parser for C++/Rust/Java-style generics

In C++, a famous parsing ambiguity happens with code like
x<T> a;
If T is a type, it is what it looks like (a declaration of a variable a of type x<T>); otherwise it is (x < T) > a (< and > are comparison operators, not angle brackets).
In fact, we could make a change to render this unambiguous: we could make < and > non-associative. Then x < T > a, without parentheses, would not be a valid sentence anyway, even if x, T and a were all variable names.
How could one resolve this conflict in Menhir? At first glance it seems we just can't. Even with the aforementioned modification, we need to look ahead an indeterminate number of tokens before we see another closing > to conclude that it was a template instantiation, or otherwise to conclude that it was an expression. Is there any way in Menhir to implement such arbitrary lookahead?
Different languages (including the ones listed in your title) actually have very different rules for templates/generics (like what type of arguments there can be, where templates/generics can appear, when they are allowed to have an explicit argument list and what the syntax for template/type arguments on generic methods is), which strongly affect the options you have for parsing. In no language that I know is it true that the meaning of x<T> a; depends on whether T is a type.
So let's go through the languages C++, Java, Rust and C#:
In all four of those languages, both types and functions/methods can be templates/generic. So we'll not only have to worry about an ambiguity with variable declarations, but also with function/method calls: is f<T>(x) a function/method call with an explicit template/type argument, or is it two relational operators with the last operand parenthesized? In all four languages, template/generic functions/methods can be called without explicit template/type arguments when those can be inferred, but that inference isn't always possible, so just disallowing explicit template/type arguments for function/method calls is not an option.
Even if a language does not allow relational operators to be chained, we could get an ambiguity in expressions like this: f(a<b, c, d>(e)). Is this calling f with the three arguments a<b, c and d>(e), or with the single argument a<b, c, d>(e), calling a function/method named a with the type/template arguments b, c, d?
Now beyond this common foundation, most everything else is different between these languages:
Rust
In Rust the syntax for a variable declaration is let variableName: type = expr;, so x<T> a; couldn't possibly be a variable declaration because that doesn't match the syntax at all. In addition it's also not a valid expression statement (anymore) because comparison operators can't be chained (anymore).
So there's no ambiguity here or even a parsing difficulty. But what about function calls? For function calls, Rust avoided the ambiguity by simply choosing a different syntax to provide type arguments: instead of f<T>(x) the syntax is f::<T>(x). Since type arguments for function calls are optional when they can be inferred, this ugliness is thankfully not necessary very often.
So in summary: let a: x<T> = ...; is a variable declaration, f(a<b, c, d>(e)); calls f with three arguments and f(a::<b, c, d>(e)); calls a with three type arguments. Parsing is easy because all of these are sufficiently different to be distinguished with just one token of lookahead.
Java
In Java x<T> a; is in fact a valid variable declaration, but it is not a valid expression statement. The reason for that is that Java's grammar has a dedicated non-terminal for expressions that can appear as an expression statement and applications of relational operators (or any other non-assignment operators) are not matched by that non-terminal. Assignments are, but the left side of assignment expressions is similarly restricted. In fact, an identifier can only be the start of an expression statement if the next token is either a =, ., [ or (. So an identifier followed by a < can only be the start of a variable declaration, meaning we only need one token of lookahead to parse this.
Note that when accessing static members of a generic class, you can and must refer to the class without type arguments (i.e. FooClass.bar(); instead of FooClass<T>.bar()), so even in that case the class name would be followed by a ., not a <.
But what about generic method calls? Something like y = f<T>(x); could still run into the ambiguity because relational operators are of course allowed on the right side of =. Here Java chooses a similar solution as Rust by simply changing the syntax for generic method calls. Instead of object.f<T>(x) the syntax is object.<T>f(x) where the object. part is non-optional even if the object is this. So to call a generic method with an explicit type argument on the current object, you'd have to write this.<T>f(x);, but like in Rust the type argument can often be inferred, allowing you to just write f(x);.
So in summary x<T> a; is a variable declaration and there can't be expression statements that start with relational operations; in general expressions this.<T>f(x) is a generic method call and f<T>(x); is a comparison (well, a type error, actually). Again, parsing is easy.
C#
C# has the same restrictions on expression statements as Java does, so variable declarations aren't a problem, but unlike the previous two languages it does allow f<T>(x) as the syntax for generic method calls. In order to avoid ambiguities, relational operators need to be parenthesized when used in a way that could also be a valid call of a generic method. So the expression f<T>(x) is a method call, and you'd need to add parentheses, f<(T>(x)) or (f<T)>(x), to make it a comparison (though actually those would be type errors, because you can't compare booleans with < or >, but the parser doesn't care about that). Similarly, f(a<b, c, d>(e)) calls a generic method named a with the type arguments b, c, d, whereas f((a<b), c, (d>(e))) would involve two comparisons (and you can in fact leave out one of the two pairs of parentheses).
This leads to a nicer syntax for method calls with explicit type arguments than in the previous two languages, but parsing becomes kind of tricky. Considering that in the above example f(a<b, c, d>(e)) we can actually place an arbitrary number of arguments before d>(e) and a<b is a perfectly valid comparison if not followed by d>(e), we actually need an arbitrary amount of lookahead, backtracking or non-determinism to parse this.
So in summary: x<T> a; is a variable declaration, there is no expression statement that starts with a comparison, f<T>(x) is a method call expression, and (f<T)>(x) or f<(T>(x)) would be (ill-typed) comparisons. It is impossible to parse C# with Menhir.
C++
In C++, a < b; is a valid (albeit useless) expression statement, the syntax for template function calls with explicit template arguments is f<T>(x), and a<b>c can be a perfectly valid (even well-typed) comparison. So statements like a<b>c; and expressions like a<b>(c) are actually ambiguous without additional information. Further, template arguments in C++ don't have to be types. That is, Foo<42> x; or even Foo<c> x;, where c is defined as const int c = 42;, for example, could be perfectly valid instantiations of the Foo template if Foo is defined to take an integer as a template argument. So that's a bummer.
To resolve this ambiguity, the C++ grammar refers to the rule template-name instead of identifier in places where the name of a template is expected. So if we treated these as distinct entities, there'd be no ambiguity here. But of course template-name is defined simply as template-name: identifier in the grammar, so that seems pretty useless, ... except that the standard also says that template-name should only be matched when the given identifier names a template in the current scope. Similarly it says that identifiers should only be interpreted as variable names when they don't refer to a template (or type name).
Note that, unlike the previous three languages, C++ requires all types and templates to be declared before they can be used. So when we see the statement a<b>c;, we know that it can only be a template instantiation if we've previously parsed a declaration for a template named a and it is currently in scope.
So, if we keep track of scopes while parsing, a hand-written parser can simply use if-statements to check whether the name a refers to a previously parsed template or not. In parser generators that allow semantic predicates, we can do the same thing. Doing this does not even require any lookahead or backtracking.
But what about parser generators like yacc or Menhir that don't support semantic predicates? For these we can use something known as the lexer hack, meaning we make the lexer generate different tokens for type names, template names and ordinary identifiers. Then we have a nicely unambiguous grammar that we can feed our parser generator. Of course the trick is getting the lexer to actually do that. In order to accomplish that, we need to keep track of which templates and types are currently in scope using a symbol table and then access that symbol table from the lexer. We'll also need to tell the lexer when we're reading the name of a definition, like the x in int x;, because then we want to generate a regular identifier even if a template named x is currently in scope (the definition int x; would shadow the template until the variable goes out of scope).
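As a rough sketch of what the OCaml side of that might look like (the Scope module and the token names here are made up for illustration; a real Menhir grammar would declare its tokens in the .mly file):
(* Hypothetical symbol table shared between the parser's semantic
   actions (which fill it) and the lexer (which reads it). *)
module Scope = struct
  let templates : (string, unit) Hashtbl.t = Hashtbl.create 64
  let declare_template name = Hashtbl.replace templates name ()
  let is_template name = Hashtbl.mem templates name
end

(* Tokens as the generated parser might see them. *)
type token =
  | IDENT of string          (* ordinary identifier *)
  | TEMPLATE_NAME of string  (* identifier known to name a template *)
  | LESS | GREATER

(* Called by the lexer whenever it recognizes an identifier: the same
   lexeme yields different tokens depending on what is in scope. *)
let classify_ident name =
  if Scope.is_template name then TEMPLATE_NAME name else IDENT name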
This same approach is used to resolve the casting ambiguity (is (T)(x) a cast of x to type T or a function call of a function named T?) in C and C++.
So in summary, foo<T> a; and foo<T>(x) are template instantiations if and only if foo is a template. Parsing's a bitch, but possible without arbitrary lookahead or backtracking, and even using Menhir when applying the lexer hack.
AFAIK, C++'s template syntax is a well-known example of a real-world non-LR grammar. Strictly speaking, it is not LR(k) for any finite k... So C++ parsers are usually hand-written with hacks (like Clang's) or generated from a GLR grammar (LR with branching). So in theory it is impossible to implement a complete C++ parser in Menhir, which is LR.
However, even the same syntax for generics can work out differently. If generic types and expressions involving comparison operators never appear in the same context, the grammar may still be LR-compatible. For example, consider the Rust syntax for variable declarations (for this part only):
let x : Vec<T> = ...
The : token indicates that a type, rather than an expression, follows, so in this case the grammar can be LR, or even LL (not verified).
So the final answer is, it depends. But for the C++ case it should be impossible to implement the syntax in Menhir.

Flex/Bison: cannot use semantic_type

I'm trying to create a C++ flex/bison parser. I used this tutorial as a starting point and did not change any bison/flex configuration. I am now stuck trying to unit test the lexer.
I have a function in my unit tests that directly calls yylex, and checks the result of it:
private:
static void checkIntToken(MyScanner &scanner, Compiler *comp, unsigned long expected,
                          unsigned char size, char isUnsigned, unsigned int line,
                          const std::string &label) {
    yy::MyParser::location_type loc;
    yy::MyParser::semantic_type semantic; // <---- it seems like the destructor of this variable causes the crash
    int type = scanner.yylex(&semantic, &loc, comp);
    Assert::equals(yy::MyParser::token::INT, type, label + "__1");
    MyIntToken* token = semantic.as<MyIntToken*>();
    Assert::equals(expected, token->value, label + "__2");
    Assert::equals(size, token->size, label + "__3");
    Assert::equals(isUnsigned, token->isUnsigned, label + "__4");
    Assert::equals(line, loc.begin.line, label + "__5");
    // execution comes to this point, and then the program crashes
}
The error message is:
program: ../src/__autoGenerated__/MyParser.tab.hh:190: yy::variant<32>::~variant() [S = 32]: Assertion `!yytypeid_' failed.
I have tried to follow the logic in the auto-generated bison files, and make some sense out of it. But I did not succeed on that and ultimately gave up. I searched then for any advice on the web about this error message but did not find any.
The location indicated by the error has the following code:
~variant () {
  YYASSERT (!yytypeid_);
}
EDIT: The problem disappears only if I remove the
%define parse.assert
option from the bison file. But I am not sure if this is a good idea...
What is the proper way to obtain the value of the token generated by flex, for unit testing purposes?
Note: I've tried to explain bison variant types to the best of my knowledge. I hope it is accurate but I haven't used them aside from some toy experiments. It would be an error to assume that this explanation in any way implies an endorsement of the interface.
The so-called "variant" type provided by bison's C++ interface is not a general-purpose variant type. That was a deliberate decision based on the fact that the parser is always able to figure out the semantic type associated with a semantic value on the parser stack. (This fact also allows a C union to be used safely within the parser.) Recording type information within the "variant" would therefore be redundant. So they don't. In that sense, it is not really a discriminated union, despite what one might expect of a type named "variant".
(The bison variant type is a template with an integer (non-type) template argument. That argument is the size in bytes of the largest type which is allowed in the variant; it does not in any other way specify the possible types. The semantic_type alias serves to ensure that the same template argument is used for every bison variant object in the parser code.)
Because it is not a discriminated union, its destructor cannot destruct the current value; it has no way to know how to do that.
This design decision is actually mentioned in the (lamentably insufficient) documentation for the Bison "variant" type. (When reading this, remember that it was originally written before std::variant existed. These days, it would be std::variant which was being rejected as "redundant", although it is also possible that the existence of std::variant might have had the happy result of revisiting this design decision). In the chapter on C++ Variant Types, we read:
Warning: We do not use Boost.Variant, for two reasons. First, it appeared unacceptable to require Boost on the user’s machine (i.e., the machine on which the generated parser will be compiled, not the machine on which bison was run). Second, for each possible semantic value, Boost.Variant not only stores the value, but also a tag specifying its type. But the parser already “knows” the type of the semantic value, so that would be duplicating the information.
Therefore we developed light-weight variants whose type tag is external (so they are really like unions for C++ actually).
And indeed they are. So any use of a bison "variant" must have a definite type:
You can build a variant with an argument of the type to build. (This is the only case where you don't need a template parameter, because the type is deduced from the argument. You would have to use an explicit template parameter only if the argument were not of the precise type; for example, an integer of lesser rank.)
You can get a reference to the value of known type T with as<T>. (This is undefined behaviour if the value has a different type.)
You can destruct the value of known type T with destroy<T>.
You can copy or move the value from another variant of known type T with copy<T> or move<T>. (move<T> involves constructing and then destructing a T(), so you might not want to do it if T had an expensive default constructor. On the whole, I'm not convinced by the semantics of the move method. And its name conflicts semantically with std::move, but again it came first.)
You can swap the values of two variants which both have the same known type T with swap<T>.
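Putting that discipline together, a minimal sketch (using the question's yy::MyParser types and the build/as/destroy interface described above; this is an illustration, not a drop-in snippet):
#include "MyParser.tab.hh"  // the bison-generated header from the question

void variant_discipline_demo() {
    yy::MyParser::semantic_type v;  // starts out empty (yytypeid_ is NULL)
    v.build<int>(42);               // construct a value of known type int
    int n = v.as<int>();            // read it back; UB if the type is wrong
    (void)n;
    v.destroy<int>();               // destroy before ~variant() runs,
}                                   // otherwise parse.assert's YYASSERT fires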
Now, the generated parser understands all these restrictions, and it always knows the real types of the "variants" it has at its disposal. But you might come along and try to do something with one of these objects in a way that violates a constraint. Since the object really doesn't have any way to check the constraint, you'll end up with undefined behaviour which will probably have some disastrous eventual consequence.
So they also implemented an option which allows the "variant" to check the constraints. Unsurprisingly, this consists of adding a discriminator. But since the discriminator is only used to validate and not to modify behaviour, it is not a small integer which chooses between a small number of known alternatives, but rather a pointer to a std::type_info (or NULL if the variant does not yet contain a value). (To be fair, in most cases alignment constraints mean that using a pointer for this purpose is no more expensive than using a small enum. All the same...)
So that's what you're running into. You enabled assertions with %define parse.assert; that option was provided specifically to prevent you from doing what you are trying to do, which is let the variant object's destructor run before the variant's value is explicitly destructed.
So the "correct" way to avoid the problem is to insert an explicit call at the end of the scope:
// execution comes to this point, and then, without the following
// call, the program will fail on an assertion
semantic.destroy<MyIntToken*>();
}
With the parse assertion enabled, the variant object will be able to verify that the types specified as template parameters to semantic.as<T> and semantic.destroy<T> are the same types as the value stored in the object. (Without parse.assert, that too is your responsibility.)
Warning: opinion follows.
In case anyone reading this cares, my preference for using real std::variant types comes from the fact that it is actually quite common for the semantic value of an AST node to require a discriminated union. The usual solution (in C++) is to construct a type hierarchy which is, in some ways, entirely artificial, and it is quite possible that std::variant can better express the semantics.
In practice, I use the C interface and my own discriminated union implementation.

Same block of code in a SELECT TYPE in Fortran

I'm wondering if there is an elegant way to avoid repeating a block of code that applies to different types in a SELECT TYPE construct. Consider for example:
select type (var)
  type is (t1_t)
    codeA (many lines of code)
  type is (t2_t)
    codeA (same lines)
  ...
  type is (tn_t)
    codeB
  class default
    codeC
end select
Unlike the select case construct, where you can group multiple tests in the same case statement, there is no such facility in the select type construct.
The reason is that within the block of each type-guard statement, the selector (the variable or expression you're matching) has the type named in the type-guard statement, and is not polymorphic there. This is how you can have dynamic type resolution in Fortran, which is statically typed.
select type (var)
  type is (t1_t)
    ! Here, the type of var is t1_t, and you can call
    ! procedures that take type(t1_t) arguments with var
  type is (t2_t)
    ! Here, the type of var is t2_t, and procedures expecting
    ! type(t1_t) arguments won't work with var
  ...
end select
Therefore, the compiler can't allow you to group many types in the same type guard, because it wouldn't know which dynamic type to assume for the selector.
As a side note, unlike the switch statement in C-derived languages, select constructs in Fortran do not fall through, i.e., after a match, the corresponding block is executed and control exits the construct.
As @cup said, you can move whatever the branches have in common into a subroutine to avoid repetition. But be aware that if you need to pass var to the routine, you must declare the corresponding dummy argument as polymorphic (class rather than type).
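A minimal sketch of that, assuming t1_t and t2_t both extend a common base type base_t (all names here are made up for illustration):
! The shared lines move into one subroutine with a polymorphic dummy.
subroutine codeA_common(var)
  class(base_t), intent(inout) :: var
  ! ... the many lines of codeA, written against base_t ...
end subroutine codeA_common

! The select type construct then just dispatches to it:
select type (var)
  type is (t1_t)
    call codeA_common(var)
  type is (t2_t)
    call codeA_common(var)
end select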
As francescalus pointed out in the comments, the typical way to repeat the same lines of code is to use an include file. This is a traditional technique which used to be widely used for common blocks and is also used for poor-man's templating.
included.f90:
codeA (many lines of code)
main.f90:
select type (var)
  type is (t1_t)
    include "included.f90"
  type is (t2_t)
    include "included.f90"
  ...
end select
If you use the C preprocessor, you can use the preprocessor's #include as well.
There is also the overloaded-procedure kind of construct that 'type is' suggests: a generic interface in a module, with a specific subroutine or function for each type. I mostly use functions, but if one is mixing with C, then subroutines may be easier than a function trying to return a vector.
It will still be the same exact code, just operating on a complex, real, double precision, or one of the integer kinds... and there is no explicit 'type of'; the right version is selected based upon the argument types.
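For example, a minimal sketch of such a generic interface (hypothetical names; the compiler picks the specific procedure from the argument types):
module add_mod
  implicit none
  ! One generic name, several specific procedures.
  interface add
    module procedure add_real, add_int
  end interface
contains
  function add_real(a, b) result(c)
    real, intent(in) :: a, b
    real :: c
    c = a + b
  end function add_real
  function add_int(a, b) result(c)
    integer, intent(in) :: a, b
    integer :: c
    c = a + b
  end function add_int
end module add_mod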

How does a parser for C++ differentiate between comparisons and template instantiations?

In C++, the symbols '<' and '>' are used for comparisons as well as for signifying a template argument. Thus, the code snippet
[...] Foo < Bar > [...]
might be interpreted in either of the following two ways:
An object of type Foo with template argument Bar
Compare Foo to Bar, then compare the result to whatever comes next
How does the parser for a C++ compiler efficiently decide between those two possibilities?
If Foo is known to be a template name (e.g. a template <...> Foo ... declaration is in scope, or the compiler sees a template Foo sequence), then Foo < Bar cannot be a comparison. It must be a beginning of a template instantiation (or whatever Foo < Bar > is called this week).
If Foo is not a template name, then Foo < Bar is a comparison.
In most cases it is known what Foo is, because identifiers generally have to be declared before use, so there's no problem deciding one way or the other. There's one exception though: parsing template code. If Foo<Bar> is inside a template, and the meaning of Foo depends on a template parameter, then it is not known whether Foo is a template or not. The language standard directs the compiler to treat it as a non-template unless it is preceded by the keyword template.
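A minimal sketch of that rule (the names g, T and foo here are made up for illustration):
// Inside a template, whether t.foo names a member template depends on T,
// so the standard says to parse it as a non-template unless told otherwise.
template <typename T>
void g(T& t) {
    t.template foo<int>(42);  // the `template` keyword forces the template parse
    // t.foo<int>(42);        // would be parsed as (t.foo < int) > (42)
}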
The parser might implement this by feeding context back to the lexer. The lexer recognizes Foo as different types of tokens, depending on the context provided by the parser.
The important point to remember is that the C++ grammar is not context-free. I.e., when the parser sees Foo < Bar, it (in most cases) knows that Foo refers to a template definition (by looking it up in the symbol table), and thus that < cannot be a comparison.
There are difficult cases, when you literally have to guide the parser. For example, suppose you are writing a class template with a template member function, which you want to specialize explicitly. You might have to use syntax like:
a->template foo<int>();
(in some cases; see Calling template function within template class for details)
Also, comparisons inside non-type template arguments must be surrounded by parentheses, i.e.:
foo<(A > B)>
not
foo<A > B>
Non-static data member initializers bring more fun: http://open-std.org/JTC1/SC22/WG21/docs/cwg_active.html#325
C and C++ parsers are "context sensitive", in other words, for a given token or lexeme, it is not guaranteed to be distinct and have only one meaning - it depends on the context within which the token is used.
So, the parser part of the compiler will know (by understanding "where in the source it is") whether it is parsing some kind of type or some kind of comparison. (This is NOT simple to know, which is why reading the source of a competent C or C++ compiler is not entirely straightforward: there are lots of conditions and function calls checking "is this one of these? if so, do this, else do something else".)
The keyword template helps the compiler understand what is going on, but in most cases the compiler simply knows, because < doesn't make sense in the other interpretation; and if it doesn't make sense in EITHER form, then it's an error, and it's just a matter of trying to figure out what the programmer might have wanted. This is one of the reasons that a simple mistake such as a missing } or template can sometimes lead the entire parse astray and result in hundreds or thousands of errors (although sane compilers stop after a reasonable number, so as not to fill the entire universe with error messages).
Most of the answers here confuse determining the meaning of the symbol (what I call "name resolution") with parsing (defined narrowly as "can read the syntax of the program").
You can do these tasks separately.
What this means is that you can build a completely context-free parser for C++ (as my company, Semantic Designs, does), and leave the issue of deciding what a symbol means to an explicitly separate following task.
Now, that task is driven by the possible syntax interpretations of the source code. In our parsers, these are captured as ambiguities in the parse.
What name resolution does is collect information about the declarations of names, and use that information to determine which of the ambiguous parses doesn't make sense, and simply drop those. What remains is a single valid parse, with a single valid interpretation.
The machinery to accomplish name resolution in practice is a big mess. But that's the C++ committee's fault, not the parser's or the name resolver's. The ambiguity removal with our tool is actually done automatically, making that part quite nice, but you would not appreciate that without looking inside our tools; we do, because it means a small engineering team was able to build it.
See an example of resolution of template-vs-less-than on C++'s most vexing parse, done by our parser.

Explain ML type inference to a C++ programmer

How does ML perform the type inference in the following function definition:
let add a b = a + b
Is it like C++ templates, where no type-checking is performed until the point of template instantiation, after which, if the type supports the necessary operations, the function works, or else a compilation error is thrown?
i.e. for example, the following function template
template <typename NumType>
NumType add(NumType a, NumType b) {
    return a + b;
}
will work for
add<int>(23, 11);
but won't work for
add<ostream>(cout, fout);
Is what I'm guessing correct, or does ML type inference work differently?
PS: Sorry for my poor English; it's not my native language.
I suggest you have a look at this article: What is Hindley-Milner? (and why is it cool)
Here is the simplest example they use to explain type inference (it's not ML, but the idea is the same):
def foo(s: String) = s.length
// note: no explicit types
def bar(x, y) = foo(x) + y
Just looking at the definition of bar, we can easily see that its type must be (String, Int)=>Int. That's type inference in a nutshell. Read the whole article for more information and examples.
I'm not a C++ expert, but I think templates are closer to genericity/parametricity, which is something different.
I think trying to relate ML type inference to almost anything in C++ is more likely to lead to confusion than understanding. C++ just doesn't have anything that's much like type inference at all.
The only part of C++ that doesn't make typing explicit is templates, but (for the most part) they support generic programming. A C++ function template like you've given might apply equally to an unbounded set of types: just for example, the code you have uses NumType as the template parameter, but would work with strings. A single program could instantiate your add to add two strings in one place, and two numbers in another place.
ML type inference isn't for generic programming. In C or C++, you explicitly define the type of a parameter, and then the compiler checks that everything you try to do with that parameter is allowed by that type. ML reverses that: it looks at the things you do with the parameter, and figures out what the type has to be for you to be able to do those things. If you've tried to do things that contradict each other, it'll tell you there is no type that can satisfy the constraints.
This would be pretty close to impossible in C or C++, largely because of all the implicit type conversions that are allowed. Just for example, if I have something like a + b in ML, it can immediately conclude that a and b must be ints, but in C or C++ they could be almost any combination of integer, pointer or floating-point types (with the constraint that they can't both be pointers), or user-defined types that overload operator+ (e.g., std::string). In ML, finding types can be exponential in the worst case, but is almost always pretty fast. In C++, I'd estimate it being exponential much more often, and in a lot of cases it would probably be under-constrained, so a given function could have any of a number of different signatures.
ML uses Hindley-Milner type inference. In this simple case all it has to do is look at the body of the function and see that it uses + with the arguments and returns that. Thus it can infer that the arguments must be the type of arguments that + accepts (i.e. ints) and the function returns the type that + returns (also int). Thus the inferred type of add is int -> int -> int.
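For instance, an OCaml toplevel reports the inferred type directly:
# let add a b = a + b;;
val add : int -> int -> int = <fun>
(In OCaml, + is int-only; for floats you would write a +. b, and the inferred type would be float -> float -> float.)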
Note that in SML (but not Caml) + is also defined for types other than int, but it will still infer int when there are multiple possibilities (i.e. the add function you defined cannot be used to add two floats).
F# and ML are somewhat similar with regard to type inference, so you might find
Overview of type inference in F#
helpful.