Clojure static typing

I know that this may sound like blasphemy to Lisp aficionados (and other lovers of dynamic languages), but how difficult would it be to enhance the Clojure compiler to support static (compile-time) type checking?
Setting aside the arguments for and against static and dynamic typing, is this possible (not "is this advisable")?
I was thinking that adding a new reader macro to force a compile-time type (an enhanced version of the #^ macro) and adding the type information to the symbol table would allow the compiler to flag places where a variable was misused. For example, in the following code, I would expect a compile-time error (#* is the "compile-time" type macro):
(defn get-length [#*String s] (.length s))
(defn test-get-length [] (get-length 2.0))
The #^ macro could even be reused with a global variable (*compile-time-type-checking*) to force the compiler to do the checks.
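For contrast, here is what the existing ^ hints buy you today (a small sketch): they guide the compiler and avoid reflection, but nothing is type-checked at compile time.
(set! *warn-on-reflection* true)

(defn get-length [^String s] (.length s))

;; This compiles without complaint and only fails at runtime
;; with a ClassCastException:
;; (get-length 2.0)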
Any thoughts on the feasibility?

It's certainly possible. However, I do not think that Clojure will ever get any form of weak static typing - its benefits are too few.
Rich Hickey has, however, expressed on several occasions his liking for the strong, optional, and expressive typing of the Qi language: http://www.lambdassociates.org/qilisp.htm

It's certainly possible. The compiler already does some static type checking around primitive argument types in the 1.3 development branch.

Yes! It looks like there is a project underway, core.typed, to make optional static type checking a reality. See the GitHub project and its documentation.
This work grew out of an undergraduate honours dissertation (PDF) by Ambrose Bonnaire-Sergeant, and is related to the Typed Racket system.
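For a flavour of what core.typed looks like in use, a sketch (the namespace name is invented, and the exact annotation aliases such as t/Int vary between versions of the library):
(ns example.core
  (:require [clojure.core.typed :as t]))

(t/ann get-length [String -> t/Int])
(defn get-length [s] (.length s))

;; (get-length 2.0) is then rejected when you run (t/check-ns),
;; rather than at ordinary compile time.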

Since one form is read AND evaluated at a time, you cannot have forward references, which makes this somewhat limited.

Old question, but two important points: I don't think Clojure supports reader macros, only ordinary Lisp macros. And now we have the core.typed option for typing in Clojure.

declare can have type hints, so it is possible to declare a var that "is" the type which has not been defined yet but contains data about the structure; however, this would be really clunky, and you would have to do it before any code path that could be executed before the type is defined. Basically, you would want to define all of your user-defined types up front and then use them like normal. I think that makes library writing somewhat hackish.
I didn't mean to suggest earlier that this isn't possible, just that for user defined types it is a lot more complicated than for pre-defined types. The benefit of doing this vs. the cost is something that should be seriously considered. But I encourage anyone who is interested to try it out and see if they can make it work!

How is is_standard_layout useful?

From what I understand, standard layout allows three things:
Empty base class optimization
Backwards compatibility with C with certain pointer casts
Use of offsetof
Now, the library includes the is_standard_layout predicate metafunction, but I can't see much use for it in generic code, as the C-compatibility features I listed above seem to need checking only very rarely there. The only thing I can think of is using it inside static_assert, but that is only to make code more robust and isn't required.
How is is_standard_layout useful? Are there any things which would be impossible without it, thus requiring it in the standard library?
General response
It is a way of validating assumptions. You wouldn't want to write code that assumes standard layout if that wasn't the case.
C++11 provides a bunch of utilities like this. They are particularly valuable for writing generic code (templates) where you would otherwise have to trust the client code to not make any mistakes.
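For instance, a sketch of validating such an assumption before relying on offsetof (the Packet type here is invented for illustration):
#include <cstddef>
#include <type_traits>

struct Packet { int id; double payload; };

// offsetof is only guaranteed for standard-layout types, so assert it:
static_assert(std::is_standard_layout<Packet>::value,
              "Packet must be standard-layout for offsetof to be valid");

const std::size_t payload_offset = offsetof(Packet, payload);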
Notes specific to is_standard_layout
It looks to me like the (pseudo code) definition of is_pod would roughly be...
// note: applied recursively to all members
bool is_pod(T) { return is_standard_layout(T) && is_trivial(T); }
So, you need to know is_standard_layout in order to implement is_pod. Given that, we might as well expose is_standard_layout as a tool available to library developers. Also of note: if you have a use-case for is_pod, you might want to consider the possibility that is_standard_layout might actually be a better (more accurate) choice in that case, since POD is essentially a subset of standard layout.
I get the feeling that they added every conceivable variant of type evaluation, regardless of any obvious value, just in case someone might encounter a need sometime before the next standard comes out. I doubt if piling on these "extra" type properties adds a significant additional burden to compiler developers.
There is a nice discussion of standard layout here: Why is C++11's POD "standard layout" definition the way it is?
There is also a lot of good detail at cppreference.com: Non-static data members

Should I avoid using Monad fail?

I'm fairly new to Haskell and have been slowly getting the idea that there's something wrong with the existence of Monad fail. Real World Haskell warns against its use ("Once again, we recommend that you almost always avoid using fail!"). I just noticed today that Ross Paterson called it "a wart, not a design pattern" back in 2008 (and seemed to get quite some agreement in that thread).
While watching Dr Ralf Lämmel's talk on the essence of functional programming, I started to understand a possible tension which may have led to Monad fail. In the lecture, Ralf talks about adding various monadic effects to a base monadic parser (logging, state etc). Many of the effects required changes to the base parser and sometimes the data types used. I figured that the addition of 'fail' to all monads might have been a compromise because 'fail' is so common and you want to avoid changes to the 'base' parser (or whatever) as much as possible. Of course, some kind of 'fail' makes sense for parsers, but not always for, say, put/get of State or ask/local of Reader.
Let me know if I'm off on the wrong track here.
Should I avoid using Monad fail?
What are the alternatives to Monad fail?
Are there any alternative monad libraries that do not include this "design wart"?
Where can I read more about the history around this design decision?
Some monads have a sensible failure mechanism, e.g. the terminal monad:
data Fail x = Fail
Some monads don't have a sensible failure mechanism (undefined is not sensible), e.g. the initial monad:
data Return x = Return x
In that sense, it's clearly a wart to require all monads to have a fail method. If you're writing programs that abstract over monads (Monad m) =>, it's not very healthy to make use of that generic m's fail method. That would result in a function you can instantiate with a monad where fail shouldn't really exist.
I see fewer objections to using fail (especially indirectly, by matching Pat <- computation) when working in a specific monad for which a good fail behaviour has been clearly specified. Such programs would hopefully survive a return to the old discipline where nontrivial pattern matching created a demand for MonadZero instead of just Monad.
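For instance, in the Maybe monad, where fail _ = Nothing is clearly sensible, a failed match is just Nothing (a small sketch; firstLine is an invented example):
firstLine :: String -> Maybe String
firstLine s = do
  (l:_) <- Just (lines s)  -- no lines at all: the match fails, giving Nothing
  return l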
One might argue that the better discipline is always to treat failure-cases explicitly. I object to this position on two counts: (1) that the point of monadic programming is to avoid such clutter, and (2) that the current notation for case analysis on the result of a monadic computation is so awful. The next release of SHE will support the notation (also found in other variants)
case <- computation of
Pat_1 -> computation_1
...
Pat_n -> computation_n
which might help a little.
But this whole situation is a sorry mess. It's often helpful to characterize monads by the operations which they support. You can see fail, throw, etc as operations supported by some monads but not others. Haskell makes it quite clumsy and expensive to support small localized changes in the set of operations available, introducing new operations by explaining how to handle them in terms of the old ones. If we seriously want to do a neater job here, we need to rethink how catch works, to make it a translator between different local error-handling mechanisms. I often want to bracket a computation which can fail uninformatively (e.g. by pattern match failure) with a handler that adds more contextual information before passing on the error. I can't help feeling that it's sometimes more difficult to do that than it should be.
So, this is a could-do-better issue, but at the very least, use fail only for specific monads which offer a sensible implementation, and handle the 'exceptions' properly.
In Haskell 1.4 (1997) there was no fail. Instead, there was a MonadZero type class which contained a zero method. The do notation used zero to indicate pattern-match failure; this surprised people: whether their function needed Monad or MonadZero depended on how they used the do notation in it.
When Haskell 98 was designed a bit later, they did several changes to make programming simpler to the novice. For example, monad comprehensions were turned into list comprehensions. Similarly, to remove the do type class issue, the MonadZero class was removed; for the use of do, the method fail was added to Monad; and for other uses of zero, a mzero method was added to MonadPlus.
There is, I think, a good argument to be made that fail should not be used for anything explicitly; its only intended use is in the translation of the do notation. Nevertheless, I myself am often naughty and use fail explicitly, too.
You can access the original 1.4 and 98 reports here. I'm sure the discussion leading to the change can be found in some email list archives, but I don't have links handy.
I try to avoid Monad fail wherever possible, and there are a whole variety of ways to capture fail depending on your circumstance. Edward Yang has written a good overview on his blog in the article titled 8 ways to report errors in Haskell revisited.
In summary, the different ways to report errors that he identifies are:
1. Use error
2. Use Maybe a
3. Use Either String a
4. Use Monad and fail to generalize 1-3
5. Use MonadError and a custom error type
6. Use throw in the IO monad
7. Use ioError and catch
8. Go nuts with monad transformers
9. Checked exceptions
10. Failure
Of these, I would be tempted to use option 3, with Either e b if I know how to handle the error, but need a little more context.
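A small sketch of that option (safeDiv and calc are invented examples):
safeDiv :: Int -> Int -> Either String Int
safeDiv _ 0 = Left "division by zero"
safeDiv x y = Right (x `div` y)

-- Errors propagate through the Either monad without touching fail:
calc :: Int -> Int -> Int -> Either String Int
calc a b c = do
  q <- safeDiv a b
  safeDiv q c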
If you use GHC 8.6.1 or above, there is a new class called MonadFail and a language extension called MonadFailDesugaring. If you have access to these, use them. You can then use fail without the risk that some monad has no real fail implementation.
This extension makes the fail desugaring require the more restrictive MonadFail class instead of any Monad.
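A sketch of the difference (firstTwo is an invented example; on GHC 8.6 the desugaring is on by default, and the explicit import keeps it compiling on earlier 8.x compilers):
import Control.Monad.Fail (MonadFail)

-- The refutable pattern now demands MonadFail m, not just Monad m:
firstTwo :: MonadFail m => m [a] -> m (a, a)
firstTwo mxs = do
  (x:y:_) <- mxs
  return (x, y)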

Writing a function with type 'a -> string

For debugging purposes I'd like to have a function in OCaml that converts a value of an arbitrary type to a string; the debugger currently has one, but it'd be cool to have one available in ordinary code as well.
The sexplib library would be perfect, but the fact is that I can't modify all the types I need so as to add "with sexp" to them, and I can't use camlp4 either.
Is there any such function? (It won't be in production code, so I accept dirty solutions.)
Something like Haskell's Show typeclass would be exactly what I mean.
Thanks for your time
The Std module in Batteries Included provides a dump function which converts arbitrary types to readable strings. It is somewhat limited - as it does not know about types, it cannot print constructors for variant types properly and replaces them with numbers - but it can still be pretty helpful. Since type information is not available at runtime, that's about as good as you can do. The debugger and toplevel use compiler trickery to obtain better representations, but that is difficult if not impossible to do in a general library.
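For example, a sketch (the exact rendering is approximate, per the caveats above):
(* Std.dump : 'a -> string, from Batteries Included *)
let () =
  print_endline (Std.dump [1; 2; 3]);       (* something like "[1; 2; 3]" *)
  print_endline (Std.dump (Some "hi", 42))  (* constructors appear as numbers *)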
I seem to remember also seeing a more sophisticated dumping library somewhere, but I do not recall where.

Is it fine to customize C++?

For my projects, I usually define a lot of aliases for types like unsigned int, char and double as well as std::string and others.
I also aliased and to &&, or to ||, not to !, etc.
Is this considered bad practice or okay to do?
Defining types to add context within your code is acceptable, and even encouraged. Screwing around with operators will only encourage the next person that has to maintain your code to bring a gun to your house.
Well, consider the newcomers who are accustomed to C++. They will have difficulties maintaining your project.
Mind that there are many possibilities for more legitimate aliasing. A good example is complicated nested STL containers.
Example:
typedef int ClientID;
typedef double Price;
typedef map<ClientID, map<Date, Price> > ClientPurchases;
Now, instead of
map<int, map<Date, double> >::iterator it = clientPurchases.find(clientId);
you can write
ClientPurchases::iterator it = clientPurchases.find(clientId);
which seems to be more clear and readable.
If you're only using it for pointlessly renaming language features (as opposed to the example @Vlad gives), then it's the Wrong Thing.
It definitely makes the code less readable - anyone proficient in C++ will see (x ? y : z) and know that it's a ternary conditional operator. Although ORLY x YARLY y NOWAI z KTHX could be the same thing, it will confuse the viewer: "is this YARLY NOWAI the exact same thing as ? :, renamed for the author's convenience, or does it have subtle differences?" If these "aliases" are the same thing as the standard language elements, they will only slow down the next person to maintain your code.
TLDR: Reading code, any code, is hard enough without having to look up your private alternate syntax all the time.
That’s horrible. Don’t do it, please. Write idiomatic C++, not some macro-riddled monstrosity. In general, it’s extremely bad practice to define such macros, except in very specific cases (such as the BOOST_FOREACH macro).
That said, and, or and not are actually already valid aliases for &&, || and ! in C++!
It’s just that Visual Studio only knows them if you first include the standard header <ciso646>. Other compilers don’t need this.
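For example, this is standard C++ (the include only matters on older MSVC, where it pulls in the macro definitions):
#include <ciso646>

int main() {
    bool a = true, b = false;
    if (a and not b) {  // exactly equivalent to (a && !b)
        return 1;
    }
    return 0;
}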
Types are something else. Using typedef to create type aliases depending on context makes sense if it augments the expressiveness of the code. Then it’s encouraged. However, even better would be to create your own types instead of aliases. For example, I can’t imagine it ever being beneficial to create an alias for std::string – why not just use std::string directly? (An exception is of course generic data structures and algorithms.)
"and", "or", "not" are OK because they're part of the language, but it's probably better to write C++ in a style that other C++ programmers use, and very few people bother using them. Don't alias them yourself: they're reserved names and it's not valid in general to use reserved names even in the preprocessor. If your compiler doesn't provide them in its default mode (i.e. it's not standard-compliant), you could fake them up with #define, but you may be setting yourself up for trouble in future if you change compiler, or change compiler options.
typedefs for builtin types might make sense in certain circumstances. For example in C99 (but not in C++03), there are extended integer types such as int32_t, which specifies a 32 bit integer, and on a particular system that might be a typedef for int. They come from stdint.h (<cstdint> in C++0x), and if your C++ compiler doesn't provide that as an extension, you can generally hunt down or write a version of it that will work for your system. If you have some purpose in mind for which you might in future want to use a different integer type (on a different system perhaps), then by all means hide the "real" type behind a name that describes the important properties that are the reason you chose that type for the purpose. If you just think "int" is unnecessarily brief, and it should be "integer", then you're not really helping anyone, even yourself, by trying to tweak the language in such a superficial way. It's an extra indirection for no gain, and in the long run you're better off learning C++, than changing C++ to "make more sense" to you.
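A sketch of the distinction (sample_t is an invented name):
#include <stdint.h>  // <cstdint> in C++0x

typedef int32_t sample_t;  // names a property you rely on: exactly 32 bits
typedef int integer;       // names nothing: an extra indirection for no gain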
I can't think of a good reason to use any other name for string, except in a case similar to the extended integer types, where your name will perhaps be a typedef for string on some builds, and wstring on others.
If you're not a native English-speaker, and are trying to "translate" C++ to another language, then I sort of see your point but I don't think it's a good idea. Even other speakers of your language will know C++ better than they know the translations you happen to have picked. But I am a native English speaker, so I don't really know how much difference it makes. Given how many C++ programmers there are in the world who don't translate languages, I suspect it's not a huge deal compared with the fact that all the documentation is in English...
If every C++ developer were familiar with your aliases, then why not - but with these aliases you are essentially introducing a new language to whoever needs to maintain your code.
Why add this extra mental step that, for the most part, does not add any clarity? (What && and || are doing is pretty obvious to any C/C++ programmer, and anyway in C++ you can use the and and or keywords.)

Creating serializable unique compile-time identifiers for arbitrary UDTs

I would like a generic way to create unique compile-time identifiers for any C++ user defined types.
for example:
unique_id<my_type>::value == 0 // true
unique_id<other_type>::value == 1 // true
I've managed to implement something like this using preprocessor meta programming, the problem is, serialization is not consistent. For instance if the class template unique_id is instantiated with other_type first, then any serialization in previous revisions of my program will be invalidated.
I've searched for solutions to this problem, and found several ways to implement it with non-consistent serialization if the unique values are compile-time constants. If RTTI or similar methods, like boost::sp_typeinfo, are used, then the unique values are obviously not compile-time constants, and extra overhead is present. An ad-hoc solution to this problem would be instantiating all of the unique_id's in a separate header in the correct order, but this causes additional maintenance and boilerplate code, which is no different from using an enum unique_id{my_type, other_type};.
A good solution to this problem would be using user-defined literals; unfortunately, as far as I know, no compiler supports them at this moment. The syntax with UDLs would be 'my_type'_id; 'other_type'_id;.
I'm hoping somebody knows a trick that allows implementing serialize-able unique identifiers in C++ with the current standard (C++03/C++0x), I would be happy if it works with the latest stable MSVC and GNU-G++ compilers, although I expect if there is a solution, it's not portable.
I would like to make clear, that using mpl::set or similar constructs like mpl::vector and filtering, does not solve this problem, because the scope of the meta-set/vector is limited and actually causes more problems than just preprocessor meta programming.
A while back I added a build step to one project of mine, which allowed me to write #script_name(args) in a C++ source file and have it automatically replaced with the output of the associated script, for instance ./script_name.pl args or ./script_name.py args.
You may balk at the idea of polluting the language with nonstandard C++, but all you'd have to do is write #sha1(my_type) to get the unique integer hash of the class name, regardless of build order and without the need for explicit instantiation.
This is just one of many possible nonstandard solutions, and I think a fairly clean one at that. There's currently no great way to impose an arbitrary, consistent ordering on your classes without just specifying it explicitly, so I recommend you simply give in and go the explicit instantiation route; there's nothing really wrong with centralising the information, but as you said it's not all that different from an enumeration, which is what I'd actually use in this situation.
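A sketch of that explicit route, reusing the question's my_type/other_type names - the mapping lives in one central header, so it stays stable across builds:
struct my_type {};
struct other_type {};

template <typename T> struct unique_id;  // undefined: unregistered types won't compile

template <> struct unique_id<my_type>    { static const unsigned value = 0; };
template <> struct unique_id<other_type> { static const unsigned value = 1; };

// unique_id<my_type>::value == 0, exactly as in the question, and the ids
// only change when this header is edited.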
Persistence of data is a very interesting problem.
My first question would be: do you really want serialization? If you are willing to investigate an alternative, then jump to the next section.
If you're still there, I think you have not given the typeid solution all its due.
#include <cstddef>   // size_t
#include <typeinfo>  // typeid

// static detection
template <typename T>
size_t unique_id()
{
    static size_t const id = some_hash(typeid(T)); // or boost::sp_typeinfo
    return id;
}

// dynamic detection
template <typename T>
size_t unique_id(T const& t)
{
    return some_hash(typeid(t)); // no memoization possible
}
Note: I am using a local static to avoid the order-of-initialization issue, in case this value is required before main is entered.
It's pretty similar to your unique_id<some_type>::value, and even though it's computed at runtime, it's only computed once, and the result (for the static detection) is then memoized for future calls.
Also note that it's fully generic: no need to explicitly write the function for each type.
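As a possible some_hash (an assumption layered on top of this answer, not part of it): hashing the implementation-defined typeid name is stable across runs of one toolchain, though not across compilers:
#include <cstddef>
#include <functional>
#include <string>
#include <typeinfo>

size_t some_hash(std::type_info const& info)
{
    // name() is fixed for a given compiler, so the hash is reproducible
    // on one toolchain - the consistency the question worries about.
    return std::hash<std::string>()(info.name());
}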
It may seem silly, but the issue of serialization is that you have a one-to-one mapping between the type and its representation:
you need to version the representation, so as to be able to decode "older" versions
dealing with forward compatibility is pretty hard
dealing with cyclic references is pretty hard (some frameworks handle it)
and then there is the issue of moving information from one to another --> deserializing older versions becomes messy and frustrating
For persistent saves, I usually recommend using a dedicated BOM. Think of the saved data as a message to your future self. And I usually go the extra mile and propose the awesome Google Protocol Buffers library:
Backward and Forward compatibility baked-in
Several format outputs -> human readable (for debug) or binary
Several languages can read/write the same messages (C++, Java, Python)
Pretty sure that you will have to implement your own extension to make this happen; I've not seen nor heard of any such construct for compile time. MSVC offers __COUNTER__ for the preprocessor, but I know of no template equivalent.
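For reference, a sketch of the __COUNTER__ mechanics (it only moves the ordering problem into the preprocessor - the values still depend on expansion order within the translation unit):
template <typename T> struct unique_id;  // primary template, specialized below

#define DEFINE_ID(type) \
    template <> struct unique_id<type> { static const unsigned value = __COUNTER__; }

struct my_type {};
struct other_type {};

DEFINE_ID(my_type);    // first expansion in this translation unit
DEFINE_ID(other_type); // next value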