Getting more useful warnings when compiling Clojure - clojure

In the function below, I used cond in place of case. It took me a long time to single out this function. I am learning Clojure, so the error was not obvious to me. When I tried to run the code up to the map function (using the Cursive/IntelliJ debugger), IntelliJ complained: There is no executable code at core.clj:144. If the Clojure compiler knows that, is there an option to get a warning at compile time? Are there other checks that the compiler (or a linter) can perform on my code?
(defn uri-gen [uri fields line]
  (let [remo "[//\\:*?()<>|.%'\"&]"]
    (cond (count fields)
      0 (correct-uri ...)
      1 (let ...)
      (correct-empty
        uri
        (apply str
          (map (fn [it] ...)))))))

Unfortunately, compiler warnings & error messages in Clojure are often terse, nonsensical, or just plain missing.
I'm not sure if it would help in this instance, but you might try the Eastwood Clojure lint tool (see others in the Clojure Toolbox); a minimal setup sketch follows the links below. I also make extensive use of Plumatic Schema, which has helped me to avoid many simple type errors.
The Clojure Toolbox
Eastwood
Plumatic Schema
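If it helps, here is a minimal sketch of wiring Eastwood into a Leiningen project. The project name and the version string below are placeholders; substitute the current Eastwood release.
;; project.clj (sketch) -- Eastwood runs as a Leiningen plugin
(defproject my-app "0.1.0-SNAPSHOT"               ; hypothetical project
  :dependencies [[org.clojure/clojure "1.11.1"]]
  :plugins [[jonase/eastwood "x.y.z"]])           ; placeholder version
;; then run the linter from the shell:
;;   lein eastwood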

It's not really the compiler's problem.
From the compiler's point of view, what you wrote makes perfect sense. You've got an s-expression composed of an s-expression followed by more s-expressions - what's the problem? A Lisp, such as Clojure, has darned little "syntax" for the compiler to "know" about. It does not, for example, have to worry about silly things like operator precedence and "if" and "while" and "until" statements and all the tons of other things that compilers for other languages dither over.
It has no knowledge of what cond and case and and and whatever do - it just knows that, because they're the first value in an unquoted form, they must be a function; the compiler can in some sense "find" that function, so it can create the code to call the function; and there's a bunch of other valid expressions following it that need to be passed to the function. (And of course those other expressions may consist of more s-expressions, representing more functions to be called, and down the Lisp-hole we go!) KEWL!
It's the function which has to make some sort of sense of the arguments it's passed and do whatever it is that the function is supposed to do. The compiler doesn't know anything about all that - in a very, very oversimplified sense, it just reads s-expressions and generates code to call functions.
If you insist on passing case-ish arguments to cond, the compiler doesn't care. It will merrily do what you asked. The fact that cond will (probably) barf all over those arguments is not something for the compiler to deal with - it's the programmer's problem. Lisp/Clojure puts the responsibility for getting it right squarely on the shoulders of the programmer, where it belongs.
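To make the difference concrete, here is a small sketch (function names made up) of the argument shapes the two macros expect. In the snippet from the question, cond reads (count fields) as a test and 0 as that test's result, and since any number (including 0) is truthy in Clojure, the form always returns 0.
;; cond expects test/result pairs; the first truthy test wins.
(defn size-label [fields]
  (cond
    (zero? (count fields)) :empty
    (= 1 (count fields))   :single
    :else                  :many))
;; case dispatches one value against compile-time constants,
;; with an optional trailing default expression.
(defn size-label-2 [fields]
  (case (count fields)
    0 :empty
    1 :single
    :many))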
Lisps are kind of like a super-power for programmers. In the right hands, it's a valuable tool in the service of mankind. In the wrong hands, it's a recipe for disaster. Try not to be Tighten - use your powers wisely. :-)

Related

Why does Crystal's macro syntax for iterating differ from the rest of Crystal

Coming from the Ruby world, I instantly understood why Crystal chose not to implement a for method. But then I was surprised to see that Crystal does implement for in macros. I was even more surprised to find that macros don't allow an enumerable (.each, etc.) syntax (i.e. {% ["one", "two", "three"].each do |value| %} isn't valid macro syntax).
Is there a logical reason for this syntax difference? It's possible that the answer is simply ~"because the devs decided that macro syntax looks like x, and non-macro syntax looks like y", but I'm guessing that there is more to it than that (an arbitrary syntax inconsistency seems like a flaw).
Thanks!
The main reason is that when the parser parses foo.bar do |arg| ... end, it expects an expression after |arg|, not %}, which is a parse error. So to allow that we'd need to enhance the parser (which is already quite complex) to take that into account. for was chosen because of this, but also to make it clear that it's not regular Crystal but a different thing (an interpreted subset of Crystal and the standard library).
Another reason is that if each and other iteration methods are allowed, why not while and until? Those could allow endless loops in macros, which with just for aren't possible, so you can guarantee a macro finishes executing. Which... is actually not true, given that we have run inside macros.
So I think I'm not opposed to changing the language to allow each, each_with_index, etc. inside macros, and eventually removing for from the macro language. Opening an issue requesting this would be a good step in that direction.

Does clojure have identifier macros?

In other words, is there any way to trigger macro expansion on forms that don't look like (MACRO arg* ...)?
To give a hypothetical example:
(defmacro my-var
  (do (printf "Using my-var!\n") 42))
(+ my-var 1) ;; Should print & return 43
What I really want to do is call macroexpand on a term with free variables. I want these free variables to have special meaning during macro expansion, but to go back to "normal" afterward.
This facility currently exists in Clojure as one of the features of the Clojure Contrib library tools.macro. The relevant tools.macro macros are defsymbolmacro, with-symbol-macros and symbol-macrolet – here's a symbol-macrolet example from the README (as found on the tip of the master branch right now):
(symbol-macrolet [def foo]
  (def def def))
expands to (def def foo)
symbol-macrolet is used to introduce lexically scoped symbol macros, while defsymbolmacro introduces symbol macros to be applied inside any with-symbol-macros blocks (doing away with with-symbol-macros would only be possible with support from the Clojure compiler itself).
AFAIK, some version of symbol-macrolet has been a planned feature for inclusion in Clojure's compiler itself for quite a while; it is expected that when and if it lands, it will likely be called letmacro. There's no concrete public timeline for that, though, and it is certainly not guaranteed to land at all.
The names that tools.macro uses are inspired by Common Lisp, where the relevant macros are known as define-symbol-macro and symbol-macrolet. Since this is a standard compiler feature in CL, symbol macros introduced with define-symbol-macro are actually applied globally without assistance from any helper macros.
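For completeness, here is roughly what the hypothetical example from the question looks like with tools.macro. This is a sketch, assuming the library is on the classpath; the symbol my-var is the question's own made-up name.
(require '[clojure.tools.macro :refer [defsymbolmacro with-symbol-macros]])
;; register a symbol macro; it expands wherever my-var appears...
(defsymbolmacro my-var (do (printf "Using my-var!\n") 42))
;; ...but only inside a with-symbol-macros block
(with-symbol-macros
  (+ my-var 1))   ; prints "Using my-var!" and returns 43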
Sorry, no, Clojure does not have this feature.
With the exception of reader macros such as quote ', syntax-quote `, dispatch #, etc., which do expand outside the operator (first) position of a form, you cannot add macros that expand anywhere other than that position.
Reader literals are slightly related in that you can define literal syntax that will be converted into a data structure at read time, though these cannot be used as variables like you are describing.
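For example, a custom tagged literal can be wired up through a data_readers.clj file on the classpath. All names below are hypothetical.
;; data_readers.clj at the root of the classpath:
{my/point my.readers/read-point}
;; src/my/readers.clj
(ns my.readers)
(defn read-point [[x y]] {:x x :y y})
;; once the namespace is loaded, the literal is converted at read time:
#my/point [1 2]   ;=> {:x 1, :y 2}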
The auto-gensym feature of the syntax-quote reader macro has a little of this flavor though in a much more limited way.
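For reference, auto-gensym in action (the exact numeric suffix in the output will differ):
;; inside syntax-quote, x# expands to a fresh generated symbol
`(let [x# 1] (inc x#))
;=> (clojure.core/let [x__1234__auto__ 1] (clojure.core/inc x__1234__auto__))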

How to identify which forms are macros and which are functions while looking at a Clojure code?

Lisp/Clojure code has a consistent syntax, and that is a plus, since one doesn't need to learn many different constructs.
But at times it is easier to understand a piece of code just from the distinct syntax being used (this is a switch case, this is the pattern-matching construct, etc.) without actually reading the text.
I started out with Clojure a couple of months ago, and I have realized I can't understand the code without reading the name of the form and then googling whether it is a macro or a function and how it works.
So it turns out that a piece of Clojure code, irrespective of the uniformity of its syntax, isn't uniform.
It may look like a function, but if it is a macro then it might not be evaluating all its arguments.
Is there a naming convention or indentation style that all macros use, so it is easier to grasp from the name what is going on?
The most useful intuition in my opinion comes from understanding the purpose of a given operator / Var. Well-designed macros simply could not be written as functions and still offer the same functionality with the same syntax, for if they could, they would in fact be written as functions (see the "well-designed" part above!).1 So, if you're dealing with a construct which couldn't possibly be a regular function, then you know it isn't; otherwise it likely is.
Additionally, the usual ways of learning about the Vars exported by a library tell you whether you're dealing with a macro or a function up front. That is true of doc ((doc foo) says that foo is a macro near the top of its output if that is indeed the case), source (since it gives you the entire code) and M-. (jump to definition in Emacs with nrepl.el or swank-clojure; M-, jumps back). Documentation may be expected to mention what is a macro and what isn't (except that's not necessarily true of docstrings, since all usual ways of accessing a docstring already tell you whether you're dealing with a macro, as explained above).
If you're skimming a body of code with the intention of forming a rough understanding of what it probably does on the assumption that the various operators perform the functions suggested by their names, then either (1) the names are suggestive enough and you get an idea of what's intended by the code, so you don't even need to care which operators happen to be macros, or (2) the names are not suggestive enough, so you'll need to dive into the docs or the source for some of the operators anyway, and then the first thing you'll learn is which of them are registered as macros.
Finally, there is no single naming style for macros, although there are certain conventions specific to particular use cases. For example with-foo-style constructs tend to be convenience macros whose purpose is to simplify dealing with resources of type foo; dofoo-style constructs tend to be macros which take a body of expressions to be executed (how many times and with which additional context set up depends on the macro; the most basic member of this family, do, is actually a special form rather than a macro); deffoo-style constructs introduce new Vars or type-like entities.
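A few illustrative uses of each family from clojure.core (the file name below is just a placeholder):
(require '[clojure.java.io :as io])
;; with-foo: set up and tear down a resource around a body
(with-open [r (io/reader "data.txt")]
  (count (line-seq r)))
;; dofoo: execute a body (here once per element) for side effects
(doseq [x [1 2 3]]
  (println x))
;; deffoo: introduce a new Var
(defn greet [person]
  (str "Hello, " person))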
It's worth pointing out that similar patterns are sometimes broken. For instance, most threading constructs (-> & Co.) are macros, but xml-> from clojure.data.zip.xml is a function. That makes perfect sense when one considers the functionality provided, which brings us back to the point about the purpose of an operator being the most useful source of intuition.
1 There might be some exceptions to this rule. One would expect these to be documented. Some projects are of course not documented at all (or very nearly so); here the issue goes away completely, since one must go to the source to make sense of things anyway.
There are two attributes that typically distinguish a macro (or sometimes special form) from a function:
When the form does some sort of binding (i.e. declaring new identifiers for later use)
When some of the arguments are evaluated lazily
Examples of the first case are let, letfn, binding and with-local-vars. Strangely though, defn starts out defined as a fn in clojure.core and is only flagged as a macro afterwards, but I'm pretty sure that has something to do with Clojure's bootstrapping process (defn is defined before defmacro is defined).
Examples of the second would be and, or and lazy-seq. In all these constructs, the arguments are evaluated lazily by either putting them in conditional branches (like if) or moving them inside a function body.
Both of those attributes are really just manifestations of the macro manipulating the Clojure syntax. I don't think the threading macros (-> and ->>) fit very well into either of those categories, but the nil-safe versions (-?> and -?>>) kind of fall under having lazy arguments.
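A quick illustration of the second attribute; and-fn is a made-up helper standing in for a plain function:
;; the macro short-circuits: the println form is never evaluated
(and false (println "never printed"))      ;=> false, prints nothing
;; a function receives already-evaluated arguments, so the side
;; effect happens before the body runs
(defn and-fn [a b] (if a b a))
(and-fn false (println "always printed"))  ;=> false, but "always printed" is printed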
As far as I know there is no enforced naming convention.
As a rule of thumb, functions are preferred wherever possible, but macros can sometimes be spotted when they follow the pattern def<something> for setting up a something or with-<resource> for doing something with an open resource.
Because of this, you may find Clojure's doc macro helpful. It will tell you whether a form is a macro/function/special form, as well as give its arg list and doc string (if present). For example
(use 'clojure.repl)
(doc and)
Will print the following to the repl.
clojure.core/and
([] [x] [x & next])
Macro
Evaluates exprs one at a time, from left to right. If a form
returns logical false (nil or false), and returns that value and
doesn't evaluate any of the other expressions, otherwise it returns
the value of the last expr. (and) returns true.
Some editors (e.g. emacs) will provide this documentation as a pop-up or on a key combination, which makes accessing it (and reading) much faster.

C++ vs. D, Ada and Eiffel (horrible error messages with templates)

One of the problems of C++ is the horrible error messages we get from code that makes intensive use of templates and template metaprogramming. Concepts were designed to solve this problem, but unfortunately they will not be in the next standard.
I'm wondering: is this problem common to all languages that support generic programming, or is something wrong with C++ templates?
Unfortunately I don't know any other language that supports generic programming (Java and C# generics are too simplified and not as powerful as C++ templates).
So I'm asking you guys: do D, Ada, and Eiffel templates (generics) produce such ugly error messages too? Is it possible to have a language with a powerful generic programming paradigm but without ugly error messages? And if yes, how do these languages solve this problem?
Edit: for downvoters. I really love C++ and templates. I'm not saying that templates are bad. Actually I'm a big fan of generic programming and template metaprogramming. I'm just asking why I'm getting such ugly error messages from compilers.
In general I found Ada compiler error messages for generics really not significantly more difficult to read than any other Ada compiler error messages.
C++ template error messages, on the other hand, are notorious for being error novels. The main difference, I think, is the way C++ does template instantiation. The thing is, C++ templates are much more flexible than Ada generics. They are so flexible, it is almost like a macro preprocessor. The clever folks in Boost have used this to implement things like lambdas and even whole other languages.
Because of that flexibility, the entire template hierarchy basically has to be compiled anew every time its particular permutation of template parameters is first encountered. Thus issues that resolve down to incompatibilities several layers down an API end up being presented to the poor API client to decipher.
In Ada, generics are actually strongly typed, and they provide full information hiding to the client, just like normal packages and subroutines do. So if you do get an error message, it typically references just the one generic you are trying to instantiate, not the entire hierarchy used to implement it.
So yes, C++ template error messages are way worse than Ada's.
Now debugging is a different story entirely...
The problem, at heart, is that error recovery is difficult, whatever the context.
And when you factor in the horrid grammars of C and C++, you can only wonder that error messages are not worse than they are! I am afraid the C grammar was designed by people who didn't have a clue about the essential properties of a grammar, one of them being that the less reliance on context the better, and the other being that you should strive to make it as unambiguous as possible.
Let us illustrate a common error: forgetting a semi-colon.
struct CType {
    int a;
    char b;
}
foo
bar() { /**/ }
Okay, so this is wrong; where should the missing semi-colon go? Well, unfortunately it's ambiguous: it can go either before or after foo because:
C considers it normal to declare a variable in stride after defining a struct
C considers it normal not to specify a return type for a function (in which case it defaults to int)
If we reason about it, we can see that:
if foo names a type, then it belongs to the function declaration
if not, it probably denotes a variable... unless of course we made a typo and it was meant to be written fool, which happens to be a type :/
As you can see, error recovery is downright difficult, because we need to infer what the writer meant, and the grammar is of little help. It is not impossible though, and most errors can indeed be diagnosed more or less correctly, and even recovered from... it just takes considerable effort.
It seems that people working on gcc are more interested in producing fast code (and I mean fast, search for the latest benchmarks on gcc 4.6) and adding interesting features (gcc already implement most - if not all - of C++0x) than producing easy to read error messages. Can you blame them ? I can't.
Fortunately there are people who think that accurate error reporting and good error recovery are a very worthy goal, and some of them have been working on Clang for quite a while, and they are continuing to do so.
Some nice features, off the top of my head:
Terse but complete error messages, which include the source ranges to expose exactly where the error emanated from
Fix-It notes when it's obvious what was meant
In which case the compiler parses the rest of the file as if the fix had been there already, instead of spewing lines upon lines of gibberish
(recent) leaving out the include stack for notes, to cut down on the cruft
(recent) trying to expose only the template parameter types that the developer actually wrote, and preserving typedefs (thus talking about std::vector<Name> instead of std::vector<std::basic_string<char, std::allocator<char>>, std::allocator<std::basic_string<char, std::allocator<char>>>>, which makes all the difference)
(recent) recovering correctly when the template keyword is missing in a call to a template method from within another template method
But each of those has required several hours to days of work.
They certainly didn't come for free.
Now, Concepts should (normally) have made our lives easier. But they were mostly untested, and so it was deemed preferable to remove them from the draft. I must say I am glad for this. Given C++'s relative inertia, it's better not to include features that haven't been thoroughly revised, and the concept maps didn't really thrill me. Neither did they thrill Bjarne or Herb, it seems, as they said that they would be rethinking Concepts from scratch for the next standard.
The article Generic Programming outlines many of the pros and cons of generics in several languages, including Ada in particular. Although lacking template specialization, all Ada generic instances are "equivalent to the instance declaration…immediately followed by the instance body". As a practical matter, error messages tend to occur at compile-time, and they typically represent familiar violations of type-safety.
D has two features to improve the quality of template error messages: Constraints and static assert.
// Use constraints to only allow a function to operate on random access
// ranges as defined in std.range. If something that doesn't satisfy this
// is passed, the compiler will error before even trying to instantiate
// fun().
void fun(R)(R range) if(isRandomAccessRange!(R)) {
    // Do stuff.
}
// Use static assert to check a high level invariant. If
// the predicate is false, the error message will be
// printed and compilation will stop before a screen
// worth of more confusing errors are encountered.
// This function takes any number of ranges to merge sort
// and the same number of temporary buffers to merge into.
void mergeSort(R...)(R ranges) {
    static assert(R.length % 2 == 0,
        "Must have equal number of ranges to be sorted and temporary buffers.");
    static assert(allSatisfy!(isRandomAccessRange, R),
        "All arguments to mergeSort must be random access ranges.");
    // Implementation
}
Eiffel has the best error messages of the lot because it has the best template system of the lot. It is fully integrated into the language and works well because it is the only language that uses covariance in arguments.
Therefore it is much more than a simple compiler copy-and-paste. Unfortunately, explaining the difference in a few lines is impossible. Just go and have a look at EiffelStudio.
There are some efforts to improve the error messages. Clang, for example, has put quite a lot of emphasis on generating more easily readable compiler error messages. I've only been using it for a short while, but my experience of it so far has been quite positive compared to GCC's equivalent errors.

C++ if statement alternatives

Is it me, or does it seem like C++ asks for more use of the 'if' statement than C# does?
I have this codebase and it contains lots of things such as this:
if (strcmp((char*)type,"double")==0)
I wondered: isn't it a bit of a 'code smell' when there are just too many if statements?
I'm not saying they're bad, but things like string comparisons, with lots of strings involved, can't they be done differently?
Is there an alternative to just writing sequences of if statements?
THIS IS JUST AN EXAMPLE, IT CAN BE ANY KIND OF IF STATEMENTS
instead of:
if (string a == "blah") then bla
if (string b == "blah") then blo
The reason you do if (strcmp((char*)type,"double")==0) is because you can't make "double" a case-expression and use a switch statement. That said, if you're doing a lot of these kinds of string matches, you may want to look at using a std::map<std::string, int> or something similar and then use the map to convert the string to an index which you then feed to switch.
Personally, in these cases, I'm a fan of things like std::map<std::string, int (Handler::*)(void)>, which lets me create a handler map of class methods, but YMMV.
EDIT: I forgot to mention: the other sweet thing about having a map of strings to methods is that you can alter (usually add to) it at run time. For example, a parser could change its list of keywords and their handlers at runtime after it knows what kind of file it's parsing.
This is code smell.
To minimize it, you should (in this case) use std::strings. Your code then becomes:
#include <string>
// [...]
std::string type = "whatever";
// [...]
if (type == "double")
This is almost identical to the C# equivalent: to compile this sample code as C#, just remove the include and the std:: prefix.
Usually, if you find code that uses char* directly in C++, it's doing it wrong (except maybe for some rare exceptions).
Edit: Mike DeSimone addressed the problem of further refactoring this in his answer (so I won't mention it here :) ).
I don't think C++ requires any more "ifs" than C#. The number of if statements in a program is really just a matter of coding style. You can always eliminate ifs through techniques like polymorphism, table driven methods, and so on. These same techniques are available in both C++ and C#. If there is a difference between programs written in these two languages, I suspect it has to do with the mentality of C# vs C++ programmers.
Note that I don't necessarily recommend "if" elimination. In my experience, if statements tend to be clearer than the alternatives. To directly address your second point: the way to eliminate chained string comparisons like that is to use a DFSA. Most of the time, however, string comparisons are perfectly suitable.
It's not something I've noticed; I've done 10 years C++ and 4 years of C# too!
Surely the number of if's relates to the design of your code rather than a difference between C# and C++?
To get rid of conditional expressions in either language you can consider the Inversion of Control pattern. It has the side effect of lessening those.
Based on the nature of 'bla' and 'blo' you can always try to use a std::map, with the strings as keys.
Too many if statements are a code smell if you can replace them with a switch...case. Otherwise, I don't see the problem with using if.
Maybe you have used more event-driven programming in C#, while your C++ code is more sequential?
There are better ways to implement a string parser than an endless set of if (strcmp...) statements.
One approach could be a map between strings and function pointers or functor objects.
Another design could involve a chain of responsibility pattern where the string is passed to a chain of objects that decide if they have a match or to pass it along.
I'm not aware of anything about C++ that makes it more prone to "if abuse" than any other language.