Clojure code static analysis tools

Is there a tool to run code convention tests in clojure? For example, make sure function names don't have any capital letters or keywords don't have any underscores in them.

Two useful Leiningen plugins I learned about recently:
lein-bikeshed
lein-kibit

Late to the party here. Seconding noahlz, the three main static analysis tools that I use on a regular basis are lein-bikeshed, lein-kibit, and Eastwood, though I also use yagni. Each of these has different strengths.
Bikeshed is good for general code cleanup but is mostly focused on style (e.g. making sure lines aren't too long, there's no trailing whitespace, functions have docstrings, etc.).
Kibit is good for showing you the most idiomatic function or form to use (for instance, an if form whose else branch is nil can be replaced with when - see the sketch after this list).
Eastwood is probably the most comprehensive lint tool that exists for Clojure, and checks for a pretty impressive number of code smell issues.
Finally, Yagni is great for finding unused code paths in your libraries and applications.
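
For instance, here's the kind of rewrite Kibit suggests (a minimal sketch; the condition and body are made up):

(if (pos? balance)
  (println "in credit"))

;; Kibit suggests the equivalent, more idiomatic form:
(when (pos? balance)
  (println "in credit"))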

Related

Is there an idiomatic alternative to nil-punning in Clojure?

I'm reading some Clojure code at the moment in which a numeric field of a record that gets passed around is left uninitialised as nil.
Lots of Clojure libraries treat this as idiomatic, which means it is an accepted convention.
But it also leads to NullPointerExceptions, because not all of the Clojure core functions can handle nil as input (nor should they).
Other languages have the concept of Maybe or Option to stand in for a value that may be null, as a way of mitigating the NullPointerException risk. This is possible in Clojure - but not very common.
You can do some tricks with fnil but it doesn't solve every problem.
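For instance (a minimal sketch - the map and key are made up):

((fnil + 0) nil 5)                         ;=> 5, the nil is treated as 0
(update {:score nil} :score (fnil inc 0))  ;=> {:score 1}

This patches a single call site, but every other consumer of the nil still has to protect itself.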
Another alternative is simply to set the uninitialised value to a keyword like :empty-value, to force the user to handle this scenario explicitly in all the handling code. But this isn't really a big step up from nil - because you don't really discover all the scenarios (in other people's code) until run-time.
My question is: Is there an idiomatic alternative to nil-punning in Clojure?
Not sure if you've read this lispcast post on nil-punning, but I do think it makes a pretty good case for why it's idiomatic and covers various important considerations that I didn't see mentioned in those other SO questions.
Basically, nil is a first-class thing in clojure. Despite its inherent conventional meaning, it is a proper value, and can be treated as such in many contexts, and in a context-dependent way. This makes it more flexible and powerful than null in the host language.
For example, something like this won't even compile in java:
if(null) {
....
}
Whereas in Clojure, (if nil ...) will work just fine. So there are many situations where you can use nil safely. I've yet to see a Java codebase that isn't littered with if (foo != null) { ... } checks everywhere. Perhaps Java 8's Optional will change this.
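A few REPL examples of nil being handled as a proper value by the core library:

(if nil :then :else)  ;=> :else
(first nil)           ;=> nil
(count nil)           ;=> 0
(conj nil 1)          ;=> (1)
(str nil)             ;=> ""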
I think where you can run into issues quite easily is in Java interop scenarios where you are dealing with actual nulls. A good Clojure wrapper library can also help shield you from this in many cases, and it's one good reason to prefer one over direct Java interop where possible.
In light of this, you may want to reconsider fighting the current. But since you are asking about alternatives, here's one I think is great: Prismatic's Schema. Schema has a Maybe schema (and many other useful ones as well), and it works quite nicely in many scenarios. The library is quite popular and I have used it with success. FWIW, it is recommended in the recent Clojure Applied book.
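A minimal sketch of the Maybe schema (MaybeScore is a made-up name; schema.core is aliased as s):

(require '[schema.core :as s])

(def MaybeScore (s/maybe s/Int))  ; accepts an integer or nil

(s/validate MaybeScore 42)   ;=> 42
(s/validate MaybeScore nil)  ;=> nil
(s/validate MaybeScore "x")  ; throws

The win is that the possibility of nil is declared at the boundary rather than discovered at run-time.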
Is there an idiomatic alternative to nil-punning in Clojure?
No. As leeor explains, nil-punning is idiomatic. But it's not as prevalent as in Common Lisp, where (I'm told) an empty list equates to nil.
Clojure used to work this way, but the CL functions that deal with lists correspond to Clojure functions that deal with sequences in general. And these sequences may be lazy, so there is a premium on unifying lazy sequences with others, so that any laziness can be preserved. I think this evolution happened around Clojure 1.2. Rich described it in detail here.
If you want option/maybe types, take a look at the core.typed library. In contrast to Prismatic Schema, this operates at compile time.
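A rough sketch, assuming core.typed's classic annotation syntax with the (t/U nil ...) union (lookup-score is a made-up function):

(require '[clojure.core.typed :as t])

;; the return type admits nil explicitly, so callers must handle it
(t/ann lookup-score [t/Str -> (t/U nil t/Num)])
(defn lookup-score [k]
  (get {"alice" 10} k))

Running the checker (t/check-ns) then rejects callers that treat the result as a number without first checking for nil.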

Drop-in, portable parsing

I see umpteen posts a day about "how to do X with regexen". And the best response to most of them seems like it would honestly be, "Why are you trying to drive a screw with a hammer?" But regexen are everywhere, and the syntax is mostly portable, particularly if you keep away from the fancy bits.
Is there anything equivalent to regexen but at the next level up in power and configurability? A "you can use it anywhere" parsing library of some variety, preferably with a gloriously concise DSL as its interface?
I've used Ragel somewhat, but because of the preprocessing step, I'd hesitate to recommend it to someone as "use this instead of some hairy regex". It's awkward to use from Obj-C, and I expect it will be terribly awkward from a language that doesn't have compile-link-run as part of its standard operating procedure.
What I'm looking for is something that will pass the "inline-online-universal" test.
(inline) You can write the notation inline with your other code, as you would with a regex.
(online) You can run the resulting parser just as you would your other code, which would mean right after input to a REPL in the case of something like Python.
(universal) You can move to a different language/platform and use virtually the same code for your parser, modulo dialect differences. In reality, I'd be happy with something that works from Python, Ruby, C, Java, and Haskell.
Most tools I know of fall down at "online". They preprocess a grammar offline and spit out code in the target language (C, Python, Java, C++…). They're standalone tools that aren't themselves integrated into the language environment.
I've had suggestions of PEG parsers and lex/yacc combos. Parser combinator libraries might also be a good fit. Whatever you might propose, I'd like to see demonstrated that it meets these tests. Your answer should demonstrate that the proposed solution meets the inline-online-universal requirements by providing a working demo parser in Python, C, and Haskell. The demo example is up to the author, but it should be something painful using just regexen but trivial using a proper parser.
https://github.com/leblancmeneses/NPEG
Implements PEG.
Meets all three ... let me explain.
It is inline only with C#, and offline with all the others. C# has an offline version as well.
I currently support offline versions for C/C++/JavaScript (local right now)/Java, all of which pass the unit tests - that's what makes it universal. Adding another language takes 25.84 hrs (how long it took to create the offline JavaScript version).
Making it online for every language would be too much maintenance (though possible), and it took a lot of work and time just to support the current offline versions. I can now focus my energy on building grammar optimizers and tooling to unit-test grammar rules, which benefits all the offline versions.
Have a look at Lex/Yacc or their counterparts Flex/Bison (or Coco, or all the other "compiler" generators). The combination can be used to parse complex textual data with an (arguably) much more readable syntax than with regexen.
For simple problems though, where regexen are more than sufficient, by all means use them.

How do Lisp (Clojure) and Tcl compare in terms of abstraction and metaprogramming abilities?

Both seem to be good for building extensible API's and code generation.
What are the main differences between them?
What do you see as their strengths, weaknesses, ...
Disclaimer: I'm more familiar with Clojure than Tcl, so apologies to any Tclers if I misrepresent anything. However, here are some points I'm generally aware of:
Both are extremely flexible - you can use meta-programming to generate and execute pretty much any code you like at runtime.
Clojure is a JVM language whereas Tcl is relatively standalone.
Advantages of being on the JVM include being able to generate code that interoperates with, and act as a DSL for, libraries in other JVM languages like Java and Scala. You also get access to the huge range of Java libraries.
The disadvantage of being on the JVM is the relatively substantial start-up time, so you'd probably prefer Clojure for longer-running server applications rather than command-line tools that need to execute quickly.
Clojure metaprogramming gets compiled to native code (via the JVM's JIT). This will depend on your application, but I expect it to perform faster than Tcl in most circumstances.
Both languages are dynamic (a good thing for metaprogramming on average!) and support functional programming.
Clojure contains some interesting abstractions that are very useful for metaprogramming - in particular, the considerable in-language support for sequences.
I personally find the simplicity of Lisp syntax to be a great advantage for metaprogramming - it's much easier to generate code when there is only one syntactic construct (the s-expression) to worry about...
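For instance (a throwaway sketch), Clojure code is just data, so generating and running it is direct:

(def form (list '+ 1 2 3))  ; build code as an ordinary list
(eval form)                 ;=> 6

(defmacro unless [test & body]
  `(if ~test nil (do ~@body)))

(unless false :ran)         ;=> :ran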
On average both languages are great for metaprogramming, but if I was choosing between the two:
I'd choose Clojure if I was building a server-side application or if I had a particular need for JVM interoperability
I'd choose Tcl if I wanted a DSL for managing scripts / tools at the command line
I think mikera's answer is excellent.
I would add only that where Clojure metaprogramming tends to focus on macros (a grand unified metaprogramming solution without peer), Tcl has several small, sharp tools available for metaprogramming, among them the "unknown" command, which can be used to do all sorts of nifty tricks. Also interesting is that even though Clojure has only a small number of keywords or "special forms", Tcl actually has none, which sort of makes Tcl its own DSL, and changing (or extending) the behavior of any command is possible.
One thing I have to disagree with in mikera's answer is the part about Clojure's syntax being more amenable to metaprogramming. One of the unpleasant surprises for me when coming to Clojure was actually how much syntactic variation there is, with the various uses of () [] {} "" ^{} #' :key ... and on and on. I totally grok the justification for this type of syntactic sugar, but I actually find Tcl's syntax easier to deal with for "ham-handed" metaprogramming and code generation. And Tcl's "everything is a string" nature adds to the simplicity.
As for multi-processing capabilities, immutable data structures, the purely functional nature, and many other of Clojure's distinctives, there really is nothing comparable in Tcl.
I couldn't agree more with mikera's conclusion.

Regex & String Libraries in Haskell

I'm trying to introduce Haskell into my daily life by using it to write incidental scripts and such.
readProcess is handy for getting the results of external commands, but I find myself searching when it comes to processing the String results. I'm coming from Ruby, where regexes are first-class, so I'm used to having them as a tool.
Any libraries I should read up on for string processing in Haskell? Searching for matching lines, pulling out matching regions of a string, and such?
I found this to be a good starting point: http://www.serpentine.com/blog/2007/02/27/a-haskell-regular-expression-tutorial/ It only covers the basics, no advanced topics, but it's great to get started IMHO.
Things to note:
Regexes in Haskell are different in that they have overloaded return types. This means that you can pull many different kinds of things out of a regex match (Bool, String, [String], etc.). Depending on the return type you use, it will give you back a different kind of answer (whether or not the regex matched, the text of the match, all matching subgroups, etc.). This is done using some fairly complex typeclass voodoo. The above link demonstrates the basic kinds; a more complete list is here.
There are actually multiple standard modules in Haskell that provide regex support (strange but true). The tutorial above shows the POSIX module, because it comes standard in Haskell. If you have cabal, you can also pretty easily install other regex modules and use those instead. There's a pcre binding (regex-pcre), as well as some packages that work via DFAs (regex-dfa, among others). Install using a command like cabal install regex-pcre and you should be good to go.
(The modules have a standardized interface, the difference is mainly in the implementation and the regex flavor)
There IS a regex object in Haskell, but you don't really need it to use the =~ or =~~ match operators (just use a string; conversion happens automatically). If your task is complicated enough that you want a first-class parsing object, consider looking into Parsec, as has been mentioned in other answers.
DISCLAIMER: I only really use pcre myself, so I don't really know much about the other packages.
When I was first teaching myself Haskell I found that learning to use a parser combinator library for string processing was a fantastic investment. They can do everything regular expressions can do, and much more, and writing combinator parsers is a great way to build up intuitions about type classes like monads, applicative functors, etc.
I tend to use Attoparsec these days, but Parsec is probably a better starting point because it's more widely documented and discussed, provides nicer error messages, etc.
A good introduction to regular expressions is to be found in Real World Haskell.
Update: On a side note, for command-processing and pipes and such, check out HSH.
There are plenty of great regex libs in Haskell, but we have better tools. Let's stick with standard Haskell Strings for now (i.e. lists of Char). The basics are all in Data.List -- http://www.haskell.org/ghc/docs/latest/html/libraries/base-4.3.0.0/Data-List.html. You have lines, unlines, words, unwords, takeWhile, dropWhile, etc. Also isPrefixOf and isInfixOf, and so on.
You may end up writing your own recursive functions fairly directly, but that's a breeze too. The only really missing operations are splitting ones, for which you can use Brent's excellent split package: http://hackage.haskell.org/package/split
Fundamentally, the notion is that you want to do incremental processing of streams of characters.
Not everything is as efficient as possible, especially since the string representation is not that efficient. But if/when you move on to other data types, the core concepts of how you process things will translate directly from basic strings.

Are regex tools (like RegexBuddy) a good idea?

One of my developers has started using RegexBuddy for help in interpreting legacy code, which is a usage I fully understand and support. What concerns me is using a regex tool for writing new code. I have actually discouraged its use for new code in my team. Two quotes come to mind:
Some people, when confronted with a problem, think "I know, I’ll use regular expressions." Now they have two problems. - Jamie Zawinski
And:
Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. - Brian Kernighan
My concerns are, respectively:
That the tool may make it possible to solve a problem using a complicated regular expression that really doesn't need it. (See also this question).
That my one developer, using regex tools, will start writing regular expressions which (even with comments) can't be maintained by anyone who doesn't have (and know how to use) regex tools.
Should I encourage or discourage the use of regex tools, specifically with regard to producing new code? Are my concerns justified? Or am I being paranoid?
Poor programming is rarely the fault of the tool. It is the fault of the developer not understanding the tool. To me, this is like saying a carpenter should not own a screwdriver because he might use a screw where a nail would have been more appropriate.
Regular expressions are just one of the many tools available to you. I don't generally agree with the oft-cited Zawinski quote, as with any technology or technique, there are both good and bad ways to apply them.
Personally, I see things like RegexBuddy and the free Regex Coach primarily as learning tools. There are certainly times when they can be helpful to debug or understand existing regexes, but generally speaking, if you've written your regex using a tool, then it's going to be very hard to maintain it.
As a Perl programmer, I'm very familiar with both good and bad regular expressions, and have been using even complicated ones in production code successfully for many years. Here are a few of the guidelines I like to stick to that have been gathered from various places:
Don't use a regex when a string match will do. I often see code where people use regular expressions in order to match a string case-insensitively. Simply lower- or upper-case the string and perform a standard string comparison.
Don't use a regex to see if a string is one of several possible values. This is unnecessarily hard to maintain. Instead, place the possible values in an array or hash (whatever your language provides) and test the string against those.
Write tests! Having a set of tests that specifically target your regular expression makes development significantly easier, particularly if it's a vaguely complicated one. Plus, a few tests can often answer many of the questions a maintenance programmer is likely to have about your regex.
Construct your regex out of smaller parts. If you really need a big complicated regex, build it out of smaller, testable sections. This not only makes development easier (as you can get each smaller section right individually), but it also makes the code more readable and flexible, and allows for thorough commenting. (A sketch follows this list.)
Build your regular expression into a dedicated subroutine/function/method. This makes it very easy to write tests for the regex (and only the regex). It also makes the code in which your regex is used easier to read (a nicely named function call is considerably less scary than a block of random punctuation!). Dropping huge regular expressions into the middle of a block of code (where they can't easily be tested in isolation) is extremely common, and usually very easy to avoid.
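Here's a sketch in Clojure of the "smaller parts" and "dedicated function" points (the pattern and names are made up):

(def digits "\\d+")
(def sep "[-/.]")
(def date-re (re-pattern (str "^" digits sep digits sep digits "$")))

(defn date-like? [s]
  (boolean (re-matches date-re s)))

(date-like? "2011-10-01")  ;=> true
(date-like? "banana")      ;=> false

;; and, per the "one of several values" point, a set instead of an alternation:
(contains? #{"abs" "floor" "ceil"} "ceil")  ;=> true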
You should encourage the use of tools that make your developers more efficient. Having said that, it is important to make sure they're using the right tool for the job. You'll need to educate all of your team members on when it is appropriate to use a regular expression, and when (less|more) powerful methods are called for. Finally, any regular expression (IMHO) should be thoroughly commented to ensure that the next generation of developers can maintain it.
I'm not sure why there is so much diffidence towards regexes.
Yes, they can become messy and obscure, exactly like any other piece of code somebody may write, but they have an advantage over code: they represent the set of strings one is interested in, in a formally specified way (at least as specified by your language's dialect, extensions included). Understanding which set of strings is accepted by a piece of code requires "reverse engineering" the code.
Sure, you could discourage the use of regexes, as has already been done with recursion and gotos, but to me this would be justified only if there's a good alternative.
I would prefer to maintain a single-line regex than a convoluted hand-made function that tries to capture the same set of strings.
On using a tool to understand a regex (or write a new one): I think it's perfectly fine! If somebody wrote it with a tool, somebody else can understand it with a tool! Actually, if you are worried about this, I would see tools like RegexBuddy as your best insurance that the code will not become unmaintainable just because of the regexes.
Regex testing tools are invaluable. I use them all the time. My job isn't even particularly regex heavy, so having a program to guide me through the nuances as I build my knowledge base is crucial.
Regular expressions are a great tool for a lot of text handling problems. If you have someone on your team who is writing regexes that the rest of the team don't understand, why not get them to teach the rest of you how they are working? Rather than a threat, you could be seeing this as an opportunity. That way you wouldn't have to feel threatened by the unknown and you'll have another very valuable tool in your arsenal.
Zawinski's comments, though entertainingly glib, are fundamentally a display of ignorance, and writing regular expressions is not the whole of coding, so I wouldn't worry about those quotes. Nobody ever got the whole of an argument into a one-liner anyway.
If you came across a regular expression that was too complicated to understand even with comments, then probably a regex wasn't a good solution for that particular problem, but that doesn't mean they have no use. I'd be willing to bet that if you've deliberately avoided them, there will be places in your codebase with many lines of code where a single, simple regex would have done the same job.
RegexBuddy is a useful shortcut to make sure that the regular expressions you are writing do what you expect. It certainly makes life easier, but it's the matter of using regexes at all that seems important to me about your question.
Like others have said, I think using or not using such a tool is a neutral issue. More to the point: if a regular expression is so complicated that it needs inline comments, it is too complicated. I never comment my regexps. I approach large or complex matching problems by breaking them down into several steps of matching, either with multiple match statements (=~), or by building up a regexp from sub-regexps.
Having said all that, I think any developer worth his salt should be reasonably proficient in regular expression writing and reading. I've been using regular expressions for years and have never encountered a time where I needed to write or read one that was terrifically complex. But a moderately sized one may be the most elegant and concise way to do a validation or match, and regexps should not be shied away from only because an inexperienced developer may not be able to read it -- better to educate that developer.
What you should be doing is getting your other devs hooked up with RB.
Don't worry about that whole "two problems" quote; it seems it may have been a jab at Perl (said back in 1997), not at regexes.
I prefer not to use regex tools. If I can't write it by hand, then it means the output of the tool is something I don't understand and thus can't maintain. I'd much rather spend the time reading up on some regex feature than learning the regex tool. I don't understand the attitude of many programmers that regexes are a black art to be avoided/insulated from. It's just another programming language to be learned.
It's entirely possible that a regex tool would save me some time implementing regex features that I do know, but I doubt it... I can type pretty fast, and if you understand the syntax well (using a text editor where regexes are idiomatic really helps -- I use gVim), most regexes really aren't that complex. I think you're nearly always better served by learning a technology better rather than learning a crutch, unless the tool is something where you can put in simple info and get out a lot of boilerplate code.
Well, it sounds like the cure for that is for some smart person to introduce a regex tool that annotates itself as it matches. That would suggest that using a tool is not so much the issue as whether there is a big gap between what the tool understands and what the programmer understands.
So, documentation can help.
A really trivial example is a table like the following (just a suggestion):
Expression          Match    Reason
^                   Pos 0    Start of input
\s+                 " "      At least one space
(abs|floor|ceil)    ceil     One of "abs", "floor", or "ceil"
...
I see the issue, though. You probably want to discourage people from building more complex regular expressions than they can parse. I think standards can address this, by always requiring expanded REs and checking that the annotations are proper.
However, if they just want to debug an RE, to make sure it's acting as they think it's acting, then it's not really much different from writing code you have to debug.
It's relative.
A couple of regex tools (for Node/JS, PHP and Python) I made (for some other projects) are available online to play and experiment with:
regex-analyzer and regex-composer
github repo