Parsing in Clojure

Parsing in Clojure - clojure

I am aware of edn/read-string
In Haskell, I tend to use Parsec https://hackage.haskell.org/package/parsec
I need to parse an Algo/Pascal like programming language. What are the suggested libraries to use for parsing in Clojure?
EDIT: 4. Bonus if there's a way to do this via a core.logic like manner, where a "prioritized" conde would specify the options (and the order to resolve ambiguity).

You should really look at Instaparse. It is one of the best front end parsing libraries I've ever used. It is a lively project with steady advancements.

Related

Is there an idiomatic alternative to nil-punning in Clojure?

I'm reading some Clojure code at the moment that has a bunch of uninitialised values as nil for a numeric value in a record that gets passed around.
Now lots of the Clojure libraries treat this as idiomatic. Which means that it is an accepted convention.
But it also leads to NullPointerException, because not all the Clojure core functions can handle a nil as input. (Nor should they).
Other languages have the concept of Maybe or Option to proxy the value in the event that it is null, as a way of mitigating the NullPointerException risk. This is possible in Clojure - but not very common.
You can do some tricks with fnil but it doesn't solve every problem.
Another alternative is simply to set the uninitialised value to a symbol like :empty-value to force the user to handle this scenario explicitly in all the handling code. But this isn't really a big step-up from nil - because you don't really discover all the scenarios (in other people's code) until run-time.
My question is: Is there an idiomatic alternative to nil-punning in Clojure?

Not sure if you've read this lispcast post on nil-punning, but I do think it makes a pretty good case for why it's idiomatic and covers various important considerations that I didn't see mentioned in those other SO questions.
Basically, nil is a first-class thing in clojure. Despite its inherent conventional meaning, it is a proper value, and can be treated as such in many contexts, and in a context-dependent way. This makes it more flexible and powerful than null in the host language.
For example, something like this won't even compile in java:
if(null) {
....
}
Where as in clojure, (if nil ...) will work just fine. So there are many situations where you can use nil safely. I'm yet to see a java codebase that isn't littered with code like if(foo != null) { ... everywhere. Perhaps java 8's Optional will change this.
I think where you can run into issues quite easily is in java interop scenarios where you are dealing with actual nulls. A good clojure wrapper library can also help shield you from this in many cases, and its one good reason to prefer one over direct java interop where possible.
In light of this, you may want to re-consider fighting this current. But since you are asking about alternatives, here's one I think is great: prismatic's schema. Schema has a Maybe schema (and many other useful ones as well), and it works quite nicely in many scenarios. The library is quite popular and I have used it with success. FWIW, it is recommended in the recent clojure applied book.

Is there an idiomatic alternative to nil-punning in Clojure?
No. As leeor explains, nil-punning is idiomatic. But it's not as prevalent as in Common Lisp, where (I'm told) an empty list equates to nil.
Clojure used to work this way, but the CL functions that deal with lists correspond to Clojure functions that deal with sequences in general. And these sequences may be lazy, so there is a premium on unifying lazy sequences with others, so that any laziness can be preserved. I think this evolution happened about Clojure 1.2. Rich described it in detail here.
If you want option/maybe types, take a look at the core.typed library. In contrast to Prismatic Schema, this operates at compile time.

Clojure code static analysis tools

Is there a tool to run code convention tests in clojure? For example, make sure function names don't have any capital letters or keywords don't have any underscores in them.

Two useful Leiningen plugins I learned about recently:
lein-bikeshed
lein-kibit

Late to the party here. Seconding noahlz, the three main static analysis tools that I use on a regular basis are lein-bikeshed, lein-kibit, and Eastwood, though I also use yagni. Each of these has different strengths.
Bikeshed is good for general code cleanup but is mostly focused on style (e.g. making sure lines aren't too long, there's no trailing whitespace, functions have docstrings, etc.).
Kibit is good for showing you the most idiomatic function to use (for instance, when using an if form that returns nil if false, you could just use when instead).
Eastwood is probably the most comprehensive lint tool that exists for Clojure, and checks for a pretty impressive number of code smell issues.
Finally, Yagni is great for finding unused code paths in your libraries and applications.

What Language Features Can Be Added To Clojure Through Libraries?

For example pattern matching is a programming language feature that can be added to the clojure language through macros: http://www.brool.com/index.php/pattern-matching-in-clojure
What other language features can be added to the language?

Off the top of my hat I have two examples, but I'm sure there are more.
Contracts programming: https://github.com/fogus/trammel
Declarative logic: https://github.com/jduey/mini-kanren

I think its a stupid question to ask what can be added, what you should ask is what you cant add. Macros allow you to hook into the compiler that mean you can do almost anything.
At the moment you cant add your own syntax to the language. Clojure does not have a user extenseble reader, this means you don't have any reader-macros (http://dorophone.blogspot.com/2008/03/common-lisp-reader-macros-simple.html). This is not because of a technical problem but more a decition by Rich Hickey (the Clojure creator).
What you can not do is implement features that need virtual maschine support like add tail call semantics or goto.
If you want to see some stuff that has been done: Are there any Clojure DSLs?
Note that this list is not 100% up to date.
Edit:
Since you seem you took pattern matching as an example (it is a really good example for the power of macros) you should really look at the match library. Its probebly the best fastest pattern matching library in Clojure. http://vimeo.com/27860102

You can effectively add any language features you like.
This follows from the ability of macros to construct arbitrary code at compile time: as long as you can figure out what code you need to generate in order to implement your language features, it can be achieved with macros.
Some examples I've seen:
Query languages (Korma)
Logic programming (core.logic)
Image synthesis DSL (clisk)
Infix notation for arithmetic
Algebraic manipulation
Declarative definition of realtime data flows (Storm, Aleph)
Music programming (Overtone, Music As Data)
There are a few caveats:
If the feature isn't supported directly by the JVM (e.g. tail call optimisation in the mutually recursive case) then you'll have to emulate it. Not a big deal, but may have some performance impact.
If the feature requires a syntax not supported by the Clojure reader, you'll need to provide your own reader (since Clojure lacks an extensible reader at present). As a result, it's much easier if you stick to Clojure syntax/forms.
If you do anything too unusual / unidiomatic, it probably won't get picked up by others. There is a lot of value in sticking to standard Clojure conventions.
Beware of using macros where they are not needed. Often, just using normal functions (perhaps higher order functions) is sufficient to implement many new language features. The general rule is: "don't use macros unless you absolutely need to".

Writing a tokenizer, where to begin?

I'm trying to write a tokenizer for CSS in C++, but I have no idea how to write a tokenizer. I know that it should be greedy, reading as much input as possible, for each token, and in theory I know how I could put that in code.
I have looked at Boost.Tokenizer, and it seems nice, but it doesn't help me whatsoever. It sure is a nice wrapper for a tokenizer, but the problem lies in writing the token splitter, the TokenizerFunction in Boost terms.
I have no idea how to write this tokenizer, are there any "neat" ways of doing it, like something that closely resembles the syntax itself?
Please note, I'm not looking for a parser! My application doesn't need to be able to understand CSS, just read a CSS file to a general internal tokenized format, process some things and output again.

Writing a "correct" lexer and/or parser is more difficult than you might think. And it can get ugly when you start dealing with weird corner cases.
My best suggestion is to invest some time in learning a proper lexer/parser system. CSS should be a fairly easy language to implement, and then you will have acquired an amazingly powerful tool you can use for all sorts of future projects.
I'm an Old Fart® and I use lex/yacc (or things that use the same syntax) for this type of project. I first learned to use them back in the early 80's and it has returned the effort to learn them many, many times over.
BTW, if you have anything approaching a BNF of the language, lex/yacc can be laughably easy to work with.

Boost.Spirit.Qi would be my first choice.
Spirit.Qi is designed to be a practical parsing tool. The ability to generate a fully-working parser from a formal EBNF specification inlined in C++ significantly reduces development time. Programmers typically approach parsing using ad hoc hacks with primitive tools such as scanf. Even regular-expression libraries (such as boost regex) or scanners (such as Boost tokenizer) do not scale well when we need to write more elaborate parsers. Attempting to write even a moderately-complex parser using these tools leads to code that is hard to understand and maintain.
The Qi tutorials even finish by implementing a parser for an XMLish language; writing a grammar for CSS should be considerably easier.

Regex & String Libraries in Haskell

I'm trying to introduce Haskell into my daily life by using it to write incidental scripts and such.
readProcess is handy for getting the results of exterior commands, but I find myself searching when it comes to processing the String results. I'm coming from ruby where regexes are first-class, so I'm used to having them as a tool.
Any libraries I should read up on to do string processing in haskell? Searching for matching lines, pulling out matching regions of a string, and such?

I found this to be a good starting point: http://www.serpentine.com/blog/2007/02/27/a-haskell-regular-expression-tutorial/ It only covers the basics, no advanced topics, but it's great to get started IMHO.
Things to note:
Regexes in haskell are different in that they have overloaded return types. This means that you can pull many different kinds of thing out of a regex match. (Bool, String, [String], etc...) Depending on the return type you use, it will give you back a different kind of answer (whether or not the regex matched, the test of the match, all matching subgroups, etc..) This is done using some fairly complex typeclass voodoo. The above link demonstrates the basic kinds, a more complete list is here
There are actually multiple standard modules in haskell that provide regex support (strange but true). The tutorial above shows the POSIX module, because it comes standard in haskell. If you have cabal, you can also pretty easily install other regex modules and use those instead. There's a pcre binding (regex-pcre), as well as some packages that work via DFAs (regex-dfa, among others). Install using a command like: cabal install regex-pcre and you should be good to go.
(The modules have a standardized interface, the difference is mainly in the implementation and the regex flavor)
There IS a regex object in haskell, but you don't really need it to use the =~ or =~~ match operators. (Just use a string, conversion happens automatically). If your task is complicated enough that you want a first class parsing object, consider looking into Parsec as has been mentioned in other answers.
DISCLAIMER: I only really user pcre, myself, so I don't really know much about the other packages.

When I was first teaching myself Haskell I found that learning to use a parser combinator library for string processing was a fantastic investment. They can do everything regular expressions can do, and much more, and writing combinator parsers is a great way to build up intuitions about type classes like monads, applicative functors, etc.
I tend to use Attoparsec these days, but Parsec is probably a better starting point because it's more widely documented and discussed, provides nicer error messages, etc.

A good introduction to regular expressions is to be found in Realworld Haskell
Update: On a side note, for command-processing and pipes and such, checkout HSH.

There are plenty of great regex libs in Haskell, but we have better tools. Let's stick with standard Haskell Strings for now (i.e. lists of Char). The basics are all in Data.List -- http://www.haskell.org/ghc/docs/latest/html/libraries/base-4.3.0.0/Data-List.html. You have lines, unlines, words, unwords, takewhile, dropwhile, etc.etc. Also isPrefixOf and isInfixOf, etc.
You may end up writing your own recursive functions fairly directly, but that's a breeze too. The only really missing operations are splitting ones, for which you can use brent's excellent package: http://hackage.haskell.org/package/split
Fundamentally, the notion is that you want to do incremental processing of streams of characters.
Not everything is as efficient as possible, especially since the string representation is not that efficient. But if/when you move on to other data types, the core concepts of how you process things will translate directly from basic strings.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js