foreign source code pre-processing with Clojure


I'd like to preprocess code from another language like so:
Predicate1(X) => Predicate2(Y)
<% (clojure-func "Predicate3" "X" "Y") %>
Basically, whatever is inside the <% %> delimiters gets executed, and the string it emits is inserted in its place. I see that there are HTML templating libraries. I'm wondering if I can get by with something like Clojure macros. It is possible that I'm not aware of the benefits provided by a templating library like Fleet or Selmer, and need some guidance.
In the above example I want to generate several combinations of expressions:
Predicate3(X_a) => Predicate2(Y)
Predicate3(X_b) => Predicate2(Y)
Ultimately, I do need to keep track of variables of the foreign language. For this purpose pre-processing may be the wrong approach, and I may instead be better off doing complete code generation.
P.S.: For those of you wondering, I'm trying to extend the language of Markov Logic Networks (MLN).

Clojure macros will not help you directly with this. Macros still require their input to be in syntax that the Clojure reader can read, with invocations of the form (macro arg1 arg...).
Other Lisps do allow you to extend the readable syntax with reader macros, but Clojure deliberately does not allow them.
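For the <% ... %> case specifically, an ordinary function may be enough. The following is only a minimal sketch, not a replacement for a templating library: preprocess and template-ns are hypothetical names, and the body of clojure-func is an assumed implementation that produces the two rules from the question. Note that eval runs arbitrary code, so this is only suitable for templates you trust.

```clojure
(require '[clojure.string :as str])

;; Namespace in which embedded forms are evaluated, captured at load time
;; so that symbols such as clojure-func resolve regardless of the caller.
(def template-ns *ns*)

(defn clojure-func
  "Hypothetical helper: emits one rule per suffix of the variable x."
  [pred x y]
  (str/join "\n"
            (for [suffix ["a" "b"]]
              (str pred "(" x "_" suffix ") => Predicate2(" y ")"))))

(defn preprocess
  "Replaces every <% form %> in template with the string value of form."
  [template]
  (str/replace template
               #"(?s)<%(.*?)%>"
               (fn [[_ code]]
                 (binding [*ns* template-ns]
                   ;; read-string reads the first form; eval executes it
                   (str (eval (read-string code)))))))

(println
 (preprocess
  "Predicate1(X) => Predicate2(Y)\n<% (clojure-func \"Predicate3\" \"X\" \"Y\") %>"))
;; prints:
;; Predicate1(X) => Predicate2(Y)
;; Predicate3(X_a) => Predicate2(Y)
;; Predicate3(X_b) => Predicate2(Y)
```

Because the replacement is computed by a function, the emitted text is inserted literally. Tracking the foreign language's variables would still require real parsing or full code generation, as the question suspects.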

Related

Persisting variables in core.async style

I need to do a big trick and am keen on hearing your suggestions.
What I need is a macro that takes ordinary Clojure code peppered with a special "await" form. The await forms contain only Clojure code and are supposed to return that code's return value. Now, what I want is that when I run whatever is being produced by this macro, it should stop executing when the first "await" form is due for evaluation.
Then, it should dump all the variables defined in its scope so far to the database (I will ignore the problem that not all Clojure types can be serialised to EDN, e.g. functions can't), together with some marker of the place it has stopped in.
Then, if I want to run this code again (possibly on a different machine, another day) - it will read its state from the DB and continue where it stopped.
Therefore I could have, for example:
(defexecutor my-executor
  (let [x 7
        y (await (+ 3 x))]
    (if (await (> y x))
      "yes"
      "no")))
Now, when I do:
(my-executor db-conn "unique-job-id")
the first time I should get a special return value, something like
:deferred
The second time it should be like this as well; only the third time should a real return value be returned.
The question I have is not how to write such an executor, but rather how to gather information from within the macro about all the declared variables to be able to store them. Later I also want to re-establish them when I continue execution. The await forms can be nested, of course :)
I had a peek into core.async source code because it is doing a similar thing inside, but what I have found there made me shiver - it seems they employ the Clojure AST analyser to get this info. Is this really so complex? I know of &env variable inside a macro, but do not have any idea how to use it in this situation. Any help would be appreciated.
And one more thing. Please do not ask me why I need this or tell me that there is a different way of solving the problem - I want this specific solution.

"I will ignore the problem that not all Clojure types can be serialised to EDN, e.g. functions can't"
If you ignore this, it will be very restrictive for the kinds of Clojure expressions you can handle. Functions are everywhere, e.g. in the implementation of things like doseq and for. Likewise, a lot of interesting programs will depend on some Java object like a file handle or whatever.
"The question I have is not how to write such executor, but rather how to gather information from within the macro about all the declared variables to be able to store them."
If you manage to write such an executor, I suspect its implementation will need to know about local variables anyway. So you can put off this question until you have implemented your executor, at which point you will probably find it obsolete.
"I had a peek into core.async source code because it is doing a similar thing inside, but what I have found there made me shiver - it seems they employ the Clojure AST analyser to get this info. Is this really so complex?"
Yes, this is very intrusive. You are basically writing a compiler. Thank your lucky stars they wrote the analyzer for you already, instead of having to analyze expressions yourself.
"I know of &env variable inside a macro, but do not have any idea how to use it in this situation."
This is the easy part. If you like, you can write a simple macro that gives you all the locals in scope. This question has been asked and answered before, e.g. in "Clojure get local lets".
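A minimal sketch of such a macro follows. Inside a macro, &env is a map whose keys are the symbols of the locals in scope at the call site; the names locals and demo are hypothetical and used only for illustration.

```clojure
(defmacro locals
  "Expands to a map of keyword -> value for every local binding in scope."
  []
  ;; &env maps each local's symbol to compiler data; only the keys are
  ;; needed. The macro returns a map literal such as {:x x, :y y}, which is
  ;; then evaluated at the call site.
  (into {} (for [sym (keys &env)]
             [(keyword (name sym)) sym])))

(defn demo []
  (let [x 7
        y (+ 3 x)]
    (locals)))

(demo)
;; → a map containing :x 7 and :y 10
```

Locals introduced by destructuring or by other macros (including generated names) would show up in this map too, so a real executor would probably need to filter it.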
"And one more thing. Please do not ask me why I need this or that there is a different way of solving a problem - I want this specific solution."
This is generally an unproductive attitude when asking a question. It's admitting you're posing an XY problem, and still refusing to tell anyone what the Y is.

What language is used in this slide from Google Next

I was watching a Google Next session, as I'm interested in Google's cloud and their Go language.
Developer ecosystems and communities have their own ways of doing things and cultural customs, which can be really alien to outsiders who don't have the experience to fill in the gaps.
So I have a few noob questions:
What language is this?
What language does Google use in samples, Python, Go, or pseudo code?
Why is there a call to getFailedInserts() but the result of the get isn't assigned to anything?
Is it normal to use what I call magic strings, i.e. "WriteMutatedRecords", as instructions instead of naming a method as such or using an enum, or string consts?

The code example is Java using the Apache Beam programming model (https://beam.apache.org/).
I believe the complete code from the slide is here:
https://github.com/ryanmcdowell/dataflow-dynamic-schema/blob/master/src/main/java/com/google/cloud/pso/pipeline/DynamicSchemaPipeline.java
The code from the slide:
tries to insert data into a table 'events_table'.
If the BigQuery API returns a transient error (for example "column 'foo' does not exist"), it runs a table mutation that adds 'foo' and inserts the data again.
It is a pattern for creating flexible tables in BigQuery, which is a columnar database with predefined schemas.

The code example looks like it is written in Scala or Java. You can tell from a number of indicators:
The code has a Java-style syntax
Methods are called on objects (e.g. input), which means it is an object-oriented language
new BigQuerySchemaMutator() is typical of a Java-style constructor
These indicators do not, however, give any indication whether it is Scala or Java. The syntax of these languages is very similar, and both are JVM languages.
The strongest indicator for Scala, in my opinion, is that the code is written in a functional manner and contains two method invocations on BigQueryIO. These could either be static methods of the class BigQueryIO in the case of Java, or methods defined on the object BigQueryIO in Scala, which is a common design pattern in that language.
There is, however, the final ; which would only be necessary with Java.
For someone reading the code example this question is actually not important, because Apache Beam (which is the SDK that seems to be used here) is a Java library - which can be used both in Java and Scala.
The result of getFailedInserts seems to be further processed by calling .apply on it. This kind of style is called functional programming.
It's a whole different approach to programming, compared with the common procedural programming patterns found in most other languages (e.g. storing something in a variable, or variables in general).
Note that this example doesn't actually contain any functional programming per se (e.g. higher-order functions or lambdas), but the functional programming style is obvious.
It is generally considered best practice to avoid magic strings, but for a code example like this they probably wanted to keep the code as simple as possible, as it is already a one-liner (although with line breaks).

Why does Clojure lack user defined reader macros?

As I understand it Clojure does not expose the reader macro table or allow user defined reader macros.
From http://clojure.org/reader:
The read table is currently not accessible to user programs.
I'm just wondering if there is a definitive or explicit statement (presumably from Rich Hickey) stating the rationale for leaving them out of Clojure.
Note I'm not asking if it is a good or bad thing that Clojure lacks user defined reader macros. Just wondering why.

From the link in matt's comments, to quote the answer by Rich Hickey, the author of Clojure:
I am unconvinced that reader macros are needed in Clojure at this
time. They greatly reduce the readability of code that uses them (by
people who otherwise know Clojure), encourage incompatible custom mini-
languages and dialects (vs namespace-partitioned macros), and
complicate loading and evaluation.
To the extent I'm willing to accommodate common needs different from
my own (e.g. regexes), I think many things that would otherwise have
forced people to reader macros may end up in Clojure, where everyone
can benefit from a common approach.
Clojure is arguably a very simple language, and in that simplicity
lies a different kind of power.
I'm going to pass on pursuing this for now,
Rich

Strictly speaking, there are Tagged Literals, which allow you to specify what to do with the next form. For example, you can add
{to/u clojure.string/upper-case}
to data_readers.clj (see docs) and write something like this:
testapp.core> #to/u "asd"
"ASD"
but this is not as powerful as full support for reader macros, not least because, as the docs put it:
The data reader function is invoked on the form AFTER it has been read as a normal Clojure data structure by the reader.
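The same behaviour can be tried without editing data_readers.clj by binding *data-readers* around a call to read-string. This is just a small sketch: read-tagged is a hypothetical helper name, and the to/u tag is the one from the example above.

```clojure
(require '[clojure.string :as str])

(defn read-tagged
  "Reads one form from s with the given tag -> function map installed."
  [readers s]
  (binding [*data-readers* readers]
    (read-string s)))

;; The tag function receives the already-read string and upper-cases it.
(read-tagged {'to/u str/upper-case} "#to/u \"asd\"")
;; → "ASD"

;; The function sees a finished Clojure data structure, never raw characters:
;; here it is handed the vector [1 2].
(read-tagged {'to/u vector?} "#to/u [1 2]")
;; → true
```

The second call illustrates the limitation quoted above: the text after the tag must already be readable Clojure, so a tagged literal cannot introduce new surface syntax the way a reader macro could.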
I found this old log (don't ask me how)
http://clojure-log.n01se.net/date/2008-11-06.html
where there is a discussion with Rich Hickey's thoughts about reader macros.

How do Clojure programmers use Macros?

My understanding is Clojure's homoiconicity exists so as to make writing macros easier.
Based on this Stack Overflow thread, it looks like macros are used sparingly, except for DSLs in which higher-order functions are not to be used.
Could someone share some examples of how macros are used in real-life?

It's correct that homoiconicity makes writing Clojure macros very easy. Basically they enable you to write code that builds whatever code you want, exploiting the "code is data" philosophy of Lisp.
Macros in a homoiconic language are also extremely powerful. There's a fun example presentation I found where they implement a LINQ-like query syntax in just three lines of Clojure.
In general, Clojure macros are potentially useful for many reasons:
Control structures - it's possible to create certain control structures using macros that can never be represented as functions. For example, you can't write if as a function, because if it were a function then it would have to evaluate all three arguments, whereas with a macro you can make it evaluate only two (the condition value and either the true or false expression).
Compile time optimisation - sometimes you want to optimise your code based on a constant that is known or can be computed at compile time. For example, you could create a "logging" function that logs only if the code was compiled in debug mode, but creates zero overhead in the production application.
Code generation / boilerplate elimination - if you need to produce a lot of very similar code with similar structure, then you can use macros to automatically generate these from a few parameters. If you hate boilerplate, then macros are your friends.
Creating new syntax - if you see the need for a particular piece of syntax that would be useful (perhaps encapsulating a common pattern) then you can create a macro to implement this. Some DSLs for example can be simplified with additional syntax.
Creating a new language with entirely new semantics (credits to SK-Logic!) - theoretically you could even go so far as to create a new language using macros, which would effectively compile your new language down into Clojure. The new language would not even have to be Lisp-like: it could parse and compile arbitrary strings, for example.
One important piece of advice is to use macros only if you need them and functions won't work. Most problems can be solved with functions. Apart from the fact that macros are overkill for simple cases, functions have some intrinsic advantages: they are more flexible, can be stored in data structures, can be passed as parameters to higher-order functions, are a bit easier for people to understand, etc.
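To make the first two items in the list concrete, here is a small sketch. The names unless, debug-log and debug? are made up for illustration; debug? stands in for whatever build-time flag a real project would use.

```clojure
;; Control structure: the body must not run when the test is truthy. A plain
;; function could not guarantee that, because its arguments are evaluated
;; before the function is called.
(defmacro unless [test & body]
  `(if ~test nil (do ~@body)))

(def calls (atom 0))
(unless true  (swap! calls inc))          ; body skipped, returns nil
(unless false (swap! calls inc) :done)    ; body runs, returns :done
@calls
;; → 1

;; Compile-time optimisation: debug? is read while the macro expands, so
;; with debug? false the call expands to nil and leaves no logging code.
(def debug? false)

(defmacro debug-log [& args]
  (when debug?
    `(println "DEBUG:" ~@args)))

(macroexpand '(debug-log "starting" 42))
;; → nil
```

Since debug? is consulted at expansion time, changing it later has no effect on code that has already been compiled, which is exactly the property the compile-time optimisation relies on.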

How do you implement syntax highlighting?

I am embarking on some learning and I want to write my own syntax highlighting for files in C++.
Can anyone give me ideas on how to go about doing this?
To me it seems that when a file is opened:
1. It would need to be parsed and decided what type of source file it is. Trusting the extension might not be fool-proof
2. A way to know what keywords/commands apply to what language
3. A way to decide what color each keyword/command gets
I want to do this on OS X, using C++ or Objective-C.
Can anyone provide pointers on how I might get started with this?

Syntax highlighters typically don't go beyond lexical analysis, which means you don't have to parse the whole language into statements and declarations and expressions and whatnot. You only have to write a lexer, which is fairly easy with regular expressions. I recommend you start by learning regular expressions, if you haven't already. It'll take all of 30 minutes.
You may want to consider toying with Flex (the lexical analyzer generator; https://github.com/westes/flex) as a learning exercise. It should be quite easy to implement a basic syntax highlighter in Flex that outputs highlighted HTML or something.
In short, you would give Flex a set of regular expressions and what to do with matching text, and the generator will greedily match against your expressions. You can make your lexer transition among exclusive states (e.g. in and out of string literals, comments, etc.) as shown in the flex FAQ. Here's a canonical example of a lexer for C written in Flex: http://www.lysator.liu.se/c/ANSI-C-grammar-l.html.
Making an extensible syntax highlighter would be the next part of your journey. Although I am by no means a fan of XML, take a look at how Kate syntax highlighting files are defined, such as this one for C++. Your task would be to figure out how you want to define syntax highlighters, then make a program that uses those definitions to generate HTML or whatever you please.
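The lexer-only approach can be sketched in a few lines without Flex. The example below is written in Clojure to match the rest of this page rather than the C++ or Objective-C asked about, and everything in it is an assumption made for illustration: the rule table, the tiny keyword list, and the function names. Rules are tried in order at the current position, and the first match wins.

```clojure
(require '[clojure.string :as str])

;; [token-class pattern] pairs, tried in order. Deliberately incomplete.
(def token-rules
  [[:comment #"//[^\n]*"]
   [:string  #"\"(?:\\.|[^\"\\])*\""]
   [:keyword #"\b(?:int|return|if|else|for|while|class)\b"]
   [:number  #"\d+"]
   [:ident   #"[A-Za-z_]\w*"]
   [:other   #"(?s)."]])          ; fallback: any single character

(defn next-token
  "Returns [class text] for the first rule matching at the start of s."
  [s]
  (some (fn [[kind re]]
          (let [m (re-matcher re s)]
            (when (.lookingAt m)   ; match anchored at the start of s
              [kind (.group m)])))
        token-rules))

(defn tokenize [src]
  (loop [s src, out []]
    (if (empty? s)
      out
      (let [[kind text] (next-token s)]
        (recur (subs s (count text)) (conj out [kind text]))))))

(defn escape-html [s]
  (-> s
      (str/replace "&" "&amp;")
      (str/replace "<" "&lt;")
      (str/replace ">" "&gt;")))

(defn highlight
  "Wraps comments, strings, keywords and numbers in <span> elements."
  [src]
  (apply str
         (for [[kind text] (tokenize src)]
           (if (#{:other :ident} kind)
             (escape-html text)
             (str "<span class=\"" (name kind) "\">"
                  (escape-html text) "</span>")))))

(highlight "int x = 42; // hi")
;; → "<span class=\"keyword\">int</span> x = <span class=\"number\">42</span>; <span class=\"comment\">// hi</span>"
```

Colours would then come from a stylesheet keyed on the class names, and making the highlighter extensible amounts to loading token-rules from a per-language definition file instead of hard-coding it.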

You may want to look at how GeSHi implements highlighting, etc. In addition, it has a whole bunch of language packs that contain all the keywords you'll ever want.

Assuming that you are using the Cocoa frameworks, you can use UTIs to determine the file type.
For an overview of the API:
http://developer.apple.com/mac/library/documentation/FileManagement/Conceptual/understanding_utis/understand_utis_intro/understand_utis_intro.html#//apple_ref/doc/uid/TP40001319-CH201-SW1
For a list of known UTIs:
http://developer.apple.com/mac/library/documentation/Miscellaneous/Reference/UTIRef/Articles/System-DeclaredUniformTypeIdentifiers.html#//apple_ref/doc/uid/TP40009259-SW1
The two keys you are probably most interested in are kUTTypeObjectiveCPlusPlusSource and kUTTypeCPlusPlusHeader.
For the highlighting you might find the information on this page helpful as it discusses syntax highlighting with an NSView and temporary attributes:
http://www.cocoadev.com/index.pl?ImplementSyntaxHighlightingUsingTemporaryAttributes

I think (1) isn't possible, since the only way to tell if a file is valid C++ is to run it through a C++ parser and see if it parses... but if you used that as your standard, you couldn't operate on code that doesn't compile because it is a work-in-progress, which you probably want to do. It's probably best just to trust the extension, as I don't think any other method will work better than that.
You can get a list of C++ keywords here: http://www.cppreference.com/wiki/keywords/start
The colors are up to you (or if you want, you can make them configurable and leave the choice to the user)
