What exactly is the Clojure REPL? What is the technology behind it? - clojure

I know what the Clojure REPL does and how it is useful, but I do not have any information on how its internals work. Is it a program running in the JVM? How do the internals of a REPL work?

The technology behind it:
the tiny Java entry point:
https://github.com/clojure/clojure/blob/clojure-1.7.0/src/jvm/clojure/main.java
the actual implementation of the REPL written in Clojure:
https://github.com/clojure/clojure/blob/clojure-1.7.0/src/clj/clojure/main.clj
The links are to the 1.7.0 versions of the files, that being the most recent stable release as of this writing.
To summarize what these do, clojure.main is a tiny Java class with a main method that serves as the entry point to the REPL. (So, it's just a standard Java program.) That main method accepts any arguments and hands them off to a function in the clojure.main Clojure namespace (using a few simple calls to methods in the clojure.lang.RT class, which implements some core details of the Clojure runtime, to get at the function in question; well, strictly speaking, the Var that holds the function). That function then calls code that actually Reads user input, Evaluates it, Prints out the result and Loops around to read more input, until terminated by C-d or some other method, with various complications like setting up some Var bindings (to allow user control over some aspects of the REPL's operation and certain compiler settings).

Related

How can one execute C++ code at runtime from a string representing a function?

I'm working on a project wherein I need to be able to save a function string to disk, so I am having the user pass a string of characters that is the actual code of the function and saving it to disk. The opposite is necessary as well: loading a string (from file) and executing it as a function at runtime within C++. I need to load this function and return a function pointer to be used in my program. I'm looking at Clang right now, but some of it is a little over my head. So basically I have two questions:
Can Clang run code extracted from a string (loaded from disk)?
Can a compiled Clang function be represented with a function pointer pointing to it?
Any ideas?
The simple answer to your question is "yes", the slightly more complex answer is "not at all easily".
Doing it with C++ would require that you compile and link your function into a DLL/shared object, load it, then acquire the exported function. In addition, accepting such code from the user would be a terrible security risk.
C++ is a very poor choice for such run-time execution. You would be far better off going with a language meant for that use; JavaScript or Python come to mind.
You can't easily do this in a compiled language.
For a compiled program to execute a C++ function that has been dynamically provided at runtime, that function would need to be compiled itself. You could make your program call the compiler at runtime to generate a callable library (e.g. one that implements an interface or abstract class and is callable via Dependency Injection), but this is complex and is a project in and of itself. This also means that your application must be packaged with the compiler or must only be installed on systems that contain a compatible compiler - somewhat realistic on Linux, not at all so on Windows.
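The compile-at-runtime route described above can be sketched roughly as follows. This is a minimal sketch for POSIX systems only, not a definitive implementation. It assumes a compiler reachable as `c++` on the PATH, that the user's source exports its function with `extern "C"` linkage, and that every user function has the signature `double(double)`. The names `compile_command`, `compile_to_library`, `load_function` and `UserFn` are made up for illustration. On Windows the corresponding calls would be LoadLibrary and GetProcAddress.

```cpp
#include <dlfcn.h>  // POSIX dynamic loading: dlopen, dlsym, dlerror

#include <cassert>
#include <cstdlib>
#include <fstream>
#include <stdexcept>
#include <string>

// Signature every user function is assumed to have (an illustrative choice).
using UserFn = double (*)(double);

// Build the shell command that turns a source file into a shared object.
// "c++" is assumed to be on PATH; the paths are NOT shell-escaped here.
std::string compile_command(const std::string& src_path, const std::string& lib_path) {
    return "c++ -shared -fPIC -O2 -o " + lib_path + " " + src_path;
}

// Write the source text to disk and run the compiler; true on success.
bool compile_to_library(const std::string& source, const std::string& src_path,
                        const std::string& lib_path) {
    {
        std::ofstream out(src_path);
        if (!out) return false;
        out << source;
    }  // stream is closed (and flushed) here, before the compiler runs
    return std::system(compile_command(src_path, lib_path).c_str()) == 0;
}

// Load the library and look up an exported symbol.
// The handle is deliberately never closed, so the returned pointer stays valid.
UserFn load_function(const std::string& lib_path, const std::string& symbol) {
    assert(!symbol.empty());
    void* handle = dlopen(lib_path.c_str(), RTLD_NOW);
    if (handle == nullptr) {
        throw std::runtime_error(std::string("dlopen failed: ") + dlerror());
    }
    void* address = dlsym(handle, symbol.c_str());
    if (address == nullptr) {
        throw std::runtime_error("symbol not found: " + symbol);
    }
    return reinterpret_cast<UserFn>(address);
}
```

With a source string such as `extern "C" double user_fn(double x) { return x * 2; }`, calling `compile_to_library(source, "user_fn.cpp", "./user_fn.so")` and then `load_function("./user_fn.so", "user_fn")` would yield a callable pointer. The `./` prefix matters because dlopen searches the system library paths when the name contains no slash. Without `extern "C"` the symbol name is mangled and the lookup fails. On some older systems the program itself has to be linked with `-ldl`.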
A better solution would be to use an interpreter. JavaScript and Lisp both come with an eval() function that does exactly what you want - it takes a string (in the case of JavaScript) or a list (in the case of Lisp) and executes it as code.
A third possibility is to find a C++ interpreter that has an eval() function. I'm not sure if any exist. You could try to write one yourself.

Precompile script into objects inside C++ application

I need to provide my users the ability to write mathematical computations into the program. I plan to have a simple text interface with a few buttons including those to validate the script grammar, save etc.
Here's where it gets interesting. These functions the user is writing need to execute at multi-megabyte line speeds in a communications application. So I need the speed of a compiled language, but the usage of a script. A fully interpreted language just won't cut it.
My idea is to precompile the saved user modules into objects at initialization of the C++ application. I could then use these objects to execute the code when called upon. Here are the workflows I have in mind:
1) Testing(initial writing) of script: Write code in editor, save, compile into object (testing grammar), run with test I/O, Edit Code
2) Use of Code (Normal operation of application): Load script from file, compile script into object, Run object code, Run object code, Run object code, etc.
I've looked into several off-the-shelf interpreters, but can't find what I'm looking for. I considered Java, as it is pretty fast, but I would need to load the Java virtual machine, which means passing objects between C and the virtual machine... The interface is the bottleneck here. I really need to create a native C++ object running C++ code if possible. I also need to be able to run the code on multiple processors effectively in a controlled manner.
I'm not looking for the whole explanation on how to pull this off, as I can do my own research. I've been stalled for a couple days here now, however, and I really need a place to start looking.
As a last resort, I will create my own scripting language to fulfill the need, but that seems a waste with all the great interpreters out there. I've also considered taking an existing open source compiler and slicing it up for the functionality I need... just not saving the compiled results to disk... I don't know. I would prefer to use a mainline language if possible... but that's not required.
Any help would be appreciated. I know this is not your run of the mill idea I have here, but someone has to have done it before.
Thanks!
P.S.
One thought that just occurred to me while writing this was this: what about using a true C compiler to create object code, save it to disk as a dll library, then reload and run it inside "my" code? Can you do that with MS Visual Studio? I need to look at the licensing of the compiler... how to reload the library dynamically while the main application continues to run... hmmmmm I could then just group the "functions" created by the user into library groups. Ok that's enough of this particular brain dump...
A possible solution could be to use gcc (MinGW, since you are on Windows) and build a DLL out of your user-defined code. The DLL should export just one function. You can use the Win32 API to handle the DLL (LoadLibrary/GetProcAddress etc.). At the end of this job you have a C-style function pointer. The problem now is the arguments. If your computation has just one parameter you can do a cast to double (*funct)(double), but if you have many parameters you need to match them.
I think I've found a way to do this using standard C.
1) Standard C needs to be used because when it is compiled into a dll, the resulting interface is cross compatible with multiple compilers. I plan to do my primary development with MS Visual Studio and compile objects in my application using gcc (Windows version).
2) I will expose certain variables to the user (inputs and outputs) and standardize them across units. This allows multiple units to be developed with the same interface.
3) The user will only create the inside of the function using standard C syntax and grammar. I will then wrap that function with text to fully define the function and its environment (remember those variables I intend to expose?). I can also group multiple functions under a single executable unit (dll) using name parameters.
4) When the user wishes to test their function, I dump the dll from memory, compile their code with my wrappers in gcc, and then reload the dll into memory and run it. I would let them define inputs and outputs for testing.
5) Once the test/create step was complete, I have a compiled library created which can be loaded at run time and handled via pointers. The inputs and outputs would be standardized, so I would always know what my I/O was.
6) The only problem with standardized I/O is that some of the inputs and outputs are likely to not be used. I need to see if I can put default values in or something.
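The wrapping in step 3 and the default values in step 6 could look roughly like the sketch below. It assumes a fixed interface of three double inputs (A, B, C) and three double outputs (X, Y, Z). The helper name `wrap_user_code` and the `out_X`/`out_Y`/`out_Z` parameter names are made up for illustration. Export decoration and error handling are left out.

```cpp
#include <cassert>
#include <string>

// Wrap a user-written function body in a complete C function definition.
// The fixed interface (inputs A, B, C; outputs X, Y, Z; all doubles) is an
// assumption made for this sketch.
std::string wrap_user_code(const std::string& function_name, const std::string& body) {
    assert(!function_name.empty());
    std::string src;
    src += "#include <math.h>\n\n";
    src += "void " + function_name + "(double A, double B, double C,\n";
    src += "        double *out_X, double *out_Y, double *out_Z)\n";
    src += "{\n";
    src += "    /* Defaults, so outputs the user never assigns are still defined. */\n";
    src += "    double X = 0.0, Y = 0.0, Z = 0.0;\n";
    src += "    {\n";
    src += body + "\n";  // the user's code sees A, B, C, X, Y, Z as plain variables
    src += "    }\n";
    src += "    /* Copy the results out through the standardized interface. */\n";
    src += "    *out_X = X; *out_Y = Y; *out_Z = Z;\n";
    src += "}\n";
    return src;
}
```

For example, `wrap_user_code("unit1", "X = sqrt(A * A + B * B);")` produces a complete translation unit that can be handed to gcc. One gotcha with this layout: a bare `return;` in the user's body would skip the copy-out at the end, so the editor would have to reject it or the wrapper would need a different structure.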
So, to sum up:
Think of an app with a text box and a few buttons. You are told that your inputs are named A, B, and C and that your outputs are X, Y, and Z of specified types. You then write a function using standard C code, and with functions from the specified libraries (I'm thinking math etc.)
So now you're done... you see a few boxes below to define your input. You fill them in and hit the TEST button. This would wrap your code in a function context, dump the existing dll from memory (if it exists) and compile your code along with any other functions in the same group (another parameter you could define, basically just a name to the user). It then runs the function using a function pointer, using the inputs defined in the UI. The outputs are sent to the user so they can determine if their function works. If there are any compilation errors, those would also be shown to the user.
Now it's time to run for real. Of course I kept track of what functions are where, so I dynamically open the dll, and load all the functions into memory with function pointers. I start shoving data into one side and the functions give me the answers I need. There would be some overhead to track I/O and to make sure the functions are called in the right order, but the execution would be at compiled machine code speeds... which is my primary requirement.
Now... I have explained what I think will work in two different ways. Can you think of anything that would keep this from working, or perhaps any advice/gotchas/lessons learned that would help me out? Anything from the type of interface to tips on dynamically loading dll's in this manner to using the gcc compiler this way... etc would be most helpful.
Thanks!

Beginning Clojure without Java experience - how to best organise and run projects?

Apologies in advance for the somewhat discursive nature of this clump of related questions; I hope the answers will be a useful resource for newcomers to Clojure.
I have just begun to learn Clojure, motivated in part by this essay. I'm not a professional developer but I have several decades of programming experience (ARexx, VB/VBScript/VBA, then Perl and daily use of R starting in 2011). My platform is Windows 7 64-bit. I'm using Emacs 24.3, cider 20131221 and Leiningen 2.3.3 on Java 1.7.0_45 Java Hotspot 64-bit server. I have bought Clojure Programming and the Clojure Data Analysis Cookbook and dipped into both. I have found them promising but I am getting lost in the detail.
Obviously the thing to do is to get stuck in and experiment with code exercises and small tasks, but the immediate problem for me has been the complexity of structuring, organising and even just plain running projects in Clojure. With R I can get away with a file of plain text containing the bulk of the code, perhaps with one or two others containing common functions for larger projects.
Clojure is very different and with no experience in Java I am struggling to put the pieces together. Clojure Programming has a whole chapter on organising and building projects, but it is so comprehensive that conversely I'm finding it difficult to tease out the information relevant to me now. I guess I'm looking for something like this answer on Swank, but the tools seem to have moved on since then. So here goes.
Leiningen produces amongst other things a project.clj file that contains the project definition and dependencies. I think I get this. Can I use this file for code not related to the definition, below the defproject, or is it best to leave this untouched and have the code itself in different clj file(s)?
If the answer is to leave the project.clj file alone, how is the relationship between that and other files established? Is it simply that all the clj files in the project folder are counted part of the project?
How do I define the main code file, the 'entry point' of the project? Let's say I have project.clj and main.clj with some helper functions in common.clj - how are the relations between these three files defined? I can call functions from main.clj but how does the project know that main is the core of the project if/when I package the project into an uberjar?
If I have a number of clj files, what is the best way to import functions? I have read about require and use (and import and refer and...) but I don't fully understand the difference and those two keywords are difficult to search for. The examples for REPL in the Clojure Data Analysis Cookbook most often opt for use. I found a similar question but it was a little over my head.
This is more tool-specific, but as Emacs seems to be widely used it seems fair to ask: what's a good workflow to run small bits of code given (say) the main.clj example given above? Currently I just open the main.clj file in Emacs, do an M-x cider-jack-in to establish the REPL, experiment in the REPL, then when I want to try something I select the whole buffer and select Eval region from the CIDER menu (C-c C-R). Is this standard operating procedure or utterly misguided?
Is there a convention for defining namespaces? I think I understand that namespaces can cover multiple clj files and that ns is used to define the namespace. Should I explicitly define the namespace (at the beginning of) every file of code? Clojure Programming has some recommendations but I'm interested in input from other users.
Clojure Programming says to "Use underscores in filenames when namespaces contain dashes. Very simply, if your namespace is to be com.my-project.foo, the source code for that namespace should be in a file located at com/my_project/foo.clj". (EDIT as explained in this useful answer and also this one). This restriction would never have occurred to me. Are there any other gotchas with regard to naming namespaces and variables? R frequently uses dots in variable names but I guess given the Java connection that dots should generally be avoided?
No, don't put actual code in there unless you know what you are doing (e.g. generating the version number for defproject from the local git repository, as in the repositories of juxt).
The project.clj is simply one big parameter to Clojure's build tool, Leiningen. See an example here: https://github.com/technomancy/leiningen/blob/master/sample.project.clj. For example, you could specify a different source directory than src in :source-paths.
Default is the -main function in project.core, but you can specify various different configurations in the project.clj.
require is preferred. :use imports all publics of a namespace unless you use it in conjunction with :only. :require lets you use an alias for an entire namespace with :as, and you can get the same effect as :use with :only by using :require with :refer. Notice that in ClojureScript :use without :only is not even allowed.
This is normal. There are other combos, e.g. C-c C-k to reload the entire file of the buffer. If you find yourself entering too many forms into a REPL and would rather edit them in a separate buffer, see https://www.refheap.com/22235.
I like to experiment with naming namespaces as verbs rather than nouns, e.g. I prefer myproject.parse, myproject.interpret, over myproject.parser, myproject.interpreter etc. But that's a question of personal style. EDIT: Yes, explicitly define the name of the namespace by its filename and the ns form at the beginning of the source file. It is unusual to have multiple source files defining one namespace.
As far as I know, this is the only caveat regarding the naming of namespaces. You can hardly know it in advance.
I like your "worried" approach. You will (hopefully) find out that Clojure and especially Leiningen are almost nonsense-free in terms of these questions.
Regarding REPL use: I saw your comment under @Mars's answer that you want to use a REPL in such a way that you can re-use what you are entering. Two things:
Dynamic development is awesome, allowing you to test small components or functions interactively without the need to run an entire program written for that purpose.
If you find yourself entering huge forms at the REPL that you intend to de-/recompose into functions or tests later, I recommend editing them in a separate clj file that is not part of the project source (i.e. not in a namespace). You can then use this Emacs hack to eval forms from a Clojure buffer in the REPL. Ideally split your Emacs in two windows (C-x 3) with the nrepl buffer on one side and your .clj on the other side. Then use C-x C-. from within the clj file to have the form at point pasted into the nrepl and be evaluated. Installation instructions are at the link (and your .emacs file usually resides in the home directory).
@Igrapenthin's answers are great. Here are a few other thoughts.
On namespaces, this tutorial is great.
Just to clarify re #2: No, don't just put the .clj files anywhere under the project. They have to be under src/, or in whatever directories are listed (as strings) in the vector after :source-paths in project.clj, if that entry exists. Then strip off that initial path when you're making your namespace names. This drove me crazy until I figured it out. (People who know better, please correct me if something here isn't right.)
On #3, you need Igrapenthin's answer, but why not just start by evaluating expressions in the REPL? I've been working on a project on and off for weeks, and it does a lot, but my -main function still doesn't do anything. I just run whatever parts I'm working on. Well, you're used to languages with fully operational prompts--you decide.
EDIT: Whether or not you define the -main function to do anything, you can also put :use or :require keywords in the ns statement that defines the namespace for that same file. These will automatically get invoked when you start the REPL with lein repl, and so whatever you have made available through the ns keywords will be available at the REPL. That way, you have your previous work available, but you can play around with it in different ways in the REPL. (Also, if you don't like the default name for the file that's automatically loaded, you can redefine it in project.clj with :main. Igrapenthin alluded to that.)

Poll a file for change?

I often need to poll a file for changes (that is, to check when it was last written). When the file's modification time differs from its previous value, I wish to execute some function. Something of the form:
(poll-for-changes file-str on-change-fx current-value)
where
file-str is just a string that specifies the file's location.
on-change-fx is the function that should be called when the file at file-str changes. Let us say that on-change-fx should take the File object pointing to file-str as an argument.
current-value is the file's last known modification time in milliseconds. You might set it to 0 to guarantee that this function will run at least once, or to the actual value to only run this function when you actually detect a change.
I would just like this function implemented in the clearest, most concise, Clojurist way possible. Thank you.
If you're looking to poll a directory or files and act on it, I think watchtower is pretty good to look at.
Java 7 has a WatchService, which uses file system events to react to changes. In this case, you don't poll at all, but block on a future file event. I don't think there are any Clojure projects out there leveraging that, although I spent some time toying with it to write a small library. The source for it is here.
I don't claim my library is even complete, but it does use the Java 7 service, so you could use that for inspiration on your own project.
There are two approaches you could use here:
Use Java interop and the Java 7 WatchService API.
Inspect and learn from existing idiomatic, concise code (in this case by Stuart Sierra) that does something like you want. Note it also uses Java Interop.
I think option #1 is your best bet, and the implementation of the function should be straightforward. You will likely want to use doto and the -> and ->> macros to make the code more readable.
Nowadays I would probably try hara.io.watch first.
Otherwise there are many alternatives (as stated in the hara docs):
clojure-watch
dirwatch
hawk
watchtower
java-watcher
panoptic
ojo
filevents
And some code can be extracted from:
lein-midje
ns-tracker
lazytest
You could also do what hara does and wrap java.nio.file.WatchService.

Interpreters: Handling includes/imports

I've built an interpreter in C++ and everything works fine so far, but now I'm getting stuck with the design of the import/include/however you want to call it function.
I thought about the following:
Handling includes in the tokenizing process: When there is an include found in the code, the tokenizing function is recursively called with the filename specified. The tokenized code of the included file is then added to the prior position of the include.
Disadvantages: No conditional includes(!)
Handling includes during the interpreting process: I don't know how. All I know is that PHP must do it this way as conditional includes are possible.
Now my questions:
What should I do about includes?
How do modern interpreters (Python/Ruby) handle this? Do they allow conditional includes?
This problem is easy to solve if you have a clean design and you know what you're doing. Otherwise it can be very hard. I have written at least 6 interpreters that all have this feature, and it's fairly straightforward.
Your interpreter needs to maintain an environment that knows about all the global variables, functions, types and so on that have been defined. You might feel more comfortable calling this the "symbol table".
You need to define an internal function that reads a file and updates the environment. Depending on your language design, you might or might not do some evaluation the moment you read things in. My interpreters are very dynamic and evaluate each definition as soon as it is read in.
Your life will be infinitely easier if you structure your interpreter in layers:
Tokenizer (breaks input into tokens)
Parser (reads one token at a time, converts to abstract-syntax tree)
Evaluator (reads the abstract syntax and updates the environment)
The abstract-syntax tree is really the key. If you have this, when you encounter the import/include construct in the input, you just make a recursive call and get more abstract syntax back. You can do this in the parser or the evaluator. If you want conditional import, you have to do it in the evaluator, since only the evaluator can compute a condition.
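To make the layering concrete, here is a minimal sketch of an interpreter for a made-up toy language (one statement per line: `set NAME INT`, `include FILE`, or `if NAME <statement>`). The include is handled in the evaluator, so it can be conditional. Everything here is an assumption for illustration: the language, the names `Stmt`, `Env`, `Files`, `tokenize`, `parse`, `eval` and `run`, and the in-memory map that stands in for the file system. It is not taken from the interpreters mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Abstract syntax for the toy language.
struct Stmt {
    enum Kind { Set, Include, If } kind = Set;
    std::string name;            // variable (Set, If) or file name (Include)
    int value = 0;               // Set only
    std::unique_ptr<Stmt> body;  // If only: the statement guarded by the condition
};

using Env = std::map<std::string, int>;            // the environment ("symbol table")
using Files = std::map<std::string, std::string>;  // stand-in for the file system

// Tokenizer: split one line into whitespace-separated tokens.
std::vector<std::string> tokenize(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> tokens;
    for (std::string t; in >> t;) tokens.push_back(t);
    return tokens;
}

// Parser: build one statement from the tokens, starting at index i.
std::unique_ptr<Stmt> parse(const std::vector<std::string>& t, std::size_t i = 0) {
    if (i + 1 >= t.size()) throw std::runtime_error("incomplete statement");
    auto s = std::make_unique<Stmt>();
    s->name = t[i + 1];
    if (t[i] == "set") {
        if (i + 2 >= t.size()) throw std::runtime_error("set needs a value");
        s->kind = Stmt::Set;
        s->value = std::stoi(t[i + 2]);
    } else if (t[i] == "include") {
        s->kind = Stmt::Include;
    } else if (t[i] == "if") {
        s->kind = Stmt::If;
        s->body = parse(t, i + 2);  // the guarded statement follows the variable
    } else {
        throw std::runtime_error("unknown statement: " + t[i]);
    }
    return s;
}

void run(const std::string& source, Env& env, const Files& files);

// Evaluator: walk the abstract syntax and update the environment.
void eval(const Stmt& s, Env& env, const Files& files) {
    switch (s.kind) {
    case Stmt::Set:
        env[s.name] = s.value;
        break;
    case Stmt::If: {
        // Only the evaluator knows the variable's value, so the guarded
        // statement (possibly an include) is decided here, at run time.
        auto it = env.find(s.name);
        if (it != env.end() && it->second != 0) eval(*s.body, env, files);
        break;
    }
    case Stmt::Include: {
        auto it = files.find(s.name);
        if (it == files.end()) throw std::runtime_error("no such file: " + s.name);
        run(it->second, env, files);  // recursive call on the included source
        break;
    }
    }
}

// Driver: tokenize, parse and evaluate a source text line by line.
void run(const std::string& source, Env& env, const Files& files) {
    std::istringstream in(source);
    for (std::string line; std::getline(in, line);) {
        auto tokens = tokenize(line);
        if (tokens.empty()) continue;  // skip blank lines
        eval(*parse(tokens), env, files);
    }
}
```

Running a source such as `set debug 1` followed by `if debug include dbg` pulls in the file `dbg` only because `debug` is non-zero when that line is evaluated. A guarded include whose variable is unset is never looked up at all. The sketch has no protection against a file that includes itself; a real interpreter would track the files currently being loaded.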
Source code for my interpreters is on the web. Two of them are written in C; the others are written in Standard ML.