How does Clojure find sources in the classpath? - clojure

Libraries in Clojure are packaged in jars, the same as in Java.
But these jars contain source code to be compiled, so, in theory, Clojure would need to check every file inside every jar on the classpath to see if it's a Clojure source file (which sounds inefficient), especially if the code needs to be AOT compiled.
Does Clojure actually do this or does it have some heuristic to find out which jars contain .clj files?

To access Clojure code, first you have to require the namespace. When require is called it will derive the source file name from the namespace name.
From the Clojure documentation of require:
A lib's name also locates its root directory within classpath using Java's package name to classpath-relative path mapping.
[...]
The root resource path is derived from the lib name in the following manner: Consider a lib named by the symbol 'x.y.z; it has the root directory <classpath>/x/y/, and its root resource is <classpath>/x/y/z.clj, or <classpath>/x/y/z.cljc if <classpath>/x/y/z.clj does not exist.
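As a rough illustration of the lookup described in the docstring, the following sketch derives the candidate resource names from a lib symbol and asks the classloader for them directly, which is why no scan of the jars is needed. The helper names lib->candidates and locate-lib are made up for this example; require itself also considers a compiled __init.class and does more bookkeeping than shown here.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn lib->candidates
  "Hypothetical helper: turns a lib symbol such as 'x.y.z into the
  classpath-relative resource names that would be tried, in order."
  [lib]
  (let [base (-> (name lib)
                 (str/replace "-" "_")   ; hyphens become underscores on disk
                 (str/replace "." "/"))] ; dots become directory separators
    [(str base ".clj") (str base ".cljc")]))

(defn locate-lib
  "Hypothetical helper: returns the URL of the first candidate found on the
  classpath, or nil. The classloader is asked for an exact name; nothing is
  scanned."
  [lib]
  (some io/resource (lib->candidates lib)))

(lib->candidates 'x.y.z)
;=> ["x/y/z.clj" "x/y/z.cljc"]
```

Calling (locate-lib 'clojure.string) at a REPL should return a URL pointing into whichever jar provides clojure/string.clj.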

Related

Finding out the name to `:require` in a namespace

I am following this tutorial: https://practicalli.github.io/blog/posts/web-scraping-with-clojure-hacking-hacker-news/ and I have had a hard time dealing with the :require part of the ns macro. This tutorial shows how to parse HTML and pull out information from it with a library called enlive, and to use it, I first had to put
...
:dependencies [[org.clojure/clojure "1.10.1"]
               [enlive "1.1.6"]]
...
in my project.clj, and require the library in core.clj as follows:
(ns myproject.core
  (:require [net.cgrand.enlive-html :as html])
  (:gen-class))
I spent so much time finding out the name net.cgrand.enlive-html, since it was different from the package's name itself (which is just enlive), and I couldn't find it through any of the lein commands (I eventually found it out by googling).
How can I easily find out what name to require?
Practical approach
If your editor/IDE helps with auto-completion and
docs, that might be a first route.
Other than that, libraries usually have some read-me online, where they show off
what they do (what to require, how to use that).
Strict approach
If you really have no information about a library, you will find the downloaded
library in your ~/.m2/repository directory. The path is built from the group,
then the artifact, then the version. Note that deps that do not follow the
"group/artifact" naming convention simply use the artifact name twice. So you
can find your library's JAR file here:
~/.m2/repository/enlive/enlive/1.1.6/enlive-1.1.6.jar.
JAR files are just ZIP files. Inside the JAR file you will usually find the
source files of the library. The directory structure reflects the package
structure. E.g. one file there is net/cgrand/enlive_html.clj (note the use
of _ instead of -; this is due to name munging for the JVM). You can then
require the namespace in your REPL and explore it with doc, dir, etc. Or you
can open this file to see the docs and source code in one place.
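The "look inside the JAR" step can also be done from a REPL. The sketch below lists the .clj/.cljc entries of a JAR and turns each path back into a namespace name. The function names are invented for this example, and mapping every _ back to - is only a heuristic, since a namespace may legitimately contain underscores.

```clojure
(require '[clojure.string :as str])
(import '(java.util.jar JarFile))

(defn entry->ns-name
  "Hypothetical helper: net/cgrand/enlive_html.clj -> net.cgrand.enlive-html.
  Assumes underscores in the path came from hyphens in the namespace."
  [entry-name]
  (-> entry-name
      (str/replace #"\.cljc?$" "") ; drop the file extension
      (str/replace "/" ".")        ; directories become namespace segments
      (str/replace "_" "-")))      ; undo the hyphen-to-underscore munging

(defn jar-namespaces
  "Hypothetical helper: returns a sorted vector of namespace names guessed
  from the Clojure source files inside the JAR at jar-path."
  [jar-path]
  (with-open [jar (JarFile. ^String (str jar-path))]
    (->> (enumeration-seq (.entries jar))
         (map #(.getName %))
         (filter #(re-find #"\.cljc?$" %))
         (map entry->ns-name)
         sort
         vec)))

(entry->ns-name "net/cgrand/enlive_html.clj")
;=> "net.cgrand.enlive-html"
```

With the enlive JAR from above, (jar-namespaces (str (System/getProperty "user.home") "/.m2/repository/enlive/enlive/1.1.6/enlive-1.1.6.jar")) should include net.cgrand.enlive-html among its results, provided that file exists on your machine.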
Usually I get this from the documentation / tutorial for the library.
https://github.com/cgrand/enlive Check out the Quick Tutorial, which starts with the needed require.

Loading OCaml modules not in the current directory

I'm writing a large OCaml project. I wrote a file foo.ml, which works perfectly. In a subdirectory of foo.ml's directory, there is a file bar.ml.
bar.ml references code in foo.ml, so its opening line is:
open Foo
This gives me an error at compile time:
Unbound module Foo.
What can I do to fix this without changing the location of foo.ml?
The easy path is to use one of OCaml's build systems, such as ocamlbuild or oasis. Another option would be jbuilder, but jbuilder is quite opinionated about file organization and does not allow for the kind of subdirectory structure that you are asking for.
The more explicit path comes with a warning: the OCaml build process is complicated, with many moving parts that can be hard to deal with.
With that customary warning out of the way: when looking for a module, the OCaml compiler first looks in the current compilation environment, then looks for compiled interface (.cmi) files in the directories specified by -I options (plus the current directory and the standard library directory).
Thus, in order to compile your bar.ml file, you will need to add the parent directory to the list of included directories with the -I .. option. Note that foo.ml has to be compiled first, so that foo.cmi exists in that directory.
After all this, you will discover that during the linking phase, all object files (i.e. .cmo or .cmx) need to be listed in a topological order compatible with the dependency graph of your project.
Consequently, let me repeat my advice: use a proper build system.

Where should the file user.clj go?

I am trying to set up the proto-repl package for the Atom editor, and apparently it needs a file user.clj to exist somewhere, which I guess is some kind of Leiningen init file.
Where should I create this file?
Clojure will load the file user.clj from your classpath if it is found. In a default Leiningen project, src/ is on the classpath, so if you create src/user.clj, the contents of that file will be loaded in the context of the user namespace.
user is the default namespace for the Clojure REPL, but some Leiningen projects override this. In order to access definitions in user.clj, you will need to either pull user into scope (using require or use) or make sure that user is your starting namespace.
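A minimal sketch of what src/user.clj could contain, assuming the default Leiningen layout described above. Both function names are placeholders for this example and are not something Proto REPL requires.

```clojure
;; src/user.clj (hypothetical contents)
(ns user
  (:require [clojure.java.io :as io]))

(defn user-clj-location
  "Returns the URL of the first user.clj found on the classpath, or nil
  if there is none. Handy for checking which file was actually picked up."
  []
  (io/resource "user.clj"))

(defn hello
  "Placeholder definition to confirm that this file was loaded."
  []
  "user.clj was loaded")
```

If the REPL starts in another namespace, (require 'user) followed by (user/hello) should confirm that the file is reachable.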
See the Proto REPL demo project https://github.com/jasongilman/proto-repl-demo/blob/master/dev/user.clj for an example of how to set up user.clj. You should also add a dependency on clojure.tools.namespace in the project.clj: https://github.com/jasongilman/proto-repl-demo/blob/master/project.clj
I just pushed some changes to Proto REPL last night to improve this area, but you'll still benefit from having one set up.
According to the proto-repl page, it might use some functions from the user namespace when reloading code in the REPL (the reset function), but it shouldn't be required.
You might want to take a look at the proto-repl demo project to see the more advanced setup.

How do you specify the classpath with Leiningen?

In Clojure, I have a Leiningen project with my source in
/src/project/core.clj
I want to add a subdirectory to this. Eg.
/src/project/examples/example-one.clj
In my core.clj file I try to pull in from
project.examples.example-one
But lein compile still tells me
Could not locate project/examples/example_one__init.class or project/examples/example_one.clj on classpath:
Do you have to explicitly update project.clj file if you add a subdirectory to your main code directory? (I don't see that the main code directory itself is given there explicitly.)
If your namespace contains dashes, the corresponding file name should contain underscores instead of those dashes. You can read about the reason here:
Why does Clojure convert dashes in names to underscores in the filesystem?
Unless you add sources in other languages such as Java or Groovy, lein will by default include all the namespaces in the src folder, so you do not need to update project.clj when you add a subdirectory.
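To check a file name against its namespace, the built-in munge function can be used. This is only a sketch for ordinary names made of letters, digits, dots and hyphens: munge also rewrites other special characters, which the file lookup may not do in the same way. The helper names are invented for this example.

```clojure
(require '[clojure.string :as str])

(defn expected-source-file
  "Hypothetical helper: the path, relative to src/, where the source for
  ns-sym is expected. Assumes a plain name (letters, digits, dots, hyphens)."
  [ns-sym]
  (str (str/replace (munge (name ns-sym)) "." "/") ".clj"))

(defn file-matches-ns?
  "Hypothetical helper: true when relative-path is where ns-sym is expected."
  [ns-sym relative-path]
  (= relative-path (expected-source-file ns-sym)))

(expected-source-file 'project.examples.example-one)
;=> "project/examples/example_one.clj"

(file-matches-ns? 'project.examples.example-one
                  "project/examples/example-one.clj")
;=> false
```

Here the fix is to rename the file to example_one.clj while keeping the hyphen in the ns declaration.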
Ah ... seems I can't have a hyphen in file name?
Kind of weird for a Lisp dialect, now I've got used to using hyphens as the default separator in my function names.

The path in protobuf

I don't quite understand how paths work in protobuf. My file layout looks like this:
Top
  A
    a.proto
  B
  C
    c.proto // import "A/a.proto";
I have written an RPC system based on protobuf, and I need to generate two kinds of files (client and server code) from c.proto. The client code should be placed in B and the server code should stay in C.
I can't come up with a correct command.
Top> protoc -I=. --client_out=./B/ C/c.proto generates the client code in B/C, and the #include in the generated code has the wrong path.
Top/C> protoc -I=../ -I=./ --client_out=./ ./c.proto leads to a protobuf_AddDesc_* error.
For every .proto file, protoc tries to determine the file's "canonical name" -- a name which distinguishes it from any other .proto file that may ever find its way into your system. In fact, ideally, the canonical name is different from every other .proto file in the world. The canonical name is the name you use when you import the .proto file from another .proto file. It is also used to decide where to output the generated files and what #includes to generate.
For .proto files specified on the command line, protoc determines the canonical name by trying to figure out what name you would use to import that file. So, it goes through the import paths (specified with -I) and looks for one that is a prefix of the file name. It then removes that prefix to determine the canonical name.
In your case, if you specify -I=. C/c.proto, then the canonical name is C/c.proto. If you specified -I=C C/c.proto, the canonical name would then simply be c.proto.
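The prefix-stripping rule just described can be sketched in a few lines. This is an illustration of the idea only, not protoc's actual code: it works purely on strings, does not resolve .. segments, and the function names are made up for this example.

```clojure
(require '[clojure.string :as str])

(defn normalize-path
  "Hypothetical helper: drops trailing slashes and a leading ./ and
  represents the current directory as the empty string."
  [p]
  (let [s (-> p
              (str/replace #"/+$" "")
              (str/replace #"^\./" ""))]
    (if (= s ".") "" s)))

(defn canonical-name
  "Hypothetical helper: returns file with the first matching import path
  removed from its front, or nil if no import path is a prefix of it."
  [import-paths file]
  (let [file (normalize-path file)]
    (some (fn [dir]
            (let [dir (normalize-path dir)]
              (cond
                (= dir "") file
                (str/starts-with? file (str dir "/"))
                (subs file (inc (count dir))))))
          import-paths)))

(canonical-name ["."] "C/c.proto") ;=> "C/c.proto"
(canonical-name ["C"] "C/c.proto") ;=> "c.proto"
```

The two calls mirror the -I=. and -I=C cases from the paragraph above.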
It is important that any file which attempts to import your .proto file imports it using exactly the canonical name determined when the file itself was compiled. Otherwise, you get the linker error regarding AddDesc.
In general, everything works well if you designate some directory to be the "root" of your source tree, and all of your code lives in a subdirectory of that with a unique name designating your project. Your "root" directory should be the directory you pass to both -I and --client_out. Alternatively, you can have separate directories for source files vs. generated files, but the generated files directory should have an internal structure that mirrors your source directory. You can then specify the generated files directory to --client_out, and when you run the C++ compiler, specify both the source and generated files directories in the include path.
If you have some other setup -- e.g. one where the .proto files live at a different canonical path from the .pb.h files -- then unfortunately you will have some trouble making protoc do what you want. Though, given that you are writing a custom code generator, you could invent whatever rules you want for the way its output files are organized, but straying from the rules the standard code generator follows might lead to lots of little pitfalls.