Parse string with "read" and ignore package namespaces - list

I am writing a program that opens a lisp file, calls "read" on the stream until the stream is empty, and does things with the lists it collects.
This was working quite nicely until I discovered that "read" performs package lookup: for instance, if it encounters some-package:foo, it complains that package SOME-PACKAGE does not exist.
Here is an example showing what I mean:
(read (make-string-input-stream "(list 'foo :foo some-package:foo)"))
So now I would like one of three things:
Make "read" ignore package namespaces, so I can convert arbitrary source files to lists of symbols.
Use some other parsing library that behaves like "read" but only produces plain symbols, either by mangling the colon or by ignoring the colon and everything before it.
Pre-process the file with a regex or similar to find package-qualified names and replace them with plain names, for example converting "some-package:foo" to simply "foo".
The purpose of all this was to build a function-call dependency graph. I'm aware that much higher-quality tools of this kind already exist, but I wanted to write it myself for fun and learning. However, I've hit a snag with this problem and don't know how to proceed.

For your use case, you could handle the package-error condition by creating the required package and restarting. That would also preserve the symbol identities. Note that you need to handle in-package forms when you encounter them.

The simplest answer is to tell the Lisp reader to read colon #\: as is:
(defun read-standalone-char (stream char)
  (declare (ignore stream))
  char)

(defun make-no-package-prefix-readtable (&optional (rt (copy-readtable)))
  "Return a readtable for reading while ignoring package prefixes."
  (set-syntax-from-char #\: #\Space rt)
  (set-macro-character #\: #'read-standalone-char nil rt)
  rt)

(let ((*readtable* (make-no-package-prefix-readtable)))
  (read-from-string "(list 'foo :foo some-package:foo)"))
==> (LIST 'FOO #\: FOO SOME-PACKAGE #\: FOO) ; 33
The obvious problem is that this will read FOO:BAR and FOO :BAR identically, but you might be able to work around that.

Related

Clojure pipe collection one by one

How can I process collections in Clojure the way Java streams do: element by element through all the functions, instead of evaluating every element at each stage? You could also describe it as Unix pipes (the next program pulls chunk by chunk from the previous one).
As far as I understand your question, you may want to look into two things.
First, understand the sequence abstraction. This is a way of looking at collections which consumes them one by one and lazily. It is an important Clojure idiom and you'll meet well known functions like map, filter, reduce, and many more. Also the macro ->>, which was already mentioned in a comment, will be important.
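A small sketch of the sequence abstraction with ->>: every stage is lazy, so even an infinite (range) is only consumed as far as take asks for. The particular pipeline (squares, then even numbers) is just an illustrative choice.

```clojure
;; Each stage is lazy: (range) is infinite, yet only as many elements
;; as `take` asks for are ever realized.
(def first-five-even-squares
  (->> (range)            ; 0 1 2 3 ...
       (map #(* % %))     ; 0 1 4 9 ...
       (filter even?)     ; 0 4 16 36 ...
       (take 5)))         ; stop after five results

(println first-five-even-squares)
;; prints (0 4 16 36 64)
```

One caveat when thinking in "one element at a time" terms: many sources, such as vectors and (range n), produce chunked sequences that are realized 32 elements at a time, so side effects inside map or filter may run ahead of what is actually consumed.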
After that, when you want to dig deeper, you probably want to look into transducers and reducers. To grossly oversimplify, they allow you to combine several lazy functions into one function and then process a collection with less laziness, less memory consumption, more performance, and possibly on several threads. I consider these to be advanced topics, though. Maybe sequences are already what you were looking for.
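For comparison, a sketch of a similar pipeline written as a transducer (transducers exist since Clojure 1.7). comp composes the steps into one transformation, read left to right like ->>, and into or transduce then applies it eagerly without building intermediate lazy sequences between the stages. Reducers are not shown here.

```clojure
;; The same steps as a single composed transformation.
(def xf
  (comp (map #(* % %))
        (filter even?)
        (take 5)))          ; take stops the reduction early, so (range) is fine

;; Eagerly pour the results into a vector.
(def as-vector (into [] xf (range)))

;; Or reduce straight to a single value without collecting anything.
(def total (transduce xf + (range)))

(println as-vector total)
;; prints [0 4 16 36 64] 120
```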
Here is a simple example from ClojureDocs.org
;; Use of `->` (the "thread-first" macro) can help make code
;; more readable by removing nesting. It can be especially
;; useful when using host methods:
;; Arguably a bit cumbersome to read:
user=> (first (.split (.replace (.toUpperCase "a b c d") "A" "X") " "))
"X"
;; Perhaps easier to read:
user=> (-> "a b c d"
           .toUpperCase
           (.replace "A" "X")
           (.split " ")
           first)
"X"
As always, don't forget the Clojure CheatSheet or Clojure for the Brave and True.

Use of ^ in clojure function parameter definition

(defn lines
  "Given an open reader, return a lazy sequence of lines"
  [^java.io.BufferedReader reader]
  (take-while identity (repeatedly #(.readLine reader))))
what does this line mean? -> [^java.io.BufferedReader reader]
Also, I know this is a basic question. Can you show me the documentation where I could read about this myself, so that I don't have to ask here? :)
You can find documentation here:
https://clojure.org/reference/java_interop#typehints
Clojure supports the use of type hints to assist the compiler in avoiding reflection in performance-critical areas of code. Normally, one should avoid the use of type hints until there is a known performance bottleneck. Type hints are metadata tags placed on symbols or expressions that are consumed by the compiler. They can be placed on function parameters, let-bound names, var names (when defined), and expressions:
(defn len [x]
  (.length x))
(defn len2 [^String x]
  (.length x))
...
Once a type hint has been placed on an identifier or expression, the compiler will try to resolve any calls to methods thereupon at compile time. In addition, the compiler will track the use of any return values and infer types for their use and so on, so very few hints are needed to get a fully compile-time resolved series of calls.
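A sketch tying the documentation to the lines function from the question. The ^java.io.BufferedReader hint is ordinary reader metadata ({:tag ...}) that lets the compiler resolve .readLine at compile time; turning on *warn-on-reflection* makes any remaining reflective calls visible. The in-memory reader built from java.io.StringReader is only a stand-in for a real file.

```clojure
;; Print a warning whenever the compiler has to fall back to reflection.
(set! *warn-on-reflection* true)

;; A type hint is just metadata: ^String is shorthand for ^{:tag String}.
(def hinted-tag (:tag (meta '^String x)))   ; => the symbol String

;; Without the hint, .readLine could only be resolved by reflection
;; (and the compiler would warn about it).
(defn lines
  "Given an open reader, return a lazy sequence of lines"
  [^java.io.BufferedReader reader]
  (take-while identity (repeatedly #(.readLine reader))))

(def result
  (with-open [r (java.io.BufferedReader. (java.io.StringReader. "a\nb\nc"))]
    (doall (lines r))))   ; doall: realize the lazy seq before with-open closes r

(println result)
;; prints (a b c)
```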
You should also check out:
https://clojure.org/guides/weird_characters
https://clojure.org/reference/reader
And never, ever fail to keep open a browser tab to The Clojure CheatSheet
You may also wish to review this answer.

Toggle case sensitivity for a huge chunk of Clojure code

There is a large chunk of code (mostly not mine) that does the following with user input (which is, more or less, a space-separated list of commands with some arguments/options):
Remove all unsupported characters
Split on space into a vector
Recursively apply the first item in the vector to the rest of the vector (the function uses whatever arguments it needs and returns the vector, minus itself and its arguments, to the loop).
Functions themselves, as far as input is concerned, have a mix of (case), (cond), (condp), (=) and (compare) with some nasty (keyword) comparisons mixed in.
Everyone was fine with the fact that this all is strictly case-sensitive until very recently. Now some (previously unknown) ancient integration bits acting as users appeared and are having some casing issues that I have no control over.
Question: is there a viable way (a shortcut until there is time to redo it all) to make string comparison case-insensitive within some scope, controlled by a variable?
I considered 3 options:
Fixing the code (will be done sometime, anyway, but not viable at the moment).
Extracting some low level comparison function (hopefully just one) and rebinding it for the local scope (sounds great, but catching cases might be difficult and error-prone).
Standardize input (might not be possible without some hacks since some data, outside comparisons, NEEDS to be case sensitive).
After some research, the answer is probably no (and planning for major changes should start), but I figured asking would not hurt, maybe someone thought of it before.
Edit: sample problematic input:
"Command1 ARG1 aRG2 Command3 command9 Arg4 Arg9 aRg5 COMMAND4 arg8"
Breaking it down:
"Commands" with broken case need to be matched case-insensitively, on demand. Arguments are matched case-insensitively at another level, so they do not concern this piece of code, but their case must be preserved here so they can be passed further along.
NB! It is not possible at the start of the processing to tell what is in the input a command and what is argument.
For what it's worth, here is a case-insensitive wrapper for simple case forms:
(ns lexer.core
  (:require [clojure.string]))

;; Public (defn, not defn-): the macro expansion calls it from the caller's namespace.
(defn standardize [thing]
  (assert (string? thing) (str thing " should be a string"))
  (clojure.string/lower-case thing))

(defmacro case-insensitive-case [expr & pairs+default?]
  (let [pairs (partition 2 pairs+default?)
        convert (fn [[const form]]
                  (list (standardize const) form))
        most-of-it `(case (standardize ~expr) ~@(mapcat convert pairs))]
    (if (-> pairs+default? count even?)
      most-of-it
      (concat most-of-it [(last pairs+default?)]))))
For example,
(macroexpand-1 '(case-insensitive-case (test expression)
"Blam!" (1 + 1)
(whatever works)))
=> (clojure.core/case (lexer.core/standardize (test expression)) "blam!" (1 + 1) (whatever works))
The assert in standardize is necessary because lower-case turns things into strings:
(clojure.string/lower-case 22)
=> "22"
As per Alan Thompson's comment, str/lower-case was the right first half of the approach; I just needed to find the right place to apply it to the command name only.
Afterwards, redefining = and a couple of functions used inside cond and condp (credit to ClojureMostly) solved the matching part.
All that was left were the string literals inside case statements which I just find-and-replaced with lower case.
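A minimal sketch of scoping case-insensitive command matching with a dynamic var, in the spirit of option 2 from the question. Everything here is hypothetical: the var *ignore-command-case?*, the helper normalize-command, and the "echo" and "skip" commands are illustrations, not the asker's code. The case literals are lower-case, matching the find-and-replace described above.

```clojure
(require '[clojure.string :as str])

;; Hypothetical switch: false keeps the original case-sensitive behavior.
(def ^:dynamic *ignore-command-case?* false)

(defn normalize-command
  "Lower-case a command name only while *ignore-command-case?* is true."
  [s]
  (if *ignore-command-case?* (str/lower-case s) s))

;; Hypothetical command loop: each command consumes the arguments it needs
;; and the loop continues with the rest. Arguments keep their original case.
(defn run-commands [tokens]
  (loop [tokens tokens, out []]
    (if-let [[cmd & more] (seq tokens)]
      (case (normalize-command cmd)
        "echo" (recur (rest more) (conj out (first more))) ; takes one argument
        "skip" (recur more out)                            ; takes no arguments
        (recur more (conj out [:unknown cmd])))
      out)))

(def strict (run-commands ["echo" "Arg1" "ECHO" "aRG2"]))
;; => ["Arg1" [:unknown "ECHO"] [:unknown "aRG2"]]

(def relaxed
  (binding [*ignore-command-case?* true]
    (run-commands ["echo" "Arg1" "ECHO" "aRG2"])))
;; => ["Arg1" "aRG2"]
```

Note that binding is thread-local and only covers its dynamic extent, so a lazy sequence realized after the binding form returns will not see the switch; run-commands avoids that by being eager.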

How do I get core clojure functions to work with my defrecords

I have a defrecord called a bag. It behaves like a mapping from items to their counts. This is sometimes called a frequency or a census. I want to be able to do the following:
(def b (bag/create [:k 1 :k2 3]))
(keys b)
=> (:k :k2)
I tried the following:
(defrecord MapBag [state]
  Bag
  (put-n [self item n]
    (let [new-n (+ n (count self item))]
      (MapBag. (assoc state item new-n))))
  ;... some stuff
  java.util.Map
  (getKeys [self] (keys state)) ;TODO TEST
  Object
  (toString [self]
    (str "Bag: " (:state self))))
When I try to require it in a repl I get:
java.lang.ClassFormatError: Duplicate interface name in class file compile__stub/techne/bag/MapBag (bag.clj:12)
What is going on? How do I get a keys function on my bag? Also am I going about this the correct way by assuming clojure's keys function eventually calls getKeys on the map that is its argument?
defrecord automatically makes any record it defines implement the IPersistentMap interface, so you can call keys on it without doing anything.
So you can define a record, and instantiate and call keys like this:
user> (defrecord rec [k1 k2])
user.rec
user> (def a-rec (rec. 1 2))
#'user/a-rec
user> (keys a-rec)
(:k1 :k2)
Your error message indicates that one of your declarations is duplicating an interface that defrecord gives you for free. I think it might actually be both.
Is there some reason why you can't just use a plain vanilla map for your purposes? In Clojure, you often want to use plain vanilla data structures when you can.
Edit: if for whatever reason you don't want IPersistentMap included, look into deftype.
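A minimal sketch of the plain-map suggestion, with a hypothetical put-n function modeled on the question's protocol method. It also shows why a record does not produce the (keys b) result the question wants: keys on a record returns the record's field names, not the items stored in its state.

```clojure
;; keys on a record returns its field names, not the bag's items.
(defrecord MapBag [state])
(def record-keys (keys (->MapBag {:k 1 :k2 3})))   ; => (:state)

;; Hypothetical plain-map bag: item -> count.
(defn put-n
  "Add n occurrences of item to bag."
  [bag item n]
  (update bag item (fnil + 0) n))   ; missing items start at 0

(def b (-> {} (put-n :k 1) (put-n :k2 3)))

(keys b)                          ; the items themselves
(put-n b :k 2)                    ; => {:k 3, :k2 3}
(frequencies [:k :k2 :k2 :k2])    ; core can also build such a map directly
```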
Rob's answer is of course correct; I'm posting this one in response to the OP's comment on it -- perhaps it might be helpful in implementing the required functionality with deftype.
I have once written an implementation of a "default map" for Clojure, which acts just like a regular map except it returns a fixed default value when asked about a key not present inside it. The code is in this Gist.
I'm not sure if it will suit your use case directly, although you can use it to do things like
user> (:earth (assoc (DefaultMap. 0 {}) :earth 8000000000))
8000000000
user> (:mars (assoc (DefaultMap. 0 {}) :earth 8000000000))
0
More importantly, it should give you an idea of what's involved in writing this sort of thing with deftype.
Then again, it's based on clojure.core/emit-defrecord, so you might look at that part of Clojure's sources instead... It's doing a lot of things which you won't have to (because it's a function for preparing macro expansions -- there's lots of syntax-quoting and the like inside it which you have to strip away from it to use the code directly), but it is certainly the highest quality source of information possible. Here's a direct link to that point in the source for the 1.2.0 release of Clojure.
Update:
One more thing I realised might be important. If you rely on a special map-like type for implementing this sort of thing, the client might merge it into a regular map and lose the "defaulting" functionality (and indeed any other special functionality) in the process. As long as the "map-likeness" illusion maintained by your type is complete enough for it to be used as a regular map, passed to Clojure's standard functions, etc., I think there might not be a way around that.
So, at some level the client will probably have to know that there's some "magic" involved; if they get correct answers to queries like (:mars {...}) (with no :mars in the {...}), they'll have to remember not to merge this into a regular map (merge-ing the other way around would work fine).
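A heavily trimmed sketch of a deftype-based default map, written for illustration; it is not the Gist mentioned above. It implements only ILookup, Associative, IPersistentCollection and Seqable, which is enough for keyword lookup, assoc, conj and merge. A usable type would also need IPersistentMap, IFn, java.util.Map, real equality and hashing, printing, and so on. The last lines show the merging caveat.

```clojure
;; Sketch only: lookups of missing keys fall back to `default-val`.
(deftype DefaultMap [default-val m]
  clojure.lang.ILookup
  (valAt [_ k] (get m k default-val))
  (valAt [_ k not-found] (get m k not-found))

  clojure.lang.Associative
  (containsKey [_ k] (contains? m k))
  (entryAt [_ k] (find m k))
  (assoc [_ k v] (DefaultMap. default-val (assoc m k v)))

  clojure.lang.IPersistentCollection
  (count [_] (count m))
  (cons [_ o] (DefaultMap. default-val (conj m o)))
  (empty [_] (DefaultMap. default-val {}))
  ;; Identity-only equality, to keep the sketch short.
  (equiv [this o] (identical? this o))

  clojure.lang.Seqable
  (seq [_] (seq m)))

(def dm (assoc (DefaultMap. 0 {}) :earth 8000000000))

(:earth dm)                      ; => 8000000000
(:mars dm)                       ; => 0
(:mars (merge {} dm))            ; => nil, the default is lost
(:mars (merge dm {:venus 1}))    ; => 0, merging into dm keeps it
```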

How to rename an operation in Clojure?

In my list, the addition operation + appears as #. How can I make it appear exactly as +? When I eval it, it should also work exactly the same as +.
I guess this would also apply in all kinds of functions in Clojure...
Thanks guys.
The # character is simply not a valid character in symbol names in Clojure (see this page for a list of valid characters), and while it might work in some places (as it often will), it is not good practice to use it. It will definitely not work at the beginning of a symbol literal (you could still call (symbol "#"), though there's probably no point in that). As the Clojure reader currently stands, there's nothing to be done about it, except possibly hacking the reader open so that it treats "# " (that's '#' followed by a space) as the symbol # (or simply as +). That's something you really shouldn't do, though, so I almost feel guilty for providing a link to instructions on how to do it.
Should you want to alias a name to some other name which is legal in Clojure, you may find it convenient to use the clojure.contrib.def/defalias macro instead of plain def; this has the added benefit of setting metadata for you (and should handle macros, though it appears to have a bug which prevents that at this time, at least in 1.2 HEAD).
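If clojure.contrib.def/defalias is not available (it belongs to the old monolithic clojure.contrib), a plain def plus a metadata copy covers part of what the answer says defalias does. This is a sketch: the alias name plus is hypothetical, only :doc and :arglists are copied, and nothing special is done for macros.

```clojure
;; Hypothetical alias name; `plus` is an ordinary, legal symbol.
(def plus +)

;; def copies only the value, so copy the docstring and arglists by hand.
(alter-meta! #'plus merge (select-keys (meta #'+) [:doc :arglists]))

(plus 1 2 3)   ; => 6
```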
And in case you'd like to redefine some built-in names when creating your aliases... (If you don't, the rest of this may not be relevant to you.)
Firstly, if you work with Clojure 1.1 or earlier and you want to provide your own binding for a name from clojure.core, you'll need to use :refer-clojure when defining your namespace. E.g. if you want to provide your own +:
(ns foo.bar
  (:refer-clojure :exclude [+]))

;; you can now define your own +
(defn + [x y]
  (if (zero? y)
    x
    (recur (inc x) (dec y))))
;; examples
(+ 3 5)
; => 8
(+ 3 -1)
; => infinite loop
(clojure.core/+ 3 -1)
; => 2
The need for this results from Clojure 1.1 prohibiting rebinding of names which refer to Vars in other namespaces; ns-unmap provides a way around it appropriate for REPL use, whereas (:refer-clojure :exclude ...), (:use :exclude ...) and (:use :only ...) provide the means systematically to prevent unwanted names from being imported from other namespaces in the first place.
In current 1.2 snapshots there's a "last-var-in wins" policy, so you could do without the :refer-clojure; it still generates a compiler warning, though, and it's better style to use :refer-clojure, plus there's no guarantee that this policy will survive in the actual 1.2 release.
An operation is just a piece of code, assigned to a variable. If you want to rebind an operation, you just rebind that variable:
(def - +)
(- 1 2 3)
; => 6
The only problem here is that the # character is special in Clojure. I'm not sure whether you can use # as a variable name at all; at the very least you will need to quote it when binding it and probably also when calling it:
(def # +)
; => java.lang.Exception: No dispatch macro for:
Unfortunately, I'm not familiar enough with Clojure to know how to quote it.