How does this code translate a number to English? - clojure

I'm new to Clojure and found the following piece of code
user=> (def to-english (partial clojure.pprint/cl-format nil
"~@(~@[~R~]~^ ~A.~)"))
#'user/to-english
user=> (to-english 1234567890)
"One billion, two hundred thirty-four million, five hundred sixty-seven
thousand, eight hundred ninety"
at https://clojuredocs.org/clojure.core/partial#example-542692cdc026201cdc326ceb. I know what partial does, and I checked the clojure.pprint/cl-format doc, but I still don't understand how it translates an integer to English words. I guess the secret is hidden in "~@(~@[~R~]~^ ~A.~)", but I couldn't find a clue for how to read it.
Any help will be appreciated!

The doc mentions it, but one good resource is A Few FORMAT Recipes from Seibel's Practical Common Lisp.
Also, check §22.3 Formatted Output from the HyperSpec.
In Common Lisp:
CL-USER> (format t "~R" 10)
ten
~@(...~^...~) is case conversion, where the @ modifier means to capitalize (upcase only the first word). It contains an escape upward operation ~^, which in this context marks the end of what is case-converted. It also exits the current context when there are no more arguments available.
~@[...~] is conditional format: the inner format is applied to a value only if it is non-nil.
The final ~A means that the function should be able to accept one more argument and print it.
In fact, your example looks like the one in §22.3.9.2:
If ~^ appears within a ~[ or ~( construct, then all the commands up to
the ~^ are properly selected or case-converted, the ~[ or ~(
processing is terminated, and the outward search continues for a ~{ or
~< construct to be terminated. For example:
(setq tellstr "~@(~@[~R~]~^ ~A!~)")
=> "~@(~@[~R~]~^ ~A!~)"
(format nil tellstr 23) => "Twenty-three!"
(format nil tellstr nil "losers") => " Losers!"
(format nil tellstr 23 "losers") => "Twenty-three losers!"
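The same directives can be tried one at a time at a Clojure REPL to see what each contributes. This is only a sketch: the results shown in comments follow from the question's transcript and from the basic behaviour of ~R, and the two-argument call is deliberately left without a stated result.

```clojure
(require '[clojure.pprint :refer [cl-format]])

;; ~R alone prints an integer as a cardinal English number.
(cl-format nil "~R" 10)            ;=> "ten"

;; Wrapping it in ~@( ... ~) capitalizes the first word.
(cl-format nil "~@(~R~)" 23)       ;=> "Twenty-three"

;; The full format string from the question.
(def to-english (partial cl-format nil "~@(~@[~R~]~^ ~A.~)"))

;; One argument: ~^ stops processing before " ~A." is reached.
(to-english 23)                    ;=> "Twenty-three"

;; Two arguments: ~A consumes the second one.
(to-english 23 "losers")
```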

Related

How to test a function in lisp and emacs

How can I assert that 2012/08 (instead of Aug 2012) is returned from this function?
That way I can start learning Lisp on the job, iterating on the function itself until the output is satisfactory.
I know some python unit testing (pytest) and am looking for something similar for lisp. However, my first attempt[0] C-c eval-buffer fails with Invalid function: "<2012-08-12 Mon>"
(assert (= (org-cv-utils-org-timestamp-to-shortdate ("<2012-08-12 Mon>")) "Aug 2012"))
(defun org-cv-utils-org-timestamp-to-shortdate (date_str)
"Format orgmode timestamp DATE_STR into a short form date.
Other strings are just returned unmodified
e.g. <2012-08-12 Mon> => Aug 2012
today => today"
(if (string-match (org-re-timestamp 'active) date_str)
(let* ((abbreviate 't)
(dte (org-parse-time-string date_str))
(month (nth 4 dte))
(year (nth 5 dte))) ;;'(02 07 2015)))
(concat
(calendar-month-name month abbreviate) " " (number-to-string year)))
date_str))
[0] https://www.emacswiki.org/emacs/UnitTesting
(assert (= (org-cv-utils-org-timestamp-to-shortdate ("<2012-08-12 Mon>"))
"Aug 2012"))
(defun org-cv-utils-org-timestamp-to-shortdate (...) ...)
Statements are executed in sequence, which means your function will only be defined after the assertion is evaluated. This is a problem because the assertion calls that function. You should reorder your code so that the test runs after the function is defined.
You cannot compare strings with =. If you call describe-function (C-h f) for =, you'll see that = is a numerical comparison (more precisely, for numbers or markers). For strings you need to use string=.
In a normal evaluation context, i.e. not in a macro or a special form, the following is read as a function call:
("<2012-08-12 Mon>")
Parentheses are meaningful: the above form says "call the function "<2012-08-12 Mon>" with zero arguments". This is not what you want; there is no need to add parentheses around the string here.

Meaning of # in clojure

In clojure you can create anonymous functions using #
eg
#(+ % 1)
is a function that takes in a parameter and adds 1 to it.
But we also have to use # for regex
eg
(clojure.string/split "hi, buddy" #",")
Are these two # related?
There are also sets #{}, fully qualified class name constructors #my.klass_or_type_or_record[:a :b :c], instants #inst "yyyy-mm-ddThh:mm:ss.fff+hh:mm" and some others.
They are related in the sense that in each of these cases # starts a sequence recognizable by the Clojure reader, which dispatches every such instance to an appropriate reader. There's a guide that expands on this.
I think this convention exists to reduce the number of different syntaxes to just one and thus simplify the reader.
The two uses have no (direct) relationship.
In Clojure, when you see the # symbol, it is a giant clue that you are "talking" to the Clojure Reader, not to the Clojure Compiler. See the full docs on the Reader here: https://clojure.org/reference/reader.
The Reader is responsible for converting plain text from a source file into a collection of data structures. For example, comparing Clojure to Java we have
; Clojure ; Java
"Hello" => new String( "Hello" )
and
[ "Goodbye" "cruel" "world!" ] ; Clojure vector of 3 strings
; Java ArrayList of 3 strings
var msg = new ArrayList<String>();
msg.add( "Goodbye" );
msg.add( "cruel" );
msg.add( "world!" );
Similarly, there are shortcuts that the Reader recognizes even within Clojure source code (before the compiler converts it to Java bytecode), just to save you some typing. These "Reader Macros" get converted from your "short form" source code into "standard Clojure" even before the Clojure compiler gets started. For example:
@my-atom => (deref my-atom) ; not using `#`
#'map => (var map)
#{ 1 2 3 } => (hash-set 1 2 3)
#_(launch-missiles 12.3 45.6) => `` ; i.e. "nothing"
#(+ 1 %) => (fn [x] (+ 1 x))
and so on. As the @ or deref operator shows, not all Reader Macros use the # (hash/pound/octothorpe) symbol. Note that, even in the case of a vector literal:
[ "Goodbye" "cruel" "world!" ]
the Reader creates a result as if you had typed:
(vector "Goodbye" "cruel" "world!" )
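One way to watch these expansions happen is to hand the text to the reader with read-string and look at the data that comes back, before anything is compiled or evaluated. A minimal sketch; my-atom and the sample literals are placeholders chosen for illustration:

```clojure
;; read-string runs only the Reader: text in, data structures out.
(read-string "@my-atom")           ;=> (clojure.core/deref my-atom)
(read-string "#'map")              ;=> (var map)
(read-string "#{1 2 3}")           ;=> a set equal to #{1 2 3}

;; #(...) becomes a (fn* ...) form with a generated parameter name.
(first (read-string "#(+ 1 %)"))   ;=> fn*

;; #"..." is read directly as a compiled regex pattern.
(class (read-string "#\",\""))     ;=> java.util.regex.Pattern
```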
Are these two # related?
No, they aren't. The # literal is used in different ways. Some of them you've already mentioned: these are an anonymous function and a regex pattern. Here are some more cases:
Prepending an expression with #_ wipes it from what the compiler sees, as if it had never been written. For example, #_(/ 0 0) is discarded at the reader level, so no exception is thrown.
Tagging primitives to coerce them to complex types; for example, #inst "2019-03-09" will produce an instance of the java.util.Date class. There are also #uuid and other built-in tags. You may register your own.
Tagging ordinary maps to coerce them to typed maps (records), e.g. #project.models.User{:name "John" :age 42} will produce an instance of the record declared as (defrecord User ...) in the project.models namespace.
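The discard and tag cases can be checked the same way, which shows that the work is done at read time. A short sketch with arbitrary sample values (the UUID is made up):

```clojure
;; #_ drops the next form at read time, so (/ 0 0) is never evaluated.
(read-string "[1 #_(/ 0 0) 2]")                  ;=> [1 2]

;; Built-in tags turn a plain string into a richer type.
(class (read-string "#inst \"2019-03-09\""))     ;=> java.util.Date
(class (read-string "#uuid \"123e4567-e89b-12d3-a456-426614174000\""))
                                                 ;=> java.util.UUID
```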
Other Lisps have proper programmable readers, and consequently read macros. Clojure doesn't really have a programmable reader - users cannot easily add new read macros - but the Clojure system does internally use read macros. The # read macro is the dispatch macro, the character following the # being a key into a further read macro table.
So yes, the # does mean something; but it's so deep and geeky that you do not really need to know this.

Common Lisp Applying Regex-like Patterns to Keys in PLIST

I am wondering if it is possible to apply Regex-like pattern matching to keys in a plist.
That is, suppose we have a list like this (:input1 1 :input2 2 :input3 3 :output1 10 :output2 20 ... :expand "string here")
The code I need to write is something along the lines of:
"If there is :expand and (:input* or :output*) in the list's keys, then do something and also return the :expand and (:output* or :input*)".
Obviously, this can be accomplished via cond but I do not see a clear way to write this elegantly. Hence, I thought of possibly using a Regex-like pattern on keys and basing the return on the results from that pattern search.
Any suggestions are appreciated.
Normalize your input
A possible first step for your algorithm, one that will simplify the rest of your problem, is to normalize your input so that it keeps the same information in a structured way instead of inside symbols' names. Here I convert keys from symbols to either symbols or lists. You could also define your own class to represent inputs and outputs, and write generic functions that work for both.
(defun normalize-key (key)
(or (cl-ppcre:register-groups-bind (symbol number)
;; \\w+? is non-greedy so that all trailing digits end up in NUMBER
("^(\\w+?)(\\d+)$" (symbol-name key))
(list (intern symbol "KEYWORD")
(parse-integer number)))
key))
(defun test-normalize ()
(assert (eq (normalize-key :expand) :expand))
(assert (equal (normalize-key :input1) '(:input 1))))
The above normalize-key deconstructs :inputN into a list (:input N), with N parsed as a number. Using the above function, you can normalize the whole list (you could do that recursively too for values, if you need it):
(defun normalize-plist (plist)
(loop
for (key value) on plist by #'cddr
collect (normalize-key key)
collect value))
(normalize-plist
'(:input1 1 :input2 2 :input3 3 :output1 10 :output2 20 :expand "string here"))
=> ((:INPUT 1) 1
(:INPUT 2) 2
(:INPUT 3) 3
(:OUTPUT 1) 10
(:OUTPUT 2) 20
:EXPAND "string here")
From there, you should be able to implement your logic more easily.

Clojure: java.lang.Integer cannot be cast to clojure.lang.IFn

I know there are a lot of questions out there with this headline, but I can't glean my answer from them, so here goes.
I'm an experienced programmer, but fairly new to Clojure. I'm trying to parse an RTF file by converting it to an HTML file and then calling the HTML parser.
The converter I'm using (unrtf) always prints to stdout, so I need to capture the output and write the file myself.
(defn parse-rtf
"Use unrtf to parse a rtf file"
[#^java.io.InputStream istream charset]
(let [rtffile (File/createTempFile "parse" ".rtf" (File. "/vault/tmp/"))
htmlfile (File/createTempFile "parse" ".ohtml" (File. "/vault/tmp/"))
command (str "/usr/bin/unrtf "
(.getPath rtffile)
)
]
(try
(with-open [rtfout (FileOutputStream. rtffile)]
(IOUtils/copy istream rtfout))
(let [ proc (.exec (Runtime/getRuntime) command)
ostream (.getInputStream proc)
result (.waitFor proc)]
(if (> result 0)
(
(println "unrtf failed" command result)
; throwing an exception causes a parse failure to be logged
(throw (Exception. (str "RTF to HTML conversion failed")))
)
(
(with-open [htmlout (FileOutputStream. htmlfile)]
(IOUtils/copy ostream htmlout))
; since we now have html, run it through the html parser
(parse-html (FileInputStream. htmlfile) charset)
)
)
)
(finally
(.delete rtffile)
(.delete htmlfile)
)
)))
The exception points to the line with
(IOUtils/copy ostream htmlout))
which really confuses me, since I used that form earlier (just after the try:) and it seems to be OK there. I can't see the difference.
Thanks for any help you can give.
As others have correctly pointed out, you can't just add extra parentheses for code organization to group forms together. Parentheses in a Clojure file are tokens that delimit a list in the corresponding code; lists are evaluated as s-expressions - that is, the first form is evaluated and the result is invoked as a function (unless it names a special form such as if or let).
In this case you have the following:
(
(with-open [htmlout (FileOutputStream. htmlfile)]
(IOUtils/copy ostream htmlout))
; since we now have html, run it through the html parser
(parse-html (FileInputStream. htmlfile) charset)
)
The IOUtils/copy function has an integer return value (the number of bytes copied). This value is then returned when the surrounding with-open macro is evaluated. Since the with-open form is the first in a list, Clojure will then try to invoke the integer return value from IOUtils/copy as a function, resulting in the exception that you see.
To evaluate multiple forms for side-effects without invoking the result from the first one, wrap them in a do form; this is a special form that evaluates each expression and returns the result of the final expression, discarding the result from all others. Many core macros and special forms such as let, when, and with-open (among many others) accept multiple expressions and evaluate them in an implicit do.
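The difference can be reproduced without any I/O. In this sketch (count "abc") merely stands in for a form that returns a number, as IOUtils/copy does in the question; the function names are made up for illustration.

```clojure
;; Extra parentheses: the value of the first form (3) is called as a function.
(defn grouped-with-parens []
  ((count "abc")
   :done))

;; do: both forms are evaluated in order and the last value is returned.
(defn grouped-with-do []
  (do (count "abc")
      :done))

;; The first version compiles, but fails when it is called.
(try
  (grouped-with-parens)
  (catch ClassCastException e
    (println "caught:" (.getMessage e))))

(grouped-with-do)   ;=> :done
```

Applied to the question's code, both branches of the if would be wrapped in do instead of bare parentheses.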
I didn't try to run your code, I just had a look at it: after the if (> result 0) you have ((println ...) (throw ...)) without a do. The extra parentheses cause the value returned by the first inner form to be treated as a function and called.
Try including a do, like this: (do (println ...) (throw ...))

How to use pprint-newline properly?

I am trying to print a sequence such that neither the whole sequence is printed on one line, nor is each element of the sequence printed on its own line. E.g.
[10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29]
I found pprint-newline in the documentation, which suggests that it lets me control how the newline gets printed. Unfortunately, I cannot find any examples of how it is to be used in conjunction with pprint, and the doc string doesn't offer much insight:
-------------------------
clojure.pprint/pprint-newline
([kind])
Print a conditional newline to a pretty printing stream. kind specifies if the
newline is :linear, :miser, :fill, or :mandatory.
This function is intended for use when writing custom dispatch functions.
Output is sent to *out* which must be a pretty printing writer.
pprint specifies an optional second argument for the writer, which is by default set to *out*. However, I am not sure how to 'send' pprint-writer to *out* in this case, e.g. something like the example below doesn't appear to work
(clojure.pprint/pprint [1 2 3 4] (*out* (clojure.pprint/pprint-newline :miser)))
While Guillermo explained how to change the dispatch for pretty-printing in general, if all you want to do is printing one collection differently, that's possible, too.
For example, using cl-format (after (use '[clojure.pprint :as pp])):
(binding [pp/*print-pretty* true
pp/*print-miser-width* nil
pp/*print-right-margin* 10]
(pp/cl-format true "~<[~;~@{~a~^ ~:_~}~;]~:>~%" '[foo bar baz quux]))
Set *print-right-margin* as you wish.
You don't have to use format for this. The format directives can be translated to their respective pretty-printer functions, if you want. Explanation of the format string: ~< and ~:> establish a logical block. Inside the block, there are three sections separated by ~;. The first and last sections are your prefix and suffix, while the elements are printed in the middle section, using ~@{ and ~}. Each element is printed using ~a, followed (if more elements remain) by a space and a conditional fill-style newline, ~:_.
(In CL, the format string could be simplified to "~<[~;~@{~a~^ ~}~;]~:@>~%", but that doesn't seem to work in Clojure 1.5.)
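For comparison, here is a rough sketch of the same layout written with the pretty-printer functions instead of a format string. The names pprint-filled, fill-dispatch and filled-str are made up for this example; the dispatch falls back to simple-dispatch for anything that is not sequential, and the exact line breaks depend on the margin you bind.

```clojure
(require '[clojure.pprint :as pp])

(defn pprint-filled
  "Print coll between [ and ], breaking lines only where needed."
  [coll]
  (pp/pprint-logical-block :prefix "[" :suffix "]"
    (pp/print-length-loop [xs (seq coll)]
      (when xs
        (pp/write-out (first xs))
        (when (next xs)
          (.write ^java.io.Writer *out* " ")
          (pp/pprint-newline :fill)      ; plays the role of ~:_
          (recur (next xs)))))))

(defn fill-dispatch
  "Hypothetical dispatch: fill sequential collections, default for the rest."
  [x]
  (if (sequential? x)
    (pprint-filled x)
    (pp/simple-dispatch x)))

(defn filled-str
  "Pretty-print coll with fill-dispatch and return the text."
  [coll margin]
  (binding [pp/*print-right-margin* margin
            pp/*print-miser-width* nil]
    (with-out-str
      (pp/with-pprint-dispatch fill-dispatch
        (pp/pprint coll)))))

(print (filled-str (vec (range 10 30)) 30))
```

pprint-newline needs *out* to be a pretty-printing writer, which is why it is called from a dispatch function that pprint invokes, not passed to pprint as an argument.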
As the help says, the function is intended for use for custom dispatch functions.
In order to change the behavior of the pprint for sequences you need to provide a new dispatch function for clojure.lang.ISeq.
You can find the current dispatch function for sequences in clojure/pprint/dispatch.clj:
(use-method simple-dispatch clojure.lang.ISeq pprint-list)
...
(defn- pprint-simple-list [alis]
(pprint-logical-block :prefix "(" :suffix ")"
(print-length-loop [alis (seq alis)]
(when alis
(write-out (first alis))
(when (next alis)
(.write ^java.io.Writer *out* " ")
(pprint-newline :linear)
(recur (next alis)))))))
Since printing is dispatched according to data type, overriding the dispatch seems to be the way to go.
See the source code for ideas.