Clojure: how to simulate adding metadata to characters in stream? - clojure

Having a function which returns a seq of characters, I need to modify it to allow attaching metadata to some characters (but not all). Clojure doesn't support 'with-meta' on primitive types. So, the possible options are:
return a seq of vectors of [character, metadata];
pros: simplicity, data and metadata are tied together
cons: need to extract data from vector
return two separate seqs, one for characters and one for metadata, caller most iterate those simultaneously if he cares about metadata;
pros: caller is not forced to extract data from each stream element and may throw away meta-sequence if he wishes
cons: need to iterate both seqs at once, more complexity on caller side if metadata is needed
introduce some record-wrapper containing one character and allowing to attach meta to itself (Clojure records implement IMeta);
pros: data and metadata are tied together
cons: need to extract data from record
your better option.
Which approach is better?

Using vector/map sequence, e.g.
({:char 'x' :meta <...>} {:char 'y' :meta <...>} {:char 'z' :meta <...>} ...)
; or
(['x' <...>] ['y' <...>] ['z' <...>] ...)
looks like the best option for me, that's what I'd do myself if I had such task. Then, for example, writing a function which transforms such sequence back to sequence of chars is very simple:
(defn characters [s] (map :char s))
; or
(defn characters [s] (map first s))
Iterating through characters and metadata at the same time is also very easy using destructuring bindings:
(doseq [{:keys [char meta]} s] ...)
; or
(doseq [[char meta] s] ...)
What to use (map or vector) is mostly a matter of personal preference.
IMO, using records and their IMeta interface is not quite as good: I think that this kind of metadata is mostly intended for language-related code (macros, code preprocessing, syntax extensions etc) and not for domain code. Of course I may be wrong in this assumption.
And using two parallel sequences is the worst option because it is not as convenient for the user of your interface as single sequence. And throwing away the metadata is very simple with the function I wrote above, and it will not even have performance implications if all sequences are lazy.

Related

Clojure pipe collection one by one

How in Clojure process collections like in Java streams - one by one thru all the functions instead of evaluating all the elements in all the stack frame. Also I would describe it as Unix pipes (next program pulls chunk by chunk from previous one).
As far as I understand your question, you may want to look into two things.
First, understand the sequence abstraction. This is a way of looking at collections which consumes them one by one and lazily. It is an important Clojure idiom and you'll meet well known functions like map, filter, reduce, and many more. Also the macro ->>, which was already mentioned in a comment, will be important.
After that, when you want to dig deeper, you probably want to look into transducers and reducers. In a grossly oversimplifying summary, they allow you combine several lazy functions into one function and then process a collection with less laziness, less memory consumption, more performance, and possibly on several threads. I consider these to be advanced topics, though. Maybe the sequences are already what you were looking for.
Here is a simple example from ClojureDocs.org
;; Use of `->` (the "thread-first" macro) can help make code
;; more readable by removing nesting. It can be especially
;; useful when using host methods:
;; Arguably a bit cumbersome to read:
user=> (first (.split (.replace (.toUpperCase "a b c d") "A" "X") " "))
"X"
;; Perhaps easier to read:
user=> (-> "a b c d"
.toUpperCase
(.replace "A" "X")
(.split " ")
first)
"X"
As always, don't forget the Clojure CheatSheet or Clojure for the Brave and True.

Toggle case sensitivity for a huge chunk of Clojure code

There is a large chunk of code (mostly not mine) that does the following with user input (that is more or less, space separated list of commands with some arguments/options):
Remove all unsupported characters
Split on space into a vector
Recursively apply first item in vector on the rest of the vector (function uses whatever arguments it needs, and returns vector without itself and its arguments to the loop).
Functions themselves, as far as input is concerned, have a mix of (case), (cond), (condp), (=) and (compare) with some nasty (keyword) comparisons mixed in.
Everyone was fine with the fact that this all is strictly case-sensitive until very recently. Now some (previously unknown) ancient integration bits acting as users appeared and are having some casing issues that I have no control over.
Question: is there a viable way (shortcut before there will be more time to redo it all) to make string comparison case insensitive for some sort of a scope, based on some variable?
I considered 3 options:
Fixing the code (will be done sometime, anyway, but not viable at the moment).
Extracting some low level comparison function (hopefully just one) and rebinding it for the local scope (sounds great, but catching cases might be difficult and error-prone).
Standardize input (might not be possible without some hacks since some data, outside comparisons, NEEDS to be case sensitive).
After some research, the answer is probably no (and planning for major changes should start), but I figured asking would not hurt, maybe someone thought of it before.
Edit: sample problematic input:
"Command1 ARG1 aRG2 Command3 command9 Arg4 Arg9 aRg5 COMMAND4 arg8"
Breaking it down:
"Commands" with broken case I need to be able, on demand, to match case insensitively. Arguments are matched case insensitively on another level - so they do not concern this piece of code, but their case inside this bit of code should be preserved to be sent further along.
NB! It is not possible at the start of the processing to tell what is in the input a command and what is argument.
For what it's worth, here is a case-insensitive wrapper for simple case forms:
(ns lexer.core)
(defn- standardize [thing]
(assert (string? thing) (str thing " should be a string"))
(clojure.string/lower-case thing))
(defmacro case-insensitive-case [expr & pairs+default?]
(let [pairs (partition 2 pairs+default?)
convert (fn [[const form]]
(list (standardize const) form))
most-of-it `(case (standardize ~expr) ~#(mapcat convert pairs))]
(if (-> pairs+default? count even?)
most-of-it
(concat most-of-it [(last pairs+default?)]))))
For example,
(macroexpand-1 '(case-insensitive-case (test expression)
"Blam!" (1 + 1)
(whatever works)))
=> (clojure.core/case (lexer.core/standardize (test expression)) "blam!" (1 + 1) (whatever works))
The assert in standardize is necessary because lower-case turns things into strings:
(clojure.string/lower-case 22)
=> "22"
As per Alan Thompson's comment, str/lower-case was the right first half of approach - I just needed to find the right place to apply it to just command name.
Afterwards redefining = and couple of functions used inside cond and condp (credit to ClojureMostly) solved the matching part.
All that was left were the string literals inside case statements which I just find-and-replaced with lower case.

Write a lazy sequence of lines to a file in ClojureClr without reflection

I have a large lazy seq of lines that I want to write to a file. In C#, I would use System.IO.File/WriteAllLines which has an overload where the lines are either string[] or IEnumerable<string>.
I want to do this without using reflection at runtime.
(set! *warn-on-reflection* true)
(defn spit-lines [^String filename seq]
(System.IO.File/WriteAllLines filename seq))
But, I get this reflection warning.
Reflection warning, ... - call to WriteAllLines can't be resolved.
In general I need to know when reflection is necessary for performance reasons, but I don't care about this particular method call. I'm willing to write a bit more code to make the warning go away, but not willing to force all the data into memory as an array. Any suggestions?
Here are two options to consider, depending on whether you are using Clojure's core data structures.
Convert from a seq to an IEnumerable<string> with Enumerable.Cast from LINQ
This option will work for any IEnumerable that contains only strings.
(defn spit-lines [^String filename a-seq]
(->> a-seq
(System.Linq.Enumerable/Cast (type-args System.String))
(System.IO.File/WriteAllLines filename)))
Type Hint to force the caller to supply IEnumerable<string>
If you want to use a type hint, do this. But watch out, the clojure data structures do not implement IEnumerable<String>, so this could lead to a runtime exception.
^|System.Collections.Generic.IEnumerable`1[System.String]|
Wrapping the full CLR name of the type in vertical pipes (|) lets you specify characters that are otherwise illegal in Clojure syntax.
(defn spit-lines [^String filename ^|System.Collections.Generic.IEnumerable`1[System.String]| enumerable-of-string]
(System.IO.File/WriteAllLines filename enumerable-of-string))
Here's the exception from (spit-lines "filename.txt" #{}) when passing a set to the type-hinted version:
System.InvalidCastException: Unable to cast object of type 'clojure.lang.PersistentTreeSet' to type 'System.Collections.Generic.IEnumerable`1[System.String]'.
More information about specifying types.

Can I use the clojure 'for' macro to reverse a string?

This is a follow up to my question "Recursively reverse a sequence in Clojure".
Is it possible to reverse a sequence using the Clojure "for" macro? I'm trying to better understand the limitations and use-cases of this macro.
Here is the code I'm starting from:
((defn reverse-with-for [s]
(for [c s] c))
Possible?
If so, I assume the solution may require wrapping the for macro in some expression that defines a mutable var, or that the body-expr of the for macro will somehow pass a sequence to the next iteration (similar to map).
Clojure for macro is being used with arbitrary Clojure sequences.
These sequences may or may not expose random access like vectors do. So, in general case, you do not have access to the last element of a Clojure sequence without traversing all the way to it, which would make making a pass through it in reverse order not possible.
I'm assumming you had something like this in mind (Java-like pseudocode):
for(int i = n-1; i--; i<=0){
doSomething(array[i]);
}
In this example we know array size n in advance and we can access elements by its index. With Clojure sequences we don't know that. In Java it makes sense to do that with arrays and ArrayLists. Clojure sequences are however much more like linked lists - you have an element, and a reference to next one.
Btw, even if there were a (probably non-idiomatic)* way to do that, its time complexity would be something like O(n^2) which is just not worth the effort compared to much easier solution in the linked post which is O(n^2) for lists and a much better O(n) for vectors (and it is quite elegant and idiomatic. In fact, the official reverse has that implementation).
EDIT:
A general advice: Don't try to do imperative programming in Clojure, it wasn't designed for it. Although many things may seem strange or counter-intuitive (as opposed to well known idioms from imperative programming) once you get used to the functional way of doing things it is a lot, and I mean a lot easier.
Specifically for this question, despite the same name Java (and other C-like) for and Clojure for are not the same thing! First is an actual loop - it defines a flow control. The second one is a comprehension - look at it conceptually as a higher function of a sequence and a function f to be done for each of its element, which returns another sequence of f(element) s. Java for is a statement, it doesn't evaluate to anything, Clojure for (as well as anything else in Clojure) is an expression - it evaluates to the sequence of f(element) s.
Probably the easiest way to get the idea is to play with sequence functions library: http://clojure.org/sequences. Also, you can solve some problems on http://www.4clojure.com/. The first problems are very easy but they gradually get harder as you progress through them.
*As shown in Alexandre's answer the solution to the problem in fact is idiomatic and quite clever. Kudos for that! :)
Here's how you could reverse a string with for:
(defn reverse-with-for [s]
(apply str
(for [i (range (dec (count s)) -1 -1)]
(get s i))))
Note that this code is mutation free. It's the same as:
(defn reverse-with-map [s]
(apply str
(map (partial get s) (range (dec (count s)) -1 -1))))
A simpler solution would be:
(apply str (reverse s))
First of all, as Goran said, for is not a statement - it is an expression, namely sequence comprehension. It construct sequences by iteration through other sequences. So in the form it is meant to be used it is pure function (without side-effects). for can be seen as enhanced map infused with filter. Because of this it cannot be used to hold iteration state as e.g. reduce do.
Secondly, you can express sequence reversal using for and mutable state, e.g. using an atom, which is rough equivalent (not taking into account its concurrency properties) of java variable. But doing so you are facing several problems:
You are breaking main language paradigm so you will definitely get worse looking and behaving code.
Since all clojure mutable state cells are designed to be thread-safe, they all use some kind of illegal concurrent modification protection, and there is no ability to remove it. Consequently, you will get poorer performance characteristics.
In this particular case, like Goran said, sequences are one of the wide-used Clojure abstractions. For example, there are lazy sequences, which could be potentially infinite, so you just cannot walk them to the end. You certainly will have difficulties trying to work with such sequences with imperative techniques.
So don't do it, at least in Clojure :)
EDIT: I forgot to mention it. for returns lazy sequence, so you have to evaluate it in some way in order to apply all state mutations you do in it. Another reason not to do so :)

How do I get core clojure functions to work with my defrecords

I have a defrecord called a bag. It behaves like a list of item to count. This is sometimes called a frequency or a census. I want to be able to do the following
(def b (bag/create [:k 1 :k2 3])
(keys bag)
=> (:k :k1)
I tried the following:
(defrecord MapBag [state]
Bag
(put-n [self item n]
(let [new-n (+ n (count self item))]
(MapBag. (assoc state item new-n))))
;... some stuff
java.util.Map
(getKeys [self] (keys state)) ;TODO TEST
Object
(toString [self]
(str ("Bag: " (:state self)))))
When I try to require it in a repl I get:
java.lang.ClassFormatError: Duplicate interface name in class file compile__stub/techne/bag/MapBag (bag.clj:12)
What is going on? How do I get a keys function on my bag? Also am I going about this the correct way by assuming clojure's keys function eventually calls getKeys on the map that is its argument?
Defrecord automatically makes sure that any record it defines participates in the ipersistentmap interface. So you can call keys on it without doing anything.
So you can define a record, and instantiate and call keys like this:
user> (defrecord rec [k1 k2])
user.rec
user> (def a-rec (rec. 1 2))
#'user/a-rec
user> (keys a-rec)
(:k1 :k2)
Your error message indicates that one of your declarations is duplicating an interface that defrecord gives you for free. I think it might actually be both.
Is there some reason why you cant just use a plain vanilla map for your purposes? With clojure, you often want to use plain vanilla data structures when you can.
Edit: if for whatever reason you don't want the ipersistentmap included, look into deftype.
Rob's answer is of course correct; I'm posting this one in response to the OP's comment on it -- perhaps it might be helpful in implementing the required functionality with deftype.
I have once written an implementation of a "default map" for Clojure, which acts just like a regular map except it returns a fixed default value when asked about a key not present inside it. The code is in this Gist.
I'm not sure if it will suit your use case directly, although you can use it to do things like
user> (:earth (assoc (DefaultMap. 0 {}) :earth 8000000000))
8000000000
user> (:mars (assoc (DefaultMap. 0 {}) :earth 8000000000))
0
More importantly, it should give you an idea of what's involved in writing this sort of thing with deftype.
Then again, it's based on clojure.core/emit-defrecord, so you might look at that part of Clojure's sources instead... It's doing a lot of things which you won't have to (because it's a function for preparing macro expansions -- there's lots of syntax-quoting and the like inside it which you have to strip away from it to use the code directly), but it is certainly the highest quality source of information possible. Here's a direct link to that point in the source for the 1.2.0 release of Clojure.
Update:
One more thing I realised might be important. If you rely on a special map-like type for implementing this sort of thing, the client might merge it into a regular map and lose the "defaulting" functionality (and indeed any other special functionality) in the process. As long as the "map-likeness" illusion maintained by your type is complete enough for it to be used as a regular map, passed to Clojure's standard function etc., I think there might not be a way around that.
So, at some level the client will probably have to know that there's some "magic" involved; if they get correct answers to queries like (:mars {...}) (with no :mars in the {...}), they'll have to remember not to merge this into a regular map (merge-ing the other way around would work fine).