Beginner in clojure: Tokenizing lists of different characters

Beginner in clojure: Tokenizing lists of different characters - regex

So I know this isn't the best method of solving this issue, but I'm trying to go through a list of lines from an input file, which end up being expressions. I've got a list of expressions, and each expression has it's own list thanks to the split-the-list function. My next step is to replace characters with id, ints with int, and + or - with addop. I've got the regexes to find whether or not my symbols match any of those, but when I try and replace them, I can only get the last for loop I call to leave any lasting results. I know what it stems down to is the way functional programming works, but I can't wrap my head around the trace of this program, and how to replace each separate type of input and keep the results all in one list.
(def reint #"\d++")
(def reid #"[a-zA-Z]+")
(def readdop #"\+|\-")
(def lines (into () (into () (clojure.string/split-lines (slurp "input.txt")) )))
(defn split-the-line [line] (clojure.string/split line #" " ))
(defn split-the-list [] (for [x (into [] lines)] (split-the-line x)))
(defn tokenize-the-line [line]
(for [x line] (clojure.string/replace x reid "id"))
(for [x line] (clojure.string/replace x reint "int"))
(for [x line] (clojure.string/replace x readdop "addop")))
(defn tokenize-the-list [] (for [x (into [] (split-the-list) )] (tokenize-the-line x)))
And as you can probably tell, I'm pretty new to functional programming, so any advice is welcome!

You're using a do block, which evaluates several expressions (normally for side effects) and then returns the last one. You can't see it because fn (and hence defn) implicitly contain one. As such, the lines
(for [x line] (clojure.string/replace x reid "id"))
(for [x line] (clojure.string/replace x reint "int"))
are evaluated (into two different lazy sequences) and then thrown away.
In order for them to affect the return value, you have to capture their return values and use them in the next round of replacements.
In this case, I think the most natural way to compose your replacements is the threading macro ->:
(for [x line]
(-> x
(clojure.string/replace reid "id")
(clojure.string/replace reint "int")
(clojure.string/replace readdop "addop")))
This creates code which does the reid replace with x as the first argument, then does the reint replace with the result of that as the first argument and so on.
Alternatively you could do this by using comp to compose anonymous functions like (fn [s] (clojure.string/replace s reid "id") (partial application of replace). In the imperative world we get pretty used to running several procedures that "bash the data in place" - in the functional world you more often combine several functions together to do all the operations and then run the result.

Related

Conditional "assignment" in functional programming

I am programming something that doesn't have side-effects, but my code is not very readable.
Consider the following piece of code:
(let [csv_data (if header_row (cons header_row data_rows) data_rows)]
)
I'm trying to use csv_data in a block of code. What is a clean way of conditioning on the presence of a header_row? I've looked at if-let, but couldn't see how that could help here.
I have run into similar situations with functional for-loops as well where I'm binding the result to a local variable, and the code looks like a pile of expressions.
Do I really have to create a separate helper function in so many cases?
What am I missing here?

Use the cond->> macro
(let [csv_data (cond->> data_rows
header_row (cons header-row)]
)
It works like the regular ->> macro, but before each threading form a test expression has to be placed that determines whether the threading form will be used.
There is also cond->. Read more about threading macros here: Official threading macros guide

First, don't use underscore, prefer dashes.
Second, there is nothing wrong with a little helper function; after all, this seems to be a requirement for handling your particular data format.
Third, if you can change your data so that you can skip those decisions and have a uniform representation for all corner cases, this is even better. A header row contains a different kind of data (column names?), so you might prefer to keep them separate:
(let [csv {:header header :rows rows}]
...)
Or maybe at some point you could have "headers" and "rows" be of the same type: sequences of rows. Then you can concat them directly.
The ensure-x idiom is a very common way to normalize your data:
(defn ensure-list [data]
(and data (list data)))
For example:
user=> (ensure-list "something")
("something")
user=> (ensure-list ())
(())
user=> (ensure-list nil)
nil
And thus:
(let [csv (concat (ensure-list header) rows)]
...)

i would propose an utility macro. Something like this:
(defmacro update-when [check val-to-update f & params]
`(if-let [x# ~check]
(~f x# ~val-to-update ~#params)
~val-to-update))
user> (let [header-row :header
data-rows [:data1 :data2]]
(let [csv-data (update-when header-row data-rows cons)]
csv-data))
;;=> (:header :data1 :data2)
user> (let [header-row nil
data-rows [:data1 :data2]]
(let [csv-data (update-when header-row data-rows cons)]
csv-data))
;;=> [:data1 :data2]
it is quite universal, and lets you fulfill more complex tasks then just simple consing. Like for example you want to reverse some coll if check is trueish, and concat another list...
user> (let [header-row :header
data-rows [:data1 :data2]]
(let [csv-data (update-when header-row data-rows
(fn [h d & params] (apply concat (reverse d) params))
[1 2 3] ['a 'b 'c])]
csv-data))
;;=> (:data2 :data1 1 2 3 a b c)
update
as noticed by #amalloy , this macro should be a function:
(defn update-when [check val-to-update f & params]
(if check
(apply f check val-to-update params)
val-to-update))

After thinking about the "cost" of a one-line helper function in the namespace I've came up with a local function instead:
(let [merge_header_fn (fn [header_row data_rows]
(if header_row
(cons header_row data_rows)
data_rows))
csv_data (merge_header_fn header_row data_rows) ]
...
<use csv_data>
...
)
Unless someone can suggest a more elegant way of handling this, I will keep this as an answer.

What is the difference between this `doseq` statement and a `for` statement; reading file in clojure?

If you have been following my questions over the day,
I am doing a class project in clojure and having difficulty reading a file, parsing it, and creating a graph from its content. I have managed to open and read a file along with parsing the lines as needed. The issue I face now is creating a graph structure from the data that was read in.
Some background first. In other functions I have implemented in this project I have used a for statement to "build up" a list of values as such
...
(let [rem-list (remove nil? (for [j (range (count (graph n)))]
(cond (< (rand) 0.5)
[n (nth (seq (graph n)) j)])))
...
This for would build up a list of edges to remove from a graph, after it was done I could then use rem-list in a reduce to remove all of the edges from some graph structure.
Back to my issue. I figured that if I were to read a file line by line I could "build up" a list in the same manner so I implemented the function below
(defn readGraphFile [filename, numnodes]
(let [edge-list
(with-open [rdr (io/reader filename)]
(doseq [line (line-seq rdr)]
(lineToEdge line)))]
(edge-list)))
Though if I am to run this function I end up with a null pointer exception as if nothing was ever "added" to edge-list. So being the lazy/good? programmer I am I quickly thought of another way. Though it still somewhat relies on my thinking of how the for built the list.
In this function I first let [graph be equal to an empty graph with the known number of nodes. Then each time that a line was read I would simply add that edge (each line in the file is an edge) to the graph, in effect "building up" my graph. The function is shown below
(defn readGraph [filename, numnodes]
(let [graph (empty-graph numnodes)]
(with-open [rdr (io/reader filename)]
(doseq [line (line-seq rdr)]
(add-edge graph (lineToEdge line))))
graph))
Here lineToEdge returns a pair of numbers (ex [1 2]). Which is proper input for the add-edge function.
finalproject.core> (add-edge (empty-graph 5) (lineToEdge "e 1 2"))
[#{} #{2} #{1} #{} #{}]
The issue with this function though is that it seems to never actually add an edge to a graph
finalproject.core> (readGraph "/home/eccomp/finalproject/resources/11nodes.txt" 11)
[#{} #{} #{} #{} #{} #{} #{} #{} #{} #{} #{}]
So I guess my issue lies with how doseq is different from for? Is it different or is my implementation incorrect?

doseq differs from for in that it is intended for running a function on a sequence just for the side effects.
If you look at the documentation for doseq:
(https://clojuredocs.org/clojure.core/doseq)
Repeatedly executes body (presumably for side-effects) with
bindings and filtering as provided by "for". Does not retain
the head of the sequence. Returns nil
So, regardless of any processing you're doing, nil will just be returned.
You can switch doseq with for, and it should work. However, line-seq is lazy, so what you might have to do is wrap it in a doall to ensure that it will try to read all the lines when the file is open.
Also, your second readGraph function will only return an empty graph:
(defn readGraph [filename, numnodes]
(let [graph (empty-graph numnodes)]
(with-open [rdr (io/reader filename)]
(doseq [line (line-seq rdr)]
(add-edge graph (lineToEdge line))))
graph))
The final line is just the empty graph you set with let, since Clojure is an immutable language, the graph reference is never updated, since you have a function that takes an existing graph and adds an edge to it, you need to step though the list while passing the list that you're building up.
I know there must be a better way to do this, but I'm not as good at Clojure as I would like, but something like:
(defn readGraph
[filename numnodes]
(with-open [rdr (io/reader filename)]
(let [edge-seq (line-seq rdr)]
(loop [cur-line (first edge-seq)
rem-line (rest edge-seq)
graph (empty-graph numnodes)]
(if-not cur-line
graph
(recur (first rem-line)
(rest rem-line)
(add-edge graph (lineToEdge cur-line))))))))
Might give you something closer to what you're after.
Thinking about it a little more, you could try using reduce, so:
(defn readGraph
[filename numnodes]
(with-open [rdr (io/reader filename)]
(reduce add-edge (cons (empty-graph numnodes)
(doall (line-seq rdr))))))
Reduce will go through a sequence, applying the function you pass in to the first two arguments, then passing in the result of that as the first argument to the next call. The cons is there, so we can be sure an empty graph is the first argument that is passed in.

You could easily find an answer to your question in Clojure documentation.
You could find complete documentation for all core functions on clojuredocs.org website, or you could simply run (doc <function name>) in your Clojure REPL.
Here is what doseq function documentation says:
=> (doc doseq)
(doc doseq)
-------------------------
clojure.core/doseq
([seq-exprs & body])
Macro
Repeatedly executes body (presumably for side-effects) with
bindings and filtering as provided by "for". Does not retain
the head of the sequence. Returns nil.
In other words, in always returns nil. So, the only way you could use it is to cause some side-effects (e.g. repeatedly print something to your console).
And here is what for function documentation says:
=> (doc for)
(doc for)
-------------------------
clojure.core/for
([seq-exprs body-expr])
Macro
List comprehension. Takes a vector of one or more
binding-form/collection-expr pairs, each followed by zero or more
modifiers, and yields a lazy sequence of evaluations of expr.
Collections are iterated in a nested fashion, rightmost fastest,
and nested coll-exprs can refer to bindings created in prior
binding-forms. Supported modifiers are: :let [binding-form expr ...],
:while test, :when test.
(take 100 (for [x (range 100000000) y (range 1000000) :while (< y x)] [x y]))
So, for function produces a lazy sequence which you could bind to some variable and use later in your code.
Note that produced sequence is lazy. It means that elements of this sequence will not be computed until you'll try to use (or print) them. For example, the following function:
(defn noop []
(for [i (range 10)]
(println i))
nil)
won't print anything, since for loop result is not used and thus not computed. You could force evaluation of a lazy sequence using doall function.

use 'for' inside 'let' return a list of hash-map

Sorry for the bad title 'cause I don't know how to describe in 10 words. Here's the detail:
I'd like to loop a file in format like:
a:1 b:2...
I want to loop each line, collect all 'k:v' into a hash-map.
{ a 1, b 2...}
I initialize a hash-map in a 'let' form, then loop all lines with 'for' inside let form.
In each loop step, I use 'assoc' to update the original hash-map.
(let [myhash {}]
(for [line #{"A:1 B:2" "C:3 D:4"}
:let [pairs (clojure.string/split line #"\s")]]
(for [[k v] (map #(clojure.string/split %1 #":") pairs)]
(assoc myhash k (Float. v)))))
But in the end I got a lazy-seq of hash-map, like this:
{ {a 1, b 2...} {x 98 y 99 z 100 ...} }
I know how to 'merge' the result now, but still don't understand why 'for' inside 'let' return
a list of result.
What I'm confused is: does the 'myhash' in the inner 'for' refers to the 'myhash' declared in the 'let' form every time? If I do want a list of hash-map like the output, is this the idiomatic way in Clojure ?

Clojure "for" is a list comprehension, so it creates list. It is NOT a for loop.
Also, you seem to be trying to modify the myhash, but Clojure's datastructures are immutable.
The way I would approach the problem is to try to create a list of pair like (["a" 1] ["b" 2] ..) and the use the (into {} the-list-of-pairs)

If the file format is really as simple as you're describing, then something much more simple should suffice:
(apply hash-map (re-seq #"\w+" (slurp "your-file.txt")))
I think it's more readable if you use the ->> threading macro:
(->> "your-file.txt" slurp (re-seq #"\w+") (apply hash-map))
The slurp function reads an entire file into a string. The re-seq function will just return a sequence of all the words in your file (basically the same as splitting on spaces and colons in this case). Now you have a sequence of alternating key-value pairs, which is exactly what hash-map expects...
I know this doesn't really answer your question, but you did ask about more idiomatic solutions.
I think #dAni is right, and you're confused about some fundamental concepts of Clojure (e.g. the immutable collections). I'd recommend working through some of the exercises on 4Clojure as a fun way to get more familiar with the language. Each time you solve a problem, you can compare your own solution to others' solutions and see other (possibly more idomatic) ways to solve the problem.
Sorry, I didn't read your code very thorougly last night when I was posting my answer. I just realized you actually convert the values to Floats. Here are a few options.
1) partition the sequence of inputs into key/val pairs so that you can map over it. Since you now how a sequence of pairs, you can use into to add them all to a map.
(->> "kvs.txt" slurp (re-seq #"\w") (partition 2)
(map (fn [[k v]] [k (Float. v)])) (into {}))
2) Declare an auxiliary map-values function for maps and use that on the result:
(defn map-values [m f]
(into {} (for [[k v] m] [k (f v)])))
(->> "your-file.txt" slurp (re-seq #"\w+")
(apply hash-map) (map-values #(Float. %)))
3) If you don't mind having symbol keys instead of strings, you can safely use the Clojure reader to convert all your keys and values.
(->> "your-file.txt" slurp (re-seq #"\w+")
(map read-string) (apply hash-map))
Note that this is a safe use of read-string because our call to re-seq would filter out any hazardous input. However, this will give you longs instead of floats since numbers like 1 are long integers in Clojure

Does the myhash in the inner for refer to the myhash declared in the let form every time?
Yes.
The let binds myhash to {}, and it is never rebound. myhash is always {}.
assoc returns a modified map, but does not alter myhash.
So the code can be reduced to
(for [line ["A:1 B:2" "C:3 D:4"]
:let [pairs (clojure.string/split line #"\s")]]
(for [[k v] (map #(clojure.string/split %1 #":") pairs)]
(assoc {} k (Float. v))))
... which produces the same result:
(({"A" 1.0} {"B" 2.0}) ({"C" 3.0} {"D" 4.0}))
If I do want a list of hash-map like the output, is this the idiomatic way in Clojure?
No.
See #DaoWen's answer.

Clojure apply vs map

I have a sequence (foundApps) returned from a function and I want to map a function to all it's elements. For some reason, apply and count work for the sequnece but map doesn't:
(apply println foundApps)
(map println rest foundApps)
(map (fn [app] (println app)) foundApps)
(println (str "Found " (count foundApps) " apps to delete"))))
Prints:
{:description another descr, :title apptwo, :owner jim, :appstoreid 1235, :kind App, :key #<Key App(2)>} {:description another descr, :title apptwo, :owner jim, :appstoreid 1235, :kind App, :key #<Key App(4)>}
Found 2 apps to delete for id 1235
So apply seems to happily work for the sequence, but map doesn't. Where am I being stupid?

I have a simple explanation which this post is lacking. Let's imagine an abstract function F and a vector. So,
(apply F [1 2 3 4 5])
translates to
(F 1 2 3 4 5)
which means that F has to be at best case variadic.
While
(map F [1 2 3 4 5])
translates to
[(F 1) (F 2) (F 3) (F 4) (F 5)]
which means that F has to be single-variable, or at least behave this way.
There are some nuances about types, since map actually returns a lazy sequence instead of vector. But for the sake of simplicity, I hope it's pardonable.

Most likely you're being hit by map's laziness. (map produces a lazy sequence which is only realised when some code actually uses its elements. And even then the realisation happens in chunks, so that you have to walk the whole sequence to make sure it all got realised.) Try wrapping the map expression in a dorun:
(dorun (map println foundApps))
Also, since you're doing it just for the side effects, it might be cleaner to use doseq instead:
(doseq [fa foundApps]
(println fa))
Note that (map println foundApps) should work just fine at the REPL; I'm assuming you've extracted it from somewhere in your code where it's not being forced. There's no such difference with doseq which is strict (i.e. not lazy) and will walk its argument sequences for you under any circumstances. Also note that doseq returns nil as its value; it's only good for side-effects. Finally I've skipped the rest from your code; you might have meant (rest foundApps) (unless it's just a typo).
Also note that (apply println foundApps) will print all the foundApps on one line, whereas (dorun (map println foundApps)) will print each member of foundApps on its own line.

A little explanation might help. In general you use apply to splat a sequence of elements into a set of arguments to a function. So applying a function to some arguments just means passing them in as arguments to the function, in a single function call.
The map function will do what you want, create a new seq by plugging each element of the input into a function and then storing the output. It does it lazily though, so the values will only be computed when you actually iterate over the list. To force this you can use the (doall my-seq) function, but most of the time you won't need to do that.
If you need to perform an operation immediately because it has side effects, like printing or saving to a database or something, then you typically use doseq.
So to append "foo" to all of your apps (assuming they are strings):
(map (fn [app] (str app "foo")) found-apps)
or using the shorhand for an anonymous function:
(map #(str % "foo") found-apps)
Doing the same but printing immediately can be done with either of these:
(doall (map #(println %) found-apps))
(doseq [app found-apps] (println app))

Clojure Parse String

I have the following string
layout: default
title: Envy Labs
What i am trying to do is create map from it
layout->default
title->"envy labs"
Is this possible to do using sequence functions or do i have to loop through each line?
Trying to get a regex to work with and failing using.
(apply hash-map (re-split #": " meta-info))

user> (let [x "layout: default\ntitle: Envy Labs"]
(reduce (fn [h [_ k v]] (assoc h k v))
{}
(re-seq #"([^:]+): (.+)(\n|$)" x)))
{"title" "Envy Labs", "layout" "default"}

The _ is a variable name used to indicate that you don't care about the value of the variable (in this case, the whole matched string).

I'd recommend using clojure-contrib/duck-streams/read-lines to process the lines then split the fields from there. I find this method is usually more robust to errors in the file.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Beginner in clojure: Tokenizing lists of different characters - regex

Related

Conditional "assignment" in functional programming

What is the difference between this `doseq` statement and a `for` statement; reading file in clojure?

use 'for' inside 'let' return a list of hash-map

Clojure apply vs map

Clojure Parse String

Categories

Resources