edn/read does not maintain the order of the sequence - clojure

I am using a Clojure library to read an EDN file.
My data.edn file looks like this:
{:mysites {:locations ["priorityLoc1" "priorityLoc2" "priorityLoc3"]
:back-up ["backup1uri" "backup2uri"]}
}
When I use edn/read, the map that gets returned looks like this:
{:mysites {:locations ["priorityLoc2" "priorityLoc1" "priorityLoc3"]
:back-up ["backup1uri" "backup2uri"]}
}
As you can see in the locations sequence above, the positions of the values priorityLoc2 and priorityLoc1 have changed.
This causes a problem for applications that treat the order of the site locations as their priority.
I am not sure why the order gets changed during edn/read. Is there a way I can make sure the order of the sequence is not changed?
I tried edn/read as shown on the documentation page for edn/read:
https://clojuredocs.org/clojure.edn/read
(defn load-edn
  "Load edn from an io/reader source (filename or io/resource)."
  [source]
  (try
    (with-open [r (io/reader source)]
      (edn/read (java.io.PushbackReader. r)))
    (catch java.io.IOException e
      (printf "Couldn't open '%s': %s\n" source (.getMessage e)))
    (catch RuntimeException e
      (printf "Error parsing edn file '%s': %s\n" source (.getMessage e)))))
UPDATE:
I can see that edn/read changes the order when I pass it my data.edn file. I am trying to replicate the issue with a smaller data file, as I can't share my data.edn file here. With the smaller data file the issue is not reproducible.
So I am not sure whether I can assume that edn/read is not changing the order. Is there any chance that edn attempts a sort (such as sorting by name) that causes the order to change?
Is there a way to make sure the order does not change?
Below is the code for the EDN reader.
(require '[clojure.java.io :as io])
(require '[clojure.tools.reader.edn :as edn])
(require '[clojure.tools.reader.reader-types :as readers])
(defn meine-edn-reader
  "Load edn from an io/reader source (filename or io/resource)."
  [source]
  (try
    (with-open [reader (io/reader source)]
      (-> reader
          readers/push-back-reader
          readers/indexing-push-back-reader
          edn/read))
    (catch java.io.IOException e
      (printf "Couldn't open '%s': %s\n" source (.getMessage e)))))
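For what it's worth, EDN vectors are ordered, and the reader returns them as Clojure vectors in the order written; only maps and sets come without an ordering guarantee. A minimal check along those lines, assuming the standard-library clojure.edn/read-string instead of the tools.reader variant used above, and a sample string that simply mirrors the data in the question:

```clojure
(require '[clojure.edn :as edn])

;; Sample data mirroring the question; the real data.edn is not available here.
(def sample
  "{:mysites {:locations [\"priorityLoc1\" \"priorityLoc2\" \"priorityLoc3\"]
              :back-up [\"backup1uri\" \"backup2uri\"]}}")

(def parsed (edn/read-string sample))

;; Vectors keep the order in which their elements were written.
(def locations (get-in parsed [:mysites :locations]))

(println (vector? locations) locations)
```

If the order still differs with the real file, it may be worth checking whether the value in that file is actually a set (#{...}) rather than a vector, or whether some later step (sort, set, distinct on a set, merge) rearranges it after reading.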

Related

Clojure open a large txt file edit the data and write it to a new file

I'm trying to open a file that is too large to slurp. I then want to edit the file to remove all characters except numbers, and write the data to a new file.
So far I have
(:require [clojure.java.io :as io])
(:require [clojure.string :as str])
:jvm-opts ["-Xmx2G"]
(with-open [rdr (io/reader "/Myfile.txt")
            wrt (io/writer "/Myfile2.txt")]
  (doseq [line (line-seq rdr)]
    (.write wrt (str line "\n"))))
This reads and writes, but I'm unsure of the best way to go about the editing. Any help is much appreciated. I'm very new to the language.
Looks like you just need to modify the line value before writing it. If you want to modify a string to remove all non-numeric characters, a regular expression is a pretty easy route. You could make a function to do this:
(defn numbers-only [s]
(clojure.string/replace s #"[^\d]" ""))
(numbers-only "this is 4 words")
=> "4"
Then use that function in your example:
(str (numbers-only line) "\n")
Alternatively, you could map numbers-only over the output of line-seq, and because both map and line-seq are lazy you'll get the same lazy/on-demand behavior:
(map numbers-only (line-seq rdr))
And then your doseq would stay the same. I would probably opt for this approach as it keeps your "stream" processing together, and your imperative/side-effect loop is only concerned with writing its inputs.
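Putting the pieces together, a minimal sketch of the whole pipeline could look like this. The function name copy-digits! and its two path arguments are made up for illustration; the rest follows the code in the question and the numbers-only function above.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

(defn numbers-only
  "Removes every character that is not a digit."
  [s]
  (string/replace s #"\D" ""))

(defn copy-digits!
  "Hypothetical helper: streams in-path line by line and writes only the
  digits of each line to out-path, so the whole file never sits in memory."
  [in-path out-path]
  (with-open [rdr (io/reader in-path)
              wrt (io/writer out-path)]
    ;; map and line-seq are both lazy; doseq realizes one line at a time
    (doseq [line (map numbers-only (line-seq rdr))]
      (.write wrt (str line "\n")))))
```

Calling (copy-digits! "/Myfile.txt" "/Myfile2.txt") would then do the same as the loop in the question, with the editing step included.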

Read from a file and return output in a vector

I'm just learning clojure and trying to read in a file and do something with the returned vector of results. In this instance I'm just trying to print it out.
Below is the code in question:
(defn read_file
"Read in a file from the resources directory"
[input]
(with-open [rdr (reader input)]
(doseq [line (line-seq rdr)])))
(defn -main []
(println (read_file "resources/input.txt") ))
The println prints "nil". What do I need to do to return the lines?
If the file is not very big, you can use slurp to read the file content as a string, then split it with a specific delimiter (in this case \n).
(defn read-file [f]
(-> (slurp f)
(clojure.string/split-lines)))
doseq returns nil. It's supposed to be used when you're doing stuff in a do fashion on the elements of a sequence, so mostly side-effect stuff.
Try this:
(defn file->vec
  "Read in a file from the resources directory"
  [input]
  (with-open [rdr (reader input)]
    (into [] (line-seq rdr))))
But you shouldn't do this for big files; in those cases you don't want the whole file to sit in memory. For the same reason, slurp is equally bad.
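For big files, one option is to do all the work on the lazy line sequence while the reader is still open, and return only the (small) result. A minimal sketch under that assumption; count-matching-lines and its pred argument are hypothetical names chosen for the example:

```clojure
(require '[clojure.java.io :as io])

(defn count-matching-lines
  "Hypothetical helper: counts the lines of the file at path for which
  pred returns a truthy value, without keeping the lines in memory."
  [path pred]
  (with-open [rdr (io/reader path)]
    ;; reduce consumes the lazy seq inside with-open, one line at a time
    (reduce (fn [n line] (if (pred line) (inc n) n))
            0
            (line-seq rdr))))
```

The important detail is that the sequence is fully consumed before with-open closes the reader; returning the bare (line-seq rdr) would fail once the caller tries to read from the closed stream.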

Zip a file in clojure

I want to zip a file in clojure and I can't find any libraries to do it.
Do you know a good way to zip a file or a folder in Clojure?
Must I use a java library?
There is a stock ZipOutputStream in Java which can be used from Clojure. I don't know whether there is a library somewhere. I use the plain Java functions with a small helper macro:
(import '[java.util.zip ZipEntry ZipOutputStream])
(defmacro ^:private with-entry
  [zip entry-name & body]
  `(let [^ZipOutputStream zip# ~zip]
     (.putNextEntry zip# (ZipEntry. ~entry-name))
     ~@body
     (flush)
     (.closeEntry zip#)))
Obviously every ZIP entry describes a file.
(require '[clojure.java.io :as io])
(with-open [file (io/output-stream "foo.zip")
            zip  (ZipOutputStream. file)
            wrt  (io/writer zip)]
  (binding [*out* wrt]
    (doto zip
      (with-entry "foo.txt"
        (println "foo"))
      (with-entry "bar/baz.txt"
        (println "baz")))))
To zip a file you might want to do something like this:
(with-open [output (ZipOutputStream. (io/output-stream "foo.zip"))
            input  (io/input-stream "foo")]
  (with-entry output "foo"
    (io/copy input output)))
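Since the question also asks about folders, here is a rough sketch of the same ZipOutputStream approach applied to a directory tree. The name zip-dir! and its arguments are hypothetical; the sketch assumes the target zip file lives outside the directory being zipped, and it only adds regular files, not empty directories.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])
(import '[java.util.zip ZipEntry ZipOutputStream])

(defn zip-dir!
  "Hypothetical helper: writes every regular file under dir into the zip
  file at zip-path, using paths relative to dir as entry names."
  [dir zip-path]
  (let [^java.nio.file.Path root (.toPath (io/file dir))]
    (with-open [zip (ZipOutputStream. (io/output-stream zip-path))]
      (doseq [^java.io.File f (file-seq (io/file dir))
              :when (.isFile f)]
        (let [rel        (str (.relativize root (.toPath f)))
              ;; ZIP entry names use "/" regardless of the platform separator
              entry-name (string/replace rel "\\" "/")]
          (.putNextEntry zip (ZipEntry. ^String entry-name))
          (io/copy f zip)
          (.closeEntry zip))))))
```

For example, (zip-dir! "resources" "/tmp/resources.zip") would store resources/a/b.txt under the entry name a/b.txt.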
Compression and decompression can also be delegated to shell commands, which we can run through clojure.java.shell, provided the zip and unzip programs are installed on the machine.
Using the same method, you can run any other compression tool that you would normally use from your terminal.
(use '[clojure.java.shell :only [sh]])
(defn unpack-resources [in out]
(clojure.java.shell/sh
"sh" "-c"
(str " unzip " in " -d " out)))
(defn pack-resources [in out]
(clojure.java.shell/sh
"sh" "-c"
(str " zip " in " -r " out)))
(unpack-resources "/path/to/my/zip/foo.zip"
"/path/to/store/unzipped/files")
(pack-resources "/path/to/store/archive/myZipArchiveName.zip"
"/path/to/my/file/myTextFile.csv")
For gzip, you can use this gist: https://gist.github.com/bpsm/1858654
It's quite interesting.
Or, more precisely, you can use this:
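(import 'java.util.zip.GZIPOutputStream)
(defn gzip
  "Copies input to output, gzip-compressing the bytes on the way."
  [input output & opts]
  (with-open [out (-> output clojure.java.io/output-stream GZIPOutputStream.)
              in  (clojure.java.io/input-stream input)]
    ;; copy raw bytes so that binary files are not altered by character decoding
    (apply clojure.java.io/copy in out opts)))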
You can use rtcritical/clj-ant-tasks library that wraps Apache Ant, and zip with a single command.
Add library dependency [rtcritical/clj-ant-tasks "1.0.1"]
(require '[rtcritical.clj-ant-tasks :refer [run-ant-task]])
To zip a file:
(run-ant-task :zip {:destfile "/tmp/file-zipped.zip"
:basedir "/tmp"
:includes "file-to-zip"})
Note: run-ant-task(s) functions in this library namespace can be used to run any other Apache Ant task(s) as well.
For more information, see https://github.com/rtcritical/clj-ant-tasks

Read csv into a list in clojure

I know there are a lot of related questions. I have read them but still lack a fundamental understanding of how to read, process, and write data. Take the following function, for example, which uses the clojure-csv library to parse a line:
(defn take-csv
"Takes file name and reads data."
[fname]
(with-open [file (reader fname)]
(doseq [line (line-seq file)]
(let [record (parse-csv line)]))))
What I would like to obtain is the data read into some collection as the result of (def data (take-csv "file.csv")), so that I can process it later. So basically my question is: how do I return record, or rather a list of records?
"doseq" is meant for operations with side effects and always returns nil. In your case, to create a collection of records you can use "map":
(defn take-csv
"Takes file name and reads data."
[fname]
(with-open [file (reader fname)]
(doall (map (comp first csv/parse-csv) (line-seq file)))))
It is better to parse the whole file at once, which also reduces the code:
(defn take-csv
"Takes file name and reads data."
[fname]
(with-open [file (reader fname)]
(csv/parse-csv (slurp file))))
You can also use clojure.data.csv instead of clojure-csv.core. You only need to rename parse-csv to read-csv in the previous function. Writing then looks like this:
(defn put-csv [fname table]
(with-open [file (writer fname)]
(csv/write-csv file table)))
With all the things you can do with .csv files, I suggest using clojure-csv or clojure.data.csv. I mostly use clojure-csv to read in a .csv file.
Here are some code snippets from a utility library I use with most of my Clojure programs.
from util.core
(ns util.core
^{:author "Charles M. Norton",
:doc "util is a Clojure utilities directory"}
(:require [clojure.string :as cstr])
(:import java.util.Date)
(:import java.io.File)
(:use clojure-csv.core))
(defn open-file
"Attempts to open a file and complains if the file is not present."
[file-name]
(let [file-data (try
(slurp file-name)
(catch Exception e (println (.getMessage e))))]
file-data))
(defn ret-csv-data
"Returns a lazy sequence generated by parse-csv.
Uses open-file which will return a nil, if
there is an exception in opening fnam.
parse-csv called on non-nil file, and that
data is returned."
[fnam]
(let [csv-file (open-file fnam)
inter-csv-data (if-not (nil? csv-file)
(parse-csv csv-file)
nil)
csv-data
;; pos? must be called on the count; the bare symbol pos? is always truthy
(vec (filter #(and (pos? (count %))
(not (nil? (rest %)))) inter-csv-data))]
(if-not (empty? csv-data)
(pop csv-data)
nil)))
(defn fetch-csv-data
"This function accepts a csv file name, and returns parsed csv data,
or returns nil if file is not present."
[csv-file]
(let [csv-data (ret-csv-data csv-file)]
csv-data))
Once you've read in a .csv file, then what you do with its contents is another matter. Usually, I am taking .csv "reports" from one financial system, like property assessments, and formatting the data to be uploaded into a database of another financial system, like billing.
I will often either zipmap each .csv row so I can extract data by column name (having read in the column names), or even make a sequence of zipmap'ped .csv rows.
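To illustrate the zipmap idea, here is a small sketch. It assumes the parsed data is a sequence of rows whose first row holds the column names; rows->maps is a made-up name, not part of the utility library above.

```clojure
(defn rows->maps
  "Hypothetical helper: turns parsed CSV rows (header row first) into a
  sequence of maps keyed by column name."
  [[header & rows]]
  (let [ks (map keyword header)]
    (map #(zipmap ks %) rows)))

(def sample-rows
  [["id" "amount"]
   ["1" "10.50"]
   ["2" "7.25"]])

(println (rows->maps sample-rows))
```

Each row can then be read with (:amount row) instead of a positional lookup such as (nth row 1).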
Just to add to these good answers, here is a full example.
First, add clojure-csv into your dependencies
(ns scripts.csvreader
(:require [clojure-csv.core :as csv]
[clojure.java.io :as io]))
(defn take-csv
"Takes file name and reads data."
[fname]
(with-open [file (io/reader fname)]
(-> file
(slurp)
(csv/parse-csv))))
usage
(take-csv "/path/youfile.csv")

Convert a CSV value to a Clojure list

What is the best way to turn this line of CSV for column 3 to a Clojure list?
357302041352401, 2012-08-27 19:59:32 -0700, 100, ["SNIA34", "M33KLC", "M34KLC", "W35REK", "SRBT", "MODE", "BFF21S", "CC12", "RCV56V", "NBA1", "RESP", "A0NTC", "PRNK", "WAYS", "HIRE", "BITE", "INGA1", "M32MOR", "TFT99W", "TBF5P", "NA3NR"]
Assuming you can already read the csv file...
You can use read-string in combination with into
user=> (def your_csv_column "[\"SNIA34\", \"M33KLC\", \"M34KLC\"]")
#'user/your_csv_column
user=> (into '() (read-string your_csv_column))
("M34KLC" "M33KLC" "SNIA34")
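Note that into '() conj-es each element onto the front of the list, which is why the result above comes out reversed. If the original order matters, a variation like the following may be preferable. It also swaps in clojure.edn/read-string, which only reads data and never evaluates code, on the assumption that the CSV content is not fully trusted; the var names are made up for the example.

```clojure
(require '[clojure.edn :as edn])

(def csv-column "[\"SNIA34\", \"M33KLC\", \"M34KLC\"]")

;; Commas are whitespace to the reader, so the column reads as a vector.
(def as-vector (edn/read-string csv-column))

;; apply list keeps the element order, unlike (into '() ...).
(def as-list (apply list as-vector))

(println as-vector as-list)
```

In many cases the vector itself is enough, since most sequence functions work on vectors and lists alike.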
You can use Clojure Csv to do that.
Your data is interesting: it appears to include a traditional comma-separated line, followed by data in brackets. I could not quite tell whether the bracketed data was the representation you had in the .csv file or the one you wanted after reading, but either way, this is how I read a .csv file:
My library's project.clj that uses clojure-csv:
(defproject util "1.0.4-SNAPSHOT"
:description "A general purposes Clojure library"
:dependencies [[org.clojure/clojure "1.4.0"]
[clojure-csv/clojure-csv "1.3.2"]]
:aot [util.core]
:omit-source true)
My library's core.clj header:
(ns util.core
^{:author "Charles M. Norton",
:doc "util is a Clojure utilities directory containing things
most Clojure programs need, like cli routines.
Created on April 4, 2012"}
(:require [clojure.string :as cstr])
(:import java.util.Date)
(:import java.io.File)
(:use clojure-csv.core))
My library's function that returns a .csv file parsed as a vector of vectors.
(defn ret-csv-data
"Returns a lazy sequence generated by parse-csv.
Uses open-file which will return a nil, if
there is an exception in opening fnam.
parse-csv called on non-nil file, and that
data is returned."
[fnam]
(let [ csv-file (open-file fnam)
inter-csv-data (if-not (nil? csv-file)
(parse-csv csv-file)
nil)
csv-data (vec (filter #(and (pos? (count %)) (not (nil? (rest %))))
inter-csv-data))]
;removes blank sequence at EOF.
(pop csv-data)))
(defn fetch-csv-data
"This function accepts a csv file name, and returns parsed csv data,
or returns nil if file is not present."
[csv-file]
(let [csv-data (ret-csv-data csv-file)]
csv-data))
What I have found to be very helpful is to avoid using nth (very useful advice from SO and other sources). Given that most of my .csv data comes from database queries, I zipmap the column names to each .csv sequence (row), and then operate on that data by map key. It simplifies things for me.