Is there a Clojure module equivalent to Python's lxml?

I apologize for asking a second question on the same topic, but I'm confused. Is there a Clojure module that follows lxml, even loosely, or documentation on how to walk through an XML file using Clojure?
In Python, I can open an XML file using the lxml module, parse my way through the data, look for tags like <DeviceID>, <TamperName>, and <ScheduledDateTime>, and then perform an action based on the value of one of those tags.
In Clojure, I have been given excellent answers on how to parse using data.xml and then further reduce the data.xml-parsed information by pulling out the :content tag's vals and putting the information in a tree-seq.
However, even that resultant data has other map tags embedded, which obviously do not respond to keys and vals functions.
I could take this data and use regular expression searches, but I feel I'm missing something much simpler.
The data right out of data.xml/parse (calling ret-xml-data) looks like this, using various (first parsed-xml) and other commands at the REPL:
[:tag :TamperExport]
[:attrs {}]
:content
#clojure.data.xml.Element{:tag :Header, :attrs {}, :content
(#clojure.data.xml.Element{:tag :ExportType, :attrs {},
:content ("Tamper Export")}
#clojure.data.xml.Element{:tag :CurrentDateTime,
:attrs {},
:content ("2012-06-26T15:40:22.063")} :attrs {},
:content ("{06643D9B-DCD3-459B-86A6-D21B20A03576}")}
Here is the Clojure code I have so far:
(require '[clojure.data.xml :as xmld])

(defn ret-xml-data
  "Returns the supplied xml file as parsed by data.xml/parse (an Element
  record, which is also a map), or nil if the file cannot be opened."
  [xml-fnam]
  (let [input-xml (try
                    (java.io.FileInputStream. xml-fnam)
                    (catch Exception _ nil))]
    (when input-xml
      (xmld/parse input-xml))))
(defn gen-xml-content-tree
  "Returns a tree-seq with :content extracted."
  [parsed-xml]
  (map :content (first (tree-seq :content :content (:content parsed-xml)))))
I think I may have found a repeatable pattern to the data that will allow me to parse this without creating a hodgepodge:
xml-lib.core=> (first (second cl1))
#clojure.data.xml.Element{:tag :DeviceId, :attrs {}, :content ("80580608")}
xml-lib.core=> (keys (first (second cl1)))
(:tag :attrs :content)
xml-lib.core=> (vals (first (second cl1)))
(:DeviceId {} ("80580608"))
Thank you as always.
Edit:
Added some more testing.
If I walked the tree-seq structure with a function like doseq, the resulting data could probably now be parsed and acted on.

First, it's hard to tell exactly what you're trying to do. When working on a programming problem, it helps both you and others helping you to have a "small case" you can present and solve before working towards a larger one.
From what it sounds like, you're trying to pull the content out of certain elements and perform actions based on that content.
I put together a small XML file with some simple content to try things out on:
<root>
  <someele>
    <item1>data</item1>
    <deeper>
      <item2>else</item2>
    </deeper>
  </someele>
</root>
I designed it to be what I think is representative of some of the core challenges with the problem at hand - in particular, being able to do stuff at arbitrary levels of nesting in the XML.
Looking at the wonderful Clojure Cheatsheet, I found xml-seq and tried running it on the XML returned by clojure.data.xml/parse. The sequence walks each element and then its children, depth-first, making it easy to iterate over the whole document.
To pick out and work with particular items in a sequence, I like using for with :when. :when makes the body run only when its condition is true. I also make use of "set as a function" semantics: calling a set with a value returns that value if it is in the set, and nil otherwise.
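;; load-xml: a small helper (not shown) that returns the sample file
;; parsed with clojure.data.xml/parse
(for [ele (xml-seq (load-xml))
      :when (#{:item1 :item2} (:tag ele))]
  [(:tag ele) (first (:content ele))])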
This gets back a sequence of ([:item1 "data"] [:item2 "else"]) that can then easily be acted on in other ways.
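A self-contained sketch of the same idea, so it can be pasted straight into a REPL. It swaps in clojure.xml/parse from Clojure core (no extra dependency) for clojure.data.xml/parse, and inlines the sample document as a string; sample-xml and load-sample are hypothetical names. Both parsers produce elements with :tag, :attrs and :content keys, so xml-seq and the for comprehension behave the same way. The final doseq shows one way to "perform an action based on the value", dispatching on the tag with case.

```clojure
(require '[clojure.xml :as xml])

;; Hypothetical inline copy of the sample document above.
(def sample-xml
  "<root><someele><item1>data</item1><deeper><item2>else</item2></deeper></someele></root>")

(defn load-sample
  "Parses an XML string into nested {:tag :attrs :content} maps."
  [^String s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

;; xml-seq walks the tree depth-first; :when keeps only the tags in the set.
(def found
  (for [ele (xml-seq (load-sample sample-xml))
        :when (#{:item1 :item2} (:tag ele))]
    [(:tag ele) (first (:content ele))]))

;; Acting on each match by dispatching on its tag.
(doseq [[tag value] found]
  (case tag
    :item1 (println "item1 holds" value)
    :item2 (println "item2 holds" value)))
```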
One key thing to keep in mind about Clojure is that you tend not to need any special API to do stuff: the core language makes it easy to do most, if not all, of what you need. Records (which are what you see being returned), for example, are also maps, so assoc, dissoc, and so on work on them, and that is how they are intended to be worked with.
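To illustrate that records are maps, here is a sketch using a hypothetical Element record with the same three fields as clojure.data.xml's element (the real library record is not required for this). keys, vals, assoc and dissoc all work directly on it.

```clojure
;; Hypothetical stand-in with the same fields as clojure.data.xml's Element record.
(defrecord Element [tag attrs content])

(def device (->Element :DeviceId {} ["80580608"]))

;; Ordinary map functions work on records.
(def ks (keys device))                               ; declared fields, in order
(def vs (vals device))
(def tagged (assoc device :attrs {:source "meter"})) ; still an Element
(def plain (dissoc device :attrs))                   ; dissoc of a declared field gives a plain map
```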
If this doesn't help you get to what you need, then could you provide a small sample output and a sample result that you want?

The closest Clojure library I can think of for lxml after a (very) brief look is called Enlive. It's listed as an HTML templating tool, but I'm pretty sure the techniques it uses for picking out HTML elements can also be applied to XML.
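I have not checked Enlive's XML handling here, so as a hedged, dependency-free alternative: Clojure core ships clojure.zip, whose xml-zip gives a navigable cursor over the same {:tag :attrs :content} maps, loosely similar to walking an lxml tree. texts-by-tag is a hypothetical helper, and the document below is made-up sample data using tag names from the question.

```clojure
(require '[clojure.xml :as xml]
         '[clojure.zip :as zip])

(defn parse-str
  "Parses an XML string into nested {:tag :attrs :content} maps."
  [^String s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn texts-by-tag
  "Visits every node with a zipper and returns the first text child
  of each element whose tag equals `tag` (hypothetical helper)."
  [root tag]
  (->> (zip/xml-zip root)
       (iterate zip/next)
       (take-while (complement zip/end?))
       (map zip/node)
       (filter #(and (map? %) (= tag (:tag %))))
       (map (comp first :content))))

;; Made-up sample document.
(def doc
  (parse-str
   "<TamperExport><Tamper><DeviceId>80580608</DeviceId><TamperName>example</TamperName></Tamper></TamperExport>"))
```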

Related

Why is the ^ character used in this ClojureScript snippet?

In the ClojureScript re-frame todomvc application, we find the following snippet in the todomvc.views namespace.
(defn todo-list
  [visible-todos]
  [:ul.todo-list
   (for [todo @visible-todos]
     ^{:key (:id todo)} [todo-item todo])])
Although I have read the Clojure chapter on metadata, I don't quite understand the purpose of ^{:key (:id todo)} in the snippet above. Please explain.

The :key is what React needs when you render many items, so that each one can be identified uniquely within the group. Leaving it out does not stop the list from rendering (React logs a warning in development builds), so with recent versions of re-frame / Reagent you can try without the :key metadata, but keys are still recommended for list items.
This metadata is equivalent to placing :key in the props map passed to the component. So, for example, what you have is equivalent to:
[todo-item {:key (:id todo)} todo]
The metadata approach is a convenience, which in some cases is easier than passing :key in a props map as the component's first argument.

^{:key (:id todo)} [todo-item todo] would be equivalent to (with-meta [todo-item todo] {:key (:id todo)}), see https://clojuredocs.org/clojure.core/with-meta
Reagent uses this to generate the corresponding React component with a key. Keys help React identify which items have changed, been added, or been removed. Here is the explanation: https://reactjs.org/docs/lists-and-keys.html
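The equivalence between reader metadata and with-meta can be checked in plain Clojure (the reader behaves the same way in ClojureScript). A sketch with a hypothetical todo map and todo-item function standing in for the re-frame ones; note that the metadata map on a vector literal is evaluated, so (:id todo) becomes 42.

```clojure
;; Hypothetical stand-ins for the re-frame example.
(def todo {:id 42 :title "Write docs"})
(defn todo-item [t] [:li (:title t)])

;; Reader metadata on a vector literal: the map is evaluated and attached.
(def via-reader ^{:key (:id todo)} [todo-item todo])

;; The explicit equivalent.
(def via-with-meta (with-meta [todo-item todo] {:key (:id todo)}))
```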

How to access clojure reagent atom map variable?

I am new to Clojure and Reagent. How do I print the :first value inside the :contacts vector stored in this atom?
(def app-state
(r/atom
{:contacts [{:first "Ben" :last "Lem" :middle "Ab"}]}))

First of all: the reagent tutorial is a really good place to start. It even gives you examples to solve exactly this problem.
Since Reagent's atom can be treated just like a regular ClojureScript atom, you can use all your normal sequence operations. Keep in mind that to access the current value, you have to dereference the atom via @. If you really just want to access the first :first in your atom:
(:first (first (:contacts @app-state))) or (get (first (get @app-state :contacts)) :first)
Or, if you think it's more readable:
(-> @app-state
    :contacts
    first
    :first)
I guess what you might want to do is define a few functions to make access easier, such as:
(defn get-contacts!
"Returns the current vector of contacts stored in the app-state."
[]
(:contacts @app-state))
(defn get-first-names!
"Return a vector of all first names in the current list of contacts in the
app-state."
[]
(mapv :first (get-contacts!)))
Please keep in mind that in Reagent (and in general, really) you want to dereference the atom as few times as possible, so look for a good place to dereference it and then use regular functions that operate on a plain sequence instead of on the atom.
Still, I would really suggest you go read the aforementioned reagent tutorial.

Here is a concise way to access the value that you are looking for using Clojure's (get-in m ks) function:
(get-in @app-state [:contacts 0 :first])

Just as an extra, you may often see this written as
(->> @app-state
     :contacts
     (mapv :first)
     first)
and it's useful to understand what's going on here.
->> is a macro called thread-last which will re-write the code above to be
(first (mapv :first (:contacts @app-state)))
Thread last is a bit weird at first but it makes the code more readable when lots of things are going on. I suggest that on top of the reagent tutorial mentioned in the other comments, you read this.
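A quick way to convince yourself that these access styles agree, as a sketch in plain Clojure: a core atom stands in for Reagent's r/atom (the lookups are identical once the atom is dereferenced with @).

```clojure
;; A plain Clojure atom standing in for reagent.core/atom.
(def app-state
  (atom {:contacts [{:first "Ben" :last "Lem" :middle "Ab"}]}))

;; Nested keyword lookups, read from the inside out.
(def by-keywords (:first (first (:contacts @app-state))))

;; Path lookup; 0 indexes into the :contacts vector.
(def by-get-in (get-in @app-state [:contacts 0 :first]))

;; Thread-first: each result becomes the first argument of the next step.
(def by-thread-first (-> @app-state :contacts first :first))

;; Thread-last: collect every first name, then take the first one.
(def by-thread-last (->> @app-state :contacts (mapv :first) first))
```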

@app-state gives you whatever is inside the r/atom, (:first (first (:contacts @app-state))) returns the :first value of the first contact, and (println (:first (first (:contacts @app-state)))) prints it to the browser console (so you need to have the developer tools console open to see it).
Note that for println to output to the browser developer tools console you need to have this line in your code:
(enable-console-print!)

What is the "Correct" way to write an EDN file in Clojure as of August 2013?

I would like to write out an EDN data file from Clojure as tagged literals. Although the clojure.edn API contains read and read-string, there are no writers. I'm familiar with the issue reported here. Based on that, it's my understanding that the pr and pr-str functions are what are meant to be used today.
I wanted to check with the StackOverflow community to see if something like the following would be considered the "correct" way to write out an EDN file:
(spit "friends.edn" (apply str
(interpose "\n\n"
[(pr-str (symbol "#address-book/person") {:name "Janet Wood"})
(pr-str (symbol "#address-book/person") {:name "Jack Tripper"})
(pr-str (symbol "#address-book/person") {:name "Chrissy Snow"})])))
If you are using EDN in production, how do you write out an EDN file? Similar to the above? Are there any issues I need to look out for?
Update
The Clojure Cookbook entry, "Emitting Records as EDN Values" contains a more thorough explanation of this issue and ways to handle it that result in valid EDN tags.
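As an aside for later Clojure versions (tagged-literal was added in Clojure 1.7, after this question was asked), a sketch that emits the same #address-book/person tags without building a symbol containing "#", and reads them back with clojure.edn by supplying a reader function for the tag. Using identity as that reader is just an assumption for illustration; the forms are wrapped in [] so that a single edn/read-string call sees all of them.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.string :as str])

(def friends [{:name "Janet Wood"} {:name "Jack Tripper"} {:name "Chrissy Snow"}])

;; tagged-literal builds a value that prints as `#tag form`.
(def edn-text
  (str/join "\n\n"
            (map #(pr-str (tagged-literal 'address-book/person %)) friends)))
;; (spit "friends.edn" edn-text) would write it out as in the question.

;; Reading back: supply a reader function for the tag (identity keeps the map).
(def round-trip
  (edn/read-string {:readers {'address-book/person identity}}
                   (str "[" edn-text "]")))
```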

You should not need to generate the tags manually. If you use any of Clojure's type definition mechanisms, the printer will create them for you; defrecord is particularly convenient for this.
(ns address-book)
(defrecord person [name])
(def people [(person. "Janet Wood")
(person. "Jack Tripper")
(person. "Chrissy Snow")])
address-book> (pr-str people)
"[#address_book.person{:name \"Janet Wood\"}
#address_book.person{:name \"Jack Tripper\"}
#address_book.person{:name \"Chrissy Snow\"}]"
If you want them formatted more nicely, you can combine with-out-str and clojure.pprint/pprint. Using Clojure types to create the tags also lets clojure.core/read-string read those tags back for free.
address-book> (read-string (pr-str people))
[#address_book.person{:name "Janet Wood"}
#address_book.person{:name "Jack Tripper"}
#address_book.person{:name "Chrissy Snow"}]
address-book> (def read-people (read-string (pr-str people)))
#'address-book/read-people
address-book> (type (first read-people))
address_book.person
The only downside I see is that you lose some control over how the tags look if your namespace contains hyphens: Java class names can't contain them, so they get converted to underscores.
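Note that the free reading above uses clojure.core/read-string; clojure.edn does not construct records on its own. A sketch of reading the same records back with clojure.edn, assuming you map the record's tag (the munged class name) to the map->person constructor that defrecord generates:

```clojure
(ns address-book
  (:require [clojure.edn :as edn]))

(defrecord person [name])

(def people [(person. "Janet Wood")
             (person. "Jack Tripper")
             (person. "Chrissy Snow")])

(def text (pr-str people))

;; The tag printed for the record is address_book.person; point it
;; at the generated map->person factory.
(def edn-readers {'address_book.person map->person})

(def read-back (edn/read-string {:readers edn-readers} text))
```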

How can I use data-readers with edn?

I tried to follow the documentation for clojure.instant/read-instant-timestamp, which reads:
clojure.instant/read-instant-timestamp
To read an instant as a java.sql.Timestamp, bind *data-readers* to a
map with this var as the value for the 'inst key. Timestamp preserves
fractional seconds with nanosecond precision. The timezone offset will
be used to convert into UTC.
The following result was unexpected:
(do
(require '[clojure.edn :as edn])
(require '[clojure.instant :refer [read-instant-timestamp]])
(let [instant "#inst \"1970-01-01T00:00:09.999-00:00\""
reader-map {'inst #'read-instant-timestamp}]
;; This binding is not appearing to do anything.
(binding [*data-readers* reader-map]
;; prints java.util.Date -- unexpected
(->> instant edn/read-string class println)
;; prints java.sql.Timestamp -- as desired
(->> instant (edn/read-string {:readers reader-map}) class println))))
How can I use the *data-readers* binding? Clojure version 1.5.1.

clojure.edn functions by default only use data readers stored in clojure.core/default-data-readers which, as of Clojure 1.5.1, provides readers for instant and UUID literals. If you want to use custom readers, you can do that by passing in a :readers option; in particular, you can pass in *data-readers*. This is documented in the docstring for clojure.edn/read (the docstring for clojure.edn/read-string refers to that for read).
Here are some examples:
(require '[clojure.edn :as edn])
;; instant literals work out of the box:
(edn/read-string "#inst \"2013-06-08T01:00:00Z\"")
;= #inst "2013-06-08T01:00:00.000-00:00"
;; custom literals can be passed in in the opts map:
(edn/read-string {:readers {'foo identity}} "#foo :asdf")
;= :asdf
;; use current binding of *data-readers*
(edn/read-string {:readers *data-readers*} "...")
(The following section added in response to comments made by Richard Möhn in this GitHub issue's comment thread. The immediate question there is whether it is appropriate for a reader function to call eval on the data passed in. I am not affiliated with the project in question; please see the ticket for details, as well as Richard's comments on the present answer.)
It is worth adding that *data-readers* is implicitly populated from any data_readers.{clj,cljc} files that Clojure finds at the root of the classpath at startup time. This can be convenient (it allows one to use custom tagged literals in Clojure source code and at the REPL), but it does mean that new data readers may appear in there with a change to a single dependency. Using an explicitly constructed reader map with clojure.edn is a simple way to avoid surprises (which could be particularly nasty when dealing with untrusted input).
(Note that the implicit loading process does not result in any code being loaded immediately, or even when a tag mentioned in *data-readers* is first encountered; the process which populates *data-readers* creates empty namespaces with unbound Vars as placeholders, and to actually use those readers one still has to require the relevant namespaces in user code.)
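A sketch of the explicitly constructed reader map suggested above, for reading untrusted input. It also uses clojure.edn's :default option, which is called with the tag and value when no reader matches, so unknown tags come back as plain data instead of throwing. safe-readers and read-untrusted are hypothetical names, and the {:unknown-tag ... :value ...} shape is just one possible choice.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.instant :refer [read-instant-timestamp]])

;; A closed set of readers, independent of any data_readers files
;; that happen to be on the classpath.
(def safe-readers {'inst read-instant-timestamp})

(defn read-untrusted
  "Reads one EDN value using only safe-readers; unknown tags are
  returned as plain data via :default instead of throwing."
  [s]
  (edn/read-string {:readers safe-readers
                    :default (fn [tag value] {:unknown-tag tag :value value})}
                   s))

(def ts (read-untrusted "#inst \"1970-01-01T00:00:09.999-00:00\""))
(def unknown (read-untrusted "#my/thing [1 2]"))
```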

The *data-readers* dynamic var seems to apply to the read-string and read functions from clojure.core only.
(require '[clojure.instant :refer [read-instant-timestamp]])
(let [instant "#inst \"1970-01-01T00:00:09.999-00:00\""
reader-map {'inst #'read-instant-timestamp}]
;; This will read a java.util.Date
(->> instant read-string class println)
;; This will read a java.sql.Timestamp
(binding [*data-readers* reader-map]
(->> instant read-string class println)))
Browsing through the source code for the clojure.edn reader, I couldn't find anything indicating that the same *data-readers* var is used there at all.
clojure.core's functions read and read-string use LispReader (which uses the value from *data-readers*), while the functions from clojure.edn use the EdnReader.
This edn library is relatively new in Clojure so that might be the reason why the documentation string is not specific enough regarding edn vs. core reader, which can cause this kind of confusion.
Hope it helps.

Generating Clojure code with macro containing type hints

I'm trying to generate some Clojure code with type hints; however, the type hints seem to disappear whenever I build some code (they also don't take effect when the code is compiled).
e.g.
`(let [^BufferedImage b (create-buffered-image)]
(.getRGB b 0 0))
=> (clojure.core/let [user/b (user/create-buffered-image)] (.getRGB user/b 0 0))
I'm not sure precisely why the type hint is disappearing, but I assume it has something to do with how metadata is handled by the reader.
What's the right way to create correct type hints in generated code?

There are two answers to this question. To answer your specific question: in the actual code you just posted, nothing is wrong: it's working just fine. (set! *print-meta* true) and try again, and you'll see the metadata annotation. It just doesn't normally print.
But, in general this is not the right way to do things from a macro, and you will have trouble before long. Remember, you don't want metadata on the forms the macro evaluates, you want metadata on the forms the macro emits. So, a more accurate solution is to use with-meta on the symbols or forms that you want to attach metadata to - or, if they're user-supplied forms, you should usually use vary-meta so that you don't discard the metadata they added explicitly. For example,
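(import 'java.awt.image.BufferedImage)
(defmacro with-image [name & body]
  ;; vary-meta keeps any metadata the caller already attached to name;
  ;; create-buffered-image is assumed to be defined in this namespace
  (let [tagged-name (vary-meta name assoc :tag `BufferedImage)]
    `(let [~tagged-name (create-buffered-image)]
       ~@body)))
(with-image i (.getRGB i 0 0))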
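To confirm that the hint actually lands on the emitted binding, one can macroexpand and inspect the symbol's metadata, or print the expansion with *print-meta* bound to true as suggested above. A sketch, assuming a hypothetical create-buffered-image that returns a 1x1 java.awt.image.BufferedImage:

```clojure
(ns hint-demo
  (:require [clojure.string :as str])
  (:import (java.awt.image BufferedImage)))

(defn create-buffered-image
  "Hypothetical helper: returns a 1x1 RGB image."
  []
  (BufferedImage. 1 1 BufferedImage/TYPE_INT_RGB))

(defmacro with-image [name & body]
  ;; Syntax-quoting the imported class yields the fully qualified
  ;; symbol java.awt.image.BufferedImage, a valid :tag value.
  (let [tagged-name (vary-meta name assoc :tag `BufferedImage)]
    `(let [~tagged-name (create-buffered-image)]
       ~@body)))

(def expansion (macroexpand-1 '(with-image img (.getRGB img 0 0))))

;; The hinted symbol is the first element of the let's binding vector.
(def hint (:tag (meta (first (second expansion)))))

;; With *print-meta* bound, the hint is printed inline as ^Class sym.
(def printed (binding [*print-meta* true] (pr-str expansion)))
```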

We Keep Coding
