I have multiple HTML files that need to be combined into a single HTML file. They are common fragments like a header, footer, etc. that are shared across pages. I'm using Enlive's html-resource function, but that function inserts missing html/body tags into the final file, which I don't want.
Here is the output map:
({:tag :html, :attrs nil, :content (
{:tag :head, :attrs nil, :content (
{:tag :meta, :attrs {:content text/html; charset=utf-8, :http-equiv Content-Type}, :content ()}
{:tag :title, :attrs nil, :content (HewaniLife | Changing The Way You Live)}
{:tag :link, :attrs {:href styles/main.css, :rel stylesheet, :type text/css}, :content ()} )}
{:tag :body, :attrs nil, :content (
{:tag :html, :attrs nil, :content ({:tag :body, :attrs nil, :content ({:tag :div, :attrs {:id header}, :content (
{:tag :h1, :attrs nil, :content ({:tag :a, :attrs {:href index.xhtml, :id logo}, :content (
{:tag :span, :attrs {:class img-replace}, :content (hewaniLife)})})}
{:tag :div, :attrs {:id main-nav}, :content (
{:tag :ul, :attrs nil, :content (
{:tag :li, :attrs nil, :content ({:tag :a, :attrs {:href login.xhtml, :id btn-login}, :content (
{:tag :span, :attrs {:class img-replace}, :content (Login)})})}
{:tag :li, :attrs nil, :content ({:tag :a, :attrs {:href index.xhtml, :id btn-home}, :content (
{:tag :span, :attrs {:class img-replace}, :content (Home)})})}
{:tag :li, :attrs nil, :content ({:tag :a, :attrs {:href search.xhtml, :id btn-search}, :content (
{:tag :span, :attrs {:class img-replace}, :content (Search)})})})})}
{:type :comment, :data end of div#main-nav }
{:tag :br, :attrs {:class clear-all}, :content nil})} {:type :comment, :data end of div#header })})})})}
Here you can see that the html and body tags get nested when I insert the files.
Is there any way to insert these files without the extra tags?
Has anybody used any other method for this?
You should use defsnippet instead, and specify which parts are of interest to you.
All your fragments can even reside in a single page; defsnippet will pluck the different fragments out.
html-snippet is mainly intended for playing at the REPL.
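For the defsnippet route, here is a minimal sketch (file names and selectors are assumptions for illustration, not taken from your project):

(require '[net.cgrand.enlive-html :as html])

;; pluck only the fragments of interest out of the shared files
(html/defsnippet header-snip "header.html" [:div#header] [])
(html/defsnippet footer-snip "footer.html" [:div#footer] [])

;; splice them into the page skeleton
(html/deftemplate page "index.html" []
  [:div#header] (html/substitute (header-snip))
  [:div#footer] (html/substitute (footer-snip)))

;; (apply str (page)) then yields the combined HTML as a string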
I have found a function in Enlive named html-snippet. You can use it to combine multiple HTML fragments.
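For instance, a quick REPL sketch (output shape approximate):

(html/html-snippet "<div id=\"header\"><h1>Hi</h1></div>")
;; => ({:tag :div, :attrs {:id "header"},
;;      :content ({:tag :h1, :attrs nil, :content ("Hi")})})

Unlike html-resource, it does not wrap the fragment in html and body tags.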
Say I have a tree like this. I would like to obtain the paths to the nodes that contain only leaves and no non-leaf child nodes.
So for this tree
root
├──leaf123
├──level_a_node1
│ ├──leaf456
├──level_a_node2
│ ├──level_b_node1
│ │ └──leaf987
│ └──level_b_node2
│ └──level_c_node1
│ │ └──leaf654
├──leaf789
└──level_a_node3
└──leaf432
The result would be
[["root" "level_a_node1"]
["root" "level_a_node2" "level_b_node1"]
["root" "level_a_node2" "level_b_node2" "level_c_node1"]
["root" "level_a_node3"]]
I've attempted to go down to the bottom nodes and check whether the (lefts) and the (rights) are not branches, but that doesn't quite work.
(require '[clojure.zip :as z])
(z/vector-zip ["root"
["level_a_node3" ["leaf432"]]
["level_a_node2" ["level_b_node2" ["level_c_node1" ["leaf654"]]] ["level_b_node1" ["leaf987"]] ["leaf789"]]
["level_a_node1" ["leaf456"]]
["leaf123"]])
Edit: my data is actually coming in as a list of paths, and I'm converting that into a tree. But maybe that is an overcomplication?
[["root" "leaf"]
["root" "level_a_node1" "leaf"]
["root" "level_a_node2" "leaf"]
["root" "level_a_node2" "level_b_node1" "leaf"]
["root" "level_a_node2" "level_b_node2" "level_c_node1" "leaf"]
["root" "level_a_node3" "leaf"]]
Hiccup-style structures are a nice place to visit, but I wouldn't want to live there. That is, they're very succinct to write, but a giant pain to manipulate programmatically, because the semantic nesting structure is not reflected in the physical structure of the nodes. So, the first thing I would do is convert to Enlive-style tree representation (or, ideally, generate Enlive to begin with):
(def hiccup
["root"
["level_a_node3" ["leaf432"]]
["level_a_node2"
["level_b_node2"
["level_c_node1"
["leaf654"]]]
["level_b_node1"
["leaf987"]]
["leaf789"]]
["level_a_node1"
["leaf456"]]
["leaf123"]])
(defn hiccup->enlive [x]
(when (vector? x)
{:tag (first x)
:content (map hiccup->enlive (rest x))}))
(def enlive (hiccup->enlive hiccup))
;; Yielding...
{:tag "root",
:content
({:tag "level_a_node3", :content ({:tag "leaf432", :content ()})}
{:tag "level_a_node2",
:content
({:tag "level_b_node2",
:content
({:tag "level_c_node1",
:content ({:tag "leaf654", :content ()})})}
{:tag "level_b_node1", :content ({:tag "leaf987", :content ()})}
{:tag "leaf789", :content ()})}
{:tag "level_a_node1", :content ({:tag "leaf456", :content ()})}
{:tag "leaf123", :content ()})}
Having done this, the last thing getting in your way is your desire to use zippers. They are a good tool for targeted traversals, where you care a lot about the structure near the node you are working on. But if all you care about is the node and its children, it is much easier to just write a simple recursive function to traverse the tree:
(defn paths-to-leaves [{:keys [tag content] :as root}]
(when (seq content)
(if (every? #(empty? (:content %)) content)
[(list tag)]
(for [child content
path (paths-to-leaves child)]
(cons tag path)))))
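Called on the Enlive-style tree built above, this should give exactly the parent paths you were after:

(paths-to-leaves enlive)
;; => (("root" "level_a_node3")
;;     ("root" "level_a_node2" "level_b_node2" "level_c_node1")
;;     ("root" "level_a_node2" "level_b_node1")
;;     ("root" "level_a_node1"))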
The ability to write recursive traversals like this is a skill that will serve you many times throughout your Clojure career (for example, a similar question I recently answered on Code Review). It turns out that a huge number of functions on trees are just: call yourself recursively on each child, and somehow combine the results, usually in a possibly-nested for loop. The hard part is just figuring out what your base case needs to be, and the correct sequence of maps/mapcats to combine the results without introducing undesired levels of nesting.
If you insist on sticking with Hiccup, you can de-mangle it at the use site without too much pain:
(defn hiccup-paths-to-leaves [node]
(when (vector? node)
(let [tag (first node), content (next node)]
(if (and content (every? #(= 1 (count %)) content))
[(list tag)]
(for [child content
path (hiccup-paths-to-leaves child)]
(cons tag path))))))
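Calling it on the original Hiccup value should produce the same paths:

(hiccup-paths-to-leaves hiccup)
;; => (("root" "level_a_node3")
;;     ("root" "level_a_node2" "level_b_node2" "level_c_node1")
;;     ("root" "level_a_node2" "level_b_node1")
;;     ("root" "level_a_node1"))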
But it's noticeably messier, and is work you'll have to repeat every time you work with a tree. Again I encourage you to use Enlive-style trees for your internal data representation.
You can definitely use the file API to navigate the directory. If you're using a zipper, you can do this:
;; assuming the clojure.zip functions are referred in, e.g.
;; (require '[clojure.zip :refer [vector-zip end? next down right path node]])
(loop [loc (vector-zip ["root"
["level_a_node3"
["leaf432"]]
["level_a_node2"
["level_b_node2"
["level_c_node1"
["leaf654"]]]
["level_b_node1"
["leaf987"]]
["leaf789"]]
["level_a_node1"
["leaf456" "leaf456b"]]
["leaf123"]])
ans nil]
(if (end? loc)
ans
(recur (next loc)
(cond->> ans
(contains-leaves-only? loc)
(cons (->> loc down path (map node)))))))
which will output this:
(("root" "level_a_node1")
("root" "level_a_node2" "level_b_node1")
("root" "level_a_node2" "level_b_node2" "level_c_node1")
("root" "level_a_node3"))
With the way you define the tree, the helper functions can be implemented as:
(def is-leaf? #(-> % down nil?))
(defn contains-leaves-only?
[loc]
(some->> loc
down ;; branch name
right ;; children list
down ;; first child
(iterate right) ;; with the other siblings
(take-while identity)
(every? is-leaf?)))
UPDATE: added a lazy-sequence version
(->> ["root"
["level_a_node3"
["leaf432"]]
["level_a_node2"
["level_b_node2"
["level_c_node1"
["leaf654"]]]
["level_b_node1"
["leaf987"]]
["leaf789"]]
["level_a_node1"
["leaf456" "leaf456b"]]
["leaf123"]]
vector-zip
(iterate next)
(take-while (complement end?))
(filter contains-leaves-only?)
(map #(->> % down path (map node))))
It is because zippers have so many limitations that I created the Tupelo Forest library for processing tree-like data structures. Your problem then has a simple solution:
(ns tst.tupelo.forest-examples
(:use tupelo.core tupelo.forest tupelo.test))
(with-forest (new-forest)
(let [data ["root"
["level_a_node3" ["leaf"]]
["level_a_node2"
["level_b_node2"
["level_c_node1"
["leaf"]]]
["level_b_node1" ["leaf"]]]
["level_a_node1" ["leaf"]]
["leaf"]]
root-hid (add-tree-hiccup data)
leaf-paths (find-paths-with root-hid [:** :*] leaf-path?)]
with a tree that looks like:
(hid->bush root-hid) =>
[{:tag "root"}
[{:tag "level_a_node3"}
[{:tag "leaf"}]]
[{:tag "level_a_node2"}
[{:tag "level_b_node2"}
[{:tag "level_c_node1"}
[{:tag "leaf"}]]]
[{:tag "level_b_node1"}
[{:tag "leaf"}]]]
[{:tag "level_a_node1"}
[{:tag "leaf"}]]
[{:tag "leaf"}]])
and a result like:
(format-paths leaf-paths) =>
[[{:tag "root"} [{:tag "level_a_node3"} [{:tag "leaf"}]]]
[{:tag "root"} [{:tag "level_a_node2"} [{:tag "level_b_node2"} [{:tag "level_c_node1"} [{:tag "leaf"}]]]]]
[{:tag "root"} [{:tag "level_a_node2"} [{:tag "level_b_node1"} [{:tag "leaf"}]]]]
[{:tag "root"} [{:tag "level_a_node1"} [{:tag "leaf"}]]]
[{:tag "root"} [{:tag "leaf"}]]]))))
There are many choices after this depending on the next steps in the processing chain.
I'm struggling to come up with a reliable solution to a problem I'm having using tools.analyzer.
What I'm trying to achieve is: given an AST node, what is the last/furthest node in the tree? E.g. if the following code were analyzed: (def a (do (+ 1 2) 3))
Is there a reliable way of marking the node that has the value "3" as the last node in this tree? Essentially what I'm attempting to do is work out which form will eventually be bound to the var a.
Questions like the above are the reason I created the tupelo.forest library a few years ago.
You may wish to view:
The home page of the lib
The API docs
The Clojure/Conj talk
Many live examples
To get started, let's add the data as a tree, after putting it into Hiccup format. Here is an outline of how to proceed, with unit tests showing the result of the operations:
(ns tst.tupelo.forest-examples
(:use tupelo.core tupelo.forest tupelo.test))
(dotest-focus
(hid-count-reset)
(with-forest (new-forest)
(let [data-orig (quote (def a (do (+ 1 2) 3)))
data-vec (unlazy data-orig)
root-hid (add-tree-hiccup data-vec)
all-paths (find-paths root-hid [:** :*])
max-len (apply max (mapv #(count %) all-paths))
paths-max-len (keep-if #(= max-len (count %)) all-paths)]
Here is the result:
(is= data-vec (quote [def a [do [+ 1 2] 3]]))
(is= (hid->bush root-hid)
(quote
[{:tag def}
[{:tag :tupelo.forest/raw, :value a}]
[{:tag do}
[{:tag +}
[{:tag :tupelo.forest/raw, :value 1}]
[{:tag :tupelo.forest/raw, :value 2}]]
[{:tag :tupelo.forest/raw, :value 3}]]]))
(is= all-paths
[[1007]
[1007 1001]
[1007 1006]
[1007 1006 1004]
[1007 1006 1004 1002]
[1007 1006 1004 1003]
[1007 1006 1005]])
(is= paths-max-len
[[1007 1006 1004 1002]
[1007 1006 1004 1003]])
(nl)
(is= (format-paths paths-max-len)
(quote [[{:tag def}
[{:tag do} [{:tag +} [{:tag :tupelo.forest/raw, :value 1}]]]]
[{:tag def}
[{:tag do} [{:tag +} [{:tag :tupelo.forest/raw, :value 2}]]]]])))))
Depending on your specific goal, you can continue the processing further.
I am new to Clojure and Enlive.
I have HTML like this:
<SPAN CLASS="f10">....</SPAN></DIV><DIV CLASS="p5"><SPAN CLASS="f10">.....</SPAN>
I tried this
(html/select (fetch-url base-url) [:span.f10 [:a (html/attr? :href)]])
but it returns this
({:tag :a,
:attrs
{:target "detail",
:title
"...",
:href
"value1"},
:content ("....")}
{:tag :a,
:attrs
{:target "detail",
:title
"....",
:href
"value2"},
:content
("....")}
What I want is just value1 and value2 in the output. How can I accomplish that?
select returns the matched nodes, but you still need to extract their href attributes. To do that, you can use attr-values:
(mapcat #(html/attr-values % :href)
(html/select (html/html-resource "sample.html") [:span.f10 (html/attr? :href)]))
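Since each matched node is just a map, you can also pull the attribute straight out of :attrs; a sketch against the same (assumed) sample.html:

(map (comp :href :attrs)
     (html/select (html/html-resource "sample.html") [:span.f10 (html/attr? :href)]))
;; => ("value1" "value2")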
I use this little function because the Enlive attr functions don't return the values. You are basically just walking the node map to get the value.
user=> (def data {:tag :a, :attrs {:target "detail", :title "...", :href "value1"}})
#'user/data
user=> (defn- get-attr [node attr]
#_=> (some-> node :attrs attr))
#'user/get-attr
user=> (get-attr data :href)
"value1"
Are there secondary Clojure XML parsing libraries that could be used after or in conjunction with clojure.xml/parse, and if so, what are they?
clojure.xml/parse works wonderfully, but the map it returns is deeply nested, at least after parsing one of our water cuts/tampers XML files. I am wondering if a secondary library exists that would allow me to parse further.
Here is just part of our XML file, deliberately folded so you do not have to scroll.
:content [{:tag :Header, :attrs nil, :content [{:tag :ExportType,
:attrs nil, :content ["Tamper Export"]}
{:tag :CurrentDateTime, :attrs nil, :content ["
Notice the vector with embedded maps.
I can certainly develop something that could be used to parse this further, but I was just wondering if a module already exists.
Thank You.
The library to "parse" the content further is clojure.core. The functions and macros there can do a very good job of transforming the data structure generated from the XML into something useful. My personal favorite technique is using the two threading macros together with first and the keyword functions. If I need to do more than just dig deep, I'll write a quick function I can map over the data.
The data structure you get back from clojure.xml/parse is just as deep as the XML: each element is one map with three keys, the :content being a vector of child elements and strings. It may look a little deeper than that, but it's just an open representation of what would otherwise be stored in, say, Java XML objects. Its biggest advantage is that you don't need a special API to work with it; the functions you use on normal data work on the XML just as well. If anything, you write a few functions to translate it into your domain and that's it.
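For example, digging the "Tamper Export" string out of the fragment shown in the question might look like this (just a sketch; parsed stands for the map returned by clojure.xml/parse):

(->> parsed
     :content
     (filter #(= :Header (:tag %)))
     first
     :content
     (filter #(= :ExportType (:tag %)))
     first
     :content
     first)
;; => "Tamper Export"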
Say you have something like the following (I'm leaving out attrs for brevity):
{:tag :stuff
:content [{:tag :item
:content [{:tag :key :content ["Key one"]}
{:tag :value :content ["Item one"]}]}
{:tag :item
:content [{:tag :key :content ["Key two"]}
{:tag :value :content ["Item two"]}]}]}
It's nested, but you can write a utility function for transforming each item into something usable:
(defn transform-item [item]
(let [key-element (-> item :content first)
value-element (-> item :content second)]
[(-> key-element :content first)
(-> value-element :content first)]))
And then map that over the content of the root element:
(defn transform-stuff [stuff-xml]
  (into {} (map transform-item (:content stuff-xml))))
And you should end up with data that actually represents your domain:
{"Key one" "Item one", "Key two" "Item two"}
The key is to not think of it as parsing, but just as translating one data structure into another.
I need to retrieve some raw HTML from a certain part of an HTML page.
I wrote the scraper and it grabs the appropriate div, but it returns a map of tags.
(require '[net.cgrand.enlive-html :as html])
(defn fetch-url [url]
(html/html-resource (java.net.URL. url)))
(defn parse-test []
(let [url "http://www.ncbi.nlm.nih.gov/pubmedhealth/PMH0000928/"
url-data (fetch-url url)
id "a693025"]
(first (html/select url-data [(keyword (str "#" id "-why"))]))))
This outputs:
{:tag :div, :attrs {:class "section", :id "a693025-why"}, :content ({:tag :h2, :attrs nil, :content ({:tag :span, :attrs {:class "title"}, :content ("Why is this medication prescribed?")})} {:tag :p, :attrs nil, :content ("Zolpidem is used to treat insomnia (difficulty falling asleep or staying asleep). Zolpidem belongs to a class of medications called sedative-hypnotics. It works by slowing activity in the brain to allow sleep.")})}
How do I convert this to raw HTML? I couldn't find an Enlive function to do this.
(apply str (html/emit* [(parse-test)]))
; => "<div class=\"section\" id=\"a693025-why\"><h2><span class=\"title\">Why is this medication prescribed?</span></h2><p>Zolpidem is used to treat insomnia (difficulty falling asleep or staying asleep). Zolpidem belongs to a class of medications called sedative-hypnotics. It works by slowing activity in the brain to allow sleep.</p></div>"