Logical OR in CLIPS - simple defrule - assert

I've just started working in CLIPS. I'm trying to write this simple rule, but I have no idea how to use logical OR here. I know I could define two rules (one for relative-brother and another for relative-sister), but I think that misses the point.
The rule is: You are a relative of someone, if you are his brother or sister.
(defrule MAIN::siblings-relatives
(is-brother ?x ?y)
(test (or (is-sister ?x ?y))
=>
(assert (is-relative ?x ?y))
(printout t ?x " is relative of " ?y crlf))

CLIPS> (clear)
CLIPS>
(defrule siblings-relatives
(or (is-brother ?x ?y)
(is-sister ?x ?y))
=>
(assert (is-relative ?x ?y))
(printout t ?x " is relative of " ?y crlf))
CLIPS> (assert (is-brother Dave Jim))
<Fact-1>
CLIPS> (assert (is-sister Jane Frank))
<Fact-2>
CLIPS> (run)
Jane is relative of Frank
Dave is relative of Jim
CLIPS> (facts)
f-0 (initial-fact)
f-1 (is-brother Dave Jim)
f-2 (is-sister Jane Frank)
f-3 (is-relative Jane Frank)
f-4 (is-relative Dave Jim)
For a total of 5 facts.
CLIPS>

How to optimize pattern matching between different templated facts in CLIPS

I have a rule similar to the following:
(deftemplate person
(slot name ( type INTEGER))
(slot surname ( type INTEGER))
)
(defrule surname_cant_be_a_name
?p1<-(person (name ?n1))
?p2<-(person (surname ?n2&:(= ?n1 ?n2)))
=>
(retract ?p2)
)
Functionally, this works. But I run this on a huge fact-set, and the complexity gets through the roof fairly quickly.
Because the rule is looking for two person objects, there's a nested for-loop kinda situation slowing the execution down. This setup goes through every possible person pairing, and only once it has a pair does the rule filter it using my constraint "&:(= ?n1 ?n2)".
I feel like there must be a smarter way to do this. Ideally, I want p1 to iterate through all person objects, but only match with p2 objects that conform to my rule.
To make my point clearer, I'm looking for something like the following which will avoid double looping:
(defrule surname_cant_be_a_name
?p1<-(person (name ?n1))
?p2<-(person (surname %%JUST_MATCH_n1%% ))
=>
(retract ?p2)
)
Is it possible to achieve something like that? Any recommendation for optimizing this rule is appreciated.
Thanks
P.S. Sorry for the ridiculous example, but it highlights my situation very well.
If you're comparing variables for equality, it's much more efficient to use the same variable in both places than to use two separate variables and call the = or eq function to compare for equality. Across patterns, hash tables are used to quickly locate facts sharing the same variables, something which isn't done when you're using a function call to perform an equality comparison. For a large number of facts, this can improve performance by orders of magnitude:
CLIPS (6.31 6/12/19)
CLIPS> (clear)
CLIPS>
(deftemplate person
(slot name (type INTEGER))
(slot surname (type INTEGER)))
CLIPS>
(defrule surname_cant_be_a_name
?p1<- (person (name ?n1))
?p2<- (person (surname ?n2&:(= ?n1 ?n2)))
=>
(retract ?p2))
CLIPS> (timer (loop-for-count (?i 10000) (assert (person (name ?i) (surname (+ ?i 1))))))
12.3485549999987
CLIPS> (clear)
CLIPS>
(deftemplate person
(slot name (type INTEGER))
(slot surname (type INTEGER)))
CLIPS>
(defrule surname_cant_be_a_name
?p1 <- (person (name ?n1))
?p2 <- (person (surname ?n1))
=>
(retract ?p2))
CLIPS> (timer (loop-for-count (?i 10000) (assert (person (name ?i) (surname (+ ?i 1))))))
0.0177029999995284
CLIPS> (/ 12.3485549999987 0.0177029999995284)
697.540247434201
CLIPS>
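The gap between the two timings can be pictured outside of CLIPS. The Clojure sketch below is only an analogy, not a description of how CLIPS builds its join network: it contrasts testing every pair with a predicate against looking candidates up in a hash table keyed on the shared value. The names people, nested-loop-matches and hash-join-matches are made up for illustration.

```clojure
;; 1000 illustrative "person" records, shaped like the facts asserted above.
(def people
  (vec (for [i (range 1 1001)] {:name i :surname (inc i)})))

;; Predicate-style join: every pair is built and then tested (n * n tests).
(defn nested-loop-matches [people]
  (for [p1 people
        p2 people
        :when (= (:name p1) (:surname p2))]
    [p1 p2]))

;; Shared-variable-style join: index once by surname, then one lookup per person.
(defn hash-join-matches [people]
  (let [by-surname (group-by :surname people)]
    (for [p1 people
          p2 (get by-surname (:name p1))]
      [p1 p2])))
```

Both functions return the same pairs; the first performs a comparison for every pairing, while the second performs a single hash lookup per record.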

How to convert a lazy sequence to a map?

I've got a lazy sequence that, when sent to println, shows like this:
(((red dog) (purple cat)) ((green mouse) (yellow bird)))
Note that this is the result of reading a csv and trimming all "cell" values, hence (a) the fact that it's lazy at the time I want to print it and (b) in the future innermost lists might be more than 2 strings because more columns are added.
I'm trying to use clojure.pprint/print-table to print this in a two-column table. I'm having a pretty hard time because print-table seems to want a map with data.
Here's a repro:
;; Mock the lazy data retrieved from a csv file
(defn get-lazy-data []
(lazy-seq '('('("red" "dog") '("purple" "cat")) '('("green" "mouse") '("yellow" "bird")))))
(defn -main []
(let [data (get-lazy-data)]
(println "Starting...")
(println data)
(println "Continuing...")
(print-table data)
(println "Finished!")))
This gives an error:
Exception in thread "main" java.lang.ClassCastException: clojure.lang.Symbol cannot be cast to java.util.Map$Entry
I've tried various options:
(print-table (apply hash-map data)) gives same exception
(print-table (zipmap data)) tells me to provide another argument for the keys, but I want a solution that doesn't rely on specifying the number of columns beforehand
comprehending and adapting the answer to "Clojure printing lazy sequence", which would be a duplicate of my question were it not that both the question and answer seem so much more complex that I don't know how to translate that solution to my own scenario
Basically I know I have an XY-problem but now I want answers to both questions:
X: How do I pretty print a lazy sequence of pairs of pairs of strings as a table on the console?
Y: How can I convert a lazy sequence to a map (where e.g. keys are the indexes)?
How do I pretty print a lazy sequence of pairs of pairs of strings as a table on the console?
The fact that your "rows" seem to be grouped in pairs is odd, assuming you want a two-column table of color/animal, so we can remove the extra grouping with mapcat identity then zipmap those pairs with the desired map keywords:
(def my-list
'(((red dog) (purple cat)) ((green mouse) (yellow bird))))
(def de-tupled (mapcat identity my-list))
(map #(zipmap [:color :animal] %) de-tupled)
=> ({:color red, :animal dog} {:color purple, :animal cat} {:color green, :animal mouse} {:color yellow, :animal bird})
(clojure.pprint/print-table *1)
| :color | :animal |
|--------+---------|
| red | dog |
| purple | cat |
| green | mouse |
| yellow | bird |
It's not clear from the question, but it seems like you want to support an arbitrary number of "columns" which kinda precludes having fixed names for them. In that case you can do something like this:
(def my-list ;; added third mood "column"
'(((red dog happy) (purple cat sad)) ((green mouse happy) (yellow bird sad))))
(def de-tupled (apply concat my-list))
(clojure.pprint/print-table (map #(zipmap (range) %) de-tupled))
| 0 | 1 | 2 |
|--------+-------+-------|
| red | dog | happy |
| purple | cat | sad |
| green | mouse | happy |
| yellow | bird | sad |
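Applied to the string data from the question, the same idea might look like the sketch below. It assumes the rows are plain nested lists of strings; get-lazy-data is a stand-in for the CSV reader, written with a single outer quote. In the question's mock, the inner quotes are not evaluated inside the outer quoted form, so each inner list starts with the symbol quote, which is a likely source of the Symbol exception.

```clojure
(require '[clojure.pprint :refer [print-table]])

;; Stand-in for the lazily read CSV data (assumption: nested lists of strings).
(defn get-lazy-data []
  (lazy-seq '((("red" "dog") ("purple" "cat"))
              (("green" "mouse") ("yellow" "bird")))))

;; Flatten the pairs of rows into rows, then key every cell by its column index.
(def table-rows
  (map #(zipmap (range) %) (apply concat (get-lazy-data))))

;; Inner quotes inside a quoted form stay as (quote ...) lists,
;; so the first element of such a list is the symbol `quote`.
(def nested-quote-head (ffirst '('("red" "dog"))))

(print-table table-rows)
```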
How can I convert a lazy sequence to a map (where e.g. keys are the indexes)?
(def my-list
'(((red dog) (purple cat)) ((green mouse) (yellow bird))))
(zipmap (range) my-list)
=> {0 ((red dog) (purple cat)), 1 ((green mouse) (yellow bird))}
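An equivalent sketch, in case it is useful, builds the same index-keyed map with map-indexed. The names rows and by-index are illustrative. Either way the whole sequence gets realized, so this only suits finite sequences.

```clojure
;; Illustrative lazy data with the same shape as in the question.
(def rows
  (lazy-seq '((("red" "dog") ("purple" "cat"))
              (("green" "mouse") ("yellow" "bird")))))

;; map-indexed pairs every element with its position; into {} turns the
;; resulting [index element] vectors into map entries.
(def by-index (into {} (map-indexed vector rows)))

;; Same result as the zipmap version.
(def same-as-zipmap? (= by-index (zipmap (range) rows)))
```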
A related point to your problem is how you print out your data. Clojure has two ways to print:
(dotest
(println ["hello" "there" "everybody"]) ; #1
(prn ["hello" "there" "everybody"])) ; #2
#1 => [hello there everybody]
#2 => ["hello" "there" "everybody"]
For strings the presence of quotes in #2 makes a huge difference in understanding what is happening. The prn function produces output that is machine-readable (like what you type in your source code). You really need that if you have strings involved in your data.
Look at the difference with symbols:
(println ['hello 'there 'everybody])
(prn ['hello 'there 'everybody])
; doesn't matter if you quote the whole form or individual symbols
(println '[hello there everybody])
(prn '[hello there everybody])
all results are the same:
[hello there everybody]
[hello there everybody]
[hello there everybody]
[hello there everybody]
The point is that you need prn to tell the difference between symbols and strings when you print results. Note that the prn output format (with double-quotes) happens automatically if you use pprint:
(def data
[["here" "we" "have" "a" "lot" "of" "strings" "in" "vectors"]
["here" "we" "have" "a" "lot" "of" "strings" "in" "vectors"]
["here" "we" "have" "a" "lot" "of" "strings" "in" "vectors"]])
(clojure.pprint/pprint data) =>
[["here" "we" "have" "a" "lot" "of" "strings" "in" "vectors"]
["here" "we" "have" "a" "lot" "of" "strings" "in" "vectors"]
["here" "we" "have" "a" "lot" "of" "strings" "in" "vectors"]]
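The same distinction can be checked without looking at the console by capturing both formats as strings. This is a small sketch; mixed, human and readable are illustrative names.

```clojure
;; A vector holding a string and a symbol that look alike when printed.
(def mixed ["hello" 'hello])

;; print-str formats like print/println: strings lose their quotes.
(def human (print-str mixed))

;; pr-str formats like pr/prn: strings keep their quotes.
(def readable (pr-str mixed))
```

Here human is "[hello hello]", while readable keeps the quotes around the string element, so only the second form lets you tell the two elements apart.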

Convert vector of lists into vector of vectors

I have the following data in a .txt file:
1|John Smith|123 Here Street|456-4567
2|Sue Jones|43 Rose Court Street|345-7867
3|Fan Yuhong|165 Happy Lane|345-4533
I get the data and convert it to a vector using the following code:
(def custContents (slurp "cust.txt"))
(def custVector (clojure.string/split custContents #"\||\n"))
(def testing (into [] (partition 4 custVector )))
Which gives me the following vector:
[(1 John Smith 123 Here Street 456-4567) (2 Sue Jones 43 Rose Court Street
345-7867) (3 Fan Yuhong 165 Happy Lane 345-4533)]
I would like to convert it into a vector of vectors like this:
[[1 John Smith 123 Here Street 456-4567] [2 Sue Jones 43 Rose Court Street
345-7867] [3 Fan Yuhong 165 Happy Lane 345-4533]]
I would do it slightly differently, so you first break it up into lines, then process each line. It also makes the regex simpler:
(ns tst.demo.core
(:require
[clojure.string :as str] ))
(def data
"1|John Smith|123 Here Street|456-4567
2|Sue Jones|43 Rose Court Street|345-7867
3|Fan Yuhong|165 Happy Lane|345-4533")
(let [lines (str/split-lines data)
line-vecs-1 (mapv #(str/split % #"\|" ) lines)
line-vecs-2 (mapv #(str/split % #"[|]") lines)]
...)
with result:
lines => ["1|John Smith|123 Here Street|456-4567"
"2|Sue Jones|43 Rose Court Street|345-7867"
"3|Fan Yuhong|165 Happy Lane|345-4533"]
line-vecs-1 =>
[["1" "John Smith" "123 Here Street" "456-4567"]
["2" "Sue Jones" "43 Rose Court Street" "345-7867"]
["3" "Fan Yuhong" "165 Happy Lane" "345-4533"]]
line-vecs-2 =>
[["1" "John Smith" "123 Here Street" "456-4567"]
["2" "Sue Jones" "43 Rose Court Street" "345-7867"]
["3" "Fan Yuhong" "165 Happy Lane" "345-4533"]]
Note that there are 2 ways of writing the regex. line-vecs-1 shows a regex where the pipe character is escaped with a backslash. Since regex syntax varies between platforms (e.g. in a Java string literal one would need "\\|"), line-vecs-2 uses a character class containing a single character (the pipe), which sidesteps the need for escaping the pipe.
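As a quick check of that claim, the sketch below splits one line with both regex literals and with a pattern built from a plain string, where the backslash has to be doubled as it would be in Java. The var names are illustrative.

```clojure
(require '[clojure.string :as str])

(def line "1|John Smith|123 Here Street|456-4567")

;; Regex literal with an escaped pipe.
(def by-escape (str/split line #"\|"))

;; Character class: no escaping needed.
(def by-class (str/split line #"[|]"))

;; Pattern built from an ordinary string: the backslash must be doubled.
(def by-string (str/split line (re-pattern "\\|")))
```

All three produce the same four-element vector of strings.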
Update
Other Clojure Learning Resources:
Brave Clojure
Clojure CheatSheet
ClojureDocs.org
Clojure-Doc.org (similar but different)
> (mapv vec testing)
=> [["1" "John Smith" "123 Here Street" "456-4567"]
["2" "Sue Jones" "43 Rose Court Street" "345-7867"]
["3" "Fan Yuhong" "165 Happy Lane" "345-4533"]]
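Put together with the question's code, a self-contained sketch of this one-liner might look as follows. The file contents are inlined as a string (an assumption, to avoid depending on cust.txt), and the kebab-case names are illustrative.

```clojure
(require '[clojure.string :as str])

;; Inlined stand-in for (slurp "cust.txt").
(def cust-contents
  "1|John Smith|123 Here Street|456-4567\n2|Sue Jones|43 Rose Court Street|345-7867")

;; Same steps as in the question: split on pipes and newlines, group by 4.
(def testing
  (into [] (partition 4 (str/split cust-contents #"\||\n"))))

;; mapv vec turns each inner list into a vector and returns a vector.
(def cust-vectors (mapv vec testing))
```

Note that Clojure compares lists and vectors element by element, so (= testing cust-vectors) is true even though the inner collection types differ; use vector? to see the difference.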

How to construct a query that matches exactly a vector of refs in DataScript?

Setup: Consider the following DataScript database of films and cast, with data stolen from learndatalogtoday.org. The following code can be executed in a JVM/Clojure REPL or a ClojureScript REPL, as long as project.clj contains [datascript "0.15.0"] as a dependency.
(ns user
(:require [datascript.core :as d]))
(def data
[["First Blood" ["Sylvester Stallone" "Brian Dennehy" "Richard Crenna"]]
["Terminator 2: Judgment Day" ["Linda Hamilton" "Arnold Schwarzenegger" "Edward Furlong" "Robert Patrick"]]
["The Terminator" ["Arnold Schwarzenegger" "Linda Hamilton" "Michael Biehn"]]
["Rambo III" ["Richard Crenna" "Sylvester Stallone" "Marc de Jonge"]]
["Predator 2" ["Gary Busey" "Danny Glover" "Ruben Blades"]]
["Lethal Weapon" ["Gary Busey" "Mel Gibson" "Danny Glover"]]
["Lethal Weapon 2" ["Mel Gibson" "Joe Pesci" "Danny Glover"]]
["Lethal Weapon 3" ["Joe Pesci" "Danny Glover" "Mel Gibson"]]
["Alien" ["Tom Skerritt" "Veronica Cartwright" "Sigourney Weaver"]]
["Aliens" ["Carrie Henn" "Sigourney Weaver" "Michael Biehn"]]
["Die Hard" ["Alan Rickman" "Bruce Willis" "Alexander Godunov"]]
["Rambo: First Blood Part II" ["Richard Crenna" "Sylvester Stallone" "Charles Napier"]]
["Commando" ["Arnold Schwarzenegger" "Alyssa Milano" "Rae Dawn Chong"]]
["Mad Max 2" ["Bruce Spence" "Mel Gibson" "Michael Preston"]]
["Mad Max" ["Joanne Samuel" "Steve Bisley" "Mel Gibson"]]
["RoboCop" ["Nancy Allen" "Peter Weller" "Ronny Cox"]]
["Braveheart" ["Sophie Marceau" "Mel Gibson"]]
["Mad Max Beyond Thunderdome" ["Mel Gibson" "Tina Turner"]]
["Predator" ["Carl Weathers" "Elpidia Carrillo" "Arnold Schwarzenegger"]]
["Terminator 3: Rise of the Machines" ["Nick Stahl" "Arnold Schwarzenegger" "Claire Danes"]]])
(def conn (d/create-conn {:film/cast {:db/valueType :db.type/ref
:db/cardinality :db.cardinality/many}
:film/name {:db/unique :db.unique/identity
:db/cardinality :db.cardinality/one}
:actor/name {:db/unique :db.unique/identity
:db/cardinality :db.cardinality/one}}))
(def all-datoms (mapcat (fn [[film actors]]
(into [{:film/name film}]
(map #(hash-map :actor/name %) actors)))
data))
(def all-relations (mapv (fn [[film actors]]
{:db/id [:film/name film]
:film/cast (mapv #(vector :actor/name %) actors)}) data))
(d/transact! conn all-datoms)
(d/transact! conn all-relations)
Description: In a nutshell, there are two kinds of entities in this database, films and actors (word intended to be ungendered), and three kinds of datoms:
film entity: :film/name (a unique string)
film entity: :film/cast (multiple refs)
actor entity: :actor/name (unique string)
Question: I would like to construct a query which asks: in which films have these N actors, and these N actors alone, appeared as the sole stars, for N>=2?
E.g., RoboCop starred Nancy Allen, Peter Weller, Ronny Cox, but no film starred solely the first two of these, Allen and Weller. Therefore, I would expect the following query to produce the empty set:
(d/q '[:find ?film-name
:where
[?film :film/name ?film-name]
[?film :film/cast ?actor-1]
[?film :film/cast ?actor-2]
[?actor-1 :actor/name "Nancy Allen"]
[?actor-2 :actor/name "Peter Weller"]]
@conn)
; => #{["RoboCop"]}
However, the query is flawed because I don't know how to express that any matches should exclude any actors who are not Allen or Weller—again, I want to find the movies where only Allen and Weller have collaborated without any other actors, so I want to adapt the above query to produce the empty set. How can I adjust this query to enforce this requirement?
Because DataScript doesn't have negation (as of May 2016), I don't believe that's possible with one static query in 'pure' Datalog.
My way to go would be:
Build the query programmatically to add the N clauses that state that the cast must contain the N actors.
Add a predicate function which, given a movie, the database, and the set of actor ids, uses the EAVT index to check that the movie has no actor outside the set.
Here's a basic implementation:
(defn only-those-actors? [db movie actors]
  ;; true when every :film/cast ref of `movie` is in the set of actor ids
  (->> (d/datoms db :eavt movie :film/cast)
       (every? (fn [[_ _ actor]]
                 (contains? actors actor)))))

(defn find-movies-with-exact-cast [db actors-names]
  (let [;; resolve the actor names to entity ids
        actors (set (d/q '[:find [?actor ...]
                           :in $ [?name ...]
                           :where [?actor :actor/name ?name]]
                         db actors-names))
        ;; one :film/cast clause per actor, then the predicate,
        ;; which is passed in as a query input
        query {:find '[[?movie ...]]
               :in '[$ ?actors ?only-those-actors]
               :where
               (concat
                 (for [actor actors]
                   ['?movie :film/cast actor])
                 [['(?only-those-actors $ ?movie ?actors)]])}]
    (d/q query db actors only-those-actors?)))
You can use a predicate function together with d/entity to filter by the :film/cast attribute of an entity. This approach looks much more straightforward, at least for as long as DataScript doesn't support negation (a not operator and so on).
Look at the line (= a (:age (d/entity db e))) in this test case from DataScript:
[{:db/id 1 :name "Ivan" :age 10}
{:db/id 2 :name "Ivan" :age 20}
{:db/id 3 :name "Oleg" :age 10}
{:db/id 4 :name "Oleg" :age 20}]
...
(let [pred (fn [db e a]
(= a (:age (d/entity db e))))]
(is (= (q/q '[:find ?e
:in $ ?pred
:where [?e :age ?a]
[(?pred $ ?e 10)]]
db pred)
#{[1] [3]})))))
In your case, the predicate body could look something like this
(clojure.set/subset? actors (:film/cast (d/entity db e)))
In regards to performance, the d/entity call is fast because it is a lookup by index.
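Independently of DataScript, the difference between "the cast includes these actors" and "the cast is exactly these actors" can be sketched with plain Clojure sets. The films map and both function names below are made up for illustration, with a few casts copied from the question's data.

```clojure
(require '[clojure.set :as set])

;; Illustrative subset of the question's data: film title -> cast set.
(def films
  {"RoboCop"                    #{"Nancy Allen" "Peter Weller" "Ronny Cox"}
   "Braveheart"                 #{"Sophie Marceau" "Mel Gibson"}
   "Mad Max Beyond Thunderdome" #{"Mel Gibson" "Tina Turner"}})

;; What the original query expresses: the cast contains all given actors.
(defn films-with-cast-including [films actors]
  (set (for [[title cast] films
             :when (set/subset? actors cast)]
         title)))

;; What the question asks for: the cast is exactly the given actors.
(defn films-with-exact-cast [films actors]
  (set (for [[title cast] films
             :when (= actors cast)]
         title)))
```

With #{"Nancy Allen" "Peter Weller"}, the first function still returns RoboCop while the second returns the empty set, which is the behaviour the question is after.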

How to Parse and Compare Files?

I'd appreciate suggestions/insights on how I can leverage Clojure to efficiently parse and compare two files. There are two (log) files that contain employee attendance; from these files I need to determine all the days that two employees worked the same times, in the same department. Below are examples of the log files.
Note: each file has differing number of entries.
First File:
Employee Id Name Time In Time Out Dept.
mce0518 Jon 2011-01-01 06:00 2011-01-01 14:00 ER
mce0518 Jon 2011-01-02 06:00 2011-01-01 14:00 ER
mce0518 Jon 2011-01-04 06:00 2011-01-01 13:00 ICU
mce0518 Jon 2011-01-05 06:00 2011-01-01 13:00 ICU
mce0518 Jon 2011-01-05 17:00 2011-01-01 23:00 ER
Second File:
Employee Id Name Time In Time Out Dept.
pdm1705 Jane 2011-01-01 06:00 2011-01-01 14:00 ER
pdm1705 Jane 2011-01-02 06:00 2011-01-01 14:00 ER
pdm1705 Jane 2011-01-05 06:00 2011-01-01 13:00 ER
pdm1705 Jane 2011-01-05 17:00 2011-01-01 23:00 ER
If you are not going to do this periodically,
(defn data-seq [f]
(with-open [rdr (java.io.BufferedReader.
(java.io.FileReader. f))]
(let [s (rest (line-seq rdr))]
(doall (map seq (map #(.split % "\\s+") s))))))
(defn same-time? [a b]
(let [a (drop 2 a)
b (drop 2 b)]
(= a b)))
(let [f1 (data-seq "f1.txt")
f2 (data-seq "f2.txt")]
(reduce (fn[h v]
(let [f2 (filter #(same-time? v %) f2)]
(if (empty? f2)
h
(conj h [(first v) (map first f2)])))) [] f1)
)
will get you,
[["mce0518" ("pdm1705")] ["mce0518" ("pdm1705")] ["mce0518" ("pdm1705")]]
I came up with a somewhat shorter and (IMHO) more readable version:
(use ; moar toolz - moar fun
'[clojure.contrib.duck-streams :only (reader)]
'[clojure.string :only (split)]
'[clojure.contrib.str-utils :only (str-join)]
'[clojure.set :only (intersection)])
(defn read-presence [filename]
(with-open [rdr (reader filename)] ; file will be securely (always) closed after use
(apply hash-set ; make employee's hash-set
(map #(str-join "--" (drop 2 (split % #" [ ]+"))) ; right-to-left: split row by spaces then forget two first columns then join using "--"
(drop 1 ; omit first line
(line-seq rdr)))))) ; read file content line-by-line
(intersection (read-presence "a.in") (read-presence "b.in")) ; now it's simple!
;result: #{"2011-01-01 06:00--2011-01-01 14:00--ER" "2011-01-02 06:00--2011-01-01 14:00--ER" "2011-01-05 17:00--2011-01-01 23:00--ER"}
Assuming a.in and b.in are your files. I also assumed you'll have one hash-set for each employee -- a (naive) generalization to N employees would need the next six lines:
(def employees ["greg.txt" "allison.txt" "robert.txt" "eric.txt" "james.txt" "lisa.txt"])
(for [a employees b employees :when (and
(= a (first (sort [a b]))) ; thou shall compare greg with james ONCE
(not (= a b)))] ; thou shall not compare greg with greg
(str-join " -- " ; well, it's not pretty... nor pink at least
[a b (intersection (read-presence a) (read-presence b))]))
;result: ("a.in -- b.in -- #{\"2011-01-01 06:00--2011-01-01 14:00--ER\" \"2011-01-02 06:00--2011-01-01 14:00--ER\" \"2011-01-05 17:00--2011-01-01 23:00--ER\"}")
Actually this loop is sooo ugly and it doesn't memoize intermediate results... To be improved.
--edit--
I knew there must be something elegant in core or contrib!
(use '[clojure.contrib.combinatorics :only (combinations)])
(def employees ["greg.txt" "allison.txt" "robert.txt" "eric.txt" "james.txt" "lisa.txt"])
(def employee-map (apply conj (for [e employees] {e (read-presence e)})))
(map (fn [[a b]] [a b (intersection (employee-map a) (employee-map b))])
(combinations employees 2))
;result: (["a.in" "b.in" #{"2011-01-01 06:00--2011-01-01 14:00--ER" "2011-01-02 06:00--2011-01-01 14:00--ER" "2011-01-05 17:00--2011-01-01 23:00--ER"}])
Now it's memoized (parsed data in employee-map), general and... lazy :D
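The monolithic clojure.contrib used above was retired after Clojure 1.2, so on a current Clojure the same idea could be sketched with clojure.string and clojure.set only. The log contents are inlined as strings here (an assumption, to keep the example self-contained); for real files, (line-seq (clojure.java.io/reader "a.in")) inside with-open would supply the lines instead. The names log-a, log-b, presence and shared are illustrative.

```clojure
(require '[clojure.string :as str]
         '[clojure.set :as set])

;; Inlined log contents (assumption: whitespace-separated columns, one header line).
(def log-a
  (str "Employee Id Name Time In Time Out Dept.\n"
       "mce0518 Jon 2011-01-01 06:00 2011-01-01 14:00 ER\n"
       "mce0518 Jon 2011-01-04 06:00 2011-01-01 13:00 ICU\n"))

(def log-b
  (str "Employee Id Name Time In Time Out Dept.\n"
       "pdm1705 Jane 2011-01-01 06:00 2011-01-01 14:00 ER\n"
       "pdm1705 Jane 2011-01-05 06:00 2011-01-01 13:00 ER\n"))

(defn presence
  "Set of [date-in time-in date-out time-out dept] vectors, one per log line."
  [text]
  (->> (str/split-lines text)
       rest                                        ; skip the header line
       (map #(vec (drop 2 (str/split % #"\s+"))))  ; drop id and name
       set))

;; Shifts that appear in both logs.
(def shared (set/intersection (presence log-a) (presence log-b)))
```

With this sample data, shared contains only the 2011-01-01 06:00 to 14:00 shift in ER.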