How to get older versions for clojure data structures - clojure

Clojure uses persistent data structures. Since it keeps older versions internally, is there a way to access previous versions of a vector or map?
Let's say, for a vector: what I mean is that since Clojure does not copy the full structure but keeps it internally as a tree (see https://hypirion.com/musings/understanding-persistent-vector-pt-1), and older values survive in that tree, is there a way to use this for scenarios like undo/redo or replay? Datomic uses the same principle to retrieve older versions of data, so I'm asking whether the same is possible in plain Clojure.

I am not sure that I understood the question, but you can just keep a reference to the old structure.
(def my-old-map {:a 1, :b 2, :c 3})
(def my-new-map (assoc my-old-map :b 7))

Any version of a persistent data structure survives so long as there is a live reference to it. Thereafter, it is subject to garbage collection.
The Clojure persistent vectors and maps are like copy-on-write file systems such as Btrfs, both in concept and in the sort of internal data structures they employ to create the illusion that each version of an entity is quite distinct.

As per Kris's comment, using add-watch seems to be the right answer.
David Nolen has described this approach here:
https://swannodette.github.io/2013/12/31/time-travel
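Building on the add-watch idea, here is a minimal undo sketch (the names `app-state`, `history`, and `undo!` are illustrative, not from the linked post): a watch on an atom records each superseded value, and `undo!` restores the most recent one. Since the values are immutable, "saving" a state is just keeping a reference.

```clojure
;; Minimal undo sketch: a watch records each superseded state of an
;; atom, and undo! restores the most recent one.
(def app-state (atom {:count 0}))
(def history (atom ()))   ; list of prior states, newest first

(add-watch app-state :undo
           (fn [_key _ref old-state new-state]
             (when (not= old-state new-state)
               (swap! history conj old-state))))

(defn undo! []
  (when-let [prev (first @history)]
    (swap! history rest)
    ;; reset! fires the watch too, pushing the state we are leaving;
    ;; drop that extra entry so redo history isn't mixed into undo.
    (reset! app-state prev)
    (swap! history rest)
    prev))

(swap! app-state update :count inc)  ; state is now {:count 1}
(swap! app-state update :count inc)  ; state is now {:count 2}
(undo!)
@app-state                           ; => {:count 1}
```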

Related

Clojure, replacing vars in let bindings causes performance issue?

Let's say there is a function that receives a JSON string containing about 100KB of data. The string gets converted to a map, and then new keys keep being associated onto that map, rebinding the same name in the let bindings like below:
(defn myfn
[str]
(let [j (json/read-str str)
j (assoc-in j [:key1 :subkey1] somedata1)
j (assoc-in j [:key2 :subkey2] somedata2)
....
j (assoc-in j [:key100 :subkey100] somedata100)]
... do something ...))
I know that after all those let bindings, j will have all those new keys added. This is just an example; I wonder what happens inside all those rebindings of the same name.
I mean, what happens in memory? Would that copy 100KB a hundred times, eating up 100KB * 100 = 10,000KB until the function returns? Or is Clojure smart enough to actually keep adding new keys in the same memory space?
If you could also recommend where I should look for in Clojure reference to find an answer to this, that would be really nice.
Clojure uses a data structure called a trie, which is similar to a tree but stores data only at the leaf nodes. Most of Clojure's persistent structures are implemented as tries.
This excellent article really explains things in detail and uses vectors, so I won't rehash it here. I know on S.O. it's preferred to give the content rather than a link, but it's not a topic that can be covered fully in an answer here, and the article does it best, so I'll just link to it.
In short, when a data structure is modified in some way, a new trie is created for the new "version", and instead of copying all the data over from the old to the new with one change made, the nodes in the new structure point to existing data. The article includes a visualization of this data sharing.
So, using this structure, we have shared data, but since it is only a binary trie, it can get deep very quickly, so lookups could take a long time (for a vector of 1 billion elements, the depth to reach a leaf node is log2(10^9) ≈ 30). To get around this, Clojure uses a 32-way branching factor instead of a 2-way one, yielding trees that are very shallow. So, the same vector holding 1 billion elements in Clojure takes only log32(10^9) ≈ 6 levels of indirection to reach the leaves.
I encourage you to read the article, and also to have a look at PersistentHashMap, where you will see references to shift + 5 in several places. This is a clever use of bit-shifting to index into the trie (log2(32) = 5). See the second part of the article for more in-depth info on this.
To summarize, clojure uses efficient data structures to achieve persistence, and any language which features immutability as a core feature must do this, if it hopes to achieve usable performance.
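To make the rebinding question above concrete, here is a small sketch (not from the answer) showing that "modifying" a large vector produces a new version without copying or disturbing the old one; only the short path of trie nodes down to the changed leaf is rebuilt.

```clojure
;; assoc on a million-element vector creates a new version by
;; rebuilding only ~6 trie nodes; everything else is shared.
(def big-vec (vec (range 1000000)))
(def new-vec (assoc big-vec 500000 :changed))

(nth big-vec 500000)  ; => 500000   -- the old version is untouched
(nth new-vec 500000)  ; => :changed
(nth new-vec 999999)  ; => 999999   -- shared with big-vec, not copied
```

The same applies to the repeated `(assoc-in j ...)` bindings in the question: each binding creates a cheap new version, and the superseded intermediate versions become garbage as soon as nothing references them.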

Clojure: immutability and persistence

Every textbook says that Clojure data structures are 'immutable and persistent'. They go to different lengths explaining the concept, but so far I have failed to figure out the difference between immutability and persistence. Can an entity be persistent but mutable? Or immutable but not persistent?
Immutable means that a value can't be changed. Persistent means that older versions remain accessible: when a value is "changed", only the path to the changed element is copied, and the new version builds on the old data without altering or removing it. Clojure uses this as part of its structural-sharing implementation. If the data doesn't exist yet, it's created; if it does exist, the new version shares it.
Atoms are persistent but safely mutable.
user> (def +a+ (atom 0))
#'user/+a+
user> @+a+
0
user> (swap! +a+ inc)
1
user> @+a+
1
Transients are mutable, but should be made persistent again after mutation:
user> (def t (transient []))
#'user/t
user> (conj! t 1)
#<TransientVector clojure.lang.PersistentVector$TransientVector#658ee462>
user> (persistent! t)
[1]
Understanding Clojure's Persistent Vectors, pt. 1 =>
http://hypirion.com/musings/understanding-persistent-vector-pt-1
Persistent data structure => https://en.wikipedia.org/wiki/Persistent_data_structure
Persistent Data Structures and Managed References =>
http://www.infoq.com/presentations/Value-Identity-State-Rich-Hickey
Purely Functional Data Structures by Chris Okasaki refers to an article [1] which appears to contain the original definition of the term persistent:
Ordinary data structures are ephemeral in the sense that making a change to the structure destroys the old version, leaving only the new one. … We call a data structure persistent if it supports access to multiple versions. The structure is partially persistent if all versions can be accessed but only the newest version can be modified, and fully persistent if every version can be both accessed and modified.
[1] James R. Driscoll, Neil Sarnak, Daniel D. Sleator, and Robert E. Tarjan. Making data structures persistent. Journal of Computer and System Sciences, 38(1):86–124, February 1989.
Immutable implies persistent, but persistent does not imply immutable. So you could have something that's persistent but not immutable.
An example of a mutable and persistent data structure is Java's CopyOnWriteArrayList.
Persistence does not imply shared structure, nor does it say anything about performance. Of course, shared structure and good performance are both highly desirable, and are both provided by Clojure's persistent data structures. But it would be quite possible to create something that had no structure sharing and awful performance (see CopyOnWriteArrayList, for example ;-)) but was still persistent.
Basically immutable == can't be changed, and persistent == immutable, with shared structure.
If I have a language where arrays can't be changed, then arrays are immutable. To "change" the array, I must create a new array and copy every element (except the one(s) to be changed) into the new array. This makes any update O(n), where n is the number of elements in the array. This is obviously inefficient for large n.
On the other hand, if I use a persistent data structure instead of an array, then instead of copying every element every time the data structure is "altered", the new version shares most of the same structure with the old one.
The details depend on the structure, but usually there is a tree involved. If the tree is balanced, replacing an element means creating new copies of the nodes along the path from the root to the leaf containing the element; the rest of the nodes are shared with the original version. The length of this path is O(log(n)). Since each node is O(1) in size, the entire operation takes O(log(n)) time and extra space.
Note that not all persistent structures support the same operations efficiently. In Clojure, for example, Lists are singly-linked lists. You can efficiently add and remove elements to/from the front, but that's about it. Vectors, on the other hand, allow you to efficiently get any element and add/remove elements at the back.
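The efficiency claims above can be seen directly at the REPL: `conj` works at whichever end is cheap for each collection type, which is why it prepends to lists and appends to vectors.

```clojure
;; Lists grow efficiently at the front; vectors at the back.
;; conj picks the efficient end for each collection type.
(conj '(2 3) 1)   ; => (1 2 3)  -- prepended
(conj [1 2] 3)    ; => [1 2 3]  -- appended

;; Vectors also give near-constant-time indexed lookup:
(nth [10 20 30] 1)  ; => 20

;; peek and pop likewise work at each structure's efficient end:
(peek '(1 2 3))  ; => 1
(peek [1 2 3])   ; => 3
```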

Clojure states within states within states

I'd love to hear what advice the Clojure gurus here have about managing state in hierarchies. I find I'm often using {:structures {:like {:this {:with {:many 'levels}}}}}, and if I want to track changes in state at multiple levels by wrapping atoms around values, e.g. (atom {:like (atom 'this)}), I find myself thinking this must be wrong. Is it generally better to use just one atom at the top level, and none as values inside the map?
Don't use nested atoms in a data structure if at all possible.
The main reason is that immutability is your friend. Clojure is a functional language that thrives on immutable data structures. Most libraries assume immutable data structures. Clojure's STM assumes immutable data structures to get the best possible concurrency. Immutability gives you the opportunity to take consistent snapshots of the entire state at any one instant. Pure functions that operate on immutable data are easy to develop and test.
If you put atoms inside your data structures then you lose all the advantages of immutability and risk making your code very complex - it's a lot harder to reason about a data structure if it contains a lot of mutable components.
Some suggested alternative approaches:
Put your entire data structure in a single ref or atom. This can be a huge data structure with no problem - I once wrote a game where the entire game map was held in a single atom without any difficulty.
Use the various methods that are designed for accessing and changing nested immutable data structures: assoc-in, get-in, update-in etc.
Use recursive functions to make navigating your data structure more manageable. If one node of your structure has sub-nodes of the same "type", then it's usually a good hint that you should be using some form of recursive function.
You can use the assoc-in, get-in, and update-in functions to work with nested structures (dissoc-in is not in clojure.core, but versions of it exist in contrib libraries).
They are very convenient, but they don't handle atoms directly, so you need to interleave deref, e.g.:
(def m (atom {:like {:this {:nested (atom {:value 5})}}}))
@(get-in @m [:like :this :nested])
; => {:value 5}
(get-in @(get-in @m [:like :this :nested]) [:value])
; => 5
You can use -> to make this more readable:
(-> @m
    (get-in [:like :this :nested])
    deref
    (get-in [:value]))
; => 5
Regarding nested atoms/refs/agents, etc. I think it depends on what you're trying to achieve. It's certainly easier to reason about things, if there's just one of them at the top and the changes are synchronized.
On the other hand, if you don't need this synchronization, you're wasting time in doing it, and you'll be better off with nested atoms/refs/agents.
The bottom line is, I don't think either way is "the right way", they both have their usages.
I would prefer one atom at the top level, as that keeps things simple and also indicates that the data represents a single state that each operation modifies as a whole. If you put atoms at every level, it becomes far too complex to figure out what is going on. Also, if the nesting is getting very deep, I would suggest you sit back and think carefully about whether you really need such a structure or whether a better alternative is possible, because deep nesting will certainly lead to complexity unless the nested data is recursive (i.e. the same structure at each level).
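The single-top-level-atom approach recommended above can be sketched as follows (the `world` and `move!` names are illustrative): the whole nested state lives in one immutable value, and `swap!` combined with `update-in`/`assoc-in` modifies it atomically.

```clojure
;; One atom holds the entire nested state; swap! + update-in
;; replaces the whole value atomically on each change.
(def world (atom {:player {:position {:x 0 :y 0}
                           :health 100}}))

(defn move! [dx dy]
  (swap! world update-in [:player :position]
         (fn [{:keys [x y]}] {:x (+ x dx) :y (+ y dy)})))

(move! 3 4)
(get-in @world [:player :position])  ; => {:x 3, :y 4}
```

Because every update goes through one reference, any observer that derefs `world` sees a consistent snapshot of the whole hierarchy.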

Adding version numbers to a Clojure ref

I was wondering whether it makes sense to add a version number or a timestamp as metadata every time my ref is changed, so that the freshness of the data can be used by GUI components to determine whether it has been updated.
If you add metadata (or plain data) to the ref, then UI components will have to poll the ref to know whether to update. You might be better off to use an agent send within the ref update to notify interested parties.
You can use the answer to your question from yesterday for this purpose as well.
(def my-ref (ref {}))
(def my-ref-version (atom 0))
(add-watch my-ref :version (fn [key ref old new] (swap! my-ref-version inc)))
If you stick to immutable data structures, then you can save a copy of the data you last served and compare it to the data you are considering serving. This is a lot simpler, and you would not resend data that had been updated to the same value. Immutable data is great for caching. Timestamps are useful when you can't directly compare the data to what you last sent; with languages that don't offer efficient copy operations for their collections, they are necessary, because you can't efficiently save a copy of your data before sending it. With Clojure's collections, saving a copy before you send is both easy and efficient.
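A sketch of that compare-before-sending idea (the names `my-ref`, `last-sent`, and `send-if-changed!` are illustrative): keep the value last handed to the GUI and only resend when the ref's current value differs. "Saving the copy" is free because the value is immutable.

```clojure
;; Keep the last value sent to the GUI; resend only on change.
;; Holding a reference to an immutable value costs nothing extra.
(def my-ref (ref {:items []}))
(def last-sent (atom nil))

(defn send-if-changed! [render-fn]
  (let [current @my-ref]
    (when (not= current @last-sent)
      (reset! last-sent current)
      (render-fn current))))

(dosync (alter my-ref update :items conj :a))
(send-if-changed! println)  ; change detected: prints {:items [:a]}
(send-if-changed! println)  ; no change: prints nothing
```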
Although metadata is an option, you could just put the version in your ref's data map. Either way you have to do a map lookup, since Clojure's metadata itself lives in a map; using metadata just adds an extra hop. So when you define or update your ref, make the value a map and you will be functionally equivalent without having to use meta to get at the information.
I guess you will need to remember the last time you updated the GUI with the value of the ref, in order to know whether updating the GUI is necessary. If the value is a big data structure or the update is expensive, this might make sense.

Where should I use defrecord in clojure?

I use many maps and structs in my clojure programs. What are the benefits (apart from performance) of converting these to defrecords?
I consider structs to be effectively deprecated, so I don't use them at all.
When I have a fixed set of well-known keys used in many map instances, I usually create a record. The big benefits are:
Performance
Generated class has a type that I can switch on in multimethods or other situations
With additional macro machinery around defrecord, I can get field validation, default values, and whatever other stuff I want
Records can implement arbitrary interfaces or protocols (maps can't)
Records act as maps for most purposes
keys and vals return results in stable (per-creation) order
Some downsides of records:
Because records are Java class instances (not Clojure maps), there is no structural sharing so the same record structure will likely use more memory than the equivalent map structure that has been changed. There is also more object creation/destruction as you "change" a record although the JVM is designed specifically to eat this kind of short-lived garbage without breaking a sweat.
If you are changing records during development you probably need to restart your REPL more frequently to pick up those changes. This is typically only an issue during narrow bits of development.
Many existing libraries have not been updated to support records (postwalk, zip, matchure, etc etc). We've added this support as needed.
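The "records act as maps" point above has one notable caveat worth sketching (the `Point` record here is illustrative): assoc preserves the record type, but dissoc-ing one of the declared fields degrades the record to a plain map.

```clojure
;; Records support most map operations, but dissoc-ing a declared
;; field returns a plain map rather than a record.
(defrecord Point [x y])

(def p (->Point 1 2))
(:x p)                ; => 1 -- keyword lookup works like a map
(assoc p :z 3)        ; still a Point, with an extra key in its ext map
(map? (dissoc p :x))  ; => true -- but it is no longer a Point
```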
Stuart Sierra recently wrote an interesting article on "Solving the Expression Problem with Clojure 1.2", which also contains a section on defrecord:
https://web.archive.org/web/20110821210021/http://www.ibm.com/developerworks/java/library/j-clojure-protocols/#datatypes
I think the whole article is a good starting point for understanding protocols and records.
One other major benefit is the record has a type (its class) you can dispatch off of.
An example that uses this feature but is not representative of all possible uses is the following:
(defprotocol communicate
  (verbalize [this]))

(defrecord Cat [hunger-level]
  communicate
  (verbalize [this]
    (apply str (interpose " " (repeat hunger-level "meow")))))

(defrecord Dog [mood]
  communicate
  (verbalize [this]
    (case mood
      :happy "woof"
      "arf")))
(verbalize (->Cat 3))
; => "meow meow meow"
(verbalize (->Dog :happy))
; => "woof"
Use maps in most cases and records only when you require polymorphism. With maps alone you can still use multimethods; however, you need records if you want protocols. Given this, wait until you need protocols before resorting to records. Until then, avoid them in favor of more data-centric and simpler code.
In addition to what has been noted previously: besides being generally on par with or superior to maps in performance, and exposing the same programming interface, records enforce mild structure: the key names and the number of keys are fixed at definition time. This can be useful for avoiding silly errors when the same shape is expected across many values (or it can be artificially rigid otherwise).
Whatever the original motivations, this property too sets it apart from maps.