how do atoms differ from refs? - concurrency

How do atoms and refs actual differ?
I understand that atoms are declared differently and are updated via the swap! function, whereas refs use alter inside a dosync. The internal implementation, however, seems quite similar, which makes me wonder why I would use one and not the other.
For example, the doc page for atoms (http://clojure.org/atoms) states:
Internally, swap! reads the current value, applies the function to it, and attempts to compare-and-set it in. Since another thread may have changed the value in the intervening time, it may have to retry, and does so in a spin loop. The net effect is that the value will always be the result of the application of the supplied function to a current value, atomically. However, because the function might be called multiple times, it must be free of side effects.
The method described sounds quite similar to me to the STM used for refs.

The difference is that you can't coordinate changes between multiple atoms but you can coordinate changes between multiple refs.
Ref changes have to take place inside of a dosync block. All of the changes in the dosync take place or none of them do (atomic) but that extends to all changes to the refs within that dosync. This behaves a lot like a database transaction.
Let's say for example that you wanted to remove an item from one collection and add it to another but without anyone seeing a case where neither collection has the item. That's impossible to guarantee with atoms but you can guarantee it with refs.

Keep in mind that:
Use Refs for synchronous, coordinated and shared changes.
Use Atoms for synchronous, independent and shared changes.
To me, I don't care about the implementation differences between the atoms and refs. What I care about is the "Use Cases" of each on of them.
I use refs with I need to change the state of more than one reference type and I need the ATM semantics. I use atoms when I change the state of one reference type (object, depends on how you see it).
For example, if I need to increase the number of page hits in a web analytics system; I use atoms. If I need to transfer money between two accounts, I use refs.

Related

What is the real benefit of Immutable Objects

I always hear people saying that it's easier to manage immutable objects when working with multiple threads since when one thread accessed an immutable object, it doesn't have to worry that another thread is changing it.
Well, what happens if I have an immutable list of all employees in a company and a new employee is hired? In this case the immutable list has to be duplicated and the new copy of it has to include another employee object. Then the reference to list of employees should be directed to the new list.
When this scenario happens, the list itself doesn't change, but the reference to this list changes, and therefore the code "sees" different data.
If so, I don't understand why immutable objects makes our lives easier when working with multi-threads. What am I missing?
The main problem concurrent updates of mutable data, is that threads may perceive variable values stemming from different versions, i.e. a mixture of old and new values when speaking of a single update, forming an inconsistent state, violating the invariants of these variables.
See, for example, Java’s ArrayList. It has an int field holding the current size and a reference to an array whose elements are references to the contained objects. The values of these variables have to fulfill certain invariants, e.g. if the size is non-zero, the array reference is never null and the array length is always greater or equal to the size. When seeing values of different updates for these variables, these invariants do not hold anymore, so threads may see a list contents which never existed in this form or fail with spurious exceptions, reporting an illegal state that should be impossible (like NullPointerException or ArrayIndexOutOfBoundeException).
Note that thread safe or concurrent data structures only solve the problem regarding the internals of the data structure, so operations do not fail with spurious exceptions anymore (regarding the collection’s state, we’ve not talked about the contained element’s state yet), but operations iterating over these collections or looking at more than one contained element in any form, are still subject to possibly observing an inconsistent state regarding the contained elements. This also applies to the check-then-act anti-pattern, where an application first checks for a condition (e.g. using contains), before acting upon it (like fetching, adding or removing an element), whereas the condition might change in-between.
In contrast, a thread working on an immutable data structure may work on an outdated version of it, but all variables belonging to that structure are consistent to each other, reflecting the same version. When performing an update, you don’t need to think about exclusion of other threads, it’s simply not necessary, as the new data structures are not seen by other threads. The entire task of publishing a new version reduces to the task of publishing the root reference to the new version of your data structure. If you can’t stop the other threads processing the old version, the worst thing that can happen, is that you may have to repeat the operation using the new data afterwards, in other words, just a performance issue, in the worst case.
This works smooth with programming languages with garbage collection, as these allow it to let the new data structure refer to old objects, just replacing the changed objects (and their parents), without needing to worry about which objects are still in use and which not.
Here is an example: (a) we have a immutable list, (b) we have writer thread that adds elements to the list and (c) 1000 read threads that read the list without changing it.
It will work without locks.
If we have more than one writer thread we will still need a write lock. If we have to remove entries from the list we will need read-write lock.
Is it valuable? Do not know.

Memory Management Design

I am having some issues designing the memory management for an Entity-Component system and am having some issues coming up with the detail of the design. Here is what I am trying to do (note that all of these classes except Entity are actually virtual, so will have many different specific implementations):
The Program class will have a container of Entity's. The Program will loop through the Entity's and call update on each of them. It will also have a few SubSystem's, which it will also update on each loop through.
Each Entity will contain two types of Component's. All of them will be owned by a unique_ptr inside the Entity since their lifetime is directly tied to the entity. One type, UpdateableComponent, will be updated when the Entity.update() method is called. The second type SubSystemComponent will be updated from within their respective SubSystem.
Now here are my two problems. The first is that some of the Component's will control the lifetime of their parent Entity. My current idea for this is that Component will be able to call a function parent.die() which would change an internal flag inside Entity. Then after Program finishes looping through its updates, it loops through a second time and removes each Entity which was marked for deletion during the last update. I don't know if this is an efficient or smart way to go about it, although it should avoid the problem of an Entity dieing while its Component's are still updating.
The second issue is that I am not sure how to reference SubSystemComponent's from within SubSystem. Since they are refered to by a unique_ptr from inside Entity, I can't use a shared_ptr or a weak_ptr, and a standard pointer would end up dangling when the Entity owning a component dies. I could switch to a shared_ptr inside the Entity for these, then use a weak_ptr in the SubSystem's, however I would prefer to not do this because the whole point is that Entity completely owns its Component's.
So 2 things:
Can my first idea be improved upon in a meaningful way?
Is there an easy way to implement a weak_ptr sort of functionality with unique_ptr, or should I just switch to shared_ptr and just make sure to not create more than one shared_ptr to the SubSystemComponent's
Can my first idea be improved upon in a meaningful way?
Hard to say without knowing more about the nature of the work being undertaken. For example, you haven't said anything about your use of threads, but it seems your design gives equal priority to all the possible updates by cycling through things in a set sequence. For some things where low latency is important, or there's some useful prioritorisation that would ideally be done, a looping sequence like that isn't good, while other times it's ideal.
There are other ways to coordinate the Component-driven removal of Entities from the Program:
return codes could bubble up to the loop over entities, triggering an erase from the container of Entities,
an Observer pattern or lambda/std::function could allow the Program to specify cleanup behaviour.
Is there an easy way to implement a weak_ptr sort of functionality with unique_ptr,
No.
or should I just switch to shared_ptr and just make sure to not create more than one shared_ptr to the SubSystemComponent's
It sounds like a reasonable fit. You could even wrap a shared_ptr in a non-copyable class to avoid accidental mistakes.
Alternatively - as for Entity destruction above - you could coordinate the linkage between SubSystem and SubSystemComponent using events, so the SubSystemComponent destructor calls back to the SubSystem. An Observer pattern is one way to do this, a SubSystemComponent-side std::function fed a lambda is even more flexible. Either way, the Subsystem removes the SubSystemComponent from its records.

Should agents only hold immutable values

In Clojure Programming (OReilly) there is an example where both a java.io.BufferedWriter, and a java.io.Printwriter is put inside an agent (one agent for each). These are then written to inside the agent action. The book says that it is safe to perform io in an agent action. As I understand it all side effecting operations are ok inside an agent action. This is because agent actions inside commits are only run if the commit is succesful. And agent actions inside other agent actions are only run after the outer agent action completes successfully. Agent actions in general are guaranteed to be applied serially.
The Clojure documentation says this: "The state of an Agent should be itself immutable...".
As I understand it, the reason that atoms and refs must hold immutable values is so that clojure can roll back and retry commits several times.
What I don't understand is:
1: If Clojure makes sure that agent actions are only run once, why must agent values be immutable.
(for example if I hold a java array in an agent, and add to it in an agent action, this should be fine because the action will only run once. This is very similar to adding lines to a BufferedWriter)
2: Is java.io.BufferedWriter considered immutable? I understand that you could have a stable reference to one, but if the agent action is performing io on it, should it still be considered immutable?
3: If BufferedWriter is considered immutable, how do I decide if other similar java classes are immutable?
As I see it:
Values held by agents should be 'effectively immutable' (term borrowed from JCIP), in that they should always be conceptually equal to themselves.
This means, if I .clone() an object and compare both copies, original.equals(copy) should be true, no matter of what I do (and when).
In this sense, an instance of the typical Employee class full of getters/setters can not be guaranteed to equal to itself, in face of mutability: equals() will be defined as a field-by-field comparison, so the test can fail.
A BufferedWriter though, does not represent a value - its equality is defined in terms of being exactly the same object in memory. So it has a 'sound' mutability -unlike Employee's- which makes it apt for wrapping it in an agent.
I believe that you are right in that from the STM point of view, agent-value mutability wouldn't hurt a lot. But it would in that it'd break Clojure's time model, in which you 'cannot change the past', etc.
On deciding whether a Java class is immutable: impossible without diving into the implementation. You don't have to care about this too much though.
I'd make the following taxonomy of types in Java-land:
Mutable objects which (badly) represent values - Employee, etc. Never wrap them in a Clojure reference type.
Immutable objects which represent values - their immutability is reflected in the doc, or in naming conventions ("EmployeeBuilder"). Safe to wrap in any Clojure reference.
Unmanaged collection types - ArrayList, etc. Avoid except for interop purposes.
Managed reference/collection types - AtomicReference, blocking queues... They play fine with Clojure, dubious to wrap them in Clojure references though.
'IO' types - BufferedWriter, Swing stuff... you don't care about their mutability because they don't represent values at all - you just want them for their side effects. It might make sense to guard them in agents to guarantee access serialization.
The agent value should be immutable because someone can do this:
(def my-agent (agent (BufferedWriter.)))
(.write #my-agent "Hello world")
Which is basically modifying the agent value (in this case the writer) without going through agent control mechanism.
Yes, BufferedWriter is mutable because by writing to it you can change its internal state. It is like a pointer or reference and not a value.

Accessing different data members belonging to the same object from 2 different thread in C++

I have a few objects I need to perform actions on from different threads in c++. I known it is necessary to lock any variable that may be used by more than one thread at the same time, but what if each thread is accessing (writing to) a different data member of the same object? For example, each thread is calling a different method of the object and none of the methods called modify the same data member. Is it safe as long as I don't access the same data member or do I need to lock the whole object anyway?
I've looked around for explanations and details on this topic but every example seems to focus on single variables or non-member functions.
To summarize:
Can I safely access 2 different data members of the same object from 2 different thread without placing a lock on the whole object?
It is effectively safe, but will strongly reduce the performance of your code if you do that often. Computers use things called "cache lines" and if two processors are working on the same cache line they'll have to pass it back & forth all the time, slowing your work down.
Yes, it is safe to access different members of one object by different thread.
I think you can do that fine. But you better be sure that that the method internals never change to access the same data or the calling program doesn't decide to call another method that another thread is already using etc.
So possible, but potentially dangerous. But then it will also be quicker because you'll be avoiding calls to get mutexes. Pick your poison.
Well, yes, OK you can do it but, as others have pointed out, you should not have to. IMHO, access to data members should be via getter/setter methods so that any necessary mutexing/criticalSectioning/semaphoring/whatever is encapsulated within the object.
Is it safe as long as I don't access the same data member or do I need to lock the whole object anyway?
The answer totally depends upon the design of the class, However I would still say that it is always recommended to think 100 times before allowing multiple threads to access same object. Given the fact, If you are sure that the data is really independent their is NO need to lock the whole object.
Then a different question arises, "If variables are indeed independent Why they are in same class ?" Be careful threading kills if mistaken.
You might want to be careful. See for example http://gcc.gnu.org/ml/gcc/2012-02/msg00032.html
Depending on how the fields are accessed, you might run across similar hard to find problems.

Providing settable global vars

I need to provide settable vars for my users, like warn-on-reflection provided by clojure AFAIK they are not defined on the clojure side thats why we can set them.
Problem is my vars (all configuration stuff) are used in a lot of tight loops thats why I don't want to make them refs cause they MAY get set when application starts up and thats it no change during runtime, they will be read maybe millions of times so making them refs seems like wasting resources.
So the question is can I define settable vars in my case?
If you want settable global state with low overhead that is visible to all threads and doesn't need any STM transactions to control mutation, I'd recommend just using atoms:
(def some-global-value (atom 1))
Reads and writes to atoms are extremely low overhead.
warn-on-reflection is just a var, albeit one defined in Clojure's Java code rather than in core.clj. However, aside from where it is defined, there is nothing special about warn-on-reflection; it behaves exactly like any other var.
If you do not want to use vars, you may need to approach your problem from a different perspective than "must use global variables". It is common in functional programming to pass all necessary values into a function rather than relying on global variables. Perhaps it is time for you to consider such an approach.
You can create local binding using let, this way the parameters can be dynamic and you'll still have fast access. The only thing is that if someone change a parameter while the loop is running, the loop won't pick that up (IMO this should be the desired behaviour anyway).