Trouble understanding :state in Clojure

I understand what :state does: it creates a field in your class, like in Java. What I don't understand is the point of it. I only seem to see it used with Clojure-generated classes that extend other classes; http://www.fatvat.co.uk/2009/05/clojure-and-robocode.html is one example. I don't know Java, and I'm not very well versed in object-oriented programming. Can someone explain the point of :state to me, and where it fits in with Java interop?
Thanks a lot!
NOTE: When I say :state, I am referring to (:gen-class :state)

:state is simply a way of sharing some data between the functions generated as part of gen-class. Think of it as being exactly the same as the instance data of an object.

More information on the state and how to initialize it can be found in the article "gen-class – how it works and how to use it".
From the article:
:state defines a method which will return the object's state.
:init defines the name of the initialiser. This is a function which has to return a vector. The first element is again a vector of arguments to the super class constructor. In our case this is just the empty vector. The second element is the object's state.
In summary, init returns the object's initial state (as the second element of its vector) and is called when the object is instantiated. Strictly speaking, :state names a public final instance field rather than a method or function; reading that field (for example with (.state this)) gives the same value that init returned as the second element of its vector.
The article then goes on to show how to store an atom in that field so that the object's state can change, if needed.

I talked it over with hiredman on the #Clojure IRC channel, and he told me that the main point of it is to have state per instance. That makes sense.
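For readers who know neither Java nor Clojure well, a rough C++ analogy may help. This is not the code gen-class generates, and the names `Base`, `Counter`, `init_fn` and `Generated` are hypothetical. `init_fn` plays the role of :init by returning a pair of (superclass constructor arguments, initial state), `state` plays the role of the public final field that :state names, and a shared mutable `Counter` stands in for the atom the article uses when the state needs to change. Each instance gets its own state, which is hiredman's point.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical superclass whose constructor takes an argument,
// like the Java class a gen-class :extends.
struct Base {
    explicit Base(std::string name) : name_(std::move(name)) {}
    std::string name_;
};

// Hypothetical mutable cell standing in for a Clojure atom.
struct Counter { int value = 0; };

using InitResult = std::pair<std::string, std::shared_ptr<Counter>>;

// Plays the role of :init: {superclass ctor args, initial state}.
InitResult init_fn() { return {"robot", std::make_shared<Counter>()}; }

class Generated : public Base {
public:
    // Plays the role of :state: a public field set once, never reassigned.
    const std::shared_ptr<Counter> state;

    Generated() : Generated(init_fn()) {}

    // A "method" that reads and updates this instance's own state.
    int tick() { return ++state->value; }

private:
    explicit Generated(InitResult r)
        : Base(std::move(r.first)), state(std::move(r.second)) {}
};
```

Constructing two `Generated` objects and calling `tick()` on each shows that their counters advance independently.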

Related

Common implementation for scoped procedure

I am interested in the common practice/implementation of a pattern which I would call "scoped procedure". I don't even know what to call it correctly :)
The thing I am talking about is close to boost::scoped_exit and boost::scoped_connection: a class which
holds a user-provided functor,
shares the functor when the object is copied,
has a counter to count all objects sharing the given functor,
calls the functor when the counter becomes 0.
The candidate I can think of is boost::shared_ptr, but it seems a bit awkward to me to store some fake pointer in it.
Can you give any suggestion/best practice for this?
The case I want to apply it to is as follows. There is some registry class which stores a collection of records for associated objects of some other class. When an object is "registered" in the collection it receives a registration ID. When the object wants to unregister it just "releases the ID" (by going out of scope or calling a method similar to boost::scoped_connection::disconnect()). On release a user-provided procedure would be called to remove a corresponding record from the collection.
Thank you in advance!
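One common approach is to let a reference-counted smart pointer do the counting, as the question suspects, but to hide the "fake pointer" inside a small wrapper. The sketch below uses `std::shared_ptr<void>` (the standard counterpart of `boost::shared_ptr`) constructed from a null pointer plus a custom deleter: the deleter runs exactly once, when the last copy goes away, and ignores the pointer. `ScopedProcedure` and `Registry` are hypothetical names modeled on the question, not a Boost or standard API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical copyable handle: all copies share one procedure, which runs
// once, when the last copy is destroyed or released.
class ScopedProcedure {
public:
    ScopedProcedure() = default;
    explicit ScopedProcedure(std::function<void()> proc)
        // shared_ptr provides the shared count; its deleter is still
        // invoked when the count reaches zero, even for a null pointer.
        : guard_(nullptr, [proc](void*) { proc(); }) {}

    // Drops this copy's share, like scoped_connection::disconnect().
    void release() { guard_.reset(); }

    long owners() const { return guard_.use_count(); }

private:
    std::shared_ptr<void> guard_;
};

// Hypothetical registry in the spirit of the question.
class Registry {
public:
    ScopedProcedure add(const std::string& record) {
        int id = next_id_++;
        records_[id] = record;
        // The user-provided procedure: remove the record on release.
        return ScopedProcedure([this, id] { records_.erase(id); });
    }
    std::size_t size() const { return records_.size(); }

private:
    int next_id_ = 0;
    std::map<int, std::string> records_;
};
```

Because each handle captures `this`, the sketch assumes the registry outlives every handle it hands out; the procedure should also not throw, since it runs from a destructor.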

What is internal state?

What precisely is this "internal state" people talk about all the time? The term really irritates me, and I haven't been able to find a definition online.
From Object-Oriented Analysis and Design with Applications
The state of an object encompasses all of the (usually static) properties of the object plus the current (usually dynamic) values of each of these properties
In object oriented programming the objects can have state (data) and behavior (function).
The behavior specifies what the object can do, and it is usually conditioned by its state.
The state can be represented by any member or static variable, and it depends on the definition of the class the object is an instance of.
Update: The internal state refers to those private variables that affect the behavior of the object but are not visible from the outside world.
For example, let's say you have an HTTP client with the following interface:
#include <string>
struct HttpResponse;  // response type, defined elsewhere
class HttpClient {
public:
    explicit HttpClient(std::string host);
    HttpResponse get(const std::string& path);
    HttpResponse post(const std::string& path);
};
This object might have a getter for host but none for the current connection state.
A good optimization might be to keep the connection alive between requests (assuming the server allows it). In that case, on the first call to get or post the object has to establish the connection and save the socket descriptor in some internal variable that is not exposed to the user. The next time get or post is called, the connection is already established (and the user has no idea).
In this case, the connection is part of the internal state of the object.
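A minimal sketch of the lazily connected client described above, assuming a simulated `open_socket` function in place of real networking. Both it and the `g_connections_opened` counter are hypothetical, there only so the behavior can be observed. The host is visible through a getter; the socket descriptor is internal state that callers never see.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-ins: no real networking happens here.
struct HttpResponse { int status; std::string body; };
int g_connections_opened = 0;  // counts simulated connects, for the demo only
int open_socket(const std::string& /*host*/) { return 3 + g_connections_opened++; }

class HttpClient {
public:
    explicit HttpClient(std::string host) : host_(std::move(host)) {}
    const std::string& host() const { return host_; }  // visible state

    HttpResponse get(const std::string& path) {
        ensure_connected();
        return {200, "GET " + host_ + path};
    }
    HttpResponse post(const std::string& path) {
        ensure_connected();
        return {201, "POST " + host_ + path};
    }

private:
    // Connect on first use, then reuse the same socket.
    void ensure_connected() {
        if (socket_fd_ < 0) socket_fd_ = open_socket(host_);
    }
    std::string host_;
    int socket_fd_ = -1;  // internal state: no getter, invisible to callers
};
```

Calling `get` and then `post` opens only one simulated connection, even though nothing in the public interface reveals whether a connection exists.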
What is your internal state?
Hungry, thirsty.
Put some variables on that.
So, in OO terms,
my state is:
drinks-requirement: two glasses of water,
food-requirement: sandwich.
The same concept applies to an object: its state is the sum total of the object's variables.
Building on what @AdamBurry said, think of an object as a black box that another piece of code can use. That code instantiates it:
Order o = new Order();
Then the code asks for the object to modify itself:
OrderItem oi = new OrderItem("Widget", 5.5);
o.AddItemToOrder(oi);
Then the code asks for the object to do something.
o.GetTotal();
How does the order compute the new total, given the item that just got added? Does it have a list of OrderItems, complete with prices? You bet. It has internal details that the calling code may have no way of reaching. Those black-box details that the object needs to keep careful track of are the internal state of the object.
Variables that maintain the "dirty" state of an object are a much more practical example of something you may never want to expose to the "outside" world. Has the object been modified but not yet committed to the database? External code should never need to know this, but the object may need to.
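The Order example and the dirty flag can be sketched together in C++. The method names follow the example above; `Save` and `saves()` are hypothetical additions (a real `Save` would write to a database, and `saves()` exists only so the behavior can be checked). The item list and the `dirty_` flag are private, so callers only ever see the total.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct OrderItem {
    std::string name;
    double price;
};

class Order {
public:
    void AddItemToOrder(OrderItem item) {
        items_.push_back(std::move(item));
        dirty_ = true;  // modified but not yet saved
    }
    double GetTotal() const {
        double total = 0.0;
        for (const auto& item : items_) total += item.price;
        return total;
    }
    // Hypothetical persistence hook; a real one would write to a database.
    void Save() {
        if (!dirty_) return;  // nothing changed; callers never need to ask
        ++saves_;
        dirty_ = false;
    }
    int saves() const { return saves_; }  // exposed only for this demo

private:
    std::vector<OrderItem> items_;  // internal state: the line items
    bool dirty_ = false;            // internal state: unsaved changes?
    int saves_ = 0;
};
```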
What about an object that lets you step forward or backward through a list? Somewhere in that object, there's going to be an internal state variable that acts as a pointer to the current record. Again, the calling code would never need to see this, but when the code calls the .MoveNext() method, the object is going to have to increment that pointer by one to maintain the state of where it is in the list.
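A sketch of such a cursor, with the "pointer to the current record" kept as a private index. The class name `RecordCursor` and the `MovePrevious`/`Current` methods are hypothetical; only `MoveNext` comes from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class RecordCursor {
public:
    explicit RecordCursor(std::vector<std::string> records)
        : records_(std::move(records)) {}

    // Advance to the next record; returns false at the end.
    bool MoveNext() {
        if (pos_ + 1 >= records_.size()) return false;
        ++pos_;
        return true;
    }
    // Step back one record; returns false at the start.
    bool MovePrevious() {
        if (pos_ == 0) return false;
        --pos_;
        return true;
    }
    // Throws std::out_of_range if there are no records.
    const std::string& Current() const { return records_.at(pos_); }

private:
    std::vector<std::string> records_;
    std::size_t pos_ = 0;  // internal state: index of the current record
};
```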
The internal state of an object is the set of all its attributes' values. One particular aspect of the internal state is that applying a method to an object in a defined internal state (that is, a specific set of values for all its attributes) results in another defined, reproducible internal state, as long as the method depends only on that state and its arguments.
This aspect is important when you somehow record system states that you want to replay afterwards in a simulation of the recorded system. By recording the original internal state of an object you are able to reproduce all its subsequent internal states by simply calling its methods without having to store any additional data. However this is easier said than done in practice...
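A pseudo-random number generator is a compact illustration of this replay property: its entire internal state is one number, and every call to `next()` deterministically moves it to a new state. The sketch below uses a 32-bit linear congruential generator with the widely cited Numerical Recipes constants; `Lcg` is a hypothetical name. Recording the starting state is enough to replay every later state.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tiny LCG: its whole internal state is one 32-bit integer.
class Lcg {
public:
    explicit Lcg(std::uint32_t seed) : state_(seed) {}

    // Same state in, same state out; unsigned arithmetic wraps mod 2^32.
    std::uint32_t next() {
        state_ = state_ * 1664525u + 1013904223u;
        return state_;
    }
    std::uint32_t state() const { return state_; }

private:
    std::uint32_t state_;
};
```

Recording `state()` before the first call and constructing a second generator from it reproduces the same sequence of values.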
In C++, a const method does not alter the internal state.
A mutable attribute (that is, an attribute a const method may modify) can be altered without semantically affecting the object's internal state. At least, that is the contract the developer commits to when using this modifier...
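A typical use of `mutable` is a cache: computing a derived value inside a const method changes some members but not the object's semantic state. A hypothetical sketch (the `Vec2` class is illustrative only):

```cpp
#include <cassert>
#include <cmath>

class Vec2 {
public:
    Vec2(double x, double y) : x_(x), y_(y) {}

    // const: the observable state (x_, y_) is unchanged,
    // but the first call fills a cache.
    double length() const {
        if (!cached_) {
            length_ = std::sqrt(x_ * x_ + y_ * y_);
            cached_ = true;
            ++computations_;
        }
        return length_;
    }
    // Non-const: a real state change, so the cache is invalidated.
    void set_x(double x) { x_ = x; cached_ = false; }
    int computations() const { return computations_; }

private:
    double x_, y_;
    mutable double length_ = 0.0;  // cache, not semantic state
    mutable bool cached_ = false;
    mutable int computations_ = 0;
};
```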

Clojure modify LazySeq

I inherited some Java code that does the following:
1) it receives from Clojure a LazySeq object (which is made up of a number of PersistentHashMap objects)
2) it then passes this same LazySeq object (unchanged) back to a Clojure script where it is converted into a String and passed back to Java
The issue is that inside the Java code after step (1) and before step (2), I need to modify some of the PersistentHashMap objects inside the LazySeq and then proceed to step (2). Something like:
LazySeq seq = clojureFunctionReturningLazySeq();
//update the elements of the sequence
String result = clojureFunctionReceivingLazySeq(seq);
I cannot modify the Clojure script itself and the updating of the LazySeq has to happen inside the Java code. I checked the LazySeq API and I cannot find a method to modify (or add) an element.
Thank you,
Chris
Short answer: You can't. LazySeq and PersistentHashMap in Clojure are immutable.
Longer answer: Generally, Clojure code makes very few assumptions about the exact kind of list object it's receiving; most things work against ISeq, which, if you don't want to bother with the other Clojure types, is rather trivial to implement.
So you'd need to create a class that implements ISeq and returns transformed PersistentHashMaps (new maps built with assoc, since the originals can't be changed) as it walks through its parent LazySeq. Instantiate that class and pass it to clojureFunctionReceivingLazySeq(seq) instead.
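Clojure's `ISeq` lives in the Clojure runtime, so a faithful Java version needs that jar on the classpath. As a language-neutral illustration of the same idea, here is a C++ analogy in which a wrapper walks an immutable source sequence and hands out transformed copies, leaving the originals untouched. `Record`, `TransformedSeq` and its `first()`/`next()`/`empty()` methods are hypothetical; they only mirror the shape of `ISeq`, which signals the end differently (its `next()` returns null).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for PersistentHashMap.
using Record = std::map<std::string, std::string>;

// Analogy only: not Clojure's Java API.
class TransformedSeq {
public:
    TransformedSeq(std::shared_ptr<const std::vector<Record>> source,
                   std::function<Record(const Record&)> transform,
                   std::size_t index = 0)
        : source_(std::move(source)), transform_(std::move(transform)), index_(index) {}

    bool empty() const { return index_ >= source_->size(); }
    // Returns a transformed copy; the source record is never modified.
    Record first() const { return transform_(source_->at(index_)); }
    // A new view one element further along; this view is unchanged.
    TransformedSeq next() const { return TransformedSeq(source_, transform_, index_ + 1); }

private:
    std::shared_ptr<const std::vector<Record>> source_;
    std::function<Record(const Record&)> transform_;
    std::size_t index_;
};
```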

Should agents only hold immutable values?

In Clojure Programming (O'Reilly) there is an example where a java.io.BufferedWriter and a java.io.PrintWriter are each put inside an agent (one agent for each). These are then written to inside the agent actions. The book says that it is safe to perform I/O in an agent action. As I understand it, all side-effecting operations are OK inside an agent action. This is because agent actions sent inside a transaction are only dispatched if the transaction commits successfully, and agent actions sent inside other agent actions are only dispatched after the outer agent action completes successfully. Agent actions in general are guaranteed to be applied serially.
The Clojure documentation says this: "The state of an Agent should be itself immutable...".
As I understand it, atoms and refs must hold immutable values so that Clojure can discard and retry updates several times.
What I don't understand is:
1: If Clojure makes sure that agent actions are only run once, why must agent values be immutable?
(For example, if I hold a Java array in an agent and add to it in an agent action, this should be fine because the action will only run once. This is very similar to adding lines to a BufferedWriter.)
2: Is java.io.BufferedWriter considered immutable? I understand that you could have a stable reference to one, but if the agent action is performing I/O on it, should it still be considered immutable?
3: If BufferedWriter is considered immutable, how do I decide whether other similar Java classes are immutable?
As I see it:
Values held by agents should be 'effectively immutable' (term borrowed from JCIP), in that they should always be conceptually equal to themselves.
This means that if I .clone() an object and compare both copies, original.equals(copy) should be true, no matter what I do (or when).
In this sense, an instance of the typical Employee class full of getters/setters cannot be guaranteed to stay equal to itself in the face of mutability: equals() will be defined as a field-by-field comparison, so the test can fail.
A BufferedWriter, though, does not represent a value; its equality is defined in terms of being exactly the same object in memory. So it has a 'sound' mutability, unlike Employee's, which makes it apt for wrapping in an agent.
I believe that you are right in that, from the STM point of view, agent-value mutability wouldn't hurt a lot. But it would break Clojure's time model, in which you 'cannot change the past', etc.
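The clone-and-compare test can be illustrated in C++, where a copy plays the role of `.clone()` and `operator==` the role of `equals()`. `MutableEmployee` and `Employee` are hypothetical classes: the first fails the test once a setter runs, while the second never changes and returns new values instead. The `static_assert` reflects the point about `BufferedWriter`: a stream such as `std::ostringstream` is an identity object and cannot even be copied.

```cpp
#include <cassert>
#include <sstream>
#include <type_traits>

// Hypothetical getter/setter-style class: equality is field-by-field.
class MutableEmployee {
public:
    explicit MutableEmployee(int salary) : salary_(salary) {}
    void SetSalary(int s) { salary_ = s; }
    bool operator==(const MutableEmployee& o) const { return salary_ == o.salary_; }
private:
    int salary_;
};

// Hypothetical immutable alternative: "changes" produce a new value.
class Employee {
public:
    explicit Employee(int salary) : salary_(salary) {}
    Employee WithSalary(int s) const { return Employee(s); }
    bool operator==(const Employee& o) const { return salary_ == o.salary_; }
private:
    int salary_;  // no setter, so it never changes after construction
};

// I/O objects have identity rather than value semantics.
static_assert(!std::is_copy_constructible<std::ostringstream>::value,
              "streams are identity objects, not values");
```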
On deciding whether a Java class is immutable: impossible without diving into the implementation. You don't have to care about this too much though.
I'd make the following taxonomy of types in Java-land:
Mutable objects which (badly) represent values - Employee, etc. Never wrap them in a Clojure reference type.
Immutable objects which represent values - their immutability is reflected in the doc, or in naming conventions ("EmployeeBuilder"). Safe to wrap in any Clojure reference.
Unmanaged collection types - ArrayList, etc. Avoid except for interop purposes.
Managed reference/collection types - AtomicReference, blocking queues... They play fine with Clojure, dubious to wrap them in Clojure references though.
'IO' types - BufferedWriter, Swing stuff... you don't care about their mutability because they don't represent values at all - you just want them for their side effects. It might make sense to guard them in agents to guarantee access serialization.
The agent value should be immutable because someone can do this:
(def my-agent (agent (java.io.BufferedWriter. (java.io.StringWriter.))))
(.write @my-agent "Hello world") ; writes directly, bypassing send/send-off
This modifies the agent's value (here, the writer) without going through the agent's send/send-off mechanism.
Yes, BufferedWriter is mutable because by writing to it you can change its internal state. It is like a pointer or reference and not a value.

Object Oriented Design - The easiest case, but I'm confused anyway!

When I wrap some procedural code in a class (in my case C++, but that is probably not relevant here), I'm often confused about the best way to do it. By procedural code I mean something that you could easily put in a procedure, where you use the surrounding object mainly for clarity and ease of use (error handling, logging, transaction handling...).
For example, I want to write some code that reads stuff from the database, does some calculations on it and makes some changes to the database. To do this, it needs data from the caller.
What is the best way to get this data into the object? Let's assume that it needs 7 values and a list of integers.
My ideas are:
List of Parameters of the constructor
Set Functions
List of Parameters of the central function
The advantage of the first solution is that the caller has to deliver exactly what the class needs to do the job, and it also ensures that the data is available right after the object has been created. The object could then be stored somewhere, and the central function could be triggered by the caller whenever they want without any further interaction with the object.
It's almost the same with the second option, but now the central function has to check whether all the necessary data has been delivered by the caller. And the question is whether you have a separate set function for every piece of data or only one.
The last solution's only advantage is that the data doesn't have to be stored before execution. But then it looks like a normal function call, and the benefits of the class approach disappear.
How do you do something like that? Are my considerations correct? Am I missing some advantages/disadvantages?
This stuff is so simple but I couldn't find any resources on it.
Edit: I'm not talking about the database connection. I mean all the data needed for the procedure to complete, for example all the information about a bookkeeping transaction.
Let's do a poll: which do you prefer?
class WriteAddress {
public:
    WriteAddress(std::string name, std::string street, std::string city);
    void Execute();
};
or
class WriteAddress {
public:
    void Execute(std::string name, std::string street, std::string city);
};
or
class WriteAddress {
public:
    void SetName(std::string name);
    void SetStreet(std::string street);
    void SetCity(std::string city);
    void Execute();
};
or
class WriteAddress {
public:
    void SetData(std::string name, std::string street, std::string city);
    void Execute();
};
Values should be data members if they need to be used by more than one member function. So a database handle is a prime example: you open the connection to the database and get the handle, then you pass it in to several functions to operate on the database, and finally close it. Depending on your circumstances you may open it directly in the constructor and close it in the destructor, or just accept it as a value in the constructor and store it for later use by the member functions.
On the other hand, values that are only used by one member function and may vary every call should remain function parameters rather than constructor parameters. If they are always the same for every invocation of the function then make them constructor parameters, or just initialize them in the constructor.
Do not do two-stage construction. Requiring that you call a bunch of setXYZ functions on a class after the constructor before you can call a member function is a bad plan. Either make the necessary values initialized in the constructor (whether directly, or from constructor parameters), or take them as function parameters. Whether or not you provide setters which can change the values after construction is a different decision, but an object should always be usable immediately after construction.
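A minimal sketch of that advice, assuming a hypothetical in-memory `Database` stand-in: the handle, used by every call, is taken in the constructor and stored, while the address fields, which vary per call, stay parameters of `Execute`. `AddressWriter` is an illustrative name, not one of the poll's classes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a database handle; it just records writes.
struct Database {
    std::vector<std::string> rows;
};

class AddressWriter {
public:
    // Needed by every call, so it is stored as a member at construction.
    explicit AddressWriter(Database& db) : db_(db) {}

    // Varies per call, so it stays a parameter. The object is usable
    // immediately: no setter has to be called first.
    void Execute(const std::string& name, const std::string& street,
                 const std::string& city) {
        db_.rows.push_back(name + ", " + street + ", " + city);
    }

private:
    Database& db_;  // the database must outlive the writer
};
```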
Interface design is very important, but in your case what you need to learn is that worse is better.
First choose the simplest solution you have, write it now.
Then you'll see what the flaws are, so fix them.
Repeat until it's not important to fix them.
The idea is that you need experience to understand how to get directly to the "best" (or rather, the "least bad") solution to some type of problem; that's what we call a "design pattern". To get that experience you have to hit problems fast, solve them and try to deeply understand why something was wrong.
That's what you'll have to do each time you try something "new". Errors are not a problem if you fix them and learn from them.
You should use constructor parameters for all values that are necessary in any case (consider that many programming languages also support constructor overloading).
This leads to the second point: setters should be used to introduce optional parameters, or to update values.
You can also combine these methods: expect the necessary parameters in the constructor and then call their setter functions. This way you only have to do validity checks once (in the setters).
Central functions should only use temporary parameters (timestamps, ...).
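The advice above (required values through the constructor, validation in the setters, optional values through setters, only per-call values in the central function) can be sketched as follows. `Booking` and its members are hypothetical, loosely based on the bookkeeping transaction mentioned in the question.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

class Booking {
public:
    // Required values come through the constructor, which reuses the
    // setters so that the validity checks live in one place.
    Booking(const std::string& account, long amount_cents) {
        SetAccount(account);
        SetAmount(amount_cents);
    }
    void SetAccount(const std::string& account) {
        if (account.empty()) throw std::invalid_argument("empty account");
        account_ = account;
    }
    void SetAmount(long amount_cents) {
        if (amount_cents <= 0) throw std::invalid_argument("amount must be positive");
        amount_cents_ = amount_cents;
    }
    // Optional value: a plain setter with an empty default.
    void SetMemo(const std::string& memo) { memo_ = memo; }

    // The central function takes only a temporary, per-call value.
    std::string Execute(long timestamp) const {
        return std::to_string(timestamp) + " " + account_ + " " +
               std::to_string(amount_cents_) + (memo_.empty() ? "" : " " + memo_);
    }

private:
    std::string account_;
    long amount_cents_ = 0;
    std::string memo_;
};
```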
First off, it sounds like you are trying to do too much at once. Reading, calculating and updating are all separate operations, which can themselves probably be split down further.
A technique I use when I'm thinking about the design of a method or class is to think: 'what do I want the highest-level method to ideally look like?' i.e. think about the separate components of the method and split them down. That's top-down design.
In your case, I envisaged this in my head (C#):
public static void Dostuff(...)
{
Data d = ReadDatabase(...);
d.DoCalculations(...);
UpdateDatabase(d);
}
Then do the same thing for each of those methods.
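Filling in that top-level outline, a C++ sketch might look like the following. The in-memory `Db` map and the 1% interest calculation are hypothetical stand-ins for a real database and a real calculation; only the four function names and `Data` come from the outline above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory "database": account name -> balance in cents.
using Db = std::map<std::string, long>;

struct Data {
    std::vector<std::string> names;
    std::vector<long> balances;

    // Hypothetical calculation: add 1% interest (integer cents, truncated).
    void DoCalculations() {
        for (long& b : balances) b += b / 100;
    }
};

Data ReadDatabase(const Db& db) {
    Data d;
    for (const auto& row : db) {
        d.names.push_back(row.first);
        d.balances.push_back(row.second);
    }
    return d;
}

void UpdateDatabase(Db& db, const Data& d) {
    for (std::size_t i = 0; i < d.names.size(); ++i) db[d.names[i]] = d.balances[i];
}

// The top-level function reads like the design: read, calculate, write.
void DoStuff(Db& db) {
    Data d = ReadDatabase(db);
    d.DoCalculations();
    UpdateDatabase(db, d);
}
```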
When you come to passing parameters to your method, you need to consider whether the data you're passing in is stored or not, i.e. whether your class is static (it cannot be instantiated, and is instead just a collection of methods etc.) or whether you make objects of the class. In the latter case, each object of the class has a state.
If the parameters can indeed be considered attributes of the class, they define its state, and should be stored as private variables with getters and setters for each, where necessary. If the class instead has no state, it should be static and the parameters passed directly to the method.
Either way, it is common, and not considered bad practice, to have both a constructor and a few get/set functions where necessary. It is also common to have to check the state of the object at the beginning of a method, so I wouldn't worry about that.
As you can see, it largely depends on what else you are doing in this class.
The reason you can't find many resources on this is that the 'right' answer is hugely domain-specific; it depends heavily on the specific project. The best way to find out is usually by experiment.
(For example: you're right about the advantages of the first two methods. An obvious disadvantage is the memory used to store the data for as long as the object exists. This disadvantage doesn't matter in the least if your project needs two of these data objects; it's potentially a huge problem if you need a very large number. If it's a big live dataset, you're probably better off querying for data as you need it, as implied by your third solution... but not definitely, as there are times when it's better to cache the data.)
When in doubt, do a quick test implementation with a simplest-possible interface; just writing it will frequently make it clearer what the pros and cons are for your project.
Specifically addressing your example it seems as though you are still thinking too procedurally.
You should make an object that initialises the connection to the database doing all relevant error checking. Then have a method on the object that writes the values in whatever convenient way you prefer. When the object is destroyed it should release the handle to the database. That would be the object oriented way to approach the problem.
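A sketch of that RAII approach, assuming a hypothetical C-style API (`db_open`/`db_close`, simulated here with a counter so it runs anywhere): the constructor acquires and checks the handle, the destructor releases it, and copying is disabled so that exactly one object owns each handle. `AddressStore` is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical C-style database API, simulated so the sketch runs anywhere.
int g_open_handles = 0;
int db_open(const std::string& dsn) {
    if (dsn.empty()) return -1;  // simulate a connection failure
    ++g_open_handles;
    return 1;
}
void db_close(int /*handle*/) { --g_open_handles; }

class AddressStore {
public:
    // Acquire and check the handle up front.
    explicit AddressStore(const std::string& dsn) : handle_(db_open(dsn)) {
        if (handle_ < 0) throw std::runtime_error("could not connect");
    }
    // Released automatically when the object is destroyed.
    ~AddressStore() { db_close(handle_); }

    // One owner per handle.
    AddressStore(const AddressStore&) = delete;
    AddressStore& operator=(const AddressStore&) = delete;

    void Write(const std::string& name, const std::string& street,
               const std::string& city) {
        // A real version would send this through handle_.
        written_.push_back(name + "|" + street + "|" + city);
    }
    std::size_t count() const { return written_.size(); }

private:
    int handle_;
    std::vector<std::string> written_;
};
```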
I assume the only responsibility of your WriteAddress class is to write an address to a database or an output stream. If so, then you should not worry about getters and setters for the address details; instead, define an interface AddressDataProvider that is to be implemented by all classes with which your WriteAddress class will collaborate.
One of the methods on that interface would be GetAddressParts(), which would return an array of strings as required by WriteAddress. Any class that implements that method will need to respect this array structure.
Then, in WriteAddress, define a setter SetDataProvider(AddressDataProvider). This method will be called by the code that instantiates your WriteAddress object(s).
Finally, in your Execute() method, obtain the data that are required by calling GetAddressParts() on the "data provider" that you set and write out your address.
Notice that this design shields WriteAddress from subsidiary activities that are not strictly part of its responsibilities. So, WriteAddress does not care how the address details are retrieved; it does not even care about knowing and holding the address details. It just knows from where to get them and how to write them out.
This is obvious even in the description of this design: only two names WriteAddress and AddressDataProvider come up; there is no mention of database or how to pass the address details. This is usually an indication of high cohesion and low coupling.
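A C++ sketch of this design, assuming the output target is a `std::ostream` passed to the constructor (the answer leaves the target open) and that the provider is passed by pointer. `AddressDataProvider`, `GetAddressParts`, `SetDataProvider`, `WriteAddress` and `Execute` come from the answer; the `Customer` provider is hypothetical. A database-backed provider would implement the same interface.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// The collaborator interface described above.
class AddressDataProvider {
public:
    virtual ~AddressDataProvider() = default;
    virtual std::vector<std::string> GetAddressParts() const = 0;
};

class WriteAddress {
public:
    // Assumption: the output target is supplied at construction.
    explicit WriteAddress(std::ostream& out) : out_(out) {}
    void SetDataProvider(const AddressDataProvider* provider) { provider_ = provider; }

    // Knows where to get the parts and how to write them, nothing more.
    void Execute() {
        if (!provider_) throw std::logic_error("no data provider set");
        const std::vector<std::string> parts = provider_->GetAddressParts();
        for (std::size_t i = 0; i < parts.size(); ++i)
            out_ << (i ? ", " : "") << parts[i];
        out_ << '\n';
    }

private:
    std::ostream& out_;
    const AddressDataProvider* provider_ = nullptr;
};

// Hypothetical provider used only for illustration.
class Customer : public AddressDataProvider {
public:
    Customer(std::string name, std::string city)
        : name_(std::move(name)), city_(std::move(city)) {}
    std::vector<std::string> GetAddressParts() const override { return {name_, city_}; }

private:
    std::string name_, city_;
};
```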
I hope this helps.
You can implement each approach (they don't exclude each other) and then see which ones are most useful.