How to achieve MVCC in actor model - concurrency

Let's say I have represented my Employee entity as an actor. I have 2 services also modeled as actors. Both of them manipulate the state of an Employee actor it has received by sending it messages. Now let's say both services are processing the same actor. It is perfectly possible for the Employee actor to receive state-changing messages from the two services A and B in the following order:
Employee <- |a1|a2|a3|b1|b2|b3|
This is fine. But sometimes it's not:
Employee <- |a1|b1|a2|b2|a3|b3|
Maybe a2 depended on state changed by a1, but b1 changed it in between.
By analogy, databases have transactions so that we can work with a single snapshot/version of the data throughout the transaction's lifetime.
In an imperative model, we would lock the whole employee object and update its state, similar to how a database would do it.
So is it possible for an actor to receive batched messages that are processed as one atomic series? Or is my data modeling itself flawed?

Since I don't know what a1-a3 and b1-b3 actually represent, I can only make assumptions in answering the question. To me it appears that your messages are too fine-grained. For example, perhaps a1-a3 each try to set just one attribute of the data. The same probably goes for b1-b3.
However, your messages should focus on causing behaviors on the Employee, not on setting individual attributes. Thus, as you yourself suggest, design your messages as behaviors, where a1-a3 would collapse into a single operation request. This is often called the Command pattern, where you command/tell an object/actor to do something. Doing so gives you a correct transactional boundary per message.
Note that above I said "object/actor." You can/should use the same approach in your object designs, not just for actors. Think in terms of intention-revealing interfaces and telling your domain model what you want it to do for you, rather than treating domain objects/actors as dumb data holders.
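To make the "one command per behavior" idea concrete, here is a minimal sketch in Python. It is not Akka code; the Promote command, its fields and the EmployeeActor class are hypothetical names chosen for illustration. It assumes a1-a3 were really three attribute updates that belong to one business operation.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class Promote:
    """One command carrying the whole intent (hypothetical name and fields)."""
    new_title: str
    raise_percent: int


class EmployeeActor:
    """Toy actor: private state, one mailbox, one message handled at a time."""

    def __init__(self, title: str, salary: int) -> None:
        self.title = title
        self.salary = salary
        self.mailbox: "queue.Queue[Promote]" = queue.Queue()

    def tell(self, message: Promote) -> None:
        self.mailbox.put(message)

    def drain(self) -> None:
        # Each message is handled to completion before the next one is taken,
        # so both fields change together and nothing interleaves.
        while not self.mailbox.empty():
            self._handle(self.mailbox.get())

    def _handle(self, message: Promote) -> None:
        self.title = message.new_title
        self.salary += self.salary * message.raise_percent // 100


employee = EmployeeActor(title="Engineer", salary=1000)
employee.tell(Promote("Senior Engineer", 10))  # sent by service A
employee.tell(Promote("Staff Engineer", 5))    # sent by service B
employee.drain()
```

Because each service sends one self-contained command, the order in which A's and B's messages arrive no longer splits a logical operation in half.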
That's my take on your question. HTH.
Vaughn

Both of them manipulate the state of an Employee actor it has received
by sending it messages.
Well. An Actor by definition does not share its state or its manipulation with any other Actor. Any state manipulation is transactional within the boundaries of handling one message. An Actor represents an aggregate in that sense. Messages are usually Domain events/commands; they have scope and are part of the Ubiquitous Language.
DDD reasoning helps a lot when thinking about Actors.
My two cents
Sergiy
<><

Get ActorRef of multiple Akka Actor instances by path

I am creating a pool of instances of the same Actor.
Later in my app, I want to be able to reference a specific instance in this actor "pool" by its unique path.
val instance1 = context.spawn(ActorA(), "actorA_1")
val instance2 = context.spawn(ActorA(), "actorA_2")
val instance3 = context.spawn(ActorA(), "actorA_3")
etc
This is done at the initialisation of the application and the goal is to allow any actor of the app to reference this instance if it knows its unique path...
I want to achieve something like this:
val actorRef = getActorByItsUniquePath(path)
actorRef ! sendMessage(...)
I don't know if I need to use the classic Actor Api or the typed API (receptionist) and/or if I need to store the unique path (path + UID) of each Actor instance on a HashMap and retrieve this path later.
I can't find any clear direction on how to implement that in the docs.
You are on the right path. In Classic Actors this is done by actor selection and Actor Path. Note that "unique path" potentially has to include the node in a clustered setup.
In typed Actors you would use the Receptionist and the ServiceKey acts as your "unique path".
So, you don't need to use one API or the other: you should use classic actors or typed actors as you prefer. Just choose the discovery method (ActorPath or Receptionist) based on which API you have chosen.
However, I will add one caveat. Generally it is preferred not to do this: you are effectively exposing an implementation detail to your client and removing some flexibility/resiliency. This is discussed here in the docs: https://doc.akka.io/docs/akka/current/actors.html#identifying-actors-via-actor-selection under "It is always preferable to communicate with other Actors using their ActorRef instead of relying upon ActorSelection. Exceptions are:". This is not to say that you absolutely shouldn't do this, as the actor discovery APIs obviously exist for a reason. Just that you should "prefer" not to depend on looking up actors via a unique path, and that you should have a good reason for not using something like a router instead.
EDIT TO REPLY TO COMMENT:
You say in the comment below that you will have actor instances with the same path, but this is impossible. Actor paths are unique*; each instance has a unique path. It seems that you may think that the path is associated with the actor, but the path is the actor instance. See the docs and also see the section in the Classic docs about naming actors where it talks about the fact that you must give a unique name to each instance. If you don't explicitly give a unique name to your instance, the system will auto-generate one (e.g. /user/myactor/$1, /user/myactor/$2, ...).
If you do have a uid generated already, just specify it as the name of the instance when you spawn it and you will end up with a path that looks like: akka://my-actorsystem@mynode:port/user/parentactor/youruid and you will be able to use that path to look it up.
Your situation really isn't that unusual. It is a pretty common pattern to have many instances of the same actor, each maintaining state. This is also why I mention using a router pattern, because you have more flexibility with that design. In fact, this is also basically the pattern behind Akka Sharding. You may want to look into that as an option as well, especially if you are using clustering.
If you use a Receptionist and typed actors it's basically the exact example shown in the Receptionist docs. Every time you create a new actor instance you would register it with the Receptionist with the Register message and your uid key. Then you just look it up later with a Find message. It's a wee bit more complicated because Receptionists don't enforce uniqueness (unlike Paths) and because you have to be aware of types (and therefore adapt response types), but that's all the Receptionist really is: a key-value store of actor instances implemented as an actor you send messages to.
*Technically, actor paths are unique to living actor instances. You could, in theory, create an actor /user/foo, stop it, and then create a second instance with the same name. But I didn't want to complicate the above explanation as I didn't think that detail was important here.
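The "key-value store of actor instances" description can be pictured with a toy model. This is not the Akka Receptionist API: ToyReceptionist, register and find are made-up names, and plain strings stand in for ActorRefs. The sketch only illustrates that a key maps to a set of instances, so uniqueness is not enforced the way it is for paths.

```python
from collections import defaultdict


class ToyReceptionist:
    """Maps a key to the set of registered instances; keys are not unique."""

    def __init__(self) -> None:
        self._registry = defaultdict(set)

    def register(self, key: str, actor_ref: str) -> None:
        self._registry[key].add(actor_ref)

    def find(self, key: str) -> set:
        # Returns a copy; an unknown key yields an empty set.
        return set(self._registry.get(key, set()))


receptionist = ToyReceptionist()
receptionist.register("actorA-uid-1", "ref-to-instance-1")
receptionist.register("actorA-uid-2", "ref-to-instance-2")
# Nothing stops a second registration under the same key.
receptionist.register("actorA-uid-1", "ref-to-instance-1b")
```

A caller that expects exactly one instance per key therefore has to decide what to do when a lookup returns zero or several entries.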

Repository pattern: isn't getting the entire domain object bad behavior (read method)?

A repository pattern is there to abstract away the actual data source, and I do see a lot of benefits in that. But a repository should not use IQueryable, to prevent leaking DB information, and it should always return domain objects, not DTOs or POCOs. It is this last point I have trouble getting my head around.
If a repository always has to return a domain object, doesn't that mean it fetches way too much data most of the time? Let's say it returns an employee domain object with forty properties, and the service and view layers consuming that object actually use only five of them.
It means the database has fetched a lot of unnecessary data and pumped it across the network. Doing that with one object is hardly noticeable, but if millions of records are pushed across that way and a lot of the data is thrown away every time, is that not considered bad behavior?
Yes, when adding, editing or deleting the object, you will use the entire object. But reading the entire object and pushing it to another layer that uses only a fraction of it does not use the underlying database and network in the most optimal way. What am I missing here?
There's nothing preventing you from having a separate read model (which could be a separately stored projection of the domain or a query-time projection) and separating out the command and query concerns - CQRS.
If you then put something like GraphQL in front of your read side then the consumer can decide exactly what data they want from the full model down to individual field/property level.
Your commands still interact with the full domain model as before (except where it's a performance no-brainer to use set based operations).
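As a rough illustration of a query-time projection on the read side, the sketch below uses Python's built-in sqlite3 module. The employee table, its columns and the employee_list_view function are invented for the example; the point is only that the read path selects the few columns a screen needs instead of hydrating the full domain object.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, "
    "department TEXT, salary INTEGER, notes TEXT)"
)
conn.execute(
    "INSERT INTO employee VALUES (1, 'Ada', 'R&D', 5000, 'long text ...')"
)


def employee_list_view(connection):
    """Read side: fetch only the columns the list screen needs."""
    rows = connection.execute(
        "SELECT id, name, department FROM employee ORDER BY id"
    )
    return [{"id": r[0], "name": r[1], "department": r[2]} for r in rows]


view = employee_list_view(conn)
```

Commands would still load and save the full Employee through the repository; only queries take this lighter path.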

Event Sourcing: concurrently creating conflicting events

I am trying to implement an Event Sourcing system using Kafka and have run into the following issue. During a new user sign-up I want to check if the username the user provided is already taken. However, consider the case where 2 users are trying to sign-up at the same time providing the same username.
In my understanding of how ES works the controller that processes the sign-up request will check if the request is valid, it will then send a new event (e.g. NewUser) to Kafka, and finally that event will be picked up by another controller which will persist it in a materialized view (e.g. Postgres DB). The problem is that the validation of the request is done against the materialized view but the actual persistence to it happens later. So because the 2 requests are being processed in parallel (by different service instances) they might both pass the validation, resulting in 2 NewUser messages. However, when the second controller tries to persist those 2 NewUser messages in the database saving the second event will fail because of the violation of the uniqueness constraint for the username.
Any ideas on how to address this?
Thanks.
UPDATE:
In particular, I would like to verify whether the following are accepted approaches to the problem:
use the username as the userId (restrictive)
send an event to a topic partitioned by username and, when validation is done, send an event to another topic
Initial validation against the materialized view won't be enough in most scenarios where you have constraints. There can always be some relevant events that haven't been materialized yet. There are two main concurrency control approaches to ensure that correct results are generated:
1. Pessimistic approach:
If you want to validate constraints before you publish an event, you need to lock relevant resources (entity, aggregate or data set). The locking means your services must not be able to publish events on these resources. After this point, to get the current state of your data:
You can wait until all events published before locking are materialized.
You can read current state from the database and apply events on it in a separate process.
2. Optimistic approach:
In this approach, you perform your validations after publishing events. To achieve this, you need to implement a feedback mechanism. The process which consumes events and performs validations should be able to publish validation results. You can perform the validations in-memory when possible. Otherwise, you can rely on your materialized data store.
Martin Kleppmann talks about a two-step solution for exactly the same problem here and in his book. In this solution, there are two topics: "claims" and "registrations". First, you publish a claim to take the username, then try to write it to the database, and finally publish the result to the registrations topic. At a conceptual level, it follows the same steps as the second approach you have mentioned. In the validation step, it relies on the database, which avoids implementing validation logic and keeping secondary indexes in memory.
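The two-topic flow described above can be sketched without Kafka. In this minimal Python version a list stands in for one partition of the "claims" topic, the returned list stands in for the "registrations" topic, and an in-memory SQLite table plays the database. The function and table names are assumptions made for illustration.

```python
import sqlite3


def process_claims(claims):
    """Consume claims in log order and emit one registration result each."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (username TEXT PRIMARY KEY, request_id TEXT)")
    registrations = []
    for request_id, username in claims:
        try:
            with db:  # commits on success, rolls back on error
                db.execute(
                    "INSERT INTO users VALUES (?, ?)", (username, request_id)
                )
            registrations.append((request_id, username, "accepted"))
        except sqlite3.IntegrityError:
            # The database's uniqueness constraint does the validation.
            registrations.append((request_id, username, "rejected"))
    return registrations


results = process_claims(
    [("req-1", "alice"), ("req-2", "alice"), ("req-3", "bob")]
)
```

Each request handler would then wait for the registration result that carries its own request id before answering the user.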
During a new user sign-up I want to check if the username the user provided is already taken.
You may want to review Greg Young's essay on Set Validation.
In my understanding of how ES works the controller that processes the sign-up request will check if the request is valid, it will then send a new event (e.g. NewUser) to Kafka, and finally that event will be picked up by another controller which will persist it in a materialized view (e.g. Postgres DB).
That's a little bit different from the usual arrangement. (You may also want to review Greg's talk on polyglot data.)
Suppose we begin with two writers; that's fine, but if there is going to be a single point of truth, then you are going to need synchronization somewhere.
The usual arrangement is to use a form of optimistic concurrency: when processing a request, you reserve a copy of your original state, then you do your calculation, and finally you send the book of record a replace(originalState,newState).
So at this point, we have two writes racing toward the book of record
replace(red,green)
replace(red,blue)
At the book of record, the writes are processed in series.
[...,replace(red,blue)...,replace(red,green)]
So when the book of record processes replace(red,blue), it performs a check that yes, the state is currently red, and swaps in blue. Later, when the book of record tries to process replace(red,green), the book of record performs the check, which fails because the state is no longer red.
So one of the writes has succeeded, and the other fails; the latter can propagate the failure outwards, or retry, or..., precisely what depends on the specific mechanics in question. A retry should mean, of course, reload the "original state", at which point the model would discover that some previous edit already claimed the username.
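The replace(red,green) / replace(red,blue) race can be written down as a small compare-and-swap sketch. BookOfRecord is a hypothetical class; a lock stands in for whatever serialization the real book of record provides.

```python
import threading


class BookOfRecord:
    """Single authority that applies writes one at a time."""

    def __init__(self, state):
        self._state = state
        self._lock = threading.Lock()

    def replace(self, original_state, new_state) -> bool:
        # Swap only if nobody changed the state since the caller read it.
        with self._lock:
            if self._state != original_state:
                return False
            self._state = new_state
            return True

    def read(self):
        with self._lock:
            return self._state


book = BookOfRecord("red")
blue_won = book.replace("red", "blue")
green_won = book.replace("red", "green")
```

The losing writer sees False, reloads the current state, and only then decides whether its change still makes sense.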
Any ideas on how to address this?
Single writer per stream makes the rest of the problem pretty simple, by eliminating the ambiguity introduced by having multiple in memory copies of the model.
Multiple writers using a synchronous write to the durable store is probably the most common design. It requires an event store that understands the idea of writing to a specific location in a stream -- aka "expected version".
You can perform an asynchronous write, and then start doing other work until you get an acknowledgement that the write succeeded (or not, or until you time out, or)....
There's no magic -- if you want uniqueness (or any other sort of invariant enforcement, for that matter), then everybody needs to agree on a single authority, and anybody else who wants to propose a change won't know if it has been accepted without getting word back from the authority, and needs to be prepared for a rejected proposal.
(Note: this shouldn't be a surprise -- if you were using a traditional design with current state stored in an RDBMS, then your authority would be a user table in the database, with a uniqueness constraint on the username column, and the race would be between the two insert statements trying to finish their transaction first....)
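The "expected version" idea mentioned above can be sketched with a toy in-memory event store. ToyEventStore, WrongExpectedVersion and the stream naming are assumptions for illustration, not the API of any particular product; the version is simply the number of events already in the stream.

```python
class WrongExpectedVersion(Exception):
    """Raised when the stream moved on since the writer last read it."""


class ToyEventStore:
    def __init__(self):
        self._streams = {}

    def append(self, stream, expected_version, events):
        current = self._streams.setdefault(stream, [])
        # The version is the number of events already in the stream.
        if len(current) != expected_version:
            raise WrongExpectedVersion(
                f"{stream}: expected {expected_version}, found {len(current)}"
            )
        current.extend(events)
        return len(current)


store = ToyEventStore()
first = store.append("username-alice", 0, ["UsernameClaimed(req-1)"])
try:
    store.append("username-alice", 0, ["UsernameClaimed(req-2)"])
    second_failed = False
except WrongExpectedVersion:
    second_failed = True
```

Both writers believed the stream was empty; only the first append matches, and the second has to reload and discover that the name is taken.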

WSO2ESB: Store state between sequence invocations

I was wondering about the proper way to store state between sequence invocations in WSO2ESB. In other words, if I have a scheduled task that invokes sequence S, at the end of iteration 0 I want to store some String variable (let's call it ID), and then I want to read this ID at the start (or in the middle) of iteration 1, and so on.
To be more precise, I want to get a list of new SMS messages from an existing service, Twilio to be exact. However, Twilio only lets me get messages for selected days, i.e. there's no way for me to say give me only new messages (since I last checked / newer than certain message ID). Therefore, I'd like to create a scheduled task that will query Twilio and pass only new messages via REST call to my service. In order to do this, my sequence needs to query Twilio and then go through the returned list of messages, and discard messages that were already reported in the previous invocation. Now, to do this I need to store some state between different task/sequence invocations, i.e. at the end of the sequence I need to store the ID of the newest message in the current batch. This ID can then be used in subsequent invocation to determine which messages were already reported in the previous invocation.
I could use DBLookup and DB Report mediators, but that seems like overkill (using a database to store a single string) and not very performance friendly. On the other hand, as far as I can see Class mediators are instantiated as singletons, so I could create a custom Class mediator that would manage this state and filter the list of messages to be sent to my service. I am quite sure that this will work, but I was wondering if this is the way to go, or whether there is a more elegant solution that I missed.
We can think of 3 options here.
Using DBLookup/Report as you've suggested
Using the Carbon registry to store the values (this again uses DBs in the back end)
Using a Custom mediator to hold the state and read/write it from/to properties
Out of these three, the third one will obviously deliver the best performance since everything will be in memory. It's also quite simple to implement; some time back I did something similar and wrote a blog post here.
But on the other hand, the first two options can keep the state even when the server crashes, if it's a concern for your use case.
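The logic such a stateful custom mediator would hold can be sketched independently of WSO2. The Python below is not a Class mediator; NewMessageFilter is a made-up name, and it assumes the service returns messages newest first, each with a unique "id" field. The state lives in memory only, so it is lost on a restart, as noted above.

```python
class NewMessageFilter:
    """Holds the last reported message ID between invocations (memory only)."""

    def __init__(self):
        self.last_seen_id = None

    def filter_new(self, messages):
        # Assumption: messages arrive newest first, each with a unique "id".
        fresh = []
        for message in messages:
            if message["id"] == self.last_seen_id:
                break
            fresh.append(message)
        if messages:
            self.last_seen_id = messages[0]["id"]
        return fresh


mediator = NewMessageFilter()
first_run = mediator.filter_new([{"id": "m2"}, {"id": "m1"}])
second_run = mediator.filter_new(
    [{"id": "m4"}, {"id": "m3"}, {"id": "m2"}, {"id": "m1"}]
)
```

On the second invocation only m4 and m3 are passed on, because the scan stops at the ID remembered from the first run.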
Since ESB 4.9.0, you can persist and read properties from the registry using the Property mediator.
https://docs.wso2.com/display/ESB490/Property+Mediator

How do you perform service-oriented parent-child transactions?

Example:
A SalesOrder is composed of a SalesOrderHeader and one or more SalesOrderItems. When editing an existing SalesOrder, the SalesOrderHeader can be modified and SalesOrderItems can be added, modified and deleted. All changes must be saved in a single transaction. Multiple users may edit the SalesOrder at the same time with optimistic concurrency.
I believe that the requirement to have the save done in a single transaction encourages us to communicate both the SalesOrderHeader and the SalesOrderItems in a single service call. The implication of packaging up the child data with its parent is that there will need to be some understanding as to whether the child data is added, modified or deleted.
Change tracking of the child entities can happen either on the server or on the client.
Change tracking on the server
The idea with this strategy is that the client can modify the SalesOrder to its will without tracking which SalesOrderItems are added, modified or deleted. The state of the SalesOrderItems will be determined on the server when the save service is called.
The server should remain stateless between service calls. This means that the server can’t retain any information about the state of the SalesOrder between its retrieval and its eventual save. The only option left is for the server to determine the state of its entities by comparing the modified object graph to the database object graph.
With NHibernate, there is a merge function to accomplish this. With Entity Framework, the highest voted feature request is to have this added. There’s also an open source implementation of this for EF called GraphDiff.
This sounds great in theory because it makes the services very easy to design and use. However, I see two major issues with this strategy. The first is performance. The entire object graph must be sent back on every save. Whether or not a SalesOrderItem was modified, it must be sent back or the server will assume it’s been deleted. The second problem is even more critical and it has to do with concurrency. If User 1 adds a SalesOrderItem to a SalesOrder and User 2 makes a change to the same SalesOrder, when User 2 saves the server will assume that the SalesOrderItem added by User 1 should be deleted because it was not included in User 2’s object graph. I don’t see a way this can be prevented in any implementation of server side change tracking.
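The concurrency problem with server-side graph comparison can be reproduced in a few lines. The sketch below is not NHibernate's merge or GraphDiff; diff_items is a hypothetical helper that classifies child rows by comparing two dictionaries keyed by item id.

```python
def diff_items(db_items, submitted_items):
    """Classify child rows by comparing two {id: fields} dictionaries."""
    added = [i for i in submitted_items if i not in db_items]
    deleted = [i for i in db_items if i not in submitted_items]
    modified = [
        i for i in submitted_items
        if i in db_items and submitted_items[i] != db_items[i]
    ]
    return added, modified, deleted


# User 1 saved first and added item 2, so the database now holds two items.
database = {1: {"qty": 2}, 2: {"qty": 5}}
# User 2 loaded the order before that and only changed item 1's quantity.
user2_graph = {1: {"qty": 3}}

added, modified, deleted = diff_items(database, user2_graph)
```

From the server's point of view item 2 is simply missing from User 2's graph, so a naive diff marks it as deleted even though User 2 never saw it.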
Change tracking on the client
The alternative is to have the client track changes to its entities and communicate that state when calling the save service. One benefit is that the client does not need to send its unchanged child entities. This helps with performance. A downside is that all entities will need an additional property named something along the lines of “ObjectState” to track whether it’s added, modified or deleted. This makes the entity models on the server quite messy and filled with concerns unrelated to the business domain. This also puts the onus on the different consumers of the service to maintain this state. Another problem is that it becomes difficult to deal with deleted entities. Should the SalesOrderHeader maintain a list of deleted SalesOrderItems? Or should the SalesOrderItems get assigned a state of deleted which must be filtered out by the client UI?
I know that the Breeze JavaScript library has its own implementation of client-side entity tracking, but my concern is that its implementation requires both client-side and server-side components. Shouldn't the service layer isolate which technology we use on either side? What if non-JavaScript clients want to use my services?
Question
I would think this is a common scenario that should be addressed by the majority of service implementations. Have I made any incorrect assumptions or am I doing anything out of the ordinary? What strategy have you implemented? Are there any reasonable alternatives?
Full disclosure: I work with Breeze, and I think change tracking on the client is the way to go. Change tracking on the client allows stateless servers, reduces traffic between the client and server, and allows offline use.
In Breeze, the "ObjectState" that you mention is called the EntityAspect, and each entity has one, but it is not part of the domain model. The server-side entities don't need an EntityAspect, but the server-side service has to know how to handle the entity state information that comes from the client.
Basically, the service needs to create, update, or delete entities based on the information coming from the client. There are existing server-side backends for Breeze that do all this already (in .NET (EF and NHibernate), Java, PHP, Node, and Ruby), but you can also write your own. Your server just needs to know how to talk to the client.
Let's say we've updated a SalesOrder and added a new SalesOrderItem. The Breeze client sends a save bundle that looks something like this:
{
  "entities": [
    {
      "Id": 123,
      "Title": "My Updated Title",
      "OrderDate": "2014-08-03T07:00:00.000Z",
      "entityAspect": {
        "entityTypeName": "SalesOrder:#My.DomainModel",
        "entityState": "Modified",
        "originalValuesMap": {
          "Title": "My Original Title"
        },
        "autoGeneratedKey": {
          "propertyName": "Id",
          "autoGeneratedKeyType": "Identity"
        }
      }
    },
    {
      "Id": -1,
      "SalesOrderId": 123,
      "ProductId": 456,
      "Quantity": 11,
      "entityAspect": {
        "entityTypeName": "SalesOrderItem:#My.DomainModel",
        "entityState": "Added",
        "originalValuesMap": {
        },
        "autoGeneratedKey": {
          "propertyName": "Id",
          "autoGeneratedKeyType": "Identity"
        }
      }
    }
  ]
}
Here, SalesOrder with Id# 123 has been modified (its Title has been changed). The entityAspect includes the originalValuesMap which shows what the previous Title was.
The server would need to update the existing SalesOrder with the new value. Whether the server needs to query the existing SalesOrder from the database before applying the changes is implementation-dependent.
A new SalesOrderItem has been added. A temporary Id, -1, was created for it on the client. The server needs to create and persist a new SalesOrderItem and generate a real Id for it.
The response from the server should contain the entities that were created and updated, and KeyMapping information that shows what server-generated keys map to the temporary client-side keys, so that the client can replace them.
Change tracking is not a simple problem, but Breeze tries to do the hard parts for you.
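For readers writing their own backend, the server-side handling of such a save bundle might look roughly like the sketch below. It is not Breeze's server code: apply_save_bundle, the single in-memory table and the tempValue/realValue field names are assumptions made for illustration, and a real implementation would run inside one database transaction.

```python
import itertools


def apply_save_bundle(bundle, table, next_id):
    """Apply Added/Modified/Deleted entities and report temp-to-real keys."""
    key_mappings = []
    saved = []
    for entity in bundle["entities"]:
        state = entity["entityAspect"]["entityState"]
        fields = {k: v for k, v in entity.items() if k != "entityAspect"}
        if state == "Added":
            temp_id = fields["Id"]
            fields["Id"] = next(next_id)  # server-generated key
            key_mappings.append(
                {"tempValue": temp_id, "realValue": fields["Id"]}
            )
            table[fields["Id"]] = fields
        elif state == "Modified":
            table[fields["Id"]].update(fields)
        elif state == "Deleted":
            del table[fields["Id"]]
            continue
        else:
            raise ValueError(f"unsupported entityState: {state}")
        saved.append(table[fields["Id"]])
    return {"entities": saved, "keyMappings": key_mappings}


items = {}
bundle = {"entities": [{
    "Id": -1, "SalesOrderId": 123, "ProductId": 456, "Quantity": 11,
    "entityAspect": {"entityState": "Added"},
}]}
response = apply_save_bundle(bundle, items, itertools.count(1000))
```

The client would use the returned key mappings to replace its temporary Id of -1 with the server-generated one.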
I'd like to piggy back on Steve's answer.
We should be clear: the onus for implementing the Order-graph (AKA "Order aggregate") transaction in a relational data model falls on the developer. BreezeJS (and Breeze helpers for .NET servers) can facilitate but you have to make it work.
The key to making this work is including the root element of the aggregate - the Order - in all changes to any entity within the aggregate. If you add, delete, or modify an OrderItem, make sure you modify the Order at the same time.
How? By bumping the Order's concurrency property (e.g., the rowVersion) and making sure that Breeze KNOWS this is your concurrency property.
You must implement root entity optimistic concurrency if you want to ensure Order aggregate consistency.
Now you can detect if someone else has made a change to any part of the Order aggregate. That could be a change to the Order or an add/mod/delete of one of its OrderItems.
You do not have to include all OrderItems in the change-set when you save a changed Order aggregate. You only need to include the OrderItems that are added/modified/deleted.
Of course some other user may make a change to the Order aggregate before you save yours. When you try to save yours, the save will fail with an optimistic concurrency error.
Upon detecting an optimistic concurrency error for an Order, make sure the client removes the entire Order aggregate from cache - the Order and all of its OrderItems - and then re-fetches the aggregate. Don't just re-fetch the root Order entity and start messing with its items. Make sure you remove the entire aggregate from cache and then re-fetch it (the Order and its items).
If everyone follows this protocol you'll be in fine shape on the server.
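The protocol above boils down to "every change to a child also bumps and checks the root's version". A minimal in-memory sketch follows; save_order_aggregate, the row_version field and the dictionary layout are hypothetical, and in a real database the check and the bump would have to happen atomically in one transaction (for example an UPDATE guarded by the expected version).

```python
class ConcurrencyError(Exception):
    """Raised when the aggregate changed after the caller loaded it."""


def save_order_aggregate(db, order_id, expected_row_version, item_changes):
    """Apply child changes only if the root's version is the one we read."""
    order = db[order_id]
    if order["row_version"] != expected_row_version:
        raise ConcurrencyError("order aggregate changed since it was loaded")
    for item_id, fields in item_changes.items():
        if fields is None:
            order["items"].pop(item_id, None)  # delete
        else:
            order["items"][item_id] = fields   # add or modify
    order["row_version"] += 1  # any child change also touches the root
    return order["row_version"]


db = {123: {"row_version": 1, "items": {1: {"qty": 2}}}}
# Users 1 and 2 both loaded the aggregate at row_version 1.
user1_version = save_order_aggregate(db, 123, 1, {2: {"qty": 5}})
try:
    save_order_aggregate(db, 123, 1, {1: {"qty": 3}})
    user2_conflict = False
except ConcurrencyError:
    user2_conflict = True
```

User 2's save is rejected even though it touched a different item, which is the signal to drop the cached aggregate and re-fetch it as a whole.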