How do you perform service-oriented parent-child transactions? - web-services

Example:
A SalesOrder is composed of a SalesOrderHeader and one or more SalesOrderItems. When editing an existing SalesOrder, the SalesOrderHeader can be modified and SalesOrderItems can be added, modified and deleted. All changes must be saved in a single transaction. Multiple users may edit the SalesOrder at the same time with optimistic concurrency.
I believe that the requirement to have the save done in a single transaction encourages us to communicate both the SalesOrderHeader and the SalesOrderItems in a single service call. The implication of packaging up the child data with its parent is that there will need to be some understanding as to whether the child data is added, modified or deleted.
Change tracking of the child entities can happen either on the server or on the client.
Change tracking on the server
The idea with this strategy is that the client can modify the SalesOrder at will without tracking which SalesOrderItems are added, modified or deleted. The state of the SalesOrderItems will be determined on the server when the save service is called.
The server should remain stateless between service calls. This means that the server can’t retain any information about the state of the SalesOrder between its retrieval and its eventual save. The only option left is for the server to determine the state of its entities by comparing the modified object graph to the database object graph.
With NHibernate, there is a merge function to accomplish this. With Entity Framework, the highest voted feature request is to have this added. There’s also an open source implementation of this for EF called GraphDiff.
This sounds great in theory because it makes the services very easy to design and use. However, I see two major issues with this strategy. The first is performance. The entire object graph must be sent back on every save. Whether or not a SalesOrderItem was modified, it must be sent back or the server will assume it’s been deleted. The second problem is even more critical and it has to do with concurrency. If User 1 adds a SalesOrderItem to a SalesOrder and User 2 makes a change to the same SalesOrder, when User 2 saves the server will assume that the SalesOrderItem added by User 1 should be deleted because it was not included in User 2’s object graph. I don’t see a way this can be prevented in any implementation of server side change tracking.
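To make the concurrency problem concrete, here is a minimal sketch of server-side change detection by graph comparison. It is not NHibernate's merge or GraphDiff; the function name `diff_children` and the dict-based "tables" are hypothetical stand-ins, and Python is used only as a neutral illustration language.

```python
def diff_children(db_items, submitted_items):
    """Classify child rows by comparing the submitted graph to the database graph.

    Both arguments map item id -> item data (a dict). Hypothetical helper.
    """
    added = [i for i in submitted_items if i not in db_items]
    deleted = [i for i in db_items if i not in submitted_items]
    modified = [i for i in submitted_items
                if i in db_items and submitted_items[i] != db_items[i]]
    return {"added": added, "modified": modified, "deleted": deleted}


# Both users load the order while it has items 1 and 2.
loaded = {1: {"qty": 5}, 2: {"qty": 1}}

# User 1 adds item 3 and saves, so the database now holds three items.
database = {**loaded, 3: {"qty": 7}}

# User 2 only changes the quantity of item 1, working from the stale copy.
user2_graph = {1: {"qty": 6}, 2: {"qty": 1}}

result = diff_children(database, user2_graph)
print(result)  # item 3 is classified as deleted although User 2 never touched it
```

The comparison alone cannot tell "User 2 deleted item 3" apart from "User 2 never saw item 3", which is the issue described above.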
Change tracking on the client
The alternative is to have the client track changes to its entities and communicate that state when calling the save service. One benefit is that the client does not need to send its unchanged child entities. This helps with performance. A downside is that all entities will need an additional property named something along the lines of “ObjectState” to track whether it’s added, modified or deleted. This makes the entity models on the server quite messy and filled with concerns unrelated to the business domain. This also puts the onus on the different consumers of the service to maintain this state. Another problem is that it becomes difficult to deal with deleted entities. Should the SalesOrderHeader maintain a list of deleted SalesOrderItems? Or should the SalesOrderItems get assigned a state of deleted which must be filtered out by the client UI?
I know that the Breeze JavaScript library has its own implementation of client-side entity tracking, but my concern is that its implementation requires both client-side and server-side components. Shouldn't the service layer isolate which technology we use on either side? What if non-JavaScript clients want to use my services?
Question
I would think this is a common scenario that should be addressed by the majority of service implementations. Have I made any incorrect assumptions or am I doing anything out of the ordinary? What strategy have you implemented? Are there any reasonable alternatives?

Full disclosure: I work with Breeze, and I think change tracking on the client is the way to go. Change tracking on the client allows stateless servers, reduces traffic between the client and server, and allows offline use.
In Breeze, the "ObjectState" that you mention is called the EntityAspect, and each entity has one, but it is not part of the domain model. The server-side entities don't need an EntityAspect, but the server-side service has to know how to handle the entity state information that comes from the client.
Basically, the service needs to create, update, or delete entities based on the information coming from the client. There are existing server-side backends for Breeze that do all this already (in .NET (EF and NHibernate), Java, PHP, Node, and Ruby), but you can also write your own. Your server just needs to know how to talk to the client.
Let's say we've updated a SalesOrder and added a new SalesOrderItem. The Breeze client sends a save bundle that looks something like this:
{
  "entities": [
    {
      "Id": 123,
      "Title": "My Updated Title",
      "OrderDate": "2014-08-03T07:00:00.000Z",
      "entityAspect": {
        "entityTypeName": "SalesOrder:#My.DomainModel",
        "entityState": "Modified",
        "originalValuesMap": {
          "Title": "My Original Title"
        },
        "autoGeneratedKey": {
          "propertyName": "Id",
          "autoGeneratedKeyType": "Identity"
        }
      }
    },
    {
      "Id": -1,
      "SalesOrderId": 123,
      "ProductId": 456,
      "Quantity": 11,
      "entityAspect": {
        "entityTypeName": "SalesOrderItem:#My.DomainModel",
        "entityState": "Added",
        "originalValuesMap": {},
        "autoGeneratedKey": {
          "propertyName": "Id",
          "autoGeneratedKeyType": "Identity"
        }
      }
    }
  ]
}
Here, SalesOrder with Id# 123 has been modified (its Title has been changed). The entityAspect includes the originalValuesMap which shows what the previous Title was.
The server would need to update the existing SalesOrder with the new value. Whether the server needs to query the existing SalesOrder from the database before applying the changes is implementation-dependent.
A new SalesOrderItem has been added. A temporary Id, -1, was created for it on the client. The server needs to create and persist a new SalesOrderItem and generate a real Id for it.
The response from the server should contain the entities that were created and updated, and KeyMapping information that shows what server-generated keys map to the temporary client-side keys, so that the client can replace them.
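If you write your own backend, the handling described above might look roughly like the following sketch. It is not the code of any existing Breeze server library: `apply_save_bundle`, the in-memory `tables` dict, and the key-mapping field names `tempValue` and `realValue` are assumptions made for illustration, and a real implementation would run the whole loop inside one database transaction.

```python
import itertools


def apply_save_bundle(bundle, tables, next_ids):
    """Apply each entity in the bundle according to its entityState.

    tables:   type name -> {id: row}; an in-memory stand-in for a database
    next_ids: type name -> iterator yielding server-generated ids
    Returns the saved entities plus the temp-to-real key mappings.
    """
    saved, key_mappings = [], []
    for entity in bundle["entities"]:
        aspect = entity["entityAspect"]
        type_name = aspect["entityTypeName"]
        state = aspect["entityState"]
        # The entityAspect is transport metadata, not part of the stored row.
        row = {k: v for k, v in entity.items() if k != "entityAspect"}
        table = tables.setdefault(type_name, {})
        if state == "Added":
            temp_id = row["Id"]
            row["Id"] = next(next_ids[type_name])  # replace the temporary key
            key_mappings.append({"entityTypeName": type_name,
                                 "tempValue": temp_id,
                                 "realValue": row["Id"]})
            table[row["Id"]] = row
        elif state == "Modified":
            table[row["Id"]].update(row)
            row = table[row["Id"]]
        elif state == "Deleted":
            del table[row["Id"]]
            continue
        else:
            raise ValueError("unsupported entityState: " + state)
        saved.append(row)
    return {"entities": saved, "keyMappings": key_mappings}


ORDER = "SalesOrder:#My.DomainModel"
ITEM = "SalesOrderItem:#My.DomainModel"
tables = {ORDER: {123: {"Id": 123, "Title": "My Original Title"}}}
next_ids = {ITEM: itertools.count(1000)}  # pretend identity column

bundle = {"entities": [
    {"Id": 123, "Title": "My Updated Title",
     "entityAspect": {"entityTypeName": ORDER, "entityState": "Modified"}},
    {"Id": -1, "SalesOrderId": 123, "ProductId": 456, "Quantity": 11,
     "entityAspect": {"entityTypeName": ITEM, "entityState": "Added"}},
]}
response = apply_save_bundle(bundle, tables, next_ids)
print(response["keyMappings"])
```

The sketch ignores foreign-key fix-up for children that point at a newly added parent with a temporary key; a real server has to rewrite those references as well.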
Change tracking is not a simple problem, but Breeze tries to do the hard parts for you.

I'd like to piggy back on Steve's answer.
We should be clear: the onus for implementing the Order-graph (AKA "Order aggregate") transaction in a relational data model falls on the developer. BreezeJS (and Breeze helpers for .NET servers) can facilitate but you have to make it work.
The key to making this work is including the root element of the aggregate - the Order - in all changes to any entity within the aggregate. If you add, delete, or modify an OrderItem, make sure you modify the Order at the same time.
How? By bumping the Order's concurrency property (e.g., the rowVersion) and making sure that Breeze KNOWS this is your concurrency property.
You must implement root entity optimistic concurrency if you want to ensure Order aggregate consistency.
Now you can detect if someone else has made a change to any part of the Order aggregate. That could be a change to the Order or an add/mod/delete of one of its OrderItems.
You do not have to include all OrderItems in the change-set when you save a changed Order aggregate. You only need to include the OrderItems that are added/modified/deleted.
Of course some other user may make a change to the Order aggregate before you save yours. When you try to save yours, the save will fail with an optimistic concurrency error.
Upon detecting an optimistic concurrency error for an Order, make sure the client removes the entire order aggregate from cache - the Order and all of its OrderItems - and then re-fetches the aggregate. Don't just re-fetch the root Order entity and start messing with its items. Make sure you remove the entire aggregate from cache and then re-fetch it (the order and its items).
If everyone follows this protocol you'll be in fine shape on the server.
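Here is a minimal sketch of that protocol on the server side, assuming an in-memory store. The class `OrderStore`, the method `save_aggregate` and the `row_version` field are hypothetical names, not Breeze APIs; in a real system the version check and the item changes would happen inside one database transaction.

```python
class ConcurrencyError(Exception):
    """Raised when the caller's copy of the aggregate is stale."""


class OrderStore:
    def __init__(self):
        # order id -> {"row_version": int, "items": {item id: data}}
        self.orders = {}

    def save_aggregate(self, order_id, expected_version, item_changes):
        """Apply item changes only if the root's version is unchanged.

        item_changes: list of (state, item_id, data) tuples, where state is
        "added", "modified" or "deleted". Only changed items are sent.
        """
        order = self.orders[order_id]
        if order["row_version"] != expected_version:
            raise ConcurrencyError("order %s was changed by someone else" % order_id)
        for state, item_id, data in item_changes:
            if state == "deleted":
                del order["items"][item_id]
            else:
                order["items"][item_id] = data
        # Every change inside the aggregate bumps the root's version.
        order["row_version"] += 1
        return order["row_version"]


store = OrderStore()
store.orders[123] = {"row_version": 1, "items": {1: {"qty": 5}, 2: {"qty": 1}}}

# Both users loaded the order at version 1. User 1 adds an item first.
new_version = store.save_aggregate(123, 1, [("added", 3, {"qty": 7})])

# User 2 still holds version 1, so the save is rejected.
try:
    store.save_aggregate(123, 1, [("modified", 1, {"qty": 6})])
    conflict = False
except ConcurrencyError:
    conflict = True
print(new_version, conflict)
```

After the rejection, User 2 would discard the cached aggregate, re-fetch the order with its items (now including item 3), and reapply the change.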

Related

Event Sourcing: concurrently creating conflicting events

I am trying to implement an Event Sourcing system using Kafka and have run into the following issue. During a new user sign-up I want to check if the username the user provided is already taken. However, consider the case where 2 users are trying to sign-up at the same time providing the same username.
In my understanding of how ES works the controller that processes the sign-up request will check if the request is valid, it will then send a new event (e.g. NewUser) to Kafka, and finally that event will be picked up by another controller which will persist it in a materialized view (e.g. Postgres DB). The problem is that the validation of the request is done against the materialized view but the actual persistence to it happens later. So because the 2 requests are being processed in parallel (by different service instances) they might both pass the validation, resulting in 2 NewUser messages. However, when the second controller tries to persist those 2 NewUser messages in the database saving the second event will fail because of the violation of the uniqueness constraint for the username.
Any ideas on how to address this?
Thanks.
UPDATE:
In particular, I would like to verify whether the following are accepted approaches to the problem:
use the username as the userId (restrictive)
send an event to a topic partitioned by username and when validation is done send an event to another topic
Initial validation against the materialized view won't be enough in most scenarios where you have constraints. There can always be some relevant events that haven't been materialized yet. There are two main concurrency control approaches to ensure that correct results are generated:
1. Pessimistic approach:
If you want to validate constraints before you publish an event, you need to lock relevant resources (entity, aggregate or data set). The locking means your services must not be able to publish events on these resources. After this point, to get the current state of your data:
You can wait until all events published before locking are materialized.
You can read current state from the database and apply events on it in a separate process.
2. Optimistic approach:
In this approach, you perform your validations after publishing events. To achieve this, you need to implement a feedback mechanism. The process which consumes events and performs validations should be able to publish validation results. You can perform the validations in-memory when possible. Otherwise, you can rely on your materialized data store.
Martin Kleppmann talks about a two-step solution for exactly the same problem here and in his book. In this solution, there are two topics: "claims" and "registrations". First, you publish a claim to take the username, then try to write it to the database, and finally publish the result to the registrations topic. At a conceptual level, it follows the same steps as the second approach you have mentioned. In the validation step, it avoids implementing validation logic and keeping secondary indexes in memory by relying on the database.
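The two-topic flow can be sketched without Kafka at all. In the sketch below, plain lists stand in for the "claims" and "registrations" topics and a set stands in for the database's uniqueness constraint; `process_claims` and the message fields are hypothetical. The important assumption is that all claims for the same username arrive at one consumer in a fixed order (for example, by partitioning on username).

```python
def process_claims(claims, taken_usernames):
    """Consume claims in log order and emit one registration result per claim.

    claims:          list of {"request_id": ..., "username": ...} messages
    taken_usernames: set standing in for a table with a unique constraint
    """
    registrations = []
    for claim in claims:
        username = claim["username"]
        if username in taken_usernames:
            status = "rejected"   # the insert would violate the constraint
        else:
            taken_usernames.add(username)
            status = "accepted"
        registrations.append({"request_id": claim["request_id"],
                              "username": username,
                              "status": status})
    return registrations


# Two sign-ups race for the same name; the log decides who came first.
claims_topic = [
    {"request_id": "r1", "username": "alice"},
    {"request_id": "r2", "username": "alice"},
    {"request_id": "r3", "username": "bob"},
]
registrations_topic = process_claims(claims_topic, set())
for message in registrations_topic:
    print(message["request_id"], message["status"])
```

Each request handler then waits for the registration message carrying its own request id before answering the user.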
During a new user sign-up I want to check if the username the user provided is already taken.
You may want to review Greg Young's essay on Set Validation.
In my understanding of how ES works the controller that processes the sign-up request will check if the request is valid, it will then send a new event (e.g. NewUser) to Kafka, and finally that event will be picked up by another controller which will persist it in a materialized view (e.g. Postgres DB).
That's a little bit different from the usual arrangement. (You may also want to review Greg's talk on polyglot data.)
Suppose we begin with two writers; that's fine, but if there is going to be a single point of truth, then you are going to need synchronization somewhere.
The usual arrangement is to use a form of optimistic concurrency; when processing a request, you reserve a copy of your original state, then you do your calculation, and finally you send the book of record a `replace(originalState,newState)`.
So at this point, we have two writes racing toward the book of record:
replace(red,green)
replace(red,blue)
At the book of record, the writes are processed in series.
[...,replace(red,blue)...,replace(red,green)]
So when the book of record processes replace(red,blue), it performs a check that yes, the state is currently red, and swaps in blue. Later, when the book of record tries to process replace(red,green), the book of record performs the check, which fails because the state is no longer red.
So one of the writes has succeeded, and the other fails; the latter can propagate the failure outwards, or retry, or..., precisely what depends on the specific mechanics in question. A retry should mean, of course, reload the "original state", at which point the model would discover that some previous edit already claimed the username.
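As a rough illustration, the `replace(originalState,newState)` check is a compare-and-swap. The class below is a hypothetical in-process stand-in for the book of record; a lock makes the check and the swap one atomic step, which is what processing the writes "in series" provides.

```python
import threading


class BookOfRecord:
    """Single authority holding the current state (illustrative only)."""

    def __init__(self, state):
        self._state = state
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._state

    def replace(self, original_state, new_state):
        """Swap in new_state only if the state is still original_state."""
        with self._lock:
            if self._state != original_state:
                return False  # someone else got there first
            self._state = new_state
            return True


book = BookOfRecord("red")
first = book.replace("red", "blue")    # state is red, so this succeeds
second = book.replace("red", "green")  # state is now blue, so this fails
print(first, second, book.read())
```

The failed writer can report the failure or reload the current state and retry its calculation against it.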
Any ideas on how to address this?
Single writer per stream makes the rest of the problem pretty simple, by eliminating the ambiguity introduced by having multiple in memory copies of the model.
Multiple writers using a synchronous write to the durable store is probably the most common design. It requires an event store that understands the idea of writing to a specific location in a stream -- aka "expected version".
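The "expected version" idea can be sketched like this. `EventStream` and `WrongExpectedVersion` are hypothetical names for illustration, not the API of any particular event store, and the sketch assumes one stream per username.

```python
class WrongExpectedVersion(Exception):
    """Raised when the stream has moved on since the writer last read it."""


class EventStream:
    def __init__(self):
        self.events = []

    @property
    def version(self):
        # The version is simply the number of events written so far.
        return len(self.events)

    def append(self, event, expected_version):
        """Write the event only at the position the writer expects."""
        if expected_version != len(self.events):
            raise WrongExpectedVersion(
                "expected %d, stream is at %d" % (expected_version, len(self.events)))
        self.events.append(event)
        return len(self.events)


stream = EventStream()  # e.g. the stream for username "alice"

# Two writers both read the empty stream (version 0) and decide the name is free.
seen_by_a = stream.version
seen_by_b = stream.version

stream.append({"type": "NewUser", "request": "a"}, expected_version=seen_by_a)
try:
    stream.append({"type": "NewUser", "request": "b"}, expected_version=seen_by_b)
    rejected = False
except WrongExpectedVersion:
    rejected = True
print(stream.version, rejected)
```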
You can perform an asynchronous write, and then start doing other work until you get an acknowledgement that the write succeeded (or not, or until you time out, or)....
There's no magic -- if you want uniqueness (or any other sort of invariant enforcement, for that matter), then everybody needs to agree on a single authority, and anybody else who wants to propose a change won't know if it has been accepted without getting word back from the authority, and needs to be prepared for a rejected proposal.
(Note: this shouldn't be a surprise -- if you were using a traditional design with current state stored in a RDBMS, then your authority would be a user table in the database, with a uniqueness constraint on the username column, and the race would be between the two insert statements trying to finish their transaction first....)

REST API - Update of single resource changes multiple others

I'm looking for a way to deal with the following problem:
Imagine you modify a resource, and that subsequently causes updates to other resources.
E.g. you issue a PUT to, say, /api/orders/1234, which by definition changes the state of all other Orders of the given user. There may be UI clients that display a table of Orders, and they should know that not just a single item in the table was updated, but possibly others as well.
Now, is there any standard way to inform clients about such a situation?
So far I can only think of sending back the 205 Reset Content HTTP status code to inform the client that it should refresh its state, as more than a single thing was changed.
There are multiple solutions.
1. You can define specific resources as non-cacheable, so the client does not cache them at all. (no-store)
2. You can try giving a max-age of 0, so the client will have to re-validate those resources always. In this case you might have to implement ETags and conditional GETs, but it will be easier on the server than option 1.
3. Some push method like WebSockets.
If you really want to "notify" potentially multiple clients of a change, then it sounds like you would need option 3.
However, correctly configured caching is normally good enough. For example you could mark not-yet-executed orders as not cached (max-age=0), but as soon as it is executed, you might mark it to be cached indefinitely, since it can not change anymore.
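For the re-validation option, the server side of a conditional GET could look roughly like the sketch below. It is framework-free: `make_etag` and `conditional_get` are hypothetical helpers, the ETag is just a hash of the serialized resource, and the `If-None-Match` handling is simplified to a single exact match.

```python
import hashlib
import json


def make_etag(resource):
    """Derive a quoted ETag from the resource's serialized content."""
    body = json.dumps(resource, sort_keys=True).encode("utf-8")
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(resource, if_none_match=None):
    """Return (status, headers, body) for a GET with optional If-None-Match."""
    etag = make_etag(resource)
    headers = {"ETag": etag, "Cache-Control": "max-age=0"}
    if if_none_match == etag:
        return 304, headers, None      # client copy is still current
    return 200, headers, resource      # send the fresh representation


order = {"id": 1234, "status": "pending"}

status1, headers1, _ = conditional_get(order)                    # first fetch
status2, _, body2 = conditional_get(order, headers1["ETag"])     # unchanged
order["status"] = "executed"                                     # changed by another PUT
status3, _, body3 = conditional_get(order, headers1["ETag"])     # stale tag
print(status1, status2, status3)
```

With this in place, a client that re-validates its table of orders pays only for the rows that actually changed.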

Ember Data Sync - LocalStorage+REST+RealTime+Online/Offline

We have a combination of requirements in terms of data access.
Pre-load some reference data.
We need reference data to survive browser restarts instead of just living in memory to avoid loading it all the time. I'm currently using the LocalStorageAdapter for that.
Once we have it, we would like to sync changes (polling or using Socket.IO in the background and updating the LocalStorage could do the trick)
There're other models that are more transactional, where we would need to directly go to the Server and get/save them. It would be nice to use something like the RESTAdapter for that.
Lastly, there're some operations that should work off-line and changes should be synced later.
To make it more concrete:
We pre-load vendor and "favorite products" into Local Storage. We work offline with those.
We need to sync server changes to vendor and product information.
If they search the full catalog, that requires them to be online.
When offline, we need to allow users to add something to their cart or even submit an order. We would like to queue this action and submit it when they have an Internet connection.
So a few questions are derived from this:
Is there a way to use RESTAdapter in combination with LocalStorage?
Is there some Socket.IO support? (Happy to do this part manually)
Is there Queueing support? Ideally at the Ember-Data level.
I know we will have to do a lot of this manually and pull together the different lego pieces, but I wanted to ask for some perspective from experienced Ember devs.
You definitely can do this. Like you said you're going to need to do a lot of lego pieces to put it all together.
You'll need to take the RESTAdapter and LSAdapter and create a hybrid. We've done something a little similar at my work, but it only goes one way (from server to client, not reverse).
That being said, I'd just like to pose a few questions:
How much do you plan on storing in localStorage, and do you have an eviction plan in place? Local Storage is generally small for most browsers, though the implementation is the same across most browsers (not implemented until IE8). IndexedDB gives you a much larger chunk of space, though implementation isn't available until later versions of IE.
Depending on performance needs, I'd recommend storing to localStorage first, then attempting to persist to the server. If that works, pop it from localStorage; if it doesn't, leave it in there for your adapter to attempt at a later date. (I'd look into using Ember's schedule or scheduleOnce or a slew of other convenient helpers that work within the run loop, http://emberjs.com/api/classes/Ember.run.html#method_schedule).
/**
  Called by the store when a newly created record is
  `save`d.

  It serializes the record, and `POST`s it to a URL generated by `buildURL`.

  See `serialize` for information on how to customize the serialized form
  of a record.
*/
createRecord: function(store, type, record) {
  var data = {};
  var serializer = store.serializerFor(type.typeKey);
  serializer.serializeIntoHash(data, type, record, { includeId: true });
  // build up a model that knows the url, the method, and the data to post
  // store it to local storage in some queue to save
  // schedule it to save to server later, keep track of the record since you'll
  // need to update the record with new information later that could come down
  // from the server
  return this.ajax(this.buildURL(type.typeKey), "POST", { data: data });
},
Honestly I think the most difficult thing you might experience will be how you handle ids when you don't really save it to the server. Good luck
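The "store locally first, then try the server" queue can be sketched independently of Ember. Below, a deque stands in for the localStorage-backed queue and `send` is any callable that raises `ConnectionError` while offline; `OfflineQueue` and `FakeServer` are hypothetical names, and Python is used only to keep the sketch short.

```python
from collections import deque


class OfflineQueue:
    def __init__(self, send):
        self._send = send          # callable; raises ConnectionError when offline
        self._pending = deque()    # stand-in for a queue persisted in localStorage

    def save(self, request):
        """Queue the request first, then try to push everything queued."""
        self._pending.append(request)
        return self.flush()

    def flush(self):
        """Send queued requests in order; stop at the first failure."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break              # still offline: keep it for a later attempt
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self):
        return len(self._pending)


class FakeServer:
    def __init__(self):
        self.online = False
        self.received = []

    def send(self, request):
        if not self.online:
            raise ConnectionError("offline")
        self.received.append(request)


server = FakeServer()
queue = OfflineQueue(server.send)

sent_offline = queue.save({"action": "add_to_cart", "product": 456})
server.online = True               # connection comes back
sent_online = queue.flush()        # e.g. triggered by a scheduled retry
print(sent_offline, sent_online, len(queue))
```

The id problem mentioned above still applies: records created offline need temporary ids that are swapped for server ids once the queued request succeeds.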

EntityFramework, Unit of Work - Tracking changes of custom data and sending it via WebService

We have Unit of Work implemented in Entity Framework, so when we use ObjectContext and make any changes to an entity, they are tracked, and then on SaveChanges they are all reflected in the underlying database.
But what if I want to track changes for my custom class, so that every modification is tracked and sent through a web service call?
I have a web service which provides me some data; that data is displayed in a datagrid and may then be modified. I want to track all the changes and then be able to send back through the web service only the data that has been modified. Is there any solution for that, like Entity Framework or POCO or whatever? Or do I have to implement my own Unit of Work pattern for it?
Change tracking works only when the entity is attached to the context. There is a special type of entity called self-tracking entities, which are able to track changes on the client side when exposed through a web service, but these classes are still your primary entities (not custom objects) and they apply their tracked state directly to the context.
What you describe has nothing to do with the unit-of-work pattern. You are looking for a change set pattern that is able to pass only the differences back to the service. Implementation of such classes is completely up to you; .NET doesn't provide classes that send only the differences. .NET does offer two implementations of the change set pattern:
the self-tracking entities for EF mentioned above
DataSet and related classes
Both of these implementations transfer all data by default (moreover, DataSets at least have both old and new state in the message by default). DataSets and STEs also share the same limitation: they interoperate very badly with other platforms.
Change tracking at the property level should not be left to the client of a WCF call, for a variety of reasons. If you use a DTO (Data-Transfer Object) pattern, you should be able to keep your individual objects small enough to avoid having any significant overhead from sending the entire changed object across the wire. Then, on the server side, you load the current version of the object out of your database, set the values provided by the DTO, and let Entity Framework track the changed properties.
public void SavePerson(Person person)
{
    using (var context = _contextFactory.Get())
    {
        // Load the current version so the context tracks it.
        var persistentPerson = context.People.Single(p => p.PersonId == person.PersonId);
        persistentPerson.FirstName = person.FirstName;
        // etc. (This could be done with a tool like AutoMapper)
        context.SaveChanges();
    }
}
If you're changing multiple objects on the client side, and you want to keep track of which ones the user has changed, you could have the client be responsible for keeping track of the objects that get changed and send only those objects to the web service in bulk. There, you can apply the same pattern and wait to SaveChanges until all of the objects have been updated.
Hopefully this helps.

Constructing a Domain Object from multiple DTOs

Suppose you have the canonical Customer domain object. You have three different screens on which Customer is displayed: External Admin, Internal Admin, and Update Account.
Suppose further that each screen displays only a subset of all of the data contained in the Customer object.
The problem is: when the UI passes data back from each screen (e.g. through a DTO), it contains only that subset of a full Customer domain object. So when you send that DTO to the Customer Factory to re-create the Customer object, you have only part of the Customer.
Then you send this Customer to your Customer Repository to save it, and a bunch of data will get wiped out because it isn't there. Tragedy ensues.
So the question is: how would you deal with this problem?
Some of my ideas:
include an argument to the Repository indicating which part of the Customer to update, and ignore others
when you load the Customer, keep it in static memory, or in the session, or wherever, and then when you receive one of the DTOs from the UI, update only the parts relevant to the DTO
IMO, both of these are kludges. Are there any other better ideas?
@chadmyers: Here is the problem.
Entity has properties A, B, C, and D.
DTO #1 contains properties for B and C.
DTO #2 contains properties for C and D.
UI asks for DTO #1, you load entity from the repository, convert it into DTO #1, filling in only B and C, and give it to the UI.
Now UI updates B and sends the DTO back. You recreate the entity and it has only B and C filled in because that is all that is contained in the DTO.
Now you want to save the entity, which has only B and C filled in, with A and D null/blank. The repository has no way of knowing if it should update A and D in persistence as blanks, or whether it should ignore them.
I would use a factory to load a complete customer object from the repository upon receipt of the DTO. After that you can update only those fields that were specified in the DTO.
That also allows you to apply some optimistic concurrency on your customer by checking last-updated timestamp, for example.
Is this a web app? Load the customer object from the repo, update it from the DTO, save it back. That doesn't seem like a kludge to me. :)
UPDATE: As per your updates (the A, B, C, D example)
So what I was thinking is that when you load the entity, it has A, B, C, and D filled in. If DTO#1 only updates B & C, that's OK. A and D are unaffected (which is the desired situation).
What the repository does with the B & C updates is up to it. If you're using Hibernate/NHibernate, for example, it will just figure it out and issue an update.
Just because DTO #1 only has B & C doesn't mean you have to also null out A & D. Just leave them alone.
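A minimal sketch of load, update from the DTO, save, using the A/B/C/D example. Dicts stand in for the entity, the DTOs and the repository, and `apply_dto` and `save_from_dto` are hypothetical helpers rather than part of any ORM.

```python
def apply_dto(entity, dto):
    """Copy only the fields present in the DTO onto the loaded entity."""
    for field, value in dto.items():
        if field not in entity:
            raise KeyError("DTO field %r does not exist on the entity" % field)
        entity[field] = value
    return entity


def save_from_dto(repository, entity_id, dto):
    """Load the complete entity, apply the partial DTO, and save it back."""
    entity = dict(repository[entity_id])   # full entity with A, B, C and D
    apply_dto(entity, dto)
    repository[entity_id] = entity
    return entity


repository = {42: {"A": "a0", "B": "b0", "C": "c0", "D": "d0"}}

# DTO #1 carries only B and C; the UI changed B.
dto1 = {"B": "b1", "C": "c0"}
saved = save_from_dto(repository, 42, dto1)
print(saved)  # A and D keep their stored values
```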
I missed the point of this question at first because it is predicated on a few things that I don't think make sense from a design perspective.
Hydrating an entity from repository and then converting it to a DTO is a waste of effort. I assume that your DAL passes a DTO to your repository which then converts it to a full entity object. So converting it back to a DTO seems wasteful.
Having multiple DTOs makes sense if you have a search results page that shows a high volume of records and only displays part of your entity data. In that case it's efficient to pass that page just the data it needs. It does not make sense to pass a DTO that contains partial data to a CRUD page. Just give it a full DTO or even a full entity object. If it doesn't use all of the data, fine, no harm done.
So the main problem is that I don't think you should pass data to these pages using partial DTOs. If you used a full DTO, I would do the following 3 steps whenever the save action is performed:
Pull the full DTO from repository or db
Update the DTO with any changes made through the form
Save the full DTO back to the repository or db
This method requires an extra db hit but that's really not a significant issue on a CRUD form.
If we have an understanding that a Repository handles (almost exclusively) very rich domain Entities, then your numerous DTOs could simply map back.
i.e.
dtoUser.MapFrom<In,Out>(Entity)
or
dtoAdmin.MapFrom<In,Out>(Entity)
You would do the reverse to get the DTO information back to the Entity, and so on. So your repository only saves rich Entities, NOT numerous DTOs.
entity.Foo = dtoUser.Foo
or
entity.Bar = dtoAdmin.Bar
entityRepository.Save(entity) <-- do not pass DTO.
The whole point of DTOs is to keep things simple for the presentation or, say, for WCF data transfer; it has nothing to do with the Repository or the Entity for that matter.
Furthermore, you should never construct an Entity from DTOs... the only two ways to ever acquire an Entity are through a Factory (new) or a Repository (existing), respectively.
You mention storing the Entity somewhere; why would you do this? That is the job of your repository. It will decide where to get the Entity (db, cache, etc.), so there is no need to store it somewhere else.
Hope that helps assign responsibility in your domain. It is always a challenge and there are gray areas here and there, but in general, these are the typical uses of Repository, DTO, etc.