I am writing a SPA app that can be used on many devices by the same user. Say there are two entities E1 and E2, and two different users U1 and U2 logged in with the same account on the app. While U1 is adding/modifying/deleting on E1, U2 is also adding/modifying/deleting on E2. Then U2 saves the E2 changes, E1 remaining unchanged on his device. Now, the problem is that when U1 saves the E1 changes, the E2 changes that U2 saved get overwritten by the initial E2 state that is on U1's device. My question is: how do we keep Breeze from overwriting changes that were made from another device if the entity is unchanged on ours? A practical example would be welcome. Thanks.
The answer depends on whether the server implementation that you are using supports the concept of optimistic concurrency. With Breeze's .NET/Entity Framework/WebApi2 server, which does support optimistic concurrency, Breeze's saves will fail for any entity whose concurrency value has changed since it was last read. Breeze's server implementation automatically updates this concurrency value with each save.
So with your example, Users U1 and U2 will both read in the same entity E1 with a concurrency value of say 'X'. When U1 attempts to save, the update checks that the current E1 concurrency value is in fact 'X' before the save is allowed to continue. In this case because U1 is first, the save succeeds and E1's concurrency value gets automatically updated to 'X2'.
Now when U2 attempts to save, the save will fail because the current E1 concurrency value of 'X2' does not match U2's copy of E1 which still has a concurrency value of 'X'. When a save fails the entire transaction is rolled back and an optimistic concurrency error is reported to the client.
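To make this concrete on the client, here is a rough sketch (not from the question) of how a Breeze client might react when a save is rejected because of a concurrency violation. It assumes a configured EntityManager called manager; the "Todos" resource name is just a placeholder.

manager.saveChanges().then(
  (saveResult) => {
    console.log("Saved", saveResult.entities.length, "entities");
  },
  (error) => {
    // The whole save was rolled back on the server, so nothing was committed.
    // Discard the stale local changes and re-query the current server state,
    // then let the user re-apply their edits on top of the fresh data.
    manager.rejectChanges();
    return manager.executeQuery(breeze.EntityQuery.from("Todos"));
  }
);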
Hope this is clear.
Breeze does not support concurrent saves by default. If you want concurrent saves, enable this option when creating the EntityManager, before calling saveChanges:
var manager = new breeze.EntityManager({
    serviceName: "api/Todo",
    saveOptions: new breeze.SaveOptions({allowConcurrentSaves: true})
});
http://www.breezejs.com/documentation/concurrent-saves
Hi everyone,
I'm a bit lost with a problem while thinking in a DDD way.
Imagine you have an application that sells concert tickets. You have an entity called Concert with a ticket quantity and a method to buy a ticket.
class Concert {
  constructor(
    public id: string,
    public name: string,
    public ticketQuantity: number,
  ) {}

  buyTicket() {
    this.ticketQuantity = this.ticketQuantity - 1;
  }
}
The command looks like this:
async execute(command: BookConcertCommand): Promise<void> {
  const concert = await this.concertRepository.findById(command.concertId);
  concert.buyTicket();
  await this.concertRepository.save(concert);
}
Imagine your application has to serve a lot of users, and 1000 users try to buy a ticket at the same time when ticketQuantity is 500.
How can you ensure the invariant that the quantity can't drop below 0?
How can you deal with concurrency here? Even if only two users try to buy a ticket at the same time, the data can end up wrong.
What patterns can we use to ensure consistency under concurrency?
Optimistic or pessimistic concurrency can't be a solution because it would frustrate a lot of users, and we try to keep all our domain logic in the domain model, so we can't put any logic inside SQL/the database or use a transaction script approach.
How can you ensure the invariant that the quantity can't drop below 0
You include logic in your domain model that only assigns a ticket if at least one unassigned ticket is available.
You include locking (either optimistic or pessimistic) to ensure "first writer wins" -- the loser(s) in a data race should abort or retry.
If your book of record was just data in memory, then you would ensure that all attempts to buy tickets for concert 12345 must first acquire the same lock. In effect, you serialize the requests so that the business logic is running one at a time only.
If your book of record was a relational database, then within the context of each transaction you might perform a "select for update" to get a copy of the current data, and perform the update in the same transaction. The database will raise its flavor of concurrent modification exception to the connections that lose the race.
Alternatively, you use something like the semantics of a conditional write / compare-and-swap: you get an unlocked copy of the concert from the book of record, make your changes, then send an "update this record if it still looks like the unlocked copy" message. If you get the response announcing you've won the race, congratulations - you're done. If not, you retry or fail.
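To make this concrete in the question's TypeScript, here is a rough sketch combining the domain invariant with the compare-and-swap idea. The version property and the repository method saveIfVersionMatches are hypothetical, standing in for whatever concurrency token your persistence layer provides:

interface ConcertRepository {
  findById(id: string): Promise<Concert>;
  // Hypothetical conditional write: persists only if the stored version still
  // matches concert.version, and reports whether the write was applied.
  saveIfVersionMatches(concert: Concert): Promise<boolean>;
}

class Concert {
  constructor(
    public id: string,
    public name: string,
    public ticketQuantity: number,
    public version: number, // concurrency token, bumped by the repository on each successful save
  ) {}

  buyTicket(): void {
    // Domain invariant: the quantity can never drop below zero.
    if (this.ticketQuantity <= 0) {
      throw new Error("Concert is sold out");
    }
    this.ticketQuantity -= 1;
  }
}

class BookConcertHandler {
  constructor(private concertRepository: ConcertRepository) {}

  async execute(command: { concertId: string }): Promise<void> {
    // "First writer wins": losers of the race simply reload and retry.
    for (let attempt = 0; attempt < 3; attempt++) {
      const concert = await this.concertRepository.findById(command.concertId);
      concert.buyTicket();
      if (await this.concertRepository.saveIfVersionMatches(concert)) {
        return;
      }
    }
    throw new Error("Too much contention, please try again");
  }
}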
Optimistic or pessimistic concurrency can't be a solution because it will frustrate a lot of users
Of course it can
If the concert is overbooked, they are going to be frustrated anyway
The business logic doesn't have to run synchronously with the request - it might be acceptable to write down that they want a ticket, and then contact them asynchronously to let them know that a ticket has been assigned to them
It may be helpful to review some of Udi Dahan's writing on collaborative and competitive domains; for instance, this piece from 2011.
In a collaborative domain, an inherent property of the domain is that multiple actors operate in parallel on the same set of data. A reservation system for concerts would be a good example of a collaborative domain – everyone wants the “good seats” (although it might be better call that competitive rather than collaborative, it is effectively the same principle).
You might be following these steps:
1- ReserveRequested -> ReserveRequestAccepted -> TicketReserved
2- ReserveRequested -> ReserveRequestRejected
When somebody clicks the buy-ticket button, you create a reserve request entity, and then process the reservation in the background via a queue system.
On the user side, you return a unique reserve request id that can be used to check the result of the process. The frontend should poll the status periodically until the request succeeds or fails.
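Here is a minimal TypeScript sketch of that flow; all names (ReserveRequest, ReservationService, etc.) are illustrative, and a real system would use a durable queue and a background worker instead of in-memory structures:

interface ReserveRequest {
  id: string;
  concertId: string;
  status: "pending" | "accepted" | "rejected";
}

class ReservationService {
  private requests = new Map<string, ReserveRequest>();
  private queue: ReserveRequest[] = [];
  private nextId = 1;

  // Called when the user clicks "buy ticket": record the intent, enqueue it,
  // and immediately return a request id the client can poll with.
  requestReservation(concertId: string): string {
    const request: ReserveRequest = {
      id: String(this.nextId++),
      concertId,
      status: "pending",
    };
    this.requests.set(request.id, request);
    this.queue.push(request);
    return request.id;
  }

  // Polled by the frontend until the background worker settles the request.
  getStatus(requestId: string): ReserveRequest["status"] | undefined {
    return this.requests.get(requestId)?.status;
  }

  // Runs in the background, one request at a time, so the ticket-quantity
  // invariant is checked serially and can never go below zero.
  processNext(availableTickets: () => number, assignTicket: () => void): void {
    const request = this.queue.shift();
    if (!request) return;
    if (availableTickets() > 0) {
      assignTicket();
      request.status = "accepted";
    } else {
      request.status = "rejected";
    }
  }
}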
We have a data set that grows while the application is processing the data set. After a long discussion we have come to the decision that we do not want blocking or asynchronous APIs at this time, and we will periodically query our data store.
We thought of two options to design an API for querying our storage:
A query method returns a snapshot of the data and a flag indicating whether we might have more data. When we finish iterating over the last returned snapshot, we query again to get another snapshot for the rest of the data.
A query method returns a "live" iterator over the data, and when this iterator advances it returns one of the following options: Data is available, No more data, Might have more data.
We are using C++ and we borrowed the .NET style enumerator API for reasons which are out of scope for this question. Here is some code to demonstrate the two options. Which option would you prefer?
/* ======== FIRST OPTION ============== */
// Similar to the familiar .NET enumerator.
class IFooEnumerator
{
public:
    // true  --> A data element may be accessed using the Current() method.
    // false --> End of sequence. Calling Current() is an invalid operation.
    virtual bool MoveNext() = 0;
    virtual Foo Current() const = 0;
    virtual ~IFooEnumerator() {}
};

enum class Availability
{
    EndOfData,
    MightHaveMoreData,
};

class IDataProvider
{
public:
    // Query params allow specifying the ID of the starting element. Here is the intended usage pattern:
    // 1. Call GetFoo() without specifying a starting point.
    // 2. Process all elements returned by IFooEnumerator until it ends.
    // 3. Check the availability.
    //    3.1 MightHaveMoreData --> Invoke GetFoo() again after some time, specifying the last processed
    //        element as the starting point, and repeat steps (2) and (3).
    //    3.2 EndOfData --> The data set will not grow any more and we know that we have finished processing.
    virtual std::tuple<std::unique_ptr<IFooEnumerator>, Availability> GetFoo(query-params) = 0;
};
/* ====== SECOND OPTION ====== */
enum class Availability
{
    HasData,
    MightHaveMoreData,
    EndOfData,
};

class IGrowingFooEnumerator
{
public:
    // HasData:
    //     We may access the current data element by invoking Current().
    // EndOfData:
    //     The data set has finished growing and no more data elements will arrive later.
    // MightHaveMoreData:
    //     The data set will grow and we need to continue calling MoveNext() periodically
    //     (preferably after a short delay) until we get a "HasData" or "EndOfData" result.
    virtual Availability MoveNext() = 0;
    virtual Foo Current() const = 0;
    virtual ~IGrowingFooEnumerator() {}
};

class IDataProvider
{
public:
    virtual std::unique_ptr<IGrowingFooEnumerator> GetFoo(query-params) = 0;
};
Update
Given the current answers, here is some clarification. The debate is mainly over the interface - its expressiveness and intuitiveness in representing queries over a growing data set that at some point in time will stop growing. The implementation of both interfaces is possible without race conditions (at least we believe so) because of the following properties:
The 1st option can be implemented correctly if the pair of the iterator + the flag represent a snapshot of the system at the time of querying. Getting snapshot semantics is a non-issue, as we use database transactions.
The 2nd option can be implemented given a correct implementation of the 1st option. The "MoveNext()" of the 2nd option will, internally, use something like the 1st option and re-issue the query if needed.
The data-set can change from "Might have more data" to "End of data", but not vice versa. So if we, wrongly, return "Might have more data" because of a race condition, we just get a small performance overhead because we need to query again, and the next time we will receive "End of data".
"Invoke GetFoo() again after some time by specifying the last processed element as the starting point"
How are you planning to do that? If it's using the earlier-returned IFooEnumerator, then functionally the two options are equivalent. Otherwise, letting the caller destroy the "enumerator" and then, however long afterwards, call GetFoo() again to continue iteration means you're losing your ability to monitor the client's ongoing interest in the query results. It might be that right now you have no need for that, but I think it's poor design to exclude the ability to track state throughout the overall result processing.
Whether the overall system will work at all really depends on many things (not going into details about your actual implementation):
No matter how you twist it, there will be a race condition between checking for "Is there more data" and more data being added to the system. Which means that it's possibly pointless to try to capture the last few data items?
You probably need to limit the number of repeated runs for "is there more data", or you could end up in an endless loop of "new data came in while processing the last lot".
How easy it is to know if data has been updated - if all the updates are "new items" with new IDs that are sequentially higher, you can simply query "Is there data above X", where X is your last ID. But if you are, for example, counting how many items in the data have property Y set to value A, and data may be updated anywhere in the database at any time (e.g. a database of where taxis currently are, which gets updated via GPS every few seconds and has thousands of cars), it may be hard to determine which cars have had updates since the last time you read the database.
As to your implementation, in option 2, I'm not sure what you mean by the MightHaveMoreData state - either it has more data, or it hasn't, right? Repeated polling for more data is a bad design in this case, given that you will never be able to say with 100% certainty that there hasn't been "new data" provided in the time it took from fetching the last data until it was processed and acted on (displayed, used to buy shares on the stock market, stopped the train, or whatever it is that you want to do once you have processed your new data).
A read-write lock could help. Many readers can have simultaneous access to the data set, but only one writer.
The idea is simple:
- when you need read-only access, the reader takes a read lock, which can be shared with other readers but is exclusive with writers;
- when you need write access, the writer takes a write lock, which is exclusive with both readers and writers.
Background
I have a 2-tier web service - just my app server and an RDBMS. I want to move to a pool of identical app servers behind a load balancer. I currently cache a bunch of objects in-process. I hope to move them to a shared Redis.
I have a dozen or so caches of simple, small-sized business objects. For example, I have a set of Foos. Each Foo has a unique FooId and an OwnerId.
One "owner" may own multiple Foos.
In a traditional RDBMS this is just a table with an index on the PK FooId and one on OwnerId. I'm caching this in one process simply:
Dictionary<int,Foo> _cacheFooById;
Dictionary<int,HashSet<int>> _indexFooIdsByOwnerId;
Reads come straight from here, and writes go here and to the RDBMS.
I usually have this invariant:
"For a given group [say by OwnerId], the whole group is in cache or none of it is."
So when I cache miss on a Foo, I pull that Foo and all the owner's other Foos from the RDBMS. Updates make sure to keep the index up to date and respect the invariant. When an owner calls GetMyFoos I never have to worry that some are cached and some aren't.
What I did already
The first/simplest answer seems to be to use plain ol' SET and GET with a composite key and json value:
SET( "ServiceCache:Foo:" + theFoo.Id, JsonSerialize(theFoo));
I later decided I liked:
HSET( "ServiceCache:Foo", theFoo.FooId, JsonSerialize(theFoo));
That lets me get all the values in one cache as HVALS. It also felt right - I'm literally moving hashtables to Redis, so perhaps my top-level items should be hashes.
This works to first order. If my high-level code is like:
UpdateCache(myFoo);
AddToIndex(myFoo);
That translates into:
HSET ("ServiceCache:Foo", theFoo.FooId, JsonSerialize(theFoo));
var myFoos = JsonDeserialize( HGET ("ServiceCache:FooIndex", theFoo.OwnerId) );
myFoos.Add(theFoo.FooId);
HSET ("ServiceCache:FooIndex", theFoo.OwnerId, JsonSerialize(myFoos));
However, this is broken in two ways.
Two concurrent operations can read/modify/write at the same time. The latter "wins" the final HSET and the former's index update is lost.
Another operation could read the index in between the first and second lines. It would miss a Foo that it should find.
So how do I index properly?
I think I could use a Redis set instead of a json-encoded value for the index.
That would solve part of the problem since the "add-to-index-if-not-already-present" would be atomic.
I also read about using MULTI as a "transaction" but it doesn't seem like it does what I want. Am I right that I can't really MULTI; HGET; {update}; HSET; EXEC since it doesn't even do the HGET before I issue the EXEC?
I also read about using WATCH and MULTI for optimistic concurrency, then retrying on failure. But WATCH only works on top-level keys. So it's back to SET/GET instead of HSET/HGET. And now I need a new index-like-thing to support getting all the values in a given cache.
If I understand it right, I can combine all these things to do the job. Something like:
while(!succeeded)
{
    WATCH( "ServiceCache:Foo:" + theFoo.FooId );
    WATCH( "ServiceCache:FooIndexByOwner:" + theFoo.OwnerId );
    WATCH( "ServiceCache:FooIndexAll" );

    MULTI();
    SET ("ServiceCache:Foo:" + theFoo.FooId, JsonSerialize(theFoo));
    SADD ("ServiceCache:FooIndexByOwner:" + theFoo.OwnerId, theFoo.FooId);
    SADD ("ServiceCache:FooIndexAll", theFoo.FooId);
    EXEC();

    //TODO somehow set succeeded properly
}
Finally I'd have to translate this pseudocode into real code depending how my client library uses WATCH/MULTI/EXEC; it looks like they need some sort of context to hook them together.
All in all this seems like a lot of complexity for what has to be a very common case;
I can't help but think there's a better, smarter, Redis-ish way to do things that I'm just not seeing.
How do I lock properly?
Even if I had no indexes, there's still a (probably rare) race condition.
A: HGET - cache miss
B: HGET - cache miss
A: SELECT
B: SELECT
A: HSET
C: HGET - cache hit
C: UPDATE
C: HSET
B: HSET ** this is stale data that's clobbering C's update.
Note that C could just be a really-fast A.
Again I think WATCH, MULTI, retry would work, but... ick.
I know in some places people use special Redis keys as locks for other objects. Is that a reasonable approach here?
Should those be top-level keys like ServiceCache:FooLocks:{Id} or ServiceCache:Locks:Foo:{Id}?
Or make a separate hash for them - ServiceCache:Locks with subkeys Foo:{Id}, or ServiceCache:Locks:Foo with subkeys {Id} ?
How would I work around abandoned locks, say if a transaction (or a whole server) crashes while "holding" the lock?
For your use case, you don't need to use WATCH. Simply use a MULTI + EXEC block and you've eliminated the race condition.
In pseudo code -
MULTI();
SET ("ServiceCache:Foo:" + theFoo.FooId, JsonSerialize(theFoo));
SADD ("ServiceCache:FooIndexByOwner:" + theFoo.OwnerId, theFoo.FooId);
SADD ("ServiceCache:FooIndexAll", theFoo.FooId);
EXEC();
This is sufficient because multi makes the following promise :
"It can never happen that a request issued by another client is served in the middle of the execution of a Redis transaction"
You don't need the watch and retry mechanism because you are not reading and writing in the same transaction.
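For reference, here is a rough TypeScript sketch of the same thing with a concrete client (assuming ioredis; any client that exposes MULTI works the same way). The function name and Foo shape are just placeholders:

import Redis from "ioredis";

const redis = new Redis();

async function upsertFoo(foo: { fooId: number; ownerId: number }): Promise<void> {
  // The three commands are queued and then run atomically by EXEC, so no
  // other client's command can be interleaved between them.
  await redis
    .multi()
    .set("ServiceCache:Foo:" + foo.fooId, JSON.stringify(foo))
    .sadd("ServiceCache:FooIndexByOwner:" + foo.ownerId, String(foo.fooId))
    .sadd("ServiceCache:FooIndexAll", String(foo.fooId))
    .exec();
}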
Given a System that contains two components, A and B:
The System starts up A and B concurrently. A can go through states {A.Starting, A.Ready}, and B can be in states {B.Starting, B.DoingX, B.DoingY}. (Events to transition between A's and B's states are named accordingly: B.doingx => B goes to B.DoingX, etc.)
I want to model that
While A is in A.Starting, or B is in B.Starting, the System is "Starting"
The System is in state "DoingX" when A is in A.Ready and B is in B.DoingX
The System is in state "DoingY" when A is in A.Ready and B is in B.DoingY
If I'm not mistaken, the fork/join pseudo-states could be used here.
But do these model elements have the declarative semantics of the composed state mentioned above? Is there another way to model this?
(Note: the diagrams are from http://yuml.me)
Why don't you just pull these apart? Here's another idea on how you could model it (assuming I understood it correctly):
a state "Starting", that contains the states you refer to as A.Starting and B.Starting in parallel regions (you can use fork/joins here, or just rely on the default behavior of all regions being activated when "Starting" state is entered)
another state "Doing" that contains a region with your "A.Ready" state and another parallel region, that contains the two states "B.DoingX" and "B.DoingY".
If you really need to have an overall "DoingX" state, then you may have to create two states that correspond to A.Ready.
Anyways, on a broader perspective: I believe your point of view is a little bit off here, when you say that the "System is in state ...". Rather, the system modeled by such a state machine is in a set of states. So normally, I would be perfectly happy to say that "the system is currently in A.Ready and B.DoingX".
Maybe all you need is a change of terminology. What about this:
The system is in configuration "DoingX" when the A.Ready and B.DoingX states are active?
In response to the comment: Yes, this is standard, here's the corresponding part from the superstructure specification (version 2.4 beta):
In a hierarchical state machine more than one state can be active at the same time. [...] the current active “state” is actually represented by a set of trees of states starting with the top-most states of the root regions down to the innermost active substate. We refer to such a state tree as a state configuration.
Can somebody please help me explain the following (to me) very strange JPA behaviour? I intentionally change the primary key of an entity, which is prohibited in JPA.
So the first commit correctly throws "Exception Description: The attribute [date] of class [some.package.Holiday] is mapped to a primary key column in the database. Updates are not allowed.".
But the second (third, fourth, ...) commit succeeds...! How is this possible?!
Holiday h1 = EM.find(Holiday.class, new GregorianCalendar(2011, 0, 3).getTime());

try {
    EM.getTransaction().begin();
    h1.setDate(new GregorianCalendar(2011, 0, 4).getTime());
    EM.getTransaction().commit();
    System.out.println("First commit succeed");
} catch (Exception e) {
    System.out.println("First commit failed");
}

try {
    EM.getTransaction().begin();
    EM.getTransaction().commit();
    System.out.println("Second commit succeed");
} catch (Exception e) {
    System.out.println("Second commit failed");
}
It will printout:
First commit failed
Second commit succeed
OMG, how is this possible?!
(Using EclipseLink 2.2.0.v20110202-r8913 with MySQL.)
The failure of the commit operation for the first transaction has no bearing on the second transaction. This is due to the fact that when the first commit fails, the EntityTransaction is no longer in the active state. When you issue the second em.getTransaction().begin invocation, a new transaction is initiated that does not have any knowledge of the first.
It is important to note that although your code may use the same EntityTransaction reference in both cases, it is not necessarily this class that actually represents the transaction. In the case of EclipseLink, the EntityTransaction reference actually wraps an EntityTransactionWrapper instance that in turn uses a RepeatableWriteUnitOfWork, the latter two classes being provided by the EclipseLink implementation and not JPA. It is the RepeatableWriteUnitOfWork instance that actually tracks the collection of changes made to entities that will be merged into the shared cache (and the database). When the first transaction fails, the underlying UnitOfWork is invalidated, and a new UnitOfWork is established when you start the second EntityTransaction.
The same will apply to most other JPA providers as the EntityTransaction class is not a concrete final class. Instead, it is an interface that is typically implemented by another class in the JPA provider, and which may likewise wrap a transaction thereby requiring clients to use the EntityTransaction reference instead of directly working with the underlying transaction (which may be a JTA transaction or a resource-local transaction).
Additionally, you ought to remember that:
EntityTransaction.begin() should be invoked only once. Invoking it a second time will result in an IllegalStateException being thrown, as it cannot be invoked while a transaction is active. So the fact that you are able to invoke it a second time implies that the first transaction is no longer active.
If you require the changes performed in the context of the first transaction to be made available to the second, you must merge the entities back into the shared context in the second transaction, after they've been detached by the first. While this may sound ridiculous, you ought to remember that detached entities can be modified by clients (read: end users) before they are merged back, so the changes made by the end users may be retained, while mistakes (like the modification of the primary keys) may be corrected in the interim.