DDD - Concurrency with quantity

Hi everyone,
I'm a little bit lost with a problem when thinking about it the DDD way.
Imagine you have an application that sells concert tickets. So you have an entity called Concert, with a ticket quantity and a method to buy a ticket.
class Concert {
  constructor(
    public id: string,
    public name: string,
    public ticketQuantity: number,
  ) {}

  buyTicket() {
    this.ticketQuantity = this.ticketQuantity - 1;
  }
}
The command looks like this:
async execute(command: BookConcertCommand): Promise<void> {
  const concert = await this.concertRepository.findById(command.concertId);
  concert.buyTicket();
  await this.concertRepository.save(concert);
}
Imagine that your application has to handle a lot of users, and 1,000 users try to buy a ticket at the same time while ticketQuantity is 500.
How can you ensure the invariant that the quantity can never drop below 0?
How can you deal with concurrency here? Even if only two users try to buy a ticket at the same time, the data can end up wrong.
What patterns can we use to ensure consistency under concurrency?
Optimistic or pessimistic concurrency can't be a solution because it will frustrate a lot of users, and we try to put all our domain logic into the domain, so we can't put any logic inside SQL/the database or use a transaction-script approach.

How can you ensure the invariant that the quantity can never drop below 0?
You include logic in your domain model that only assigns a ticket if at least one unassigned ticket is available.
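For the Concert entity from the question, that guard might look like this (a minimal sketch; the generic Error is just a placeholder for a domain-specific exception):

buyTicket() {
  // Invariant: the ticket quantity can never drop below zero.
  if (this.ticketQuantity <= 0) {
    throw new Error(`No tickets left for concert ${this.id}`);
  }
  this.ticketQuantity -= 1;
}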
You include locking (either optimistic or pessimistic) to ensure "first writer wins" -- the loser(s) in a data race should abort or retry.
If your book of record were just data in memory, then you would ensure that all attempts to buy tickets for concert 12345 must first acquire the same lock. In effect, you serialize the requests so that the business logic runs one at a time.
If your book of record were a relational database, then within the context of each transaction you might perform a "select for update" to get a copy of the current data, and perform the update in the same transaction. The database will raise its flavor of concurrent-modification exception on the connections that lose the race.
Alternatively, you use something like the semantics of a conditional write / compare-and-swap: you get an unlocked copy of the concert from the book of record, make your changes, then send an "update this record only if it still looks like the unlocked copy" message. If you get the response announcing that you've won the race, congratulations - you're done. If not, you retry or fail.
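As a minimal sketch of that compare-and-swap style applied to the command handler from the question (the version property, the saveIfVersionMatches repository method and ConcurrencyError are illustrative assumptions, not an existing API):

async execute(command: BookConcertCommand): Promise<void> {
  for (let attempt = 0; attempt < 3; attempt++) {
    // Read the current state together with its version.
    const concert = await this.concertRepository.findById(command.concertId);
    const expectedVersion = concert.version;

    concert.buyTicket(); // throws if no tickets are left

    try {
      // Conceptually: UPDATE concerts SET ..., version = version + 1
      //               WHERE id = :id AND version = :expectedVersion
      await this.concertRepository.saveIfVersionMatches(concert, expectedVersion);
      return; // we won the race
    } catch (e) {
      if (e instanceof ConcurrencyError) continue; // lost the race: reload and retry
      throw e;
    }
  }
  throw new Error('Could not book a ticket after several attempts');
}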
Optimistic or pessimistic concurrency can't be a solution because it will frustrate a lot of users
Of course it can.
If the concert is overbooked, they are going to be frustrated anyway.
The business logic doesn't have to run synchronously with the request - it might be acceptable to write down that they want a ticket, and then contact them asynchronously to let them know that a ticket has been assigned to them
It may be helpful to review some of Udi Dahan's writing on collaborative and competitive domains; for instance, this piece from 2011.
In a collaborative domain, an inherent property of the domain is that multiple actors operate in parallel on the same set of data. A reservation system for concerts would be a good example of a collaborative domain – everyone wants the "good seats" (although it might be better to call that competitive rather than collaborative, it is effectively the same principle).

You might be following these steps:
1- ReserveRequested -> ReserveRequestAccepted -> TicketReserved
2- ReserveRequested -> ReserveRequestRejected
When somebody clicks the buy-ticket button, you create a reservation-request entity, and then you process the reservation in the background via a queue system.
On the user side, you return a unique reservation-request id that can be used to check the result of the process, so the frontend polls for the outcome periodically until it succeeds or fails.
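A rough sketch of that flow, staying in the TypeScript style of the question (the ReservationRequest entity, its repository and the queue are illustrative assumptions):

// Command side: just record the intent and enqueue it; no ticket is assigned here.
async requestReservation(command: BookConcertCommand): Promise<string> {
  const request = new ReservationRequest(randomUUID(), command.concertId, 'PENDING'); // randomUUID from node:crypto
  await this.reservationRequestRepository.save(request);
  await this.reservationQueue.enqueue(request.id);
  return request.id; // the frontend polls this id until the status is ACCEPTED or REJECTED
}

// Background worker: requests are taken from the queue, so checking the
// quantity invariant no longer competes with user-facing traffic.
async processReservation(requestId: string): Promise<void> {
  const request = await this.reservationRequestRepository.findById(requestId);
  const concert = await this.concertRepository.findById(request.concertId);
  if (concert.ticketQuantity > 0) {
    concert.buyTicket();
    await this.concertRepository.save(concert);
    request.status = 'ACCEPTED'; // ReserveRequestAccepted -> TicketReserved
  } else {
    request.status = 'REJECTED'; // ReserveRequestRejected
  }
  await this.reservationRequestRepository.save(request);
}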

Related

How to deal with deadlocks using ReentrantReadWriteLock in microservices

We have a deadlock situation which occurred because of heavy load on a microservice (say A), causing multiple requests from different client services (B, C). These calls from B and C come for the same clientId (key) and are served by different instances of A, and they try to update the same clientId's data in the database at the same time, causing the error below.
CannotAcquireLockException is thrown,
(SQL Error: 60, SQLState: 61000..
ORA-00060: deadlock detected while waiting for resource
We have decided to implement sharding at the load balancer (HAProxy) level, which will ensure the same instance of A always serves the requests from B and C for a specific key (clientId), so we don't have multiple instances processing requests for the same key (clientId).
Now we are effectively in a single-JVM situation, since we have made sure that requests from B and C for a specific clientId always go to the same instance of A.
Even so, it is still possible that requests from B and C arrive for the same clientId nanoseconds apart. Then multiple threads will again try to update the same clientId's data in the database at the same time, causing the same error again.
To improve this we are looking for possible solutions, and one candidate is ReentrantReadWriteLock, which conceptually should take care of this.
We are using Spring Data JPA, and the save being done looks like:
clientJpaRepository.save(ClientObject);
Now, is it possible to use something like the code below?
public void save(Client clientObject) {
    String clientId = clientObject.getClientId();
    boolean isLockAcquired = false;
    try {
        isLockAcquired = writeLock.tryLock(100, TimeUnit.MILLISECONDS);
        if (isLockAcquired) {
            clientJpaRepository.save(clientObject);
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore the interrupt status
        log.error("exception occurred trying to acquire lock for clientId={}", clientId);
    } finally {
        if (isLockAcquired) {
            // only unlock when the lock was actually acquired; unlock() otherwise throws
            writeLock.unlock();
        }
    }
}
I am not very sure how this is going to deal with the keys, as I don't want any threads to block if they want to update a different key (clientId 2).
Also, another thing to note is that there could be reads of this data from the database as part of other API calls. Hopefully they would not be waiting too long, and I hope I don't need to make any changes there for the reads.
Sorry for the long question; I hope I will hear from someone soon.
Thanks.

Distribute work stored in table to multiple processes

I have a database table where each row represents a piece of work to be done. This table is filled up with / receives work through a REST API. Apart from the REST service taking in the work, I have another service which uses actors to process this work.
I need suggestions for distributing this work evenly across these workers. The work is not one-time; it is done at an interval until the user deletes it.
Therefore I need a mechanism where:
The work is distributed evenly as it comes in.
If the second service (the work consumer) fails, it can boot up again with all the records in the table and distribute the work again.
Each actor represents one row of the work table.
class WorkActor(workId: String)(implicit system: ActorSystem, materializer: ActorMaterializer) extends Actor {
  // read the record from the table or wherever you want to read it from
  override def preStart(): Unit = {
    logger.info("WorkActor start ===> " + self)
  }

  override def receive: Receive = {
    case _ => {}
  }
}
Create an Akka Cluster Sharding region to dispatch requests from the REST API to the corresponding actor. Call the startShardingRegion function to get an ActorRef back. You can then send messages to this sharding ActorRef from the REST API, and the corresponding entity actor will handle them.
final case class CommandEnvelope(id: String, payload: Any)

def startShardingRegion(role: String)(implicit system: ActorSystem) = {
  ClusterSharding(system).start(
    typeName = role,
    entityProps = Props(classOf[WorkActor]),
    settings = ClusterShardingSettings(system),
    extractEntityId = ClusterConfig.extractEntityId,
    extractShardId = ClusterConfig.extractShardId
  )
}

// sharding key
object ClusterConfig {
  private val numberOfShards = 100

  val extractEntityId: ShardRegion.ExtractEntityId = {
    case CommandEnvelope(id, payload) => (id, payload)
  }

  val extractShardId: ShardRegion.ExtractShardId = {
    case CommandEnvelope(id, _) => (id.hashCode % numberOfShards).toString
    case ShardRegion.StartEntity(id) => (id.hashCode % numberOfShards).toString
  }
}
Read or recover the data in the actor's preStart function. There are many choices: you may read the uncompleted work from an MQ (Kafka), from Akka Persistence (RDS, Cassandra), etc.
Split Brain Resolver (SBR) has an open-source implementation; that is a more advanced topic to look at once your business logic works:
https://github.com/TanUkkii007/akka-cluster-custom-downing
The general outline of a solution is to use Akka Cluster, Cluster Sharding, and Akka Cluster Singleton. When the cluster is considered formed (generally when some minimum number of members have joined), you start the Cluster Sharding system (sharding work items by the DB's primary key), and then a Cluster Singleton reads the DB table and sends work items to Cluster Sharding for distribution among the nodes of the cluster. Akka Streams, and particularly Alpakka's Slick JDBC integration, may prove useful within the singleton. Another cluster singleton that periodically checks on jobs may also be useful for recovering from cluster node failures (but see below for something to consider there).
Two notes:
If using Cluster Sharding and Cluster Singleton, you probably want to consider what happens in a split-brain situation: this is a distributed system and the probability of a split-brain eventually happening can be presumed to be 100%. In the split-brain scenario, you will very likely have the same jobs being performed simultaneously by different sides of the split, so you need to ask if that's acceptable in your use-case.
If not, then you will need a component which monitors the communications between nodes in your cluster to detect a split-brain and takes steps to resolve the condition: Lightbend's Split Brain Resolver is a good choice if you aren't interested in implementing this yourself.
In a related vein, if the jobs consist of many steps which must be performed, a question to ask is, if a cluster or node fails after completing, say, eight of ten steps, is it acceptable to redo steps 1-8 vs. starting with step 9? If the answer to this is "no", then you'll need to persist the intermediate state of the job. Akka Persistence is a great choice here, though you may want to read up on event sourcing. If using Persistence with Cluster Sharding and Cluster Singleton, it should be noted, you will almost certainly need to handle split-brains (see previous item).

Aggregate root invariant enforcement with application quotas

The application I'm working on needs to enforce the following rules (among others):
We cannot register a new user to the system if the active user quota for the tenant is exceeded.
We cannot make a new project if the project quota for the tenant is exceeded.
We cannot add more multimedia resources to any project that belongs to a tenant if the maximum storage quota defined in the tenant is exceeded.
The main entities involved in this domain are:
Tenant
Project
User
Resource
As you can imagine, these are the relationships between the entities:
Tenant -> Projects
Tenant -> Users
Project -> Resources
At first glance, it seems the aggregate root that will enforce those rules is the tenant:
class Tenant
  attr_accessor :users
  attr_accessor :projects

  # users_quota / projects_quota are the tenant's configured quotas
  def register_user(name, email, ...)
    raise QuotaExceededError if active_users.count >= users_quota

    User.new(name, email, ...).tap do |user|
      active_users << user
    end
  end

  def activate_user(user_id)
    raise QuotaExceededError if active_users.count >= users_quota

    user = users.find { |u| u.id == user_id }
    user.activate
  end

  def make_project(name, ...)
    raise QuotaExceededError if projects.count >= projects_quota

    Project.new(name, ...).tap do |project|
      projects << project
    end
  end

  ...

  private

  def active_users
    users.select(&:active?)
  end
end
So, in the application service, we would use this as:
class ApplicationService
  def register_user(tenant_id, *user_attrs)
    transaction do
      tenant = tenants_repository.find(tenant_id, lock: true)
      tenant.register_user(*user_attrs)
      tenants_repository.save!(tenant)
    end
  end

  ...
end
The problem with this approach is that the aggregate root is huge, because it needs to load all users, projects and resources, and this is not practical. Also, with regard to concurrency, we would pay a lot of penalties because of it.
An alternative would be (I'll focus on user registration):
class Tenant
  attr_accessor :total_active_users

  def register_user(name, email, ...)
    raise QuotaExceededError if total_active_users >= users_quota

    # total_active_users += 1 maybe makes sense although this field won't be persisted
    User.new(name, email, ...)
  end
end
class ApplicationService
  def register_user(tenant_id, *user_attrs)
    transaction do
      tenant = tenants_repository.find(tenant_id, lock: true)
      user = tenant.register_user(*user_attrs)
      users_repository.save!(user)
    end
  end

  ...
end
The case above uses a factory method on Tenant that enforces the business rules and returns the User aggregate. The main advantage compared to the previous implementation is that we don't need to load all users (and projects and resources) into the aggregate root, only their counts. Still, for any new resource, user or project we want to add/register/make, we potentially pay concurrency penalties because of the lock being acquired. For example, if I'm registering a new user, we cannot make a new project at the same time.
Note also that we are acquiring a lock on Tenant even though we are not changing any state in it, so we don't call tenants_repository.save. The lock is used as a mutex, and we cannot take advantage of optimistic concurrency unless we decide to save the tenant (detecting a change in the total_active_users count) so that we can bump the tenant version and raise an error for other concurrent changes when the version has changed, as usual.
Ideally, I'd like to get rid of those methods in the Tenant class (because they also prevent us from splitting some pieces of the application into their own bounded contexts) and enforce the invariant rules in some other way that does not have a big impact on concurrency for other entities (projects and resources), but I don't really know how to prevent two users from being registered simultaneously without using that Tenant as the aggregate root.
I'm pretty sure that this is a common scenario that must have a better way to be implemented than my previous examples.
I'm pretty sure that this is a common scenario that must have a better way to be implemented than my previous examples.
A common search term for this sort of problem: Set Validation.
If there is some invariant that must always be satisfied for an entire set, then that entire set is going to have to be part of the "same" aggregate.
Often, the invariant itself is the bit that you want to push on; does the business need this constraint strictly enforced, or is it more appropriate to loosely enforce the constraint and charge a fee when the customer exceeds its contracted limits?
With multiple sets -- each set needs to be part of an aggregate, but they don't necessarily need to be part of the same aggregate. If there is no invariant that spans multiple sets, then you can have a separate aggregate for each. Two such aggregates may be correlated, sharing the same tenant id.
It may help to review Mauro Servienti's talk All our aggregates are wrong.
An aggregate should be just an element that checks rules. It can be anything from a stateless static function to a full-state complex object; it does not need to match your persistence schema, your "real life" concepts, how you modeled your entities, or how you structure your data or your views. You model the aggregate with just the data you need to check the rules, in the form that suits you best.
Do not be afraid of precomputing values and persisting them (total_active_users in this case).
My recommendation is to keep things as simple as possible and refactor (which could mean splitting, moving and/or merging things) later; once you have all the behaviour modelled, it is easier to rethink and analyze what to refactor.
This would be my first approach without event sourcing:
TenantData { // just the data the aggregate needs from persistence
  int Id;
  int total_active_users;
  int quota;
}

UserEntity { // the User entity
  int id;
  string name;
  date birthDate;
  // other data and/or behaviour
}

public class RegistrationAggregate {

  private TenantData fromTenant; // data from persistence

  public RegistrationAggregate(TenantData fromTenant) { // ctor
    this.fromTenant = fromTenant;
  }

  public UserRegisteredEvent registerUser(UserEntity user) {
    if (fromTenant.total_active_users >= fromTenant.quota) throw new QuotaExceededException();
    fromTenant.total_active_users++; // increase active users
    return new UserRegisteredEvent(fromTenant, user); // return the system changes expressed as an event
  }
}

RegisterUserCommand { // command structure
  int tenantId;
  UserData userData; // id, name, surname, birthDate, etc.
}

class ApplicationService {
  public void registerUser(RegisterUserCommand registerUserCommand) {
    var user = new UserEntity(registerUserCommand.userData); // avoid wrong entity state; ctor fails if some data is incorrect
    RegistrationAggregate agg = aggregatesRepository.Handle(registerUserCommand); // Handle is overloaded for every command we need. Use registerUserCommand.tenantId to bring total_active_users and quota from persistence, and create a RegistrationAggregate fed with that TenantData
    var userRegisteredEvent = agg.registerUser(user);
    persistence.Handle(userRegisteredEvent); // Handle is overloaded for every event we need; open a transaction, persist userRegisteredEvent.fromTenant.total_active_users for that tenantId (optimistic concurrency may fail if total_active_users has changed since we read it, rolling back the transaction), persist userRegisteredEvent.user in relationship with tenantId, commit the transaction
    eventBus.publish(userRegisteredEvent); // notify external sources for eventual consistency
  }
}
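If you prefer to push the check into the write itself, the optimistic concurrency mentioned in the persistence comment above can also be collapsed into a single conditional UPDATE. A minimal sketch (the db client shape, table and column names are assumptions, not part of the original design):

// Quota check and counter increment in one conditional UPDATE, so two concurrent
// registrations cannot both slip past the quota.
// `db` stands for any SQL client exposing execute(sql, params) -> { affectedRows }.
async function reserveUserSlot(
  db: { execute(sql: string, params: unknown[]): Promise<{ affectedRows: number }> },
  tenantId: number,
): Promise<void> {
  const result = await db.execute(
    `UPDATE tenants
        SET total_active_users = total_active_users + 1
      WHERE id = ?
        AND total_active_users < quota`,
    [tenantId],
  );
  if (result.affectedRows === 0) {
    // either the tenant does not exist or the quota is already reached
    throw new Error(`Quota exceeded for tenant ${tenantId}`);
  }
}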
Read this and this for an expanded explanation.

Is there any way to define task quota in celery?

I have these requirements:
I have a few heavy, resource-consuming tasks - exporting different reports that require big, complex queries and subqueries.
There are a lot of users.
I have built the project in Django and queue tasks using Celery.
I want to restrict users so that they can request 10 reports per minute. The idea is that they can put in hundreds of requests over 10 minutes, but I want Celery to execute only 10 tasks at a time per user, so that every user gets their turn.
Is there any way for Celery to do this?
Thanks
Celery has a setting to control the rate limit (http://celery.readthedocs.org/en/latest/userguide/tasks.html#Task.rate_limit); it controls how many tasks of a given type may run in a time frame.
You could set this to '100/m' (one hundred per minute), meaning your system allows 100 such tasks per minute. It's important to notice that this setting is neither per user nor per individual task invocation; it's per time frame.
Have you thought about this approach instead of limiting per user?
In order to have a rate limit per task-and-user pair, you will have to build it yourself. I think (though I'm not sure) you could use a TaskRouter or a signal, based on your needs.
Task routers (http://celery.readthedocs.org/en/latest/userguide/routing.html#routers) allow you to route tasks to a specific queue by applying some logic.
Signals (http://celery.readthedocs.org/en/latest/userguide/signals.html) allow you to execute code at a few well-defined points of the task's scheduling cycle.
An example of the router's logic could be:
if task == 'A':
    user_id = args[0]  # in this task the user_id is the first arg
    qty = get_task_qty('A', user_id)
    if qty > LIMIT_FOR_A:
        return
elif task == 'B':
    user_id = args[2]  # in this task the user_id is the third arg
    qty = get_task_qty('B', user_id)
    if qty > LIMIT_FOR_B:
        return
return {'queue': 'default'}
With the approach above, every time a task starts you should increment by one, in some place (for example Redis), the counter for the user_id/task_type pair, and every time a task finishes you should decrement that value in the same place.
It seems kind of complex, hard to maintain and with a few failure points to me.
Another approach, which I think could fit, is to implement some kind of "distributed semaphore" (similar to a distributed lock) per user and task, so that each task which needs to limit the number of running tasks can use it.
The idea is that every time a task which should have "concurrency control" starts, it has to check whether some resource is available; if not, it just returns.
You could imagine this idea as below:
from celery import shared_task

@shared_task
def my_task_A(user_id, arg1, arg2):
    resource_key = 'my_task_A_{}'.format(user_id)
    available = SemaphoreManager.is_available_resource(resource_key)
    if not available:
        # no resources available, so abort
        return
    acquired = False
    try:
        # the resource could be acquired just before us by another task
        acquired = SemaphoreManager.acquire(resource_key)
        if acquired:
            pass  # execute your code here
    finally:
        # only release what we actually acquired
        if acquired:
            SemaphoreManager.release(resource_key)
It's hard to say which approach you SHOULD take, because that depends on your application.
Hope it helps you!
Good luck!

How should I implement simple caches with concurrency on Redis?

Background
I have a 2-tier web service - just my app server and an RDBMS. I want to move to a pool of identical app servers behind a load balancer. I currently cache a bunch of objects in-process. I hope to move them to a shared Redis.
I have a dozen or so caches of simple, small-sized business objects. For example, I have a set of Foos. Each Foo has a unique FooId and an OwnerId.
One "owner" may own multiple Foos.
In a traditional RDBMS this is just a table with an index on the PK FooId and one on OwnerId. I'm caching this in one process simply:
Dictionary<int,Foo> _cacheFooById;
Dictionary<int,HashSet<int>> _indexFooIdsByOwnerId;
Reads come straight from here, and writes go here and to the RDBMS.
I usually have this invariant:
"For a given group [say by OwnerId], the whole group is in cache or none of it is."
So when I cache miss on a Foo, I pull that Foo and all the owner's other Foos from the RDBMS. Updates make sure to keep the index up to date and respect the invariant. When an owner calls GetMyFoos I never have to worry that some are cached and some aren't.
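A minimal sketch of that in-process read path, written in TypeScript for brevity (the Foo shape and fooRepository are illustrative stand-ins for the real code):

// On a cache miss for an owner, the whole group is loaded from the RDBMS,
// which preserves the invariant "the whole group is cached or none of it is".
interface Foo { fooId: number; ownerId: number }
declare const fooRepository: { findByOwnerId(ownerId: number): Foo[] }; // stands in for the RDBMS access

const cacheFooById = new Map<number, Foo>();
const indexFooIdsByOwnerId = new Map<number, Set<number>>();

function getMyFoos(ownerId: number): Foo[] {
  let ids = indexFooIdsByOwnerId.get(ownerId);
  if (ids === undefined) {
    const foos = fooRepository.findByOwnerId(ownerId); // pull the entire group at once
    ids = new Set(foos.map(f => f.fooId));
    for (const foo of foos) {
      cacheFooById.set(foo.fooId, foo);
    }
    indexFooIdsByOwnerId.set(ownerId, ids);
  }
  return [...ids].map(id => cacheFooById.get(id)!);
}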
What I did already
The first/simplest answer seems to be to use plain ol' SET and GET with a composite key and json value:
SET( "ServiceCache:Foo:" + theFoo.Id, JsonSerialize(theFoo));
I later decided I liked:
HSET( "ServiceCache:Foo", theFoo.FooId, JsonSerialize(theFoo));
That lets me get all the values in one cache as HVALS. It also felt right - I'm literally moving hashtables to Redis, so perhaps my top-level items should be hashes.
This works to first order. If my high-level code is like:
UpdateCache(myFoo);
AddToIndex(myFoo);
That translates into:
HSET ("ServiceCache:Foo", theFoo.FooId, JsonSerialize(theFoo));
var myFoos = JsonDeserialize( HGET ("ServiceCache:FooIndex", theFoo.OwnerId) );
myFoos.Add(theFoo.FooId);
HSET ("ServiceCache:FooIndex", theFoo.OwnerId, JsonSerialize(myFoos));
However, this is broken in two ways.
Two concurrent operations can read/modify/write at the same time. The latter "wins" the final HSET and the former's index update is lost.
Another operation could read the index in between the first and second lines. It would miss a Foo that it should find.
So how do I index properly?
I think I could use a Redis set instead of a json-encoded value for the index.
That would solve part of the problem since the "add-to-index-if-not-already-present" would be atomic.
I also read about using MULTI as a "transaction" but it doesn't seem like it does what I want. Am I right that I can't really MULTI; HGET; {update}; HSET; EXEC since it doesn't even do the HGET before I issue the EXEC?
I also read about using WATCH and MULTI for optimistic concurrency, then retrying on failure. But WATCH only works on top-level keys. So it's back to SET/GET instead of HSET/HGET. And now I need a new index-like-thing to support getting all the values in a given cache.
If I understand it right, I can combine all these things to do the job. Something like:
while (!succeeded)
{
    WATCH( "ServiceCache:Foo:" + theFoo.FooId );
    WATCH( "ServiceCache:FooIndexByOwner:" + theFoo.OwnerId );
    WATCH( "ServiceCache:FooIndexAll" );
    MULTI();
    SET ("ServiceCache:Foo:" + theFoo.FooId, JsonSerialize(theFoo));
    SADD ("ServiceCache:FooIndexByOwner:" + theFoo.OwnerId, theFoo.FooId);
    SADD ("ServiceCache:FooIndexAll", theFoo.FooId);
    EXEC();
    // TODO somehow set succeeded properly
}
Finally I'd have to translate this pseudocode into real code depending how my client library uses WATCH/MULTI/EXEC; it looks like they need some sort of context to hook them together.
All in all, this seems like a lot of complexity for what has to be a very common case; I can't help but think there's a better, smarter, Redis-ish way to do things that I'm just not seeing.
How do I lock properly?
Even if I had no indexes, there's still a (probably rare) race condition.
A: HGET - cache miss
B: HGET - cache miss
A: SELECT
B: SELECT
A: HSET
C: HGET - cache hit
C: UPDATE
C: HSET
B: HSET ** this is stale data that's clobbering C's update.
Note that C could just be a really-fast A.
Again I think WATCH, MULTI, retry would work, but... ick.
I know in some places people use special Redis keys as locks for other objects. Is that a reasonable approach here?
Should those be top-level keys like ServiceCache:FooLocks:{Id} or ServiceCache:Locks:Foo:{Id}?
Or make a separate hash for them - ServiceCache:Locks with subkeys Foo:{Id}, or ServiceCache:Locks:Foo with subkeys {Id} ?
How would I work around abandoned locks, say if a transaction (or a whole server) crashes while "holding" the lock?
For your use case, you don't need to use WATCH. You simply use a MULTI + EXEC block, and you'd have eliminated the race condition.
In pseudocode:
MULTI();
SET ("ServiceCache:Foo:" + theFoo.FooId, JsonSerialize(theFoo));
SADD ("ServiceCache:FooIndexByOwner:" + theFoo.OwnerId, theFoo.FooId);
SADD ("ServiceCache:FooIndexAll", theFoo.FooId);
EXEC();
This is sufficient because MULTI makes the following promise:
"It can never happen that a request issued by another client is served in the middle of the execution of a Redis transaction"
You don't need the watch and retry mechanism because you are not reading and writing in the same transaction.
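For illustration, a minimal sketch of that write path using a Node Redis client such as ioredis (the key names follow the question; the client choice and the Foo shape are assumptions):

import Redis from 'ioredis';

const redis = new Redis();

// All three writes are queued and executed atomically: no command from another
// client can be interleaved between them, so a reader never sees a Foo that is
// missing from its indexes.
async function upsertFoo(theFoo: { fooId: number; ownerId: number }): Promise<void> {
  await redis
    .multi()
    .set(`ServiceCache:Foo:${theFoo.fooId}`, JSON.stringify(theFoo))
    .sadd(`ServiceCache:FooIndexByOwner:${theFoo.ownerId}`, theFoo.fooId)
    .sadd('ServiceCache:FooIndexAll', theFoo.fooId)
    .exec();
}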