Aggregate root invariant enforcement with application quotas - concurrency

The application I'm working on needs to enforce the following rules (among others):
We cannot register a new user to the system if the active user quota for the tenant is exceeded.
We cannot make a new project if the project quota for the tenant is exceeded.
We cannot add more multimedia resources to any project that belongs to a tenant if the maximum storage quota defined in the tenant is exceeded.
The main entities involved in this domain are:
Tenant
Project
User
Resource
As you can imagine, these are the relationships between the entities:
Tenant -> Projects
Tenant -> Users
Project -> Resources
At first glance, it seems the aggregate root that will enforce those rules is the tenant:
class Tenant
  attr_accessor :users
  attr_accessor :projects

  def register_user(name, email, ...)
    raise QuotaExceededError if active_users.count >= users_quota

    User.new(name, email, ...).tap do |user|
      active_users << user
    end
  end

  def activate_user(user_id)
    raise QuotaExceededError if active_users.count >= users_quota

    user = users.find { |u| u.id == user_id }
    user.activate
  end

  def make_project(name, ...)
    raise QuotaExceededError if projects.count >= projects_quota

    Project.new(name, ...).tap do |project|
      projects << project
    end
  end

  ...

  private

  def active_users
    users.select(&:active?)
  end
end
So, in the application service, we would use this as:
class ApplicationService
  def register_user(tenant_id, *user_attrs)
    transaction do
      tenant = tenants_repository.find(tenant_id, lock: true)
      tenant.register_user(*user_attrs)
      tenants_repository.save!(tenant)
    end
  end

  ...
end
The problem with this approach is that the aggregate root is huge: it needs to load all users, projects and resources, which is not practical. It also carries heavy concurrency penalties, since every operation on the tenant contends for the same lock.
An alternative would be (I'll focus on user registration):
class Tenant
  attr_accessor :total_active_users

  def register_user(name, email, ...)
    raise QuotaExceededError if total_active_users >= users_quota

    # total_active_users += 1 maybe makes sense, although this field won't be persisted
    User.new(name, email, ...)
  end
end
class ApplicationService
  def register_user(tenant_id, *user_attrs)
    transaction do
      tenant = tenants_repository.find(tenant_id, lock: true)
      user = tenant.register_user(*user_attrs)
      users_repository.save!(user)
    end
  end

  ...
end
The case above uses a factory method in Tenant that enforces the business rules and returns the User aggregate. The main advantage over the previous implementation is that we don't need to load all users (projects and resources) into the aggregate root, only their counts. Still, for any new resource, user or project we want to add/register/make, we potentially pay a concurrency penalty for the lock that is acquired. For example, while I'm registering a new user, we cannot create a new project at the same time.
Note also that we are acquiring a lock on Tenant even though we are not changing any state in it, so we don't call tenants_repository.save. The lock is used as a mutex, and we cannot take advantage of optimistic concurrency unless we decide to save the tenant (detecting a change in the total_active_users count) so that we can update the tenant version and, as usual, raise an error for other concurrent changes when the version has changed.
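For illustration, a minimal sketch of that optimistic variant in Scala (assuming the tenant row persists total_active_users plus a version column; all names are mine):
// The service reads (total_active_users, version) without a lock, checks the
// quota in the domain, then performs a conditional write: it succeeds only if
// nobody else bumped the version in between.
def bumpActiveUsers(conn: java.sql.Connection, tenantId: Long, seenVersion: Long): Unit = {
  val sql =
    """UPDATE tenants
      |SET total_active_users = total_active_users + 1, version = version + 1
      |WHERE id = ? AND version = ?""".stripMargin
  val st = conn.prepareStatement(sql)
  try {
    st.setLong(1, tenantId)
    st.setLong(2, seenVersion)
    if (st.executeUpdate() == 0) // someone else won the race
      throw new RuntimeException("tenant version changed; reload and retry")
  } finally st.close()
}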
Ideally, I'd like to get rid of those methods in the Tenant class (they also prevent us from splitting some pieces of the application into their own bounded contexts) and enforce the invariants in some other way that does not have a big concurrency impact on the other entities (projects and resources), but I don't really know how to prevent two users from being registered simultaneously without using that Tenant as the aggregate root.
I'm pretty sure this is a common scenario that must have a better implementation than my previous examples.

I'm pretty sure this is a common scenario that must have a better implementation than my previous examples.
A common search term for this sort of problem: Set Validation.
If there is some invariant that must always be satisfied for an entire set, then that entire set is going to have to be part of the "same" aggregate.
Often, the invariant itself is the bit that you want to push on; does the business need this constraint strictly enforced, or is it more appropriate to loosely enforce the constraint and charge a fee when the customer exceeds its contracted limits?
With multiple sets -- each set needs to be part of an aggregate, but they don't necessarily need to be part of the same aggregate. If there is no invariant that spans multiple sets, then you can have a separate aggregate for each. Two such aggregates may be correlated, sharing the same tenant id.
It may help to review Mauro Servienti's talk "All our aggregates are wrong".
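To make the one-aggregate-per-set idea concrete, here is a minimal sketch (Scala, purely illustrative; the names are mine): each counter is its own tiny aggregate keyed by tenant id, so registering a user and creating a project contend on different records.
// Each set gets its own small aggregate; both are correlated by tenantId but
// can be loaded, checked and saved independently, each with its own version
// for optimistic locking.
final case class QuotaExceeded(what: String) extends RuntimeException(what)

final case class ActiveUserSet(tenantId: Long, count: Int, quota: Int, version: Long) {
  def registerOne: ActiveUserSet =
    if (count >= quota) throw QuotaExceeded("users")
    else copy(count = count + 1, version = version + 1)
}

final case class ProjectSet(tenantId: Long, count: Int, quota: Int, version: Long) {
  def addOne: ProjectSet =
    if (count >= quota) throw QuotaExceeded("projects")
    else copy(count = count + 1, version = version + 1)
}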

An aggregate should be just an element that checks rules. It can be anything from a stateless static function to a full stateful complex object, and it does not need to match your persistence schema, your "real life" concepts, how you modeled your entities, or how you structure your data or your views. You model the aggregate with just the data you need to check the rules, in whatever form suits you best.
Do not be afraid to precompute values and persist them (total_active_users in this case).
My recommendation is to keep things as simple as possible and refactor (which could mean splitting, moving and/or merging things) later; once you have all the behaviour modelled, it is easier to rethink, analyze and refactor.
This would be my first approach without event sourcing:
TenantData { // just the data the aggregate needs from persistence
    int id;
    int total_active_users;
    int quota;
}

UserEntity { // the User entity
    int id;
    string name;
    date birthDate;
    // other data and/or behaviour
}

public class RegistrationAggregate {
    private TenantData fromTenant; // data from persistence

    public RegistrationAggregate(TenantData fromTenant) { // ctor
        this.fromTenant = fromTenant;
    }

    public UserRegisteredEvent registerUser(UserEntity user) {
        if (fromTenant.total_active_users >= fromTenant.quota) throw new QuotaExceededException();
        fromTenant.total_active_users++; // increase active users
        return new UserRegisteredEvent(fromTenant, user); // return the system change expressed as an event
    }
}

RegisterUserCommand { // command structure
    int tenantId;
    UserData userData; // id, name, surname, birthDate, etc.
}

class ApplicationService {
    public void registerUser(RegisterUserCommand registerUserCommand) {
        // avoid a wrong entity state; the ctor fails if some data is incorrect
        var user = new UserEntity(registerUserCommand.userData);

        // Handle is overloaded for every command we need. It uses
        // registerUserCommand.tenantId to bring total_active_users and quota
        // from persistence and creates a RegistrationAggregate fed with TenantData.
        RegistrationAggregate agg = aggregatesRepository.Handle(registerUserCommand);

        var userRegisteredEvent = agg.registerUser(user);

        // Handle is overloaded for every event we need: open a transaction;
        // persist userRegisteredEvent.fromTenant.total_active_users for the
        // tenant id (optimistic concurrency fails and rolls back if
        // total_active_users has changed since we read it); persist
        // userRegisteredEvent.user in relationship with the tenant id; commit.
        persistence.Handle(userRegisteredEvent);

        // notify external consumers for eventual consistency
        eventBus.publish(userRegisteredEvent);
    }
}
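Not from the answer above, but worth noting: with a relational store, the quota check can even be folded into the UPDATE predicate itself, so the database serializes concurrent registrations without a long-held lock. A minimal JDBC sketch in Scala (table and column names are assumptions):
// Returns true if a user slot was reserved; 0 rows updated means the quota is
// already reached (or the tenant does not exist).
def tryReserveUserSlot(conn: java.sql.Connection, tenantId: Long): Boolean = {
  val sql =
    """UPDATE tenants
      |SET total_active_users = total_active_users + 1
      |WHERE id = ? AND total_active_users < users_quota""".stripMargin
  val st = conn.prepareStatement(sql)
  try {
    st.setLong(1, tenantId)
    st.executeUpdate() == 1
  } finally st.close()
}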
Read this and this for an expanded explanation.

Related

DDD - Concurrency with quantity

Hi everyone,
I'm a little bit lost with a problem, thinking in the DDD way.
Imagine you have an application to sell concert tickets. So you have an entity called Concert, with a ticket quantity and a method to buy a ticket.
class Concert {
  constructor(
    public id: string,
    public name: string,
    public ticketQuantity: number,
  ) {}

  buyTicket() {
    this.ticketQuantity = this.ticketQuantity - 1;
  }
}
The command looks like this:
async execute(command: BookConcertCommand): Promise<void> {
  const concert = await this.concertRepository.findById(command.concertId);
  concert.buyTicket();
  await this.concertRepository.save(concert);
}
Imagine your application has to carry a lot of users, and 1000 users try to buy a ticket at the same time when ticketQuantity is 500.
How can you ensure the invariant that the quantity can't be lower than 0?
How can you deal with concurrency here, given that even two users buying a ticket at the same time can leave the data incorrect?
What patterns can we use to ensure consistency under concurrency?
Optimistic or pessimistic concurrency can't be a solution because it would frustrate a lot of users, and we try to put all our domain logic in the domain, so we can't put any logic inside SQL/the DB or use a transaction-script approach.
How can you ensure the invariant that the quantity can't be lower than 0?
You include logic in your domain model that only assigns a ticket if at least one unassigned ticket is available.
You include locking (either optimistic or pessimistic) to ensure "first writer wins" -- the loser(s) in a data race should abort or retry.
If your book of record was just data in memory, then you would ensure that all attempts to buy tickets for concert 12345 must first acquire the same lock. In effect, you serialize the requests so that the business logic runs only one request at a time.
If your book of record was a relational database, then within the context of each transaction you might perform a "select for update" to get a copy of the current data, and perform the update in the same transaction. The database will raise its flavor of concurrent-modification exception to the connections that lose the race.
Alternatively, you use something like the semantics of a conditional write / compare-and-swap: you get an unlocked copy of the concert from the book of record, make your changes, then send an "update this record if it still looks like the unlocked copy" message. If you get the response announcing you've won the race, congratulations, you're done. If not, you retry or fail.
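As a toy illustration of the compare-and-swap idea (Scala, in memory; a real book of record would issue a conditional write such as UPDATE ... WHERE version = ? instead):
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

final case class ConcertRecord(id: String, ticketsLeft: Int)

val record = new AtomicReference(ConcertRecord("12345", 500))

@tailrec
def buyTicket(): Boolean = {
  val seen = record.get()
  if (seen.ticketsLeft <= 0) false // invariant: never below zero
  else if (record.compareAndSet(seen, seen.copy(ticketsLeft = seen.ticketsLeft - 1)))
    true           // we won the race
  else buyTicket() // we lost: retry against the fresh copy
}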
Optimistic or pessimistic concurrency can't be a solution because it will frustrate a lot of users
Of course it can.
If the concert is overbooked, they are going to be frustrated anyway.
The business logic doesn't have to run synchronously with the request: it might be acceptable to write down that they want a ticket, and then contact them asynchronously to let them know that a ticket has been assigned to them.
It may be helpful to review some of Udi Dahan's writing on collaborative and competitive domains; for instance, this piece from 2011.
In a collaborative domain, an inherent property of the domain is that multiple actors operate in parallel on the same set of data. A reservation system for concerts would be a good example of a collaborative domain – everyone wants the “good seats” (although it might be better call that competitive rather than collaborative, it is effectively the same principle).
You might follow these steps:
1- ReserveRequested -> ReserveRequestAccepted -> TicketReserved
2- ReserveRequested -> ReserveRequestRejected
When somebody clicks the buy-ticket button, you create a reserve-request entity, and then you process the reservation in the background via a queue system.
On the user side, you return a unique reserve-request id that can be used to check the result of the process. The frontend should poll for the result periodically until it succeeds or fails; a sketch of this flow follows.
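A minimal sketch of that request/queue flow (Scala; all names are illustrative, and tryDecrement stands in for whatever atomic quantity check the book of record provides):
import java.util.concurrent.{ConcurrentHashMap, LinkedBlockingQueue}

sealed trait ReservationStatus
case object Pending  extends ReservationStatus
case object Reserved extends ReservationStatus
case object Rejected extends ReservationStatus

final case class ReserveRequest(requestId: String, concertId: String)

val queue  = new LinkedBlockingQueue[ReserveRequest]()
val status = new ConcurrentHashMap[String, ReservationStatus]()

// Front end: record the request and return an id the client can poll.
def submit(req: ReserveRequest): String = {
  status.put(req.requestId, Pending)
  queue.put(req)
  req.requestId
}

// Single background worker: requests are processed one at a time, so the
// quantity check and the decrement can never interleave.
def workerLoop(tryDecrement: String => Boolean): Unit =
  while (true) {
    val req = queue.take()
    status.put(req.requestId, if (tryDecrement(req.concertId)) Reserved else Rejected)
  }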

Distribute work stored in table to multiple processes

I have a database table where each row represents a piece of work to be done. The table is filled with / receives work through a REST API. Apart from the REST service taking in the work, I have another service that uses actors to process the work.
I need suggestions for distributing this work evenly across these workers. The work is not one-time; it is repeated at an interval until the user deletes it.
Therefore I need a mechanism where:
the work is distributed evenly as it comes in;
if the second service (the work consumer) fails, it can boot up again with all the records in the table and redistribute the work.
Each actor represents one row of the work table.
class WorkActor(workId: String)(implicit system: ActorSystem, materializer: ActorMaterializer) extends Actor {
  // read the record from the table, or wherever you want to read it from
  override def preStart(): Unit = {
    logger.info("WorkActor start ===> " + self)
  }

  override def receive: Receive = {
    case _ => {}
  }
}
Create an Akka Cluster Sharding region to dispatch requests from the REST API to the corresponding actor. Call the startShardingRegion function to obtain an ActorRef; you can then send messages to that sharding ActorRef from the REST API, and the corresponding actor will handle them.
final case class CommandEnvelope(id: String, payload: Any)

def startShardingRegion(role: String)(implicit system: ActorSystem) =
  ClusterSharding(system).start(
    typeName = role,
    entityProps = Props(classOf[WorkActor]),
    settings = ClusterShardingSettings(system),
    extractEntityId = ClusterConfig.extractEntityId,
    extractShardId = ClusterConfig.extractShardId
  )

// sharding key
object ClusterConfig {
  private val numberOfShards = 100

  val extractEntityId: ShardRegion.ExtractEntityId = {
    case CommandEnvelope(id, payload) => (id, payload)
  }

  // note: String.hashCode can be negative, so take the absolute value before the modulo
  val extractShardId: ShardRegion.ExtractShardId = {
    case CommandEnvelope(id, _)      => (math.abs(id.hashCode) % numberOfShards).toString
    case ShardRegion.StartEntity(id) => (math.abs(id.hashCode) % numberOfShards).toString
  }
}
Read or recover the data in the preStart function of the actor. There are many choices: you may read the uncompleted work from an MQ (Kafka), from Akka Persistence (RDS, Cassandra), etc.
Split-brain resolution (SBR) has an open-source solution; it is an advanced topic to take on once your business logic works:
https://github.com/TanUkkii007/akka-cluster-custom-downing
The general outline of a solution is to use Akka Cluster, Cluster Sharding, and Akka Cluster Singleton. When the cluster is considered formed (generally when some minimum number of members have joined), you start the Cluster Sharding system (sharding work items by the DB's primary key), and then a Cluster Singleton reads the DB table and sends work items to Cluster Sharding for distribution among the nodes of the cluster. Akka Streams, and particularly Alpakka's Slick JDBC integration, may prove useful within the singleton. Another cluster singleton that periodically checks on jobs may also be useful for recovering from cluster node failures (but see below for something to consider there).
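Sketching what that singleton might look like with the classic Cluster Singleton API (hedged: loadWorkIds stands in for your actual DB query, and CommandEnvelope is the envelope defined earlier):
import akka.actor.{Actor, ActorRef, ActorSystem, PoisonPill, Props}
import akka.cluster.singleton.{ClusterSingletonManager, ClusterSingletonManagerSettings}

// On (re)start, re-read the pending work and route each row to its sharded
// WorkActor; exactly one instance of this actor runs in the whole cluster.
class WorkDispatcher(shardRegion: ActorRef, loadWorkIds: () => Seq[String]) extends Actor {
  override def preStart(): Unit =
    loadWorkIds().foreach(id => shardRegion ! CommandEnvelope(id, "start"))

  override def receive: Receive = { case _ => () }
}

def startDispatcher(system: ActorSystem, shardRegion: ActorRef,
                    loadWorkIds: () => Seq[String]): ActorRef =
  system.actorOf(
    ClusterSingletonManager.props(
      singletonProps = Props(new WorkDispatcher(shardRegion, loadWorkIds)),
      terminationMessage = PoisonPill,
      settings = ClusterSingletonManagerSettings(system)
    ),
    name = "workDispatcher"
  )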
Two notes:
If using Cluster Sharding and Cluster Singleton, you probably want to consider what happens in a split-brain situation: this is a distributed system and the probability of a split-brain eventually happening can be presumed to be 100%. In the split-brain scenario, you will very likely have the same jobs being performed simultaneously by different sides of the split, so you need to ask if that's acceptable in your use-case.
If not, then you will need a component which monitors the communications between nodes in your cluster to detect a split-brain and takes steps to resolve the condition: Lightbend's Split Brain Resolver is a good choice if you aren't interested in implementing this yourself.
In a related vein, if the jobs consist of many steps which must be performed, a question to ask is: if a cluster or node fails after completing, say, eight of ten steps, is it acceptable to redo steps 1-8 versus starting with step 9? If the answer is "no", then you'll need to persist the intermediate state of the job. Akka Persistence is a great choice here, though you may want to read up on event sourcing. Note that if you use Persistence with Cluster Sharding and Cluster Singleton, you will almost certainly need to handle split-brains (see the previous item).

Allocating datastore id using PRNG

Google Cloud Datastore documents that if an entity id needs to be pre-allocated, then one should use the allocateIds method:
https://cloud.google.com/datastore/docs/best-practices#keys
That method seems to make a REST or RPC call, which has latency. I'd like to avoid that latency by using a PRNG in my Kubernetes Engine application. Here's the Scala code:
import java.security.SecureRandom

class RandomFactory {
  protected val r = new SecureRandom

  def randomLong: Long = r.nextLong

  def randomLong(min: Long, max: Long): Long =
    // Unfortunately, Java didn't make Random.internalNextLong public,
    // so we have to get to it in an indirect way.
    r.longs(1, min, max).toArray.head

  // id may be any value in the range (1, MAX_SAFE_INTEGER),
  // so that it can be represented in JavaScript.
  // TODO: randomId is used in production, and might be susceptible to
  // TODO: blocking if /dev/random does not contain entropy.
  // TODO: Keep an eye on this concern.
  def randomId: Long =
    randomLong(1, RandomFactory.MAX_SAFE_INTEGER)
}

object RandomFactory extends RandomFactory {
  // MAX_SAFE_INTEGER is ES6 Number.MAX_SAFE_INTEGER
  val MAX_SAFE_INTEGER = 9007199254740991L
}
I also plan to install haveged in the pod to help with entropy.
I understand allocateIds ensures that an ID is not already in use. But in my particular use case, there are two mitigating factors that let me overlook that concern:
Based on entity count, the chance of a conflict is 1 in 100 million.
This particular entity type is non-essential, and can afford a "once in a blue moon" conflict.
I am more concerned about even distribution in the keyspace, because that is a normal use-case concern.
Will this approach work, particularly with respect to even distribution in the keyspace? Is the allocateIds method essential, or does it just help developers avoid simple mistakes?
To get rid of collisions, use more bits -- for all practical purposes, 128 bits [see the statistics behind UUID v4] will never generate a collision.
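The birthday approximation makes the difference concrete: drawing n random ids from a space of size N gives P(collision) ≈ n^2 / (2N) while n is far below sqrt(N). A quick check in Scala:
// Birthday approximation: probability of at least one collision among n ids
// drawn uniformly from a space of size N (valid while n is far below sqrt(N)).
def collisionProbability(n: Double, spaceSize: Double): Double =
  n * n / (2.0 * spaceSize)

collisionProbability(1e4, 9007199254740991.0) // 10k ids in 2^53 - 1: ~5.6e-9
collisionProbability(1e4, math.pow(2, 122))   // 10k UUID v4s (122 random bits): ~9.4e-30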
Another technique is to insert new entities with a shorter random number and handle the error Cloud Datastore returns if they already exist by trying again with a new ID (until you happen upon one that isn't currently in use).
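That retry loop is simple to sketch in Scala (insertIfAbsent is a hypothetical stand-in for a conditional insert that reports whether the key was already taken):
import scala.annotation.tailrec
import scala.util.Random

val MaxSafeInteger = 9007199254740991L // ES6 Number.MAX_SAFE_INTEGER, as above

@tailrec
def insertWithRandomId(insertIfAbsent: Long => Boolean): Long = {
  // uniform-ish id in [1, MaxSafeInteger]; the mask keeps the Long non-negative
  val id = 1L + ((Random.nextLong() & Long.MaxValue) % MaxSafeInteger)
  if (insertIfAbsent(id)) id              // fresh id: done
  else insertWithRandomId(insertIfAbsent) // taken: draw again
}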
As far as key distribution goes: keys randomly distributed within the keyspace will keep Cloud Datastore happy.
Given that you don't want the entity identifier to be based on an external value, you should allow Cloud Datastore to allocate IDs for you. This way you won't have any conflicts. The IDs allocated by Cloud Datastore will be appropriately scattered through the key space.

Doctrine2, update some data before persisting

I'm developing my PHP software using Doctrine2. It is quite simple to use, but I have a little problem and I would like to know the best practice in this situation. Maybe you could help me! You'll have all my gratitude :-D
Situation:
I have 2 entities (User and Contact)
A User can contain some Contacts
The Contact entity (table) has a field labelled mainContact which defines whether it is the main contact of the user or not
Only one contact can be the main contact (mainContact=1)
Problem:
When I persist a contact, I would like the following behaviour:
If the contact has mainContact=1, all other contacts associated with the user should be updated to mainContact=0.
If the contact has mainContact=0, I need to check all the other contacts; if I don't find any other contact with mainContact=1 for this user, I automatically update the current contact with setMainContact(true).
Possible solutions:
I have some ideas for implementing this logic, but I would like to know the best practice, in order to write good code, because this application will be open source.
Ideas that don't feel clean:
Create a method in the Contact repository that updates all the other contacts assigned to the user and returns the value to assign to the current contact.
With this solution, I must remember to call the repository method before persisting a contact, everywhere in the application. If I forget to call it, database integrity could be compromised.
Use the prePersist mechanism of the entity to get the entity manager and update all the other contacts of the user.
This method is not recommended: an entity should never directly access the entity manager.
Can anyone tell me the best practice for doing this? Thank you very much!
PS: Sorry for my poor English!
The best thing you can do here (from a pure OOP perspective, even without the persistence logic) is to implement this logic in your entity's setters. After all, the logic isn't heavy, considering that a User won't have many contacts, nor will the operation happen very often.
<?php

class User
{
    protected $contacts;

    // constructor, other fields, other methods

    public function addContact(Contact $contact)
    {
        if ($this->contacts->contains($contact)) {
            return;
        }

        if ($contact->isMainContact()) {
            foreach ($this->contacts as $existingContact) {
                $existingContact->setMainContact(false);
            }

            $this->contacts->add($contact);
            $contact->setUser($this); // set the owning side of the relation too!

            return;
        }

        $mainContact = true;
        foreach ($this->contacts as $existingContact) {
            if ($existingContact->isMainContact()) {
                $mainContact = false;
                break; // no need for further checks
            }
        }

        $contact->setMainContact($mainContact);
        $this->contacts->add($contact);
        $contact->setUser($this); // set the owning side of the relation too!
    }
}
On the other hand, think about adding a field to your user instead:
<?php

class User
{
    // keep the reference here instead of on the contact (cleaner)
    protected $mainContact;
}
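The payoff of that alternative, sketched language-neutrally in Scala: with a single mainContact reference, "only one main contact" holds by construction instead of by scanning the collection.
final case class Contact(id: Int, name: String)

final class User {
  private var contacts: Vector[Contact] = Vector.empty
  private var mainContact: Option[Contact] = None // at most one, guaranteed by the type

  def addContact(contact: Contact, main: Boolean = false): Unit = {
    contacts = contacts :+ contact
    // the first contact, or an explicit main, simply replaces the reference;
    // no loop over the other contacts is needed
    if (main || mainContact.isEmpty) mainContact = Some(contact)
  }
}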

How should I do post persist/update actions in Doctrine 2.1 that involve re-saving to the DB?

Using Doctrine 2.1 (and Zend Framework 1.11, not that it matters in this case), how can I perform post-persist and post-update actions that involve re-saving to the DB?
For example, creating a unique token based on the just-generated primary key id, or generating a thumbnail for an uploaded image (which actually doesn't require re-saving to the DB, but still)?
EDIT - let's explain, shall we?
The above is actually a question regarding two scenarios. Both scenarios relate to the following state:
Let's say I have a User entity. When the object is flushed after being marked for persistence, it will have MySQL's normal auto-generated id - running numbers normally beginning at 1, 2, 3, etc.
Each user can upload an image - which he will be able to use in the application - which will have a record in the db as well. So I have another entity called Image. Each Image entity also has an auto-generated id - same methodology as the user id.
Now, here are the scenarios:
When a user uploads an image, I want to generate a thumbnail for that image right after it is saved to the db. This should happen for every new or updated image.
Since we're trying to stay smart, I don't want the thumbnail-generation code to be written like this:
$image = new Image();
...
$entityManager->persist($image);
$entityManager->flush();
callToFunctionThatGeneratesThumbnailOnImage($image);
but rather I want it to occur automatically on persisting the object (well, on flushing the persisted object), like the prePersist or preUpdate methods.
Since the user uploaded an image, he gets a link to it. It will probably look something like: http://www.mysite.com/showImage?id=[IMAGEID].
This allows anyone to just change the image id in that link and see other users' images.
So in order to prevent such a thing, I want to generate a unique token for every image. Since it doesn't really need to be sophisticated, I thought about using the md5 value of the image id, with some salt.
But for that I need to have the id of that image - which I'll only have after flushing the persisted object - then generate the md5, and then save it again to the DB.
Understand that the links to the images are supposed to be publicly accessible, so I can't just restrict viewing to authenticated users via some kind of permission rules.
You probably already know about Doctrine events. What you could do:
Use the postPersist event handler. That one occurs after the DB insert, so the auto-generated ids are available.
The EventManager class can help you with this:
class MyEventListener
{
    public function postPersist(LifecycleEventArgs $eventArgs)
    {
        // in a listener you have the entity instance and the
        // EntityManager available via the event arguments
        $entity = $eventArgs->getEntity();
        $em = $eventArgs->getEntityManager();

        if ($entity instanceof User) {
            // do some stuff
        }
    }
}
$eventManager = $em->getEventManager();
$eventManager->addEventListener(Events::postPersist, new MyEventListener());
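For the token scenario from the question, the "do some stuff" branch boils down to hashing the now-available id. A sketch of just that step (Scala used for illustration; the salt handling is an assumption):
import java.security.MessageDigest

// Derive a public, hard-to-guess token from the generated id plus a secret
// salt, as the question proposes (md5 is acceptable here only because the
// token merely needs to be unguessable, not cryptographically strong).
def imageToken(imageId: Long, salt: String): String =
  MessageDigest.getInstance("MD5")
    .digest(s"$salt:$imageId".getBytes("UTF-8"))
    .map("%02x".format(_))
    .mkString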
Be sure to check, e.g., whether the User already has an Image; otherwise, if you call flush in the event listener, you might be caught in an endless loop.
Of course you could also make your User class aware of the image-creation operation with an inline postPersist event handler and add @HasLifecycleCallbacks to your mapping, and then always flush at the end of the request, e.g. in a shutdown function; but in my opinion this kind of stuff belongs in a separate listener. YMMV.
If you need the entity id before flushing, just after creating the object, another approach is to generate the ids for the entities within your application, e.g. using UUIDs.
Now you can do something like:
class Entity
{
    public function __construct()
    {
        $this->id = uuid_create();
    }
}
Now you have an id already set when you just do:
$e = new Entity();
And you only need to call EntityManager::flush at the end of the request.
In the end, I listened to @Arms, who commented on the question.
I started using a service layer for doing such things.
So now I have a method in the service layer which creates the Image entity. After it calls persist and flush, it calls the method that generates the thumbnail.
The Service Layer pattern is a good solution for such things.