What would be the Pattern to display all existing Actors - akka

I have built an Akka application for device management. Every device is an Akka actor, and I use the Akka Finite State Machine to control the device lifecycle (FUNCTIONAL, BROKEN, IN_REPAIRS, RETIRED, etc.). The devices are persisted to Cassandra with Akka Persistence.
Everything works like a dream, but I have a dilemma and would like to ask what the usual Akka pattern is for dealing with it.
I will have nearly 1 000 000 devices. Akka is ideal for managing the individual instances, but how do I implement the case where a user wants to see all devices in the system, select one, and change its state?
I can't show them from the Akka journal table, because I would not be able to show anything other than the persistenceId.
So how would you handle this dilemma?
My current plan: since all events come into my system from Kafka, I would also consume these messages from the topic and forward them to Solr/Elasticsearch. That way I can index some metadata together with the persistenceId, and the user can select a device to process with its Akka actor.
Do you have a better idea, or how would you solve this?
Another option would be to save this information in Cassandra in another keyspace, but for some reason I don't fancy it.
Thanks for your answers.

Akka Persistence is for managing actor state so that it stays resilient to application failures (https://www.reactivemanifesto.org/). It may not be optimal for business use cases. I understand that your requirement is to be able to browse the actors in the system. I see a couple of options:
Option 1:
Akka supports named actors (https://doc.akka.io/docs/akka/current/general/addressing.html). In your case devices map one-to-one to actors, so you can take advantage of this: when you create the actors, name each one after its device id. You can then browse your device ids (how you do that is specific to your use case; a searchable module on Solr/Elasticsearch, as you mentioned, would work). Browsing devices is then the same as browsing the actors in your system, and you can use the named actor path to look up the actor and send it commands.
Option 2:
You can use monitoring tools to trace and browse the actors in the application. Beyond this need, they provide several other useful metrics.
https://www.lightbend.com/blog/akka-monitoring-telemetry
https://kamon.io/solutions/monitoring-for-akka/

Akka Persistence is heavily oriented to the Command-Query Responsibility Segregation style of implementing systems. There are plenty of great outlines describing this pattern if you want more depth, but the broad idea is that you divide responsibility for changing data (the intent to change data being modeled through commands) from responsibility for querying data. In some cases this responsibility carries through to separately deployed services, but it doesn't have to (the more separated, in terms of deployment/operations or development, the less coupled they are, so there's a cost/benefit tradeoff for where you want to be on the level-of-segregation spectrum).
The portion of the system that handles commands and decides how (or even if) a given command updates state is often called the "write-side". In your application, the FSM actors modeling the state of a device and persisting changes would be the write-side, and you seem to have that part down pat.
The portion handling the queries is, correspondingly, often called the "read-side", and one key benefit is that it can use a different data model than the write-side, up to and including using a different data store (e.g. Solr/Elasticsearch).
Since you're using Akka Persistence and event-sourcing (judging from mentioning the journal table), Akka Projections provides a good opinionated wrapper for publishing events from the write-side to Kafka for another service to update a Solr/Elasticsearch read-side with. It does require (at least at this time) that your write-side tag events; with some effort you can do something similar by combining the persistenceIds and eventsByPersistenceId query streams to feed events from the write-side to Kafka without having to tag.
Note that when going down the CQRS path, you are generally committing to some level of eventual consistency between the write-side and the read-side.
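To make the read-side idea concrete, here is a minimal sketch in Python (not Akka code). It assumes a hypothetical event shape (`DeviceEvent` with `persistence_id`, `new_state`, `location`) and uses an in-memory dictionary as a stand-in for the Solr/Elasticsearch index; none of these names come from Akka or from the question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceEvent:
    """Hypothetical event published by the write-side (field names are assumptions)."""
    persistence_id: str
    new_state: str   # e.g. "FUNCTIONAL", "BROKEN"
    location: str


class DeviceReadModel:
    """In-memory stand-in for a search index such as Solr/Elasticsearch."""

    def __init__(self) -> None:
        self._docs: dict[str, dict[str, str]] = {}

    def apply(self, event: DeviceEvent) -> None:
        # Upsert: the most recent event for a persistenceId wins.
        self._docs[event.persistence_id] = {
            "persistenceId": event.persistence_id,
            "state": event.new_state,
            "location": event.location,
        }

    def find_by_state(self, state: str) -> list[str]:
        # Query side: return the persistenceIds of all devices in a given state.
        return sorted(
            doc["persistenceId"]
            for doc in self._docs.values()
            if doc["state"] == state
        )


# Feed a small event stream into the read model, in order.
read_model = DeviceReadModel()
for event in [
    DeviceEvent("device-1", "FUNCTIONAL", "Berlin"),
    DeviceEvent("device-2", "FUNCTIONAL", "Paris"),
    DeviceEvent("device-1", "BROKEN", "Berlin"),
]:
    read_model.apply(event)

broken = read_model.find_by_state("BROKEN")
```

The UI would query the read model to list devices, then send a command to the actor identified by the selected persistenceId. Because the read model is updated after the event is persisted, a query can briefly return stale state.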

Related

Akka cluster sharding: do shard entities share a journal?

I am following an akka tutorial demonstrating cluster sharding. In the cluster sharding example, the author starts up a shared journal and makes the following comment:
// Start the shared journal one one node (don't crash this SPOF)
// This will not be needed with a distributed journal
the journal used is:
journal.plugin = "akka.persistence.journal.leveldb-shared"
Why do shard entities share a journal? My understanding is that Akka Persistence doesn't support multiple writers but does support multiple readers. What is the need for a shared journal? I was under the impression that each persistent actor has its own journal. Why would the non-shared LeveldbJournal not support distributed reads? Is there any difficulty with doing that?
The tutorial is based on Akka 2.4, and in this version cluster sharding uses persistence as the default for akka.cluster.sharding.state-store-mode. In this example, what component exactly uses the snapshot/journal support? Is it the persistent actors in the different shards, or is it information about the shards relating to their replication? What exactly needs to be distributed? I find the relevant documentation vague and confusing.
If I were to have only one shard, do I need to have a distributed journal?
A somewhat related question: I have reimplemented the now deprecated PersistentView based on PersistenceQuery. I can query the journal for the events from a persistent actor and set up a stream to receive its persisted events. I have tested it and it works. However, I can't get it to receive the events from a sharded actor in my test environment with InMemoryJournalStorage (which I don't believe is a distributed journal). In my test scenario I only have one shard and one actor, and I use the unique persistenceId of the actor to query it, but I don't receive any events on the read side. Is there something I am missing about getting Akka Persistence to work with cluster sharding? Should I be appending or prepending something to the persistenceId used to query for events?
They shouldn't, at least not in production code. See the warning note here:
http://doc.akka.io/docs/akka/current/java/persistence.html#shared-leveldb-journal
A shared LevelDB instance is a single point of failure and should therefore only be used for testing purposes.
Both: the persistent actors in the shards and the sharding state itself use the journal.
Yes, if you wanted failover to work. If you didn't want failover and all you had was a single shard, then there would be no point using sharding at all.
Can't tell without seeing some of your code.

Microservice Composition Approaches

I have a question for the microservices community. I'll give an example from the educational field but it applies to every microservices architecture.
Let's say I have student-service and licensing-service with a business requirement that the number of students is limited by a license. So every time a student is created a licensing check has to be made. There are multiple types of licenses so the type of the license would have to be included in the operation.
My question is which approach have you found is better in practice:
1. Build a composite service that calls the 2 services
2. Coupling student-service to licensing-service so that when createStudent is called the student-service makes a call to licensing-service and only when that completes will the student be created
3. Use an event-based architecture
People talk about microservice architectures being more like a graph than a hierarchy, and option 1 kind of turns this into a hierarchy where you get increasingly coarse composites. Another downside is that it creates confusion as to which service clients should actually use, and there's some duplication because the composite's API would have to include all of the parameters needed to call the downstream services.
It does have a big benefit because it gives you a natural place to do failure handling, choreography and handle consistency.
Option 2 seems like it has disadvantages too:
the API of licensing would have to leak into the student API so that you can specify licensing restrictions.
it puts a lot of burden on the student-service because it has to handle consistency across all of the dependent services
as more services need to react when a student is created I could see the dependency graph quickly getting out of control and the service would have to handle that complexity in addition to the one from its own logic for managing students.
Option 3, while being decoupling heaven, I don't really think would work, because this is all triggered from a UI and people aren't really used to a "go do something else until this new student shows up" approach.
Thank you
Options 1 and 2 create tight coupling, which should be avoided as much as possible because you want your services to be independent. So the question becomes:
How do we do this with an event-based architecture?
Use events to keep track of licensing information from the license service inside the student service, which is effectively data duplication. The drawback is that you only get eventual consistency, because the duplication is asynchronous.
Use asynchronous events to trigger an event chain that ultimately triggers student creation. From your question, it looks like you already have the idea but are struggling with the UI. You have two options here: wait for the student creation (or failure) event with a short timeout, or (even better) make your system completely reactive by using a server-to-client push mechanism for the UI.
Application licensing and creating students are orthogonal so option 2 doesn't make sense.
Option 1 is more sensible but I would try not to build another service. Instead I would try to "filter" calls to student service through licensing middleware.
This way you could use this middleware for other service calls (e.g. classes service) and changes in API of both licensing and students can be done independently as those things are really independent. It just happens that licensing is using number of students but this could easily change.
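The "filter calls through licensing middleware" idea could look roughly like the Python sketch below. Everything in it is hypothetical: `LicenseChecker` is an in-process stand-in for a client of the licensing service, and `licensed` is a decorator playing the role of the middleware. A real deployment would more likely put this in an API gateway or an HTTP filter.

```python
import functools


class LicenseExceeded(Exception):
    """Raised when no capacity is left for the requested license type."""


class LicenseChecker:
    """Hypothetical stand-in for a client of the licensing service."""

    def __init__(self, limits: dict[str, int]) -> None:
        self._limits = limits
        self._used: dict[str, int] = {}

    def reserve(self, license_type: str) -> None:
        used = self._used.get(license_type, 0)
        if used >= self._limits.get(license_type, 0):
            raise LicenseExceeded(license_type)
        self._used[license_type] = used + 1


def licensed(checker: LicenseChecker, license_type: str):
    """Middleware: reserve a license before letting the call through."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(*args, **kwargs):
            checker.reserve(license_type)  # raises if the limit is reached
            return handler(*args, **kwargs)
        return wrapper
    return decorator


checker = LicenseChecker({"student": 1})
students: list[str] = []


@licensed(checker, "student")
def create_student(name: str) -> str:
    # The student service itself knows nothing about licensing.
    students.append(name)
    return name


create_student("Ada")
try:
    create_student("Bob")
    rejected = False
except LicenseExceeded:
    rejected = True
```

The same `licensed` wrapper could guard other handlers (for example a classes service) with a different license type. Note that this sketch does not release the reservation if the wrapped handler fails; a real implementation would need to handle that.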
I'm not sure how option 3, an event-based approach can help here. It can solve other problems though.
IMHO, I would go with option 2. A couple of things to consider. If you are buying completely into SOA, and microservices in particular, you can't flinch every time a service needs to contact another service. Get comfortable with that; remember, that's the point. What I really like about option 2 is that a successful student-service response is not sent until the license-service request succeeds. Treat the license-service as any other external service; you might wrap it in a client object published in a JAR by the license-service.
"the API of licensing would have to leak into the student API so that you can specify licensing restrictions."
Yes, the license-service API will be used. You can call it leakage (someone has to use it) or encapsulation, so that the client requesting the student-service need not worry about licensing.
"it puts a lot of burden on the student-service because it has to handle consistency across all of the dependent services"
Some service has to take on this burden. But I would manage it organically. We are talking about one service needing another one. If this grows and becomes concretely troublesome, then a refactoring can be done. If the number of services that student-service requires grows, I think it can be elegantly refactored: maybe the student-service becomes the composite service, and groups of independently used services may be consolidated into new services if required. But if the services that student-service depends on are only used by student-service, then I do not know if it's worth grouping them off into their own service. Instead of burden and leakage, you can look at it as encapsulation and ownership, where student-service is the owner of that burden so it need not leak to other clients/services.
"as more services need to react when a student is created I could see the dependency graph quickly getting out of control and the service would have to handle that complexity in addition to the one from its own logic for managing students."
The alternative would be various composite services. As with my response to the previous point, this can be tackled elegantly if it surfaces as a real problem.
If forced, each of your options can be turned into a viable solution. I am making an opinionated case for option 2.
I recommend option 3. You have to choose between availability and consistency - and availability is most often desired in microservices architecture.
Your 'Student' aggregate should have a 'LicenseStatus' attribute. When a student is created, its license status is set to 'Unverified' and the student service publishes a 'StudentCreated' event. The LicenseService should then react to this event and attempt to reserve a license for this student. It would then publish a 'Reserved' or 'Rejected' event accordingly. The student service would update the student's status by subscribing to these events.
When the UI calls your API gateway to create a student, the gateway would simply call the Student service for creation and return a 202 Accepted or 200 OK response without having to wait for the student to be properly licensed. The UI can notify the user when the student is licensed through asynchronous communication (e.g. via long-polling or web sockets).
In case the license service is down or slow, only licensing would be affected. The student service would still be available and would continue to handle requests successfully. Once the license service is healthy again, the service bus will push any pending 'StudentCreated' events from the queue (Eventual consistency).
This approach also encourages expansion. A new microservice added in the future can subscribe to these events without having to make any changes to the student or license microservices (Decoupling).
With option 1 or option 2, you do not get any of these benefits and many of your microservices would stop working due to one unhealthy microservice.
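The flow described in this answer can be sketched with a tiny in-memory event bus in Python. This is only an illustration under stated assumptions: `EventBus`, the service classes, and the `seats` counter are hypothetical, and a real system would use a durable broker rather than an in-process queue. The event names 'StudentCreated', 'Reserved', and 'Rejected' and the 'Unverified' status follow the text above.

```python
from collections import defaultdict, deque


class EventBus:
    """Minimal in-memory bus: publish enqueues, drain delivers in order."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)
        self._queue = deque()

    def subscribe(self, event_type, handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, student_id) -> None:
        self._queue.append((event_type, student_id))

    def drain(self) -> None:
        # Deliver queued events, including ones published by handlers.
        while self._queue:
            event_type, student_id = self._queue.popleft()
            for handler in self._subscribers[event_type]:
                handler(student_id)


class StudentService:
    def __init__(self, bus: EventBus) -> None:
        self.bus = bus
        self.license_status: dict[str, str] = {}
        bus.subscribe("Reserved", lambda sid: self._set_status(sid, "Reserved"))
        bus.subscribe("Rejected", lambda sid: self._set_status(sid, "Rejected"))

    def create_student(self, student_id: str) -> None:
        # Returns immediately; licensing is resolved asynchronously.
        self.license_status[student_id] = "Unverified"
        self.bus.publish("StudentCreated", student_id)

    def _set_status(self, student_id: str, status: str) -> None:
        self.license_status[student_id] = status


class LicenseService:
    def __init__(self, bus: EventBus, seats: int) -> None:
        self.bus = bus
        self.seats = seats  # hypothetical remaining license capacity
        bus.subscribe("StudentCreated", self.on_student_created)

    def on_student_created(self, student_id: str) -> None:
        if self.seats > 0:
            self.seats -= 1
            self.bus.publish("Reserved", student_id)
        else:
            self.bus.publish("Rejected", student_id)


bus = EventBus()
student_service = StudentService(bus)
LicenseService(bus, seats=1)

student_service.create_student("s1")
student_service.create_student("s2")
status_before = dict(student_service.license_status)

bus.drain()  # simulates the broker delivering the pending events
status_after = dict(student_service.license_status)
```

Before `drain()` both students are 'Unverified', which corresponds to the window in which the UI has its 202 Accepted but no licensing result yet. If the license service were down, the events would simply stay queued until it came back.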
I know the question has been asked a while ago, but I think I have something to say that might be of value here.
First of all, your approach will depend on the overall size of your final product. My rule of thumb: if there would be too many dependencies between individual microservices, I use something that simplifies and possibly removes these dependencies. I don't want to end up with a spider web of services! A good thing to look at here is message queues, RabbitMQ for example.
However, if I have just a few services that talk to each other, I will just make them call each other directly, because the alternatives, while simplifying the architecture, add some computing and infrastructure overhead.
Whatever approach you decide to go with, design your services with a hexagonal architecture in mind! This will save you trouble when you decide to migrate from one solution to another. What I tend to do is design my DAOs as "adapters", so a DAO that calls Service A will either call it directly or via a message queue, independent of the business logic. When I need to change it, I can just swap this DAO for another one without having to touch any of the business logic (at the end of the day, business logic doesn't care how it gets the data). Hexagonal architecture fits really well with microservices, TDD and black-box testing.
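A minimal Python sketch of the "DAO as adapter" idea follows. The names (`ServiceAPort`, `DirectAdapter`, `QueueAdapter`, `greeting`) are hypothetical, and both adapters only simulate their transport with in-process data; the point is that the business logic depends on the port alone.

```python
from collections import deque
from typing import Protocol


class ServiceAPort(Protocol):
    """Port: what the business logic needs from Service A."""

    def fetch(self, key: str) -> str: ...


class DirectAdapter:
    """Adapter that would call Service A directly (simulated with a dict)."""

    def __init__(self, backend: dict[str, str]) -> None:
        self._backend = backend

    def fetch(self, key: str) -> str:
        return self._backend[key]


class QueueAdapter:
    """Adapter that would go through a message queue (simulated with a deque)."""

    def __init__(self, backend: dict[str, str]) -> None:
        self._backend = backend
        self._requests: deque[str] = deque()
        self.sent = 0  # number of request messages enqueued so far

    def fetch(self, key: str) -> str:
        self._requests.append(key)          # enqueue the request message
        self.sent += 1
        pending = self._requests.popleft()  # a remote consumer would do this
        return self._backend[pending]


def greeting(port: ServiceAPort, user_id: str) -> str:
    # Business logic: unaware of how the data is obtained.
    return f"Hello, {port.fetch(user_id)}!"


data = {"u1": "Ada"}
queue_adapter = QueueAdapter(data)
via_direct = greeting(DirectAdapter(data), "u1")
via_queue = greeting(queue_adapter, "u1")
```

Swapping the transport means constructing a different adapter; `greeting` does not change, which is also what makes it easy to test with a fake adapter.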

How to enforce entity dependencies in SOA environment - build / download?

When establishing several modular and independent services, I am challenged with dependencies / stored relationships between entities. Consider Job Position and Employee. In my system, the Employee's Assignment is linked (URI) to the Job Position.
For our application, the Job Positions would be managed by a separate service than the Employee service, which leads to the challenge of constraints to prevent inadvertent removal of a Job Position if an employee is already matched to that position.
I've designed a custom solution leveraging a Registry (which should have dependency details, etc.) and enforce a paradigm across the inter-dependent services, however it is complex. In the SOA environment, how could one manage these inter-dependencies?
Many thanks in advance!
In some ways your question could be rephrased as "How to enforce referential integrity in an SOA environment". Well, the answer is you can't. That's a by-product of the "autonomous" tenet of SOA.
So almost by definition, the Job Position in the Employee service is not the same thing as the Job Position in the Job Position service. This is actually a good thing. Even though both services define Job Position, they do so from two different capabilities, and are free to develop and evolve their capability as needs arise.
So, hard constraints on the removal of data within one service boundary based on the existence of similar data inside another service boundary are just not possible (or even desirable).
This is all very well, but then how do you avoid the situation where Employees may be "matched" to a Job Position which has changed in some way, either via removal or update?
Well, services can be interested in changes to other services. And in these situations, services can become consumers of each other. It's fairly obvious the Employee capability would be interested in changes to the Job Position capability.
Events are actually a fairly well-used design pattern for this scenario. If a business action results in a change to the data of a service, that service can publish an event message which describes the change. Other services can become consumers of this type of event and can handle it in their own fashion. Because eventing is usually implemented with pub-sub semantics, any service capability which so desires can subscribe to the event.
In your example, the event which could be published if a job position was deleted could be defined as (using C#):
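public class JobPositionRemoved
{
    // Identifier of the job position that was removed.
    public int JobPositionId { get; set; }
    // Name of the removed position, for consumers that want to display it.
    public string JobPositionName { get; set; }
    ...
}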
How a consumer of this event actually handles it (what action would be taken by the consumer) is another question and would depend on the capability of the consumer. As an example, your Employee service could gather a list of the Employees with this job position and flag them for review, or add them to a queue for "job position reassignment".
Your event could even include a field called int ReplacedByJobPosition which would enable consumers to automatically update any capability that depended on the removed job position.
As long as your event is delivered across a fault-tolerant transport (such as message queuing), you can be fairly confident that while you won't have referential integrity between your service capabilities, your system as a whole should become consistent eventually.
By using events in this way, you also avoid the need for a centralized registry of inter-dependencies (which sounds like a nasty idea). Each service is responsible for publishing events about changes to its own data, and dependencies are defined by services consuming events from each other.
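As an illustration of the consumer side, here is a small Python sketch of how an Employee service might handle the event. It is a sketch under assumptions: the `EmployeeService` class, its `assignments` mapping and its `review_queue` are hypothetical, and the event fields mirror the C# class above plus the optional ReplacedByJobPosition field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JobPositionRemoved:
    job_position_id: int
    job_position_name: str
    replaced_by_job_position: Optional[int] = None


class EmployeeService:
    """Hypothetical consumer that keeps its own view of assignments."""

    def __init__(self, assignments: dict[str, int]) -> None:
        self.assignments = dict(assignments)  # employee -> job position id
        self.review_queue: list[str] = []

    def handle(self, event: JobPositionRemoved) -> None:
        # sorted() copies the items, so updating the dict in the loop is safe.
        for employee, position in sorted(self.assignments.items()):
            if position != event.job_position_id:
                continue
            if event.replaced_by_job_position is not None:
                # A replacement is known: update automatically.
                self.assignments[employee] = event.replaced_by_job_position
            else:
                # No replacement: flag the employee for manual reassignment.
                self.review_queue.append(employee)


service = EmployeeService({"alice": 1, "bob": 2, "carol": 1})
service.handle(JobPositionRemoved(1, "Clerk"))
service.handle(JobPositionRemoved(2, "Analyst", replaced_by_job_position=3))
```

After these two events, alice and carol are waiting in the review queue and bob has been moved to position 3. The Job Position service never needed to know that the Employee service exists.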
Hope this is helpful.
EDIT
In answer to your comment - while I can see the benefit of having another service taking care of the position:reassignment problem and I don't see any massive problems with this, there are a few considerations.
One of the reasons why service boundaries and business capability boundaries are a natural fit is that when you change a business capability (eg a change in Billing procedure) it does not generally impact other business capabilities (CRM/Finance/etc). By introducing shared services you're coupled to more than one capability, your service doesn't have well defined boundaries, and as a result has a higher cost of ownership as it will need to be changed a lot.
Additionally you could argue that the consumer of a business event (eg, JobPositionRemoved) should take responsibility for the entire handling of that event.
The handling of the event may well trigger a subsequent event to be published (such as ReviewTaskCreatedForEmployeeChange) which can then be handled by another consumer (eg a workflow tool) if desired.

Akka clustering - force actors to stay on specific machines

I've got an Akka application that I will be deploying on many machines. I want these applications to communicate with each other using the distributed publish/subscribe event bus features.
However, if I set the system up for clustering, then I am worried that actors for one application may be created on a different node to the one they started on.
It's really important that an actor is only created on the machine that the application it belongs to was started on.
Basically, I don't want the elasticity or the clustering of actors, I just want the distributed pub/sub. I can see options like singleton or roles, mentioned here http://letitcrash.com/tagged/spotlight22, but I wondered what the recommended way to do this is.
There is currently no feature in Akka which would move your actors around: either you programmatically deploy to a specific machine or you put the deployment into the configuration file. Otherwise it will be created locally as you want.
(Akka may one day get automatic actor tree partitioning, but that is not even specified yet.)
I think this is not the best way to use elastic clustering. But we have looked at the same issue and found that it can be useful to spread actors over the nodes by a hash of the entity id (like database shards). For example, on each node we create one NodeRouterActor that proxies messages to multiple WorkerActors. When we send a message to a NodeRouterActor, it selects the destination node by looking it up in a hash table under the key id % nodeCount; the NodeRouterActor on that node then proxies the message to the specific WorkerActor that controls the entity.
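The routing rule in this answer can be sketched in a few lines of Python (not Akka code). `NodeRouter` and `route` are hypothetical names, entity ids are assumed to be integers, and each per-entity worker is modeled as a plain list acting as its mailbox.

```python
class NodeRouter:
    """Stand-in for the per-node NodeRouterActor."""

    def __init__(self, name: str) -> None:
        self.name = name
        # One worker (modeled as a mailbox list) per entity on this node.
        self.workers: dict[int, list[str]] = {}

    def deliver(self, entity_id: int, message: str) -> None:
        # Create the entity's worker on first use, then hand over the message.
        self.workers.setdefault(entity_id, []).append(message)


def route(nodes: list[NodeRouter], entity_id: int, message: str) -> str:
    # The destination node is chosen by id % nodeCount, as described above.
    node = nodes[entity_id % len(nodes)]
    node.deliver(entity_id, message)
    return node.name


nodes = [NodeRouter("node-0"), NodeRouter("node-1"), NodeRouter("node-2")]
first = route(nodes, 7, "start")   # 7 % 3 == 1
second = route(nodes, 7, "stop")   # same entity, same node
third = route(nodes, 9, "start")   # 9 % 3 == 0
```

All messages for entity 7 land on node-1 regardless of which node sent them. One caveat of plain modulo routing: changing the number of nodes remaps most ids to different nodes.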

n-tier design with website and backend transaction processor

We have a website, where transactions are entered in and put through a workflow. We are going to follow the standard BLL(Business Logic Layer), DTO(Data Transfer Object), DAL(Data Access Layer) etc. for a tiered application. We have the need to separate everything out because some transactions will cross multiple applications with different business logic.
We also have a backend processor. It handles our transactions once the workflow has been completed. It works with various third party systems, some of which are unstable, or the interface to them is unstable, and then reports the status of the transaction. Each website will have its own version of the backend processor.
Now the question: N-Tier guidance suggests a new BLL for each application. With the layout described above, it can be argued either that the backend processor and the website are one application acting in unison, or that they are two applications with different business logic. What would be the ideal way to handle this? Have it act like one system, or two?
One thing that I picked up on while learning MVC over the last couple years is the difference between what I call application logic and domain logic. I don't like the term business logic anymore, because it has too much baggage from all the conflicting theories and practices that have used that term too loosely.
Domain logic is the "traditional" business logic: how things are supposed to act, what they require (validation), etc. Application logic is anything that is specific to a given presentation of your domain, e.g. when the user clicks this submit button in your web app, they are directed to this web page over here (note that this has nothing to do with how a WinForms app or a background processor would work). Application logic should live in your application. Domain logic should live in your BLL and lower, and be reusable across the different applications that may use your common "business logic".
Kind of a general answer, but I hope that helps.
You might consider partitioning the functionality to reflect the organization of the stakeholders. Usually, if you have two distinct organizational groups, then development and administration requirements are easier to manage if the functionality is similarly partitioned. And vice versa.
Most of us don't spend that much time writing applications that explore the outer boundaries of hardware and software capabilities.
If you separate your concerns well, then I think you will be able to view them as the same application with a single business logic layer; there is no point writing the same code twice. The trick will be forcing the separation of concerns between the user interface portions of the website and the business logic in your BLL library.
Performance is going to be an issue as well: you have to ensure that your batch processing doesn't starve the website of the resources it needs to do its own work. This may be an argument for keeping them more separate; however, as they're likely sharing a database anyway (or some other file-based resource), that may be an issue regardless.
I would keep a common business logic library programmed to interfaces and fully separated from your other concerns.
The "Ideal" way to do this depends on the project at hand and the various requirements of the system.
My default design is to have it act as one app. But if there are more heavyweight processes taking place, I like to create a batching process where the parameters of the requested job are stored and acted upon by a separate process.
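A rough Python sketch of that batching approach is below, using only the standard library. The table layout, the function names, and the `transaction_id` parameter are assumptions for illustration; an in-memory SQLite database stands in for whatever store the website and the processor would actually share.

```python
import json
import sqlite3


def open_job_store() -> sqlite3.Connection:
    # In-memory for the sketch; a real setup would use a shared database.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs ("
        "id INTEGER PRIMARY KEY, params TEXT NOT NULL, status TEXT NOT NULL)"
    )
    return conn


def submit_job(conn: sqlite3.Connection, params: dict) -> int:
    """Website side: store the job parameters and return immediately."""
    cur = conn.execute(
        "INSERT INTO jobs (params, status) VALUES (?, 'PENDING')",
        (json.dumps(params),),
    )
    conn.commit()
    return cur.lastrowid


def process_pending(conn: sqlite3.Connection, handler) -> int:
    """Processor side: run the handler on each pending job, oldest first."""
    rows = conn.execute(
        "SELECT id, params FROM jobs WHERE status = 'PENDING' ORDER BY id"
    ).fetchall()
    for job_id, params in rows:
        handler(json.loads(params))
        conn.execute("UPDATE jobs SET status = 'DONE' WHERE id = ?", (job_id,))
    conn.commit()
    return len(rows)


conn = open_job_store()
job_id = submit_job(conn, {"transaction_id": 42})

processed: list[dict] = []
first_run = process_pending(conn, processed.append)
second_run = process_pending(conn, processed.append)  # nothing left to do
```

In practice `process_pending` would run in the separate backend process on a schedule, so a slow or unstable third-party call never blocks a web request.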