I am wondering how to handle this specific case.
I have two ClientServices that I want to provide to a web application. By ClientService, I mean a client API that calls some external REST service. So we are in spray-client territory here.
The thing is, one of the two services can be quite intensive and time-consuming, but it is called less frequently than the other one, which is quicker but called very frequently.
I was thinking of having two dispatchers for the two ClientServices. Let's say we have the query API (ClientService1) and the classification API (ClientService2).
Both services shall indeed be based on the same actor system. In other words, I would like to have two dispatchers in my actor system, then pass them to spray via the client-level API, for instance pipeline.
Is this feasible, scalable, and appropriate?
Or would you recommend instead using one dispatcher but with a bigger thread pool?
Also, how can I obtain a dispatcher? Should I create a thread-pool executor myself and get a dispatcher out of it? How do I get an actor system to load/create multiple dispatchers, and how do I retrieve them so that I can pass them to the pipeline method?
I know how to create an actor with a specific dispatcher; there are examples of that, but this is a different scenario. I would not like to go lower than the client-level API, by the way.
Edit
I have found that the system.dispatchers.lookup method can create one. So that should do.
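For reference, here is a minimal sketch of what I mean, assuming spray-client 1.x and dispatcher names of my own choosing (query-dispatcher, classification-dispatcher) defined in application.conf:

import akka.actor.ActorSystem
import scala.concurrent.ExecutionContext

// application.conf (the dispatcher names are illustrative):
//   query-dispatcher {
//     type = Dispatcher
//     executor = "fork-join-executor"
//     fork-join-executor {
//       parallelism-min = 2
//       parallelism-max = 8
//     }
//   }
//   classification-dispatcher { ... similar, sized for the heavy calls ... }

val system = ActorSystem("client-services")

// Each lookup returns a MessageDispatcher, which is an ExecutionContext,
// so it can be passed wherever the client-level API expects one.
val queryEc: ExecutionContext = system.dispatchers.lookup("query-dispatcher")
val classificationEc: ExecutionContext = system.dispatchers.lookup("classification-dispatcher")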
However, one thing is still not clear to me, and it relates to Akka IO/Spray IO: the manager IO(HTTP). It is not clear to me on which dispatcher it runs, or whether that can be configured.
Moreover, let's say I pass a different execution context to the pipeline method. What happens? Will IO(HTTP) still run on the default execution context, or on its own (I don't know how it is done internally)? Also, what exactly will be run on the execution context that I pass (in other words, which actors)?
Related
So I have a background in working with event sourcing and microservices. Usually the best way to enforce bounded contexts, yet still be able to make your aggregates communicate, is to have either some kind of Choreography or some kind of Orchestration.
In a Choreography, Aggregate A raises eventA, which Aggregate B listens to and handles; then, after doing whatever needs to be done, B raises eventB, and A listens to it, handles it, and proceeds. It's effective, and it respects event-sourcing and DDD rules.
In an Orchestration, Aggregate A raises eventA, which the orchestrator O listens to and handles; O then issues a command B to Aggregate B, which in turn runs what's needed and raises eventB; orchestrator O handles that event and issues a command A, and so on. It adds a level of complexity, but it's great for an added level of segregation; also, this way Aggregates A and B are not listening to or handling each other's events.
Obviously these two methods have their own pros and cons, but both work perfectly in a microservice context.
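To make the Choreography variant concrete, here is a minimal sketch in Scala (the names and the EventBus abstraction are purely illustrative, not from any framework):

// Choreography: aggregates subscribe to each other's events directly.
sealed trait Event
case class EventA(payload: String) extends Event
case class EventB(payload: String) extends Event

trait EventBus {
  def publish(e: Event): Unit
  def subscribe(handler: PartialFunction[Event, Unit]): Unit
}

class AggregateB(bus: EventBus) {
  bus.subscribe {
    case EventA(p) =>
      // B does whatever needs to be done, then announces the result
      bus.publish(EventB(s"reaction-to-$p"))
  }
}

class AggregateA(bus: EventBus) {
  bus.subscribe {
    case EventB(p) =>
      // A handles it and proceeds with its flow
      println(s"A proceeding after $p")
  }
}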
The issue I'm facing is that, for me, there is no context. I'm working with AWS Lambdas: whenever an event is pushed to the store, I have a Lambda listening to DB (event store) changes that then does something. It was working perfectly until I needed to add a second aggregate.
And now, to achieve a choreography or an orchestration, I either need a context (which is not a thing for Lambdas) and an event bus, or I need to add a Lambda for every event, which would lead to total chaos.
For example, if Aggregate A needs something from Aggregate B before continuing its flow, it will push an event to the event store, and I will have to handle that event with a new Lambda; so for every type of interaction between Aggregate A and Aggregate B, I will need two Lambdas.
Maybe I'm missing something; after all, I'm new to AWS Lambdas and more used to working with microservices.
Perhaps what you're after is a Process Manager:
(…) a process manager is a class that coordinates the behavior of the aggregates in the domain. A process manager subscribes to the events that the aggregates raise, and then follows a simple set of rules to determine which command or commands to send. The process manager does not contain any business logic; it simply contains logic to determine the next command to send. The process manager is implemented as a state machine, so when it responds to an event, it can change its internal state in addition to sending a new command.
The above definition can be found in Microsoft's Exploring CQRS and Event Sourcing. This is a way of having orchestration in an event-driven system. The original definition, AFAIK, can be found in Gregor Hohpe's Enterprise Integration Patterns.
In AWS land, you'd have your lambda(s) reacting to those events and firing off commands (either via a command bus, if you have such concept in your system, or by directly invoking other lambdas).
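As a sketch of the shape this takes (the domain names here are illustrative, not from your system), the process manager is just a state machine from events to commands:

// A process manager holds no business logic: it only maps
// (current state, incoming event) -> next command(s) to send.
sealed trait Event
case class OrderPlaced(orderId: String)   extends Event
case class StockReserved(orderId: String) extends Event

sealed trait Command
case class ReserveStock(orderId: String) extends Command
case class ConfirmOrder(orderId: String) extends Command

class OrderProcessManager {
  private var state: String = "AwaitingOrder"

  def handle(event: Event): Seq[Command] = event match {
    case OrderPlaced(id) =>
      state = "AwaitingReservation"
      Seq(ReserveStock(id))   // command for Aggregate B
    case StockReserved(id) =>
      state = "Done"
      Seq(ConfirmOrder(id))   // command for Aggregate A
  }
}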
Imagine I have a JobsEndpoint class, which contains a JobSupervisor actor, which in turn has two child actors, RepositoryActor and StreamsSupervisorActor. The behavior for different requests to this top-level JobSupervisor will need to be performed in the appropriate child actor. For example, a request to store a job will be handled exclusively in the RepositoryActor, etc.
My question, then, is whether it is an anti-pattern to pass the request context through these actors via the messages, and then complete the request as soon as it makes sense.
So instead of doing this:
Request -> Endpoint ~ask~> JobSupervisor ~ask~> RepositoryActor
Response <- Endpoint <- JobSupervisor <-|return result
I could pass the RequestContext in my messages, such as StoreJob(..., ctx: RequestContext), and then complete it in the RepositoryActor.
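Sketched out, that second approach would look roughly like this (spray's RequestContext; the Job type and the status code are just placeholders of mine):

import akka.actor.Actor
import spray.http.StatusCodes
import spray.routing.RequestContext

case class Job(id: Long)
case class StoreJob(job: Job, ctx: RequestContext)

class RepositoryActor extends Actor {
  def receive = {
    case StoreJob(job, ctx) =>
      // ... persist the job ...
      ctx.complete(StatusCodes.Created) // complete here, instead of bubbling a result back up
  }
}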
I admittedly haven't been using Akka long but I see a few opportunities for improvement.
First, you are chaining "ask" calls, which block threads. In some cases that's unavoidable, but I think in your case it is avoidable. When you block threads, you're potentially hurting your throughput.
I would have the Endpoint send a message with its ActorRef as a "reply to" field. That way, you don't have to block the Endpoint and JobSupervisor actors. Whenever the RepositoryActor completes the operation, it can send the reply directly to the Endpoint without having to traverse the middlemen.
Depending on your messaging guarantee needs, the Endpoint could implement retrying and de-duplicating if necessary.
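A rough sketch of that "reply to" approach (the message and actor names mirror your example; the persistence step is elided):

import akka.actor.{Actor, ActorRef}

case class StoreJob(jobId: String, replyTo: ActorRef)
case class JobStored(jobId: String)

class JobSupervisor(repository: ActorRef) extends Actor {
  def receive = {
    case msg: StoreJob => repository ! msg // forward as-is; nothing blocks here
  }
}

class RepositoryActor extends Actor {
  def receive = {
    case StoreJob(id, replyTo) =>
      // ... store the job ...
      replyTo ! JobStored(id) // reply straight to the Endpoint, skipping the middlemen
  }
}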
Ideally, each actor will have everything it needs to process a message in the message itself. I'm not sure what your RequestContext includes, but I would consider:
1) Is it hard to create one? This impacts testability. If the RequestContext is difficult to create, I would opt for pulling out just the needed members so that I could write unit tests.
2) Can it be serialized? If you deploy your actor system in a cluster environment, then you'll need to serialize the messages. Messages that are simple data holders work best.
I have a use case where the client app will make a REST call with MULTIPLE product identifiers and get their details. I have a Lambda function (exposed via API Gateway) which can take a SINGLE product ID and get its details. I need to get them to work together. What is a good solution for this?
1) Modify the client app so that it makes single-product-ID requests. No change to the Lambda function is required then, but this increases the client app's network calls, as it will invoke the Lambda for each product ID separately.
2) Modify the Lambda function so that it can handle multiple product IDs in the same call. But this increases the Lambda's response time.
3) Create a new Lambda function which takes in multiple product IDs and then calls the single-product Lambda function. But I'm not sure how to aggregate the responses before sending them back to the client app.
Looking for suggestions.
Option 1 is less optimal, as it puts a chatty protocol between the client and the server.
If there are no consumers of the single-ID function, then I would not keep that one alive (so not option #3).
Depending on your language of choice, it might be best to implement option 2 (or 3, for that matter). node.js (and C# too, btw) makes it very easy to perform multiple calls in parallel (async calls) and then wait for all the results before returning to the client.
This means that you will not wait N times longer, just a bit longer, in line with your slowest call.
ES6 (modern JS, supported by Lambda) now supports Promise.all() for this purpose.
C# also natively supports these patterns with Task.WaitAll()
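The same fan-out/fan-in shape in Scala (the language used elsewhere in this thread) is Future.sequence; a minimal sketch, with a made-up fetchProduct standing in for the single-ID Lambda call:

import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical single-product lookup, standing in for one Lambda invocation.
def fetchProduct(id: String): Future[String] =
  Future { s"details-for-$id" }

val ids = List("p1", "p2", "p3")

// Fire all calls in parallel, then gather the results into one response.
val all: Future[List[String]] = Future.sequence(ids.map(fetchProduct))

// Total latency tracks the slowest call, not the sum of all calls.
println(Await.result(all, 5.seconds))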
Suppose I have the following two Actors:
Store
Product
Every Store can have multiple Products, and I want to dynamically split the Store into StoreA and StoreB, on multiple machines, under high traffic. Splitting the Store will also split its Products evenly between StoreA and StoreB.
My question is: what are the best practices for knowing where to send all future BuyProduct requests (StoreA or StoreB) after the split? The reason I'm asking is that when a request to buy ProductA is received, I want to send it to the Store which already has that Product's state in memory.
Solution: the only solution I can think of is to store the path of each Product (Map[productId: Long, storePath: String]) in a ProductsPathActor every time a new Product is created; then, for every BuyProduct request, I would query the ProductsPathActor, which would return the correct Store's path, and send the BuyProduct request to that Store.
Is there another way of managing this in Akka, or is my solution correct?
One good way to do this is with Akka Cluster Sharding. From the docs:
Cluster sharding is useful when you need to distribute actors across several nodes in the cluster and want to be able to interact with them using their logical identifier, but without having to care about their physical location in the cluster, which might also change over time.
There is an Activator Template that demonstrates it here.
For your problem, StoreA and StoreB would each be a ShardRegion, mapping 1:1 to your cluster nodes. The ShardCoordinator manages distribution between these nodes and acts as the conduit between regions.
For its part, your request handler talks to a ShardRegion, which routes the message, if necessary in conjunction with the coordinator. Presumably there is a JVM-local ShardRegion for each request handler to talk to, but there's no reason it could not be a remote actor.
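As a sketch against the classic Cluster Sharding API (akka.cluster.sharding, Akka 2.4+; the message shape and shard count are my own choices, and running this requires the cluster actor provider to be configured):

import akka.actor.{Actor, ActorSystem, Props}
import akka.cluster.sharding.{ClusterSharding, ClusterShardingSettings, ShardRegion}

case class BuyProduct(productId: Long, quantity: Int)

// One entity actor per product, addressed purely by its logical id.
class Product extends Actor {
  def receive = {
    case BuyProduct(_, _) =>
      // ... update this product's in-memory state ...
  }
}

// Which entity a message is for, and which shard that entity lives in.
val extractEntityId: ShardRegion.ExtractEntityId = {
  case msg @ BuyProduct(id, _) => (id.toString, msg)
}
val numberOfShards = 100
val extractShardId: ShardRegion.ExtractShardId = {
  case BuyProduct(id, _) => (id % numberOfShards).toString
}

val system = ActorSystem("store")
val productRegion = ClusterSharding(system).start(
  typeName        = "Product",
  entityProps     = Props[Product],
  settings        = ClusterShardingSettings(system),
  extractEntityId = extractEntityId,
  extractShardId  = extractShardId)

// Senders just use the logical id; no Map[productId, storePath] bookkeeping needed.
productRegion ! BuyProduct(42L, quantity = 1)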
When there is a change in the number of nodes, the ShardCoordinator needs to move the shards (i.e., the collections of entities managed by a ShardRegion) that are going to shut down, in a process called "rebalancing". During that period, the entities within those shards are unavailable, but messages to those entities will be buffered until they are available again. Here, "being available" means that the new ShardRegion responds to a directed message for that entity.
It's up to you to bring each entity back to life on the new node. Akka Persistence makes this very easy, but it requires you to use the Event Sourcing pattern. This isn't a bad thing, as it can lead to web-scale performance much more easily, especially when the database in use is something like Apache Cassandra. You will see that entities are "passivated", which is essentially caching their state off to disk so they can be restored on request, and Akka Persistence works with that passivation to transparently restore the entities under the control of the new ShardRegion – essentially a "move".
I have an application which uses spray-servlet to bootstrap my custom Spray routing Actor via spray.servlet.Initializer. The requests are then handed off to my Actor via spray.servlet.Servlet30ConnectorServlet.
From what I can gather, the Servlet30ConnectorServlet simply retrieves my Actor out of the ServletContext that the Initializer had set when the application started, and hands the HttpServletRequest to my Actor's receive method. This leads me to believe that only one instance of my Actor will have to handle all requests. If my Actor blocks in its receive method, then subsequent requests will queue waiting for it to complete.
Now, I realize that I can code my routing Actor to use detach() or a complete that returns a Future; however, most of the documentation never alludes to having to do this.
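For reference, the Future-returning variant I mean looks roughly like this in spray-routing (jobLookup is a placeholder of mine for the blocking work):

import scala.concurrent.{ExecutionContext, Future}
import spray.routing.HttpService

trait JobsRoute extends HttpService {
  implicit def ec: ExecutionContext
  def jobLookup(): String // placeholder for the blocking work

  val route =
    path("jobs") {
      get {
        complete {
          // The route itself returns immediately; spray completes
          // the request when the Future does.
          Future(jobLookup())
        }
      }
    }
}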
If my above assumption is true (single Actor instance handling all requests), is there a way to configure the Servlet30ConnectorServlet to perhaps load balance the incoming requests amongst multiple instances of my routing Actor instead of just the one? Or is this something I'll have to roll myself by subclassing Servlet30ConnectorServlet?
I did some research and now I understand better how spray-servlet is working. It's not spray-servlet that dictates the strategy for how many Request Handler Actors are created but rather the plumbing code provided with the example I based my application on.
My assumption all along was that spray-servlet would essentially work like a traditional Java EE application dispatcher in a handler-per-request type of fashion (or some reasonable variant of that notion). That is not the case because it is routing the request to an Actor with a mailbox, not some singleton HttpServlet.
I am now delegating the requests to a pool of actors in order to reduce our potential for bottleneck when our system is under load.
import akka.actor.Props
import akka.routing.RoundRobinPool

// Round-robin pool of request-handler actors, sized via configuration;
// the pool's routees are built from the plain Props of the routing actor.
val serviceActor = system.actorOf(
  RoundRobinPool(config.SomeReasonableSize).props(Props[BootServiceActor]),
  "my-route-actors")
I am still a bit baffled by the fact that the examples and documentation assume everyone will be writing non-blocking Request Handler Actors under spray. All of their documentation essentially demonstrates non-Future-rendering complete calls, yet there is no mention in their literature that maybe, just maybe, you might want to create a reasonably sized pool of Request Handler Actors to prevent a slew of requests from bottlenecking the poor, single, overworked Actor. Or it's possible I've overlooked it.