What issues can arise when I have an existing Akka application that runs on a single node, and then in the future I need to use Akka Cluster?
Are there any design considerations or best practices I should follow when transitioning from a single-node to a clustered Akka design? Or do things just work when you use Cluster?
The general advice is to start your design as if you were clustering from the start. In particular, you need to pay attention to latency differences between cluster members, so make sure that what you're passing over the network (between actors on different machines) is not "huge". You also need to pay attention to location transparency (Akka solves this), partial failure, and concurrency. Each of these things can affect your design in significant ways, so it is best to start out considering them and then optimize for the single-node cluster case.
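For example, here is a minimal sketch of a cross-node message protocol designed with that in mind (the CborSerializable marker trait, the jackson-cbor binding, and the message names are illustrative assumptions, not something your existing app necessarily has):

    // Marker trait bound to a serializer in configuration, e.g. (assumption):
    //   akka.actor.serialization-bindings { "com.example.CborSerializable" = jackson-cbor }
    trait CborSerializable

    // Keep cross-node messages small: pass identifiers or locations, not the data itself.
    final case class ProcessImage(jobId: String, imageUri: String) extends CborSerializable
    final case class ImageProcessed(jobId: String, resultUri: String) extends CborSerializable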
Related documentation (see also the referenced paper from Sun, 1994, "A Note on Distributed Computing").
I have a lot of worker nodes in my akka-cluster, and because they are unstable they trigger the "down all when unstable" decision; but they don't have the SBR's role.
Why is the "down all when unstable" decision not taken based on the SBR's role?
To solve this problem, should I have distinct clusters or use a Multi-DC cluster?
The primary constraint a split-brain resolver has to meet is that every node in the cluster reaches the same decision about which nodes need to be downed (including downing themselves). In the presence of different decisions being made, the guarantees of Cluster Sharding and Cluster Singleton no longer apply: there may be two incarnations of the same sharded entity or the singleton might not be a singleton.
Because there's latency inherent in disseminating reachability observations around the cluster, the less time that has elapsed since a change in reachability observations, the more likely it is that there's a node in the cluster which would disagree with our node about which nodes are reachable. That disagreement opens the door for that node to make a different SBR decision than the one our node would make. The only SBR strategy that guarantees every node makes the same decision, even if there's disagreement about membership or reachability, is down-all.
Accordingly, the SBR delays making a decision until enough time has passed since the last cluster membership or reachability change. In a particularly unstable cluster, if too much time passes without achieving that stability, the SBR falls back to the down-all strategy, which does not take cluster roles into account.
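For reference, a minimal sketch of the settings involved (the durations are illustrative, not recommendations; shown as a Typesafe Config snippet parsed from Scala):

    import com.typesafe.config.ConfigFactory

    // stable-after: how long membership/reachability must stay unchanged before the SBR decides.
    // down-all-when-unstable: roughly how much additional instability is tolerated before the
    // down-all fallback is applied (it can also be set to "off", as discussed below).
    val sbrSettings = ConfigFactory.parseString("""
      akka.cluster.downing-provider-class = "akka.cluster.sbr.SplitBrainResolverProvider"
      akka.cluster.split-brain-resolver {
        stable-after = 20s
        down-all-when-unstable = 15s
      }
    """)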
If you're not using Cluster Sharding or Cluster Singleton (and haven't implemented something with similar constraints...), you might be able to get away with disabling this fallback to down-all. If every bit of distributed state in your system forms a CRDT, for instance, you might be able to get away with this; if you know what a CRDT is, you know, and if you don't, that almost certainly means that not all of the distributed state in your system is a CRDT. The setting in question is:
akka.cluster.split-brain-resolver.down-all-when-unstable = off
Think very carefully about this in the context of your application. I would suspect that at least 99.9% of Akka clusters out there would violate correctness guarantees with this setting.
From your question about distinct clusters or Multi-DC, I take it you are spreading your cluster across multiple datacenters. In that case, note that inter-datacenter networking is typically less reliable than intra-datacenter networking, which means you basically have three options:
have separate clusters for each datacenter and use "something(s) else" to coordinate between them
use Multi-DC cluster, which takes some account of the difference between inter- and intra-datacenter networking (e.g. while it's possible for nodes A and B in one datacenter to disagree on the reachability of node C in that same datacenter, it's highly likely that A and B will agree on whether node D in a different datacenter is reachable)
configure the failure detector for the reliability of the inter-datacenter link (effectively treating even nodes in the same rack, or even on the same physical host or VM, as if they were in separate datacenters). This means being very slow to declare that a node has crashed (and giving that node a lot of time to say "no, I'm not dead, I was just being quiet/sleepy/etc."). For some applications, this might be a viable strategy.
Which of those 3 is the right option? I think completely separate clusters communicating and coordinating over some separate channel(s), with that coordination modeled in the domain, is often useful (for instance, you might be able to balance traffic to the datacenters in such a way that it's highly unlikely your west coast datacenter would need to know what's happening on the east coast). Multi-DC might allow for more consistency than separate clusters. It's probably unlikely that your application requirements are such that multiple DCs within a single vanilla cluster will work well.
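If you do go with Multi-DC, a minimal configuration sketch looks roughly like this (the datacenter name and the heartbeat pause are illustrative assumptions):

    import com.typesafe.config.ConfigFactory

    // Each node declares which datacenter it belongs to; cross-datacenter failure detection
    // is configured separately from (and more leniently than) the in-datacenter detector.
    val multiDcSettings = ConfigFactory.parseString("""
      akka.cluster.multi-data-center {
        self-data-center = "us-west"
        failure-detector.acceptable-heartbeat-pause = 10s
      }
    """)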
Recently in a system design interview I was asked a question where cities were divided into zones and data for around 100 zones was available. An API took the zone ID as input and returned all the restaurants for that zone in response. The required response time for the API was 50 ms, so the zone data was kept in memory to avoid delays.
If the zone data is approximately 25 GB, then if the service is scaled to, say, 5 instances, it would need 125 GB of RAM.
Now the requirement is to run 5 instances but use only 25 GB of RAM in total, with the data split between instances.
I believe that to achieve this we would need a second application acting as a config manager, which manages which instance holds which zone's data. The instances can get which zones to track from the config manager service on startup. But what I can't figure out is how we redirect the request for a zone to the correct instance that holds its data, especially if we use Kubernetes. Also, if the instance holding partial data restarts, how do we track which zone data it was holding?
Splitting dataset over several nodes: sounds like sharding.
In-memory: the interviewer might be asking about redis or something similar.
Maybe this: https://redis.io/topics/partitioning#different-implementations-of-partitioning
Redis Cluster might fit -- keep in mind that when the docs mention "client-side partitioning", the client is some Redis client library, loaded by your backends, which are themselves responding to HTTP client/end-user requests.
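As a rough illustration of what "client-side partitioning" means here (a sketch only; the instance names are made up, and a real setup would use consistent hashing or Redis Cluster's hash slots so that adding an instance doesn't remap every zone):

    // Hypothetical registry of the instances holding the zone data.
    final case class Instance(host: String, port: Int)

    val instances: Vector[Instance] =
      (0 until 5).map(i => Instance(s"zone-cache-$i", 6379)).toVector

    // The "client" (your backend) decides which instance owns a given zone
    // and sends the lookup there.
    def instanceFor(zoneId: String): Instance =
      instances((zoneId.hashCode & Int.MaxValue) % instances.size)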
Answering your comment: then, I'm not sure what they were looking for.
Comparing Java hashmaps to a redis cluster isn't completely fair, considering one is bound to your JVM, while the other is actually distributed / sharded, implying at least inter-process communications and most likely network/non-local queries.
Then again, if the question is to scale an ever-growing JVM: at some point, we need to address the elephant in the room: how do you guarantee data consistency, proper replication/sharding, what do you do when a member goes down, ...?
A distributed hashmap, using Hazelcast, may be more relevant. Some (Hazelcast) would make the argument that it is safer under heavy write load; others, that migrating from Hazelcast to Redis helped them improve service reliability. I don't have enough background in Java myself, so I wouldn't know.
As a general rule, when asked about Java, you could argue that speed and reliability very much rely on your developers' understanding of what they're doing. Which, in Java, implies a large margin of error. Then again, if they're asking such questions, they probably have some good devs on their payroll.
Whereas distributed databases (in-memory or on disk, SQL or NoSQL) are quite a complicated topic that you would need to master (on top of Java) to get right.
The broad approach they're describing was characterized by Adya in 2019 as a LInK store: Linked In-memory Key-value stores allow application objects supporting rich operations to be sharded across a cluster of instances.
I would tend to approach this by implementing a stateful application using Akka (disclaimer: I am at this writing employed by Lightbend, which employs the majority of the developers of Akka and offers support and consulting services to clients using Akka; as my SO history indicates, I would have the same approach even multiple years before I was employed by Lightbend) along these lines.
Akka Cluster to allow a set of JVMs running an application to form a cluster in a peer-to-peer manner and manage/track changes in the membership (including detecting instances which have crashed or are isolated by a network partition)
Akka Cluster Sharding to allow stateful objects keyed by ID to be distributed approximately evenly across a cluster and rebalanced in response to membership changes
These stateful objects are implemented as actors: they can update their state in response to messages and (since they process messages one at a time) without needing elaborate synchronization.
Cluster sharding implies that the actor responsible for a given ID might, over time, be located on different instances, so that implies some persistence of the zone's state outside of the cluster. For simplicity*, when an actor responsible for a given zone starts, it initializes itself from the datastore (could be S3, could be Dynamo or Cassandra or whatever): after this, its state is in memory, so reads can be served directly from the actor's state instead of going to an underlying datastore.
By directing all writes through cluster sharding, the in-memory representation is, by definition, kept in sync with the writes. To some extent, we can say that the application is the cache: the backing datastore only exists to allow the cache to survive operational issues (and because it's only in response to issues of that sort that the datastore needs to be read, we can optimize the data store for writes vs. reads).
Cluster sharding relies on a conflict-free replicated data type (CRDT) to broadcast changes in the shard allocation to the nodes of the cluster. This allows, for instance, any instance to handle an HTTP request for any shard: it simply forwards a representation of the important parts of the request as a message to the shard which will distribute it to the correct actor.
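A rough sketch of what that looks like with Akka Cluster Sharding (typed API; the ZoneEntity name, its message types, and the loading/persisting steps are illustrative, not a full implementation):

    import akka.actor.typed.{ActorRef, ActorSystem, Behavior}
    import akka.actor.typed.scaladsl.Behaviors
    import akka.cluster.sharding.typed.scaladsl.{ClusterSharding, Entity, EntityTypeKey}

    object ZoneEntity {
      sealed trait Command
      final case class GetRestaurants(replyTo: ActorRef[Restaurants]) extends Command
      final case class AddRestaurant(name: String) extends Command
      final case class Restaurants(names: List[String])

      val TypeKey: EntityTypeKey[Command] = EntityTypeKey[Command]("Zone")

      def apply(zoneId: String): Behavior[Command] = {
        // In a real system the initial state would be loaded from the backing datastore here.
        def running(restaurants: List[String]): Behavior[Command] =
          Behaviors.receiveMessage {
            case GetRestaurants(replyTo) =>
              replyTo ! Restaurants(restaurants) // served from memory, no datastore read
              Behaviors.same
            case AddRestaurant(name) =>
              // A real system would also write through to the backing datastore here.
              running(name :: restaurants)
          }
        running(Nil)
      }
    }

    def initSharding(system: ActorSystem[_]): Unit =
      ClusterSharding(system).init(Entity(ZoneEntity.TypeKey)(ctx => ZoneEntity(ctx.entityId)))

Any node handling an HTTP request can then obtain ClusterSharding(system).entityRefFor(ZoneEntity.TypeKey, zoneId) and message the zone without knowing which instance currently hosts it.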
From Kubernetes' perspective, the instances are stateless: no StatefulSet or similar is needed. The pods can query the Kubernetes API to find the other pods and attempt to join the cluster.
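A sketch of that wiring with Akka Management and Cluster Bootstrap (assuming the akka-management-cluster-bootstrap and akka-discovery-kubernetes-api modules; the pod label is an assumption about how your deployment is labeled):

    import akka.actor.typed.ActorSystem
    import akka.management.cluster.bootstrap.ClusterBootstrap
    import akka.management.scaladsl.AkkaManagement
    import com.typesafe.config.ConfigFactory

    val bootstrapSettings = ConfigFactory.parseString("""
      akka.management.cluster.bootstrap.contact-point-discovery.discovery-method = kubernetes-api
      # Assumption: the pods carry a label like app=zone-service
      akka.discovery.kubernetes-api.pod-label-selector = "app=zone-service"
    """)

    def startCluster(system: ActorSystem[_]): Unit = {
      AkkaManagement(system).start()   // exposes the management HTTP endpoint
      ClusterBootstrap(system).start() // pods discover each other via the Kubernetes API and form the cluster
    }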
*: I have a fairly strong prior that event sourcing would be a better persistence approach, but I'll set that aside for now.
When building a microservice-oriented application, I wonder what the appropriate microservice granularity could be.
Let's imagine an application consisting of:
A set of various resource types where each resource maps to a given business model (e.g. in a todo app, the resources could be User, TodoList and TodoItem...)
Each of those resources is saved in a NoSQL database that could be replicated.
Each of those resources is exposed through a REST API.
The application manages an internal chat room.
An API gateway for gathering the chat room and REST API interactions.
The application front end: an SPA connected to the API gateway.
The first (and naive) approach when thinking about how microservices could match the needs of this application would be:
One monolith service for managing EVERY resource and all business logic:
By managing I mean providing the REST API for all of those resources and handling their persistence in the database.
One service for each database replica.
One service providing the internal chat room, using WebSocket or whatever.
One service for authentication.
One service for the API gateway.
One service serving the static assets for the SPA front end.
Another approach could be to split service 1 into as many services as there are business models in the system (let's call those services "resource services").
I wonder what the benefits of this second approach are.
In fact I see a lot of downsides with this approach:
The need to set up an inter-service communication process.
When requesting a service representing resource X that has a relation to resource Y, a lot more work is needed (i.e. an inter-service request).
More DevOps work.
More difficulty sharing common code between resource services.
Where to put the business logic?
When starting a fresh project, this second approach seems to me a bit over-engineered.
I feel like starting with the first approach and THEN splitting the monolithic resource service into several specific services, depending on the observed needs, will minimize the complexity and risks.
What are your opinions regarding that?
Are there any best practices?
Thanks a lot!
The first approach is not the microservice way, by definition.
And yes, the idea is to split: one service per Bounded Context - one for Users, one for Inventory, one for Todo things, etc.
The idea of microservices, put very simply, assumes:
You want to pay extra DevOps work for modularity and for removing, as completely as possible, the dependencies between different bounded contexts (and between dev/product/PM teams).
The idea is built around ownership and modularity, allowing separate teams to develop their own piece of code without being required to know the rest of the system. As long as there is a Ubiquitous Language (a common set of conventions/communication protocols/terminology/documentation), they can work in a completely isolated, autonomous fashion.
Maintaining, managing, testing, and developing become much faster - at the cost of an initial DevOps and architecture engineering investment.
Shared code should be minimal and, if required, should represent the Ubiquitous Language (a common communication interface/set of conventions). Sharing well-documented code that acts as an integration/infrastructure mini-framework, with a dedicated dev/DevOps team attached to it, can work well, as long as it is, as I said, well documented and treated as a separate architecture-related sub-project.
A properly engineered microservice architecture can reduce maintenance and development times by a huge margin, but it requires quite a serious reason to use it (there are lots of reasons, and lots of articles on that, so I won't start here) and quite a serious engineering investment at the start.
It brings modularity, a concept of ownership, and decoupling of the different contexts of your app.
My personal advice: check whether you really need a microservices architecture. If you cannot invest engineering thought and DevOps effort at the start, and do not have proper reasons for such a system - why bother?
If you do need microservices, I would really advise against the first method. You will develop the wrong things, miss the true challenges of microservices, and could end up with a huge refactor, which could take more work than engineering the microservice system properly from the start. It's like making a square peg and then trying to fit it into a round hole later.
Now, answering your question title: granularity (your question body is a bit different from your post title).
Attach it to the Domain Model / Bounded Context. You can make meaty services at the start in order to avoid complex distributed transactions.
First, just answer the question of whether you need them in your design/architecture at all.
If not, you probably have a good design.
Passing reference IDs between models from different microservices should suffice; if not, try to rethink whether more of the complex transactions could be avoided.
If your system has an unavoidable amount of distributed transactions, perhaps look towards using/making some CQRS mini-framework as your "shared code infrastructure component" / communication protocol.
This is the key problem of microservices or any other SOA approach. It is where the theory meets reality. In general you should not force the microservices architecture for the sake of it. It should rather come naturally from functional decomposition (top-down) and from operational, technological, and DevOps needs (bottom-up). The first approach is closer to what you would need to do; however, at the first step do not focus so much on the technology aspect. Ask yourself why you would need to implement a separate service for a particular business function. Treat it as a micro-application with all its technical resources. Ask yourself if there is a reason to implement a particular function as a full-stack app.
Some of the functionalities you have mentioned in scenario 1 are naturally OK as services, such as the 'authentication' service - this is probably a good candidate.
For decomposing business functions into separate services, focus on the 'dependencies' problem: if there are too many dependencies and you see that you have to implement a bigger chunk of the data model, then naturally it is not a microservice any more.
Try to apply a litmus test: if you can 'turn off' a particular functionality and the system still makes sense, it is a candidate for a service or for further decomposition.
Most NoSQL solutions only use eventual consistency, and given that DynamoDB replicates the data across three datacenters, how is read-after-write consistency maintained?
What would be a generic approach to this kind of problem? I think it is interesting since even in MySQL replication, data is replicated asynchronously.
I'll use MySQL to illustrate the answer, since you mentioned it, though, obviously, neither of us is implying that DynamoDB runs on MySQL.
In a single network with one MySQL master and any number of slaves, the answer seems extremely straightforward -- for eventual consistency, fetch the answer from a randomly-selected slave; for read-after-write consistency, always fetch the answer from the master.
even in MySQL replication data is replicated asynchronously
There's an important exception to that statement, and I suspect there's a good chance that it's closer to the reality of DynamoDB than any other alternative here: In a MySQL-compatible Galera cluster, replication among the masters is synchronous, because the masters collaborate on each transaction at commit-time and a transaction that can't be committed to all of the masters will also throw an error on the master where it originated. A cluster like this technically can operate with only 2 nodes, but should not have less than three, because when there is a split in the cluster, any node that finds itself alone or in a group smaller than half of the original cluster size will roll itself up into a harmless little ball and refuse to service queries, because it knows it's in an isolated minority and its data can no longer be trusted. So three is something of a magic number in a distributed environment like this, to avoid a catastrophic split-brain condition.
If we assume the "three geographically-distributed replicas" in DynamoDB are all "master" copies, they might operate with logic along the same lines as the synchronous masters you'd find with Galera, so the solution would be essentially the same, since that setup also allows any or all of the masters to have conventional subtended asynchronous slaves using MySQL native replication. The difference there is that you could fetch from any of the masters currently connected to the cluster if you wanted read-after-write consistency, since all of them are in sync; otherwise, fetch from a slave.
The third scenario I can think of would be analogous to three geographically-dispersed MySQL masters in a circular replication configuration, which, again, supports subtended slaves off of each master, but has the additional problems that the masters are not synchronous and there is no conflict resolution capability -- not at all viable for this application, but for purposes of discussion, the objective could still be achieved if each "object" had some kind of highly-precise timestamp. When read-after-write consistency is needed, the solution here might be for the system serving the response to poll all of the masters to find the newest version, not returning an answer until all masters had been polled, or to read from a slave for eventual consistency.
Essentially, if there's more than one "write master" then it would seem like the masters have no choice but to either collaborate at commit-time, or collaborate at consistent-read-time.
Interestingly, I think, in spite of some whining you can find in online opinion pieces about the disparity in pricing between the two read-consistency levels in DynamoDB, this analysis -- even as divorced from the reality of DynamoDB's internals as it is -- does seem to justify that discrepancy.
Eventually-consistent read replicas are essentially infinitely scalable (even with MySQL, where a master can easily serve several slaves, each of which can also easily serve several slaves of its own, each of which can serve several... ad infinitum) but read-after-write is not infinitely scalable, since by definition it would seem to require the involvement of a "more-authoritative" server, whatever that specifically means, thus justifying a higher price for reads where that level of consistency is required.
I'll tell you exactly how DynamoDB does this. No guessing.
In order for a write request to be acknowledged to the client, the write must be durable on two of the three storage nodes for that partition. One of the two storage nodes MUST be the leader node for that partition. The third storage node is probably updated as well, but on the off chance something happened, it may not be. DynamoDB will get that one updated as soon as it can.
When you request a strongly consistent read, that read comes from the leader storage node for the partition the item(s) are stored in.
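From the client's side, that choice is just a flag on the read (a sketch using the AWS SDK for Java v2 from Scala; the table and key names are made up):

    import software.amazon.awssdk.services.dynamodb.DynamoDbClient
    import software.amazon.awssdk.services.dynamodb.model.{AttributeValue, GetItemRequest}
    import scala.jdk.CollectionConverters._

    val client = DynamoDbClient.create()

    // consistentRead(true) is served by the partition's leader; the default (false)
    // may be served by any of the replicas and is therefore eventually consistent.
    val request = GetItemRequest.builder()
      .tableName("Zones")
      .key(Map("zoneId" -> AttributeValue.builder().s("zone-42").build()).asJava)
      .consistentRead(true)
      .build()

    val item = client.getItem(request).item()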
I know I'm answering this question long after it was asked, but I thought I could contribute some helpful information...
In a distributed database the concept of a "master" is not particularly relevant anymore (at least for reads/writes). Each node should be able to perform reads and writes, so that read/write performance increases as the number of machines increases. If you want reads to be correct immediately after a write, the number of machines you write to plus the number of machines you read from must be greater than the total number of machines in the system (W + R > N).
Example: with 3 machines, if you only write to 1 machine, then you must read from all 3 to ensure that your data is not stale. Or, if you write to 2 machines (in this case, a quorum), you can perform reads at quorum (2) and guarantee that your data is recent.
NOTE: these assumptions change when a subset of nodes in the system crash.
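To make that arithmetic concrete (plain Scala, not any particular database's API):

    // Read-your-writes holds when the read set and write set must overlap: R + W > N.
    def readSeesLatestWrite(n: Int, w: Int, r: Int): Boolean = r + w > n

    readSeesLatestWrite(n = 3, w = 1, r = 3) // true: write to 1, read from all 3
    readSeesLatestWrite(n = 3, w = 2, r = 2) // true: quorum writes + quorum reads
    readSeesLatestWrite(n = 3, w = 1, r = 1) // false: the read may hit a stale replica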
I've got an Akka application that I will be deploying on many machines. I want each of these applications to communicate with each other by using the distributed publish/subscribe event bus features.
However, if I set the system up for clustering, then I am worried that actors for one application may be created on a different node than the one the application started on.
It's really important that an actor is only created on the machine that the application it belongs to was started on.
Basically, I don't want the elasticity or the clustering of actors, I just want the distributed pub/sub. I can see options like singleton or roles, mentioned here http://letitcrash.com/tagged/spotlight22, but I wondered what the recommended way to do this is.
There is currently no feature in Akka which would move your actors around: either you programmatically deploy to a specific machine or you put the deployment into the configuration file. Otherwise an actor will be created locally, as you want.
(Akka may one day get automatic actor tree partitioning, but that is not even specified yet.)
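For the distributed pub/sub part itself, here is a minimal sketch with classic actors (the topic name and actor classes are made up): each actor is created locally on whichever node starts it, and only the published messages travel across the cluster.

    import akka.actor.Actor
    import akka.cluster.pubsub.DistributedPubSub
    import akka.cluster.pubsub.DistributedPubSubMediator.{Publish, Subscribe}

    class EventSubscriber extends Actor {
      private val mediator = DistributedPubSub(context.system).mediator
      mediator ! Subscribe("app-events", self) // this actor lives only on the node that created it

      def receive: Receive = {
        case msg => println(s"got event: $msg")
      }
    }

    class EventPublisher extends Actor {
      private val mediator = DistributedPubSub(context.system).mediator

      def receive: Receive = {
        case msg => mediator ! Publish("app-events", msg) // delivered to subscribers on every node
      }
    }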
I think this is not the best way to use elastic clustering, but we also ran into the same issue and found that it can be useful to spread actors over the nodes by a hash of the entity ID (like database shards). For example, on each node we create one NodeRouterActor that proxies messages to multiple WorkerActors. When we send a message to a NodeRouterActor, it selects the target node by looking it up in a hash table using key id % nodeCount, and then the target node's NodeRouterActor forwards the message to the specific WorkerActor which controls that entity.