How to start a specific number of worker actors at startup? - akka

I created a clustered Akka app based on:
https://github.com/typesafehub/activator-akka-distributed-workers-java/blob/master/tutorial/index.html
Is there any built-in feature to run a specific number of actors of a given type in the cluster? Should I create a router, or is there a better way?
http://doc.akka.io/docs/akka/snapshot/java/routing.html

Yes, have a look at Cluster-aware routers.
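The sketch below is a simplified Python model (not Akka code) of what a cluster-aware pool router does when you ask it for a fixed number of worker routees: it spreads them over the cluster members and caps how many land on each node, similar in spirit to Akka's `max-nr-of-instances-per-node` setting. `plan_routees`, its parameters, and the least-loaded placement rule are assumptions for illustration; in a real application you configure the router (for example in `application.conf`) rather than place routees yourself.

```python
def plan_routees(nodes, total_instances, max_per_node):
    """Return {node: number_of_routees} for a pool of worker routees.

    Simplified model: each new routee goes to the least-loaded node that is
    still below max_per_node; placement stops when total_instances routees
    exist or every node is full.
    """
    placed = {node: 0 for node in nodes}
    for _ in range(total_instances):
        candidates = [n for n in nodes if placed[n] < max_per_node]
        if not candidates:
            break  # not enough capacity: fewer routees than requested
        target = min(candidates, key=lambda n: placed[n])  # ties: first node in list
        placed[target] += 1
    return placed


print(plan_routees(["node-a", "node-b", "node-c"], total_instances=7, max_per_node=3))
# → {'node-a': 3, 'node-b': 2, 'node-c': 2}
```

With only two nodes and the same cap, just 6 of the 7 requested routees could be placed, so the per-node limit and the total have to be chosen together with the expected cluster size.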

Related

Use of Redis cluster vs. standalone Redis

I have a question about when it makes sense to use a Redis cluster versus standalone Redis.
Suppose one has a real-time gaming application that allows multiple instances of the game (games are created by communities of users), and one wishes to implement a real-time leaderboard for each instance.
Suppose that at any time we have, say, 100 simultaneous matches running.
Based on the use cases outlined here:
https://d0.awsstatic.com/whitepapers/performance-at-scale-with-amazon-elasticache.pdf
https://redislabs.com/solutions/use-cases/leaderboards/
https://aws.amazon.com/blogs/database/building-a-real-time-gaming-leaderboard-with-amazon-elasticache-for-redis/
We can implement each leaderboard using a Sorted Set dataset in memory.
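A Sorted Set keeps its members ordered by score, which is what makes it a natural fit for a leaderboard. The pure-Python `Leaderboard` class below is a hypothetical stand-in (not a Redis client) that mimics the two operations a leaderboard typically needs: `ZINCRBY` to add points and `ZREVRANGE ... WITHSCORES` to read the top N. Keeping one set per game, e.g. under a hypothetical key such as `leaderboard:<game-id>`, gives each match its own independent leaderboard.

```python
class Leaderboard:
    """In-memory stand-in for one game's Redis Sorted Set (not a Redis client).

    incr() mimics ZINCRBY; top() mimics ZREVRANGE key 0 n-1 WITHSCORES.
    """

    def __init__(self):
        self._scores = {}  # member -> score

    def incr(self, member, delta):
        self._scores[member] = self._scores.get(member, 0) + delta
        return self._scores[member]

    def top(self, n):
        # Highest score first; equal scores in reverse lexicographic member order,
        # matching how ZREVRANGE orders ties.
        ranked = sorted(self._scores.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
        return ranked[:n]


# One independent leaderboard per running match (match ids are hypothetical).
boards = {match_id: Leaderboard() for match_id in ("match-1", "match-2")}
boards["match-1"].incr("alice", 10)
boards["match-1"].incr("bob", 15)
boards["match-1"].incr("alice", 7)
print(boards["match-1"].top(2))  # → [('alice', 17), ('bob', 15)]
```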
Now I would like to add some persistence, where the leaderboard state is saved as a snapshot at the end of each game, so that each of these independent Sorted Sets is saved as its own snapshot file.
I have a question about design choices:
Would a Redis cluster make sense for this scenario? Or would it make more sense to have standalone Redis instances and create a new database for each game?
As far as I know, a Redis cluster supports only a single database, database 0 (https://redis.io/topics/cluster-spec).
In that case, how would snapshotting each leaderboard's dataset at different times work?
From what I can see, using a Redis cluster only makes sense for large-scale monolithic applications and may not be the best approach for the scenario described above. Is that the case?
Or, if one goes with AWS ElastiCache for Redis in cluster mode, can snapshotting be configured for individual datasets?
You are correct: clustering is a way of scaling out to handle really high request loads and store tons of data.
It doesn't really sound like you need to bother with a cluster.
I'd be very surprised if a standalone Redis setup became your bottleneck before you had several tens of thousands of simultaneous players.
If you are unsure, you can simulate some load and see what it can handle. My guess is that you are better off focusing on other complexities of your game until you start reaching quite serious usage. Which is a good problem to have. :)
You might, however, want to consider having one or two replica instances, which is a different thing.
Secondly, cluster or not, why do you want to use snapshots (SAVE or BGSAVE) to persist your scoreboard?
If you want individual snapshots per game, and it's only a few keys per game, why not just have your application read those keys and persist them to a traditional db when needed? You can, for example, use MULTI, DUMP and RESTORE to achieve something very similar to snapshotting, but only on the specific keys you want.
It doesn't sound like multiple databases are warranted for this.
Multiple databases on clustered Redis are only supported in the Enterprise version, so not on ElastiCache. But the approach mentioned above should work just fine.
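One way to follow the suggestion of persisting only the keys you care about: at the end of a match, the application reads that game's sorted set and writes it to a regular database. The sketch below uses Python's built-in `sqlite3` as the "traditional db" and a plain dict as a stand-in for the data read from Redis; the table name, function names, and JSON encoding are assumptions for illustration. If you use `DUMP` instead, you would store the opaque serialized value it returns and later hand it back to `RESTORE`.

```python
import json
import sqlite3
import time


def snapshot_game(conn, game_id, scores):
    """Save one game's final leaderboard (member -> score) as a single row.

    `scores` stands in for what the application would read from Redis,
    e.g. with ZRANGE <key> 0 -1 WITHSCORES at the end of the match.
    """
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS leaderboard_snapshots ("
            "game_id TEXT PRIMARY KEY, taken_at REAL NOT NULL, scores TEXT NOT NULL)"
        )
        conn.execute(
            "INSERT OR REPLACE INTO leaderboard_snapshots (game_id, taken_at, scores) "
            "VALUES (?, ?, ?)",
            (game_id, time.time(), json.dumps(scores, sort_keys=True)),
        )


def load_snapshot(conn, game_id):
    """Return the saved scores for game_id, or None if it was never snapshotted."""
    row = conn.execute(
        "SELECT scores FROM leaderboard_snapshots WHERE game_id = ?", (game_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


conn = sqlite3.connect(":memory:")
snapshot_game(conn, "match-1", {"alice": 17, "bob": 15})
print(load_snapshot(conn, "match-1"))  # → {'alice': 17, 'bob': 15}
```

Each game is snapshotted independently, whenever it ends, which is the per-leaderboard timing the question asks for.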

How to preserve "counter" variable across multiple server instances?

We're setting up the back-end architecture that our mobile application will connect to, but we need some help. Our application is built to mimic the "take a number" tickets you would see at a deli or pharmacy. Users will use our mobile application to send a request to our node controller, and our node controller will respond with a spot number.
We currently have our node controller set up on Amazon Elastic Beanstalk and have enabled load balancing to handle any surges in requests. Our question is: how do we persist our spotNumber across multiple instances of our node controller? We have it built now as a local variable that starts at 1 and increments with each request, but will this persist if AWS spins up a new instance of our node controller to handle increased traffic? If not, what would be the best practice for preserving our spotNumber across all potential instances of our server?
Thank you in advance!
Use a database.
Clearly you can't store the value within the node application: not only would that break as soon as you scale out, you would also lose the value if the application shuts down unexpectedly.
It sounds like you don't already have a database, so DynamoDB might be a good choice, as long as your only use case is to share a counter between applications. You can find an example here.
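What matters is that the increment happens atomically in the shared store rather than in each instance's memory. The sketch below uses Python's built-in `sqlite3` as a stand-in for that store, with threads playing the role of separate server instances (a local SQLite file would not be shared between separate EC2 instances, so this only demonstrates the pattern). With DynamoDB the same idea would be a single `UpdateItem` call using an `ADD` update expression and `ReturnValues` set to `UPDATED_NEW`. The table layout and function names are hypothetical.

```python
import os
import sqlite3
import tempfile
import threading


def init_counter(path):
    conn = sqlite3.connect(path)
    try:
        with conn:
            conn.execute("CREATE TABLE IF NOT EXISTS counters (name TEXT PRIMARY KEY, value INTEGER NOT NULL)")
            conn.execute("INSERT OR IGNORE INTO counters (name, value) VALUES ('spotNumber', 0)")
    finally:
        conn.close()


def next_spot_number(path):
    """Atomically increment the shared counter and return the new value."""
    # isolation_level=None: no implicit transactions; BEGIN/COMMIT are issued explicitly.
    conn = sqlite3.connect(path, timeout=30, isolation_level=None)
    try:
        conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
        conn.execute("UPDATE counters SET value = value + 1 WHERE name = 'spotNumber'")
        (value,) = conn.execute("SELECT value FROM counters WHERE name = 'spotNumber'").fetchone()
        conn.execute("COMMIT")
        return value
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
    finally:
        conn.close()


def simulate(instances, requests_per_instance):
    """Run several 'server instances' (threads) against one shared counter."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        init_counter(path)
        issued = []
        lock = threading.Lock()

        def serve():
            for _ in range(requests_per_instance):
                number = next_spot_number(path)
                with lock:
                    issued.append(number)

        threads = [threading.Thread(target=serve) for _ in range(instances)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return sorted(issued)
    finally:
        os.remove(path)


numbers = simulate(instances=4, requests_per_instance=25)
print(numbers == list(range(1, 101)))  # → True
```

Every number from 1 to 100 is handed out exactly once, no matter which "instance" served the request, which a per-process local variable cannot guarantee.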
You could also use Redis on ElastiCache, but I think it's overkill for a single counter.
Keeping accurate counters at different scales may require different implementations. At small scale, a simple session variable and locking logic in the application would be enough. At a larger scale, however, session synchronization and locking are better managed with a database. In your case in particular, DynamoDB conditional writes or Redis counters seem useful. Keep your requirements simple and clear, though: managing counters at scale may require algorithms and data structures with funny names, like HyperLogLog.
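A conditional write is a form of optimistic concurrency: read the current value, write the new one only if the stored value has not changed in the meantime, and retry otherwise. The toy `ConditionalStore` below is hypothetical and only models that contract in memory; in DynamoDB the check would be a `ConditionExpression` on the write, which makes the write fail when the condition is not met.

```python
import threading


class ConditionalStore:
    """Toy key-value store offering a conditional (compare-and-set) write.

    The lock stands in for the database checking the condition and applying
    the write atomically on its side; clients never hold it across calls.
    """

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key, 0)

    def put_if(self, key, expected, new):
        """Write `new` only if the stored value still equals `expected`."""
        with self._lock:
            if self._data.get(key, 0) != expected:
                return False  # someone else wrote first; caller should retry
            self._data[key] = new
            return True


def take_number(store, key="spotNumber"):
    # Optimistic concurrency: read, compute, conditionally write, retry on conflict.
    while True:
        current = store.get(key)
        if store.put_if(key, current, current + 1):
            return current + 1


store = ConditionalStore()
issued = []
issued_lock = threading.Lock()


def instance():
    for _ in range(50):
        number = take_number(store)
        with issued_lock:
            issued.append(number)


threads = [threading.Thread(target=instance) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(issued) == list(range(1, 401)))  # → True
```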

WSO2 APIM Clustering Configuration

I am using WSO2 APIM 1.10.0 in a single-server deployment and would like to move to a clustered one. Looking at this documentation, I found a lot of information, but something is bothering me: do I really always have to do all of it?
I mean, I don't want to split all my workers into multiple instances. All I want is to configure two full setups (key manager + publisher + store + gateway), each on its own host, and put a load balancer in front of them.
The requirements are simple: I would like to share the load between them and get better availability in case one of the hosts goes down. Is it mandatory to break down the whole installation on both nodes, so that I have to start each component independently with port offsets configured?
I could see that a lot has been simplified in version 2.0.0; is there any way to achieve the same in 1.10.0?
Regards
Splitting into profiles is not mandatory. It is designed this way so that API Manager can be scaled according to TPS (transactions per second). If you have a low TPS count and prefer a two-node HA setup, you can do the following:
Cluster the two nodes using a membership scheme such as WKA (well-known address) or AWS.
Use deployment synchronization (dep-sync) to share API artifacts between the two nodes.
Use one node as the Publisher, and send all Publisher traffic to that single node to avoid SVN conflicts.
You can serve API requests from both nodes.
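In practice, the rule "Publisher traffic to one node, API traffic to both" lives in your load balancer's configuration. The Python below only expresses that rule so it is easy to check. `TwoNodeBalancer`, the node names, the example API path, and the `/publisher` prefix (assumed here to be the Publisher's URL context) are all hypothetical, and in this sketch every non-Publisher request is round-robined.

```python
import itertools


class TwoNodeBalancer:
    """Hypothetical routing rule for a two-node API Manager HA setup.

    Publisher requests always go to one designated node, so API artifacts are
    only written there (avoiding dep-sync/SVN conflicts); all other requests
    are spread round-robin over both nodes.
    """

    def __init__(self, publisher_node, nodes, publisher_prefix="/publisher"):
        self.publisher_node = publisher_node
        self.publisher_prefix = publisher_prefix
        self._next_node = itertools.cycle(nodes)

    def route(self, path):
        if path == self.publisher_prefix or path.startswith(self.publisher_prefix + "/"):
            return self.publisher_node
        return next(self._next_node)


lb = TwoNodeBalancer("node-a", ["node-a", "node-b"])
print(lb.route("/publisher/apis"))  # → node-a
print([lb.route("/orders/1.0.0/items") for _ in range(3)])  # → ['node-a', 'node-b', 'node-a']
```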
You don't always have to use the deployment pattern described in the documentation you pointed to. There are various other deployment patterns you can use, depending on your scalability needs and requirements.
Please refer to [1] for the different deployment patterns you can use for WSO2 API Manager, and to [2] for more information on worker/manager separation and load balancing.
[1] https://docs.wso2.com/display/CLUSTER44x/API+Manager+Deployment+Patterns
[2] https://docs.wso2.com/display/CLUSTER44x/Separating+the+Worker+and+Manager+Nodes

Apache Storm: execute bolts on different machines (designated node)

I want to create a topology with one spout that emits words and a bolt that, for each word, creates a directory named after that word.
I have two supervisor nodes, and I want the directory to be created on one node if the word starts with a letter from "a" to "l", and on the other node otherwise.
For example, if the word is "acknowledgement", the directory will be created on one node, and if the word is "machine", it will be created on the other node.
Please suggest a way to configure Storm to achieve this.
I would also like to know whether one bolt is enough, or, if two bolts are deployed, how to make sure that one bolt runs on one machine and the other on the other machine.
P.S. I am using Pyleus (https://github.com/Yelp/pyleus) for creating the bolts and the spout.
You can use a single bolt with two instances of it, each instance running on its own supervisor node. Use a custom field grouping to achieve this: your custom grouping logic decides to which instance of the bolt each word is dispatched.
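Below is a plain-Python model of such a grouping, shaped after Storm's Java `CustomStreamGrouping` interface: `prepare` receives the target task ids and `chooseTasks` returns the task ids for each tuple (the real `chooseTasks` also receives the emitting task's id, omitted here). It is not written against the Pyleus API, and `FirstLetterGrouping` is a hypothetical name. Note that it only decides which bolt task gets each word; which supervisor that task runs on is still up to Storm's scheduler.

```python
class FirstLetterGrouping:
    """Send words starting with 'a'-'l' to one bolt task, all others to the other."""

    def prepare(self, target_tasks):
        # Storm passes the ids of the consuming bolt's tasks; this rule needs exactly two.
        if len(target_tasks) != 2:
            raise ValueError("expected exactly two target tasks")
        self.target_tasks = sorted(target_tasks)

    def choose_tasks(self, values):
        # values is the emitted tuple; the word is assumed to be its first field.
        first_letter = values[0][:1].lower()
        index = 0 if "a" <= first_letter <= "l" else 1
        return [self.target_tasks[index]]


grouping = FirstLetterGrouping()
grouping.prepare([7, 4])
print(grouping.choose_tasks(["acknowledgement"]))  # → [4]
print(grouping.choose_tasks(["machine"]))  # → [7]
```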
Basically, you can't be sure that a bolt or spout will be present in a specific worker (JVM). This is part of Storm's design: workers should run on similar hardware, because Storm decides which bolt/spout instances go to which worker.
Storm includes an abstraction: topologies are not tied to a cluster and can be rebalanced at runtime. It is wonderfully resilient and performant, but you can't easily make specific code run on a specific node (that would also be an anti-pattern in Storm's philosophy).
AFAIK, Storm uses a mod-hash function to distribute tasks/executors among workers (managed by supervisors), and you can't easily override it.
So your best bet is to have tasks >= executors >= workers, as Storm will try to divide the load equally among the workers of your cluster.
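A toy model of why the inequality matters: if work is dealt out evenly and there are fewer executors than workers, some workers simply get nothing to run (and, one level down, an executor without a task does no work). `assign_round_robin` and `idle_workers` are hypothetical, deliberately simplified placement rules, not Storm's actual scheduler.

```python
def assign_round_robin(items, slots):
    """Deal items out to slots in turn; returns {slot: [items...]}."""
    layout = {slot: [] for slot in slots}
    for i, item in enumerate(items):
        layout[slots[i % len(slots)]].append(item)
    return layout


def idle_workers(num_executors, workers):
    """Workers that receive no executor at all under the round-robin model."""
    executors = [f"executor-{i}" for i in range(num_executors)]
    layout = assign_round_robin(executors, workers)
    return [w for w, assigned in layout.items() if not assigned]


workers = ["worker-1", "worker-2"]
print(idle_workers(1, workers))  # → ['worker-2']
print(idle_workers(2, workers))  # → []
```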

Akka clustering - force actors to stay on specific machines

I've got an Akka application that I will be deploying on many machines. I want these applications to communicate with each other using the distributed publish/subscribe event bus features.
However, if I set the system up for clustering, I am worried that actors belonging to one application may be created on a different node from the one the application was started on.
It's really important that an actor is only created on the machine where the application it belongs to was started.
Basically, I don't want the elasticity or the clustering of actors; I just want the distributed pub/sub. I can see options like singletons or roles, mentioned here http://letitcrash.com/tagged/spotlight22, but I wondered what the recommended way to do this is.
There is currently no feature in Akka that would move your actors around: either you programmatically deploy to a specific machine, or you put the deployment into the configuration file. Otherwise, actors are created locally, as you want.
(Akka may one day get automatic actor tree partitioning, but that is not even specified yet.)
I don't think this is the best way to use elastic clustering. But we have been considering the same issue and found it useful to spread actors over the nodes by a hash of the entity id (like database shards). For example, on each node we create one NodeRouterActor that proxies messages to multiple WorkerActors. When we send a message to a NodeRouterActor, it selects the target node by looking it up in a hash table by key id % nodeCount; the target node's NodeRouterActor then proxies the message to the specific WorkerActor that controls the entity.
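The two-level lookup described above can be sketched in plain Python. `entity_key`, `route`, and `moved_fraction` are hypothetical names, and the rule for picking a worker inside a node (`(id // nodeCount) % workersPerNode`) is an assumption, since the answer does not say how the WorkerActor is chosen.

```python
import zlib


def entity_key(entity_id):
    """Map an entity id to a non-negative int that is the same on every node.

    Integer ids are used as-is; strings go through CRC32, because Python's
    built-in hash() of a str is randomized per process.
    """
    if isinstance(entity_id, int):
        return entity_id
    return zlib.crc32(str(entity_id).encode("utf-8"))


def route(entity_id, node_count, workers_per_node):
    """Return (node index, worker index) for an entity."""
    key = entity_key(entity_id)
    node = key % node_count  # level 1: which node's NodeRouterActor gets the message
    worker = (key // node_count) % workers_per_node  # level 2: which WorkerActor there
    return node, worker


def moved_fraction(ids, old_node_count, new_node_count):
    """Share of ids whose node changes when the cluster is resized."""
    moved = sum(1 for i in ids if i % old_node_count != i % new_node_count)
    return moved / len(ids)


print(route(10, node_count=3, workers_per_node=4))  # → (1, 3)
print(moved_fraction(range(1200), 3, 4))  # → 0.75
```

With plain modulo, growing from three to four nodes changes the target node for three out of four ids, so the state of those entities would have to move; that is worth weighing before relying on `id % nodeCount` in an elastic cluster.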