As I understand it, with akka-cluster the frontend node receives the request and sends the job to one of the backend nodes for execution. For debugging purposes, how can I find out which of the backend nodes is executing the job?
Does Akka also provide some UI where one can see the current job executions happening on the different backends?
There is nothing in Akka Cluster that is specifically about work scheduling or frontend and backend nodes; that is just one application out of many that you could build on top of Akka Cluster. So if you want a UI of some kind, you would have to build that for your application as well.
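There is no built-in view of which node runs what, but if you roll your own frontend/backend split, a simple way to see which backend executes a job is to have the worker log the cluster address it is running on. A minimal sketch with the classic actor API (the Job message and WorkerActor are hypothetical names for illustration, not part of Akka):

```scala
import akka.actor.{Actor, ActorLogging, Props}
import akka.cluster.Cluster

// Hypothetical job message, for illustration only.
final case class Job(id: String)

class WorkerActor extends Actor with ActorLogging {
  // The address of the cluster node this actor is running on.
  private val selfAddress = Cluster(context.system).selfAddress

  def receive: Receive = {
    case Job(id) =>
      // Appears in this node's log, so you can tell which backend picked up the job.
      log.info("Executing job {} on node {}", id, selfAddress)
      // ... do the actual work here ...
  }
}

object WorkerActor {
  def props: Props = Props(new WorkerActor)
}
```

On the frontend side, the reply's sender().path.address also contains the backend's host and port when the worker lives on a remote node, which is another quick way to trace where a job ran.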
I have a Microservice application in which I have a set of actors with each actor acting as a digital twin for say a Power producing unit. This actor gets messages from the PowerPlant and for each message I get, I evolve the state of my Actor.
There is another upstream Microservice that actually reads the state of the state machine for each actor. So far so good!
I now want to have redundancy built in. This means that I want to run multiple instances of my Microservice that contains the StateMachine. The problem now is: how will the upstream systems see a consistent state for a single PowerPlant? The upstream systems get the current state of the PowerPlant by asking the Microservice for it via an HTTP endpoint.
I see that there are several possibilities for replicating the Akka actor that contains the StateMachine, but I'm not sure whether any of those solutions would work! One such approach is here:
https://doc.akka.io/docs/akka/2.5.5/scala/distributed-data.html
Any other suggestions?
EDIT: I have a full running application here - https://github.com/joesan/plant-simulator/ This application contains the StateMachine that I was talking about! If you navigate to https://github.com/joesan/plant-simulator/tree/master/app/com/inland24/plantsim/services/simulator/onOffType you will find an Actor and the corresponding StateMachine there. The StateMachine is evolved by the messages the Actor receives!
One classic solution is to use Akka Cluster Sharding coupled with Akka Persistence.
Sharding means that each PowerPlant exists only once in your cluster (no data sync issues), but that if the server it resides on goes down (exits the cluster), the actor is re-created somewhere else. Akka Persistence then makes sure the actor state is restored when it is re-created.
Documentation:
https://doc.akka.io/docs/akka/current/cluster-sharding.html
https://doc.akka.io/docs/akka/current/persistence.html
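A rough sketch of how the two pieces fit together with the classic APIs is below; the PowerPlantEntity, the message types, and the shard count are invented for illustration and are not taken from the plant-simulator code:

```scala
import akka.actor.{ActorSystem, Props}
import akka.cluster.sharding.{ClusterSharding, ClusterShardingSettings, ShardRegion}
import akka.persistence.PersistentActor

// Hypothetical messages, each carrying the id of the PowerPlant it addresses.
final case class StateChanged(powerPlantId: String, newState: String)
final case class GetState(powerPlantId: String)

class PowerPlantEntity extends PersistentActor {
  // One persistent event stream per PowerPlant entity (entity name == entity id).
  override def persistenceId: String = s"power-plant-${self.path.name}"

  private var currentState: String = "Init"

  override def receiveCommand: Receive = {
    case StateChanged(_, newState) =>
      persist(newState) { evt =>   // journal the event, then update in-memory state
        currentState = evt
      }
    case GetState(_) =>
      sender() ! currentState
  }

  override def receiveRecover: Receive = {
    case evt: String => currentState = evt   // replayed when the entity is re-created
  }
}

object PowerPlantSharding {
  // Route each message to the entity identified by the PowerPlant id it carries.
  private val extractEntityId: ShardRegion.ExtractEntityId = {
    case msg @ StateChanged(id, _) => (id, msg)
    case msg @ GetState(id)        => (id, msg)
  }

  private val extractShardId: ShardRegion.ExtractShardId = {
    case StateChanged(id, _) => math.abs(id.hashCode % 100).toString
    case GetState(id)        => math.abs(id.hashCode % 100).toString
  }

  def start(system: ActorSystem) =
    ClusterSharding(system).start(
      typeName        = "PowerPlant",
      entityProps     = Props(new PowerPlantEntity),
      settings        = ClusterShardingSettings(system),
      extractEntityId = extractEntityId,
      extractShardId  = extractShardId
    )
}
```

The HTTP endpoint then sends its queries through the ShardRegion ActorRef returned by start: Cluster Sharding keeps exactly one live entity per PowerPlant id in the cluster, and Persistence replays its events when the entity is moved after a node failure.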
We have developed a custom JAX-WS application that essentially achieves two things.
Exposes a few web service methods to perform some functionality.
Utilizes org.quartz.Scheduler to schedule and execute some polling tasks that monitors and processes data on a few database tables. (The logic here is slightly complex, hence a custom application was chosen over the use of WSO2 DSS)
This application is deployed on WSO2 AS 5.2.1 and runs quite seamlessly. However, I'm unsure what will happen if we have to cluster the AS application server. Logically, I would think that each node will have its own instance of the custom application running within it, and hence its own scheduler. Would this not increase the risk of processing the same record across both instances? Is my interpretation of the above scenario correct, from a clustering perspective?
Yes, you are correct. In a cluster of App Server nodes, each node will have its own instance of the application, so in your case each node will have a separate scheduler. You may consider using tasks from ESB 4.9.0, where WSO2 has added coordination support for working in a clustered environment.
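If you would rather keep the scheduling inside the custom application than move it to ESB tasks, another option is Quartz's own clustering via a JDBC JobStore, which guarantees that each trigger fires on only one node. A minimal sketch, assuming a database that all nodes can reach (the data source name myDS and the connection details are placeholders):

```scala
import java.util.Properties
import org.quartz.Scheduler
import org.quartz.impl.StdSchedulerFactory

object ClusteredSchedulerFactory {
  def build(): Scheduler = {
    val props = new Properties()
    // All nodes must share the same instanceName and point at the same database.
    props.setProperty("org.quartz.scheduler.instanceName", "ClusteredScheduler")
    props.setProperty("org.quartz.scheduler.instanceId", "AUTO")
    props.setProperty("org.quartz.threadPool.class", "org.quartz.simpl.SimpleThreadPool")
    props.setProperty("org.quartz.threadPool.threadCount", "5")

    // JDBC JobStore with clustering turned on.
    props.setProperty("org.quartz.jobStore.class", "org.quartz.impl.jdbcjobstore.JobStoreTX")
    props.setProperty("org.quartz.jobStore.driverDelegateClass", "org.quartz.impl.jdbcjobstore.StdJDBCDelegate")
    props.setProperty("org.quartz.jobStore.isClustered", "true")
    props.setProperty("org.quartz.jobStore.clusterCheckinInterval", "20000")
    props.setProperty("org.quartz.jobStore.dataSource", "myDS")

    // Placeholder connection details for the shared database.
    props.setProperty("org.quartz.dataSource.myDS.driver", "com.mysql.jdbc.Driver")
    props.setProperty("org.quartz.dataSource.myDS.URL", "jdbc:mysql://db-host:3306/quartz")
    props.setProperty("org.quartz.dataSource.myDS.user", "quartz")
    props.setProperty("org.quartz.dataSource.myDS.password", "secret")

    new StdSchedulerFactory(props).getScheduler
  }
}
```

This requires the Quartz tables (the QRTZ_* DDL shipped with Quartz) to exist in the shared database, and because every node runs the same startup code, jobs and triggers should only be registered once (for example, guarded by a checkExists call on newer Quartz versions).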
I want to start several web servers, and every server has a Quartz instance, so that jobs are not interrupted by restarting a server.
I found that Immutant can configure a singleton job, but when I run the server I see that the scheduler uses the non-clustered configuration, and I do not know how to configure it.
Immutant has built-in support for singleton jobs, but it requires running your application in a WildFly cluster, and does not use Quartz's clustering functionality.
Quartz clustering requires a JDBC JobStore, and Immutant does not currently expose a way to set a JobStore for the scheduler instance. The clustering works by using the database to lock the job - it would not be difficult to implement something similar yourself, by scheduling the same job on every node in the cluster, and using an external store as a synchronization mechanism, allowing the job to run on only one node at a time.
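For example, if every node schedules the same job, a shared table with a primary-key constraint can act as the lock: whichever node manages to insert the row for a given fire time runs the job, and the others skip it. A rough sketch of that idea, shown here in Scala/JDBC rather than Clojure purely for brevity (the table and names are made up):

```scala
import java.sql.{Connection, SQLException}

object SingletonJobLock {

  // Run `job` only if this node wins the insert race for the given fire time.
  // Assumes a table like:
  //   CREATE TABLE job_lock (job_name VARCHAR(100), fire_time VARCHAR(30),
  //                          PRIMARY KEY (job_name, fire_time))
  def runIfLockAcquired(conn: Connection, jobName: String, fireTime: String)(job: () => Unit): Unit = {
    val stmt = conn.prepareStatement(
      "INSERT INTO job_lock (job_name, fire_time) VALUES (?, ?)")
    try {
      stmt.setString(1, jobName)
      stmt.setString(2, fireTime)
      stmt.executeUpdate()   // succeeds on exactly one node per fire time
      job()                  // we won the race, so this node runs the job
    } catch {
      case _: SQLException => () // another node already inserted the row: skip
    } finally {
      stmt.close()
    }
  }
}
```

A real implementation would distinguish a duplicate-key violation from other SQL errors and clean up old lock rows, but the mechanism is the same: every node schedules the identical job and wraps its body in a check like this, so only one node actually does the work.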
If you truly need the clustering implementation in Quartz, or need more control over scheduler creation than Immutant provides, please file an issue against Immutant to have those options exposed. In the interim, you could take a look at Quartzite; I believe it exposes more options for scheduler creation.
On the Python/Django stack, we were used to using Celery along with RabbitMQ.
Everything was easily done.
However, when we tried doing the same thing in Clojure land, what we could find was Langohr.
In our current naive implementation we have a worker system which has three core parts.
Publisher module
Subscriber module
Task module
We can start the system on any node in either publisher or subscriber mode.
They are connected to RabbitMQ server.
They share one worker_queue.
What we are doing is creating tasks in the Task module, and then, when we want to run a task on a subscriber, we send an expression calling the method, in EDN format, to the Subscriber, which then decodes it and runs the actual task using eval.
Now, is using eval safe? We are not running expressions generated by a user or any third-party system. Initially we were planning to use JSON to send the payload message, but EDN gave us a lot more flexibility, and it works like a charm as of now.
Also, is there a better way to do this?
Depending on your needs (and your team), I highly suggest the Storm project. You will get distributed, fault-tolerant and realtime computation, and it is really easy to use.
Another nice thing about Storm is that it supports a plethora of options as the data source for its topologies. It can be, for example, Apache Kafka, RabbitMQ, Kestrel, or MongoDB. If you aren't satisfied, you can write your own driver.
It also has a web interface to see what is happening in your topology.
I have never used celery before and I'm also a django newbie, so I'm not sure if I should use celery in my project.
Brief description of my project:
There is an API for sending (via SSH) jobs to scientific computation clusters. The API is an abstraction to the different scientific job queue vendors out there. http://saga-project.github.io/saga-python/
My project is basically about building a web GUI for this API with django.
So, my concern is that, if I use celery, I would have a queue in the local web server and another one in each of the remote clusters. I'm afraid this might complicate the implementation needlessly.
The API is still in development and some of the features aren't fully finished. There is a function for checking the state of the remote job execution (running, finished, etc.) but the callback support for state changes is not ready. Here is where I think celery might be appropriate. I would have one or several periodic task(s) monitoring the job states.
Any advice on how to proceed please? No celery at all? celery for everything? celery just for the job states?
I use celery for a similar purpose and it works well. Basically, I have one node running celery workers that manage the entire cluster. These workers generate input data for the cluster nodes, assign tasks, and process the results for reporting or for generating dependent tasks.
Each cluster node is running a very small python server which takes a db id of its assigned job. It then calls into the main (http) server to request the data it needs and finally posts the data back when complete. In my case, the individual nodes don't need to message each other, and the run time of each task is very long (hours). This makes the delays introduced by central management and polling insignificant.
It would be possible to run a celery worker on each node, taking tasks directly from the message queue. That approach is appealing. However, I have complex dependencies that are easier to work out with centralized control. Also, I sometimes need to segment the cluster, and centralized control makes this possible to do on the fly.
Celery isn't good at managing priorities or recovering lost tasks (more reasons for central control).
Thanks for calling my attention to SAGA. I'm looking at it now to see if it's useful to me.
Celery is useful for executing tasks which are too expensive to be executed in the handler of an HTTP request (i.e. a Django view). Consider making an HTTP request from a Django view to some remote web server, and think about the latencies, possible timeouts, time for data transfer, etc. It also makes sense to queue computation-intensive, long-running tasks for background execution with Celery.
We can only guess what the web GUI for the API should do. However, Celery fits very well for queuing requests to scientific computation clusters. It also allows you to track the state of background tasks and their results.
I do not understand your concern about having many queues on different servers. You can have Django, the Celery broker (implementing the queues for tasks), and the worker processes (consuming the queues and executing Celery tasks) all on the same server.