So, here's the thing. I really like the idea of microservices and want to set one up and test it before deciding whether to use it in production. If I do, I want to slowly chip away pieces of my old Rails app and move the logic into microservices. I think I can do this with HAProxy, routing requests to different backends based on URL, so that part should be covered.
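Roughly what I have in mind for HAProxy (the backend names, addresses and the /billing path are just placeholders):

    frontend public
        bind *:80
        acl to_billing path_beg /billing
        use_backend billing_microservice if to_billing
        default_backend legacy_rails

    backend legacy_rails
        server rails1 10.0.0.10:3000 check

    backend billing_microservice
        server billing1 10.0.0.20:8080 check

Each piece chipped off the Rails app would get its own backend and an acl matching its URL prefix.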
Then my next biggest concern is that I don't want too much overhead in keeping everything running smoothly on the infrastructure side. I would prefer low configuration and ease of development, testing and deployment.
Now, I want to know the benefits and downsides of each style: Akka (Cluster) vs. something like Kubernetes (maybe even fabric8 on top of it).
What I also worry about is fault tolerance. I don't know how you do that with Kubernetes. Do you then have to include some message queue to ensure your messages don't get lost? And then also run multiple queues in case one of them goes down? Or just retry until the queue comes back up? Akka actors already have that, right? Retries and mailboxes? What are the strategies for fault tolerance in microservices, and do they differ between the two approaches?
Someone please enlighten me! ;)
I don't know much about Akka, but from a quick read it seems to be an application framework. Kubernetes sits at a somewhat lower level: it runs your containers and manages them for you. We don't have a concept of queues or mailboxes.
Kubernetes will soon have L7 load balancing so you can do URL maps.
As for fault tolerance - Kubernetes ensures that your stated intent stays true: run N copies of this container. That container might be an Akka app or might be MySQL - doesn't matter.
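Since fabric8 came up in the question: as a rough sketch only (this uses the current fabric8 kubernetes-client API, and the names and image are made up), declaring "run 3 copies" from code looks something like this:

    import io.fabric8.kubernetes.api.model.apps.DeploymentBuilder
    import io.fabric8.kubernetes.client.KubernetesClientBuilder

    object RunThreeCopies extends App {
      val client = new KubernetesClientBuilder().build()

      // Desired state: 3 replicas of this container. Kubernetes keeps it true,
      // replacing containers that die - that is the fault-tolerance primitive.
      val deployment = new DeploymentBuilder()
        .withNewMetadata().withName("my-service").endMetadata()
        .withNewSpec()
          .withReplicas(3)
          .withNewSelector().addToMatchLabels("app", "my-service").endSelector()
          .withNewTemplate()
            .withNewMetadata().addToLabels("app", "my-service").endMetadata()
            .withNewSpec()
              .addNewContainer().withName("app").withImage("example/my-service:1.0").endContainer()
            .endSpec()
          .endTemplate()
        .endSpec()
        .build()

      client.apps().deployments().inNamespace("default").resource(deployment).create()
    }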
There are a bunch of guides on Docker + Akka. Kubernetes makes managing Docker containers easier, but the app is still yours :)
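On the Akka side of the fault-tolerance question: supervision gives you restarts, and a restarted actor keeps its mailbox, but the message that caused the failure is dropped by default - so "no lost messages" still needs an acknowledge/retry scheme (e.g. Akka's at-least-once delivery) on top. A minimal classic-Akka sketch of the supervision part (Scala; names are made up):

    import akka.actor.{Actor, ActorSystem, OneForOneStrategy, Props}
    import akka.actor.SupervisorStrategy.Restart
    import scala.concurrent.duration._

    class Worker extends Actor {
      def receive = {
        case job: String => println(s"processing $job") // may throw; the supervisor decides what happens
      }
    }

    class Supervisor extends Actor {
      // Restart a failing child up to 10 times per minute, then give up.
      override val supervisorStrategy =
        OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1.minute) {
          case _: Exception => Restart
        }

      private val worker = context.actorOf(Props[Worker], "worker")

      def receive = {
        case msg => worker forward msg
      }
    }

    object Main extends App {
      val system = ActorSystem("demo")
      system.actorOf(Props[Supervisor], "supervisor") ! "some-job"
    }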
I want to know how we can send a message to a particular actor in a cluster without knowing its actual path.
For example:
I have 5 nodes, all with the same actor system, formed into a cluster.
Each node has 2 actors in it, and all actors across the cluster have unique names.
I have an actor system outside the cluster, which is not part of it.
I have to send messages from this outside actor system to the cluster actors.
How can I map each message to the respective actor in the cluster without knowing the actor's actual path?
Without cluster sharding, how can actors inside the cluster be accessed from outside the cluster?
There is a direct answer to your question (at the end), but I think you need to go back to the docs and rethink your design. One of the primary reasons: while what you are asking is possible, it goes against all of the best practices, and many of the features involved are deprecated or not recommended. The best practice is that clusters should be self-contained and should expose well-defined APIs, not actor internals. To quote from the remoting docs:
When building an Akka application, you would usually not use the Remoting concepts directly, but instead use the more high-level Akka Cluster utilities or technology-agnostic protocols such as HTTP, gRPC etc.
Essentially the docs are telling you that what you are trying to do is a bad idea. If two actors need to be able to discover each other, they should be in the same cluster. If for some reason you can't have them in the same cluster, then for maintainability reasons you should expose the actor in the cluster via REST/gRPC or some other well-defined API rather than trying to allow direct access to the actors.
Similarly, in the section about Cluster Client, the docs say:
Cluster Client is deprecated in favor of using Akka gRPC. It is not advised to build new applications with Cluster Client, and existing users should migrate.
But, here's an attempt to answer the question directly. To quote the first sentence of the Actor discovery docs, "There are two general ways to obtain Actor references: by creating actors and by discovery using the Receptionist."
Unrelated side note: that is a little misleading, because there are some other ways in untyped actors, and you can also obviously just be given an ActorRef. It's not that the docs are wrong, I'm just taking them a little out of context.
Anyway, that leads to the direct answer to your question: ClusterClientReceptionist. You can find details on how to use it in the classic Cluster Client docs. It's going to require some untyped actors, but since you are already using deprecated approaches, that probably isn't an issue.
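For completeness, a rough sketch of both sides with the classic APIs (Scala; the system names, host and port are placeholders):

    import akka.actor.{Actor, ActorPath, ActorSystem, Props}
    import akka.cluster.client.{ClusterClient, ClusterClientReceptionist, ClusterClientSettings}

    class ServiceActor extends Actor {
      def receive = { case msg => println(s"got $msg") }
    }

    // Inside the cluster (runs in a cluster node's process):
    val clusterSystem = ActorSystem("cluster")
    val service = clusterSystem.actorOf(Props[ServiceActor], "service-a")
    ClusterClientReceptionist(clusterSystem).registerService(service)

    // Outside the cluster (runs in the external process); only contact points are needed:
    val outsideSystem = ActorSystem("outside")
    val initialContacts = Set(
      ActorPath.fromString("akka.tcp://cluster@host1:2552/system/receptionist"))
    val client = outsideSystem.actorOf(
      ClusterClient.props(ClusterClientSettings(outsideSystem).withInitialContacts(initialContacts)),
      "client")

    // Address the registered actor by the name it registered under, not its full path.
    client ! ClusterClient.Send("/user/service-a", "hello", localAffinity = false)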
But more to the point, you really should rethink this, because based on your last question it seems like you aren't really understanding how either clustering or sharding works.
As @david-ogren said, what you need is Cluster Client. You can read the documentation here. With Cluster Client, you still need to know at least one cluster node address, plus either:
The name of the actor you're trying to communicate with, if you want to talk to an actor instance directly, or
A predefined topic to publish to. If you go this route, you will need to make sure that all of the participating actors inside the cluster subscribe to the topic in order to receive the messages.
You can see a working example in this GitHub repository.
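Reusing clusterSystem, client and ServiceActor from the sketch in the previous answer, the topic route looks like this (the topic name is a placeholder):

    // Inside the cluster: subscribe any interested actor to a topic via the receptionist.
    val subscriber = clusterSystem.actorOf(Props[ServiceActor], "subscriber")
    ClusterClientReceptionist(clusterSystem).registerSubscriber("events", subscriber)

    // Outside the cluster: publish; every subscriber in the cluster receives a copy.
    client ! ClusterClient.Publish("events", "something-happened")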
I have a situation where I have a Node.js app that runs as an event listener. This Node.js app listens over a websocket for events external to my application.
I need each incoming event to be processed only once by my Node.js app.
However, it's also crucial that this Node.js app can auto-scale up/down when needed and is highly available, so that it doesn't become a bottleneck.
Usually, when it comes to scaling and HA, the first thing that comes to my mind is to run a few instances behind a load balancer, or run multiple containers on something like ECS. Doing so would introduce multiple instances of the Node.js app, which would also mean each websocket event would get processed more than once - by every instance/container that received it.
What would be a good solution and design to tackle such a problem?
Not sure I fully understand the situation here, but I think what you are saying is that you have a socket server that emits to other services, and that a single instance, even with dedicated resources, is subject to bottlenecks.
Assuming that's in line with the question, what you probably want to look at (not sure if you're using socket.io or not) is the socket.io Redis package. It essentially uses Redis to store the sockets, so you can cluster your socket server without it sending duplicates or missing users.
On your question about scale, you would definitely want to use containers for this. We actually use DigitalOcean Apps as an easy way to deploy our containers without having to manage Kubernetes and Docker images. The only downside right now is no auto-scaling, but scaling out is just a click of a button, and with alerts set up we know when to scale up or down.
With this setup, we have our socket server running against a managed Redis server; when we need more socket servers we just tick the count up and get more throughput.
I'm running a Django app in an EC2 instance, which uses RabbitMQ + Celery for task queuing. Are there any drawbacks to running my RabbitMQ node from the same EC2 instance as my production app?
The answer to this question really depends on the context of your application.
When you're faced with scenarios like this, you should always consider a few things.
Separation of concerns
Here, we want to make sure that no one system is responsible for keeping the other systems running. This includes things like:
If the EC2 instance running everything goes down, will the remaining tasks in the queue continue running?
If my RAM is full, will all systems keep functioning?
Can I scale just one segment of my app without having to redesign the infrastructure?
By having Rabbit and Django (served by something like Gunicorn, Waitress or another WSGI server) all on one box, you lose a lot of resource contingency.
Although RAM and CPU may be abundant, there is a limit to IO, disk writes, network writes, etc. This means that if for some reason you have a write-heavy function, all other systems may suffer as a result. If you have a function that writes heavily to RAM, the same applies.
So the downsides of keeping everything in one system, as far as I can see from your question and my own experience, are as follows:
Multiple single points of failure: if your one Rabbit instance fails, your queues and tasks stop working.
If your app starts generating big traffic, the other systems start to contend for resources.
If any component goes down, that could mean downtime for the other services.
Machine downtime means complete downtime of all components.
Lots of headaches when your application demands more resources with minimal downtime.
Lots of web traffic will slow down task running.
Lots of task running will slow down web requests.
Lots of IO will slow down everything.
The rule of thumb I usually follow is to keep single points of failure isolated from each other - that way you only need to manage those components individually. A good approach would be to use one EC2 instance for your app, another for your workers and another for your Rabbit. That way you can use smaller or bigger instances for just the components that need it. You can even create AMIs and auto-scaling groups, if that fits your use case.
Here are some articles for reference
Separation of concerns
Modern design architectures
Single points of failure
TL;DR: if you can run on one EC2 instance you should, but make it easy to scale from day one.
Both Joshnidhin and Giannis covered the RAM, IO and CPU aspects.
I have run production apps on single instances with containerization and slept with peace of mind, knowing that if lots of people suddenly want what I have built, I can scale pretty quickly by deploying those containers on different instances instead of one single instance.
Docker allows you to put a limit on CPU consumption and memory usage for each container, so you can also be sure that the containers will not step on each other.
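For example, with the current Docker CLI (the limit values are arbitrary):

    docker run -d --name web    --cpus="1.0" --memory="768m" my-app:latest
    docker run -d --name rabbit --cpus="0.5" --memory="512m" rabbitmq:3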
If we take the EC2 instance out of this question, it becomes:
Are there any drawbacks to running a RabbitMQ node on the same server as my production app?
I would say it depends on various things: the kind of workload and its composition, the complexity of the workload, whether you expect growth in usage, and so on.
If your workload is well behaved and the server is big enough for both (app + task queue), then why not - there will be only one server to manage. Just make sure to protect the two processes from each other by limiting their system resource usage.
If your workload is not well behaved, then you might want more than one server. In that case dedicated servers are better (separation of concerns), even though you will have to manage more than one server.
Now back to EC2: all of the above still applies. EC2 makes horizontal scaling of applications easier, so if you have them on separate instances you can scale them individually and cost-effectively. If not, there will be wasted resources when you scale.
I'm playing with the newly released Akka.NET 1.0 (congratulations on the release!), so it's all quite new to me, but I'm pretty sure anyone with JVM Akka experience could also chime in, since there's nothing runtime-dependent in the question.
Let's consider several (for the sake of example, two) separate services that are part of a larger system/application. Those services usually do their own thing, but cross-service calls are sometimes needed. Let's say that Service 2 can be standalone and has a GetStuff action, while Service 1 has a DoSomething action which first has to get the result of GetStuff.
What is the preferred way of handling that kind of situation when both services can be deployed separately, and to different machines?
As I said, I don't know much about Akka, but digging through examples, docs and source I found two options:
Remoting. Separate actor systems in their own services, using Remoting to get an ActorSelection from the remote host. It would be pretty much the same as the Remoting docs example, except that the two actor systems would be equal 'clients'.
Clustering. I'm trying to wrap my head around this, and the most I can figure out right now would be to set up a separate cluster service that just sets up the cluster system, creating a simple listener actor so that the seed node can be properly initialized (?). Then every separate actor system created in its own service would join said cluster system under a different role.
Maybe there's yet another solution that I'm not aware of...?
Personally, the clustering solution seems harder to grasp and set up at first glance, but maybe there are significant advantages that I can't see right now.
To reiterate: what is the preferred way of handling such a situation, and what should I look out for?
Akka.Cluster depends on Akka.Remote - here's what's fundamentally different about them:
Akka.Remote - allows you to connect and communicate with actor systems running in remote processes. These can be totally separate codebases running entirely different Akka.NET applications ("services", if you will). All you need for the two systems to communicate is a shared set of message classes that is visible in both processes.
Akka.Cluster - an abstraction on top of Akka.Remote that eliminates the need for each of your service instances to have to know the explicit address of every other possible service instance you might need to connect to. These can be instances of the same services or instances of different services. Enables dynamic discovery of services via a really simple "seed node" strategy.
I recommend you take a look at the Akka.Cluster microservices example that I wrote - it shows how you can use the Akka.Cluster "roles" feature to dynamically make cross-service calls to nodes in a different service without having to explicitly define any of their network addresses. In particular, take a look at how I use "cluster-aware" routers to do this.
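For a feel of it, here's roughly what a cluster-aware group router looks like in JVM Akka (the Akka.NET API mirrors this closely; the role name, routee path and counts are placeholders, and the exact ClusterRouterGroupSettings signature varies slightly between Akka versions):

    import akka.actor.ActorSystem
    import akka.cluster.routing.{ClusterRouterGroup, ClusterRouterGroupSettings}
    import akka.routing.RoundRobinGroup

    val system = ActorSystem("service1")

    // Routes messages to /user/stuff-service on any cluster node carrying the
    // "service2" role - Service 1 never needs to know those nodes' addresses.
    val stuffRouter = system.actorOf(
      ClusterRouterGroup(
        RoundRobinGroup(Nil),
        ClusterRouterGroupSettings(
          totalInstances = 100,
          routeesPaths = List("/user/stuff-service"),
          allowLocalRoutees = false,
          useRoles = Set("service2"))
      ).props(),
      name = "stuffRouter")

    stuffRouter ! "GetStuff"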
I have a Java/Spring application running in the Amazon AWS cloud.
My server instances sit behind load balancing and run the same image of a Linux OS, with a Tomcat application server.
They are also connected to S3 as a shared file system (s3fs), and to an RDS database.
My concern is to make sure the state of the different applications stays synchronized. Today, the point of synchronization is the database, but when in-memory caching is needed, out-of-sync problems appear.
The solution I would like to put in place is a messaging system between the applications. For specific reasons I cannot use the Amazon SQS service, so JMS seems to fit my needs, and after some reading HornetQ looks like a very good implementation of it. Once an application's state changes, it communicates the change to all the other applications. Each application is both a producer and a consumer of the same destination (in JMS terms a topic rather than a queue, since every instance has to see every change).
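Roughly what I have in mind (Scala here for consistency with the rest of this page - the javax.jms calls are identical in Java; the topic name is a placeholder, and the ConnectionFactory would come from HornetQ via JNDI or direct instantiation):

    import javax.jms.{ConnectionFactory, Message, MessageListener, Session, TextMessage}

    // Publish a state change; every subscribed instance gets its own copy.
    def publishStateChange(cf: ConnectionFactory, payload: String): Unit = {
      val conn = cf.createConnection()
      try {
        val session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE)
        val topic = session.createTopic("app.state-changes")
        session.createProducer(topic).send(session.createTextMessage(payload))
      } finally conn.close()
    }

    // Each application instance subscribes on startup.
    def subscribeToStateChanges(cf: ConnectionFactory)(handle: String => Unit): Unit = {
      val conn = cf.createConnection()
      val session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE)
      val topic = session.createTopic("app.state-changes")
      session.createConsumer(topic).setMessageListener(new MessageListener {
        override def onMessage(m: Message): Unit = handle(m.asInstanceOf[TextMessage].getText)
      })
      conn.start() // begin delivery only after the listener is registered
    }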
As we are in a dynamic system where servers and IPs are created and deleted automatically, automatic discovery of instances seems to be the best approach.
But in AWS, broadcast and multicast are not available!
For HornetQ, I saw a kind of workaround that uses JGroups in addition. But for me this is a second framework to investigate and learn - twice the work, and no longer an out-of-the-box solution.
What is your opinion? Has anyone already built a solution for similar needs?
Maybe other out-of-the-box solutions exist?
Thanks in advance for your answer!
In my experience you could try TCPGOSSIP. It's a JGroups discovery protocol that HornetQ's clustering can use: instead of UDP multicast, it discovers members through one or more external GossipRouter processes over TCP, which works in environments like AWS where multicast is unavailable.
See https://docs.jboss.org/jbossclustering/cluster_guide/5.1/html/jgroups.chapt.html