I am building infrastructure primitives to support workers and http services.
workers are standalone
http services have a web server and a load balancer
The way I understand it, a worker generally pull from an external resource to consume tasks while a service handles inbound requests and talks to upstream services.
Celery is an obvious worker and a web app is an obvious service. The lines can get blurry though and I'm not sure what the best approach is:
Is the worker/service primitive a good idea?
What if there's a service that consumes tasks like a worker but also handles some http requests to add tasks? Is this a worker or a service?
What about services that don't go through nginx, does that mean a third "network" primitive with an NLB is the way to go?
What about instances of a stateful service that a master service connects to? The master has to know the individual agent instances so we cannot
hide them behind a LB. How would you go about representing that?
Is the worker/service primitive a good idea?
IMO, the main difference between service and worker can be that workers should be only designated to one task but service can perform multiple tasks. A service can
utilize a worker or a chain of workers to process the user request.
What if there's a service that consumes tasks like a worker but also handles some http requests to add tasks?
Services can be of different forms like web service, FTP service, SNMP
service or processing service. Writing the processing logic in service may not be a good idea unless it is taking the form of a worker.
What about services that don't go through nginx, does that mean a third "network" primitive with an NLB is the way to go?
I believe you are assuming service to be only HTTP based but as I mentioned in the previous answer, services can be of different types. Yes you may write a TCP service for a particular protocol implementation that can be attached behind an NLB
What about instances of a stateful service that a hub service connects to? The hub has to know the individual instances so we cannot hide them behind a LB.
Not sure what you mean by hub here? But good practice for a scalable architecture is to use stateless servers/services behind. The session state should not be stored in the service memory but should be serialized to a data store like DynamoDB.
One way to see the difference is to look at their names - workers do what their name says - they perform (typically) heavy tasks (work) - something you do not want your service to be bothered with, especially if it is a microservice. - For this particular reason, you will rarely see 32-core machines with hundred GBs of RAM running services, but you will very often see them running workers. Finally, they complement each other well. Services off-load heavy tasks to the workers. This aligns with the well-known UNIX philosophy "do one thing, and do it well".
Related
I have two instances of a spring boot microservice (Microservice A) and a load balancer. The Spring boot application is running in AWS Fargate with 2 desired tasks (2 instances of Microservice A).
If I request the microservice A the load balancer leads me at the first request to microservice A instance 1.
The next request will be load balanced to microservice A instance 2. This is alternating on every future request.
On microservice A I have a Spring Service that has a List<String> attribute. On every request I put something in this List.
The List is hold in memory, I dont want to put this List into a database.
In my case every further request fills the list on microservice A instance 1 and microservice A instance 2 alternating. But I want to fill only one List with every future request so my List input is not spreaded across my microservice instances.
So my question is:
Is there any way to share one List between these two microservice instances?
Do I have to establish some kind of communication between them or is there a better solution for this? Or do I have to configure my load balancer?
A solution would be storing it in a database. But this would produce many DB calls and I want to prevent this because the purpose of the code is to prevent DB calls.
For your goal, using an external service (Redis | DynamoDB | ....) is a common pattern. If you don't want to use a DB (or making API calls anyway) the other thing I can think of is to use a shared file system (EFS) to keep this state. See here for more info, especially the use cases towards the end of the blog. Instead of storing the list in memory you'd flush it on the file system (which happens to be shared among the tasks). Separate tasks in the same service have no other way to communicate among them. I don't know the exact requirements you have (concurrency, latency, etc) but using DynamoDB seems to be the best way forward. Very low latency, 0 burden to "manage a database".
I generally like the approach of #mreferre.
You might take a look into AWS Sticky Sessions.
https://docs.aws.amazon.com/elasticloadbalancing/latest/application/sticky-sessions.html
However be warned: If the sticky target of an user is going down, you need to make sure that the user is not constantly sent to the now-down target.
You need to carefully consider if sticky sessions match your usecase.
Maybe you can work with a short duration cookie to minimize duration of bad sticky connections.
I have a general question regarding how APIs are scaled. I have a basic RESTful API powered by Django rest framework, the backend uses RDS for database management. Right now, I'm deploying my django application to a digitalocean droplet but thinking of switching over to EC2 or potentially EKS. Is my understanding correct that I can effectively point my application to the RDS endpoint and spin up several EC2 instances with the same Django application fronted by an ELB? Would this take care of incoming traffic and the scalability of the django application?
This isn't exactly a coding question so I'm not sure if this is the best stackexchange site to ask this.
My two cents here:
I`ve been using lambda to serve Django and Flask apis for quite some time now, and it works great. You don't need to worry about scalability at all, unless there is a chance that your API would receive more than 10,000 requests per second (very unlikely on most scenarios). It will be way cheaper than EKS, even cheaper than EC2. I have a app with 400k active users which is served by an API running on lambda, I never paid more than $25 on invokations.
You can use Zappa (which is exclusively for python, I recommend) or Serverless framework, they will take care of most of the heavy work and make the deployment very easy.
But have in mind that lambda is not very good for long running tasks, like cronjobs. If you have have crons that might take some time to be executed your invokations can get a little expensive if you invoke it ofteen (lambdas can run up to 15 minutes, but those 15 minutes will be much more expensive than EC2). Also, the apigateway in front of the lambda function have a 30 seconds timeout, so your requests must be processed before that. If you think your requests will take longer, you will need to leverage some async requests. I think it is a very small price to have a full service without having to worry about the infrastructure.
You are right, but you can think not only about ec2 and EKS. You can also look into ECS and Fargate options. ELB distribute traffic across compute resources inside Target Group and it can be Autoscaling Group for EC2. Also, with RDS you can scale read replicas for handling mor read traffic independent from master node
I am trying to create a certain kind of networking infrastructure, and have been looking at Amazon ECS and Kubernetes. However I am not quite sure if these systems do what I am actually seeking, or if I am contorting them to something else. If I could describe my task at hand, could someone please verify if Amazon ECS or Kubernetes actually will aid me in this effort, and this is the right way to think about it?
What I am trying to do is on-demand single-task processing on an AWS instance. What I mean by this is, I have a resource heavy application which I want to run in the cloud and have process a chunk of data submitted by a user. I want to submit a this data to be processed on the application, have an EC2 instance spin up, process the data, upload the results to S3, and then shutdown the EC2 instance.
I have already put together a functioning solution for this using Simple Queue Service, EC2 and Lambda. But I am wondering would ECS or Kubernetes make this simpler? I have been going through the ECS documenation and it seems like it is not very concerned with starting up and shutting down instances. It seems like it wants to have an instance that is constantly running, then docker images are fed to it as task to run. Can Amazon ECS be configured so if there are no task running it automatically shuts down all instances?
Also I am not understanding how exactly I would submit a specific chunk of data to be processed. It seems like "Tasks" as defined in Amazon ECS really correspond to a single Docker container, not so much what kind of data that Docker container will process. Is that correct? So would I still need to feed the data-to-be-processed into the instances via simple queue service, or other? Then use Lambda to poll those queues to see if they should submit tasks to ECS?
This is my naive understanding of this right now, if anyone could help me understand the things I've described better, or point me to better ways of thinking about this it would be appreciated.
This is a complex subject and many details for a good answer depend on the exact requirements of your domain / system. So the following information is based on the very high level description you gave.
A lot of the features of ECS, kubernetes etc. are geared towards allowing a distributed application that acts as a single service and is horizontally scalable, upgradeable and maintanable. This means it helps with unifying service interfacing, load balancing, service reliability, zero-downtime-maintenance, scaling the number of worker nodes up/down based on demand (or other metrics), etc.
The following describes a high level idea for a solution for your use case with kubernetes (which is a bit more versatile than AWS ECS).
So for your use case you could set up a kubernetes cluster that runs a distributed event queue, for example an Apache Pulsar cluster, as well as an application cluster that is being sent queue events for processing. Your application cluster size could scale automatically with the number of unprocessed events in the queue (custom pod autoscaler). The cluster infrastructure would be configured to scale automatically based on the number of scheduled pods (pods reserve capacity on the infrastructure).
You would have to make sure your application can run in a stateless form in a container.
The main benefit I see over your current solution would be cloud provider independence as well as some general benefits from running a containerized system: 1. not having to worry about the exact setup of your EC2-Instances in terms of operating system dependencies of your workload. 2. being able to address the processing application as a single service. 3. Potentially increased reliability, for example in case of errors.
Regarding your exact questions:
Can Amazon ECS be configured so if there are no task running it
automatically shuts down all instances?
The keyword here is autoscaling. Note that there are two levels of scaling: 1. Infrastructure scaling (number of EC2 instances) and application service scaling (number of application containers/tasks deployed). ECS infrastructure scaling works based on EC2 autoscaling groups. For more info see this link . For application service scaling and serverless ECS (Fargate) see this link.
Also I am not understanding how exactly I would submit a specific
chunk of data to be processed. It seems like "Tasks" as defined in
Amazon ECS really correspond to a single Docker container, not so much
what kind of data that Docker container will process. Is that correct?
A "Task Definition" in ECS is describing how one or multiple docker containers can be deployed for a purpose and what its environment / limits should be. A task is a single instance that is run in a "Service" which itself can deploy a single or multiple tasks. Similar concepts are Pod and Service/Deployment in kubernetes.
So would I still need to feed the data-to-be-processed into the
instances via simple queue service, or other? Then use Lambda to poll
those queues to see if they should submit tasks to ECS?
A queue is always helpful in decoupling the service requests from processing and to make sure you don't lose requests. It is not required if your application service cluster can offer a service interface and process incoming requests directly in a reliable fashion. But if your application cluster has to scale up/down frequently that may impact its ability to reliably process.
While going through the documentation I came to know about these two tier of environment in AWS, but couldn't find any comparison between them. The suggested thing in documentation is, one should choose Worker Environment for a long running tasks (to increase the responsiveness of Web-tier).
I have a few questions to clarify my doubts:
How two tier are different from each other?
(in regards to performing different operations, services available in each etc.)
How do both communicate with each other?
(if I developed my front-end app in Web-tier and back-end in
Worker-tier)
The most important difference in my opinion is that worker tier instances do not run web server processes (apache, nginx, etc). As such, they do not directly respond to customer requests. Instead, they can be used to offload long-running processes from your web tier.
The tiers communicate with each other via SQS. When your web instance needs to spawn a background job, it posts a message to the shared queue with the job details. A daemon running on the worker instance reads the item from the queue and POSTs a message to an endpoint that your application exposes on http://localhost/.
That being said, I think the web/worker architecture might be overkill in the "front-end/back-end" terms you're describing. Your web tier is fully capable of running both a web server and an application server. If you have requirements for background or asynchronous processing, though, adding a worker tier might make sense.
First of all my knowledge of ActiveMQ, AMQPS and AWS auto scaling is fairly limited and I have been handed over this task where I need to create a scalable broker architecture for messaging over AMQPS using ActiveMQ.
In my current architecture if have a single machine ActiveMQ broker where the messaging is happening over AMQP + SSL and as a need of the product there is a publisher subscriber authentication (TLS authentication) to ensure correct guys are talking to each other. That part is working fine.
Now the problem is that I need to scale the whole broker thing over AWS cloud with auto-scaling in my mind. Without auto-scaling, I assume I can create a master slave architecture using EC2 instances, but then adding more slaves will be more like a manual process than automatic.
I want to understand wether below two options can solve the purpose -
ELB + ActiveMQ nodes being auto scaled
Something like a Bitnami powered ActiveMQ AMI running with auto scaling enabled.
In first case where ELB is there, I understand that ELB terminates SSL which will fail my mutual authentication. Also I am not sure wether my Pub/Sub model will still work where different ActiveMQ instances are independently running with no shared DB as such. If yes, if anyone can offer a pointer or a reference material it will be a help as I am not able to find one by myself.
In second case again, my concern is that when multiple instances are running with ActiveMQ how they will coordinate between each other and ensure that everyone has access to data being held up in queue.
The questions may be lame, but if any pointer it will be helpful.
AJ