I have two instances of a Spring Boot microservice (microservice A) and a load balancer. The Spring Boot application is running in AWS Fargate with 2 desired tasks (2 instances of microservice A).
When I send a request to microservice A, the load balancer routes the first request to instance 1.
The next request is balanced to instance 2, and this alternates on every subsequent request.
In microservice A I have a Spring service with a List<String> attribute. On every request I add something to this List.
The List is held in memory; I don't want to put it into a database.
Right now, successive requests fill the List on instance 1 and instance 2 alternately. But I want every request to fill only one List, so my List entries are not spread across the microservice instances.
So my question is:
Is there any way to share one List between these two microservice instances?
Do I have to establish some kind of communication between them or is there a better solution for this? Or do I have to configure my load balancer?
A solution would be storing it in a database, but that would produce many DB calls, and I want to avoid that because the whole purpose of this code is to avoid DB calls.
For your goal, using an external service (Redis | DynamoDB | ...) is a common pattern. If you don't want to use a DB (or to make API calls at all), the other thing I can think of is to use a shared file system (EFS) to keep this state. See here for more info, especially the use cases towards the end of the blog. Instead of holding the list in memory you'd flush it to the file system (which happens to be shared among the tasks). Separate tasks in the same service have no other way to communicate with each other.

I don't know the exact requirements you have (concurrency, latency, etc.), but using DynamoDB seems to be the best way forward: very low latency and zero burden to "manage a database".
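To make the DynamoDB option concrete, here is a minimal sketch of what that Spring service could look like with the AWS SDK for Java v2. The table name shared-list, partition key pk, and the items attribute are assumptions for illustration, not part of the question:

```java
import java.util.List;
import java.util.Map;

import org.springframework.stereotype.Service;

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.UpdateItemRequest;

// Hypothetical replacement for the in-memory List<String>: both Fargate tasks
// append to the same item in a DynamoDB table, so the data is no longer
// spread across instances. Table/key/attribute names are illustrative.
@Service
public class SharedListService {

    private final DynamoDbClient dynamoDb = DynamoDbClient.create();

    public void append(String value) {
        dynamoDb.updateItem(UpdateItemRequest.builder()
                .tableName("shared-list")
                .key(Map.of("pk", AttributeValue.builder().s("my-list").build()))
                // list_append + if_not_exists creates or extends the list atomically,
                // so concurrent writes from both tasks are safe
                .updateExpression("SET #items = list_append(if_not_exists(#items, :empty), :new)")
                .expressionAttributeNames(Map.of("#items", "items"))
                .expressionAttributeValues(Map.of(
                        ":new", AttributeValue.builder()
                                .l(AttributeValue.builder().s(value).build())
                                .build(),
                        ":empty", AttributeValue.builder().l(List.of()).build()))
                .build());
    }
}
```

Each request is then a single UpdateItem write, so both tasks end up appending to the same list without any coordination between them.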
I generally like the approach of #mreferre.
You might take a look into AWS Sticky Sessions.
https://docs.aws.amazon.com/elasticloadbalancing/latest/application/sticky-sessions.html
However, be warned: if the sticky target of a user goes down, you need to make sure that the user is not constantly sent to the now-down target.
You need to carefully consider whether sticky sessions match your use case.
Maybe you can work with a short-duration cookie to minimize the duration of bad sticky connections.
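For reference, ALB stickiness is just a set of target group attributes, so it can be switched on from code as well as from the console. A rough sketch with the AWS SDK for Java v2 (the target group ARN and the 300-second duration are placeholders):

```java
import software.amazon.awssdk.services.elasticloadbalancingv2.ElasticLoadBalancingV2Client;
import software.amazon.awssdk.services.elasticloadbalancingv2.model.ModifyTargetGroupAttributesRequest;
import software.amazon.awssdk.services.elasticloadbalancingv2.model.TargetGroupAttribute;

public class EnableStickiness {
    public static void main(String[] args) {
        ElasticLoadBalancingV2Client elb = ElasticLoadBalancingV2Client.create();
        elb.modifyTargetGroupAttributes(ModifyTargetGroupAttributesRequest.builder()
                .targetGroupArn("arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/my-tg/abc123") // placeholder ARN
                .attributes(
                        TargetGroupAttribute.builder().key("stickiness.enabled").value("true").build(),
                        TargetGroupAttribute.builder().key("stickiness.type").value("lb_cookie").build(),
                        // keep the duration short, as suggested above, to limit bad sticky connections
                        TargetGroupAttribute.builder().key("stickiness.lb_cookie.duration_seconds").value("300").build())
                .build());
    }
}
```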
I am building infrastructure primitives to support workers and HTTP services.
Workers are standalone.
HTTP services have a web server and a load balancer.
The way I understand it, a worker generally pulls from an external resource to consume tasks, while a service handles inbound requests and talks to upstream services.
Celery is an obvious worker and a web app is an obvious service. The lines can get blurry though and I'm not sure what the best approach is:
Is the worker/service primitive a good idea?
What if there's a service that consumes tasks like a worker but also handles some http requests to add tasks? Is this a worker or a service?
What about services that don't go through nginx? Does that mean a third "network" primitive with an NLB is the way to go?
What about instances of a stateful service that a master service connects to? The master has to know the individual agent instances, so we cannot hide them behind a LB. How would you go about representing that?
Is the worker/service primitive a good idea?
IMO, the main difference between a service and a worker is that a worker should be dedicated to a single task, while a service can perform multiple tasks. A service can utilize a worker or a chain of workers to process a user request.
What if there's a service that consumes tasks like a worker but also handles some http requests to add tasks?
Services can take different forms, such as a web service, FTP service, SNMP service, or processing service. Writing the processing logic in the service may not be a good idea unless it takes the form of a worker.
What about services that don't go through nginx? Does that mean a third "network" primitive with an NLB is the way to go?
I believe you are assuming a service to be only HTTP-based, but as I mentioned in the previous answer, services can be of different types. Yes, you may write a TCP service for a particular protocol implementation and attach it behind an NLB.
What about instances of a stateful service that a master service connects to? The master has to know the individual instances so we cannot hide them behind a LB.
Not sure what you mean by master here, but good practice for a scalable architecture is to use stateless servers/services behind the load balancer. The session state should not be stored in the service memory but should be serialized to a data store like DynamoDB.
One way to see the difference is to look at their names: workers do what their name says, they perform (typically) heavy tasks (work), something you do not want your service to be bothered with, especially if it is a microservice. For this particular reason, you will rarely see 32-core machines with hundreds of GBs of RAM running services, but you will very often see them running workers. Finally, they complement each other well: services off-load heavy tasks to the workers. This aligns with the well-known UNIX philosophy "do one thing, and do it well".
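To make the off-loading pattern concrete, here is a minimal sketch (not taken from the answers above) of a service handing heavy work to workers through an SQS queue, assuming Spring and the AWS SDK for Java v2; the queue URL, endpoint, and payload are all hypothetical:

```java
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.SendMessageRequest;

// Hypothetical HTTP service that accepts a request and off-loads the heavy work
// (here: generating a thumbnail) to workers by enqueuing a task on SQS.
@RestController
public class ThumbnailController {

    private final SqsClient sqs = SqsClient.create();

    @PostMapping("/thumbnails")
    public ResponseEntity<String> createThumbnail(@RequestBody String imageUrl) {
        sqs.sendMessage(SendMessageRequest.builder()
                .queueUrl("https://sqs.eu-west-1.amazonaws.com/123456789012/thumbnail-tasks") // placeholder queue
                .messageBody(imageUrl)
                .build());
        // The service answers immediately; a worker picks up the message and does the CPU-heavy part.
        return ResponseEntity.accepted().body("queued");
    }
}
```

The workers then poll the same queue and do the actual resizing, so the service stays small and responsive.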
How can I estimate how many page views per second or in parallel an instance like t2.micro can handle? I know this varies depending on database queries, template processing and such, but I need some conservative estimates or real world examples just for a point of reference.
You're going to have issues if you try to apply a typical VPS-style thought process to AWS. One of the strengths of AWS is elasticity: put up some instances, then add more when demand increases. Auto Scaling Groups mixed with an Elastic Load Balancer help greatly with automatically dealing with demand (though they won't handle unexpected traffic spikes very well; you'll just have to keep a lot of standby instances ready if you want to deal with that).
One reason why you don't want t2.micros directly serving up requests is that slow clients can take up sockets that could otherwise be used to connect to the database. That's why you let the ELB handle the clients instead, so you don't have to deal with that. Also, how many clients you can handle depends heavily on the number of available sockets, available resources, what type of web server you have installed, and so on.
If you're serving up static files then just use S3, potentially mixed with CloudFront to deal with that. For dealing with simple API calls that do CRUD operations on a database just use Lambda Functions with API Gateway. Since both Lambda and API Gateway will scale up with demand you won't really have to worry about the page views issue as much.
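As one illustration of that Lambda + API Gateway pattern (a sketch only, not tied to your stack; the DynamoDB table pages and the request shape are hypothetical), a single create-item handler could look like:

```java
import java.util.Map;
import java.util.UUID;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyRequestEvent;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyResponseEvent;

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;

// Minimal "create" handler: API Gateway proxies the HTTP request to this Lambda,
// which writes an item to a hypothetical "pages" table. Scaling is handled by AWS.
public class CreateItemHandler
        implements RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent> {

    private final DynamoDbClient dynamoDb = DynamoDbClient.create();

    @Override
    public APIGatewayProxyResponseEvent handleRequest(APIGatewayProxyRequestEvent request, Context context) {
        String id = UUID.randomUUID().toString();
        String body = request.getBody() == null ? "" : request.getBody();

        dynamoDb.putItem(PutItemRequest.builder()
                .tableName("pages")
                .item(Map.of(
                        "id", AttributeValue.builder().s(id).build(),
                        "body", AttributeValue.builder().s(body).build()))
                .build());

        return new APIGatewayProxyResponseEvent().withStatusCode(201).withBody(id);
    }
}
```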
You're simply going to have an extremely difficult time finding a direct answer to your question just due to the way AWS works and how folks utilize it.
My scenario is described below; please suggest a solution.
I need to run 17 HTTP REST APIs for 30K users.
I will create 6 AWS instances (slaves) for running the 30K users (6 instances × 5,000 users).
Each AWS instance (slave) needs to handle 5K users.
I will create 1 AWS instance (Master) for controlling 6 AWS slaves.
1) For the master AWS instance, what instance type and storage do I need to use?
2) For the slave AWS instances, what instance type and storage do I need to use?
3) The main objective is for a single AWS instance to handle 5,000 (5K) users; what instance type and storage do I need for this? This objective also needs to be met at low cost (pricing).
The answer is: I don't know. How many users you will be able to simulate on this or that AWS instance is something you need to find out yourself, as it depends on the nature of your test, what it is doing, response sizes, the number of post-processors/assertions, etc.
So I would recommend the following approach:
First of all, make sure you are following the recommendations from the 9 Easy Solutions for a JMeter Load Test “Out of Memory” Failure article.
Start with a single AWS server, e.g. t2.large, and a single virtual user. Gradually increase the load while monitoring the AWS health metrics (CPU, RAM, disk, etc.) using either Amazon CloudWatch or the JMeter PerfMon Plugin. Once one of the monitored metrics runs short (e.g. CPU usage exceeds 90%), stop your test and note the number of virtual users at that point (you can use e.g. the Active Threads Over Time listener for this).
Depending on the outcome, either switch to another instance type (e.g. Compute Optimized if CPU is the bottleneck, or Memory Optimized if RAM is) or go for a higher-spec instance of the same family (e.g. t2.xlarge).
Once you know the number of users you can simulate on a single host, you should be able to extrapolate it to the other hosts.
The JMeter master host doesn't need to be as powerful as the slave machines; just make sure it has enough memory to handle the incoming results.
I am about to launch an iOS app that will communicate with my custom REST API. Right now I am running a single EC2 t2.micro instance with an Apache web server and MySQLi. Before I go ahead and launch it to the public, I want to hear what proper steps should be taken regarding the following.
Should I run two separate EC2 instances? One only for the web server and the other to handle only the database?
How should I approach setting up the database? Should I still use MySQLi or should I start using Amazon's RDS?
In relation to number two, when the database and/or web server runs out of space, how is this handled so that space is added seamlessly and the database/web server can continue to grow? I also read something about auto-scaling.
I will be expecting many requests per minute to my web server and want to take precautions.
The answer to these questions largely depends on the requirements of your application, your budget, and on what you decide to manage vs. what you'd prefer to allow AWS to manage. However, I'll answer these as best I can.
1) Yes. Separating the database from the web server (that is, 2 different EC2 instances) makes sense for a lot of reasons. It allows you to tailor resources like memory, CPU, etc. to each layer of your application separately; you do not want your web and database tiers competing for the same resources. Additionally, an issue that forces you to take down one (web or database) will not force you to also take down the other. If your database lives on one of the web servers and you need to perform maintenance, your app effectively goes offline, since your database goes down while you perform updates. Also, ideally you would protect your database server within a private subnet in your VPC. If you have the web and database on the same server, they will both be in a public subnet, since your web server requires access to an internet gateway.
2) It depends. If you want to maintain total control of the database server, then use an EC2 instance where you retain operating system control. If you want to take advantage of features like Multi-AZ for high availability or allow AWS to manage things like updates for you, RDS can be a great option. Cost also plays a role: for things like read replicas and Multi-AZ you will pay more, but you are purchasing performance and high availability. So it depends on your requirements. You can find the features of RDS here: RDS Product Details
3) For anything running on an EC2 instance (database or web) or if you decide to use RDS, you may provision and attach additional storage volumes as necessary. The type of storage you select will depend on the performance requirements, your budget, and the kind of workload you expect your database to face. Amazon provides the storage options available to you as well as a section for adding more storage here: RDS Storage Options
If you are worried about too many requests overwhelming your EC2 t2.micro instance, consider creating an ELB load balancer and setting up an auto-scaling group which will allow you to expand your capacity as necessary while distributing traffic such that no one server gets overwhelmed.
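For completeness, here is a rough sketch of wiring an Auto Scaling group to an ELB target group with the AWS SDK for Java v2. The launch template name, target group ARN, and subnets are placeholders; in practice you would more likely do this in the console, CloudFormation, or Terraform:

```java
import software.amazon.awssdk.services.autoscaling.AutoScalingClient;
import software.amazon.awssdk.services.autoscaling.model.CreateAutoScalingGroupRequest;
import software.amazon.awssdk.services.autoscaling.model.LaunchTemplateSpecification;

// A rough sketch of the ELB + Auto Scaling setup suggested above.
public class CreateWebAsg {
    public static void main(String[] args) {
        AutoScalingClient autoScaling = AutoScalingClient.create();
        autoScaling.createAutoScalingGroup(CreateAutoScalingGroupRequest.builder()
                .autoScalingGroupName("web-asg")
                .launchTemplate(LaunchTemplateSpecification.builder()
                        .launchTemplateName("web-server")      // placeholder launch template
                        .version("$Latest")
                        .build())
                .minSize(2)                                    // baseline so one instance failure is tolerable
                .maxSize(6)                                    // room to scale out under load
                .desiredCapacity(2)
                .targetGroupARNs("arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/web/abc123") // placeholder
                .vpcZoneIdentifier("subnet-aaaa1111,subnet-bbbb2222")
                .build());
    }
}
```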
I have a web app which runs behind an Amazon AWS Elastic Load Balancer with 3 instances attached. The app has a /refresh endpoint to reload reference data. It needs to be run whenever new data is available, which happens several times a week.
What I have been doing is assigning a public address to all instances and doing the refresh on each one independently (using ec2-url/refresh). I agree with Michael's answer on a different topic that EC2 instances behind an ELB shouldn't allow direct public access. Now my problem is: how can I make an elb-url/refresh call reach all instances behind the load balancer?
It would also be nice if I could collect the HTTP responses from the multiple instances, but I don't mind doing the refresh blindly for now.
One of the ways I'd solve this problem is by:
writing the data to an AWS S3 bucket
triggering an AWS Lambda function automatically from the S3 write
using the AWS SDK to identify the instances attached to the ELB from the Lambda function, e.g. using boto3 from Python or the AWS Java SDK (see the sketch after this list)
calling /refresh on the individual instances from the Lambda
ensuring that when a new instance is created (due to autoscaling or deployment), it fetches the data from the S3 bucket during startup
ensuring that the private subnets the instances are in allow traffic from the subnets attached to the Lambda
ensuring that the security groups attached to the instances allow traffic from the security group attached to the Lambda
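Here is a rough sketch of such a Lambda in Java (the same steps apply with boto3). The target group ARN is a placeholder, error handling is minimal, and the Lambda is assumed to run inside the VPC as described above:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.stream.Collectors;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.S3Event;

import software.amazon.awssdk.services.ec2.Ec2Client;
import software.amazon.awssdk.services.ec2.model.DescribeInstancesRequest;
import software.amazon.awssdk.services.elasticloadbalancingv2.ElasticLoadBalancingV2Client;
import software.amazon.awssdk.services.elasticloadbalancingv2.model.DescribeTargetHealthRequest;

// Hypothetical Lambda, triggered by the S3 write, that discovers the instances
// registered with the target group and calls /refresh on each one over the
// private network. The target group ARN is an assumption for illustration.
public class RefreshFunction implements RequestHandler<S3Event, Void> {

    private static final String TARGET_GROUP_ARN =
            "arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/my-app/abc123";

    private final ElasticLoadBalancingV2Client elb = ElasticLoadBalancingV2Client.create();
    private final Ec2Client ec2 = Ec2Client.create();
    private final HttpClient http = HttpClient.newHttpClient();

    @Override
    public Void handleRequest(S3Event event, Context context) {
        // 1. Which instance IDs are currently registered with the ELB's target group?
        List<String> instanceIds = elb.describeTargetHealth(
                        DescribeTargetHealthRequest.builder().targetGroupArn(TARGET_GROUP_ARN).build())
                .targetHealthDescriptions().stream()
                .map(d -> d.target().id())
                .collect(Collectors.toList());

        // 2. Resolve their private IPs.
        List<String> privateIps = ec2.describeInstances(
                        DescribeInstancesRequest.builder().instanceIds(instanceIds).build())
                .reservations().stream()
                .flatMap(r -> r.instances().stream())
                .map(i -> i.privateIpAddress())
                .collect(Collectors.toList());

        // 3. Call /refresh on each instance (the Lambda must run in the VPC for this to work).
        for (String ip : privateIps) {
            try {
                HttpResponse<String> resp = http.send(
                        HttpRequest.newBuilder(URI.create("http://" + ip + "/refresh")).GET().build(),
                        HttpResponse.BodyHandlers.ofString());
                context.getLogger().log(ip + " -> " + resp.statusCode());
            } catch (Exception e) {
                context.getLogger().log("refresh failed for " + ip + ": " + e.getMessage());
            }
        }
        return null;
    }
}
```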
The key wins of this solution are:
the process is fully automated from the instant the data is written to S3,
it avoids data inconsistency due to autoscaling/deployment,
it is simple to maintain (you don't have to hardcode instance IP addresses anywhere),
you don't have to expose instances outside the VPC,
it is highly available (AWS ensures the Lambda is invoked on the S3 write; you don't have to worry about running a script on an instance and keeping that instance up and running).
hope this is useful.
While this may not be possible given the constraints of your application and circumstances, it's worth noting that best-practice application architecture for instances running behind an AWS ELB (particularly if they are part of an Auto Scaling group) is to ensure that the instances are not stateful.
The idea is to make it so that you can scale out by adding new instances, or scale-in by removing instances, without compromising data integrity or performance.
One option would be to change the application to store the results of the reference data reload in an off-instance data store, such as a cache or database (e.g. ElastiCache or RDS), instead of in memory.
If the application is able to do that, then you would only need to hit the refresh endpoint on a single server: it would reload the reference data, do whatever analysis and manipulation is required to store it efficiently in a fit-for-purpose way, write it to the data store, and then all instances would have access to the refreshed data via the shared data store.
While a round trip to a data store adds some latency, it is often well worth it for the consistency of the application. Under your current model, if one server lags behind the others in refreshing the reference data and the ELB is not using sticky sessions, requests via the ELB will return inconsistent data depending on which server they are allocated to.
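As a minimal sketch of that approach, assuming Spring Boot with Spring Data Redis pointed at ElastiCache (the key name and the serialization to a single JSON string are illustrative choices):

```java
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Service;

// Hypothetical refresh service: /refresh on any one instance rebuilds the
// reference data and writes it to a shared Redis (ElastiCache) key; every
// instance reads from that key, so a single refresh serves the whole fleet.
@Service
public class ReferenceDataService {

    private static final String KEY = "reference-data";
    private final StringRedisTemplate redis;

    public ReferenceDataService(StringRedisTemplate redis) {
        this.redis = redis;
    }

    public void refresh() {
        String json = loadAndSerializeReferenceData(); // expensive reload, done once
        redis.opsForValue().set(KEY, json);
    }

    public String current() {
        return redis.opsForValue().get(KEY);           // cheap read on every request
    }

    private String loadAndSerializeReferenceData() {
        // placeholder for the real reload/analysis logic
        return "{}";
    }
}
```

With this in place, hitting /refresh on any one instance updates the shared key, and every instance reads the same data on its next request.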
You can't make these requests through the load balancer, so you will have to open up the security group of the instances to allow incoming traffic from a source other than the ELB. That doesn't mean you need to open it up to all direct traffic, though. You could simply whitelist an IP address in the security group to allow requests from your specific computer.
If you don't want to add public IP addresses to these servers, then you will need to run something like a curl command from an EC2 instance inside the VPC. In that case you would only need to open the security group to allow traffic from some server (or group of servers) that exists in the VPC.
I solved it differently, without opening up new traffic in security groups or resorting to external resources like S3. It's flexible in that it will dynamically notify instances added through ECS or ASG.
The ELB's target group offers a periodic health check to ensure the instances behind it are live. This is a URL that your server responds to. The endpoint can include a timestamp parameter for the most recent configuration. Every server in the TG will receive the health-check ping within the configured interval. If the parameter of the ping changes, it signals a refresh.
A URL may look like:
/is-alive?last-configuration=2019-08-27T23%3A50%3A23Z
Above I passed a UTC timestamp of 2019-08-27T23:50:23Z
A service receiving the request will check if the in-memory state is at least as recent as the timestamp parameter. If not, it will refresh its state and update the timestamp. The next health-check will result in a no-op since your state was refreshed.
Implementation notes
If refreshing the state can take more time than the interval window or the TG health timeout, you need to offload it to another thread to prevent concurrent updates or outright service disruption, because the health checks need to return promptly. Otherwise the node will be considered offline.
If you are using the traffic port for this purpose, make sure the URL is secured by making it impossible to guess. Anything publicly exposed can be subject to a DoS attack.
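Putting the idea together, a minimal sketch of such an endpoint in Spring Boot could look like the following; refreshState() is a hypothetical placeholder for the actual reload, and the refresh is handed to a single-threaded executor so the health check always returns promptly:

```java
import java.time.Instant;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

// Sketch of the health-check-driven refresh described above.
@RestController
public class IsAliveController {

    private final AtomicReference<Instant> loadedAt = new AtomicReference<>(Instant.EPOCH);
    private final ExecutorService refresher = Executors.newSingleThreadExecutor();

    @GetMapping("/is-alive")
    public ResponseEntity<String> isAlive(
            @RequestParam(name = "last-configuration", required = false) String lastConfiguration) {

        if (lastConfiguration != null) {
            Instant requested = Instant.parse(lastConfiguration);
            // Only refresh if the in-memory state is older than the requested timestamp.
            if (loadedAt.get().isBefore(requested)) {
                refresher.submit(() -> {
                    if (loadedAt.get().isBefore(requested)) {
                        refreshState();        // hypothetical: reloads the reference data
                        loadedAt.set(requested);
                    }
                });
            }
        }
        return ResponseEntity.ok("alive");     // always answer quickly so the TG keeps this node in service
    }

    private void refreshState() {
        // placeholder for the actual state reload
    }
}
```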
As you are using S3, you can automate your task by using S3's ObjectCreated notification.
https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
https://docs.aws.amazon.com/cli/latest/reference/s3api/put-bucket-notification.html
You can install the AWS CLI and write a simple Bash script that watches for that ObjectCreated notification. Start a cron job that looks for the creation of new objects in S3.
Set up a condition in that script to curl http://127.0.0.1/refresh whenever it detects a new object created in S3. It will curl 127.0.0.1/refresh and you're done; you don't have to do it manually each time.
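The answer above uses a Bash script plus cron; purely as an illustration, and to stay consistent with the other examples here, the same polling idea sketched in Java (running inside each instance) might look like this. The bucket name and one-minute schedule are assumptions, and @Scheduled requires @EnableScheduling on the application:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Instant;

import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;

// Polls the bucket for newer objects and refreshes this instance locally,
// mirroring the cron + curl approach. Assumes fewer than 1,000 objects (no pagination).
@Component
public class S3RefreshPoller {

    private final S3Client s3 = S3Client.create();
    private final HttpClient http = HttpClient.newHttpClient();
    private Instant lastSeen = Instant.EPOCH;

    @Scheduled(fixedDelay = 60_000)   // poll every minute, like a cron job
    public void pollAndRefresh() throws Exception {
        Instant newest = s3.listObjectsV2(ListObjectsV2Request.builder()
                        .bucket("my-reference-data-bucket") // placeholder bucket
                        .build())
                .contents().stream()
                .map(o -> o.lastModified())
                .max(Instant::compareTo)
                .orElse(Instant.EPOCH);

        if (newest.isAfter(lastSeen)) {
            lastSeen = newest;
            // Same effect as the curl in the answer: refresh this instance locally.
            http.send(HttpRequest.newBuilder(URI.create("http://127.0.0.1/refresh")).GET().build(),
                      HttpResponse.BodyHandlers.ofString());
        }
    }
}
```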
I personally like the answer by #redoc, but I wanted to give another alternative for anyone who is interested, which is a combination of his and the accepted answer. Using S3 object-creation events, you can trigger a Lambda, but instead of discovering the instances and calling them, which requires the Lambda to be in the VPC, you could have the Lambda use SSM (aka Systems Manager) to execute commands via a PowerShell or Bash document on EC2 instances that are targeted via tags. The document would then call 127.0.0.1/refresh like the accepted answer does. The benefit of this is that your Lambda doesn't have to be in the VPC, and your EC2s don't need inbound rules to allow the traffic from the Lambda.

The downside is that it requires the instances to have the SSM agent installed, which sounds like more work than it really is. There are AWS AMIs already optimized with the SSM agent, but installing it yourself in the user data is very simple. Another potential downside, depending on your use case, is that it uses an exponential ramp-up for simultaneous executions, which means if you're targeting 20 instances, it runs 1, then 2 at once, then 4 at once, then 8, until they are all done or it reaches whatever you set as the max. This is because of the error-recovery behavior it has built in; it doesn't want to destroy all your stuff if something is wrong, like slowly putting your weight on some ice.
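A rough sketch of the SSM call (from a Lambda or anywhere else with the AWS SDK for Java v2), assuming the instances run the SSM agent and carry a hypothetical Role=web tag:

```java
import java.util.List;
import java.util.Map;

import software.amazon.awssdk.services.ssm.SsmClient;
import software.amazon.awssdk.services.ssm.model.SendCommandRequest;
import software.amazon.awssdk.services.ssm.model.Target;

// Sends a shell command to every instance matching the tag; SSM handles the
// fan-out and ramp-up, so no inbound rules or VPC placement are needed.
public class SsmRefresh {
    public static void main(String[] args) {
        SsmClient ssm = SsmClient.create();
        ssm.sendCommand(SendCommandRequest.builder()
                .documentName("AWS-RunShellScript")            // built-in document that runs shell commands
                .targets(Target.builder().key("tag:Role").values("web").build())
                .parameters(Map.of("commands", List.of("curl -s http://127.0.0.1/refresh")))
                .build());
    }
}
```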
You could make the call multiple times in rapid succession to reach all the instances behind the load balancer. This works because AWS load balancers use round robin without sticky sessions by default, meaning that each call handled by the load balancer is dispatched to the next EC2 instance in the list of available instances. So if you make rapid calls, you're likely (though not guaranteed) to hit all the instances.
Another option: if your EC2 instances are fairly stable, you can create a target group for each EC2 instance, and then create a listener rule on your load balancer to target those single-instance groups based on some criteria, such as a query argument, URL, or header.