In Traditional Performance Automation Testing:
There is an application server that receives all the requests. In this case we know the server configuration (CPU, RAM, etc.), so we can run a load test (of, let's say, 5k concurrent users) using JMeter or any other load-testing tool and check server performance.
In the case of AWS serverless there is no server, so to speak: all servers are managed by AWS. The code resides only in Lambda functions, and AWS decides at run time how to balance the load when request volumes are high.
So now we have a web app hosted on AWS using the Serverless Framework, and we want to measure its performance for 5k concurrent users. With no information about the backend servers, the only option seems to be relying on frontend or browser-based response times. Does this suffice?
Is there a better way to check performance of serverless applications?
I haven't worked with AWS, but in my opinion performance testing of serverless applications should be done in much the same way as in the traditional case with your own physical servers.
Despite the name serverless, physical servers are still used (though they are managed by AWS).
So I would approach this task with the following steps:
Send backend metrics (response time, request count, and so on) to a metrics system (Graphite, Prometheus, etc.).
Build a dashboard in this metrics system (ideally you should see the request count and response time per instance, and the number of instances).
Take a load-testing tool (JMeter, Gatling, or whatever) and start your load test scenario.
During and after the test you will see how many requests your app is processing, its response times, and how the number of instances changes with the number of concurrent requests.
This way you stay independent of the AWS management tools (though AWS probably has its own management dashboard, and afterwards it would be good to compare its results with yours).
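The load-generation and measurement steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a replacement for JMeter or Gatling: `run_load_test`, `summarize`, and `fake_request` are hypothetical names, and `fake_request` stands in for a real HTTP call to the application under test.

```python
import math
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(send_request, total_requests, concurrency):
    """Call send_request() total_requests times using `concurrency` threads.

    Returns the per-request latencies in seconds.
    """
    def timed_call(_):
        start = time.perf_counter()
        send_request()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed_call, range(total_requests)))


def summarize(latencies):
    """Reduce raw latencies to the numbers a dashboard panel would show."""
    ordered = sorted(latencies)
    # Nearest-rank 95th percentile.
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "count": len(ordered),
        "mean_s": statistics.mean(ordered),
        "p95_s": ordered[p95_index],
        "max_s": ordered[-1],
    }


def fake_request():
    # Hypothetical stand-in for an HTTP request to the app under test.
    time.sleep(0.01)
```

Calling `summarize(run_load_test(fake_request, total_requests=50, concurrency=10))` returns a dictionary with the request count, mean, 95th percentile, and maximum latency; in a real test those values would be shipped to the metrics system rather than printed.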
"Loadtesting" a serverless application is not the same as that of a traditional application. The reason for this is that when you write code that will run on a machine with a fixed amount CPU and RAM, many HTTP requests will be processed on that same machine at the same time. This means you can suffer from the noisy-neighbour effect where one request is consuming so much CPU and RAM that it is negatively affecting other requests. This could be for many reasons including sub-optimal code that is consuming a lot of resources. An attempted solution to this issue is to enable auto-scaling (automatically spin up additional servers if the load on the current ones reaches some threshold) and load balancing to spread requests across multiple servers.
This is why you need to load test a traditional application; you need to ensure that the code you wrote is performant enough to handle the influx of X number of visitors and that the underlying scaling systems can absorb the load as needed. It's also why, when you are expecting a sudden burst of traffic, you will pre-emptively spin up additional servers to help manage all that load ahead of time. The problem is you cannot always predict that; a famous person mentions your service on Facebook and suddenly your systems need to respond in seconds and usually can't.
In serverless applications, a lot of the issues around noisy neighbours in compute are removed for a number of reasons:
A lot of what you usually did in code is now done in a managed service; for example, most web frameworks route HTTP requests in code, whereas on AWS API Gateway takes that over.
Lambda functions are isolated and each instance of a Lambda function has a certain quantity of memory and CPU allocated to it. It has little to no effect on other instances of Lambda functions executing at the same time (this also means if a developer makes a mistake and writes sub-optimal code, it won't bring down a server; serverless compute is far more forgiving to mistakes).
All of this is not to say you can skip doing your homework to make sure your serverless application can handle the load. You just do it differently. Instead of trying to push fake users at your application to see if it can handle it, consult the documentation for the various services you use. AWS, for example, publishes the limits of these services and guarantees those numbers as a part of the service. For example, API Gateway has a limit of 10 000 requests per second. Do you expect traffic greater than 10 000 requests per second? If not, you're good! If you do, contact AWS and they may be able to increase that limit for you. Similar limits apply to AWS Lambda, DynamoDB, S3 and all other services.
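The "check your expected traffic against the published limits" step is simple arithmetic, sketched below. The 10 000 requests per second figure is the one quoted above; the per-user request interval, the average function duration, and the concurrency limit of 1000 are assumptions made up for illustration and should be replaced with your own measurements and the limits that apply to your account.

```python
def expected_rps(concurrent_users, seconds_between_requests):
    """Request rate if every user sends one request every N seconds."""
    return concurrent_users / seconds_between_requests


def required_concurrency(rps, avg_duration_s):
    """Function instances busy at once: request rate times average duration."""
    return rps * avg_duration_s


API_GATEWAY_RPS_LIMIT = 10_000      # figure quoted in the answer above
ASSUMED_CONCURRENCY_LIMIT = 1_000   # assumption: check your account's limit

# Assumed workload: 5k users, one request every 2 s, 200 ms per invocation.
rps = expected_rps(concurrent_users=5_000, seconds_between_requests=2)
concurrency = required_concurrency(rps, avg_duration_s=0.2)

print(f"expected rate: {rps:.0f} rps, within limit: {rps <= API_GATEWAY_RPS_LIMIT}")
print(f"needed concurrency: {concurrency:.0f}, "
      f"within limit: {concurrency <= ASSUMED_CONCURRENCY_LIMIT}")
```

Under these assumed numbers the workload sits well inside both limits; the point of the sketch is only to show which quantities need to be estimated before reading the limits tables.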
As you mentioned, a serverless (FaaS) architecture doesn't expose a physical or virtual server, so we cannot monitor the traditional metrics. Instead, we can capture the following:
Auto scalability:
Since the main advantage of this platform is scalability, we need to check that it scales automatically as we increase the load.
More requests, less response time:
Under a huge number of requests, traditional servers show increasing response times, whereas a serverless platform is expected to keep them low by scaling out. We need to monitor the response time to confirm this.
Lambda Insights in CloudWatch:
There is an option to monitor the performance of multiple Lambda functions: throttles, invocations and errors, memory usage, CPU usage, and network usage. We can select the Lambda functions we need and monitor them in the 'Performance monitoring' column.
Container CPU and memory usage:
In CloudWatch, we can create a dashboard with widgets to capture the CPU and memory usage of the containers, the task count, and the load balancer response time (if any).
While I have worked with AWS for a bit, I'm stuck on how to correctly approach the following use case.
We want to design an uptime monitor for up to 10K websites.
The monitor should run from multiple AWS regions, ping websites to check whether they are available, and measure the response time. With a Lambda function, I can ping the site, pass the result to an SQS queue and process it. So far, so good.
However, I want to run this function every minute. I also want to have the ability to add and delete monitors. So if I don't want to monitor website "A" from region "us-west-1" I would like to do that. Or the other way round, add a website to a region.
Ideally, all this would run serverless and be deployable to custom regions with CloudFormation.
What services should I go with?
I have been thinking about EventBridge, where I wanted to create custom events for every website in every region and then send the result over SNS to a central processing Lambda. But I'm not sure this is the way to go.
Alternatively, I wanted to build a scheduler lambda that fetches the websites it has to schedule from a DB and then invokes the fetcher lambda. But I was not sure about the delay since I want to have the functions triggered every minute. The architecture should monitor 10K websites and even more if possible.
Feel free to give me any advice you have :)
Kind regards.
In my opinion Lambda is not the correct solution for this problem. Your costs will be very high and it may not scale to what you want to ultimately do.
A c5.9xlarge EC2 instance costs about USD $1.53/hour and has a 10 Gbit network. With 36 vCPUs, a threaded program could take care of a large percentage (maybe all 10k) of your load. It could still be run in multiple regions on demand and push to an SQS queue. That's around $1100/month/region without pre-purchasing EC2 time.
A Lambda running 10,000 times per minute, for 5 seconds each time and using only 128 MB, would cost around USD $4600/month/region.
Coupled with the management interface you're alluding to, the EC2 instance could handle pretty much everything you want to do. Of course, you'd want to scale and likely have at least two EC2 instances for failover, but with two of them you're still at less than half the cost of the Lambda. As you scale to 100,000 websites, it's a matter of adding machines.
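The monthly figures in this answer can be reproduced with a short calculation. The invocation rate, duration, memory size, and EC2 hourly price come from the answer; the Lambda unit prices below are assumed list prices that the answer does not state, the month is taken as 30 days, and any free tier is ignored, so treat the result as a rough check rather than a quote.

```python
# Figures quoted in the answer above.
INVOCATIONS_PER_MINUTE = 10_000
SECONDS_PER_INVOCATION = 5
MEMORY_GB = 128 / 1024
EC2_HOURLY_USD = 1.53

# Assumed Lambda list prices (not stated in the answer; check current pricing).
LAMBDA_USD_PER_GB_SECOND = 0.0000166667
LAMBDA_USD_PER_MILLION_REQUESTS = 0.20

HOURS_PER_MONTH = 24 * 30
MINUTES_PER_MONTH = 60 * HOURS_PER_MONTH


def lambda_monthly_cost():
    """Compute charge (GB-seconds) plus request charge for one region."""
    invocations = INVOCATIONS_PER_MINUTE * MINUTES_PER_MONTH
    gb_seconds = invocations * SECONDS_PER_INVOCATION * MEMORY_GB
    compute = gb_seconds * LAMBDA_USD_PER_GB_SECOND
    requests = invocations / 1_000_000 * LAMBDA_USD_PER_MILLION_REQUESTS
    return compute + requests


def ec2_monthly_cost(instances=1):
    """On-demand cost of always-on instances for one region."""
    return instances * EC2_HOURLY_USD * HOURS_PER_MONTH


print(f"Lambda:         ${lambda_monthly_cost():,.0f}/month/region")
print(f"EC2 (1 server): ${ec2_monthly_cost(1):,.0f}/month/region")
print(f"EC2 (2 servers): ${ec2_monthly_cost(2):,.0f}/month/region")
```

With these assumptions the Lambda total comes to roughly $4,590 per month and a single instance to roughly $1,100, which matches the rounded numbers in the answer; two instances stay just under half the Lambda figure.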
There are a ton of other choices but understand that serverless does not mean cost efficient in all use cases.
I am developing a REST service using Spring Boot. The service takes an input file, performs some operation on it, and returns the processed file.
I know that in Spring Boot we have the configuration "server.tomcat.max-threads", which can be a maximum of 400.
My REST application will be deployed on a cluster.
I want to understand how I should handle the case where there are more than 400 requests and my cluster has only one node.
Basically, I want to understand the standard way of serving more requests than "max-threads-per-node x N nodes" in a cloud solution.
Welcome to AWS and Cloud Computing in general. What you have described is the system elasticity which is made very easy and accessible in this ecosystem.
Have a look at AWS Auto Scaling. It is a service which will monitor your application and automatically scale out to meet the increasing demand and scale in to save costs when the demand is low.
You can set triggers for this. For example, if you know that your application load is a function of memory usage, you can add nodes to the cluster whenever memory usage hits 80%. Read more about the various scaling policies here.
One such scaling metric is ALBRequestCountPerTarget. It scales the number of nodes in the cluster to maintain the average request count per node (target) in the cluster. With some buffer, you can set this to 300 and achieve what you are looking for. Read more about this in the docs.
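The effect of a per-target request count of 300 can be illustrated with simplified arithmetic. This is not the actual Auto Scaling algorithm, which works on averaged metrics and applies cooldowns; `desired_nodes` is a hypothetical helper that only shows why a target below the 400-thread ceiling leaves headroom on each node.

```python
import math


def desired_nodes(total_requests, target_per_node, min_nodes=1):
    """Smallest node count that keeps the per-node average at or below target."""
    return max(min_nodes, math.ceil(total_requests / target_per_node))


for load in (250, 400, 1200):
    nodes = desired_nodes(load, target_per_node=300)
    print(f"{load} requests -> {nodes} node(s), "
          f"about {load / nodes:.0f} requests per node")
```

At 400 requests, the point where a single node with 400 threads would be saturated, this arithmetic already asks for two nodes at about 200 requests each.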
I have an app deployed in 5 regions.
The latency between the regions varies from 150ms to 300ms
Currently, we use the method outlined in this article (usage tracking part):
http://highscalability.com/blog/2018/4/2/how-ipdata-serves-25m-api-calls-from-10-infinitely-scalable.html
But we export logs from Stackdriver to Cloud Pub/Sub. Then we use Cloud Dataflow to count the number of requests consumed per API key and update it in Mongo Atlas database which is geo-replicated in 5 regions.
In our app, we only read usage info from the nearest Mongo replica for low latency. App never updates any usage data directly in Mongo as it might incur latency cost since the data has to be updated in Master which may be in another region.
Updating the API key usage counter in Mongo directly from the app doesn't seem feasible, because we have traffic coming in at 10,000 RPS and, due to the latency between regions, I think it would run into other issues. This is just a hunch; so far I've not tested it. I came to this conclusion based on my reading of https://www.mongodb.com/blog/post/active-active-application-architectures-with-mongodb
One problem is that we end up paying for cloud pub/sub and Dataflow. Are there strategies to avoid this?
I researched on Google but didn't find how other multi-region apps keep track of usage per API key in real-time. I am not surprised, from my understanding most apps operate in a single region for simplicity and until now it was not feasible to deploy an app in multiple regions without significant overhead.
If you want real-time, the best option is to go with Dataflow. You could change the way data arrives at Dataflow, for example using Stackdriver → Cloud Storage → Dataflow, so that the logs go through Cloud Storage instead of Pub/Sub; it's more a matter of convenience and of comparing the cost of each product for your use case. Here's an example of how it could be done with Cloud Storage.
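Whichever route the logs take, the aggregation step described in the question (counting requests per API key and adding them to the stored totals) boils down to something like the sketch below. It is plain Python rather than a Dataflow pipeline, and the `api_key` field name and the helper names are assumptions for illustration; the real log schema is not given in the question.

```python
from collections import Counter


def count_usage(log_entries):
    """Count requests per API key in one batch (window) of log entries."""
    return Counter(
        entry["api_key"] for entry in log_entries if "api_key" in entry
    )


def apply_window(totals, window_counts):
    """Add one window's counts to the running totals kept in the database."""
    totals.update(window_counts)
    return totals


# Hypothetical log entries; the real schema is not given in the question.
window = [
    {"api_key": "key-a", "path": "/v1/lookup"},
    {"api_key": "key-b", "path": "/v1/lookup"},
    {"api_key": "key-a", "path": "/v1/lookup"},
    {"path": "/healthz"},  # no API key, so it is not counted
]

totals = apply_window(Counter({"key-a": 10}), count_usage(window))
print(dict(totals))
```

Because the counts are merged per window rather than per request, the database sees one update per key per window, which is what keeps writes to the primary manageable at high request rates.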
If I understood the whole concept correctly, the "serverless" architecture assumes that instead of using your own servers or containers, you use a bunch of AWS services. Usually such an architecture includes Amazon API Gateway, a bunch of Lambda functions, and DynamoDB (or an alternative) for storing data and state, as Lambda can't keep state. Services such as EC2 do not participate in all this because EC2 is a virtual server, which diminishes all the benefits of a serverless architecture.
All this looks really cool, but I feel like I'm missing something important, because right now this seems not to be applicable to cases such as real-time applications.
Say I have 2 users online. One of them performs an action in an app, which triggers changes in the database, which in turn should trigger changes in the second user's app.
The conventional way to send data or a command from server to client is a websocket connection. But with a serverless architecture there seems to be no way to establish and maintain a websocket connection. So... where did I misunderstand the concept? Or, if I understood everything correctly, how do I implement the interaction between 2 users as described above?
where did I misunderstand the concept?
Your observation is correct. It doesn't work out of the box using API Gateway and Lambda.
An applicable solution, as described here, is to use AWS IoT (yes, another AWS service).
Serverless isn't just a matter of Lambda, API Gateway and DynamoDB, it's much bigger than that. One of the big advantages to Serverless is the operational burden that it takes off your plate. No more patching, no more capacity planning, no more config management. Those may seem trivial but doing those things well and across a significant fleet of instances is complex, expensive and time consuming. Another benefit is the economics. Public cloud leverages utility billing, meaning you pay for what you run whether or not you actually use it. With AWS most of the billing per service is by hour but with Lambda it's per 100ms. The cheapest EC2 instance running for a full month is about $10/m (double that for redundancy). $20 in Lambda pricing gets you millions of invocations so for most cases serverless is significantly cheaper.
Serverless isn't for everything, though; it has its limitations. For example, it's not meant for running binaries. You can't run nginx in Lambda (for example); it's only meant to be a runtime environment for the programming languages that it supports. It's also specifically meant for event-based workloads, which is perfect for microservice-based architectures: small, independent, discrete pieces of compute doing work that, when done, send an event to one or more others to do something else and, if needed, return a response.
To address your concerns about real-time processing: depending on what your code is doing, your Lambda function could complete in anything from less than 100ms all the way up to 5 minutes. There are strategies to optimize its duration, but in general it's for short-lived work, which is conducive to real-time scenarios.
In your example about the 2 users interacting with the web app and the db, that could very easily be built using serverless technologies with one or 2 functions and a DynamoDB table. The total roundtrip time could be as low as milliseconds if not seconds, it really all depends on your code and what it's doing. These would all be HTTP calls so no websockets needed. Think of a number of APIs calling each other and your Lambda code is the orchestrator.
You might want to look at SNS (Simple Notification Service). In your example, if app user 2 is a subscriber to an SNS topic, then when app user 1 makes a change that triggers an SNS message, it will be pushed to the subscriber (app user 2). The message can be pushed over several supported protocols (Amazon, Apple, Google, MS, Baidu) in addition to SMTP or SMS. The SNS message can be triggered by a Lambda function or directly from a DynamoDB stream after an update (a database trigger). It's up to the app developer to select a message protocol and format. The app only has to receive messages through its native channels. This may not exactly be millisecond-latency 'real-time', but it's fast enough for all but the most latency-sensitive applications.
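The publishing side of this flow can be sketched as follows. The function takes the SNS client as a parameter; in a real Lambda function that would be `boto3.client("sns")`, whose `publish` call accepts `TopicArn` and `Message`. The topic ARN, the payload shape, and the `FakeSnsClient` stand-in (used here so the sketch runs without AWS credentials) are assumptions for illustration.

```python
import json


def publish_change(sns_client, topic_arn, change):
    """Publish a change made by one user so that subscribers get notified."""
    return sns_client.publish(
        TopicArn=topic_arn,
        Message=json.dumps(change),
    )


class FakeSnsClient:
    """Stand-in for boto3.client("sns") that records calls instead of sending."""

    def __init__(self):
        self.published = []

    def publish(self, **kwargs):
        self.published.append(kwargs)
        return {"MessageId": f"fake-{len(self.published)}"}


# Hypothetical topic ARN and payload.
client = FakeSnsClient()
response = publish_change(
    client,
    "arn:aws:sns:us-east-1:123456789012:user-changes",
    {"user": "user-1", "action": "updated-item", "item_id": 42},
)
print(response["MessageId"], client.published[0]["Message"])
```

Passing the client in keeps the function easy to test and leaves the choice of delivery protocol to the topic's subscriptions rather than to the code that publishes.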
I've been working on an AWS serverless application for several months now, and am amazed at the variety of services available. The rate of improvement and new features being added is enough to leave you out-of-breath.
For the sake of HA, I'm considering switching from a self-hosted solution (ZeroMQ) to AWS Simple Notification Service for pub/sub in an application. It is the backend for an app, so it should be reasonably real-time.
What latency and throughput can I expect from SNS?
Is the app going to be hosted on EC2? If so, the latency will be much lower, as the communication will go across Amazon's network rather than through the internet.
If you are going to call AWS services from boxes not hosted on EC2, here's a cool site that attempts to give you an idea of the amount of latency between you and various AWS services and locations.
How are you measuring the HTTP Ping Request Latency?
We are making an HTTP GET request to AWS Service Endpoints (like EC2, SQS, SNS etc) for PING and measuring the observed latency for it across all regions.
As for throughput, that is left up to you. You can use various strategies to increase throughput, such as multi-threading, batching messages, etc.
Keep in mind that you will have to code for some side effects, like possibly seeing the same message twice (At Least Once Delivery), and not being able to rely on FIFO.
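One common way to cope with at-least-once delivery is to make the consumer idempotent by remembering which message IDs it has already processed. The sketch below keeps the IDs in memory, which is an assumption made for brevity; with several consumers or restarts the IDs would have to live in a shared store with some expiry. `IdempotentHandler` is a hypothetical name, not part of any AWS SDK.

```python
class IdempotentHandler:
    """Wrap a processing function so that repeated deliveries are ignored."""

    def __init__(self, process):
        self._process = process
        self._seen_ids = set()

    def handle(self, message_id, body):
        """Process the message once; return False for a duplicate delivery."""
        if message_id in self._seen_ids:
            return False
        self._process(body)
        # Record the ID only after processing succeeded, so that a failed
        # attempt can be retried when the message is delivered again.
        self._seen_ids.add(message_id)
        return True


processed = []
handler = IdempotentHandler(processed.append)

# The same message arrives twice, and messages arrive out of order.
deliveries = [("m-2", "second"), ("m-1", "first"), ("m-2", "second")]
results = [handler.handle(mid, body) for mid, body in deliveries]
print(results, processed)
```

Deduplication handles repeated delivery only; if ordering matters, the messages themselves need a sequence number or timestamp that the consumer can use to reorder or discard stale updates.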