I have an app deployed in 5 regions.
The latency between the regions varies from 150ms to 300ms
Currently, we use the method outlined in this article (usage tracking part):
http://highscalability.com/blog/2018/4/2/how-ipdata-serves-25m-api-calls-from-10-infinitely-scalable.html
But we export logs from Stackdriver to Cloud Pub/Sub. Then we use Cloud Dataflow to count the number of requests consumed per API key and update it in Mongo Atlas database which is geo-replicated in 5 regions.
In our app, we only read usage info from the nearest Mongo replica for low latency. App never updates any usage data directly in Mongo as it might incur latency cost since the data has to be updated in Master which may be in another region.
Updating API key usage counter directly from the app in Mongo doesn't seem feasible because we've traffic coming in at 10,000 RPS and due to the latency between region, I think it will run into some other issue. This is just a hunch, so far I've not tested it. I came to this conclusion based on my reading of https://www.mongodb.com/blog/post/active-active-application-architectures-with-mongodb
One problem is that we end up paying for cloud pub/sub and Dataflow. Are there strategies to avoid this?
I researched on Google but didn't find how other multi-region apps keep track of usage per API key in real-time. I am not surprised, from my understanding most apps operate in a single region for simplicity and until now it was not feasible to deploy an app in multiple regions without significant overhead.
If you want real-time then the best option is to go with Dataflow. You could change the way data arrives to Dataflow, for example usging Stackdriver → Cloud Storage → Dataflow, but instead of going though pub/sub you would go through Storage, so it’s more of a choice of convenience and comparing prices of each product cost on your use case. Here’s an example of how it could be with Cloud Storage.
Related
We're building a multi-tenant SaaS application hosted on AWS that exposes and visualizes data in the front end via a REST api.
Now, for storage we're considering using AWS Redshift (Cluster or Serverless?) and then exposing the data using API Gateway and Lambda with the Redshift Data API.
The reason why I'm inclined to using Redshift as opposed to e.g RDS is that it seems like a nice option to also be able to conduct data experiments internally when building our product.
My question is, would this be considered a good strategy?
Redshift is sized for very large data and tables. For example the minimum storage size is 1MB. That's 1MB for every column and across all the slices (minimum 2). A table with 5 columns and just a few rows will take 26MB on the smallest Redshift cluster size (default distribution style). Redshift shines when your tables have 10s of millions of rows minimum. It isn't clear from your case that you will have the data sizes that will run efficiently on Redshift.
The next concern would be about your workload. Redshift is a powerful analytics engine but is not designed for OLTP workloads. High volumes of small writes will not perform well; it wants batch writes. High concurrency of light reads will not work as well as a database designed for that workload.
At low levels of work Redshift can do these things - it is a database. But if you use it in a way it isn't optimized for it likely isn't the most cost effective option and won't scale well. If job A is the SAS workload and analytics is job B, then choose the right database for job A. If this choice cannot do job B at the performance level you need then add an analytics engine to the mix.
My $.02 and I'm the Redshift guy. If my assumptions about your workload are wrong please update with specific info.
We are building a customer facing App. For this app, data is being captured by IoT devices owned by a 3rd party, and is transferred to us from their server via API calls. We store this data in our AWS Documentdb cluster. We have the user App connected to this cluster with real time data feed requirements. Note: The data is time series data.
The thing is, for long term data storage and for creating analytic dashboards to be shared with stakeholders, our data governance folks are requesting us to replicate/copy the data daily from the AWS Documentdb cluster to their Google cloud platform -> Big Query. And then we can directly run queries on BigQuery to perform analysis and send data to maybe explorer or tableau to create dashboards.
I couldn't find any straightforward solutions for this. Any ideas, comments or suggestions are welcome. How do I achieve or plan the above replication? And how do I make sure the data is copied efficiently - memory and pricing? Also, don't want to disturb the performance of AWS Documentdb since it supports our user facing App.
This solution would need some custom implementation. You can utilize Change Streams and process the data changes in intervals to send to Big Query, so there is a data replication mechanism in place for you to run analytics. One of the use cases of using Change Streams is for analytics with Redshift, so Big Query should serve a similar purpose.
Using Change Streams with Amazon DocumentDB:
https://docs.aws.amazon.com/documentdb/latest/developerguide/change_streams.html
This document also contains a sample Python code for consuming change streams events.
In Traditional Performance Automation Testing:
There is an application server where all the requests hits are received. So in this case; we have server configuration (CPU, RAM etc) with us to perform load testing (of lets say 5k concurrent users) using Jmeter or any load test tool and check server performance.
In case of AWS Serverless; there is no server - so to speak - all servers are managed by AWS. So code only resides in lambdas and it is decided by AWS on run time to perform load balancing in case there are high volumes on servers.
So now; we have a web app hosted on AWS using serverless framework and we want to measure performance of the same for 5K concurrent users. With no server backend information; only option here is to rely on the frontend or browser based response times - should this suffice?
Is there a better way to check performance of serverless applications?
I didn't work with AWS, but in my opinion performance testing in case serverless applications should perform pretty the same way as in traditional way with own physical servers.
Despite the name serverless, physical servers are still used (though are managed by aws).
So I will approach to this task with next steps:
send backend metrics (response time, count requests and so on) to some metrics system (graphite, prometheus, etc)
build dashboard in this metric system (ideally you should see requests count and response time per every instance and count of instances)
take a load testing tool (jmeter, gatling or whatever) and start your load test scenario
During the test and after the test you will see how many requests your app processing, it response times and how change count of instances depending of concurrent requests.
So in such case you will agnostic from aws management tools (but probably aws have some management dashboard and afterwards it will good to compare their results).
"Loadtesting" a serverless application is not the same as that of a traditional application. The reason for this is that when you write code that will run on a machine with a fixed amount CPU and RAM, many HTTP requests will be processed on that same machine at the same time. This means you can suffer from the noisy-neighbour effect where one request is consuming so much CPU and RAM that it is negatively affecting other requests. This could be for many reasons including sub-optimal code that is consuming a lot of resources. An attempted solution to this issue is to enable auto-scaling (automatically spin up additional servers if the load on the current ones reaches some threshold) and load balancing to spread requests across multiple servers.
This is why you need to load test a traditional application; you need to ensure that the code you wrote is performant enough to handle the influx of X number of visitors and that the underlying scaling systems can absorb the load as needed. It's also why, when you are expecting a sudden burst of traffic, you will pre-emptively spin up additional servers to help manage all that load ahead of time. The problem is you cannot always predict that; a famous person mentions your service on Facebook and suddenly your systems need to respond in seconds and usually can't.
In serverless applications, a lot of the issues around noisy neighbours in compute are removed for a number of reasons:
A lot of what you usually did in code is now done in a managed service; most web frameworks will route HTTP requests in code however API Gateway in AWS takes that over.
Lambda functions are isolated and each instance of a Lambda function has a certain quantity of memory and CPU allocated to it. It has little to no effect on other instances of Lambda functions executing at the same time (this also means if a developer makes a mistake and writes sub-optimal code, it won't bring down a server; serverless compute is far more forgiving to mistakes).
All of this is not to say its not impossible to do your homework to make sure your serverless application can handle the load. You just do it differently. Instead of trying to push fake users at your application to see if it can handle it, consult the documentation for the various services you use. AWS for example publishes the limits to these services and guarantees those numbers as a part of the service. For example, API Gateway has a limit of 10 000 requests per second. Do you expect traffic greater than 10 000 per second? If not, your good! If you do, contact AWS and they may be able to increase that limit for you. Similar limits apply to AWS Lambda, DynamoDB, S3 and all other services.
As you have mentioned, the serverless architecture (FAAS) don't have a physical or virtual server we cannot monitor the traditional metrics. Instead we can capture the below:
Auto Scalability:
Since the main advantage of this platform is Scalability, we need to check the auto scalability by increasing the load.
More requests, less response time:
When hitting huge amount of requests, traditional servers will increase the response time where as this approach will make it lesser. We need to monitor the response time.
Lambda insights in Cloudwatch:
There is an option to monitor the performance of multiple Lambda functions - Throttles, Invocations & Errors, Memory usage, CPU usage and network usage. We can configure the Lambdas we need and monitor in the 'Performance monitoring' column.
Container CPU and Memory usage:
In cloudwatch, we can create a dashboard with widgets to capture the CPU and memory usage of the containers, tasks count and LB response time (if any).
I am developing a rest service using Spring boot. The rest service takes an input file and do some operation on it and return back the processed file.
I know that in spring boot we have configuration "server.tomcat.max-threads" which can be a maximum of 400.
My rest application will be deployed on a cluster.
I want to understand how I should be handling if the request is more than 400 for a case wherein my cluster has only one node.
Basically I wanted to understand what is the standard way for serving requests more than the "max-thread-per-node X N-nodes" in a cloud solution.
Welcome to AWS and Cloud Computing in general. What you have described is the system elasticity which is made very easy and accessible in this ecosystem.
Have a look at AWS Auto Scaling. It is a service which will monitor your application and automatically scale out to meet the increasing demand and scale in to save costs when the demand is low.
You can set triggers for the same. For eg. If you know that your application load is a function of Memory usage, whenever memory usage hits 80% you can add nodes to the custer. read more about various scaling Policies here.
One such scaling metric is ALBRequestCountPerTarget. It will scale the number of nodes int he cluster to maintain the average request count per node(target) in the cluster. With some buffer, you can set this to 300 and achieve what you are looking for. Read more about this in the docs.
I'm creating a simple web app that needs to be deployed to multiple regions in AWS. The application requires some dynamic configuration which is managed by a separate service. When the configuration is changed through this service, I need those changes to propagate to all web app instances across all regions.
I considered using cross-region replication with DynamoDB to do this, but I do not want to incur the added cost of running DynamoDB in every region, and the replication console. Then the thought occurred to me of using S3 which is inherently cross-region.
Basically, the configuration service would write all configurations to S3 as static JSON files. Each web app instance will periodically check S3 to see if the any of the config files have changed since the last check, and download the new config if necessary. The configuration changes are not time-sensitive, so polling for changes every 5/10 mins should suffice.
Have any of you used a similar approach to manage app configurations before? Do you think this is a smart solution, or do you have any better recommendations?
The right tool for this configuration depends on the size of the configuration and the granularity you need it.
You can use both DynamoDB and S3 from a single region to serve your application in all regions. You can read a configuration file in S3 from all the regions, and you can read the configuration records from a single DynamoDB table from all the regions. There is some latency due to the distance around the globe, but for reading configuration it shouldn't be much of an issue.
If you need the whole set of configuration every time that you are loading the configuration, it might make more sense to use S3. But if you need to read small parts of a large configuration, by different parts of your application and in different times and schedule, it makes more sense to store it in DynamoDB.
In both options, the cost of the configuration is tiny, as the cost of a text file in S3 and a few gets to that file, should be almost free. The same low cost is expected in DynamoDB as you have probably only a few KB of data and the number of reads per second is very low (5 Read capacity per second is more than enough). Even if you decide to replicate the data to all regions it will still be almost free.
I have an application I wrote that works in exactly the manner you suggest, and it works terrific. As it was pointed out, S3 is not 'inherently cross-region', but it is inherently durable across multiple availability zones, and that combined with cross region replication should be more than sufficient.
In my case, my application is also not time-sensitive to config changes, but none-the-less besides having the app poll on a regular basis (in my case 1 once per hour or after every long-running job), I also have each application subscribed to SNS endpoints so that when the config file changes on S3, an SNS event is raised and the applications are notified that a change occurred - so in some cases the applications get the config changes right away, but if for whatever reason they are unable to process the SNS event immediately, they will 'catch up' at the top of every hour, when the server reboots and/or in the worst case by polling S3 for changes every 60 minutes.