Cloud Run Web Hosting limitation - google-cloud-platform

I'm considering Cloud Run for web hosting rather than a more complex Compute Engine setup.
I just want to build an API with Node.js. I heard that automatic load balancing is also included. If so, is there any problem handling concurrent traffic of 1 million people without any configuration? (The database server is elsewhere, on a serverless platform like CockroachDB.)
Or do I have to configure various complicated settings, as with AWS EC2 or GCE?

For that much traffic, the out-of-the-box configuration must be fine-tuned.
First, look at the concurrency parameter on Cloud Run. It controls how many concurrent requests each instance can handle. The default value is 80, and you can set it up to 1000 concurrent requests per instance.
Of course, if you handle 1000 concurrent requests per instance (or close to it), you will likely need more CPU and memory; you can tune those parameters as well.
You also have to raise the max instances limit. By default, you are limited to 1000.
With 1000 concurrent requests per instance and 1000 instances, you can handle 1 million concurrent requests.
However, that leaves you no margin, and an instance handling 1000 concurrent requests can struggle even with the maximum CPU and memory.
You can request more than 1000 instances with a quota increase request.
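As a rough sketch, those settings map onto deploy flags like the following (the service name, region, and resource values are made up; the right numbers have to come from your own load tests):

# Hypothetical deploy of the Node.js API with high per-instance concurrency
# and the instance cap raised to the default quota maximum.
# --concurrency: requests handled in parallel by one instance (default 80, max 1000)
# --max-instances: upper bound on instances (default quota is 1000)
gcloud run deploy my-api \
  --image=gcr.io/my-project/my-api \
  --region=us-central1 \
  --allow-unauthenticated \
  --concurrency=500 \
  --max-instances=1000 \
  --cpu=2 \
  --memory=2Gi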
You can also optimize differently, especially if your 1 million users aren't all in the same country/Google Cloud region. In that case, you can deploy an HTTPS load balancer in front of your Cloud Run service and deploy the service in every region where your users are. (The Cloud Run services deployed in the different regions must have the same name.)
That way, it is not a single service that has to absorb 1 million users, but several services in different regions. In addition, the HTTPS load balancer routes each request to the closest region, which optimizes latency and reduces egress/cross-region traffic.
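A minimal sketch of that multi-region setup, assuming two placeholder regions and the serverless NEG + global HTTPS load balancer flow (all names are hypothetical):

# Same service name deployed in several regions
gcloud run deploy my-api --image=gcr.io/my-project/my-api --region=us-central1
gcloud run deploy my-api --image=gcr.io/my-project/my-api --region=europe-west1

# One serverless NEG per region, attached to a single global backend service
gcloud compute network-endpoint-groups create my-api-neg-us \
  --region=us-central1 --network-endpoint-type=serverless --cloud-run-service=my-api
gcloud compute network-endpoint-groups create my-api-neg-eu \
  --region=europe-west1 --network-endpoint-type=serverless --cloud-run-service=my-api

gcloud compute backend-services create my-api-backend --global
gcloud compute backend-services add-backend my-api-backend --global \
  --network-endpoint-group=my-api-neg-us --network-endpoint-group-region=us-central1
gcloud compute backend-services add-backend my-api-backend --global \
  --network-endpoint-group=my-api-neg-eu --network-endpoint-group-region=europe-west1

# A URL map, target HTTPS proxy, SSL certificate and forwarding rule complete the load balancer
gcloud compute url-maps create my-api-lb --default-service=my-api-backend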

Related

How much RAM do I need in cloud hosting for 100 KB of website storage?

I have a website with a single simple page that fetches trending videos from YouTube using the YouTube API, and the total size of the website is just 100 KB. The website is built with HTML, CSS and PHP. I want to host it on a good cloud hosting provider. Suppose I get 10,000 daily visitors to my website; will 1 GB of RAM and a 1-core CPU be sufficient for that?
Nobody can answer this question for you because every application is different, and much of it depends on the patterns of your particular user base.
The only way to know the requirements is to deploy the system and then simulate user traffic. Monitor the system to identify stress points, which could be RAM, CPU or Network. You can then adjust the size of the instance accordingly, and even change Instance Type to obtain a different mix of RAM and CPU.
Alternatively, just deploy something and monitor it closely. Then, adjust things based on usage patterns you see. This is "testing in production".
You could also consider using Amazon EC2 Auto Scaling, which can automatically launch new instances to handle an increased load. This way, the resources vary based on usage. However, this design would require a Load Balancer in front of the instances.
Then, if you want to get really fancy, you could simply host a static web page from Amazon S3 and have the page make API calls to a backend hosted in AWS Lambda. The Lambda function will automatically scale by running multiple functions in parallel. This way, you do not require any Amazon EC2 instances and you only pay for resources when somebody actually uses the website. It would be the cheapest architecture to run. However, you would need to rewrite your web page and back-end code to fit this architecture.
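For the S3 half of that last option, a minimal AWS CLI sketch might look like this (the bucket name and local folder are made up, and the Lambda/API Gateway side still has to be wired up separately):

# Create a bucket, enable static website hosting, and upload the page
aws s3 mb s3://my-trending-videos-site
aws s3 website s3://my-trending-videos-site --index-document index.html --error-document error.html
aws s3 sync ./site s3://my-trending-videos-site --acl public-read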

Reduce Cloud Run on GKE costs

It would be great if I could have answers to the following questions on Google Cloud Run:
If I create a cluster with resources upwards of 1 vCPU, will those extra vCPUs be utilized by my Cloud Run service, or is it always capped at 1 vCPU irrespective of my cluster configuration? In the docs here, this line has me confused: "Cloud Run allocates 1 vCPU per container instance, and this cannot be changed." I know this holds for managed Cloud Run, but does it also hold for Cloud Run on GKE?
If the resources specified for the cluster actually get utilized (say, I create a node pool of 2 n1-standard-4 nodes with 15 GB of memory each), why am I asked to choose a memory allocation again when creating/deploying to Cloud Run on GKE? What is its significance?
(Screenshot: the "Memory allocated" dropdown)
If Cloud Run autoscales from 0 to N according to traffic, why can't I set the number of nodes in my cluster to 0 (I tried and started seeing error messages about unscheduled pods)?
I followed the docs on custom domain mapping and set it up. Can I limit the requests that a container instance handles by the domain name or IP they come from (even if that is only set up artificially by specifying a Host header, as in the Cloud Run docs)?
curl -v -H "Host: hello.default.example.com" YOUR-IP
So that I don't incur charges if I get HTTP requests from anywhere but my verified domain?
Any help will be very much appreciated. Thank you.
1: The Cloud Run managed platform always allocates 1 vCPU per revision. On GKE that is also the default, but only on GKE can you override it with the --cpu param (see the sketch after this list):
https://cloud.google.com/sdk/gcloud/reference/beta/run/deploy#--cpu
2: Can you clarify what is being asked, and when performing which operation?
3: Cloud Run is built on top of Kubernetes thanks to Knative. Cloud Run is in charge of scaling pods up and down based on traffic, while Kubernetes is in charge of scaling pods and nodes based on CPU and memory usage; the mechanisms aren't the same. Moreover, node scaling is "slow" and can't keep up with spiky traffic. Finally, something has to run on your cluster to listen for incoming requests and serve/scale your pods correctly, and that component has to run on a cluster with more than 0 nodes.
4: Cloud Run doesn't let you configure this, and I think Knative can't either. But you can deploy an ESP (Extensible Service Proxy) in front to route requests to a specific Cloud Run service. That way, you split the traffic upstream and direct it to different services, which then scale independently; each service can have its own max scale and concurrency params. ESP can also implement rate limiting.
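For point 1, a hypothetical deploy to Cloud Run on GKE that overrides the CPU allocation could look like this (the cluster, zone, and image names are placeholders):

# --cpu is only honoured on the GKE platform; the managed platform keeps 1 vCPU per container instance
gcloud beta run deploy my-service \
  --image=gcr.io/my-project/my-service \
  --platform=gke \
  --cluster=my-cluster \
  --cluster-location=us-central1-a \
  --cpu=2 \
  --memory=2Gi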

Protect an unauthenticated Cloud Run endpoint

When I make an unauthenticated (public) Cloud Run endpoint to host an API, what are my options to protect this endpoint from malicious users making billions of HTTP requests?
For $10 you can launch a Layer 7 HTTP flood attack that can send 250k requests per second. Let's assume your Cloud Run endpoints scale up and all requests are handled. For invocations alone, you will pay $360/hour (at $0.40 per million requests).
Note that there is a concurrency limit and a max instance limit that you might hit if the attack is not distributed over multiple Cloud Run endpoints. What other controls do I have?
As I understand it, the usual defenses with Cloud Armor and Cloud CDN are bound to the Global Load Balancer, which is unavailable for Cloud Run, but is available for Cloud Run on GKE.
For unauthenticated invocations to a Cloud Run service with an IAM Cloud Run Invoker role set to the allUsers member type, I would expect the answer to be the same as those provided here - https://stackoverflow.com/a/49953862/7911479
specifically:
Cloud Functions sits behind the Google Front End which mitigates and absorbs many Layer 4 and below attacks, such as SYN floods, IP fragment floods, port exhaustion, etc.
It would certainly be great to get a clear Y/N answer on Cloud Armor support.
[Edit]: I have been thinking about this quite a lot and have come to the following conclusion:
If you expect you are likely to become a victim of an attack of this type, then I would monitor your regular/peak load and cap your service's ability to scale just above that load. Monitoring will allow you to raise the cap as your regular traffic grows over time. It appears to be the only good way. Yes, your service will be down once you reach the limit, but that seems preferable in the scenario where you are the target.
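A sketch of that cap, with an illustrative limit (the service name, region, and number are hypothetical and should come from monitoring your real peak):

# Cap how far the service can scale so a flood attack cannot run up the bill indefinitely
gcloud run services update my-api --region=us-central1 --max-instances=20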
An idea which I am yet to try is a protected route with Firebase Authentication and anonymous authentication.

AWS Server Configuration

I'm new to AWS. For one project we need to purchase a server on AWS, and I don't know what configuration is required for it. Our website will be like https://www.justdial.com/, with a minimum of 1000 users online at any time. What configuration will be best at minimum cost? I'm listing the details of what we want below:
> • 1 - Elastic IP
> • 1 - Load Balancer
> • 2 - Webserver + autoscaling
> • 1 - Database SQL
> • 1 - S3 storage backup
> • CDN
If anything else is missing, please guide me.
Go for a microservice architecture and create a Lambda function for every service. You can use a private RDS instance for security. A Lambda-based serverless approach bills you per API request, so during the night, when requests drop to close to zero, you won't be charged for that period. AWS Lambda automatically balances load and keeps the service available for everyone with minimal CPU and memory usage; you won't need your own load balancer, as AWS handles that by default.
Based on your requirements, using a VM won't be a good idea: most of these items (load balancer, web server autoscaling) come for free with serverless Lambda, and using RDS keeps your database cost to a minimum compared with owning a VM and scaling its resources yourself.
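If you go that route, creating one such Lambda function is roughly the following (the function name, runtime, handler, and role ARN are placeholders; an API Gateway or ALB still needs to be put in front of it):

# Package and create a single Node.js Lambda function for one microservice
zip function.zip index.js
aws lambda create-function \
  --function-name listings-service \
  --runtime nodejs12.x \
  --handler index.handler \
  --zip-file fileb://function.zip \
  --role arn:aws:iam::123456789012:role/lambda-exec-role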
It really depends on your application. If all you do is return static pages, you might be fine with the smallest instance and CDN like CloudFront. If every request is dynamic and takes massive computations, you need some strong servers.
I suggest you start with some reasonable settings (e.g. t3.medium) and then load-test it to figure out what you really need. There are many tools for that. You basically need something that will generate a lot of requests to your servers and track errors, latency and total response time. If any of those metrics come back insufficient (this also depends on your needs), add more resources. Make sure to leave room for traffic spikes.
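A quick way to generate that kind of load against a test environment, assuming a plain HTTP endpoint (the URL and numbers are only an example), is a tool like Apache Bench:

# 10,000 requests with 200 concurrent connections; watch failed requests and latency percentiles
ab -n 10000 -c 200 https://your-test-environment.example.com/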

JMeter load test with 30K users on AWS

My scenario is described below; please provide a solution.
I need to run 17 HTTP REST APIs for 30K users.
I will create 6 AWS instances (slaves) to run the 30K users (6 instances × 5000 users).
Each AWS instance (slave) needs to handle 5K users.
I will create 1 AWS instance (master) to control the 6 AWS slaves.
1) For the master AWS instance, what instance type and storage do I need to use?
2) For the slave AWS instances, what instance type and storage do I need to use?
3) The main objective is for a single AWS instance to handle 5000 (5K) users; what instance type and storage do I need for this? The objective should also be met at low cost (pricing).
The answer is: I don't know. You need to find out for yourself how many users you will be able to simulate on a given AWS instance, as it depends on the nature of your test: what it is doing, the response size, the number of post-processors/assertions, etc.
So I would recommend the following approach:
First of all, make sure you are following the recommendations from the "9 Easy Solutions for a JMeter Load Test 'Out of Memory' Failure" article.
Start with a single AWS server, e.g. t2.large, and a single virtual user. Gradually increase the load while monitoring the AWS health (CPU, RAM, disk, etc.) using either Amazon CloudWatch or the JMeter PerfMon Plugin. Once one of the monitored metrics runs out of headroom (e.g. CPU usage exceeds 90%), stop your test and note the number of virtual users at that point (you can use e.g. the Active Threads Over Time listener for this).
Depending on the outcome, either switch to another instance type (e.g. compute optimized if you are short on CPU, or memory optimized if you are short on RAM) or go for a higher-spec instance of the same tier (e.g. t2.xlarge).
Once you know how many users you can simulate on a single host, you should be able to extrapolate it to the other hosts.
The JMeter master host doesn't need to be as powerful as the slave machines; just make sure it has enough memory to handle the incoming results.
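Once you know the per-host capacity, the distributed run itself is started from the master in non-GUI mode; a sketch with made-up slave addresses:

# Run the test plan on all six slaves from the master, collecting results locally
jmeter -n -t rest_api_test.jmx \
  -R 10.0.0.11,10.0.0.12,10.0.0.13,10.0.0.14,10.0.0.15,10.0.0.16 \
  -l results.jtl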