Is Cloud Run limited by cold starts and maximum execution length? - google-cloud-platform

When using Cloud Functions we have limitations related to cold starts and the maximum execution length of 9 minutes. Do any of these limitations also exist on Google Cloud Run?

According to the documentation, there is a limit of 15 minutes before a timeout.
Cloud Run still has cold starts, but they are much less frequent than with Cloud Functions, depending on your traffic patterns and the configured level of concurrency for an instance (max 80 concurrent requests, also from the documentation).
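To make the cold-start point concrete, here is a minimal sketch of a Cloud Run service in Python (assuming Flask; the code is illustrative, not from the documentation). Module-level work runs once per new instance, i.e. once per cold start, and with concurrency up to 80 many requests can share one warm instance:

# Minimal sketch of a Cloud Run service (assumes Flask is installed).
# Module-level code runs once per container instance, i.e. once per
# cold start; with concurrency up to 80, later requests reuse the
# warm instance and skip this work.
import os
import time

from flask import Flask

app = Flask(__name__)
INSTANCE_STARTED = time.time()  # set once per cold start

@app.route("/")
def index():
    # Every concurrent request on this instance sees the same timestamp.
    return {"instance_started": INSTANCE_STARTED}

if __name__ == "__main__":
    # Cloud Run injects the port to listen on via the PORT env var.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))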

Related

Google Cloud Run concurrency limits + autoscaling clarifications

Google Cloud Run allows a specified request concurrency limit per container. The subtext of the input field states: "When this concurrency number is reached, a new container instance is started." Two clarification questions:
Is there any way to set Cloud Run to anticipate the concurrency limit being reached, and spawn a new container a little before that happens to ensure that requests over the concurrency limit of Container 1 are seamlessly handled by Container 2 without the cold start time affecting the requests?
Imagine we have Maximum Instances set to 10, Concurrency set to 10 and there are currently 100 requests being processed (i.e. we've maxed out our capacity and cannot autoscale any more). What happens to the 101st request? Will it be queued up for some period of time, or will a 5XX be returned immediately?
Is there any way to set Cloud Run to anticipate the concurrency limit being reached, and spawn a new container a little before that happens to ensure that requests over the concurrency limit of Container 1 are seamlessly handled by Container 2 without the cold start time affecting the requests?
No. Cloud Run does not try to predict future traffic patterns.
Imagine we have Maximum Instances set to 10, Concurrency set to 10 and there are currently 100 requests being processed (i.e. we've maxed out our capacity and cannot autoscale any more). What happens to the 101st request? Will it be queued up for some period of time, or will a 5XX be returned immediately?
HTTP Error 429 Too Many Requests will be returned.
[EDIT - Google Cloud documentation on request queuing]
Under normal circumstances, your revision scales out by creating new instances to handle incoming traffic load. But when you set a maximum instances limit, in some scenarios there will be insufficient instances to meet that traffic load. In that case, incoming requests queue for up to 60 seconds. During this 60 second window, if an instance finishes processing requests, it becomes available to process queued requests. If no instances become available during the 60 second window, the request fails with a 429 error code on Cloud Run (fully managed).
About maximum container instances
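If the caller can tolerate extra latency, one way to cope with that 429 is a client-side retry with backoff. A rough sketch in Python, assuming the requests library (the service URL and retry budget are placeholders):

# Hedged sketch: retry a Cloud Run request that may return 429 when
# all instances are saturated. The URL and limits are placeholders.
import time

import requests

def call_with_backoff(url, max_attempts=5):
    for attempt in range(max_attempts):
        response = requests.get(url, timeout=60)
        if response.status_code != 429:
            return response
        time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
    raise RuntimeError("service still saturated after retries")

response = call_with_backoff("https://my-service-abc123-uc.a.run.app/")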

How does the hourly price in AWS work when building an API?

We're building a Python-based web application that has a low usage (10 hits per month max), but needs high processing power.
So we thought AWS hourly cost would only charge us for when the API gets pinged, but is that really how it works?
Or will we pretty much have to pay for it 24 hours a day in order for the API to always stay up?
It depends on which solution you use. EC2 instances are billed by the amount of time they run, so if you run a webserver on EC2 you'll pay for idle time. AWS Lambda functions run in response to events (like API Gateway requests) and you are charged by the number of invocations and the duration of the function. See the AWS Lambda pricing page. With your low number of invocations per month, I would suggest using Lambda and API Gateway if they meet your requirements for processing power and if your processing time can be less than 15 minutes (Lambda's current max timeout).
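To give a sense of what that looks like, a Python Lambda handler behind an API Gateway proxy integration can be as small as the following sketch (the processing step is a placeholder):

# Minimal sketch of a Python Lambda handler behind an API Gateway
# proxy integration; you pay per invocation plus duration.
import json

def handler(event, context):
    # The heavy processing would go here. Memory (and with it, CPU)
    # is chosen at deploy time.
    result = {"message": "processed", "path": event.get("path")}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(result),
    }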

GCP Dataflow vCPU usage and pricing question

I submitted a GCP Dataflow pipeline to receive my data from GCP Pub/Sub, parse it, and store it to GCP Datastore. It seems to work perfectly.
Over 21 days, I found the cost is $144.54 and the worker time is 2,094.72 hours. That means it was charged every second after I submitted it, even when it did not receive (process) any data from Pub/Sub.
Is this behavior normal? Or did I set the wrong parameters?
I thought CPU time would only be counted when data is received.
Is there any way to reduce the cost with the same working model (receive from Pub/Sub and store to Datastore)?
Cloud Dataflow service usage is billed in per-second increments, on a per-job basis. I guess your job used 4 n1-standard-1 workers, which used 4 vCPUs, giving an estimated 2,000 vCPU-hours of resource usage. Therefore, this behavior is normal. To reduce the cost, you can either use autoscaling to specify the maximum number of workers, or use the pipeline options to override the resource settings that are allocated to each worker. Depending on your needs, you could also consider using Cloud Functions, which costs less, but consider its limits.
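As a rough sanity check of those numbers in Python (assuming 4 always-on n1-standard-1 workers; the real bill also covers RAM and disk, so this is only indicative):

# Rough sanity check of the reported usage, assuming 4 always-on
# n1-standard-1 workers (1 vCPU each) over 21 days.
workers = 4
vcpu_hours = workers * 1 * 24 * 21
print(vcpu_hours)  # 2016, close to the reported 2,094.72 hours

# Implied blended rate (the real bill also covers RAM and disk,
# so this is only indicative).
print(144.54 / 2094.72)  # ~0.069 USD per hour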
Hope it helps.

How to handle backpressure using google cloud functions

Using Google Cloud Functions, is there a way to manage execution concurrency the way AWS Lambda does? (https://docs.aws.amazon.com/lambda/latest/dg/concurrent-executions.html)
My intent is to design a function that consumes a file of tasks and publishes those tasks to a work queue (Pub/Sub). I want to have a function that consumes tasks from the work queue (Pub/Sub) and executes each task.
The above could result in a large number of almost concurrent executions. My downstream consumer service is slow and cannot consume many concurrent requests at a time. In all likelihood, it would return HTTP 429 responses to try to slow down the producer.
Is there a way to limit the concurrency for a given Google Cloud Function the way it is possible to do with AWS?
This functionality is not available for Google Cloud Functions. Instead, since you want to control the pace at which the system opens concurrent tasks, Task Queues are the solution.
Push queues dispatch requests at a reliable, steady rate. They guarantee reliable task execution. Because you can control the rate at which tasks are sent from the queue, you can control the workers' scaling behavior and hence your costs.
In your case, you can control the rate at which the downstream consumer service is called.
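As a sketch of that model using the newer Cloud Tasks API, the successor to classic Task Queues (assumes the google-cloud-tasks Python client; the project, location, queue name, and target URL below are placeholders), where the queue's configured dispatch rate caps how fast tasks reach the consumer:

# Hedged sketch: enqueue a task whose delivery rate is governed by
# the queue's dispatch settings, protecting the slow consumer.
from google.cloud import tasks_v2

client = tasks_v2.CloudTasksClient()
parent = client.queue_path("my-project", "us-central1", "downstream-queue")

task = {
    "http_request": {
        "http_method": tasks_v2.HttpMethod.POST,
        "url": "https://downstream.example.com/consume",
        "headers": {"Content-Type": "application/json"},
        "body": b'{"task_id": 42}',
    }
}
client.create_task(request={"parent": parent, "task": task})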
This is now possible with the current gcloud beta! You can set a maximum number of instances that can run at once:
gcloud beta functions deploy FUNCTION_NAME --max-instances 10 FLAGS...
See docs https://cloud.google.com/functions/docs/max-instances
You can set the number of "Function invocations per second" with quotas. It's documented here:
https://cloud.google.com/functions/quotas#rate_limits
The documentation tells you how to increase it, but you can also decrease it to achieve the kind of throttling that you are looking for.
You can control the pace at which cloud functions are triggered by controlling the triggers themselves. For example, if you have set "new file creation in a bucket" as the trigger for your cloud function, then by controlling how many new files are created in that bucket you can manage concurrent execution.
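For example, the producer can pace its own uploads. A rough Python sketch using the google-cloud-storage client (the bucket name and rate are made up):

# Hedged sketch: throttle function invocations by pacing the uploads
# that trigger them. Bucket name and rate are placeholders.
import time

from google.cloud import storage

bucket = storage.Client().bucket("my-task-bucket")

def publish_tasks(payloads, per_second=2):
    for i, payload in enumerate(payloads):
        bucket.blob(f"task-{i}.json").upload_from_string(payload)
        time.sleep(1.0 / per_second)  # at most ~2 trigger events per second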
Such solutions are not perfect though, because sometimes a cloud function fails and gets restarted automatically (if you've configured your cloud function that way) without you having any control over it. In effect, the number of active instances of cloud functions will sometimes be more than you planned.
What AWS is offering is a neat feature though.

Scheduling long-running tasks using AWS services

My application heavily relies on AWS services, and I am looking for an optimal solution based on them. The web application triggers a scheduled job (assume it repeats indefinitely) which requires a certain amount of resources to be performed. A single run of the task will normally take a maximum of 1 minute.
The current idea is to pass jobs via SQS and spawn workers on EC2 instances depending on the queue size (this part is more or less clear).
But I struggle to find a proper solution for actually triggering the jobs at certain intervals. Assume we are dealing with 10,000 jobs. For a scheduler to run 10k cronjobs (the job itself is quite simple, just passing a job description via SQS) at the same time seems like a crazy idea. So the actual question would be: how do we autoscale the scheduler itself (given scenarios where the scheduler is restarted, a new instance is created, etc.)?
Or is the scheduler redundant as an app, and is it wiser to rely on AWS Lambda functions (or other services providing scheduling)? The problem with using Lambda functions is their limitations, and the 128MB of memory provided by a single function is actually too much (20MB seems like more than enough).
Alternatively, the worker itself can wait for a certain amount of time and notify the scheduler that it should trigger the job one more time. Let's say the frequency is 1 hour:
1. Scheduler sends job to worker 1
2. Worker 1 performs the job and after one hour sends it back to Scheduler
3. Scheduler sends the job again
The issue here, however, is the possibility that the worker will get scaled in.
Bottom line: I am trying to achieve a lightweight scheduler which would not require autoscaling and would serve as a hub with the sole purpose of transmitting job descriptions. And it certainly should not get throttled on service restart.
Lambda is perfect for this. You have a lot of short running processes (~1 minute) and Lambda is for short processes (up to five minutes nowadays). It is very important to know that CPU speed is coupled linearly to RAM. A 1GB Lambda function is equivalent to a t2.micro instance if I recall correctly, and 1.5GB RAM means 1.5x more CPU speed. The cost of these functions is so low that you can just execute this. A 128MB function has 1/8 the CPU speed of a micro instance, so I do not actually recommend using those.
As a queueing mechanism you can use S3 (yes you read that right). Create a bucket and let the Lambda worker trigger when an object is created. When you want to schedule a job, put a file inside the bucket. Lambda starts and processes it immediately.
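A minimal sketch of such a worker in Python (assuming boto3; run_job is a placeholder for your task logic), including the cleanup mentioned at the end of this answer:

# Hedged sketch of an S3-triggered Lambda worker: read the job file,
# do the work, then delete the object so it is not reprocessed.
import boto3

s3 = boto3.client("s3")

def run_job(job_bytes):
    ...  # placeholder for the ~1 minute of actual work

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        job = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        run_job(job)
        s3.delete_object(Bucket=bucket, Key=key)  # clean up when done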
Now you have to respect some limits. This way you can only have 100 workers at the same time (the total number of active Lambda instances), but you can ask AWS to increase this.
The costs are as follows:
- S3 PUT requests: $0.005 per 1,000, so $5 per million job requests (this is more expensive than SQS).
- The Lambda runtime. Assuming normal t2.micro CPU speed (1GB RAM), this costs $0.0001 per 60-second job (the first 300,000 seconds are free, i.e. 5,000 jobs).
- The Lambda requests: $0.20 per million triggers (the first million is free).
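Putting those figures together for, say, one million 60-second jobs at 1GB, and ignoring the free tiers (a rough estimate using the answer's own per-unit numbers):

# Rough total for one million 60-second jobs at 1GB, using the
# answer's own per-unit figures and ignoring the free tiers.
jobs = 1_000_000
put_cost = 0.005 * (jobs / 1000)          # S3 PUTs: $0.005 per 1,000 -> $5
runtime_cost = 0.0001 * jobs              # Lambda duration: $0.0001 per job -> $100
trigger_cost = 0.20 * (jobs / 1_000_000)  # Lambda requests: $0.20 per million
print(put_cost + runtime_cost + trigger_cost)  # ~105.20 USD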
This setup does not require any servers on your part. This cannot go down (only if AWS itself does).
(don't forget to delete the job out of S3 when you're done)