Cloud Run charges by hardware resources: am I billed for underused memory? - google-cloud-platform

I have a question about Cloud Run: if I set up my service with 4GB of RAM and 2 vCPUs, for example, instead of the default 256MB and 1 vCPU, will I pay much more even if I never consume all the resources I have made available? For example, let's say I set --memory to 6GB and no request ever consumes more than 2GB: will I pay for 6GB of RAM or for 2GB, given that peak usage was 2GB?
I'm asking because I want to be sure my application never dies from running out of memory, since I don't think Cloud Run's default 256MB is enough for me, but I want to be sure of how Google charges and scales.

Here's a quote from the docs:
You are billed only for the CPU and memory allocated during billable time, rounded up to the nearest 100 milliseconds.
Meaning, if you allocate 4GB of memory to your Cloud Run service, you are billed for the full 4GB whether it is underused or not.
In your case, since you want to make sure requests don't run out of memory, you can dedicate an instance to each request: find the smallest memory setting that can handle your requests and limit the concurrency setting to one.
Or, take advantage of concurrency (and allocate more memory): Cloud Run allows concurrent requests, so you control how many requests can share an instance's resources before a new instance is started. This can help drive down costs and avoid starting many instances (see cold starts). It is the better option if you are confident that a certain number of requests can share an instance without running out of memory.
When more container instances are processing requests, more CPU and memory are used, resulting in higher costs; and when new container instances need to be started, requests might take longer to process, degrading your service's performance.
Note that each approach has different advantages and drawbacks, so weigh them before making a decision. Experimenting with the costs using the GCP Pricing Calculator can help (the calculator includes the Free Tier in its computation).
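To make the two options concrete, here is a minimal sketch of how either setup could be deployed with the gcloud CLI; the service name and image below are hypothetical placeholders.

# Option 1: one request per instance; pay for the smallest memory that fits.
gcloud run deploy my-service \
  --image gcr.io/my-project/my-image \
  --memory 2Gi \
  --cpu 1 \
  --concurrency 1

# Option 2: let requests share a larger instance; fewer cold starts,
# but memory must be sized for the worst-case combined usage.
gcloud run deploy my-service \
  --image gcr.io/my-project/my-image \
  --memory 4Gi \
  --cpu 2 \
  --concurrency 80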

Related

ECS: clarification on resources

I'm having trouble understanding the configuration definitions of a task, specifically the resource settings. There are a few options (if we talk only about memory):
memory
containerDefinitions.memory
containerDefinitions.memoryReservation
There are a few things I'm not sure about.
First of all, the docs say that when the hard limit is exceeded, the container will stop running. Isn't the goal of a container orchestration service to keep the service alive?
Root-level memory must be greater than the sum of all containers' memory. In theory, I would imagine that once there aren't enough containers deployed, new containers are created for the image. I wouldn't like to use more resources than I need, but if I reserve memory at the root level, first, I do reserve much more than needed, and second, if my application receives a huge load, will the whole cluster shut down when the memory limit is exceeded, or what?
I want to implement a system that auto-scales, and I would imagine that this way I don't have to define the resources allocated; it just uses the amount needed, and deploys/kills containers as the load increases/decreases.
For me there is a lot of confusion around ECS and Fargate, how it works and how it scales, and the more I read about it, the more confusing it gets.
I would like to set the minimum amount of resources per container, at how much load to create a new container, and at how much load to kill one (because it's not needed anymore).
P.S. I'm not experienced in DevOps in general; I used Kubernetes at my company, and there are things I'm not clear about. I'm just learning this ECS world.
First of all, the docs say that when the hard limit is exceeded, the container will stop running. Isn't the goal of a container orchestration service to keep the service alive?
I would say the goal of a container orchestration service is to deploy your containers, and restart them if they fail for some reason. A container orchestration service can't magically add RAM to a server as needed.
I want to implement a system that auto-scales, and I would imagine that this way I don't have to define the resources allocated; it just uses the amount needed, and deploys/kills containers as the load increases/decreases.
No, you always have to define the amount of RAM and CPU you want to reserve for each of your Fargate tasks. Amazon charges you by the amount of RAM and CPU you reserve for your Fargate tasks, regardless of what your application actually uses, because Amazon has to allocate physical hardware resources to your ECS Fargate task to ensure that much RAM and CPU are always available to it.
Amazon can't add extra RAM or CPU to a running Fargate task just because it suddenly needs more. There will be other processes, of other AWS customers, running on the same physical server, and there is no guarantee that extra RAM or CPU are available on that server when you need it. That is why you have to allocate/reserve all the CPU and RAM resources your task will need at the time it is deployed.
You can configure autoscaling to trigger on the amount of RAM your tasks are using and start more instances of your task, spreading the load across more tasks, which should reduce the amount of RAM used by each individual task. Realize, though, that each new Fargate task instance created by autoscaling spins up on a different physical server, and each one reserves a specific amount of RAM on the server it is on.
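For reference, a Fargate task's size is declared up front in its task definition. Here is a minimal hypothetical sketch (the family, container name, and image are placeholders); the task-level cpu and memory values must be one of the combinations Fargate supports:

# Register a hypothetical Fargate task definition sized at 0.5 vCPU / 1GB.
cat > fargate-taskdef.json <<'EOF'
{
  "family": "api",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "512",
  "memory": "1024",
  "containerDefinitions": [
    { "name": "api", "image": "myrepo/api:latest", "essential": true }
  ]
}
EOF
aws ecs register-task-definition --cli-input-json file://fargate-taskdef.json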
I would like to set the minimum amount of resources per container, at how much load to create a new container, and at how much load to kill one (because it's not needed anymore).
You need to allocate the maximum amount of resources all the containers in your task will need, not the minimum, because more physical resources can't be allocated to a single task at run time.
You would configure autoscaling with a target value, for example 60% RAM usage, and it would automatically add more task instances when the average across current instances exceeds 60%, and remove instances when that average is well below 60%.
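As a sketch of that last point, a target-tracking policy on average service memory utilization can be attached with the AWS CLI; the cluster name, service name, and capacity bounds below are hypothetical:

# Register the ECS service as a scalable target.
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --scalable-dimension ecs:service:DesiredCount \
  --resource-id service/my-cluster/my-service \
  --min-capacity 2 \
  --max-capacity 10

# Track 60% average memory utilization across the service's tasks.
aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --scalable-dimension ecs:service:DesiredCount \
  --resource-id service/my-cluster/my-service \
  --policy-name memory-target-60 \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
      "TargetValue": 60.0,
      "PredefinedMetricSpecification": {
        "PredefinedMetricType": "ECSServiceAverageMemoryUtilization"
      }
    }'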

AWS batch job minimum memory requirement

I'm running many small jobs with AWS Batch. The jobs can run with just 200MB of memory, which I have tested using AWS Lambda. But when specifying the minimum memory for the job, I find that if I use any value smaller than 1024MB, the job simply fails without ever starting. Does this mean I can only use at least 1024MB of memory in this case? I thought I could use 512MB, given the presence of t2.nano.
P.S. I find that t2.nano is only available in us-east-1 while I'm working in us-east-2; maybe that is the cause?
If you specify 512MB for the job, and none of your compute resources have 512MB or greater of memory available to satisfy this requirement, then the job cannot be placed in your compute environment.
Because of platform memory overhead and memory occupied by the system kernel, this number is different than the installed memory amount that is advertised for Amazon EC2 instances. For example, an m4.large instance has 8 GiB of installed memory. However, this does not always translate to exactly 8192 MiB of memory available for jobs when the compute resource registers.
For more information, please check:
https://docs.aws.amazon.com/batch/latest/userguide/memory-management.html
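If you want to see the memory your compute resources actually registered with (the number the scheduler compares a job's requirement against), one way, sketched here with hypothetical names, is to inspect the ECS container instances backing the compute environment:

# Batch compute environments are backed by ECS clusters; find yours first.
aws ecs list-clusters

# Then check the memory each container instance registered with.
aws ecs list-container-instances --cluster my-batch-cluster
aws ecs describe-container-instances \
  --cluster my-batch-cluster \
  --container-instances <instance-arn> \
  --query 'containerInstances[].registeredResources[?name==`MEMORY`]'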

AWS: how high should I set my memory limit?

I'm running a web service on an EC2 small instance, which has 2GB of memory.
ECS offers soft and hard memory limit options.
I had set the soft limit to 1GB, and my server kept crashing when load was high.
Then I found the following SO post, AWS ECS Task Memory Hard and Soft Limits, which says it's better to use the hard limit if I'm memory bound.
So how high should I set my hard limit for my 2GB-memory EC2 machine?
I want my EC2 instance not to crash and to scale up with an Auto Scaling group policy.
Check how much memory is being used by the instance when no container is running on it, and then set the limit accordingly.
For Ubuntu-based instances the OS uses around 400-500MB, so a hard limit of 1.4GB would be safe.
Also, the hard limit ensures that multiple replicas can't be launched on the same instance if there are not enough resources, whereas the soft limit does not. In other words, with a soft limit of 1.5GB and two replicas, both could run on the same instance; with a hard limit of 1.5GB, two replicas won't fit on the same one.
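A quick way to measure that baseline, as a rough sketch (run on the instance before any containers are started):

# Show memory in MB; the "used" column approximates OS + agent overhead.
free -m

# Example reasoning for a 2GB instance, assuming ~500MB of overhead:
#   2048 MB total - ~500 MB overhead
# leaves roughly 1400-1500 MB as a safe hard limit for the container.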

Capacity planning on AWS

I need some understanding of how to do capacity planning for AWS and what kind of infrastructure components to use. I am taking the example below.
I need to set up a Node.js-based server which uses Kafka, Redis, and MongoDB. There will be 250 devices connecting to the server and sending in data every 10 seconds. The size of each data packet will be approximately 10KB. I will be using the 64-bit Ubuntu image.
What I need to estimate:
MongoDB requires at least 3 servers for redundancy. How do I estimate the size of the VM and the EBS volume required? E.g., should it be m4.large, m4.xlarge, or something else? The default EBS volume size is 30GB.
What should be the size of the VM for running the other application components, which include 3-4 Node.js processes, Kafka, and Redis? E.g., should it be m4.large, m4.xlarge, or something else?
Can I keep just one application server in an autoscaling group and add more as the load increases, or should I go with a minimum of 2?
I want to generally understand: given the number of devices, the data packet size, and the data frequency, how do we go about estimating which VM to consider, how much storage to consider, and perhaps any other considerations too?
Nobody can answer this question for you. It all depends on your application and usage patterns.
The only way to correctly answer this question is to deploy some infrastructure and simulate standard usage while measuring the performance of the systems (throughput, latency, disk access, memory, CPU load, etc).
Then, modify the infrastructure (add/remove instances, change instance types, etc) and measure again.
You should certainly run a minimal deployment per your requirements (e.g. instances in separate Availability Zones for High Availability), and you can use Auto Scaling to add extra capacity when required, but simulated testing would also be required to determine the right trigger points where more capacity should be added. For example, the best indicator might be memory, or CPU, or latency. It all depends on the application and how it behaves under load.
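That said, the numbers in the question do support a back-of-the-envelope ingest estimate as a starting point (raw payload only; replication, indexes, and protocol overhead are ignored):

# 250 devices, one ~10KB packet every 10 seconds:
#   250 / 10           = 25 packets/second
#   25 x 10KB          = 250 KB/second sustained (~2 Mbit/s)
#   250 KB/s x 86400 s = ~21.6 GB/day of raw payload
echo "$(( 250 / 10 )) packets/sec"
echo "$(( 25 * 10 )) KB/sec"
echo "$(( 250 * 86400 / 1000000 )) GB/day (approx)"

At roughly 21.6 GB/day of raw data, the default 30GB EBS volume fills in under two days, so the retention policy matters more to the storage estimate than the instance type does.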

Amazon ECS Task Definition - CPU units & Memory - set a container to use 100% of the EC2 instance's available resources

I'd like to have multiple different services running on an ECS cluster, with each service running on a single EC2 instance. The EC2 instance type is the same for all services, and I would like those services to use all of their host EC2 instance's available resources.
My assumption is that if I use only the soft memory parameter (without the hard one) in the task configuration, this will allow my container instance to use all the available memory on the EC2 instance hosting it, without my limiting it. Is that correct?
As for the EC2 type (t2.micro [vCPU=1, Memory=1GiB], for example), is it possible to simply put:
{
  ...
  "memory": 1024,
  "cpu": 1024,
  ...
}
since the EC2 instance should already be set up with a bunch of Container Service requirements?
Is it correct that you're trying to have each ECS instance handle only a single task?
The short answer to your question is no. Usually the amount of memory made available to your containers is a bit less than the amount of memory available on the machine itself, so that the operating system has enough memory to keep running. From my experience, a t2.small, which has 2048 MB of memory, will end up with 2004 MB available for containers.
When it comes to your task definition, there are two ways of specifying memory. The memory setting is a hard limit: if the container's memory usage hits this amount, the container will be terminated. If, on the other hand, you specify memoryReservation, that much memory will be reserved for the task, but it can use more, up to the total amount on the machine. Check out the Task Definition documentation for further details.
An important consideration here is that only one of memory and memoryReservation is required. If both are used, memoryReservation should be less than memory. If you are only going to specify one of them, I'd recommend memoryReservation, as it allows your task to use up to the total memory on the machine. If both are used, memoryReservation is what counts when calculating the amount of memory consumed by a task.
When placing tasks on an instance, ECS looks at the amount of available memory: the registered amount of memory for the instance, minus the memory of any tasks already placed on it. If this number is less than the amount of memory required for a task, the task won't be placed on that instance. If no instance has enough memory for the task, it won't be placed at all, and the error will be logged in the service's Events log.
So it's important to look at the amount of memory actually registered by your instance type, and then ensure your memory or memoryReservation values are lower than the amount registered by your instances. Otherwise, your tasks will never be placed.
As for cpu, this value is not required; if it's not specified, all tasks on an instance are allowed an equal portion of the CPU available on the system, and if only one task is on the instance, it can use the entire CPU of the instance by default.
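To tie that together, here is a minimal sketch of a container definition that uses only memoryReservation, per the recommendation above (the family, container name, and image are hypothetical placeholders):

# Hypothetical task definition; register it with the command below.
cat > taskdef.json <<'EOF'
{
  "family": "web-service",
  "containerDefinitions": [
    {
      "name": "app",
      "image": "myrepo/app:latest",
      "essential": true,
      "memoryReservation": 512
    }
  ]
}
EOF
aws ecs register-task-definition --cli-input-json file://taskdef.json

With no hard memory limit set, the task can be placed on any instance with at least 512 MB of registered memory free, and at run time it can burst above 512 MB up to whatever the instance has available.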