I'm working on a video rendering server in Node.js. It spawns multiple headless Chrome instances and uses the Puppeteer library to capture screenshots and feed them to FFmpeg. Later it concatenates all the parts with some post-processing.
Now I want to move it to production, but it's not performing efficiently.
I tried serverless architectures such as Cloud Run, but still couldn't get acceptable performance. Their documentation also clearly states that they are not meant for heavy, long-running tasks. The video takes too long to render, even longer than it takes on my laptop.
I tried GCE and the results are satisfactory, but now I'm having a hard time scaling it. The server can only handle one request at a time efficiently. How do I scale horizontally and make sure that each instance gets only one request at a time?
Thanks in advance.
To scale the number of identical instances you can use Managed Instance Groups. Have a look at the autoscaling documentation to get a better understanding of how it works, but in short it says:
You can autoscale based on one or more of the following metrics that reflect the load of the instance group:
Average CPU utilization.
HTTP load balancing serving capacity, which can be based on either utilization or requests per second.
Cloud Monitoring metrics.
If you are autoscaling based on CPU usage, just enable autoscaling and set it up when creating a new group of instances.
Here's an example gcloud command to do this:
gcloud compute instance-groups managed set-autoscaling example-managed-instance-group \
--max-num-replicas 20 \
--target-cpu-utilization 0.60 \
--cool-down-period 90
You can also use any other available metric to scale your group, or even create a new custom metric that will trigger scaling:
You can create custom metrics using Cloud Monitoring and write your own monitoring data to the Monitoring service. This gives you side-by-side access to standard Google Cloud data and your custom monitoring data, with a familiar data structure and consistent query syntax. If you have a custom metric, you can choose to scale based on the data from these metrics.
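For instance, here's a minimal sketch of writing such a custom metric (e.g. the depth of your render queue) with the Cloud Monitoring Python client; the metric name, project ID and resource labels are placeholders you'd replace with your own:

# Sketch: publish a custom "queue depth" metric that the autoscaler could scale on.
# Assumes google-cloud-monitoring is installed; names and labels below are examples.
import time
from google.cloud import monitoring_v3

project_id = "my-project"  # placeholder
client = monitoring_v3.MetricServiceClient()

series = monitoring_v3.TimeSeries()
series.metric.type = "custom.googleapis.com/render_queue_depth"  # example metric name
series.resource.type = "gce_instance"
series.resource.labels["instance_id"] = "1234567890123456789"  # placeholder
series.resource.labels["zone"] = "us-central1-a"

now = time.time()
seconds = int(now)
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": seconds, "nanos": int((now - seconds) * 10**9)}}
)
point = monitoring_v3.Point({"interval": interval, "value": {"int64_value": 5}})
series.points = [point]

client.create_time_series(name=f"projects/{project_id}", time_series=[series])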
And lastly, I've found this example use case that scales a group of VMs based on a Pub/Sub queue, which might be the solution you're looking for.
Related
Use Case:
Shut down a group of EC2 instances when average CPU usage is lower than X percent - not on just one of them, but across all of them (the average of the group).
I have a group of VMs that make up a production setup. I need all of them ON to run a working product. Sometimes nobody uses it, and to save cost I can power off all the VMs. I have implemented a CloudWatch alarm with an action to shut down when average CPU usage is lower than 20%, and it works, but it is per VM. I don't want one VM from the whole setup to be powered off on its own; I need the whole setup either ON or OFF.
I was wondering about having an AWS Lambda function that checks the alarms on all the VMs and makes a decision based on that information.
Do you think this is the way to go? Do you have a different idea to achieve this goal? Is it possible to check CloudWatch alarms from AWS Lambda (to determine whether an alarm is ON or OFF)?
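For context, this is roughly the kind of Lambda handler I have in mind (just an untested boto3 sketch; the alarm names and instance IDs are placeholders):

# Rough sketch: stop the whole group only when every per-VM CPU alarm is in ALARM state.
import boto3

ALARM_NAMES = ["cpu-low-vm-1", "cpu-low-vm-2", "cpu-low-vm-3"]   # placeholders
INSTANCE_IDS = ["i-0123456789abcdef0", "i-0123456789abcdef1"]    # placeholders

cloudwatch = boto3.client("cloudwatch")
ec2 = boto3.client("ec2")

def handler(event, context):
    alarms = cloudwatch.describe_alarms(AlarmNames=ALARM_NAMES)["MetricAlarms"]
    all_idle = bool(alarms) and all(a["StateValue"] == "ALARM" for a in alarms)
    if all_idle:
        # Every VM reports low CPU, so power off the whole setup.
        ec2.stop_instances(InstanceIds=INSTANCE_IDS)
    return {"stopped": all_idle}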
Looking for general ideas
I am currently running a video encoding application on ECS but auto scaling is my biggest problem.
Users start live video encoding jobs from a front end. Once a job is placed, this is added as a redis queue (rq) job that runs on an ECS task placed on a c5d.large instance using ffmpeg.
Autoscaling is currently based on alarms. If CPU is above a set percentage, a new instance and task are spawned. If CPU is low, instances are checked and, if no jobs are running, they are destroyed.
This is not a bad solution, but it feels clunky and slow. If a user wants to start two jobs one right after the other, it takes a couple of minutes for the instance to spawn and the task to be placed (even using warm groups).
Plus, CloudWatch alarms take a while to refresh and are not a very reliable way of measuring the work being done (an encode at 720p will use less CPU than one at 1080p and thus throw off my alarm settings).
Is there a better solution someone can point me to that allows for fast and precise autoscaling, other than relying on CloudWatch alarms? I am tempted to build my own autoscaling system based on the currently executing jobs/workers, spawning and destroying instances directly from my code via the API, but I'm hoping to find a better solution from within AWS itself.
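Roughly what I'm imagining for the do-it-yourself approach (untested sketch; the queue key, cluster and service names are placeholders, and it assumes an ECS capacity provider handles the underlying EC2 capacity):

# Sketch only: scale the ECS service's desired count to match the rq queue depth.
import boto3
import redis

MAX_WORKERS = 10                                # placeholder ceiling

r = redis.Redis(host="my-redis-host")           # placeholder host
ecs = boto3.client("ecs")

def reconcile():
    queued = r.llen("rq:queue:encodes")         # placeholder rq queue key
    desired = min(int(queued), MAX_WORKERS)
    ecs.update_service(
        cluster="encoding-cluster",             # placeholder
        service="encoder-worker",               # placeholder
        desiredCount=max(desired, 1),           # keep one warm worker
    )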
Thanks
I have this exact problem too. AWS already has MediaConvert/Elastic Transcoder, but it's just too expensive, so I decided to build my own, first on Lambda with SST.dev (serverless), where each job is a single function invocation. However, I ran into the 15-minute function timeout, mostly because I'm not just copying codecs.
For scaling at this point, I would look at Kubernetes. This is the sort of problem Kubernetes is intended to handle (dynamic resource scaling on demand). Kubernetes is rather non-trivial, but K8s is what the industry has mostly settled on, so there are a lot of reasons to just go that route. You could start with K3s (which I only learned about today) and move up to full K8s when you are ready.
Since you're trying to find a solution directly within AWS, you could try EKS, but I'm not completely sure what would be best.
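If you do go the Kubernetes route, the usual pattern is one Job per encode, so the cluster autoscaler only adds nodes while work is queued. A rough sketch with the official Python client - the image, args, resource requests and namespace are just examples:

# Sketch: run each encode as a one-off Kubernetes Job; the cluster autoscaler
# then adds/removes nodes based on pending Jobs.
from kubernetes import client, config

def submit_encode_job(job_id: str, input_url: str, output_url: str):
    config.load_kube_config()  # use load_incluster_config() when running in-cluster
    job = client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(name=f"encode-{job_id}"),
        spec=client.V1JobSpec(
            backoff_limit=1,
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    restart_policy="Never",
                    containers=[
                        client.V1Container(
                            name="ffmpeg",
                            image="jrottenberg/ffmpeg:4.4-alpine",  # example image
                            args=["-i", input_url, "-vf", "scale=-2:720", output_url],
                            resources=client.V1ResourceRequirements(
                                requests={"cpu": "2", "memory": "4Gi"}
                            ),
                        )
                    ],
                )
            ),
        ),
    )
    client.BatchV1Api().create_namespaced_job(namespace="default", body=job)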
We have implemented kube-state-metrics (by following the steps in this article, section 4.4.1 "Install monitoring components") on one of our Kubernetes clusters on GCP. It created three new deployments on the cluster: node-exporter, prometheus-k8s and kube-state-metrics. After that, we were able to see all the metrics inside Metrics Explorer with the prefix "external/prometheus/".
To check external metrics pricing, we referred to this link and calculated the price accordingly, but when the bill arrived it was a shocking figure. GCP charged a large amount even though we haven't added a single metric to a dashboard or set up monitoring for anything. From the ingested volume (around 1.38 GB/day), it looks like these monitoring tools do some background job (reading metrics at regular intervals) that consumed this volume and led to the bill.
We would like to understand how these kube-state-metrics monitoring components work. Do they automatically collect metrics data and increase the ingested volume (and the bill) this way, or is there a misconfiguration in the setup?
Any guidance on this would be really appreciated!
Thank you.
By default, kube-state-metrics exposes a large number of metrics describing the objects and events across your cluster (pods, deployments, nodes and so on).
If you have many frequently-updating resources on your cluster, you may find that a lot of data is ingested into these metrics, which incurs high costs.
You need to configure what metrics you'd like to expose, as well as consult the documentation for your Kubernetes environment in order to avoid unexpected high costs.
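As a quick way to see where the volume comes from, you can count how many time series each metric family emits on the kube-state-metrics endpoint. A small diagnostic sketch, assuming the default metrics port (8080) reached via a port-forward:

# Sketch: count time series per metric family on the kube-state-metrics endpoint
# to see which metrics dominate ingestion.
# e.g. kubectl port-forward svc/kube-state-metrics 8080:8080 -n kube-system
from collections import Counter
import requests

METRICS_URL = "http://localhost:8080/metrics"  # assumed default port via port-forward

def top_metric_families(n=20):
    counts = Counter()
    for line in requests.get(METRICS_URL, timeout=10).text.splitlines():
        if line and not line.startswith("#"):
            # Metric family name is everything before the first "{" or space.
            counts[line.split("{", 1)[0].split(" ", 1)[0]] += 1
    for name, count in counts.most_common(n):
        print(f"{count:6d}  {name}")

if __name__ == "__main__":
    top_metric_families()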
I have a small number of HTTP servers on GCP VMs, with a mixture of different server languages and Linux-based OSes.
Questions
A. Is it possible to use the Stackdriver monitoring service to set alerts at specific percentiles for HTTP response latencies?
B. Can I do this without editing the code of each server process?
C. Will installing the agent into the VM report HTTP latencies?
For example, if the 95th percentile goes over 100ms for a certain time period I want to know.
I know I can do this for CPU utilisation and other hypervisor provided stats using:
https://console.cloud.google.com/monitoring/alerting
Thanks.
Request latencies are captured by cloud load balancers. As long as you are using a cloud load balancer, you don't need to install the monitoring agent to create alerts based on 95th-percentile metrics.
The monitoring agent captures latencies for some preconfigured systems such as Riak, Cassandra and a few others. Here's the full list of systems and metrics the monitoring agent supports by default: https://cloud.google.com/monitoring/api/metrics_agent
But if you want anything custom, i.e. you want to measure request latencies from the VM itself, you would need to capture response times yourself and configure the logging agent to create a custom metric, which you can then use to create alerts. As long as you capture them as distribution metrics, you should be able to visualise different percentiles (25th, 50th, 75th, 90th, 95th, 99th, etc.) and create alerts based on them.
see: https://cloud.google.com/logging/docs/logs-based-metrics/distribution-metrics
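For example, a minimal sketch of the "capture response times yourself" part, assuming a Flask app whose structured JSON log lines get picked up by the logging agent; the latency_ms field name is arbitrary and is what the distribution metric's value extractor would point at:

# Sketch: emit one structured JSON log line per request with its latency,
# which a distribution-type logs-based metric can then extract.
import json
import sys
import time

from flask import Flask, g, request

app = Flask(__name__)

@app.before_request
def start_timer():
    g.start = time.perf_counter()

@app.after_request
def log_latency(response):
    latency_ms = (time.perf_counter() - g.start) * 1000.0
    print(json.dumps({
        "message": "request_finished",
        "path": request.path,
        "status": response.status_code,
        "latency_ms": round(latency_ms, 2),   # arbitrary field name for the extractor
    }), file=sys.stdout, flush=True)
    return response

@app.route("/")
def index():
    return "ok"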
A. Is it possible to use the Stackdriver monitoring service to set alerts at specific percentiles for HTTP response latencies?
If you simply want to consider network traffic, yes, it is possible. If you are using a load balancer, it's also possible to set alerts on that.
What you want to do should be pretty straightforward from the interface, and you can also find more info in the documentation.
If you want some more advanced metrics on top of Tomcat/Apache2 etc., you should check the list of metrics provided by the Stackdriver monitoring agent here.
B. Can I do this without editing the code of each server process?
Yes, there is no need to update any program; Stackdriver Monitoring works transparently and is able to fetch basic metrics from GCP VMs without the monitoring agent, including network traffic and CPU utilization.
C. Will installing the agent into the VM report HTTP latencies?
No, the agent shouldn't cause any HTTP latency.
I am curious to know what the model.deploy command actually does in the background when run in an AWS SageMaker notebook.
For example:
predictor = sagemaker_model.deploy(initial_instance_count=9,instance_type='ml.c5.xlarge')
Also, what is happening in the background during SageMaker endpoint autoscaling? It is taking too long, almost 10 minutes, to launch new instances, by which time most of the requests get dropped or not processed, and I'm also getting connection timeouts while load testing through JMeter. Is there any way to get a faster boot-up, or a golden-AMI kind of thing, in SageMaker?
Are there any other means by which this issue can be solved?
The docs mention what the deploy method does: https://sagemaker.readthedocs.io/en/stable/model.html#sagemaker.model.Model.deploy
You could also take a look at the source code here: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/model.py#L377
Essentially the deploy method hosts your model on a SageMaker Endpoint, launching the number of instances using the instance type that you specify. You can then invoke your model using a predictor: https://sagemaker.readthedocs.io/en/stable/predictors.html
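Once deploy() has created the endpoint, invoking it looks roughly like this (the serializer and payload depend entirely on your model; CSV here is just an illustration):

# Sketch: call the hosted model through the predictor returned by deploy().
from sagemaker.serializers import CSVSerializer

predictor.serializer = CSVSerializer()
result = predictor.predict([[0.5, 1.2, 3.4]])  # example feature vector
print(result)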
For autoscaling, you may want to consider lowering your threshold for scaling out so that the additional instances start to be launched earlier. This page offers some good advice on how to determine the RPS your endpoint can handle. Specifically, you may want to have a lower SAFETY_FACTOR to ensure new instances are provisioned in time to handle your expected traffic.
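If you set up the endpoint autoscaling yourself with boto3, the registration and target-tracking policy look roughly like this - the endpoint/variant names and target value are placeholders, and a lower target value (i.e. a lower SAFETY_FACTOR) makes scale-out kick in earlier:

# Sketch: register the endpoint variant with Application Auto Scaling and attach a
# target-tracking policy on invocations per instance. Names and values are placeholders.
import boto3

autoscaling = boto3.client("application-autoscaling")
resource_id = "endpoint/my-endpoint/variant/AllTraffic"  # placeholder

autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=9,
    MaxCapacity=20,
)

autoscaling.put_scaling_policy(
    PolicyName="invocations-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        # Lower target => new instances are requested sooner (bigger safety margin).
        "TargetValue": 70.0,  # placeholder invocations per instance per minute
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 300,
    },
)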