I have several GCP VMs running, and each has a crontab entry that queries a RESTful API every minute. For about two hours they were all doing this just fine, when all of a sudden, all of their packets/sec rates plummeted at the same time. They were each at about 1000 packets/sec when this happened. They are all still running, just much, much slower. Does Google Cloud limit this? I could not find anything in their documentation about such a limit.
I have been using Cloud Run on GCP for several years. I recently moved a service over from AWS Lambda, which cost me about 10 dollars a month, but on Cloud Run it went to 50 dollars per day. The Cloud Run service is simply a background worker that should ideally run for 3 minutes max. It loads, schedules, and summarizes jobs. If there are 50 jobs, it should run 50 background workers so it can burst, but Cloud Run complains that it cannot allocate instances quickly enough. Since I use a distributed lock, there should only ever be 50 workers running regardless.
I tried to figure out the cost breakdown, and in my experience GCP does not provide any useful default view. I click on the cost breakdown and it just shows me a graph of the monthly costs going from 50 dollars to 800 dollars per month. There is no actual breakdown. I'll try to implement labels, but in general, how do I determine the cost breakdown for Cloud Run? AWS Lambda is simple because it essentially reduces to execution costs, especially within the same VPC.
Besides labels, how do I determine the costs for Cloud Run? When I drill down further, is it the CPU allocation that accounts for the huge cost, in both Tier 2 and the standard tier?
Any suggestions or help would be great :)
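For what it's worth, the approach I'm planning to try for an actual breakdown is the Cloud Billing export to BigQuery, grouping the exported rows by SKU. A rough sketch of what I mean (the dataset/table name is a placeholder for my own export table, and it assumes the billing export is already enabled):

    # Rough sketch: query the Cloud Billing BigQuery export for Cloud Run costs
    # grouped by SKU. "my_billing.gcp_billing_export_v1_XXXXXX" is a placeholder
    # for the real export table name.
    from google.cloud import bigquery

    client = bigquery.Client()

    sql = """
    SELECT
      sku.description AS sku,
      SUM(cost) AS total_cost
    FROM `my_billing.gcp_billing_export_v1_XXXXXX`
    WHERE service.description = 'Cloud Run'
      AND usage_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
    GROUP BY sku
    ORDER BY total_cost DESC
    """

    for row in client.query(sql).result():
        print(f"{row.sku}: {row.total_cost:.2f}")

If that works, it should at least show whether the bulk of the cost is coming from the CPU allocation SKUs.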
I just tested Elastic Beanstalk, which seemed like the closest thing (just deploy Docker), but when I set it to scale from 0 to 5 instances, it correctly scales to 0 and then never scales back up from 0. I.e. it properly shuts down pre-production and staging environments on weekends and nights, but then it would never turn on again, unlike GCP Cloud Run.
The savings on GCP Cloud Run (because things shut down to 0) vs. AWS are insanely huge. Comparing with my previous company (25 people), GCP costs were about 50 bucks for the month, and currently, because nothing turns off, we are seeing 600 for approximately equivalent workloads.
As we scale and grow (more and more services), I see this only getting worse and worse, with more and more services that can't scale back to 0 when not in use :(. Even at Twitter, we wanted staging and testing environments scaled to 0 automatically when not in use, to save TONS of money.
For startups, AWS seems like a very bad cost model these days unless you are using Lambdas, which are functions only. Do they really have no Lambda equivalent for services like Cloud Run, or do they?
I hope I am wrong here and someone knows something, because once we scale up with more services, our costs are going to get worse across 3 environments. I do not want 3x the cost, since many of these should spin down automatically and come back up automatically.
EDIT: DevOps just tried AWS EKS, setting the minimum instance count to 0, but then just got timeouts as it never spun up instances. It seems it scales to 0 fine but can't scale back out of 0 like Cloud Run :(. That is odd, as many documents claim AWS EKS scales to 0, but IMHO you can't say that if you can't scale back out of 0 to 1, 2, 3 under load. I am very confused, then, and wondering if the configuration is wrong?
App Runner is nearly one-to-one with Cloud Run, but there is a GitHub issue requesting scale to 0. It is basically the same service, so thanks to Adi Dembark above!!!
I have a query that runs against a SQL Server instance and takes anywhere from 5 minutes to 75 minutes to complete. The response size is anywhere from a few rows to 1 GB of data. I have a Parquet writer that only has to wait until the query completes and the results are sent back, and it will then write the results to Google Cloud Storage.
What would be the best product to accomplish this, and is there one that would have roughly zero startup time? The two that came to mind for me were Cloud Functions and Cloud Run, but I've never used either.
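For reference, the worker itself is essentially the sketch below (the connection string, query, and bucket path are placeholders, and it assumes pyodbc, pandas, pyarrow, and gcsfs are available):

    # Simplified sketch of the worker: run the long SQL Server query, then write
    # the result set to GCS as Parquet. Connection string, query and bucket path
    # are placeholders.
    import pandas as pd
    import pyodbc

    def export_query_to_gcs(conn_str: str, query: str, gcs_uri: str) -> None:
        conn = pyodbc.connect(conn_str)
        try:
            # Blocks here until SQL Server finishes (anywhere from 5 to 75 minutes).
            df = pd.read_sql(query, conn)
        finally:
            conn.close()

        # pandas can write straight to gs:// when gcsfs is installed;
        # pyarrow is the Parquet engine.
        df.to_parquet(gcs_uri, engine="pyarrow", index=False)

    if __name__ == "__main__":
        export_query_to_gcs(
            conn_str="DRIVER={ODBC Driver 17 for SQL Server};SERVER=...;DATABASE=...",
            query="SELECT ...",  # the 5-75 minute query
            gcs_uri="gs://my-bucket/exports/result.parquet",
        )

So the question is really just about where this can run for up to 75 minutes without being killed.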
Neither service meets your requirement of 75 minutes.
Cloud Functions times out at 540 seconds.
Cloud Functions Time Limits
Cloud Run times out at 60 minutes.
Cloud Run Request Timeout
For that sort of runtime, I would launch a container on a Compute Engine instance running Container-Optimized OS.
Container-Optimized OS
There is the possibility of configuring Cloud Run's CPU allocation (disabling CPU throttling) so that you can run tasks in the background.
Run more workloads on Cloud Run with new CPU allocation controls
Note that you will then be paying for the service on a constant basis, as it is no longer running services (containers) only on demand.
I am using a Google Cloud instance for one of my websites.
But daily, at roughly the same time, my server goes down; the time only varies by 1-10 minutes at most from day to day.
When I check the monitoring, it shows me that disk throughput (write) is very high.
I changed the disk and am also using an N2-type machine now.
Waiting for Suggestions.
Thanks
In scenarios like this, usually an application running in your VM is consuming more resources than the VM has.
You could also review whether there is any peak at the same time in CPU utilization, or any peak in network traffic; this could point to HTTP requests overloading your VM.
As a short-term solution you could add more persistent disk and change the machine type to increase the disk I/O performance; for reference, you can review the article Optimizing persistent disk performance.
I'm running a Flask-RESTPlus API on Google Container Engine behind a TCP load balancer. The API makes calls to Google Cloud Datastore or Cloud SQL, but this does not seem to be the problem.
A few times a day, or even more often, there is a period of latency spikes. Restarting the pod solves this, or it resolves itself within 5 to 10 minutes. Of course this is too much and needs to be resolved.
Anyone knows what could be the problem or has experience with these kind of issues?
Thx
One thing you could try is monitoring your instance CPU load.
Although the latency doesn't correspond with usage spikes, it may be the case that there is a cumulative effect on CPU load, and the latency you're experiencing occurs when the CPU reaches a given % and needs to back off temporarily. If this is the case, you could make use of cluster autoscaling, or try running a higher-spec machine to see if that makes any difference. Or, if you have limited the CPU available to pods/containers, try increasing this limit.
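As a rough illustration of what I mean by monitoring the CPU load (a minimal sketch using psutil, assuming it is installed on the node; the 80% threshold is just an arbitrary example):

    # Minimal sketch: print overall CPU load once a second so it can be
    # correlated with the latency spikes. Assumes psutil is installed;
    # the 80% threshold is arbitrary.
    import psutil

    while True:
        cpu = psutil.cpu_percent(interval=1)   # % over the last second (blocks 1s)
        load1, load5, load15 = psutil.getloadavg()
        print(f"cpu={cpu:.1f}% load1={load1:.2f} load5={load5:.2f} load15={load15:.2f}")
        if cpu > 80:
            print("sustained CPU pressure - check whether this lines up with the latency")

Logging this alongside request latency should make any cumulative build-up visible.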
If you're confident CPU isn’t the cause of the issue, you could try to SSH into the affected instance when the issue is occurring, send a request through the load balancer and use tcpdump to analyse the traffic coming in and out. You may be able to spot if the latency stems from the load balancer (by monitoring the latency of HTTP traffic to the instance), or to Cloud Datastore or Cloud SQL (from the instance).
Alternatively, try using strace to monitor the relevant processes both before and during the latency, or dtrace to monitor the system as a whole.