Latency in calls to Google Cloud APIs from Cloud Run instance

Latency in calls to Google Cloud APIs from Cloud Run instance - google-cloud-platform

When I make the calls from Cloud Run instance to other cloud APIs for some reason there are huge delays in responses.
Everything works within 1 project.
Even from local machine the calls are much faster (couple of secs) - but deployed in the cloud it takes couple of mins for some requests to complete. As I see it is relevant for all APIs (apart from Firestore, Translate and TTS APIs as well). This is not related to cold starts for sure.
Code example (Node JS) and logs are below:
console.log('Received the request for stats');
const usersCollection = this.firestore.collection('users')
const snapshot = await this.usersCollection.get();
console.log('Fetched all users from Firestore');

Well after some further investigation I figured out what the problem was.
The thing is that all the operations I perform happen not before the response is sent but after (this is the way chatbot is architectured).
So the flow looks like this:
request to do smth - response 200 that the request is accepted
all the business logic and work
chatbot sends the message with the results
According to the docs the CPU is allocated only during the request processing by default so the only thing I had to change is to enable CPU allocation for background activities: https://cloud.google.com/run/docs/tips/general#background-activity

Related

AWS API Gateway Slowness

This has me perturbed. I have a basic API gateway that is supposed to be capped at 10,000 requests per second with 5,000 request bursts. However, when hooked up to Lambdas, best I can hit currently is ~70 requests / second.
The end-points I have are basic Lambda proxies created with Serverless framework (HTTP EDGE).
I know that the lambda itself is not the bottleneck as I have the same issue when I replace the lambda with an empty function. I have 100+ allocated concurrency for the lambda, but the lambda never appears to hit the limit.
functions:
loadtest:
handler: loadtest/index.handler
reservedConcurrency: 200
events:
- http: POST load_test
I'm wondering if there's something that I am overlooking here. My test runs for a minute and attempts to hit 200 req / sec (works fine with other so it's not my bandwidth). The delays grow to be as much as 20-30s at some point, so clearly something is choking up.
If it's a warm up issue - how long am I expected to run such load until everything is running warm?
Any ideas on where to look or additional information that I could share?
[Edit] I am using node12.x and I even tried with this code:
const AWS = require('aws-sdk');
AWS.config.update({region: '<my-region>'});
var sqs = new AWS.SQS({apiVersion: '2012-11-05'});
exports.handler = async (event, context) => {
return {"status":"ok", ... }
};
The results were basically the same. I'm not sure where the bottle neck is, to be honest. I can try further testing with concurrency on the lambda side, but going from 100 to 200 had no effect - the completed requests clocks at around 70/s for an empty function.
Also, I'm using loadtest npm package to perform the loadtest and this is what the output looks like:
{ totalRequests: 8200,
totalErrors: 0,
totalTimeSeconds: 120.00341689999999,
rps: 68,
meanLatencyMs: 39080.6,
maxLatencyMs: 78490,
minLatencyMs: 427,
percentiles: { '50': 38327, '90': 70684, '95': 74569, '99': 77679 },
errorCodes: {},
instanceIndex: 0 }
Here's a picture of how provisioned concurrency looked like over that period of time. I ran this over 2 minutes with the target at 200 req/sec.

Appears that this is actually an issue with WSL2 and NodeJS. The exact nature of it is still unclear, but it is not an issue with the API gateway itself. I demonstrated this by running it on MacBook and everything worked fine and request counts were high.
There are posts suggesting that this is an issue with the Node HTTP client & DNS, so perhaps that's a good starting point, but the above question is moot.

How to use Google Cloud PubSub and Run to handle resource-intensive long-running tasks?

I've got a Google Cloud PubSub topic which at times has thousands of messages and at times zero messages coming in. These messages represent tasks which can take upwards of an hour each. Preferably I'm able to use Cloud Run for this, as it scales really well to the demand, if a thousand messages gets published, I want 100s of Cloud Run instances to spin up. These Run instances get started by a push subscription. The problem is that PubSub has a 600 second timeout for the acknowledgement. This means in order to have Cloud Run process these messages they have to finish within 600 seconds. If they do not, PubSub times it out, and sends it again, causing the task to be restarted until the first task finally does acknowledge it (this causes the same task to be ran many times). Cloud Run acknowledges the messages by returning a 2** HTTP status code. The documentation states
When an application running on Cloud Run finishes handling a request, the container instance's access to CPU will be disabled or severely limited. Therefore, you should not start background threads or routines that run outside the scope of the request handlers.
So is it maybe possible to acknowledge a PubSub request through code and continue the processing, without having Google Cloud Run hand over the resources? Or is there a better solution I'm unaware of?
Because these processes are so code/resource-intensive, I feel Cloud Functions will not suffice. I've looked at https://cloud.google.com/solutions/using-cloud-pub-sub-long-running-tasks and https://cloud.google.com/blog/products/gcp/how-google-cloud-pubsub-supports-long-running-workloads. But these didn't answer my question.
I've looked at Google Cloud Tasks, which might be something? But the rest of the project has been built around PubSub/Run/Functions, so preferably I stick with that.
This project is written in Python.
So preferably I would like to write my Google Cloud Run tasks like this:
#app.route('/', methods=['POST'])
def index():
"""Endpoint for Google Cloud PubSub messages"""
pubsub_message = request.get_json()
logger.info(f'Received PubSub pubsub_message {pubsub_message}')
if message_incorrect(pubsub_message):
return "Invalid request", 400 #use normal NACK handling
# acknowledge message here without returning
# ...
# Do actual processing of the task here
# ...
So how can or should I solve this, so that the the resource-intensive tasks get properly scaled on demand ( so a push PubSub subscription ). And the tasks only get executed once.
Answers:
In short what has been answered. Cloud Run and Functions are just not suited for this problem. There is no way to have them do tasks that take longer than 9 or 15 minutes respectively. The only solution is to switch over to another Google Service and use a pull style subscription and lose out on auto-scaling of GC Run/Functions

Cloud Run on GKE can handle long process, more CPU and memory than available on managed platform. However, you have a GKE cluster always running and you loose the "pay-as-you-use" benefit.
If you want to use this solution, don't link directly PubSub push subscription to your Cloud Run on GKE. Use Cloud Task with HTTP job for this. The timeout is longer than PubSub (up to 24h instead of 10 min) and the retry policies are customizables.

Neither Cloud Functions nor Cloud Run is sufficient for arbitrarily long running operations. Cloud Functions has a hard cap of 9 minutes per invocation, and Cloud Run caps at 60. If you need more time, you're going to have to delegate the work to another product, such as Google Compute Engine. It should be possible to kick off some Compute Engine work from one of the serverless products.
Give the limits of pubsub acks, you'll probably have to find a way for a client to be able to poll or listen to some resource to find out when the work is actually done. You could use a database for that, and Cloud Firestore lets you listen to documents to find out when they change. So you could use that to track the status of your long-running work.

Alexa sent multiple request to AWS Lambda

I'm building the Alexa skill that sends the request to my web server,
then web server will do some process and upload a file to Amazon S3.
During the period of web server process, I make skill keep getting the file from Amazon S3 per 10 seconds till get the file. And the response is based on the file content.
But unfortunately, the web server process takes more than 1 minute. That means skill must stay more than 1 minute to get the file to response.
For now, I used progressive response with async await in my code,
and skill did keep waiting for the file on S3.
But I found that the skill will send the second request to Lambda after 50 seconds automatically. That means for the same skill, i got the two lambda function running at the same time.
And the execution result is : After the first response that progressive response made, 50 seconds later will hear another response that also made by the progressive response which belongs to the second request.
And nothing happened till the end.
I know it is bad to let skill waits this long, but i still want to figure out the executable way if skill needs to wait this long.
There are some points I want to figure out.
Is there anyway to prevent the skill to send the second
requests to Lambda?
Is there another way I can try to accomplish the goal?
Thanks

Eventually, I found that the second invoke of Lambda is not from Alexa, is from AWS Lambda itself. Refer to the following artical
https://cloudonaut.io/your-lambda-function-might-execute-twice-deal-with-it/
So you have to deal with this kind of situation in your Lambda code. One thing can be used is these two times invoke's request id is the same. So you can tell if this is the first time execution by checking your storage for the same request id which you store at the first time execution.
Besides, I also found that once the Alexa Skill waits for more than 1 minutes, it will crash and return the error by speaking (test by Amazon Echo). And there is nothing different in the AWS Lambda log compare to the normal execution one. That meaning the Log seems to be fine but actually the execution result is not.
Hope this can help someone is also struggled at this problem.

API Management - Response Time

We are working on setting up an API Management portal for one of our Web API. We are using eventhubs for logging the events and we are transferring the event messages to Azure Blob storage using Azure functions.
We would like to know how can we find the Time taken by API Management portal for providing the response for a message (we are capturing the time taken at the back end api layer but not from the API Management layer).
Regards,
John

The simpler solution is to enable Azure Monitor Diagnostic Logs for the Apimanagement service. You will get raw logs for each request including
durationMs - interval between receiving request line and headers from a client and writing last chunk of response body to a client. All writes and reads include network latency.
BackendTime - time spent waiting on backend response
ClientTime - time spent with client for request and response
CacheTime - time spent on fetching from cache
You can also refer this video.

Not the correct way of doing this, but still get an idea of how much time each request is taking. We can actually use the context variable to set the start time in the inbound policy node and then calculate the end time in the outbound node.

AWS API Gateway Cache - Multiple service hits with burst of calls

I am working on a mobile app that will broadcast a push message to hundreds of thousands of devices at a time. When each user opens their app from the push message, the app will hit our API for data. The API resource will be identical for each user of this push.
Now let's assume that all 500,000 users open their app at the same time. API Gateway will get 500,000 identical calls.
Because all 500,000 nearly concurrent requests are asking for the same data, I want to cache it. But keep in mind that it takes about 2 seconds to compute the requested value.
What I want to happen
I want API Gateway to see that the data is not in the cache, let the first call through to my backend service while the other requests are held in queue, populate the cache from the first call, and then respond to the other 499,999 requests using the cached data.
What is (seems to be) happening
API Gateway, seeing that there is no cached value, is sending every one of the 500,000 requests to the backend service! So I will be recomputing the value with some complex db query way more times than resources will allow. This happens because the last call comes into API Gateway before the first call has populated the cache.
Is there any way I can get this behavior?
I know that based on my example that perhaps I could prime the cache by invoking the API call myself just before broadcasting the bulk push job, but the actual use-case is slightly more complicated than my simplified example. But rest assured, solving this simplified use-case will solve what I am trying to do.

If you anticipate that kind of burst concurrency, priming the cache yourself is certainly the best option. Have you also considered adding throttling to the stage/method to protect your backend from a large surge in traffic? Clients could be instructed to retry on throttles and they would eventually get a response.
I'll bring your feedback and proposed solution to the team and put it on our backlog.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js