Throttle down GCP DataFlow? - google-cloud-platform

Using the standard GCP provided Storage/text file to PubSub DataFlow template but although I have set #workernodes eq 1 the thruput of messages processed is "to high" for downstream components.
CloudFunction that runs on message event in Pub/Sub hits GCP quotas and with CloudRun I get a bunch of 500, 429 and 503 errors in the beginning (due to to step burst rate).
Is there any way to control the processing rate of DataFlow? Need to get a softer/slower start so downstream components have time to scale up.
Anyone?

You can use Stateful ParDo's to achieve this where in you can buffer events in batches and make an API call with all the keys at once. This is very nicely explained with code snippets here

Related

Metric for Number of unacknowledged messages older than 20 minutes

I am trying to set up alerts on pubsub in gcp that monitor the number of old messages in a queue. Specifically the number of unacknowledged messages older than 20 minutes.
I want an alert that because number of unacknowledged messages cloud shoot high on sudden push of hugh number of messages. And using only the oldest unacknowledged message will run the alert for outlier messages that might stuck in the queue (ex bad formatted messages etc..)
I've tried to combine both metrics but could not know how to filter on one of them.
fetch pubsub_subscription |
{
t_0: metric 'pubsub.googleapis.com/subscription/num_undelivered_messages';
t_1: metric 'pubsub.googleapis.com/subscription/oldest_unacked_message_age'
}
| outer_join 0 # how to filter now on oldest_unacked_message_age > 20 minutes and select num_undelivered_messages
Also I think this won't work as my understanding of cloud pubsub metrics because each metric is a single time series number. It does not have information about individual messages (correct me if I am wrong).
Also I've tried to look for a metic that have them both but can't find one as well.
You can deploy an alert of undelivered messages in Google Cloud Monitoring. You will find the Pub/Sub subscription resource type, and then you can set a filter based on the response_code. Also, you can create a new chart based on your needs.

Cloud Run: 429: The request was aborted because there was no available instance

We (as a company) experience large spikes every day. We use Pub/Sub -> Cloud Run combination.
The issue we experience is that when high traffic hits, Pub/Sub tries to push messages to Cloud/Run all at the same time without any flow control. The result?
429: The request was aborted because there was no available instance.
Although this is marked as a warning, every 4xx HTTP response results in the message retry delivery.
Messages, therefore, come back to the queue and wait. If a message repeats this process and the instances are still taken, Cloud Run returns 429 again, and the message is sent back to the queue. This process repeats x times (depends on what value we set in Maximum delivery attempts). After that, the message goes to the dead-letter queue.
We want to avoid this and ideally don't get any 429, so the message won't travel back and forth, and it won't end up in the dead-letter subscription because it is not one of the application errors we want to keep there, but rather a warning caused by Pub/Sub not controlling the flow and coordinating with Cloud Run.
Neither Pub/Sub nor a push subscription (which is required to use for Cloud Run) have any flow control feature.
Is there any way to control how many messages are sent to Cloud Run to avoid getting the 429 response? And also, why does Pub/Sub even try to deliver when it is obvious that Cloud Run hit the limit of instances. The best would be to keep the messages in a queue until the instances free up.
Most of the answers would probably suggest increasing the limit of instances. We already set 1000. This would not be scalable because even if we set the limit to 1500 and a huge spike comes, we would pass the limit and get the 429 messages again.
The only option I can think of is some flow control. So far, we have read about Cloud Tasks, but we are not sure if this can help us. Ideally, we don't want to introduce any new service, but if necessary, we will do.
Thank you for all your tips and time! :)

How to do "live request batching" in gcloud

Here is my situation:
I have a rather slow tensorflow model that runs on GPU (2 to 3 seconds per prediction)
A prediction for a single 'entity' vs a prediction for 8 'entities' takes about the same time
This means I could be 8 times as efficient by simply combining multiple predictions in the same request
I have a service on AI platform serving requests to that model
The service works for slow request rates but has trouble scaling up (anything over 4 QPS is too much to handle)
My question then is:
Is there a standard way / best practice for batching live client requests:
When receiving a request, wait a little bit for other requests
After a while, or when the number of requests reaches a set number, forward the requests in a single "batch" to another service.
If traffic is low, the delay will expire before the batch is full, but since traffic is low, that's not an issue
If traffic is high, the batch will be full before the delay, and the client will have to wait less
I have an almost-working solution with app-engine + firebase (for hosting the shared 'queue') but implementing the delay is giving me trouble (app engine doesn't seem to like python's threading.Timer
I'd appreciate something that could work with app engine, but at this point I'm open to any suggestions (as long as it is applicable on google cloud).
Thanks!
The perfect (but not the cheapest) is to use Dataflow.
When a prediction request comes in, publish it in PubSub
Deploy a dataflow in streaming mode, with fixed windows of X minutes, and another trigger, not accumulated, after Y event in the window.
When a window trigger is performed (either on the number of messages or on the timer) do the batch processing
You can imagine other designs, simpler/cheaper.
Still publish the prediction requests in PubSub
You can schedule a Cloud Functions, or a Cloud Run every X minutes to pull the pubsub subscription and then to trigger the batch job. But, it's a fixed time.
When you publish the message in PubSub, you can also store, in firestore for example, and increase a counter and the date of the 1st message published in PubSub.
If the number of message is above your threshold, perform a request to your other process that pull the PubSub subscription and run the batch processing (as before #1). Reset the counter value and the message date value
Set up a cloud scheduler which check, every minute, the value of the 1st message date in Firestore. If it's above your time limit, perform a request to your other process that pull the PubSub subscription and run the batch processing (as before #1). Reset the counter value and the message date value
The #2 will generate a lot of Firestore read/write, but will be cheaper than dataflow.

Consuming messages from Google Pubsub and publishing it to Kafka

I am trying to consume Google PubSub messages using synchronous PULL API. This is available in Apache Beam Google PubSub IO connector library.
I want to write the consumed messages to Kafka using KafkaIO. I want to use FlinkRunner to execute the job, since we run this application outside GCP.
The problem I am facing is that the consumed messages are not getting ACK'd in GCP PubSub. I have confirmed that the local Kafka instance has the messages consumed from GCP PubSub. The documentation in GCP DataFlow indicates that the data bundle gets finalized when the pipeline is terminated with a data sink, which is Kafka in my case.
But since code is running in Apache Flink and not GCP DataFlow, I think some sort of callback is not getting fired related to ACK'ing the committed message.
What am I doing wrong here?
pipeline
.apply("Read GCP PubSub Messages", PubsubIO.readStrings()
.fromSubscription(subscription)
)
.apply(ParseJsons.of(User.class))
.setCoder(SerializableCoder.of(User.class))
.apply("Filter-1", ParDo.of(new FilterTextFn()))
.apply(AsJsons.of(User.class).withMapper(new ObjectMapper()))
.apply("Write to Local Kafka",
KafkaIO.<Void,String>write()
.withBootstrapServers("127.0.0.1:9092,127.0.0.1:9093,127.0.0.1:9094")
.withTopic("test-topic")
.withValueSerializer((StringSerializer.class))
.values()
);
In the Beam documentation on the PubSub IO class it's mentioned this:
Checkpoints are used both to ACK received messages back to Pubsub (so that they may be retired on the Pubsub end), and to NACK already consumed messages should a checkpoint need to be restored (so that Pubsub will resend those messages promptly).
The ACK are not linked to Dataflow, you should have the same behavior on dataflow. The ack are sent on Checkpoints. Usually the Checkpoints are the windows that you set on your stream flow.
But, you didn't set window! By default, the windows is global, and it closed only at the end, if you stop gracefully your job (and even, I'm not sure about this). Anyway, a better solution is to have fixed windows (for example of 5 minutes) to ack the messages on each of these windows.
The way I fixed this solution was by using Guillaume Blaquiere's (https://stackoverflow.com/users/11372593/guillaume-blaquiere) suggestion of looking at Checkpoints. Even after adding the Window.into() function in the pipeline, the source PubSub subscription endpoint did not receive ACKs.
The problem was in the Flink server configuration I had failed to mention checkpoint configuration. Without these parameters, checkpoints are disabled.
state.backend: rocksdb
state.checkpoints.dir: file:///tmp/flink-1.9.3/state/checkpoints/
These configs should go in the flink_home/conf/flink-conf.yaml.
After adding these entries and restarting flink. All the backlogged (unack'd messages) went to 0 in the GCP pubsub monitoring chart.

How to use Google Cloud PubSub and Run to handle resource-intensive long-running tasks?

I've got a Google Cloud PubSub topic which at times has thousands of messages and at times zero messages coming in. These messages represent tasks which can take upwards of an hour each. Preferably I'm able to use Cloud Run for this, as it scales really well to the demand, if a thousand messages gets published, I want 100s of Cloud Run instances to spin up. These Run instances get started by a push subscription. The problem is that PubSub has a 600 second timeout for the acknowledgement. This means in order to have Cloud Run process these messages they have to finish within 600 seconds. If they do not, PubSub times it out, and sends it again, causing the task to be restarted until the first task finally does acknowledge it (this causes the same task to be ran many times). Cloud Run acknowledges the messages by returning a 2** HTTP status code. The documentation states
When an application running on Cloud Run finishes handling a request, the container instance's access to CPU will be disabled or severely limited. Therefore, you should not start background threads or routines that run outside the scope of the request handlers.
So is it maybe possible to acknowledge a PubSub request through code and continue the processing, without having Google Cloud Run hand over the resources? Or is there a better solution I'm unaware of?
Because these processes are so code/resource-intensive, I feel Cloud Functions will not suffice. I've looked at https://cloud.google.com/solutions/using-cloud-pub-sub-long-running-tasks and https://cloud.google.com/blog/products/gcp/how-google-cloud-pubsub-supports-long-running-workloads. But these didn't answer my question.
I've looked at Google Cloud Tasks, which might be something? But the rest of the project has been built around PubSub/Run/Functions, so preferably I stick with that.
This project is written in Python.
So preferably I would like to write my Google Cloud Run tasks like this:
#app.route('/', methods=['POST'])
def index():
"""Endpoint for Google Cloud PubSub messages"""
pubsub_message = request.get_json()
logger.info(f'Received PubSub pubsub_message {pubsub_message}')
if message_incorrect(pubsub_message):
return "Invalid request", 400 #use normal NACK handling
# acknowledge message here without returning
# ...
# Do actual processing of the task here
# ...
So how can or should I solve this, so that the the resource-intensive tasks get properly scaled on demand ( so a push PubSub subscription ). And the tasks only get executed once.
Answers:
In short what has been answered. Cloud Run and Functions are just not suited for this problem. There is no way to have them do tasks that take longer than 9 or 15 minutes respectively. The only solution is to switch over to another Google Service and use a pull style subscription and lose out on auto-scaling of GC Run/Functions
Cloud Run on GKE can handle long process, more CPU and memory than available on managed platform. However, you have a GKE cluster always running and you loose the "pay-as-you-use" benefit.
If you want to use this solution, don't link directly PubSub push subscription to your Cloud Run on GKE. Use Cloud Task with HTTP job for this. The timeout is longer than PubSub (up to 24h instead of 10 min) and the retry policies are customizables.
Neither Cloud Functions nor Cloud Run is sufficient for arbitrarily long running operations. Cloud Functions has a hard cap of 9 minutes per invocation, and Cloud Run caps at 60. If you need more time, you're going to have to delegate the work to another product, such as Google Compute Engine. It should be possible to kick off some Compute Engine work from one of the serverless products.
Give the limits of pubsub acks, you'll probably have to find a way for a client to be able to poll or listen to some resource to find out when the work is actually done. You could use a database for that, and Cloud Firestore lets you listen to documents to find out when they change. So you could use that to track the status of your long-running work.