Cloud Function Cannot Get Correct Pub/Sub Message - google-cloud-platform

I am setting up a Pub/Sub trigger-based Cloud Function in GCP; the topic the Cloud Function listens to is us-pubsub1. When I deployed the Cloud Function and used the testing panel to send messages like:
{"index": 123,
"video_name": "test.mp4"}
it processed the message with the keys index and video_name without issue. But when I sent a real message to us-pubsub1 to trigger the Cloud Function, it always failed because it could not find 'index' in the message body. When reading the Pub/Sub message in the Cloud Function, it returned messages like:
{'@type': 'type.googleapis.com/google.pubsub.v1.PubsubMessage',
'attributes': None, 'data':
'eyJzZXNzaW9uX2lkIjogImUzYjM0MTJiLWQxNWUtNDM5My05YjEyLWI3ZGY1ZGE4MTQ0NCIsICJzZXNzaW9uX25hbWUiOiAiU0VTU0lPTl9DMjIjMwMTI1VDIwMTI1NCIsICJzaXRlX25hbWUiOiAiVkEgTUVESUNBTCBDRU5URVIgLSBQQUxPIEFMVE8gLSc3RlY3RvbXkiLCAiaHViX3NlcmlhbF9udW1iZXIiOiAiQzIyNC0wMDIwNSIsICJjYXN0X2FwcF92ZXJzaW9uIjogIjEyLjAuMzIuNyIsICJkdl9zeXN0ZW0iOiAiVUFUU0syMDA3IiwgImludGVybmFsX2tleSI6ICJtNjQzY2U1NS0zNjNiLQ=='}
I checked that the messages arrive in us-pubsub1 correctly; they just fail to process in the Cloud Function.
Is there anything I have missed for fetching the message body from a real Cloud Function invocation?

It's normal. You have a Pub/Sub message envelope, and your content is base64 encoded in the data field. See the documentation for details.
FWIW, here is the real content of your sample:
{"session_id": "e3b3412b-d15e-4393-9b12-b7df5da81444", "session_name": "SESSION_C22#3#UC##SB"'6FUR#%dTD44TDU"Dstectomy", "hub_serial_number": "C224-00205", "cast_app_version": "12.0.32.7", "dv_system": "UATSK2007", "internal_key": "m643ce55-363b-
It is truncated; you might not have shared the whole content ;)
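A minimal sketch of the decoding step, assuming a Python Cloud Function with a Pub/Sub trigger (the event/context signature of the background-function runtime):

import base64
import json

def handle_pubsub(event, context):
    # The envelope's 'data' field is base64 encoded; decode it first,
    # then parse the JSON payload the publisher sent.
    payload = json.loads(base64.b64decode(event['data']).decode('utf-8'))
    index = payload['index']            # now available as a normal dict key
    video_name = payload['video_name']
    print(index, video_name)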

Related

Not getting entire message from Pub/Sub while pulling

I am publishing a message to Pub/Sub using the gcloud command from the Cloud Shell like so:
gcloud pubsub topics publish <<some_topic>> \
--message={"ride_id":"3bdc2294-86a5-4f45-bb28-885d3a4c2ada","point_idx":1185,"latitude":40.76384,"longitude":-73.89548,"timestamp":"2022-02-10T02:24:06.11629-05:00","meter_reading":27.502796,"meter_increment":0.02320911,"ride_status":"enroute","passenger_count":3}
Now when I pull the messages from the consumer process using a subscription to the topic, I get a base64-encoded string for the Pub/Sub message (normal BAU). But after I decode the message to UTF-8 it comes out as passenger_count:3
which is only a truncated version of the entire message. Any explanation of this Pub/Sub behavior would be very helpful, as well as a possible fix/workaround for this problem.
I am consuming the message with a Cloud Function having Pub/Sub trigger. The code looks something like below:
import base64

def subscribe_topic(event, context):
    # some code
    message = base64.b64decode(event['data']).decode('utf-8')
    print(message)
    # some more code
The subscribe_topic() function serves as the entrypoint to my CF. When I print the message it gets reflected in the CF logs, where I am able to see the truncated message instead of the entire one.
More on CF with Pub/Sub triggers here
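For what it's worth, the truncation is consistent with shell brace expansion rather than Pub/Sub itself: an unquoted --message={...} containing top-level commas is split by bash into several --message arguments, and gcloud keeps only the last one ("passenger_count":3), which matches the observed output. Quoting the JSON value should fix the gcloud call. Alternatively, here is a hedged sketch publishing the same payload from Python (project and topic names are placeholders, and the payload is trimmed to a few of the original fields), which side-steps shell quoting entirely:

import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
# Assumption: replace with your real project ID and topic name.
topic_path = publisher.topic_path("my-project", "some_topic")

payload = {
    "ride_id": "3bdc2294-86a5-4f45-bb28-885d3a4c2ada",
    "point_idx": 1185,
    "ride_status": "enroute",
    "passenger_count": 3,
}
future = publisher.publish(topic_path, json.dumps(payload).encode("utf-8"))
print(future.result())  # blocks until the server returns the message ID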

Move Google Pub/Sub Messages Between Topics

How can I bulk move messages from one topic to another in GCP Pub/Sub?
I am aware of the Dataflow templates that provide this; however, unfortunately, restrictions do not allow me to use the Dataflow API.
Any suggestions on ad-hoc movement of messages between topics (besides copying and pasting them one by one)?
Specifically, the use case is for moving messages in a deadletter topic back into the original topic for reprocessing.
You can't use snapshots, because snapshots can only be applied to subscriptions of the same topic (to avoid message ID overlapping).
The easiest way is to write a function that pulls from your subscription. Here is how I would do it:
Create a topic (named, for example, "transfer-topic") with a push subscription. Set the timeout to 10 minutes.
Create an HTTP Cloud Function triggered by the Pub/Sub push subscription (or a Cloud Run service). When you deploy it, set the timeout to 9 minutes for Cloud Functions and to 10 minutes for Cloud Run. The processing is the following:
Read a chunk of messages (for example, 1000) from the dead-letter pull subscription
Publish the messages (in bulk mode) into the initial topic
Acknowledge the messages on the dead-letter subscription
Repeat until the pull subscription is empty
Return code 200.
The global process:
Publish a message to the transfer-topic
The message triggers the function/Cloud Run service with an HTTP push
The process pulls the messages and republishes them into the initial topic
If the timeout is reached, the function crashes and Pub/Sub retries the HTTP request (with an exponential backoff)
If all the messages are processed, the HTTP 200 response code is returned and the process stops (and the message in the transfer-topic subscription is acked)
This process allows you to process a very large number of messages without worrying about the timeout; a minimal sketch follows.
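A minimal sketch of that drain loop, assuming google-cloud-pubsub >= 2.0 and hypothetical names for the project, dead-letter subscription, and destination topic:

from google.api_core.exceptions import DeadlineExceeded
from google.cloud import pubsub_v1

PROJECT = "my-project"             # assumption: your project ID
DEADLETTER_SUB = "deadletter-sub"  # assumption: your dead-letter pull subscription
TARGET_TOPIC = "original-topic"    # assumption: the topic to replay into

subscriber = pubsub_v1.SubscriberClient()
publisher = pubsub_v1.PublisherClient()
sub_path = subscriber.subscription_path(PROJECT, DEADLETTER_SUB)
topic_path = publisher.topic_path(PROJECT, TARGET_TOPIC)

def drain(request):
    # HTTP entrypoint: pull, republish, ack, repeat until empty, then return 200.
    while True:
        try:
            response = subscriber.pull(
                request={"subscription": sub_path, "max_messages": 1000},
                timeout=30,
            )
        except DeadlineExceeded:
            break  # nothing left to pull
        if not response.received_messages:
            break
        # Republish each message (with its attributes) into the original topic.
        futures = [
            publisher.publish(topic_path, msg.message.data,
                              **dict(msg.message.attributes))
            for msg in response.received_messages
        ]
        for future in futures:
            future.result()  # never ack a message we failed to republish
        # Ack only after a successful republish.
        subscriber.acknowledge(
            request={
                "subscription": sub_path,
                "ack_ids": [m.ack_id for m in response.received_messages],
            }
        )
    return "OK", 200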
I suggest that you use a Python script for that.
You can use the Pub/Sub client library to read the messages and publish them to another topic like below:
# Note: this uses the pre-2.0 google-cloud-pubsub API
# (positional arguments to pull/acknowledge).
from google.cloud import pubsub
from google.cloud.pubsub import types

# Defining parameters
PROJECT = "<your_project_id>"
SUBSCRIPTION = "<your_current_subscription_name>"
NEW_TOPIC = "projects/<your_project_id>/topics/<your_new_topic_name>"

# Creating clients for publishing and subscribing. Adjust the max_messages for your purpose
subscriber = pubsub.SubscriberClient()
publisher = pubsub.PublisherClient(
    batch_settings=types.BatchSettings(max_messages=500),
)

# Get your messages. Adjust the max_messages for your purpose
subscription_path = subscriber.subscription_path(PROJECT, SUBSCRIPTION)
response = subscriber.pull(subscription_path, max_messages=500)

# Publish your messages to the new topic
for msg in response.received_messages:
    publisher.publish(NEW_TOPIC, msg.message.data)

# Ack the old subscription if necessary
ack_ids = [msg.ack_id for msg in response.received_messages]
subscriber.acknowledge(subscription_path, ack_ids)
Before running this code you will need to install the Pub/Sub client library in your Python environment. You can do that by running pip install google-cloud-pubsub.
An approach to executing your code is using Cloud Functions. If you decide to use it, pay attention to two points:
The maximum time that your function can take to run is 9 minutes. If this timeout is exceeded, your function will terminate without finishing the job.
In Cloud Functions you can just put google-cloud-pubsub on a new line of your requirements file instead of running a pip command.

Is there a way to be notified of status changes in Google AI Platform training jobs without polling the REST API?

Right now I monitor my submitted jobs on Google AI Platform (formerly ML Engine) by polling the job REST API. I don't like this solution for a few reasons:
Awareness of status changes is often delayed or missed altogether if the interval between status changes is smaller than the monitoring polling rate
Lots of unnecessary network traffic
Lots of unnecessary function invocations
I would like to be notified as soon as my training jobs complete. It'd be great if there were some way to assign hooks or callbacks to run when the job status changes.
I've also considered adding calls to Cloud Functions directly within the training task Python package that runs on AI Platform. However, I don't think those function calls will occur in cases where the training job is shut down unexpectedly, such as when a job is cancelled or forced to end by GCP.
Is there a better way to go about this?
You can use a Stackdriver sink to read the logs and send them to Pub/Sub. From Pub/Sub, you can connect to a bunch of other providers:
1. Set up a Pub/Sub sink
Make sure you have access to the logs and publish rights to the topic you desire before you get started. Follow the instructions for setting up a Stackdriver -> Pub/Sub sink. You’ll want to use this query to limit the events only to Training jobs:
resource.type = "ml_job"
resource.labels.task_name = "service"
Note that you can narrow the query further in Stackdriver. For example, you can limit to a particular job by adding a condition like resource.labels.job_id = "..." or to a certain event with a filter like jsonPayload.message : "..."
2. Respond to the Pub/Sub message
In order to tell what changed, the recipient of the Pub/Sub message can either query the job status from the ml.googleapis.com API or read the text of the message.
Reading state from ml.googleapis.com
When you receive the message, make a call to https://ml.googleapis.com/v1/projects/<project_id>/jobs/<job_id> to get the Job information, replacing <project_id> and <job_id> in the URL with the values of resource.labels.project_id and resource.labels.job_id from the Pub/Sub message, respectively.
The returned Job object contains a field state that, naturally, tells the status of the job.
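A hedged sketch of that lookup using the Google API client library (assuming google-api-python-client is installed and application-default credentials are available):

from googleapiclient import discovery

def get_job_state(project_id, job_id):
    # Build a client for the AI Platform Training (ml.googleapis.com) v1 API.
    ml = discovery.build("ml", "v1")
    name = f"projects/{project_id}/jobs/{job_id}"
    job = ml.projects().jobs().get(name=name).execute()
    return job["state"]  # e.g. SUCCEEDED, FAILED, CANCELLED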
Reading state from the message text
The Pub/Sub message will contain a string telling what happened to the job. You probably want to trigger behavior when the job ends. Look for these strings in jsonPayload.message:
"Job completed successfully."
"Job cancelled."
"Job failed."
I implemented a Terraform module as @htappen said. I'd be happy if it helps you. But my real hope is that Google updates AI Platform with the same feature.
https://github.com/sfujiwara/terraform-google-ai-platform-notification
I think you can programmatically publish a Pub/Sub message at the end of your training job code. Something like this:
import json
from google.cloud import pubsub_v1

# publish job complete message
client = pubsub_v1.PublisherClient()
topic = client.topic_path(args.gcp_project_id, 'topic-name')
data = {
    'ACTION': 'JOB_COMPLETE',
    'SAVED_MODEL_DIR': args.job_dir
}
data_bytes = json.dumps(data).encode('utf-8')
client.publish(topic, data_bytes)
Then you can set up a Cloud Function to be triggered by the same Pub/Sub topic.
You can work around the lack of a callback from the service on a custom TF training job by adding a LambdaCallback to the fit() call. In the on_epoch_end hook you can send yourself a notification on job progress, and in on_train_end a notification when it finishes.
https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/LambdaCallback
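A minimal sketch of that approach, assuming a Keras model and a hypothetical notify() helper standing in for your notification mechanism:

import tensorflow as tf

def notify(text):
    # Hypothetical helper: replace with email, a Pub/Sub publish, Slack, etc.
    print(text)

progress_callback = tf.keras.callbacks.LambdaCallback(
    on_epoch_end=lambda epoch, logs: notify(f"epoch {epoch} done: {logs}"),
    on_train_end=lambda logs: notify("training finished"),
)

# model.fit(x, y, epochs=10, callbacks=[progress_callback])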

Lambda.FunctionError In my elasticsearch service log

I have an AWS Lambda function that connects to Kinesis Data Firehose delivery streams. In my logs the Lambda function is executing perfectly and returning the data I want.
On my Kinesis Data Firehose delivery streams dashboard, in the monitoring section, it looks like I am getting incoming bytes and incoming records, since there is data in those graphs. There is also data in the ExecuteProcessing Duration graph, but the ExecuteProcessing Success graph shows a line at 0, so I am guessing it is failing.
In the Elasticsearch logs I am getting a Lambda.FunctionError with a message that says: The Lambda function was successfully invoked but it returned an error result.
I am new to working with AWS and I am having trouble debugging this error code. Any help is appreciated.
The first thing that you must check is the return value of the function. Remember that you must return an array with the same number of records, each with the following structure:
import base64
import json

def lambda_handler(event, context):
    output = []
    for record in event['records']:
        payload = json.loads(base64.b64decode(record['data']))
        output_record = {
            'recordId': record['recordId'],
            'result': 'Ok',
            'data': base64.b64encode(json.dumps(payload).encode('utf-8')).decode('utf-8')
        }
        output.append(output_record)
    return {'records': output}
If you forget to return this array, you will get this error message.
If the Lambda function is returning an error, then you should be able to find more information in the CloudWatch logs for the function in question. These docs describe the different ways to access the logs. If the logs don't provide enough information, you might consider altering the function to write more information to stdout or stderr.

Message lost and duplicates in GCP Pubsub

I'm running into an issue reading GCP Pub/Sub from Dataflow: when publishing a large number of messages in a short period of time, Dataflow will receive most of the sent messages, except some messages will be lost and some other messages will be duplicated. And the weirdest part is that the number of lost messages is exactly the same as the number of messages being duplicated.
In one of the examples, I send 4,000 messages in 5 sec, and in total 4,000 messages were received, but 9 messages were lost, and exactly 9 messages were duplicated.
The way I determine the duplicates is via logging. I'm logging every message that is published to Pub/Sub along with the message ID generated by Pub/Sub. I'm also logging the message right after reading it from PubsubIO in a ParDo transformation.
The way I read from Pub/Sub in Dataflow is with org.apache.beam.sdk.io.PubsubIO:
public interface Options extends GcpOptions, DataflowPipelineOptions {

    // PUBSUB URL
    @Description("Pubsub URL")
    @Default.String("https://pubsub.googleapis.com")
    String getPubsubRootUrl();
    void setPubsubRootUrl(String value);

    // TOPIC
    @Description("Topic")
    @Default.String("projects/test-project/topics/test_topic")
    String getTopic();
    void setTopic(String value);

    ...
}
public static void main(String[] args) {
    Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
    options.setStreaming(true);
    options.setRunner(DataflowRunner.class);
    ...

    Pipeline pipeline = Pipeline.create(options);

    pipeline.apply(PubsubIO
            .<String>read()
            .topic(options.getTopic())
            .withCoder(StringUtf8Coder.of())
        )
        .apply("Logging data coming out of Pubsub", ParDo
            .of(some_logging_transformation)
        )
        .apply("Saving data into db", ParDo
            .of(some_output_transformation)
        );

    pipeline.run().waitUntilFinish();
}
Is this a known issue in Pub/Sub or PubsubIO?
UPDATE:
Tried 4000 requests with the Pub/Sub emulator: no missing data and no duplicates.
UPDATE #2:
I went through some more experiments and found that the duplicated messages are taking the message_id from the missing ones. Because the direction of the issue has diverted quite a bit from its origin, I decided to post another question with detailed logs as well as the code I used to publish and receive messages.
link to the new question: Google Cloud Pubsub Data lost
I talked with a Google engineer from the Pub/Sub team. It seems to be caused by a thread-safety issue with the Python client. Please refer to the accepted answer on Google Cloud Pubsub Data lost for the response from Google.