I'm a newbie in GCP.
While I'm reading the document of google speech api, it says that "Asynchronous Recognition (REST and gRPC) sends audio data to the Speech API and initiates a Long Running Operation. Using this operation, you can periodically poll for recognition results."
But what does "a Long Running Operation" actually mean? And what's the difference between the processes of synchronous and asynchronous recognition?
I've searched on the internet and found an answer about this: https://www.quora.com/What-is-the-difference-between-synchronous-and-asynchronous-speech-recognition
But I still can't get the idea. Can anyone explain more specifically?
I'd really appreciate your answers :)
Asynchronous cloud requests usually return an ID indicating that the request has been enqueued for processing; later, you can use that ID to check on the status and retrieve the results when done.
Synchronous requests return the results as part of the response, but they may block for longer amounts of time.
You can use the gcloud command line tool to try both. Sync requests work for audio shorter than 60 seconds:
gcloud ml speech recognize AUDIO_FILE ...
and async for audio that is longer than 60 seconds:
gcloud ml speech recognize-long-running AUDIO_FILE ...
The latter, instead of a transcript, will return an OPERATION_ID, for which you can later run
gcloud ml speech operations describe OPERATION_ID
to obtain the results.
TIP: You can add the --log-http flag to see what API requests gcloud is making, to get more insight into what is going on at the API level.
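If you are calling the API from code rather than gcloud, the same pattern applies: the call returns an operation handle that you poll for the result. A minimal sketch with the Python client library (google-cloud-speech), where the GCS URI and audio settings are placeholder assumptions:

from google.cloud import speech

client = speech.SpeechClient()

# hypothetical audio file; replace with your own GCS URI
audio = speech.RecognitionAudio(uri="gs://my-bucket/audio.flac")
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
    sample_rate_hertz=16000,
    language_code="en-US",
)

# returns immediately with an operation handle, like the OPERATION_ID above
operation = client.long_running_recognize(config=config, audio=audio)

# result() polls the operation under the hood and blocks until it completes
response = operation.result(timeout=600)
for result in response.results:
    print(result.alternatives[0].transcript)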
Suppose we have a web service written in Python that does some time-consuming file processing. It definitely should not be run inside the HTTP handler, as it takes up to 10 minutes to complete. Instead, the processing should be done asynchronously by some sort of workers, and it would also be nice to report the progress of the task execution to display to the user.
Would it be a good idea to set up Google Cloud Tasks with some Cloud Run or Cloud Functions service as an HTTP target to do this work?
Is Google Cloud Tasks suitable for handling this type of async task, where the user is sitting and waiting for the result?
If not, are there any other options to achieve this with Google Cloud? (Or should I use custom task services for this purpose, for instance Celery and Redis?) It also seems that Cloud Run Jobs offers somewhat similar functionality, but there is no queue system to manage workers.
Cloud Tasks is simply a tool for queuing tasks. It is a useful tool for ensuring that tasks do run, as it has built-in retry mechanisms. How you use it in the context of an application depends on a lot of other details of the application itself, but it is definitely possible to use it for background/asynchronous processing of tasks.
We use Google Cloud Tasks to implement long-running processes that report their progress via Datastore records. Some of these processes run longer than the standard 10-minute timeout and trigger a new Cloud Task to complete the processing. We then have a simple lightweight handler that retrieves the status record from Datastore and reports it to the user. We poll that handler from the client, but you could also implement something like websockets.
GCP can handle asynchronous tasks. Asynchronous execution is a well-established way to reduce request latency and make your application more responsive.
You can use Cloud Run or Cloud Functions for this type of task, because the time limit for HTTP task handlers in Cloud Tasks can be increased up to 30 minutes.
For more information refer to this document.
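To make the Cloud Tasks suggestion concrete, here is a minimal sketch of enqueuing an HTTP task that targets a Cloud Run worker, using the google-cloud-tasks Python client; the project, location, queue name, and worker URL are placeholder assumptions:

from google.cloud import tasks_v2

client = tasks_v2.CloudTasksClient()
parent = client.queue_path("my-project", "us-central1", "file-processing")  # hypothetical queue

task = {
    "http_request": {
        "http_method": tasks_v2.HttpMethod.POST,
        "url": "https://worker-abc123-uc.a.run.app/process",  # hypothetical Cloud Run endpoint
        "headers": {"Content-Type": "application/json"},
        "body": b'{"file_id": "123"}',
    },
    # give the handler more time than the default before the task is retried
    "dispatch_deadline": {"seconds": 1800},
}

response = client.create_task(request={"parent": parent, "task": task})
print(f"Enqueued {response.name}")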
When I make calls from a Cloud Run instance to other Cloud APIs, for some reason there are huge delays in the responses.
Everything runs within one project.
Even from my local machine the calls are much faster (a couple of seconds), but deployed in the cloud some requests take a couple of minutes to complete. As far as I can see, this is relevant for all APIs (not just Firestore, but the Translate and TTS APIs as well). It is not related to cold starts, for sure.
A code example (Node.js) and logs are below:
console.log('Received the request for stats');
const usersCollection = this.firestore.collection('users');
const snapshot = await usersCollection.get();
console.log('Fetched all users from Firestore');
Well, after some further investigation I figured out what the problem was.
The thing is that all the operations I perform happen not before the response is sent but after (this is the way the chatbot is architected).
So the flow looks like this:
a request to do something comes in - a 200 response confirms the request is accepted
all the business logic and work runs
the chatbot sends the message with the results
According to the docs, the CPU is allocated only during request processing by default, so the only thing I had to change was to enable CPU allocation for background activities: https://cloud.google.com/run/docs/tips/general#background-activity
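For anyone else hitting this: assuming your service name, the setting can also be flipped from the command line by switching the service to "CPU always allocated":

gcloud run services update SERVICE_NAME --no-cpu-throttling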
Right now I monitor my submitted jobs on Google AI Platform (formerly ml engine) by polling the job REST API. I don't like this solution for a few reasons:
Awareness of status changes is often delayed or missed altogether if the interval between status changes is smaller than the monitoring polling rate
Lots of unnecessary network traffic
Lots of unnecessary function invocations
I would like to be notified as soon as my training jobs complete. It'd be great if there is some way to assign hooks or callbacks to run when the job status changes.
I've also considered adding calls to cloud functions directly within the training task python package that runs on AI Platform. However, I don't think those function calls will occur in cases where the training job is shut down unexpectedly, such as when a job is cancelled or forced to end by GCP.
Is there a better way to go about this?
You can use a Stackdriver sink to read the logs and send them to Pub/Sub. From Pub/Sub, you can connect to a bunch of other providers:
1. Set up a Pub/Sub sink
Make sure you have access to the logs and publish rights to the topic you desire before you get started. Follow the instructions for setting up a Stackdriver -> Pub/Sub sink. You’ll want to use this query to limit the events only to Training jobs:
resource.type = "ml_job"
resource.labels.task_name = "service"
Note that Stackdriver can narrow the query down further. For example, you can limit it to a particular job by adding a condition like resource.labels.job_id = "..." or to a certain event with a filter like jsonPayload.message : "..."
2. Respond to the Pub/Sub message
In order to tell what changed, the recipient of the Pub/Sub message can either query the job status from the ml.googleapis.com API or read the text of the message.
Reading state from ml.googleapis.com
When you receive the message, make a call to https://ml.googleapis.com/v1/projects/<project_id>/jobs/<job_id> to get the Job information, replacing <project_id> and <job_id> in the URL with the values of resource.labels.project_id and resource.labels.job_id from the Pub/Sub message, respectively.
The returned Job object contains a field state that, naturally, tells the status of the job.
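A minimal sketch of that call in Python, assuming application default credentials and that project_id and job_id were already extracted from the Pub/Sub message:

import google.auth
from google.auth.transport.requests import AuthorizedSession

credentials, _ = google.auth.default()
session = AuthorizedSession(credentials)

# project_id and job_id are taken from the Pub/Sub message's resource labels
url = f"https://ml.googleapis.com/v1/projects/{project_id}/jobs/{job_id}"
job = session.get(url).json()
print(job["state"])  # e.g. QUEUED, RUNNING, SUCCEEDED, FAILED, CANCELLED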
Reading state from the message text
The Pub/Sub message will contain a string telling you what happened to the job. You probably want to trigger behavior when the job ends. Look for these strings in jsonPayload.message:
"Job completed successfully."
"Job cancelled."
"Job failed."
I implemented a Terraform module as @htappen said. I hope it helps you. But my real hope is that Google updates AI Platform with the same feature.
https://github.com/sfujiwara/terraform-google-ai-platform-notification
I think you can programmatically publish a PubSub message at the end of your training job code. Something like this:
import json

from google.cloud import pubsub_v1

# publish job complete message
client = pubsub_v1.PublisherClient()
# args here comes from the training job's own argument parser
topic = client.topic_path(args.gcp_project_id, 'topic-name')
data = {
    'ACTION': 'JOB_COMPLETE',
    'SAVED_MODEL_DIR': args.job_dir
}
data_bytes = json.dumps(data).encode('utf-8')
client.publish(topic, data_bytes)
Then you can set up a cloud function to be triggered by the same Pub/Sub topic.
You can work around the lack of a callback from the service on a custom TF training job by adding a LambdaCallback to the fit() call. In the on_epoch_end callback you could then send yourself a notification on job progress, and in on_train_end when it finishes.
https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/LambdaCallback
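A minimal sketch of that idea; the notify helper is a placeholder for whatever notification channel you use, and model, x_train, and y_train are assumed to exist:

import tensorflow as tf

def notify(stage, logs=None):
    # placeholder: publish to Pub/Sub, hit a webhook, send an email, etc.
    print(f"training update: {stage}", logs or {})

progress_callback = tf.keras.callbacks.LambdaCallback(
    on_epoch_end=lambda epoch, logs: notify(f"epoch {epoch} done", logs),
    on_train_end=lambda logs: notify("training finished", logs),
)

model.fit(x_train, y_train, epochs=10, callbacks=[progress_callback])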
I've got a Google Cloud PubSub topic which at times has thousands of messages and at times zero messages coming in. These messages represent tasks which can take upwards of an hour each. Preferably I'd use Cloud Run for this, as it scales really well with demand: if a thousand messages get published, I want hundreds of Cloud Run instances to spin up. These Run instances get started by a push subscription.
The problem is that PubSub has a 600-second timeout for the acknowledgement. This means that for Cloud Run to process these messages, they have to finish within 600 seconds. If they do not, PubSub times the message out and sends it again, causing the task to be restarted until the first task finally acknowledges it (which causes the same task to be run many times). Cloud Run acknowledges the messages by returning a 2xx HTTP status code. The documentation states:
When an application running on Cloud Run finishes handling a request, the container instance's access to CPU will be disabled or severely limited. Therefore, you should not start background threads or routines that run outside the scope of the request handlers.
So is it maybe possible to acknowledge a PubSub request through code and continue the processing, without having Google Cloud Run take away the resources? Or is there a better solution I'm unaware of?
Because these processes are so code/resource-intensive, I feel Cloud Functions will not suffice. I've looked at https://cloud.google.com/solutions/using-cloud-pub-sub-long-running-tasks and https://cloud.google.com/blog/products/gcp/how-google-cloud-pubsub-supports-long-running-workloads. But these didn't answer my question.
I've looked at Google Cloud Tasks, which might be something? But the rest of the project has been built around PubSub/Run/Functions, so preferably I'd stick with that.
This project is written in Python.
So preferably I would like to write my Google Cloud Run tasks like this:
@app.route('/', methods=['POST'])
def index():
    """Endpoint for Google Cloud PubSub messages"""
    pubsub_message = request.get_json()
    logger.info(f'Received PubSub pubsub_message {pubsub_message}')
    if message_incorrect(pubsub_message):
        return "Invalid request", 400  # use normal NACK handling
    # acknowledge message here without returning
    # ...
    # Do actual processing of the task here
    # ...
So how can or should I solve this, so that the resource-intensive tasks get properly scaled on demand (i.e. with a push PubSub subscription) and each task only gets executed once?
Answers:
In short, what has been answered: Cloud Run and Cloud Functions are just not suited for this problem. There is no way to have them run tasks that take longer than 15 or 9 minutes respectively. The only solution is to switch over to another Google service, use a pull-style subscription, and lose out on the auto-scaling of Cloud Run/Functions.
Cloud Run on GKE can handle long processes, with more CPU and memory than is available on the managed platform. However, you have a GKE cluster always running, and you lose the "pay-as-you-use" benefit.
If you want to use this solution, don't link the PubSub push subscription directly to your Cloud Run on GKE service. Use Cloud Tasks with an HTTP task for this. The timeout is longer than PubSub's (up to 24 h instead of 10 min) and the retry policies are customizable.
Neither Cloud Functions nor Cloud Run is sufficient for arbitrarily long-running operations. Cloud Functions has a hard cap of 9 minutes per invocation, and Cloud Run caps at 60 minutes. If you need more time, you're going to have to delegate the work to another product, such as Google Compute Engine. It should be possible to kick off some Compute Engine work from one of the serverless products.
Given the limits on Pub/Sub acks, you'll probably have to find a way for a client to poll or listen to some resource to find out when the work is actually done. You could use a database for that, and Cloud Firestore lets you listen to documents to find out when they change. So you could use that to track the status of your long-running work.
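To illustrate the Firestore suggestion, a minimal Python sketch of listening to a status document; the collection and document names are placeholder assumptions:

from google.cloud import firestore

db = firestore.Client()

# hypothetical document that the long-running worker updates as it progresses
doc_ref = db.collection("jobs").document("job-123")

def on_snapshot(doc_snapshot, changes, read_time):
    for doc in doc_snapshot:
        print(f"job status changed: {doc.to_dict().get('status')}")

# fires immediately with the current state, then again on every change
watch = doc_ref.on_snapshot(on_snapshot)
# ... later, stop listening with watch.unsubscribe()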
I am using a microphone which records sound through a browser, converts it into a file and sends the file to a Java server. Then my Java server sends the file to the Cloud Speech API and gives me the transcription. The problem is that the transcription takes very long (around 3.7 s for 2 s of dialog).
So I would like to speed up the transcription. The first thing to do is to stream the data, so that the transcription starts at the beginning of the recording. The problem is that I don't really understand the API. For instance, if I want to transcribe my audio stream from the source (browser/microphone), I need to use some kind of JS API, but I can't find anything I can use in a browser (we can't use Node like this, can we?).
Otherwise I need to stream my data from my JS to my Java server (not sure how to do that without breaking the data...) and then push it through streamingRecognizeFile from here: https://github.com/GoogleCloudPlatform/java-docs-samples/blob/master/speech/cloud-client/src/main/java/com/example/speech/Recognize.java
But it takes a file as the input, so how am I supposed to use it? I cannot really tell the system whether or not I have finished the recording... How will it understand that it is the end of the transcription?
I would like to create something in my web browser just like the google demo there :
https://cloud.google.com/speech/
I think there is some fundamental stuff I do not understand about the way to use the streaming API. If someone could explain a bit how I should proceed with this, it would be awesome.
Thank you.
Google "Speech-to-Text typically processes audio faster than real-time, processing 30 seconds of audio in 15 seconds on average" [1]. You can use Google APIs Explorer to test exactly how long your each request would take [2].
To speed up the transcribing you may try to add recognition metadata to your request [3]. You can provide phrase hints if you are aware of the context of the speech [4]. Or use enhanced models to use special set of machine learning models [5]. All these suggestions would improve the accuracy and might have effects on transcribing speed.
When using the streaming recognition, in config you can set singleUtterance option to True. This will detect if user pause speaking and cease the recognition. If not streaming request will continue until to the content limit, which is 1 minute of audio length for streaming request [6].
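For reference, here is a minimal sketch of such a streaming request with single_utterance enabled, using the Python client library; the audio_chunks iterable of raw LINEAR16 buffers is an assumption (your Java server would use the equivalent calls from the Java client):

from google.cloud import speech

client = speech.SpeechClient()

streaming_config = speech.StreamingRecognitionConfig(
    config=speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    ),
    single_utterance=True,  # stop recognizing once the speaker pauses
    interim_results=True,   # emit partial hypotheses while audio streams in
)

# audio_chunks: an iterable of raw audio byte buffers arriving from the client
requests = (
    speech.StreamingRecognizeRequest(audio_content=chunk) for chunk in audio_chunks
)

for response in client.streaming_recognize(config=streaming_config, requests=requests):
    for result in response.results:
        marker = "(final)" if result.is_final else "(interim)"
        print(result.alternatives[0].transcript, marker)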