Google Cloud Speech-to-Text API - Multi-speaker recognition?

The new Google Cloud Speech-to-Text API is said to be the best on the market. Does it provide speaker annotation (or any other speaker information) at all, i.e. who says what, and when? I can't find anything in its documentation or examples that mentions it.
Both IBM and Amazon offer this.
I'd appreciate it if anyone could let me know, thanks!

Individual speaker recognition is not currently a feature provided by the API. It’s noted in the issue tracker [1] as a feature request, however there’s no ETA for it currently. I’d recommend starring the issue to receive future comments and updates regarding it.
[1] https://issuetracker.google.com/35901846

Related

Does Google Dialogflow have intent recommendation?

I'm switching from IBM Watson Assistant to Google Dialogflow. In WA there's a feature called intent recommendation, which taps into live instances, detects the topics/intents that users want, and groups them together into new recommended intents. You can also upload utterances in spreadsheets and intent recommendation does the same thing. Does Dialogflow have something similar?

Likely the closest to your needs is the GCP product called Contact Center AI Insights. If we look at this documentation page:
https://cloud.google.com/contact-center/insights/docs/topic-modeling-overview
we find that this product has a feature called "Topic Modeling". It examines current and historical conversation transcriptions and identifies the topics in them, which you can then use for:
Monitoring topic trends to keep your agents updated.
Supporting agent training as new topics are observed.
Using topics and their distribution to help define Dialogflow intents.
You can also deploy your created topic model to infer topics on new conversations, allowing you to continually classify incoming conversations.

How to use Google Cloud Video Intelligence Celebrity Recognition?

I have been using the Google Cloud Video Intelligence API happily and successfully until now. However, if I am not mistaken, I noticed that the Celebrity Recognition API is only open to selected, approved media companies. Amazon Rekognition offers this to the public. This is quite unbelievable. How can this kind of service be private on a public cloud platform such as Google's?
Does anyone know how to use the Celebrity Recognition API from Google Cloud?

As for why Celebrity Recognition is not publicly available, there are likely legal reasons that Google is dealing with. This type of technology is powerful and, in the wrong hands, could cause serious issues for all parties involved.
See the “Restricted access feature” note in Google’s documentation [1].
[1] https://cloud.google.com/vision/docs/celebrity-recognition

Fine tuning on either Google Cloud Vision, Microsoft Azure Computer Vision API or Amazon Text Extract

I need to transcribe a large number of handwritten documents. I tried cloud services from Google, Amazon, and Microsoft, namely:
https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/
https://cloud.google.com/vision/docs/handwriting
https://aws.amazon.com/textract/
Unfortunately, none of them achieved good enough results. I suspect this is because my documents have an unusual handwriting style, which the networks struggle with a lot.
I searched for a way to fine-tune them (with manually transcribed data) but have not found anything online, so as a last resort I'm asking here.
If it is possible to fine-tune one of these models, could you please point me to some resources?

You are correct: with Azure Cognitive Services Computer Vision, I'm afraid you cannot upload your own data to train the API to recognise the handwriting in your documents. I can't comment on the offerings from AWS and Google, but it is certainly not possible with Azure.

Cloud Speech-to-Text punctuation

I'm trying to find out how to get punctuation from Cloud Speech-to-Text, not in English but in another language. This is a basic requirement for my use case, and I'm sure Google has thought of it.
Has anybody had experience with this?

Currently, automatic punctuation is available for US English (en-US) only. It will likely be available for other languages at some point, but I recommend asking GCP about your particular language(s) by filing a Feature Request using this form.
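
For reference, punctuation is requested per call through the enableAutomaticPunctuation field of RecognitionConfig. The sketch below only builds a JSON body for the v1 speech:recognize REST method (POST to https://speech.googleapis.com/v1/speech:recognize); it sends nothing, and the bucket and object name are hypothetical. It assumes FLAC audio, so encoding and sample rate are left for the service to read from the file header. Whether punctuation is actually applied for a non-English languageCode depends on the language support at the time you call it.

```python
import json
from typing import Any, Dict


def build_recognize_request(gcs_uri: str, language_code: str = "en-US") -> Dict[str, Any]:
    """Build a JSON body for the v1 speech:recognize REST method.

    Assumption: the audio is FLAC stored in Cloud Storage, so encoding and
    sampleRateHertz are omitted and read from the file header.
    """
    return {
        "config": {
            "languageCode": language_code,
            # Ask the service to insert punctuation. At the time of the
            # answer above this only took effect for en-US.
            "enableAutomaticPunctuation": True,
        },
        "audio": {"uri": gcs_uri},
    }


if __name__ == "__main__":
    # Hypothetical bucket/object name, for illustration only.
    body = build_recognize_request("gs://example-bucket/audio.flac", "fr-FR")
    print(json.dumps(body, indent=2))
```

Keeping the request as plain JSON avoids depending on a particular client library version; the same field is exposed as enable_automatic_punctuation in the Python client's RecognitionConfig.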

Google Cloud APIs usage data by projects

Is there any way to programmatically get data similar to the APIs overview in the Google Cloud dashboard? Specifically, I'm interested in the list of APIs enabled for the project and their usage/error stats for a predefined timeframe. I believe there's an API for that, but I'm struggling to find it.

There's currently no API that gives you a report similar to the one you can see through the Google Cloud Console.
The Compute Engine API can retrieve some quotas with its get methods (for example projects.get), but that is limited to Compute Engine quotas and, from what I understood of your question, is not quite what you're looking for.
However, I've found in Google's Issue Tracker a feature request that's close to what you're asking for.
If you need something more specific or want to file the feature request yourself, check the "Report feature requests" documentation and create your own. The GCP team will review it and consider it for implementation.
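
For the Compute Engine quotas mentioned above, a rough sketch of what the projects.get response allows is ranking its quotas entries (each has metric, limit and usage) by how close they are to their limit. It assumes you have already fetched the project resource as a dict (authentication and the HTTP call are left out), and the sample values in the usage example are made up. It still says nothing about per-API request or error counts.

```python
from typing import Any, Dict, List, Tuple

# One row per quota: (metric, usage, limit, usage / limit)
QuotaRow = Tuple[str, float, float, float]


def compute_project_url(project_id: str) -> str:
    """URL of the Compute Engine v1 projects.get method for a project."""
    return f"https://compute.googleapis.com/compute/v1/projects/{project_id}"


def summarize_quotas(project_resource: Dict[str, Any]) -> List[QuotaRow]:
    """Rank a project's Compute Engine quotas by the fraction of the limit used.

    Assumption: project_resource is the already-fetched JSON response of
    projects.get; only its "quotas" list (metric/limit/usage) is read.
    """
    rows: List[QuotaRow] = []
    for quota in project_resource.get("quotas", []):
        limit = float(quota.get("limit", 0))
        usage = float(quota.get("usage", 0))
        # Guard against a zero limit to avoid dividing by zero.
        ratio = usage / limit if limit else 0.0
        rows.append((quota.get("metric", "UNKNOWN"), usage, limit, ratio))
    # Quotas closest to their limit come first.
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


if __name__ == "__main__":
    # Hypothetical response fragment, for illustration only.
    sample = {
        "quotas": [
            {"metric": "CPUS", "limit": 24.0, "usage": 6.0},
            {"metric": "NETWORKS", "limit": 5.0, "usage": 4.0},
        ]
    }
    print(compute_project_url("my-project"))
    for metric, usage, limit, ratio in summarize_quotas(sample):
        print(f"{metric}: {usage:g}/{limit:g} ({ratio:.0%})")
```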
