Read highlights from book using Google Glass - google-glass

I am looking to develop a Google Glass feature that enables a reader to highlight sections of a book and recall them later. For instance, if there are 10 highlights in a scanned book, Google Glass will show the highlighted sections.
Is this possible to do?

Related

How does PaddleOCR performance compare to Google Cloud Vision OCR API?

I recently found an OCR tool called PaddleOCR. Has anyone used it, and how does its performance compare to the Google Cloud Vision API?
I heard PaddleOCR describe itself as an industrial-grade open-source OCR engine, so I tested a few images with it and the Google Cloud Vision API.
Generally speaking, commercial APIs like Google Cloud and Azure are expected to work better than open-source OCR engines, and they do, but in some scenarios the gap is not large.
If the text is clear and flat, both work great. The main difference is the result format: the Google API returns rich content, including block, paragraph, and word location information, while PaddleOCR only returns results per text line (transcriptions and locations).
If your test images are more complicated (curved text, handwriting, or blurry images), the commercial APIs will probably work better than the open-source engine. However, if they cannot meet your needs, you can train a new model with PaddleOCR.
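To illustrate the format difference, here is a minimal Python sketch that flattens both result shapes. The sample payloads are hand-written approximations of each service's output, not real responses:

```python
# PaddleOCR returns one entry per text line: [bounding_box, (text, confidence)].
# This is invented sample data in that shape, not real OCR output.
paddle_result = [
    [[[10, 10], [200, 10], [200, 40], [10, 40]], ("Hello world", 0.98)],
    [[[10, 50], [180, 50], [180, 80], [10, 80]], ("Second line", 0.95)],
]

# Google Cloud Vision nests words inside paragraphs inside blocks.
# Again, a simplified hand-written approximation of the hierarchy.
google_result = {
    "pages": [{
        "blocks": [{
            "paragraphs": [{
                "words": [
                    {"text": "Hello", "bounding_box": [[10, 10], [200, 40]]},
                    {"text": "world", "bounding_box": [[100, 10], [200, 40]]},
                ]
            }]
        }]
    }]
}

def paddle_lines(result):
    """Extract (text, confidence) per detected line from PaddleOCR-style output."""
    return [(text, conf) for _box, (text, conf) in result]

def google_words(annotation):
    """Walk the block -> paragraph -> word hierarchy of a Vision-style response."""
    words = []
    for page in annotation["pages"]:
        for block in page["blocks"]:
            for para in block["paragraphs"]:
                for word in para["words"]:
                    words.append(word["text"])
    return words

print(paddle_lines(paddle_result))  # line-level granularity
print(google_words(google_result))  # word-level granularity
```

The point is that PaddleOCR gives you line-level granularity directly, while with the Vision API you must walk (or flatten) a hierarchy to get at the words.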
Here are some visualization images:
PaddleOCR: [test1] [test2]
Google Cloud Vision API: [test1] [test2]

Getting word timestamps for TTS

I have some Japanese text that I'm turning into an MP3 with the Google Cloud Text-to-Speech functionality.
I also want word timestamps for the MP3 that Google returns.
Google Speech-to-Text offers this functionality, but when I submit the files I get from TTS to STT, the results are not always good.
What is the best way to also get word timestamps for the TTS mp3?
Google Cloud Speech-to-Text is an ML-based service, so it's expected that the results won't always be as "good" as you might hope; it has its limitations.
What I can suggest is to take a look at their relevant documentation about this topic, such as the best practices, the guide, and the basics page. Additionally, you could look through their issue tracker platform (for example, this issue) for more information; and if you find a reproducible issue within the service, you can file it there so their team can be aware of it.
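If you do go the STT route, the service can attach per-word timings when word time offsets are enabled in the recognition config. Here is a sketch of extracting them from a response; the dict below imitates the JSON shape of a response with word time offsets (duration strings like "1.500s"), and is hand-written sample data, not a real transcription:

```python
# Hand-written sample imitating an STT response with word time offsets enabled.
sample_response = {
    "results": [{
        "alternatives": [{
            "transcript": "こんにちは 世界",
            "words": [
                {"word": "こんにちは", "startTime": "0s", "endTime": "0.800s"},
                {"word": "世界", "startTime": "0.900s", "endTime": "1.500s"},
            ],
        }]
    }]
}

def to_seconds(value: str) -> float:
    """Convert a duration string like '1.500s' to a float in seconds."""
    return float(value.rstrip("s"))

def word_timestamps(response):
    """Flatten all recognized words into (word, start, end) tuples."""
    out = []
    for result in response["results"]:
        best = result["alternatives"][0]  # take the top hypothesis
        for w in best.get("words", []):
            out.append((w["word"], to_seconds(w["startTime"]), to_seconds(w["endTime"])))
    return out

print(word_timestamps(sample_response))
```

Whether those timings are accurate for synthesized Japanese audio is exactly the limitation discussed above, so treat the offsets as approximate.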

Speaker diarization vs. speaker recognition: Google Cloud vs. Microsoft Azure vs. IBM Watson vs. AWS Transcribe

I want to do a speech-to-text analysis project where I would like to do 1) speaker recognition, 2) speaker diarization, and 3) speech-to-text. Right now I am testing the APIs provided by various companies like Microsoft, Google, AWS, IBM, etc.
I found that Microsoft offers user enrollment and speaker recognition (https://cognitivewuppe.portal.azure-api.net/docs/services/563309b6778daf02acc0a508/operations/5645c3271984551c84ec6797).
However, all the other platforms offer speaker diarization but not speaker recognition. If I understand correctly, speaker diarization will be able to "distinguish" between users, but how will it recognize them unless I enroll them? I could only find an enrollment option in Azure.
But I want to be sure, so I'm checking here: maybe I am looking at the wrong documents, or maybe there is some other way to achieve this in Google Cloud, Watson, or AWS Transcribe. If that is the case, can you folks please assist me with that?
Speaker Recognition is divided into two categories: speaker verification and speaker identification.
https://learn.microsoft.com/en-us/azure/cognitive-services/speaker-recognition/home
Diarization is the process of separating speakers in a piece of audio. Our Batch pipeline supports diarization and is capable of recognizing two speakers on mono channel recordings.
When you use the batch transcription API and enable diarization, it will return speaker IDs 1 and 2.
All transcription output contains a SpeakerId. If diarization is not used, it will show "SpeakerId": null in the JSON output. For diarization we support two voices, so the speakers will be identified as "1" or "2".
https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/cognitive-services/Speech-Service/batch-transcription.md
Ex: in a call-center scenario, the customer does not need to identify who is speaking and cannot train the model beforehand with speaker voices, since a new user calls in every time. Rather, they only need to distinguish different voices when converting voice to text.
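As a sketch of what consuming that output looks like, here is a small Python snippet that groups a diarized transcript by SpeakerId. The payload below is invented sample data that imitates the JSON described above, where each phrase carries a SpeakerId of "1", "2", or null:

```python
# Invented sample phrases imitating diarized batch-transcription output.
sample_output = [
    {"SpeakerId": "1", "text": "Hello, thanks for calling."},
    {"SpeakerId": "2", "text": "Hi, I have a billing question."},
    {"SpeakerId": "1", "text": "Sure, I can help with that."},
]

def by_speaker(phrases):
    """Collect transcribed phrases per SpeakerId, in order of appearance."""
    grouped = {}
    for phrase in phrases:
        # A null SpeakerId means diarization was not enabled for the job.
        speaker = phrase["SpeakerId"] or "unknown"
        grouped.setdefault(speaker, []).append(phrase["text"])
    return grouped

print(by_speaker(sample_output))
```

Note this gives you anonymous speaker labels ("1"/"2"), which is exactly the diarization-without-recognition behavior the question asks about.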
Alternatively, you can use Video Indexer, which supports transcription, speaker diarization (enumeration), and emotion recognition from both the text and the tone of voice. Additional insights are available as well, e.g. topic inference, language identification, brand detection, and translation. You can consume it via the video or audio-only APIs for COGS optimization.
You can use VI for speaker diarization. When you get the insights JSON, you can find speaker IDs both under Insights.transcript[0].speakerId and under Insights.Speakers. When dealing with audio files where each speaker is recorded on a different channel, VI identifies that and applies the transcription and diarization accordingly.
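A quick sketch of reading those two fields together: the field paths follow the answer above (transcript[*].speakerId and a speakers list), but the values below are made up for illustration:

```python
# Invented sample imitating the two speaker-related parts of an insights JSON.
insights = {
    "transcript": [
        {"text": "Welcome to the show.", "speakerId": 1},
        {"text": "Glad to be here.", "speakerId": 2},
    ],
    "speakers": [
        {"id": 1, "name": "Speaker #1"},
        {"id": 2, "name": "Speaker #2"},
    ],
}

def speaker_names(insights):
    """Map each speaker id to its display name from the speakers list."""
    return {s["id"]: s["name"] for s in insights["speakers"]}

def labeled_transcript(insights):
    """Pair each transcript line with its speaker's display name."""
    names = speaker_names(insights)
    return [(names[line["speakerId"]], line["text"]) for line in insights["transcript"]]

print(labeled_transcript(insights))
```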

google cloud vision category detecting

I want to use the Google Cloud Vision API in my Android app to detect whether an uploaded picture is mainly food or not. The problem is that the response JSON is rather big and confusing: it says a lot about the picture but doesn't say what the whole picture is of (food or something like that). I contacted the support team but didn't get an answer.
What you really want is custom classification, not raw Cloud Vision annotation specifically.
Either use https://cloud.google.com/automl/ or roll your own, like I did: https://stackoverflow.com/a/55880316/322020
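Before reaching for a custom classifier, a rough heuristic over the raw response is sometimes enough: Vision's label detection returns a labelAnnotations list, and you can check whether high-confidence food-related labels appear. The response below is hand-written sample data, and the label set and threshold are illustrative choices, not anything prescribed by the API:

```python
# Hand-written sample imitating a label-detection response.
sample_response = {
    "labelAnnotations": [
        {"description": "Food", "score": 0.97},
        {"description": "Dish", "score": 0.94},
        {"description": "Tableware", "score": 0.62},
    ]
}

# Illustrative set of labels we choose to treat as "food"; tune for your app.
FOOD_LABELS = {"food", "dish", "cuisine", "ingredient", "snack"}

def looks_like_food(response, threshold=0.7):
    """True if any label above the confidence threshold is in our food set."""
    for label in response.get("labelAnnotations", []):
        if label["score"] >= threshold and label["description"].lower() in FOOD_LABELS:
            return True
    return False

print(looks_like_food(sample_response))
```

This is brittle compared to a model trained on your own food/not-food data, which is why AutoML (or a custom classifier) is the more robust answer.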

Can we start making a real production app using the sneak peek GDK?

At this moment we have a sneak peek GDK, and there are rumors that the final GDK will come this summer along with a public Google Glass device.
We plan to build our Google Glass app on the GDK, and at this moment we can only use the sneak peek version. So we basically plan to build the app alongside the new GDK SDKs as they appear, so that this summer we can immediately publish our GDK apps once Google starts accepting them.
How safe is it to start building with the existing GDK? Can anyone confirm it will not change drastically, so that we don't end up in an ever-changing loop?
I see that the Glass guys are watching this tag, so I hope one of them can give us a direction.
[Disclaimer: I am another Glass Explorer and not a Google employee ... however I have experience in several large corporations involved in software.]
I would expect to have to make minor and perhaps major adjustments in any Glassware application development that we do. In fact, as we find anomalies or other inconsistencies, I would hope that our feedback and requests would actually help shape the initially non-beta released GDK. If we get into a "continually updating" cycle as the GDK evolves, so be it.
Just my opinion and expectation. We will focus on modularizing and hiding important elements so changes to match a new GDK can be contained.