Location recognition in Google Glass

How does Google Glass recognize location when the user captures an image:
using a GPS tag, or
image recognition?
Please give a brief description.

I think it has to be geotagging (a GPS tag). Google Glass has only a 2D camera and an OMAP 4430 SoC with a dual-core CPU, which does not capture enough information or provide enough processing power for Glass to do image recognition.
So far, only devices with more capable camera and sensor hardware, like the one in Project Tango, do this kind of image recognition.
Google Glass does not even have a GPS chip itself; it pulls location information from the paired cell phone.
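This is also how the GDK exposes location: since there is no on-board GPS, apps ask Android's LocationManager for all enabled providers (the paired phone is surfaced as a remote provider) instead of assuming GPS_PROVIDER. A minimal sketch, assuming a standard GDK service; the class name and update intervals are illustrative:

    import android.app.Service;
    import android.content.Context;
    import android.content.Intent;
    import android.location.Criteria;
    import android.location.Location;
    import android.location.LocationListener;
    import android.location.LocationManager;
    import android.os.Bundle;
    import android.os.IBinder;
    import java.util.List;

    public class GlassLocationService extends Service {
        private final LocationListener listener = new LocationListener() {
            @Override public void onLocationChanged(Location location) {
                // Attach location.getLatitude()/getLongitude() to the captured photo here.
            }
            @Override public void onStatusChanged(String provider, int status, Bundle extras) {}
            @Override public void onProviderEnabled(String provider) {}
            @Override public void onProviderDisabled(String provider) {}
        };

        @Override public void onCreate() {
            LocationManager lm = (LocationManager) getSystemService(Context.LOCATION_SERVICE);
            Criteria criteria = new Criteria();
            criteria.setAccuracy(Criteria.ACCURACY_FINE);
            // Glass has no GPS chip: iterate all enabled providers instead of
            // assuming GPS_PROVIDER; the paired phone shows up as a remote provider.
            List<String> providers = lm.getProviders(criteria, true /* enabledOnly */);
            for (String provider : providers) {
                lm.requestLocationUpdates(provider, 10000 /* ms */, 10 /* meters */, listener);
            }
        }

        @Override public IBinder onBind(Intent intent) { return null; }
    }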

Related

Speaker diarization vs. speaker recognition: Google Cloud vs. Microsoft Azure vs. IBM Watson vs. AWS Transcribe

I want to do a speech-to-text analysis project in which I would like to do 1) speaker recognition, 2) speaker diarization, and 3) speech-to-text. Right now I am testing the APIs provided by various companies such as Microsoft, Google, AWS, and IBM.
In Microsoft's offering I could find the option for user enrollment and speaker recognition (https://cognitivewuppe.portal.azure-api.net/docs/services/563309b6778daf02acc0a508/operations/5645c3271984551c84ec6797).
However, all the other platforms have speaker diarization but not speaker recognition. If I understand correctly, speaker diarization will be able to "distinguish" between users, but how will it recognize them unless I enroll them? I could find an enrollment option only in Azure.
I want to be sure, so I am checking here: am I looking at the right documents, or is there some other way to achieve this in Google Cloud, Watson, and AWS Transcribe? If so, can you folks please assist me with that?
Speaker Recognition is divided into two categories: speaker verification and speaker identification.
https://learn.microsoft.com/en-us/azure/cognitive-services/speaker-recognition/home
Diarization is the process of separating speakers in a piece of audio. Our Batch pipeline supports diarization and is capable of recognizing two speakers on mono channel recordings.
When you use the batch transcription API and enable diarization, it will return speaker IDs 1 and 2.
All transcription output contains a SpeakerId. If diarization is not used, it will show "SpeakerId": null in the JSON output. For diarization we support two voices, so the speakers will be identified as "1" or "2".
https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/cognitive-services/Speech-Service/batch-transcription.md
Ex: In a call center scenario, the customer does not need to identify who is speaking and cannot train the model beforehand with speaker voices, since a new user calls in every time. Rather, they only need to distinguish different voices when converting voice to text.
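To illustrate consuming the diarized output, here is a minimal sketch that reads the SpeakerId field from a downloaded batch transcription result. It uses org.json for brevity, and the surrounding field names (AudioFileResults, SegmentResults, NBest, Display) are assumptions based on the v2 batch output format; check them against the docs linked above:

    import org.json.JSONArray;
    import org.json.JSONObject;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class DiarizationDump {
        public static void main(String[] args) throws Exception {
            // Assumes the batch transcription result JSON was already downloaded.
            String json = new String(Files.readAllBytes(Paths.get("transcription.json")));
            JSONObject root = new JSONObject(json);

            // Field names below are assumptions based on the v2 batch output format.
            JSONArray files = root.getJSONArray("AudioFileResults");
            JSONArray segments = files.getJSONObject(0).getJSONArray("SegmentResults");
            for (int i = 0; i < segments.length(); i++) {
                JSONObject seg = segments.getJSONObject(i);
                // "1" or "2" when diarization is enabled, null otherwise.
                Object speakerId = seg.opt("SpeakerId");
                String text = seg.getJSONArray("NBest").getJSONObject(0).getString("Display");
                System.out.println("Speaker " + speakerId + ": " + text);
            }
        }
    }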
or
You can use Video Indexer, which supports transcription, speaker diarization (enumeration), and emotion recognition from both the text and the tone of the voice. Additional insights are available as well, e.g. topic inference, language identification, brand detection, translation, etc. You can consume it via the video or audio-only APIs for COGS optimization.
You can use VI for speaker diarization. When you get the insights JSON, you can find speaker IDs both under Insights.transcript[0].speakerId and under Insights.Speakers. When dealing with audio files where each speaker is recorded on a different channel, VI identifies that and applies the transcription and diarization accordingly.

How to capture photo when a face detected smiles using Ionic?

Using Ionic, is it possible for me to capture an image where the trigger is a detected face smiling? I am looking for suggestions and any resource materials I could use with Ionic.
You need to use an emotion detection API. This problem is not related to Ionic itself but to computer vision. So what you would likely do is send/upload your photo to an online API (for example Google Cloud Vision, or any other) to detect emotions in the photo, and the result would then be used by your application.
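If you go with Google Cloud Vision, note that there is no dedicated "smile" feature; face detection returns a joyLikelihood per face, which can serve as the smile trigger. A minimal server-side sketch in Java (the Ionic app would upload the captured frame to an endpoint running something like this; the file name is illustrative):

    import com.google.cloud.vision.v1.AnnotateImageRequest;
    import com.google.cloud.vision.v1.AnnotateImageResponse;
    import com.google.cloud.vision.v1.BatchAnnotateImagesResponse;
    import com.google.cloud.vision.v1.FaceAnnotation;
    import com.google.cloud.vision.v1.Feature;
    import com.google.cloud.vision.v1.Image;
    import com.google.cloud.vision.v1.ImageAnnotatorClient;
    import com.google.cloud.vision.v1.Likelihood;
    import com.google.protobuf.ByteString;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Collections;

    public class SmileCheck {
        public static void main(String[] args) throws Exception {
            ByteString bytes = ByteString.copyFrom(Files.readAllBytes(Paths.get("frame.jpg")));
            AnnotateImageRequest request = AnnotateImageRequest.newBuilder()
                    .setImage(Image.newBuilder().setContent(bytes))
                    .addFeatures(Feature.newBuilder().setType(Feature.Type.FACE_DETECTION))
                    .build();

            try (ImageAnnotatorClient client = ImageAnnotatorClient.create()) {
                BatchAnnotateImagesResponse response =
                        client.batchAnnotateImages(Collections.singletonList(request));
                for (AnnotateImageResponse res : response.getResponsesList()) {
                    for (FaceAnnotation face : res.getFaceAnnotationsList()) {
                        Likelihood joy = face.getJoyLikelihood();
                        // Treat LIKELY or VERY_LIKELY joy as "smiling" and fire the capture.
                        boolean smiling = joy == Likelihood.LIKELY || joy == Likelihood.VERY_LIKELY;
                        System.out.println("smiling=" + smiling + " (" + joy + ")");
                    }
                }
            }
        }
    }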

Google Cloud Vision API, identifying a snake in long grass

When I run the following image through the Google Cloud Vision API, it sees the grass but not the snake. What can I do to improve object detection?
We can improve image detection by following the recommended image size guidelines or by using crop hints to make the snake more dominant in the image. The Google Cloud Vision API is powered by machine learning, and misses like this (the snake) are expected in the early stages of the API. The Vision API improves over time as new concepts are introduced and accuracy is improved.
Sample use of crop hints:
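A minimal sketch using the Java Vision client; the aspect-ratio value is illustrative, and the returned bounding poly would be used to crop the photo before re-running label detection:

    import com.google.cloud.vision.v1.AnnotateImageRequest;
    import com.google.cloud.vision.v1.AnnotateImageResponse;
    import com.google.cloud.vision.v1.BatchAnnotateImagesResponse;
    import com.google.cloud.vision.v1.CropHint;
    import com.google.cloud.vision.v1.CropHintsParams;
    import com.google.cloud.vision.v1.Feature;
    import com.google.cloud.vision.v1.Image;
    import com.google.cloud.vision.v1.ImageAnnotatorClient;
    import com.google.cloud.vision.v1.ImageContext;
    import com.google.protobuf.ByteString;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Collections;

    public class CropHintsSample {
        public static void main(String[] args) throws Exception {
            ByteString bytes = ByteString.copyFrom(Files.readAllBytes(Paths.get("snake.jpg")));
            ImageContext context = ImageContext.newBuilder()
                    .setCropHintsParams(CropHintsParams.newBuilder().addAspectRatios(1.0f))
                    .build();
            AnnotateImageRequest request = AnnotateImageRequest.newBuilder()
                    .setImage(Image.newBuilder().setContent(bytes))
                    .addFeatures(Feature.newBuilder().setType(Feature.Type.CROP_HINTS))
                    .setImageContext(context)
                    .build();

            try (ImageAnnotatorClient client = ImageAnnotatorClient.create()) {
                BatchAnnotateImagesResponse response =
                        client.batchAnnotateImages(Collections.singletonList(request));
                for (AnnotateImageResponse res : response.getResponsesList()) {
                    for (CropHint hint : res.getCropHintsAnnotation().getCropHintsList()) {
                        // Crop the image to this polygon, then re-run LABEL_DETECTION
                        // on the cropped region so the snake dominates the frame.
                        System.out.println(hint.getBoundingPoly());
                    }
                }
            }
        }
    }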
The result shows "60% reptile" when using the Vision API explorer on the cropped image.

Streaming Images from native C++ to a web browser

I am trying to figure out how difficult it will be to achieve this. I have C/C++ drivers to interface with a machine vision camera. All it does is grab frames from the camera at 14 fps as .tif files. I want to display this image stream in a browser.
My background is in hardware, drivers, and applications, not web development. How much trouble am I getting into? How easy would it be to create a video stream from the images and feed it to a video player on the webpage?
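One common approach at this frame rate is MJPEG over HTTP: the server writes JPEG frames into a multipart/x-mixed-replace response, and the browser displays the stream with a plain img tag, so no video player is needed. A minimal sketch in Java; grabFrameAsJpeg is a hypothetical placeholder for the frames the C++ driver delivers, and in practice each .tif would be converted to JPEG first:

    import com.sun.net.httpserver.HttpExchange;
    import com.sun.net.httpserver.HttpServer;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class MjpegServer {
        public static void main(String[] args) throws Exception {
            HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
            // Browser side: <img src="http://host:8080/stream">
            server.createContext("/stream", MjpegServer::handleStream);
            server.start();
        }

        static void handleStream(HttpExchange ex) throws java.io.IOException {
            ex.getResponseHeaders().set("Content-Type",
                    "multipart/x-mixed-replace; boundary=frame");
            ex.sendResponseHeaders(200, 0); // unknown length, streamed
            try (OutputStream out = ex.getResponseBody()) {
                while (true) {
                    byte[] jpeg = grabFrameAsJpeg();
                    out.write(("--frame\r\nContent-Type: image/jpeg\r\n"
                            + "Content-Length: " + jpeg.length + "\r\n\r\n").getBytes());
                    out.write(jpeg);
                    out.write("\r\n".getBytes());
                    out.flush();
                    Thread.sleep(1000 / 14); // ~14 fps, matching the camera
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        // Hypothetical placeholder: the real system would hand over JPEG-encoded
        // frames from the C++ camera driver (e.g. via JNI or a local socket).
        static byte[] grabFrameAsJpeg() throws java.io.IOException {
            return Files.readAllBytes(Paths.get("frame.jpg"));
        }
    }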

Accessing microphone from a service on Glass

I would like a service to access the microphone (and do some signal processing on it, a bit like what Google Music does to recognize songs).
Is there a public API for that? I can't seem to find it. :/
Have you tried the AudioRecord class in Android? That should do everything you need. You might also find the waveform sample on the Google Glass GitHub page to be a useful example.
Keep in mind that recording audio from a service (as in a background service) might be dangerous since other applications could need the microphone for voice recognition and so forth. Using an immersion for this might be a better approach.
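For reference, a minimal AudioRecord capture loop looks like this; the sample rate and buffer sizing are typical choices for speech processing, not Glass-specific requirements, and the RECORD_AUDIO permission is required:

    import android.media.AudioFormat;
    import android.media.AudioRecord;
    import android.media.MediaRecorder;

    public class MicReader {
        // 16 kHz mono PCM is a common choice for speech/audio analysis.
        private static final int SAMPLE_RATE = 16000;

        private volatile boolean running = true;

        public void run() {
            int minBuf = AudioRecord.getMinBufferSize(
                    SAMPLE_RATE,
                    AudioFormat.CHANNEL_IN_MONO,
                    AudioFormat.ENCODING_PCM_16BIT);
            AudioRecord record = new AudioRecord(
                    MediaRecorder.AudioSource.MIC,
                    SAMPLE_RATE,
                    AudioFormat.CHANNEL_IN_MONO,
                    AudioFormat.ENCODING_PCM_16BIT,
                    minBuf * 4); // headroom so reads don't underrun

            short[] buffer = new short[minBuf / 2];
            record.startRecording();
            try {
                while (running) {
                    int read = record.read(buffer, 0, buffer.length);
                    if (read > 0) {
                        process(buffer, read);
                    }
                }
            } finally {
                record.stop();
                record.release(); // free the mic so other apps can use it
            }
        }

        private void process(short[] samples, int count) {
            // Your signal processing goes here, e.g. FFT frames for
            // song-recognition-style fingerprinting.
        }
    }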