I am using the Google Speech API to convert voice to text. It works fine when I use my own recorded voice,
but the results are poor with a computer-generated female voice, such as a cell phone network operator's announcement.
Has anyone faced this kind of problem, or does anyone have a solution? Please help me solve this issue.
Thank you.
Did you set the correct sampling rate (sampleRateHertz) for the audio?
Did it return something close but not correct, or did it fail completely with no speech recognized at all? If nothing was transcribed, verify that you sent the correct audio information to the Speech API.
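One way to rule out a sample-rate mismatch is to read the rate from the audio file itself instead of hard-coding it. Below is a minimal sketch, assuming the audio is an uncompressed 16-bit WAV file (LINEAR16). The function names are hypothetical, and the returned dict only mirrors the field names of the REST RecognitionConfig; it does not call the API. Telephone-style recordings are often sampled at 8000 Hz, which is a common source of mismatch.

```python
import io
import wave


def recognition_config_for_wav(wav_bytes, language_code="en-US"):
    """Build a RecognitionConfig-style dict whose sampleRateHertz
    is taken from the WAV header rather than guessed."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getsampwidth() != 2:
            # LINEAR16 means 16-bit samples, i.e. 2 bytes per sample.
            raise ValueError("LINEAR16 expects 16-bit samples")
        return {
            "encoding": "LINEAR16",
            "sampleRateHertz": wav.getframerate(),
            "audioChannelCount": wav.getnchannels(),
            "languageCode": language_code,
        }


def make_silent_wav(rate, seconds=1):
    """Hypothetical demo helper: a silent mono 16-bit WAV at the given rate."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)
    return buf.getvalue()


if __name__ == "__main__":
    print(recognition_config_for_wav(make_silent_wav(8000)))
```

If the rate in the header differs from what you were sending in the request, that alone can explain poor results.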
Is there a way to disable auto correction for Google Cloud Speech to Text API? It is important for me to get accurate transcript of user's speech, with any errors they make rather than a corrected version.
It is difficult to distinguish between mistakes made by the speaker (grammar or pronunciation errors) in the audio content and mistakes made by the Speech API. However, you can inspect the alternative transcriptions the model produces behind the scenes by setting the maxAlternatives property of the API.
You have not provided an example of such a use case, but if you are expecting unusual pronunciations or acronyms, you can pass phrase hints with the request (the phrases list in the speechContexts field).
Please provide further details if this doesn't answer your question.
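As a rough illustration of the two settings mentioned above, here is a sketch that builds a REST-style request body with maxAlternatives and optional phrase hints, and then lists every alternative from a response. The function names, the default encoding and the 16000 Hz rate are assumptions for the example; adjust them to your audio. The code only builds and reads plain dicts and does not contact the API.

```python
def build_recognize_request(audio_b64, max_alternatives=5, phrases=()):
    """Build a speech:recognize-style request body (as a dict).

    audio_b64 is the base64-encoded audio content.
    The encoding, sample rate and language below are assumed values.
    """
    config = {
        "encoding": "LINEAR16",
        "sampleRateHertz": 16000,
        "languageCode": "en-US",
        "maxAlternatives": max_alternatives,
    }
    if phrases:
        # Phrase hints are passed as a list inside speechContexts.
        config["speechContexts"] = [{"phrases": list(phrases)}]
    return {"config": config, "audio": {"content": audio_b64}}


def list_alternatives(response):
    """Return (transcript, confidence) pairs for every alternative
    in a parsed recognize response."""
    pairs = []
    for result in response.get("results", []):
        for alt in result.get("alternatives", []):
            pairs.append((alt.get("transcript", ""), alt.get("confidence")))
    return pairs
```

Comparing the alternatives side by side will not tell you for certain which errors came from the speaker, but it does show where the model was undecided.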
When I try to recognize text in an image, such as the Italian word "Perchè", the Vision API returns "Perche" (a plain "e" instead of the accented "è").
I don't want to use languageHints to get better results because I have to run OCR across different languages.
What is the problem here?
This is a known issue with the Cloud Vision API when you don't use language hints.
You can see the actual bug report here.
Its status is accepted, but there has been radio silence on it for the last few months. A fix may take some time to roll out.
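Until the bug is fixed, the workaround is to send hints. For reference, this is a sketch of what a hinted request body looks like for the images:annotate REST endpoint. The function name is hypothetical and the code only builds the dict. languageHints takes a list, so more than one language can be passed, though I can't confirm whether a multi-language list resolves the accent problem in your case.

```python
import base64


def build_text_detection_request(image_bytes, language_hints=()):
    """Build an images:annotate-style request body for TEXT_DETECTION,
    optionally with language hints in imageContext."""
    request = {
        "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
        "features": [{"type": "TEXT_DETECTION"}],
    }
    if language_hints:
        request["imageContext"] = {"languageHints": list(language_hints)}
    return {"requests": [request]}
```

For example, `build_text_detection_request(data, ["it", "en"])` produces a body with both hints, while omitting the second argument leaves imageContext out entirely.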
Do any of you guys know of a free online (or offline, if it's in Java) language identifier service? (I don't want a tool you use manually. I need a service, since I have to do this identification programmatically.)
I've got a form and I'd like to figure out what language a user has written in.
Come to think of it, shouldn't this be doable through a Google thingy somehow? Since they detect page languages and all, and they're mostly open source...
Thanks for any help. Cheers!
[I added a "google-translate" tag since there isn't anything regarding text-recognition (there's image and voice but no text)]
Language Detection Library for Java looks like the kind of thing you are looking for.
Also see http://en.wikipedia.org/wiki/Language_identification for more links.
Language Detection API has a free plan. You can pass text via HTTP POST and receive a JSON result with detected languages and scores.
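To give an idea of what calling such a service programmatically looks like, here is a sketch using only the Python standard library. Everything service-specific in it is an assumption and not taken from the real API: the endpoint URL is a placeholder, and the `q` parameter, the bearer-token header and the response shape (a JSON list of objects with `language` and `score`) are made up for illustration. Check the service's documentation for the real values.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder endpoint, NOT the real service URL.
DETECT_URL = "https://example.invalid/detect"


def build_detect_request(text, api_key):
    """Build (but do not send) an HTTP POST request carrying the text.
    The 'q' field and Authorization header are assumed names."""
    body = urllib.parse.urlencode({"q": text}).encode("utf-8")
    return urllib.request.Request(
        DETECT_URL,
        data=body,
        method="POST",
        headers={"Authorization": "Bearer " + api_key},
    )


def best_language(response_json):
    """Pick the highest-scoring language from an assumed response shape:
    [{"language": "it", "score": 0.9}, ...]. Returns None if empty."""
    detections = json.loads(response_json)
    if not detections:
        return None
    return max(detections, key=lambda d: d["score"])["language"]
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` and feeding the decoded body to `best_language`.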
I am trying to build an iPhone app that connects to an IP camera. The IP camera is Windows-based, so I need to create a server using C++ and then stream the video to the iPhone app.
Can anyone tell me the best way to go about this task? I am new to programming, so a dummies-type guide would help.
Thanks
Inam
Go to the iTunes store and download the free app from Avigilon. You won't be able to see any video unless you connect to a system, but it will show you what ports and user information are needed. There are gateways and streaming methods involved as well. This is not a situation where a new developer will have a lot of success.
Your question is rather too broad to fit into a comment box. It seems, and correct me if I'm wrong, that you're basically asking for someone to write the applications for you.
Instead, if you're a complete beginner, you'll want to first learn how to program for the platform.
The Stack Overflow question "Howto articles for iPhone development, Objective C" will help you get started with programming for the iPhone.
Once you have the basics down, you might then ask more specific questions.
I need to allow users to upload a wide variety of video files in various formats, then clean them up and make them kosher for delivery to a dedicated content handler.
I've tried ffmpeg onsite, but it has some serious flaws with regard to H.264.
Then I tried flixcloud.com, which has a very good interface and API and was looking like the perfect solution, except that it doesn't report the video frame rate correctly.
Moving on, I tried Ankoder.com, and it does work, but unfortunately its API is somewhat of a mess and has some quirks that are proving difficult to code around.
What other services are out there? I will only accept an answer from someone who has actually used a video transcoding service.
Update:
Just started looking at http://www.encoding.com/ - seems interesting.
I have worked with a number of transcoding services in the past and have found flaws with just about all of them. For the last 3 projects I have worked on that involved media encoding, I have used Expression Encoder, with great results. The application itself is a pleasure to use and makes it simple to achieve the results needed, and the SDK is one of the best out there. Microsoft have definitely done a great job.