Google Vision API: both English and Arabic on the image - google-cloud-platform

We are trying to read text from images that contain English and/or Arabic text, and we need to extract the detected text in both languages.
When we pass the hints en and ar, English text is sometimes misinterpreted as Arabic. Yet if we pass English alone as the preferred language to the Vision service call, the English text is returned correctly.
But since we need both languages, I suppose we have to pass both en and ar.
Is this correct? Is there anything we can do about this?

As per the OCR Language Support documentation, English doesn't need to be specified, since it uses the Latin alphabet. Did you try specifying only Arabic in the hints?
If that doesn't work, could you attach an example image to this post, along with the code showing how you issue the API call? You could also post the relevant part of the response.
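If specifying only Arabic in the hints helps, the request could look like the sketch below. It targets the REST images:annotate endpoint (the helper function name is mine); passing just "ar" as a hint leaves the Latin-script English text to automatic detection:

```python
import base64
import json

def build_annotate_request(image_bytes, language_hints):
    """Build a JSON body for the Vision images:annotate endpoint with
    TEXT_DETECTION and the given languageHints, e.g. ["ar"] to hint
    Arabic while letting Latin text be detected automatically."""
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": "TEXT_DETECTION"}],
            "imageContext": {"languageHints": language_hints},
        }]
    }

body = build_annotate_request(b"<raw image bytes>", ["ar"])
print(json.dumps(body["requests"][0]["imageContext"]))
```

The same imageContext/languageHints structure applies if you go through a client library instead of raw REST.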

Related

Is there a way to specify that all letters in a document are capitalized in Google Cloud Vision API document text detection

I have several handwritten documents that were written in all caps. When comparing OCR results with the original documents, almost all errors come from reading the text in lowercase:
BUILDINGS is read as Buchonas, RED as reco, OCCASIONAL as deCAS AL.
Is there a way to tell the Vision API, in Python, that a document contains only numbers and capital letters? Something like adding an image context of "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890,.-!?".
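As far as I know the Vision API exposes no character-whitelist option, so a common workaround is to post-process the OCR output: uppercase it, strip unexpected characters, and optionally snap each word to a known vocabulary. A standard-library sketch (the vocabulary and similarity cutoff are illustrative, not tuned):

```python
import difflib

# The expected alphabet from the question; anything outside it is noise.
ALLOWED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890,.-!? ")

def normalize_caps(ocr_text):
    """Uppercase the OCR output and drop characters outside ALLOWED."""
    return "".join(ch for ch in ocr_text.upper() if ch in ALLOWED)

def snap_to_vocab(word, vocab, cutoff=0.4):
    """Replace an OCR'd word with the closest vocabulary entry,
    falling back to the cleaned-up original if nothing is close."""
    matches = difflib.get_close_matches(word.upper(), vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else normalize_caps(word)

vocab = ["BUILDINGS", "RED", "OCCASIONAL"]
print(snap_to_vocab("reco", vocab))
```

Uppercasing alone fixes the case errors; the dictionary snap is what recovers misread letters, so it only helps when you can enumerate the expected words.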

Searching for content in the original language returns no results, English works fine

When I search for a keyword in the original language of an uploaded video, I get no results, whereas if I use the translated keyword in English, results are returned correctly. Here are the steps I used:
Logged into Azure Video Indexer.
Uploaded a video whose audio is in Arabic. Made sure the correct language, Arabic, is selected.
Waited until indexer completed the indexing.
Searched for a keyword in Arabic like 'حديث', but got no results.
Changed the filter by selecting a language from the dropdown (I chose Arabic, which added a Language: ar-EG tag to the filter).
Yet again, the search returned no results.
When I searched for the translated English text, Talk, the results were returned as expected.
I haven't tried to use the API instead of the Web UI, but I think I may have made a mistake somewhere.
Did anyone face a similar issue? Or is there anything I'm doing incorrectly?
Thanks
The Video Indexer team has fixed this issue.
You should now see the expected video when searching for an Arabic keyword.

Computer Vision API Hand Writing

I am 99% sure it is not possible, but is there a way to have the response note if the text it read was "hand written"?
I have dug around the Microsoft documentation and do not see anything.
Thanks in advance!
Michael
Yes, the Read 3.2 preview API outputs an appearance object classifying whether each text line is print or handwriting style, along with a confidence score. This feature is supported only for Latin languages.
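Assuming the v3.2 preview JSON layout (analyzeResult → readResults → lines, each line carrying an appearance.style with a name and confidence), the classification can be read back like this; the sample payload is hypothetical:

```python
def split_by_style(read_result):
    """Split lines from a Read 3.2 analyzeResults payload into printed
    vs handwritten, using the per-line appearance object. Field names
    follow the v3.2 preview schema; adjust for other API versions."""
    printed, handwritten = [], []
    for page in read_result["analyzeResult"]["readResults"]:
        for line in page["lines"]:
            style = line.get("appearance", {}).get("style", {})
            entry = (line["text"], style.get("confidence"))
            if style.get("name") == "handwriting":
                handwritten.append(entry)
            else:
                printed.append(entry)
    return printed, handwritten

sample = {"analyzeResult": {"readResults": [{"lines": [
    {"text": "Invoice #42",
     "appearance": {"style": {"name": "print", "confidence": 0.99}}},
    {"text": "Thanks, Michael",
     "appearance": {"style": {"name": "handwriting", "confidence": 0.93}}},
]}]}}
printed, handwritten = split_by_style(sample)
```

Remember the print/handwriting classification is only supported for Latin-script languages, per the note above.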

Using Arabic text with custom Font in Cocos2DX

I have a use-case involving Arabic text in a game, with custom font. I am currently using the createWithTTF API call, and selecting the Font file that I would need.
However, since Arabic is a Right-to-Left (RTL) language rather than a Left-to-Right (LTR) one, the text is getting printed incorrectly. Apparently the usual workaround is the createWithSystemFont API call, but with that call I would not be able to use a custom font and would have to resort to a system font.
Is there any way in Cocos2DX to use a custom font with Arabic text? I did look into this GitHub issue. I tried the Arabic Writer out, but it gives glitchy output in certain cases. I know that editing the source JSON/Plist file is an option, and I have tried using reversed Arabic strings in the source. However, since Arabic uses combined characters, the result I get in my UI is not 1:1 with the expected result, and some characters are disjointed (when they are supposed to merge into a single joined form).
Looking for suggestions on how to tackle this. I have looked into almost all open threads related to this, and could not find anything conclusive. Thanks!
I wrote a fix for the Persian language. It works for Arabic as well, but you may need to add some Arabic-only characters to it (might need some editing):
https://github.com/MohammadFakhreddin/cocos2dx-persian-arabic-support
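The disjointed characters come from Arabic's contextual shaping: each letter has up to four glyph forms, and a renderer that does no shaping (or a naively reversed string) falls back to the isolated form everywhere. A small standard-library sketch showing the four Unicode presentation forms of one letter, BEH, to illustrate what a shaping fix has to substitute:

```python
import unicodedata

# Base letter BEH is U+0628; its four contextual glyphs live in the
# Arabic Presentation Forms-B block. Without shaping, only the
# isolated form is drawn, which is why letters appear disjointed.
beh_forms = {
    "isolated": "\uFE8F",
    "final":    "\uFE90",
    "initial":  "\uFE91",
    "medial":   "\uFE92",
}
for form, ch in beh_forms.items():
    print(form, hex(ord(ch)), unicodedata.name(ch))
```

Fixes like the linked repository work by mapping each base letter to the correct presentation form for its position in the word, and only then reversing the string for an LTR renderer.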

Error while filtering English language tweets only

I am extracting tweets written only in English, and I used the following filter:
stream.filter(stall_warnings=True, track=['#brain'], languages=['en'])
But unfortunately this filter returns tweets that mix English with some other language.
Please see the tweet here
How can I extract only tweets that are written entirely in English?
Note: I am sorry if it is wrong to link someone else's tweet.
Tweets are classified by Twitter as one language or another, and the classification isn't always correct. If a tweet uses multiple languages, Twitter just assigns it to one of them.
So you will need to filter them in your app against a dictionary, or with a language detection library, to be sure that only English is used in the tweets you receive.
Source: https://blog.twitter.com/2013/introducing-new-metadata-for-tweets
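Following the suggestion above, a cheap first-pass post-filter (before reaching for a full language-detection library) could simply reject tweets containing non-Latin letters. A heuristic sketch, not a real detector:

```python
def is_english_only(text):
    """Heuristic post-filter: keep a tweet only if every alphabetic
    character is basic Latin (ASCII). This drops mixed-language tweets
    that Twitter's single lang field still labels 'en'. Punctuation,
    digits, and emoji are ignored by the check."""
    return all(ord(ch) < 128 for ch in text if ch.isalpha())

tweets = ["brain waves are cool", "brain \u062f\u0645\u0627\u063a mix"]
english = [t for t in tweets if is_english_only(t)]
```

In a tweepy stream you would apply such a check in your status callback before storing the tweet; for borderline cases a dedicated language-detection library is still the safer choice.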