I recently found an OCR tool called PaddleOCR. Has anyone used it, and how does its performance compare to the Google Cloud Vision API?
PaddleOCR describes itself as an industry-level open-source OCR engine, so I tested a few images with both it and Google Cloud Vision.
Generally speaking, commercial APIs like Google Cloud and Azure are supposed to work better than open-source OCR engines, and they do, but in some scenarios the gap is not large.
If the text is clear and flat, both work great. The main difference is the result format. The Google API gives you rich content, including block, paragraph, and word location information. PaddleOCR only returns results per text line (transcriptions and locations).
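To make the per-line format concrete, here is a minimal sketch that flattens line-level results into simple records. It assumes each line has the shape `[box, (text, score)]`, where `box` is four `[x, y]` corner points; that is what the PaddleOCR 2.x Python package typically returns, but the shape and the score field are assumptions to check against your version. The function name and sample data are made up for illustration.

```python
def lines_to_records(lines):
    """Convert line-level OCR results into dicts with an axis-aligned bbox.

    Assumed input shape per line: [box, (text, score)], with box being
    four [x, y] corner points (an assumption, verify for your version).
    """
    records = []
    for box, (text, score) in lines:
        xs = [point[0] for point in box]
        ys = [point[1] for point in box]
        records.append({
            "text": text,
            "score": score,
            # (left, top, right, bottom) of the quadrilateral
            "bbox": (min(xs), min(ys), max(xs), max(ys)),
        })
    return records


# Made-up sample data in the assumed shape
sample_lines = [
    [[[10, 10], [110, 10], [110, 30], [10, 30]], ("Hello", 0.98)],
    [[[12, 40], [90, 42], [90, 60], [12, 58]], ("world", 0.91)],
]

for record in lines_to_records(sample_lines):
    print(record["text"], record["bbox"])
```

If you need word or paragraph structure like the Google API provides, you would have to derive it yourself from these line records.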
If your test images are more complicated, for example curved text, handwriting, or blurry text, the commercial APIs will probably work better than the open-source engine. However, when they cannot meet your needs, you can try training a new model with PaddleOCR.
Here are some visualization images:
PaddleOCR:
test1
test2
Google Cloud Vision API:
test1
test2
Related
I have a text in Japanese that I'm turning into an mp3 with the Google Cloud Text-to-Speech functionality.
I also want to have word timestamps for the mp3 that gets returned by Google.
Google Speech-to-Text offers this functionality, but when I submit the files I get from TTS to STT, the result is not always good.
What is the best way to also get word timestamps for the TTS mp3?
Google Cloud Speech-to-Text is an ML-based service, so it is expected that the results are not always as "good" as you might hope; it has its limitations.
I would suggest looking at the relevant documentation on this topic, such as the best practices, the guide, and the basics page. You could also look through their issue tracker, for example this issue, for more information. If you find a reproducible issue with the service, you can report it there so their team is aware of it.
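For the timestamp question itself, one approach that avoids the round trip through Speech-to-Text is to put an SSML `<mark>` tag before each word and ask Text-to-Speech to report when each mark is reached. As far as I know this is exposed in the v1beta1 API through an `enable_time_pointing` request option, but treat that as an assumption and verify it against the current documentation. The sketch below only builds the SSML; the function name is hypothetical, and it assumes the Japanese text has already been segmented into words by some other tool.

```python
from xml.sax.saxutils import escape


def build_ssml_with_marks(words):
    """Build SSML with one <mark> before each pre-segmented word.

    Marks are named w0, w1, ... so that returned timepoints can be
    matched back to word indices. Words are joined without spaces,
    which suits Japanese text.
    """
    parts = ["<speak>"]
    for index, word in enumerate(words):
        # escape() protects against &, < and > inside the text
        parts.append(f'<mark name="w{index}"/>{escape(word)}')
    parts.append("</speak>")
    return "".join(parts)


# Assumed segmentation, done elsewhere
ssml = build_ssml_with_marks(["今日", "は", "晴れ", "です"])
print(ssml)
```

Each returned timepoint would then carry a mark name such as `w2`, which maps back to the third word.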
I need to transcribe a large number of handwritten documents. I tried cloud services from Google, Amazon, and Microsoft, namely:
https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/
https://cloud.google.com/vision/docs/handwriting
https://aws.amazon.com/textract/
Unfortunately, none of them achieved good enough results. I suspect it is because my documents have a weird handwriting style, and as a result, the networks struggle a lot.
I searched for a way to fine-tune these models (with manually transcribed data), but I have not found anything online, so as a last resort, I am asking here.
If it is possible to fine-tune one of these models, could you please point me to some resources?
You are correct: with Computer Vision in Azure Cognitive Services, you cannot upload your own data to train the API to recognise the handwriting in your documents, I'm afraid. I can't comment on the offerings from AWS and Google, but this is certainly not possible with Azure.
I want to do OCR, and I know that the Cloud Vision API supports it. But I'm interested in building my own custom model for it and would like to use AutoML for that. I couldn't find anything related to OCR using AutoML. Is it possible to do OCR using AutoML? How do we go about this? I know this is a very open-ended question, but I'd appreciate some help.
AutoML Natural Language can perform OCR on PDFs; however, this is only an intermediate step, because the product is intended for creating your own models for text classification, entity extraction, or sentiment analysis.
If your goal is just to perform OCR, the best approach is the Vision API.
You cannot do OCR from AutoML. Your options are to use the Cloud Vision API to do OCR and then apply your own algorithms to put the detected letters together in a certain way, or to start from scratch and train your own OCR model (not recommended).
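As an illustration of "apply your own algorithms" on top of the OCR output, here is a minimal sketch that groups detected words into text lines by their vertical position. The input format (text plus top-left x and y), the function name, and the tolerance value are all assumptions made for this example; real Vision API responses give bounding polygons that you would first reduce to coordinates like these.

```python
def group_words_into_lines(words, y_tolerance=10):
    """Group (text, x, y) word tuples into lines of text.

    Words whose y coordinate is within y_tolerance of a line's first
    word are put on that line; each line is then ordered left to right.
    """
    lines = []
    for text, x, y in sorted(words, key=lambda w: (w[2], w[1])):
        for line in lines:
            if abs(line["y"] - y) <= y_tolerance:
                line["items"].append((x, text))
                break
        else:
            # No existing line is close enough, so start a new one
            lines.append({"y": y, "items": [(x, text)]})
    return [" ".join(t for _, t in sorted(line["items"])) for line in lines]


# Made-up word positions
detected = [
    ("world", 60, 12),
    ("Hello", 10, 10),
    ("Second", 10, 50),
    ("line", 80, 48),
]
print(group_words_into_lines(detected))
```

This simple heuristic breaks down on rotated or curved text, where you would need something more robust.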
I am doing OCR using the Google Cloud Vision API.
To make it easier to check the results, I'd like to visualize which parts of the output need careful review and which parts can be trusted, depending on how reliable the API output is.
I looked but couldn't find it: does the API have the ability to output a confidence level? I would very much appreciate it if you could tell me.
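To my knowledge, the document text detection response includes a `confidence` value on pages, blocks, paragraphs, words, and symbols inside `fullTextAnnotation`. Assuming that JSON shape, a minimal sketch for picking out the words that deserve a closer look could be the following; the function name, threshold, and sample data are made up for illustration.

```python
def low_confidence_words(full_text_annotation, threshold=0.8):
    """Return (text, confidence) for words below the threshold.

    Expects a dict shaped like the REST response's fullTextAnnotation:
    pages -> blocks -> paragraphs -> words -> symbols (assumed shape).
    """
    flagged = []
    for page in full_text_annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                for word in paragraph.get("words", []):
                    text = "".join(
                        symbol.get("text", "") for symbol in word.get("symbols", [])
                    )
                    # Missing confidence is treated as 0.0, so it gets flagged
                    confidence = word.get("confidence", 0.0)
                    if confidence < threshold:
                        flagged.append((text, confidence))
    return flagged


# Made-up annotation in the assumed shape
sample_annotation = {
    "pages": [{"blocks": [{"paragraphs": [{"words": [
        {"symbols": [{"text": "O"}, {"text": "K"}], "confidence": 0.99},
        {"symbols": [{"text": "h"}, {"text": "m"}], "confidence": 0.42},
    ]}]}]}]
}
print(low_confidence_words(sample_annotation))
```

The flagged words, together with their bounding boxes, could then be highlighted in the visualization.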
I want to use the Google Cloud Vision API in my Android app to detect whether the uploaded picture is mainly food or not. The problem is that the response JSON is rather big and confusing. It says a lot about the picture but doesn't say what the whole picture is of (food or something like that). I contacted the support team but didn't get an answer.
What you really want is a custom classification, not specifically raw Cloud Vision annotation.
Either use https://cloud.google.com/automl/ or reinvent the wheel like I did: https://stackoverflow.com/a/55880316/322020
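If a custom classifier is more than you need, a rougher alternative is to reduce the label detection part of the response to a yes/no answer yourself. The sketch below assumes the `labelAnnotations` list from the JSON response, where each entry has a `description` and a `score`; the set of food-related terms, the threshold, and the function name are hypothetical choices for illustration, not anything the API defines. It is written in Python for brevity, but the same logic ports directly to an Android app.

```python
# Hypothetical list of labels to count as "food"; tune for your data
FOOD_TERMS = {"food", "dish", "cuisine", "meal", "ingredient"}


def looks_like_food(label_annotations, min_score=0.7):
    """Return True if any sufficiently confident label is food-related."""
    return any(
        label.get("description", "").lower() in FOOD_TERMS
        and label.get("score", 0.0) >= min_score
        for label in label_annotations
    )


# Made-up labels in the assumed response shape
sample_labels = [
    {"description": "Food", "score": 0.97},
    {"description": "Tableware", "score": 0.88},
]
print(looks_like_food(sample_labels))
```

A fixed term list like this will miss many cases, which is where a custom classification model becomes the better option.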