I have images of receipts and I want to store the text in the images separately. Is it possible to detect text from images using Amazon Rekognition?
Update from November 2017:
Amazon Rekognition announces real-time face recognition, Text in Image
recognition, and improved face detection
Read the announcement here: https://aws.amazon.com/about-aws/whats-new/2017/11/amazon-rekognition-announces-real-time-face-recognition-text-in-image-recognition-and-improved-face-detection/
No. Amazon Rekognition does not provide Optical Character Recognition (OCR).
At the time of writing (March 2017), it only provides:
Object and Scene Detection
Facial Analysis
Face Comparison
Facial Recognition
There is no AWS-provided service that offers OCR. You would need to use a 3rd-party product.
Amazon doesn't provide an OCR API. You can use the Google Cloud Vision API for Document Text Recognition, though it costs $3.50 per 1,000 images. To test Google's API, open the link below and paste the request body that follows into the test request body on the right.
https://cloud.google.com/vision/docs/reference/rest/v1/images/annotate
{
  "requests": [
    {
      "image": {
        "source": {
          "imageUri": "JPG_PNG_GIF_or_PDF_url"
        }
      },
      "features": [
        {
          "type": "DOCUMENT_TEXT_DETECTION"
        }
      ]
    }
  ]
}
You may get better results with Amazon Textract, although it's currently only available in limited preview.
It's possible to detect text in an image using the AWS JS SDK for Rekognition but your results may vary.
/* jshint esversion: 8, node:true, devel: true, undef: true, unused: true */

// Import libs.
const AWS = require('aws-sdk');
const axios = require('axios');

// Grab AWS access keys from environment variables.
const { S3_ACCESS_KEY, S3_SECRET_ACCESS_KEY, S3_REGION } = process.env;

// Configure AWS with credentials.
AWS.config.update({
  accessKeyId: S3_ACCESS_KEY,
  secretAccessKey: S3_SECRET_ACCESS_KEY,
  region: S3_REGION
});

const rekognition = new AWS.Rekognition({
  apiVersion: '2016-06-27'
});

const TEXT_IMAGE_URL = 'https://loremflickr.com/g/320/240/text';

(async url => {
  // Fetch the image.
  const textDetections = await axios
    .get(url, {
      responseType: 'arraybuffer'
    })
    // Convert the binary response body to a Buffer
    // (Buffer.from replaces the deprecated new Buffer()).
    .then(response => Buffer.from(response.data))
    // Pass the bytes to the SDK.
    .then(bytes =>
      rekognition
        .detectText({
          Image: {
            Bytes: bytes
          }
        })
        .promise()
    )
    .catch(error => {
      console.log('[ERROR]', error);
      return false;
    });

  if (!textDetections) return console.log('Failed to find text.');

  // Output the raw response.
  console.log('\n', 'Text Detected:', '\n', textDetections);

  // Output the detected text only.
  console.log('\n', 'Found Text:', '\n', textDetections.TextDetections.map(t => t.DetectedText));
})(TEXT_IMAGE_URL);
See more examples of using Rekognition with NodeJS in this answer.
Here is an equivalent example in C# using the AWS SDK for .NET:

public async Task<List<string>> IdentifyText(string filename)
{
    // Using USEast1, not the default region.
    AmazonRekognitionClient rekoClient = new AmazonRekognitionClient("Access Key ID", "Secret Access Key", RegionEndpoint.USEast1);

    Amazon.Rekognition.Model.Image img = new Amazon.Rekognition.Model.Image();
    byte[] data = null;
    using (FileStream fs = new FileStream(filename, FileMode.Open, FileAccess.Read))
    {
        data = new byte[fs.Length];
        fs.Read(data, 0, (int)fs.Length);
    }
    img.Bytes = new MemoryStream(data);

    DetectTextRequest dfr = new DetectTextRequest();
    dfr.Image = img;
    var outcome = await rekoClient.DetectTextAsync(dfr);

    return outcome.TextDetections.Select(x => x.DetectedText).ToList();
}
Related
I am trying to get a live video stream from Amazon KVS to show in a dashboard that I am building using React. I am very new to the Amazon KVS ecosystem and have no idea how things work, hence asking you good folks here.
I tried referring to 'amazon-kinesis-video-streams-webrtc-sdk-js' on GitHub, but it does not clearly explain how to fetch the live video stream from KVS. The following is what I've tried, and I have no clear idea what I should do with endpointsByProtocol next to get the stream. I am a bit on a deadline as well; hope someone can help.
var options = {
    accessKeyId: response?.access_key_id,
    secretAccessKey: response?.secret_access_key,
    region: response?.region,
    sessionToken: response?.session_token
};

const kinesisVideoClient = new AWS.KinesisVideo(options);

const { ChannelInfo } = await kinesisVideoClient.describeSignalingChannel({
    ChannelName: channelName
}).promise();
const { ChannelARN, ChannelName } = ChannelInfo;

const getSignalingChannelEndpointResponse = await kinesisVideoClient.getSignalingChannelEndpoint({
    ChannelARN: ChannelARN,
    SingleMasterChannelEndpointConfiguration: {
        Protocols: ['WSS', 'HTTPS'],
        Role: Role.VIEWER,
    },
})
.promise();

const endpointsByProtocol = getSignalingChannelEndpointResponse.ResourceEndpointList.reduce((endpoints, endpoint) => {
    endpoints[endpoint.Protocol] = endpoint.ResourceEndpoint;
    return endpoints;
}, {});
P.S.: Apologies if this is something very basic, and thanks.
I tried referring to the official AWS docs as well, but they were not clear.
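Going by the amazon-kinesis-video-streams-webrtc-sdk-js README, the missing step is to feed the WSS endpoint into a SignalingClient and wire it to a standard RTCPeerConnection. A minimal sketch, reusing ChannelARN, endpointsByProtocol, and options from the code above (clientId and videoElement are placeholders for values from the React component):

const { SignalingClient, Role } = require('amazon-kinesis-video-streams-webrtc');

const signalingClient = new SignalingClient({
    channelARN: ChannelARN,
    channelEndpoint: endpointsByProtocol.WSS,
    clientId: 'dashboard-viewer', // any unique id for this viewer
    role: Role.VIEWER,
    region: options.region,
    credentials: {
        accessKeyId: options.accessKeyId,
        secretAccessKey: options.secretAccessKey,
        sessionToken: options.sessionToken,
    },
});

const peerConnection = new RTCPeerConnection();

signalingClient.on('open', async () => {
    // Ask the master for video, then send it our SDP offer.
    peerConnection.addTransceiver('video', { direction: 'recvonly' });
    const offer = await peerConnection.createOffer();
    await peerConnection.setLocalDescription(offer);
    signalingClient.sendSdpOffer(peerConnection.localDescription);
});

signalingClient.on('sdpAnswer', answer => peerConnection.setRemoteDescription(answer));
signalingClient.on('iceCandidate', candidate => peerConnection.addIceCandidate(candidate));

// Trickle our ICE candidates back to the master as they are gathered.
peerConnection.addEventListener('icecandidate', ({ candidate }) => {
    if (candidate) signalingClient.sendIceCandidate(candidate);
});

// When the remote stream arrives, attach it to a <video> element.
peerConnection.addEventListener('track', event => {
    videoElement.srcObject = event.streams[0];
});

signalingClient.open();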
I am trying to upload an image file to S3 but get this error:
ERROR: MethodNotAllowed: The specified method is not allowed against this resource.
My code uses the @aws-sdk/client-s3 package and uploads with this code:
const s3 = new S3({
    region: 'us-east-1',
    credentials: {
        accessKeyId: config.accessKeyId,
        secretAccessKey: config.secretAccessKey,
    }
});

exports.uploadFile = async options => {
    options.internalPath = options.internalPath || (`${config.s3.internalPath + options.moduleName}/`);
    options.ACL = options.ACL || 'public-read';
    logger.info(`Uploading [${options.path}]`);

    const params = {
        Bucket: config.s3.bucket,
        Body: fs.createReadStream(options.path),
        Key: options.internalPath + options.fileName,
        ACL: options.ACL
    };

    try {
        const s3Response = await s3.completeMultipartUpload(params);
        if (s3Response) {
            logger.info(`Done uploading, uploaded to: ${s3Response.Location}`);
            return { url: s3Response.Location };
        }
    } catch (err) {
        logger.error(err, 'unable to upload:');
        throw err;
    }
};
I am not sure what this error means, and once the file is uploaded I need to get its location in S3.
Thanks for any help.
For uploading a single image file you need to be calling s3.upload(), not s3.completeMultipartUpload().
If you had very large files and wanted to upload them in multiple parts, the workflow would look like the following (a v3 sketch follows the list):
s3.createMultipartUpload()
s3.uploadPart()
s3.uploadPart()
...
s3.completeMultipartUpload()
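With @aws-sdk/client-s3 (v3), which the question uses, that whole workflow is wrapped by the Upload helper from @aws-sdk/lib-storage. A minimal sketch, reusing the question's s3 client and params:

const { Upload } = require('@aws-sdk/lib-storage');

// Upload splits large bodies into parts, uploads them in parallel,
// and completes the multipart upload for you.
const upload = new Upload({
    client: s3,
    params, // the same { Bucket, Key, Body, ACL } object from the question
});

const result = await upload.done();
console.log(result.Location);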
Looking at the official documentation, it looks like the new way to do a simple S3 upload in the v3 JavaScript SDK is this:
s3.send(new PutObjectCommand(uploadParams));
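A fuller sketch of that approach, reusing the s3 client and config from the question (note that, unlike v2's upload(), a PutObjectCommand response carries no Location field, so build the URL from the bucket, region, and key yourself):

const { PutObjectCommand } = require('@aws-sdk/client-s3');
const fs = require('fs');

exports.uploadFile = async options => {
    const params = {
        Bucket: config.s3.bucket,
        Body: fs.createReadStream(options.path),
        Key: options.internalPath + options.fileName,
        ACL: options.ACL || 'public-read',
    };

    await s3.send(new PutObjectCommand(params));

    // Build the object URL manually; adjust if you serve from a custom domain.
    const url = `https://${params.Bucket}.s3.us-east-1.amazonaws.com/${params.Key}`;
    return { url };
};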
We are trying to integrate the Amazon Textract API in our Node.js application. We are facing an issue with the FeatureTypes parameter while processing an image. We need to achieve the option below via the API, but we are not finding it in the AWS JavaScript SDK:
export type FeatureType = "TABLES"|"FORMS"|string;
I'm trying this code:
const params = {
    Document: {
        /* required */
        Bytes: Buffer.from(fileData)
    },
    FeatureTypes: [""] // here I'm facing the issue; if I pass "TABLES" or "FORMS" it works
};

var textract = new AWS.Textract({
    region: awsConfig.awsRegion,
    accessKeyId: awsConfig.awsAccesskeyID,
    secretAccessKey: awsConfig.awsSecretAccessKey
});

textract.analyzeDocument(params, (err, data) => {
    console.log(err, data);
    if (err) {
        // resolve comes from the surrounding Promise (not shown).
        return resolve(err);
    } else {
        resolve(data);
    }
});
Getting this error:
InvalidParameterType: Expected params.FeatureTypes[0] to be a string
If I pass "TABLES" or "FORMS" it works, but I need the Raw Text option.
Thanks in advance
You have been calling the analyzeDocument() function:
Analyzes an input document for relationships between detected items.
It returns various types of blocks:
'BlockType': 'KEY_VALUE_SET'|'PAGE'|'LINE'|'WORD'|'TABLE'|'CELL'|'SELECTION_ELEMENT',
The LINE and WORD blocks seem to match your requirements.
Alternatively, there is also a detectDocumentText() function:
Detects text in the input document. Amazon Textract can detect lines of text and the words that make up a line of text.
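detectDocumentText() takes no FeatureTypes parameter at all, so it sidesteps the error. A minimal sketch, reusing awsConfig and fileData from the question:

const AWS = require('aws-sdk');

const textract = new AWS.Textract({
    region: awsConfig.awsRegion,
    accessKeyId: awsConfig.awsAccesskeyID,
    secretAccessKey: awsConfig.awsSecretAccessKey
});

textract.detectDocumentText({
    Document: { Bytes: Buffer.from(fileData) }
}, (err, data) => {
    if (err) return console.error(err);
    // Keep only the LINE blocks and join them to get the raw text.
    const lines = data.Blocks
        .filter(block => block.BlockType === 'LINE')
        .map(block => block.Text);
    console.log(lines.join('\n'));
});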
I'm trying to make a skill based on the Cake Time tutorial, but whenever I try to invoke my skill I get an error that I can't explain.
This is my invoking function.
const LaunchRequestHandler = {
    canHandle(handlerInput) {
        console.log(`Can Handle Launch Request ${(Alexa.getRequestType(handlerInput.requestEnvelope) === "LaunchRequest")}`);
        return (
            Alexa.getRequestType(handlerInput.requestEnvelope) === "LaunchRequest"
        );
    },
    handle(handlerInput) {
        const speakOutput =
            "Bem vindo, que série vai assistir hoje?";
        console.log("handling launch request");
        console.log(speakOutput);
        return handlerInput.responseBuilder
            .speak(speakOutput)
            .reprompt(speakOutput)
            .getResponse();
    },
};
It should only speak a message in Portuguese ("Bem vindo, que série vai assistir hoje?" — "Welcome, which series are you going to watch today?"), but instead it tries to access an Amazon S3 bucket for some reason and prints this error to the console.
~~~~ Error handled: AskSdk.S3PersistenceAdapter Error: Could not read item (amzn1.ask.account.NUMBEROFACCOUNT) from bucket (undefined): Missing required key 'Bucket' in params
at Object.createAskSdkError (path\MarcaEpisodio\lambda\node_modules\ask-sdk-s3-persistence-adapter\dist\utils\AskSdkUtils.js:22:17)
at S3PersistenceAdapter.<anonymous> (path\MarcaEpisodio\lambda\node_modules\ask-sdk-s3-persistence-adapter\dist\attributes\persistence\S3PersistenceAdapter.js:90:45)
at step (path\MarcaEpisodio\lambda\node_modules\ask-sdk-s3-persistence-adapter\dist\attributes\persistence\S3PersistenceAdapter.js:44:23)
at Object.throw (path\MarcaEpisodio\lambda\node_modules\ask-sdk-s3-persistence-adapter\dist\attributes\persistence\S3PersistenceAdapter.js:25:53)
at rejected (path\MarcaEpisodio\lambda\node_modules\ask-sdk-s3-persistence-adapter\dist\attributes\persistence\S3PersistenceAdapter.js:17:65)
at processTicksAndRejections (internal/process/task_queues.js:93:5)
Skill response
{
    "type": "SkillResponseSuccessMessage",
    "originalRequestId": "wsds-transport-requestId.v1.IDREQUESTED",
    "version": "1.0",
    "responsePayload": "{\"version\":\"1.0\",\"response\":{\"outputSpeech\":{\"type\":\"SSML\",\"ssml\":\"<speak>Desculpe, não consegui fazer o que pediu.</speak>\"},\"reprompt\":{\"outputSpeech\":{\"type\":\"SSML\",\"ssml\":\"<speak>Desculpe, não consegui fazer o que pediu.</speak>\"}},\"shouldEndSession\":false},\"userAgent\":\"ask-node/2.10.2 Node/v14.16.0\",\"sessionAttributes\":{}}"
}
----------------------
I've removed some ID information from the stack trace, but I think it's not relevant here.
The only thing I can think of that touches S3 is the S3 persistence adapter I add in the Alexa skill builder:
exports.handler = Alexa.SkillBuilders.custom()
    .withApiClient(new Alexa.DefaultApiClient())
    .withPersistenceAdapter(
        new persistenceAdapter.S3PersistenceAdapter({
            bucketName: process.env.S3_PERSISTENCE_BUCKET,
        })
    )
    .addRequestHandlers(
        LaunchRequestHandler,
        MarcaEpisodioIntentHandler,
        HelpIntentHandler,
        CancelAndStopIntentHandler,
        SessionEndedRequestHandler,
        IntentReflectorHandler // make sure IntentReflectorHandler is last so it doesn't override your custom intent handlers
    )
    .addRequestInterceptors(MarcaEpisodioInterceptor)
    .addErrorHandlers(ErrorHandler)
    .lambda();
These are the intents I've created (screenshot omitted), and this is the function that should handle them:
const Alexa = require("ask-sdk-core");
const persistenceAdapter = require("ask-sdk-s3-persistence-adapter");

const intentName = "MarcaEpisodioIntent";

const MarcaEpisodioIntentHandler = {
    canHandle(handlerInput) {
        console.log("Trying to handle with marca episodio intent");
        return (
            Alexa.getRequestType(handlerInput.requestEnvelope) !== "LaunchRequest" &&
            Alexa.getRequestType(handlerInput.requestEnvelope) === "IntentRequest" &&
            Alexa.getIntentName(handlerInput.requestEnvelope) === intentName
        );
    },

    async handle(handlerInput) {
        const serie = handlerInput.requestEnvelope.request.intent.slots.serie.value;
        const episodio =
            handlerInput.requestEnvelope.request.intent.slots.episodio.value;
        const temporada =
            handlerInput.requestEnvelope.request.intent.slots.temporada.value;

        const attributesManager = handlerInput.attributesManager;
        const serieMark = {
            serie: serie,
            episodio: episodio,
            temporada: temporada,
        };

        attributesManager.setPersistentAttributes(serieMark);
        await attributesManager.savePersistentAttributes();

        const speakOutput = `${serie} marcada no episódio ${episodio} da temporada ${temporada}`;
        return handlerInput.responseBuilder.speak(speakOutput).getResponse();
    },
};

module.exports = MarcaEpisodioIntentHandler;
Any help will be appreciated.
instead it tries to access amazon S3 bucket for some reason
First, your persistence adapter is loaded and configured before your intent handlers' canHandle functions are polled, and it is then used in the RequestInterceptor before any of them run.
If there's a problem there, it'll break before you ever get to your LaunchRequestHandler, which is what is happening.
Second, are you building an Alexa-hosted skill in the Alexa developer console or are you hosting your own Lambda via AWS?
Alexa-hosted creates a number of resources for you, including an Amazon S3 bucket and an Amazon DynamoDb table, then ensures the AWS Lambda it creates for you has the necessary role settings and the right information in its environment variables.
If you're hosting your own via AWS, your Lambda will need a role with read/write permissions on your S3 resources and you'll need to set the bucket where you're storing persistent values as an environment variable for your Lambda (or replace the process.env.S3_PERSISTENCE_BUCKET with a string containing the bucket name).
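A minimal sketch of that last option ('my-skill-persistence-bucket' is a placeholder for the bucket you created):

const Alexa = require('ask-sdk-core');
const persistenceAdapter = require('ask-sdk-s3-persistence-adapter');

// Fall back to a hard-coded bucket when the environment variable is absent.
const bucketName = process.env.S3_PERSISTENCE_BUCKET || 'my-skill-persistence-bucket';

exports.handler = Alexa.SkillBuilders.custom()
    .withPersistenceAdapter(
        new persistenceAdapter.S3PersistenceAdapter({ bucketName })
    )
    .addRequestHandlers(LaunchRequestHandler)
    .lambda();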
When I call the Google API below, it returns "We can not access the URL currently." But the resource definitely exists and can be accessed.
https://vision.googleapis.com/v1/images:annotate
request content:
{
  "requests": [
    {
      "image": {
        "source": {
          "imageUri": "http://yun.jybdfx.com/static/img/homebg.jpg"
        }
      },
      "features": [
        {
          "type": "TEXT_DETECTION"
        }
      ],
      "imageContext": {
        "languageHints": [
          "zh"
        ]
      }
    }
  ]
}
response content:
{
  "responses": [
    {
      "error": {
        "code": 4,
        "message": "We can not access the URL currently. Please download the content and pass it in."
      }
    }
  ]
}
As of August 2017, this is a known issue with the Google Cloud Vision API (source). It appears to reproduce for some users, but not deterministically, and I've run into it myself with many images.
Current workarounds include either uploading your content to Google Cloud Storage and passing its gs:// URI (note it does not have to be publicly readable on GCS) or downloading the image locally and passing it to the Vision API in base64 format.
Here's an example in Node.js of the latter approach:
// `vision` and `client` were implied in the original snippet; shown here
// so it is self-contained. `image` is the remote image URL.
const vision = require('@google-cloud/vision');
const request = require('request-promise-native').defaults({
  encoding: 'base64'
});

const client = new vision.ImageAnnotatorClient();

const data = await request(image);
const response = await client.annotateImage({
  image: {
    content: data
  },
  features: [
    { type: vision.v1.types.Feature.Type.LABEL_DETECTION },
    { type: vision.v1.types.Feature.Type.CROP_HINTS }
  ]
})
I faced the same issue when I was trying to call the API using the Firebase Storage download URL (although it worked initially).
After looking around, I found the Node.js example below in the API docs.
// Imports the Google Cloud client libraries
const vision = require('@google-cloud/vision');
// Creates a client
const client = new vision.ImageAnnotatorClient();
/**
* TODO(developer): Uncomment the following lines before running the sample.
*/
// const bucketName = 'Bucket where the file resides, e.g. my-bucket';
// const fileName = 'Path to file within bucket, e.g. path/to/image.png';
// Performs text detection on the gcs file
const [result] = await client.textDetection(`gs://${bucketName}/${fileName}`);
const detections = result.textAnnotations;
console.log('Text:');
detections.forEach(text => console.log(text));
For me, it only worked after uploading the image to Google Cloud Storage and passing its gs:// URI in the parameters.
In my case, I tried retrieving an image hosted on Cloudinary, our main image hosting provider.
When I accessed the same image hosted on our secondary, Rackspace-powered CDN, Google OCR was able to access it.
Not sure why Cloudinary didn't work when I was able to access the image via my web browser, but that was my little workaround.
I believe the error is caused by the Cloud Vision API refusing to download images on a domain whose robots.txt file blocks Googlebot or Googlebot-Image.
The workaround that others mentioned is in fact the proper solution: download the images yourself and either pass them in the image.content field or upload them to Google Cloud Storage and use the image.source.gcsImageUri field.
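For reference, the Cloud Storage variant of the REST request from the question would look like this (the bucket and object names are placeholders):

{
  "requests": [
    {
      "image": {
        "source": {
          "gcsImageUri": "gs://your-bucket/your-image.jpg"
        }
      },
      "features": [
        {
          "type": "TEXT_DETECTION"
        }
      ]
    }
  ]
}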
For me, I resolved this issue by passing the gs:// URI (e.g. gs://bucketname/filename.jpg) instead of the public or authenticated URL.
const vision = require('@google-cloud/vision');

function uploadToGoogleCloudlist(req, res, next) {
    const originalfilename = req.file.originalname;
    const bucketname = "yourbucketname";
    const imageURI = "gs://" + bucketname + "/" + originalfilename;

    const client = new vision.ImageAnnotatorClient({
        projectId: 'yourprojectid',
        keyFilename: './router/fb/yourprojectid-firebase.json'
    });

    async function getimageannotation() {
        const [result] = await client.imageProperties(imageURI);
        console.log("vision result: " + JSON.stringify(result));
        return result;
    }

    getimageannotation().then(function (result) {
        // Persist the annotation however your app stores it.
        var datatoup = {
            url: imageURI || ' ',
            filename: originalfilename || ' ',
            available: true,
            vision: result,
        };
    })
    .catch(err => {
        console.error('ERROR CODE:', err);
    });

    next();
}
I faced the same issue several days ago.
In my case the problem happened because I was using queues and firing API requests at the same time from the same IP. After changing the number of parallel processes from 8 to 1, the rate of such errors dropped from ~30% to less than 1%.
Maybe it will help somebody. I think there are some internal limits on Google's side for loading remote images (as people have reported, using Google Cloud Storage also solves the problem).
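A sketch of that workaround with the p-limit package (an assumption — the poster didn't name their queue library — and annotateRemoteImage is a placeholder for your own request function):

const pLimit = require('p-limit');

// One Vision API request at a time instead of eight in parallel.
const limit = pLimit(1);

const results = await Promise.all(
    imageUrls.map(url => limit(() => annotateRemoteImage(url)))
);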
My hypothesis is that an overall (short) timeout exists on the Google API side, which limits the number of files that can actually be retrieved.
Sending 16 images for batch labeling is possible, but only 5 or 6 will be labelled, because the origin webserver hosting the images is unable to return all 16 files within <Google-Timeout> milliseconds.
In my case, the image URI that I was specifying in the request pointed at a large image, ~4000 px × 6000 px. When I changed it to a smaller version of the image, the request succeeded.
The very same request works for me. It is possible that the image host was temporarily down and/or had issues on their side. If you retry the request, it will most likely work for you.