AWS custom entity recognition: Wrong endpoint ARN

I am trying to use the custom entity recognizer I just trained on Amazon Web Services (AWS).
The training worked so far.
However, when I try to recognize my entities from AWS Lambda (code below) with the given endpoint ARN, I get the following error (even though AWS should be using the latest version of the botocore/boto3 framework, "EndpointArn" is reportedly not available (Docs)):
Response:
{
"errorMessage": "Parameter validation failed:\nUnknown parameter in input: \"EndpointArn\", must be one of: Text, LanguageCode",
"errorType": "ParamValidationError",
"stackTrace": [
" File \"/var/task/lambda_function.py\", line 21, in lambda_handler\n entities = client.detect_entities(\n",
" File \"/var/runtime/botocore/client.py\", line 316, in _api_call\n return self._make_api_call(operation_name, kwargs)\n",
" File \"/var/runtime/botocore/client.py\", line 607, in _make_api_call\n request_dict = self._convert_to_request_dict(\n",
" File \"/var/runtime/botocore/client.py\", line 655, in _convert_to_request_dict\n request_dict = self._serializer.serialize_to_request(\n",
" File \"/var/runtime/botocore/validate.py\", line 297, in serialize_to_request\n raise ParamValidationError(report=report.generate_report())\n"
]
}
I fixed this error with the first 4 lines in my code:
#---The hack I found on stackoverflow----
import sys
from pip._internal import main
main(['install', '-I', '-q', 'boto3', '--target', '/tmp/', '--no-cache-dir', '--disable-pip-version-check'])
sys.path.insert(0, '/tmp/')
#----------------------------------------
import json
import boto3

client = boto3.client('comprehend', region_name='us-east-1')

text = "Thats my nice text with different entities!"

entities = client.detect_entities(
    Text=text,
    # If you specify an endpoint, Amazon Comprehend uses the language of your custom model
    # and ignores any language code that you provide in your request.
    LanguageCode="de",
    EndpointArn="arn:aws:comprehend:us-east-1:215057830319:entity-recognizer/MyFirstRecognizer"
)
However, I still get one more error I cannot fix:
Response:
{
"errorMessage": "An error occurred (ValidationException) when calling the DetectEntities operation: 1 validation error detected: Value 'arn:aws:comprehend:us-east-1:XXXXXXXXXXXX:entity-recognizer/MyFirstRecognizer' at 'endpointArn' failed to satisfy constraint: Member must satisfy regular expression pattern: arn:aws(-[^:]+)?:comprehend:[a-zA-Z0-9-]*:[0-9]{12}:entity-recognizer-endpoint/[a-zA-Z0-9](-*[a-zA-Z0-9])*",
"errorType": "ClientError",
"stackTrace": [
" File \"/var/task/lambda_function.py\", line 25, in lambda_handler\n entities = client.detect_entities(\n",
" File \"/tmp/botocore/client.py\", line 316, in _api_call\n return self._make_api_call(operation_name, kwargs)\n",
" File \"/tmp/botocore/client.py\", line 635, in _make_api_call\n raise error_class(parsed_response, operation_name)\n"
]
}
This error also occurs if I use the Node.js SDK with the given endpoint. The funny thing I should mention is that every endpoint ARN I found (in tutorials) looks exactly like mine and also does not match the regex pattern returned in the error.
I'm not quite sure whether I'm doing something wrong here or whether it is a bug in the AWS cloud (or SDK). Maybe somebody can reproduce this error and/or find a solution (or even a hack) for this problem.
Cheers

An endpoint ARN is a different AWS resource from a model ARN. The model ARN refers to a custom model, while the endpoint hosts that model. In your code you are passing the model ARN instead of the endpoint ARN, which is what causes the error.
You can differentiate between the two ARNs on the basis of the prefix.
endpoint arn - arn:aws:comprehend:us-east-1:XXXXXXXXXXXX:entity-recognizer-endpoint/xxxxxxxxxx
model arn - arn:aws:comprehend:us-east-1:XXXXXXXXXXXX:entity-recognizer/MyFirstRecognizer
You can read more about Comprehend custom endpoints and their pricing on the documentation page:
https://docs.aws.amazon.com/comprehend/latest/dg/detecting-cer-real-time.html
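For reference, here is a minimal sketch of creating an endpoint from the model and then calling detect_entities with the resulting endpoint ARN. The endpoint name, inference-unit count, and account ID are placeholder values:
import boto3

comprehend = boto3.client('comprehend', region_name='us-east-1')

# Create an endpoint that hosts the custom recognizer (this is where the *model* ARN goes).
create_response = comprehend.create_endpoint(
    EndpointName='my-recognizer-endpoint',
    ModelArn='arn:aws:comprehend:us-east-1:XXXXXXXXXXXX:entity-recognizer/MyFirstRecognizer',
    DesiredInferenceUnits=1
)

# The returned ARN has the 'entity-recognizer-endpoint/...' form the regex expects.
endpoint_arn = create_response['EndpointArn']

# Once the endpoint is IN_SERVICE, pass the *endpoint* ARN to detect_entities.
entities = comprehend.detect_entities(
    Text="Thats my nice text with different entities!",
    EndpointArn=endpoint_arn
)
Note that the endpoint takes a few minutes to reach IN_SERVICE (describe_endpoint can be polled for the status), and it is billed for as long as it is provisioned, so it is usually deleted when no longer needed.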

Related

Is Snomed Integration For AWS Medical Comprehend Actually Working?

According to a recent update, the AWS Comprehend Medical service should now be returning SNOMED CT categories along with other medical terms.
I am running this in a Python 3.9 Lambda:
import json
import boto3

def lambda_handler(event, context):
    clinical_note = "Patient X was diagnosed with insomnia."
    cm_client = boto3.client("comprehendmedical")
    response = cm_client.infer_snomedct(Text=clinical_note)
    print(response)
I get the following response:
{
"errorMessage": "'ComprehendMedical' object has no attribute 'infer_snomedct'",
"errorType": "AttributeError",
"requestId": "560f1d3c-800a-46b6-a674-e0c3c3cc719f",
"stackTrace": [
" File \"/var/task/lambda_function.py\", line 7, in lambda_handler\n response = cm_client.infer_snomedct(Text=clinical_note)\n",
" File \"/var/runtime/botocore/client.py\", line 643, in __getattr__\n raise AttributeError(\n"
]
}
So either I am missing something (probably something obvious) or maybe the method is not actually available yet? Anything to set me on the right path would be welcome.
This is most likely due to the default Boto3 version being out of date on my lambda.
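A quick way to confirm this is to log the SDK versions bundled in the Lambda runtime; if they predate the release that added InferSNOMEDCT, the client simply has no infer_snomedct method and you need to package a newer boto3 with the function (deployment package or Lambda layer). A minimal sketch:
import boto3
import botocore

def lambda_handler(event, context):
    # If these versions predate the InferSNOMEDCT release, the method is missing
    # from the client and calling it raises the AttributeError seen above.
    versions = {"boto3": boto3.__version__, "botocore": botocore.__version__}
    print(versions)
    return versions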

Renaming S3 object at the same location for removal of Glue bookmark

I have a specific use case where I want to upload an object to S3 at a specific prefix. A file already exists at that prefix, and I want to replace it with the new one. I am using boto3 to do this. Bucket versioning is turned off, so I expect the file to simply be overwritten. However, I get the following error.
{
"errorMessage": "An error occurred (InvalidRequest) when calling the CopyObject operation: This copy request is illegal because it is trying to copy an object to itself without changing the object's metadata, storage class, website redirect location or encryption attributes.",
"errorType": "ClientError",
"stackTrace": [
" File \"/var/task/lambda_function.py\", line 25, in lambda_handler\n s3.Object(bucket,product_key).copy_from(CopySource=bucket + '/' + product_key)\n",
" File \"/var/runtime/boto3/resources/factory.py\", line 520, in do_action\n response = action(self, *args, **kwargs)\n",
" File \"/var/runtime/boto3/resources/action.py\", line 83, in __call__\n response = getattr(parent.meta.client, operation_name)(*args, **params)\n",
" File \"/var/runtime/botocore/client.py\", line 386, in _api_call\n return self._make_api_call(operation_name, kwargs)\n",
" File \"/var/runtime/botocore/client.py\", line 705, in _make_api_call\n raise error_class(parsed_response, operation_name)\n"
]
}
This is what I have tried so far.
import boto3
import tempfile
import os

print('Loading function')

s3 = boto3.resource('s3')
glue = boto3.client('glue')

bucket = 'my-bucket'
bucket_prefix = 'my-prefix'

def lambda_handler(_event, _context):
    my_bucket = s3.Bucket(bucket)
    # Code to find the object name. There is going to be only one file.
    for object_summary in my_bucket.objects.filter(Prefix=bucket_prefix):
        product_key = object_summary.key
        print(product_key)
    # Using product_key variable I am trying to copy the same file name to the same location,
    # which is when I get an error.
    s3.Object(bucket, product_key).copy_from(CopySource=bucket + '/' + product_key)
    # Maybe the following line is not required
    s3.Object(bucket, bucket_prefix).delete()
I have a very specific reason for copying the same file to the same location: AWS Glue doesn't pick up the same file again once it has bookmarked it. By copying the file again, I am hoping that the Glue bookmark will be dropped and the Glue job will treat it as a new file.
I am not too attached to the name. If you can help me modify the above code to generate a new file at the same prefix level, that would work as well. There always has to be exactly one file here, though. Consider this file a static list of products that has been brought over from a relational DB into S3.
Thanks
From Tracking Processed Data Using Job Bookmarks - AWS Glue:
For Amazon S3 input sources, AWS Glue job bookmarks check the last modified time of the objects to verify which objects need to be reprocessed. If your input source data has been modified since your last job run, the files are reprocessed when you run the job again.
So, it seems your theory could work!
However, as the error message states, it is not permitted to copy an S3 object to itself "without changing the object's metadata, storage class, website redirect location or encryption attributes".
Therefore, you can add some metadata as part of the copy process (and set MetadataDirective='REPLACE' so S3 actually applies the new metadata) and the copy will succeed. For example:
s3.Object(bucket, product_key).copy_from(
    CopySource=bucket + '/' + product_key,
    Metadata={'foo': 'bar'},
    MetadataDirective='REPLACE'
)

AWS Transcribe unable to start_transcription_job without LanguageCode in boto3

I have an audio file in S3.
I don't know the language of the audio file. So I need to use IdentifyLanguage for start_transcription_job().
LanguageCode will be blank since I don't know the language of the audio file.
Environment
Using:
Python 3.8 runtime,
boto3 version 1.16.5,
botocore version 1.19.5,
no Lambda layer.
Here is my code for the Transcribe job:
mediaFileUri = 's3://' + bucket_name + '/' + prefixKey
transcribe_client = boto3.client('transcribe')

response = transcribe_client.start_transcription_job(
    TranscriptionJobName="abc",
    IdentifyLanguage=True,
    Media={
        'MediaFileUri': mediaFileUri
    },
)
Then I get this error:
{
"errorMessage": "Parameter validation failed:\nMissing required parameter in input: \"LanguageCode\"\nUnknown parameter in input: \"IdentifyLanguage\", must be one of: TranscriptionJobName, LanguageCode, MediaSampleRateHertz, MediaFormat, Media, OutputBucketName, OutputEncryptionKMSKeyId, Settings, ModelSettings, JobExecutionSettings, ContentRedaction",
"errorType": "ParamValidationError",
"stackTrace": [
" File \"/var/task/app.py\", line 27, in TranscribeSoundToWordHandler\n response = response = transcribe_client.start_transcription_job(\n",
" File \"/var/runtime/botocore/client.py\", line 316, in _api_call\n return self._make_api_call(operation_name, kwargs)\n",
" File \"/var/runtime/botocore/client.py\", line 607, in _make_api_call\n request_dict = self._convert_to_request_dict(\n",
" File \"/var/runtime/botocore/client.py\", line 655, in _convert_to_request_dict\n request_dict = self._serializer.serialize_to_request(\n",
" File \"/var/runtime/botocore/validate.py\", line 297, in serialize_to_request\n raise ParamValidationError(report=report.generate_report())\n"
]
}
This error means that I must specify LanguageCode and that IdentifyLanguage is an invalid parameter.
I'm 100% sure the audio file exists in S3. But without LanguageCode it doesn't work, and the IdentifyLanguage parameter is reported as unknown.
I'm using a SAM application to test locally with this command:
sam local invoke MyHandler -e lambda\TheDirectory\event.json
I also ran cdk deploy, checked in the AWS Lambda console, and tested it with the same event.json, but I still get the same error.
I think this is related to the Lambda execution environment; I didn't use any Lambda layer.
I look at this docs from Aws Transcribe:
https://docs.aws.amazon.com/transcribe/latest/dg/API_StartTranscriptionJob.html
and this docs of boto3:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/transcribe.html#TranscribeService.Client.start_transcription_job
Both clearly state that LanguageCode is not required and that IdentifyLanguage is a valid parameter.
So what am I missing here? Any idea? What should I do?
Update:
I kept searching and asked a couple of people online; I think I should build the function container first so that SAM packages boto3 into the container.
So what I did was cdk synth a template file:
cdk synth --no-staging > template.yaml
Then:
sam build --use-container
sam local invoke MyHandler78A95900 -e lambda\TheDirectory\event.json
But I still get the same error; here is the stack trace as well:
[ERROR] ParamValidationError: Parameter validation failed:
Missing required parameter in input: "LanguageCode"
Unknown parameter in input: "IdentifyLanguage", must be one of: TranscriptionJobName, LanguageCode, MediaSampleRateHertz, MediaFormat, Media, OutputBucketName, OutputEncryptionKMSKeyId, Settings, JobExecutionSettings, ContentRedaction
Traceback (most recent call last):
File "/var/task/app.py", line 27, in TranscribeSoundToWordHandler
response = response = transcribe_client.start_transcription_job(
File "/var/runtime/botocore/client.py", line 316, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/var/runtime/botocore/client.py", line 607, in _make_api_call
request_dict = self._convert_to_request_dict(
File "/var/runtime/botocore/client.py", line 655, in _convert_to_request_dict
request_dict = self._serializer.serialize_to_request(
File "/var/runtime/botocore/validate.py", line 297, in serialize_to_request
raise ParamValidationError(report=report.generate_report())
I really have no clue what I'm doing wrong here. I also reported a GitHub issue, but it seems the issue can't be reproduced there.
Main question/problem:
Unable to start_transcription_job
without LanguageCode
with IdentifyLanguage=True
What could possibly cause this, and how can I solve it? (I don't know the language of the audio file; I want to identify the language of the audio file without providing a LanguageCode.)
Check whether you are using the latest boto3 version.
boto3.__version__
'1.16.5'
I tried it and it works.
import boto3
transcribe = boto3.client('transcribe')
response = transcribe.start_transcription_job(
    TranscriptionJobName='Test-20201-27',
    IdentifyLanguage=True,
    Media={'MediaFileUri': 's3://BucketName/DemoData/Object.mp4'}
)
print(response)
{
    "TranscriptionJob": {
        "TranscriptionJobName": "Test-20201-27",
        "TranscriptionJobStatus": "IN_PROGRESS",
        "Media": {
            "MediaFileUri": "s3://BucketName/DemoData/Object.mp4"
        },
        "StartTime": "datetime.datetime(2020, 10, 27, 15, 41, 2, 599000, tzinfo=tzlocal())",
        "CreationTime": "datetime.datetime(2020, 10, 27, 15, 41, 2, 565000, tzinfo=tzlocal())",
        "IdentifyLanguage": "True"
    },
    "ResponseMetadata": {
        "RequestId": "9e4f94a4-20e4-4ca0-9c6e-e21a8934084b",
        "HTTPStatusCode": 200,
        "HTTPHeaders": {
            "content-type": "application/x-amz-json-1.1",
            "date": "Tue, 27 Oct 2020 14:41:02 GMT",
            "x-amzn-requestid": "9e4f94a4-20e4-4ca0-9c6e-e21a8934084b",
            "content-length": "268",
            "connection": "keep-alive"
        },
        "RetryAttempts": 0
    }
}
In the end I noticed this was because my packaged Lambda function wasn't being uploaded for some reason. Here is how I solved it after getting help from a couple of people.
First, modify the CDK stack that defines my Lambda function like this:
from aws_cdk import (
    aws_lambda as lambda_,
    core
)
from aws_cdk.aws_lambda_python import PythonFunction

class MyCdkStack(core.Stack):
    def __init__(self, scope: core.Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        # define lambda
        my_lambda = PythonFunction(
            self, 'MyHandler',
            entry='lambda/MyHandler',
            index='app.py',
            runtime=lambda_.Runtime.PYTHON_3_8,
            handler='MyHandler',
            timeout=core.Duration.seconds(10)
        )
This uses the aws-lambda-python module, which handles installing all required modules into the Docker bundle.
Next, cdk synth a template file
cdk synth --no-staging > template.yaml
At this point, it bundles everything inside the entry path defined in PythonFunction and installs all the necessary dependencies defined in requirements.txt inside that entry path.
Next, build the docker container
$ sam build --use-container
Make sure the template.yaml file is in the root directory. This builds a Docker container, and the artifact is built inside the .aws-sam/build directory in my root directory.
Last step, invoke the function using sam:
sam local invoke MyHandler78A95900 -e path\to\event.json
Now the call to start_transcription_job described in my question above finally succeeds without any error.
In conclusion:
At the very beginning I only ran pip install boto3, which only installs boto3 on my local system.
Then I ran sam local invoke without first building the container with sam build --use-container.
Lastly, I did run sam build, but at that point the dependencies defined in requirements.txt were not bundled into .aws-sam/build, which is why the aws-lambda-python module described above is needed.
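For completeness, the requirements.txt inside the entry path only needs to declare a boto3 release that already knows the IdentifyLanguage parameter; the path and exact version pins below are illustrative assumptions:
# lambda/MyHandler/requirements.txt
boto3>=1.16.5
botocore>=1.19.5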

Unable to invoke SageMaker endpoint (TensorFlow model) using Boto3 client from AWS Lambda in Python

I have a TensorFlow model deployed with an AWS SageMaker endpoint exposed.
From Lambda (Python) I am using the boto3 client to invoke the endpoint.
The TensorFlow model accepts 3 inputs, as follows:
{'input1': numpy array, 'input2': integer, 'input3': numpy array}
From Lambda I use runtime.invoke_endpoint to invoke the SageMaker endpoint.
I get a Parse Error when the API is invoked from the boto3 client.
I tried serializing the data into CSV format before calling the API endpoint.
Below is the code written in Lambda:
payload = {'input1': encoded_enc_inputstanza_in_batch,
           'input2': encoded_enc_inputstanza_in_batch.shape[1],
           'input3': np.reshape([[15]*20], 20)}

infer_file = io.StringIO()
writer = csv.writer(infer_file)
for key, value in payload.items():
    writer.writerow([key, value])

response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                   ContentType='text/csv',
                                   Body=infer_file.getvalue())
Additional details
These are the additional details. The SageMaker model expects 3 fields as input:
'input1' - numpy array
'input2' - int data type
'input3' - numpy array
Actual result -
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 143, in lambda_handler
Body=infer_file.getvalue())
File "/var/runtime/botocore/client.py", line 320, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/var/runtime/botocore/client.py", line 623, in _make_api_call
raise error_class(parsed_response, operation_name)
END RequestId: fa70e1f3-763b-41be-ad2d-76ae80aefcd0
Expected result - successful invocation of the API endpoint.
For text/csv the value for the Body argument to invoke_endpoint should be a string with commas separating the values for each feature. For example, a record for a model with four features might look like: 1.5,16.0,14,23.0. You can try the following:
dp = ','.join(str(v) for v in payload.values())
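As a minimal sketch of this approach (the endpoint name and the flat feature vector are placeholders, and it assumes the deployed model actually accepts a single CSV record), the invocation could look like this:
import boto3

runtime = boto3.client('sagemaker-runtime')

# Placeholder feature vector; a text/csv body is one comma-separated record per line.
features = [1.5, 16.0, 14, 23.0]
body = ','.join(str(v) for v in features)

response = runtime.invoke_endpoint(
    EndpointName='my-tf-endpoint',  # placeholder endpoint name
    ContentType='text/csv',
    Body=body
)
print(response['Body'].read().decode('utf-8'))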
After converting the data to comma-delimited text/csv format, have you updated the endpoint to use the new model data? The input data needs to match the schema of the model.
Is there a comma in the "encoded_enc_inputstanza_in_batch" variable?

Error using OAuth2 to connect to dropbox in Python

On my Raspberry Pi running Raspbian Jessie, I tried to go through the OAuth2 flow to connect a program to my Dropbox using the Dropbox SDK for Python, which I installed via pip.
For a test, I copied the code from the documentation (and defined the app-key and secret, of course):
import sys
from dropbox import Dropbox, DropboxOAuth2FlowNoRedirect

auth_flow = DropboxOAuth2FlowNoRedirect(APP_KEY, APP_SECRET)

authorize_url = auth_flow.start()
print "1. Go to: " + authorize_url
print "2. Click \"Allow\" (you might have to log in first)."
print "3. Copy the authorization code."
auth_code = raw_input("Enter the authorization code here: ").strip()

try:
    access_token, user_id = auth_flow.finish(auth_code)
except Exception, e:
    print('Error: %s' % (e,))
    sys.exit(1)

dbx = Dropbox(access_token)
I was able to get the URL and to click allow. When I then entered the authorization code however, it printed the following error:
Error: 'str' object has no attribute 'copy'
Using format_exc from the traceback-module, I got the following information:
Traceback (most recent call last):
File "test.py", line 18, in <module>
access_token, user_id = auth_flow.finish(auth_code)
File "/usr/local/lib/python2.7/dist-packages/dropbox/oauth.py", line 180, in finish
return self._finish(code, None)
File "/usr/local/lib/python2.7/dist-packages/dropbox/oauth.py", line 50, in _finish
url = self.build_url(Dropbox.HOST_API, '/oauth2/token')
File "/usr/local/lib/python2.7/dist-packages/dropbox/oauth.py", line 111, in build_url
return "https://%s%s" % (self._host, self.build_path(target, params))
File "/usr/local/lib/python2.7/dist-packages/dropbox/oauth.py", line 89, in build_path
params = params.copy()
AttributeError: 'str' object has no attribute 'copy'
It seems the build_path method expects a dict 'params' and receives a string instead. Any ideas?
Thanks to smarx for his comment. The error is a known issue and will be fixed in version 3.42 of the SDK (source).