AWS: Boto3 configuring bucket lifecycle - Malformed XML

The following code should enable versioning on a bucket/list of buckets, and then set the lifecycle configuration.
import boto3
# Create session
s3 = boto3.resource('s3')
s3Client = boto3.client('s3')
# Bucket list
buckets = ['BUCKETNAMEHERE']
# iterate through list of buckets
for bucket in buckets:
    # Enable Versioning
    bucketVersioning = s3.BucketVersioning(bucket)
    bucketVersioning.enable()
    # Configure Lifecycle
    s3Client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={
            'Rules': [
                {
                    'Status': 'Enabled',
                    'NoncurrentVersionTransitions': [
                        {
                            'NoncurrentDays': 7,
                            'StorageClass': 'GLACIER'
                        },
                    ],
                    'NoncurrentVersionExpiration': {
                        'NoncurrentDays': 30
                    }
                },
            ]
        }
    )
print "Versioning and lifecycle have been enabled for buckets."
However, whenever I run this I get the following error:
File "putVersioning.py", line 42, in <module>
'NoncurrentDays': 30
File "/home/user/.local/lib/python2.7/site-packages/botocore/client.py", line 253, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/home/user/.local/lib/python2.7/site-packages/botocore/client.py", line 557, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (MalformedXML) when calling the PutBucketLifecycleConfiguration operation: The XML you provided was not well-formed or did not validate against our published schema
As far as I can tell, everything looks correct?

According to the docs here, you need to add a Filter element, which is required by the Amazon API but, confusingly, not marked as required by boto3. Adding the deprecated Prefix argument instead of Filter also seems to work.

This one works for me:
client.put_bucket_lifecycle_configuration(
    Bucket=s3_bucket,
    LifecycleConfiguration={
        'Rules': [
            {
                'Expiration': {'Days': 5},
                'Filter': {'Prefix': 'folder1/'},
                'ID': 'id',
                'Status': 'Enabled'
            }
        ]
    })
To see an actual schema, create a rule in the S3 console and then inspect it with client.get_bucket_lifecycle_configuration(Bucket=s3_bucket)
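Applying the same fix to the original Glacier-transition rule, a minimal sketch might look like the following (the bucket name and rule ID are placeholders; an empty Prefix in the Filter makes the rule apply to the whole bucket):
import boto3

s3Client = boto3.client('s3')

s3Client.put_bucket_lifecycle_configuration(
    Bucket='BUCKETNAMEHERE',
    LifecycleConfiguration={
        'Rules': [
            {
                'ID': 'noncurrent-to-glacier',  # arbitrary rule identifier
                'Status': 'Enabled',
                # Required by the S3 API; an empty prefix matches every object
                'Filter': {'Prefix': ''},
                'NoncurrentVersionTransitions': [
                    {'NoncurrentDays': 7, 'StorageClass': 'GLACIER'},
                ],
                'NoncurrentVersionExpiration': {'NoncurrentDays': 30}
            },
        ]
    }
)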

Related

Is Snomed Integration For AWS Medical Comprehend Actually Working?

According to a recent update, the AWS Comprehend Medical service should now return SNOMED CT categories along with other medical terms.
I am running this in a Python 3.9 Lambda:
import json
def lambda_handler(event, context):
clinical_note = "Patient X was diagnosed with insomnia."
import boto3
cm_client = boto3.client("comprehendmedical")
response = cm_client.infer_snomedct(Text=clinical_note)
print (response)
I get the following response:
{
  "errorMessage": "'ComprehendMedical' object has no attribute 'infer_snomedct'",
  "errorType": "AttributeError",
  "requestId": "560f1d3c-800a-46b6-a674-e0c3c3cc719f",
  "stackTrace": [
    "  File \"/var/task/lambda_function.py\", line 7, in lambda_handler\n    response = cm_client.infer_snomedct(Text=clinical_note)\n",
    "  File \"/var/runtime/botocore/client.py\", line 643, in __getattr__\n    raise AttributeError(\n"
  ]
}
So either I am missing something (probably something obvious) or maybe the method is not actually available yet? Anything to set me on the right path would be welcome.
This is most likely because the default boto3 version bundled with the Lambda runtime is out of date and does not yet include infer_snomedct.
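A quick way to confirm is to print the bundled versions from inside the handler; if the runtime's boto3 is too old to know about infer_snomedct, packaging a newer boto3 with the function (or via a layer) is the usual workaround. A minimal sketch, assuming the packaged boto3 is recent enough to include the operation:
import json
import boto3
import botocore

def lambda_handler(event, context):
    # infer_snomedct only exists in newer boto3/botocore releases,
    # so check what this runtime actually bundles
    print(boto3.__version__, botocore.__version__)

    cm_client = boto3.client("comprehendmedical")
    response = cm_client.infer_snomedct(Text="Patient X was diagnosed with insomnia.")
    print(response)
    return {"statusCode": 200, "body": json.dumps("ok")}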

AWS Transcribe unable to start_transcription_job without LanguageCode in boto3

I have an audio file in S3.
I don't know the language of the audio file. So I need to use IdentifyLanguage for start_transcription_job().
LanguageCode will be blank since I don't know the language of the audio file.
Environment:
Python 3.8 runtime,
boto3 version 1.16.5,
botocore version 1.19.5,
no Lambda Layer.
Here is my code for the Transcribe job:
mediaFileUri = 's3://'+ bucket_name+'/'+prefixKey
transcribe_client = boto3.client('transcribe')
response = transcribe_client.start_transcription_job(
TranscriptionJobName="abc",
IdentifyLanguage=True,
Media={
'MediaFileUri':mediaFileUri
},
)
Then I get this error:
{
  "errorMessage": "Parameter validation failed:\nMissing required parameter in input: \"LanguageCode\"\nUnknown parameter in input: \"IdentifyLanguage\", must be one of: TranscriptionJobName, LanguageCode, MediaSampleRateHertz, MediaFormat, Media, OutputBucketName, OutputEncryptionKMSKeyId, Settings, ModelSettings, JobExecutionSettings, ContentRedaction",
  "errorType": "ParamValidationError",
  "stackTrace": [
    "  File \"/var/task/app.py\", line 27, in TranscribeSoundToWordHandler\n    response = response = transcribe_client.start_transcription_job(\n",
    "  File \"/var/runtime/botocore/client.py\", line 316, in _api_call\n    return self._make_api_call(operation_name, kwargs)\n",
    "  File \"/var/runtime/botocore/client.py\", line 607, in _make_api_call\n    request_dict = self._convert_to_request_dict(\n",
    "  File \"/var/runtime/botocore/client.py\", line 655, in _convert_to_request_dict\n    request_dict = self._serializer.serialize_to_request(\n",
    "  File \"/var/runtime/botocore/validate.py\", line 297, in serialize_to_request\n    raise ParamValidationError(report=report.generate_report())\n"
  ]
}
This error means that I must specify LanguageCode and that IdentifyLanguage is treated as an invalid parameter.
I'm 100% sure the audio file exists in S3, but without LanguageCode it doesn't work, and the IdentifyLanguage parameter is reported as unknown.
I'm using a SAM application to test locally with this command:
sam local invoke MyHandler -e lambda\TheDirectory\event.json
I also ran cdk deploy, checked in the AWS Lambda console, and tested with the same event.json, but I still get the same error.
I think this is related to the Lambda execution environment; I didn't use any Lambda Layer.
I looked at the AWS Transcribe docs:
https://docs.aws.amazon.com/transcribe/latest/dg/API_StartTranscriptionJob.html
and the boto3 docs:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/transcribe.html#TranscribeService.Client.start_transcription_job
Both clearly state that LanguageCode is not required and that IdentifyLanguage is a valid parameter.
So what am I missing? Any ideas on this? What should I do?
Update:
I kept searching and asked a couple of people online; I think I should build the function container first so that SAM packages boto3 into the container.
So what I did was cdk synth a template file:
cdk synth --no-staging > template.yaml
Then:
sam build --use-container
sam local invoke MyHandler78A95900 -e lambda\TheDirectory\event.json
But I still get the same error; here is the stack trace as well:
[ERROR] ParamValidationError: Parameter validation failed:
Missing required parameter in input: "LanguageCode"
Unknown parameter in input: "IdentifyLanguage", must be one of: TranscriptionJobName, LanguageCode, MediaSampleRateHertz, MediaFormat, Media, OutputBucketName, OutputEncryptionKMSKeyId, Settings, JobExecutionSettings, ContentRedaction
Traceback (most recent call last):
  File "/var/task/app.py", line 27, in TranscribeSoundToWordHandler
    response = response = transcribe_client.start_transcription_job(
  File "/var/runtime/botocore/client.py", line 316, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/runtime/botocore/client.py", line 607, in _make_api_call
    request_dict = self._convert_to_request_dict(
  File "/var/runtime/botocore/client.py", line 655, in _convert_to_request_dict
    request_dict = self._serializer.serialize_to_request(
  File "/var/runtime/botocore/validate.py", line 297, in serialize_to_request
    raise ParamValidationError(report=report.generate_report())
I really have no clue what I'm doing wrong here. I also reported a GitHub issue, but it seems they can't reproduce it.
Main Question/Problem:
Unable to start_transcription_job
without LanguageCode
with IdentifyLanguage=True
What could possibly cause this, and how can I solve it? (I don't know the language of the audio file; I want to identify the language of the audio file without providing a LanguageCode.)
Check whether you are using the latest boto3 version.
boto3.__version__
'1.16.5'
I tried it and it works.
import boto3

transcribe = boto3.client('transcribe')
response = transcribe.start_transcription_job(
    TranscriptionJobName='Test-20201-27',
    IdentifyLanguage=True,
    Media={'MediaFileUri': 's3://BucketName/DemoData/Object.mp4'}
)
print(response)
{
  "TranscriptionJob": {
    "TranscriptionJobName": "Test-20201-27",
    "TranscriptionJobStatus": "IN_PROGRESS",
    "Media": {
      "MediaFileUri": "s3://BucketName/DemoData/Object.mp4"
    },
    "StartTime": "datetime.datetime(2020, 10, 27, 15, 41, 2, 599000, tzinfo=tzlocal())",
    "CreationTime": "datetime.datetime(2020, 10, 27, 15, 41, 2, 565000, tzinfo=tzlocal())",
    "IdentifyLanguage": "True"
  },
  "ResponseMetadata": {
    "RequestId": "9e4f94a4-20e4-4ca0-9c6e-e21a8934084b",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Tue, 27 Oct 2020 14:41:02 GMT",
      "x-amzn-requestid": "9e4f94a4-20e4-4ca0-9c6e-e21a8934084b",
      "content-length": "268",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}
In the end I noticed this was because my packaged Lambda function wasn't being uploaded for some reason. Here is how I solved it after getting help from a couple of people.
First, modify the CDK stack that defines my Lambda function like this:
from aws_cdk import (
    aws_lambda as lambda_,
    core
)
from aws_cdk.aws_lambda_python import PythonFunction

class MyCdkStack(core.Stack):
    def __init__(self, scope: core.Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        # define lambda
        my_lambda = PythonFunction(
            self, 'MyHandler',
            entry='lambda/MyHandler',
            index='app.py',
            runtime=lambda_.Runtime.PYTHON_3_8,
            handler='MyHandler',
            timeout=core.Duration.seconds(10)
        )
This uses the aws-lambda-python module, which handles installing all required modules into the Docker image.
Next, cdk synth a template file
cdk synth --no-staging > template.yaml
At this point, it bundles everything inside the entry path defined in PythonFunction and installs all the dependencies listed in requirements.txt inside that entry path.
Next, build the docker container
$ sam build --use-container
Make sure the template.yaml file is in the root directory. This builds a Docker container, and the artifact is built inside the .aws-sam/build directory in the root directory.
Last step, invoke the function using sam:
sam local invoke MyHandler78A95900 -e path\to\event.json
Now the call to start_transcription_job described in my question above finally succeeds without any error.
In Conclusion:
At the very beginning I only ran pip install boto3, which only installs boto3 on my local system.
Then, I ran sam local invoke without building the container first with sam build --use-container.
Lastly, when I finally did run sam build, it still didn't bundle what is defined in requirements.txt into .aws-sam/build, which is why I needed the aws-lambda-python module as described above. A sketch of what that requirements.txt might contain is shown below.
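For reference (not part of the original answer), assuming the Lambda entry directory is lambda/MyHandler as in the CDK stack above, the requirements.txt placed there might pin boto3/botocore explicitly so the bundled versions support IdentifyLanguage:
# lambda/MyHandler/requirements.txt (hypothetical example)
boto3==1.16.5
botocore==1.19.5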

Missing field 'SetIdentifier' in Change when deleting/upserting with ChangeResourceRecordSets Boto3

I've struggled for a week trying to delete/upsert a simple Route 53 resource record with Boto3 v1.10.39.
My code:
resp = r53_client.change_resource_record_sets(
    HostedZoneId=<ZONE_ID>,
    ChangeBatch={
        'Comment': 'del_ip',
        'Changes': [
            {
                'Action': 'DELETE',
                'ResourceRecordSet': {
                    'Name': <SUBDOMAIN>,
                    'Type': 'A',
                    'Region': 'us-east-1',
                    'TTL': 300,
                    'ResourceRecords': [{'Value': <OLD_IP>}]
                }
            }
        ]
    }
)
Error msg:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/botocore/client.py", line 272, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/lib/python2.7/dist-packages/botocore/client.py", line 576, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.InvalidInput: An error occurred (InvalidInput) when calling the ChangeResourceRecordSets operation: Invalid request: Missing field 'SetIdentifier' in Change with [Action=DELETE, Name=<SUBDOMAIN>., Type=A, SetIdentifier=null]
Troubleshooting steps:
I've gone through all available documentation (Boto3, AWS CLI, AWS developer guide); the 'SetIdentifier' field is only required when the record set has a routing policy other than simple (weighted, multivalue, failover, etc.):
SetIdentifier (string)
Resource record sets that have a routing policy other than simple: An identifier that differentiates among multiple resource record sets that have the same combination of name and type, such as multiple weighted resource record sets named acme.example.com that have a type of A. In a group of resource record sets that have the same name and type, the value of SetIdentifier must be unique for each resource record set.
For information about routing policies, see Choosing a Routing Policy in the Amazon Route 53 Developer Guide.
Tried the same operation with the AWS CLI; same error as when calling with Boto3.
Called list_resource_record_sets and verified that no set identifier value exists on the record.
Tried adding 'SetIdentifier': None, 'SetIdentifier': '', and 'SetIdentifier': left blank; none of them works, which makes sense, as the original record doesn't have this property at all.
Environment:
OS: amzn-ami-hvm-2015.09.1.x86_64-gp2
Python: v2.7.14
BOTO3: v1.10.39
botocore: 1.13.39
aws-cli: v1.16.301
I'm wondering if this is a bug from AWS API.
I created an issue on the Boto3 GitHub and got help from them saying that removing the 'Region' attribute would fix my issue; once I did that, it worked like a charm for my code.
resp = r53_client.change_resource_record_sets(
    HostedZoneId=<ZONE_ID>,
    ChangeBatch={
        'Comment': 'del_ip',
        'Changes': [
            {
                'Action': 'DELETE',
                'ResourceRecordSet': {
                    'Name': <SUBDOMAIN>,
                    'Type': 'A',
                    'TTL': 300,
                    'ResourceRecords': [{'Value': <OLD_IP>}]
                }
            }
        ]
    }
)
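Since a DELETE change has to match the existing record exactly, one defensive pattern (a sketch, not from the original answer; the zone ID and record name below are hypothetical placeholders) is to look the record up first and pass back exactly what Route 53 returns:
import boto3

r53_client = boto3.client('route53')

# Hypothetical placeholders
zone_id = 'Z1234567890ABC'
record_name = 'sub.example.com.'

# Fetch the record as Route 53 stores it, so the DELETE matches field-for-field
resp = r53_client.list_resource_record_sets(
    HostedZoneId=zone_id,
    StartRecordName=record_name,
    StartRecordType='A',
    MaxItems='1'
)
record = resp['ResourceRecordSets'][0]

r53_client.change_resource_record_sets(
    HostedZoneId=zone_id,
    ChangeBatch={
        'Comment': 'del_ip',
        'Changes': [{'Action': 'DELETE', 'ResourceRecordSet': record}]
    }
)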

App Engine stackdriver logging to Global log instead of service log

I'm trying to set up logging for a Django app hosted as an App Engine service on GAE.
I have set up the logging successfully, except that the log entries show up in the global log for the entire project instead of the log for that service. I would like the logs to show up only in the specific service's logs.
This is my Django logging config:
from google.cloud import logging as google_cloud_logging

log_client = google_cloud_logging.Client()
log_client.setup_logging()

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'stackdriver_logging': {
            'class': 'google.cloud.logging.handlers.CloudLoggingHandler',
            'client': log_client
        },
    },
    'loggers': {
        '': {
            'handlers': ['stackdriver_logging'],
            'level': 'INFO',
        }
    },
}
And I am able to successfully log to the global project log by calling it like this:
def fetch_orders(request):
    logger.error('test error')
    logger.critical('test critical')
    logger.warning('test warning')
    logger.info('test info')
    return redirect('dashboard')
I want to figure out whether I can configure the logger to always use the log for the service it's running in.
EDIT:
I tried the suggestion below; however, it now returns the following error:
Traceback (most recent call last):
  File "/env/lib/python3.7/site-packages/google/cloud/logging/handlers/transports/background_thread.py", line 122, in _safely_commit_batch
    batch.commit()
  File "/env/lib/python3.7/site-packages/google/cloud/logging/logger.py", line 381, in commit
    entries = [entry.to_api_repr() for entry in self.entries]
  File "/env/lib/python3.7/site-packages/google/cloud/logging/logger.py", line 381, in <listcomp>
    entries = [entry.to_api_repr() for entry in self.entries]
  File "/env/lib/python3.7/site-packages/google/cloud/logging/entries.py", line 318, in to_api_repr
    info = super(StructEntry, self).to_api_repr()
  File "/env/lib/python3.7/site-packages/google/cloud/logging/entries.py", line 241, in to_api_repr
    info["resource"] = self.resource._to_dict()
AttributeError: 'ConvertingDict' object has no attribute '_to_dict'
I can override this in the package source code to make it work; however, the GAE environment requires that I use the package as supplied by Google for Cloud Logging. Is there any way forward from here?
To my understanding, it should be possible to accomplish what you want using the resource option of CloudLoggingHandler. In the Stackdriver Logging (and Stackdriver Monitoring) API, each object (log line, time-series point) is associated with a "resource" (some thing that exists in a project, that can be provisioned, and can be the source of logs or time-series or the thing that the logs or time-series are being written about). When the resource option is omitted, the CloudLoggingHandler defaults to global as you have observed.
There are a number of monitored resource types, including gae_app, which can be used to represent a specific version of a particular service that is deployed on GAE. Based on your code, this would look something like:
from google.cloud.logging import resource

def get_monitored_resource():
    project_id = get_project_id()
    gae_service = get_gae_service()
    gae_service_version = get_gae_service_version()
    resource_type = 'gae_app'
    resource_labels = {
        'project_id': project_id,
        'module_id': gae_service,
        'version_id': gae_service_version
    }
    return resource.Resource(resource_type, resource_labels)

GAE_APP_RESOURCE = get_monitored_resource()

LOGGING = {
    # ...
    'handlers': {
        'stackdriver_logging': {
            'class': 'google.cloud.logging.handlers.CloudLoggingHandler',
            'client': log_client,
            'resource': GAE_APP_RESOURCE,
        },
    },
    # ...
}
In the code above, the functions get_project_id, get_gae_service, and get_gae_service_version can be implemented in terms of the environment variables GOOGLE_CLOUD_PROJECT, GAE_SERVICE, and GAE_VERSION in the Python flexible environment, as documented by The Flexible Python Runtime, as in:
def get_project_id():
    return os.getenv('GOOGLE_CLOUD_PROJECT')
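The remaining two helpers would follow the same pattern; a minimal sketch, assuming the GAE_SERVICE and GAE_VERSION environment variables documented for the flexible environment:
import os

def get_gae_service():
    # Name of the GAE service the code is running in
    return os.getenv('GAE_SERVICE')

def get_gae_service_version():
    # Version id of the deployed GAE service
    return os.getenv('GAE_VERSION')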

AWS: Boto3 Enable S3 Versioning/Lifecycle - Access denied

I am trying to pass boto3 a list of bucket names and have it first enable versioning on each bucket, then enable a lifecycle policy on each.
I have run aws configure and have two profiles, both of which are current, active user profiles with all the necessary permissions. The one I want to use is named "default."
import boto3
# Create session
s3 = boto3.resource('s3')
# Bucket list
buckets = ['BUCKET-NAME']
# iterate through list of buckets
for bucket in buckets:
    # Enable Versioning
    bucketVersioning = s3.BucketVersioning('bucket')
    bucketVersioning.enable()
    # Current lifecycle configuration
    lifecycleConfig = s3.BucketLifecycle(bucket)
    lifecycleConfig.add_rule = {
        'Rules': [
            {
                'Status': 'Enabled',
                'NoncurrentVersionTransition': {
                    'NoncurrentDays': 7,
                    'StorageClass': 'GLACIER'
                },
                'NoncurrentVersionExpiration': {
                    'NoncurrentDays': 30
                }
            }
        ]
    }
    # Configure Lifecycle
    bucket.configure_lifecycle(lifecycleConfig)
print "Versioning and lifecycle have been enabled for buckets."
When I run this I get the following error:
Traceback (most recent call last):
  File "putVersioning.py", line 27, in <module>
    bucketVersioning.enable()
  File "/usr/local/lib/python2.7/dist-packages/boto3/resources/factory.py", line 520, in do_action
    response = action(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/boto3/resources/action.py", line 83, in __call__
    response = getattr(parent.meta.client, operation_name)(**params)
  File "/home/user/.local/lib/python2.7/site-packages/botocore/client.py", line 253, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/user/.local/lib/python2.7/site-packages/botocore/client.py", line 557, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the PutBucketVersioning operation: Access Denied
My profiles have full privileges, so that shouldn't be a problem. Is there something else I need to do to pass credentials? Thanks, everyone!
To set the versioning state, you must be the bucket owner.
The above statement means that to use the PutBucketVersioning operation to enable versioning, you must be the owner of the bucket.
Use the command below to check the owner of the bucket. If you are the owner, you should be able to set the versioning state to ENABLED / SUSPENDED.
aws s3api get-bucket-acl --bucket yourBucketName
Ok, notionquest is correct; however, it appears I also goofed up in my code by quoting a variable:
bucketVersioning = s3.BucketVersioning('bucket')
should be
bucketVersioning = s3.BucketVersioning(bucket)
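Putting that fix together with the credentials question, a minimal sketch of the corrected loop (the bucket name is a placeholder, and the explicit profile_name is optional, assuming the "default" profile holds the intended credentials):
import boto3

# Explicitly select the "default" profile so the intended credentials are used
session = boto3.Session(profile_name='default')
s3 = session.resource('s3')

buckets = ['BUCKET-NAME']

for bucket in buckets:
    # Pass the loop variable, not the string literal 'bucket'
    bucketVersioning = s3.BucketVersioning(bucket)
    bucketVersioning.enable()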