Troposphere: add an S3 LifecycleRule to the LifecycleConfiguration class

I have tried changing the implementation a few different ways, but it isn't helping.
Method signature:
import troposphere.s3 as S3

class LifecycleConfiguration(AWSProperty):
    props = {
        'Rules': ([LifecycleRule], True),
    }

class LifecycleRule(AWSProperty):
    props = {
        'ExpirationDate': (basestring, False),
        'ExpirationInDays': (positive_integer, False),
        'Id': (basestring, False),
        'Prefix': (basestring, False),
        'Status': (basestring, True),
        'Transition': (LifecycleRuleTransition, False),
    }
Test implementation:
myLifecycleConfiguration = S3.LifecycleConfiguration(
    title='myLifecycleConfiguration',
    Rules=S3.LifecycleRule(title="check")
)
Error signature:
TypeError: Rules is <class 'troposphere.s3.LifecycleRule'>, expected [<class 'troposphere.s3.LifecycleRule'>]

That's normal, and the error message is telling you what the problem is: the 'Rules' property of a LifecycleConfiguration is expected to be a list of LifecycleRule objects (see in the code: 'Rules': ([LifecycleRule], True)).
But when you create your S3.LifecycleConfiguration, you feed the 'Rules' property a single LifecycleRule object instead.
You should write:
myLifecycleConfiguration = S3.LifecycleConfiguration(
    title='myLifecycleConfiguration',
    Rules=[S3.LifecycleRule(title="check")]
)
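Note also that LifecycleRule declares Status as a required property ('Status': (basestring, True)), so a rule that only sets a title will still fail troposphere's required-property validation when the template is rendered. A minimal sketch of a complete configuration attached to a bucket; the bucket name, prefix and 30-day expiration here are hypothetical, not from the question:

import troposphere.s3 as S3
from troposphere import Template

# Hypothetical rule: expire objects under "logs/" after 30 days.
lifecycle = S3.LifecycleConfiguration(
    Rules=[
        S3.LifecycleRule(
            Id="ExpireOldLogs",
            Prefix="logs/",
            Status="Enabled",        # required per the props definition above
            ExpirationInDays=30,
        )
    ]
)

template = Template()
template.add_resource(S3.Bucket("MyBucket", LifecycleConfiguration=lifecycle))
print(template.to_json())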

Related

Variable passing throwing error in BigQueryInsertJobOperator in Airflow

I have written a BigQueryInsertJobOperator in Airflow to select and insert data into a BigQuery table, but I am facing an issue with variable passing. I am getting the error below while executing the Airflow DAG.
File "/home/airflow/.local/lib/python3.7/site-packages/google/cloud/bigquery/job/query.py", line 911, in to_api_repr
configuration = self._configuration.to_api_repr()
File "/home/airflow/.local/lib/python3.7/site-packages/google/cloud/bigquery/job/query.py", line 683, in to_api_repr
query_parameters = resource["query"].get("queryParameters")
AttributeError: 'str' object has no attribute 'get'
Here is my Operator code:
dag = DAG(
    'bq_to_sql_operator',
    default_args=default_args,
    schedule_interval="@daily",
    template_searchpath="/opt/airflow/dags/scripts",
    user_defined_macros={"BQ_PROJECT": BQ_PROJECT, "BQ_EDW_DATASET": BQ_EDW_DATASET, "BQ_STAGING_DATASET": BQ_STAGING_DATASET},
    catchup=False
)
t1 = BigQueryInsertJobOperator(
    task_id='bq_write_to_umc_cg_service_agg_stg',
    configuration={
        "query": "{% include 'umc_cg_service_agg_stg.sql' %}",
        "useLegacySql": False,
        "allow_large_results": True,
        "writeDisposition": "WRITE_TRUNCATE",
        "destinationTable": {
            'projectId': BQ_PROJECT,
            'datasetId': BQ_STAGING_DATASET,
            'tableId': UMC_CG_SERVICE_AGG_STG_TABLE_NAME
        }
    },
    params={'BQ_PROJECT': BQ_PROJECT, 'BQ_EDW_DATASET': BQ_EDW_DATASET, 'BQ_STAGING_DATASET': BQ_STAGING_DATASET},
    gcp_conn_id=BQ_CONN_ID,
    location=BQ_LOCATION,
    dag=dag
)
My SQL file looks like this:
select
faccs2.employer_key employer_key,
faccs2.service_name service_name,
gender,
approximate_age_band,
state,
relationship_map_name,
account_attribute1_name,
account_attribute1_value,
account_attribute2_name,
account_attribute2_value,
account_attribute3_name,
account_attribute3_value,
account_attribute4_name,
account_attribute4_value,
account_attribute5_name,
account_attribute5_value,
count(distinct faccs2.sf_service_id) total_service_count
from `{{params.BQ_PROJECT}}.{{params.BQ_EDW_DATASET}}.fact_account_cg_case_survey` faccs
inner join `{{params.BQ_PROJECT}}.{{params.BQ_EDW_DATASET}}.fact_account_cg_case_service` faccs2 on faccs.sf_case_id = faccs2.sf_case_id
inner join `{{params.BQ_PROJECT}}.{{params.BQ_EDW_DATASET}}.dim_account` da on faccs2.account_key = da.account_key
left join `{{params.BQ_PROJECT}}.{{params.BQ_STAGING_DATASET}}.stg_account_selected_attr_tmp2` attr on faccs.account_key = attr.account_key
where not da.is_test_account_flag
and attr.gender is not null
and coalesce(faccs.case_status,'abc') <> 'Closed as Duplicate'
group by 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16;
Can someone please help me fix this issue?
I think the query configuration should be in a nested dictionary called query:
t1 = BigQueryInsertJobOperator(
    task_id='bq_write_to_umc_cg_service_agg_stg',
    configuration={
        "query": {
            "query": "{% include 'umc_cg_service_agg_stg.sql' %}",
            "useLegacySql": False,
            "allow_large_results": True,
            "writeDisposition": "WRITE_TRUNCATE",
            "destinationTable": {
                'projectId': BQ_PROJECT,
                'datasetId': BQ_STAGING_DATASET,
                'tableId': UMC_CG_SERVICE_AGG_STG_TABLE_NAME
            }
        }
    },
    params={'BQ_PROJECT': BQ_PROJECT, 'BQ_EDW_DATASET': BQ_EDW_DATASET, 'BQ_STAGING_DATASET': BQ_STAGING_DATASET},
    gcp_conn_id=BQ_CONN_ID,
    location=BQ_LOCATION,
    dag=dag
)
With your provided configuration dict, an internal method tries to access queryParameters, which should live inside the dict configuration["query"], but it finds a str instead of a dict.
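To make the shape difference concrete, here is a minimal sketch using plain dictionaries only (the SELECT 1 query is just a placeholder, not from the original post) of what the failing DAG passed versus what the BigQuery job API expects:

# What the failing DAG passed: configuration["query"] is a plain string,
# so resource["query"].get("queryParameters") raises the AttributeError above.
bad_configuration = {
    "query": "SELECT 1",
    "useLegacySql": False,
}

# What to_api_repr() expects: configuration["query"] is itself a dict
# holding the query string plus its options.
good_configuration = {
    "query": {
        "query": "SELECT 1",
        "useLegacySql": False,
    }
}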
Consider the script below, which I've used at work.
target_date = '{{ ds_nodash }}'
...
# DAG task
t1 = bq.BigQueryInsertJobOperator(
    task_id='sample_task',
    configuration={
        "query": {
            "query": "{% include 'your_query_file.sql' %}",
            "useLegacySql": False,
            "queryParameters": [
                {
                    "name": "target_date",
                    "parameterType": {"type": "STRING"},
                    "parameterValue": {"value": f"{target_date}"}
                }
            ],
            "parameterMode": "NAMED"
        },
    },
    location='asia-northeast3',
)
-- In your_query_file.sql, the @target_date value is passed in as a named parameter.
DECLARE target_date DATE DEFAULT SAFE.PARSE_DATE('%Y%m%d', @target_date);
SELECT ... FROM ... WHERE partitioned_at = target_date;
You can refer to the configuration JSON field specification at the link below.
https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query#queryrequest
parameterMode (string)
Standard SQL only. Set to POSITIONAL to use positional (?) query parameters or to NAMED to use named (@myparam) query parameters in this query.
queryParameters[] (object (QueryParameter))
Query parameters for Standard SQL queries.
queryParameters is an array of QueryParameter objects, each of which has the following JSON format.
{
  "name": string,
  "parameterType": {
    object (QueryParameterType)
  },
  "parameterValue": {
    object (QueryParameterValue)
  }
}
https://cloud.google.com/bigquery/docs/reference/rest/v2/QueryParameter

The argument type 'Object?' can't be assigned to the parameter type 'String'

I have a problem with the models in a Flutter project of mine.
I get these errors:
The argument type 'Object?' can't be assigned to the parameter type 'String'.
The argument type 'Object?' can't be assigned to the parameter type 'int'.
The argument type 'Object?' can't be assigned to the parameter type 'String'.
class Category {
  final String name;
  final int numOfCourses;
  final String image;

  Category(this.name, this.numOfCourses, this.image);
}

List<Category> categories = categoriesData
    .map((item) => Category(item['name'], item['courses'], item['image']))
    .toList();

var categoriesData = [
  {"name": "Marketing", 'courses': 17, 'image': "assets/images/marketing.png"},
  {"name": "UX Design", 'courses': 25, 'image': "assets/images/ux_design.png"},
  {
    "name": "Photography",
    'courses': 13,
    'image': "assets/images/photography.png"
  },
  {"name": "Business", 'courses': 17, 'image': "assets/images/business.png"},
];
The error is in this part:
(item['name'], item['courses'], item['image'])
Thanks for the answers.
Dart doesn't know what item['name'], item['courses'] or item['image'] are supposed to be. To tell it, you can use the as keyword:
categories = categoriesData
    .map((item) =>
        Category(item['name'] as String, item['courses'] as int, item['image'] as String))
    .toList();

Properly return a label in post-annotation lambda for AWS SageMaker Ground Truth custom labeling job

I'm working on a SageMaker labeling job with custom datatypes. For some reason, though, I'm not getting the correct label in the AWS web console. It should show the selected label, which is "Native", but instead I'm getting the <labelattributename>, which is "new-test-14".
After Ground Truth runs the post-annotation lambda, it seems to modify the metadata before returning a data object. The data object it returns doesn't contain a class-name key inside the metadata attribute, even when I hard-code the lambda to return an object that contains it.
My manifest file looks like this:
{"source-ref" : "s3://<file-name>", "text" : "Hello world"}
{"source-ref" : "s3://"<file-name>", "text" : "Hello world"}
And the worker response looks like this:
{"answers":[{"acceptanceTime":"2021-05-18T16:08:29.473Z","answerContent":{"new-test-14":{"label":"Native"}},"submissionTime":"2021-05-18T16:09:15.960Z","timeSpentInSeconds":46.487,"workerId":"private.us-east-1.ea05a03fcd679cbb","workerMetadata":{"identityData":{"identityProviderType":"Cognito","issuer":"https://cognito-idp.us-east-1.amazonaws.com/us-east-1_XPxQ9txEq","sub":"edc59ce1-e09d-4551-9e0d-a240465ea14a"}}}]}
That worker response gets processed by my post-annotation lambda, which is modeled after this AWS sample Ground Truth recipe. Here's my code:
import json
import sys
import boto3
from datetime import datetime

def lambda_handler(event, context):
    # Event received
    print("Received event: " + json.dumps(event, indent=2))

    labeling_job_arn = event["labelingJobArn"]
    label_attribute_name = event["labelAttributeName"]

    label_categories = None
    if "label_categories" in event:
        label_categories = event["labelCategories"]
        print(" Label Categories are : " + label_categories)

    payload = event["payload"]
    role_arn = event["roleArn"]

    output_config = None  # Output s3 location. You can choose to write your annotation to this location
    if "outputConfig" in event:
        output_config = event["outputConfig"]

    # If you specified a KMS key in your labeling job, you can use the key to write
    # consolidated_output to s3 location specified in outputConfig.
    # kms_key_id = None
    # if "kmsKeyId" in event:
    #     kms_key_id = event["kmsKeyId"]
    # # Create s3 client object
    # s3_client = S3Client(role_arn, kms_key_id)
    s3_client = boto3.client('s3')

    # Perform consolidation
    return do_consolidation(labeling_job_arn, payload, label_attribute_name, s3_client)
def do_consolidation(labeling_job_arn, payload, label_attribute_name, s3_client):
    """
    Core Logic for consolidation

    :param labeling_job_arn: labeling job ARN
    :param payload: payload data for consolidation
    :param label_attribute_name: identifier for labels in output JSON
    :param s3_client: S3 helper class
    :return: output JSON string
    """
    # Extract payload data
    if "s3Uri" in payload:
        s3_ref = payload["s3Uri"]
        payload_bucket, payload_key = s3_ref.split('/', 2)[-1].split('/', 1)
        payload = json.loads(s3_client.get_object(Bucket=payload_bucket, Key=payload_key)['Body'].read())
        # print(payload)

    # Payload data contains a list of data objects.
    # Iterate over it to consolidate annotations for individual data object.
    consolidated_output = []
    success_count = 0  # Number of data objects that were successfully consolidated
    failure_count = 0  # Number of data objects that failed in consolidation

    for p in range(len(payload)):
        response = None
        try:
            dataset_object_id = payload[p]['datasetObjectId']
            log_prefix = "[{}] data object id [{}] :".format(labeling_job_arn, dataset_object_id)
            print("{} Consolidating annotations BEGIN ".format(log_prefix))

            annotations = payload[p]['annotations']
            # print("{} Received Annotations from all workers {}".format(log_prefix, annotations))

            # Iterate over annotations. Log all annotation to your CloudWatch logs
            annotationsFromAllWorkers = []
            for i in range(len(annotations)):
                worker_id = annotations[i]["workerId"]
                anotation_data = annotations[i]["annotationData"]
                annotation_content = anotation_data["content"]
                annotation_content_json = json.loads(annotation_content)
                annotation_job = annotation_content_json["new_test"]
                annotation_label = annotation_job["label"]
                consolidated_annotation = {
                    "workerId": worker_id,
                    "annotationData": {
                        "content": {
                            "annotatedResult": {
                                "instances": [{"label": annotation_label}]
                            }
                        }
                    }
                }
                annotationsFromAllWorkers.append(consolidated_annotation)

            consolidated_annotation = {"annotationsFromAllWorkers": annotationsFromAllWorkers}  # TODO : Add your consolidation logic

            # Build consolidation response object for an individual data object
            response = {
                "datasetObjectId": dataset_object_id,
                "consolidatedAnnotation": {
                    "content": {
                        label_attribute_name: consolidated_annotation,
                        label_attribute_name + "-metadata": {
                            "class-name": "Native",
                            "confidence": 0.00,
                            "human-annotated": "yes",
                            "creation-date": datetime.strftime(datetime.now(), "%Y-%m-%dT%H:%M:%S"),
                            "type": "groundtruth/custom"
                        }
                    }
                }
            }
            success_count += 1
            # print("{} Consolidating annotations END ".format(log_prefix))

            # Append individual data object response to the list of responses.
            if response is not None:
                consolidated_output.append(response)
        except Exception:
            failure_count += 1
            print(" Consolidation failed for dataobject {}".format(p))
            print(" Unexpected error: Consolidation failed." + str(sys.exc_info()[0]))

    print("Consolidation Complete. Success Count {} Failure Count {}".format(success_count, failure_count))
    print(" -- Consolidated Output -- ")
    print(consolidated_output)
    print(" ------------------------- ")
    return consolidated_output
As you can see above, the do_consolidation method returns an object hard-coded to include a class-name of "Native", and the lambda_handler method returns that same object. Here's the post-annotation function response:
[{
    "datasetObjectId": "4",
    "consolidatedAnnotation": {
        "content": {
            "new-test-14": {
                "annotationsFromAllWorkers": [{
                    "workerId": "private.us-east-1.ea05a03fcd679cbb",
                    "annotationData": {
                        "content": {
                            "annotatedResult": {
                                "instances": [{
                                    "label": "Native"
                                }]
                            }
                        }
                    }
                }]
            },
            "new-test-14-metadata": {
                "class-name": "Native",
                "confidence": 0,
                "human-annotated": "yes",
                "creation-date": "2021-05-19T07:06:06",
                "type": "groundtruth/custom"
            }
        }
    }
}]
As you can see, the post-annotation function return value has the class-name of "Native" in the metadata so I would expect the class-name to be present in the data object metadata, but it's not. And here's a screenshot of the data object summary:
It seems like Ground Truth overwrote the metadata, and now the object doesn't contain the correct label. I think perhaps that's why my label is coming through as the label attribute name "new-test-14" instead of as the correct label "Native". Here's a screenshot of the labeling job in the AWS web console:
The web console is supposed to show the label "Native" inside the "Label" column but instead I'm getting the <labelattributename> "new-test-14" in the label column.
Here is the output.manifest file generated by Ground Truth at the end:
{
    "source-ref": "s3://<file-name>",
    "text": "Hello world",
    "new-test-14": {
        "annotationsFromAllWorkers": [{
            "workerId": "private.us-east-1.ea05a03fcd679ert",
            "annotationData": {
                "content": {
                    "annotatedResult": {
                        "label": "Native"
                    }
                }
            }
        }]
    },
    "new-test-14-metadata": {
        "type": "groundtruth/custom",
        "job-name": "new-test-14",
        "human-annotated": "yes",
        "creation-date": "2021-05-18T12:34:17.400000"
    }
}
What should I return from the Post-Annotation function? Am I missing something in my response? How do I get the proper label to appear in the AWS web console?

Serialize Django Request

I'd like to serialize the Django request in order to log it in a DB. I tried different approaches, but none of them worked.
class RunTest(View):
    def get(self, request, url):
        srd = serializers.serialize('json', request)
        return HttpResponse(json.dumps(request.META))
But this raises the error:
module 'rest_framework.serializers' has no attribute 'serialize'
Probably because I'm using the rest-framework as a Middleware.
I also used
srd = json.dumps(request)
In this case the error is
Object of type 'WSGIRequest' is not JSON serializable
Any ideas how to fix this?
I've faced a similar problem when trying to store received requests' META data in a JSONField. The problem is that request.META is a dict, but not one that is valid JSON as-is.
An example request.META I have is:
{
    "wsgi.version": (1, 0),
    "wsgi.url_scheme": "http",
    "wsgi.input": <_io.BufferedReader name=10>,
    "wsgi.errors": <_io.TextIOWrapper name="<stderr>" mode="w" encoding="utf-8">,
    "wsgi.multithread": True,
    "wsgi.multiprocess": False,
    "wsgi.run_once": False,
    "SERVER_SOFTWARE": "Werkzeug/1.0.1",
    "REQUEST_METHOD": "POST",
    "SCRIPT_NAME": "",
    "PATH_INFO": "/api/v1/vouchers/voucher-distribute/",
    "QUERY_STRING": "",
    "REQUEST_URI": "/api/v1/vouchers/voucher-distribute"
    ...
}
So as you can see, the first few keys with the wsgi prefix have values that are not valid JSON, which you can also check online at http://json.parser.online.fr/
So to store request.META as a JSON dict it's necessary to get rid of these keys. The trick is that you cannot just use request.META.pop("wsgi.version"), because request.META is not in an appropriate JSON format :)
What I did is create a helper function:
def create_request_meta_json_object(meta_data):
    return {
        "REQUEST_METHOD": meta_data["REQUEST_METHOD"],
        "SERVER_SOFTWARE": meta_data["SERVER_SOFTWARE"],
        "SCRIPT_NAME": meta_data["SCRIPT_NAME"],
        "PATH_INFO": meta_data["PATH_INFO"],
        "QUERY_STRING": meta_data["QUERY_STRING"],
        "REQUEST_URI": meta_data["REQUEST_URI"],
        "RAW_URI": meta_data["RAW_URI"],
        "REMOTE_ADDR": meta_data["REMOTE_ADDR"],
        "REMOTE_PORT": meta_data["REMOTE_PORT"],
        "SERVER_NAME": meta_data["SERVER_NAME"],
        "SERVER_PORT": meta_data["SERVER_PORT"],
        "SERVER_PROTOCOL": meta_data["SERVER_PROTOCOL"],
        "HTTP_X_FORWARDED_HOST": meta_data["HTTP_X_FORWARDED_HOST"],
        "HTTP_X_FORWARDED_PORT": meta_data["HTTP_X_FORWARDED_PORT"],
        "HTTP_ACCEPT_ENCODING": meta_data["HTTP_ACCEPT_ENCODING"],
        "HTTP_USER_AGENT": meta_data["HTTP_USER_AGENT"],
        "HTTP_FROM": meta_data["HTTP_FROM"],
        "HTTP_ACCEPT": meta_data["HTTP_ACCEPT"],
        "CONTENT_TYPE": meta_data["CONTENT_TYPE"],
        "CONTENT_LENGTH": meta_data["CONTENT_LENGTH"],
        "HTTP_CONNECTION": meta_data["HTTP_CONNECTION"],
        "HTTP_X_NGINX_PROXY": meta_data["HTTP_X_NGINX_PROXY"],
        "HTTP_X_FORWARDED_PROTO": meta_data["HTTP_X_FORWARDED_PROTO"],
        "HTTP_X_FORWARDED_FOR": meta_data["HTTP_X_FORWARDED_FOR"],
        "HTTP_X_REAL_IP": meta_data["HTTP_X_REAL_IP"],
    }
and use it like:
meta_data_as_json = create_request_meta_json_object(request.META)
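For completeness, a minimal sketch of how the cleaned-up dict might then be persisted; the RequestLog model here is hypothetical and not part of the original answer (models.JSONField requires Django 3.1+):

from django.db import models

class RequestLog(models.Model):
    # Hypothetical model that stores the JSON-safe subset of request.META.
    meta = models.JSONField()
    created_at = models.DateTimeField(auto_now_add=True)

# Inside a view:
# RequestLog.objects.create(meta=create_request_meta_json_object(request.META))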
You cannot serialize the request itself - you could serialize request.GET, also known as request.query_params in DRF.
srd = json.dumps(request.query_params)
or
srd = json.dumps(request.GET)
To use a serializer, you first have to create one. Declaring Serializers is a good starting point.
Another potential solution is to use a dictionary comprehension:
meta_keys = {
    "wsgi.version": (1, 0),
    "wsgi.url_scheme": "http",
    "wsgi.input": object(),
    "wsgi.errors": object(),
    "wsgi.multithread": True,
    "wsgi.multiprocess": False,
    "wsgi.run_once": False,
    "SERVER_SOFTWARE": "Werkzeug/1.0.1",
    "REQUEST_METHOD": "POST",
    "SCRIPT_NAME": "",
    "PATH_INFO": "/api/v1/vouchers/voucher-distribute/",
    "QUERY_STRING": "",
    "REQUEST_URI": "/api/v1/vouchers/voucher-distribute"
}
excluded_meta_keys = ['wsgi.version', 'wsgi.url_scheme', 'wsgi.input', 'wsgi.multithread', 'wsgi.multiprocess', 'wsgi.run_once', 'wsgi.errors']

print({key: value for key, value in meta_keys.items() if key not in excluded_meta_keys})
print({key: value for key, value in meta_keys.items() if isinstance(value, (str, bool, int, float))})
results in:
{'SERVER_SOFTWARE': 'Werkzeug/1.0.1', 'REQUEST_METHOD': 'POST', 'SCRIPT_NAME': '', 'PATH_INFO': '/api/v1/vouchers/voucher-distribute/', 'QUERY_STRING': '', 'REQUEST_URI': '/api/v1/vouchers/voucher-distribute'}
{'wsgi.url_scheme': 'http', 'wsgi.multithread': True, 'wsgi.multiprocess': False, 'wsgi.run_once': False, 'SERVER_SOFTWARE': 'Werkzeug/1.0.1', 'REQUEST_METHOD': 'POST', 'SCRIPT_NAME': '', 'PATH_INFO': '/api/v1/vouchers/voucher-distribute/', 'QUERY_STRING': '', 'REQUEST_URI': '/api/v1/vouchers/voucher-distribute'}
respectively
request.META has values of types that a JSON object does not allow, so you can generate another structure containing only the entries whose values are of str type:
def create_request_meta_json_object(meta_data):
    diccionario = []
    for key, value in meta_data.items():
        if type(value) is str:
            diccionario.append({"key": key, "value": value})
    return diccionario

meta_entries = create_request_meta_json_object(request.META)
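A more compact variant of the same idea, assuming you want a plain dict of the JSON-safe values rather than a list of key/value pairs (a sketch, not part of the original answers):

import json

def meta_to_json_dict(meta_data):
    # Keep only values that json.dumps can serialize natively.
    return {key: value for key, value in meta_data.items()
            if isinstance(value, (str, bool, int, float))}

# Usage inside a view:
# serialized = json.dumps(meta_to_json_dict(request.META))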

How does JSON.mapping macro work with union types of arguments?

The JSON.mapping documentation explicitly states that the value of the type property should be a single type. However, in practice union types also work:
json1 = %q({"ok": true, "result": [{"type": "update", "id": 1}, {"type": "update", "id": 2}]})
json2 = %q({"ok": true, "result": {"type": "message"}})

class Response
  JSON.mapping({
    ok: Bool,
    result: Message | Array(Update)
  })
end

class Update
  JSON.mapping({
    type: String,
    id: Int32
  })
end

class Message
  JSON.mapping({
    type: String
  })
end
Calling Response.from_json on both JSON strings will output the expected result.
Response.from_json json1
will output:
#<Response:0x10d20ce20
 @ok=true,
 @result=
  [#<Update:0x10d20cc60 @id=1, @type="update">,
   #<Update:0x10d20cbe0 @id=2, @type="update">]>
And
Response.from_json json2
will output:
#<Response:0x10d20c180
 @ok=true,
 @result=#<Message:0x10e241f80 @type="message">>
My question is: how does this work? Is it expected behaviour or a random, unreliable feature?
This is expected, the documentation is incorrect.