Sagemaker python sdk: accessing custom_attributes in inference job - amazon-web-services

I am using the SageMaker Python SDK for my inference job and following this guide. I am triggering my SageMaker inference job from Airflow with the Python callable below:
from sagemaker.tensorflow import TensorFlowModel

def transform(sage_role, inference_file_local_path, **kwargs):
    """
    Python callable to execute a SageMaker SDK batch transform (inference) job. It takes
    infer_batch_output, infer_batch_input, model_artifact, instance_type and infer_file_name
    as runtime parameters.
    :param inference_file_local_path: Local entry_point path for the inference file.
    :param sage_role: SageMaker execution role.
    """
    model = TensorFlowModel(entry_point=infer_file_name,
                            source_dir=inference_file_local_path,
                            model_data=model_artifact,
                            role=sage_role,
                            framework_version="2.5.1")

    tensorflow_serving_transformer = model.transformer(
        instance_count=1,
        instance_type=instance_type,
        accept="text/csv",
        strategy="SingleRecord",
        max_payload=10,
        max_concurrent_transforms=10,
        output_path=batch_output)

    return tensorflow_serving_transformer.transform(data=batch_input, content_type='text/csv')
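For context, a minimal sketch of how such a callable might be wired into an Airflow DAG (the task id, the dag object and the op_kwargs values here are placeholders, not from the original setup):

from airflow.operators.python import PythonOperator

# Placeholder wiring; `dag` and the op_kwargs values are illustrative only.
run_batch_transform = PythonOperator(
    task_id="sagemaker_batch_transform",
    python_callable=transform,
    op_kwargs={
        "sage_role": "arn:aws:iam::123456789012:role/sagemaker-execution-role",
        "inference_file_local_path": "/usr/local/airflow/dags/inference_src",
    },
    dag=dag,
)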
and my simple inference.py looks like:
import json

import numpy as np
import pandas as pd

# config_exists, get_s3_json_file, model_bucket and config_path are assumed to be
# defined or imported elsewhere in this file.

def input_handler(data, context):
    """ Pre-process request input before it is sent to TensorFlow Serving REST API
    Args:
        data (obj): the request data, in format of dict or string
        context (Context): an object containing request and configuration details
    Returns:
        (dict): a JSON-serializable dict that contains request body and headers
    """
    if context.request_content_type == 'application/x-npy':
        # very simple numpy handler
        payload = np.load(data.read().decode('utf-8'))
        x_user_feature = np.asarray(payload.item().get('test').get('feature_a_list'))
        x_channel_feature = np.asarray(payload.item().get('test').get('feature_b_list'))
        examples = []
        for index, elem in enumerate(x_user_feature):
            examples.append({'feature_a_list': elem, 'feature_b_list': x_channel_feature[index]})
        return json.dumps({'instances': examples})

    if context.request_content_type == 'text/csv':
        payload = pd.read_csv(data)
        print("Model name is ..............")
        model_name = context.model_name
        print(model_name)
        examples = []
        row_ch = []
        if config_exists(model_bucket, "{}{}".format(config_path, model_name)):
            config_keys = get_s3_json_file(model_bucket, "{}{}".format(config_path, model_name))
            feature_b_list = config_keys["feature_b_list"].split(",")
            row_ch = [float(ch_feature_str) for ch_feature_str in feature_b_list]
            if "column_names" in config_keys.keys():
                cols = config_keys["column_names"].split(",")
                payload.columns = cols
        for index, row in payload.iterrows():
            row_user = row['feature_a_list'].replace('[', '').replace(']', '').split()
            row_user = [float(x) for x in row_user]
            if not row_ch:
                row_ch = row['feature_b_list'].replace('[', '').replace(']', '').split()
                row_ch = [float(x) for x in row_ch]
            example = {'feature_a_list': row_user, 'feature_b_list': row_ch}
            examples.append(example)
        return json.dumps({'instances': examples})

    raise ValueError('{{"error": "unsupported content type {}"}}'.format(
        context.request_content_type or "unknown"))
def output_handler(data, context):
    """Post-process TensorFlow Serving output before it is returned to the client.
    Args:
        data (obj): the TensorFlow serving response
        context (Context): an object containing request and configuration details
    Returns:
        (bytes, string): data to return to client, response content type
    """
    if data.status_code != 200:
        raise ValueError(data.content.decode('utf-8'))
    response_content_type = context.accept_header
    prediction = data.content
    return prediction, response_content_type
This works fine; however, I want to pass custom arguments to inference.py so that I can modify the input data based on the requirement. I thought of using a config file per requirement and downloading it from S3 based on the model name, but since I am using model_data and passing model.tar.gz at runtime, context.model_name is always None.
Is there a way I can pass runtime arguments to inference.py that I can use for customization?
In the docs I see that SageMaker provides custom_attributes, but I don't see any example of how to use it and access it in inference.py:
custom_attributes (string): content of 'X-Amzn-SageMaker-Custom-Attributes' header from the original request. For example, 'tfs-model-name=half_plus_three,tfs-method=predict'

Currently, CustomAttributes is supported in the InvokeEndpoint API call when using a real-time endpoint.
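For a real-time endpoint, a minimal sketch (the endpoint name and attribute string below are placeholders) would be to set CustomAttributes on the InvokeEndpoint call:

import boto3

runtime = boto3.client('sagemaker-runtime')

# Placeholder endpoint name and attribute string.
response = runtime.invoke_endpoint(
    EndpointName='my-endpoint',
    ContentType='text/csv',
    CustomAttributes='config_name=requirement_a',
    Body='1,2,3,4')

On the serving side, that value would then be available to input_handler as context.custom_attributes, as described in the documentation snippet quoted above.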
As an example, for a Transform Job you can look at passing JSON Lines as input, where each line contains the input payload plus some custom arguments which you can consume in your inference.py file.
For example,
{
    "input": "1,2,3,4",
    "custom_args": "my_custom_arg"
}
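A rough sketch of how inference.py could then read those custom arguments (assuming the transform job sends one JSON object per request, as in the example above, with a JSON content type; the exact content type depends on how the job is configured):

import json

def input_handler(data, context):
    # Assumes each request carries a JSON object shaped like the example above.
    if context.request_content_type in ('application/json', 'application/jsonlines'):
        record = json.loads(data.read().decode('utf-8'))
        custom_args = record.get('custom_args')                     # e.g. "my_custom_arg"
        features = [float(x) for x in record['input'].split(',')]   # e.g. "1,2,3,4"
        # ... adjust pre-processing here based on custom_args ...
        return json.dumps({'instances': [features]})
    raise ValueError('unsupported content type {}'.format(context.request_content_type))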

Related

Lambda Function working, but cannot work with API Gateway

I have a working Lambda function when I test it using a test event:
{
    "num1_in": 51.5,
    "num2_in": -0.097
}
import json
import Function_and_Data_List

#Parse out query string parameters
def lambda_handler(event, context):
    num1_in = event['num1_in']
    num2_in = event['num2_in']
    coord = {'num1': num1_in, 'num2': num2_in}
    output = func1(Function_and_Data_List.listdata, coord)
    return {
        "Output": output
    }
However, when I use API Gateway to create a REST API I keep getting errors. The steps I followed for the REST API are:
1.) Build REST API
2.) Actions -> Create Resource
3.) Actions -> Create Method -> GET
4.) Integration type is Lambda Function, Use Lambda Proxy Integration
5.) Deploy
What am I missing for getting this API to work?
If you use Lambda proxy integration, your payload will be in the request body. Your return format also appears to be incorrect.
Therefore, I would recommend trying the following version of your code:
import json
import Function_and_Data_List

#Parse out query string parameters
def lambda_handler(event, context):
    print(event)
    body = json.loads(event['body'])
    num1_in = body['num1_in']
    num2_in = body['num2_in']
    coord = {'num1': num1_in, 'num2': num2_in}
    output = func1(Function_and_Data_List.listdata, coord)
    return {
        "statusCode": 200,
        "body": json.dumps(output)
    }
In the above I also added print(event) so that you can inspect the event object in the CloudWatch Logs, which should help debug the issue.
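For illustration, with proxy integration the Lambda receives an event roughly shaped like the following (field values here are made up), which is why the payload has to be read out of event['body'] as a JSON string:

{
    "resource": "/calc",
    "queryStringParameters": null,
    "body": "{\"num1_in\": 51.5, \"num2_in\": -0.097}",
    "isBase64Encoded": false
}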

How to deploy breast cancer prediction endpoint created by AWS Sagemaker using Lambda and API gateway?

I am trying to deploy the existing breast cancer prediction model on Amazon SageMaker using AWS Lambda and API Gateway. I have followed the official documentation at the URL below.
https://aws.amazon.com/blogs/machine-learning/call-an-amazon-sagemaker-model-endpoint-using-amazon-api-gateway-and-aws-lambda/
I am getting a type error at "predicted_label".
result = json.loads(response['Body'].read().decode())
print(result)
pred = int(result['predictions'][0]['predicted_label'])
predicted_label = 'M' if pred == 1 else 'B'
return predicted_label
Please let me know if someone could help resolve this issue. Thank you.
By printing the result type with print(type(result)) you can see it is a dictionary. You can then see that the key name is "score" instead of the "predicted_label" you are giving to pred. Hence replace it with
pred = int(result['predictions'][0]['score'])
I think this solves your problem.
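For context, the decoded result has roughly the following shape (the score value here is illustrative), which is why 'score' is the key to read:

{
    "predictions": [
        {"score": 0.9683}
    ]
}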
Here is my lambda function:
import os
import io
import boto3
import json
import csv

# grab environment variables
ENDPOINT_NAME = os.environ['ENDPOINT_NAME']
runtime = boto3.client('runtime.sagemaker')

def lambda_handler(event, context):
    print("Received event: " + json.dumps(event, indent=2))
    data = json.loads(json.dumps(event))
    payload = data['data']
    print(payload)
    response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                       ContentType='text/csv',
                                       Body=payload)
    #print(response)
    print(type(response))
    for key, value in response.items():
        print(key, value)
    result = json.loads(response['Body'].read().decode())
    print(type(result))
    print(result['predictions'])
    pred = int(result['predictions'][0]['score'])
    print(pred)
    predicted_label = 'M' if pred == 1 else 'B'
    return predicted_label

AWS Lambda Python - Push part of MediaInfo function response to SNS

Firstly, I am relatively new to code and attempting to teach myself what I need! I have managed to butcher bits of example code that I found on various forums to get to where I am now. I am running an AWS Lambda function that triggers when a new file is uploaded to a bucket and then sends the file off to MediaInfo (I built a self-contained CLI executable that is uploaded with the Lambda function); the result of this is in XML format, and I have managed to pass this on to a DynamoDB database.
My question is: I want to export the XML produced by this function and push it to an SNS topic so that I can pick it up and use it elsewhere (Knack database). Here is my Lambda code in full (private info changed).
import logging
import subprocess

import boto3

SIGNED_URL_EXPIRATION = 300     # The number of seconds that the Signed URL is valid
DYNAMODB_TABLE_NAME = "demo_metadata"

DYNAMO = boto3.resource("dynamodb")
TABLE = DYNAMO.Table(DYNAMODB_TABLE_NAME)

logger = logging.getLogger('boto3')
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    """
    :param event:
    :param context:
    """
    # Loop through records provided by S3 Event trigger
    for s3_record in event['Records']:
        logger.info("Working on new s3_record...")
        # Extract the Key and Bucket names for the asset uploaded to S3
        key = s3_record['s3']['object']['key']
        bucket = s3_record['s3']['bucket']['name']
        logger.info("Bucket: {} \t Key: {}".format(bucket, key))
        # Generate a signed URL for the uploaded asset
        signed_url = get_signed_url(SIGNED_URL_EXPIRATION, bucket, key)
        logger.info("Signed URL: {}".format(signed_url))
        # Launch MediaInfo
        # Pass the signed URL of the uploaded asset to MediaInfo as an input
        # MediaInfo will extract the technical metadata from the asset
        # The extracted metadata will be outputted in XML format and
        # stored in the variable xml_output
        xml_output = subprocess.check_output(["./mediainfo", "--full", "--output=XML", signed_url])
        logger.info("Output: {}".format(xml_output))
        save_record(key, xml_output)

def save_record(key, xml_output):
    """
    Save record to DynamoDB
    :param key: S3 Key Name
    :param xml_output: Technical Metadata in XML Format
    :return: xml_output
    """
    logger.info("Saving record to DynamoDB...")
    TABLE.put_item(
        Item={
            'keyName': key,
            'technicalMetadata': xml_output
        }
    )
    logger.info("Saved record to DynamoDB")

def get_signed_url(expires_in, bucket, obj):
    """
    Generate a signed URL
    :param expires_in: URL Expiration time in seconds
    :param bucket:
    :param obj: S3 Key name
    :return: Signed URL
    """
    s3_cli = boto3.client("s3")
    presigned_url = s3_cli.generate_presigned_url('get_object',
                                                  Params={'Bucket': bucket, 'Key': obj},
                                                  ExpiresIn=expires_in)
    return presigned_url
The output I get from the Lambda function when testing it in the AWS console is shown below (the beginning is truncated), and this is what I want to send to an SNS topic.
<Height>1080</Height>
<Height>1 080 pixels</Height>
<Stored_Height>1088</Stored_Height>
<Sampled_Width>1920</Sampled_Width>
<Sampled_Height>1080</Sampled_Height>
<Pixel_aspect_ratio>1.000</Pixel_aspect_ratio>
<Display_aspect_ratio>1.778</Display_aspect_ratio>
<Display_aspect_ratio>16:9</Display_aspect_ratio>
<Rotation>0.000</Rotation>
<Frame_rate_mode>CFR</Frame_rate_mode>
<Frame_rate_mode>Constant</Frame_rate_mode>
<Frame_rate>29.970</Frame_rate>
<Frame_rate>29.970 (30000/1001) fps</Frame_rate>
<FrameRate_Num>30000</FrameRate_Num>
<FrameRate_Den>1001</FrameRate_Den>
<Frame_count>630</Frame_count>
<Resolution>8</Resolution>
<Resolution>8 bits</Resolution>
<Colorimetry>4:2:0</Colorimetry>
<Color_space>YUV</Color_space>
<Chroma_subsampling>4:2:0</Chroma_subsampling>
<Chroma_subsampling>4:2:0</Chroma_subsampling>
<Bit_depth>8</Bit_depth>
<Bit_depth>8 bits</Bit_depth>
<Scan_type>Progressive</Scan_type>
<Scan_type>Progressive</Scan_type>
<Interlacement>PPF</Interlacement>
<Interlacement>Progressive</Interlacement>
<Bits__Pixel_Frame_>0.129</Bits__Pixel_Frame_>
<Stream_size>21374449</Stream_size>
<Stream_size>20.4 MiB (99%)</Stream_size>
<Stream_size>20 MiB</Stream_size>
<Stream_size>20 MiB</Stream_size>
<Stream_size>20.4 MiB</Stream_size>
<Stream_size>20.38 MiB</Stream_size>
<Stream_size>20.4 MiB (99%)</Stream_size>
<Proportion_of_this_stream>0.98750</Proportion_of_this_stream>
<Encoded_date>UTC 2017-11-24 19:29:16</Encoded_date>
<Tagged_date>UTC 2017-11-24 19:29:16</Tagged_date>
<Buffer_size>16000000</Buffer_size>
<Color_range>Limited</Color_range>
<colour_description_present>Yes</colour_description_present>
<Color_primaries>BT.709</Color_primaries>
<Transfer_characteristics>BT.709</Transfer_characteristics>
<Matrix_coefficients>BT.709</Matrix_coefficients>
</track>
<track type="Audio">
<Count>272</Count>
<Count_of_stream_of_this_kind>1</Count_of_stream_of_this_kind>
<Kind_of_stream>Audio</Kind_of_stream>
<Kind_of_stream>Audio</Kind_of_stream>
<Stream_identifier>0</Stream_identifier>
<StreamOrder>1</StreamOrder>
<ID>2</ID>
<ID>2</ID>
<Format>AAC</Format>
<Format_Info>Advanced Audio Codec</Format_Info>
<Commercial_name>AAC</Commercial_name>
<Format_profile>LC</Format_profile>
<Codec_ID>40</Codec_ID>
<Codec>AAC LC</Codec>
<Codec>AAC LC</Codec>
<Codec_Family>AAC</Codec_Family>
</File>
</Mediainfo>
[INFO] 2018-04-22T18:50:01.803Z efde8294-465d-11e8-9ad2-0db0d6b36746 Saving record to DynamoDB...
[INFO] 2018-04-22T18:50:02.21Z efde8294-465d-11e8-9ad2-0db0d6b36746
Saved record to DynamoDB
END RequestId: efde8294-465d-11e8-9ad2-0db0d6b36746
REPORT RequestId: efde8294-465d-11e8-9ad2-0db0d6b36746 Duration: 9769.02 ms Billed Duration: 9800 ms Memory Size: 128 MB Max Memory Used: 61 MB
Many thanks in advance to anyone with advice!
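One possible direction, as a rough sketch rather than a tested solution: boto3's SNS client can publish the XML string directly, assuming a topic already exists (the ARN below is a placeholder). Note that SNS messages are limited to 256 KB, so very large MediaInfo outputs may need to be stored in S3 with only a reference published.

import boto3

sns = boto3.client("sns")
SNS_TOPIC_ARN = "arn:aws:sns:eu-west-1:123456789012:mediainfo-metadata"  # placeholder ARN

def publish_metadata(key, xml_output):
    """Publish the MediaInfo XML for a given S3 key to an SNS topic."""
    # subprocess.check_output returns bytes, so decode before publishing
    message = xml_output.decode("utf-8") if isinstance(xml_output, bytes) else xml_output
    sns.publish(
        TopicArn=SNS_TOPIC_ARN,
        Subject="MediaInfo metadata for {}".format(key)[:100],  # SNS subject is limited to 100 characters
        Message=message,
    )

This could then be called from lambda_handler right after save_record(key, xml_output).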

Importing an AWS image into softlayer

Is it possible to import an AWS image into SoftLayer directly? I know we can download the AWS image and then import it into SoftLayer, but I was looking for an automated solution.
There isn't any SoftLayer API method which performs the whole process automatically; the image must be uploaded to one of your object storage accounts. You can use the API to upload the image there; here are some references:
http://sldn.softlayer.com/blog/waelriac/Managing-SoftLayer-Object-Storage-Through-REST-APIs
and see this documentation about how to handle large files
https://docs.openstack.org/developer/swift/overview_large_objects.html
Once the file has been uploaded you can import it using the API. Here is an example using the SoftLayer Python client:
"""
Create Image Template from external source
This script creates a transaction to import a disk image from an external source and create
a standard image template
Important manual pages:
http://sldn.softlayer.com/reference/services/SoftLayer_Virtual_Guest_Block_Device_Template_Group/createFromExternalSource
http://sldn.softlayer.com/reference/datatypes/SoftLayer_Container_Virtual_Guest_Block_Device_Template_Configuration
http://sldn.softlayer.com/reference/datatypes/SoftLayer_Virtual_Guest_Block_Device_Template_Group
License: http://sldn.softlayer.com/article/License
Author: SoftLayer Technologies, Inc. <sldn#softlayer.com>
"""
import SoftLayer
# Your SoftLayer username and apiKey
USERNAME = 'set me'
API_KEY = 'set me'
# Declare the group name to be applied to the imported template
name = 'imageTemplateTest'
# Declare the note to be applied to the imported template
note = 'This is for test Rcv'
'''
Declare referenceCode of the operating system software description for the imported VHD
available options: CENTOS_6_32, CENTOS_6_64, CENTOS_7_64, REDHAT_6_32, REDHAT_6_64, REDHAT_7_64,
UBUNTU_12_32, UBUNTU_12_64, UBUNTU_14_32, UBUNTU_14_64, WIN_2003-STD-SP2-5_32, WIN_2003-STD-SP2-5_64,
WIN_2008-STD-R2-SP1_64, WIN_2012-STD_64.
'''
operatingSystemReferenceCode = 'CENTOS_6_64'
'''
Define the parameters below, which refers to object storage where the image template is stored.
It will help to build the uri.
'''
# Declare the object storage account name
objectStorageAccountName = 'SLOS307608-10'
# Declare the cluster name where the image is stored
clusterName = 'dal05'
# Declare the container name where the image is stored
containerName = 'OS'
# Declare the file name of the image stored in the object storage, it should be .vhd or
fileName = 'testImage2.vhd-0.vhd'
"""
Creating an SoftLayer_Container_Virtual_Guest_block_Device_Template_Configuration Skeleton
which contains the information from external source
"""
configuration = {
'name': name,
'note': note,
'operatingSystemReferenceCode': operatingSystemReferenceCode,
'uri': 'swift://'+ objectStorageAccountName + '#' + clusterName + '/' + containerName + '/' + fileName
}
# Declare the API client
client = SoftLayer.Client(username=USERNAME, api_key=API_KEY)
groupService = client['SoftLayer_Virtual_Guest_Block_Device_Template_Group']
try:
result = groupService.createFromExternalSource(configuration)
print(result)
except SoftLayer.SoftLayerAPIError as e:
print("Unable to create the image template from external source. faultCode=%s, faultString=%s" % (e.faultCode, e.faultString))
exit(1)
Regards

Using different settings on a aws lambda function coded in python

I'm using a Lambda function, coded in Python, as a backend to an aws-api-gateway method.
The API is complete, but now I have a new problem: the API should be deployed to multiple environments (production, test, etc.), and each one should use a different configuration for the backend. Let's say that I have this handler:
import json
import logging

import settings
import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def dummy_handler(event, context):
    logger.info('got event{}'.format(event))
    utils = Utils(event["stage"])
    response = utils.put_ticket_on_dynamodb(event["item"])
    return json.dumps(response)

class Utils:
    def __init__(self, stage):
        self.stage = stage

    def put_ticket_on_dynamodb(self, item):
        # Write record to DynamoDB
        try:
            dynamodb = boto3.resource('dynamodb')
            table = dynamodb.Table(settings.TABLE_NAME)
            table.put_item(Item=item)
        except Exception as e:
            logger.error("Fail to put item on DynamoDB: {0}".format(str(e)))
            raise
        logger.info("Item successfully written to DynamoDB")
        return item
Now, in order to use a different TABLE_NAME on each stage, I replace the setting.py file by a module, with this structure:
settings/
__init__.py
_base.py
_servers.py
development.py
production.py
testing.py
I am following this answer here.
But I have no idea how to use it in my solution, considering that stage (passed as a parameter to the Utils class) will match the settings filename in the settings module. What should I change in my Utils class to make it work?
Another alternative for handling this use case is to use API Gateway's stage variables and pass the settings which vary by stage as parameters to your Lambda function.
Stage variables are name-value pairs associated with a specific API deployment stage and act like environment variables for use in your API setup and mapping templates. For example, you can configure an API method in each stage to connect to a different backend endpoint by setting different endpoint values in your stage variables.
Here is a blog post on using stage variables.
Here is the full documentation on using stage variables.
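As an illustration (not part of the original answer), a body mapping template on the integration request could forward a stage variable to the Lambda so the handler keeps reading event["stage"]; the stage variable name env below is hypothetical:

{
    "stage": "$stageVariables.env",
    "item": $input.json('$.item')
}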
I finally used a different approach here. Instead of a Python module for the settings, I used a single settings script with a dictionary containing the configuration for each environment. I would like to use a separate settings script for each environment, but so far I haven't found how.
So now my settings file looks like this:
COUNTRY_CODE = 'CL'
TIMEZONE = "America/Santiago"
LOCALE = "es_CL"
DEFAULT_PAGE_SIZE = 20

ENV = {
    'production': {
        'TABLE_NAME': "dynamodbTable",
        'BUCKET_NAME': "sssBucketName"
    },
    'testing': {
        'TABLE_NAME': "dynamodbTableTest",
        'BUCKET_NAME': "sssBucketNameTest"
    },
    'test-invoke-stage': {
        'TABLE_NAME': "dynamodbTableTest",
        'BUCKET_NAME': "sssBucketNameTest"
    }
}
And my code:
def put_ticket_on_dynamodb(self, item):
    # Write record to DynamoDB
    try:
        dynamodb = boto3.resource('dynamodb')
        table = dynamodb.Table(settings.ENV[self.stage]["TABLE_NAME"])
        table.put_item(Item=item)
    except Exception as e:
        logger.error("Fail to put item on DynamoDB: {0}".format(str(e)))
        raise
    logger.info("Item successfully written to DynamoDB")
    return item
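If separate per-stage settings modules are still preferred, one possible sketch (not from the original post) is to resolve the module by stage name with importlib, assuming the settings package layout shown earlier:

import importlib

def load_settings(stage):
    """Load settings.<stage> (e.g. settings.production), falling back to testing."""
    try:
        return importlib.import_module('settings.{}'.format(stage))
    except ImportError:
        return importlib.import_module('settings.testing')

# Inside Utils, the table name could then be looked up as:
# table = dynamodb.Table(load_settings(self.stage).TABLE_NAME)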