Use AWS Lambda to run Sagemaker Batch Transform job - amazon-web-services

I would like to place a CSV file in an S3 bucket and automatically get predictions from a SageMaker model via a batch transform job. I want to do this by using an S3 event notification (upon CSV upload) to trigger a Lambda function that starts the batch transform job. The Lambda function I have written so far is this:
import boto3

sagemaker = boto3.client('sagemaker')

input_data_path = 's3://yeex/upload/examples.csv'.format(default_bucket, 's3://yeex/upload/', 'examples.csv')
output_data_path = 's3://nooz/download/'.format(default_bucket, 's3://nooz/download')

transform_job = sagemaker.transformer.Transformer(
    model_name = y_xgboost_21,
    instance_count = 1,
    instance_type = 'ml.m5.large',
    strategy = 'SingleRecord',
    assemble_with = 'Line',
    output_path = output_data_path,
    base_transform_job_name = 'y-test-batch',
    sagemaker_session = sagemaker.Session(),
    accept = 'text/csv')

transform_job.transform(data = input_data_path,
                        content_type = 'text/csv',
                        split_type = 'Line')
The error it returns is that the sagemaker object does not have a transformer module.
What is the syntax I should use in the Lambda function?

While Boto3 (boto3.client("sagemaker")) is the general-purpose AWS SDK for Python across different services, examples that reference classes like Estimator, Transformer, Predictor, etc. are using the SageMaker Python SDK (import sagemaker).
In general I'd say (almost?) anything that can be done in one can also be done in the other, as they use the same underlying service APIs - but the purpose of the SageMaker Python SDK is to provide higher-level abstractions and useful utilities: for example, transparently zipping and uploading a source_dir to S3 to deliver "script mode" training.
As far as I'm aware, the SageMaker Python SDK is still not pre-installed in the AWS Lambda Python runtimes by default, but it is an open-source, pip-installable package.
So you have 2 choices here:
Continue using boto3 and create your transform job via the low-level create_transform_job API (a sketch of this approach follows this list), or
Install sagemaker in your Python Lambda bundle (tools like AWS SAM or CDK can make this process easier) and import sagemaker instead, so you can use Transformer and the other high-level Python APIs.
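For the first option, a minimal sketch of an S3-triggered Lambda handler is below; the model name, output bucket and instance settings are copied from the question and should be treated as placeholders for your own setup:
import time
import boto3

sagemaker_client = boto3.client('sagemaker')

def lambda_handler(event, context):
    # Bucket and key of the CSV that triggered the S3 event notification
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']

    sagemaker_client.create_transform_job(
        # Transform job names must be unique, so append a timestamp
        TransformJobName='y-test-batch-{}'.format(int(time.time())),
        ModelName='y-xgboost-21',  # assumed: the name of your deployed SageMaker model
        BatchStrategy='SingleRecord',
        TransformInput={
            'DataSource': {
                'S3DataSource': {
                    'S3DataType': 'S3Prefix',
                    'S3Uri': 's3://{}/{}'.format(bucket, key)
                }
            },
            'ContentType': 'text/csv',
            'SplitType': 'Line'
        },
        TransformOutput={
            'S3OutputPath': 's3://nooz/download/',
            'Accept': 'text/csv',
            'AssembleWith': 'Line'
        },
        TransformResources={
            'InstanceType': 'ml.m5.large',
            'InstanceCount': 1
        }
    )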

Related

Using Lambda for data processing - Sagemaker

I have created a Docker image whose entrypoint is processing.py. This script takes data from /opt/ml/processing/input and, after processing, writes it to the /opt/ml/processing/output folder.
To process the data I need to place the file from S3 into /opt/ml/processing/input and then copy the processed file from /opt/ml/processing/output back to S3.
The following script does this properly in SageMaker:
from sagemaker.processing import Processor, ProcessingInput, ProcessingOutput
import sagemaker

input_data = 's3://sagemaker-ap-south-1-057036842446/sagemaker/Data/Training/Churn_Modelling.csv'
output_dir = 's3://sagemaker-ap-south-1-057036842446/sagemaker/Outputs/'
image_uri = '057036842446.dkr.ecr.ap-south-1.amazonaws.com/aws-docker-repo:latest'
aws_role = sagemaker.get_execution_role()

processor = Processor(image_uri=image_uri, role=aws_role, instance_count=1, instance_type="ml.m5.xlarge")
processor.run(
    inputs=[
        ProcessingInput(
            source=input_data,
            destination='/opt/ml/processing/input'
        )
    ],
    outputs=[
        ProcessingOutput(
            source='/opt/ml/processing/output',
            destination=output_dir
        )
    ]
)
Could someone please guide me on how this can be executed from a Lambda function? Lambda does not recognize the sagemaker package, and there is also the challenge of placing the file before the script runs and picking up the processed files afterwards.
I am trying CodePipeline to automate this operation, but have had no success with that so far.
I am not sure how to get the input from S3 into the folders used internally by the script, i.e. how the processing step picks up data from /opt/ml/processing/input.
If you want to kick off a Processing Job from Lambda you can use boto3 to make the CreateProcessingJob API call:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_processing_job
I would suggest creating the job as you have been doing, using the SageMaker SDK. Once created, you can describe the job using the DescribeProcessingJob API call:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.describe_processing_job
You can then use the information from the DescribeProcessingJob API call output to fill out the CreateProcessingJob in Lambda.
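For example, a Lambda handler built on the low-level call might look roughly like the sketch below. The role ARN and job name are placeholders, the image and S3 paths are copied from the question, and the remaining fields mirror what DescribeProcessingJob reports for a job created with the SageMaker SDK:
import time
import boto3

sm_client = boto3.client('sagemaker')

def lambda_handler(event, context):
    sm_client.create_processing_job(
        # Processing job names must be unique, so append a timestamp
        ProcessingJobName='churn-processing-{}'.format(int(time.time())),
        RoleArn='arn:aws:iam::057036842446:role/YourSageMakerExecutionRole',  # placeholder role
        AppSpecification={
            'ImageUri': '057036842446.dkr.ecr.ap-south-1.amazonaws.com/aws-docker-repo:latest'
        },
        ProcessingResources={
            'ClusterConfig': {
                'InstanceCount': 1,
                'InstanceType': 'ml.m5.xlarge',
                'VolumeSizeInGB': 30
            }
        },
        ProcessingInputs=[{
            'InputName': 'input-1',
            'S3Input': {
                'S3Uri': 's3://sagemaker-ap-south-1-057036842446/sagemaker/Data/Training/Churn_Modelling.csv',
                'LocalPath': '/opt/ml/processing/input',
                'S3DataType': 'S3Prefix',
                'S3InputMode': 'File'
            }
        }],
        ProcessingOutputConfig={
            'Outputs': [{
                'OutputName': 'output-1',
                'S3Output': {
                    'S3Uri': 's3://sagemaker-ap-south-1-057036842446/sagemaker/Outputs/',
                    'LocalPath': '/opt/ml/processing/output',
                    'S3UploadMode': 'EndOfJob'
                }
            }]
        }
    )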

Run containers on AWS Lambda

I am trying to use the newly launched feature of running containers on AWS Lambda, but since this whole idea is quite new I am unable to find much support online.
Question: Is there a programmatic way to publish a new ECR image to Lambda with all the configurations, using the AWS SDK (preferably Python)?
Also, can a new version be published directly, instead of the two-step approach below?
import boto3

client = boto3.client('lambda')

def pushLatestAsVersion(functionArn, description="Lambda Environment"):
    functionDetails = client.get_function(FunctionName=functionArn)
    config = functionDetails['Configuration']
    # Publish the current $LATEST code as a new numbered version
    response = client.publish_version(
        FunctionName=functionArn,
        Description=description,
        CodeSha256=config['CodeSha256'],
        RevisionId=config['RevisionId']
    )
    print(response)

pushLatestAsVersion('arn:aws:lambda:ap-southeast-1:*************:function:my-serverless-fn')
I'm not sure about SDK, but please check SAM - https://aws.amazon.com/blogs/compute/using-container-image-support-for-aws-lambda-with-aws-sam/
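If you would rather stay in Python, a hedged sketch using boto3's UpdateFunctionCode call is below: for container-image functions it accepts an ECR image URI, and Publish=True publishes the result as a new version in the same call. The function name and image URI here are placeholders:
import boto3

lambda_client = boto3.client('lambda')

# Point the function at a new container image in ECR and publish the
# update as a new numbered version in a single call.
response = lambda_client.update_function_code(
    FunctionName='my-serverless-fn',                                              # placeholder
    ImageUri='123456789012.dkr.ecr.ap-southeast-1.amazonaws.com/my-repo:latest',  # placeholder
    Publish=True
)
print(response['Version'])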

import JSON file to google firestore using cloud functions

I'm new to GCP and Python. I have been given a task to import a JSON file into Google Firestore using Google Cloud Functions via Python.
Kindly assist please.
I was able to achieve this setup using the code below. Posting it for reference:
CLOUD FUNCTIONS CODE
REQUIREMENTS.TXT (dependencies)
google-api-python-client==1.7.11
google-auth==1.6.3
google-auth-httplib2==0.0.3
google-cloud-storage==1.19.1
google-cloud-firestore==1.6.2
MAIN.PY
from google.cloud import storage
from google.cloud import firestore
import json
client = storage.Client()``
def hello_gcs_generic(data, context):
print('Bucket: {}'.format(data['bucket']))
print('File: {}'.format(data['name']))
bucketValue = data['bucket']
filename = data['name']
print('bucketValue : ',bucketValue)
print('filename : ',filename)
testFile = client.get_bucket(data['bucket']).blob(data['name'])
dataValue = json.loads(testFile.download_as_string(client=None))
print(dataValue)
db = firestore.Client()
doc_ref = db.collection(u'collectionName').document(u'documentName')
doc_ref.set(dataValue)
Cloud Functions are serverless functions provided by Google. The beauty of a Cloud Function is that it is invoked whenever its trigger fires and destroys itself once execution is complete. Cloud Functions are single-purpose functions. Besides Python, you can also use Node.js and Go to write Cloud Functions. You can create a Cloud Function very easily by visiting the Cloud Functions quickstart (https://cloud.google.com/functions/docs/quickstart-console).
Your task is to import a JSON file into Google Firestore. This part you can do with the Firestore Python client like in any normal Python program, then add the code in the Cloud Functions console or upload it via gcloud. Still, the trigger part is missing here. As I mentioned, a Cloud Function is serverless: it executes when an event happens on the attached trigger. You haven't mentioned any trigger (i.e. when you want the function to run). Once you give information about the trigger I can give more insight on the resolution.
Ishank Aggarwal, you can add the above code snippet as part of a Cloud Function with the following steps:
Go to https://console.cloud.google.com/functions/
Create a function with a name and your requirements, choose Python as the runtime, and choose your GCS bucket as the trigger.
Once you create it, any change in your bucket will trigger the function and execute your code.

AWS Sagemaker using Batch Transform on AWS?

I have a SageMaker model trained and deployed, and I am looking to run a batch transform on multiple files.
I have a Lambda function configured to run when a new file is uploaded to S3.
Currently, I have only seen ways to use the InvokeEndpoint call from Lambda, i.e.:
import json
import boto3

runtime = boto3.client('runtime.sagemaker')
message = ["On Wed Sep PDT Brent Welch said Hacksaw said W"]
response = runtime.invoke_endpoint(EndpointName="sagemaker-scikit-learn-2020-02-05-13-44-45-011",
                                   ContentType='text/plain',
                                   Body=message)
json.loads(response['Body'].read())
However, I have multiple files that need to be processed by the SageMaker model.
I have code to create a transformer and run a batch transform over multiple files:
import sagemaker

# Attach to the finished training job and build a transformer from it
training_job = sagemaker.estimator.Estimator.attach("{model_name}")
transformer = training_job.transformer(instance_count=1,
                                       instance_type='ml.m4.xlarge',
                                       strategy='MultiRecord',
                                       assemble_with='Line')
# batch_input_s3 is the S3 prefix containing the files to transform
transformer.transform(batch_input_s3)
print('Waiting for transform job: ' + transformer.latest_transform_job.job_name)
transformer.wait()
However, I'm unable to use this transformer code in a Lambda function since it requires the sagemaker library, which pushes the deployment package over the 50 MB zip file size limit.
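One way around the size limit is to drop the sagemaker library and reproduce what Estimator.attach(...).transformer(...) does with plain boto3, which is already available in the Lambda runtime. A rough sketch is below; the role ARN is a placeholder, and for script-mode containers (such as the scikit-learn one) the inference image and required environment variables can differ from the training image, so treat this as a starting point rather than a drop-in replacement:
import time
import boto3

sm = boto3.client('sagemaker')

def start_batch_transform(training_job_name, batch_input_s3, batch_output_s3, role_arn):
    # Recreate a model from the finished training job (roughly what
    # Estimator.attach() + .transformer() do behind the scenes).
    job = sm.describe_training_job(TrainingJobName=training_job_name)
    model_name = '{}-model-{}'.format(training_job_name, int(time.time()))
    sm.create_model(
        ModelName=model_name,
        ExecutionRoleArn=role_arn,  # placeholder: your SageMaker execution role
        PrimaryContainer={
            'Image': job['AlgorithmSpecification']['TrainingImage'],
            'ModelDataUrl': job['ModelArtifacts']['S3ModelArtifacts']
        }
    )
    # Launch the batch transform job against that model
    sm.create_transform_job(
        TransformJobName='batch-{}'.format(int(time.time())),
        ModelName=model_name,
        BatchStrategy='MultiRecord',
        TransformInput={
            'DataSource': {'S3DataSource': {'S3DataType': 'S3Prefix', 'S3Uri': batch_input_s3}},
            'SplitType': 'Line'
        },
        TransformOutput={'S3OutputPath': batch_output_s3, 'AssembleWith': 'Line'},
        TransformResources={'InstanceType': 'ml.m4.xlarge', 'InstanceCount': 1}
    )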

I get this error -->> module 'sagemaker' has no attribute 'describe_training_job'

I am using AWS, but I get the following error in this code:
-->> module 'sagemaker' has no attribute 'describe_training_job'
This is the code:
training_info = sagemaker.describe_training_job(TrainingJobName=job_name)
status = training_info['TrainingJobStatus']
print("Training job ended with status: " + status)
module 'sagemaker' has no attribute 'describe_training_job'
This error message can also occur if the method does not exist because you are running an older version of boto3 than the one that introduced the method you are trying to call. For example, I was running SageMaker Studio's public preview and received the following message.
-->> module 'sagemaker' has no attribute 'create_auto_ml_job'
The output of the following did not list create_auto_ml_job as an available option.
sagemaker = boto3.client('sagemaker')
dir(sagemaker)
I found that boto3 was running an older version. Updating boto3 resolved my problem.
pip show boto3
pip install boto3 --upgrade
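To confirm which version the runtime is actually importing before and after the upgrade, a quick check is:
import boto3
print(boto3.__version__)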
Yes, the high-level sagemaker Python SDK has no attribute or method called describe_training_job.
This is actually a boto3 method. boto3 is the lower-level Python SDK covering all AWS services, and it has a SageMaker client. The snippet below illustrates what you want to achieve:
import boto3
boto3_sm = boto3.client('sagemaker')
training_info = boto3_sm.describe_training_job(TrainingJobName=job_name)
Note that model metrics collected from the logs and sent to CloudWatch (these are typically performance metrics in the built-in algorithms, and can be any metrics when you write custom code, since you extract them with a regexp) are not available via the describe_training_job call. That being said, you can get them via the (beta) Search feature:
import boto3

client = boto3.client('sagemaker')
response = client.search(
    Resource='TrainingJob',
    SearchExpression={
        'Filters': [
            {
                'Name': 'TrainingJobName',
                'Operator': 'Equals',
                'Value': '<your training job name here>'
            }
        ]
    }
)
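The metrics can then be read out of the search results; assuming the documented response shape, each result wraps a TrainingJob document whose FinalMetricDataList holds the metrics scraped from the training logs:
# Print each final metric reported for the matching training job(s)
for result in response['Results']:
    for metric in result['TrainingJob'].get('FinalMetricDataList', []):
        print(metric['MetricName'], metric['Value'])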