Is it possible to trigger Sagemaker notebook from AWS Lambda function? - amazon-web-services

I am trying to trigger a SageMaker notebook or a SageMaker Studio notebook from AWS Lambda when data becomes available in an S3 bucket. I want to know if this is possible and, if yes, how?
All I want is that once data is uploaded to S3, the Lambda function should be able to spin up the SageMaker notebook with a standard CPU cluster.

Here is a Jupyter plug-in that you can use to do this. Please note that it is not managed by AWS; it is experimental software and should be treated as such.
https://github.com/aws-samples/sagemaker-run-notebook
Using this extension, you can run your notebook based on an event.
I work at AWS and my opinions are my own.
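If you would rather not take a dependency on that extension, the same pattern can be wired up by hand: an S3 event triggers a Lambda function, which starts a SageMaker job via boto3. A minimal sketch of that idea (the image URI, role ARN, and entrypoint script are placeholders, and this uses a generic Processing job rather than the extension's own API):

```python
import time
import boto3

sagemaker = boto3.client("sagemaker")

# Placeholder values -- replace with your own container image and IAM role.
IMAGE_URI = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-notebook-runner:latest"
ROLE_ARN = "arn:aws:iam::123456789012:role/MySageMakerExecutionRole"


def handler(event, context):
    # S3 put event: read the bucket/key that triggered this invocation.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    job_name = f"notebook-run-{int(time.time())}"
    sagemaker.create_processing_job(
        ProcessingJobName=job_name,
        RoleArn=ROLE_ARN,
        AppSpecification={
            "ImageUri": IMAGE_URI,
            # The container is assumed to run the notebook (e.g. via papermill).
            "ContainerEntrypoint": ["python3", "/opt/program/run_notebook.py"],
        },
        ProcessingResources={
            "ClusterConfig": {
                "InstanceCount": 1,
                "InstanceType": "ml.m5.xlarge",  # standard CPU instance
                "VolumeSizeInGB": 30,
            }
        },
        Environment={"INPUT_S3_URI": f"s3://{bucket}/{key}"},
    )
    return {"started": job_name}
```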

Related

How can we orchestrate and automate data movement and data transformation in AWS sagemaker pipeline

I'm migrating our ML notebooks from Azure Databricks to an AWS environment using SageMaker and Step Functions. I have separate notebooks for data processing, feature engineering, and ML algorithms, which I want to run in sequence, each one starting after the previous notebook completes. Can you point me to any resource that shows how to execute SageMaker notebooks in a sequence using AWS Step Functions?
For this type of architecture you need to involve some other AWS services as well. A helpful combination is EventBridge (scheduled rules) to invoke a Lambda function, which then calls SageMaker to execute your notebooks.
A new feature allows you to operationalize your Amazon SageMaker Studio notebooks as scheduled notebook jobs. Unfortunately, there is no way yet to tie them together into a pipeline.
The other alternative would be to convert your notebooks to processing and training jobs, and use something like AWS Step Functions or SageMaker Pipelines to run them as a pipeline.
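As a rough illustration of that second option, here is a minimal SageMaker Pipelines sketch chaining two Processing steps in sequence (the image URI, role ARN, and script names are placeholders; the notebooks would first be converted to scripts):

```python
from sagemaker.processing import ScriptProcessor
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep

role = "arn:aws:iam::123456789012:role/MySageMakerExecutionRole"  # placeholder

# One processor reused for both steps; any CPU-friendly image works.
processor = ScriptProcessor(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-processing:latest",
    command=["python3"],
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# Data processing runs first; feature engineering starts only after it succeeds.
step_process = ProcessingStep(
    name="DataProcessing",
    processor=processor,
    code="data_processing.py",      # converted from the first notebook
)
step_features = ProcessingStep(
    name="FeatureEngineering",
    processor=processor,
    code="feature_engineering.py",  # converted from the second notebook
    depends_on=[step_process],      # enforce sequential execution
)

pipeline = Pipeline(name="NotebookSequencePipeline",
                    steps=[step_process, step_features])
pipeline.upsert(role_arn=role)
pipeline.start()
```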

How to run AWS Sagemaker Studio job based on pre defined schedule

Currently I am developing a model in AWS SageMaker Studio. In SageMaker there are multiple options for running a model, like a notebook instance, SageMaker Studio, etc. To schedule a task on a notebook instance, it is known that we need to use AWS Lambda; but I can't find any documentation on how to run a scheduled job in AWS SageMaker Studio.
I need suggestions on this. I know this is not a good question by Stack Overflow standards, since I am not showing any code, but the problem itself is fairly new, as is the solution, AWS SageMaker Studio.
A new feature allows you to operationalize your Amazon SageMaker Studio notebooks as scheduled notebook jobs.
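If you instead go the Lambda route mentioned in the question, an EventBridge (CloudWatch Events) scheduled rule can invoke the function on a fixed cadence. A minimal boto3 sketch (the rule name, schedule, and Lambda ARN are placeholders):

```python
import boto3

events = boto3.client("events")
lambda_client = boto3.client("lambda")

LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:run-studio-job"  # placeholder

# Create (or update) a rule that fires once a day.
rule = events.put_rule(
    Name="daily-studio-job",
    ScheduleExpression="rate(1 day)",
    State="ENABLED",
)

# Allow EventBridge to invoke the Lambda function.
lambda_client.add_permission(
    FunctionName=LAMBDA_ARN,
    StatementId="allow-eventbridge-daily-studio-job",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn=rule["RuleArn"],
)

# Point the rule at the Lambda function.
events.put_targets(
    Rule="daily-studio-job",
    Targets=[{"Id": "run-studio-job", "Arn": LAMBDA_ARN}],
)
```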

Use AWS Lambda to execute a jupyter notebook on AWS Sagemaker

I made a classifier in Python that uses a lot of libraries. I have uploaded the model to Amazon S3 as a pickle (my_model.pkl). Ideally, every time someone uploads a file to a specific S3 bucket, it should trigger an AWS Lambda function that would load the classifier, return predictions, and save a few files to an Amazon S3 bucket.
I want to know if it is possible to use a Lambda function to execute a Jupyter notebook in AWS SageMaker. This way I would not have to worry about the dependencies, and it would generally make the classification more straightforward.
So, is there a way to use an AWS Lambda to execute a Jupyter Notebook?
Scheduling notebook execution is a bit of a SageMaker anti-pattern, because (1) you would need to manage data I/O (training set, trained model) yourself, (2) you would need to manage metadata tracking yourself, (3) you cannot run on distributed hardware, and (4) you cannot use Spot instances. Instead, it is recommended for scheduled tasks to leverage the various SageMaker long-running, background job APIs: SageMaker Training, SageMaker Processing, or SageMaker Batch Transform (in the case of batch inference).
That being said, if you still want to schedule a notebook to run, you can do it in a variety of ways:
In the SageMaker CI/CD re:Invent 2018 video, notebooks are launched via CloudFormation templates, and their execution is automated via a SageMaker lifecycle configuration.
AWS released this blog post to document how to launch notebooks from within Processing jobs.
But again, my recommendation for scheduled tasks would be to remove them from Jupyter, turn them into scripts, and run them in SageMaker Training.
No matter your choice, all those tasks can be launched as API calls from within a Lambda function, as long as the function's role has the appropriate permissions.
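For example, a Lambda function kicking off a SageMaker Training job (the recommended route above) is a single boto3 call; the training image, role ARN, and S3 paths below are placeholders:

```python
import time
import boto3

sagemaker = boto3.client("sagemaker")

# Placeholder values -- substitute your own training image, role, and buckets.
IMAGE_URI = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training:latest"
ROLE_ARN = "arn:aws:iam::123456789012:role/MySageMakerExecutionRole"


def handler(event, context):
    job_name = f"scheduled-training-{int(time.time())}"
    sagemaker.create_training_job(
        TrainingJobName=job_name,
        AlgorithmSpecification={
            "TrainingImage": IMAGE_URI,
            "TrainingInputMode": "File",
        },
        RoleArn=ROLE_ARN,
        InputDataConfig=[{
            "ChannelName": "training",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://my-bucket/training-data/",
                    "S3DataDistributionType": "FullyReplicated",
                }
            },
        }],
        OutputDataConfig={"S3OutputPath": "s3://my-bucket/model-artifacts/"},
        ResourceConfig={
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,
        },
        StoppingCondition={"MaxRuntimeInSeconds": 3600},
    )
    return {"started": job_name}
```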
I agree with Olivier. Using SageMaker for notebook execution might not be the right approach here.
Papermill is the framework for running Jupyter notebooks in this fashion.
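For reference, Papermill's core call is just this (the notebook file names and parameters are illustrative):

```python
import papermill as pm

# Execute the notebook and write a fully-run copy (with cell outputs) next to it.
pm.execute_notebook(
    "classifier.ipynb",             # input notebook (illustrative name)
    "classifier-output.ipynb",      # executed copy with outputs
    parameters={"model_key": "my_model.pkl"},  # injected into a tagged 'parameters' cell
)
```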
You can consider trying Clouderizer. It allows you to deploy your Jupyter notebook directly as a serverless cloud function and uses Papermill behind the scenes.
Disclaimer: I work for Clouderizer.
It is totally possible, and not an anti-pattern at all; it really depends on your use case. AWS actually published a great article describing it, which includes a Lambda function.

What's the most efficient way to export files from EC2 to S3 on timed intervals?

I'm working on a problem at the moment where I want to export a file from an EC2 instance running a Windows AMI to an S3 bucket at four-hour intervals. Currently, the architecture I'm thinking of is as follows.
1. CloudWatch Events rule using scheduled trigger
2. Rule triggers Lambda function to run
3. Lambda function would use some form of the AWS CLI on the Windows EC2 instance to extract (sync, cp, etc.) the file
4. File is placed in the S3 bucket
Does anyone see a path that's more efficient than this one? I want to ensure that I'm handling this in the most straightforward manner. Thanks in advance for any input!
It is quite difficult to have external code (eg an AWS Lambda function) cause something to execute on a Windows computer. You could use Systems Manager Run Command, but that's a rather complex solution.
It would be much simpler to have the Windows computer push the files to Amazon S3:
1. Create a scheduled task in Windows
2. Use aws s3 cp or aws s3 sync to copy the files to Amazon S3 (an SDK-based sketch follows below)
3. Done!
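If you prefer the SDK over the CLI inside that scheduled task, the script it runs can be as small as this (the file path, bucket, and key are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Placeholders -- point these at the file the instance produces and your bucket.
LOCAL_FILE = r"C:\exports\report.csv"
BUCKET = "my-export-bucket"
KEY = "reports/report.csv"

# Upload the file; the instance profile (or configured credentials) must allow s3:PutObject.
s3.upload_file(LOCAL_FILE, BUCKET, KEY)
```

Register the script with Windows Task Scheduler on a four-hour trigger and you get the same behaviour without involving Lambda at all.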
Your solution seems solid. Alternatively, you may want to write a daemon-like service (background process) that runs on each EC2 instance and transfers the data from that instance to S3. What I like about your solution is that you can easily control the scheduling centrally. For my distributed alternative, the processes could read from a central config, but that seems more complicated than the CloudWatch/Lambda solution.
For the EC2 process solution, this may be useful: How to mount an Amazon S3 bucket as a Windows drive. However, it should be easy (and more scalable) to just use the AWS SDK to talk to S3 instead.

Export files from S3 to another cloud storage using Amazon Lambda?

I want to add an export feature to transfer some data from S3 to another cloud storage selected by the user.
I am already doing this with a simple node.js server on an Amazon t2.micro instance that gets the file from S3 and pipes it into a POST request to the desired cloud storage.
The problem with this solution is scalability and the network saturation of my Amazon infrastructure.
I recently discovered AWS Lambda and thought it would be the perfect solution for this feature, but then I saw that a function can't run for more than 300 seconds, and some of my files may take longer than that.
I know there are some services like mover.io which handle that, but they don't support some of the cloud storage I need to export to.
What do you suggest?
Thank you.
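For reference, the pipe-through approach described above looks roughly like this in Python (the bucket, key, and destination URL are placeholders; this only sketches the existing approach and does not address the Lambda time limit):

```python
import boto3
import requests

s3 = boto3.client("s3")

# Placeholders -- the object to export and the target cloud storage's upload endpoint.
BUCKET = "my-source-bucket"
KEY = "exports/archive.zip"
DESTINATION_URL = "https://other-cloud.example.com/upload"

# Stream the object body straight into the outgoing POST request,
# so the whole file never has to fit in memory.
body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"]
response = requests.post(DESTINATION_URL, data=body)
response.raise_for_status()
```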