Is there any possible ways to do model monitoring in aws without an endpoint? Kindly share any good notebook regarding this if you knew
Aws not gives any explainable example regarding Batch Model monitoring.
Amazon SageMaker Model Monitor monitors the quality of Amazon SageMaker machine learning models in production.
You can set up continuous monitoring with a real-time endpoint (or a batch transform job that runs regularly), or on-schedule monitoring for asynchronous batch transform jobs.
Here are some example notebooks:
(1) SageMaker Model Monitor with Batch Transform - Data Quality Monitoring On-Schedule (link)
(2) SageMaker Data Quality Model Monitor for Batch Transform with SageMaker Pipelines On-demand (link)
Related
I have been planning to migrate my Batch processing from Spring Batch to AWS Batch. Can someone give me the reasons to Choose AWS Batch over Spring Batch?
Whilst both these things will play a role in orchestrating your batch workloads a key difference is that AWS Batch will also manage the infrastructure you need to run the jobs/pipeline. AWS Batch lets you to tailor the underlying cloud instances, or specifcy a broad array of instance types that will work for you. And it'll let you make trade-offs: you can task it with managing a bag of EC2 Spot instances for you (for example), and then ask it to optimize time-to-execution over price (or prefer price to speed).
(For full disclosure, I work for the engineering team that builds AWS Batch).
I believe both work at different levels .Spring batch provide framework that reduce the boiler plate code that you need in order to write a batch job.For eg. saving the state of job in Job repository that provide restartability.
On the contrary, AWS batch is an infrastructure framework that helps in managing infra and set some environment variable that help differentiate master node from slave node.
In my opinion both can work together to write a full fledged cost effective batch job at scale on AWS cloud.
Aws Batch is full blown SaaS solution for batch processing,
It has inbuilt
Queue with priority options
Runtime, which can be self managed and auto managed
Job repo, docker images for the job definitions
Monitoring , dashboards, integration with other AWS services like SNS and from SNS to where ever you want
On the other hand, batch is a framework which would still need some of your efforts to manage it all. like employing a queue, scaling is your headache, monitoring etc
My take is , if your company or app is on AWS , go for AWS batch , you will save months of time and get to scalability to a million jobs per day in no time . If you are on-perm or private go for spring batch with some research
I made a classifier in Python that uses a lot of libraries. I have uploaded the model to Amazon S3 as a pickle (my_model.pkl). Ideally, every time someone uploads a file to a specific S3 bucket, it should trigger an AWS Lambda that would load the classifier, return predictions and save a few files on an Amazon S3 bucket.
I want to know if it is possible to use a Lambda to execute a Jupyter Notebook in AWS SageMaker. This way I would not have to worry about the dependencies and would generally make the classification more straight forward.
So, is there a way to use an AWS Lambda to execute a Jupyter Notebook?
Scheduling notebook execution is a bit of a SageMaker anti-pattern, because (1) you would need to manage data I/O (training set, trained model) yourself, (2) you would need to manage metadata tracking yourself, (3) you cannot run on distributed hardware and (4) you cannot use Spot. Instead, it is recommended for scheduled task to leverage the various SageMaker long-running, background job APIs: SageMaker Training, SageMaker Processing or SageMaker Batch Transform (in the case of a batch inference).
That being said, if you still want to schedule a notebook to run, you can do it in a variety of ways:
in the SageMaker CICD Reinvent 2018 Video, Notebooks are launched as Cloudformation templates, and their execution is automated via a SageMaker lifecycle configuration.
AWS released this blog post to document how to launch Notebooks from within Processing jobs
But again, my recommendation for scheduled tasks would be to remove them from Jupyter, turn them into scripts and run them in SageMaker Training
No matter your choices, all those tasks can be launched as API calls from within a Lambda function, as long as the function role has appropriate permissions
I agree with Olivier. Using Sagemaker for Notebook execution might not be the right tool for the job.
Papermill is the framework to run Jupyter Notebooks in this fashion.
You can consider trying this. This allows you to deploy your Jupyter Notebook directly as serverless cloud function and uses Papermill behind the scene.
Disclaimer: I work for Clouderizer.
It totally possible, not an anti-pattern at all. It really depends on your use-case. AWs actually made a great article describing it, which includes a lambda
I have created a customer docker container using an Amazon tensorflow container as a starting point:
763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-training:1.15.2-gpu-py36-cu100-ubuntu18.04
inside the container I run a custom keras (with TF backend) training job from the docker SAGEMAKER_PROGRAM. I can access the training data ok (from an EFS mount) and can generate output into /opt/ml/model that gets synced back to S3. So input and output is good: what I am missing is real-time monitoring.
A Sagemaker training job emits system metrics like cpu and gpu loads which you can conveniently view in real-time on the Sagemaker training job console. But I cannot find a way to emit metrics about the progress of the training job. i.e. loss, accuracy etc from my python code.
Actually, ideally I would like to use Tensorboard but as Sagemaker doesn't expose the instance on the EC2 console I cannot see how I can find the IP address of the instance to connect to for Tensorboard.
So the fallback is try and emit relevant metrics from the training code so that we can monitor the job as it runs.
The basic question is how do I real-time monitor key metrics for my custom training job runnning in a container on Sagemaker training job:
- Is a tensorboard solution possible? If so how?
- If not how do I emit metrics from my python code and have them show up in the training job console or as cloudwatch metrics directly?
BTW: so far I have failed to be able to get sufficient credentials inside the training job container to access either s3 or cloudwatch.
If you're using customer images for training, you can specify a name and a regular expression for metrics you want to track for training.
byo_estimator = Estimator(image_name=image_name,
role='SageMakerRole', train_instance_count=1,
train_instance_type='ml.c4.xlarge',
sagemaker_session=sagemaker_session,
metric_definitions=[{'Name': 'test:msd', 'Regex': '#quality_metric: host=\S+, test msd <loss>=(\S+)'},
{'Name': 'test:ssd', 'Regex': '#quality_metric: host=\S+, test ssd <loss>=(\S+)'}])
I am trying to train a neural network (Tensorflow) on AWS. I have some AWS credits. From my understanding AWS SageMaker is the one best for the job. I managed to load the Jupyter Lab console on SageMaker and tried to find a GPU kernel since, I know it is the best for training neural networks. However, I could not find such kernel.
Would anyone be able to help in this regard.
Thanks & Best Regards
Michael
You train models on GPU in the SageMaker ecosystem via 2 different components:
You can instantiate a GPU-powered SageMaker Notebook Instance, for example p2.xlarge (NVIDIA K80) or p3.2xlarge (NVIDIA V100). This is convenient for interactive development - you have the GPU right under your notebook and can run code on the GPU interactively and monitor the GPU via nvidia-smi in a terminal tab - a great development experience. However when you develop directly from a GPU-powered machine, there are times when you may not use the GPU. For example when you write code or browse some documentation. All that time you pay for a GPU that sits idle. In that regard, it may not be the most cost-effective option for your use-case.
Another option is to use a SageMaker Training Job running on a GPU instance. This is a preferred option for training, because training metadata (data and model path, hyperparameters, cluster specification, etc) is persisted in the SageMaker metadata store, logs and metrics stored in Cloudwatch and the instance automatically shuts down itself at the end of training. Developing on a small CPU instance and launching training tasks using SageMaker Training API will help you make the most of your budget, while helping you retain metadata and artifacts of all your experiments. You can see here a well documented TensorFlow example
All Notebook GPU and CPU instance types: AWS Documentation.
I want to build some neural network models for NLP and recommendation applications. The framework I want to use is TensorFlow. I plan to train these models and make predictions on Amazon web services. The application will be most likely distributed computing.
I am wondering what are the pros and cons of SageMaker and EMR for TensorFlow applications?
They both have TensorFlow integrated.
In general terms, they serve different purposes.
EMR is when you need to process massive amounts of data and heavily rely on Spark, Hadoop, and MapReduce (EMR = Elastic MapReduce). Essentially, if your data is in large enough volume to make use of the efficiencies of Spark, Hadoop, Hive, HDFS, HBase and Pig stack then go with EMR.
EMR Pros:
Generally, low cost compared to EC2 instances
As the name suggests Elastic meaning you can provision what you need when you need it
Hive, Pig, and HBase out of the box
EMR Cons:
You need a very specific use case to truly benefit from all the offerings in EMR. Most don't take advantage of its entire offering
SageMaker is an attempt to make Machine Learning easier and distributed. The marketplace provides out of the box algos and models for quick use. It's a great service if you conform to the workflows it enforces. Meaning creating training jobs, deploying inference endpoints
SageMaker Pros:
Easy to get up and running with Notebooks
Rich marketplace to quickly try existing models
Many different example notebooks for popular algorithms
Predefined kernels that minimize configuration
Easy to deploy models
Allows you to distribute inference compute by deploying endpoints
SageMaker Cons:
Expensive!
Enforces a certain workflow making it hard to be fully custom
Expensive!
From AWS documentation:
Amazon EMR is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data. By using these frameworks and related open-source projects, such as Apache Hive and Apache Pig, you can process data for analytics purposes and business intelligence workloads. Additionally, you can use Amazon EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.
(...) Amazon SageMaker is a fully-managed platform that enables developers and data scientists to quickly and easily build, train, and deploy machine learning models at any scale. Amazon SageMaker removes all the barriers that typically slow down developers who want to use machine learning.
Conclussion:
If you want to deploy AI models just use AWS SageMaker