MLOps monitoring with QuickSight - amazon-web-services

We currently have 3 machine learning models in production in our team (2 classifiers and one time-series model). SageMaker Studio with SageMaker Model Monitor wasn't the right option for us because of our CI/CD architecture, so we now serve predictions from an ECS container that hosts our models.
We now want to apply proper model monitoring to these models. My idea is to store ground truth and prediction data in S3 and monitor it with QuickSight via Athena.
My question is:
Is this a good way of doing this? Can we compute the right metrics this way?

The long and the short of it is that no one can give you a complete answer, because this is a vast, industry-wide problem, and you should know that. You need to learn how model monitoring works in general to figure out how and what to implement for a given use case + desired performance metrics + distance metrics (drift) + tech stack.
You will have to work through and understand the code examples and article below, then reimplement and refactor them for your use case.
1. Code:
https://github.com/graviraja/MLOps-Basics/tree/main/week_9_monitoring
2. Article:
https://www.ravirajag.dev/blog/mlops-serverless
3. GitHub/SageMaker: Model monitoring with your own container:
https://github.com/aws-samples/sagemaker-model-monitor-bring-your-own-container
4. GitHub/SageMaker: Visualize model monitoring data:
https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker_model_monitor/visualization/SageMaker-Model-Monitor-Visualize.ipynb
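That said, to make the S3 + Athena + QuickSight idea from the question concrete, here is a minimal boto3 sketch. All bucket, database, and table names are hypothetical placeholders, and the accuracy query assumes a simple classifier; you would swap in the metrics that fit your models:

```python
import json
from datetime import date, datetime

import boto3

s3 = boto3.client("s3")
athena = boto3.client("athena")

# 1) Append each prediction (and, once known, its ground truth) as JSON Lines,
#    partitioned by date so Athena scans stay small.
record = {
    "model": "churn-classifier-v2",  # hypothetical model name
    "prediction": 1,
    "ground_truth": 0,
    "timestamp": datetime.utcnow().isoformat(),
}
key = f"monitoring/model=churn-classifier-v2/dt={date.today()}/preds.jsonl"
s3.put_object(
    Bucket="my-monitoring-bucket",  # hypothetical bucket
    Key=key,
    Body=json.dumps(record) + "\n",
)

# 2) Compute a metric per day in Athena; QuickSight can then use the Athena
#    table (or this query) as a data source for dashboards.
query = """
SELECT dt,
       AVG(CASE WHEN prediction = ground_truth THEN 1.0 ELSE 0.0 END) AS accuracy
FROM monitoring_db.predictions
GROUP BY dt
ORDER BY dt
"""
athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "monitoring_db"},
    ResultConfiguration={"OutputLocation": "s3://my-monitoring-bucket/athena-results/"},
)
```

On "can we apply the right metrics": accuracy here is just a stand-in. Anything you can express in SQL over the stored predictions (per-class precision/recall counts, prediction-distribution shifts as a drift proxy) works the same way.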

Related

GCP AutoML Vision - How to count the number of annotations each of my team members makes in GCP AutoML Vision Annotation Tool using the Web UI?

We are automating the process of our deep learning project. Images are automatically uploaded to a dataset in AutoML Vision (Object Detection) on the Google Cloud Platform. A couple of team members regularly annotate the uploaded images using the provided Annotation Tool in the Web UI. We need to measure our team members' productivity by counting the annotations each of them makes. I haven't found an efficient solution yet; I would appreciate it if you could share your ideas.
There is no feature to identify who annotated which images; however, one approach is to split the work between your team members and assign the labels that each one should annotate. Then you can simply count the number of annotations for each label. For instance, following this guide, you could give Baked Goods and Cheese to one collaborator and Salad and Seafood to another, and so on, so that you can check the totals in the UI. The label statistics can give you more detail on the annotations for each label (and hence for each team member); note that statistics are only available in the AutoML Vision Object Detection UI.
An automated approach, in case you are interested, is the Human Labeling Service; according to the documentation, it is currently only available by email because of Coronavirus (COVID-19) measures.
If the recommendations above don't fit your needs, you can always file a Feature Request asking for the desired functionality, adding the required details.
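If you want to count outside the UI, one rough option is to export the dataset to a CSV in Cloud Storage and count rows per label with a short script. This is only a sketch: the label column index below is an assumption, so verify it against your actual export format first.

```python
import csv
from collections import Counter

# With labels split between team members as suggested above, per-label
# counts map to per-member counts.
# ASSUMPTION: the label sits in the third column of the exported CSV.
LABEL_COLUMN = 2

counts = Counter()
with open("exported_annotations.csv", newline="") as f:
    for row in csv.reader(f):
        if len(row) > LABEL_COLUMN and row[LABEL_COLUMN]:
            counts[row[LABEL_COLUMN]] += 1

for label, n in counts.most_common():
    print(f"{label}: {n} annotations")
```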

How to increase performance of AWS Comprehend custom classification

I trained a custom classifier with just two tags in a CSV, feeding the model 1,000 texts per tag.
But when I run a job with my custom classification model, the job takes ~5 minutes (running) to analyze one new text. I searched for this issue in AWS, but I couldn't find any answer...
How can I speed up / optimize my job for analyzing new text with the model?
Thank you in advance
Prior to Nov 2019, Comprehend only supported asynchronous inference for Custom classification. Asynchronous inference is optimized for bulk processing.
Comprehend has since launched real-time inference for Custom classification to satisfy the real-time needs of our customers.
https://docs.aws.amazon.com/comprehend/latest/dg/custom-sync.html
Note that custom endpoints are charged per unit of time even when you're not actively using them. You can also look at the pricing document for details - https://aws.amazon.com/comprehend/pricing/
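For illustration, a minimal boto3 sketch of the real-time path (the model ARN and endpoint name are placeholders):

```python
import boto3

comprehend = boto3.client("comprehend")

# One-time setup: attach a real-time endpoint to the trained custom model.
# The inference units (IUs) determine throughput and drive the time-based
# cost mentioned above.
endpoint_arn = comprehend.create_endpoint(
    EndpointName="my-classifier-endpoint",
    ModelArn="arn:aws:comprehend:us-east-1:123456789012:document-classifier/my-classifier",
    DesiredInferenceUnits=1,
)["EndpointArn"]

# Endpoint creation is asynchronous; wait until describe_endpoint reports
# IN_SERVICE before calling it. Then each request returns in near real time:
response = comprehend.classify_document(
    Text="Text of the new document to classify.",
    EndpointArn=endpoint_arn,
)
print(response["Classes"])  # e.g. [{"Name": "tag1", "Score": 0.98}, ...]
```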

AWS Sagemaker - using cross validation instead of dedicated validation set?

When I train my model locally I use a 20% test set and then cross-validation. SageMaker seems to need a dedicated validation set (at least in the tutorials I've followed). Currently I have 20% test and 10% validation, leaving 70% for training - so I lose 10% of my training data compared to training locally, and there is some performance loss as a result.
I could just take my locally trained models and overwrite the SageMaker models stored in S3, but that seems like a bit of a workaround. Is there a way to use SageMaker without having a dedicated validation set?
Thanks
SageMaker training seems to assume a single training set, while in cross-validation you iterate over, for example, 5 different training sets, each one validated on a different hold-out set. So the SageMaker training service is not well suited to cross-validation out of the box. Of course, cross-validation is usually most useful with small (to be accurate, low-variance) data, so in those cases you can set the training infrastructure to local (so it doesn't take a lot of time) and iterate manually to achieve cross-validation functionality, as in the sketch below.
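A minimal sketch of that manual loop, assuming the SageMaker Python SDK, a scikit-learn training script, and local mode (which runs the training container via Docker on your machine); the file layout and names are hypothetical:

```python
import os

import numpy as np
from sklearn.model_selection import KFold
from sagemaker.sklearn import SKLearn

data = np.loadtxt("train.csv", delimiter=",")  # hypothetical local dataset

for fold, (train_idx, val_idx) in enumerate(
    KFold(n_splits=5, shuffle=True, random_state=0).split(data)
):
    # Write each fold to its own directory so it can be passed as a channel.
    os.makedirs(f"fold_{fold}/train", exist_ok=True)
    os.makedirs(f"fold_{fold}/val", exist_ok=True)
    np.savetxt(f"fold_{fold}/train/data.csv", data[train_idx], delimiter=",")
    np.savetxt(f"fold_{fold}/val/data.csv", data[val_idx], delimiter=",")

    estimator = SKLearn(
        entry_point="train.py",  # your training script (hypothetical)
        framework_version="1.2-1",
        instance_count=1,
        instance_type="local",   # local mode: no instance spin-up
        role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    )
    estimator.fit({
        "train": f"file://fold_{fold}/train",
        "validation": f"file://fold_{fold}/val",
    })
```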
Sorry, can you please elaborate on which tutorials you are referring to when you say "SageMaker seems like it needs a dedicated validation set (at least in the tutorials I've followed)"?
SageMaker training exposes the ability to separate datasets into "channels" so you can separate your dataset in whichever way you please.
See here for more info: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo-running-container.html#your-algorithms-training-algo-running-container-trainingdata
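For example, a hedged sketch of channels with the SageMaker Python SDK (the image URI, role, and bucket names are placeholders); nothing forces a validation channel for your own algorithm:

```python
from sagemaker.estimator import Estimator

# Each dict key passed to fit() becomes a channel that the training
# container sees under /opt/ml/input/data/<channel>.
estimator = Estimator(
    image_uri="<your-training-image>",                    # placeholder
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
)
estimator.fit({
    "train": "s3://my-bucket/train/",
    # "validation": "s3://my-bucket/validation/",  # optional channel
})
```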

AWS SageMaker - initializing 1K+ model endpoints?

Under the assumption that the model training itself is very fast, I'm wondering what the best practice is to spin up ~1K+ model endpoints as fast as possible.
Thanks for any hint
Christian
Assuming these are different models (not production variants for testing), you'll need one endpoint per model and thus one SageMaker instance per model. That's probably not the greatest option (cost, time to spin up instances, synchronous calls, API throttling, etc.). For now, I'd use another service to deploy, e.g. an ECS cluster.
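For a sense of scale, here is an illustrative boto3 sketch of the one-endpoint-per-model approach (all names are placeholders); three API calls and a dedicated instance per model is exactly where the cost, spin-up time, and throttling concerns come from:

```python
import boto3

sm = boto3.client("sagemaker")

for i in range(1000):
    name = f"model-{i}"
    sm.create_model(
        ModelName=name,
        PrimaryContainer={
            "Image": "<inference-image-uri>",  # placeholder
            "ModelDataUrl": f"s3://my-bucket/models/{name}/model.tar.gz",
        },
        ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerRole",
    )
    sm.create_endpoint_config(
        EndpointConfigName=name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": name,
            "InstanceType": "ml.t2.medium",
            "InitialInstanceCount": 1,  # one dedicated instance per model
        }],
    )
    sm.create_endpoint(EndpointName=name, EndpointConfigName=name)
```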
Could you please tell me a little more about your use case (business problem, framework, model size, etc)? You're not the first one to ask about this capability and your feedback would be very valuable in building the best solution.
Julien (AWS)

Planning an architecture in GCP

I want to plan an architecture based on the GCP cloud platform. Below are the subject areas I have to cover. Can someone please help me find the proper services to perform these operations?
Data ingestion (Batch, Real-time, Scheduler)
Data profiling
AI/ML based data processing
Analytical data processing
Elasticsearch
User interface
Batch and Real-time publish
Security
Logging/Audit
Monitoring
Code repository
If I am missing something that I should take care of, please add that as well.
GCP offers many products whose functionality can partially overlap. Which product to use depends on the specific use case, and you can find an overview about it here.
That being said, an overall summary of the services you asked about would be:
1. Data ingestion (Batch, Real-time, Scheduler)
That will depend on where your data comes from, but the most common options are Dataflow (both for batch and streaming) and Pub/Sub for streaming messages.
2. Data profiling
Dataprep (which actually runs on top of Dataflow) can be used for data profiling, here is an overview of how you can do it.
3. AI/ML based data processing
For this, you have several options depending on your needs. For developers with limited machine learning expertise there is AutoML, which allows you to quickly train and deploy models. For more experienced data scientists there is ML Engine, which allows training and prediction of custom models built with frameworks like TensorFlow or scikit-learn.
Additionally, there are some pre-trained models for things like video analysis, computer vision, speech to text, speech synthesis, natural language processing or translation.
Plus, it's even possible to perform some ML tasks in SQL inside GCP's data warehouse, BigQuery.
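A hedged sketch of that BigQuery ML path (dataset, table, and column names below are placeholders):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Train a model with plain SQL inside BigQuery.
client.query("""
    CREATE OR REPLACE MODEL my_dataset.churn_model
    OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
    SELECT age, tenure_months, monthly_spend, churned
    FROM my_dataset.customers
""").result()

# Prediction is a query as well.
rows = client.query("""
    SELECT *
    FROM ML.PREDICT(MODEL my_dataset.churn_model,
                    (SELECT age, tenure_months, monthly_spend
                     FROM my_dataset.new_customers))
""").result()
```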
4. Analytical data processing
Depending on your needs, you can use Dataproc, which is a managed Hadoop and Spark service, or Dataflow for stream and batch data processing.
BigQuery is also designed with analytical operations in mind.
5. Elasticsearch
There is no managed Elasticsearch service directly provided by GCP, but you can find several options in the marketplace, like an API service or a Kubernetes app for Google Kubernetes Engine.
6. User interface
If you are referring to a user interface for your own use, GCP’s console is what you’d be using. If you are referring to a UI for end-users, I’d suggest using App Engine.
If you are referring to a UI for data exploration, there is Datalab, which is essentially a managed notebook service, and Data Studio, where you can build plots of your data in real time.
7. Batch and Real-time publish
The publishing service in GCP, for both synchronous and asynchronous messages, is Pub/Sub.
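A minimal publishing sketch with the Pub/Sub client library (project and topic names are placeholders):

```python
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "my-topic")

# publish() is asynchronous and returns a future; result() blocks until the
# message is accepted and returns the server-assigned message ID.
future = publisher.publish(topic_path, data=b"event payload")
print(f"Published message {future.result()}")
```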
8. Security
Most security concerns in GCP are addressed here; security is a wide topic by itself and would probably warrant a separate question.
9. Logging/Audit
GCP uses Stackdriver for logging of most of its products, and provides many ways to process and analyze those logs.
10. Monitoring
Stackdriver also has monitoring features.
11. Code repository
For this there is Cloud Source Repositories, which integrates with GCP's automated build system and can also be easily synced with a GitHub repository.
12. Analytical data warehouse
You did not ask for this one, but I think it's an important part of a data analysis stack.
In the case of GCP, this would be BigQuery.