AWS SageMaker hosting multiple models on the same machine (ML compute instance) - amazon-web-services

I am able to host the models developed in SageMaker by using the deploy functionality. Currently, I see that the different models I have developed need to be deployed on different ML compute instances.
Is there a way to deploy all models on the same instance? Using separate instances seems to be a very expensive option. If it is possible to deploy multiple models on the same instance, will that create different endpoints for the models?

SageMaker is designed to solve deployment problems at scale, where you want to handle thousands of model invocations per second. For such use cases you want multiple workers for the same model on each instance, and often multiple instances for the same model behind a load balancer and an auto scaling group, to allow scaling up and down as needed.
If you don't need that kind of scale, and even a single instance per model is not economical for the requests per second you need to handle, you can take the models that were trained in SageMaker and host them yourself behind a serving framework such as MXNet Model Server (https://github.com/awslabs/mxnet-model-server ) or TensorFlow Serving (https://www.tensorflow.org/serving/ ).
Please also note that you have control over the instance type used for hosting, and you can choose a smaller instance for smaller loads. Here is a list of the instance types you can choose from: https://aws.amazon.com/sagemaker/pricing/instance-types/
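As a rough sketch of that last point (assuming the SageMaker Python SDK and an already trained estimator object named estimator; the endpoint name is a placeholder), deploying to a single smaller instance looks roughly like this:

# Sketch: deploy a trained estimator to one small hosting instance.
# "estimator" and the endpoint name below are placeholders.
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type='ml.t2.medium',   # a smaller type for light traffic
    endpoint_name='my-small-endpoint')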

I believe this is a new feature introduced in AWS SageMaker; please refer to the links below, which do exactly this.
Yes, in AWS SageMaker you can now deploy multiple models on the same ML instance.
In the link below,
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/
you can find these examples:
multi_model_bring_your_own
multi_model_sklearn_home_value
multi_model_xgboost_home_value
Another link that explains the multi-model XGBoost example in detail:
https://aws.amazon.com/blogs/machine-learning/save-on-inference-costs-by-using-amazon-sagemaker-multi-model-endpoints/
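For a rough idea of what the setup looks like in code (a sketch only; the container image URI, S3 prefix, role ARN, and names are placeholders, and the container must be multi-model capable), a multi-model endpoint can be created with boto3 along these lines:

import boto3

sm_client = boto3.client('sagemaker')

# Placeholder values -- replace with your own image, S3 prefix, and role.
container = {
    'Image': '<mms-capable-container-image-uri>',
    'ModelDataUrl': 's3://my-bucket/models/',  # prefix holding the model .tar.gz files
    'Mode': 'MultiModel'                       # key difference vs. a single-model endpoint
}

sm_client.create_model(
    ModelName='my-multi-model',
    ExecutionRoleArn='<execution-role-arn>',
    PrimaryContainer=container)

sm_client.create_endpoint_config(
    EndpointConfigName='my-multi-model-config',
    ProductionVariants=[{
        'VariantName': 'AllTraffic',
        'ModelName': 'my-multi-model',
        'InstanceType': 'ml.m5.large',
        'InitialInstanceCount': 1
    }])

sm_client.create_endpoint(
    EndpointName='my-multi-model-endpoint',
    EndpointConfigName='my-multi-model-config')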
Hope this helps anyone looking to solve this issue in the future.

Related

GCP components to orchestrate crons running in GCE (Google Workflows?)

I need to run a data transformation pipeline that is composed of several scripts in distinct projects (= Python repos).
I am thinking of using Compute Engine to run these scripts in VMs when needed, as I can manage the resources required.
I need to be able to orchestrate these scripts, in the sense that I want to run some steps sequentially and some asynchronously.
I see that GCP provides a Workflows component which seems to suit this case.
I am thinking of creating a specific project to orchestrate the executions of scripts.
However I cannot see how I can trigger the execution of my scripts which will not be in the same repo as the orchestrator project. From what I understand of GCE, VMs are only created when scripts are executed and provide no persistent HTTP endpoints to be called to trigger the execution from elsewhere.
To illustrate, let's say I have two projects, step_1 and step_2, which contain separate steps of my data transformation pipeline.
I would also have a project, orchestrator, whose only purpose is to trigger step_1 and step_2 sequentially in VMs with GCE. This project would not have access to the code repos of the two former projects.
What would be the best practice in this case? Should I use components other than GCE and Workflows for this, or is there a way to trigger scripts in GCE from an independent orchestration project?
One possible solution would be to not use GCE (Google Compute Engine) but instead create Docker containers that contain your task steps. These would then be registered with Cloud Run. Cloud Run spins up docker containers on demand and charges you only for the time you spend processing a request. When the request ends, you are no longer charged, and hence you are optimally consuming resources. Various events can cause a request in Cloud Run, but the most common is a REST call.

With this in mind, now assume that your Python code is packaged in a container which is triggered by a REST server (e.g. Flask). Effectively you have created "microservices". These services can then be orchestrated by Cloud Workflows. The invocation of these microservices is through REST endpoints, which can be Internet addresses with authorization also present. This would allow the microservices (tasks/steps) to be located in separate GCP projects, and the orchestrator would see them as distinct callable endpoints.
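As a minimal sketch of one such "step" microservice (the route, step name, and transformation logic below are placeholders), the container could wrap the Python step in a small Flask app that Cloud Workflows then calls over HTTP:

# Minimal sketch of one pipeline step packaged as a Cloud Run service.
# The route name and transformation logic are placeholders.
import os
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/run', methods=['POST'])
def run_step():
    params = request.get_json(silent=True) or {}
    # ... perform this step's data transformation here ...
    return jsonify({'status': 'done', 'step': 'step_1', 'params': params})

if __name__ == '__main__':
    # Cloud Run provides the port via the PORT environment variable.
    app.run(host='0.0.0.0', port=int(os.environ.get('PORT', 8080)))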
Other potential solutions to look at would be GKE (Kubernetes) and Cloud Composer (Apache Airflow).
If you DO wish to stay with Compute Engine, you can still do that using a Shared VPC. A Shared VPC would allow distinct projects to have network connectivity with each other, and you could use Private Catalog to have the GCE instances advertise themselves to each other. You could then have a GCE instance do the choreography or, again, orchestrate through Cloud Workflows. We would have to check that Cloud Workflows supports parallel items ... I do not believe that, as of the time of this post, it does.
This is a common request: organizing automation into its own project. You can set up a service account that spans multiple projects.
See a tutorial here: https://gtseres.medium.com/using-service-accounts-across-projects-in-gcp-cf9473fef8f0
On top of that, you can also consider having Workflows in both the orchestrator and the sub-level projects. This way the orchestrator Workflow can call another Workflow, so each job can easily be run and encapsulated under the project that holds the code + workflow body, with only the triggering coming from the other project.
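A rough sketch of triggering a Workflow that lives in another project from Python, assuming the google-cloud-workflows client library (the project, location, workflow name, and argument payload are placeholders, and the calling service account must be granted the appropriate role in the target project):

# Sketch: start an execution of a Workflow in another project.
# The project, location, workflow name, and argument are placeholders.
from google.cloud.workflows import executions_v1

client = executions_v1.ExecutionsClient()
parent = client.workflow_path('target-project', 'us-central1', 'step-1-workflow')

execution = client.create_execution(
    parent=parent,
    execution=executions_v1.Execution(argument='{"input_table": "raw_data"}'),
)
print('Started execution:', execution.name)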

Amazon SageMaker multiple-models

I am interested in the Amazon SageMaker multi-model option running on one endpoint. How does it look in practice? When I send requests to different models, can SageMaker handle them simultaneously?
Thank you.
You need to specify which model to invoke in the request. The model name is the one specified when creating the SageMaker model.
Invoke a Multi-Model Endpoint
import boto3

runtime_sm_client = boto3.client('sagemaker-runtime')

response = runtime_sm_client.invoke_endpoint(
    EndpointName='my-endpoint',
    ContentType='text/csv',
    TargetModel='new_york.tar.gz',  # which model artifact to invoke
    Body=body)
Save on inference costs by using Amazon SageMaker multi-model endpoints
There are multiple limitations. Currently the SageMaker Multi Model Server (MMS) cannot use GPUs.
Host Multiple Models with Multi-Model Endpoints
Multi-model endpoints are not supported on GPU instance types.
The SageMaker Python SDK is not clear about which framework models support multi-model server deployment, or how. For instance, with Use TensorFlow with the SageMaker Python SDK, the SageMaker endpoint docker image is automatically picked by SageMaker from the Available Deep Learning Containers Images. However, it is not clear which framework images are MMS-ready.
Deploy Multiple ML Models on a Single Endpoint Using Multi-model Endpoints on Amazon SageMaker explains building an AWS XGBoost image with MMS. Apparently the docker image needs to be built with MMS specified as the front end; if an image is not built that way, MMS may not be available.
Such information is missing from the AWS documentation, so if an issue is encountered you may need AWS support to identify the cause. Since the SageMaker team keeps changing the images, the MMS implementation, etc., issues can be expected.
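For reference, the bring-your-own-container examples start MMS from a small Python entrypoint through the SageMaker Inference Toolkit; a minimal sketch (the handler service string is a placeholder for your own handler module) looks roughly like this:

# Sketch of a container entrypoint that launches the Multi Model Server
# via the SageMaker Inference Toolkit. The handler service string is a
# placeholder for your own handler implementation.
from sagemaker_inference import model_server

if __name__ == '__main__':
    model_server.start_model_server(handler_service='my_handler_service')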
References
SageMaker Inference Toolkit
Multi Model Server
Deploy Multiple ML Models on a Single Endpoint Using Multi-model Endpoints on Amazon SageMaker

Is it possible to use EC2 and Lambda for Machine Learning Models

I want to use AWS Lambda for my machine learning model web services, but since Lambda has size limits and several of my models are over 1 or 2 GB, I wondered if there is a way to use S3 to store the models and load them inside Lambda, or do I need to use EC2?
Thanks in advance.
You can side-load the app. The Lambda can be a small bootstrap script that downloads your app and models from S3 and unzips them. This is a popular pattern in serverless frameworks. But remember that it will be slow, since the models and other data have to be downloaded into the Lambda workspace. You pay for this during a cold start of the Lambda, so you will need to keep it warm in a production environment.
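A minimal sketch of that pattern (the bucket, key, and model-loading logic are placeholders): fetch the model from S3 into /tmp on cold start and cache it for later invocations of the same container.

import os
import boto3

s3 = boto3.client('s3')
MODEL_PATH = '/tmp/model.bin'
_model = None  # cached across warm invocations of the same container

def load_model(path):
    # Placeholder: replace with your framework's loader (joblib, torch, etc.).
    return path

def _get_model():
    global _model
    if _model is None:
        if not os.path.exists(MODEL_PATH):
            # Bucket and key are placeholders.
            s3.download_file('my-model-bucket', 'models/model.bin', MODEL_PATH)
        _model = load_model(MODEL_PATH)
    return _model

def handler(event, context):
    model = _get_model()
    # Placeholder: run inference with the loaded model here.
    return {'model': str(model), 'ok': True}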

AWS CodeDeploy Instance specific configuration

I'm not a native speaker, so first I'm sorry for my bad English.
What is the best practice for instance specific configuration in AWS CodeDeploy?
I want to deploy a server to multiple instances, and I also want to register some cron jobs (for example, a daily report) on just one of those instances. I'm using AWS CodeDeploy, and it looks like there's no simple option to do such a thing.
I have some solutions, but they are not very satisfying. One is to use separate deployment groups, which means I have to manage some additional revisions. The other is to add tags to the EC2 instances and branch on those tags, which feels too tricky. Is there any other recommended way to do it?
There is no best practice for instance-specific configuration in CodeDeploy for instances in the same deployment group. I recommend creating a separate application entirely, running on a different instance, if you want to run jobs like a daily report, so that the job does not interfere with the normal functioning of your application (for example, if the job consumes all the CPU, then your server on that same box will be impacted).
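If you do go with the tag-based approach mentioned in the question, one way to branch is a deployment hook script that checks the instance's own tags and only installs the cron job when a given tag is present. A rough sketch (the tag key/value and the cron entry are placeholders):

# Sketch: a CodeDeploy hook script that installs a cron job only on
# instances carrying a specific tag. Tag key/value and cron entry are placeholders.
import subprocess
import urllib.request
import boto3

def instance_id():
    # EC2 instance metadata endpoint (IMDSv1 shown for brevity).
    with urllib.request.urlopen(
            'http://169.254.169.254/latest/meta-data/instance-id', timeout=2) as r:
        return r.read().decode()

def has_tag(key, value):
    ec2 = boto3.client('ec2')
    tags = ec2.describe_tags(Filters=[
        {'Name': 'resource-id', 'Values': [instance_id()]},
        {'Name': 'key', 'Values': [key]},
    ])['Tags']
    return any(t['Value'] == value for t in tags)

if __name__ == '__main__':
    if has_tag('scheduler', 'true'):
        # Install the daily-report cron entry only on the tagged instance.
        subprocess.run('echo "0 6 * * * /opt/app/daily_report.sh" | crontab -',
                       shell=True, check=True)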

How to deploy pipelines on user demand in AWS app

I am currently looking at the possibility of using AWS as a way to scale up the infrastructure. I am looking for the best way to set up an application that runs different computational pipelines with data provided by the user. I have already seen the possibility of creating an on-demand cluster using containers to run the analyses that are currently available (already predefined and ready in containers).
I am looking for advice on which Amazon services are typically used to launch computations (or containers) once they have been selected in a web app by a user and stored in the backend.
Thanks