Hosting a scikit-learn SVM classifier on AWS SageMaker

I have a model running on my Jupyter notebook instance with a very basic SVM classifier:
# Text classifier - Algorithm: SVM
from sklearn import svm
from sklearn.metrics import accuracy_score

# fit the training dataset on the classifier
SVM = svm.SVC(C=1.0, kernel='linear', degree=3, gamma='auto', probability=True)
SVM.fit(Train_X_Tfidf, Train_Y)
# predict the labels on the validation dataset
predictions_SVM = SVM.predict(Test_X_Tfidf)
# use the accuracy_score function to get the accuracy
print("SVM Accuracy Score -> ", accuracy_score(predictions_SVM, Test_Y) * 100)
Use case: host the model on SageMaker and create an endpoint, then use the endpoint via Lambda for text classification.
I saw AWS has a few posts on creating an endpoint, e.g. https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-train-model.html, but the majority of the content is not applicable to a scikit-learn SVM.
Is there another approach I should be looking at?

If your model is small enough, you can create a Lambda function that loads the model and does predictions based on input passed in from the user.
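A minimal sketch of that approach, assuming the fitted TF-IDF vectorizer and SVM were saved together as a single joblib pipeline packaged in the Lambda deployment artifact, and that the function sits behind an API Gateway proxy integration (all of these are assumptions, not from the question):
import json
import joblib

# Assumption: model.joblib contains a fitted Pipeline (TF-IDF vectorizer + SVM)
# bundled into the Lambda deployment package. Loading at module level means
# it happens once per container, not once per invocation.
model = joblib.load('model.joblib')

def lambda_handler(event, context):
    # Expect a JSON body like {"text": "some text to classify"}
    text = json.loads(event['body'])['text']
    label = model.predict([text])[0]
    return {
        'statusCode': 200,
        'body': json.dumps({'label': str(label)})
    }
If the model outgrows Lambda's deployment-size limits, the managed alternative is the SageMaker scikit-learn container (sagemaker.sklearn.model.SKLearnModel), which can host the same joblib artifact behind an endpoint.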

Related

Error when invoking pre-trained model: NotFittedError("Vocabulary not fitted or provided")

I'm new to AWS SageMaker and I'm trying to deploy a simple pre-trained model to SageMaker to create an endpoint and then make predictions.
The model is a sklearn linear regression model; the input is a vectorized sparse matrix derived from a string of text (a customer's review), and the output is the star-rating value (1 to 5).
I trained the model locally and exported its artifact to a model.joblib file.
Then I wrote an inference.py file and zipped it together with the model.joblib file into a model.tar.gz file, which is then uploaded to S3 for model registration and endpoint creation.
However, when I invoke the endpoint on a sample text, the following error is returned in the CloudWatch log:
File "/miniconda3/lib/python3.8/site-packages/sklearn/feature_extraction/text.py", line 498, in _check_vocabulary
raise NotFittedError("Vocabulary not fitted or provided")
I understand that this means SageMaker is complaining that the trained model artifact is not fitted, and that there is no problem with the other parts (such as the inference.py file). However, the pre-trained model was fitted before exporting.
I'm not sure which part went wrong, so I haven't posted any more code, to avoid clutter.
Thank you.
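That NotFittedError is raised by the TfidfVectorizer/CountVectorizer, not by the regressor, so a likely cause is that the vectorizer serialized into model.joblib was never fitted (for example, a fresh vectorizer is constructed inside inference.py). A sketch of one way to avoid this, fitting and dumping vectorizer and model together as a single Pipeline so the fitted vocabulary travels inside the artifact (train_texts and train_stars are hypothetical names for the training data):
import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# Fit vectorizer and regressor together; the fitted vocabulary is stored
# inside the pipeline, so the deployed artifact is self-contained.
pipeline = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('regressor', LinearRegression()),
])
pipeline.fit(train_texts, train_stars)  # train_texts: list of review strings

joblib.dump(pipeline, 'model.joblib')

# In inference.py, model_fn then only needs to load the pipeline:
# def model_fn(model_dir):
#     return joblib.load(os.path.join(model_dir, 'model.joblib'))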

Loading a trained model into a SageMaker Estimator

I've trained a custom model on SageMaker based on the PyTorch estimator.
Training has completed, and I verified that the model artifacts have been saved to an S3 location.
I want to load my trained model into my SageMaker notebook so I can perform analysis/inference and so on.
I did as below, but I am not sure if this is the right method, since it asks for an instance type; to my knowledge, if I were loading an already trained estimator, I would only need to declare which type of compute instance to use once I start deploying the model for inference.
estimator = PyTorch(
    model_data=ModelArtifact_S3_LOCATION,
    entry_point='train.py',
    source_dir='code',
    role=role,
    framework_version='1.5.0',
    py_version='py3',
)
If training has completed and you want to set up for inference, you can either point to your tar.gz model artifact file to create an endpoint or take your training estimator directly. The following code block is the general flow you want to follow for training, inference, and predictions.
from sagemaker.pytorch import PyTorch

# Train my estimator
pytorch_estimator = PyTorch(entry_point='train_and_deploy.py',
                            instance_type='ml.p3.2xlarge',
                            instance_count=1,
                            framework_version='1.8.0',
                            py_version='py3')
pytorch_estimator.fit('s3://my_bucket/my_training_data/')

# Deploy my estimator to a SageMaker Endpoint and get a Predictor
predictor = pytorch_estimator.deploy(instance_type='ml.m4.xlarge',
                                     initial_instance_count=1)

# `data` is a NumPy array or a Python list.
# `response` is a NumPy array.
response = predictor.predict(data)
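For the situation in the question (training already finished, artifacts sitting in S3), a sketch of the artifact-pointing route mentioned above, using PyTorchModel so no retraining happens; the S3 path and entry point name are placeholders:
from sagemaker.pytorch import PyTorchModel

# Wrap the existing model.tar.gz from the completed training job.
# inference.py must define model_fn (and optionally input_fn/predict_fn/
# output_fn) for the PyTorch serving container.
pytorch_model = PyTorchModel(model_data='s3://my_bucket/path/to/model.tar.gz',
                             role=role,
                             entry_point='inference.py',
                             framework_version='1.8.0',
                             py_version='py3')

# The compute instance type is only declared here, at deployment time.
predictor = pytorch_model.deploy(instance_type='ml.m4.xlarge',
                                 initial_instance_count=1)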
For more information, check out the following link on deploying PyTorch models on SageMaker:
https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#deploy-pytorch-models
I work for AWS & my opinions are my own

Can we specify the training and test document percentage in the AWS Comprehend Custom Entity Recognizer?

I trained a custom entity recognizer using AWS Comprehend for an entity extraction problem. The trained recognizer uses the default train/test data split, which here assigns more documents to test than to train, and this affects the recognizer's metrics. Also, these values (number of train and test documents) are higher than the total number of inputs in the "train.csv" file added to the S3 bucket for training:
Total number of inputs given in the CSV file: 1010
Train documents used by the recognizer: 2480
Test documents used by the recognizer: 3270

Beginner's guide to SageMaker

I have followed an Amazon tutorial for using SageMaker and have used it to create the model in the tutorial (https://aws.amazon.com/getting-started/tutorials/build-train-deploy-machine-learning-model-sagemaker/).
This is my first time using SageMaker, so my question may be stupid.
How do you actually view the model that it has created? I want to be able to see a) the final formula created, with the parameters etc., and b) graphs of plotted factors etc., as if I were reviewing a GLM, for example.
Thanks in advance.
If you followed the SageMaker tutorial, you must have trained an XGBoost model. SageMaker places the model artifacts in a bucket that you own; check the output S3 location in the AWS SageMaker console.
For more information about XGBoost you can check the AWS SageMaker documentation https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html#xgboost-sample-notebooks and the example notebooks, e.g. https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/xgboost_abalone/xgboost_abalone.ipynb
To consume the XGBoost artifact generated by SageMaker, check out the official documentation, which contains the following code:
# SageMaker XGBoost uses the Python pickle module to serialize/deserialize
# the model, which can be used for saving/loading the model.
# To use a model trained with SageMaker XGBoost in open source XGBoost
# Use the following Python code:
import pickle as pkl
model = pkl.load(open(model_file_path, 'rb'))
# prediction with test data
pred = model.predict(dtest)
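To get from the S3 tarball to that model_file_path, and to inspect the booster visually (closer to what the question asks for), a sketch assuming the built-in algorithm's default artifact layout, where the pickled booster inside model.tar.gz is named xgboost-model; the bucket and key are placeholders:
import pickle as pkl
import tarfile

import boto3
import xgboost as xgb

# Download and unpack the artifact SageMaker wrote to your bucket.
s3 = boto3.client('s3')
s3.download_file('my-bucket', 'path/to/output/model.tar.gz', 'model.tar.gz')
with tarfile.open('model.tar.gz') as tar:
    tar.extractall()

# 'xgboost-model' is the file name used by the built-in XGBoost algorithm.
model = pkl.load(open('xgboost-model', 'rb'))

# There is no single closed-form formula for a boosted tree ensemble, but
# you can inspect it: per-feature importances and individual trees.
xgb.plot_importance(model)
xgb.plot_tree(model, num_trees=0)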

Get weights after training from Sagemaker Estimator

TensorFlow's Estimator provides a method, get_variable_value, to get desired variable values after training/testing.
Does similar functionality exist in SageMaker's Estimator, so that I can obtain the weights after my model is trained?
For the Estimator object in the SageMaker Python SDK, after you call fit() you can get the S3 URL of your model artifacts with
model_artifacts_url = estimator.create_model().model_data
The model itself is saved to your S3 bucket as a tarball at this location, so from there you can get the model parameters out of S3.
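A short sketch of pulling those parameters down, assuming the usual layout of a model.tar.gz artifact; the local directory name is a placeholder:
import tarfile

from sagemaker.s3 import S3Downloader

# model_artifacts_url comes from the answer above:
# estimator.create_model().model_data
S3Downloader.download(model_artifacts_url, 'model_artifacts')

with tarfile.open('model_artifacts/model.tar.gz') as tar:
    tar.extractall('model_artifacts')

# From here, load the unpacked files with your framework of choice to read
# the weights, e.g. tf.saved_model.load(...) for a TensorFlow SavedModel,
# or torch.load(...) for a PyTorch state dict.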