Get weights after training from Sagemaker Estimator - amazon-web-services

TensorFlow's Estimator provides a method, get_variable_value, to retrieve the values of desired variables after training/testing.
Does similar functionality exist in SageMaker's Estimator, so that I can obtain the weights after my model is trained?

For the Estimator object in the SageMaker Python SDK, after you call fit() you can get the S3 URL of your model artifacts with
model_artifacts_url = estimator.create_model().model_data
The model itself is saved into your S3 bucket as a tarball at this location, so from here you can get the model parameters out of S3.
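A minimal sketch of pulling the tarball down and unpacking it, assuming the URL has the usual s3://<bucket>/<key> form (the local file names here are illustrative):
import tarfile
import boto3

# model_artifacts_url looks like s3://<bucket>/<prefix>/model.tar.gz
bucket, _, key = model_artifacts_url.replace("s3://", "").partition("/")

s3 = boto3.client("s3")
s3.download_file(bucket, key, "model.tar.gz")

# Unpack the tarball; which files are inside depends on the framework used for training.
with tarfile.open("model.tar.gz", "r:gz") as tar:
    tar.extractall("model")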

Related

Error when invoking pre-trained model: NotFittedError("Vocabulary not fitted or provided")

I'm new to AWS SageMaker and I'm trying to deploy a simple pre-trained model to SageMaker to create an endpoint and then make predictions.
The model is a scikit-learn linear regression model; the input is a vectorized sparse matrix derived from a string of text (a customer's review), and the output is the star-rating value (1 to 5).
I have trained the model locally and exported its artifact to a model.joblib file.
Then I wrote an inference.py file and packaged it together with the model.joblib file into a model.tar.gz file, which is then uploaded to S3 for model registration and endpoint creation (a sketch of this step is shown below).
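For reference, the packaging step looks roughly like this (a sketch; the exact layout expected inside the tarball depends on how the scikit-learn model is configured):
import tarfile

# Bundle the artifact and the inference script described above.
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("model.joblib")
    tar.add("inference.py")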
However, when I invoke the endpoint on a sample text, the following error is returned in the CloudWatch log:
File "/miniconda3/lib/python3.8/site-packages/sklearn/feature_extraction/text.py", line 498, in _check_vocabulary
raise NotFittedError("Vocabulary not fitted or provided")
I understand that this means SageMaker is complaining that the trained model artifact is not fitted, and that there is no problem with the other parts (such as the inference.py file). However, the pre-trained model was fitted before exporting.
I'm not sure which part went wrong, so I haven't uploaded any more code, to avoid clutter.
Thank you.

Upload custom file to s3 from training script in training component of AWS SageMaker Pipeline

I am new to SageMaker, and I have created a pipeline from the SageMaker notebook, consisting of training and deployment components.
In the training script, we can upload the model to S3 via SM_MODEL_DIR. But now I want to upload the classification report to S3. I tried the code below, but it says this is not a proper S3 bucket.
df_classification_report = pd.DataFrame(class_report).transpose()
classification_report_file_name = os.path.join(
    args.output_data_dir,
    f"{args.eval_model_name}_classification_report.csv")
df_classification_report.to_csv(classification_report_file_name)

# instantiate S3 client and upload to s3
# save classification report to s3
s3 = boto3.resource('s3')
print(f"classification_report is being uploaded to s3- {args.model_dir}")
s3.meta.client.upload_file(classification_report_file_name, args.model_dir,
                           f"{args.eval_model_name}_classification_report.csv")
And this is the error:
Invalid bucket name "/opt/ml/output/data": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:(s3|s3-object-lambda):[a-z\-0-9]+:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"
Can anybody help? I really appreciate any help you can provide.
SageMaker Training Jobs compress any files located in /opt/ml/model (the value of SM_MODEL_DIR) and upload them to S3 automatically. You could save your file to SM_MODEL_DIR; your classification report would then be uploaded to S3 inside the model tarball.
The upload_file() function requires you to pass an S3 bucket name, not a local path.
You could also manually specify an S3 bucket in your code to upload the file to:
s3.meta.client.upload_file(classification_report_file_name, <YourS3Bucket>,
                           f"{args.eval_model_name}_classification_report.csv")
You can save non-model artifacts, such as reports, to output_data_dir, defined as follows:
parser.add_argument("--output_data_dir", type=str,
default=os.environ.get('SM_OUTPUT_DATA_DIR'),
help="Directory to save output data artifacts.")
If you want the artifacts to be packaged with the model files, then follow @Marc's answer. That may make sense for a report that pertains to a specific model, though capturing it in a model registry makes more sense to me.
Note that these additional artifacts would be carried over if you deploy the model to an endpoint (and might confuse the inference runtime's model-loading code).
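Putting it together, a sketch of saving the report via SM_OUTPUT_DATA_DIR (the variable names follow the question; SageMaker uploads this directory to S3 as output.tar.gz, separately from the model artifact):
import argparse
import os
import pandas as pd

parser = argparse.ArgumentParser()
# SM_OUTPUT_DATA_DIR (typically /opt/ml/output/data) is uploaded to S3
# by the training job, separately from the model artifact.
parser.add_argument("--output_data_dir", type=str,
                    default=os.environ.get('SM_OUTPUT_DATA_DIR'),
                    help="Directory to save output data artifacts.")
args, _ = parser.parse_known_args()

# class_report is assumed to be the dict returned by sklearn's
# classification_report(..., output_dict=True), as in the question.
df_classification_report = pd.DataFrame(class_report).transpose()
df_classification_report.to_csv(
    os.path.join(args.output_data_dir, "classification_report.csv"))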

Loading trained model into SageMaker Estimator

I've trained a custom model on SageMaker based on the PyTorch estimator.
Training has completed, and I verified that the model artifacts have been saved to an S3 location.
I want to load my trained model into my SageMaker notebook so I can perform analysis/inference and so on.
I did as below, but I am not sure this is the right method, since it asks for an instance type; to my knowledge, I should only need to declare which type of compute instance to use once I start deploying the model for inference.
estimator = PyTorch(
    model_data=ModelArtifact_S3_LOCATION,
    entry_point='train.py',
    source_dir='code',
    role=role,
    framework_version='1.5.0',
    py_version='py3',
)
If training has completed and you want to set up for inference, you want to point to your tar.gz model artifact file to create an endpoint, or take your training estimator directly. The following code block shows the general flow for training, deployment, and prediction.
# Train my estimator
pytorch_estimator = PyTorch(entry_point='train_and_deploy.py',
                            instance_type='ml.p3.2xlarge',
                            instance_count=1,
                            framework_version='1.8.0',
                            py_version='py3')
pytorch_estimator.fit('s3://my_bucket/my_training_data/')

# Deploy my estimator to a SageMaker Endpoint and get a Predictor
predictor = pytorch_estimator.deploy(instance_type='ml.m4.xlarge',
                                     initial_instance_count=1)

# `data` is a NumPy array or a Python list.
# `response` is a NumPy array.
response = predictor.predict(data)
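If you already have the artifact in S3 and don't want to re-train, a sketch using PyTorchModel instead of the training estimator (the S3 URI, role, and inference.py entry point are placeholders you would fill in):
from sagemaker.pytorch import PyTorchModel

# Point directly at the trained artifact instead of re-running fit().
pytorch_model = PyTorchModel(model_data=ModelArtifact_S3_LOCATION,
                             role=role,
                             entry_point='inference.py',
                             framework_version='1.8.0',
                             py_version='py3')
predictor = pytorch_model.deploy(instance_type='ml.m4.xlarge',
                                 initial_instance_count=1)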
For more information check out the following link for deploying PyTorch models on SageMaker.
https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#deploy-pytorch-models
I work for AWS & my opinions are my own

How to deploy our own TensorFlow Object Detection Model in amazon Sagemaker?

I have my own trained TF Object Detection model. When I try to deploy the same model in AWS SageMaker, it does not work.
I have tried TensorFlowModel() in SageMaker, but there is an argument called entry_point. How do I create that .py file for prediction?
entry_point is an argument that takes the name of your inference script, inference.py. Once you create an endpoint and try to predict an image using the InvokeEndpoint API, an instance of the type you specified is launched, and it runs the inference.py script to execute your processing.
Link: documentation for TensorFlow model deployment in Amazon SageMaker
The inference script must contain the methods input_handler and output_handler, or a single handler method that covers both, in the inference.py script; these do the pre- and post-processing of your image.
Example for deploying the TensorFlow model: in the above link I have mentioned a Medium post, which should help with your doubts.
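A minimal sketch of such an inference.py for the SageMaker TensorFlow Serving container (the handler names and signatures follow the SageMaker TensorFlow documentation; the JSON handling is illustrative):
import json

def input_handler(data, context):
    # Pre-process the request before it reaches TensorFlow Serving.
    if context.request_content_type == 'application/json':
        payload = data.read().decode('utf-8')
        # Wrap the payload in the TF Serving REST "instances" format.
        return json.dumps({'instances': [json.loads(payload)]})
    raise ValueError(f'Unsupported content type: {context.request_content_type}')

def output_handler(response, context):
    # Post-process the TensorFlow Serving response before returning it.
    if response.status_code != 200:
        raise ValueError(response.content.decode('utf-8'))
    return response.content, context.accept_header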

Beginner's guide to SageMaker

I have followed an Amazon tutorial for using SageMaker and have used it to create the model in the tutorial (https://aws.amazon.com/getting-started/tutorials/build-train-deploy-machine-learning-model-sagemaker/).
This is my first time using SageMaker, so my question may be stupid.
How do you actually view the model that it has created? I want to be able to see a) the final formula with the fitted parameters, and b) graphs of plotted factors, as if I were reviewing a GLM, for example.
Thanks in advance.
If you followed the SageMaker tutorial, you must have trained an XGBoost model. SageMaker places the model artifacts in a bucket that you own; check the output S3 location in the AWS SageMaker console.
For more information about XGBoost you can check the AWS SageMaker documentation https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html#xgboost-sample-notebooks and the example notebooks, e.g. https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/xgboost_abalone/xgboost_abalone.ipynb
To consume the XGBoost artifact generated by SageMaker, check out the official documentation, which contains code along these lines:
# SageMaker XGBoost uses the Python pickle module to serialize/deserialize
# the model, which can be used for saving/loading the model.
# To use a model trained with SageMaker XGBoost in open source XGBoost,
# use the following Python code:
import pickle as pkl

with open(model_file_path, 'rb') as f:
    model = pkl.load(f)

# prediction with test data (dtest is an xgboost.DMatrix)
pred = model.predict(dtest)
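For completeness, dtest above is an xgboost.DMatrix; a sketch of building one from a NumPy array (the feature values are purely illustrative):
import numpy as np
import xgboost as xgb

X_test = np.array([[0.1, 0.2, 0.3]])  # shape: (n_samples, n_features)
dtest = xgb.DMatrix(X_test)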