Sagemaker inference for encoding and custom logic - amazon-web-services

I have a requirement where I want to encode sentences with a sentence_transformers model, run some additional logic on the embeddings to get the final results, and do all of this on a SageMaker serverless inference endpoint.
The code should be something like this:
df = pd.read_csv(filename)
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
vec_result = model.encode(df['query'])
cos_scores = util.pytorch_cos_sim(vec_result, vec_result)
final_result = postprocess(df, cos_scores)
return final_result
Is this doable with a serverless inference endpoint?
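For context, serverless endpoints use the same custom-inference-script convention as regular real-time endpoints. Below is a minimal, hypothetical sketch of how the snippet above might be packaged as an inference handler; the JSON payload format, the place where the question's postprocess() step would go, and the toolkit hooks used are assumptions, not a verified deployment:

# inference.py -- hypothetical handler, assuming deployment through a container
# that supports the model_fn / input_fn / predict_fn hooks (e.g. the SageMaker
# PyTorch or Hugging Face inference toolkits).
import json
from sentence_transformers import SentenceTransformer, util

def model_fn(model_dir):
    # Load the encoder once per container start, not once per request.
    return SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

def input_fn(request_body, request_content_type):
    # Assume the client sends {"queries": ["...", "..."]} as JSON.
    return json.loads(request_body)

def predict_fn(input_data, model):
    vec_result = model.encode(input_data['queries'])
    cos_scores = util.pytorch_cos_sim(vec_result, vec_result)
    # The question's postprocess() step would go here before returning.
    return {'cos_scores': cos_scores.tolist()}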

Related

How could I update this deprecated code for batch prediction using sagemaker?

Let me explain my problem:
I have to update the code of a notebook that used version 1.x of the SageMaker Python SDK to make batch predictions from an XGBoost endpoint that was created in AWS SageMaker.
After defining a DataFrame called ordered_data, I try to run this:
def batch_predict(data, xgb_predictor, rows=500):
    split_array = np.array_split(data, int(data.shape[0] / float(rows) + 1))
    predictions = ''
    for array in split_array:
        new_predictions = xgb_predictor.predict(array).decode('utf-8')
        predictions = predictions + '\n' + new_predictions
    predictions = predictions.replace('\n', ',')
    predictions = predictions.replace(',,', ',')
    return np.fromstring(predictions[1:], sep=',')

def get_predictions(sorted_data, xgb_predictor):
    xgb_predictor.content_type = 'text/csv'
    xgb_predictor.serializer = csv_serializer
    xgb_predictor.deserializer = None
    # predictions = batch_predict(sorted_data.as_matrix(), xgb_predictor)  # get the scores for each piece of data
    predictions = batch_predict(sorted_data.values, xgb_predictor)
    predictions = pd.DataFrame(predictions, columns=['score'])
    return predictions

xgb_predictor = sagemaker.predictor.RealTimePredictor(endpoint_name='sagemaker-xgboost-2023-01-18')
predictions = get_predictions(ordered_data, xgb_predictor)
predictions2 = pd.concat([predictions, raw_data[['order_id']]], axis=1)
I've checked the SageMaker v2 documentation and tried to update many things, and I've also run !sagemaker-upgrade-v2 --in-file file.ipynb --out-file file2.ipynb,
but nothing works.
I get several errors like:
'content_type' property of object 'deprecated_class..DeprecatedClass' has no setter.
If I delete the line where I define content_type, I get: AttributeError: 'NoneType' object has no attribute 'ACCEPT'.
and so on.
I need to update all this code but I don't know how.
In Python SDK v2 the SageMaker RealTimePredictor class was renamed to Predictor, and it takes serializer and deserializer parameters, so the serialization of input data and deserialization of result data are configured through initializer arguments.
Note: the csv_serializer, json_serializer, npy_serializer, csv_deserializer, json_deserializer, and numpy_deserializer objects have been deprecated in v2 and replaced by classes such as the ones below:
serializer=CSVSerializer(),
deserializer=JSONDeserializer()
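Putting the pieces together, a v2-style replacement for the predictor construction could look like the sketch below; the endpoint name is taken from the question, and whether your endpoint actually returns JSON (rather than plain CSV text) is an assumption, so pick the deserializer that matches your container's response:

from sagemaker.predictor import Predictor
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import JSONDeserializer

# In SDK v2 the content type and accept headers come from the serializer and
# deserializer objects, so there is no content_type attribute to set.
xgb_predictor = Predictor(endpoint_name='sagemaker-xgboost-2023-01-18',
                          serializer=CSVSerializer(),
                          deserializer=JSONDeserializer())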

How to run inference on an image classification model simultaneously for multiple images in MXNet and Python 2.7

I am running inference with Python 2.7 and the MXNet v1.3.0 ML framework on an image classification model in ONNX format (v1.2.1 with opset 7), where I feed one image at a time to the inferrer. What do I need to do to run inference for multiple images asynchronously, but also wait for all of them to finish?
I am extracting frames as .jpeg images from a video at 30 FPS. For example, when I run the process on a 20-second video, it generates 600 .jpeg images. For now, I iterate through a list of those images and pass the relative path of each one to the following function, which then runs inference on the target image.
def infer(self, target_image_path):
    target_image_path = self.__output_directory + '/' + target_image_path
    image_data = self.__get_image_data(target_image_path)  # Get pixel data

    '''Define the model's input'''
    model_metadata = onnx_mxnet.get_model_metadata(self.__model)
    data_names = [inputs[0]
                  for inputs in model_metadata.get('input_tensor_data')]
    Batch = namedtuple('Batch', 'data')
    ctx = mx.eia()  # Set the context to elastic inference

    '''Load the model'''
    sym, arg, aux = onnx_mxnet.import_model(self.__model)
    mod = mx.mod.Module(symbol=sym, data_names=data_names,
                        context=ctx, label_names=None)
    mod.bind(data_shapes=[(data_names[0], image_data.shape)],
             label_shapes=None, for_training=False)
    mod.set_params(arg_params=arg, aux_params=aux,
                   allow_missing=True, allow_extra=True)

    '''Run inference on the image'''
    mod.forward(Batch([mx.nd.array(image_data)]))
    predictions = mod.get_outputs()[0].asnumpy()
    predictions = predictions[0].tolist()

    '''Apply emotion labels'''
    zipb_object = zip(self.__emotion_labels, predictions)
    prediction_dictionary = dict(zipb_object)
    return prediction_dictionary
Expected behavior would be to run the inference for each image asynchronously but also to await the process to finish for the entire batch.
One thing you shouldn't do is load the model for every image. You should load the model once and then run inference on all 600 of your images.
For example, you can refactor your code like this:
Batch = namedtuple('Batch', 'data')  # shared by load_model() and infer()

def load_model(self, data_shape):
    '''Load the model once, bound to the expected input shape'''
    model_metadata = onnx_mxnet.get_model_metadata(self.__model)
    data_names = [inputs[0]
                  for inputs in model_metadata.get('input_tensor_data')]
    ctx = mx.eia()  # Set the context to elastic inference
    sym, arg, aux = onnx_mxnet.import_model(self.__model)
    mod = mx.mod.Module(symbol=sym, data_names=data_names,
                        context=ctx, label_names=None)
    # Bind to the input shape up front (passed in, since no image has been
    # read yet at load time).
    mod.bind(data_shapes=[(data_names[0], data_shape)],
             label_shapes=None, for_training=False)
    mod.set_params(arg_params=arg, aux_params=aux,
                   allow_missing=True, allow_extra=True)
    return mod

def infer(self, mod, target_image_path):
    target_image_path = self.__output_directory + '/' + target_image_path
    image_data = self.__get_image_data(target_image_path)  # Get pixel data

    '''Run inference on the image'''
    mod.forward(Batch([mx.nd.array(image_data)]))
    predictions = mod.get_outputs()[0].asnumpy()
    predictions = predictions[0].tolist()

    '''Apply emotion labels'''
    zipb_object = zip(self.__emotion_labels, predictions)
    prediction_dictionary = dict(zipb_object)
    return prediction_dictionary
MXNet runs on an asynchronous engine; you don't have to wait for one image to finish processing before enqueueing the next one.
Some calls in MXNet are asynchronous: for example, mod.forward() returns immediately and does not wait for the result to be computed. Other calls are synchronous: for example, mod.get_outputs()[0].asnumpy() copies the data to the CPU, so it has to be synchronous. Having a synchronous call inside each iteration slows down the processing a bit.
Assuming you have access to the list image_paths, you can process the images like this to minimize waiting time, with a synchronization point only at the end:
results = []
for target_image_path in image_paths:
    image_data = self.__get_image_data(target_image_path)  # Get pixel data

    '''Run inference on the image'''
    mod.forward(Batch([mx.nd.array(image_data)]))
    results.append(mod.get_outputs()[0])

predictions = [result.asnumpy()[0].tolist() for result in results]
You can read more about asynchronous programming with MXNet here: http://d2l.ai/chapter_computational-performance/async-computation.html
Even better, if you know that you have N images to process, you can group them into batches of, for example, 16 to increase the parallelism of the processing. However, doing so will increase memory consumption. Since you seem to be using an Elastic Inference context, your overall memory will be limited, and I would advise sticking with a smaller batch size so you don't risk running out of memory.
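As an illustration only, the batching could look roughly like the sketch below; the batch size of 16, the np.stack call, and the assumptions that __get_image_data returns one (C, H, W) array per image and that mod was bound with a matching (16, C, H, W) input shape are hypothetical details, not part of the original answer:

import numpy as np

batch_size = 16  # hypothetical; pick what fits in the available memory
results = []
for start in range(0, len(image_paths), batch_size):
    batch_paths = image_paths[start:start + batch_size]
    # Stack the per-image pixel arrays into one (batch, C, H, W) array.
    # A partial last batch would need padding or a re-bind; omitted here.
    batch_data = np.stack([self.__get_image_data(p) for p in batch_paths])
    mod.forward(Batch([mx.nd.array(batch_data)]))
    results.append(mod.get_outputs()[0])

# Single synchronization point: copy everything back to the CPU at the end.
predictions = [row.tolist() for result in results for row in result.asnumpy()]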

How to restrict model predicted value within range?

I want to do linear regression with AWS SageMaker. I have trained my model on some values and it predicts values for the given inputs, but sometimes it predicts values out of range: I am predicting a percentage, which can't go below 0 or above 100. How can I restrict it here:
sess = sagemaker.Session()

linear = sagemaker.estimator.Estimator(containers[boto3.Session().region_name],
                                       role,
                                       train_instance_count=1,
                                       train_instance_type='ml.c4.xlarge',
                                       output_path='s3://{}/{}/output'.format(bucket, prefix),
                                       sagemaker_session=sess)

linear.set_hyperparameters(feature_dim=5,
                           mini_batch_size=100,
                           predictor_type='regressor',
                           epochs=10,
                           num_models=32,
                           loss='absolute_loss')

linear.fit({'train': s3_train_data, 'validation': s3_validation_data})
How can I make my model not predict values outside the range [0, 100]?
Yes, you can. You can implement output_fn to "brick wall" your output. SageMaker calls output_fn after the model returns a value, so you can do any post-processing of the result there.
This can be done by creating a separate Python file and defining the output_fn method in it.
Provide this Python file when instantiating your Estimator.
Something like:
sess = sagemaker.Session()

linear = sagemaker.estimator.Estimator(containers[boto3.Session().region_name],
                                       role,
                                       train_instance_count=1,
                                       train_instance_type='ml.c4.xlarge',
                                       output_path='s3://{}/{}/output'.format(bucket, prefix),
                                       sagemaker_session=sess,
                                       entry_point='entry.py')

linear.set_hyperparameters(feature_dim=5,
                           mini_batch_size=100,
                           predictor_type='regressor',
                           epochs=10,
                           num_models=32,
                           loss='absolute_loss')

linear.fit({'train': s3_train_data, 'validation': s3_validation_data})
Your entry.py could look something like:

def output_fn(data, accepts):
    """
    Args:
        data: A result from TensorFlow Serving
        accepts: The Amazon SageMaker InvokeEndpoint Accept value. The content type
            the response object should be serialized to.
    Returns:
        object: The serialized object that will be sent back to the client.
    """
    # Implement the logic to "brick wall" the output here.
    return data.outputs['outputs'].string_val
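For example, a minimal sketch of what the "brick wall" itself might do is simply clip every score into [0, 100]; exactly what arrives in output_fn depends on the serving container, so the array conversion and the CSV response below are assumptions:

import numpy as np

def output_fn(data, accepts):
    """Hypothetical "brick wall": clamp every predicted score into [0, 100]."""
    # Assume `data` can be flattened into an array of scores; the exact shape
    # depends on what the serving container hands to output_fn.
    scores = np.asarray(data, dtype=float).ravel()
    clamped = np.clip(scores, 0.0, 100.0)
    # Serialize according to the requested content type (CSV assumed here).
    return ','.join(str(v) for v in clamped)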

Network in Network using keras

I want to implement Network in Network (NiN) using Keras, but I could not find anything useful online. I want to implement the architecture in the image below. Can anybody help?
Just look at the functional API of Keras (https://keras.io/models/model/) and do something like this:
def build_model(input_layer, idx):
    # model code, e.g. logits = first_layer(parameters)(input_layer)
    # could also load an already trained model.
    return logits

input_layer = Input(...)
output = input_layer
for i in range(num_models):
    output = build_model(output, i)

final_layer = Model(input_layer, output)
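For reference, the characteristic NiN building block is a regular convolution followed by 1x1 convolutions (the "mlpconv" layer), with global average pooling at the end instead of dense layers. Below is a rough sketch of such a block with the Keras functional API; the filter counts, kernel sizes, and input shape are illustrative guesses, not values taken from the question's image:

from keras.layers import Input, Conv2D, MaxPooling2D, GlobalAveragePooling2D, Activation
from keras.models import Model

def mlpconv_block(x, filters, kernel_size):
    # A standard conv followed by two 1x1 convs, as in the NiN paper.
    x = Conv2D(filters, kernel_size, padding='same', activation='relu')(x)
    x = Conv2D(filters, (1, 1), activation='relu')(x)
    x = Conv2D(filters, (1, 1), activation='relu')(x)
    return x

inputs = Input(shape=(32, 32, 3))          # illustrative input shape
x = mlpconv_block(inputs, 192, (5, 5))
x = MaxPooling2D(pool_size=(3, 3), strides=2, padding='same')(x)
x = mlpconv_block(x, 160, (5, 5))
x = MaxPooling2D(pool_size=(3, 3), strides=2, padding='same')(x)
x = mlpconv_block(x, 10, (3, 3))           # last block maps to the number of classes
x = GlobalAveragePooling2D()(x)
outputs = Activation('softmax')(x)

model = Model(inputs, outputs)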

How to save features into a file in keras?

I have a trained weight matrix. I would like to extract the features at each and every layer and store them in a file. How could I do that? Thanks.
Have a look at the Keras FAQ
One simple way is to create a new Model that will output the layers
that you are interested in:
from keras.models import Model

model = ...  # create the original model

layer_name = 'my_layer'
intermediate_layer_model = Model(inputs=model.input,
                                 outputs=model.get_layer(layer_name).output)
intermediate_output = intermediate_layer_model.predict(data)
Alternatively, you can build a Keras function that will return the
output of a certain layer given a certain input, for example:
from keras import backend as K

get_3rd_layer_output = K.function([model.layers[0].input],
                                  [model.layers[3].output])
layer_output = get_3rd_layer_output([X])[0]
Similarly, you could build a Theano and TensorFlow function directly.
Note that if your model has a different behavior in training and
testing phase (e.g. if it uses Dropout, BatchNormalization, etc.),
you will need to pass the learning phase flag to your function:
get_3rd_layer_output = K.function([model.layers[0].input, K.learning_phase()],
                                  [model.layers[3].output])
# output in test mode = 0
layer_output = get_3rd_layer_output([X, 0])[0]
# output in train mode = 1
layer_output = get_3rd_layer_output([X, 1])[0]
Then you just need to store the extracted features in a file, e.g. np.save('filename.npy', intermediate_output) for a single array, or np.savez for several arrays at once.
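Since the question asks for the features at each and every layer, here is a minimal sketch that combines the two ideas above; the file name features.npz and the choice to skip the input layer are assumptions for illustration:

import numpy as np
from keras.models import Model

# One model whose outputs are the outputs of every layer (skip the InputLayer,
# whose "features" are just the raw input).
feature_model = Model(inputs=model.input,
                      outputs=[layer.output for layer in model.layers[1:]])
feature_maps = feature_model.predict(data)

# Save every layer's features into a single .npz archive, keyed by layer name.
np.savez('features.npz',
         **{layer.name: fmap
            for layer, fmap in zip(model.layers[1:], feature_maps)})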