Batch prediction Input - google-cloud-platform

I have a tensorflow model deployed on Vertex AI of Google Cloud. The model definition is:
item_model = tf.keras.Sequential([
tf.keras.layers.StringLookup(
vocabulary=item_vocab, mask_token=None),
tf.keras.layers.Embedding(len(item_vocab) + 1, embedding_dim)
])
user_model = tf.keras.Sequential([
tf.keras.layers.StringLookup(
vocabulary=user_vocab, mask_token=None),
# We add an additional embedding to account for unknown tokens.
tf.keras.layers.Embedding(len(user_vocab) + 1, embedding_dim)
])
class NCF_model(tf.keras.Model):
def __init__(self,user_model, item_model):
super(NCF_model, self).__init__()
# define all layers in init
self.user_model = user_model
self.item_model = item_model
self.concat_layer = tf.keras.layers.Concatenate()
self.feed_forward_1 = tf.keras.layers.Dense(32,activation= 'relu')
self.feed_forward_2 = tf.keras.layers.Dense(64,activation= 'relu')
self.final = tf.keras.layers.Dense(1,activation= 'sigmoid')
def call(self, inputs ,training=False):
user_id , item_id = inputs[:,0], inputs[:,1]
x = self.user_model(user_id)
y = self.item_model(item_id)
x = self.concat_layer([x,y])
x = self.feed_forward_1(x)
x = self.feed_forward_2(x)
x = self.final(x)
return x
The model has two string inputs and it outputs a probability value.
When I use the following input in the batch prediction file, I get an empty prediction file.
Sample of csv input file:
userid,itemid
yuu,190767
yuu,364
yuu,154828
yuu,72998
yuu,130618
yuu,183979
yuu,588
When I use a jsonl file with the following input.
{"input":["yuu", "190767"]}
I get the following error.
('Post request fails. Cannot get predictions. Error: Exceeded retries: Non-OK result 400 ({\n "error": "Failed to process element: 0 key: input of \'instances\' list. Error: INVALID_ARGUMENT: JSON object: does not have named input: input"\n}) from server, retry=3.', 1)
What seems to be going wrong with these inputs?

After a bit of experimenting, I found out what was wrong with the batch prediction input. In the csv file, the item column was being interpreted as an integer whereas the model has a string as an input. I'm not sure why there was no output at all in that case and I couldn't find the logs for the batch prediction.
The correct format for jsonlines was:
["user1", "item1"]
["user2", "item2"]
["user3", "item3"]
The one I used assumed the input was a named layer, 'input'. In all of this, I found the documentation of google cloud to be lacking.

Related

Google Cloud Video Intelligence API in Python - Unable to run object tracking on multiple videos in a folder

I'm trying to run object tracking on a folder containing multiple videos. There are 5 videos in my bucket and following the documentation from here, it suggests using the wildcard (*) operator. However, when I run the entire script, only 1 video gets annotated and not the entire folder containing 5 videos. Also, response2.json does not get created as the output_uri in my GCS bucket.
To identify multiple videos, a video URI may include wildcards in the object-id. Supported wildcards: ' * ' to match 0 or more characters; ‘?’ to match 1 character.
https://googleapis.dev/python/videointelligence/latest/gapic/v1/types.html
Which is what I've done in my input_uri bit of the code:
gcs_uri = 'gs://video_intel/*'
If you check the screenshot, it should the bucket id name and shows multiple videos in the same folder.
Can anyone pls help with this question. Thanks.
Full script:
import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS']='poc-video-intelligence-da5d4d52cb97.json'
"""Object tracking in a video stored on GCS."""
from google.cloud import videointelligence
video_client = videointelligence.VideoIntelligenceServiceClient()
features = [videointelligence.enums.Feature.OBJECT_TRACKING]
gcs_uri = 'gs://video_intel/*'
output_uri = 'gs://video_intel/response2.json'
operation = video_client.annotate_video(input_uri=gcs_uri, features=features, output_uri=output_uri)
print("\nProcessing video for object annotations.")
result = operation.result(timeout=300)
print("\nFinished processing.\n")
# The first result is retrieved because a single video was processed.
object_annotations = result.annotation_results[0].object_annotations
for object_annotation in object_annotations:
print("Entity description: {}".format(object_annotation.entity.description))
if object_annotation.entity.entity_id:
print("Entity id: {}".format(object_annotation.entity.entity_id))
print(
"Segment: {}s to {}s".format(
object_annotation.segment.start_time_offset.seconds
+ object_annotation.segment.start_time_offset.nanos / 1e9,
object_annotation.segment.end_time_offset.seconds
+ object_annotation.segment.end_time_offset.nanos / 1e9,
)
)
print("Confidence: {}".format(object_annotation.confidence))
# Here we print only the bounding box of the first frame in the segment
frame = object_annotation.frames[0]
box = frame.normalized_bounding_box
print(
"Time offset of the first frame: {}s".format(
frame.time_offset.seconds + frame.time_offset.nanos / 1e9
)
)
print("Bounding box position:")
print("\tleft : {}".format(box.left))
print("\ttop : {}".format(box.top))
print("\tright : {}".format(box.right))
print("\tbottom: {}".format(box.bottom))
print("\n")
please modify 'gcs_uri = 'gs://video_intel/'' to gcs_uri = 'gs://video_intel/.*'

PowerBI Query WebMethod.Post returns Expression.Error: We cannot convert the value "POST" to type Function

I'm using a website that requires that their API key AND query data be submitted using Webform.Post method. I'm able to get this to work in Python, C# and I'm even able to construct and execute a cURL command which returns a usable JSON file that Excel can parse. I am also using Postman to validate my parameters and everything looks good using all these methods. However, my goal is to build a query form that I can use within Excel but I can't get past this query syntax in PowerBi Query.
For now I am doing a simple query. That query looks like this:
let
url_1 = "https://api.[SomeWebSite].com/api/v1.0/search/keyword?apiKey=blah-blah-blah",
Body_1 = {
"SearchByKeywordRequest:
{
""keyword"": ""Hex Nuts"",
""records"": 0,
""startingRecord"": 0,
""searchOptions"": Null.Type,
""searchWithYourSignUpLanguage"": Null.Type
}"
},
Source = WebMethod.Post(url_1,Body_1)
in
Source
ScreenSnip showing valid syntax
It generates the following error:
Expression.Error: We cannot convert the value "POST" to type Function.
Details:
Value=POST
Type=[Type]
ScreenSnip of Error as it appears in PowerQuery Advanced Editor
I've spend the better part of the last two days trying to find either some example using this method or documentation. The Microsoft documentation simple states the follow:
WebMethod.Post
04/15/2018
2 minutes to read
About
Specifies the POST method for HTTP.
https://learn.microsoft.com/en-us/powerquery-m/webmethod-post
This is of no help and the only posts I have found so far criticize the poster for not using GET versus POST. I would do this but it is NOT supported by the website I'm using. If someone could just please either point me to a document which explains what I am doing wrong or suggest a solution, I would be grateful.
WebMethod.Post is not a function. It is a constant text value "POST". You can send POST request with either Web.Contents or WebAction.Request function.
A simple example that posts JSON and receives JSON:
let
url = "https://example.com/api/v1.0/some-resource-path",
headers = [#"Content-Type" = "application/json"],
body = Json.FromValue([Foo = 123]),
source = Json.Document(Web.Contents(url, [Headers = headers, Content = body])),
...
Added Nov 14, 19
Request body needs to be a binary type, and included as Content field of the second parameter of Web.Contents function.
You can construct a binary JSON value using Json.FromValue function. Conversely, you can convert a binary JSON value to a corresponding M type using Json.Document function.
Note {} is list type in M language, which is similar to JSON array. [] is record type, which is similar to JSON object.
With that said, your query should be something like this,
let
url_1 = "https://api.[SomeWebSite].com/api/v1.0/search/keyword?apiKey=blah-blah-blah",
Body_1 = Json.FromValue([
SearchByKeywordRequest = [
keyword = "Hex Nuts",
records = 0,
startingRecord = 0,
searchOptions = null,
searchWithYourSignUpLanguage = null
]
]),
headers = [#"Content-Type" = "application/json"],
source = Json.Document(Web.Contents(url_1, [Headers = headers, Content = Body_1])),
...
References:
Web.Contents (https://learn.microsoft.com/en-us/powerquery-m/web-contents)
Json.FromValue (https://learn.microsoft.com/en-us/powerquery-m/json-fromvalue)
Json.Document (https://learn.microsoft.com/en-us/powerquery-m/json-document)

How can I fix my estimator classifier code

Hi I'm new to Tensorflow and I've been practicing with the tensorflow.estimator library. Basically I ran the inbuilt tf.estimator.DNNClassifier algorithm below
import tensorflow as tf
def train_input_fn(features, labels, batch_size):
"""An input function for training"""
# Convert the inputs to a Dataset.
dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
# Shuffle, repeat, and batch the examples.
return dataset.shuffle(1000).repeat().batch(batch_size)
# Feature columns describe how to use the input.
my_feature_columns = []
for key in landmark_features.keys():
my_feature_columns.append(tf.feature_column.numeric_column(key=key))
# Build a DNN with 2 hidden layers and 10 nodes in each hidden layer.
classifier = tf.estimator.DNNClassifier(feature_columns=my_feature_columns, hidden_units=[10, 10],n_classes=10)
dataset = train_input_fn(landmark_features, emotion_labels, batch_size = 1375 )
However I keep getting the following error:
INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpc_tag0rc
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpc_tag0rc', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
rewrite_options {
meta_optimizer_iterations: ONE
}
}
Any idea on what I can do to fix my code ?

How to pass base64 encoded image to Tensorflow prediction?

I have a google-cloud-ml model that I can run prediction by passing a 3 dimensional array of float32...
{ 'instances' [ { 'input' : '[ [ [ 0.0 ], [ 0.5 ], [ 0.8 ] ] ... ] ]' } ] }
However this is not an efficient format to transmit images, so I'd like to pass base64 encoded png or jpeg. This document talks about doing that, but what is not clear is what the entire json object looks like. Does the { 'b64' : 'x0welkja...' } go in place of the '[ [ [ 0.0 ], [ 0.5 ], [ 0.8 ] ] ... ] ]', leaving the enclosing 'instances' and 'input' the same? Or some other structure? Or does the tensorflow model have to be trained on base64?
The TensorFlow model does not have to be trained on base64 data. Leave your training graph as is. However, when exporting the model, you'll need to export a model that can accept PNG or jpeg (or possibly raw, if it's small) data. Then, when you export the model, you'll need to be sure to use a name for the output that ends in _bytes. This signals to CloudML Engine that you will be sending base64 encoded data. Putting it all together would like something like this:
from tensorflow.contrib.saved_model.python.saved_model import utils
# Shape of [None] means we can have a batch of images.
image = tf.placeholder(shape = [None], dtype = tf.string)
# Decode the image.
decoded = tf.image.decode_jpeg(image, channels=3)
# Do the rest of the processing.
scores = build_model(decoded)
# The input name needs to have "_bytes" suffix.
inputs = { 'image_bytes': image }
outputs = { 'scores': scores }
utils.simple_save(session, export_dir, inputs, outputs)
The request you send will look something like this:
{
"instances": [{
"b64": "x0welkja..."
}]
}
If you just want an efficient way to send images to a model (and not necessarily base-64 encode it), I would suggest uploading your images(s) to Google Cloud Storage and then having your model read off GCS. This way, you are not limited by image size and you can take advantage of multi-part, multithreaded, resumable uploads etc. that the GCS API provides.
TensorFlow's tf.read_file will directly off GCS. Here's an example of a serving input_fn that will do this. Your request to CMLE would send it an image URL (gs://bucket/some/path/to/image.jpg)
def read_and_preprocess(filename, augment=False):
# decode the image file starting from the filename
# end up with pixel values that are in the -1, 1 range
image_contents = tf.read_file(filename)
image = tf.image.decode_jpeg(image_contents, channels=NUM_CHANNELS)
image = tf.image.convert_image_dtype(image, dtype=tf.float32) # 0-1
image = tf.expand_dims(image, 0) # resize_bilinear needs batches
image = tf.image.resize_bilinear(image, [HEIGHT, WIDTH], align_corners=False)
#image = tf.image.per_image_whitening(image) # useful if mean not important
image = tf.subtract(image, 0.5)
image = tf.multiply(image, 2.0) # -1 to 1
return image
def serving_input_fn():
inputs = {'imageurl': tf.placeholder(tf.string, shape=())}
filename = tf.squeeze(inputs['imageurl']) # make it a scalar
image = read_and_preprocess(filename)
# make the outer dimension unknown (and not 1)
image = tf.placeholder_with_default(image, shape=[None, HEIGHT, WIDTH, NUM_CHANNELS])
features = {'image' : image}
return tf.estimator.export.ServingInputReceiver(features, inputs)
Your training code will train off actual images, just as in rhaertel80's suggestion above. See https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/courses/machine_learning/deepdive/08_image/flowersmodel/trainer/task.py#L27 for what the training/evaluation input functions would look like.
I was trying to use #Lak's answer (thanks Lak) to get online predictions for multiple instances in one json file, but kept getting the following error (I had two instances in my test json, hence the shape [2]):
input filename tensor must be scalar but had shape [2]
The problem is that ML engine apparently batches all the instances together and passes them to the serving inpur receiver function, but #Lak's sample code assumes the input is a single instance (it indeed works fine if you have a single instance in your json). I altered the code so that it can process a batch of inputs. I hope it will help someone:
def read_and_preprocess(filename):
image_contents = tf.read_file(filename)
image = tf.image.decode_image(image_contents, channels=NUM_CHANNELS)
image = tf.image.convert_image_dtype(image, dtype=tf.float32) # 0-1
return image
def serving_input_fn():
inputs = {'imageurl': tf.placeholder(tf.string, shape=(None))}
filename = inputs['imageurl']
image = tf.map_fn(read_and_preprocess, filename, dtype=tf.float32)
# make the outer dimension unknown (and not 1)
image = tf.placeholder_with_default(image, shape=[None, HEIGHT, WIDTH, NUM_CHANNELS])
features = {'image': image}
return tf.estimator.export.ServingInputReceiver(features, inputs)
The key changes are that 1) you don't squeeze the input tensor (that would cause trouble in the special case when your json contains only one instance) and, 2) use tf.map_fn to apply the read_and_preprocess function to a batch of input image urls.

tensorboard embeddings show no data

I am rying to use tensorboard embeddings page in order to visualize the results of word2vec. After debugging, digging of lots of codes i came to a point that tensorboard successfully runs, reads the confguration file, reads the tsv files but now the embeddings page does not show data.
( the page is opened , i can see the menus , items etc) this is my config file:
embeddings {
tensor_name: 'word_embedding'
metadata_path: 'c:\data\metadata.tsv'
tensor_path: 'c:\data\tensors2.tsv'
}
What can be the problem?
The tensor file originally is 1gb. in size, if i try that file , the app crashes becasue of the memory. So i copy and paste 1 or 2 pages of the original file into tensor2.tsv and use this file. May be this is the problem. May be i need to create more data by copy/ paste.
thx
tolga
Try following code snippet to get visualized word embedding in tensorboard. Open tensorboard with logdir, check localhost:6006 for viewing your embedding.
tensorboard --logdir="visual/1"
# code
fname = "word2vec_model_1000"
model = gensim.models.keyedvectors.KeyedVectors.load(fname)
# project part of vocab, max of 100 dimension
max = 1000
w2v = np.zeros((max,100))
with open("prefix_metadata.tsv", 'w+') as file_metadata:
for i,word in enumerate(model.wv.index2word[:max]):
w2v[i] = model.wv[word]
file_metadata.write(word + '\n')
# define the model without training
sess = tf.InteractiveSession()
with tf.device("/cpu:0"):
embedding = tf.Variable(w2v, trainable=False, name='prefix_embedding')
tf.global_variables_initializer().run()
path = 'visual/1'
saver = tf.train.Saver()
writer = tf.summary.FileWriter(path, sess.graph)
# adding into projector
config = projector.ProjectorConfig()
embed= config.embeddings.add()
embed.tensor_name = 'prefix_embedding'
embed.metadata_path = 'prefix_metadata.tsv'
# Specify the width and height of a single thumbnail.
projector.visualize_embeddings(writer, config)
saver.save(sess, path+'/prefix_model.ckpt', global_step=max)