Does CMLE provide a REST API endpoint for Prediction?

Is there a way I can access a REST API endpoint for a Model created by Cloud ML Engine? I only see:
gcloud ml-engine jobs submit prediction $JOB_NAME \
    --model census \
    --version v1 \
    --data-format TEXT \
    --region $REGION \
    --runtime-version 1.10 \
    --input-paths gs://cloud-samples-data/ml-engine/testdata/prediction/census.json \
    --output-path $GCS_JOB_DIR/predictions

Yes, in fact there are two APIs available to do this.
The projects.predict call is the simplest method. You pass in a request as described here, and it returns with the result. Unlike your gcloud command, it cannot take its input from GCS.
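For online prediction, here is a minimal Python sketch using the Google API client library; it is an illustration under assumptions (google-api-python-client installed, application-default credentials configured, and instance fields that match whatever your census model actually expects):
import googleapiclient.discovery

# Build a client for the Cloud ML Engine REST API.
ml = googleapiclient.discovery.build("ml", "v1")

# projects.predict sends instances inline and returns predictions synchronously.
name = "projects/YOUR_PROJECT_ID/models/census/versions/v1"  # hypothetical project ID
response = ml.projects().predict(
    name=name,
    body={"instances": [{"age": 25, "workclass": "Private"}]},  # example instances; the schema depends on your model
).execute()
print(response["predictions"])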
The projects.jobs.create call with the predictionInput and predictionOutput fields allows batch prediction, with input from GCS.
The equivalent of your command is:
POST https://ml.googleapis.com/v1/projects/$PROJECT_ID/jobs
{
  "jobId": "$JOB_NAME",
  "predictionInput": {
    "dataFormat": "TEXT",
    "inputPaths": "gs://cloud-samples-data/ml-engine/testdata/prediction/census.json",
    "region": "$REGION",
    "runtimeVersion": "1.10",
    "modelName": "projects/$PROJECT_ID/models/census"
  },
  "predictionOutput": {
    "outputPath": "$GCS_JOB_DIR/predictions"
  }
}
This call returns immediately; use projects.jobs.get to check for success or failure.
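As a sketch, the same batch prediction job can be submitted from Python with the Google API client library (again assuming google-api-python-client and application-default credentials; the project ID, bucket, and job ID below are placeholders):
import googleapiclient.discovery

project_id = "YOUR_PROJECT_ID"  # hypothetical project ID
job_spec = {
    "jobId": "census_batch_prediction_001",
    "predictionInput": {
        "dataFormat": "TEXT",
        "inputPaths": ["gs://cloud-samples-data/ml-engine/testdata/prediction/census.json"],
        "region": "us-central1",
        "runtimeVersion": "1.10",
        "modelName": "projects/{}/models/census".format(project_id),
    },
    "predictionOutput": {"outputPath": "gs://YOUR_BUCKET/predictions"},  # hypothetical bucket
}

ml = googleapiclient.discovery.build("ml", "v1")

# projects.jobs.create returns immediately; poll projects.jobs.get for the final state.
job = ml.projects().jobs().create(parent="projects/" + project_id, body=job_spec).execute()
print(job.get("state"))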

Related

Specify signature name on Vertex AI Predict

I've deployed a TensorFlow model on Vertex AI using TFX Pipelines. The model has custom serving signatures, but I'm struggling to specify the signature when predicting.
I have the exact same model deployed on GCP AI Platform, and there I'm able to specify it.
According to the vertex documentation, we must pass a dictionary containing the Instances (List) and the Parameters (Dict) values.
I've submitted these arguments to this function:
instances: [{"argument_n": "value"}]
parameters: {"signature_name": "name_of_signature"}
It doesn't work; it still uses the default signature of the model.
On GCP AI Platform, I've been able to predict by specifying the signature name directly in the body of the request:
response = service.projects().predict(
    name=name,
    body={"instances": instances,
          "signature_name": "name_of_signature"},
).execute()
EDIT:
I've discovered that it works with the rawPredict method from gcloud.
Here is an example:
!gcloud ai endpoints raw-predict {endpoint} --region=us-central1 \
    --request='{"signature_name": "name_of_the_signature", "instances": [{"instance_0": ["value_0"], "instance_1": ["value_1"]}]}'
Unfortunately, looking at the Google API client models code, it only has the predict method, not raw_predict, so I don't know if it's available through the Python SDK right now.
Vertex AI is a newer platform with limitations that will be improved over time. "signature_name" can be added to the HTTP JSON payload in a RawPredictRequest, or passed from gcloud as you have done, but right now it is not available in regular predict requests.
Using an HTTP JSON payload, for example input.json:
{
  "instances": [
    ["male", 29.8811345124283, 26.0, 1, "S", "New York, NY", 0, 0],
    ["female", 48.0, 39.6, 1, "C", "London / Paris", 0, 1]
  ],
  "signature_name": <string>
}
curl \
  -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/endpoints/${ENDPOINT_ID}:rawPredict \
  -d "@input.json"
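The same rawPredict call can be made from Python with the requests library. This is only a sketch under a few assumptions (google-auth and requests installed, application-default credentials configured; PROJECT_ID, ENDPOINT_ID, and the signature/instance names are placeholders):
import google.auth
import google.auth.transport.requests
import requests

# Obtain an OAuth2 access token from application-default credentials.
credentials, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

url = (
    "https://us-central1-aiplatform.googleapis.com/v1/"
    "projects/PROJECT_ID/locations/us-central1/endpoints/ENDPOINT_ID:rawPredict"
)
payload = {
    "signature_name": "name_of_the_signature",
    "instances": [{"instance_0": ["value_0"], "instance_1": ["value_1"]}],
}

response = requests.post(
    url,
    json=payload,
    headers={"Authorization": "Bearer " + credentials.token},
)
print(response.json())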

How do I set up my API to require an API key with Amazon API Gateway?

I have been following the advice in this post. I've created an API key on AWS and set my POST method to require an API key.
I have also setup a usage plan and linked that API key to it.
My API key is enabled.
When I test requests with Postman, my requests still go through without any additional headers.
I was expecting no requests to go through unless I included a header in my request like this: "x-api-key": "my_api_key"
Do I need to change the endpoint I send requests to in Postman for them to go through API Gateway?
If you need an API key to be required for each method, then "API Key Required" must be set to true on each of those methods.
Go to Resources --> select your resource and method, go to Method Request, and set "API Key Required" to true. Then redeploy the API to your stage so the change takes effect.
https://docs.aws.amazon.com/apigateway/latest/developerguide/how-to-use-postman-to-call-api.html
https://docs.aws.amazon.com/apigateway/latest/developerguide/welcome.html
https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-api-key-source.html
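Once the key requirement is deployed, a request without the x-api-key header should be rejected with 403 Forbidden. Here is a minimal sketch of the expected behaviour using Python's requests library (the invoke URL and key value below are hypothetical placeholders):
import requests

invoke_url = "https://abc123.execute-api.us-east-1.amazonaws.com/prod/your-resource"  # hypothetical invoke URL

# Without the key: API Gateway should reject the request with 403.
print(requests.post(invoke_url, json={"hello": "world"}).status_code)

# With the key in the x-api-key header: the request reaches the backend.
print(
    requests.post(
        invoke_url,
        json={"hello": "world"},
        headers={"x-api-key": "my_api_key"},
    ).status_code
)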
If you want, I've made the following script to enable the API key on every method of a given API. It requires the jq tool for JSON parsing.
You can find the script to enable the API key for all methods of an API Gateway API on this gist.
#!/bin/bash
api_gateway_method_enable_api_key() {
    local api_id=$1
    local method_id=$2
    local method=$3
    aws --profile "$profile" --region "$region" \
        apigateway update-method \
        --rest-api-id "$api_id" \
        --resource-id "$method_id" \
        --http-method "$method" \
        --patch-operations op="replace",path="/apiKeyRequired",value="true"
}

# change this to 1 in order to execute the update
do_update=0
profile=your_profile
region=us-east-1
id=your_api_id
tmp_file="/tmp/list_of_endpoint_and_methods.json"

aws --profile $profile --region $region \
    apigateway get-resources \
    --rest-api-id $id \
    --query 'items[?resourceMethods].{p:path,id:id,m:resourceMethods}' >"$tmp_file"

while read -r line; do
    path=$(jq -r '.p' <<<"$line")
    method_id=$(jq -r '.id' <<<"$line")
    echo "$path"
    # do not update OPTIONS method
    for method in GET POST PUT DELETE; do
        has_method=$(jq -r ".m.$method" <<<"$line")
        if [ "$has_method" != "null" ]; then
            if [ $do_update -eq 1 ]; then
                api_gateway_method_enable_api_key "$id" "$method_id" "$method"
                echo " $method method changed"
            else
                echo " $method method will be changed"
            fi
        fi
    done
done <<<"$(jq -c '.[]' "$tmp_file")"

AWS EMR Spark Step args bug

I'm submitting a Spark job to EMR via the AWS CLI; the EMR steps and Spark configs are provided as separate JSON files. For some reason the name of my main class gets passed to my Spark jar as an unnecessary command-line argument, resulting in a failed job.
AWSCLI command:
aws emr create-cluster \
    --name "Spark-Cluster" \
    --release-label emr-5.5.0 \
    --instance-groups \
        InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge \
        InstanceGroupType=CORE,InstanceCount=20,InstanceType=m3.xlarge \
    --applications Name=Spark \
    --use-default-roles \
    --configurations file://conf.json \
    --steps file://steps.json \
    --log-uri s3://blah/logs
The json file describing my EMR Step:
[
  {
    "Name": "RunEMRJob",
    "Jar": "s3://blah/blah.jar",
    "ActionOnFailure": "TERMINATE_CLUSTER",
    "Type": "CUSTOM_JAR",
    "MainClass": "blah.blah.MainClass",
    "Args": [
      "--arg1",
      "these",
      "--arg2",
      "get",
      "--arg3",
      "passed",
      "--arg4",
      "to",
      "--arg5",
      "spark",
      "--arg6",
      "main",
      "--arg7",
      "class"
    ]
  }
]
The argument parser in my main class throws an error (and prints the parameters provided):
Exception in thread "main" java.lang.IllegalArgumentException: One or more parameters are invalid or missing:
blah.blah.MainClass --arg1 these --arg2 get --arg3 passed --arg4 to --arg5 spark --arg6 main --arg7 class
So for some reason the main class that I define in steps.json leaks into my separately provided command line arguments.
What's up?
I misunderstood how EMR steps work. There were two options for resolving this:
I could use Type = "CUSTOM_JAR" with Jar = "command-runner.jar" and add a normal spark-submit call to Args (see the sketch after this list).
Using Type = "Spark" simply adds the "spark-submit" call as the first argument; one still needs to provide the master, the jar location, the main class, and so on.
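As an illustration of the first option, here is a hedged boto3 sketch that adds a command-runner.jar step whose Args are an ordinary spark-submit invocation; the cluster ID, deploy mode, jar path, class name, and arguments are placeholders based on the question:
import boto3

emr = boto3.client("emr", region_name="us-east-1")

emr.add_job_flow_steps(
    JobFlowId="j-XXXXXXXXXXXXX",  # hypothetical cluster ID
    Steps=[
        {
            "Name": "RunEMRJob",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                # command-runner.jar executes the command given in Args, so the
                # main class is passed via spark-submit's --class flag instead of MainClass.
                "Jar": "command-runner.jar",
                "Args": [
                    "spark-submit",
                    "--deploy-mode", "cluster",
                    "--class", "blah.blah.MainClass",
                    "s3://blah/blah.jar",
                    "--arg1", "these",
                    "--arg2", "get",
                ],
            },
        }
    ],
)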

Find Google Cloud Platform Operations Performed by a User

Is there a way to track what Google Cloud Platform operations were performed by a user? We want to audit our costs and track usage accordingly.
Edit: there's a Cloud SDK (gcloud) command:
compute operations list
that lists actions taken on Compute Engine instances. Is there a way to see what user performed these actions?
While you can't see a list of gcloud commands executed, you can see a list of API actions. The gcloud beta logging commands help with listing and reading logs, but they are a bit harder to use than the console. Try checking the logs in the Cloud Console.
If you wish to track only Google Cloud Platform (GCP) Compute Engine (GCE) operations with the list command for the operations subgroup, you are able to use the --filter flag to see operations performed by a given user $GCE_USER_NAME:
gcloud compute operations list \
--filter="user=$GCE_USER_NAME" \
--limit=1 \
--sort-by="~endTime"
#=>
NAME TYPE TARGET HTTP_STATUS STATUS TIMESTAMP
$GCP_COMPUTE_OPERATION_NAME start $GCP_COMPUTE_INSTANCE_NAME 200 DONE 1970-01-01T00:00:00.001-00:00
Note: feeding the string "~endTime" into the --sort-by flag puts the most recent GCE operation first.
It might help to retrieve the entire log object in JSON:
gcloud compute operations list \
--filter="user=$GCE_USER_NAME" \
--format=json \
--limit=1 \
--sort-by="~endTime"
#=>
[
{
"endTime": "1970-01-01T00:00:00.001-00:00",
. . .
"user": "$GCP_COMPUTE_USER"
}
]
or YAML:
gcloud compute operations list \
--filter="user=$GCE_USER_NAME" \
--format=yaml \
--limit=1 \
--sort-by="~endTime"
#=>
---
endTime: '1970-01-01T00:00:00.001-00:00'
. . .
user: $GCP_COMPUTE_USER
You are also able to use the Cloud SDK (gcloud) to explore all audit logs, not just audit logs for GCE; it is incredibly clunky, as the other existing answer points out. However, for anyone who wants to use gcloud instead of the console:
gcloud logging read \
'logName : "projects/$GCP_PROJECT_NAME/logs/cloudaudit.googleapis.com"
protoPayload.authenticationInfo.principalEmail="$GCE_USER_NAME"
severity>=NOTICE' \
--freshness="1d" \
--limit=1 \
--order="desc" \
--project=$GCP_PROJECT_NAME
#=>
---
insertId: . . .
. . .
protoPayload:
'#type': type.googleapis.com/google.cloud.audit.AuditLog
authenticationInfo:
principalEmail: $GCP_COMPUTE_USER
. . .
. . .
The read command defaults to YAML format, but you can also get your audit logs in JSON:
gcloud logging read \
'logName : "projects/$GCP_PROJECT_NAME/logs/cloudaudit.googleapis.com"
protoPayload.authenticationInfo.principalEmail="$GCE_USER_NAME"
severity>=NOTICE' \
--format=json \
--freshness="1d" \
--limit=1 \
--order="desc" \
--project=$GCP_PROJECT_NAME
#=>
[
{
. . .
"protoPayload": {
"#type": "type.googleapis.com/google.cloud.audit.AuditLog",
"authenticationInfo": {
"principalEmail": "$GCE_USER_NAME"
},
. . .
},
. . .
}
]
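If gcloud feels too clunky, roughly the same audit-log query can be run from Python with the google-cloud-logging client library. This is only a sketch under assumptions (library installed, application-default credentials configured; the project ID and user email are placeholders standing in for $GCP_PROJECT_NAME and $GCE_USER_NAME):
from google.cloud import logging

client = logging.Client(project="your-project-id")  # hypothetical project ID

# Same filter as the gcloud logging read examples above.
log_filter = (
    'logName : "projects/your-project-id/logs/cloudaudit.googleapis.com" '
    'protoPayload.authenticationInfo.principalEmail="user@example.com" '
    'severity>=NOTICE'
)

# Iterate newest-first and stop after the first matching audit-log entry.
for entry in client.list_entries(filter_=log_filter, order_by=logging.DESCENDING):
    print(entry.timestamp, entry.payload)
    break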

How to add an index from command line to DynamoDB after table was created

Could you please point me to the appropriate documentation topic or provide an example of how to add an index to DynamoDB? I couldn't find any related info.
According to this blog post: http://aws.amazon.com/blogs/aws/amazon-dynamodb-update-online-indexing-reserved-capacity-improvements/?sc_ichannel=em&sc_icountry=global&sc_icampaigntype=launch&sc_icampaign=em_130867660&sc_idetail=em_1273527421&ref_=pe_411040_130867660_15 it seems to be possible to do it with the UI, but there is no mention of how to do it with the CLI.
Thanks in advance,
Yevhenii
The aws command has help for every level of subcommand. For example, you can run aws help to get a list of all service names and discover the name dynamodb. Then you can aws dynamodb help to find the list of DDB commands and find that update-table is a likely culprit. Finally, aws dynamodb update-table help shows you the flags needed to add a global secondary index.
The AWS CLI documentation is really poor and lacks examples. Evidently AWS is promoting the SDK or the console.
This should work for updating the table:
aws dynamodb update-table --table-name Test \
    --attribute-definitions AttributeName=City,AttributeType=S AttributeName=State,AttributeType=S \
    --global-secondary-index-updates \
    'Create={IndexName=state-index,KeySchema=[{AttributeName=State,KeyType=HASH}],Projection={ProjectionType=INCLUDE,NonKeyAttributes=[City]},ProvisionedThroughput={ReadCapacityUnits=1,WriteCapacityUnits=1}}'
Here's a shell function that does this: it sets the read/write capacities and, if an index name is provided as the fourth argument, also updates that global secondary index's throughput.
dynamodb_set_caps() {
    # [ "$1" ] || fail_exit "Missing table name"
    # [ "$2" ] || fail_exit "Missing read capacity"
    # [ "$3" ] || fail_exit "Missing write capacity"
    if [ "$4" ]; then
        aws dynamodb update-table --region $region --table-name ${1} \
            --provisioned-throughput ReadCapacityUnits=${2},WriteCapacityUnits=${3} \
            --global-secondary-index-updates \
            "Update={IndexName=${4},ProvisionedThroughput={ReadCapacityUnits=${2},WriteCapacityUnits=${3}}}"
    else
        aws dynamodb update-table --region $region --table-name ${1} \
            --provisioned-throughput ReadCapacityUnits=${2},WriteCapacityUnits=${3}
    fi
}
Completely agree that the aws docs are lacking in this area
Here is reference for creating a global secondary index:
https://docs.aws.amazon.com/pt_br/amazondynamodb/latest/developerguide/getting-started-step-6.html
However, the example only covers creating an index for a simple (single-attribute) primary key.
This code helped me to create a global secondary index for a composite primary key:
aws dynamodb update-table \
    --table-name YourTableName \
    --attribute-definitions \
        AttributeName=GSI1PK,AttributeType=S \
        AttributeName=GSI1SK,AttributeType=S \
        AttributeName=createdAt,AttributeType=S \
    --global-secondary-index-updates \
        "[{\"Create\":{\"IndexName\":\"GSI1\",\"KeySchema\":[{\"AttributeName\":\"GSI1PK\",\"KeyType\":\"HASH\"},{\"AttributeName\":\"GSI1SK\",\"KeyType\":\"RANGE\"}],\"ProvisionedThroughput\":{\"ReadCapacityUnits\":5,\"WriteCapacityUnits\":5},\"Projection\":{\"ProjectionType\":\"ALL\"}}}]" \
    --endpoint-url http://localhost:8000
Note: the --endpoint-url option at the end assumes you are creating this index in a local DynamoDB instance. If that's not the case, just remove it.
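For anyone who prefers the SDK over the CLI, here is a rough boto3 sketch of the same index creation; it is only an illustration, reusing the placeholder table, index, and attribute names from the command above:
import boto3

# Add endpoint_url="http://localhost:8000" here when targeting a local DynamoDB instance.
dynamodb = boto3.client("dynamodb")

dynamodb.update_table(
    TableName="YourTableName",
    AttributeDefinitions=[
        {"AttributeName": "GSI1PK", "AttributeType": "S"},
        {"AttributeName": "GSI1SK", "AttributeType": "S"},
    ],
    GlobalSecondaryIndexUpdates=[
        {
            "Create": {
                "IndexName": "GSI1",
                "KeySchema": [
                    {"AttributeName": "GSI1PK", "KeyType": "HASH"},
                    {"AttributeName": "GSI1SK", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
                "ProvisionedThroughput": {
                    "ReadCapacityUnits": 5,
                    "WriteCapacityUnits": 5,
                },
            }
        }
    ],
)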