Can you start AI platform jobs from HTTP requests? - google-cloud-platform

I have a web app (react + node.js) running on App Engine.
I would like to kick off, from this web app, a machine learning job that requires a GPU (running in a container on AI Platform, or running on GKE using a GPU node pool as in this tutorial, but we are open to other solutions).
I was thinking of trying what is described at the end of this answer, basically making an HTTP request to start the job using the projects.jobs.create API.
More details on the ML job in case this is useful: it generates an output every second that is stored on Cloud Storage and then read in the web app.
I am looking for examples of how to set this up. Where would the job configuration live, and how should I set up the API call to kick off that job? Are there other ways to achieve the same result?
Thank you in advance!

On Google Cloud, everything is an API, and you can interact with every product through HTTP requests, so you can definitely achieve what you want.
I don't have an example at hand, but essentially you build a JSON job description and POST it to the API.
Don't forget that when you call a Google Cloud API, you have to include an access token in the Authorization: Bearer header.
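For example, from Node.js you could fetch a token with google-auth-library and POST the job description to the projects.jobs.create endpoint. This is only a rough sketch assuming Node 18+ (for the global fetch) and Application Default Credentials on App Engine; the project ID and the job spec are placeholders:

const { GoogleAuth } = require('google-auth-library');

async function createTrainingJob(jobSpec) {
  // Application Default Credentials pick up the App Engine service account
  const auth = new GoogleAuth({
    scopes: ['https://www.googleapis.com/auth/cloud-platform'],
  });
  const client = await auth.getClient();
  const { token } = await client.getAccessToken();

  // projects.jobs.create: POST https://ml.googleapis.com/v1/projects/{project}/jobs
  const response = await fetch(
    `https://ml.googleapis.com/v1/projects/${process.env.PROJECT_ID}/jobs`,
    {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${token}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(jobSpec),
    }
  );
  return response.json();
}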
Where should your job configuration description live? It depends...
If it is strongly related to your App Engine app, you can add it to the App Engine code itself and have it "hard coded". The downside of that option is that any time you update the configuration, you have to deploy a new App Engine version. But if your new version isn't correct, rolling back to a previous, stable version is easy and consistent.
If you prefer to update your config file independently of your App Engine code, you can store the config outside of the App Engine code, on Cloud Storage for instance. That way, updating is simple and easy: change the config file on Cloud Storage to change the job configuration. However, there is no longer a link between the App Engine version and the config version, and rolling back to a stable version can be more difficult.
You can also have a combination of both, where you have a default job configuration in your App Engine code and an environment variable that can optionally point to a Cloud Storage file containing a newer version of the configuration.
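A minimal sketch of that combined approach, assuming the @google-cloud/storage client; the environment variable name, the bucket path and the default config below are placeholders I made up:

const { Storage } = require('@google-cloud/storage');

// Hard-coded fallback used when no override is configured (placeholder values)
const DEFAULT_JOB_CONFIG = {
  jobId: 'my-default-job',
  trainingInput: { scaleTier: 'BASIC_GPU', region: 'us-central1' },
};

async function loadJobConfig() {
  const uri = process.env.JOB_CONFIG_GCS_URI; // e.g. gs://my-config-bucket/job-config.json
  if (!uri) return DEFAULT_JOB_CONFIG;

  // Download and parse the override stored on Cloud Storage
  const [, bucket, file] = uri.match(/^gs:\/\/([^/]+)\/(.+)$/);
  const storage = new Storage();
  const [contents] = await storage.bucket(bucket).file(file).download();
  return JSON.parse(contents.toString('utf8'));
}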
I don't know if this answers all your questions. Don't hesitate to comment if you want more details on some parts.

As mentioned, you can use the AI Platform API to create a job via a POST request.
Below is an example using JavaScript and the request library to trigger a job.
Some useful tips:
Use the Jobs console to create a job manually, then use the API to list that job; this gives you a perfect JSON example of how to trigger it.
You can use the Try this API tool to get the JSON output of the manually created job. Use this path to get the job: projects/<project name>/jobs/<job name>.
Get the authorization token using the OAuth 2.0 Playground for testing purposes (Step 2 -> Access token:). Check the docs for the definitive way to obtain one in production.
Not all parameters are required in the JSON; this is just one example of a job that I created, whose JSON I retrieved using the steps above.
JS Example:
// Requires the (deprecated but still functional) "request" package: npm install request
var request = require('request');

request({
  url: 'https://content-ml.googleapis.com/v1/projects/<project-name>/jobs?alt=json',
  method: 'POST',
  // Replace with a valid access token (see the OAuth 2.0 Playground tip above)
  headers: {"authorization": "Bearer ya29.A0AR9999999999999999999999999"},
  json: {
    "jobId": "<job name>",
    "trainingInput": {
      "scaleTier": "CUSTOM",
      "masterType": "standard",
      "workerType": "cloud_tpu",
      "workerCount": "1",
      "args": [
        "--training_data_path=gs://<bucket>/*.jpg",
        "--validation_data_path=gs://<bucket>/*.jpg",
        "--num_classes=2",
        "--max_steps=2",
        "--train_batch_size=64",
        "--num_eval_images=10",
        "--model_type=efficientnet-b0",
        "--label_smoothing=0.1",
        "--weight_decay=0.0001",
        "--warmup_learning_rate=0.0001",
        "--initial_learning_rate=0.0001",
        "--learning_rate_decay_type=cosine",
        "--optimizer_type=momentum",
        "--optimizer_arguments=momentum=0.9"
      ],
      "region": "us-central1",
      "jobDir": "gs://<bucket>",
      "masterConfig": {
        "imageUri": "gcr.io/cloud-ml-algos/image_classification:latest"
      }
    },
    "trainingOutput": {
      "consumedMLUnits": 1.59,
      "isBuiltInAlgorithmJob": true,
      "builtInAlgorithmOutput": {
        "framework": "TENSORFLOW",
        "runtimeVersion": "1.15",
        "pythonVersion": "3.7"
      }
    }
  }
}, function(error, response, body){
  console.log(body);
});
Result:
...
{
  createTime: '2022-02-09T17:36:42Z',
  state: 'QUEUED',
  trainingOutput: {
    isBuiltInAlgorithmJob: true,
    builtInAlgorithmOutput: {
      framework: 'TENSORFLOW',
      runtimeVersion: '1.15',
      pythonVersion: '3.7'
    }
  },
  etag: '999999aaaac='
}

Thank you everyone for the input. This was useful to help me resolve my issue, but I wanted to also share the approach I ended up taking:
I started by making sure I could kick off my job manually.
I used this tutorial with a config.yaml file that looked like this:
workerPoolSpecs:
  machineSpec:
    machineType: n1-standard-4
    acceleratorType: NVIDIA_TESLA_T4
    acceleratorCount: 1
  replicaCount: 1
  containerSpec:
    imageUri: <Replace this with your container image URI>
    args: ["--some=argument"]
When I had a job that could be kicked off manually, I switched to using the Vertex AI Node.js API to start or cancel the job (the client library also exists in other languages).
I know my original question was about HTTP requests, but having a client library in my own language was a lot easier for me, in particular because I didn't have to worry about authentication. A rough sketch of what that looks like is below.
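This sketch uses the @google-cloud/aiplatform client; the project ID, region, display name and image URI are placeholders, and the machine spec mirrors the config.yaml above:

const { JobServiceClient } = require('@google-cloud/aiplatform');

const region = 'us-central1';
const client = new JobServiceClient({
  apiEndpoint: `${region}-aiplatform.googleapis.com`,
});

async function startCustomJob(projectId) {
  const [job] = await client.createCustomJob({
    parent: `projects/${projectId}/locations/${region}`,
    customJob: {
      displayName: 'my-gpu-job',
      jobSpec: {
        workerPoolSpecs: [
          {
            machineSpec: {
              machineType: 'n1-standard-4',
              acceleratorType: 'NVIDIA_TESLA_T4',
              acceleratorCount: 1,
            },
            replicaCount: 1,
            containerSpec: {
              imageUri: 'gcr.io/my-project/my-training-image:latest',
              args: ['--some=argument'],
            },
          },
        ],
      },
    },
  });
  // job.name can later be passed to client.cancelCustomJob({ name: job.name })
  return job;
}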
I hope that is useful; happy to provide more details if needed.

Related

Consuming APIs using AWS Lambda

I am a newcomer to AWS with very little cloud experience. The project I have is to call and consume an API from NOAA, then parse the returned XML document and save it to a database. I have an ASP.NET console app that is able to do this pretty easily and successfully. However, I need to do the same thing, but in the cloud on a serverless architecture. Here are the steps I am wanting it to take:
Lambda calls the API at NOAA every day at midnight
the API returns an XML doc with results
Parse the data and save the data to a cloud PostgreSQL database
It sounds simple, but I am having one heck of a time figuring out how to do this. I have a database provisioned from AWS, as that is where data currently goes through my console app. Does anyone have any advice or a resource I could look at? Also, I would prefer to keep this in .NET, but realize that I may need to move it to Python.
Thanks in advance everyone!
It's pretty simple, and you can test your code with the simple Python Lambda code below.
Create a new Lambda function with admin access (temporarily set an Admin role; later you can scope it down to the required role).
Add the following code
https://github.com/mmakadiya/public_files/blob/main/lambda_call_get_rest_api.py
import json
import urllib3

def lambda_handler(event, context):
    # TODO implement
    http = urllib3.PoolManager()
    r = http.request('GET', 'http://api.open-notify.org/astros.json')
    print(r.data)
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
The code above calls a REST API to fetch data. It is just a sample program to help you get started.
MAKE SURE to keep in mind that a Lambda function's maximum run time is 15 minutes; it cannot run longer than that, so plan accordingly.

Editing API Documentation in CDK

In the AWS Documentation for API Gateway, there are ways to edit your API documentation in the console. Working in CDK, I can't find any way to achieve the same thing. The goal is to create the exact same outputs.
Question 1:
See the API Gateway documentation in the console. This shows how you can edit pretty much everything you need to get nice headings and so on in your Swagger / Redoc outputs. But I can't find any way of inserting chunks of YAML / JSON into the docs in CDK.
Question 2:
Is it possible to prevent your exported OAS file from including all of the OPTIONS methods? I want to automate the process of updating the API docs after cdk deploy, so it should be done as part of the code.
Question 3:
How can you add tags to break your API into logical groupings? Again, this is something that is very useful in standard API documentation, but I can't find the related section anywhere in CDK.
Really, I think AWS could knock up a short petstore example to help us all out. If I get it working, perhaps I'll come back here and post up one of my own with notes.
Question 1 & Question 3:
import * as apigateway from 'aws-cdk-lib/aws-apigateway';

// https://docs.aws.amazon.com/apigateway/latest/api/API_DocumentationPart.html
new apigateway.CfnDocumentationPart(this, 'GetDocumentationPart', {
  location: {
    method: 'GET',
    path: '/my/path',
    type: 'METHOD',
  },
  properties: `{
    "tags": ["example"],
    "description": "This is a description of the method."
  }`,
  restApiId: 'api-id',
});

// https://docs.aws.amazon.com/apigateway/latest/api/API_DocumentationVersion.html
new apigateway.CfnDocumentationVersion(this, 'DocumentationVersion', {
  documentationVersion: 'generate-version-id',
  restApiId: 'api-id',
});
Notice that you need to publish a documentation version for the changes to take effect. For the version value, I suggest generating a UUID.
With the example above, GET /my/path will be grouped under the "example" tag in a Swagger UI.
Question 2:
No, it is not possible.
I solved it by creating a Lambda function that listens to the API Gateway deployment event, fetches the exported JSON file from API Gateway via the AWS SDK, parses it to remove the unwanted paths, and stores it in an S3 bucket. A rough sketch of that function is below.
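This sketch uses the AWS SDK v3; the environment variable names and the S3 key are placeholders, and the event wiring (triggering it after deployment) is omitted:

const { APIGatewayClient, GetExportCommand } = require('@aws-sdk/client-api-gateway');
const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');

const apigw = new APIGatewayClient({});
const s3 = new S3Client({});

exports.handler = async () => {
  // Export the deployed stage as an OpenAPI 3.0 document
  const exported = await apigw.send(new GetExportCommand({
    restApiId: process.env.REST_API_ID,
    stageName: process.env.STAGE_NAME,
    exportType: 'oas30',
    accepts: 'application/json',
  }));
  const spec = JSON.parse(Buffer.from(exported.body).toString('utf8'));

  // Strip the CORS OPTIONS methods from every path
  for (const path of Object.values(spec.paths || {})) {
    delete path.options;
  }

  // Store the cleaned spec for the docs site
  await s3.send(new PutObjectCommand({
    Bucket: process.env.DOCS_BUCKET,
    Key: 'openapi.json',
    Body: JSON.stringify(spec, null, 2),
    ContentType: 'application/json',
  }));
};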

Gatsby GDPR cookies plugin not working - Google analytics script is missing in <head> section

I spent a lot of time troubleshooting my problem. I want to check the value of my own cookie using gatsby-plugin-gdpr-cookies. I hardcoded the value of gatsby-gdpr-google-analytics to always be true to check whether the plugin is working correctly. The problem is that my website still doesn't have the <script> for Google Analytics in the <head> section. Any ideas how to solve this? I tried moving the plugin to different levels in gatsby-config.js, but nothing worked.
This is my configuration:
plugins: [
  {
    resolve: `gatsby-plugin-gdpr-cookies`,
    options: {
      googleAnalytics: {
        trackingId: 'MY_TRACKING_ID',
        cookieName: 'gatsby-gdpr-google-analytics',
        anonymize: true,
        allowAdFeatures: false
      }
    },
    environments: ['production', 'development']
  },
Google recently made some changes to their Analytics API (migrating to v4), and it seems that some of the plugins haven't been updated yet, so they won't work. You will notice this because the trackingId format has changed.
You can follow the stack trace in this GitHub thread (from the plugin's owner).
At this point, there are a few things you can try:
Upgrade the dependency, keeping in mind that it may not work yet: npm upgrade gatsby-plugin-gdpr-cookies or yarn upgrade gatsby-plugin-gdpr-cookies
Switch to the old analytics tracking: set it in your Analytics dashboard, get the tracking identifier, and use it in:
trackingId: 'OLD_TRACKING_ID',
Switch to another React-based approach using other dependencies that support v4 tracking, such as ga-4-react. You'll need to customize your <head> tag using the <Helmet> component to add the tracking identifier. To avoid multiple instances of the same module, you may need to use one of the gatsby-browser.js APIs; onClientEntry or onInitialClientRender should fit your use case:
exports.onClientEntry = () => {
  yourAnalyticsFunction()
}

Google Dataprep copy flows from one project to another

I have two Google Cloud projects: dev and prod. I also import data from different storage buckets located in these projects: dev-bucket and prod-bucket.
After I have made and tested changes in the dev environment, how can I smoothly apply (deploy/copy) the changes to prod as well?
What I do now is export the flow from dev and then re-import it into prod. However, each time I need to manually do the following in the prod flows:
Change the datasets that serve as inputs in the flow
Replace the manual and scheduled destinations with the right BigQuery dataset (dev-dataset-bigquery and prod-dataset-bigquery)
How can this be done more smoothly?
If you want to copy data between Google Cloud Storage (GCS) buckets dev-bucket and prod-bucket, Google provides a Storage Transfer Service with this functionality. https://cloud.google.com/storage-transfer/docs/create-manage-transfer-console You can either manually trigger data to be copied from one bucket to another or have it run on a schedule.
For the second part, it sounds like both dev-dataset-bigquery and prod-dataset-bigquery are loaded from files in GCS? If this is the case, the BigQuery Transfer Service may be of use. https://cloud.google.com/bigquery/docs/cloud-storage-transfer You can trigger a transfer job manually, or have it run on a schedule.
As others have said in the comments, if you need to verify data before initiating transfers from dev to prod, a CI system such as Spinnaker may help. If the verification can be automated, a system such as Apache Airflow (running on Cloud Composer, if you want a hosted version) provides more flexibility than the transfer services.
Follow the procedure below to move a flow from one environment to another using the API, and to update the dataset and the output for the new environment.
1) Export a plan

GET https://api.clouddataprep.com/v4/plans/<plan_id>/package

2) Import the plan

POST https://api.clouddataprep.com/v4/plans/package

3) Update the input dataset

PUT https://api.clouddataprep.com/v4/importedDatasets/<dataset_id>

{
  "name": "<new_dataset_name>",
  "bucket": "<bucket_name>",
  "path": "<bucket_file_name>"
}

4) Update the output

PATCH https://api.clouddataprep.com/v4/outputObjects/<output_id>

{
  "publications": [
    {
      "path": [
        "<project_name>",
        "<dataset_name>"
      ],
      "tableName": "<table_name>",
      "targetType": "bigquery",
      "action": "create"
    }
  ]
}
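If you want to script steps 3 and 4, a rough Node.js sketch (Node 18+ global fetch) could look like the following. The Bearer token header, the environment variable name, and all IDs, bucket, project and table names are assumptions/placeholders; check the Dataprep API docs for the exact authentication scheme.

const BASE = 'https://api.clouddataprep.com/v4';
const headers = {
  // Assumption: the Dataprep access token is passed as a Bearer token
  Authorization: `Bearer ${process.env.DATAPREP_TOKEN}`,
  'Content-Type': 'application/json',
};

async function retargetToProd(datasetId, outputId) {
  // 3) Point the imported dataset at the prod bucket (placeholder names)
  await fetch(`${BASE}/importedDatasets/${datasetId}`, {
    method: 'PUT',
    headers,
    body: JSON.stringify({
      name: 'prod-input',
      bucket: 'prod-bucket',
      path: 'input/data.csv',
    }),
  });

  // 4) Point the output at the prod BigQuery dataset (placeholder names)
  await fetch(`${BASE}/outputObjects/${outputId}`, {
    method: 'PATCH',
    headers,
    body: JSON.stringify({
      publications: [{
        path: ['my-prod-project', 'prod-dataset-bigquery'],
        tableName: 'my_table',
        targetType: 'bigquery',
        action: 'create',
      }],
    }),
  });
}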

Internationalization on serverless backend (AWS)

I'm building a serverless node.js web app on AWS (using Serverless Framework) and trying to implement internationalization on the backend (API Gateway/Lambda/DynamoDB).
For the front end (React), I use Redux to store the selected language and react-intl to switch between multiple languages. For the backend, what's the best way to implement internationalization?
Here are two ways I can think of, but there must be better ones.
A. Translate on the backend (Get language from path parameter)
path: {language}/validate
validate.js
export function main(event, context, callback) {
  const language = event.pathParameters.language;
  const data = JSON.parse(event.body);
  callback(null, validate(language, data));
}
This way, I need to pass the language as a function parameter everywhere, which is not desirable.
B. Translate on front-end (i18n, react-intl)
backend hello.js response
{
  id: "samplePage.message.hello",
  defaultMessage: `Hello, ${name}`,
  values: { name }
}
frontend hello.js
<FormattedMessage {...response} />
ja.json (translation file for i18n)
{
  "samplePage.message.hello": "こんにちは、{name}。"
}
This way, it looks like everything works fine without any trouble, but am I missing anything?
We do the same as you suggest in B)... basically, we have our backend on AWS Lambda and access data from DynamoDB.
All our translation happens in the frontend. The only difference is that we use i18next (more specifically react-i18next, but it makes little difference whether you use this or react-intl; i18next just offers a few more backends, caching, language detection, ... https://www.i18next.com/). A minimal sketch of that setup is below.
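This is a hypothetical sketch of the frontend-only approach with react-i18next; the key and translation mirror option B above (note that i18next uses {{name}} interpolation rather than react-intl's {name}), and the inline resources are just for illustration:

import React from 'react';
import i18n from 'i18next';
import { initReactI18next, useTranslation } from 'react-i18next';

// Register the translations once at startup; in practice they would be
// loaded from a backend (e.g. locize) rather than inlined.
i18n.use(initReactI18next).init({
  lng: 'ja',
  resources: {
    ja: {
      translation: {
        'samplePage.message.hello': 'こんにちは、{{name}}。',
      },
    },
  },
});

// The backend only returns the message id and values; the frontend resolves the text.
function Hello({ response }) {
  const { t } = useTranslation();
  return <p>{t(response.id, response.values)}</p>;
}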
If you'd like to learn more or see it in action, check out https://locize.com (or try it directly at https://www.locize.io/, 14-day free trial). While the app is currently only available in English, all the texts come in via XHR loading and are applied at runtime (i18n).
If you're interested in how we use serverless at locize.com, see the following slides from a talk we gave last year: https://blog.locize.com/2017-06-22-how-locize-leverages-serverless/
Last but not least, if you'd like to get the most out of your ICU messages (validation, syntax highlighting, proper plural conversion, and machine translation that doesn't destroy the ICU DSL during MT), just give our service a try; it comes with a 14-day free trial.