Unable to delete custom plugin from Data Fusion instance - google-cloud-platform

I tried uploading a custom JAR as a CDAP plugin, and it has a few errors in it. I want to delete that particular plugin and upload a new one. What is the process for that? I looked for documentation, but it was not very informative.
Thanks in advance!

You can click on the hamburger menu, and click on Control Center at the bottom of the left panel. In the Control Center, click on Filter by and select the checkbox for Artifacts. After that, you should see the artifact listed in the Control Center, where you can then delete it.
Alternatively, we suggest that while developing, you suffix the artifact version with -SNAPSHOT (e.g. 1.0.0-SNAPSHOT). Any -SNAPSHOT version can be overwritten simply by re-uploading, so you don't have to delete the artifact first before deploying a patched plugin JAR.

Each Data Fusion instance runs in a fully isolated GCP tenant project, where all orchestration actions, pipeline lifecycle management tasks and coordination are handled as part of the GCP-managed service. You can therefore perform user-defined actions either through the dedicated Data Fusion UI or by targeting the execution environment directly with CDAP REST API HTTP calls.
The Data Fusion UI is there to let you visually design data pipelines and control ETL processing through its various phases; the same operations can be performed by calling the corresponding CDAP APIs.
In the upstream CDAP documentation you can find the Artifact HTTP RESTful API, which offers a set of HTTP methods you can use to manage custom plugin (artifact) operations.
Following the GCP documentation, a few simple steps prepare the environment, setting the CDAP_ENDPOINT variable for the target Data Fusion instance so that you can issue HTTP calls against the CDAP endpoint, i.e.:
export INSTANCE_ID=your-instance-id
export CDAP_ENDPOINT=$(gcloud beta data-fusion instances describe \
--location=us-central1 \
--format="value(apiEndpoint)" \
${INSTANCE_ID})
export AUTH_TOKEN=$(gcloud auth print-access-token)
Once you are done with the above steps, you can issue the HTTP call for the specific action you need.
For plugin deletion, invoke the HTTP DELETE method:
curl -X DELETE -H "Authorization: Bearer ${AUTH_TOKEN}" "${CDAP_ENDPOINT}/v3/namespaces/system/artifacts/<artifact-name>/versions/<artifact-version>"
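If you are unsure of the exact artifact name or version to plug into that call, the same Artifact API can list what is currently deployed. Below is a minimal sketch, not part of the original answer, assuming Python with the requests library and the CDAP_ENDPOINT and AUTH_TOKEN variables exported above:

import os
import requests

# Reuse the variables exported in the shell steps above.
cdap_endpoint = os.environ["CDAP_ENDPOINT"]
auth_token = os.environ["AUTH_TOKEN"]

# List artifacts in the same namespace targeted by the DELETE call.
resp = requests.get(
    f"{cdap_endpoint}/v3/namespaces/system/artifacts",
    headers={"Authorization": f"Bearer {auth_token}"},
)
resp.raise_for_status()

for artifact in resp.json():
    print(artifact["name"], artifact["version"], artifact.get("scope"))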

Related

Google Cloud Run service deployment, is it the best direction in my situation?

I have some experience with Google Cloud Functions (CF). I tried to deploy a CF function recently with a Python app, but it uses an NLP model so the 8GB memory limit is exceeded when the model is triggered. The function is triggered when a JSON file is uploaded to a bucket.
So, I plan to try Google Cloud Run but I have no experience with it. Also, I am not completely sure if it is the best course of action.
If it is, what is the best way of implementing it, provided that the Run service will be triggered by a file uploaded to a bucket? In CF you can select the triggering event; in Run I didn't see anything like that. I could use some starting points, as I couldn't find my case in the GCP documentation.
Any help will be appreciated.
You can use at least these two things:
The legacy one: create a GCS notification to Pub/Sub, then create a push subscription and set the Cloud Run URL as the HTTP push destination (sketched below)
A more recent way is to use Eventarc to invoke a Cloud Run endpoint directly from an event (it creates roughly the same thing, a Pub/Sub topic and a push subscription, but it's fully configured for you)
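Here is a rough sketch of that legacy option, not a definitive implementation: it assumes the google-cloud-storage and google-cloud-pubsub client libraries, and the project, bucket, topic and service URL below are all hypothetical. It publishes a notification whenever an object is uploaded to the bucket and pushes each message to the Cloud Run URL.

from google.cloud import pubsub_v1, storage

project_id = "my-project"                          # hypothetical
bucket_name = "my-upload-bucket"                   # hypothetical
topic_id = "gcs-uploads"                           # hypothetical
run_url = "https://my-service-xyz-uc.a.run.app/"   # hypothetical Cloud Run URL

# Create the topic that GCS will publish notifications to.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project_id, topic_id)
publisher.create_topic(request={"name": topic_path})

# Publish a notification whenever an object is finalized (uploaded) in the bucket.
bucket = storage.Client(project=project_id).bucket(bucket_name)
bucket.notification(
    topic_name=topic_id,
    event_types=["OBJECT_FINALIZE"],
    payload_format="JSON_API_V1",
).create()

# Push every message to the Cloud Run endpoint.
# If the Cloud Run service requires authentication, also set an "oidc_token" in push_config.
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project_id, "gcs-uploads-push")
subscriber.create_subscription(
    request={
        "name": subscription_path,
        "topic": topic_path,
        "push_config": {"push_endpoint": run_url},
    }
)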
EDIT 1
When you use a push notification, you will receive a standard Pub/Sub message. The format is described in the documentation, both for the attributes and for the body content; keep in mind that the raw content is base64-encoded and you have to decode it to get the final format.
I personally have a Cloud Run service that logs the contents of any request it receives, so the logs give me all the data I need while developing. When there is a new message format, I configure the push to that Cloud Run endpoint and automatically capture the format.
For Eventarc, the format will be added to the UI soon (I saw that feature in preview, but it's not yet available). The best solution for now is to log the content so that you know exactly what you receive and what to do with it!
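As an illustration of that logging approach, here is a minimal sketch, not from the original answer, of a Cloud Run handler that logs the raw push request and decodes the base64 payload; it assumes Flask and a GCS notification with a JSON payload:

import base64
import json
from flask import Flask, request

app = Flask(__name__)

@app.route("/", methods=["POST"])
def receive_push():
    envelope = request.get_json()
    print("raw envelope:", envelope)  # log everything while developing

    message = envelope.get("message", {})
    attributes = message.get("attributes", {})  # e.g. eventType, bucketId, objectId
    payload = {}
    if "data" in message:
        # The body content is base64-encoded JSON describing the GCS object.
        payload = json.loads(base64.b64decode(message["data"]).decode("utf-8"))

    print("attributes:", attributes)
    print("decoded payload:", payload)
    return ("", 204)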

Update GCP asset labels

What is the most efficient way to update all assets labels per project?
I can list all project resources and their labels with gcloud asset search-all-resources --project=SomeProject. The command also returns the labels for those assets.
Is there something like gcloud asset update-labels?
I'm unfamiliar with the service, but APIs Explorer (Google's definitive service documentation) shows a single list method.
I suspect that you will need to iterate over all your resource types and update instances of them using whatever update (PATCH) method, if any, permits label changes for that resource type.
This seems like a reasonable request, and you may wish to submit a feature request using Google's issue tracker.
gcloud does not seem to have an update-labels command.
You could try the Cloud Resource Manager API. For example, call the REST or Python API: https://cloud.google.com/resource-manager/docs/creating-managing-labels#update-labels
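For project-level labels specifically, here is a minimal sketch of what that page describes, assuming the google-api-python-client library and a hypothetical project ID; it merges new labels into the existing ones via the Resource Manager v3 projects.patch method:

from googleapiclient import discovery

crm = discovery.build("cloudresourcemanager", "v3")

project_id = "my-project"  # hypothetical project ID

# Read the current labels so the patch merges rather than overwrites them.
project = crm.projects().get(name=f"projects/{project_id}").execute()
labels = project.get("labels", {})
labels.update({"env": "dev", "team": "data"})

crm.projects().patch(
    name=f"projects/{project_id}",
    updateMask="labels",
    body={"labels": labels},
).execute()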

Amazon SageMaker Model Registry / Pipelines - how to manually set a Stage for a given Model Version?

This might be a very specific question, but I will try anyway.
I want to explicitly set the Stage column in Model registry for a given Model Version:
This picture comes from the documentation, and the column gets set only when you run the example SageMaker Projects MLOps templates they provide. When I create the Model Package (i.e. the Model Version) manually, the column remains empty. How do I set it? What API do I call?
Additionally, the documentation on browsing the model version history has the following sentence
How do we send that exact event ("Deployed to stage XYZ") manually?
I have already gone thoroughly over all the files the SageMaker MLOps project generates (CodeBuild builds, CodePipeline, CloudFormation, various .py files, the SageMaker Pipeline) but could not find any direct and explicit call for that event.
I think it may be somehow connected to the tag sagemaker:deployment-stage, but I've already set it on the Endpoint, EndpointConfiguration and Model, with no success. I also tried to blindly call the UpdateModelPackage API and set Stage in CustomerMetadataProperties. Again - no luck.
The only thing I get in that Activity tab is that the given Model Version was deployed to an Inference endpoint:
You can set the status with the ModelApprovalStatus parameter in the create_model_package API or the update_model_package API.
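For example, a minimal sketch, assuming boto3 and a hypothetical model package ARN, of updating the approval status of an existing model version:

import boto3

sm = boto3.client("sagemaker")

# Hypothetical ARN of the model package (model version) to update.
model_package_arn = "arn:aws:sagemaker:us-east-1:123456789012:model-package/my-group/1"

sm.update_model_package(
    ModelPackageArn=model_package_arn,
    ModelApprovalStatus="Approved",  # or "Rejected" / "PendingManualApproval"
    ApprovalDescription="Promoted after offline evaluation",
)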
A model package state change should create an event in EventBridge (like many other SageMaker events), see https://docs.aws.amazon.com/sagemaker/latest/dg/automating-sagemaker-with-eventbridge.html#eventbridge-model-package, which enables you to run the automation of your choice.
In the default SageMaker Pipelines project template, you can see the proposed EventBridge-driven logic in the CodePipeline pipeline created for deployment: at the top it shows "Trigger - CloudWatchEvent".
You don't see the event source as code in the Git repository, because in that demo template the status change is expected to be done in the Studio Model Registry UI.
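If you want to drive your own automation from those events, here is a sketch, assuming boto3, with a hypothetical rule name and target ARN, of an EventBridge rule matching model package state changes:

import json
import boto3

events = boto3.client("events")

# Match SageMaker model package state-change events (see the EventBridge page linked above).
events.put_rule(
    Name="model-package-state-change",  # hypothetical rule name
    EventPattern=json.dumps({
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Model Package State Change"],
    }),
)

# Attach a target, e.g. a Lambda function that kicks off your deployment; the ARN is hypothetical.
events.put_targets(
    Rule="model-package-state-change",
    Targets=[{
        "Id": "deploy-hook",
        "Arn": "arn:aws:lambda:us-east-1:123456789012:function:deploy-hook",
    }],
)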
Those EventBridge events emitted by the Model Registry can also be seen in a few blogs:
Taming Machine Learning on AWS with MLOps: A Reference Architecture
Patterns for multi-account, hub-and-spoke Amazon SageMaker model registry
Build MLOps workflows with Amazon SageMaker projects, GitLab, and GitLab pipelines
I was having the exact same issue: I wanted to change the model stage but could not find where it was being done in the sample code AWS provides.
After some research and looking into the sample code, I realized that it was being done in the CloudFormation execution. First they add the tag
'sagemaker:deployment-stage': stage_config['Parameters']['StageName']
and then the CloudFormation execution (the cfnUpdate call) updates the stage and deploys.
I couldn't find another way to change the stage with a call to update_model_package or other methods.

Copy a GCR image from one project to another

I aim to copy a GCR image from one project to another as soon as the image lands in the Container Registry of the first project. I am aware of the gcloud container images add-tag command, but I am looking for a more automated option. Also, the second project, where the image has to be copied to, is protected by VPC-SC. Any leads will be appreciated...
I understand that you are looking for the best way to mirror GCR images between two projects. Currently, you can follow the workaround in this document to copy the container images for your use case. At the moment, the only way to move images between two registries is to pull from one and push to the other, provided you have the right permissions. There is a tool on GitHub, gcrane, that can automate this for you. However, for mirroring container images between two projects, a feature request has already been submitted, but there is no ETA.
According to the GCP documentation, if the project is protected by VPC-SC, Container Registry does not use the googleapis.com domain. To achieve this, Container Registry needs to be configured via private DNS or BIND to map to the restricted VIP separately from other APIs.
When a change is made to a Container Registry that you own, a Pub/Sub message can be published. You can use this Pub/Sub message as a trigger to perform work. My immediate thought would be to create a Cloud Function that is triggered by the arrival of that message and then fires off a Cloud Build recipe. The Cloud Build job would perform a docker pull of your original image, then a tag rename and a docker push. This would be 100% automated and use components that are designed for CI/CD pipelines.
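A rough sketch of that idea, not a definitive implementation: it assumes the google-cloud-build client library, hypothetical source and destination project IDs, and a Pub/Sub-triggered Cloud Function that submits a Cloud Build job to pull, re-tag and push the image.

import base64
import json
from google.cloud.devtools import cloudbuild_v1

SRC_PROJECT = "source-project"       # hypothetical
DST_PROJECT = "destination-project"  # hypothetical

def mirror_image(event, context):
    """Pub/Sub-triggered Cloud Function reacting to GCR notifications."""
    # GCR notifications carry "action", "digest" and, for tagged pushes, "tag".
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    src_image = payload.get("tag")  # e.g. gcr.io/source-project/app:1.2.3
    if payload.get("action") != "INSERT" or not src_image:
        return  # only react to newly pushed, tagged images

    dst_image = src_image.replace(SRC_PROJECT, DST_PROJECT, 1)

    # Cloud Build steps: pull from the source registry, re-tag, push to the destination.
    build = cloudbuild_v1.Build(
        steps=[
            {"name": "gcr.io/cloud-builders/docker", "args": ["pull", src_image]},
            {"name": "gcr.io/cloud-builders/docker", "args": ["tag", src_image, dst_image]},
            {"name": "gcr.io/cloud-builders/docker", "args": ["push", dst_image]},
        ],
    )
    client = cloudbuild_v1.CloudBuildClient()
    client.create_build(project_id=DST_PROJECT, build=build)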
References:
Configuring Pub/Sub notifications
Cloud Build documentation

Automatically instrumenting Java application using AWS X-Ray

I am trying to achieve automatic instrumentation of all calls made by AWS SDKs for Java using X-Ray.
The X-Ray SDK for Java automatically instruments all AWS SDK clients when you include the AWS SDK Instrumentor submodule in your build dependencies.
(from the documentation)
I have added these to my POM
aws-xray-recorder-sdk-core
aws-xray-recorder-sdk-aws-sdk
aws-xray-recorder-sdk-spring
aws-xray-recorder-sdk-aws-sdk-instrumentor
and am using e.g. aws-java-sdk-ssm and aws-java-sdk-sqs.
I expected to only have to add the X-Ray packages to my POM and provide adequate IAM policies.
However, when I start my application I get exceptions such as these:
com.amazonaws.xray.exceptions.SegmentNotFoundException: Failed to begin subsegment named 'AWSSimpleSystemsManagement': segment cannot be found.
I tried wrapping the SSM call in a manual segment, and that worked, but then the next call from another AWS SDK immediately throws a similar exception.
How do I achieve the automatic instrumentation mentioned in the documentation? Am I misunderstanding something?
It depends on how you make AWS SDK calls in your application. If you have added the X-Ray servlet filter to your Spring application per https://docs.aws.amazon.com/xray/latest/devguide/xray-sdk-java-filters.html, then each time your application receives a request, the X-Ray servlet filter opens a segment and stores it in the thread serving that request. Any AWS SDK calls you make as part of that request/response cycle will pick up that segment as their parent.
The error you got means that the X-Ray instrumentor tried to record the AWS API call as a subsegment but could not find a parent segment (i.e. which request this call belongs to).
Depending on your use case, you might want to explicitly instrument certain AWS SDK clients and leave others plain, for example if some of those clients make calls from a background worker outside any request context.