AWS SageMaker Domain Status "Update_Failed" due to custom image appImageConfigName error

I'm having some trouble recovering from failures in attaching custom images to my SageMaker domain.
I first created a custom image according to the steps here.
When I use the SageMaker console to attach the image built with sm-docker, it appears to "attach" successfully in the domain's image list, but inspecting the image in the console shows this error:
Value '' at 'appImageConfigName' failed to satisfy constraint: Member must satisfy regular expression pattern
This occurs even when the repository and tag consist only of alphanumeric characters.
After obtaining this error, I deleted the repositories in ECR.
Since then, the domain fails to update and I am unable to launch any apps or attempt to attach additional images.
The first issue I would like to address is restoring the functionality of my SageMaker domain so I can troubleshoot further. I am unable to delete the domain because of this error, even though there are no users, apps, or custom images associated with the domain.
The second issue I would like to address is being able to troubleshoot the appImageConfigName error.
Thanks!

While I was unable to delete the domain via the console, I was able to delete it via the CLI.
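For anyone else hitting this, a minimal sketch of the relevant CLI call; the domain ID is a placeholder, and the retention-policy flag is only needed if you also want the domain's EFS home directories removed:
aws sagemaker delete-domain --domain-id d-xxxxxxxxxxxx --retention-policy HomeEfsFileSystem=Delete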

Using Glyphs with Amazon Location Service and Mapbox-GL

I am using Amazon Location Service with React, react-map-gl and Mapbox GL. I can successfully load Esri and HERE maps, which suggests my authentication is OK, but I seem to have trouble accessing glyphs (fonts). I am trying to add a cluster markers feature like this. I can add the points and load the base layer, but when I try to add the point counts there is an error accessing the glyphs. It sends a request like this:
https://maps.geo.eu-west-1.amazonaws.com/maps/v0/maps/<MY_MAP>/glyphs/Noto%20Sans,Arial%20Unicode/0-255.pbf?<....SOME_AUTHENTICATION_STUFF>
This seems to match the request format shown here: https://docs.aws.amazon.com/location-maps/latest/APIReference/location-maps-api.pdf
But it responds with: {"message":"Esri glyph resource not found"}
I get a similar error message with HERE maps and different fonts. I have added the following actions to the role, with no success (it loads the map but not the glyphs).
Tried this:
"geo:GetMap*"
And this:
"geo:GetMapStyleDescriptor",
"geo:GetMapGlyphs",
"geo:GetMapSprites",
"geo:GetMapTile"
What do I have to do to set up glyphs correctly in Amazon Location Service? I have not configured anything; I just hoped they would work naturally. Have I missed a step? I can't see anything online about it.
Is there a workaround where I could load a system font instead of a remote glyph?
I am using the following versions, which are not the most recent, as the latest releases are incompatible with Amazon Location Service:
"mapbox-gl": "^1.13.0",
"react-map-gl": "^5.2.11",
The default font stack (Noto Sans, Arial Unicode) for the cluster layer isn't currently available via Amazon Location. You will need to change the font stack used by the cluster layer to something in the supported list: https://docs.aws.amazon.com/location-maps/latest/APIReference/API_GetMapGlyphs.html#API_GetMapGlyphs_RequestSyntax
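For example, a minimal sketch of that change in the cluster-count layer, assuming a hypothetical clustered GeoJSON source named 'points' and using 'Noto Sans Regular', one of the fonts on the supported list linked above:
// Hypothetical cluster-count layer; the important part is 'text-font'.
map.addLayer({
  id: 'cluster-count',
  type: 'symbol',
  source: 'points', // assumed GeoJSON source created with cluster: true
  filter: ['has', 'point_count'],
  layout: {
    'text-field': '{point_count_abbreviated}',
    'text-font': ['Noto Sans Regular'], // instead of the default Noto Sans / Arial Unicode stack
    'text-size': 12
  }
});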

Elasticsearch 6.3 (AWS) snapshot restore progress ERROR: "/_recovery is not allowed"

I take manual snapshots of an Elasticsearch index
These are stored in a snapshot repo on S3
I have created a new ES cluster, also version 6.3
I have connected the new cluster to the S3 snapshot repo via the Python script method mentioned in this blog post: https://medium.com/docsapp-product-and-technology/aws-elasticsearch-manual-snapshot-and-restore-on-aws-s3-7e9783cdaecb
I have confirmed that the new cluster has access to the snapshot repo via the GET /_snapshot/manual-snapshot-repo/_all?pretty command
I have initiated a snapshot restore to this new cluster via:
POST /_snapshot/manual-snapshot-repo/snapshot_name/_restore
{
  "indices": "reports",
  "ignore_unavailable": false,
  "include_global_state": false
}
It is clear that this operation has at least partially succeeded: the cluster status has gone from "green" to "yellow", a GET request to /_cluster/health yields information that suggests actions are occurring on an otherwise empty cluster, and storage is starting to be utilized (when viewing cluster health on AWS).
I would very much like to monitor the progress of the restore operation.
Elasticsearch docs suggest to use the Recovery API. Docs Link: https://www.elastic.co/guide/en/elasticsearch/reference/6.3/indices-recovery.html
It is clear from the docs that GET /_recovery?human or GET /my_index/_recovery?human should yield restore progress.
However, I encounter the following error:
"Message": "Your request: '/_recovery' is not allowed."
I get the same message when attempting the GET request in the following ways:
Via the Kibana dev tools
Via the Chrome address bar (it's just a GET request, after all)
Via the Advanced REST Client (a Chrome app)
I have not been able to locate any other mention of this particular error message.
How can I use the GET /_recovery?human command on my Elasticsearch 6.3 clusters?
Thank you!
Amazon's managed Elasticsearch service does not make all of the Elasticsearch endpoints available.
For version 6.3 you can check this link for the available endpoints; _recovery is not on the list, which is why you get that message.
Without the _recovery endpoint you will need to rely on _cluster/health.
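As a rough substitute, you can poll cluster health at the index level and watch the shards come back; a sketch in the same Kibana dev tools style as above, assuming the restored index is called reports:
GET /_cluster/health/reports?level=indices&pretty
While the restore is running, the index will typically show a red or yellow status with initializing_shards greater than zero; once the primaries are active the status moves to yellow/green and the restore of that index is essentially done.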

Where do Google GCR images get stored in Google Cloud Storage?

I unknowingly deleted the buckets below from my project:
artifacts.<PROJECT-ID>.appspot.com
us.artifacts.<PROJECT-ID>.appspot.com
This deleted all the images from GCR. Are the above buckets where the GCR images are stored, or is it something else?
Also, when I created a new image and pushed it to GCR, all the deleted images reappeared in the GCR console. But whenever I try to pull any old image, it throws an "unknown blob" error.
Yes, these buckets are where the Docker container artifacts are built and stored (the artifacts being the build-step results that add up to an image).
They are then referenced by the Google Container Registry (i.e. gcr.io), but they are still located in your bucket.
Since you removed the bucket and its contents, your built images are missing their old build steps, which is why you get the "error pulling image configuration: unknown blob" error message.
For example, I uploaded a new image following this documentation, and I removed the artifacts.<PROJECT-ID>.appspot.com bucket afterwards. Then I re-uploaded it using a tag (I used quickstart-image:tag1), and when pulling it this way:
docker pull gcr.io/wave16-joan/quickstart-image:latest
I got the "error pulling image configuration: unknown blob" error message, because the image is missing the steps I already had in my previous build.
However, doing this:
docker pull gcr.io/wave16-joan/quickstart-image:tag1
allowed me to pull the image, but the image wasn't usable.
Regarding your second question, I believe the reason you still see references in the Container Registry to the images you removed is that GCR keeps the references to the build steps of those images; however, since the underlying artifacts are deleted, the images cannot be pulled.
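If you want to confirm what is (or is no longer) backing your images, a quick sketch, assuming the default prefix GCR uses inside that bucket for layer blobs:
# List the image layer blobs stored in the project's artifacts bucket (prefix assumed).
gsutil ls gs://artifacts.<PROJECT-ID>.appspot.com/containers/images/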

How to add or get a launch path to a product in AWS Service Catalog using Javascript sdk

I'm using Javascript SDK of AWS to access Service Catalog in my Lambda function.
https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/ServiceCatalog.html#provisionProduct-property
I have successfully created a portfolio and a product and attached the product to the portfolio. When I try to provision the product, it throws the error "No launch path is found". To get the launch path list I call the listLaunchPaths API, and it returns an empty array with the message "No launch path found for this product".
I have explored the AWS docs in detail but did not find any way to set a launch path.
Can anybody guide me on how to create and get a launch path for a product in AWS Service Catalog?
The error message "Unable to launch provisioned product because: No launch paths found for resource" isn't super helpful. It can mean any of the following:
The product doesn't exist
The provisioning artifact doesn't exist
The product exists but it's in a failed state
You don't have access to the product
You don't have access to the product's portfolio
The product isn't associated with a portfolio
The launch path does not exist
Because the error message is so generic, it doesn't tell you which of these is to blame.
To see how unhelpful the error message is, try this for fun:
% aws servicecatalog provision-product --provisioned-product-name no --product-id nope --provisioning-artifact-id nopity-nope
An error occurred (ResourceNotFoundException) when calling the ProvisionProduct operation: No launch paths found for resource: nope
Some pointers to getting it to work:
Associate the product to a portfolio.
Associate a principal that is (or includes) you with the portfolio.
Make sure the product is properly created by not using DisableTemplateValidation. When you create the product, you'll get an error if the template has an error.
Try describing the provisioning artifact to make sure it exists.
Try describing the product. If you can describe the product, it exists, and you have access. You should see a launch path as part of the product description. If you can describe the product but it doesn't have a launch path, I suspect the template is bad.
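To check those last two pointers, a quick sketch with the CLI (IDs are placeholders):
# Does the product exist and can you see it?
aws servicecatalog describe-product --id prod-xxxxxxxxxxxx
# Does the provisioning artifact (product version) exist?
aws servicecatalog describe-provisioning-artifact --product-id prod-xxxxxxxxxxxx --provisioning-artifact-id pa-xxxxxxxxxxxx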
You need to add an IAM role/user/group to the portfolio that your product is attached to.
Then use that role/user/group's credentials to list the launch paths. It works.
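Putting the pointers and this answer together, a minimal sketch with the AWS SDK for JavaScript (v2); the portfolio/product IDs and role ARN are placeholders:
const AWS = require('aws-sdk');
const servicecatalog = new AWS.ServiceCatalog();

async function getLaunchPaths() {
  // 1. Associate the product with a portfolio (placeholder IDs).
  await servicecatalog.associateProductWithPortfolio({
    ProductId: 'prod-xxxxxxxxxxxx',
    PortfolioId: 'port-xxxxxxxxxxxx'
  }).promise();

  // 2. Grant the calling principal (e.g. your Lambda's role) access to the portfolio.
  await servicecatalog.associatePrincipalWithPortfolio({
    PortfolioId: 'port-xxxxxxxxxxxx',
    PrincipalARN: 'arn:aws:iam::123456789012:role/my-lambda-role',
    PrincipalType: 'IAM'
  }).promise();

  // 3. Once both associations are in place (they can take a moment to propagate),
  //    listLaunchPaths should return a non-empty LaunchPathSummaries array.
  const { LaunchPathSummaries } = await servicecatalog.listLaunchPaths({
    ProductId: 'prod-xxxxxxxxxxxx'
  }).promise();
  return LaunchPathSummaries;
}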

Trying to set up AWS IoT button for the first time: Please correct validation errors with your trigger

Has anyone successfully set up their AWS IoT button?
When stepping through with the default values I keep getting this message: "Please correct validation errors with your trigger." But there are no validation errors on any of the setup pages, or on the page with the error message.
I hate asking a broad question like this but it appears no one has ever had this error before.
This has been driving me nuts for a week!
I got it to work by using "Custom IoT Rule" instead of "IoT Button" as the IoT type. The default rule name is iotbutton_xxxxxxxxxxxxxxxx and the default SQL statement is SELECT * FROM 'iotbutton/xxxxxxxxxxxxxxxx' (xxx... = serial number).
Make sure you copy the policy from the sample code into the execution role; I know that has tripped up a lot of people.
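For reference, a rough sketch of creating the equivalent rule yourself from the CLI, with the serial number and Lambda ARN as placeholders (you also need to grant AWS IoT permission to invoke the function, e.g. via lambda add-permission with principal iot.amazonaws.com):
aws iot create-topic-rule --rule-name iotbutton_XXXXXXXXXXXXXXXX --topic-rule-payload file://rule.json
# rule.json (placeholders throughout):
{
  "sql": "SELECT * FROM 'iotbutton/XXXXXXXXXXXXXXXX'",
  "actions": [
    { "lambda": { "functionArn": "arn:aws:lambda:us-east-1:123456789012:function:myButtonHandler" } }
  ]
}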
I was getting the same error. The cause turned out to be that I had multiple certificates associated with the button. This was caused by me starting over in the wizard, generating a cert & key, and loading the cert & key again. While this doesn't seem to be a problem on the device itself, the result was that on AWS I had multiple certs associated with the device.
Within the AWS IoT Resources view I eventually managed to delete all the resources. It took some fiddling to get the certs detached so they could be deleted. Once I deleted all the resources I returned to the wizard, created yet another cert & key pair, pushed the Lambda code, and everything works.