AWS CDK unstable deployment of Lambda CustomResource

I use CDK to deploy my AWS stack. It's a NextJS app with an RDS instance. I initialize the database using the CustomResource approach (a Lambda built from a Docker image), as suggested in that article.
Sometimes my deployment fails with the error message:
Received response status [FAILED] from custom resource. Message returned: Connection timed out after 120000ms
I'm sure it's because my database init takes too much time: I fill the database with "INSERT INTO" SQL queries repeated about 5000 times.
Could you advise how to avoid that error? The deployment script is unstable and I can't rely on it. Many thanks.
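For reference, the timeout on the Lambda behind the custom resource is worth checking first: the CDK default is only a few seconds, far less than a run of ~5000 INSERTs needs. Below is a minimal sketch in CDK v2 TypeScript of the setup described above, with the timeout raised (the construct IDs, the ./db-init image path, and the vpc variable are placeholders, not taken from the question):

import { CustomResource, Duration } from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as cr from 'aws-cdk-lib/custom-resources';

// Inside a Stack: the Lambda that runs the INSERT statements, built from a Docker image.
const initFn = new lambda.DockerImageFunction(this, 'DbInitFn', {
  code: lambda.DockerImageCode.fromImageAsset('./db-init'), // placeholder path
  timeout: Duration.minutes(15), // default is only a few seconds; 15 minutes is the Lambda maximum
  memorySize: 1024,
  vpc, // must be able to reach the RDS instance
});

// The provider framework wraps the handler for CloudFormation.
const provider = new cr.Provider(this, 'DbInitProvider', {
  onEventHandler: initFn,
});

new CustomResource(this, 'DbInit', {
  serviceToken: provider.serviceToken,
});

Batching the ~5000 INSERT statements into multi-row inserts (or a single transaction) should also cut the runtime considerably.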

Related

AWS fargate tasks won't start reliably

I have an ECS cluster with a bunch of different tasks in it (using the same docker image but with different environment variables).
Some of the tasks come up without problems, but others fail a lot even though I've used the same VPC, subnet, and security group. The error message shows ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 3 time(s): RequestError: send request failed caused by: Post https://api.ecr..
The bizarre thing is that the same task sometimes comes up if I create a new task definition or delete the ECR repository and re-upload the Docker image.
I'm unable to draw any conclusion from this.
Update: strange... the task starts successfully when I deregister the task definition and recreate it with the same specs. But only once.
It turns out one has to select the taskExecution role under Task Role - override and Task Execution Role - override in the Run Task Advanced Options section when starting the task. I don't know why it sometimes worked arbitrarily, or why it worked each time I recreated the task definition.
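For anyone starting the task from the SDK rather than the console, those two override fields map to the overrides block of RunTask. A rough sketch with the AWS SDK v3 for TypeScript; the region, cluster, task definition, network settings, and role ARNs below are placeholders:

import { ECSClient, RunTaskCommand } from '@aws-sdk/client-ecs';

const ecs = new ECSClient({ region: 'eu-west-1' }); // placeholder region

async function runTask(): Promise<void> {
  await ecs.send(new RunTaskCommand({
    cluster: 'my-cluster',          // placeholder
    taskDefinition: 'my-task-def',  // placeholder
    launchType: 'FARGATE',
    networkConfiguration: {
      awsvpcConfiguration: {
        subnets: ['subnet-xxxxxxxx'],    // placeholder
        securityGroups: ['sg-xxxxxxxx'], // placeholder
        assignPublicIp: 'ENABLED',
      },
    },
    overrides: {
      // Equivalent of "Task Role - override" and "Task Execution Role - override" in the console.
      taskRoleArn: 'arn:aws:iam::123456789012:role/ecsTaskExecutionRole',      // placeholder
      executionRoleArn: 'arn:aws:iam::123456789012:role/ecsTaskExecutionRole', // placeholder
    },
  }));
}

runTask().catch(console.error);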

Creating RDS Proxy and RDS Cluster with CDK fails the first time

I'm trying to create a DB stack; it has an RDS Cluster, an RDS Proxy, and some other stuff.
When I first run cdk deploy, it always fails halfway with the following message:
Embedded stack
arn:aws:cloudformation:us-east-2:####:stack/MainStack-DBNestedStackDBNestedStackResource####/###
was not successfully created: The following resource(s) failed to create: [memcachedApp, Parameters6795E5B4, DatabaseproxyIAMRole7D0578A1].
If I run cdk deploy again, it works.
I'm trying to find out what is causing the error, but I don't know where to look, and because the ROLLBACK starts immediately, I can't see any more errors.
Do you know if there is something I'm missing?

Kubernetes: Get mail once deployment is done

Is there a way to have a post-deployment mail in Kubernetes on GCP/AWS?
It has become harder to maintain deployments on Kubernetes as the deployment team grows. Having a post-deployment mail service would ease the process, as it would also say who applied the deployment.
You could try to watch deployment events using https://github.com/bitnami-labs/kubewatch and a webhook handler.
Another option is to implement a customized solution with the Kubernetes API, for instance in Python: https://github.com/kubernetes-client/python, and run it as a separate notification pod in your cluster.
A third option is to manage the deployment in a CI/CD pipeline where the actual deployment execution step is of the "approval" type; you can see the user who approved, and the next step in the pipeline after approval could be the email notification.
Approval in CircleCI: https://circleci.com/docs/2.0/workflows/#holding-a-workflow-for-a-manual-approval
I don't think such a feature is built into Kubernetes.
There is a watch mechanism, though, which you could use. Run the following GET query:
https://<api-server-url>/apis/apps/v1/namespaces/<namespace>/deployments?watch=true
The connection will not close and you’ll get a “notification” about each deployment. Check the status fields. Then you can send the mail or do something else.
You'll need to pass an authorization token to gain access to the API server. If you have kubectl set up, you can run a local proxy, which then won't need the token: kubectl proxy.
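A minimal sketch of that approach in TypeScript (Node 18+), reading the watch stream through a local kubectl proxy; the namespace is a placeholder and console.log stands in for the mail notification:

// Run `kubectl proxy` first, then this script needs no token.
const NAMESPACE = 'default'; // placeholder
const url = `http://127.0.0.1:8001/apis/apps/v1/namespaces/${NAMESPACE}/deployments?watch=true`;

async function watchDeployments(): Promise<void> {
  const res = await fetch(url);
  if (!res.body) throw new Error('no response body');
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    let idx: number;
    // The API server sends one JSON watch event per line.
    while ((idx = buffer.indexOf('\n')) >= 0) {
      const line = buffer.slice(0, idx);
      buffer = buffer.slice(idx + 1);
      if (!line.trim()) continue;
      const event = JSON.parse(line);
      const dep = event.object;
      // Inspect the status fields to decide when a rollout is finished,
      // then send the mail; console.log is a placeholder.
      console.log(event.type, dep.metadata?.name, dep.status?.availableReplicas);
    }
  }
}

watchDeployments().catch(console.error);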
You can attach handlers to container lifecycle events. Kubernetes supports preStop and postStart events, and it sends the postStart event immediately after the container is started. Here is a snippet of the pod manifest from the deployment file.
spec:
  containers:
    - name: <******>
      image: <******>
      lifecycle:
        postStart:
          exec:
            command: [********]
Considering GCP, one option could be to create a filter in Stackdriver Logging that captures your deployment finishing; with that filter you can use the CREATE METRIC option, also in Stackdriver Logging.
With the metric created, use Stackdriver Monitoring to create an alert that sends e-mails. More details in the official documentation.
It looks like no one has mentioned the "native tool" Kubernetes provides for that yet.
Please note that there is a concept of Audit in Kubernetes.
It provides a security-relevant, chronological set of records documenting the sequence of activities that have affected the system, whether by individual users, administrators, or other components of the system.
Each request, at each stage of its execution, generates an event, which is then pre-processed according to a certain policy and processed by a certain backend.
That allows the cluster administrator to answer the following questions:
what happened?
when did it happen?
who initiated it?
on what did it happen?
where was it observed?
from where was it initiated?
to where was it going?
The administrator can specify what events should be recorded and what data they should include with the help of an audit policy (or policies).
There are a few backends that persist audit events to external storage:
Log backend, which writes events to a disk
Webhook backend, which sends events to an external API
Dynamic backend, which configures webhook backends through an AuditSink API object.
In case you use the log backend, it is possible to collect the data with tools such as fluentd. With that data you can achieve more than just a post-deployment mail in Kubernetes.
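As a rough illustration of what can be done with the log backend's output (fluentd would normally do the shipping), here is a TypeScript sketch that reads the audit log and picks out deployment changes together with the user who made them; the log path is an assumption and console.log stands in for the actual mail notification:

import * as fs from 'fs';
import * as readline from 'readline';

// Path configured for the log backend via --audit-log-path; adjust to your cluster.
const AUDIT_LOG = '/var/log/kubernetes/audit.log';

async function scanAuditLog(): Promise<void> {
  const rl = readline.createInterface({
    input: fs.createReadStream(AUDIT_LOG),
    crlfDelay: Infinity,
  });

  for await (const line of rl) {
    if (!line.trim()) continue;
    const event = JSON.parse(line); // one JSON audit event per line
    // Only completed writes against Deployments are interesting here.
    if (event.stage !== 'ResponseComplete') continue;
    if (event.objectRef?.resource !== 'deployments') continue;
    if (!['create', 'update', 'patch'].includes(event.verb)) continue;
    // "who initiated it" and "on what did it happen" from the list above:
    console.log(event.user?.username, event.verb, `${event.objectRef.namespace}/${event.objectRef.name}`);
  }
}

scanAuditLog().catch(console.error);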
Hope that helps!

GCP Terraform Apply and Destroy errors: oauth2: cannot fetch token: Post https://accounts.google.com/o/oauth2/token: net/http: TLS handshake timeout

I've been using GCP and Terraform for a few months - just creating some basic VMs and firewall resources for testing.
Increasingly, about 50% of the time when applying and 100% of the time when trying to destroy an environment using Terraform, I get the following error:
Error creating Firewall: Post https://www.googleapis.com/compute/beta/projects/mkdemos-219107/global/firewalls?alt=json: oauth2: cannot fetch token: Post https://accounts.google.com/o/oauth2/token: net/http: TLS handshake timeout
To destroy, the only way is to log into the console, manually delete the resource and rm my local terraform state files.
It's the intermittent nature of this that is driving me crazy. I've tried creating a new project and re-creating a new JSON with service credentials, and still the same behaviour.
If it consistently failed or had been doing this all the time, I'd assume there was something wrong in my Terraform template or the way I've set up the GCP Service Account. But sometimes it works and sometimes it doesn't; it makes no sense and is making GCP unworkable for testing.
If anyone has any similar experience of this I'd welcome some thoughts. Surely it can't just be me?? ;-)
FYI:
Terraform: v0.11.7
provider.google: v1.19.0
Mac OSX: 10.13.1
Cheers.
There might be a strange solution: please check whether another user in your OS is able to run terraform commands. If they can, it means the problem is located in your user profile.
Finally, if that works, try backing up and then deleting all certificates in your login keychain, and retry the terraform commands.

Best practices with OpsWorks Setup Failure

Yesterday I set up our AWS OpsWorks bench. We are using a custom cookbook which we are hosting on GitHub. I saw that the setup process failed and had a look in the logs: the custom cookbook could not be fetched from GitHub because they had server problems. Therefore the setup on the server failed and the process stopped.
Does anyone know how I could handle that sort of failure and restart the setup process until it is done?
One way to avoid issues like this is to host your assets on S3. Alternatively you can poll the deployment status to determine if it succeeds or fails and then have some retry logic.
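For the polling part, a rough sketch with the AWS SDK v3 for TypeScript; the region and deployment ID are placeholders, and OpsWorks reports a deployment's Status as running, successful, or failed:

import { OpsWorksClient, DescribeDeploymentsCommand } from '@aws-sdk/client-opsworks';

const opsworks = new OpsWorksClient({ region: 'us-east-1' }); // placeholder region

// Poll an OpsWorks deployment until it leaves the "running" state.
async function waitForDeployment(deploymentId: string): Promise<string> {
  for (;;) {
    const res = await opsworks.send(
      new DescribeDeploymentsCommand({ DeploymentIds: [deploymentId] })
    );
    const status = res.Deployments?.[0]?.Status ?? 'unknown';
    if (status !== 'running') return status; // "successful" or "failed"
    await new Promise((resolve) => setTimeout(resolve, 30_000)); // wait 30s between polls
  }
}

// If it failed, trigger the retry here (e.g. another CreateDeploymentCommand running the setup).
waitForDeployment('placeholder-deployment-id').then((status) => {
  console.log('Deployment finished with status:', status);
});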