AWS CodeDeploy - deploy using the incorrect revision files - amazon-web-services

been banging my head against the wall trying to get an unbelievably simple CodeDeploy run going. The behavior I'm seeing suggests either a configuration issue or an issue local to the running agent. Basically, deployments are not using the files explicitly supplied to them - they're stuck implicitly using a prior version.
Having created an application and deployment group (and ensuring all prerequisites are in place such as the agent and roles are correctly assigned), I'm creating a deployment with via zipping up my code folder (at the root, not including the code's containing folder). There were a few issues to fix in a few of the hook steps, but I was able to fix a couple of them by changing the code, re-zipping and re-uploading before things got particularly weird. There was a syntax issue in my ApplicationStart hook script (when I finally got that far), so I fixed it and re-uploaded as before. However, the same syntax error occurred. I tried re-uploading, deleting all my S3 files and re-uploading, downloading the listed revision files and checking their contents (changes were reflected), but the same syntax error occurred. I even deleted the script and hook step completely from the yml file and it still happened, so clearly the deployment system is "stuck" in some sense. I went as far as feeding it a completely empty text file, told it it was a tar file, and it's still running my old revision. It's as though the runner agent's local files are stale and it's failing to clear local contents.
What's the deal? I feel like I've missed something fundamental.
edit - I created an entirely identical but new deployment group and re-tried the deployment with my new files and it worked. So that deployment group itself is stuck.

Related

Running into the "cannot find module" with AWS lambda... sometimes

Been working on some lambdas and all of a sudden I started running into this error. The strange thing is that everything was working fine until a certain point in time, at which point all get requests to that endpoint returned an internal server error. I looked in cloud watch to find the "Runtime.ImportModuleError" error type. "Cannont find module 'yadda yadda' \n require stack....."
I have tried several things. I tried using 7zip instead of compress-archive (just made it a different module that couldn't be found). I tried removing aws-sdk from the package.json and this is the part that really makes me feel insane. After removing that, rezipping, and reuploading this made it work. I thought it was solved so I resumed working on the lambda, but when I zipped and uploaded again it was back to the same error. At this point I took a break, looked some stuff up, and came back. The same lambda that was previously not working now worked perfectly fine with no changes from me. Tried zipping and uploading again, back to module cannot be found error. I'm at my wits end here, what the heck could be happening?
From the error you get from the cloudwatch its clear that some of the module that your lambda function needs is not present in the node_module directory and removing it from the package.json doesn't affect the lambda function because the lambda doesn't use the package.json to install the dependencies, this has been done manually.
For dependencies Lambda completely relies on the modules that you attached to your zipped( or s3:// bucket) file.
There are few things you can check to analyze
Re-align your package.json file with required dependencies and run the npm install
It could be possible that your lambda function using lambda layers for dependencies and that layers weren't included in the configuration.

Cloud Function build error - failed to get OS from config file for image

I'm seeing this Cloud Build error when I try to deploy a Cloud Function:
"Step #2 - "analyzer": [31;1mERROR: [0mfailed to initialize cache: failed to create image cache: accessing cache image "us.gcr.io/MY_PROJECT/gcf/us-central1/SOME_KEY/cache:latest": failed to get OS from config file for image 'us.gcr.io/MY_PROJECT/gcf/us-central1/SOME_KEY/cache:latest'"
I'm able to build and emulate the cloud function locally, but I can't deploy it due to this error. I was able to deploy just fine until now. I've looked everywhere and I can't find any discussion about this. Anyone know what's going on here?
UPDATE: I deployed a new function 3 days ago and now I can't seem to deploy an update to it. I get the same error. I'm fairly sure this is happening due to the lifecycle rule I set up to ensure I don't keep storing images of functions: Firebase storage artifacts is huge and keeps increasing. This rule is important to keep around because I don't want to pay for unnecessary storage, but it seems like it might be the source of our problem here. Can someone from Google look into this?
I got the same error, even for code that deployed successfully before.
A workaround is to delete the Docker images for the failing Firebase functions inside Container Registry and re-deploying the functions. (The images will be re-created upon deploying.)
The error still occurs sporadically, so I suspect this may be a bug introduced in Firebase's deployment process. Thankfully for now, the workaround above resolves the issue every time the error comes up.
I also encountered the same problem, and solved it by deleting the images in the Container Registry of Firebase Project.
I made a Script at that time, and I'll put it here. The usage is as follows. Please use it if you like.
Install the Google Cloud SDK.
Download the Script
Edit CONTAINER_REGISTRY to your registry name. For example: CONTAINER_REGISTRY=asia.gcr.io/project-name/gcf/asia-northeast1
Grant execute permission. - $ chmod +x script.sh
Execute it. - $ sh script.sh
Deploy your functions.
I'm having the same problem for the last few days and in contact with the support. I had the same log and in my case it wasn't connected to the artifacts because the artifacts rebuild themselves automatically on deploy (read below about a subtle case related to the artifacts and how to fix it), but deleting the functions and redeploying solved it for me.
Artifacts auto cleanup
Note that if the artifacts bucket is empty, then the problem is somewhere else.
But if it's not empty, what you can do to resolve any possible problems related to the artifacts auto cleanup, is to delete the whole "container" folder manually in the artifacts which should solve it. Then just redeploy again.
Make sure not to delete the artifacts bucket itself!
Dough from firebase confirmed in the question you referring to that removing the artifacts content is safe.
So, here is how to delete it:
go to the google cloud console, select your project -> storage -> browser https://console.cloud.google.com/storage/browser
Select the "artifacts" bucket
Choose "containers" and delete it
If the problem was here, it should work fine after that.
This happens because the deletion rule you refer to in your question checks the "last updated" timestamp of each file while on redeploy only some files are updated. So the next day the rule will delete some of the files while leaving the others which will lead to the inconsistent state of the bucket in this case. So you just remove everything manually.

Why is my cloud run deploy hanging at the "Deploying..." step?

Up until today, my deploy process has worked fine. Today when I go to deploy a new revision, I get stuck at the Deploying... text with a spinning indicator, and it says One or more of the referenced revisions does not yet exist or is deleted. I've tried a number of different images and flags -- all the same.
See Viewing the list of revisions for a service, in order to undo whatever you may have done.
Probably you have the wrong project selected, if it does not know any of the revisions.
I know I provided scant information, but just to follow up with an answer: it looks like the issue was that I was deploying a revision, and then immediately trying to tag it using gcloud alpha run services update-traffic <service_name> --set-tags which looks to have caused some sort of race, where it complained that the revision was not yet deployed, and would hang indefinitely. Moving the set-tag into the gcloud alpha run deploy seemed to fix it.

Missing Jenkins Job Information

Recently I restarted my AWS instance and got a new IP address but after I restarted both Jenkins and AWS, the information about my previous jobs was no longer shown in Jenkins.
I checked the path and it still exists in the instance but it is not shown on the web. I tried to create another project and it still created in the same path just that only the newly created project is in. Any suggestions on how to recover my missing projects??
FYI
I have lots of old plugins that mentions "xxx failed to load" so I do not know if that is causing it.
one of my plugins does not match and all those that depends on it will fail to show on the installed section of the plugin. Thus I remove all the plugins by deleting it directly from the plugin folder and check for the working copy that was on my previous version and download the same version of plugins. After which, all the jobs come back on screen

Jenkins triggered code deploy is failing at ApplicationStop step even though same deployment group via code deploy directly is running successfully

When I trigger via Jenkins (code deploy plugin), I get the following error -
No such file or directory - /opt/codedeploy-agent/deployment-root/edbe4bd2-3999-4820-b782-42d8aceb18e6/d-8C01LCBMG/deployment-archive/appspec.yml
However, if I trigger deployment into the same deployment group via code deploy directly, and specify the same zip in S3 (obtained via Jenkins trigger), this step passes.
What does this mean, and how do I find a workaround to this? I am currently working on integrating a few things and so, will need to deploy via code deploy and via Jenkins simultaneously. I will run the code deploy triggered deployment when I will need to ensure that the smaller unit is functioning well.
Update
Just mentioning another point, in case it applies. I was previously using a different codedeploy "application" and "deployment group" on the same ec2 instances, and deplying using jenkins and code deploy directly as well. In order to fix some issue (not allowing to overwrite existing files due to failed deployments, allegedly), I had deleted everything inside the /opt/codedeploy-agent/deployment-root/<directory containing deployments> directory, trying to follow what was mentioned in this answer. However, note that I deleted only items inside that directory. Thereafter, I started getting this error appspec.yml not found in deployment archive. So, then I created a new application and deployment group and since then, I am working on it.
So, another point to consider is whether I should do some further cleanup, if the jenkins triggered deployment is somehow still affected by those deletions (even though it is referring to the new application and deployment group).
As part of its process, CodeDeploy needs to reference previous deployments for Redeployments and Deployment Rollbacks operations. These references are maintained outside of the deployment archive folders. If you delete these archives manually as you indicate, then a CodeDeploy install can get fatally corrupted: the references left to previous deployments are no longer correct or consistent, and deploys will fail.
The best thing at this point is to remove the old installation completely, and re-install. This will allow the code deploy agent to work correctly again.
I have learned the hard way not to remove/modify any of the CodeDeploy install folders or files manually. Even if you change apps or deployment groups, CodeDeploy will figure it out itself, without the need for any manual cleanup.
In order to do a deployment, the bundle needs to contain a appspec.yml file, and the file needs to be put at the top directory. Seems the error message is due to the host agent can't find the appspec.yml file.