I get this error after creating a new instance in an autoscale group and trying to deploy. I also get this error if I create a new instance (not part of autoscale group) and deploy to it.
If I log into this newly created instance, restart the CodeDeploy agent, and try deploying again, it succeeds. It will then succeed every time.
If I create an image of this instance at this point, and use this image as base for a new auto-scale group, the deploy fails again.
Since I cannot restart the agent during the auto-scale setup, auto-scaling always fails.
Does anybody have any ideas on how to fix this?
I use AWS CodePipeline to pull from GitHub. There are no UTF-8 issues in the repo, and I confirmed the line endings are correct too. I converted all non-UTF-8 text files to UTF-8 to prove this.
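For reference, this is roughly how such a check can be done (a sketch; the file pattern and source encoding are assumptions):
# report each file's detected encoding (the *.rb pattern is just an example)
find . -type f -name '*.rb' -exec file -i {} \;
# verify a file is valid UTF-8; a non-zero exit status means invalid byte sequences
iconv -f UTF-8 -t UTF-8 somefile.txt > /dev/null
# convert a file from ISO-8859-1 to UTF-8
iconv -f ISO-8859-1 -t UTF-8 somefile.txt > somefile.utf8.txt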
There is a recent fix for a Ruby encoding issue:
https://github.com/aws/aws-codedeploy-agent/commit/c2f6489a8429c5f09470fa8e354c5406ec4a4d6a.
For this error to happen, it is likely that you have non-ASCII (foreign) characters in the names of files that will be deployed onto the instances.
Check the format of the appspec.yml file. You very likely have an encoding or line-ending problem there.
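If it helps, a quick way to check this (a rough sketch; assumes file and dos2unix are available on the machine):
file appspec.yml            # should report plain ASCII or UTF-8 text, with no BOM
cat -A appspec.yml | head   # CRLF line endings show up as a trailing ^M$ on each line
dos2unix appspec.yml        # convert CRLF to LF if needed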
I'm seeing this Cloud Build error when I try to deploy a Cloud Function:
"Step #2 - "analyzer": [31;1mERROR: [0mfailed to initialize cache: failed to create image cache: accessing cache image "us.gcr.io/MY_PROJECT/gcf/us-central1/SOME_KEY/cache:latest": failed to get OS from config file for image 'us.gcr.io/MY_PROJECT/gcf/us-central1/SOME_KEY/cache:latest'"
I'm able to build and emulate the cloud function locally, but I can't deploy it due to this error. I was able to deploy just fine until now. I've looked everywhere and I can't find any discussion about this. Anyone know what's going on here?
UPDATE: I deployed a new function 3 days ago and now I can't seem to deploy an update to it; I get the same error. I'm fairly sure this is happening due to the lifecycle rule I set up to ensure I don't keep storing images of functions (see "Firebase storage artifacts is huge and keeps increasing"). This rule is important to keep around because I don't want to pay for unnecessary storage, but it seems like it might be the source of the problem here. Can someone from Google look into this?
I got the same error, even for code that deployed successfully before.
A workaround is to delete the Docker images for the failing Firebase functions inside Container Registry and redeploy the functions. (The images will be re-created upon deploying.)
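For example, something along these lines should work (a hedged sketch; the image path mirrors the one in the error message above, so adjust it to your project, region, and function):
# list the images for the failing function
gcloud container images list --repository=us.gcr.io/MY_PROJECT/gcf/us-central1
# delete the cache image so it gets re-created on the next deploy
gcloud container images delete us.gcr.io/MY_PROJECT/gcf/us-central1/SOME_KEY/cache:latest --force-delete-tags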
The error still occurs sporadically, so I suspect this may be a bug introduced in Firebase's deployment process. Thankfully for now, the workaround above resolves the issue every time the error comes up.
I also encountered the same problem, and solved it by deleting the images in the Container Registry of the Firebase project.
I made a script for this at the time; a rough sketch of it appears after the steps below. The usage is as follows. Please use it if you like.
Install the Google Cloud SDK.
Download the Script
Edit CONTAINER_REGISTRY to your registry name. For example: CONTAINER_REGISTRY=asia.gcr.io/project-name/gcf/asia-northeast1
Grant execute permission. - $ chmod +x script.sh
Execute it. - $ sh script.sh
Deploy your functions.
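A rough sketch of what such a cleanup script might look like (not the original; the nested "cache" repository and the flags are assumptions based on the error message above):
#!/bin/bash
# delete every function image under CONTAINER_REGISTRY; they are re-created on the next deploy
CONTAINER_REGISTRY=asia.gcr.io/project-name/gcf/asia-northeast1

for repo in $(gcloud container images list --repository="$CONTAINER_REGISTRY" --format='value(name)'); do
  # each function has its own repository, plus a nested "cache" repository
  for image in "$repo" "$repo/cache"; do
    for digest in $(gcloud container images list-tags "$image" --format='get(digest)' 2>/dev/null); do
      gcloud container images delete "${image}@${digest}" --force-delete-tags --quiet
    done
  done
done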
I've been having the same problem for the last few days and am in contact with support. I had the same log, and in my case it wasn't connected to the artifacts, because the artifacts rebuild themselves automatically on deploy (read below about a subtle case related to the artifacts and how to fix it). Deleting the functions and redeploying solved it for me.
Artifacts auto cleanup
Note that if the artifacts bucket is empty, then the problem is somewhere else.
But if it's not empty, then to resolve any possible problems related to the artifacts auto cleanup, manually delete the whole "containers" folder in the artifacts bucket (console steps and a CLI sketch follow below), which should solve it. Then just redeploy.
Make sure not to delete the artifacts bucket itself!
Doug from Firebase confirmed, in the question you are referring to, that removing the artifacts content is safe.
So, here is how to delete it:
Go to the Google Cloud console, select your project -> Storage -> Browser: https://console.cloud.google.com/storage/browser
Select the "artifacts" bucket
Choose "containers" and delete it
If the problem was here, it should work fine after that.
This happens because the deletion rule you refer to in your question checks the "last updated" timestamp of each file, while on redeploy only some of the files are updated. So the next day the rule deletes some of the files while leaving the others, which leads to an inconsistent state of the bucket. That is why you remove everything manually.
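For context, the kind of age-based rule being described looks roughly like this (an illustration of the problem scenario, not a recommendation; bucket name and age are placeholders):
cat > lifecycle.json <<'EOF'
{
  "rule": [
    { "action": {"type": "Delete"}, "condition": {"age": 1} }
  ]
}
EOF
gsutil lifecycle set lifecycle.json gs://us.artifacts.MY_PROJECT.appspot.com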
I'm currently using the "Ruby 2.6 running on 64bit Amazon Linux 2/3.0.2" image. Looking inside the EC2 instance at /var/log/eb-engine.log (the "eb logs" command won't show me this), I see a recurring error:
[ERROR] failed to parse JSON file
/opt/elasticbeanstalk/deployment/app_version_manifest.json with error:
json: cannot unmarshal string into Go struct field
AppVersionManifest.Serial of type uint64
When I check that file, I do not know what is wrong with it, or what is preventing that file from being parsed, if that is actually the problem:
{ "RuntimeSources":{"my_api":{"my_api-source_alfa0.2":"s3url":""}}},"DeploymentId":9,"Serial":"23","VersionLabel":"my_api-source_alfa0.2"}
The serial "23" seems pretty parsable to me. Please help!
What causes this
I believe this is a bug.
In some cases, this can occur if you try to terminate or rebuild your Elastic Beanstalk environment and the operation fails to delete your AWSEBSecurityGroup.
There are reports (see comments) of other causes besides this.
How to fix it
The AWS document How do I terminate or rebuild my AWS Elastic Beanstalk environment when the AWSEBSecurityGroup fails to delete? describes how to resolve this, but I excerpted the main steps below, in case that link ever breaks:
Open the AWS CloudFormation console.
From the Stack Name column, choose the stack that failed to delete.
Note: The Status column of your stack shows DELETE_FAILED.
From the Actions menu, choose Delete Stack.
In the Delete Stack pop-up window, choose AWSEBSecurityGroup, and then choose Yes, Delete.
Terminate or rebuild the Elastic Beanstalk environment.
The linked docs have other steps if you prefer the CLI or have a more complex setup.
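For reference, a hedged sketch of the CLI equivalent (the stack name is a placeholder; Elastic Beanstalk stacks are usually named like awseb-e-xxxxxxxxxx-stack, and --retain-resources only works on a stack already in DELETE_FAILED state). It retains the security group that failed to delete and removes the rest of the stack:
aws cloudformation delete-stack \
  --stack-name awseb-e-xxxxxxxxxx-stack \
  --retain-resources AWSEBSecurityGroup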
Then what?
After you've deleted the group and rebuilt your environment, you won't get the app_version_manifest.json error any more. Deploy your app.
Once it's done, if you SSH in and run…
cat /opt/elasticbeanstalk/deployment/app_version_manifest.json
…you'll notice that Serial is now correctly represented as a JSON number.
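To double-check that symptom, a small sketch (assumes jq is installed; the path is the one from the question):
sudo jq '.Serial, (.Serial | type)' /opt/elasticbeanstalk/deployment/app_version_manifest.json
# a healthy manifest prints:   23    "number"
# the broken one above prints: "23"  "string"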
I recently added a few new DAGs to production Airflow and, as a result, decided to scale up the number of nodes in the Composer pool. After doing so I got the error: Can't decrypt _val for key=<KEY>, invalid token or value. This now happens for every single DAG that uses variables. It's not the same key either; it depends on which variables the DAG needs.
I immediately scaled Composer back down to 3 nodes and the problem persisted.
I have tried re-saving all of the variables and recreating them in the UI (which says they are all valid), and recreating them via the CLI (which lists every single one as invalid).
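In case it's useful, the CLI check looked roughly like this (a sketch; the environment name, location, and the Airflow 1.x-style syntax are assumptions):
gcloud composer environments run my-environment \
  --location us-central1 \
  variables -- --get my_key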
I have also tried updating configuration to try and reboot the server, and manually stopping the VM instances.
Composer also seems to remove the ability to update the Fernet key, so I can't try using a new one. For some reason, the permanent key Composer has assigned now appears to be invalid.
Is there anything else that can be done to remedy that problem short of recreating the environment?
I managed to fix this problem by adding a new Python package. It seems that adding a package is the only way to really "reboot" the environment. The reboot invalidated all of my variables and connections when it had finished, but I was able to just add those back in rather than having to recreate the entire environment.
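For anyone wanting to try the same, a hedged sketch of adding a package to force the rebuild (the package, version, environment name, and location are all placeholders):
gcloud composer environments update my-environment \
  --location us-central1 \
  --update-pypi-package six==1.16.0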
Heard back about this issue: according to Google, Composer creates a custom image for the environment and passes it to each node, and if that image got corrupted during scaling, the only way to fix it is to add a new Python package so the image is rebuilt. Incidentally, version 1.3.0 of Composer is much better, as the scheduler is restarted every 10 minutes, which should resolve some of the later issues I experienced.
I am building a web-app using Elastic beanstalk on AWS and was wondering if there was a way that I can edit the source code without having to re-upload a zip of my application every time I want to make an edit.
Thanks!
The Elastic Beanstalk environment is based on EC2 instances. You can connect to your instances using SSH and, inside an instance, find your deployed source code. If you use a non-compiled language, like JavaScript (Node.js) or Python, you can edit the code directly. If you use Java, you will need to upload the source code and compile it, perhaps using the environment's JDK.
But keep in mind two details:
You must install your compiled/edited code in the same path used by Elastic Beanstalk;
If your instance is reinitialized, your changes will be lost, because in that case Elastic Beanstalk will fetch a fresh copy of your code based on your last upload.
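As a rough sketch of what that looks like in practice (the environment name and file are placeholders, and /var/app/current is the usual location on Amazon Linux platforms, though it can vary):
eb ssh my-environment
cd /var/app/current       # where Elastic Beanstalk typically keeps the deployed application
sudo vi application.py    # edit in place; these changes are lost on the next deploy or instance replacement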
When I trigger via Jenkins (CodeDeploy plugin), I get the following error -
No such file or directory - /opt/codedeploy-agent/deployment-root/edbe4bd2-3999-4820-b782-42d8aceb18e6/d-8C01LCBMG/deployment-archive/appspec.yml
However, if I trigger a deployment into the same deployment group via CodeDeploy directly, and specify the same zip in S3 (obtained via the Jenkins trigger), this step passes.
What does this mean, and how do I find a workaround? I am currently working on integrating a few things, and so will need to deploy both via CodeDeploy directly and via Jenkins. I will run the CodeDeploy-triggered deployment when I need to ensure that the smaller unit is functioning well.
Update
Just mentioning another point, in case it applies. I was previously using a different CodeDeploy "application" and "deployment group" on the same EC2 instances, and deploying via Jenkins and via CodeDeploy directly as well. In order to fix some issue (allegedly, failed deployments not allowing existing files to be overwritten), I had deleted everything inside the /opt/codedeploy-agent/deployment-root/<directory containing deployments> directory, trying to follow what was mentioned in this answer. Note that I deleted only items inside that directory. Thereafter, I started getting this "appspec.yml not found in deployment archive" error. So I created a new application and deployment group, and I have been working with those since.
So, another point to consider is whether I should do some further cleanup, in case the Jenkins-triggered deployment is somehow still affected by those deletions (even though it refers to the new application and deployment group).
As part of its process, CodeDeploy needs to reference previous deployments for redeployment and deployment rollback operations. These references are maintained outside of the deployment archive folders. If you delete these archives manually, as you describe, a CodeDeploy install can get fatally corrupted: the references left to previous deployments are no longer correct or consistent, and deploys will fail.
The best thing at this point is to remove the old installation completely and re-install. This will allow the CodeDeploy agent to work correctly again.
I have learned the hard way not to remove/modify any of the CodeDeploy install folders or files manually. Even if you change apps or deployment groups, CodeDeploy will figure it out itself, without the need for any manual cleanup.
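For completeness, a reinstall on Amazon Linux / RHEL looks roughly like this (a sketch; the S3 bucket is region-specific, us-east-1 is shown as an example):
sudo service codedeploy-agent stop
sudo yum erase codedeploy-agent -y
sudo rm -rf /opt/codedeploy-agent
cd /tmp
wget https://aws-codedeploy-us-east-1.s3.us-east-1.amazonaws.com/latest/install
chmod +x ./install
sudo ./install auto
sudo service codedeploy-agent start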
In order to do a deployment, the bundle needs to contain an appspec.yml file, and the file needs to be at the top level of the archive. It seems the error occurs because the host agent can't find the appspec.yml file.
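A quick way to verify this (a sketch; the directory name is a placeholder): zip from inside the application folder so appspec.yml ends up at the archive root, then inspect the listing.
cd my-app
zip -r ../bundle.zip .
unzip -l ../bundle.zip | head   # should list "appspec.yml", not "my-app/appspec.yml"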