The GCP command appcfg has been deprecated. appcfg had an appcfg rollback subcommand that was used when a deployment failed.
What is its equivalent for gcloud (the new command)? I can't find it in Google GCP documentation.
More context:
Rolling back in appcfg was not meant for managing traffic or going back to the previous version. It was used to remove the lock on your deployment.
If you had an unsuccessful deployment, you could not deploy any more. appcfg rollback removed that lock so you could deploy again.
I don't think there is a direct equivalent of appcfg rollback. However, I would highly recommend considering the traffic splitting option.
This will let you redirect the traffic from one of your versions to another, even between old versions of your service.
Let's imagine this:
You have version 1 of your service and it works just fine.
A couple of weeks later you decide to deploy a new version: version 2.
However, the deployment fails and your app is completely down. You are losing users and money. Everything is on fire.
You can easily switch the traffic to the trusty version 1 by redirecting 100% of the traffic to it.
Version 2 is out of the game until you deploy a new version.
The advantage of this is that you don't have to wait until a rollback is done; the traffic is immediately redirected to an old version. Additionally, there is the gcloud app services set-traffic command so you can do it via the CLI.
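If it helps, a minimal sketch of that from the CLI looks like this (the service name default and version id v1 are placeholders):
$ # List the deployed versions of the service
$ gcloud app versions list --service=default
$ # Send 100% of the traffic back to the known-good version
$ gcloud app services set-traffic default --splits v1=1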
Hope this is helpful!
I have an ECS task which has 2 containers using 2 different images, both hosted in ECR. There are 2 GitHub repos for the two images (app and api), and a third repo for my IaC code (infra). I am managing my AWS infrastructure using Terraform Cloud. The ECS task definition is defined there using Cloudposse's ecs-alb-service-task, with the containers defined using ecs-container-definition. Presently I'm using latest as the image tag in the task definition defined in Terraform.
I am using CircleCI to build the Docker containers when I push changes to GitHub. I am tagging each image with latest and the variable ${CIRCLE_SHA1}. Both repos also update the task definition using the aws-ecs orb's deploy-service-update job, setting the tag used by each container image to the SHA1 (not latest). Example:
container-image-name-updates: "container=api,tag=${CIRCLE_SHA1}"
When I push code to the repo for e.g. api, a new version of the task definition is created, the service's version is updated, and the existing task is restarted using the new version. So far so good.
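For context, my understanding is that the orb's image-tag update is roughly equivalent to this AWS CLI sequence (the family name, registry address, and use of jq here are illustrative, not my real setup):
$ # Fetch the current task definition, swap the api container's image tag,
$ # strip read-only fields, and register the result as a new revision
$ aws ecs describe-task-definition --task-definition my-task-family \
    --query 'taskDefinition' > taskdef.json
$ jq --arg img "ACCOUNT.dkr.ecr.REGION.amazonaws.com/api:${CIRCLE_SHA1}" \
    'del(.taskDefinitionArn, .revision, .status, .requiresAttributes, .compatibilities, .registeredAt, .registeredBy)
     | .containerDefinitions |= map(if .name == "api" then .image = $img else . end)' \
    taskdef.json > taskdef-new.json
$ aws ecs register-task-definition --cli-input-json file://taskdef-new.json
The orb then updates the service to use that new revision.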
The problem is that when I update the infrastructure with Terraform, the service isn't behaving as I would expect. The ecs-alb-service-task has a boolean called ignore_changes_task_definition, which is true by default.
When I leave it as true, Terraform Cloud successfully creates a new version whenever I Apply changes to the task definition. (A recent example was to update environment variables.) BUT it doesn't update the version used by the service, so the service carries on using the old version. Even if I stop a task, it will respawn using the old version. I have to manually go in and use the Update flow, or push changes to one of the code repos. Then CircleCI will create yet another version of the task definition and update the service.
If I instead set this to false, Terraform Cloud will undo the changes to the service performed by CircleCI. It will reset the task definition version to the last version it created itself!
So I have three questions:
How can I get Terraform to play nice with the task definitions created by CircleCI, while also updating the service itself if I ever change it via Terraform?
Is it a problem to be making changes to the task definition from THREE different places?
Is it a problem that the image tag is latest in Terraform (because I don't know what the SHA1 is)?
I'd really appreciate some guidance on how to properly set up this CI flow. I have found next to nothing online about how to use Terraform Cloud with CI products.
I have learned a bit more about this problem. It seems like the right solution is to use a CircleCI workflow to manage Terraform Cloud, instead of having the two services effectively competing with each other. By default Terraform Cloud will expect you to link a repo with it and it will auto-plan every time you push. But you can turn that off and use the terraform orb instead to run plan/apply via CircleCI.
You would still leave ignore_changes_task_definition set to true. Instead, you'd add another step to the workflow after the terraform/apply step has made the change. This would be aws-ecs/run-task, which should relaunch the service using the most recent task definition, which was (possibly) just created by the previous step. (See the task-definition parameter.)
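In raw AWS CLI terms, getting the service onto the newest revision after a Terraform change would look something like this (cluster, service, and family names are placeholders):
$ # Omitting a revision number makes ECS pick the latest ACTIVE revision
$ aws ecs update-service --cluster my-cluster --service my-service \
    --task-definition my-task-family
$ # Or, if only the image contents changed, force the tasks to be replaced
$ aws ecs update-service --cluster my-cluster --service my-service \
    --force-new-deployment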
I have decided that this isn't worth the effort for me, at least not at this time. The conflict between Terraform Cloud and CircleCI is annoying, but isn't that acute.
I am using a Spinnaker implementation set up on GCP using the spinnaker-for-gcp tools. My initial setup worked fine. However, we recently had to re-configure our GKE clusters (independently of Spinnaker). Consequently I deleted and re-added our gke-accounts. After doing that the Spinnaker UI appears to show the existing GKE-based applications but if I click on any of them there are no clusters or load balancers listed anymore! Here are the spinnaker-for-gcp commands that I executed:
$ hal config provider kubernetes account delete company-prod-acct
$ hal config provider kubernetes account delete company-dev-acct
$ ./add_gke_account.sh # for gke_company_us-central1_company-prod
$ ./add_gke_account.sh # for gke_company_us-west1-a_company-dev
$ ./push_and_apply.sh
When the above didn't work I did an experiment where I deleted the two accounts and added an account with a different name (but the same GKE cluster) and ran push_and_apply. As before, the output messages seemed to indicate that everything worked, but the Spinnaker UI continued to show all the old account names, despite the fact that I deleted them and added new ones (which did not show up). And, as before, no details could be seen for any of the applications. Also note that hal config provider kubernetes account list did show the new account name and did not show the old ones.
Any ideas for what I can do, other than completely recreating our Spinnaker installation? Is there anything in particular that I should look for in the Spinnaker logs in GCP to provide more information?
Thanks in advance.
-Mark
The problem turned out to be that the data that was in my .kube/config file in Cloud Shell was obsolete. Removing that file, recreating it (via the appropriate kubectl commands) and then running the commands mentioned in my original description fixed the problem.
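For anyone hitting the same thing, the fix boiled down to something like this (cluster names, locations, and project are placeholders):
$ # Remove the stale kubeconfig and re-fetch credentials for each GKE cluster
$ rm ~/.kube/config
$ gcloud container clusters get-credentials prod-cluster --region us-central1 --project my-project
$ gcloud container clusters get-credentials dev-cluster --zone us-west1-a --project my-project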
Note, though, that it took a lot of shell-script and GCP-log reading by our team to figure out the problem. Ultimately, it would have been nice if the add_gke_account.sh or push_and_apply.sh scripts could have detected the issue, presumably by verifying that the expected changes did, in fact, occur in the running Spinnaker.
I am trying to figure out which is the best practice when creating a new server on AWS EC2.
To do that I chose Tableau Server. The first time, as the Tableau docs recommend, I did the install myself, but I would like to keep everything as automated as possible. The idea behind that is: if the EC2 instance gets destroyed, how can I recover everything fast?
I am using Terraform to keep all the AWS infrastructure as code, but the installation itself is not automated yet.
To do that, I have two options: Ansible (which I have never worked with before) or, in this particular case, Tableau's automated install script in Python, which I could add to the EC2 launch template's user data and then bring it up in minutes with Terraform.
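To give an idea of the second option, the user data on the launch template would look roughly like this (the installer name, its flags, and the S3 paths are my assumptions and would need to be checked against Tableau's automated install documentation):
#!/bin/bash
# EC2 user data sketch: unattended Tableau Server install at first boot.
# NOTE: the installer name, its flags, and the S3 paths are placeholders.
set -euo pipefail
aws s3 cp s3://my-bucket/tableau/automated-installer /tmp/automated-installer
aws s3 cp s3://my-bucket/tableau/config.json /tmp/config.json
aws s3 cp s3://my-bucket/tableau/registration.json /tmp/registration.json
aws s3 cp s3://my-bucket/tableau/secrets /tmp/secrets
aws s3 cp s3://my-bucket/tableau/tableau-server.rpm /tmp/tableau-server.rpm
chmod +x /tmp/automated-installer
/tmp/automated-installer -s /tmp/secrets -f /tmp/config.json \
    -r /tmp/registration.json --accepteula /tmp/tableau-server.rpm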
Which one should be chosen, and why? Both seem to accomplish the final goal.
It also raises some doubts, such as:
It brings the server up with a full installation of the software, but to get all the users and the rest of the Tableau setup back I still have to restore from a snapshot anyway, right? Is there any other tool to do that?
Also, if the manual install of the software is fast enough, why should I keep the installation as code instead of just documenting the installation script, and keep only the infrastructure as code?
I'm presently looking into GCP's Deployment Manager to deploy new projects, VMs and Cloud Storage buckets.
We need a web front end that authenticated users can connect to in order to deploy the required infrastructure, though I'm not sure what DevOps tools are recommended to work with this system. We have an instance of Jenkins and Octopus Deploy, though I see on Google's Configuration Management page (https://cloud.google.com/solutions/configuration-management) they suggest other tools like Ansible, Chef, Puppet and Saltstack.
I'm supposing that through one of these I can update something simple like a name variable in the config.yaml file and deploy a project.
Could I also ensure a chosen name for a project, VM or Cloud Storage bucket fits with a specific naming convention with one of these systems?
Which system do others use and why?
I use Deployment Manager, since all third-party tools rely on the presence of GCP APIs, and on trusting that those APIs keep pace with the actual functionality of the underlying GCP tech.
GCP is decidedly behind the curve on API development, which means that even if you wanted to use TF or whatever, at some point you're going to be stuck inside the SDK, anyway. So that's why I went with Deployment Manager, as much as I wanted to have my whole infra/app deployment use other tools that I was more comfortable with.
To specifically answer your question about validating naming schema, what you would probably want to do is write a wrapper script that uses the gcloud deployment-manager subcommand. Do your validation in the wrapper script, then run the gcloud deployment-manager stuff.
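As a rough sketch (the naming pattern and file names are just examples):
#!/bin/bash
# deploy.sh -- validate the deployment name, then hand off to Deployment Manager.
# The regex below is an example convention, not a GCP requirement.
set -euo pipefail
NAME="$1"
if [[ ! "$NAME" =~ ^[a-z]+-(dev|test|prod)-[a-z0-9]+$ ]]; then
    echo "Deployment name '$NAME' does not match the naming convention" >&2
    exit 1
fi
gcloud deployment-manager deployments create "$NAME" --config config.yaml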
Word of warning about Deployment Manager: it makes troubleshooting very difficult. Very often it will obscure the error that can help you actually establish the root cause of a problem. I can't tell you how many times somebody in my office has shouted "UGGH! Shut UP with your Error 400!" I hope that Google takes note from my pointed survey feedback and refactors DM to pass the original error through.
Anyway, hope this helps. GCP has come a long way, but they've still got work to do.
I'm automating app deployment to cloud foundry. So in the start command, I do a db migration. What can happen is that the migration would fail and as the result, the app will be dead. Is there some predefined strategy that can be used to rollback to the last working deployment, or I should manually store the last working version, check for failure and in that case redeploy the stored version?
The typical strategy used to deploy apps on Cloud Foundry is blue/green. This generally works like this:
1) Push the new app under a new name & host, like my-app-new.
2) Test the app & make sure it works.
3) When you're satisfied, change the route mapping from the old app to the new app.
4) Delete the old app & optionally rename the new app.
Step #3 is where the cut-over happens. Prior to that all traffic keeps flowing to the old app.
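At the CLI level, the cut-over in step 3 looks roughly like this (app names, domain, and hostname are placeholders):
$ # Map the production route to the new app, then unmap it from the old one
$ cf map-route my-app-new example.com --hostname my-app
$ cf unmap-route my-app-old example.com --hostname my-app
$ # Once you're confident, clean up
$ cf delete my-app-old
$ cf rename my-app-new my-app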
This is documented more here.
https://docs.cloudfoundry.org/devguide/deploy-apps/blue-green.html
I'd say this often works well, but sometimes there are problems. Where this breaks down is with steps #1 & #2, if your app cannot have multiple instances of itself running or if migrations to your service are so different that the old app breaks when you update the database. It definitely helps if you keep this strategy in mind as you develop your app.
Aside from that, which has historically been the way to go, you could take a look at the new v3 API functionality. With v3, apps now retain multiple versions of a droplet. With this, you can rollback to a previous version of a droplet.
http://v3-apidocs.cloudfoundry.org/version/3.36.0/index.html#droplets
You can run cf v3-droplets to see the available droplets and cf v3-set-droplet to change the droplet being used.
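For example, something along these lines (the app name and GUID are placeholders, and the exact flag for passing the droplet GUID may differ between CLI versions):
$ # List the droplets the platform has kept for the app
$ cf v3-droplets my-app
$ # Point the app at an earlier droplet by its GUID, then restart
$ cf v3-set-droplet my-app --droplet-guid <previous-droplet-guid>
$ cf v3-restart my-app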
That said, this will only rollback the droplet. It would not rollback a service like a database schema. If you need to do that, you'd need reverse migrations or perhaps even restore from a backup.
Hope that helps!
I work on very similar automation processes.
Daniel has explained the process very well. I think you're looking for the blue-green deployment methodology.
1) Read up on blue green deploy here:
https://docs.cloudfoundry.org/devguide/deploy-apps/blue-green.html
2) Look at this plugin or implement blue green deploy manually:
https://github.com/contraband/autopilot
3) Blue-green restage plugin (a nice to have, in case you need to restage the app but not cause any downtime to the clients):
https://github.com/orange-cloudfoundry/cf-plugin-bg-restage
It works by creating a temporary app and copying the env vars/routes/code from the working app to the temp app.
The temp app accepts traffic while the original app is being restaged.
Traffic then moves back to the original app once it has been restaged, and the temporary app is deleted.