We've started to use Google Cloud Platform's Artifact Registry, where pricing is per GB per month.
But how can I see how much storage is being used and by what?
It also looks like all pushed images are kept forever by default, so with every build the repository will just keep growing? How do I (automatically?) delete old builds, keeping only the most recent one (or the most recent N, or only tagged images) available?
It seems disingenuous to price us per GB but not provide any means to investigate how much storage is being used or to prune it, so I'm hoping we've missed something.
Edited to add: We have CI/CD pipelines creating between 20 and 50 new images a day. Having to delete them manually is not maintainable in the long run.
Edited to add: Essentially I'm looking for sethvargo/gcr-cleaner ("Delete untagged image refs in Google Container Registry, as a service"), but for Artifact Registry instead of Container Registry, which it is set to replace. Or the shell-script gist (also GCR-only) that inspired gcr-cleaner.
GCR Cleaner does support purging from Artifact Registry. I verified this myself and updated the documentation to reflect that. I don't plan on changing the tool's name since it's pretty well recognized, but it works with both GCR and AR.
I hope somebody can come up with a better answer, but I came across [FR] Show Image size information in Artifact Registry GUI [156322291] in Google's Issue Tracker, so this is a known issue.
And gcr-cleaner has this issue: Support for Artifact Registry · Issue #9 · sethvargo/gcr-cleaner, which was closed because it went stale.
It's looking like Artifact Registry is not yet mature enough for prime time, and that I'm better off using Container Registry for the time being. A shame, though.
Artifact Registry should introduce support for retention policies in Q4 '22. Until then, some GCP customers will be able to sign up for the feature as early previewers. You can find more information at Configure cleanup policies for repositories.
Another way around this would be to use Cloud Scheduler or Cloud Run jobs to do the work.
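If you go the scheduled-job route, a rough sketch of the cleanup work could look like the build config below, run on a schedule (e.g. kicked off by Cloud Scheduler, or wrapped in a Cloud Run job instead). The repository path is a placeholder, and the --filter/--format expressions are assumptions worth verifying against your gcloud version:

    # cleanup.cloudbuild.yaml - hypothetical scheduled deletion of untagged images.
    steps:
      - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk:slim'
        entrypoint: 'bash'
        args:
          - '-c'
          - |
            IMAGE="europe-docker.pkg.dev/my-project/my-repo/my-app"  # placeholder path
            # List untagged digests and delete them one by one.
            # (The filter/format syntax is an assumption - check
            #  `gcloud artifacts docker images list --help` for your SDK version.)
            for digest in $(gcloud artifacts docker images list "$IMAGE" \
                --include-tags --filter='-tags:*' --format='get(version)'); do
              gcloud artifacts docker images delete "$IMAGE@$digest" --quiet
            done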
Related
I am quite new to GCP. My requirement is to implement a DevOps solution on GCP. We are going to use Python scripts and BigQuery.
I want to know: which is the most cost-effective DevOps solution to implement on GCP?
The built-in CI/CD solution on Google Cloud is Cloud Build. I like this tool and I strongly recommend it. In summary, you define the steps of your build, and each step runs in a container: load it, use it, move on to the next one. Only the /workspace directory is kept between steps (which sometimes creates challenges). You can redefine the entrypoint and environment variables for each step, and so on. There are a lot of capabilities, and plenty of help and tips on Stack Overflow and elsewhere.
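To make that concrete, here is a minimal, hypothetical cloudbuild.yaml along those lines; the test command, image name and Artifact Registry path are placeholders rather than a prescription:

    # cloudbuild.yaml - minimal sketch: run tests, then build and push an image.
    steps:
      # Each step runs in its own container; only /workspace carries over.
      - name: 'python:3.11'
        entrypoint: 'bash'
        args: ['-c', 'pip install -r requirements.txt && pytest']
      - name: 'gcr.io/cloud-builders/docker'
        args: ['build', '-t', 'europe-docker.pkg.dev/$PROJECT_ID/my-repo/my-app:$SHORT_SHA', '.']
    # Images listed here are pushed when the build succeeds.
    # ($SHORT_SHA is populated for builds started by a trigger.)
    images:
      - 'europe-docker.pkg.dev/$PROJECT_ID/my-repo/my-app:$SHORT_SHA'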
For the pricing, it's interesting: you get 120 minutes of build time free per day, and that's per billing account.
I'm not a Jenkins expert; I last used it 6 years ago!
The main differences are the GUI and plugins. With Jenkins you can do everything from the GUI; with Cloud Build, only the triggers and the running/finished jobs (plus logs) are viewable in the console, and the steps' configuration is done purely in code (a YAML or JSON file). Plugins map to custom workers (builder images), but you don't have the same library as with Jenkins.
On the other hand, Jenkins needs to be hosted on a VM, which has to be upgraded and patched. And you pay a minimum fee for Jenkins even if you don't run any builds.
An opinionated answer is difficult, because it depends on many factors!
I'm presently looking into GCP's Deployment Manager to deploy new projects, VMs and Cloud Storage buckets.
We need a web front end that authenticated users can connect to in order to deploy the required infrastructure, though I'm not sure which DevOps tools are recommended to work with this system. We have an instance of Jenkins and Octopus Deploy, though I see that Google's Configuration Management page (https://cloud.google.com/solutions/configuration-management) suggests other tools like Ansible, Chef, Puppet and SaltStack.
I'm supposing that through one of these I can update something simple like a name variable in the config.yaml file and deploy a project.
Could I also ensure a chosen name for a project, VM or Cloud Storage bucket fits with a specific naming convention with one of these systems?
Which system do others use and why?
I use Deployment Manager, since all third-party tools rely on the presence of GCP APIs, as well as on trusting that those APIs are in line with the actual functionality of the underlying GCP tech.
GCP is decidedly behind the curve on API development, which means that even if you wanted to use Terraform or whatever, at some point you're going to end up stuck inside the SDK anyway. That's why I went with Deployment Manager, as much as I wanted my whole infra/app deployment to use other tools I was more comfortable with.
To specifically answer your question about validating a naming schema: what you would probably want to do is write a wrapper script around the gcloud deployment-manager subcommand. Do your validation in the wrapper script, then run the gcloud deployment-manager commands.
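For reference, the config that the wrapper hands to gcloud can stay very simple. A hypothetical config.yaml (resource name and location are placeholders) might look like this:

    # config.yaml - the wrapper script validates the names below against your
    # naming convention, then runs:
    #   gcloud deployment-manager deployments create my-deployment --config config.yaml
    resources:
      - name: acme-dev-assets-bucket   # must pass your naming-convention check
        type: storage.v1.bucket
        properties:
          location: EU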
Word of warning about Deployment Manager: it makes troubleshooting very difficult. Very often it will obscure the error that can help you actually establish the root cause of a problem. I can't tell you how many times somebody in my office has shouted "UGGH! Shut UP with your Error 400!" I hope that Google takes note from my pointed survey feedback and refactors DM to pass the original error through.
Anyway, hope this helps. GCP has come a long way, but they've still got work to do.
Is there a way to set up a build's priority in a YAML-based pipeline? There seem to be references to build priority in the Azure DevOps API, but nothing on how to do this via YAML. I thought there might be some docs in the Triggers section, but no.
We need this because we have some fast-building NuGet packages, but they get starved by slow-building pipelines, making turnaround time for packages painful.
The closest thing I could come up with to work around this is agent demands in the YAML
    demands:
    - Agent.ComputerName -equals XYZ
to separate build pipelines, but this is a bit of a hack and doesn't use agents efficiently.
A way to set this in UI would be acceptable, but I couldn't seem to find anything.
Recently Azure DevOps introduced the ability to manually specify that a build/release runs next. This manifests as a Run next button on the queued run.
So while you can't say "this pipeline always takes priority" yet, you can manually force a specific run to the front of the queue.
If you need a specific pipeline to always take priority, then you likely want to setup a separate agent pool just for those pipelines, or use demands as Leo Liu mentioned.
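For example, the fast NuGet pipelines could point at their own pool; the pool and machine names below are hypothetical:

    # azure-pipelines.yml for the fast NuGet package builds
    pool:
      name: FastNuGetPool   # dedicated self-hosted pool, so slow builds can't starve these
    # ...or keep a shared pool and route by capability instead:
    # pool:
    #   name: SharedPool
    #   demands:
    #   - Agent.ComputerName -equals XYZ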
Setting build priority in yaml or UI
I'm afraid this feature is not yet supported in Azure DevOps at this moment.
There is a popular user voice request about it; you can upvote it and check the feedback on that ticket.
Currently, as a workaround, do just what you did: set demands in the build definitions to force building on specific agents.
Hope this helps.
I am just getting started with both GCP and Google Cloud Data Fusion, and have just watched the intro video. I see that pipelines can be exported. I was wondering how we might promote a pipeline from, say, a Dev to a Prod environment? My guess is that after some testing, the exported file is copied to the Prod branch in Git, from where we need to invoke the APIs to deploy it? Also, what about connection details: how do we avoid hard-coding the source/destination configurations and credentials?
Yes. You would have to export and re-import the pipeline.
About the first question: if you have different environments for development and production, you can export your pipeline and import it into the correct environment.
I didn't understand the second question very well. The official Data Fusion plugins provide a standard way to supply your credentials (for example via macros and runtime arguments rather than hard-coded values). If you need a better answer, please explain your question in a little more detail.
We are migrating our container building process to Google Container Builder. We have multiple repos using Node or Scala.
Given the current Container Builder features, is it possible to cache dependencies between two builds (e.g. node_modules, .ivy, ...)? It's really time- (and money-) consuming to download everything each time.
I know it's possible to build a custom Docker image with everything packaged inside, but we would prefer to avoid that solution.
For example, can we mount a persistent volume for that purpose, as we used to do with DroneIO? Or, even better, have it handled automatically as in Bitbucket Pipelines?
Thanks
GCB doesn't currently support mounting a persistent volume across builds.
In the meantime, the team recently published a document outlining some options for speeding up builds, which might be useful: https://cloud.google.com/container-builder/docs/speeding-up-builds
In particular, caching generated output to Google Cloud Storage and pulling it in at the beginning of your build might help in your case.
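As a concrete illustration, a hypothetical cloudbuild.yaml for a Node repo could restore and save node_modules via a Cloud Storage bucket; the bucket name is a placeholder, and the very first build simply finds no cache:

    # cloudbuild.yaml - sketch of caching node_modules in GCS between builds.
    steps:
      # Restore the cache if it exists ("|| true" keeps the first build from failing).
      - name: 'gcr.io/cloud-builders/gsutil'
        entrypoint: 'bash'
        args: ['-c', 'mkdir -p node_modules && gsutil -m rsync -r gs://my-build-cache/node_modules ./node_modules || true']
      # Install dependencies - fast when the cache was restored.
      - name: 'gcr.io/cloud-builders/npm'
        args: ['install']
      # Push the refreshed cache back for the next build.
      - name: 'gcr.io/cloud-builders/gsutil'
        entrypoint: 'bash'
        args: ['-c', 'gsutil -m rsync -r ./node_modules gs://my-build-cache/node_modules']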