Is it possible to track the number of Docker pulls in Google Artifact Registry? - google-cloud-platform

I'd like to measure the number of times a Docker image has been downloaded from a Google Artifact Registry repository in my GCP project.
Is this possible?

Interesting question.
I think this would be useful too.
There don't appear to be any Cloud Monitoring metrics for this (no artifactregistry resource type is listed, nor any metrics).
However, you can use Artifact Registry audit logs; you'll need to explicitly enable Data Access logs and then look for e.g. Docker-GetManifest entries.
NOTE I'm unsure whether this can be achieved from gcloud.
Watching the browser's developer tools, I learned that audit logs are configured in the project's IAM policy using AuditConfigs. I still don't know whether this functionality is available through gcloud (anyone?), but evidently you can effect these changes directly using API calls, e.g. projects.setIamPolicy:
gcloud projects get-iam-policy ${PROJECT}
auditConfigs:
- auditLogConfigs:
  - logType: DATA_READ
  - logType: DATA_WRITE
  service: artifactregistry.googleapis.com
bindings:
- members:
  - user:me
  role: roles/owner
etag: BwXanQS_YWg=
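If you'd rather stay in gcloud than call the API directly, the same change can likely be made by round-tripping the policy file; this is only a sketch, and note that set-iam-policy replaces the whole policy, so keep the existing bindings intact:
# Export the current policy, add the auditConfigs block shown above by hand,
# then write the edited file back.
gcloud projects get-iam-policy ${PROJECT} --format=yaml > policy.yaml
# ... edit policy.yaml ...
gcloud projects set-iam-policy ${PROJECT} policy.yaml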
Then, pull something from the repo and query the logs:
PROJECT=[[YOUR-PROJECT]]
REGION=[[YOUR-REGION]]
REPO=[[YOUR-REPO]]
FILTER="
logName=\"projects/${PROJECT}/logs/cloudaudit.googleapis.com%2Fdata_access\"
protoPayload.methodName=\"Docker-GetManifest\"
"
gcloud logging read "${FILTER}" \
--project=${PROJECT} \
--format="value(timestamp,protoPayload.methodName)"
Yields:
2022-03-20T01:57:16.537400441Z Docker-GetManifest
You ought to be able to create a logs-based metric for these too.
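As a sketch of that idea, the same filter can be turned into a log-based metric with gcloud (the metric name docker-pulls is made up), which you can then chart or alert on in Cloud Monitoring:
# Counts one log entry per Docker-GetManifest call, i.e. per pull
gcloud logging metrics create docker-pulls \
--project=${PROJECT} \
--description="Docker image pulls from Artifact Registry" \
--log-filter="${FILTER}"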

We do not yet have platform logs for Artifact Registry, unfortunately, so using the Cloud Audit Logs (CALs) is the only way to do this today. You can also turn the CALs into log-based metrics and get graphs and metrics that way too.
The recommendation to filter by 'Docker-GetManifest' is also correct: it's the only request type of which a Docker pull always produces exactly one. There will be a lot of other requests that are related but don't match 1:1. The logs will contain all requests (Docker-Token, 0 or more layer pulls), including API requests like ListRepositories, which is called by the UI in every AR region when you load the page.
Unfortunately, the theory about public requests not appearing is correct. CALs are about logging authentication events, and when a request has no authentication whatsoever, no CALs are generated.

Related

How to list/get the 'creator' of all GCP resources in a project?

Is there a way to list/get the owner (creator) of all the resources under a project?
I have already looked at the answers here and tried the Cloud Asset API (gcloud asset search-all-resources --scope=projects/123), but this doesn't list the creator of each resource. I have also referred to the sample resource-search queries here, but again they don't cover what I need.
Ideally I need the following, for example:
asset type - storage bucket
resource name - test_bucket
owner/creator/user - user123#org1.com or test#gservice_account.com
created - 02-02-2018
and the same for other asset types like Compute Engine instances, BigQuery datasets, etc.
Has anyone ever tried this?
What you are looking for is the audit logs.
As mentioned in the docs:
Google Cloud services write audit logs that record administrative activities and accesses within your Google Cloud resources. Audit logs help you answer "who did what, where, and when?" within your Google Cloud resources with the same level of transparency as in on-premises environments.
Also, here you can find a list of all the services that produce audit logs.
Take a look here at the best practices and things to be taken into consideration while working with audit logs.
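As a concrete sketch for one asset type (Cloud Storage; the log name and method name below apply to bucket creation, and Admin Activity audit logs are on by default), you could query who created each bucket like this:
PROJECT=[[YOUR-PROJECT]]
FILTER="
logName=\"projects/${PROJECT}/logs/cloudaudit.googleapis.com%2Factivity\"
protoPayload.methodName=\"storage.buckets.create\"
"
# principalEmail identifies who made the call
gcloud logging read "${FILTER}" \
--project=${PROJECT} \
--format="table(timestamp,protoPayload.authenticationInfo.principalEmail,protoPayload.resourceName)"
Keep in mind that Admin Activity logs are retained for a limited period (400 days by default), so creators of older resources may no longer be visible this way.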

Bringing Google Cloud costs to zero (compute engine)

In the billing section for one of my projects, costs for 'Compute Engine - E2 Instance Core' usage of 12 hours are listed every day. But there are no instances in the Compute Engine section. The project actually only contains special Google Maps API keys that cannot be transferred.
I have also tried to disable the Compute Engine API. Unfortunately this fails with the following error: Hook call/poll resulted in failed op for service 'compute.googleapis.com': Could not turn off service, as it still has resources in use] with failed services [compute.googleapis.com]
Any idea?
Based on the error message: ‘Could not turn off service, as it still has resources in use] with failed services [compute.googleapis.com]’
That means there are resources under the Compute Engine API, so you can either run a gcloud command to list the current instances or run a gcloud command to view the Asset Inventory. I suggest opening your GCP project in a Chrome incognito window and using Cloud Shell.
List instances
gcloud compute instances list
List Asset Inventory
gcloud asset search-all-resources
NOTE: The Asset Inventory API is not enabled by default, so after you run the command you’ll receive this message:
user#cloudshell:~ (project-id)$ gcloud asset search-all-resources
API [cloudasset.googleapis.com] not enabled on project [project-id].
Would you like to enable and retry (this will take a few minutes)?
(y/N)?
Type y to enable the API and see the output of the command.
Having said that, when you see the results on the screen you’ll be able to identify the resources under the Compute Engine API and all their components, e.g.:
---
additionalAttributes:
  networkInterfaces:
  - network: https://www.googleapis.com/compute/v1/projects/project-id/global/networks/default
    networkIP: 1.18.0.5
assetType: compute.googleapis.com/Instance
displayName: linux-instance
location: us-central1-a
name: //compute.googleapis.com/projects/project-id/zones/us-central1-a/instances/linux-instance
project: projects/12345678910
---
additionalAttributes: {}
assetType: compute.googleapis.com/Disk
displayName: linux-instance
location: us-central1-a
name: //compute.googleapis.com/projects/project-id/zones/us-central1-a/disks/linux-instance
project: projects/12345678910
---
As you can see, the two entries above describe the instance ‘linux-instance’ and its components (disk and IP address); all of them live under the compute.googleapis.com API.
If you need further assistance, please send the output of the command to a TXT file, remove the sensitive information (project ID, external IPs, internal IPs) and share the output with me so I can take a look at it.
Alternatively, you can sanitize the output of the command just as I did, by replacing the instance name, project ID, project number and IP address with fake data.
Please keep in mind that since this is a billing concern, the GCP billing team is also available to help you.
Curious.
There are some services that require Compute Engine resources, e.g. Kubernetes Engine, but I thought that, if they're used, the resources are always visible.
One way to surface the consumer of these resources may be to enumerate the project's services and eyeball the result for a service that may be consuming VMs:
gcloud services list --enabled --project=[[YOUR-PROJECT]]
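If the Cloud Asset API is enabled (see the previous answer), you could also narrow the search to Compute Engine assets; a sketch, with the project ID as a placeholder:
gcloud asset search-all-resources \
--scope=projects/[[YOUR-PROJECT]] \
--asset-types="compute.googleapis.com/Instance,compute.googleapis.com/Disk"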

Unable to collect metrics from customized fluentd on GKE

I'm having trouble enabling metrics on my GKE cluster after customizing fluentd in another namespace.
I made some changes to the fluentd ConfigMap; since GKE's default fluentd and its ConfigMap in the kube-system namespace can't be changed (changes always get reverted), I deployed fluentd and the event-exporter in another namespace.
But the metrics have been missing since I made the change. All the logs are OK and still show up in the logging viewer.
What needs to be done so GKE can collect the metrics again? Or, if I'm going about this the wrong way, is there any way to modify the default fluentd ConfigMap in kube-system?
I wasn't able to find anything useful on this topic, so I created a GCP support ticket.
Google provided one solution:
With Cloud Operations for GKE, you can collect just system logs [1]; that way monitoring remains enabled in your cluster. Please note that this option can be enabled only via the console, not via the gcloud command line. There is a tracking bug, https://issuetracker.google.com/163356799, for the same.
Further, you can deploy your own configurable fluentd DaemonSet to customize the application logs [2].
You will be running 2 DaemonSets for fluentd with this config; however, to reduce the amount of log duplication it is recommended that you decrease the logging from Cloud Operations to capture system logs only [2], while your customized fluentd DaemonSet captures your application workload logs.
The disadvantages of this approach are: ensuring your custom deployment doesn't overlap with something Cloud Operations is watching (i.e. files, logs), an increased number of API calls, and the fact that you will be responsible for updating, maintaining and managing your custom fluentd deployment.
[1] https://cloud.google.com/stackdriver/docs/solutions/gke/installing#controlling_the_collection_of_application_logs
[2] https://cloud.google.com/solutions/customizing-stackdriver-logs-fluentd
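As a quick sanity check (a sketch; the grep pattern simply matches any fluentd DaemonSet), you can list what is actually running and see whether the default agent in kube-system and your custom one overlap:
# GKE's managed logging agent runs as a DaemonSet in kube-system;
# your custom one should appear in its own namespace.
kubectl get daemonsets --all-namespaces | grep -i fluentd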

Kubernetes: Get mail once deployment is done

Is there a way to send a post-deployment mail for Kubernetes deployments on GCP/AWS?
It becomes harder to keep track of deployments on Kubernetes once the deployment team grows. Having a post-deployment mail service would ease the process, as it would also say who applied the deployment.
You could try to watch deployment events using https://github.com/bitnami-labs/kubewatch and a webhook handler.
Another option is to implement a customized solution with the Kubernetes API, for instance in Python (https://github.com/kubernetes-client/python), and run it as a separate notification pod in your cluster.
A third option is to manage deployments in a CI/CD pipeline where the actual deployment execution step is an "approval" step; you can then see the user who approved it, and the next step in the pipeline after approval can be the email notification.
Approval in CircleCI: https://circleci.com/docs/2.0/workflows/#holding-a-workflow-for-a-manual-approval
I don’t think such a feature is built into Kubernetes.
There is a watch mechanism, though, which you could use. Run the following GET query:
https://<api-server-url>/apis/apps/v1/namespaces/<namespace>/deployments?watch=true
The connection will not close and you’ll get a “notification” about each deployment change. Check the status fields. Then you can send the mail or do something else.
You’ll need to pass an authorization token to gain access to the API server. If you have kubectl set up, you can run a local proxy instead, which doesn’t need the token: kubectl proxy.
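A minimal sketch of that flow, assuming kubectl is configured and the deployments live in the default namespace:
# Start a local proxy to the API server (it handles authentication for you)
kubectl proxy --port=8001 &
# Stream deployment changes; each event arrives as one JSON line
curl -N "http://127.0.0.1:8001/apis/apps/v1/namespaces/default/deployments?watch=true"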
You can attach handlers to container lifecycle events. Kubernetes supports preStop and postStart events; Kubernetes sends the postStart event immediately after the container is started. Here is a snippet of the pod spec from the deployment manifest:
spec:
  containers:
  - name: <******>
    image: <******>
    lifecycle:
      postStart:
        exec:
          command: [********]
Considering GCP, one option could be to create a filter in Stackdriver Logging that matches the log entries produced when your deployment finishes, and then use the CREATE METRIC option, also in Stackdriver Logging, to turn it into a log-based metric.
With the metric created, use Stackdriver Monitoring to create an alert that sends e-mails. More details in the official documentation.
It looks like no one has mentioned the "native tool" Kubernetes provides for this yet.
Please note that there is a concept of Audit in Kubernetes.
It provides a security-relevant, chronological set of records documenting the sequence of activities that have affected the system, whether by individual users, administrators or other components of the system.
Each request, at each stage of its execution, generates an event, which is then pre-processed according to a certain policy and handled by a certain backend.
That allows the cluster administrator to answer the following questions:
what happened?
when did it happen?
who initiated it?
on what did it happen?
where was it observed?
from where was it initiated?
to where was it going?
The administrator can specify what events should be recorded and what data they should include with the help of audit policies.
There are a few backends that persist audit events to external storage:
Log backend, which writes events to a disk
Webhook backend, which sends events to an external API
Dynamic backend, which configures webhook backends through an AuditSink API object.
If you use the log backend, it is possible to collect the data with tools such as fluentd. With that data you can achieve more than just a post-deployment mail in Kubernetes.
Hope that helps!
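For reference, here is a minimal audit policy sketch; the file name is hypothetical, and the --audit-policy-file flag only applies where you run the API server yourself (on managed offerings such as GKE the control-plane audit configuration is handled for you):
cat <<'EOF' > audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# Record who changed Deployments, at metadata level
- level: Metadata
  resources:
  - group: "apps"
    resources: ["deployments"]
# Ignore everything else
- level: None
EOF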

Kubernetes Engine unable to pull image from non-private / GCR repository

I was happily deploying to Kubernetes Engine for a while, but while working on an integrated cloud container builder pipeline, I started running into trouble.
I don't know what changed. I cannot deploy to Kubernetes anymore, even in ways that worked before without the cloud builder.
The pod rollout process gives an error indicating that it is unable to pull from the registry. This seems weird, because the images exist (I can pull them using the CLI) and I have granted all possibly related permissions to my user and the cloud builder service account.
I get the error ImagePullBackOff and see this in the pod events:
Failed to pull image
"gcr.io/my-project/backend:f4711979-eaab-4de1-afd8-d2e37eaeb988":
rpc error: code = Unknown desc = unauthorized: authentication required
What's going on? Who needs authorization, and for what?
In my case, my cluster didn't have the storage read permission (OAuth scope), which is necessary for GKE to pull an image from GCR.
My cluster didn't have the proper permissions because I created it through Terraform and didn't include the node_config.oauth_scopes block. When creating a cluster through the console, the storage read scope is added by default.
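To check whether an existing cluster's nodes have that scope, the following sketch (cluster name and zone are placeholders) prints the OAuth scopes of the default node config; look for https://www.googleapis.com/auth/devstorage.read_only or the broader cloud-platform scope:
gcloud container clusters describe [[YOUR-CLUSTER]] \
--zone=[[YOUR-ZONE]] \
--format="value(nodeConfig.oauthScopes)"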
The credentials in my project somehow got messed up. I solved the problem by re-initializing a few APIs, including Kubernetes Engine, Deployment Manager and Container Builder.
The first time I tried this I didn't succeed, because to disable an API you first have to disable all the APIs that depend on it. If you do this via the GCloud web UI, you'll likely see a list of dependent services, not all of which can be disabled from the UI.
I learned that using the gcloud CLI you can list all the APIs of your project and disable everything properly.
Things worked after that.
The reason I knew things were messed up is that I had a copy of the same setup as a production environment, and there these problems did not exist. The development environment had gone through a lot of iterations and messing around with credentials, so somewhere things got corrupted.
These are some examples of useful commands:
gcloud projects get-iam-policy $PROJECT_ID
gcloud services disable container.googleapis.com --verbosity=debug
gcloud services enable container.googleapis.com
More info here, including how to restore service account credentials.