Fetch Cloud Storage bucket last access details - google-cloud-platform

How can I fetch the last access details for a Cloud Storage bucket? As of now, I can only find the last modified date for buckets and objects. Is there any way to fetch the last access details for buckets and objects? Do we need to enable logging for each object to get this, or are there other options available?

There are several types of logs you can enable to get this information.
Cloud Audit Logs is the recommended method for generating logs that track API operations performed in Cloud Storage:
- Cloud Audit Logs tracks access on a continuous basis.
- Cloud Audit Logs produces logs that are easier to work with.
- Cloud Audit Logs can monitor many of your Google Cloud services, not just Cloud Storage.
Audit logs are written in near real-time and are available like any other logs in GCP. You can view a summary of the audit logs for your project in the Activity Stream in the Google Cloud Console. A more detailed version of the logs can be found in the Logs Viewer.
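If Data Access audit logs are enabled for Cloud Storage, you can also query them programmatically rather than only in the Logs Viewer. Below is a minimal sketch using the google-cloud-logging Python client; the project and bucket names are hypothetical, and the filter simply narrows the results to data-access entries for one bucket.

from google.cloud import logging

client = logging.Client(project="my-project")  # hypothetical project ID

log_filter = (
    'logName="projects/my-project/logs/cloudaudit.googleapis.com%2Fdata_access" '
    'AND resource.type="gcs_bucket" '
    'AND resource.labels.bucket_name="my-bucket"'
)

for entry in client.list_entries(filter_=log_filter, order_by=logging.DESCENDING, max_results=20):
    # The payload carries fields such as methodName and resourceName.
    print(entry.timestamp, entry.payload)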
In some cases, you may want to use Access Logs instead. You most likely want to use access logs if:
- You want to track access to public objects, such as assets in a bucket that you've configured to be a static website.
- You want to track access to objects when the access is exclusively granted because of the Access Control Lists (ACLs) set on the objects.
- You want to track changes made by the Object Lifecycle Management feature.
- You intend to use authenticated browser downloads to access objects in the bucket.
- You want your logs to include latency information, or the request and response size of individual HTTP requests.
Unlike audit logs, access logs aren't sent to Stackdriver Logging in real time. Instead, they are offered as CSV files, generated hourly when there is activity to report in the monitored bucket, which you can download and view.
The access logs can provide an overwhelming amount of information. The documentation includes a table to help you identify all the fields provided in these logs.
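Enabling access (usage) logs is a bucket-level setting. Here is a minimal sketch with the google-cloud-storage Python client, assuming both buckets already exist; the bucket names and prefix are hypothetical. Note that, per the usage-logs documentation, the log bucket must first grant write access (for example roles/storage.objectCreator) to cloud-storage-analytics@google.com so Cloud Storage can deliver the hourly CSV files.

from google.cloud import storage

client = storage.Client(project="my-project")  # hypothetical project ID

# Bucket whose access you want logged, plus the bucket that will receive the logs.
monitored = client.get_bucket("my-website-bucket")
monitored.enable_logging("my-log-bucket", object_prefix="access-log")
monitored.patch()  # persist the logging configuration

print(monitored.get_logging())  # e.g. {'logBucket': 'my-log-bucket', 'logObjectPrefix': 'access-log'}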

Cloud Storage buckets are meant to serve high volumes of read requests through a variety of means. As such, reads don't write any additional data; that would not scale well. If you want to record when an object is read, you need the client code that reads the object to also write the current time to some persistent storage, as sketched below. Alternatively, you could force all reads through an API endpoint that performs the update itself. In either case, you are writing code and using additional resources to store this data.
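As a minimal sketch of that "do it in client code" approach (assuming the google-cloud-storage and google-cloud-logging libraries are installed; the project, bucket, object, and log names are hypothetical):

from datetime import datetime, timezone
from google.cloud import logging, storage

storage_client = storage.Client(project="my-project")
logging_client = logging.Client(project="my-project")
access_logger = logging_client.logger("object-read-audit")  # hypothetical log name

blob = storage_client.bucket("my-bucket").blob("reports/2023.csv")
data = blob.download_as_bytes()

# The read itself writes nothing, so record the access explicitly.
access_logger.log_struct({
    "object": f"gs://{blob.bucket.name}/{blob.name}",
    "accessed_at": datetime.now(timezone.utc).isoformat(),
    "bytes": len(data),
})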

Related

Monitoring downloads of Google Cloud Storage objects that have public URLs

I have some objects in a Google Cloud Storage bucket that are publicly downloadable on URLs like https://storage.googleapis.com/blahblahblah. I want to set up a monitoring rule that lets me see how often one of these objects is being downloaded. I have turned on the Data Read audit log as mentioned here, but I don't see any logs when I download the object from the storage.googleapis.com link. I have another bucket where downloads are performed through the Node Google Cloud Storage client library, and I can see download logs from that bucket, so it seems like downloads from the public URL don't get logged.
I also don't see a way to specify the object in a particular bucket when setting up an alert in Google Cloud. Is creating a new bucket solely for this object the best way to try to set up monitoring for the number of downloads, or is there something I'm missing here?
Google Cloud Audit Logs do not track access to objects that are public (allUsers or allAuthenticatedUsers).
Enable usage logs to track access to public objects (see the sketch after the link below).
Should you use usage logs or Cloud Audit Logs?
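Once the usage logs have been imported into BigQuery (a load sketch appears under the GCP Storage Bucket Access Logs question below), counting downloads of one public object is a simple query. A minimal sketch, assuming the table follows Google's published usage-log schema; the project, dataset, table, and object names are hypothetical.

from google.cloud import bigquery

client = bigquery.Client(project="my-project")

query = """
    SELECT COUNT(*) AS downloads
    FROM `my-project.storage_logs.usage`
    WHERE cs_method = 'GET'
      AND cs_object = 'downloads/my-public-file.zip'
      AND sc_status = 200
"""
downloads = list(client.query(query).result())[0]["downloads"]
print(f"Downloads so far: {downloads}")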

Google Cloud: get the usage log information of an API Key

I'm building a chat app with a Cloud Translation API feature. For each client I create a new API key so that I can identify each client's usage. The problem is the following:
I want to see the consumption of all API keys inside a project, something like the Operations Logging view:
But showing the timestamp and the name of the API key used, so that I can track each client's usage of the service and determine how much to bill them.
Update
Doing some additional research, I came across this article, which gives a walkthrough for gaining visibility into Service Account Keys (similar, but not what I needed). In this guide they create a log sink to push logs into BigQuery.
The problem now is that the filter used to extract the logs is the following:
logName:"projects/<PROJECT>/logs/cloudaudit.googleapis.com"
protoPayload.authenticationInfo.serviceAccountKeyName:"*"
The second line extracts logs that belong to a Service Account Key Name. But, as stated at the beginning of the question, I'm looking for API key logs, not service account key logs.
You can use Cloud Audit Logs. Cloud Audit Logs provides the following audit logs for each Cloud project, folder, and organization:
- Admin Activity audit logs
- Data Access audit logs
- System Event audit logs
- Policy Denied audit logs
Google Cloud services write audit log entries to these logs to help you answer the questions of "who did what, where, and when?" within your Google Cloud resources.
For this scenario, Data Access audit logs could be helpful. They contain API calls that read the configuration or metadata of resources, as well as user-driven API calls that create, modify, or read user-provided resource data. Data Access audit logs do not record data-access operations on resources that are publicly shared (available to allUsers or allAuthenticatedUsers) or that can be accessed without logging into Google Cloud.
As mentioned in the previous comment, these logs are disabled by default because they can be quite large; they must be explicitly enabled to be written.
However, the simplest way to view your API metrics is to use the Google Cloud Console's API Dashboard. You can see an overview of all your API usage, or you can drill down to your usage of a specific API.
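If you do want the Data Access audit logs in BigQuery, you can mirror the walkthrough from the question but filter on the Translation service instead of a service account key. A minimal sketch with the google-cloud-logging client; the sink, dataset, and project names are hypothetical, and whether an individual API key can be attributed from these entries is a separate question.

from google.cloud import logging

client = logging.Client(project="my-project")

log_filter = (
    'logName="projects/my-project/logs/cloudaudit.googleapis.com%2Fdata_access" '
    'AND protoPayload.serviceName="translate.googleapis.com"'
)

sink = client.sink(
    "translation-usage-sink",
    filter_=log_filter,
    destination="bigquery.googleapis.com/projects/my-project/datasets/translation_logs",
)
if not sink.exists():
    sink.create()
# After creation, grant sink.writer_identity write access to the BigQuery dataset.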

GCP Storage Bucket Access Logs

If you set a storage bucket as a static website, is there any way to trace who has accessed it? e.g. IP addresses, time viewed etc...
I have looked in the Stackdriver logs but they only show events, e.g. bucket created, files uploaded etc...
You will need to configure access logs for public buckets. Then you can import them into BigQuery for analysis (see the sketch after the link below).
Use access and storage logs if:
- You want to track access to public objects, such as assets in a bucket that you've configured to be a static website.
You'll be able to get all the information you need, such as IP address, time, region, zone, headers, and read/write operations, from the access log fields.
https://cloud.google.com/storage/docs/access-logs
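A minimal sketch of the BigQuery import, assuming usage logging is already enabled and the google-cloud-bigquery library is installed. The bucket, prefix, dataset, and table names are hypothetical; Google also publishes a schema file for these logs that you can pass instead of relying on autodetection.

from google.cloud import bigquery

client = bigquery.Client(project="my-project")

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # the usage logs include a header row
    autodetect=True,
)

load_job = client.load_table_from_uri(
    "gs://my-log-bucket/access-log_usage_*",  # hourly usage-log CSV objects
    "my-project.storage_logs.usage",
    job_config=job_config,
)
load_job.result()  # wait for the load to finish
print(f"Loaded {load_job.output_rows} rows")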
In GCP (of which GCS is a part), there is the concept of Audit Logs. These are normally switched off by default and can be enabled on a product-by-product basis. For GCS, the Data Access logs include DATA_READ, which claims to log information on "getting object data".
However, before we go much further, there is a huge caveat. It reads:
Cloud Audit Logs does not track access to public objects.
What this means is that if you have exposed the objects as publicly readable (which is common for a website hosted on GCS), then no logs are captured.
References:
Cloud Audit Logs with Cloud Storage
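For reference, enabling DATA_READ (and DATA_WRITE) logging for Cloud Storage amounts to adding an auditConfigs block to the project's IAM policy. This is usually done from the Console's Audit Logs page or with gcloud; the snippet below only illustrates the shape of that configuration.

import json

audit_config = {
    "auditConfigs": [
        {
            "service": "storage.googleapis.com",
            "auditLogConfigs": [
                {"logType": "DATA_READ"},
                {"logType": "DATA_WRITE"},
            ],
        }
    ]
}

print(json.dumps(audit_config, indent=2))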

how to securely publish logs to the cloud

My library is a CLI utility, and people get it by running pip install [libname]. I would like to automatically record exceptions that occur when people use it and store these logs in the cloud. I have found services that should do just that: AWS CloudWatch, GCP Stackdriver.
However, while looking at their API it appears that I would have to ship my private key in order for the library to authenticate to my account. This doesn't sound right and I am warned by the cloud providers not to do this.
Example from GCP fails, requires credentials:
from google.cloud import logging
client = logging.Client()  # raises DefaultCredentialsError when no credentials are available
logger = client.logger('log_name')
logger.log_text('A simple entry') # API call
Since a Python library exposes its source, I understand that any kind of credential I ship would carry the risk of people sending fake logs, but this is OK with me, as I would just limit the spending on my account for the (unexpected) case that somebody does just that. Of course, the credentials that ship with the library should be restricted to logging only.
Any example of how to enable logging to a cloud service from user machines?
For Azure Application Insights' "Instrumentation Key" there is a very good article about that subject here: https://devblogs.microsoft.com/premier-developer/alternative-way-to-protect-your-application-insights-instrumentation-key-in-javascript/
While I'm not familiar with the offerings of AWS or GCP, I would assume similar points are valid.
Generally speaking: while the instrumentation key is a method of authentication, it is not considered a very secret key in most scenarios. The worst damage somebody can do is to send unwanted logs. They cannot read any data or overwrite anything with that key. And you already stated above that you are not really worried in your case about the issue of unwanted logs.
So, as long as you are using an App Insights instance only for one specific application / purpose, I would say you are fine. You can still further aggregate that data in the background with data from different sources.
To add a concrete example to this: this little tool from Microsoft (the specific use case does not matter here) collects telemetry as well and sends it to Azure Application Insights, if the user does not opt out. I won't point to the exact code line, but their instrumentation key is checked in to that public GitHub repo for anybody to find.
Alternatively, the most secure way would be to send data from the browser to your custom API on your server, then forward it to the Application Insights resource with the correct instrumentation key (see the diagram in the linked article).
(Source: the link above)
App Insights SDK for python is here btw: https://github.com/microsoft/ApplicationInsights-Python
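For completeness, a minimal sketch based on that repository's README: the instrumentation key ships with the CLI tool, which matches the "not a very secret key" reasoning above. The key below is a placeholder and run_cli is a hypothetical stand-in for the real CLI entry point.

from applicationinsights import TelemetryClient

tc = TelemetryClient("00000000-0000-0000-0000-000000000000")  # placeholder instrumentation key

def run_cli():
    raise RuntimeError("example failure")  # stands in for the real CLI logic

try:
    run_cli()
except Exception:
    tc.track_exception()  # records the exception currently being handled
finally:
    tc.flush()  # telemetry is batched; flush before the process exits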
To write logs to Stackdriver requires credentials. Anonymous connections to Stackdriver are NOT supported.
Under no circumstances give non-privileged users logging read permissions. Stackdriver records sensitive information in Stackdriver Logs.
Google Cloud IAM provides the role roles/logging.logWriter. This role gives users just enough permissions to write logs. This role does not grant read permissions.
The role roles/logging.logWriter is fairly safe. A user can write logs, but cannot read, overwrite or delete logs. I say fairly safe as there is private information stored in the service account. I would create a separate project only for Stackdriver logging with no other services.
The second issue with providing external users access is cost. Stackdriver logs are $0.50 per GiB. You would not want someone uploading a ton of logfile entries. Make sure that you monitor external usage. Create an alert to monitor costs.
Creating and managing service accounts
Chargeable Stackdriver Products
Alert on Stackdriver usage
Stackdriver Permissions and Roles
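Putting that together, here is a minimal sketch of shipping a key that only carries roles/logging.logWriter and using it explicitly instead of relying on default credentials. The file, project, and log names are hypothetical.

from google.cloud import logging

client = logging.Client.from_service_account_json(
    "bundled-logwriter-key.json",  # shipped with the library; grants logWriter only
    project="my-logging-only-project",
)

logger = client.logger("cli-exceptions")
logger.log_struct({
    "message": "unhandled exception in mytool",
    "version": "1.2.3",
    "traceback": "Traceback (most recent call last): ...",
})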

Can I track all users' file access on GCP, i.e. when a file was downloaded, read, etc.?

I want to upload a document to Google Cloud Storage, but for security purposes I want to know whenever that file is downloaded, i.e. I don't want someone downloading the file locally without my knowing.
If you intend to stop people from downloading it, IAM & Bucket permissions are what you're looking for.
If it's to monitor access for an audit trail, you can check the logs in Stackdriver by filtering to GCS and looking for your bucket/file.
The log viewer filtering details are described here.
If your Cloud Storage object is public, Google Cloud does not log or track access.
If you put a CDN in front of the bucket, and then use the CDN URL, Stackdriver will log access via the CDN.
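Those CDN requests show up as external HTTP(S) load balancer request logs. A minimal sketch of reading them with the Python client, assuming Cloud CDN sits behind a load balancer with logging enabled; the project and file names are hypothetical, and the field names follow the API's httpRequest representation.

from google.cloud import logging

client = logging.Client(project="my-project")

log_filter = (
    'resource.type="http_load_balancer" '
    'AND httpRequest.requestUrl:"my-document.pdf"'
)

for entry in client.list_entries(filter_=log_filter, order_by=logging.DESCENDING, max_results=10):
    req = entry.http_request or {}
    print(entry.timestamp, req.get("remoteIp"), req.get("requestUrl"), req.get("cacheHit"))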