If you set up a storage bucket as a static website, is there any way to trace who has accessed it? e.g. IP addresses, time viewed, etc.
I have looked in the Stackdriver logs, but they only show events such as bucket creation and file uploads.
You will need to configure access logs for public buckets; then you can import them into BigQuery for analysis, as sketched below.
Use Access and Storage logs if:
You want to track access to public objects, such as assets in a bucket that you've configured to be a static website.
The access log fields include all the information you need, such as IP address, time, region, zone, headers, and read/write operations.
https://cloud.google.com/storage/docs/access-logs
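For example, here is a minimal sketch using the Python client libraries. The bucket, dataset, and prefix names are placeholders, and the log bucket must first grant write access to Google's log-delivery group as described in the linked docs:

```python
# Sketch: enable usage (access) logging on a website bucket and later load
# the hourly CSV logs into BigQuery. All names below are placeholders.
from google.cloud import storage, bigquery

storage_client = storage.Client()

site_bucket = storage_client.get_bucket("example-website-bucket")

# Point the website bucket's usage logs at a separate log bucket.
site_bucket.enable_logging("example-logs-bucket", object_prefix="access-log")
site_bucket.patch()  # persist the logging configuration

# Later: load the delivered CSV log objects into BigQuery for analysis.
# (Google publishes a schema for these logs; autodetect is used here for brevity.)
bq_client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # the usage-log CSVs include a header row
    autodetect=True,
)
load_job = bq_client.load_table_from_uri(
    "gs://example-logs-bucket/access-log_usage_*",
    "my_dataset.gcs_usage_logs",
    job_config=job_config,
)
load_job.result()  # wait for the load to finish
```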
In GCP (of which GCS is a part), there is the concept of Audit Logs. These are normally switched off by default and can be enabled on a product-by-product basis. For GCS, the Data Access Logs include DATA_READ, which claims to log information on "getting object data".
However, before we go much further, there is a huge caveat. It reads:
Cloud Audit Logs does not track access to public objects.
What this means is that if you have exposed the objects as publicly readable (which is common for a website hosted on GCS), then no logs are captured.
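If your objects are not public and you have enabled the DATA_READ log, a sketch like the following (using the Python logging client; the bucket name is a placeholder) could pull the matching entries:

```python
# Sketch: list Data Access audit log entries for object reads, assuming the
# DATA_READ audit log has been enabled for Cloud Storage on the project.
# Remember: reads of *public* objects will not appear here.
from google.cloud import logging

client = logging.Client()
log_filter = (
    'resource.type="gcs_bucket" '
    'AND protoPayload.methodName="storage.objects.get" '
    'AND resource.labels.bucket_name="example-bucket"'
)
for entry in client.list_entries(filter_=log_filter, page_size=50):
    print(entry.timestamp, entry.payload)
```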
References:
Cloud Audit Logs with Cloud Storage
I have some objects in a Google Cloud Storage bucket that are publicly downloadable on URLs like https://storage.googleapis.com/blahblahblah. I want to set up a monitoring rule that lets me see how often one of these objects is being downloaded. I have turned on the Data Read audit log as mentioned here, but I don't see any logs when I download the object from the storage.googleapis.com link. I have another bucket where downloads are performed through the Node Google Cloud Storage client library, and I can see download logs from that bucket, so it seems like downloads from the public URL don't get logged.
I also don't see a way to specify the object in a particular bucket when setting up an alert in Google Cloud. Is creating a new bucket solely for this object the best way to try to set up monitoring for the number of downloads, or is there something I'm missing here?
Google Cloud Audit Logs do not track access to objects that are public (allUsers or allAuthenticatedUsers).
Enable usage logs to track access to public objects.
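Assuming you have already loaded the usage logs into a BigQuery table (the table and object names below are placeholders), a query along these lines could count downloads of a single object; cs_object, cs_method, and sc_status are fields from the usage-log schema:

```python
# Sketch: once the usage logs are in BigQuery, count successful GET requests
# for one object. Table and object names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()
query = """
    SELECT COUNT(*) AS downloads
    FROM `my_project.my_dataset.gcs_usage_logs`
    WHERE cs_object = 'my-object'   -- the object you want to monitor
      AND cs_method = 'GET'
      AND sc_status = 200
"""
for row in client.query(query).result():
    print("downloads:", row.downloads)
```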
Should you use usage logs or Cloud Audit Logs?
How can I fetch the last access details for a Cloud Storage bucket? As of now, I can find only the last modified date for buckets and objects. Is there any way to fetch the last access details for buckets and objects? Do we need to enable logging for each object, or are there other options available?
There are several types of logs you can enable to get this information.
Cloud Audit Logs is the recommended method for generating logs that track API operations performed in Cloud Storage:
Cloud Audit Logs tracks access on a continuous basis.
Cloud Audit Logs produces logs that are easier to work with.
Cloud Audit Logs can monitor many of your Google Cloud services, not just Cloud Storage.
Audit Logs are logged in "near" real-time and available like any other logs in GCP. You can view a summary of the audit logs for your project in the Activity Stream in the Google Cloud Console. A more detailed version of the logs can be found in the Logs Viewer.
In some cases, you may want to use Access Logs instead. You most likely want to use access logs if:
You want to track access to public objects, such as assets in a bucket that you've configured to be a static website.
You want to track access to objects when the access is exclusively granted because of the Access Control Lists (ACLs) set on the objects.
You want to track changes made by the Object Lifecycle Management feature.
You intend to use authenticated browser downloads to access objects in the bucket.
You want your logs to include latency information, or the request and response size of individual HTTP requests.
As opposed to audit logs, access logs aren't sent to Stackdriver Logging in real time. Instead, they are offered as CSV files, generated hourly when there is activity to report in the monitored bucket, which you can download and view.
The access logs can provide an overwhelming amount of information. You'll find here a table to help you identify all the information provided in these logs.
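As a rough sketch, you could fetch the delivered CSVs with the Python client; the log bucket name and "access-log" prefix are placeholders matching whatever you configured when enabling logging:

```python
# Sketch: list and download the hourly usage-log CSVs delivered to the
# log bucket. Names are placeholders.
from google.cloud import storage

client = storage.Client()
for blob in client.list_blobs("example-logs-bucket", prefix="access-log_usage"):
    blob.download_to_filename(blob.name.replace("/", "_"))
    print("downloaded", blob.name)
```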
Cloud Storage buckets are meant to serve high volumes of read requests through a variety of means. As such, reads don't also write any additional data - that would not scale well. If you want to record when an object gets read, you would need the client code reading the object to also write the current time to some persistent storage. Alternatively, you could force all reads through some API endpoint that performs the update manually. In either case, you are writing code and using additional resources to store this data.
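Here is a minimal sketch of the first approach, where the reading client records its own access; the BigQuery table is hypothetical, and you could substitute any persistent store:

```python
# Sketch: record each read yourself, since GCS won't. The reader writes an
# access row to a (hypothetical) BigQuery table after each download.
import datetime
from google.cloud import storage, bigquery

storage_client = storage.Client()
bq_client = bigquery.Client()

def read_and_record(bucket_name: str, object_name: str) -> bytes:
    """Download an object and log the access time to BigQuery."""
    data = storage_client.bucket(bucket_name).blob(object_name).download_as_bytes()
    bq_client.insert_rows_json(
        "my_project.my_dataset.object_reads",  # placeholder table
        [{"bucket": bucket_name, "object": object_name,
          "read_at": datetime.datetime.utcnow().isoformat()}],
    )
    return data
```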
I want to upload a document to Google Cloud Storage (GCP), but I want to know whenever that file is downloaded, for security purposes; i.e. I don't want someone to download the file locally without my knowing.
If you intend to stop people from downloading it, IAM & Bucket permissions are what you're looking for.
If it's to monitor access for an audit trail, you can check the logs in Stackdriver by filtering to GCS and looking for your bucket/file.
The log viewer filtering details are described here:
If your Cloud Storage object is public, Google Cloud does not log or track access.
If you put a CDN in front of the bucket, and then use the CDN URL, Stackdriver will log access via the CDN.
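To check whether a bucket is public in the first place (and therefore invisible to audit logs), something like this sketch could work; the bucket name is a placeholder, and note it only inspects bucket-level IAM, not per-object ACLs:

```python
# Sketch: check whether a bucket grants access to allUsers or
# allAuthenticatedUsers; if it does, audit logs won't record those reads.
from google.cloud import storage

client = storage.Client()
policy = client.bucket("example-bucket").get_iam_policy(requested_policy_version=3)
public_members = {"allUsers", "allAuthenticatedUsers"}
for binding in policy.bindings:
    if public_members & set(binding["members"]):
        print("public via role:", binding["role"])
```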
I'm setting up a Coldline bucket for unstructured data backup.
The bucket-level public access setting for my Coldline storage bucket is set to "Per Object" and the object-level public access setting is "Not Public".
But whenever I generate an access link to my private storage objects, I'm able to use the generated access link without any credentials (say, in an incognito window).
Does this mean that if someone is able to generate the link (highly unlikely) or snoop my GET requests (highly likely), they get view access to my private stored objects?
I think you are referring to the usage of Signed URLs, which can be implemented to give time-limited read or write access to GCP buckets and objects. Keep in mind that this method will give access to anyone in possession of the URL, regardless of whether they have a Google account, as you mentioned.
If you want to implement user-authenticated methods, it is recommended to use IAM and ACL permissions. You can take a look at the Access Control Options document to learn more about the available alternatives for controlling who has access to your Cloud Storage.
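For completeness, here is a short sketch of generating a V4 signed URL with a deliberately short lifetime, which limits the window in which a snooped URL can be replayed (bucket and object names are placeholders):

```python
# Sketch: generate a short-lived V4 signed URL. Requires credentials that
# can sign, e.g. a service account key. Names are placeholders.
import datetime
from google.cloud import storage

client = storage.Client()
blob = client.bucket("my-coldline-bucket").blob("backups/archive.tar")
url = blob.generate_signed_url(
    version="v4",
    expiration=datetime.timedelta(minutes=15),  # keep the validity short
    method="GET",
)
print(url)
```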
I was trying to understand Google Cloud Platform storage but couldn't really comprehend the language used in the documentation. I wanted to ask: can you use the storage and the APIs to store photos users take within your application, and also get the images back if provided with a URL? And even if you can, would it be a safe and reasonable method to do so?
Yes, you can pretty much use a storage bucket to store any kind of data.
In terms of transferring images from an application to storage buckets, the application must be authorised to write to the bucket.
One option is to use a service account key within the application. A service account is a special account that can be used by an application to authorise to various Google APIs, including the storage API.
There is some more information about service accounts here and information here about using service account keys. These keys can be used within your application, and allow the application to inherit the permission/scopes assigned to that service account.
In terms of retrieving images using a URL, one possible option would be to use signed URLs which would allow you to give users read or write access to an object (in your case images) in a bucket for a given amount of time.
Access to bucket objects can also be controlled with ACLs (Access Control Lists). If you're happy for your images to be available publicly (i.e. accessible to everybody), it's possible to set an ACL with 'Reader' access for AllUsers.
More information on this can be found here.
Should you decide to make the images available publicly, the URL format to retrieve the object/image from the bucket would be:
https://storage.googleapis.com/[BUCKET_NAME]/[OBJECT_NAME]
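As a rough sketch of the whole flow with the Python client (the key file, bucket, and object names are placeholders):

```python
# Sketch: authorise with a service account key, upload an image, and make
# it publicly readable so it can be fetched at the URL format above.
from google.cloud import storage

client = storage.Client.from_service_account_json("service-account-key.json")
bucket = client.bucket("my-photo-bucket")

blob = bucket.blob("user-photos/photo1.jpg")
blob.upload_from_filename("photo1.jpg", content_type="image/jpeg")

blob.make_public()  # grants 'Reader' to AllUsers on this object
print(blob.public_url)  # https://storage.googleapis.com/my-photo-bucket/user-photos/photo1.jpg
```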
EDIT:
In relation to using an interface to upload the files before the files land in the bucket, one option would be to have an instance with an external IP address (or multiple instances behind a load balancer) where the images are initially uploaded. You could mount Cloud Storage on this instance using FUSE so that uploaded files are easily transferred to the bucket. In terms of databases, you have the option of manually installing your database on a Compute Engine instance, or using a fully managed database service such as Cloud SQL.