Log Buckets from Google - google-cloud-platform

Is it possible to download a Log Storage (Log bucket) from Google Cloud Platform, specifically the one created by default? In case someone knows they can explain how to do it.

The possible solution for the question is you need to choose the required logs and then get the logs for the time period of 1 day to download them in JSON or CSV format.
Step1- From the logging console goto advanced filtering mode
Step2- To choose the log type use filtering query, for example
resource.type="audited_resource"
logName="projects/xxxxxxxx/logs/cloudaudit.googleapis.com%2Fdata_access"
resource.type="audited_resource"
logName="organizations/xxxxxxxx/logs/cloudaudit.googleapis.com%2Fpolicy"
Step3- You can download them as JSON and CSV format
If you have a huge number of audit logs generated per day then above one will not work out. So, you need to export logs to Cloud storage and a big query for further analysis. Please note that cloud logging doesn’t charge to export logs but destination charges might apply.
Another option, you can use the following gcloud command to download the logs.
gcloud logging read "logName : projects/Your_Project/logs/cloudaudit.googleapis.com%2Factivity" --project=Project_ID --freshness=1d >> test.txt

Related

GCP logging: Find all resources (recently) used by a specific user

This is part of my journey to get a clear overview of which users/service accounts are in my GCP Project and when they last logged in.
Endgoal: to be able to clean up users/service-accounts if needed when they weren't on GCP for a long time.
First question:
How can I find in the logs when a specific user used resources, so I can determine when this person last logged in?
You need the Auditlogs and to see them you can run the following query in Cloud Logging:
protoPayload.#type="type.googleapis.com/google.cloud.audit.AuditLog"
protoPayload.authenticationInfo.principalEmail="your_user_name_email_or_your_service_account_email"
You can also check the Activity logs and filter on a user:
https://console.cloud.google.com/home/activity
Related questions + answers:
Pull "last access" information on projects from Google Cloud Platform (GCP)
IAM users and last login date in google cloud
How to list, find, or search iam policies across services (APIs), resource types, and projects in google cloud platform (GCP)?
There is now also the newly added Log Analytics.
This allows you to use SQL to query your logs.
Your logging buckets _Default and _Required need to be upgraded to be able to use Log Analytics:
https://cloud.google.com/logging/docs/buckets#upgrade-bucket
After that you use for example the console to use SQL on your logs:
https://console.cloud.google.com/logs/analytics
Unfortunately, at the moment you can only query the logs that were created after you've switched on Log Analytics.
Example query in the Log Analytics:
SELECT
timestamp,
proto_Payload.audit_log.authentication_info.principal_email,
auth_info.resource,
auth_info.permission,
auth_info.granted
FROM
`logs__Default_US._AllLogs`
left join unnest(proto_Payload.audit_log.authorization_info) auth_info
WHERE
timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
and proto_payload.type = "type.googleapis.com/google.cloud.audit.AuditLog"
and proto_Payload.audit_log.authentication_info.principal_email in ("name_of_your_user")
ORDER BY
timestamp

How to schedule an export from a BigQuery table to Cloud Storage?

I have successfully scheduled my query in BigQuery, and the result is saved as a table in my dataset. I see a lot of information about scheduling data transfer in to BigQuery or Cloud Storage, but I haven't found anything regarding scheduling an export from a BigQuery table to Cloud Storage yet.
Is it possible to schedule an export of a BigQuery table to Cloud Storage so that I can further schedule having it SFTP-ed to me via Google BigQuery Data Transfer Services?
There isn't a managed service for scheduling BigQuery table exports, but one viable approach is to use Cloud Functions in conjunction with Cloud Scheduler.
The Cloud Function would contain the necessary code to export to Cloud Storage from the BigQuery table. There are multiple programming languages to choose from for that, such as Python, Node.JS, and Go.
Cloud Scheduler would send an HTTP call periodically in a cron format to the Cloud Function which would in turn, get triggered and run the export programmatically.
As an example and more specifically, you can follow these steps:
Create a Cloud Function using Python with an HTTP trigger. To interact with BigQuery from within the code you need to use the BigQuery client library. Import it with from google.cloud import bigquery. Then, you can use the following code in main.py to create an export job from BigQuery to Cloud Storage:
# Imports the BigQuery client library
from google.cloud import bigquery
def hello_world(request):
# Replace these values according to your project
project_name = "YOUR_PROJECT_ID"
bucket_name = "YOUR_BUCKET"
dataset_name = "YOUR_DATASET"
table_name = "YOUR_TABLE"
destination_uri = "gs://{}/{}".format(bucket_name, "bq_export.csv.gz")
bq_client = bigquery.Client(project=project_name)
dataset = bq_client.dataset(dataset_name, project=project_name)
table_to_export = dataset.table(table_name)
job_config = bigquery.job.ExtractJobConfig()
job_config.compression = bigquery.Compression.GZIP
extract_job = bq_client.extract_table(
table_to_export,
destination_uri,
# Location must match that of the source table.
location="US",
job_config=job_config,
)
return "Job with ID {} started exporting data from {}.{} to {}".format(extract_job.job_id, dataset_name, table_name, destination_uri)
Specify the client library dependency in the requirements.txt file
by adding this line:
google-cloud-bigquery
Create a Cloud Scheduler job. Set the Frequency you wish for
the job to be executed with. For instance, setting it to 0 1 * * 0
would run the job once a week at 1 AM every Sunday morning. The
crontab tool is pretty useful when it comes to experimenting
with cron scheduling.
Choose HTTP as the Target, set the URL as the Cloud
Function's URL (it can be found by selecting the Cloud Function and
navigating to the Trigger tab), and as HTTP method choose GET.
Once created, and by pressing the RUN NOW button, you can test how the export
behaves. However, before doing so, make sure the default App Engine service account has at least the Cloud IAM roles/storage.objectCreator role, or otherwise the operation might fail with a permission error. The default App Engine service account has a form of YOUR_PROJECT_ID#appspot.gserviceaccount.com.
If you wish to execute exports on different tables,
datasets and buckets for each execution, but essentially employing the same Cloud Function, you can use the HTTP POST method
instead, and configure a Body containing said parameters as data, which
would be passed on to the Cloud Function - although, that would imply doing
some small changes in its code.
Lastly, when the job is created, you can use the Cloud Function's returned job ID and the bq CLI to view the status of the export job with bq show -j <job_id>.
Not sure if this was in GA when this question was asked, but at least now there is an option to run an export to Cloud Storage via a regular SQL query. See the SQL tab in Exporting table data.
Example:
EXPORT DATA
OPTIONS (
uri = 'gs://bucket/folder/*.csv',
format = 'CSV',
overwrite = true,
header = true,
field_delimiter = ';')
AS (
SELECT field1, field2
FROM mydataset.table1
ORDER BY field1
);
This could as well be trivially setup via a Scheduled Query if you need a periodic export. And, of course, you need to make sure the user or service account running this has permissions to read the source datasets and tables and to write to the destination bucket.
Hopefully this is useful for other peeps visiting this question if not for OP :)
You have an alternative to the second part of the Maxim answer. The code for extracting the table and store it into Cloud Storage should work.
But, when you schedule a query, you can also define a PubSub topic where the BigQuery scheduler will post a message when the job is over. Thereby, the scheduler set up, as described by Maxim is optional and you can simply plug the function to the PubSub notification.
Before performing the extraction, don't forget to check the error status of the pubsub notification. You have also a lot of information about the scheduled query; useful is you want to perform more checks or if you want to generalize the function.
So, another point about the SFTP transfert. I open sourced a projet for querying BigQuery, build a CSV file and transfert this file to FTP server (sFTP and FTPs aren't supported, because my previous company only used FTP protocol!). If your file is smaller than 1.5Gb, I can update my project for adding the SFTP support is you want to use this. Let me know

Splunk migration to S3 DataLake

We're looking at moving away from Splunk as our datastore and looking at AWS Data Lake backed by S3.
What would be the process of migrating data from Splunk to S3? I've read lots of documents talking about archiving data from Splunk to S3 but not sure if this archives the data as a usable format OR if its in some archive format that needs to be restored to splunk itself?
Check out Splunk's SmartStore feature. It moves your non-hot buckets to S3 so you save storage costs. Running SmartStore on AWS only makes sense, however, if you run Splunk on AWS. Otherwise, the data export charges will bankrupt you. Data export applies when Splunk needs to search a bucket that's stored in S3 and so copies that bucket to an indexer. See https://docs.splunk.com/Documentation/Splunk/8.0.0/Indexer/AboutSmartStore for more information.
From what I've read there are a couple of ways to do it:
Export using the Web UI
Export using REST API Endpoint
Export using CLI
Copy certain files in the filesystem
So far I've tried using the CLI to export and I've managed to export around 500,000 events at a time using
splunk search "index=main earliest=11/11/2019:00:00:01 latest=11/15/2019:23:59:59" -output rawdata -maxout 500000 > output2.dmp
However - I'm not sure how I can accurately repeat this step to make sure I include all 100 million+ events. IE search from DATE A to DATE B for 500,000 records, then search from DATE B to DATE C for the next 500,000 - without missing any events inbetween.

Get the BigQuery Table creator and Google Storage Bucket Creator Details

I am trying to identify the users who created tables in BigQuery.
Is there any command line or API that would provide this information. I know that audit logs do provide this information, but I was looking for a command line which could do the job so that i could wrap this in a shell script and run them against all the tables at one time. Same for Google Storage Buckets as well. I did try
gsutil iam get gs://my-bkt and looked for "role": "roles/storage.admin" role, but I do not find the admin role with all buckets. Any help?
This is a use case for audit logs. BigQuery tables don't report metadata about the original resource creator, so scanning via tables.list or inspecting the ACLs don't really expose who created the resource, only who currently has access.
What's the use case? You could certainly export the audit logs back into BigQuery and query for table creation events going forward, but that's not exactly the same.
You can find it out using Audit Logs. You can access them both via Console/Log Explorer or using gcloud tool from the CLI.
The log filter that you're interested in is this one:
resource.type = ("bigquery_project" OR "bigquery_dataset")
logName="projects/YOUR_PROJECT/logs/cloudaudit.googleapis.com%2Factivity"
protoPayload.methodName = "google.cloud.bigquery.v2.TableService.InsertTable"
protoPayload.resourceName = "projects/YOUR_PROJECT/datasets/curb_tracking/tables/YOUR_TABLE"
If you want to run it from the command line, you'd do something like this:
gcloud logging read \
'
resource.type = ("bigquery_project" OR "bigquery_dataset")
logName="projects/YOUR_PROJECT/logs/cloudaudit.googleapis.com%2Factivity"
protoPayload.methodName = "google.cloud.bigquery.v2.TableService.InsertTable"
protoPayload.resourceName = "projects/YOUR_PROJECT/datasets/curb_tracking/tables/YOUR_TABLE"
'\
--limit 10
You can then post-process the output to find out who created the table. Look for principalEmail field.

Pointing multiple projects' log sinks to one bucket

I have a few GCP projects with log sinks to different storage buckets. I'd like to combine them into a single bucket. But the stackdriver export doesn't add any distinguishing information to the object names it creates; they all look like cloudaudit.googleapis.com/activity/2017/11/14/00:00:00_00:59:59_S0.json
What will happen if I start pushing them all to a single bucket? Will the different project sinks overwrite each other's objects? Is there any way to distinguish which project created the logs just from the object?
If not, I guess I should switch to pubsub sinks, and then write some code that produces objects with more desirable names. Are there any established patterns or examples for doing this?
Update: I filed https://issuetracker.google.com/issues/69371200 for this issue.
To enable this, just select custom destination on the sink and point to the bucket with this format: storage.googleapis.com/[BUCKET_ID].
I've just enabled this in a couple of my projects, as I'm curious to see the results when exporting to a bucket. However, I have been using a single BQ sink for all my projects, and the tables created have all the logs mixed, so no logs lost when using a single BQ sink.
I'm assuming for a GCS sink will work in the same way, but I'll tell you in a couple of days.
If a single bucket sink does not work, you can always use a single BQ sink (that will help in analyzing the logs), and when you no longer want to have them in BQ, export them and store the files wherever you want.
Also, since you'll be writing to your sink constantly, you can't use nearline or coldline, so the storage pricing is better in BQ than a regional bucket (0.02 USD/GB in BQ vs somewhere between 0.02 and 0.35 USD/GB for regional storage, depending on the region; BQ has 10GB free monthly, GCS 5GB).
I would generally recommend using a BQ sink, but I'll tell you what happens with my bucket logs.
Update:
A few hours later, and I've verified that shared bucket sinks work pretty much as you would expect. It concatenates logs chronologically regardless of the project origin, and only creates a single file for each time window. Hope this helps! (I still prefer BQ as a log sink...)
Update 2:
For the behavior you seek in the feature request, I would use BQ, but you could just as easily grep the project ID and separate the logs:
grep '"logName":"projects/<your-project-id>/' mixed-log.json > single-project-log.json
Or just get a cloud function triggered by bucket updates (so, every time you receive a log file in the sink) to run this for you.
Or namespace you buckets and have a cloud function moving them to wherever you need as soon as they are written.
The possibilities are endless!
If you have an organization or folder which includes all the projects that you want to collect logs from, then you can create a sink that collects from all projects in that org/folder.
Unfortunatlely, you cannot do this from the Cloud Console. Instead you must use gcloud with the --organization or --folder option or the API.