I tried to follow the instructions at https://cloud.google.com/storage/docs/access-logs to create access logs for my bucket in Google Cloud Platform. The problem is that the access logs are created hourly, which means there will be 24 files in one day and 24 * 7 files in one week. That is not convenient to manage.
Is it possible to create access logs daily or weekly, or to automatically merge all the hourly logs?
Thanks
It is not possible to change the frequency at which these logs are created. However, you could create a Cloud Function that is triggered every day or week and merges all the logs into one CSV. I found a blog that explains how to do it and may be useful for you.
Also, I have created a feature request; you can follow its progress at this link.
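To sketch what that merge could look like, here is a minimal Python example that a daily-triggered Cloud Function (or Cloud Scheduler job) could run. The bucket names and the usage-log object prefix are assumptions; adjust them to your own log naming.

```python
import datetime
from google.cloud import storage

# Placeholder bucket names: LOG_BUCKET holds the hourly usage logs,
# ARCHIVE_BUCKET receives the merged daily file.
LOG_BUCKET = "my-access-logs"
ARCHIVE_BUCKET = "my-access-logs-daily"

def merge_yesterdays_logs(event=None, context=None):
    """Merge yesterday's hourly usage-log CSVs into a single object."""
    client = storage.Client()
    day = (datetime.date.today() - datetime.timedelta(days=1)).strftime("%Y_%m_%d")
    # Assumed object-name prefix "<monitored-bucket>_usage_<YYYY_MM_DD_...>".
    blobs = client.list_blobs(LOG_BUCKET, prefix=f"my-bucket_usage_{day}")
    header, rows = None, []
    for blob in blobs:
        lines = blob.download_as_text().splitlines()
        if not lines:
            continue
        header = header or lines[0]
        rows.extend(lines[1:])  # skip the repeated CSV header
    if header:
        merged = "\n".join([header] + rows)
        client.bucket(ARCHIVE_BUCKET).blob(f"usage_{day}.csv").upload_from_string(merged)
```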
I want to set up a weekly Google Play transfer, but it cannot be saved.
At first, I set up a daily Play transfer job. It worked. Then I tried to change the transfer frequency to weekly (every Monday 7:30) and got an error:
"This transfer config could not be saved. Please try again. Invalid schedule [every mon 7:30]. Schedule has to be consistent with CustomScheduleGranularity [daily: true]."
I think this document shows that the transfer frequency can be changed:
https://cloud.google.com/bigquery-transfer/docs/play-transfer
Can Google Play transfer be set to weekly?
By default, the transfer is created as daily. From the same docs:
Daily, at the time the transfer is first created (default)
Try creating a brand new weekly transfer. If that works, I would suspect a web UI bug. Here are two other options for changing your existing transfer:
BigQuery command-line tool: bq update --transfer_config. A very limited number of options are available, and schedule is not one of them.
BigQuery Data Transfer API: transferConfigs.patch. Most transfer options are updatable. An easy way to try it is with the API Explorer. See the details of the transferConfig object; the schedule field needs to be defined (a small client-library sketch follows the quoted format below):
Data transfer schedule. If the data source does not support a custom schedule, this should be empty. If it is empty, the default value for the data source will be used. The specified times are in UTC. Examples of valid format: 1st,3rd monday of month 15:30, every wed,fri of jan,jun 13:15, and first sunday of quarter 00:00. See more explanation about the format here: https://cloud.google.com/appengine/docs/flexible/python/scheduling-jobs-with-cron-yaml#the_schedule_format
NOTE: the granularity should be at least 8 hours, or less frequent.
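As a rough illustration of the API route, here is a minimal Python sketch using the google-cloud-bigquery-datatransfer client. The transfer config resource name is a placeholder you would replace with your own, and the schedule string follows the format quoted above.

```python
from google.cloud import bigquery_datatransfer
from google.protobuf import field_mask_pb2

client = bigquery_datatransfer.DataTransferServiceClient()

# Placeholder resource name: copy the real one from the Cloud Console
# or from `bq ls --transfer_config`.
transfer_config = bigquery_datatransfer.TransferConfig(
    name="projects/my-project/locations/us/transferConfigs/1234abcd",
    schedule="every monday 07:30",
)

# Patch only the schedule field, leaving everything else untouched.
updated = client.update_transfer_config(
    transfer_config=transfer_config,
    update_mask=field_mask_pb2.FieldMask(paths=["schedule"]),
)
print(f"New schedule: {updated.schedule}")
```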
For context, we would like to visualize our data in Google Data Studio, and this dataset receives more entries each week. I have tried hosting our datasets in Google Drive, but it seems that they're too large and this slows down Google Data Studio (the file is only 50 MB; am I doing something wrong?).
I have loaded our data into Google Cloud Storage --> Google BigQuery, and connected Google Data Studio to my BigQuery table. This has made the Data Studio dashboard much quicker!
I'm not sure of the best way to update our data weekly in Google Cloud/BigQuery. I have found a slow way to do this by uploading the new weekly data to Google Cloud Storage and then manually appending it to my table in BigQuery, but I'm wondering if there's a better (or at least more automated) way to do this.
I'm open to any suggestions, and if you think that BigQuery/Google Cloud Storage is not the answer for me, please let me know!
If I understand your question correctly, you want to automate the query that populates your table, which is connected to Data Studio.
If this is the case, then you can use Scheduled Queries in BigQuery. A scheduled query lets you define a query whose results can be written to a table. In particular, you can specify repetition rules (minimum every 15 minutes) and execution settings, as well as destination write options (destination table, write mode: append or truncate). A small sketch follows the documentation links below.
In order to use Scheduled Queries, your account must have the right permissions. You can have a look at the following documentation to better understand how to use Scheduled Queries [1].
Also, please note that on the front end, the updated data in the BigQuery table will only show up in Data Studio after a refresh (click the refresh button in Data Studio). To refresh the front-end visualization automatically, you can use the following plugin [2] or automate the click on the refresh button through browser console commands.
[1] https://cloud.google.com/bigquery/docs/scheduling-queries
[2] https://chrome.google.com/webstore/detail/data-studio-auto-refresh/inkgahcdacjcejipadnndepfllmbgoag?hl=en
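For illustration, here is a minimal Python sketch that creates such a scheduled query with the google-cloud-bigquery-datatransfer client. The project ID, dataset, table names, query text, and schedule are all placeholders for your own values.

```python
from google.cloud import bigquery_datatransfer

client = bigquery_datatransfer.DataTransferServiceClient()
parent = client.common_project_path("my-project")  # placeholder project ID

transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id="reporting",            # placeholder dataset
    display_name="Weekly append for Data Studio",
    data_source_id="scheduled_query",
    schedule="every monday 09:00",
    params={
        # Placeholder query: read the freshly uploaded weekly data.
        "query": "SELECT * FROM `my-project.staging.weekly_upload`",
        "destination_table_name_template": "dashboard_table",
        "write_disposition": "WRITE_APPEND",       # append to the existing table
    },
)

config = client.create_transfer_config(parent=parent, transfer_config=transfer_config)
print(f"Created scheduled query: {config.name}")
```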
I am new to Google Cloud Stackdriver Logging, and as per this documentation, Stackdriver stores the Data Access audit logs for 30 days. It is also mentioned on the same page that the size of a log entry is limited to 100 KB.
I am aware that the logs can be exported to Google Cloud Storage using the Cloud SDK as well as the logging client libraries in many languages (we prefer Python).
I have two questions related to exporting the logs:
Is there any way in Stackdriver to schedule something similar to a task or cron job that keeps exporting the logs to Google Cloud Storage automatically after a fixed interval of time?
What happens to log entries that are larger than 100 KB? I assume they get truncated. Is my assumption correct? If yes, is there any way in which we can export/view the full (not truncated at all) log entry?
Is there any way in Stackdriver to schedule something similar to a task or cron job that keeps exporting the logs to Google Cloud Storage automatically after a fixed interval of time?
Stackdriver supports exporting log data via sinks. There is no schedule that you can set, as everything is automatic. Basically, the data is exported as soon as possible, and you have no control over the amount exported at each sink or the delay between exports. I have never found this to be an issue. Logging, by design, is not meant to be used as a real-time system. The closest you can get is to sink to Pub/Sub, which has a delay of a couple of seconds (based on my experience).
There are two methods to export data from Stackdriver:
Create an export sink. Supported destinations are BigQuery, Cloud Storage, and Pub/Sub. The log entries will be written to the destination automatically. You can then use tools to process the exported entries. This is the recommended method; a short sketch follows after this list.
Write your own code in Python, Java, etc. to read the log entries and do what you want with them. Scheduling is up to you. This method is manual and requires you to manage the schedule and destination yourself.
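For the first method, here is a minimal Python sketch using the google-cloud-logging client. The sink name, bucket name, and log filter are assumptions you would replace with your own.

```python
from google.cloud import logging as cloud_logging

client = cloud_logging.Client(project="my-project")  # placeholder project ID

# Export Data Access audit logs to a Cloud Storage bucket (placeholder names).
sink = client.sink(
    "data-access-to-gcs",
    filter_='logName:"cloudaudit.googleapis.com%2Fdata_access"',
    destination="storage.googleapis.com/my-log-archive-bucket",
)

if not sink.exists():
    sink.create()
    # Remember to grant the sink's writer identity write access to the bucket.
    print(f"Created sink {sink.name}")
```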
What happens to log entries that are larger than 100 KB? I assume they get truncated. Is my assumption correct? If yes, is there any way in which we can export/view the full (not truncated at all) log entry?
Entries that exceed the maximum size of an entry cannot be written to Stackdriver. The API call that attempts to create the entry will fail with an error message similar to the following (Python error message):
400 Log entry with size 113.7K exceeds maximum size of 110.0K
This means that entries that are too large will be discarded unless the writer has logic to handle this case.
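If you control the code that writes the entries, one workaround is to truncate oversized payloads before sending them. A minimal sketch with the google-cloud-logging client, assuming a hypothetical log name and an approximate size limit:

```python
from google.cloud import logging as cloud_logging

# Approximate limit from the docs; leave some headroom for entry metadata.
MAX_PAYLOAD_BYTES = 100 * 1024 - 1024

client = cloud_logging.Client()
logger = client.logger("my-app-log")  # hypothetical log name

def write_text_entry(text: str) -> None:
    """Write a text entry, truncating it if it would exceed the size limit."""
    payload = text.encode("utf-8")
    if len(payload) > MAX_PAYLOAD_BYTES:
        text = payload[:MAX_PAYLOAD_BYTES].decode("utf-8", errors="ignore") + " ...[truncated]"
    logger.log_text(text)
```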
As per the Stackdriver Logging documentation, the whole process is automatic. An export sink to Google Cloud Storage is slower than the BigQuery and Cloud Pub/Sub sinks (link to the documentation).
I recently used an export sink to BigQuery, which is better than Cloud Pub/Sub if you don't want to use another third-party application for log analysis. A BigQuery sink needs a dataset in which to store the log entries. I noticed that the sink creates BigQuery tables on a timestamp basis in that dataset.
One more thing: if you want to query timestamp-partitioned tables, check this link: Legacy SQL Functions and Operators.
I am looking to pull a monthly performance report for a campaign running in DoubleClick Bid Manager. I'm pulling this report via the DoubleClick Bid Manager API (https://developers.google.com/bid-manager/v1/queries/createquery).
I just plug in the JSON syntax with the account and various fields; when I click execute, I can download the reports from the Reports tab of Bid Manager. I then upload them to S3 manually so that I can later load them into Redshift.
I was wondering if there is a way to do this programmatically, instead of uploading the report manually every time.
If there is a question already on this, please point me in the right direction.
Thanks in advance.
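For what it's worth, one possible automated path is to fetch the latest report file's download URL from the API and push it to S3 in the same script. A rough Python sketch, assuming the query already exists, that Google credentials are configured for google-api-python-client, and that the query ID and bucket name are placeholders (the field name comes from the v1/v1.1 API):

```python
import boto3
import requests
from googleapiclient.discovery import build

QUERY_ID = 123456789            # placeholder: the existing Bid Manager query ID
S3_BUCKET = "my-report-bucket"  # placeholder S3 bucket

# Assumes Google application default credentials are available.
dbm = build("doubleclickbidmanager", "v1.1")
query = dbm.queries().getquery(queryId=QUERY_ID).execute()

# The latest finished report is exposed as a Cloud Storage download URL.
report_url = query["metadata"]["googleCloudStoragePathForLatestReport"]
report_bytes = requests.get(report_url).content

# Push it straight to S3 so Redshift can COPY it from there.
boto3.client("s3").put_object(
    Bucket=S3_BUCKET,
    Key=f"dbm/report_{QUERY_ID}.csv",
    Body=report_bytes,
)
```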
I've managed to create a bucket in my Amazon S3 account, and I've added a couple of files to it.
I've downloaded CloudBerry Explorer (the freeware edition), uploaded a file, and set it as private with an expiration date.
This is what my link looks like:
http://dja61b1p3y3bp.cloudfront.net/taller_parte1.flv?AWSAccessKeyId=AKIAIEZ23XILJFNS3ZVA&Expires=1379991642&Signature=GrBLzn13nkm8NiU6JXBj0jC0i%2F8%3D
The thing is that this is a "static" expiration date: the link expires one week from now.
I mean, if I have an online course and receive different students every week, and I want to use that file in the course, I would have to go and update the expiration date every single week.
Is there any way to configure the file to expire a week after each individual user clicks the link?
How may I do that?
Thanks for your insight!
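One common pattern is not to store a fixed expiring link at all, but to generate a fresh signed URL at the moment each student requests the file, so the one-week clock starts from their click. Here is a minimal sketch with boto3 for an S3 presigned URL; the bucket name is a placeholder. (If the file is actually served through CloudFront, as the link above suggests, you would generate CloudFront signed URLs instead, but the idea is the same.)

```python
import boto3

s3 = boto3.client("s3")

def link_for_student(bucket: str = "my-course-bucket", key: str = "taller_parte1.flv") -> str:
    """Return a URL that expires one week after *this* call, not on a fixed date."""
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=7 * 24 * 3600,  # one week, in seconds
    )

# Call this from your course page handler each time a student opens the lesson.
print(link_for_student())
```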