I have a query and I want to schedule it on a quarterly basis.
For example, the first calendar quarter is Jan 2022 - Mar 2022, so I want to run this query on 01 Mar 2022. Is there any way we can do that?
Any help will be highly appreciated.
Many thanks.
BigQuery uses the AppEngine cron syntax (see docs) for custom schedules, so if you want to run a query every 1 March, you can set the "Repeats" option to "Custom" and the "Custom Schedule" as below:
1 of mar 00:00
To schedule multiple months, you can use a comma-separated list:
1 of mar,jun,sep,dec 00:00
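If you prefer to create the schedule programmatically rather than through the UI, a scheduled query is just a transfer config with data_source_id set to scheduled_query. A minimal Python sketch, assuming placeholder project, dataset, and query values:

```python
# Minimal sketch: create a quarterly scheduled query through the
# BigQuery Data Transfer API. PROJECT_ID, my_dataset, and the query
# are placeholders, not values from the question.
from google.cloud import bigquery_datatransfer_v1

client = bigquery_datatransfer_v1.DataTransferServiceClient()
parent = client.common_location_path("PROJECT_ID", "us")

transfer_config = bigquery_datatransfer_v1.TransferConfig(
    destination_dataset_id="my_dataset",
    display_name="quarterly-report",
    data_source_id="scheduled_query",
    params={
        "query": "SELECT CURRENT_DATE() AS run_date",
        "destination_table_name_template": "quarterly_report",
        "write_disposition": "WRITE_APPEND",
    },
    # The same custom schedule string as above.
    schedule="1 of mar,jun,sep,dec 00:00",
)

config = client.create_transfer_config(
    parent=parent, transfer_config=transfer_config
)
print(f"Created scheduled query: {config.name}")
```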
Option 1: Use a programmatic API (Python/Go/Java) for BigQuery to run your query, scheduled through a cron job on a backend/GCE box (see the sketch after this list).
Option 2: If you do not want to use a programmatic API with a cron job: the BigQuery native UI now supports scheduled queries, and you can schedule your query through it.
Option 3: If you have many such queries that you want to run on some cadence, you can either go with Airflow or, an even better option, Magnus: https://potens.io/products/#magnus
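For Option 1, the script itself can be as small as the sketch below (the query is a placeholder); pair it with a crontab entry such as 0 0 1 3,6,9,12 * to get the quarterly cadence.

```python
# Minimal sketch for Option 1: run a query with the BigQuery Python
# client, scheduled by cron on a backend/GCE box. The query is a
# placeholder for your own.
from google.cloud import bigquery

def main():
    client = bigquery.Client()
    job = client.query("SELECT CURRENT_DATE() AS run_date")
    for row in job.result():  # blocks until the query job finishes
        print(row.run_date)

if __name__ == "__main__":
    main()
```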
I need a scheduled query that runs only Monday to Friday, between 9:00 and 19:00.
The scheduled query is currently: every hour from 9:00 to 19:00
But how do I modify it for Monday to Friday?
every monday to friday from 9:00 to 19:00 is not working
every monday from 9:00 to 19:00 is working (so is day of the week in general not working?)
Thanks
UPDATE: The question at hand is much more complex than the Custom setting in BigQuery Scheduled Queries allows. For this purpose, @guillaume blaquiere has the best suggestion: use Cloud Scheduler to run a cron job. Tools like Crontab Guru can be helpful in creating a statement such as 00 9-19 * * 1-5.
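For reference, the five fields of that cron expression read as follows:

```
00 9-19 * * 1-5
│  │    │ │ └── day of week: 1-5 (Monday through Friday)
│  │    │ └──── month: any
│  │    └────── day of month: any
│  └─────────── hour: every hour from 9 to 19
└────────────── minute: 0
```

That is, at minute 0 of every hour from 9:00 to 19:00, Monday through Friday.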
For simpler Scheduled Queries, please review the following from the official documentation: Set up scheduled queries.
Specifically,
To specify a custom frequency, select Custom, then enter a Cron-like time specification in the Custom schedule field; for example, every 3 hours.
There is excellent documentation in the Custom Interval tab here on the many options you have available in this field.
Thanks for the feedback. So like this one? But this is not working.
I have a lake dataset that takes data from an OLTP system. Given the nature of the transactions, we get a lot of updates the next day, so to keep track of the latest record we use active_flag = '1'.
We also created an update script that retires old records by setting active_flag = '0'.
Now the main question: how can I execute an UPDATE statement while changing the table name automatically (programmatically)?
I know we have the option of using Cloud Functions, but they time out after 9 minutes and I have at least 350 tables to update.
Has anyone faced this situation before?
You can easily do this with Cloud Workflows.
There you set up the templated call to BigQuery as a substep, then pass in a list of tables and loop through the items, invoking the BigQuery step for each item/table.
I wrote an article with samples that you can adapt: Automate the execution of BigQuery queries with Cloud Workflows
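If you want to prototype the same loop-over-tables idea outside Workflows, here is a minimal Python sketch; the dataset name and the record_id/load_time columns in the UPDATE are assumptions, not values from the question.

```python
# Minimal sketch of the loop-over-tables idea in Python.
# The dataset name and the record_id/load_time columns are
# assumptions; substitute your own retire condition.
from google.cloud import bigquery

client = bigquery.Client()
dataset = "my_lake_dataset"  # placeholder

for table in client.list_tables(dataset):
    table_id = f"{table.project}.{table.dataset_id}.{table.table_id}"
    sql = f"""
        UPDATE `{table_id}` t
        SET active_flag = '0'
        WHERE t.active_flag = '1'
          AND EXISTS (
            SELECT 1 FROM `{table_id}` newer
            WHERE newer.record_id = t.record_id  -- assumed key column
              AND newer.load_time > t.load_time  -- assumed load timestamp
          )
    """
    client.query(sql).result()  # run each UPDATE sequentially
    print(f"Retired old records in {table_id}")
```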
I want to set up a weekly Google Play transfer, but it cannot be saved.
At first, I set up a daily Play transfer job. It worked. Then I tried to change the transfer frequency to weekly (every Monday 7:30) and got an error:
"This transfer config could not be saved. Please try again.
Invalid schedule [every mon 7:30]. Schedule has to be consistent with CustomScheduleGranularity [daily: true ].
I think this document shows that the transfer frequency can be changed:
https://cloud.google.com/bigquery-transfer/docs/play-transfer
Can Google Play transfer be set to weekly?
By default, the transfer is created as daily. From the same docs:
Daily, at the time the transfer is first created (default)
Try to create a brand-new weekly transfer. If it works, I would think it is a web UI bug. Here are two other options to change your existing transfer:
BigQuery command-line tool: bq update --transfer_config
Only a very limited number of options is available, and schedule is not available for update.
BigQuery Data Transfer API: transferConfigs.patch. Most transfer options are updatable. An easy way to try it is with the API Explorer. See the details of the transferConfig object; the schedule field needs to be defined:
Data transfer schedule. If the data source does not support a custom schedule, this should be empty. If it is empty, the default value for the data source will be used. The specified times are in UTC. Examples of valid format: 1st,3rd monday of month 15:30, every wed,fri of jan,jun 13:15, and first sunday of quarter 00:00. See more explanation about the format here: https://cloud.google.com/appengine/docs/flexible/python/scheduling-jobs-with-cron-yaml#the_schedule_format
NOTE: the granularity should be at least 8 hours, or less frequent.
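A minimal Python sketch of that patch, using the google-cloud-bigquery-datatransfer client; the config resource name and the new schedule string are placeholders:

```python
# Minimal sketch: patch only the schedule field of an existing
# transfer config. The config resource name is a placeholder.
from google.cloud import bigquery_datatransfer_v1
from google.protobuf import field_mask_pb2

client = bigquery_datatransfer_v1.DataTransferServiceClient()

transfer_config = bigquery_datatransfer_v1.TransferConfig(
    name="projects/PROJECT_ID/locations/us/transferConfigs/CONFIG_ID",
    schedule="every monday 07:30",
)

updated = client.update_transfer_config(
    transfer_config=transfer_config,
    update_mask=field_mask_pb2.FieldMask(paths=["schedule"]),
)
print(f"New schedule: {updated.schedule}")
```

Note the 8-hour granularity caveat above still applies; a weekly schedule satisfies it.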
We have a campaign management system. We create and run campaigns on various channels. When a user clicks or accesses any of the ads (as part of a campaign), the system generates a log. Our system is hosted on GCP, and logs are exported to BigQuery using the ‘Exports’ feature.
In BigQuery, the log table is partitioned on the ‘timestamp’ field (the time when the log is generated). We understand that BigQuery stores timestamps in UTC, so partitions are also based on UTC time.
Using this log table, we need to generate per-day reports, such as the number of impressions per day per campaign. And we need to show these reports in ET (Eastern Time).
Because the BigQuery table is partitioned by UTC time, a query for an ET day would potentially need to scan multiple partitions. Has anyone addressed this issue, or do you have suggestions to optimize the storage and queries so that they take full advantage of the BigQuery partitioning feature?
We are planning to use Google Data Studio for the reports.
BigQuery should be smart enough to filter for the correct timezones when dealing with partitions.
For example:
SELECT MIN(datehour) time_start, MAX(datehour) time_end, ANY_VALUE(title) title
FROM `fh-bigquery.wikipedia_v3.pageviews_2018` a
WHERE DATE(datehour) = '2018-01-03'
5.0s elapsed, 4.56 GB processed
For this query we processed the 4.56 GB in the 2018-01-03 partition. What if we want to adjust for a day in the US? Let's add this to the WHERE clause:
WHERE DATE(datehour, "America/Los_Angeles") = '2018-01-03'
4.4s elapsed, 9.04 GB processed
Now this query is automatically scanning 2 partitions, as it needs to go across days. For me this is good enough, as BigQuery is able to automatically figure this out.
But what if you wanted to permanently optimize for one timezone? You could create a shifted DATE column and use it to PARTITION the table; see the sketch below.
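As a sketch of that approach, using the Python client to run the DDL (the dataset and table names are placeholders):

```python
# Minimal sketch: materialize a DATE column shifted to
# America/Los_Angeles and partition the new table by it.
# Dataset and table names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

ddl = """
CREATE TABLE my_dataset.pageviews_la
PARTITION BY date_la
AS
SELECT *, DATE(datehour, 'America/Los_Angeles') AS date_la
FROM my_dataset.pageviews
"""
client.query(ddl).result()
# Queries that filter on date_la now prune to a single partition.
```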
I have a restricted URL where a JSON file is added every 10 minutes. Accordingly, I have developed several R visuals in Power BI Desktop by importing a sample JSON from this URL, and have published them to the Power BI service (on the Power BI Pro trial).
How can I schedule an import of the latest JSON from this URL every 10 minutes, so that the reports are automatically updated with the latest data?
Configuring scheduled refresh is what you're looking for.
This will enable you to import the latest JSON on the schedule you specify.
If you're desperate for a 10-minute refresh, though, you cannot do that.
The best you can do is use Azure Analysis Services / SSAS to connect to the JSON, do all the ETL there, and then use DirectQuery against it. You'll have to set the SSAS/AAS model to process very often.
DirectQuery allows for 15-minute refreshes.