I have a table in GCP (BigQuery) that is overwritten every day with data from an external source. Is there any way to view the state of the table at a point in the past? The following code (from https://cloud.google.com/bigquery/docs/time-travel):
SELECT *
FROM `mydataset.mytable`
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);
returns results only when the requested point in time is more recent than the table's last modification.
EDIT: It looks like the maximum time travel window is 7 days.
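A minimal sketch of guarding a time-travel query against that window, assuming the 7-day maximum mentioned in the EDIT. The helper name build_time_travel_query is hypothetical; it only builds the SQL text and does not call BigQuery.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: 7-day maximum time travel window, as noted in the EDIT above.
MAX_TIME_TRAVEL = timedelta(days=7)


def build_time_travel_query(table: str, as_of: datetime,
                            now: Optional[datetime] = None) -> str:
    """Return a FOR SYSTEM_TIME AS OF query, or raise if as_of is out of range."""
    if as_of.tzinfo is None:
        raise ValueError("as_of must be timezone-aware")
    now = now or datetime.now(timezone.utc)
    if as_of > now:
        raise ValueError("as_of is in the future")
    if now - as_of > MAX_TIME_TRAVEL:
        raise ValueError("as_of is older than the time travel window")
    # Render the timestamp in UTC as a TIMESTAMP literal.
    ts = as_of.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S+00")
    return (
        "SELECT *\n"
        f"FROM `{table}`\n"
        f"FOR SYSTEM_TIME AS OF TIMESTAMP '{ts}';"
    )


now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
print(build_time_travel_query("mydataset.mytable", now - timedelta(hours=1), now=now))
```

Passing a fixed now keeps the example reproducible; in practice it can be left out so the current time is used.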
I used the console since it was easier to manage and view.
To view the past state of your BigQuery table, open PERSONAL HISTORY to display your own recent jobs, or PROJECT HISTORY to display the recent jobs in your project.
After each overwrite runs, you can see that it was executed and that every execution has its own Job ID. If you know the Job ID of the past table you want to view, you can filter by it. You can also open the Query job details by clicking the three vertical dots; from there you can see the state of the table with its specific timestamp and other essential details.
I have a PBI Desktop dashboard that pulls machine data from a local SQL server. I'm using a relative date/time filter on one of the pages to drill down into the data for a live feed; however, whenever the relative time is set to anything under 5 hours, the data goes blank.
I use 4 log tables for the raw data, each with its own timestamp for each instance. Each is related through an ID table that contains other general information. In addition, time is related using a calculated table that builds a timeframe of all instances:
[Image: Relationship Model]
DateTable = DISTINCT(UNION(SUMMARIZE(LogFault, LogFault[Time]), SUMMARIZE(LogGood, LogGood[Time]), SUMMARIZE(LogReject, LogReject[Time]), SUMMARIZE(LogState, LogState[Time])))
[Image: 5 hours relative time]
[Image: 4 hours relative time]
As you can see from the top right of the images, not even the times are pulled to the page. Is there a limitation to PBI on the relative time function? This wouldn't make sense to me if there is a "minutes" option under relative time. Any feedback on this would be appreciated.
For those looking in the future: unfortunately, Power BI Desktop, along with the service, appears to work only in the UTC time zone. So the relative date/time filter was being applied in UTC, not in my time zone (EST). To resolve this, I had to create a new calculated column next to my distinct timestamps to correct for the time zone. I then used the adjusted time for the relative time filtering, while the charts continued to use the original timestamps.
UTC to EST time zone adjust
UTC_AdjustTZ = FORMAT(DateTable[Time]+TIME(4,0,0),"General Date")
[Image: Chart example after adjust]
[Image: Chart after fix implemented]
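The effect of the 4-hour shift can be sketched outside Power BI. This is a minimal illustration, assuming the log timestamps are local times at a fixed UTC-4 offset (matching TIME(4,0,0) in the calculated column); the function names are hypothetical. Note that a fixed offset ignores daylight saving, and EST proper is UTC-5, so the shift may need to change during the year.

```python
from datetime import datetime, timedelta, timezone

# Assumption: logged times are naive local times at a fixed UTC-4 offset.
LOCAL_OFFSET = timezone(timedelta(hours=-4))


def local_to_utc(naive_local: datetime) -> datetime:
    """Interpret a naive local timestamp at the fixed offset; return naive UTC."""
    aware = naive_local.replace(tzinfo=LOCAL_OFFSET)
    return aware.astimezone(timezone.utc).replace(tzinfo=None)


def in_last_hours(ts: datetime, now_utc: datetime, hours: int) -> bool:
    """Mimic a 'last N hours' relative filter evaluated against UTC now."""
    return now_utc - timedelta(hours=hours) <= ts <= now_utc


# A reading logged at 10:30 local time, filtered at 11:00 local (15:00 UTC).
logged_local = datetime(2023, 7, 1, 10, 30)
now_utc = datetime(2023, 7, 1, 15, 0)

print(in_last_hours(logged_local, now_utc, 4))                # False: unadjusted time looks 4.5 h old
print(in_last_hours(logged_local, now_utc, 5))                # True: only a 5 h window reaches it
print(in_last_hours(local_to_utc(logged_local), now_utc, 4))  # True: adjusted time is 30 min old
```

This reproduces the symptom from the question: with unadjusted local timestamps, every window shorter than the offset plus the age of the newest row comes back empty.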
This is probably because your filter on Date Table doesn't reach the destination table. Normally a filter moves from the one side to the many side, and then again from one side to many side along a chain of relationships. In your case, for example, the filter goes from Date Table to Log Reject, but then it can't move on to RejectDefinitions because of the filter direction. You have 2 options here:
1) Change the model relationships: make Log Reject the one side and RejectDefinitions the many side, if that is possible.
OR
2) Set the filter direction to Both in the model.
You need to do this for all the remaining log tables as well (LogFault-FaultDefinitions, LogState-StateDefinitions).
I hope it solves your problem. Please check that your model is not ambiguous after making those changes.
I have set up a Power BI dataset with incremental refresh following this guide https://learn.microsoft.com/en-us/power-bi/connect-data/incremental-refresh-configure and ensured that all tables filter with x > RangeStart and x <= RangeEnd, so that only one side has the =. I continued to investigate https://learn.microsoft.com/en-us/power-bi/connect-data/incremental-refresh-troubleshoot and noticed this comment:
With a refresh operation, only data that has changed at the data source is refreshed in the dataset. As the data is divided by a date, it’s recommended post (transaction) dates are not changed.
To me, this sounds extremely limiting. Our data has two date fields, LastModified and RowCreatedAt, which are both date/time columns. LastModified is the real date/time of the last modification to the data in the row. RowCreatedAt is the real date/time when that modification was persisted to the database. These can be very different (e.g., if the customer is new but has legacy data, the LastModified date may be very old, while RowCreatedAt will be very recent).
I decided to go with the RowCreatedAt value since that is something that we control (eg, if we were to refresh LastModifiedDate and load in historical data, it would never be imported to PowerBI after the initial refresh). Both the LastModifiedDate and RowCreatedAt fields are updated when data changes in the system (eg, sales order gets a new line item added to it).
My expectation was that when data changed and the partition date was updated, it would properly update the data in the dataset (eg, remove the old row and insert the new row since the same primary key, but other data is changed). This seems completely normal and expected behavior, but from the documentation, it seems like you can only import data which is not ever going to change or you have to refresh your history to the point where the change occurred. This seems like a crazy limitation (eg, who has unchanging data for all time??) so I'm hopefully just misunderstanding something.
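The concern can be made concrete with a toy model. This is only a sketch of partition-style refreshing, not Power BI's actual engine: it assumes a refresh re-queries rows whose RowCreatedAt falls on or after a boundary date and leaves everything older untouched. The refresh function and the sample rows are hypothetical.

```python
from datetime import date


def refresh(dataset, source_rows, refresh_from):
    """Reload rows dated on/after refresh_from; keep older loaded rows as they are."""
    kept = [r for r in dataset if r["RowCreatedAt"] < refresh_from]
    reloaded = [r for r in source_rows if r["RowCreatedAt"] >= refresh_from]
    return kept + reloaded


# Initial load: order 1 was persisted in 2020.
dataset = [{"id": 1, "qty": 5, "RowCreatedAt": date(2020, 3, 1)}]

# Later the order changes at the source, and its partition date moves to 2023.
source = [{"id": 1, "qty": 7, "RowCreatedAt": date(2023, 6, 1)}]

dataset = refresh(dataset, source, refresh_from=date(2023, 5, 1))
print([(r["id"], r["qty"]) for r in dataset])  # [(1, 5), (1, 7)]
```

In this model the updated row is picked up, but the stale copy in the older range is never revisited, so the same primary key appears twice. That is one reading of why the documentation recommends that the partitioning date does not change.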
I compare the dates in my data (saved as "due dates" in my data table, as DATE datatype, without time) with current_date on many pages of my Oracle APEX application. For example, one page shows the "items due for today" in an interactive report by checking whether each record's due date equals current_date, and another page shows the items past their due date. The SQL queries generate correct reports, but the problem is that the current date is calculated in US time (PDT/UTC-7), which is 9 hours behind my region's time, so my pages show the correct result only in the afternoon.
I researched and discovered that setting the application time zone to automatic in the application's globalization properties should solve this problem, but when I set the automatic time zone, my application stops working entirely. A "Set time zone" page appears for a split second, but without letting the user set the time zone, the page redirects and shows the error below.
Any suggestion to fix this really serious issue will be highly appreciated :(
You mention that your dates are stored in a column of datatype DATE. Setting "Automatic Timezone" in your application will not change anything for that column. It sets the database session time zone, which is used for columns of datatype "TIMESTAMP WITH LOCAL TIME ZONE". The datatype "DATE" does not know about time zones.
To ensure that your users see the same information in every region around the world, you should store your date information in a column of datatype "TIMESTAMP WITH LOCAL TIME ZONE". The "timezone sensitive" equivalent of SYSDATE is CURRENT_TIMESTAMP.
Joel Kallman wrote a blog about this a long time ago, describing this scenario.
https://joelkallman.blogspot.com/2010/09/automatic-time-zone-support-in.html
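Why "due today" depends on the time zone can be shown with a small sketch. It assumes fixed offsets in place of real time zones: UTC-7 for the server (as in the question) and UTC+2 for the user, which is an illustrative value that gives the 9-hour gap. The function names are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone

# Assumption: fixed offsets stand in for real time zones.
SERVER_TZ = timezone(timedelta(hours=-7))  # PDT, as in the question
USER_TZ = timezone(timedelta(hours=2))     # illustrative, 9 hours ahead of the server


def today_in(tz: timezone, instant: datetime) -> date:
    """Calendar date of the given instant as seen in the given time zone."""
    return instant.astimezone(tz).date()


def is_due_today(due: date, tz: timezone, instant: datetime) -> bool:
    """Compare a time-zone-unaware due date with 'today' in the given zone."""
    return due == today_in(tz, instant)


# 06:00 on 1 June for the user is still 21:00 on 31 May for the server.
instant = datetime(2023, 6, 1, 6, 0, tzinfo=USER_TZ)
due = date(2023, 6, 1)

print(today_in(SERVER_TZ, instant))           # 2023-05-31
print(is_due_today(due, SERVER_TZ, instant))  # False
print(is_due_today(due, USER_TZ, instant))    # True
```

The due date itself carries no time zone, so the result flips depending on whose "today" it is compared against, which matches the report being correct only later in the user's day.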
I have 40 million rows in my dataset. Each day I may get an extra 100 rows. Obviously I don't want to have to import the whole 40 million each time I do a data refresh. Is it possible to do an incremental refresh where only the new rows are added?
I don't think incremental update as you describe it is possible yet.
It looks like you can push rows with Power BI REST API, if you're happy to switch to that.
However, you might find this workaround useful:
Split your table and query into two: where date <= 'somedate' and where date >'somedate'
Add an "empty query", use Table.Combine to join your two subtables. Use this as your main table.
Whenever you need to refresh, only refresh the second query (the one with where date >'somedate').
Every once in a while, when that second query starts taking a long time, change somedate to the current date and do a full refresh.
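The steps above can be sketched as a toy model. This is not Power Query code; it only mirrors the idea of a frozen historical part, a small refreshable part, and a combined view. The SplitTable class and the fetch_rows callable are hypothetical, and the filtering is done in Python here, whereas the real queries would push the date condition to the source.

```python
from datetime import date


class SplitTable:
    """Two sub-tables split at a cutoff date, combined into one main table."""

    def __init__(self, fetch_rows, cutoff):
        self.fetch_rows = fetch_rows  # hypothetical callable returning source rows
        self.full_refresh(cutoff)

    def full_refresh(self, cutoff):
        """Move the cutoff and reload both parts (the occasional expensive step)."""
        self.cutoff = cutoff
        self.historical = [r for r in self.fetch_rows() if r["date"] <= cutoff]
        self.refresh_recent()

    def refresh_recent(self):
        """Reload only the rows after the cutoff (the cheap everyday step)."""
        self.recent = [r for r in self.fetch_rows() if r["date"] > self.cutoff]

    def combined(self):
        """Equivalent of combining the two sub-tables into the main table."""
        return self.historical + self.recent


source = [
    {"id": 1, "date": date(2023, 1, 10)},
    {"id": 2, "date": date(2023, 2, 5)},
]
table = SplitTable(lambda: list(source), cutoff=date(2023, 1, 31))

source.append({"id": 3, "date": date(2023, 2, 6)})  # a new row arrives
table.refresh_recent()
print([r["id"] for r in table.combined()])  # [1, 2, 3]
```

Calling full_refresh with a later cutoff corresponds to the last step: the recent part shrinks again and the historical part absorbs the rows loaded since the previous cutoff.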
The feature has now been implemented and is called Incremental refresh. Currently it is a premium only feature.
I'm trying the new Power BI (Desktop) to create a bar chart that shows the duration in days for the delivery of an order.
I have 2 files: one with the delivery data (date, barcode) and another with the delivery statuses (date, barcode).
I created a relationship on barcode in the Power BI relationships tab on the left side: 1 Delivery to many DeliveryStatusses.
Now I want to add a column/measure to calculate the number of days before a package is delivered. I searched a few blogs but with no success.
The function DATEDIFF is only recognized in a measure, and measures seem to work on table data, not row data. So adding a column using the DATEDIFF function doesn't work.
Adding a column using a formula :
Duration = [DeliveryDate] - Delivery[OrderDate]
results in an error that the right side is a list (It seems the relationship isn't in place)?
What am I doing wrong?
You might try doing this in the Query window instead since I think each barcode has just one delivery date and one delivery status. You could merge the two queries into a single table. Then you wouldn't need to worry about the relationships... If on the other hand you can have multiple lines for each delivery in the delivery status table, then you need to get more fancy. If you're only interested in the last status (as opposed to the history of status) you could again use the Query windows to group the data. If you need the full flexibility, you'd probably need to create a Measure that expresses the logic you want.
The RELATED function returns a value from a related table. Update your formula as follows and it should work:
Duration = [DeliveryDate] - RELATED(Delivery[OrderDate])
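Conceptually, the lookup works row by row from the many side to the one side. Below is a rough sketch of that idea in plain Python, assuming one Delivery row per barcode; the sample data and the add_duration helper are hypothetical and only imitate the formula, they are not DAX.

```python
from datetime import date

# One side: Delivery, keyed by barcode, holding the order date.
delivery = {
    "A1": date(2023, 5, 1),
    "B2": date(2023, 5, 3),
}

# Many side: delivery statuses, each with a barcode and a delivery date.
statuses = [
    {"barcode": "A1", "DeliveryDate": date(2023, 5, 4)},
    {"barcode": "B2", "DeliveryDate": date(2023, 5, 10)},
    {"barcode": "C3", "DeliveryDate": date(2023, 5, 11)},  # no matching Delivery row
]


def add_duration(status_rows, delivery_by_barcode):
    """Add a Duration column (in days) by looking up the related order date."""
    result = []
    for row in status_rows:
        # Plays the role of RELATED(Delivery[OrderDate]) for the current row.
        order_date = delivery_by_barcode.get(row["barcode"])
        duration = None
        if order_date is not None:
            duration = (row["DeliveryDate"] - order_date).days
        result.append({**row, "Duration": duration})
    return result


for row in add_duration(statuses, delivery):
    print(row["barcode"], row["Duration"])
```

Each status row resolves to at most one order date through the barcode, which is why the subtraction yields a single value per row instead of a list.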