Google Cloud Dataprep DATEDIF function inconsistent

I have four DateTime columns, all in long format, e.g. 2016-08-01T21:13:02Z. They are called EnqDateTime, QuoteCreatedDateTime, BookingCreatedDateTime and RejAt.
I want to add columns for the duration (in days) between EnqDateTime and the other three columns, i.e.
DATEDIF(EnqDateTime, QuoteCreatedDateTime, day)
This works for RejAt, but throws an error for the other two columns:
Parameter "rhs" accepts only ["Datetime"]
As per the image below, all four columns are DateTime.
Can anyone see any other reason this may not be working for two of the three columns?

As you can see in the image below, I reproduced a scenario like the one you presented here, and I had no issue with it. I created the three X2Y columns using the same formulas that you shared:
DATEDIF(EnqDateTime, QuoteCreatedDateTime, day)
DATEDIF(EnqDateTime, BookingCreatedDateTime, day)
DATEDIF(EnqDateTime, RejAt, day)
My guess is that, for some reason, the columns do not have an appropriate Datetime format. You could try applying some transformations to make sure that the data contained in the columns has the appropriate format. I recommend trying the following:
Clean all missing values by clicking on the column and then Clean > Missing > Fill with NULL. Missing values can prevent Dataprep from recognizing a data type properly.
Change the data type again to Datetime, just to double-check that every field has the Datetime type. You can do so by clicking on the column and then Change type > Date/Time.
If these methods do not solve your issue, you could try working with a minimal example of only a few rows, so that you can narrow down the variables involved. Then you can update your question with more information.
It would also be nice to know where you are getting the error Parameter "rhs" accepts only ["Datetime"]. It is not clear to me what the rhs (Right Hand Side) parameter refers to in this case (presumably the second argument to DATEDIF), so maybe you can also provide more details about that.

POWER QUERY [Expression.Error] Cannot convert the value null to type Table

SOLVED USING A DIFFERENT APPROACH (see the end of this question)
I am trying to combine some queries into one by using the Table.Combine() function.
If I explicitly write the name of each query (e.g., Table.Combine({#"Name of query 1", #"Name of query 2"})) and then apply the changes, everything works fine.
However, since I want to make it dynamic, instead of writing a list of names, I pass the function a list of tables generated in a previous step:
So after I get this table, the next step is: = Table.Combine(PreviousStep[Value]). Note that Value is the name of the column that contains the tables. Apparently, this converts the column of tables into a list of tables. This works fine (I can preview the result set) until I hit the Apply changes button. When I do, this message pops up: Expression.Error: We cannot convert the value null to type Table.
I had a look at these threads: https://community.powerbi.com/t5/Desktop/We-cannot-convert-the-value-null-to-type-Table/td-p/391064 and https://community.powerbi.com/t5/Desktop/We-cannot-convert-the-value-null-to-type-table/m-p/346056, but the suggestions there didn't work. I've tried other approaches as well.
Further information:
Power BI Desktop version: 2.106.582.0 64-bit (June 2022)
Data source: combining existing queries that come from a single Excel file.
Steps followed to get the list of tables that I pass to the Table.Combine() function:
let
    Origen = #sections[Section1],
    #"Convertido en tabla" = Record.ToTable(Origen),
    #"Errores quitados" = Table.RemoveRowsWithErrors(#"Convertido en tabla", {"Value"}),
    Personalizado1 = Table.SelectRows(#"Errores quitados", each Text.StartsWith([Name], "COMPRAS Y GASTOS")),
    Personalizado2 = Table.Combine(Personalizado1[Value])
in
    Personalizado2
I access all the queries I have (with the #sections keyword), convert that record to a table, remove possible errors, filter to get the queries I want (the ones starting with "COMPRAS Y GASTOS"), and then try to combine the queries.
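For what it's worth, this error usually means at least one entry in that Value list is null rather than a table, which Table.Combine cannot handle. A hedged sketch of a defensive last step, assuming the null entries can simply be dropped (List.RemoveNulls is a standard M library function):
// Drop null entries so Table.Combine only ever receives tables
Personalizado2 = Table.Combine(List.RemoveNulls(Personalizado1[Value]))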
A DIFFERENT APPROACH
What I wanted to do was merge tables that came from an Excel file, each of them referring to a year (2019, 2020, 2021, 2022). But I also wanted the combined table to update when new sheets were added in Excel (2023, 2024...).
I've tried many different approaches, like generating a dynamic list (from 2019 until the current year)... but for some reason none of them worked, even though the code is apparently correct.
So my new approach has been to create a sufficient number of Excel sheets for the coming years (they are empty now, but the information will be filled in when each new year comes), to create the queries referring to those sheets (they return empty tables), and to merge those existing (but empty) tables with the ones from 2019-2022. This way, when data for 2023 is filled in on its sheet, the query picks it up and it works.
It's a shame I couldn't actually solve the original problem I had, but this approach works.

Superset partition graph type order by

We are creating several charts in Superset, and with the partition type chart the ORDER BY seems to be hard-coded and we cannot change it. The goal is to have the months on the left in the correct order (the column in this case is Month). When run in SQL Lab it works in the correct order, but in the chart view we cannot change the ordering.
Any suggestions?
I assume you mean the dates on the right here?
I work with Superset and I have experienced this limitation; the ordering does appear to be hard-coded once a chart is made. If it isn't too much hassle, I would suggest adding another column to your database for the text value, following the pattern of:
WHERE "Month" = 'January' SET "OrderingColumn" = 'A'
WHERE "Month" = 'February' SET "OrderingColumn" = 'B'
etc etc
Then in your charts you can try: ORDER BY "OrderingColumn"
It is a bit of an inconvenience, but if you are able to manipulate your data by changing tables or views, this seems to be a solution you could use.
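If altering the table itself is awkward, a view can achieve the same thing. A hedged sketch (the table and view names here are made up; adjust identifier quoting to your database):
CREATE VIEW monthly_data_ordered AS
SELECT t.*,
       CASE t."Month"
           WHEN 'January'   THEN 1
           WHEN 'February'  THEN 2
           WHEN 'March'     THEN 3
           WHEN 'April'     THEN 4
           WHEN 'May'       THEN 5
           WHEN 'June'      THEN 6
           WHEN 'July'      THEN 7
           WHEN 'August'    THEN 8
           WHEN 'September' THEN 9
           WHEN 'October'   THEN 10
           WHEN 'November'  THEN 11
           WHEN 'December'  THEN 12
       END AS "OrderingColumn"
FROM your_table t;
Pointing the chart's dataset at this view then lets ORDER BY "OrderingColumn" produce calendar order.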
I hope this is useful, even if only as a different way of approaching the problem.

How to see 'full' SQL Error Messages in BigQuery?

I am writing a large MERGE statement in BigQuery.
When I attempt to run this query, the validator gives me an error involving a lot of '...'s that hide the useful information, as shown below:
Value has type ARRAY<STRUCT<eventName STRING, eventUUID STRING, eventDate DATE, ...>> which cannot be inserted into column Events, which has type ARRAY<STRUCT<eventName STRING, eventUUID STRING, eventDate DATE, ...>> at [535:1]
I am extremely confident these two array objects match exactly, however since I am struggling to get around this I would love to see the full error message.
Is there any way to see the full error?
I have looked into the Google Logging tool and cannot see any additional information.
I have also tried the following Cloud Shell command:
bq --format=prettyjson show -j [Job Id Goes Here]
Again, this seems to provide no additional information.
This approach feels pretty silly, but it could be the last resort for a really long nested type.
First, use INFORMATION_SCHEMA.COLUMNS to get the full type string of the target column (in your case, the column Events).
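A hedged sketch of that first query (the dataset and table names are placeholders):
SELECT data_type
FROM yourDataset.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'yourTable'
  AND column_name = 'Events';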
Then, use CREATE TABLE <yourDataset>.<yourTempTable> AS SELECT ... to dump one row of the Value into a table, and repeat the first step on that table to see the value's full type string.
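And a sketch of that second step (again with placeholder names; the SELECT stands in for whatever expression produces the Value in your MERGE):
-- Materialize a single row so BigQuery infers and stores its full type
CREATE TABLE yourDataset.yourTempTable AS
SELECT yourValueExpression AS Events
FROM yourSourceTable
LIMIT 1;
Running the first query against yourTempTable then shows the full type string of the value, which you can diff against the target column's type.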

Dataprep change str yyyymmdd date to datetime column

I have a column with dates (in a string format) in Dataprep: yyyymmdd. I would like it to become a datetime object. Which function/transformation should I apply to achieve this result automatically?
In this case, you actually don't need to apply a transformation at all: you can just change the column type to Date/Time and select the appropriate format options.
Note: This is one of the least intuitive parts of Dataprep, as you have to select an incorrect format (in this case yy-mm-dd) before you can drill down to the correct format (yyyymmdd).
Here's a screenshot of the Date / Time type window to illustrate this:
While it's unintuitive, this will correctly treat the column as a date in future operations, including assigning the correct type in export operations (e.g. BigQuery).
Through the UI, this will generate the following Wrangle Script:
settype col: YourDateCol customType: 'Datetime','yy-mm-dd','yyyymmdd' type: custom
According to the documentation, this should also work (and is more succinct):
settype col: YourDateCol type: 'Datetime','yy-mm-dd','yyyymmdd'
Note that if you absolutely needed to do this in a function context, you could extract the date parts using SUBSTRING/LEFT/RIGHT and pass them to the DATE or DATETIME function to construct a datetime object. As you've probably already found, DATEFORMAT will return NULL if the source column isn't already of type Datetime.
(From a performance standpoint, though, it would probably be far more efficient for a large dataset to just change the type or create a new column with the correct type, versus performing those extra operations on so many rows.)
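Incidentally, if the data is headed to BigQuery anyway (as mentioned above), the conversion could also be done after export, since BigQuery's PARSE_DATE accepts this layout directly. A sketch with made-up dataset, table, and column names:
SELECT PARSE_DATE('%Y%m%d', YourDateCol) AS parsed_date
FROM yourDataset.yourTable;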

Getting a date value in a Postgres table column and checking if it's bigger than today's date

I have a Postgres table called clients. The name column contains values such as:
test23233 [987665432,2014-02-18]
At the end of the value is a date. I need to compare this date and return all records where it is more recent than today.
I tried
select id,name FROM clients where name ~ '(\d{4}\-\d{1,2}\-\d{1,2})';
but this isn't returning any values. How would I go about achieving the results I want?
If the data is always stored this way (i.e. after the comma), I would not use a regex, but extract the date part and convert it to a proper date type.
SELECT *
FROM the_table
WHERE to_date(substring(name, strpos(name, ',') + 1, 10), 'yyyy-mm-dd') > current_date
You might want to put that to_date(...) thing into a view to make this easier for other queries.
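For instance, a sketch of such a view (the view name and the name_date alias are made up):
CREATE VIEW clients_with_date AS
SELECT id,
       name,
       to_date(substring(name, strpos(name, ',') + 1, 10), 'yyyy-mm-dd') AS name_date
FROM clients;
-- The original question then becomes a one-liner:
SELECT id, name FROM clients_with_date WHERE name_date > current_date;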
In the long run you should really try to fix that data model.
Using a regular expression for this would be extremely hard. Is it possible to change the schema and data to separate the name, whatever the second value is, and the timestamp into separate columns? That would be far more logical, less error prone, and significantly faster.
Otherwise, I suspect you'll have to use some sort of parsing (possibly a regex) to extract the date, then convert it to a Postgres date, then compare that with the current time... for every single row. Ick.
EDIT: Actually, it's not quite that bad... because your dates are stored in a sort-friendly way, it's possible that you could do the extraction (whether with a regex or anything else) and just do an ordinal comparison with the string representation of today's date, without actually performing any date conversion for each row. It's still ugly though, and doesn't validate that the date isn't (say) 2011-99-99. If you can possibly store the data more sensibly, do.
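A sketch of that ordinal comparison, using Postgres's regex form of substring (to_char renders today's date in the same sortable yyyy-mm-dd layout, so plain string comparison works):
SELECT id, name
FROM clients
WHERE substring(name from '\d{4}-\d{2}-\d{2}') > to_char(current_date, 'YYYY-MM-DD');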
I solved my issue by doing something similar to this:
select id,substring(name,'[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}'),name FROM clients where substring(name,'[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}') > '2011-03-18';
It might not be best practice, but it works. I'm open to better suggestions, though.