QuickSight adds nesting to an SQL query, which causes errors in Athena

I'm trying to create a very simple visualisation in QuickSight, and to do this I'm using a custom SQL query:
SELECT COUNT(distinct uuid), day
FROM analytics.myTable
GROUP BY day
Unfortunately, whenever I run this query in QuickSight it fails with the following error from the AWS Athena client:
SYNTAX_ERROR: line 2:8: Column '_col0' cannot be resolved
When I look in Athena, I can see that QuickSight is "nesting" the SQL query, and this is what's causing the error in Athena:
/* QuickSight 4da449cf-ffc6-11e8-92ea-9ffafcc3adb3 */
SELECT "_col0"
FROM (SELECT COUNT(distinct uuid)
FROM pregnancy_analytics.final_test_parquet) AS "DAU"
What I don't understand is:
a) why is this flagging an error?
b) why is QuickSight nesting the SQL?
If I simply run the command directly in Athena,
SELECT COUNT(distinct uuid) FROM analytics.myTable
it does indeed return a column named "_col0":
_col0
1 1699174
So Athena itself has no problem with the unnamed column, and it's not clear why the query QuickSight generates around it should fail.
Can someone offer some advice on how to resolve this issue?
Thanks

You can modify the query to explicitly name the aggregated column and then the query will work.
Example:
SELECT COUNT(distinct uuid) as "distinct_uuid", day
FROM analytics.myTable
GROUP BY day

Often in visualization software you will need to explicitly name your aggregate/function-wrapped columns: they default to names like _col0, which the software doesn't parse well, so it throws that error.
Specifically, I see this all the time in Superset using Presto.
For your problem specifically, you should just do what Piotr recommended: add a name after COUNT(distinct uuid). I'm partial to freq, but it looks like you'll want something like uuid or unique_uuid :)
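To illustrate (a sketch reusing the table and alias from the answer above; the exact wrapper QuickSight generates may differ), once the alias is in place the generated outer query has a real column name to resolve:

```sql
/* Illustrative wrapped query after adding the alias; with a named
   column, the outer SELECT no longer depends on "_col0". */
SELECT "distinct_uuid", "day"
FROM (SELECT COUNT(DISTINCT uuid) AS "distinct_uuid", day
      FROM analytics.myTable
      GROUP BY day) AS "DAU"
```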

Related

Fetch Schedule data from a BigQuery Table to another BigQuery Table (Scheduled queries)

I am really new to GCP and I am trying to write a query in GCP BigQuery to fetch all data from one BigQuery table and insert it all into another BigQuery table.
I am trying the following query, where Project1.DataSet1.Table1 is the table I am trying to read the data from, and Project2.Dataset2.Table2 is the table I am trying to insert all the data into, with the same naming:
SELECT * FROM `Project1.DataSet1.Table1` LIMIT 1000
insert INTO `Project2.Dataset2.Table2`
But I am receiving a query error message.
Does anyone know how to solve this issue?
There may be a couple of comments...
The syntax is different: insert into ... select ... and so on; see DML statements in standard SQL.
Such an approach to data copying might not be very optimal in terms of time and cost. It might be better to use bq cp -f ... commands; see BigQuery Copy — How to copy data efficiently between BigQuery environments and the bq command-line tool reference, if that is possible in your case.
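For example, a server-side copy with bq cp might look like this (a sketch reusing the project/dataset names from the question; the command is echoed here so it can be reviewed before running for real):

```shell
# Sketch only: bq cp copies a table server-side, avoiding the cost of a
# SELECT-based copy; -f overwrites the destination table if it exists.
# The command is echoed for review; drop the echo to actually run it.
echo bq cp -f Project1:DataSet1.Table1 Project2:Dataset2.Table2
```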
The correct syntax of the query is as suggested by @al-dann. I will try to explain further with a sample query below:
Query:
insert into `Project2.Dataset2.Table2`
select * from `Project1.DataSet1.Table1`
This will insert all values from the input table into the second table.

How to set a custom partition expiration in BigQuery

I have a dataset for which the Default table expiration is 7 days. I want only one of the tables within this dataset to never expire.
I found the following bq command : bq update --time_partitioning_expiration 0 --time_partitioning_type DAY project-name:dataset-name.table_name
The problem is that my tables have a date suffix, so they are named like this example:
REF_PRICE_20210921, REF_PRICE_20210922, etc., so the table name prefix per se is REF_PRICE_.
I can't seem to apply the bq command to this table, as I get an error: BigQuery error in update operation: Not found: Table project-name:dataset-name.REF_PRICE_. But it does exist. What am I doing/understanding wrong?
EDIT: My tables are not "partitioned" but sharded; they are wildcard tables, and so separate tables. It is apparently not possible to set an expiration date for those tables, unless it is done on each one individually.
Have you tried suffixing the table name with *, like REF_PRICE_*?
Moreover, you should read this post, because you might have created sharded tables while you wanted partitioned ones.
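Since each shard is a separate table, one option is a small loop over the shard suffixes. A sketch (hypothetical project/dataset names reused from the question, and only two example suffixes; commands are echoed for review):

```shell
# Sketch: sharded tables (REF_PRICE_YYYYMMDD) are separate tables, so the
# expiration must be updated one shard at a time. With bq update,
# --expiration 0 removes the expiration so the table never expires.
# Commands are echoed for review; drop the echo to apply them.
for suffix in 20210921 20210922; do
  echo bq update --expiration 0 "project-name:dataset-name.REF_PRICE_${suffix}"
done
```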

Snowflake & Power BI "native queries aren't support by this value"

Quick Note, I have reviewed these threads and they do not fix my issue:
( Outdated info, see documentation below ) Access Snowflake query results through PowerBI
( I would expect this to fix my issue, but it does not ) How to write a Snowflake SELECT statement query in Advance Editor from powerBi
Hi All,
When attempting to query Snowflake with a native query, I get an error.
I have verified that the credentials / tables / databases / schemas are correct by connecting directly to one table at a time, but simple queries and complex queries alike all return this message: "native queries aren't support by this value".
I know this is a new feature ( June 2021 ) and I have read the documentation here: https://learn.microsoft.com/en-us/power-query/connectors/snowflake#connect-using-advanced-options
EDIT:
I have tried the following query formats:
SELECT * FROM "MyDatabase".PUBLIC.ITEMSTABLE
SELECT * FROM "MyDatabase"."PUBLIC".ITEMSTABLE
SELECT * FROM "MyDatabase"."PUBLIC"."ITEMSTABLE"
I believe that this may be due to my MyDatabase being case sensitive and PowerBI stripping the quotes around it in the query.
In Snowflake, this query succeeds, while the same query in Power BI fails:
SELECT * FROM "MyDatabase".PUBLIC.ITEMSTABLE
Issue opened with Microsoft here:
https://community.powerbi.com/t5/Issues/Unable-to-query-case-sensitive-Snowflake-tables/idc-p/2030983
Any help is appreciated.
Most likely the query provided in the message box is terminated with a semicolon. It should be removed from the source query.
The actual query sent to Snowflake is wrapped in an outer query, so any kind of input that makes the full query invalid will error out.
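As an illustration of why the trailing semicolon matters (the table name is reused from the question above, and the wrapper shown is illustrative, not the exact one Power BI generates):

```sql
/* If the connector wraps the provided statement like this, a trailing
   semicolon inside the subquery makes the whole statement invalid: */
SELECT *
FROM (
    SELECT * FROM "MyDatabase".PUBLIC.ITEMSTABLE;   -- invalid: semicolon inside the wrapper
) AS "wrapped"

/* Without the semicolon, the wrapped statement parses cleanly. */
SELECT *
FROM (
    SELECT * FROM "MyDatabase".PUBLIC.ITEMSTABLE
) AS "wrapped"
```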
I had a similar issue with a native query written using Dataverse as the data source. The Power BI refresh was successful in Power BI Desktop, but the refresh was failing on the Power BI server. It was fixed for me when I appended the [EnableFolding=false] keyword to the native query. Hope this will help someone.
Regards,
Mohith
It has been confirmed by a Microsoft ticket that my issue was that I had a case sensitive database name. The solution from MS was to... Not have a case sensitive DB name.

How to write a Snowflake SELECT statement query in Advance Editor from powerBi

I am trying to repoint the data source of a Power BI dashboard from SQL to Snowflake.
So far this has worked for most of the tables. However, there is one table for which I get the following error when changing the data source:
Expression.Error: 'Query' are not valid Snowflake options. Valid options are: 'ConnectionTimeout, CommandTimeout, CreateNavigationProperties, Role'
This specific query (from the Advanced Editor in Power BI) contains a simple select and looks as follows:
let
    Source = Snowflake.Databases("serverabc", "abc", [Query="SELECT DateLabel, SnapshotDate, Future, Latest#(lf)FROM Xtable#(lf)WHERE DateLabel IS NOT NULL#(lf)GROUP BY DateLabel, SnapshotDate, Future, Latest", CreateNavigationProperties=false]),
    #"Filtered Rows" = Table.SelectRows(Source, each true)
in
    #"Filtered Rows"
The select statement works in both SQL and Snowflake, but I am having difficulty with how to translate this into Power BI as well.
Thank you in advance
EDIT:
From the Power BI June update, "Snowflake (updated connector)":
"We are adding the highly demanded Custom SQL support for the Snowflake connector. Like the SQL connector, this will let you input a Snowflake native query and then build reports on top of it. This will work with both Import and Direct Query mode."
https://powerbiblogscdn.azureedge.net/wp-content/uploads/2021/06/snowflake_update.png
Expression.Error: 'Query' are not valid Snowflake options. Valid options are: 'ConnectionTimeout, CommandTimeout, CreateNavigationProperties, Role'
It seems that the previous source was supporting "custom query".
Sql.Database
function (server as text, database as text, optional options as nullable record) as table
Query : A native SQL query used to retrieve data. If the query produces multiple result sets, only the first will be returned.
The Power BI connector to Snowflake does not support such an option:
Snowflake.Databases
function (server as text, warehouse as text, optional options as nullable record) as table
An optional record, options, may be specified to control the following options:
ConnectionTimeout: The number of seconds to wait for network responses from Snowflake.
CommandTimeout: The number of seconds to wait for a query to execute.
There is an active ticket: Snowflake connector -> add SQL statement as optional.
Possible workarounds:
Create a view in Snowflake that wraps the query and use it instead
Access the table content and perform the filtering/aggregation in Power Query
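The first workaround can be sketched as follows (ITEMS_VIEW is a hypothetical name; the query body is the one from the question above):

```sql
-- Sketch: wrap the native query in a Snowflake view, then select from
-- the view in Power BI instead of passing a custom query.
CREATE OR REPLACE VIEW ITEMS_VIEW AS
SELECT DateLabel, SnapshotDate, Future, Latest
FROM Xtable
WHERE DateLabel IS NOT NULL
GROUP BY DateLabel, SnapshotDate, Future, Latest;
```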

Amazon Redshift query editor triggering a query twice

I am trying to insert some values in a customer table:
INSERT into customer
(c_custkey,c_name,c_address,c_city,c_nation,c_region,c_phone,c_mktsegment)
VALUES (123,'ddf','sfvvc','ccxx','dddd','dddss','sszzs','sssaaaa');
I am using the Amazon Redshift Query Editor for this. The query is getting triggered twice and I can see that in STL_QUERY and STV_RECENTS tables.
Can someone help me resolve this and explain why it works this way?