SELECT DISTINCT in PartiQL (AWS DynamoDB) - amazon-web-services

I am running the following query in the AWS DynamoDB PartiQL editor:
SELECT DISTINCT column1
FROM "my_lucky-table"
WHERE Id = "db05-5d1"
but I am getting the following error:
ValidationException: Unsupported token in expression: DISTINCT
Any idea how to deal with this? If DISTINCT is not supported in PartiQL, what else can I run to get the unique values from column1? Thank you.

DynamoDB's PartiQL support does not include DISTINCT. I'm not sure what data type your column1 is, but for the most part you would need to do the distinct filtering on your client side, not on the DynamoDB side.
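A minimal Python sketch of that client-side de-duplication (the table, key, and attribute names are the ones from the question; the sample `items` list stands in for the response of a boto3 `execute_statement` call, assuming column1 is a string attribute):

```python
# Fetch matching items with PartiQL (no DISTINCT), then de-duplicate client-side.
# The commented-out boto3 call shows the shape of the real fetch; the sample
# `items` list below stands in for resp["Items"].
#
# import boto3
# client = boto3.client("dynamodb")
# resp = client.execute_statement(
#     Statement='SELECT column1 FROM "my_lucky-table" WHERE Id = ?',
#     Parameters=[{"S": "db05-5d1"}],
# )
# items = resp["Items"]

items = [  # DynamoDB returns typed attribute values
    {"column1": {"S": "a"}},
    {"column1": {"S": "b"}},
    {"column1": {"S": "a"}},
]

seen = set()
unique_values = []
for item in items:
    value = item["column1"]["S"]  # assumes column1 is a string attribute
    if value not in seen:
        seen.add(value)
        unique_values.append(value)

print(unique_values)  # ['a', 'b']
```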

Related

Is it possible to use JOIN in PartiQL in Dynamo?

From the AWS guide here: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/EMRforDynamoDB.Querying.html
SELECT ecs.state_name, f.feature_name, f.feature_class
FROM s3_east_coast_states ecs
JOIN ddb_features f ON ecs.state_alpha = f.state_alpha
WHERE ecs.state_name LIKE 'New%';
That's definitely a JOIN. But when I run a join:
SELECT * FROM "division-allocations-dev" da JOIN "branch-division-dev" bd ON bd.divisionID = da.divisionID WHERE da.divisionID = 499;
I get this error:
Only select from a single table or index is supported.
Now those docs are specific to EMR for Dynamo, so is a JOIN only allowed in the EMR tool? PartiQL definitely has JOINs so is Dynamo only supporting a subset of PartiQL? If so, where do I find a list of what Dynamo supports?
Is it possible to use JOIN in PartiQL in Dynamo?
Short answer: no.
Dynamo only supporting a subset of PartiQL?
Yes. The DynamoDB PartiQL subset provides familiar syntax consistent with the (no-join) core API.
where do I find a list of what Dynamo supports?
See the docs. You get SELECT (no joins), UPDATE, INSERT and DELETE.
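Since the JOIN has to happen in your application, one common pattern is to run one single-table SELECT per table and merge the results in code. A minimal Python sketch, with hypothetical rows standing in for the two PartiQL result sets:

```python
# DynamoDB PartiQL has no JOIN, so run one single-table SELECT per table and
# join the results in application code. The rows below are hypothetical
# stand-ins for the two result sets.
allocations = [  # e.g. SELECT * FROM "division-allocations-dev" WHERE divisionID = 499
    {"divisionID": 499, "allocation": 0.25},
    {"divisionID": 499, "allocation": 0.75},
]
branches = [  # e.g. SELECT * FROM "branch-division-dev" WHERE divisionID = 499
    {"divisionID": 499, "branch": "East"},
]

# Index one side by the join key, then merge matching rows (an inner join).
branches_by_division = {}
for b in branches:
    branches_by_division.setdefault(b["divisionID"], []).append(b)

joined = [
    {**a, **b}
    for a in allocations
    for b in branches_by_division.get(a["divisionID"], [])
]
print(joined)
```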

Fetch Schedule data from a BigQuery Table to another BigQuery Table (Scheduled queries)

I am really new to GCP and I am trying to write a query in GCP BigQuery that fetches all data from one BigQuery table and inserts it into another BigQuery table.
I am trying the following query, where Project1.DataSet1.Table1 is the table I am trying to read the data from, and Project2.Dataset2.Table2 is the table I am trying to insert all the data into, with the same column names:
SELECT * FROM `Project1.DataSet1.Table1` LIMIT 1000
insert INTO `Project2.Dataset2.Table2`
But I am receiving a query error message.
Does anyone know how to solve this issue?
A couple of comments:
The syntax is different: insert into ... select ... and so on; see DML statements in standard SQL.
Copying data this way may not be optimal in terms of time and cost. It might be better to use bq cp -f ... commands, see BigQuery Copy — How to copy data efficiently between BigQuery environments and the bq command-line tool reference, if that is possible in your case.
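The second suggestion can be sketched with the bq CLI (the project, dataset, and table names are the question's placeholders; -f overwrites the destination without prompting):

```shell
# Copy Table1 into Table2 across projects; -f forces overwrite of the destination
bq cp -f Project1:DataSet1.Table1 Project2:Dataset2.Table2
```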
The correct syntax of the query is as suggested by @al-dann. I will try to explain further with a sample query below:
Query:
insert into `Project2.Dataset2.Table2`
select * from `Project1.DataSet1.Table1`
This will insert all rows from the first table into the second table. (The original answer illustrated this with input and output table screenshots.)

How to write a Snowflake SELECT statement query in the Advanced Editor in Power BI

I am trying to repoint the data source of a Power BI dashboard from SQL Server to Snowflake.
So far this has worked for most of the tables. However, there is one table for which I get the following error when changing the data source:
Expression.Error: 'Query' are not valid Snowflake options. Valid options are: 'ConnectionTimeout, CommandTimeout, CreateNavigationProperties, Role'
This specific query (from the Advanced Editor in Power BI) contains a simple select and looks as follows:
let
Source =Snowflake.Databases("serverabc", "abc", [Query="SELECT DateLabel, SnapshotDate, Future, Latest#(lf)FROM Xtable#(lf)WHERE DateLabel IS NOT NULL#(lf)GROUP BY DateLabel, SnapshotDate, Future, Latest", CreateNavigationProperties=false]),
#"Filtered Rows" = Table.SelectRows(Source, each true)
in
#"Filtered Rows"
The select statement works in both SQL Server and Snowflake, but I am having difficulty translating it for Power BI.
Thank you in advance
EDIT:
Power BI June 2021 update
Snowflake (updated connector)
We are adding the highly demanded Custom SQL support for the Snowflake connector. Like the SQL connector, this will let you input a Snowflake native query and then build reports on top of it. This will work with both Import and Direct Query mode.
https://powerbiblogscdn.azureedge.net/wp-content/uploads/2021/06/snowflake_update.png
Expression.Error: 'Query' are not valid Snowflake options. Valid options are: 'ConnectionTimeout, CommandTimeout, CreateNavigationProperties, Role'
It seems that the previous source supported a custom query option:
Sql.Database
function (server as text, database as text, optional options as nullable record) as table
Query : A native SQL query used to retrieve data. If the query produces multiple result sets, only the first will be returned.
The Power BI connector for Snowflake does not support such an option:
Snowflake.Databases
function (server as text, warehouse as text, optional options as nullable record) as table
An optional record parameter, options, may be specified to control the following options:
ConnectionTimeout: The number of seconds to wait for network responses from Snowflake.
CommandTimeout: The number of seconds to wait for a query to execute.
There is an active ticket: Snowflake connector -> add SQL statement as optional.
Possible workarounds:
Create a view in Snowflake that wraps the query and use it instead
Access the table content and perform the filtering/aggregation in Power Query
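The first workaround can be sketched as a Snowflake view wrapping the query from the Advanced Editor (the database, schema, and view names here are hypothetical):

```sql
-- Wrap the report query in a view, then select from the view in Power BI
-- via the connector's navigation (no Query option needed).
CREATE OR REPLACE VIEW my_db.my_schema.v_xtable_labels AS
SELECT DateLabel, SnapshotDate, Future, Latest
FROM Xtable
WHERE DateLabel IS NOT NULL
GROUP BY DateLabel, SnapshotDate, Future, Latest;
```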

Amazon Redshift query editor triggering a query twice

I am trying to insert some values in a customer table:
INSERT into customer
(c_custkey,c_name,c_address,c_city,c_nation,c_region,c_phone,c_mktsegment)
VALUES (123,'ddf','sfvvc','ccxx','dddd','dddss','sszzs','sssaaaa');
I am using the Amazon Redshift Query Editor for this. The query is getting triggered twice, and I can see that in the STL_QUERY and STV_RECENTS tables.
Can someone help me resolve this and explain why it works this way?

Quick sight adding nesting to SQL query which causes errors in Athena

I'm trying to create a very simple visualisation in QuickSight, and to do this I'm using a SQL query in QuickSight:
SELECT COUNT(distinct uuid), day
FROM analytics.myTable
GROUP BY day
Unfortunately, whenever I run this query in QuickSight it fails with the following error from the AWS Athena client:
SYNTAX_ERROR: line 2:8: Column '_col0' cannot be resolved
When I look in Athena, I can see that QuickSight is "nesting" the SQL query, and this is what's causing the error in Athena:
/* QuickSight 4da449cf-ffc6-11e8-92ea-9ffafcc3adb3 */
SELECT "_col0"
FROM (SELECT COUNT(distinct uuid)
FROM pregnancy_analytics.final_test_parquet) AS "DAU"
What I don't understand is:
a) why this is flagging an error?
b) why Quicksight is nesting the SQL?
If I simply run the command directly in Athena,
SELECT COUNT(distinct uuid) FROM analytics.myTable
It does indeed show the column name "_col0",
_col0
1699174
so the fact that Quicksight is raising an error shouldn't actually be a problem.
Can someone offer some advice on how to resolve this issue?
Thanks
You can modify the query to explicitly name the aggregated column and then the query will work.
Example:
SELECT COUNT(distinct uuid) as "distinct_uuid", day
FROM analytics.myTable
GROUP BY day
Often in visualization software you will need to explicitly name your aggregate/function-wrapped columns as they default to things like _col0 which the software doesn't parse well, so it throws that error.
Specifically, I see this all the time in Superset using Presto.
For your problem specifically, you should do what Piotr recommended and add a name after COUNT(distinct uuid). I'm partial to freq, but it looks like you'll want something like uuid or unique_uuid :)