AWS QuickSight could not generate any output column after applying transformation

I am trying to create a visualization using AWS QuickSight; I've done it before using the same data source and tables.
Right now, when I try to run a simple query
select * from table order by time desc limit 10;
it outputs: QuickSight could not generate any output column after applying transformation.
When I run the same query in AWS Athena it works fine.
I have my data in SPICE.
EDIT: I've just re-created my dataset in QuickSight and it is now working... I still want to know what was wrong.

I got the same error in QuickSight when I created a new custom SQL data source pulling data out of Spectrum. I also got it to work by re-creating the data source.

Related

How can I save SQL script from AWS Athena view with boto3/python

I have been working with AWS Athena for a while and need to create a backup and version control of the views. I'm trying to build an automated backup that runs daily and captures all the views.
I tried to find a way to copy all the views created in Athena using boto3, but I couldn't find one. With DBeaver I can see and export the views' SQL scripts, but from what I've seen only one at a time, which does not serve the goal.
I'm open to any approach.
I tried to find an answer to my question in the boto3 and DBeaver documentation. Reading threads on Stack Overflow and some Google searches did not take me very far.
Views and Tables are stored in the AWS Glue Data Catalog.
You can query the AWS Glue Data Catalog (see "Query the AWS Glue Data Catalog" in the Amazon Athena documentation) to obtain information about tables, partitions, columns, etc.
However, if you want to obtain the DDL that was used to create the views, you will probably need to use SHOW CREATE TABLE [db_name.]table_name:
Analyzes an existing table named table_name to generate the query that created it.
Have you tried using get_query_results in boto3?
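Putting those suggestions together, here is a rough boto3 sketch (mine, not the answerer's): it lists the views through the Glue Data Catalog, then runs SHOW CREATE VIEW (Athena's view counterpart of the SHOW CREATE TABLE statement quoted above) for each one and reads the DDL back with get_query_results. The database name and S3 output location are placeholders you would need to change.

# Sketch: back up the DDL of every Athena view in one database.
# "my_database" and the S3 output location are placeholders.
import time
import boto3

DATABASE = "my_database"
OUTPUT = "s3://my-athena-results-bucket/backup/"  # Athena needs somewhere to write results

glue = boto3.client("glue")
athena = boto3.client("athena")

def run_query(sql):
    """Start an Athena query, wait for it to finish, and return the result rows."""
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": DATABASE},
        ResultConfiguration={"OutputLocation": OUTPUT},
    )["QueryExecutionId"]
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Query failed: {sql}")
    return athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]

# 1. Find the views: they live in the Glue Data Catalog as tables of type VIRTUAL_VIEW.
views = []
paginator = glue.get_paginator("get_tables")
for page in paginator.paginate(DatabaseName=DATABASE):
    views += [t["Name"] for t in page["TableList"] if t.get("TableType") == "VIRTUAL_VIEW"]

# 2. For each view, fetch the DDL and write it to a local .sql file for versioning.
for view in views:
    rows = run_query(f"SHOW CREATE VIEW {view}")
    ddl = "\n".join(r["Data"][0].get("VarCharValue", "") for r in rows)
    with open(f"{view}.sql", "w") as f:
        f.write(ddl)

You could run this daily (for example from a Lambda or a cron job) and commit the generated .sql files to version control.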

Copying BigQuery result to another table is not working?

I have noticed a weird problem in BigQuery over the last 2-3 days; earlier it was working fine.
I have a BigQuery table in a dataset located in the EU region. I am running a simple SELECT query on that table and it runs without any issue.
Now, when I try to save that query result into another BigQuery table in the same dataset, it gives the error below:
To copy a table, the destination and source datasets must be in the
same region. Copy an entire dataset to move data between regions.
The strange part is that other alternatives are working fine, such as:
Copying the source table to a new table works fine.
When I set the destination table in the query settings and run the query, it is able to save the query result into that configured table.
Running the query, accessing the temporary table where BigQuery actually stores the query result, and then copying that temporary table to the destination table also works.
I'm not sure why only the "save results" option is not working; it was working before.
Does anyone have any idea if something has changed on GCP recently?
You can try CREATE OR REPLACE TABLE `abc.de.omg` AS SELECT ... to store the same result.
Edit: another workaround is to set it up as a scheduled query and run it as a backfill once.
On another note, anyone finding this can comment on the reported bug here: https://issuetracker.google.com/issues/233184546 (I'm not the original poster)
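If you want to run that workaround from code instead of the console, a minimal sketch with the google-cloud-bigquery Python client could look like the following. The project, dataset, and table names are placeholders, and the location is assumed to be EU as in the question.

# Sketch: materialize a query result with CREATE OR REPLACE TABLE ... AS SELECT.
# Project/dataset/table names below are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

sql = """
CREATE OR REPLACE TABLE `my-project.my_eu_dataset.query_result` AS
SELECT *
FROM `my-project.my_eu_dataset.source_table`
"""

# Pin the job to the dataset's region to avoid cross-region surprises.
job = client.query(sql, location="EU")
job.result()  # wait for completion
print("Rows written:", client.get_table("my-project.my_eu_dataset.query_result").num_rows)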
I tried to save the query results as a BigQuery table as below; I manually gave the dataset name and it worked.
This happens when you have the source and destination datasets in different regions.
You can share screenshots of the source and destination datasets so the regions can be checked.
I tried to reproduce the issue:
Error when I typed the dataset name manually.
Successful when I selected the dataset from the drop-down - strange.

Fetch Schedule data from a BigQuery Table to another BigQuery Table (Scheduled queries)

I am really new to GCP and I am trying to run a query in GCP BigQuery to fetch all data from one BigQuery table and insert it all into another BigQuery table.
I am trying the following query, where Project1.DataSet1.Table1 is the table I am trying to read the data from, and Project2.Dataset2.Table2 is the table where I am trying to insert all the data, keeping the same naming:
SELECT * FROM `Project1.DataSet1.Table1` LIMIT 1000
insert INTO `Project2.Dataset2.Table2`
But I am receiving a query error message.
Does anyone know how to solve this issue?
There may be a couple of comments...
The syntax might be different => insert into ... select ... and so on - see DML statements in standard SQL.
Such an approach to copying data might not be optimal in terms of time and cost. It might be better to use bq cp -f ... commands - see "BigQuery Copy - How to copy data efficiently between BigQuery environments" and the bq command-line tool reference - if that is possible in your case.
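If the bq CLI is not an option, the same copy can be done from Python with the client's copy_table. This is only a sketch: the table names are taken from the question, and WRITE_TRUNCATE stands in for the -f (force/overwrite) flag.

# Sketch: copy a table instead of running INSERT ... SELECT (usually cheaper and faster).
# Table names are the placeholders from the question.
from google.cloud import bigquery

client = bigquery.Client()

source = "Project1.DataSet1.Table1"
destination = "Project2.Dataset2.Table2"

# WRITE_TRUNCATE mirrors the -f flag of bq cp (overwrite the destination if it exists).
job_config = bigquery.CopyJobConfig(write_disposition="WRITE_TRUNCATE")
copy_job = client.copy_table(source, destination, job_config=job_config)
copy_job.result()  # wait for the copy to finish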
The correct syntax of the query is as suggested by @al-dann. I will try to explain further with a sample query below:
Query:
insert into `Project2.Dataset2.Table2`
select * from `Project1.DataSet1.Table1`
Given an input table, this will insert its rows into the second (output) table.
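Since the title also mentions scheduled queries: assuming you want this copy to run on a schedule, one option (a sketch using the BigQuery Data Transfer Service Python client, not something from the answers above) is to register the SELECT as a scheduled query that writes into Table2. The names come from the question and the schedule is a placeholder.

# Sketch: register the copy as a BigQuery scheduled query via the Data Transfer Service.
# Project/dataset/table names and the schedule are placeholders.
from google.cloud import bigquery_datatransfer

transfer_client = bigquery_datatransfer.DataTransferServiceClient()

transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id="Dataset2",
    display_name="Copy Table1 into Table2 daily",
    data_source_id="scheduled_query",
    params={
        "query": "SELECT * FROM `Project1.DataSet1.Table1`",
        "destination_table_name_template": "Table2",
        "write_disposition": "WRITE_TRUNCATE",
    },
    schedule="every 24 hours",
)

transfer_config = transfer_client.create_transfer_config(
    parent=transfer_client.common_project_path("Project2"),
    transfer_config=transfer_config,
)
print("Created scheduled query:", transfer_config.name)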

Have you managed to get results when running the query SELECT * FROM information_schema.views in AWS Athena?

Is there a bug in the information_schema.views implementation in AWS Athena?
I'm getting 0 rows returned when running the query SELECT * FROM information_schema.views in AWS Athena, even though the database I'm running it against has tables and views in it.
Wanted to check if anyone else is facing the same issue.
I'm trying to fetch the view definition script for ALL views in the AWS Athena database as a single result set instead of using the SHOW CREATE VIEW statement.
You should be using
SHOW TABLES IN example_database; to get information about the tables in the Athena database.
And then loop through them, using a DESCRIBE query to fetch the details.
Hope this will help!
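A rough boto3 sketch of that loop follows; the database name and S3 output location are placeholders, and you can swap DESCRIBE for SHOW CREATE VIEW if, as in the question, you want the view definitions rather than the columns.

# Sketch: list the tables/views in a database with SHOW TABLES and loop over them.
# "example_database" and the S3 output location are placeholders.
import time
import boto3

athena = boto3.client("athena")

def run(sql, database="example_database", output="s3://my-athena-results-bucket/"):
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output},
    )["QueryExecutionId"]
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
        if state not in ("QUEUED", "RUNNING"):
            break
        time.sleep(1)
    return athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]

# SHOW TABLES returns one row per table/view name.
names = [row["Data"][0]["VarCharValue"] for row in run("SHOW TABLES IN example_database")]

for name in names:
    # DESCRIBE gives column details; use SHOW CREATE VIEW here instead for view DDL.
    for row in run(f"DESCRIBE {name}"):
        print(name, row["Data"][0].get("VarCharValue", ""))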

AWS Glue triggering a job

I have modified a Glue-generated script that I use for transforming and manipulating the data. I want to run the same job, by trigger, on every new table that appears in the catalog, but without manually changing the table name in the job script.
In short, how can I run the same transformation the script provides on every new table that appears in the Data Catalog without manually changing the table name every time?
Thanks
You can use the catalog client to dynamically get the list of tables in the database. I don't know how to get the catalog client in PySpark, but in Scala it looks like this:
import scala.collection.JavaConverters._  // needed for .asScala

val catalog = glueContext.getCatalogClient
// loop over every table in the database and apply the same transformation
for (table <- catalog.listTables("myDatabaseName", "").getTableList.asScala) {
  // do your transformation here, e.g. using table.getName
}
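For a PySpark Glue job, a possible equivalent (my sketch, not part of the answer above; the database name and the transform() body are placeholders) is to list the tables with the boto3 Glue client and build a DynamicFrame per table:

# Sketch for a PySpark Glue job: loop over every table in a Glue database.
# "my_database" and the transform() body are placeholders.
import boto3
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())
glue = boto3.client("glue")

def transform(dynamic_frame):
    # ... your existing transformation logic goes here ...
    return dynamic_frame

paginator = glue.get_paginator("get_tables")
for page in paginator.paginate(DatabaseName="my_database"):
    for table in page["TableList"]:
        frame = glue_context.create_dynamic_frame.from_catalog(
            database="my_database",
            table_name=table["Name"],
        )
        transform(frame)

Because the table list is resolved at run time, the same job script can be triggered whenever a new table lands in the catalog without editing the table name.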