I am trying to add some columns to an Athena table with CASCADE using the statement below:
ALTER TABLE test ADD columns (c1 string, c2 string) CASCADE;
But this gives an error in Athena. I have 2 questions:
Is CASCADE not supported in Athena with ALTER TABLE ADD COLUMNS?
Is there an IF NOT EXISTS option with ADD COLUMNS?
The ALTER TABLE ADD COLUMNS documentation does not show a CASCADE option, nor an IF NOT EXISTS option.
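For reference, a minimal sketch of the form Athena does accept, reusing the table and column names from the question (no CASCADE and no IF NOT EXISTS clause):

ALTER TABLE test ADD COLUMNS (c1 string, c2 string);

If a column might already exist, one workaround is to inspect the current definition first (for example with SHOW COLUMNS IN test or SHOW CREATE TABLE test) and only add the columns that are missing.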
In SQL Server, we can create an index like this. How do we create the index after the table already exists? What is the syntax to create a clustered index in BigQuery?
CREATE INDEX abcd ON `abcd.xxx.xxx`(columnname);
In BigQuery, we can create a table like the one below. But how do we create partitioning and clustering on an existing table?
CREATE TABLE rep_sales.orders_tmp PARTITION BY DATE(created_at) CLUSTER BY created_at AS SELECT * FROM rep_sales.orders
As @Sergey Geron mentioned in the comments, BigQuery doesn't support indexes. For more information, please refer to the BigQuery documentation.
An existing table cannot be partitioned but you can create a new partitioned table and then load the data into it from the unpartitioned table.
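A minimal sketch of that approach, reusing the table from the question (the new table name is hypothetical):

CREATE TABLE rep_sales.orders_partitioned
PARTITION BY DATE(created_at)
AS
SELECT * FROM rep_sales.orders;

Once the data is verified, the old table can be dropped and queries repointed at the new partitioned one.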
As for clustering of tables, BigQuery supports changing an existing non-clustered table to a clustered table and vice versa. You can also update the set of clustered columns of a clustered table. This method of updating the clustering column set is useful for tables that use continuous streaming inserts because those tables cannot be easily swapped by other methods.
You can change the clustering specification in the following ways:
Call the tables.update or tables.patch API method.
Call the bq command-line tool's bq update command with the --clustering_fields flag.
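For example, a hedged sketch of that bq invocation (the dataset, table, and column names are hypothetical):

bq update --clustering_fields=customer_id,order_date mydataset.mytable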
Note: When a table is converted from non-clustered to clustered or the clustered column set is changed, automatic re-clustering only works from that time onward. For example, a non-clustered 1 PB table that is converted to a clustered table using tables.update still has 1 PB of non-clustered data. Automatic re-clustering only applies to any new data committed to the table after the update.
I am aware that an Azure Table has a composite key that is made up of a RowKey and PartitionKey. I am also aware that you can pull an Azure Table into PowerBI. I am new to PowerBI, so I am not sure if I am using the right term, but what I would like to be able to do is break my Azure Table into multiple tables in PowerBI based on the PartitionKey. Is this something that is possible? If so, can someone point me in the right direction?
Thanks
Import all your Azure Table data as one PowerQuery table. Then right-click on the table in the PowerQuery editor and select Reference. This will give you a new table that points to the Azure Table data; call it "Partition Key Link" or "Partition Key Bridge". Remove all the row data columns. Right-click on the partition key column header and select "Remove Duplicates". You now have a table of distinct Partition Keys. Then go to your PowerBI model view and create a relationship between the link table and the data from your Azure Table. You can then link your other data to the link table in order to get to a model that will work well in PowerBI.
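A hedged Power Query M sketch of the bridge query described above, assuming the imported Azure Table query is named AzureTableData and the key column is named PartitionKey:

let
    // Reference the query that holds the full Azure Table import
    Source = AzureTableData,
    // Keep only the partition key column (drops all row-data columns)
    KeysOnly = Table.SelectColumns(Source, {"PartitionKey"}),
    // Remove duplicates so the bridge has one row per partition key
    DistinctKeys = Table.Distinct(KeysOnly)
in
    DistinctKeys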
I am trying to drop all the partitions on an external table in a Redshift cluster. I am unable to find an easy way to do it. I am currently doing this by running a dynamic query to select the dates from the table, concatenating them with the drop logic, and then taking the result set and running it separately, like this:
select 'ALTER TABLE procore_iad_ext.active_histories DROP PARTITION (values='''||rtrim(ltrim(values, '["'),'"]') ||''');' from svv_external_partitions
where tablename = 'xyz';
values looks like this -> ["2009-03-10"]
Looking for a simpler direct solution. Thanks.
The easiest way to do this would be to drop the table itself. As long as you have the DDL to recreate the table and don't mind dropping all partitions, just DROP TABLE <schemaname>.<tablename>; then recreate the table. The new table will not have any partitions.
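A hedged sketch of that approach using the table from the question; the column list, partition column, and S3 location are hypothetical stand-ins for your saved DDL:

DROP TABLE procore_iad_ext.active_histories;

CREATE EXTERNAL TABLE procore_iad_ext.active_histories (
    id         bigint,
    created_at timestamp
)
PARTITIONED BY (dt date)
STORED AS PARQUET
LOCATION 's3://example-bucket/active_histories/';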
Please check out the Glue catalog. It provides a UI to easily delete tables, partitions, etc.
I'm trying to duplicate a Redshift table including modifiers.
I've tried using a CTAS statement, but for some reason that fails to copy modifiers like NOT NULL:
create table public.my_table as (select * from public.my_old_table limit 1);
There also doesn't seem to be a way to alter the table to add modifiers after creating it, which leads me to believe that there isn't a way to duplicate a Redshift table's schema except by running the original CREATE TABLE statement instead of the CTAS statement.
According to the docs, you can do:
CREATE TABLE my_table(LIKE my_old_table);
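If the goal is a populated duplicate, a common pattern is to copy the schema with LIKE (which carries over column attributes such as NOT NULL) and then load the rows; a sketch using the table names from the question:

CREATE TABLE public.my_table (LIKE public.my_old_table INCLUDING DEFAULTS);
INSERT INTO public.my_table SELECT * FROM public.my_old_table;

INCLUDING DEFAULTS is optional; without it, column default expressions are not copied.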
Does the "Create table as" function in SQL Data Warehouse create statistics in the background, or do they have to manually be created (as I would when I do a normal "Create table" statement?)
As of the current version, you always have to create column-level statistics on tables, irrespective of whether it was created with a normal CREATE TABLE or the CTAS CREATE TABLE AS... command. It's also good practice to create stats for columns used in JOINs, WHERE clauses, GROUP BY, ORDER BY and DISTINCT clauses.
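A minimal sketch of doing that manually; the table and column names are hypothetical:

CREATE STATISTICS stat_orders_customer_id ON dbo.orders (customer_id);
CREATE STATISTICS stat_orders_order_date ON dbo.orders (order_date);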
Regarding tables created with CTAS, the database engine has a correct idea of how many rows are in the table as listed in sys.partitions, but not at the column-level statistics level. For tables created by an ordinary CREATE TABLE, this row count defaults to 1,000. For example, a first table created with a CTAS showed 208 rows; a second table created with an ordinary CREATE TABLE and populated by an INSERT from the first also held 208 rows, but sys.partitions still believed it to have 1,000.
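A hedged sketch of the kind of check described above (it sums the row counts that sys.partitions holds for each table):

SELECT t.name AS table_name, SUM(p.rows) AS rows_in_sys_partitions
FROM sys.tables t
JOIN sys.partitions p
  ON p.object_id = t.object_id
 AND p.index_id IN (0, 1)   -- heap or clustered index
GROUP BY t.name;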
Creating any column-level statistics manually will correct this number.
In summary, always manually create statistics against important columns irrespective of how the table was created.