How to RENAME struct/array nested columns using ALTER TABLE in BigQuery?

Suppose we have the following table in BigQuery:
CREATE TABLE sample_dataset.sample_table (
id INT
,struct_geo STRUCT
<
country STRING
,state STRING
,city STRING
>
,array_info ARRAY
<
STRUCT<
key STRING
,value STRING
>
>
);
I want to rename the columns inside the STRUCT and the ARRAY using an ALTER TABLE command. For normal ("non-nested") columns, it's possible to follow the Google documentation available here:
ALTER TABLE sample_dataset.sample_table
RENAME COLUMN id TO str_id
But when I try to run the same command for nested columns, I get errors from BigQuery.
Running the command for a column inside a STRUCT gives me the following message:
ALTER TABLE sample_dataset.sample_table
RENAME COLUMN `struct_geo.country` TO `struct_geo.str_country`
Error: ALTER TABLE RENAME COLUMN not found: struct_geo.country.
The exact same message appears when I run the same statement, but targeting a column inside an ARRAY:
ALTER TABLE sample_dataset.sample_table
RENAME COLUMN `array_info.str_key` TO `array_info.str_key`
Error: ALTER TABLE RENAME COLUMN not found: array_info.str_key
I got stuck since the BigQuery documentation about nested columns (available here) lacks examples of ALTER TABLE statements and refers directly to the default documentation for non-nested columns.
I understand that I can rename the columns by simply creating a new table using a CREATE TABLE new_table AS SELECT ... and then passing the new column names as aliases, but this would run a query over the whole table, which I'd rather avoid since my original table weighs way over 10TB...
Thanks in advance for any tips or solutions!
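For reference, the CREATE TABLE ... AS SELECT fallback mentioned above could look roughly like the statement generated below. This is only a sketch: Python is used merely to assemble the SQL text, the target name sample_table_renamed and the helper functions are hypothetical, and the resulting statement still scans the whole table, so it does not address the cost concern. It rebuilds the STRUCT with aliased fields and rebuilds the ARRAY by unnesting its elements and collecting them again in their original order.

```python
def struct_expr(column, renames, fields):
    """Rebuild a STRUCT column, aliasing each field to its new name."""
    parts = [f"{column}.{f} AS {renames.get(f, f)}" for f in fields]
    return f"STRUCT({', '.join(parts)}) AS {column}"


def array_of_struct_expr(column, renames, fields):
    """Rebuild an ARRAY<STRUCT<...>> column element by element, keeping order."""
    parts = [f"e.{f} AS {renames.get(f, f)}" for f in fields]
    return (
        f"ARRAY(SELECT AS STRUCT {', '.join(parts)} "
        f"FROM UNNEST({column}) AS e WITH OFFSET AS pos ORDER BY pos) AS {column}"
    )


def build_ctas(source, target, select_items):
    """Assemble the CREATE TABLE ... AS SELECT statement as plain text."""
    cols = ",\n  ".join(select_items)
    return f"CREATE TABLE {target} AS\nSELECT\n  {cols}\nFROM {source}"


sql = build_ctas(
    "sample_dataset.sample_table",
    "sample_dataset.sample_table_renamed",  # hypothetical target table name
    [
        "id",
        struct_expr("struct_geo", {"country": "str_country"},
                    ["country", "state", "city"]),
        array_of_struct_expr("array_info", {"key": "str_key"},
                             ["key", "value"]),
    ],
)
print(sql)
```

The printed statement can be reviewed and then run in BigQuery; fields not listed in the rename mapping keep their current names.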

Related

make all values of multiple columns in one column Power BI

I want to get this result (shown in the picture) using this code, but it doesn't work. Any suggestions for putting the values of all columns into one column while keeping duplicate occurrences?
AllZipCode=UNION(SUMMARIZE('Table','Table'[ZipCode1]),
SUMMARIZE('Table','Table'[ZipCode2]),
SUMMARIZE('Table','Table'[ZipCode3]))
Combining all the columns and returning the result in the existing table is not possible and will return an error. Instead, create a new table that unions the columns of the original table to get the expected output:
Table = UNION(SELECTCOLUMNS(Sheet1,"col1",Sheet1[Zip1]),
SELECTCOLUMNS(Sheet1,"col2",Sheet1[Zip2]),
SELECTCOLUMNS(Sheet1,"col3",Sheet1[Zip3]))
Original table :
Union table:

Power BI : How to count occurrence of value from source table?

I have my data source something like below.
I need to show output in the report as below.
I tried unpivoting the columns and got something like this. How do I count the occurrences of each Business value?
Plot the following measure against the Value column (from your unpivot table):
Business Occurance = COUNTROWS('your unpivot table')
As the next step after the unpivot, we have to remove the Attribute column. The table should then look like this.
Now create a new table with the following DAX function, treating the current table (your unpivot table) as Business Data:
Occurrence Table = DISTINCT('Business Data')
The resulting table should look like this.
You can make use of this table for your table visual in the report.
Note: You can add any number of rows and columns to your source table, and this logic will still produce the correct result.
I have marked two places: add the Value column at the first marked place, then click the second marked place to open a dropdown and choose Count from the menu.

Power query append multiple tables with single column regardless column names

I have the following query in M:
= Table.Combine({
Table.Distinct(Table.SelectColumns(Tab1,{"item"})),
Table.Distinct(Table.SelectColumns(Tab2,{"Column1"}))
})
Is it possible to get it working without prior changing column names?
I want to get something similar to SQL syntax:
select item from Tab1 union all
select Column1 from Tab2
If you need just one column from each table then you may use this code:
= Table.FromList(List.Distinct(Tab1[item])
& List.Distinct(Tab2[Column1]))
If you use M (as in your example, or via the append query option), the column names must be the same; otherwise it won't work.
But it works in DAX with the command
=UNION(Table1; Table2)
https://learn.microsoft.com/en-us/dax/union-function-dax
It's not possible in Power Query M. Table.Combine makes a union of the columns whose names match. If you want to keep everything in the same step, you can put the rename step in place of Tab2, as you did with Table.SelectColumns.
Matching on column names is what lets the union line the columns up correctly.
Hope you can manage in the same step if that's what you want.

How to update redshift column: simple text replacement

I have a large target table with columns (id, value). I want to update value='old' to value='new'.
The simplest way would be UPDATE target SET value='new' WHERE value='old';
However, this deletes and re-creates rows, which is possibly not recommended. So I tried a merge-style column update:
-- staging
CREATE TABLE stage (LIKE target INCLUDING DEFAULTS);
INSERT INTO stage (SELECT id, value FROM target WHERE value = 'old');
UPDATE stage SET value = 'new' WHERE value = 'old'; -- ??? how do you update value?
-- merge
BEGIN TRANSACTION;
UPDATE target
SET value = stage.value FROM stage
WHERE target.id = stage.id AND target.distkey = stage.distkey; -- collocated join?
END TRANSACTION;
DROP TABLE stage;
This can't be the best way of creating the table stage: I have to do all these UPDATE delete/writes when I update this way. Is there a way to do it in the INSERT?
Is it necessary to force the collocated join when I use CREATE TABLE LIKE?
Are you updating all the rows in the table?
If yes, you can use CTAS (CREATE TABLE AS), which is the recommended method.
Assuming your table looks like this:
table1
id, col1,col2, value
You can use the following SQL to create a new table
CREATE TABLE tmp_table AS
SELECT id, col1, col2, 'new_value' AS value
FROM table1;
After you verify data in tmp_table
DROP TABLE table1;
ALTER TABLE tmp_table RENAME TO table1;
If you are not updating all the rows, you can do the CTAS with a filter and then insert the rest of the rows into the new table. Let me know if you need more info in that case.
CREATE TABLE tmp_table AS
SELECT id, col1, col2, 'new_value' AS value
FROM table1
WHERE value = 'old';
-- copy only the rows that were not rewritten above
INSERT INTO tmp_table SELECT * FROM table1 WHERE value <> 'old' OR value IS NULL;
The next step would be to DROP table1 and rename tmp_table to table1.
Update: Based on your comment you can do the following, let me know if this solves your case.
This method basically creates a new table to replace your existing table.
I have used some of your code
CREATE TABLE stage (LIKE target INCLUDING DEFAULTS);
INSERT INTO stage SELECT id, 'new' FROM target WHERE value = 'old';
The INSERT above adds the rows to be updated, already carrying 'new', so there is no need to run an UPDATE after this.
Bring in the unchanged rows:
INSERT INTO stage SELECT id, value FROM target WHERE value != 'old';
At this point your original target table is still intact.
The stage table has both sets of rows: the updated rows with the 'new' value and the rows you did not want to change.
To replace your target with stage:
DROP TABLE target;
or, to keep it for further verification:
ALTER TABLE target RENAME TO target_old;
ALTER TABLE stage RENAME TO target;
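The two INSERT statements above can also be folded into a single pass with a CASE expression, which answers the "can I do it in the INSERT?" part of the question. Below is a minimal sketch that uses Python's built-in sqlite3 module as a local stand-in for Redshift. That substitution is an assumption made only so the example runs anywhere: SQLite has no CREATE TABLE ... (LIKE ...), so stage is declared explicitly, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented sample data standing in for the large target table.
    CREATE TABLE target (id INTEGER, value TEXT);
    INSERT INTO target VALUES (1, 'old'), (2, 'keep'), (3, 'old'), (4, NULL);

    -- Declared by hand here; on Redshift this would be CREATE TABLE stage (LIKE target ...).
    CREATE TABLE stage (id INTEGER, value TEXT);

    -- One pass: rewrite 'old' to 'new' and copy every other row (including NULLs) unchanged.
    INSERT INTO stage
    SELECT id, CASE WHEN value = 'old' THEN 'new' ELSE value END
    FROM target;

    -- Swap the tables, keeping the original for verification.
    ALTER TABLE target RENAME TO target_old;
    ALTER TABLE stage RENAME TO target;
""")

rows = conn.execute("SELECT id, value FROM target ORDER BY id").fetchall()
print(rows)  # [(1, 'new'), (2, 'keep'), (3, 'new'), (4, None)]
```

Because the CASE falls through to ELSE for NULL values, rows with a NULL value are carried over as well, which a plain value != 'old' filter would silently drop.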
From a redshift developer:
This case doesn't require an upsert, or update+insert, and it is fine to just run the update:
UPDATE target SET value='new' WHERE value='old';
Another way would be to INSERT the rows you need and DELETE the other rows, but that's unnecessarily complicated.

Column names containing dots in Spectrum

I created a customers table with columns named account_id.cust_id, account_id.ord_id, and so on.
My create external table query was as follows:
CREATE EXTERNAL TABLE spectrum.customers
(
"account_id.cust_id" numeric,
"account_id.ord_id" numeric
)
row format delimited
fields terminated by '^'
stored as textfile
location 's3://awsbucketname/test/';
SELECT "account_id.cust_id" FROM spectrum.customers limit 100
and I get the following error:
Invalid Operation: column account_id.cust_id does not exists in customers.
Is there any way or syntax to write column names like account_id.cust_id (text.text) while creating the table or while writing the select query?
Please help.
PS: Single quotes, back ticks don't work either.