How to update a Redshift column: simple text replacement - sql-update

I have a large target table with columns (id, value). I want to update value='old' to value='new'.
The simplest way would be to UPDATE target SET value='new' WHERE value='old';
However, in Redshift an UPDATE deletes the old rows and writes new ones, so it is possibly not recommended. So I tried to do a merge column update instead:
-- staging
CREATE TABLE stage (LIKE target INCLUDING DEFAULTS);
INSERT INTO stage (SELECT id, value FROM target WHERE value='old');
UPDATE stage SET value='new' WHERE value='old'; -- ??? how do you update value?
-- merge
begin transaction;
UPDATE target
SET value = stage.value FROM stage
WHERE target.id = stage.id and target.distkey = stage.distkey; -- collocated join?
end transaction;
DROP TABLE stage;
This can't be the best way of building the stage table: the UPDATE on stage incurs exactly the delete/write overhead I'm trying to avoid. Is there a way to set the new value directly in the INSERT?
Is it necessary to force the collocated join when I use CREATE TABLE LIKE?

Are you updating all the rows in the table?
If yes, you can use CTAS (CREATE TABLE AS), which is the recommended method.
Assuming your table looks like this:
table1 (id, col1, col2, value)
You can use the following SQL to create a new table:
CREATE TABLE tmp_table AS
SELECT id, col1, col2, 'new_value' AS value
FROM table1;
After you verify the data in tmp_table:
DROP TABLE table1;
ALTER TABLE tmp_table RENAME TO table1;
If you are not updating all the rows, you can use a filter in the CTAS and then insert the rest of the rows into the new table; let me know if you need more info if this is the case.
CREATE TABLE tmp_table AS
SELECT id, col1, col2, 'new_value' AS value
FROM table1
WHERE value = 'old';
INSERT INTO tmp_table SELECT id, col1, col2, value FROM table1 WHERE value != 'old';
Next step would be to DROP table1 and rename tmp_table to table1.
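That is, once you have verified tmp_table, the swap is the same as in the full-table case above:
DROP TABLE table1;
ALTER TABLE tmp_table RENAME TO table1;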
Update: Based on your comment, you can do the following; let me know if this solves your case.
This method basically creates a new table to replace your existing table.
I have reused some of your code:
CREATE TABLE stage (LIKE target INCLUDING DEFAULTS);
INSERT INTO stage SELECT id, 'new' FROM target WHERE value='old';
The INSERT above writes the to-be-updated rows with 'new' directly, so there is no need to run an UPDATE after this.
Next, bring over the unchanged rows:
INSERT INTO stage SELECT id, value FROM target WHERE value != 'old';
At this point your original target table is still intact, and stage has both sets of rows: the updated rows with the 'new' value and the rows you did not want to change.
To replace target with stage:
DROP TABLE target;
or, to keep it around for further verification:
ALTER TABLE target RENAME TO target_old;
Then, either way:
ALTER TABLE stage RENAME TO target;

From a Redshift developer:
This case doesn't require an upsert, or update+insert, and it is fine to just run the update:
UPDATE target SET value='new' WHERE value='old';
Another way would be to INSERT the rows you need and DELETE the other rows, but that's unnecessarily complicated.
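For completeness, a minimal sketch of that insert-then-delete alternative on the same target(id, value) table:
begin transaction;
INSERT INTO target SELECT id, 'new' FROM target WHERE value='old';
DELETE FROM target WHERE value='old';
end transaction;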

Related

How to RENAME struct/array nested columns using ALTER TABLE in BigQuery?

Suppose we have the following table in BigQuery:
CREATE TABLE sample_dataset.sample_table (
  id INT
  ,struct_geo STRUCT<
    country STRING
    ,state STRING
    ,city STRING
  >
  ,array_info ARRAY<
    STRUCT<
      key STRING
      ,value STRING
    >
  >
);
I want to rename the columns inside the STRUCT and the ARRAY using an ALTER TABLE command. It's possible to follow the Google documentation available here for normal ("non-nested") columns:
ALTER TABLE sample_dataset.sample_table
RENAME COLUMN id TO str_id
But when I try to run the same command for nested columns, I get errors from BigQuery.
Running the command for a column inside a STRUCT gives me the following message:
ALTER TABLE sample_dataset.sample_table
RENAME COLUMN `struct_geo.country` TO `struct_geo.str_country`
Error: ALTER TABLE RENAME COLUMN not found: struct_geo.country.
The exact same message appears when I run the same statement, but targeting a column inside an ARRAY:
ALTER TABLE sample_dataset.sample_table
RENAME COLUMN `array_info.key` TO `array_info.str_key`
Error: ALTER TABLE RENAME COLUMN not found: array_info.key
I got stuck since the BigQuery documentation about nested columns (available here) lacks examples of ALTER TABLE statements and refers directly to the default documentation for non-nested columns.
I understand that I can rename the columns by simply creating a new table with CREATE TABLE new_table AS SELECT ... and passing the new column names as aliases, but this would run a query over the whole table, which I'd rather avoid since my original table is well over 10 TB...
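The workaround I have in mind would be something like the sketch below (new names passed as aliases; the target table name is just an example, and it rewrites the whole table, which is exactly the cost I want to avoid):
CREATE TABLE sample_dataset.sample_table_renamed AS
SELECT
  id AS str_id,
  STRUCT(
    struct_geo.country AS str_country,
    struct_geo.state AS str_state,
    struct_geo.city AS str_city
  ) AS struct_geo,
  ARRAY(
    SELECT AS STRUCT info.key AS str_key, info.value AS str_value
    FROM UNNEST(array_info) AS info
  ) AS array_info
FROM sample_dataset.sample_table;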
Thanks in advance for any tips or solutions!

Athena: insert data into newly added column

Trying to insert data into a new column I added. Athena does not have an UPDATE table command. Is there any way to do this without reloading the whole table?
I created a test table and then added the column doing this:
ALTER TABLE MikeTest ADD COLUMNS (monthNum int);
I want to update the column with this SQL statement:
month(date_parse("date", '%m/%d/%Y'))
Amazon Athena reads its data from Amazon S3. It is not possible to 'update' a table because this would require re-writing the files in S3.
You could create a new table with the additional column:
CREATE TABLE new_table
WITH (
  external_location = 's3://my_athena_results/folder/',
  format = 'Parquet',
  write_compression = 'SNAPPY'
)
AS
SELECT
  *,
  month(date_parse("date", '%m/%d/%Y')) AS monthNum
FROM old_table;
This will copy the data to a new location in S3 while populating the new column.

Adding NOT NULL constraint to Cloud Spanner table

If a Cloud Spanner table is created with nullable columns, is it possible to add a NOT NULL constraint on a column without recreating the table?
You can add a NOT NULL constraint to a non-key column. You must first ensure that all rows actually do have values for the column. Spanner will scan the data to verify before fully applying the NOT NULL constraint. More information about how to alter tables is here and here.
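For a non-key column this is a single DDL statement; for example (hypothetical table and column names, with the column type restated as Spanner requires):
ALTER TABLE Albums ALTER COLUMN MarketingBudget INT64 NOT NULL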
However, you cannot add such a constraint to a key column. That kind of change would require rewriting all the data in the table, because the nullness of the key affects how the data is encoded. The only option for making that change is to create a new table that's set up the way you want, change the code to support using both tables temporarily, gradually move the data from the old table to the new table, and eventually change the code to use only the new table and drop the old one. If you then also wanted the original table name back, you'd have to do the whole thing again.
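A rough sketch of the create-and-backfill part of that process (all table and column names here are hypothetical, and the backfill would need to be batched to stay within Spanner's transaction limits):
CREATE TABLE orders_new (
  order_id INT64 NOT NULL,
  amount   INT64,
) PRIMARY KEY (order_id);
INSERT INTO orders_new (order_id, amount)
SELECT order_id, amount FROM orders WHERE order_id IS NOT NULL;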
Unfortunately, there is no way to add a NOT NULL column directly.
The way to do it:
1. Add a nullable column:
ALTER TABLE table1 ADD COLUMN column1 STRING(255)
2. Set a non-null value in the column for every row (if the table is not empty):
UPDATE table1 SET column1 = "<GENERATED DATA>" WHERE TRUE
3. Add the constraint:
ALTER TABLE table1 ALTER COLUMN column1 STRING(255) NOT NULL
Thanks.
Creating a non-nullable column in Spanner on an existing table is typically a three-step process:
-- add the new column to the table
ALTER TABLE <table_name> ADD COLUMN <column_name> <value_type>;
-- populate default values (Spanner DML requires a WHERE clause)
UPDATE <table_name> SET <column_name> = <default_value> WHERE TRUE;
-- add the NOT NULL constraint
ALTER TABLE <table_name> ALTER COLUMN <column_name> <value_type> NOT NULL;

Informatica: something like CDC without adding any column in the target table

I have a source table named A in Oracle.
Initially, table A is loaded (copied) into table B.
Then I run DML on table A: INSERT, DELETE, UPDATE.
How do we reflect those changes in table B, without creating any extra column in the target table?
A timestamp for the row is not available.
I have to compare the rows in source and target.
E.g.: if a row is deleted in the source, then it should be deleted in the target;
if a row is updated, then update it in the target; and if a row is missing from the target, then insert it there.
Please help!
Take both A and B as sources.
Do a full outer join using a Joiner (or, if both tables are in the same database, you can join in the Source Qualifier).
In an Expression transformation, create a flag based on the following scenarios:
A's key fields are null => flag='Delete' (the row no longer exists in the source),
B's key fields are null => flag='Insert' (the row is new in the source),
Both A's and B's key fields are present => compare the non-key fields of A and B; if any of them differ, set the flag to 'Update', else 'No Change'.
Now you can send the records to the target (B), applying the appropriate operation for each flag via an Update Strategy.
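If it helps to see that logic outside Informatica, here is a rough SQL equivalent of the full outer join and flag derivation (the key column id and non-key columns col1, col2 are placeholders; NULL-safe comparisons would be needed for nullable columns):
SELECT
  COALESCE(a.id, b.id) AS id,
  CASE
    WHEN a.id IS NULL THEN 'Delete'   -- row exists only in target B: delete it there
    WHEN b.id IS NULL THEN 'Insert'   -- row exists only in source A: insert it into B
    WHEN a.col1 <> b.col1 OR a.col2 <> b.col2 THEN 'Update'
    ELSE 'No Change'
  END AS flag
FROM a
FULL OUTER JOIN b ON a.id = b.id;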
If you do not need to apply the individual operations to the target table (and no extra column is allowed anyway), the fastest way would simply be:
1) Truncate B
2) Insert A into B
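In SQL terms (a sketch; this assumes A and B have identical structure and that a full reload is acceptable):
TRUNCATE TABLE B;
INSERT INTO B SELECT * FROM A;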

SyncFramework: How to sync all columns from a table?

I created a program to sync tables between 2 databases.
I use this common code:
DbSyncScopeDescription myScope = new DbSyncScopeDescription("myscope");
DbSyncTableDescription tblDesc = SqlSyncDescriptionBuilder.GetDescriptionForTable("Table", onPremiseConn);
myScope.Tables.Add(tblDesc);
My program creates the tracking table with only the primary key (the id column).
Deletes and inserts sync fine, but updates don't: I need all the columns to be updated, and they are not (for example, a telephone column).
I read that I need to add the columns I want to sync MANUALLY with this code:
Collection<string> includeColumns = new Collection<string>();
includeColumns.Add("telephone");
...
includeColumns.Add(Last column);
And changing the table description in this way:
DbSyncTableDescription tblDesc = SqlSyncDescriptionBuilder.GetDescriptionForTable("Table", includeColumns, onPremiseConn);
Is there a way to add all the columns of the table automatically?
Something like:
Collection<string> includeColumns = GetAllColums("Table");
Thanks,
SqlSyncDescriptionBuilder.GetDescriptionForTable("Table", onPremiseConn) will already include all the columns of the table.
The tracking table only stores the PK, any filter columns, and some Sync Fx-specific columns.
The tracking is at row level, not column level.
During sync, the tracking table and its base table are joined to get the rows to be synced.