Alter column data type in Amazon Redshift - amazon-web-services

How to alter column data type in Amazon Redshift database?
I am not able to alter the column data type in Redshift; is there any way to modify the data type in Amazon Redshift?

As noted in the ALTER TABLE documentation, you can change the length of VARCHAR columns using
ALTER TABLE table_name
{
ALTER COLUMN column_name TYPE new_data_type
}
For other column types, the only approach I can think of is to add a new column with the correct data type, copy all data from the old column into the new one, and finally drop the old column.
Use code similar to this:
ALTER TABLE t1 ADD COLUMN new_column ___correct_column_type___;
UPDATE t1 SET new_column = column;
ALTER TABLE t1 DROP COLUMN column;
ALTER TABLE t1 RENAME COLUMN new_column TO column;
This changes the schema: the newly added column ends up last in the table. That may be a problem for COPY statements, so keep it in mind (you can specify an explicit column order with COPY).
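If this sequence has to be repeated for several columns, the four statements can be generated rather than typed by hand. The following is only a sketch: the helper name retype_column_statements, the "_new" suffix, and the example table and column names are hypothetical, and the CAST in the UPDATE assumes the old values convert cleanly to the new type.

```python
def retype_column_statements(table, column, new_type, temp_suffix="_new"):
    """Build the add / copy / drop / rename statements for one column.

    Identifiers are interpolated as-is, so pass only trusted names.
    """
    temp_column = column + temp_suffix  # hypothetical temporary column name
    return [
        f"ALTER TABLE {table} ADD COLUMN {temp_column} {new_type};",
        f"UPDATE {table} SET {temp_column} = CAST({column} AS {new_type});",
        f"ALTER TABLE {table} DROP COLUMN {column};",
        f"ALTER TABLE {table} RENAME COLUMN {temp_column} TO {column};",
    ]


# Example with hypothetical names: retype column c1 of table t1 to integer.
statements = retype_column_statements("t1", "c1", "integer")
for statement in statements:
    print(statement)
```

The statements are printed in the order they should be run; the column still ends up last in the table, as noted above.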

To avoid the schema change mentioned by Tomasz:
BEGIN TRANSACTION;
ALTER TABLE <TABLE_NAME> RENAME TO <TABLE_NAME>_OLD;
CREATE TABLE <TABLE_NAME> ( <NEW_COLUMN_DEFINITION> );
INSERT INTO <TABLE_NAME> (<COLUMNS>)
SELECT <COLUMNS>
FROM <TABLE_NAME>_OLD;
DROP TABLE <TABLE_NAME>_OLD;
END TRANSACTION;
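The rename-and-recreate sequence above can be tried end to end in a small sketch. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for Redshift; that substitution is an assumption made only so the example runs anywhere, and SQLite's type handling differs from Redshift's. The table t1 and its columns id, version and name are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for Redshift purely for illustration.
# isolation_level=None means autocommit, so BEGIN/COMMIT are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()

# Original table: "version" is stored as text and should become an integer.
cur.execute("CREATE TABLE t1 (id INTEGER, version TEXT, name TEXT)")
cur.executemany("INSERT INTO t1 VALUES (?, ?, ?)", [(1, "10", "a"), (2, "20", "b")])

cur.execute("BEGIN")
cur.execute("ALTER TABLE t1 RENAME TO t1_old")
# Recreate the table with the new type, keeping the original column order.
cur.execute("CREATE TABLE t1 (id INTEGER, version INTEGER, name TEXT)")
cur.execute(
    "INSERT INTO t1 (id, version, name) "
    "SELECT id, CAST(version AS INTEGER), name FROM t1_old"
)
cur.execute("DROP TABLE t1_old")
cur.execute("COMMIT")

# Column names in table order, then the migrated rows.
columns = [row[1] for row in cur.execute("PRAGMA table_info(t1)")]
rows = cur.execute("SELECT id, version, name FROM t1 ORDER BY id").fetchall()
print(columns)  # ['id', 'version', 'name']
print(rows)     # [(1, 10, 'a'), (2, 20, 'b')]
```

The column order is unchanged because the new table is created with the columns listed in their original positions.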

(Recent update) It's now possible to alter the size of VARCHAR columns in Redshift:
ALTER COLUMN column_name TYPE new_data_type
Example:
CREATE TABLE t1 (c1 varchar(100));
ALTER TABLE t1 ALTER COLUMN c1 TYPE varchar(200);
Here is the documentation link

If you don't want to change the column order, one option is to copy the data into a temp table, drop the original table, recreate it with the desired column size, and then bulk-load the data back in.
CREATE TEMP TABLE temp_table AS SELECT * FROM original_table;
DROP TABLE original_table;
CREATE TABLE original_table ...
INSERT INTO original_table SELECT * FROM temp_table;
The only drawbacks of recreating the table are that you will need to grant permissions again and, if the table is very large, it will take some time.

ALTER TABLE publisher_catalogs ADD COLUMN new_version integer;
update publisher_catalogs set new_version = CAST(version AS integer);
ALTER TABLE publisher_catalogs DROP COLUMN version RESTRICT;
ALTER TABLE publisher_catalogs RENAME COLUMN new_version TO version;

Redshift, being a columnar database, generally doesn't allow you to modify the data type directly.
However, below is one approach; note that it will change the column order.
Steps:
1. Alter the table to add the new column.
2. Update the new column with the values of the old column.
3. Alter the table to drop the old column.
4. Alter the table to rename the new column to the old column's name.
If you don't want to alter the order of the columns, then the solution would be to:
1. Create a temp table with the new column.
2. Copy the data from the old table to the new table.
3. Drop the old table.
4. Rename the new table to the old table's name.
One important thing: create the new table using the LIKE command instead of a simple CREATE.

This method works for converting a (big)int column into a varchar:
-- Create a backup of the original table
create table original_table_backup as select * from original_table;
-- Drop the original table, and then recreate with new desired data types
drop table original_table;
create table original_table (
col1 bigint,
col2 varchar(20) -- changed from bigint
);
-- insert original entries back into the new table
insert into original_table select * from original_table_backup;
-- cleanup
drop table original_table_backup;

You can use the statements below:
ALTER TABLE etl_proj_atm.dim_card_type   -- <table name>
ALTER COLUMN card_type TYPE varchar(30);  -- <column name>

UNLOAD and COPY with a table-rename strategy should be the most efficient way to do this operation if retaining the table structure (row order) is important.
Here is an example adding to this answer.
BEGIN TRANSACTION;
ALTER TABLE <TABLE_NAME> RENAME TO <TABLE_NAME>_OLD;
CREATE TABLE <TABLE_NAME> ( <NEW_COLUMN_DEFINITION> );
-- UNLOAD and COPY also need an authorization clause (for example IAM_ROLE), omitted here
UNLOAD ('select * from <TABLE_NAME>_OLD') TO 's3://bucket/key/unload_' manifest;
COPY <TABLE_NAME> FROM 's3://bucket/key/unload_manifest' manifest;
END TRANSACTION;

For updating values in the same column in Redshift, this works fine:
UPDATE table_name
SET column_name = 'new_value' WHERE column_name = 'old_value';
You can combine multiple conditions in the WHERE clause using AND to make the match unambiguous.
cheers!!

Related

Modify datatype of date column into integer datatype

I have a table with millions of records, and now the requirement is to change the data type of the date column to integer.
Can we do this? If yes, then how, without losing any data?
Thanks
Modify, no.
Realistically, you'd want to add a new integer column to the table, move the data to the new column, drop the old column, and rename the new column. Something like
alter table foo
add( new_column integer );
update foo
set new_column = to_number( to_char( old_column, 'YYYYMMDDHH24MISS' ) );
alter table foo
drop column old_column;
alter table foo
rename column new_column to old_column;
It may be more efficient to set the old column to unused rather than dropping it depending on how big your table is and what sort of window you have.
And this doesn't address the underlying wisdom of the requirement. If you have data that represents a date, it very seldom makes sense to store that data as an integer or a varchar2 or anything other than a date.
You can convert the date to a formatted string and then to a number:
SELECT TO_NUMBER(TO_CHAR(your_date, 'YYYYMMDDHH24MISS')) AS date_number
FROM your_table
Or you could convert the values to be relative to an epoch:
SELECT ROUND((your_date - DATE '1970-01-01')*86400) AS seconds_since_1970
FROM your_table
If you want to add a virtual column to the database to do the conversion then:
ALTER TABLE your_table
ADD date_number INTEGER
GENERATED ALWAYS AS (TO_NUMBER(TO_CHAR(your_date, 'YYYYMMDDHH24MISS')));
or
ALTER TABLE your_table
ADD seconds_since_1970 INTEGER
GENERATED ALWAYS AS (ROUND((your_date - DATE '1970-01-01') * 86400));
Then the table would contain:
YOUR_DATE           | DATE_NUMBER    | SECONDS_SINCE_1970
2022-04-26T14:36:48 | 20220426143648 | 1650983808
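As a cross-check of the sample values above, here is a minimal Python sketch of the same two conversions. The function names are hypothetical, and the sketch assumes a naive datetime with no time zone adjustment, matching the plain date arithmetic in the SQL.

```python
from datetime import datetime


def to_date_number(value):
    """Format as YYYYMMDDHH24MISS and read the digits back as an integer."""
    return int(value.strftime("%Y%m%d%H%M%S"))


def to_epoch_seconds(value):
    """Whole seconds elapsed since 1970-01-01 00:00:00 (naive, no time zone)."""
    epoch = datetime(1970, 1, 1)
    return round((value - epoch).total_seconds())


sample = datetime(2022, 4, 26, 14, 36, 48)
date_number = to_date_number(sample)
seconds_since_1970 = to_epoch_seconds(sample)
print(date_number)         # 20220426143648
print(seconds_since_1970)  # 1650983808
```

Note that the 14-digit formatted number does not fit in a 32-bit integer, which is worth checking against the precision of the target column.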

Redshift add new column based on values from existing column

I have a Redshift table that I want to alter by adding a new column whose values are derived from an existing column in the table.
Basically, I only want to add a column "year" which extracts the year from the column "snapshot_date".
Any ideas how to achieve that? I tried the following code, but it errors out.
ALTER TABLE test_schema.table_name ADD year AS ( extract(year from snapshot_date) );

Updating Records Via RowVersion , using 'SQL WHERE' to filter for a MAX Value

I am trying to update a table based on a RowVersion value in an existing table. My data lake updates once a week, with new data stored as a .json file that holds any new RowVersions.
I need to:
1) Query the existing table in my data warehouse to find the most up-to-date RowVersion (i.e. the max)
2) Use that value to filter/select only the records in my data warehouse that are greater than the RowVersion I just identified
3) Update my table to include the new rows
My question is about the SQL below: I am not sure how to select the max RowVersion in the current table and then use that to filter what is returned when querying my S3 bucket:
create or replace temporary table UPDATE_CAR_SALES AS
SELECT
VALUE:CAR::string AS CARS,
VALUE:RowVersion::INT AS ROW_VERSION
having row_version > max(row_version)
from '#s3_bucket',
lateral flatten( input => $1:value);
It's not clear to me how you store the data. Is the CARS column unique? Do you need to find the maximum row version for each car or across all cars/rows? Either way, you can use a sub-query to keep only the rows whose row version is higher than the max value:
create or replace temporary table UPDATE_CAR_SALES AS
SELECT
VALUE:CAR::string AS CARS,
VALUE:RowVersion::INT AS ROW_VERSION
FROM @s3_bucket, lateral flatten( input => $1 )
where ROW_VERSION > (SELECT MAX(RowVersion)
from MAIN_TABLE);
If you need to filter the rows based on the row version of each car (in the existing table):
create or replace temporary table UPDATE_CAR_SALES AS
SELECT * FROM (SELECT
VALUE:CAR::string AS CARS,
VALUE:RowVersion::INT AS ROW_VERSION
FROM @s3_bucket, lateral flatten( input => $1 )) temp_table
where temp_table.ROW_VERSION > (SELECT MAX(RowVersion)
from MAIN_TABLE where cars = temp_table.CARS );
I needed to put the main query in brackets to be able to use the alias. Hope it helps.
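To make the difference between the two filters concrete, here is a small in-memory sketch in plain Python. The sample rows are made up for illustration; "existing" plays the role of MAIN_TABLE and "incoming" plays the role of the flattened rows from the stage.

```python
# Hypothetical sample data: (car, row_version) pairs.
existing = [("audi", 5), ("bmw", 9), ("audi", 7)]
incoming = [("audi", 8), ("bmw", 9), ("bmw", 12), ("kia", 1), ("audi", 10)]

# Variant 1: compare against the single maximum row version of the whole table.
max_version = max(version for _, version in existing)
newer_overall = [(car, v) for car, v in incoming if v > max_version]

# Variant 2: compare against the maximum row version of the same car.
max_by_car = {}
for car, version in existing:
    max_by_car[car] = max(version, max_by_car.get(car, version))

# A car with no existing rows is dropped here, mirroring the correlated
# sub-query: MAX() over zero rows is NULL, so the comparison is never true.
newer_per_car = [
    (car, v) for car, v in incoming if car in max_by_car and v > max_by_car[car]
]

print(newer_overall)  # [('bmw', 12), ('audi', 10)]
print(newer_per_car)  # [('audi', 8), ('bmw', 12), ('audi', 10)]
```

If brand-new cars should also be picked up in the per-car variant, the sub-query result would need a fallback value (for example via COALESCE), which is a design decision the original question leaves open.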

sql alchemy Update resultset of raw query

I am new to SQLAlchemy. I have a raw SQL query which I need to execute by passing bind parameters. For the rows returned by the query, I need to update a particular column value. How do I do this efficiently?
Below are the columns in my table metrics
TABLE
id,total,pass,fail,category,ref_id
query = "Select * from table where id in(select max(id) from table ...)"
sql = text(query)
result = db.engine.execute(sql, CATEGORY=category)
for row in result:
    pass  # update here
So I have this complex query that I need to execute as an inline query. Let's say I get three rows from my query and I need to update ref_id for all three rows with a value. How can I achieve this, preferably as a bulk update?
I am using Python 2.7, SQLAlchemy==0.9.9, SQLAlchemy-Utils==0.29.8.

Is it possible to remove a column from a partitioned table in Google BigQuery?

I'm trying to remove a column from a partitioned table in BigQuery using this command
bq query --destination_table [DATASET].[TABLE_NAME] --replace --use_legacy_sql=false 'SELECT * EXCEPT(column) FROM [DATASET].[TABLE_NAME]'
As a result, the unwanted column is removed and the schema is changed, but the data is no longer partitioned.
Any suggestions on how to keep the data partitioned after the column is removed? The docs are clear only for non-partitioned tables.
There are two workarounds you can use:
Use a column-partitioned table, meaning one partitioned on the value of a regular column in the table. You can create a new column-partitioned table and copy the data into it, omitting the column:
bq mk --time_partitioning_field=pt --schema=... [DATASET].[TABLE_NAME2]
bq query --use_legacy_sql=false --destination_table=[DATASET].[TABLE_NAME2] "SELECT _PARTITIONTIME as pt, * EXCEPT(column) from [DATASET].[TABLE_NAME]"
You can also still use day-partitioned tables, but copy the data using DML. You can set or copy _PARTITIONTIME column inside the DML INSERT statement, which is not possible with regular SELECT. Here is an example:
INSERT INTO
dataset1.table1 (_partitiontime,
a,
b)
SELECT
TIMESTAMP(DATE "2008-12-25") AS _partitiontime,
"a" AS a,
"b" AS b
This requires DML over partitioned tables, which is currently in alpha: https://issuetracker.google.com/issues/36383555
BigQuery now supports DROP COLUMN in partitioned tables:
ALTER TABLE mydataset.mytable
DROP COLUMN column
It's in beta at the time of writing, but it worked for me.