Athena insert data into new added column - amazon-web-services

Trying to insert data into a new column I added. Athena does not have an update table command. Is there anyway to do this without reloading the whole table?
I created a test table and then added the column doing this:
ALTER TABLE MikeTest ADD COLUMNS (monthNum int);
I want to update the column with this SQL statement:
month(date_parse("date", '%m/%d/%Y'))

Amazon Athena reads its data from Amazon S3. It is not possible to 'update' a table because this would require re-writing the files in S3.
You could create a new table with the additional column:
CREATE TABLE new_table
WITH (
external_location = 's3://my_athena_results/folder/',
format = 'Parquet',
write_compression = 'SNAPPY'
)
AS
SELECT
*,
month(date_parse("date", '%m/%d/%Y')) as month
from old_table
This will copy the data to a new location in S3, while populating the new column

Related

How to fetch all the values of one column using column name in dynamoDB table in java?

I have a dynamoDB table in my aws account. I can create client like this.
AmazonDynamoDB amazonDynamoDB =
AmazonDynamoDBClient.builder().withRegion("eu-west-1").withCredentials(creds).build();
DynamoDB dynamoDB = new DynamoDB(amazonDynamoDB);
Table table = dynamoDB.getTable("table name");
Suppose there is column name "content". I want to get a list or set of all values in "content" column.

why crawler create duplicate column name with #?

When i try to crawl the data from datalake in s3 bucket.Some duplicate column are created with # at prefix of the column ex: price#22 in my data catalog.

saving athena results to another table with partitions

From AWS Athena
I am trying to concatenate multiple tables then save it with partitoned key.
after running
select *
from t1
union all
select *
from t2
select *
from t3
create table on the console creates query like this,
create table db.table_name
with(
format='parquet',
external_location=...
) AS
select *
from t1
union all
select *
from t2
select *
from t3;
But I want to add partitoned by column. I've tried
adding partitioned by on top and bottom. Also saved query result then created new table from that using CREATE EXTERNAL TABLE command (this works but return empty row -> even after running MSCK REPAIR)
From https://aws.amazon.com/premiumsupport/knowledge-center/athena-create-use-partitioned-tables/ It seems like I need to save data into S3 by partitions so in bucket1 it will have bucket1/2021, bucket1/2022 if 'year' column is the partition column. Correct? If yes, is there efficient to create partitioned buckets?
I have successfully created new, partitioned tables by using this method:
CREATE TABLE my_table
WITH (
format = 'PARQUET',
parquet_compression = 'SNAPPY',
external_location = 's3://bucket/folder/',
partitioned_by = ARRAY['year']
)
AS
SELECT
...

How to update redshift column: simple text replacement

I have a large target table with columns (id, value). I want to update value='old' to value='new'.
The simplest way would be to UPDATE target SET value='new' WHERE value='old';
However, this deletes and creates new rows and is not recommended, possibly. So I tried to do a merge column update:
# staging
CREATE TABLE stage (LIKE target INCLUDING DEFAULTS);
INSERT INTO stage (SELECT id, value FROM target WHERE value=`old`);
UPDATE stage SET value='new' WHERE value='old'; # ??? how do you update value?
# merge
begin transaction;
UPDATE target
SET value = stage.value FROM stage
WHERE target.id = stage.id and target.distkey = stage.distkey; # collocated join?
end transaction;
DROP TABLE stage;
This can't be the best way of creating the table stage: I have to do all these UPDATE delete/writes when I update this way. Is there a way to do it in the INSERT?
Is it necessary to force the collocated join when I use CREATE TABLE LIKE?
Are you updating all the rows in the table?
If yes you can use CTAS (create table as) which is recommended method
Assuming you table looks like this
table1
id, col1,col2, value
You can use the following SQL to create a new table
CREATE TABLE tmp_table AS
SELECT id, col1,col2, 'new_value'
FROM table1;
After you verify data in tmp_table
DROP TABLE table1;
ALTER TABLE tmp_table RENAME TO table1;
If you are not updating all the rows you can use a filter to do a CTAS and insert the rest of the rows to the new table, let me know if you need more info if this is the case
CREATE TABLE tmp_table AS
SELECT id, col1,col2, 'new_value'
FROM table1
WHERE value = 'old'
INSERT INTO tmp_table SELECT * from table1;
Next step would be DROP the tmp table and rename table1
Update: Based on your comment you can do the following, let me know if this solves your case.
This method basically creates a new table to replace your existing table.
I have used some of your code
CREATE TABLE stage (LIKE target INCLUDING DEFAULTS);
INSERT INTO stage SELECT id, 'new' FROM target WHERE value=`old`;
Above INSERT inserts rows to be updated with 'new', no need to run an UPDATE after this.
Bring unchanged rows
INSERT INTO stage SELECT id, value FROM target WHERE value!=`old`;
After this point you have target table which is your original table intact
stage table will have both sets of rows, updated rows with 'new' value and rows you did not want to change
To replace your target with stage
DROP TABLE target;
or to keep it further verification
ALTER TABLE target RENAME TO target_old;
ALTER TABLE stage RENAME TO target;
From a redshift developer:
This case doesn't require an upsert, or update+insert, and it is fine to just run the update:
UPDATE target SET value='new' WHERE value='old';
Another way would be to INSERT the rows you need and DELETE the other rows, but that's unnecessarily complicated.

SyncFramework: How to sync all columns from a table?

I create a program to sync tables between 2 databases.
I use this common code:
DbSyncScopeDescription myScope = new DbSyncScopeDescription("myscope");
DbSyncTableDescription tblDesc = SqlSyncDescriptionBuilder.GetDescriptionForTable("Table", onPremiseConn);
myScope.Tables.Add(tblDesc);
My program creates the tracking table only with Primary Key (id column).
The sync is ok to delete and insert rows.
But updating don't. I need update all the columns and they are not updated (For example: a telephone column).
I read that I need to add the columns I want to sync MANUALLY with this code:
Collection<string> includeColumns = new Collection<string>();
includeColumns.Add("telephone");
...
includeColumns.Add(Last column);
And changing the table descripcion in this way:
DbSyncTableDescription tblDesc = SqlSyncDescriptionBuilder.GetDescriptionForTable("Table", includeColumns, onPremiseConn);
Is there a way to add all the columns of the table automatically?
Something like:
Collection<string> includeColumns = GetAllColums("Table");
Thanks,
SqlSyncDescriptionBuilder.GetDescriptionForTable("Table", onPremiseConn) will include all the columns of the table already.
the tracking tables only stores the PK and filter columns and some Sync Fx specific columns.
the tracking is at row level, not column level.
during sync, the tracking table and its base table are joined to get the row to be synched.