Cassandra migrates value from text to list of text - list

I try to update my table in Cassandra database. Right now I have table which looks like
foo(id, email) - email text type
I want to update the table to something like
foo(id, emails) - emails list of text type
I added new column
ALTER TABLE foo ADD emails set <text>;
but I don't know how to migrate value from email column to emails.

I am not very familiar with your domain, however you have to tell Cassandra how to create the new list of emails. Take a look at COPY if you don't have too many records. You can
Add a new column
Use copy to export your existing data using COPY TO
Apply your transformations on the exported data
Use COPY FROM to upload your new data.
If you have to do this update in real time while your table is growing, this solution won't work.

Cassandra don't have a native solution to do such operation.
Maybe you can create a simple migration program with your favorite programming language and follow these steps :
Create the new column in cassandra (with ALTER command).
Execute the migration program.
Delete the old column in cassandra.
The program it self will take all record from cassandra one by one an apply the transformation.

Related

How to fetch the latest schema change in BigQuery and restore deleted column within 7 days

Right now I fetch columns and data type of BQ tables via the below command:
SELECT COLUMN_NAME, DATA_TYPE
FROM `Dataset`.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
WHERE table_name="User"
But if I drop a column using command : Alter TABLE User drop column blabla:
the column blabla is not actually deleted within 7 days(TTL) based on official documentation.
If I use the above command, the column is still there in the schema as well as the table Dataset.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
It is just that I cannot insert data into such column and view such column in the GCP console. This inconsistency really causes an issue.
If I want to write bash script to monitor schema changes and do some operation based on it.
I need more visibility on the table schema of BigQuery. The least thing I need is:
Dataset.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS can store a flag column that indicates deleted or TTL:7days
My questions are:
How can I fetch the correct schema in spanner which reflects the recently deleted the column?
If the column is not actually deleted, is there any way to easily restore it?
If you want to fetch the recently deleted column you can try searching through Cloud Logging. I'm not sure what tools Spanner supports but if you want to use Bash you can use gcloud to fetch logs. Though it will be difficult to parse the output and get the information you want.
Command used below fetched the logs for google.cloud.bigquery.v2.JobService.InsertJob since an ALTER TABLE is considered as an InsertJob and filter it based from the actual query where it says drop. The regex I used is not strict (for the sake of example), I suggest updating the regex to be stricter.
gcloud logging read 'protoPayload.methodName="google.cloud.bigquery.v2.JobService.InsertJob" AND protoPayload.metadata.jobChange.job.jobConfig.queryConfig.query=~"Alter table .*drop.*"'
Sample snippet from the command above (Column PADDING is deleted based from the query):
If you have options other than Bash, I suggest that you create a BQ sink for your logging and you can perform queries there and get these information. You can also use client libraries like Python, NodeJS, etc to either query in the sink or directly query in the GCP Logging.
As per this SO answer, you can use the time travel feature of BQ to query the deleted column. The answer also explains behavior of BQ to retain the deleted column within 7 days and a workaround to delete the column instantly. See the actual query used to retrieve the deleted column and the workaround on deleting a column on the previously provided link.

Extract script with which data was inserted into an old table and the owner in BigQuery - GCP

I am trying to find out what was the query run to be able to insert data into an empty table created earlier.
It would also be useful for me to know the user who was the creator of that table, if not, just to ask him about that script.
I tried with "INFORMATION_SCHEMA.TABLES;" but I only get the create script of the table.
I would look into BigQuery audit logs
most probably you can get information you want :
here is reference: https://cloud.google.com/bigquery/docs/reference/auditlogs/

Update AWS Athena data & table to rename columns

Today, I saw myself with a simple problem, renaming column of an Athena glue table from old to new name.
First thing, I search here and tried some solutions like this, this, and many others. Unfortunately, none works, so I decided to use my knowledge and imagination.
I'm posting this question with the intention of share, but also, with the intention to get how others did and maybe find out I reinvented the wheel. So please also share your way if you know how to do it.
My setup is, a Athena JSON table partitioned by day with valuable and enormous amount of data, the infrastructure is defined and updated through Cloudformation.
How to rename an Athena column and still keep the data?
Explaining without all the cloudformation infrastructure.
Imagine a table containing:
userId
score
otherColumns
eventDateUtc
dt_utc
Partitioned by dt_utc and stored using JSON format. Wee need to change the column score to deltaScore.
Keep in mind, although I haven't tested with others format/configurations, this should apply to any configuration supported by athena as we are going to use athena algorithm to do the job for us.
How to do
if you run the cloudformation migration first, you gonna "lose" access to the dropped column.
but you can simply rename the column back and the data appears.
Those are the steps required for rename a AWS Athena table:
Create a temporary table mapping the old column name to the new one:
This can be done by use of CREATE TABLE AS, read more in the aws docs
With this command, we use Athena engine to apply the transformation on the files of the original table for us and save at s3://bucket_name/A_folder/temp_table_rename/.
CREATE TABLE "temp_table_rename"
WITH(
format = 'JSON',
external_location = 's3://bucket_name/A_folder/temp_table_rename/',
partitioned_by = ARRAY['dt_utc']
)
AS
SELECT DISTINCT
userid,
score as deltascore,
otherColumns,
eventDateUtc,
"dt_utc"
FROM "my_database"."original_table"
Apply the database rename by running the cloudformation with the changes or on the way you have.
At this point, you can even drop the original_table, and create again using the right column name.
After rename, you will notice that the renamed column have no data.
Remove the data of the original table by deleting it's s3 source.
Copy the data from the temp table source to the original table source
I prefer to use a aws command as, there can be thousands of files to copy
aws s3 cp s3://bucket_name/A_folder/temp_table_rename/ s3://bucket_name/A_folder/original_table/ --recursive
Restore the index of the original table
MSCK REPAIR TABLE "my_database"."original_table"
done.
Final notes:
Using CREATE TABLE AS to do the transformation job, allow you to do much more than only renaming the column, for example split the data of a column into 2 new columns, or merge it to a single one.

How to update Athena tables in a zero-down-time manner?

I am deploying Athena external tables, and want to update their definition without downtime, is there a way?
The ways I thought about are:
Create a new table and rename the old and then rename the new to the old name, first, it involves a very small downtime, and renaming tables doesn't seem to be supported (neither altering the definition).
The other way is to drop the table and recreate it, which obviously involves downtime.
If you use the Glue UpdateTable API call you can change a table definition without first dropping the table. Athena uses the Glue Catalog APIs behind the scenes when you do things like CREATE TABLE … and DROP TABLE ….
Please note that if you make changes to partitioned tables you also need to update all partitions to match.
Another way that does not involve using Glue directly would be to create a view for each table and only use these views in queries. When you need to replace a table you create a new table with the new schema, then recreate the view (with CREATE OR REPLACE) to use the new table, then drop the old table. I haven't checked, but it would make sense if replacing a view used the UpdateTable API behind the scenes.

How to transfer data from one database to another in Django?

I am recreating a web app in Django that was running in a server but it was terminated, fortunately, I did a backup of all the code. My problem comes with the database because but I do not know how to transfer all the data from the old db.sqlite3 Django database web app into the new one.
I found a similar question as mine Django: transfer data from one database to another but the user wanted to transfer data from specific columns because their models.pyfrom the old and new databases were slightly different. In my case, my models.py from the old and new databases are the same.
Alternatives
I am using the DB Browser for SQLite to explore the content of the old database and I could add manually each row into the Django administration but this will take me too much time.
I could copy the old db.sqlite and replace it in the new web app because the models.py file remains the same but this solution is not appropriate IMO, this solution is rude and I think it goes against the good practices of Software.
How should I proceed for transferring data from the old database to the new one?
This seems like a one time copy of one db to another. I don't see how this goes against good software practice unless you have to be copying this db frequently. I've done it before when migrating servers and it doesn't cause any issues assuming the two instances of the application are the same build.
I was able to do some minor tricks in order to solve my problem because there is not a straightforward functionality that allows you to transfer all your data from two sqlite databases in Django. Here is the trick I did:
Download the sqlite db browser to explore and export the contents of your old database in a .csv file. Open you database with sqlite db browser and hit on the tables option and you will see all your tables, then do a right click on any of those and hit the export as a csv file option to generate the csv file (name_of_your_csv_file.csv). The other alternative is to use the sqlite3.exe to open your database in cmd or powershell and then doing the export with:
.tables #this lets you explore your available tables
headers on
mode csv
output name_of_your_csv_file.csv
2.There are two choices up to this point: You can either insert all the records at once to your new database or you can drop your existing tables from the new database and then recreate them and import the .csv file. I went for the drop option because there were more than 100 records to migrate.
# sqlite3
# check the structure of your table so you can re-create it
.schema <table_name>
#the result might be something like CREATE TABLE IF NOT EXISTS "web_app_navigator_table" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "ticket" varchar(120) NOT NULL);
#drop the table
drop table web_app_navigator_table
#re-create the table
create table web_app_navigator_table(id integer not null primary key autoincrement, ticket varchar(120) not null);
#do the import of the csv file
.import C:/Users/a/PycharmProjects/apps/navigator/name_of_your_csv_file.csv table_name_goes_here
You might see an error such as csv:1: INSERT failed datatype mismatch but this indicates that the first row of your csv file was not inserted because it contains the headers of the exported data from your old database.