AWS DynamoDB, table is getting deleted automatically - amazon-web-services

Created a table in AWS DynamoDB using the AWS console in the us-west-2 region. The table is getting deleted automatically with no trace at all. To debug, I enabled point-in-time recovery backups. I can see that there are backups of the table, which got deleted automatically by the system, with $deletedTableBackup as a suffix.
Each time I create the table, I can pump data into it using an access_key and secret.
Any help on what's going on and what exactly is causing the issue? I am using a corporate account and I have access to create/delete/modify tables.

Related

Creating Databricks Database Snapshot

I have a database created in my Databricks environment which is mounted to an AWS S3 location. I am looking for a way to take a snapshot of the database so that I can store it in a different place and restore it in case of any failure.
Databricks is not like a traditional database where all data is stored "inside" the database. For example, Amazon RDS provides a "snapshot" feature that can dump the entire contents of a database, and the snapshot can then be restored to a new database server if required.
The equivalent in Databricks would be Delta Lake time travel, which allows you to access the database as it was at a previous point-in-time. Data is not "restored" -- rather, it is simply presented as it previously was at a given timestamp. It is a snapshot without the need to actually create a snapshot.
From Configure data retention for time travel:
To time travel to a previous version, you must retain both the log and the data files for that version.
The data files backing a Delta table are never deleted automatically; data files are deleted only when you run VACUUM. VACUUM does not delete Delta log files; log files are automatically cleaned up after checkpoints are written.
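For illustration, a time-travel query run from a notebook might look something like this (the database, table, and timestamp are placeholders):
# Sketch: query a Delta table as it was at an earlier point in time.
# my_database, my_table, and the timestamp are placeholders.
df = spark.sql(
    "SELECT * FROM my_database.my_table TIMESTAMP AS OF '2024-01-01 00:00:00'"
)
df.show()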
If, instead, you do want to keep a "snapshot" of the database, a good method would be to create a deep clone of a table, which includes all data. See:
CREATE TABLE CLONE | Databricks on AWS
Using Deep Clone for Disaster Recovery with Delta Lake on Databricks
I think you would need to write your own script to loop through each table and perform this operation. It is not as simple as clicking the "Create Snapshot" button in Amazon RDS.
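For instance, a rough sketch of such a loop in a notebook (assuming spark is available, and using placeholder database names source_db and backup_db) might be:
# Sketch: deep clone every table of one database into a "backup" database.
# source_db and backup_db are placeholder names.
# (Views and temporary tables would need to be filtered out in a real script.)
source_db = "source_db"
backup_db = "backup_db"

spark.sql(f"CREATE DATABASE IF NOT EXISTS {backup_db}")

for table in spark.catalog.listTables(source_db):
    # DEEP CLONE copies the data files as well as the metadata, so the clone
    # is an independent copy that survives changes to (or loss of) the source.
    spark.sql(
        f"CREATE OR REPLACE TABLE {backup_db}.{table.name} "
        f"DEEP CLONE {source_db}.{table.name}"
    )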

Back up RDS snapshots to S3 automatically

I have an RDS instance whose automated backup period is 7 days.
I have found that I can export the RDS snapshot to S3 manually.
However, I want to back up RDS snapshots to S3 automatically.
How can I do this? Should I use EventBridge?
The first stop for an answer about an AWS service is normally the AWS documentation.
Since finding the right section in the sea of information can sometimes be a bit overwhelming, please find below references that should answer your question.
There are three ways you could export an RDS snapshot to S3:
using the AWS Management Console
using the AWS CLI
using the RDS APIs
The Exporting DB snapshot data to Amazon S3 AWS document explains each process in detail.
As described in the previous comments, you could, for instance, use a Lambda function to trigger the RDS APIs.
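For illustration, a minimal sketch of such a Lambda handler with boto3 might look like the following; it assumes the EventBridge event carries the snapshot ARN, and the bucket, IAM role, and KMS key below are placeholders:
# Sketch: Lambda handler that exports an RDS snapshot to S3 via the RDS API.
# The event lookup assumes an EventBridge rule for RDS snapshot events;
# bucket, role ARN, and KMS key are placeholders.
import time
import boto3

rds = boto3.client("rds")

def handler(event, context):
    # Adjust this lookup to the exact event pattern your rule matches.
    snapshot_arn = event["detail"]["SourceArn"]
    rds.start_export_task(
        ExportTaskIdentifier=f"snapshot-export-{int(time.time())}",
        SourceArn=snapshot_arn,
        S3BucketName="my-snapshot-export-bucket",                        # placeholder
        IamRoleArn="arn:aws:iam::123456789012:role/rds-s3-export-role",  # placeholder
        KmsKeyId="alias/rds-export-key",                                 # placeholder
    )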
Even more interestingly, AWS provides a GitHub repository with code to automate the export. Please find the code here.
As mentioned in the document, please note that:
Exporting RDS snapshots can take a while depending on your database type and size. The export task first restores and scales the entire database before extracting the data to Amazon S3. The task's progress during this phase displays as Starting. When the task switches to exporting data to S3, progress displays as In progress. The time it takes for the export to complete depends on the data stored in the database. For example, tables with well-distributed numeric primary key or index columns export the fastest. Tables that don't contain a column suitable for partitioning and tables with only one index on a string-based column take longer. This longer export time occurs because the export uses a slower single-threaded process.

restoring DynamoDB table from AWS Backup

I am using AWS Backup to back up some DynamoDB tables. When using the AWS Backup console to restore the backups, I am prompted to restore to a new table. This works fine, but my tables are deployed using CloudFormation, so I need the restored data in the existing table.
What is the process to get the restored data into the existing table? It looks like there are some third-party tools to copy data between tables, but I'm looking for something within AWS itself.
I recently had this issue and actually got CloudFormation to work quite seamlessly. The process was:
Delete the existing tables directly in DynamoDB (do not delete them from CloudFormation).
Restore the backup to a new table, using the name of the deleted table.
In CloudFormation, detect drift, manually fix any drift errors in DynamoDB, and then detect drift again (a programmatic version of the drift check is sketched after this list).
After this, the CFN template was healthy.
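If you would rather run the drift check from code than from the console, a minimal boto3 sketch (with a placeholder stack name) could look like this:
# Sketch: trigger CloudFormation drift detection and list drifted resources.
# The stack name is a placeholder.
import time
import boto3

cfn = boto3.client("cloudformation")

detection_id = cfn.detect_stack_drift(StackName="my-dynamodb-stack")["StackDriftDetectionId"]

# Poll until the detection run finishes.
while True:
    status = cfn.describe_stack_drift_detection_status(StackDriftDetectionId=detection_id)
    if status["DetectionStatus"] != "DETECTION_IN_PROGRESS":
        break
    time.sleep(5)

for drift in cfn.describe_stack_resource_drifts(StackName="my-dynamodb-stack")["StackResourceDrifts"]:
    print(drift["LogicalResourceId"], drift["StackResourceDriftStatus"])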
At this time, AWS has no direct way to do this (though it looks like you can export to some service, then import from that service into an existing table).
I ended up writing my own code to do this.
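For reference, a minimal sketch of what such a copy script might look like with boto3; the table names are placeholders, and a plain Scan is only sensible for modestly sized tables:
# Sketch: copy all items from a restored DynamoDB table into an existing one.
# Table names are placeholders; a plain Scan is fine for modest table sizes,
# but very large tables would need parallel scans and retry/backoff handling.
import boto3

dynamodb = boto3.resource("dynamodb")
source = dynamodb.Table("my-table-restored")  # placeholder: table created by the restore
target = dynamodb.Table("my-table")           # placeholder: existing CFN-managed table

scan_kwargs = {}
while True:
    page = source.scan(**scan_kwargs)
    with target.batch_writer() as batch:
        for item in page["Items"]:
            batch.put_item(Item=item)
    if "LastEvaluatedKey" not in page:
        break
    scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]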

Archiving data of specific table in AWS RDS periodically

I use AWS RDS as the database for my Spring Boot application. I would like to archive data older than 6 months from one specific table. In this context, I have gone through a few articles here but did not get any concrete idea of how to do this. Could anyone please help here?
If you are looking to back up with RDS itself, your options are limited. You can, of course, use automated RDS snapshots, but that won't let you pick a specific table (it will back up the entire database) and can't be set for retention longer than 35 days. Alternatively, you could manually initiate a snapshot, but you can't specify a retention period. In this case, though, you could instead use the AWS-published rds-snapshot-tool, which will help you automate the snapshot process and let you specify a maximum age for the snapshots. This is likely the easiest way to use RDS for your question. If you only wanted to restore one specific table (and didn't care about having the other tables in the backup), you could restore the snapshot and immediately DROP the tables you don't care about before you start using it.
However, if you really care about backing up only one specific table, then RDS itself is out as a possible means of taking the backups on your behalf. I am assuming a MySQL database for your Spring application, in which case you will need to use the mysqldump tool to grab the database table you are interested in. You will need to call that tool from an application and then store the dump persistently somewhere (perhaps S3). You will also need to manage the lifecycle of that backup, but if you do use S3, you can set a lifecycle policy to automatically age out and drop old files (backups, in this case).
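A rough sketch of that approach, using placeholder host, credentials, database/table names, and bucket (in practice the password should come from Secrets Manager or similar):
# Sketch: dump a single table with mysqldump and upload the dump to S3.
# Host, credentials, database/table names, and bucket are placeholders.
import datetime
import subprocess
import boto3

dump_file = f"orders-{datetime.date.today()}.sql"  # "orders" is a placeholder table name
subprocess.run(
    [
        "mysqldump",
        "--host=my-db.abc123.us-east-1.rds.amazonaws.com",  # placeholder endpoint
        "--user=backup_user",                               # placeholder user
        "--password=REPLACE_ME",                            # placeholder; prefer Secrets Manager
        f"--result-file={dump_file}",
        "mydatabase", "orders",  # database name followed by the single table to archive
    ],
    check=True,
)

# Upload to S3; a lifecycle policy on the bucket can expire old dumps automatically.
boto3.client("s3").upload_file(dump_file, "my-archive-bucket", f"rds-archive/{dump_file}")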

Writing parquet file from transient EMR Spark cluster fails on S3 credentials

I am creating a transient EMR Spark cluster programmatically, reading a vanilla S3 object, converting it to a Dataframe and writing a parquet file.
Running on a local cluster (with S3 credentials provided) everything works.
Spinning up a transient cluster and submitting the job fails on the write to S3 with the error:
AmazonS3Exception: The AWS Access Key Id you provided does not exist in our records.
But my job is able to read the vanilla object from S3, and it logs to S3 correctly. Additionally I see that EMR_EC2_DefaultRole is set as EC2 instance profile, and that EMR_EC2_DefaultRole has the proper S3 permissions, and that my bucket has a policy set for EMR_EC2_DefaultRole.
I get that the 'filesystem' I am trying to write the parquet file to is special, but I cannot figure out what needs to be set for this to work.
Arrrrgggghh! Basically as soon as I had posted my question, the light bulb went off.
In my Spark job I had
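// This resolves whatever credentials happen to be available locally and forces
// them onto the S3 filesystem configuration, even on EMR.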
val cred: AWSCredentials = new DefaultAWSCredentialsProviderChain().getCredentials
session.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", cred.getAWSAccessKeyId)
session.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", cred.getAWSSecretKey)
which were necessary when running locally in a test cluster, but clobbered the good values when running on EMR. I changed the block to
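// Credentials are now set only when an explicit override is supplied (e.g. by
// the local test harness); on EMR, overrideCredentials is empty and the
// instance profile is used instead.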
overrideCredentials.foreach { cred =>
  session.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", cred.getAWSAccessKeyId)
  session.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", cred.getAWSSecretKey)
}
and pushed the credential retrieval into my test harness (which is, of course, where it should have been all the time.)
If you are running in EC2 on the AWS code (not EMR), use the S3A connector, as it will use the EC2 IAM credential provider as the last of the credential providers it tries by default.
The IAM credentials are short-lived and include a session key: if you are copying them, then you'll need to refresh them at least every hour and set all three items: access key, session key, and secret.
Like I said: s3a handles this, with the IAM credential provider triggering a new GET of the instance-info HTTP server whenever the previous key expires.