AWS RDS - replicate data in one table to another

No. I am not talking about read replicas.
The scenario I am thinking of is this. Let's say you have an RDS table called user_profile. You want to record a history of the changes to each user profile in another table, say user_profile_history. Is it possible in RDS to copy data in real time from the main user_profile table to its history table whenever the main table is updated?
The end result would be that the user_profile table contains only the latest user data, and all past snapshots of each profile are in the history table.
Both the tables are on the same RDS database.
I have done my due diligence and a bit of research, but all I could find was read replicas and replicating data to another region. I haven't found anything that covers this scenario. Yes, you could say we can just implement the logic in the app itself, but what if we want to "pass the burden" to the RDS DB?
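A minimal sketch of the usual database-side answer: a trigger that copies the old row into the history table on every update. RDS engines such as MySQL and PostgreSQL support triggers, but to keep the example self-contained it uses SQLite through Python's standard library as a stand-in. The columns display_name, email and changed_at are hypothetical, and the exact trigger syntax differs per engine (PostgreSQL, for instance, needs a separate trigger function).

```python
import sqlite3

# In-memory SQLite database standing in for the RDS database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_profile (
    user_id      INTEGER PRIMARY KEY,
    display_name TEXT NOT NULL,
    email        TEXT NOT NULL
);

CREATE TABLE user_profile_history (
    history_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id      INTEGER NOT NULL,
    display_name TEXT NOT NULL,
    email        TEXT NOT NULL,
    changed_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- After every UPDATE, copy the row's previous values (OLD.*) into the history table.
CREATE TRIGGER user_profile_to_history
AFTER UPDATE ON user_profile
FOR EACH ROW
BEGIN
    INSERT INTO user_profile_history (user_id, display_name, email)
    VALUES (OLD.user_id, OLD.display_name, OLD.email);
END;
""")

# The application only ever writes to user_profile.
conn.execute("INSERT INTO user_profile VALUES (1, 'Alice', 'alice@example.com')")
conn.execute("UPDATE user_profile SET email = 'alice@new.example.com' WHERE user_id = 1")
conn.execute("UPDATE user_profile SET display_name = 'Alice B.' WHERE user_id = 1")

current = conn.execute(
    "SELECT display_name, email FROM user_profile WHERE user_id = 1"
).fetchone()
history = conn.execute(
    "SELECT display_name, email FROM user_profile_history "
    "WHERE user_id = 1 ORDER BY history_id"
).fetchall()
```

After the two updates, current holds only the latest profile, while history holds the two earlier snapshots. Because the copy happens inside the database, every client that updates user_profile produces a history row without any application logic, which is the "pass the burden to the DB" trade-off the question asks about.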

Related

Best practice of using Dynamo table when it needs to be periodically updated

In my use case, I need to update a DynamoDB table periodically (about once per day). Because lots of entries need to be inserted, deleted or modified, I plan to drop the old table and create a new one each time.
How can I keep the table queryable while I recreate it? Which API should I use? It's fine if the old table remains the target table in the meantime, so that customers won't experience any outage.
Is it possible to have something like a version number for the table, so that I can roll back quickly?
I would suggest naming each table with a common base name plus a suffix (some people use a date, others use a version number).
Store the usable DynamoDB table name in a configuration store (if you are not already using one, you could use Secrets Manager, SSM Parameter Store, another DynamoDB table, a Redis cluster or a third party solution such as Consul).
Automate the creation of a new DynamoDB table and the insertion of data into it. Then update the config store with the name of the newly created table. Allow enough time for the switchover, then remove the previous DynamoDB table.
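A minimal sketch of the naming and switchover steps above, with an in-memory class standing in for the config store. TableNameRegistry, versioned_table_name and the "catalog" base name are hypothetical; in practice the pointer might be an SSM Parameter Store parameter that readers fetch (for example with boto3's ssm.get_parameter) before querying. Keeping the previous name around is what makes a quick rollback possible.

```python
from datetime import date
from typing import List


def versioned_table_name(base: str, day: date) -> str:
    # Date suffix, e.g. "catalog-20240115"; a version number would work too.
    return f"{base}-{day:%Y%m%d}"


class TableNameRegistry:
    """Hypothetical in-memory stand-in for a config store.

    Keeps every published table name in order; the last one is live.
    """

    def __init__(self) -> None:
        self._published: List[str] = []

    def publish(self, table_name: str) -> None:
        # Called after the new table is fully loaded: readers switch over.
        self._published.append(table_name)

    def current(self) -> str:
        # Readers look this up before each query (or cache it briefly).
        return self._published[-1]

    def rollback(self) -> str:
        # Point readers back at the previous table, which must not be deleted yet.
        if len(self._published) < 2:
            raise RuntimeError("no previous table to roll back to")
        self._published.pop()
        return self.current()


registry = TableNameRegistry()
registry.publish(versioned_table_name("catalog", date(2024, 1, 14)))
registry.publish(versioned_table_name("catalog", date(2024, 1, 15)))
live = registry.current()        # "catalog-20240115"
previous = registry.rollback()   # "catalog-20240114"
```

Rollback only works while the previous table still exists, which is one more reason to wait before deleting it.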
You could do the final part by using Step Functions to automate the workflow, with a Wait state of a few hours to make sure nothing is still using the old table. In fact, you could even add a Lambda function that checks whether any traffic is still hitting the old DynamoDB table.
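A sketch of that workflow as an Amazon States Language definition, built as a Python dict and serialized to JSON. The two Lambda ARNs, the three-hour wait and the {"idle": true|false} output of the checking Lambda are all assumptions for illustration; if the old table is still receiving traffic, the machine simply waits again before re-checking.

```python
import json

# Hypothetical Lambda ARNs; substitute your own functions.
CHECK_TRAFFIC_ARN = "arn:aws:lambda:us-east-1:123456789012:function:check-old-table-traffic"
DELETE_TABLE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:delete-old-table"

definition = {
    "Comment": "Wait after the switchover, confirm the old table is idle, then delete it",
    "StartAt": "WaitForSwitchover",
    "States": {
        "WaitForSwitchover": {
            "Type": "Wait",
            "Seconds": 3 * 60 * 60,  # "a few hours": three, as an example
            "Next": "CheckOldTableTraffic",
        },
        "CheckOldTableTraffic": {
            "Type": "Task",
            "Resource": CHECK_TRAFFIC_ARN,
            # Assumed Lambda output shape: {"idle": true|false}
            "ResultPath": "$.traffic",
            "Next": "IsOldTableIdle",
        },
        "IsOldTableIdle": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.traffic.idle", "BooleanEquals": True, "Next": "DeleteOldTable"}
            ],
            # Still receiving traffic: wait again before re-checking.
            "Default": "WaitForSwitchover",
        },
        "DeleteOldTable": {
            "Type": "Task",
            "Resource": DELETE_TABLE_ARN,
            "End": True,
        },
    },
}

# JSON string to pass as the definition of a state machine.
definition_json = json.dumps(definition, indent=2)
```

With boto3, definition_json could be passed to the Step Functions client's create_state_machine(name=..., definition=definition_json, roleArn=...); the role and both Lambdas are yours to supply.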

Single query to get the data from DynamoDB and RDS

Looking for advice on AWS architecture. I did some research on my own, but I'm far from an expert and would really love to hear other opinions. This seems to be a pretty common problem in microservice architectures, but AWS looks like a different universe to me with its own rules (and tools), and there are probably best practices that I'm not aware of yet.
What we have:
SOA: a Lambda per entity (usually Node.js + DynamoDB)
Some Lambda functions use RDS (MySQL) as a DB (this data was supposed to be used by QuickSight)
GraphQL (AppSync)
The first problem occurred when we realized that we have to display in QuickSight the data that is stored in DynamoDB. This was solved by a Data Pipeline job that transfers the data from DynamoDB to S3, where QuickSight then queries it using Athena. In this case it's acceptable that the data for analysis is not updated in real time.
But now we need to create a table in the main application that combines data stored in different data sources: DynamoDB and MySQL. For example, we have a payment entity with attributes like amount and currency, and this data is stored in MySQL. Then there is a contract entity, which is stored in DynamoDB. A payment can link to a contract (one-to-many: one contract has many payments). We need to create a table with a list of contracts so the user can filter contracts by payment attributes, for example to see the contracts that have payments in EUR or with a total amount > 500 USD. This table must contain real-time data and have common data grid features: filtering, sorting, pagination.
Options that I see at the moment:
use SQS to transfer payment attributes from the payment service to DynamoDB and store them as a String Set in DynamoDB (e.g. attribute currencies: ['EUR', 'USD']).
use streams (DynamoDB Streams, Kinesis?) to transfer data from DynamoDB to S3, and then query the data with Athena. I'm not sure it will work for us: I had really bad performance issues with Athena, with queries stuck in the queue for a couple of minutes. Did I do something wrong?
remodel the architecture and merge the entities into one DB. This would probably take far too long for the project managers to allow it.
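A minimal sketch of the first option, the denormalization approach: the payment service aggregates payments per contract, and the result would be written onto the contract item (for instance a String Set of currencies plus a map of per-currency totals). All names are hypothetical. Totals stay per currency because "> 500 USD" across mixed currencies would need exchange rates the question does not specify. Amounts use Decimal, which is also what boto3's resource layer expects for DynamoDB numbers.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable, List


def summarize_payments(payments: Iterable[dict]) -> Dict[str, dict]:
    """Aggregate payment rows (from MySQL) into per-contract attributes.

    Hypothetical input keys: contract_id, amount, currency.
    Output per contract: {"currencies": set of codes, "totals": {code: Decimal}}.
    """
    summary: Dict[str, dict] = {}
    for p in payments:
        entry = summary.setdefault(
            p["contract_id"], {"currencies": set(), "totals": defaultdict(Decimal)}
        )
        entry["currencies"].add(p["currency"])
        # Totals are kept per currency; no exchange-rate conversion is attempted.
        entry["totals"][p["currency"]] += Decimal(p["amount"])
    return summary


def contracts_with_currency(summary: Dict[str, dict], currency: str) -> List[str]:
    return sorted(c for c, s in summary.items() if currency in s["currencies"])


def contracts_with_total_over(
    summary: Dict[str, dict], currency: str, minimum: Decimal
) -> List[str]:
    # .get avoids inserting a zero total for contracts without that currency.
    return sorted(
        c for c, s in summary.items() if s["totals"].get(currency, Decimal(0)) > minimum
    )


payments = [
    {"contract_id": "c1", "amount": "300", "currency": "USD"},
    {"contract_id": "c1", "amount": "250", "currency": "USD"},
    {"contract_id": "c2", "amount": "100", "currency": "EUR"},
    {"contract_id": "c2", "amount": "50", "currency": "USD"},
]
summary = summarize_payments(payments)
eur_contracts = contracts_with_currency(summary, "EUR")                     # ["c2"]
big_usd_contracts = contracts_with_total_over(summary, "USD", Decimal(500))  # ["c1"]
```

Inside DynamoDB, the currency filter maps onto a FilterExpression such as contains(currencies, :cur). DynamoDB does not allow empty sets, so a contract without payments would simply omit the attribute.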
Data duplication (and consistency issues as a result) was always a pain for me, but it seems to be unavoidable here.
Any thoughts or links to the articles that might help are highly appreciated.
P.S. The architecture was designed by a previous development team.

DynamoDB local db limits - use for initial beta-go-live

Given DynamoDB's pricing, the thought came to mind to use DynamoDB Local on an EC2 instance for the go-live of our startup SaaS solution. I've been trying to find something like a data sheet for DynamoDB Local that specifies limits on the number of tables or records, or the general size of the DB file. Possibly, we could even run a few DynamoDB Local instances on dedicated EC2 servers, since we know at login which user needs to be connected to which DB.
Does anybody have any information on DynamoDB Local's limits or on this approach? Also, does anybody know of any legal/licensing issues with using DynamoDB Local in that way?
Every item in DynamoDB Local will end up as a row in the SQLite database file. So the limits are based on SQLite's limitations.
The maximum number of rows in a table is 2^64, but the database file size limit (140 terabytes) will likely be reached first.
Note: because of this, the number of items you can store in DynamoDB Local will be smaller with the preview version that supports Streams, because to support Streams the update records for items are also stored. For example, if you only insert items, each item is effectively stored twice: once in a table containing the item data and once in a table containing the INSERT UpdateRecord data for that item (more records are generated if the item is updated over time).
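To make the note above concrete, a back-of-the-envelope estimate of how the two limits interact. The 1 KB average item size is an assumption for illustration, and real capacity will be lower because SQLite's page, index and per-row overhead is ignored.

```python
# Figures quoted above; the 140 TB file limit is taken as 140 * 10**12 bytes.
SQLITE_FILE_LIMIT_BYTES = 140 * 10**12
SQLITE_MAX_ROWS = 2**64


def max_items_estimate(avg_item_bytes: int, with_streams: bool) -> int:
    """Optimistic upper bound on items stored in DynamoDB Local.

    Ignores SQLite overhead. With the Streams preview, an insert-only workload
    stores each item twice (item row + INSERT record); updates add even more.
    """
    copies = 2 if with_streams else 1
    by_file_size = SQLITE_FILE_LIMIT_BYTES // (avg_item_bytes * copies)
    return min(by_file_size, SQLITE_MAX_ROWS)


plain = max_items_estimate(1_000, with_streams=False)   # 140_000_000_000
streams = max_items_estimate(1_000, with_streams=True)  # 70_000_000_000
```

For any realistic item size the file limit, not the row limit, is what binds, which matches the note above.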
Be aware that DynamoDB Local was not designed for the same performance, availability, and durability as the production service.

Migrating a relational DB into AWS services

I have a terabyte-sized SQL Server DB table that has only two columns:
Id,
HTML Content
There are a few applications that query this table to retrieve the HTML content by providing the Id of the row.
The DB resides on-premises, and its maintenance cost and size keep getting higher. I am thinking of moving this DB to AWS DynamoDB. The reasons I chose DynamoDB are the cost and the performance I have read about.
Are there any concerns I should know about before choosing DynamoDB?
Are there any other services in AWS that I could use instead of DynamoDB?
I understand that SQL Server is a relational DB, while DynamoDB is NoSQL, and it seems a NoSQL DB could be a potential solution for this scenario. I have no joins or transactions against that table. All I do with the table is insert and select.
Are there any concerns I should know about before choosing DynamoDB?
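As with many NoSQL big-data DBs, DynamoDB reads are eventually consistent by default, so if your application writes and then immediately reads the same record, you should expect occasional inconsistencies (stale reads). DynamoDB does offer strongly consistent reads, at a higher read-capacity cost, if you need read-after-write consistency.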
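For the insert-and-select-by-Id access pattern described in the question, a minimal sketch of the two DynamoDB requests, built as plain dicts so it runs without AWS credentials. The table name HtmlPages, the attribute names Id and HtmlContent, and storing Id as a string are assumptions. It shows the ConsistentRead flag for read-after-write, plus a rough guard for DynamoDB's 400 KB item-size limit, which matters for large HTML documents.

```python
from typing import Any, Dict

# DynamoDB's maximum item size (names + values) is 400 KB, taken here as 400 * 1024 bytes.
MAX_ITEM_BYTES = 400 * 1024


def put_page_request(table: str, page_id: str, html: str) -> Dict[str, Any]:
    """Keyword arguments for a boto3 low-level client's put_item call."""
    # Rough check: only the HTML value is counted, not the attribute names or the Id.
    if len(html.encode("utf-8")) > MAX_ITEM_BYTES:
        raise ValueError("HTML too large for a single DynamoDB item")
    return {
        "TableName": table,
        "Item": {"Id": {"S": page_id}, "HtmlContent": {"S": html}},
    }


def get_page_request(table: str, page_id: str, consistent: bool = False) -> Dict[str, Any]:
    """Keyword arguments for a boto3 low-level client's get_item call.

    Reads are eventually consistent unless ConsistentRead is True.
    """
    return {
        "TableName": table,
        "Key": {"Id": {"S": page_id}},
        "ConsistentRead": consistent,
    }


put = put_page_request("HtmlPages", "42", "<html><body>Hello</body></html>")
read_after_write = get_page_request("HtmlPages", "42", consistent=True)
# With boto3: client = boto3.client("dynamodb")
#             client.put_item(**put); client.get_item(**read_after_write)
```

Defaulting consistent to False mirrors DynamoDB's own default; only callers that read immediately after writing need to opt in.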
I'm not familiar with "Prem"; assuming you mean you're working with your own private servers, I feel obligated to give the following warning: working in the cloud is very different from working with your own servers. Requests fail more often, the latency pattern is different, and you should architect your software to handle these sorts of issues. If you're planning on moving to the cloud, I'd start by migrating your application and leave the DB for last.
If you really need real-time updates of your data, you should reconsider moving to Dynamo. Also, Dynamo is useful when you need a dynamic number of columns for each row. So apart from the cost, I don't see any benefits here.
If you don't need real-time updates, you can look into AWS Redshift or Google BigQuery; these will be cheaper solutions compared to Dynamo.
As you mentioned, you have just two columns, so take a look at Redis as well. A plain key-value structure will help performance. But since Redis stores everything in physical memory, the cost will be high, and you'll still need permanent storage, i.e. a DB such as SQL Server or MySQL. So in terms of performance, yes, you'll see a huge difference, but it will cost more than your current setup.
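A minimal sketch of the Redis-plus-database layout described above, using the cache-aside pattern: reads try the in-memory cache first and fall back to the permanent database on a miss. Plain dicts stand in for Redis and the SQL database (hypothetical); with redis-py the cache lookups would become r.get(key) and r.set(key, value) calls.

```python
from typing import Callable, Dict, Optional


class CacheAside:
    """Cache in front of a permanent store (both are in-memory stand-ins here).

    `cache` plays the role of Redis; `load_from_db` plays the role of the
    SQL database that stays the system of record.
    """

    def __init__(self, load_from_db: Callable[[str], Optional[str]]) -> None:
        self.cache: Dict[str, str] = {}
        self.load_from_db = load_from_db
        self.db_reads = 0  # how often we had to go to the database

    def get(self, page_id: str) -> Optional[str]:
        if page_id in self.cache:  # hit: served from memory
            return self.cache[page_id]
        self.db_reads += 1         # miss: read from the permanent store
        html = self.load_from_db(page_id)
        if html is not None:
            self.cache[page_id] = html  # populate so the next read is a hit
        return html


database = {"1": "<p>one</p>", "2": "<p>two</p>"}
pages = CacheAside(database.get)
first = pages.get("1")   # miss, loaded from the database
again = pages.get("1")   # hit, served from the cache
```

The database stays necessary because the cache only holds what has been read (and, with real Redis, only what fits in memory), which is exactly the cost trade-off the answer points out.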
How about AWS Aurora? AWS at least claims it costs 1/10th as much as commercial databases. It is also compatible with MySQL.

DynamoDB - limit on number of tables per account

We are working on deploying our product (currently on-prem) on AWS and are looking at DynamoDB as an alternative to Cassandra, mainly to avoid the DevOps costs associated with a large number of Cassandra clusters.
The DynamoDB docs say that the per-account limit on the number of tables is 256 per region, but it can be increased by contacting AWS Support. What is the maximum this limit can be raised to per account?
Our product is separated into distinct logical units, each of which will have several tables (say 100). Each customer can have several such units. Each logical unit can be backed up (i.e. a snapshot taken), and that snapshot can be restored at any time in the future (overwriting the current content of all tables). The backup/restore performance (the time taken to take a snapshot or import old data for all the tables) needs to be good: it cannot take several minutes or hours.
We were thinking of using a distinct set of tables for each logical unit, so that backup/restore is quick using EMR on S3. But with this approach, we will run out of the 256-table limit even with one customer. It looks like there are two options:
Create a new account for each logical unit of each customer. Is this possible? We will have a main corporate account, I suppose (I am still learning about this), but can it have a set of sub-accounts for our customers, created using IAM, each of which is treated as an independent AWS account?
Use each table in a true multi-tenant manner, where the primary key contains the customer ID + logical unit ID. But in this scenario, when using EMR to back up an entire table, we will need to selectively back up a specific set of rows/items, which may number in the millions, while other read/write operations are going on against a different set of items. Is this feasible at large scale?
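A minimal sketch of the key layout in the second option, assuming a partition key attribute named pk that concatenates the customer ID and logical-unit ID with a "#" separator (all names hypothetical). The point is that a backup of one unit can then use a DynamoDB Query on that key rather than a full-table scan; large results come back in pages, so a real backup job would keep following LastEvaluatedKey.

```python
from typing import Any, Dict

SEPARATOR = "#"  # assumes customer and unit IDs never contain "#"


def partition_key(customer_id: str, unit_id: str) -> str:
    # Every item of one logical unit shares this partition key value.
    if SEPARATOR in customer_id or SEPARATOR in unit_id:
        raise ValueError("IDs must not contain the separator")
    return f"{customer_id}{SEPARATOR}{unit_id}"


def unit_query_request(table: str, customer_id: str, unit_id: str) -> Dict[str, Any]:
    """Keyword arguments for a boto3 low-level client's query call.

    Reads only one logical unit's items instead of scanning the whole table.
    """
    return {
        "TableName": table,
        "KeyConditionExpression": "pk = :pk",
        "ExpressionAttributeValues": {":pk": {"S": partition_key(customer_id, unit_id)}},
    }


request = unit_query_request("orders", "acme", "unit-7")
```

Rejecting IDs that contain the separator keeps two different (customer, unit) pairs from ever producing the same key.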
Any other thoughts on how to approach this?
Thanks for any info.
I would suggest changing the approach: rather than thinking about how to get more tables by creating more accounts, I would think about how to use fewer tables.
Having said that, you could contact Support and increase the number of tables for your account.
I think that you will run into a money problem, due to the current pricing model of provisioning throughput per table.
Many people split tables based on time frame.
e.g. this week's table, last week's table, then move it to last month's table, and so on.
This helps when analyzing the data with EMR/Redshift, so you won't have to pull the whole table every time.
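A minimal sketch of a time-based naming scheme like the one above; the "events" base name and the weekly/monthly formats are hypothetical. Using the ISO year together with the ISO week avoids giving two different names to a week that spans New Year.

```python
from datetime import date, timedelta


def weekly_table_name(base: str, day: date) -> str:
    # ISO year + ISO week, so weeks that straddle New Year get one consistent name.
    iso_year, iso_week, _ = day.isocalendar()
    return f"{base}_{iso_year}_w{iso_week:02d}"


def monthly_table_name(base: str, day: date) -> str:
    # Archive table for a whole month, e.g. "events_2024_01".
    return f"{base}_{day:%Y_%m}"


today = date(2024, 1, 15)
this_week = weekly_table_name("events", today)                       # "events_2024_w03"
last_week = weekly_table_name("events", today - timedelta(weeks=1))  # "events_2024_w02"
last_month = monthly_table_name("events", today.replace(day=1) - timedelta(days=1))  # "events_2023_12"
```

An analysis job can then compute exactly which tables cover the period it needs instead of reading everything.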