Single table db architecture with AWS Amplify

By default, AWS Amplify transformers create a table for each GraphQL type.
But according to the DynamoDB documentation, it is a best practice to:
Keep the number of tables as small as possible
Keep entries that are often queried together in the same table
I have the impression that the Amplify way of doing things contradicts the guidance above.
I am new to both NoSQL and Amplify.
Can someone suggest ways to address this?

I think we're in a bit of a transition or gray area here. I'm very new to Amplify and have been investigating moving to a single-table design, as there are sources (below) indicating it has always been possible, but you'd have to write everything in VTL templates. But in 2020 they released direct Lambda resolver support: https://youtu.be/EOQqi6Yun7g?t=960 (clip)
However, it seems like you lose access to the @auth directive (and probably others, because you're no longer going to use @model) along with a lot of the nice out-of-the-box functionality that comes with Amplify's multi-table approach.
At this point, since I'm developing a new app, I'm going to stick with the default multi-table design to get the app functional faster.
Trying to implement a single-table design seems to go against what the Amplify team recommends and requires more manual work. You'd have to manually create custom Lambda functions (AppSync direct Lambda resolvers), code the DynamoDB queries for each data access element, and manage authorization through some other means which I'm not aware of at this time. Maybe someone can chime in here...
Single table vs multi table info
Using Amplify with single table:
https://youtu.be/EOQqi6Yun7g
Single vs Multi Clip:
https://youtu.be/1WF_wped808?t=1251 (clip)
https://www.alexdebrie.com/posts/dynamodb-single-table/ (towards bottom)
https://youtu.be/EOQqi6Yun7g?t=1288 (clip)
Example single table design by Alex Debrie:
https://gist.github.com/dabit3/96dc51e688b18a7d40fc534331758c56
More Discussion:
https://stackoverflow.com/a/56438716/1956540
Basic Setup steps
I set up a single table by following the instructions below. Again, you don't use @model for this. Also, I think you have to include a type Query {} in your schema for it to compile, but I could be wrong here.
So the basic steps are:
Create a single table (amplify add storage)
amplify push
Create your schema in the schema.graphql file.
Create supporting lambda function (amplify add function)
Note: if you look at the example here, I believe you can create an entry point that routes to all the other methods: https://gist.github.com/dabit3/96dc51e688b18a7d40fc534331758c56#lambda
Add the DynamoDB query code to the function (a minimal sketch follows these steps).
amplify push
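For illustration, here is a minimal sketch of what the Lambda resolver's DynamoDB query code could look like. It assumes a direct Lambda resolver, a TABLE_NAME environment variable, generic PK/SK key names, and a USER#<id> key pattern - none of these are Amplify defaults, so adjust to your own design:

// Hypothetical direct Lambda resolver for a single-table design (AWS SDK for JavaScript v3).
// TABLE_NAME, the PK/SK key names, and the USER# key pattern are assumptions for illustration.
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const TABLE_NAME = process.env.TABLE_NAME ?? "SingleTable";

export const handler = async (event: any) => {
  // AppSync direct Lambda resolvers receive the GraphQL field name and arguments in the event.
  switch (event.info.fieldName) {
    case "getUserWithVideos": {
      // One query returns the user item and its video items because they share a partition key.
      const result = await client.send(new QueryCommand({
        TableName: TABLE_NAME,
        KeyConditionExpression: "PK = :pk",
        ExpressionAttributeValues: { ":pk": `USER#${event.arguments.userId}` },
      }));
      return result.Items;
    }
    default:
      throw new Error(`Unhandled field: ${event.info.fieldName}`);
  }
};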
Complete steps for Setting up a single Table:
https://catalog.us-east-1.prod.workshops.aws/workshops/53b10bf8-2271-4ab4-bfd2-39e878a90dc8/en-US/lab2/1-vtl (both "Connecting to an existing DynamoDB table" and "Direct Lambda Resolver" steps)
Not trying to be negative about Amplify, it is awesome, I love what they are doing with this product. I just think it's very new to everyone and I'm hoping this post is no longer valid next year and we continue to see great progress from the team.

Related

Restoring a DynamoDB table using PITR and updating Cloudformation/CDK/other services references

I have an environment where my DynamoDB table is central to a few services (a couple of Lambdas, Kinesis, and Firehose). All of that is managed by AWS CloudFormation via the TypeScript CDK.
This table has PITR enabled and, as far as I know, it's only possible to do a PITR by dumping the recovered data into a new table. Here is where the pain begins:
AWS's documentation for what to do after the creation of the new table is NONEXISTENT!
How can I update the references to the new table in all the other services?
Should I just 'erase' my old table and import the recovered one?
Wouldn't this mean that I would need to take my service down to recover it?
What is the "standard" or "best practice" here?
Thanks a lot community! :D
You must restore to a new table, yes. There are some ways to overcome the issues you describe. Firstly, when you restore to a new table, you will need to import that resource into your CDK stack.
Parameter Store
Use Parameter Store to hold the latest name of your table; all of your downstream applications resolve the table name by querying the parameter store.
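A rough sketch of that lookup in Node.js - the parameter name "/myapp/dynamodb/table-name" is just a placeholder:

// Resolve the current table name from SSM Parameter Store at runtime.
import { SSMClient, GetParameterCommand } from "@aws-sdk/client-ssm";

const ssm = new SSMClient({});
let cachedTableName: string | undefined;

export async function getTableName(): Promise<string> {
  // Cache the value so we don't call SSM on every request.
  if (!cachedTableName) {
    const response = await ssm.send(
      new GetParameterCommand({ Name: "/myapp/dynamodb/table-name" })
    );
    cachedTableName = response.Parameter!.Value!;
  }
  return cachedTableName;
}

After a restore, you only update the parameter value and the consumers pick up the new table name.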
Lambda Env Variable
Set your table name dynamically as an environment variable for your Lambda. This reduces latency compared to the Parameter Store approach, but it's only applicable to Lambda or other services that let you set environment variables.
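A hedged CDK sketch of that idea, assuming you have imported the restored table into your stack (the table name, construct IDs, and code path are placeholders):

// Pass the (imported) restored table's name to a Lambda as an environment variable.
import { Table } from "aws-cdk-lib/aws-dynamodb";
import { Function, Runtime, Code } from "aws-cdk-lib/aws-lambda";

// Inside your Stack's constructor:
const restoredTable = Table.fromTableName(this, "RestoredTable", "my-table-restored");

const handler = new Function(this, "ApiHandler", {
  runtime: Runtime.NODEJS_18_X,
  handler: "index.handler",
  code: Code.fromAsset("lambda"),
  environment: {
    TABLE_NAME: restoredTable.tableName, // the function reads process.env.TABLE_NAME
  },
});

restoredTable.grantReadWriteData(handler); // allow the function to use the restored table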
Inline Answers for Completeness
AWS's documentation for what to do after the creation of the new table is NONEXISTENT!
Please share feedback directly on the docs page if you believe relevant information is missing.
How can I update the references for the new table on all other services?
The two options mentioned above are the most common approaches.
Should I just 'erase' my old table and import the recovered one?
This would cause application downtime, if you can afford that then it would be an easy approach. If not, follow the above suggestions.
Wouldn't this mean that I would need to take my service down to recover it? What is the "standard" or "best practice" here?
Yes, as mentioned above.

Automating DynamoDB scripts

Like we used to do with RDBMS SQL scripts, I want to do a similar thing with my DynamoDB tables.
Currently it's very difficult to track changes from environment to environment (dev - qa - prod). We are making changes directly via the console.
What I want to do is keep the table data/JSON in git version control, and whenever a dev makes a change, we should be able to just run a script that migrates the respective changes to the DynamoDB table, e.g. update/create/delete tables, add/remove/update records.
But I am not able to find a proper way/guide to achieve this. We are using JavaScript/Node.js as our base language.
Any help regarding this scenario would be appreciated.
Thanks
ref : https://forums.aws.amazon.com/thread.jspa?threadID=342538
As far as I can tell, you described two separate issues:
Changing the tables "structure"
Updating records after the update
Before I go into my answer, remember that DynamoDB is a NoSQL database and your previous RDBMS was a relational database. Operational tasks can differ greatly between the two types of databases.
1. Automating changes to the "structure" of the table
For this you can check out Infrastructure as Code tools like Terraform, CloudFormation or Pulumi.
But since DynamoDB is a NoSQL database, you can only do a few things like setting your hash and sort keys and defining indexes. Adding "fields" to DynamoDB is not done with those tools, because except for the hash and sort key there are no fields; everything else does not follow an explicit (SQL) schema.
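For example, here is a hedged CDK (TypeScript) sketch of roughly everything you can declare about a table's "structure" - keys, indexes, billing mode - with placeholder names:

// Keys, indexes and billing mode are the table "structure"; item attributes are not declared here.
import { Table, AttributeType, BillingMode } from "aws-cdk-lib/aws-dynamodb";

// Inside your Stack's constructor:
const table = new Table(this, "AppTable", {
  partitionKey: { name: "PK", type: AttributeType.STRING },
  sortKey: { name: "SK", type: AttributeType.STRING },
  billingMode: BillingMode.PAY_PER_REQUEST,
});

table.addGlobalSecondaryIndex({
  indexName: "GSI1",
  partitionKey: { name: "GSI1PK", type: AttributeType.STRING },
  sortKey: { name: "GSI1SK", type: AttributeType.STRING },
});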
2. Updating records after an update
If you do not have a lot of records, you could write yourself a simple tool or script that does the relevant work using the AWS SDK and run it during your CI/CD pipeline. A simple approach would be to have a "migrations" folder and, if there is a file in it, the pipeline executes it. After the migration is done, just remove the file again. Not great, but pragmatic.
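A minimal Node.js sketch of such a migration script, assuming a generic PK/SK table and a back-filled "status" attribute (the table name, key names, and attribute are placeholders):

// One-off migration: scan every record and back-fill a "status" attribute if it is missing.
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, ScanCommand, UpdateCommand } from "@aws-sdk/lib-dynamodb";

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const TABLE_NAME = "my-table";

async function migrate(): Promise<void> {
  let lastEvaluatedKey: Record<string, any> | undefined;
  do {
    // Page through the whole table.
    const page = await client.send(
      new ScanCommand({ TableName: TABLE_NAME, ExclusiveStartKey: lastEvaluatedKey })
    );
    for (const item of page.Items ?? []) {
      await client.send(
        new UpdateCommand({
          TableName: TABLE_NAME,
          Key: { PK: item.PK, SK: item.SK },
          UpdateExpression: "SET #status = if_not_exists(#status, :default)",
          ExpressionAttributeNames: { "#status": "status" },
          ExpressionAttributeValues: { ":default": "ACTIVE" },
        })
      );
    }
    lastEvaluatedKey = page.LastEvaluatedKey;
  } while (lastEvaluatedKey);
}

migrate().catch((err) => {
  console.error(err);
  process.exit(1);
});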
If you have a lot of records this won't work that well anymore, at least if you want a downtime-less deployment. In that case you will have to update your software to work with both the old and the new versions of the record structure, while you gradually update all records in the background (using a script, etc.). Once all the records are updated, you can remove the code paths that handle the old structure.

Do I need to update item csv in AWS personalize?

I'm trying to use AWS Personalize and following their documentation.
So I've uploaded the dataset files (interactions, users, items) to S3, then created a solution and a campaign.
And I implemented the PutEvents API using Java.
The GetRecommendations API call works well.
At this point I'm wondering whether I need to update the dataset files, especially the items CSV.
In general, you're done at this point for very basic recommendations.
Since you are using the PutEvents call, all of the real-time events are added to the Interactions dataset that way. Interactions data created by manual import and by PutEvents calls are kept separate from each other. You can actually see them in the Personalize Datasets web console.
Still, you might want to update the dataset files using the dataset import job feature, but it's going to replace your existing dataset. In general I would recommend using it only when:
You've just created a fresh/bigger/better dump of your database with interactions.
You've found that your previous interactions dataset was invalid.
The schema of the dataset changed (pretty much you are forced to do it then).
The User or Item dataset changed/improved; it's actually a good idea to refresh these often, so Personalize can produce better recommendations. Keep in mind that this also requires retraining the Solution, so the new Items/Users will be included when recommendations are generated.
So for interactions you usually don't want to update the dataset. For the other datasets, it might even be a good idea to create an automatic import mechanism.
Keep in mind that the Items and Users datasets are only used by Personalize recipes that support metadata. Otherwise they are simply ignored.
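As a hedged sketch of that automatic import mechanism (in Node.js rather than Java; the ARNs, role, job name, and S3 path are placeholders), a dataset import job can be started with the AWS SDK:

// Kick off a dataset import job to refresh the Items dataset from a fresh CSV in S3.
import { PersonalizeClient, CreateDatasetImportJobCommand } from "@aws-sdk/client-personalize";

const personalize = new PersonalizeClient({});

async function refreshItemsDataset(): Promise<void> {
  await personalize.send(
    new CreateDatasetImportJobCommand({
      jobName: `items-import-${Date.now()}`,
      datasetArn: "arn:aws:personalize:us-east-1:123456789012:dataset/my-group/ITEMS",
      dataSource: { dataLocation: "s3://my-bucket/personalize/items.csv" },
      roleArn: "arn:aws:iam::123456789012:role/PersonalizeS3AccessRole",
    })
  );
}

Remember that the import replaces the existing dataset, and the Solution still needs to be retrained afterwards for the new items to influence recommendations.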

Using an existing AppSync API w/ Amplify

Using:
AppSync, DynamoDB, and Lambda
So I am a bit stuck on how to integrate AppSync with Amplify in React Native. I have an existing API in AWS AppSync that I created on the console. This API has several different models like Users, Videos, VideoComments, etc. Some of the objects within these models have custom mapping templates and resolvers that are really important for the application.
For Example, this is a quick look at what we have schema wise.
type User {
  userId: ID!
  name: String
  uploadedVideos(limit: Int, nextToken: String): VideoConnection
  # etc.
}
type Video {
  videoID: ID!
  object: S3Object
  userId: ID!
  uploadedBy: User
}
We have a resolver attached to a simple getVideo query that resolves the uploadedBy attribute by using the userId and retrieves all the necessary information about that user.
Additionally, the data sources (DynamoDB tables) we created for the models have primary keys, and some have sort keys. For example, a VideoLikes table keeps track of who liked a video and, to avoid duplicates, the primary key is the VideoID and the sort key is the UserID. This is just a minor example; we do this in other places as well so that we also have access to LSIs.
When I started using Amplify, I tried recreating the AppSync API because I liked how powerful the CloudFormation capabilities were with the different staging environments. However, I noticed that the DynamoDB tables for the models were defined automatically and were automatically given an id as the primary key. We use LSIs to help sort by certain values, like sorting a video's data by the number of likes or comments, so unfortunately this would not work for us. When I noticed this, I ran the "codegen" command against my original AppSync API and hit the issue that my resolvers and mapping templates were not copied down with the schema, queries, mutations, and subscriptions, making most of the queries fail because those data sources were missing.
So my question is:
Is there a way to integrate/use EVERYTHING from my existing AppSync API within my React Native application? This includes the custom resolvers and mapping templates.
IF NOT
Is there a way to set a primary key and sort key for the DynamoDB tables of the models when creating an API directly within the Amplify CLI?
IF NOT
Is there another way to have data sorted efficiently within DynamoDB without using LSIs and GSIs? If the models automatically generate tables with GSIs, that can be problematic because GSIs are a bit more expensive, so I would like to avoid them as much as possible. Is there another service that can sort DynamoDB data for use within AppSync?
Any help will be appreciated, thank you.
Is there a way to integrate/use EVERYTHING from my existing AppSync API within my React Native application?
Yes, you can deploy your API however you want and then consume it using the Amplify client tools. You can always use the Amplify CLI's codegen features without deploying the API via the amplify api category. You can use custom stacks to define custom resolvers that are not generated by @model, but this will deploy a new API with the same structure as your existing API.
Is there a way to set a primary key and sort key for the DynamoDB tables of the models when creating an API directly within the Amplify CLI?
Soon. There is an RFC here https://github.com/aws-amplify/amplify-cli/issues/1062 and implementation is in PR here https://github.com/aws-amplify/amplify-cli/pull/1463.
Is there another way to have data sorted efficiently within DynamoDB without using LSIs and GSIs?
No, but you can use index overloading to make one index store and sort multiple different conceptual object types. TBH this is a complex subject, but here is a good place to start: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-gsi-overloading.html
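A hedged sketch of what index overloading looks like at the item level - the table name and the PK/SK/GSI1PK/GSI1SK attribute names and key patterns are assumptions, not anything Amplify generates. Two different conceptual types write to the same index attributes, so one GSI serves several query patterns:

// Two entity types (a video and a like) share the same GSI attributes with different key patterns.
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand } from "@aws-sdk/lib-dynamodb";

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const TABLE_NAME = "AppTable";

async function seedExamples(): Promise<void> {
  // A video item, queryable by uploader through GSI1, sorted by upload date.
  await client.send(new PutCommand({
    TableName: TABLE_NAME,
    Item: {
      PK: "VIDEO#123", SK: "METADATA",
      GSI1PK: "USER#42", GSI1SK: "VIDEO#2023-05-01",
    },
  }));

  // A like item reuses the same index with a different key pattern (likes by user).
  await client.send(new PutCommand({
    TableName: TABLE_NAME,
    Item: {
      PK: "VIDEO#123", SK: "LIKE#USER#42",
      GSI1PK: "USER#42", GSI1SK: "LIKE#VIDEO#123",
    },
  }));
}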

How to move data from one DynamoDB table to another DynamoDB table in the same region on click of an HTML button

I am very new to AWS. As a first step, I am creating an eCommerce application out of personal interest, to demo it to my colleagues.
I am implementing the 'Order' part. For this, I am thinking of moving data from one table to another. That is, once the user adds a product to the cart it is saved in a Cart table in DynamoDB, and when the user clicks the 'Order' button/link on the cart screen, the same data should be moved from the Cart table to an Order table and the cart should be emptied, so the order can be confirmed.
How could I implement this? I'm not sure the method I'm thinking of is right, or whether there is another way to accomplish the Order functionality.
The answer to this is really going to depend on your architecture and stack - and even within that you have lots of options.
In a serverless way, i.e. from a static HTML page with no server-side backend, you could create a Lambda function in the supported language of your choice, with the proper IAM role, to move the data from one table to the other. Your HTML page could call it via an API call, and I would suggest you use AWS API Gateway to expose an API endpoint that then calls the Lambda function.
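A hedged sketch of that Lambda, using the AWS SDK for JavaScript v3 and assumed table/key names (Cart, Order, userId, productId) - not a drop-in implementation:

// Copy a user's cart items into an Order table, then clear the cart.
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand, PutCommand, DeleteCommand } from "@aws-sdk/lib-dynamodb";

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));

export const handler = async (event: { userId: string }) => {
  // Read every cart item for this user.
  const cart = await client.send(new QueryCommand({
    TableName: "Cart",
    KeyConditionExpression: "userId = :u",
    ExpressionAttributeValues: { ":u": event.userId },
  }));

  // Copy each item into the Order table, then remove it from the cart.
  for (const item of cart.Items ?? []) {
    await client.send(new PutCommand({
      TableName: "Order",
      Item: { ...item, orderedAt: new Date().toISOString() },
    }));
    await client.send(new DeleteCommand({
      TableName: "Cart",
      Key: { userId: item.userId, productId: item.productId },
    }));
  }

  return { statusCode: 200, body: JSON.stringify({ moved: cart.Items?.length ?? 0 }) };
};

For anything beyond a demo you'd probably want to use a DynamoDB transaction or batched writes so a partial failure doesn't leave the cart half-moved.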
If, on (one of the many) other hands, you were using ASP.NET or PHP on the server side, you could use the AWS SDK to talk to DynamoDB directly and accomplish the same thing.
Besides these two options there are many, many alternatives and variations - and with all of the options you are also going to need to deal with authentication/security to make sure no one can make calls to your database/service that they aren't permitted to - perhaps not important for your demo application, but will definitely be an issue if/when you go live.