AWS Amplify: Search and result aggregations with the @searchable directive

This is a usage question; there is no bug right now, but I am unclear about the @searchable directive.
As you may know, AppSync is fine for simple filtering, but searching data with advanced operations such as sum, average, min, max, regex matching, or total result counts is not possible.
Question 1: does adding the @searchable directive automatically launch an OpenSearch domain (backed by EC2 instances) and start billing even when idle, without any query being invoked?
Question 2: for simple queries, is it possible to avoid OpenSearch? How do I tell AWS Amplify to use AppSync/DynamoDB filtering instead (without removing @searchable)?
Question 3: do you think custom business logic in a Lambda function is a better fit than OpenSearch for advanced filters in small-budget projects?
Hope I was clear.
Thanks
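For context on question 2, my understanding is that @searchable is additive: the @model-generated list*/get* queries keep resolving against DynamoDB, and only the generated search* queries hit OpenSearch. A minimal sketch with the Amplify JS client, assuming a hypothetical Todo model and its generated listTodos/searchTodos queries:

```typescript
// Sketch only — "Todo", listTodos, and searchTodos are hypothetical names that
// `amplify codegen` would generate for a type annotated with @model @searchable.
import { API, graphqlOperation } from 'aws-amplify';
import { listTodos, searchTodos } from './graphql/queries';

export async function compareQueries() {
  // Simple filtering stays on DynamoDB: the list* query comes from @model and
  // never touches the OpenSearch domain.
  const fromDynamo = await API.graphql(
    graphqlOperation(listTodos, {
      filter: { title: { contains: 'groceries' } },
      limit: 20,
    })
  );

  // Advanced matching (and the total hit count) goes through OpenSearch: the
  // search* query is what @searchable adds.
  const fromOpenSearch = await API.graphql(
    graphqlOperation(searchTodos, {
      filter: { title: { matchPhrasePrefix: 'groc' } },
    })
  );

  return { fromDynamo, fromOpenSearch };
}
```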

Related

Single table db architecture with AWS Amplify

By default, AWS Amplify's transformers create a table for each GraphQL type.
But according to the DynamoDB documentation, it's best practice to:
Keep the number of tables as small as possible
Keep entries that are often queried together in the same table
I have the impression that the Amplify way of doing things contradicts the guidance above.
I am new to both NoSQL and Amplify.
Can someone suggest ways to address these issues?
I think we're in a bit of a transition or gray area here. I'm very new to Amplify and have been investigating moving to a single-table design, as there are sources (below) indicating that it has always been possible, but you'd have to write everything in VTL templates. However, in 2020 they released direct Lambda resolver support: https://youtu.be/EOQqi6Yun7g?t=960 (clip)
However, it seems like you lose access to the @auth directive (and probably others, because you're no longer going to use @model) along with a lot of the nice out-of-the-box functionality that's available with Amplify's multi-table approach.
At this point, since I'm developing a new app, I'm going to stick with the default multi-table design to hasten the process of getting the app functional.
Trying to implement the single-table design seems to go against what the Amplify team recommends and requires more manual work. You'd have to manually create custom Lambda functions for AppSync and code the DynamoDB queries for each data access element, and manage authorization through some other means that I'm not aware of at this time. Maybe someone can chime in here...
Single table vs multi table info
Using Amplify with single table:
https://youtu.be/EOQqi6Yun7g
Single vs Multi Clip:
https://youtu.be/1WF_wped808?t=1251 (clip)
https://www.alexdebrie.com/posts/dynamodb-single-table/ (towards bottom)
https://youtu.be/EOQqi6Yun7g?t=1288 (clip)
Example single table design by Alex Debrie:
https://gist.github.com/dabit3/96dc51e688b18a7d40fc534331758c56
More Discussion:
https://stackoverflow.com/a/56438716/1956540
Basic Setup steps
I set up a single table by following the instructions below. Again, you don't use @model for this. Also, I think you have to include a type Query {} in your schema for it to compile, but I could be wrong here.
So the basic steps are:
Create a single table (amplify add storage)
amplify push
Create your schema in the schema.graphql file.
Create a supporting Lambda function (amplify add function)
Note: if you look at the example here, I believe you can create an entry point that routes to all other methods: https://gist.github.com/dabit3/96dc51e688b18a7d40fc534331758c56#lambda
Add the DynamoDB query code in the function (a rough sketch follows after the setup links below).
amplify push
Complete steps for Setting up a single Table:
https://catalog.us-east-1.prod.workshops.aws/workshops/53b10bf8-2271-4ab4-bfd2-39e878a90dc8/en-US/lab2/1-vtl (both "Connecting to an existing DynamoDB table" and "Direct Lambda Resolver" steps)
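For step 5, a rough sketch of what the Lambda's DynamoDB code might look like (the table name, key attribute names, and event routing are assumptions modeled loosely on the gist above, not Amplify-generated code):

```typescript
// Sketch of a direct Lambda resolver over a single table named "SingleTable"
// with generic PK/SK attributes (all names here are assumptions).
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, QueryCommand, PutCommand } from '@aws-sdk/lib-dynamodb';

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const TABLE = process.env.TABLE_NAME ?? 'SingleTable';

export const handler = async (event: any) => {
  // AppSync direct Lambda resolvers pass typeName/fieldName/arguments;
  // route on the field being resolved (entry-point style, as in the gist).
  switch (event.fieldName) {
    case 'postsByUser': {
      // All POST items for one user: PK = USER#<id>, SK begins_with POST#.
      const { Items } = await ddb.send(
        new QueryCommand({
          TableName: TABLE,
          KeyConditionExpression: 'PK = :pk AND begins_with(SK, :sk)',
          ExpressionAttributeValues: {
            ':pk': `USER#${event.arguments.userId}`,
            ':sk': 'POST#',
          },
        })
      );
      return Items;
    }
    case 'createPost': {
      const item = {
        PK: `USER#${event.arguments.userId}`,
        SK: `POST#${event.arguments.id}`,
        ...event.arguments,
      };
      await ddb.send(new PutCommand({ TableName: TABLE, Item: item }));
      return item;
    }
    default:
      throw new Error(`Unhandled field: ${event.fieldName}`);
  }
};
```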
Not trying to be negative about Amplify; it's awesome, and I love what they are doing with this product. I just think it's very new to everyone, and I'm hoping this post is no longer valid next year and we continue to see great progress from the team.

The correct way to remove or update an Item

I am building a recommendation system for a classified ads website; ads are added and deleted daily.
My idea is to use the PutItems API to add new ads with a field called status = 0. If a user deletes an ad, I will call the same PutItems API with the same ITEM_ID to update the stored item, and use a filter to select only ads with status = 0 when generating recommendations.
Is that correct? Will the PutItems API update the existing ad? And is there any way to delete the item?
Currently there is no way to remove items that were already added to Datasets.
Your workaround looks good; however, from my experience working with Personalize, the filter might decrease your recommendation quality.
To understand why, this is, more or less, the algorithm Personalize uses for filtering recommendations:
Get recommended items for the user
Filter the recommendations using the filter expression
Return the first N recommended items left after filtering
Because the filtering is done after getting the recommendations, Personalize will simply fill the recommendation list with items that were further down the recommended list.
And there is a problem with that approach: items lower on the list have a lower "score" value, which indicates the accuracy of the recommendation. That's why you will end up with generally worse recommendations, though it will depend on how many ads with status = 0 were recommended before being filtered out.
To check your recommendation scores, simply get recommendations in the Personalize web UI. It will return a list of recommendations with their scores.
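For reference, the filter and a request that applies it might look roughly like this (the ARNs are placeholders, and Items.STATUS follows the convention above where status = 0 marks an ad to be filtered out):

```typescript
// Sketch — ARNs and the STATUS field name are placeholders, not real resources.
import { PersonalizeClient, CreateFilterCommand } from '@aws-sdk/client-personalize';
import {
  PersonalizeRuntimeClient,
  GetRecommendationsCommand,
} from '@aws-sdk/client-personalize-runtime';

const personalize = new PersonalizeClient({});
const runtime = new PersonalizeRuntimeClient({});

// One-time setup: a filter that drops unavailable ads.
export const createStatusFilter = () =>
  personalize.send(
    new CreateFilterCommand({
      name: 'exclude-unavailable-ads',
      datasetGroupArn: 'arn:aws:personalize:eu-west-1:111111111111:dataset-group/ads', // placeholder
      filterExpression: 'EXCLUDE ItemID WHERE Items.STATUS IN ("0")',
    })
  );

// At request time: items are recommended first, *then* filtered, which is why
// heavy filtering pushes lower-score items into the returned list.
export async function recommendForUser(userId: string) {
  const { itemList } = await runtime.send(
    new GetRecommendationsCommand({
      campaignArn: 'arn:aws:personalize:eu-west-1:111111111111:campaign/ads', // placeholder
      userId,
      filterArn: 'arn:aws:personalize:eu-west-1:111111111111:filter/exclude-unavailable-ads', // placeholder
      numResults: 25,
    })
  );
  return itemList?.map((i) => ({ itemId: i.itemId, score: i.score })) ?? [];
}
```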
Better approach
If your ads are updated daily, then you can definitely work around it with the following steps:
Create a Lambda function that is triggered every 24 hours
The Lambda fetches all of the ads and puts them into an S3 bucket as a CSV file. It should exclude ads that are no longer available (status = 0)
Call the CreateDatasetImportJob API using any AWS SDK of your choice and point it at the data stored in the S3 bucket (see the sketch after these steps)
Personalize will start the import job. When it finishes, all of the items are replaced with the newest dump
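A rough sketch of step 3 (the ARNs, bucket path, and IAM role are placeholders):

```typescript
// Sketch: kick off a full items re-import from the CSV the Lambda just wrote.
import {
  PersonalizeClient,
  CreateDatasetImportJobCommand,
} from '@aws-sdk/client-personalize';

const personalize = new PersonalizeClient({});

export const handler = async () => {
  // ...fetch the ads, drop the unavailable ones, write items.csv to S3 here...

  await personalize.send(
    new CreateDatasetImportJobCommand({
      jobName: `ads-items-${Date.now()}`,
      datasetArn: 'arn:aws:personalize:eu-west-1:111111111111:dataset/ads/ITEMS', // placeholder
      dataSource: { dataLocation: 's3://my-ads-bucket/items.csv' },               // placeholder
      roleArn: 'arn:aws:iam::111111111111:role/PersonalizeS3Access',              // placeholder
    })
  );
};
```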
However, it has some downsides.
If you are not using the User-Personalization (aws-user-personalization) recipe, then after each items import you need to update your solution by creating a new solution version. Otherwise it won't include the changes made by the items dataset import job.
Creating a new solution version is quite slow and expensive, which is why I would recommend the User-Personalization recipe if you want to use this approach; and since the HRNN recipes are marked as legacy, it's a good idea to migrate anyway.
If you are using the User-Personalization recipe, then according to the AWS documentation:
Amazon Personalize automatically updates your latest solution version every two hours to include new data. Your campaign automatically uses the updated solution version. For more information see Automatic Updates.
So pretty much all of the work is done on the Personalize side and you don't have to worry about retraining the solution after each items import job.
And the last problem...
Since the documentation for the User-Personalization recipe states that your solution is updated within two hours, you might end up recommending items that are no longer available for a short period of time. If you are updating items daily, that might be a significant problem.
To cover that case, I would recommend also using the filter approach that you mentioned. That way you get the benefits of both approaches, and your recommendations are always valid.

DynamoDB - how to query by something that is not the primary key

So, I have a table in DynamoDB with this structure:
- userId as the primary key (it's a UUID)
- email
- hashedPassword
I want to, as someone is signing up, find out if there's already someone using that email.
This should be easy but, as far as I know, you can't query DynamoDB unless you are using the primary key or the sort key as a parameter (and I'm not sure it would make sense to make email a sort key).
The other way I found was using a Global Secondary Index, which is pretty much an index table you create using another field as the primary key, sort of, but this is billable, and since I'm still developing and testing I did not want to incur expenses.
Does anyone have another option? Or am I wrong and there's another way to do it?
Like other answers, I also think that GSI is the best option here.
But I would like to add that, since the search capabilities of DynamoDB are very limited, it is not uncommon to pair DynamoDB with something else for that very purpose. One such use case is described on the AWS blog:
Indexing Amazon DynamoDB Content with Amazon Elasticsearch Service Using AWS Lambda
The main querying capabilities of DynamoDB are centered around lookups using a primary key. However, there are certain times where richer querying capabilities are required. Indexing the content of your DynamoDB tables with a search engine such as Elasticsearch would allow for full-text search.
Obviously, I don't recommend using ES over GSI in your scenario. But it is worth knowing that DynamoDB can be, and is often, used with other services to extend its search capabilities.
Even if you put email as the sort key alongside userId as the primary key, you can't query using only email (unless it is a scan operation). You don't want to use a scan to see whether an email exists in your table; that's like checking each value by iterating over the whole table.
I think your best option is a global secondary index. Another option would be creating a new table that only includes email values, but in that case you have to write to and maintain multiple tables, which is unnecessary.
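As a concrete sketch, assuming a GSI named email-index with email as its partition key, the sign-up check could look like this:

```typescript
// Sketch: check whether an email is already taken by querying a GSI.
// The table name ("users") and index name ("email-index") are assumptions.
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, QueryCommand } from '@aws-sdk/lib-dynamodb';

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

export async function emailExists(email: string): Promise<boolean> {
  const { Count } = await ddb.send(
    new QueryCommand({
      TableName: 'users',
      IndexName: 'email-index',
      KeyConditionExpression: '#email = :email',
      ExpressionAttributeNames: { '#email': 'email' },
      ExpressionAttributeValues: { ':email': email },
      Limit: 1,
    })
  );
  return (Count ?? 0) > 0;
}
```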
The other way I found was using a Global Secondary Index, which is pretty much an index table you create using another field as the primary key, sort of, but this is billable, and since I'm still developing and testing I did not want to incur expenses.
As @Ersoy has said, a GSI is the legit solution, even though it will increase the consumed write units.
DynamoDB is cheap for a low-traffic app and/or a test environment, but to keep these expenses flat, you can:
Use DynamoDB Local during local development/tests and CI builds
Choose provisioned capacity mode for your table (you may find its free tier interesting)

How to structure AWS DynamoDB Table with Cognito

I am trying to do something that would be relatively simple for a relational database but I don't know how to do it for a nonrelational one.
I am trying to make a simple task web app on AWS where people can post their tasks.
I have a table called tasks which uses the userId from the auth token provisioned by AWS Cognito. I am wondering how I can return the user information. I do not want to rely on Cognito by simply calling it every time a user sends a request. So, my thought would be to create another table to store all of the user information. That, however, is not a very nonrelational way of doing things, since JOINs are so bad.
So, I was wondering if I should do any of the following
a) Using RDS instead
b) Not use Cognito and set up my own Auth system
c) Just doing the JOIN with a table containing all of the user info
d) Doing the request to Cognito each time
Although I personally like the idea of Cognito, at this time it has some major drawbacks...
You cannot back up / restore a user pool without losing the users' passwords; you have to implement your own backup/restore.
A workaround is to save the user password in a Cognito custom attribute.
I expected that by using an API Gateway / Lambda authorizer I would have all the user data in the Lambda context, but it's not there. Or am I doing something wrong with the API Gateway template mapping 😬
On the plus side, the API Gateway / Lambda authorizer result can be cached for up to an hour, so it won't call the authorizer function again, which seems like a top feature.
It does not work well with CloudFormation: with every attribute update it recreates the user pool without restoring the users, thus losing them.
I used it in only one implementation and ended up duplicating the users in DynamoDB as well.
I've been avoiding it ever since. I wish they would solve these issues, as it looks like a service that could be included with every project, saving a lot of time.
Reading your post, I asked myself the same questions and I'm not sure of the answer either 😄
Pricing seems fair.
The default 5 requests/second to get user info seems strange, as it would be consumed by one page load making multiple AJAX API requests.
For this in DynamoDB, there is no need for another table. If the access patterns dictate that you store the information in another object, then so be it, but more than likely it should be in the same table. It sounds like you need two different item types in the same table.
For the task items, use a PK of userId and an SK of task::your-task-id. This would allow you to get all of a user's tasks easily, or even a specific task very easily if you know the task ID. You might also have a timestamp attribute and a GSI with userId as the PK and the timestamp as the SK; then you could use the begins_with operator on the SK and paginate through all of the user's tasks in the month of 2019-04.
For the user information, have userId as the PK, user_info as the SK, and attributes holding the user's details.
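A quick sketch of those access patterns (the table name, GSI name, and key formats below are just illustrative):

```typescript
// Sketch: one table, two item types. Task items: PK=<userId>, SK=task::<taskId>.
// User info item: PK=<userId>, SK=user_info. GSI "byUserByTime": PK=userId, SK=timestamp.
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, QueryCommand, GetCommand } from '@aws-sdk/lib-dynamodb';

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const TABLE = 'app-table'; // assumption

// All tasks for a user (base table: SK begins_with "task::").
export const tasksForUser = (userId: string) =>
  ddb.send(
    new QueryCommand({
      TableName: TABLE,
      KeyConditionExpression: 'PK = :u AND begins_with(SK, :t)',
      ExpressionAttributeValues: { ':u': userId, ':t': 'task::' },
    })
  );

// The user's tasks for April 2019 (GSI: timestamp sort key, begins_with on the month).
export const tasksForMonth = (userId: string, month = '2019-04') =>
  ddb.send(
    new QueryCommand({
      TableName: TABLE,
      IndexName: 'byUserByTime', // assumption
      KeyConditionExpression: 'userId = :u AND begins_with(#ts, :m)',
      ExpressionAttributeNames: { '#ts': 'timestamp' },
      ExpressionAttributeValues: { ':u': userId, ':m': month },
    })
  );

// The user's profile item.
export const userInfo = (userId: string) =>
  ddb.send(new GetCommand({ TableName: TABLE, Key: { PK: userId, SK: 'user_info' } }));
```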
The one challenge here is if you go to extremes and one single user is doing thousands of operations per second, e.g. "all tweets by a very popular celebrity". If you have such a use case there are ways around that as well, e.g. write sharding. These are just examples for you to play with; without knowing all your access patterns, I cannot model everything you might want to do. I highly recommend you go watch this presentation from re:Invent 2018.

AWS AppSync: multiple DynamoDB requests in one DynamoDB resolver

I would like to know whether it is possible to make multiple DynamoDB requests using only one DynamoDB resolver in AppSync.
Or is the only/best way to do more complicated processing to use a Lambda function?
Practically, no. You cannot even query multiple indexes in a single resolver definition for a query.
However, if you want that structure in order to join multiple DynamoDB tables, you can attach resolvers not to the query entry but to the field you want to relate to other fields.
I had a similar issue relating users to another table containing their posts, and I got past it by attaching a resolver to the Posts field of the User type.
This issue refers to a similar problem and is quite helpful for that kind of case: https://github.com/awslabs/aws-mobile-appsync-sdk-js/issues/17
If that is not your case, you can elaborate on the question; I may just be guessing at your purpose for relating tables, all in all.
Have you looked at batch resolvers with AWS AppSync? https://docs.aws.amazon.com/appsync/latest/devguide/tutorial-dynamodb-batch.html
This will allow you to write to one or more tables in a single request, and also allow you to do multiple write/read/delete operations in a single request.
You can do it with pipeline resolvers
https://docs.aws.amazon.com/appsync/latest/devguide/tutorial-pipeline-resolvers.html
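If you do go the Lambda route mentioned in the question, a rough sketch of a direct Lambda resolver doing more than one DynamoDB operation in a single invocation (the table names and event shape are made up):

```typescript
// Sketch: a direct Lambda resolver that performs two DynamoDB operations
// for a single GraphQL field, as an alternative to batch/pipeline resolvers.
// Table names ("users", "posts") and the field name are assumptions.
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, GetCommand, QueryCommand } from '@aws-sdk/lib-dynamodb';

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Resolves e.g. Query.userWithPosts(userId: ID!) in one invocation.
export const handler = async (event: { arguments: { userId: string } }) => {
  const { userId } = event.arguments;

  // Run both requests concurrently; a VTL unit resolver can only issue one.
  const [user, posts] = await Promise.all([
    ddb.send(new GetCommand({ TableName: 'users', Key: { userId } })),
    ddb.send(
      new QueryCommand({
        TableName: 'posts',
        KeyConditionExpression: 'userId = :u',
        ExpressionAttributeValues: { ':u': userId },
      })
    ),
  ]);

  return { ...user.Item, posts: posts.Items ?? [] };
};
```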