Getting recommendations for new users with AWS Personalize

I've used AWS Personalize to create a campaign that successfully produces recommendations for the users/events/items that I have uploaded.
I now want to produce recommendations for new users, ones not in the initial dataset. I thought the way to do this was to create an event stream, post their initial interactions, and then somehow this would get blended into the campaign, but I get the same recommendations back regardless of what I seed the new user with.
What's the correct way of achieving this?

From what I understand from https://docs.aws.amazon.com/cli/latest/reference/personalize/create-solution-version.html and https://aws.amazon.com/personalize/pricing/, you have to retrain your model to incorporate new data with a create-solution-version call. After the new version is created for an active solution, the campaign should start returning updated results.
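For illustration, a minimal boto3 sketch of kicking off that retraining (the solution ARN is a placeholder; depending on your setup, you may also need UpdateCampaign to point the campaign at the new version):

```python
import boto3

personalize = boto3.client("personalize")

# Retrain the solution so it incorporates the newly streamed interactions.
resp = personalize.create_solution_version(
    solutionArn="arn:aws:personalize:us-east-1:123456789012:solution/my-solution",  # placeholder
    trainingMode="FULL",  # "UPDATE" is available for recipes that support incremental training
)
print(resp["solutionVersionArn"])
```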
Please let me know if this resolves your problem.

You can get default recommendations for any new user (USER_ID) as long as you don't have any filter created on user schema attributes. If you have created a filter on an attribute like gender, you will have to first call the PutUsers API for the new user to insert that record into the Personalize users dataset. Once that put request has been executed, during the registration process or so, your user will start getting recommendations. As that user interacts more, their recommendations will change in real time.
https://docs.aws.amazon.com/personalize/latest/dg/importing-users.html
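A minimal sketch of that PutUsers call with boto3 (the dataset ARN is a placeholder, and the properties fields must match your Users schema):

```python
import json
import boto3

personalize_events = boto3.client("personalize-events")

# Insert the new user so that user-metadata filters can match them.
personalize_events.put_users(
    datasetArn="arn:aws:personalize:us-east-1:123456789012:dataset/my-group/USERS",  # placeholder
    users=[{
        "userId": "new-user-123",
        "properties": json.dumps({"gender": "F"}),  # must match the Users schema fields
    }],
)
```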

Related

AWS Personalize - Recommender retraining questions

I'm a new user of AWS Personalize, so I have a few questions about recommender retraining below.
Currently, I focus on the E-Commerce dataset group and use the e-commerce use-case recommender. If I use this, I can't create a campaign, right?
If I understand correctly, there is no need to retrain the model, right? (If I use the recommender above.) I read in many docs that there is only a retraining process when we use a custom resource and create a campaign, right?
So, when I incrementally add new event data, the recommender will apply the new data directly to recommendations, right? If yes, that means we don't need to focus on the retraining process for the e-commerce use case, right? (Following these docs.)
That's all for my questions.
Currently, I focus on the E-Commerce dataset group and use the e-commerce use-case recommender. If I use this, I can't create a campaign, right?
The recommenders for domain dataset groups automatically manage the inference endpoint for you, so the step of creating a campaign is not necessary. The service handles this.
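For example, a sketch of calling a domain recommender directly at inference time (the recommender ARN and IDs are placeholders):

```python
import boto3

personalize_runtime = boto3.client("personalize-runtime")

# With a domain recommender you pass recommenderArn instead of a campaign ARN.
resp = personalize_runtime.get_recommendations(
    recommenderArn="arn:aws:personalize:us-east-1:123456789012:recommender/recommended-for-you",  # placeholder
    userId="user-123",
    numResults=10,
)
```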
If I understand correctly, there is no need to retrain the model, right? (If I use the recommender above.) I read in many docs that there is only a retraining process when we use a custom resource and create a campaign, right?
Correct. Training and retraining are managed by the service for domain recommenders.
So, when I incrementally add new event data, the recommender will apply the new data directly to recommendations, right? If yes, that means we don't need to focus on the retraining process for the e-commerce use case, right?
You can send in new event data two ways. First, an event tracker can be used to incrementally stream in new events. In this case, Personalize will use new events to adjust recommendations in near-real-time to match the user's evolving intent (retraining is not necessary for this). Personalize will also persist those new events in the incremental interactions dataset so they are included in the next retraining.
The other way you can send in new event data is with a bulk import of the interactions dataset. Since bulk imports replace the previous bulk import, your bulk files need to include all interaction history you want to train on and not just new interactions. Bulk imports of the interactions dataset are included in the next retraining.
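A minimal sketch of streaming one event with boto3 (the tracking ID and user/item IDs are placeholders from a hypothetical event tracker):

```python
from datetime import datetime, timezone
import boto3

personalize_events = boto3.client("personalize-events")

# Stream one interaction; Personalize uses it for near-real-time adjustment
# and persists it for the next retraining.
personalize_events.put_events(
    trackingId="11111111-2222-3333-4444-555555555555",  # from your event tracker
    userId="user-123",
    sessionId="session-456",
    eventList=[{
        "eventType": "Purchase",
        "itemId": "item-789",
        "sentAt": datetime.now(timezone.utc),
    }],
)
```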

Paginate GetRecommendations API of AWS Personalize

I'm currently trying to create a recommendation page which incorporates infinite-scrolling pagination (something like Instagram Explore) but couldn't find a way to do it with the AWS SDK out of the box.
Is there any other way to fetch the data from AWS Personalize in a paginated way while ensuring the retrieved data is not duplicated?
The GetRecommendations API for Personalize currently does not support pagination, so the only way to use this API directly would be to fetch up to 500 items from the client in one call and then progressively reveal recommended items as the user scrolls. Alternatively, you could create an intermediate public API endpoint, using something like API Gateway and Lambda, that supports pagination and lazily fetches and loads recommendations for a user into a datastore like Redis; a Redis LRANGE or ZRANGE could then be used to paginate.
The advantage of retrieving the max number of recommendations and then paginating over them is that they represent a snapshot of recommendations at a moment in time. Since Personalize will potentially adjust recommendations based on new interactions streamed into the service, recommendations could change from one call to GetRecommendations to the next. This could create a user experience where the same item appears to be duplicated because it drops down in relevancy between calls and shows up in multiple "pages" of results.
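A rough sketch of that snapshot-and-paginate pattern (the campaign ARN, cache key format, and TTL are assumptions):

```python
import boto3
import redis

personalize_runtime = boto3.client("personalize-runtime")
cache = redis.Redis()

CAMPAIGN_ARN = "arn:aws:personalize:us-east-1:123456789012:campaign/my-campaign"  # placeholder

def snapshot_recommendations(user_id):
    """Fetch up to 500 recommendations once and cache them as a Redis list."""
    resp = personalize_runtime.get_recommendations(
        campaignArn=CAMPAIGN_ARN, userId=user_id, numResults=500
    )
    key = f"recs:{user_id}"
    cache.delete(key)
    cache.rpush(key, *[item["itemId"] for item in resp["itemList"]])
    cache.expire(key, 3600)  # refresh the snapshot after an hour

def get_page(user_id, page, page_size=20):
    """Serve one page from the frozen snapshot, so items never repeat across pages."""
    start = page * page_size
    items = cache.lrange(f"recs:{user_id}", start, start + page_size - 1)
    return [i.decode() for i in items]
```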

What's the best practice to implement "read receipts" on group chats in AWS AppSync and Amplify?

I'm building an Angular 11 web app using AppSync for the backend.
I've mentioned group chat, but basically I have an announcement feature in my app where a person creates announcements for a specific audience (individual members or groups of members). Whenever a receiving user opens the announcement, it has to be marked as read for that user in their UI, and the sender must also be notified that it has been opened by that particular member.
I have an idea for implementing this:
Each announcement needs to have a "seenBy" attribute which aggregates the user IDs of the ones who open it.
Each member also has an attribute in their user object named "announcementsRead", which is an array of IDs of the announcements that they have opened.
In the UI, when I'm gathering the list of announcements for the user, the ones whose IDs don't appear in the member's own announcementsRead array will be marked as unread.
When they click on it and it is opened, I make two updates: (a) to the announcement object, I push the member's user ID into the "seenBy" attribute and write it to the DB; (b) to the member's user object, I add the announcement's ID to the "announcementsRead" attribute and write it to the DB.
This is just something that I came up with.
Please let me know if there are any pitfalls to this approach. Or if there are simpler ways to achieve this functionality.
I have a few concerns as well:
Let's say two users open an announcement at the same time, and both clients try to update the announcement with an updated seenBy containing their user's ID. What happens when the two requests from different clients happen concurrently? It's possible that the first user fetches the object, the second user fetches it immediately after, and by the time the second user has updated the attribute and written it back to the DB, the first user has already written their updated data. In that case the second user's write to the DB will overwrite the first user's change. I am not sure of the internal mechanisms of the Amplify DataStore, but I can imagine this happening. Is this possible? If so, how do we ensure that it is prevented?
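For illustration only, one common way to sidestep this lost-update race in plain DynamoDB is to let the database apply the change atomically instead of doing a read-modify-write (the table and attribute names here are assumptions based on the question; Amplify DataStore has its own conflict-resolution strategies):

```python
import boto3

dynamodb = boto3.client("dynamodb")

def mark_seen(announcement_id, user_id):
    # ADD on a string set is atomic: concurrent callers each append their own
    # user ID, so neither write overwrites the other.
    dynamodb.update_item(
        TableName="Announcements",  # hypothetical table name
        Key={"id": {"S": announcement_id}},
        UpdateExpression="ADD seenBy :u",
        ExpressionAttributeValues={":u": {"SS": [user_id]}},
    )
```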
Is it really necessary for me to maintain the "announcementsRead" attribute on the user? I can imagine generating that list in the UI every time I get the list of announcements, by checking whether the current user's ID exists in each announcement's "seenBy" and maintaining that list in the UI. That way we eliminate redundant info in the DB, and it would also make sense not to accumulate IDs of extremely old announcements that may have been deleted. But I'm wondering if having this on the member actually helps in an indispensable way.
Hope my questions are clear.

The correct way to remove or update an item

I am building a recommendation system for a classified-ads website, where ads are added and deleted daily.
What I thought of is to use PutItems to add new ads with a field called status = 0. If a user deletes the ad, I will use the same PutItems API with the same ITEM_ID to update the stored item, and use a filter to select only ads with status = 0 when generating recommendations.
Is that correct? Will the PutItems API update the existing ad? And is there any way to delete the item?
Currently there is no way to remove items that were already added to datasets.
Your workaround looks good; however, from my experience working with Personalize, the filter might decrease your recommendation quality.
To understand why, this is more or less the algorithm that Personalize uses for filtering recommendations:
Get recommended items for the user
Filter recommendations using the filter expression
Return the first N recommended items left after filtering
Because the filtering is done after getting recommendations, Personalize will simply fill the recommendations list with items that were somewhere further down the recommended list.
And there is a problem with that approach: items lower on the list have a lower "score" value, which indicates the accuracy of the recommendation. That's why you will end up with generally worse recommendations, though it will depend on how many ads with status = 0 were recommended before being filtered out.
To check your recommendation scores, simply get recommendations in the Personalize web UI; it will return a list of recommendations with their scores.
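For reference, a sketch of creating such a status filter with boto3 (the dataset group ARN and the STATUS field name are assumptions; the field must exist in your Items schema, and the value must follow your own convention):

```python
import boto3

personalize = boto3.client("personalize")

# Filter that keeps only ads whose STATUS field marks them as available.
personalize.create_filter(
    name="only-available-ads",
    datasetGroupArn="arn:aws:personalize:us-east-1:123456789012:dataset-group/ads",  # placeholder
    filterExpression='INCLUDE ItemID WHERE Items.STATUS IN ("0")',  # value per your convention
)
```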
Better approach
If your ads are updated daily, then you can definitely work around it with the following steps (a sketch follows the list):
Create a Lambda function that is triggered every 24 hours
The Lambda fetches all of the ads and puts them into an S3 bucket as a CSV file, excluding ads that are no longer available (status = 0)
Call the CreateDatasetImportJob API using any AWS SDK of your choice, pointing at the data stored in the S3 bucket
Personalize will start the import job; when it finishes, all of the items are replaced with the newest dump
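A minimal sketch of that Lambda (the bucket, ARNs, CSV columns, and the fetch_available_ads() helper are all placeholders):

```python
import csv
import io
import boto3

s3 = boto3.client("s3")
personalize = boto3.client("personalize")

BUCKET = "my-ads-bucket"  # placeholder
DATASET_ARN = "arn:aws:personalize:us-east-1:123456789012:dataset/ads/ITEMS"  # placeholder
ROLE_ARN = "arn:aws:iam::123456789012:role/PersonalizeS3Access"  # placeholder

def handler(event, context):
    # 1. Dump all currently available ads to a CSV matching the Items schema.
    #    fetch_available_ads() is a hypothetical helper that queries your ads DB.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["ITEM_ID", "STATUS"])
    writer.writeheader()
    writer.writerows(fetch_available_ads())
    s3.put_object(Bucket=BUCKET, Key="items/items.csv", Body=buf.getvalue())

    # 2. Kick off a full replacement import of the items dataset.
    personalize.create_dataset_import_job(
        jobName=f"ads-items-{context.aws_request_id}",
        datasetArn=DATASET_ARN,
        dataSource={"dataLocation": f"s3://{BUCKET}/items/items.csv"},
        roleArn=ROLE_ARN,
    )
```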
However it has some downsides.
If you are not using the User-Personalization (aws-user-personalization) recipe, then after each import of items you need to update your solution by creating a new solution version. Otherwise it won't include the changes made by the items dataset import job.
Creating a new solution version is quite slow and expensive, which is why I would recommend using the User-Personalization recipe if you want to take this approach; and since the HRNN recipes are marked as legacy, it's a good idea to migrate anyway.
If you are using the User-Personalization recipe, then according to the AWS documentation:
Amazon Personalize automatically updates your latest solution version every two hours to include new data. Your campaign automatically uses the updated solution version. For more information see Automatic Updates.
So pretty much all of the work is done on Personalize side and you don't have to worry about Solution retraining after each Items import job.
And the last problem...
Since the documentation for the User-Personalization recipe says your solution will be updated within two hours, you might end up recommending items that are no longer available for some short period of time. If you are updating items daily, that might be a significant problem.
To handle that case, I would recommend also using the filter approach that you mentioned. That way you get the benefits of both approaches, and your recommendations are always valid.

How to structure AWS DynamoDB Table with Cognito

I am trying to do something that would be relatively simple for a relational database, but I don't know how to do it for a non-relational one.
I am trying to make a simple task web app on AWS where people can post their tasks.
I have a table called tasks which uses the userid from the auth token provisioned by AWS Cognito. I am wondering how I can return the user information. I do not want to rely on Cognito by calling it every time a user sends a request, so my thought would be to create another table to store all of the user information. That, however, is not a very non-relational way of doing things, since JOINs are so bad.
So, I was wondering if I should do any of the following
a) Using RDS instead
b) Not use Cognito and set up my own Auth system
c) Just doing the JOIN with a table containing all of the user info
d) Doing the request to Cognito each time
Although I personally like the idea of Cognito, at this time it has some major drawbacks...
You cannot back up / restore a user pool without losing the passwords; you also have to implement your own backup/restore.
A way around this is to save the user password in a Cognito custom attribute.
I expected, by using an API Gateway Lambda authorizer, to have all the user data in the Lambda context, but it's not there. Or am I doing something wrong with the API Gateway template mapping? 😬
A good thing: the API Gateway Lambda authorizer can be cached for up to an hour, so it won't call the authorizer function again, which seems like a top feature.
It does not work well with CloudFormation: with every attribute update it recreates the user pool without restoring the users, thus losing the users.
I used it in only one implementation and ended up duplicating the users in DynamoDB as well.
I've been avoiding it ever since. I wish they would solve these issues, as it looks like a service that could be included with every project, saving a lot of time.
Reading your post, I asked myself the same questions and am not sure of the answer either 😄
Pricing seems fair.
The default 5 requests/second to get user info seems strange, as it would be consumed by one page load doing multiple AJAX API requests.
For this in DynamoDB, there is no need for another table. If the access patterns dictate you store the information in another object, then so be it, but more than likely it should be in the same table. It sounds like you need two different item types in the same table.
For the task, use a PK of userid and an SK of task::your-task-id. This would allow you to get all of a user's tasks easily, or even a specific task very easily if you knew the task ID. You might also have an attribute that is a timestamp, and then a GSI with the userID as the PK and the timestamp as the SK; then you could use the begins_with operator on the SK and "paginate through all of the user's tasks that are in the month of 2019-04".
For the user information, have the userID be the PK and the SK be user_info, with attributes holding the user's information.
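An illustrative sketch of that layout with boto3 (the table name, GSI name, and attribute names are assumptions):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("AppTable")  # hypothetical table name

# User-info item: PK = user ID, SK = fixed marker.
table.put_item(Item={"PK": "user-123", "SK": "user_info", "email": "jane@example.com"})

# Task item: PK = user ID, SK = task::<task-id>, plus a timestamp for the GSI.
table.put_item(Item={
    "PK": "user-123",
    "SK": "task::task-001",
    "createdAt": "2019-04-15T10:00:00Z",
    "title": "Write report",
})

# All tasks for a user:
tasks = table.query(
    KeyConditionExpression=Key("PK").eq("user-123") & Key("SK").begins_with("task::")
)

# Tasks from April 2019 via the timestamp GSI described above:
april = table.query(
    IndexName="byTimestamp",  # hypothetical GSI: PK = user ID, SK = createdAt
    KeyConditionExpression=Key("PK").eq("user-123") & Key("createdAt").begins_with("2019-04"),
)
```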
The one challenge with this is if you were to go to extremes and one single user is doing thousands of ops per second, e.g. "all tweets by a very popular celebrity". If you have such a use case, there are ways around that as well, e.g. write sharding (see the toy sketch below). These are just examples for you to play with; without knowing all your access patterns, I cannot model everything you might want to do. I highly recommend you go watch this presentation from re:Invent 2018.
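A toy sketch of the write-sharding idea mentioned above (the shard count and key format are arbitrary choices for illustration):

```python
import random

N_SHARDS = 10  # arbitrary shard count for this sketch

def sharded_pk(user_id):
    # Spread one hot user's writes across N partition keys to avoid a hot partition.
    return f"{user_id}#{random.randint(0, N_SHARDS - 1)}"

def all_shard_pks(user_id):
    # Reads must fan out: query each shard key and merge the results.
    return [f"{user_id}#{s}" for s in range(N_SHARDS)]
```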