AWS Personalize - Recommender retraining questions

I'm a new user of AWS Personalize, so I have a few questions about recommender retraining below.
Currently, I'm focused on an E-Commerce dataset group and I'm using the e-commerce use-case recommender. If I use this, I can't create a campaign, right?
If I understand correctly, there is no need to retrain the model in this case (if I use the recommender above), right? I've read in many docs that there is only a retraining process when using custom resources and creating a campaign.
So, when I add new event data incrementally, the recommender will apply the new data directly to recommendations, right? If so, that means we don't need to worry about the retraining process for the e-commerce use case, right? (Following these docs.)
That's all of my questions.

Currently, I'm focused on an E-Commerce dataset group and I'm using the e-commerce use-case recommender. If I use this, I can't create a campaign, right?
The recommenders for domain dataset groups automatically manage the inference endpoint for you. So the step of creating a campaign is not necessary. The service handles this.
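For illustration, here is a minimal Python (boto3) sketch of querying a domain recommender directly; the recommender ARN and user ID are placeholders:

```python
import boto3

# Hypothetical recommender ARN from an E-Commerce domain dataset group.
RECOMMENDER_ARN = "arn:aws:personalize:us-east-1:123456789012:recommender/my-ecomm-recommender"

personalize_runtime = boto3.client("personalize-runtime")

# With a domain recommender, GetRecommendations is called against the
# recommender ARN directly; there is no campaign to create or manage.
response = personalize_runtime.get_recommendations(
    recommenderArn=RECOMMENDER_ARN,
    userId="user-123",
    numResults=10,
)

for item in response["itemList"]:
    print(item["itemId"])
```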
If I understand correctly, there is no need to retrain the model in this case (if I use the recommender above), right? I've read in many docs that there is only a retraining process when using custom resources and creating a campaign.
Correct. Training and retraining are managed by the service for domain recommenders.
So, when I add new event data incrementally, the recommender will apply the new data directly to recommendations, right? If so, that means we don't need to worry about the retraining process for the e-commerce use case, right?
You can send in new event data two ways. First, an event tracker can be used to incrementally stream in new events. In this case, Personalize will use new events to adjust recommendations in near-real-time to match the user's evolving intent (retraining is not necessary for this). Personalize will also persist those new events in the incremental interactions dataset so they are included in the next retraining.
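As a hedged sketch of the incremental path (Python/boto3), assuming an event tracker has already been created for the dataset group and its tracking ID is available:

```python
from datetime import datetime

import boto3

# Placeholder: returned by CreateEventTracker for your dataset group.
TRACKING_ID = "your-event-tracker-tracking-id"

personalize_events = boto3.client("personalize-events")

# Stream a single interaction. Personalize uses it to adjust recommendations
# in near-real-time and also persists it for the next retraining.
personalize_events.put_events(
    trackingId=TRACKING_ID,
    userId="user-123",
    sessionId="session-456",
    eventList=[
        {
            "eventType": "Purchase",  # e-commerce use cases expect specific event types
            "itemId": "item-789",
            "sentAt": datetime.now(),
        }
    ],
)
```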
The other way you can send in new event data is with a bulk import of the interactions dataset. Since bulk imports replace the previous bulk import, your bulk files need to include all interaction history you want to train on and not just new interactions. Bulk imports of the interactions dataset are included in the next retraining.
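And a sketch of the bulk path, with hypothetical ARNs and S3 paths; remember that the file should contain the full interaction history you want to train on:

```python
import boto3

personalize = boto3.client("personalize")

# All identifiers below are placeholders.
response = personalize.create_dataset_import_job(
    jobName="interactions-bulk-import",
    datasetArn="arn:aws:personalize:us-east-1:123456789012:dataset/my-ecomm-dsg/INTERACTIONS",
    dataSource={"dataLocation": "s3://my-bucket/interactions/interactions.csv"},
    roleArn="arn:aws:iam::123456789012:role/PersonalizeS3AccessRole",
)
print(response["datasetImportJobArn"])
```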

Related

What is the difference between "Use Case Optimized Recommenders" and "Custom Recommender Solutions" in Amazon Personalize?

I'm new to Amazon Personalize. I'm checking the price of this service on this link and I see three different categories ("Use Case Optimized Recommenders", "User Segmentation", and "Custom Recommender Solutions"). I wonder what the main difference between them is?
As far as I can see, the Use Case Optimized Recommenders price doesn't include a "Training cost" or a "TPS cost". Is this true? How can this recommendation mode work without training?
Also, what should I do if I upload new data from new users and need to retrain each month? Can I do that with Use Case Optimized Recommenders, since they don't have a "Training cost"? The price of Custom Recommender Solutions for real-time recommendations is quite high.
Training and retraining are managed by Personalize for Use Case Optimized Recommenders. They are designed specifically for the most common use cases in Media (VOD) and Retail and are intended to make it easier to launch and operate recommendation engines for these industries. They must be created within a Domain Dataset Group.
Domain dataset group: A dataset group containing preconfigured resources for different business domains and use cases. Amazon Personalize manages the life cycle of training models and deployment. When you create a Domain dataset group, you choose your business domain, import your data, and create recommenders for each of your use cases. You use your recommender in your application to get recommendations with the GetRecommendations operation.
Therefore, the cost of retraining Use Case Optimized Recommenders is built into their pricing. There is still a cost for real-time recommendations when you exceed the free number of recommendations per hour.
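As a rough illustration of how little you manage yourself, here is a hypothetical example (Python/boto3) of creating a use case optimized recommender for the E-Commerce "Recommended for you" use case; names and ARNs are placeholders:

```python
import boto3

personalize = boto3.client("personalize")

# The service handles training, retraining, and the inference endpoint.
response = personalize.create_recommender(
    name="ecomm-recommended-for-you",
    datasetGroupArn="arn:aws:personalize:us-east-1:123456789012:dataset-group/my-ecomm-dsg",
    recipeArn="arn:aws:personalize:::recipe/aws-ecomm-recommended-for-you",
)
print(response["recommenderArn"])
```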
Custom Recommenders do not support automatic training/retraining so you are responsible for initiating training by creating Solution Versions. Note that you can add custom recommenders to a domain dataset group but you cannot add use case optimized recommenders to a custom dataset group.
If you start with a Domain dataset group, you can still add custom resources such as solutions and solution versions trained with recipes for custom use cases.
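For comparison, a hedged sketch of adding a custom solution to that same (hypothetical) domain dataset group; here you trigger training/retraining yourself by creating solution versions:

```python
import boto3

personalize = boto3.client("personalize")

# Placeholder ARNs; a custom solution can coexist with recommenders
# in the same domain dataset group.
solution = personalize.create_solution(
    name="custom-user-personalization",
    datasetGroupArn="arn:aws:personalize:us-east-1:123456789012:dataset-group/my-ecomm-dsg",
    recipeArn="arn:aws:personalize:::recipe/aws-user-personalization",
)

# You (not the service) decide when to retrain by creating a new solution version.
version = personalize.create_solution_version(
    solutionArn=solution["solutionArn"],
    trainingMode="FULL",
)
print(version["solutionVersionArn"])
```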
Regardless of the dataset group type you create, you still want to keep your datasets updated with the latest interactions and item/user data.
User Segmentation is designed for building segments of users based on their affinity for items or item attributes. They are considered custom recommenders from a training/retraining perspective.
The AWS pricing calculator for Personalize was recently updated to support Use Case Optimized Recommenders and User Segmentation.

Feeding real-time data into AWS Personalize

I want to feed real-time data into AWS Personalize to build a recommendation engine. I've read online resources, and in those guides I could see that the training user-interaction data, user data, and item data are provided at the beginning, while creating the recommendation engine.
However, I have an app, I will gather data in the app, and I want to feed that real-time data into AWS Personalize. I want to know if building the recommendation engine is possible without providing any data at first and then streaming real-time data from my app later with the PutEvents, PutItems, and PutUsers APIs from the AWS SDK? I'm quite new to this, so I'm confused about this initial step.
I want to know if building the recommendation engine is possible without providing any data at first and then streaming real-time data from my app later with the PutEvents, PutItems, and PutUsers APIs from the AWS SDK?
Yes, it is possible. You just need to adjust the sequence of creating resources.
Interaction data is required for all Personalize recipes before a recommender can be created that provides recommendations. However, if you don't have interaction data (or enough data; see quotas and limits) to start with, you can create a dataset group and an interactions dataset, feed interactions to the dataset using the PutEvents API (see recording events page), and then create a domain recommender or custom solution when enough data has been ingested.
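A rough outline of that sequence in Python (boto3); all names are placeholders, and the minimal interactions schema shown is an assumption modeled on the default E-Commerce domain schema:

```python
import json

import boto3

personalize = boto3.client("personalize")

# 1. Create an (empty) E-Commerce domain dataset group.
dsg = personalize.create_dataset_group(name="my-ecomm-dsg", domain="ECOMMERCE")
# In a real script, poll describe_dataset_group until the status is ACTIVE.

# 2. Create an interactions schema and dataset.
schema = personalize.create_schema(
    name="ecomm-interactions-schema",
    domain="ECOMMERCE",
    schema=json.dumps({
        "type": "record",
        "name": "Interactions",
        "namespace": "com.amazonaws.personalize.schema",
        "fields": [
            {"name": "USER_ID", "type": "string"},
            {"name": "ITEM_ID", "type": "string"},
            {"name": "TIMESTAMP", "type": "long"},
            {"name": "EVENT_TYPE", "type": "string"},
        ],
        "version": "1.0",
    }),
)
dataset = personalize.create_dataset(
    name="ecomm-interactions",
    datasetType="Interactions",
    datasetGroupArn=dsg["datasetGroupArn"],
    schemaArn=schema["schemaArn"],
)

# 3. Create an event tracker and stream events with PutEvents (plus items
#    and users with PutItems/PutUsers) until you have enough data.
tracker = personalize.create_event_tracker(
    name="ecomm-tracker",
    datasetGroupArn=dsg["datasetGroupArn"],
)
print("trackingId:", tracker["trackingId"])

# 4. Once the minimum data requirements are met, create the domain
#    recommender (or custom solution) and start getting recommendations.
```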
The minimum amount of interaction data (and potentially item metadata) required before you can train a model/recommender depends on the recipe that you select. Generally speaking, you will need 1000 interactions across 25 distinct users where each of those users has 2+ interactions. The domain recommenders also require specific event types. Check the docs linked above. The quality and relevance of recommendations will improve as you collect more data and retrain.

Single table db architecture with AWS Amplify

By default, the AWS Amplify transformers create a table for each GraphQL type.
But according to the DynamoDB documentation, it's a best practice to:
Keep the number of tables as low as possible
Keep entries that are often queried together in the same table
I have the impression that Amplify's way of doing things contradicts the guidance above.
I am new to both NoSQL and Amplify.
Can someone suggest ways to address those issues?
I think we're in a bit of a transition or gray area here. I'm very new to Amplify and have been investigating moving to a single-table design, as there are sources (below) indicating that this has always been possible, but you'd have to write everything in VTL templates. But in 2020 they released direct Lambda resolver support: https://youtu.be/EOQqi6Yun7g?t=960 (clip)
However, it seems like you lose access to the @auth directive (and probably others, because you're no longer going to use @model), along with a lot of the nice out-of-the-box functionality that's available with Amplify's multi-table approach.
At this point, since I'm developing a new app, I'm going to stick with the default multi-table design to hasten the process of getting the app functional.
Trying to implement the single-table design seems to go against what the Amplify team recommends and requires more manual work. You'd have to manually create custom Lambda functions (AppSync resolvers), code the DynamoDB queries for each data-access element, and manage authorization through some other means which I'm not aware of at this time. Maybe someone can chime in here...
Single table vs multi table info
Using Amplify with single table:
https://youtu.be/EOQqi6Yun7g
Single vs Multi Clip:
https://youtu.be/1WF_wped808?t=1251 (clip)
https://www.alexdebrie.com/posts/dynamodb-single-table/ (towards bottom)
https://youtu.be/EOQqi6Yun7g?t=1288 (clip)
Example single table design by Alex Debrie:
https://gist.github.com/dabit3/96dc51e688b18a7d40fc534331758c56
More Discussion:
https://stackoverflow.com/a/56438716/1956540
Basic Setup steps
I set up a single table by following the instructions below. Again, you don't use @model for this. Also, I think you have to include a Query type in your schema for it to compile, but I could be wrong here.
So the basic steps are:
Create a single table (amplify add storage)
amplify push
Create your schema in the schema.graphql file.
Create supporting lambda function (amplify add function)
Note: if you look at the example here, I believe you can create an entry point that routes to all the other methods: https://gist.github.com/dabit3/96dc51e688b18a7d40fc534331758c56#lambda
Add the DynamoDB query code to the function (see the sketch after these steps).
amplify push
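For the "Add the DynamoDB query code" step, here is a hypothetical Python Lambda (Amplify also supports Node.js functions) acting as a direct Lambda resolver against a single table with a generic PK/SK layout. The field name, key schema, and environment variable are assumptions for illustration, not Amplify-generated code:

```python
import os

import boto3
from boto3.dynamodb.conditions import Key

# An Amplify-style environment variable for the storage resource is assumed here.
TABLE_NAME = os.environ.get("STORAGE_SINGLETABLE_NAME", "SingleTable")
table = boto3.resource("dynamodb").Table(TABLE_NAME)


def handler(event, context):
    # The exact payload shape depends on how the resolver is wired up;
    # direct Lambda resolvers receive the GraphQL field name and arguments.
    field = event.get("fieldName") or event.get("info", {}).get("fieldName")
    args = event.get("arguments", {})

    if field == "getCustomerOrders":
        # Single-table pattern: the customer and its orders share a partition key.
        result = table.query(
            KeyConditionExpression=Key("PK").eq(f"CUSTOMER#{args['customerId']}")
        )
        return result["Items"]

    raise Exception(f"Unknown field: {field}")
```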
Complete steps for Setting up a single Table:
https://catalog.us-east-1.prod.workshops.aws/workshops/53b10bf8-2271-4ab4-bfd2-39e878a90dc8/en-US/lab2/1-vtl (both "Connecting to an existing DynamoDB table" and "Direct Lambda Resolver" steps)
Not trying to be negative about Amplify, it is awesome, I love what they are doing with this product. I just think it's very new to everyone and I'm hoping this post is no longer valid next year and we continue to see great progress from the team.

Do I need to update the item CSV in AWS Personalize?

I'm trying to use AWS Personalize and I'm following their documentation.
So I've uploaded the dataset files (interactions, users, items) to S3, then created a solution and a campaign.
And I implemented the PutEvents API using Java.
The GetRecommendations API call works well.
At this point I'm wondering whether I need to update the dataset files, especially the item CSV.
In general, you're done at this point for very basic recommendations.
Since you are using the PutEvents call, all of your real-time events are added to the interactions dataset this way. Interactions data created by manual import and by PutEvents calls are kept separate from each other; you can actually see them in the Personalize Datasets web console.
Still, you might want to update the dataset files using the dataset import job feature, but doing so will replace your existing dataset. In general, I would recommend using it only when:
You just created a fresh/bigger/better dump of your database with Interactions.
You've found that your previous interactions dataset was invalid.
The schema of the dataset changed (in which case you are pretty much forced to do it).
The Users or Items dataset changed/improved; it's actually a good idea to refresh these often so Personalize can produce better recommendations. Keep in mind that this also requires retraining the solution so that the new items/users are included when recommendations are generated.
So for interactions you usually don't want to update the dataset. For the other datasets, it might even be a good idea to create an automatic import mechanism.
Keep in mind that the Items and Users datasets are only used by Personalize recipes that support metadata; otherwise they are simply ignored.
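Since the questions above already mention the PutItems API: as a hedged alternative to a full re-import, you can also push small item updates incrementally. The ARN and properties below are placeholders, and the property keys are expected to match your Items schema fields (camel-cased):

```python
import json

import boto3

personalize_events = boto3.client("personalize-events")

# Incrementally add or update one item instead of re-importing the whole CSV.
personalize_events.put_items(
    datasetArn="arn:aws:personalize:us-east-1:123456789012:dataset/my-dsg/ITEMS",
    items=[
        {
            "itemId": "item-789",
            "properties": json.dumps({"category": "shoes", "price": 59.99}),
        }
    ],
)
```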

AWS Machine Learning Retrain Model

I have some models created in AWS Machine Learning from an S3 CSV file.
After a lot of searching, I haven't found a good way to retrain my models.
I would like to know if there is any option to retrain my models with new data, or if I need to create a new one each time.
Amazon ML provides a set of APIs (and SDKs) that let you programmatically build a pipeline that takes new data from S3 and generates the datasource and the ML models from it.
All of the components, including datasources, ML models, evaluations, etc., are immutable; if you want to retrain, you need to recreate them. This also lets you roll back to a previous model if the performance of the new model is not better than that of the old one.
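A rough sketch of that recreate-to-retrain pipeline with the (legacy) Amazon ML API via boto3; the IDs, S3 paths, and schema location are placeholders:

```python
import boto3

ml = boto3.client("machinelearning")

# 1. Create a new datasource pointing at the refreshed CSV in S3.
ml.create_data_source_from_s3(
    DataSourceId="ds-retrain-2024-06",
    DataSpec={
        "DataLocationS3": "s3://my-bucket/training/latest.csv",
        "DataSchemaLocationS3": "s3://my-bucket/training/schema.json",
    },
    ComputeStatistics=True,
)

# 2. Create a new ML model from that datasource; the old model is left
#    untouched, so you can keep serving it or roll back to it.
ml.create_ml_model(
    MLModelId="model-retrain-2024-06",
    MLModelType="BINARY",
    TrainingDataSourceId="ds-retrain-2024-06",
)

# 3. Optionally evaluate the new model against a held-out datasource and only
#    switch your application to the new MLModelId if it performs better.
```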