I'm currently trying to create a recommendation page that incorporates infinite-scrolling pagination (something like Instagram Explore), but I couldn't find a way to do it with the AWS SDK out of the box.
Is there any other way to fetch the data from AWS Personalize in a paginated way while ensuring the retrieved data is not duplicated?
The GetRecommendations API for Personalize currently does not support pagination, so the only way to use this API directly is to fetch up to 500 items in one call and then progressively reveal recommended items as the user scrolls. Alternatively, you could create an intermediate public API endpoint using something like API Gateway and Lambda that supports pagination, lazily fetching and loading recommendations for a user into a datastore like Redis; a Redis LRANGE or ZRANGE can then be used to paginate.
The advantage of retrieving the maximum number of recommendations and then paginating over them is that they represent a snapshot of recommendations at a moment in time. Since Personalize will potentially adjust recommendations based on new interactions streamed into the service, recommendations can change from one GetRecommendations call to the next. This can create a user experience where the same item appears to be duplicated because it drops in relevance between calls and shows up on multiple "pages" of results.
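A minimal sketch of the first option with boto3 (the campaign ARN, user ID, and page size are placeholders): fetch the 500-item snapshot once, cache it, and serve pages from it as the user scrolls, which guarantees no duplicates across pages.

import boto3

personalize_runtime = boto3.client("personalize-runtime")

# Fetch one 500-item snapshot per user session (ARN is hypothetical).
response = personalize_runtime.get_recommendations(
    campaignArn="arn:aws:personalize:us-east-1:123456789012:campaign/my-campaign",
    userId="user-123",
    numResults=500,
)
item_ids = [item["itemId"] for item in response["itemList"]]

PAGE_SIZE = 20

def get_page(page_number):
    # Serve pages from the cached snapshot; no duplicates across pages.
    start = page_number * PAGE_SIZE
    return item_ids[start:start + PAGE_SIZE]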
I want to feed real-time data into AWS Personalize to build a recommendation engine. I've read online resources, and in those guides the training user-interaction data, user data, and item data are provided at the beginning, when creating the recommendation engine.
However, I have an app in which I will gather data, and I want to feed that real-time data into AWS Personalize. Is it possible to build the recommendation engine without providing any data at first, and then stream real-time data from my app later with the PutEvents, PutItems, and PutUsers APIs from the AWS SDK? I'm new to this, so I'm confused about this initial step.
Is it possible to build the recommendation engine without providing any data at first, and then stream real-time data from my app later with the PutEvents, PutItems, and PutUsers APIs from the AWS SDK?
Yes, it is possible. You just need to adjust the sequence of creating resources.
Interaction data is required for all Personalize recipes before a recommender can be created that provides recommendations. However, if you don't have interaction data (or enough data; see quotas and limits) to start with, you can create a dataset group and an interactions dataset, feed interactions to the dataset using the PutEvents API (see recording events page), and then create a domain recommender or custom solution when enough data has been ingested.
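A minimal sketch of that sequence with boto3. All names are hypothetical, and each resource must reach ACTIVE status before the next step (polling omitted for brevity):

import json
from datetime import datetime, timezone

import boto3

personalize = boto3.client("personalize")
events = boto3.client("personalize-events")

# 1. Dataset group and interactions dataset.
dsg_arn = personalize.create_dataset_group(name="my-app-dsg")["datasetGroupArn"]
# ...wait for the dataset group to become ACTIVE...

schema_arn = personalize.create_schema(
    name="my-app-interactions-schema",
    schema=json.dumps({
        "type": "record",
        "name": "Interactions",
        "namespace": "com.amazonaws.personalize.schema",
        "fields": [
            {"name": "USER_ID", "type": "string"},
            {"name": "ITEM_ID", "type": "string"},
            {"name": "TIMESTAMP", "type": "long"},
        ],
        "version": "1.0",
    }),
)["schemaArn"]

personalize.create_dataset(
    name="my-app-interactions",
    schemaArn=schema_arn,
    datasetGroupArn=dsg_arn,
    datasetType="Interactions",
)

# 2. Event tracker for streaming interactions from the app.
tracking_id = personalize.create_event_tracker(
    name="my-app-tracker", datasetGroupArn=dsg_arn
)["trackingId"]

# 3. Stream events as users interact; create the recommender or solution
#    later, once enough interactions have accumulated.
events.put_events(
    trackingId=tracking_id,
    userId="user-123",
    sessionId="session-456",
    eventList=[{
        "eventType": "View",
        "itemId": "item-789",
        "sentAt": datetime.now(timezone.utc),
    }],
)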
The minimum amount of interaction data (and potentially item metadata) required before you can train a model/recommender depends on the recipe you select. Generally speaking, you will need at least 1,000 interactions from 25 distinct users, each with two or more interactions. The domain recommenders also require specific event types; check the docs linked above. The quality and relevance of recommendations will improve as you collect more data and retrain.
I'm a new user of AWS Personalize, so I have a few questions about recommender retraining below.
Currently, I'm focusing on the e-commerce dataset group and using the e-commerce use-case recommender. If I use this, I can't create a campaign, right?
If I understand correctly, there's no need to retrain the model if I use the recommender above, right? In many docs I've read, there is only a retraining process when using custom resources and creating a campaign.
So when I add new event data, the recommender will apply the new data directly to recommendations, right? If so, that means we don't need to worry about the retraining process for the e-commerce use case, following these docs?
That's all of my questions.
Currently, I'm focusing on the e-commerce dataset group and using the e-commerce use-case recommender. If I use this, I can't create a campaign, right?
The recommenders for domain dataset groups automatically manage the inference endpoint for you, so the step of creating a campaign is not necessary; the service handles this.
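For example, with boto3 you call GetRecommendations against the recommender's ARN directly (the ARN below is hypothetical); there is no campaign in between:

import boto3

personalize_runtime = boto3.client("personalize-runtime")

response = personalize_runtime.get_recommendations(
    recommenderArn="arn:aws:personalize:us-east-1:123456789012:recommender/recommended-for-you",
    userId="user-123",
    numResults=25,
)
for item in response["itemList"]:
    print(item["itemId"], item.get("score"))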
If I understand correctly, there's no need to retrain the model if I use the recommender above, right? In many docs I've read, there is only a retraining process when using custom resources and creating a campaign.
Correct. Training and retraining are managed by the service for domain recommenders.
So when I add new event data, the recommender will apply the new data directly to recommendations, right? If so, that means we don't need to worry about the retraining process for the e-commerce use case, right?
You can send in new event data in two ways. First, an event tracker can be used to incrementally stream in new events. In this case, Personalize uses new events to adjust recommendations in near-real time to match the user's evolving intent (retraining is not necessary for this). Personalize also persists those new events in the interactions dataset so they are included in the next retraining.
The other way to send in new event data is with a bulk import of the interactions dataset. Since bulk imports replace the previous bulk import, your bulk files need to include all the interaction history you want to train on, not just new interactions. Bulk imports of the interactions dataset are included in the next retraining.
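A minimal bulk-import sketch with boto3 (the ARNs, bucket, and role are hypothetical); note the CSV must hold the full interaction history, since each bulk import replaces the previous one:

import boto3

personalize = boto3.client("personalize")

personalize.create_dataset_import_job(
    jobName="interactions-full-history-import",
    datasetArn="arn:aws:personalize:us-east-1:123456789012:dataset/my-dsg/INTERACTIONS",
    dataSource={"dataLocation": "s3://my-bucket/interactions/full_history.csv"},
    roleArn="arn:aws:iam::123456789012:role/PersonalizeS3AccessRole",
)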
I am building a recommendation system for a classified ads website; ads are added and deleted daily.
What I thought of is to use PutItems to add new ads with a field called status = 0. If a user deletes an ad, I will use the same PutItems API with the same ITEM_ID to update the stored item, and use a filter to select only ads with status = 0 when generating recommendations.
Is that correct? Will the PutItems API update the existing ad? And is there any way to delete an item?
Currently there is no way to remove items that were already added to Datasets.
Your workaround looks good; however, from my experience working with Personalize, the filter might decrease the quality of your recommendations.
To understand why, this is roughly the algorithm Personalize uses to filter recommendations:
1. Get recommended items for the user
2. Filter the recommendations using the filter expression
3. Return the first N recommended items left after filtering
Because the filtering is done after getting recommendations, Personalize will simply fill the recommendation list with items that were further down the recommended list.
And there is a problem with that approach: items lower on the list have a lower "score" value, which indicates the accuracy of the recommendation. That's why you will generally end up with worse recommendations, though it will depend on how many unavailable ads were recommended before being filtered out.
To check your recommendation scores, simply get recommendations in the Personalize web UI; it will return the list of recommendations with their scores.
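To see the effect in practice, you can create the filter and pass it at inference time. A sketch with boto3, assuming STATUS is a string metadata column in your items dataset (all names and ARNs are hypothetical):

import boto3

personalize = boto3.client("personalize")
runtime = boto3.client("personalize-runtime")

# Keep only active ads (status = 0 in your convention).
filter_arn = personalize.create_filter(
    name="active-ads-only",
    datasetGroupArn="arn:aws:personalize:us-east-1:123456789012:dataset-group/my-dsg",
    filterExpression='INCLUDE ItemID WHERE Items.STATUS IN ("0")',
)["filterArn"]

# Apply the filter at inference time.
response = runtime.get_recommendations(
    campaignArn="arn:aws:personalize:us-east-1:123456789012:campaign/my-campaign",
    userId="user-123",
    filterArn=filter_arn,
)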
Better approach
If your ads are updated daily, then you can definitely work around it by following these steps:
1. Create a Lambda function that is triggered every 24 hours.
2. The Lambda fetches all of the ads and puts them into an S3 bucket as a CSV file. It should exclude ads that are no longer available (in your convention, ads whose status is no longer 0).
3. Call the CreateDatasetImportJob API using any AWS SDK of your choice, providing the data stored in the S3 bucket (see the sketch after this list).
4. Personalize starts an import job; when it finishes, all of the items are replaced with the newest dump.
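A minimal Lambda sketch of steps 2-3, assuming a hypothetical fetch_active_ads() helper and placeholder bucket, ARNs, and role:

import csv
import io
from datetime import date

import boto3

s3 = boto3.client("s3")
personalize = boto3.client("personalize")

BUCKET = "my-ads-export-bucket"
KEY = "items/items.csv"

def fetch_active_ads():
    # Placeholder: query your ads datastore, excluding unavailable ads.
    return [{"ITEM_ID": "ad-1", "CATEGORY": "cars"}]

def handler(event, context):
    # Step 2: dump currently available ads to S3 as CSV.
    ads = fetch_active_ads()
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["ITEM_ID", "CATEGORY"])
    writer.writeheader()
    writer.writerows(ads)
    s3.put_object(Bucket=BUCKET, Key=KEY, Body=buf.getvalue())

    # Step 3: trigger the import; job names must be unique, hence the date suffix.
    personalize.create_dataset_import_job(
        jobName=f"daily-items-import-{date.today().isoformat()}",
        datasetArn="arn:aws:personalize:us-east-1:123456789012:dataset/my-dsg/ITEMS",
        dataSource={"dataLocation": f"s3://{BUCKET}/{KEY}"},
        roleArn="arn:aws:iam::123456789012:role/PersonalizeS3AccessRole",
    )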
However, this approach has some downsides.
If you are not using the User-Personalization (aws-user-personalization) recipe, then after each items import you need to update your solution by creating a new solution version. Otherwise, it won't include the changes made by the items dataset import job.
Creating a new solution version is quite slow and expensive, which is why I would recommend the User-Personalization recipe if you want to use this approach. And since the HRNN recipes are marked as legacy, it's a good idea to migrate anyway.
If you are using the User-Personalization recipe, then according to the AWS documentation:
Amazon Personalize automatically updates your latest solution version every two hours to include new data. Your campaign automatically uses the updated solution version. For more information see Automatic Updates.
So pretty much all of the work is done on the Personalize side, and you don't have to worry about retraining the solution after each items import job.
And the last problem...
Since the documentation for the User-Personalization recipe says your solution is updated within two hours, you might end up recommending items that are no longer available for some short period of time. If you are updating items daily, this might be a significant problem.
To handle that case, I would recommend also using the filter approach that you mentioned. Thanks to this, you get the benefits of both approaches and your recommendations are always valid.
I've used AWS Personalize to create a campaign that successfully produces recommendations for the users/events/items that I have uploaded.
I now want to produce recommendations for new users - ones not in the initial dataset. I thought the way to do this was to create an event stream, post their initial interactions, and then somehow this would get blended into the campaign, but I get the same recommendations back regardless of what I seed the new user with.
What's the correct way of achieving this?
From what I understand from https://docs.aws.amazon.com/cli/latest/reference/personalize/create-solution-version.html and https://aws.amazon.com/personalize/pricing/, you have to retrain your model to incorporate new data with a create-solution-version call. After the new version is created for an active solution, the campaign should start returning updated results.
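A minimal retraining sketch with boto3 (the ARNs are hypothetical); once the new version is ACTIVE, point the campaign at it unless your recipe auto-updates:

import boto3

personalize = boto3.client("personalize")

# Retrain on all data currently in the dataset group.
version_arn = personalize.create_solution_version(
    solutionArn="arn:aws:personalize:us-east-1:123456789012:solution/my-solution",
    trainingMode="FULL",
)["solutionVersionArn"]

# Once the version is ACTIVE, switch the campaign to it.
personalize.update_campaign(
    campaignArn="arn:aws:personalize:us-east-1:123456789012:campaign/my-campaign",
    solutionVersionArn=version_arn,
)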
Please let me know if this resolves your problem.
You can get default recommendations for any new user (USER_ID) as long as you don't have any filter created on the user schema. If you have created a filter on user metadata (gender, etc.), you will have to first call the PutUsers API for the new user to insert that record into the Personalize users dataset. Once that put request is executed (during the registration process, for example), the user will start getting recommendations. As the user interacts more, their recommendations will change in real time.
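A minimal PutUsers sketch with boto3 (the dataset ARN and metadata column are hypothetical); properties is a JSON string matching your users schema:

import json
import boto3

personalize_events = boto3.client("personalize-events")

personalize_events.put_users(
    datasetArn="arn:aws:personalize:us-east-1:123456789012:dataset/my-dsg/USERS",
    users=[{
        "userId": "user-123",
        "properties": json.dumps({"GENDER": "F"}),
    }],
)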
https://docs.aws.amazon.com/personalize/latest/dg/importing-users.html
I use the Facebook Marketing (Graph) API through my FB app with developer-level access. I basically need to get all data (info about campaigns, ad sets, stats, and so on) as often as possible. There are limits, and I'm reaching them pretty quickly when calling the Marketing API: after a few hundred calls I'm stuck with error #17 (request limit reached).
My questions are:
Is it possible to increase the limit, and if so, how?
What's an efficient way of getting all the data? I'm mostly interested in tracking changes (something has been added/updated/deleted, plus stats), but first, of course, I need to gather all my marketing account data somehow.
Requesting each object for its info/stats is going to hurt after a while. If you need detailed stats about engagement, social reach, impressions, and such, you have a couple of options:
The /reportstats endpoint is available for most objects and offers aggregation filters, specificity, column selection, and other useful things, and can also be called to asynchronously generate a stats report for all of the objects you request. This can even be done at the account level. You could easily retrieve a low-level aggregation of delivery stats for the entire tree of objects in your account or campaign.
Another more recent development in their API is the Insights endpoint, which is a centralized endpoint for retrieving stats and operates more like the rest of the Graph API (requested field names, filtering, presets, etc.).
I recommend reading up on each for retrieving delivery stats for your Ads API objects.
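For illustration, an Insights request might look like this in Python (the ad account ID and token are placeholders; field names follow the Insights docs):

import requests

resp = requests.get(
    "https://graph.facebook.com/v2.4/act_0000000000/insights",
    params={
        "level": "campaign",             # aggregate per campaign
        "fields": "impressions,clicks,spend",
        "time_increment": 1,             # daily breakdown
        "access_token": "xxxxxxxxxx",
    },
)
print(resp.json())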
Additionally, if you're looking for the objects' attributes themselves, there are a couple of options. The most useful one may be that you can call the Graph API with no endpoint and a list of object IDs in the params; the request would look like graph.facebook.com/?ids=000000000. Furthermore, you can use Facebook's fields query parameter to return attributes of those objects. But wait; there's more: the Graph API offers field expansion for connected objects through this same query parameter. That is to say, you could request all campaigns' attributes, all ad sets' names, and all ad sets' ads' names like so:
# ids are the campaign IDs
curl -G \
  -d 'ids=00000000,111111111' \
  -d 'fields=name,adcampaigns.limit(50).fields(name),adgroups.limit(50).fields(name)' \
  -d 'access_token=xxxxxxxxxx' \
  'https://graph.facebook.com/v2.4'