AWS Personalize: Dumping User-item interaction Dataset Created By PutEvent - amazon-web-services

Following the AWS Personalize documentation, I successfully imported my datasets (User, Item, Interaction) from S3, created an EventTracker, trained the model, and deployed the campaign. The solution works without any issue and I get recommendations.
I rely on PutEvents to add new user-item interaction events. I also dump those interaction events to my S3 bucket using Lambda + Firehose. But I am wondering whether AWS Personalize internally creates or augments the original user-item interaction dataset. How can I access and download the revised version of the dataset? I cannot see any new dataset in "Dataset groups > Datasets" other than my original 3 datasets...
I would prefer to dump it regularly from AWS Personalize to my S3 storage rather than using my own Lambda + Firehose solution.
This is the output of my PutEvents call. I see a 200, but I am not sure whether it worked. Should I see any new dataset in "Dataset groups > Datasets" created by PutEvents?
{
"ResponseMetadata": {
"RequestId": "a6c96496-cbd6-4ad8-9183-371d1794cbd8",
"HTTPStatusCode": 200,
"HTTPHeaders": {
"content-type": "application/json",
"date": "Mon, 04 Jan 2021 18:04:28 GMT",
"x-amzn-requestid": "a6c96496-cbd6-4ad8-9183-371d1794cbd8",
"content-length": "0",
"connection": "keep-alive"
},
"RetryAttempts": 0
}
}

Update: Now it's possible
AWS documentation:
https://docs.aws.amazon.com/personalize/latest/dg/export-data.html
You can use this AWS CLI command to export only the interactions that were added by PutEvents/PutUsers/PutItems API calls:
aws personalize create-dataset-export-job \
--job-name job name \
--dataset-arn dataset ARN \
--job-output "{\"s3DataDestination\":{\"kmsKeyArn\":\"kms key ARN\",\"path\":\"s3://bucket-name/folder-name/\"}}" \
--role-arn role ARN \
--ingestion-mode PUT
In that case, --ingestion-mode PUT does what you need. From the documentation:
Specify PUT to export only data that you imported incrementally using the console or the PutEvents, PutUsers, or PutItems operations.
So I believe it covers your use case.
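If you would rather script the export from Python than from the CLI, a minimal sketch with boto3 could look like the following. It assumes the boto3 Personalize client's create_dataset_export_job call, whose parameters mirror the CLI flags above; the helper name build_export_job_request and every ARN or path you pass in are placeholders of mine, not something from the question.

```python
from typing import Any, Dict, Optional


def build_export_job_request(
    job_name: str,
    dataset_arn: str,
    role_arn: str,
    s3_path: str,
    kms_key_arn: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for personalize.create_dataset_export_job.

    ingestionMode is fixed to PUT so that only incrementally added records
    (PutEvents, PutUsers, PutItems) are exported.
    """
    destination: Dict[str, str] = {"path": s3_path}
    if kms_key_arn:
        # The KMS key is optional; include it only when one was supplied.
        destination["kmsKeyArn"] = kms_key_arn
    return {
        "jobName": job_name,
        "datasetArn": dataset_arn,
        "roleArn": role_arn,
        "ingestionMode": "PUT",
        "jobOutput": {"s3DataDestination": destination},
    }


# Usage (needs boto3 and AWS credentials; all values are placeholders):
# import boto3
# request = build_export_job_request(
#     "my-export", "<dataset ARN>", "<role ARN>", "s3://bucket-name/folder-name/"
# )
# boto3.client("personalize").create_dataset_export_job(**request)
```

Keeping the request building separate from the API call makes it easy to check the payload before anything is sent to AWS.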
Original answer (before the update above): No, it's not possible
It's simply impossible right now to export this data.
There is no API to retrieve a dump of your Interactions dataset in Personalize.
I believe the Lambda + Firehose workaround is the correct approach.
But how can you test whether PutEvents works?
To make sure that interactions are added through PutEvents, you can use the Filters feature:
https://docs.aws.amazon.com/personalize/latest/dg/filter-expressions.html
Create a new filter with an expression similar to:
EXCLUDE ItemID WHERE Interactions.EVENT_TYPE IN ("your_event_type_name")
This excludes from recommendations any item that the user has previously interacted with through that event type.
Then you can test whether events added through the PutEvents API are recognized correctly:
1. Create the filter expression as described above.
2. Create a campaign for simple recommendations (User-Personalization recipe).
3. Connect the filter to the campaign.
4. Get recommendations for any user and save them somewhere.
5. Call the PutEvents API with any of the recommended items returned in step 4 and the user ID from step 4.
6. Get recommendations again for the same user as in step 4.
If the item you added with the PutEvents call is no longer recommended, you have proof that events added through PutEvents are correctly added to the Interactions dataset.
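The last three steps of that check (get recommendations, send an event, get recommendations again) can be sketched in Python roughly as follows. This assumes the boto3 personalize-runtime get_recommendations and personalize-events put_events calls; the function name put_event_is_reflected, the wait_seconds parameter and all ARNs or IDs are hypothetical. I do not know exactly how quickly a new event shows up in filtering, so the optional wait is a guess you may need to tune.

```python
import time
from datetime import datetime, timezone
from typing import Any


def put_event_is_reflected(
    runtime: Any,
    events: Any,
    campaign_arn: str,
    filter_arn: str,
    tracking_id: str,
    user_id: str,
    session_id: str,
    event_type: str,
    wait_seconds: float = 0.0,
) -> bool:
    """Return True if an item sent via PutEvents drops out of the filtered results."""
    request = {"campaignArn": campaign_arn, "userId": user_id, "filterArn": filter_arn}

    # Fetch the filtered recommendations before sending the event.
    before = runtime.get_recommendations(**request)["itemList"]
    if not before:
        raise ValueError("no recommendations returned, nothing to test with")
    item_id = before[0]["itemId"]

    # Record an interaction with one of the recommended items.
    events.put_events(
        trackingId=tracking_id,
        userId=user_id,
        sessionId=session_id,
        eventList=[
            {
                "eventType": event_type,
                "itemId": item_id,
                "sentAt": datetime.now(timezone.utc),
            }
        ],
    )

    # Assumption: the event may need a moment before the filter sees it.
    time.sleep(wait_seconds)

    # Fetch again and check whether the item has been filtered out.
    after = runtime.get_recommendations(**request)["itemList"]
    return item_id not in {item["itemId"] for item in after}


# Usage (needs boto3 and AWS credentials; all identifiers are placeholders):
# import boto3
# ok = put_event_is_reflected(
#     boto3.client("personalize-runtime"), boto3.client("personalize-events"),
#     "<campaign ARN>", "<filter ARN>", "<tracking ID>", "user-1", "session-1",
#     "your_event_type_name", wait_seconds=5,
# )
```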
What if the PutEvents call doesn't affect recommendations in that case?
Then you are simply providing incorrect values in the API call. Personalize might return a 200 response even if the event provided was invalid.
To fix that, try:
Make sure the date is in the correct format. Personalize might ignore events with very old timestamps if there are many newer events (this is configurable in the Solution config).
Check that you are not passing strange values like "null" or "undefined" for sessionId, userId, or trackingId in the PutEvents params. That might cause Personalize to ignore the event (https://github.com/aws/aws-sdk-js/issues/3371).
Make sure you are passing the correct eventType value (it should match the eventType in the Solution and Filter).
If it still doesn't work, raise a support ticket with AWS and include an example of your PutEvents API call params.
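Since a 200 response does not tell you the event was accepted, it can help to catch the "strange values" case on your side before calling PutEvents. A small sketch of such a check is below; the helper name find_suspicious_fields is mine, and the set of suspicious strings is an assumption ("null" and "undefined" come from the answer above, the empty string and "none" are my additions).

```python
from typing import Dict, List, Optional

# Strings that usually mean a missing value was stringified by accident.
SUSPICIOUS_VALUES = {"", "null", "undefined", "none"}


def find_suspicious_fields(params: Dict[str, Optional[str]]) -> List[str]:
    """Return the PutEvents identifier fields that look missing or stringified."""
    problems: List[str] = []
    for field in ("trackingId", "userId", "sessionId"):
        value = params.get(field)
        if value is None or str(value).strip().lower() in SUSPICIOUS_VALUES:
            problems.append(field)
    return problems


# Usage: refuse to send the event when anything looks off.
# bad = find_suspicious_fields(put_events_params)
# if bad:
#     raise ValueError(f"suspicious PutEvents fields: {bad}")
```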
Are there any simpler solutions?
Well, maybe there are, but in our project we use this approach, and it also tests whether the filtering feature is working correctly. You will probably use filtering in the future anyway, so I believe it's a good enough method.

Related

List all LogGroups using cdk

I am quite new to the CDK, but I'm adding a LogQueryWidget to my CloudWatch Dashboard through the CDK, and I need a way to add all LogGroups ending with a suffix to the query.
Is there a way to either loop through all existing LogGroups and find the ones with the correct suffix, or a way to search through LogGroups?
const queryWidget = new LogQueryWidget({
title: "Error Rate",
logGroupNames: ['/aws/lambda/someLogGroup'],
view: LogQueryVisualizationType.TABLE,
queryLines: [
'fields @message',
'filter @message like /(?i)error/'
],
})
Is there any way I can make logGroupNames contain all LogGroups that end with a specific suffix?
You cannot do that dynamically (i.e. you can't make this work such that if you add a new LogGroup, the query automatically adjusts) without using something like an AWS Lambda function that periodically updates your log query.
However, because CDK is just code, there is nothing stopping you from making an AWS SDK API call inside the code to retrieve all the log groups (see https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/CloudWatchLogs.html#describeLogGroups-property) and then populating logGroupNames accordingly.
That way, when CDK compiles, it will make an API call to fetch LogGroups and then generated CloudFormation will contain the log groups you need. Note that this list will only be updated when you re-synthesize and re-deploy your stack.
Finally, note that there is a limit on how many Log Groups you can query with Log Insights (20 according to https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AnalyzingLogData.html).
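As a rough illustration of the synth-time lookup, here is a sketch in Python (CDK also has Python bindings, though the question uses TypeScript). It assumes the boto3 CloudWatch Logs describe_log_groups paginator; the function name log_groups_with_suffix and the "-prod" suffix in the usage note are hypothetical, and the default cap of 20 is simply the limit quoted above.

```python
from typing import Any, List


def log_groups_with_suffix(logs_client: Any, suffix: str, max_groups: int = 20) -> List[str]:
    """Collect log group names ending with `suffix`, stopping at `max_groups`."""
    names: List[str] = []
    paginator = logs_client.get_paginator("describe_log_groups")
    for page in paginator.paginate():
        for group in page.get("logGroups", []):
            name = group["logGroupName"]
            if name.endswith(suffix):
                names.append(name)
                if len(names) >= max_groups:
                    # Stop early so the query stays within the Logs Insights limit.
                    return names
    return names


# Usage inside a Python CDK stack (runs at synth time, needs AWS credentials):
# import boto3
# names = log_groups_with_suffix(boto3.client("logs"), "-prod")
# LogQueryWidget(title="Error Rate", log_group_names=names, ...)
```

The API only filters by name prefix, so the suffix match has to happen client-side as above.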
If you want to achieve this, you can create a custom resource using the AwsCustomResource and AwsSdkCall classes to make the AWS SDK API call (as mentioned by @Tofig above) as part of the deployment. You can also read data from the API call response and act on it as you want.

Get all items in DynamoDB with API Gateway's Mapping Template

Is there a simple way to retrieve all items from a DynamoDB table using a mapping template in an API Gateway endpoint? I usually use a lambda to process the data before returning it but this is such a simple task that a Lambda seems like an overkill.
I have a table that contains data with the following format:
roleAttributeName roleHierarchyLevel roleIsActive roleName
"admin" 99 true "Admin"
"director" 90 true "Director"
"areaManager" 80 false "Area Manager"
I'm happy with getting the data, doesn't matter the representation as I can later transform it further down in my code.
I've been looking around but all tutorials explain how to get specific bits of data through queries and params like roles/{roleAttributeName} but I just want to hit roles/ and get all items.
All you need to do is:
Create a resource (without curly braces, since we don't need a particular item).
Create a GET method.
Use Scan instead of Query as the Action when configuring the integration request.
Now try Test; you should get the response.
To try it out in Postman, deploy the API first and then use the provided URL in Postman, followed by your resource name.
API Gateway allows you to Proxy DynamoDB as a service. Here you have an interesting tutorial on how to do it (you can ignore the part related to index to make it work).
To retrieve all the items from a table, you can use Scan as the action in API Gateway. Keep in mind that DynamoDB limits the data returned by a single Scan or Query call to 1 MB.
You can also cap each call yourself, before the 1 MB limit applies, by using the Limit parameter.
AWS DynamoDB Scan Reference
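Because of that 1 MB cap, "get all items" really means following LastEvaluatedKey until it disappears. API Gateway's mapping template will only return one page per request, so the loop has to live in the caller. A minimal Python sketch of that loop is below, assuming the boto3 DynamoDB client's scan call; the function name scan_all_items and the table name "roles" are placeholders of mine.

```python
from typing import Any, Dict, Iterator


def scan_all_items(client: Any, table_name: str, page_limit: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield every item in a table by following LastEvaluatedKey across Scan pages."""
    kwargs: Dict[str, Any] = {"TableName": table_name, "Limit": page_limit}
    while True:
        page = client.scan(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            # No key means this was the final page.
            return
        # Resume the next Scan where this page stopped.
        kwargs["ExclusiveStartKey"] = last_key


# Usage (needs boto3 and AWS credentials; "roles" is a placeholder table name):
# import boto3
# for item in scan_all_items(boto3.client("dynamodb"), "roles"):
#     print(item)
```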

Is there a way to create Quicksight analysis purely through code (boto3)?

What I currently have in my Quicksight account is a Data Source (Redshift), some datasets (some Redshift views) and an analysis (graphs and charts that use the datasets). I can view all of these on the AWS Quicksight Console. But when I use boto3 to create a data source and datasets, nothing shows up on the console. They do however show up when I use the list_data_sources and list_data_sets calls.
After this, I need to create all the graphs by code that I created manually. I can't currently find an option to do this through code. There is a 'create_template' api call which is supposed to create a template through an existing Quicksight analysis. But it requires the ARN of the analysis which I can't find.
Any suggestions on what to do?
Note: this only answers why the data sets/sources do not appear in the console. As for the other question, I assume mjgpy3 was of some help.
Summary
Add the permissions at the bottom of this post to your data set and data source in order for them to appear in the console. Make sure to fill in the principal arn with your details.
Details
In order for data sets and data sources to appear in the console when created via the API, you must ensure that the correct permissions have been added to them. Without adding the correct permissions, it is true that the CLI lists them whereas the console does not.
If you have created data sets/sources via the console, you can use the CLI (aws quicksight describe-data-set-permissions and aws quicksight describe-data-source-permissions) to view what permissions AWS gives them so that your account can interact with them.
I've tested this and these are what AWS assigns them as of 25/03/2020.
Data Set permissions:
"permissions": [
{
"Principal": "arn:aws:quicksight:<region>:<aws_account_id>:user/default/{IAM user name}",
"Actions": [
"quicksight:UpdateDataSetPermissions",
"quicksight:DescribeDataSet",
"quicksight:DescribeDataSetPermissions",
"quicksight:PassDataSet",
"quicksight:DescribeIngestion",
"quicksight:ListIngestions",
"quicksight:UpdateDataSet",
"quicksight:DeleteDataSet",
"quicksight:CreateIngestion",
"quicksight:CancelIngestion"
]
}
]
Data Source permissions:
"permissions": [
{
"Principal": "arn:aws:quicksight:<region>:<aws_account_id>:user/default/{IAM user name}",
"Actions": [
"quicksight:UpdateDataSourcePermissions",
"quicksight:DescribeDataSource",
"quicksight:DescribeDataSourcePermissions",
"quicksight:PassDataSource",
"quicksight:UpdateDataSource",
"quicksight:DeleteDataSource"
]
}
]
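To apply these from boto3, something like the sketch below may help. The action lists are copied from the output above; the helper names quicksight_user_arn and owner_permissions are mine, and the "default" namespace is taken from the principal ARN shown. My understanding is that the resulting list can be passed as Permissions to create_data_set or create_data_source, or as GrantPermissions to the update_*_permissions calls, but check that against the boto3 docs for your version.

```python
from typing import Dict, List, Sequence

DATA_SET_ACTIONS = [
    "quicksight:UpdateDataSetPermissions",
    "quicksight:DescribeDataSet",
    "quicksight:DescribeDataSetPermissions",
    "quicksight:PassDataSet",
    "quicksight:DescribeIngestion",
    "quicksight:ListIngestions",
    "quicksight:UpdateDataSet",
    "quicksight:DeleteDataSet",
    "quicksight:CreateIngestion",
    "quicksight:CancelIngestion",
]

DATA_SOURCE_ACTIONS = [
    "quicksight:UpdateDataSourcePermissions",
    "quicksight:DescribeDataSource",
    "quicksight:DescribeDataSourcePermissions",
    "quicksight:PassDataSource",
    "quicksight:UpdateDataSource",
    "quicksight:DeleteDataSource",
]


def quicksight_user_arn(region: str, account_id: str, user_name: str, namespace: str = "default") -> str:
    """Build the QuickSight user ARN used as the permission principal."""
    return f"arn:aws:quicksight:{region}:{account_id}:user/{namespace}/{user_name}"


def owner_permissions(principal_arn: str, actions: Sequence[str]) -> List[Dict[str, object]]:
    """Wrap a principal and its actions in the list shape the API expects."""
    return [{"Principal": principal_arn, "Actions": list(actions)}]


# Usage (needs boto3 and AWS credentials; account, region and user are placeholders):
# import boto3
# principal = quicksight_user_arn("us-east-1", "<aws_account_id>", "<IAM user name>")
# boto3.client("quicksight").update_data_set_permissions(
#     AwsAccountId="<aws_account_id>", DataSetId="<data set id>",
#     GrantPermissions=owner_permissions(principal, DATA_SET_ACTIONS),
# )
```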
It sounds like your smaller question is regarding the ARN of the analysis.
The format of analysis ARNs is
arn:aws:quicksight:$AWS_REGION:$AWS_ACCOUNT_ID:analysis/$ANALYSIS_ID
Where
$AWS_REGION is replaced with the region in which the analysis lives
$AWS_ACCOUNT_ID is replaced with your AWS account ID
$ANALYSIS_ID is replaced with the analysis ID
If you're looking for the $ANALYSIS_ID, it's the GUID-looking string at the end of the analysis URL in QuickSight.
So, if you were on an analysis at the URL
https://quicksight.aws.amazon.com/sn/analyses/018ef6393-2c71-4842-9798-1aa2f0902804
the analysis ID would be 018ef6393-2c71-4842-9798-1aa2f0902804 (this is a fake ID I injected for this example).
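If you need to do this in code, the ID extraction and ARN assembly can be sketched in a few lines of Python. The function names are mine, and the region and account ID in the usage note are placeholders; the ARN layout is the one described above.

```python
from urllib.parse import urlparse


def analysis_id_from_url(url: str) -> str:
    """Take the last path segment of a QuickSight analysis URL as the analysis ID."""
    path = urlparse(url).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


def analysis_arn(region: str, account_id: str, analysis_id: str) -> str:
    """Assemble the analysis ARN from its three parts."""
    return f"arn:aws:quicksight:{region}:{account_id}:analysis/{analysis_id}"


# Usage with the fake URL from this answer (region and account are placeholders):
# analysis_id = analysis_id_from_url(
#     "https://quicksight.aws.amazon.com/sn/analyses/018ef6393-2c71-4842-9798-1aa2f0902804"
# )
# arn = analysis_arn("us-east-1", "123456789012", analysis_id)
```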
Your larger question seems to be whether you can use the create_template API to duplicate your analysis. The answer at this moment (12/16/19) is, unfortunately, no.
You can use the create_dashboard API to publish a Dashboard from a template made with create_template but you can't create an Analysis from a template.
I'm answering this bit just to clarify since you may actually be okay with creating a dashboard (basically the published version of an analysis) rather than another analysis.
There are multiple ways to find the associated analysis ID. Use any of the following:
A dashboard URL includes the dashboard ID. Use this ID to call the describe-dashboard API, and you will see the analysis ARN in the source entity.
Click the "Save as" option on the dashboard, and it will take you to the associated analysis. (You might not see this option if the dashboard was created from a template.)
A dashboard ID can also be found with the list_dashboards API call. Print all the dashboard IDs and names, then match the ID to the given dashboard name. Look at the whole list, because a dashboard ID is unique but a dashboard name is not; multiple dashboards can share the same name.
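The list_dashboards lookup could be sketched like this in Python. It assumes the boto3 QuickSight list_dashboards call with NextToken pagination and a DashboardSummaryList in the response; the function name dashboard_ids_by_name is mine. It returns a list rather than a single ID precisely because names are not unique.

```python
from typing import Any, Dict, List


def dashboard_ids_by_name(quicksight: Any, account_id: str, name: str) -> List[str]:
    """Return the IDs of every dashboard whose name matches, across all pages."""
    ids: List[str] = []
    kwargs: Dict[str, Any] = {"AwsAccountId": account_id}
    while True:
        page = quicksight.list_dashboards(**kwargs)
        for summary in page.get("DashboardSummaryList", []):
            if summary.get("Name") == name:
                ids.append(summary["DashboardId"])
        token = page.get("NextToken")
        if not token:
            return ids
        # Keep paging: the matching dashboard may be on a later page.
        kwargs["NextToken"] = token


# Usage (needs boto3 and AWS credentials; account ID and name are placeholders):
# import boto3
# ids = dashboard_ids_by_name(boto3.client("quicksight"), "<aws_account_id>", "Sales")
```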
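Yes, you can create a Lambda function and trigger it with a cron job:
import boto3
quicksight = boto3.client('quicksight')
# Start a SPICE refresh; replace the quoted placeholders with your own values.
response = quicksight.create_ingestion(
    AwsAccountId='XXXXXXX', DataSetId='YYYY', IngestionId='ZZZZ')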
https://aws.amazon.com/blogs/big-data/automate-dataset-monitoring-in-amazon-quicksight/
https://aws.amazon.com/blogs/big-data/event-driven-refresh-of-spice-datasets-in-amazon-quicksight/
I've been playing with this as well and ran into the same issue. Make sure that your permissions are set up properly for the data source and the data set by referencing the quicksight user as follows:
arn:aws:quicksight:{region}:xxxxxxxxxx:user/default/{user}
I would include all the quicksight permissions found in the docs to start with and shave down from there. If nothing else, create the data source/set from the console, and then use the describe-* CLI call to see what they use.
It's kind of wonky.

How to download publicly available pdf and png files from S3 with AppSync

I'm fairly new to GraphQL and AWS AppSync, and I'm running into an issue downloading files (PDFs and PNGs) from a public S3 bucket via AWS AppSync. I've looked at dozens of tutorials and dug through a mountain of documentation, and I'm just not certain what's going on at this point. This may be nothing more than a misunderstanding about the nature of GraphQL or AppSync functionality, but I'm completely stumped.
For reference, I've heavily sourced from other posts like How to upload file to AWS S3 using AWS AppSync (specifically, from the suggestions by the accepted answer author), but none of the solutions (or the variations I've attempted) are working.
The Facts
S3 bucket is publicly accessible – i.e., included folders and files are not tied to individual users with Cognito credentials
Files are uploaded to S3 outside of AppSync (so there's no GraphQL mutation); it's a manual file upload
Schema works for all other queries and mutations
We are using AWS Cognito to authenticate users and queries
Abridged Schema and DynamoDB Items
Here's an abridged version of the relevant GraphQL schema types:
type MetroCard implements TripCard {
id: ID!
cardType: String!
resIds: String!
data: MetroData!
file: S3Object
}
type MetroData implements DataType {
sourceURL: String!
sourceFileURL: String
metroName: String!
}
type S3Object {
bucket: String!
region: String!
key: String!
}
Metadata about the files is stored in DynamoDB and looks something like this:
{
"data": {
"metroName": "São Paulo Metro",
"sourceFileURL": "http://www.metro.sp.gov.br/pdf/mapa-da-rede-metro.pdf",
"sourceURL": "http://www.metro.sp.gov.br/en/your-trip/index.aspx"
},
"file": {
"bucket": "test-images",
"key": "some_folder/sub_folder/bra-sbgr-metro-map.pdf",
"region": "us-east-1"
},
"id": "info/en/bra/sbgr/metro"
}
VTL Request/Response Resolvers
For our getMetroCard(id: ID!): MetroCard query, the mapping templates are pretty vanilla. The request template is a standard query on a DynamoDB table. The response template is a basic $util.toJson($ctx.result).
For the field-level resolver on MetroCard.file, we've attached a local data source with an empty {} payload for the request and the following for the response (see referenced link for reasoning):
$util.toJson($util.dynamodb.fromS3ObjectJson($context.source.file)) // we've played with this bit in a couple of ways, including simply returning $context.result but no change
Results
All of the query fields resolve appropriately; however, the file field inevitably always returns null no matter what the field-level resolver is mapped to. Interestingly, I've noticed in the CloudWatch logs the value of context.result does change from null to {} with the above mapping template.
Questions
Given the above, I have several questions:
Does AppSync file download require files to be uploaded to S3 with user credentials through a mutation with a complex object handler in order to make them retrievable?
What should a successful response look like in the AppSync console return – i.e., I have no client implementation (like a React Native app) to test successful file downloads? More directly, is it actually retrieving the files, and I just don't know it? (Note: I actually have tested it briefly with a React Native client, but nothing rendered so I've just been using the AppSync console returns as direction ever since.)
Does it make more sense to remove the file download process entirely from our schema? (I'm assuming the answers I need reveal that AppSync just wasn't built for file transfer like this, and so we'll need to rethink our approach.)
Update
I've started playing around with the data source for MetroCard.file per the suggestion of this recent post https://stackoverflow.com/a/52142178/5989171. If I make the data source the same as the database storing the file metadata, I now get the error mentioned in the ref but his solution doesn't seem to be working for me. Specifically, I now get the following:
"message": "Value for field '$[operation]' not found."
Our Solution
For our use case, we've decided to go ahead and use the AWS Amplify Storage module as suggested here: https://twitter.com/presbaw/status/1040800650790002689. Despite that, I'm keeping this question open and unanswered, because I'm just genuinely curious about what I'm not understanding here, and I have a feeling I'm not the only one!
$util.toJson($util.dynamodb.fromS3ObjectJson($context.source.file))
You can only use this if your DynamoDB table stores the file field in this format: {"s3":{"key":"file.jpg","bucket":"bucket_name/folder","region":"us-east-1"}}

Query AWS SNS Endpoints by User Data

Simple question, but I suspect it doesn't have a simple or easy answer. Still, worth asking.
We're creating an implementation for push notifications using AWS with our Web Server running on EC2, sending messages to a queue on SQS, which is dealt with using Lambda, which is sent finally to SNS to be delivered to the iOS/Android apps.
The question I have is this: is there a way to query SNS endpoints based on the custom user data that you can provide on creation? The only way I see to do this so far is to list all the endpoints in a given platform application, and then search through that list for the user data I'm looking for... however, a more direct approach would be far better.
Why I want to do this is simple: if I could attach a User Identifier to these Device Endpoints, and query based on that, I could avoid completely having to save the ARN to our DynamoDB database. It would save a lot of implementation time and complexity.
Let me know what you guys think, even if what you think is that this idea is impractical and stupid, or if searching through all of them is the best way to go about this!
Cheers!
There isn't the ability to have a "where" clause in ListTopics. I see two possibilities:
Create a new SNS topic per user that has some identifiable id in it. So, for example, the ARN would be something like "arn:aws:sns:us-east-1:123456789:know-prefix-user-id". The obvious downside is that you have the potential for a boat load of SNS topics.
Use a service designed for this type of usage like PubNub. Disclaimer - I don't work for PubNub or own stock but have successfully used it in multiple projects. You'll be able to target one or many users this way.
According to the AWS documentation, if you try to create a new platform endpoint with the same User Data, you should get a response with an exception that includes the ARN of the existing PlatformEndpoint.
It's definitely not ideal, but it would be a roundabout way of querying the User Data endpoint attribute via an exception.
// Query CustomUserData by exception
CreatePlatformEndpointRequest cpeReq = new CreatePlatformEndpointRequest()
    .withPlatformApplicationArn(applicationArn)
    .withToken("dummyToken")
    .withCustomUserData("username");
CreatePlatformEndpointResult cpeRes = client.createPlatformEndpoint(cpeReq);
You should get an exception with the ARN if an endpoint with the same withCustomUserData exists.
Then you just use that ARN and away you go.
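Pulling the ARN out of the exception text could look roughly like this in Python. The wording of the error message is an assumption on my part (I am going from memory of the "Endpoint <arn> already exists" phrasing), so treat the regular expression as something to verify against a real response; the function name existing_endpoint_arn and the sample ARN are hypothetical.

```python
import re
from typing import Optional

# Assumed message shape: "... Endpoint <arn> already exists with the same Token ..."
_EXISTING_ENDPOINT = re.compile(r"Endpoint (arn:aws:sns[^ ]+) already exists")


def existing_endpoint_arn(error_message: str) -> Optional[str]:
    """Extract the existing endpoint ARN from an error message, if present."""
    match = _EXISTING_ENDPOINT.search(error_message)
    return match.group(1) if match else None


# Usage with boto3 (needs AWS credentials; ARNs and token are placeholders):
# import boto3
# sns = boto3.client("sns")
# try:
#     sns.create_platform_endpoint(
#         PlatformApplicationArn="<application ARN>",
#         Token="dummyToken",
#         CustomUserData="username",
#     )
# except sns.exceptions.InvalidParameterException as err:
#     arn = existing_endpoint_arn(err.response["Error"]["Message"])
```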