Analytics with AWS Amplify and Pinpoint - amazon-web-services

I'm currently implementing web and mobile analytics via AWS Amplify and Pinpoint.
I've noticed that the analytics section of the Amplify docs for JS differs from the related sections for iOS and Android, specifically in the extra parameters/keys/values that an event object can include.
The JS docs specify an attributes property, while the iOS and Android versions both specify something referred to as properties. See the snippets below:
JavaScript
Analytics.record({
  name: 'albumVisit',
  // Attribute values must be strings
  attributes: { genre: '', artist: '' }
});
iOS
func recordEvents() {
    let properties: AnalyticsProperties = [
        "eventPropertyStringKey": "eventPropertyStringValue",
        "eventPropertyIntKey": 123,
        "eventPropertyDoubleKey": 12.34,
        "eventPropertyBoolKey": true
    ]
    let event = BasicAnalyticsEvent(name: "eventName", properties: properties)
    Amplify.Analytics.record(event: event)
}
Android
val event = AnalyticsEvent.builder()
    .name("PasswordReset")
    .addProperty("Channel", "SMS")
    .addProperty("Successful", true)
    .addProperty("ProcessDuration", 792)
    .addProperty("UserAge", 120.3)
    .build()
Amplify.Analytics.recordEvent(event)
Are attributes and properties interchangeable? I'm going to be using Amazon QuickSight to build analytics dashboards; ultimately the collected event data will end up on S3 and be queried using Athena. I'll need to define a schema for the table in Athena, and I'm uncertain, based on the above, what format I can expect the attributes/properties data to be in. It seems like Amazon intended attributes and properties to contain the same type of event-related information, but I'm confused as to why the naming convention differs between platforms.

On iOS & Android, Amplify Analytics' String & Boolean properties are mapped to Pinpoint attributes. Double & Integer properties are mapped to Pinpoint metrics.
Or, expressed as a table:
| Amplify property type (in) | Pinpoint type (out) |
| --- | --- |
| String | Attribute |
| Boolean | Attribute |
| Integer | Metric |
| Double | Metric |
(Refer to Amplify's Android code, iOS code.)
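For planning the Athena schema, the mapping above can be sketched as a small function. This is only an illustration of the rule, not Amplify's actual code: the function name split_properties is made up, and the lowercase "true"/"false" string form for booleans is an assumption you should verify against real exported events.

```python
from typing import Any, Dict, Tuple


def split_properties(properties: Dict[str, Any]) -> Tuple[Dict[str, str], Dict[str, float]]:
    """Split Amplify-style properties into Pinpoint-style attributes and metrics."""
    attributes: Dict[str, str] = {}
    metrics: Dict[str, float] = {}
    for key, value in properties.items():
        # bool must be checked before int, because bool is a subclass of int in Python.
        if isinstance(value, bool):
            # Assumption: booleans are stringified as "true" / "false".
            attributes[key] = "true" if value else "false"
        elif isinstance(value, str):
            attributes[key] = value
        elif isinstance(value, (int, float)):
            # Integer and Double properties both end up as numeric metrics.
            metrics[key] = float(value)
        else:
            raise TypeError(f"Unsupported property type for {key!r}: {type(value).__name__}")
    return attributes, metrics


attributes, metrics = split_properties(
    {"Channel": "SMS", "Successful": True, "ProcessDuration": 792, "UserAge": 120.3}
)
print(attributes)
print(metrics)
```

Under these assumptions, string and boolean properties would land in a string-valued attributes map and numeric properties in a double-valued metrics map, which suggests two map columns in the Athena table.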

On the off chance someone else notices the lack of explanation around this in the docs: any property added via the mobile SDKs will appear within the attributes object in the recorded event payload.

Related

Sagemaker Data Capture does not write files

I want to enable data capture for a specific endpoint (so far, only via the console). The endpoint works fine and also logs & returns the desired results. However, no files are written to the specified S3 location.
Endpoint Configuration
The endpoint is based on a training job with a scikit-learn classifier. It has only one variant, which uses an ml.m4.xlarge instance type. Data Capture is enabled with a sampling percentage of 100%. As data capture storage locations I tried s3://<bucket-name> as well as s3://<bucket-name>/<some-other-path>. For the "Capture content type" I tried leaving everything blank, as well as setting text/csv in "CSV/Text" and application/json in "JSON".
Endpoint Invocation
The endpoint is invoked in a Lambda function with a client. Here's the call:
sagemaker_body_source = {
    "segments": segments,
    "language": language
}
payload = json.dumps(sagemaker_body_source).encode()
response = self.client.invoke_endpoint(EndpointName=endpoint_name,
                                       Body=payload,
                                       ContentType='application/json',
                                       Accept='application/json')
result = json.loads(response['Body'].read().decode())
return result["predictions"]
Internally, the endpoint uses a Flask API with an /invocations path that returns the result.
Logs
The endpoint itself works fine and the Flask API is logging input and output:
INFO:api:body: {'segments': [<strings...>], 'language': 'de'}
INFO:api:output: {'predictions': [{'text': 'some text', 'label': 'some_label'}, ....]}
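Data capture can be enabled by using the SDK as shown below:
from sagemaker.model_monitor import DataCaptureConfig

# Capture 100% of requests and responses to the given S3 location
data_capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    destination_s3_uri=s3_capture_upload_path,
)
# Pass the capture config when the endpoint is created
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m4.xlarge",
    endpoint_name=endpoint_name,
    data_capture_config=data_capture_config,
)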
Make sure to reference your data capture config in your endpoint creation step. I've always seen this method work. Can you try this and let me know? Reference notebook
NOTE - I work for AWS SageMaker, but my opinions are my own.
So the issue seemed to be related to the IAM role. The default role (ModelEndpoint-Role) does not have permission to write files to S3. It worked via the SDK because that uses a different role in SageMaker Studio. I did not receive any error message about this.
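For anyone hitting the same silent failure, here is a rough sketch of the kind of policy document the endpoint's execution role would need. It assumes the only missing permission is s3:PutObject on the capture prefix; the bucket name and prefix are placeholders, and your setup may need more (for example KMS permissions if the bucket is encrypted).

```python
import json


def data_capture_write_policy(bucket: str, prefix: str = "") -> dict:
    """Build an IAM policy document that allows object writes under a capture prefix."""
    clean_prefix = prefix.strip("/")
    # With no prefix, allow writes anywhere in the bucket.
    path = f"{clean_prefix}/*" if clean_prefix else "*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumption: writing capture files only needs PutObject.
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{path}",
            }
        ],
    }


# "my-capture-bucket" and "datacapture" are placeholder names.
print(json.dumps(data_capture_write_policy("my-capture-bucket", "datacapture"), indent=2))
```

The printed JSON can be attached to the endpoint's role as an inline policy, after which new invocations should start producing capture files.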

Is it possible to export all recorded events together in AWS Pinpoint?

I am tracking user behaviour within my mobile react-native application. My configurations are working and I am recording an event like this:
await Analytics.record({
  name: "TestEvent",
  attributes: {
    user: user,
    attr: "test",
  },
});
The events are successfully recorded and I can see them in the Pinpoint dashboard.
The problem I am facing now is that I can only download the CSV file for a single event at a time. It also seems necessary to filter on each individual recorded attribute to get the set of CSV files belonging to a single event.
This is very cumbersome. I would like to be able to download all records at once.
Is this possible within AWS Pinpoint? I also tried to set up an Amazon Kinesis data stream, but there I can only access operations such as GET/PUT records, not the recorded events or event types.

AWS Personalize: Dumping User-item interaction Dataset Created By PutEvent

Following the AWS Personalize documentation, I successfully imported my datasets (User, Item, Interaction) from S3, created an event tracker, trained the model, and deployed the campaign. The solution works without any issue and I get recommendations.
I rely on PutEvents to add new user-item interaction events. I also dump those interaction events to my S3 bucket using Lambda + Firehose. But I am wondering whether AWS Personalize internally creates or augments the original user-item interaction dataset. How can I access and download the revised version of the dataset? I cannot see any new dataset in "Dataset groups > Datasets" other than my original 3 datasets...
I would prefer to dump it regularly from AWS Personalize to my S3 storage rather than using my own Lambda + Firehose solution.
This is the output of my PutEvents call. I see a 200, but I'm not sure whether it worked. Should I see a new dataset created by PutEvents in "Dataset groups > Datasets"?
{
  "ResponseMetadata": {
    "RequestId": "a6c96496-cbd6-4ad8-9183-371d1794cbd8",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/json",
      "date": "Mon, 04 Jan 2021 18:04:28 GMT",
      "x-amzn-requestid": "a6c96496-cbd6-4ad8-9183-371d1794cbd8",
      "content-length": "0",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}
Update: Now it's possible
AWS documentation:
https://docs.aws.amazon.com/personalize/latest/dg/export-data.html
You can use this AWS CLI command to export only the interactions that were added by PutEvents/PutUsers/PutItems API calls:
aws personalize create-dataset-export-job \
  --job-name job name \
  --dataset-arn dataset ARN \
  --job-output "{\"s3DataDestination\":{\"kmsKeyArn\":\"kms key ARN\",\"path\":\"s3://bucket-name/folder-name/\"}}" \
  --role-arn role ARN \
  --ingestion-mode PUT
Here, --ingestion-mode PUT restricts the export. As the documentation puts it:
Specify PUT to export only data that you imported incrementally using the console or the PutEvents, PutUsers, or PutItems operations.
So I believe it covers your use case.
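If you would rather run the export on a schedule from Python, the same request can be sketched with boto3. The helper name build_export_job_request is made up for illustration and all ARNs and paths in the usage comment are placeholders; the keyword arguments mirror the CLI flags above.

```python
from typing import Any, Dict, Optional


def build_export_job_request(
    job_name: str,
    dataset_arn: str,
    role_arn: str,
    s3_path: str,
    kms_key_arn: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for Personalize's create_dataset_export_job call."""
    destination: Dict[str, str] = {"path": s3_path}
    if kms_key_arn:
        # The KMS key is optional; include it only when one is given.
        destination["kmsKeyArn"] = kms_key_arn
    return {
        "jobName": job_name,
        "datasetArn": dataset_arn,
        "roleArn": role_arn,
        # PUT limits the export to incrementally added records.
        "ingestionMode": "PUT",
        "jobOutput": {"s3DataDestination": destination},
    }


# Usage (requires boto3 and AWS credentials; values are placeholders):
#   import boto3
#   personalize = boto3.client("personalize")
#   personalize.create_dataset_export_job(**build_export_job_request(
#       "interactions-export", "dataset ARN", "role ARN", "s3://bucket-name/folder-name/"))
```

The role passed here needs write access to the destination bucket, just like the role used for the original import needed read access.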
No, it's not possible
It's simply impossible right now to export this data.
There is no API to retrieve a dump of your Interactions dataset in Personalize.
I believe the Lambda + Firehose workaround is the correct approach.
But how to test whether PutEvents works?
To make sure that interactions are actually added through PutEvents, you can use the Filters feature:
https://docs.aws.amazon.com/personalize/latest/dg/filter-expressions.html
Create a new filter with an expression similar to this:
EXCLUDE ItemID WHERE Interactions.EVENT_TYPE IN ("your_event_type_name")
This excludes from recommendations any item that the user has previously interacted with.
Then you can test whether events added through the PutEvents API are recognized correctly:
1. Create a filter expression as described above.
2. Create any campaign for simple recommendations (User-Personalization recipe).
3. Connect the filter to the campaign.
4. Get recommendations for any user and save them somewhere.
5. Call the PutEvents API with any of the recommended items returned in step 4 and the user ID from step 4.
6. Get recommendations again for the same user as in step 4.
If the item you added with the PutEvents call is no longer recommended, you have proof that events added through PutEvents are correctly added to the Interactions dataset.
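The check above can be sketched in Python. The function names are made up for illustration; runtime and events are expected to be boto3 clients for personalize-runtime and personalize-events (passed in so the logic can also be tried with stand-ins), and the ARNs, tracking ID, and IDs are whatever your own setup uses.

```python
from datetime import datetime, timezone
from typing import Any, List


def recommended_item_ids(runtime: Any, campaign_arn: str, filter_arn: str, user_id: str) -> List[str]:
    """Return the item IDs currently recommended for a user with the filter applied."""
    response = runtime.get_recommendations(
        campaignArn=campaign_arn, filterArn=filter_arn, userId=user_id, numResults=25
    )
    return [item["itemId"] for item in response["itemList"]]


def put_events_is_recognized(
    runtime: Any,
    events: Any,
    campaign_arn: str,
    filter_arn: str,
    tracking_id: str,
    user_id: str,
    session_id: str,
    event_type: str,
) -> bool:
    """Record an event for a recommended item and check that the filter then hides it."""
    before = recommended_item_ids(runtime, campaign_arn, filter_arn, user_id)
    if not before:
        raise ValueError("No recommendations returned, nothing to test with")
    item_id = before[0]
    events.put_events(
        trackingId=tracking_id,
        userId=user_id,
        sessionId=session_id,
        eventList=[
            {"eventType": event_type, "itemId": item_id, "sentAt": datetime.now(timezone.utc)}
        ],
    )
    # Assumption: the event is already visible here; in practice you may need to wait and retry.
    after = recommended_item_ids(runtime, campaign_arn, filter_arn, user_id)
    return item_id not in after
```

With real clients you would create them via boto3.client("personalize-runtime") and boto3.client("personalize-events"). A False result right after the call is not conclusive, since the event may simply not be reflected yet.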
What if the PutEvents call doesn't affect recommendations in that case?
Then you are most likely providing incorrect values in the API call. Personalize might return a 200 response even if the event provided was invalid.
To fix that, try the following:
Make sure the date is in the correct format. Personalize might ignore events with very old timestamps if there are many newer events (this can be configured in the Solution config).
Check that you are not passing strange values like "null" or "undefined" for sessionId, userId, or trackingId in the PutEvents params. This might cause Personalize to ignore the event (https://github.com/aws/aws-sdk-js/issues/3371)
Make sure you are passing the correct eventType value (it should match the eventType in the Solution and the Filter).
If it still doesn't work, raise a support ticket with AWS, including an example of your PutEvents API call params.
Are there any simpler solutions?
Well, maybe there are, but in our project we use this approach, and it also tests whether the filtering feature is working correctly. You will probably use filtering in the future anyway, so I believe it's a good enough method.
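The check for strange ID values can be done before the call is made. A minimal sketch, assuming the params are held in a dict with the same key names as the PutEvents request; the function name and the exact set of suspicious strings are my own choice, not anything Personalize defines.

```python
from typing import Any, Dict, List

# Stringified "empty" values that typically come from serializing a missing variable.
SUSPICIOUS_VALUES = {"", "null", "undefined", "none"}


def find_suspicious_params(params: Dict[str, Any]) -> List[str]:
    """Return the names of ID parameters that are missing or look like stringified nulls."""
    problems: List[str] = []
    for name in ("trackingId", "userId", "sessionId"):
        value = params.get(name)
        if value is None or str(value).strip().lower() in SUSPICIOUS_VALUES:
            problems.append(name)
    return problems


print(find_suspicious_params({"trackingId": "abc", "userId": "undefined", "sessionId": None}))
```

Logging or rejecting events for which this returns a non-empty list makes the problem visible, since the API itself may still answer with a 200.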

How to get a metric sample from the Monitoring APIs

I took a very careful look at the Monitoring API. As far as I have read, it is possible to use gcloud to create and edit monitoring policies (using the Alerting API).
However, it seems gcloud can only create and edit policies, not read the results of such policies. On this page I read about these options:
Creating new policies
Deleting existing policies
Retrieving specific policies
Retrieving all policies
Modifying existing policies
On the other hand, in the description of the result of a failed request I read:
Summary of the result of a failed request to write data to a time series.
So it rings a bell that I can get a list of results, such as all failed write requests during some period. But how?
My straightforward question is: can I somehow either listen to alert events or get a list of alert results through Monitoring API v3?
I see tag_firestore_instance is somehow related to Firestore, but how do I use it, and which information can I search for? I can't find anywhere how to use it, whether as a plain GET (e.g. Postman/curl) or from the gcloud shell.
PS: This question was originally posted in a Google Group, but I was encouraged to ask here.
*** Edited after Alex's suggestion
I have an Angular page listening to a document from my Firestore database:
export class AppComponent {
  public transfers: Observable<any[]>;
  transferCollectionRef: AngularFirestoreCollection<any>;
  constructor(public auth: AngularFireAuth, public db: AngularFirestore) {
    this.listenSingleTransferWithToken();
  }
  async listenSingleTransferWithToken() {
    await this.auth.signInWithCustomToken("eyJ ... CVg");
    this.transferCollectionRef = this.db.collection<any>('transfer', ref => ref.where("id", "==", "1"));
    this.transfers = this.transferCollectionRef.snapshotChanges().map(actions => {
      return actions.map(action => {
        const data = action.payload.doc.data();
        const id = action.payload.doc.id;
        return { id, ...data };
      });
    });
  }
}
So, I understand there should be at least one read count returned for:
name: projects/firetestjimis
filter: metric.type = "firestore.googleapis.com/document/read_count"
interval.endTime: 2020-05-07T15:09:17Z
It was a little difficult to follow what you were saying, but here's what I've figured out.
This is a list of available Firestore metrics: https://cloud.google.com/monitoring/api/metrics_gcp#gcp-firestore
You can then pass these metric types to this API
https://cloud.google.com/monitoring/api/ref_v3/rest/v3/projects.timeSeries/list
On that page, I used the "Try This API" tool on the right side and filled in the following
name = projects/MY-PROJECT-ID
filter = metric.type = "firestore.googleapis.com/api/request_count"
interval.endTime = 2020-05-05T15:01:23.045123456Z
In Chrome's inspector, I can see that this is the GET request that the tool made:
https://content-monitoring.googleapis.com/v3/projects/MY-PROJECT-ID/timeSeries?filter=metric.type%20%3D%20%22firestore.googleapis.com%2Fapi%2Frequest_count%22&interval.endTime=2020-05-05T15%3A01%3A23.045123456Z&key=API-KEY-GOES-HERE
EDIT:
The above returned 200, but with an empty JSON payload.
We also needed to add the following entry to get data to populate:
interval.startTime = 2020-05-04T15:01:23.045123456Z
Also try going to console.cloud.google.com/monitoring/metrics-explorer, typing firestore in the "Find resource type and metric" box, and checking whether Google's own dashboards have data populating. (This confirms that there is actually data there for you to fetch.)
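To avoid forgetting one of the two interval fields, the request URL can be assembled in code. This is a sketch only: the helper name is made up, it targets monitoring.googleapis.com (the documented host, rather than the content-monitoring host the "Try This API" tool used), and authentication (an API key or OAuth token) still has to be added to the request.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def time_series_list_url(project_id: str, metric_type: str, end: datetime, window: timedelta) -> str:
    """Build a projects.timeSeries.list URL covering the window that ends at `end`."""
    timestamp_format = "%Y-%m-%dT%H:%M:%SZ"  # RFC 3339, UTC
    params = {
        "filter": f'metric.type = "{metric_type}"',
        # Both ends of the interval are set; endTime alone returned an empty payload.
        "interval.startTime": (end - window).strftime(timestamp_format),
        "interval.endTime": end.strftime(timestamp_format),
    }
    return f"https://monitoring.googleapis.com/v3/projects/{project_id}/timeSeries?{urlencode(params)}"


end_time = datetime(2020, 5, 5, 15, 1, 23, tzinfo=timezone.utc)
print(
    time_series_list_url(
        "MY-PROJECT-ID",
        "firestore.googleapis.com/api/request_count",
        end_time,
        timedelta(days=1),
    )
)
```

Swapping the metric type for firestore.googleapis.com/document/read_count gives the request for the read counts asked about in the question.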

How to download publicly available pdf and png files from S3 with AppSync

I'm fairly new to GraphQL and AWS AppSync, and I'm running into an issue downloading files (PDFs and PNGs) from a public S3 bucket via AWS AppSync. I've looked at dozens of tutorials and dug through a mountain of documentation, and I'm just not certain what's going on at this point. This may be nothing more than a misunderstanding about the nature of GraphQL or AppSync functionality, but I'm completely stumped.
For reference, I've heavily sourced from other posts like How to upload file to AWS S3 using AWS AppSync (specifically, from the suggestions by the accepted answer author), but none of the solutions (or the variations I've attempted) are working.
The Facts
S3 bucket is publicly accessible – i.e., included folders and files are not tied to individual users with Cognito credentials
Files are uploaded to S3 outside of AppSync (so there's no GraphQL mutation); it's a manual file upload
Schema works for all other queries and mutations
We are using AWS Cognito to authenticate users and queries
Abridged Schema and DynamoDB Items
Here's an abridged version of the relevant GraphQL schema types:
type MetroCard implements TripCard {
  id: ID!
  cardType: String!
  resIds: String!
  data: MetroData!
  file: S3Object
}
type MetroData implements DataType {
  sourceURL: String!
  sourceFileURL: String
  metroName: String!
}
type S3Object {
  bucket: String!
  region: String!
  key: String!
}
Metadata about the files is stored in DynamoDB and looks something like this:
{
  "data": {
    "metroName": "São Paulo Metro",
    "sourceFileURL": "http://www.metro.sp.gov.br/pdf/mapa-da-rede-metro.pdf",
    "sourceURL": "http://www.metro.sp.gov.br/en/your-trip/index.aspx"
  },
  "file": {
    "bucket": "test-images",
    "key": "some_folder/sub_folder/bra-sbgr-metro-map.pdf",
    "region": "us-east-1"
  },
  "id": "info/en/bra/sbgr/metro"
}
VTL Request/Response Resolvers
For our getMetroCard(id: ID!): MetroCard query, the mapping templates are pretty vanilla. The request template is a standard query on a DynamoDB table. The response template is a basic $util.toJson($ctx.result).
For the field-level resolver on MetroCard.file, we've attached a local data source with an empty {} payload for the request and the following for the response (see referenced link for reasoning):
$util.toJson($util.dynamodb.fromS3ObjectJson($context.source.file)) // we've played with this bit in a couple of ways, including simply returning $context.result but no change
Results
All of the query fields resolve appropriately; however, the file field inevitably always returns null no matter what the field-level resolver is mapped to. Interestingly, I've noticed in the CloudWatch logs the value of context.result does change from null to {} with the above mapping template.
Questions
Given the above, I have several questions:
Does AppSync file download require files to be uploaded to S3 with user credentials through a mutation with a complex object handler in order to make them retrievable?
What should a successful response look like in the AppSync console return – i.e., I have no client implementation (like a React Native app) to test successful file downloads? More directly, is it actually retrieving the files, and I just don't know it? (Note: I actually have tested it briefly with a React Native client, but nothing rendered so I've just been using the AppSync console returns as direction ever since.)
Does it make more sense to remove the file download process entirely from our schema? (I'm assuming the answers I need reveal that AppSync just wasn't built for file transfer like this, and so we'll need to rethink our approach.)
Update
I've started playing around with the data source for MetroCard.file per the suggestion of this recent post https://stackoverflow.com/a/52142178/5989171. If I make the data source the same as the database storing the file metadata, I now get the error mentioned in the ref but his solution doesn't seem to be working for me. Specifically, I now get the following:
"message": "Value for field '$[operation]' not found."
Our Solution
For our use case, we've decided to go ahead and use the AWS Amplify Storage module as suggested here: https://twitter.com/presbaw/status/1040800650790002689. Despite that, I'm keeping this question open and unanswered, because I'm just genuinely curious about what I'm not understanding here, and I have a feeling I'm not the only one!
$util.toJson($util.dynamodb.fromS3ObjectJson($context.source.file))
You can only use this if DynamoDB saves the file field in this format: {"s3":{"key":"file.jpg","bucket":"bucket_name/folder","region":"us-east-1"}}
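Since the files in the question are uploaded manually, the metadata items would have to be rewritten into that shape by hand. A rough sketch of the conversion, assuming the field is stored as a JSON string in the format shown above (worth double-checking against an item written by AppSync itself); the function name is made up.

```python
import json
from typing import Dict


def to_s3_object_string(file: Dict[str, str]) -> str:
    """Wrap a plain {bucket, key, region} map in the {"s3": {...}} shape, serialized as JSON."""
    wrapped = {
        "s3": {
            "key": file["key"],
            "bucket": file["bucket"],
            "region": file["region"],
        }
    }
    return json.dumps(wrapped)


# The plain map from the question's DynamoDB item.
plain_file = {
    "bucket": "test-images",
    "key": "some_folder/sub_folder/bra-sbgr-metro-map.pdf",
    "region": "us-east-1",
}
print(to_s3_object_string(plain_file))
```

With the item in the question, the file field is a plain map without the s3 wrapper, which would explain why the resolver has nothing to parse and returns an empty result.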