I've been working on an application hosted on the AWS cloud that is part of a data pipeline. The application processes events from EventBridge, does some data mapping and then puts the result on a Kinesis stream.
The incoming event payload looks something like this (truncated for readability):
{
"version": "0",
"id": "9a0f9e20-c518-a968-7fa6-1d8038a5bcfc",
"detail-type": "Some sort of event",
....
}
and the event put onto the Kinesis stream looks something like:
{
"eventId": "9a0f9e20-c518-a968-7fa6-1d8038a5bcfc",
"eventTime": "2021-04-08T06:19:47.683Z",
"eventType": "created",
...
}
I looked at the "id" attribute on the incoming event and at first glance it looks like a UUID. I put a few examples into an online validator and it came back as a valid UUID. Since it is a UUID and is supposed to be "universally unique" I thought I might just reuse that ID for the "eventId" attribute of the outgoing payload. I thought that might even make it easier to trace events back to the source.
However, when I began integration testing I started to notice alarms going off on unrelated services. There were validation errors happening all over the place. It turns out that the downstream services didn't like the format of "eventId".
The downstream services use the "uuid" NPM module to validate UUIDs in our event envelopes and it seems like it doesn't like the UUIDs that come from AWS. To make sure that I had diagnosed the problem correctly I fired up a node REPL and tried to validate one of the UUIDs that came through and sure enough it came back as invalid!
> const uuid = require('uuid');
> uuid.validate("9a0f9e20-c518-a968-7fa6-1d8038a5bcfc")
false
I then checked the regex that the 'uuid' module was using to do the validation and I noticed that it was checking for the numbers 1-5 in the first character of the third group of the UUID.
Confused, I checked the Wikipedia page for UUIDs and discovered that the version field of the UUIDs coming from AWS is "a", instead of one of the expected version numbers (1-5).
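To double-check what the validator objects to, here is a rough Python sketch of the same check (an approximation of the kind of regex the module uses, not the exact pattern it ships with):

import re

# Approximation of the strict RFC 4122 pattern: the third group must start
# with a version digit 1-5 and the fourth group with 8, 9, a or b.
RFC4122_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$",
    re.IGNORECASE,
)

event_id = "9a0f9e20-c518-a968-7fa6-1d8038a5bcfc"  # the EventBridge "id" from above

print(event_id.split("-")[2][0])         # 'a' -> not in the 1-5 range the regex expects
print(bool(RFC4122_RE.match(event_id)))  # False, the same verdict as uuid.validate()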
I have a few related questions:
Why does AWS have its own UUID version?
Is it even a UUID?
Why would AWS go and violate the principle of least astonishment like that? Surely it's easier to just use a regular UUID?
I'm hoping someone has an interesting story about how AWS had to invent their own UUID version to deal with some epic engineering problem that only happens at their scale, but I suppose I'll settle for a simpler answer.
This is a real challenge. I have been successful in everything up to this point in Fleet provisioning on an embedded device. I have subscribed and published to topics and received new certificates and keys. But, when I take the certificateOwnershipToken that has been given to me and I try to trigger a DeviceRegistration, I get:
{"statusCode":400,"errorCode":"InvalidCertificateOwnershipToken","errorMessage":"Certificate ownership token cannot be empty."}
My token is 466 characters long and I send it with 2 other items in this string:
{"certificateOwnershipToken":"eyF1ZXJzaW9uIjoiMjAxOTEwMjMiLCJjaXBoZXIiOiJBaURqMUdYMjBiVTUwQTFsTUV4eEJaM3ZXREU1dXZSSURoTy80cGpLS1o1VkVHdlZHQm81THNaS1VydG0zcTdoZGtVR0l1cmJZS0dLVkx2dTZNL2ViT2pkVFdIeDEwU3o3aFZPeExERkxWVlJ4OUIvL2RzcXRIeVp1WVo2RXZoU1k0L0txQ0doZ1lyRklwZGlLK05pUlNHMXlLQXJUSGJXSkNlVUxHcHRPWHJtdHJaNWJMUyt1MHFUcjNJVnlVLzNpcGZVVm1PanpmL3NCYzdSNkNyVGJPZ05Nc2xmOXdHSVRWM0tPUjd1aFFSbnZySnY0S1ZtU2hYc2REODI4K1crRE1xYnRYZGUxSXlJU29XZTVTSHh6NVh2aFF3OGl3V09FSzBwbG15Zi82dUgyeERoNjB1WS9lMD0ifQ==","parameters":{"SerialNumber":"82B910","CertificateId":"175b43a3d605f22d30821c4a920a6231978e5d846d3f2e7a15d2375d2fd5098c"}}
My template looks right, my policy looks correct. The role attached to my template seems to cover my needs. I just don't know how AWS is failing without more information.
Does anyone have ideas on how to proceed?
I found my problem. In the C/C++ AWS IoT SDK there is a data structure where you must specify the payload string and a few other things. One of those data elements is the length of the payload, and I forgot to set that length before sending my payload to the $aws/provisioning-templates//provision/json topic. Once I set that length, the submission worked, the template was acted upon, and the thing was created.
I am trying to learn how to use AppSync and its DynamoDB integrations.
I have successfully created an AppSync GraphQL API and linked a resolver to a getter on the primary key and thought I understood what is happening. However, I cannot get a putItem resolver to work at all and am struggling to find a useful way to debug the logic.
There is a CDK repository here which will deploy the app. Lines 133-145 have a hand-written schema which I thought should work; however, that receives the error
One or more parameter values were invalid: Type mismatch for key food_name expected: S actual: NULL (Service: DynamoDb, Status Code: 400
I also have attempted to wrap the expressions in quotes but receive errors.
Where should I go from here?
The example data creates a table with keys
food_name
scientific_name
group
sub_group
with food_name as the primary key.
https://github.com/AG-Labs/AppSyncTask
Today I have attempted to reimplement the list resolver as
{
"version" : "2017-02-28",
"operation" : "Scan",
## Add 'limit' and 'nextToken' arguments to this field in your schema to implement pagination. **
"limit": $util.defaultIfNull(${ctx.args.limit}, 20),
"nextToken": $util.toJson($util.defaultIfNullOrBlank($ctx.args.nextToken, null))
}
with a response mapping of
$util.toJson($ctx.result.items)
In CloudWatch I can see a list of results under log type ResponseMapping (albeit not correctly filtered, but I'll ignore that for now), but these do not get returned to the querier. The result is simply
{
"data": {
"listGenericFoods": {
"items": null
}
}
}
I don't understand where this is going wrong.
The problem was that the resolvers were nested.
Writing a handwritten schema fixed the issue but resulted in a poorer API. I'm going back a few steps and will implement from the ground up, slowly adding more resolvers.
The CloudWatch Logs, once turned on, helped somewhat but still required a lot of changing the resolvers ever so slightly and retrying.
Following the AWS Personalize documentation, I successfully imported my datasets (User, Item, Interaction) from S3, created an EventTracker, trained the model, and deployed the campaign. The solution works without any issue and I get the recommendations.
I rely on PutEvents to add new user-item interaction events. I also dump those interaction events using Lambda+Firehose into my S3. But I am wondering: does AWS Personalize internally create/augment the original user-item interaction dataset? How can I access and download the revised version of the dataset? I cannot see any new dataset in "Dataset groups > Datasets" other than my original 3 datasets...
I would prefer to dump it regularly from AWS Personalize to my S3 storage rather than using my own Lambda+Firehose solution.
This is the output of my PutEvents call. I see 200, but I'm not sure whether it works fine or not. Should I see any new dataset in "Dataset groups > Datasets" created by PutEvents?
{
"ResponseMetadata": {
"RequestId": "a6c96496-cbd6-4ad8-9183-371d1794cbd8",
"HTTPStatusCode": 200,
"HTTPHeaders": {
"content-type": "application/json",
"date": "Mon, 04 Jan 2021 18:04:28 GMT",
"x-amzn-requestid": "a6c96496-cbd6-4ad8-9183-371d1794cbd8",
"content-length": "0",
"connection": "keep-alive"
},
"RetryAttempts": 0
}
}
Update: Now it's possible
AWS documentation:
https://docs.aws.amazon.com/personalize/latest/dg/export-data.html
You can use this AWS CLI command for exporting only the interactions that were added by PutEvents/PutUsers/PutItems API calls:
aws personalize create-dataset-export-job \
--job-name <job name> \
--dataset-arn <dataset ARN> \
--job-output "{\"s3DataDestination\":{\"kmsKeyArn\":\"<kms key ARN>\",\"path\":\"s3://bucket-name/folder-name/\"}}" \
--role-arn <role ARN> \
--ingestion-mode PUT
In that case --ingestion-mode PUT will make sure that (quoting the documentation):
Specify PUT to export only data that you imported incrementally using the console or the PutEvents, PutUsers, or PutItems operations.
So I believe it covers your use case.
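Since the PutEvents output in the question looks like a boto3 response, here is roughly the same export job through boto3; the job name, ARNs and bucket path below are placeholders, not values from your account:

import boto3

personalize = boto3.client("personalize")

# Same call as the CLI above, with placeholder names and ARNs.
response = personalize.create_dataset_export_job(
    jobName="interactions-export",
    datasetArn="arn:aws:personalize:<region>:<account>:dataset/<dataset-group>/INTERACTIONS",
    roleArn="arn:aws:iam::<account>:role/<personalize-export-role>",
    ingestionMode="PUT",  # export only records imported incrementally (PutEvents/PutUsers/PutItems)
    jobOutput={
        "s3DataDestination": {
            "kmsKeyArn": "arn:aws:kms:<region>:<account>:key/<key-id>",
            "path": "s3://bucket-name/folder-name/",
        }
    },
)
print(response["datasetExportJobArn"])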
(Original answer) No, it's not possible
It's simply impossible right now to export this data.
There is no API to retrieve a dump of your Interactions dataset in Personalize.
I believe Lambda + Firehose workaround for this is correct approach.
But how to test if PutEvents works?
To make sure that Interactions are added through PutEvents, you can make use of the Filters feature:
https://docs.aws.amazon.com/personalize/latest/dg/filter-expressions.html
Pretty much create a new Filter with an expression similar to:
EXCLUDE ItemID WHERE Interactions.EVENT_TYPE IN ("your_event_type_name")
which will exclude from recommendations any item that the user previously interacted with.
Then you can test if events added through the PutEvents API are recognized correctly:
1. Create a Filter expression as described above.
2. Create any campaign for simple recommendations (User-Personalization recipe).
3. Connect the filter to the campaign.
4. Get recommendations for any user and save them somewhere.
5. Call the PutEvents API with one of the recommended items returned in step 4 and the user ID from step 4.
6. Get recommendations again for the same user as in step 4.
If the item that you added with the PutEvents call is no longer recommended, then you have proof that events added through PutEvents are correctly added to the Interactions dataset.
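A minimal boto3 sketch of steps 4 to 6, assuming the campaign, filter and event tracker already exist (all ARNs, IDs and the event type below are placeholders):

import datetime
import boto3

runtime = boto3.client("personalize-runtime")
events = boto3.client("personalize-events")

CAMPAIGN_ARN = "arn:aws:personalize:<region>:<account>:campaign/<campaign>"
FILTER_ARN = "arn:aws:personalize:<region>:<account>:filter/<filter>"  # the EXCLUDE filter above
TRACKING_ID = "<event-tracker-tracking-id>"
USER_ID = "<some-user-id>"

# Step 4: get filtered recommendations and remember them.
before = runtime.get_recommendations(
    campaignArn=CAMPAIGN_ARN, userId=USER_ID, filterArn=FILTER_ARN
)
recommended = [item["itemId"] for item in before["itemList"]]

# Step 5: send a PutEvents call for one of the recommended items.
events.put_events(
    trackingId=TRACKING_ID,
    userId=USER_ID,
    sessionId="test-session-1",
    eventList=[{
        "eventType": "your_event_type_name",  # must match the eventType in the Solution/Filter
        "itemId": recommended[0],
        "sentAt": datetime.datetime.now(datetime.timezone.utc),
    }],
)

# Step 6: fetch recommendations again; if the event was ingested, the item
# should drop out of the filtered results shortly afterwards.
after = runtime.get_recommendations(
    campaignArn=CAMPAIGN_ARN, userId=USER_ID, filterArn=FILTER_ARN
)
print(recommended[0] in [item["itemId"] for item in after["itemList"]])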
What if the PutEvents call doesn't affect recommendations in that case?
Then you are simply providing incorrect values in the API call. Personalize might return a 200 response even if the event provided was invalid.
To fix that, try:
Make sure the date is in the correct format. Personalize might ignore events with very old timestamps if there are many newer events (this can be configured in the Solution config).
Check that you are not passing any strange values like "null" or "undefined" for sessionId, userId, or trackingId in the PutEvents params. It might cause Personalize to ignore the event (https://github.com/aws/aws-sdk-js/issues/3371)
Make sure you are passing the correct eventType value (it should match the eventType in the Solution and Filter).
If it still doesn't work, raise a support ticket with AWS, including example PutEvents API call params.
Are there any simpler solutions?
Well, maybe there are, but in our project we use this approach and it also tests whether the filtering feature is working correctly. You will probably make use of filtering anyway in the future, so I believe it's a good enough method.
I have created a rule to send the incoming IoT messages to an S3 bucket.
The problem is that any time IoT receives a message, it is sent and stored in a new file (with the same name) in S3, overwriting the previous one.
I want this S3 file to keep all the data from before and not truncate each time a new message is stored.
How can I do that?
When you set up an IoT S3 rule action, you need to specify a bucket and a key. The key is what we might think of as a "path and file name". As the docs say, we can specify the key string by using a substitution template, which is just a fancy way of saying "build a path out of these pieces of information". When you are building your substitution template, you can reference fields inside the message as well as use a bunch of other functions.
In particular, look at the topic() and timestamp() functions, as well as some of the string manipulation functions.
Let's say your topic names are something like things/thing-id-xyz/location and you just want to store each incoming JSON message in a "folder" for the thing-id it came in from. You might specify a key like:
${topic(2)}/${timestamp()}.json
it would evaluate to something like:
thing-id-xyz/1481825251155.json
where the timestamp part is the time the message came in. That will be different for each message, and then the messages would not overwrite each other.
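If you are creating the rule programmatically, this is roughly how that key would be passed to boto3's create_topic_rule; the rule name, topic filter, bucket and role ARN here are placeholders for illustration:

import boto3

iot = boto3.client("iot")

# Hypothetical rule: store each location message under <thing-id>/<timestamp>.json
iot.create_topic_rule(
    ruleName="store_location_messages",
    topicRulePayload={
        "sql": "SELECT * FROM 'things/+/location'",
        "awsIotSqlVersion": "2016-03-23",
        "actions": [{
            "s3": {
                "roleArn": "arn:aws:iam::<account>:role/<iot-to-s3-role>",
                "bucketName": "my-iot-messages",
                # Substitution template: thing id from the topic plus the ingest timestamp.
                "key": "${topic(2)}/${timestamp()}.json",
            }
        }],
    },
)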
You can also specify parts of the message itself. Let's imagine our incoming messages look something like this:
{
"time": "2022-01-13T10:04:03Z",
"latitude": 40.803274,
"longitude": -74.237926,
"note": "Great view!"
}
Let's say you want to use the nice ISO date value you have in your data instead of the timestamp of the file. You could reference the time field no problem, like:
${topic(2)}/${time}.json
Now the file would be written as the key:
thing-id-xyz/2022-01-13T10:04:03Z.json
You should be able to find some combination of values that works for your needs and that, most importantly, is unique for each message so they don't overwrite each other in S3.
You can do it using AWS IoT SQL variable expressions. For example, use the following as a key: ${newuuid()}. This will create a new S3 object for each message received.
See more about SQL functions here: https://docs.aws.amazon.com/iot/latest/developerguide/iot-sql-functions.html
You can't do this with the S3 IoT Rule Action. You can get similar results using AWS Firehose, which will batch up several messages and write to one file. You will still end up with multiple files though.
Simple question, but I suspect it doesn't have a simple or easy answer. Still, worth asking.
We're creating an implementation for push notifications using AWS, with our web server running on EC2 sending messages to a queue on SQS, which is processed by Lambda and finally sent to SNS to be delivered to the iOS/Android apps.
The question I have is this: is there a way to query SNS endpoints based on the custom user data that you can provide on creation? The only way I see to do this so far is to list all the endpoints in a given platform application, and then search through that list for the user data I'm looking for... however, a more direct approach would be far better.
Why I want to do this is simple: if I could attach a user identifier to these device endpoints, and query based on that, I could completely avoid having to save the ARN to our DynamoDB database. It would save a lot of implementation time and complexity.
Let me know what you guys think, even if what you think is that this idea is impractical and stupid, or if searching through all of them is the best way to go about this!
Cheers!
There isn't the ability to have a "where" clause in ListTopics. I see two possibilities:
Create a new SNS topic per user that has some identifiable id in it. So, for example, the ARN would be something like "arn:aws:sns:us-east-1:123456789:known-prefix-user-id". The obvious downside is that you have the potential for a boatload of SNS topics.
Use a service designed for this type of usage like PubNub. Disclaimer - I don't work for PubNub or own stock but have successfully used it in multiple projects. You'll be able to target one or many users this way.
According to the [AWS documentation][1], if you try to create a new Platform Endpoint with the same User Data, you should get a response with an exception including the ARN associated with the existing PlatformEndpoint.
It's definitely not ideal, but it would be a roundabout way of querying the endpoint's CustomUserData attribute via the exception.
//Query CustomUserData by exception
CreatePlatformEndpointRequest cpeReq = new CreatePlatformEndpointRequest().withPlatformApplicationArn(applicationArn).withToken("dummyToken").withCustomUserData("username");
CreatePlatformEndpointResult cpeRes = client.createPlatformEndpoint(cpeReq);
You should get an exception containing the ARN if an endpoint with the same CustomUserData exists.
Then you just use that ARN and away you go.
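For what it's worth, the same trick in boto3 might look something like the sketch below; the exact wording of the "already exists" error message is an assumption here, so check what your account actually returns before relying on the parsing:

import re
import boto3
from botocore.exceptions import ClientError

sns = boto3.client("sns")

def find_endpoint_arn(application_arn, token, custom_user_data):
    # Create (or re-create) the endpoint; if it already exists with different
    # attributes, pull the existing ARN out of the error message.
    try:
        resp = sns.create_platform_endpoint(
            PlatformApplicationArn=application_arn,
            Token=token,
            CustomUserData=custom_user_data,
        )
        return resp["EndpointArn"]
    except ClientError as err:
        match = re.search(r"(arn:aws:sns:\S+)", err.response["Error"]["Message"])
        if match:
            return match.group(1)
        raise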