What's a "cloud-native" way to convert a Location History REST API into AWS Location pings? - amazon-web-services

My use case: I've got a Spot Tracker that sends location data up every 5 minutes. I'd like to get these pings into AWS Location, so I can do geofencing, mapping, and other fun stuff with them.
Spot offers a REST API that returns the last X events, such as:
"messages": {
"message": [
{
"id": 1605371088,
"latitude": 41.26519,
"longitude": -95.99069,
"dateTime": "2021-06-26T23:21:24+0000",
"batteryState": "GOOD",
"altitude": -103
},
{
"id": 1605371124,
"latitude": 41.2639,
"longitude": -95.98545,
"dateTime": "2021-06-26T23:11:24+0000",
"altitude": 0
},
{
"id": 1605365385,
"latitude": 41.25448,
"longitude": -95.94189,
"dateTime": "2021-06-26T23:06:01+0000",
"altitude": -103
},
...
]
}
What's the most idiomatic, cloud-native way to turn these into pings that go into AWS Location?
My initial approach: use a timed Lambda to periodically hit the Spot endpoint, and keep track of the latest event I've already sent out in a store like Dynamo.
I'm not an AWS expert, but I feel like there must be a cleaner integration. Are there other tools that would help with this? Is there anything in AWS IoT, for example, that would help me avoid keeping track of the last event I uploaded?
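For concreteness, here is a rough sketch of that timer-driven approach in Python (triggered by an EventBridge schedule). The feed URL, tracker name, table name, and device ID below are all placeholders I made up, and the response shape just follows the sample above:

import json
import urllib.request
from datetime import datetime

import boto3

# Placeholder names -- none of these come from Spot or AWS, substitute your own.
SPOT_FEED_URL = "https://example.com/spot/latest-messages.json"  # hypothetical Spot REST endpoint
TRACKER_NAME = "spot-tracker"      # Amazon Location tracker resource
CURSOR_TABLE = "spot-cursor"       # DynamoDB table remembering what was already sent
DEVICE_ID = "my-spot-device"

location = boto3.client("location")
cursor_table = boto3.resource("dynamodb").Table(CURSOR_TABLE)


def lambda_handler(event, context):
    # 1. Pull the recent messages from the Spot REST API (shape as in the sample above).
    with urllib.request.urlopen(SPOT_FEED_URL) as resp:
        messages = json.load(resp)["messages"]["message"]

    # 2. Work out which ones are new, using the last timestamp stored in DynamoDB.
    #    (The timestamp is used as the cursor because the sample ids aren't strictly time-ordered.)
    item = cursor_table.get_item(Key={"device_id": DEVICE_ID}).get("Item") or {}
    last_sent = item.get("last_sent_time", "")  # e.g. "2021-06-26T23:06:01+0000"
    new_msgs = [m for m in messages if m["dateTime"] > last_sent]
    if not new_msgs:
        return {"sent": 0}

    # 3. Push the new positions to the Amazon Location tracker.
    #    (The batch API caps the number of updates per call, so chunk if you have many.)
    location.batch_update_device_position(
        TrackerName=TRACKER_NAME,
        Updates=[
            {
                "DeviceId": DEVICE_ID,
                "Position": [m["longitude"], m["latitude"]],  # Location expects [lon, lat]
                "SampleTime": datetime.strptime(m["dateTime"], "%Y-%m-%dT%H:%M:%S%z"),
            }
            for m in new_msgs
        ],
    )

    # 4. Remember the newest timestamp so the next run doesn't re-send these pings.
    cursor_table.put_item(
        Item={"device_id": DEVICE_ID, "last_sent_time": max(m["dateTime"] for m in new_msgs)}
    )
    return {"sent": len(new_msgs)}

An EventBridge schedule (for example rate(5 minutes)) can invoke this, so the only state kept is the single DynamoDB item that makes the polling idempotent.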

Related

Best way to load 1MM JSON records into AWS Redshift with Kinesis Firehose?

I've got a bunch of JSON records that I want to add to an Amazon Redshift instance from S3, via Kinesis Firehose. It's several hundred files, give or take, that have 1,000 or so records each, and each file looks like the below sample. For my purposes, I don't care about the info entry, at least for now. I have a working Kinesis Firehose service that can update my Redshift DB with the sample stock ticker data, so that part is OK. My questions are (and hopefully this shouldn't actually be split into two different posts):
This is in large part a learning exercise, so if it's overkill for what I'm trying to do, that's OK. If there's a reason it's actually a bad idea, let me know.
1. If I want to just ignore the info field, do I have to use a Lambda to strip it out, or is there a way to do that without one? If so, are there any tricks beyond what I'd do in a script processing a plain text file? As I'm typing this I realize I could probably just put info in the DB and never touch it, but if there's a reason not to do that, or a cleaner way, I'd appreciate hearing it. (A sketch of what such a transformation Lambda might look like follows the sample below.)
2. When I have individual manufacturers with a set of features, and there could be dozens of features per manufacturer, does it make sense to make a separate DB table for features, or am I coming at it from a Python-dict/Perl-hash perspective that doesn't map well onto a SQL DB when I need to tie them back together later?
Sample:
{
    "info": {
        "generated_on": "2022-08-09 19:25:34",
        "version": "v1"
    },
    "manufacturer": [
        {
            "name": "Audi",
            "id": 1,
            "num_features": 2,
            "features": [
                {
                    "name": "seat heaters",
                    "standard": "N",
                    "cost": 100
                },
                {
                    "name": "A/C",
                    "standard": "Y",
                    "cost": 0
                }
            ]
        },
        {
            "name": "BMW",
            "id": 2,
            "num_features": 3,
            "features": [
                {
                    "name": "seat heaters",
                    "standard": "Y",
                    "cost": 0
                },
                {
                    "name": "backup camera",
                    "standard": "N",
                    "cost": 500
                },
                {
                    "name": "A/C",
                    "standard": "Y",
                    "cost": 0
                }
            ]
        }
    ]
}
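To illustrate question 1 (not a claim that you must do it this way): if you do end up putting a transformation Lambda in front of Firehose, a minimal sketch might look like the following. It drops info and flattens each manufacturer's features into one row per feature; the output column names are my own invention.

import base64
import json


def lambda_handler(event, context):
    """Firehose data-transformation handler: one input record in, one (reshaped) record out."""
    out = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))

        # Drop the top-level "info" block entirely and emit one flat row per feature.
        rows = []
        for mfr in payload.get("manufacturer", []):
            for feat in mfr.get("features", []):
                rows.append({
                    "manufacturer_id": mfr["id"],
                    "manufacturer_name": mfr["name"],
                    "feature_name": feat["name"],
                    "standard": feat["standard"],
                    "cost": feat["cost"],
                })

        # Newline-delimited JSON; Firehose concatenates the returned records before delivery.
        data = "\n".join(json.dumps(r) for r in rows) + "\n"
        out.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(data.encode("utf-8")).decode("utf-8"),
        })
    return {"records": out}

Newline-delimited rows like this are a shape a Redshift COPY (with json 'auto' or a jsonpaths file) can load directly.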

How do I extract a string of numbers from random text in Power Automate?

I am setting up a flow to organize and save emails as PDFs in a Dropbox folder. The first email to arrive includes a 10-digit identification number, which I extract along with an address. My flow creates a folder in Dropbox named in this format: 2023568684 : 123 Main St. Over a few weeks, additional emails arrive that I need to put into that folder. The subject always has a 10-digit number in it. I was building around each email and using functions like split, first, last, etc. to isolate the 10-digit ID. The problem is that there is no consistency in the subjects or bodies of the messages, so I can't easily find the ID with that method. I ended up starting to build around each email format individually, but there are way too many, not to mention the possibility of new senders or format changes.
My idea is to use List files in folder when a new message arrives which will create an array that I can filter to find the folder ID the message needs to be saved to. I know there is a limitation on this because of the 20 file limit but that is a different topic and question.
For now, how do I find a random 10 digit number in a randomly formatted email subject line so I can use it with the filter function?
For this requirement you really need regex, and at present PowerAutomate doesn't support regular expressions, but the good news is that it looks like support is coming ...
https://powerusers.microsoft.com/t5/Power-Automate-Ideas/Support-for-regex-either-in-conditions-or-as-an-action-with/idi-p/24768
There is a connector but it looks like it's not free ...
https://plumsail.com/actions/request-free-license
To get around it for now, my suggestion would be to create a function app in Azure and let it do the work. This may not be your cup of tea but it will work.
I created a .NET (C#) function with the following code (straight in the portal) ...
#r "Newtonsoft.Json"
using System.Net;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.Primitives;
using Newtonsoft.Json;
public static async Task<IActionResult> Run(HttpRequest req, ILogger log)
{
string requestBody = await new StreamReader(req.Body).ReadToEndAsync();
dynamic data = JsonConvert.DeserializeObject(requestBody);
string strToSearch = System.Text.Encoding.UTF8.GetString(Convert.FromBase64String((string)data?.Text));
string regularExpression = data?.Pattern;
var matches = System.Text.RegularExpressions.Regex.Matches(strToSearch, regularExpression);
var responseString = JsonConvert.SerializeObject(matches, new JsonSerializerSettings()
{
ReferenceLoopHandling = ReferenceLoopHandling.Ignore
});
return new ContentResult()
{
ContentType = "application/json",
Content = responseString
};
}
Then in PowerAutomate, call the HTTP action, passing in a base64-encoded string of the content you want to search ...
This is the expression used in the JSON ... base64(variables('String to Search')) ... and this is the JSON you need to pass in ...
{
    "Text": "#{base64(variables('String to Search'))}",
    "Pattern": "[0-9]{10}"
}
This is an example of the response ...
[
    {
        "Groups": {},
        "Success": true,
        "Name": "0",
        "Captures": [],
        "Index": 33,
        "Length": 10,
        "Value": "2023568684"
    },
    {
        "Groups": {},
        "Success": true,
        "Name": "0",
        "Captures": [],
        "Index": 98,
        "Length": 10,
        "Value": "8384468684"
    }
]
Next, add a Parse JSON action and use this schema ...
{
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "Groups": {
                "type": "object",
                "properties": {}
            },
            "Success": {
                "type": "boolean"
            },
            "Name": {
                "type": "string"
            },
            "Captures": {
                "type": "array"
            },
            "Index": {
                "type": "integer"
            },
            "Length": {
                "type": "integer"
            },
            "Value": {
                "type": "string"
            }
        },
        "required": [
            "Groups",
            "Success",
            "Name",
            "Captures",
            "Index",
            "Length",
            "Value"
        ]
    }
}
Finally, extract the first value found that matches the regex pattern. Multiple results are returned if there are multiple matches, so you can do something with those if you need to.
This is the expression ... #{first(body('Parse_JSON'))?['value']}
From this string ...
We're going to search for string 2023568684 within this text and we're also going to try and find 8384468684, this should work.
... the result is 2023568684, the first match found.
Don't have a Premium PowerAutomate licence so can't use the HTTP action?
You can do this exact same thing using the LogicApps service in Azure. It's the same engine with some slight differences re: connectors and behaviour.
Instead of the HTTP, use the Azure Functions action.
In relation to your action to fire when an email is received, in LogicApps, it will poll every x seconds/minutes/hours/etc. rather than fire on event. I'm not 100% sure which email connector you're using but it should exist.
Dropbox connectors exist, that's no problem.
You can export your PowerAutomate flow into a LogicApps format so you don't have to start from scratch.
https://learn.microsoft.com/en-us/azure/logic-apps/export-from-microsoft-flow-logic-app-template
If you're concerned about cost, don't be. Just make sure you use the consumption plan. Costs only really rack up for these services when the apps run for minutes at a time on a regular basis. Just keep an eye on it for your own mental health.
To get the function URL, you can find it in the function itself; you have to be inside the function in the portal.

How do I find the total RAM size of an AWS RDS instance through a Python Lambda? I tried the code below and got an empty result set. Is there any other way to find this?

import json
import boto3, datetime

def lambda_handler(event, context):
    cloudwatch = boto3.client('cloudwatch', region_name=AWS_REGION)
    response = cloudwatch.get_metric_data(
        MetricDataQueries=[
            {
                'Id': 'memory',
                'MetricStat': {
                    'Metric': {
                        'Namespace': 'AWS/RDS',
                        'MetricName': 'TotalMemory',
                        'Dimensions': [
                            {
                                "Name": "DBInstanceIdentifier",
                                "Value": "mydb"
                            }
                        ]
                    },
                    'Period': 30,
                    'Stat': 'Average',
                }
            }
        ],
        StartTime=(datetime.datetime.now() - datetime.timedelta(seconds=300)).timestamp(),
        EndTime=datetime.datetime.now().timestamp()
    )
    print(response)
The result is like below:
{'MetricDataResults': [{'Id': 'memory', 'Label': 'TotalMemory', 'Timestamps': [], 'Values': [], 'StatusCode': 'Complete'}]}
If you are looking for the configured vCPU/memory, it seems you need to call the DescribeDBInstances API and look at DBInstanceClass, which tells you the hardware configuration of the instance.
You would need to use one of the CloudWatch metric names from https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/MonitoringOverview.html#rds-metrics, and it seems the currently available memory can be retrieved with the FreeableMemory metric. Using that metric name in your sample code, I was able to get data (in bytes) matching what the RDS Monitoring console shows.
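For reference, a small sketch combining both of those ideas; the region and DB identifier are assumptions, DescribeDBInstances gives the configured instance class and FreeableMemory gives what's currently available:

import datetime

import boto3

REGION = "us-east-1"          # assumption: set to your RDS instance's region
DB_INSTANCE_ID = "mydb"

def lambda_handler(event, context):
    # Configured hardware comes from the instance class, not from a CloudWatch metric.
    rds = boto3.client("rds", region_name=REGION)
    instance = rds.describe_db_instances(DBInstanceIdentifier=DB_INSTANCE_ID)["DBInstances"][0]
    print("Instance class:", instance["DBInstanceClass"])  # e.g. db.t3.medium; look up its RAM in the docs

    # Currently available memory is a real CloudWatch metric: FreeableMemory (bytes).
    cloudwatch = boto3.client("cloudwatch", region_name=REGION)
    now = datetime.datetime.now(datetime.timezone.utc)
    response = cloudwatch.get_metric_data(
        MetricDataQueries=[{
            "Id": "freeable_memory",
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/RDS",
                    "MetricName": "FreeableMemory",
                    "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": DB_INSTANCE_ID}],
                },
                "Period": 300,
                "Stat": "Average",
            },
        }],
        StartTime=now - datetime.timedelta(minutes=15),
        EndTime=now,
    )
    print(response["MetricDataResults"][0]["Values"])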
You can check the total amount of memory and other useful information associated with the RDS in the CloudWatch console.
Step 1: Go to the CloudWatch console and navigate to Log groups.
Step 2: Search for RDSOSMetrics in the search bar.
Step 3: Click on the log stream. You will find all the details in the JSON; your total memory is in the field memory.total. A sample result looks like this:
{
    "engine": "MYSQL",
    "instanceID": "dbName",
    "uptime": "283 days, 21:08:36",
    "memory": {
        "writeback": 0,
        "free": 171696,
        "hugePagesTotal": 0,
        "inactive": 1652000,
        "pageTables": 19716,
        "dirty": 324,
        "active": 5850016,
        "total": 7877180,
        "buffers": 244312
    }
}
I have intentionally trimmed the JSON because of its size, but there are many other useful fields you can find in it.
You can use the jq command-line utility to extract the fields you want from these log events.
You can read more about this in the CloudWatch Enhanced Monitoring documentation.
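If you'd rather pull that memory.total value programmatically instead of through the console, a sketch along these lines should work when Enhanced Monitoring is enabled (it just reads the newest event from the most recently active RDSOSMetrics stream):

import json

import boto3

logs = boto3.client("logs")

def latest_total_memory_kb(log_group="RDSOSMetrics"):
    # Enhanced Monitoring writes one log stream per instance (named after its resource ID);
    # here we simply take the stream with the most recent activity.
    stream = logs.describe_log_streams(
        logGroupName=log_group,
        orderBy="LastEventTime",
        descending=True,
        limit=1,
    )["logStreams"][0]["logStreamName"]

    events = logs.get_log_events(
        logGroupName=log_group,
        logStreamName=stream,
        limit=1,
        startFromHead=False,  # newest event first
    )["events"]

    metrics = json.loads(events[0]["message"])
    return metrics["memory"]["total"]  # kilobytes, as in the sample above

print(latest_total_memory_kb())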

AWS X-Ray trace segments are missing or not connected

I have code that creates a segment when a queue is being read. In the first function (within the same lambda) I have this:
import * as AWSXRay from 'aws-xray-sdk'; // (using TypeScript)
AWSXRay.enableManualMode();
var segment1 = new AWSXRay.Segment("A");
In the second function (within the same lambda), called from the first, I have something like this:
var segment2 = new AWSXRay.Segment("B", segment1.trace_id, segment1.id);
Instead of seeing
*->A->B
On the AWS graph (on the website), I see:
*->A
*->B
...where they are not even associated, even though they have the same tracing ID, and the parent IDs are properly set. I seem to be missing something but not sure what...?
I even tried to pull X-Amzn-Trace-Id from the API request to use that as the root tracking ID for everything but that didn't work either.
This is the JSON for the first segment (A):
{
    "Duration": 0.808,
    "Id": "1-5d781a08-d41b49e35c3c0f38cdbd4912",
    "Segments": [
        {
            "Document": {
                "id": "74c99567f73185ce",
                "name": "router",
                "start_time": 1568152071.979,
                "end_time": 1568152072.787,
                "parent_id": "ef34fc0bcf23bbbe",
                "aws": {
                    "xray": {
                        "sdk": "X-Ray for Node.js",
                        "sdk_version": "2.3.6",
                        "package": "aws-xray-sdk"
                    }
                },
                "service": {
                    "version": "unknown",
                    "runtime": "node",
                    "runtime_version": "v10.16.3",
                    "name": "unknown"
                },
                "trace_id": "1-5d781a08-d41b49e35c3c0f38cdbd4912"
            },
            "Id": "74c99567f73185ce"
        }
    ]
}
This is the JSON for the second segment (B):
{
    "Duration": 0.801,
    "Id": "1-5d781a08-d9626abbab1cfbbfe4ff0dff",
    "Segments": [
        {
            "Document": {
                "id": "e2b4faaa6538bbb2",
                "name": "handleCreateLoad",
                "start_time": 1568152071.98,
                "end_time": 1568152072.781,
                "parent_id": "74c99567f73185ce",
                "aws": {
                    "xray": {
                        "sdk": "X-Ray for Node.js",
                        "sdk_version": "2.3.6",
                        "package": "aws-xray-sdk"
                    }
                },
                "service": {
                    "version": "unknown",
                    "runtime": "node",
                    "runtime_version": "v10.16.3",
                    "name": "unknown"
                },
                "trace_id": "1-5d781a08-d9626abbab1cfbbfe4ff0dff",
                "subsegments": [
                    {
                        "id": "08ccf2f374364066",
                        "name": "...-CreateLoad",
                        "start_time": 1568152071.981,
                        "end_time": 1568152072.781
                    }
                ]
            },
            "Id": "e2b4faaa6538bbb2"
        }
    ]
}
It's quite clear that the parent ID for 'B' (74c99567f73185ce) points to "A"'s ID, but the graph does not connect them.
Also, I think _x_amzn_trace_id should be set when the lambda executes, but it is not. That may be root of my issues.
Turns out process.env._x_amzn_trace_id, required by the AWS XRay SDK, does NOT exist until the handler is called. It may help others to know what I went through:
At first I tried to get the trace details for the current lambda on start up (before the handler is called) to connect my new segments, but it didn't work. I have many handlers in the same project, so getting the lambda segment on startup is what I was hoping to do.
I then proceeded to create a main lambda segment (thinking I had to create the first segment myself) but all it did was create an orphaned segment. To make matters worse, each segment creates a new trace ID if one is not provided, and since I could not get the trace ID from the global start-up scope, nothing was connecting. The proper trace ID is important to pass along from start to finish for each request to make sure the calls down-stream are tracked properly.
Dumping the environment variables before and after the handler is called clearly showed that the trace ID is not provided until just before the handler gets called. It's sad that most of the online examples don't even bother to warn about this. I then moved the call to AWSXRay.getSegment() to the start of the lambda handler, then passed the details on to the child segments.
DO NOT set context.callbackWaitsForEmptyEventLoop = false while also calling the callback(error, response) callback passed to the lambda handler. Doing so will terminate the lambda without waiting for segment update events to flush to the daemon, resulting in orphaned segments. :(
Note: This documentation is lacking: https://docs.aws.amazon.com/xray-sdk-for-nodejs/latest/reference/
It states "You can retrieve the current segment or subsegment at any time" when in fact there are some times when you cannot. It's too bad there are no proper examples using actual working NodeJS Lambda code, instead of isolated lines of code thrown everywhere.

Utterances to test lambda function not working (but lambda function itself executes)

I have a lambda function that executes successfully with an intent called GetEvent that returns a specific string. I've created one utterance for this intent for testing purposes (one that is simple and doesn't require any of the optional slots for invoking the skill), but when using the service simulator to test the lambda function with this utterance for GetEvent I'm met with a lambda response that says "The response is invalid". Here is what the interaction model looks like:
#Intent Schema
{
    "intents": [
        {
            "intent": "GetVessel",
            "slots": [
                {
                    "name": "boat",
                    "type": "LIST_OF_VESSELS"
                },
                {
                    "name": "location",
                    "type": "LIST_OF_LOCATIONS"
                },
                {
                    "name": "date",
                    "type": "AMAZON.DATE"
                },
                {
                    "name": "event",
                    "type": "LIST_OF_EVENTS"
                }
            ]
        },
        {
            "intent": "GetLocation",
            "slots": [
                {
                    "name": "event",
                    "type": "LIST_OF_EVENTS"
                },
                {
                    "name": "date",
                    "type": "AMAZON.DATE"
                },
                {
                    "name": "boat",
                    "type": "LIST_OF_VESSELS"
                },
                {
                    "name": "location",
                    "type": "LIST_OF_LOCATIONS"
                }
            ]
        },
        {
            "intent": "GetEvent",
            "slots": [
                {
                    "name": "event",
                    "type": "LIST_OF_EVENTS"
                },
                {
                    "name": "location",
                    "type": "LIST_OF_LOCATIONS"
                }
            ]
        }
    ]
}
With the appropriate custom slot types defined, and
#First test Utterances
GetVessel what are the properties of {boat}
GetLocation where did {event} occur
GetEvent get me my query
When giving Alexa the utterance get me my query the lambda response should output the string as it did in the execution. I'm not sure why this isn't the case; this is my first project with the Alexa Skills Kit, so I am pretty new. Is there something I'm not understanding with how the lambda function, the intent schema and the utterances are all pieced together?
UPDATE: Thanks to some help from AWSSupport, I've narrowed the issue down to the area in the json request where new session is flagged as true. For the utterance to work this must be set to false (this works when inputting the json request manually, and this is also the case during the lambda execution). Why is this the case? Does Alexa really care about whether or not it is a new session during invocation? I've cross-posted this to the Amazon Developer Forums as well a couple of days ago, but have yet to get a response from someone.
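(For reference, the flag in question sits at session.new in the incoming request JSON. A purely illustrative Python handler, not the asker's actual code, that inspects it and returns a well-formed response looks roughly like this:)

def lambda_handler(event, context):
    # The fields discussed in the update above live here in the incoming request.
    is_new_session = event["session"]["new"]   # True on the first request of a session
    request_type = event["request"]["type"]    # "LaunchRequest", "IntentRequest", ...

    if request_type == "IntentRequest" and event["request"]["intent"]["name"] == "GetEvent":
        text = "Here is your query."
    else:
        text = "Welcome."

    # A minimal, well-formed custom-skill response envelope.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }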
This may or may not have changed -- the last time I used the service simulator (about two weeks ago at the time of writing) it had a pretty severe bug which would lead to requests being mapped to your first / wrong intent, regardless of actual simulated speech input.
So even if you typed in something random like wafaaefgae, it simply tried to map that to the first intent you have defined, providing no slots to said intent, which may lead to unexpected results.
Your issue could very well be related to this, triggering the same unexpected / buggy behavior because you aren't using any slots in your sample utterance.
Before spending more time debugging this, I'd recommend trying the intent using an actual Echo or, alternatively, https://echosim.io/ -- interaction via actual speech works as expected, unlike the 'simulator'.