I want to find out the total RAM size of an AWS RDS instance from a Python Lambda function. I tried the code below and got an empty result set. Is there another way to find this?

import json
import boto3, datetime

AWS_REGION = 'eu-west-1'  # placeholder; the original code defined the region elsewhere

def lambda_handler(event, context):
    cloudwatch = boto3.client('cloudwatch', region_name=AWS_REGION)
    response = cloudwatch.get_metric_data(
        MetricDataQueries=[
            {
                'Id': 'memory',
                'MetricStat': {
                    'Metric': {
                        'Namespace': 'AWS/RDS',
                        'MetricName': 'TotalMemory',
                        'Dimensions': [
                            {
                                "Name": "DBInstanceIdentifier",
                                "Value": "mydb"
                            }
                        ]
                    },
                    'Period': 30,
                    'Stat': 'Average',
                }
            }
        ],
        StartTime=(datetime.datetime.now() - datetime.timedelta(seconds=300)).timestamp(),
        EndTime=datetime.datetime.now().timestamp()
    )
    print(response)
The result looks like this:
{'MetricDataResults': [{'Id': 'memory', 'Label': 'TotalMemory', 'Timestamps': [], 'Values': [], 'StatusCode': 'Complete'}]}

If you are looking for the configured vCPU/memory, you need to call the DescribeDBInstances API and read DBInstanceClass, which determines the instance's hardware configuration.
For live metrics you need one of the CloudWatch metric names listed at https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/MonitoringOverview.html#rds-metrics; TotalMemory is not among them, but the currently available memory is exposed as FreeableMemory. Using that metric name in your sample code, I was able to get data (in bytes) matching what the RDS Monitoring console shows.
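A minimal sketch of the DescribeDBInstances route (the region, the helper name and the EC2 DescribeInstanceTypes lookup are illustrations, not part of the original answer; the lookup only works for db.* classes that map to EC2 instance types):

import boto3

rds = boto3.client('rds', region_name='eu-west-1')   # example region
ec2 = boto3.client('ec2', region_name='eu-west-1')

def get_total_memory_mib(db_instance_identifier):
    # DBInstanceClass (e.g. db.t3.medium) tells you which hardware the instance runs on
    instance = rds.describe_db_instances(
        DBInstanceIdentifier=db_instance_identifier
    )['DBInstances'][0]
    db_class = instance['DBInstanceClass']

    # Strip the 'db.' prefix and look the class up as an EC2 instance type
    instance_type = db_class.replace('db.', '', 1)
    info = ec2.describe_instance_types(InstanceTypes=[instance_type])
    return info['InstanceTypes'][0]['MemoryInfo']['SizeInMiB']

For example, get_total_memory_mib('mydb') should return the RAM that instance class ships with, in MiB.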

You can check the total amount of memory and other useful information associated with the RDS instance in the CloudWatch console.
Step 1: Go to the CloudWatch console and navigate to Log groups.
Step 2: Search for RDSOSMetrics in the search bar.
Step 3: Click on the log stream. You will find all the details in the JSON; your total memory is in the field memory.total. A sample result looks like this:
{
    "engine": "MYSQL",
    "instanceID": "dbName",
    "uptime": "283 days, 21:08:36",
    "memory": {
        "writeback": 0,
        "free": 171696,
        "hugePagesTotal": 0,
        "inactive": 1652000,
        "pageTables": 19716,
        "dirty": 324,
        "active": 5850016,
        "total": 7877180,
        "buffers": 244312
    }
}
I have intentionally trimmed the JSON because of its size, but there are many other useful fields that you can find here.
You can use the jq command-line utility to extract the field you want from these log events, e.g. jq '.memory.total'.
You can read more about this in the CloudWatch Enhanced Monitoring documentation.
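If you want the same number from Lambda rather than the console, here is a rough sketch (not from the original answer) that reads the latest RDSOSMetrics event through the CloudWatch Logs API; it assumes Enhanced Monitoring is enabled and that the execution role may call rds:DescribeDBInstances and logs:GetLogEvents:

import json
import boto3

logs = boto3.client('logs')
rds = boto3.client('rds')

def get_total_memory_kb(db_instance_identifier):
    # The RDSOSMetrics log stream for an instance is named after its DbiResourceId
    instance = rds.describe_db_instances(
        DBInstanceIdentifier=db_instance_identifier
    )['DBInstances'][0]

    events = logs.get_log_events(
        logGroupName='RDSOSMetrics',
        logStreamName=instance['DbiResourceId'],
        limit=1,
        startFromHead=False   # newest event only
    )['events']

    metrics = json.loads(events[0]['message'])
    return metrics['memory']['total']   # reported in kilobytes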

Related

How do I extract a string of numbers from random text in Power Automate?

I am setting up a flow to organize and save emails as PDFs in a Dropbox folder. The first email that arrives includes a 10-digit identification number, which I extract along with an address. My flow creates a folder in Dropbox named in this format: 2023568684 : 123 Main St. Over a few weeks, additional emails arrive that I need to put into that folder. The subject always has a 10-digit number in it. I was building around each email and using functions like split, first, last, etc. to isolate the 10-digit ID. The problem is that there is no consistency in the subjects or bodies of the messages, so I can't easily find the ID with that method. I ended up starting to build around each email format individually, but there are way too many, not to mention the possibility of new senders or format changes.
My idea is to use List files in folder when a new message arrives, which will create an array that I can filter to find the folder ID the message needs to be saved to. I know there is a limitation on this because of the 20 file limit, but that is a different topic and question.
For now, how do I find a random 10-digit number in a randomly formatted email subject line so I can use it with the filter function?
For this requirement you really need regex, and at present Power Automate doesn't support regex expressions, but the good news is that it looks like support is coming ...
https://powerusers.microsoft.com/t5/Power-Automate-Ideas/Support-for-regex-either-in-conditions-or-as-an-action-with/idi-p/24768
There is a connector but it looks like it's not free ...
https://plumsail.com/actions/request-free-license
To get around it for now, my suggestion would be to create a function app in Azure and let it do the work. This may not be your cup of tea but it will work.
I created a .NET (C#) function with the following code (straight in the portal) ...
#r "Newtonsoft.Json"
using System.Net;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.Primitives;
using Newtonsoft.Json;
public static async Task<IActionResult> Run(HttpRequest req, ILogger log)
{
string requestBody = await new StreamReader(req.Body).ReadToEndAsync();
dynamic data = JsonConvert.DeserializeObject(requestBody);
string strToSearch = System.Text.Encoding.UTF8.GetString(Convert.FromBase64String((string)data?.Text));
string regularExpression = data?.Pattern;
var matches = System.Text.RegularExpressions.Regex.Matches(strToSearch, regularExpression);
var responseString = JsonConvert.SerializeObject(matches, new JsonSerializerSettings()
{
ReferenceLoopHandling = ReferenceLoopHandling.Ignore
});
return new ContentResult()
{
ContentType = "application/json",
Content = responseString
};
}
Then, in Power Automate, call the HTTP action, passing in a base64 encoded string of the content you want to search ...
This is the expression in the JSON ... base64(variables('String to Search')) ... and this is the JSON you need to pass in ...
{
    "Text": "@{base64(variables('String to Search'))}",
    "Pattern": "[0-9]{10}"
}
This is an example of the response ...
[
    {
        "Groups": {},
        "Success": true,
        "Name": "0",
        "Captures": [],
        "Index": 33,
        "Length": 10,
        "Value": "2023568684"
    },
    {
        "Groups": {},
        "Success": true,
        "Name": "0",
        "Captures": [],
        "Index": 98,
        "Length": 10,
        "Value": "8384468684"
    }
]
Next, add a Parse JSON action and use this schema ...
{
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "Groups": {
                "type": "object",
                "properties": {}
            },
            "Success": {
                "type": "boolean"
            },
            "Name": {
                "type": "string"
            },
            "Captures": {
                "type": "array"
            },
            "Index": {
                "type": "integer"
            },
            "Length": {
                "type": "integer"
            },
            "Value": {
                "type": "string"
            }
        },
        "required": [
            "Groups",
            "Success",
            "Name",
            "Captures",
            "Index",
            "Length",
            "Value"
        ]
    }
}
Finally, extract the first value that you find which matches the regex pattern. It returns multiple results if found so if you need to, you can do something with those.
This is the expression ... @{first(body('Parse_JSON'))?['value']}
From this string ...
We're going to search for string 2023568684 within this text and we're also going to try and find 8384468684, this should work.
... the result is 2023568684, the first 10-digit match.
Don't have a Premium PowerAutomate licence so can't use the HTTP action?
You can do this exact same thing using the LogicApps service in Azure. It's the same engine with some slight differences re: connectors and behaviour.
Instead of the HTTP, use the Azure Functions action.
In relation to your action to fire when an email is received, in LogicApps, it will poll every x seconds/minutes/hours/etc. rather than fire on event. I'm not 100% sure which email connector you're using but it should exist.
Dropbox connectors exist, that's no problem.
You can export your PowerAutomate flow into a LogicApps format so you don't have to start from scratch.
https://learn.microsoft.com/en-us/azure/logic-apps/export-from-microsoft-flow-logic-app-template
If you're concerned about cost, don't be. Just make sure you use the consumption plan. Costs only really rack up for these services when the apps run for minutes at a time on a regular basis. Just keep an eye on it for your own mental health.
To get the function URL, you can find it in the function itself; you have to be inside the function in the portal to see it ...

S3 efficiency overwrite versus read

I just finished the following function for getting customer data from my Shopify store into an S3 bucket. What happens now is the following: a trigger runs this Lambda on a daily basis, then all customers are written to an S3 bucket. Every already existing entry is just overwritten, and new customers are added.
My question is: Is this a scalable approach or should I read all the files and compare timestamps to only add the new entries? Or is this second approach maybe worse?
import requests
import json
import boto3

s3 = boto3.client('s3')
bucket = 'testbucket'
url2 = "something.json"

def getCustomers():
    r = requests.get(url2)
    return r.json()

def lambda_handler(event, context):
    data = getCustomers()
    for customer in data["customers"]:
        # create a unique id for each customer
        customer_id = str(customer["id"])
        # create a file name to put the customer in the bucket
        file_name = 'customers' + '/' + customer_id + '.json'
        # saving .json to s3
        customer_string = str(customer)
        uploadByteStream = bytes(customer_string.encode('UTF-8'))
        s3.put_object(Bucket=bucket, Key=file_name, Body=uploadByteStream)
    return {
        'statusCode': 200,
        'body': json.dumps('Success')
    }
An example response is the following:
{
    "id": 71806090000,
    "email": "something@gmail.com",
    "accepts_marketing": false,
    "created_at": "2021-07-27T11:06:38+02:00",
    "updated_at": "2021-07-27T11:11:58+02:00",
    "first_name": "Bertje",
    "last_name": "Bertens",
    "orders_count": 0,
    "state": "disabled",
    "total_spent": "0.00",
    "last_order_id": null,
    "note": "",
    "verified_email": true,
    "multipass_identifier": null,
    "tax_exempt": false,
    "phone": "+32470000000",
    "tags": "",
    "last_order_name": null,
    "currency": "EUR",
    "addresses": [
        {
            "id": 6623179276486,
            "customer_id": 5371846099142,
            "first_name": "Bertje",
            "last_name": "Bertens",
            "company": "",
            "address1": "Somewhere",
            "address2": "",
            "city": "Somecity",
            "province": null,
            "country": "",
            "zip": "0000",
            "phone": null,
            "name": "Bertje Bertens",
            "province_code": null,
            "country_code": null,
            "country_name": "",
            "default": true
        }
    ],
    "accepts_marketing_updated_at": "2021-07-27T11:11:35+02:00",
    "marketing_opt_in_level": null,
    "tax_exemptions": [],
    "admin_graphql_api_id": "",
    "default_address": {
        "id": 6623179276486,
        "customer_id": 5371846099142,
        "first_name": "Bertje",
        "last_name": "Bertens",
        "company": "",
        "address1": "Somewhere",
        "address2": "",
        "city": "Somecity",
        "province": null,
        "country": "",
        "zip": "0000",
        "phone": null,
        "name": "Bertje Bertens",
        "province_code": null,
        "country_code": null,
        "country_name": "",
        "default": true
    }
}
Is this a scalable approach or should I read all the files and compare timestamps to only add the new entries? Or is this second approach maybe worse?
Generally speaking, you're not going to run into many scalability problems with a daily task utilizing Lambda and S3.
Some considerations:
1. Costs
   a. Lambda execution costs. The longer your lambda runs, the more you pay.
   b. S3 transfer costs. Unless you run your lambda in a VPC and set up a VPC endpoint for your bucket, you pay S3 transfer costs from lambda -> internet (-> s3).
2. Lambda execution timeouts. If you have many files to upload, you may eventually run into a problem where there are so many files to transfer that it can't be completed within a single invocation.
3. Fault tolerance. Right now, if your lambda fails for some reason, you'll drop all the work for the day.
How do these two approaches bear on these considerations?
For (1) you simply have to calculate your costs. Technically, the approach of checking the timestamp first will help you here. However, my guess is that, if you're only running this on a daily basis within a single invocation, the costs are minimal right now and not of much concern. We're talking pennies per month at most (~$0.05/mo at a full 15-minute invocation once daily, plus transfer costs).
For (2) the approach of checking timestamps is also somewhat better, but doesn't truly address the scalability issue. If you expect you may eventually reach a point where you will run out of execution time in Lambda, you may want to consider a new architecture for the solution.
For (3) neither approach has any real bearing. Either way, you have the same fault tolerance problem.
Possible alternative architecture components to address these areas may include:
use of SQS to queue file transfers (help with decoupling and DLQ for fault tolerance)
use of scheduled (fargate) ECS tasks instead of Lambda for compute (deal with Lambda timeout limitations) OR have lambda consume the queue in batches
S3 VPC endpoints and in-vpc compute (optimize s3 transfer; likely not cost effective until much larger scale)
So, to answer the question directly in summary:
The current solution has some scalability concerns, namely the execution timeout of lambda and fault tolerance concerns. The second approach does introduce optimizations, but they do not address the scalability concerns. Additionally, the value you get from the second solution may not be significant.
In any case, what you propose makes sense and shouldn't take much effort to implement.
...
customer_updated_at = datetime.datetime.fromisoformat(customer['updated_at'])  # requires "import datetime" at the top
file_name = 'customers' + '/' + customer_id + '.json'

try:
    # Send a HEAD request to check the stored object's date and see if we need to update it
    response = s3.head_object(Bucket=bucket, Key=file_name)
    s3_modified = response["LastModified"]
except s3.exceptions.ClientError:
    # The object does not exist yet, so force an upload
    s3_modified = datetime.datetime.min.replace(tzinfo=datetime.timezone.utc)

if customer_updated_at > s3_modified:
    # Saving .json to s3
    customer_string = str(customer)
    uploadByteStream = bytes(customer_string.encode('UTF-8'))
    s3.put_object(Bucket=bucket, Key=file_name, Body=uploadByteStream)
else:
    print('s3 version is up to date, no need to upload')
It will work as long as you manage to finish the whole process within the max 15 minute timeout of Lambda.
S3 is built to scale to much more demanding workloads ;-)
But:
It's very inefficient, as you already observed. A better implementation would be to keep track of the timestamp of the last full load somewhere, e.g. DynamoDB or the Systems Manager parameter store, and only write the customers whose "created_at" or "updated_at" attributes are after the last successful full load. At the end you update the full-load timestamp.
Here is some pseudo code:
last_full_load_date = get_last_full_load() or '1900-01-01T00:00:00Z'
customers = get_customers()

for customer in customers:
    if customer.created_at >= last_full_load_date or customer.updated_at >= last_full_load_date:
        write_customer(customer)

set_last_full_load(datetime.now())
This way you only write data that has actually changed (assuming the API is reliable).
This also has the benefit that you'll be able to retry if something goes wrong during writing, since you only update the last_full_load time at the end. Alternatively you could keep track of the last modified time per user, but that seems unnecessary if you do a bulk load anyway.
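If you go with the parameter store option, the two helpers from the pseudo code could look roughly like this (a sketch only; the parameter name is made up for illustration):

import datetime
import boto3

ssm = boto3.client('ssm')
PARAMETER_NAME = '/shopify-export/last-full-load'   # hypothetical name

def get_last_full_load():
    try:
        return ssm.get_parameter(Name=PARAMETER_NAME)['Parameter']['Value']
    except ssm.exceptions.ParameterNotFound:
        return None

def set_last_full_load(timestamp):
    ssm.put_parameter(
        Name=PARAMETER_NAME,
        Value=timestamp.isoformat(),
        Type='String',
        Overwrite=True,
    )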

GCP recommendation data format for catalog

I am currently working with Recommendations AI. Since I am new to GCP recommendations, I have been struggling with the data format for the catalog. I read the documentation, and it says each product item's JSON should be on a single line.
I understand this totally, but it would be really great if I could see what the JSON format looks like in a real example, because the one in the documentation is very ambiguous to me, and I am trying to use the console to import data.
I tried to import data that looks like the example below, but I got an "invalid JSON format" error 100 times, with lots of reasons such as unexpected token, something should be there, and so on.
[
    {
        "id": "1",
        "title": "Toy Story (1995)",
        "categories": [
            "Animation",
            "Children's",
            "Comedy"
        ]
    },
    {
        "id": "2",
        "title": "Jumanji (1995)",
        "categories": [
            "Adventure",
            "Children's",
            "Fantasy"
        ]
    },
    ...
]
Maybe it was because each item was not on a single line, but I am also wondering whether the above is enough for importing. I am not sure if the data should be wrapped in another property like this:
{
    "inputConfig": {
        "productInlineSource": {
            "products": [
                {
                    "id": "1",
                    "title": "Toy Story (1995)",
                    "categories": [
                        "Animation",
                        "Children's",
                        "Comedy"
                    ]
                },
                {
                    "id": "2",
                    "title": "Jumanji (1995)",
                    "categories": [
                        "Adventure",
                        "Children's",
                        "Fantasy"
                    ]
                }
            ]
        }
    }
}
I can see the above in the documentation, but it says it is for importing inline, which uses a POST request. It does not mention anything about importing with the console. I just guess the format is also used for the console, but I am not 100% sure; that is why I am asking.
Is there anyone who can show me the entire data format for importing data using the console?
Problem Solved
For those who might have the same question, the exact data format you should import using the GCP console looks like this:
{"id":"1","title":"Toy Story (1995)","categories":["Animation","Children's","Comedy"]}
{"id":"2","title":"Jumanji (1995)","categories":["Adventure","Children's","Fantasy"]}
No square bracket wrapping all the items.
No comma between items.
Only each item on a single line.
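If your source data is a regular JSON array like the one in the question, a small helper can rewrite it into that newline-delimited form (a sketch; the file names are just examples):

import json

with open('catalog.json') as f:           # [{"id": "1", ...}, {"id": "2", ...}]
    items = json.load(f)

with open('catalog.ndjson', 'w') as f:
    for item in items:
        f.write(json.dumps(item) + '\n')  # one item per line, no commas, no brackets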
Posting this Community Wiki for better visibility.
OP edited the question and added the solution:
The exact data format you should import using the GCP console looks like this:
{"id":"1","title":"Toy Story (1995)","categories":["Animation","Children's","Comedy"]}
{"id":"2","title":"Jumanji (1995)","categories":["Adventure","Children's","Fantasy"]}
No square bracket wrapping all the items.
No comma between items.
Only each item on a single line.
However I'd like to elaborate a bit.
There are a few ways of importing catalog information:
Importing catalog data from Merchant Center
Importing catalog data from BigQuery
Importing catalog data from Cloud Storage
I guess this is what was used by the OP, as I was able to import a catalog using the UI and GCS with the JSON file below.
{
    "inputConfig": {
        "catalogInlineSource": {
            "catalogItems": [
                {"id":"111","title":"Toy Story (1995)","categories":["Animation","Children's","Comedy"]}
                {"id":"222","title":"Jumanji (1995)","categories":["Adventure","Children's","Fantasy"]}
                {"id":"333","title":"Test Movie (2020)","categories":["Adventure","Children's","Fantasy"]}
            ]
        }
    }
}
Importing catalog data inline
At the bottom of the Importing catalog information documentation you can find this note:
The line breaks are for readability; you should provide an entire catalog item on a single line. Each catalog item should be on its own line.
It means you should use something similar to NDJSON - a convenient format for storing or streaming structured data that may be processed one record at a time.
If you would like to try the inline method, you should use the format below; each item is meant to be on a single line, but the example keeps line breaks for readability.
data.json file
{
    "inputConfig": {
        "catalogInlineSource": {
            "catalogItems": [
                {
                    "id": "1212",
                    "category_hierarchies": [ { "categories": [ "Animation", "Children's" ] } ],
                    "title": "Toy Story (1995)"
                },
                {
                    "id": "5858",
                    "category_hierarchies": [ { "categories": [ "Adventure", "Fantasy" ] } ],
                    "title": "Jumanji (1995)"
                },
                {
                    "id": "321123",
                    "category_hierarchies": [ { "categories": [ "Comedy", "Adventure" ] } ],
                    "title": "The Lord of the Rings: The Fellowship of the Ring (2001)"
                }
            ]
        }
    }
}
Command
curl -X POST \
     -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     --data @./data.json \
     "https://recommendationengine.googleapis.com/v1beta1/projects/[your-project]/locations/global/catalogs/default_catalog/catalogItems:import"
{
    "name": "import-catalog-default_catalog-1179023525XX37366024",
    "done": true
}
Please keep in mind that the above method requires Service Account authentication, otherwise you will get a PERMISSION_DENIED error:
"message" : "Your application has authenticated using end user credentials from the Google Cloud SDK or Google Cloud Shell which are not supported by the translate.googleapis.com. We recommend that most server applications use service accounts instead. For more information about service accounts and how to use them in your application, see https://cloud.google.com/docs/authentication/.",
"status" : "PERMISSION_DENIED"

AWS IoT rule - timestamp for Elasticsearch

I have a bunch of IoT devices (ESP32) which publish a JSON object to things/THING_NAME/log for general debugging (to be extended into other topics with values in the future).
Here is the IoT rule which kind of works.
{
    "sql": "SELECT *, parse_time(\"yyyy-mm-dd'T'hh:mm:ss\", timestamp()) AS timestamp, topic(2) AS deviceId FROM 'things/+/stdout'",
    "ruleDisabled": false,
    "awsIotSqlVersion": "2016-03-23",
    "actions": [
        {
            "elasticsearch": {
                "roleArn": "arn:aws:iam::xxx:role/iot-es-action-role",
                "endpoint": "https://xxxx.eu-west-1.es.amazonaws.com",
                "index": "devices",
                "type": "device",
                "id": "${newuuid()}"
            }
        }
    ]
}
I'm not sure how to set @timestamp inside Elasticsearch to allow time-based searches.
Maybe I'm going about this all wrong, but it almost works!
Elasticsearch can recognize date strings matching dynamic_date_formats.
The following format is automatically mapped as a date field in AWS Elasticsearch 7.1:
SELECT *, parse_time("yyyy/MM/dd HH:mm:ss", timestamp()) AS timestamp FROM 'events/job/#'
This approach does not require to create a preconfigured index, which is important for dynamically created indexes, e.g. with daily rotation for logs:
devices-${parse_time("yyyy.MM.dd", timestamp(), "UTC")}
According to elastic.co documentation,
The default value for dynamic_date_formats is:
[ "strict_date_optional_time","yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"]
@timestamp is just a convention, as the @ prefix is the default prefix for Logstash-generated fields. Because you are not using Logstash as a middleman between IoT and Elasticsearch, you don't have a default mapping for @timestamp.
But basically it is just a name, so call it what you want; the only thing that matters is that you declare it as a date field in the mappings section of the Elasticsearch index.
If for some reason you still need it to be called @timestamp, you can either SELECT it with that prefix right away in the AS section (might be an issue with IoT's SQL restrictions, not sure):
SELECT *, parse_time(\"yyyy-mm-dd'T'hh:mm:ss\", timestamp()) AS @timestamp, topic(2) AS deviceId FROM 'things/+/stdout'
Or you can use the copy_to functionality when declaring your mapping:
PUT devices
{
    "mappings": {
        "properties": {
            "timestamp": {
                "type": "date",
                "copy_to": "@timestamp"
            },
            "@timestamp": {
                "type": "date"
            }
        }
    }
}

AWS - How to obtain Billing Monthly Forecast programmatically

I'm just wondering if it is currently possible to obtain the monthly billing forecast amount using either an SDK or the API.
Looking at the AWS docs it doesn't seem possible. Although I haven't delved into the Cost Explorer API too much, I was wondering if anyone else has been able to obtain this data point?
There is a GetCostAndUsage method in the AWS Billing and Cost Management API which returns cost and usage metrics. The method also accepts a TimePeriod and returns results for the given time frame. I didn't test it, but you could try passing future dates in it; maybe it will return forecast results. Give it a try:
{
    "TimePeriod": {
        "Start": "2018-06-01",
        "End": "2018-06-30"
    },
    "Granularity": "MONTHLY",
    "Filter": {
        "Dimensions": {
            "Key": "SERVICE",
            "Values": [
                "Amazon Simple Storage Service"
            ]
        }
    },
    "GroupBy": [
        {
            "Type": "DIMENSION",
            "Key": "SERVICE"
        },
        {
            "Type": "TAG",
            "Key": "Environment"
        }
    ],
    "Metrics": ["BlendedCost", "UnblendedCost", "UsageQuantity"]
}
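If you want to call this from code rather than the CLI, here is a boto3 sketch of the same request (same example dates and filters as above, untested for forecasting purposes):

import boto3

ce = boto3.client('ce')   # Cost Explorer

response = ce.get_cost_and_usage(
    TimePeriod={'Start': '2018-06-01', 'End': '2018-06-30'},
    Granularity='MONTHLY',
    Filter={
        'Dimensions': {
            'Key': 'SERVICE',
            'Values': ['Amazon Simple Storage Service'],
        }
    },
    GroupBy=[
        {'Type': 'DIMENSION', 'Key': 'SERVICE'},
        {'Type': 'TAG', 'Key': 'Environment'},
    ],
    Metrics=['BlendedCost', 'UnblendedCost', 'UsageQuantity'],
)
print(response['ResultsByTime'])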