Setting a dynamic limit_amount in AWS Budget Billing Module

I am setting up alerting in AWS, using AWS Budgets to trigger an alert if an account's cost exceeds x% or x amount of the cost by x date of the month, so that we can identify when cost spikes occur.
resource "aws_budgets_budget" "all-cost-budget" {
name = "all-cost-budget"
budget_type = "COST"
limit_amount = "10"
limit_unit = "USD"
time_unit = "DAILY"
notification {
comparison_operator = "GREATER_THAN"
threshold = "100"
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = ["email address"]
}
}
We currently do not have a specific limit amount, and would like to set it based on the previous month's spending.
Is there a way to do this dynamically within AWS and Terraform?

You can set up a Lambda function which automatically executes at the start of every month and updates the budget value. The AWS Quick Start for Landing Zone has a CloudFormation template which does something similar to what you have described, setting the budget to the rolling average of the last three months (Template, Documentation). You will need to convert the CloudFormation template to Terraform and tweak the criteria to match your requirements. You might also want to consider using FORECASTED instead of ACTUAL.
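As a rough illustration, such a Lambda could pull last month's spend from Cost Explorer and push it into the budget with update_budget. This is only a sketch under assumed values: the account ID is a placeholder, and spreading last month's cost evenly over its days is just one possible way to derive a DAILY limit.

import boto3
from datetime import date, timedelta

ce = boto3.client('ce')            # Cost Explorer
budgets = boto3.client('budgets')  # AWS Budgets

def lambda_handler(event, context):
    # First day of the current month and of the previous month
    first_of_this_month = date.today().replace(day=1)
    first_of_last_month = (first_of_this_month - timedelta(days=1)).replace(day=1)

    # Total unblended cost for the previous month
    result = ce.get_cost_and_usage(
        TimePeriod={'Start': first_of_last_month.isoformat(),
                    'End': first_of_this_month.isoformat()},
        Granularity='MONTHLY',
        Metrics=['UnblendedCost'],
    )
    last_month_cost = float(
        result['ResultsByTime'][0]['Total']['UnblendedCost']['Amount'])

    # Spread last month's spend over its days to get a daily limit,
    # matching the DAILY time_unit of the Terraform budget above
    days_in_last_month = (first_of_this_month - first_of_last_month).days
    daily_limit = round(last_month_cost / days_in_last_month, 2)

    budgets.update_budget(
        AccountId='123456789012',  # placeholder account ID
        NewBudget={
            'BudgetName': 'all-cost-budget',
            'BudgetLimit': {'Amount': str(daily_limit), 'Unit': 'USD'},
            'TimeUnit': 'DAILY',
            'BudgetType': 'COST',
        },
    )

Since the limit is then changed outside Terraform, you may also want to add lifecycle { ignore_changes = [limit_amount] } to the aws_budgets_budget resource so Terraform does not keep reverting it.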

Related

GCP: create a metric for health checks from health check logs

I'm trying to create a metric on health checks for a backend service (load balancer). I need this metric to trigger alerts on failed health checks.
From: https://cloud.google.com/monitoring/api/v3/kinds-and-types
Emulating string-valued custom metrics
String values in custom metrics are not supported, but you can replicate string-valued metric functionality in the following ways:
Create a GAUGE metric using an INT64 value as an enum that maps to a string value. Externally translate the enum to a string value when you query the metric.
Create a GAUGE metric with a BOOL value and a label whose value is one of the strings you want to monitor. Use the boolean to indicate if the value is the active value.
For example, suppose you want to create a string-valued metric called "status" with possible options OK, OFFLINE, or PENDING. You could make a GAUGE metric with a label called status_value. Each update would write three time series, one for each status_value (OK, OFFLINE, or PENDING), with a value of 1 for "true" or 0 for "false".
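For reference, writing that GAUGE/BOOL emulation directly with the Cloud Monitoring Python client might look roughly like the sketch below. The project ID, custom metric name, and current status are placeholder values, not something taken from the health-check setup in this question.

import time
from google.cloud import monitoring_v3

# Placeholders for illustration
PROJECT_ID = "my-project"
CURRENT_STATUS = "OK"  # the status observed right now

client = monitoring_v3.MetricServiceClient()
project_name = f"projects/{PROJECT_ID}"

now = time.time()
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": int(now), "nanos": int((now % 1) * 1e9)}}
)

# Write one BOOL time series per possible status; only the active one is True
for status in ("OK", "OFFLINE", "PENDING"):
    series = monitoring_v3.TimeSeries()
    series.metric.type = "custom.googleapis.com/status"
    series.metric.labels["status_value"] = status
    series.resource.type = "global"
    series.resource.labels["project_id"] = PROJECT_ID
    point = monitoring_v3.Point(
        {"interval": interval, "value": {"bool_value": status == CURRENT_STATUS}}
    )
    series.points = [point]
    client.create_time_series(name=project_name, time_series=[series])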
Using Terraform, I tried the configuration below, but I am not sure it is really converting the values "UNHEALTHY" and "HEALTHY" to 0/1s. I tried to switch metric_kind to GAUGE instead of DELTA, but Terraform's error said I needed to use DELTA, and that DISTRIBUTION is required for the value_type. Has anybody tried what the docs above describe, a GAUGE metric with a BOOL value? Don't we need some kind of map of strings to booleans?
Here is my terraform:
resource "google_logging_metric" "logging_metric" {
name = var.name
filter = "logName=projects/[project_id]/logs/compute.googleapis.com%2Fhealthchecks"
metric_descriptor {
metric_kind = "DELTA"
value_type = "DISTRIBUTION"
labels {
key = "status"
value_type = "STRING"
description = "status of health check"
}
display_name = var.display_name
}
value_extractor = "EXTRACT(jsonPayload.request)"
label_extractors = {
"status" = "EXTRACT(jsonPayload.healthCheckProbeResult.healthState)"
}
bucket_options {
linear_buckets {
num_finite_buckets = 3
width = 1
offset = 1
}
}
}

DAX measure: "Weighted distribution %"

I hope you can help me with a complicated problem.
I am trying to make a measure which calculates the “weighted distribution %” of a product.
The business definition for this calculation is:
“In a customer Chain we have a number of customers who buy a specific (selected) product. We need to find out how much volume these BUYING customers buy of the whole PRODUCT GROUP (which this product belongs to) and compare it to the volume of the PRODUCT GROUP bought by ALL customers in the Customer Chain.”
Example (calculation):
Product (selected)=”Product 1”
Number of ALL Customers in “Chain 1” = 18
Volume of PRODUCT GROUP bought by ALL customers in chain = 10.915
Number of BUYING customers in “Chain 1” (who have bought “Product 1”) = 8
Volume of PRODUCT GROUP bought by BUYING customers in chain = 6.945
Calculation:
Weighted distribution % =
Volume (BUYING Customers in chain) / Volume (ALL Customers in chain) = 6.945 / 10.915 = 63,6%
Example (Calculation setup in PBI):
Now, my datamodel is this (simplified):
NOTE (just for info): you may ask why I make a customer count on both “D_Customer” and “F_SALES”; that is because I can make a customer count on specific transaction dates in F_SALES, and these I don't have in D_CUSTOMER. If I set the following filters:
Chain= “Chain 1”
Product=”Product 1”
I get the following table:
I then calculate the volume on the PRODUCT GROUP with the following measure
Volume (PRODUCT GROUP) = CALCULATE('F_SALES'[Volume];ALLEXCEPT('D_PRODUCT';'D_PRODUCT'[Product group]))
And add it to the table:
Now I have the “Volume (ALL Customers in chain)” part for my weighted distribution calculation.
My problem is how to make a measure which shows the Volume for the BUYING customers only.
I have tried to make the following calculation, which brings me close:
Volume (BUYING Customers) =
VAR BuyingCustomers_ =
    CALCULATE(
        [Number of Customers(F_SALES)];
        FILTER('F_SALES'; NOT(ISBLANK('Sold to trade'[Customer ID])))
    )
RETURN
    SUMX(
        SUMMARIZE(
            D_Customer;
            D_Customer[Customer Chain];
            "volume"; CALCULATE(
                'F_SALE'[Volume];
                ALLEXCEPT('D_Product'; 'D_Product'[Product Group]);
                FILTER('F_SALE'; NOT(ISBLANK(BuyingCustomers_)))
            )
        );
        [volume]
    )
Result:
But, as you can see, the volume doesn't aggregate to PRODUCT GROUP level.
What I need is this:
Which will give me the measure necessary for my calculation:
Can anyone bring me the missing part?
It will be greatly appreciated.
Br,
JayJay0306

How to get all metrics for a SageMaker Hyperparameter tuning job?

SageMaker does offer a HyperparameterTuningJobAnalytics object, but it only contains the final objective metric value.
Here is example code.
tuner = sagemaker.HyperparameterTuningJobAnalytics(tuning_job_name)
full_df = tuner.dataframe()
The dataframe it returns only contains the objective metric as the column FinalObjectiveValue.
If I have defined more than one metric for the tuning job, how do I get other metrics in SageMaker?
You can retrieve all metrics you have configured for your job using describe_training_job. Here is an example using boto3:
Create the SageMaker client:
import boto3

smclient = boto3.client('sagemaker')
Get a list of all training jobs (note the example parameters here - only getting the last 100 jobs sorted by the final objective metric in descending order):
trjobs = smclient.list_training_jobs_for_hyper_parameter_tuning_job(
    HyperParameterTuningJobName='YOUR_TUNING_JOB_NAME_HERE',
    MaxResults=100,
    SortBy='FinalObjectiveMetricValue',
    SortOrder='Descending')
Iterate over each job summary and retrieve all metrics:
for trjob in trjobs['TrainingJobSummaries']:
    jobd = smclient.describe_training_job(TrainingJobName=trjob['TrainingJobName'])
    metrics = {m['MetricName']: m['Value'] for m in jobd['FinalMetricDataList']}
    print('{} Metrics: {}'.format(trjob['TrainingJobName'], metrics))
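If you want all of those per-job metrics in a single table, a small extension of the loop above can collect them into pandas. This is only a sketch; pandas is an extra dependency here, not part of the original answer.

import pandas as pd

rows = []
for trjob in trjobs['TrainingJobSummaries']:
    jobd = smclient.describe_training_job(TrainingJobName=trjob['TrainingJobName'])
    row = {m['MetricName']: m['Value'] for m in jobd['FinalMetricDataList']}
    row['TrainingJobName'] = trjob['TrainingJobName']
    rows.append(row)

# One row per training job, one column per metric defined for the tuning job
all_metrics_df = pd.DataFrame(rows)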

Add a partition on glue table via API on AWS?

I have an S3 bucket which is constantly being filled with new data, and I am using Athena and Glue to query that data. The problem is that if Glue doesn't know a new partition has been created, it doesn't know it needs to search there. Making an API call to run the Glue crawler every time I need a new partition is too expensive, so the best solution is to tell Glue that a new partition has been added, i.e. to create the new partition in its table metadata. I looked through the AWS documentation but had no luck. I am using Java with AWS. Any help?
You may want to use the batch_create_partition() Glue API to register new partitions. It doesn't require any expensive operation like MSCK REPAIR TABLE or re-crawling.
I had a similar use case for which I wrote a Python script which does the following.
Step 1 - Fetch the table information and parse the necessary information from it which is required to register the partitions.
# Fetching table information from glue catalog
logger.info("Fetching table info for {}.{}".format(l_database, l_table))
try:
    response = l_client.get_table(
        CatalogId=l_catalog_id,
        DatabaseName=l_database,
        Name=l_table
    )
except Exception as error:
    logger.error("Exception while fetching table info for {}.{} - {}"
                 .format(l_database, l_table, error))
    sys.exit(-1)

# Parsing table info required to create partitions from table
input_format = response['Table']['StorageDescriptor']['InputFormat']
output_format = response['Table']['StorageDescriptor']['OutputFormat']
table_location = response['Table']['StorageDescriptor']['Location']
serde_info = response['Table']['StorageDescriptor']['SerdeInfo']
partition_keys = response['Table']['PartitionKeys']
Step 2 - Generate a list of dictionaries where each dictionary contains the information needed to create a single partition. All dictionaries have the same structure, but their partition-specific values (year, month, day, hour) change.
def generate_partition_input_list(start_date, num_of_days, table_location,
                                  input_format, output_format, serde_info):
    input_list = []  # Initializing empty list
    today = datetime.utcnow().date()
    if start_date > today:  # To handle scenarios if any future partitions are created manually
        start_date = today
    end_date = today + timedelta(days=num_of_days)  # Getting end date till which partitions need to be created
    logger.info("Partitions to be created from {} to {}".format(start_date, end_date))

    for input_date in date_range(start_date, end_date):
        # Formatting partition values by padding required zeroes and converting into string
        year = str(input_date)[0:4].zfill(4)
        month = str(input_date)[5:7].zfill(2)
        day = str(input_date)[8:10].zfill(2)
        for hour in range(24):  # Looping over 24 hours to generate partition input for each hour of the day
            hour = str('{:02d}'.format(hour))  # Padding zero to make sure that hour is in two digits
            part_location = "{}{}/{}/{}/{}/".format(table_location, year, month, day, hour)
            input_dict = {
                'Values': [
                    year, month, day, hour
                ],
                'StorageDescriptor': {
                    'Location': part_location,
                    'InputFormat': input_format,
                    'OutputFormat': output_format,
                    'SerdeInfo': serde_info
                }
            }
            input_list.append(input_dict.copy())
    return input_list
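For example, calling the helper above for a week of partitions starting today could look like this; the start date and day count are purely illustrative values.

# Build partition inputs for today plus the next 7 days (24 hourly partitions per day)
partition_input_list = generate_partition_input_list(
    datetime.utcnow().date(), 7, table_location,
    input_format, output_format, serde_info
)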
Step 3 - Call the batch_create_partition() API
for each_input in break_list_into_chunks(partition_input_list, 100):
    create_partition_response = client.batch_create_partition(
        CatalogId=catalog_id,
        DatabaseName=l_database,
        TableName=l_table,
        PartitionInputList=each_input
    )
There is a limit of 100 partitions in a single API call, so if you are creating more than 100 partitions you will need to break your list into chunks and iterate over it.
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.batch_create_partition
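The break_list_into_chunks helper used above is not a library function; a minimal implementation could be as simple as:

def break_list_into_chunks(input_list, chunk_size):
    # Yield successive slices of at most chunk_size items
    for i in range(0, len(input_list), chunk_size):
        yield input_list[i:i + chunk_size]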
You can configure your Glue crawler to be triggered every 5 minutes.
You can create a Lambda function which will either run on a schedule or be triggered by an event from your bucket (e.g. a putObject event), and that function can call Athena to discover partitions:
import boto3

athena = boto3.client('athena')

def lambda_handler(event, context):
    athena.start_query_execution(
        QueryString="MSCK REPAIR TABLE mytable",
        ResultConfiguration={
            'OutputLocation': "s3://some-bucket/_athena_results"
        }
    )
Use Athena to add partitions manually. You can also run SQL queries via the API, as in my Lambda example above.
Example from Athena manual:
ALTER TABLE orders ADD
PARTITION (dt = '2016-05-14', country = 'IN') LOCATION 's3://mystorage/path/to/INDIA_14_May_2016'
PARTITION (dt = '2016-05-15', country = 'IN') LOCATION 's3://mystorage/path/to/INDIA_15_May_2016';
This question is old but I wanted to put it out there that someone could have s3:ObjectCreated:Put notifications trigger a Lambda function which registers new partitions when data arrives on S3. I would even expand this function to handle deprecations based on object deletes and so on. Here's a blog post by AWS which details S3 event notifications: https://aws.amazon.com/blogs/aws/s3-event-notification/
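As a rough sketch of that idea (the database name, table name, and the year=/month=/day=/hour= key layout below are assumptions for illustration, not details from the question), such a Lambda handler could look along these lines:

import boto3
import urllib.parse

glue = boto3.client('glue')

# Hypothetical names for illustration
DATABASE = 'my_database'
TABLE = 'my_table'

def lambda_handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = urllib.parse.unquote_plus(record['s3']['object']['key'])

        # Assumes Hive-style keys like prefix/year=2023/month=01/day=15/hour=00/file.parquet
        parts = dict(p.split('=', 1) for p in key.split('/') if '=' in p)
        values = [parts['year'], parts['month'], parts['day'], parts['hour']]
        location = "s3://{}/{}/".format(bucket, key.rsplit('/', 1)[0])

        # Reuse the table's storage descriptor, pointing it at the new prefix
        table = glue.get_table(DatabaseName=DATABASE, Name=TABLE)['Table']
        sd = dict(table['StorageDescriptor'], Location=location)

        try:
            glue.create_partition(
                DatabaseName=DATABASE,
                TableName=TABLE,
                PartitionInput={'Values': values, 'StorageDescriptor': sd},
            )
        except glue.exceptions.AlreadyExistsException:
            pass  # Partition was already registered by an earlier object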
AWS Glue recently added a RecrawlPolicy that only crawls the new folders/partitions that you add to your S3 bucket.
https://docs.aws.amazon.com/glue/latest/dg/incremental-crawls.html
This should help you avoid crawling all the data again and again. From what I read, you can enable incremental crawls while setting up your crawler or when editing an existing one. One thing to note, however, is that incremental crawls require the schema of the new data to be more or less the same as the existing schema.
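If you prefer to set that from code rather than the console, an update_crawler call like the following should work (the crawler name is a placeholder):

import boto3

glue = boto3.client('glue')

# Crawl only newly added S3 folders on subsequent runs
glue.update_crawler(
    Name='my_existing_crawler',  # placeholder crawler name
    RecrawlPolicy={'RecrawlBehavior': 'CRAWL_NEW_FOLDERS_ONLY'},
)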

Facebook Post Insights Metrics Download via API

I wish to download all 30 post insight metrics for a Facebook page which averages 144 posts per month. Knowing that Facebook applies certain limits, what is the best way to download these metrics?
https://graph.facebook.com/$post_id/insights/$metric/$period?since=$since&until=$until&access_token=$token
post_stories, post_storytellers, post_stories_by_action_type, post_storytellers_by_action_type, post_impressions, post_impressions_unique, post_impressions_paid, post_impressions_paid_unique, post_impressions_fan, post_impressions_fan_unique, post_impressions_fan_paid, post_impressions_fan_paid_unique, post_impressions_organic, post_impressions_organic_unique, post_impressions_viral, post_impressions_viral_unique, post_impressions_by_story_type, post_impressions_by_story_type_unique, post_consumptions, post_consumptions_unique, post_consumptions_by_type, post_consumptions_by_type_unique, post_engaged_users, post_negative_feedback, post_negative_feedback_unique, post_negative_feedback_by_type, post_negative_feedback_by_type_unique.
So far I have thought of looping over all 30 metrics for each post, one day at a time:
30 * n (count of posts) = number of Graph API calls, which is too many for Facebook.
Ideally, what I want is the Post Export that Facebook provides as XLS, but via the Graph API or FQL.
Thank you!
Cyril
$fql = "SELECT metric, value
FROM insights
WHERE
object_id = '$post_id' AND (
metric = 'post_impressions_by_paid_non_paid' OR
metric = 'post_impressions_by_paid_non_paid_unique' OR
metric = 'post_stories' OR
metric = 'post_storytellers' OR
metric = 'post_stories_by_action_type' OR
metric = 'post_storytellers_by_action_type' OR
metric = 'post_impressions' OR
metric = 'post_impressions_unique' OR
metric = 'post_impressions_paid' OR
metric = 'post_impressions_paid_unique' OR
metric = 'post_impressions_fan' OR
metric = 'post_impressions_fan_unique' OR
metric = 'post_impressions_fan_paid' OR
metric = 'post_impressions_fan_paid_unique' OR
metric = 'post_impressions_organic' OR
metric = 'post_impressions_organic_unique' OR
metric = 'post_impressions_viral' OR
metric = 'post_impressions_viral_unique' OR
metric = 'post_impressions_by_story_type' OR
metric = 'post_impressions_by_story_type_unique' OR
metric = 'post_consumptions' OR
metric = 'post_consumptions_unique' OR
metric = 'post_consumptions_by_type' OR
metric = 'post_consumptions_by_type_unique' OR
metric = 'post_engaged_users' OR
metric = 'post_negative_feedback' OR
metric = 'post_negative_feedback_unique' OR
metric = 'post_negative_feedback_by_type' OR
metric = 'post_negative_feedback_by_type_unique') AND
period=period('lifetime')
";
I later used /page/insights and /post/insights in the Graph API to download all the insights at once.
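For the Graph API route, requesting several metrics in one call against the /{post-id}/insights edge looks roughly like the sketch below, using Python's requests. The API version, access token, and post ID are placeholders, and several of the metric names listed in the question have since been renamed or retired by Facebook.

import requests

# Placeholders for illustration
ACCESS_TOKEN = "YOUR_PAGE_ACCESS_TOKEN"
POST_ID = "PAGEID_POSTID"
METRICS = ",".join([
    "post_impressions",
    "post_impressions_unique",
    "post_impressions_paid",
    "post_engaged_users",
    "post_negative_feedback",
])

resp = requests.get(
    f"https://graph.facebook.com/v19.0/{POST_ID}/insights",
    params={"metric": METRICS, "period": "lifetime", "access_token": ACCESS_TOKEN},
)
resp.raise_for_status()

# One entry per requested metric, each with its list of values
for entry in resp.json().get("data", []):
    print(entry["name"], entry["values"])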