I want to download all 30 post insight metrics for a Facebook page that averages 144 posts per month. Given that Facebook applies rate limits, what is the best way to download these metrics for all posts?
The endpoint I am calling per metric and per post looks like this:
https://graph.facebook.com/$post_id/insights/$metric/$period?since=$since&until=$until&access_token=$token
post_stories, post_storytellers, post_stories_by_action_type, post_storytellers_by_action_type, post_impressions, post_impressions_unique, post_impressions_paid, post_impressions_paid_unique, post_impressions_fan, post_impressions_fan_unique, post_impressions_fan_paid, post_impressions_fan_paid_unique, post_impressions_organic, post_impressions_organic_unique, post_impressions_viral, post_impressions_viral_unique, post_impressions_by_story_type, post_impressions_by_story_type_unique, post_consumptions, post_consumptions_unique, post_consumptions_by_type, post_consumptions_by_type_unique, post_engaged_users, post_negative_feedback, post_negative_feedback_unique, post_negative_feedback_by_type, post_negative_feedback_by_type_unique.
So far I have thought of looping over all 30 metrics for every post, one day at a time:
30 * n (number of posts) = number of Graph API calls, which is far too many for Facebook's rate limits.
Ideally, what I want is the equivalent of the post export that Facebook provides as XLS, but via the Graph API or FQL.
Thank you!
Cyril
$fql = "SELECT metric, value
FROM insights
WHERE
object_id = '$post_id' AND (
metric = 'post_impressions_by_paid_non_paid' OR
metric = 'post_impressions_by_paid_non_paid_unique' OR
metric = 'post_stories' OR
metric = 'post_storytellers' OR
metric = 'post_stories_by_action_type' OR
metric = 'post_storytellers_by_action_type' OR
metric = 'post_impressions' OR
metric = 'post_impressions_unique' OR
metric = 'post_impressions_paid' OR
metric = 'post_impressions_paid_unique' OR
metric = 'post_impressions_fan' OR
metric = 'post_impressions_fan_unique' OR
metric = 'post_impressions_fan_paid' OR
metric = 'post_impressions_fan_paid_unique' OR
metric = 'post_impressions_organic' OR
metric = 'post_impressions_organic_unique' OR
metric = 'post_impressions_viral' OR
metric = 'post_impressions_viral_unique' OR
metric = 'post_impressions_by_story_type' OR
metric = 'post_impressions_by_story_type_unique' OR
metric = 'post_consumptions' OR
metric = 'post_consumptions_unique' OR
metric = 'post_consumptions_by_type' OR
metric = 'post_consumptions_by_type_unique' OR
metric = 'post_engaged_users' OR
metric = 'post_negative_feedback' OR
metric = 'post_negative_feedback_unique' OR
metric = 'post_negative_feedback_by_type' OR
metric = 'post_negative_feedback_by_type_unique') AND
period=period('lifetime')
";
I later used the /page/insights and /post/insights endpoints in the Graph API to download all the insights at once.
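For illustration, here is a minimal Python sketch of that per-post approach, assuming a valid page access token and that the /insights edge accepts the metrics listed above; the post ID, token, and abbreviated metric list are placeholders:

import requests

# Placeholders: substitute a real post ID, page access token, and the full metric list.
POST_ID = "<post_id>"
ACCESS_TOKEN = "<page_access_token>"
METRICS = ["post_impressions", "post_impressions_unique", "post_engaged_users"]

resp = requests.get(
    f"https://graph.facebook.com/{POST_ID}/insights",
    params={
        "metric": ",".join(METRICS),   # comma-separated metrics -> one call per post
        "period": "lifetime",
        "access_token": ACCESS_TOKEN,
    },
)
resp.raise_for_status()

for entry in resp.json().get("data", []):
    print(entry["name"], entry["values"])

This reduces the call count from 30 per post to 1 per post (plus the calls needed to page through the post IDs).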
I am setting up alerting in AWS, using AWS Budgets to trigger an alert if an account's cost exceeds a certain percentage or amount of the budget by a given date of the month, so that I can identify when spikes in cost occur.
resource "aws_budgets_budget" "all-cost-budget" {
name = "all-cost-budget"
budget_type = "COST"
limit_amount = "10"
limit_unit = "USD"
time_unit = "DAILY"
notification {
comparison_operator = "GREATER_THAN"
threshold = "100"
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = ["email address"]
}
}
We currently do not have a specific limit amount, and would like to set it based on the previous month's spending.
Is there a way to do this dynamically within AWS and Terraform?
You can set up a Lambda function that automatically executes at the start of every month and updates the budget value. The AWS quickstart for Landing Zone has a CloudFormation template that does something similar to what you have described, setting the budget to the rolling average of the last three months (Template, Documentation). You will need to convert the CloudFormation template to Terraform and tweak the criteria to match your requirements. You might also want to consider using FORECASTED instead of ACTUAL.
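For reference, here is a minimal Python sketch of such a Lambda, not taken from the quickstart template; it assumes a monthly COST budget named all-cost-budget and that the function's role has ce:GetCostAndUsage, sts:GetCallerIdentity and budgets:ModifyBudget permissions:

import datetime
import boto3

def lambda_handler(event, context):
    # Sketch: set the budget limit to last month's actual spend (assumptions noted above).
    ce = boto3.client("ce")
    budgets = boto3.client("budgets")
    account_id = boto3.client("sts").get_caller_identity()["Account"]

    # First day of the current month and of the previous month.
    first_of_this_month = datetime.date.today().replace(day=1)
    first_of_last_month = (first_of_this_month - datetime.timedelta(days=1)).replace(day=1)

    # Total unblended cost for the previous month.
    result = ce.get_cost_and_usage(
        TimePeriod={"Start": first_of_last_month.isoformat(), "End": first_of_this_month.isoformat()},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
    )
    last_month_cost = result["ResultsByTime"][0]["Total"]["UnblendedCost"]["Amount"]

    # Update the existing budget with the new limit (Amount must be a string).
    budgets.update_budget(
        AccountId=account_id,
        NewBudget={
            "BudgetName": "all-cost-budget",
            "BudgetLimit": {"Amount": last_month_cost, "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
    )

You could trigger it on the first day of every month with a scheduled EventBridge rule such as cron(0 0 1 * ? *).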
I hope you can help me with a complicated problem.
I am trying to make a measure which calculates the “weighted distribution %” of a product.
The business definition for this calculation is:
“In a customer Chain we have a number of customers which buys a specific product(selected). We need to find out how much volume these “BUYING customers” buys of the whole PRODUCT GROUP (which this product belongs to) and compare it to the volume of the PRODUCT GROUP bought by ALL customers in the Customer Chain.”
Example (calculation):
Product (selected)=”Product 1”
Number of ALL Customers in “Chain 1” = 18
Volume of PRODUCT GROUP bought by ALL customers in chain = 10.915
Number of BUYING customers in “Chain 1” (who have bought “Product 1”) = 8
Volume of PRODUCT GROUP bought by BUYING customers in chain = 6.945
Calculation:
Weighted distribution % =
Volume (BUYING Customers in chain) / Volume (ALL Customers in chain) = 6.945 / 10.915 = 63,6%
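For clarity, here is the same arithmetic expressed outside of DAX as a small pandas sketch; the table and column names (sales, customer, product, product_group, volume) are purely illustrative and not taken from the model below:

import pandas as pd

# Hypothetical flat sales table for one customer chain.
sales = pd.DataFrame({
    "customer":      ["C1", "C1", "C2", "C3", "C3"],
    "product":       ["Product 1", "Product 2", "Product 2", "Product 1", "Product 3"],
    "product_group": ["Group A"] * 5,
    "volume":        [100, 250, 300, 150, 200],
})

selected_product = "Product 1"
product_group = "Group A"

# Customers in the chain who bought the selected product.
buying_customers = sales.loc[sales["product"] == selected_product, "customer"].unique()

# Product-group volume for all customers vs. for buying customers only.
group_sales = sales[sales["product_group"] == product_group]
volume_all = group_sales["volume"].sum()
volume_buyers = group_sales[group_sales["customer"].isin(buying_customers)]["volume"].sum()

weighted_distribution = volume_buyers / volume_all
print(f"{weighted_distribution:.1%}")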
Example (Calculation setup in PBI):
Now, my datamodel is this (simplified):
NOTE (just for info): you may ask why I make a customer count on both “D_Customer” and “F_SALES”; that is because I can make a customer count on specific transaction dates in F_SALES, and these I don't have in D_CUSTOMER.
If I set the following filters:
Chain= “Chain 1”
Product=”Product 1”
I get the following table:
I then calculate the volume on the PRODUCT GROUP with the following measure
Volume (PRODUCT GROUP) = CALCULATE('F_SALES'[Volume];ALLEXCEPT('D_PRODUCT';'D_PRODUCT'[Product group]))
And add it to the table:
Now I have the “Volume (ALL Customers in chain)” part for my weighted distribution calculation.
My problem is: how do I make a measure which shows the volume for the BUYING customers only?
I have tried to make the following calculation, which brings me close:
Volume (BUYING Customers) =
VAR BuyingCustomers_ =
    CALCULATE (
        [Number of Customers(F_SALES)];
        FILTER ( 'F_SALES'; NOT ( ISBLANK ( 'Sold to trade'[Customer ID] ) ) )
    )
RETURN
    SUMX (
        SUMMARIZE (
            D_Customer;
            D_Customer[Customer Chain];
            "volume";
            CALCULATE (
                'F_SALES'[Volume];
                ALLEXCEPT ( 'D_Product'; 'D_Product'[Product Group] );
                FILTER ( 'F_SALES'; NOT ( ISBLANK ( BuyingCustomers_ ) ) )
            )
        );
        [volume]
    )
Result:
But, as you can see, the volume doesn't aggregate to the “PRODUCT GROUP” level.
What I need is this:
Which will give me the measure necessary for my calculation:
Can anyone bring me the missing part?
It will be greatly appreciated.
Br,
JayJay0306
I notice when you query the data catalog in the Google Cloud Platform it retrieves stats for the amount of times a table has been queried:
Queried (Past 30 days): 5332
This is extremely useful information, and I was wondering where it is actually stored and whether it can be retrieved for all the tables in a project or a dataset.
I have trawled the Data Catalog tutorials and written some Python scripts, but these just retrieve entry names for tables in an iterator, which is not what I am looking for.
Likewise, I also cannot see this data in the INFORMATION_SCHEMA metadata.
You can retrieve the number of completed queries against any table or dataset by exporting log entries to BigQuery. Every query generates some logging on Stackdriver, so you can use advanced filters to select the logs you are interested in and store them as a new table in BigQuery.
However, the retention period for data access logs in GCP is 30 days, so you can only export logs from the past 30 days.
For instance, use the following advanced filter to get the logs corresponding to all completed jobs on a specific table:
resource.type="bigquery_resource" AND
log_name="projects/<project_name>/logs/cloudaudit.googleapis.com%2Fdata_access" AND
proto_payload.method_name="jobservice.jobcompleted"
"<table_name>"
Then select BigQuery as the sink service and specify a name for your sink table and the dataset where it will be stored.
All the completed jobs on this table performed after the sink is established will appear as a new table in BigQuery. You can query this table to get information about the logs (for instance, you can use a COUNT statement on any column to get the total number of successful jobs).
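As a hedged illustration of that last step, you could count the exported rows with the BigQuery Python client; the project, dataset, and table names below are placeholders for whatever you choose when creating the sink:

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder names for the sink table created above; only log entries matching the
# advanced filter (completed jobs on the chosen table) are exported into it.
sql = "SELECT COUNT(*) AS completed_jobs FROM `your_project.your_sink_dataset.your_sink_table`"

row = next(iter(client.query(sql).result()))
print(row.completed_jobs)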
This information is available through the projects.locations.entryGroups.entries/get API. It is available as a UsageSignal and contains usage information for the past 24 hours, 7 days, and 30 days.
Sample output:
"usageSignal": {
"updateTime": "2021-05-23T06:59:59.971Z",
"usageWithinTimeRange": {
"30D": {
"totalCompletions": 156890,
"totalFailures": 3,
"totalCancellations": 1,
"totalExecutionTimeForCompletionsMillis": 6.973312e+08
},
"7D": {
"totalCompletions": 44318,
"totalFailures": 1,
"totalExecutionTimeForCompletionsMillis": 2.0592365e+08
},
"24H": {
"totalCompletions": 6302,
"totalExecutionTimeForCompletionsMillis": 25763162
}
}
}
Reference:
https://cloud.google.com/data-catalog/docs/reference/rest/v1/projects.locations.entryGroups.entries/get
https://cloud.google.com/data-catalog/docs/reference/rest/v1/projects.locations.entryGroups.entries#UsageSignal
With the Python Data Catalog client, you first need to search the Data Catalog; the response contains a linked_resource for each result.
Pass this linked_resource in a request to lookup_entry and you can fetch the number of times the table was queried in the last 30 days:
from google.cloud import datacatalog_v1

dc_client = datacatalog_v1.DataCatalogClient()

# Build a search request scoped to your project (placeholder ID), looking for table entries
scope = datacatalog_v1.SearchCatalogRequest.Scope(include_project_ids=["<project_id>"])
request = datacatalog_v1.SearchCatalogRequest(scope=scope, query="type=table")

results = dc_client.search_catalog(request=request, timeout=120.0)
for result in results:
    linked_resource = result.linked_resource
    # Look up the entry and read how many times the table was queried in the last 30 days
    table_entry = dc_client.lookup_entry(request={"linked_resource": linked_resource})
    queried_past_30_days = table_entry.usage_signal.usage_within_time_range.get("30D")
    if queried_past_30_days is not None:
        dc_num_queried_past_30_days = int(queried_past_30_days.total_completions)
    else:
        dc_num_queried_past_30_days = 0
SageMaker does offer a HyperparameterTuningJobAnalytics object, but it only contains the final objective metric value.
Here is example code.
tuner = sagemaker.HyperparameterTuningJobAnalytics(tuning_job_name)
full_df = tuner.dataframe()
The dataframe it returns only contains the objective metric as the column FinalObjectiveValue.
If I have defined more than one metric for the tuning job, how do I get other metrics in SageMaker?
You can retrieve all metrics you have configured for your job using describe_training_job. Here is an example using boto3:
Create the SageMaker client:
smclient = boto3.client('sagemaker')
Get a list of all training jobs (note the example parameters here - only getting the last 100 jobs sorted by the final objective metric in descending order):
trjobs = smclient.list_training_jobs_for_hyper_parameter_tuning_job(
    HyperParameterTuningJobName='YOUR_TUNING_JOB_NAME_HERE',
    MaxResults=100,
    SortBy='FinalObjectiveMetricValue',
    SortOrder='Descending')
Iterate over each job summary and retrieve all metrics:
for trjob in trjobs['TrainingJobSummaries']:
    jobd = smclient.describe_training_job(TrainingJobName=trjob['TrainingJobName'])
    metrics = {m['MetricName']: m['Value'] for m in jobd['FinalMetricDataList']}
    print('%s Metrics: %s' % (trjob['TrainingJobName'], metrics))
I am new to AWS. As of now we are using DynamoDB to store our logs on a daily basis for each job execution, and each day we generate a summary report from all the data that was posted to DynamoDB on the previous day.
I am facing an issue while fetching the data from DynamoDB to generate the summary report. For fetching the data, I am using the Java client inside my Scala class. The issue is that I am not able to retrieve all the data from DynamoDB for any filter condition, but when checking in the DynamoDB UI I can see many more records.
I am using the code below:
import com.amazonaws.services.dynamodbv2.{AmazonDynamoDB, AmazonDynamoDBClientBuilder}
import com.amazonaws.services.dynamodbv2.model.ScanRequest

val client: AmazonDynamoDB = AmazonDynamoDBClientBuilder.standard.build

// Function that returns the filter expression and expression attribute values
val (filterExpression, expressionAttributeValues) = getDynamoDBQuery(inputArgs)

val scanRequest: ScanRequest = new ScanRequest()
  .withTableName("table_name")
  .withFilterExpression(filterExpression)
  .withExpressionAttributeValues(expressionAttributeValues)

client.scan(scanRequest)
After a lot of analysis, it looks like DynamoDB takes a while to fetch all the data for a filter condition (when we scan the table), and the Java client is not waiting until all the records are retrieved from DynamoDB. Is there any workaround for this? Please help.
Thanks
DynamoDB returns results in a paginated manner. For a given ScanRequest, the ScanResult contains getLastEvaluatedKey that should be passed through setExclusiveStartKey of the next ScanRequest to get the next page. You should loop through this until the getLastEvaluatedKey in a ScanResult is null.
BTW, I agree with the previous answer that DynamoDB may not be an ideal choice to store this kind of data from a cost perspective, but you are a better judge of the choice made!
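For comparison, here is the same pagination loop as a Python sketch with boto3; the table name, filter expression, and attribute values are illustrative placeholders, not taken from the question:

import boto3

# Illustrative placeholders; the real filter comes from getDynamoDBQuery in the question.
table_name = "table_name"
filter_expression = "job_status = :status"
expression_attribute_values = {":status": {"S": "SUCCESS"}}

client = boto3.client("dynamodb")

items = []
kwargs = {
    "TableName": table_name,
    "FilterExpression": filter_expression,
    "ExpressionAttributeValues": expression_attribute_values,
}

# Keep scanning until DynamoDB stops returning a LastEvaluatedKey.
while True:
    response = client.scan(**kwargs)
    items.extend(response["Items"])
    last_key = response.get("LastEvaluatedKey")
    if last_key is None:
        break
    kwargs["ExclusiveStartKey"] = last_key

print(f"Fetched {len(items)} items")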
DynamoDB is not meant for the purpose you are using it for. Storage is not only costlier, but querying the data will also be costlier.
DynamoDB is meant to be a transactional key-value store.
You can store the logs via Firehose in S3 and query them with Athena. That is cheaper, more scalable, and better suited for analytical use.
Log --> Firehose --> S3 --> Athena
With regards to your question, DynamoDB will not return all the records when you request them. It will return one page of records along with a LastEvaluatedKey.
More documentation on DynamoDB Scan:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Scan.html
Hope it helps.
Thanks @Vikdor for your help. I did it the way you suggested and it worked perfectly fine. Below is the code:
import java.util
import com.amazonaws.services.dynamodbv2.{AmazonDynamoDB, AmazonDynamoDBClientBuilder}
import com.amazonaws.services.dynamodbv2.model.{AttributeValue, ScanRequest}

val client: AmazonDynamoDB = AmazonDynamoDBClientBuilder.standard.build
val (filterExpression, expressionAttributeValues) = getDynamoDBQuery(inputArgs)

val scanRequest: ScanRequest = new ScanRequest()
  .withTableName("watchman-jobs")
  .withFilterExpression(filterExpression)
  .withExpressionAttributeValues(expressionAttributeValues)

val items: util.List[util.Map[String, AttributeValue]] = new util.ArrayList[util.Map[String, AttributeValue]]()
var lastEvaluatedKey: util.Map[String, AttributeValue] = null

do {
  // Continue from where the previous page ended (null on the first iteration)
  scanRequest.withExclusiveStartKey(lastEvaluatedKey)
  val scanResult = client.scan(scanRequest)
  items.addAll(scanResult.getItems)
  lastEvaluatedKey = scanResult.getLastEvaluatedKey
} while (lastEvaluatedKey != null)

items