GCP - Is there a way to get bill line items at Instance level - google-cloud-platform

GCP provides a mechanism to export billing data to BigQuery. This is really helpful, but what it lacks is cost line items at the instance level (or at least I could not figure out a way to get them). We can get cost aggregates at the SKU, project, and service level, but more granularity is required. This is very much possible with Azure and AWS.
Following are the columns I see in the exported BigQuery Billing table;
billing_account_id, invoice.month, cost_type, service.id, service.description, sku.id, sku.description, usage_start_time, usage_end_time, project.id, project.name, project.ancestry_numbers, project.labels.key, project.labels.value, labels.key, labels.value, system_labels.key, system_labels.value, location.location, location.country, location.region, location.zone, cost, currency, currency_conversion_rate, usage.amount, usage.unit, usage.amount_in_pricing_units, usage.pricing_unit, credits.name, credits.amount, export_time
Is there a workaround to fetch cost aggregates at the instance level?
Example: if I have subscribed for two Compute Engine instances of a specific SKU, is there a mechanism to get cost aggregates for each instance separately?

At the moment it's not possible to filter your reports at the instance level; SKU is the most granular filter.
An approach you can use to identify your instances and get a better understanding of your data is to use labels. As you can see here:
A label is a key-value pair that helps you organize your Google Cloud instances. You can attach a label to each resource, then filter the resources based on their labels. Information about labels is forwarded to the billing system, so you can break down your billing charges by label.
In this document, which explains the billing data table's schema, you can see that the labels attached to your resources are present in the exported data.
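For example, if you attach a label such as instance-name to each VM, you can approximate per-instance cost directly from the export table. This is only a sketch: the table name below is a placeholder for your own billing export table, and the instance-name label key is an assumption (use whatever key you actually applied to your instances).
#standardSQL
-- Placeholder table name: replace with your own billing export table.
SELECT
  l.value AS instance_label,
  SUM(cost) AS total_cost
FROM
  `my_project.billing_dataset.gcp_billing_export_v1_XXXXXX`,
  UNNEST(labels) AS l
WHERE
  l.key = 'instance-name'  -- assumed label key applied to each instance
GROUP BY instance_label
ORDER BY total_cost DESC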

Related

What is the difference between "Use Case Optimized Recommenders" and "Custom Recommender Solutions" in Amazon Personalize?

I'm new to Amazon Personalize. I'm checking the price of this service on this link and I see 3 different categories ("Use Case Optimized Recommenders", "User Segmentation", and "Custom Recommender Solutions"). I wonder what the main difference between them is.
As I noticed, the Use Case Optimized Recommenders price doesn't include "Training cost" and "TPS cost". Is this true? How can this Recommendation Mode work without Training?
Also, what should I do if I upload new data from new users and need to re-train each month? Can I do that with Use Case Optimized Recommenders, since they don't have a "Training Cost"? I ask because the price of Custom Recommender Solutions for real-time recommendations is quite high.
Training and retraining is managed by Personalize for Use Case Optimized Recommenders. They are designed specifically for the most common use cases in Media (VOD) and Retail and are intended to make it easier to launch and operate recommendation engines for these industries. They must be created within a Domain Dataset Group.
Domain dataset group: A dataset group containing preconfigured resources for different business domains and use cases. Amazon Personalize manages the life cycle of training models and deployment. When you create a Domain dataset group, you choose your business domain, import your data, and create recommenders for each of your use cases. You use your recommender in your application to get recommendations with the GetRecommendations operation.
Therefore, the cost for retraining Use Case Optimized Recommenders is built into their pricing. There is still a cost for real-time recommendations when you exceed the free number of recommendations per hour.
Custom Recommenders do not support automatic training/retraining so you are responsible for initiating training by creating Solution Versions. Note that you can add custom recommenders to a domain dataset group but you cannot add use case optimized recommenders to a custom dataset group.
If you start with a Domain dataset group, you can still add custom resources such as solutions and solution versions trained with recipes for custom use cases.
Regardless of the dataset group type you create, you still want to keep your datasets updated with the latest interactions and item/user data.
User Segmentation is designed for building segments of users based on their affinity for items or item attributes. From a training/retraining perspective, it is treated like a custom recommender.
The AWS pricing calculator for Personalize was recently updated to support Use Case Optimized Recommenders and User Segmentation.

Google Cloud Billing - Filter By Label Not Showing

I added resource labels to a few VMs to be able to pull a more granular billing breakdown by label. However, when I go to the billing report, I don't see any option to filter by Label. Is this a permission issue or am I missing something?
If I embed "label=" in the URL, the label option will show, but it still doesn't retrieve the matching key-value pair.
As per my analysis, your issue can be due to the reasons below:
As per the official doc:
When filtering your billing breakdown by label keys, you are not able to select labels applied to a project. You can select other user-created labels that you set up and applied to Google Cloud services.
This might be the reason you are unable to filter by the label.
Google does not recommend creating large numbers of unique labels, such as for timestamps or individual values for every API call. Refer to these common use cases for labels, and refer to this link for the label requirements.
You need the "resourcemanager.projects.get" permission, and also "resourcemanager.projects.update" to add or modify the label.
Refer to this link to create the label.
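If you have already enabled billing export to BigQuery, a quick way to check whether your labels are actually reaching the billing system is to list the distinct label keys in the export table. This is only a sketch; the table name is a placeholder for your own billing export table.
#standardSQL
-- Placeholder table name: replace with your own billing export table.
SELECT DISTINCT
  l.key AS label_key
FROM
  `my_project.billing_dataset.gcp_billing_export_v1_XXXXXX`,
  UNNEST(labels) AS l
ORDER BY label_key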

Monitor BigQuery Performance

We have BigQuery instances with various datasets. For each dataset we want to monitor the usage,
like the number of queries per dataset, the queries fired against each dataset, and the number of users accessing the dataset.
Is there any way in which we can monitor BigQuery usage?
You can see some metrics here:
https://console.cloud.google.com/monitoring/dashboards/resourceList/bigquery_dataset?project=**[YOUR_PROJECTID_GOES_HERE]**
Some more info here as well: https://cloud.google.com/bigquery/docs/monitoring
You can also enable BigQuery audit logs, and query the audit tables to get some insights https://cloud.google.com/bigquery/docs/reference/auditlogs.
For fine-grained monitoring of users, queries, and the like, you will probably only be able to do that using the audit logs.
Most likely the best choice here is to simply query the job metadata directly in aggregate, through the relevant INFORMATION_SCHEMA views.
See https://cloud.google.com/bigquery/docs/information-schema-jobs for details about the job views, which includes some simple query examples at the end.
The jobs views do provide a list of referenced_tables, and you can identify the datasets involved from them. You'll likely need to consider how you report on queries that reference multiple datasets, particularly if you are reporting on metrics like bytes scanned or resources utilized.
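As a rough illustration, the sketch below assumes the us region and the JOBS_BY_PROJECT view (adjust the region and time window to your setup) and counts queries and distinct users per referenced dataset. Note that a query touching tables in several datasets is counted once per dataset.
#standardSQL
-- Queries and distinct users per referenced dataset over the last 30 days.
SELECT
  rt.dataset_id,
  COUNT(DISTINCT job_id) AS query_count,
  COUNT(DISTINCT user_email) AS user_count
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT,
  UNNEST(referenced_tables) AS rt
WHERE
  job_type = 'QUERY'
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY rt.dataset_id
ORDER BY query_count DESC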

BigQuery summary

Where/how can I easily see how many BigQuery analysis queries have been run per month? How about storage usage overall and changes over time (monthly)?
I've had a quick look at "Monitoring > Dashboards > Bigquery". Is that the best place to explore? It only seems to go back to early October - was that when it was released or does it only display the last X weeks of data? Trying metrics explorer for Queries Count (Metric:bigquery.googleapis.com/job/num_in_flight) was giving me a weird unlabelled y-axis, e.g. a scale of 0 to 0.08? Odd as I expect to see a few hundred queries run per week.
Context: It would be good to have a high-level summary of BigQuery, as the months progress, to give the wider organisation and management an idea of the scale of usage.
You can track your bytes billed by exporting BigQuery usage logs.
Set up a logs export (this is using the Legacy Logs Viewer):
Open Logging -> Logs Viewer
Click Create Sink
Enter "Sink Name"
For "Sink service" choose "BigQuery dataset"
Select your BigQuery dataset to monitor
Create sink
Once the sink is enabled, every executed query will store its data usage logs in the table "cloudaudit_googleapis_com_data_access_YYYYMMDD" under the BigQuery dataset you selected in your sink.
Here is a sample query to get billed bytes per user:
#standardSQL
WITH data AS (
  SELECT
    protopayload_auditlog.authenticationInfo.principalEmail AS principalEmail,
    protopayload_auditlog.metadataJson AS metadataJson,
    CAST(JSON_EXTRACT_SCALAR(protopayload_auditlog.metadataJson,
      "$.jobChange.job.jobStats.queryStats.totalBilledBytes") AS INT64) AS totalBilledBytes
  FROM
    `myproject_id.training_big_query.cloudaudit_googleapis_com_data_access_*`
)
SELECT
  principalEmail,
  SUM(totalBilledBytes) AS billed_bytes
FROM
  data
WHERE
  JSON_EXTRACT_SCALAR(metadataJson, "$.jobChange.job.jobConfig.type") = "QUERY"
GROUP BY principalEmail
ORDER BY billed_bytes DESC
NOTES:
You can only track the usage starting at the date when you set up the logs export
Table "cloudaudit_googleapis_com_data_access_YYYYMMDD" is created daily to track all logs
I think Cloud Monitoring is the only place to create and view metrics. If you are not happy with what it provides for BigQuery by default, the only other alternative is to create your own customized charts and dashboards that satisfy your needs. You can achieve that using Monitoring Query Language (MQL). Using MQL you can achieve what you described in your question. Here are the links for more detailed information.
Introduction to BigQuery monitoring
Introduction to Monitoring Query Language

How to retrieve BigQuery billing details from the GCP console or UI?

My team is using BigQuery for our product development. A bill of Rs 5159 was generated for a single day's transactions.
I checked the transaction details and
BigQuery Analysis: 15.912 Tebibytes [Currency conversion: USD to INR using rate 69.155]
Is it possible to somehow find out more details about the transactions, like the table names, the queries that were executed, and the exact time of execution?
BigQuery automatically sends audit logs to Stackdriver Logging and provides the ability to do aggregated analysis on log data. See the BigQuery schema for exported logs for details.
As a quick example: query cost breakdown by identity.
This query shows estimated query costs by user identity. It estimates costs based on the list price for on-demand queries in the US. This pricing may not be accurate for other locations or for customers leveraging flat-rate billing.
#standardSQL
WITH data AS (
  SELECT
    protopayload_auditlog.authenticationInfo.principalEmail AS principalEmail,
    protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent AS jobCompletedEvent
  FROM
    `MYPROJECTID.MYDATASETID.cloudaudit_googleapis_com_data_access_YYYYMMDD`
)
SELECT
  principalEmail,
  FORMAT('%9.2f', 5.0 * (SUM(jobCompletedEvent.job.jobStatistics.totalBilledBytes) / POWER(2, 40))) AS Estimated_USD_Cost
FROM
  data
WHERE
  jobCompletedEvent.eventName = 'query_job_completed'
GROUP BY principalEmail
ORDER BY Estimated_USD_Cost DESC
As of last year, BigQuery provides INFORMATION_SCHEMA views that also give access to job information via the JOBS_BY_* views. The INFORMATION_SCHEMA.JOBS_BY_USER and INFORMATION_SCHEMA.JOBS_BY_PROJECT views even include the exact query alongside the processed bytes. It might not be 100% accurate (because bytes processed != bytes billed), but it should allow you to gain a good overview of your costs, which queries triggered them, and who the initiator was.
Example
SELECT
  creation_time,
  job_id,
  project_id,
  user_email,
  total_bytes_processed,
  query
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
The most "efficient" way to keep an eye on the cost is using the INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION view as it automatically includes all projects of the organization. You need to be Organization Owner or Organization Administrator to use that view, though.
From there you can figure out which jobs were the most expensive (i.e., get their job ids) and from there drill down via JOBS_BY_PROJECT to get the exact query.
See https://www.pascallandau.com/bigquery-snippets/monitor-query-costs/ for a more comprehensive explanation.
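For instance, a query along the following lines surfaces the most expensive jobs across the organization, whose job ids you can then look up in JOBS_BY_PROJECT. This is a sketch assuming the us region and on-demand US list pricing of roughly $5 per TiB; adjust both for your setup.
#standardSQL
-- Most expensive query jobs across the organization by billed bytes.
SELECT
  project_id,
  job_id,
  user_email,
  total_bytes_billed,
  5.0 * (total_bytes_billed / POWER(2, 40)) AS estimated_usd_cost
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION
WHERE
  job_type = 'QUERY'
ORDER BY total_bytes_billed DESC
LIMIT 20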
You need to Export Billing Data to BigQuery
Tools for monitoring, analyzing and optimizing cost have become an important part of managing development. Billing export to BigQuery enables you to export your daily usage and cost estimates automatically throughout the day to a BigQuery dataset you specify. You can also export the data to a CSV or JSON file. However, if you use regular file export, you should be aware that regular file export captures a smaller dataset than export to BigQuery. For more information about regular file export and the data it captures, see Export Billing Data to a File.
After you enable BigQuery export, it might take a few hours to start seeing your data. Billing data automatically exports your data to BigQuery in regular intervals, but the frequency of updates in BigQuery varies depending on the services you're using. Note that BigQuery loads are ACID compliant, so if you query the BigQuery billing export dataset while data is being loaded into it, you will not encounter partially loaded data.
Follow the step by step guide: How to enable billing export to BigQuery
https://cloud.google.com/billing/docs/how-to/export-data-bigquery
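Once the export is flowing, you can answer the original question (how much did BigQuery analysis cost on a given day?) directly from the export table. This is only a sketch: the table name is a placeholder for your own billing export table, and 'BigQuery' is assumed to be the service description used in the export.
#standardSQL
-- Placeholder table name: replace with your own billing export table.
SELECT
  DATE(usage_start_time) AS usage_date,
  sku.description AS sku,
  SUM(cost) AS total_cost
FROM
  `my_project.billing_dataset.gcp_billing_export_v1_XXXXXX`
WHERE
  service.description = 'BigQuery'
GROUP BY usage_date, sku
ORDER BY usage_date DESC, total_cost DESC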