Google Data Studio Billing Report Demo for multiple GCP projects

Basically, I am trying to set up the Google Cloud Billing Report Demo for multiple projects.
The example is mentioned in this link.
In it, there are three steps to configure the data sources for Data Studio:
1. Create the Billing Export Data Source
2. Create the Spending Trends Data Source
3. Create the BigQuery Audit Data Source
Now the 1st point is quite clear.
For the 2nd point, the query example provided in the demo is based on a single project. In my case, I want the spending data source to cover multiple projects.
Does doing a UNION of the queries for each project work in this case?
For the 3rd point, I need the BigQuery audit logs from all my projects. I thought setting a single external dataset sink for BigQuery in all my projects, as shown below, would do the job:
bigquery.googleapis.com/projects/myorg-project/datasets/myorg_cloud_costs
But I see that the tables in my dataset are being created with a suffix _ (1), as shown below:
cloudaudit_googleapis_com_activity_ (1)
cloudaudit_googleapis_com_data_access_ (1)
and these tables don't contain any data, despite running BigQuery queries in all projects multiple times. In fact, previewing them shows the error below:
Unable to find table: myorg-project:cloud_costs.cloudaudit_googleapis_com_activity_20190113
I think the auto-generated name with the suffix _ (1) is causing some issue, and because of that the data is not getting populated either.
I believe there should be a very simple solution for this, but I am just not able to see it.
Can somebody please provide some information on how to solve the 2nd and 3rd requirements for multiple projects in the GCP Data Studio billing report demo?

For the 2nd point, the query example provided in the demo is based on a single project. In my case, I want the spending data source to cover multiple projects. Does doing a UNION of the queries for each project work in this case?
That project is just the project you specify for the billing logs in BigQuery. The logs are attached to the billing account, which can contain multiple projects underneath it. All projects under the billing account will be captured in the logs; more specifically, in the column project.id.
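To illustrate the point (this is a minimal sketch, not the demo's actual spending-trends query; the dataset and export-table names are placeholders for your own billing export), a single query over the export already covers every project under the billing account:

# Cost per project straight from the billing export; no per-project UNION is needed.
bq query --use_legacy_sql=false '
SELECT
  project.id AS project_id,
  SUM(cost) AS total_cost
FROM `myorg-project.myorg_cloud_costs.gcp_billing_export_v1_XXXXXX`
GROUP BY project_id
ORDER BY total_cost DESC'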
For the 3rd point, I need the BigQuery audit logs from all my projects. I thought setting a single external dataset sink for BigQuery in all my projects, as shown below, would do the job.
You use the includeChildren property. See here. If you don't have an organisation or use folders, then you will need to create a sink per project and point it at the dataset in BigQuery where you want all the logs to go. You can script this up using the gcloud tool. It's easy.
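For example, a rough sketch with gcloud (the sink names, organisation ID, and filter below are placeholders, not the demo's exact configuration):

# With an organisation: one aggregated sink that captures all child projects.
gcloud logging sinks create myorg-audit-sink \
  bigquery.googleapis.com/projects/myorg-project/datasets/myorg_cloud_costs \
  --organization=123456789012 --include-children \
  --log-filter='protoPayload.serviceName="bigquery.googleapis.com"'

# Without an organisation: one sink per project, all pointing at the same dataset.
gcloud logging sinks create bq-audit-sink \
  bigquery.googleapis.com/projects/myorg-project/datasets/myorg_cloud_costs \
  --project=my-other-project \
  --log-filter='protoPayload.serviceName="bigquery.googleapis.com"'

# Each sink has a writer identity that needs the BigQuery Data Editor role
# on the destination dataset:
gcloud logging sinks describe bq-audit-sink --format='value(writerIdentity)'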
I think the auto-generated name with the suffix _ (1) is causing some issue and because of that the data is not getting populated either.
The suffix is normal. Also, it can take a few hours for your logs/sinks to start flowing.

Related

Power BI Embedded Approach for 100s of SQL Targets

I'm trying to find the best approach to delivering a BI solution to 400+ customers which each have their own database.
I've got PowerBI Embedded working using service principal licensing and I have the PowerBI service connected to my data through the On Premise Data Gateway.
I've built my first report pointing to one of the customer databases, which works lovely.
What I want to do next, when embedding the report, is to tell Power BI, for this session, to get the data from a different database.
I'm struggling to find somewhere where this is explained, or to understand if this is even possible.
I'm trying to avoid creating 400+ WorkSpaces or 400+ Data Sets.
If someone could point me in the right direction, it would be appreciated.
You can configure the report to use parameters and these parameters can be used to configure the source for your dataset:
https://www.phdata.io/blog/how-to-parameterize-data-sources-power-bi/
These parameters can be set by the app hosting the embedded report:
https://learn.microsoft.com/en-us/rest/api/power-bi/datasets/update-parameters-in-group
Because the app is setting the parameter, each user will only see their own data. Since this will be a live connection, you would need to think about how the underlying server can support the workload.
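As a rough sketch of that call (the workspace ID, dataset ID, access token, and the CustomerDatabase parameter name are all placeholders):

# Set the (hypothetical) CustomerDatabase parameter on the embedded report's dataset
# before generating the embed token.
curl -X POST \
  "https://api.powerbi.com/v1.0/myorg/groups/<workspace-id>/datasets/<dataset-id>/Default.UpdateParameters" \
  -H "Authorization: Bearer <access-token>" \
  -H "Content-Type: application/json" \
  -d '{"updateDetails": [{"name": "CustomerDatabase", "newValue": "CustomerDb042"}]}'

Depending on the dataset's storage mode, you may also need to refresh the dataset or update its data source after changing the parameter for the new value to take effect.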
An alternative solution would be to consolidate the customer databases into a single database (just the relevant tables) and use row level security to restrict access for each customer. The advantage to this design is that you take the burden off of the underlying SQL instance and push it into a PBI dataset that is made to handle huge datasets with sub-second response times.
More on that here: https://learn.microsoft.com/en-us/power-bi/enterprise/service-admin-rls

Creating a BigQuery dataset from a log sink in GCP

When running
gcloud logging sinks list
it seems I have several sinks for my project
▶ gcloud logging sinks list
NAME DESTINATION FILTER
myapp1 bigquery.googleapis.com/projects/myproject/datasets/myapp1 resource.type="k8s_container" resource.labels.cluster_name="mygkecluster" resource.labels.container_name="myapp1"
myapp2 bigquery.googleapis.com/projects/myproject/datasets/myapp2 resource.type="k8s_container" resource.labels.cluster_name="mygkecluster" resource.labels.container_name="myapp2"
myapp3 bigquery.googleapis.com/projects/myproject/datasets/myapp3 resource.type="k8s_container" resource.labels.cluster_name="mygkecluster" resource.labels.container_name="myapp3"
However, when I navigate in my BigQuery console, I don't see the corresponding datasets.
Is there a way to import these sinks as datasets so that I can run queries against them?
This guide on creating BigQuery datasets does not list how to do so from a log sink (unless I am missing something)
Also any idea why the above datasets are not displayed when using the bq ls command?
Firstly, be sure you are in the right project. If not, you can pin a dataset from an external project by clicking the PIN button (you need sufficient permissions for this).
Secondly, a Cloud Logging sink to BigQuery doesn't create the dataset, only the tables. So, if you have created the sinks without first creating the datasets, your sinks aren't running (or are running in error). Here are more details:
BigQuery: Select or create the particular dataset to receive the exported logs. You also have the option to use partitioned tables.
In general, what you expect this feature to do is right: using BigQuery as a log sink allows you to query the logs with BQ. As for the problem you're facing, I believe it has to do with using the web console vs. gcloud.
When using BigQuery as a log sink, there are two ways to specify a dataset:
point to an existing dataset
create a new dataset
When creating a new sink via the web console, there's an option to have Cloud Logging create a new dataset for you as well. However, gcloud logging sinks create does not automatically create a dataset for you; it only creates the log sink. It also seems not to validate whether the specified dataset exists.
To resolve this, you could either use the web console for the task or create the datasets on your own. There's nothing special about creating a BQ dataset to be a log sink destination compared to creating a BQ dataset for any other purpose. Create a BQ dataset, then create a log sink pointing to that dataset, and you're good to go.
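A minimal sketch of that flow with the bq and gcloud tools (project, dataset, and filter names are illustrative, following the layout in the question):

# Create the destination dataset first:
bq mk --dataset myproject:myapp1

# Then create the sink pointing at that dataset:
gcloud logging sinks create myapp1 \
  bigquery.googleapis.com/projects/myproject/datasets/myapp1 \
  --log-filter='resource.type="k8s_container" AND resource.labels.container_name="myapp1"'

# Finally, grant the sink's writer identity (a service account) the
# BigQuery Data Editor role on the dataset:
gcloud logging sinks describe myapp1 --format='value(writerIdentity)'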
Conceptually, the different products on GCP (BigQuery, Cloud Logging) run independently. The log sink in Cloud Logging is simply an object that pairs up a filter and a destination, but it does not own or manage the destination resource (e.g. a BQ dataset). It's just that the web console provides some extra integration to make things easier.

Why does querying a report from google play console by the google cloud BigQuery API give incomplete results

I'm trying to get data from one of the reports available in the google play console. Specifically the user_acquisition report. I set up the data transfer service within the google cloud platform in order to use the BigQuery API.
When querying that specific report the results are partial. Some columns match the results I get when downloading the report manually but other columns just have the value null although the downloaded report shows that there should be numerical values there.
Another peculiar thing is that when specifying a date range for the query (month of may for example) the result will show about 1/3 of the dates in that month but there should be a row for each day of the month.
When looking at the transfer run history, some of the runs have completed successfully, and some have failed, giving the error message: Error code 5 : No files found for any reports. Please make sure you selected the correct Google Cloud Storage bucket and Google Play reports exist. But if no files are found, then how am I getting any results at all?
The users of both the GCP and Google Play Console are the owners of the project, so there shouldn't be any issue with the permissions to access the bucket where the reports are stored.
I tried creating another data transfer service to see if it can even find the reports. It did find some of the files but not the one I'm interested in. The transfer run history shows the same error as mentioned above.
Has anyone had some similar problem before and perhaps can offer some sort of solution? Or maybe just has some insights into why this problem is occurring?
I think the issue could be related to the availability of the desired report, since I've found that only some reports are supported by this service:
Detailed reports (Reviews, Financial reports)
Aggregated reports (Statistics, User acquisition)
Could it happen that the specific report you want to export is not supported?
If that's not the case, I think you should file a support case sharing the "Resource name" shown in the transfer details of the failed exports (and of the correct ones, for reference). As an alternative to a support ticket, you can also report a defect against the transfer service on the Public Issue Tracker. The support team can help you review the error message further.

"No data" message in Google Data Studio chart after connecting dataset from BigQuery?

I am trying to connect and visualise aggregation of metrics from a wildcard table in BigQuery. This is the first time I am connecting a table from this particular Google Cloud project to Data Studio. Prior to this, I have successfully connected and visualised metrics from other BigQuery tables from other Google Cloud projects in Google Data Studio and never encountered this issue. Any ideas? Could this be something to do with project-level permissions for Google Data Studio to access a BigQuery table for the first time?
More details on this instance: the dataset itself seems to be successfully connected in Data Studio, so no errors were encountered there. After adding some charts connected to that data source and aggregating metrics, no other Data Studio error messages were encountered, just the words "No data" displayed in the chart. Could this also be a formatting issue in the BigQuery table itself? The BigQuery table in question was created via pandas-gbq in a loop to split the original dataset into individual daily _YYYYMMDD tables. However, this has been done before and never presented a problem.
I have been struggling with the same problem for a while, and eventually I found out that, at least in my case, it is related to the date I add to the suffix (_YYYYMMDD). If I add "today" to the suffix, Data Studio won't recognize it and will display "no data", but if I change it to "yesterday" (a day earlier), it then displays the data correctly. I think it is probably related to time zones, e.g. "today" here is not yet "today" in the US, so the system can't show it. Hopefully this helps.
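As a quick sanity check along these lines (the project, dataset, and table prefix below are placeholders for your own wildcard tables), you can ask BigQuery which daily suffixes actually contain rows before pointing Data Studio at them:

# List the most recent daily suffixes and their row counts.
bq query --use_legacy_sql=false '
SELECT _TABLE_SUFFIX AS day, COUNT(*) AS row_count
FROM `myproject.mydataset.mytable_*`
GROUP BY day
ORDER BY day DESC
LIMIT 7'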

How to update data in google cloud storage/bigquery for google data studio?

For context, we would like to visualize our data in Google Data Studio; this dataset receives more entries each week. I have tried hosting our datasets in Google Drive, but it seems that they're too large and this slows down Google Data Studio (the file is only 50 MB, am I doing something wrong?).
I have loaded our data into Google Cloud Storage --> Google BigQuery, and connected my Google Data Studio to my BigQuery table. This has allowed me to use the Google Data Studio dashboard much more quickly!
I'm not sure what the best way is to update our data weekly in Google Cloud/BigQuery. I have found a slow way to do this by uploading the new weekly data to Google Cloud and then appending the data to my table manually in BigQuery, but I'm wondering if there's a better way to do this (or at least a more automated one)?
I'm open to any suggestions, and if you think that BigQuery/Google Cloud Storage is not the answer for me, please let me know!
If I understand your question correctly, you want to automate the query that populates your table, which is connected to Data Studio.
If this is the case, then you can use scheduled queries in BigQuery. A scheduled query lets you define a query whose results are written to a destination table. In particular, you can specify rules for repetition (as frequently as every 15 minutes) and execution, as well as destination write options (destination table, write mode: append or truncate).
In order to use scheduled queries, your account must have the right permissions. You can have a look at the following documentation to better understand how to use them [1]; a minimal command-line sketch is also shown after the references.
Also, please note that on the front end, the updated data in the BigQuery table will only appear in Data Studio after a refresh (click the refresh button in Data Studio). To automatically refresh the front-end visualization, you can use the following plugin [2] or automate the click on the refresh button through browser console commands.
[1] https://cloud.google.com/bigquery/docs/scheduling-queries
[2] https://chrome.google.com/webstore/detail/data-studio-auto-refresh/inkgahcdacjcejipadnndepfllmbgoag?hl=en
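For illustration only, here is a rough command-line sketch of the scheduled query described in [1] (the dataset, table, schedule, and SQL are all placeholders, and the BigQuery Data Transfer Service is assumed to be enabled):

# Append the results of a placeholder query into a reporting table every Monday.
bq mk --transfer_config \
  --data_source=scheduled_query \
  --target_dataset=my_reporting_dataset \
  --display_name='Weekly append of new entries' \
  --schedule='every monday 06:00' \
  --params='{
    "query": "SELECT * FROM `myproject.staging.weekly_upload`",
    "destination_table_name_template": "weekly_data",
    "write_disposition": "WRITE_APPEND"
  }'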