Can Google BigQuery data transfer service allow me to transfer specific app data automatically?
For example, I have 10 apps in my Google play console, I only want to transfer to BQ within only 3 apps. Is it possible to make this work or any approach?
Also, I just read the pricing doc: "The monthly charge is $25 per unique Package Name in the Installs_country table."
I don't quite understand how to calculate my cost with that example.
Thank you.
For your requirement, you can download the reports for a specific app to Cloud Storage by selecting that app in the Play Console, and then send them to BigQuery using the BigQuery Data Transfer Service (see the sketch after the steps below). For Google Play, the cost is calculated as $25 per month per unique package name stored in the Installs_country table in BigQuery, so transferring data for 3 apps would cost 3 × $25 = $75 per month.
To select a specific app, follow the steps below:
Go to the Play Console.
Click on Download Reports and select the type of report you want.
Under "Select an application," type and select the app for which you want to get the data.
Select the year and month for which you want to download the report.
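If you want to automate the BigQuery side, here is a minimal sketch using the Python BigQuery Data Transfer client. The project, dataset, bucket, and the Google Play connector parameters ("play", "bucket", "table_suffix") are my assumptions; please verify them against the current Data Transfer Service documentation before relying on them.

```python
# Sketch: create a BigQuery Data Transfer Service config for Google Play reports.
# The data_source_id and params shown here are assumptions -- double-check them
# in the Data Transfer Service docs for the Google Play connector.
from google.cloud import bigquery_datatransfer

project_id = "my-gcp-project"        # hypothetical project
dataset_id = "play_store_reports"    # hypothetical destination dataset

transfer_client = bigquery_datatransfer.DataTransferServiceClient()

transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id=dataset_id,
    display_name="Google Play reports",
    data_source_id="play",           # assumed ID of the Google Play connector
    params={
        # Reporting bucket shown in the Play Console under "Download reports"
        "bucket": "pubsite_prod_rev_0123456789",
        "table_suffix": "selected_apps",
    },
)

created = transfer_client.create_transfer_config(
    parent=transfer_client.common_project_path(project_id),
    transfer_config=transfer_config,
)
print(f"Created transfer config: {created.name}")
```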
Storing data in a Cloud Storage bucket will incur a cost, and the pricing for transferring data from one storage bucket to another can be checked in this link. Since you are also storing and querying data in BigQuery, that will be chargeable as well; for BigQuery pricing details you can check this documentation. You can use the Billing Calculator to estimate your costs.
I want to use AppFlow to ingest data from Google Analytics, which tracks and stores users' information and interaction data for a website. I have never worked with AppFlow or Google Analytics before, and I found that AppFlow behaves quite strangely and I don't know why. I hope somebody can give me some help or an explanation:
On the Google Analytics report console, I can see that we have a lot of customer data: ~20K customers (users).
On the AppFlow side, when I create a flow, I select some objects (or fields) from the source (GA) to ingest into S3, with an on-demand trigger and WITHOUT any filter. What I found is that:
If I select 3 fields (ga:userAgeBracket, ga:userGender, ga:interestOtherCategory), the output in the S3 bucket ends up with ~1,000 rows.
If I select the 3 fields above and add some additional fields like ga:date, ga:day, and ga:year, the output in the S3 bucket ends up with less data: ~100 records.
I have tried a few times (also with a scheduled trigger and full-load mode) and the result is still the same. I don't know whether, in the first scenario, AppFlow ingests all of my data from GA or only part of it.
Thank you very much!
Coming to storage metrics, the only options provided are 'Object count', 'Total bytes', and 'Total byte seconds'. What if I need a few more metrics, such as objects whose retention period is about to end, or the maximum size of an object in the bucket? How can I get such metrics onto a monitoring dashboard?
I tried to replicate this in my GCP environment, but unfortunately there are no metrics for retention period or for the maximum size of an object in a bucket, so I recommend you create a feature request for additional Cloud Storage metrics. Regarding your other inquiry, you can check Metrics Explorer in the documentation: you can choose a specific metric and build a chart for it either through the configuration UI (Console) or by fetching the data with the query editor (MQL or PromQL), and then save it to a custom dashboard.
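As an illustration, here is a minimal sketch of reading one of the built-in bucket metrics (object count) with the Cloud Monitoring Python client; the project ID and look-back window are placeholders, and the metric name should be confirmed in Metrics Explorer.

```python
# Sketch: read the built-in GCS object_count metric via the Cloud Monitoring API.
# Project ID and time window are placeholders; verify the metric type in
# Metrics Explorer before using it.
import time
from google.cloud import monitoring_v3

project_id = "my-gcp-project"  # hypothetical project
client = monitoring_v3.MetricServiceClient()

now = time.time()
interval = monitoring_v3.TimeInterval(
    {
        "end_time": {"seconds": int(now)},
        "start_time": {"seconds": int(now - 3600)},  # last hour
    }
)

results = client.list_time_series(
    request={
        "name": f"projects/{project_id}",
        "filter": 'metric.type = "storage.googleapis.com/storage/object_count"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

for series in results:
    bucket = series.resource.labels.get("bucket_name", "unknown")
    latest = series.points[0].value.int64_value if series.points else None
    print(f"{bucket}: {latest} objects")
```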
I'm looking for the best place to store information about when data in a BigQuery table is ready for export and the table is up to date, i.e. ready for users' queries. This information should be accessible to business users and external applications (which will check it, e.g., every 5 minutes).
I'm going to use Cloud Composer as the data workflow orchestration service, but the Composer metadata in Cloud SQL is accessible only to the user who created the Composer instance.
What are the best practices for sharing such data with users?
This is more of a functional requirement. So why not add a new record to a data store at the end of each integration, and then make that data accessible to business users? For example, you can use a store like Cloud Firestore, and when you add or modify a record you can trigger a Cloud Function that sends an email.
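A minimal sketch of that pattern with the Firestore Python client could be a small task at the end of your Composer DAG; the collection, document, and field names below are just illustrative, not a prescribed schema.

```python
# Sketch: record "table is ready" status in Firestore at the end of a load.
# Collection/document/field names are illustrative placeholders.
from google.cloud import firestore

db = firestore.Client()

def mark_table_ready(table_name: str) -> None:
    """Write a status document that business users / external apps can poll."""
    db.collection("table_status").document(table_name).set(
        {
            "status": "ready",
            "updated_at": firestore.SERVER_TIMESTAMP,
        }
    )

mark_table_ready("analytics.daily_sales")
```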
I am working on a project to get data from an Amazon S3 bucket into Tableau.
The data needs to be reorganised and combined from multiple .CSV files. Is Amazon Athena capable of connecting S3 to Tableau directly, and is it relatively easy/cheap? Or should I instead look at another software package to achieve this?
I am looking to visualise the data and provide a forecast based on the observed trend (I may need to incorporate functions to generate data for fitting a linear regression).
It appears that Tableau can query data from Amazon Athena.
See: Connect to your S3 data with the Amazon Athena connector in Tableau 10.3 | Tableau Software
Amazon Athena can query multiple CSV files in a given path (directory) and run SQL against the data. So, it sounds like this is a feasible solution for you.
Yes, you can integrate Athena with Tableau to query your data in S3. There are plenty of resources online that describe how to do that, e.g. link 1, link 2, link 3. But obviously, the tables that define the metadata of your data have to be defined beforehand.
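For instance, here is a hedged sketch of defining such a table over your CSV files and submitting a query with boto3; the region, database, bucket, and column names are placeholders, and in practice Tableau would issue the analytical queries for you through its Athena connector.

```python
# Sketch: define an Athena table over CSV files in S3 and run a query with boto3.
# Region, database, bucket, and column names are placeholders.
import boto3

athena = boto3.client("athena", region_name="us-east-1")

def run_query(sql: str) -> str:
    """Submit a query and return its execution ID (results land in S3)."""
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "sales_db"},
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )
    return response["QueryExecutionId"]

# Table definition over the CSV files (run once).
run_query("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales (
        order_id string,
        order_date date,
        amount double
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION 's3://my-data-bucket/sales/'
    TBLPROPERTIES ('skip.header.line.count' = '1')
""")

# Example analytical query of the kind Tableau could issue via its connector.
print(run_query("SELECT order_date, SUM(amount) FROM sales GROUP BY order_date"))
```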
Amazon Athena pricing is based on the amount of data scanned by each query, i.e. $5 per 1 TB of data scanned (so a query that scans 200 GB costs about $1). It all comes down to how much data you have and how it is structured, i.e. partitioning, bucketing, file format, etc. Here is a nice blog post that covers these aspects.
While you prototype a dashboard, there is one thing to keep in mind. By default, each time you change the list of parameters, filters, etc., Tableau automatically sends a request to AWS Athena to execute your query. Luckily, you can disable auto-querying of the data source and run it manually.
I tried to find out how to get the node-hour usage of my Google Cloud ML Prediction API, but didn't find anything. Is there a way to know the usage, other than looking at the bills?
Here is the API Documentation I referred to.
The documentation page you referenced is part of the Cloud Machine Learning Engine API documentation:
An API to enable creating and using machine learning models.
That API is for using the product itself, it doesn't contain billing information for the product.
For billing info you want to look at Cloud Billing and its API:
With this API, you can do any of the following.
For billing accounts:
...
Get billing information for a project.
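As an illustration of what "billing information for a project" returns, here is a minimal sketch with the Python Cloud Billing client (the project ID is a placeholder). Note that this only returns the billing account linkage, not per-product usage.

```python
# Sketch: fetch a project's billing account via the Cloud Billing API.
# This returns billing *account* linkage, not node-hour usage.
from google.cloud import billing_v1

client = billing_v1.CloudBillingClient()
info = client.get_project_billing_info(name="projects/my-gcp-project")  # placeholder project
print(info.billing_account_name, info.billing_enabled)
```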
However, from just a quick glance at the docs (I haven't used it yet), the API itself doesn't appear to directly provide the particular info you're looking for. Possible ways to get that info appear to be:
using Billing Reports:
The Cloud Billing Reports page lets you view your Google Cloud
Platform (GCP) usage costs at a glance and discover and analyze
trends. The Reports page displays a chart that plots usage costs for
all projects linked to a billing account. To help you view the cost
trends that are important to you, you can select a data range, specify
a time range, configure the chart filters, and group by project,
product, or SKU.
Billing reports can help you answer questions like these:
How is my current month's GCP spending trending?
Export Billing Data to a File:
To access a detailed breakdown of your charges, you can export your
daily usage and cost estimates automatically to a CSV or JSON file
stored in a Google Cloud Storage bucket you specify. You can then
access the data via the Cloud Storage API, CLI tool, or Google Cloud
Platform Console.
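A quick hedged sketch of reading such an exported file from the bucket with the Cloud Storage Python client; the bucket and object names are placeholders for whatever you configure in the export.

```python
# Sketch: read an exported daily billing CSV from Cloud Storage.
# Bucket and object names are placeholders.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-billing-export-bucket")
blob = bucket.blob("billing/my-billing-2018-01-01.csv")

csv_text = blob.download_as_text()
print(csv_text.splitlines()[0])  # header row with the usage/cost columns
```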
Export Billing Data to BigQuery:
Billing export to BigQuery enables you to export your daily usage and
cost estimates automatically throughout the day to a BigQuery dataset
you specify. You can then access your billing data from BigQuery. You
can also use this export method to export data to a JSON file.
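Once the export lands in BigQuery, a hedged sketch of summing costs per service with the BigQuery Python client might look like the following; the dataset and table names are placeholders (exported tables are typically named like gcp_billing_export_v1_&lt;BILLING_ACCOUNT_ID&gt;), and you could add a WHERE clause on the service or SKU to isolate ML prediction charges.

```python
# Sketch: aggregate exported billing data per service in BigQuery.
# The project/dataset/table name is a placeholder for your billing export table.
from google.cloud import bigquery

client = bigquery.Client()
query = """
    SELECT
      service.description AS service,
      SUM(cost) AS total_cost
    FROM `my-gcp-project.billing_export.gcp_billing_export_v1_XXXXXX`
    WHERE usage_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
    GROUP BY service
    ORDER BY total_cost DESC
"""
for row in client.query(query).result():
    print(f"{row.service}: {row.total_cost:.2f}")
```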