Share information with users when BigQuery data is up-to-date - google-cloud-platform

I'm looking for the best place to store information indicating that data in a BigQuery table is ready for export and the table is up-to-date, i.e. ready for users' queries. This information should be accessible to business users and external applications (which would check it, say, every 5 minutes).
I'm going to use Cloud Composer as the data workflow orchestration service, but the Composer metadata in Cloud SQL is accessible only to the user who created the Composer instance.
What are the best practices for sharing such data with users?

This is more of a functional requirement. So why not add a new record to a data store at the end of each integration, and then make that data accessible to business users? Or you can use a store like Cloud Firestore, and when you add or modify a record you can trigger a Cloud Function that sends an email.
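If the last step of your Composer DAG writes a small status record, business users and external applications can simply poll that. Below is a minimal sketch assuming Cloud Firestore and the google-cloud-firestore client; the collection, document, and field names (table_status, dataset, table, ready_at) are illustrative, not part of any official schema.

```python
# Minimal sketch: the final task of a Composer/Airflow DAG writes a
# "table ready" record to Firestore so users and external apps can poll it.
# Collection/field names are illustrative placeholders.
from datetime import datetime, timezone

from google.cloud import firestore


def publish_table_status(dataset: str, table: str) -> None:
    """Upsert a readiness record after the export/load finishes."""
    db = firestore.Client()
    db.collection("table_status").document(f"{dataset}.{table}").set(
        {
            "dataset": dataset,
            "table": table,
            "status": "READY",
            "ready_at": datetime.now(timezone.utc).isoformat(),
        }
    )


# Example: call this from the last PythonOperator task in the DAG.
# publish_table_status("analytics", "daily_sales")
```

An external application then only needs read access to that one Firestore collection, and a Cloud Function with a Firestore trigger on the same collection could send the email notification mentioned above.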

Related

BigQuery data transfer service for Google Play specific App data

Can the Google BigQuery Data Transfer Service transfer data for specific apps automatically?
For example, I have 10 apps in my Google Play Console but only want to transfer 3 of them to BigQuery. Is it possible to make this work, or is there another approach?
Also, I just read the pricing doc: the monthly charge is $25 per unique package name in the Installs_country table.
I don't quite understand how to calculate my cost from that example.
Thank you.
For your requirement, you can download a specific app's reports to Cloud Storage by selecting that app in the Google Play Console, and then send them to BigQuery using the BigQuery Data Transfer Service. As for the Google Play cost calculation, it is charged at $25 per month per unique package name stored in the Installs_country table in BigQuery, so transferring reports for 3 apps would be roughly 3 × $25 = $75 per month.
To select a specific app, follow the steps below:
Go to the Play Console.
Click on Download Reports and select the type of report you want.
Under "Select an application," type and select the app for which you want to get the data.
Select the year and month for which you want to download the report.
If you are storing data in a Cloud Storage bucket, that will incur cost, and the pricing for data transfer from one storage bucket to another can be checked in this link. Since you are storing and querying in BigQuery, that will also be chargeable; for BigQuery pricing details you can check this documentation. You can use the Billing Calculator to estimate your costs.
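If you would rather set the transfer up programmatically than through the console, something along these lines should work with the Python client for the BigQuery Data Transfer Service. This is only a sketch: the data_source_id value ("play") and the params key ("bucket", the Cloud Storage bucket linked to your Play Console account) are assumptions to verify against the Google Play transfer documentation, and the project, dataset, and bucket values are placeholders.

```python
# Hedged sketch: create a Google Play transfer config with the BigQuery
# Data Transfer Service Python client. The data_source_id and params key
# are assumptions -- verify them against the Play transfer docs.
from google.cloud import bigquery_datatransfer

project_id = "your-project-id"           # placeholder
destination_dataset = "play_reports"     # placeholder BigQuery dataset
play_bucket = "pubsite_prod_rev_XXXX"    # placeholder: bucket from Play Console

client = bigquery_datatransfer.DataTransferServiceClient()

transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id=destination_dataset,
    display_name="Google Play reports",
    data_source_id="play",               # assumed data source ID for Google Play
    params={"bucket": play_bucket},      # assumed parameter name
)

response = client.create_transfer_config(
    parent=client.common_project_path(project_id),
    transfer_config=transfer_config,
)
print(f"Created transfer config: {response.name}")
```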

Transfer/Replicate Data periodically from AWS Documentdb to Google Cloud Big Query

We are building a customer-facing app. For this app, data is captured by IoT devices owned by a third party and is transferred to us from their server via API calls. We store this data in our AWS DocumentDB cluster. The user app is connected to this cluster and has real-time data feed requirements. Note: the data is time-series data.
The thing is, for long-term data storage and for building analytics dashboards to share with stakeholders, our data governance folks are asking us to replicate/copy the data daily from the AWS DocumentDB cluster to their Google Cloud Platform -> BigQuery. Then we can run queries directly on BigQuery to perform analysis and send the data to, say, Explorer or Tableau to create dashboards.
I couldn't find any straightforward solutions for this. Any ideas, comments, or suggestions are welcome. How do I achieve or plan the above replication? And how do I make sure the data is copied efficiently, in terms of memory and pricing? Also, I don't want to disturb the performance of AWS DocumentDB since it supports our user-facing app.
This would need some custom implementation. You can use change streams and process the data changes in intervals to send to BigQuery, which gives you a replication mechanism on which to run analytics. One documented use case of change streams is analytics with Redshift, and BigQuery can serve a similar purpose.
Using Change Streams with Amazon DocumentDB:
https://docs.aws.amazon.com/documentdb/latest/developerguide/change_streams.html
This document also contains sample Python code for consuming change stream events.
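As a rough illustration of the pattern (not a production implementation), a small consumer could watch a DocumentDB collection's change stream with pymongo and load the events into BigQuery in batches. This assumes change streams are enabled on the collection; the connection string, database/collection names, target table, and field mapping are all placeholders, and resume tokens, error handling, and time-based flushing are left out for brevity.

```python
# Rough sketch: consume DocumentDB change stream events with pymongo and
# stream them into BigQuery in batches. All names are placeholders.
from google.cloud import bigquery
from pymongo import MongoClient

DOCDB_URI = "mongodb://user:pass@your-docdb-endpoint:27017/?tls=true"  # placeholder
BQ_TABLE = "your-project.analytics.iot_events"                          # placeholder

mongo = MongoClient(DOCDB_URI)
collection = mongo["iot"]["readings"]      # placeholder database/collection
bq = bigquery.Client()

batch = []
with collection.watch(full_document="updateLookup") as stream:
    for change in stream:
        doc = change.get("fullDocument") or {}
        batch.append(
            {
                "operation": change["operationType"],
                "document_id": str(doc.get("_id")),
                "payload": str(doc),       # simplest possible mapping
            }
        )
        if len(batch) >= 500:              # flush in intervals
            errors = bq.insert_rows_json(BQ_TABLE, batch)
            if errors:
                raise RuntimeError(f"BigQuery insert errors: {errors}")
            batch.clear()
```

Running this on a schedule (or as a long-lived worker reading from the last resume token) keeps the analytics copy in BigQuery without adding read load beyond the change stream itself.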

Database changes on AWS real time sync to GCP

We have PostgreSQL on AWS. All real-time changes from the portal UI are captured in this database. However, there is a request to move these changes, in real time or near real time, to GCP.
Purpose: we want various consumers to ingest data from GCP instead of from the master data source in Postgres on AWS.
When a new customer record is inserted into the customer table (in AWS Postgres), I want to immediately publish that record, in JSON format, to a GCP Pub/Sub topic.
Can you point me to any references for moving table-specific data across clouds as and when a DML event occurs?
Please note that I am new to GCP and still learning and exploring :)
Thanks
Databases use log shipping to update slaves/replicas. In your case, you want to update two targets (database, Cloud Pub/Sub) by having the database do the Pub/Sub update. That might be possible but will require development work.
PostgreSQL does not have a native ability to update Pub/Sub. Instead, change your requirements so that the application/service that updates the database also updates Pub/Sub (see the sketch after the links below).
If you really want PostgreSQL to do this task, you will need to use PostgreSQL triggers and write a trigger function in C with the Google Cloud Pub/Sub REST API.
PostgreSQL Trigger Example
PostgreSQL Event Trigger Example
Event triggers for PostgreSQL on Amazon RDS
Cloud Pub/Sub API
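To illustrate the recommended pattern, here is a minimal sketch in which the service that inserts the customer row also publishes the record to Pub/Sub. The connection string, table, columns, and topic name are placeholders, and the trigger-based alternative is not shown.

```python
# Sketch of the recommended pattern: the service that inserts the row also
# publishes the new record to Pub/Sub, instead of asking PostgreSQL to do it.
# Connection details, table/column names, and the topic are placeholders.
import json

import psycopg2
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("your-project-id", "customer-changes")  # placeholder


def create_customer(conn, name: str, email: str) -> None:
    # Insert the row; the `with conn` block commits the transaction on exit.
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO customer (name, email) VALUES (%s, %s) RETURNING id",
            (name, email),
        )
        customer_id = cur.fetchone()[0]

    # Publish the new record as JSON once the transaction has committed.
    message = json.dumps({"id": customer_id, "name": name, "email": email})
    publisher.publish(topic_path, message.encode("utf-8")).result()


# conn = psycopg2.connect("postgresql://user:pass@aws-host:5432/portal")  # placeholder
# create_customer(conn, "Ada Lovelace", "ada@example.com")
```

Consumers on the GCP side then subscribe to the topic and never need to reach back into the AWS database.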

gcp - Trigger Cloud Function on database insert?

Not sure how to search for this; I'm looking for a way to trigger a Cloud Function whenever a new row is inserted into a database in Cloud SQL. Searching for "google cloud function events" (or "triggers") turns up Firebase results, which is not what I want.
There is a series of Cloud Functions that receive data and transform it according to the clients' needs; in the end, after some manipulation, that data ends up in a table. Is there an event I can listen to so I can access the newly inserted rows? If not, I might end up using Cloud Scheduler to peek into the DB regularly. However, that solution doesn't seem viable long term.
I'd appreciate any advice.
Currently there is no official Cloud Function event that can be triggered by changes to a Cloud SQL database. You can check the available events in the Events and Triggers documentation.
You could still achieve something similar with Cloud Pub/Sub, in one of two ways:
1 - The first would be to enable and export logs from the Cloud SQL instance to a Pub/Sub topic by creating a sink in Stackdriver, and have the Cloud Function listen to that topic.
Although this method does not require you to change the way you insert data into the DB, it might expose too much information, as all queries will be logged to Stackdriver. It also means you would not have full control over what information is passed to the function, as the message would be the contents of the log entry.
2 - The ideal solution would be to create the Pub/Sub topic and publish to it whenever you insert new data into the database. This way you have more control over the information sent to the topic. You can find more information about how to set up a new topic in the Cloud Pub/Sub documentation.
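For option 2, the consuming side is a background Cloud Function deployed with a Pub/Sub trigger on the topic you publish to after each insert. A minimal sketch, assuming the message body is a small JSON object describing the new row (the payload shape is illustrative):

```python
# Sketch of the Cloud Function for option 2: deployed with a Pub/Sub trigger
# on the topic that receives a message after each insert. The payload shape
# ({"table": ..., "id": ...}) is illustrative, not prescribed.
import base64
import json


def handle_new_row(event, context):
    """Background Cloud Function triggered by a Pub/Sub message."""
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    # `payload` carries whatever you published after the insert,
    # e.g. {"table": "orders", "id": 123}. Process the new row here.
    print(f"New row notification: {payload}")
```

Deployment would look roughly like `gcloud functions deploy handle_new_row --runtime python39 --trigger-topic YOUR_TOPIC` (double-check the flags for your runtime and region).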

Rate limited API requests in Cloud Composer

I'm planning a project whereby I'd be hitting the (rate-limited) Reddit API and storing data in GCS and BigQuery. Initially, Cloud Functions would be the choice, but I'd have to build a Datastore implementation to manage a pseudo-queue of requests and use GAE for cron jobs.
Doing everything in Dataflow wouldn't make sense because it's not advised to make external requests from it (i.e. hitting the Reddit API) or to run a single job perpetually.
Could I use Cloud Composer to read fields from a Google Sheet, then create a queue of requests based on the Google Sheet, then have a task queue execute those requests, store them in GCS and load into BigQuery?
Sounds like a legitimate use case for Composer. Additionally, you could leverage the pool concept in Airflow to manage concurrent calls to the same endpoint (e.g., the Reddit API).
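As a loose illustration of the pool idea (Airflow 2-style imports; the DAG, task, and pool names are made up, and reading the list from a Google Sheet is reduced to a hard-coded list), every task that hits the Reddit API is assigned to one shared pool, and the pool's slot count, configured in the Airflow UI under Admin > Pools or via the CLI, caps how many of those calls run concurrently:

```python
# Loose sketch: cap concurrent Reddit API calls by assigning every API task
# to a shared Airflow pool (e.g. "reddit_api" with a small number of slots).
# DAG/task/function names are illustrative, not from the original question.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def fetch_subreddit(subreddit: str, **_):
    # Placeholder for the rate-limited Reddit API call + write to GCS.
    print(f"Fetching r/{subreddit}")


with DAG(
    dag_id="reddit_ingest",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    for sub in ["dataengineering", "bigquery", "googlecloud"]:
        PythonOperator(
            task_id=f"fetch_{sub}",
            python_callable=fetch_subreddit,
            op_kwargs={"subreddit": sub},
            pool="reddit_api",   # pool created beforehand with, say, 2 slots
        )
```

The pool gives you the throttling that a Datastore-backed pseudo-queue would otherwise provide, while the GCS upload and BigQuery load steps can remain ordinary downstream tasks in the same DAG.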