Process data on Firebase on Google Cloud Platform - google-cloud-platform

I want to upload data from an Android app to Google Cloud Platform and do some basic machine learning/statistical operations. I have used Firebase and uploaded the data generated by the Android app to the Realtime Database. My next goal is to do some data processing, such as simple statistical and machine learning operations, but I do not know whether the Realtime Database can support these operations. If not, it seems Google Cloud Platform can do such operations in MySQL; how do I transfer the data from the Realtime Database on Firebase to MySQL? I am new to GCP and hope to get a clear direction. Thank you.

You can use the firebase_admin library to access the Realtime Database data. Then you can either store it using one of the many Google Cloud client libraries or use it directly in an ML job.
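For example, a minimal sketch of reading a node with firebase_admin; the service account key path, database URL, and node name are placeholders for your own project:

    import firebase_admin
    from firebase_admin import credentials, db

    # Initialize the Admin SDK with a service account key (placeholder path)
    cred = credentials.Certificate("serviceAccountKey.json")
    firebase_admin.initialize_app(cred, {
        "databaseURL": "https://<your-project-id>-default-rtdb.firebaseio.com"
    })

    # Read everything under a hypothetical node; returns nested dicts/lists
    readings = db.reference("sensor_readings").get()
    print(readings)

From there you can write the rows to Cloud SQL (MySQL) or BigQuery with the corresponding Google Cloud client library.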

Related

How can I integrate Delta tables in Databricks with SAP Analytics Cloud

I have to create reports in SAP Analytics Cloud (SAC) using data saved in Delta tables in Databricks on AWS. I have come across some ready-made connectors (such as this: https://www.cdata.com/kb/tech/databricks-connect-sac.rst), but as a proof of concept my team has decided to deploy a Docker container with the SAP data provider (https://docs.aws.amazon.com/sap/latest/general/data-provider-installallation.html) and to pull the data into SAC via a JDBC connection. This feels like reinventing the wheel, so I was wondering if there are ready-made tools for this purpose, or, if not, whether anyone has done this using a Docker container and can share some tips or code; that would be much appreciated.

Google Merchant Center - retrieve the BestSellers_TopProducts_ report without the BigQuery Data Transfer service?

I'm trying to find a way to retrieve a specific Google Merchant Center report (BestSellers_TopProducts_) and upload it to BigQuery as part of an ETL process we're developing for a customer at my workplace.
So far, I know you can set up the BigQuery Data Transfer service so it automates the process of downloading this report, but I was wondering if I could accomplish the same with Python and some of Google's API libraries (like python-google-shopping). Then again, I may be overdoing it, and setting up the service may be the way to go.
Is there a way to accomplish this rather than resorting to the aforementioned service?
On the other hand, assuming the BigQuery Data Transfer service is the way to go, I see (in the examples) that you need to create and provide the dataset you're going to extract the report data to, so I guess the extraction is limited to the GCP project you're working with.
I mean... you can't extract the report data for a third party even if you had the proper service account credentials, right?
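In case it helps, a rough sketch of what creating that transfer programmatically looks like with the google-cloud-bigquery-datatransfer client; the project, dataset, and merchant ID are placeholders, and the exact params keys for the merchant_center data source should be verified against the Data Transfer documentation:

    from google.cloud import bigquery_datatransfer

    client = bigquery_datatransfer.DataTransferServiceClient()
    parent = client.common_project_path("my-project")  # placeholder project

    transfer_config = bigquery_datatransfer.TransferConfig(
        destination_dataset_id="merchant_center",   # existing dataset in your project
        display_name="Merchant Center best sellers",
        data_source_id="merchant_center",
        params={"merchant_id": "1234567"},          # placeholder merchant ID
    )

    # The transfer writes the Merchant Center tables, including
    # BestSellers_TopProducts_, into the destination dataset
    config = client.create_transfer_config(
        parent=parent, transfer_config=transfer_config
    )
    print(config.name)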

Connecting on-prem MQ to Google Cloud Platform

This is more of a conceptual question, as there is no relevant documentation available. We have an on-prem IBM MQ from which we need to transfer data to our cloud storage bucket (GCP/AWS). What could be possible solutions in this case? Any help or direction would be appreciated. Thank you!
I'm assuming you can reach your goal once the MQ data has been changed/converted to a format supported by BigQuery.
You can refer to this Google documentation for a full guide on loading data from local files. You can upload files via the GCP Console or using whichever supported programming language matches your on-prem setup. There is also a variety of upload options to choose from depending on the data file. This also requires the right permissions to use BigQuery.
If you require authentication, check this BigQuery authentication guide.
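As a minimal sketch, assuming the MQ messages have already been exported to a local CSV file, loading it into BigQuery with the Python client library (file, project, dataset, and table names are placeholders):

    from google.cloud import bigquery

    client = bigquery.Client()  # uses GOOGLE_APPLICATION_CREDENTIALS for auth

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # skip the header row
        autodetect=True,      # infer the schema from the file
    )

    # Hypothetical local export of the MQ messages
    with open("mq_export.csv", "rb") as f:
        job = client.load_table_from_file(
            f, "my-project.mq_dataset.mq_events", job_config=job_config
        )
    job.result()  # wait for the load job to finish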

How to serve Google BigQuery output to a client web application

I have exported Firestore collections to Google BigQuery for data analysis and aggregation.
What is the best practice (using Google Cloud products) to serve BigQuery output to a client web application?
Google provides seven client libraries for BigQuery. You can take any of them and write a webserver that serves requests from the client web application. The webserver can use a GCP service account to access BigQuery on behalf of its clients.
One such sample is this project. It's written in TypeScript and uses the NodeJS library on the server and React for the client app. I'm the author.
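For the same pattern in Python, a minimal sketch using Flask and the BigQuery client library; the route, project, dataset, and table names are hypothetical:

    from flask import Flask, jsonify
    from google.cloud import bigquery

    app = Flask(__name__)
    bq = bigquery.Client()  # authenticates via the service account

    @app.route("/api/top-products")
    def top_products():
        # Placeholder table exported from Firestore
        query = """
            SELECT product_name, COUNT(*) AS views
            FROM `my-project.analytics.events`
            GROUP BY product_name
            ORDER BY views DESC
            LIMIT 10
        """
        rows = bq.query(query).result()
        return jsonify([dict(row) for row in rows])

    if __name__ == "__main__":
        app.run(port=8080)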
You may also take a quick tour of Google Data Studio to see the main features this Google analytics service offers. If your aim is to visualize data from BigQuery, Data Studio is a good option: it provides a variety of informative dashboards and reports, and lets users customize charts and graphs and share them publicly or via user collaboration groups.
Data Studio offers a lot of connectors to different data sources, including a dedicated BigQuery connector for integrating with data residing in the BigQuery warehouse.
You can track any future product enhancements here.

Planning an architecture in GCP

I want to plan an architecture based on the GCP cloud platform. Below are the subject areas I have to cover. Can someone please help me find the proper services to perform each operation?
Data ingestion (Batch, Real-time, Scheduler)
Data profiling
AI/ML based data processing
Analytical data processing
Elasticsearch
User interface
Batch and Real-time publish
Security
Logging/Audit
Monitoring
Code repository
If I am missing something I should take care of, please add that as well.
GCP offers many products whose functionality can partially overlap. Which product to use depends on the specific use case, and you can find an overview here.
That being said, an overall summary of the services you asked about would be:
1. Data ingestion (Batch, Real-time, Scheduler)
That will depend on where your data comes from, but the most common options are Dataflow (for both batch and streaming) and Pub/Sub for streaming messages.
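For instance, a minimal sketch of the streaming side, publishing a JSON event to a Pub/Sub topic (project and topic names are placeholders):

    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path("my-project", "ingest-events")

    # Pub/Sub payloads are bytes; here, a small JSON event
    future = publisher.publish(topic_path, b'{"sensor": "a1", "value": 42}')
    print("Published message:", future.result())  # blocks until the server acks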
2. Data profiling
Dataprep (which actually runs on top of Dataflow) can be used for data profiling; here is an overview of how you can do it.
3. AI/ML based data processing
For this, you have several options depending on your needs. For developers with limited machine learning expertise there is AutoML, which allows you to quickly train and deploy models. For more experienced data scientists there is ML Engine, which allows training and serving predictions from custom models built with frameworks like TensorFlow or scikit-learn.
Additionally, there are some pre-trained models for things like video analysis, computer vision, speech to text, speech synthesis, natural language processing or translation.
Plus, it's even possible to perform some ML tasks using SQL inside GCP's data warehouse, BigQuery.
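As a small illustration of that last point, a BigQuery ML training statement issued from the Python client; the dataset, table, and label column are hypothetical:

    from google.cloud import bigquery

    client = bigquery.Client()

    # Train a logistic regression model entirely inside BigQuery;
    # the dataset, table, and `churned` label column are placeholders
    client.query("""
        CREATE OR REPLACE MODEL `my_dataset.churn_model`
        OPTIONS (model_type = 'logistic_reg',
                 input_label_cols = ['churned']) AS
        SELECT * FROM `my_dataset.customer_features`
    """).result()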
4. Analytical data processing
Depending on your needs, you can use Dataproc, which is a managed Hadoop and Spark service, or Dataflow for stream and batch data processing.
BigQuery is also designed with analytical operations in mind.
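To give a flavor of the Dataflow programming model, here is a tiny Apache Beam pipeline in Python; as written it runs locally on the DirectRunner, and the same code would run on Dataflow given the appropriate pipeline options:

    import apache_beam as beam

    # Count occurrences of each element in a toy in-memory collection
    with beam.Pipeline() as pipeline:
        (pipeline
         | "Create" >> beam.Create(["clickA", "clickB", "clickA"])
         | "Count" >> beam.combiners.Count.PerElement()
         | "Print" >> beam.Map(print))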
5. Elasticsearch
There is no managed Elasticsearch service directly provided by GCP, but you can find several options on the Marketplace, like an API service or a Kubernetes app for Google Kubernetes Engine.
6. User interface
If you are referring to a user interface for your own use, GCP’s console is what you’d be using. If you are referring to a UI for end-users, I’d suggest using App Engine.
If you are referring to a UI for data exploration, there is Datalab, which is essentially a managed notebook service, and Data Studio, where you can build plots of your data in real time.
7. Batch and Real-time publish
The publishing service in GCP, for both synchronous and asynchronous messages, is Pub/Sub.
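A minimal sketch of the consuming side, pulling messages from a hypothetical subscription (project and subscription names are placeholders):

    from concurrent.futures import TimeoutError
    from google.cloud import pubsub_v1

    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path("my-project", "ingest-events-sub")

    def callback(message):
        print("Received:", message.data)
        message.ack()  # acknowledge so Pub/Sub stops redelivering

    # Listen in the background and block the main thread for up to 30 seconds
    streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
    with subscriber:
        try:
            streaming_pull_future.result(timeout=30)
        except TimeoutError:
            streaming_pull_future.cancel()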
8. Security
Most security concerns in GCP are addressed here; security is a wide topic by itself and would probably warrant a separate question.
9. Logging/Audit
GCP uses Stackdriver for logging across most of its products and provides many ways to process and analyze those logs.
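For application code, a small sketch of sending logs to Stackdriver with the google-cloud-logging Python library:

    import logging

    import google.cloud.logging

    # Attach a Stackdriver (Cloud Logging) handler to the standard root logger
    client = google.cloud.logging.Client()
    client.setup_logging()

    # This record now goes to Stackdriver as well as any local handlers
    logging.info("Pipeline step finished")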
10. Monitoring
Stackdriver also has monitoring features.
11. Code repository
For this there is Cloud Source Repositories, which integrates with GCP's automated build system and can also be easily synced with a GitHub repository.
12. Analytical data warehouse
You did not ask for this one, but I think it's an important part of a data analysis stack.
In the case of GCP, this would be BigQuery.