Google Merchant Center - retrieve the BestSellers_TopProducts_ report without the BigQuery Data Transfer service?

I'm trying to find a way to retrieve a specific Google Merchant Center report (BestSellers_TopProducts_) and upload it to BigQuery as part of a specific ETL process we're developing for a customer we have at my workplace.
So far, I know you can set up the BigQuery Data Transfer service to automate downloading this report, but I was wondering if I could accomplish the same with Python and some of Google's API libraries (like python-google-shopping). Then again, I may be overdoing it and setting up the service is the way to go.
Is there a way to accomplish this rather than resorting to the aforementioned service?
On the other hand, assuming the BigQuery Data Transfer service is the way to go, I see in the examples that you need to create and provide the dataset the report data will be extracted into, so I guess the extraction is limited to the GCP project you're working in.
I mean... you can't extract the report data for a third-party even if you had the proper service account credentials, right?

Best way to ingest data to bigquery

I have heterogeneous sources like flat files residing on-prem, JSON on SharePoint, APIs that serve data, and so on. Which is the best ETL tool to bring this data into the BigQuery environment?
I'm a kindergarten student in GCP :)
Thanks in advance
There are many solutions to achieve this. It depends on several factors, some of which are:
frequency of data ingestion
whether or not the data needs to be manipulated before being written into BigQuery (your files may not be formatted correctly)
is this going to be done manually or is this going to be automated
size of the data being written
If you are just looking for an ETL tool you can find many. If you plan to scale this to many pipelines you might want to look at a more advanced tool like Airflow, but if you just have a few one-off processes you could set up a Cloud Function within GCP to accomplish this. You can schedule it (via cron), invoke it through an HTTP endpoint, or trigger it via Pub/Sub. You can see an example of how this is done here
After several tries at datalake/datawarehouse design and architecture, I can recommend only one thing: ingest your data into BigQuery as soon as possible, no matter the format or transformation.
Then, in BigQuery, run queries to format, clean, aggregate, and add value to your data. It's not ETL, it's ELT: you start by loading your data and then you transform it.
It's quicker, cheaper, simpler, and based only on SQL.
It works only if BigQuery is your ONLY destination.
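To make the ELT pattern concrete, here is a self-contained sketch using stdlib sqlite3 as a local stand-in for BigQuery; the table names and messy input rows are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 1. Load: ingest the raw data as-is, with no cleanup before the write.
conn.execute("CREATE TABLE raw_sales (region TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_sales VALUES (?, ?)",
    [("eu", "10"), ("EU ", "5"), ("us", "7")],  # messy, unformatted input
)

# 2. Transform: clean and aggregate with SQL inside the warehouse
# (in BigQuery this would be e.g. CREATE OR REPLACE TABLE ... AS SELECT).
conn.execute("""
    CREATE TABLE sales AS
    SELECT UPPER(TRIM(region)) AS region,
           SUM(CAST(amount AS INTEGER)) AS total
    FROM raw_sales
    GROUP BY UPPER(TRIM(region))
""")

print(conn.execute("SELECT region, total FROM sales ORDER BY region").fetchall())
# → [('EU', 15), ('US', 7)]
```

The load step never touches the data; all formatting and aggregation happens in SQL after the fact, which is exactly the load-then-transform ordering the answer recommends.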
If you are starting from scratch and have no legacy tools to carry with you, the following GCP managed products target your use case:
Cloud Data Fusion, "a fully managed, code-free data integration service that helps users efficiently build and manage ETL/ELT data pipelines"
Cloud Composer, "a fully managed data workflow orchestration service that empowers you to author, schedule, and monitor pipelines"
Dataflow, "a fully managed streaming analytics service that minimizes latency, processing time, and cost through autoscaling and batch processing"
(Without considering a myriad of data integration tools and fully customized solutions using Cloud Run, Scheduler, Workflows, VMs, etc.)
Choosing one depends on your technical skills, real-time processing needs, and budget. As mentioned by Guillaume Blaquiere, if BigQuery is your only destination, you should try to leverage BigQuery's processing power on your data transformation.

Can SSO be used to create a dataflow that will reside in the PBI service?

Our client would like us to use dataflows for data reuse and other reasons. We will be connecting to a Snowflake database from PBI service. However, they also want to be able to use SSO (Single Sign-On). So, when a user creates a dataset referencing a dataflow, they want the credentials from the currently logged in user to be picked up via SSO and passed along to Snowflake when the dataflow retrieves data from Snowflake. I don't think this can be done but I wanted to verify.
BTW, I know that SSO can be used with PBI Desktop. Just curious if dataflows can use it.
Yes, it seems it is possible to use dataflows with SSO for Snowflake. I am coming to this conclusion from the following reference: https://learn.microsoft.com/en-us/power-query/connectors/snowflake
which includes DataFlows under Summary - Products.

Google Cloud Dataflow - is it possible to define a pipeline that reads data from BigQuery and writes to an on-premise database?

My organization plans to store a set of data in BigQuery and would like to periodically extract some of that data and bring it back to an on-premise database. In reviewing what I've found online about Dataflow, the most common examples involve moving data in the other direction - from an on-premise database into the cloud. Is it possible to use Dataflow to bring data back out of the cloud to our systems? If not, are there other tools that are better suited to this task?
Abstractly, yes. If you've got a set of sources and sinks and you want to move data between them with some set of transformations, then Beam/Dataflow should be perfectly suitable for the task. It sounds like you're describing a batch-based periodic workflow rather than a continuous streaming workflow.
In terms of implementation effort, there are more questions to consider. Does an appropriate Beam connector exist for your intended on-premise database? You can see the built-in connectors here: https://beam.apache.org/documentation/io/built-in/ (note the per-language SDK toggle at the top of the page)
Do you need custom transformations? Are you combining data from systems other than just BigQuery? Either implies to me that you're on the right track with Beam.
On the other hand, if your extract process is relatively straightforward (e.g. just run a query once a week and extract it), you may find there are simpler solutions, particularly if you're not moving much data and your database can ingest data in one of the BigQuery export formats.
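To sketch that simpler path, here is a stdlib-only Python outline of such a periodic batch job: a hard-coded row list stands in for the BigQuery query result, CSV is the intermediate export format, and sqlite3 stands in for the on-premise database (all three are placeholder choices for illustration):

```python
import csv
import io
import sqlite3

# Stand-in for: rows = bigquery.Client().query("SELECT id, name FROM ...").result()
rows = [(1, "alpha"), (2, "beta")]

# Export step: serialize to CSV, one of BigQuery's native export formats.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["id", "name"])
writer.writerows(rows)

# Ingest step on-premise: parse the CSV and load it into the target table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE extract (id INTEGER, name TEXT)")
reader = csv.reader(io.StringIO(buf.getvalue()))
next(reader)  # skip the header row
conn.executemany("INSERT INTO extract VALUES (?, ?)", list(reader))

print(conn.execute("SELECT COUNT(*) FROM extract").fetchone()[0])  # → 2
```

Run weekly from cron (or Cloud Scheduler), this covers the "just run a query and extract it" case without standing up a Beam pipeline.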

How to serve Google BigQuery output to a client web application

I have exported Firestore collections to Google Big Query to make data analysis and aggregation.
What is the best practice (using Google Cloud Products) to serve Big Query outputs to a client web application?
Google provides seven client libraries for BigQuery. You can take any library and write a webserver that will serve requests from client web application. The webserver can use a GCP service account to access BigQuery on behalf of its clients.
One such sample is this project. It's written in TypeScript. Uses NodeJS library on the server and React for the client app. I'm the author.
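As a minimal sketch of that server-side pattern, here is a tiny stdlib HTTP endpoint; run_query is a stub standing in for a real google-cloud-bigquery call, which on a GCP-hosted server would run under the service account's credentials:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def run_query():
    # Stand-in for: list(bigquery.Client().query("SELECT ...").result())
    return [{"metric": "daily_users", "value": 42}]

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Run the query server-side and return JSON to the web client,
        # so BigQuery credentials never reach the browser.
        body = json.dumps(run_query()).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8080), Handler).serve_forever()
```

In practice you would add authentication, caching, and query parameterization, but the shape is the same: the browser talks to your server, and only your server talks to BigQuery.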
You may want to take a quick tour of Google Data Studio to see the main features this Google analytics service can offer its customers. If your aim is to visualize data from BigQuery, Data Studio is a good option: it provides a variety of informative dashboards and reports, allowing the user to customize charts and graphs and share them publicly or via user collaboration groups.
Data Studio offers a lot of connectors to different data sources; in particular, you can find a separate BigQuery connector for integration with data residing in the BigQuery warehouse.
You can track any future product enhancements here.

SpagoBI SDK how to get the entire datasets & extract info from them through webservices?

Is it possible to get a dataset through web services against the SpagoBI SDK? All I managed to get were some documents, but I was wondering if I could do the same with datasets.
From the website I can see
"The services available through web service are:
Data Set: it allows to receive the data sets, defined in SpagoBI, and their respective metadata."
http://www.spagoworld.org/xwiki/bin/view/SpagoBI/SpagoBISDK
I'd really appreciate any help.
What I want to do is just to integrate some other external tools into my project. I thought SpagoBI would work nicely as a kind of back office to manage any dataset from different data sources.
Bye