Can we connect Facebook/Meta Ads with Apache Beam or any other real-time streaming service to get real-time data from Facebook Ads to BigQuery - facebook-graph-api

How can I bring real-time streaming data from Meta Ads and stream it into BigQuery (or another sink of my choice) using Apache Beam or any connector?

I don't know anything about Meta Ads subscriptions, but you could write your own streaming source, or run a process that subscribes to the ad data and publishes it to Pub/Sub (or another system like Kafka), and then process it in an Apache Beam pipeline.
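For illustration, a minimal sketch of that second option in Python could look like the following: a small process polls the Marketing API insights endpoint and republishes each row to Pub/Sub, where a Beam/Dataflow pipeline can pick it up and write it to BigQuery. This is not an official connector; the API version, field list, topic name, and polling interval are placeholders/assumptions.

    # Poll Meta Marketing API insights and forward rows to Pub/Sub (sketch, not production code).
    import json
    import time

    import requests
    from google.cloud import pubsub_v1

    ACCESS_TOKEN = "EAAB..."                                      # placeholder Marketing API token
    AD_ACCOUNT_ID = "act_1234567890"                              # placeholder ad account
    TOPIC_PATH = "projects/my-project/topics/meta-ads-insights"   # placeholder Pub/Sub topic

    publisher = pubsub_v1.PublisherClient()

    def poll_once():
        # Graph API insights endpoint; version and fields are assumptions, adjust as needed.
        response = requests.get(
            f"https://graph.facebook.com/v19.0/{AD_ACCOUNT_ID}/insights",
            params={
                "access_token": ACCESS_TOKEN,
                "fields": "campaign_id,impressions,clicks,spend",
                "date_preset": "today",
            },
            timeout=30,
        )
        response.raise_for_status()
        for row in response.json().get("data", []):
            publisher.publish(TOPIC_PATH, json.dumps(row).encode("utf-8"))

    while True:
        poll_once()
        time.sleep(60)  # near-real-time polling; true push would need a webhook-style subscription

Note that this is polling rather than true streaming; how close to real time you can get depends on what the Marketing API lets you subscribe to.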

Related

How to externalize API Manager Analytics

Is there a way to externalize API Manager Analytics without using Choreo Cloud?
In a situation where we don't have an ELK stack, can we use custom log files, CSV, or a database to store Analytics information so we can run custom reports against them?
Can we write a custom handler to write Analytics information into a different external source?
WSO2 API Manager, by default, uses log files to record API Analytics information. It logs all successful, faulty, and throttled events into the log file. With the on-premises Analytics option, we can use the ELK stack; in that approach, Filebeat and Logstash read those logs and filter out the Analytics-related information.
There is no out-of-the-box option to plug in a custom log file, CSV, or a database as the destination for Analytics information. However, we can write a custom handler/event publisher (as demonstrated in [1]) to read those log files, collect the Analytics-related information, and put it into a different format such as CSV or a database. Instead of publishing the already available analytics event data, it is also possible to publish custom analytics data with the existing event schema, as demonstrated in [2].
But this requires a lot of effort and amounts to implementing a new Analytics option from scratch, since the custom event publisher is only responsible for publishing the events and we would still need some way of visualizing them; a rough log-to-CSV sketch follows the references below.
As of now, there are only two options for API Manager Analytics: Choreo Cloud (cloud) or the ELK stack (on-premises).
[1] https://apim.docs.wso2.com/en/latest/api-analytics/samples/publishing-analytics-events-to-external-systems/
[2] https://apim.docs.wso2.com/en/latest/api-analytics/samples/publishing-custom-analytics-data/
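To make the log-scraping idea above concrete, here is a rough Python sketch that extracts the event payloads from an analytics log file and appends them to a CSV. The file name, marker string, and payload layout are assumptions and must be adjusted to the log format your API Manager version actually produces; visualization is still up to you.

    # Pull analytics event payloads out of an APIM log file and flatten them into a CSV (sketch).
    import csv
    import json

    MARKER = "apimMetrics:"  # assumed prefix that precedes each analytics event payload

    def export_events(log_path: str, csv_path: str) -> None:
        with open(log_path) as log_file, open(csv_path, "w", newline="") as csv_file:
            writer = None
            for line in log_file:
                if MARKER not in line:
                    continue
                start = line.find("{", line.index(MARKER))  # assume a JSON object follows the marker
                if start == -1:
                    continue
                try:
                    event = json.loads(line[start:].strip())
                except json.JSONDecodeError:
                    continue  # skip lines that are not well-formed event payloads
                if writer is None:
                    writer = csv.DictWriter(csv_file, fieldnames=sorted(event))
                    writer.writeheader()
                writer.writerow({key: event.get(key) for key in writer.fieldnames})

    export_events("apim_metrics.log", "analytics_events.csv")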

Stream from Realtime Firebase DB to BigQuery

Is there any extension or way to stream data from the Firebase Realtime Database to BigQuery? I have read about Stream Collections to BigQuery, but the description says it only works with Cloud Firestore.
firebaser here
There is currently no Firebase Extension for streaming data from the Realtime Database to BigQuery. It is being considered though, so I recommend filing a feature request with our support team to express your interest in it.
But for the moment you will have to implement the functionality yourself, likely with Cloud Functions listening to changes in the Realtime Database and then calling the BigQuery API to insert the data there. Given that Extensions are open source by definition, you could use the source of the Stream Collections to BigQuery extension as a template/for inspiration.
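As an illustration of that approach, here is a minimal sketch assuming the 2nd-gen Python Cloud Functions for Firebase SDK (firebase-functions); the database path, BigQuery table, and row shape are placeholders you would adapt to your data.

    # Mirror newly created Realtime Database nodes into BigQuery (sketch).
    import json
    from typing import Any

    from firebase_functions import db_fn
    from google.cloud import bigquery

    TABLE_ID = "my-project.realtime_db.readings"  # placeholder BigQuery table

    @db_fn.on_value_created(reference="/readings/{pushId}")
    def mirror_to_bigquery(event: db_fn.Event[Any]) -> None:
        # Flatten the new node into a row; adapt the columns to your own schema.
        row = {"push_id": event.params["pushId"], "payload": json.dumps(event.data)}
        errors = bigquery.Client().insert_rows_json(TABLE_ID, [row])
        if errors:
            raise RuntimeError(f"BigQuery insert failed: {errors}")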

Can Google Dataflow connect to an API data source and insert data into BigQuery

We are exploring a few use cases where we might have to ingest data generated by SCADA/PIMS devices.
For security reasons, we are not allowed to connect directly to the OT devices or data sources. Instead, the data is exposed through REST APIs which can be used to consume it.
Please suggest whether Dataflow or any other GCP service can be used to capture this data and put it into BigQuery or any other relevant target service.
If possible, please share any relevant documentation/links around such requirements.
Yes!
Here is what you need to know: when you write an Apache Beam pipeline, your processing logic lives in the DoFns that you create. These functions can call any logic you want. If your data source is unbounded or just big, then you will author a "splittable DoFn" that can be read by multiple worker machines in parallel and checkpointed. You will need to figure out how to provide exactly-once ingestion from your REST API and how not to overwhelm your service; that is usually the hardest part.
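As a rough illustration of that idea (a plain DoFn rather than a full splittable DoFn, and without the exactly-once and rate-limiting handling mentioned above), a Beam Python pipeline that calls a hypothetical REST endpoint and writes the results to BigQuery could look like this; the endpoint URL and table are placeholders:

    # Fetch records from a REST API inside a DoFn and write them to BigQuery (sketch).
    import apache_beam as beam
    import requests  # assumed to be installed on the Dataflow workers

    class FetchFromApi(beam.DoFn):
        def process(self, endpoint_url):
            # Each input element is an API URL (or a page/partition key) to fetch.
            response = requests.get(endpoint_url, timeout=30)
            response.raise_for_status()
            for record in response.json():  # assumes the API returns a JSON list of dicts
                yield record

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "Endpoints" >> beam.Create(["https://scada-gw.example.com/api/measurements"])  # hypothetical
            | "CallApi" >> beam.ParDo(FetchFromApi())
            | "ToBigQuery" >> beam.io.WriteToBigQuery(
                "my-project:scada.measurements",  # placeholder table; columns must match the records
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            )
        )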
That said, you may wish to use a different approach, such as pushing the data into Cloud Pub/Sub first. Then you would use Cloud Dataflow to read the data from Cloud Pub/Sub. This provides a natural, scalable queue between your devices and your data processing.
You can capture the data with Pub/Sub, have it processed in Dataflow, and then save it into BigQuery (or Cloud Storage) with the appropriate I/O connector.
Stream messages from Pub/Sub by using Dataflow:
https://cloud.google.com/pubsub/docs/stream-messages-dataflow
Google-provided streaming templates (for Dataflow), including Pub/Sub -> Dataflow -> BigQuery:
https://cloud.google.com/dataflow/docs/guides/templates/provided-streaming
Whole solution:
https://medium.com/codex/a-dataflow-journey-from-pubsub-to-bigquery-68eb3270c93
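For concreteness, a minimal custom streaming pipeline along those lines could look like this with the Beam Python SDK; the topic, table, and schema are placeholders:

    # Streaming Pub/Sub -> BigQuery pipeline (sketch); run on Dataflow with the appropriate options.
    import json

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadPubSub" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/device-telemetry")
            | "Parse" >> beam.Map(lambda message: json.loads(message.decode("utf-8")))
            | "WriteBigQuery" >> beam.io.WriteToBigQuery(
                "my-project:iot.telemetry",                                # placeholder table
                schema="device_id:STRING,temperature:FLOAT,ts:TIMESTAMP",  # placeholder schema
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )

The Google-provided template linked above does essentially the same thing, so writing your own pipeline is mainly worthwhile when you need custom parsing or enrichment in the middle.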

Creating serverless audio processing with Google Cloud Storage and Google Speech to Text

I'm trying to create a small app that will allow me to transcribe audio to text via the Google Speech-to-Text service. I'd like to bypass the need for heavy processing on my side and leverage as many cloud tools as possible to have audio streamed to the Speech-to-Text service. I've been able to get the streaming process to work; however, I have to relay the data through my server first, and this creates an expense I'd like to cut out. There are a few questions whose answers would help me solve this problem in a cost-effective way:
Can I create a signed URL for a Google Speech-to-Text streaming session?
Can I leverage the cloud and Cloud Functions to trigger processing by the Speech-to-Text service and then retrieve real-time updates?
Can I get a signed URL that links to a copy of the audio streamed to the Google Speech-to-Text service?

How to send sensor data (like temperature data from DHT11 sensor) to Google Cloud IoT Core and store it

I am working on connecting a Raspberry Pi (3B+) to Google Cloud and sending sensor data to Google Cloud IoT Core, but I couldn't find much content on this. I would be very thankful if anyone could help me with it.
PS: I have already followed the interactive tutorial from Google Cloud itself and connected a simulated virtual device to the cloud and sent data. I am really looking for a tutorial that helps me connect a physical Raspberry Pi.
Thank you
You may want to try following along with this community article covering pretty much exactly what you're asking.
The article covers the following steps:
Creating a registry for your gateway device (the Raspberry Pi)
Adding a temperature / humidity sensor
Adding a light
Connecting the devices to Cloud IoT Core
Sending the data from the sensors to Google Cloud (a device-side sketch follows this list)
Pulling the data back using Pub/Sub
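As referenced in the list above, the device-side piece is connecting the Pi to Cloud IoT Core and publishing sensor readings. A rough Python sketch of that part, using the MQTT bridge, a JWT-signed device credential, and the legacy Adafruit_DHT library, is below; the project, region, registry, device, and key names are placeholders that must match what you create in IoT Core.

    # Publish DHT11 readings from a Raspberry Pi to Cloud IoT Core over MQTT (sketch).
    import datetime
    import json
    import ssl
    import time

    import Adafruit_DHT            # legacy DHT library; newer setups use adafruit-circuitpython-dht
    import jwt                     # PyJWT, used to sign the device credential
    import paho.mqtt.client as mqtt

    PROJECT_ID = "my-project"
    REGION = "us-central1"
    REGISTRY_ID = "pi-registry"
    DEVICE_ID = "pi-dht11"
    PRIVATE_KEY_FILE = "rsa_private.pem"  # private key whose public half is registered in IoT Core

    def make_jwt():
        # Cloud IoT Core uses a short-lived JWT (audience = project ID) as the MQTT password.
        now = datetime.datetime.utcnow()
        claims = {"iat": now, "exp": now + datetime.timedelta(minutes=60), "aud": PROJECT_ID}
        with open(PRIVATE_KEY_FILE) as key_file:
            return jwt.encode(claims, key_file.read(), algorithm="RS256")

    client_id = f"projects/{PROJECT_ID}/locations/{REGION}/registries/{REGISTRY_ID}/devices/{DEVICE_ID}"
    client = mqtt.Client(client_id=client_id)  # paho-mqtt 1.x constructor; 2.x also needs a CallbackAPIVersion
    client.username_pw_set(username="unused", password=make_jwt())
    client.tls_set(tls_version=ssl.PROTOCOL_TLSv1_2)
    client.connect("mqtt.googleapis.com", 8883)
    client.loop_start()

    while True:
        humidity, temperature = Adafruit_DHT.read_retry(Adafruit_DHT.DHT11, 4)  # sensor on GPIO4
        if humidity is not None and temperature is not None:
            payload = json.dumps({"device": DEVICE_ID, "temperature": temperature, "humidity": humidity})
            client.publish(f"/devices/{DEVICE_ID}/events", payload, qos=1)  # default telemetry topic
        time.sleep(60)

Note that the JWT expires (here after an hour), so a long-running device needs to reconnect periodically with a fresh token.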
Create a registry in Google Cloud IoT Core and set up your devices with their public/private key pairs.
You will also have to set up Pub/Sub topics for publishing device telemetry and state events while creating the IoT Core registry.
Once that is done, you can create a streaming pipeline in Cloud Dataflow that reads data from the Pub/Sub subscription and sinks it into BigQuery (a relational data warehouse) or Bigtable (a NoSQL wide-column store).
Dataflow is a managed service for running Apache Beam pipelines, which you can write in Java or Python.
If you are not comfortable with coding, you can use Data Fusion, which lets you build your ETL pipelines with drag-and-drop functionality similar to Talend.
You can create a Data Fusion instance to build a streaming ETL pipeline. The source will be Pub/Sub and the sink will be BigQuery or Bigtable, depending on your use case.
For reference:
https://cloud.google.com/dataflow/docs/guides/templates/provided-streaming
This link will guide you through deploying the Google-provided Dataflow template that streams from Pub/Sub to BigQuery.
For your own custom pipeline, you can take help from the GitHub repository that hosts the template pipeline code.
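As a sketch of launching that provided Pub/Sub-to-BigQuery template programmatically via the Dataflow templates API (using the google-api-python-client library; project, topic, and table names are placeholders):

    # Launch the Google-provided PubSub_to_BigQuery template via the Dataflow API (sketch).
    from googleapiclient.discovery import build  # uses Application Default Credentials

    dataflow = build("dataflow", "v1b3")
    request = dataflow.projects().locations().templates().launch(
        projectId="my-project",
        location="us-central1",
        gcsPath="gs://dataflow-templates/latest/PubSub_to_BigQuery",  # check the docs for your region/version
        body={
            "jobName": "pubsub-to-bigquery-telemetry",
            "parameters": {
                "inputTopic": "projects/my-project/topics/device-telemetry",
                "outputTableSpec": "my-project:iot.telemetry",
            },
        },
    )
    print(request.execute())

The same launch can also be done from the Cloud Console or with the gcloud CLI, as described in the guide linked above.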