Specifics of using a push subscription as a load balancer - google-cloud-platform

I am trying to send IoT commands using a push subscription. I have 2 reasons for this. Firstly, my devices are often on unstable connections so going through the pubsub let me have retries and I don't have to wait the QoS 1 timeout (I still need it because I log it for later use) at the time I send the message. The second reason is the push subscription can act as a load balancer. To my understanding, if multiple consumers listen to the same push subscription, each will receive a subset of the messages, effectively balancing my workload. Now my question is, this balancing is a behavior I observed on pull subscriptions, I want to know if:
Do push subscription act the same ?
Is it a reliable way to balance a workload ?
Am I garanteed that these commands will be executed at most once if there is, lets say, 15 instances listening to that subscription ?
Here's a diagram of what I'm trying to acheive:
Idea here is that I only interact with IoT Core when instances receive a subset of the devices to handle (when the push subscription triggers). Also to note that I don't need this perfect 1 instance for 1 device balancing. I just need the workload to be splitted in a semi equal manner.
EDIT: The question wasn't clear so I rewrote it.

I think you are a bit confused about the concepts behind Pub/Sub. In general, you publish messages to a topic for one or multiple subscribers. I prefer to compare Pub/Sub with a magazine that is being published by a big publishing company. People who like the magazine can get a copy of that magazine by means of a subscription. Then when a new edition of that magazine arrives, a copy is being sent to the magazine subscribers, having exactly the same content among all subscribers.
For Pub/Sub you can create multiple push subscriptions for a topic, up to the maximum of 10,000 subscriptions per topic (also per project). You can read more about those quotas in the documentation. Those push subscriptions can contain different endpoints, in your case, representing your IoT devices. Referring back to the publishing company example, those push endpoints can be seen as the addresses of the subscribers.
Here is an example IoT Core architecture, which focuses on the processing of data from your devices to a store. The other way around could also work. Sending a message (including device/registry ID) from your front-end to a Cloud Function wrapped in API gateway. This Cloud Function then publishes the message to a topic, which sends the message to a cloud Function that posts the message using the MQTT protocol. I worked out both flows for you that are loosely coupled so that if anything goes wrong with your device or processing, the data is not lost.
Device to storage:
Device
IoT Core
Pub/Sub
Cloud Function / Dataflow
Storage (BigQuery etc.)
Front-end to device:
Front-end (click a button)
API Gateway / Cloud Endpoints
Cloud Function (send command to pub/sub)
Pub/Sub
Cloud Function (send command to device with MQTT)
Device (execute the command)

Related

google cloud pubsublite client on a serverless service

First of all, I wanted to tag this post to google-cloud-pubsub-lite, but it's not created yet, my apologizes
I'm trying to get introduced with pubsub lite. I think it can be used as a "cheap" way to get an event store in a GCP project.
We usually create GAE standard services so we pay for what we used and at the same time it offers a great scalability.
Reading samples about how to currently subscribe to pubsub lite I observe that there's no option to supply an endpoint to receive new messages. The client connects to a subscription and stays awaiting for new messages to be streamed throw the connection.
I'm wondering a few qustions:
Can we receive messages from a pubsub lite topic in a Cloud Function or in an endpoint of a GAE standard service?
How can we scale to several clients for a topic subscription
Thanks
PubSub lite subscription supports only the Pull mode. So, you need to create one or several clients, to plug them to the subscription and to get the messages.
In serverless mode, you should use the Push subscription more suitable for scalability and integration. In the pull subscription mode, you need to perform microbatches
Create a Cloud Scheduler
* * * * * as frequency
Call the serverless tool that you want (Cloud Run, Cloud Function, App Engine)
On the serverless product, when you receive a request, create a connection to the PubSub lite subscription and start to pull the messages.
If the pulling takes more than 1 minutes a new request will be received from Cloud Scheduler
Cloud Function will create a new instance automatically and start the pulling
Cloud Run can handle up to 80 requests concurrently. I recommend you to set the Concurrency paramater to 1 to have the exact same behavior as Cloud Function
You can't play with the concurrency on App Engine
Set the timeout to the max
If there is no new message (for example during 500ms) exit gracefully.
If the service timeout is close (15s before for example), stop the pulling and exit gracefully.
Like this, you could have several client to the same subscription (scale + 1 per minutes
and per scheduler, if the previous run is still active)
This workaround keep the serverless mode. If there is no messages, the pulling stopped after 500ms, or when there is no new messages. You scale up with your traffic.
However, I don't understand your concept of cheap event store.
PubSub lite is not a pay as you go model, but a flat model. You reserve capacity and you pay for it 24/7 even if it is not used
PubSub lite is zonal, and dangerous for HA
You can keep the event up to the partition is full. But will not be cheaper to store the event elsewhere? BigQuery? Firestore? Cloud SQL?

Push vs Pull for GCP Dataflow

I want to know what type of subscription one should create in GCP pubsub in order to handle high-frequency data from pubsub topic.
I will be ingesting data in dataflow with 100 plus messages per second.
Will pull or push subscription really matters and how it will gonna affect the speed and all.
If you consume the PubSub subscription with Dataflow, only Pull subscription is available
either you create one and you give it in the parameter of your dataflow pipeline
or you specify only the topic in your dataflow pipeline and Dataflow will create by itself the pull subscription.
If both case, Dataflow will process the messages in streaming mode
The difference
If you create the subscription by yourselves, all the messages will be stored and kept (up to 7 days by default) and will be consumed when the dataflow pipeline will be started.
If you let Dataflow to create the subscription, only the message that arrives AFTER the subscription creation will be consumed by the dataflow pipeline. If you want to not loose a message, it's not the recommended solution. If you don't care about the old message, it's a good choice.
High frequency
Then, 100 messages per second is absolutely not high frequency. 1 pubsub topic can ingest up to 1 000 000 of messages per second. Don't worry about that!
Push VS Pull
The model is different.
With the push subscription, you have to specify an HTTP endpoint (on GCP or elsewhere) that consumes the message. It's a webhook pattern. If the platform endpoint scale automatically with the traffic (Cloud Run, Cloud Functions for example), the message rate can go very high!! And the HTTP return code stands for message acknowledgment.
With Pull subscription, the client needs to open a connection to the subscription and then pull the message. The client needs to explicitly acknowledge the messages. Several clients can be connected at the same time. With the client library, the message is consumed with gRPC protocol and it's more efficient (in terms of network bandwidth) to receive and consume the message
Security point of view
With push, it's the PubSub to be authenticated on the HTTP endpoint, if the endpoint required authentication
With pull, it's the client that needs to be authenticated on the PubSub subscription.

Google Cloud IoT Core and Pubsub Pricing?

I am using google IoT core and pubsub services for my IoT devices. I am publishing data using pubsub to the database. but I think its quite expensive to store every data into the database. I have some data like if the device is on or off and a configuration file which has some parameter which I need to process my IoT payload. Now I am not able to understand if configuration and state topic in IoT is expensive or not? and how long the data is stored in the config topic and is it feasible that whenever the parameter is changed in the config file it publish that data into config topic? and what if I publish my state of a device that if it is online or not every 3 seconds or more into the state topic?
You are mixing different things. There is Cloud IoT, where you have a device registry, with metadata, configuration and states. You also have PubSub topic in which you can publish message about IoT payload that can contain configuration data (I assume that is that you means in this sentence: "it publish that data into config topic").
In definitive it's simple.
All the management operations on Cloud IoT are free (device registration, configuration, metadata,...). There is no limitation and no duration limit. The only one which exists in the quotas for rate limit and configuration size.
The inbound and outbound traffic from and to the IoT devices is billed as described here
If you use PubSub for pushing your messages, Cloud Functions (or Cloud Run, or other compute option), a database (Cloud SQL or Datastore/Firestore), all these services are billed as usual, there is no relation with Cloud IoT service & billing. The constraints of each services are applied as a regular usage. For example, a PubSub message live up to 7 days (by default) in a subscription and until it hasn't acknowledged.
EDIT
Ok, got it, I took time to understood what you wanted to achieve.
The state is designed for getting the internal representation of the devices, but the current limitation doesn't allow you to update it automatically when you received message.
You have 2 solutions:
Either you can update your devices and send an update message only when its state changes (it's for this kind of use case that the feature is designed!)
Or, let the device published the messages every 3 seconds, but in the event PubSub topic. Get the events in a function which get the state list, get the first one (the most recent) and compare the value with the PubSub message. If different, update the state. This workflow also work with external database like Datastore or Firestore.

Google Cloud - Detecting Offline Devices

I am rather new to Google Cloud IoT Core and the associated services, and have come across a problem for which I can find no "best practice" solution.
Using Google Cloud IoT Core to receive telemetry data from IoT Devices, what is the best way to detect when an IoT Sensor Device goes offline or becomes silent? Other Cloud based IoT Service implementations have built-in notification timeouts for generating alerts, but I can find no similar for Google IoT
Example: A number of IoT Edge devices monitors the temperature of cold storage rooms, and pushes a measurement every minute to a Google Cloud IoT Core, via MQTT or HTTP through WiFi or mobile data connections. If the measured temperature exceeds acceptable limits, an alert message is triggered, and routed to operational service personnel.
However, if one of the IoT Edge sensors suddenly stops operating, for whatever reason, how can this be detected by Google Cloud IoT services? Obviously, the only sign of something being wrong, is that no messages have been received from a certain DeviceID for a period substantially longer than the configured messaging-interval, e.g. 2 x interval + grace_period, so that an alert can be generated to warn of a lack of telemetry data, possibly caused by a power failure, which needs to be addressed?
Is there any standard-means by which an "IoT Device Presence" status can be automatically maintained for each device, based on the (lack of) received telemetry data from the device, in such a way, that the state change (online/offline transitions) can cause alert messages to be generated?
Or will it require a separate scheduled service to iterate all (supposedly active) devices, measuring the duration since the last received telemetry (temperature) update, and updating the device presence status directly?
Assuming you just want disconnect events, there was a solution posted earlier that involves setting up StackDriver logs that exports messages to Pub/Sub. From there, you can handle the event in a Cloud Function to send an email in a similar way to what is available in your listed implementation. It takes more time to set up, but is more flexible in terms of what you can do with connect/disconnect events.
Google Core IoT Device Offline Event or Connection Status

Publish message to specific subfolder

I'm trying to use Google Cloud Platform to make a little IoT project.
I've created a registry and a device in the "IoT Core" section and connected the registry to a default topic.
I've also specified three subfolders for that topic: "events", "config" and "status".
Now, I would like to connect a "Cloud Function" for the incoming messages but I can't find how to configure a single subfolder to monitor neither how to publish messages on them in the "Cloud Pub/Sub" section.
All the documentation talks about single topics so.... Am I missing some base concept on how it works?
Need to back up a step. What are you trying to do with the various subfolders? This may be a misunderstanding with how communication works with the device to and from the Cloud.
So, there's 3 MQTT topics, those are the events/config/state (not status). Those DON'T map to Pub/Sub topics at all. They're each handled separately in IoT Core.
Events is device->Cloud, and gets put into the Pub/Sub topic you specify when creating the registry. Setting subfolders is all about splitting telemetry from the same device to multiple places for handling. So, e.g. you have temp data you want to go in one pub/sub topic, and pressure data you want into another. The other way to handle this would be to attach the function to the main pub/sub topic, parse the telemetry, and then re-issue a Pub/Sub message to different places depending on the payload itself.
Config is Cloud->device, and is initiated by calls to the IoT Core Admin SDK. If your device subscribes to the /config/ MQTT topic, then they'll get a callback on the MQTT client's on_message handler (exact code depends on library being used of course) when some external entity sends a config message to IoT Core for your device.
State is device->Cloud but is specially handled, and doesn't go to any Pub/Sub topic. The results can be retrieved by the IoT Core Admin SDK. It's a way for the device to report its state and then external processes/applications to get that status without having to call back down to the device itself. This is particularly useful when you have devices which don't remain connected, for example, but you still want to be able to do things based on the last known state.
So the only one that you can use Cloud Functions with is the /events/ topic. That's done by deploying a Cloud Function and choosing the Pub/Sub publish event hook as the firing mechanism for the function, and specifying the IoT Core's registry Pub/Sub topic as the source of the events. Then anytime your device publishes telemetry to the /events/ MQTT topic, it'll get published to that Pub/Sub topic (confusing I know because we call them both topics) and the Cloud Function will fire.
Hopefully this clarifies what's going on? Or did I totally miss the question? :)
I believe the correct way to set this up would be to set up one pubsub topic for each subfolder. From the cloud iot core documentation:
Devices can publish data to separate Pub/Sub topics by specifying a subfolder in the MQTT topic. The subfolder is the subtopic after {device-id}/events. For example, if the device publishes to the MQTT topic /devices/{device-id}/events/alerts, the subfolder is the string alerts. This subfolder must be configured in the device registry resource's eventNotificationConfigs.subfolderMatches field with a matching Pub/Sub topic in the eventNotificationConfigs.pubsubTopicName field. When data is sent to a subfolder, it is published to the subfolder's matching Pub/Sub topic.
https://cloud.google.com/iot/docs/how-tos/mqtt-bridge#publishing_telemetry_events_to_separate_pubsub_topics