Need a high-latency, ultra-low-bandwidth data transfer technique, something like Google Pub/Sub but inverted - google-cloud-platform

I have an interesting IoT use case. For this example, let's say I need to deploy a thousand IoT cellular displays. Each has a single LED that displays information useful to someone out in the world, for example a sign at the start of a hiking trail that indicates whether conditions are favorable. Each display needs to receive 3 bytes of data every 5-10 minutes.
I have successfully created a computer-based demo of this system using a basic HTTP GET request and Cloud Functions on GCP. The "device" asks for its 3 bytes every 10 minutes and receives the data back. The issue is that the HTTP overhead adds 200+ bytes per request, so bandwidth usage over cellular will be high.
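For context, the device side of that demo is essentially a polling loop like this (a minimal sketch; the endpoint URL and device ID are hypothetical):

```python
import time
import requests

DEVICE_ID = "trail-sign-0042"  # hypothetical device identifier
# Hypothetical Cloud Function URL serving the 3-byte display state:
ENDPOINT = "https://REGION-PROJECT.cloudfunctions.net/get-display-state"

while True:
    # The useful payload is 3 bytes, but HTTP request/response headers
    # add 200+ bytes of overhead on every poll.
    resp = requests.get(ENDPOINT, params={"id": DEVICE_ID}, timeout=30)
    payload = resp.content  # the 3 bytes of display state
    # drive_led(payload)  # apply the state to the LED (device-specific)
    time.sleep(600)  # poll every 10 minutes
```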
I then tried Google Cloud Pub/Sub, but quickly realized that it is designed for devices publishing to the cloud rather than receiving from it. My best guess is that each device would need its own topic, which would scale horribly?
Does anyone have advice on a protocol that works with the cloud (hopefully GCP) and can serve low-bandwidth, receive-only devices? Does the pub/sub structure really not work for this case?

Related

Are websockets a suitable lowest-latency and robust real-time communication protocol between two nearby servers in the same AWS Availability Zone?

Suitable technologies I am aware of:
Websockets
ZeroMQ
Please suggest others if they are a better fit for my problem.
For this use case I have just two machines, the sender and the receiver, and it's important to note they are fixed "nearby" each other, as they will be in the same Availability Zone on AWS. Answers that relate to message passing over large spans of the internet aren't necessarily applicable. Note also that the receiving server isn't queuing these up as tasks; it will just be forwarding select message feeds to website visitors over a websocket. The sending server does a lot of pre-processing and collating of the messages.
The solution needs to:
1. Be very high throughput. At present the sending server processes about 10,000 messages per second (written in Rust) without breaking a sweat; bursty traffic may push this to 20,000 or a bit more. I know ZeroMQ can handle this.
2. Be robust. The communication pipe will be open 24/7, 365 days a year. My budget is extremely limited in terms of setting up clusters of machines as failovers, so I have to do the best I can with two machines.
3. Message durability isn't required or a concern; the receiving server isn't required to store anything, it just needs all the data. The sending server asynchronously writes a durable 5-second summary of the data to a database and to a cache.
4. Messages must retain the order in which they are sent.
5. Have low latency. This is very important, as the data needs to be as real-time as possible.
A websocket seems to get this job done for 1 to 4. What I don't know is how robust a websocket is for communication that runs 24 hours a day, 7 days a week. I've seen websocket connections get dropped in the wild (of course I will write reconnect code and heartbeat monitoring if required, but this still concerns me). I also wonder if the high throughput is too much for a websocket.
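For what it's worth, the reconnect-plus-heartbeat pattern is short with the Python `websockets` library (a minimal sketch; the address is a placeholder and print() stands in for real handling):

```python
import asyncio
import websockets

async def consume(url: str = "ws://10.0.0.2:8765"):  # placeholder address
    while True:  # reconnect loop: the pipe must survive 24/7 operation
        try:
            # ping_interval/ping_timeout give heartbeat monitoring for
            # free: the library drops a stalled connection, and the
            # outer loop re-establishes it.
            async with websockets.connect(url, ping_interval=20,
                                          ping_timeout=10) as ws:
                async for message in ws:
                    print(len(message))  # stand-in for real handling
        except (websockets.ConnectionClosed, OSError):
            await asyncio.sleep(1)  # brief backoff, then reconnect

asyncio.run(consume())
```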
I have zero experience with this kind of problem, but I have a very good websocket library that I'm comfortable using. I ruled out Apache Kafka: it seems expensive to run at high throughput, tricky to manage operationally (ZooKeeper), and overkill given that I don't need durability and it's only communication between 2 machines. So I'm hoping for a simple solution.
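Given requirements 1, 3, and 4, ZeroMQ's PUSH/PULL pattern over a single TCP connection looks like the simplest fit. Here is a minimal pyzmq sketch of the two processes (the port and private IP are placeholders; the real sender would use Rust's zmq bindings):

```python
import zmq

# sender.py, on the processing server (sketched in Python here)
ctx = zmq.Context()
push = ctx.socket(zmq.PUSH)
push.bind("tcp://*:5555")  # port is a placeholder
while True:
    # A single PUSH->PULL TCP connection delivers messages in send
    # order (requirement 4) and stores nothing durably (requirement 3).
    push.send(b"processed message bytes")

# receiver.py, on the websocket-forwarding server
ctx = zmq.Context()
pull = ctx.socket(zmq.PULL)
pull.connect("tcp://10.0.0.2:5555")  # placeholder private IP in the AZ
while True:
    msg = pull.recv()
    # forward msg to subscribed website visitors over the websocket
```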
It sounds like you are describing precisely what EC2 cluster placement groups provide: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-ec2-placementgroup.html
Edit: You should be able to create the placement group with 2 machines to fit your limited budget. Using larger instances, according to your budget, will also support higher network throughput.
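As a sketch, both steps can be done with boto3 (region, AMI, and instance type are placeholders; cluster placement requires an instance type that supports it):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

# A "cluster" placement group packs instances close together in one AZ
# for low latency and high throughput between them.
ec2.create_placement_group(GroupName="msg-pipe", Strategy="cluster")

# Launch both machines into the group:
ec2.run_instances(
    ImageId="ami-12345678",   # placeholder AMI
    InstanceType="c5.large",  # placeholder type
    MinCount=2,
    MaxCount=2,
    Placement={"GroupName": "msg-pipe"},
)
```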
Point 4 (ordering) looks like it would be supported by SQS FIFO, though note that SQS FIFO queues only support up to 3,000 messages per second with batching, which falls well short of your throughput requirement.
A managed streaming solution like Kinesis Data Streams would cover your use case, at scale, much better than a raw websocket. Using the Kinesis Client Library, you can write your consumer to read from the stream.
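A producer-side sketch with boto3 (stream name and partition key are placeholders):

```python
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")  # placeholder

# Records with the same partition key land on the same shard, and
# Kinesis preserves order within a shard (requirement 4).
kinesis.put_record(
    StreamName="message-feed",    # placeholder stream name
    Data=b"processed message bytes",
    PartitionKey="single-feed",   # one key -> one shard -> strict order
)
```

One caveat: a single shard accepts at most 1,000 records per second for writes, so sustaining 10,000 messages per second behind one partition key would require aggregating several messages per record (as the Kinesis Producer Library does) or spreading across shards and giving up global ordering.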
AWS also has a managed Kafka service to remove the overhead of managing components like Apache ZooKeeper: https://aws.amazon.com/msk/

I want to know the procedure someone should follow to perform an actuation task from sensor data stored in Google Cloud

I am following this tutorial
https://codelabs.developers.google.com/codelabs/iotcore-heartrate/index.html?index=..%2F..index#0
Now I am able to send heart rate sensor data to Google Cloud BigQuery, Cloud Storage, etc., as described in the tutorial, and I am able to visualise it as well.
But my next question is: how do we get access to the data in real time? For example, if the heart rate data from the Raspberry Pi (3B+) goes above 75, I want to trigger the ESP32 connected at the receiving end and turn on its LED.
In a nutshell, I want to do some actuation (like blinking an LED, as above) on the ESP32, based on the sensor data from the Raspberry Pi that goes to Google Cloud. I have only succeeded in sending, storing, and visualising sensor data in Google Cloud. Your help in completing the actuation step would be valuable, as I am pretty much clueless about how it can be done.
Thank you.
There are a couple of options here. The easiest to stand up is Cloud Functions. The function can be triggered by Pub/Sub messages, and it can also authenticate with the IoT Core Admin SDK (via a service account) to send a configuration/command back down to the device you want to light up with the LED.
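As a sketch of that cloud-to-device piece, using the google-api-python-client bindings for the IoT Core Admin API (project, region, registry, and device IDs are placeholders):

```python
import base64
from googleapiclient import discovery

# Placeholder identifiers for the target ESP32 in IoT Core:
PROJECT, REGION = "my-project", "us-central1"
REGISTRY, DEVICE = "my-registry", "esp32-led"

def send_led_command(turn_on: bool):
    # Inside a Cloud Function this authenticates via the function's
    # service account (application default credentials).
    client = discovery.build("cloudiot", "v1")
    name = (f"projects/{PROJECT}/locations/{REGION}"
            f"/registries/{REGISTRY}/devices/{DEVICE}")
    payload = b"LED_ON" if turn_on else b"LED_OFF"
    client.projects().locations().registries().devices().sendCommandToDevice(
        name=name,
        body={"binaryData": base64.b64encode(payload).decode()},
    ).execute()
```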
I wrote a blog post about setting up the Cloud to device communication piece:
https://medium.com/google-cloud/cloud-iot-step-by-step-cloud-to-device-communication-655a92d548ca
It covers how to set up the function to do this, although the function in the example is an HTTP function (triggered by hitting a URL endpoint rather than by a Pub/Sub message), but that part is easy enough to adapt.
The big piece you'll need to investigate is reading the Pub/Sub message inside the function it triggers. There are good docs on that here:
https://cloud.google.com/functions/docs/calling/pubsub
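For a Pub/Sub-triggered Python function, the skeleton from those docs looks roughly like this (the JSON field name `heartRate` is an assumption about the tutorial's payload):

```python
import base64
import json

def on_telemetry(event, context):
    """Background Cloud Function entry point for a Pub/Sub trigger.

    event["data"] carries the base64-encoded message body that IoT
    Core published as telemetry.
    """
    payload = base64.b64decode(event["data"]).decode("utf-8")
    reading = json.loads(payload)  # assumes the tutorial's JSON payload
    if reading.get("heartRate", 0) > 75:
        # React here: e.g. call the sendCommandToDevice sketch above
        # to light the ESP32's LED.
        print("threshold crossed:", reading["heartRate"])
```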
If you have very high throughput, Cloud Functions can get expensive, and at that point you'd want to switch to something like Dataflow (https://cloud.google.com/dataflow/docs/). The job could then either react to telemetry and hit an HTTP-triggered Function when the target condition is met, or authenticate itself with the IoT Admin SDK. I haven't done the latter, so I don't know how easy or hard it might be.

How to handle high-rate incoming data and send throttled data to websockets in django-channels

I am facing a problem in my Django web server.
We are using Python 3, Django 2, django-rest-framework 3.8, and Channels 2.x.
The scenario: we are receiving data over a UDP connection at a very fast rate (~100 messages per second). The data received is in proto format (byte data). Some data gets starved in this process, as the rate of production >>> the rate of consumption. We are implementing throttling, but at 100 concurrent users data still starves. Can anyone help us with this scenario?
If anyone has any new architecture idea please share.
This is surely an interesting problem. It is about a stock market feed.
PS: I cannot post any code as it is my company's, but I can help any time you need clarification on any point.
In many stock market data applications your exact problem is solved by having Lightstreamer Server take care of throttling on the websocket (full disclosure: I am the CEO at Lightstreamer).
You would develop a Data Adapter using the Lightstreamer API to consume data from your UDP connection and inject it into the Lightstreamer Server. Then you can specify a maximum update rate for each client and each subscription, as well as a max bandwidth. Lightstreamer throttles the data on the fly, taking into consideration not only the client's capacity but also the network status.
When throttling, you can choose between conflating updates (typical for stock market data) and queuing them.
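To make the conflation idea concrete (this is just the concept in plain asyncio, not Lightstreamer's actual API): keep only the newest update per key, and flush on a fixed cadence, so each websocket client receives the latest state rather than an ever-growing backlog. `send` is assumed to be whatever coroutine pushes to a client.

```python
import asyncio

class Conflater:
    """Keeps only the most recent update per instrument between flushes.

    When production outpaces consumption, intermediate updates for the
    same key are overwritten rather than queued (conflation).
    """
    def __init__(self):
        self.latest = {}

    def update(self, key, message):
        self.latest[key] = message  # overwrite: the older update is dropped

    async def flush_loop(self, send, interval=0.25):
        # Push at most one update per key every `interval` seconds,
        # regardless of how fast the UDP feed produces them.
        while True:
            batch, self.latest = self.latest, {}
            for key, message in batch.items():
                await send(key, message)
            await asyncio.sleep(interval)
```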

Streaming data through MQTT to AWS IoT with an acknowledgement mechanism

I am trying to send chunks of data over MQTT to AWS IoT, and via the rules engine the data will be streamed to a web app in a real-time (or near real-time) manner.
I am trying to find a way to get some kind of acknowledgement. Acknowledgement is important for my use case, as I am sending medical equipment data (let's say blood pressure data during an operation).
Thus, in a nutshell, I need the quickest way to transfer data (just like MQTT) between the device and AWS IoT, with some kind of acknowledgement mechanism.
I would also like to add: if the device or web server cannot receive a message sent over MQTT due to internet issues, it will be lost, right? I need some mechanism by which the MQTT messages can be buffered and queued for some time, so that once the device or web app comes back online it can get the queued data. I also know that there is something called a device shadow, but we had thought of using it differently. Can you suggest something on that?
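For reference, MQTT itself already includes an acknowledgement mechanism: at QoS 1 the broker replies with a PUBACK, and a persistent session (clean session disabled) lets the broker queue QoS 1 messages for a client that is temporarily offline; AWS IoT supports both. A minimal sketch with the paho-mqtt 1.x client API (endpoint, certificate paths, client ID, and topic are placeholders):

```python
import paho.mqtt.client as mqtt

# clean_session=False requests a persistent session, so QoS 1 messages
# published while this client is offline are queued by the broker and
# delivered on reconnect.
client = mqtt.Client(client_id="bp-monitor-01", clean_session=False)
client.tls_set(ca_certs="root-ca.pem",       # placeholder cert paths
               certfile="device.pem.crt",
               keyfile="private.pem.key")

def on_publish(client, userdata, mid):
    # Called once the broker has acknowledged (PUBACK) this message id.
    print("acknowledged message", mid)

client.on_publish = on_publish
# Placeholder AWS IoT endpoint:
client.connect("xxxxxxxx-ats.iot.us-east-1.amazonaws.com", 8883)
client.loop_start()

# qos=1 means at-least-once delivery: paho retransmits until PUBACK.
info = client.publish("equipment/bp/readings", b"120/80", qos=1)
info.wait_for_publish()  # block until the broker's acknowledgement
```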
I am sure some of you have faced this problem and perhaps found an alternative to MQTT for transferring data to AWS IoT.
Kindly share your thoughts.
Thanks.

Debugging network applications and testing for synchronicity?

If I have a server running on my machine, and several clients running on other networks, what are some concepts of testing for synchronicity between them? How would I know when a client goes out-of-sync?
I'm particularly interested in how network programmers in the field of game design do this (or any continuous network-exchange application), where real-time synchronicity is commonly a vital aspect of success.
I can see how this may be easily achieved on a LAN via side-by-side comparisons on separate machines... but once you branch the scenario out to include clients on foreign networks, I'm just not sure how it can be done without clogging up your messaging system with debug information, and thereby changing the very synchronization behaviour you are trying to measure.
So what are some ways that people get around this issue?
For example, do they simply induce/simulate latency on the local network before launching on foreign networks, and then hope for the best? I'm hoping there are more concrete solutions, but this is what I'm doing in the meantime...
When you say synchronized, I believe you are talking about network latency: a client on a local network may get its gaming information sooner than a client on the other side of the country. Correct?
If so, then I'm sure you can look for books or papers that cover this kind of topic, but I can give you at least one way to detect this latency and provide a way to manage it.
To detect latency, your server can use a traceroute-style probe to determine how long it takes for data to reach each client. A common Linux example can be found here: http://linux.about.com/library/cmd/blcmdl8_traceroute.htm. While the server is handling client data, it can also continuously collect latency statistics and share them with the clients. For example, the server can tell each client its own network latency and the longest latency among the group of clients playing each other in a game.
The clients can then use the latency differences to determine when to process the data they receive from the server. For example, a client is told by the server that its network latency is 50 milliseconds and the maximum latency for its group is 300 milliseconds. The client then knows to wait 250 milliseconds before processing game data from the server. That way, each client processes game data from the server at approximately the same time.
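A minimal sketch of that compensation step (names are illustrative):

```python
import time

def process_with_delay(own_latency_ms: float,
                       max_group_latency_ms: float,
                       game_data) -> None:
    """Delay processing so every client in the group acts on the same
    update at roughly the same moment.

    E.g. a client 50 ms from the server waits 300 - 50 = 250 ms, so it
    stays in step with the slowest (300 ms) client.
    """
    wait_ms = max(0.0, max_group_latency_ms - own_latency_ms)
    time.sleep(wait_ms / 1000.0)
    print("processing", game_data)  # stand-in for the real game update
```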
There are many other (and probably better) ways to handle this situation, but that should get you started in the right direction.