How to handle high-rate incoming data and send throttled data to WebSockets in django-channels - django

I am facing a problem in my Django web server.
We are using Python 3, Django 2, django-rest-framework 3.8, and Channels 2.x.
The scenario: we receive data over a UDP connection at a very fast rate (~100 messages per second). The data received is in proto format (i.e., we are receiving byte data). Some data gets starved in this process, as the rate of production >>> the rate of consumption. We are implementing throttling, but at 100 concurrent users the data starves again. Can anyone help us in this scenario?
If anyone has a new architecture idea, please share.
This is surely an interesting problem. It is about a stock market feed.
PS: I cannot post any code as it is my company's, but I can clarify any point whenever you need.

In many stock market data applications, this exact problem is solved by having Lightstreamer Server take care of throttling on the WebSocket (full disclosure: I am the CEO at Lightstreamer).
You would develop a Data Adapter using the Lightstreamer API to consume data from your UDP connection and inject it into the Lightstreamer Server. You can then specify a maximum update rate for each client and each subscription, as well as a maximum bandwidth. Lightstreamer throttles the data on the fly, taking into account not only the client's capacity but also the network status.
When throttling, you can choose between conflating updates (typical for stock market data) and queuing them.
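Independently of Lightstreamer, the conflation idea is straightforward to sketch. Below is a minimal, hypothetical asyncio version (the class and names are illustrative, not part of any library): the producer overwrites the pending update per symbol, so a slow consumer always sends the latest tick instead of falling behind.

```python
import asyncio

class ConflatingQueue:
    """Keep only the latest update per key (e.g., per stock symbol)."""

    def __init__(self) -> None:
        self._latest: dict[str, bytes] = {}
        self._event = asyncio.Event()

    def put(self, key: str, update: bytes) -> None:
        # Conflate: a newer update for the same symbol replaces the old one,
        # so the backlog is bounded by the number of distinct symbols.
        self._latest[key] = update
        self._event.set()

    async def drain(self) -> dict[str, bytes]:
        # Wait until at least one update is pending, then take them all.
        await self._event.wait()
        self._event.clear()
        batch, self._latest = self._latest, {}
        return batch

async def sender(queue: ConflatingQueue, send, max_rate_hz: float = 10.0) -> None:
    # Push conflated batches to one client at a capped rate.
    while True:
        batch = await queue.drain()
        for update in batch.values():
            await send(update)
        await asyncio.sleep(1.0 / max_rate_hz)
```

In a Channels consumer, `send` would be the consumer's own `self.send`, with one queue per connected client; slow clients then lose intermediate ticks rather than stalling the whole pipeline.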

Related

How to estimate the bandwidth/throughput of gRPC

I am working on a network-related project where communication between client and server is implemented with grpc-cpp. I want to estimate the bandwidth/throughput of data transfer between server and client. Currently, the client sends a request containing data and the server replies with a short message. The data is transferred as bytes with sizes of 10~100 KB.
It is easy to estimate the bandwidth on the client side by measuring the time difference between sending and receiving, then subtracting the execution time on the server. But how can I do that on the server side? It looks like GlobalCallbacks::PreSynchronousRequest is called only after the whole frame has been received, and there is no way to know the duration between two packets (each containing a part of the whole frame).
Is there any other way to roughly estimate the server-client bandwidth on the server side?
Depending on what you want to achieve with this bandwidth measurement, the right tool can vary, but I suggest starting with a network throughput measuring tool such as iptraf or iftop. Then you don't need to implement this yourself against the limitations of the gRPC API, which isn't meant to measure bandwidth.
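For the client-side estimate the question describes, a minimal sketch (Python for brevity; `send_fn` stands in for whatever blocking RPC call you use) could look like this:

```python
import time

def estimate_throughput(send_fn, payload: bytes, server_exec_s: float = 0.0) -> float:
    """Rough client-side throughput estimate in bytes per second.

    send_fn       -- a blocking call that sends `payload` and waits for the reply
    server_exec_s -- server processing time to subtract, if the server reports it
    """
    start = time.monotonic()
    send_fn(payload)
    elapsed = time.monotonic() - start
    transfer_s = max(elapsed - server_exec_s, 1e-9)  # guard against division by zero
    return len(payload) / transfer_s
```

Averaging over many requests across the 10~100 KB size range smooths out per-call jitter; the server-side number, as noted above, is better read off the network interface with a dedicated tool than from inside gRPC.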

Need a high-latency, ultra-low-bandwidth data transfer technique like Google Pub/Sub, but inverse

I have an interesting IoT use case. For this example, let's say I need to deploy a thousand cellular IoT displays. Each has a single LED that displays information useful to someone out in the world, for example a sign at the start of a hiking trail that indicates whether conditions are favorable. Each display needs to receive 3 bytes of data every 5-10 minutes.
I have successfully created a computer-based demo of this system using a basic HTTP GET request and Cloud Functions on GCP. The "device" asks for its 3 bytes every 10 minutes and receives the data back. The issue is that the HTTP overhead takes up 200+ bytes, so bandwidth usage over cellular will be high.
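For reference, the demo's device loop is essentially the following (the endpoint and device ID are placeholders); every poll pays the full HTTP header cost for a 3-byte payload:

```python
import time
import urllib.request

DEVICE_ID = "trail-sign-0042"  # hypothetical device identifier
URL = ("https://REGION-PROJECT.cloudfunctions.net/getDisplayState"
       f"?device={DEVICE_ID}")  # hypothetical Cloud Function endpoint

while True:
    with urllib.request.urlopen(URL, timeout=10) as resp:
        state = resp.read(3)  # the entire useful payload: 3 bytes
    # ... drive the LED from `state` ...
    time.sleep(600)  # poll every 10 minutes
```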
I then tried Google Cloud Pub/Sub, but quickly realized it is designed for devices transmitting to the cloud rather than receiving. My best guess is that each device would need its own topic, which would scale horribly.
Does anyone have advice on a protocol that works with the cloud (hopefully GCP) and can serve low-bandwidth, receive-only devices? Does the pub/sub structure really not work for this case?

Are websockets a suitable lowest-latency and robust real-time communication protocol between two nearby servers in the same AWS Availability Zone?

Suitable technologies I am aware of:
Websockets
ZeroMQ
Please suggest others if they are a better fit for my problem.
For this use case I have just two machines, the sender and the receiver, and it's important to note they are fixed "nearby" each other, as they will be in the same availability zone on AWS. Answers relating to message passing over large spans of the internet aren't necessarily applicable. Note also that the receiving server isn't queuing these up as tasks; it just forwards select message feeds to website visitors over a websocket. The sending server does a lot of pre-processing and collating of the messages.
The solution needs to:
1. Be very high throughput. At present the sending server is processing about 10,000 messages per second (written in Rust) without breaking a sweat. Bursty traffic may increase this to 20,000 or a bit more. I know ZeroMQ can handle this.
2. Be robust. The communication pipe will be open 24/7, 365 days a year. My budget is extremely limited in terms of setting up clusters of machines as failovers, so I have to do the best I can with two machines.
3. Message durability isn't required or a concern; the receiving server isn't required to store anything, it just needs all the data. The sending server asynchronously writes a durable 5-second summary of the data to a database and to a cache.
4. Messages must retain the order in which they are sent.
5. Low latency. This is very important, as the data needs to be as real-time as possible.
A websocket seems to get the job done for points 1 to 4. What I don't know is how robust a websocket is for communication that runs 24 hours a day, 7 days a week. I've observed websocket connections getting dropped in general (of course I will write reconnect code and heartbeat monitoring if required, but this still concerns me). I also wonder if the high throughput is too much for a websocket.
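A reconnect-plus-heartbeat wrapper along those lines can be small. Here is a sketch using the Python `websockets` package for brevity (the address and the downstream handler are placeholders); the library's built-in pings serve as the heartbeat:

```python
import asyncio
import websockets  # third-party: pip install websockets

def forward(message) -> None:
    ...  # placeholder: relay the message to subscribed site visitors

async def consume(url: str) -> None:
    # The outer loop re-dials whenever the connection drops; ping_interval
    # and ping_timeout make the library detect dead peers for us.
    while True:
        try:
            async with websockets.connect(url, ping_interval=5, ping_timeout=10) as ws:
                async for message in ws:
                    forward(message)
        except (websockets.ConnectionClosed, OSError):
            await asyncio.sleep(1)  # brief backoff, then reconnect

asyncio.run(consume("ws://10.0.0.2:9000/feed"))  # placeholder in-AZ address
```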
I have zero experience with this kind of problem, but I have a very good websocket library that I'm comfortable using. I ruled out Apache Kafka as it seems expensive to run at high throughput, tricky to manage operationally (ZooKeeper), and overkill given that I don't need durability and it's only communication between two machines. So I'm hoping for a simple solution.
It sounds like you are describing precisely what EC2 cluster placement groups provide: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-ec2-placementgroup.html
Edit: You should be able to create the placement group with 2 machines to cover your limited budget. Using larger instances, as your budget allows, will also support higher network throughput.
Point 4 (ordering) would be supported by SQS FIFO, though note that SQS FIFO queues only support up to 3,000 messages per second with batching, which falls short of your stated throughput.
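For illustration, preserving order with SQS FIFO comes down to sending everything under one message group. A boto3 sketch (the queue URL and names are placeholders):

```python
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/feed.fifo"  # placeholder

def send(payload: str, seq_no: int) -> None:
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=payload,
        MessageGroupId="market-feed",        # a single group => strict FIFO ordering
        MessageDeduplicationId=str(seq_no),  # or enable content-based deduplication
    )
```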
A managed streaming solution like Kinesis Data Streams would cover your use case, at scale, much better than a raw websocket. Using the Kinesis Client Library, you can write a consumer to read from the stream.
AWS also has a managed Kafka service (MSK) that removes the overhead of running components like Apache ZooKeeper yourself: https://aws.amazon.com/msk/
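As a rough sketch of the consumer side with plain boto3 (the Kinesis Client Library adds checkpointing and shard load-balancing on top; the stream name and handler are placeholders, and a single shard is assumed):

```python
import time
import boto3

kinesis = boto3.client("kinesis")
STREAM = "market-feed"  # placeholder stream name

def handle(data: bytes) -> None:
    ...  # placeholder: forward to websocket clients

shard_id = kinesis.describe_stream(StreamName=STREAM)["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=STREAM, ShardId=shard_id, ShardIteratorType="LATEST"
)["ShardIterator"]

while True:
    resp = kinesis.get_records(ShardIterator=iterator, Limit=1000)
    for record in resp["Records"]:
        handle(record["Data"])
    iterator = resp["NextShardIterator"]
    time.sleep(0.25)  # stay under the per-shard read limits
```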

Streaming data through MQTT to AWS IoT with an acknowledgement mechanism

I am trying to send chunks of data over MQTT to AWS IoT; via the rule engine, the data will be streamed to a web app in real time (or near real time).
I am trying to find a way to get some kind of acknowledgement. Acknowledgement is important for my use case, as I am sending medical equipment data (let's say blood pressure readings during an operation).
In a nutshell, I need the quickest way to transfer data (just like MQTT) between the device and AWS IoT, with some kind of acknowledgement mechanism.
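Note that MQTT itself already provides a per-message acknowledgement at QoS 1 (the broker replies with a PUBACK), and AWS IoT Core supports QoS 0 and 1. A minimal paho-mqtt sketch, with placeholder endpoint, topic, and certificate paths:

```python
import ssl
import paho.mqtt.client as mqtt

# paho-mqtt 1.x constructor shown; 2.x also takes a CallbackAPIVersion argument.
client = mqtt.Client(client_id="bp-monitor-01")  # placeholder client ID
client.tls_set(ca_certs="AmazonRootCA1.pem",     # placeholder certificate paths
               certfile="device.pem.crt",
               keyfile="private.pem.key",
               tls_version=ssl.PROTOCOL_TLSv1_2)
client.connect("xxxxxxxx-ats.iot.us-east-1.amazonaws.com", 8883)  # placeholder endpoint
client.loop_start()

info = client.publish("devices/bp-monitor-01/readings", payload=b"...", qos=1)
info.wait_for_publish()  # blocks until the broker's PUBACK arrives
```

For the buffering part of the question, an MQTT persistent session (connecting with `clean_session=False` and subscribing at QoS 1) makes the broker retain messages for a disconnected client, which for a stream of readings is closer to what you want than the device shadow.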
Also, if the device or the web server misses a message sent over MQTT due to connectivity issues, it will be lost, right? I need some mechanism by which MQTT messages are buffered and queued for a while, so that once the device or web app comes back online it can fetch the queued data. I also know there is something called the device shadow, but we had thought of using it differently. Can you suggest anything for this?
I am sure some of you have faced this problem and perhaps found an alternative to MQTT for transferring data to AWS IoT.
Kindly share your thoughts.
Thanks.

Debugging network applications and testing for synchronicity?

If I have a server running on my machine, and several clients running on other networks, what are some approaches for testing synchronicity between them? How would I know when a client goes out of sync?
I'm particularly interested in how network programmers in the field of game design do this (or in any continuous network exchange application), where real-time synchronicity is commonly a vital aspect of success.
I can see how this may be easily achieved on a LAN via side-by-side comparisons on separate machines... but once you branch out the scenario to include clients on foreign networks, I'm just not sure how it can be done without clogging up your messaging system with debug information, thereby effectively changing the very synchronization behavior you are trying to observe.
So what are some ways that people get around this issue?
For example, do they simply induce/simulate latency on the local network before launching to foreign networks, and then hope for the best? I'm hoping there are more concrete solutions, but this is what I'm doing in the meantime...
When you say synchronized, I believe you are talking about network latency, meaning that a client on a local network may get its gaming information sooner than a client on the other side of the country. Correct?
If so, I'm sure you can find books or papers that cover this kind of topic, but I can give you at least one way to detect this latency and manage it.
To detect latency, your server can use a traceroute-style program to determine how long it takes for data to reach each client. A common Linux example is described here: http://linux.about.com/library/cmd/blcmdl8_traceroute.htm. While the server is handling client data, it can also continuously collect these latency statistics and provide them to the clients. For example, the server can tell each client its own network latency and the longest latency among the group of clients playing each other in a game.
The clients can then use the latency differences to determine when to process the data they receive from the server. For example, suppose a client is told by the server that its network latency is 50 milliseconds and the maximum latency for its group is 300 milliseconds. The client then knows to wait 250 milliseconds before processing game data from the server. That way, each client processes game data from the server at approximately the same time.
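The delay computation in that example is tiny; here is a hypothetical Python sketch (the handler name is a placeholder):

```python
import time

def apply_update(game_data) -> None:
    ...  # placeholder: advance the local game state

def process_when_synchronized(own_latency_ms: float, group_max_ms: float, game_data) -> None:
    # Wait out the gap between this client's latency and the group's
    # worst latency so every client applies the update together.
    wait_ms = max(group_max_ms - own_latency_ms, 0.0)
    time.sleep(wait_ms / 1000.0)
    apply_update(game_data)

# A client with 50 ms latency in a group whose worst latency is 300 ms
# waits 250 ms before applying the server's update:
process_when_synchronized(50.0, 300.0, game_data={"tick": 1024})
```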
There are many other (and probably better) ways to handle this situation, but that should get you started in the right direction.