Websocket from Kafka messages - django

I am working on an IoT project that uses the MQTT protocol to transport sensor data from embedded devices to an app. For this I have created:
An MQTT broker to which the devices send their data.
A custom bridge that pushes data from the MQTT broker to my Kafka broker.
A Django server that pushes the messages to the app via WebSockets.
Right now, what I need is to consume the Kafka messages from Django, save them to the DB, and then push the data to clients via WebSockets. But I don't have much idea how to consume Kafka messages from Django.
So far, the solution I have in mind is a custom management command that starts a Kafka consumer, saves the data to the DB, and then pushes it to the WebSockets.
Is this a good approach? If not, what would be a good solution?
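The consume, save, push loop described above can be sketched without any Kafka or Django dependency by injecting the three stages as callables. In a real management command, `records` could be the `.value` of each record yielded by a Kafka consumer, `save` a function that creates a model row, and `push` a function that sends to a Channels group; all three wirings are assumptions, not part of the question.

```python
import json
from typing import Callable, Iterable


def run_bridge(
    records: Iterable[bytes],
    save: Callable[[dict], None],
    push: Callable[[dict], None],
) -> int:
    """Decode each raw Kafka value, persist it, then push it to clients.

    `save` and `push` are hypothetical hooks (e.g. a model create and a
    WebSocket group send); returns the number of readings processed.
    """
    processed = 0
    for raw in records:
        try:
            reading = json.loads(raw)
        except json.JSONDecodeError:
            # Skip malformed payloads instead of crashing the long-running loop.
            continue
        save(reading)  # write to the database first...
        push(reading)  # ...then notify the connected WebSocket clients
        processed += 1
    return processed
```

Run as a separate long-running process (for example from a command's `handle()` method), this keeps Kafka consumption out of the request/response cycle of the web server.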

You can add a periodic task that consumes the topic and bulk-inserts (or updates) the records into the database; batching the writes this way can noticeably improve performance.
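A minimal sketch of the batching idea, assuming the consumed records are already decoded into dicts. `bulk_save` is a hypothetical hook standing in for something like Django's `Model.objects.bulk_create(...)`, and the batch size of 100 is only an example value.

```python
from typing import Callable, Iterable, List


def consume_in_batches(
    records: Iterable[dict],
    bulk_save: Callable[[List[dict]], None],
    batch_size: int = 100,
) -> int:
    """Buffer records and write them with one bulk call per batch.

    Returns the number of bulk writes performed.
    """
    batch: List[dict] = []
    batches = 0
    for record in records:
        batch.append(record)
        if len(batch) >= batch_size:
            bulk_save(batch)
            batches += 1
            batch = []  # start a fresh list so the saved batch is untouched
    if batch:  # flush the remainder so nothing is left unsaved
        bulk_save(batch)
        batches += 1
    return batches
```

With a real consumer, disabling auto-commit and committing offsets only after `bulk_save` succeeds means a crash replays the last batch instead of losing it.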

Related

Kafka Python producer integration with django web app

I have a question about how to integrate a Kafka producer with a front-end web app that gets data every minute or second. Can the web app pass each JSON object to a running producer as it is created, or do we need to initialize the Kafka client each time we get a JSON object?
You would probably want to open a producer once per session rather than opening and closing one for each and every request. And this would be done on the backend, not the frontend.
But a web server containing a Kafka client is no different, underneath the HTTP layer, from a regular console app: you accept an incoming request, deserialize it, optionally parse it, serialize it again for Kafka output, and then optionally render something back to the user.
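The request flow above, with one producer reused across requests, might look like the sketch below. `_InMemoryProducer` is a hypothetical stand-in so the example runs without a broker; with kafka-python you would instead return something like `KafkaProducer(bootstrap_servers=...)` from `get_producer()`. The topic name `web-events` and the `type` field check are illustrative assumptions.

```python
import json
from functools import lru_cache


class _InMemoryProducer:
    """Hypothetical stand-in for a real Kafka producer client."""

    def __init__(self):
        self.sent = []

    def send(self, topic, value):
        self.sent.append((topic, value))


@lru_cache(maxsize=None)
def get_producer():
    # Created once per process and reused by every request, instead of
    # opening and closing a client for each request.
    return _InMemoryProducer()


def handle_event(body: bytes, topic: str = "web-events") -> dict:
    event = json.loads(body)                     # deserialize the HTTP body
    if "type" not in event:                      # optional parsing/validation
        return {"status": 400, "error": "missing 'type'"}
    payload = json.dumps(event).encode("utf-8")  # serialize for Kafka output
    get_producer().send(topic, payload)
    return {"status": 202}                       # render something back
```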
If you're really asking whether Kafka with HTTP requests is possible, regardless of language and platform, then sure: the Confluent REST Proxy works the same way, only it is written in Java.
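For the REST Proxy route, a produce call is a plain HTTP POST. The sketch below only builds the request for the v2 API (`POST /topics/<topic>` with the `application/vnd.kafka.json.v2+json` content type); the base URL, topic name, and event fields in the usage are assumptions.

```python
import json
import urllib.request


def build_rest_proxy_request(base_url: str, topic: str, events: list) -> urllib.request.Request:
    """Build (but do not send) a JSON produce request for REST Proxy v2."""
    body = json.dumps({"records": [{"value": e} for e in events]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/topics/{topic}",
        data=body,
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
        method="POST",
    )
```

Against a running proxy, `urllib.request.urlopen(build_rest_proxy_request("http://localhost:8082", "clicks", [{"page": "/home"}]))` would send it.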
As far as web app tracking goes, I would suggest looking into Divolte Collector.

streaming data through mqtt to aws IoT with acknowledgement mechanism

I am trying to send chunks of data over MQTT to AWS IoT, and through the rules engine the data will be streamed to a web app in real time (or near real time).
I am trying to find a way to get some kind of acknowledgement. Acknowledgement is important for my use case because I am sending medical equipment data (say, blood pressure readings during an operation).
In a nutshell, I need the quickest way to transfer data (as MQTT does) between the device and AWS IoT, with some kind of acknowledgement mechanism.
Also, I would like to add: if the device or web server cannot receive a message sent over MQTT because of internet issues, then it will be lost, right? I need some mechanism by which the MQTT messages can be buffered and queued for some time, so that once the device or web app comes back online it can get the queued data. I also know there is something called a device shadow, but we have thought of using it differently. Can you suggest something?
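At the protocol level, MQTT's QoS 1 already gives an acknowledgement (PUBACK) between a client and the broker; an end-to-end confirmation from the web app needs an application-level acknowledgement. A minimal device-side sketch of that store-and-forward idea, assuming a hypothetical `transmit` callable that publishes one message and an acknowledgement that arrives later with the same message id:

```python
import itertools
from typing import Callable


class AckBuffer:
    """Keep every outgoing message until the receiver acknowledges it.

    `transmit(msg_id, payload)` is a hypothetical publish hook; the receiver
    is assumed to echo `msg_id` back, which the caller passes to `ack()`.
    """

    def __init__(self, transmit: Callable[[int, bytes], None]):
        self._transmit = transmit
        self._ids = itertools.count(1)
        self._pending: dict = {}  # insertion-ordered: oldest message first

    def send(self, payload: bytes) -> int:
        msg_id = next(self._ids)
        self._pending[msg_id] = payload  # buffer before sending
        self._transmit(msg_id, payload)
        return msg_id

    def ack(self, msg_id: int) -> None:
        self._pending.pop(msg_id, None)  # delivered end to end; forget it

    def resend_pending(self) -> int:
        # Call after reconnecting: retransmit, oldest first, everything not yet acked.
        for msg_id, payload in self._pending.items():
            self._transmit(msg_id, payload)
        return len(self._pending)
```

Because retransmission can duplicate a message, the receiving side should de-duplicate by message id.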
I am sure some of you have faced this problem and have also found an alternative to MQTT for transferring data to AWS IoT.
Kindly share your thoughts.
Thanks.

Can KAFKA producer read log files?

The log files of my application keep accumulating on a server. I want to dump them into HDFS through Kafka. I want the Kafka producer to read the log files, send them to the Kafka broker, and then move those files to another folder. Can the Kafka producer read log files? Also, is it possible to put the copying logic in the Kafka producer?
Kafka maintains feeds of messages in categories called topics.
We'll call processes that publish messages to a Kafka topic producers.
We'll call processes that subscribe to topics and process the feed of published messages consumers.
Kafka is run as a cluster comprised of one or more servers each of which is called a broker.
So, at a high level, producers send messages over the network to the Kafka cluster, which in turn serves them up to consumers.
So this is not suitable for your application, where you want to ingest log files. Instead, you can try Flume.
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application.
As you know, Apache Kafka is a publish-subscribe messaging system. To send messages from your application, you can use the Kafka clients or a Kafka REST API.
In short, your application can read the logs and send them to Kafka topics.
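Reading the files and moving them afterwards is ordinary application code around the producer. A minimal sketch, assuming a hypothetical `send` callable that wraps a Kafka producer's send call and files matching `*.log`:

```python
import shutil
from pathlib import Path
from typing import Callable


def ship_logs(src_dir: Path, done_dir: Path, send: Callable[[str], None]) -> int:
    """Send every line of each *.log file in src_dir, then move the file to done_dir.

    Returns the number of files shipped. `send` is a hypothetical producer hook.
    """
    done_dir.mkdir(parents=True, exist_ok=True)
    shipped = 0
    for log_file in sorted(src_dir.glob("*.log")):
        with log_file.open(encoding="utf-8") as fh:
            for line in fh:
                send(line.rstrip("\n"))
        # Move only after every line was handed to the producer; with a real
        # client you would flush() first so buffered records are delivered.
        shutil.move(str(log_file), str(done_dir / log_file.name))
        shipped += 1
    return shipped
```

Files still being written to by the application should be excluded (for example, only ship rotated files), otherwise lines appended after reading would be moved without being sent.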
To process these logs, you can use Apache Storm. You can find many integrated solutions for this purpose, and with Storm you can add any logic you need to your stream processing.
You can find plenty of useful, detailed information about Storm-Kafka integration.
Also, to put your processed logs into HDFS, you can easily integrate Storm with Hadoop. You can check this repo for it.
Kafka was developed to support high-volume event streams such as real-time log aggregation. From the Kafka documentation:
Many people use Kafka as a replacement for a log aggregation solution. Log aggregation typically collects physical log files off servers and puts them in a central place (a file server or HDFS perhaps) for processing. Kafka abstracts away the details of files and gives a cleaner abstraction of log or event data as a stream of messages. This allows for lower-latency processing and easier support for multiple data sources and distributed data consumption
Also, I got this piece of information from this nice article, which is almost identical to your use case:
Today, Kafka has been used in production at LinkedIn for a number of projects. There are both offline and online usage. In the offline case, we use Kafka to feed all activity events to our data warehouse and Hadoop, from which we then run various batch analysis

How to configure OSB to consume messages from Amazon SQS

I'm new to AWS and working with SQS for the first time. I have an Oracle Service Bus (OSB) in a non-cloud environment and would like to configure OSB to consume messages from Amazon SQS. The documentation says to use the REST API and poll repeatedly for messages. I also read about the 'client library for JMS', which would let OSB treat SQS as a JMS provider. What is the best approach to achieve this? I'd appreciate your input.
The easiest (though not necessarily the purest) way would be to create a Java EE app that imports the SQS libraries, pulls messages from AWS, and puts them on a local queue for OSB to process. The example code snippets are in Java, so it should be relatively straightforward.
The purest way would be to set it up as a remote JMS provider. However, how to set that up is not so clear; you may end up writing most of the code that went into option #1 above, but as a JMS client library instead of an MDB.
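The first option (pull from SQS, put on a local queue) boils down to a receive, forward, delete loop. It is shown here in Python for brevity; a Java EE timer or MDB would follow the same order. `client` is assumed to behave like a boto3 SQS client (`receive_message` and `delete_message` are real boto3 calls), while `forward` is a hypothetical callable that enqueues onto the local queue OSB reads.

```python
from typing import Callable


def drain_sqs_once(client, queue_url: str, forward: Callable[[str], None]) -> int:
    """One long-poll cycle: forward each message, then delete it from SQS.

    Deleting only after `forward` succeeds means a failure leaves the message
    in SQS, where it becomes visible again after the visibility timeout.
    """
    resp = client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS maximum per receive call
        WaitTimeSeconds=20,      # long polling; 20 seconds is the maximum
    )
    count = 0
    for msg in resp.get("Messages", []):
        forward(msg["Body"])
        client.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
        count += 1
    return count
```

Calling this in a loop gives continuous consumption with at most one open request to SQS at a time.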

Ideas for scaling chat in AWS?

I'm trying to come up with the best solution for scaling a chat service in AWS. I've come up with a couple of potential solutions:
Redis Pub/Sub - When a user establishes a connection to a server, that server subscribes to that user's ID. When someone sends a message to that user, a server will publish to the channel with the user's ID. The server the user is connected to will receive the message and push it down to the appropriate client.
SQS - I've thought of creating a queue for each user. The server the user is connected to will poll (or use SQS long-polling) that queue. When a new message is discovered, it will be pushed to the user from the server.
SNS - I really liked this solution until I discovered the 100 topic limit. I would need to create a topic for each user, which would only support 100 users.
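The Redis Pub/Sub option can be illustrated with an in-memory bus so the routing runs locally. `InMemoryPubSub` only mimics Redis PUBLISH/SUBSCRIBE (including PUBLISH returning the number of receivers); with redis-py the equivalents are `r.pubsub().subscribe(...)` and `r.publish(...)`. The `user:<id>` channel naming is an assumption.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class InMemoryPubSub:
    """Stand-in for Redis PUBLISH/SUBSCRIBE."""

    def __init__(self):
        self._subs: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        self._subs[channel].append(callback)

    def publish(self, channel: str, message: str) -> int:
        # Like Redis PUBLISH, return how many subscribers received it.
        callbacks = self._subs.get(channel, [])
        for cb in callbacks:
            cb(message)
        return len(callbacks)


class ChatServer:
    def __init__(self, bus: InMemoryPubSub):
        self.bus = bus
        self.delivered: List[tuple] = []

    def on_connect(self, user_id: str) -> None:
        # Subscribe to the user's own channel when they connect to this server.
        self.bus.subscribe(f"user:{user_id}",
                           lambda msg: self.delivered.append((user_id, msg)))

    def send(self, to_user: str, message: str) -> int:
        # Any server can publish; only the server holding the connection delivers.
        return self.bus.publish(f"user:{to_user}", message)
```

A return value of 0 from `send` means no server holds the user's connection, which is where offline delivery would have to take over.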
Are there any other ways chat could be scaled using AWS? Is the SQS approach viable? How long does it take AWS to add a message to a queue?
Building a chat service isn't as easy as you would think.
I've built full XMPP servers, clients, and SDKs and can attest to some of the subtle and difficult problems that arise. A prototype where users see each other and chat is easy. A full-featured system with account creation, security, discovery, presence, offline delivery, and friend lists is much more of a challenge. To then scale that across an arbitrary number of servers is especially difficult.
PubSub is a feature offered by Chat Services (see XEP-60) rather than a traditional means of building a chat service. I can see the allure, but PubSub can have drawbacks.
Some questions for you:
Are you doing this over the Web? Are users going to be connecting and long-polling, or do you have a WebSockets solution?
How many users? How many connections per user? Ratio of writes to reads?
Your idea for using SQS that way is interesting, but it probably won't scale. It's not unusual to have 50k or more users on a chat server. If you're polling a separate SQS queue for each user, you're not going to get anywhere near that. You would be better off having a queue for each server, with each server polling only its own queue. Then it's up to you to figure out which server a user is on and put the message into the right queue.
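The queue-per-server idea needs a directory of which server each user is on. A minimal in-memory sketch, assuming that in production the directory would live in a shared store and each queue would be an SQS queue (both assumptions):

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class Router:
    """One queue per front-end server plus a user -> server directory."""

    def __init__(self):
        self.location: Dict[str, str] = {}
        self.queues: Dict[str, Deque[tuple]] = defaultdict(deque)

    def connect(self, user_id: str, server_id: str) -> None:
        self.location[user_id] = server_id

    def disconnect(self, user_id: str) -> None:
        self.location.pop(user_id, None)

    def route(self, to_user: str, message: str) -> bool:
        server = self.location.get(to_user)
        if server is None:
            return False  # user offline: store for later delivery instead
        self.queues[server].append((to_user, message))
        return True
```

Each server then polls only `queues[its_own_id]`, so the number of polled queues grows with servers rather than with users.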
I suspect you'll want something like:
A big RDS database on the backend.
A bunch of front-end servers handling the client connections.
Some middle tier Java / C# code tracking everything and routing messages to the right place.
To get an idea of the complexity of building a chat server, read the XMPP RFCs:
RFC 3920
RFC 3921
SQS/SNS might not fit your chat requirements. We have observed some latency in SQS that might not be suitable for a chat application. Also, SQS does not guarantee FIFO ordering. I have worked with Redis on AWS; it is quite easy and stable if it is configured with all the best practices in mind.
I've thought about building a chat server using SNS, but instead of doing one topic per user, as you describe, doing one topic for the entire chat system and having each server subscribe to the topic - where each server is running some sort of long polling or web sockets chat system. Then, when an event occurs, the data is sent in the payload of the SNS notification. The server can then use this payload to determine what clients in its queue should receive the response, leaving any unrelated clients untouched. I actually built a small prototype for this, but haven't done a ton of testing to see if it's robust enough for a large number of users.
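With a single shared topic, each server's notification handler decides locally whom to deliver to. The sketch assumes a hypothetical JSON payload shape `{"to": [...], "text": "..."}`; for an HTTP subscription, SNS wraps the published message in the `Message` field of its own JSON envelope, so `raw` here is that inner message.

```python
import json
from typing import Callable, Dict


def handle_notification(raw: str, local_clients: Dict[str, Callable[[str], None]]) -> int:
    """Called on every server for every notification on the shared topic.

    Only recipients connected to this server are written to; all other
    recipients are ignored here and handled by their own server.
    """
    event = json.loads(raw)
    delivered = 0
    for user_id in event["to"]:
        push = local_clients.get(user_id)
        if push is not None:
            push(event["text"])
            delivered += 1
    return delivered
```

The trade-off is that every server receives every message, so fan-out cost grows with the number of servers.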
Hi, realtime chat doesn't work well with SNS. It's designed for email/SMS or services where a latency of one or a few seconds is acceptable. In realtime chat, one or a few seconds is not acceptable.
Check this link:
Latency (i.e. “Realtime”) for PubNub vs SNS
Amazon SNS provides no latency guarantees, and the vast majority of latencies are measured over 1 second, and often many seconds slower. Again, this is somewhat irrelevant; Amazon SNS is designed for server-to-server (or email/SMS) notifications, where a latency of many seconds is often acceptable and expected.
Because PubNub delivers data via an existing, established open network socket, latencies are under 0.25 seconds from publish to subscribe in the 95% percentile of the subscribed devices. Most humans perceive something as “realtime” if the event is perceived within 0.6 – 0.7 seconds.
The way I would implement such a thing (if not using some framework) is the following:
Have a web server (on EC2) that accepts the messages from the users.
Use an Auto Scaling group for this web server. The web server can update any database on Amazon RDS, which can scale easily.
If you are using your own database, you might consider decoupling it from the web server using SQS (by sending all requests to the same queue), and then have a consumer that consumes the queue. This consumer can also be placed in an Auto Scaling group, so that if the queue holds more than X messages, it scales out (you can set this up with alarms).
SQS normally updates pretty fast, i.e. in less than one second (from the moment you send a message to the moment it appears on the queue), and rarely takes longer than that.
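The "scale out when the queue holds more than X messages" rule is essentially a clamp on backlog divided by per-consumer capacity. In AWS the decision is normally made by a CloudWatch alarm on the queue's `ApproximateNumberOfMessagesVisible` metric driving the Auto Scaling group; the function below only sketches the arithmetic, and the default bounds are arbitrary examples.

```python
import math


def desired_consumers(queue_depth: int, msgs_per_consumer: int,
                      min_consumers: int = 1, max_consumers: int = 10) -> int:
    """How many queue consumers to run for the current backlog."""
    needed = math.ceil(queue_depth / msgs_per_consumer)
    # Never drop below the floor or exceed the group's maximum size.
    return max(min_consumers, min(max_consumers, needed))
```

For example, a backlog of 250 messages at 100 messages per consumer asks for 3 consumers.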
Since the new AWS IoT service started supporting WebSockets, custom keepalive, and Pub/Sub a couple of months ago, you can easily build an elastic chat on it. AWS IoT is a managed service with SDKs for many languages, including JavaScript, and it was built to handle huge loads (billions of messages) with zero administration.
You can read more about the update here:
https://aws.amazon.com/ru/about-aws/whats-new/2016/01/aws-iot-now-supports-websockets-custom-keepalive-intervals-and-enhanced-console/
Edit:
Last SQS update (2016/11): you can now use Amazon Simple Queue Service (SQS) for applications that require messages to be processed in a strict sequence and exactly once using First-in, First-out (FIFO) queues. FIFO queues are designed to ensure that the order in which messages are sent and received is strictly preserved and that each message is processed exactly once.
Source:
https://aws.amazon.com/about-aws/whats-new/2016/11/amazon-sqs-introduces-fifo-queues-with-exactly-once-processing-and-lower-prices-for-standard-queues/
From now on, implementing SQS + SNS looks like a good idea too.
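With FIFO queues (whose names must end in `.fifo`), ordering is guaranteed within a message group, so a chat could use one group per conversation. The sketch builds keyword arguments for boto3's SQS `send_message`; `conversation_id` and `message_id` are hypothetical application fields.

```python
def fifo_send_params(queue_url: str, conversation_id: str,
                     message_id: str, body: str) -> dict:
    """Keyword arguments for an SQS `send_message` call on a FIFO queue."""
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        # Messages are strictly ordered per group; different conversations
        # can still be processed in parallel.
        "MessageGroupId": conversation_id,
        # A retry with the same id inside SQS's 5-minute deduplication window
        # is dropped. Don't derive it from the body, or two identical chat
        # lines would collapse into one.
        "MessageDeduplicationId": message_id,
    }
```

Usage with a real client would be `sqs.send_message(**fifo_send_params(url, "room-42", "msg-1", "hello"))`.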