My company has a messaging system that sends real-time messages in JSON format, and it's not built on AWS.
Our team is trying to use AWS SQS to receive these messages, and then use DynamoDB to store them.
I'm thinking of using EC2 to read these messages and then save them.
Is there a better solution, or how should I go about it? I don't have much experience with this.
First of all, EC2 is cloud infrastructure; it is similar to a physical machine with an OS in a local setup. If you want to create an application that fetches the data from Amazon SQS (messages in JSON format) and pushes it into DynamoDB (a NoSQL database), your design is correct, as both SQS and DynamoDB have thorough JSON support. Once your application is ready, you deploy it on an EC2 machine.
To achieve this, your application should have an async buffered SQS consumer that consumes the messages (the SQS message size limit is 256 KB), so whichever application is publishing messages needs to keep each message under 256 KB.
Please refer to the link below for an SQS consumer:
is putting sqs-consumer to detect receiveMessage event in sqs scalable
Once you have consumed the message from the SQS queue, you need to save it in DynamoDB, which you can easily do using a CRUD repository. With a repository you can save the JSON directly in a DynamoDB table, but be sure to configure the provisioned write capacity based on your request rate, because the higher the provisioned capacity, the higher the cost. Please refer to the link below for configuring the write capacity of a table.
Dynamodb reading and writing units
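For illustration, a minimal sketch of adjusting a table's provisioned write capacity with the AWS SDK for Java; the table name and the unit values here are placeholders, not recommendations:

```java
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughput;
import com.amazonaws.services.dynamodbv2.model.UpdateTableRequest;

public class ConfigureWriteCapacity {
    public static void main(String[] args) {
        AmazonDynamoDB dynamoDb = AmazonDynamoDBClientBuilder.defaultClient();

        // "messages" is a hypothetical table name; 5 read / 10 write units are
        // placeholders - size them to your actual request rate, since you pay
        // for whatever capacity you provision.
        dynamoDb.updateTable(new UpdateTableRequest()
                .withTableName("messages")
                .withProvisionedThroughput(new ProvisionedThroughput(5L, 10L)));
    }
}
```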
In general, you'll have a setup something like this:
The EC2 instances (one or more) will read your queue every few seconds to see if there is anything there. If so, they will write this data to DynamoDB.
Based on what you're saying you'll have less than 1,000,000 reads from SQS in a month so you can start out on the free tier for that. You can have a single EC2 instance initially and that can be a very small instance - a T2.micro should be more than sufficient. And you don't need more than a few writes per second on DynamoDB.
The advantage of SQS is that if for some reason your EC2 instance is temporarily unavailable the messages continue to queue up and you won't lose any of them.
From a coding perspective, you don't mention your development environment but there are AWS libraries available for a pretty wide variety of environments. I develop in Java and the code to do this would be maybe 100 lines. I would guess that other languages would be similar. Make sure you look at long polling in the language you're using - it can help to speed up the processing and save you money.
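To give a feel for it, a stripped-down version of that read-and-store loop in Java (SDK v1) might look roughly like the sketch below; the queue URL and table name are placeholders, and real code would add error handling. It long-polls SQS, writes each JSON body to DynamoDB via the document API, and only then deletes the message.

```java
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

public class QueueToDynamo {
    public static void main(String[] args) {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        Table table = new DynamoDB(AmazonDynamoDBClientBuilder.defaultClient())
                .getTable("messages");  // placeholder table name
        String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"; // placeholder

        while (true) {
            // Long polling: wait up to 20 seconds for messages instead of
            // hammering the queue with empty receives.
            ReceiveMessageRequest request = new ReceiveMessageRequest(queueUrl)
                    .withWaitTimeSeconds(20)
                    .withMaxNumberOfMessages(10);

            for (Message message : sqs.receiveMessage(request).getMessages()) {
                // The body is already JSON, so it can be stored as-is.
                // This assumes the JSON contains the table's key attribute(s).
                table.putItem(Item.fromJSON(message.getBody()));

                // Only delete after a successful write, so a failure leaves
                // the message on the queue to be retried.
                sqs.deleteMessage(queueUrl, message.getReceiptHandle());
            }
        }
    }
}
```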
Generic information: I am designing a solution for an IoT problem in which data is continuously streaming from a PLC (programmable logic controller). The PLC has different tags; these tags are a representation of telemetry data, and data streams continuously from them. Each device also has alarm tags whose value is 0 or 1, where 1 means there is an equipment failure.
Problem statement: I have to read the alarm tags and raise a ticket if any alarm tag value is 1. I also have to stream these alerts to a dashboard and maintain the ticket history, so the operator can update the ticket status too.
My solution: I am using AWS IoT and getting the data into DynamoDB. I then use a DynamoDB stream to check whether a new item has been added to the alarm table; if so, it triggers a Lambda function (which I have implemented in Java), and the Lambda function opens a new ticket in a relational database using Hibernate.
Problem with my approach: the AWS IoT data streams into the alarm table at a very fast rate, and this opens a lot of connections before they can be closed, which is taking my relational database down.
Please let me know if there is a better design approach I can adopt.
Use Amazon Kinesis Analytics to process streaming data. DynamoDB isn't suitable for this.
Read more here
The image below will give you an idea of the same.
Just a proposal....
From the Lambda, do not contact RDS.
Rather, push all alarms into AWS SQS.
Then you can have another Lambda, scheduled every minute using AWS CloudWatch Rules, that picks up all items from AWS SQS and inserts them into RDS at once (see the sketch below).
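A rough sketch of that scheduled drain Lambda in Java, assuming a MySQL-compatible RDS instance, a made-up ticket table, and connection details passed as environment variables:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.ScheduledEvent;
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

// Hypothetical scheduled Lambda: drains the alarm queue once a minute and
// writes all alarms to RDS over a single connection.
public class AlarmDrainHandler implements RequestHandler<ScheduledEvent, Void> {

    private static final String QUEUE_URL =
            "https://sqs.us-east-1.amazonaws.com/123456789012/alarms"; // placeholder
    private static final AmazonSQS SQS = AmazonSQSClientBuilder.defaultClient();

    @Override
    public Void handleRequest(ScheduledEvent event, Context context) {
        try (Connection conn = DriverManager.getConnection(
                System.getenv("RDS_JDBC_URL"),          // e.g. jdbc:mysql://host/tickets
                System.getenv("RDS_USER"),
                System.getenv("RDS_PASSWORD"));
             PreparedStatement insert = conn.prepareStatement(
                "INSERT INTO ticket (alarm_payload) VALUES (?)")) { // hypothetical schema

            while (true) {
                List<Message> messages = SQS.receiveMessage(
                        new ReceiveMessageRequest(QUEUE_URL).withMaxNumberOfMessages(10))
                        .getMessages();
                if (messages.isEmpty()) {
                    break;                                // queue drained for this run
                }
                for (Message message : messages) {
                    insert.setString(1, message.getBody());
                    insert.addBatch();
                }
                insert.executeBatch();                    // one round trip for the batch
                for (Message message : messages) {
                    SQS.deleteMessage(QUEUE_URL, message.getReceiptHandle());
                }
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return null;
    }
}
```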
I agree with raevilman's design of not letting Lambda contact RDS directly.
Creating a new ticket is not the only task your Lambda function is doing; you are also streaming these alerts to a dashboard. Depending on the streaming rate and the RDS limitations, you may want to split these tasks across multiple queues.
Generic solution: I'd suggest pushing the alarm to a fanout exchange, and this exchange will in turn push the alarm to one or more queues as required. You can then batch the alarms and perform multiple writes together without going through the connect/disconnect cycle multiple times.
AWS-specific solution: I haven't used SQS, so I can't really comment on its architecture. Alternatively, you can create an SNS topic and publish these alarms to it. You can then have SQS queues as subscribers to this topic, which in turn will be used for ticketing and dashboard purposes independently of each other.
Here again, from the ticketing queue you can poll messages in batch using Lambda or your own scheduler and process tickets (with the frequency depending on how time-critical the alarms are).
You may want to read this tutorial to get some pointers.
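As a sketch of that AWS-specific fanout (topic and queue names are made up), the SDK's Topics helper can wire the SQS subscriptions up for you, and a single publish then reaches both consumers:

```java
import com.amazonaws.services.sns.AmazonSNS;
import com.amazonaws.services.sns.AmazonSNSClientBuilder;
import com.amazonaws.services.sns.util.Topics;
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;

public class AlarmFanout {
    public static void main(String[] args) {
        AmazonSNS sns = AmazonSNSClientBuilder.defaultClient();
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();

        // One topic for alarms, two independent consumer queues.
        String topicArn = sns.createTopic("alarms").getTopicArn();
        String ticketingQueueUrl = sqs.createQueue("alarm-ticketing").getQueueUrl();
        String dashboardQueueUrl = sqs.createQueue("alarm-dashboard").getQueueUrl();

        // Topics.subscribeQueue also sets the queue policy so SNS may deliver to it.
        Topics.subscribeQueue(sns, sqs, topicArn, ticketingQueueUrl);
        Topics.subscribeQueue(sns, sqs, topicArn, dashboardQueueUrl);

        // Publishing one alarm now fans out to both queues.
        sns.publish(topicArn, "{\"deviceId\":\"plc-17\",\"alarm\":1}"); // sample payload
    }
}
```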
You can control the Lambda function's concurrency. This will reduce the number of Lambda instances that get spun up by the DynamoDB events, thereby reducing the connections to RDS.
https://aws.amazon.com/blogs/compute/managing-aws-lambda-function-concurrency/
Of course, this will throttle the DynamoDB events.
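For reference, setting a reserved concurrency limit programmatically looks roughly like this (the function name is a placeholder, and the limit of 5 is just an example; the same setting is available in the console and CLI):

```java
import com.amazonaws.services.lambda.AWSLambda;
import com.amazonaws.services.lambda.AWSLambdaClientBuilder;
import com.amazonaws.services.lambda.model.PutFunctionConcurrencyRequest;

public class LimitLambdaConcurrency {
    public static void main(String[] args) {
        AWSLambda lambda = AWSLambdaClientBuilder.defaultClient();

        // Cap the stream-triggered function (name is a placeholder) at 5
        // concurrent executions, so at most 5 RDS connections are opened at
        // any one time; additional DynamoDB stream events are throttled.
        lambda.putFunctionConcurrency(new PutFunctionConcurrencyRequest()
                .withFunctionName("open-ticket")
                .withReservedConcurrentExecutions(5));
    }
}
```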
I've recently signed up to AWS to test out their IoT platform and after setting up a few Things and going through the documentation I still seem to be missing a crucial bit of information - how to wrangle all the information from my Things?
For example if I were to build a web-based application to display the health/status of all the Things and possibly also interact with a specific Thing, what would be the way to go about it?
Do I register a "dummy" thing that also uses the device SDK to pub/sub to the topics?
Do I take whatever data the Things publish and route it to a shared DB for further processing?
Do I create Lambdas that the Things invoke?
Do I create a stand-alone application that uses the general AWS SDK to connect itself to the IoT platform?
To me the last idea sounds the most viable and "preferred" as I would need two-way interaction, not just passive listening to changes in Things, is that correct?
Generally speaking your setup might be:
IoT device publishes to AWS SQS
Some Service (application or lambda) reads from SQS and processes data (e.g. saves it to DynamoDB)
And then to display data
Stand alone application reads from DynamoDB and makes data available to users
There are lots of permutations of this. For example your IoT device can write directly to DynamoDB, then you can process the data from there. I would suggest a better pattern is to write to SQS, as you will have a clean separation between data publishing, processing and storage.
In the first instance I would probably write one application that reads from the SQS, processes the data, stores it in DynamoDB and then provides access to that data for users. A better solution longer term is to have separate systems to process/store the data, and to present that data to users.
Lambda is popular for processing the device data, as it's cost-effective (it runs only when needed) and scales well. Your data presentation application is probably a traditional web app running on something like Elastic Beanstalk.
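On the presentation side, reading a device's recent data back out of DynamoDB might look something like this sketch (the table name, key attributes, and device id are assumptions):

```java
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.dynamodbv2.document.spec.QuerySpec;
import com.amazonaws.services.dynamodbv2.document.utils.ValueMap;

public class DeviceHistoryQuery {
    public static void main(String[] args) {
        // Hypothetical table with deviceId (hash key) and ts (range key, epoch millis).
        Table readings = new DynamoDB(AmazonDynamoDBClientBuilder.defaultClient())
                .getTable("device-readings");

        // Fetch the last hour of readings for one device for display.
        long now = System.currentTimeMillis();
        QuerySpec lastHour = new QuerySpec()
                .withKeyConditionExpression("deviceId = :d and ts between :from and :to")
                .withValueMap(new ValueMap()
                        .withString(":d", "thing-42")
                        .withNumber(":from", now - 3_600_000)
                        .withNumber(":to", now));

        for (Item reading : readings.query(lastHour)) {
            System.out.println(reading.toJSONPretty());
        }
    }
}
```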
Our .net core web app currently accepts websocket connections and pushes out data to clients on certain events (edit, delete, create of some of our entities).
We would like to load balance this application now but foresee a problem in how we handle the socket connections. Basically, if I understand correctly, only the node that handles a specific event will push data out to its clients and none of the clients connected to the other nodes will get the update.
What is a generally accepted way of handling this problem? The best way I can think of is to also send that same event to all nodes in a cluster so that they can also update their clients. Is this possible? How would I know about the other nodes in the cluster?
This will be hosted in AWS.
You need to distribute the event to all nodes in the cluster, so that they can each push the update out to their websocket clients. A common way to do this on AWS is to use SNS to distribute the event to all nodes. You could also use ElastiCache Redis Pub/Sub for this.
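A minimal sketch of the Redis pub/sub shape (shown here with the Java Jedis client purely to illustrate the pattern; your .NET app would use an equivalent Redis client, but the shape is the same): every node subscribes to a shared channel, and whichever node handles an event publishes it so all nodes can push to their own websocket clients.

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

public class EntityEventBus {

    private static final String CHANNEL = "entity-events"; // hypothetical channel name

    // Called by whichever node handled the edit/delete/create.
    public static void publish(String redisHost, String eventJson) {
        try (Jedis jedis = new Jedis(redisHost, 6379)) {
            jedis.publish(CHANNEL, eventJson);
        }
    }

    // Run on every node, typically on a background thread: each node receives
    // every event and forwards it to its own websocket clients.
    public static void subscribe(String redisHost) {
        try (Jedis jedis = new Jedis(redisHost, 6379)) {
            jedis.subscribe(new JedisPubSub() {          // blocks while subscribed
                @Override
                public void onMessage(String channel, String message) {
                    // pushToLocalWebsocketClients(message); // your app-specific code
                }
            }, CHANNEL);
        }
    }
}
```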
As an alternative to SNS or Redis, you could use a Kinesis Stream. But before going to that link, read about Apache Kafka, because the AWS docs don't do a good job of explaining why you'd use Kinesis for anything other than log ingest.
To summarize: Kinesis is a "persistent transaction log": everything that you write to it is stored for some amount of time (by default a day, but you can pay for up to 7 days) and is readable by any number of consumers.
In your use case, each worker process would start reading at the then-current end of the stream, and continue reading (and distributing events) until shut down.
The main issue that I have with Kinesis is that there's no "long poll" mechanism like there is with SQS. A given read request may or may not return data. What it does tell you is whether you're currently at the end of the stream; if not, you have to keep reading until you are. And, of course, Amazon will throttle you if you read too fast. As a result, your code tends to have sleeps.
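A bare-bones version of that read loop (the stream name and shard id are placeholders, and a real consumer would handle every shard, not just one):

```java
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.GetRecordsRequest;
import com.amazonaws.services.kinesis.model.GetRecordsResult;
import com.amazonaws.services.kinesis.model.GetShardIteratorRequest;
import com.amazonaws.services.kinesis.model.Record;
import com.amazonaws.services.kinesis.model.ShardIteratorType;

public class StreamTail {
    public static void main(String[] args) throws InterruptedException {
        AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();

        // Start at the current end of the stream (names are placeholders).
        String shardIterator = kinesis.getShardIterator(new GetShardIteratorRequest()
                .withStreamName("entity-events")
                .withShardId("shardId-000000000000")
                .withShardIteratorType(ShardIteratorType.LATEST))
                .getShardIterator();

        while (true) {
            GetRecordsResult result = kinesis.getRecords(
                    new GetRecordsRequest().withShardIterator(shardIterator).withLimit(100));

            for (Record record : result.getRecords()) {
                // distributeToWebsocketClients(record.getData()); // app-specific
            }
            shardIterator = result.getNextShardIterator();

            // No long poll: an empty result is normal, so sleep between reads
            // to avoid being throttled by Kinesis.
            if (result.getRecords().isEmpty()) {
                Thread.sleep(1000);
            }
        }
    }
}
```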
We need to get data from thousands of IoT devices (temperature, pressure, RPM, etc., 50+ parameters in total) and show it on a dashboard without much processing (just checking whether the numbers are in range and raising an alarm otherwise), but in real time.
I have reviewed and tested many AWS blog resources like the Kinesis Storm ClickStream App;
however, I think using Storm is overkill for such a simple task. All I want to do is save the data in a DB and show graphs (30 minutes, 1 hour, or a custom date range). This is what I have figured out so far:
Device -> AWS IoT (MQTT) -> Kinesis -> x -> DynamoDB -> Presenter web app (Laravel)
I might have to use Node.js and Redis Pub/Sub, as mentioned in the ClickStream example, for real-time updates to graphs and alerts.
I don't want to use Apache Storm because it's in Java and has a learning curve (and I couldn't find any good resources). I know I can use Lambda, but I'm not sure how it will scale.
Any thoughts on a solution?
AWS doesn't have a KCL for PHP; are there alternatives or solutions? I am familiar with PHP but not with Java.
Apache Storm is a distributed event-processing framework. In your use case, you do not seem to perform any computation on the events. Basically, your application is doing three tasks:
Ingest data into the system.
Read the data from period X to Y.
Draw graphs on a web frontend.
The ingestion part is taken care of by AWS IoT. The first step is to create an SNS topic and publish all IoT data to SNS topics. Here you get the flexibility to create one topic per data type (e.g. temperature, pressure) and attach consumer SQS queues to the topics to accumulate messages in. For a persistent DB, one consumer can be a DynamoDB table; another consumer can be a Lambda function that performs some kind of filtering and data transformation and updates your cache. If you need to perform OLAP/analytical queries on the data, then consider using Redshift as one of the consumers. You will have to get into the specific requirements to finalize your design.
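As a rough sketch of the ingestion step (the rule name, MQTT topic, and ARNs are all made up), an IoT topic rule can route one data type into its own SNS topic, which your SQS queues, Lambda, or Redshift loader then subscribe to:

```java
import com.amazonaws.services.iot.AWSIot;
import com.amazonaws.services.iot.AWSIotClientBuilder;
import com.amazonaws.services.iot.model.Action;
import com.amazonaws.services.iot.model.CreateTopicRuleRequest;
import com.amazonaws.services.iot.model.SnsAction;
import com.amazonaws.services.iot.model.TopicRulePayload;

public class TemperatureRouting {
    public static void main(String[] args) {
        AWSIot iot = AWSIotClientBuilder.defaultClient();

        // Route every message published on plc/<device>/temperature into the
        // "temperature" SNS topic; downstream consumers (SQS, Lambda, Redshift
        // loader) subscribe to that topic. ARNs and MQTT topic are made up.
        iot.createTopicRule(new CreateTopicRuleRequest()
                .withRuleName("temperature_to_sns")
                .withTopicRulePayload(new TopicRulePayload()
                        .withSql("SELECT * FROM 'plc/+/temperature'")
                        .withActions(new Action().withSns(new SnsAction()
                                .withTargetArn("arn:aws:sns:us-east-1:123456789012:temperature")
                                .withRoleArn("arn:aws:iam::123456789012:role/iot-sns-publish")))));
    }
}
```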
Have you considered routing your data to AWS IoT Analytics after receiving the MQTT message in IoT Core? This way you could get rid of all the infrastructure heavy lifting with Kinesis, DynamoDB, and your presentation layer.
AWS IoT Analytics provides the ingestion, data preparation, and querying capabilities. Once you have the data stored in the processed data store, you can visualize it with Amazon QuickSight.
I'm basically just looking for a starting point here. I have an app which needs to include the ability to update certain data in real time. For instance, the user has the ability to specify that she wants X to happen exactly 24 hours from the current time. I want to implement a framework for updating this end-user, and any other relevant end-users, after 24 hours that the event has occurred. Can anyone provide me with a high-level explanation of which AWS services to use and how to implement them in order to achieve this sort of framework? I think it includes some combination of SNS and SQS, but I'm not sure if these are relevant since I don't need to send a message or notification, rather more of an update that some sort of data has changed. If it's relevant, I'm currently using RDS with a MySQL database and Cognito for establishing user identities. Thanks!
I think it's most likely a combination of SNS and an EC2 instance, plus your existing database (and optionally SQS).
SNS can take care of the 'push' notification to a mobile device, but you can't schedule things to happen in the future (except for a few minutes).
Off the top of my head I would say the database keeps a list of what needs to be pushed, when it needs to be pushed and to whom.
The EC2 instance has a cron job of some sort that polls on some interval, running queries against your database to find 'things that need to be pushed now'.
If something needs to get pushed, the cron job uses SNS to do the push - that could either just be a message (hey, you need to get new data), or, if the data is small enough, you could send the data within the message itself.
If you wanted to add a bit of scaling capability, the cron job that finds items to be pushed could, instead of sending out the SNS notifications itself, add a message to an SQS queue (i.e. work to be done), and you could use as many EC2 instances as you needed querying the SQS queue and then sending out the SNS notifications in a parallel fashion.
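A bare-bones version of what that cron job might run (the scheduled_push table, its columns, and the environment variables are assumptions):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

import com.amazonaws.services.sns.AmazonSNS;
import com.amazonaws.services.sns.AmazonSNSClientBuilder;
import com.amazonaws.services.sns.model.PublishRequest;

// Hypothetical cron task: assumes a scheduled_push table with
// (id, endpoint_arn, payload, push_at, sent) columns in the existing MySQL DB.
public class PushDueUpdates {
    public static void main(String[] args) throws Exception {
        AmazonSNS sns = AmazonSNSClientBuilder.defaultClient();

        try (Connection conn = DriverManager.getConnection(
                System.getenv("DB_URL"), System.getenv("DB_USER"), System.getenv("DB_PASSWORD"));
             PreparedStatement due = conn.prepareStatement(
                "SELECT id, endpoint_arn, payload FROM scheduled_push " +
                "WHERE sent = 0 AND push_at <= NOW()");
             PreparedStatement markSent = conn.prepareStatement(
                "UPDATE scheduled_push SET sent = 1 WHERE id = ?")) {

            try (ResultSet rs = due.executeQuery()) {
                while (rs.next()) {
                    // Publish to the user's SNS platform endpoint; the payload
                    // could just be a "fetch new data" hint, or the data itself
                    // if it is small enough.
                    sns.publish(new PublishRequest()
                            .withTargetArn(rs.getString("endpoint_arn"))
                            .withMessage(rs.getString("payload")));

                    markSent.setLong(1, rs.getLong("id"));
                    markSent.executeUpdate();
                }
            }
        }
    }
}
```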