Generic information: I am designing a solution for an IoT problem in which data is continuously streamed from a PLC (programmable logic controller). A PLC has different tags, which are representations of telemetry data, and data streams continuously from these tags. Each device also has alarm tags whose value is 0 or 1; 1 means there is an equipment failure.
Problem statement: I have to read the alarm tags and raise a ticket whenever any alarm tag value is 1. I also have to stream these alerts to a dashboard and maintain the ticket history, so that an operator can update the ticket status.
My solution: I am using AWS IoT and landing the data in DynamoDB. I then use a DynamoDB stream to check whether a new item has been added to the alarm table; if so, it triggers a Lambda function (implemented in Java) that opens a new ticket in a relational database using Hibernate.
Problem with my approach: the AWS IoT data streams into the alarm table at a very fast rate, so a lot of database connections are opened before they can be closed, and that is taking my relational database down.
Please let me know whether there is a better design approach I could adopt.
Use Amazon Kinesis Analytics to process the streaming data. DynamoDB isn't suitable for this.
Just a proposal....
From the Lambda, do not contact RDS directly.
Instead, push all alarms to an AWS SQS queue.
Then have another Lambda, scheduled every minute using an AWS CloudWatch Events rule, pick up all items from the SQS queue and insert them into RDS in a single batch.
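To illustrate the idea, here is a rough sketch of that scheduled Lambda, assuming the AWS SDK for Java v2, plain JDBC, and hypothetical names for the queue URL and ticket table; it drains up to ten messages per receive call and writes them to RDS over a single connection using a batched insert:

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.DeleteMessageRequest;
import software.amazon.awssdk.services.sqs.model.Message;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;

public class AlarmBatchWriter implements RequestHandler<Object, Void> {

    // Hypothetical configuration, injected via environment variables.
    private static final String QUEUE_URL = System.getenv("ALARM_QUEUE_URL");
    private static final String JDBC_URL = System.getenv("JDBC_URL");

    private final SqsClient sqs = SqsClient.create();

    @Override
    public Void handleRequest(Object scheduledEvent, Context context) {
        try (Connection conn = DriverManager.getConnection(JDBC_URL);
             PreparedStatement insert = conn.prepareStatement(
                     "INSERT INTO ticket (message_id, alarm_payload) VALUES (?, ?)")) {

            List<Message> messages = sqs.receiveMessage(ReceiveMessageRequest.builder()
                    .queueUrl(QUEUE_URL)
                    .maxNumberOfMessages(10)
                    .waitTimeSeconds(5)
                    .build()).messages();

            for (Message m : messages) {
                insert.setString(1, m.messageId());
                insert.setString(2, m.body());
                insert.addBatch();
            }
            insert.executeBatch(); // one connection, one round trip for the whole batch

            // Delete the messages only after the batch insert has succeeded.
            for (Message m : messages) {
                sqs.deleteMessage(DeleteMessageRequest.builder()
                        .queueUrl(QUEUE_URL)
                        .receiptHandle(m.receiptHandle())
                        .build());
            }
        } catch (Exception e) {
            throw new RuntimeException("Failed to drain the alarm queue", e);
        }
        return null;
    }
}

With this in place, the database sees one short-lived connection per scheduled run instead of one connection per alarm.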
I agree with raevilman's design of not letting Lambda contact RDS directly.
Creating a new ticket is not the only task your Lambda function performs; you are also streaming these alerts to a dashboard. Depending on the streaming rate and the RDS limitations, you may want to split these tasks across multiple queues.
Generic solution: I'd suggest pushing the alarm to a fanout exchange, which in turn pushes the alarm to one or more queues as required. You can then batch the alarms and perform multiple writes together without going through a connect/disconnect cycle each time.
AWS-specific solution: I haven't used SQS, so I can't really comment on its architecture. Alternatively, you can create an SNS topic and publish these alarms to it. You can then have SQS queues subscribe to this topic and use them for the ticketing and dashboard purposes, independently of each other.
Here again, from the ticketing queue you can poll messages in batches using Lambda or your own scheduler and process the tickets (with a frequency that depends on how time-critical the alarms are).
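As an illustration only (the topic ARN below is hypothetical, and the ticketing and dashboard SQS queues are assumed to be already subscribed to the topic), the fan-out publish step could look roughly like this in Java:

import software.amazon.awssdk.services.sns.SnsClient;
import software.amazon.awssdk.services.sns.model.PublishRequest;

public class AlarmPublisher {

    // Hypothetical topic ARN; ticketing and dashboard queues subscribe to it.
    private static final String ALARM_TOPIC_ARN =
            "arn:aws:sns:us-east-1:123456789012:equipment-alarms";

    private final SnsClient sns = SnsClient.create();

    public void publishAlarm(String alarmJson) {
        // One publish call; SNS delivers a copy to every subscribed queue,
        // so ticketing and the dashboard are decoupled from each other.
        sns.publish(PublishRequest.builder()
                .topicArn(ALARM_TOPIC_ARN)
                .message(alarmJson)
                .build());
    }
}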
You can control the Lambda function concurrency. This will reduce the number of Lambda instances that get spun up in response to the DynamoDB events, thereby reducing the number of connections to RDS.
https://aws.amazon.com/blogs/compute/managing-aws-lambda-function-concurrency/
Of course, this will throttle the DynamoDB events.
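For illustration, reserved concurrency can be set from the console, the CLI, or the SDK; a minimal Java sketch (the function name and the limit of 5 are hypothetical) would be:

import software.amazon.awssdk.services.lambda.LambdaClient;
import software.amazon.awssdk.services.lambda.model.PutFunctionConcurrencyRequest;

public class LimitTicketWriterConcurrency {
    public static void main(String[] args) {
        LambdaClient lambda = LambdaClient.create();

        // Allow at most 5 concurrent executions, which also bounds open RDS connections.
        lambda.putFunctionConcurrency(PutFunctionConcurrencyRequest.builder()
                .functionName("ticket-writer")
                .reservedConcurrentExecutions(5)
                .build());
    }
}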
Is this possible?
I did my research, but these are the only events available for RDS:
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_Events.Messages.html
They are mostly maintenance-type events, but what I want is this: let's say I have an RDS Oracle table called Users. Whenever a record is inserted into the table, an event or stream can be picked up by a Lambda, which then takes the necessary action.
In short, no, not with the existing events you refer to. These are for monitoring the RDS service itself, not what you actually use it for, i.e. auditing the contents (manipulation/tracking).
You can of course create notifications when an insert occurs, but you'll probably need to build/set up a few things.
A couple of ideas:
Build something closer to the database logic, i.e. in your code base add something that fires an SQS/SNS event.
If you can't (or don't want to) modify the logic that handles the database, maybe you could add a trigger that fires on INSERTs to the Users table. Unfortunately, I don't think there's support for executing a Lambda from an Oracle trigger (as it is possible to do with PostgreSQL at the moment).
Set up a database activity stream from RDS to Kinesis to monitor the INSERTs. This is a bit of additional infrastructure to set up, so it might be too much depending on your use case:
"Database Activity Streams is an Amazon RDS feature that provides a near real-time stream of the activity in your Oracle DB instance. Amazon RDS pushes activities to an Amazon Kinesis data stream."
From Kinesis, you can configure AWS Lambda to consume the stream and take action on INSERT events.
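A hedged sketch of that consumer is below; it assumes the aws-lambda-java-events library and omits the decryption step that real Database Activity Streams records require (they are encrypted with KMS), so treat it as an outline only:

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.KinesisEvent;

import java.nio.charset.StandardCharsets;

public class UserInsertListener implements RequestHandler<KinesisEvent, Void> {

    @Override
    public Void handleRequest(KinesisEvent event, Context context) {
        for (KinesisEvent.KinesisEventRecord record : event.getRecords()) {
            // In reality this payload is an encrypted activity-stream event;
            // decrypt and parse it before inspecting the statement.
            String payload = StandardCharsets.UTF_8
                    .decode(record.getKinesis().getData())
                    .toString();

            if (payload.contains("INSERT") && payload.contains("USERS")) {
                context.getLogger().log("New user row detected: " + payload);
                // ... take the necessary action here (SNS notification, etc.)
            }
        }
        return null;
    }
}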
Some references:
https://docs.aws.amazon.com/lambda/latest/dg/with-kinesis-example.html
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/DBActivityStreams.Enabling.html
I'm trying to design a solution for error handling. We have a Lambda that receives data from an SNS topic and sends it to a legacy service that has been known to be unavailable at times.
When the legacy service is down, I want to send the messages to a DynamoDB table so they can be replayed.
I want to use a circuit breaker pattern, so at the minute I am thinking of spinning up a service that will constantly poll the legacy service, together with some pseudocode that looks like this:
if (legacyServiceChangedFromDeadToAlive()) {
    // Send all entries from DynamoDB to the SNS topic.
    // This will trigger the Lambda again, which will hit the legacy service that we know is now up.
    sendAllEntriesFromDynamoToSnsTopic();
}
The thing is, we like using serverless technologies, and I'm not sure I can have a serverless service that constantly polls; it makes more sense for that to run on a server.
I am looking for a nice way to do this, so I am wondering: is it possible to configure DynamoDB to poll the legacy service and, when it changes from dead to alive, populate the SNS topic? Or is there any other solution using serverless technologies?
P.S. I don't like the idea of running a Lambda at intervals to check the DB, as we could miss some downtime; also, reading data from the DB and sending it to SNS could be a lengthy operation.
Update: I've been reading into the circuit breaker pattern more and realise I don't need to constantly poll; I can just check the number of failed calls in the last XX seconds in my DynamoDB table. So a new question has arisen: can I send messages from DynamoDB to SNS depending on a condition on one of its entries? E.g. when FailsInLastMinute changes from 3 to below 3, we send all the messages from a column in DynamoDB to SNS, or do I need a service for this part?
I don't think DynamoDB can do this; it's a database, after all, not an integration platform.
That said, a possible solution would be to use DynamoDB as a queue between SNS and the legacy app, using DynamoDB Streams. Any message from SNS gets inserted into DynamoDB by a Lambda. The DynamoDB stream then triggers another Lambda that sends the message to the legacy app.
If the legacy app is down, the Lambda function fails because it cannot connect. DynamoDB will then retry the Lambda until it succeeds.
Note that you are probably better off using an SQS queue with FIFO enabled. This will do the same thing but without the overhead of DynamoDB.
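For what it's worth, the stream-triggered forwarder could look roughly like the sketch below (Java 11 HttpClient, hypothetical legacy endpoint and item attribute name); the key point is that throwing on failure makes the stream batch get retried:

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class LegacyForwarder implements RequestHandler<DynamodbEvent, Void> {

    // Hypothetical endpoint of the legacy service.
    private static final URI LEGACY_ENDPOINT = URI.create("https://legacy.example.com/messages");
    private final HttpClient http = HttpClient.newHttpClient();

    @Override
    public Void handleRequest(DynamodbEvent event, Context context) {
        for (DynamodbEvent.DynamodbStreamRecord record : event.getRecords()) {
            if (!"INSERT".equals(record.getEventName())) {
                continue; // only forward newly queued messages
            }
            // "payload" is a hypothetical attribute holding the original SNS message body.
            String body = record.getDynamodb().getNewImage().get("payload").getS();

            try {
                HttpResponse<String> response = http.send(
                        HttpRequest.newBuilder(LEGACY_ENDPOINT)
                                .POST(HttpRequest.BodyPublishers.ofString(body))
                                .build(),
                        HttpResponse.BodyHandlers.ofString());
                if (response.statusCode() >= 500) {
                    throw new RuntimeException("Legacy service unavailable");
                }
            } catch (Exception e) {
                // Throwing makes Lambda report a failure, so the stream records
                // are retried until the legacy service comes back up.
                throw new RuntimeException(e);
            }
        }
        return null;
    }
}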
We are contemplating using Amazon Web Services for our project, wherein the upstream flow will push messages into Kinesis and those messages will later be fed into Lambdas; before processing, the messages are in order. As per my understanding, AWS Lambda scales out horizontally based on the volume of messages. We have a volume of 400 messages per second, which means AWS Lambda will respond to the message volume and instantiate new processes in separate containers to exploit parallelism, and to achieve parallelism, ordering has to be compromised. So if 10 ordered messages hit the Lambda functions and one function takes more time than another, AWS will provision a new function in some container to serve the request.
Is the final output going to be in order after all of this processing?
Any help is appreciated.
Thanks.
If you are using Amazon Kinesis Data Firehose, you can use a Data Transformation to trigger an AWS Lambda function on each incoming record.
This allows the record to be transformed or deleted before continuing through the Firehose. Thus, records can be processed by Lambda while remaining in the same order. The final data can be delivered to Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, or Splunk.
If your application is consuming records from an Amazon Kinesis stream directly (instead of via Firehose), then records within each shard will be consumed in order by your application.
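As a rough illustration of such a transformation function (not taken from the question), a Java handler could follow the documented recordId/result/data response contract as sketched below; the transformation itself, appending a newline, is just a placeholder:

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.KinesisFirehoseEvent;

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FirehoseTransformHandler
        implements RequestHandler<KinesisFirehoseEvent, Map<String, Object>> {

    @Override
    public Map<String, Object> handleRequest(KinesisFirehoseEvent event, Context context) {
        List<Map<String, Object>> out = new ArrayList<>();

        for (KinesisFirehoseEvent.Record record : event.getRecords()) {
            // Decode the incoming record payload.
            String payload = StandardCharsets.UTF_8.decode(record.getData()).toString();

            // Placeholder transformation: make records line-delimited for the destination.
            String transformed = payload.trim() + "\n";

            Map<String, Object> result = new HashMap<>();
            result.put("recordId", record.getRecordId());
            result.put("result", "Ok"); // or "Dropped" / "ProcessingFailed"
            result.put("data", Base64.getEncoder()
                    .encodeToString(transformed.getBytes(StandardCharsets.UTF_8)));
            out.add(result);
        }

        Map<String, Object> response = new HashMap<>();
        response.put("records", out);
        return response;
    }
}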
My company has a messaging system that sends real-time messages in JSON format, and it is not built on AWS.
Our team is trying to use AWS SQS to receive these messages, and DynamoDB will then be used to store them.
I'm thinking of using EC2 to read the messages and then save them.
Is there any better solution? Or how should I do it? I don't have much experience with this.
First of all, EC2 is infrastructure in the cloud; it is similar to a physical machine with an OS in a local setup. If you want to create an application that fetches the data from Amazon SQS (messages in JSON format) and pushes it into DynamoDB (a NoSQL database), your design is correct, as both SQS and DynamoDB have thorough JSON support. Once your application is ready, you deploy it on an EC2 machine.
To achieve this, your application must have an async, buffered SQS consumer that consumes the messages. The SQS message size limit is 256 KB, so whichever application is publishing the messages needs to keep them under 256 KB.
Please refer to the link below for an SQS consumer:
is putting sqs-consumer to detect receiveMessage event in sqs scalable
Once you have consumed a message from the SQS queue, you need to save it in DynamoDB, which you can easily do using a CRUD repository. With a repository you can directly save the JSON in the DynamoDB table, but please be sure to configure the provisioned write capacity based on the request rate, because the higher the provisioned capacity, the higher the cost. Please refer to the link below for configuring the write capacity of a table.
Dynamodb reading and writing units
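For example, a minimal sketch of adjusting the provisioned write capacity with the AWS SDK for Java v2 (the table name and the capacity numbers are hypothetical):

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.ProvisionedThroughput;
import software.amazon.awssdk.services.dynamodb.model.UpdateTableRequest;

public class AdjustWriteCapacity {
    public static void main(String[] args) {
        DynamoDbClient dynamo = DynamoDbClient.create();

        // Raise the provisioned write capacity of the (hypothetical) "Messages" table.
        // Higher values cost more, so size this to the actual request rate.
        dynamo.updateTable(UpdateTableRequest.builder()
                .tableName("Messages")
                .provisionedThroughput(ProvisionedThroughput.builder()
                        .readCapacityUnits(5L)
                        .writeCapacityUnits(25L)
                        .build())
                .build());
    }
}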
In general, you'll have a setup something like this:
The EC2 instances (one or more) will read your queue every few seconds to see if there is anything there. If so, they will write this data to DynamoDB.
Based on what you're saying, you'll have fewer than 1,000,000 reads from SQS in a month, so you can start out on the free tier for that. You can have a single EC2 instance initially, and it can be a very small instance; a t2.micro should be more than sufficient. And you don't need more than a few writes per second on DynamoDB.
The advantage of SQS is that if for some reason your EC2 instance is temporarily unavailable the messages continue to queue up and you won't lose any of them.
From a coding perspective, you don't mention your development environment, but there are AWS libraries available for a pretty wide variety of environments. I develop in Java, and the code to do this would be maybe 100 lines; I would guess that other languages would be similar. Make sure you look at long polling in the language you're using; it can help speed up the processing and save you money.
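To give a feel for the size, here is a rough Java sketch of that loop with long polling enabled; the queue URL and table name are hypothetical and error handling is left out:

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.DeleteMessageRequest;
import software.amazon.awssdk.services.sqs.model.Message;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;

import java.util.HashMap;
import java.util.Map;

public class SqsToDynamoWorker {

    // Hypothetical queue URL and table name.
    private static final String QUEUE_URL =
            "https://sqs.us-east-1.amazonaws.com/123456789012/incoming-messages";
    private static final String TABLE_NAME = "Messages";

    public static void main(String[] args) {
        SqsClient sqs = SqsClient.create();
        DynamoDbClient dynamo = DynamoDbClient.create();

        while (true) {
            // Long polling: wait up to 20 seconds for messages instead of hammering the API.
            ReceiveMessageRequest receive = ReceiveMessageRequest.builder()
                    .queueUrl(QUEUE_URL)
                    .maxNumberOfMessages(10)
                    .waitTimeSeconds(20)
                    .build();

            for (Message message : sqs.receiveMessage(receive).messages()) {
                // Store the raw JSON body keyed by the SQS message id.
                Map<String, AttributeValue> item = new HashMap<>();
                item.put("MessageId", AttributeValue.builder().s(message.messageId()).build());
                item.put("Body", AttributeValue.builder().s(message.body()).build());

                dynamo.putItem(PutItemRequest.builder()
                        .tableName(TABLE_NAME)
                        .item(item)
                        .build());

                // Delete only after a successful write so the message is not lost.
                sqs.deleteMessage(DeleteMessageRequest.builder()
                        .queueUrl(QUEUE_URL)
                        .receiptHandle(message.receiptHandle())
                        .build());
            }
        }
    }
}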
I want to use an AWS Lambda function to fan out and insert activity stream info to a Firebase endpoint for every user.
Should I be using Kinesis, SQS, or SNS to trigger the Lambda function for this use case? The updates to the activity stream can be triggered from the server, and clients should receive the update in near real time (within 60 seconds or so).
I think I have a pretty good idea of what SQS is, and I have used Kinesis in the past, but I'm not quite sure about SNS.
If we created an SNS topic for each user and then each follower subscribed to these topics with an AWS Lambda function, would that work?
Does it make sense to programmatically create topics and subscriptions for every user and every follow relationship, respectively?
As usual, the answer to such a question is mostly 'it depends on your use case'.
Kinesis vs SQS:
If your clients care about relative (timestamp-based, for example) ordering between events, you'll almost certainly have to go with Kinesis. SQS is a best-effort FIFO queue, meaning events can arrive out of order, and it would be up to your client to manage the relative ordering.
As far as latencies are concerned, I have seen data ingested into Kinesis become visible to its consumer in as little as 300 ms.
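To make the ordering point concrete: with Kinesis, records that share a partition key go to the same shard and keep their relative order, so a producer sketch (the stream name, user id, and payload below are hypothetical) might look like this:

import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.kinesis.KinesisClient;
import software.amazon.awssdk.services.kinesis.model.PutRecordRequest;

public class ActivityPublisher {

    // Hypothetical stream name.
    private static final String STREAM_NAME = "activity-stream";

    public static void main(String[] args) {
        KinesisClient kinesis = KinesisClient.create();

        String userId = "user-42"; // hypothetical activity feed key
        String event = "{\"type\":\"post\",\"user\":\"user-42\"}";

        // Using the user id as the partition key sends all of this user's events
        // to the same shard, so their relative order is preserved for the consumer.
        kinesis.putRecord(PutRecordRequest.builder()
                .streamName(STREAM_NAME)
                .partitionKey(userId)
                .data(SdkBytes.fromUtf8String(event))
                .build());
    }
}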
When can SNS be interesting to you?
(Even with SNS, you'd have to use SQS.) If you use SNS, it will be easy to add a new application that can process your events. For example, if in the future you decide to ingest all events into, say, Elasticsearch to provide real-time analytics, all you'd have to do is add another SQS queue to your existing topic(s) and write a consumer.