Currently I'm working on a project where I have 7 identical sensors (temperature, light, movement) connected to AWS IoT. These sensors are placed in different rooms of the building and send an update every 5 minutes.
I need a secure way to store the data in AWS and make it available to an external customer.
To showcase this I want to create a dashboard showing a floor map of the building with the following features:
Get latest data of each sensor
Show the temperature on the correct place on the image
Update the data in realtime when an update of the sensor happened
Show a graph and all data of a selected sensor
I currently save the data of all the sensors in one DynamoDB table and created a working API like this:
API Gateway -> AWS Lambda -> DynamoDB <- AWS IoT <- Sensors
But the problems with this are that it does not update in real time, and that it is hard to get only the latest value out of DynamoDB. So I need a better way.
2 questions:
What is the best way to store the data in AWS for this purpose? And how do I create a user-friendly and secure API to request real-time and bulk data?
Does there exist a dashboard tool that can show sensor data on a static image?
The best way to store your data in AWS depends in large part on how you want to access the data. For the use case you have described, DynamoDB will work, along with a WebSocket connection to the AWS IoT MQTT message broker.
For DynamoDB I would consider creating a table with a partition key that is your sensor ID and a sort key that is the timestamp. Then you can easily query the table to get the newest records for each sensor: set the ScanIndexForward parameter to false to return the records in descending order.
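A minimal sketch of that query in Python. The table and attribute names (`SensorReadings`, `sensor_id`, `timestamp`) are assumptions for illustration; the table object is passed in so the query logic can be exercised without AWS credentials.

```python
def latest_reading(table, sensor_id, limit=1):
    """Query newest-first: sensor_id is the partition key, timestamp the sort key.

    `table` is a boto3 DynamoDB Table resource (or a stub for testing).
    """
    resp = table.query(
        KeyConditionExpression="sensor_id = :sid",
        ExpressionAttributeValues={":sid": sensor_id},
        ScanIndexForward=False,  # descending by sort key, i.e. newest first
        Limit=limit,
    )
    return resp.get("Items", [])

# Against AWS (not run here, requires boto3 and credentials):
# import boto3
# table = boto3.resource("dynamodb").Table("SensorReadings")  # hypothetical name
# print(latest_reading(table, "sensor-1"))
```

With `Limit=1` this returns exactly the latest item per sensor, which addresses the "hard to get only the latest value" problem from the question.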
For real-time messages you can connect to the AWS IoT MQTT message broker from your browser using MQTT over WebSockets. You can subscribe to the same topics that your sensors publish to and receive real-time updates.
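A sketch of the update handling such a subscription would feed, shown in Python for illustration. The topic layout (`building/<room>/<sensor>`) and payload fields are assumptions; a real browser client would use the AWS IoT device SDK with a SigV4-signed WebSocket URL.

```python
import json

# In-memory "latest reading per sensor" that a dashboard would render on the floor map.
latest = {}

def on_message(topic, payload):
    """Handle one incoming MQTT update, e.g. from topic 'building/room3/sensor7'.

    Topic layout and payload fields are assumptions for illustration.
    """
    sensor_id = topic.rsplit("/", 1)[-1]
    reading = json.loads(payload)
    latest[sensor_id] = reading
    return sensor_id, reading

# A real client would subscribe over MQTT/WebSockets, e.g. with paho-mqtt:
# client = mqtt.Client(transport="websockets")  # plus a SigV4-signed URL for AWS IoT
# client.on_message = lambda c, u, m: on_message(m.topic, m.payload)
```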
We are building a customer-facing app. For this app, data is captured by IoT devices owned by a third party and is transferred to us from their server via API calls. We store this data in our Amazon DocumentDB cluster. The user app is connected to this cluster with real-time data feed requirements. Note: the data is time-series data.
The thing is, for long-term data storage and for creating analytics dashboards to be shared with stakeholders, our data governance folks are asking us to replicate/copy the data daily from the Amazon DocumentDB cluster to their Google Cloud Platform -> BigQuery. Then we can run queries directly on BigQuery to perform analysis and send data to maybe Explorer or Tableau to create dashboards.
I couldn't find any straightforward solutions for this. Any ideas, comments or suggestions are welcome. How do I achieve or plan the above replication? And how do I make sure the data is copied efficiently, in terms of both memory and pricing? Also, I don't want to disturb the performance of Amazon DocumentDB, since it supports our user-facing app.
This solution needs some custom implementation. You can utilize change streams and process the data changes in intervals to send to BigQuery, giving you a data replication mechanism to run analytics on. One of the documented use cases for change streams is analytics with Redshift, and BigQuery should serve a similar purpose.
Using Change Streams with Amazon DocumentDB:
https://docs.aws.amazon.com/documentdb/latest/developerguide/change_streams.html
This document also contains sample Python code for consuming change stream events.
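The core of such a replicator is flattening each change event into a row for a BigQuery batch insert. A sketch, where the row layout and the `DOCDB_URI`/database/collection names are assumptions for illustration:

```python
def change_to_row(event):
    """Flatten a DocumentDB change stream event into a row for a BigQuery batch insert.

    The row layout here is an assumption; adapt it to your BigQuery schema.
    """
    return {
        "op": event["operationType"],               # insert / update / delete
        "doc_id": str(event["documentKey"]["_id"]),
        "doc": event.get("fullDocument"),
    }

# Consuming the stream with pymongo against DocumentDB (not run here;
# requires change streams to be enabled on the cluster):
# from pymongo import MongoClient
# coll = MongoClient(DOCDB_URI)["mydb"]["readings"]
# with coll.watch(full_document="updateLookup") as stream:
#     rows = [change_to_row(e) for e in stream]  # batch these into BigQuery
```

Running the consumer on a schedule (rather than continuously) and batch-loading into BigQuery keeps both the DocumentDB read load and the BigQuery ingestion cost low.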
Given that I have xK devices, each of which connects to AWS IoT and has attributes/shadow state to group them by.
device shadow example:
{
  "factory": "factoryA",
  "pipeInstalledVersion": "1.5.6"
}
What's the easiest way to monitor these devices on a grouped basis (based on the shadow state)?
E.g. I want to know how many devices are connected in factory A at 11:05, 15:30, 20:50. I also want to know what pipeInstalledVersion the devices have at a particular time interval (e.g. every 5 minutes). I also want to know e.g. if there are less than X devices connected at 09:00 at factory A, then send an alert.
AWS IoT has a monitoring dashboard for connected devices, but there is no way to group it by shadow state/attribute.
I've looked into AWS IoT Analytics, but it looked like there were some limitations:
- The recommended visualisation platform, QuickSight, has a limited auto-refresh period (1 hour, I believe), even though the underlying dataset can be refreshed every ~5 minutes.
- The dataset will only show data if an IoT device has transmitted data in that time. What if the IoT device is connected but does not transmit data in that time period? It will appear as if it is not connected.
Fleet indexing lets you run powerful searches over your fleet of devices using thing attributes and thing shadow state.
Combine this with AWS Lambda and you have scheduled searches over your fleet that can be paired with any number of AWS actions (e.g. recording a CloudWatch metric, scaling EC2, and so on).
Sample Fleet indexing queries:
connectivity.connected:true
returns all Things currently connected.
connectivity.connected:true AND shadow.reported.model:A
returns all Things currently connected and have a particular Shadow state.
aws iot get-cardinality --aggregation-field "connectivity.connected" --query-string "*"
returns the number of connected devices at a given time (an aggregation query).
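A sketch of the scheduled-Lambda check in Python. The shadow field name `factory` follows the question's example; the alert callable stands in for whatever action you wire up (CloudWatch, SNS, etc.), and the client is passed in so the logic can be exercised without AWS credentials.

```python
def connected_count(iot, query):
    """Run a fleet-indexing aggregation query.

    `iot` is a boto3 IoT client (or a stub); GetCardinality returns the
    number of things matching the query.
    """
    resp = iot.get_cardinality(queryString=query)
    return resp["cardinality"]

def check_factory(iot, factory, threshold, alert):
    """Alert when fewer than `threshold` devices are connected at `factory`."""
    n = connected_count(
        iot, f"connectivity.connected:true AND shadow.reported.factory:{factory}"
    )
    if n < threshold:
        alert(f"only {n} devices connected at {factory}")
    return n

# Against AWS (not run here, requires boto3, credentials and fleet indexing enabled):
# import boto3
# check_factory(boto3.client("iot"), "factoryA", 10, print)
```

Schedule the function with an EventBridge rule (e.g. every 5 minutes) to get the time-interval snapshots the question asks about.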
I would like to be notified when a DynamoDB table changes, the same way as with the Google Firebase Realtime Database.
I am consuming this service in a frontend JavaScript application.
DynamoDB doesn't have real-time notifications/triggers for updates on a table.
But in this case you can try using DynamoDB Streams to capture table activity.
Here are some example use cases:
- An application in one AWS region modifies the data in a DynamoDB table. A second application in another AWS region reads these data modifications and writes the data to another table, creating a replica that stays in sync with the original table.
- A popular mobile app modifies data in a DynamoDB table, at the rate of thousands of updates per second. Another application captures and stores data about these updates, providing near-real-time usage metrics for the mobile app.
- A global multi-player game has a multi-master topology, storing data in multiple AWS regions. Each master stays in sync by consuming and replaying the changes that occur in the remote regions.
- An application automatically sends notifications to the mobile devices of all friends in a group as soon as one friend uploads a new picture.
- A new customer adds data to a DynamoDB table. This event invokes another application that sends a welcome email to the new customer.
More details are in this DynamoDB Streams document.
And here is how you can integrate DynamoDB Streams with the AWS JavaScript SDK:
var AWS = require('aws-sdk');
var dynamodbstreams = new AWS.DynamoDBStreams();

// StreamArn (required) comes from the table description or the AWS console;
// the value here is a placeholder.
var params = { StreamArn: 'arn:aws:dynamodb:...' };

dynamodbstreams.describeStream(params, function (err, data) {
  if (err) console.log(err, err.stack); // an error occurred
  else console.log(data);               // successful response
});
DynamoDB Streams supports the following events:
eventName — (String) The type of data modification that was performed on the DynamoDB table:
- INSERT - a new item was added to the table.
- MODIFY - one or more of an existing item's attributes were modified.
- REMOVE - the item was deleted from the table.
By the way, if you want to notify your client in another way instead of via DynamoDB Streams, you can try using a Lambda function, following this article.
Hope this helps you solve your issue.
DynamoDB and Firebase/Firestore are really different.
Firebase/Firestore is a realtime database where you can subscribe to changes on the client.
DynamoDB is a NoSQL database for storing key/value pairs.
More suitable for a similar use case is "AWS AppSync" which provides live updates like Firebase/Firestore does.
If you want to use DynamoDB nonetheless, have a look at DynamoDB Streams to trigger an event on updates to the table.
The question is then how you get the update to the client.
You could send a message to an SNS topic, sending push notifications to the client if necessary.
But in the end, with DynamoDB Streams, SNS and maybe Lambda, you will be building what Firebase/Firestore or AWS AppSync provides out of the box.
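The Streams-to-SNS glue is a small Lambda function. A sketch, where the event shape is the standard DynamoDB Streams trigger format and `publish` stands in for e.g. `sns.publish` so the handler can be exercised without AWS credentials:

```python
def handler(event, context=None, publish=print):
    """Lambda handler for a DynamoDB Streams trigger.

    Forwards each change to a publish callable (in production, e.g. an
    SNS publish wrapped to take one dict argument).
    """
    out = []
    for record in event.get("Records", []):
        msg = {
            "event": record["eventName"],      # INSERT / MODIFY / REMOVE
            "keys": record["dynamodb"]["Keys"],
        }
        publish(msg)
        out.append(msg)
    return out
```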
I normally see the DynamoDB -> SNS topic pattern (with a custom Lambda).
If your application is for mobile, have you taken a look at AWS SNS Mobile Push to see whether it would be a better fit for your architecture?
My first question is, do I need SQS queues to receive my remote data, or can it go directly into an Amazon cloud storage solution like S3 or EC2?
Currently, my company uses a third-party vendor to gather and report on our remote data. By remote data, I mean data coming from our machines out in the wilderness. These data are uploaded a few times each day to Amazon Web Services SQS queues (set up by the third-party vendor), and then the third-party vendor polls the data from the queues, removing it and saving it in their own on-premises databases for one year only. This company only provides reporting services to us, so they don't need to store the data long-term.
Going forward, we want to own the data and store it permanently in Amazon Web Services (AWS). Then we want to use machine learning to monitor the data and report any potential problems with the machines.
To repeat my first question, do we need SQS queues to receive this data, or can it go directly into an Amazon cloud storage solution like S3 or EC2?
My second question is, can an SQS queue send data to two different places? That is, can the queue send the data to the third party vendor, and also to an Amazon Web Services database?
I am an analyst/data scientist, so I know how to use the data once it's in a database. I just don't know the best way of getting it into a database.
You don't really need to have a queue. Whenever you push an item into the queue, a function gets triggered, and you can perform your custom logic in it, whether you want to store the information in S3/EC2 or send it to any other HTTP service.
Your Lambda function can easily send the data on to any other third-party service.
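A sketch of such a queue-triggered function. The event shape is the standard SQS-to-Lambda format; the S3 key pattern is an assumption, and `store` stands in for e.g. `s3.put_object` so the handler can be exercised without AWS credentials:

```python
import json

def handler(event, context=None, store=lambda key, body: None):
    """Lambda handler for an SQS trigger.

    Parses each queue message and hands it to `store` (in production, e.g.
    an S3 put wrapped to take key and body). The key pattern is illustrative.
    """
    keys = []
    for record in event.get("Records", []):
        body = json.loads(record["body"])
        key = f"machine-data/{record['messageId']}.json"
        store(key, body)
        keys.append(key)
    return keys
```

An SQS queue can have only one effective consumer per message, so to feed both the vendor and your own storage you would typically fan out upstream (e.g. SNS with two subscriptions) rather than read the same queue twice.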
We need to get data from thousands of IoT devices (temperature, pressure, RPM, etc., 50+ parameters in total) and show it on a dashboard without much processing (just checking whether numbers are in range, otherwise raising an alarm), but in real time.
I have reviewed and tested many AWS blog resources like the Kinesis Storm ClickStream app;
however, I think using Storm is overkill for such a simple task. All I want to do is save the data in a DB and show graphs (30-minute, 1-hour, or custom date ranges). This is what I have figured out so far:
Device -> AWS IOT(mqtt) -> Kinesis -> x -> dynamoDB -> Presenter Web APP (Laravel)
I might have to use Node.js and Redis Pub/Sub, as mentioned in the ClickStream example, for real-time updates to graphs and alerts.
I don't want to use Apache Storm because it's in Java and has a learning curve (and I couldn't find any good resources). I know I can use Lambda, but I'm not sure how it will scale.
Any thoughts on a solution?
AWS doesn't have a KCL for PHP; any alternatives or solutions? I am familiar with PHP but not with Java.
Apache Storm is a distributed event processing framework. In your use case, you do not seem to perform any computation on the events. Basically, your application is doing three tasks:
Ingest data into the system.
Read the data from period X to Y.
Draw graphs on a web frontend.
The ingestion part is taken care of by AWS IoT. The first step you should take is to create an SNS topic and publish all IoT data to SNS topics. Here you get the flexibility to create one topic per data type (e.g. temperature, pressure) and attach consumer SQS queues to the topics to accumulate messages. For a persistent DB, one consumer can be a DynamoDB table; another consumer can be a Lambda function that performs some kind of filtering and data transformation and updates your cache. If you need to perform OLAP/analytical queries on the data, then consider using Redshift as one of the consumers. You will have to get into specific requirements to finalize your design.
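The one-topic-per-datatype routing and the range check from the question can be sketched as follows. The topic names and limits are assumptions for illustration; the actual publish would go through boto3.

```python
# Hypothetical mapping from data type to SNS topic name,
# following the one-topic-per-datatype idea above.
TOPIC_BY_TYPE = {
    "temperature": "iot-temperature",
    "pressure": "iot-pressure",
    "rpm": "iot-rpm",
}

def route(message):
    """Pick the SNS topic for an incoming IoT message; names are assumptions."""
    return TOPIC_BY_TYPE.get(message.get("type"), "iot-other")

def in_range(message, limits):
    """The 'raise an alarm if out of range' check the question describes."""
    lo, hi = limits[message["type"]]
    return lo <= message["value"] <= hi

# Publishing would then be e.g. (not run here, requires boto3 and credentials):
# import boto3, json
# sns = boto3.client("sns")
# sns.publish(TopicArn=arn_for(route(msg)), Message=json.dumps(msg))
```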
Have you considered routing your data to AWS IoT Analytics after receiving the MQTT message in IoT Core? This way you could get rid of all the infrastructure heavy lifting with Kinesis, DynamoDB and your presentation layer.
AWS IoT Analytics provides ingestion, data preparation and querying capabilities. Once you have the data stored in the processed datastore, you can visualize it with AWS QuickSight.