MQTT message cannot be ingested into AWS IoT Analytics

I have a device that generates MQTT messages to AWS IoT. I have created the channel/pipeline/data store/dataset for it, but I receive nothing except _dt in my dataset.
The reason is that the message contains an invalid payload (some special characters and a hyphen, -), so it cannot be ingested into the data store.
I cannot modify the original MQTT message because the device cannot be reprogrammed.
Should I republish the message with a new rule, or use a function in the pipeline to filter it?
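One option along the second line is a Lambda activity in the pipeline that rewrites the offending attribute names before they reach the data store. Below is a minimal Python sketch, assuming the activity receives its messages as a JSON list and must return the transformed list; the renaming rule is only an illustration and would need to match your actual payload.

import re

def handler(event, context):
    # Assumption: the IoT Analytics Lambda activity passes a batch (list)
    # of messages and expects the transformed list back.
    cleaned = []
    for message in event:
        # Rename keys containing characters the data store rejects
        # (e.g. hyphens) to underscore-only names.
        cleaned.append({re.sub(r"[^0-9A-Za-z_]", "_", key): value
                        for key, value in message.items()})
    return cleaned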

Related

AWS IoT simulator - How to forward messages from a topic created in the MQTT test client to a Kinesis stream

I am trying to use the https://aws.amazon.com/solutions/implementations/iot-device-simulator/ and am able to see the messages streaming to a topic in the MQTT test client. The topic is called /test/devicesimulator.
I created an IoT rule to forward messages to a Kinesis data stream, and the rule just says:
SELECT * FROM '/test/devicesimulator'
The attached role seems to have the permissions to write to the destination Kinesis data stream.
I seem to get nothing on the stream though.
My question is:
Is it even possible to create an IoT rule for a topic created in the MQTT test client? If yes, does my SQL query look like it should?
If not, what's the alternative to this?
Sorry, I am new to IoT and trying to put the pieces together. What do I need to change to get this working?
Have you checked CloudWatch Logs? You can check the rule execution status when you send a message. Run the following command and look for "eventType":"RuleExecution" in the log.
aws logs tail AWSIotLogsV2 --follow
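If the rule shows as executed, you can also confirm that records actually land on the stream. A rough sanity-check sketch with boto3 (the stream name is a placeholder, and a single shard is assumed):

import boto3

kinesis = boto3.client("kinesis")
stream = "my-destination-stream"  # placeholder: your rule's target stream

# Read from the oldest available record on the first shard.
shard_id = kinesis.describe_stream(StreamName=stream)[
    "StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=stream, ShardId=shard_id, ShardIteratorType="TRIM_HORIZON"
)["ShardIterator"]
records = kinesis.get_records(ShardIterator=iterator, Limit=10)["Records"]
print(records)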

A question about IoT Core (MQTT) data integrity and service guarantees

I'm looking at using AWS IoT Core as our data ingress for various types of devices. One of the unbreakable rules of our old ingress pipeline is data integrity. When a device has sent data into our backend, the data does not get lost (it's written to permanent storage before we ack to the device that we've received the data).
In MQTT things seem a bit different. From what I've read so far, if a device writes to an MQTT topic, it has the option of setting QoS to 0 (at most once) or 1 (at least once) and to guarantee delivery we would pick QoS 1 of course.
However, to the best of my understanding, that doesn't guarantee that there is any subscriber on the topic to pick the message up. If a device sends a message to a topic with no subscriber, the message will get lost. MQTT has a concept of retained messages (which AWS has supported for about a year), but that only retains the latest message, so if a device sends two messages to a non-subscribed topic, the first message will be lost.
So now for my actual question (finally). AWS IoT has "rules" that you can attach to MQTT topics. However, I have not found any information about what guarantees AWS IoT provides that these rules will always be monitoring the topics they're created on. Can anyone tell me whether there is a 100% guarantee that a message sent to an MQTT topic that has a rule assigned to it will not ever get lost? By that I mean that I need that rule to finish processing and either successfully execute the actions defined on it or successfully execute the error action defined on it (which would just be writing the message to a DLQ, either SQS or S3 bucket).
I have personally never heard of data loss caused by an AWS IoT rule.
This is just simple message forwarding. I had a project where we had to forward about a thousand messages per second to other services with these rules. We had some data loss, but it was not caused by the rules; it was caused by:
The edge device did not send the message (it was more or less rejected)
Wrong handling of a specific kind of message in the transformation process
Duplicates (the data was also not plausible) - these can be handled with SQS
Quotas: very important to check if you have a high load. If a quota is hit, the ingest may fail silently.
At the end of the day we had several problems with IoT Core, including Greengrass, and we switched to Kinesis Data Streams and Kinesis delivery streams, where we had more control. The edge device was configured to retry when ingestion failed, and with the autoscaling option on we didn't reach the quotas. There were also no duplicates received.
Keep in mind that this is only my project experience; your case is probably very different, and IoT rules could actually be a valid approach for you.
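For what it's worth, the DLQ idea from the question maps directly onto a rule's error action, which fires when the main action fails. A sketch with boto3 (the stream name, queue URL, and role ARNs are placeholders):

import boto3

iot = boto3.client("iot")

# Forward device messages to Kinesis; if that action fails, the error
# action writes the message to an SQS queue acting as a DLQ.
iot.create_topic_rule(
    ruleName="ingest_with_dlq",
    topicRulePayload={
        "sql": "SELECT * FROM 'devices/+/data'",
        "awsIotSqlVersion": "2016-03-23",
        "actions": [{
            "kinesis": {
                "streamName": "ingest-stream",
                "partitionKey": "${topic()}",
                "roleArn": "arn:aws:iam::123456789012:role/iot-kinesis-role",
            },
        }],
        "errorAction": {
            "sqs": {
                "queueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/iot-dlq",
                "roleArn": "arn:aws:iam::123456789012:role/iot-sqs-role",
                "useBase64": False,
            },
        },
    },
)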

Push vs Pull for GCP Dataflow

I want to know what type of subscription one should create in GCP Pub/Sub in order to handle high-frequency data from a Pub/Sub topic.
I will be ingesting data into Dataflow at 100+ messages per second.
Does the choice of pull or push subscription really matter, and how will it affect the speed?
If you consume the Pub/Sub subscription with Dataflow, only pull subscriptions are available:
either you create one and pass it as a parameter of your Dataflow pipeline,
or you specify only the topic in your Dataflow pipeline and Dataflow will create the pull subscription by itself.
In both cases, Dataflow will process the messages in streaming mode.
The difference
If you create the subscription yourself, all the messages will be stored and kept (up to 7 days by default) and will be consumed when the Dataflow pipeline is started.
If you let Dataflow create the subscription, only the messages that arrive AFTER the subscription is created will be consumed by the Dataflow pipeline. If you don't want to lose a message, this is not the recommended solution. If you don't care about the old messages, it's a good choice.
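A sketch of the two options in an Apache Beam (Python) pipeline; the project, topic, and subscription names are placeholders:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    # Option 1: read a subscription you created yourself, so messages
    # published before the pipeline starts are still delivered.
    messages = p | beam.io.ReadFromPubSub(
        subscription="projects/my-project/subscriptions/my-sub")

    # Option 2 (instead of the above): give only the topic; Dataflow
    # creates the subscription at launch, so older messages are not read.
    # messages = p | beam.io.ReadFromPubSub(
    #     topic="projects/my-project/topics/my-topic")

    messages | beam.Map(print)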
High frequency
Then, 100 messages per second is absolutely not high frequency. One Pub/Sub topic can ingest up to 1,000,000 messages per second. Don't worry about that!
Push VS Pull
The model is different.
With a push subscription, you have to specify an HTTP endpoint (on GCP or elsewhere) that consumes the messages. It's a webhook pattern. If the endpoint platform scales automatically with the traffic (Cloud Run or Cloud Functions, for example), the message rate can go very high! The HTTP return code serves as the message acknowledgment.
With a pull subscription, the client needs to open a connection to the subscription and then pull the messages. The client needs to explicitly acknowledge the messages. Several clients can be connected at the same time. With the client library, the messages are consumed over gRPC, which is more efficient (in terms of network bandwidth) for receiving and consuming messages.
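A sketch of a pull consumer with the Python client library, showing the explicit acknowledgment (the project and subscription IDs are placeholders):

from concurrent.futures import TimeoutError
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "my-sub")

def callback(message):
    print(message.data)
    message.ack()  # explicit acknowledgment by the client

# Open a streaming pull (gRPC) connection and consume for a while.
streaming_pull = subscriber.subscribe(subscription_path, callback=callback)
try:
    streaming_pull.result(timeout=60)
except TimeoutError:
    streaming_pull.cancel()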
Security point of view
With push, it's Pub/Sub that has to be authenticated against the HTTP endpoint, if the endpoint requires authentication.
With pull, it's the client that needs to be authenticated on the Pub/Sub subscription.

AWS IoT and SQS : How to get IoT topic and complete JSON payload when redirecting to SQS in AWS

I am using AWS IoT and have created a rule which forwards the data received on this specific topic to an SQS queue.
The SQL statement for this rule is below:
SELECT *, topic() AS topic FROM '+/topicname'
When a message is published to this queue, a lambda function is triggered.
This lambda function processes the payload.
When I use the above rule, the lambda is getting triggered correctly.
I am parsing the sqsEvent.Records[0].Body to extract the payload and the topic name.
When I use this rule, I am able to extract the topic name, but the complete JSON payload is not received; only a partial payload arrives in the Lambda function.
The size of the JSON payload is around 700 bytes.
I think the maximum message size for an SQS queue is around 256 KB.
So I am not sure why the payload is getting trimmed.
Is there any issue with the SQL statement in the IoT rule?
If I use the below SQL statement, I am getting the complete payload, but I am not able to extract the topic name.
SELECT * FROM '+/topicname'
Is there any other way to extract the topic name?
Change the SQL version of your rule to "2016-03-23". I had the same issue with trimmed messages, but with the latest SQL version it works as documented.
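If you manage the rule programmatically, the SQL version is set in the rule payload. A sketch with boto3 (the rule name, queue URL, and role ARN are placeholders):

import boto3

iot = boto3.client("iot")

# Recreate the rule pinned to SQL version 2016-03-23 so that
# SELECT *, topic() returns the full payload plus the topic name.
iot.replace_topic_rule(
    ruleName="forward_to_sqs",
    topicRulePayload={
        "sql": "SELECT *, topic() AS topic FROM '+/topicname'",
        "awsIotSqlVersion": "2016-03-23",
        "actions": [{
            "sqs": {
                "queueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue",
                "roleArn": "arn:aws:iam::123456789012:role/iot-sqs-role",
                "useBase64": False,
            },
        }],
    },
)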

What is the difference between Jobs and Messages in AWS IoT?

Jobs and messages are both just transactions of text between the AWS IoT service and devices.
Why should I use jobs rather than messages, or the other way around?
They are both transactions, but they have their differences.
Messages - The AWS IoT message broker is a publish/subscribe broker service that enables the sending and receiving of messages to and from AWS IoT. The act of sending the message is referred to as publishing. The act of registering to receive messages for a topic filter is referred to as subscribing.
Example - When communicating with AWS IoT, a client sends a message addressed to a topic like Sensor/temp/room1. The message broker, in turn, sends the message to all clients that have registered to receive messages for that topic.
Jobs - AWS IoT jobs can be used to define a set of remote operations that are sent to and executed on one or more devices connected to AWS IoT.
Example - you can define a job that instructs a set of devices to download and install application or firmware updates, reboot, rotate certificates, or perform remote troubleshooting operations.
Whether to use jobs or messages is up to your requirements. If you want to update a set of devices, jobs seem to do the work; if it's just one device, a message will do.
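To make the contrast concrete, a sketch with boto3 (the topic name, thing ARN, and job document are placeholders):

import json
import boto3

# A message: published once to a topic; only currently subscribed
# clients receive it.
iot_data = boto3.client("iot-data")
iot_data.publish(topic="Sensor/temp/room1", qos=1,
                 payload=json.dumps({"temp": 21.5}))

# A job: a tracked remote operation targeted at specific things, which
# the devices fetch and report status on.
iot = boto3.client("iot")
iot.create_job(
    jobId="firmware-update-001",
    targets=["arn:aws:iot:us-east-1:123456789012:thing/my-device"],
    document=json.dumps({"operation": "install", "version": "1.2.3"}),
)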